Contents lists available at ScienceDirect
NeuroImage
journal homepage: www.elsevier.com/locate/neuroimage
AI-driven attenuation correction for brain PET/MRI: Clinical evaluation of a dementia cohort and importance of the training group size
Claes Nøhr Ladefoged 1,∗, Adam Espe Hansen 1, Otto Mølby Henriksen 1, Frederik Jager Bruun 1, Live Eikenes 2, Silje Kjærnes Øen 2, Anna Karlberg 2,3, Liselotte Højgaard 1, Ian Law 1, Flemming Littrup Andersen 1
1 Department of Clinical Physiology, Nuclear Medicine & PET, Rigshospitalet, University of Copenhagen, Denmark
2 Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
3 Department of Radiology and Nuclear Medicine, St. Olavs hospital, Trondheim University Hospital, Trondheim, Norway
ARTICLE INFO
Keywords: Attenuation correction; Deep learning; Convolutional neural network; Artificial intelligence; Brain; PET/MRI
ABSTRACT
Introduction: Robust and reliable attenuation correction (AC) is a prerequisite for accurate quantification of activity concentration. In combined PET/MRI, AC is challenged by the lack of bone signal in the MRI from which the AC maps have to be derived. Deep learning-based image-to-image translation networks present themselves as an optimal solution for MRI-derived AC (MR-AC). High robustness and generalizability of these networks are expected to be achieved through large training cohorts. In this study, we implemented an MR-AC method based on deep learning, and investigated how training cohort size, transfer learning, and MR input affected robustness, and subsequently evaluated the method in a clinical setup, with the overall aim to explore if this method could be implemented in clinical routine for PET/MRI examinations.
Methods: A total cohort of 1037 adult subjects from the Siemens Biograph mMR with two different software versions (VB20P and VE11P) was used. The software upgrade included updates to all MRI sequences. The impact of training group size was investigated by training a convolutional neural network (CNN) on an increasing training group size from 10 to 403. The ability to adapt to changes in the input images between software versions was evaluated using transfer learning from a large cohort to a smaller cohort, by varying training group size from 5 to 91 subjects. The impact of MRI sequence was evaluated by training three networks based on the Dixon VIBE sequence (DeepDixon), T1-weighted MPRAGE (DeepT1), and ultra-short echo time (UTE) sequence (DeepUTE). Blinded clinical evaluation relative to the reference low-dose CT (CT-AC) was performed for DeepDixon in 104 independent 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET patient studies performed for suspected neurodegenerative disorder using statistical surface projections.
Results: Robustness increased with group size in the training data set: 100 subjects were required to reduce the number of outliers compared to a state-of-the-art segmentation-based method, and a cohort >400 subjects further increased robustness in terms of reduced variation and number of outliers. When using transfer learning to adapt to changes in the MRI input, as few as five subjects were sufficient to minimize outliers. Full robustness was achieved at 20 subjects. Comparable robust and accurate results were obtained using all three types of MRI input with a bias below 1% relative to CT-AC in any brain region. The clinical PET evaluation using DeepDixon showed no clinically relevant differences compared to CT-AC.
Conclusion: Deep learning based AC requires a large training cohort to achieve accurate and robust performance. Using transfer learning, only five subjects were needed to fine-tune the method to large changes to the input images. No clinically relevant differences were found compared to CT-AC, indicating that clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types using transfer learning and a limited number of subjects.
∗ Corresponding author.
E-mail address: claes.noehr.ladefoged@regionh.dk (C.N. Ladefoged).
https://doi.org/10.1016/j.neuroimage.2020.117221
Received 8 November 2019; Received in revised form 15 July 2020; Accepted 28 July 2020; Available online 1 August 2020
1053-8119/© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. INTRODUCTION
Positron emission tomography (PET) images need to be corrected for photon attenuation to accurately quantify the measured radioactive tissue concentration (Andersen et al., 2014; Dickson et al., 2014). In a dual modality PET and Magnetic Resonance Imaging (MRI) scanner, a density map for attenuation correction (AC) has to be derived from the MRI. This was initially not possible, which hampered the use of PET/MRI scanners, especially for brain studies in both clinical and research applications (Vandenberghe and Marsden, 2015). Several MRI-guided attenuation correction techniques were proposed as potential solutions (Chen and An, 2017; Izquierdo-Garcia and Catana, 2016; Ladefoged et al., 2016; Mehranian et al., 2016). Eleven state-of-the-art AC-methods were studied in a large cohort of adult subjects with normal anatomy (Ladefoged et al., 2016), which concluded that AC was a solved topic in the brain when using one of the best performing methods. However, some of these methods, including our own segmentation-based RESOLUTE method (Ladefoged et al., 2015), were later found to be sensitive to specified MRI sequences, and, thus, vulnerable to system software updates.
Recently, artificial intelligence (AI) with deep learning convolutional neural networks (CNN) has been considered as an alternative, as these networks offer a number of advantages over the existing methods. Deep learning methods can confer robustness towards changes to the input caused by MRI hardware or system software updates, as well as cross platform compatibility between vendors through the process of transfer learning. Furthermore, methods based on CNNs are usually very processing intensive at the training step, but the generation of an attenuation map for a given subject occurs within seconds, making them attractive tools as a part of a clinical workflow where speed, accuracy, and robustness are key elements.
Since the first use of deep learning to convert MR images to CT (Han, 2017), numerous methods have been proposed, see e.g. (Teuho et al., 2020; Torrado-Carvajal, 2020). Several state-of-the-art networks were employed, from traditional encoder-decoder architectures (Gong et al., 2018; Han, 2017; Torrado-Carvajal et al., 2019), to generative adversarial networks (GANs) (Arabi et al., 2019; Kazemifar et al., 2019), including variants accepting unpaired data (Ge et al., 2019; Lei et al., 2020; Wolterink et al., 2017; Yang et al., 2018). Most methods from the literature use small training group sizes (<30) even though larger sizes could increase generalizability and robustness. The methods are based on single or multiple MRI sequences, spanning the common T1-weighted MPRAGE as well as specialized sequences capable of visualizing bone such as zero echo time (ZTE) or ultra-short echo time (UTE). The possible advantages in the context of attenuation correction, especially in terms of robustness, from using large training cohorts as well as specialized sequences over traditional sequences remain to be investigated systematically.
Recently, methods converting non-attenuation corrected (NAC) PET images directly to attenuation and scatter corrected PET images have emerged, mainly targeted for whole-body applications as paired data are readily available in large numbers (Arabi et al., 2020; Shiri et al., 2019; Van Hemmen et al., 2019; Yang et al., 2019). The drawbacks of these methods are their dependence on the choice of tracer and limited ability to extract structural information (Arabi et al., 2020). In the brain, the performance of these new methods remains to be evaluated thoroughly on a cohort with neurologic abnormalities.
The aim of this study was to implement a deep learning CNN for clinical MR-AC use, and investigate the potential impact on the quantitative accuracy and clinical reading of PET scans depending on training group size and choice of MRI input. This was achieved by utilizing a large cohort of subjects all examined on the same PET/MRI from two independent sites including common and specialized MRI, as well as low-dose CT images used as reference.¹
¹ Code and models for inference available at https://github.com/CAAI/DeepMRAC
2. MATERIALS AND METHODS
The included data comprised studies acquired on two Siemens Biograph mMR systems (Siemens Healthineers, Erlangen, Germany) spanning two different software versions. A larger cohort, imaged with software version VB20P, was used to investigate the impact of cohort size. A smaller cohort, with the most recent software update (VE11P), was used to investigate the effect of transfer learning (based on VB20P data), impact of choice of MRI input, and to perform a clinical evaluation.
2.1. Patients
Data sets from 1037 adult subjects were obtained retrospectively from two different centers: n = 1007 from Rigshospitalet, University Hospital Copenhagen, Denmark, and n = 30 from St. Olavs hospital, Trondheim University Hospital, Norway. Rigshospitalet provided data sets from the complete cohort of subjects referred for a PET/MRI brain examination with matching same-day head CT between November 2013 and April 2019, examined with software version VB20P (n = 811) or VE11P (n = 196). Data comprised PET/MRI studies imaged with various tracers, but only the MRI sequences were used to develop the method. The subjects included from St. Olavs hospital were referred to a clinical 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET/MRI brain examination for dementia, all examined with VE11P, and had matching same-day head CT. Retrospective use of subjects from Rigshospitalet was approved by the Danish Patient Safety Authority (ref. 3–3013–1513/1). The study from St. Olavs hospital was approved by the Regional Committee for ethics in Medical Research (REC Central) (ref. 2013/1371) and all subjects gave written informed consent. Data were extracted only in fully anonymized form in compliance with The European General Data Protection Regulation (GDPR).
In each of the two groups (VB20P and VE11P), we divided the subjects into training, validation, and test cohorts. The train and validation cohorts were used to develop the method. The subjects in the independent test cohort were all imaged with [18F]FDG; none of these subjects had e.g. bone modifying cranio-facial surgical interventions, cranial defects, hyperostoses, dysplasias, disfigurement or metal implants besides dental implants. For the VB20P group, the test cohort was identical to the patients recently used in our multi-center evaluation (Ladefoged et al., 2016), and the train/validation split was done 70/30. We initially developed the models for the VE11P group using 4-fold cross validation. Once the models were finalized, we fixed the training/validation cohorts to be the first cross validation split. The independent test cohort was prospectively acquired after the models were trained. An illustration of the splits for each group is shown in Fig. 1.
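The 70/30 train/validation split and the 4-fold cross validation described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the fixed random seed, the interleaved fold assignment, and the subject identifiers are hypothetical and are not taken from the study's code.

```python
import random


def split_train_val(subject_ids, train_fraction=0.7, seed=0):
    """Shuffle the subjects and split them into train/validation lists."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def k_fold(subject_ids, k=4):
    """Yield (train, validation) lists for k-fold cross validation."""
    ids = list(subject_ids)
    folds = [ids[i::k] for i in range(k)]  # interleaved fold assignment
    for i in range(k):
        val = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, val


subjects = [f"sub-{i:03d}" for i in range(20)]  # hypothetical identifiers
train, val = split_train_val(subjects)
# Fixing the cohorts to the first cross validation split, as in the text.
first_train, first_val = next(k_fold(subjects))
```

Every subject ends up in exactly one of the two lists in either scheme, so no subject is shared between training and validation.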
2.2. Imaging protocols
2.2.1. MRI
The scan protocols always included a T1-weighted (T1w) MPRAGE, a UTE AC sequence, and a Dixon-VIBE sequence (the vendor default for MR-AC). The upgrade to VE11P included upgraded versions of all three sequences. The UTE AC sequence was re-implemented, changing the relationship between the two echo images, with consequences especially to the signal in bone (Suppl. Fig. 1A). Visually, the most noticeable change was to the Dixon-VIBE sequence, which is now available in high-resolution, targeted for brain purposes (Suppl. Fig. 1B). No apparent differences could be observed for the T1-weighted MPRAGE sequence. Nevertheless, inspection of the area representing bone showed a slight decrease in mean value following the upgrade (Suppl. Fig. 1C). Sequence details are available in Table 1.
2.2.2. CT
A reference low-dose CT scan (120 kVp, 36–40 mAs, 0.6 × 0.6 × 3 mm³ voxels) of the head using a PET/CT system (Biograph TruePoint 40, 64, or Biograph mCT, Siemens Healthineers) was acquired for all patients on the same day as the PET/MRI examination.

Fig. 1. Separation of subjects into train, validation and test cohorts within each group, and further split to investigate the impact of training group size. Note, all 30 patients from St. Olavs hospital were part of the 91 VE11P training cohort. For each MRI input type, four models are trained from the n = 403 patients of the VB20P cohort with increasing number of subjects. The performance of these models is evaluated using the independent VB20P test cohort (n = 201). For VE11P, using the n = 91 training cohort patients, a total of four models are trained using transfer learning (TL) from the n = 403 VB20P model. An additional model is trained using all n = 91 training patients, but without any transfer learning (no TL). All VE11P models, and the n = 403 VB20P model applied directly without re-training, are evaluated using the independent VE11P test cohort (n = 104). This setup is identical for DeepUTE, DeepDixon, and DeepT1.

Table 1
MRI sequence parameters.
| Software version | MRI sequence | Repetition time (TR) [ms] | Echo time (TE) [ms] | Flip angle [degrees] | Acquisition time [s] | Voxel size [mm³] | Matrix size |
| VB20P | Dixon | 3.6 | 1.23/2.46 | 10 | 19 | 2.6 × 2.6 × 3.1 | 126 × 192 × 128 |
| VB20P | T1w | 1900 | 2.44 | 9 | 300 | 0.5 × 0.5 × 1 | 512 × 512 × 192 |
| VB20P | UTE | 11.94 | 0.07/2.46 | 10 | 100 | 1.6 × 1.6 × 1.6 | 192 × 192 × 192 |
| VE11P | Dixon | 4.14 | 1.28/2.51 | 10 | 39 | 1.3 × 1.3 × 2 | 204 × 384 × 128 |
| VE11P | T1w | 1900 | 2.44 | 9 | 300 | 0.5 × 0.5 × 1 | 512 × 512 × 192 |
| VE11P | UTE | 4.64 | 0.07/2.46 | 10 | 118 | 1.6 × 1.6 × 1.6 | 192 × 192 × 192 |

Table 2
Patient characteristics in the [18F]FDG PET test sets.
| Software version | N | Male/Female | Age, mean (range) | Injected dose, mean (SD) | Scan start p.i., median (range) |
| VB20P | 201 | 108/93 | 68 (23–96) y | 203 (±20) MBq | 51 (24–134) min |
| VE11P | 104 | 52/52 | 73 (41–93) y | 200 (±11) MBq | 47 (39–69) min |
p.i.: post injection.
2.2.3. [18F]FDG PET
The test cohort included 201 (VB20P) and 104 (VE11P) subjects for quantitative and clinical evaluation of [18F]FDG PET data. The patients were referred for suspected neurodegenerative disease as part of the clinical work-up. Patient characteristics are given in Table 2. The subjects were positioned head-first with arms down in the fully-integrated PET/MRI system. Data were acquired over a single bed position of 25.8 cm covering the head and neck for 10 min. For the purpose of this study, the PET data from the PET/MRI acquisition were reconstructed using 3D Ordinary Poisson-Ordered Subset Expectation Maximization (OP-OSEM) with 4 iterations, 21 subsets, and 3 mm Gaussian post-filtering on 344 × 344 matrices (2.1 × 2.1 × 2.0 mm³ voxels) in line with the clinical protocols. Each MR-AC map was resampled to PET resolution as a part of the reconstruction. No additional filtering was applied.
2.3. Deep convolutional neural network
2.3.1. Network structure
The proposed network used in this study is shown in Supplementary Figure 2. The 3D convolutional network is based on an encoder-decoder structure with symmetric concatenations between corresponding stages, inspired by the U-Net architecture (Çiçek et al., 2016; Ronneberger et al., 2015) but modified for an end-to-end image synthesis task. Specifically, each stage in the 3D network consists of 3 × 3 × 3 kernels, batch normalization (BN), rectified linear unit (ReLU) activation, and a dropout layer with increasing fraction from 0.1–0.3 in the encoding part, and vice versa in the decoding part. The downsampling between stages was replaced by convolutions with stride 2. We used L2 penalties for kernel regularization on the convolution layers.
2.3.2. Network training
The proposed networks were implemented in TensorFlow (version 2.1.0) (Abadi et al., 2016). Our experiments used mean squared error as loss function and the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1 × 10⁻⁴, trained for 100 epochs with a batch size of 16. All computations were performed on an IBM POWER9 server with four NVIDIA TESLA V100 GPUs. The networks use 3D volumes as input consisting of 16 neighboring transaxial slices for each MRI scan (16 slices × 192 voxels × 192 voxels × C channels), where C denotes the number of images in the MRI sequence (in- and opposed-phase for Dixon (two channels), echo images for UTE (two channels), and MPRAGE for T1w (one channel)), and output the corresponding CT slices (16 slices × 192 voxels × 192 voxels × 1 channel). All MRI sequences were first resampled to the resolution of the UTE image, to ensure isotropic voxels and matrix size, and normalized to zero mean and unit variance. Subsequently, we extracted 3D volumes from the 192 × 192 × 192 MRI scans with a stride of 4. The scanner bed and structures other than the patient were removed from the CT images, before they were converted to linear attenuation coefficients and moved into PET/MRI space using a 6-parameter rigid alignment procedure (minctracc, McConnell Imaging Center, Montreal, Canada) with normalized mutual information as objective function. A mask of the CT-coverage was applied to the three MRI sequences during the training phase.
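The intensity normalization and the extraction of 16-slice training volumes with a stride of 4 can be illustrated with a short sketch. It is not the study's implementation: the function names are hypothetical, the image is represented as a flat list of intensities, and it is assumed that every slab lies fully inside the 192-slice volume (boundary handling is not described in the text).

```python
from statistics import fmean, pstdev


def normalize(values):
    """Scale intensities to zero mean and unit variance."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    return [(v - mean) / std for v in values]


def slab_starts(n_slices=192, slab_depth=16, stride=4):
    """First slice index of each slab along the axial direction.

    Assumption: only slabs that fit entirely inside the volume are used.
    """
    return list(range(0, n_slices - slab_depth + 1, stride))


starts = slab_starts()
normalized = normalize([1.0, 2.0, 3.0, 4.0])  # toy intensities
```

Under these assumptions a 192-slice volume yields 45 overlapping slabs, the last one covering slices 176 to 191.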
2.3.3. Network prediction and post-processing
To generate the deep learning attenuation maps, we extracted the 3D stack-of-slices around each slice in the volume, and computed the average voxel values for each of the overlapping predicted slices.
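The averaging of overlapping predicted slices can be sketched as below. This is a hedged illustration rather than the released code: `predict_slab` is a hypothetical stand-in for the trained network, each slice is represented as a flat list of voxel values, and a slab stride of 1 is assumed so that every slice is covered at least once.

```python
def average_overlapping_slabs(volume, predict_slab, slab_depth=16, stride=1):
    """Predict every slab and average the predictions slice by slice.

    volume: list of slices, each slice a flat list of voxel values.
    predict_slab: hypothetical stand-in for the trained network; maps a
        list of `slab_depth` slices to a list of `slab_depth` slices.
    """
    n_slices = len(volume)
    sums = [[0.0] * len(s) for s in volume]
    counts = [0] * n_slices
    for start in range(0, n_slices - slab_depth + 1, stride):
        predicted = predict_slab(volume[start:start + slab_depth])
        for offset, pred_slice in enumerate(predicted):
            z = start + offset
            counts[z] += 1
            for i, value in enumerate(pred_slice):
                sums[z][i] += value
    # With stride 1 every slice has been predicted at least once.
    return [[v / counts[z] for v in sums[z]] for z in range(n_slices)]


# Toy check: an identity "network" should reproduce the input volume.
toy = [[float(z), float(z) + 0.5] for z in range(20)]
result = average_overlapping_slabs(toy, lambda slab: slab, slab_depth=4)
```

Slices near the top and bottom of the volume are covered by fewer slabs than central slices, which is why the per-slice count is tracked instead of dividing by a fixed number.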
2.4. Reference methods
The rigidly co-registered CT images were used as our gold standard AC reference during both training and evaluation following conversion of Hounsfield Units as implemented on the Siemens PET/CT system (Carney et al., 2006). Due to the limited coverage in the neck region by the acquired CT, we replaced the missing areas by the values from the vendor-provided UTE AC map. To ensure a fair comparison, this replacement was also performed in all the other attenuation maps. In addition, we also computed the RESOLUTE attenuation map (Ladefoged et al., 2015) for VB20P patients from Rigshospitalet. RESOLUTE is calibrated to VB20P UTE data, and was therefore not computed for the VE11P patients. As part of the VE11P software upgrade, a vendor-provided atlas-based MR-AC method was made available (Koesters et al., 2016; Paulus et al., 2015), and was used as the MR-based reference for the VE11P test cohort. This method is prone to bone artifacts related to misregistration in more than 20% of the cases (Øen et al., 2019). Therefore, patients with this type of artifact were excluded from the analysis of the atlas-based method.
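The replacement of areas outside the CT coverage by vendor UTE AC values, applied identically to every attenuation map, amounts to a masked fill. A minimal sketch follows, assuming flattened maps of equal length and a boolean coverage mask; the function name and the numerical values are hypothetical.

```python
def fill_outside_ct(ac_map, ute_ac_map, ct_coverage):
    """Keep AC values inside the CT coverage, use UTE AC values elsewhere."""
    return [ac if covered else ute
            for ac, ute, covered in zip(ac_map, ute_ac_map, ct_coverage)]


# Toy example with four voxels; the last two lie outside the CT coverage.
ct_ac = [0.096, 0.151, 0.0, 0.0]
ute_ac = [0.100, 0.140, 0.098, 0.0]
coverage = [True, True, False, False]
filled = fill_outside_ct(ct_ac, ute_ac, coverage)
```

Applying the same fill to the reference and to each MR-based map means that differences between methods can only arise inside the CT coverage.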
2.5. PET evaluation metrics
Due to the use of data from different software versions (VB20P and VE11P), causing differences in all MR images to varying degrees, we evaluated the cohorts separately.
We first moved all data to common MNI space using ANTs (Avants et al., 2011) by diffeomorphic non-rigid registration of the patient's T1w MPRAGE image to the ICBM 152 2009a template (Fonov et al., 2009). Voxels inside the MNI brain mask were considered part of the brain mask if the PET activity was >20% of the maximum intensity value of the brain. The voxel-wise percent difference relative to PET with CT-AC, defined as:
$$\mathrm{Rel}_{\%} = \frac{PET_{x} - PET_{CT}}{PET_{CT}} \times 100,$$
as well as the absolute relative percent difference, defined as:
$$\mathrm{Abs}_{\%} = \frac{\left| PET_{x} - PET_{CT} \right|}{PET_{CT}} \times 100,$$
were calculated for the PET images corrected with each of the MRI-based ACs.
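The brain mask thresholding and the two voxel-wise metrics can be written out as a small sketch. It assumes flattened images of equal length and, since the text does not state which image the 20% threshold refers to, that the threshold is applied to the CT-AC PET image; the function names and toy values are hypothetical.

```python
def brain_mask(pet_ct, mni_mask, threshold=0.2):
    """Keep MNI-mask voxels whose activity exceeds 20% of the brain maximum.

    Assumption: the threshold is applied to the CT-AC PET image.
    """
    peak = max(v for v, inside in zip(pet_ct, mni_mask) if inside)
    return [bool(inside) and v > threshold * peak
            for v, inside in zip(pet_ct, mni_mask)]


def rel_percent(pet_x, pet_ct, mask):
    """Voxel-wise Rel% inside the mask: 100 * (PET_x - PET_CT) / PET_CT."""
    return [100.0 * (x - c) / c
            for x, c, keep in zip(pet_x, pet_ct, mask) if keep]


def abs_percent(pet_x, pet_ct, mask):
    """Voxel-wise Abs% inside the mask: the magnitude of Rel%."""
    return [abs(r) for r in rel_percent(pet_x, pet_ct, mask)]


# Toy flattened images (hypothetical values, four voxels each).
pet_ct = [10.0, 8.0, 1.0, 5.0]
pet_x = [10.2, 7.6, 1.5, 5.0]
mni_mask = [True, True, True, False]

mask = brain_mask(pet_ct, mni_mask)
rel = rel_percent(pet_x, pet_ct, mask)
abs_rel = abs_percent(pet_x, pet_ct, mask)
```

In the toy data the third voxel lies inside the MNI mask but below the 20% threshold, so it is excluded; restricting the metrics to the thresholded mask also avoids dividing by near-zero reference activity.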
As a measure of robustness towards outliers, we used the metric introduced in Ladefoged et al. (2016) to estimate the number of outliers measured in the PET images. The metric calculates the percentage of patients within a 3% accuracy in the Rel% images for varying voxel-wise fractions of the brain, varied from 0% to 100%. A perfect score for a method is therefore to have 100% of the voxels in the brain in 100% of the patients within ±3% of PET with CT-AC.
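One possible reading of this outlier metric is sketched below: for each patient, compute the fraction of brain voxels with Rel% within ±3%, then report the percentage of patients reaching at least a given brain fraction. The function names, the inclusive treatment of the ±3% boundary, and the toy values are assumptions made for illustration.

```python
def fraction_within(rel_values, tolerance=3.0):
    """Fraction of brain voxels with |Rel%| within the tolerance."""
    return sum(abs(r) <= tolerance for r in rel_values) / len(rel_values)


def outlier_curve(rel_per_patient, brain_fractions, tolerance=3.0):
    """Percent of patients with at least each brain fraction within tolerance."""
    within = [fraction_within(r, tolerance) for r in rel_per_patient]
    return [100.0 * sum(w >= f for w in within) / len(within)
            for f in brain_fractions]


# Three toy patients with four brain voxels each (hypothetical Rel% values).
patients = [
    [1.0, -2.0, 0.5, 2.9],   # all voxels within +/-3%
    [1.0, 4.0, -5.0, 2.0],   # half of the voxels within +/-3%
    [0.0, 0.0, 10.0, 0.0],   # three quarters within +/-3%
]
curve = outlier_curve(patients, [0.0, 0.5, 0.75, 1.0])
```

A method with a perfect score would return 100 for every brain fraction, including 1.0.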
2.6. Effect of cohort size and changes to the input on [18F]FDG PET
To evaluate the effect of training group size, we trained in total four networks with sizes of n = {10, 50, 100, 403}. The subjects were sampled with replacement.
The robustness towards changes to the input images was evaluated using images from the VE11P cohort. In recognition of the changes to the MR images following the software upgrade, it was expected that further fine-tuning of the network was needed to adapt to these changes. The purpose of the analysis was to test the number of subjects needed for this adaptation. We compared a network trained using a group of all available training subjects (n = 91) against n = {5, 20, 50}, all trained using transfer learning from the full VB20P training cohort (n = 403). In addition, we also trained a network without transfer learning on the full cohort (n = 91). The overview of the setup is shown in Fig. 1. We repeated the training of the two networks with lowest number of subjects (n = 5 and n = 20) a total of four times using different combinations of training subjects each time, to determine the robustness towards the selection of subjects. The comparisons were repeated for each MRI sequence type, using identical hyperparameters as presented in Section 2.3. We compared the networks based on the number of outliers measured in the PET images, representing the robustness.
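Drawing the repeated small training groups (n = 5 and n = 20, four repetitions each) from the 91 VE11P training subjects could look like the sketch below. The function name, seed, and subject identifiers are hypothetical, and it is an assumption that each group is drawn independently, so that groups from different repetitions may share subjects.

```python
import random


def draw_training_groups(subject_ids, size, repeats=4, seed=0):
    """Draw `repeats` independent training groups of `size` distinct subjects."""
    rng = random.Random(seed)
    pool = list(subject_ids)
    return [rng.sample(pool, size) for _ in range(repeats)]


ve11p_train = [f"ve11p-{i:02d}" for i in range(91)]  # hypothetical identifiers
groups_n5 = draw_training_groups(ve11p_train, size=5)
groups_n20 = draw_training_groups(ve11p_train, size=20, seed=1)
```

Each group would then be used to fine-tune a separate copy of the pretrained VB20P model, and the spread across the four resulting outlier curves indicates the sensitivity to subject selection.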
2.7. Effects of MRI sequence on [18F]FDG PET
We evaluated the effects of MRI sequence on accuracy by training three independent networks, one for each sequence: Dixon, T1w and UTE, respectively. Each network was trained on the full VB20P cohort, and subsequently fine-tuned using the full VE11P cohort, and designated: DeepDixon, DeepT1, and DeepUTE. We assessed the robustness dependent on MRI sequence by comparing the number of outliers in the VE11P cohort.
Full brain and regional performances of the networks were evaluated using anatomical predefined template regions from MNI space (Collins et al., 1999; Fonov et al., 2009), with extraction of mean Rel% and Abs% values. We furthermore generated parametric average and standard deviation Rel%-distribution images across all patients for each method for visual inspection.
2.8. Clinical evaluation
The [18F]FDG PET images from the independent test cohort (VE11P, n = 104) reconstructed using CT-AC and DeepDixon were analyzed by MI Neurology (Siemens Healthineers, Erlangen, Germany). Statistical surface projections (z-score maps) were generated showing deviations from a vendor-provided database of healthy controls (46–79 years) using cerebellar gray matter as reference region. Statistical surface projections are widely used and accepted as the most sensitive method for the identification of metabolic reductions in [18F]FDG PET. The projections are routinely used in the reading of clinical [18F]FDG PET scans providing information on regional patterns and severity of hypometabolism.
Statistical surface projections were produced for PET images created with CT-AC and DeepDixon, and for each patient presented (blinded and randomized) side by side to two expert nuclear medicine physicians (IL, OH). The readers first independently and then by consensus visually scored each pair of projections as “no difference”, “minor, but not significant”, or “clinically significant”, where the latter would indicate a change of diagnosis or difference indicative of disease progression in only one of the PET images. This strategy was selected as the differences in the images were expected to be small and barely discernible on direct visual inspection, and statistical surface projections are the most sensitive method to discrete changes in a clinical setting (Burdette et al., 1996). The reading, thus, simulates the clinical evaluation of a patient with follow-up imaging using standard clinical methodology, and includes also the indirect effects of perturbations in cortical uptake caused by AC induced effects on anatomical warp and reference region.
3. RESULTS
Fig. 2 shows the axial and sagittal views for each proposed attenuation method (DeepDixon, DeepT1 and DeepUTE) for a single sample patient from the VE11P test cohort. Notice especially the excellent performance in the skull-base and nasal cavities in the proposed methods replicating the morphology of even small anatomical details from CT. The network training time using the full VB20P cohort was 40 hrs, whereas the fine-tuning to the full VE11P cohort took 12 hrs. The inference time to predict an attenuation map for a new subject was 4 sec. A total of 13 patients (13%) had artifacts in their atlas-based attenuation map related to misplaced bone. These subjects were removed from the average performance evaluations of the atlas-based method only.
3.1. Effect of cohort size and changes to the input on [18F]FDG PET
The effect of VB20P cohort size in DeepUTE training is shown in Fig. 3A, which shows a clear correlation between group size and model performance in terms of outliers at the 3% [18F]FDG PET error-level. Training using n = 10 subjects results in inadequate bone representation, incorrect attenuation values in brain tissue, and an overall smoother AC map with an 8–10% negative bias relative to PET with CT-AC (Fig. 3B). Increasing the group size decreased the blurring and increased the image contrast and overall detail level in the AC images. Furthermore, the robustness clearly increased with group size. Thus, n = 100 was required to outperform RESOLUTE in the number of outliers. When training using the full cohort, n = 403, DeepUTE markedly reduced the number of outliers compared to RESOLUTE. The large amount of training data empowers our method to handle common artifacts such as signal voids from dental artifacts. An example of this is illustrated in Fig. 4. A similar relationship between training group size and number of outliers was found when using DeepDixon and DeepT1 (Suppl. Fig. 3). DeepT1 appeared more robust towards training group size, as 10–50 subjects were sufficient to achieve performance near RESOLUTE and increasing group size above 100 subjects did not improve robustness.
Fig. 5 shows the effect of fine-tuning the DeepUTE network to a significant change in the UTE MRI input sequence following the VB20P to VE11P software upgrade. The VB20P model without transfer learning is shown, where it is apparent that transfer learning is necessary. Transfer learning from the VB20P cohort was performed on 5, 20, 50 and the full n = 91 VE11P cohort with UTE MRI as input. Here, too, robustness was correlated to the group size, but the size needed for convergence was markedly reduced to n = 5 subjects. Incremental robustness improvements were achieved with increasing group size. For comparison, training the VE11P network without transfer learning using all n = 91 subjects resulted in similar model accuracy as when using between 5 and 20 subjects with transfer learning. Overall similar results were observed for DeepDixon, with the exception that all models with transfer learning outperformed the model without transfer learning (Suppl. Fig. 4A). As expected, DeepT1 trained only with VB20P patients generalized well to the VE11P cohort without re-training, with performance surpassing the atlas-based method (Suppl. Fig. 4B). The number of outliers was similar to training with all n = 91 VE11P training subjects without transfer learning, but fine-tuning with VE11P data further improved the robustness. Repeating model training using different training subjects for n = 5 and n = 20 appeared robust across all three MRI sequence types (Fig. 5 and Suppl. Fig. 4).
3.2. Effects of MRI input sequence on [18F]FDG PET
The number of outliers at the ±3% level, representing the robustness of the method, was similar across all three proposed methods when evaluated on the VB20P test patients (Fig. 6A) and on the VE11P test patients (Fig. 6B) after applying transfer learning. The methods showed a substantial improvement over both RESOLUTE and the atlas-based method.
The relative and absolute relative percent difference regional analysis for the VE11P cohort with transfer learning from the VB20P cohort is shown in Fig. 7 and Supplementary Figure 5, respectively. None of the proposed methods exceeded ±1% average relative error (Rel%) in any region of the brain. The atlas-based method achieved a low full brain Rel% of 0.8 ± 2.4%, with higher regional errors subcortically of up to 7%. The maximal outlier for a single patient in any region of the brain was below 6% for all proposed methods (DeepUTE range: −4% to 5%, DeepDixon range: −4% to 5%, DeepT1 range: −5% to 6%). For the atlas-based method, the errors ranged from −15% to 14%. Similarly, average absolute relative error (Abs%) was below 2.5% in any region of the brain for the proposed methods, and between 4% and 8% regionally for the atlas-based method. The results for the regional analysis for the VB20P cohort are shown in Supplementary Figure 6.

Fig. 2. Attenuation map comparison for a representative patient from the VE11P cohort. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view. Each proposed MR-based attenuation map is preceded by the underlying MR image used for inference for reference. Note, for simplicity, only second echo (TE2) and in-phase are shown for DeepUTE and DeepDixon, respectively. All models were trained using the full cohort (n = 91) with transfer learning from the corresponding VB20P full cohort models (n = 403).

Fig. 3. The effect of training group size on model accuracy of DeepUTE. A) An outlier analysis for VB20P test subjects (n = 201) of model accuracy with increasing training group size. B) Axial images of a representative patient with [18F]FDG PET and corresponding DeepUTE AC-maps, and %-difference maps relative to PET CT-AC. The arrows in the AC-maps point to the nasal cavity and bone with a more distinct resemblance to the reference CT with increasing group size. The arrows in the PET images point to an occipital lobe [18F]FDG PET hyper-intense area with convergent resemblance to the reference standard PET CT-AC.
The averaged relative difference mean and standard deviation images are shown in Fig. 8 for the VE11P cohort and Supplementary Figure 7 for the VB20P cohort. Again, near equal performance is achieved by applying either input MRI sequence to the deep learning method. Compared to RESOLUTE, especially cortical regions close to bone were more accurate with a lower standard deviation (Suppl. Fig. 7).

Fig. 4. Example case showing robustness to metallic dental implants for DeepUTE trained with the full VB20P training group (n = 403). Metal implants did not cause any noticeable artifacts in CT, but caused large signal voids in the UTE echo image. The artifacts resulted in large errors in the RESOLUTE attenuation map, whereas DeepUTE was able to largely correct for the artifact, as shown both in the axial and sagittal orientation. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view.

Fig. 5. Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepUTE model. The dashed lines represent the performance of the DeepUTE model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network (DeepUTE) from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning of DeepUTE with increasing training group size after transfer learning from the VB20P cohort. The shaded areas around n = 5 and n = 20 represent the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).
Table 3
Consensus scores from clinical evaluation of [18F]FDG PET comparing attenuation correction using CT and DeepDixon (VE11P; n = 104).
| Consensus score | Number |
| No difference | 78 (75%) |
| Minor, not significant | 25 (24%) |
| Clinically significant | 1 (1%) ∗ |
∗ Difference caused by warp error in spatial normalization.
3.3. Clinical evaluation
The 104 pairs of [18F]FDG PET reconstructions (CT and DeepDixon) were evaluated, and 1 pair (1%) was scored as “clinically significant different” based on the statistical surface projection, whereas 103 pairs (99%) were scored as not clinically significantly different (Table 3). On direct clinical reading of the [18F]FDG PET image of the single case rated as “clinically significant different” there was no visually discernible change in voxel activity. The differences could be traced to a defective spatial normalization warp that would be found on routine quality control. Presumably it was brought on by scanning in extreme neck flexion combined with small differences in extra-cerebral activity.
4. DISCUSSION
This study confirmed the usability of deep learning-based networks for MRI-based attenuation correction in a clinical setting, and demonstrated performances exceeding previous state-of-the-art non-deep learning-based methods. By training a common auto-encoder architecture using increasing group sizes, we showed a direct correlation between accuracy and size when the network was trained from scratch. Using transfer-learning from the large cohort of subjects, however, we showed the amount of training data needed to adapt to changes to the MRI sequence input could be reduced significantly to as low as 5 subjects. Furthermore, we demonstrated robustness towards the choice of MRI sequence input, with identical performance when using a common Dixon-based MR-AC sequence as with the specialized UTE sequence. Finally, we demonstrated a retained clinical value and accuracy of our methodology compared to our reference CT-AC.
The methodology employed in this study is not novel, as the auto-encoder architecture has been widely applied for MR-AC purposes already (Gong et al., 2018; Han, 2017; Liu et al., 2017). The novelty of our study lies with the unprecedented amount of training data utilized and the analysis of robustness with respect to the size of the training data set and type of MRI input. Deep learning is usually associated with large amounts of training data, something that is difficult to obtain in most health-care applications. Previous publications employing CNNs for MR-to-CT conversion are therefore often based on small cohorts, with a group size ranging between 10 and 30 (Gong et al., 2018; Han, 2017; Liu et al., 2017). To investigate the effect of size, we trained the network end-to-end from scratch using 10, 50, 100, and 403 subjects, respectively. While there was an impact on the average performance with an increasingly larger training group (Fig. 3B), a larger effect was determined to be in the number of associated outliers (Fig. 3A), with the best overall performance achieved for the largest cohort (n = 403). Interestingly, to achieve the performance of RESOLUTE, measured in number of outliers, a training group size between 50 and 100 subjects was needed (Fig. 3A and Suppl. Fig. 3). This suggests that the deep learning methods based on fewer than 50 subjects for training might be unstable, albeit having decent average errors. The model accuracy further improves with increasing training group size from 100 to 403 in DeepUTE and DeepDixon, confirming findings in other domains where deep learning was applied (Sun et al., 2017). Using T1w MPRAGE generally appears to be more stable (Suppl. Fig. 3B), which could be due to the sequence being more standardized compared to Dixon-VIBE and UTE.
Fig.6. OutlieranalysisfortheVB20P(left,n=201)andVE11P(right,n=104)testpatientstoshowtheeffectsonmodelrobustnessbyvaryingtheMRIsequence inputtypeandacrosssoftwareupgrades.Allmodelsaretrainedusingthefulltraincohorts,n=403forVB20Pandn=91withtransferlearningforVE11P.RESOLUTE andatlas-basedmethodsareshownforcomparison.Onlysubjectswithoutregistration-relatedartifactswereusedtocomputetheoutliersfortheatlas-basedmethod (n=91).
Fig. 7. Full brain and regional mean relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained using the full train cohort (n = 91) with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method for comparison. Only subjects without registration-related artifacts were used to compute the results for the atlas-based method (n = 91). The bars represent the average relative difference to PET with CT-AC across patients. The black line in each bar represents the 95% confidence interval.
A popular and useful strategy to overcome small training group sizes is to apply transfer learning (Bengio et al., 2013). This strategy was also used by Han to initialize part of their network from a pretrained VGG 16-layer model (Han, 2017), by Jang et al. to train a model using 6 patients transfer learned from a model with 30 patients (Jang et al., 2018), and by Torrado-Carvajal et al. to train a model pretrained on 19 T1w brain images to synthesize Dixon-VIBE pelvis images from 19 patients (Torrado-Carvajal et al., 2019). In this study, we employed transfer learning to re-calibrate the network to a new image appearance following a major software upgrade. The results showed little effect of increasing the number of subjects above 5, as 5–91 subjects for training yielded similar model accuracy (Fig. 5 and Suppl. Fig. 4). Training with transfer learning on only five subjects matched (DeepUTE) or exceeded (DeepDixon and DeepT1) the performance of training on all subjects (n = 91) without transfer learning, demonstrating that information from the original model trained on a large cohort is preserved and utilized.
These findings have relevance not only for recalibrating methods after major software upgrades, but also for distribution of models between scanners and centers when the models do not generalize well. Using only a limited number of subjects with paired CT and MRI, the model can be adapted to match scanners at different locations, potentially even from different vendors. We hypothesize that such transfer learning will also apply to cohorts with different demographics (ethnicity etc.).
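The principle behind recalibrating with very few subjects can be illustrated with a deliberately simple toy problem. The sketch below is not the CNN used in this study: it assumes a linear model fitted by plain gradient descent, and every name, dimension, and learning rate is a hypothetical choice. A model is first fitted on a large "source" cohort, then either fine-tuned on five samples from a slightly shifted "target" mapping or trained from scratch on the same five samples.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w0, xs, ys, lr=0.01, steps=200):
    """Batch gradient descent on the mean squared error, starting from w0."""
    w = list(w0)
    n = len(xs)
    for _ in range(steps):
        residuals = [predict(w, x) - y for x, y in zip(xs, ys)]
        grad = [2.0 / n * sum(r * x[j] for r, x in zip(residuals, xs))
                for j in range(len(w))]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def sample(w_true, n, rng):
    """Draw n noise-free (input, output) pairs from the linear mapping w_true."""
    xs = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    ys = [predict(w_true, x) for x in xs]
    return xs, ys


rng = random.Random(0)
dim = 20
w_old = [rng.gauss(0, 1) for _ in range(dim)]        # toy "pre-upgrade" mapping
w_new = [w + 0.05 * rng.gauss(0, 1) for w in w_old]  # slightly shifted "post-upgrade" mapping

big_x, big_y = sample(w_old, 400, rng)    # large source cohort
few_x, few_y = sample(w_new, 5, rng)      # five target subjects
test_x, test_y = sample(w_new, 200, rng)  # held-out target data

pretrained = gradient_descent([0.0] * dim, big_x, big_y, lr=0.1, steps=150)
fine_tuned = gradient_descent(pretrained, few_x, few_y)     # starts from source weights
from_scratch = gradient_descent([0.0] * dim, few_x, few_y)  # starts from zeros

err_fine_tuned = mse(fine_tuned, test_x, test_y)
err_from_scratch = mse(from_scratch, test_x, test_y)
```

With five samples and twenty parameters the target problem is underdetermined, so training from zeros cannot recover the directions that the five samples do not constrain, whereas fine-tuning starts close to the target and keeps what the source cohort already taught it. This mirrors, in a much simplified setting, the observation that information from the large cohort is preserved and utilized.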
There were differences, to varying degrees, in all three MRI sequences pre- and post-upgrade (see Table 1 and Supplementary Figure 1), impacting the ability of the methods to generalize across the system upgrade. The largest difference was observed with the Dixon sequence, mainly expressed as a change of resolution; nonetheless, DeepDixon achieved similar performance after transfer learning as DeepT1. This suggests that similar domain adaptation to MRI sequences from other vendors is feasible, as differences in T1-weighted implementations across systems are no greater than between VB20P and VE11P for the Dixon-VIBE or UTE sequence. DeepT1 trained with VB20P data generalized well to VE11P data, producing images that were objectively identical to the images produced after fine-tuning. The quantitative PET evaluation resulted in a 1–2% overestimation on average (results not shown).
Further inspection revealed a general reduction in MRI intensity in the area representing bone in patients examined after the upgrade (Suppl. Fig. 1C), causing DeepT1 to predict denser bone, ultimately causing the overestimation. Although this error is acceptable for most clinical purposes, we found that fine-tuning reduced the PET bias, which indicates that fine-tuning is needed after all major upgrades of the MRI system.
Training the model with a more heterogeneous dataset with T1-weighted MPRAGE images from multiple sites and systems could potentially eliminate the need for fine-tuning completely.
Fig. 8. Averaged relative difference (left four columns) and standard deviation (right four columns) of the Rel% images across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method. Images computed for the atlas-based method were only based on subjects without registration-related artifacts (n = 91).
Using our method, the average relative bias is within 1% from PET with CT-AC in any region of the brain with any of the MR images as input (Fig. 7). This is essential for clinical evaluation, as in e.g. tumor delineation and treatment response assessment (Law et al., 2019), and for neurological applications using the cerebellum as reference region (Borghammer et al., 2010; Ishii et al., 2001; Yakushev et al., 2008). Utilizing the same patient cohort and metrics as were employed in a previous multi-center comparison (Ladefoged et al., 2016) allows us to compare not only to RESOLUTE, but also indirectly to the other best performing state-of-the-art methods for the Siemens PET/MRI (Burgos et al., 2014; Izquierdo-Garcia et al., 2014; Mérida et al., 2017). Across all metrics, our method was found to have similar or better performance than that of the most promising methods. The methods based on deep learning that have been proposed in the literature report PET bias comparable to that found in our work. Jang et al. (Jang et al., 2018) and Liu et al. (Liu et al., 2017) reported average regional Rel% [18F]FDG PET bias within ±2% across eight subjects and ±4% in 10 subjects, respectively, compared to a tissue-segmented three-class (air, soft tissue, and bone) CT reference, whereas Gong et al. (Gong et al., 2018) reported ±3% in 12 subjects compared to a reference CT-AC. Note, however, that no outlier analysis or clinical evaluations were performed in these publications, and a robust regional performance is critical for clinical use.
The atlas-based method had registration-related artifacts in 13 patients. Of these, four were positioned outside the patient volume, as previously reported (Øen et al., 2019), and could have been manually removed prior to reconstruction. The remaining errors corrupted the image, rendering a rescan the only option. Despite removing these patients from the PET evaluation, the atlas-based method still had a global absolute relative error of 5% (Suppl. Fig. 5), which is likely related to the absence of accurate air segmentation, see e.g. Fig. 2. The PET bias was higher than the previously reported 2.5% (Øen et al., 2019), but this is most likely due to a difference in patient cohort.
Specialized sequences able to generate contrast in bone have little diagnostic value, and the added contrast comes at the cost of increased acquisition time, and thus less patient comfort and compliance. While the specialized sequences have proven pivotal for segmentation-based methods (Dickson et al., 2014), no evidence exists that such sequences are needed in order for deep learning-based methods to succeed. Our results demonstrate that traditional MRI sequences are sufficient for deep learning-based MR-AC, confirming the findings of several previous works (Teuho et al., 2020). Of the three networks, we chose to clinically evaluate the more simplified and patient-compliant DeepDixon. In terms of cross-vendor use, DeepT1 is the obvious choice, but on a Siemens mMR, the fast Dixon-VIBE sequence is always part of the PET acquisition, and therefore inherently has reduced motion and optimal alignment of PET and MR images. In the 104 patient examinations evaluated, two experienced expert readers found no cases with clinically significant differences between CT and DeepDixon. The spatial normalization was performed individually for each PET image, which could partly explain the minor non-significant differences in 24% of the cases (Table 3).
However, limitations to DeepDixon, in particular related to abnormal bone structures, surgical deformation and metallic implants, should be kept in mind. It is recommended that evaluation of DeepDixon for use in brain tumor evaluation is performed separately using tracer-specific clinical metrics, as done previously (Ladefoged et al., 2019, 2017). Nonetheless, the frequency of potential errors/differences related to using DeepDixon is very low, and these are probably smaller and less frequent than those introduced by dental artifacts and motion on the PET/CT system. In our center, we have now implemented DeepDixon MR-AC in routine clinical imaging and performed more than 200 [18F]FDG PET scans in adult patients referred for suspected neurodegeneration without routine low-dose CT. To further minimize potential errors, attenuation maps are carefully inspected for unusual structures and artifacts before the patient leaves the department, and a low-dose CT is performed if errors are suspected following image inspection.
Our study had a number of limitations. We chose to focus on evaluating the effects of group size and MRI sequence input. The conclusions drawn here could potentially be different if other network types were applied. It was not within the scope of this study to evaluate the effect of deep learning architecture, but we recognize the potential improved accuracy associated with more sophisticated networks, such as the generative adversarial network (Goodfellow et al., 2014). The high accuracy and low number of outliers presented here suggest, however, that only minor improvements are to be found. Moreover, a limitation of the comparison is the use of identical training setups for each training group size. Tailoring the hyperparameters to each model, or investigating the use of 2D or 3D patches as input to boost the amount of training samples, could potentially improve the results of the networks with a low number of subjects.
5. CONCLUSION
We have described and evaluated a deep learning attenuation correction approach for PET/MRI neuroimaging using more than 1000 subjects. We showed that a requirement for accurate and robust MR-AC is a large group size of at least 50 subjects for training, and that further increasing the size to 400 significantly reduced the number of outliers. However, using transfer learning from a large cohort, a group size of 5 subjects was sufficient to recalibrate to changes in the MRI sequences. Full robustness was achieved with only 20 subjects, with performance at the same level as, or even surpassing, that of a larger training cohort (n = 91) without transfer learning. Furthermore, we demonstrated robustness towards the choice of MRI sequence input. The clinical evaluation showed no clinically relevant differences compared to CT-AC, although knowledge about MR-AC limitations is important when used in clinical routine. The combination of accuracy, outlier performance, clinical performance, robustness towards the choice of MRI sequence input, and low group size needed for re-training following a major software upgrade indicates that the clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types.
CRediT author statement
Claes Nøhr Ladefoged: Conceptualization, Methodology, Software, Validation, Formal analysis, Data Curation, Writing – Original Draft; Adam Espe Hansen: Conceptualization, Methodology, Formal analysis; Otto Mølby Henriksen: Conceptualization, Data Curation, Resources; Frederik Jager Bruun: Data Curation; Live Eikenes: Data Curation; Silje Kjærnes Øen: Data Curation; Anna Karlberg: Resources; Liselotte Højgaard: Resources, Funding Acquisition; Ian Law: Conceptualization, Data Curation, Resources; Flemming Littrup Andersen: Conceptualization, Methodology, Formal analysis, Supervision. All authors participated in drafting and revising the manuscript.
Supplementary Figures
Supplementary Figure 1: Differences between VB20P and VE11P for the three sequences UTE (A), Dixon-VIBE (B), and T1-weighted MPRAGE (C). Using the CT bone area (linear attenuation coefficient > 0.103 cm⁻¹) as a mask, the mean bone surrogate signal, measured with R2* from UTE sequences, is higher after the upgrade (A). For T1w MPRAGE, there is a decrease in the signal in the area representing bone (C). The effects of the resolution improvement (Table 1) for the Dixon-VIBE sequence are clearly seen visually (B).
Supplementary Figure 2: CNN U-net-like architecture used in this study. The network takes a stack of 16 neighboring MR slices as input and outputs the corresponding pseudo-CT image. C represents the number of MR channels: 2 for UTE (TE1 and TE2), 2 for Dixon (in- and opposed-phase), and 1 for T1w.
Supplementary Figure 3: The effects of group size on model accuracy. Outlier analysis shown for VB20P test patients (n = 201) for increasing training group size for DeepDixon (A) and DeepT1 (B). RESOLUTE is added for comparison.
Supplementary Figure 4: Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepDixon (A) and DeepT1 (B) models. The dashed lines represent the performance of the model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning with increasing training group size after transfer learning from the VB20P cohort. The shaded area around n = 5 and n = 20 represents the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).
Supplementary Figure 5: Global and regional mean absolute relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91). The bars represent the average absolute relative difference to PET with CT-AC across patients. The black line in each bar represents the 95% confidence interval.
Supplementary Figure 6: Full brain and regional mean relative (upper) and absolute mean (lower) differences across all VB20P test patients (n = 201) for each deep learning model. The bars represent the difference to PET with CT-AC across patients. The black line in each bar represents the 95% confidence interval. RESOLUTE shown for comparison.
Supplementary Figure 7: Averaged relative difference (top three rows) and standard deviation (bottom three rows) images across all VB20P test patients (n = 201). Please note the change of scale compared to Figure 8.
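The bone-mask comparison described for Supplementary Figure 1 can be sketched in a few lines. This is a minimal illustration, assuming flattened voxel lists and made-up intensity values; only the threshold of 0.103 cm⁻¹ is taken from the caption, and the function names are hypothetical.

```python
BONE_LAC_THRESHOLD = 0.103  # cm^-1, threshold quoted in the Supplementary Figure 1 caption


def bone_mask(ct_lac):
    """True where the CT linear attenuation coefficient exceeds the bone threshold."""
    return [lac > BONE_LAC_THRESHOLD for lac in ct_lac]


def masked_mean(signal, mask):
    """Mean of the signal over the voxels selected by the mask."""
    selected = [s for s, keep in zip(signal, mask) if keep]
    if not selected:
        raise ValueError("mask selects no voxels")
    return sum(selected) / len(selected)


# Toy flattened volumes (values are invented for illustration only).
ct_lac = [0.000, 0.096, 0.110, 0.151, 0.099, 0.130]
mr_pre = [5.0, 300.0, 90.0, 60.0, 280.0, 75.0]    # hypothetical pre-upgrade MR signal
mr_post = [5.0, 310.0, 70.0, 40.0, 290.0, 55.0]   # hypothetical post-upgrade MR signal

mask = bone_mask(ct_lac)
mean_pre = masked_mean(mr_pre, mask)
mean_post = masked_mean(mr_post, mask)
```

Comparing `mean_pre` and `mean_post` over the same CT-derived mask is the kind of check that would reveal a systematic drop in bone-region signal after an upgrade, such as the one reported for T1w MPRAGE.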
ACKNOWLEDGMENTS
The PET/MRI system at Rigshospitalet was kindly provided by the John and Birthe Meyer Foundation, Denmark. Special thanks to the bioengineers and radiographers at Rigshospitalet and St. Olavs Hospital for patient preparations and image acquisitions. We thank IBM Denmark for providing two POWER9 servers with 4 Tesla V100 GPUs in each system.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2020.117221.
References
Abadi, M., Barham, P., Chen, J., et al., 2016. TensorFlow: a System for Large-Scale Machine Learning, in: 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 16). pp. 265–283.
Andersen, F.L., Ladefoged, C.N., Beyer, T., et al., 2014. Combined PET/MR imaging in neurology: MR-based attenuation correction implies a strong spatial bias when ignoring bone. Neuroimage 84, 206–216. https://doi.org/10.1016/j.neuroimage.2013.08.042.
Arabi, H., Bortolin, K., Ginovart, N., Garibotto, V., Zaidi, H., 2020. Deep learning-guided joint attenuation and scatter correction in multitracer neuroimaging studies. Hum. Brain Mapp. 1–13. https://doi.org/10.1002/hbm.25039.
Arabi, H., Zeng, G., Zheng, G., Zaidi, H., 2019. Novel adversarial semantic structure deep learning for MRI-guided attenuation correction in brain PET/MRI. Eur. J. Nucl. Med. Mol. Imaging 46, 2746–2759.
Avants, B.B., Tustison, N.J., Song, G., et al., 2011. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044. https://doi.org/10.1016/j.neuroimage.2010.09.025.
Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50.
Borghammer, P., Chakravarty, M., Jonsdottir, K.Y., et al., 2010. Cortical hypometabolism and hypoperfusion in Parkinson’s disease is extensive: probably even at early disease stages. Brain Struct. Funct. 214, 303–317. https://doi.org/10.1007/s00429-010-0246-0.
Burdette, J.H., Minoshima, S., Vander Borght, T., Tran, D.D., Kuhl, D.E., 1996. Alzheimer disease: improved visual interpretation of PET images by using three-dimensional stereotaxic surface projections. Radiology 198, 837–843. https://doi.org/10.1148/radiology.198.3.8628880.
Burgos, N., Cardoso, M.J., Thielemans, K., et al., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans. Med. Imaging 33, 2332–2341. https://doi.org/10.1109/TMI.2014.2340135.