AI-driven attenuation correction for brain PET/MRI: Clinical evaluation of a dementia cohort and importance of the training group size


NeuroImage


Claes Nøhr Ladefoged¹,∗, Adam Espe Hansen¹, Otto Mølby Henriksen¹, Frederik Jager Bruun¹, Live Eikenes², Silje Kjærnes Øen², Anna Karlberg²,³, Liselotte Højgaard¹, Ian Law¹, Flemming Littrup Andersen¹

¹ Department of Clinical Physiology, Nuclear Medicine & PET, Rigshospitalet, University of Copenhagen, Denmark

² Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway

³ Department of Radiology and Nuclear Medicine, St. Olavs hospital, Trondheim University Hospital, Trondheim, Norway

ARTICLE INFO

Keywords: Attenuation correction; Deep learning; Convolutional neural network; Artificial intelligence; Brain; PET/MRI

ABSTRACT

Introduction: Robust and reliable attenuation correction (AC) is a prerequisite for accurate quantification of activity concentration. In combined PET/MRI, AC is challenged by the lack of bone signal in the MRI from which the AC maps have to be derived. Deep learning-based image-to-image translation networks present themselves as an optimal solution for MRI-derived AC (MR-AC). High robustness and generalizability of these networks are expected to be achieved through large training cohorts. In this study, we implemented an MR-AC method based on deep learning, and investigated how training cohort size, transfer learning, and MR input affected robustness, and subsequently evaluated the method in a clinical setup, with the overall aim to explore if this method could be implemented in clinical routine for PET/MRI examinations.

Methods: A total cohort of 1037 adult subjects from the Siemens Biograph mMR with two different software versions (VB20P and VE11P) was used. The software upgrade included updates to all MRI sequences. The impact of training group size was investigated by training a convolutional neural network (CNN) on an increasing training group size from 10 to 403. The ability to adapt to changes in the input images between software versions was evaluated using transfer learning from a large cohort to a smaller cohort, by varying training group size from 5 to 91 subjects. The impact of MRI sequence was evaluated by training three networks based on the Dixon VIBE sequence (DeepDixon), T1-weighted MPRAGE (DeepT1), and ultra-short echo time (UTE) sequence (DeepUTE). Blinded clinical evaluation relative to the reference low-dose CT (CT-AC) was performed for DeepDixon in 104 independent 2-[¹⁸F]fluoro-2-deoxy-d-glucose ([¹⁸F]FDG) PET patient studies performed for suspected neurodegenerative disorder using statistical surface projections.

Results: Robustness increased with group size in the training data set: 100 subjects were required to reduce the number of outliers compared to a state-of-the-art segmentation-based method, and a cohort > 400 subjects further increased robustness in terms of reduced variation and number of outliers. When using transfer learning to adapt to changes in the MRI input, as few as five subjects were sufficient to minimize outliers. Full robustness was achieved at 20 subjects. Comparably robust and accurate results were obtained using all three types of MRI input, with a bias below 1% relative to CT-AC in any brain region. The clinical PET evaluation using DeepDixon showed no clinically relevant differences compared to CT-AC.

Conclusion: Deep learning-based AC requires a large training cohort to achieve accurate and robust performance. Using transfer learning, only five subjects were needed to fine-tune the method to large changes to the input images. No clinically relevant differences were found compared to CT-AC, indicating that clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types using transfer learning and a limited number of subjects.

∗ Corresponding author.

E-mail address: claes.noehr.ladefoged@regionh.dk (C.N. Ladefoged).

https://doi.org/10.1016/j.neuroimage.2020.117221

Received 8 November 2019; Received in revised form 15 July 2020; Accepted 28 July 2020; Available online 1 August 2020

1. INTRODUCTION

Positron emission tomography (PET) images need to be corrected for photon attenuation to accurately quantify the measured radioactive tissue concentration (Andersen et al., 2014; Dickson et al., 2014). In a dual modality PET and Magnetic Resonance Imaging (MRI) scanner, a density map for attenuation correction (AC) has to be derived from the MRI. This was initially not possible, which hampered the use of PET/MRI scanners, especially for brain studies in both clinical and research applications (Vandenberghe and Marsden, 2015). Several MRI-guided attenuation correction techniques were proposed as potential solutions (Chen and An, 2017; Izquierdo-garcia and Catana, 2016; Ladefoged et al., 2016; Mehranian et al., 2016). Eleven state-of-the-art AC-methods were studied in a large cohort of adult subjects with normal anatomy (Ladefoged et al., 2016), which concluded that AC was a solved topic in the brain when using one of the best performing methods. However, some of these methods, including our own segmentation-based RESOLUTE method (Ladefoged et al., 2015), were later found to be sensitive to specified MRI sequences, and, thus, vulnerable to system software updates.

Recently, artificial intelligence (AI) in the form of deep learning convolutional neural networks (CNNs) has been considered as an alternative, as these networks offer a number of advantages over the existing methods. Deep learning methods can confer robustness towards changes to the input caused by MRI hardware or system software updates, as well as cross-platform compatibility between vendors through the process of transfer learning. Furthermore, methods based on CNNs are usually very processing intensive at the training step, but the generation of an attenuation map for a given subject occurs within seconds, making them attractive tools as a part of a clinical workflow where speed, accuracy, and robustness are key elements.

Since the first use of deep learning to convert MR images to CT (Han, 2017), numerous methods have been proposed (see e.g. Teuho et al., 2020; Torrado-Carvajal, 2020). Several state-of-the-art networks were employed, from traditional encoder-decoder architectures (Gong et al., 2018; Han, 2017; Torrado-Carvajal et al., 2019), to generative adversarial networks (GANs) (Arabi et al., 2019; Kazemifar et al., 2019), including variants accepting unpaired data (Ge et al., 2019; Lei et al., 2020; Wolterink et al., 2017; Yang et al., 2018). Most methods from the literature use small training group sizes (< 30) even though larger sizes could increase generalizability and robustness. The methods are based on single or multiple MRI sequences, spanning the common T1-weighted MPRAGE as well as specialized sequences capable of visualizing bone such as zero echo time (ZTE) or ultra-short echo time (UTE). The possible advantages in the context of attenuation correction, especially in terms of robustness, of using large training cohorts and of using specialized sequences over traditional sequences remain to be investigated systematically.

Recently, methods converting non-attenuation corrected (NAC) PET images directly to attenuation and scatter corrected PET images have emerged, mainly targeted for whole-body applications as paired data are readily available in large numbers (Arabi et al., 2020; Shiri et al., 2019; Van Hemmen et al., 2019; Yang et al., 2019). The drawbacks of these methods are their dependence on the choice of tracer and their limited ability to extract structural information (Arabi et al., 2020). In the brain, the performance of these new methods remains to be evaluated thoroughly on a cohort with neurologic abnormalities.

The aim of this study was to implement a deep learning CNN for clinical MR-AC use, and investigate the potential impact on the quantitative accuracy and clinical reading of PET scans depending on training group size and choice of MRI input. This was achieved by utilizing a large cohort of subjects, all examined on the same type of PET/MRI system at two independent sites, including common and specialized MRI, as well as low-dose CT images used as reference.¹

¹ Code and models for inference available at https://github.com/CAAI/DeepMRAC

2. MATERIALS AND METHODS

The included data comprised studies acquired on two Siemens Biograph mMR systems (Siemens Healthineers, Erlangen, Germany) spanning two different software versions. A larger cohort, imaged with software version VB20P, was used to investigate the impact of cohort size. A smaller cohort, with the most recent software update (VE11P), was used to investigate the effect of transfer learning (based on VB20P data), impact of choice of MRI input, and to perform a clinical evaluation.

2.1. Patients

Data sets from 1037 adult subjects were obtained retrospectively from two different centers: n = 1007 from Rigshospitalet, University Hospital Copenhagen, Denmark, and n = 30 from St. Olavs hospital, Trondheim University Hospital, Norway. Rigshospitalet provided data sets from the complete cohort of subjects referred for a PET/MRI brain examination with matching same-day head CT between November 2013 and April 2019, examined with software version VB20P (n = 811) or VE11P (n = 196). Data comprised PET/MRI studies imaged with various tracers, but only the MRI sequences were used to develop the method. The subjects included from St. Olavs hospital were referred to a clinical 2-[¹⁸F]fluoro-2-deoxy-d-glucose ([¹⁸F]FDG) PET/MRI brain examination for dementia, all examined with VE11P, and had matching same-day head CT. Retrospective use of subjects from Rigshospitalet was approved by the Danish Patient Safety Authority (ref. 3–3013–1513/1). The study from St. Olavs hospital was approved by the Regional Committee for Ethics in Medical Research (REC Central) (ref. 2013/1371) and all subjects gave written informed consent. Data were extracted only in fully anonymized form in compliance with the European General Data Protection Regulation (GDPR).

In each of the two groups (VB20P and VE11P), we divided the subjects into training, validation, and test cohorts. The train and validation cohorts were used to develop the method. The subjects in the independent test cohort were all imaged with [¹⁸F]FDG; none of these subjects had e.g. bone modifying cranio-facial surgical interventions, cranial defects, hyperostoses, dysplasias, disfigurement or metal implants besides dental implants. For the VB20P group, the test cohort was identical to the patients recently used in our multi-center evaluation (Ladefoged et al., 2016), and the train/validation split was done 70/30. We initially developed the models for the VE11P group using 4-fold cross validation. Once the models were finalized, we fixed the training/validation cohorts to be those of the first cross-validation split. The independent test cohort was prospectively acquired after the models were trained. An illustration of the splits for each group is shown in Fig. 1.

2.2. Imaging protocols

2.2.1. MRI

The scan protocols always included a T1-weighted (T1w) MPRAGE, a UTE AC sequence, and a Dixon-VIBE sequence (the vendor default for MR-AC). The upgrade to VE11P included upgraded versions of all three sequences. The UTE AC sequence was re-implemented, changing the relationship between the two echo images, with consequences especially for the signal in bone (Suppl. Fig. 1A). Visually, the most noticeable change was to the Dixon-VIBE sequence, which is now available in high resolution, targeted for brain purposes (Suppl. Fig. 1B). No apparent differences could be observed for the T1-weighted MPRAGE sequence. Nevertheless, inspection of the area representing bone showed a slight decrease in mean value following the upgrade (Suppl. Fig. 1C). Sequence details are available in Table 1.

2.2.2. CT

A reference low-dose CT scan (120 kVp, 36–40 mAs, 0.6 × 0.6 × 3 mm³ voxels) of the head using a PET/CT system (Biograph TruePoint 40, 64, or Biograph mCT, Siemens Healthineers) was acquired for all patients on the same day as the PET/MRI examination.

Fig. 1. Separation of subjects into train, validation and test cohorts within each group, and further split to investigate the impact of training group size. Note, all 30 patients from St. Olavs hospital were part of the 91 VE11P training cohort. For each MRI input type, four models are trained from the n = 403 patients of the VB20P cohort with increasing number of subjects. The performance of these models is evaluated using the independent VB20P test cohort (n = 201). For VE11P, using the n = 91 training cohort patients, a total of four models are trained using transfer learning (TL) from the n = 403 VB20P model. An additional model is trained using all n = 91 training patients, but without any transfer learning (no TL). All VE11P models, and the n = 403 VB20P model applied directly without re-training, are evaluated using the independent VE11P test cohort (n = 104). This setup is identical for DeepUTE, DeepDixon, and DeepT1.

Table 1
MRI sequence parameters.

| Software version | MRI sequence | Repetition time (TR) [ms] | Echo time (TE) [ms] | Flip angle [degrees] | Acquisition time [s] | Voxel size [mm³] | Matrix size |
|---|---|---|---|---|---|---|---|
| VB20P | Dixon | 3.6 | 1.23/2.46 | 10 | 19 | 2.6 × 2.6 × 3.1 | 126 × 192 × 128 |
| VB20P | T1w | 1900 | 2.44 | 9 | 300 | 0.5 × 0.5 × 1 | 512 × 512 × 192 |
| VB20P | UTE | 11.94 | 0.07/2.46 | 10 | 100 | 1.6 × 1.6 × 1.6 | 192 × 192 × 192 |
| VE11P | Dixon | 4.14 | 1.28/2.51 | 10 | 39 | 1.3 × 1.3 × 2 | 204 × 384 × 128 |
| VE11P | T1w | 1900 | 2.44 | 9 | 300 | 0.5 × 0.5 × 1 | 512 × 512 × 192 |
| VE11P | UTE | 4.64 | 0.07/2.46 | 10 | 118 | 1.6 × 1.6 × 1.6 | 192 × 192 × 192 |

Table 2
Patient characteristics in the [¹⁸F]FDG PET test sets.

| Software version | N | Male/Female | Age, mean (range) | Injected dose, mean (SD) | Scan start p.i., median (range) |
|---|---|---|---|---|---|
| VB20P | 201 | 108/93 | 68 (23–96) y | 203 (± 20) MBq | 51 (24–134) min |
| VE11P | 104 | 52/52 | 73 (41–93) y | 200 (± 11) MBq | 47 (39–69) min |

p.i.: post injection.

2.2.3. [¹⁸F]FDG PET

The test cohort included 201 (VB20P) and 104 (VE11P) subjects for quantitative and clinical evaluation of [¹⁸F]FDG PET data. The patients were referred for suspected neurodegenerative disease as part of the clinical work-up. Patient characteristics are given in Table 2. The subjects were positioned head-first with arms down in the fully-integrated PET/MRI system. Data were acquired over a single bed position of 25.8 cm covering the head and neck for 10 min. For the purpose of this study, the PET data from the PET/MRI acquisition were reconstructed using 3D Ordinary Poisson-Ordered Subset Expectation Maximization (OP-OSEM) with 4 iterations, 21 subsets, and 3 mm Gaussian post-filtering on 344 × 344 matrices (2.1 × 2.1 × 2.0 mm³ voxels) in line with the clinical protocols. Each MR-AC map was resampled to PET resolution as a part of the reconstruction. No additional filtering was applied.

2.3. Deep convolutional neural network

2.3.1. Network structure

The proposed network used in this study is shown in Supplementary Figure 2. The 3D convolutional network is based on an encoder-decoder structure with symmetric concatenations between corresponding stages, inspired by the U-Net architecture (Çiçek et al., 2016; Ronneberger et al., 2015) but modified for an end-to-end image synthesis task. Specifically, each stage in the 3D network consists of 3 × 3 × 3 kernels, batch normalization (BN), rectified linear unit (ReLU) activation, and a dropout layer with a fraction increasing from 0.1 to 0.3 in the encoding part, and vice versa in the decoding part. The downsampling between stages was replaced by convolutions with stride 2. We used L₂ penalties for kernel regularization on the convolution layers.

2.3.2. Network training

The proposed networks were implemented in TensorFlow (version 2.1.0) (Abadi et al., 2016). Our experiments used mean squared error as loss function and the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1 × 10⁻⁴, trained for 100 epochs with a batch size of 16. All computations were performed on an IBM POWER9 server with four NVIDIA TESLA V100 GPUs. The networks use 3D volumes as input consisting of 16 neighboring transaxial slices for each MRI scan (16 slices × 192 voxels × 192 voxels × C channels), where C denotes the number of images in the MRI sequence (in- and opposed-phase for Dixon (two channels), echo images for UTE (two channels), and MPRAGE for T1w (one channel)), and output the corresponding CT slices (16 slices × 192 voxels × 192 voxels × 1 channel). All MRI sequences were first resampled to the resolution of the UTE image, to ensure isotropic voxels and matrix size, and normalized to zero mean and unit variance. Subsequently, we extracted 3D volumes from the 192 × 192 × 192 MRI scans with a stride of 4. The scanner bed and structures other than the patient were removed from the CT images, before they were converted to linear attenuation coefficients and moved into PET/MRI space using a 6-parameter rigid alignment procedure (minctracc, McConnell Imaging Center, Montreal, Canada) with normalized mutual information as objective function. A mask of the CT-coverage was applied to the three MRI sequences during the training phase.
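The input preparation described above (normalization followed by extraction of 16-slice stacks with a stride of 4) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, normalization is assumed to be applied over the whole volume, and the in-plane size is reduced from 192 × 192 to keep the toy example small.

```python
import numpy as np

def normalize(volume):
    """Scale a volume to zero mean and unit variance (assumed: over the whole volume)."""
    return (volume - volume.mean()) / volume.std()

def extract_slabs(volume, depth=16, stride=4):
    """Cut a (slices, rows, cols, channels) volume into stacks of `depth` neighboring slices.

    Returns the first-slice index of every stack and the stacked array.
    """
    starts = list(range(0, volume.shape[0] - depth + 1, stride))
    slabs = np.stack([volume[s:s + depth] for s in starts])
    return starts, slabs

# Toy two-channel volume (e.g. two echo images); 24 x 24 in-plane instead of 192 x 192.
rng = np.random.default_rng(0)
mri = rng.normal(100.0, 20.0, size=(192, 24, 24, 2))
starts, slabs = extract_slabs(normalize(mri))
print(slabs.shape)  # (45, 16, 24, 24, 2)
```

With 192 slices, a depth of 16, and a stride of 4, this yields 45 overlapping stacks per scan, each of which would serve as one training sample.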

2.3.3. Network prediction and post-processing

To generate the deep learning attenuation maps, we extracted the 3D stack-of-slices around each slice in the volume, and computed the average voxel values for each of the overlapping predicted slices.
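The averaging of overlapping predictions can be sketched as below. This is an illustrative reading of the post-processing step, with hypothetical names; the network is replaced by an identity mapping so that the example is self-contained, and a stride of 1 between stacks is assumed.

```python
import numpy as np

def average_overlapping(slab_predictions, starts, n_slices):
    """Average per-slice predictions coming from overlapping stacks.

    slab_predictions: array of shape (n_slabs, depth, rows, cols)
    starts:           first-slice index of each stack
    n_slices:         number of slices in the full volume
    """
    depth = slab_predictions.shape[1]
    total = np.zeros((n_slices,) + slab_predictions.shape[2:], dtype=np.float64)
    count = np.zeros(n_slices, dtype=np.int64)
    for pred, s in zip(slab_predictions, starts):
        total[s:s + depth] += pred   # accumulate every prediction of a slice
        count[s:s + depth] += 1      # remember how many stacks covered it
    covered = count > 0
    total[covered] /= count[covered][:, None, None]
    return total

# Stand-in for network output: each stack simply reproduces the true slices.
truth = np.arange(20 * 4 * 4, dtype=float).reshape(20, 4, 4)
starts = list(range(0, 20 - 16 + 1))  # stride 1 (assumed)
slabs = np.stack([truth[s:s + 16] for s in starts])
recon = average_overlapping(slabs, starts, n_slices=20)
```

Slices near the center of the volume are covered by more stacks than slices at the edges, which is why the accumulated sum is divided by a per-slice count rather than by a constant.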

2.4. Reference methods

The rigidly co-registered CT images were used as our gold standard AC reference during both training and evaluation following conversion of Hounsfield Units as implemented on the Siemens PET/CT system (Carney et al., 2006). Due to the limited coverage in the neck region by the acquired CT, we replaced the missing areas by the values from the vendor-provided UTE AC map. To ensure a fair comparison, this replacement was also performed in all the other attenuation maps. In addition, we also computed the RESOLUTE attenuation map (Ladefoged et al., 2015) for VB20P patients from Rigshospitalet. RESOLUTE is calibrated to VB20P UTE data, and was therefore not computed for the VE11P patients. As part of the VE11P software upgrade, a vendor-provided atlas-based MR-AC method was made available (Koesters et al., 2016; Paulus et al., 2015), and was used as the MR-based reference for the VE11P test cohort. This method is prone to bone artifacts related to misregistration in more than 20% of the cases (Øen et al., 2019). Therefore, patients with this type of artifact were excluded from the analysis of the atlas-based method.
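The conversion from Hounsfield Units to linear attenuation coefficients at 511 keV is commonly done with a piecewise-linear (bilinear) mapping. The sketch below illustrates the general form only. The default parameter values are assumptions recalled for a 120 kVp acquisition and should be checked against Carney et al. (2006) before any use; the function name is hypothetical and this is not the vendor implementation.

```python
import numpy as np

def hu_to_mu(hu, low_slope=9.6e-5, high_slope=5.10e-5, high_offset=4.71e-2,
             break_point=1047.0):
    """Bilinear mapping from CT numbers to attenuation coefficients in 1/cm.

    All default parameters are illustrative assumptions for 120 kVp and
    must be verified against the cited reference.
    """
    shifted = np.asarray(hu, dtype=float) + 1000.0   # air (-1000 HU) maps to 0
    low = low_slope * shifted                        # soft-tissue segment
    high = high_slope * shifted + high_offset        # bone segment
    return np.where(shifted <= break_point, low, high)

mu = hu_to_mu(np.array([-1000.0, 0.0, 1000.0]))  # air, water, dense bone
```

The two segments meet (approximately) at the break point, so the mapping is continuous; the smaller slope above the break point reflects that bone attenuates 511 keV photons less, relative to its CT number, than soft tissue does.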

2.5. PET evaluation metrics

Due to the use of data from different software versions (VB20P and VE11P), causing differences in all MR images to varying degrees, we evaluated the cohorts separately.

We first moved all data to common MNI space using ANTs (Avants et al., 2011) by diffeomorphic non-rigid registration of the patient's T1w MPRAGE image to the ICBM 152 2009a template (Fonov et al., 2009). Voxels inside the MNI brain mask were considered part of the brain mask if the PET activity was > 20% of the maximum intensity value of the brain. The voxel-wise percent difference relative to PET with CT-AC, defined as:

$$\mathrm{Rel}_{\%} = \frac{PET_{x} - PET_{CT}}{PET_{CT}} \times 100,$$

as well as the absolute relative percent difference, defined as:

$$\mathrm{Abs}_{\%} = \frac{\left| PET_{x} - PET_{CT} \right|}{PET_{CT}} \times 100,$$

were calculated for the PET images corrected with each of the MRI-based AC methods.
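The two voxel-wise metrics and the activity-based brain mask can be sketched in a few lines. The names are hypothetical, and it is assumed here that the 20% threshold is applied to the PET image reconstructed with CT-AC, which the text does not state explicitly.

```python
import numpy as np

def percent_difference_maps(pet_x, pet_ct, mni_brain_mask, threshold=0.20):
    """Return Rel%, Abs% and the final brain mask for one patient.

    pet_x:          PET reconstructed with the MR-based attenuation map
    pet_ct:         PET reconstructed with CT-AC (reference)
    mni_brain_mask: boolean template brain mask in the same space
    """
    # Keep template-brain voxels whose activity exceeds 20% of the brain maximum
    # (assumed: measured on the CT-AC reference image).
    mask = mni_brain_mask & (pet_ct > threshold * pet_ct[mni_brain_mask].max())
    rel = np.zeros_like(pet_ct, dtype=float)
    rel[mask] = (pet_x[mask] - pet_ct[mask]) / pet_ct[mask] * 100.0
    return rel, np.abs(rel), mask

# Toy example: uniform uptake with one low-activity voxel and a 3% overestimation.
pet_ct = np.full((4, 4, 4), 10.0)
pet_ct[0, 0, 0] = 1.0
pet_x = pet_ct * 1.03
rel, abs_rel, mask = percent_difference_maps(pet_x, pet_ct, np.ones((4, 4, 4), dtype=bool))
```

In this toy case every voxel inside the mask has a relative difference of about 3%, and the low-activity voxel is excluded by the threshold.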

As a measure of robustness towards outliers, we used the metric introduced in Ladefoged et al. (2016) to estimate the number of outliers measured in the PET images. The metric calculates the percentage of patients within a 3% accuracy in the Rel% images for varying voxel-wise fractions of the brain, varied from 0% to 100%. A perfect score for a method is therefore to have 100% of the voxels in the brain in 100% of the patients within ±3% of PET with CT-AC.
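One way to read this outlier metric is as a curve: for each fraction of the brain, count the patients in whom at least that fraction of brain voxels lies within ±3%. The sketch below follows that reading; the function name and the toy data are hypothetical, and details such as the exact fraction grid are assumptions.

```python
import numpy as np

def outlier_curve(rel_maps, masks, tolerance=3.0, fractions=np.linspace(0.0, 1.0, 101)):
    """Percentage of patients with at least a given brain fraction within +/- tolerance %.

    rel_maps: list of Rel% volumes, one per patient
    masks:    list of boolean brain masks, one per patient
    """
    # Fraction of brain voxels within tolerance, per patient.
    within = np.array([np.mean(np.abs(rel[m]) <= tolerance)
                       for rel, m in zip(rel_maps, masks)])
    # For each brain fraction, the share of patients reaching it.
    return np.array([100.0 * np.mean(within >= f) for f in fractions])

# Patient A: all voxels at 1% error. Patient B: half of the voxels at 5% error.
mask = np.ones((2, 2, 2), dtype=bool)
rel_a = np.full((2, 2, 2), 1.0)
rel_b = rel_a.copy()
rel_b[0] = 5.0
curve = outlier_curve([rel_a, rel_b], [mask, mask], fractions=np.array([0.0, 0.4, 0.6, 1.0]))
print(curve)  # [100. 100.  50.  50.]
```

A method with no outliers would give a flat curve at 100% across all fractions, which corresponds to the perfect score described above.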

2.6. Effect of cohort size and changes to the input on [¹⁸F]FDG PET

To evaluate the effect of training group size, we trained in total four networks with sizes of n = {10, 50, 100, 403}. The subjects were sampled with replacement.

The robustness towards changes to the input images was evaluated using images from the VE11P cohort. In recognition of the changes to the MR images following the software upgrade, it was expected that further fine-tuning of the network was needed to adapt to these changes. The purpose of the analysis was to test the number of subjects needed for this adaptation. We compared a network trained using a group of all available training subjects (n = 91) against n = {5, 20, 50}, all trained using transfer learning from the full VB20P training cohort (n = 403). In addition, we also trained a network without transfer learning on the full cohort (n = 91). The overview of the setup is shown in Fig. 1. We repeated the training of the two networks with the lowest number of subjects (n = 5 and n = 20) a total of four times using different combinations of training subjects each time, to determine the robustness towards the selection of subjects. The comparisons were repeated for each MRI sequence type, using identical hyperparameters as presented in Section 2.3. We compared the networks based on the number of outliers measured in the PET images, representing the robustness.

2.7. Effects of MRI sequence on [¹⁸F]FDG PET

We evaluated the effects of MRI sequence on accuracy by training three independent networks, one for each sequence: Dixon, T1w and UTE, respectively. Each network was trained on the full VB20P cohort, and subsequently fine-tuned using the full VE11P cohort, and designated: DeepDixon, DeepT1, and DeepUTE. We assessed the robustness dependent on MRI sequence by comparing the number of outliers in the VE11P cohort.

Full brain and regional performances of the networks were evaluated using anatomical predefined template regions from MNI space (Collins et al., 1999; Fonov et al., 2009), with extraction of mean Rel% and Abs% values. We furthermore generated parametric average and standard deviation Rel%-distribution images across all patients for each method for visual inspection.
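Extracting regional means from a voxel-wise map and a labeled template can be sketched as below. The label values, region names, and function name are hypothetical placeholders and do not correspond to the actual MNI template labels.

```python
import numpy as np

def regional_means(value_map, label_map, region_ids, mask=None):
    """Mean of `value_map` inside each labeled region, optionally limited to `mask`."""
    if mask is None:
        mask = np.ones_like(label_map, dtype=bool)
    means = {}
    for rid in region_ids:
        voxels = value_map[(label_map == rid) & mask]
        # An empty region (e.g. fully outside the brain mask) yields NaN.
        means[rid] = float(voxels.mean()) if voxels.size else float("nan")
    return means

# Toy atlas with two regions (labels 1 and 2 are placeholders).
labels = np.zeros((2, 2, 2), dtype=int)
labels[0] = 1
labels[1] = 2
rel = np.where(labels == 1, 0.5, -1.5)
means = regional_means(rel, labels, region_ids=[1, 2, 3])
```

The same helper can be applied to both the Rel% and Abs% maps, and averaging the per-patient results across the cohort gives regional summary values.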

2.8. Clinical evaluation

The [¹⁸F]FDG PET images from the independent test cohort (VE11P, n = 104) reconstructed using CT-AC and DeepDixon were analyzed by MI Neurology (Siemens Healthineers, Erlangen, Germany). Statistical surface projections (z-score maps) were generated showing deviations from a vendor-provided database of healthy controls (46–79 years) using cerebellar gray matter as reference region. Statistical surface projections are widely used and accepted as the most sensitive method for the identification of metabolic reductions in [¹⁸F]FDG PET. The projections are routinely used in the reading of clinical [¹⁸F]FDG PET scans, providing information on regional patterns and severity of hypometabolism.

Statistical surface projections were produced for PET images created with CT-AC and DeepDixon, and for each patient presented (blinded and randomized) side by side to two expert nuclear medicine physicians (IL, OH). The readers first independently and then by consensus visually scored each pair of projections as "no difference", "minor, but not significant", or "clinically significant", where the latter would indicate a change of diagnosis or a difference indicative of disease progression in only one of the PET images. This strategy was selected as the differences in the images were expected to be small and barely discernible on direct visual inspection, and statistical surface projections are the most sensitive method for detecting subtle changes in a clinical setting (Burdette et al., 1996). The reading, thus, simulates the clinical evaluation of a patient with follow-up imaging using standard clinical methodology, and also includes the indirect effects of perturbations in cortical uptake caused by AC-induced effects on anatomical warp and reference region.

3. RESULTS

Fig. 2 shows the axial and sagittal views for each proposed attenuation method (DeepDixon, DeepT1 and DeepUTE) for a single sample patient from the VE11P test cohort. Notice especially the excellent performance in the skull base and nasal cavities in the proposed methods, replicating the morphology of even small anatomical details from CT. The network training time using the full VB20P cohort was 40 hrs, whereas the fine-tuning to the full VE11P cohort took 12 hrs. The inference time to predict an attenuation map for a new subject was 4 sec. A total of 13 patients (13%) had artifacts in their atlas-based attenuation map related to misplaced bone. These subjects were removed from the average performance evaluations of the atlas-based method only.

3.1. Effect of cohort size and changes to the input on [¹⁸F]FDG PET

The effect of VB20P cohort size in DeepUTE training is shown in Fig. 3A, which shows a clear correlation between group size and model performance in terms of outliers at the 3% [¹⁸F]FDG PET error level. Training using n = 10 subjects results in inadequate bone representation, incorrect attenuation values in brain tissue, and an overall smoother AC map with an 8–10% negative bias relative to PET with CT-AC (Fig. 3B). Increasing the group size decreased the blurring and increased the image contrast and overall detail level in the AC images. Furthermore, the robustness clearly increased with group size. Thus, n = 100 was required to outperform RESOLUTE in the number of outliers. When training using the full cohort, n = 403, DeepUTE markedly reduced the number of outliers compared to RESOLUTE. The large amount of training data empowers our method to handle common artifacts such as signal voids from dental implants. An example of this is illustrated in Fig. 4. A similar relationship between training group size and number of outliers was found when using DeepDixon and DeepT1 (Suppl. Fig. 3). DeepT1 appeared more robust towards training group size, as 10–50 subjects were sufficient to achieve performance near RESOLUTE and increasing group size above 100 subjects did not improve robustness.

Fig. 5 shows the effect of fine-tuning the DeepUTE network to a significant change in the UTE MRI input sequence following the VB20P to VE11P software upgrade. The VB20P model applied without transfer learning is also shown, making it apparent that transfer learning is necessary. Transfer learning from the VB20P cohort was performed on 5, 20, 50 and the full n = 91 VE11P cohort with UTE MRI as input. Here, too, robustness was correlated to the group size, but the size needed for convergence was markedly reduced to n = 5 subjects. Incremental robustness improvements were achieved with increasing group size. For comparison, training the VE11P network without transfer learning using all n = 91 subjects resulted in similar model accuracy as when using between 5 and 20 subjects with transfer learning. Overall similar results were observed for DeepDixon, with the exception that all models with transfer learning outperformed the model without transfer learning (Suppl. Fig. 4A). As expected, DeepT1 trained only with VB20P patients generalized well to the VE11P cohort without re-training, with performance surpassing the atlas-based method (Suppl. Fig. 4B). The number of outliers was similar to training with all n = 91 VE11P training subjects without transfer learning, but fine-tuning with VE11P data further improved the robustness. Repeating model training using different training subjects for n = 5 and n = 20 appeared robust across all three MRI sequence types (Fig. 5 and Suppl. Fig. 4).

3.2. Effects of MRI input sequence on [¹⁸F]FDG PET

The number of outliers at the ±3% level, representing the robustness of the method, was similar across all three proposed methods when evaluated on the VB20P test patients (Fig. 6A) and on the VE11P test patients (Fig. 6B) after applying transfer learning. The methods showed a substantial improvement over both RESOLUTE and the atlas-based method.

The relative and absolute relative percent difference regional analysis for the VE11P cohort with transfer learning from the VB20P cohort is shown in Fig. 7 and Supplementary Figure 5, respectively. None of the proposed methods exceeded ±1% average relative error (Rel%) in any region of the brain. The atlas-based method achieved a low full brain Rel% of 0.8 ± 2.4%, with higher regional errors subcortically of up to 7%. The maximal outlier for a single patient in any region of the brain was below 6% for all proposed methods (DeepUTE range: −4% to 5%, DeepDixon range: −4% to 5%, DeepT1 range: −5% to 6%). For the atlas-based method, the errors ranged from −15% to 14%. Similarly, average absolute relative error (Abs%) was below 2.5% in any region of the brain for the proposed methods, and between 4% and 8% regionally for the atlas-based method. The results for the regional analysis for the VB20P cohort are shown in Supplementary Figure 6.

Fig. 2. Attenuation map comparison for a representative patient from the VE11P cohort. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view. Each proposed MR-based attenuation map is preceded by the underlying MR image used for inference for reference. Note, for simplicity, only second echo (TE2) and in-phase is shown for DeepUTE and DeepDixon, respectively. All models were trained using the full cohort (n = 91) with transfer learning from the corresponding VB20P full cohort models (n = 403).

Fig. 3. The effect of training group size on model accuracy of DeepUTE. A) An outlier analysis for VB20P test subjects (n = 201) of model accuracy with increasing training group size. B) Axial images of a representative patient with [¹⁸F]FDG PET and corresponding DeepUTE AC-maps, and %-difference maps relative to PET CT-AC. The arrows in the AC-maps point to the nasal cavity and bone with a more distinct resemblance to the reference CT with increasing group size. The arrows in the PET images point to an occipital lobe [¹⁸F]FDG PET hyper-intense area with convergent resemblance to the reference standard PET CT-AC.

The averaged relative difference mean and standard deviation images are shown in Fig. 8 for the VE11P cohort and Supplementary Figure 7 for the VB20P cohort. Again, near equal performance is achieved by applying either input MRI sequence to the deep learning method. Compared to RESOLUTE, especially cortical regions close to bone were more accurate with a lower standard deviation (Suppl. Fig. 7).

Fig. 4. Example case showing robustness to metallic dental implants for DeepUTE trained with the full VB20P training group (n = 403). Metal implants did not cause any noticeable artifacts in CT, but caused large signal voids in the UTE echo image. The artifacts resulted in large errors in the RESOLUTE attenuation map, whereas DeepUTE was able to largely correct for the artifact, as shown both in the axial and sagittal orientation. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view.

Fig. 5. Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepUTE model. The dashed lines represent the performance of the DeepUTE model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network (DeepUTE) from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning of DeepUTE with increasing training group size after transfer learning from the VB20P cohort. The shaded areas around n = 5 and n = 20 represent the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).

Table 3
Consensus scores from clinical evaluation of [¹⁸F]FDG PET comparing attenuation correction using CT and DeepDixon (VE11P; n = 104).

| Consensus score | Number |
|---|---|
| No difference | 78 (75%) |
| Minor, not significant | 25 (24%) |
| Clinically significant | 1 (1%) ∗ |

∗ Difference caused by warp error in spatial normalization.

3.3. Clinical evaluation

The 104 pairs of [¹⁸F]FDG PET reconstructions (CT and DeepDixon) were evaluated, and 1 pair (1%) was scored as "clinically significantly different" based on the statistical surface projection, whereas 103 pairs (99%) were scored as not clinically significantly different (Table 3). On direct clinical reading of the [¹⁸F]FDG PET image of the single case rated as "clinically significantly different" there was no visually discernible change in voxel activity. The differences could be traced to a defective spatial normalization warp that would be found on routine quality control. Presumably it was brought on by scanning in extreme neck flexion combined with small differences in extra-cerebral activity.

4. DISCUSSION

This study confirmed the usability of deep learning-based networks for MRI-based attenuation correction in a clinical setting, and demonstrated performances exceeding previous state-of-the-art non-deep learning-based methods. By training a common auto-encoder architecture using increasing group sizes, we showed a direct correlation between accuracy and size when the network was trained from scratch. Using transfer learning from the large cohort of subjects, however, we showed the amount of training data needed to adapt to changes to the MRI sequence input could be reduced significantly to as low as 5 subjects. Furthermore, we demonstrated robustness towards the choice of MRI sequence input, with identical performance when using a common Dixon-based MR-AC sequence as with the specialized UTE sequence. Finally, we demonstrated a retained clinical value and accuracy of our methodology compared to our reference CT-AC.

The methodology employed in this study is not novel, as the auto-encoder architecture has been widely applied for MR-AC purposes already (Gong et al., 2018; Han, 2017; Liu et al., 2017). The novelty of our study lies with the unprecedented amount of training data utilized and the analysis of robustness with respect to the size of the training data set and type of MRI input. Deep learning is usually associated with large amounts of training data, something that is difficult to obtain in most health-care applications. Previous publications employing CNNs for MR-to-CT conversion are therefore often based on small cohorts, with a group size ranging between 10 and 30 (Gong et al., 2018; Han, 2017; Liu et al., 2017). To investigate the effect of size, we trained the network end-to-end from scratch using 10, 50, 100, and 403 subjects, respectively. While there was an impact on the average performance with an increasingly larger training group (Fig. 3B), a larger effect was found in the number of associated outliers (Fig. 3A), with the best overall performance achieved for the largest cohort (n = 403). Interestingly, to achieve the performance of RESOLUTE, measured in number of outliers, a training group size between 50 and 100 subjects was needed (Fig. 3A and Suppl. Fig. 3). This suggests that deep learning methods based on fewer than 50 subjects for training might be unstable, albeit having decent average errors. The model accuracy further improves with increasing training group size from 100 to 403 in DeepUTE and DeepDixon, confirming findings in other domains where deep learning was applied (Sun et al., 2017). Using T1w MPRAGE generally appears to be more stable (Suppl. Fig. 3B), which could be due to the sequence being more standardized compared to Dixon-VIBE and UTE.


Fig.6. OutlieranalysisfortheVB20P(left,n=201)andVE11P(right,n=104)testpatientstoshowtheeﬀectsonmodelrobustnessbyvaryingtheMRIsequence inputtypeandacrosssoftwareupgrades.Allmodelsaretrainedusingthefulltraincohorts,n=403forVB20Pandn=91withtransferlearningforVE11P.RESOLUTE andatlas-basedmethodsareshownforcomparison.Onlysubjectswithoutregistration-relatedartifactswereusedtocomputetheoutliersfortheatlas-basedmethod (n=91).

Fig. 7. Full brain and regional mean relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained using the full train cohort (n = 91) with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method for comparison. Only subjects without registration-related artifacts were used to compute the results for the atlas-based method (n = 91). The bars represent the average relative difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval.

A popular and useful strategy to overcome small training group sizes is to apply transfer learning (Bengio et al., 2013). This strategy was also used by Han to initialize part of their network from a pretrained VGG 16-layer model (Han, 2017), by Jang et al. to train a model using 6 patients transfer learned from a model with 30 patients (Jang et al., 2018), and by Torrado-Carvajal et al. to train a model pretrained on 19 T1w brain images to synthesize Dixon-VIBE pelvis images from 19 patients (Torrado-Carvajal et al., 2019). In this study, we employed transfer learning to re-calibrate the network to a new image appearance following a major software upgrade. The results showed little effect of increasing the number of subjects above 5, as 5–91 subjects for training yielded similar model accuracy (Fig. 5 and Suppl. Fig. 4). Training with transfer learning on only five subjects matched (DeepUTE) or exceeded (DeepDixon and DeepT1) the performance of training on all subjects (n = 91) without transfer learning, demonstrating that information from the original model trained on a large cohort is preserved and utilized.
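The fine-tuning idea can be illustrated with a deliberately small toy, which stands in for the CNN only by analogy. A two-parameter linear model is first fitted to a large "source" set, then adapted to five samples from a slightly shifted "target" relation, and compared with training on the same five samples from scratch under an identical, short training budget. All data, names, the learning rate, and the step counts are assumptions chosen for illustration; none of it reproduces the study's network or training setup.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given samples."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, w=0.0, b=0.0, steps=20, lr=0.1):
    """Plain gradient descent on the MSE, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Large "source" cohort: 100 samples of y = 2.0*x + 1.0 (hypothetical).
src_x = [i / 99 for i in range(100)]
src_y = [2.0 * x + 1.0 for x in src_x]
w_pre, b_pre = fit(src_x, src_y, steps=2000)

# Small "target" cohort: 5 samples of a slightly shifted relation.
tgt_x = [0.0, 0.25, 0.5, 0.75, 1.0]
tgt_y = [1.8 * x + 1.3 for x in tgt_x]

# Same short budget: warm start from the source model vs. cold start from zeros.
w_ft, b_ft = fit(tgt_x, tgt_y, w=w_pre, b=b_pre, steps=20)
w_sc, b_sc = fit(tgt_x, tgt_y, steps=20)

loss_pretrained = mse(w_pre, b_pre, tgt_x, tgt_y)  # source model, no adaptation
loss_finetuned = mse(w_ft, b_ft, tgt_x, tgt_y)     # adapted on 5 samples
loss_scratch = mse(w_sc, b_sc, tgt_x, tgt_y)       # 5 samples only
```

With the same 20 update steps, the warm-started model ends with a lower target error than the cold-started one, mirroring in miniature why a handful of subjects can suffice once a model trained on a large cohort is available.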

These findings have relevance not only for recalibrating methods after major software upgrades, but also for distribution of models between scanners and centers when the models do not generalize well. Using only a limited number of subjects with paired CT and MRI, the model can be adapted to match scanners at different locations, potentially even from different vendors. We hypothesize that such transfer learning will also apply to cohorts with different demographics (ethnicity etc.).

There were differences, to a varying degree, in all three MRI sequences pre- and post-upgrade, see Table 1 and Supplementary Figure 1, impacting the ability of the methods to generalize across the system upgrade. The largest difference was observed with the Dixon sequence, mainly expressed in change of resolution, but nonetheless, DeepDixon achieved similar performance after transfer learning as DeepT1. This suggests that similar domain adaptation to MRI sequences from other vendors is feasible, as differences in T1-weighted implementations across systems are no greater than between VB20P and VE11P for the Dixon-VIBE or UTE sequence. DeepT1 trained with VB20P data generalized well to VE11P data, producing images that were objectively identical to the images produced after fine-tuning. The quantitative PET evaluation resulted in a 1–2% overestimation on average (results not shown). Further inspection revealed a general reduction in MRI intensity in the area representing bone in patients examined after the upgrade (Suppl. Fig. 1C), causing DeepT1 to predict denser bone, ultimately causing the overestimation. Despite this error being acceptable for most clinical purposes, we found that fine-tuning reduced the PET bias, which indicates that fine-tuning is needed after all major upgrades of the MRI system.

Training the model with a more heterogeneous dataset with T1-weighted MPRAGE images from multiple sites and systems could potentially eliminate the need for fine-tuning completely.

Fig. 8. Averaged relative difference (left four columns) and standard deviation (right four columns) images across the Rel_% images of all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method. Images computed for the atlas-based method were only based on subjects without registration-related artifacts (n = 91).

Using our method, the average relative bias is within 1% from PET with CT-AC in any region of the brain with any of the MR images as input (Fig. 7). This is essential for clinical evaluation, as in e.g. tumor delineation and treatment response assessment (Law et al., 2019), and for neurological applications using the cerebellum as reference region (Borghammer et al., 2010; Ishii et al., 2001; Yakushev et al., 2008). Utilizing the same patient cohort and metrics as were employed in a previous multi-center comparison (Ladefoged et al., 2016) allows us to compare not only to RESOLUTE, but also indirectly to the other best performing state-of-the-art methods for the Siemens PET/MRI (Burgos et al., 2014; Izquierdo-Garcia et al., 2014; Mérida et al., 2017). Across all metrics, our method was found to have similar or better performance than that of the most promising methods. The methods based on deep learning that have been proposed in the literature report PET bias comparable to that found in our work. Jang et al. (Jang et al., 2018) and Liu et al. (Liu et al., 2017) reported average regional Rel_% [¹⁸F]FDG PET bias within ±2% across eight subjects and ±4% in 10 subjects, respectively, compared to a tissue-segmented three-class (air, soft tissue, and bone) CT reference, whereas Gong et al. (Gong et al., 2018) reported ±3% in 12 subjects compared to a reference CT-AC. However, note that no outlier analysis or clinical evaluations were performed in these publications, and a robust regional performance is critical for clinical use.
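For concreteness, the relative PET bias discussed here can be sketched as a voxel-wise percentage difference against the CT-AC reference, averaged within a region. The definition below, 100 × (PET with MR-AC − PET with CT-AC) / PET with CT-AC, is the usual form of such a metric and is assumed rather than quoted from this section; the function names, the handling of non-positive reference voxels, and the toy values are hypothetical.

```python
from typing import Sequence


def relative_difference_pct(pet_mrac: Sequence[float],
                            pet_ctac: Sequence[float]) -> list:
    """Voxel-wise relative difference (%) of MR-AC PET against CT-AC PET."""
    if len(pet_mrac) != len(pet_ctac):
        raise ValueError("images must contain the same number of voxels")
    # Voxels with a non-positive reference value are skipped to avoid
    # dividing by zero (assumed handling, e.g. background outside the brain).
    return [100.0 * (m - c) / c for m, c in zip(pet_mrac, pet_ctac) if c > 0]


def regional_bias(pet_mrac, pet_ctac, absolute=False):
    """Mean (optionally absolute) relative difference over a region's voxels."""
    rel = relative_difference_pct(pet_mrac, pet_ctac)
    if not rel:
        raise ValueError("region has no voxels with a positive reference value")
    if absolute:
        rel = [abs(r) for r in rel]
    return sum(rel) / len(rel)


# Toy region of four voxels (arbitrary activity units).
ct_ref = [10.0, 20.0, 40.0, 50.0]
mr_est = [11.0, 19.0, 40.0, 51.0]
signed = regional_bias(mr_est, ct_ref)                   # errors partly cancel
unsigned = regional_bias(mr_est, ct_ref, absolute=True)  # magnitudes only
```

The signed mean can hide errors of opposite sign within a region, which is one reason an absolute variant (as in Suppl. Fig. 5) and an outlier analysis are informative alongside the average bias.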

The atlas-based method had registration-related artifacts in 13 patients. Of these, four were positioned outside the patient volume, as previously reported (Øen et al., 2019), and could have been manually removed prior to reconstruction. The remaining errors corrupted the image, rendering a rescan the only option. Despite removing these patients from the PET evaluation, the atlas-based method still had a global absolute relative error of 5% (Suppl. Fig. 5), which is likely related to the absence of accurate air segmentation, see e.g. Fig. 2. The PET bias was higher than the previously reported 2.5% (Øen et al., 2019), but this is most likely due to a difference in patient cohort.

Specialized sequences able to generate contrast in bone have little diagnostic value, and the added contrast comes at the cost of increased acquisition time, and thus less patient comfort and compliance. While the specialized sequences have proven pivotal for segmentation-based methods (Dickson et al., 2014), no evidence exists that such sequences are needed in order for deep learning-based methods to succeed. Our results demonstrate that traditional MRI sequences are sufficient for deep learning-based MR-AC, confirming the findings of several previous works (Teuho et al., 2020). Of the three networks, we chose to clinically evaluate the simpler and more patient-compliant DeepDixon. In terms of cross-vendor use, DeepT1 is the obvious choice, but on a Siemens mMR, the fast Dixon-VIBE sequence is always part of the PET acquisition, and therefore inherently has reduced motion and optimal alignment of PET and MR images. In the 104 patient examinations evaluated, two experienced expert readers found no cases with clinically significant differences between CT and DeepDixon. The spatial normalization was performed individually for each PET image, which could partly explain the minor non-significant differences in 24% of the cases (Table 3).

However, limitations to DeepDixon, in particular related to abnormal bone structures, surgical deformation and metallic implants, should be kept in mind. It is recommended that evaluation of DeepDixon for use in brain tumor evaluation is performed separately using tracer-specific clinical metrics as done previously (Ladefoged et al., 2019, 2017). Nonetheless, the frequency of potential errors/differences related to using DeepDixon is very low, and these are probably smaller and less frequent than those introduced by dental artifacts and motion on the PET/CT system. In our center, we have now implemented DeepDixon MR-AC in routine clinical imaging and performed more than 200 [¹⁸F]FDG PET scans in adult patients referred for suspected neurodegeneration without routine low-dose CT. To further minimize potential errors, attenuation maps are carefully inspected for unusual structures and artifacts before the patient leaves the department, and a low-dose CT is performed if errors are suspected following image inspection.

Our study had a number of limitations. We chose to focus on evaluating the effects of group size and MRI sequence input. The conclusions drawn here could potentially be different if other network types were applied. It was not the scope of this study to evaluate the effect of deep learning architecture, but we recognize the potential improved accuracy associated with more sophisticated networks, such as the generative adversarial network (Goodfellow et al., 2014). The high accuracy and low number of outliers presented here suggest, however, that only minor improvements are to be found. Moreover, a limitation of the comparison is the use of identical training setups for each training group size. Tailoring the hyperparameters to each model, or investigating the use of 2D or 3D patches as input to boost the amount of training samples, could potentially improve the results of the networks with a low number of subjects.

5. CONCLUSION

We have described and evaluated a deep learning attenuation correction approach for PET/MRI neuroimaging using more than 1000 subjects. We showed that a requirement for accurate and robust MR-AC is a large group size of at least 50 subjects for training, and that further increasing the size to 400 significantly reduced the number of outliers. However, using transfer learning from a large cohort, a group size of 5 subjects was sufficient to recalibrate to changes in the MRI sequences. Full robustness was achieved with only 20 subjects, with performance at the same level as, or even surpassing, that of a larger training cohort (n = 91) without transfer learning. Furthermore, we demonstrated robustness towards the choice of MRI sequence input. The clinical evaluation showed no clinically relevant differences compared to CT-AC, although knowledge about MR-AC limitations is important when the method is used in clinical routine. The combination of accuracy, outlier performance, clinical performance, robustness towards the choice of MRI sequence input, and low group size needed for re-training following a major software upgrade indicates that the clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types.

CRediT author statement

Claes Nøhr Ladefoged: Conceptualization, Methodology, Software, Validation, Formal analysis, Data Curation, Writing – Original Draft. Adam Espe Hansen: Conceptualization, Methodology, Formal analysis. Otto Mølby Henriksen: Conceptualization, Data Curation, Resources. Frederik Jager Bruun: Data Curation. Live Eikenes: Data Curation. Silje Kjærnes Øen: Data Curation. Anna Karlberg: Resources. Liselotte Højgaard: Resources, Funding Acquisition. Ian Law: Conceptualization, Data Curation, Resources. Flemming Littrup Andersen: Conceptualization, Methodology, Formal analysis, Supervision. All authors participated in drafting and revising the manuscript.

Supplementary Figures

Supplementary Figure 1: Differences between VB20P and VE11P for the three sequences UTE (A), Dixon-VIBE (B), and T1-weighted MPRAGE (C). Using the CT bone area (linear attenuation coefficient > 0.103 cm⁻¹) as a mask, the mean bone surrogate signal, measured with R2* from UTE sequences, is higher after the upgrade (A). For T1w MPRAGE, there is a decrease in the signal in the area representing bone (C). The effects of the resolution improvement (Table 1) for the Dixon-VIBE sequence are clearly seen visually (B).

Supplementary Figure 2: CNN U-net-like architecture used in this study. The network takes a stack-of-slices from 16 neighboring MR slices, and outputs the corresponding pseudo-CT image. C represents the number of MR channels: 2 for UTE (TE1 and TE2), 2 for Dixon (in- and opposed-phase), and 1 for T1w.

Supplementary Figure 3: The effects of group size on model accuracy. Outlier analysis shown for VB20P test patients (n = 201) for increasing training group size for DeepDixon (A) and DeepT1 (B). RESOLUTE is added for comparison.

Supplementary Figure 4: Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepDixon (A) and DeepT1 (B) models. The dashed lines represent the performance of the model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning with increasing training group size after transfer learning from the VB20P cohort. The shaded area around n = 5 and n = 20 represents the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).

Supplementary Figure 5: Global and regional mean absolute relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91). The bars represent the average absolute relative difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval.

Supplementary Figure 6: Full brain and regional mean relative (upper) and absolute mean (lower) differences across all VB20P test patients (n = 201) for each deep learning model. The bars represent the difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval. RESOLUTE shown for comparison.

Supplementary Figure 7: Averaged relative difference (top three rows) and standard deviation (bottom three rows) images across all VB20P testing patients (n = 201). Please note the change of scale compared to Figure 8.

ACKNOWLEDGMENTS

The PET/MRI system at Rigshospitalet was kindly provided by the John and Birthe Meyer Foundation, Denmark. Special thanks to the bioengineers and radiographers at Rigshospitalet and St. Olavs Hospital for patient preparations and image acquisitions. We thank IBM Denmark for providing two POWER9 servers with 4 Tesla V100 GPUs in each system.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2020.117221.

References

Abadi, M., Barham, P., Chen, J., et al., 2016. TensorFlow: a System for Large-Scale Machine Learning, in: 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 16). pp. 265–283.

Andersen, F.L., Ladefoged, C.N., Beyer, T., et al., 2014. Combined PET/MR imaging in neurology: MR-based attenuation correction implies a strong spatial bias when ignoring bone. Neuroimage 84, 206–216. https://doi.org/10.1016/j.neuroimage.2013.08.042

Arabi, H., Bortolin, K., Ginovart, N., Garibotto, V., Zaidi, H., 2020. Deep learning-guided joint attenuation and scatter correction in multitracer neuroimaging studies. Hum. Brain Mapp 1–13. https://doi.org/10.1002/hbm.25039

Arabi, H., Zeng, G., Zheng, G., Zaidi, H., 2019. Novel adversarial semantic structure deep learning for MRI-guided attenuation correction in brain PET/MRI. Eur. J. Nucl. Med. Mol. Imaging 46, 2746–2759.

Avants, B.B., Tustison, N.J., Song, G., et al., 2011. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044. https://doi.org/10.1016/j.neuroimage.2010.09.025

Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50

Borghammer, P., Chakravarty, M., Jonsdottir, K.Y., et al., 2010. Cortical hypometabolism and hypoperfusion in Parkinson’s disease is extensive: probably even at early disease stages. Brain Struct Funct 214, 303–317. https://doi.org/10.1007/s00429-010-0246-0

Burdette, J.H., Minoshima, S., Vander Borght, T., Tran, D.D., Kuhl, D.E., 1996. Alzheimer disease: improved visual interpretation of PET images by using three-dimensional stereotaxic surface projections. Radiology 198, 837–843. https://doi.org/10.1148/radiology.198.3.8628880

Burgos, N., Cardoso, M.J., Thielemans, K., et al., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans Med Imaging 33, 2332–2341. https://doi.org/10.1109/TMI.2014.2340135