AI-driven attenuation correction for brain PET/MRI: Clinical evaluation of a dementia cohort and importance of the training group size


Contents lists available at ScienceDirect

NeuroImage

journal homepage: www.elsevier.com/locate/neuroimage

Claes Nøhr Ladefoged 1,*, Adam Espe Hansen 1, Otto Mølby Henriksen 1, Frederik Jager Bruun 1, Live Eikenes 2, Silje Kjærnes Øen 2, Anna Karlberg 2,3, Liselotte Højgaard 1, Ian Law 1, Flemming Littrup Andersen 1

1 Department of Clinical Physiology, Nuclear Medicine & PET, Rigshospitalet, University of Copenhagen, Denmark

2 Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway

3 Department of Radiology and Nuclear Medicine, St. Olavs hospital, Trondheim University Hospital, Trondheim, Norway

ARTICLE INFO

Keywords: Attenuation correction, Deep learning, Convolutional neural network, Artificial intelligence, Brain, PET/MRI

ABSTRACT

Introduction: Robust and reliable attenuation correction (AC) is a prerequisite for accurate quantification of activity concentration. In combined PET/MRI, AC is challenged by the lack of bone signal in the MRI from which the AC maps have to be derived. Deep learning-based image-to-image translation networks present themselves as an optimal solution for MRI-derived AC (MR-AC). High robustness and generalizability of these networks are expected to be achieved through large training cohorts. In this study, we implemented an MR-AC method based on deep learning, investigated how training cohort size, transfer learning, and MR input affected robustness, and subsequently evaluated the method in a clinical setup, with the overall aim of exploring whether this method could be implemented in clinical routine for PET/MRI examinations.

Methods: A total cohort of 1037 adult subjects from the Siemens Biograph mMR with two different software versions (VB20P and VE11P) was used. The software upgrade included updates to all MRI sequences. The impact of training group size was investigated by training a convolutional neural network (CNN) on an increasing training group size from 10 to 403. The ability to adapt to changes in the input images between software versions was evaluated using transfer learning from a large cohort to a smaller cohort, by varying training group size from 5 to 91 subjects. The impact of MRI sequence was evaluated by training three networks based on the Dixon VIBE sequence (DeepDixon), T1-weighted MPRAGE (DeepT1), and ultra-short echo time (UTE) sequence (DeepUTE). Blinded clinical evaluation relative to the reference low-dose CT (CT-AC) was performed for DeepDixon in 104 independent 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET patient studies performed for suspected neurodegenerative disorder using statistical surface projections.

Results: Robustness increased with group size in the training data set: 100 subjects were required to reduce the number of outliers compared to a state-of-the-art segmentation-based method, and a cohort of >400 subjects further increased robustness in terms of reduced variation and number of outliers. When using transfer learning to adapt to changes in the MRI input, as few as five subjects were sufficient to minimize outliers. Full robustness was achieved at 20 subjects. Comparably robust and accurate results were obtained using all three types of MRI input, with a bias below 1% relative to CT-AC in any brain region. The clinical PET evaluation using DeepDixon showed no clinically relevant differences compared to CT-AC.

Conclusion: Deep learning-based AC requires a large training cohort to achieve accurate and robust performance. Using transfer learning, only five subjects were needed to fine-tune the method to large changes to the input images. No clinically relevant differences were found compared to CT-AC, indicating that clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types using transfer learning and a limited number of subjects.

* Corresponding author.

E-mail address: claes.noehr.ladefoged@regionh.dk (C.N. Ladefoged).

https://doi.org/10.1016/j.neuroimage.2020.117221

Received 8 November 2019; Received in revised form 15 July 2020; Accepted 28 July 2020; Available online 1 August 2020

1053-8119/© 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. INTRODUCTION

Positron emission tomography (PET) images need to be corrected for photon attenuation to accurately quantify the measured radioactive tissue concentration (Andersen et al., 2014; Dickson et al., 2014). In a dual-modality PET and magnetic resonance imaging (MRI) scanner, a density map for attenuation correction (AC) has to be derived from the MRI. This was initially not possible, which hampered the use of PET/MRI scanners, especially for brain studies in both clinical and research applications (Vandenberghe and Marsden, 2015). Several MRI-guided attenuation correction techniques were proposed as potential solutions (Chen and An, 2017; Izquierdo-Garcia and Catana, 2016; Ladefoged et al., 2016; Mehranian et al., 2016). Eleven state-of-the-art AC methods were studied in a large cohort of adult subjects with normal anatomy (Ladefoged et al., 2016); that study concluded that AC was a solved topic in the brain when using one of the best-performing methods. However, some of these methods, including our own segmentation-based RESOLUTE method (Ladefoged et al., 2015), were later found to be sensitive to the specific MRI sequences used and, thus, vulnerable to system software updates.

Recently, artificial intelligence (AI) with deep learning convolutional neural networks (CNNs) has been considered as an alternative, as these networks offer a number of advantages over the existing methods. Deep learning methods can confer robustness towards changes to the input caused by MRI hardware or system software updates, as well as cross-platform compatibility between vendors through the process of transfer learning. Furthermore, methods based on CNNs are usually very processing intensive at the training step, but the generation of an attenuation map for a given subject occurs within seconds, making them attractive tools as a part of a clinical workflow where speed, accuracy, and robustness are key elements.

Since the first use of deep learning to convert MR images to CT (Han, 2017), numerous methods have been proposed; see e.g. (Teuho et al., 2020; Torrado-Carvajal, 2020). Several state-of-the-art networks were employed, from traditional encoder-decoder architectures (Gong et al., 2018; Han, 2017; Torrado-Carvajal et al., 2019) to generative adversarial networks (GANs) (Arabi et al., 2019; Kazemifar et al., 2019), including variants accepting unpaired data (Ge et al., 2019; Lei et al., 2020; Wolterink et al., 2017; Yang et al., 2018). Most methods from the literature use small training group sizes (<30) even though larger sizes could increase generalizability and robustness. The methods are based on single or multiple MRI sequences, spanning the common T1-weighted MPRAGE as well as specialized sequences capable of visualizing bone, such as zero echo time (ZTE) or ultra-short echo time (UTE). The possible advantages in the context of attenuation correction, especially in terms of robustness, of using large training cohorts as well as specialized sequences over traditional sequences remain to be investigated systematically.

Recently, methods converting non-attenuation-corrected (NAC) PET images directly to attenuation- and scatter-corrected PET images have emerged, mainly targeted at whole-body applications, as paired data are readily available in large numbers (Arabi et al., 2020; Shiri et al., 2019; Van Hemmen et al., 2019; Yang et al., 2019). The drawbacks of these methods are their dependence on the choice of tracer and their limited ability to extract structural information (Arabi et al., 2020). In the brain, the performance of these new methods remains to be evaluated thoroughly on a cohort with neurologic abnormalities.

The aim of this study was to implement a deep learning CNN for clinical MR-AC use, and to investigate the potential impact on the quantitative accuracy and clinical reading of PET scans depending on training group size and choice of MRI input. This was achieved by utilizing a large cohort of subjects, all examined on the same PET/MRI system type at two independent sites, including common and specialized MRI as well as low-dose CT images used as reference.¹

¹ Code and models for inference available at https://github.com/CAAI/DeepMRAC

2. MATERIALS AND METHODS

The included data comprised studies acquired on two Siemens Biograph mMR systems (Siemens Healthineers, Erlangen, Germany) spanning two different software versions. A larger cohort, imaged with software version VB20P, was used to investigate the impact of cohort size. A smaller cohort, with the most recent software update (VE11P), was used to investigate the effect of transfer learning (based on VB20P data) and the impact of choice of MRI input, and to perform a clinical evaluation.

2.1. Patients

Data sets from 1037 adult subjects were obtained retrospectively from two different centers: n = 1007 from Rigshospitalet, University Hospital Copenhagen, Denmark, and n = 30 from St. Olavs hospital, Trondheim University Hospital, Norway. Rigshospitalet provided data sets from the complete cohort of subjects referred for a PET/MRI brain examination with matching same-day head CT between November 2013 and April 2019, examined with software version VB20P (n = 811) or VE11P (n = 196). Data comprised PET/MRI studies imaged with various tracers, but only the MRI sequences were used to develop the method. The subjects included from St. Olavs hospital were referred to a clinical 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET/MRI brain examination for dementia, were all examined with VE11P, and had matching same-day head CT. Retrospective use of subjects from Rigshospitalet was approved by the Danish Patient Safety Authority (ref. 3–3013–1513/1). The study from St. Olavs hospital was approved by the Regional Committee for Ethics in Medical Research (REC Central) (ref. 2013/1371), and all subjects gave written informed consent. Data were extracted only in fully anonymized form in compliance with the European General Data Protection Regulation (GDPR).

In each of the two groups (VB20P and VE11P), we divided the subjects into training, validation, and test cohorts. The training and validation cohorts were used to develop the method. The subjects in the independent test cohort were all imaged with [18F]FDG; none of these subjects had, e.g., bone-modifying cranio-facial surgical interventions, cranial defects, hyperostoses, dysplasias, disfigurement, or metal implants besides dental implants. For the VB20P group, the test cohort was identical to the patients recently used in our multi-center evaluation (Ladefoged et al., 2016), and the train/validation split was 70/30. We initially developed the models for the VE11P group using 4-fold cross-validation. Once the models were finalized, we fixed the training/validation cohorts to be those of the first cross-validation fold. The independent test cohort was prospectively acquired after the models were trained. An illustration of the splits for each group is shown in Fig. 1.

2.2. Imaging protocols

2.2.1. MRI

The scan protocols always included a T1-weighted (T1w) MPRAGE, a UTE AC sequence, and a Dixon-VIBE sequence (the vendor default for MR-AC). The upgrade to VE11P included upgraded versions of all three sequences. The UTE AC sequence was re-implemented, changing the relationship between the two echo images, with consequences especially for the signal in bone (Suppl. Fig. 1A). Visually, the most noticeable change was to the Dixon-VIBE sequence, which is now available in high resolution, targeted for brain purposes (Suppl. Fig. 1B). No apparent differences could be observed for the T1-weighted MPRAGE sequence. Nevertheless, inspection of the area representing bone showed a slight decrease in mean value following the upgrade (Suppl. Fig. 1C). Sequence details are available in Table 1.

2.2.2. CT

A reference low-dose CT scan (120 kVp, 36–40 mAs, 0.6 × 0.6 × 3 mm3 voxels) of the head using a PET/CT system (Biograph TruePoint 40, 64, or Biograph mCT, Siemens Healthineers) was acquired for all patients on the same day as the PET/MRI examination.

Fig. 1. Separation of subjects into train, validation and test cohorts within each group, and further split to investigate the impact of training group size. Note, all 30 patients from St. Olavs hospital were part of the n = 91 VE11P training cohort. For each MRI input type, four models are trained from the n = 403 patients of the VB20P cohort with increasing number of subjects. The performance of these models is evaluated using the independent VB20P test cohort (n = 201). For VE11P, using the n = 91 training cohort patients, a total of four models are trained using transfer learning (TL) from the n = 403 VB20P model. An additional model is trained using all n = 91 training patients, but without any transfer learning (no TL). All VE11P models, and the n = 403 VB20P model applied directly without re-training, are evaluated using the independent VE11P test cohort (n = 104). This setup is identical for DeepUTE, DeepDixon, and DeepT1.

Table 1
MRI sequence parameters.

MRI sequence   TR [ms]   TE [ms]     Flip angle [degrees]   Acquisition time [s]   Voxel size [mm3]    Matrix size
VB20P
Dixon          3.6       1.23/2.46   10                     19                     2.6 × 2.6 × 3.1     126 × 192 × 128
T1w            1900      2.44        9                      300                    0.5 × 0.5 × 1       512 × 512 × 192
UTE            11.94     0.07/2.46   10                     100                    1.6 × 1.6 × 1.6     192 × 192 × 192
VE11P
Dixon          4.14      1.28/2.51   10                     39                     1.3 × 1.3 × 2       204 × 384 × 128
T1w            1900      2.44        9                      300                    0.5 × 0.5 × 1       512 × 512 × 192
UTE            4.64      0.07/2.46   10                     118                    1.6 × 1.6 × 1.6     192 × 192 × 192
TR: repetition time; TE: echo time.

Table 2
Patient characteristics in the [18F]FDG PET test sets.

Software version   N     Male/Female   Age, mean (range)   Injected dose, mean (SD)   Scan start p.i., median (range)
VB20P              201   108/93        68 (23–96) y        203 (± 20) MBq             51 (24–134) min
VE11P              104   52/52         73 (41–93) y        200 (± 11) MBq             47 (39–69) min
p.i.: post injection.

2.2.3. [18F]FDG PET

The test cohort included 201 (VB20P) and 104 (VE11P) subjects for quantitative and clinical evaluation of [18F]FDG PET data. The patients were referred for suspected neurodegenerative disease as part of the clinical work-up. Patient characteristics are given in Table 2. The subjects were positioned head-first with arms down in the fully integrated PET/MRI system. Data were acquired over a single bed position of 25.8 cm covering the head and neck for 10 min. For the purpose of this study, the PET data from the PET/MRI acquisition were reconstructed using 3D Ordinary Poisson-Ordered Subset Expectation Maximization (OP-OSEM) with 4 iterations, 21 subsets, and 3 mm Gaussian post-filtering on 344 × 344 matrices (2.1 × 2.1 × 2.0 mm3 voxels) in line with the clinical protocols. Each MR-AC map was resampled to PET resolution as a part of the reconstruction. No additional filtering was applied.

2.3. Deep convolutional neural network

2.3.1. Network structure

The proposed network used in this study is shown in Supplementary Figure 2. The 3D convolutional network is based on an encoder-decoder structure with symmetric concatenations between corresponding stages, inspired by the U-Net architecture (Çiçek et al., 2016; Ronneberger et al., 2015) but modified for an end-to-end image synthesis task. Specifically, each stage in the 3D network consists of 3 × 3 × 3 kernels, batch normalization (BN), rectified linear unit (ReLU) activation, and a dropout layer with a fraction increasing from 0.1 to 0.3 in the encoding part and decreasing correspondingly in the decoding part. The pooling-based downsampling between stages of the U-Net was replaced by convolutions with stride 2. We used L2 penalties for kernel regularization on the convolution layers.

2.3.2. Network training

The proposed networks were implemented in TensorFlow (version 2.1.0) (Abadi et al., 2016). Our experiments used mean squared error as loss function and the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1 × 10⁻⁴, trained for 100 epochs with a batch size of 16. All computations were performed on an IBM POWER9 server with four NVIDIA Tesla V100 GPUs. The networks use 3D volumes as input, consisting of 16 neighboring transaxial slices for each MRI scan (16 slices × 192 voxels × 192 voxels × C channels), where C denotes the number of images in the MRI sequence (in- and opposed-phase for Dixon (two channels), echo images for UTE (two channels), and MPRAGE for T1w (one channel)), and output the corresponding CT slices (16 slices × 192 voxels × 192 voxels × 1 channel). All MRI sequences were first resampled to the resolution of the UTE image, to ensure isotropic voxels and a common matrix size, and normalized to zero mean and unit variance. Subsequently, we extracted 3D volumes from the 192 × 192 × 192 MRI scans with a stride of 4. The scanner bed and structures other than the patient were removed from the CT images before they were converted to linear attenuation coefficients and moved into PET/MRI space using a 6-parameter rigid alignment procedure (minctracc, McConnell Imaging Center, Montreal, Canada) with normalized mutual information as objective function. A mask of the CT coverage was applied to the three MRI sequences during the training phase.
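The input preparation described above (normalization to zero mean and unit variance, then extraction of 16-slice stacks with a stride of 4) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, normalization is assumed to be computed over the whole volume, and a small synthetic volume stands in for the 192 × 192 × 192 scans.

```python
import numpy as np

def normalize(volume):
    """Scale a volume to zero mean and unit variance (assumed: over the whole volume)."""
    volume = volume.astype(np.float32)
    return (volume - volume.mean()) / volume.std()

def extract_stacks(volume, depth=16, stride=4):
    """Cut stacks of `depth` neighboring slices every `stride` slices.

    `volume` has shape (slices, rows, cols, channels); stacks are taken
    along the first (slice) axis. Returns the start indices and the stacks.
    """
    n_slices = volume.shape[0]
    starts = list(range(0, n_slices - depth + 1, stride))
    stacks = np.stack([volume[s:s + depth] for s in starts])
    return starts, stacks

# Synthetic two-channel volume (e.g. two Dixon or two UTE images).
rng = np.random.default_rng(0)
mri = rng.normal(100.0, 20.0, size=(48, 32, 32, 2))
normalized = normalize(mri)
starts, stacks = extract_stacks(normalized)
print(stacks.shape)  # (9, 16, 32, 32, 2)
```

With 48 slices, a depth of 16 and a stride of 4, the stacks start at slices 0, 4, ..., 32, which gives nine training samples from this one volume.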

2.3.3. Network prediction and post-processing

To generate the deep learning attenuation maps, we extracted the 3D stack of slices around each slice in the volume, and computed the average voxel values for each of the overlapping predicted slices.
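The averaging of overlapping predictions can be sketched as below. The sketch assumes that each predicted stack is simply accumulated at its position in the volume and divided by the number of stacks covering each slice; the function name and the handling of the volume edges are assumptions, not details given in the text.

```python
import numpy as np

def average_overlapping(stacks, starts, n_slices):
    """Average predicted stacks back into one volume.

    stacks: (n_stacks, depth, rows, cols) predicted slices
    starts: index of the first slice of each stack in the full volume
    """
    depth = stacks.shape[1]
    total = np.zeros((n_slices,) + stacks.shape[2:], dtype=np.float64)
    count = np.zeros(n_slices, dtype=np.float64)
    for stack, s in zip(stacks, starts):
        total[s:s + depth] += stack
        count[s:s + depth] += 1
    count = np.maximum(count, 1)  # slices never covered stay zero
    return total / count[:, None, None]

# Toy check: if every stack reproduces the true values, the average does too.
truth = np.arange(24, dtype=np.float64)[:, None, None] * np.ones((1, 4, 4))
starts = list(range(0, 24 - 16 + 1))  # one stack per position (stride 1)
stacks = np.stack([truth[s:s + 16] for s in starts])
ac_map = average_overlapping(stacks, starts, n_slices=24)
```

Averaging several predictions per slice is a common way to suppress discontinuities at stack borders.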

2.4. Reference methods

The rigidly co-registered CT images were used as our gold standard AC reference during both training and evaluation following conversion of Hounsfield units as implemented on the Siemens PET/CT system (Carney et al., 2006). Due to the limited coverage in the neck region by the acquired CT, we replaced the missing areas by the values from the vendor-provided UTE AC map. To ensure a fair comparison, this replacement was also performed in all the other attenuation maps. In addition, we also computed the RESOLUTE attenuation map (Ladefoged et al., 2015) for VB20P patients from Rigshospitalet. RESOLUTE is calibrated to VB20P UTE data, and was therefore not computed for the VE11P patients. As part of the VE11P software upgrade, a vendor-provided atlas-based MR-AC method was made available (Koesters et al., 2016; Paulus et al., 2015), and was used as the MR-based reference for the VE11P test cohort. This method is prone to bone artifacts related to misregistration in more than 20% of the cases (Øen et al., 2019). Therefore, patients with this type of artifact were excluded from the analysis of the atlas-based method.
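The replacement of areas outside the CT coverage can be illustrated with a short sketch. It assumes that a boolean coverage mask is available in the same space as the attenuation maps; the function name and the toy values (in cm⁻¹) are hypothetical.

```python
import numpy as np

def fill_outside_ct(mu_map, ute_mu_map, ct_coverage):
    """Keep `mu_map` inside the CT coverage and use the UTE AC map elsewhere."""
    return np.where(ct_coverage, mu_map, ute_mu_map)

# Toy 1D example: the last voxel lies outside the CT field-of-view.
mu_map = np.array([0.096, 0.150, 0.000])
ute_mu_map = np.array([0.100, 0.100, 0.100])
ct_coverage = np.array([True, True, False])
filled = fill_outside_ct(mu_map, ute_mu_map, ct_coverage)
```

Applying the same function to the CT-based and to every MR-based map gives all methods identical values outside the CT field-of-view, so that differences in PET arise only from the region covered by CT.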

2.5. PET evaluation metrics

Due to the use of data from different software versions (VB20P and VE11P), causing differences of varying degree in all MR images, we evaluated the cohorts separately.

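We first moved all data to common MNI space using ANTs (Avants et al., 2011) by diffeomorphic non-rigid registration of the patient's T1w MPRAGE image to the ICBM 152 2009a template (Fonov et al., 2009). Voxels inside the MNI brain mask were considered part of the brain mask if the PET activity was >20% of the maximum intensity value of the brain. The voxel-wise percent difference relative to PET with CT-AC, defined as:

\[ \mathrm{Rel}_{\%} = \frac{PET_{x} - PET_{CT}}{PET_{CT}} \times 100, \]

as well as the absolute relative percent difference, defined as:

\[ \mathrm{Abs}_{\%} = \frac{\left| PET_{x} - PET_{CT} \right|}{PET_{CT}} \times 100, \]

were calculated for the PET images corrected with each of the MRI-based AC methods, where PET_x denotes the PET image reconstructed with MR-AC method x and PET_CT the PET image reconstructed with CT-AC.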
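The two voxel-wise measures can be sketched in a few lines. The sketch assumes that the 20% threshold is applied to the PET image reconstructed with CT-AC, which the text does not state explicitly; function and variable names are hypothetical.

```python
import numpy as np

def brain_mask(pet_ct, template_mask, fraction=0.2):
    """Template-mask voxels whose activity exceeds `fraction` of the brain maximum."""
    threshold = fraction * pet_ct[template_mask].max()
    return template_mask & (pet_ct > threshold)

def rel_abs_percent(pet_x, pet_ct, mask):
    """Voxel-wise Rel% and Abs% inside `mask`; NaN elsewhere."""
    rel = np.full(pet_ct.shape, np.nan)
    rel[mask] = (pet_x[mask] - pet_ct[mask]) / pet_ct[mask] * 100.0
    return rel, np.abs(rel)

# Toy 2 x 2 "image": the low-activity voxel (1.0) falls below the threshold.
pet_ct = np.array([[10.0, 8.0], [1.0, 5.0]])
pet_x = np.array([[10.3, 7.6], [1.2, 5.0]])
mask = brain_mask(pet_ct, np.ones((2, 2), dtype=bool))
rel, abs_rel = rel_abs_percent(pet_x, pet_ct, mask)
```

In this toy case the three voxels inside the mask have Rel% values of about 3, -5 and 0, and the excluded voxel is left as NaN.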

As a measure of robustness towards outliers, we used the metric introduced in Ladefoged et al. (2016) to estimate the number of outliers measured in the PET images. The metric calculates the percentage of patients within a 3% accuracy in the Rel% images for varying voxel-wise fractions of the brain, varied from 0% to 100%. A perfect score for a method is therefore to have 100% of the voxels in the brain in 100% of the patients within ±3% of PET with CT-AC.
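One way to read this metric is sketched below: for each patient, compute the fraction of brain voxels with |Rel%| within the tolerance, and then, for each brain fraction between 0 and 1, report the percentage of patients that reach at least that fraction. This is an interpretation of the description above, not the reference implementation, and the function names are hypothetical.

```python
import numpy as np

def fraction_within(rel_values, tolerance=3.0):
    """Fraction of one patient's brain voxels with |Rel%| <= tolerance."""
    rel_values = np.asarray(rel_values, dtype=float)
    return float(np.mean(np.abs(rel_values) <= tolerance))

def outlier_curve(per_patient_rel, tolerance=3.0, n_points=101):
    """Percentage of patients with at least a given brain fraction within tolerance."""
    fractions = np.linspace(0.0, 1.0, n_points)
    within = np.array([fraction_within(r, tolerance) for r in per_patient_rel])
    patients_pct = np.array([(within >= f).mean() * 100.0 for f in fractions])
    return fractions, patients_pct

# Three toy patients with 100%, 80% and 40% of their voxels within +/-3%.
patients = [
    [0.5, -1.0, 2.0, 1.0, 0.0],
    [0.5, 4.0, -1.0, 2.0, 1.0],
    [5.0, -6.0, 1.0, 2.0, 7.0],
]
fractions, patients_pct = outlier_curve(patients, n_points=5)
```

A method with a perfect score would give a flat curve at 100% for every brain fraction; here the curve drops to two of three patients at a brain fraction of 0.5 and to one of three at 1.0.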

2.6. Effect of cohort size and changes to the input on [18F]FDG PET

To evaluate the effect of training group size, we trained in total four networks with sizes of n = {10, 50, 100, 403}. The subjects were sampled with replacement.

The robustness towards changes to the input images was evaluated using images from the VE11P cohort. In recognition of the changes to the MR images following the software upgrade, it was expected that further fine-tuning of the network was needed to adapt to these changes. The purpose of the analysis was to determine the number of subjects needed for this adaptation. We compared a network trained using a group of all available training subjects (n = 91) against n = {5, 20, 50}, all trained using transfer learning from the full VB20P training cohort (n = 403). In addition, we also trained a network without transfer learning on the full cohort (n = 91). The overview of the setup is shown in Fig. 1. We repeated the training of the two networks with the lowest number of subjects (n = 5 and n = 20) a total of four times using different combinations of training subjects each time, to determine the robustness towards the selection of subjects. The comparisons were repeated for each MRI sequence type, using identical hyperparameters as presented in Section 2.3. We compared the networks based on the number of outliers measured in the PET images, representing the robustness.
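The repeated selection of small training groups (n = 5 and n = 20, four different combinations each) can be sketched as follows. The subject identifiers, the seed, and the function name are hypothetical, and the sketch assumes that subjects are distinct within one combination; how the authors actually drew the groups is not described beyond the text above.

```python
import random

def draw_training_groups(subject_ids, sizes, repeats=1, seed=0):
    """Draw `repeats` distinct subject combinations for each group size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = {}
    for n in sizes:
        combos = set()
        while len(combos) < repeats:
            # subjects within one combination are distinct
            combos.add(tuple(sorted(rng.sample(subject_ids, n))))
        groups[n] = sorted(combos)
    return groups

# Hypothetical identifiers for the 91 VE11P training subjects.
ve11p_train = [f"sub-{i:03d}" for i in range(91)]
small_groups = draw_training_groups(ve11p_train, sizes=(5, 20), repeats=4)
```

Each of the four combinations per size would then be used to fine-tune a separate copy of the pretrained VB20P model, and the spread in the resulting outlier curves indicates the sensitivity to subject selection.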

2.7. Effects of MRI sequence on [18F]FDG PET

We evaluated the effects of MRI sequence on accuracy by training three independent networks, one for each sequence: Dixon, T1w and UTE, respectively. Each network was trained on the full VB20P cohort and subsequently fine-tuned using the full VE11P cohort; the resulting networks were designated DeepDixon, DeepT1, and DeepUTE. We assessed the robustness dependent on MRI sequence by comparing the number of outliers in the VE11P cohort.

Full brain and regional performances of the networks were evaluated using anatomical predefined template regions from MNI space (Collins et al., 1999; Fonov et al., 2009), with extraction of mean Rel% and Abs% values. We furthermore generated parametric average and standard deviation Rel%-distribution images across all patients for each method for visual inspection.
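The regional extraction of mean Rel% and Abs% values can be sketched as below, assuming an integer label image in the same space as the Rel% map. The region names and label values are hypothetical and are not taken from the template used in the study.

```python
import numpy as np

def regional_means(rel, labels, region_names):
    """Mean Rel% and mean Abs% per labeled region, ignoring NaN voxels."""
    abs_rel = np.abs(rel)
    results = {}
    for label, name in region_names.items():
        voxels = (labels == label) & np.isfinite(rel)
        results[name] = (float(rel[voxels].mean()), float(abs_rel[voxels].mean()))
    return results

# Toy example with two regions; the NaN voxel lies outside the brain mask.
rel = np.array([1.0, -1.0, 2.0, 4.0, np.nan])
labels = np.array([1, 1, 2, 2, 2])
stats = regional_means(rel, labels, {1: "region A", 2: "region B"})
```

In region A the positive and negative errors cancel, so the mean Rel% is 0 while the mean Abs% is 1, which is why both measures are reported side by side.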

2.8. Clinical evaluation

The [18F]FDG PET images from the independent test cohort (VE11P, n = 104) reconstructed using CT-AC and DeepDixon were analyzed by MI Neurology (Siemens Healthineers, Erlangen, Germany). Statistical surface projections (z-score maps) were generated showing deviations from a vendor-provided database of healthy controls (46–79 years) using cerebellar gray matter as reference region. Statistical surface projections are widely used and accepted as the most sensitive method for the identification of metabolic reductions in [18F]FDG PET. The projections are routinely used in the reading of clinical [18F]FDG PET scans, providing information on regional patterns and severity of hypometabolism.
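The general idea behind a z-score map with a reference region can be illustrated as follows. This is a generic sketch and not the vendor's algorithm: it assumes that uptake is first divided by the mean uptake in the reference region and then compared voxel-wise with the mean and standard deviation of a control database; all names and values are hypothetical.

```python
import numpy as np

def z_score_map(patient, reference_mask, control_mean, control_sd):
    """Voxel-wise z-scores of reference-normalized uptake against controls."""
    ratio = patient / patient[reference_mask].mean()
    return (ratio - control_mean) / control_sd

# Toy example: the last two voxels form the reference region.
patient = np.array([8.0, 3.0, 4.0, 4.0])
reference_mask = np.array([False, False, True, True])
control_mean = np.array([2.0, 1.25, 1.0, 1.0])
control_sd = np.array([0.2, 0.25, 0.1, 0.1])
z = z_score_map(patient, reference_mask, control_mean, control_sd)
```

With this sign convention, the second voxel has a z-score of -2, i.e. uptake two standard deviations below the control mean. Because the reference region enters every voxel's ratio, an AC-induced error in that region shifts the whole map.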

Statistical surface projections were produced for PET images created with CT-AC and DeepDixon, and for each patient presented (blinded and randomized) side by side to two expert nuclear medicine physicians (IL, OH). The readers first independently and then by consensus visually scored each pair of projections as “no difference”, “minor, but not significant”, or “clinically significant”, where the latter would indicate a change of diagnosis or a difference indicative of disease progression in only one of the PET images. This strategy was selected as the differences in the images were expected to be small and barely discernible on direct visual inspection, and statistical surface projections are the most sensitive method for detecting subtle changes in a clinical setting (Burdette et al., 1996). The reading thus simulates the clinical evaluation of a patient with follow-up imaging using standard clinical methodology, and also includes the indirect effects of perturbations in cortical uptake caused by AC-induced effects on anatomical warp and reference region.

3. RESULTS

Fig. 2 shows the axial and sagittal views for each proposed attenuation method (DeepDixon, DeepT1 and DeepUTE) for a single sample patient from the VE11P test cohort. Notice especially the excellent performance in the skull base and nasal cavities in the proposed methods, replicating the morphology of even small anatomical details from CT. The network training time using the full VB20P cohort was 40 h, whereas the fine-tuning to the full VE11P cohort took 12 h. The inference time to predict an attenuation map for a new subject was 4 s. A total of 13 patients (13%) had artifacts in their atlas-based attenuation map related to misplaced bone. These subjects were removed from the average performance evaluations of the atlas-based method only.

3.1. Effect of cohort size and changes to the input on [18F]FDG PET

The effect of VB20P cohort size in DeepUTE training is shown in Fig. 3A, which shows a clear correlation between group size and model performance in terms of outliers at the 3% [18F]FDG PET error level. Training using n = 10 subjects results in inadequate bone representation, incorrect attenuation values in brain tissue, and an overall smoother AC map with an 8–10% negative bias relative to PET with CT-AC (Fig. 3B). Increasing the group size decreased the blurring and increased the image contrast and overall detail level in the AC images. Furthermore, the robustness clearly increased with group size: n = 100 was required to outperform RESOLUTE in the number of outliers. When training using the full cohort, n = 403, DeepUTE markedly reduced the number of outliers compared to RESOLUTE. The large amount of training data empowers our method to handle common artifacts such as signal voids from dental implants. An example of this is illustrated in Fig. 4. A similar relationship between training group size and number of outliers was found when using DeepDixon and DeepT1 (Suppl. Fig. 3). DeepT1 appeared more robust towards training group size, as 10–50 subjects were sufficient to achieve performance near RESOLUTE, and increasing group size above 100 subjects did not improve robustness.

Fig. 5 shows the effect of fine-tuning the DeepUTE network to a significant change in the UTE MRI input sequence following the VB20P to VE11P software upgrade. The performance of the VB20P model applied without transfer learning is also shown, from which it is apparent that transfer learning is necessary. Transfer learning from the VB20P cohort was performed on 5, 20, 50 and the full n = 91 VE11P cohort with UTE MRI as input. Here, too, robustness was correlated with the group size, but the size needed for convergence was markedly reduced to n = 5 subjects. Incremental robustness improvements were achieved with increasing group size. For comparison, training the VE11P network without transfer learning using all n = 91 subjects resulted in similar model accuracy as when using between 5 and 20 subjects with transfer learning. Overall similar results were observed for DeepDixon, with the exception that all models with transfer learning outperformed the model without transfer learning (Suppl. Fig. 4A). As expected, DeepT1 trained only with VB20P patients generalized well to the VE11P cohort without re-training, with performance surpassing the atlas-based method (Suppl. Fig. 4B). The number of outliers was similar to training with all n = 91 VE11P training subjects without transfer learning, but fine-tuning with VE11P data further improved the robustness. Repeating model training using different training subjects for n = 5 and n = 20 appeared robust across all three MRI sequence types (Fig. 5 and Suppl. Fig. 4).

3.2. Effects of MRI input sequence on [18F]FDG PET

The number of outliers at the ±3% level, representing the robustness of the method, was similar across all three proposed methods when evaluated on the VB20P test patients (Fig. 6A) and on the VE11P test patients (Fig. 6B) after applying transfer learning. The methods showed a substantial improvement over both RESOLUTE and the atlas-based method.

The relative and absolute relative percent difference regional analysis for the VE11P cohort with transfer learning from the VB20P cohort is shown in Fig. 7 and Supplementary Figure 5, respectively. None of the proposed methods exceeded ±1% average relative error (Rel%) in any region of the brain. The atlas-based method achieved a low full brain Rel% of 0.8 ± 2.4%, with higher regional errors subcortically of up to 7%. The maximal outlier for a single patient in any region of the brain did not exceed 6% for any of the proposed methods (DeepUTE range: −4% to 5%, DeepDixon range: −4% to 5%, DeepT1 range: −5% to 6%). For the atlas-based method, the errors ranged from −15% to 14%. Similarly, average absolute relative error (Abs%) was below 2.5% in any region of the brain for the proposed methods, and between 4% and 8% regionally for the atlas-based method. The results for the regional analysis for the VB20P cohort are shown in Supplementary Figure 6.

Fig. 2. Attenuation map comparison for a representative patient from the VE11P cohort. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view. Each proposed MR-based attenuation map is preceded by the underlying MR image used for inference, for reference. Note, for simplicity, only the second echo (TE2) and the in-phase image are shown for DeepUTE and DeepDixon, respectively. All models were trained using the full cohort (n = 91) with transfer learning from the corresponding VB20P full cohort models (n = 403).

Fig. 3. The effect of training group size on model accuracy of DeepUTE. A) An outlier analysis for VB20P test subjects (n = 201) of model accuracy with increasing training group size. B) Axial images of a representative patient with [18F]FDG PET and corresponding DeepUTE AC maps, and %-difference maps relative to PET CT-AC. The arrows in the AC maps point to the nasal cavity and bone with a more distinct resemblance to the reference CT with increasing group size. The arrows in the PET images point to an occipital lobe [18F]FDG PET hyper-intense area with convergent resemblance to the reference standard PET CT-AC.

The averaged relative difference mean and standard deviation images are shown in Fig. 8 for the VE11P cohort and Supplementary Figure 7 for the VB20P cohort. Again, near equal performance is achieved by applying any of the input MRI sequences to the deep learning method. Compared to RESOLUTE, especially cortical regions close to bone were more accurate with a lower standard deviation (Suppl. Fig. 7).

Fig. 4. Example case showing robustness to metallic dental implants for DeepUTE trained with the full VB20P training group (n = 403). Metal implants did not cause any noticeable artifacts in CT, but caused large signal voids in the UTE echo image. The artifacts resulted in large errors in the RESOLUTE attenuation map, whereas DeepUTE was able to largely correct for the artifact, as shown both in the axial and sagittal orientation. The attenuation images are shown prior to superimposing UTE values in the area outside the CT field-of-view.

Fig. 5. Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepUTE model. The dashed lines represent the performance of the DeepUTE model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network (DeepUTE) from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning DeepUTE with increasing training group size after transfer learning from the VB20P cohort. The shaded areas around n = 5 and n = 20 represent the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).

Table 3
Consensus scores from clinical evaluation of [18F]FDG PET comparing attenuation correction using CT and DeepDixon (VE11P; n = 104).

Consensus score          Number
No difference            78 (75%)
Minor, not significant   25 (24%)
Clinically significant   1 (1%) a
a Difference caused by warp error in spatial normalization.

3.3. Clinical evaluation

The 104 pairs of [18F]FDG PET reconstructions (CT and DeepDixon) were evaluated; 1 pair (1%) was scored as “clinically significant” based on the statistical surface projection, whereas 103 pairs (99%) were scored as not clinically significantly different (Table 3). On direct clinical reading of the [18F]FDG PET image of the single case rated as “clinically significant”, there was no visually discernible change in voxel activity. The differences could be traced to a defective spatial normalization warp that would be found on routine quality control. Presumably it was brought on by scanning in extreme neck flexion combined with small differences in extra-cerebral activity.

4. DISCUSSION

This study confirmed the usability of deep learning-based networks for MRI-based attenuation correction in a clinical setting, and demonstrated performances exceeding previous state-of-the-art non-deep learning-based methods. By training a common auto-encoder architecture using increasing group sizes, we showed a direct correlation between accuracy and size when the network was trained from scratch. Using transfer learning from the large cohort of subjects, however, we showed that the amount of training data needed to adapt to changes to the MRI sequence input could be reduced significantly, to as low as 5 subjects. Furthermore, we demonstrated robustness towards the choice of MRI sequence input, with near-identical performance when using a common Dixon-based MR-AC sequence as with the specialized UTE sequence. Finally, we demonstrated a retained clinical value and accuracy of our methodology compared to our reference CT-AC.

The methodology employed in this study is not novel, as the auto-encoder architecture has been widely applied for MR-AC purposes already (Gong et al., 2018; Han, 2017; Liu et al., 2017). The novelty of our study lies in the unprecedented amount of training data utilized and the analysis of robustness with respect to the size of the training data set and type of MRI input. Deep learning is usually associated with large amounts of training data, something that is difficult to obtain in most health-care applications. Previous publications employing CNNs for MR-to-CT conversion are therefore often based on small cohorts, with a group size ranging between 10 and 30 (Gong et al., 2018; Han, 2017; Liu et al., 2017). To investigate the effect of size, we trained the network end-to-end from scratch using 10, 50, 100, and 403 subjects, respectively. While there was an impact on the average performance with an increasingly larger training group (Fig. 3B), a larger effect was found in the number of associated outliers (Fig. 3A), with the best overall performance achieved for the largest cohort (n = 403). Interestingly, to achieve the performance of RESOLUTE, measured in number of outliers, a training group size between 50 and 100 subjects was needed (Fig. 3A and Suppl. Fig. 3). This suggests that deep learning methods based on fewer than 50 subjects for training might be unstable, despite having decent average errors. The model accuracy further improves with increasing training group size from 100 to 403 in DeepUTE and DeepDixon, confirming findings in other domains where deep learning was applied (Sun et al., 2017). Using T1w MPRAGE generally appears to be more stable (Suppl. Fig. 3B), which could be due to the sequence being more standardized compared to Dixon-VIBE and UTE.


Fig. 6. Outlier analysis for the VB20P (left, n = 201) and VE11P (right, n = 104) test patients to show the effects on model robustness by varying the MRI sequence input type and across software upgrades. All models are trained using the full train cohorts, n = 403 for VB20P and n = 91 with transfer learning for VE11P. RESOLUTE and atlas-based methods are shown for comparison. Only subjects without registration-related artifacts were used to compute the outliers for the atlas-based method (n = 91).

Fig. 7. Full brain and regional mean relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained using the full train cohort (n = 91) with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method for comparison. Only subjects without registration-related artifacts were used to compute the results for the atlas-based method (n = 91). The bars represent the average relative difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval.

A popular and useful strategy to overcome small training group sizes is to apply transfer learning (Bengio et al., 2013). This strategy was also used by Han to initialize part of their network from a pretrained VGG 16-layer model (Han, 2017), by Jang et al. to train a model using 6 patients transfer learned from a model with 30 patients (Jang et al., 2018), and by Torrado-Carvajal et al. to train a model pretrained on 19 T1w brain images to synthesize Dixon-VIBE pelvis images from 19 patients (Torrado-Carvajal et al., 2019). In this study, we employed transfer learning to re-calibrate the network to a new image appearance following a major software upgrade. The results showed little effect of increasing the number of subjects above 5, as 5–91 subjects for training yielded similar model accuracy (Fig. 5 and Suppl. Fig. 4). Training with transfer learning on only five subjects matched (DeepUTE) or exceeded (DeepDixon and DeepT1) the performance of training on all subjects (n = 91) without transfer learning, demonstrating that information from the original model trained on a large cohort is preserved and utilized.
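The principle behind this result can be illustrated with a deliberately simplified toy example. The sketch below is not the network used in this study: it replaces the CNN with a linear model fitted by gradient descent, and all names, dimensions, and data are synthetic assumptions. It only shows why starting from weights learned on a large cohort helps when just five samples of a slightly shifted mapping are available.

```python
import numpy as np

def fit(X, y, w_init, steps=500):
    """Gradient descent on mean squared error, starting from w_init."""
    w = np.array(w_init, dtype=float)
    # Step size from the largest eigenvalue of X^T X / n keeps the iteration stable.
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / len(y)).max()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
n_features = 20

# "Pre-upgrade" mapping, learned from a large synthetic cohort (403 samples).
w_old = rng.normal(size=n_features)
X_large = rng.normal(size=(403, n_features))
w_pretrained = fit(X_large, X_large @ w_old, np.zeros(n_features))

# "Post-upgrade" mapping: a small shift of the old one, with only 5 samples available.
w_new = w_old + 0.1 * rng.normal(size=n_features)
X_small = rng.normal(size=(5, n_features))
y_small = X_small @ w_new

w_finetuned = fit(X_small, y_small, w_pretrained)        # with transfer learning
w_scratch = fit(X_small, y_small, np.zeros(n_features))  # without transfer learning

err_finetuned = np.linalg.norm(w_finetuned - w_new)
err_scratch = np.linalg.norm(w_scratch - w_new)
print(err_finetuned < err_scratch)
```

With five samples and twenty unknowns the problem is underdetermined, so the solution reached depends on the starting point. Starting from the pretrained weights leaves only the small shift to be learned, which is the intuition behind fine-tuning a pretrained network with few subjects.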

These findings have relevance not only for recalibrating methods after major software upgrades, but also for distribution of models between scanners and centers when the models do not generalize well. Using only a limited number of subjects with paired CT and MRI, the model can be adapted to match scanners at different locations, potentially even from different vendors. We hypothesize that such transfer learning will also apply to cohorts with different demographics (ethnicity etc.).

There were differences, to varying degrees, in all three MRI sequences pre- and post-upgrade (see Table 1 and Supplementary Figure 1), impacting the ability of the methods to generalize across the system upgrade. The largest difference was observed with the Dixon sequence, mainly expressed in change of resolution, but nonetheless, DeepDixon achieved similar performance after transfer learning as DeepT1. This suggests that similar domain adaptation to MRI sequences from other vendors is feasible, as differences in T1-weighted implementations across systems are no greater than between VB20P and VE11P for the Dixon-VIBE or UTE sequence. DeepT1 trained with VB20P data generalized well to VE11P data, producing images that were objectively identical to the images produced after fine-tuning. The quantitative PET evaluation resulted in a 1–2% overestimation on average (results not shown). Further inspection revealed a general reduction in MRI intensity in the area representing bone in patients examined after the upgrade (Suppl. Fig. 1C), causing DeepT1 to predict denser bone, ultimately causing the overestimation. Although this error is acceptable for most clinical purposes, we found that fine-tuning reduced the PET bias, which indicates that fine-tuning is needed after all major upgrades of the MRI system. Training the model with a more heterogeneous dataset with T1-weighted MPRAGE images from multiple sites and systems could potentially eliminate the need for fine-tuning completely.

Fig. 8. Averaged relative difference (left four columns) and standard deviation (right four columns) of the Rel% images across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort, as well as the atlas-based MR-AC method. Images computed for the atlas-based method were only based on subjects without registration-related artifacts (n = 91).

Using our method, the average relative bias is within 1% of PET with CT-AC in any region of the brain with any of the MR images as input (Fig. 7). This is essential for clinical evaluation, as in e.g. tumor delineation and treatment response assessment (Law et al., 2019), and for neurological applications using the cerebellum as reference region (Borghammer et al., 2010; Ishii et al., 2001; Yakushev et al., 2008). Utilizing the same patient cohort and metrics as were employed in a previous multi-center comparison (Ladefoged et al., 2016) allows us to compare not only to RESOLUTE, but also indirectly to the other best performing state-of-the-art methods for the Siemens PET/MRI (Burgos et al., 2014; Izquierdo-Garcia et al., 2014; Mérida et al., 2017). Across all metrics, our method was found to have similar or better performance than that of the most promising methods. The methods based on deep learning that have been proposed in the literature report PET bias comparable to that found in our work. Jang et al. (2018) and Liu et al. (2017) reported average regional Rel% [18F]FDG PET bias within ±2% across eight subjects and ±4% in 10 subjects, respectively, compared to a tissue-segmented three-class (air, soft tissue, and bone) CT reference, whereas Gong et al. (2018) reported ±3% in 12 subjects compared to a reference CT-AC. Note, however, that no outlier analysis or clinical evaluations were performed in these publications, and a robust regional performance is critical for clinical use.
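For readers unfamiliar with the regional bias metric, a minimal sketch of how a relative difference (Rel%) to the CT-AC reference could be computed per region is given below. The formula (test minus reference, divided by reference) is the common definition and is assumed here. The function names, region labels, and values are hypothetical and synthetic.

```python
import numpy as np

def relative_difference(pet_mrac, pet_ctac, mask):
    """Voxel-wise relative difference (%) to the CT-AC reference inside a mask."""
    pet_mrac = np.asarray(pet_mrac, dtype=float)
    pet_ctac = np.asarray(pet_ctac, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * (pet_mrac[mask] - pet_ctac[mask]) / pet_ctac[mask]

def regional_bias(pet_mrac, pet_ctac, labels):
    """Mean signed and mean absolute relative difference per region label."""
    labels = np.asarray(labels)
    result = {}
    for region in np.unique(labels[labels > 0]):  # label 0 is treated as background
        rel = relative_difference(pet_mrac, pet_ctac, labels == region)
        result[int(region)] = (float(rel.mean()), float(np.abs(rel).mean()))
    return result

# Tiny synthetic example with two hypothetical regions (labels 1 and 2).
pet_ct = np.array([10.0, 10.0, 20.0, 20.0, 5.0])
pet_mr = np.array([10.5, 9.5, 20.0, 21.0, 5.0])
labels = np.array([1, 1, 2, 2, 0])
bias = regional_bias(pet_mr, pet_ct, labels)
print(bias)
```

In this toy case the signed mean for region 1 is zero although both voxels deviate by 5%, which is why a small average bias alone does not establish robustness and the absolute difference and outlier analyses are reported alongside it.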

The atlas-based method had registration-related artifacts in 13 patients. Of these, four were positioned outside the patient volume, as previously reported (Øen et al., 2019), and could have been manually removed prior to reconstruction. The remaining errors corrupted the image, rendering a rescan the only option. Despite removing these patients from the PET evaluation, the atlas-based method still had a global absolute relative error of 5% (Suppl. Fig. 5), which is likely related to the absence of accurate air segmentation, see e.g. Fig. 2. The PET bias was higher than the previously reported 2.5% (Øen et al., 2019), which is most likely due to a difference in patient cohort.

Specialized sequences able to generate contrast in bone have little diagnostic value, and the added contrast comes at the cost of increased acquisition time, and thus less patient comfort and compliance. While the specialized sequences have proven pivotal for segmentation-based methods (Dickson et al., 2014), no evidence exists that such sequences are needed in order for deep learning-based methods to succeed. Our results demonstrate that traditional MRI sequences are sufficient for deep learning-based MR-AC, confirming the findings of several previous works (Teuho et al., 2020). Of the three networks, we chose to clinically evaluate the simpler and more patient-friendly DeepDixon. In terms of cross-vendor use, DeepT1 is the obvious choice, but on a Siemens mMR, the fast Dixon-VIBE sequence is always part of the PET acquisition, and therefore inherently has reduced motion and optimal alignment of PET and MR images. In the 104 patient examinations evaluated, two experienced expert readers found no cases with clinically significant differences between CT and DeepDixon. The spatial normalization was performed individually for each PET image, which could partly explain the minor non-significant differences in 24% of the cases (Table 3).

However, limitations of DeepDixon, in particular those related to abnormal bone structures, surgical deformation, and metallic implants, should be kept in mind. It is recommended that evaluation of DeepDixon for use in brain tumor evaluation is performed separately using tracer-specific clinical metrics, as done previously (Ladefoged et al., 2019, 2017). Nonetheless, potential errors/differences related to using DeepDixon are rare, and probably smaller and less frequent than those introduced by dental artifacts and motion on the PET/CT system. In our center, we have now implemented DeepDixon MR-AC in routine clinical imaging and performed more than 200 [18F]FDG PET scans in adult patients referred for suspected neurodegeneration without routine low-dose CT. To further minimize potential errors, attenuation maps are carefully inspected for unusual structures and artifacts before the patient leaves the department, and a low-dose CT is performed if errors are suspected following image inspection.

Our study had a number of limitations. We chose to focus on evaluating the effects of group size and MRI sequence input. The conclusions drawn here could potentially be different if other network types were applied. It was not the scope of this study to evaluate the effect of deep learning architecture, but we recognize the potential improved accuracy associated with more sophisticated networks, such as the generative adversarial network (Goodfellow et al., 2014). The high accuracy and low number of outliers presented here suggest, however, that only minor improvements are to be found. Moreover, a limitation of the comparison is the use of identical training setups for each training group size. Tailoring the hyperparameters to each model, or investigating the use of 2D or 3D patches as input to boost the amount of training samples, could potentially improve the results of the networks with a low number of subjects.

5. CONCLUSION

We have described and evaluated a deep learning attenuation correction approach for PET/MRI neuroimaging using more than 1000 subjects. We showed that a requirement for accurate and robust MR-AC is a large group size of at least 50 subjects for training, and that further increasing the size to 400 significantly reduced the number of outliers. However, using transfer learning from a large cohort, a group size of 5 subjects was sufficient to recalibrate to changes in the MRI sequences. Full robustness was achieved with only 20 subjects, with performance at the same level as or even surpassing that of a larger training cohort (n = 91) without transfer learning. Furthermore, we demonstrated robustness towards the choice of MRI sequence input. The clinical evaluation showed no clinically relevant differences compared to CT-AC, although knowledge about MR-AC limitations is important when used in clinical routine. The combination of accuracy, outlier performance, clinical performance, robustness towards the choice of MRI sequence input, and low group size needed for re-training following a major software upgrade indicates that the clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types.

CRediT author statement

Claes Nøhr Ladefoged: Conceptualization, Methodology, Software, Validation, Formal analysis, Data Curation, Writing – Original Draft. Adam Espe Hansen: Conceptualization, Methodology, Formal analysis. Otto Mølby Henriksen: Conceptualization, Data Curation, Resources. Frederik Jager Bruun: Data Curation. Live Eikenes: Data Curation. Silje Kjærnes Øen: Data Curation. Anna Karlberg: Resources. Liselotte Højgaard: Resources, Funding Acquisition. Ian Law: Conceptualization, Data Curation, Resources. Flemming Littrup Andersen: Conceptualization, Methodology, Formal analysis, Supervision. All authors participated in drafting and revising the manuscript.

Supplementary Figures

Supplementary Figure 1: Differences between VB20P and VE11P for the three sequences UTE (A), Dixon-VIBE (B), and T1-weighted MPRAGE (C). Using the CT bone area (linear attenuation coefficient > 0.103 cm⁻¹) as a mask, the mean bone surrogate signal, measured with R2* from UTE sequences, is higher after the upgrade (A). For T1w MPRAGE, there is a decrease in the signal in the area representing bone (C). The effects of the resolution improvement (Table 1) for the Dixon-VIBE sequence are clearly seen visually (B).

Supplementary Figure 2: CNN U-net-like architecture used in this study. The network takes a stack-of-slices from 16 neighboring MR slices, and outputs the corresponding pseudo-CT image. C represents the number of MR channels: 2 for UTE (TE1 and TE2), 2 for Dixon (in- and opposed-phase), and 1 for T1w.

Supplementary Figure 3: The effects of group size on model accuracy. Outlier analysis shown for VB20P test patients (n = 201) for increasing training group size for DeepDixon (A) and DeepT1 (B). RESOLUTE is added for comparison.

Supplementary Figure 4: Outlier analysis for the VE11P test patients (n = 104) showing the effects of increasing group size on transfer learning model accuracy after fine-tuning the DeepDixon (A) and DeepT1 (B) models. The dashed lines represent the performance of the model from the VB20P cohort applied to the VE11P test patients without transfer learning (TL). The pink line represents the performance of training the network from scratch without TL, but with the full train cohort (n = 91), whereas the remaining lines represent the performance of fine-tuning with increasing training group size after transfer learning from the VB20P cohort. The shaded area around n = 5 and n = 20 represents the 95% confidence interval after repeating the training four times with different subjects in each repetition. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91).

Supplementary Figure 5: Global and regional mean absolute relative differences across all VE11P test patients (n = 104) for each of the three networks with MRI sequences UTE, Dixon, and T1-weighted MPRAGE, all trained with transfer learning from the VB20P cohort. The atlas-based MR-AC method, shown for comparison, was only based on subjects without registration-related artifacts (n = 91). The bars represent the average absolute relative difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval.

Supplementary Figure 6: Full brain and regional mean relative (upper) and absolute mean (lower) differences across all VB20P test patients (n = 201) for each deep learning model. The bars represent the difference to PET with CT-AC across patients. The black line in each represents the 95% confidence interval. RESOLUTE shown for comparison.

Supplementary Figure 7: Averaged relative difference (top three rows) and standard deviation (bottom three rows) images across all VB20P testing patients (n = 201). Please note the change of scale compared to Figure 8.
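The stack-of-slices input described for Supplementary Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the axis order (slices, height, width, channels), the step of one slice between stacks, the function name, and the volume size are hypothetical, and only the stack depth of 16 and the channel count C come from the caption.

```python
import numpy as np

def slice_stacks(volume, depth=16, step=1):
    """Yield overlapping stacks of `depth` neighboring slices.

    volume: array of shape (n_slices, height, width, channels).
    """
    n_slices = volume.shape[0]
    for start in range(0, n_slices - depth + 1, step):
        yield volume[start:start + depth]

# Synthetic two-channel volume, e.g. in-phase and opposed-phase Dixon (C = 2).
volume = np.zeros((20, 8, 8, 2), dtype=np.float32)
stacks = list(slice_stacks(volume))
print(len(stacks), stacks[0].shape)
```

Overlapping stacks of this kind yield many training samples per subject, and each stack keeps the through-plane context of its neighboring slices.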

ACKNOWLEDGMENTS

The PET/MRI system at Rigshospitalet was kindly provided by the John and Birthe Meyer Foundation, Denmark. Special thanks to the bioengineers and radiographers at Rigshospitalet and St. Olavs Hospital for patient preparations and image acquisitions. We thank IBM Denmark for providing two POWER9 servers with 4 Tesla V100 GPUs in each system.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2020.117221.

References

Abadi, M., Barham, P., Chen, J., et al., 2016. TensorFlow: a System for Large-Scale Machine Learning, in: 12th USENIX Conference on Operating Systems Design and Implementation (OSDI 16). pp. 265–283.

Andersen, F.L., Ladefoged, C.N., Beyer, T., et al., 2014. Combined PET/MR imaging in neurology: MR-based attenuation correction implies a strong spatial bias when ignoring bone. Neuroimage 84, 206–216. https://doi.org/10.1016/j.neuroimage.2013.08.042.

Arabi, H., Bortolin, K., Ginovart, N., Garibotto, V., Zaidi, H., 2020. Deep learning-guided joint attenuation and scatter correction in multitracer neuroimaging studies. Hum. Brain Mapp. 1–13. https://doi.org/10.1002/hbm.25039.

Arabi, H., Zeng, G., Zheng, G., Zaidi, H., 2019. Novel adversarial semantic structure deep learning for MRI-guided attenuation correction in brain PET/MRI. Eur. J. Nucl. Med. Mol. Imaging 46, 2746–2759.

Avants, B.B., Tustison, N.J., Song, G., et al., 2011. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044. https://doi.org/10.1016/j.neuroimage.2010.09.025.

Bengio, Y., Courville, A., Vincent, P., 2013. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828. https://doi.org/10.1109/TPAMI.2013.50.

Borghammer, P., Chakravarty, M., Jonsdottir, K.Y., et al., 2010. Cortical hypometabolism and hypoperfusion in Parkinson’s disease is extensive: probably even at early disease stages. Brain Struct. Funct. 214, 303–317. https://doi.org/10.1007/s00429-010-0246-0.

Burdette, J.H., Minoshima, S., Vander Borght, T., Tran, D.D., Kuhl, D.E., 1996. Alzheimer disease: improved visual interpretation of PET images by using three-dimensional stereotaxic surface projections. Radiology 198, 837–843. https://doi.org/10.1148/radiology.198.3.8628880.

Burgos, N., Cardoso, M.J., Thielemans, K., et al., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans. Med. Imaging 33, 2332–2341. https://doi.org/10.1109/TMI.2014.2340135.
