
Can a Dinosaur Think? Implementation of Artificial Intelligence in Extracorporeal Shock Wave Lithotripsy


Stone Disease


Sebastien Muller a,b, Håkon Abildsnes c, Andreas Østvik a,b, Oda Kragset c, Inger Gangås d, Harriet Birke e, Thomas Langø a,b, Carl-Jørgen Arum e,f,g,h,*

a Department of Health Research, SINTEF Digital, Trondheim, Norway; b Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway; c Medical School, Norwegian University of Science and Technology, Trondheim, Norway; d Department of Radiology, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway; e Department of Surgery, St. Olavs Hospital, Trondheim University Hospital, Trondheim, Norway; f Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway; g Department of Urology, Skane University Hospital, Malmö, Sweden; h Department of Translational Medicine, Lund University, Malmö, Sweden

Available at www.sciencedirect.com

Journal homepage: www.eu-openscience.europeanurology.com

Article info

Article history: Accepted February 25, 2021

Associate Editor: Silvia Proietti

Keywords: Extracorporeal shock wave lithotripsy; Kidney stones; Artificial intelligence; Machine learning; Neural network

Abstract

Background: Extracorporeal shock wave lithotripsy (ESWL) of kidney stones is losing ground to more expensive and invasive endoscopic treatments.

Objective: This proof-of-concept project was initiated to develop artificial intelligence (AI)-augmented ESWL and to investigate the potential for machine learning to improve the efficacy of ESWL.

Design, setting, and participants: Two-dimensional ultrasound videos were captured during ESWL treatments from an inline ultrasound device with a video grabber. An observer annotated 23 212 images from 11 patients as either in or out of focus. The median hit rate was calculated on a patient level via bootstrapping. A convolutional neural network with U-Net architecture was trained on 57 ultrasound images with delineated kidney stones from the same patients annotated by a second observer. We tested U-Net on the ultrasound images annotated by the first observer. Cross-validation with a training set of nine patients, a validation set of one patient, and a test set of one patient was performed.

Outcome measurements and statistical analysis: Classical metrics describing classifier performance were calculated, together with an estimation of how the algorithm would affect shockwave hit rate.

Results and limitations: The median hit rate for standard ESWL was 55.2% (95% confidence interval [CI] 43.2–67.3%). The performance metrics for U-Net were accuracy 63.9%, sensitivity 56.0%, specificity 74.7%, positive predictive value 75.3%, negative predictive value 55.2%, Youden’s J statistic 30.7%, no-information rate 58.0%, and Cohen’s κ 0.2931. The algorithm reduced total mishits by 67.1%. The main limitation is that this is a proof-of-concept study involving only 11 patients.

Conclusions: Our calculated ESWL hit rate of 55.2% (95% CI 43.2–67.3%) supports findings from earlier research. We have demonstrated that a machine learning algorithm trained on just 11 patients increases the hit rate to 75.3% and reduces mishits by 67.1%. When U-Net is trained on more and higher-quality annotations, even better results can be expected.

Patient summary: Kidney stones can be treated by applying shockwaves to the outside of the body. Ultrasound scans of the kidney are used to guide the machine delivering the shockwaves, but the shockwaves can still miss the stone. We used artificial intelligence to improve the accuracy in hitting the stone being treated.

* Corresponding author. Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Erling Skjalgssons gt. 1, Trondheim 7491, Norway. Tel.: +4531396833.

E-mail address: carl-jorgen.arum@ntnu.no (C.-J. Arum).

http://dx.doi.org/10.1016/j.euros.2021.02.007

2666-1683/© 2021 The Author(s). Published by Elsevier B.V. on behalf of European Association of Urology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Urolithiasis is an increasingly common condition that imposes a substantial burden on both patients and health care providers [1,2]. The prevalence of urolithiasis varies globally, ranging from 4% to 20% [3–5]. Since Chaussy et al [6] reported extracorporeal shock wave lithotripsy (ESWL) treatment for urolithiasis in 1980, it has become the treatment option most utilized. The ability of shockwaves to fragment stones is the basis for ESWL and the efficacy depends on shockwaves hitting the stones [7]. ESWL, percutaneous nephrolithotomy (PCNL), and ureterorenoscopy/retrograde intrarenal surgery (URS/RIRS) are the main treatment options for symptomatic urolithiasis [8]. Of these, ESWL is the least invasive method with the fewest complications [9]. A global study covering a period of 20 yr found that the share of total treatments increased by 17% for URS/RIRS, remained the same for PCNL, and decreased by 14.5% for ESWL [10]. Another study investigating literature trends for urolithiasis treatments revealed that papers on URS/RIRS and PCNL increased by 171% and 279%, respectively, while papers on ESWL decreased by 17% [11].

Increases in ESWL efficacy should reduce retreatment rates, operating room time, anesthesia needs, endoscopic equipment use, and complication rates, thereby significantly reducing health care costs.

Since the creation of the computer there has been a desire to design computers capable of competing with human intelligence. This is achieved by imitating human cognitive function, a concept referred to as artificial intelligence (AI). Machine learning (ML) is a type of AI that learns through experience [12]. Several non-ML algorithms for tracking urinary stones have been developed and tested, but none has been widely adopted in clinical practice. It has been demonstrated that ML algorithms have the ability to outperform clinicians in image analysis [13–15].

In supervised learning an algorithm is given labeled data such as ultrasound images of kidneys with stones and without stones to train it to differentiate “stone” images from “no stone” images [16,17]. Popular ML algorithms inspired by biological neural circuits include artificial neural networks (NNs) (Fig. 1A). The first layer in an NN is called the input layer, and its role is to distribute the original input data to the next layer [18]. The output layer modifies input into the final output for the whole network, deciding whether an image contains a urinary stone or not in our example. Between the input and output layers there are “hidden” layers that are composed of weights that can be taught to handle complex problems [18]. How the connections and layers are structured defines the architecture of the NN [12,16].

NN training is typically achieved using an optimizer that seeks to minimize a loss function through backpropagation. The role of the loss function is to measure the ability of the algorithm to model the given data (eg, to identify renal stones) and its value is used to update the network weights in order to minimize the error. To investigate the generalizability of the NN, it should be validated using different data from the data used for training. The validation loss is monitored during training: as the network improves, the validation error decreases with the training error. However, a common problem during training is overfitting (Fig. 1B), which is typically a result of the model memorizing the training data [19]. The result is a model that does not learn generalizable features, often identified by divergence of the validation loss. To prevent this, different training strategies are employed, such as early stopping and regularization. More importantly, a third independent data set, often referred to as the test set, is needed and used after the training procedure. The test set is used to measure the ability of the network to solve its task for unseen independent data.

A convolutional NN (CNN) is preferred for complex image analysis [20]. CNNs are built to first identify features of low complexity, and then find features of higher complexity in deeper layers [20]. Convolution operations identify the essential features of the input (eg, lines or circles) and give outputs called feature maps. Pooling operations then downsample (reduce the resolution of) the feature maps to reduce the need for computational power in subsequent operations. Two of the pooling operations most often used are maximum pooling and average pooling, as explained in Figure 1C. When an algorithm performs segmentation of an image, it partitions it into semantic objects [20], such as determining which part of an image depicts a urinary stone [16]. Different CNNs have been constructed for segmentation purposes, one example of which is U-Net [20]. The first U-Net stage is downsampling, in which convolutional layers identify image features, while maximum pooling operators downsample the feature maps. In the last stage, which is upsampling, the feature maps are upsampled by upsampling operators and combined with copies of symmetric feature maps from the downsampling stage [20]. With these crossover connections, high-resolution features are preserved, as demonstrated in Figure 2.
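The maximum and average pooling operations mentioned in the Introduction can be illustrated with a small sketch. It is an illustration only and is not taken from the authors' code: the function name `pool2d` and the 4 × 4 example feature map are assumptions, and real U-Net implementations use a deep learning framework instead of nested lists.

```python
from statistics import mean

def pool2d(feature_map, size, reduce_fn):
    """Downsample a 2D list by applying reduce_fn to non-overlapping size x size blocks."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one block and reduce them to a single number.
            block = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(block))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4 x 4 feature map, chosen only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 9, 1],
    [4, 6, 3, 3],
]

max_pooled = pool2d(feature_map, 2, max)   # highest value of each 2 x 2 block
avg_pooled = pool2d(feature_map, 2, mean)  # average value of each 2 x 2 block
```

Both calls halve the resolution in each direction; only the rule for summarizing a block differs.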

2. Patients and methods

Two-dimensional ultrasound images were analyzed to estimate the hit rate for operator-controlled ESWL and test the U-Net performance. To obtain images, a frame-grabber was attached to the ESWL machine (PiezoLith 3000, Richard Wolf GmbH, Knittlingen, Germany) for capture of inline real-time ultrasound images during ESWL. Each video was 30 min long and 5-min video sequences were randomly chosen for annotation. The annotator extracted ultrasound samples to label each frame as either “focus” when the stone was in the focal zone (FZ) or “out of focus” when the stone was not in the FZ (Fig. 3A). This process was carried out using an annotation tool (Fig. 3B). As a stone is usually in the FZ or out of the FZ for more than two consecutive frames, the annotation process was simplified by labeling only the transition points for intervals of frames. For example, if the first frame is labeled as “in focus” and the transition to “out of focus” occurs in the tenth frame, then all frames from the start until the tenth frame are classified as “in focus”.
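The transition-point labeling described above can be sketched as follows. This is a minimal illustration, not the annotation tool itself: the function name `expand_transition_labels` is hypothetical, frames are indexed from zero, and the sketch assumes that the frame at which a transition is marked already carries the new label.

```python
def expand_transition_labels(transitions, total_frames):
    """Expand (frame_index, label) transition points into one label per frame.

    transitions must be sorted by frame_index and start at frame 0.
    Labels: 1 = in focus, 0 = out of focus.
    """
    labels = []
    for k, (start, label) in enumerate(transitions):
        # Each label holds until the next transition point or the end of the clip.
        end = transitions[k + 1][0] if k + 1 < len(transitions) else total_frames
        labels.extend([label] * (end - start))
    return labels

# First frame in focus, transition to out of focus at the tenth frame (index 9),
# in a hypothetical clip of 15 frames.
labels = expand_transition_labels([(0, 1), (9, 0)], total_frames=15)
frames_in_focus = sum(labels)
```

Summing the expanded labels gives the number of in-focus frames, which is the quantity the hit rate calculation needs.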

During annotation we found that some stones were not visible in the ultrasound images, and these patients (cases 1, 3, and 9) were not included in the analysis of the hit rate for operator-controlled ESWL. In total, 731 frames were annotated directly, leading to a total of 23 212 frames. As the ultrasound device captures 15 frames/s, we ended up with 26 min of annotated ultrasound video, representing an average of 3.2 min for each patient. In addition, a second annotator delineated the kidney and renal stones in arbitrary frames for all patients. This resulted in binary masks for kidneys and kidney stones from a total of 57 images.

To test a standard U-Net convolutional network in kidney stone segmentation, it was trained using the delineated images. For training and validation of the network, we provided annotations of both the kidney and the kidney stone. Marking the kidney gave the algorithm a reference point or contextual information for where the stone should be, as a kidney stone remains approximately in the same position inside the kidney throughout the treatment. We conducted patient-based cross-validation. A total of 11 models were created by training on frames from nine patients and validating on frames from one patient. Of these 11 models, eight were tested on the same 23 212 frames annotated as in focus or out of focus in eight patients.
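The patient-based cross-validation described above can be sketched as a set of splits in which each patient serves once as the test patient. The text does not state how the validation patient was chosen for each model, so the rule used here (the next patient in the list) is purely an assumption, as is the function name `patient_folds`.

```python
def patient_folds(patient_ids):
    """Build one train/validation/test split per patient, with no patient shared between sets."""
    folds = []
    n = len(patient_ids)
    for i, test_id in enumerate(patient_ids):
        # Assumption: the validation patient is simply the next one in the list.
        val_id = patient_ids[(i + 1) % n]
        train_ids = [p for p in patient_ids if p not in (test_id, val_id)]
        folds.append({"train": train_ids, "val": val_id, "test": test_id})
    return folds

folds = patient_folds(list(range(1, 12)))  # patients 1 to 11
```

Splitting by patient instead of by frame keeps frames from the same patient out of both the training and test data, which would otherwise inflate the measured performance.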

Fig. 1. (A) Example of the architecture of a simple neural network comprising an input layer with two nodes, a hidden layer with three nodes, and an output layer with two nodes. Created with Inkscape. (B) A graph describing overfitting. The training error continually decreases during training, eventually reaching zero if the model is trained for long enough. When overfitting starts, the validation error will start to increase because the model is getting worse at generalizing. The optimal stopping time is the lowest point on the validation curve. Based on a graph by Tretyakov [25]. (C) Image from Yani et al [26] (Creative Commons Attribution 3.0 license) showing that maximum pooling and average pooling downsample the input. In maximum pooling, the input is divided into parts and the highest value for each part gives the output. In average pooling, the average value for each part gives the output.

The first outcome we wanted to investigate was the hit rate for operator-controlled ESWL continuously firing 90 pulses/min. The hit rate refers to the percentage of shockwaves that hit the stone, in this study defined as a shot for which more than 50% of the stone is in the FZ. To calculate this we needed to know the number of frames in which the stone was in focus out of a certain number of frames. Each frame was manually assigned a label of 0 (out of focus) or 1 (focus) by one observer and the sum of the labels gives the number of frames for which the stone is in the FZ. Using the R environment for statistical programming (www.r-project.org), the median hit rate for each patient with 95% confidence interval (CI) was estimated using bias-corrected and accelerated bootstrapping to evaluate the robustness of the results beyond sample estimates. The hit rate distribution among patients was examined by producing a histogram, box plot, and a normal Q-Q plot in SPSS, and by performing a Shapiro-Wilk test and analysis of kurtosis and skewness.
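The hit definition used in this study (more than 50% of the stone inside the FZ) can be expressed on binary masks as in the sketch below. It is an illustration under stated assumptions: masks are nested lists of 0 and 1, the function names and the example masks are hypothetical, and the strict inequality follows the wording "more than 50%".

```python
def stone_fraction_in_focal_zone(stone_mask, focal_zone_mask):
    """Return the fraction of stone pixels inside the focal zone, or None if there is no stone."""
    stone_pixels = 0
    overlap_pixels = 0
    for stone_row, fz_row in zip(stone_mask, focal_zone_mask):
        for stone, fz in zip(stone_row, fz_row):
            stone_pixels += stone
            overlap_pixels += stone * fz  # counts pixels that are both stone and focal zone
    if stone_pixels == 0:
        return None  # no stone in this frame
    return overlap_pixels / stone_pixels

def is_hit(stone_mask, focal_zone_mask, threshold=0.5):
    """A frame counts as in focus when more than `threshold` of the stone lies in the focal zone."""
    fraction = stone_fraction_in_focal_zone(stone_mask, focal_zone_mask)
    return fraction is not None and fraction > threshold

# Hypothetical 4 x 4 masks for illustration only.
focal_zone = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
stone_mostly_inside = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]  # 3 of 4 stone pixels lie in the focal zone
stone_mostly_outside = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]  # 1 of 4 stone pixels lies in the focal zone
```

Returning `None` for an empty stone mask keeps frames without a detected stone separate from frames classified as out of focus.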

A χ² test for goodness of fit was performed manually to determine whether the hit rates were uniformly distributed and to ultimately decide whether pooling was appropriate or not. A p value <0.05 was regarded as statistically significant and would lead to rejection of the null hypothesis that the hit rate distribution is uniform. The median hit rate with 95% CI was calculated on a frame level if pooling was appropriate, or on a patient level if pooling was inappropriate. The overall median hit rate with 95% CI was estimated via bias-corrected and accelerated bootstrapping.
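The manual χ² goodness-of-fit calculation can be retraced with the per-patient frame counts listed in Table 3. The sketch below assumes that the statistic is computed on the in-focus counts only, with the expected count for each patient equal to that patient's total frames multiplied by the pooled hit rate; the variable names are illustrative.

```python
# Frames in focus and total frames for patients 2, 4, 5, 6, 7, 8, 10, and 11 (Table 3).
in_focus = [1588, 1414, 1774, 1851, 1697, 2082, 789, 438]
totals = [2974, 2397, 2798, 3382, 3544, 3926, 3699, 492]

# Pooled hit rate across all annotated frames.
pooled_rate = sum(in_focus) / sum(totals)

# Expected in-focus frames per patient if every patient shared the pooled hit rate.
expected = [n * pooled_rate for n in totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(in_focus, expected))
degrees_of_freedom = len(in_focus) - 1

# Critical value of the chi-square distribution for 7 degrees of freedom at p = 0.05.
CRITICAL_VALUE_DF7 = 14.067
reject_uniform = chi2 > CRITICAL_VALUE_DF7
```

A statistic far above the critical value means the per-patient hit rates differ too much to be pooled on a frame level.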

To estimate the performance of the U-Net algorithm, the data were input to R to create a confusion matrix (Table 1) for which the ground truth was the annotated data. Frames for which the algorithm did not detect a stone were not included in the confusion matrix. R was then used to calculate classical metrics for the performance of classification models: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), prevalence, detection rate, detection prevalence, balanced accuracy, Youden’s J statistic, the no-information rate, and Cohen’s κ. An explanation of the values is provided in Table 2. We then estimated the treatment time for U-Net-controlled ESWL relative to operator-controlled ESWL by dividing the number of frames annotated as in focus by the number of true positives. By multiplying the relative treatment time by the number of false positives and dividing this by the number of frames annotated as out of focus, we estimated how U-Net would affect the number of mishits. Hits per minute was calculated for both operator-controlled ESWL and U-Net-controlled ESWL given a shockwave rate of 90/min. The median hit rate and 95% CI were calculated for each patient via bias-corrected and accelerated bootstrapping of 5000 samples of frames in R (Table 3).
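The classification metrics listed above follow directly from the four cells of the confusion matrix. The sketch below uses the test data counts from Table 1 and the definitions in Table 2. The authors performed these calculations in R; this Python version is only an illustration, and the Cohen's κ formula is the standard one for a 2 × 2 table, which is not spelled out in the text.

```python
# Test data from Table 1 (algorithm versus annotator).
tp, fp, fn, tn = 5987, 1961, 4700, 5792
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
prevalence = (tp + fn) / total
detection_rate = tp / total
detection_prevalence = (tp + fp) / total
balanced_accuracy = (sensitivity + specificity) / 2
youden_j = sensitivity + specificity - 1

# No-information rate: share of the larger annotated class.
no_information_rate = max(tp + fn, fp + tn) / total

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
cohen_kappa = (accuracy - expected_agreement) / (1 - expected_agreement)
```

Because the PPV is the share of algorithm-triggered shots that would land on the stone, it is the value that corresponds to the hit rate under algorithm control.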

Written permission to use anonymized ultrasound videos downloaded from patient records was obtained after evaluation by the regional ethics committee (reference number 2014/2261).

3. Results

The hit rate among patients was normally distributed, as shown in Figure 4A–D. This was supported by analysis of skewness (z = 0.005) and kurtosis (z = 1.73), and a Shapiro-Wilk test (p > 0.05). A χ² goodness-of-fit test was then performed manually and verified afterwards in R. The expected hit rate for each patient was calculated by multiplying the total number of frames for that patient by the pooled mean hit rate (50.12%). A χ² value of 927.4 with seven degrees of freedom gave a p value <0.05, meaning the null hypothesis of a uniform distribution among the patients was rejected and the data should therefore not be pooled. By bootstrapping the median hit rates for eight patients with 3000 samples in R, we found a median hit rate of 55.2% (standard deviation 18.6%, 95% CI 43.2–67.3%). We chose to bootstrap with 3000 samples on the basis of a convergence analysis of the 95th percentile, as explained in Figure 4D.
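The idea behind bootstrapping the per-patient hit rates can be sketched with a plain percentile bootstrap. This is deliberately simpler than the bias-corrected and accelerated (BCa) method the authors ran in R, so it is not expected to reproduce the reported estimate or interval exactly. The function name, the fixed seed, and the percentile indexing are assumptions made for illustration; the input values are the per-patient median hit rates from Table 3.

```python
import random
from statistics import median

# Per-patient median hit rates (%) for patients 2, 4, 5, 6, 7, 8, 10, and 11 (Table 3).
hit_rates = [53.4, 59.0, 63.4, 54.7, 47.9, 53.1, 21.3, 89.0]

def percentile_bootstrap_median(values, n_resamples=3000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median (not the BCa variant)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sketch is repeatable
    medians = sorted(
        # Resample the patients with replacement and record the median each time.
        median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[int((alpha / 2) * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

sample_median = median(hit_rates)
ci_low, ci_high = percentile_bootstrap_median(hit_rates)
```

With only eight patients the interval is wide, which matches the broad confidence interval reported for the overall hit rate.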

Fig. 2. The original U-Net architecture created by Ronneberger et al [27]. Blue rectangles represent feature maps, while white rectangles represent feature maps copied via crossover connections. The arrows denote operators (dark and light blue = convolution; grey = crossover connection; red = maximum pooling; green = upsampling).

Fig. 3. (A) Example of a frame for which the model reported that the stone was in focus, as ≥50% of the predicted stone (red) was within the focal zone (yellow). (B) Screenshot of the annotation tool. The ultrasound video with crosshairs is shown to the right, and a slider is used to go through the frames. To annotate a frame, the annotator clicks “Select frame for annotation” and chooses to label the frame as either “focus” or “out of focus”. If the stone is in focus for this frame, the annotator then continues the video and stops to label the first frame for which the stone goes out of focus. The frames between these two labels are automatically labeled “focus”. The green and red rectangles represent the frames labeled by the annotator.

The algorithm was unable to find a stone in 20.6% of the frames, so they were not included in the analysis. For the remaining 18 440 frames, the degree of overlap between the predicted stone area and the FZ was calculated. Overlap of ≥50% was considered “in focus”. The test results were organized in a confusion matrix in R using the annotator as the ground truth and the performance was calculated (Table 1). According to the annotator, 58.0% (prevalence) of the frames had stones in focus (Table 4). The accuracy of the algorithm was 63.9%, meaning that it correctly classified 63.9% of the frames as either “in focus” or “out of focus”. Of the frames with stones in focus, the algorithm was able to classify approximately half as “in focus”, as the sensitivity was 56.0%. The algorithm was better at classifying the stones that were “out of focus”, with specificity of 74.7%. The PPV (the proportion of frames classified by the algorithm as “in focus” that were in focus according to the annotator) was 75.3% and the NPV (the proportion of frames classified by the algorithm as “out of focus” that were out of focus according to the annotator) was 55.2%. Note that the PPV corresponds to the hit rate if the lithotripter fires shockwaves in accordance with the algorithm. The detection rate was 32.5%, while the detection prevalence was considerably higher at 43.1%, indicative of a substantial number of false positives (when AI classifies a frame as “in focus” when the stone is actually “out of focus”). With a Youden’s J statistic of 30.7% (criterion: >0), Cohen’s κ of 0.2931 (criterion: >0), and a no-information rate of 58.0% (lower than accuracy), the algorithm performance is better than randomly guessing whether stones are in or out of focus, suggesting that it can correctly track kidney stones in ultrasound images. The treatment time relative to operator-controlled ESWL was 1.94 (11 633/5987), while the number of mishits was 32.9% ([1.94 × 1961]/[23 212 – 11 633]) of that for operator-controlled ESWL. Operator-controlled ESWL hits the stone 45 times per minute (90/min × 11 633/23 212), while U-Net-controlled ESWL hit the stone 23 times per minute (90/min × 5987/23 212).
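The treatment time, mishit, and hits-per-minute estimates reported in the Results can be retraced from the frame counts and the confusion matrix. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
frames_total = 23212        # all annotated frames (Table 3)
frames_in_focus = 11633     # frames annotated as in focus (Table 3)
true_positives = 5987       # Table 1
false_positives = 1961      # Table 1
shock_rate_per_min = 90

# Time needed to deliver the same number of hits when shots are fired only on true positives.
relative_treatment_time = frames_in_focus / true_positives

# Mishits under algorithm control relative to mishits under operator control.
frames_out_of_focus = frames_total - frames_in_focus
relative_mishits = relative_treatment_time * false_positives / frames_out_of_focus
mishit_reduction = 1 - relative_mishits

# Hits per minute at a constant firing rate of 90 shockwaves/min.
operator_hits_per_min = shock_rate_per_min * frames_in_focus / frames_total
unet_hits_per_min = shock_rate_per_min * true_positives / frames_total
```

The lower number of hits per minute under algorithm control is the reason the relative treatment time is close to 2, while the number of mishits falls to roughly one-third.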

4. Discussion

Our findings suggest that there is significant potential to optimize the ESWL hit rate, as we estimated that an operator-controlled hit rate of 55.2% could be improved to 75.3% using a U-Net neural network to control ESWL. The total number of shockwaves missing the stone would be reduced to approximately one-third, ultimately making the procedure safer for patients.

There are several limitations and weaknesses to the way in which we estimated the hit rate. First, the annotator (a medical student) was inexperienced in ultrasound image interpretation; and second, identifying the exact borders of the stone was difficult because of low image resolution, which we experienced as a significant issue during annotation. The resolution was low because of the quality of the probe-scanner system itself and because the probe had to be retracted during shockwave firing. A future solution could be to register pre-intervention computed tomography (CT) images with the ultrasound images, which would probably make it easier for the annotator to make correct annotations by suggesting the stone position relative to the kidney.

Another problem is that the ultrasound images we sampled were from the first 5 min of treatment. During treatment the stone is progressively fragmented and therefore becomes more difficult to identify (also true for fluoroscopy) and to subsequently hit, so the samples we used are not representative of the whole treatment course.

However, when a stone becomes too difficult to identify it is not relevant for our analysis, as the annotator cannot decide whether the stone is in focus or not. Estimated hit rates between patients were normally distributed, suggesting that they are representative. Our definition of a shockwave hit as ≥50% overlap between the stone and the FZ may not be optimal, as marginal hits may also contribute to fragmentation, resulting in underestimation of the hit rate. A bias might have been introduced when we excluded patients 1, 3, and 9 because of the lack of visibility of their stones on ultrasound. For operator-controlled ESWL, the operator would also not be able to localize their stones on ultrasound, so periodic fluoroscopy would be needed. Consequently, the operator has less control of the real-time location of a stone and it would probably spend more time out of focus. When images from these patients are left out, the operator-controlled hit rate might be overestimated.

Table 1. Confusion matrix design and test data (images annotated as in focus or out of focus) organized in a confusion matrix

                       In focus (annotator)   Out of focus (annotator)   Total
Design
  In focus (AI)        TP                     FP                         TP + FP
  Out of focus (AI)    FN                     TN                         FN + TN
  Total                TP + FN                FP + TN                    TP + FP + FN + TN
Test data
  In focus (AI)        5987                   1961                       7948
  Out of focus (AI)    4700                   5792                       10 492
  Total                10 687                 7753                       18 440

AI = artificial intelligence; TP = true positive; FP = false positive; FN = false negative; TN = true negative.

Table 2. Overview of the most important statistics describing performance of a classifier

Statistic                           Definition
Accuracy                            $(TP + TN)/(TP + FP + TN + FN)$
Sensitivity                         $TP/(TP + FN)$
Specificity                         $TN/(TN + FP)$
Positive predictive value (PPV)     $TP/(TP + FP)$
Negative predictive value (NPV)     $TN/(TN + FN)$
Prevalence                          $(TP + FN)/(TP + FP + FN + TN)$
Detection rate                      $TP/(TP + FP + FN + TN)$
Detection prevalence                $(TP + FP)/(TP + FP + FN + TN)$
Balanced accuracy                   $(\text{Sensitivity} + \text{Specificity})/2$
Youden’s J statistic                $\text{Sensitivity} + \text{Specificity} - 1$
No-information rate:
  If $(TP + FN) > (FP + TN)$        $(TP + FN)/(TP + FP + FN + TN)$
  If $(FP + TN) > (TP + FN)$        $(FP + TN)/(TP + FP + FN + TN)$

The training and performance testing of the algorithm also have several limitations and weaknesses. The algorithm was trained and validated on data without crosshairs annotated by a second inexperienced observer. Thus, the training set might contain false-positive stones, limiting the potential of the algorithm to learn stone-tracking correctly. Some of the training and validation annotations were performed on ultrasound images in which the stone was difficult to identify (including patients 1, 3, and 9), increasing the probability of false-positive stones.

The algorithm was only trained on 57 images from a total of 11 patients. The training set was clearly not large enough for optimizing the algorithm efficacy, and the algorithm has significant potential for improvement if more patients are included and an experienced radiologist uses CT to provide accurate annotations. As in the estimation of the operator-controlled hit rate, estimation of overlap is also an issue in the performance test. The test set was annotated by a medical student who assessed whether the stone was in focus or not via a semi-subjective visual evaluation of the stone and FZ overlap. By contrast, the algorithm was trained on images in which the stones were delineated. When the stone edges are marked by hand, computer software can calculate the stone and FZ overlap much more accurately than a human visually evaluating the overlap. As a result, although the test set annotator and the algorithm might be in perfect agreement over the location of a stone in a test set image, they might estimate different degrees of stone-FZ overlap, resulting in disagreement on whether a stone is in focus or not. This especially relates to stones that are close to 50% within the FZ. In these cases, even small differences in estimation of the overlap might influence the decision on “in focus” versus “out of focus”. This results in more uncertainty in the metrics describing the algorithm performance.

Using two different inexperienced annotators has some additional weaknesses. The algorithm first learns what one of the annotators interprets as stones and is then tested on what the other annotator interprets as stones. One problem here is interobserver variability, which we confirmed was significant: comparison of the two annotators revealed a mismatch rate of 37.5%. This means that the algorithm will never perform perfectly on the test set, as the annotators for the training and test sets disagreed over the definition of stone borders. In fact, accounting for interobserver variability instead of only using one observer strengthens the confidence in our metrics indicating that the algorithm has stone-tracking ability.

Should the algorithm become better than the test set annotator at identifying stones, the metrics would underestimate the performance of the algorithm. To see if the algorithm performed significantly better than the metrics implied, we visually examined several of the ultrasound videos of algorithm-predicted stones and tested the trained algorithm on the same type of annotations used in the training set. After reviewing the results, the idea that the algorithm clearly outperformed the test set annotations was rejected.

We discussed treating frames in which the algorithm did not detect a stone as though the algorithm reported that the stone was “out of focus”. This would lead to improvements in all the AI performance parameters except for a decrease in sensitivity (51.2%). Most notably, we saw increases in accuracy to 67.0%, specificity to 83.0%, and Youden’s J statistic to 34.2%. The argument for analyzing the data in this way is that stones that are not detected will not be shot at, resulting in a lower risk of treatment complications. Having said that, we chose not to do this, as we could not control for whether frames in which stones were not detected by the algorithm had stones or not, which would have resulted in over-rating of the tracking ability of the algorithm. In addition, it would not affect the PPV, which is arguably the most important parameter when analyzing the algorithm performance at the current state of the project.

We were able to identify three studies that estimated ESWL hit rates between 40% and 60% [13,21,22]. Our estimated hit rate of 55.2% is at the higher end of the range compared to the other studies, but our wide 95% CI (43.2–67.3%) fits well with their observations. Different definitions of the hit rate and small sample sizes limit the generalizability of these studies.

Table 3. Median hit rates for operator-controlled extracorporeal shock wave lithotripsy for each patient as estimated via bootstrapping

Patient   Frames in focus (n)   Total frames (N)   Median hit rate, % (95% CI)
1         –                     –                  –
2         1588                  2974               53.4 (51.6–55.1)
3         –                     –                  –
4         1414                  2397               59.0 (57.0–61.0)
5         1774                  2798               63.4 (61.6–65.2)
6         1851                  3382               54.7 (53.0–56.4)
7         1697                  3544               47.9 (46.2–49.5)
8         2082                  3926               53.1 (51.5–54.6)
9         –                     –                  –
10        789                   3699               21.3 (20.0–22.7)
11        438                   492                89.0 (86.2–91.7)
Total     11 633                23 212             55.2 (43.2–67.3)

CI = confidence interval.


To date there are no publications on ML algorithms used to localize urinary stones in ultrasound images for ESWL treatment. Singla et al [23] tried to localize urinary stones using fluoroscopy during ESWL treatment with a different ML algorithm, RetinaNet, and achieved precision of 70% ± 10%.

Our algorithm can be implemented by stopping the lithotripter from firing shockwaves when the stone is out of focus. An algorithm similar to that used by Singla et al [23] could also be added to create a pipeline that uses both ultrasound and fluoroscopy, which could potentially further improve stone-tracking ability. It has been shown that treatment pulse rates of 60–90 pulses/min yield the best stone-free rate, but it should be noted that this rate is based on testing of different constant rates, regardless of whether the stone is within the focal zone or not [24]. Current ESWL treatment routines use approximately 3000–4000 pulses per treatment at a hit rate of 50%, resulting in approximately 2000 hits. Algorithm-controlled ESWL may only require 2000 shockwaves, which would thus lead to a reduction in treatment time. In fact, shockwave rates could be increased so that when the stone passes through the focal zone it could be hit multiple times. Previous unpublished findings by our group showed that the stone is relatively stationary at the end of expiration (Fig. 5) [25]. This physiological fact could be much better utilized in algorithm-controlled ESWL, with shockwaves fired at a higher rate while the stone is stationary inside the focal zone at the end of each expiration. The algorithm accounts for the entire kidney image, and not just the stone itself, so another potential benefit of algorithm-controlled ESWL is that the hit rate can be maintained in the later stages of the treatment course when the stone often becomes unclear on both ultrasound and fluoroscopy. Before the algorithm is implemented in clinical practice it should be trained and tested on more and higher-quality annotations, preferably by a uroradiologist using information from pretreatment CT. Annotation of training sets should also be carried out at several different institutions to improve the generalizability of the ML algorithm.

Fig. 4. (A) Histogram of the hit rate for operator-controlled extracorporeal shock wave lithotripsy (ESWL). The distribution resembles a normal distribution, albeit with a degree of kurtosis. However, the kurtosis z value was not statistically significant. (B) Box plot of the hit rate (y-axis) for operator-controlled ESWL (x-axis), showing an approximately symmetrical distribution consistent with a normal distribution. Patients 10 and 11 are outliers. (C) A normal Q-Q plot of the hit rate for operator-controlled ESWL. The points are close to the line, which typically indicates a normal distribution. Nevertheless, there seems to be a trend for how the points are organized around the line, suggesting the distribution might not actually be normal. (D) Convergence of the 95th percentile for the hit rate. The relative difference between two consecutive values tends towards zero as the number of bootstrapping iterations increases. It is only possible to extract 6435 different samples from the original sample size of eight. This limits how many samples we can bootstrap, as increasing the number of bootstrap samples increases the likelihood of extracting the same sample multiple times. To find the optimal number of bootstrap samples, we explored how many bootstrap samples it would take to stabilize the 95th percentile. This is shown in the graph, with number of bootstrap samples on the x-axis and the change of the 95th percentile in percentage on the y-axis. It is evident that the change is <1% after bootstrapping of 2000–3000 samples, indicating that the optimal number of bootstrap samples is 2000–3000.
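The figure of 6435 distinct bootstrap samples quoted in the caption of Figure 4D can be checked with a short calculation. Resampling n observations with replacement, and ignoring the order in which they are drawn, gives C(2n − 1, n) distinct samples. The function name below is illustrative.

```python
from math import comb

def distinct_bootstrap_samples(n):
    """Number of distinct resamples of size n drawn with replacement from n observations."""
    # Multisets of size n from n items: C(2n - 1, n).
    return comb(2 * n - 1, n)

n_patients = 8
n_distinct = distinct_bootstrap_samples(n_patients)
```

For the eight patients in the hit rate analysis this gives C(15, 8), which is why drawing many more than a few thousand bootstrap samples mostly repeats samples that have already been seen.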


5. Conclusions

We estimated an operator-controlled ESWL hit rate of 55.2% (95% CI 43.2–67.3%), which means that approximately half of the shockwaves applied miss the stone. Algorithm-controlled ESWL increased the hit rate to approximately 75.3% and reduced the total number of shockwaves missing the stone by approximately 67.1%. Our results indicate that a U-Net neural network trained and tested on better annotations will be able to improve ESWL efficacy.

Author contributions: Carl-Jørgen Arum had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Muller, Østvik, Langø, Arum.

Acquisition of data: Muller, Abildsnes, Østvik, Kragset, Gangås, Birke.

Analysis and interpretation of data: Muller, Abildsnes, Østvik, Kragset, Langø, Arum.

Drafting of the manuscript: Muller, Abildsnes, Østvik, Kragset, Langø, Arum.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Muller, Østvik.

Obtaining funding: Langø, Arum.

Administrative, technical, or material support: None.

Supervision: Langø, Arum.

Other: None.

Financial disclosures: Carl-Jørgen Arum certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: This project was supported by the Central Norway Regional Health Authority, Stiftelsen SINTEF, the Faculty of Medicine and Health Sciences, and the Norwegian National Advisory Unit for Ultrasound and Image-Guided Therapy at St. Olavs Hospital in Trondheim, Norway. The sponsors played no direct role in the study.

References

[1] Moe OW. Kidney stones: pathophysiology and medical management. Lancet 2006;367:333–44.

[2] Ziemba JB, Matlaga BR. Epidemiology and economics of nephrolithiasis. Investig Clin Urol 2017;58:299–306.

[3] Scales Jr CD, Lai JC, Dick AW, et al. Comparative effectiveness of shock wave lithotripsy and ureteroscopy for treating patients with kidney stones. JAMA Surg 2014;149:648–53.

[4] Liu Y, Chen Y, Liao B, et al. Epidemiology of urolithiasis in Asia. Asian J Urol 2018;5:205–14.

[5] Raghallaigh HN, Ellis D, Symes A. Geographical and prevalence trends in urolithiasis in England: a ten-year review. Eur Urol Suppl 2017;16(3):e12.

[6] Chaussy C, Brendel W, Schmiedt E. Extracorporeally induced destruction of kidney stones by shock waves. Lancet 1980;2:1265–8.

[7] Neisius A, Lipkin ME, Rassweiler JJ, Zhong P, Preminger GM, Knoll T. Shock wave lithotripsy: the new phoenix? World J Urol 2015;33:213–21.

[8] Srisubat A, Potisat S, Lojanapiwat B, Setthawong V, Laopaiboon M. Extracorporeal shock wave lithotripsy (ESWL) versus percutaneous nephrolithotomy (PCNL) or retrograde intrarenal surgery (RIRS) for kidney stones. Cochrane Database Syst Rev 2014;2014:CD007044.

[9] Aboumarzouk OM, Kata SG, Keeley FX, McClinton S, Nabi G. Extracorporeal shock wave lithotripsy (ESWL) versus ureteroscopic management for ureteric calculi. Cochrane Database Syst Rev 2012;2012:CD006029.

[10] Geraghty RM, Jones P, Somani BK. Worldwide trends of urinary stone disease treatment over the last two decades: a systematic review. J Endourol 2017;31:547–56.

Table 4. Calculated performance statistics for a U-Net model when tested on ultrasound images annotated as in or out of focus

Statistic                           Value
Accuracy (%)                        63.9
Sensitivity (%)                     56.0
Specificity (%)                     74.7
Positive predictive value (%)       75.3
Negative predictive value (%)       55.2
Prevalence (%)                      58.0
Detection rate (%)                  32.5
Detection prevalence (%)            43.1
Balanced accuracy (%)               65.4
Youden’s J statistic (%)            30.7
No-information rate (%)             58.0
Cohen’s κ                           0.2931

Fig. 5. A graph by Kragset [28] demonstrating the three-dimensional movement of a urinary stone during one respiratory cycle. Each dot represents the stone’s location at a specific time point. When lines between the dots are long, the movement is large. The dots at the end of expiration are very close to each other, meaning the stone is almost standing still; this is the optimal time interval to target the stone.
