
From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic




Review


Francois Balloux,1,5,* Ola Brønstad Brynildsrud,2,5 Lucy van Dorp,1,5 Liam P. Shaw,1 Hongbin Chen,1,3 Kathryn A. Harris,4 Hui Wang,3 and Vegard Eldholm2

Hospitals worldwide are facing an increasing incidence of hard-to-treat infections. Limiting infections and providing patients with optimal drug regimens require timely strain identification as well as virulence and drug-resistance profiling. Additionally, prophylactic interventions based on the identification of environmental sources of recurrent infections (e.g., contaminated sinks) and reconstruction of transmission chains (i.e., who infected whom) could help to reduce the incidence of nosocomial infections. WGS could hold the key to solving these issues. However, uptake in the clinic has been slow. Some major scientific and logistical challenges need to be solved before WGS fulfils its potential in clinical microbial diagnostics. In this review we identify major bottlenecks that need to be resolved for WGS to routinely inform clinical intervention and discuss possible solutions.

The Lure of WGS in Clinical Microbiology

Thanks to progress in high-throughput sequencing technologies over the last two decades, generating microbial genomes is now considered neither particularly challenging nor expensive. As a result, whole-genome sequencing (WGS) (see Glossary) has been championed as the obvious and inevitable future of diagnostics in multiple reviews and opinion pieces dating back to 2010 [1–4]. Despite enthusiasm in the community, WGS diagnostics has not yet been widely adopted in clinical microbiology, which may seem at odds with the current suite of applications for which WGS has huge potential, and which are already widely used in the academic literature. Common applications of WGS in diagnostic microbiology include isolate characterization, antimicrobial resistance (AMR) profiling, and establishing the sources of recurrent infections and between-patient transmissions. All of these have obvious clinical relevance and provide case studies where WGS could, in principle, provide additional information and even replace the knowledge obtained through standard clinical microbiology techniques. This review reiterates the potential of WGS for clinical microbiology, but also its current limitations, and suggests possible solutions to some of the main bottlenecks to routine implementation. In particular, we argue that applying existing WGS pipelines developed for fundamental research is unlikely to produce the fast and robust tools required, and that new dedicated approaches are needed for WGS in the clinic.

Highlights

In principle, WGS can provide highly relevant information for clinical microbiology in near-real-time, from phenotype testing to tracking outbreaks.

However, despite this promise, the uptake of WGS in the clinic has been limited to date, and future implementation is likely to be a slow process.

The increasing information provided by WGS can cause conflict with traditional microbiological concepts and typing schemes.

Decreasing raw sequencing costs have not translated into decreasing total costs for bacterial genomes, which have stabilised.

Existing research pipelines are not suitable for the clinic, and bespoke clinical pipelines should be developed.

1 UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK

2 Infectious Diseases and Environmental Health, Norwegian Institute of Public Health, Lovisenberggata 8, Oslo 0456, Norway

3 Department of Clinical Laboratory, Peking University People's Hospital, Beijing, 100044, China

4 Great Ormond Street Hospital NHS Foundation Trust, Department of Microbiology, Virology & Infection Prevention & Control, London WC1N 3JH, UK

5 These authors made equal contributions

*Correspondence: [email protected] (F. Balloux).

Strain Identification through WGS

At the most basic level, WGS can be used to characterize a clinical isolate, informing on the likely species and/or subtype and allowing phylogenetic placement of a given sequence relative to an existing set of isolates. WGS-based strain identification gives a far superior resolution compared to genetic marker-based approaches such as multilocus sequence typing (MLST) and can be used when standard techniques such as pulsed-field gel electrophoresis (PFGE), variable-number tandem repeat (VNTR) profiling, and MALDI-TOF are unable to accurately distinguish lineages [5]. WGS-informed strain identification could be of particular significance for bacteria with large accessory genomes, which encompass many of the clinically most problematic bacteria, where much of the relevant genetic diversity is driven by differences in the accessory genome on the chromosome and/or plasmid carriage.

Somewhat ironically, the extremely rich information of WGS data, with every genome being unique, generates problems of its own. Clinical microbiology tends to rely on often largely ad hoc taxonomical nomenclature, such as biochemical serovars for Salmonella enterica or mycobacterial interspersed repetitive units (MIRUs) for Mycobacterium tuberculosis. While the rich information contained in WGS should in principle allow superseding traditional taxonomic classifications [6,7], defining an intuitive, meaningful and rigorous classification for genome sequences represents a major challenge. For strictly clonal species, which undergo no horizontal gene transfer (HGT), such as M. tuberculosis, it is possible to devise a ‘natural’ robust phylogenetically based classification [8]. Unfortunately, organisms undergoing regular HGT, and with a significant accessory genome, do not fall neatly into existing classification schemes. In fact, it is even questionable whether a completely satisfactory classification scheme could be devised for such organisms, as classifications based on the core genome, accessory genome, housekeeping genes (MLST), genotypic markers, plasmid sequence, virulence factors or AMR profile may all produce incompatible categories (Figure 1).

Predicting Phenotypes from WGS

Beyond species identification and characterization, genome sequences provide a rich resource that can be exploited to predict the pathogen’s phenotype. The main microbial traits of clinical relevance are AMR and virulence, but may also include other traits such as the ability to form biofilms or survival in the environment. Sequence-based drug profiling is one of the pillars of HIV treatment and has to be credited for the remarkable success of antiretroviral therapy (ART) regimes. Prediction of AMR from sequence data has also received considerable attention for bacterial pathogens but has not led to comparable success at this stage.

Resistance against single drugs can be relatively straightforward to predict in some instances. For example, the presence of the SCCmec cassette is a reliable predictor for broad-spectrum beta-lactam resistance in Staphylococcus aureus, with strains carrying this element referred to as methicillin-resistant S. aureus (MRSA). In principle, WGS offers the possibility to predict the full resistance profile to multiple drugs (the ‘resistome’). Possibly the first real attempt to predict the resistome from WGS data was a study by Holden et al. in 2013, showing that, for a large dataset of S. aureus ST22 isolates, 98.8% of all phenotypic resistances could be explained by at least one previously documented AMR element or mutation in the sequence data [9].
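The kind of statistic reported above (the share of observed phenotypic resistances for which at least one documented AMR determinant is found in the genome) can be illustrated with a minimal Python sketch. Only the link between SCCmec and beta-lactam resistance comes from the text; the other catalogue entries, the isolate records, and the function name are hypothetical and do not reproduce the data or method of Holden et al.

```python
# Hypothetical catalogue: drug (class) -> determinants documented to confer resistance.
# Only the SCCmec / beta-lactam entry is taken from the text; "drug_X" is invented.
KNOWN_DETERMINANTS = {
    "beta-lactams": {"SCCmec"},
    "drug_X": {"gene_X1", "mutation_X2"},
}


def explained_fraction(isolates, catalogue):
    """Fraction of phenotypic resistances explained by at least one catalogued determinant."""
    total = 0
    explained = 0
    for isolate in isolates:
        for drug in isolate["resistant_to"]:
            total += 1
            # A resistance counts as explained if any known determinant was detected.
            if catalogue.get(drug, set()) & isolate["detected"]:
                explained += 1
    return explained / total if total else float("nan")


# Toy isolates (hypothetical): measured resistances and elements detected in the genome.
isolates = [
    {"resistant_to": {"beta-lactams"}, "detected": {"SCCmec"}},
    {"resistant_to": {"beta-lactams", "drug_X"}, "detected": {"SCCmec"}},
    {"resistant_to": {"drug_X"}, "detected": {"gene_X1"}},
]

print(explained_fraction(isolates, KNOWN_DETERMINANTS))
```

In this toy data set, one of the four phenotypic resistances has no matching determinant, which is the kind of discrepancy a curated reference database is meant to shrink.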

Since then, several tools have been developed for the prediction of resistance profiles from WGS. These include those designed for prediction of resistance phenotype from acquired AMR genes, including ResFinder [10] and ABRicate (https://github.com/tseemann/abricate), together with those also taking into account point mutations in chromosome-borne genes such as Arg-Annot [11], the Sequence Search Tool for Antimicrobial Resistance (SSTAR) [12], and the Comprehensive Antibiotic Resistance Database (CARD) [12]. Of these, ResFinder and CARD can be implemented as online methods that, dependent on user traffic, can be considerably slower than most other tools that only use the command-line. They are, however, superior in terms of broad usability and are more intuitive than, for example, the graphical user interface of SSTAR. Other tools exist for richer species-specific characterization such as PhyResSE [13] and PATRIC-RAST [14]. Further tools have been developed to predict phenotype directly from unassembled sequencing reads, bypassing genome assembly [15,16].

Glossary

Accessory genome: the variable genome consisting of genes that are present only in some strains of a given species. Many of the organisms representing the most severe AMR threats are characterised by large accessory genomes containing important components of clinically relevant phenotypic diversity.

Antimicrobial resistance (AMR): the ability of a microorganism to reproduce in the presence of a specific antimicrobial compound. Also referred to as antibiotic resistance (ABR or AR). The sum of the detected AMR genes in a sequenced isolate is sometimes referred to as the resistome.

Horizontal gene transfer (HGT): the transmission of genetic material laterally between organisms outside vertical parent-to-offspring inheritance, including across species boundaries. Genetic elements related to clinically relevant phenotypes such as AMR and virulence are often transmitted via HGT.

K-mer: a string of length k contained within a larger sequence. For example, the sequence ‘ATTGT’ contains two 4-mers: ATTG and TTGT. The analysis of the k-mer content of raw sequencing reads allows for rapid characterization of the genetic difference between isolates without the need for genome assembly.

Multilocus sequence typing (MLST): a scheme used to assign types to bacteria based on the alleles present at a defined set of chromosome-borne housekeeping genes. Also referred to as sequence typing (ST).

Phylogenetic tree: a representation of inferred evolutionary relationships based on the genetic differences between a set of sequences. Also referred to as a phylogeny.

Transmission chain: the route of transmission of a pathogen between hosts during an outbreak. This can often be characterized using WGS compared to traditional epidemiological inference based on, for example, tracing contacts between patients.

Virulence: broadly, a pathogen's ability to cause damage to its host, for example through invasion, adhesion, immune evasion, and toxin production. However, virulence is currently loosely defined by indirect proxies either phenotypically (e.g., through serum-killing assays) or genetically (e.g., by the presence of genes involved in capsule synthesis or hypermucoviscosity).

Whole-genome sequencing (WGS): the process of determining the complete nucleotide sequence of an organism’s genome. This is generally achieved by shotgun sequencing of short reads that are either assembled de novo or mapped onto a high-quality reference genome.
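The Glossary's k-mer example (‘ATTGT’ contains the two 4-mers ATTG and TTGT) can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the second toy sequence are assumptions made for illustration, and real read-based tools additionally handle reverse complements, sequencing errors, and very large inputs.

```python
def kmers(sequence, k):
    """Return every length-k substring of `sequence`, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_kmer_fraction(seq_a, seq_b, k):
    """Share of distinct k-mers found in both sequences (a Jaccard index)."""
    a, b = set(kmers(seq_a, k)), set(kmers(seq_b, k))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(kmers("ATTGT", 4))  # ['ATTG', 'TTGT']
# Hypothetical second sequence differing at the last base.
print(shared_kmer_fraction("ATTGT", "ATTGA", 4))
```

Comparing k-mer sets in this way needs neither an alignment nor an assembly, which is why the approach is fast.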

It has been proposed that WGS-based phenotyping might, in some instances, be as accurate as, if not more accurate than, traditional phenotyping [16–19]. However, it is probably no coincidence that the most successful applications to date have primarily been on M. tuberculosis and S. aureus, which are characterised by essentially no, or very limited, accessory genomes, respectively. Other successful examples include streptococcal pathogens, where WGS-based predictions and measured phenotypic resistance show good agreement even in large and diverse samples of isolates [20,21]. On the whole, however, predicting comprehensive AMR profiles in organisms with open genomes, such as Escherichia coli, where only 6% of genes are found in every single strain [22], is challenging and requires extremely extensive and well curated reference databases.

The transition to WGS might appear relatively straightforward if viewed as merely replacing PCR panels which are already used when traditional phenotyping can be cumbersome and unreliable. However, to put the problem in context, there are over 2000 described β-lactamase gene sequences responsible for multiresistance to β-lactam antibiotics such as penicillins, cephalosporins, and carbapenems [23]. Whilst β-lactam resistance in some pathogens, including S. pneumoniae, can be predicted through, for example, penicillin-binding protein (PBP) typing and machine-learning-based approaches [24], the general problem of reliably assigning resistance phenotype based on many described gene sequences is commonplace.

Figure 1. The Challenge of Classifying Organisms with Open Genomes. A hypothetical example of three closely related isolates (G1–G3) collected from the same hospital outbreak. (A) A simplified representation of their genetic makeup. The strains share most of their chromosome, but with G2 having acquired one point mutation (small black line) in one of the genes of the multilocus sequence typing (MLST) typing schemes, and thus being assigned to a different sequence type (ST); G3 also acquired an insertion on the chromosome. Point mutations on the chromosome are represented by short black lines. Additionally, all three strains share two plasmids (red and blue) carrying antimicrobial resistance (AMR) elements (shapes), and G1 has an additional private plasmid (purple). (B) The schematic grouping of these three strains based on MLST typing, chromosomal genetic distances, plasmid similarity, and AMR profile.

[Figure 1 image not reproduced. Panel B column labels: MLST scheme; Chromosome groupings; Plasmid groupings; Resistance groupings. Sequence-type labels: STX, STY.]

At this stage, many of the AMR reference databases are not well integrated or curated and have no minimum clinical standard. They often have varying predictive ranges and biases and produce fairly inaccessible output files with little guidance on how to interpret or utilise this information for clinical intervention. Perhaps because of these limitations, although of obvious benefit as part of a diagnostics platform, both awareness and uptake in the clinic have been limited.

Additionally, with some notable exceptions, such as the pneumococci [24], most AMR profile predictions from WGS data are qualitative, simply predicting whether an isolate is expected to be resistant or susceptible against a compound despite AMR generally being a continuous and often complex trait. The level of resistance of a strain to a drug can be affected by multiple epistatic AMR elements or mutations [25], the copy number variation of these elements [26], the function of the genetic background of the strain [27–29], and modulating effects by the environment [30]. The level of resistance is generally well captured by the semiquantitative phenotypic measurement minimum inhibitory concentration (MIC), even if clinicians often use a discrete interpretation of MICs into resistant/susceptible based on fairly arbitrary cut-off values. Quantitative resistance predictions are not just of academic interest. In the clinic, low-level resistance strains can still be treated with a given antibiotic but the standard dose should be increased, which can be the best option at hand, especially for drugs with low toxicity.
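The information lost when an MIC is reduced to a resistant/susceptible call can be shown with a short Python sketch. The cut-off of 2 mg/L, the example MIC values, and the convention that an isolate is called resistant when its MIC exceeds the cut-off are assumptions chosen for illustration; they are not clinical breakpoints for any real drug.

```python
def interpret_mic(mic_mg_per_l, cutoff_mg_per_l):
    """Return the binary call and how many times the MIC exceeds the cut-off."""
    call = "resistant" if mic_mg_per_l > cutoff_mg_per_l else "susceptible"
    fold_over_cutoff = mic_mg_per_l / cutoff_mg_per_l
    return call, fold_over_cutoff


# Hypothetical MICs (mg/L) measured against a hypothetical cut-off of 2 mg/L.
for mic in (0.5, 4.0, 64.0):
    print(mic, interpret_mic(mic, cutoff_mg_per_l=2.0))
```

The isolates with MICs of 4 and 64 mg/L receive the same binary call, although only the first is a plausible candidate for treatment with an increased dose.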

The majority of efforts to predict phenotypes from bacterial genomes have been on AMR profiling. Yet, some tools have also been developed for multispecies virulence profiling: the Virulence Factors Database (VFDB) [31] or VirulenceFinder [32] as well as the bespoke virulence prediction tool for Klebsiella pneumoniae, Kleborate [33]. One major challenge is that virulence is often a context-dependent trait. For example, in K. pneumoniae various imperfect proxies for virulence are used. These include capsule type, hypermucoviscosity, biofilm and siderophore production, or survival in serum-killing assays. While all of these traits are quantifiable and reproducible, and could thus in principle be predicted using WGS, it remains unclear how well they correlate with virulence in the patient. Given that virulence is one of the most commonly studied phenotypes, yet lacks a clear definition, the general problem of predicting bacterial phenotype from genotype may be substantially more complex than the special case of AMR, which is itself far from solved for all clinically relevant species.

Tracking Outbreaks and Identifying Sources of Recurrent Infections

Beyond phenotype prediction for individual isolates, WGS has allowed reconstructing outbreaks within hospitals and the community across a diversity of taxa ranging from carbapenem-resistant K. pneumoniae [34–36] and Acinetobacter baumannii [37] to MRSA [38,39], streptococcal disease [40], and Neisseria gonorrhoeae [41], amongst others. WGS can reveal which isolates are part of an outbreak lineage and, by integrating epidemiological data with phylogenetic information, detect direct probable transmission events [42–45]. Timed phylogenies, for example generated through BEAST [46,47], can provide likely time-windows on inferred transmissions, as well as dating when an outbreak lineage may have started to expand. Approaches based on transmission chains can also be used to identify sources of recurrent infections (so called ‘super-spreaders’), and do not necessarily rely on all isolates within the outbreak having been sequenced, allowing for partial sampling and analyses of ongoing outbreaks [48]. In this way WGS-based inference can elucidate patterns of infection which are impossible to recapitulate from standard sequence typing alone [35].

However, WGS-informed outbreak tracking is usually performed only retrospectively. Typically, the publication dates of academic literature relating to outbreak reconstruction lag greatly, often by at least 5 years after the initial identification of an outbreak [49,50]. Even analyses published more rapidly are generally still too slow to inform real-time interventions [38]. Some studies have shown retrospectively that near-real-time hospital outbreak reconstruction is feasible [51,52], and others have analysed ongoing outbreaks in close to real-time [53,54], but these studies are still in a minority and remain largely within the academic literature.

Some of this time-lag probably relates to the difficulty of transmission-chain reconstruction at actionable time-scales. This can be relatively straightforward for viruses with high mutation rates, small genomes, and fast and constant transmission times, such as Ebola [55] and Zika virus [56], but conversely, reconstructing outbreaks for bacteria and fungi poses a series of challenges. Available tools tend to be sophisticated and complex to implement, and the sequence data needs extremely careful quality control and curation. Unfortunately, in some cases insufficient genetic variation will have accumulated over the course of an outbreak, and a transmission chain simply cannot be inferred without this signal [57,58]. Furthermore, extensive within-host genetic diversity (typical in chronic infections) can render the inference of transmission chains intractable [59]. These complexities mean that a ‘one-size fits all’ bioinformatics approach to outbreak analyses simply does not exist.
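The problem of insufficient genetic variation can be made concrete with a toy calculation of pairwise SNP distances. The Python sketch below uses three short, invented, pre-aligned sequences and hypothetical patient labels; it is not one of the transmission-inference tools cited in the text, which model far more than raw distances.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def pairwise_snp_distances(alignment):
    """Map each pair of isolate names to its SNP distance."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned sequences from three patients in one outbreak.
alignment = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTAC",
    "patient_C": "ACGTTCGTAC",
}

for pair, distance in pairwise_snp_distances(alignment).items():
    print(pair, distance)
```

Patients A and B carry identical sequences, so the genomes alone give no signal about the order or direction of transmission between them.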

The Bonus of Improved Surveillance

One of the key promises of WGS is in molecular surveillance and real-time tracking of infectious disease. This relies on transparent and standardized data sharing of the millions of genomes sequenced each year, together with accompanying metadata on isolation host, date of sampling, and geographic location. With enough data, surveillance initiatives have the potential to identify the likely geographic origin of emerging pathogens and AMR genes, group seemingly unrelated cases into outbreaks, and clearly identify when sequences are divergent from other circulating strains. In a hospital setting, surveillance can help to detect transmission within the hospital and inflow from the community, optimize antimicrobial stewardship, and inform treatment decisions; at national and global scales, it can highlight worldwide emerging trends for which collated evidence can direct both retrospective and anticipatory policy decisions.

Amongst the most successful global surveillance initiatives and analytical frameworks are those relating specifically to the spread of viruses. Influenza surveillance is arguably the most developed, with large sequencing repositories such as the GISAID database (gisaid.org) and online data exploration and phylodynamics available through web tools such as NextFlu [60] and NextStrain (http://nextstrain.org), which also allows examination of other significant viruses including Zika, Ebola, and avian influenza. Another popular tool for the sharing of data and visualization of phylogenetic trees and their accompanying metadata is Microreact (microreact.org) [61], which also allows for interactive data querying and includes bacteria and fungi. A further tool, predominantly for bacterial data, is WGSA (www.wgsa.net). WGSA allows the upload of genome assemblies through a drag-and-drop web browser, allowing for a quick characterization of species, MLST type, resistance profile, and phylogenetic placement in the context of the existing species database based on core genes. At the time of writing WGSA comprises 20 649 genomes predominantly from S. aureus, N. gonorrhoeae, and Salmonella enterica serovar Typhi, together with Ebola and Zika viruses, all with some associated metadata.

Although an exciting initiative, WGSA and associated platforms are still a reasonably long way off characterizing all clinically relevant isolates and often rely entirely on the sequences uploaded already being assembled. More generally, the success of any WGS surveillance is dependent on the timely and open sharing of information from around the globe. While sequence data from academic publications is near systematically deposited on public sequence databases (at least upon publication), such data are near useless if the accompanying metadata (see above) are not also released, as remains the case far too often. Additionally, as more genomes are routinely sequenced in clinical settings as part of standard procedures, ensuring that the culture of sharing sequence data persists beyond academic research will become increasingly important.

Cost of WGS in the Clinic

For WGS to be routinely adopted in clinical microbiology, it needs to be cost-effective. It is commonly accepted that sequencing costs are plummeting, with the National Human Genome Research Institute (NHGRI) estimating the cost per raw megabase (Mb) of DNA sequence at 0.12 USD (www.genome.gov/sequencingcostsdata). This has led to claims that a draft bacterial genome can currently cost less than 1 USD to generate [62]. This is a misunderstanding, as one cannot simply extrapolate the cost of a bacterial genome by multiplying a high-throughput per-megabase sequencing cost by the size of its genome. For microbial sequencing, multiple samples must be multiplexed for cost efficiency, which is easier to achieve in large reference laboratories with high sample turnover. Excluding indirect costs such as salaries for personnel, preparation of sequencing libraries now makes up the major fraction of microbial sequencing costs (Figure 2).

The precipitous drop in the cost of producing raw DNA sequences in recent years (Figure 2A) mostly reflects a massive increase in output with new iterations of Illumina production machines. These numbers ignore all other costs and simply reflect output relative to the cost of the sequencing kits/cartridges. Realistic cost estimates for a microbial genome including library preparation on the best available platforms give a different picture (Figure 2B). Since the introduction of the Illumina MiSeq platform in 2011, new sequencing kits generating higher output have only marginally affected true microbial genome sequencing costs, as library preparation makes up a significant portion of the total (60 USD of a total of 74 USD for a typical bacterial genome in 2018). These costs have remained stable over time and are unlikely to go down significantly in the near future. Indeed, the market seems to be consolidating in fewer hands (e.g., represented by the procurement of KAPA by Roche in 2015), which economic theory predicts will not favor price decrease.
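The gap between the naive extrapolation and the realistic estimate can be worked through with the figures quoted in the text (0.12 USD per raw Mb; 60 USD of library preparation within a 74 USD total) and the 3 Mb genome at 50× depth used in Figure 2. The Python sketch below is only an arithmetic illustration; the function names are assumptions, and applying the high-throughput per-Mb rate at 50× depth is a hypothetical calculation rather than a cost reported in the text.

```python
def naive_genome_cost(genome_mb, usd_per_raw_mb, depth=1.0):
    """Naive extrapolation: raw megabases needed multiplied by the per-Mb price."""
    return genome_mb * depth * usd_per_raw_mb


def library_prep_share(library_prep_usd, total_usd):
    """Fraction of the per-genome consumable cost spent on library preparation."""
    return library_prep_usd / total_usd


# Genome size times per-Mb price, ignoring depth: the "under 1 USD" style of claim.
print(f"{naive_genome_cost(3.0, 0.12):.2f} USD")
# Same rate applied to 50x depth (hypothetical use of the high-throughput rate).
print(f"{naive_genome_cost(3.0, 0.12, depth=50):.2f} USD")
# Share of the 2018 per-genome total taken by library preparation.
print(f"{library_prep_share(60.0, 74.0):.0%}")
```

Even with free sequencing reagents, the per-genome cost in this example could not fall below the 60 USD spent on library preparation.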

It is also important to keep in mind that these costs are massive underestimates which do not include indirect costs such as salaries for laboratory personnel and downstream bioinformatics. Such indirect costs are difficult to estimate precisely in an academic setting but are far from trivial. Per-genome sequencing and analysis costs are likely to be even higher in a clinical diagnostics environment due to the need for highly standardised and accredited procedures.

However, a micro-costing analysis covering laboratory and personnel costs estimated the cost of clinical WGS at £481 per M. tuberculosis isolate versus £518 applying standard methods, representing relatively marginal cost savings but with significant time savings [63]. WGS does indeed represent a potentially cost-effective and highly informative tool for clinical diagnostics, but for microbiology-scale sequencing we seem to be in a post-plummeting-costs age.

Time Scales of WGS-Based Diagnostics

One key feature of useful diagnostics tools is their ability to rapidly inform treatment. Most applications of WGS so far have been for lab-cultured organisms (bacteria and fungi). Traditional culture methods require long turnaround time, with most bacterial cultures taking 1–5 days, fungal cultures 7–30 days, and mycobacterial cultures 14–60 days. In this scenario, WGS is used as an adjunct technology primarily to provide information on the presence of AMR and virulence genes, which is particularly useful for mechanisms that are difficult to determine phenotypically (e.g., carbapenem resistance). This use of WGS, whilst solving some of the current clinical problems, does not speed up the diagnosis of infection; it is more the case that new technology is replacing some of the more cumbersome laboratory techniques whilst providing additional information.

WGS is more appealing as a microbiological fast diagnostics solution when combined with procedures that circumvent (or shorten) the traditional culture step. This can be achieved through direct sampling of clinical material (Box 1) or by using a protocol enriching for sequences of specific organism(s). Such enrichment methods, generally based on the capture of known sequences through hybridization, are a particularly tractable approach for viruses due to their small genome size. For example, the VirCap virome capture method targets all known viruses and can even enrich for novel sequences [64]. Similar methods targeting specific organisms have been developed and successfully deployed, representing an attractive option for unculturable organisms [16,65–68].

Figure 2. Raw Sequencing Costs Have Dropped over Time but the True Sequencing Cost Per Bacterial Genome Has Stabilised. (A) Sequencing cost per raw megabase (Mb) of DNA sequence between 2009 and 2018. Data from https://www.genome.gov/27541954/dna-sequencing-costs-data/. (B) The evolution of costs for a bacterial genome of 3 Mb sequenced to 50× depth (Illumina) or 30× depth (Roche 454) between 2009 and 2018. The fraction of the total cost (red line) made up of library preparation consumables (blue line) indicates that the drop in raw sequencing costs has had a limited impact on true sequencing costs since 2011, and none after 2013. The cost is based on our calculations for the output and consumable costs for the 454 GS Jr and Illumina MiSeq 2 × 150, MiSeq 2 × 250 and MiSeq 2 × 300, the leading microbiology-scale platforms in terms of output/cost ratio in 2009, 2011, 2015, and 2018 respectively. USD, US$.

[Figure 2 image not reproduced. Panel A axis: cost per megabase raw sequence (USD), 2010–2018. Panel B axis: cost per bacterial genome (USD) for a 3 Mb bacterial genome, 2010–2018. Series labels: Roche 454 GS Jr; Illumina MiSeq v1, v2, v3; library preparation.]

Relative to the time required for culture and downstream analysis of the data, variation in the speed of different sequencing technologies is relatively modest. There is considerable enthusiasm for the Oxford Nanopore Technology (ONT) which outputs data in real time, although the ONT requires a comparable amount of time to the popular Illumina MiSeq sequencer to generate the same volume of sequence data. Sequencing on the MiSeq sequencer takes between 13 and 56 hours, but as run time correlates with sequence output and read length, researchers tend to systematically favour runs of longer duration.

Ethical Considerations

In the context of this review, genetic material from the human patient present in clinical samples represents contamination, a major obstacle to obtaining a high yield of microbial DNA. Protocols exist to deplete human DNA prior to sequencing [69,70] but these are not completely problem-free as the depletion protocol is likely to bias estimates of the microbial community, and some human reads will likely remain. In particular, levels of human DNA are significantly higher in faecal samples from hospitalized patients compared to healthy controls [71], suggesting that the problem is exacerbated in clinical settings. Therefore, the ethical and legal issues raised by introducing human WGS into routine healthcare [72] cannot be avoided by microbially focused clinical metagenomics. Dismissing these concerns as minor may be an option for academic researchers uninterested in these human data, but it is naive to think that hospital ethics committees will share this view. Even in the absence of human DNA, metagenomic samples from multiple body sites can be used to identify individuals in datasets of hundreds of people [73]. Managing clinical metagenomics data in light of these concerns should be taken seriously, not only as a barrier to implementation but because of the real risks to patient privacy.

Box 1. WGS beyond Single Genomes

WGS in the strict sense usually refers to sequencing the genome of a single organism, and it is common to distinguish between the sample (the material that has actually been taken from the patient) and the isolate (an organism that has been cultured and isolated from that sample). WGS methods traditionally sequence a cultured isolate to reduce contamination from other organisms, or sometimes rely on enrichment strategies targeting sequences from a specific organism [66,67]. However, this represents only a small fraction of the total microbial diversity present in a clinical sample.

In contrast, metagenomic approaches sequence samples in an untargeted way. This approach is particularly relevant for clinical scenarios where the pathogen of interest cannot be predicted and/or is fastidious (i.e., has complex culturing requirements). Example applications of clinical metagenomics include: when the disease-causing agent is unexpected [74,75]; investigating the spread of AMR-carrying plasmids across species [35]; and characterizing the natural history of the microbiome [76]. The removal of the culture requirement can drastically decrease turn-around time from sample to data and enable identification of both rare and novel pathogens. Different samples however present different challenges. Easy-to-collect sample sites (e.g., faeces and sputum) typically also have a resident microbiota, so it can be challenging to distinguish the etiological agent of disease from colonizing microbes. Conversely, sites that are usually sterile (e.g., cerebrospinal fluid, pleural fluid) present a much better opportunity for metagenomics to contribute to clinical care.

Metagenomic data are more complex to analyze than single species WGS data and tend to rely on sophisticated computational tools, such as the Desman software allowing inference of strain-level variation in a metagenomic sample [77]. Such approaches can be difficult to implement, are computationally very demanding, and are unlikely to be deployable in clinical microbiology in the near future, although cloud-based platforms may circumvent the need for computational resources in diagnostic laboratories. Furthermore, some faster approaches for rapid strain characterization from raw sequence reads, such as MASH [78] and KmerFinder [10,79], could find a use in diagnostics microbiology, with the latter having been shown to identify the presence of pathogenic strains even in culture-negative samples [10].

However, the differences between these methods should not obscure their fundamental similarities. Obtaining single-species genomes from culture is one end of a continuum of methods that stretches all the way to full-blown metagenomics of a sample. In principle, all methods produce the same kind of data: strings of bases. Furthermore, in all cases what is clinically relevant represents only a small fraction of these data. Integrating sequencing data from different methods into a single diagnostics pipeline is therefore an attractive prospect to quickly identify the genomic needles in the metagenomic haystack in a species-agnostic manner. For example, the presence of a particular antibiotic-resistance gene in sequencing data may recommend against the use of that antibiotic; whether the gene is present in data from a single-species isolate or from metagenomes is irrelevant. As an example, Leggett et al. used MinION metagenomic profiling to identify pathogen-specific AMR genes present in a faecal sample from a critically ill infant all within 5 h of taking the initial sample [80].

Bespoke Pipelines for Genomics in Clinical Microbiology

A major problem in the analysis of WGS data is that there are currently very few (if any) accepted gold standards. The fundamental steps of WGS analyses in microbial genomics tend to be similar across applications and usually consist of the following steps: sequence data quality control; identification/confirmation of the sequenced biological material; characterization of the sequenced isolate (including typing efforts as well as characterization of virulence factors and putative AMR elements/mutations); epidemiologic analysis; and finally, storage of the results (Figure 3). However, how these analyses are implemented varies widely, both between microbial species and human labs. Despite some commercial attempts at one-stop analysis suites such as Ridom SeqSphere+ (http://www.ridom.com/seqsphere/), most laboratories use a collection of open-source tools to perform particular subanalyses. Typically, these tools are then woven together into a patchwork of software (a ‘pipeline’). The idea of a pipeline is to allow within-laboratory standardized analysis of batches of isolates with relatively little manual bioinformatics work. Such pipelines can be highly customizable for a wide range of questions. There are also some communal efforts at streamlining workflows across laboratories. As an example, Galaxy (https://usegalaxy.org) is a framework that allows nonbioinformaticians to use a wide array of bioinformatics tools through a web interface.

One major limitation to rapidly attaining useful information in a clinical setting is that analysis pipelines for microbial genomics have generally been developed for fundamental research or public health epidemiology [81]. This usually means that the pipeline permits a very thorough and sophisticated workflow with a large number of options and moving parts. For example, at the time of writing (May, 2018), the ‘QC and manipulation’ step in Galaxy alone consists of 35 different tools, tests, and workflows that can be applied to an input sequence. While this is desirable from a researcher’s perspective, it is clearly prohibitive for real-time analysis in a clinical setting. A user requires in-depth knowledge about the purpose each tool serves, the relative strengths and weaknesses of each approach, and a functional understanding of the important parameters. Furthermore, most analysis pipelines require proficiency in Linux systems and navigating the command line, something clinical microbiologists are rarely trained for.

The road to stringent, exhaustive analysis of WGS data is long and paved with good intentions. In order to move towards real-time interpretable results for clinics it will be necessary to take certain shortcuts. The focus should be on rapid, automated analysis and clear, unambiguous results. Some steps in the pipeline can simply be omitted for clinical purposes. As an example, genome assembly might appear to be a bottleneck for real-time WGS diagnostics, but is probably rarely required; sufficient characterization of an isolate can be made by analysis of the k-mers in the raw sequence data, which is orders of magnitude faster. Accurate identification of an isolate can be made rapidly with MinHash-based k-mer matching methods such as Mash [78], and AMR elements can be identified from k-mers alone [14]. Another example of a computationally intensive step that could be omitted from a default pipeline is sophisticated phylogenetic inference. Best practice for the creation of phylogenetic trees may involve evaluating the individual likelihood of a very wide range of possible trees given a sequence alignment or other distance metric, repeated for thousands of bootstrapped replicates, giving a tree with high confidence but with extreme computational time costs. A clinical pipeline could use much faster approaches and still provide an informative phylogenetic tree [82].
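To illustrate why k-mer matching can bypass assembly, the following is a minimal sketch of a bottom-s MinHash comparison in the spirit of Mash. It is not Mash's actual implementation: the function names are hypothetical, BLAKE2b stands in for the hash function Mash uses, and canonical (strand-independent) k-mers are ignored for brevity. The distance formula is the Mash distance, D = -(1/k) ln(2j/(1+j)), where j is the estimated Jaccard index.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable hash (assumption: BLAKE2b)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=7, size=64):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes, sorted."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    # The `size` smallest hashes of the union act as a random sample of it.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    set_a, set_b = set(sketch_a), set(sketch_b)
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate into a Mash-style distance in [0, 1]."""
    if j <= 0.0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2.0 * j / (1.0 + j)) / k))
```

In use, each raw read set or reference genome would be reduced to a sketch once, after which any pair can be compared in time proportional to the sketch size rather than the genome size. That property is what makes this kind of identification fast enough for a default clinical pipeline.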

[Figure 3 schematic: two columns, ‘Public health/research’ and ‘Clinics’ (simplified), with steps grouped into five stages: QC, ID, Epidemiology, AMR, and Storage.]

Figure 3. The Standard WGS Research Bioinformatics Pipeline Can Be Modified for Clinical Use. This schematic shows common steps used in public health and/or research together with suggested modifications and omissions for clinical real-time implementation. Steps on the right marked with an asterisk represent simplified versions optimised for speed. cgMLST, core genome multilocus sequence typing; SNP, single-nucleotide polymorphism; wgMLST, whole genome multilocus sequence typing.

In Figure 3 we outline our schematic vision of a computational pipeline specific to diagnostics in clinical microbiology. The clinical pipeline would only encompass a small subset of the research pipeline aimed at generating rapid and interpretable output. For epidemiological inference, pairwise distances between strains would be computed as a matrix of Jaccard distances on the shared proportion of k-mers as outputted by Mash [78]. This matrix could be used to generate a phylogenetic tree using a computationally inexpensive method (e.g., neighbor-joining). Additionally, a correlation between pairwise genetic distance and sampling date could be performed to test for evidence of temporal signal in the data (i.e., accumulation of a sufficient number of mutations over the sampling period). In the presence of temporal signal, the user would be provided with a transmission chain based on a fast algorithm such as Seqtrack [83].
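The distance-matrix and temporal-signal steps described above can be sketched as follows. This is an illustrative outline only, under stated assumptions: k-mer sets are compared exactly rather than via Mash sketches, ‘sampling date’ is interpreted as the absolute difference in sampling day between two isolates, and all function names are hypothetical.

```python
import math
from itertools import combinations


def jaccard_distance(kmers_a, kmers_b):
    """Jaccard distance: 1 minus the shared proportion of k-mers."""
    union = kmers_a | kmers_b
    if not union:
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


def distance_matrix(kmer_sets):
    """Symmetric matrix of pairwise Jaccard distances between isolates."""
    n = len(kmer_sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(kmer_sets[i], kmer_sets[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def temporal_signal(matrix, sampling_days):
    """Correlate pairwise genetic distance with the gap between sampling days."""
    pairs = list(combinations(range(len(sampling_days)), 2))
    genetic = [matrix[i][j] for i, j in pairs]
    temporal = [abs(sampling_days[i] - sampling_days[j]) for i, j in pairs]
    return pearson(genetic, temporal)
```

A clearly positive correlation would suggest that isolates sampled further apart in time are also further apart genetically. Because pairwise distances are not independent observations, a real pipeline would likely assess significance by permutation (for example, a Mantel-type test) rather than by reading the coefficient alone. The same matrix could then be passed to a neighbor-joining routine.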

Any bespoke pipeline for clinical diagnostics would need to be linked with regularly updated multispecies databases containing information about the latest developments in typing schemes, as well as clinically important factors such as AMR determinants. Results would have to be continuously validated, and international accreditation standards met at regular intervals. At a national level, accreditation bodies (e.g., UKAS in the UK) may lack the expertise required. In our experience, many promising databases have collapsed after funding expired or the responsible postdoc left for another job. If WGS is ever to make it into the clinic it will be necessary to secure indefinite funding of both infrastructure and personnel for such databases.

The lack of uptake of WGS-based diagnostics may also be in part due to an understandable desire to maintain the ‘status quo’ in a busy hospital environment with already established treatment and intervention systems. Additionally, and perhaps significantly, it also highlights the difficulty of communicating the potential benefits of WGS to the day-to-day life of a clinic. The main proponents of WGS tend to be based in the public health/research environment and are rarely actively involved in clinical decision-making. This in itself can present something of a language barrier, challenging meaningful dialogue over how adoption of new approaches can lead to quantifiable improvements in existing systems. Further, the physical planning, implementation, and integration of WGS diagnostics may be unlikely to succeed without carefully planned introduction and continued training of its user base. This is of course challenged by the already resource-limited infrastructure of many clinical settings.

Concluding Remarks

Despite its immense promise and some early successes, it is difficult to predict if and when WGS will completely supersede current standards in clinical microbiology. There are several major bottlenecks to its implementation as a routine approach to diagnose and characterise microbial infections (see Outstanding Questions). These include, among others: the current costs of WGS, which remain far from negligible despite a common belief that sequencing costs have plummeted; a lack of training in, and possible cultural resistance to, bioinformatics among clinical microbiologists; a lack of the necessary computational infrastructure in most hospitals; the inadequacy of existing reference microbial genomics databases necessary for reliable AMR and virulence profiling; and the difficulty of setting up effective, standardized, and accredited bioinformatics protocols.

Focusing in the near future on WGS applications that fulfil unmet diagnostic needs and demonstrate clear benefits to patients and healthcare professionals will help to drive the cultural changes required for the transition to WGS in clinical microbiology. However, irrespective of how this transition occurs and how complete it is, it is likely to feel highly disruptive for many clinical microbiologists. There is also a genuine risk that precious knowledge in basic microbiology will be lost after the transition to WGS, particularly if investment prioritises new technology at the expense of older expertise. More positively, irrespective of the future implementation of WGS in clinical microbiology, we should not forget that the availability of extensive genomic data has been instrumental in the development of a multitude of routine non-WGS typing schemes.

Outstanding Questions

Can WGS be used to develop robust classification schemes that account for the genetic diversity of organisms with open genomes?

Which clinically relevant phenotypes can be reliably predicted using WGS, and for which organisms?

How can phylogenetic analyses of outbreaks be speeded up to meaningfully contribute to infection control at actionable timescales?

How can publicly available databases be reliably maintained to the required clinical accreditation standards over long time periods?

Will the true cost of generating a bacterial genome remain stable as the sequencing market consolidates in fewer hands?

How can clinical metagenomic data be managed safely in line with the ethical considerations applicable to identifiable human DNA?

How can unwieldy bioinformatics pipelines developed with academic research in mind be adapted for a clinical setting?

Can current expertise in traditional clinical microbiology be maintained in the transition to WGS?
