4S Peak Filling – baseline estimation by iterative mean suppression
Kristian Hovde Liland a,b,*
a Norwegian University of Life Sciences, 1430 Ås, Norway
b Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, 1432 Ås, Norway
ABSTRACT
A novel baseline estimation procedure building on previously published works is presented.
The core of the estimation is an iterative spectrum suppression consisting of a moving window minimum replacement (adapted from Friedrichs [1]).
Four easily understandable parameters control the placement of the baseline relative to the noise band around the signal (adapted from Eilers [2]) and the flexibility in different situations.
The method is especially suited for non-linear baselines with local variations and for resolving peak clusters in qualitative analyses.
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
ARTICLE INFO
Method name: Baseline estimation by iterative mean suppression
Keywords: Baseline estimation, Smoothing, Subsampling, Moving window, Noise, Interpolation
Article history: Received 28 November 2014; Accepted 20 February 2015; Available online 21 February 2015
* Tel.: +47 64970338.
E-mail address: [email protected] (K.H. Liland).
http://dx.doi.org/10.1016/j.mex.2015.02.009
2215-0161/© 2015 The Authors. Published by Elsevier B.V.
MethodsX
journal homepage: www.elsevier.com/locate/mex
Methods
There are four main ingredients in the 4S Peak Filling baseline estimation procedure. These are described below, one S at a time. The descriptions are supported by illustrative figures with spectra from matrix assisted laser desorption/ionisation time-of-flight (MALDI-TOF) [3] and laser induced breakdown spectroscopy (LIBS) [4]. The former is included to illustrate the effects of the different steps of the algorithm, while the latter presents a problem where the proposed method handles a challenging baseline estimation where other tested methods fail. An R [5] implementation with section numbers corresponding to this article is freely available in the ‘baseline’ package in the Comprehensive R Archive Network repository.
Smoothing
Before baseline estimation is performed on the spectra, they are smoothed by applying a penalty on their second derivative; see Fig. 1 and Section S1 in the R code. This is achieved using the Whittaker smoother as described in Eilers’ article on a perfect smoother [6]. Other smoothers may have the desired effects, but the Whittaker smoother is a quick and well-tested basis for smoothing and baseline estimation.
There are two important effects of applying a smoother. First of all, it limits spurious noise peaks that are of no interest in the baseline estimation. More importantly, it gives the user the possibility of centring the baseline in the noise band around the signal spectrum by adjusting the smoothing parameter, thus giving a more realistic zero signal for most types of spectra. It does not matter that real peaks may be shrunk severely in the smoothing, as this does not harm the baseline estimation. If no smoothing is performed, the baseline will be placed in the bottom of the noise band.
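As an illustration of the second-derivative penalty, the following is a minimal Python sketch of a Whittaker-type smoother. It is not the code of the ‘baseline’ package; the function name `whittaker_smooth` and the argument `lam` (the raw penalty weight) are chosen here for illustration. It minimises the sum of squared residuals plus `lam` times the sum of squared second differences, and stores a dense matrix, so it is only intended for short spectra.

```python
def whittaker_smooth(y, lam):
    """Minimise sum((y - z)^2) + lam * sum((second differences of z)^2)."""
    n = len(y)
    # Build A = I + lam * D'D, where D is the (n-2) x n second-difference matrix.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):                      # one row of D per interior point
        for a in range(3):
            for b in range(3):
                A[r + a][r + b] += lam * stencil[a] * stencil[b]
    # Solve A z = y by Gaussian elimination. A is symmetric positive definite
    # and pentadiagonal, so no pivoting is needed and only two rows below the
    # diagonal have to be eliminated in each step.
    z = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = A[i][k] / A[k][k]
            if f != 0.0:
                for j in range(k, min(k + 3, n)):
                    A[i][j] -= f * A[k][j]
                z[i] -= f * z[k]
    # Back substitution on the remaining upper band.
    for k in range(n - 1, -1, -1):
        for j in range(k + 1, min(k + 3, n)):
            z[k] -= A[k][j] * z[j]
        z[k] /= A[k][k]
    return z
```

A straight line has zero second differences and is therefore returned unchanged, whereas a rapidly oscillating signal is flattened more strongly as `lam` grows.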
Subsampling
Instead of estimating the baselines directly on the spectra, a simple binning is performed first; see Fig. 2 and Section S2 in the R code. The number of bins is chosen by the user, and the minimum value in each bin is used as a local representative of the spectrum. The subsampling serves two goals. Firstly, it increases the efficiency of the algorithm with regard to the number of local windows it will work in, thus reducing the number of iterations needed to suppress the baseline. Secondly, it simplifies the shape of the spectrum while retaining the basic shape of the baseline.
If the baseline needs to be flexible or very steep, e.g. because of dominating fluorescence in Raman spectroscopy, a high number of bins is needed, while a more linear and flat baseline can be represented by very few bins. A good starting point is to have one bin per 10 spectrum values and then adjust the number according to visual inspection or subsequent analysis performance. Too many bins may cause a baseline that rises into peaks, while too few bins may cause the baseline to “detach” from the spectrum (too low estimate) if the true baseline is highly concave. For spectra having regions of both characteristics, bin widths may need to be varied along the spectra (see Fig. 6).
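The binning step can be sketched in a few lines of Python. This is an illustrative sketch rather than the package code; the function name `bin_minima` and the way the bin limits are rounded are assumptions. It expects `n_bins` to be no larger than the number of spectrum values.

```python
def bin_minima(spectrum, n_bins):
    """Split the spectrum into n_bins roughly equal bins.

    Returns the minimum of each bin and the (possibly fractional)
    centre index of each bin.
    """
    n = len(spectrum)
    minima, centres = [], []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins            # exclusive upper limit
        minima.append(min(spectrum[start:end]))
        centres.append((start + end - 1) / 2)
    return minima, centres
```

With the suggested starting point of one bin per 10 spectrum values, a spectrum of 4000 points would be called with `n_bins=400`.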
Fig. 1. Smoothing of the original MALDI-TOF spectrum by Whittaker (λ = 10^4). Axes: m/z, Intensity.
Suppression
The main part of the algorithm is the iterative suppression; see Fig. 3 and Section S3 in the R code. A window is moved along each spectrum similar to the median window method [1]. There are four features that set the strategies apart. Firstly, the minimum of the current value and the mean value in the window is used instead of the median value. A well-chosen window width and a corresponding number of buckets will ensure that the minimum is sufficiently low but not below the perceived baseline.
The second feature is that the current spectrum is updated for each move of the window instead of estimating all minima before updating the spectrum. The effect is a quicker convergence and a directional effect. The latter means that the window has to be moved in both directions, first sliding it to the right and then back again, to avoid a biased estimation around peaks.
The third feature is that the window is shrunk logarithmically for each completed window movement right and left, ending up at a width of 1 wavelength/mass. This shrinking increases the effectiveness by allowing wide windows in the first iterations while reducing the chance of ending up with a too low baseline.
The last feature is that the window is symmetrically shrunk when close to the ends of the spectra. This is done to avoid too low minima when the baseline is steep toward the ends, which can be the case for spectra like those of MALDI-TOF or Raman spectra with dominating fluorescence.
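One way to obtain logarithmically shrinking windows is to space the half widths evenly on a log10 scale. The Python sketch below does this; the function name `window_half_widths`, the rounding upwards, and the choice of a final half width of 1 are assumptions made for illustration (the text only states that the final width is 1).

```python
import math

def window_half_widths(hwi, iterations):
    """Half widths shrinking evenly on a log10 scale from hwi down to 1."""
    if iterations == 1:
        return [hwi]
    log_start = math.log10(hwi)
    widths = []
    for k in range(iterations):
        exponent = log_start * (1 - k / (iterations - 1))
        # The small offset guards against floating-point overshoot,
        # e.g. 10 ** log10(15) evaluating to slightly more than 15.
        widths.append(math.ceil(10 ** exponent - 1e-9))
    return widths
```

For an initial half width of 16 and 5 passes this gives the sequence 16, 8, 4, 2, 1, so most of the shrinking happens in the early passes.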
The number of passes the window moves right and left over the spectra is a user-chosen parameter. This parameter is tightly connected with the number of bins and affects the baseline in a similar manner. Too few iterations will lead to a baseline rising into the peaks. If the baseline is relatively flat, too many iterations will not harm the estimation. But if the true baseline is highly concave, too many iterations may have the same baseline “detaching” effect where the baseline is estimated too low. A possible extension of the algorithm could be to also include a convergence criterion to stop the iterations early when changes are small.
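The four features of the suppression step can be combined in a short Python sketch. It follows the description in the text but is not the code of the ‘baseline’ package; the function name `suppress` is hypothetical, and the list `half_widths` (one half width per right-and-left pass) is assumed to be supplied by the caller.

```python
def suppress(values, half_widths):
    """Iterative mean suppression of subsampled spectrum values.

    One right-then-left sweep is performed per entry in half_widths.
    The two end points are left untouched.
    """
    x = [float(v) for v in values]
    n = len(x)

    def update(i, w):
        # Shrink the window symmetrically near the ends of the spectrum.
        v = min(w, i, n - 1 - i)
        window_mean = sum(x[i - v:i + v + 1]) / (2 * v + 1)
        # Replace in place, so later windows already see the suppressed value.
        x[i] = min(x[i], window_mean)

    for w in half_widths:
        for i in range(1, n - 1):              # sweep to the right
            update(i, w)
        for i in range(n - 2, 0, -1):          # sweep back to the left
            update(i, w)
    return x
```

Because each value is replaced by the smaller of itself and the window mean, the result never exceeds the input, and a perfectly linear stretch of baseline is left unchanged. Peaks, on the other hand, are pulled down towards their surroundings with every pass.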
Fig. 2. Subsampling of smoothed MALDI-TOF spectrum, reducing resolution from 4000 to 150 m/z values. Axes: m/z, Intensity.
Fig. 3. Iterative suppression of baseline from smoothed, subsampled MALDI-TOF spectrum. A window width of 15 points and 20 iterations were used. Axes: m/z, Intensity.
Aiming for a baseline having little flexibility and interpreting the area around 12,000 m/z in the MALDI-TOF spectrum as a peak cluster rather than a rise in the baseline led to the choice of a wide (relative to the 150 bins) window width (15 points). The large width of the mentioned peak cluster meant that a high number of iterations was also needed (20). A starting point when adapting the parameters to a new set of spectra is to choose a window width approximately covering the width of the widest peak and 10 iterations before starting to adjust the parameters.
Stretching
The final stage of the algorithm consists of interpolating the estimated baseline back to full spectrum length; see Fig. 4 and Section S4 in the R code. Because of the choice of minimum values, we can place the estimated baseline at the centre points of the buckets and find the remaining values using simple linear interpolation for speed and robustness, or a set of properly constrained smoothing splines for better smoothness.
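The linear interpolation variant of the stretching can be sketched as follows in Python. The function name `stretch_linear` is hypothetical, and holding the first and last bucket values constant outside the outermost centre points is an assumption of this sketch, not a statement about the package.

```python
def stretch_linear(centres, values, n_points):
    """Linearly interpolate bucket-level baseline values to n_points positions.

    centres must be increasing; positions outside the outermost centres
    receive the nearest bucket value.
    """
    baseline = []
    j = 0
    for i in range(n_points):
        if i <= centres[0]:
            baseline.append(values[0])
        elif i >= centres[-1]:
            baseline.append(values[-1])
        else:
            # Advance to the pair of centres that brackets position i.
            while centres[j + 1] < i:
                j += 1
            t = (i - centres[j]) / (centres[j + 1] - centres[j])
            baseline.append(values[j] + t * (values[j + 1] - values[j]))
    return baseline
```

The stretched baseline has the same length as the original spectrum, so it can be subtracted point by point.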
Subtraction
If the baseline estimation is used for baseline correction, we have to subtract the final baseline from the original spectrum, as has been done in Fig. 5. We observe that the zero line has been well centred in the noise band. The baseline was chosen to be rigid enough to retain all the small peaks, e.g. around 7000 and 8000 m/z. By using more buckets and a smaller window width it is possible to place the baseline in the peak cluster around 12,000 m/z so that the peaks are better resolved, as indicated with the curved line in Fig. 5. If more localised control of the baseline flexibility is needed, one has to resort to varying bin widths along the spectra.
Fig. 4. Baseline stretched by smoothing splines after smoothing, subsampling and suppression of MALDI-TOF spectrum. Axes: m/z, Intensity.
Fig. 5. MALDI-TOF spectrum corrected by the estimated baseline. An alternative baseline is indicated which follows the shape of the peak cluster around 12,000 m/z. Axes: m/z, Intensity.
A vastly more challenging baseline can be found in LIBS spectra as shown in Fig. 6. These contain dips in the baseline that are impossible to adapt to without also having a baseline that rises high into the peaks. We tested the asymmetric least squares (ALS) baseline estimation, which is a modified Whittaker smoother, and several other methods with consistently bad results. For instance, if ALS is used with high enough penalty on the second derivative and low enough weight on positive residuals (see Ref. [2]) to give a flat baseline under the peaks, false peaks will appear around the baseline dips after baseline subtraction.
With the proposed method this can be overcome by customizing the subsampling, as proposed in Ref. [7]. Using narrow bins around the baseline dips at 360 nm and 530 nm, wide bins where the baseline looks relatively flat, and medium bin widths where the baseline is steep around 400 nm, we get the estimation and correction shown in the upper and lower part of Fig. 6. As long as there is not too much shift in the baseline dips from spectrum to spectrum, the chosen bin widths can be used for all spectra in the data set.
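Customized subsampling amounts to replacing the equally wide bins by bins with user-chosen limits. A minimal Python sketch, with the hypothetical function name `bin_minima_custom` and bin limits given as increasing indices into the spectrum, could look like this:

```python
def bin_minima_custom(spectrum, edges):
    """Minimum of the spectrum within each bin [edges[k], edges[k+1]).

    edges is an increasing list of indices; narrow bins give a flexible
    baseline locally, wide bins a rigid one.
    """
    minima, centres = [], []
    for start, end in zip(edges[:-1], edges[1:]):
        minima.append(min(spectrum[start:end]))
        centres.append((start + end - 1) / 2)
    return minima, centres
```

For example, `edges = [0, 4, 6, 8, 12]` places two narrow bins of two points each in the middle of a 12-point spectrum and a wider bin of four points at either end. The same list of limits can be reused for every spectrum in a data set as long as the dips do not shift much.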
Additional information
The described baseline estimation is implemented in the R package ‘baseline’ in the CRAN repository (http://cran.r-project.org) under the name fillPeaks. The four parameters mentioned above are summarized in Table 1.
Acknowledgements
MethodsX thanks the reviewers of this article (Paul H.C. Eilers and a second reviewer who would like to remain anonymous) for taking the time to provide valuable feedback.
Fig. 6. LIBS spectrum corrected using 4S Peak Filling with varying baseline interval widths to handle baseline dips around 360 nm and 530 nm and the steep ridge around 400 nm (no smoothing, window width 3, and 2 iterations). Two panels (upper and lower); axes: nm, Intensity.
Table 1
Summary of the parameters controlling the baseline estimation.
Description | R name | Suggested starting value
Second derivative penalty for smoothing. | lambda | Centred in noise band: 4; below spectra: 0
Number of buckets for subsampling or a vector of start/end points of buckets. | int | 1/10 of the number of wavelengths/masses/points
Initial half width of windows used for suppression. | hwi | Half the width of the widest peak (full width = centre point ± hwi)
Number of iterations for suppression. | it | 10
References
[1] M.S. Friedrichs, A model-free algorithm for the removal of base-line artifacts, J. Biomol. NMR 5 (1995) 147–153.
[2] P.H. Eilers, Parametric time warping, Anal. Chem. 76 (2004) 404–411.
[3] K.H. Liland, B.-H. Mevik, E.-O. Rukke, T. Almoy, T. Isaksson, Quantitative whole spectrum analysis with MALDI-TOF MS, Part II. Determining the concentration of milk in mixtures, Chemometr. Intell. Lab. 99 (2009) 39–48.
[4] A.K. Myakalwar, N.C. Dingari, R.R. Dasari, I. Barman, M.K. Gundawar, Non-gated laser induced breakdown spectroscopy provides a powerful segmentation tool on concomitant treatment of characteristic and continuum emission, PLoS One 9 (2014) e103546.
[5] R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2014.
[6] P.H.C. Eilers, A perfect smoother, Anal. Chem. 75 (2003) 3631–3636.
[7] K.H. Liland, E.-O. Rukke, E.F. Olsen, T. Isaksson, Customized baseline correction, Chemometr. Intell. Lab. 109 (2011) 51–56.