4S Peak Filling – baseline estimation by iterative mean suppression

Kristian Hovde Liland a,b,*

a Norwegian University of Life Sciences, 1430 Ås, Norway

b Nofima, Norwegian Institute of Food, Fisheries and Aquaculture Research, 1432 Ås, Norway

ABSTRACT

A novel baseline estimation procedure building on previously published works is presented.

The core of the estimation is an iterative spectrum suppression consisting of a moving window minimum replacement (adapted from Friedrichs [1]).

Four easily understandable parameters control the placement of the baseline relative to the noise band around the signal (adapted from Eilers [2]) and its flexibility in different situations.

The method is especially suited for non-linear baselines with local variations and for resolving peak clusters in qualitative analyses.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

ARTICLE INFO

Method name: Baseline estimation by iterative mean suppression

Keywords: Baseline estimation, Smoothing, Subsampling, Moving window, Noise, Interpolation

Article history: Received 28 November 2014; Accepted 20 February 2015; Available online 21 February 2015

* Tel.: +47 64970338.

E-mail address: [email protected] (K.H. Liland).

http://dx.doi.org/10.1016/j.mex.2015.02.009

2215-0161/© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Contents lists available at ScienceDirect

MethodsX

journal homepage: www.elsevier.com/locate/mex

Methods

There are four main ingredients in the 4S Peak Filling baseline estimation procedure. These are described below, one S at a time. The descriptions are supported by illustrative figures with spectra from matrix assisted laser desorption/ionisation time-of-flight (MALDI-TOF) [3] and laser induced breakdown spectroscopy (LIBS) [4]. The former is included to illustrate the effects of the different steps of the algorithm, while the latter presents a problem where the proposed method handles a challenging baseline estimation where other tested methods fail. An R [5] implementation with section numbers corresponding to this article is freely available in the ‘baseline’ package in the Comprehensive R Archive Network repository.

Smoothing

Before baseline estimation is performed on the spectra, they are smoothed by applying a penalty on their second derivative; see Fig. 1 and Section S1 in the R code. This is achieved using the Whittaker smoother as described in Eilers’ article on a perfect smoother [6]. Other smoothers may have the desired effects, but the Whittaker smoother is a quick and well-tested basis for smoothing and baseline estimation.

There are two important effects of applying a smoother. First of all, it limits spurious noise peaks that are of no interest in the baseline estimation. More importantly, it gives the user the possibility of centring the baseline in the noise band around the signal spectrum by adjusting the smoothing parameter, thus giving a more realistic zero signal for most types of spectra. It does not matter that real peaks may be shrunk severely in the smoothing, as this does not harm the baseline estimation. If no smoothing is performed, the baseline will be placed in the bottom of the noise band.
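The penalised least squares idea behind the smoothing step can be sketched in a few lines. The following is a minimal pure-Python illustration and not the code of the ‘baseline’ package: the function name whittaker_smooth, the dense matrix storage and the direct banded elimination are assumptions made for this example, and lam is used directly as the weight of the second derivative penalty. It minimises the sum of squared residuals plus lam times the sum of squared second differences of the smoothed curve.

```python
def whittaker_smooth(y, lam):
    """Illustrative Whittaker smoother with a second-difference penalty.

    Solves (I + lam * D'D) z = y, where D is the second-order difference
    matrix. Dense storage is used for readability, so this sketch is only
    meant for short example vectors.
    """
    n = len(y)
    # Build A = I + lam * D'D.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for a, ca in enumerate(coef):
            for b, cb in enumerate(coef):
                A[r + a][r + b] += lam * ca * cb

    # A is symmetric positive definite with bandwidth 2, so Gaussian
    # elimination without pivoting is sufficient and stays within the band.
    z = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = A[i][k] / A[k][k]
            for j in range(k, min(k + 3, n)):
                A[i][j] -= f * A[k][j]
            z[i] -= f * z[k]

    # Back substitution.
    for k in range(n - 1, -1, -1):
        s = z[k]
        for j in range(k + 1, min(k + 3, n)):
            s -= A[k][j] * z[j]
        z[k] = s / A[k][k]
    return z


if __name__ == "__main__":
    noisy = [0, 5, 1, 6, 2, 7, 3, 8]
    print(whittaker_smooth(noisy, 10.0))
```

A larger lam gives a stiffer curve. A straight line has zero second differences and is therefore left unchanged by the smoother, which is a convenient sanity check.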

Subsampling

Instead of estimating the baselines directly on the spectra, a simple binning is performed first; see Fig. 2 and Section S2 in the R code. The number of bins is chosen by the user, and the minimum value in each bin is used as a local representative of the spectrum. The subsampling serves two goals. Firstly, it increases the efficiency of the algorithm with regard to the number of local windows it will work in, thus reducing the number of iterations needed to suppress the baseline. Secondly, it simplifies the shape of the spectrum while retaining the basic shape of the baseline.

If the baseline needs to be flexible or very steep, e.g. because of dominating fluorescence in Raman spectroscopy, a high number of bins is needed, while a more linear and flat baseline can be represented by very few bins. A good starting point is to have one bin per 10 spectrum values and then adjust the number according to visual inspection or subsequent analysis performance. Too many bins may cause a baseline that rises into peaks, while too few bins may cause the baseline to “detach” from the spectrum (too low estimate) if the true baseline is highly concave. For spectra having regions of both characteristics, bin widths may need to be varied along the spectra (see Fig. 6).
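The binning described above can be sketched as follows. This is an illustration under stated assumptions rather than the package code: the function name subsample_min is hypothetical, bins are made as equal in width as integer division allows, and each bin is represented by its minimum value placed at the bin centre (in index units).

```python
def subsample_min(y, n_bins):
    """Split y into n_bins contiguous bins and keep the minimum of each.

    Returns (centres, minima), where centres are the mid-point indices
    of the bins. Requires 1 <= n_bins <= len(y) so that no bin is empty.
    """
    n = len(y)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and len(y)")
    # Integer bin edges; edges[i] is the first index of bin i.
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    centres, minima = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        minima.append(min(y[lo:hi]))
        centres.append((lo + hi - 1) / 2)
    return centres, minima


if __name__ == "__main__":
    spectrum = [5, 3, 4, 9, 8, 7, 2, 6, 6]
    print(subsample_min(spectrum, 3))
```

Following the starting point suggested in the text, a spectrum with 4000 values would be called with n_bins=400 and then adjusted after visual inspection.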

Fig. 1. Smoothing of the original MALDI-TOF spectrum by Whittaker (λ = 10^4). [Axes: m/z, intensity]

Suppression

The main part of the algorithm is the iterative suppression; see Fig. 3 and Section S3 in the R code. A window is moved along each spectrum similar to the median window method [1]. There are four features that set the strategies apart. Firstly, the minimum of the current value and the mean value in the window is used instead of the median value. A well-chosen window width and a corresponding number of buckets will ensure that the minimum is sufficiently low but not below the perceived baseline.

The second feature is that the current spectrum is updated for each move of the window instead of estimating all minima before updating the spectrum. The effect is a quicker convergence and a directional effect. The latter means that the window has to be moved in both directions, first sliding it to the right and then back again, to avoid a biased estimation around peaks.

The third feature is that the window is shrunk logarithmically for each completed window movement right and left, ending up at a width of 1 wavelength/mass. By this shrinking we increase the effectiveness by allowing wide windows in the first iterations while reducing the chance of ending up with a too low baseline.

The last feature is that the window is symmetrically shrunk when close to the ends of the spectra. This is done to avoid too low minima when the baseline is steep toward the ends, which can be the case for spectra like those of MALDI-TOF or Raman spectra with dominating fluorescence.

The number of passes the window moves right and left over the spectra is a user-chosen parameter. This parameter is tightly connected with the number of bins and affects the baseline in a similar manner. Too few iterations will lead to a baseline rising into the peaks. If the baseline is relatively flat, too many iterations will not harm the estimation. But if the true baseline is highly concave, too many iterations may have the same baseline “detaching” effect where the baseline is estimated too low. A possible extension of the algorithm could be to also include a convergence criterion to stop the iterations early when changes are small.
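The four features of the suppression step can be sketched in pure Python. This is a minimal illustration of the description above, not a copy of the fillPeaks code: the function names window_schedule and suppress are hypothetical, and the exact rounding of the shrinking half widths is an assumption. Values are updated in place during each pass, each iteration consists of a pass to the right followed by a pass to the left, the half width decreases geometrically towards 1, and the window is shrunk symmetrically near the ends.

```python
def window_schedule(hwi, iterations):
    """Half widths decreasing geometrically from hwi to 1 (assumed rounding)."""
    if iterations == 1:
        return [hwi]
    return [max(1, round(hwi ** (1 - k / (iterations - 1))))
            for k in range(iterations)]


def suppress(values, hwi, iterations):
    """Iteratively replace each point by min(point, mean of its window)."""
    x = list(values)
    n = len(x)

    def update(i, w):
        # Symmetric shrinking of the window close to the ends.
        v = min(i, w, n - 1 - i)
        window_mean = sum(x[i - v:i + v + 1]) / (2 * v + 1)
        # The spectrum is updated immediately, so later windows in the
        # same pass already see the suppressed value.
        x[i] = min(x[i], window_mean)

    for w in window_schedule(hwi, iterations):
        for i in range(1, n - 1):        # pass to the right
            update(i, w)
        for i in range(n - 2, 0, -1):    # pass back to the left
            update(i, w)
    return x


if __name__ == "__main__":
    binned = [1.0] * 5 + [10.0] + [1.0] * 5
    print(suppress(binned, hwi=2, iterations=3))
```

In the example a single peak on a flat baseline is pushed down towards the baseline level, while points that already lie on the baseline are left untouched. The end points are never changed in this sketch.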

Fig. 2. Subsampling of smoothed MALDI-TOF spectrum, reducing resolution from 4000 to 150 m/z values. [Axes: m/z, intensity]

Fig. 3. Iterative suppression of baseline from smoothed, subsampled MALDI-TOF spectrum. A window width of 15 points and 20 iterations were used. [Axes: m/z, intensity]

Aiming for a baseline having little flexibility and interpreting the area around 12,000 m/z in the MALDI-TOF spectrum as a peak cluster rather than a rise in the baseline led to the choice of a wide (relative to the 150 bins) window width (15 points). The large width of the mentioned peak cluster meant that a high number of iterations was also needed (20). A starting point when adapting the parameters to a new set of spectra is to choose a window width approximately covering the width of the widest peak and 10 iterations before starting to adjust the parameters.

Stretching

The final stage of the algorithm consists of interpolating the estimated baseline back to full spectrum length; see Fig. 4 and Section S4 in the R code. Because of the choice of minimum values, we can place the estimated baseline at the centre points of the buckets and find the remaining values using simple, linear interpolation for speed and robustness or a set of properly constrained smoothing splines for better smoothness.
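The linear variant of the stretching step can be sketched as below. The function name stretch_linear is hypothetical, and holding the baseline constant outside the first and last bucket centres is an assumption of this example; the text above does not specify how the ends are treated.

```python
def stretch_linear(centres, values, n):
    """Linearly interpolate bucket values back to n points.

    centres must be strictly increasing positions on the original index
    axis. Outside the first and last centre the nearest value is held
    constant (an assumption of this sketch).
    """
    out = []
    j = 0
    for i in range(n):
        if i <= centres[0]:
            out.append(values[0])
        elif i >= centres[-1]:
            out.append(values[-1])
        else:
            # Advance to the pair of centres that brackets index i.
            while centres[j + 1] < i:
                j += 1
            x0, x1 = centres[j], centres[j + 1]
            t = (i - x0) / (x1 - x0)
            out.append(values[j] + t * (values[j + 1] - values[j]))
    return out


if __name__ == "__main__":
    print(stretch_linear([1.0, 4.0, 7.0], [3.0, 7.0, 2.0], 9))
```

For baseline correction, the stretched baseline is then subtracted point by point from the original spectrum.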

Subtraction

If the baseline estimation is used for baseline correction, we have to subtract the final baseline from the original spectrum, as has been done in Fig. 5. We observe that the zero line has been well centred in the noise band. The baseline was chosen to be rigid enough to retain all the small peaks, e.g. around 7000 and 8000 m/z. By using more buckets and a smaller window width it is possible to place the baseline in the peak cluster around 12,000 m/z so that the peaks are better resolved, as indicated with the curved line in Fig. 5. If more localised control of the baseline flexibility is needed, one has to resort to varying bin widths along the spectra.

Fig. 4. Baseline stretched by smoothing splines after smoothing, subsampling and suppression of MALDI-TOF spectrum. [Axes: m/z, intensity]

Fig. 5. MALDI-TOF spectrum corrected by the estimated baseline. An alternative baseline is indicated which follows the shape of the peak cluster around 12,000 m/z. [Axes: m/z, intensity]

A vastly more challenging baseline can be found in LIBS spectra as shown in Fig. 6. These contain dips in the baseline that are impossible to adapt to without also having a baseline that rises high into the peaks. We tested the asymmetric least squares (ALS) baseline estimation, which is a modified Whittaker smoother, and several other methods with consistently bad results. For instance, if ALS is used with high enough penalty on the second derivative and low enough weight on positive residuals (see Ref. [2]) to give a flat baseline under the peaks, false peaks will appear around the baseline dips after baseline subtraction.

With the proposed method this can be overcome by customizing the subsampling, as proposed in Ref. [7]. Using narrow bins around the baseline dips at 360 nm and 530 nm, wide bins where the baseline looks relatively flat and medium bin widths where the baseline is steep around 400 nm, we get the estimation and correction shown in the upper and lower part of Fig. 6. As long as there is not too much shift in the baseline dips from spectrum to spectrum, the chosen bin widths can be used for all spectra in the data set.
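Customized subsampling of this kind can be sketched by building the bin edges from user-chosen segments. The helper names make_edges and subsample_custom, the segment format (start, stop, bin width) in index units, and the example numbers are all assumptions made for illustration; they are not taken from the LIBS data or from the package.

```python
def make_edges(segments):
    """Build bin edges from contiguous (start, stop, bin_width) segments."""
    edges = [segments[0][0]]
    for start, stop, width in segments:
        if start != edges[-1]:
            raise ValueError("segments must be contiguous")
        pos = start
        while pos < stop:
            pos = min(pos + width, stop)
            edges.append(pos)
    return edges


def subsample_custom(y, edges):
    """Minimum of y in each bin [edges[k], edges[k+1]) with bin centres."""
    centres, minima = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        minima.append(min(y[lo:hi]))
        centres.append((lo + hi - 1) / 2)
    return centres, minima


if __name__ == "__main__":
    # Flat baseline at 5 with a narrow dip at index 10 (synthetic data).
    y = [5] * 20
    y[10] = 1
    # Wide bins on the flat parts, bins of width 1 around the dip.
    edges = make_edges([(0, 8, 4), (8, 12, 1), (12, 20, 4)])
    print(edges)
    print(subsample_custom(y, edges))
```

With narrow bins around the dip, the low value is kept at its own position instead of being spread over a wide bin, so the interpolated baseline can follow the dip without being pulled down in the neighbouring regions.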

Additional information

The described baseline estimation is implemented in the R package ‘baseline’ in the CRAN repository (http://cran.r-project.org) under the name fillPeaks. The four parameters mentioned above are summarized in Table 1.

Acknowledgements

MethodsX thanks the reviewers of this article (Paul H.C. Eilers and a second reviewer who would like to remain anonymous) for taking the time to provide valuable feedback.

Fig. 6. LIBS spectrum corrected using 4S Peak Filling with varying baseline interval widths to handle baseline dips around 360 nm and 530 nm and the steep ridge around 400 nm (no smoothing, window width of 3, and 2 iterations). [Two panels; axes: wavelength (nm), intensity]

Table 1

Summary of the parameters controlling the baseline estimation.

Description | R name | Suggested starting value
Second derivative penalty for smoothing. | lambda | Centred in noise band: 4; below spectra: 0
Number of buckets for subsampling or a vector of start/end points of buckets. | int | 1/10 of the number of wavelengths/masses/points
Initial half width of windows used for suppression. | hwi | Half the width of the widest peak (full width = centre point ± hwi)
Number of iterations for suppression. | it | 10

References

[1] M.S. Friedrichs, A model-free algorithm for the removal of base-line artifacts, J. Biomol. NMR 5 (1995) 147–153.

[2] P.H. Eilers, Parametric time warping, Anal. Chem. 76 (2004) 404–411.

[3] K.H. Liland, B.-H. Mevik, E.-O. Rukke, T. Almøy, T. Isaksson, Quantitative whole spectrum analysis with MALDI-TOF MS, Part II. Determining the concentration of milk in mixtures, Chemometr. Intell. Lab. 99 (2009) 39–48.

[4] A.K. Myakalwar, N.C. Dingari, R.R. Dasari, I. Barman, M.K. Gundawar, Non-gated laser induced breakdown spectroscopy provides a powerful segmentation tool on concomitant treatment of characteristic and continuum emission, PLoS One 9 (2014) e103546.

[5] R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2014.

[6] P.H.C. Eilers, A perfect smoother, Anal. Chem. 75 (2003) 3631–3636.

[7] K.H. Liland, E.-O. Rukke, E.F. Olsen, T. Isaksson, Customized baseline correction, Chemometr. Intell. Lab. 109 (2011) 51–56.
