methods for multivariate regression
Kristian Hovde Liland^{1,⋆}, Martin Høy^{2}, Harald Martens^{2,3}, Solve Sæbø^{1}
8th January 2013
1) Norwegian University of Life Sciences, Department of Chemistry, Biotechnology and Food Science,
P.O. Box 5003, N-1432 Ås, Norway
2) Nofima, Norwegian Institute of Food, Fisheries and Aquaculture Research,
Osloveien 1, N-1430 Ås, Norway
3) Norwegian University of Life Sciences, Department of Mathematical Sciences and Technology,
P.O. Box 5003, N-1432 Ås, Norway
(⋆) Corresponding author: kristian.liland@umb.no, tel: +47 64965830
Abstract
Analysis of data containing a vast number of features, but only a limited number of informative ones,
requires methods that can separate true signal from noise variables. One class of methods attempting
this are the sparse partial least squares methods for regression (sparse PLS). This paper aims at
improving the theoretical foundation, speed and robustness of such methods. A general justification
of truncation of PLS loading weights is achieved through distribution theory and the central limit
theorem. We also introduce a quick plug-in based truncation procedure based on a novel application
of theory intended for analysis of variance for experiments without replicates. The result is a versatile
and intuitive method that performs component-wise variable selection very efficiently and in a less ad
hoc manner than existing methods. Prediction performance is on par with existing methods, while
robustness is ensured through a better theoretical foundation.
1 Introduction
One of the major challenges in recent and coming data analysis is the ever increasing number of variables
recorded for each sample. The data matrices become wider and wider. Because of instrumental noise,
biological noise and other uncontrollable variations in the recorded signal, variables that should have no
signal for a given sample, or be equal across samples, almost never show a zero signal in the final centred
data set. And differences between two signals that should be zero are seldom zero in practice. Since
predictive multivariate methods like partial least squares regression (PLSR) [1] in their basic forms take
into account all variables, the sheer number of non-zero noise variables will often overshadow the true
signal.
Various forms of variable selection approaches have been proposed in the context of regression. Variable
selection can also play a role in finding important variables in explorative studies, with the purpose of
stabilizing the regression modelling and improving its predictive ability and interpretability. Sometimes the
aim is to find which variables influence a certain process causally, or at least convey the most interesting
information, e.g. metabolites, genes, wavenumbers, or molecular weights. Depending on the aim of the
study different selection strategies may be favourable and the focus on how many variables to retain may
be different.
Based on ideas of component-wise variable selection, sparseness and normally distributed noise we propose
to use distribution based truncation to identify all unimportant model parameters that are (or appear to
be) non-zero due to random errors, and force these towards zero. In the present PLSR context, this means
zeroing out small, apparently random elements in all the loading weight vectors. The intention is thereby
to drastically reduce the problem of non-zero noise contributions. In the following sections we will look at
some related methods intended for the same purpose and motivate a simple, intuitive and flexible strategy
for truncation of non-informative variables. Applications to real and simulated data and comparison with
other methods will also be presented.
A basic result in statistics is the central limit theorem (CLT). The CLT was first presented by
Abraham de Moivre in 1733 and has been formalised and interpreted under varying conditions and degrees
of strictness ever since. A simple interpretation is that as the number of observations sampled from a
random process increases, the distribution of the mean (and the sum) will approach a normal distribution.
More interesting in this context is that many types of random noise are seen as approximately normally
distributed, and linear combinations of such will tend even more towards the normal distribution. In
this paper we propose to use the CLT to distinguish between variables with expected non-zero loading
weights and the noisy variables whose loading weights have zero expectation. We refer to the new
modelling principle as Truncation-PLS in the following, and the resulting methods Truncation-PLSR and
Truncation-PLS-DA are described in detail in Section 3.
Many approaches have been invented that attempt to find the interesting information in a cloud of variables,
the needle in the haystack. One of the oldest and most varied classes of methods for this purpose is variable
selection. A large proportion of these methods work univariately, evaluating single variables for inclusion
or exclusion. When the number of variables is counted in tens or hundreds of thousands, this strategy will
be prone to spurious correlations, hampered by multiple testing problems and vulnerable to low sensitivity
or high false discovery rate. Moreover, it can lead to serious misinterpretation: Assume e.g. that the
regressor set contains both an "upstream", causally important variable observed with much noise and a
"downstream" consequential but unimportant variable observed with little noise, and that the two are
strongly intercorrelated. Traditional stepwise variable selection methods will then eliminate the causally
important variable to reduce the collinearity.
Subspace-based regression methods such as PCR and PLSR attain an implicit variable selection, not
by eliminating individual variables, but by eliminating subspace dimensions, i.e. linear combinations of
variables. However, if the number of noisy regressor or regressand variables is very high compared to
the number of observations, this basic bilinear approach is not good enough: the combined covariation
contributions of the noisy variables prevent the bilinear regression methods from finding a useful initial
subspace. Therefore, various variable selection strategies have been developed also for PLSR to improve
prediction and to simplify interpretation, but without eliminating interesting variables just to reduce
collinearity.
One approach is to reduce small parameters towards zero by a general shrinking/expansion of the PLS
loading weight elements according to a chosen exponent (Powered PLS [2, 3]). Another approach is to
induce sparseness in the data by forcing contributions close to zero to be true zeros. Examples of such
methods are the least absolute shrinkage and selection operator (LASSO) [4] and its spin-off the elastic net
[5], both inducing constraints on the $L_1$ norm of the regression vector $\beta$. The latter method also applies
ridging by penalizing the $L_2$ norm of $\beta$. For PLSR sparseness was introduced by Martens & Næs (1989,
p. 160), who suggested the use of rough statistical significance testing of the elements in each individual
loading weight vector, followed by a re-orthogonalization. A similar approach was implemented in terms of
the soft-threshold-PLS [6] (ST-PLS) and sparse PLS [7] (sPLS). These methods apply a shrinkage towards
zero to the PLS loading weights so that many contributions become zero. The amount of shrinkage can
be chosen to remove a certain proportion of the variables or it can be chosen by some other criterion. In
addition to giving a multivariate approach to variable selection, these methods can also select different
variables in each PLS component that is produced. As these two methods, ST-PLS and sPLS, are very
[...] of this method fits models much faster than the sPLS version. We propose to combine the sparseness ideas
with the distributional quality of noise in data, e.g. in PLS loading weights, to sort between noise and
signal and thereby weight down or completely truncate what is classified as noise.
In addition to several of the mentioned sparse methods we will include variable selection by the Variable
Influence on Projection [8] (VIP) and Selectivity Ratio plot [9] (SR) methods for comparison. These PLS
based methods use different criteria for assessing the importance of variables in regression and classification.
We will not go into details about how variables are selected by these methods in this paper, but include
them as reference standards.
The distribution based truncation approach to variable selection adds to an already long list of methods
for variable selection. As described in this article the selection of variables in this approach is motivated
by a well established principle in classical statistics. Furthermore, there is only one tuning parameter
which needs to be set for variable selection, which makes the method simple and easy to implement.
The statistical foundation and the non-complexity of the new method make it appealing and easy to
understand. However, the predictive performance of prediction methods is typically very dependent on
the properties of the data, and there is no uniformly best method for prediction and variable selection.
Therefore, it is important to expand the statistical toolbox, but at the same time it is important to build
an understanding of when the various methods work best. In order to do this we compare the predictive
performance of the various methods and attempt to interpret the results in light of the multivariate
properties of the data.
3 Methods
Distribution assumptions
In the following the Truncation-PLS is based on loading weights from PLS regression, though the concept
is applicable also to regular regression coefficients. Further, the approach could similarly be applied to
select Y variables, or to PLS scores in order to eliminate non-informative samples, but these aspects are
not covered in this paper. When recording output from some kind of spectroscopic/-metric instrument
we expect that the absence of a signal results in white (non-informative) noise, while the presence of a
signal will produce a systematic deviation from randomness. The same applies to other types of data, e.g.
microarrays, but the distribution of the noise varies. When creating vectors of loading weights in PLS, we
compute the first eigenvector of the matrix product $X'_{a-1} \cdot Y_{a-1}$ (for component number $a$). If a given
X-variable is uncorrelated with the response variable(s) (for possibly deflated matrices) the loading weight
for this variable will be a sum over $n$ equally distributed random variables, and by the CLT it will therefore
represent random normal noise, at least approximately. For X-variables correlated to the response variable
the theoretical distributions of each loading weight will also be asymptotically normal distributed, but with
non-zero mean. However, as the correlation increases the distributions will be increasingly skewed. As
the true correlation between an X-variable and the response approaches 1, the limiting distribution of the
corresponding loading weight will be a chi-square distribution with non-zero expectation. In Figure 1 (left)
the theoretical distributions of three non-normalized loading weights (sample size $n = 20$) are illustrated;
a centred normal distribution for an uncorrelated X-variable, and two skewed distributions for two
X-variables with correlation -0.6 and 0.6 with the response, respectively. In this figure the distributions have
[...] noise distribution and 30% are correlated with the response with either the -0.6 or the 0.6 correlation.
In a real data application the loading weights of the informative X-variables will follow different skewed
distributions. The sample distribution of the weights will therefore represent a mix of several theoretical
distributions and not just three as used in Figure 1 (left). An example of a sample distribution of loading
weights is given in Figure 1 (right). The main objective in Truncation-PLS is to find lower and upper
cut-offs between which it is assumed that the majority of the loading weights represent noise variables.
Hence, the problem boils down to finding an estimate of the central normal distribution of loading weights
(or at least selected percentiles) in order to distinguish this from the skewed distributions.
(Figure 1, two panels. Left panel axes: Value vs. Density. Right panel axes: Loading weight vs. Frequency.)
Figure 1: Left: Simulated theoretical distributions of loading weights from X variables with no correlation
to the response (red curve, 70% centred around 0), and correlation of -0.6 and 0.6, respectively (blue
curves, 15% each, centred around -12 and 12, respectively). Right: Histogram of normalized loading
weights (milk protein data) illustrates the distributional character of the non-informative loading weights.
The red vertical lines indicate the cut-offs between inliers and outliers.
To conform to the classical CLT the observations would need to be independent, but this is not always true
in practice. However, CLT theory also exists for observations having weak dependence, and we will only
consider the variables where we do not expect any information to be present, supporting independence of
these variables.
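The normal approximation for noise loading weights can be checked numerically. The following is a minimal sketch and not taken from the paper: it assumes uniformly distributed (hence non-normal) noise in X, a fixed centred response, $n = 20$ as in Figure 1, and a hypothetical helper name `noise_loading_weights`. Each non-normalized loading weight is the sum $\sum_i x_{ij} y_i$, so under the CLT argument roughly 95% of the noise weights should fall within 1.96 theoretical standard deviations of zero.

```python
import random
from math import sqrt


def noise_loading_weights(n, p, y, rng):
    """Non-normalized loading weights w_j = sum_i x_ij * y_i for p pure-noise variables."""
    weights = []
    for _ in range(p):
        # Assumption: uniform noise on [-1, 1], i.e. mean 0 and variance 1/3.
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        weights.append(sum(xi * yi for xi, yi in zip(x, y)))
    return weights


rng = random.Random(1)          # fixed seed, for illustration only
n, p = 20, 5000
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_mean = sum(y) / n
y = [v - y_mean for v in y]     # centred response

w = noise_loading_weights(n, p, y, rng)

# Given y, each w_j has mean 0 and variance (1/3) * sum_i y_i^2.
sd_theory = sqrt(sum(v * v for v in y) / 3.0)
coverage = sum(abs(v) <= 1.96 * sd_theory for v in w) / p
print(f"share of noise weights within 1.96 sd: {coverage:.3f}")
```

The noise columns are not column-centred here, which is a simplification; with a centred response the expectation of each weight is still zero.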
Algorithm
The idea presented in Section 2 lays the ground for a wide range of possible implementations for classifying
data as noise or signal based on their distribution. In principle, the truncation may be applied to several
different model parameter types: to object scores in Y or X, to Y-loading weights and to X-loading weights.
In this paper we focus on the truncation of the X-loading weights, called w in the nomenclature of [10].
The main approach will be to make a confidence interval around the median value of a sorted vector, e.g.
PLS loading weights, and truncate or down-weight everything that falls inside the interval, see Algorithm
1. The width of the confidence interval will be estimated using theory from Lenth [11]. A second approach
will be to make use of a qq-plot, classifying variables close to the straight line going through a chosen pair
of quantiles as inliers. Alternatively one could adapt a normal or Student t distribution to the same vector
by direct fitting to the selected distribution, but this can be a time consuming and unstable procedure.
The variations have in common that outliers are considered true information, while observations within
a certain range of the distribution are classified as noise. In the histogram of loading weights in Figure
1 (right) the estimated cut-offs between inliers and outliers are indicated. The general distribution based
truncation algorithm is as follows:
• Input candidate loading weight vector $w$ to be truncated.
• Sort $w \Rightarrow w_s$.
• Either compute a confidence interval around the median of $w_s$, or
fit a line through quantiles around the median of $w_s$.
• Classify outliers as real, informative contributions and inliers as noise.
• Truncate inliers.
In practice the distribution based truncation can be plugged into the NIPALS [12] algorithm or kernel based
algorithms as a component-wise processing of the candidate PLS loading weights to impose sparseness on
the variables, or even truncate the scores to impose sparseness on the objects. In this paper we limit
the applications to the single response case, but the procedures are equally relevant in multi-response
problems, as well as other multivariate methods like LPLS, PCA, ICA and CCA. Truncation of loading
weights will be relevant for most applications as it is more likely that some variables do not contribute
to a component than that a set of objects do not contribute. When truncating only loading weights, the
following computation of scores ensures that loading weights and scores reflect the same information. If
scores are truncated, this will not be reflected in the information of the loading weights, meaning that a
re-computation of loading weights and scores may be necessary based on the truncation generated from
the scores, or loading weights have to be disregarded when analysing the resulting model. As suggested by
Martens & Næs, one could also re-orthogonalize the vectors of loading weights if orthogonality is considered
important. Re-orthogonalization may introduce shadowing effects from previous components such that some
zero loading weights become non-zero. For the data sets we are using in this paper the changes in regression
coefficients are very small with or without re-orthogonalization, and the predictions are equal since the
non-orthogonalized and orthogonalized loading weights span the same predictor space.
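As an illustration of how the truncation step could sit inside a NIPALS-type iteration, the sketch below computes one single-response PLS component with hard-thresholded loading weights. It is a simplified reading of the description above, not the authors' implementation: the function names `truncate_inliers` and `pls1_component` are hypothetical, the half-width `margin` of the interval around the median is supplied by the caller instead of being estimated, X and y are assumed centred, and no re-orthogonalization is performed.

```python
from statistics import median


def truncate_inliers(w, margin):
    """Hard thresholding: zero every weight inside median(w) +/- margin."""
    centre = median(w)
    return [0.0 if abs(wk - centre) <= margin else wk for wk in w]


def pls1_component(X, y, margin):
    """One PLS1 component with truncated loading weights.

    X is a list of n rows with p centred columns, y a centred list of length n.
    Returns loading weights, scores, X-loadings, y-loading and deflated X, y.
    """
    n, p = len(X), len(X[0])
    # Candidate loading weights w = X'y, scaled to unit length.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Truncate inliers, then rescale the surviving weights.
    w = truncate_inliers(w, margin)
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        raise ValueError("all loading weights were truncated; use a smaller margin")
    w = [v / norm for v in w]
    # Scores are computed after truncation so that they reflect the same information.
    t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t)
    p_load = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
    q = sum(y[i] * t[i] for i in range(n)) / tt
    # Deflate X and y before the next component.
    X_defl = [[X[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
    y_defl = [y[i] - q * t[i] for i in range(n)]
    return w, t, p_load, q, X_defl, y_defl
```

Calling `pls1_component` repeatedly on the deflated matrices gives further components, each with its own set of non-zero variables.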
Instead of applying hard thresholding, where inliers are set to zero and outliers are kept as they are, it
could be valuable to shrink according to the probability of being an inlier or outlier. Such a soft shrinkage
could be $1 - P(x_j = \text{inlier})$, but estimating this probability would require estimates of the distributions
of the outliers. Instead we apply a cumulative distribution function on the observed variables and rescale
so that the median is given weight 0 and the largest outlier is given weight 1. As this strategy gives
rather poor distinction between inliers and outliers we introduce a parameterized version of these weights
to produce weights that are closer to a hard cut-off as illustrated in Figure 2.
(Figure 2. Axes: Variable number vs. Weight.)
Figure 2: Transformation of scaled weights for gradually steeper transition between inliers and outliers.
For this example the weight corresponding to the cut-off between inliers and outliers is set to 0.7.
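The unparameterized soft weights can be sketched as follows. This is only one possible reading of the description above: the choice of a normal cumulative distribution function, the `scale` argument and the function name `soft_weights` are assumptions made for illustration. The parameterized, steeper version shown in Figure 2 is not reproduced, since its form is not specified in the text.

```python
from statistics import NormalDist, median


def soft_weights(w, scale):
    """Weights in [0, 1]: 0 at the median of w, 1 at the most extreme weight.

    Assumption: distances from the median are passed through the CDF of
    |N(0, scale^2)|, which equals 2 * Phi(d / scale) - 1, and then rescaled.
    """
    centre = median(w)
    cdf = NormalDist(0.0, scale).cdf
    raw = [2.0 * cdf(abs(v - centre)) - 1.0 for v in w]
    top = max(raw)
    if top == 0.0:
        return raw              # all weights equal the median
    return [r / top for r in raw]


# Hypothetical loading weights: four small values and one clear outlier.
weights = soft_weights([-0.5, -0.1, 0.0, 0.1, 3.0], scale=1.0)
```

Multiplying each loading weight by its soft weight would then shrink inliers strongly and leave the largest outlier unchanged.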
3.1 Cut-off determination
In order to find cut-offs between inliers and outliers an estimate of the central normal distribution of
inliers is needed. Since the distribution is centred at zero the distribution will be fully characterized by
an estimate of its variance. In order to distinguish the central noise distribution from the non-central
distributions of the informative outliers, a mixture model approach could be adopted. For instance, [13]
presented a mixture model approach for sample size determination with false discovery rate control for
high-throughput data problems, and a similar approach could be adopted here. However, estimating a set
of central and non-central distributions involves iterative procedures (like the EM-algorithm) which would
seriously slow down the fitting process of the PLS regression model. Further, only the variance of the
central noise distribution is needed, not the properties of the non-central distributions.
A similar problem arises in the analysis of saturated ANOVA models for $2^k$-designs without replicates.
Then all degrees of freedom are consumed in the estimation of the effects and no conventional error variance
estimate can be computed. Still, all effect estimates have the same variance, but a set of non-important
effects have zero expectation. From these a variance estimate for significance testing can be found by the
method presented by Lenth [11]. In order to estimate the variance Lenth uses the fact that the standard
deviation of a central normal distribution is tightly connected to the median of the absolute value of
the random variable. Since the median is rather robust against the influence from outliers, this variance
estimate will be only moderately affected by the outliers as long as the majority of the effects (or loading
weights in our case) are samples from the central noise distribution. In the setting of this paper the
approach of Lenth can be described as follows:
Let $w_1, w_2, \ldots, w_p$ represent the loading weights computed from the $p$ X-variables at step $a$ of the PLS
algorithm. Further, define $s_0 = 1.5 \cdot \mathrm{median}_k |w_k|$ for $k = 1, \ldots, p$. It can be shown that $s_0$ is a fairly good
estimate of the standard deviation of the normal distribution of the inliers. In order to make it even more
robust and less biased Lenth recommends to make the final estimate, the pseudo standard error (PSE),
based on a set of inlying values only:
$$PSE = 1.5 \cdot \mathrm{median}_{|w_k| < 2.5 \cdot s_0} |w_k|.$$
Lenth argues that if the $w_k$ are realizations of a $N(0, \tau^2)$ random variable $W$, the median of $|W|$ is
approximately $0.675\tau$, implying that $1.5 \times \mathrm{median}\,|W| \approx 1.01\tau$. And since $Pr(|W| > 2.5\tau) \approx 0.01$, the PSE
is roughly consistent for 1.5 times the 0.495th quantile of $|W|$, which is $1.5 \times 0.665\tau \approx \tau$.
The PSE can be combined with a Student t quantile of $d = p/3$ degrees of freedom to give a conservative
margin of error (ME) for confidence intervals: $ME = t_{0.975;d} \cdot PSE$ (95% confidence). However, in
high-throughput data problems the degrees of freedom will usually be large, and percentiles from the standard
normal distribution may be used instead. In the PLS algorithm the cut-offs are thus defined by the limits
of a $(1 - \alpha)100\%$ confidence interval around the median loading weight with margins of error as described
above: $\mathrm{median}(w) \pm ME$, for some chosen confidence level $(1 - \alpha)$.
If there is a large asymmetry in the number of positive and negative outliers, the skewness in the distribution
of $w$ may cause ME to be slightly inflated, causing a potential loss of informative outliers detected in the
lighter tail. This can be avoided by estimating the margin of error separately for positive and negative
loading weights. This is accomplished by first finding $s_0^-$ and $PSE^-$ using the absolute values of the
negative weights and then computing the margin of error $ME^-$ for the lower tail. Then the same exercise
is conducted for the positive loading weights, finding $s_0^+$, $PSE^+$ and finally $ME^+$ for the upper tail. Finally,
the cut-offs are defined by $ME = \min(ME^-, ME^+)$. The increased flexibility can improve the estimation
of boundaries between inliers and outliers when there is asymmetry in the distributions. In the rest of this
paper we refer to truncation using Lenth's methods as Lenth.
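The cut-off computation above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, and a standard normal quantile is used in place of the Student t quantile with $d = p/3$ degrees of freedom, which the text describes as acceptable when the number of variables is large. The example data are idealised: 1000 exact standard normal quantiles as noise plus 20 made-up informative weights.

```python
from statistics import NormalDist, median


def pseudo_standard_error(values):
    """Lenth's PSE: 1.5 * median of |v| over the values with |v| < 2.5 * s0."""
    abs_v = [abs(v) for v in values]
    s0 = 1.5 * median(abs_v)
    inliers = [a for a in abs_v if a < 2.5 * s0]
    if not inliers:             # happens when more than half the values are exactly zero
        return 0.0
    return 1.5 * median(inliers)


def margin_of_error(values, alpha=0.05):
    """ME = z_{1 - alpha/2} * PSE (normal quantile used instead of t_{1 - alpha/2; p/3})."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) * pseudo_standard_error(values)


def lenth_cutoffs(w, alpha=0.05):
    """Lower and upper cut-offs: median(w) -/+ ME."""
    me = margin_of_error(w, alpha)
    centre = median(w)
    return centre - me, centre + me


def two_tail_margin(w, alpha=0.05):
    """ME = min(ME-, ME+), estimated separately from negative and positive weights."""
    neg = [v for v in w if v < 0]
    pos = [v for v in w if v > 0]
    return min(margin_of_error(neg, alpha), margin_of_error(pos, alpha))


# Idealised example: N(0, 1) noise weights plus hypothetical informative weights.
noise = [NormalDist().inv_cdf((k + 0.5) / 1000) for k in range(1000)]
signal = [10.0] * 20
w = noise + signal

pse = pseudo_standard_error(w)
lower, upper = lenth_cutoffs(w, alpha=0.05)
informative = [v for v in w if v < lower or v > upper]
```

With $\tau = 1$ in this example the PSE comes out close to 1 despite the 20 outliers, in line with Lenth's consistency argument, and all the made-up informative weights fall outside the interval.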
3.2 Outlier detection by qq-plots
An alternative to the above strategy is to use a qq-plot (quantile-quantile plot) as basis, extending an
interval around the median value of $w_s$ minimising the mean squared error (MSE) to the line going
through selected quantiles (qq-line), e.g. the 25th and 75th percentile of the Student t distribution or
normal distribution, see Figure 3. To favour solutions having many inliers the MSE is weighted with the
ratio between the total number of points and the number of non-informative inliers ($n_{tot}/n_{in}$). Alternatively
one can favour solutions with few informative outliers with MSEs that are not significantly worse than
the minimum MSE. Utilizing functions based on golden section search with parabolic interpolation, or
similar, the MSE minimization can be solved quickly as a linear search, or a series of such in cases of
asymmetry. Visualisation of the sorted $w$ vector plotted against the final distribution, e.g. Figure 3, can
aid in validating and justifying the final truncation.
(Figure 3. Axes: Student t distribution (22 pseudo df) vs. Sorted loading weights.)
Figure 3: qq-plot of the first vector of loading weights (colon cancer data) against a Student t distribution
with 22 pseudo degrees of freedom. Small dots indicate outliers while larger dots indicate inliers. The line
going through the 20th and 80th percentiles is indicated in dot-dashed form.
[...] exactly how many degrees of freedom that are consumed by a PLS component is not trivial, but a rough
estimate is the following leverage-based estimate (pseudo degrees of freedom):
$$\frac{\sum_i t_{a,i}^2}{\max_i (t_{a,i}^2)},$$
where $t_a$ is the $a$-th PLS score vector and $i$ is the sample number. As the truncation is robust to changes
in number of degrees of freedom, we do not need the exact degrees of freedom. Note that the number of degrees
of freedom consumed will change after truncation. In the rest of this paper we refer to truncation using
qq-plots as qq-line.
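The leverage-based estimate is simple to compute from a score vector. The short sketch below assumes the ratio form of the formula, sum of squared scores divided by the largest squared score, and uses a hypothetical function name and made-up score values.

```python
def pseudo_degrees_of_freedom(t):
    """Leverage-based pseudo degrees of freedom: sum_i t_i^2 / max_i t_i^2."""
    squared = [v * v for v in t]
    largest = max(squared)
    if largest == 0.0:
        raise ValueError("score vector is identically zero")
    return sum(squared) / largest


# Hypothetical score vectors: evenly spread scores give a value equal to the
# number of samples, a single dominating sample gives a value near 1.
even = pseudo_degrees_of_freedom([1.0, -1.0, 1.0, -1.0])
dominated = pseudo_degrees_of_freedom([2.0, 0.0, 0.0])
```

The value always lies between 1 and the number of samples, which makes it a convenient rough input for the Student t reference distribution of the qq-plot.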
Note that for both the Lenth and the qq-line method the number of variables selected as informative
may vary from one component to another. Furthermore, the same variable may be selected in several
components. Hence, the total number of selected variables may not be set exactly, but can be to some
extent controlled by the number of PLS components and the chosen width of the interval around the
median weight.
3.3 Reference methods
The truncation procedures are compared to ST-PLS, Elastic net, variable selection by VIP and SR, and
PLS without any modifications. This is a small subset of representative methods. For more PLS based
variable selection methods we recommend the papers of Mehmood et al. [14] and Roger et al. [15]. To
make comparisons fair we optimize each method separately with regard to classification/prediction. The
performance of each method is evaluated on test set data or by cross-validation in terms of classification
errors for the classification problems and root mean square error of prediction (RMSEP) for the prediction
problems. With the Elastic net the optimization is performed over a reasonable grid of ridging values (0.1
to 1, where the value 1 gives the Lasso) and $L_1$ shrinkages (automatically chosen [16]). The shrinkage of
ST-PLS is varied over a relevant range (0.05 to 0.95), and the cut-off for VIP is varied from 0.8 to 1.2 [17].
For SR we optimize the cut-off between 0.05 and 0.5, as the cut-off suggested by the authors (0.5) selects
too few variables to obtain good predictions on the data sets tested in this paper. Because there are so
many models, not all parameter combinations will be reported.
There are several sparse PLS regression methods to choose between, but we found that their resulting
variable selections were quite similar, especially when optimizing the sparseness parameter with regard to
prediction. We have selected ST-PLS [6] as a common representative, though any of [7, 18, 19] would have
been a good alternative.
In addition to the results associated with parameters giving the lowest prediction errors we will present
models that have slightly higher prediction errors but give more sparse loading weights and regression
coefficients (simplified models). For the data sets where repeated cross-validation is used, the simplified
models should have no more than one standard error higher prediction error, while for the data sets where
test set prediction is used common additions to the error of 0.001 and 0.01 are used (see the Results
section).
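The selection of a simplified model can be sketched as follows. This is an assumed reading of the rule above: among all candidate models whose error is within the allowed slack of the best error, the one with the fewest non-zero variables is chosen. The function name, the dictionary layout and all numbers in the example are hypothetical and serve only to show the mechanics.

```python
def simplified_model(models, tolerance=None):
    """Pick the sparsest model whose error is close enough to the best error.

    models: list of dicts with keys 'error', 'se' and 'nonzero'.
    tolerance=None applies the one-standard-error rule (cross-validation case);
    a number applies a fixed addition to the best error (test set case).
    """
    best = min(models, key=lambda m: m["error"])
    slack = best["se"] if tolerance is None else tolerance
    admissible = [m for m in models if m["error"] <= best["error"] + slack]
    return min(admissible, key=lambda m: m["nonzero"])


# Made-up candidates for illustration only; these are not results from the paper.
candidates = [
    {"name": "A", "error": 0.100, "se": 0.010, "nonzero": 500},
    {"name": "B", "error": 0.105, "se": 0.010, "nonzero": 120},
    {"name": "C", "error": 0.130, "se": 0.012, "nonzero": 40},
]
by_one_se = simplified_model(candidates)
by_fixed = simplified_model(candidates, tolerance=0.001)
```

In this made-up example the one-standard-error rule moves from model A to the sparser model B, while the stricter fixed addition of 0.001 keeps model A.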
4.1 Data sets
The distribution based truncation method for variable selection is compared to the reference methods
on both a set of real data sets and simulated data. These data sets represent a wide range of
high-dimensional data types with different properties, and the results will be discussed in light of these. In
order to summarize the data properties we use the approach of Helland and Almøy [20] and Sæbø et al. [6]
who study the eigenvalue structure of the sample covariance matrix of the predictors and the covariance
between the principal components and the response. In the following we refer to the latter property as the
relevance of a latent component, following the notation of Næs and Helland [21]. We summarize the data
structures in eigenvalue-covariance plots. Helland and Almøy [20] conclude in their study that prediction,
using PLSR methods at least, is most difficult in cases where there are irrelevant components having large
eigenvalues, or conversely, if there are relevant components having small eigenvalues. In these cases we
therefore expect that variable selection methods based on latent components will be less favourable.
4.1.1 Simulated data
These are simulated data containing two correlating, informative features and a variable number of
uninformative variables as described in [22, 23]. The total number of variables ranges from 100 to 20000, and
the number of observations in each of two classes are 100 and 50 for the calibration and validation data,
respectively. The simulation study is replicated exactly to be comparable to the papers it has appeared in
previously.
4.1.2 Colon cancer data
These are expression levels of 2000 genes on 62 patients as presented by Alon et al. [24]. Among the
patients 20 were healthy while 42 had colon cancer. As can be seen from Figure 4 there are several large
eigenvalues which indicate several directions in the predictor space of large variance. At the same time
these directions appear to be relevant for prediction by having large covariances with the response. Hence,
prediction using PLS based methods should be relatively easy, but might require a few components.
4.1.3 Prostate cancer data
These are expression levels of 12600 genes on 102 patients as presented by Singh et al. [25]. Among
the samples 52 were tumor specimens and 50 were normal. From Figure 4 we observe a rapid drop in
eigenvalues implying strong dependence between the predictor variables. However, some directions of small
variability (small eigenvalues) have some of the largest covariances with the response. This is an example
of a data set where there are relevant components with small eigenvalues which according to Helland and
Almøy [20] is not favourable for PLS prediction. We therefore expect that the PLS-based variable selection
methods will not perform well for this data set.
4.1.4 Fish oil data
These are Raman spectra from 45 oil samples extracted from farmed salmon (Salmo salar) [26]. Raman
spectroscopy with a UV laser has been conducted. As a fat indicator the iodine value has been chosen
as the response for regression. The spectra are pre-processed by asymmetric least squares [27] ($\lambda = 7$,
$p = 0.11$ [28]) wrapped in a customized baseline correction [29] to reduce baseline flexibility under a broad
cluster of peaks. The spectra have been cut down to 2263 wavelengths to remove artifacts at the ends of
the spectra. These data have a structure resembling the colon data with several directions in the predictor
space with high variability and high relevance. Prediction should be relatively easy using a few components
in the PLS model.
4.1.5 Milk protein data
These are matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) spectra from 45 milk
mixtures (x4 spot replicates) of cow, goat and ewe milk [3]. Another set of 45 mixtures from a technical
replicate is used as validation data. Spectral values from 5000 m/z to 20000 m/z (6179 variables) are
used for predicting the percentage of cow milk in the mixtures, i.e. the degree of adulteration. If the
truncation procedure is plugged into canonical PLS (CPLS) [30], the percentage of goat and ewe milk can
be used as additional responses to obtain more parsimonious solutions. The eigenvalues for these data
imply strong variable dependence with one or two relevant components. Prediction should be quite easy
with few components using PLS regression.
(Figure 4, four panels: Colon cancer data, Prostate cancer data, Fish oil data, Milk protein data. Axes in
each panel: Component vs. Scaled eigenvalue and Scaled covariance.)
Figure 4: Summaries of data properties for the real data sets. Eigenvalues of the sample covariance matrix
(scaled by the largest) are marked by the height of bars. Covariances (scaled by the largest) between
principal components and the response are marked by red dots.
4.2 Results
4.2.1 Simulated data
Following the proposed simulation scheme of [22] as was done with PLS and sPLS in [23], we obtain the
results shown in Figure 5. Choosing two different widths of the confidence intervals of Lenth's method we
find classification errors almost identical to what was shown using sPLS and greatly improved compared
to the conventional PLS regression. However, the widest Lenth confidence interval (99.9%) gives almost
perfect classification regardless of number of uninformative variables. These optimistic results are caused
by a simulation procedure that highly favours sparse modelling methods, and so should not be
over-interpreted.
(Figure 5. Axes: p vs. Classification error. Legend: PLS, Lenth (95%), Lenth (99.9%).)
Figure 5: Classification error of two class simulated data. Two regressor variables are informative for the
regressand variable, while the total number of regressor variables is indicated on the first axis as $p$.
4.2.2 Colon cancer data
Figure 6a shows the average classification error of patients from 200 random 10-fold cross-validations [31].
Linear discriminant analysis with empirical priors is used for the classification. It is evident that one
component is not enough to obtain good classification regardless of the PLS method used. Elastic net
performs approximately at the same level as the one-component PLS variants. The ST-PLS and qq-line
Truncation-PLS have the best combinations of few non-zero variables and low classification error (bottom
left corner of the figure). The VIP and SR methods with two and three PLS components have a slightly
worse combination of sparseness and error, together with Lenth and Weighted Lenth.
We also observe that choosing a model with slightly higher error than the best model can greatly reduce
the number of non-zero variables, especially for Lenth's method. Depending on the aim of the analysis, e.g.
variable selection or stable predictions, the choice of truncation type and parameter settings may differ,
especially since all the presented models using two and three components lie within a 1% error margin.
The most sparse two component models (average number of non-zero variables in parentheses) are
ST-PLS (74, simplified model), qq-line (171), Lenth (243) and ST-PLS (294). All of these models have a
higher average precision compared to the ordinary two component PLS solution, and are very close to the
precision of the three component PLS solution.
4.2.3 Prostate cancer data
Figure 6b shows the average classification error of patients from 100 random 10-fold cross-validations. We
observe that the best predictions are found when using 5 component PLS models with variable selection
by SR. Following closely is the Elastic net. Both of these methods give very sparse solutions. There is
almost a 2% gap down to the rest of the methods. Here variable selection by VIP, qq-line (simplified
model), ST-PLS and Lenth give the most sparse solutions while Weighted Lenth gives marginally better
classification.
For this data set it seems that the small variation in the discriminating information favours Elastic net
and SR while the sparse PLS methods and VIP obtain proportions correctly classified similar to only using
PLS with all variables.
[Figure 6, panels a-d: classification error (a, b) or root mean squared error of prediction (c, d) against the number of non-zero variables. Plot symbols give the number of PLS components. Legend: Lenth, Weighted Lenth, qq-line, ST-PLS, Elastic net, VIP, SR.]
(a) Colon cancer micro-array classification using LDA. Full PLS-DA: 1 comp.: 0.130, 2 comp.: 0.105, 3 comp.: 0.085 (dashed lines).
(b) Prostate cancer micro-array data classification using LDA. Full PLS-DA: 5 comp.: 0.078, 10 comp.: 0.0825 (dashed lines).
(c) Fish oil Raman data prediction of iodine. Full PLSR: 1 comp.: 2.70, 2 comp.: 1.68, 3 comp.: 1.74 (dashed lines).
(d) Milk protein MALDI-TOF data prediction of adulteration. Full PLSR: 1 comp.: 0.103, 2 comp.: 0.074, 3 comp.: 0.078 (dashed lines).
Figure 6: Repeated random 10-fold cross-validated classification (subfigures a and b) and test set
predictions (subfigures c and d) using varying numbers of PLS components. The symbols indicate different
variable selection strategies and their numbers of components. Black symbols are associated with the
parameters giving the highest precision, while red symbols indicate models using fewer variables while
retaining most of their precision.
4.2.4 Fish oil data
In Figure 6c we see the results of test set predictions using the same methods as above. Parameters have
been chosen by cross-validation. The best combination of prediction and sparseness is observed for Lenth
and ST-PLS. Precisions of these predictions are much better than only using PLS. The RMSEP values
from Elastic net are somewhere between the one component PLS models and the two/three component
models. As the parameters and simplifications are chosen on the cross-validation results, we observe both
reductions and increases in RMSEP when using simplified models.
4.2.5 Milk protein data
In addition to comparison with the reference methods this data set is included both to show how one can
obtain parsimonious models by plugging the truncation algorithm into a different NIPALS algorithm, the
canonical PLS, and to show how interpretation of spectral data can be made easier by imposing sparseness.
The CPLS algorithm differs from the regular PLS in the way that additional sample information (like design
variables) may be included as extra response variables to stabilize the extraction of the latent components.
This has the typical effect that the number of components is reduced compared to PLS regression. As
mentioned in the description of the data the percentage of goat and ewe milk was included as additional
responses in the analysis of the cow milk data. In Figure 6d we see the results of test set predictions
using the same methods as above. Parameters have been chosen by cross-validation. Here Elastic net
is the winner considering the combination of prediction and sparseness. However, prediction-wise the
other methods are very close behind. Among the PLS based methods, Lenth has the best combination of
prediction and sparseness, having marginally better prediction than Elastic net using fewer than 1/6
of the variables with the simplified model.
Figure 7 shows the prediction error of PLS and CPLS regression used separately and combined with a
pre-chosen truncation (99.9% confidence interval (Lenth's method) with sharp cut-off). We observe that
for models using few components truncation has no effect on prediction with PLS, but gives a minor
improvement when combined with CPLS. Also, CPLS has much lower prediction error for one and two
component models. Looking only at prediction, the best balance between prediction error and complexity
is a two component CPLS model with truncation.
[Figure 7, left panel: RMSEP against number of components for PLS, PLS (Lenth), CPLS and CPLS (Lenth). Right panel: number of non-zero variables against number of components, per component and in total, for PLS (Lenth) and CPLS (Lenth).]
Figure 7: Prediction of cow milk proportions in milk mixtures from MALDI-TOF spectra (left) and
the number of non-zero variables per component/in total using truncation (right). The total number of
variables was 6179.
In Figure 8 we see the first two vectors of loading weights from PLS and CPLS regression with and
without truncation. The contrast is high with a high level of noise in the upper spectra and only a few
remaining peaks in the lower spectra. Here the truncated spectra seem to have an advantage when used
for interpretation and protein assignment.
[Figure 8, three stacked panels labelled PLS, CPLS and Truncated CPLS: loading weights against x1000 m/z.]
Figure 8: Loading weight vectors from MALDI-TOF spectra of milk (two first components). The top
spectra come from ordinary PLS, the middle spectra from CPLS, while the bottom spectra come from
truncated CPLS with truncation parameters selected to reflect a typical choice applicable for many types
of data.
Through this paper we have formalised some aspects of the family of sparse PLS methods. Firstly we
have justified truncation of loading weights through the central limit theorem and the distributions of
loading weights with no correlation to the response. Secondly we have proposed a new truncation founded
on classical statistical asymptotic principles. This is introduced through a novel application of Lenth's
theory for creating confidence intervals in saturated ANOVA models for $2^k$-designs without replicates. The
effect is that the user only has to choose a significance level for the confidence interval, resulting in a less
ad hoc approach.
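As a rough illustration of this kind of truncation, the Python sketch below computes Lenth's pseudo standard error (PSE) for a vector of loading weights and zeroes every weight that falls inside the resulting interval. The function names, the example weights, the sharp cut-off and the use of a normal quantile in place of Lenth's t quantile (with m/3 degrees of freedom) are assumptions made for this sketch. It is not claimed to reproduce the exact procedure used in the paper.

```python
from statistics import NormalDist, median


def lenth_pse(weights):
    """Lenth's pseudo standard error: a robust scale estimate that assumes
    most of the weights are pure noise centred on zero."""
    abs_w = [abs(w) for w in weights]
    s0 = 1.5 * median(abs_w)
    # Re-estimate the scale after trimming weights that look like real effects.
    trimmed = [a for a in abs_w if a < 2.5 * s0]
    return 1.5 * median(trimmed) if trimmed else 0.0


def truncate_weights(weights, level=0.95):
    """Set weights inside the two-sided interval +/- z * PSE to zero.

    Assumption: a normal quantile stands in for Lenth's t quantile, which is
    a close approximation when the number of weights is large."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * lenth_pse(weights)
    return [w if abs(w) > margin else 0.0 for w in weights]


# Hypothetical loading weights: eight small noise weights and two large ones.
w = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01, 0.70, -0.60]
print(truncate_weights(w, level=0.95))
```

The only tuning parameter in the sketch is `level`, which mirrors the point made above that the user chooses a significance level and nothing else.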
Truncation in this paper is achieved using a general and flexible plug-in which can easily be adjusted and
implemented also in other projection based methods like PCA [32], ICA [33], PCR, CPLS and PPLS.
PLS regression is an iterative algorithm and component wise truncation will inevitably slow down the
algorithm, but Lenth's method is extremely quick, i.e. there is a minimal lag compared to just running
regular PLSR. The alternative approach based on the qq-line is also quite quick, and appears to give
slightly better results in some situations.
With regard to prediction performance the truncation PLS is mostly on par with ST-PLS, sometimes
a little better, sometimes a little worse. As with all statistical methods, this is highly data dependent.
However, there are few parameters to tune and they have statistical interpretations. For the data sets
included in this paper we see that Elastic net sometimes performs significantly better than the sparse PLS
methods, while it trails behind when used on other data sets. This is also the case for the variable selection
by Selectivity Ratio plots and to some extent the Variable Influence on Prediction method. The Lasso
was also tested with the included data sets, but being a special case of the Elastic net it never performed
better in practice. But prediction is not the only goal for a statistical method. The truncation methods
have also shown consistently good results, are based on intuitive theory, are quite robust to the choice of
parameters and are extremely quick.
The performance of the various methods may to some extent be explained by the structure of the data.
The PLS-based methods perform relatively better when there are many directions in the predictor space
with both a high variance (high eigenvalue) and a high relevance. This was the case for both the colon
cancer data and the fish oil data, and here also the PLS-based variable selection methods performed well,
with the new truncation method and ST-PLS slightly ahead of the others. For the prostate data these
methods performed worse, and this result confirms the expectations based on the data properties that
PLS methods have trouble making good predictions for this kind of data where there are directions in the
predictor space of low variance, but with high relevance. However, an exception is the SR method based
on the 5 component PLS model. This can be explained by the fact that the SR method is adjusted to be
more favourable than ordinary PLS when there are variables with low variances, but with high correlations
with the response [9]. This is exactly what is the case here according to Figure 4. Apparently the elastic
net has a similar behaviour, which can be explained by the fact that this method, like the ordinary least
squares, gives higher weight to variables with high correlations to the response, as opposed to the more
covariance-focused PLS. The results indicate that in cases where there is a strong correlation structure in
the data (prostate cancer data and milk protein data) the elastic net is a good choice of method for variable
selection. When choosing a method for analysis and variable selection it may therefore be worthwhile to
study the data properties in terms of eigenvalues and component-response covariances.
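One possible way to carry out such a check is sketched below in Python with NumPy: the eigenvalues of the predictor covariance matrix are listed next to the covariance between each principal component score and the response. The simulated data, the variable names and the choice of principal components as the "directions" are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Hypothetical predictors with decreasing standard deviations.
X = rng.normal(size=(n, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
# By construction the response depends on the low-variance fourth variable.
y = 4.0 * X[:, 3] + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Eigenvalues of the sample covariance matrix of X, via the SVD of centred X.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (n - 1)

# Sample covariance between each principal component score and the response.
scores = Xc @ Vt.T
relevance = scores.T @ yc / (n - 1)

for k, (lam, cov) in enumerate(zip(eigenvalues, relevance), start=1):
    print(f"component {k}: eigenvalue {lam:.3f}, covariance with y {cov:+.3f}")
```

A table of this kind shows whether the directions that matter for the response are also the directions with high variance, which is the property discussed above.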
In most applications, small deviations from orthogonality can be disregarded. However, when orthogonal
vectors of loading weights are important, a re-orthogonalization step can be included after the truncation,
forcing the current vector of loading weights to be orthogonal to the previous vectors extracted. The
down-side to this is that shadow effects from previous loading weights may appear in the re-orthogonalized
loading weights, causing zero weights of regressors already used in previous components to become
non-zero. For the data sets we have used in this paper, the shadow effects were so small that they were invisible
in plots, and only appeared a few times in measurable sizes. The total number of non-zero regression
coefficients should not be affected.
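The re-orthogonalization step and the shadow effect can be illustrated with a small Python sketch. It assumes that re-orthogonalization is done by projecting the new loading weight vector onto the orthogonal complement of the previous ones (a Gram-Schmidt type step); the function name and the toy vectors are hypothetical.

```python
import numpy as np


def reorthogonalize(w_new, W_prev):
    """Remove from w_new its projection on the columns of W_prev and
    rescale the result to unit length."""
    # Least-squares coefficients of w_new on the previous loading weights.
    coef, *_ = np.linalg.lstsq(W_prev, w_new, rcond=None)
    w_orth = w_new - W_prev @ coef
    return w_orth / np.linalg.norm(w_orth)


# Hypothetical truncated loading weights for five variables.
w1 = np.array([0.8, 0.6, 0.0, 0.0, 0.0])  # component 1 uses variables 0 and 1
w2 = np.array([0.6, 0.0, 0.8, 0.0, 0.0])  # component 2 uses variables 0 and 2

w2_orth = reorthogonalize(w2, w1[:, None])
print(np.round(w2_orth, 3))
print(float(w1 @ w2_orth))  # numerically zero after re-orthogonalization
```

In this toy case variable 1 has zero weight in `w2` but a non-zero weight in `w2_orth`. That variable was already used by component 1, so the set of variables with non-zero weight in at least one component is unchanged.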
A note should be made on the different roles of the X loading weights, $w_a$, and the X loadings, $p_a$. It
is important to remember that the loading weights contain the covariance information between $X_{a-1}$
and $Y_{a-1}$ (the first eigenvector of the covariance matrix if Y is multiresponse) and give us the weights
that each explanatory variable has when creating scores and loadings. The scores, $t_a$, are just linear
combinations of the explanatory variables weighted by the loading weights. The loadings, however, are
found by projecting each explanatory variable of $X_{a-1}$ on the scores, $t_a$. Loading weights and loadings
can look quite similar when no truncation has been applied, especially for spectroscopic data. With
truncation, however, the loading weights obtain a lot of zero holes, while the loadings retain a more
continuous shape (at least for spectroscopic data). The upshot is that fully truncated variables are not
completely lost, and their role in the system may be interpreted graphically since their loadings are intact.
Depending on the application, either loading weights or loadings can be interpreted, having roles similar
to the regression coefficients with and without zero holes.
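The difference between the two quantities can be made concrete with a single PLS component computed in Python with NumPy. The simulated data and the truncation rule (keep only the largest weight) are assumptions chosen to keep the sketch short; any truncation rule could take its place.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 8
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]                       # variable 1 is correlated with variable 0
y = X[:, 0] + 0.1 * rng.normal(size=n)
X -= X.mean(axis=0)
y -= y.mean()

# Loading weights: covariance direction between X and y, scaled to unit length.
w = X.T @ y
w /= np.linalg.norm(w)

# Hypothetical sharp truncation: keep only the largest weight.
w_trunc = np.where(np.abs(w) >= np.abs(w).max(), w, 0.0)
w_trunc /= np.linalg.norm(w_trunc)

t = X @ w_trunc                # scores: weighted combination of the variables
p_load = X.T @ t / (t @ t)     # loadings: projection of each variable on t

print("non-zero loading weights:", np.count_nonzero(w_trunc))
print("non-zero loadings:       ", np.count_nonzero(p_load))
```

Every variable keeps a loading even when its loading weight has been set to zero, which is why truncated variables can still be inspected graphically.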
In some applications it may be interesting to apply truncation without ending up with zeros in the resulting
regression coefficients, analogous to focusing on loadings instead of loading weights. This can be justified by
the need to remove noise in the computation of PLS components and at the same time producing continuous
regression coefficients. From the early days of PLSR we find approximate estimates of regression coefficients
that produce the desired effect. Two alternatives have been proposed. Firstly the approximated regression
coefficients can simply be estimated by the product of the X and y loadings: $\hat{\beta}^{\dagger} = P q'$. A more elaborate
strategy is to produce new approximated X scores, y loadings and regression coefficients by full projection
on the X loadings: $T^{\star} = X P (P'P)^{-1}$, $q^{\star} = y' T^{\star} (T^{\star\prime} T^{\star})^{-1}$, and finally: $\hat{\beta}^{\star} = P q^{\star\prime}$. Both strategies will
produce regression vectors without zero holes.
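Both alternatives can be written out in a few lines of Python with NumPy. The helper `pls1_truncated` below is a hypothetical single-response NIPALS routine with a simple keep-the-largest truncation (with `keep` at least 1), included only so that the sketch has truncated components to work with; the simulated data are likewise an assumption.

```python
import numpy as np


def pls1_truncated(X, y, n_comp, keep):
    """NIPALS PLS1 with a hypothetical truncation that keeps only the `keep`
    largest loading weights (in absolute value) in each component."""
    X, y = X.copy(), y.copy()
    P, q = [], []
    for _ in range(n_comp):
        w = X.T @ y
        small = np.argsort(np.abs(w))[:-keep]
        w[small] = 0.0                   # truncation creates zero holes
        w /= np.linalg.norm(w)
        t = X @ w                        # X scores
        p = X.T @ t / (t @ t)            # X loadings
        c = y @ t / (t @ t)              # y loading
        X -= np.outer(t, p)              # deflate X and y
        y -= c * t
        P.append(p)
        q.append(c)
    return np.column_stack(P), np.array(q)


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=40)
X -= X.mean(axis=0)
y -= y.mean()

P, q = pls1_truncated(X, y, n_comp=2, keep=3)

# First alternative: product of the X and y loadings.
beta_dagger = P @ q

# Second alternative: full projection on the X loadings.
T_star = X @ P @ np.linalg.inv(P.T @ P)
q_star = y @ T_star @ np.linalg.inv(T_star.T @ T_star)
beta_star = P @ q_star

print(np.count_nonzero(beta_dagger), np.count_nonzero(beta_star))
```

Although each component uses only three non-zero loading weights, both coefficient vectors are built from the loadings and therefore have no zero holes.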
References
[1] Wold, S., Martens, H. & Wold, H. The multivariate calibration problem in chemistry solved by the
PLS methods. Lecture notes in mathematics 973, 286-293 (1983).
[2] Indahl, U. A twist to partial least squares regression. Journal of Chemometrics 19, 32-44 (2005).
[3] Liland, K. H., Mevik, B.-H., Rukke, E.-O., Almøy, T. & Isaksson, T. Quantitative whole spectrum
analysis with MALDI-TOF MS, Part II: Determining the concentration of milk in mixtures.
Chemometrics and Intelligent Laboratory Systems 99, 39-48 (2009).
[4] Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267-288
(1996).
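[5] Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Statist. Soc. B
67, 301-320 (2005).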
[6] Sæbø, S., Almøy, T., Aarøe, J. & Aastveit, A. H. ST-PLS: a multi-directional nearest shrunken
centroid type classifier via PLS. Journal of Chemometrics 22, 54-62 (2008).
[7] Lê Cao, K., Rossouw, D., Robert-Granié, C. & Besse, P. A sparse PLS for variable selection when
integrating omics data. Statistical Applications in Genetics and Molecular Biology 7 (2008).
[8] Wold, S., Johansson, E. & Cocchi, M. 3D QSAR in drug design: theory, methods and applications
(ESCOM Science Publishers B.V., Leiden, The Netherlands, 1993).
[9] Rajalahti, T. et al. Discriminating variable test and selectivity ratio plot: Quantitative tools for
interpretation and variable (biomarker) selection in complex spectral or chromatographic profiles.
Analytical Chemistry 81, 2581-2590 (2009).
[10] Martens, H. & Næs, T. Multivariate calibration (John Wiley and Sons, Chichester, UK, 1989).
[11] Lenth, R. V. Quick and easy analysis of unreplicated factorials. Technometrics 31, 469-473 (1989).
[12] Wold, H. Estimation of principal component and related models by iterative least squares, vol.
Multivariate analysis (Academic Press, New York, USA, 1966).
[13] Jørstad, T., Midelfart, H. & Bones, A. A mixture model approach to sample size estimation in
two-sample comparative microarray experiments. BMC Bioinformatics 9 (2008).
[14] Mehmood, T., Liland, K. H., Snipen, L. & Sæbø, S. A review of variable selection methods in partial
least squares regression. Chemometrics and Intelligent Laboratory Systems 118, 62-69 (2012).
[15] Roger, J., Palagos, B., Bertrand, D. & Fernandez-Ahumada, E. CovSel: Variable selection
for highly multivariate and multi-response calibration: Application to IR spectroscopy.
Chemometrics and Intelligent Laboratory Systems 106, 216-223 (2011). URL
http://www.sciencedirect.com/science/article/pii/S0169743910001978.
[16] Friedman, J. & Hastie, T. Regularization paths for generalized linear models via coordinate descent.
Journal of Statistical Software 33 (2010).
[17] Chong, I. & Jun, C. Performance of some variable selection methods when multicollinearity is present.
Chemometrics and Intelligent Laboratory Systems 78, 103-112 (2005).
[18] Chun, H. & Keleş, S. Sparse partial least squares regression for simultaneous dimension reduction and
variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72,
3-25 (2010).
[19] Lee, D., Lee, W., Lee, Y. & Pawitan, Y. Sparse partial least-squares regression and its applications to
high-throughput data analysis. Chemometrics and Intelligent Laboratory Systems 109, 1-8 (2011).
[20] Helland, I. S. & Almøy, T. Comparison of prediction methods when only a few components are
relevant. Journal of the American Statistical Association 89, 583-591 (1994).
[21] Næs, T. & Helland, I. S. Relevant components in regression. Scandinavian Journal of Statistics 20,
239-250 (1993).
[22] ... dimensional low sample size data. International Journal of Applied Mathematics 39, 48-60 (2009).
[23] Filzmoser, P., Gschwandtner, M. & Todorov, V. Review of sparse methods in regression and
classification with application to chemometrics. Journal of Chemometrics 26, 42-51 (2012).
[24] Alon, U. et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal
colon tissues probed by oligonucleotide arrays. P. Natl. Acad. Sci. 96, 6745-6750 (1999).
[25] Singh, D. et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1,
203-209 (2002).
[26] Afseth, N. K., Segtnan, V. H. & Wold, J. P. Raman spectra of biological samples: A study of
preprocessing methods. Applied Spectroscopy 60, 1358-1367 (2006).
[27] Eilers, P. H. Parametric time warping. Analytical Chemistry 76, 404-411 (2004).
[28] Liland, K. H., Almøy, T. & Mevik, B.-H. Optimal choice of baseline correction for multivariate
calibration of spectra. Applied Spectroscopy 64, 1007-1016 (2010).
[29] Liland, K. H., Rukke, E.-O., Olsen, E. F. & Isaksson, T. Customized baseline correction.
Chemometrics and Intelligent Laboratory Systems 109, 51-56 (2011).
[30] Indahl, U. G., Liland, K. H. & Næs, T. Canonical partial least squares - a unified PLS approach to
classification and regression problems. Journal of Chemometrics 23, 495-504 (2009).
[31] Stone, M. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal
Statistical Society, Series B (Methodological) 36, 111-147 (1974).
[32] Pearson, K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine
2, 559-572 (1901).
[33] Comon, P. Independent component analysis, A new concept? Signal Processing 36, 287-314 (1994).