
PLS post processing by similarity transformation (PLS+ST): a simple alternative to OPLS

Theoretical properties and proofs

Rolf Ergon

Telemark University College, Porsgrunn, Norway. Email: [email protected]

This Supplementary Appendix gives the details and proofs of the properties and results in the paper "PLS post-processing by similarity transformation (PLS+ST): A simple alternative to OPLS" [9]. For the reader's convenience, the OPLS algorithm [2] is also included.

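Property 1 The Martens factorization (2) has the special property that all score vectors except the first one are orthogonal to both $y$ and $\hat{y}$.

Proof: Since $w_1$ is given by $w_1 = X^T y / \|X^T y\|$ and $T_{2:A} = X W_{2:A}$, and since $W^T W = I$, it follows that $T_{2:A}^T y = W_{2:A}^T X^T y = \|X^T y\|\, W_{2:A}^T w_1 = 0$. From the prediction formula (3) further follows $\hat{y} = XW \left( W^T X^T X W \right)^{-1} W^T X^T y$ and thus

$$
\begin{bmatrix} t_1 & T_{2:A} \end{bmatrix}^T \hat{y} = T^T \hat{y} = W^T X^T X W \left( W^T X^T X W \right)^{-1} W^T X^T y = W^T X^T y = \|X^T y\|\, W^T w_1 = \begin{bmatrix} \|X^T y\| & 0 \end{bmatrix}^T, \tag{17}
$$

i.e. $T_{2:A}^T \hat{y} = 0$.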

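Property 2 The residual $E$ in the Martens factorization (2) is also orthogonal to $y$.

Proof: From $X^T y = \|X^T y\|\, w_1$, $w_1^T w_1 = 1$ and $y^T T_{2:A} = 0$ follows

$$
y^T E = y^T \left( X - T W^T \right) = y^T X - y^T \left( t_1 w_1^T + T_{2:A} W_{2:A}^T \right) = y^T X - y^T t_1 w_1^T = y^T X - y^T X w_1 w_1^T = \|X^T y\|\, w_1^T - \|X^T y\|\, w_1^T = 0 .
$$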
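Properties 1 and 2 can be checked numerically. The following is a minimal sketch, assuming NumPy, synthetic random data, and a plain NIPALS PLS1 routine to obtain $W$; the helper name `pls1_nipals` and the data are illustrative assumptions, not part of the paper.

```python
import numpy as np

def pls1_nipals(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W, P (hypothetical helper)."""
    E, W, T_W, P = X.copy(), [], [], []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)      # unit-length loading weight vector
        t = E @ w                      # orthogonal score vector
        p = E.T @ t / (t @ t)          # loading vector
        E = E - np.outer(t, p)         # deflate X
        W.append(w); T_W.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T_W), np.column_stack(P)

# Synthetic data (assumption: any full-rank X and non-degenerate y will do)
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W, _, _ = pls1_nipals(X, y, A)
T = X @ W                                        # Martens (non-orthogonal) scores
E = X - T @ W.T                                  # residual in X = T W^T + E
y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)    # prediction formula (3)

print(np.abs(T[:, 1:].T @ y).max())      # Property 1: T_{2:A}^T y is ~0
print(np.abs(T[:, 1:].T @ y_hat).max())  # Property 1: T_{2:A}^T y_hat is ~0
print(np.abs(y @ E).max())               # Property 2: y^T E is ~0
```

All three printed values should be at the level of floating-point rounding error.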

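Property 3 The factorizations (13) and (2) are identical, i.e. $T_W P^T W = T$.

Proof: From the two well-known estimator expressions $\hat{b} = W \left( W^T X^T X W \right)^{-1} W^T X^T y$ and $\hat{b} = W \left( P^T W \right)^{-1} q_W = W \left( P^T W \right)^{-1} \left( T_W^T T_W \right)^{-1} T_W^T y$ [7] follows

$$
W \left( W^T X^T X W \right)^{-1} W^T X^T y = W \left( \left( P^T W \right)^T T_W^T T_W P^T W \right)^{-1} \left( P^T W \right)^T T_W^T y, \tag{18}
$$

i.e. $T_W P^T W = XW = T$.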
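A small numerical illustration of Property 3 follows. It is a sketch under assumptions: NumPy, synthetic data, and an illustrative NIPALS helper (`pls1_nipals` is a hypothetical name). It compares $T_W P^T W$ with $XW$ and the two estimator expressions for $\hat{b}$.

```python
import numpy as np

def pls1_nipals(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W, P (hypothetical helper)."""
    E, W, T_W, P = X.copy(), [], [], []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w); T_W.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T_W), np.column_stack(P)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W, T_W, P = pls1_nipals(X, y, A)
T = X @ W                                   # non-orthogonal scores of factorization (2)
T_from_orth = T_W @ P.T @ W                 # T_W P^T W from factorization (13)

# First estimator: b = W (W^T X^T X W)^{-1} W^T X^T y
b1 = W @ np.linalg.solve(T.T @ T, T.T @ y)
# Second estimator: b = W (P^T W)^{-1} (T_W^T T_W)^{-1} T_W^T y
q_W = np.linalg.solve(T_W.T @ T_W, T_W.T @ y)
b2 = W @ np.linalg.solve(P.T @ W, q_W)

print(np.abs(T_from_orth - T).max())        # ~0
print(np.abs(b1 - b2).max())                # ~0
```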

Property 4 The loading matrices in the factorizations (12) and (13) are $P = \begin{bmatrix} p_1 & p_2 & \cdots & p_{A-1} & p_A \end{bmatrix}$ and $W W^T P = \begin{bmatrix} p_1 & p_2 & \cdots & p_{A-1} & w_A \end{bmatrix}$, i.e. they are different in the last column vector only.


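Proof: The orthogonalized PLS algorithm results in an upper triangular and bi-diagonal matrix $P^T W$, with ones along the main diagonal [8]. We thus have

$$
P^T W W^T =
\begin{bmatrix}
1 & p_1^T w_2 & 0 & \cdots & 0 \\
0 & 1 & p_2^T w_3 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
\vdots & & \ddots & 1 & p_{A-1}^T w_A \\
0 & \cdots & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix}
w_1^T \\ w_2^T \\ \vdots \\ \vdots \\ w_A^T
\end{bmatrix}
=
\begin{bmatrix}
w_1^T + p_1^T w_2 w_2^T \\
w_2^T + p_2^T w_3 w_3^T \\
\vdots \\
w_{A-1}^T + p_{A-1}^T w_A w_A^T \\
w_A^T
\end{bmatrix}. \tag{19}
$$

From this follows that $p_A$ in $P$ is replaced by $w_A$, as stated. For a complete proof we must also show that for $2 \le i \le A$ we have $w_{i-1}^T + p_{i-1}^T w_i w_i^T = p_{i-1}^T$, or equivalently that $w_i^T = \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right)$. Forming $\left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j$ we find the following possibilities for $2 \le i \le A$:

$$
\begin{aligned}
j < i-1 \;&\Rightarrow\; \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j = \left( p_{i-1}^T w_i \right)^{-1} (0 - 0) = 0 \\
j = i-1 \;&\Rightarrow\; \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j = \left( p_{i-1}^T w_i \right)^{-1} (1 - 1) = 0 \\
j = i \;&\Rightarrow\; \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j = \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T w_i - 0 \right) = 1 \\
j > i \;&\Rightarrow\; \left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j = \left( p_{i-1}^T w_i \right)^{-1} (0 - 0) = 0
\end{aligned} \tag{20}
$$

Since $p_i$ and thus also $\left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1} - w_{i-1} \right)$ belong to the span of $w_1, w_2, \ldots, w_A$, and since $\left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) w_j = 1$ for $j = i$ and $0$ for $j \ne i$, it finally follows from the orthonormality of $W$ that $\left( p_{i-1}^T w_i \right)^{-1} \left( p_{i-1}^T - w_{i-1}^T \right) = w_i^T$, and thus that $w_{i-1}^T + p_{i-1}^T w_i w_i^T = p_{i-1}^T$ for $2 \le i \le A$.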
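The structure claimed in Property 4 can be inspected numerically. The sketch below assumes NumPy, synthetic data and an illustrative NIPALS helper (`pls1_nipals` is a hypothetical name). It checks that $P^T W$ is upper bi-diagonal with ones on the diagonal, and that $W W^T P$ agrees with $P$ except for the last column, which becomes $w_A$.

```python
import numpy as np

def pls1_nipals(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W, P (hypothetical helper)."""
    E, W, T_W, P = X.copy(), [], [], []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w); T_W.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T_W), np.column_stack(P)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W, T_W, P = pls1_nipals(X, y, A)
PtW = P.T @ W            # expected: upper bi-diagonal, unit diagonal
P_mod = W @ W.T @ P      # loading matrix of the modified factorization (13)

print(np.round(PtW, 6))
print(np.abs(P_mod[:, :-1] - P[:, :-1]).max())   # first A-1 columns unchanged, ~0
print(np.abs(P_mod[:, -1] - W[:, -1]).max())     # last column equals w_A, ~0
```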

Property 5 Using a predetermined loading weights matrix $W$, the deflation order in the algorithm resulting in the non-orthogonalized factorization (2) is of no importance for the residual and predictions. A loading weights matrix $\tilde{W}$ with permuted column vectors will thus give $X = \tilde{T} \tilde{W}^T + E$ with $\tilde{T} \tilde{W}^T = T W^T$, and $y_{\mathrm{new}} = x_{\mathrm{new}}^T \hat{b}$ according to Eq. (3).

Proof: Since $W$ is orthonormal, the PLS algorithm giving the non-orthogonalized factorization (2) generally gives $t_i = \left( X - \sum_{j \ne i} t_j w_j^T \right) w_i = X w_i$. This is true irrespective of the order of deflation, i.e. $\tilde{T} = X \tilde{W}$. Introducing an invertible permutation matrix $\tilde{P}$ with the property $\tilde{P}^{-1} = \tilde{P}^T$ and $\tilde{W} = W \tilde{P}$, the predictions according to Eq. (3) will be

$$
y_{\mathrm{new}} = x_{\mathrm{new}}^T \tilde{b} = x_{\mathrm{new}}^T W \tilde{P} \left( \tilde{P}^T W^T X^T X W \tilde{P} \right)^{-1} \tilde{P}^T W^T X^T y = x_{\mathrm{new}}^T W \left( W^T X^T X W \right)^{-1} W^T X^T y = x_{\mathrm{new}}^T \hat{b} .
$$
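The permutation invariance in Property 5 is easy to confirm numerically. The following sketch assumes NumPy and synthetic data; `pls1_nipals` and `b_hat` are illustrative helper names, and the particular permutation is arbitrary.

```python
import numpy as np

def pls1_nipals(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W only (hypothetical helper)."""
    E, W = X.copy(), []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w)
    return np.column_stack(W)

def b_hat(X, y, W):
    """Estimator of Eq. (3): W (W^T X^T X W)^{-1} W^T X^T y."""
    T = X @ W
    return W @ np.linalg.solve(T.T @ T, T.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)

W = pls1_nipals(X, y, 3)
W_perm = W[:, [2, 0, 1]]          # W multiplied by a permutation matrix

b = b_hat(X, y, W)
b_perm = b_hat(X, y, W_perm)
TWt = (X @ W) @ W.T               # T W^T
TWt_perm = (X @ W_perm) @ W_perm.T

print(np.abs(b - b_perm).max())        # ~0: identical predictions
print(np.abs(TWt - TWt_perm).max())    # ~0: identical fitted part, hence identical residual
```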

Property 6 Using a predetermined loading weights matrix $W$, the deflation order in the algorithm resulting in the orthogonalized factorizations (12) and (13) is of no importance for the residuals and predictions. A loading weights matrix $\tilde{W}$ with permuted column vectors will thus give $X = \tilde{T}_W \tilde{P}^T + E_W$ and $X = \tilde{T}_W \tilde{P}^T \tilde{W} \tilde{W}^T + E$ respectively, with $\tilde{T}_W \tilde{P}^T = T_W P^T$ and $\tilde{T}_W \tilde{P}^T \tilde{W} \tilde{W}^T = T_W P^T W W^T$.

Proof: According to Property 3 the relation between the orthogonalized and non-orthogonalized PLS algorithms is $T_W P^T W W^T = T W^T$. Use of the same algorithms with the predetermined and permuted matrix $\tilde{W}$ must then necessarily result in $\tilde{T}_W \tilde{P}^T \tilde{W} \tilde{W}^T = \tilde{T} \tilde{W}^T$. Since $\tilde{W} \tilde{W}^T = W W^T$ and $\tilde{T} \tilde{W}^T = T W^T$ (Property 5) it also follows that $\tilde{T}_W \tilde{P}^T W W^T = T W^T = T_W P^T W W^T$ and thus $\tilde{T}_W \tilde{P}^T = T_W P^T$. From this follow unaltered residuals and predictions.

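The OPLS algorithm

Following [2], the OPLS algorithm is as follows:

1. Set $i = 1$, $E_{i-1} = E_0 = X$, and $W_{\mathrm{ortho}}$, $T_{\mathrm{ortho}}$ and $P_{\mathrm{ortho}}$ to empty matrices.
2. $w_i^{\mathrm{OPLS}} = \dfrac{E_{i-1}^T y}{\|E_{i-1}^T y\|} = w_1$
3. $t_i^{\mathrm{OPLS}} = E_{i-1} w_i^{\mathrm{OPLS}}$
4. $p_i^{\mathrm{OPLS}} = \dfrac{E_{i-1}^T t_i^{\mathrm{OPLS}}}{\left( t_i^{\mathrm{OPLS}} \right)^T t_i^{\mathrm{OPLS}}}$
5. $w_i^{\mathrm{ortho}} = \dfrac{p_i^{\mathrm{OPLS}} - w_i^{\mathrm{OPLS}}}{\|p_i^{\mathrm{OPLS}} - w_i^{\mathrm{OPLS}}\|}$ and $W_{\mathrm{ortho}} = \begin{bmatrix} W_{\mathrm{ortho}} & w_i^{\mathrm{ortho}} \end{bmatrix}$
6. $t_i^{\mathrm{ortho}} = E_{i-1} w_i^{\mathrm{ortho}}$ and $T_{\mathrm{ortho}} = \begin{bmatrix} T_{\mathrm{ortho}} & t_i^{\mathrm{ortho}} \end{bmatrix}$
7. $p_i^{\mathrm{ortho}} = \dfrac{E_{i-1}^T t_i^{\mathrm{ortho}}}{\left( t_i^{\mathrm{ortho}} \right)^T t_i^{\mathrm{ortho}}}$ and $P_{\mathrm{ortho}} = \begin{bmatrix} P_{\mathrm{ortho}} & p_i^{\mathrm{ortho}} \end{bmatrix}$
8. $E_i = X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$
9. Let $i = i + 1$ and return to step 2 for additional orthogonal components, otherwise go to step 10.
10. End.

The resulting $E_i$ are the filtered $X$ data, and a one-component PLS factorization after removal of $i = A - 1$ components further gives

$$
E_{A-1} = t_A^{\mathrm{OPLS}} \left( p_A^{\mathrm{OPLS}} \right)^T + E^{\mathrm{OPLS}} . \tag{21}
$$

Note that all steps give $w_i^{\mathrm{OPLS}} = w_1$.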
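The steps above can be sketched in NumPy as follows. This is an illustrative transcription under stated assumptions, not the reference implementation of [2]: the function name `opls_filter`, the synthetic data, and the returned `w_hist` bookkeeping are hypothetical, and step 7 is assumed to use the score vector $t_i^{\mathrm{ortho}}$ just computed in step 6.

```python
import numpy as np

def opls_filter(X, y, n_ortho):
    """Remove n_ortho y-orthogonal components, following steps 1 to 8."""
    E = X.copy()                                   # step 1: E_0 = X
    W_o, T_o, P_o, w_hist = [], [], [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w = w / np.linalg.norm(w)                  # step 2: w_i^OPLS
        t = E @ w                                  # step 3: t_i^OPLS
        p = E.T @ t / (t @ t)                      # step 4: p_i^OPLS
        w_o = (p - w) / np.linalg.norm(p - w)      # step 5: w_i^ortho
        t_o = E @ w_o                              # step 6: t_i^ortho
        p_o = E.T @ t_o / (t_o @ t_o)              # step 7: p_i^ortho (assumed index i)
        W_o.append(w_o); T_o.append(t_o); P_o.append(p_o); w_hist.append(w)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T   # step 8
    return (np.column_stack(W_o), np.column_stack(T_o),
            np.column_stack(P_o), E, np.column_stack(w_hist))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)

W_ortho, T_ortho, P_ortho, E_filt, w_hist = opls_filter(X, y, 2)
w1 = X.T @ y / np.linalg.norm(X.T @ y)

print(np.abs(w_hist - w1[:, None]).max())   # every w_i^OPLS equals w_1, ~0
print(np.abs(T_ortho.T @ y).max())          # removed scores are orthogonal to y, ~0
```

The first check illustrates the remark that all steps give $w_i^{\mathrm{OPLS}} = w_1$; the second shows that the removed components are indeed $y$-orthogonal.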

Property 7 The OPLS loading weights matrix may be found from the ordinary PLS loading weights matrix as $W_{\mathrm{ortho}} = -W_{2:A}$.

Proof: From Property 6 follows that orthogonalized PLS regression with the permuted loading weights matrix $\tilde{W} = \begin{bmatrix} W_{2:A} & w_1 \end{bmatrix}$ gives the same fitted response vector $\hat{y}$ as with use of $W$. Since the sign of a $w_i$ vector has no influence on the products $t_i p_i^T$ and $t_i^{\mathrm{ortho}} \left( p_i^{\mathrm{ortho}} \right)^T$, this is true also for $\tilde{W} = \begin{bmatrix} -W_{2:A} & w_1 \end{bmatrix}$. We use induction in the parameter $i$ related to $W_{\mathrm{ortho}}$ to show that the OPLS algorithm uses $W_{\mathrm{ortho}} = -W_{2:A}$.

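For $i = 1$, i.e. one $y$-orthogonal component, the OPLS algorithm gives

$$
w_1^{\mathrm{ortho}} = \frac{p_1^{\mathrm{OPLS}} - w_1}{\|p_1^{\mathrm{OPLS}} - w_1\|} = \frac{X^T X w_1 \left( w_1^T X^T X w_1 \right)^{-1} - w_1}{\left\| X^T X w_1 \left( w_1^T X^T X w_1 \right)^{-1} - w_1 \right\|}, \tag{22}
$$

while the recursive formula for the loading weights vectors developed by Helland [7] and the prediction formula (3) give (where $\hat{y}_1$ is the fitted response vector using one PLS component)

$$
w_2 = \frac{X^T (y - \hat{y}_1)}{\|X^T (y - \hat{y}_1)\|} = \frac{X^T \left( y - X w_1 \left( w_1^T X^T X w_1 \right)^{-1} w_1^T X^T y \right)}{\left\| X^T \left( y - X w_1 \left( w_1^T X^T X w_1 \right)^{-1} w_1^T X^T y \right) \right\|} = \frac{w_1 - X^T X w_1 \left( w_1^T X^T X w_1 \right)^{-1}}{\left\| w_1 - X^T X w_1 \left( w_1^T X^T X w_1 \right)^{-1} \right\|} = -w_1^{\mathrm{ortho}} . \tag{23}
$$

Assuming the property to be true up to $w_{i-1}^{\mathrm{ortho}}$ we find according to the OPLS algorithm

$$
w_i^{\mathrm{ortho}} = \frac{p_i^{\mathrm{OPLS}} - w_1}{\|p_i^{\mathrm{OPLS}} - w_1\|}, \tag{24}
$$

with

$$
E_{i-1} = X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T, \tag{25}
$$

where $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$ is the factorization of the $i - 1$ removed $y$-orthogonal components. From the recursive loading weights formula [7] we also find

$$
w_{i+1} = \frac{X^T (y - \hat{y}_i)}{\|X^T (y - \hat{y}_i)\|} = \frac{w_1 - \dfrac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}}}{\left\| w_1 - \dfrac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}} \right\|}, \tag{26}
$$

where $\hat{y}_i$ is the fitted response vector using a total of $i$ components.

In order to show that $w_i^{\mathrm{ortho}} = -w_{i+1}$ we finally make use of the OPLS facts that $T_{\mathrm{ortho}}^T y = 0$ and $T_{\mathrm{ortho}}^T \hat{y} = 0$ (see [2] for proofs), i.e. $E_{i-1}^T y = \left( X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T \right)^T y = X^T y$ and $E_{i-1}^T \hat{y}_i = \left( X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T \right)^T \hat{y}_i = X^T \hat{y}_i$. We then use the prediction formula (3) and the fact that OPLS gives the same predictions as ordinary PLS, and develop $p_i^{\mathrm{OPLS}}$ into (also using $w_1^T X^T y = w_1^T w_1 \sqrt{y^T X X^T y} = \sqrt{y^T X X^T y}$)

$$
p_i^{\mathrm{OPLS}} = E_{i-1}^T E_{i-1} w_1 \left( w_1^T E_{i-1}^T E_{i-1} w_1 \right)^{-1} = \frac{E_{i-1}^T E_{i-1} w_1 \left( w_1^T E_{i-1}^T E_{i-1} w_1 \right)^{-1} w_1^T E_{i-1}^T y}{w_1^T E_{i-1}^T y} = \frac{E_{i-1}^T \hat{y}_i}{w_1^T E_{i-1}^T y} = \frac{X^T \hat{y}_i}{w_1^T X^T y} = \frac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}}, \tag{27}
$$

and insertion into Eq. (24) and comparison with Eq. (26) finally shows that $w_i^{\mathrm{ortho}} = -w_{i+1}$.

Property 8 After the removal of $A - 1$ $y$-orthogonal components, the OPLS factorization (14) results in the same residual $E^{\mathrm{OPLS}} = E_W$ and the same predictions as the original orthogonalized PLS factorization (12).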

Proof: Since $W_{\mathrm{ortho}} = -W_{2:A}$, the OPLS factorization is equivalent to the factorization obtained by the standard PLS NIPALS algorithm with predetermined and permuted loading weights vectors in the order $w_2, w_3, \ldots, w_A$ and $w_1$. From Property 6 thus follows that the residuals and the predictions are the same.
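Properties 7 and 8 lend themselves to a direct numerical comparison between ordinary PLS and OPLS. The sketch below assumes NumPy and synthetic data; `pls1_nipals` and `opls_filter` are hypothetical helper names, and the OPLS routine assumes that the orthogonal loading in step 7 is computed from the score of step 6.

```python
import numpy as np

def pls1_nipals(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W, P (hypothetical helper)."""
    E, W, T_W, P = X.copy(), [], [], []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w); T_W.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T_W), np.column_stack(P)

def opls_filter(X, y, n_ortho):
    """OPLS steps 1 to 8; returns W_ortho, T_ortho, P_ortho and the filtered X."""
    E, W_o, T_o, P_o = X.copy(), [], [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        p_o = E.T @ t_o / (t_o @ t_o)
        W_o.append(w_o); T_o.append(t_o); P_o.append(p_o)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T
    return np.column_stack(W_o), np.column_stack(T_o), np.column_stack(P_o), E

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W, T_W, P = pls1_nipals(X, y, A)
W_ortho, T_ortho, P_ortho, E_filt = opls_filter(X, y, A - 1)

# Final y-relevant OPLS component, Eq. (21)
t_A = E_filt @ W[:, 0]
p_A = E_filt.T @ t_A / (t_A @ t_A)
E_opls = E_filt - np.outer(t_A, p_A)

E_W = X - T_W @ P.T                                          # residual of factorization (12)
y_hat_pls = T_W @ np.linalg.solve(T_W.T @ T_W, T_W.T @ y)    # A-component PLS fit
y_hat_opls = t_A * (t_A @ y) / (t_A @ t_A)                   # one-component fit on filtered data

print(np.abs(W_ortho + W[:, 1:]).max())       # Property 7: W_ortho = -W_{2:A}
print(np.abs(E_opls - E_W).max())             # Property 8: same residual
print(np.abs(y_hat_opls - y_hat_pls).max())   # Property 8: same fitted response
```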

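Result 1 The second similarity transformation $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T = T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} \left( P_{\mathrm{ortho}}^T W_{2:A} \right)^{-1} P_{\mathrm{ortho}}^T$ results in the transformed OPLS score matrix $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} = T_{2:A}$.

Proof: According to Property 3 the two factorizations $X = T W^T + E$ and $X = T_W P^T W W^T + E$ are identical, i.e. $T_W P^T W = T$. Using a permuted loading weights matrix $\tilde{W} = \begin{bmatrix} W_{2:A} & w_1 \end{bmatrix}$ we correspondingly have $\tilde{T} = \begin{bmatrix} T_{2:A} & t_1 \end{bmatrix} = \tilde{T}_W \tilde{P}^T \tilde{W}$, and that is independent of the number of components used. As the OPLS algorithm gives $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$ by use of $X$ and $W_{\mathrm{ortho}} = -W_{2:A}$ (Property 7) in exactly the same way as we find the first $A - 1$ components in $\tilde{T}_W \tilde{P}^T$, this will necessarily give

$$
T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} = T_{2:A} . \tag{28}
$$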
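Eq. (28) can be illustrated numerically. The following sketch assumes NumPy and synthetic data, with hypothetical helpers `pls1_weights` and `opls_filter` (the latter assuming that step 7 of the OPLS algorithm uses the score from step 6). It applies the similarity transformation with $M = P_{\mathrm{ortho}}^T W_{2:A}$ and compares the transformed scores with $T_{2:A} = X W_{2:A}$.

```python
import numpy as np

def pls1_weights(X, y, A):
    """Loading weights W from orthogonalized PLS1 (hypothetical helper)."""
    E, W = X.copy(), []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        E = E - np.outer(t, E.T @ t / (t @ t))
        W.append(w)
    return np.column_stack(W)

def opls_filter(X, y, n_ortho):
    """OPLS steps 1 to 8; returns T_ortho and P_ortho."""
    E, T_o, P_o = X.copy(), [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        T_o.append(t_o); P_o.append(E.T @ t_o / (t_o @ t_o))
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T
    return np.column_stack(T_o), np.column_stack(P_o)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W = pls1_weights(X, y, A)
T_ortho, P_ortho = opls_filter(X, y, A - 1)
W2A = W[:, 1:]

M = P_ortho.T @ W2A                        # (A-1) x (A-1) transformation matrix
T_st = T_ortho @ M                         # transformed scores
Pt_st = np.linalg.solve(M, P_ortho.T)      # transformed loadings, M^{-1} P_ortho^T

print(np.abs(T_st - X @ W2A).max())                      # Eq. (28): equals T_{2:A}
print(np.abs(T_st @ Pt_st - T_ortho @ P_ortho.T).max())  # product unchanged by the transformation
```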

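Property 9 The last OPLS component $t_A^{\mathrm{OPLS}} \left( p_A^{\mathrm{OPLS}} \right)^T$ multiplied with $W W^T$ becomes $t_A^{\mathrm{OPLS}} \left( p_A^{\mathrm{OPLS}} \right)^T W W^T = t_A^{\mathrm{OPLS}} w_1^T$.

Proof: We have

$$
\left( p_A^{\mathrm{OPLS}} \right)^T W W^T = \left( p_A^{\mathrm{OPLS}} \right)^T \left( w_1 w_1^T + W_{2:A} W_{2:A}^T \right), \tag{29}
$$

where

$$
\left( p_A^{\mathrm{OPLS}} \right)^T w_1 = \frac{w_1^T E_{A-1}^T E_{A-1}}{w_1^T E_{A-1}^T E_{A-1} w_1} w_1 = 1 \tag{30}
$$

and

$$
\begin{aligned}
\left( p_A^{\mathrm{OPLS}} \right)^T W_{2:A} &= \frac{w_1^T E_{A-1}^T E_{A-1}}{w_1^T E_{A-1}^T E_{A-1} w_1} W_{2:A} \\
&= \frac{w_1^T E_{A-1}^T}{w_1^T E_{A-1}^T E_{A-1} w_1} \left( X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T \right) W_{2:A} \\
&= \frac{w_1^T E_{A-1}^T}{w_1^T E_{A-1}^T E_{A-1} w_1} \left( T_{2:A} - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} \right) = 0,
\end{aligned} \tag{31}
$$

where we in the final equality make use of Result 1.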

Property 10 After the removal of $A - 1$ $y$-orthogonal components, the modified OPLS factorization (15) results in the same residual $E$ and the same predictions as the modified PLS factorization (13).

Proof: Since $W_{\mathrm{ortho}} = -W_{2:A}$, the OPLS factorization is equivalent to the factorization obtained by the standard PLS NIPALS algorithm with predetermined and permuted loading weights vectors in the order $w_2, w_3, \ldots, w_A$ and $w_1$. From Property 6 and Eqs. (13) and (14) thus follows

$$
\begin{aligned}
X &= T_W P^T W W^T + E = \left( T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T + t_A^{\mathrm{OPLS}} \left( p_A^{\mathrm{OPLS}} \right)^T \right) W W^T + E \\
&= T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E,
\end{aligned} \tag{32}
$$

where the final equality, which makes use of Property 9, gives equality with Eq. (15).

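Result 2 The final modified OPLS component is identical with the first PLS+ST component, i.e. $t_A^{\mathrm{OPLS}} w_1^T = t_1^{\mathrm{PLS+ST}} w_1^T$.

Proof: When $A - 1$ $y$-orthogonal components are subtracted from $X$, it follows from the OPLS algorithm that the remaining score vector is

$$
t_A^{\mathrm{OPLS}} = \left( X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T \right) w_1 = E_{A-1} w_1 . \tag{33}
$$

Using the standard prediction formula (3) we further find

$$
\hat{y} = E_{A-1} w_1 \left( w_1^T E_{A-1}^T E_{A-1} w_1 \right)^{-1} w_1^T E_{A-1}^T y = E_{A-1} w_1 d = t_A^{\mathrm{OPLS}} d, \tag{34}
$$

where $d$ is a scalar. This confirms that $t_A^{\mathrm{OPLS}}$ is in the direction of $\hat{y}$, which according to Property 6 is also identical with the fitted response vector using ordinary PLS regression.

From the PLS+ST factorization (5) follows $t_1^{\mathrm{PLS+ST}} = q_1^{-1} \hat{y}$, where $q_1$ is found as the first component in $q = \left( W^T X^T X W \right)^{-1} W^T X^T y$. Since $y$ is orthogonal to both $T_{2:A}$ (Property 1) and $E$ (Property 2) we may also find $q_1$ by use of the PLS+ST factorization (6) and

$$
\begin{aligned}
y^T X &= y^T \left( t_1^{\mathrm{PLS+ST}} w_1^T + T_{2:A} \left( P_{2:A}^{\mathrm{PLS+ST}} \right)^T + E \right) = y^T t_1^{\mathrm{PLS+ST}} w_1^T \\
&= y^T q_1^{-1} \hat{y} w_1^T = \frac{q_1^{-1} y^T \hat{y}}{\sqrt{y^T X X^T y}}\, y^T X,
\end{aligned} \tag{35}
$$

i.e. $\dfrac{q_1^{-1} y^T \hat{y}}{\sqrt{y^T X X^T y}} = 1$ and

$$
t_1^{\mathrm{PLS+ST}} = q_1^{-1} \hat{y} = \frac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y} . \tag{36}
$$

Since $y^T T_{\mathrm{ortho}} = 0$ we find $y^T E_{A-1} = y^T \left( X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T \right) = y^T X = \sqrt{y^T X X^T y}\, w_1^T$, and from the $t_1^{\mathrm{PLS+ST}}$ expression (36), using $y^T X w_1 = \sqrt{y^T X X^T y}\, w_1^T w_1 = \sqrt{y^T X X^T y}$ and $\hat{y}$ according to Eq. (34), thus follows

$$
t_1^{\mathrm{PLS+ST}} = \frac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y} = \frac{\sqrt{y^T X X^T y}\, E_{A-1} w_1 d}{y^T E_{A-1} w_1 d} = \frac{\sqrt{y^T X X^T y}\, t_A^{\mathrm{OPLS}}}{y^T X w_1} = t_A^{\mathrm{OPLS}} . \tag{37}
$$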
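Result 2 can also be checked numerically: the PLS+ST score of Eq. (36), computed from an ordinary PLS fit, should coincide with the last OPLS score $E_{A-1} w_1$. The sketch below assumes NumPy and synthetic data; `pls1_weights` and `opls_filtered_X` are hypothetical helper names, and the OPLS routine assumes that step 7 uses the score from step 6.

```python
import numpy as np

def pls1_weights(X, y, A):
    """Loading weights W from orthogonalized PLS1 (hypothetical helper)."""
    E, W = X.copy(), []
    for _ in range(A):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        E = E - np.outer(t, E.T @ t / (t @ t))
        W.append(w)
    return np.column_stack(W)

def opls_filtered_X(X, y, n_ortho):
    """OPLS steps 1 to 8; returns the filtered data E_{n_ortho}."""
    E = X.copy()
    for _ in range(n_ortho):
        w = E.T @ y
        w = w / np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        # Deflating E_{i-1} by t_o p_o^T is equivalent to E_i = X - T_ortho P_ortho^T
        E = E - np.outer(t_o, E.T @ t_o / (t_o @ t_o))
    return E

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(30)
A = 3

W = pls1_weights(X, y, A)
T = X @ W
q = np.linalg.solve(T.T @ T, T.T @ y)        # q = (W^T X^T X W)^{-1} W^T X^T y
y_hat = T @ q                                # ordinary PLS fit, Eq. (3)

t_st_q = y_hat / q[0]                                     # t_1 = q_1^{-1} y_hat
t_st = np.linalg.norm(X.T @ y) / (y @ y_hat) * y_hat      # Eq. (36)
t_opls = opls_filtered_X(X, y, A - 1) @ W[:, 0]           # Eq. (33)

print(np.abs(t_st - t_st_q).max())    # ~0: both expressions for the PLS+ST score agree
print(np.abs(t_st - t_opls).max())    # ~0: Eq. (37)
```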

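Result 3 The first modified and then transformed OPLS loading matrix is identical with the PLS+ST loading matrix, i.e. $W W^T P_{\mathrm{ortho}} \left( W_{2:A}^T P_{\mathrm{ortho}} \right)^{-1} = P_{2:A}^{\mathrm{PLS+ST}}$.

Proof: According to Property 3 the two factorizations $X = T W^T + E$ and $X = T_W P^T W W^T + E$ are identical with $T_W P^T W = T$. According to Property 10 these factorizations are also identical with the modified OPLS factorization $X = T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E$ and thus the transformed factorization $X = T_{\mathrm{ortho}} \left( P_{\mathrm{ortho}}^T W_{2:A} \right) \left( P_{\mathrm{ortho}}^T W_{2:A} \right)^{-1} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E$, while the PLS+ST method gives $X = T_{2:A} \left( P_{2:A}^{\mathrm{PLS+ST}} \right)^T + t_1^{\mathrm{PLS+ST}} w_1^T + E$. Since Result 1 shows that $T_{\mathrm{ortho}} \left( P_{\mathrm{ortho}}^T W_{2:A} \right) = T_{2:A}$, while Result 2 shows that $t_A^{\mathrm{OPLS}} w_1^T = t_1^{\mathrm{PLS+ST}} w_1^T$, it follows that $\left( P_{\mathrm{ortho}}^T W_{2:A} \right)^{-1} P_{\mathrm{ortho}}^T W W^T = \left( P_{2:A}^{\mathrm{PLS+ST}} \right)^T$.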

Property 11 The modified loading matrix $W W^T P_{\mathrm{ortho}}$ is different from $P_{\mathrm{ortho}}$ in the last column vector only, with $p_{A-1}^{\mathrm{ortho}}$ replaced by $\left( p_{A-1}^{\mathrm{ortho}} \right)^T w_1 w_1 - w_A$.

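Proof: The ordinary PLS algorithm results in an upper triangular and bi-diagonal matrix $P^T W$, with 1 along the main diagonal [8]. Since $P_{\mathrm{ortho}}$ in the OPLS algorithm according to Property 7 is found from $W_{\mathrm{ortho}} = -W_{2:A}$ in the same way as $P$ is found from $W$, the matrix $P_{\mathrm{ortho}}^T W_{2:A}$ must also be bi-diagonal with $-1$ along the main diagonal. We thus have (with $\tilde{p}_i = p_i^{\mathrm{ortho}}$)

$$
P_{\mathrm{ortho}}^T \begin{bmatrix} w_1 & W_{2:A} \end{bmatrix} \begin{bmatrix} w_1^T \\ W_{2:A}^T \end{bmatrix}
=
\begin{bmatrix}
\tilde{p}_1^T w_1 & -1 & \tilde{p}_1^T w_3 & 0 & \cdots & 0 \\
\tilde{p}_2^T w_1 & 0 & -1 & \tilde{p}_2^T w_4 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \ddots & 0 \\
\tilde{p}_{A-2}^T w_1 & \vdots & & \ddots & -1 & \tilde{p}_{A-2}^T w_A \\
\tilde{p}_{A-1}^T w_1 & 0 & \cdots & \cdots & 0 & -1
\end{bmatrix}
\begin{bmatrix}
w_1^T \\ w_2^T \\ \vdots \\ \vdots \\ w_A^T
\end{bmatrix}
=
\begin{bmatrix}
\tilde{p}_1^T w_1 w_1^T - w_2^T + \tilde{p}_1^T w_3 w_3^T \\
\tilde{p}_2^T w_1 w_1^T - w_3^T + \tilde{p}_2^T w_4 w_4^T \\
\vdots \\
\tilde{p}_{A-2}^T w_1 w_1^T - w_{A-1}^T + \tilde{p}_{A-2}^T w_A w_A^T \\
\tilde{p}_{A-1}^T w_1 w_1^T - w_A^T
\end{bmatrix}. \tag{38}
$$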

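From this follows that $\tilde{p}_{A-1}^T$ in $P_{\mathrm{ortho}}^T$ is replaced by $\tilde{p}_{A-1}^T w_1 w_1^T - w_A^T$, as stated. For a complete proof we must also show that for $3 \le i \le A$ we have $\tilde{p}_{i-2}^T w_1 w_1^T - w_{i-1}^T + \tilde{p}_{i-2}^T w_i w_i^T = \tilde{p}_{i-2}^T$, or equivalently that $w_i^T = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T \right)$. Forming $\left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T \right) w_j$, abbreviated below as $\left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j$, we find the following possibilities for $3 \le i \le A$:

$$
\begin{aligned}
j = 1 \;&\Rightarrow\; \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -\tilde{p}_{i-2}^T w_1 + 0 + \tilde{p}_{i-2}^T w_1 \right) = 0 \\
1 < j < i-1 \;&\Rightarrow\; \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (-0 + 0 + 0) = 0 \\
j = i-1 \;&\Rightarrow\; \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (-0 + 1 - 1) = 0 \\
j = i \;&\Rightarrow\; \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -0 + 0 + \tilde{p}_{i-2}^T w_i \right) = 1 \\
j > i \;&\Rightarrow\; \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (\cdot)\, w_j = \left( \tilde{p}_{i-2}^T w_i \right)^{-1} (-0 + 0 + 0) = 0
\end{aligned} \tag{39}
$$

For ordinary PLS we know that $p_i$ belongs to the span of $w_1, w_2, \ldots, w_A$, and from Property 7 and the OPLS algorithm then follows that this must be the case also for $-\tilde{p}_{i-2}^T w_1 w_1 + w_{i-1} + \tilde{p}_{i-2}$. Since $\left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T \right) w_j = 1$ for $j = i$ and $0$ for $j \ne i$, it finally follows that $\left( \tilde{p}_{i-2}^T w_i \right)^{-1} \left( -\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T \right) = w_i^T$, and thus that $\tilde{p}_{i-2}^T w_1 w_1^T - w_{i-1}^T + \tilde{p}_{i-2}^T w_i w_i^T = \tilde{p}_{i-2}^T$.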

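Result 4 For a single $y$-relevant component the relation between the post-processing PCP method [5] and PLS+ST is that $t_1^{\mathrm{PCP}} \to t_1^{\mathrm{ST}} = t_A^{\mathrm{OPLS}}$ and $w_1^{\mathrm{PCP}} \to w_1$ when $\hat{y} \to y$, i.e. with good predictions.

Proof: PCP uses the factorization (with normalized loadings)

$$
X = t_1^{\mathrm{PCP}} \left( w_1^{\mathrm{PCP}} \right)^T + E^{\mathrm{PCP}}, \tag{40}
$$

with $t_1^{\mathrm{PCP}} = \dfrac{\sqrt{\hat{y}^T X X^T \hat{y}}}{\hat{y}^T \hat{y}}\, \hat{y}$ instead of $t_1^{\mathrm{PLS+ST}} = \dfrac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y}$ as in Eq. (36), and $w_1^{\mathrm{PCP}} = \dfrac{X^T \hat{y}}{\sqrt{\hat{y}^T X X^T \hat{y}}}$ instead of $w_1 = \dfrac{X^T y}{\sqrt{y^T X X^T y}}$ as in the PLS algorithms.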

References

[1] Svensson O, Kourti T, MacGregor JF. An investigation of orthogonal signal correction algorithms and their characteristics. J. Chemometrics 2002; 16: 176-188.

[2] Trygg J, Wold S. Orthogonal projections to latent structures, O-PLS. J. Chemometrics 2002; 16: 119-128.

[3] Martens H, Næs T. Multivariate Calibration. Wiley: New York, 1989.

[4] Verron T, Sabatier R, Joffre R. Some theoretical properties of the O-PLS method. J. Chemometrics 2004; 18: 62-68.

[5] Langsrud Ø, Næs T. Optimised score plot by principal components of prediction. Chemometrics Intell. Lab. Syst. 2003; 68: 61-74.

[6] Yu H, MacGregor JF. Post processing methods (PLS-CCA): simple alternatives to preprocessing methods (OSC-PLS). Chemometrics Intell. Lab. Syst. 2004; 73: 199-205.

[7] Helland IS. On the structure of partial least squares regression. Communications in Statistics 1988; 17: 581-607.

[8] Manne R. Analysis of two partial-least-squares algorithms for multivariate calibration. Chemometrics Intell. Lab. Syst. 1987; 2: 187-197.

[9] Ergon R. PLS post processing by similarity transformation: A simple alternative to OPLS. J. Chemometrics 2005; 19: 1-4.
