PLS post processing by similarity transformation (PLS+ST): a simple alternative to OPLS
Theoretical properties and proofs
Rolf Ergon
Telemark University College, Porsgrunn, Norway
This Supplementary Appendix gives the details and proofs of properties and results in the paper "PLS post-processing by similarity transformation (PLS+ST): A simple alternative to OPLS" [9]. For the reader's convenience, the OPLS algorithm [2] is also included.
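Property 1 The Martens factorization (2) has the special property that all score vectors except the first one are orthogonal to both $y$ and $\hat{y}$.
Proof: Since $w_1$ is given by $w_1 = X^T y / \|X^T y\|$ and $T_{2:A} = X W_{2:A}$, and since $W^T W = I$, it follows that $T_{2:A}^T y = W_{2:A}^T X^T y = \|X^T y\|\, W_{2:A}^T w_1 = 0$. From the prediction formula (3) it further follows that $\hat{y} = XW\left(W^T X^T X W\right)^{-1} W^T X^T y$, and thus
$$\begin{bmatrix} t_1 & T_{2:A} \end{bmatrix}^T \hat{y} = T^T \hat{y} = W^T X^T X W \left(W^T X^T X W\right)^{-1} W^T X^T y = W^T X^T y = \|X^T y\|\, W^T w_1 = \begin{bmatrix} \|X^T y\| & 0 \end{bmatrix}^T, \tag{17}$$
i.e. $T_{2:A}^T \hat{y} = 0$.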
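Property 2 The residual $E$ in the Martens factorization (2) is also orthogonal to $y$.
Proof: From $X^T y = \|X^T y\|\, w_1$, $w_1^T w_1 = 1$ and $y^T T_{2:A} = 0$ follows
$$y^T E = y^T\left(X - T W^T\right) = y^T X - y^T\left(t_1 w_1^T + T_{2:A} W_{2:A}^T\right) = y^T X - y^T t_1 w_1^T = y^T X - y^T X w_1 w_1^T = \|X^T y\|\, w_1^T - \|X^T y\|\, w_1^T = 0.$$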
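Properties 1 and 2 can be checked numerically. The following is a minimal NumPy sketch and not part of the original paper: the random test data, the helper name `pls_weights`, and the choice to build $W$ from the recursion $w_{a+1} \propto X^T(y - \hat{y}_a)$ attributed to Helland [7] are assumptions made for illustration.

```python
import numpy as np

def pls_weights(X, y, A):
    """Hypothetical helper: orthonormal PLS loading weights from the
    recursion w_{a+1} proportional to X^T (y - y_hat_a)."""
    W = np.empty((X.shape[1], 0))
    y_hat = np.zeros_like(y)
    for _ in range(A):
        w = X.T @ (y - y_hat)
        W = np.column_stack([W, w / np.linalg.norm(w)])
        T = X @ W                                       # Martens scores T = XW
        y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)   # prediction formula (3)
    return W

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

W = pls_weights(X, y, A=3)
T = X @ W
y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)
E = X - T @ W.T                     # residual of factorization (2)

prop1_y = T[:, 1:].T @ y            # Property 1: T_{2:A}^T y
prop1_yhat = T[:, 1:].T @ y_hat     # Property 1: T_{2:A}^T y_hat
prop2 = y @ E                       # Property 2: y^T E
print(np.abs(prop1_y).max(), np.abs(prop1_yhat).max(), np.abs(prop2).max())
```

All three printed magnitudes should be at rounding-error level.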
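Property 3 The factorizations (13) and (2) are identical, i.e. $T_W P^T W = T$.
Proof: From the two well-known estimator expressions $\hat{b} = W\left(W^T X^T X W\right)^{-1} W^T X^T y$ and $\hat{b} = W\left(P^T W\right)^{-1} q_W = W\left(P^T W\right)^{-1}\left(T_W^T T_W\right)^{-1} T_W^T y$ [7] follows
$$W\left(W^T X^T X W\right)^{-1} W^T X^T y = W\left(\left(P^T W\right)^T T_W^T T_W P^T W\right)^{-1}\left(P^T W\right)^T T_W^T y, \tag{18}$$
i.e. $T_W P^T W = XW = T$.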
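As a numerical illustration of Property 3, the sketch below runs a plain orthogonalized PLS1 (NIPALS) loop and compares $T_W P^T W$ with $XW$. The function name `nipals_pls1` and the random data are assumptions for illustration only.

```python
import numpy as np

def nipals_pls1(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W and P column-wise."""
    E = X.copy()
    W, T, P = [], [], []
    for _ in range(A):
        w = E.T @ y
        w /= np.linalg.norm(w)      # loading weight vector
        t = E @ w                   # orthogonal score vector
        p = E.T @ t / (t @ t)       # loading vector
        E = E - np.outer(t, p)      # deflation
        W.append(w); T.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T), np.column_stack(P)

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

W, T_W, P = nipals_pls1(X, y, A=3)
T = X @ W                           # Martens score matrix
gap = np.abs(T_W @ (P.T @ W) - T).max()
print(gap)
```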
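Property 4 The loading matrices in the factorizations (12) and (13) are $P = \begin{bmatrix} p_1 & p_2 & \cdots & p_{A-1} & p_A \end{bmatrix}$ and $W W^T P = \begin{bmatrix} p_1 & p_2 & \cdots & p_{A-1} & w_A \end{bmatrix}$, i.e. they are different in the last column vector only.
Proof: The orthogonalized PLS algorithm results in an upper triangular and bi-diagonal matrix $P^T W$, with ones along the main diagonal [8]. We thus have
$$P^T W W^T = \begin{bmatrix} 1 & p_1^T w_2 & 0 & \cdots & 0 \\ 0 & 1 & p_2^T w_3 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & 1 & p_{A-1}^T w_A \\ 0 & \cdots & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ \vdots \\ w_A^T \end{bmatrix} = \begin{bmatrix} w_1^T + p_1^T w_2 w_2^T \\ w_2^T + p_2^T w_3 w_3^T \\ \vdots \\ w_{A-1}^T + p_{A-1}^T w_A w_A^T \\ w_A^T \end{bmatrix}. \tag{19}$$
From this follows that $p_A$ in $P$ is replaced by $w_A$, as stated. For a complete proof we must also show that for $2 \le i \le A$ we have $w_{i-1}^T + p_{i-1}^T w_i w_i^T = p_{i-1}^T$, or equivalently that $w_i^T = \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right)$. Forming $\left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j$ we find the following possibilities for $2 \le i \le A$:
$$\begin{aligned}
j < i-1 &\;\Rightarrow\; \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j = \left(p_{i-1}^T w_i\right)^{-1}(0 - 0) = 0 \\
j = i-1 &\;\Rightarrow\; \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j = \left(p_{i-1}^T w_i\right)^{-1}(1 - 1) = 0 \\
j = i &\;\Rightarrow\; \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j = \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T w_i - 0\right) = 1 \\
j > i &\;\Rightarrow\; \left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j = \left(p_{i-1}^T w_i\right)^{-1}(0 - 0) = 0
\end{aligned} \tag{20}$$
Since $p_i$ and thus also $\left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1} - w_{i-1}\right)$ belong to the span of $w_1, w_2, \ldots, w_A$, and since $\left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) w_j = 1$ for $j = i$ and $0$ for $j \ne i$, it finally follows from the orthonormality of $W$ that $\left(p_{i-1}^T w_i\right)^{-1}\left(p_{i-1}^T - w_{i-1}^T\right) = w_i^T$, and thus that $w_{i-1}^T + p_{i-1}^T w_i w_i^T = p_{i-1}^T$ for $2 \le i \le A$.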
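The bi-diagonal structure of $P^T W$ and the last-column replacement in Property 4 can be illustrated numerically. This is a sketch under assumptions: the helper name `nipals_pls1`, the random data, and the choice $A = 4$ are arbitrary and not taken from the paper.

```python
import numpy as np

def nipals_pls1(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W, T_W and P column-wise."""
    E = X.copy()
    W, T, P = [], [], []
    for _ in range(A):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w); T.append(t); P.append(p)
    return np.column_stack(W), np.column_stack(T), np.column_stack(P)

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

W, T_W, P = nipals_pls1(X, y, A=4)
PtW = P.T @ W                       # expected: upper bi-diagonal, unit diagonal
P_mod = W @ (W.T @ P)               # modified loading matrix W W^T P of (13)

print(np.round(PtW, 6))
print(np.abs(P_mod[:, :-1] - P[:, :-1]).max())   # first A-1 columns unchanged
print(np.abs(P_mod[:, -1] - W[:, -1]).max())     # last column equals w_A
```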
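Property 5 Using a predetermined loading weights matrix $W$, the deflation order in the algorithm resulting in the non-orthogonalized factorization (2) is of no importance for the residual and predictions. A loading weights matrix $\tilde{W}$ with permuted column vectors will thus give $X = \tilde{T}\tilde{W}^T + E$ with $\tilde{T}\tilde{W}^T = T W^T$, and $\hat{y}_{\mathrm{new}} = x_{\mathrm{new}}^T \hat{b}$ according to Eq. (3).
Proof: Since $W$ is orthonormal, the PLS algorithm giving the non-orthogonalized factorization (2) generally gives $t_i = \left(X - \sum_{j \ne i} t_j w_j^T\right) w_i = X w_i$. This is true irrespective of the order of deflation, i.e. $\tilde{T} = X\tilde{W}$. Introducing an invertible permutation matrix $\tilde{P}$ with the property $\tilde{P}^{-1} = \tilde{P}^T$ and $\tilde{W} = W\tilde{P}$, we find $\tilde{T}\tilde{W}^T = X W \tilde{P}\tilde{P}^T W^T = T W^T$, and the predictions according to Eq. (3) will be
$$\hat{y}_{\mathrm{new}} = x_{\mathrm{new}}^T \tilde{b} = x_{\mathrm{new}}^T W\tilde{P}\left(\tilde{P}^T W^T X^T X W \tilde{P}\right)^{-1} \tilde{P}^T W^T X^T y = x_{\mathrm{new}}^T W\left(W^T X^T X W\right)^{-1} W^T X^T y = x_{\mathrm{new}}^T \hat{b}.$$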
Property 6 Using a predetermined loading weights matrix $W$, the deflation order in the algorithm resulting in the orthogonalized factorizations (12) and (13) is of no importance for the residuals and predictions. A loading weights matrix $\tilde{W}$ with permuted column vectors will thus give $X = \tilde{T}_W \tilde{P}^T + E_W$ and $X = \tilde{T}_W \tilde{P}^T \tilde{W}\tilde{W}^T + E$ respectively, with $\tilde{T}_W \tilde{P}^T = T_W P^T$ and $\tilde{T}_W \tilde{P}^T \tilde{W}\tilde{W}^T = T_W P^T W W^T$.
Proof: According to Property 3 the relation between the orthogonalized and non-orthogonalized PLS algorithms is $T_W P^T W W^T = T W^T$. Use of the same algorithms with the predetermined and permuted matrix $\tilde{W}$ must then necessarily result in $\tilde{T}_W \tilde{P}^T \tilde{W}\tilde{W}^T = \tilde{T}\tilde{W}^T$. Since $\tilde{W}\tilde{W}^T = W W^T$ and $\tilde{T}\tilde{W}^T = T W^T$ (Property 5) it also follows that $\tilde{T}_W \tilde{P}^T W W^T = T W^T = T_W P^T W W^T$ and thus $\tilde{T}_W \tilde{P}^T = T_W P^T$. From this follow unaltered residuals and predictions.
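Property 6 can be illustrated by running the orthogonalized deflation with the same loading weights in two different orders. In this sketch the helper names `pls_weights` and `ortho_pls`, the random data, and the chosen order $w_2, w_3, w_1$ are assumptions for illustration.

```python
import numpy as np

def pls_weights(X, y, A):
    """Hypothetical helper: orthonormal PLS loading weights from the
    recursion w_{a+1} proportional to X^T (y - y_hat_a)."""
    W = np.empty((X.shape[1], 0))
    y_hat = np.zeros_like(y)
    for _ in range(A):
        w = X.T @ (y - y_hat)
        W = np.column_stack([W, w / np.linalg.norm(w)])
        T = X @ W
        y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)
    return W

def ortho_pls(X, W):
    """Orthogonalized PLS deflation with predetermined weights,
    using the columns of W in the given order."""
    E = X.copy()
    T, P = [], []
    for w in W.T:
        t = E @ w
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        T.append(t); P.append(p)
    return np.column_stack(T), np.column_stack(P), E

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

W = pls_weights(X, y, A=3)
T1, P1, E1 = ortho_pls(X, W)                  # order w1, w2, w3
T2, P2, E2 = ortho_pls(X, W[:, [1, 2, 0]])    # order w2, w3, w1
print(np.abs(T1 @ P1.T - T2 @ P2.T).max(), np.abs(E1 - E2).max())
```

The individual score and loading vectors differ between the two runs; only the product $T_W P^T$ and the residual are expected to agree.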
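The OPLS algorithm
Following [2], the OPLS algorithm is as follows:
1. Set $i = 1$, $E_{i-1} = E_0 = X$, and $W_{\mathrm{ortho}}$, $T_{\mathrm{ortho}}$ and $P_{\mathrm{ortho}}$ to empty matrices.
2. $w_i^{\mathrm{OPLS}} = E_{i-1}^T y / \|E_{i-1}^T y\| = w_1$
3. $t_i^{\mathrm{OPLS}} = E_{i-1} w_i^{\mathrm{OPLS}}$
4. $p_i^{\mathrm{OPLS}} = E_{i-1}^T t_i^{\mathrm{OPLS}} / \left((t_i^{\mathrm{OPLS}})^T t_i^{\mathrm{OPLS}}\right)$
5. $w_i^{\mathrm{ortho}} = \left(p_i^{\mathrm{OPLS}} - w_i^{\mathrm{OPLS}}\right) / \|p_i^{\mathrm{OPLS}} - w_i^{\mathrm{OPLS}}\|$ and $W_{\mathrm{ortho}} = \begin{bmatrix} W_{\mathrm{ortho}} & w_i^{\mathrm{ortho}} \end{bmatrix}$
6. $t_i^{\mathrm{ortho}} = E_{i-1} w_i^{\mathrm{ortho}}$ and $T_{\mathrm{ortho}} = \begin{bmatrix} T_{\mathrm{ortho}} & t_i^{\mathrm{ortho}} \end{bmatrix}$
7. $p_i^{\mathrm{ortho}} = E_{i-1}^T t_i^{\mathrm{ortho}} / \left((t_i^{\mathrm{ortho}})^T t_i^{\mathrm{ortho}}\right)$ and $P_{\mathrm{ortho}} = \begin{bmatrix} P_{\mathrm{ortho}} & p_i^{\mathrm{ortho}} \end{bmatrix}$
8. $E_i = X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$
9. Let $i = i + 1$ and return to step 2 for additional orthogonal components, otherwise go to step 10.
10. End.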
The resulting $E_i$ are the filtered $X$ data, and a one-component PLS factorization after removal of $i = A-1$ components further gives
$$E_{A-1} = t_A^{\mathrm{OPLS}}\left(p_A^{\mathrm{OPLS}}\right)^T + E^{\mathrm{OPLS}}. \tag{21}$$
Note that all steps give $w_i^{\mathrm{OPLS}} = w_1$.
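The algorithm steps above can be sketched in NumPy as follows. This is an illustrative reading rather than a reference implementation: the function name `opls` and the random data are assumptions, and step 7 is read as $p_i^{\mathrm{ortho}} = E_{i-1}^T t_i^{\mathrm{ortho}} / \left((t_i^{\mathrm{ortho}})^T t_i^{\mathrm{ortho}}\right)$.

```python
import numpy as np

def opls(X, y, n_ortho):
    """Sketch of OPLS steps 1-9 for n_ortho y-orthogonal components."""
    E = X.copy()                                   # step 1: E_0 = X
    W_o, T_o, P_o, w_opls = [], [], [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w /= np.linalg.norm(w)                     # step 2
        t = E @ w                                  # step 3
        p = E.T @ t / (t @ t)                      # step 4
        w_o = (p - w) / np.linalg.norm(p - w)      # step 5
        t_o = E @ w_o                              # step 6
        p_o = E.T @ t_o / (t_o @ t_o)              # step 7 (assumed reading)
        W_o.append(w_o); T_o.append(t_o); P_o.append(p_o); w_opls.append(w)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T   # step 8
    return (np.column_stack(W_o), np.column_stack(T_o),
            np.column_stack(P_o), E, w_opls)

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

W_o, T_o, P_o, E_filt, w_opls = opls(X, y, n_ortho=2)
w1 = X.T @ y / np.linalg.norm(X.T @ y)
print(np.abs(T_o.T @ y).max())                     # y-orthogonality of T_ortho
print([float(np.abs(w - w1).max()) for w in w_opls])   # w_i^OPLS versus w_1
```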
Property 7 The OPLS loading weights matrix may be found from the ordinary PLS loading weights matrix as $W_{\mathrm{ortho}} = -W_{2:A}$.
Proof: From Property 6 follows that orthogonalized PLS regression with the permuted loading weights matrix $\tilde{W} = \begin{bmatrix} W_{2:A} & w_1 \end{bmatrix}$ gives the same fitted response vector $\hat{y}$ as with use of $W$. Since the sign of a $w_i$ vector has no effect on the products $t_i p_i^T$ and $t_i^{\mathrm{ortho}}\left(p_i^{\mathrm{ortho}}\right)^T$, this is true also for $\tilde{W} = \begin{bmatrix} -W_{2:A} & w_1 \end{bmatrix}$. We use induction in the parameter $i$ related to $W_{\mathrm{ortho}}$ to show that the OPLS algorithm uses $W_{\mathrm{ortho}} = -W_{2:A}$.
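For $i = 1$, i.e. one $y$-orthogonal component, the OPLS algorithm gives
$$w_1^{\mathrm{ortho}} = \frac{p_1^{\mathrm{OPLS}} - w_1}{\left\|p_1^{\mathrm{OPLS}} - w_1\right\|} = \frac{X^T X w_1\left(w_1^T X^T X w_1\right)^{-1} - w_1}{\left\|X^T X w_1\left(w_1^T X^T X w_1\right)^{-1} - w_1\right\|}, \tag{22}$$
while the recursive formula for the loading weights vectors developed by Helland [7] and the prediction formula (3) give (where $\hat{y}_1$ is the fitted response vector using one PLS component)
$$w_2 = \frac{X^T\left(y - \hat{y}_1\right)}{\left\|X^T\left(y - \hat{y}_1\right)\right\|} = \frac{X^T\left(y - X w_1\left(w_1^T X^T X w_1\right)^{-1} w_1^T X^T y\right)}{\left\|X^T\left(y - X w_1\left(w_1^T X^T X w_1\right)^{-1} w_1^T X^T y\right)\right\|} = \frac{w_1 - X^T X w_1\left(w_1^T X^T X w_1\right)^{-1}}{\left\|w_1 - X^T X w_1\left(w_1^T X^T X w_1\right)^{-1}\right\|} = -w_1^{\mathrm{ortho}}. \tag{23}$$
Assuming the property to be true up to $w_{i-1}^{\mathrm{ortho}}$ we find according to the OPLS algorithm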
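$$w_i^{\mathrm{ortho}} = \frac{p_i^{\mathrm{OPLS}} - w_1}{\left\|p_i^{\mathrm{OPLS}} - w_1\right\|}, \tag{24}$$
with
$$E_{i-1} = X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T, \tag{25}$$
where $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$ is the factorization of the $i-1$ removed $y$-orthogonal components. From the recursive loading weights formula [7] we also find
$$w_{i+1} = \frac{X^T\left(y - \hat{y}_i\right)}{\left\|X^T\left(y - \hat{y}_i\right)\right\|} = \frac{w_1 - \dfrac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}}}{\left\|w_1 - \dfrac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}}\right\|}, \tag{26}$$
where $\hat{y}_i$ is the fitted response vector using a total of $i$ components.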
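In order to show that $w_i^{\mathrm{ortho}} = -w_{i+1}$ we finally make use of the OPLS facts that $T_{\mathrm{ortho}}^T y = 0$ and $T_{\mathrm{ortho}}^T \hat{y} = 0$ (see [2] for proofs), i.e. $E_{i-1}^T y = \left(X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T\right)^T y = X^T y$ and $E_{i-1}^T \hat{y}_i = \left(X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T\right)^T \hat{y}_i = X^T \hat{y}_i$. We then use the prediction formula (3) and the fact that OPLS gives the same predictions as ordinary PLS, and develop $p_i^{\mathrm{OPLS}}$ into (also using $w_1^T X^T y = w_1^T w_1 \sqrt{y^T X X^T y} = \sqrt{y^T X X^T y}$)
$$p_i^{\mathrm{OPLS}} = E_{i-1}^T E_{i-1} w_1\left(w_1^T E_{i-1}^T E_{i-1} w_1\right)^{-1} = \frac{E_{i-1}^T E_{i-1} w_1\left(w_1^T E_{i-1}^T E_{i-1} w_1\right)^{-1} w_1^T E_{i-1}^T y}{w_1^T E_{i-1}^T y} = \frac{E_{i-1}^T \hat{y}_i}{w_1^T E_{i-1}^T y} = \frac{X^T \hat{y}_i}{w_1^T X^T y} = \frac{X^T \hat{y}_i}{\sqrt{y^T X X^T y}}, \tag{27}$$
and insertion into Eq. (24) and comparison with Eq. (26) finally shows that $w_i^{\mathrm{ortho}} = -w_{i+1}$.
Property 8 After the removal of $A-1$ $y$-orthogonal components, the OPLS factorization (14) results in the same residual $E^{\mathrm{OPLS}} = E_W$ and the same predictions as the original orthogonalized PLS factorization (12).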
Proof: Since $W_{\mathrm{ortho}} = -W_{2:A}$ the OPLS factorization is equivalent with the factorization obtained by the standard PLS NIPALS algorithm with predetermined and permuted loading weights vectors in the order $w_2, w_3, \ldots, w_A$ and $w_1$. From Property 6 thus follows that the residuals and the predictions are the same.
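Properties 7 and 8 lend themselves to a direct numerical comparison of OPLS with ordinary orthogonalized PLS. The sketch below is illustrative only: the function names `nipals_pls1` and `opls`, the random data, and $A = 3$ are assumptions, and OPLS step 7 is read as using $E_{i-1}$ and $t_i^{\mathrm{ortho}}$.

```python
import numpy as np

def nipals_pls1(X, y, A):
    """Orthogonalized PLS1 (NIPALS); returns W and the residual E_W of (12)."""
    E, W = X.copy(), []
    for _ in range(A):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        E = E - np.outer(t, E.T @ t / (t @ t))
        W.append(w)
    return np.column_stack(W), E

def opls(X, y, n_ortho):
    """Sketch of the OPLS loop; returns W_ortho, T_ortho, P_ortho, filtered X."""
    E = X.copy()
    W_o, T_o, P_o = [], [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        p_o = E.T @ t_o / (t_o @ t_o)              # assumed reading of step 7
        W_o.append(w_o); T_o.append(t_o); P_o.append(p_o)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T
    return np.column_stack(W_o), np.column_stack(T_o), np.column_stack(P_o), E

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

A = 3
W, E_W = nipals_pls1(X, y, A)
W_o, T_o, P_o, E_filt = opls(X, y, A - 1)

t_A = E_filt @ W[:, 0]                      # final OPLS component, Eq. (21)
p_A = E_filt.T @ t_A / (t_A @ t_A)
E_opls = E_filt - np.outer(t_A, p_A)

print(np.abs(W_o + W[:, 1:]).max())         # Property 7: W_ortho = -W_{2:A}
print(np.abs(E_opls - E_W).max())           # Property 8: E_OPLS = E_W
```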
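Result 1 The second similarity transformation $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T = T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A}\left(P_{\mathrm{ortho}}^T W_{2:A}\right)^{-1} P_{\mathrm{ortho}}^T$ results in the transformed OPLS score matrix $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} = T_{2:A}$.
Proof: According to Property 3 the two factorizations $X = T W^T + E$ and $X = T_W P^T W W^T + E$ are identical, i.e. $T_W P^T W = T$. Using a permuted loading weights matrix $\tilde{W} = \begin{bmatrix} W_{2:A} & w_1 \end{bmatrix}$ we correspondingly have $\tilde{T} = \begin{bmatrix} T_{2:A} & t_1 \end{bmatrix} = \tilde{T}_W \tilde{P}^T \tilde{W}$, and that is independent of the number of components used. As the OPLS algorithm gives $T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T$ by use of $X$ and $W_{\mathrm{ortho}} = -W_{2:A}$ (Property 7) in exactly the same way as we find the first $A-1$ components in $\tilde{T}_W \tilde{P}^T$, this will necessarily give
$$T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A} = T_{2:A}. \tag{28}$$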
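Eq. (28) can be checked numerically as sketched below. The helper names `pls_weights` and `opls`, the random data, and $A = 3$ are assumptions for illustration; OPLS step 7 is read as using $E_{i-1}$ and $t_i^{\mathrm{ortho}}$.

```python
import numpy as np

def pls_weights(X, y, A):
    """Hypothetical helper: orthonormal PLS loading weights from the
    recursion w_{a+1} proportional to X^T (y - y_hat_a)."""
    W = np.empty((X.shape[1], 0))
    y_hat = np.zeros_like(y)
    for _ in range(A):
        w = X.T @ (y - y_hat)
        W = np.column_stack([W, w / np.linalg.norm(w)])
        T = X @ W
        y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)
    return W

def opls(X, y, n_ortho):
    """Sketch of the OPLS loop; returns T_ortho, P_ortho and the filtered X."""
    E = X.copy()
    T_o, P_o = [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        p_o = E.T @ t_o / (t_o @ t_o)              # assumed reading of step 7
        T_o.append(t_o); P_o.append(p_o)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T
    return np.column_stack(T_o), np.column_stack(P_o), E

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

A = 3
W = pls_weights(X, y, A)
T_o, P_o, _ = opls(X, y, A - 1)

T_2A = X @ W[:, 1:]                            # Martens scores T_{2:A}
T_transformed = T_o @ (P_o.T @ W[:, 1:])       # T_ortho P_ortho^T W_{2:A}
print(np.abs(T_transformed - T_2A).max())
```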
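Property 9 The last OPLS component $t_A^{\mathrm{OPLS}}\left(p_A^{\mathrm{OPLS}}\right)^T$ multiplied with $W W^T$ becomes $t_A^{\mathrm{OPLS}}\left(p_A^{\mathrm{OPLS}}\right)^T W W^T = t_A^{\mathrm{OPLS}} w_1^T$.
Proof: We have
$$\left(p_A^{\mathrm{OPLS}}\right)^T W W^T = \left(p_A^{\mathrm{OPLS}}\right)^T\left(w_1 w_1^T + W_{2:A} W_{2:A}^T\right), \tag{29}$$
where
$$\left(p_A^{\mathrm{OPLS}}\right)^T w_1 = \frac{w_1^T E_{A-1}^T E_{A-1}}{w_1^T E_{A-1}^T E_{A-1} w_1}\, w_1 = 1 \tag{30}$$
and
$$\left(p_A^{\mathrm{OPLS}}\right)^T W_{2:A} = \frac{w_1^T E_{A-1}^T E_{A-1}}{w_1^T E_{A-1}^T E_{A-1} w_1}\, W_{2:A} = \frac{w_1^T E_{A-1}^T}{w_1^T E_{A-1}^T E_{A-1} w_1}\left(X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T\right) W_{2:A} = \frac{w_1^T E_{A-1}^T}{w_1^T E_{A-1}^T E_{A-1} w_1}\left(T_{2:A} - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W_{2:A}\right) = 0, \tag{31}$$
where we in the final equality make use of Result 1.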
Property 10 After the removal of $A-1$ $y$-orthogonal components, the modified OPLS factorization (15) results in the same residual $E$ and the same predictions as the modified PLS factorization (13).
Proof: Since $W_{\mathrm{ortho}} = -W_{2:A}$ the OPLS factorization is equivalent with the factorization obtained by the standard PLS NIPALS algorithm with predetermined and permuted loading weights vectors in the order $w_2, w_3, \ldots, w_A$ and $w_1$. From Property 6 and Eqs. (13) and (14) thus follows
$$X = T_W P^T W W^T + E = \left(T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T + t_A^{\mathrm{OPLS}}\left(p_A^{\mathrm{OPLS}}\right)^T\right) W W^T + E = T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E, \tag{32}$$
where the final equality, which makes use of Property 9, gives Eq. (15).
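Properties 9 and 10 can be illustrated together, since both concern the last OPLS component after multiplication with $W W^T$. This sketch rests on assumptions: the helper names `pls_weights` and `opls`, the random data, $A = 3$, and the reading of OPLS step 7 as using $E_{i-1}$ and $t_i^{\mathrm{ortho}}$.

```python
import numpy as np

def pls_weights(X, y, A):
    """Hypothetical helper: orthonormal PLS loading weights from the
    recursion w_{a+1} proportional to X^T (y - y_hat_a)."""
    W = np.empty((X.shape[1], 0))
    y_hat = np.zeros_like(y)
    for _ in range(A):
        w = X.T @ (y - y_hat)
        W = np.column_stack([W, w / np.linalg.norm(w)])
        T = X @ W
        y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)
    return W

def opls(X, y, n_ortho):
    """Sketch of the OPLS loop; returns T_ortho, P_ortho and the filtered X."""
    E = X.copy()
    T_o, P_o = [], []
    for _ in range(n_ortho):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        p_o = E.T @ t_o / (t_o @ t_o)              # assumed reading of step 7
        T_o.append(t_o); P_o.append(p_o)
        E = X - np.column_stack(T_o) @ np.column_stack(P_o).T
    return np.column_stack(T_o), np.column_stack(P_o), E

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

A = 3
W = pls_weights(X, y, A)
T_o, P_o, E_filt = opls(X, y, A - 1)
w1 = W[:, 0]
t_A = E_filt @ w1                              # t_A^OPLS
p_A = E_filt.T @ t_A / (t_A @ t_A)             # p_A^OPLS

prop9 = p_A @ W @ W.T                          # expected to equal w_1^T
E_martens = X - X @ W @ W.T                    # residual E of (2) and (13)
E_modified = X - T_o @ P_o.T @ W @ W.T - np.outer(t_A, w1)   # residual in (32)
print(np.abs(prop9 - w1).max(), np.abs(E_modified - E_martens).max())
```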
Result 2 The final modified OPLS component is identical with the first PLS+ST component, i.e. $t_A^{\mathrm{OPLS}} w_1^T = t_1^{\mathrm{PLS+ST}} w_1^T$.
Proof: When $A-1$ $y$-orthogonal components are subtracted from $X$, it follows from the OPLS algorithm that the remaining score vector is
$$t_A^{\mathrm{OPLS}} = \left(X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T\right) w_1 = E_{A-1} w_1. \tag{33}$$
Using the standard prediction formula (3) we further find
$$\hat{y} = E_{A-1} w_1\left(w_1^T E_{A-1}^T E_{A-1} w_1\right)^{-1} w_1^T E_{A-1}^T y = E_{A-1} w_1 d = t_A^{\mathrm{OPLS}} d, \tag{34}$$
where $d$ is a scalar. This confirms that $t_A^{\mathrm{OPLS}}$ is in the direction of $\hat{y}$, which according to Property 6 is also identical with the fitted response vector using ordinary PLS regression.
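From the PLS+ST factorization (5) follows $t_1^{\mathrm{PLS+ST}} = q_1^{-1}\hat{y}$, where $q_1$ is found as the first component in $q = \left(W^T X^T X W\right)^{-1} W^T X^T y$. Since $y$ is orthogonal to both $T_{2:A}$ (Property 1) and $E$ (Property 2) we may also find $q_1$ by use of the PLS+ST factorization (6) and
$$y^T X = y^T\left(t_1^{\mathrm{PLS+ST}} w_1^T + T_{2:A}\left(P_{2:A}^{\mathrm{PLS+ST}}\right)^T + E\right) = y^T t_1^{\mathrm{PLS+ST}} w_1^T = y^T q_1^{-1} \hat{y}\, w_1^T = \frac{q_1^{-1} y^T \hat{y}}{\sqrt{y^T X X^T y}}\, y^T X, \tag{35}$$
i.e. $\dfrac{q_1^{-1} y^T \hat{y}}{\sqrt{y^T X X^T y}} = 1$ and
$$t_1^{\mathrm{PLS+ST}} = q_1^{-1}\hat{y} = \frac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y}. \tag{36}$$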
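Since $y^T T_{\mathrm{ortho}} = 0$ we find $y^T E_{A-1} = y^T\left(X - T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T\right) = y^T X = \sqrt{y^T X X^T y}\, w_1^T$, and from the $t_1^{\mathrm{PLS+ST}}$ expression (36), using $y^T X w_1 = \sqrt{y^T X X^T y}\, w_1^T w_1 = \sqrt{y^T X X^T y}$ and $\hat{y}$ according to Eq. (34), thus follows
$$t_1^{\mathrm{PLS+ST}} = \frac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y} = \frac{\sqrt{y^T X X^T y}\; E_{A-1} w_1 d}{y^T E_{A-1} w_1 d} = \frac{\sqrt{y^T X X^T y}\; t_A^{\mathrm{OPLS}}}{y^T X w_1} = t_A^{\mathrm{OPLS}}. \tag{37}$$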
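Result 2 can be illustrated by computing $t_1^{\mathrm{PLS+ST}}$ from Eq. (36), with $\hat{y}$ taken from ordinary PLS, and comparing it with $t_A^{\mathrm{OPLS}}$ from Eq. (33). As before, this is a sketch under assumptions: `pls_weights` and `opls` are hypothetical helper names, the data are random, $A = 3$, and OPLS step 7 is read as using $E_{i-1}$ and $t_i^{\mathrm{ortho}}$.

```python
import numpy as np

def pls_weights(X, y, A):
    """Hypothetical helper: orthonormal PLS loading weights from the
    recursion w_{a+1} proportional to X^T (y - y_hat_a)."""
    W = np.empty((X.shape[1], 0))
    y_hat = np.zeros_like(y)
    for _ in range(A):
        w = X.T @ (y - y_hat)
        W = np.column_stack([W, w / np.linalg.norm(w)])
        T = X @ W
        y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)
    return W

def opls_filtered(X, y, n_ortho):
    """Sketch of the OPLS loop; returns the filtered data E_{n_ortho}."""
    E = X.copy()
    for _ in range(n_ortho):
        w = E.T @ y
        w /= np.linalg.norm(w)
        t = E @ w
        p = E.T @ t / (t @ t)
        w_o = (p - w) / np.linalg.norm(p - w)
        t_o = E @ w_o
        p_o = E.T @ t_o / (t_o @ t_o)      # assumed reading of step 7
        E = E - np.outer(t_o, p_o)         # same as X - T_ortho P_ortho^T
    return E

rng = np.random.default_rng(0)      # arbitrary test data (assumption)
X = rng.standard_normal((30, 6))
y = rng.standard_normal(30)

A = 3
W = pls_weights(X, y, A)
T = X @ W
y_hat = T @ np.linalg.solve(T.T @ T, T.T @ y)           # ordinary PLS fit, Eq. (3)

t_opls = opls_filtered(X, y, A - 1) @ W[:, 0]           # Eq. (33)
t_st = np.sqrt(y @ X @ X.T @ y) / (y @ y_hat) * y_hat   # Eq. (36)
print(np.abs(t_opls - t_st).max())
```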
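Result 3 The first modified and then transformed OPLS loading matrix is identical with the PLS+ST loading matrix, i.e. $W W^T P_{\mathrm{ortho}}\left(W_{2:A}^T P_{\mathrm{ortho}}\right)^{-1} = P_{2:A}^{\mathrm{PLS+ST}}$.
Proof: According to Property 3 the two factorizations $X = T W^T + E$ and $X = T_W P^T W W^T + E$ are identical with $T_W P^T W = T$. According to Property 10 these factorizations are also identical with the modified OPLS factorization $X = T_{\mathrm{ortho}} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E$ and thus the transformed factorization $X = T_{\mathrm{ortho}}\left(P_{\mathrm{ortho}}^T W_{2:A}\right)\left(P_{\mathrm{ortho}}^T W_{2:A}\right)^{-1} P_{\mathrm{ortho}}^T W W^T + t_A^{\mathrm{OPLS}} w_1^T + E$, while the PLS+ST method gives $X = T_{2:A}\left(P_{2:A}^{\mathrm{PLS+ST}}\right)^T + t_1^{\mathrm{PLS+ST}} w_1^T + E$. Since Result 1 shows that $T_{\mathrm{ortho}}\left(P_{\mathrm{ortho}}^T W_{2:A}\right) = T_{2:A}$, while Result 2 shows that $t_A^{\mathrm{OPLS}} w_1^T = t_1^{\mathrm{PLS+ST}} w_1^T$, it follows that $\left(P_{\mathrm{ortho}}^T W_{2:A}\right)^{-1} P_{\mathrm{ortho}}^T W W^T = \left(P_{2:A}^{\mathrm{PLS+ST}}\right)^T$.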
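Property 11 The modified loading matrix $W W^T P_{\mathrm{ortho}}$ is different from $P_{\mathrm{ortho}}$ in the last column vector only, with $p_{A-1}^{\mathrm{ortho}}$ replaced by $\left(p_{A-1}^{\mathrm{ortho}}\right)^T w_1 w_1 - w_A$.
Proof: The ordinary PLS algorithm results in an upper triangular and bi-diagonal matrix $P^T W$, with 1 along the main diagonal [8]. Since $P_{\mathrm{ortho}}$ in the OPLS algorithm according to Property 7 is found from $W_{\mathrm{ortho}} = -W_{2:A}$ in the same way as $P$ is found from $W$, the matrix $P_{\mathrm{ortho}}^T W_{2:A}$ must also be bi-diagonal with $-1$ along the main diagonal. We thus have (with $\tilde{p}_i = p_i^{\mathrm{ortho}}$)
$$P_{\mathrm{ortho}}^T \begin{bmatrix} w_1 & W_{2:A} \end{bmatrix} \begin{bmatrix} w_1^T \\ W_{2:A}^T \end{bmatrix} = \begin{bmatrix} \tilde{p}_1^T w_1 & -1 & \tilde{p}_1^T w_3 & 0 & \cdots & 0 \\ \tilde{p}_2^T w_1 & 0 & -1 & \tilde{p}_2^T w_4 & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \ddots & 0 \\ \tilde{p}_{A-2}^T w_1 & \vdots & & \ddots & -1 & \tilde{p}_{A-2}^T w_A \\ \tilde{p}_{A-1}^T w_1 & 0 & \cdots & \cdots & 0 & -1 \end{bmatrix} \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ \vdots \\ w_A^T \end{bmatrix} = \begin{bmatrix} \tilde{p}_1^T w_1 w_1^T - w_2^T + \tilde{p}_1^T w_3 w_3^T \\ \tilde{p}_2^T w_1 w_1^T - w_3^T + \tilde{p}_2^T w_4 w_4^T \\ \vdots \\ \tilde{p}_{A-2}^T w_1 w_1^T - w_{A-1}^T + \tilde{p}_{A-2}^T w_A w_A^T \\ \tilde{p}_{A-1}^T w_1 w_1^T - w_A^T \end{bmatrix}. \tag{38}$$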
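From this follows that $\tilde{p}_{A-1}^T$ in $P_{\mathrm{ortho}}^T$ is replaced by $\tilde{p}_{A-1}^T w_1 w_1^T - w_A^T$, as stated. For a complete proof we must also show that for $3 \le i \le A$ we have $\tilde{p}_{i-2}^T w_1 w_1^T - w_{i-1}^T + \tilde{p}_{i-2}^T w_i w_i^T = \tilde{p}_{i-2}^T$, or equivalently that $w_i^T = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T\right)$. Forming $\left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T\right) w_j$, and writing $(\cdot)$ for the second bracketed factor, we find the following possibilities for $3 \le i \le A$:
$$\begin{aligned}
j = 1 &\;\Rightarrow\; \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(\cdot)\, w_j = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-\tilde{p}_{i-2}^T w_1 + 0 + \tilde{p}_{i-2}^T w_1\right) = 0 \\
1 < j < i-1 &\;\Rightarrow\; \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(\cdot)\, w_j = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(-0 + 0 + 0) = 0 \\
j = i-1 &\;\Rightarrow\; \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(\cdot)\, w_j = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(-0 + 1 - 1) = 0 \\
j = i &\;\Rightarrow\; \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(\cdot)\, w_j = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-0 + 0 + \tilde{p}_{i-2}^T w_i\right) = 1 \\
j > i &\;\Rightarrow\; \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(\cdot)\, w_j = \left(\tilde{p}_{i-2}^T w_i\right)^{-1}(-0 + 0 + 0) = 0
\end{aligned} \tag{39}$$
For ordinary PLS we know that $p_i$ belongs to the span of $w_1, w_2, \ldots, w_A$, and from Property 7 and the OPLS algorithm then follows that this must be the case also for $-\tilde{p}_{i-2}^T w_1 w_1 + w_{i-1} + \tilde{p}_{i-2}$. Since $\left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T\right) w_j = 1$ for $j = i$ and $0$ for $j \ne i$, it finally follows that $\left(\tilde{p}_{i-2}^T w_i\right)^{-1}\left(-\tilde{p}_{i-2}^T w_1 w_1^T + w_{i-1}^T + \tilde{p}_{i-2}^T\right) = w_i^T$, and thus that $\tilde{p}_{i-2}^T w_1 w_1^T - w_{i-1}^T + \tilde{p}_{i-2}^T w_i w_i^T = \tilde{p}_{i-2}^T$.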
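Result 4 For a single $y$-relevant component the relation between the post-processing PCP method [5] and PLS+ST is that $t_1^{\mathrm{PCP}} \to t_1^{\mathrm{PLS+ST}} = t_A^{\mathrm{OPLS}}$ and $w_1^{\mathrm{PCP}} \to w_1$ when $\hat{y} \to y$, i.e. with good predictions.
Proof: PCP uses the factorization (with normalized loadings)
$$X = t_1^{\mathrm{PCP}}\left(w_1^{\mathrm{PCP}}\right)^T + E^{\mathrm{PCP}}, \tag{40}$$
with $t_1^{\mathrm{PCP}} = \dfrac{\sqrt{\hat{y}^T X X^T \hat{y}}}{\hat{y}^T \hat{y}}\, \hat{y}$ instead of $t_1^{\mathrm{PLS+ST}} = \dfrac{\sqrt{y^T X X^T y}}{y^T \hat{y}}\, \hat{y}$ as in Eq. (36), and $w_1^{\mathrm{PCP}} = \dfrac{X^T \hat{y}}{\sqrt{\hat{y}^T X X^T \hat{y}}}$ instead of $w_1 = \dfrac{X^T y}{\sqrt{y^T X X^T y}}$ as in the PLS algorithms.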
References
[1] Svensson O, Kourti T, MacGregor JF. An investigation of orthogonal signal correction algorithms and their characteristics. J. Chemometrics 2002; 16: 176-188.
[2] Trygg J, Wold S. Orthogonal projections to latent structures, O-PLS. J. Chemometrics 2002; 16: 119-128.
[3] Martens H, Næs T. Multivariate Calibration. Wiley: New York, 1989.
[4] Verron T, Sabatier R, Joffre R. Some theoretical properties of the O-PLS method. J. Chemometrics 2004; 18: 62-68.
[5] Langsrud Ø, Næs T. Optimised score plot by principal components of prediction. Chemometrics Intell. Lab. Syst. 2003; 68: 61-74.
[6] Yu H, MacGregor JF. Post processing methods (PLS-CCA): simple alternatives to preprocessing methods (OSC-PLS). Chemometrics Intell. Lab. Syst. 2004; 73: 199-205.
[7] Helland IS. On the structure of partial least squares regression. Communications in Statistics 1988; 17: 581-607.
[8] Manne R. Analysis of two partial-least-squares algorithms for multivariate calibration. Chemometrics Intell. Lab. Syst. 1987; 2: 187-197.
[9] Ergon R. PLS post processing by similarity transformation: A simple alternative to OPLS. J. Chemometrics 2005; 19: 1-4.