Dynamic system multivariate calibration

Academic year: 2022
Fig. 1. Basic principle for estimation of primary system outputs y_1 from known inputs u and measured secondary outputs y_2 in presence of process noise v.

the covariance of the y_1 estimate is also given. In Section 4 it is shown that least squares estimation (LSE), principal component regression (PCR) and partial least squares regression (PLSR) can be seen as special static cases of this dynamical solution. The relations between these data based estimators and theoretical estimators based on known or assumed static models and noise properties are also presented, and the relation between two different PLSR algorithms falls out as a neat result. Extensions of the PCR and PLSR methods to cover also dynamic systems with collinear measurements are presented in Section 4. Section 5 gives some numerical examples and Monte Carlo simulations, and concluding remarks are given in Section 6.

2. Background and preliminaries

Linear regression and static calibration methods have roots in the classical least squares technique used by Gauss around 1800. When the number of estimator variables is large and the number of observations is limited, the ordinary solution to the least squares problem may have very large variance due to overfitting. This situation requires some form of regularization, e.g., PCR or PLSR [1,2]. In many cases of great practical interest, the estimator variables far outnumber the observations at hand. An example is product quality characterization by use of near infrared spectroscopy, with several thousand estimator variables (frequencies) and often less than 100 observations. In such cases, the estimator variables are often strongly collinear, and most of the information is then contained within a subspace of the variable space. Basic tools for this data compression are singular value decomposition (SVD) and principal component analysis (PCA), and the regression method directly based on this is PCR, while PLSR combines data compression and regression in an iterative approach. Such tools for multivariate data analysis are used in many scientific fields like biometrics, chemometrics, econometrics and psychometrics.

Linear regression can also be used to identify the parameters in dynamic system finite impulse response (FIR) models or autoregressive models with external inputs (ARX) [3]. Due to lack of noise modeling, this will normally result in biased parameter estimates, and the FIR truncation error comes in addition. Identification of FIR and ARX models by PCR and PLSR has also been investigated, see, e.g., Refs. [4,5].

In parallel with the development of the PCR and PLSR methods, the field of general dynamic system identification (SI) has been developed into a sophisticated set of methods and practical tools. Classical SI methods are summarized in comprehensive books, e.g., Refs. [3,6]. At present, subspace identification methods attract a great deal of interest, see, e.g., Ref. [7] with further references. In all forms of SI, one finds that LSE is used as a basic tool. It is, however, refined and in some cases replaced by, e.g., prediction optimization methods in order to account for the noise influence in a proper way.

SI is also closely linked to the Kalman filtering theory [8]. This is done by use of innovation models, where the different process and measurement noise sources are replaced by the white noise innovations in an underlying Kalman filter.

From an SI and Kalman filtering point of view, it is intuitively evident that the classical linear regression and the modern multivariate calibration methods may be seen as special static cases of the more general parametric SI methods for dynamic systems. An early attempt to look into these similarities was made in Ref. [9], and the present paper includes a further and more detailed attempt to do so. When these similarities are to be investigated, three basic facts have to be acknowledged.

(1) Methods of multivariate calibration are used to find models for estimation of unknown output variables y from both independent and dependent known variables x. In SI terminology, this means methods for estimation of unknown system outputs y_1 from both independent system inputs u and dependent system outputs y_2. The basic observation here is that also the dependent outputs y_2 have to be used as inputs in the SI procedure.

(2) When the multivariate calibration models are used for estimation, the y_1 outputs are not known, and this will also be the case for the corresponding dynamical models found by SI. We are, therefore, led to consider output error (OE) models and not the qualitatively different ARMAX (autoregressive moving average with external inputs) type of models used for, e.g., control design based on known y_1 outputs.

(3) In order to find the optimal y_1 estimate, the underlying Kalman filter must be of the predictor-corrector form, which is normally not the case when innovation models are used in SI.

These basic facts must be reflected in the theoretical analysis of the relations between SI and LSE, PCR and PLSR, and this is quite independent of the specific SI methods considered. The use of both independent inputs u and dependent y_2 measurements as inputs in an SI procedure raises questions about identifiability and applications on deterministic and perfect measurement systems. A preliminary discussion of this is given in Ref. [10]. A detailed comparison of ARMAX and OE models for prediction of y_1 based on u and y_2 is given in Ref. [11].

3. Secondary measurements as inputs in system identification

3.1. Statement of problem

Consider the discrete system model

$$x_{k+1} = A x_k + B u_k + G v_k$$
$$y_{1,k} = C_1 x_k + D_1 u_k + w_{1,k} \qquad (1)$$
$$y_{2,k} = C_2 x_k + D_2 u_k + w_{2,k},$$

where x is the state vector, while v and w = [w_1^T  w_2^T]^T are white and independent process and measurement noise vectors. Also assume a stable system with (A, G\sqrt{R_v}) reachable, where R_v is given by the expectation R_v = E v_k v_k^T. Note that some or all of the secondary y_2 measurements may be collinear with some or all of the primary y_1 measurements.

Further assume that input-output data are available from an informative experiment [12], i.e., that data records for u_k, y_{1,k} and y_{2,k} for k = 1, 2, ..., N are at hand, with u_k persistently exciting of appropriate order and N sufficiently high. The problem is now to identify the optimal one-step-ahead y_{1,k} predictor based on past and present u_k and past y_{2,k} values, and the optimal y_{1,k} current estimator based also on present y_{2,k} values.

Note that it is a part of the problem that y_{1,k} is not available as a basis for the prediction estimate \bar{y}_{1,k|k-1} or the current estimate \hat{y}_{1,k|k}. This is a common situation in industrial applications, e.g., in polymer extruding, where product quality measurements involve costly laboratory analyses. Product samples are then collected at a rather low sampling rate, and product quality estimates at a higher rate may thus be valuable.

3.2. Optimal one-step-ahead predictor when y_1 is available

The model (Eq. (1)) can be expressed in the ordinary innovation form [6] given by the following equations, where A K = A [K_1  K_2] is the gain in a predictor type Kalman filter formulation with white innovations e_1 and e_2:

$$x_{k+1} = A x_k + B u_k + A [K_1 \; K_2] \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}_k$$
$$y_{1,k} = C_1 x_k + D_1 u_k + e_{1,k} \qquad (2)$$
$$y_{2,k} = C_2 x_k + D_2 u_k + e_{2,k}.$$

The optimal one-step-ahead y_1 predictor with all measurements available and a known u_k will then be

$$x_{k+1} = A (I - K_1 C_1 - K_2 C_2) x_k + (B - A K_1 D_1 - A K_2 D_2) u_k + A K_1 y_{1,k} + A K_2 y_{2,k}$$
$$\bar{y}_{1,k} = C_1 x_k + D_1 u_k. \qquad (3)$$

This will be the best linear one-step-ahead predictor if x_0, v_k and w_k have arbitrary statistics, and the optimal predictor assuming that x_0, v_k and w_k are normally distributed [8]. This is also the predictor normally used in prediction error identification methods [3,6].

3.3. Optimal one-step-ahead predictor when y_1 is not available

When the y_1 measurements are not available as a basis for prediction, the ARMAX predictor (Eq. (3)) is no longer optimal [11]. The obvious reason for this is that Eq. (3) is based on an underlying Kalman filter driven by y_1 in addition to u and y_2, and the information in the y_2 measurements will then not be utilized in an optimal way when y_1 is not available.

In a prediction error identification method, we must instead base the prediction on an underlying Kalman filter driven by u and only the y_2 measurements. With the assumption that (C_2, A) is detectable, the following innovation form can then be derived from Eq. (1):

$$x^{OEP}_{k+1} = A x^{OEP}_k + B u_k + A K_2^{OE} e_{2,k}$$
$$y_{2,k} = C_2 x^{OEP}_k + D_2 u_k + e_{2,k}. \qquad (4)$$

The y_1 output is then given by

$$y_{1,k} = C_1 x^{OEP}_k + D_1 u_k + q_k^{OEP}, \qquad (5)$$

where

$$q_k^{OEP} = C_1 (x_k - x^{OEP}_k) + w_{1,k} \qquad (6)$$

is colored noise.

The underlying Kalman filter is governed by the well known Kalman filter equations [8]. The Kalman gain is determined by

$$K_2^{OE} = P^{OEP} C_2^T (C_2 P^{OEP} C_2^T + R_{22})^{-1}, \qquad (7)$$

where the prediction state estimation covariance P^{OEP} = E(x_k - x^{OEP}_k)(x_k - x^{OEP}_k)^T is given by the Riccati equation

$$P^{OEP} = A P^{OEP} A^T + G R_v G^T - A K_2^{OE} C_2 P^{OEP} A^T, \qquad (8)$$

and where R_v = E v_k v_k^T and R_{22} = E w_{2,k} w_{2,k}^T.
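As an illustration of Eqs. (7) and (8), the filtering-form Riccati equation can be solved numerically through its duality with the control-form discrete algebraic Riccati equation. The following Python sketch uses a small hypothetical system (all matrices are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical second-order system (illustrative values only)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
G = np.eye(2)
C2 = np.array([[1.0, 0.0]])
Rv = 0.1 * np.eye(2)        # process noise covariance R_v = E v_k v_k^T
R22 = np.array([[0.01]])    # secondary measurement noise covariance

# Eq. (8) is the filtering-form DARE; it is solved here through
# duality with the control-form DARE (transpose A and C2).
P_oep = solve_discrete_are(A.T, C2.T, G @ Rv @ G.T, R22)

# Eq. (7): Kalman gain related to the y2 measurements
K2 = P_oep @ C2.T @ np.linalg.inv(C2 @ P_oep @ C2.T + R22)

# Verify eq. (8): P = A P A^T + G R_v G^T - A K2 C2 P A^T
residual = A @ P_oep @ A.T + G @ Rv @ G.T - A @ K2 @ C2 @ P_oep @ A.T - P_oep
```

The residual check confirms that the solution returned by solve_discrete_are satisfies Eq. (8) term by term.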

Theoretically, it is possible to identify the system determined by Eqs. (4) and (5) using y_1 and y_2 as outputs, i.e., to identify Eq. (2) with a simplified noise model employing K_1 = 0. With many secondary y_2 measurements it is, however, a simpler task to use y_2 as an input signal, and identify the OE prediction model (OEP model)

$$x^{OEP}_{k+1} = A (I - K_2^{OE} C_2) x^{OEP}_k + (B - A K_2^{OE} D_2) u_k + A K_2^{OE} y_{2,k}$$
$$y_{1,k} = C_1 x^{OEP}_k + D_1 u_k + q_k^{OEP}. \qquad (9)$$

The corresponding input-output model is then

$$y_{1,k} = C_1 (qI - A + A K_2^{OE} C_2)^{-1} \left[ (B - A K_2^{OE} D_2) u_k + A K_2^{OE} y_{2,k} \right] + D_1 u_k + q_k^{OEP} = \bar{y}_{1,k|k-1} + q_k^{OEP}, \qquad (10)$$

where \bar{y}_{1,k|k-1} is the optimal prediction estimate.

This model can also be expressed as

$$y_{1,k} = G_1(q^{-1}) u_k + G_2(q^{-1}) y_{2,k} + q_k^{OEP}, \qquad (11)$$

where q^{-1} is the unit delay operator. The transfer functions are here

$$G_1(q^{-1}) = C_1 (qI - \tilde{A})^{-1} (B - A K_2^{OE} D_2) + D_1 \qquad (12)$$

and

$$G_2(q^{-1}) = C_1 (qI - \tilde{A})^{-1} A K_2^{OE}, \qquad (13)$$

with \tilde{A} = A - A K_2^{OE} C_2.

In order to identify the deterministic part of the system (10), i.e., G_1 and G_2, we model q_k^{OEP} by some unknown white noise sequence and use the prediction

$$\hat{y}_{1,k} = \hat{G}_1(q^{-1}; \theta) u_k + \hat{G}_2(q^{-1}; \theta) y_{2,k}. \qquad (14)$$

The prediction error is then

$$\varepsilon_{1,k} = y_{1,k} - \hat{y}_{1,k} = \left[ G_1(q^{-1}) - \hat{G}_1(q^{-1}; \theta) \right] u_k + \left[ G_2(q^{-1}) - \hat{G}_2(q^{-1}; \theta) \right] y_{2,k} + q_k^{OEP}. \qquad (15)$$

When evaluating the result of minimizing a criterion function $V_N(\theta) = \mathrm{tr}\left( (1/N) \sum_{k=1}^{N} \varepsilon_{1,k} \varepsilon_{1,k}^T \right)$, we must now consider the fact that y_{2,k} and q_k^{OEP} are not independent. We then note that the predictor (Eq. (14)) has the form of an observer driven by u and the y_2 measurements, and that the criterion function determined by the prediction error (Eq. (15)) under the assumption of Gaussian noise therefore is minimized when and only when both

- the deterministic model is correct, and
- the observer gain is a Kalman gain, i.e., \hat{K}_2 \equiv K_2^{OE}.

Minimization will therefore asymptotically (N \to \infty) result in \hat{G}_1 \equiv G_1 and \hat{G}_2 \equiv G_2, with G_1 and G_2 given by Eqs. (12) and (13). The prediction estimate \bar{y}_{1,k|k-1} will thus be asymptotically unbiased.

3.4. Optimal current estimator when y_1 is not available

Utilizing also current y_2 values, the optimal estimator considering that y_1 is not available will be found by identifying the following OE model based on an underlying predictor-corrector Kalman filter [8] utilizing also current data (OEC model):

$$y_{1,k} = C_1 (I - K_2^{OE} C_2)(qI - A + A K_2^{OE} C_2)^{-1} \left[ (B - A K_2^{OE} D_2) u_k + A K_2^{OE} y_{2,k} \right] + C_1 K_2^{OE} (y_{2,k} - D_2 u_k) + D_1 u_k + q_k^{OEC}. \qquad (16)$$

Here we introduce the colored noise

$$q_k^{OEC} = C_1 (x_k - x^{OEC}_k) + w_{1,k}, \qquad (17)$$

based on

$$x^{OEC}_k = (I - K_2^{OE} C_2) x^{OEP}_k + K_2^{OE} (y_{2,k} - D_2 u_k). \qquad (18)$$

From Eq. (16), we find the asymptotically unbiased and optimal y_1 current estimator

$$\hat{y}_{1,k|k} = C_1 (I - K_2^{OE} C_2)(qI - A + A K_2^{OE} C_2)^{-1} \left[ (B - A K_2^{OE} D_2) u_k + A K_2^{OE} y_{2,k} \right] + C_1 K_2^{OE} (y_{2,k} - D_2 u_k) + D_1 u_k. \qquad (19)$$

This is the central relation in the paper, showing how past and present u and y_2 values can be utilized in an optimal way to find the current estimate \hat{y}_{1,k|k}. It is straightforward to show, however, that identification of Eq. (19) by use of a prediction error method will result in a correct result only when w_{1,k} and w_{2,k} are uncorrelated [11].

The optimal estimator (Eq. (19)) is also the basis for Section 4, where LSE, PCR and PLSR are found as special static cases, and for the dynamic PCR and PLSR solutions presented in Section 5.
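To make the predictor-corrector structure behind Eqs. (18) and (19) concrete, the following Python sketch simulates a hypothetical two-state system (illustrative matrices and noise levels, with D_1 = D_2 = 0, not one of the paper's examples) and runs the stationary corrector/predictor recursion; the sample RMSE of the current estimate should then approach the theoretical value from Eqs. (20)-(22):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)

# Hypothetical discrete system on the form of eq. (1), D1 = D2 = 0
A = np.array([[0.9, 0.1],
              [0.2, 0.7]])
B = np.array([[0.0], [1.0]])
G = np.eye(2)
C1 = np.array([[1.0, 0.0]])
C2 = np.array([[0.0, 1.0]])
Rv, r11, r22 = 0.05 * np.eye(2), 1e-4, 0.01

# Stationary Kalman quantities, eqs. (7)-(8)
P_oep = solve_discrete_are(A.T, C2.T, G @ Rv @ G.T, np.array([[r22]]))
K2 = P_oep @ C2.T / (C2 @ P_oep @ C2.T + r22)

N = 20_000
u = rng.standard_normal(N)
x = np.zeros((2, 1))          # true state
xp = np.zeros((2, 1))         # predicted state x^OEP
err = np.empty(N)
for k in range(N):
    y1 = (C1 @ x).item() + np.sqrt(r11) * rng.standard_normal()
    y2 = (C2 @ x).item() + np.sqrt(r22) * rng.standard_normal()
    xc = xp + K2 * (y2 - (C2 @ xp).item())    # corrector, eq. (18)
    err[k] = y1 - (C1 @ xc).item()            # current estimation error
    xp = A @ xc + B * u[k]                    # predictor (time update)
    x = A @ x + B * u[k] + G @ (np.sqrt(0.05) * rng.standard_normal((2, 1)))

# Theoretical current covariance and RMSE, eqs. (20)-(22)
I2 = np.eye(2)
P_oec = (I2 - K2 @ C2) @ P_oep @ (I2 - K2 @ C2).T + K2 @ K2.T * r22
rmse_theory = np.sqrt((C1 @ P_oec @ C1.T).item() + r11)
rmse_sim = np.sqrt(np.mean(err ** 2))
```

Here the filter uses the exact model, so the sketch only illustrates the optimal case; an identified OEC model would reach the same RMSE asymptotically.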

3.5. Theoretical y_1 current estimation covariance

When the OEC model (Eq. (16)) is identified using a large data set, i.e., N \to \infty, the estimate \hat{y}_{1,k|k} will be asymptotically unbiased when we use either only u or both u and y_2 as input signals. The asymptotic covariance will, however, depend on the model and the quality of the data. In the following we assume perfect model and noise information, and derive theoretical asymptotic expressions for the y_1 current estimation covariance.

The underlying Kalman filter driven by u and the y_2 measurements is governed by Eqs. (7) and (8). The current state estimate is given by Eq. (18), and the current state estimation covariance P^{OEC} = E(x_k - x^{OEC}_k)(x_k - x^{OEC}_k)^T is thus

$$P^{OEC} = (I - K_2^{OE} C_2) P^{OEP} (I - K_2^{OE} C_2)^T + K_2^{OE} R_{22} (K_2^{OE})^T. \qquad (20)$$

As the current estimate \hat{y}_{1,k|k} is directly based on x^{OEC}_k, the theoretical asymptotic y_1 current estimation covariance becomes

$$\mathrm{Cov}(\hat{y}_{1,k|k}) = E(y_{1,k} - \hat{y}_{1,k|k})(y_{1,k} - \hat{y}_{1,k|k})^T = C_1 P^{OEC} C_1^T + R_{11}, \qquad (21)$$

with P^{OEC} given by Eq. (20) and R_{11} = E w_{1,k} w_{1,k}^T.

Assume now for convenience a scalar y_1 measurement. When the model is identified and validated by use of independent data sets with N \to \infty, we will then find the theoretical root mean square error (RMSE)

$$\mathrm{RMSE}_{|u,y_2} = \sqrt{ \frac{1}{N} \sum_{k=1}^{N} (y_{1,k} - \hat{y}_{1,k|k})^2 } \to \sqrt{ C_1 P^{OEC} C_1^T + R_{11} }. \qquad (22)$$


4. Multivariate calibration as special cases

4.1. Assumptions regarding experimental setup and data

Consider again the system (Eq. (1)) with the optimal y_1 current estimator (Eq. (19)), and expand the input u with a vector d of unknown offsets or disturbances, i.e., use u = [d^T  u_m^T]^T, where u_m is the known vector of manipulated or measured inputs. Let the input u_k be piecewise constant over periods that are much longer than both the time constants in the underlying continuous system and the discretization sampling time, and assume possibly collinear observations y_{1,j} and y_{2,j} at the end of each such period. Also assume that d_j is a white noise sequence, i.e., that the unknown offsets and disturbances are independent from one observation to the next. With a piecewise static input vector u_k and enough time for settlement, it follows from Eq. (1) that the observations will be given by

$$y_{1,j} = \left[ C_1 (I - A)^{-1} B + D_1 \right] \begin{bmatrix} d \\ u_m \end{bmatrix}_j + \sum_{k=-\infty}^{j} v_k g_{1,j-k} + w_{1,j} \qquad (23a)$$

$$y_{2,j} = \left[ C_2 (I - A)^{-1} B + D_2 \right] \begin{bmatrix} d \\ u_m \end{bmatrix}_j + \sum_{k=-\infty}^{j} v_k g_{2,j-k} + w_{2,j}, \qquad (23b)$$

where g_1 and g_2 stand for the impulse responses from v to y_1 and y_2. All measurements are thus linear combinations of d and u_m plus noise, and since we assume a stable system with piecewise constant inputs and a settling time shorter than the data sampling time, this noise will be approximately white.

Note, however, that since the noise terms in Eqs. (23a) and (23b) are partly determined by the common process noise v_k, they will not be independent, as required for the optimal current estimator (Eq. (19)). For calibration purposes it is also a normal procedure to use mean values of the measurements over a certain period of time in order to reduce the noise, but this does not affect the theoretical analysis.

4.2. Least squares estimation

If both d and u_m are completely known, there is no need to utilize the information in the y_2 measurements; we can simply solve Eq. (23a) as an ordinary least squares problem. In our case, however, we consider d as unknown, and the y_2 measurements may then give valuable information about d and indirectly also about y_1. In the following analysis we assume that u_{m,j} is a persistently exciting stochastic signal, and that all data are centralized, i.e., that d_j, u_{m,j}, y_{1,j} and y_{2,j} are stochastic variables with zero mean. For details about centralization and the subsequent modification of the estimator, see, e.g., Ref. [1]. We also assume observations of u_{m,j}, y_{1,j} and y_{2,j} from an informative experiment with samples for j = 1, 2, ..., J.

In order to use the Kalman filter formalism, we model d_j as generated by a white noise sequence e_{1,j} through a pure delay system. In the same way we model the common noise part h_{c,j} in y_{1,j} and y_{2,j} as generated by a white noise sequence e_{2,j}. Expressing y_1 and y_2 as linear combinations of z = [d^T  h_c^T]^T and u_m, we then arrive at the following dynamic system

$$z_{j+1} = \begin{bmatrix} d \\ h_c \end{bmatrix}_{j+1} = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}_j = e_j$$
$$y_{1,j} = L_{11} z_j + L_{12} u_{m,j} + h_{1,j} \qquad (24)$$
$$y_{2,j} = L_{21} z_j + L_{22} u_{m,j} + h_{2,j},$$

where the detailed expressions for the L matrices follow from Eqs. (23a) and (23b), and where h_{1,j} and h_{2,j} are white and independent noise sequences. This is a dynamic system as given in Eq. (1) with A = 0, B = 0 and G = I, and the algebraic Riccati equation (8) then results in

$$P = P_z = R_e = E e_j e_j^T. \qquad (25)$$

From Eq. (7) it follows that the Kalman gain related to the y_2 measurements is

$$K_2^{OE} = R_e L_{21}^T (L_{21} R_e L_{21}^T + R_{22})^{-1}, \qquad (26)$$

where R_{22} = E h_{2,j} h_{2,j}^T. With A = B = 0 and an appropriate change of notation according to Eq. (26), the optimal current estimator (19) now gives the theoretical static estimator

$$\hat{y}_{1,j|j} = L_{11} K_2^{OE} (y_{2,j} - L_{22} u_{m,j}) + L_{12} u_{m,j} = (L_{12} - L_{11} K_2^{OE} L_{22}) u_{m,j} + L_{11} K_2^{OE} y_{2,j}, \qquad (27)$$

or, with \hat{y}_{1,j|j}^T = [u_{m,j}^T \; y_{2,j}^T] \begin{bmatrix} B_1 \\ B_2 \end{bmatrix},

$$B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} (L_{12} - L_{11} K_2^{OE} L_{22})^T \\ (L_{11} K_2^{OE})^T \end{bmatrix}. \qquad (28)$$

Without known manipulated inputs, i.e., with u_{m,j} = 0, we have the simplified model

$$z_{j+1} = e_j$$
$$y_{1,j} = L_1 z_j + h_{1,j} \qquad (29)$$
$$y_{2,j} = L_2 z_j + h_{2,j},$$

resulting in the simplified theoretical estimate

$$\hat{y}_{1,j|j} = L_1 R_e L_2^T (L_2 R_e L_2^T + R_{22})^{-1} y_{2,j}, \qquad (30)$$

or, with \hat{y}_{1,j|j}^T = y_{2,j}^T B,

$$B = (L_2 R_e L_2^T + R_{22})^{-1} L_2 R_e L_1^T. \qquad (31)$$

In the same way as with the parameters in Eq. (19), we can find B_1 and B_2 in Eq. (28) or B in Eq. (31) by identification of an OE model. In this special static case, however, we can also find the parameters directly as the solution to a least squares problem. Let us start by comparing the theoretical estimator Eq. (31) with a data based least squares solution. By stacking the collected data y_{1,1}^T, y_{1,2}^T, ..., y_{1,N}^T in a data matrix Y_1 and y_{2,1}^T, y_{2,2}^T, ..., y_{2,N}^T in a data matrix Y_2, we can express the relation between Y_2 and Y_1 as

$$Y_1 = Y_2 B + E, \qquad (32)$$

resulting in the least squares estimator

$$\hat{B} = (Y_2^T Y_2)^{-1} Y_2^T Y_1 = \left( \frac{1}{N} Y_2^T Y_2 \right)^{-1} \frac{1}{N} Y_2^T Y_1. \qquad (33)$$

For a theoretical analysis we also stack (z_1^T, z_2^T, ..., z_N^T), (h_{1,1}^T, h_{1,2}^T, ..., h_{1,N}^T) and (h_{2,1}^T, h_{2,2}^T, ..., h_{2,N}^T) in data matrices Z, E_1 and E_2, and by use of Eq. (29) and for N \to \infty we will then find

$$\frac{1}{N} Y_2^T Y_2 = \frac{1}{N} (Z L_2^T + E_2)^T (Z L_2^T + E_2) \to L_2 R_e L_2^T + R_{22} \qquad (34)$$

and

$$\frac{1}{N} Y_2^T Y_1 = \frac{1}{N} (Z L_2^T + E_2)^T (Z L_1^T + E_1) \to L_2 R_e L_1^T. \qquad (35)$$

Here, we make use of the fact that (1/N) Z^T Z \to R_e and that h_1 and h_2 are independent white noise sequences, which means that (1/N) L_2 Z^T E_2 \to 0, (1/N) L_2 Z^T E_1 \to 0, (1/N) E_2^T Z L_1^T \to 0 and (1/N) E_2^T E_1 \to 0 when N \to \infty. By inserting Eqs. (34) and (35) into Eq. (33), we now find the estimator (31), showing that the estimators (31) and (33) are asymptotically equivalent. This connection between Kalman filtering without dynamics and ordinary LSE was found also in Ref. [9], but then without the general dynamic estimator (19) as a basis, and also limited to the case where L_1 = I (or at least invertible) and h_1 = 0, i.e., the case where y_1 are noise-free measurements of all states in the system (possibly after a similarity transformation).

In a similar although somewhat more involved way we can also show that the estimator

$$\hat{B} = \left( \frac{1}{N} [U_m \; Y_2]^T [U_m \; Y_2] \right)^{-1} \frac{1}{N} [U_m \; Y_2]^T Y_1 \qquad (36)$$

is asymptotically equivalent with Eq. (28).

4.3. Principal component regression

With a large number of y_2 variables and a limited number of observations, the estimators (33) and (36) may have very large variance. In the common case with collinear y_2 variables, we can then make use of the fact that the information can be compressed into a smaller number of latent variables, determined by the total number of independent variables in u_m and z. We then collect all input data in either X = [U_m \; Y_2] as in Eq. (36) or X = Y_2 as in Eq. (33), depending on the problem formulation. By use of an appropriate number of principal components [1,2], the data is then expressed as

$$X \approx T P^T, \qquad (37)$$

where T is the score matrix and P is the loading matrix.

For convenience and due to space limitations we now limit the treatment to the case where u_{m,j} = 0, i.e., to the case where X = [y_{2,1}, y_{2,2}, ..., y_{2,N}]^T. We then replace the measured variables y_{2,j} with latent variables t_j = P^T y_{2,j}, and make use of the fact that P^T P = I. The system (29) is thus replaced by

$$z_{j+1} = e_j$$
$$y_{1,j} = L_1 z_j + h_{1,j} \qquad (38)$$
$$t_j \approx P^T L_2 z_j + P^T h_{2,j}.$$

The theoretical estimator (31) is then replaced by

$$B = P (P^T L_2 R_e L_2^T P + P^T R_{22} P)^{-1} P^T L_2 R_e L_1^T. \qquad (39)$$

By replacing y_2 with T and inserting Eqs. (34) and (35) into Eq. (39), we find the corresponding data based PCR estimator

$$\hat{B} = P (T^T T)^{-1} T^T Y_1 = P (P^T X^T X P)^{-1} P^T X^T Y_1. \qquad (40)$$
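A minimal numerical sketch of the PCR estimator (40), with hypothetical collinear data generated from a low-dimensional latent structure (all dimensions and noise levels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, a = 500, 50, 3      # observations, collinear y2 variables, components

# Hypothetical data: p collinear variables driven by a latent variables
Z = rng.standard_normal((N, a))
L2 = rng.standard_normal((p, a))
X = Z @ L2.T + 0.05 * rng.standard_normal((N, p))             # X = Y2
Y1 = Z @ np.array([[1.0], [0.5], [-0.3]]) + 0.05 * rng.standard_normal((N, 1))

# Loading matrix P from an SVD of X, keeping a components (eq. (37))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:a].T              # p x a, with P^T P = I
T = X @ P                 # score matrix

# Data based PCR estimator, eq. (40)
B_pcr = P @ np.linalg.solve(T.T @ T, T.T @ Y1)
```

Because P^T P = I, the two forms in Eq. (40) coincide; the regression is an ordinary least squares fit in the a-dimensional score space.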

4.4. Partial least squares regression

The aim of PLSR is to improve PCR by finding t variables that explain both the X and the Y_1 data, and there exist at least two slightly different PLSR algorithms [1]. Also here we limit the treatment to the case where u_{m,j} = 0, and it is convenient to start with the PLSR method of Martens, which makes use of linear combinations t_M = W^T y_2 (where the weight matrix W is found iteratively). The result of this is that Eq. (29) is replaced by the PLSR_M model

$$z_{j+1} = e_j$$
$$y_{1,j} = L_1 z_j + h_{1,j} \qquad (41)$$
$$t_{M,j} \approx W^T L_2 z_j + W^T h_{2,j}.$$

The theoretical PCR estimator (39) is then replaced by the theoretical PLSR estimator

$$B = W (W^T L_2 R_e L_2^T W + W^T R_{22} W)^{-1} W^T L_2 R_e L_1^T, \qquad (42)$$

while Eq. (40) is replaced by the data based PLSR estimator

$$\hat{B} = W (T_M^T T_M)^{-1} T_M^T Y_1 = W (W^T X^T X W)^{-1} W^T X^T Y_1. \qquad (43)$$

The original PLSR method of Wold uses linear combinations t_W = (W^T P_W)^{-1} W^T y_2, with the same W matrix as Martens, and with a special loading matrix P_W. The model (29) is then replaced by the PLSR_W model

$$z_{j+1} = e_j$$
$$y_{1,j} = L_1 z_j + h_{1,j} \qquad (44)$$
$$t_{W,j} \approx (W^T P_W)^{-1} W^T L_2 z_j + (W^T P_W)^{-1} W^T h_{2,j}.$$

With W (W^T P_W)^{-T} instead of W, the theoretical PLSR estimator (42) becomes

$$B = W (W^T P_W)^{-T} \left[ (W^T P_W)^{-1} (W^T L_2 R_e L_2^T W + W^T R_{22} W) (W^T P_W)^{-T} \right]^{-1} (W^T P_W)^{-1} W^T L_2 R_e L_1^T = W (W^T L_2 R_e L_2^T W + W^T R_{22} W)^{-1} W^T L_2 R_e L_1^T, \qquad (45)$$

while the data based PLSR estimator (43) becomes

$$\hat{B} = W (W^T P_W)^{-T} \left[ (W^T P_W)^{-1} W^T X^T X W (W^T P_W)^{-T} \right]^{-1} (W^T P_W)^{-1} W^T X^T Y_1 = W (W^T X^T X W)^{-1} W^T X^T Y_1. \qquad (46)$$

We see from this that P_W disappears from the estimator expressions, and that the final theoretical as well as the data based estimators are the same for the Wold and Martens algorithms. This is, of course, well known [1], although the relation to the underlying Kalman filter is new.
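A minimal sketch of the iterative weight computation and the data based PLSR estimator (43); the deflation loop below is a NIPALS-style illustration on hypothetical data, and the direct coefficient formula assumes, as in Eq. (43), that the columns of W span the relevant latent subspace:

```python
import numpy as np

def pls_weights(X, y, a):
    """Build the weight matrix W column by column (NIPALS-style),
    deflating X after each component."""
    Xd = X.copy()
    W = np.zeros((X.shape[1], a))
    for i in range(a):
        w = Xd.T @ y                      # covariance-based weight vector
        w = w / np.linalg.norm(w)
        t = Xd @ w                        # score vector
        p = Xd.T @ t / (t.T @ t).item()   # loading vector
        Xd = Xd - t @ p.T                 # deflation
        W[:, i] = w.ravel()
    return W

rng = np.random.default_rng(0)
N, p, a = 400, 30, 2
Z = rng.standard_normal((N, a))
X = Z @ rng.standard_normal((a, p)) + 0.05 * rng.standard_normal((N, p))
Y1 = Z @ np.array([[1.0], [-0.5]]) + 0.05 * rng.standard_normal((N, 1))

W = pls_weights(X, Y1, a)
# Data based PLSR estimator, eq. (43)
M = W.T @ X.T @ X @ W
B_pls = W @ np.linalg.solve(M, W.T @ X.T @ Y1)
```

As shown above for Eqs. (43) and (46), the Martens and Wold score definitions lead to the same regression coefficients, so only W is needed here.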


4.5. Dynamic system PCR and PLSR solutions

The optimal y_1 estimator for dynamic systems given in Eq. (19) may also form a basis for dynamic system solutions using PCR or PLSR. It is then natural to split the secondary measurements into y_{2,k} = [y_{21,k}^T  y_{22,k}^T]^T, where y_{21,k} are the secondary measurements that are linked to y_1 only through a static system. The y_{21,k} measurements are then internally collinear, and they can thus be replaced by latent t variables as in Eqs. (38), (41) and (44), i.e., both PCR and PLSR may be used. Using, e.g., the score definition in the PLSR method of Martens, i.e., t = W^T y_2, the OEC estimator (19) will be replaced by

$$\hat{y}_{1,k|k} = C_1 (I - K_t^{OE} W^T C_{21} - K_{22}^{OE} C_{22}) (qI - A_{est})^{-1} \left( B_{est} u_k + A K_t^{OE} t_k + A K_{22}^{OE} y_{22,k} \right) + C_1 K_t^{OE} (t_k - W^T D_{21} u_k) + C_1 K_{22}^{OE} (y_{22,k} - D_{22} u_k) + D_1 u_k, \qquad (47)$$

where t_k = W^T y_{21,k}, A_{est} = A (I - K_t^{OE} W^T C_{21} - K_{22}^{OE} C_{22}) and B_{est} = B - A K_t^{OE} W^T D_{21} - A K_{22}^{OE} D_{22}. The Kalman gains are here determined as the solution to the Kalman filter equations (7) and (8) with

$$C_2 = \begin{bmatrix} (W^T C_{21})^T & C_{22}^T \end{bmatrix}^T$$

and

$$R_{22} = \begin{bmatrix} E W^T w_{21} w_{21}^T W & E W^T w_{21} w_{22}^T \\ E w_{22} w_{21}^T W & E w_{22} w_{22}^T \end{bmatrix}.$$

If we find the t variables by use of the PLSR method of Wold, we have to replace W^T with (W^T P_W)^{-1} W^T, while the PCR method uses P^T instead of W^T.

When the current estimator (47) is identified by use of, e.g., a prediction error method, also past t_k values will be used as a basis for determining \hat{y}_{1,k|k}, with reduced variance as the expected result, and we can in fact look at and treat the latent variables as ordinary measurement signals. An essential assumption here is that the linear relations between y_{21,k} and t_k given by P^T, W^T or (W^T P_W)^{-1} W^T are time invariant and determined as in the static case, either by PCA or by the iterative PLSR algorithms. Note, however, that time invariance is an essential assumption also in the general estimator (19).

If all or some of the y_{22} measurements follow the same dynamic response except for noise, and thus are internally collinear, such measurements may also be replaced by latent variables in order to reduce the variance in the solution. However, since y_{22} is linked to y_1 through a dynamic system, the iterative PLSR method cannot be expected to work, and we must be content with using SVD or PCA to find these latent variables. They may also be combined with known inputs or other measurements, and with other latent variables found by PCA or PLSR.

With u_k = 0 and y_{22,k} = 0, Eq. (47) is simplified to

$$\hat{y}_{1,k|k} = C_1 (I - K_t^{OE} W^T C_{21}) \left( qI - A (I - K_t^{OE} W^T C_{21}) \right)^{-1} A K_t^{OE} t_k + C_1 K_t^{OE} t_k, \qquad (48)$$

showing the dynamic relation between the collinear time series y_{21} represented by t and the time series y_1.

We end this subsection with a general discussion of dynamic system multivariate calibration methods. The proposed estimator (47) is based on the asymptotically optimal estimator (19). This is in contrast with PCR and PLSR methods for identification of FIR models [4], where also the asymptotical least squares solution is biased, due to truncation as well as lack of noise modeling [3]. It is also in contrast with PLSR methods for identification of ARX models [5], where again the asymptotical least squares solution is biased when the observation error is colored [3,6].

In addition we must consider the fact that an ARX estimator would make use of past y_1 values that are not available as the present problem is formulated in Section 3.1. As for the optimal ARMAX estimator (3), an ARX estimator would then not utilize secondary y_2 information in an optimal way. One obvious effect of this would be that noisy y_{21} measurements collinear with y_1 would be effectively ignored when the identification experiment gives low noise y_1 information.

The fact that FIR and ARX least squares solutions are not asymptotically optimal does not mean, of course, that the PCR and PLSR solutions presented in, e.g., Refs. [4,5] may not give good results in some realistic cases with a limited number of observations. An in-depth comparison between such known solutions and the proposed estimator (47) is, however, beyond the scope of the present paper.

5. Simulation examples

Simulation studies are undertaken by use of MATLAB, primarily the dlsim.m function in the control system toolbox [13], and the prediction error method implemented in the pem.m function in the SI toolbox [14]. With an appropriate OE model specified, the pem.m function identifies the optimal current estimator (19), where the secondary measurements y_2 are also used as input signals. For validation comparisons, the RMSE criterion in Eq. (22) was used, with \hat{y}_{1,k|k} replaced by the appropriate estimate.

5.1. Example 1: a second-order system with a first-order process noise model

As a starting point, the following continuous second-order process model with an additional first-order process noise model was used (e.g., interacting mixing tanks or thermal processes):

$$\dot{x} = \begin{bmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 0 & -1 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} u + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} v$$
$$y_1 = [1 \; 0 \; 0] x + w_1 \qquad (49)$$
$$y_2 = [0 \; 1 \; 0] x + w_2.$$

The system was discretized assuming zero-order hold elements on the u and v inputs and a sampling interval T = 0.1, and then simulated with u_k as a filtered PRBS signal with autocovariance r_{uu}(p) = 0.8^{|p|} (as in Ref. [3], Example 5.11, with a = 0.8), i.e., an input that was persistently exciting of sufficient order. The noise sources v_k, w_{1,k} and w_{2,k} were independent and normally distributed white noise sequences with zero mean and variances given below.

The simulated system was identified using the OEP and OEC models (10) and (16) with u_k and y_{2,k} as input signals and y_{1,k} as output signal, using N = 10 000 samples.

The OEP model (10) was specified as

$$nn_{OEP} = [0, [3,3], 0, 0, [3,3], [1,1]], \qquad (50)$$

i.e., a model

$$y_{1,k} = \frac{B_1(q^{-1})}{F_1(q^{-1})} u_k + \frac{B_2(q^{-1})}{F_2(q^{-1})} y_{2,k} + e_k^{OEP} \qquad (51)$$

with

$$B_1(q^{-1}) = b_{11} q^{-1} + b_{12} q^{-2} + b_{13} q^{-3} \qquad (52)$$
$$B_2(q^{-1}) = b_{21} q^{-1} + b_{22} q^{-2} + b_{23} q^{-3} \qquad (53)$$
$$F_1(q^{-1}) = 1 + f_{11} q^{-1} + f_{12} q^{-2} + f_{13} q^{-3} \qquad (54)$$
$$F_2(q^{-1}) = 1 + f_{21} q^{-1} + f_{22} q^{-2} + f_{23} q^{-3}. \qquad (55)$$

The OEC model (Eq. (16)) was specified as

$$nn_{OEC} = [0, [3,4], 0, 0, [3,3], [1,0]], \qquad (56)$$

i.e., the same model as Eq. (51), but with B_2(q^{-1}) altered to

$$B_2(q^{-1}) = b_{20} + b_{21} q^{-1} + b_{22} q^{-2} + b_{23} q^{-3}. \qquad (57)$$

As the main purpose of the simulations was to verify the theory, no attempt was made to find the model order and model structure from the data. The model order can, however, be found by ordinary use of one of the several available subspace identification methods [7], and a systematic method for finding the structure is presented in Ref. [10]. For the OEP and OEC models, no attempt was made to force F_1(q^{-1}) and F_2(q^{-1}) to be identical, which they theoretically should be.

F q1 and F q2 to be identical, which they the- oretically should be.

As a basis for comparisons given a specific experimental condition, each model was identified and validated in M = 100 Monte Carlo runs using independent data sets. In order to limit the influence of local minima problems, each identification and validation given a specific data set was repeated J = 5 times with randomized initial B parameters (b_{il,j+1} = b_{il,j} (1 + 0.5 e), with e as a normal random variable with zero mean and variance 1).

The mean RMSE values and RMSE standard de- viations for Ns10 000, rvs0.1, r22s0.01 and varyingr11values are given in Table 1, showing an obvious agreement between results based on simula- tion and theory. Table 1 also includes theoretical

OEP T

RMSE values Var

( Ž

y1,k<ky1

.

s

(

C P1 C1qr11

Table 1

Validation RMSE mean values and standard deviations and theo-

Ž .

retical mean values for OE models multiplied with 10000 r11 OEP OEPth. OEC OECth.

y8

10 177"5 177 173"6 173

y6

10 177"5 177 173"5 173

y4

10 204"6 203 200"5 200

(11)

Fig. 2. Segment of validation responses for the OEP model Eq.Ž Ž51 using both.. uandy2as inputs dashed, RMSEŽ s0.0273 and.

Ž w x

an OE model using onlyuas input nnOEUs0,3,0,0,3,1 , dotted, RMSEs0.0906 . The experimental conditions are given by. rvs 1,r11s0.0001,r22s0.01 and Ns200, and the ideal validation response is shown by solid line.

and

RMSE^{OEC} = √(Var(ỹ_{1,k|k})) = √(C₁P^{OEC}C₁ᵀ + r₁₁),

with P^{OEP} and P^{OEC} computed according to Eqs. (8) and (20). The results in Table 1 were obtained from N = 10 000 samples. To indicate expected results for a more realistic number of samples, and at the same time to visualize the degree of model misfit behind the RMSE values in Table 1, specific validation responses for models based on N = 200 samples are shown in Fig. 2. Fig. 2 also gives a representative picture of the improvement achieved by including y₂ as an input signal in addition to u.
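The theoretical RMSE expressions above are straightforward to evaluate once the error covariance is available. A minimal numerical sketch, using hypothetical values of C₁, P and r₁₁ (not those behind Table 1):

```python
import numpy as np

def theoretical_rmse(C1, P, r11):
    """Theoretical output RMSE sqrt(C1 P C1' + r11), given an estimation
    error covariance P and the y1 measurement noise variance r11."""
    C1 = np.asarray(C1, dtype=float).ravel()
    return float(np.sqrt(C1 @ P @ C1 + r11))

# Hypothetical numbers for illustration only:
C1 = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
P = 1e-4 * np.eye(6)
rmse = theoretical_rmse(C1, P, r11=1e-4)   # sqrt(3e-4 + 1e-4)
```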

5.2. Example 2: dynamic system PCR and PLSR solutions

For simulations of the dynamical DPCR and DPLSR solutions in Section 4, three independent filtered white noise sequences were generated. The following continuous system of three independent second-order systems was used as a starting point (with a = −1):

    ⎡a 0 0 1 0 0⎤     ⎡0 ⎤
    ⎢0 a 0 0 1 0⎥     ⎢0 ⎥
    ⎢0 0 a 0 0 1⎥     ⎢0 ⎥
ẋ = ⎢0 0 0 a 0 0⎥ x + ⎢v₁⎥
    ⎢0 0 0 0 a 0⎥     ⎢v₂⎥
    ⎣0 0 0 0 0 a⎦     ⎣v₃⎦

y₁ = [1 1 1 0 0 0]x + w₁
y₂ = [L₂₁ 0]x + w₂.   (58)

Here, L₂₁ was a 200 × 3 matrix with uniformly distributed random parameters in the interval (0, 1). The system was discretized assuming zero-order-hold elements on the v inputs and a sampling interval T = 0.1. The system was then simulated with v, w₁ and w₂ as independent and normally distributed white noise sequences with zero mean. The R_v and R₂₂ covariance matrices were diagonal, with uniformly distributed random parameters in the intervals (0, 1) and (0, r₂₂), respectively, while the y₁ variance was r₁₁ = 0.0001. Different values of r₂₂ were used as described below.
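The setup and simulation of this example system can be sketched in numpy along the following lines. This is a sketch under the stated assumptions, not the authors' code: the zero-order-hold discretization uses the standard augmented-matrix-exponential construction, with a simple Taylor-series matrix exponential that is adequate here because the sampled matrix has small norm.

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Matrix exponential by truncated Taylor series (adequate here,
    since ||M T|| is small for T = 0.1)."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

a, T = -1.0, 0.1
# Continuous-time model (58): three independent second-order systems,
# dx/dt = A x + G v, with the noise v entering the last three states.
A = np.block([[a * np.eye(3), np.eye(3)],
              [np.zeros((3, 3)), a * np.eye(3)]])
G = np.vstack([np.zeros((3, 3)), np.eye(3)])

# Zero-order-hold discretization via the augmented matrix exponential:
# expm([[A, G], [0, 0]] T) = [[Ad, Gd], [0, I]].
M = np.zeros((9, 9))
M[:6, :6], M[:6, 6:] = A * T, G * T
Md = expm_taylor(M)
Ad, Gd = Md[:6, :6], Md[:6, 6:]

# Outputs: y1 sums the three first states; y2 uses a random 200x3
# loading matrix L21 with parameters uniform in (0, 1).
rng = np.random.default_rng(1)
C1 = np.hstack([np.ones(3), np.zeros(3)])
C2 = np.hstack([rng.uniform(0.0, 1.0, (200, 3)), np.zeros((200, 3))])

# Simulate N samples with white, zero-mean Gaussian noise sequences.
N, r11, r22 = 200, 1e-4, 0.01
rv_std = np.sqrt(rng.uniform(0.0, 1.0, 3))      # diagonal Rv
r22_std = np.sqrt(rng.uniform(0.0, r22, 200))   # diagonal R22
x = np.zeros(6)
Y1, Y2 = np.zeros(N), np.zeros((N, 200))
for k in range(N):
    Y1[k] = C1 @ x + np.sqrt(r11) * rng.standard_normal()
    Y2[k] = C2 @ x + r22_std * rng.standard_normal(200)
    x = Ad @ x + Gd @ (rv_std * rng.standard_normal(3))
```

Since A = aI + N with N nilpotent, the exact discretized transition matrix is e^{aT}(I + TN), which gives a convenient check on the discretization.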

The simulations started with r₂₂ = 0.01 and N = 200. In order to find the appropriate number of components, the static PCR and PLSR estimators (40) and (46) were first determined for different numbers of components A. In addition, the dynamical DPCR and DPLSR estimates according to Eq. (48) were identified using the OE model (see Ref. [14] for the definition of nn)

nn = [0, [2, ..., 2], 0, 0, [2, ..., 2], [0, ..., 0]].   (59)

Each model was identified in M = 10 Monte Carlo runs, with validation against independent data sets with N = 200 samples. The resulting mean RMSE values are plotted in Fig. 3, where we find the optimal number of components A = 3. This is not surprising, since the system has three independent noise sources. Fig. 3 also indicates that PLSR is slightly better than PCR, and that the dynamic solutions are better than the static ones.
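For reference, a static PCR estimator truncated to A components can be sketched as follows. This is a generic SVD-based illustration with synthetic collinear data, not the paper's estimator (40); all names and dimensions are hypothetical.

```python
import numpy as np

def pcr_estimator(Y2, y1, A):
    """Principal component regression: regress y1 on the first A principal
    components of the centered secondary measurements Y2, returning b so
    that y1_hat = (Y2 - mean(Y2)) @ b + mean(y1)."""
    Y2c = Y2 - Y2.mean(axis=0)
    y1c = y1 - y1.mean()
    U, s, Vt = np.linalg.svd(Y2c, full_matrices=False)
    # Least squares fit restricted to the A-dimensional score subspace.
    return Vt[:A].T @ ((U[:, :A].T @ y1c) / s[:A])

# Toy data in the spirit of the example: collinear Y2 driven by three
# independent latent sources, so A = 3 components suffice.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))                  # three noise sources
Y2 = X @ rng.uniform(0.0, 1.0, (3, 50)) + 0.01 * rng.standard_normal((200, 50))
y1 = X.sum(axis=1) + 0.01 * rng.standard_normal(200)

b = pcr_estimator(Y2, y1, A=3)
resid = (Y2 - Y2.mean(axis=0)) @ b + y1.mean() - y1
rmse = float(np.sqrt(np.mean(resid ** 2)))
```

Because the 50 secondary variables are driven by only three sources, truncating at A = 3 removes the collinearity-induced variance inflation that a full least squares fit would suffer from.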

Fig. 3. RMSE mean values as a function of the number of components used in the PCR, PLSR, DPCR and DPLSR models for r₂₂ = 0.01, based on 10 Monte Carlo runs with N = 200 samples.
