Subspace system identification of the Kalman filter: open and closed loop systems


Dr. David Di Ruscio
Telemark university college
Department of process automation
N-3914 Porsgrunn, Norway
Fax: +47 35 57 52 50
Tel: +47 35 57 51 68
February 22, 2007

Abstract

Some proofs concerning a subspace identification algorithm are presented. It is proved that the Kalman filter gain and the noise innovations process can be identified directly from known input and output data without explicitly solving the Riccati equation. Furthermore, it is proved that, in general and for colored inputs, subspace identification of the states is only possible if the deterministic part of the system is known or identified beforehand. However, if the inputs are white, it is proved that the states can be identified directly. Some alternative projection matrices which can be used to compute the extended observability matrix directly from the data are presented. Furthermore, an efficient method for computing the deterministic part of the system is presented. The closed loop subspace identification problem is also addressed, and it is shown that this problem is solved and unbiased estimates are obtained by simply including a filter in the feedback. Finally, an algorithm for consistent closed loop subspace estimation is presented.

Keywords: Identification methods; Subspace methods; Stochastic systems; Sampled data systems; Linear systems.

1 Introduction

A complete subspace identification (SID) algorithm is discussed and derived in this paper. The derivation presented is different from the other published papers on subspace identification, Van Overschee and De Moor (1994), Larimore (1990), Viberg (1995) and Van Overschee (1995) and the references therein, because we are using general input and output matrix equations which describe the relationship between the past and the future input and output data matrices.

One of the contributions in this paper is that it is shown that the Kalman filter model matrices, including the Kalman gain and the noise innovations process, of a combined deterministic and stochastic system can be identified directly from certain projection matrices which are computed from the known input and output data, without solving any Riccati or Lyapunov matrix equations. This subspace method and these results were presented without proof in Di Ruscio (1995) and Di Ruscio (1997). One contribution in this paper is a complete derivation with proof. A new method for computing the matrices in the deterministic part of the system is presented. This method has been used in the DSR Toolbox for Matlab, Di Ruscio (1996), but has not been published earlier.

Furthermore, it is pointed out that the states, in general (i.e. for colored input signals), can only be computed if the complete deterministic part of the model is known or identified first. This is probably the reason why the state based subspace algorithms which are presented in the literature do not work properly for colored input signals. The SID algorithm in Verhagen (1994) works for colored input signals, but the stochastic part of the model is not computed by this algorithm. The N4SID algorithm in Van Overschee and De Moor (1994) works well only for white input signals. The stochastic part of the model is computed by solving a Riccati equation. However, the robust modification in Van Overschee and De Moor (1995) works well also for colored input signals.

The rest of this paper is organized as follows. Some basic matrix definitions and notations are presented in Section 2. The problem of subspace identification of the states for both colored and white input signals is discussed in Section 3.1. The subspace identification of the extended observability matrix, which possibly is the most important step in any SID algorithm, is discussed in Section 3.2. It is proved in Section 3.3 that the Kalman filter gain matrix and the noise innovations process can be identified directly from the data. A least squares optimal method for computing the deterministic part of the combined deterministic and stochastic system is presented in Section 3.4.

The problems of using subspace methods for closed loop systems, and some solutions to these problems, are pointed out in Section 4.

The main contribution in this paper is a new method for subspace system identification that works for closed loop as well as open loop systems. The method is based on the theory in Section 3 and is presented in Section 5. This method is probably one of the best for closed loop subspace system identification.

Some topics and remarks related to the algorithm are presented in Section 6. Numerical examples are provided in Section 7 in order to illustrate the behaviour of the algorithm both in open and closed loop. Some concluding remarks follow in Section 8.

2 Notation and definitions

2.1 System and matrix definitions

Consider the following state space model on innovations form

$$\bar{x}_{k+1} = A\bar{x}_k + Bu_k + Ce_k, \tag{1}$$

$$y_k = D\bar{x}_k + Eu_k + Fe_k, \tag{2}$$

where $e_k$ is white noise with covariance matrix $\mathrm{E}(e_k e_k^T) = I_m$. One of the problems addressed and discussed in this paper is to directly identify (subspace identification) the system order, $n$, the state vector $\bar{x}_k \in \mathbb{R}^n$, and the matrices $(A, B, C, D, E, F)$ from a sequence of known input and output data vectors, $u_k \in \mathbb{R}^r$ and $y_k \in \mathbb{R}^m$, respectively. A structure parameter, $g$, is introduced so that $g = 1$ when $E$ is to be identified and $g = 0$ when $E$ is a-priori known to be zero. This should be extended to a structure matrix $G$ with ones and zeroes, the ones pointing to the elements in $E$ which are to be estimated. This is not considered further here. Based on (1) and (2) we make the following definitions for further use:

Definition 2.1 (Basic matrix definitions)

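The extended observability matrix, $O_i$, for the pair $(D, A)$ is defined as

$$O_i \stackrel{\mathrm{def}}{=} \begin{bmatrix} D \\ DA \\ \vdots \\ DA^{i-1} \end{bmatrix} \in \mathbb{R}^{im \times n}, \tag{3}$$

where the subscript $i$ denotes the number of block rows.

The reversed extended controllability matrix, $C_i^d$, for the pair $(A, B)$ is defined as

$$C_i^d \stackrel{\mathrm{def}}{=} \begin{bmatrix} A^{i-1}B & A^{i-2}B & \cdots & B \end{bmatrix} \in \mathbb{R}^{n \times ir}, \tag{4}$$

where the subscript $i$ denotes the number of block columns. A reversed extended controllability matrix, $C_i^s$, for the pair $(A, C)$ is defined similarly to (4), i.e.,

$$C_i^s \stackrel{\mathrm{def}}{=} \begin{bmatrix} A^{i-1}C & A^{i-2}C & \cdots & C \end{bmatrix} \in \mathbb{R}^{n \times im}, \tag{5}$$

i.e., with $B$ substituted with $C$ in (4). The lower block triangular Toeplitz matrix, $H_i^d$, for the quadruple of matrices $(D, A, B, E)$ is defined as

$$H_i^d \stackrel{\mathrm{def}}{=} \begin{bmatrix} E & 0_{m \times r} & 0_{m \times r} & \cdots & 0_{m \times r} \\ DB & E & 0_{m \times r} & \cdots & 0_{m \times r} \\ DAB & DB & E & \cdots & 0_{m \times r} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ DA^{i-2}B & DA^{i-3}B & DA^{i-4}B & \cdots & E \end{bmatrix} \in \mathbb{R}^{im \times (i+g-1)r}, \tag{6}$$

where the subscript $i$ denotes the number of block rows, $i+g-1$ is the number of block columns, and $0_{m \times r}$ denotes the $m \times r$ matrix of zeroes.

A lower block triangular Toeplitz matrix $H_i^s$ for the quadruple $(D, A, C, F)$ is defined as

$$H_i^s \stackrel{\mathrm{def}}{=} \begin{bmatrix} F & 0_{m \times m} & 0_{m \times m} & \cdots & 0_{m \times m} \\ DC & F & 0_{m \times m} & \cdots & 0_{m \times m} \\ DAC & DC & F & \cdots & 0_{m \times m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ DA^{i-2}C & DA^{i-3}C & DA^{i-4}C & \cdots & F \end{bmatrix} \in \mathbb{R}^{im \times im}. \tag{7}$$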
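As an illustration of Definition 2.1, the following is a minimal NumPy sketch of how the matrices in (3), (4) and (6) could be built for given model matrices. The function names (`extended_observability`, `reversed_controllability`, `toeplitz_markov`) and the numerical values are assumptions made for this example only. For $g = 0$ the sketch assumes that the last block column, which is zero when $E = 0$, is dropped so that $i+g-1$ block columns remain.

```python
import numpy as np

def extended_observability(D, A, i):
    """O_i = [D; DA; ...; DA^(i-1)], shape (i*m, n), cf. (3)."""
    blocks = [D]
    for _ in range(i - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def reversed_controllability(A, B, i):
    """C_i = [A^(i-1)B, ..., AB, B], shape (n, i*r), cf. (4)."""
    blocks = [B]
    for _ in range(i - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks[::-1])

def toeplitz_markov(D, A, B, E, i, g=1):
    """Lower block triangular Toeplitz matrix, shape (i*m, (i+g-1)*r), cf. (6)."""
    m, r = D.shape[0], B.shape[1]
    # Markov parameters: h_0 = E and h_k = D A^(k-1) B for k >= 1
    markov = [E]
    AkB = B
    for _ in range(i - 1):
        markov.append(D @ AkB)
        AkB = A @ AkB
    H = np.zeros((i * m, i * r))
    for row in range(i):
        for col in range(row + 1):
            H[row * m:(row + 1) * m, col * r:(col + 1) * r] = markov[row - col]
    # Assumption: for g = 0 (E = 0) the last, all-zero block column is dropped
    return H[:, :(i + g - 1) * r]

# Illustrative second order example (values are arbitrary)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
D = np.array([[1.0, 0.0]])
E = np.array([[0.2]])
O3 = extended_observability(D, A, 3)
C3 = reversed_controllability(A, B, 3)
H3 = toeplitz_markov(D, A, B, E, 3)
```

With these matrices, three stacked outputs of the deterministic part satisfy $y_{0|3} = O_3 x_0 + H_3^d u_{0|3}$, which is the vector counterpart of the matrix equations used in Section 3.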


2.2 Hankel matrix notation

Hankel matrices are frequently used in realization theory and subspace system identification. The special structure of a Hankel matrix, as well as some matching notation which is frequently used throughout, is defined in the following.

Definition 2.2 (Hankel matrix) Given a (vector or matrix) sequence of data

$$s_t \in \mathbb{R}^{nr \times nc} \quad \forall \; t = 0, 1, 2, \ldots, t_0, t_0+1, \ldots, \tag{8}$$

where $nr$ is the number of rows in $s_t$ and $nc$ is the number of columns in $s_t$. Define integer numbers $t_0$, $L$ and $K$ and define the matrix $S_{t_0|L}$ as follows

$$S_{t_0|L} \stackrel{\mathrm{def}}{=} \begin{bmatrix} s_{t_0} & s_{t_0+1} & s_{t_0+2} & \cdots & s_{t_0+K-1} \\ s_{t_0+1} & s_{t_0+2} & s_{t_0+3} & \cdots & s_{t_0+K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{t_0+L-1} & s_{t_0+L} & s_{t_0+L+1} & \cdots & s_{t_0+L+K-2} \end{bmatrix} \in \mathbb{R}^{L\,nr \times K\,nc}, \tag{9}$$

which is defined as a Hankel matrix because of its special structure. The integer numbers $t_0$, $L$ and $K$ are defined as follows:

• $t_0$ is the start index or initial time in the sequence, $s_{t_0}$, which is the upper left block in the Hankel matrix.

• $L$ is the number of $nr$-block rows in $S_{t_0|L}$.

• $K$ is the number of $nc$-block columns in $S_{t_0|L}$.

The (block) elements of a Hankel matrix are constant across the anti-diagonals; a square Hankel matrix with scalar elements is also symmetric. We are usually working with vector sequences in subspace system identification, i.e., $s_t$ is a vector in this case and hence $nc = 1$. Examples of such vector processes, to be used in the above Hankel matrix definition, are the measured process outputs, $y_t \in \mathbb{R}^m$, and possibly known inputs, $u_t \in \mathbb{R}^r$. Also define

$$y_{j|i} \stackrel{\mathrm{def}}{=} \begin{bmatrix} y_j^T & y_{j+1}^T & \cdots & y_{j+i-1}^T \end{bmatrix}^T \in \mathbb{R}^{im}, \tag{10}$$

which is referred to as an extended (output) vector, for later use.
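The Hankel matrix in Definition 2.2 can be formed from a stored data sequence with a few lines of NumPy. The sketch below assumes vector data ($nc = 1$) stored row-wise in an array of shape $(N, nr)$; the function name `block_hankel` is chosen for this example only.

```python
import numpy as np

def block_hankel(s, t0, L, K):
    """Return S_{t0|L} with L block rows and K columns, cf. (9).

    s holds the samples s_0, s_1, ... row-wise, shape (N, nr) or (N,).
    """
    s = np.asarray(s, dtype=float)
    if s.ndim == 1:
        s = s[:, None]                      # scalar sequence, nr = 1
    if t0 + L + K - 1 > s.shape[0]:
        raise ValueError("not enough samples for the requested t0, L and K")
    # Block row i contains s_{t0+i}, ..., s_{t0+i+K-1} as columns
    return np.vstack([s[t0 + i:t0 + i + K].T for i in range(L)])

# Scalar sequence 0, 1, ..., 9 with t0 = 0, L = 3 and K = 4
y = np.arange(10.0)
Y = block_hankel(y, 0, 3, 4)
# Y = [[0, 1, 2, 3],
#      [1, 2, 3, 4],
#      [2, 3, 4, 5]]
```

Each column of the result is an extended vector as in (10), e.g. the first column of `Y` is $y_{0|3}$.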

2.3 Projections

Given two matrices $A \in \mathbb{R}^{i \times k}$ and $B \in \mathbb{R}^{j \times k}$, the orthogonal projection of the row space of $A$ onto the row space of $B$ is defined as

$$A/B = AB^T(BB^T)^{\dagger}B. \tag{11}$$

The orthogonal projection of the row space of $A$ onto the orthogonal complement of the row space of $B$ is defined as

$$AB^{\perp} = A - A/B = A - AB^T(BB^T)^{\dagger}B. \tag{12}$$

The following properties are frequently used:

$$A/\begin{bmatrix} A \\ B \end{bmatrix} = A, \tag{13}$$

$$A/\begin{bmatrix} A \\ B \end{bmatrix}^{\perp} = 0. \tag{14}$$

Proofs of (13) and (14) can be found in, e.g., Di Ruscio (1997b). The Moore-Penrose pseudo-inverse of a matrix $A \in \mathbb{R}^{i \times k}$, where $k > i$ and $A$ has full row rank, is given by $A^{\dagger} = A^T(AA^T)^{-1}$. Furthermore, consistent with (12), we will use the definition

$$B^{\perp} = I_k - B^T(BB^T)^{\dagger}B \tag{15}$$

throughout the paper. Note also the properties $(B^{\perp})^T = B^{\perp}$ and $B^{\perp}B^{\perp} = B^{\perp}$.
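The projections (11), (12) and (15) translate directly into NumPy. The sketch below is only a numerical illustration on random matrices; the helper names `proj` and `proj_perp` are assumptions for this example. It checks property (13) and the relation $A[A; B]^{\perp} = A - A/[A; B] = 0$, which follows from (12) and (13).

```python
import numpy as np

def proj(A, B):
    """Orthogonal projection A/B = A B^T (B B^T)^† B, cf. (11)."""
    return A @ B.T @ np.linalg.pinv(B @ B.T) @ B

def proj_perp(A, B):
    """Projection onto the orthogonal complement, A B^⊥ = A - A/B, cf. (12)."""
    return A - proj(A, B)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 50))
B = rng.standard_normal((3, 50))
AB = np.vstack([A, B])                      # stacked matrix [A; B]

residual_13 = np.abs(proj(A, AB) - A).max()     # should be close to zero
residual_perp = np.abs(proj_perp(A, AB)).max()  # should be close to zero

# B^⊥ as in (15) is symmetric and idempotent
B_perp = np.eye(B.shape[1]) - B.T @ np.linalg.pinv(B @ B.T) @ B
```

In an implementation one would normally avoid forming the $k \times k$ matrix $B^{\perp}$ explicitly and instead use an LQ (QR) decomposition of the stacked data matrices; the explicit form is used here only because it mirrors the definitions.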

3 Subspace system identification

3.1 Subspace identification of the states

Consider a discrete time Kalman filter on innovations form, i.e.,

$$\bar{x}_{k+1} = A\bar{x}_k + Bu_k + K\varepsilon_k, \tag{16}$$

$$y_k = D\bar{x}_k + Eu_k + \varepsilon_k, \tag{17}$$

where $\bar{x}_k \in \mathbb{R}^n$ is the predicted state in a minimum variance sense and $\varepsilon_k \in \mathbb{R}^m$ is the innovations at discrete time $k$, i.e., the part of $y_k \in \mathbb{R}^m$ that cannot be predicted from past data (i.e. known past inputs and outputs) and the present input. Furthermore, $\bar{y}_k = D\bar{x}_k + Eu_k$ is the prediction of $y_k$, and $\varepsilon_k$ is white noise with covariance matrix $\Delta = \mathrm{E}(\varepsilon_k\varepsilon_k^T)$. Here $\varepsilon_k = Fe_k$ is the innovations, and the model (1) and (2) is therefore equivalent to the Kalman filter (16) and (17). Furthermore, we have that $K = CF^{-1}$ and $\Delta = \mathrm{E}(\varepsilon_k\varepsilon_k^T) = FF^T$ when $F$ is non-singular, i.e., when the system is not deterministic and when the Kalman filter exists.

A well known belief is that the states are a function of the past. Let us have a look at this statement. The predicted state at time $k := t_0 + J$, i.e. $\bar{x}_{t_0+J}$, of a Kalman filter with the initial predicted state at $k := t_0$, i.e. $\bar{x}_{t_0}$, given, can be expressed as

$$\bar{x}_{t_0+J} = \tilde{C}_J^s y_{t_0|J} + \tilde{C}_J^d u_{t_0|J} + (A-KD)^J \bar{x}_{t_0}, \tag{18}$$

where $\tilde{C}_J^s = C_J(A-KD, K)$ is the reversed extended controllability matrix of the pair $(A-KD, K)$, $\tilde{C}_J^d = C_J(A-KD, B-KE)$ is the reversed extended controllability matrix of the pair $(A-KD, B-KE)$, and $\bar{x}_{t_0}$ is the initial predicted state (estimate) at the initial discrete time $t_0$. See (5) for the definition of the reversed controllability matrix. $J$ is the past horizon, i.e., the number of past outputs and inputs used to define the predicted state (estimate) $\bar{x}_{t_0+J}$ at the discrete time $t_0 + J$.
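Equation (18) follows from rewriting (16) and (17) as $\bar{x}_{k+1} = (A-KD)\bar{x}_k + (B-KE)u_k + Ky_k$ and iterating $J$ times. The following NumPy sketch checks this identity numerically on a small simulated example. All numerical values, including the gain `Kgain`, are arbitrary illustrative assumptions (the gain is not computed from a Riccati equation), and `rev_ctrb` is a helper name introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
D = np.array([[1.0, 0.0]])
E = np.array([[0.1]])
Kgain = np.array([[0.3], [0.1]])   # illustrative gain, assumed for the example

J, t0 = 5, 2
N = t0 + J
u = rng.standard_normal((N, 1))
eps = rng.standard_normal((N, 1))

# Simulate the innovations form (16)-(17)
x = np.zeros((N + 1, 2))
x[0] = [1.0, -1.0]
y = np.zeros((N, 1))
for k in range(N):
    y[k] = D @ x[k] + E @ u[k] + eps[k]
    x[k + 1] = A @ x[k] + B @ u[k] + Kgain @ eps[k]

def rev_ctrb(Ac, Bc, i):
    """Reversed extended controllability matrix [Ac^(i-1)Bc, ..., Bc]."""
    blocks = [Bc]
    for _ in range(i - 1):
        blocks.append(Ac @ blocks[-1])
    return np.hstack(blocks[::-1])

Ak = A - Kgain @ D
Cs = rev_ctrb(Ak, Kgain, J)            # \tilde{C}_J^s
Cd = rev_ctrb(Ak, B - Kgain @ E, J)    # \tilde{C}_J^d
y_ext = y[t0:t0 + J].reshape(-1)       # extended vector y_{t0|J}
u_ext = u[t0:t0 + J].reshape(-1)       # extended vector u_{t0|J}

# Right hand side of (18)
x_pred = Cs @ y_ext + Cd @ u_ext + np.linalg.matrix_power(Ak, J) @ x[t0]
```

Here `x_pred` agrees with the simulated state `x[t0 + J]` up to rounding errors, and the contribution of the initial state shrinks with $J$ whenever $A-KD$ is stable.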

Using (18) for different $t_0$, i.e. for $t_0, t_0+1, t_0+2, \ldots, t_0+K-1$, gives the matrix equation

$$X_{t_0+J} = \tilde{C}_J^s Y_{t_0|J} + \tilde{C}_J^d U_{t_0|J} + (A-KD)^J X_{t_0}, \tag{19}$$

where

$$X_{t_0+J} = \begin{bmatrix} \bar{x}_{t_0+J} & \bar{x}_{t_0+J+1} & \cdots & \bar{x}_{t_0+J+K-1} \end{bmatrix} \in \mathbb{R}^{n \times K}, \tag{20}$$

$$X_{t_0} = \begin{bmatrix} \bar{x}_{t_0} & \bar{x}_{t_0+1} & \cdots & \bar{x}_{t_0+K-1} \end{bmatrix} \in \mathbb{R}^{n \times K}, \tag{21}$$

and $K$ is the number of columns in the data matrices. Note that $K$ is also equal to the number of vector equations of the form (18) which are used to form the matrix version (19). Note also that the state matrix $X_{t_0}$ can be eliminated from (19) by using the relationship

$$Y_{t_0|J} = O_J X_{t_0} + H_J^d U_{t_0|J+g-1} + H_J^s E_{t_0|J}, \tag{22}$$

which we have deduced from the innovations form state space model (1) and (2). Putting $t_0 := t_0 + J$ in (22), with $L$ block rows, gives

$$Y_{t_0+J|L} = O_L X_{t_0+J} + H_L^d U_{t_0+J|L+g-1} + H_L^s E_{t_0+J|L}. \tag{23}$$

The data is usually defined at time instants (or numbers of observations) $k = 1, 2, \ldots, N$. Hence, $t_0 = 1$ in this case. However, we often define $t_0 = 0$, which corresponds to data defined at $k = 0, 1, \ldots, N-1$. The bar used to indicate the predicted state is often omitted. Hence, for simplicity of notation, we define the following equations from (19), (22) and (23),

$$Y_{0|J} = O_J X_0 + H_J^d U_{0|J+g-1} + H_J^s E_{0|J}, \tag{24}$$

$$X_J = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J} + (A-KD)^J X_0, \tag{25}$$

$$Y_{J|L} = O_L X_J + H_L^d U_{J|L+g-1} + H_L^s E_{J|L}. \tag{26}$$

Using (25) to eliminate $X_J$ from (26) gives

$$Y_{J|L} = \begin{bmatrix} H_L^d & O_L\tilde{C}_J^d & O_L\tilde{C}_J^s \end{bmatrix} \begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} + O_L(A-KD)^J X_0 + H_L^s E_{J|L}. \tag{27}$$

Equation (27) is important for understanding a SID algorithm because it gives the relationship between the past and the future. Note also the terms in (27) which are "proportional" to the extended observability matrix $O_L$. From (27) we see that the effect of the future inputs, $U_{J|L+g-1}$, and the future noise, $E_{J|L}$, has to be removed from the future outputs, $Y_{J|L}$, in order to recover the subspace spanned by the extended observability matrix, $O_L$. A variation of this equation, in which the term $X_0$ is eliminated by using (22) or (24), is presented in Di Ruscio (1997b). Note also that (25) and (24) give

$$X_J = \begin{bmatrix} P_J^u & P_J^y \end{bmatrix} \begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix} - P_J^e E_{0|J}, \tag{28}$$

with

$$P_J^u = \tilde{C}_J^d - (A-KD)^J O_J^{\dagger} H_J^d, \tag{29}$$

$$P_J^y = \tilde{C}_J^s + (A-KD)^J O_J^{\dagger}, \tag{30}$$

$$P_J^e = (A-KD)^J O_J^{\dagger} H_J^s, \tag{31}$$

where we, for the sake of simplicity and without loss of generality, have put $g = 1$. Equation (28) is useful because it shows that the future states $X_J$ are in the range of a matrix consisting of past inputs, $U_{0|J}$, and past outputs, $Y_{0|J}$ (in the deterministic case or when $J \to \infty$). Note that we have introduced the notation $P_J^u$ in order to represent the influence from the past inputs upon the future. Combining (28) and (26) gives an alternative to (27), i.e. the "past-future" matrix equation,

$$Y_{J|L} = \begin{bmatrix} H_L^d & O_L P_J^u & O_L P_J^y \end{bmatrix} \begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} - O_L P_J^e E_{0|J} + H_L^s E_{J|L}. \tag{32}$$

The last two terms in (32) cannot be predicted from data, because $E_{0|J}$ and $E_{J|L}$ are built from the innovations process $e_k$.

It is important to note that a consistent estimate of the system dynamics can be obtained by choosing $L$ and $N$ properly. Choosing $L_{\min} \le L$, where $L_{\min} = n + \mathrm{rank}(D) - 1$, and letting $N \to \infty$ are, in general, necessary conditions for a consistent estimate of the dynamics. See Section 3.2 for further details.

On the other hand, it is in general also necessary to let $J \to \infty$ in order to obtain a consistent estimate of the states. The reason for this is that the term $(A-KD)^J \to 0$ in this case, provided $A-KD$ is stable. Hence, the effect of the initial state matrix $X_0$ on the future states $X_J$ has died out. We have the following lemma.

Lemma 3.1 (Subspace identification of the states)

Let $K \to \infty$ in the data matrices. The projected state matrix is defined as

$$X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = O_L^{\dagger}\Big( \overbrace{Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}}^{Z^d_{J|L}} - H_L^d U_{J|L+g-1} \Big) = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J} + (A-KD)^J X_0/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{33}$$

Consider the case when

$$(A-KD)^J X_0/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = 0, \tag{34}$$

which is satisfied when $J \to \infty$ and $(A-KD)$ is stable. This gives

$$X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = X_J, \tag{35}$$

and hence we have, in general, the following expression for the future states

$$X_J = O_L^{\dagger}\Big( \overbrace{Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}}^{Z^d_{J|L}} - H_L^d U_{J|L+g-1} \Big). \tag{36}$$

△

Proof 3.1 The proof is divided into two parts.

Part 1

The relationship between the future data matrices is given by

$$Y_{J|L} = O_L X_J + H_L^d U_{J|L+g-1} + H_L^s E_{J|L}. \tag{37}$$

Projecting the row space of each term in (37) onto the row space of $\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}$ gives

$$Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = O_L X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} + H_L^d U_{J|L+g-1} + dE_1, \tag{38}$$

where the error term is given by

$$dE_1 = H_L^s E_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{39}$$

It makes sense to assume that the future noise matrix $E_{J|L}$ is uncorrelated with past data and the future inputs; hence, we have that (w.p.1)

$$\lim_{K \to \infty} dE_1 = 0. \tag{40}$$

Part 2

Equation (25) gives the relationship between the future state matrix $X_J$ and the past data matrices. Projecting the row space of each term in this equation onto the row space of $\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}$ gives

$$X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J} + (A-KD)^J X_0/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{41}$$

Letting $J \to \infty$ (or assuming the last term to be zero) gives

$$X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J}. \tag{42}$$

Letting $J \to \infty$ and assuming the system matrix $(A-KD)$ for the predicted outputs to be stable in (25) shows that

$$X_J = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J}. \tag{43}$$

Comparing (42) and (43) gives

$$X_J = X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{44}$$

Using (44) in (38) and solving for $X_J$ gives (36). □

The condition in (35) is usually satisfied for large $J$, i.e., we have that $\lim_{J \to \infty}(A-KD)^J = 0$ when $A-KD$ is stable. Note also that the eigenvalues of $A-KD$ usually are close to zero for “large” process noise (or “small” measurement noise). Then, $(A-KD)^J$ is approximately zero even for relatively small numbers $J$. We will now discuss some special cases.

Lemma 3.2 (SID of states: white input)

Consider a combined deterministic and stochastic system excited with a white input signal. Then

$$X_J = O_L^{\dagger}\Big( Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} \Big) U_{J|L+g-1}^{\perp} \tag{45}$$

when $J \to \infty$.

Proof 3.2 This result follows from the proof of Lemma 3.1 and (36), using that

$$X_J U_{J|L+g-1}^{\perp} = X_J \tag{46}$$

when $u_k$ is white and, hence, $X_0/U_{J|L+g-1} = 0$. □


Lemma 3.3 (SID of states: pure stochastic system)
Consider a stochastic system. Then we simply have that

$$X_J = O_L^{\dagger} Y_{J|L}/Y_{0|J} \tag{47}$$

when $J \to \infty$ or when $(A-KD)^J X_0/Y_{0|J} = 0$ is satisfied.

Proof 3.3 This result follows from the proof of Lemma 3.1 by putting the measured input variables equal to zero. □

Lemma 3.1 shows that it is in general (i.e. for colored input signals) necessary to know the deterministic part of the system, i.e., the Toeplitz matrix $H_L^d$ in (36), in order to properly identify the states. This means that the matrices $B$ and $E$, in addition to $D$ and $A$, have to be identified prior to computing the states, i.e., we need to know the deterministic part of the model. However, a special case is given by Lemma 3.2 and Equation (45), which show that the states can be identified directly when the input signals are white. Note also that the extended observability matrix $O_L$ is needed in (36) and (45). $O_L$ can be identified directly from the data. This is proved in Section 3.2, and this is indeed the natural first step in a SID algorithm.

In the case of a white input signal, or when $J \to \infty$, $H_L^d$ and the state matrix $X_J$ can be computed as in the N4SID algorithm, Van Overschee and De Moor (1996). From (32) and (28) we have the following lemma.

Lemma 3.4 (States, $X_J$, and Toeplitz matrix $H_L^d$: N4SID)
The following LS solution

$$\begin{bmatrix} H_L^d & O_L P_J^u & O_L P_J^y \end{bmatrix} = Y_{J|L} \begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\dagger} + dE \tag{48}$$

holds in:

i) The deterministic case, provided the input is PE of order $J+L+g-1$. The error term, $dE = 0$, in this case.

ii) When $J \to \infty$, and the input is PE of infinite order. The error term, $dE = 0$, in this case.

iii) A white $u_k$ gives a consistent estimate of $H_L^d$ irrespective of $J > 0$. However, $O_L P_J^u$ and $O_L P_J^y$ are not consistent estimates in this case. The first $mL \times (L+g)r$ part of the error term, $dE$, is zero in this case.

Hence, under condition i) or ii), $O_L P_J^u$ and $O_L P_J^y$ can be computed as in (48). Then the states can be consistently estimated as

$$X_J = O_L^{\dagger} \begin{bmatrix} O_L P_J^u & O_L P_J^y \end{bmatrix} \begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix}, \tag{49}$$

provided condition i) or ii) is satisfied, and $O_L^{\dagger}$ is known.


Proof 3.4 The PE conditions in the lemma are due to the existence of the LS solution, i.e., the concatenated matrix

$$\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \end{bmatrix}$$

has to be of full row rank.

From (32) we have that the error term in the LS problem is

$$dE = (-O_L P_J^e E_{0|J} + H_L^s E_{J|L}) \begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\dagger} = -O_L P_J^e E_{0|J} \begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\dagger}, \tag{50}$$

where the second equality holds because the future noise $E_{J|L}$ is uncorrelated with the past data and the future inputs, cf. (39) and (40).

It is clear from (31) that the error term $dE = 0$ when $J \to \infty$. This proves condition ii) in the lemma. Furthermore, the error term $dE = 0$ in the deterministic case because $E_{0|J} = 0$ in this case. This proves condition i). Analyzing the error term, $dE$, for a white input shows that the error term is of the form

$$dE = \begin{bmatrix} 0_{mL \times (L+g)r} & dE_2 & dE_3 \end{bmatrix}, \tag{51}$$

where $dE_2$ and $dE_3$ are submatrices in $dE$ different from zero. Note that $dE_2 = 0$ for strictly proper systems, $g = 0$, when $u_k$ is white. This proves condition iii).

The states can then be computed by using (28) or (43), provided condition i) or ii) is satisfied. □

One should note that in the N4SID algorithm the past horizon is put equal to the future horizon (the N4SID parameter $i$). In order for the above lemma to give the same results as the N4SID algorithm we have to put $i = L+1$, $J = L+1$ and $g = 1$, i.e. so that $J + L = 2L+1 = 2i$. Note that this last result does not hold in general. It holds in the deterministic case or when $J \to \infty$. The extended observability matrix $O_L$ can be computed as presented in the next section.

3.2 The extended observability matrix

An important first step in the SID algorithm is the identification of the system order, $n$, and the extended observability matrix $O_{L+1}$. The reason for searching for $O_{L+1}$ is that we have to define $A$ from the shift invariance property, Kung (1978), or a similar method, e.g. as in Di Ruscio (1995). The key is to compute a special projection matrix from the known data. This is done without using the states. We will in this section show how this can be done for colored input signals.
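The shift invariance property mentioned above states that the last $L$ block rows of $O_{L+1}$ equal $O_L A$, so that $A$ can be found as a least squares solution once $O_{L+1}$ is known. The NumPy sketch below illustrates this step only; the function name `shift_invariance_A` and the numerical example are assumptions made here. Since an identified $O_{L+1}$ is only known up to a similarity transformation, the example also checks that the eigenvalues of $A$ are preserved under such a transformation.

```python
import numpy as np

def shift_invariance_A(O_ext, m):
    """Estimate A from O_{L+1} using O_L A = O_{L+1}(m+1:end, :)."""
    O_upper = O_ext[:-m, :]    # O_L: the first L block rows
    O_lower = O_ext[m:, :]     # the last L block rows, equal to O_L A
    return np.linalg.pinv(O_upper) @ O_lower

# Illustrative system with m = 1 output and n = 2 states
A = np.array([[0.8, 0.2], [0.0, 0.5]])
D = np.array([[1.0, 0.0]])
O4 = np.vstack([D @ np.linalg.matrix_power(A, k) for k in range(4)])

A_hat = shift_invariance_A(O4, 1)          # recovers A in the original basis

# The same observability matrix in another state basis, O4 T
T = np.array([[2.0, 1.0], [1.0, 1.0]])
A_T = shift_invariance_A(O4 @ T, 1)        # equals T^{-1} A T
eig_T = np.sort(np.linalg.eigvals(A_T).real)
```

The matrix $D$ is, in the same basis, given by the first $m$ rows of the observability matrix.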

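Lemma 3.5 (SID of the extended observability matrix)
The following projections are equivalent

$$Z_{J|L+1} = \Big( Y_{J|L+1}/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} \Big) U_{J|L+g}^{\perp}, \tag{52}$$

$$Z_{J|L+1} = (Y_{J|L+1} U_{J|L+g}^{\perp})/\Big( \begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix} U_{J|L+g}^{\perp} \Big), \tag{53}$$

$$Z_{J|L+1} = Y_{J|L+1}/\Big( \begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix} U_{J|L+g}^{\perp} \Big). \tag{54}$$

Furthermore, $Z_{J|L+1}$ is related to the extended observability matrix $O_{L+1}$ as

$$Z_{J|L+1} = O_{L+1} X_J^a, \tag{55}$$

where the “projected states” $X_J^a$ can be expressed as

$$X_J^a = \Big( X_J/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} \Big) U_{J|L+g}^{\perp} \tag{56}$$

$$= \Big( \tilde{C}_J^d U_{0|J} + \tilde{C}_J^s Y_{0|J} + (A-KD)^J X_0/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} \Big) U_{J|L+g}^{\perp} \tag{57}$$

$$= \Big( X_J - (A-KD)^J X_0 \begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\perp} \Big) U_{J|L+g}^{\perp} \tag{58}$$

$$= \Big( X_J + (A-KD)^J O_J^{\dagger} H_J^s E_{0|J} \begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\perp} \Big) U_{J|L+g}^{\perp}. \tag{59}$$

Furthermore, the column space of $Z_{J|L+1}$ coincides with the column space of $O_{L+1}$, and $n = \mathrm{rank}(Z_{J|L+1})$, if $\mathrm{rank}(X_J^a) = n$.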

Proof 3.5 The proof is divided into two parts. In the first part (52) and (55), with the alternative expressions in (56) to (58), are proved. In the second part the equivalence between (52), (53) and (54) is proved.

Part 1 Projecting the row space of each term in (26), with $L := L+1$, onto the row space of $\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}$ gives

$$Y_{J|L+1}/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = O_{L+1} X_J/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} + H_{L+1}^d U_{J|L+g} + dE_1, \tag{60}$$

where we have used (13). Then, w.p.1,

$$\lim_{K \to \infty} dE_1 = 0, \tag{61}$$

where the error term, $dE_1$, is given by (39) with $L := L+1$. Removing the effect of the future input matrix, $U_{J|L+g}$, on (60) gives (52) and (55) with $X_J^a$ as in (56).

Furthermore, projecting the row space of each term in (25) onto the row space of $\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}$ gives

$$X_J/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = \tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J} + (A-KD)^J X_0/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{62}$$

From (25) we have that

$$\tilde{C}_J^s Y_{0|J} + \tilde{C}_J^d U_{0|J} = X_J - (A-KD)^J X_0. \tag{63}$$

Combining (60), (62) and (63) gives (52) and (57)-(58).

Part 2 It is proved in Di Ruscio (1997) that

$$Z_{J|L+1} = Y_{J|L+1}/\begin{bmatrix} U_{J|L+g} \\ W \end{bmatrix} U_{J|L+g}^{\perp} = Y_{J|L+1} U_{J|L+g}^{\perp} W^T (W U_{J|L+g}^{\perp} W^T)^{-1} W U_{J|L+g}^{\perp}, \tag{64}$$

where

$$W = \begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{65}$$

Using that $U_{J|L+g}^{\perp} U_{J|L+g}^{\perp} = U_{J|L+g}^{\perp}$ in (64) proves the equivalence between (53), (54) and (52). □

Lemma 3.6 (Consistency: Stochastic and deterministic systems)
Let $J \to \infty$, then

$$Z_{J|L+1} = O_{L+1} X_J U_{J|L+g}^{\perp}, \tag{66}$$

where $Z_{J|L+1}$ is defined as in Lemma 3.5. A sufficient condition for consistency, and for $O_{L+1}$ to be contained in the column space of $Z_{J|L+1}$, is that there is no pure state feedback.

Proof 3.6 Letting $J \to \infty$ in (58) gives (66). This can also be proved by using (44) in (56). Furthermore, if there is pure state feedback then $X_J U_{J|L+g}^{\perp}$ will lose rank below the normal rank, which is $n$. □

Lemma 3.7 (Deterministic systems)

For pure deterministic systems we have that (66) can be changed to

$$Z_{J|L+1} =: Y_{J|L+1} U_{J|L+g}^{\perp} = O_{L+1} X_J U_{J|L+g}^{\perp}. \tag{67}$$

The extended observability matrix $O_{L+1}$ can be computed from the column space of $Y_{J|L+1} U_{J|L+g}^{\perp}$. Furthermore, one can let $J = 0$ in the deterministic case.

Proof 3.7 This follows from (66) and Lemma 3.5 by excluding the projection which removes the noise. □
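Lemma 3.7 can be illustrated numerically. The sketch below simulates a noise-free second order system with $g = 1$ and $J = 0$, forms $Y_{0|L+1} U_{0|L+1}^{\perp}$, and reads the system order from the singular values and the eigenvalues of $A$ from the column space via the shift invariance property. It is a minimal sketch under these assumptions, not the DSR toolbox implementation; the system matrices, the white test input and the helper `hankel` are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
D = np.array([[1.0, 0.0]])
E = np.array([[0.1]])
m, N, L = 1, 200, 3

# Simulate the deterministic system (no noise), zero initial state
u = rng.standard_normal((N, 1))
x = np.zeros(2)
y = np.zeros((N, 1))
for k in range(N):
    y[k] = D @ x + E @ u[k]
    x = A @ x + B @ u[k]

def hankel(s, t0, nrows, ncols):
    """Block Hankel matrix with nrows block rows, starting at sample t0."""
    return np.vstack([s[t0 + i:t0 + i + ncols].T for i in range(nrows)])

Kcols = N - L                        # number of columns; uses samples 0..N-1
Y = hankel(y, 0, L + 1, Kcols)       # Y_{0|L+1}
U = hankel(u, 0, L + 1, Kcols)       # U_{0|L+g} with g = 1

# Z = Y U^⊥ = Y - Y/U, cf. (12) and (67)
Z = Y - (Y @ np.linalg.pinv(U)) @ U

# Order from the numerical rank, O_{L+1} from the column space of Z
Uz, sv, _ = np.linalg.svd(Z, full_matrices=False)
n_hat = int(np.sum(sv > 1e-8 * sv[0]))
O_hat = Uz[:, :n_hat]

# A (in some state basis) from the shift invariance property
A_hat = np.linalg.pinv(O_hat[:-m]) @ O_hat[m:]
eig_hat = np.sort(np.linalg.eigvals(A_hat).real)
```

For this example `n_hat` equals 2 and `eig_hat` matches the eigenvalues 0.5 and 0.8 of $A$ up to rounding errors, since no noise is present.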

Lemma 3.8 (Stochastic systems)

For pure stochastic systems we have that (66) can be changed to

$$Z_{J|L+1} =: Y_{J|L+1}/Y_{0|J} = O_{L+1} X_J. \tag{68}$$

The extended observability matrix $O_{L+1}$ can be computed from the column space of $Y_{J|L+1}/Y_{0|J}$.

Proof 3.8 This follows from (66) and Lemma 3.5 by excluding the input matrices from the equations and definitions. □


3.3 Identification of the stochastic subsystem

We will in this section prove that, when the extended observability matrix is known (from Section 3.2), the Kalman filter gain matrix can be identified directly from the data. Furthermore, it is proved that the noise innovations process can be identified directly in a first step in the DSR subspace algorithm. This result was first presented in Di Ruscio (1995) without proof. Some results concerning this are also presented in Di Ruscio (2001) and (2003).

Lemma 3.9 (The innovations)

Define the following projection from the data

$$Z_{J|L+1}^s = Y_{J|L+1} - Y_{J|L+1}/\begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = Y_{J|L+1} \begin{bmatrix} U_{J|L+g} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}^{\perp}. \tag{69}$$

Then, w.p.1 as $J \to \infty$,

$$Z_{J|L+1}^s = H_{L+1}^s E_{J|L+1}. \tag{70}$$

Hence, the Toeplitz matrix $H_{L+1}^s$ (with Markov matrices $F, DC, \ldots, DA^{L-1}C$) for the stochastic subsystem is in the column space of $\frac{1}{\sqrt{K}} Z_{J|L+1}^s$, since $\frac{1}{K} E_{J|L+1} E_{J|L+1}^T = I_{(L+1)m \times (L+1)m}$.

Proof 3.9 The relationship between the future data matrices is given by

$$Y_{J|L} = O_L X_J + H_L^d U_{J|L+g-1} + H_L^s E_{J|L}. \tag{71}$$

Projecting the row space of each term in (71) onto the row space of $\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}$ gives

$$Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = O_L X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} + H_L^d U_{J|L+g-1} + dE_1, \tag{72}$$

then, w.p.1,

$$\lim_{K \to \infty} dE_1 = 0, \tag{73}$$

where $dE_1$ is given in (39). Furthermore,

$$\lim_{J \to \infty} X_J/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = X_J, \tag{74}$$

where we have used Equations (44) and (38). From (71), (72) and (74) we have that

$$Y_{J|L} - Y_{J|L}/\begin{bmatrix} U_{J|L+g-1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix} = H_L^s E_{J|L}. \tag{75}$$

Putting $L := L+1$ in (75) and using the definition (69) gives (70). □


Note the following from Lemma 3.9. The innovations can be identified directly as, for $g = 1$,

$$Z_{J|1}^s = F E_{J|1} = Y_{J|1} - Y_{J|1}/\begin{bmatrix} U_{J|1} \\ U_{0|J} \\ Y_{0|J} \end{bmatrix}, \tag{76}$$

or, for $g = 0$ when $E = 0$,

$$Z_{J|1}^s = F E_{J|1} = Y_{J|1} - Y_{J|1}/\begin{bmatrix} U_{0|J} \\ Y_{0|J} \end{bmatrix}. \tag{77}$$

One should note that (77) holds for both open and closed loop systems. For closed loop systems it makes sense to only consider systems in which the direct feed-through matrix, $E$, from the input $u_k$ to the output $y_k$ is zero. This result will be used in order to construct a subspace algorithm which gives consistent results for closed loop systems, see Section 5.
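The projection (77) can be tried out on simulated data. The sketch below generates data in open loop from an innovations form model with $E = 0$ and unit innovations variance ($F = 1$), forms the past data matrices with a finite past horizon $J$, and compares the estimated innovations with the true ones. The system, the gain `Kgain` (chosen so that $A-KD$ is stable, not computed from a Riccati equation), the horizons and the helper `hankel` are assumptions for this illustration only. With finite $J$ and $K$ the estimate is only approximate.

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
D = np.array([[1.0, 0.0]])
Kgain = np.array([[0.3], [0.1]])     # illustrative gain; A - Kgain D is stable
N, J = 5000, 10

# Simulate (16)-(17) with E = 0 in open loop
u = rng.standard_normal((N, 1))
eps = rng.standard_normal((N, 1))    # true innovations
x = np.zeros(2)
y = np.zeros((N, 1))
for k in range(N):
    y[k] = D @ x + eps[k]
    x = A @ x + B @ u[k] + Kgain @ eps[k]

def hankel(s, t0, nrows, ncols):
    """Block Hankel matrix with nrows block rows, starting at sample t0."""
    return np.vstack([s[t0 + i:t0 + i + ncols].T for i in range(nrows)])

Kcols = N - J
W = np.vstack([hankel(u, 0, J, Kcols),      # U_{0|J}
               hankel(y, 0, J, Kcols)])     # Y_{0|J}
Yf = hankel(y, J, 1, Kcols)                 # Y_{J|1}

# Z^s_{J|1} = Y_{J|1} - Y_{J|1}/[U_{0|J}; Y_{0|J}], cf. (77)
Zs = Yf - (Yf @ np.linalg.pinv(W)) @ W

eps_hat = Zs.ravel()
eps_true = eps[J:J + Kcols].ravel()
corr = np.corrcoef(eps_hat, eps_true)[0, 1]
```

For this example the estimated and true innovations are strongly correlated; the remaining difference is due to the finite number of columns and the neglected term involving $(A-KD)^J$.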

It is now possible to directly identify the matrices $C$ and $F$ in the innovations model (1) and (2), and $K$ and $\Delta$ in the Kalman filter (16) and (17). Two methods are presented in the following. The first one is a direct covariance based method for computing $K$ and $\Delta$, and the second one is a more numerically reliable “square root” based method for computing $C$ and $F$.

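Lemma 3.10 (correlation method for $K$ and $\Delta$)
Define the projection matrix $Z_{J|L+1}^s$ as in (69) and define the correlation matrix

$$\Delta_{L+1} = \frac{1}{K} Z_{J|L+1}^s (Z_{J|L+1}^s)^T = H_{L+1}^s (H_{L+1}^s)^T, \tag{78}$$

where the Toeplitz matrix $H_{L+1}^s$ can be partitioned as

$$H_{L+1}^s = \begin{bmatrix} F & 0_{m \times Lm} \\ O_L C & H_L^s \end{bmatrix}, \tag{79}$$

where $C = KF$. Hence, (78) can be written as

$$\Delta_{L+1} = \begin{bmatrix} \Delta_{11} & \Delta_{12} \\ \Delta_{21} & \Delta_{22} \end{bmatrix} = \begin{bmatrix} FF^T & F(O_L C)^T \\ O_L C F^T & O_L C (O_L C)^T + H_L^s (H_L^s)^T \end{bmatrix}. \tag{80}$$

From this we have

$$\mathrm{E}(\varepsilon_k \varepsilon_k^T) = FF^T = \Delta_{11} \tag{81}$$

and

$$K = CF^{-1} = O_L^{\dagger} \Delta_{21} \Delta_{11}^{-1}. \tag{82}$$

Lemma 3.11 (square-root method for $C$ and $F$)
The LQ decomposition of $\frac{1}{\sqrt{K}} Z_{J|L+1}^s$ gives

$$\frac{1}{\sqrt{K}} Z_{J|L+1}^s = R_{33} Q_3. \tag{83}$$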