

2.1 State-space models

A general state-space model consists of two discrete-time stochastic processes:

a latent process $\{x_t\}_{t=1}^{T}$, where $x_t \in \Omega_x \subseteq \mathbb{R}^n$ is an $n$-dimensional vector, called the state vector at time step $t$, and an observed process $\{y_t\}_{t=1}^{T}$, where $y_t \in \Omega_y \subseteq \mathbb{R}^m$ is an $m$-dimensional vector and a partial observation of $x_t$. The latent $x_t$-process, usually called the state process, is assumed to evolve in time according to a first-order Markov chain with initial distribution $p_{x_1}(x_1)$ and transition probabilities $p_{x_t|x_{t-1}}(x_t|x_{t-1})$, $t \geq 2$. The joint distribution of $x_{1:T} = (x_1, \ldots, x_T)$ can thus be written as

$$p_{x_{1:T}}(x_{1:T}) = p_{x_1}(x_1) \prod_{t=2}^{T} p_{x_t|x_{t-1}}(x_t|x_{t-1}).$$

The observations $y_1, \ldots, y_T$ are assumed to be conditionally independent given the states, with $y_t$ depending on $x_{1:T}$ only through $x_t$. Hence, the joint likelihood for $y_{1:T} = (y_1, \ldots, y_T)$ given $x_{1:T}$ can be written as

$$p_{y_{1:T}|x_{1:T}}(y_{1:T}|x_{1:T}) = \prod_{t=1}^{T} p_{y_t|x_t}(y_t|x_t).$$
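To make the two factorisations concrete, the following Python sketch simulates one trajectory $(x_{1:T}, y_{1:T})$ from a generic state-space model. The sampler callables and the scalar linear-Gaussian example parameters are hypothetical illustrations, not part of the model specification above.

```python
import numpy as np

def simulate_state_space(T, sample_initial, sample_transition, sample_observation, rng):
    """Draw one trajectory (x_{1:T}, y_{1:T}) from a generic state-space model.

    The three callables are assumed to sample from p_{x_1}, p_{x_t|x_{t-1}}
    and p_{y_t|x_t}, respectively; they are placeholders for a concrete model.
    """
    x = [sample_initial(rng)]
    y = [sample_observation(x[0], rng)]
    for _ in range(1, T):
        x.append(sample_transition(x[-1], rng))    # first-order Markov state process
        y.append(sample_observation(x[-1], rng))   # observation depends only on the current state
    return np.array(x), np.array(y)

# Example: a scalar linear-Gaussian model with hypothetical parameter values.
rng = np.random.default_rng(0)
x, y = simulate_state_space(
    T=100,
    sample_initial=lambda rng: rng.normal(0.0, 1.0),
    sample_transition=lambda x_prev, rng: 0.9 * x_prev + rng.normal(0.0, 0.5),
    sample_observation=lambda x_t, rng: x_t + rng.normal(0.0, 1.0),
    rng=rng,
)
```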

A graphical illustration of the general state-space model is shown in Figure 1.

When the variables of the state vector are categorical, the model is often called an HMM. Following Künsch (2000), the term HMM is in this report reserved for state processes with a finite state space, while the term state-space model may refer to either a categorical or a continuous situation.

[Figure 1: a directed graphical model with the state chain $x_1 \to x_2 \to \cdots \to x_{t-1} \to x_t \to \cdots \to x_T$ and an observation $y_t$ attached to each state $x_t$.]

Figure 1: Graphical illustration of a state-space model.

An important inference procedure associated with state-space models, and the main motivation for the work of this report, is filtering. The objective of filtering is, for each time step $t$, to compute the so-called filtering distribution, $p_{x_t|y_{1:t}}(x_t|y_{1:t})$, that is, the distribution of the unobserved state $x_t$ given all the observations available at time $t$, $y_{1:t} = (y_1, \ldots, y_t)$. Because of the particular dependency structure of the state-space model, the series of filtering distributions can be computed recursively according to a two-step procedure as follows:

$$p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1}) = \int_{\Omega_x} p_{x_t|x_{t-1}}(x_t|x_{t-1})\, p_{x_{t-1}|y_{1:t-1}}(x_{t-1}|y_{1:t-1})\, \mathrm{d}x_{t-1}, \tag{1}$$

$$p_{x_t|y_{1:t}}(x_t|y_{1:t}) = \frac{p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})\, p_{y_t|x_t}(y_t|x_t)}{\int_{\Omega_x} p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})\, p_{y_t|x_t}(y_t|x_t)\, \mathrm{d}x_t}. \tag{2}$$

The first step is called the prediction step and computes the forecast distribution $p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})$. The second step is called the update step and uses Bayes' rule to condition the forecast (prior) distribution on the incoming observation $y_t$ to compute the filtering (posterior) distribution $p_{x_t|y_{1:t}}(x_t|y_{1:t})$.
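In the HMM case (finite state space), the integrals in Eqs. (1) and (2) become sums and the recursion can be evaluated exactly. The sketch below assumes a transition matrix, an initial distribution and the observation densities evaluated at the recorded observations are available; the function name and interface are illustrative.

```python
import numpy as np

def hmm_filter(P, likelihoods, p1):
    """Exact filtering recursion, Eqs. (1)-(2), for a finite state space.

    P[i, j]          : transition probability p(x_t = j | x_{t-1} = i)
    likelihoods[t, j]: observation density p(y_t | x_t = j) evaluated at the observed y_t
    p1[j]            : initial distribution p(x_1 = j)
    Returns an array filt with filt[t, j] = p(x_t = j | y_{1:t}).
    """
    T, K = likelihoods.shape
    filt = np.zeros((T, K))
    post = p1 * likelihoods[0]              # time t = 1: update the initial distribution
    filt[0] = post / post.sum()
    for t in range(1, T):
        forecast = filt[t - 1] @ P          # prediction step, Eq. (1): the integral becomes a sum
        post = forecast * likelihoods[t]    # update step, Eq. (2): Bayes' rule
        filt[t] = post / post.sum()         # normalisation, the denominator of Eq. (2)
    return filt
```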

Generally, we are unable to evaluate the integrals in Eqs. (1) and (2), and the forecast and filtering distributions are therefore intractable. Approximate solutions are necessary. The most common approach is to use a simulation-based method where a set of samples is used to empirically represent the series of prediction and filtering distributions. These methods are in the literature often referred to as ensemble methods, and the set of samples used to approximate the distributions is called an ensemble. Starting from an ensemble of independent realisations from the initial model $p_{x_1}(x_1)$, the idea is to advance the ensemble forward in time according to the state-space model dynamics. Similarly to the recursion in Eqs. (1) and (2), an ensemble method may alternate between a forecast step and an update step. Assuming that at time $t$ an ensemble $\{\tilde{x}^{t-1,(1)}, \ldots, \tilde{x}^{t-1,(M)}\}$ of $M$ independent realisations from the previous filtering distribution $p_{x_{t-1}|y_{1:t-1}}(x_{t-1}|y_{1:t-1})$ is available, the forecast step is carried out by simulating $x^{t,(i)} \mid \tilde{x}^{t-1,(i)} \sim p_{x_t|x_{t-1}}(\,\cdot\,|\tilde{x}^{t-1,(i)})$ independently for each $i$. This yields a forecast ensemble, $\{x^{t,(1)}, \ldots, x^{t,(M)}\}$, of independent realisations from the forecast distribution $p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})$. In practical applications this forecast simulation is typically feasible, though often at a high computational cost. After the forecast step, the forecast ensemble needs to be updated to take the new observation $y_t$ into account, so that a new filtering ensemble, $\{\tilde{x}^{t,(1)}, \ldots, \tilde{x}^{t,(M)}\}$, of independent realisations from the filtering distribution $p_{x_t|y_{1:t}}(x_t|y_{1:t})$ is obtained. However, in contrast to the prediction step, there is no straightforward way to carry out this updating. Ensemble filtering methods therefore require approximations in the update step. In the present report, we propose one such approximate updating method.
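A minimal sketch of the forecast step described above, assuming only that the transition density can be sampled from (the sampler is a placeholder for the actual model dynamics):

```python
import numpy as np

def forecast_step(filtering_ensemble, sample_transition, rng):
    """Ensemble forecast step: push each filtering sample through the state dynamics.

    filtering_ensemble: array of shape (M, n) with samples from p(x_{t-1} | y_{1:t-1})
    sample_transition : draws x_t given x_{t-1}; placeholder for the model dynamics
    Returns an (M, n) array of forecast samples from p(x_t | y_{1:t-1}).
    """
    return np.stack([sample_transition(member, rng) for member in filtering_ensemble])
```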

There exist two main classes of ensemble filtering methods: particle filters (Gordon et al., 1993; Doucet et al., 2001) and variations of the EnKF. Hybrid versions of these filters have also been proposed (e.g., Frei and Künsch, 2012, 2013). In this report, the focus is on the EnKF, and a brief review of the EnKF follows in the next section.

2.2 The ensemble Kalman filter

The EnKF is an ensemble filtering method which relies on Gaussian approximations. The filter was first introduced in Evensen (1994), and several modifications of the algorithm have been proposed in the literature since then. The variety of EnKF methods can be classified into two main categories, stochastic filters and deterministic filters, differing in whether the updating of the ensemble is carried out in a stochastic or a deterministic manner. Deterministic filters are also known as square root filters, and this is the term we use in this report.

To understand the EnKF, consider first a linear-Gaussian model where $x \sim \mathcal{N}(x; \mu, Q)$ and $y|x \sim \mathcal{N}(y; Hx, R)$, with $\mu \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$, $H \in \mathbb{R}^{m \times n}$, and $R \in \mathbb{R}^{m \times m}$. The posterior model corresponding to this linear-Gaussian model is a Gaussian, $\mathcal{N}(x; \tilde{\mu}, \tilde{Q})$, with mean vector $\tilde{\mu} \in \mathbb{R}^n$ and covariance matrix $\tilde{Q} \in \mathbb{R}^{n \times n}$ analytically available from the Kalman filter equations as

$$\tilde{\mu} = \mu + K(y - H\mu) \tag{3}$$

and

$$\tilde{Q} = (I_n - KH)Q, \tag{4}$$

respectively, where $I_n \in \mathbb{R}^{n \times n}$ is the $n \times n$ identity matrix and

$$K = QH^{\top}\left(HQH^{\top} + R\right)^{-1} \tag{5}$$

is the so-called Kalman gain matrix, where we have introduced the notation $A^{\top}$ to denote the transpose of a matrix $A$. Now, suppose $x \sim \mathcal{N}(x; \mu, Q)$ and $\epsilon \sim \mathcal{N}(\epsilon; 0, R)$ are independent random samples, and consider the linear transformation

$$\tilde{x} = x + K(y - Hx + \epsilon). \tag{6}$$

It is then a straightforward matter to show that $\tilde{x}|y$ is distributed according to the Gaussian distribution $\mathcal{N}(x; \tilde{\mu}, \tilde{Q})$ with mean $\tilde{\mu}$ and covariance $\tilde{Q}$ given by Eqs. (3) and (4), respectively (e.g., Burgers et al., 1998).
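The following sketch transcribes Eqs. (3)-(6) directly (function and variable names are illustrative). Drawing many samples through `stochastic_shift` and comparing their mean and covariance with the output of `kalman_update` gives a quick numerical check of the stated result.

```python
import numpy as np

def kalman_update(mu, Q, H, R, y):
    """Posterior mean and covariance of the linear-Gaussian model, Eqs. (3)-(5)."""
    S = H @ Q @ H.T + R                        # innovation covariance
    K = Q @ H.T @ np.linalg.inv(S)             # Kalman gain, Eq. (5)
    mu_post = mu + K @ (y - H @ mu)            # Eq. (3)
    Q_post = (np.eye(len(mu)) - K @ H) @ Q     # Eq. (4)
    return mu_post, Q_post, K

def stochastic_shift(x, y, H, R, K, rng):
    """Transformation of Eq. (6): for x ~ N(mu, Q) and eps ~ N(0, R) independent,
    the shifted sample is, conditionally on y, distributed as N(mu_post, Q_post)."""
    eps = rng.multivariate_normal(np.zeros(R.shape[0]), R)
    return x + K @ (y - H @ x + eps)
```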

At a given time step $t$, the EnKF starts by making a linear-Gaussian assumption about the true (unknown) underlying model. Specifically, the forecast samples $x^{t,(1)}, \ldots, x^{t,(M)}$ are assumed to be distributed according to a Gaussian distribution $\mathcal{N}(x^t; \mu^t, Q^t)$, where the parameters $\mu^t$ and $Q^t$ are set equal to the sample mean and the sample covariance of the forecast ensemble, and the likelihood model is assumed to be a Gaussian distribution with mean $H^t x^t$ and covariance $R^t$, $H^t \in \mathbb{R}^{m \times n}$, $R^t \in \mathbb{R}^{m \times m}$. Under the assumption that this linear-Gaussian model is correct, we have $x^{t,(i)} \sim \mathcal{N}(x^t; \mu^t, Q^t)$ for each $i$, and the goal is to update $x^{t,(i)}$ so that $\tilde{x}^{t,(i)} \sim \mathcal{N}(x^t; \tilde{\mu}^t, \tilde{Q}^t)$, where $\tilde{\mu}^t$ and $\tilde{Q}^t$ are given by Eqs. (3) and (4), respectively, with a superscript $t$ included in the notation, i.e.

$$\tilde{\mu}^t = \mu^t + K^t(y^t - H^t\mu^t) \tag{7}$$

and

$$\tilde{Q}^t = (I_n - K^tH^t)Q^t, \tag{8}$$

where, similarly, $K^t$ is given by Eq. (5) with a superscript $t$ included, $K^t = Q^t(H^t)^{\top}\left(H^tQ^t(H^t)^{\top} + R^t\right)^{-1}$. Stochastic and square root EnKFs obtain this result in different ways. The stochastic EnKF proceeds by simulating $\epsilon^{t,(i)} \sim \mathcal{N}(\epsilon^t; 0, R^t)$ independently for $i = 1, \ldots, M$, and then exploits Eq. (6),

which now takes the form

$$\tilde{x}^{t,(i)} = x^{t,(i)} + K^t\left(y^t - H^tx^{t,(i)} + \epsilon^{t,(i)}\right). \tag{9}$$

The square root EnKF instead performs a non-random linear transformation of $x^{t,(i)}$,
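As an illustration, a vectorised version of the stochastic update in Eq. (9), with $\mu^t$ and $Q^t$ estimated from the forecast ensemble as described above (names and interface are assumptions for the sketch):

```python
import numpy as np

def stochastic_enkf_update(X, y, H, R, rng):
    """Stochastic EnKF update, Eq. (9), applied to a forecast ensemble.

    X: (M, n) forecast ensemble, rows are the samples x^{t,(i)}
    y: (m,) observation at time t
    H, R: observation matrix and observation error covariance (assumed known)
    """
    M, n = X.shape
    Q = np.cov(X, rowvar=False)                  # sample covariance of the forecast ensemble
    S = H @ Q @ H.T + R
    K = Q @ H.T @ np.linalg.inv(S)               # Kalman gain built from ensemble moments
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=M)   # perturbed observations
    return X + (y - X @ H.T + eps) @ K.T         # Eq. (9) for all ensemble members at once
```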

$$\tilde{x}^{t,(i)} = B^t\left(x^{t,(i)} - \mu^t\right) + \mu^t + K^t\left(y^t - H^t\mu^t\right), \tag{10}$$

where $B^t \in \mathbb{R}^{n \times n}$ is a solution to the quadratic matrix equation

$$B^tQ^t(B^t)^{\top} = (I_n - K^tH^t)Q^t. \tag{11}$$

If the underlying state-space model really is linear-Gaussian, the distribution of each updated sample converges to the true (Gaussian) filtering distribution as $M \to \infty$. In all other cases, the update is biased. However, since the posterior ensemble is obtained from a linear shift of a possibly non-Gaussian prior ensemble, non-Gaussian properties of the true prior and posterior models may, to some extent, be captured.
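Eq. (11) does not determine $B^t$ uniquely. The sketch below uses one common construction, $B^t = A^{1/2}(Q^t)^{-1/2}$ with $A = (I_n - K^tH^t)Q^t$ and symmetric matrix square roots, which requires the ensemble covariance to be non-singular; the helper names are illustrative and other square root filter variants exist.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def square_root_enkf_update(X, y, H, R):
    """Square root EnKF update, Eqs. (10)-(11), with one particular choice of B^t."""
    mu = X.mean(axis=0)
    Q = np.cov(X, rowvar=False)                   # sample covariance of the forecast ensemble
    S = H @ Q @ H.T + R
    K = Q @ H.T @ np.linalg.inv(S)
    A = (np.eye(len(mu)) - K @ H) @ Q             # right-hand side of Eq. (11)
    A = 0.5 * (A + A.T)                           # symmetrise against round-off
    B = sym_sqrt(A) @ np.linalg.inv(sym_sqrt(Q))  # satisfies B Q B^T = A; not the only solution
    return (X - mu) @ B.T + mu + K @ (y - H @ mu) # Eq. (10), applied row-wise to the ensemble
```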

3 Generalised, fully Bayesian updating