

2.1 State-space models

A general state-space model consists of two discrete-time stochastic processes:

a latent process $\{x_t\}_{t=1}^{T}$, where $x_t \in \Omega_x \subseteq \mathbb{R}^n$ is an $n$-dimensional vector, called the state vector at time step $t$, and an observed process $\{y_t\}_{t=1}^{T}$, where $y_t \in \Omega_y \subseteq \mathbb{R}^m$ is an $m$-dimensional vector and a partial observation of $x_t$. The latent $x_t$-process, usually called the state process, is assumed to evolve in time according to a first-order Markov chain with initial distribution $p_{x_1}(x_1)$ and transition probabilities $p_{x_t|x_{t-1}}(x_t|x_{t-1})$, $t \geq 2$. The joint distribution of $x_{1:T} = (x_1, \ldots, x_T)$ can thus be written as

$$p_{x_{1:T}}(x_{1:T}) = p_{x_1}(x_1) \prod_{t=2}^{T} p_{x_t|x_{t-1}}(x_t|x_{t-1}).$$

The observations $y_1, \ldots, y_T$ are assumed to be conditionally independent given the states, with $y_t$ depending on $x_{1:T}$ only through $x_t$. Hence, the joint likelihood for $y_{1:T} = (y_1, \ldots, y_T)$ given $x_{1:T}$ can be written as

$$p_{y_{1:T}|x_{1:T}}(y_{1:T}|x_{1:T}) = \prod_{t=1}^{T} p_{y_t|x_t}(y_t|x_t).$$
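To make the two factorisations concrete, the following Python sketch simulates one trajectory $(x_{1:T}, y_{1:T})$ from a generic state-space model. The sampler callables and the scalar linear-Gaussian example parameters are hypothetical illustrations, not part of the model specification above.

```python
import numpy as np

def simulate_state_space(T, sample_initial, sample_transition, sample_observation, rng):
    """Draw one trajectory (x_{1:T}, y_{1:T}) from a generic state-space model.

    The three callables are assumed to sample from p_{x_1}, p_{x_t|x_{t-1}}
    and p_{y_t|x_t}, respectively; they are placeholders for a concrete model.
    """
    x = [sample_initial(rng)]
    y = [sample_observation(x[0], rng)]
    for _ in range(1, T):
        x.append(sample_transition(x[-1], rng))    # first-order Markov state process
        y.append(sample_observation(x[-1], rng))   # observation depends only on the current state
    return np.array(x), np.array(y)

# Example: a scalar linear-Gaussian model with hypothetical parameter values.
rng = np.random.default_rng(0)
x, y = simulate_state_space(
    T=100,
    sample_initial=lambda rng: rng.normal(0.0, 1.0),
    sample_transition=lambda x_prev, rng: 0.9 * x_prev + rng.normal(0.0, 0.5),
    sample_observation=lambda x_t, rng: x_t + rng.normal(0.0, 1.0),
    rng=rng,
)
```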

A graphical illustration of the general state-space model is shown in Figure 1.

When the variables of the state vector are categorical, the model is often called an HMM. Following Künsch (2000), the term HMM is in this report reserved for state processes with a finite state space, while the term state-space model may refer to either a categorical or a continuous situation.

[Figure 1: a directed graphical model with the state chain $x_1 \to x_2 \to \cdots \to x_{t-1} \to x_t \to \cdots \to x_T$ and an observation $y_t$ attached to each state $x_t$.]

Figure 1: Graphical illustration of a state-space model.

An important inference procedure associated with state-space models, and the main motivation for the work of this report, is filtering. The objective of filtering is, for each time step $t$, to compute the so-called filtering distribution, $p_{x_t|y_{1:t}}(x_t|y_{1:t})$, that is, the distribution of the unobserved state $x_t$ given all the observations available at time $t$, $y_{1:t} = (y_1, \ldots, y_t)$. Because of the particular dependency structure of the state-space model, the series of filtering distributions can be computed recursively according to a two-step procedure as follows:

$$p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1}) = \int_{\Omega_x} p_{x_t|x_{t-1}}(x_t|x_{t-1})\, p_{x_{t-1}|y_{1:t-1}}(x_{t-1}|y_{1:t-1})\, \mathrm{d}x_{t-1}, \tag{1}$$

$$p_{x_t|y_{1:t}}(x_t|y_{1:t}) = \frac{p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})\, p_{y_t|x_t}(y_t|x_t)}{\int_{\Omega_x} p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})\, p_{y_t|x_t}(y_t|x_t)\, \mathrm{d}x_t}. \tag{2}$$

The first step is called the prediction step and computes the forecast distribution $p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})$. The second step is called the update step and uses Bayes' rule to condition the forecast (prior) distribution on the incoming observation $y_t$ to compute the filtering (posterior) distribution $p_{x_t|y_{1:t}}(x_t|y_{1:t})$.
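In the HMM case (finite state space), the integrals in Eqs. (1) and (2) become sums and the recursion can be evaluated exactly. The sketch below assumes a transition matrix, an initial distribution and the observation densities evaluated at the recorded observations are available; the function name and interface are illustrative.

```python
import numpy as np

def hmm_filter(P, likelihoods, p1):
    """Exact filtering recursion, Eqs. (1)-(2), for a finite state space.

    P[i, j]          : transition probability p(x_t = j | x_{t-1} = i)
    likelihoods[t, j]: observation density p(y_t | x_t = j) evaluated at the observed y_t
    p1[j]            : initial distribution p(x_1 = j)
    Returns an array filt with filt[t, j] = p(x_t = j | y_{1:t}).
    """
    T, K = likelihoods.shape
    filt = np.zeros((T, K))
    post = p1 * likelihoods[0]              # time t = 1: update the initial distribution
    filt[0] = post / post.sum()
    for t in range(1, T):
        forecast = filt[t - 1] @ P          # prediction step, Eq. (1): the integral becomes a sum
        post = forecast * likelihoods[t]    # update step, Eq. (2): Bayes' rule
        filt[t] = post / post.sum()         # normalisation, the denominator of Eq. (2)
    return filt
```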

Generally, we are unable to evaluate the integrals in Eqs. (1) and (2), and the forecast and filtering distributions are therefore intractable. Approximate solutions are necessary. The most common approach is to use a simulation-based method where a set of samples is used to empirically represent the series of prediction and filtering distributions. These methods are in the literature often referred to as ensemble methods, and the set of samples used to approximate the distributions is called an ensemble. Starting from an ensemble of independent realisations from the initial model $p_{x_1}(x_1)$, the idea is to advance the ensemble forward in time according to the state-space model dynamics. Similarly to the recursion in Eqs. (1) and (2), an ensemble method may alternate between a forecast step and an update step. Assuming that at time $t$ an ensemble $\{\tilde{x}^{t-1,(1)}, \ldots, \tilde{x}^{t-1,(M)}\}$ of $M$ independent realisations from the previous filtering distribution $p_{x_{t-1}|y_{1:t-1}}(x_{t-1}|y_{1:t-1})$ is available, the forecast step is carried out by simulating $x^{t,(i)} \mid \tilde{x}^{t-1,(i)} \sim p_{x_t|x_{t-1}}(\,\cdot\,|\tilde{x}^{t-1,(i)})$ independently for each $i$. This yields a forecast ensemble, $\{x^{t,(1)}, \ldots, x^{t,(M)}\}$, of independent realisations from the forecast distribution $p_{x_t|y_{1:t-1}}(x_t|y_{1:t-1})$. In practical applications this forecast simulation is typically feasible, though often at a high computational cost. After the forecast step, the forecast ensemble needs to be updated to take the new observation $y_t$ into account, so that a new filtering ensemble, $\{\tilde{x}^{t,(1)}, \ldots, \tilde{x}^{t,(M)}\}$, of independent realisations from the filtering distribution $p_{x_t|y_{1:t}}(x_t|y_{1:t})$ is obtained. However, in contrast to the prediction step, there is no straightforward way to carry out this updating. Ensemble filtering methods therefore require approximations in the update step. In the present report, we propose one such approximate updating method.
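A minimal sketch of the forecast step described above, assuming only that the transition density can be sampled from (the sampler is a placeholder for the actual model dynamics):

```python
import numpy as np

def forecast_step(filtering_ensemble, sample_transition, rng):
    """Ensemble forecast step: push each filtering sample through the state dynamics.

    filtering_ensemble: array of shape (M, n) with samples from p(x_{t-1} | y_{1:t-1})
    sample_transition : draws x_t given x_{t-1}; placeholder for the model dynamics
    Returns an (M, n) array of forecast samples from p(x_t | y_{1:t-1}).
    """
    return np.stack([sample_transition(member, rng) for member in filtering_ensemble])
```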

There exist two main classes of ensemble filtering methods: particle filters (Gordon et al., 1993; Doucet et al., 2001) and variations of the EnKF. Hybrid versions of these filters have also been proposed (e.g., Frei and Künsch, 2012, 2013). In this report, the focus is on the EnKF, and a brief review of the EnKF follows in the next section.

2.2 The ensemble Kalman filter

The EnKF is an ensemble filtering method which relies on Gaussian approximations. The filter was first introduced in Evensen (1994), and several modifications of the algorithm have been proposed in the literature since then. The variety of EnKF methods can be classified into two main categories, stochastic filters and deterministic filters, differing in whether the updating of the ensemble is carried out in a stochastic or a deterministic manner. Deterministic filters are also known as square root filters, and this is the term we use in this report.

To understand the EnKF, consider first a linear-Gaussian model where $x \sim \mathcal{N}(x; \mu, Q)$ and $y|x \sim \mathcal{N}(y; Hx, R)$, with $\mu \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$, $H \in \mathbb{R}^{m \times n}$, and $R \in \mathbb{R}^{m \times m}$. The posterior model corresponding to this linear-Gaussian model is a Gaussian, $\mathcal{N}(x; \tilde{\mu}, \tilde{Q})$, with mean vector $\tilde{\mu} \in \mathbb{R}^n$ and covariance matrix $\tilde{Q} \in \mathbb{R}^{n \times n}$ analytically available from the Kalman filter equations as

$$\tilde{\mu} = \mu + K(y - H\mu) \tag{3}$$

and

$$\tilde{Q} = (I_n - KH)Q, \tag{4}$$

respectively, where $I_n \in \mathbb{R}^{n \times n}$ is the $n \times n$ identity matrix and

$$K = QH^{\top}\left(HQH^{\top} + R\right)^{-1} \tag{5}$$

is the so-called Kalman gain matrix, where we have introduced the notation $A^{\top}$ to denote the transpose of a matrix $A$. Now, suppose $x \sim \mathcal{N}(x; \mu, Q)$ and $\epsilon \sim \mathcal{N}(\epsilon; 0, R)$ are independent random samples, and consider the linear transformation

$$\tilde{x} = x + K(y - Hx + \epsilon). \tag{6}$$

It is then a straightforward matter to show that $\tilde{x}|y$ is distributed according to the Gaussian distribution $\mathcal{N}(x; \tilde{\mu}, \tilde{Q})$ with mean $\tilde{\mu}$ and covariance $\tilde{Q}$ given by Eqs. (3) and (4), respectively (e.g., Burgers et al., 1998).
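The following sketch transcribes Eqs. (3)-(6) directly (function and variable names are illustrative). Drawing many samples through `stochastic_shift` and comparing their mean and covariance with the output of `kalman_update` gives a quick numerical check of the stated result.

```python
import numpy as np

def kalman_update(mu, Q, H, R, y):
    """Posterior mean and covariance of the linear-Gaussian model, Eqs. (3)-(5)."""
    S = H @ Q @ H.T + R                        # innovation covariance
    K = Q @ H.T @ np.linalg.inv(S)             # Kalman gain, Eq. (5)
    mu_post = mu + K @ (y - H @ mu)            # Eq. (3)
    Q_post = (np.eye(len(mu)) - K @ H) @ Q     # Eq. (4)
    return mu_post, Q_post, K

def stochastic_shift(x, y, H, R, K, rng):
    """Transformation of Eq. (6): for x ~ N(mu, Q) and eps ~ N(0, R) independent,
    the shifted sample is, conditionally on y, distributed as N(mu_post, Q_post)."""
    eps = rng.multivariate_normal(np.zeros(R.shape[0]), R)
    return x + K @ (y - H @ x + eps)
```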

At a given time step $t$, the EnKF starts by making a linear-Gaussian assumption about the true (unknown) underlying model. Specifically, the forecast samples $x^{t,(1)}, \ldots, x^{t,(M)}$ are assumed to be distributed according to a Gaussian distribution $\mathcal{N}(x^t; \mu^t, Q^t)$, where the parameters $\mu^t$ and $Q^t$ are set equal to the sample mean and the sample covariance of the forecast ensemble, and the likelihood model is assumed to be a Gaussian distribution with mean $H^t x^t$ and covariance $R^t$, $H^t \in \mathbb{R}^{m \times n}$, $R^t \in \mathbb{R}^{m \times m}$. Under the assumption that this linear-Gaussian model is correct, we have $x^{t,(i)} \sim \mathcal{N}(x^t; \mu^t, Q^t)$ for each $i$, and the goal is to update $x^{t,(i)}$ so that $\tilde{x}^{t,(i)} \sim \mathcal{N}(x^t; \tilde{\mu}^t, \tilde{Q}^t)$, where $\tilde{\mu}^t$ and $\tilde{Q}^t$ are given by Eqs. (3) and (4), respectively, with a superscript $t$ included in the notation, i.e.

$$\tilde{\mu}^t = \mu^t + K^t(y^t - H^t\mu^t) \tag{7}$$

and

$$\tilde{Q}^t = (I_n - K^tH^t)Q^t, \tag{8}$$

where, similarly, $K^t$ is given by Eq. (5) with a superscript $t$ included, $K^t = Q^t(H^t)^{\top}\left(H^tQ^t(H^t)^{\top} + R^t\right)^{-1}$. Stochastic and square root EnKFs obtain this result in different ways. The stochastic EnKF proceeds by simulating $\epsilon^{t,(i)} \sim \mathcal{N}(\epsilon^t; 0, R^t)$ independently for $i = 1, \ldots, M$, and then exploits Eq. (6),

which now takes the form

$$\tilde{x}^{t,(i)} = x^{t,(i)} + K^t\left(y^t - H^tx^{t,(i)} + \epsilon^{t,(i)}\right). \tag{9}$$

The square root EnKF instead performs a non-random linear transformation of $x^{t,(i)}$,
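As an illustration, a vectorised version of the stochastic update in Eq. (9), with $\mu^t$ and $Q^t$ estimated from the forecast ensemble as described above (names and interface are assumptions for the sketch):

```python
import numpy as np

def stochastic_enkf_update(X, y, H, R, rng):
    """Stochastic EnKF update, Eq. (9), applied to a forecast ensemble.

    X: (M, n) forecast ensemble, rows are the samples x^{t,(i)}
    y: (m,) observation at time t
    H, R: observation matrix and observation error covariance (assumed known)
    """
    M, n = X.shape
    Q = np.cov(X, rowvar=False)                  # sample covariance of the forecast ensemble
    S = H @ Q @ H.T + R
    K = Q @ H.T @ np.linalg.inv(S)               # Kalman gain built from ensemble moments
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=M)   # perturbed observations
    return X + (y - X @ H.T + eps) @ K.T         # Eq. (9) for all ensemble members at once
```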

$$\tilde{x}^{t,(i)} = B^t\left(x^{t,(i)} - \mu^t\right) + \mu^t + K^t\left(y^t - H^t\mu^t\right), \tag{10}$$

where $B^t \in \mathbb{R}^{n \times n}$ is a solution to the quadratic matrix equation

$$B^tQ^t(B^t)^{\top} = (I_n - K^tH^t)Q^t. \tag{11}$$

If the underlying state-space model really is linear-Gaussian, the distribution of each updated sample converges to the true (Gaussian) filtering distribution as $M \to \infty$. In all other cases, the update is biased. However, since the posterior ensemble is obtained from a linear shift of a possibly non-Gaussian prior ensemble, non-Gaussian properties of the true prior and posterior models may, to some extent, be captured.
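Eq. (11) does not determine $B^t$ uniquely. The sketch below uses one common construction, $B^t = A^{1/2}(Q^t)^{-1/2}$ with $A = (I_n - K^tH^t)Q^t$ and symmetric matrix square roots, which requires the ensemble covariance to be non-singular; the helper names are illustrative and other square root filter variants exist.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def square_root_enkf_update(X, y, H, R):
    """Square root EnKF update, Eqs. (10)-(11), with one particular choice of B^t."""
    mu = X.mean(axis=0)
    Q = np.cov(X, rowvar=False)                   # sample covariance of the forecast ensemble
    S = H @ Q @ H.T + R
    K = Q @ H.T @ np.linalg.inv(S)
    A = (np.eye(len(mu)) - K @ H) @ Q             # right-hand side of Eq. (11)
    A = 0.5 * (A + A.T)                           # symmetrise against round-off
    B = sym_sqrt(A) @ np.linalg.inv(sym_sqrt(Q))  # satisfies B Q B^T = A; not the only solution
    return (X - mu) @ B.T + mu + K @ (y - H @ mu) # Eq. (10), applied row-wise to the ensemble
```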

3 Generalised, fully Bayesian updating