
3 Ensemble Filter Algorithm

3.3 General case: Hidden Markov model

3.3.2 Drift towards Gaussianity

The EnKF algorithm is, as mentioned previously, asymptotically correct, as $n_e \to \infty$, for the Gauss-linear HM model. Consider an initial ensemble $e_0$ generated from a non-Gaussian initial model $f(r_0)$.

Assume further that the forward and likelihood functions are continuous.

The sequential updates in the EnKF will make the ensemble $e_t$ more and more Gaussian: the ensemble drifts towards Gaussianity. This drift is caused by the successive linearized updates when conditioning on the data.

Several variants of the EnKF exist that address this issue:

1. Gaussian anamorphosis (GA) EnKF
2. Gaussian Mixture (GM) EnKF
3. Selection (S) EnKF

3.3.2.1 Gaussian anamorphosis (GA) EnKF The idea is to transform the ensemble to be marginally Gaussian before conditioning, to carry out the conditioning with the transformed ensemble, and to back-transform the ensemble after conditioning.

Applications have shown that Gaussian anamorphosis can successfully prevent the drift towards Gaussianity, see Zhou et al. (2012).

Consider a univariate random variable $y$ with cdf $F_Y(y)$ and a random sample $(y_1,\ldots,y_{n_y})$ iid from $F_Y(y)$. The cdf can be estimated as,

$$\hat F_Y(y) = J\Big\{ n_y^{-1} \sum_{i=1}^{n_y} I(y_i \le y) \Big\},$$

where $J\{\cdot\}$ is some semi-parametric smoother of the empirical stepwise cdf estimator in the argument.

The smoother ensures that the back-transform is real valued.

The univariate Gaussian transform of one sample $y_0$ from $F_Y(y)$ is defined as,

$$\tilde y_0 = \Phi^{-1}(\hat F_Y(y_0); 0, 1). \tag{34}$$

Note that $\tilde y_0$ has an approximate standard Gaussian pdf. Similarly, the back-transform of the univariate Gaussian sample $\tilde u_0$ is,

$$u_0 = \hat F_Y^{-1}(\Phi(\tilde u_0; 0, 1)). \tag{35}$$

The smoothing of $\hat F_Y(y)$ ensures that $u_0 \in \mathbb{R}$, rather than belonging to the set $\{y_1,\ldots,y_{n_y}\}$ only.
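As an illustration, here is a minimal Python sketch of this transform pair, where a piecewise-linear cdf through the plotting positions $(i-0.5)/n_y$ stands in for the smoother $J\{\cdot\}$ (one possible choice among several; all function names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def fit_cdf(samples):
    """Smoothed empirical cdf: piecewise-linear through the plotting
    positions (i - 0.5)/n_y, kept strictly inside (0, 1) so that the
    Gaussian quantile below stays finite."""
    ys = np.sort(np.asarray(samples, dtype=float))
    ps = (np.arange(1, ys.size + 1) - 0.5) / ys.size
    return ys, ps

def gaussian_transform(y0, ys, ps):
    """Eq. (34): map a sample to an approximately N(0,1) value."""
    return norm.ppf(np.interp(y0, ys, ps))

def back_transform(u0_tilde, ys, ps):
    """Eq. (35): map a standard Gaussian value back to the y-scale;
    linear interpolation returns real values between the y_i's."""
    return np.interp(norm.cdf(u0_tilde), ps, ys)
```

Because the interpolated cdf is strictly increasing over the sample range, the back-transform indeed returns real values in between the observed $y_i$, as required above.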

The univariate transformation of an ensemble $e_t: \{(r_t^{u(i)}, d_t^{(i)}),\; i=1,\ldots,n_e\}$ entails that, for each ensemble member, each of the $(n+m)$ dimensions must be transformed independently. Hence only approximate univariate Gaussianity is ensured, while the multivariate characteristics remain unspecified. The latter entails that the linearized conditioning is only approximately correct. Algorithm 3 presents the GA EnKF procedure.

Algorithm 3 Gaussian anamorphosis EnKF

$e_t: \{(r_t^{u(i)}, d_t^{(i)}),\; i=1,\ldots,n_e\}$

Conditioning:
  Assess $F_Y(y)$ in all $(n+m)$ dimensions from $e_t \to \hat F_Y(y)$
  Univariate Gaussian transform of $e_t$ by $\hat F_Y(y) \to \tilde e_t$
  Univariate Gaussian transform of observations $d_t$ by $\hat F_Y(y) \to \tilde d_t$
  Estimate $\Sigma_{rd}$ from $\tilde e_t \to \hat\Sigma_{rd} \to \hat K_t = \hat\Gamma_{rd}[\hat\Sigma_d]^{-1}$
  $\tilde r_t^{c(i)} = \tilde r_t^{u(i)} + \hat K_t(\tilde d_t - \tilde d_t^{(i)}),\; i=1,\ldots,n_e$
  Univariate back-transform of $\tilde r_t^{c(i)}$ by $\hat F_Y(y) \to r_t^{c(i)}$
Forwarding:
  $r_{t+1}^{u(i)} = \omega_t(r_t^{c(i)}) + \epsilon_t^{r(i)};\; i=1,\ldots,n_e$
  $d_{t+1}^{(i)} = \psi_{t+1}(r_{t+1}^{u(i)}) + \epsilon_{t+1}^{d(i)};\; i=1,\ldots,n_e$
  $e_{t+1}: \{(r_{t+1}^{u(i)}, d_{t+1}^{(i)});\; i=1,\ldots,n_e\}$
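A compact Python sketch of the conditioning step of Algorithm 3, reusing the piecewise-linear anamorphosis above; the ensemble layout (dimensions as rows, members as columns) and the function names are illustrative choices:

```python
import numpy as np
from scipy.stats import norm

def anamorphose(X):
    """Transform each row (dimension) of X to approximately N(0,1)
    marginals; return the tables needed for further transforms."""
    Xs = np.sort(X, axis=1)
    ps = (np.arange(1, X.shape[1] + 1) - 0.5) / X.shape[1]
    Z = np.vstack([norm.ppf(np.interp(X[j], Xs[j], ps))
                   for j in range(X.shape[0])])
    return Z, Xs, ps

def ga_enkf_condition(R, D, d_obs):
    """GA EnKF conditioning (sketch): R is the (n, n_e) state ensemble,
    D the (m, n_e) predicted-data ensemble, d_obs the observed d_t;
    returns the back-transformed conditioned states."""
    n = R.shape[0]
    Z, tabs, ps = anamorphose(np.vstack([R, D]))   # all n+m dimensions
    Rz, Dz = Z[:n], Z[n:]
    # Transform the observations with the cdfs of the data dimensions.
    dz = norm.ppf([np.interp(d_obs[k], tabs[n + k], ps)
                   for k in range(d_obs.size)])
    # Empirical Kalman gain from the transformed ensemble.
    C = np.cov(Z)
    K = C[:n, n:] @ np.linalg.inv(C[n:, n:])
    Rcz = Rz + K @ (dz[:, None] - Dz)              # linearized update
    # Univariate back-transform of the conditioned states.
    return np.vstack([np.interp(norm.cdf(Rcz[j]), ps, tabs[j])
                      for j in range(n)])
```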

3.3.2.2 Gaussian Mixture (GM) EnKF The basic idea is to specify the prior initial model as a Gaussian mixture (GM) model, which can represent multimodal variables. Let the forward and likelihood models of the HM model be Gauss-linear. The posterior pdf will then also be a GM model and be analytically tractable. The conditioning step can be made independently for each component of the GM model, and the associated weight for each component can also be calculated. For general forward and likelihood models, ensemble based filtering algorithms must be used. The difficulty is that each ensemble member must carry a mode indicator assigned to one of the components, and that this indicator may change during the forwarding step. These filtering algorithms have proven robust against the drift towards Gaussianity (Li et al., 2016; Ackerson and Fu, 1970; Chen and Liu, 2000; Smith, 2007; Dovera and Della Rossa, 2010; Bengtsson et al., 2003), at least for low-dimensional models.

Consider a set of $n$-dimensional Gaussian pdfs, $\varphi_n(r;\mu_r^l,\Sigma_r^l);\; l=1,\ldots,L$, denoted components, and a set of normalized mixture weights $\pi: \{\pi_1,\ldots,\pi_L\}$. The prior initial model is specified to be a GM model:

$$f(r_0)=\sum_{l=1}^{L}\pi_l \times \varphi_n(r_0;\mu_r^l,\Sigma_r^l). \tag{36}$$

Note that a particular realization $r^s$ will belong to the $k$-th component of the mixture with probability:

$$\lambda_k(r^s)=\Big[\sum_{l=1}^{L}\pi_l \times \varphi_n(r^s;\mu_r^l,\Sigma_r^l)\Big]^{-1} \times \pi_k\,\varphi_n(r^s;\mu_r^k,\Sigma_r^k). \tag{37}$$
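Eq. (37) translates directly into a few lines of Python (a hypothetical helper using scipy's multivariate normal density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def membership_probs(r_s, pi, mus, Sigmas):
    """Eq. (37): probabilities that realization r_s belongs to each of
    the L components; pi, mus, Sigmas hold the GM parameters."""
    dens = np.array([p * multivariate_normal.pdf(r_s, mean=m, cov=S)
                     for p, m, S in zip(pi, mus, Sigmas)])
    return dens / dens.sum()   # normalize by the mixture density
```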


Consider further the Gauss-linear likelihood model,

$$f(d|r)=\varphi_m(d;Hr,\Sigma_{d|r}). \tag{38}$$

The associated posterior model will also be a Gaussian mixture model (Grana et al., 2017),

$$f(r|d)=\sum_{l=1}^{L}\pi_{l|d} \times \varphi_n(r;\mu_{r|d}^l,\Sigma_{r|d}^l), \tag{39}$$

where the conditional expectations and covariances are obtained by component-wise conditioning on the observations $d$. The posterior mixture weights are defined by

$$\pi_{k|d}=\Big[\sum_{l=1}^{L}\pi_l \times \varphi_m(d;H\mu_r^l,H\Sigma_r^lH^T+\Sigma_{d|r})\Big]^{-1} \times \pi_k\,\varphi_m(d;H\mu_r^k,H\Sigma_r^kH^T+\Sigma_{d|r}). \tag{40}$$

For general forward and likelihood models, the ensemble representation must contain a mode indicator associated with each ensemble member,

$$e_t: \{(r_t^{u(i)}, l_t^{(i)}, d_t^{(i)}),\; i=1,\ldots,n_e\}, \tag{41}$$

with mode indicator $l_t^{(i)} \in \{1,\ldots,L\}$. The dynamic updating of this mode indicator often appears as challenging. The conditioning/forwarding steps in the GM EnKF are detailed in Algorithm 4.
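Before turning to the algorithm, the Gauss-linear conditioning of Eqs. (39) and (40) can be sketched directly (names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gm_posterior(pi, mus, Sigmas, H, Sigma_dr, d):
    """Component-wise Kalman conditioning (Eq. 39) and the posterior
    mixture weights of Eq. (40)."""
    w, post_m, post_S = [], [], []
    for p, m, S in zip(pi, mus, Sigmas):
        Sd = H @ S @ H.T + Sigma_dr            # innovation covariance
        K = S @ H.T @ np.linalg.inv(Sd)        # component Kalman gain
        post_m.append(m + K @ (d - H @ m))     # conditional expectation
        post_S.append(S - K @ H @ S)           # conditional covariance
        w.append(p * multivariate_normal.pdf(d, mean=H @ m, cov=Sd))
    w = np.asarray(w)
    return w / w.sum(), post_m, post_S
```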

Algorithm 4 Gaussian mixture EnKF

$e_t^{-l}: \{(r_t^{u(i)}, \cdot, d_t^{(i)}),\; i=1,\ldots,n_e\}$

Conditioning:
  Assess $f(r_t^u)$ from $e_t^{-l} \to \hat f(r_t^u)=\sum_{l=1}^{L}\hat\pi_l \times \varphi_n(r;\hat\mu_r^l,\hat\Sigma_r^l)$
  Assign ensemble member $i$ to mode indicator $l_t^{(i)} \in \{1,\ldots,L\}$ with probability $\{\lambda_l(r_t^{u(i)}),\; l=1,\ldots,L\}$
  Define ensemble $e_t: \{(r_t^{u(i)}, l_t^{(i)}, d_t^{(i)}),\; i=1,\ldots,n_e\}$
  Assess $\Sigma_{rd}^l;\; l=1,\ldots,L$ from $e_t \to \hat\Sigma_{rd}^l \to \hat K_t^l = \hat\Gamma_{rd}^l[\hat\Sigma_d^l]^{-1}$
  $r_t^{c(i)} = r_t^{u(i)} + \hat K_t^{l_t^{(i)}}(d_t - d_t^{(i)}),\; i=1,\ldots,n_e$
Forwarding:
  $r_{t+1}^{u(i)} = \omega_t(r_t^{c(i)}) + \epsilon_t^{r(i)},\; i=1,\ldots,n_e$
  $d_{t+1}^{(i)} = \psi_{t+1}(r_{t+1}^{u(i)}) + \epsilon_{t+1}^{d(i)},\; i=1,\ldots,n_e$
  $e_{t+1}^{-l}: \{(r_{t+1}^{u(i)}, \cdot, d_{t+1}^{(i)}),\; i=1,\ldots,n_e\}$
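The distinctive step of Algorithm 4 is that the gain is estimated and applied mode by mode. A Python sketch, assuming mode indicators have already been sampled from the Eq. (37) probabilities and that each mode holds enough members for covariance estimation:

```python
import numpy as np

def gm_enkf_condition(R, D, d_obs, modes):
    """Per-mode conditioning (sketch): R (n, n_e) states, D (m, n_e)
    predicted data, modes (n_e,) the indicators l_t^(i)."""
    n = R.shape[0]
    Rc = R.copy()
    for l in np.unique(modes):
        idx = np.flatnonzero(modes == l)
        C = np.cov(np.vstack([R[:, idx], D[:, idx]]))  # mode-l ensemble
        K_l = C[:n, n:] @ np.linalg.inv(C[n:, n:])     # mode-l gain
        Rc[:, idx] = R[:, idx] + K_l @ (d_obs[:, None] - D[:, idx])
    return Rc
```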

The challenging part of the algorithm is to assess the GM model $f(r_t^u)$. Other versions of the GM EnKF algorithm use the EM-algorithm, particle filters or clustering techniques. If the dimension of $r$ is large, estimating a suitable GM model will be complicated.
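As one concrete, non-prescribed choice, the EM-based assessment can be delegated to scikit-learn, which also returns the Eq. (37) probabilities per member; this remains practical only while the dimension of $r$ is moderate:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
R_u = rng.normal(size=(200, 5))   # stand-in ensemble: n_e=200, n=5
L = 3                             # assumed number of components

gm = GaussianMixture(n_components=L, covariance_type="full").fit(R_u)
pi_hat, mu_hat, Sig_hat = gm.weights_, gm.means_, gm.covariances_
lam = gm.predict_proba(R_u)       # Eq. (37) probabilities per member
```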

3.3.2.3 Selection (S) EnKF The basic idea is to specify the prior initial model as a selection-Gaussian pdf, see Section 2.2, which can represent multimodal, skewed and/or peaked variables. For general forward and likelihood models, one must rely on ensemble based filtering algorithms. These algorithms are inspired by the analytically tractable model discussed in Section 2.2, and have proven to be robust with regard to drift towards Gaussianity (Conjard and Omre, 2020). Let the prior initial distribution $f(r_0)$ be a selection-Gaussian pdf with parameters $\Theta_{SG}=(\mu_{\tilde r},\Sigma_{\tilde r},\mu_{\nu|\tilde r},\Sigma_{\nu|\tilde r},A)$, see Equations 18 and 19. The auxiliary variables $(\tilde r_0,\nu)$ are then jointly Gaussian, and the variable of interest is $r_0=[\tilde r_0\,|\,\nu\in A]$, which is selection-Gaussian.

The initial ensemble $e_0$ of the EnKF algorithm contains realizations of the auxiliary variables $[\tilde r_0,\nu]$

which are jointly Gaussian. The SEnKF algorithm is identical to the EnKF algorithm defined on these auxiliary variables. The forward model is given by

$$\tilde r_{t+1} = \omega_t(\tilde r_t, \epsilon_t^r), \tag{42}$$

while the likelihood model is given by

$$d_t = \psi_t(\tilde r_t, \epsilon_t^d). \tag{43}$$

Based on these models, an algorithm identical to the traditional EnKF algorithm is activated to obtain the ensemble $e_{T+1}=\{(\tilde r_{T+1}^{u(i)},\nu_{T+1}^{u(i)}),\; i=1,\ldots,n_e\}$. Note that a time index is added to $\nu$ to account for the data assimilation up to time $T$. The expectation vector $\mu_{\tilde r\nu}$ and covariance matrix $\Sigma_{\tilde r\nu}$ are estimated from $e_{T+1}$, and based on the jointly Gaussian $\varphi_{2n}((\tilde r,\nu);\hat\mu_{\tilde r\nu},\hat\Sigma_{\tilde r\nu})$, the filter variable of interest $[r_{T+1}|d_{0:T}]=[\tilde r_{T+1}\,|\,\nu_{T+1}\in A,\,d_{0:T}]$ is assessed by McMC simulation.
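A sketch of this final assessment step, with plain rejection sampling standing in for the McMC simulation and a hyper-rectangular selection set $A$ assumed (both are simplifying choices for illustration, not the method prescribed above):

```python
import numpy as np

def assess_selection_gaussian(E, a_lo, a_hi, n_draws=100_000, seed=0):
    """E: (2n, n_e) final ensemble of stacked auxiliaries (r~, nu).
    Fit the jointly Gaussian model and keep the r~ draws whose nu
    falls in A = [a_lo, a_hi]^n, i.e. realizations of [r~ | nu in A]."""
    n = E.shape[0] // 2
    mu_hat, Sig_hat = E.mean(axis=1), np.cov(E)  # fitted phi_2n
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(mu_hat, Sig_hat, size=n_draws)
    r_tilde, nu = Z[:, :n], Z[:, n:]
    keep = np.all((nu >= a_lo) & (nu <= a_hi), axis=1)
    return r_tilde[keep]
```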

The conditioning and forwarding steps of the SEnKF algorithm are specified in Algorithm 5.

Algorithm 5 Selection EnKF
