
In this section, the updating procedure described in Section 4 is demonstrated in a simulation example. The example involves a filtering problem where the unobserved Markov process $\{x^t\}_{t=1}^T$ consists of $T = 100$ time steps, the dimension of $x^t$ is $n = 200$, and each element $x^t_j$ of $x^t$ belongs to one of three classes: 0, 1, and 2.

5.1 Experimental setup

To specify an initial distribution $p_{x^1}(x^1)$ and a forward model $p_{x^t|x^{t-1}}(x^t|x^{t-1})$ for the latent Markov process $\{x^t\}_{t=1}^T$, we use a modified version of the binary simulation example in Loe and Tjelmeland (2021b). As in that article, we take inspiration from the process that occurs when water breaks through to an oil-producing well in a petroleum reservoir. It should be stressed, however, that we do not claim that our model is truly realistic for such a process. The $t$ in $x^t_j$ then represents time and $j$ the location in the well, with $j = 1$ at the top of the well and $j = n$ at the bottom. We let the events $x^t_j = 0$ and $x^t_j = 1$ represent the presence of porous sandstone filled with oil and water, respectively, in location $j$ of the well at time $t$, while the event $x^t_j = 2$ represents non-porous shale in the same location. One should note that the spatial distribution of sandstone and shale does not change with time, whereas the fluid in a sandstone location may change.

Therefore, if $x^{t-1}_j = 2$, the forward model should be specified so that $x^t_j = 2$ with probability 1, and correspondingly, if $x^{t-1}_j = 0$ or $x^{t-1}_j = 1$, the forward model should assign probability zero to $x^t_j = 2$. Initially, at $t = 1$, we want oil to be present in all the sandstone. Thereafter, water should gradually displace the oil, and at time $t = T$ water should be the dominating fluid.

To simplify the specification of the forward model, we let $x^t$ given $x^{t-1}$ be a first-order Markov chain, so that

$$p_{x^t|x^{t-1}}(x^t|x^{t-1}) = p_{x^t_1|x^{t-1}}(x^t_1|x^{t-1}) \prod_{j=2}^{n} p_{x^t_j|x^t_{j-1},x^{t-1}}(x^t_j|x^t_{j-1},x^{t-1}). \quad (34)$$

Moreover, for $j = 2, \dots, n-1$ we assume that $x^t_j$ in $p_{x^t_j|x^t_{j-1},x^{t-1}}(x^t_j|x^t_{j-1},x^{t-1})$ depends, in addition to the element $x^t_{j-1}$ of the vector $x^t$, only on the three elements $x^{t-1}_{j-1}$, $x^{t-1}_j$ and $x^{t-1}_{j+1}$ of the vector $x^{t-1}$. Thereby,

$$p_{x^t_j|x^t_{j-1},x^{t-1}}(x^t_j|x^t_{j-1},x^{t-1}) = p_{x^t_j|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j,x^{t-1}_{j+1}}(x^t_j|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j,x^{t-1}_{j+1}) \quad (35)$$

for $j = 2, \dots, n-1$. For $j = 1$ and $j = n$ we correspondingly assume

$$p_{x^t_1|x^{t-1}}(x^t_1|x^{t-1}) = p_{x^t_1|x^{t-1}_1,x^{t-1}_2}(x^t_1|x^{t-1}_1,x^{t-1}_2) \quad (36)$$

and

$$p_{x^t_n|x^t_{n-1},x^{t-1}}(x^t_n|x^t_{n-1},x^{t-1}) = p_{x^t_n|x^t_{n-1},x^{t-1}_{n-1},x^{t-1}_n}(x^t_n|x^t_{n-1},x^{t-1}_{n-1},x^{t-1}_n). \quad (37)$$

In the following, we first discuss the specification of Eq. (35). To obtain a model where the spatial distribution of sandstone and shale does not change in time, we set, for all $x^t_{j-1}, x^{t-1}_{j-1}, x^{t-1}_{j+1} \in \{0, 1, 2\}$,

$$p_{x^t_j|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j,x^{t-1}_{j+1}}(x^t_j|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j = 2,x^{t-1}_{j+1}) = \begin{cases} 1, & \text{for } x^t_j = 2,\\ 0, & \text{otherwise}, \end{cases} \quad (38)$$

and

$$p_{x^t_j|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j,x^{t-1}_{j+1}}(x^t_j = 2|x^t_{j-1},x^{t-1}_{j-1},x^{t-1}_j,x^{t-1}_{j+1}) = 0, \quad \text{for } x^{t-1}_j \in \{0, 1\}. \quad (39)$$

For the remaining probabilities in Eq. (35), we adopt the same values as used in Loe and Tjelmeland (2021a); see Table 1. The reasoning behind these probabilities is that if $x^{t-1}_j = 1$, the probability of $x^t_j = 1$ should be high, and in particular so if also $x^t_{j-1} = 1$. If $x^{t-1}_j = 0$, the probability of $x^t_j = 0$ should be high unless $x^t_{j-1} = x^{t-1}_{j-1} = x^{t-1}_{j+1} = 1$.
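To make the structure of this conditional distribution concrete, the sketch below (in Python, with the class labels 0, 1, 2 as above) encodes the hard constraints of Eqs. (38) and (39) in a single function. The numerical fluid probabilities are hypothetical placeholders chosen only to mimic the qualitative reasoning just described; they are not the Table 1 values, and the function name forward_probs is ours.

```python
import numpy as np

OIL, WATER, SHALE = 0, 1, 2

def forward_probs(x_above_t, x_above_prev, x_j_prev, x_below_prev):
    """Sketch of p(x_j^t | x_{j-1}^t, x_{j-1}^{t-1}, x_j^{t-1}, x_{j+1}^{t-1})
    as a length-3 probability vector over the classes (0, 1, 2).

    The hard constraints of Eqs. (38)-(39) are exact; the fluid
    probabilities below are placeholders, NOT the Table 1 values.
    """
    # Eq. (38): shale persists with probability 1.
    if x_j_prev == SHALE:
        return np.array([0.0, 0.0, 1.0])
    # Eq. (39): a sandstone location never turns into shale, so the
    # probability of class 2 is zero in both branches below.
    if x_j_prev == WATER:
        # Water tends to remain, especially if the site above is water.
        p_water = 0.95 if x_above_t == WATER else 0.85
    else:  # x_j_prev == OIL
        # Oil tends to remain unless the neighbourhood is water-filled.
        flooded = (x_above_t == WATER and x_above_prev == WATER
                   and x_below_prev == WATER)
        p_water = 0.90 if flooded else 0.05
    return np.array([1.0 - p_water, p_water, 0.0])
```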

The probabilities in Eqs. (36) and (37) we simply obtain from the values set for the probabilities in Eq. (35) by defining the values lying outside the simulated lattice to be zero. For $x^1$ we require that all elements equal 0 or 2, and assume the elements to be independent with $p_{x^1_j}(x^1_j = 2) = 1/40$ and $p_{x^1_j}(x^1_j = 0) = 1 - 1/40$. This results in a vector $x^1$ with a few (typically one node thick) layers of shale, with the remaining elements being oil-filled sandstone. One realisation of the specified Markov process $\{x^t\}_{t=1}^T$ is shown in Figure 7(a). This realisation is also used to simulate the observations used in the simulation example.
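Continuing the sketch, a realisation of $\{x^t\}_{t=1}^T$ can be simulated by drawing $x^1$ element-wise and then applying the factorisation in Eq. (34) site by site. The boundary handling below (replacing out-of-lattice neighbours with the site's own previous value) is an illustrative simplification of Eqs. (36)-(37), not the paper's exact definition; the forward_probs function and imports from the previous sketch are assumed.

```python
def simulate_x1(n, rng, p_shale=1.0 / 40.0):
    """Initial state x^1: independent elements, shale w.p. 1/40, else oil."""
    return np.where(rng.random(n) < p_shale, SHALE, OIL)

def simulate_chain(T=100, n=200, seed=0):
    """Simulate a realisation of {x^t} via the factorisation in Eq. (34)."""
    rng = np.random.default_rng(seed)
    x = np.empty((T, n), dtype=int)
    x[0] = simulate_x1(n, rng)
    for t in range(1, T):
        for j in range(n):
            # Out-of-lattice neighbours are replaced by the site's own
            # previous value -- an illustrative stand-in for Eqs. (36)-(37).
            above_t = x[t, j - 1] if j > 0 else x[t - 1, j]
            above_prev = x[t - 1, j - 1] if j > 0 else x[t - 1, j]
            below_prev = x[t - 1, j + 1] if j < n - 1 else x[t - 1, j]
            p = forward_probs(above_t, above_prev, x[t - 1, j], below_prev)
            x[t, j] = rng.choice(3, p=p)
    return x
```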

For the likelihood $f_{y^t|x^t}(y^t|x^t)$, we know from Section 4 that it is sufficient to specify $f_{y^t_j|x^t_j}(y^t_j|x^t_j)$, since the elements of $y^t$ are assumed to be conditionally independent given $x^t$, with $y^t_j$ depending only on $x^t_j$. To avoid a likelihood that imposes an ordering of the three possible values of $x^t_j$, we let $y^t_j$ be a vector with two components, $y^t_j = (y^t_{j,1}, y^t_{j,2})$, and choose $f_{y^t_j|x^t_j}(y^t_j|x^t_j)$ as a bivariate Gaussian distribution $\mathcal{N}(y^t_j; \mu(x^t_j), \Sigma)$ with mean vector

$$\mu(x^t_j) = \begin{cases} (0, 0) & \text{if } x^t_j = 0,\\ (1, 0) & \text{if } x^t_j = 1,\\ (\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}) & \text{if } x^t_j = 2, \end{cases} \quad (40)$$

and covariance matrix $\Sigma = \sigma^2 I$.

Table 1: Simulation experiment: Probabilities defining the true forward model.

As illustrated in Figure 8, the mean vectors $\mu(0)$, $\mu(1)$ and $\mu(2)$ are chosen to lie at the vertices of an equilateral triangle with unit side lengths, which avoids imposing an ordering of the three classes. In this simulation experiment we assume that the true likelihood model $p_{y^t|x^t}(y^t|x^t)$ and the assumed likelihood model $f_{y^t|x^t}(y^t|x^t)$ are equal. As such, the assumed likelihood model $f_{y^t|x^t}(y^t|x^t)$ is used to generate the observation process $\{y^t\}_{t=1}^T$. Specifically, using the simulated Markov process shown in Figure 7(a) and setting $\sigma = 1.0$, we generate $\{y^t\}_{t=1}^T$ by simulating, independently for each $j = 1, \dots, 200$ and $t = 1, \dots, 100$,

$$y^t_j \sim f_{y^t_j|x^t_j}(\cdot \mid x^t_j).$$

An image of $\{(y^t_{j,1}, j = 1, \dots, n)\}_{t=1}^T$ is shown in Figure 7(b), and an image of $\{(y^t_{j,2}, j = 1, \dots, n)\}_{t=1}^T$ is shown in Figure 7(c).
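For completeness, a minimal sketch of this observation step, assuming a realisation x with values in {0, 1, 2} such as the output of simulate_chain above:

```python
import numpy as np

# Mean vectors of Eq. (40): vertices of a unit-side equilateral triangle,
# so no ordering is imposed on the three classes.
MU = np.array([
    [0.0, 0.0],                 # mu(0): oil
    [1.0, 0.0],                 # mu(1): water
    [0.5, np.sqrt(3.0) / 2.0],  # mu(2): shale
])

def simulate_observations(x, sigma=1.0, seed=1):
    """Draw y_j^t ~ N(mu(x_j^t), sigma^2 I) independently over j and t."""
    rng = np.random.default_rng(seed)
    return MU[x] + sigma * rng.standard_normal(x.shape + (2,))
```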

Figure 7: Simulation experiment: (a) the latent Markov process $\{x^t\}_{t=1}^{100}$, (b) the first coordinates $\{(y^t_{j,1}, j = 1, \dots, 200)\}_{t=1}^{100}$ of the observation process $\{y^t\}_{t=1}^T$, and (c) the second coordinates $\{(y^t_{j,2}, j = 1, \dots, 200)\}_{t=1}^{100}$.

Figure 8: Simulation experiment: Illustration of the assumed likelihood model $f_{y^t_j|x^t_j}(y^t_j|x^t_j)$.

When running the proposed updating procedure, we need to set a value for $\nu$, the order of the assumed Markov chain model $f_{x^t|\theta^t}(x^t|\theta^t)$, and a value for the integer $d$ in Eq. (28), which determines the structure of $q(x^{t,(i)}, z^{t,(i)}; \theta^t, y^t)$.

High values of $\nu$ and $d$, and especially high values of $d$, make the construction of $q(x^{t,(i)}, z^{t,(i)}; \theta^t, y^t)$ computationally demanding. Below, we investigate the two values $\nu = 1$ and $\nu = 2$, and for each of these we consider the three values $d = 1$, $d = 2$ and $d = 3$. Thereby, we have six combinations, or cases, for $(\nu, d)$. For each of these six cases, we perform five independent runs, using ensemble size $M = 20$.

For each run, an initial ensemble $\{x^{1,(1)}, \dots, x^{1,(M)}\}$ is generated by simulating independent samples from the initial model $p_{x^1}(x^1)$ of the Markov process specified above. The hyper-parameters $a^t_0(0), \dots, a^t_0(K-1)$, $a^{t,j}_i(0), \dots, a^{t,j}_i(K-1)$ of the prior distribution $f_t(\theta^t)$ for $\theta^t$ (cf. Section A.1 in Appendix A) at each time step $t$ are all set equal to one, and 500 iterations are used in the MCMC simulation of $\theta^{t,(i)} \mid x^{t,(i)}, y^t$ (cf. Section A.2 in Appendix A).

Table 2: Results from simulation experiment: Proportion of correctly classified variables $x^t_j$ obtained with the MAP estimates in Eq. (41) computed in five independent runs.

  $d=1, \nu=1$   $d=2, \nu=1$   $d=3, \nu=1$   $d=1, \nu=2$   $d=2, \nu=2$   $d=3, \nu=2$
  0.8649         0.8903         0.8912         0.8472         0.8831         0.8688

5.2 Results

To evaluate the performance of the proposed approach, we first compute, for each of the five runs of each of the six combinations of $(\nu, d)$, the maximum a posteriori probability (MAP) estimate $\hat{x}^t_j$ of $x^t_j$, $t = 1, \dots, T$, $j = 1, \dots, n$,

$$\hat{x}^t_j = \underset{k}{\operatorname{argmax}}\ \hat{p}^t_j(k), \quad (41)$$

where

$$\hat{p}^t_j(k) = \frac{1}{M} \sum_{i=1}^{M} 1(z^{t,(i)}_j = k), \quad k = 0, 1, 2, \quad (42)$$

is an estimate of the marginal filtering probability $p_{x^t_j|y^{1:t}}(k|y^{1:t})$. Figure 9 shows images of the computed MAP estimates $\{\hat{x}^t_j, j = 1, \dots, n\}_{t=1}^T$ from one of the five runs performed for each of the six cases. From a visual inspection, it seems that in all cases we manage to capture the main characteristics of the true $x^t$-process in Figure 7(a), but the MAPs shown in Figures 9(a) and (d), which are obtained using $d = 1$, are possibly a bit noisier than the others. Table 2 lists the proportion of correctly classified variables $x^t_j$ based on the MAPs obtained from the five independent runs of each case. According to Table 2, we classify around 85-90% of the variables correctly, and the best results are obtained with the combinations $\nu = 1, d = 2$ and $\nu = 1, d = 3$, i.e. when adopting a first-order Markov chain ($\nu = 1$) for $f_{x^t|\theta^t}(x^t|\theta^t)$ and using $2 \times 2$- or $2 \times 3$-cliques ($d = 2$ or $d = 3$) in the construction of $q(x^{t,(i)}, z^{t,(i)}; \theta^{t,(i)}, y^t)$.
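In code, the MAP computation of Eqs. (41) and (42) reduces to a vote count over the ensemble followed by an argmax. A minimal sketch, assuming the samples $z^{t,(i)}_j$ are stored in an integer array of shape (M, T, n); this layout is our assumption, not the paper's:

```python
import numpy as np

def map_estimate(z, K=3):
    """Eqs. (41)-(42): ensemble class frequencies, then argmax over k.

    z: integer array of shape (M, T, n) holding the samples z_j^{t,(i)}.
    Returns the MAP estimates (T, n) and the frequencies p_hat (K, T, n).
    """
    # p_hat[k, t, j] = (1/M) sum_i 1(z[i, t, j] == k)      -- Eq. (42)
    p_hat = np.stack([(z == k).mean(axis=0) for k in range(K)])
    x_map = p_hat.argmax(axis=0)                           # Eq. (41)
    return x_map, p_hat

# Proportion of correctly classified variables, as reported in Table 2:
# x_map, p_hat = map_estimate(z)
# accuracy = (x_map == x_true).mean()
```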

To further investigate the performance of the proposed approach, we estimate for each $j$ and $t$ the probability that $z^{t,(i)}_j$ equals the true value $x^t_j$, and we do this for each of the classes $k = 0, 1, 2$. Specifically, for each run and for each $j$ and $t$, we compute the estimated probability $\hat{p}^t_j(x^t_j)$ of the true class and regard it as a sample of $\pi_{k|k}$ with $k = x^t_j$.


Table 3: Results from simulation experiment: Estimated probabilities for observing $z^{t,(i)}_j$ equal to the true value $x^t_j$ for each class $k = 0, 1, 2$.

                      $d=1, \nu=1$   $d=2, \nu=1$   $d=3, \nu=1$   $d=1, \nu=2$   $d=2, \nu=2$   $d=3, \nu=2$
  $\bar{\pi}_{0|0}$   0.8210         0.8685         0.8687         0.8151         0.8587         0.8848
  $\bar{\pi}_{1|1}$   0.7558         0.7837         0.7964         0.7508         0.7840         0.7590
  $\bar{\pi}_{2|2}$   0.7423         0.7480         0.7412         0.6935         0.7285         0.6985
  $\bar{\pi}$         0.7730         0.8001         0.8021         0.7531         0.7904         0.7808

There are, in the latent $x^t$-process shown in Figure 7(a), 11929 variables $x^t_j$ taking the value 0, 7271 variables taking the value 1, and 800 variables taking the value 2. Thereby, since we run each of the six $(\nu, d)$ combinations five times, we obtain for each $(\nu, d)$ combination $5 \cdot 11929$ samples of $\pi_{0|0}$, $5 \cdot 7271$ samples of $\pi_{1|1}$ and $5 \cdot 800$ samples of $\pi_{2|2}$. We denote the corresponding sample means by $\bar{\pi}_{0|0}$, $\bar{\pi}_{1|1}$ and $\bar{\pi}_{2|2}$, and we let $\bar{\pi} = \frac{1}{3}(\bar{\pi}_{0|0} + \bar{\pi}_{1|1} + \bar{\pi}_{2|2})$. Figure 10 presents histograms constructed from the samples of $\pi_{0|0}$, $\pi_{1|1}$ and $\pi_{2|2}$ for each case, and Table 3 summarises the corresponding computed values of $\bar{\pi}_{0|0}$, $\bar{\pi}_{1|1}$, $\bar{\pi}_{2|2}$ and $\bar{\pi}$. The values of $\bar{\pi}$ indicate that, again, we obtain the best results using $\nu = 1, d = 2$ and $\nu = 1, d = 3$. Computationally, using $d = 3$ is more demanding, and since the improvement it offers over $d = 2$ is only minor, the best approach may be to use $\nu = 1, d = 2$.
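The Table 3 summaries can be obtained from the same frequency estimates as in Eq. (42): collect, at every site, the estimated probability of the true class, and average these samples within each true class. A sketch under the same assumed array layout as above:

```python
import numpy as np

def class_conditional_means(p_hat, x_true, K=3):
    """Compute the per-class means of Table 3 from the Eq. (42) estimates.

    p_hat:  array (K, T, n) of estimated marginal filtering probabilities.
    x_true: array (T, n) of true latent classes.
    Returns (array of pi_bar_{k|k} for k = 0..K-1, overall mean pi_bar).
    """
    T, n = x_true.shape
    # pi[t, j] = p_hat[x_true[t, j], t, j]: the estimated probability that
    # a sample z_j^{t,(i)} equals the true class at site (t, j).
    pi = p_hat[x_true, np.arange(T)[:, None], np.arange(n)[None, :]]
    per_class = np.array([pi[x_true == k].mean() for k in range(K)])
    return per_class, per_class.mean()
```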