Bayesian inversion of seismic data using multimodal selection Gaussian prior models


ISBN 978-82-326-6231-9 (printed ver.) ISBN 978-82-326-5505-2 (electronic ver.) ISSN 1503-8181 (printed ver.) ISSN 2703-8084 (online ver.)

Doctoral theses at NTNU, 2021:395

Ole Bernhard Forberg


Doctoral thesis


Thesis for the Degree of Philosophiae Doctor Trondheim, November 2021

Norwegian University of Science and Technology

Faculty of Information Technology and Electrical Engineering Department of Mathematical Sciences


Printed by NTNU Grafisk senter

Ole Bernhard Forberg August 22, 2021


Preface

This thesis is submitted in partial fulfillment of the requirements for the degree of Philosophiae Doctor (PhD) at the Norwegian University of Science and Technology (NTNU). The research is jointly funded by the Uncertainty in Reservoir Evaluation (URE) consortium, Aker BP and the Department of Mathematical Sciences (IMF), and is carried out at IMF.

I would like to thank my supervisor at the Department of Mathematical Sciences at NTNU, Professor Henning Omre, for his guidance throughout the last four years. Thank you for the knowledge you have imparted to me, which includes but is not limited to statistical topics. I appreciate that we could converse about various topics and that you were always willing to share your point of view.

I would also like to thank my co-supervisor Øyvind Kjøsnes at Aker BP for his guidance and for the collaboration. Thank you for sharing your expertise on geophysics and seismic reservoir characterization; it improved my understanding of the subjects. I would also like to extend a special thanks for your added availability while working on our most recent paper.

Finally, I would like to thank Assistant Professor Dario Grana at the University of Wyoming (UW) for the collaboration and the opportunity to visit UW. I really appreciate the hospitality you showed during my visit; it made my stay at UW a delight. I also appreciate your willingness to share your knowledge and your being there whenever I had a question.


Thesis Outline

Introduction

Bayesian inversion

Markov chain Monte Carlo simulation

Reservoir data

Seismic AVO data

Well data

Bayesian reservoir characterization

Likelihood model

Prior model

References

Summary of papers

Paper I: Bayesian seismic AVO inversion for reservoir variables with bimodal spatial histograms

Paper II: Bayesian Inversion of Time-Lapse Seismic AVO Data for Multimodal Reservoir Properties

Paper III: Bayesian seismic AVO inversion using a laterally coupled multimodal prior model


Background


Introduction

The success of a mathematical description of the world has led to an abundance of inverse problems in science and engineering. An inverse problem can be associated with any mathematical equation that describes the causal influence of a variable $r \in \mathcal{R}$ on an observable variable $d \in \mathcal{D}$, and this type of equation is termed a forward model. Here, the sets $\mathcal{R}$ and $\mathcal{D}$ are taken to be Hilbert spaces. In many applications a forward model can be theoretically based and captured by a function $f : \mathcal{R} \to \mathcal{D}$; hence, the forward model can be expressed

$$[d \mid r] = f(r). \tag{1}$$

Computing the observation $d$ for a given configuration of the causal variable $r$ is the forward problem associated with the forward model, whereas computing $r$ for a given $d$ is the inverse problem.

An inverse problem is said to be well-posed if a unique solution exists and the solution changes continuously with variations in the observations (Hadamard, 1902, 1923). Well-posed problems are desirable because they in principle can be solved exactly by stable algorithms. Problems that are not well-posed are said to be ill-posed, and inverse problems of this type are often approached by regularization techniques. Regularization techniques impose restrictions on the solution that make the regularized problem well-posed. Tikhonov regularization is a classical regularization technique (Tikhonov, 1963) that offers a solution $\hat{r}$ to ill-posed problems through minimization of an objective function,

$$\hat{r} = \underset{r}{\operatorname{argmin}} \left\{ \| f(r) - d \| + \alpha \| r - r^* \| \right\}. \tag{2}$$

Here, $\alpha > 0$ is a small parameter and $r^*$ is a reasonable guess for the solution. The regularization and associated minimization are most commonly carried out with respect to the $L_2$ norm, but other norms may be used. Nevertheless, the minimization problem is in general complicated and requires the use of a suitable numerical optimization procedure.
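As a minimal sketch of Equation 2, assume a linear forward function $f(r) = Fr$ and squared $L_2$ norms, for which the minimizer is available in closed form through the normal equations. The matrix `F`, the data `d`, the weight `alpha` and the function name `tikhonov_solve` are hypothetical choices made for illustration only; a non-linear forward function would instead require an iterative optimizer.

```python
import numpy as np

def tikhonov_solve(F, d, alpha, r_star):
    """Minimize ||F r - d||^2 + alpha * ||r - r_star||^2 (assumed linear model, squared L2 norms)."""
    n_r = F.shape[1]
    # Setting the gradient to zero gives (F^T F + alpha I) r = F^T d + alpha r_star.
    lhs = F.T @ F + alpha * np.eye(n_r)
    rhs = F.T @ d + alpha * r_star
    return np.linalg.solve(lhs, rhs)

# Hypothetical ill-conditioned example: the two columns of F are nearly collinear.
F = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
r_true = np.array([1.0, 2.0])
d = F @ r_true
alpha = 1e-3
r_star = np.zeros(2)  # assumed prior guess for the solution
r_hat = tikhonov_solve(F, d, alpha, r_star)
```

The penalty term shifts the eigenvalues of $F^T F$ by $\alpha$, which is what makes the regularized system stable to solve even when $F^T F$ itself is close to singular.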


We have so far discussed inverse problems in which the observations are assumed to be exact. In real world applications, the assumption of exact observations tends to be unreasonable. Observations are burdened by measurement errors which, even for well-posed problems, may entail that the correct solution cannot be found. Measurement errors can be detrimental for so-called ill-conditioned problems, in which small errors in the observations yield much greater errors in the solution. This type of inverse problem can be approached by regularization techniques. However, numerical approaches yield solutions that are point estimates; hence, the inherent uncertainties of the observations are not reflected. A probabilistic approach that incorporates the measurement errors therefore seems more reasonable, and it can provide a solution in the form of a probability statement. In a probabilistic setting, the forward model takes the form

$$[d \mid r] = f(r) + e_{d \mid r}, \tag{3}$$

where $e_{d \mid r}$ is a stochastic variable representing the measurement errors. The probabilistic forward model has an associated probability density function (pdf) $p(d \mid r)$, which is termed the likelihood model.

Solving the inverse problem in a statistical framework involves identifying an estimator $\hat{r}(d) : \mathcal{D} \to \mathcal{R}$, from which an estimate of the unknown causal variable $r$ can be computed. Estimators are often evaluated according to a loss function $L(r, \hat{r}(d))$, which is typically taken to be the sum of squared residuals, i.e., $L(r, \hat{r}(d)) = \| r - \hat{r}(d) \|^2$. The expected loss of the estimator is captured by the risk function $S_{\hat{r}}(r)$, which is defined pointwise in $\mathcal{R}$ as

$$S_{\hat{r}}(r) = \int_{\mathcal{D}} L(r, \hat{r}(d)) \, \mathrm{d}p(d \mid r). \tag{4}$$

In the minimax approach, the optimal estimator is the estimator that minimizes the maximum risk. That is, the minimax risk is

$$S_{mm} = \inf_{\hat{r}} \sup_{r \in \mathcal{R}} S_{\hat{r}}(r),$$

and the optimal minimax estimator is the one that achieves this risk. The minimax approach is a general principle for estimator selection, but it tends to be infeasible due to computational complexity, and results regarding the assessment of estimator uncertainty are limited. However, Stark (1992) reports results on minimax confidence intervals for a linear forward function (Backus, 1989; Donoho, 1989).

Alternatively, estimator selection can be based on minimization of the average risk. This is the framework of Bayesian statistics, in which optimal estimators minimize the average risk. In practice, estimation and uncertainty quantification are readily available in a Bayesian framework, although substantial computational effort may be needed in some cases. We therefore further pursue inverse problems in a Bayesian framework.

Bayesian inversion

We now consider $r$ and $d$ to be real valued random vectors of dimension $n_r$ and $n_d$, respectively. In a Bayesian framework, the solution to the inverse problem is the posterior model $p(r \mid d)$, which can be used for estimation and uncertainty assessment. The posterior model is the probability density function given by

$$p(r \mid d) = \frac{p(d \mid r)\, p(r)}{p(d)} \propto p(d \mid r)\, p(r). \tag{5}$$

Here, $p(r)$ is the prior model, which represents the prior knowledge and beliefs that the modeler has about $r$. Furthermore, $p(d \mid r)$ is the likelihood model describing the relationship between $r$ and $d$. The normalizing constant $p(d)$ tends to be challenging to compute and can make it difficult to obtain posterior models.

For certain combinations of prior models and likelihood models, the form of the posterior model is known. A class of prior models $C$ is said to be conjugate with respect to a likelihood model if the corresponding class of posterior models is also $C$ (Casella and Berger, 2001). Gaussian prior models provide perhaps the best-known examples of conjugacy; Gaussian prior models are conjugate with respect to so-called Gauss-linear likelihood models (Tarantola, 2005), which are likelihood models that are Gaussian with expectation linear in $r$. This relation provides the foundation of geostatistics and Kriging interpolation (Chilès and Delfiner, 1999), and because Gaussian prior models can adequately represent many phenomena, the relation is often used in Bayesian inversion frameworks because the posterior models are analytically available. Moreover, predictive quantities can be computed directly from the parameters of the posterior model.

The $n_r$-dimensional random vector $r$ is a Gaussian random field (GRF) with $n_r$-dimensional expectation vector $\mu_r$ and $(n_r \times n_r)$ covariance matrix $\Sigma_r$, if its pdf is of the form (Johnson and Wichern, 2007)

$$p(r) = (2\pi)^{-n_r/2}\, |\Sigma_r|^{-1/2} \exp\left\{ -\frac{1}{2} (r - \mu_r)^T \Sigma_r^{-1} (r - \mu_r) \right\}, \tag{6}$$

with the superscript $T$ indicating the matrix transpose. We denote this Gaussian pdf by $\phi_{n_r}(r; \mu_r, \Sigma_r)$. A Gauss-linear forward model is of the form $[d \mid r] = Fr + e_{d \mid r}$, where $F$ is an $(n_d \times n_r)$ matrix representing the linear operation on $r$ and the $n_d$-dimensional vector $e_{d \mid r}$ is an error term with Gaussian distribution. We assume the error term to be zero in expectation; hence, the likelihood model can be expressed

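$$p(d \mid r) = \phi_{n_d}(d; Fr, \Sigma_{d \mid r}), \tag{7}$$

where $\Sigma_{d \mid r}$ is the associated $(n_d \times n_d)$ covariance matrix. The Gaussian posterior model $\phi_{n_r}(r; \mu_{r \mid d}, \Sigma_{r \mid d})$ is easily available through computation of its conditional parameters (Johnson and Wichern, 2007),

$$\begin{aligned}
\mu_{r \mid d} &= \mu_r + \Gamma_{rd} \Sigma_d^{-1} (d - \mu_d), \\
\Sigma_{r \mid d} &= \Sigma_r - \Gamma_{rd} \Sigma_d^{-1} \Gamma_{rd}^T,
\end{aligned} \tag{8}$$

where $\Gamma_{rd} = \Sigma_r F^T$ contains the inter-variable covariances between $r$ and $d$. Moreover, $\mu_d = F\mu_r$ and $\Sigma_d = F \Sigma_r F^T + \Sigma_{d \mid r}$.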
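The conditional parameters of the Gauss-linear model can be computed in a few lines. The sketch below is illustrative only: the function name `gauss_linear_posterior` and the scalar test case are hypothetical, and the marginal data covariance is taken to be $F \Sigma_r F^T + \Sigma_{d \mid r}$, i.e., the prior variability mapped through $F$ plus the error covariance.

```python
import numpy as np

def gauss_linear_posterior(mu_r, Sigma_r, F, Sigma_e, d):
    """Posterior mean and covariance of r given d = F r + e, with e ~ N(0, Sigma_e)."""
    Gamma_rd = Sigma_r @ F.T                 # cross-covariance between r and d
    mu_d = F @ mu_r                          # marginal expectation of d
    Sigma_d = F @ Sigma_r @ F.T + Sigma_e    # marginal covariance of d
    # K = Gamma_rd Sigma_d^{-1}, computed with a linear solve instead of an explicit inverse.
    K = np.linalg.solve(Sigma_d, Gamma_rd.T).T
    mu_post = mu_r + K @ (d - mu_d)
    Sigma_post = Sigma_r - K @ Gamma_rd.T
    return mu_post, Sigma_post

# Hypothetical scalar case: prior N(0, 1), direct observation with unit error variance.
mu_post, Sigma_post = gauss_linear_posterior(
    mu_r=np.array([0.0]),
    Sigma_r=np.array([[1.0]]),
    F=np.array([[1.0]]),
    Sigma_e=np.array([[1.0]]),
    d=np.array([2.0]),
)
```

In this scalar case the prior and the observation are equally informative, so the posterior mean lies halfway between the prior mean and the observation and the variance is halved.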

If a non-conjugate prior model is used, assessment of the posterior model tends to be simulation based. Markov chain Monte Carlo (McMC) is a widely used simulation technique for posterior model assessment (Mosegaard and Tarantola, 1995; Sen and Stoffa, 1996; Eidsvik et al., 2004a).

Markov chain Monte Carlo simulation

McMC simulation schemes make use of Markov chains to simulate from the target distribution $\pi(r) = p(r \mid d)$. The chain requires a transition kernel $p_t(r' \mid r)$, by which the chain can enter new states $r'$ given its current state $r$. Such a transition kernel must ensure that the target distribution $\pi(\cdot)$ is the equilibrium distribution of the chain, i.e.,

$$\int_B \pi(r) \, \mathrm{d}r = \int p_t(B \mid r)\, \pi(r) \, \mathrm{d}r, \quad \forall B \in \mathcal{B}, \tag{9}$$

where

$$p_t(B \mid r) = \int_B p_t(r' \mid r) \, \mathrm{d}r' \tag{10}$$

and $\mathcal{B}$ is the Borel $\sigma$-field on $\mathbb{R}^{n_r}$. This condition is ensured by using reversible chains where the transition kernel satisfies the detailed balance equation (Gamerman and Hedibert, 2006),

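$$\pi(r)\, p_t(r' \mid r) = \pi(r')\, p_t(r \mid r'). \tag{11}$$

The transition kernel is constructed from a proposal kernel $q(r' \mid r)$ and an acceptance probability $\alpha(r' \mid r)$, and their product defines the probability to move to a new state. Hence, the complement of the integral of the transition kernel over all new states defines the probability to remain in the current state, and the transition kernel can be expressed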

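$$p_t(B \mid r) = \int_B q(r' \mid r)\, \alpha(r' \mid r) \, \mathrm{d}r' + I(r \in B) \left[ 1 - \int q(r' \mid r)\, \alpha(r' \mid r) \, \mathrm{d}r' \right], \quad \forall B \in \mathcal{B}, \tag{12}$$

where $I(\cdot)$ is the indicator function, being equal to 1 if its argument is true and 0 otherwise.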

Identifying a transition kernel $p_t(r' \mid r)$ that can produce a Markov chain with the target distribution $\pi(\cdot)$ as its equilibrium distribution may appear to be a daunting task. Conveniently, the Metropolis-Hastings (M-H) algorithm (Hastings, 1970) provides the means by which to do so. The M-H algorithm designates an acceptance probability that ensures that the transition kernel $p_t(\cdot \mid \cdot)$ defines a reversible chain when combined with an arbitrary proposal kernel $q(\cdot \mid \cdot)$,

$$\alpha(r' \mid r) = \min\left\{ 1, \frac{\pi(r')\, q(r \mid r')}{\pi(r)\, q(r' \mid r)} \right\}. \tag{13}$$

The M-H algorithm defines an irreducible and aperiodic chain with transition kernel $p_t(\cdot \mid \cdot)$ and with $\pi(\cdot)$ as its limiting distribution if the proposal kernel $q(\cdot \mid \cdot)$ is aperiodic and irreducible and $\alpha(r' \mid r) > 0$ for all possible values of $(r', r)$ (Roberts and Smith, 1994).

The sequence of simulations $\{r_k\}_{k=1}^{n}$ from the McMC algorithm converges in distribution to the target distribution $\pi(r)$ as $n \to \infty$. Hence, obtaining a result in finite time necessitates decisions about where to start and when to stop the algorithm. Usually, the initial state of a chain is drawn at random and will not be from a representative region of $\pi(r)$. The subsequent simulations are notably influenced by the initial state until the chain is sufficiently close to $\pi(r)$. This initial sequence of simulations is termed the burn-in of the chain, and we discard it because these simulations are not representative of $\pi(r)$.
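The following is a minimal sketch of an M-H sampler with burn-in removal, not the simulation scheme used in the papers. It assumes a symmetric Gaussian random-walk proposal, so the proposal kernels cancel in the acceptance ratio, and a hypothetical standard Gaussian target; the names `metropolis_hastings`, `log_target`, `step` and the burn-in length of 1000 are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_target, r0, n_iter, step, rng):
    """Random-walk M-H sampler; returns the full chain and the acceptance rate."""
    r = np.asarray(r0, dtype=float)
    log_p = log_target(r)
    chain = np.empty((n_iter, r.size))
    n_accept = 0
    for k in range(n_iter):
        # Symmetric proposal: q(r'|r) = q(r|r'), so only the target ratio remains.
        r_new = r + step * rng.standard_normal(r.size)
        log_p_new = log_target(r_new)
        # Accept with probability min{1, pi(r') / pi(r)}, evaluated on the log scale.
        if np.log(rng.uniform()) < log_p_new - log_p:
            r, log_p = r_new, log_p_new
            n_accept += 1
        chain[k] = r
    return chain, n_accept / n_iter

def log_target(r):
    # Hypothetical target: standard Gaussian, known up to a normalizing constant.
    return -0.5 * np.sum(r ** 2)

rng = np.random.default_rng(0)
# Deliberately unrepresentative initial state, far out in the tails of the target.
chain, acc_rate = metropolis_hastings(log_target, r0=[5.0, -5.0], n_iter=5000, step=1.0, rng=rng)
samples = chain[1000:]  # discard the burn-in (length chosen by assumption)
```

Only ratios of the target density enter the algorithm, which is why the normalizing constant $p(d)$ never needs to be computed.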

For practical purposes, we draw inspiration from the Gelman-Rubin convergence diagnostic (Gelman and Rubin, 1992) to decide when to stop a chain, unless otherwise stated. We run several chains in parallel and compare the results to determine an appropriate stopping point. The time necessary to reach a stopping point depends on how efficiently the chain navigates the target distribution $\pi(r)$, which is captured by the concept of mixing. A chain is said to have good mixing if approximately independent simulations are not far apart in the sequence of simulations, and conversely a chain has bad mixing if approximately independent simulations are far apart. Good mixing is preferable because it entails that the chain rapidly arrives at an acceptable degree of convergence. The mixing of a chain is influenced by the proposal kernel $q(\cdot \mid \cdot)$; hence, the proposal kernel should be selected with care if computational efficiency is a concern.
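For reference, the standard potential scale reduction factor of Gelman and Rubin (1992) can be computed as sketched below for a scalar quantity monitored in several parallel chains. This is the textbook form of the diagnostic and not necessarily the exact stopping criterion applied in this work; the function name `gelman_rubin` and the synthetic chains are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m, n) array of m chains of length n."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # average within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of the target variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
# Synthetic stand-ins for chain output: four chains sampling the same distribution ...
mixed = rng.standard_normal((4, 2000))
# ... and four chains stuck around different levels.
stuck = mixed + np.array([0.0, 3.0, 6.0, 9.0])[:, None]
r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree, whereas values well above 1 indicate that the chains have not yet explored the same distribution.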

In a setting where the form of the posterior model is unknown, McMC simulation is traditionally performed by using the prior model $p(r)$ as proposal distribution (Mosegaard and Tarantola, 1995). The associated M-H acceptance probability is given by

$$\alpha(r' \mid r) = \min\left\{ 1, \frac{p(d \mid r')}{p(d \mid r)} \right\}. \tag{14}$$

Although this approach is generally valid, it tends to suffer from low acceptance rates in multivariate settings. Moreover, in the absence of a very informative prior model there is an inverse relationship between acceptance rates and data quality. The acceptance rate increases as the uncertainty associated with the measured data increases, and decreases as the number of data points increases.
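The relationship between acceptance rate and data quality can be illustrated with a small numerical experiment. The sketch below assumes a hypothetical scalar model with prior $r \sim N(0, 1)$ and observations $d_i = r + e_i$, $e_i \sim N(0, \sigma^2)$; the function name, the sample sizes and the noise levels are illustrative choices only.

```python
import numpy as np

def prior_proposal_acceptance_rate(d, sigma, n_iter, rng):
    """Acceptance rate of an M-H sampler that proposes from the prior N(0, 1)."""
    def log_lik(r):
        # Gaussian log-likelihood of the observations, up to an additive constant.
        return -0.5 * np.sum((d - r) ** 2) / sigma ** 2

    r = rng.standard_normal()
    ll = log_lik(r)
    n_accept = 0
    for _ in range(n_iter):
        r_new = rng.standard_normal()   # proposal drawn from the prior model
        ll_new = log_lik(r_new)
        # Acceptance probability reduces to the likelihood ratio of Equation 14.
        if np.log(rng.uniform()) < ll_new - ll:
            r, ll = r_new, ll_new
            n_accept += 1
    return n_accept / n_iter

rng = np.random.default_rng(2)
r_true = 0.5
noisy = r_true + 2.0 * rng.standard_normal(5)     # uncertain observations
precise = r_true + 0.1 * rng.standard_normal(5)   # accurate observations
rate_noisy = prior_proposal_acceptance_rate(noisy, 2.0, 4000, rng)
rate_precise = prior_proposal_acceptance_rate(precise, 0.1, 4000, rng)
```

With accurate observations the posterior model is much narrower than the prior model, so most prior proposals land where the likelihood is negligible and are rejected.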

If a prior model that is conjugate with respect to a linear likelihood model is used with a non-linear likelihood model, the conjugate property can be exploited for more efficient simulation by use of an approximated linear likelihood model, $p_L(d \mid r) \approx p(d \mid r)$. Proposals can then be made from an approximate posterior model $p_L(r \mid d) \propto p_L(d \mid r)\, p(r)$. Simulating from an approximate posterior model is similar to simulating from the prior model due to the conjugate property, but the proposals are likely to be better; hence, this approach may increase the acceptance rate. The associated M-H acceptance probability is

$$\alpha(r' \mid r) = \min\left\{ 1, \frac{p(d \mid r') / p_L(d \mid r')}{p(d \mid r) / p_L(d \mid r)} \right\}. \tag{15}$$

Note that $\alpha(r' \mid r) \to 1$ as the likelihood model approaches its linear approximation, $p(d \mid r) \to p_L(d \mid r)$, as expected.
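The form of Equation 15 can be verified from Equation 13. A short derivation, assuming that the proposal kernel is the approximate posterior model irrespective of the current state, $q(r' \mid r) = p_L(r' \mid d) \propto p_L(d \mid r')\, p(r')$, and using $\pi(r) \propto p(d \mid r)\, p(r)$:

$$\frac{\pi(r')\, q(r \mid r')}{\pi(r)\, q(r' \mid r)} = \frac{p(d \mid r')\, p(r')}{p(d \mid r)\, p(r)} \cdot \frac{p_L(d \mid r)\, p(r)}{p_L(d \mid r')\, p(r')} = \frac{p(d \mid r') / p_L(d \mid r')}{p(d \mid r) / p_L(d \mid r)}.$$

The prior model and the normalizing constants cancel, so only the ratio between the exact and the approximate likelihood models needs to be evaluated at the proposed and the current state.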


Reservoir data

Oil and gas reservoirs are formations of rock in the subsurface in which hydrocarbons have accumulated. Reservoirs are expensive to produce and necessitate the construction of production wells. Hence, attempts to produce reservoirs that are poorly suited for production entail large economic losses. However, producing well suited reservoirs can yield very large payoffs. The ability to identify a reservoir’s suitability for production is therefore of utmost importance in the oil and gas industry. The process of doing so is called reservoir characterization, which we will return to later. Reservoir characterization relies on data from potential reservoirs, which usually consist of seismic data and well data. Seismic data have good spatial coverage and poor precision, whereas well data have poor spatial coverage and high precision.

Seismic AVO data

Seismic data are collected by emitting compressional waves into the subsurface and registering the amplitude of the waves that are reflected back. These data can be collected from large subsurface volumes in search of potential hydrocarbon reservoirs at relatively low cost. The arrival times of the reflections relative to the time of emission are also registered. The arrival times make it possible to map the registered amplitudes to particular locations in the subsurface; hence, seismic data are a representation of the subsurface based on wave amplitudes and times.

The physics involved in the collection of seismic data is complex and relies on intimate details of the medium that the seismic waves propagate through. Seismic wave energy is reflected back at interfaces defined by rock or fluid inhomogeneities. Moreover, the proportion of reflected energy is dependent on the angle of incidence of the emitted seismic waves on the interfaces, which is described by the Zoeppritz equations (Zoeppritz, 1919). These equations relate the reflectivity coefficients at interfaces to changes in the elastic properties and angle of incidence. The elastic properties consist of P-wave velocity and S-wave velocity, which are jointly referred to as seismic velocities, and density.

The angle dependency of the reflected energy is very useful for detecting hydrocarbons, because the seismic responses of fluid transitions have a strong and characteristic angle dependency. This attribute forms the basis for seismic amplitude variation with offset (AVO) data and seismic AVO analysis. Seismic AVO data consist of seismic data associated with different angles of incidence of the emitted compressional waves on interfaces in the subsurface. For a specific subsurface target, seismic AVO data are collected by varying the horizontal distance between the source that emits the compressional waves and the receiver that measures the reflections, while keeping the subsurface target centered between them. In practice, receivers are usually placed at numerous distances from the source to obtain seismic data from different subsurface targets simultaneously. This principle is used in offshore seismic data collection, in which a large number of receivers are towed behind a moving ship that emits compressional waves into the subsurface.

As an emitted compressional wave travels through the subsurface, it becomes distorted and stretched due to dispersion, and the shape of the distorted pulse is termed a seismic wavelet. The measured reflection at a given point in time contains contributions from several reflectivity coefficients in the subsurface, weighted by the seismic wavelet. Hence, the received signal is modeled as a convolution of a sequence of reflectivity coefficients with a seismic wavelet in a time interval. In practice, the seismic wavelet is unknown and application dependent, because the dispersive process is sensitive to the medium that the seismic waves travel through.

Therefore, in order to construct a sensible seismic forward model, seismic data should be supplemented by well data from which the seismic wavelet can be estimated.

Well data

In oil and gas exploration, it is usual to drill bore-holes at locations of particular interest in a seismic survey to obtain information that enables interpretation of the seismic data. Core samples, which are informative about the subsurface lithology, are collected during drilling. Once the wells are drilled, measuring instruments are placed in the bore-holes and hoisted up while producing well log data on a regular and relatively fine grid along the well profiles. Measurements of the seismic velocities and density, as well as measurements of petrophysical properties are recorded.

The measuring instruments are highly accurate; hence, well log data are subject only to minor measurement errors and can be considered as good approximations to the truth.

The reflectivity coefficients that can be used to estimate the seismic wavelet along the wells can be computed from well data using the Zoeppritz equations. Estimating the seismic wavelet at a few locations in a seismic survey tends to be adequate because the wavelet shape varies slowly with lateral position (Walden and White, 1998), due to layering effects. Another important use of well data is in the construction of rock physics models, which are necessary to interpret seismic data in terms of petrophysical properties. The seismic data are related to the elastic properties, and a rock physics model can in turn relate the elastic properties to petrophysical properties. A rock physics model can either be based on theoretical relations or be empirically approximated.

Bayesian reservoir characterization

The goal of reservoir characterization is to evaluate a reservoir’s suitability for production. Reservoir characterization is a spatial problem in which the reservoir zone of interest is discretized into a reservoir grid $G_r$, consisting of $n_r$ grid points. A spatial reservoir variable $r$, from which the production suitability of a reservoir can be inferred, is defined on the reservoir grid $G_r$. That is, each random variable contained in $r$ is associated with a location in $G_r$, which enables spatial effects to influence the characterization. A reservoir variable consists of a few selected petrophysical properties related to the producibility of the reservoir under study and typically includes porosity and water saturation, which are volumetric fractions, i.e., quantities limited to $[0, 1]$. Porosity is informative about the presence of pores in the rock, which are pockets of empty space where hydrocarbons can settle. The degree to which the empty spaces in the rock are connected is described by permeability. The connectivity of the pores is important for fluid flow through the rock, which is crucial for the extraction of hydrocarbons. Permeability is usually strongly dependent on porosity, and since the former is complicated to measure, it is often derived from the latter. Water saturation is indicative of the location of hydrocarbons. Because water is denser than hydrocarbons, gravitational effects tend to separate the fluids. Hence, hydrocarbons can typically be found at locations where the water saturation is low. In reservoirs that are lithologically heterogeneous, porosity and water saturation may not be sufficiently informative and should preferably be accompanied by lithology variables, such as volume of clay, to explain certain phenomena. The reservoir variable $r$ contains $n_p$ selected petrophysical properties and can be expressed as $r = [r_1, ..., r_{n_p}]$, with each $r_k$ being defined on the reservoir grid $G_r$, for $k = 1, ..., n_p$. Hence, $r$ is an $n_p n_r$-dimensional vector. For ease of presentation we will consider $n_p = 1$ in the following.

Reservoir characterization of a subsurface reservoir volume is typically done by inversion of seismic data. This is an inverse problem in which seismic data $d$ on a seismic grid $G_d$, consisting of $n_d$ grid points, are sought to be explained by the reservoir variables $r$. We confine ourselves to seismic AVO data. The inversion is either cast into an optimization setting or into a probabilistic setting. In an optimization setting, a highly accurate seismic forward model is used and the deviation between $d$ and the seismic data computed by the seismic forward model is minimized with respect to the elastic properties (Sen and Stoffa, 2013). Probabilistically, the inverse problem is usually approached in a Bayesian framework using approximate Zoeppritz equations (Buland and Omre, 2003; Gunning and Glinsky, 2004; Larsen et al., 2006). The Bayesian seismic reservoir characterization is represented by the posterior model $p(r \mid d)$, which is proportional to the product of a likelihood model $p(d \mid r)$ and a prior model $p(r)$, both of which have to be specified, see Equation 5.

Likelihood model

The likelihood model represents the chain of data acquisition, from the target variables of the inversion to the data, and is based on geophysics theory and well data. Well data from representative regions of the reservoir volume are particularly important for seismic wavelet estimation and may also be used to inform the rock physics model.

The Zoeppritz equations can in principle completely describe the relation between the PP reflectivity coefficients $c(t, \theta)$ and the elastic properties along a seismic trace. The elastic properties are canonical variables of the equation and consist of the seismic velocities, which are the P-wave velocities $V_p(t)$ and the S-wave velocities $V_s(t)$, and the densities $\rho(t)$. However, the equations are difficult to interpret and their solution is unstable due to nonlinearity. Therefore, the Zoeppritz equations tend to be linearly approximated, and several linear approximations exist, including Bortfeld’s approximation (Bortfeld, 1961), Aki and Richards’ approximation (Aki and Richards, 1980), and Shuey’s approximation (Shuey, 1985). In Bayesian inversion frameworks, the Aki and Richards’ approximation is a common choice (Buland and Omre, 2003; Larsen et al., 2006; Grana and Della Rossa, 2010; Rimstad et al., 2012). We use a time continuous reflectivity function for the PP reflection coefficients, based on the Aki and Richards’ approximation (Buland and Omre, 2003),

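$$c(t, \theta) = a_1(t, \theta) \frac{\partial}{\partial t} \ln V_p(t) + a_2(t, \theta) \frac{\partial}{\partial t} \ln V_s(t) + a_3(t, \theta) \frac{\partial}{\partial t} \ln \rho(t), \tag{16}$$

where

$$a_1(t, \theta) = \frac{1}{2} \left( 1 + \tan^2(\theta) \right), \tag{17}$$

$$a_2(t, \theta) = -4 \frac{\bar{V}_s^2(t)}{\bar{V}_p^2(t)} \sin^2(\theta), \tag{18}$$

$$a_3(t, \theta) = \frac{1}{2} \left( 1 - 4 \frac{\bar{V}_s^2(t)}{\bar{V}_p^2(t)} \sin^2(\theta) \right). \tag{19}$$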

Moreover, $\bar{V}_p(t)$, $\bar{V}_s(t)$, and $\bar{\rho}(t)$ are time dependent averages which are assumed to be adequately represented by a constant or moving average in a time window. The above approximation can for $n_\theta$ offset angles be discretized and represented in matrix form as $c = ADm$. Here, we discretize according to $G_r$. Hence, $c$ is an $n_\theta n_r$-dimensional vector. Moreover, $A$ is the sparse $(n_\theta n_r \times 3n_r)$ matrix

$$A = \begin{bmatrix}
A_1(\theta_1) & A_2(\theta_1) & A_3(\theta_1) \\
\vdots & \vdots & \vdots \\
A_1(\theta_{n_\theta}) & A_2(\theta_{n_\theta}) & A_3(\theta_{n_\theta})
\end{bmatrix}, \tag{20}$$

where $A_1(\theta_i)$, $A_2(\theta_i)$, and $A_3(\theta_i)$ are $(n_r \times n_r)$ diagonal matrices containing discrete time samples of $a_1(t, \theta_i)$, $a_2(t, \theta_i)$, and $a_3(t, \theta_i)$, respectively, for $i = 1, ..., n_\theta$. Furthermore, the $(3n_r \times 3n_r)$ matrix $D$ is a first order differential operator. Lastly, the $3n_r$-dimensional vector $m$ contains the elastic properties, discretized on $G_r$, in the logarithmic domain. Hence, a reflectivity likelihood model is

$$[c \mid m] = ADm + e_{c \mid m}, \tag{21}$$

where $e_{c \mid m}$ is an $n_\theta n_r$-dimensional vector containing approximation errors.
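The matrix form $c = ADm$ can be assembled as in the sketch below. Several choices are assumptions made for illustration and are not taken from the text: $m$ is ordered as log P-wave velocity, log S-wave velocity and log density; $D$ is built from forward differences with a zero last row in each block; dense matrices are used for brevity although $A$ and $D$ are sparse; and the function names and numerical values are hypothetical.

```python
import numpy as np

def aki_richards_matrix(theta, vs_vp_ratio_sq):
    """A matrix of Equation 20 for angles theta (radians) and background ratios Vs^2/Vp^2."""
    theta = np.atleast_1d(theta)
    n_r = len(vs_vp_ratio_sq)
    blocks = []
    for th in theta:
        a1 = 0.5 * (1.0 + np.tan(th) ** 2) * np.ones(n_r)
        a2 = -4.0 * vs_vp_ratio_sq * np.sin(th) ** 2
        a3 = 0.5 * (1.0 - 4.0 * vs_vp_ratio_sq * np.sin(th) ** 2)
        # One block row per angle: [A1(theta_i), A2(theta_i), A3(theta_i)].
        blocks.append(np.hstack([np.diag(a1), np.diag(a2), np.diag(a3)]))
    return np.vstack(blocks)

def difference_operator(n_r):
    """Block-diagonal first-order difference operator (assumed forward differences)."""
    D1 = np.zeros((n_r, n_r))
    idx = np.arange(n_r - 1)
    D1[idx, idx] = -1.0
    D1[idx, idx + 1] = 1.0
    return np.kron(np.eye(3), D1)

n_r = 50
theta = np.deg2rad([0.0, 30.0])      # hypothetical offset angles
ratio = np.full(n_r, 0.25)           # hypothetical constant background Vs^2/Vp^2
A = aki_richards_matrix(theta, ratio)
D = difference_operator(n_r)

# Hypothetical elastic model: a single step of 0.2 in log P-wave velocity.
m = np.zeros(3 * n_r)
m[25:n_r] += 0.2
c = A @ D @ m
```

A single contrast in log P-wave velocity gives one non-zero reflection coefficient per angle, scaled by $a_1(\theta)$, which grows with the angle of incidence.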

The seismic AVO data $d$ are the convolution of a seismic wavelet with the reflectivity coefficients $c$. The wavelet is discretized to be consistent with the resolution of the reflectivity coefficients, and the convolutional likelihood model is

$$[d \mid c] = Wc + e_{d \mid c}. \tag{22}$$

Here, the $(n_\theta n_d \times n_\theta n_r)$ matrix $W$ contains discretizations of the seismic wavelet $W(t, \theta)$ for all $n_\theta$ offset angles, and the $n_\theta n_d$-dimensional vector $e_{d \mid c}$ contains observation errors. The wavelet matrix is of the form

$$W = \begin{bmatrix}
W_1 & 0 & \cdots & 0 \\
0 & W_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & W_{n_\theta}
\end{bmatrix}, \tag{23}$$

where the $(n_d \times n_r)$ matrices $W_i$ contain $n_d$ discretizations of the seismic wavelet, with each row corresponding to the seismic wavelet centered at a location in $G_d$.
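The block-diagonal structure of the wavelet matrix can be sketched as follows. The example assumes that $G_d$ and $G_r$ coincide with a common sampling interval and uses a Ricker wavelet with one hypothetical peak frequency per angle as a stand-in; in practice the wavelet is estimated from well data, and all names and values below are illustrative.

```python
import numpy as np

def ricker(t, f0):
    """Ricker wavelet with peak frequency f0 (assumed wavelet shape)."""
    x = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * x) * np.exp(-x)

def wavelet_block(n_d, n_r, dt, f0):
    """(n_d x n_r) block; row j holds the wavelet centered at grid location j."""
    rows = np.arange(n_d)[:, None]
    cols = np.arange(n_r)[None, :]
    return ricker((cols - rows) * dt, f0)

def wavelet_matrix(n_d, n_r, dt, f0_per_angle):
    """Block-diagonal matrix with one wavelet block per offset angle."""
    n_theta = len(f0_per_angle)
    W = np.zeros((n_theta * n_d, n_theta * n_r))
    for i, f0 in enumerate(f0_per_angle):
        W[i * n_d:(i + 1) * n_d, i * n_r:(i + 1) * n_r] = wavelet_block(n_d, n_r, dt, f0)
    return W

n = 40
dt = 0.004                      # hypothetical sampling interval in seconds
W = wavelet_matrix(n, n, dt, f0_per_angle=[30.0, 25.0])

# A single unit reflection coefficient at location 20 for the first angle.
c = np.zeros(2 * n)
c[20] = 1.0
d = W @ c
```

A single reflection coefficient reproduces the wavelet itself centered at the reflector, and the zero off-diagonal blocks keep the angles from mixing.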

The seismic likelihood model is

$$p(d \mid m) = \int p(d \mid c, m)\, p(c \mid m) \, \mathrm{d}c = \int p(d \mid c)\, p(c \mid m) \, \mathrm{d}c, \tag{24}$$

which can be expressed

$$[d \mid m] = WADm + e_{d \mid m}, \tag{25}$$

where $e_{d \mid m} = W e_{c \mid m} + e_{d \mid c}$. The error terms are typically assigned Gaussian distributions (Buland and Omre, 2003), which yields a Gauss-linear seismic likelihood model. This likelihood model can readily be used in a Bayesian seismic inversion framework for the elastic properties, given a prior model $p(m)$. A Gaussian prior model is advantageous for its conjugate property, but not required. However, if reservoir variables are the target of the inversion, a rock physics likelihood model $p(m \mid r)$ is needed.
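To make the Gauss-linear form explicit, suppose the two error terms are independent and zero-mean Gaussian with covariance matrices $\Sigma_{c \mid m}$ and $\Sigma_{d \mid c}$; these symbols and the independence assumption are introduced here for illustration. The combined error term is then zero-mean Gaussian with covariance

$$\Sigma_{d \mid m} = \operatorname{Cov}\left( W e_{c \mid m} + e_{d \mid c} \right) = W \Sigma_{c \mid m} W^T + \Sigma_{d \mid c},$$

so that the seismic likelihood model can be written

$$p(d \mid m) = \phi_{n_\theta n_d}\left( d;\, WADm,\, \Sigma_{d \mid m} \right).$$

The wavelet thus colors the approximation errors of the reflectivity model before they are added to the observation errors.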

The rock physics model can be integrated into the seismic inversion model or be used in rock physics inversion, i.e., after seismic inversion to elastic properties. However, for the uncertainties to propagate all the way from the seismic data to the reservoir variables, the data acquisition procedure should be described all the way from $r$ to $d$ by a likelihood model. The overall likelihood model is

$$p(d \mid r) = \int p(d \mid m, r)\, p(m \mid r) \, \mathrm{d}m = \int p(d \mid m)\, p(m \mid r) \, \mathrm{d}m, \tag{26}$$

where the latter step follows because $m$ are canonical variables of the Zoeppritz equations. The rock physics forward function may be non-linear and, if so, the forward function in the overall likelihood model $p(d \mid r)$ is non-linear. Under these circumstances, simulating from the posterior model tends to be computationally inefficient in spatial settings. However, if a linear forward function seems feasible, either by empirical estimation or theoretical approximation, computationally efficient inversion schemes can be based on a Gauss-linear rock physics model. The rock physics forward function is then represented by the $(3n_r \times n_r)$ matrix $B$ and the overall likelihood model takes the form

$$[d \mid r] = WADBr + e_{d \mid r}. \tag{27}$$

Prior model

The prior model $p(r)$, see Equation 5, is assigned on a subjective basis, but should be representative of $r$. A prior model is representative if it is accurately centered and realistically represents the uncertainty and spatial continuity in $r$. Prior model assignment can be based on experience, expert knowledge, beliefs, data, or any combination thereof. If representative well data of the reservoir variables are available, it is natural to adopt an empirical Bayes approach to prior model assignment, and we do so. The empirical Bayes method entails estimation of the prior model from representative data, which can be done either non-parametrically or parametrically. We will be concerned with parametric empirical Bayes, to which an introduction can be found in Casella (1985). Reservoir variables tend to be multimodal due to underlying lithology/fluid (LF) classes (Grana and Della Rossa, 2010; Rimstad et al., 2012); hence, a Gaussian prior model may not be adequate. A prior model $p(r)$ should often have support for multimodality, and two Gaussian-based parametric model alternatives have emerged in the Bayesian seismic inversion literature; namely, Gaussian mixture models and selection Gaussian models.

Gaussian mixture models

Gaussian mixture models (GMMs) can represent multimodal distributions and have successfully been applied to reservoir characterization (Grana and Della Rossa, 2010; Rimstad et al., 2012; Fjeldstad et al., 2021).

As the name implies, GMMs are generated by combining Gaussian models. Typically, the mixture is based on pre-defined LF classes that notably affect the reservoir variables; hence, the model relies on conditional Gaussian distributions for the reservoir variables, $p(r|\kappa) = \phi_{n_r}(r; \mu_{r|\kappa}, \Sigma_{r|\kappa})$, where $\kappa \in \Omega_\kappa^{n_r}$ is a spatial mode indicator variable representing the LF classes. Here, $\Omega_\kappa = \{\omega_1, \ldots, \omega_{n_\kappa}\}$ is the set of LF classes. A Gaussian mixture prior model $p(r)$ can be expressed as

$$p(r) = \sum_{\kappa \in \Omega_\kappa^{n_r}} p(r|\kappa)\, p(\kappa), \tag{28}$$

where $p(\kappa)$ is a prior model for $\kappa$. Defined as such, the prior model for the reservoir variables is a probability weighted sum of Gaussian distributions, each of which has a parameterization that is characteristic for a particular LF class.
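As a minimal illustration of Equation 28, the following sketch evaluates a Gaussian mixture density for a single location ($n_r = 1$) with two LF classes. The class names, class probabilities, means, and standard deviations are hypothetical values chosen for illustration only; they are not taken from the thesis.

```python
import math

def gauss_pdf(x, mean, sd):
    """Univariate Gaussian density phi_1(x; mean, sd^2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical LF classes: prior probability p(kappa) and the parameters of
# the class-conditional Gaussian p(r | kappa).
classes = {
    "shale": {"prob": 0.6, "mean": 0.10, "sd": 0.03},
    "sand": {"prob": 0.4, "mean": 0.25, "sd": 0.04},
}

def gmm_pdf(r):
    """Equation 28 for n_r = 1: sum over classes of p(r | kappa) p(kappa)."""
    return sum(c["prob"] * gauss_pdf(r, c["mean"], c["sd"])
               for c in classes.values())

# Riemann sum of the density over [-0.5, 1.0) as a sanity check.
step = 0.0005
total = sum(gmm_pdf(-0.5 + k * step) for k in range(3000)) * step
```

With these assumed values the density has one mode near each class mean, which is the multimodality the mixture is meant to capture.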

The prior model for $\kappa$ should honor geophysical laws and tendencies with respect to LF ordering and transitions, and spatial continuity. Markov models can impose such constraints and have a long-standing tradition in geophysical applications, first appearing in the form of Markov chains in 1D applications (Krumbein and Dacey, 1969). A first-order Markov chain prior model for $\kappa$ can be expressed as

$$p(\kappa) = p(\kappa_1) \prod_{i=2}^{n_r} p(\kappa_i|\kappa_{i-1}), \tag{29}$$

where $p(\kappa_1)$ is the stationary probability and $p(\kappa_i|\kappa_{i-1})$, $i = 2, \ldots, n_r$, are transition probabilities. Higher-order chains can be defined to add additional constraints such as minimum layer thickness. The interpretability and functionality of Markov models have made them widely used in geophysical applications, in particular for exploring the LF properties of reservoirs (Eidsvik et al., 2004b; Larsen et al., 2006; Ulvmoen et al., 2010).
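The Markov chain prior in Equation 29 can be sketched as follows. The two LF classes and all probabilities are hypothetical; the stationary probabilities are chosen so that they are consistent with the assumed transition matrix.

```python
import math
import random

# Hypothetical two-class example (illustrative values only).
classes = ["shale", "sand"]
stationary = {"shale": 0.75, "sand": 0.25}
transition = {
    "shale": {"shale": 0.9, "sand": 0.1},
    "sand": {"shale": 0.3, "sand": 0.7},
}

def simulate_chain(n_r, rng):
    """Draw kappa_1 from the stationary distribution, then kappa_i | kappa_{i-1}."""
    kappa = rng.choices(classes, weights=[stationary[c] for c in classes])
    for _ in range(n_r - 1):
        row = transition[kappa[-1]]
        kappa.append(rng.choices(classes, weights=[row[c] for c in classes])[0])
    return kappa

def log_prob(kappa):
    """Logarithm of Equation 29 for a given class profile."""
    lp = math.log(stationary[kappa[0]])
    for prev, cur in zip(kappa, kappa[1:]):
        lp += math.log(transition[prev][cur])
    return lp

rng = random.Random(0)
profile = simulate_chain(50, rng)
```

Setting a transition probability to zero, for example from one class directly to another, is one simple way such a chain can encode LF ordering constraints.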

Generally, Markov models are specified in the form of Markov random fields (MRFs), which are typically defined according to the Gibbs formulation (Besag, 1974; Kindermann and Snell, 1980). This formulation generalizes the Markov chain definition to higher spatial dimensions and is based on so-called cliques. A clique is defined on an undirected graph as a set of mutually adjacent vertices, i.e., every pair of vertices in a clique is adjacent. Moreover, a maximal clique $c$ is a clique that is not a subset of a larger clique, and the set of maximal cliques on the graph is denoted by $C$. The MRF model can be expressed in terms of maximal cliques as

$$p(\kappa) \propto \exp\left\{ -\sum_{c \in C} \psi_c(\kappa_c) \right\}, \tag{30}$$

where $\psi_c(\kappa_c)$ is the clique potential function for the maximal clique $c$ and $\kappa_c$ are the LF classes in the maximal clique $c$.


The neighborhood structure of an MRF is related to its cliques, as described by the Hammersley-Clifford theorem (Hammersley and Clifford, 1971). Intuitively, the neighborhood structure of an MRF can be defined locationwise from its defining cliques as the set of locations in the union of all cliques that include the target location, except the target location itself, as illustrated in Figure 1. The MRF model supports arbitrary spatial dimensionality and higher-order neighborhoods; hence, the Markov chain model specified in Equation 29 is a special case of an MRF in 1D with neighborhoods consisting of the two nearest locations, except for border effects.

[Figure 1; panel titles: Clique, Neighborhood]

Figure 1: Cliques with corresponding neighborhoods. The neighborhoods are defined with respect to the cells marked by a cross.
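The locationwise definition of a neighborhood from the cliques can be sketched in a few lines. The function name and the five-location example are illustrative assumptions; the example uses the pairwise cliques of a first-order Markov chain in 1D.

```python
def neighborhood(target, cliques):
    """Union of all cliques that contain `target`, excluding `target` itself."""
    members = set()
    for clique in cliques:
        if target in clique:
            members |= set(clique)
    members.discard(target)
    return members

# Pairwise cliques {i, i + 1} of a first-order Markov chain on locations 1..5.
chain_cliques = [{i, i + 1} for i in range(1, 5)]

interior = neighborhood(3, chain_cliques)   # two nearest locations
border = neighborhood(1, chain_cliques)     # border effect: one neighbor only
```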

The full conditional densities are $p(\kappa_i|\kappa_{i^c})$, with the subscript $i^c$ denoting the complement of location $i$ on the graph, i.e., $i^c = \{1, \ldots, n_r\} \setminus i$, $i = 1, \ldots, n_r$. These densities are defined by the neighborhood structure of the MRF, which can be identified through application of the Hammersley-Clifford theorem. The typically large grids associated with reservoir characterization entail that the normalization constant of the MRF is computationally prohibitive. Therefore, assessment of $p(\kappa)$ tends to be simulation based and is often done by using the full conditional densities $p(\kappa_i|\kappa_{i^c})$; that is, simulation is usually done by single-site Gibbs sampling.
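To illustrate why the full conditionals avoid the normalization constant, the sketch below runs single-site Gibbs sweeps for a small MRF on a 2D grid. The Potts-type pairwise potential, the interaction parameter `beta`, the grid size, and the number of sweeps are all assumptions made for this example; the text does not prescribe a particular potential function.

```python
import math
import random

def gibbs_sweep(kappa, n_classes, beta, rng):
    """One single-site Gibbs sweep for a Potts-type MRF on a 2D grid.

    Assumed pairwise clique potential: -beta if two adjacent cells share a
    class, and 0 otherwise.
    """
    n_rows, n_cols = len(kappa), len(kappa[0])
    for i in range(n_rows):
        for j in range(n_cols):
            neighbors = [kappa[a][b]
                         for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < n_rows and 0 <= b < n_cols]
            # Unnormalized full conditional: it involves the neighborhood only,
            # so the global normalization constant is never needed.
            weights = [math.exp(beta * sum(1 for v in neighbors if v == c))
                       for c in range(n_classes)]
            kappa[i][j] = rng.choices(range(n_classes), weights=weights)[0]
    return kappa

rng = random.Random(1)
grid = [[rng.randrange(2) for _ in range(20)] for _ in range(20)]
for _ in range(50):
    gibbs_sweep(grid, n_classes=2, beta=1.0, rng=rng)
```

Each update only requires normalizing over the classes at one location, which is what makes the scheme feasible on large grids.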

GMMs are conjugate prior models with respect to Gauss-linear likelihood models (Grana et al., 2017; Fjeldstad and Grana, 2018); hence, $p(r|d)$ is also a GMM and of the form

$$p(r|d) = \sum_{\kappa \in \Omega_\kappa^{n_r}} p(r|\kappa, d)\, p(\kappa|d). \tag{31}$$

Note that the conditional pdf $p(r|\kappa, d)$ is Gaussian and therefore easy to evaluate. Evaluation of the mixing weights $p(\kappa|d)$ is more involved.

Assessment of the posterior model $p(\kappa|d)$ is complicated due to the vertically convolutional seismic likelihood model, which results in $p(\kappa|d)$ being a convolved hidden Markov model (Lindberg and Omre, 2014). Consequently, $p(\kappa|d)$ will be a higher-order Markov model, irrespective of the order defined in $p(\kappa)$. This entails that assessment of the exact posterior Markov model becomes computationally infeasible. However, assessment can be based on approximations (Rimstad and Omre, 2013; Fjeldstad and Grana, 2018).
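The conjugacy in Equation 31 is easiest to see in a non-spatial toy case, where the mixing weights can be computed exactly. The sketch below assumes a single location, a scalar forward model $d = g r + e$ with Gaussian error, and hypothetical class parameters; it deliberately ignores the convolution that makes the spatial case difficult.

```python
import math

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical prior: class -> (p(kappa), mean, variance) of p(r | kappa).
prior = {"shale": (0.6, 0.10, 0.03 ** 2), "sand": (0.4, 0.25, 0.04 ** 2)}
g, noise_var = 2.0, 0.05 ** 2   # assumed scalar forward model d = g * r + e

def posterior(d):
    """Return class -> (p(kappa | d), E[r | kappa, d], Var[r | kappa, d])."""
    unnorm, moments = {}, {}
    for name, (p, mean, var) in prior.items():
        var_d = g * g * var + noise_var   # variance of d given kappa
        gain = g * var / var_d            # weight on the data residual
        unnorm[name] = p * gauss_pdf(d, g * mean, var_d)
        moments[name] = (mean + gain * (d - g * mean), var - gain * g * var)
    total = sum(unnorm.values())
    return {k: (unnorm[k] / total, *moments[k]) for k in prior}

post = posterior(d=0.5)
```

In the spatial setting the sum runs over all class configurations on the grid, so this enumeration is not available and approximations are needed, as noted above.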

Selection Gaussian models

The selection Gaussian model is inspired by the concept of selection probability distributions (Azzalini, 1985; Arellano-Valle et al., 2006; Azzalini, 2013). The selection concept has been extended to spatial settings (Allard and Naveau, 2007; Omre and Rimstad, 2021) and has successfully been applied to seismic inversion (Karimi et al., 2010; Rimstad and Omre, 2014). The selection Gaussian model is a very flexible class of models and is a viable candidate for representing multimodal random variables.

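A selection Gaussian random field (S-GRF) is based on two interacting GRFs: the basis GRF $\tilde r$ and the auxiliary GRF $\nu$. The basis GRF is specified on the reservoir grid $\mathcal{G}_r$, whereas the auxiliary GRF is specified on a grid $\mathcal{G}_\nu$, which may differ. The conditional random field (RF) $[\nu|\tilde r]$ is Gauss-linear; hence, $(\tilde r, \nu)$ are jointly Gaussian, with joint pdf

$$p\left(\begin{bmatrix} \tilde r \\ \nu \end{bmatrix}\right) = \phi_{n_r+n_\nu}\left(\begin{bmatrix} \tilde r \\ \nu \end{bmatrix}; \begin{bmatrix} \mu_{\tilde r} \\ \mu_\nu \end{bmatrix}, \begin{bmatrix} \Sigma_{\tilde r} & \Gamma_{\tilde r\nu} \\ \Gamma_{\tilde r\nu}^T & \Sigma_\nu \end{bmatrix}\right). \tag{32}$$

Here, $\mu_{\tilde r}$ is the $n_r$-dimensional expectation vector of $\tilde r$ and $\Sigma_{\tilde r}$ is its $(n_r \times n_r)$ covariance matrix. Similarly, $\mu_\nu$ is the $n_\nu$-dimensional expectation vector of $\nu$ and $\Sigma_\nu$ is its $(n_\nu \times n_\nu)$ covariance matrix. Lastly, $\Gamma_{\tilde r\nu}$ is the $(n_r \times n_\nu)$ cross-covariance matrix between $\tilde r$ and $\nu$. A truncation of the auxiliary GRF $\nu$ is defined according to the $n_\nu$-dimensional selection set $\mathbf{A}$. The conditional RF $r = [\tilde r|\nu \in \mathbf{A}]$ is selection Gaussian, with pdf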

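$$p(r) = p(\tilde r|\nu \in \mathbf{A}) = \frac{\Phi_{n_\nu}(\mathbf{A}; \mu_{\nu|\tilde r}, \Sigma_{\nu|\tilde r})}{\Phi_{n_\nu}(\mathbf{A}; \mu_\nu, \Sigma_\nu)}\, \phi_{n_r}(\tilde r; \mu_{\tilde r}, \Sigma_{\tilde r}), \tag{33}$$

where the numerator and denominator are the Gaussian probabilities of the selection set. That is, the denominator is

$$\Phi_{n_\nu}(\mathbf{A}; \mu_\nu, \Sigma_\nu) = \int_{\mathbf{A}} \phi_{n_\nu}(\nu; \mu_\nu, \Sigma_\nu)\, d\nu, \tag{34}$$

and similarly for the numerator with appropriate distributional parameters. The conditional parameters in Equation 33 are computed by standard Gaussian conditioning formulas (Johnson and Wichern, 2007),

$$\begin{aligned} \mu_{\nu|\tilde r} &= \mu_\nu + \Gamma_{\tilde r\nu}^T \Sigma_{\tilde r}^{-1} (\tilde r - \mu_{\tilde r}), \\ \Sigma_{\nu|\tilde r} &= \Sigma_\nu - \Gamma_{\tilde r\nu}^T \Sigma_{\tilde r}^{-1} \Gamma_{\tilde r\nu}. \end{aligned} \tag{35}$$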

The flexibility of the distribution enters through the cross-covariance $\Gamma_{\tilde r\nu}$ between $\tilde r$ and $\nu$, and the shape of the selection set $\mathbf{A}$.

To robustly represent the variability in the reservoir variables, a spatially stationary prior model is sensible, because it captures the total uncertainty reflected in the well data at every location. An RF is said to be stationary if its associated pdf is spatially shift invariant for any subset of its random variables. That is, the pdf of the chosen subset of random variables must depend only on the distances between them and not on their specific locations. We specify an S-GRF model with $n_\nu = n_r$, which can support stationarity, except for border effects. Moreover, it enables mode transitions at every location in the reservoir grid, which can be important for precisely locating regions of interest in the spatial domain.

To restrict model complexity, we specify an intervariable spatial correlation structure, which is contained in the $(n_r \times n_r)$ matrix $\Omega$ and defined through a translation invariant and positive definite correlation function $\rho(\cdot)$. We define a stationary Gaussian distribution for $\tilde r$ with expectation vector $\mu_{\tilde r} i_{n_r}$ and covariance matrix $\Sigma_{\tilde r} = \sigma^2_{\tilde r} \Omega$. Here, $\mu_{\tilde r}$ and $\sigma^2_{\tilde r}$ are the locationwise expectation and variance of $\tilde r$, respectively, and $i_{n_r}$ is the $n_r$-dimensional vector of ones. Furthermore, because the influence of $[\nu|\nu \in \mathbf{A}]$ on $\tilde r$ depends on the location of $\mathbf{A}$ only through the relative location of $\mathbf{A}$ to the distribution of the auxiliary variable, we define a stationary and locationwise standard Gaussian distribution for $\nu$, with expectation vector $\mu_\nu = 0 i_{n_r}$ and covariance matrix $\Sigma_\nu = \gamma^2 \Omega + (1 - \gamma^2) I_{n_r}$, where $I_{n_r}$ is the $(n_r \times n_r)$ identity matrix. Here, $\gamma$ is the locationwise correlation between $\tilde r$ and $\nu$. In this framework, the cross-covariance matrix is an $(n_r \times n_r)$ matrix of the form $\Gamma_{\tilde r\nu} = \gamma \sigma_{\tilde r} \Omega$. Lastly, we use a location invariant selection set, i.e., $\mathbf{A} = A^{n_r}$. The locationwise selection set $A$ consists of $n_A$ disjoint real intervals, $A = \bigcup_{i=1}^{n_A} [a_i, b_i]$, $a_i < b_i$. The S-GRF specified above is stationary and the conditional parameters involved in its pdf in Equation 33 can be expressed as

$$\begin{aligned} \mu_{\nu|\tilde r} &= 0 i_{n_r} + \gamma \sigma_{\tilde r}^{-1} (\tilde r - \mu_{\tilde r} i_{n_r}), \\ \Sigma_{\nu|\tilde r} &= (1 - \gamma^2) I_{n_r}. \end{aligned} \tag{36}$$

Hence, the model parameters are $\Theta^{SG} = [\mu_{\tilde r}, \sigma_{\tilde r}, \gamma, A, \rho]$. The first four parameters are primarily related to the locationwise distributions of the S-GRF, whereas $\rho(\cdot)$ primarily relates to the spatial correlation structure.
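The reduction of the general conditioning formulas in Equation 35 to the closed forms in Equation 36 can be checked numerically. The sketch below assumes a small 1D grid, an exponential correlation function, and hypothetical parameter values; none of these choices come from the thesis.

```python
import numpy as np

n_r = 6
mu_r, sigma_r, gamma = 0.2, 0.05, 0.9   # hypothetical parameter values
range_par = 2.0                          # hypothetical correlation range

# Assumed correlation function rho(h) = exp(-h / range_par) on a 1D grid.
h = np.abs(np.subtract.outer(np.arange(n_r), np.arange(n_r)))
Omega = np.exp(-h / range_par)

ones = np.ones(n_r)
Sigma_r = sigma_r ** 2 * Omega
Sigma_nu = gamma ** 2 * Omega + (1.0 - gamma ** 2) * np.eye(n_r)
Gamma_rnu = gamma * sigma_r * Omega

# Arbitrary value of the basis variable at which to condition.
r_tilde = mu_r + sigma_r * np.linspace(-1.0, 1.0, n_r)

# General Gaussian conditioning (Equation 35), with mu_nu = 0.
mu_cond = Gamma_rnu.T @ np.linalg.solve(Sigma_r, r_tilde - mu_r * ones)
Sigma_cond = Sigma_nu - Gamma_rnu.T @ np.linalg.solve(Sigma_r, Gamma_rnu)

# Stationary closed forms (Equation 36).
mu_closed = gamma / sigma_r * (r_tilde - mu_r * ones)
Sigma_closed = (1.0 - gamma ** 2) * np.eye(n_r)
```

The diagonal conditional covariance means that, given the basis variable, the auxiliary variables are independent across locations, so the numerator in Equation 33 factorizes into univariate Gaussian probabilities.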

We now briefly consider S-GRFs of low dimensionality to build some model intuition. Figure 2 presents three examples of univariate selection Gaussian pdfs and associated cumulative distribution functions (cdfs), with the effect of the selection set highlighted. The geometry of the selection set $A$ notably affects the selection Gaussian distribution and is the source of vast model flexibility, provided that the correlation $\gamma$ between the basis variable $\tilde r$ and the auxiliary variable $\nu$ is sufficiently strong.

Figure 3 illustrates the selection Gaussian distribution in a multivariate setting based on the bimodal and trimodal univariate distributions shown in Figure 2.

Figure 2: Univariate selection Gaussian distributions. Skewed (top row), bimodal (middle row), and trimodal (bottom row). The left column displays the shape of the selection set $A$ superimposed on the joint distribution of $\tilde r$ and $\nu$, whereas the middle and right columns display the resulting pdf and cdf of $r$, respectively.


Figure 3: Multivariate selection Gaussian distributions with bimodal (top row) and trimodal (bottom row) marginals. Bivariate distributions (left column) and realizations from corresponding 2500-variate S-GRFs (right column). A moderate level of spatial correlation is used.

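S-GRFs are conjugate prior models with respect to Gauss-linear likelihood models (Omre and Rimstad, 2021); hence, $p(r|d)$ is also an S-GRF and of the form

$$\begin{aligned} p(r|d) &= p(\tilde r|\nu \in \mathbf{A}, d) \\ &= \frac{\Phi_{n_r}(\mathbf{A}; \mu_{\nu|\tilde r,d}, \Sigma_{\nu|\tilde r,d})}{\Phi_{n_r}(\mathbf{A}; \mu_{\nu|d}, \Sigma_{\nu|d})} \times \phi_{n_r}(\tilde r; \mu_{\tilde r|d}, \Sigma_{\tilde r|d}). \end{aligned} \tag{37}$$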

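The expressions for the parameters involved in the posterior model can be developed from classical Gaussian theory. The conditional expectation vectors are

$$\begin{bmatrix} \mu_{\tilde r|d} \\ \mu_{\nu|d} \end{bmatrix} = \begin{bmatrix} \mu_{\tilde r} i_{n_r} \\ 0 i_{n_r} \end{bmatrix} + \begin{bmatrix} \Sigma_{\tilde r} G^T \\ \Gamma_{\nu\tilde r} G^T \end{bmatrix} \Sigma_d^{-1} (d - \mu_d), \tag{38}$$

and the conditional covariance matrices are

$$\begin{bmatrix} \Sigma_{\tilde r|d} & \Gamma_{\tilde r\nu|d} \\ \Gamma_{\nu\tilde r|d} & \Sigma_{\nu|d} \end{bmatrix} = \begin{bmatrix} \Sigma_{\tilde r} & \Gamma_{\tilde r\nu} \\ \Gamma_{\nu\tilde r} & \Sigma_\nu \end{bmatrix} - \begin{bmatrix} \Sigma_{\tilde r} G^T \\ \Gamma_{\nu\tilde r} G^T \end{bmatrix} \Sigma_d^{-1} \begin{bmatrix} G\Sigma_{\tilde r} & G\Gamma_{\tilde r\nu} \end{bmatrix}, \tag{39}$$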


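hence

$$\begin{aligned} \mu_{\nu|\tilde r,d} &= \mu_{\nu|d} + \Gamma_{\nu\tilde r|d} \Sigma_{\tilde r|d}^{-1} (\tilde r - \mu_{\tilde r|d}), \\ \Sigma_{\nu|\tilde r,d} &= \Sigma_{\nu|d} - \Gamma_{\nu\tilde r|d} \Sigma_{\tilde r|d}^{-1} \Gamma_{\tilde r\nu|d}. \end{aligned} \tag{40}$$

Recall that $G$ is the forward function of the likelihood model. Furthermore, $\Sigma_d = G \Sigma_r G^T$.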
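The posterior parameters in Equations 38 to 40 amount to one round of Gaussian conditioning, which can be sketched as below. The sketch assumes an additive, independent Gaussian observation error with covariance `Sigma_e` and takes $\mu_d = G \mu_{\tilde r}$; both are assumptions of this example, as are the tridiagonal matrix `G` and all numerical values. The final covariance is computed with the minus sign of standard Gaussian conditioning, as in Equation 35.

```python
import numpy as np

def posterior_parameters(mu_r, mu_nu, Sigma_r, Sigma_nu, Gamma_rnu, G, Sigma_e, d):
    """Condition (r_tilde, nu) on d, assuming d = G r_tilde + e, e ~ N(0, Sigma_e)."""
    mu_d = G @ mu_r
    Sigma_d = G @ Sigma_r @ G.T + Sigma_e            # assumed error term added
    Sigma_d_inv = np.linalg.inv(Sigma_d)
    K_r = Sigma_r @ G.T @ Sigma_d_inv                # weights for r_tilde
    K_nu = Gamma_rnu.T @ G.T @ Sigma_d_inv           # weights for nu
    mu_r_d = mu_r + K_r @ (d - mu_d)                 # Equation 38
    mu_nu_d = mu_nu + K_nu @ (d - mu_d)
    Sigma_r_d = Sigma_r - K_r @ G @ Sigma_r          # Equation 39
    Gamma_rnu_d = Gamma_rnu - K_r @ G @ Gamma_rnu
    Sigma_nu_d = Sigma_nu - K_nu @ G @ Gamma_rnu
    W = Gamma_rnu_d.T @ np.linalg.inv(Sigma_r_d)     # weights in Equation 40
    Sigma_nu_rd = Sigma_nu_d - W @ Gamma_rnu_d
    return mu_r_d, mu_nu_d, Sigma_r_d, Gamma_rnu_d, Sigma_nu_d, W, Sigma_nu_rd

# Hypothetical stationary prior on a 1D grid with five locations.
n, sigma_r, gamma = 5, 0.05, 0.9
h = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Omega = np.exp(-h / 2.0)
Sigma_r = sigma_r ** 2 * Omega
Sigma_nu = gamma ** 2 * Omega + (1.0 - gamma ** 2) * np.eye(n)
Gamma_rnu = gamma * sigma_r * Omega
G = np.eye(n) + 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1)   # illustrative operator
Sigma_e = 0.02 ** 2 * np.eye(n)
d = G @ np.full(n, 0.25)
out = posterior_parameters(np.full(n, 0.2), np.zeros(n), Sigma_r, Sigma_nu,
                           Gamma_rnu, G, Sigma_e, d)
```

Under these assumptions the data depend on the auxiliary variable only through the basis variable, so the covariance of $\nu$ given $\tilde r$ and $d$ coincides with the prior value $(1 - \gamma^2) I_{n_r}$ from Equation 36, which gives a convenient consistency check.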

The prior and posterior models are, therefore, both available once a suitable assessment strategy for S-GRFs is developed. The assessment strategy should be capable of dealing with non-stationary S-GRFs because the posterior model will, even if the prior model is stationary, tend to be non-stationary. In the following, we will discuss the assessment of a prior S-GRF model, but the same discussion applies to the assessment of the posterior model.

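In the univariate case, the selection Gaussian distribution can easily be evaluated analytically by use of well-known Gaussian cdfs,

$$\begin{aligned} p(r) &= p(\tilde r|\nu \in A) \\ &= \frac{\sum_{i=1}^{n_A} \left[ \Phi_1(b_i, \mu_{\nu|\tilde r}, \sigma^2_{\nu|\tilde r}) - \Phi_1(a_i, \mu_{\nu|\tilde r}, \sigma^2_{\nu|\tilde r}) \right]}{\sum_{i=1}^{n_A} \left[ \Phi_1(b_i, \mu_\nu, \sigma^2_\nu) - \Phi_1(a_i, \mu_\nu, \sigma^2_\nu) \right]}\, \phi_1(\tilde r; \mu_{\tilde r}, \sigma^2_{\tilde r}). \end{aligned} \tag{41}$$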
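Equation 41 translates directly into code. The sketch below assumes a standard Gaussian auxiliary variable ($\mu_\nu = 0$, $\sigma_\nu = 1$), so that the conditional parameters follow Equation 36 in the univariate case; the parameter values and the two-tailed selection set are hypothetical.

```python
import math

def norm_cdf(x, mean, sd):
    """Univariate Gaussian cdf; also valid for x = +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def norm_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def interval_prob(intervals, mean, sd):
    """Gaussian probability of a union of disjoint intervals [a_i, b_i]."""
    return sum(norm_cdf(b, mean, sd) - norm_cdf(a, mean, sd) for a, b in intervals)

def selection_gaussian_pdf(r, mu_r, sigma_r, gamma, intervals):
    """Equation 41 with mu_nu = 0 and sigma_nu = 1."""
    mu_cond = gamma * (r - mu_r) / sigma_r    # expectation of nu given r_tilde
    sd_cond = math.sqrt(1.0 - gamma ** 2)     # standard deviation of nu given r_tilde
    return (interval_prob(intervals, mu_cond, sd_cond)
            / interval_prob(intervals, 0.0, 1.0)
            * norm_pdf(r, mu_r, sigma_r))

# Hypothetical selection set: the two tails of the auxiliary variable.
A = [(-math.inf, -1.0), (1.0, math.inf)]

# Riemann sum of the pdf over [-0.3, 0.7) as a sanity check.
step = 0.0005
total = sum(selection_gaussian_pdf(-0.3 + k * step, 0.2, 0.05, 0.9, A)
            for k in range(2000)) * step
```

Selecting both tails of the auxiliary variable with a strong correlation suppresses the density around $\mu_{\tilde r}$, which is one way a bimodal shape can arise.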

Evaluation of the pdf becomes challenging for high-dimensional S-GRFs and is usually simulation based. The simulation is performed in two steps: first, the auxiliary variable $\nu$ is simulated from the truncated Gaussian pdf

$$p(\nu|\nu \in \mathbf{A}) = I(\nu \in \mathbf{A})\, \phi_{n_\nu}(\nu; \mu_\nu, \Sigma_\nu) \left[ \Phi_{n_\nu}(\mathbf{A}; \mu_\nu, \Sigma_\nu) \right]^{-1},$$

and then the conditional basis variable $[\tilde r|\nu, \nu \in \mathbf{A}]$ is simulated from the conditional Gaussian pdf $\phi_{n_r}(\tilde r; \mu_{\tilde r|\nu}, \Sigma_{\tilde r|\nu})$. The latter step is straightforward because efficient algorithms for simulation from GRFs are available, whereas the former is more involved and typically relies on an McMC algorithm. A sensible simulation strategy is to simulate the truncated auxiliary GRF $[\nu|\nu \in \mathbf{A}]$ piece by piece; hence, the form of the conditional pdfs of the RF is important. For a block $b \subseteq \{1, \ldots, n_r\}$ of size $n_b$, a block-based decomposition of the pdf $p(\nu|\nu \in \mathbf{A})$ is

$$\begin{aligned} p(\nu|\nu \in \mathbf{A}) &= p(\nu_b, \nu_{b^c}|\nu_b \in A^{n_b}, \nu_{b^c} \in A^{n_r - n_b}) \\ &= p(\nu_b|\nu_{b^c}, \nu_b \in A^{n_b})\, p(\nu_{b^c}|\nu_b \in A^{n_b}, \nu_{b^c} \in A^{n_r - n_b}), \end{aligned} \tag{42}$$

where the subscript $b^c$ denotes the complement of the block. In the following, we sometimes omit subscripts on expectations and variances for ease of notation. The expectations and variances that appear without a subscripted variable are to be understood as if subscripted by $\nu$. We first consider the case of a single-site block, i.e., a block of size $n_b = 1$ that consists of only one location, $b = i$, $i \in \{1, \ldots, n_r\}$. The associated block pdf is

$$\begin{aligned} p(\nu_b|\nu_{b^c}, \nu_b \in A^{n_b}) &= \frac{I(\nu_b \in A^{n_b})\, p(\nu_b|\nu_{b^c})}{P(\nu_b \in A^{n_b}|\nu_{b^c})} \\ &= \frac{I(\nu_i \in A)\, \phi_1(\nu_i; \mu_{i|i^c}, \sigma^2_{i|i^c})}{\Phi_1(A; \mu_{i|i^c}, \sigma^2_{i|i^c})}. \end{aligned} \tag{43}$$

This block pdf is a truncated Gaussian pdf and represents the locationwise full conditional pdfs of the truncated GRF. Hence, the locationwise full conditional pdfs of the truncated GRF are computationally easily available, which entails that single-site Gibbs sampling may be feasible.
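A draw from a univariate Gaussian truncated to a union of intervals, as in Equation 43, can be sketched by inverse-cdf sampling. The function name, the clamping constant, and the example values are assumptions of this sketch; inverse-cdf sampling loses accuracy when the selection set lies far out in the tails of the conditional distribution.

```python
import random
from statistics import NormalDist

def sample_truncated_gaussian(mean, sd, intervals, rng):
    """Draw from a Gaussian restricted to a union of disjoint intervals [a_i, b_i]."""
    dist = NormalDist(mean, sd)
    masses = [dist.cdf(b) - dist.cdf(a) for a, b in intervals]
    # Pick an interval with probability proportional to its Gaussian mass,
    # then invert the cdf within that interval.
    a, b = rng.choices(intervals, weights=masses)[0]
    u = rng.uniform(dist.cdf(a), dist.cdf(b))
    # Keep u strictly inside (0, 1), as required by inv_cdf.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)

# Hypothetical locationwise selection set and conditional parameters.
rng = random.Random(0)
A = [(-3.0, -1.0), (1.0, 3.0)]
draws = [sample_truncated_gaussian(0.5, 1.0, A, rng) for _ in range(2000)]
```

Because the conditional mean is positive in this example, most draws fall in the upper interval; a conditional mean far from one interval makes transitions into it rare, which is the mixing issue that arises under strong spatial correlation.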

Single-site Gibbs sampling is a viable and efficient simulation strategy for RFs with weak spatial correlation structure that simulates sequentially from the locationwise full conditional distributions. The single-site Gibbs simulation algorithm is presented in pseudo code in Algorithm 1.

Algorithm 1: Do $k$ simulation sweeps of $[\nu|\nu \in \mathbf{A}]$ by single-site Gibbs.

Precompute conditioning weights $k_i$ and conditional marginal variances $\sigma^2_{i|i^c}$:
For $i$ from 1 to $n_r$
    $k_i = \Sigma_{i,i^c} \Sigma_{i^c}^{-1}$.
    $\sigma^2_{i|i^c} = \sigma^2_i - k_i \Sigma_{i,i^c}^T$.
End
Initialize $\nu^0 \in \mathbf{A}$.
For $j$ from 1 to $k$
    Set $\nu^j = \nu^{j-1}$
    For $i$ from 1 to $n_r$
        Compute conditional mean and simulate:
        $\mu^j_{i|i^c} = \mu_i + k_i (\nu^j_{i^c} - \mu_{i^c})$.
        $\nu^j_i \sim I(\nu^j_i \in A)\, \phi_1(\nu^j_i; \mu^j_{i|i^c}, \sigma^2_{i|i^c}) / \Phi_1(A; \mu^j_{i|i^c}, \sigma^2_{i|i^c})$.
    End
End
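Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under assumptions, not the implementation used in the thesis: the truncated Gaussian draws use inverse-cdf sampling, the full covariance matrix is handled densely, and the grid size, covariance, selection set, and number of sweeps are hypothetical.

```python
import math
import random
from statistics import NormalDist
import numpy as np

def sample_truncated(mean, sd, intervals, rng):
    """Inverse-cdf draw from a Gaussian restricted to a union of intervals."""
    dist = NormalDist(mean, sd)
    masses = [dist.cdf(b) - dist.cdf(a) for a, b in intervals]
    a, b = rng.choices(intervals, weights=masses)[0]
    u = rng.uniform(dist.cdf(a), dist.cdf(b))
    return dist.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12))

def single_site_gibbs(mu, Sigma, intervals, nu0, n_sweeps, rng):
    """Run n_sweeps single-site Gibbs sweeps of [nu | nu in A^n]."""
    n = len(mu)
    # Precompute conditioning weights k_i and conditional standard deviations.
    weights, cond_sd = [], []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        k_i = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, i])
        weights.append((rest, k_i))
        cond_sd.append(float(np.sqrt(Sigma[i, i] - k_i @ Sigma[rest, i])))
    nu = np.array(nu0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(n):
            rest, k_i = weights[i]
            cond_mean = float(mu[i] + k_i @ (nu[rest] - mu[rest]))
            nu[i] = sample_truncated(cond_mean, cond_sd[i], intervals, rng)
    return nu

# Hypothetical stationary auxiliary GRF on a 1D grid with ten locations.
n, gamma = 10, 0.9
h = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma_nu = gamma ** 2 * np.exp(-h / 2.0) + (1.0 - gamma ** 2) * np.eye(n)
A = [(-math.inf, -1.0), (1.0, math.inf)]
nu = single_site_gibbs(np.zeros(n), Sigma_nu, A, np.full(n, 1.5),
                       n_sweeps=100, rng=random.Random(0))
```

The precomputation solves one linear system per location, which is the cost that the stationarity and neighborhood arguments in the text aim to reduce.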


Note that the conditioning weights $k_i$ are identical for locations that are not influenced by border effects in stationary S-GRFs, which may be exploited for more efficient precomputation. In non-stationary S-GRFs, the conditioning weights tend to be unique for each location. Moreover, computation of the parameters involved in the full conditional marginals may be prohibitive if the grid under study is large and the range of the spatial correlation is long. In such situations it may be necessary to reduce computational time by approximating the full conditionals as $\pi(\nu_i|\nu_{i^c}) \approx \pi(\nu_i|\nu_{i^n})$, with $\nu_{i^n}$ consisting of random variables that are notably correlated with $\nu_i$.

The feasibility of the single-site Gibbs algorithm depends on the properties of the locationwise full conditional pdfs. These pdfs become increasingly constrained with increasing correlation in the RF, which is detrimental to the mixing of the algorithm in multimodal settings. Blockwise sampling, which is based on a partition of the grid into blocks consisting of collections of adjacent grid points, offers a possible solution to the limitations of single-site Gibbs sampling. The sampling is performed blockwise and sequentially, which enables reduction of the influence of the RF outside the block on the locationwise marginals within the block by strategic inter-block sampling. The pdf of a block $b \subseteq \{1, \ldots, n_r\}$ of size $n_b > 1$ is

$$\begin{aligned} p(\nu_b|\nu_{b^c}, \nu_b \in A^{n_b}) &= \frac{I(\nu_b \in A^{n_b})\, p(\nu_b|\nu_{b^c})}{P(\nu_b \in A^{n_b}|\nu_{b^c})} \\ &= \frac{\prod_{i \in b} I(\nu_i \in A)\, \phi_1(\nu_i; \mu_{i|b^{c,v}}, \sigma^2_{i|b^{c,v}})}{\Phi_{n_b}(A^{n_b}; \mu_{b|b^c}, \Sigma_{b|b^c})}. \end{aligned} \tag{44}$$

Here, the subscript $b^{c,v}$ denotes the union of the complement of the block and the already visited locations within the block. Note that the normalization constant $\Phi_{n_b}(A^{n_b}; \mu_{b|b^c}, \Sigma_{b|b^c})$ cannot be expressed in a sequentially conditional form; hence, assessment of the block pdf requires evaluation of a high-dimensional Gaussian orthant probability, which is challenging. Therefore, we use the Metropolis-Hastings approach presented in Rimstad and Omre (2014) with proposal

$$q(\nu_b|\nu_{b^c}, \nu_b \in A^{n_b}) = \prod_{i \in b} \frac{I(\nu_i \in A)\, \phi_1(\nu_i; \mu_{i|b^{c,v}}, \sigma^2_{i|b^{c,v}})}{\Phi_1(A; \mu_{i|b^{c,v}}, \sigma^2_{i|b^{c,v}})}, \tag{45}$$