ISBN 978-82-326-6231-9 (printed ver.) ISBN 978-82-326-5505-2 (electronic ver.) ISSN 1503-8181 (printed ver.) ISSN 2703-8084 (online ver.)
Doctoral theses at NTNU, 2021:395
Ole Bernhard Forberg
Bayesian inversion of seismic data using multimodal selection Gaussian prior models
Doctor al thesis
Doctoral theses at NTNU, 2021:395Ole Bernhard Forberg NTNU Norwegian University of Science and Technology Thesis for the Degree of Philosophiae Doctor Faculty of Information Technology and Electrical Engineering Department of Mathematical Sciences
Thesis for the Degree of Philosophiae Doctor Trondheim, November 2021
Norwegian University of Science and Technology
Faculty of Information Technology and Electrical Engineering Department of Mathematical Sciences
Ole Bernhard Forberg
Bayesian inversion of seismic
data using multimodal selection
Gaussian prior models
Thesis for the Degree of Philosophiae Doctor
Faculty of Information Technology and Electrical Engineering Department of Mathematical Sciences
© Ole Bernhard Forberg
ISBN 978-82-326-6231-9 (printed ver.) ISBN 978-82-326-5505-2 (electronic ver.) ISSN 1503-8181 (printed ver.)
ISSN 2703-8084 (online ver.) Doctoral theses at NTNU, 2021:395 Printed by NTNU Grafisk senter
Bayesian inversion of seismic data using multimodal selection Gaussian prior models
Ole Bernhard Forberg August 22, 2021
Preface
This thesis is submitted in partial fulfillment of the requirements for the degree of Philosophiae Doctor (PhD) at the Norwegian University of Science and Technology (NTNU). The research is jointly funded by the Uncertainty in Reservoir Evaluation (URE) consortium, Aker BP and the Department of Mathematical Sciences (IMF), and is carried out at IMF.
I would like to thank my supervisor at the Department of Mathe- matical Sciences at NTNU, Professor Henning Omre, for his guidance throughout the last four years. Thank you for the knowledge you have imparted to me, which includes but is not limited to statistical topics.
I appreciate that we could converse about various topics and that you always were willing to share your point of view.
I would also like to thank my co-supervisor Øyvind Kjøsnes at Aker BP for his guidance and for the collaboration. Thank you for sharing your expertise on geophysics and seismic reservoir characterization, it helped better my understanding of the subjects. I would also like to extend a special thanks for your added availability while working on our most recent paper.
Finally, I would like to thank Assistant Professor Dario Grana at the University of Wyoming (UW) for the collaboration and the opportunity to visit UW. I really appreciate the hospitality you showed during my visit, it made my stay at UW a delight. I also appreciate your willingness to share your knowledge and for being there whenever I had a question.
Thesis Outline
Introduction 3
Bayesian inversion . . . 5
Markov chain Monte Carlo simulation . . . 6
Reservoir data . . . 9
Seismic AVO data . . . 9
Well data . . . 10
Bayesian reservoir characterization . . . 11
Likelihood model . . . 12
Prior model . . . 15
References . . . 28
Summary of papers 33
Paper I: Bayesian seismic AVO inversion for reservoir
variables with bimodal spatial histograms 41 Paper II: Bayesian Inversion of Time-Lapse Seismic
AVO Data for Multimodal Reservoir Properties 63 Paper III: Bayesian seismic AVO inversion using a
laterally coupled multimodal prior model 81
Background
Introduction
The success of a mathematical description of the world has led to an abundance of inverse problems in science and engineering. An inverse problem can be associated with any mathematical equation that describes the causal influence of a variabler∈ Ron an observable variabled∈ D, and this type of equation is termed a forward model. Here, the sets R and D are taken to be Hilbert spaces. In many applications a forward model can be theoretically based and captured by a functionf :R → D; hence, the forward model can be expressed
d|r
=f(r). (1)
Computing the observation d for a given configuration of the causal variable r is the forward problem associated with the forward model, whereas computingrfor a givend is the inverse problem.
An inverse problem is said to be well-posed if a unique solution exists and the solution changes continuously with variations in the observations (Hadamard, 1902, 1923). Well-posed problems are desirable because they in principle can be solved exactly by stable algorithms. Problems that are not well-posed are said to be ill-posed, and inverse problems of this type are often approached by regularization techniques. Regularization techniques impose restrictions on the solution that make the regularized problem well-posed. Tikhonov regularization is a classical regularization technique (Tikhonov, 1963) that offers a solution ˆrto ill-posed problems through minimization of an objective function,
ˆr= argmin
r
f(r)−d+αr−r∗
. (2)
Here, α > 0 is a small parameter and r∗ is a reasonable guess for the solution. The regularization and associated minimization is most com- monly carried out with respect to theL2 norm, but other norms may be used. Nevertheless, the minimization problem is in general complicated and requires the use of a suitable numerical optimization procedure.
We have so far discussed inverse problems in which the observations are assumed to be exact. In real world applications, the assumption of exact observations tends to be unreasonable. Observations are burdened by measurement errors which, even for well-posed problems, may entail that the correct solution can not be found. Measurement errors can be detrimental for so-called ill-conditioned problems, which are character- ized by that small errors in the observations yield much greater errors in the solution. This type of inverse problems can be approached by regularization techniques. However, numerical approaches yield solutions that are point estimates; hence, the inherent uncertainties of the obser- vations are not reflected. A probabilistic approach that incorporates the measurement errors therefore seems more reasonable, and it can provide a solution on the form of a probability statement. In a probabilistic setting, the forward model takes the form
d|r
=f(r) +ed|r, (3)
where ed|r is a stochastic variable representing the measurement errors.
The probabilistic forward model has an associated probability density function (pdf)p(d|r), which is termed the likelihood model.
Solving the inverse problem in a statistical framework involves identi- fying an estimatorˆr(d) :D → R, from which an estimate of the unknown causal variable rcan be computed. Estimators are often evaluated ac- cording to a loss function L(r,ˆr(d)), which is typically taken to be the sum of squared residuals, i.e., L(r,ˆr(d)) =r−ˆr(d)2 .The expected loss of the estimator is captured by the risk function Srˆ(r), which is defined pointwise inRas
Srˆ(r) = Z
D
L(r,ˆr(d)) dp(d|r). (4) In the minimax approach, the optimal estimator is the estimator that minimizes the maximum risk. That is, the minimax risk is
Smm = inf sup
ˆ r r∈R
Sˆr(r), and the optimal minimax estimator is the one that achieves this risk. The minimax approach is a general principle for estimator selection, but it tends to be infeasible due to computational complexity, and results regarding the assessment of estimator uncertainty are limited. However, Stark (1992) reports results on minimax confidence intervals for a linear forward function (Backus, 1989; Donoho, 1989).
Alternatively, estimator selection can be based on minimization of the average risk. This is the framework of Bayesian statistics, in which optimal estimators minimize the average risk. In practice, estimation and uncertainty quantification are easily available in a Bayesian framework,
albeit substantial computational effort may be needed in some cases. We therefore further pursue inverse problems in a Bayesian framework.
Bayesian inversion
We now consider r and d to be real valued random vectors of dimen- sion nr and nd, respectively. In a Bayesian framework, the solution to the inverse problem is the posterior model p(r|d), which can be used for estimation and uncertainty assessment. The posterior model is the probability density function given by
p(r|d) = p(d|r)p(r)
p(d) ∝p(d|r)p(r). (5) Here, p(r)is the prior model, which represents the prior knowledge and beliefs that the modeler has aboutr. Furthermore,p(d|r)is the likelihood model describing the relationship between r and d. The normalizing constantp(d)tends to be challenging to compute and can make it difficult to obtain posterior models.
For certain combinations of prior models and likelihood models, the form of the posterior models are known. A class of prior models C is said to be conjugate with respect to a likelihood model, if the corre- sponding class of posterior models is also C (Casella and Berger, 2001).
Gaussian prior models provide the perhaps most well known examples of conjugacy; Gaussian prior models are conjugate with respect to so-called Gauss-linear likelihood models (Tarantola, 2005), which are likelihood models that are Gaussian with expectation linear in r. This relation provides the foundation of geostatistics and Kriging interpolation (Chilès and Delfiner, 1999), and because Gaussian prior models can adequately represent many phenomena, the relation is often used in Bayesian inver- sion frameworks due to analytically available posterior models. Moreover, predictive quantities can be computed directly from the parameters of the posterior model.
Thenr-dimensional random vectorris a Gaussian random field (GRF) withnr-dimensional expectation vectorµrand(nr×nr)covariance matrix Σr, if its pdf is of the form (Johnson and Wichern, 2007)
p(r) = (2π)−nr/2|Σr|−1/2exp
−1
2(r−µr)T Σ−r1(r−µr)
, (6) with the superscriptT indicating the matrix transpose. We denote this Gaussian pdf by ϕnr(r;µr,Σr). A Gauss-linear forward model is of the
form , where is an matrix representing the
linear operation onrand thend-dimensional vectored|r is an error term with Gaussian distribution. We assume the error term to be zero in expectation; hence, the likelihood model can be expressed
p(d|r) =ϕnd(d;Fr,Σd|r), (7) where Σd|r is the associated (nd×nd) covariance matrix. The Gaussian posterior model ϕnr
r;µr|d,Σr|d
is easily available through computa- tion of its conditional parameters (Johnson and Wichern, 2007),
µr|d=µr+ΓrdΣ−d1(d−µd) (8) Σr|d=Σr−ΓTrdΣ−d1Γrd,
whereΓrd =ΣrFT contains the inter-variable covariances betweenrand d. Moreover, µd =Fµr andΣd=FΣrFT.
If a non-conjugate prior model is used, assessment of the posterior model tends to be simulation based. Markov chain Monte Carlo (McMC) is a widely used simulation technique for posterior model assessment (Mosegaard and Tarantola, 1995; Sen and Stoffa, 1996; Eidsvik et al.,
2004a).
Markov chain Monte Carlo simulation
McMC simulation schemes make use of Markov chains to simulate from the target distribution π(r) = p(r|d). The chain requires a transition kernel pt r0|r
, by which the chain can enter new states r0 given its current state r. Such a transition kernel must ensure that the target distributionπ(·) is the equilibrium distribution of the chain, i.e.,
Z
B
π(r) dr= Z
pt(B|r)π(r) dr, ∀B∈ B, (9) where
pt(B|r) = Z
B
pt(r0|r) dr0 (10) and B is the Borel σ-field on Rnr. This condition is ensured by using reversible chains where the transition kernel satisifies the detailed balance equation (Gamerman and Hedibert, 2006),
π(r)pt(r0|r) =π(r0)pt(r|r0), ∀(r,r0). (11) The transition kernelpt(r0|r) consists of a proposal kernelq(r0|r) and a probability α(r0|r)∈[0,1], such thatpt(r0|r) =α(r0|r)q(r0|r) for r0 6=r
defines the probability to move to a new state. Hence, the complement of the integral of the transition kernel over all new states defines the probability to remain in the current state, and the transition kernel can be expressed
pt(B|r) = Z
B
q(r0|r)α(r0|r) dr0 (12) +I(r∈B)
1−
Z
q(r0|r)α(r0|r) dr0
, ∀B ∈ B,
where I(·) is the indicator function, being equal to 1 if its argument is true and0 otherwise.
Identifying a transition kernel pt(r0|r) that can produce a Markov chain with the target distributionπ(·)as its equilibrium distribution may appear as a daunting task. Conveniently, the Metropolis-Hastings (M-H) algorithm (Hastings, 1970) provides the means by which to do so. The M-H algorithm designates an acceptance probability that ensures that the transition kernelpt(·|·)defines a reversible chain when combined with an arbitrary proposal kernel q(·|·),
α(r0|r) = min
1,π(r0)q(r|r0) π(r)q(r0|r)
. (13)
The M-H algorithm defines an irreducible and aperiodic chain with tran- sition kernel pt(·|·) withπ(·) as its limiting distribution if the proposal kernel q(·|·) is aperiodic and irreducible andα(r0|r)>0 for all possible values of(r0,r)(Roberts and Smith, 1994).
The sequence of simulations{rk}nk=1 from the McMC algorithm con- verges in distribution to the target distributionπ(r) as n→ ∞. Hence, obtaining a result in finite time necessitates decisions about where to start and when to stop the algorithm. Usually, the initial state of a chain is drawn at random and will not be from a representative region of π(r). A subsequent sequence of simulations are notably influenced by the initial state, before the chain is sufficiently close to π(r). This notably affected sequence of simulations is termed the burn-in of the chain, and we discard it because these simulations are not representative of π(r).
For practical purposes, we draw inspiration from the Gelman-Rubin con- vergence diagnostic (Gelman and Rubin, 1992) to decide when to stop a chain, unless otherwise stated. We run several chains in parallel and compare the results to determine an appropriate stopping point. The time necessary to reach a stopping point depends on how efficiently the chain navigates the target distribution π(r), which is captured by the concept of mixing. A chain is said to have good mixing if approximately
independent simulations are not far apart in the sequence of simulations, and conversely a chain has bad mixing if approximately independent sim- ulations are far apart. Good mixing is preferable because it entails that the chain rapidly arrives at an acceptable degree of convergence. The mixing of a chain is influenced by the proposal kernel q(·|·); hence, the proposal kernel should be selected with care if computational efficiency is a concern.
In a setting where the form of the posterior model is unknown, McMC simulation is traditionally performed by using the prior model p(r) as proposal distribution (Mosegaard and Tarantola, 1995). The associated M-H acceptance probability is given by
α(r0|r) = min
1,p(d|r0) p(d|r)
. (14)
Although this approach is generally valid, it tends to suffer from low acceptance rates in multivariate settings. Moreover, in the absence of a very informative prior model there is an inverse relationship between acceptance rates and data quality. The acceptance rate increases as the uncertainty associated with the measured data increases, and decreases as the number of data points increases.
If a prior model that is conjugate with respect to a linear likelihood model is used with a non-linear likelihood model, the conjugate property can be exploited for more efficient simulation by use of an approximated linear likelihood model, pL(d|r)≈p(d|r). Proposals can then be made from an approximate posterior modelpL(r|d)∝pL(d|r)p(r). Simulating from an approximate posterior model is similar to simulating from the prior model due to the conjugate property, but the proposals are likely to be better; hence, this approach may increase the acceptance rate. The associated M-H acceptance probability is
α(r0|r) = min
1,p(d|r0)/pL(d|r0) p(d|r)/pL(d|r)
. (15)
Note that α(r0|r) → 1 if the likelihood model is close to linear p(d|r)→pL(d|r), as expected.
Reservoir data
Oil and gas reservoirs are formations of rock in the subsurface in which hydrocarbons have accumulated. Reservoirs are expensive to produce and necessitate the construction of production wells. Hence, attempts to produce reservoirs that are poorly suited for production entails big economical losses. However, producing well suited reservoirs can yield very large payoffs. The ability to identify a reservoirs suitability for production is therefore of utmost importance in the oil and gas industry.
The process of doing so is called reservoir characterization, which we will return to later. Reservoir characterization relies on data from potential reservoirs, which usually consist of seismic data and well data. Seismic data has good spatial coverage and poor precision, whereas well data has poor spatial coverage and high precision.
Seismic AVO data
Seismic data are collected by emitting compressional waves into the sub- surface and measuring and registering the amplitude of the waves that are reflected back. These data and can be collected from large subsurface volumes in search for potential hydrocarbon reservoirs at relatively low cost. The arrival times of the reflections relative to the time of emission are also registered. The arrival times make it possible to map the regis- tered amplitudes to particular locations in the subsurface; hence, seismic data are a representation of the subsurface based on wave amplitudes and times.
The physics involved in the collection of seismic data is complex and relies on intimate details of the medium that the seismic waves propagate through. Seismic wave energy is reflected back at interfaces defined by rock or fluid inhomogeneities. Moreover, the proportion of reflected energy is dependent on the angle of incidence of the emitted seismic waves on the interfaces, which is described by the Zoeppritz equations (Zoeppritz, 1919). These equations relate the reflectivity coefficients at interfaces to changes in the elastic properties and angle of incidence. The elastic properties consist of P-wave velocity and S-wave velocity, which are jointly referred to as seismic velocities, and density.
The angle dependency of the reflected energy is very useful for detect- ing hydrocarbons, because the seismic responses of fluid transitions have a strong and characteristic angle dependency. This attribute forms the basis for seismic amplitude variation with offset (AVO) data and seismic AVO analysis. Seismic AVO data consist of seismic data associated with different angles of incidence of the emitted compressional waves on in-
terfaces in the subsurface. For a specific subsurface target, seismic AVO data are collected by varying the horizontal distance between the source that emits the compresssional waves and the receiver that measures the reflections, while keeping the subsurface target centered between them.
In practice, receivers are usually placed at numerous distances from the source to obtain seismic data from different subsurface targets simultane- ously. This principle is used in offshore seismic data collection, in which a large number of receivers are towed behind a moving ship that emits compressional waves into the subsurface.
As an emitted compressional wave travels through the subsurface, it becomes distorted and stretched due to dispersion, and the shape of the distorted pulse is termed a seismic wavelet. The measured reflection at a given point in time contains contributions from several reflectivity co- efficients in the subsurface, weighted by the seismic wavelet. Hence, the received signal is modeled as a convolution of a sequence of reflectivity co- efficients with a seismic wavelet in a time interval. In practice, the seismic wavelet is unknown and application dependent, because the dispersive process is sensitive to the medium that the seismic waves travel through.
Therefore, in order to construct a sensible seismic forward model, seismic data should be supplemented by well data from which the seismic wavelet can be estimated.
Well data
In oil and gas exploration, it is usual to drill bore-holes at locations of particular interest in a seismic survey to obtain information that enables interpretation of the seismic data. Core samples, which are informative about the subsurface lithology, are collected during drilling. Once the wells are drilled, measuring instruments are placed in the bore-holes and hoisted up while producing well log data on a regular and relatively fine grid along the well profiles. Measurements of the seismic velocities and density, as well as measurements of petrophysical properties are recorded.
The measuring instruments are highly accurate; hence, well log data are subject only to minor measurement errors and can be considered as good approximations to the truth.
The reflectivity coefficients that can be used to estimate the seismic wavelet along the wells can be computed from well data using the Zoep- pritz equations. Estimating the seismic wavelet at a few locations in a seismic survey tends to be adequate because the wavelet shape varies slowly with lateral position (Walden and White, 1998), due to layering effects. Another important use of well data is in the construction of rock physics models, which are necessary to interpret seismic data in terms
of petrophysical properties. The seismic data are related to the elastic properties, and a rock physics model can in turn relate the elastic prop- erties to petrophysical properties. A rock physics model can either be based on theoretical relations or be empirically approximated.
Bayesian reservoir characterization
The goal of reservoir characterization is to evaluate a reservoir’s suit- ability for production. Reservoir characterization is a spatial problem in which the reservoir zone of interest is discretized into a reservoir gridGr, consisting of nr grid points. A spatial reservoir variable r from which the production suitability of a reservoir can be inferred, is defined on the reservoir grid Gr. That is, each random variable contained in r is associated with a location in Gr, which enables spatial effects to influ- ence the characterization. A reservoir variable consists of a few selected petrophysical properties related to the producibility of the reservoir under study and typically includes permeability/porosity and water saturation, which are volumetric fractions, i.e., quantities limited to[0,1]. Porosity is informative about the presence of pores in the rock, which are pockets of empty space where hydrocarbons can settle. The degree to which the empty spaces in the rock are connected is described by permeability. The connectivity of the pores is important for fluid flow through the rock, which is crucial for the extraction of hydrocarbons. Permeability is usu- ally strongly dependent on porosity, and since the former is complicated to measure, it is often derived from the latter. Water saturation is in- dicative of the location of hydrocarbons. Because water is denser than hydrocarbons, gravitational effects tend to separate the fluids. Hence, hy- drocarbons can typically be found at locations where the water saturation is low. In reservoirs that are lithologically heterogeneous, porosity and water saturation may not be sufficiently informative and should prefer- ably be accompanied by lithology variables, such as volume of clay, to explain certain phenomena. The reservoir variable rcontainsnp selected petrophysical properties and can be expressed as r= [r1, ...,rnp], with each rk being defined on the reservoir gridGr, fork= 1, ..., np. Hence,r is an npnr-dimensional vector. For ease of presentation we will consider np = 1in the following.
Reservoir characterization of a subsurface reservoir volume is typically done by inversion of seismic data. This is an inverse problem in which seismic data d on a seismic grid Gd, consisting of nd grid points, is sought to be explained by the reservoir variables r. We confine ourselves to seismic AVO data. The inversion is either cast into an optimization
setting or into a probabilistic setting. In an optimization setting, a highly accurate seismic forward model is used and the deviation between dand the seismics computed by the seismic forward model is minimized with respect to the elastic properties (Sen and Stoffa, 2013). Probabilistically, the inverse problem is usually approached in a Bayesian framework using approximate Zoeppritz equations (Buland and Omre, 2003; Gunning and Glinsky, 2004; Larsen et al., 2006). The Bayesian seismic reservoir characterization is represented by the posterior model p(r|d), which is proportional to the product of a likelihood modelp(d|r)and a prior model p(r), both of which has to be specified, see Equation 5.
Likelihood model
The likelihood model represents the chain of data acquisition, from the target variables of the inversion to the data, and is based on geophysics theory and well data. Well data from representative regions of the reser- voir volume are particularly important for seismic wavelet estimation and may also be used to inform the rock physics model.
The Zoeppritz equations can in principle completely describe the relation between the PP reflectivity coefficients c(t, θ) and the elastic properties along a seismic trace. The elastic properties are canonical variables of the equation and consist of the seismic velocities, which are the P-wave velocitiesVp(t)and the S-wave velocitiesVs(t), and the densities ρ(t). However, the equations are difficult to interpret and their solution is unstable due to nonlinearity. Therefore, the Zoeppritz equations tend to be linearly approximated, and several linear approximations exist, including Bortfeld’s approximation (Bortfeld, 1961), Aki and Richards’
approximation (Aki and Richards, 1980), and Shuey’s approximation (Shuey, 1985). In Bayesian inversion frameworks, the Aki and Richards’
approximation is a common choice (Buland and Omre, 2003; Larsen et al., 2006; Grana and Della Rossa, 2010; Rimstad et al., 2012). We use a time continuous reflectivity function for the PP reflection coefficients, based on the Aki and Richards’ approximation (Buland and Omre, 2003),
c(t, θ) =a1(t, θ) δ
δtln Vp(t)
+a2(t, θ) δ
δtln Vs(t)
(16) +a3(t, θ) δ
δtln ρ(t) ,
where
a1(t, θ) = 1 2
1 + tan2(θ)
, (17)
a2(t, θ) =−4V¯s2(t)
V¯p2(t)sin2(θ), (18) a3(t, θ) = 1
2 1−4V¯s2(t)
V¯p2(t)sin2(θ)
!
. (19)
Moreover, V¯p(t),V¯s(t), and ρ(t)¯ are time dependent averages which are assumed to be adequately represented by a constant or moving average in a time window. The above approximation can for nθ offset angles be discretized and represented in matrix form as c = ADm. Here, we discretize according to Gr. Hence, c is an nθnr-dimensional vector.
Moreover,A is the sparse(nθnr×3nr) matrix
A=
A1(θ1) A2(θ1) A3(θ1)
... ... ...
A1(θnθ) A2(θnθ) A3(θnθ),
(20) whereA1(θi),A2(θi), andA3(θi)are(nr×nr)diagonal matrices contain- ing discrete time samples ofa1(t, θi),a2(t, θi), anda3(t, θi), respectively, for i= 1, ..., nθ. Furthermore, the (3nr×3nr) matrix D is a first order differential operator. Lastly, the3nr-dimensional vectorm contains the elastic properties, discretized on Gr, in the logarithmic domain. Hence, a reflectivity likelihood model is
c|m
=ADm+ec|m, (21)
where ec|m is an nθnr-dimensional vector containing approximation er- rors.
The seismic AVO datadare the convolution of a seismic wavelet with the reflectivity coefficients c. The wavelet is discretized to be consistent with the resolution of the reflectivity coefficients, and the convolutional likelihood model is
d|c
=Wc+ed|c. (22)
Here, the(nθnd×nθnr)matrixWcontains discretizations of the seismic waveletW(t, θ) for allnθ offset angles, and the nθnd-dimensional vector
ed|c contains observation errors. The wavelet matrix is of the form
W=
W1 0inr . . . 0inr 0inr W2 0inr . . . ...
... 0inr ... ...
... ... ... 0inr 0inr . . . 0inr Wnθ
, (23)
where the(nd×nr)matricesWicontainnd discretizations of the seismic wavelet, with each row corresponding to the seismic wavelet centered at a location inGd.
The seismic likelihood model is p(d|m) =
Z
p(d|c,m)p(c|m) dc= Z
p(d|c)p(c|m) dc, (24) which can be expressed
[d|m] =WADm+ed|m, (25)
whereed|m =Wec|m+ed|c. The error terms are typically assigned Gaus- sian distributions (Buland and Omre, 2003), which yields a Gauss-linear seismic likelihood model. This likelihood model can readily be used in a Bayesian seismic inversion framework for the elastic properties, given a prior modelp(m). A Gaussian prior model is advantageous for its conju- gate property, but not required. However, if reservoir variables are the target of the inversion, a rock physics likelihood model p(m|r)is needed.
The rock physics model can be integrated into the seismic inversion model or be used in rock physics inversion, i.e., after seismic inversion to elastic properties. However, for the uncertainties to propogate all the way from the seismic data to the reservoir variables, the data acquisition procedure should be described all the way from rtod by a likelihood model. The overall likelihood model is
p(d|r) = Z
p(d|m,r)p(m|r) dm= Z
p(d|m)p(m|r) dm, (26) where the latter step follows because m are canonical variables of the Zoeppritz equations. The rock physics forward function may be non-linear and, if so, the forward function in the overall likelihood model p(d|r) is non-linear. Under these circumstances, simulating from the posterior model tends to be computationally inefficient in spatial settings. However, if a linear forward function seems feasible, either by empirical estimation
or theoretical approximation, computationally efficient inversion schemes can be based on a Gauss-linear rock physics model. The rock physics forward function is then represented by the(3nr×nr) matrixBand the overall likelihood model takes the form
[d|r] =Gr+ed|r=WADBr+ed|r, (27) where ed|r =WADem|r+ed|m is assumed to be Gaussian, withem|r a 3nr-dimensional vector containing rock physics model errors. Hence, the overall likelihood model is defined to be Gauss-linear.
Prior & Posterior model
The prior modelp(r), see Equation 5, is assigned on a subjective basis, but should be representative of r. A prior model is representative if it is accurately centered and realistically represents the uncertainty and spa- tial continuity inr. Prior model assignment can be based on experience, expert knowledge, beliefs, data, or any combination thereof. If represen- tative well data of the reservoir variables are available, it is natural to adopt an empirical Bayes approach to prior model assignment, and we do so. The empirical Bayes method entails estimation of the prior model from representative data, which can be done either non-parametrically or parametrically. We will be concerned with parametric empirical Bayes, to which an introduction can be found in Casella (1985). Reservoir vari- ables tend to be multimodal due to underlying lithology/fluid (LF) classes (Grana and Della Rossa, 2010; Rimstad et al., 2012); hence, a Gaussian prior model may not be adequate. A prior modelp(r) should often have support for multimodality, and two Gaussian-based parametric model alternatives have emerged in the Bayesian seismic inversion literature;
namely, Gaussian mixture models and selection Gaussian models.
Gaussian mixture models
Gaussian mixture models (GMMs) can represent multimodal distribu- tions and have successfully been applied to reservoir characterization (Grana and Della Rossa, 2010; Rimstad et al., 2012; Fjeldstad et al., 2021).
As the name implies, GMMs are generated by combining Gaussian mod- els. Typically, the mixture is based on pre-defined LF classes that notably affect the reservoir variables; hence, the model relies on conditional Gaus- sian distributions for the reservoir variables,p(r|κ) =ϕnr(r;µr|κ,Σr|κ), where κ∈Ωnκr is a spatial mode indicator variable representing the LF classes. Here, Ωκ = {ω1, ..., ωnκ} is the set of LF classes. A Gaussian
mixture prior modelp(r) can be expressed as p(r) =X
Ωnrκ
p(r|κ)p(κ), (28) wherep(κ)is a prior model forκ. Defined as such, the prior model for the reservoir variables is a probability weighted sum of Gaussian distributions, each of which has a parameterization that is characteristic for a particular LF class.
The prior model for κ should honor geophysical laws and tenden- cies with respect to LF ordering and transitions, and spatial continuity.
Markov models can impose such constraints and have a long standing tra- dition in geophysical applications, first appearing in the form of Markov chains in 1D applications (Krumbein and Dacey, 1969). A first-order Markov chain prior model forκcan be expressed
p(κ) =p(κ1)
nr
Y
i=2
p(κi|κi−1), (29) where p(κ1) is the stationary probability and p(κi|κi−1), i= 2, ..., nr are transition probabilities. Higher order chains can be defined to add addi- tional constraints such as minimum layer thickness. The interpretability and functionality of Markov models have made them widely used in geo- physical applications, and in particular to exploring the LF properties of reservoirs (Eidsvik et al., 2004b; Larsen et al., 2006; Ulvmoen et al., 2010).
Generally, Markov models are specified in the form of Markov ran- dom fields (MRFs), which are typically defined according to the Gibbs formulation (Besag, 1974; Kindermann and Snell, 1980). This formula- tion generalizes the Markov chain definition to higher spatial dimensions and is based on so-called cliques. A clique is defined on an undirectional graph as a set of mutually adjacent vertices, i.e., every pair of vertices in a clique are adjacent. Moreover, a maximal clique c is a clique that is not a subset of a larger clique, and the set of maximal cliques on the graph is denoted by C. The MRF model can be expressed in terms of maximal cliques as
p(κ)∝exp
−X
c∈C
ψc(κc)
, (30) where ψc(κc) is the clique potential function for the maximal clique c andκc are the LF classes in the maximal clique c.
The neighbordhood structure of an MRF is related to its cliques, as described by the Hammersley-Clifford theorem (Hammersley and Clifford, 1971). Intuitively, the neighborhood structure of an MRF can be defined locationwise from its defining cliques as the set of locations in the union of all cliques that include the target location, except the target location itself, as illustrated in Figure 1. The MRF model supports arbitrary spatial dimensionality and higher order neighborhoods; hence, the Markov chain model specified in Equation 29 is a special case of an MRF in 1D with neighborhoods consisting of the two nearest locations, except for border effects.
Clique Neighborhood
Figure 1: Cliques with corresponding neighborhoods. The neighborhoods are defined with respect to the cells marked by a cross.
The full conditional densities arep(κi|κic), with the subscriptic de- noting the complement of location ion the graph, i.e., ic ={1, ..., nr}\i, i = 1, ..., nr. These densities are defined by the neighborhood struc- ture of the MRF, which can be identified through application of the Hammersley-Clifford theorem. The typically large grids associated with reservoir characterization entails that the normalization constant of the MRF is computationally prohibitive. Therefore, assessment ofp(κ)tends to be simulation based and is often done by using the full conditional densitiesp(κi|κic), that is, simulation is usually done by single-site Gibbs sampling.
GMMs are conjugate prior models subject to Gauss-linear likelihood models (Grana et al., 2017; Fjeldstad and Grana, 2018); hence, p(r|d)is
also a GMM and of the form p(r|d) =X
Ωnrκ
p(r|κ,d)p(κ|d). (31) Note that the conditional pdf p(r|κ,d) is Gaussian and therefore easy to evaluate. Evaluation of the mixing weights p(κ|d) is more involved.
Assessment of the posterior modelp(κ|d) is complicated due to the ver- tically convolutional seismic likelihood model, which results in p(κ|d) being a convolved hidden Markov model (Lindberg and Omre, 2014).
Consequently,p(κ|d) will be a higher-order Markov model, irrespective of the order defined in p(κ). This entails that assessment of the exact posterior Markov model becomes computationally infeasible. However, assessment can be based on approximations (Rimstad and Omre, 2013;
Fjeldstad and Grana, 2018).
Selection Gaussian models
The selection Gaussian model is inspired by the concept of selection prob- ability distributions (Azzalini, 1985; Arellano-Valle et al., 2006; Azzalini, 2013). The selection concept has been extended to spatial settings (Al- lard and Naveau, 2007; Omre and Rimstad, 2021) and has successfully been applied to seismic inversion (Karimi et al., 2010; Rimstad and Omre, 2014). The selection Gaussian model is a very flexible class of models and is a viable candidate for representing multimodal random variables.
A selection Gaussian random field (S-GRF) is based on two interacting GRFs: the basis GRF ˜r and the auxiliary GRF ν. The basis GRF is specified on the reservoir gridGr, whereas the auxiliary GRF is specified on a gridGν, which may differ. The conditional random field (RF)[ν|˜r]
is Gauss-linear; hence, (˜r,ν) are jointly Gaussian, with joint pdf p
"
˜ r ν
#
=ϕnr+nν
"
˜ r ν
#
;
"
µr˜ µν
# ,
"
Σr˜ Γrν˜ ΓTrν˜ Σν
#
. (32) Here,µ˜ris thenr-dimensional expectation vector of˜randΣ˜ris its(nr× nr) covariance matrix. Similarly, µν is thenν-dimensional expectation vector ofν andΣν is its(nν×nν) covariance matrix. Lastly,Γrν˜ is the (nr×nν)cross-covariance matrix between ˜randν. A truncation of the auxiliary GRFν is defined according to thenν-dimensional selection set A. The conditional RFr= [˜r|ν ∈A] is selection Gaussian, with pdf
p(r) =p(˜r|ν ∈A) = Φnν(A;µν|˜r,Σν|˜r)
Φnν(A;µν,Σν) ϕnr(˜r;µr˜,Σr˜), (33)
where the nominator and denominator are the Gaussian probabilities of the selection set. That is, the denominator is
Φnν(A;µν,Σν) = Z
A
ϕnν(ν;µν,Σν) dν, (34) and similarly for the nominator with appropriate distributional parame- ters. The conditional parameters in Equation 33 are computed by stan- dard Gaussian conditioning formulas (Johnson and Wichern, 2007),
µν|˜r=µν +ΓT˜rνΣ−˜r1(˜r−µr˜) (35) Σν|˜r=Σν −ΓT˜rνΣ−˜r1Γ˜rν.
The flexibility of the distribution enters through the cross-covarianceΓrν˜ between˜randν, and the shape of the selection setA.
To robustly represent the variability in the reservoir variables, a spa- tially stationary prior model is sensible, because it captures the total uncertainty reflected in the well data at every location. An RF is said to be stationary if its associated pdf is spatially shift invariant for any subset of its random variables. That is, the pdf of the chosen subset of random variables must depend only on the distances between them and not on their specific location. We specify an S-GRF model withnν =nr, which can support stationarity, except for border effects. Moreover, it enables mode transitions at every location in the reservoir grid, which can be important for precisely locating regions of interest in the spatial domain. To restrict model complexity, we specify an intervariable spatial correlation structure, which is contained in the(nr×nr) matrixΩ and defined through a translation invariant and positive definite correlation functionρ(·). We define a stationary Gaussian distribution for˜rwith ex- pectation vectorµ˜rinr and covariance matrixΣ˜r=σr2˜Ω. Here,µr˜andσr2˜ are the locationwise expectation and variance of˜r, respectively, andinr is the nr-dimensional vector of ones. Furthermore, because the influence of[ν|ν ∈A] on˜rdepends on the location ofAonly through the relative location of A to the distribution of the auxiliary variable, we define a stationary and locationwise standard Gaussian distribution forν, with ex- pectation vectorµν = 0inr and covariance matrixΣν =γ2Ω+(1−γ2)Inr, where Inr is the (nr×nr) identity matrix. Here, γ is the locationwise correlation between r˜ and ν. In this framework, the cross covariance matrix is an (nr×nr) matrix of the form Γrν˜ =γσ˜rΩ. Lastly, we use a location invariant selection set, i.e., A=Anr. The locationwise selec- tion set Aconsists of nA disjoint real intervals,A=SnA
i=1[ai, bi], ai < bi. The S-GRF specified above is stationary and the conditional parameters
involved in its pdf in Equation 33 can be expressed
µν|˜r= 0inr+γσ−1˜r (˜r−µr˜inr) (36) Σν|˜r= (1−γ2)Inr.
Hence, the model parameters are ΘSG = [µ˜r, σ˜r, γ, A, ρ]. The first four parameters are primarily related to the locationwise distributions of the S-GRF, whereasρ(·)primarily relates to the spatial correlation structure.
We now briefly consider S-GRFs of low dimensionality to build some model intuition. Figure 2 presents three examples of univariate selection Gaussian pdfs and associated cumulative probability functions (cdfs), with the effect of the selection set highlighted. The geometry of the selec- tion setA notably affects the selection Gaussian distribution and is the source of vast model flexibility, provided that the correlationγ between the basis variable r and the auxiliary variable ν is sufficiently strong.
Figure 3 illustrates the selection Gaussian distribution in a multivariate setting based on the bimodal and trimodal univariate distributions shown in Figure 2.
Figure 2: Univariate selection Gaussian distributions. Skewed (top row), bimodal (middle row), and trimodal (bottom row). The left column displays the shape of the selection setA superimposed on the joint dis- tribution of r˜and ν, whereas the middle and right column display the resulting pdf and cdf ofr, respectively.
Figure 3: Multivariate selection Gaussian distributions with bimodal (top row) and trimodal (bottom row) marginals. Bivariate distributions (left column) and realizations from corresponding 2500-variate S-GRFs (right column). A moderate level of spatial correlation is used.
S-GRFs are conjugate prior models with respect to Gauss-linear like- lihood models (Omre and Rimstad, 2021); hencep(r|d)is also an S-GRF and of the form
p(r|d) =p(˜r|ν ∈A,d) (37)
= Φnr
A;µν|r,d˜ ,Σν|˜r,d Φnr
A;µν|d,Σν|d ×ϕnr
˜
r;µr˜|d,Σr|d˜ .
The expressions for the parameters involved in the posterior model can be developed from classical Gaussian theory. The conditional expectation vectors are
"
µr˜|d µν|d
#
=
"
µ˜rinr 0inr
# +
"
Σr˜GT Γν˜rGT
#
Σ−1d (d−µd), (38) and the conditional covariance matrices are
"
Σ˜r|d Γ˜rν|d Γνr|d˜ Σν|d
#
=
"
Σr˜ Γrν˜ Γν˜r Σν
#
−
"
Σr˜GT Γν˜rGT
# Σ−1d h
GΣr˜ GΓ˜rνi
, (39)
hence
µν|˜r,d=µν|d+Γν˜r|dΣ−1˜r|d(˜r−µr˜|d), (40) Σν|˜r,d=Σν|d+Γν˜r|dΣ−1˜r|dΓ˜rν|d.
Recall thatGis the forward function of the likelihood model. Further- more, Σd =GΣrGT.
The prior and posterior models are, therefore, both available once a suitable assessment strategy for S-GRFs is developed. The assessment strategy should be capable of dealing with non-stationary S-GRFs because the posterior model will, even if the prior model is stationary, tend to be non-stationary. In the following, we will discuss the assessment of a prior S-GRF model, but the same discussion applies to the assessment of the posterior model.
In the univariate case, the selection Gaussian distribution can easily be evaluated analytically by use of well known Gaussian cdfs,
p(r) =p(˜r|ν ∈A) (41)
= PnA
i=1
Φ1(bi, µν|˜r, σ2ν|r˜)−Φ1(ai, µν|˜r, σν2|r˜) PnA
i=1 Φ1(bi, µν, σν2)−Φ1(ai, µν, σ2ν) ϕ1(˜r;µr˜, σr2˜).
Evaluation of the pdf becomes challenging for high dimensional S- GRFs and is usually simulation based. The simulation is performed in two steps; first the auxiliary variableν is simulated from the truncated Gaus- sian pdfp ν|ν ∈A
=I(ν ∈A)ϕnν(ν;µν,Σν)
Φnν (A;µν,Σν)−1
and then the conditional basis variable
˜
r|ν,ν∈A
is simulated from the con- ditional Gaussian pdfϕnr
˜r;µ˜r|ν,Σ˜r|ν
. The latter step is straightfor- ward because efficient algorithms for simulation from GRFs are available, whereas the former is more involved and typically relies on an McMC algorithm. A sensible simulation strategy is to simulate the truncated auxiliary GRF [ν|ν ∈ A] piece by piece; hence, the form of the condi- tional pdfs of the RF is important. For a blockb⊆ {1, ..., nr}of sizenb, a block based decomposition of the pdf p(ν|ν ∈A) is
p(ν|ν ∈A) =p(νb,νbc|νb∈Anb,νbc ∈Anr−nb) (42)
=p(νb|νbc,νb∈Anb)p(νbc|νb ∈Anb,νbc ∈Anr−nb), where the subscript bc denotes the complement of the block. In the following, we somewhere omit subscripting expectations and variances for ease of notation. The expectations and variances that appear without
a subscripted variable are to be understood as if subscripted by ν. We first consider the case of a single-site block, i.e., a block of size nb = 1 that consists only of one location, b=i,i∈ {1, ..., nr}. The associated block pdf is
p(νb|νbc,νb ∈Anb) = I(νb∈Anb)p(νb|νbc)
P(νb∈Anb|νbc) (43)
= I(νi ∈A)ϕ1(νi;µi|ic, σi2|ic) Φ1(A;µi|ic, σ2i|ic) .
This block pdf is a truncated Gaussian pdf and represents the locationwise full conditional pdfs of the truncated GRF. Hence, the locationwise full conditional pdfs of the truncated GRF are computationally easily available, which entails that single-site Gibbs sampling may be feasible.
Single-site Gibbs sampling is a viable and efficient simulation strategy for RFs with weak spatial correlation structure that simulates sequentially from the locationwise full conditional distributions. The single-site Gibbs simulation algorithm is presented in pseudo code in Algorithm 1.
Algorithm 1: Dok simulation sweeps of
ν|ν ∈A
by single-site Gibbs.
Precompute conditioning weights ki and conditional marginal variancesσ2i|ic:
For i from 1 to nr ki =Σi,icΣ−ic1. σ2i|ic =σ2i −kiΣTi,ic. End
Initialize ν0 ∈A.
For j from 1 to k Set νj =νj−1 For i from 1 to nr
Compute conditional mean and simulate:
µji|ic =µi+ki
νjic−µic
.
νij ∼ I(ν
j i∈A)ϕ1
νji;µji|ic,σ2i|ic Φ1
A;µji|ic,σ2i|ic . End
End
Note that the conditioning weightski are identical for locations that are not influenced by border effects in stationary S-GRFs, which may be exploited for more efficient precomputation. In non-stationary S-GRFs the conditioning weights tend to be unique for each location. Moreover, computation of the parameters involved in the full conditional marginals may be prohibitive if the grid under study is large and the range of the spatial correlation is long. In such situations it may be necessary to reduce computational time by approximating the full conditionals as π(νi|νic) ≈π(νi|νin), with νin consisting of random variables that are notably correlated with νi.
The feasibility of the single-site Gibbs algorithm depends on the properties of the locationwise full conditional pdfs. These pdfs become increasingly constrained with increasing correlation in the RF, which is detrimental to the mixing of the algorithm in multimodal settings. Block- wise sampling, which is based on a partition of the grid into blocks con- sisting of collections of adjacent grid points, offers a possible solution to the limitations of single-site Gibbs sampling. The sampling is performed blockwise and sequentially, which enables reduction of the influence of the RF outside the block on the locationwise marginals within the block by strategic inter-block sampling. The pdf of a block b⊆ {1, ..., nr} of sizenb >1is
p(νb|νbc,νb ∈Anb) = I(νb∈Anb)p(νb|νbc)
P(νb ∈Anb|νbc) (44)
= Q
i∈bI(νi ∈A)ϕ1(νi;µi|bc,v, σi|b2 c,v) Φnb(Anb;µb|bc,Σb|bc) .
Here, the subscript bc,v denotes the union of the complement of the block and the already visited locations within the block. Note that the normalization constant Φnb(Anb;µb|bc,Σb|bc) can not be expressed in a sequentially conditional form; hence, assessment of the block pdf requires evaluation of a high dimensional Gaussian orthant probability, which is challenging. Therefore, we use the Metropolis Hastings approach presented in Rimstad and Omre (2014) with proposal
q(νb|νbc,νb ∈Anb) =Y
i∈b
I(νi∈A) ϕ1
νi;µi|bc,v, σi2|bc,v
Φ1
A;µi|bc,v, σ2i|bc,v
, (45)