Data fusion algorithms for assessing sensors’ accuracy in an oil production well : a Bayesian approach


FACULTY OF SCIENCE AND TECHNOLOGY

MASTER'S THESIS

Study program/specialization:

Master in Computer Sciences

Spring semester, 2009

Open / Confidential

Author: Rui Máximo Esteves

Instructor(s): Prof. Dr. Chunming Rong; Engº Tomasz Wlodarczyk

Supervisor(s): Eng. Einar Landre (Statoil Hydro)

Title of Master’s Thesis: Data fusion algorithms for assessing sensors’ accuracy in an oil production well - A Bayesian approach.

Norwegian title:


Acknowledgements

The author would like to express his special gratitude to:

Prof. Dr. Chunming Rong from University of Stavanger;

Engº Einar Landre from Statoil Hydro;

Engº Tomasz Wlodarczyk from University of Stavanger;

Engº Mohammad Rajaieyamchee from University of Stavanger;

Engº Terje Kastad from University of Stavanger.


Abstract

The oil industry underutilizes the data captured during the extraction process. This is a consequence of the lack of information about the sensors’ accuracy, and it can be a serious obstacle to the development of computer-assisted decision systems.

In a production well there may be neither sensor redundancy nor enough information to assess credible probabilities. In this situation we depend strongly on the experts’ ability to provide alternatives based on their understanding. This dependence can be a critical limitation and makes it particularly difficult to establish a prediction model.

In this work we propose a Bayesian Network approach as a promising data fusion technique for the surveillance of sensor accuracy. We demonstrate the usefulness of this method when there seems to be too little usable data to construct a model. In the presence of certain data constraints we suggest inverting the causal relationship. This approach can help the expert in assessing conditional probabilities.


Index of Contents

Acknowledgements

Abstract

Index of Contents

Index of figures

Index of tables

Chapter I - INTRODUCTION

Thesis Overview

Aim of the thesis

Chapter II - THEORY

Data Fusion taxonomy

Introduction

JDL Based Taxonomy

Raw data level

Methods

Feature data level

Methods

Decision level

Methods

Hard and Soft Decision Taxonomy

Why Bayesian?

Bayesian theory

Bayesian Network

Applications

Limitations

Chapter III - THE MODEL

Hypothesis

Methodology

Assumptions

Data considerations

Experiments

Approach A

Approach B

Test and results

Chapter IV - CONCLUSIONS

Conclusions

Further developments

References

ANNEXES

ANNEX E- Example of Neural Networks applied to pattern recognition

ANNEX F- Example of Fuzzy Logic applied to a temperature control device

ANNEX G- Genetic Algorithms Uses

ANNEX H- GA example applied to the Traveling Salesman problem

ANNEX I- Forward chaining expert system example

ANNEX J- Backward chaining expert system example

ANNEX L- Expert Systems uses

ANNEX M- Conditional Probabilities Tables


Index of figures

Figure 1- A simple Feed Forward Neural Network

Figure 2- A simple neuron

Figure 3- The GA reproductive cycle

Figure 4- The architecture of Hearsay III- a speech understanding system

Figure 5- BN example to elucidate different types of queries

Figure 6- BN example of an inverse graph approach

Figure 7- Example of serial connection

Figure 8- Example of d-separation- Z nodes are ascendants of X and Y

Figure 9- Example of d-separation

Figure 10- The tree problem

Figure 11- Bayesian Network according to approach A

Figure 12- Example of one simulation

Figure 13- Example of an incoherent simulation

Figure 14- Bayesian Network according to approach B

Figure 15- Example of BN under the same conditions as approach A

Figure 16- Experiment to reduce the CPT’s tables complexity

Figure 17- The final model including the sensor age condition

Figure 18- Kalman filter application

Figure 19- Decision Boundaries

Figure 20- The Neural Network

Figure 21- A simple block diagram of the control system

Figure 22- Typical control system response

Figure 23- The rule structure & rule matrix

Figure 24- The features of a membership function

Figure 25- Example errors

Figure 26- Early stage solution example

Figure 27- Optimal solution

Figure 28- Overview of the GA Performance


Index of tables

Table 1- JDL Based Taxonomy

Table 2- Advantages and disadvantages of Genetic Algorithms

Table 3- Hard and Soft Decision Taxonomy

Table 4- P(Sick)

Table 5- P(Dry)

Table 6- P(Loses | Sick, Dry)

Table 7- Analysis of data

Table 8- Classification into conditions’ states

Table 9- Results presented by the model

Table 10- Mass assignments for the various aircraft


Chapter I - INTRODUCTION

This chapter gives a short overview of the work. It starts with a description of the background and the importance of the thesis, continues with a general outline of the work, and finishes with the definition of its aim.


Thesis Overview

In the oil & gas extraction process each reservoir can be divided into homogeneous zones.

From an IT perspective an oil & gas zone can be seen as a closed uniform environment that contains some mixture of hydrocarbons under the same pressure and temperature conditions. Each well has a set of sensors to measure environmental conditions such as temperature and pressure. These conditions differ between the head of the well and the reservoir (hole). A choke placed between these two places controls this difference.

Statoil Hydro stated that the pressure gauges lose performance with time. As a well’s lifetime goes on, the measurements become more uncertain. The estimated lifecycle of a well can be more than 10 years, whereas working at high temperatures can reduce the sensor lifetime to 2-3 years.

So far, Statoil has no information about the accuracy of the measurements provided by the sensors. They suspect that one or more sensors may be inaccurate, but they cannot identify which one. Consequently, the usefulness of this data has been very limited. Statoil wants more information about the accuracy of the sensors’ measurements in order to increase their reliability.

In a production well, the quality of the sensors’ measurements is an issue that has not received the attention it deserves. According to domain specialists, studies in this field could benefit the oil sector by providing better control of the extraction process. This understanding should be a cornerstone in the development of decision support systems.

The usefulness of complex systems can be questioned when there is no information about


to data fusion levels. We stated that our problem was at the feature level and that Bayesian Networks could be a promising method.

However, the examples found in the literature generally assume the existence of data to assess probabilities, or the ability of an expert to express them easily as beliefs. As this was not the case, we established the following hypothesis:

H: In the absence of data we may use a Bayesian Network for sensor accuracy surveillance.

Later we experimented with different ways to construct the Bayesian Network. The aim was to test whether the hypothesis was true.

We tested the conventional approach to designing the Bayesian Network structure without success. However, we found it more plausible to construct the model by reversing the causal direction of the relationships. In this way, the knowledge of the expert can be expressed more easily in the presence of certain data constraints.

To develop this work we had several meetings with experts in the domain.


Aim of the thesis

The present model attempts to solve the following oil extraction problem.

In each production well we have three different sensors (S)¹ that can be inaccurate:

- bht (S1) – borehole temperature;

- bhp (S2) – borehole pressure;

- whp (S3) – well head pressure.

The model’s aim is to assess probabilities for the sensors’ accuracy in a production well.

The challenge is doing it with no trustworthy data to construct the model.

We want to investigate whether this is possible using data fusion techniques and experts’ knowledge.


Chapter II - THEORY

We start our theoretical review with a classification of several data fusion methods into a taxonomy. For those who are not familiar with them, we provide some examples as annexes.

We then explain why the Bayesian approach was the chosen method and present its underlying theory in more detail. A review of similar studies follows. The chapter ends with an overview of the Bayesian limitations.


Data Fusion taxonomy

Introduction

There are several distinct data fusion taxonomies. One of the best known was developed by the Joint Directors of Laboratories (JDL) of the U.S. Department of Defense. The JDL model was developed for military purposes and consists of four levels of data fusion:

1. identification and description of the objects;

2. interactive process to fuse spatial and temporal entities relationships;

3. combination of the activity and capacity of the enemy forces to infer their force;

4. related with all other levels and is responsible for regulation of the fusion process.

This model has also been used in other fields, such as image processing. However, given its specific nature, it is difficult to use in other domains. More generic models based on JDL have been proposed by other authors.

[1] presented the Data Fusion Architecture (DFA), in which the division into levels takes into consideration the difference between data and variable. According to these authors, data can be defined as a measurement of the environment that is generated by a sensor or other type of source, and a variable is determined by an analysis of the data (feature


DFA presents three levels of data fusion:

1. data oriented;

2. task oriented (variable);

3. mixture of data and variable fusion.

The levels differentiate whether the fusion process is made before any data analysis (at the data level), after the data has been analyzed (at the variable level), or is done on a combination of raw data and variables (at the mixture level).

Other authors proposed variations of the JDL model. In the next pages, a taxonomy based on [2] [3] will be presented.


JDL Based Taxonomy

In table 1 we present a three-level taxonomy based on the JDL model. However, this classification of methods should not be seen as rigid. Depending on the application, some of them can be used at several levels.

JDL Based Taxonomy

Level | Methods
Raw data | Kalman Filtering; Figure of Merit; Gating
Feature data | Bayesian Theory; Dempster-Shafer; Neural Networks; Clustering Algorithms; Template Methods
Decision | Fuzzy Logic; Genetics Algorithms


Raw data level

At this level the data fusion is performed directly on the sensor data. According to [3], when multisensor data is commensurate (i.e. data of the same nature measuring the same physical phenomena), the raw sensor data can be directly combined.

The data association can be done by correlation of one set of sensor observations with another set of observations.

Methods

Kalman Filtering

The Kalman Filter can be defined as: “a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error” [4]. According to these authors, “The filter can be very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown”. This feature can be used in target positioning by removing the noise from sensor signals in order to better determine the present and future positions [2]. The filter uses a recursive solution in that each updated estimate of the state is computed from the previous estimate and the new input data. This leads to an efficient computing solution, as only the previous estimate requires storage.

Kalman filters are based on linear dynamical systems discretised in time. It is assumed that the system and the measurements are affected by white Gaussian noise. This means the noise is not correlated in time, and thus we can assume that at each discrete time, the


linearization procedure. The resulting filter is referred to as the extended Kalman filter (EKF) [6].

This filter is used for vision tracking in robotics, real-time traffic-control algorithms, and autonomous driving systems.

The steps to use the Kalman filter for vision tracking are:

1. Initialization (k=0). In this step the object is searched for in the whole image, because its position is not known in advance. This gives x0. Initially, a large error tolerance can be assumed.

2. Prediction (k>0). In this stage the Kalman filter is used to predict the relative position of the object. This position is taken as the search center to find the object.

3. Correction (k>0). In this stage we locate the object (which is in the neighborhood of the point predicted in the previous stage) and use its real position (the measurement) to carry out the state correction with the Kalman filter.

Steps 2 and 3 are repeated while the object tracking runs. [6]

Annex A presents an example of a Kalman filter applied to robotic football.
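The initialization, prediction and correction stages listed above can be illustrated with a minimal Python sketch. It is not the tracker described in [6]: it assumes a single scalar position, a random-walk motion model, and illustrative noise variances `q` and `r`. The names `ScalarKalman` and `track` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # state estimate (e.g. object position along one axis)
    p: float  # variance of the estimate (large value = big error tolerance)
    q: float  # process noise variance (assumed value)
    r: float  # measurement noise variance (assumed value)

    def predict(self) -> float:
        # Random-walk model (an assumption): the position is expected to
        # stay where it is, but its uncertainty grows by the process noise.
        self.p += self.q
        return self.x

    def correct(self, z: float) -> float:
        # Blend prediction and measurement according to their variances.
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def track(measurements, x0, p0=1000.0, q=0.01, r=4.0):
    """Run prediction and correction once per measurement."""
    kf = ScalarKalman(x=x0, p=p0, q=q, r=r)  # initialization (k=0)
    estimates = []
    for z in measurements:
        kf.predict()                       # prediction (k>0)
        estimates.append(kf.correct(z))    # correction (k>0)
    return estimates, kf.p


estimates, final_variance = track([10.0] * 20, x0=0.0)
```

With a constant measurement of 10.0 and a deliberately wrong initial guess of 0.0, the large initial variance `p0` lets the first correction move the estimate almost all the way to the measurement. Only the previous estimate and its variance are stored between steps.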

Other methods

[2] made reference to Figure of Merit and Gating as used with the aim to decide which


Feature data level

When multisensor data is not commensurate, it can be fused at the feature/state vector level. The aim of this level is the extraction of representative features from the raw data³. At this level, one should extract the features from the various sensor observations and combine them into a single feature vector [3]. This feature vector should be a synthesis of more meaningful information for guiding human decision-making.

Methods

Bayesian theory

According to [2], Bayesian theory is one of the most common techniques employed in level two of data fusion. These authors encourage the use of Bayesian methods: “The use of multiple sensors in data fusion projects can produce conflicting data which, in turn, can cause decision problems. Application of the Bayesian theorem in such cases has proven successful in overcoming this challenge. It models the unknown system state by using probabilistic functions to determine an appropriate set of actions”. Since a certain level of uncertainty is generally associated with sensor data, decisions can be improved by quantifying the uncertainty behind each sensor decision and then comparing it with some predetermined decision threshold level.

Annex B presents an example of Bayesian theory applied to pattern recognition.


Dempster-Shafer

Dempster-Shafer theory (DS, or the theory of belief functions) can be considered a generalization of the Bayesian theory of subjective probability. However, in contrast to the Bayesian theory, DS does not require probabilities for each question of interest: belief functions allow us to base degrees of belief for one question on probabilities for a related question [7].

“Dempster-Shafer allows alternative scenarios for the system, such as treating equally the sets of alternatives that have a nonzero intersection: for example, we can combine all of the alternatives to make a new state corresponding to “unknown”. But the weightings, which in Bayes’ classical probability theory are probabilities, are less well understood in Dempster-Shafer theory. Dempster-Shafer’s analogous quantities are called masses, underlining the fact that they are only more or less to be understood as probabilities” [8].
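As a hedged illustration of how masses from two sources are fused, the sketch below implements Dempster's rule of combination in Python. The two-state frame (`ok`, `faulty`), the function name `combine` and all mass values are hypothetical and only show the mechanics, including a mass assigned to the whole frame to represent "unknown".

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset of hypotheses to its mass.
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contradict each other: accumulate the conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize so that the remaining masses sum to one.
    return {h: m / (1.0 - conflict) for h, m in fused.items()}


OK = frozenset({"ok"})
FAULTY = frozenset({"faulty"})
UNKNOWN = OK | FAULTY  # mass on the whole frame means "don't know"

# Illustrative masses only, not taken from any data set.
source_1 = {OK: 0.6, UNKNOWN: 0.4}
source_2 = {FAULTY: 0.3, UNKNOWN: 0.7}
fused = combine(source_1, source_2)
```

In this example the two sources partly contradict each other (a conflict mass of 0.18), and the rule redistributes the remaining 0.82 over `ok`, `faulty` and `unknown`.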

Object recognition is one of the uses of this method. It has been applied to the detection of ship wakes from synthetic aperture radar images [9], robotics and automated guided vehicles [2], color image segmentation [10], and representing the uncertainty inherent in the characterization of containerized radiological waste [11].

To better understand the concept of DST, a simple example is presented in ANNEX C and


Neural Networks

The simplest NN is known as a Perceptron which is a system with an input and an output layer.

A Feed Forward Neural Network is a system with an input layer, an output layer and at least one hidden intermediate layer, all formed by simple interlinked computational units called neurons.

Figure 1- A simple feed Forward Neural Network

After a training process the neurons establish synapses (weights) between them, and the network should then be able to respond to new situations [12].
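The flow of a signal through such a network can be sketched in a few lines of Python. This is only an illustration of the forward pass under assumed choices: a sigmoid activation, one hidden layer, and hand-picked weights. The training process that would normally set the weights is not shown, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs and applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Feed the inputs through the hidden layer, then through the output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)


# Two inputs, two hidden neurons, one output neuron (illustrative weights).
output = forward(
    inputs=[1.0, 0.5],
    hidden_w=[[0.4, -0.2], [0.3, 0.8]],
    hidden_b=[0.0, -0.1],
    out_w=[[1.2, -0.7]],
    out_b=[0.05],
)
```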


This network has some advantages over both Bayesian and DSER methods. The most relevant is the ability to perform data fusion in a parallel way without the need for a priori information [2]. Neural networks have been widely applied to forecasts of different natures: weather, traffic, internet traffic and stock market, among others. There are also numerous other applications of this method: the Traveling Salesman's Problem (only to a certain degree of approximation); Medicine; Electronic Nose; Security; Loan Applications; Character Recognition; and Image Compression [13].

In annex E there is an example illustrating the application to pattern recognition.

Other methods

Less commonly used techniques for feature data fusion are clustering algorithms and template methods [14].


Decision level

In this situation the fusion occurs at the decision level. According to [14], it can also be called postdecision or postdetection fusion. It can be achieved by applying Boolean operators, or by using a heuristic score over combinations of decisions from independent sensor detection or classification paths.

Methods

Fuzzy Logic

Fuzzy Logic is a method appropriate for modeling situations where the boundaries are not clearly identified. Fuzziness can be present in abstract and concrete situations, and this theory allows their relevant attributes and relationships to be specified [15]. Regarding the drift of sensors away from perfect calibration, [16] referred to the usefulness of Fuzzy Logic “in capturing the desired behavior of the classification algorithm for diverse and nonlinear sensor responses is that we can blend information according to our human expert knowledge”. According to this author, the inputs to the fuzzy logic could be outputs from other algorithms, such as neural networks, or other inference logic networks.

[17] described the following features of FL that make it a particularly good choice for many control problems:

• It is inherently robust since it does not require precise, noise-free inputs and can be programmed to fail safely if a feedback sensor quits or is destroyed.

• The output control is a smooth control function despite a wide range of input variations.


system performance. New sensors can easily be incorporated into the system simply by generating appropriate governing rules.

• FL is not limited to a few feedback inputs and one or two control outputs, nor is it necessary to measure or compute rate-of-change parameters in order for it to be implemented. Any sensor data that provides some indication of a system's actions and reactions is sufficient. This allows the sensors to be inexpensive and imprecise thus keeping the overall system cost and complexity low.

• Because of the rule-based operation, any reasonable number of inputs can be processed (1-8 or more) and numerous outputs (1-4 or more) generated. However it would be better to break the control system into smaller chunks and use several smaller FL controllers distributed on the system, each with more limited responsibilities.

• FL can control nonlinear systems that would be difficult or impossible to model mathematically. This opens doors for control systems that would normally be unfeasible for automation.

Fuzzy Logic can be used in several fields, such as:

• selection of the most suitable material for a particular application; hydrodynamic lubrication; elastohydrodynamic lubrication, fatigue and creep; cumulative fatigue damage analysis; reliability assessment; process control; total risk and reliability with human factors; system condition auditing; reframing


[17] suggests the following steps to design an FL system:

1. Definition of the control objectives and criteria: What am I trying to control? What do I have to do to control the system? What kind of response do I need? What are the possible (probable) system failure modes?

2. Determination of the input and output relationships. One should choose a minimum number of variables for input to the FL engine (typically error and rate-of-change-of-error).

3. Break the control problem down into FL rules. The problem should be split into a series of IF X AND Y THEN Z rules that define the desired system output response for given system input conditions.

4. Creation of FL membership functions. The membership functions define the meaning (values) of the Input/Output terms used in the rules.

5. Development of necessary pre- and post-processing FL routines if implementing in S/W, otherwise program the rules into the FL H/W engine.

6. System test: evaluate the results, tune the rules and membership functions, and retest until satisfactory results are obtained.

In annex F there is an example applied to a temperature control device.
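Steps 3 and 4 of the design procedure above can be made concrete with a small Python sketch. It is not the controller of annex F: the single input (temperature error), the membership breakpoints, the three rules and the weighted-average defuzzification are all assumptions chosen for brevity, and the function names are hypothetical.

```python
def falling(x, a, b):
    """Left shoulder: membership 1 up to a, decreasing linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def heater_command(error):
    """Map a temperature error (measured minus target, in degrees) to a
    command in [-100, 100]: positive means heat, negative means cool."""
    # Membership functions (breakpoints are assumed values).
    too_cold = falling(error, -4.0, 0.0)
    about_right = triangle(error, -2.0, 0.0, 2.0)
    too_hot = 1.0 - falling(error, 0.0, 4.0)

    # Rules of the form "IF error IS <term> THEN command = <value>".
    rules = [(too_cold, 100.0), (about_right, 0.0), (too_hot, -100.0)]

    # Defuzzification: average of the rule outputs weighted by rule strength.
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total
```

For an error of -1 degree the input is partly "too cold" (0.25) and partly "about right" (0.5), so the command is a moderate heating level of about 33 rather than an abrupt switch between full heat and no heat. A rule with two conditions (IF X AND Y) would typically take the minimum of the two membership values as its strength.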

Genetics Algorithms

Genetics Algorithms are another method used at the decision level. They are stochastic optimization methods that simulate the process of natural evolution. These algorithms are suitable for very complex systems, including multiple-objective optimization.

GA can be viewed as a family of computational models inspired by Darwin’s evolution


Figure 3- The GA reproductive cycle

An implementation of a GA usually starts with a population of chromosomes, from which we select a set of parents for reproduction. The selected parents generate modified children by recombination of genes. A crossover occurs when the genes come from a fusion of two different parents; however, the recombination can also be done by mutation of a single chromosome. The resulting children are used to form a new population that we hope will be better. The selection is done by evaluating the fitness of the chromosomes (the more suitable they are, the more chances they have to reproduce). This is repeated until some condition is satisfied. [19], [20]
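The reproductive cycle described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [19] or [20]: the bit-string encoding, the single-point crossover, the 10% mutation rate, the fitness-proportional selection and the elitism step (keeping the best chromosome) are all assumptions, and the toy fitness function simply counts the ones in a chromosome.

```python
import random


def crossover(parent_1, parent_2, point):
    """Single-point crossover: genes before `point` come from parent_1."""
    return parent_1[:point] + parent_2[point:]


def mutate(chromosome, index):
    """Flip one gene of a single chromosome."""
    genes = list(chromosome)
    genes[index] ^= 1
    return tuple(genes)


def evolve(fitness, length=8, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    population = [
        tuple(rng.randint(0, 1) for _ in range(length)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Fitter chromosomes get proportionally more chances to reproduce.
        weights = [fitness(c) + 1e-9 for c in population]
        children = [max(population, key=fitness)]  # elitism (an assumption)
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = crossover(p1, p2, rng.randrange(1, length))
            if rng.random() < 0.1:
                child = mutate(child, rng.randrange(length))
            children.append(child)
        population = children
    return max(population, key=fitness)


best = evolve(fitness=sum)  # toy objective: maximize the number of ones
```

Here the stopping condition is simply a fixed number of generations. Because the best chromosome is always carried over, the best fitness in the population never decreases from one generation to the next.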

Parent 1 (0 1 1 0 1 0 0 0)    (0 1 0 0 1 0 0 0) Child 1


Advantages | Disadvantages
Only uses function evaluations | Cannot use gradients
Easily modified for different problems | Cannot easily incorporate problem specific information
Handles noisy functions very well | Not good at identifying local optima
Handles large, poorly understood search spaces easily | No effective terminator
Good for multi-modal problems | Not effective for smooth uni-modal functions
Returns a suite of solutions | Needs to be coupled with a local search technique
Very robust to difficulties in the evaluation of the objective function |
Easily parallelized |

Table 2- Advantages and disadvantages of Genetic Algorithms

Genetic algorithms can be used in a hierarchical fuzzy model for pattern extraction and to reduce the complexity of neuro-fusion models. They can be used as an optimization technique, for the extraction of knowledge, and in combination with fuzzy rules, fuzzy membership functions, neural networks and fuzzy logic.

A neuro-fuzzy-genetic model was proposed for data mining and fusion in the area of geoscience and petroleum reservoirs. The use of a neuro-fuzzy DNA model was proposed for the extraction of knowledge from seismic data, the mapping of wireline logs into seismic data, and the reconstruction of porosity [12]. A list with more uses can be found in annex G.

Annex H gives an example applied to the Traveling Salesman problem.


Expert Systems

An Expert System is another method that can be used at the decision level. A rule-based expert system is a set of rules that an engine applies repeatedly to a collection of facts. The rules represent heuristics that define a set of actions to be taken in a given situation, and the facts represent circumstances that describe a certain situation in the real world. [22]

Expert Systems are present in oil industry applications such as: "Extra Pair of Eyes" (autonomous intelligent controlling systems); Pipeline and Production Supervision; Plant-wide network supervision and optimization; Abnormal Situations Management; Environment: Supervision and Control; Online-Analyzer verification and value inference; Planning, simulating and control of biochemical processes [23]. [24] studied an expert system for a crude oil distillation column, designed to predict the unknown values of the required product flow and temperature for the required input feed characteristics. The system is also capable of optimizing the distillation process by minimizing the model output error and maximizing the required oil production rate with respect to the control parameter values. In combination with the expert system, the model also uses neural networks and genetic algorithms.

In simple rule-based systems there are two kinds of inference: forward chaining and backward chaining. Annexes I and J give examples of forward and backward chaining systems.
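The difference between the two kinds of inference can be sketched in Python. The rules and facts below are hypothetical and only loosely inspired by the well sensors discussed in this work; they are not taken from annexes I and J. Forward chaining starts from the facts and derives everything it can, while backward chaining starts from a goal and looks for rules that could prove it.

```python
def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Return True if the goal is a fact or can be proven through the rules."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


# Hypothetical rules: (premises, conclusion).
RULES = [
    (("whp_drops", "bhp_stable"), "suspect_whp_gauge"),
    (("suspect_whp_gauge", "gauge_is_old"), "schedule_inspection"),
]
FACTS = {"whp_drops", "bhp_stable", "gauge_is_old"}

derived = forward_chain(FACTS, RULES)
proven = backward_chain("schedule_inspection", FACTS, RULES)
```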

More examples of expert systems applications are presented in annex L.


Blackboard Systems

A blackboard system is an architecture that can integrate multiple problem solving modules (referred to as knowledge sources). This type of integrated problem solver can make use of more than one problem solving system in an attempt to overcome the inherent limitations of a single heuristic expert system. The problem solvers may also use different technologies. For example, a system might integrate a heuristic rule based reasoning system with a case-based reasoning system and possibly a model based system.

These architectures can be used for a wide range of tasks such as classification, design, diagnosis, repair etc. [25]

Figure 4 represents a blackboard architecture for a speech understanding system. In this figure, one can see a set of knowledge sources (solving modules) sharing a blackboard, which is a common global database. The contents of the blackboard are often structured hierarchically and are called hypotheses. Knowledge sources respond to changes on the blackboard, interrogate it and subsequently modify it directly. This modification results from the creation, modification and solution of hypotheses. The knowledge sources can communicate and cooperate with each other through the blackboard. In a blackboard architecture, each knowledge source responds only to a certain class or classes of hypotheses. The hypotheses that a knowledge source responds to often reflect the different levels in the blackboard’s hierarchy. The blackboard holds the state of the problem solution, while the knowledge sources make modifications to the blackboard when appropriate. [25]


Figure 4- The architecture of Hearsay III- a speech understanding system


Hard and Soft Decision Taxonomy

Apart from the JDL based taxonomy, some authors suggest a different classification of methods regarding the decision process.

Hard and Soft Decision Taxonomy

Decision Type | Method | Description
Hard decision | Boolean | Apply logical AND, OR to combine independent decisions [14].
Hard decision | Weighted sum score | Weight sensors by inverse of covariance and sum to derive score function [14].
Hard decision | M-of-N | Confirm decision based on m-out-of-n sensors that agree [14].
Soft decision | Bayesian; Dempster-Shafer; Fuzzy variable; Neural networks; Genetics algorithms; Expert Systems; Blackboard Systems | See chapter Taxonomy

Table 3- Hard and Soft Decision Taxonomy


This classification clusters the methods into two basic groups [14]:

• hard decisions, which consist of a single optimum choice;

• soft decisions, in which decision uncertainty in each sensor chain is maintained and combined with a composite measure of uncertainty.

In contrast to hard computing, soft computing is tolerant of imprecision, uncertainty, and partial truth. According to [12], soft computing is tractable, robust, efficient and inexpensive.
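The three hard decision methods of the taxonomy can be written down directly. The Python sketch below is an illustration under assumptions: each sensor is taken to deliver either a yes/no decision or a numeric score with a known variance, and the weighted sum is normalized by the sum of the weights, which the description in [14] does not specify. The function names are hypothetical.

```python
def boolean_and(decisions):
    """Declare a detection only if every sensor agrees."""
    return all(decisions)


def boolean_or(decisions):
    """Declare a detection if at least one sensor reports it."""
    return any(decisions)


def m_of_n(decisions, m):
    """Confirm the decision when at least m of the n sensors agree."""
    return sum(1 for d in decisions if d) >= m


def weighted_sum_score(scores, variances):
    """Weight each sensor score by the inverse of its variance, so that
    noisier sensors count less (normalization is an assumption)."""
    weights = [1.0 / v for v in variances]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


votes = [True, True, False]  # illustrative decisions from three sensors
```

With these votes, the AND rule rejects, the OR rule accepts, and a 2-of-3 rule accepts. All three return a single choice and discard the uncertainty that the soft decision methods keep.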


Why Bayesian?

According to the JDL based taxonomy presented above, our data fusion problem belongs to the feature level.

The method chosen was Bayesian Networks.

The main advantages of using Bayesian methods in data fusion, as summarized by [26, 27], are:

1. Bayesian statistics is a coherent system for quantifying objective and subjective uncertainties.

2. Bayesian statistics provides principled methods for model estimation and comparison and for the classification of new observations.

3. Bayesian statistics provides a natural way to combine different sensor observations.

4. Bayesian statistics provides principled methods for dealing with missing information.

5. Bayesian theory provides a definition of "personal probability" which satisfies the same set of fundamental axioms which classical statisticians insist must hold for relative frequencies. This allows as much attention to be focused on the decision-maker as on the process or phenomenon under study.

When analyzing the data from the sensors, it is very difficult to assess the probabilities in a classical way. Since we do not have sensor redundancy in each well, there is no way to confirm whether a measurement is correct. Therefore, assessing frequencies from the data is a tough and, to a certain extent, frustrating task.

The basic premise of Bayesian statistics is that all unknowns are treated as random variables and that the knowledge of these quantities is summarized via a probability distribution. [26]


can intermix expert judgment, statistical distributions, and observations in a single model.

Further, they are able to learn from evidence in order to update their prior beliefs.

BN models have several advantages over regression-based models. BNs do not rely on point values of parameters that have been derived through some “best fit” procedure.

Instead, the whole distribution of a variable is included. Similarly, BN models do not just predict a single value for a variable; they predict its probability distribution. By taking the marginal distributions of variables of interest, we get a ready-made means of providing quantitative risk assessment.

We didn’t choose Neural Networks because we do not have data for the learning process.


Bayesian theory

To a better understanding of the Bayesian Networks let us start first with some notions about Bayesian theory.

Bayesian dates from the eighteen century and gains its roots with the English Reverend Thomas Bayes work [29]

This theory presents two important concepts: the Bayesian probabilities and the theorem (also known as rule).

In opposition to the frequency concept, the Bayesian can be related with partial beliefs in a different form to face probabilities. A probability can be thought as a quantitative measure of the strength of one's knowledge or of one’s beliefs. This way, we can assess them using experts’ knowledge and without having historical data. With this concept it is possible to deal with subjective beliefs and use them into a mathematical model.

Another idea underlying Bayesian theory is conditionality. Instead of the classical approach, Bayes uses the notion of the probability of an event as a consequence of other events' probabilities.

An example of a conditional probability statement is that, given event B, the probability for event A to happen is x.

P(A|B) = x

This means that the probability of A is x when B is true and everything else known is irrelevant to A.

Bayes' theorem is

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$


Where [29]:

P(A) is the prior probability or marginal probability of A. It is “prior” in the sense that it does not take into account any information about B;

P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B;

P(B|A) is the conditional probability of B given A;

P(B) is the prior or marginal probability of B, and acts as a normalizing constant;

Intuitively, Bayes' theorem in this form describes the way in which one's beliefs about observing 'A' are updated by having observed 'B'.
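As a minimal numerical sketch of the theorem, the snippet below updates a belief about A after observing B. The function name `posterior` and all the numbers (a 10% prior and likelihoods of 0.9 and 0.2) are illustrative assumptions and are not values from this study.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) from P(A), P(B|A) and P(B|not A) using Bayes' theorem."""
    # P(B) by the law of total probability; it acts as the normalizing constant.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical reading: A = "sensor is faulty", B = "a consistency check fails".
p = posterior(prior_a=0.10, p_b_given_a=0.90, p_b_given_not_a=0.20)
print(round(p, 3))
```

With these assumed numbers the prior belief of 0.10 rises to about one third after B is observed, which is the "update" described above.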


Bayesian Network

According to [30] a Bayesian network (BN) consists of:

- a set of variables and a set of directed edges linking the variables;

- each variable has a set of mutually exclusive finite states;

- the edges express dependency relationships between the variables, forming a DAG (directed acyclic graph);

- for each variable A with parents B1, …, Bn, there is a conditional probability table (CPT) P(A|B1, …, Bn) that quantifies the dependency. For variables without parents, the table reduces to the unconditional (also called marginal) distribution.
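The definition above can be sketched as a small data structure. The layout below (dictionaries named `network` and `cpts`, and the helper `is_valid`) is a hypothetical illustration, not the representation used by any particular BN tool; the probabilities are arbitrary. It only checks the two structural requirements: the graph must be acyclic and each CPT column must be a probability distribution over the variable's states.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical network: variable A with parents B1 and B2 (all binary).
network = {
    "B1": {"states": ["yes", "no"], "parents": []},
    "B2": {"states": ["yes", "no"], "parents": []},
    "A": {"states": ["yes", "no"], "parents": ["B1", "B2"]},
}

# CPTs: key = tuple of parent states, value = distribution over the variable's states.
cpts = {
    "B1": {(): [0.3, 0.7]},
    "B2": {(): [0.6, 0.4]},
    "A": {
        ("yes", "yes"): [0.9, 0.1],
        ("yes", "no"): [0.7, 0.3],
        ("no", "yes"): [0.4, 0.6],
        ("no", "no"): [0.1, 0.9],
    },
}


def is_valid(network, cpts, tol=1e-9):
    """Return True if the graph is a DAG and every CPT column sums to one."""
    try:
        # static_order() raises CycleError when the edges contain a cycle.
        list(TopologicalSorter({v: d["parents"] for v, d in network.items()}).static_order())
    except CycleError:
        return False
    for var, table in cpts.items():
        n_states = len(network[var]["states"])
        for dist in table.values():
            if len(dist) != n_states or abs(sum(dist) - 1.0) > tol:
                return False
    return True


print(is_valid(network, cpts))
```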

In a BN, when we observe a variable, the observation can be entered into the model by reducing its marginal probability distribution to a probability of one for the observed state and zero for the remaining states⁴. This new evidence updates the probability distributions of the variable's children and parents. Applying Bayes' theorem, observations are propagated recursively through the model, updating the beliefs about probable causes, so that the model learns from the evidence entered [28].

Bayesian Networks allow any node to serve as either a query or an evidence variable. This is a very powerful characteristic, as it allows the network to be used in several directions. Let us consider the BN presented in the next figure to illustrate the different kinds of inference.


Figure 5⁵- BN example to elucidate different types of queries.

Example taken from the medical context, where D1 and D2 are two different diseases; S1, S2 and S3 are symptoms; and TR1 and TR2 are the results of tests performed on the patient.

There are four distinct kinds of inference that can be performed [31]:

• diagnostic inference (from effects to causes). Ex: given a symptom S1 infer the probability of the pathology D1, P(D1|S1);

• causal inference (from causes to effects). Ex: given disease D2 find the most likely symptoms, P(Si|D2);

• inter-causal inference (between causes of a common effect). Ex: given S2 infer P(D1|S2); adding evidence that D2 is true then makes the probability of D1 go down. Although the D1 and D2 nodes are independent a priori, once their common effect is observed the presence of one makes the other less likely (the "explaining away" effect);

• mixed inferences (combining two or more of the above).


the expert's understanding of the domain in a better way. These kinds of graphs also improve interaction with a human expert at the model building stage and are readily extendible with new information. Finally, these authors state that causal models facilitate user insight once a model is employed.

However, this causal approach to structuring the problem may make the assessment of the conditional probabilities less intuitive. This may become a serious difficulty when there is not sufficient data to help assess these probabilities and we have to rely solely on the expert's beliefs. In this situation, we may have to choose the graph structure that matches the available information rather than the one that represents the relations in the most intuitive way.

Both approaches can be found in the literature for the same kind of problem. [33] presented an example of a BN to diagnose pneumonia. In their model, the arrows indicate all of the conditional relationships between findings and diagnosis.

Figure 6⁶- BN example of an inverse graph approach.

In this approach the arrows now go from the symptom to the disease.

Serial connection is an important concept related to BNs which helps to introduce d-separation. To explain serial connection, let us consider the example in the next figure. In this network, A has influence on B and B has influence on C. Consequently, evidence on A will influence B, and B will transmit it to C, forming a communication channel.

However, if the state of B is known and this evidence is inserted in the network, A and C become independent and the communication channel is broken.


As a conclusion, evidence may be transmitted through a serial connection unless the state of the variable in the connection is known. [30]
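The blocking effect of a serial connection can be checked numerically. The sketch below uses a chain A -> B -> C with binary variables and arbitrary, hypothetical CPT values (they are not taken from [30]); it computes P(C=1 | evidence) by brute-force enumeration of the joint distribution.

```python
from itertools import product

# Hypothetical CPTs for a serial connection A -> B -> C (all variables binary).
p_a = {1: 0.3, 0: 0.7}
p_b_given_a = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Chain-rule factorization implied by the graph A -> B -> C."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob_c1(**evidence):
    """P(C=1 | evidence), summing the joint over all consistent assignments."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if any(values[k] != v for k, v in evidence.items()):
            continue
        den += joint(a, b, c)
        if c == 1:
            num += joint(a, b, c)
    return num / den


# Without evidence on B, A influences C ...
print(round(prob_c1(a=1), 2), round(prob_c1(a=0), 2))
# ... but once B is known, the value of A no longer matters.
print(round(prob_c1(a=1, b=1), 2), round(prob_c1(a=0, b=1), 2))
```

With these assumed tables the first pair of values differs, while the second pair is identical, which is the behavior stated in the conclusion above.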

Figure 7. Example of serial connection.

When B is instantiated with evidence, the communication channel between A and C becomes blocked.

According to [34], a group of nodes Z is said to d-separate the disjoint groups of nodes X and Y when either the nodes in Z are common ancestors of both groups X and Y, or Z is an intermediate group of nodes between them. The next two figures show an example of each type of d-separation.


Figure 9. Example of d-separation.

Z is an intermediate group of nodes [34].


Bayesian Network – a simple example

To better understand the Bayesian Network concept, let us consider a simple example⁸:

- we have a tree losing its leaves and we want to know why;

- we know that if the tree is dry, this can be the cause;

- however, the loss of leaves can also be an indication of a disease.

The problem can be represented with the Bayesian Network presented in the next figure.

Figure 10- The tree problem

Let’s consider the following CPTs assessed by an expert in the domain.

Sick="sick" Sick="not"

0.1 0.9

Table 4- P(Sick)


                     Dry="dry"                   Dry="not"
               Sick="sick"  Sick="not"     Sick="sick"  Sick="not"
Loses="yes"       0.95         0.85           0.90         0.02
Loses="no"        0.05         0.15           0.10         0.98

Table 6- P(Loses | Sick, Dry)

We can now determine some useful information:

- the probability of Loses being in a specific state,

e.g.: P(Loses = “yes”) = 0.1832;

- the conditional probability of Loses given evidence about the states of Sick and/or Dry,

e.g.: P(Loses = “yes” | Sick = “sick”, Dry = “dry”) = 0.95;

- inferences about the probability of each parent, given evidence about the child's state,

e.g.: P(Sick = “sick” | Loses = “yes”) = 0.49;

- inferences about the probability of one parent, given evidence about the child and the other parent,

e.g.: P(Sick = “sick” | Loses = “yes”, Dry = “dry”) = 0.11.
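The quoted values can be reproduced by enumerating the joint distribution of the three variables. The sketch below is not the Hugin computation itself, only a brute-force check. The table P(Dry) does not appear in this text, so P(Dry = "dry") = 0.1 is an assumption; it is the value that reproduces P(Loses = "yes") = 0.1832 from Tables 4 and 6. The names `joint` and `query` are illustrative.

```python
from itertools import product

p_sick = {"sick": 0.1, "not": 0.9}  # Table 4
p_dry = {"dry": 0.1, "not": 0.9}    # ASSUMPTION: P(Dry) is not shown in the text
p_loses_yes = {                     # Table 6: P(Loses="yes" | Sick, Dry)
    ("sick", "dry"): 0.95, ("not", "dry"): 0.85,
    ("sick", "not"): 0.90, ("not", "not"): 0.02,
}


def joint(sick, dry, loses):
    """P(Sick, Dry, Loses) = P(Sick) * P(Dry) * P(Loses | Sick, Dry)."""
    p_yes = p_loses_yes[(sick, dry)]
    return p_sick[sick] * p_dry[dry] * (p_yes if loses == "yes" else 1.0 - p_yes)


def query(target, evidence):
    """P(target | evidence); both are dicts over the keys 'sick', 'dry', 'loses'."""
    num = den = 0.0
    for sick, dry, loses in product(p_sick, p_dry, ("yes", "no")):
        world = {"sick": sick, "dry": dry, "loses": loses}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(sick, dry, loses)
            den += p
            if all(world[k] == v for k, v in target.items()):
                num += p
    return num / den


print(round(query({"loses": "yes"}, {}), 4))                              # 0.1832
print(round(query({"sick": "sick"}, {"loses": "yes"}), 2))                # 0.49
print(round(query({"sick": "sick"}, {"loses": "yes", "dry": "dry"}), 2))  # 0.11
```

The drop from 0.49 to 0.11 once Dry = "dry" is known is an instance of inter-causal inference: dryness already explains the loss of leaves, so sickness becomes less likely.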


Applications

There are numerous applications of Bayesian theory, covering distinct fields of science.

However, the Bayesian Network approach is still a relatively new research area, mainly because it requires significant computational power. Only in the last few decades have practical applications started to appear.

Bayesian Networks have contributed useful improvements in fields such as fault diagnosis and sensor accuracy.

In 1999, [36] presented a model to diagnose faults in airplane turbines with BN.

[37], in 2002, used Bayesian Networks coupled with multivariate state estimation to provide both fault detection and fault diagnostic capabilities for the Space Shuttle Main Engines. In this study, the sensor information is validated with residual estimation techniques. Then, if a fault occurs, a probability is assigned to the component that failed; finally, Bayesian Networks are applied for diagnosis.

In the same year, [38] developed a model to diagnose faults in electric power distribution networks.

In 2003, [39] proposed a distributed solution using Bayesian Networks for the detection of environmental features in wireless sensor networks.

In the same year, [40] presented a model to predict the final quality of a software product.

With BN they constructed a prediction model that focuses on the structure of the software


[42], in the same year, developed a fault diagnosis system for Autonomous Underwater Vehicles based on Bayesian Networks.

Another field that has benefited from Bayesian Networks is medicine. [43] stated in 1998 that a large number of health care applications use DAGs.

In fact, we can find sophisticated BN approaches to clinical problems dating from 1993. For example, [44] used a Bayesian Network with continuously valued nodes to propose an optimal schedule for the delivery of a certain drug. To achieve this, CPTs were replaced with conditional density functions. This author used the BN to infer the model parameters from a population and to probabilistically adapt them to a specific patient (taking into account the person's history). This information is then used to help define an optimal drug delivery policy.

[32] presented in 1999 a Bayesian Network model for the diagnosis of liver disorders, and [45] in 2000 a system for the management of infectious diseases.

[31] developed a user-friendly web-based development tool for medical diagnosis based on Bayesian Networks.

[45] stressed in 2004 the importance of Bayesian networks and other probabilistic graphical models as methods for discovering patterns in biomedical data and also as a basis for the representation of the uncertainties underlying clinical decision-making.

All these applications follow the “classical causal effect” approach. In fact, it seems that the “inverted” approach has not been sufficiently studied.


Limitations

Bayesian Networks have some limitations concerning the difficulty of obtaining the necessary parameters. [37] experienced some of them during their work on fault diagnostic capabilities for the Space Shuttle Main Engines:

1. prior probabilities of failure for each of the components are obtained from engineers and reliability test data;

2. conditional probabilities for some of the nodes are obtained from past reports and engineering estimates;

3. BNs require a tremendous number of parameters. For each node that has parents, a conditional probability is required for each state with regard to each combination of parent states. A single node may require hundreds of values;

4. conditional probabilities for multiple failure modes were not available, so Liu and Zhang calculated them by averaging the values for each of the participating single failure cases.
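The growth in the number of parameters described in item 3 follows directly from the CPT definition: one entry per node state for each combination of parent states. The helper names `cpt_size` and `free_parameters` below are illustrative, and the example node (three states, five parents) is hypothetical.

```python
from math import prod


def cpt_size(n_states: int, parent_states: list[int]) -> int:
    """Entries in P(node | parents): one per node state per parent combination."""
    return n_states * prod(parent_states)


def free_parameters(n_states: int, parent_states: list[int]) -> int:
    """Entries that must be elicited; the last state of each column follows from the others."""
    return (n_states - 1) * prod(parent_states)


# Hypothetical node with 3 states and five parents (three binary, two ternary).
print(cpt_size(3, [2, 2, 2, 3, 3]))         # 216
print(free_parameters(3, [2, 2, 2, 3, 3]))  # 144
```

Each additional parent multiplies the table size by its number of states, which is one reason to keep the number of parents per node small when the values must come from an expert.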

Another crucial aspect that can become a limitation is the quality of the prior beliefs. [46] wrote that a Bayesian Network is only as useful as this prior knowledge is reliable. The author explains that either an excessively optimistic or an excessively pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results. He also emphasizes that selecting the proper distribution model to describe the data has a notable effect on the quality of the resulting network.


Bayesian Networks may have difficulty modeling problems where the causal relationships between variables are complex and there is not enough data available for the network to learn from. This is the case of forward loops. However, these situations can be handled using more complex approaches such as Dynamic Bayesian Networks [49].


Chapter III - THE MODEL

The chapter starts with the formulation of the formal hypothesis on which the model relies. An outline of the methodology followed is then presented. The necessary assumptions and considerations about the data quality are then set out. There is also a description of the experiments performed. The chapter ends with the test and its results.


Hypothesis

With this work we intended to create a model to estimate sensor accuracy using data fusion. We believe that this could be done using Bayesian Networks.

The examples found in the literature generally assume one of the following:

• the existence of data to assess probabilities;

• the ability of an expert to easily express them as beliefs.

Unfortunately, these conditions were not present. Therefore, we faced an extra challenge besides the regular problem modeling. We wanted to test whether the application of BNs was possible under such constraints.

From a formal point of view our aim could be formulated through the hypothesis:

H: In the absence of data, we may use a Bayesian Network for sensor accuracy surveillance.

Following a certain methodology we tried to investigate its veracity.


Methodology

To test our hypothesis, the following methodology was followed:

- identification of the variables and their dependency relationships;

- delineation of the Bayesian Network structure;

- estimation of all the conditional probability tables (CPTs) necessary for our BN;

- determination of the model inputs. The inputs are evidence we can observe through analysis of the well logs. In the present model we call them conditions;

- querying the BN using information about the conditions to obtain the sensors' accuracy probabilities.

If the results matched our expectations, the hypothesis would be considered true.


Assumptions

The model is based on a set of conditions established with the help of experts in the domain.

If the sensors are correct they should obey several conditions⁹ (C):

- bht/bhp = kte (C1). The ratio between the temperature and the pressure should be constant. As a consequence of the ideal gas law, this should be valid for a certain time period;

- bhp > min (C2). The borehole pressure should be above a minimum reference value;

- bhp – whp > diffP (C3). If the well is in production, there should be a pressure difference between bhp and whp;

- db (bhp/whp) = kte (C4). The relation between the pressures on the choke should be constant for a stable choke aperture value.

We assume that the well is in production.
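As a rough sketch of how these conditions could be evaluated on a well log, the code below reduces three series of readings to one figure per condition. It is not the procedure used in this study: the function names are hypothetical, the readings are invented, and taking the relative spread as the sample standard deviation divided by the mean is an assumption about the "s/mean" figure of merit.

```python
from statistics import mean, stdev


def variation(values):
    """Relative spread s/mean, with s ASSUMED to be the sample standard deviation."""
    return stdev(values) / mean(values)


def condition_statistics(bhp, bht, whp):
    """Summarize one well's log into the quantities behind conditions C1 to C4."""
    return {
        "C1": variation([t / p for t, p in zip(bht, bhp)]),  # bht/bhp should be constant
        "C2": mean(bhp),                                     # compared with min
        "C3": mean([p - w for p, w in zip(bhp, whp)]),       # compared with diffP
        "C4": variation([p / w for p, w in zip(bhp, whp)]),  # bhp/whp should be constant
    }


# Hypothetical readings for one well, NOT data from the study.
bhp = [200.0, 202.0, 198.0, 201.0]
bht = [100.0, 101.0, 99.0, 100.5]
whp = [50.0, 50.5, 49.5, 50.25]
stats = condition_statistics(bhp, bht, whp)
print(stats)
```

In this invented log both ratios are perfectly constant, so C1 and C4 have zero variation; real logs would yield non-zero values that then have to be classified into states.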


Data considerations

Statoil Hydro provided a log from 2 platforms, each one with 4 production wells connected to the same reservoir.

The measurements were of the borehole pressure (bhp), borehole temperature (bht), choke aperture, well head pressure (whp) and well head temperature (wht). These values were collected at 5-minute intervals over a month.

After analyzing the data applying some statistical figures of merit we observed the following problematic situations:

C1:

- A-4 clearly presents an irregular value; A-1, A-2, A-3 and B-1 are not so bad. We do not have information about the B-2 value.

C2:

- 2 wells presented an abnormally low bhp (one below 15 bar and the other negative (!));

- 1 well does not present bh values and therefore had to be excluded from the analysis;

C3:

- A-2 and A-3 present a higher pressure value at the well head than at the borehole.


Table 7- Analysis of data

Irregular situations represented by italic bold.

In this study we face the problem of an absence of usable data. As there is no acceptable data¹¹, the development of the model is limited in these important aspects:

- determination of correlations between sensor values;

- determination of any probabilities;

- using the BN learning abilities to help establish relations between variables.

Therefore, we have to solve the problem using only the domain experts' knowledge.

Consequently, we face the challenge of adapting the model so that it can easily incorporate this information.

WELL

Conditions               A-1    A-2    A-3    A-4    B-1    B-2    B-3    B-4
C1  bht/bhp s¹⁰/mean      6%     0%     0%    26%     9%     ??     3%     0%
C2  Bhp mean (bar)       279     14   -100    187    180    N/A    220    230
C3  Bhp - whp (bar)      198   -132   -205    126    127     ??    166    152
C4  Bhp/whp s/mean       24%    23%   -24%    24%    12%     ??     6%    32%


Experiments

Given the data constraints, we experimented with two different approaches in order to test our hypothesis. These approaches are based on the different concepts of conditional probability presented by [50]:

- A: where causal probabilities are those of the form P(TestResult=fail | Sensor=bad), indicating the likelihood that a particular test condition outcome is caused by the state of a certain sensor.

- B: where diagnostic probabilities are those of the form P(Sensor=bad | TestResult=fail), indicating the likelihood that a particular sensor is bad based on the fact that a certain condition test has failed.

In both approaches, C2 (bhp > min) and C3 (bhp – whp > diffP) are modeled as boolean variables that represent whether the condition is satisfied or not. The same logic was followed for the sensor variables¹². In both approaches we used three-state variables for C1 (bht/bhp = kte) and C4 (db (bhp/whp) = kte), intended to model a bad, a good and an intermediate result of the condition. Liu and Zhang (2002) also used three-state variables.

To model our network we used the Hugin Lite 7.1 software, which can be found at:

http://www.hugin.com


Approach A

It considers the sensors as parents and the conditions as children.

The underlying logic is that the sensors' states are the causes of the conditions' states.

The network provides the sensors’ probabilities using Bayesian inference.

This modeling style seems to be the classical approach to fault diagnosis in engineering.

One possible network representation is shown in the next figure.

Figure 11- Bayesian Network according to approach A.

This approach has the advantage of CPTs with few variables (at most 3).

However, the children's CPTs are not intuitive, as they are of the form P(Condition|Sensor).

There is another problem, related to the d-separation of the conditions: the conditions are independent of each other given the sensors' states.

Although this approach seems simpler at first glance, it became harder to assess the CPTs in a way that expresses the expert knowledge coherently.

[50] stated this problem as “domain experts often experience difficulty arriving at the conditional probabilities in the causal direction, which are needed for the network design, as opposed to the probabilities in the diagnostic direction, which reflect their natural way


Figure 12- Example of one simulation¹³

In the last figure we can see a simulation of the BN using rough CPTs. Since bhp > Min is false, the model can assume that the bhp sensor is not ok. So, bhp by itself justifies why the other conditions are not good. With this set of evidence, the model cannot clearly decide whether the other sensors are ok or not. This was in line with our expectations. Now, what if we observe that bht/bhp = Kte is good? Since bhp is not working properly, we would not expect a good result unless bht is also not ok.


Figure 13- Example of an incoherent simulation.

As we can observe, the model does not behave as expected. The reason is related to the d-separation of the conditions. We could add more dependency relationships to improve the behavior. However, we would then start to get complex and non-intuitive CPTs.

One could easily think that, given only the prior component probabilities (P(C), P(C’)) and the diagnostic conditional probabilities (P(C|T), P(C’|T)), it is possible to uniquely determine the causal probabilities (P(T|C’), P(T’|C’)) or (P(T|C), P(T’|C)). However, as [50] proved, this is not possible.
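A small numerical illustration of this non-uniqueness is given below. It is not the proof in [50]; it only shows, under our reading of C as a component state and T as a test outcome, that Bayes' theorem needs P(T) to recover P(T|C), and that P(T) is not fixed by P(C) and P(C|T) alone. The function name and the numbers are hypothetical.

```python
def causal_from_diagnostic(p_c: float, p_c_given_t: float, p_t: float) -> float:
    """P(T|C) via Bayes' theorem; it requires P(T), which is not among the givens."""
    # P(C, T') must be a valid probability mass inside P(T') = 1 - P(T).
    p_c_and_not_t = p_c - p_c_given_t * p_t
    if not 0.0 <= p_c_and_not_t <= 1.0 - p_t:
        raise ValueError("P(T) is inconsistent with P(C) and P(C|T)")
    return p_c_given_t * p_t / p_c


# Same prior P(C) and same diagnostic probability P(C|T) ...
p_c, p_c_given_t = 0.2, 0.8
# ... but two different, equally admissible values of P(T).
low = causal_from_diagnostic(p_c, p_c_given_t, p_t=0.10)
high = causal_from_diagnostic(p_c, p_c_given_t, p_t=0.20)
print(round(low, 3), round(high, 3))
```

Both choices of P(T) are consistent with the given quantities, yet they lead to causal probabilities of about 0.4 and 0.8, so the givens do not determine P(T|C).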


Approach B

This approach considers the conditions as parents and the sensors as children.

The model follows the “symptom -> diagnosis” logic, where the conditions are symptoms used to diagnose the sensors’ accuracy.

This kind of approach is also found in the medical diagnostic context [33].

Figure 14- Bayesian Network according to approach B.

The network now has links between the sensors to express the interrelation between them and the conditions. At first glance, these connections may suggest an erroneous physical relationship between the sensors. In the model context, one should interpret each sensor variable as the diagnosis of the sensor and not its physical state. Even if these two concepts may seem similar, they differ in practical aspects, because the knowledge of one diagnosis may influence the diagnosis of another sensor. [47] advise caution when interpreting Bayesian models. They state that even though BNs are highly interpretable structures for representing statistical dependencies, they can easily be misleading if


Figure 15- Example of BN under the same conditions as approach A.

We can now see that the BN assumes both sensors are not ok. As bhp is not ok, the model expects bhp/whp to be bad. Since this condition is just medium, whp cannot be working perfectly.

As this approach seems more suitable for our problem, we tried to improve it by adding an extra condition to express the age of the sensors.

The age condition has two states:

- Old- when selected, the sensor accuracy decreases;

- Neutral- when selected, the sensor accuracy is not affected by this condition.

We also experimented with adding more intermediate layers in an attempt to obtain CPTs with fewer variables that are easier to define. As suggested by [30], this was done by adding mediating variables.


Figure 16- Experiment to reduce the CPT’s tables complexity.

This was done by adding mediating variables Z1, Y2, X2, Z3, X3.

Even though each CPT became easier to define, the overall model behavior was more difficult to delineate. These results are confirmed by [50], who state that multilayer networks are often very sensitive to conditional probabilities. These authors warn that probabilities have to be defined with greater accuracy, because small perturbations in their values may result in radically different diagnostic conclusions.

They also note: “In the choice between simple Bayesian networks or two-level Bayesian networks and a multilevel network one needs to carefully consider the expected diagnostic benefits versus the increased cost of the knowledge engineering, testing, and real-time execution.”


Figure 17- The final model including the sensor age condition


Test and results

We tested the final model on the problematic data described in the section “Data considerations”.

We classified the data into the conditions’ states, which are the model’s inputs.

The criterion for the first and last conditions is: Good < 5%; 5% <= Medium < 20%; Bad >= 20%. The diffP is 120 bar and Min is 20 bar.

As we do not have information about the sensor age, we set that condition to the Neutral state.

WELL

Condition                A-1    A-2    A-3    A-4    B-1    B-3    B-4
C1  bht/bhp = kte        Med    Good   Good   Bad    Med    Good   Good
C2  Bhp > Min            Yes    No     No     Yes    Yes    Yes    Yes
C3  Bhp - whp > diffP    Yes    No     No     Yes    Yes    Yes    Yes
C4  Bhp/whp = kte        Bad    Bad    Bad    Bad    Med    Med    Bad

Table 8- Classification into conditions’ states¹⁴
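The classification step can be sketched as follows, using the criteria stated above (Good < 5%, 5% <= Medium < 20%, Bad >= 20%, diffP = 120 bar, Min = 20 bar) and the values of Table 7. The function names are hypothetical, and judging a negative s/mean (well A-3) by its magnitude is an assumption; it is the reading that reproduces Table 8.

```python
def classify_variation(cv_percent: float) -> str:
    """Three-state criterion for C1 and C4: Good < 5%, 5% <= Med < 20%, Bad >= 20%."""
    magnitude = abs(cv_percent)  # ASSUMPTION: a negative s/mean is judged by its magnitude
    if magnitude < 5:
        return "Good"
    return "Med" if magnitude < 20 else "Bad"


def classify_well(c1, c2, c3, c4, min_bhp=20.0, diff_p=120.0):
    """Map the four figures of one well to the condition states used as model inputs."""
    return {
        "C1": classify_variation(c1),
        "C2": "Yes" if c2 > min_bhp else "No",
        "C3": "Yes" if c3 > diff_p else "No",
        "C4": classify_variation(c4),
    }


# Rows of Table 7 as (C1 %, C2 bar, C3 bar, C4 %); B-2 is omitted (no bh values).
table7 = {
    "A-1": (6, 279, 198, 24), "A-2": (0, 14, -132, 23), "A-3": (0, -100, -205, -24),
    "A-4": (26, 187, 126, 24), "B-1": (9, 180, 127, 12), "B-3": (3, 220, 166, 6),
    "B-4": (0, 230, 152, 32),
}
table8 = {well: classify_well(*row) for well, row in table7.items()}
for well, states in table8.items():
    print(well, states)
```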

The results are presented in the next table. As mentioned before, each sensor variable gives the probability of the state OK and of its complement NOT_OK. If the sensor is measuring correctly, the state OK is more probable than NOT_OK. An equal probability distribution occurs when the model does not have enough information to decide about the sensor accuracy.


[Table 9 panels, one per well: A-1, A-2 / A-3, A-4, B-1, B-3, B-4]

Table 9- Results presented by the model

The results confirm our initial expectations regarding which sensor is most likely to be the cause of the data problem in each well.

These outcomes reinforce the possibility of constructing a BN even in the absence of data. The initial hypothesis was proven to be true. Therefore, the Bayesian Network is a technique suitable for surveillance of sensors’ accuracy in a production well.


Chapter IV - CONCLUSIONS

In this chapter we present the final conclusions. Some considerations about practical usages are given. It ends with some suggestions for further developments.


Conclusions

With this work we could conclude that our initial hypothesis is true. Therefore, the Bayesian Network is a suitable method for surveillance of sensors’ accuracy in a production well.

We found that, when usable data to construct the model is missing and expert knowledge is limited, it may become easier to invert the causal structure. This can be achieved by slightly modifying the meanings of the variables and relations in order to express the expert’s beliefs. However, this should only be done if the model remains easily understandable.

The “inverted” approach has rarely been used, though it can be a better alternative under special constraints. It can be especially useful in helping the expert with the critical task of assessing his or her probability beliefs.

Despite our satisfactory results, one should take into consideration that the following aspects were simplifications:

• the conditions that variables should obey;

• the probabilities assessment;

• the number of states on each condition;

• the classification criteria of these states.

Therefore, before applying this model in practice we present the following suggestions:

- revise the conditions;

- include more conditions;

- increase the number of condition states and improve the criteria;

- revise the CPTs.


Further developments

It would be interesting to extend this model so that it uses the measurements from the other wells.

Since the wells are all connected to the same reservoir, one could try to find correlations between their sensors.

If we can find these correlations, we could incorporate a sensor redundancy approach into this model.

It would also be interesting to try a Dynamic Bayesian Network approach to include the evolution of the sensor measurements over time.


References

1. Carvalho, H.S., et al. A general data fusion architecture. in Proceedings of the Sixth International Conference of Information Fusion. 2003.

2. Dailey, D.J., P. Harn, and P.-j. Lin, Its Data Fusion. 1996, Washington State Transportation Center (TRAC).

3. Kessler, O. and F. White, Data Fusion Perspectives and Its Role in Information Processing, in Handbook of Multisensor Data Fusion - Theory and Practice, M.E. Liggins, D.L. Hall, and J. Llinas, Editors. 2009, CRC Press: Boca Raton.

4. Welch, G. and G. Bishop, An Introduction to the Kalman Filter. 2006, Department of Computer Science; University of North Carolina; Chapel Hill.

5. Silva, J.M.L.d., Sensor fusion and behaviours for the CAMBADA Robotic Soccer Team, in DETI. 2008, University of Aveiro: Aveiro.

6. Cuevas, E., D. Zaldivar, and R. Rojas. Kalman filter for vision tracking. 2005 [cited; Available from: http://page.mi.fu-berlin.de/zaldivar/files/tr-b-05-12.pdf.

7. Shafer, G. Dempster-Shafer Theory. [cited; Available from: http://www.glennshafer.com/assets/downloads/articles/article48.pdf.

8. Koks, D. and S. Challa, An Introduction to Bayesian and Dempster-Shafer Data Fusion. 2005, Australian Government – Department of Defence.

9. Tunaley, J.K.E., T.M. Sibbald, and M.T. Rey-Cousins. Use of the Dempster-Shafer algorithm in the detection of ship wakes from synthetic aperture radar images. in Acoustics, Speech, and Signal Processing, 1991. ICASSP-91., 1991 International Conference on. 1991.

10. Mena, J.B. and M. J.A., Color image segmentation using the Dempster-Shafer theory of evidence for the fusion of texture. 2003, Alcalá University, ISPRS Archives, Munich. p. pp. 17.-19.

11. Bridges, S., J. Hodges, and B. Wooley. Preliminary Results in the Use of Dempster Shafer Theory for a Radiological Waste Characterization Expert System. 1996 [cited; Available from: ftp://ftp.cs.msstate.edu/publications/tech_reports/961212.ps.Z

12. Nikravesh, M. and F. Aminzadeh, Soft Computing and Intelligent Data Analysis in Oil Exploration, in Soft Computing and Intelligent Data Analysis in Oil Exploration, M. Nikravesh, F. Aminzadeh, and L.A. Zadeh, Editors. 2003, Elsevier: Amsterdam.

13. Clabaugh, C., D. Myszewski, and J. Pang. Applications of neural networks. 2000 [cited; Available from: http://www-cse.stanford.edu/classes/sophomore-college/projects-00/neural-networks/Applications/index.html.

14. Waltz, E. and T. Waltz, Principles and Practice of Image and Spatial Data Fusion, in Handbook of Multisensor Data Fusion - Theory and Practice, M.E. Liggins, D.L. Hall, and J. Llinas, Editors. 2009, CRC Press: Boca Raton.

15. Steinberg, A.N., Foundations of Situation and Threat Assessment, in Handbook of