The Principal Component Transform (PCT) : tensor-metamodel of biochemical synergistic (S) systems, a new approach for mathematical modelling of non-linear dynamical systems

(1)

The Principal Component Transform (PCT)

Tensor-metamodel of biochemical synergistic (S) systems, a new approach for mathematical modelling of non-linear dynamical systems.

Norwegian University of Life Sciences Department of M athematical Science and Technology

Master of Science 2014 60 credits

(2)

(3)

To Daniel Andrés & David Emanuel.

(4)

(5)

1 Acknowledgements

This work was carried out at the Norwegian University of Life Science from August 2013 to April 2014. Throughout each and every stage of this project there have been a great many people who have pushed me onwards, motiv- ated me and guided me, and it is thanks to their support and guidance that we are reading this thesis today.

I am heartily grateful to my supervisor, Prof. Arkadi Ponossov, whose guidance and continual encouragement have been invaluable throughout the workings of this thesis. Prof Ponossov did not only lead me through the project; he truly inspired me and in sharing his knowledge both myself and my work have been immensely enriched.

I would also like to express my warmest thanks to Prof. Harald Martens, for his moral support, kindness and constructive ideas; this thesis being the result of one of these ideas. His constant enthusiasm have been both conta- gious and inspirational.

I would like to thanks to Prof Stig Omholt and Prof. Eberhard O. Voit, for their support and suggestions; this thesis bears the name proposed by Prof O. Voit, which is: the principal component transform.

My grateful thanks also goes to Rachel Davies, for her help in reading and correcting my terrible grammatical errors, which you will never nd, thanks to Rachel.

Finally, I would like to express my gratitude to my family for their love and endless support, especially to my wife. Without her unwavering belief in me, I would never been able to achieve what I have achieved, and I am forever indebted to her.

(6)

2 Abstract

In this thesis, a new approach, the principal component transform (PCT)¹, is presented for mathematical modelling of synergistic biochemical systems (S-system) providing us with a simple and very accurate method for t- ting these and other non-linear dynamical systems. The PCT is composed by two metamodels² one bi-linear metamodel and its improved version, a tensor-metamodel; both metamodels are based on the principal component analysis (PCA), which at the same time is based on the singular value decomposition (SVD). The tensor-metamodel is in addition based on mathematical attributes from tensor algebra, especially, the Kronecker product.

The PCT is inspired by the results of the PhD thesis of Julia Isaeva³. She and her co-authors presented how many non-linear models can be approximated by a single bi-linear metamodel. Isaeva's publications have provided valuable knowledge which has been applied in the implementation of the PCT as a tool which deals with multivariate power functions.

Referring to the PCT, the loadings, which are orthogonal, span a subspace, where, after projecting a simple time-series (given by the S-system), we are able to nd a function which behaves in the same way as the non-linear dynamical system described by the S-system. The process of projecting the time-series vector to the span of the loadings-vectors is carried out by the linear regression method.

This thesis did explore the possibility of using the scores, which belong to the PCT as a library, meaning that after having computed the linear regression, we ought to have been able to nd the same parameters in this library.

After having tested this hypothesis in the scalar case we could not nd any logical correspondence between the parameters from the library and those parameters found by means of the linear regression method. These fundings are in accordance with the results presented by the PhD candidate Valeriya Tantseva⁴, where she conrms that the S-system formalism has a sloppy structure in the scalar case, meaning that they have a neutral parameter-set.

However, in higher dimensions, sloppiness seems to disappear, so that the data library can be constructed in an ecient way.

Regarding the bi-linear metamodel, it has been observed that the dimensionality of this method increases exponentially(n³) and proportionally to the size of the S-system being studied. This leads, for big systems, to the formation of principal components belonging to a null subspace. This issue

1This terminology was suggested by Prof. Eberhard O. Voit; Prof. Voit is a Distin- guished Professor in Biochemical Systems at the Georgia Technology Research Alliance.

2This concept and its applications are mostly due to by Prof. Harald Martens; Prof.

Martens is a Professor at the Norwegian University of Science and Technology NTNU

3Julia Isaeva was a PhD student at the Norwegian University of Life Sciences from 2007-2011.

4Valeriya Tantseva was a PhD student at the Norwegian University of Life Sciences from 2009-2013.

(7)

is solved in the tensor-metamodel, where the reduction of the dimensionality is given by the Kronecker product of the SVD.

Finally we can mention that the accuracy that each metamodel-phenome delivers is very good, meaning that more than 99% of the originally mathematical model is described by the PCT.

(8)

3 Sammendrag

I denne masteroppgaven presenterer jeg en ny tilnærming som kan brukes til matematisk modellering av synergistiske biokjemiske systemer (S-systemer), tilnærmingen kalles the principal component transform (PCT). PCT-modellen gir oss en enkel og svært nøyaktig metode for modellering av S-systemer og andre ikke-lineære systemer.

PCT modellen er satt sammen av to metamodeller: en bi-lineær metamodell og en forbedret versjon, kalt tensor-metamodell. Begge metamodellene baseres på prinsipal komponent analyse (PCA). Tensor-metamodellen er i tillegg basert på matematiske egenskaper fra tensor algebra, spesielt Kro- necker produktet.

PCT-modellen er inspirert av doktorarbeidet til Julia Isaeva. Isaeva og hennes medforfattere presenterte hvordan ikke-lineære modeller kan tilnærmes ved en enkelt bi-lineær metamodell. Isaevas publikasjoner har gitt verdifull kunnskap som jeg har brukt for å bearbeide multivariate potensfunksjoner.

Med henvisning til PCA: Scores som er ortogonale, spenner ut et underrom der vi kan projisere en tidsrekke ved lineær regresjons metoden, tidsrekken er gitt av S-systemet. Dette gir en funksjon (en ODE-funksjon), som vil oppføre seg på samme måte som det opprinnelig S-systemet.

Denne oppgaven utforsker muligheten til å bruke scores som tilhører PCT- modellen som et bibliotek. Biblioteket kan brukes i forbindelse med para- meterestimeringsproblemer. Min hypotese er at parametersettet funnet ved lineær regresjon burde nnes i scores-biblioteket. Etter å ha testet hypotesen for det skalare tilfellet fant jeg ikke logisk samsvar mellom parameterrom- met og biblioteket. Dette er i samsvar med resultatene som presenteres av PhD kandidat Valeriya Tantseva. Tantseva bekreftet at S-systemer har en sloppy struktur i det skalare tilfellet. Det betyr at S-systemer har et nøytralt parameter-sett.

I høyere dimensjoner kan "sloppiness" likevel forsvinne slik at biblioteket kan konstrueres på en eektiv måte.

Nøyaktigheten som hver metamodell leverer er meget bra, noe som betyr at mer enn 99% av den opprinnelig matematiske modellen er beskrevet av PCT-modellen.

(9)

4 Introduction

March 2014 was a key date for the Norwegian Scientic Society, whereupon we celebrated the 150th anniversary of the only natural law found by Norwe- gian scientists: the Law of Mass Action. This law is a mathematical model which describes natural phenomenons in biochemistry given by solutions in dynamical equilibrium. Cato Maximilian Guldberg and Petter Waage proposed the Law of Mass Action in 1864, see Apendix A2.

This day we know that the attributes and properties of the Law of Mass Action can be applied not only to biochemistry, but to many other elds of science; such as physics (e.g. thermodynamic studies, semiconductor properties), physiology (e.g. extracellular physiological signals), pharmacology, psychology (e.g. psychological theories in neuro-psychology) and mathematics (e.g. dynamical systems, mathematical ecology, mathematical epidemi- ology).

The Law of Mass Action is also responsible for the relatively new mathematical theory called power-law formalism, which, together with their synergistic (S)-system representation of non-linear dierential equations, describes sat- isfactorily the non-linearities of natural processes. This thesis is intended to be an attribute to this powerful and exible natural Law of Mass Action. I will connect this natural law, which describes non-linear dierential equations, to a method called metamodelling, which has been vastly used at the Center for Integrative Genetics (CIGENE) at the Norwegian University of Life Science, (section 5.1 describes what a metamodel is).

The main goal of the thesis is to nd a new approach, which will be based on metamodelling, to non-linear systems appearing in biochemistry, in particular the S-system representation which goes back to the Gulberd and Waage model and its generalizations.

The mathematical theory which describes how the metamodel and the tensor- metamodel can be used as methods for modelling S-system is described in Sections: 8, 10 and 13.

Examples 2, 3, 4, and 5 present the tting of (1x1), (2x2), (3x3) and (4x4) S-system to a bi-linear metamodel.

Section 13 presents the tensor-metamodel and Section 14 provides examples of (1x1), (2x2), (3x3) and (4x4) S-system being modelled by the tensor- metamodel.

(10)

List of Figures

1 Presents an aposteriori model . . . 11

2 Illustrates an a priori model . . . 12

3 Representation of the structure of the PCA . . . 13

4 Shows the plot of the non-linear functionx^1.5 . . . 19

5 Representation of the scores-library . . . 20

6 Shows the non-linear functionx^1.5being tted by a metamodel where the parameters come from the scores-library . . . 21

7 Shows the non-linear function x^1.5 −x^0.5 being tted by a metamodel where the parameters come from the scores-library 22 8 Graphical representation wherex(tk)⊂xj . . . 26

9 (1x1) S-system being tted by a bi-linear metamodel . . . 30

10 (1x1) S-system being tted by a bi-linear metamodel at 4 dierent initial conditions . . . 31

11 Plot of the functionsin(πx) . . . 35

12 Interpolation ofsin(πx) at 15 points . . . 35

15 Shows an ODE being solved by the fourth order Runga-Kutta method and its absolute error . . . 37

16 Shows an ODE being solved by ODE45 and its absolute error 38 17 Comparison of the absolute error of the Runge-Kutta method and ODE45 . . . 39

18 (2x2) S-system being modelled by a bi-linear metamodel . . . 44

19 (2x2) S-system being modelled by a bi-linear metamodel at 3 dierent initial conditions . . . 45

22 (4x4) S-system being modelled by a bi-linear metamodel . . . 50

23 Shows the convergence of the PCs due to tensor-multiplication of matrices with dierent sizes . . . 53

24 (2x2) S-system being modelled by a tensor-metamodel . . . . 59

25 (2x2) S-system being modelled by a tensor-metamodel at two dierent initial conditions . . . 60

28 (2x2) S-system being modelled by a tensor-metamodel for steady state analysis . . . 65

(13)

5 Background

5.1 Model vs. metamodel

A model in mathematics is a simplied description of a system, and this description is made using mathematical language and concepts. I believe that a mathematical model has 3 main goals:

i) To help us to understand the system.

ii) To help us to explain and describe the system and the components which are involved in the system.

iii) To help us to make predictions about the behaviour of the system, and the eect each component has.

It is well known that mathematical models of complex systems are normally very dicult to analyze and time consuming to work with. On the other hand, a metamodel is a much simpler tool to work with. A metamodel is a model of the original model. A metamodel provides, just like a model, a description of the system. The advantage of working with metamodels is that they considerably reduce the complexity of the model and also reduce the dimensionality of the system, without loosing any important information. We can see in Figure 1 and Figure 2 a graphical representation of an aposteriori and an apriori model with its corresponding metamodel.

A model is known as aposteriori when the real data comes before the mode.

The metamodel comes at the end.

Figure 1: This gure has been extracted from [1].

The model is known as a priori when the data comes after the model.

The metamodel comes at the end.

(14)

Figure 2: This gure has been extracted from [1].

Despite the fact that a metamodel is a simplication of the model, it does not lose any important information regarding the complex system. In other words, a metamodel will not fail to fulli), ii) andiii).

The metamodel will provide an insight, which in some cases is as deep as the model itself, of the complex system being analyzed.

In this thesis I will use the principal component analysis (PCA) as a metamodel.

The PCA will be computed by means of the singular value decomposition.

5.2 The singular value decomposition (SVD)

According to [4], there are two matrices U and V which are orthogonal and a diagonal matrixΣ enabling the matrixX_(I_x_J) to be factorized as follows:

X_(I_x_J)=U_(I_x_I)Σ_(I_x_J)V_(J^t_x_J) (1) whereV^t is V transposed.

The matrix Σ contains the singular values of X on its diagonal. The left singular vectors U are joined together withΣin order to obtain the scores- matrix T and the matrix V, also called right singular vectors which produce the loadings-matrix P:

X = (UΣ)V^t= (T)P^t+e (2) A more extensive denition of the singular value decomposition is provided in appendix A1.

5.3 The principal component analysis (PCA)

A principal component (PC) is a special type of variable which can not be measured directly. Instead, it can be computed as a linear combination of a set of input variables (e.g. a data set). These kinds of variables are called latent variables, see [2]. The author in [2] underlines that the principal component analysis (PCA) allows us to extract relevant information from big and sometimes confusing data setts with comparatively little eort. The PCA reduces the high dimensional data sett to a lower dimension giving few

(15)

principal components that underlie the dynamics of the complex system.

Authors in [3] acknowledge that the PCA is concerned with explaining the structure of the variance-covariance of a set of variables. The result of the analysis of these latent variables has shown, and also been considered by [1], that the principal components lie in the direction of the maximal variation of the data and they describe this variation in a descending order, thus: the rst PC is found along the direction of the largest variation, the second PC in the direction of the second largest variation but orthogonal to the rst, a pattern which is repeated for the third PC, the fourth PC and so on.

LetA=optimal number of PCs

When the remaining variation of the data in question is small, it is considered thatA has been found and the information which is given by A+ 1 PCs is so small that it can be considered as noise (e).

In [1] we can read more about how the PCs reect the largest eigenvalues of the co-variances, therefore the rst few PCs can give an adequate description of the whole data set, these PCs form a new orthogonal basis of the variable space where the scores (T) and loadings (P) are projected onto it. The scores represent the relationship between the PCs and the samples (data rows) while theloadingsrepresent the relationship between the PCs and the variables(data columns).

Assume we are given some random data in the matrixX_(I_x_J), this data can be expressed in terms of theloadingsand scoresas shown below:

X_(I_x_J)=T_(I_x_K)P_(K^t _x_J)+e_(I_x_J₎ (3) Here are X,T,P and e matrices with known dimensions, and where P^tis P transposed.

(3) shows that X_(I_x_J) is represented linearly byT andP, for this reason the PCA is referred to as the bi-linear model.

Next we have a graphical interpretation of (3).

Figure 3: Illustrates how the structure of the PCA will be.

(16)

The full bi-linear model

The bi-linear model of the matrixX_(IxJ), together with the optimal number of PCs(A)can be written in vector form as shown below:

X_(IxJ)=X+t1p^t1+...+tAp^tA+e_A (4) where p^t is p transposed.

The matrixX in (4) represents the main centred-valued matrix, the vectors t1,...tAare the rst A scores vectors, p₁,...p_Aare the rst A loadings vectors, and e is the error.

Centring X before computing the PCA is a normal procedure in statistical analysis, but since this is a mathematical thesis, we are only interested in the mathematical behaviour of the bi-linear model. For this reason, we do not need to center X.

Just before we introduce the PCA as a metamodel for S-system, we give a formal introduction of S-system and the theory of the power-law formalism.

6 The power-law formalism and S-systems

6.1 Historical introduction

The power-law formalism and in particular the S-system representation within this non-linear formalism can trace their origins back to 150 years ago. It all started with two scientists named Peter Waage and Cato Maximilian Guld- berg.

Peter Waage (June 29 1833 - January 13 1900), was a signicant Norwe- gian chemist and professor at the Royal Frederick University. Waage, along with his brother in law Cato Maximilian Guldberg (11 August 1836 - 14 January 1902), developed the Law of Mass Action between 1864 and 1879.

The Law of Mass Action in chemistry is a mathematical model that explains and predicts behaviours of solutions in dynamic equilibrium. Guldberg og Waage published three articles in total, the rst one came in 1864, and was a paper in Norwegian called Studier over Aniteten, (see appendix A2). It went largely unnoticed, as did the second paper in 1867, which was written in French. Almost fteen years after the rst paper Guldberg and Waage had published, 1877 saw a Dutch chemist called Jacobus Henricus van 't Ho achieving similar results as those produced by Guldberd and Waage.

Ho's work was produced independently from that of Guldberg & Waage's and Ho's had started to get recognition for these results. For this reason, the Norwegian scientists decided to write their last paper in 1879, in order to get credit for their work, an achievement which they they did manage to secure.

It is thanks to the scientic achievements of Guldberg & Waage in Studier

(17)

over Aniteten (1864) that we have what we now call the power-law formalism, which is a sophisticated theory that studies dynamic equilibrium.

A representation of this formalism is given in systems of equations called S-systems. In [5] we can read that the S refers to synergism and saturation of the investigated system. The authors in [6] stress the fact that the power-law formalism is a theoretical framework for modelling and analysis of complex systems represented by dierential S-system equations. Among the possibilities of this formalism, we would do well to mention:

* Steady-state characterization of complex systems.

* Parameter estimation from experimental measurements.

* Classication of non-linear functions using recasting techniques.

* Development of ecient numerical algorithms for numerical simulations.

6.2 The need of the power-law formalism and S-systems In the analysis of complex technological systems or biological systems, methods based on linear mathematics very often fail to capture the essential non- linear characteristics from these processes. Concerning living systems and natural processes, non-linear responses such as synergism, saturation and oscillations are the rule rather than the exception. For instance, nding ana- lytical solutions of non-linear dierential equations is an art limited to a few special cases that it is insucient for general non-linear system analysis. On the other hand, the power-law formalism with S-system dierential equations shows potential for providing systematic methods for analysis of non-linear systems, see [6].

6.3 A formal mathematical description of S-systems

The author in [5] provides a very descriptive denition of the mathematical structure of S-systems. I will quote this denition with some terminology changed:

An S-system in biology represents the relative change in ux of a sub- stance in a specic process. This relative change in ux is represented by the functionXi, where the variables have the same name and which implies thatX˙_i denotes the derivative ofX_i with respect to time. X˙_i is at the same time composed of two positive-valued and dierentiable functions ofX. One of these functions can be expressed as: V⁺(Xi) representing the in-ux and the second function: V⁻(X_i) representing the out-ux.

This gives:

X˙i =V_i⁺(Xi)−V_i⁻(Xi) (5) wherei= 1,2, ..., n.

(18)

From (5) we can understand that if the system is composed ofX₁, X₂, ..., X_n variables, the function that describes the production is given byV_i⁺(X1, X2, ..., Xn) and the function that describes the degradation is given byV_i⁻(X1, X2, ..., Xn), giving the following result:

X˙_i=V_i⁺(X₁, X₂, ..., X_n)−V_i⁻(X₁, X₂, ..., X_n) (6) Considering now X˙i, where i = 1,2.., n, this implies that there is only one dierential equation for each dependent variable in the system being considered. Assume now that we have m independent variables and ndepend- ent variables, this will give usndierential equations in the form of (6).

We letV_i⁺ and V_i⁻ be dened now as follows:

V_i⁺(X1, ..., Xn, Xn+1, ..., Xn+m) =αiX₁^gⁱ¹X₂^gⁱ²...Xn^gⁱⁿX_n+1^g^i,n+1...X_n+m^g^i,n+m

and

V_i⁻(X₁, ..., X_n, X_n+1, ..., X_n+m) =β_iX₁^hⁱ¹X₂^hⁱ²...X_n^hⁱⁿX_n+1^h^i,n+1...X_n+m^h^i,n+m

whereαi, βi, gi,n+m, hi,n+m are parameters.

Then:

V_i⁺(X1, ..., Xn, Xn+1, ..., Xn+m) =αiQn+m j=1 X_j^g^i,j and

V_i⁻(X₁, ..., X_n, X_n+1, ..., X_n+m) =β_iQ_n+m

j=1 X_j^h^i,j

This generalization allows us to re-write (6) in the following way:

X˙i=αi n+m

Y

j=1

X_j^g^ij −β

n+m

Y

j=1

X_j^h^ij (7)

withX_i(0) =X_i0 and fori= 1,2, ...,(n+m)

The equation in (7) gives a canonical representation of non-linear ordinary dierential equations. The coecients α_i, β_i ∈ R are called rate constants and the coecientsg_ij, h_ij ∈Rare referred to as kinetic orders.

6.4 Modelling with S-systems

From experimental trials conducted and presented by the author in [5], we know that near to steady-state solutions and relative changes in metabolites produces proportional changes in ux. S-system's representations of biochemical systems responds in the same way. For this reason we can consider S-systems as a suitable tool for modelling biochemical systems.

(19)

At the same time, we know from the theory presented in [6] that S-systems, thanks to its structural homogeneity, means that all steps required for analysis of complex biological systems are straightforward. We could mention:

system representation, steady-state solutions, stability analysis and sensitiv- ity analysis: these methods are ecient for evaluating the dynamic responses of the system, providing at the same time powerful tools for understanding the complex non-linear systems.

In the next chapter we will provide the theory that will allow us to compress these complex non-linear systems represented by S-systems. We will show as well that by compressing S-systems we don't lose any important information of the non-linear system.

7 Compressing power functions with PCA

7.1 Theoretical background

The theoretical background of this section is based on the research and work of Julia Isaeva. Julia took his Philosophiae Doctor Thesis at the Norwegian University of Life Sciences in 2011. I will use more specically the results presented in her fourth paper The modelome of line curvature: Many non- linear models approximated by a single bi-linear model with verbal proling, whilst at the same time I hope to take these results a little further.

Consider a matrix X of dimensions (I xJ). Every entry in X_(IxJ) will be given by a power function, this follows from the way X is dened, which is:

x^ω_jⁱ :X_(IxJ)−→R (8)

wherei= 1, .., I and j= 1, .., J.

This means that we can decomposex^ω_jⁱ by arbitrarily choosing values for J and I. If, for instance, we let: j ∈ [1, ..., J] which is a close domain with J points, and i∈ [−1, ..., I]which is also a close domain with I points, we obtain the matrixX ∈R^I^x^,J:

x^ω_jⁱ :X =







x^ω₁¹ x^ω₂¹ ... x^ω_J¹ x^ω₁² x^ω₂² ... x^ω_J² .. .. ... ..

x^ω₁Î x^ω₂Î ... x^ω_JÎ







∈R^(I^x^J) (9)

We can see now how the matrixX_(IxJ) represents a collection of power functions, or, we could say thatX_(IxJ) is a multivariate power function matrix.

Through this thesis I am going to work in detail with functions of the form x^ω_jⁱ. This is because the structure of these functions is similar to the structure we nd in the S-system representation. Since S-systems are related to biochemical systems, we need to dene some constraints for the domain of x^ω_jⁱ, this is needed in order to get valid results within this branch of chemistry.

(20)

ωi:

ωi will have a default close domain, ωi∈[−1,2]. This is because [5] ensures that kinetic order reactions happen to occur inside this interval, namely [−1,2].

x_j:

The only constraint we have for the domain of xj is that it cannot include the0value. This is because we could risk operating with singularities when dividing by zero.

According to [1] we know that such kinds of matrices (X(IxJ)) can be com- pressed by a metamodel, the bi-linear model. If we recall equation (4), we are reminded of the fact that we can expressX_(IxJ) as a linear combination of the mean-centred valued matrixX and the rst three principal components multiply with the rst three scores-vectors:

X_(IxJ)=X+t1p^t₁+...+tAp^t_A+eA (10) where p^t is p transposed.

Consequently, if we were to multiply X_(IxJ) with a real-valued constant α, we would have:

αX_(IxJ)=αX_(IxJ)+αt1p^t₁+...+αtAp^t_A+αeA (11) Note that:

t∈R^I and p∈R^J

the scores (t) will be related to the rows of X_(IxJ) while the loadings (p) will be related to the columns ofX_(IxJ).

7.2 Further results: The scores-library

To reiterate one more time, we are only interested in the mathematical behaviour and structure of the bi-linear model, and for this reason we do not need to center the data inX_(IxJ)before computing the PCA. The consequence of leavingX_(IxJ) not-centred is that we need to work with 3 PCs instead of 2.

We now have the following metamodel:

x^ω_jⁱ :X_(IxJ)=t1p^t1+t2p^t2+t3p^t3+e4 (12) where p^t is p transposed.

Another important result we are going to further explore is that the scores ti

(i= 1,2,3) from (12), can be brought together to form a library, which can be used in parameter estimation problems. The scores-library is introduced in the following section.

(21)

The scores-library

Let L refer to a library formed by the scores t1, t2 and t3, which can be found in (12).

L= [t1 t2 t3] (13)

Before we proceed, please recall that the dimension of ti (i = 1,2,3), is directly related to the rows of X_(IxJ), and where the rows of X_(IxJ) are given byωi from the power function representationx^ω_jⁱ, see (9).

Assume now that we are interested in a specic value called ωs, which is inside ω_i and where ω_i ∈[−1,2]. Assume this valueω_s = 3/2, which gives the following function:

f(x) =x^(3/2) Expressed graphically, we have:

Figure 4

At the same time we construct a matrix X ∈ R⁽⁶⁰⁰^x⁶⁰⁰⁾ where we let x_j ∈[0.1, 2]and ω_i ∈[−1, 2]. We can now compute the PCA to X₍₆₀₀_x₆₀₀₎ in order to get the scores and loadings.

x^ω_jⁱ :X_(600x600) → P CA→

3

X

i=1

tip^ti ≈X_(600x600) where p^t is p transposed.

Now, since we know thatωs∈ωi we can nd the index position inωi where we have the value of3/2 or its best approximation.

ωi ∈[−1, ω₂, ω3, ..., 1.5

|{z}

position 501

, ...,2] (14)

(22)

Once we have the index position ofω_s we can begin working with L. In this case we have the following library:

L= [t1 t2 t3] ∈R⁽⁶⁰⁰^x³⁾

Now, in order to nd a metamodel that approximates our function f(x) = x^3/2 we need to nd the values of L which correspond to the index position 501. This step is illustrated in the next Figure.

Figure 5: Representation of the scores-library

We use matlab to pick up these values and we get the following metamodel:

f(x)≈L(501,1)p1(x) +L(501,2)p2(x) +L(501,3)p3(x)

| {z }

fb(x)

(15)

where:

t₁ =L(501,1) =−8.83⁵ t2 =L(501,2) =−1.15⁵ t₃ =L(501,3) =−1.04⁴

If we now plot both functions together, f(x) and f(x)b we get the following result:

(23)

Figure 6: Shows the non-linear function x^1.5 being tted by a metamodel where the parameters come from the scores-library

We can see that we get very good results.

Consider now the following non-linear function:

f(x) =x^(3/2)−x^(1/2) (16) The exponents full the requirement of being insideωi ∈[−1, 2].

We use the same matrix we had before, namelyX_(600x600) and we can compute the PCA one more time, meaning we get the same result as before:

x^ω_jⁱ :X_(600x600) → P CA

The dierence this time will be given in the structure of the metamodel, which is given as follows:

f(x)≈

3

X

i=1

tip^ti−

3

X

i=1

tip^ti (17)

where p^t is p transposed.

Since we are using the same xj in computing the metamodel for x^3/2 and x^1/2, this implies that p^ti from (17) is the same in both sums and for this reason can be factorized. We can make the same supposition for the library L, which will be the same in both sums because we are using the sameω_ifor the metamodel forx^3/2 andx^1/2. But the values we are querying in L come from dierent index-positions within L.

for ω(s₁) = 3/2:

ωi ∈[−1, ω₂, ω3, ..., 1.5

|{z}

position 501

, ...,2]

(24)

and forω(s₂) = 1/2:

ω_i ∈[−1, ω₂, ω₃, ..., 0.5

|{z}

position 301

, ...,2]

This gives the following metamodel:

f(x)≈[ω(s₁)

| {z }

L501,1

−ω(s₂)

| {z }

L301,1

]p^t1+ [ω(s₁)

| {z }

L501,2

−ω(s₂)

| {z }

L301,2

]p^t2+ [ω(s₁)

| {z }

L501,3

−ω(s₂)

| {z }

L301,3

]p^t3 (18) We pick up the six values from the common L which corresponds to these index-positions and we compute the dierence:

t1=L(501,1)−L(301,1) =−50.4210 t₂=L(501,2)−L(301,2) = 11.4172 t₃=L(501,3)−L(301,3) =−0.1087

Finally, we now have a metamodel with a structure similar to (12):

f(x)≈t₁p1(x) +t₂p2(x) +t₃p3(x) +e₄ (19) We plot both functions and we get:

Figure 7

The results, as we can see, are very satisfying.

These examples ensure that L can indeed be used as a library. These results are important for further analysis, (see section 16, recasting).

In the next section of the thesis, a metamodel has been developed for approximate S-system dierential equations, having a time-series given in the left hand side (LHS) of the system as the only information available.

(25)

8 Scalar case of S-systems and the bi-linear metamodel

A scalar case of S-system representation is given as follows:

X˙ =αX^g−βX^h

| {z }

f(x,t)

(20)

one equation, four parameters and X common. Recall that α, β ∈ R are called rate constants andg, h∈Rare referred to as kinetic orders.

We will develop a metamodel which is simply an approximation of the original function, and we have called this metamodelfb(x, t).

As mentioned briey previously, the right hand side (RHS) in (20) will be unknown. The only available information we are going to have is the LHS, which will be giving as a vector with function-values off(x, t) at some time t. In other words, f(x, t) will be given discrete. First of all, we begin by deriving the metamodel.

8.1 Deriving the metamodel

Recall the results from section (3), then dened αX^g and βX^h as shown bellow:

αX^g≈αλ₁p^t1(x_j) +αλ₂p^t2(x_j) +αλ₃p^t3(x_j) +α₄ (21) βX^h ≈βµ₁p^t₁(xj) +βµ₂p^t₂(xj) +βµ₃p^t₃(xj) +β4

where p^t is p transposed;λ₁,λ₂,λ₃ andµ₁,µ₂,µ₃ are vector-spaces,α and β are just constants ∈R.

Since we know thatxj is common for both metamodels, p^t_i can be factorized giving the following result:

αX^g−βX^h≈(αλ1−βµ₁)p^t1+ (αλ2−βµ₂)p^t2+ (αλ3−βµ₃)p^t3+4 (22) In order to simplify notation we can write (22) as shown below:

αX^g−βX^h≈Γ₁p^t1(x_j) +Γ₂p^t2(x_j) +Γ₃p^t3(x_j) +₄ (23) Equation (23) is the metamodel for the (1x1) S-system, where:

Γ₁ = (αλ₁−βµ₁) is a vector space.

Γ2 = (αλ2−βµ₂) is a vector space.

Γ3 = (αλ3−βµ₃) is also a vector space.

Notice that the metamodel in (23) is a general representation of the function fb(x, t) which approximates the (1x1) S-system for any given combination of parametersα, β, g, h within a valid domain.

For an explicit (1x1) S-system, the metamodel becomes:

αX^g−βX^h≈Γ₁p^t1(x_j) + Γ₂p^t2(x_j) + Γ₃p^t3(x_j) +₄ (24)

(26)

whereΓ_i (i= 1,2,3), are real-valued constants. These constants, also called parameters for the metamodels are unknown.

Equation (24) can be considered as a parameter estimation problem. We now need to estimate these unknown parameters, which is explained in the next section.

8.2 Metamodel-parameter estimation problem

The process in nding theΓ₁,Γ₂ and Γ₃ parameters from a time-series will be explained through a series of six steps in a simple algorithm.

Step 1: Solving the S-system

Assume the (1x1) S-system is a known explicit, this means that:

X˙ =αXb ^b^g−βXb ^b^h

| {z }

test function

, x(0) =x_o (25)

whereα,b β,b bg andbh are real-valued constants.

At the same time we choose to call the function from (25) test function.

The rst step consists in solving the system in (25). Dierent methods can be applied, in matlab or other computational languages. I will use a built-in function from matlab called ODE45⁵.

The result after solving (25) is a vector which we will refer to from now on as test solution. The test solution will be given as follows:

[x(tk)]^k_k^K^=x^o^+x

1=xo

| {z }

test solution

(26)

the rst value of the vector is the initial condition-valuex_o and the last value isxo+x.

What is more I will only usex(t_k)when referring to the test solution in (26), this is done in order to simplify notation.

The time interval for which the ODE from (25) is solved can be arbitrarily chosen. The domain of x(t_k) ∈ [x_o, x_o +x] will depend entirely on the behaviour of the solution. This interval gives us information about where x(tk) is dened. Furthermore, we can nd the max and the min value for the test solution within this interval. We are very interested in this interval, which will help us to decide the domain for x_j from the power-function representationx^ω_jⁱused in assembling the multivariate power function matrix X_(IxJ).

5ode45 is a numerical tool used in solving dierential equation problems. It applies the fourth and fth order Runge-Kutta method. (MathWorks-DocumentationCenter-ode45 )

(27)

Step 2: Computing the PCA

Recall that x^ω_jⁱ will give a matrix X of size (IxJ). The domain for the columns ofXwill vary accordingly to the domain ofx_jalone. This is because the domain ofωi (rows ofX) has been set to the default value= [−1, 2]. We have:

x^ω_jⁱ :X_(IxJ)=







x⁻¹₁ x⁻¹₂ ... x⁻¹_J x^ω₁² x^ω₂² ... x^ω_J² .. .. ... ..

x²₁ x²₂ ... x²_J







(27)

The domain of xj needs to full two constrains:

a) x_j has to be chosen in such way thatx(t_k) becomes a subset ofx_j. This is possible if we take a close look to the domain ofx(t_k)and we simply chose a wider interval forxj, perhaps if:

x(t_k)∈[x_o, x_o+x] ∀x >0 we choose:

xj ∈[xo−x, xo+ 2x] ∀ x >0

b) The step-size of xj has to be much smaller than the step-size of x(tk).

Assume that k = 50 ⇒ x(t_k) ∈ R^50x1, thus, we chose j = 10∗k ⇒ xj ∈ R^500x1.

If X_(IxJ) fulls a) and b) we can compute the PCA to X_(IxJ) and proceed to the next step.

Step 3: Superposition

The constrains a) and b) given on step 2 ensure that we can writex(t_k)⊂ x_j and that x_j has smaller step-size than x(t_k). A graphical representation is provided below.

(28)

Figure 8

It is now possible to approximate values fromxj tox(tk)in the following three ways:

xj31≈x(t1) (28a)

xj67≈x(t2) (28b)

x_j2 =x(t₃) (28c)

x_j72=x(t₄) (28d)

x_j7 ≈x(t₅) (28e)

xj7 ≈x(t6) (28f)

* Some values from xj (of index position 31 and 67) approximate the rst and second value of x(t_k), (28a,28b).

* Some values fromx_j (of index position 2 and 72) are equal to the third and fourth value ofx(tk), (28c, 28d).

* Only one value fromxj (of index position 7) is similar to the fth and sixth value of x(t_k), (28e, 28f).

As we can see, the process of approximate values fromxj to x(t_k) will give us a very important result, which is a vector we will call[idx]vector.

This vector contains the index positions of the values of x_j that approximate or are equal to the values ofx(t_k). These index positions are the small

(29)

numbers which are beside thej⁰sin (28a-28f).

idx=





 31 67 2 72

7 7







(29)

Note that[idx]∈R^K. Superposition:

We are going to use the [idx] vector to nd a composed function p_i(xj) (i = 1,2,3), of x(t_k). The result will be the composed function pi(x(t_k)) where(i= 1,2,3).

For this job a matlab-code has been implemented, the code records the index positions ([idx]) of the approximated values of xj to x(t_k), then it goes through every element in pi(x_j)and chooses the values that are in the index positions given in[idx]. At the end, the code rearranges the values of p_i(xj) accordingly to[idx]and the result is a composed function pi(x(t_k))∈R^K. To illustrate this step let us consider (28a-28f):

Assuming now that we are done with the approximation step and we have our[idx]vector (29). We will now focus on the pi(xj)vector, for (i= 1,2,3), and pick up the value form the index position 31. This is the rst value in [idx]and for this reason this value takes the rst position in the composed function pi(x(t_k)). If we repeat the process again we have to pick up the value from pi(x)from the index position 67 which is the second value in[idx]

and for this reason, this value takes the second position in pi(x(t_k)). When we have gone through all the elements in[idx]we have a p(x(tk)) ∈R^K. Step 4: Taking the derivative of the test solution

This step is straightforward. matlabis applied to compute the derivative of the test solution giving: x(t˙ _k) =dx/dt wheredx/dt∈R^K−1. Notice that we lose one dimension⁶ when computing the derivative of the test solution.

The step where we dierentiate x(t_k) is necessary because we are going to apply linear regression in order to nd the unknown parametersΓ_1,2,3, and linear regression requires that we work with time-series.

6Y = dif f(X) calculates dierences between adjacent elements of X along the rst array dimension whose size does not equal 1. IfX is a vector of length m, then Y = dif f(X)returns a vector of lengthm−1. (MathWorks-DocumentationCenter-di )

(30)

Step 5: Linear regression

This step will give us three real-valued constants: Γ₁,Γ₂ and Γ₃, which are the three unknown parameters from our metamodel:

fb(x, t) = Γ₁p1(x_j) + Γ₂p2(x_j) + Γ₃p3(x_j) +e₄ The linear system we are going to solve is given as:

M Γ =dx/dt (30)

where

M = [p1(x(tk)),p2(x(tk)),p3(x(tk))]∈R^K^x³ Γ ∈R³^x¹

dx/dt ∈R^(K−1)^x¹

As we observe, the system in (30) is not consistent. The RHS has a lower dimension than the LHS. I believe the best option to solve this problem is by deleting the last row from the matrixM_(KX3). The reason I suggest such an option is because we lose the last element fromx(t_k) during dierentiation.

We have now:

M = [p1(x(t_k)),p2(x(t_k)),p3(x(t_k))]∈R^(K−1)^x³ Γ ∈R³^x¹

dx/dt ∈R^(K−1)^x¹

which is a consistent linear system. We can now compute:

Γ =M^t dx/dt (31)

The vector of M forms a basis for the column space, the vector dx/dt is not in the column space so we need to project it onto it, as a result of this projection we get three real-valued constants: Γ₁,Γ₂ andΓ₃.

Step 6: The metamodel

Once we have the parametersΓ1,Γ2 andΓ3 we can write the metamodel for the (1x1) S-system as shown below:

X˙ = Γ1p1(xj) + Γ2p2(xj) + Γ3p3(xj)

| {z }

f(x,t)b

(32)

wherefb(x, t)≈f(x, t).

To be able to judge the accuracy of the metamodel in (32), we are going to use the square error method:

max (|x(t_k)−x(tb _k)|²) 100% (33)

(31)

This method basically computes the dierence from the exact solutionx(t_k) and the approximated solution bx(tk). The max implies that we are only going to get the biggest result after the dierence has been computed, then we square our answer and multiply it by 100 in order to have the result in percentage form.

Before we can compute the square error method we need to solve the rst order dierential equation given in (32). The challenge here is that we have our metamodel-function in discrete form and the ODE45 method from matlab does not accept discrete functions. However this matter can be easily solved by another built-in function oered by matlab, which is linear interpolation. Applying linear interpolation to (32) will give a curve where the valuesf(t1), f(t2), f(t3), ..., f(tK),andt1, t2, t3, ..., tK (which are known) are tted by matlab. We can now solve the ODE from (32) using ODE45.

This is hugely signicant because the test function has been solved by means of ODE45. We now need to apply the same numerical method to the equation in (32) if we are going to be fair in comparing their solutions.

8.3 Example 1: (1x1) S-system and its bi-linear metamodel Consider the following (1x1) S-system:

X˙ = 3X^0.5−2X^1.5, x(0) =xo (34) The equation in (34) is given in explicit form, whereα = 3, β = 2, g= 0.5 and h= 1.5. This means that we can nd a metamodel that approximates (34) if we go through the steps 1- 6. By doing this we will obtain a metamodel for the scalar case S-system:

X˙ ≈Γ₁p1(x_j) + Γ₂p2(x_j) + Γ₃p3(x_j) (35) Note that parameters Γ1, Γ2 and Γ3 will be dierent for dierent initial conditions (x_o).

Let the ODE in (34) havexo= 0.2, [tspan]∈[0 1]. At the same time we let ωi ∈[−1, 2]and xj ∈[0.1, 5], and we then assembleX_(IxJ) as follows:

x^ω_jⁱ : X_(600x600) →P CA

Once we compute the superposition and linear regression, we obtain the following parameter-values:

Γ₁ = 591.5121 Γ₂ = 35.1352 Γ3 = 107.5798

These results give us the following:

X˙ ≈591.5121p1(xj) + 35.1352p2(xj) + 107.5798p3(xj)

(32)

where pi(x_j)∈R^(600X1) fori= 1,2,3.

Next, we plot both results and compare their solutions.

Figure 9

Finally, the square error method which is also computed in matlab gives:

(max|x(t_k)−x(tb _k)|²) 100≈0.048%

which is a very accurate result.

In order to show how stable the metamodel is, we are going to solve (34) at x_o = 2, x_o = 2.5, x_o = 3 and x_o = 3.5. We will compute a metamodel for each case, which will give dierentΓ parameters although we are using the same X_(600x600) matrix to approximate each ODE.

The results, which are very satisfactory, can be studied in the next graph.

(33)

Figure 10 where the square error is:

- forx_o= 2 ≈0.042%. - forxo= 2.5 ≈0.043%.

- forxo= 3 ≈0.043%. - forx_o= 3.5 ≈0.042%.

Some very important aspects we need to consider now are: How trustworthy are these results? How reliable is the procedure we have employed in solving the discrete ODE given by the metamodel, considering that we used linear interpolation before ODE45 ? and, Is ODE45 a stable numerical method?

These are very relevant questions that will be considered in the next section.

9 Numerical analysis

Remember that we are interpolating the metamodel-function before solving it with ODE45. The following section will show that, whilst another numerical-integration methods can let us deal directly with ODEs given in discrete form, the methods of interpolation and ODE45 are stable and provide reliable results.

Before we consider the methods oered by matlab, and assuming one wants to apply another numerical method, we will consider some important aspects regarding numerical-integration methods that we should bear in mind when solving ODEs.

(34)

As mentioned before, there are many numerical-integration methods that can allow us to deal directly with discrete-ODEs, to name but a few: the Euler method, the midt point method, the Heun's method, the Runge-kutta method and many others. A good numerical method has to provide two things: ef- ciency and accuracy and clearly not all of the methods mentioned above deliver on these fronts. In many cases, one has to decide if time-computing is more important than accuracy or vice versa.

Next, we will consider two methods to show that some methods are more reliable than others, namely the Euler method and the Runge-Kutta method.

9.1 Euler method

We can nd a very good theoretical description of the Euler method in [12]

and [13].

The Euler method is a numerical method used for solving initial value problems. The method provides an approximation of the solution to the initial value problem.

For any

x⁰ =f(t, x) x(to) =xo

the forward euler method gives:

x_(k+1)≈x_k+hf(t_k, x_k) whereh=t_(k+1)+t_k

| {z }

∆t

(36)

The algorithm in (36) calculates approximate valuesx₀, x₁, x₂, x₃, x₄, .., x_n, ..

of the unique solutionf(x(t))at the set pointst0 < t1 < t2 < t3< ... < tn....

Notice thatf(t, x) satises: x⁰(t) =f(t, x(t)).

The Euler method can be applied in order to solve the ODE given by the metamodel. Indeed, we have all we need: we have the time interval given by xj, which are the points where the metamodel has been evaluated -this is∆t in (36)-, we have the initial conditionx₀ =x_k in and we have the function values for the metamodel, which corresponds tof(tk, xk) in (36).

The problem with this method arises under error analysis. As we can observe, the Euler method is a step-by-step method, for this reason one can anti- cipate that by reducing the step-size, we get more accurate results. Broadly speaking, the smaller step-size yield the smaller the error. Such error is known in the eld of numerical analysis as truncation error. The truncation error can be measured locally or globally. The local truncation error measures the error at specic points, found by taking the dierence between the exact solution and the computed solution at specic points, while the global truncation error measures the dierence between the exact solutionf(t) and the approximated solutionf(t)b . It is the latter which is of particular interest to us.

In [11], [12] and [13] we nd that the Euler method has a global truncation

(35)

error of rst order h(O). This mean that the error→ 0 linearly when the step size ∆t→ 0. This convergence of the error to zero is very slow, which implies that this method is not very accurate, although one has to acknowledge that it is very fast. In order to get good results one should consider a numerical method which has an error with a high convergence rate to zero, this means a higher power of h(O).

According to [13], the global truncation error of the midt point method and the Hun's method is of second order h(O)², which implies more accurate results, but if we want accuracy one of the best methods to consider is the fourth-order Runge-Kutta method.

9.2 Fourth-order Runge-Kutta method (RK4)

Also called, classical Runge-Kutta. It is called fourth-order because its global truncation error is of fourth orderh(O)⁴, which is far quicker than the other methods previously mentioned.

The classical Runge-Kutta scheme

x_(i+1) =xi+1

6(k1+ 2k2+ 2k3+k4)/h (37) where

k₁ =f(t_i, x_i)

k₂ =f(t_i+1

2h, x_i+1 2k₁h) k3 =f(ti+1

2h, xi+1 2k2h) k₄ =f(t_i+h, x_i+k₃h)

Notice that thek⁰sare recurrently related. That is,k₁ appears in the equation ofk₂, which appears in the equation of k₃, and so forth. Because each k is a functional evaluation, this recurrence makes the RK4 method very ecient. [13] provides a formal representation of the global truncation error of the RK4 method, the most important result of this representation being that this truncation error is of fourth-orderh(O)⁴.

Notice thath(O)⁴ is achieved with four evaluations off for each integration- step. We can read in [11] that this situation does not extend to higher order RK methods. For instance, an eight-order RK method (h(O)⁸), may require twelve evaluations per step, which is quite a high price to pay, time-wise, and implies more function evaluations per step to achieve slightly better results.

This simply may not be worthwhile. For this reason the fourth-order RK method is considered the most popular among the Runge-Kutta methods.

The Principal Component Transform (PCT) : tensor-metamodel of biochemical synergistic (S) systems, a new approach for mathematical modelling of non-linear dynamical systems