The Principal Component Transform (PCT)
Tensor-metamodel of biochemical synergistic (S) systems, a new approach for mathematical modelling of non-linear dynamical systems.
Norwegian University of Life Sciences Department of M athematical Science and Technology
Master of Science 2014 60 credits
To Daniel Andrés & David Emanuel.
1 Acknowledgements
This work was carried out at the Norwegian University of Life Science from August 2013 to April 2014. Throughout each and every stage of this project there have been a great many people who have pushed me onwards, motiv- ated me and guided me, and it is thanks to their support and guidance that we are reading this thesis today.
I am heartily grateful to my supervisor, Prof. Arkadi Ponossov, whose guidance and continual encouragement have been invaluable throughout the workings of this thesis. Prof Ponossov did not only lead me through the project; he truly inspired me and in sharing his knowledge both myself and my work have been immensely enriched.
I would also like to express my warmest thanks to Prof. Harald Martens, for his moral support, kindness and constructive ideas; this thesis being the result of one of these ideas. His constant enthusiasm have been both conta- gious and inspirational.
I would like to thanks to Prof Stig Omholt and Prof. Eberhard O. Voit, for their support and suggestions; this thesis bears the name proposed by Prof O. Voit, which is: the principal component transform.
My grateful thanks also goes to Rachel Davies, for her help in reading and correcting my terrible grammatical errors, which you will never nd, thanks to Rachel.
Finally, I would like to express my gratitude to my family for their love and endless support, especially to my wife. Without her unwavering belief in me, I would never been able to achieve what I have achieved, and I am forever indebted to her.
2 Abstract
In this thesis, a new approach, the principal component transform (PCT)1, is presented for mathematical modelling of synergistic biochemical systems (S-system) providing us with a simple and very accurate method for t- ting these and other non-linear dynamical systems. The PCT is composed by two metamodels2 one bi-linear metamodel and its improved version, a tensor-metamodel; both metamodels are based on the principal component analysis (PCA), which at the same time is based on the singular value decom- position (SVD). The tensor-metamodel is in addition based on mathematical attributes from tensor algebra, especially, the Kronecker product.
The PCT is inspired by the results of the PhD thesis of Julia Isaeva3. She and her co-authors presented how many non-linear models can be approx- imated by a single bi-linear metamodel. Isaeva's publications have provided valuable knowledge which has been applied in the implementation of the PCT as a tool which deals with multivariate power functions.
Referring to the PCT, the loadings, which are orthogonal, span a subspace, where, after projecting a simple time-series (given by the S-system), we are able to nd a function which behaves in the same way as the non-linear dynamical system described by the S-system. The process of projecting the time-series vector to the span of the loadings-vectors is carried out by the linear regression method.
This thesis did explore the possibility of using the scores, which belong to the PCT as a library, meaning that after having computed the linear regres- sion, we ought to have been able to nd the same parameters in this library.
After having tested this hypothesis in the scalar case we could not nd any logical correspondence between the parameters from the library and those parameters found by means of the linear regression method. These fundings are in accordance with the results presented by the PhD candidate Valeriya Tantseva4, where she conrms that the S-system formalism has a sloppy structure in the scalar case, meaning that they have a neutral parameter-set.
However, in higher dimensions, sloppiness seems to disappear, so that the data library can be constructed in an ecient way.
Regarding the bi-linear metamodel, it has been observed that the dimen- sionality of this method increases exponentially(n3) and proportionally to the size of the S-system being studied. This leads, for big systems, to the formation of principal components belonging to a null subspace. This issue
1This terminology was suggested by Prof. Eberhard O. Voit; Prof. Voit is a Distin- guished Professor in Biochemical Systems at the Georgia Technology Research Alliance.
2This concept and its applications are mostly due to by Prof. Harald Martens; Prof.
Martens is a Professor at the Norwegian University of Science and Technology NTNU
3Julia Isaeva was a PhD student at the Norwegian University of Life Sciences from 2007-2011.
4Valeriya Tantseva was a PhD student at the Norwegian University of Life Sciences from 2009-2013.
is solved in the tensor-metamodel, where the reduction of the dimensionality is given by the Kronecker product of the SVD.
Finally we can mention that the accuracy that each metamodel-phenome delivers is very good, meaning that more than 99% of the originally math- ematical model is described by the PCT.
3 Sammendrag
I denne masteroppgaven presenterer jeg en ny tilnærming som kan brukes til matematisk modellering av synergistiske biokjemiske systemer (S-systemer), tilnærmingen kalles the principal component transform (PCT). PCT-modellen gir oss en enkel og svært nøyaktig metode for modellering av S-systemer og andre ikke-lineære systemer.
PCT modellen er satt sammen av to metamodeller: en bi-lineær metamod- ell og en forbedret versjon, kalt tensor-metamodell. Begge metamodellene baseres på prinsipal komponent analyse (PCA). Tensor-metamodellen er i tillegg basert på matematiske egenskaper fra tensor algebra, spesielt Kro- necker produktet.
PCT-modellen er inspirert av doktorarbeidet til Julia Isaeva. Isaeva og hennes medforfattere presenterte hvordan ikke-lineære modeller kan tilnærmes ved en enkelt bi-lineær metamodell. Isaevas publikasjoner har gitt verdifull kunnskap som jeg har brukt for å bearbeide multivariate potensfunksjoner.
Med henvisning til PCA: Scores som er ortogonale, spenner ut et underrom der vi kan projisere en tidsrekke ved lineær regresjons metoden, tidsrekken er gitt av S-systemet. Dette gir en funksjon (en ODE-funksjon), som vil oppføre seg på samme måte som det opprinnelig S-systemet.
Denne oppgaven utforsker muligheten til å bruke scores som tilhører PCT- modellen som et bibliotek. Biblioteket kan brukes i forbindelse med para- meterestimeringsproblemer. Min hypotese er at parametersettet funnet ved lineær regresjon burde nnes i scores-biblioteket. Etter å ha testet hypotesen for det skalare tilfellet fant jeg ikke logisk samsvar mellom parameterrom- met og biblioteket. Dette er i samsvar med resultatene som presenteres av PhD kandidat Valeriya Tantseva. Tantseva bekreftet at S-systemer har en sloppy struktur i det skalare tilfellet. Det betyr at S-systemer har et nøytralt parameter-sett.
I høyere dimensjoner kan "sloppiness" likevel forsvinne slik at biblioteket kan konstrueres på en eektiv måte.
Nøyaktigheten som hver metamodell leverer er meget bra, noe som betyr at mer enn 99% av den opprinnelig matematiske modellen er beskrevet av PCT-modellen.
4 Introduction
March 2014 was a key date for the Norwegian Scientic Society, whereupon we celebrated the 150th anniversary of the only natural law found by Norwe- gian scientists: the Law of Mass Action. This law is a mathematical model which describes natural phenomenons in biochemistry given by solutions in dynamical equilibrium. Cato Maximilian Guldberg and Petter Waage pro- posed the Law of Mass Action in 1864, see Apendix A2.
This day we know that the attributes and properties of the Law of Mass Action can be applied not only to biochemistry, but to many other elds of science; such as physics (e.g. thermodynamic studies, semiconductor prop- erties), physiology (e.g. extracellular physiological signals), pharmacology, psychology (e.g. psychological theories in neuro-psychology) and mathemat- ics (e.g. dynamical systems, mathematical ecology, mathematical epidemi- ology).
The Law of Mass Action is also responsible for the relatively new mathemat- ical theory called power-law formalism, which, together with their synergistic (S)-system representation of non-linear dierential equations, describes sat- isfactorily the non-linearities of natural processes. This thesis is intended to be an attribute to this powerful and exible natural Law of Mass Action. I will connect this natural law, which describes non-linear dierential equa- tions, to a method called metamodelling, which has been vastly used at the Center for Integrative Genetics (CIGENE) at the Norwegian University of Life Science, (section 5.1 describes what a metamodel is).
The main goal of the thesis is to nd a new approach, which will be based on metamodelling, to non-linear systems appearing in biochemistry, in particu- lar the S-system representation which goes back to the Gulberd and Waage model and its generalizations.
The mathematical theory which describes how the metamodel and the tensor- metamodel can be used as methods for modelling S-system is described in Sections: 8, 10 and 13.
Examples 2, 3, 4, and 5 present the tting of (1x1), (2x2), (3x3) and (4x4) S-system to a bi-linear metamodel.
Section 13 presents the tensor-metamodel and Section 14 provides examples of (1x1), (2x2), (3x3) and (4x4) S-system being modelled by the tensor- metamodel.
Contents
1 Acknowledgements 3
2 Abstract 4
3 Sammendrag 6
4 Introduction 7
5 Background 11
5.1 Model vs. metamodel . . . 11
5.2 The singular value decomposition (SVD) . . . 12
5.3 The principal component analysis (PCA) . . . 12
6 The power-law formalism and S-systems 14 6.1 Historical introduction . . . 14
6.2 The need of the power-law formalism and S-systems . . . 15
6.3 A formal mathematical description of S-systems . . . 15
6.4 Modelling with S-systems . . . 16
7 Compressing power functions with PCA 17 7.1 Theoretical background . . . 17
7.2 Further results: The scores-library . . . 18
8 Scalar case of S-systems and the bi-linear metamodel 23 8.1 Deriving the metamodel . . . 23
8.2 Metamodel-parameter estimation problem . . . 24
8.3 Example 1: (1x1) S-system and its bi-linear metamodel . . . 29
9 Numerical analysis 31 9.1 Euler method . . . 32
9.2 Fourth-order Runge-Kutta method (RK4) . . . 33
9.3 ODE45 . . . 34
9.4 Interpolation . . . 34
9.5 ODE45 vs. fourth-order Runge-kutta . . . 37
10 Vector case of S-systems and the bi-linear metamodel 39 10.1 (2x2) S-systems . . . 40
10.2 Deriving a metamodel for (2x2) S-systems . . . 40
10.3 Metamodel-parameter estimation . . . 42
10.4 Example 2: (2x2) S-system and its bi-linear metamodel . . . 43
10.5 (3x3) S-systems . . . 46
10.6 Example 3: (3x3) S-system and its bi-linear metamodel . . . 47
10.7 (4x4) S-systems . . . 49
10.8 Example 4: (4x4) S-system and its bi-linear metamodel . . . 50 11 Issue with the bi-linear metamodel of big size S-system: Pro-
ducing the span for the null space 51
12 The tensor product 52
13 The tensor-metamodel 53
13.1 Tensor-metamodel for (2x2) S-systems . . . 53 13.2 Tensor-metamodel for (3x3) S-systems . . . 55 13.3 Tensor-metamodel for (4x4) S-systems . . . 56
14 The tensor singular value decomposition 57
15 Testing the tensor-metamodel 58
15.1 Example 5: (2x2) S-system and its tensor-metamodel . . . 58 15.2 Example 6: (3x3) S-system and its tensor-metamodel . . . 60 15.3 Example 7: (4x4) S-system and its tensor-metamodel . . . 61
16 Applications of the tensor-metamodel 62
16.1 Steady state analysis for the tensor-metamodel . . . 63 16.2 Recasting . . . 66
17 Conclusions 73
18 Appendix A1: The singular value decomposition 75
19 Appendix A2: The Law of Mass Action 76
20 Appendix A3: Matlab-codes 78
20.1 Note about matlab codes for the bi-linear metamodel and tensor-metamodel . . . 78 20.2 Code for a multivatiave-power-function Matrix . . . 78 20.3 Computing the PCA . . . 78 20.4 Tensor-singular value decomposition for nding the right com-
bination of PCs for the tensor-metamodel . . . 79
List of Figures
1 Presents an aposteriori model . . . 11
2 Illustrates an a priori model . . . 12
3 Representation of the structure of the PCA . . . 13
4 Shows the plot of the non-linear functionx1.5 . . . 19
5 Representation of the scores-library . . . 20
6 Shows the non-linear functionx1.5being tted by a metamodel where the parameters come from the scores-library . . . 21
7 Shows the non-linear function x1.5 −x0.5 being tted by a metamodel where the parameters come from the scores-library 22 8 Graphical representation wherex(tk)⊂xj . . . 26
9 (1x1) S-system being tted by a bi-linear metamodel . . . 30
10 (1x1) S-system being tted by a bi-linear metamodel at 4 dierent initial conditions . . . 31
11 Plot of the functionsin(πx) . . . 35
12 Interpolation ofsin(πx) at 15 points . . . 35
13 Interpolation ofsin(πx) at 50 points . . . 36
14 Interpolation ofsin(πx) at 1000 points . . . 36
15 Shows an ODE being solved by the fourth order Runga-Kutta method and its absolute error . . . 37
16 Shows an ODE being solved by ODE45 and its absolute error 38 17 Comparison of the absolute error of the Runge-Kutta method and ODE45 . . . 39
18 (2x2) S-system being modelled by a bi-linear metamodel . . . 44
19 (2x2) S-system being modelled by a bi-linear metamodel at 3 dierent initial conditions . . . 45
20 (3x3) S-system being tted by a bi-linear metamodel . . . 48
21 (3x3) S-system being tted by a bi-linear metamodel . . . 48
22 (4x4) S-system being modelled by a bi-linear metamodel . . . 50
23 Shows the convergence of the PCs due to tensor-multiplication of matrices with dierent sizes . . . 53
24 (2x2) S-system being modelled by a tensor-metamodel . . . . 59
25 (2x2) S-system being modelled by a tensor-metamodel at two dierent initial conditions . . . 60
26 (3x3) S-system being modelled by a tensor-metamodel . . . . 61
27 (4x4) S-system being modelled by a tensor-metamodel . . . . 62
28 (2x2) S-system being modelled by a tensor-metamodel for steady state analysis . . . 65
5 Background
5.1 Model vs. metamodel
A model in mathematics is a simplied description of a system, and this de- scription is made using mathematical language and concepts. I believe that a mathematical model has 3 main goals:
i) To help us to understand the system.
ii) To help us to explain and describe the system and the components which are involved in the system.
iii) To help us to make predictions about the behaviour of the system, and the eect each component has.
It is well known that mathematical models of complex systems are normally very dicult to analyze and time consuming to work with. On the other hand, a metamodel is a much simpler tool to work with. A metamodel is a model of the original model. A metamodel provides, just like a model, a description of the system. The advantage of working with metamodels is that they considerably reduce the complexity of the model and also reduce the dimensionality of the system, without loosing any important informa- tion. We can see in Figure 1 and Figure 2 a graphical representation of an aposteriori and an apriori model with its corresponding metamodel.
A model is known as aposteriori when the real data comes before the mode.
The metamodel comes at the end.
Figure 1: This gure has been extracted from [1].
The model is known as a priori when the data comes after the model.
The metamodel comes at the end.
Figure 2: This gure has been extracted from [1].
Despite the fact that a metamodel is a simplication of the model, it does not lose any important information regarding the complex system. In other words, a metamodel will not fail to fulli), ii) andiii).
The metamodel will provide an insight, which in some cases is as deep as the model itself, of the complex system being analyzed.
In this thesis I will use the principal component analysis (PCA) as a metamodel.
The PCA will be computed by means of the singular value decomposition.
5.2 The singular value decomposition (SVD)
According to [4], there are two matrices U and V which are orthogonal and a diagonal matrixΣ enabling the matrixX(IxJ) to be factorized as follows:
X(IxJ)=U(IxI)Σ(IxJ)V(JtxJ) (1) whereVt is V transposed.
The matrix Σ contains the singular values of X on its diagonal. The left singular vectors U are joined together withΣin order to obtain the scores- matrix T and the matrix V, also called right singular vectors which produce the loadings-matrix P:
X = (UΣ)Vt= (T)Pt+e (2) A more extensive denition of the singular value decomposition is provided in appendix A1.
5.3 The principal component analysis (PCA)
A principal component (PC) is a special type of variable which can not be measured directly. Instead, it can be computed as a linear combination of a set of input variables (e.g. a data set). These kinds of variables are called latent variables, see [2]. The author in [2] underlines that the principal component analysis (PCA) allows us to extract relevant information from big and sometimes confusing data setts with comparatively little eort. The PCA reduces the high dimensional data sett to a lower dimension giving few
principal components that underlie the dynamics of the complex system.
Authors in [3] acknowledge that the PCA is concerned with explaining the structure of the variance-covariance of a set of variables. The result of the analysis of these latent variables has shown, and also been considered by [1], that the principal components lie in the direction of the maximal variation of the data and they describe this variation in a descending order, thus: the rst PC is found along the direction of the largest variation, the second PC in the direction of the second largest variation but orthogonal to the rst, a pattern which is repeated for the third PC, the fourth PC and so on.
LetA=optimal number of PCs
When the remaining variation of the data in question is small, it is considered thatA has been found and the information which is given by A+ 1 PCs is so small that it can be considered as noise (e).
In [1] we can read more about how the PCs reect the largest eigenvalues of the co-variances, therefore the rst few PCs can give an adequate description of the whole data set, these PCs form a new orthogonal basis of the variable space where the scores (T) and loadings (P) are projected onto it. The scores represent the relationship between the PCs and the samples (data rows) while theloadingsrepresent the relationship between the PCs and the variables(data columns).
Assume we are given some random data in the matrixX(IxJ), this data can be expressed in terms of theloadingsand scoresas shown below:
X(IxJ)=T(IxK)P(Kt xJ)+e(IxJ) (3) Here are X,T,P and e matrices with known dimensions, and where Ptis P transposed.
(3) shows that X(IxJ) is represented linearly byT andP, for this reason the PCA is referred to as the bi-linear model.
Next we have a graphical interpretation of (3).
Figure 3: Illustrates how the structure of the PCA will be.
The full bi-linear model
The bi-linear model of the matrixX(IxJ), together with the optimal number of PCs(A)can be written in vector form as shown below:
X(IxJ)=X+t1pt1+...+tAptA+eA (4) where pt is p transposed.
The matrixX in (4) represents the main centred-valued matrix, the vectors t1,...tAare the rst A scores vectors, p1,...pAare the rst A loadings vectors, and e is the error.
Centring X before computing the PCA is a normal procedure in statistical analysis, but since this is a mathematical thesis, we are only interested in the mathematical behaviour of the bi-linear model. For this reason, we do not need to center X.
Just before we introduce the PCA as a metamodel for S-system, we give a formal introduction of S-system and the theory of the power-law formalism.
6 The power-law formalism and S-systems
6.1 Historical introduction
The power-law formalism and in particular the S-system representation within this non-linear formalism can trace their origins back to 150 years ago. It all started with two scientists named Peter Waage and Cato Maximilian Guld- berg.
Peter Waage (June 29 1833 - January 13 1900), was a signicant Norwe- gian chemist and professor at the Royal Frederick University. Waage, along with his brother in law Cato Maximilian Guldberg (11 August 1836 - 14 January 1902), developed the Law of Mass Action between 1864 and 1879.
The Law of Mass Action in chemistry is a mathematical model that explains and predicts behaviours of solutions in dynamic equilibrium. Guldberg og Waage published three articles in total, the rst one came in 1864, and was a paper in Norwegian called Studier over Aniteten, (see appendix A2). It went largely unnoticed, as did the second paper in 1867, which was written in French. Almost fteen years after the rst paper Guldberg and Waage had published, 1877 saw a Dutch chemist called Jacobus Henricus van 't Ho achieving similar results as those produced by Guldberd and Waage.
Ho's work was produced independently from that of Guldberg & Waage's and Ho's had started to get recognition for these results. For this reason, the Norwegian scientists decided to write their last paper in 1879, in order to get credit for their work, an achievement which they they did manage to secure.
It is thanks to the scientic achievements of Guldberg & Waage in Studier
over Aniteten (1864) that we have what we now call the power-law form- alism, which is a sophisticated theory that studies dynamic equilibrium.
A representation of this formalism is given in systems of equations called S-systems. In [5] we can read that the S refers to synergism and satura- tion of the investigated system. The authors in [6] stress the fact that the power-law formalism is a theoretical framework for modelling and analysis of complex systems represented by dierential S-system equations. Among the possibilities of this formalism, we would do well to mention:
* Steady-state characterization of complex systems.
* Parameter estimation from experimental measurements.
* Classication of non-linear functions using recasting techniques.
* Development of ecient numerical algorithms for numerical simulations.
6.2 The need of the power-law formalism and S-systems In the analysis of complex technological systems or biological systems, meth- ods based on linear mathematics very often fail to capture the essential non- linear characteristics from these processes. Concerning living systems and natural processes, non-linear responses such as synergism, saturation and oscillations are the rule rather than the exception. For instance, nding ana- lytical solutions of non-linear dierential equations is an art limited to a few special cases that it is insucient for general non-linear system analysis. On the other hand, the power-law formalism with S-system dierential equations shows potential for providing systematic methods for analysis of non-linear systems, see [6].
6.3 A formal mathematical description of S-systems
The author in [5] provides a very descriptive denition of the mathematical structure of S-systems. I will quote this denition with some terminology changed:
An S-system in biology represents the relative change in ux of a sub- stance in a specic process. This relative change in ux is represented by the functionXi, where the variables have the same name and which implies thatX˙i denotes the derivative ofXi with respect to time. X˙i is at the same time composed of two positive-valued and dierentiable functions ofX. One of these functions can be expressed as: V+(Xi) representing the in-ux and the second function: V−(Xi) representing the out-ux.
This gives:
X˙i =Vi+(Xi)−Vi−(Xi) (5) wherei= 1,2, ..., n.
From (5) we can understand that if the system is composed ofX1, X2, ..., Xn variables, the function that describes the production is given byVi+(X1, X2, ..., Xn) and the function that describes the degradation is given byVi−(X1, X2, ..., Xn), giving the following result:
X˙i=Vi+(X1, X2, ..., Xn)−Vi−(X1, X2, ..., Xn) (6) Considering now X˙i, where i = 1,2.., n, this implies that there is only one dierential equation for each dependent variable in the system being con- sidered. Assume now that we have m independent variables and ndepend- ent variables, this will give usndierential equations in the form of (6).
We letVi+ and Vi− be dened now as follows:
Vi+(X1, ..., Xn, Xn+1, ..., Xn+m) =αiX1gi1X2gi2...XnginXn+1gi,n+1...Xn+mgi,n+m
and
Vi−(X1, ..., Xn, Xn+1, ..., Xn+m) =βiX1hi1X2hi2...XnhinXn+1hi,n+1...Xn+mhi,n+m
whereαi, βi, gi,n+m, hi,n+m are parameters.
Then:
Vi+(X1, ..., Xn, Xn+1, ..., Xn+m) =αiQn+m j=1 Xjgi,j and
Vi−(X1, ..., Xn, Xn+1, ..., Xn+m) =βiQn+m
j=1 Xjhi,j
This generalization allows us to re-write (6) in the following way:
X˙i=αi n+m
Y
j=1
Xjgij −β
n+m
Y
j=1
Xjhij (7)
withXi(0) =Xi0 and fori= 1,2, ...,(n+m)
The equation in (7) gives a canonical representation of non-linear ordinary dierential equations. The coecients αi, βi ∈ R are called rate constants and the coecientsgij, hij ∈Rare referred to as kinetic orders.
6.4 Modelling with S-systems
From experimental trials conducted and presented by the author in [5], we know that near to steady-state solutions and relative changes in metabolites produces proportional changes in ux. S-system's representations of bio- chemical systems responds in the same way. For this reason we can consider S-systems as a suitable tool for modelling biochemical systems.
At the same time, we know from the theory presented in [6] that S-systems, thanks to its structural homogeneity, means that all steps required for ana- lysis of complex biological systems are straightforward. We could mention:
system representation, steady-state solutions, stability analysis and sensitiv- ity analysis: these methods are ecient for evaluating the dynamic responses of the system, providing at the same time powerful tools for understanding the complex non-linear systems.
In the next chapter we will provide the theory that will allow us to compress these complex non-linear systems represented by S-systems. We will show as well that by compressing S-systems we don't lose any important information of the non-linear system.
7 Compressing power functions with PCA
7.1 Theoretical background
The theoretical background of this section is based on the research and work of Julia Isaeva. Julia took his Philosophiae Doctor Thesis at the Norwegian University of Life Sciences in 2011. I will use more specically the results presented in her fourth paper The modelome of line curvature: Many non- linear models approximated by a single bi-linear model with verbal proling, whilst at the same time I hope to take these results a little further.
Consider a matrix X of dimensions (I xJ). Every entry in X(IxJ) will be given by a power function, this follows from the way X is dened, which is:
xωji :X(IxJ)−→R (8)
wherei= 1, .., I and j= 1, .., J.
This means that we can decomposexωji by arbitrarily choosing values for J and I. If, for instance, we let: j ∈ [1, ..., J] which is a close domain with J points, and i∈ [−1, ..., I]which is also a close domain with I points, we obtain the matrixX ∈RIx,J:
xωji :X =
xω11 xω21 ... xωJ1 xω12 xω22 ... xωJ2 .. .. ... ..
xω1I xω2I ... xωJI
∈R(IxJ) (9)
We can see now how the matrixX(IxJ) represents a collection of power func- tions, or, we could say thatX(IxJ) is a multivariate power function matrix.
Through this thesis I am going to work in detail with functions of the form xωji. This is because the structure of these functions is similar to the struc- ture we nd in the S-system representation. Since S-systems are related to biochemical systems, we need to dene some constraints for the domain of xωji, this is needed in order to get valid results within this branch of chemistry.
ωi:
ωi will have a default close domain, ωi∈[−1,2]. This is because [5] ensures that kinetic order reactions happen to occur inside this interval, namely [−1,2].
xj:
The only constraint we have for the domain of xj is that it cannot include the0value. This is because we could risk operating with singularities when dividing by zero.
According to [1] we know that such kinds of matrices (X(IxJ)) can be com- pressed by a metamodel, the bi-linear model. If we recall equation (4), we are reminded of the fact that we can expressX(IxJ) as a linear combination of the mean-centred valued matrixX and the rst three principal components multiply with the rst three scores-vectors:
X(IxJ)=X+t1pt1+...+tAptA+eA (10) where pt is p transposed.
Consequently, if we were to multiply X(IxJ) with a real-valued constant α, we would have:
αX(IxJ)=αX(IxJ)+αt1pt1+...+αtAptA+αeA (11) Note that:
t∈RI and p∈RJ
the scores (t) will be related to the rows of X(IxJ) while the loadings (p) will be related to the columns ofX(IxJ).
7.2 Further results: The scores-library
To reiterate one more time, we are only interested in the mathematical beha- viour and structure of the bi-linear model, and for this reason we do not need to center the data inX(IxJ)before computing the PCA. The consequence of leavingX(IxJ) not-centred is that we need to work with 3 PCs instead of 2.
We now have the following metamodel:
xωji :X(IxJ)=t1pt1+t2pt2+t3pt3+e4 (12) where pt is p transposed.
Another important result we are going to further explore is that the scores ti
(i= 1,2,3) from (12), can be brought together to form a library, which can be used in parameter estimation problems. The scores-library is introduced in the following section.
The scores-library
Let L refer to a library formed by the scores t1, t2 and t3, which can be found in (12).
L= [t1 t2 t3] (13)
Before we proceed, please recall that the dimension of ti (i = 1,2,3), is directly related to the rows of X(IxJ), and where the rows of X(IxJ) are given byωi from the power function representationxωji, see (9).
Assume now that we are interested in a specic value called ωs, which is inside ωi and where ωi ∈[−1,2]. Assume this valueωs = 3/2, which gives the following function:
f(x) =x(3/2) Expressed graphically, we have:
Figure 4
At the same time we construct a matrix X ∈ R(600x600) where we let xj ∈[0.1, 2]and ωi ∈[−1, 2]. We can now compute the PCA to X(600x600) in order to get the scores and loadings.
xωji :X(600x600) → P CA→
3
X
i=1
tipti ≈X(600x600) where pt is p transposed.
Now, since we know thatωs∈ωi we can nd the index position inωi where we have the value of3/2 or its best approximation.
ωi ∈[−1, ω2, ω3, ..., 1.5
|{z}
position 501
, ...,2] (14)
Once we have the index position ofωs we can begin working with L. In this case we have the following library:
L= [t1 t2 t3] ∈R(600x3)
Now, in order to nd a metamodel that approximates our function f(x) = x3/2 we need to nd the values of L which correspond to the index position 501. This step is illustrated in the next Figure.
Figure 5: Representation of the scores-library
We use matlab to pick up these values and we get the following metamodel:
f(x)≈L(501,1)p1(x) +L(501,2)p2(x) +L(501,3)p3(x)
| {z }
fb(x)
(15)
where:
t1 =L(501,1) =−8.835 t2 =L(501,2) =−1.155 t3 =L(501,3) =−1.044
If we now plot both functions together, f(x) and f(x)b we get the following result:
Figure 6: Shows the non-linear function x1.5 being tted by a metamodel where the parameters come from the scores-library
We can see that we get very good results.
Consider now the following non-linear function:
f(x) =x(3/2)−x(1/2) (16) The exponents full the requirement of being insideωi ∈[−1, 2].
We use the same matrix we had before, namelyX(600x600) and we can com- pute the PCA one more time, meaning we get the same result as before:
xωji :X(600x600) → P CA
The dierence this time will be given in the structure of the metamodel, which is given as follows:
f(x)≈
3
X
i=1
tipti−
3
X
i=1
tipti (17)
where pt is p transposed.
Since we are using the same xj in computing the metamodel for x3/2 and x1/2, this implies that pti from (17) is the same in both sums and for this reason can be factorized. We can make the same supposition for the library L, which will be the same in both sums because we are using the sameωifor the metamodel forx3/2 andx1/2. But the values we are querying in L come from dierent index-positions within L.
for ω(s1) = 3/2:
ωi ∈[−1, ω2, ω3, ..., 1.5
|{z}
position 501
, ...,2]
and forω(s2) = 1/2:
ωi ∈[−1, ω2, ω3, ..., 0.5
|{z}
position 301
, ...,2]
This gives the following metamodel:
f(x)≈[ω(s1)
| {z }
L501,1
−ω(s2)
| {z }
L301,1
]pt1+ [ω(s1)
| {z }
L501,2
−ω(s2)
| {z }
L301,2
]pt2+ [ω(s1)
| {z }
L501,3
−ω(s2)
| {z }
L301,3
]pt3 (18) We pick up the six values from the common L which corresponds to these index-positions and we compute the dierence:
t1=L(501,1)−L(301,1) =−50.4210 t2=L(501,2)−L(301,2) = 11.4172 t3=L(501,3)−L(301,3) =−0.1087
Finally, we now have a metamodel with a structure similar to (12):
f(x)≈t1p1(x) +t2p2(x) +t3p3(x) +e4 (19) We plot both functions and we get:
Figure 7
The results, as we can see, are very satisfying.
These examples ensure that L can indeed be used as a library. These results are important for further analysis, (see section 16, recasting).
In the next section of the thesis, a metamodel has been developed for ap- proximate S-system dierential equations, having a time-series given in the left hand side (LHS) of the system as the only information available.
8 Scalar case of S-systems and the bi-linear metamodel
A scalar case of S-system representation is given as follows:
X˙ =αXg−βXh
| {z }
f(x,t)
(20)
one equation, four parameters and X common. Recall that α, β ∈ R are called rate constants andg, h∈Rare referred to as kinetic orders.
We will develop a metamodel which is simply an approximation of the ori- ginal function, and we have called this metamodelfb(x, t).
As mentioned briey previously, the right hand side (RHS) in (20) will be unknown. The only available information we are going to have is the LHS, which will be giving as a vector with function-values off(x, t) at some time t. In other words, f(x, t) will be given discrete. First of all, we begin by deriving the metamodel.
8.1 Deriving the metamodel
Recall the results from section (3), then dened αXg and βXh as shown bellow:
αXg≈αλ1pt1(xj) +αλ2pt2(xj) +αλ3pt3(xj) +α4 (21) βXh ≈βµ1pt1(xj) +βµ2pt2(xj) +βµ3pt3(xj) +β4
where pt is p transposed;λ1,λ2,λ3 andµ1,µ2,µ3 are vector-spaces,α and β are just constants ∈R.
Since we know thatxj is common for both metamodels, pti can be factorized giving the following result:
αXg−βXh≈(αλ1−βµ1)pt1+ (αλ2−βµ2)pt2+ (αλ3−βµ3)pt3+4 (22) In order to simplify notation we can write (22) as shown below:
αXg−βXh≈Γ1pt1(xj) +Γ2pt2(xj) +Γ3pt3(xj) +4 (23) Equation (23) is the metamodel for the (1x1) S-system, where:
Γ1 = (αλ1−βµ1) is a vector space.
Γ2 = (αλ2−βµ2) is a vector space.
Γ3 = (αλ3−βµ3) is also a vector space.
Notice that the metamodel in (23) is a general representation of the function fb(x, t) which approximates the (1x1) S-system for any given combination of parametersα, β, g, h within a valid domain.
For an explicit (1x1) S-system, the metamodel becomes:
αXg−βXh≈Γ1pt1(xj) + Γ2pt2(xj) + Γ3pt3(xj) +4 (24)
whereΓi (i= 1,2,3), are real-valued constants. These constants, also called parameters for the metamodels are unknown.
Equation (24) can be considered as a parameter estimation problem. We now need to estimate these unknown parameters, which is explained in the next section.
8.2 Metamodel-parameter estimation problem
The process in nding theΓ1,Γ2 and Γ3 parameters from a time-series will be explained through a series of six steps in a simple algorithm.
Step 1: Solving the S-system
Assume the (1x1) S-system is a known explicit, this means that:
X˙ =αXb bg−βXb bh
| {z }
test function
, x(0) =xo (25)
whereα,b β,b bg andbh are real-valued constants.
At the same time we choose to call the function from (25) test function.
The rst step consists in solving the system in (25). Dierent methods can be applied, in matlab or other computational languages. I will use a built-in function from matlab called ODE455.
The result after solving (25) is a vector which we will refer to from now on as test solution. The test solution will be given as follows:
[x(tk)]kkK=xo+x
1=xo
| {z }
test solution
(26)
the rst value of the vector is the initial condition-valuexo and the last value isxo+x.
What is more I will only usex(tk)when referring to the test solution in (26), this is done in order to simplify notation.
The time interval for which the ODE from (25) is solved can be arbitrarily chosen. The domain of x(tk) ∈ [xo, xo +x] will depend entirely on the behaviour of the solution. This interval gives us information about where x(tk) is dened. Furthermore, we can nd the max and the min value for the test solution within this interval. We are very interested in this interval, which will help us to decide the domain for xj from the power-function representationxωjiused in assembling the multivariate power function matrix X(IxJ).
5ode45 is a numerical tool used in solving dierential equation problems. It applies the fourth and fth order Runge-Kutta method. (MathWorks-DocumentationCenter-ode45 )
Step 2: Computing the PCA
Recall that xωji will give a matrix X of size (IxJ). The domain for the columns ofXwill vary accordingly to the domain ofxjalone. This is because the domain ofωi (rows ofX) has been set to the default value= [−1, 2]. We have:
xωji :X(IxJ)=
x−11 x−12 ... x−1J xω12 xω22 ... xωJ2 .. .. ... ..
x21 x22 ... x2J
(27)
The domain of xj needs to full two constrains:
a) xj has to be chosen in such way thatx(tk) becomes a subset ofxj. This is possible if we take a close look to the domain ofx(tk)and we simply chose a wider interval forxj, perhaps if:
x(tk)∈[xo, xo+x] ∀x >0 we choose:
xj ∈[xo−x, xo+ 2x] ∀ x >0
b) The step-size of xj has to be much smaller than the step-size of x(tk).
Assume that k = 50 ⇒ x(tk) ∈ R50x1, thus, we chose j = 10∗k ⇒ xj ∈ R500x1.
If X(IxJ) fulls a) and b) we can compute the PCA to X(IxJ) and proceed to the next step.
Step 3: Superposition
The constrains a) and b) given on step 2 ensure that we can writex(tk)⊂ xj and that xj has smaller step-size than x(tk). A graphical representation is provided below.
Figure 8
It is now possible to approximate values fromxj tox(tk)in the following three ways:
xj31≈x(t1) (28a)
xj67≈x(t2) (28b)
xj2 =x(t3) (28c)
xj72=x(t4) (28d)
xj7 ≈x(t5) (28e)
xj7 ≈x(t6) (28f)
* Some values from xj (of index position 31 and 67) approximate the rst and second value of x(tk), (28a,28b).
* Some values fromxj (of index position 2 and 72) are equal to the third and fourth value ofx(tk), (28c, 28d).
* Only one value fromxj (of index position 7) is similar to the fth and sixth value of x(tk), (28e, 28f).
As we can see, the process of approximate values fromxj to x(tk) will give us a very important result, which is a vector we will call[idx]vector.
This vector contains the index positions of the values of xj that approxim- ate or are equal to the values ofx(tk). These index positions are the small
numbers which are beside thej0sin (28a-28f).
idx=
31 67 2 72
7 7
(29)
Note that[idx]∈RK. Superposition:
We are going to use the [idx] vector to nd a composed function pi(xj) (i = 1,2,3), of x(tk). The result will be the composed function pi(x(tk)) where(i= 1,2,3).
For this job a matlab-code has been implemented, the code records the in- dex positions ([idx]) of the approximated values of xj to x(tk), then it goes through every element in pi(xj)and chooses the values that are in the index positions given in[idx]. At the end, the code rearranges the values of pi(xj) accordingly to[idx]and the result is a composed function pi(x(tk))∈RK. To illustrate this step let us consider (28a-28f):
Assuming now that we are done with the approximation step and we have our[idx]vector (29). We will now focus on the pi(xj)vector, for (i= 1,2,3), and pick up the value form the index position 31. This is the rst value in [idx]and for this reason this value takes the rst position in the composed function pi(x(tk)). If we repeat the process again we have to pick up the value from pi(x)from the index position 67 which is the second value in[idx]
and for this reason, this value takes the second position in pi(x(tk)). When we have gone through all the elements in[idx]we have a p(x(tk)) ∈RK. Step 4: Taking the derivative of the test solution
This step is straightforward. matlabis applied to compute the derivative of the test solution giving: x(t˙ k) =dx/dt wheredx/dt∈RK−1. Notice that we lose one dimension6 when computing the derivative of the test solution.
The step where we dierentiate x(tk) is necessary because we are going to apply linear regression in order to nd the unknown parametersΓ1,2,3, and linear regression requires that we work with time-series.
6Y = dif f(X) calculates dierences between adjacent elements of X along the rst array dimension whose size does not equal 1. IfX is a vector of length m, then Y = dif f(X)returns a vector of lengthm−1. (MathWorks-DocumentationCenter-di )
Step 5: Linear regression
This step will give us three real-valued constants: Γ1,Γ2 and Γ3, which are the three unknown parameters from our metamodel:
fb(x, t) = Γ1p1(xj) + Γ2p2(xj) + Γ3p3(xj) +e4 The linear system we are going to solve is given as:
M Γ =dx/dt (30)
where
M = [p1(x(tk)),p2(x(tk)),p3(x(tk))]∈RKx3 Γ ∈R3x1
dx/dt ∈R(K−1)x1
As we observe, the system in (30) is not consistent. The RHS has a lower dimension than the LHS. I believe the best option to solve this problem is by deleting the last row from the matrixM(KX3). The reason I suggest such an option is because we lose the last element fromx(tk) during dierentiation.
We have now:
M = [p1(x(tk)),p2(x(tk)),p3(x(tk))]∈R(K−1)x3 Γ ∈R3x1
dx/dt ∈R(K−1)x1
which is a consistent linear system. We can now compute:
Γ =Mt dx/dt (31)
The vector of M forms a basis for the column space, the vector dx/dt is not in the column space so we need to project it onto it, as a result of this projection we get three real-valued constants: Γ1,Γ2 andΓ3.
Step 6: The metamodel
Once we have the parametersΓ1,Γ2 andΓ3 we can write the metamodel for the (1x1) S-system as shown below:
X˙ = Γ1p1(xj) + Γ2p2(xj) + Γ3p3(xj)
| {z }
f(x,t)b
(32)
wherefb(x, t)≈f(x, t).
To be able to judge the accuracy of the metamodel in (32), we are going to use the square error method:
max (|x(tk)−x(tb k)|2) 100% (33)
This method basically computes the dierence from the exact solutionx(tk) and the approximated solution bx(tk). The max implies that we are only going to get the biggest result after the dierence has been computed, then we square our answer and multiply it by 100 in order to have the result in percentage form.
Before we can compute the square error method we need to solve the rst order dierential equation given in (32). The challenge here is that we have our metamodel-function in discrete form and the ODE45 method from mat- lab does not accept discrete functions. However this matter can be easily solved by another built-in function oered by matlab, which is linear inter- polation. Applying linear interpolation to (32) will give a curve where the valuesf(t1), f(t2), f(t3), ..., f(tK),andt1, t2, t3, ..., tK (which are known) are tted by matlab. We can now solve the ODE from (32) using ODE45.
This is hugely signicant because the test function has been solved by means of ODE45. We now need to apply the same numerical method to the equa- tion in (32) if we are going to be fair in comparing their solutions.
8.3 Example 1: (1x1) S-system and its bi-linear metamodel Consider the following (1x1) S-system:
X˙ = 3X0.5−2X1.5, x(0) =xo (34) The equation in (34) is given in explicit form, whereα = 3, β = 2, g= 0.5 and h= 1.5. This means that we can nd a metamodel that approximates (34) if we go through the steps 1- 6. By doing this we will obtain a metamodel for the scalar case S-system:
X˙ ≈Γ1p1(xj) + Γ2p2(xj) + Γ3p3(xj) (35) Note that parameters Γ1, Γ2 and Γ3 will be dierent for dierent initial conditions (xo).
Let the ODE in (34) havexo= 0.2, [tspan]∈[0 1]. At the same time we let ωi ∈[−1, 2]and xj ∈[0.1, 5], and we then assembleX(IxJ) as follows:
xωji : X(600x600) →P CA
Once we compute the superposition and linear regression, we obtain the following parameter-values:
Γ1 = 591.5121 Γ2 = 35.1352 Γ3 = 107.5798
These results give us the following:
X˙ ≈591.5121p1(xj) + 35.1352p2(xj) + 107.5798p3(xj)
where pi(xj)∈R(600X1) fori= 1,2,3.
Next, we plot both results and compare their solutions.
Figure 9
Finally, the square error method which is also computed in matlab gives:
(max|x(tk)−x(tb k)|2) 100≈0.048%
which is a very accurate result.
In order to show how stable the metamodel is, we are going to solve (34) at xo = 2, xo = 2.5, xo = 3 and xo = 3.5. We will compute a metamodel for each case, which will give dierentΓ parameters although we are using the same X(600x600) matrix to approximate each ODE.
The results, which are very satisfactory, can be studied in the next graph.
Figure 10 where the square error is:
- forxo= 2 ≈0.042%. - forxo= 2.5 ≈0.043%.
- forxo= 3 ≈0.043%. - forxo= 3.5 ≈0.042%.
Some very important aspects we need to consider now are: How trustworthy are these results? How reliable is the procedure we have employed in solving the discrete ODE given by the metamodel, considering that we used linear interpolation before ODE45 ? and, Is ODE45 a stable numerical method?
These are very relevant questions that will be considered in the next section.
9 Numerical analysis
Remember that we are interpolating the metamodel-function before solv- ing it with ODE45. The following section will show that, whilst another numerical-integration methods can let us deal directly with ODEs given in discrete form, the methods of interpolation and ODE45 are stable and provide reliable results.
Before we consider the methods oered by matlab, and assuming one wants to apply another numerical method, we will consider some important aspects regarding numerical-integration methods that we should bear in mind when solving ODEs.
As mentioned before, there are many numerical-integration methods that can allow us to deal directly with discrete-ODEs, to name but a few: the Euler method, the midt point method, the Heun's method, the Runge-kutta method and many others. A good numerical method has to provide two things: ef- ciency and accuracy and clearly not all of the methods mentioned above deliver on these fronts. In many cases, one has to decide if time-computing is more important than accuracy or vice versa.
Next, we will consider two methods to show that some methods are more reliable than others, namely the Euler method and the Runge-Kutta method.
9.1 Euler method
We can nd a very good theoretical description of the Euler method in [12]
and [13].
The Euler method is a numerical method used for solving initial value prob- lems. The method provides an approximation of the solution to the initial value problem.
For any
x0 =f(t, x) x(to) =xo
the forward euler method gives:
x(k+1)≈xk+hf(tk, xk) whereh=t(k+1)+tk
| {z }
∆t
(36)
The algorithm in (36) calculates approximate valuesx0, x1, x2, x3, x4, .., xn, ..
of the unique solutionf(x(t))at the set pointst0 < t1 < t2 < t3< ... < tn....
Notice thatf(t, x) satises: x0(t) =f(t, x(t)).
The Euler method can be applied in order to solve the ODE given by the metamodel. Indeed, we have all we need: we have the time interval given by xj, which are the points where the metamodel has been evaluated -this is∆t in (36)-, we have the initial conditionx0 =xk in and we have the function values for the metamodel, which corresponds tof(tk, xk) in (36).
The problem with this method arises under error analysis. As we can ob- serve, the Euler method is a step-by-step method, for this reason one can anti- cipate that by reducing the step-size, we get more accurate results. Broadly speaking, the smaller step-size yield the smaller the error. Such error is known in the eld of numerical analysis as truncation error. The truncation error can be measured locally or globally. The local truncation error meas- ures the error at specic points, found by taking the dierence between the exact solution and the computed solution at specic points, while the global truncation error measures the dierence between the exact solutionf(t) and the approximated solutionf(t)b . It is the latter which is of particular interest to us.
In [11], [12] and [13] we nd that the Euler method has a global truncation
error of rst order h(O). This mean that the error→ 0 linearly when the step size ∆t→ 0. This convergence of the error to zero is very slow, which implies that this method is not very accurate, although one has to acknow- ledge that it is very fast. In order to get good results one should consider a numerical method which has an error with a high convergence rate to zero, this means a higher power of h(O).
According to [13], the global truncation error of the midt point method and the Hun's method is of second order h(O)2, which implies more accurate results, but if we want accuracy one of the best methods to consider is the fourth-order Runge-Kutta method.
9.2 Fourth-order Runge-Kutta method (RK4)
Also called, classical Runge-Kutta. It is called fourth-order because its global truncation error is of fourth orderh(O)4, which is far quicker than the other methods previously mentioned.
The classical Runge-Kutta scheme
x(i+1) =xi+1
6(k1+ 2k2+ 2k3+k4)/h (37) where
k1 =f(ti, xi)
k2 =f(ti+1
2h, xi+1 2k1h) k3 =f(ti+1
2h, xi+1 2k2h) k4 =f(ti+h, xi+k3h)
Notice that thek0sare recurrently related. That is,k1 appears in the equa- tion ofk2, which appears in the equation of k3, and so forth. Because each k is a functional evaluation, this recurrence makes the RK4 method very ecient. [13] provides a formal representation of the global truncation error of the RK4 method, the most important result of this representation being that this truncation error is of fourth-orderh(O)4.
Notice thath(O)4 is achieved with four evaluations off for each integration- step. We can read in [11] that this situation does not extend to higher order RK methods. For instance, an eight-order RK method (h(O)8), may require twelve evaluations per step, which is quite a high price to pay, time-wise, and implies more function evaluations per step to achieve slightly better results.
This simply may not be worthwhile. For this reason the fourth-order RK method is considered the most popular among the Runge-Kutta methods.