The Metabolic Profile at the Crossroads of Pregnancy and Infancy
Exploratory Biomarker Research Using NMR-based Urinary Metabolomics.
Daniel Sachse
Department of Medical Biochemistry, Institute of Clinical Medicine
Faculty of Medicine, University of Oslo
Department of Medical Biochemistry, Division of Diagnostics and Intervention
Oslo University Hospital
Department of Chemistry, University of Oslo
September 2013
There are the rushing waves…
mountains of molecules,
each stupidly minding its own business…
trillions apart
…yet forming white surf in unison.
– Richard Feynman
(The Value of Science, 1955)
Preface
Together with the rise of genomics and transcriptomics as top-down, wide-focus, hypothesis-generating approaches to medical research, the early 21st century has seen an increased interest in metabolomics, i.e. broad-spectrum metabolic analysis to monitor global outcomes of the biochemical processes that define life in health and disease.
Inspired by positive reports in the scientific literature, Professor Jens Petter Berg encouraged me to establish and apply NMR-based metabolomics at the Department of Medical Biochemistry in the context of a doctoral research project. Allowing me, originally a physicist, to gather experience with NMR spectroscopy and multivariate statistics, the project would focus on the junction of pregnancy and infancy, and complications thereof. By characterizing the metabolic fingerprint in urine samples, the project aimed in particular to find markers for gestational diabetes and breastfeeding ability in pregnant women, and for pre- and postnatal growth of small, premature infants. As a whole, it would contribute to the exploration of the developmental origins of health and disease.
From the outset of the project in 2009 to its conclusion four years later, I have had the full support, both academically and personally, of my main supervisor, Jens Petter Berg. For giving me this exciting opportunity, for his valuable advice at all stages, and for arranging many inspiring meetings with colleagues on a local, national and international scale, he deserves (he has!) my utmost gratitude.
No less am I indebted to my supervisors Frode Rise and Armin Piehler. Without Frode, to put it bluntly, there would have been no project. Not only did he give me unrestricted, prioritized access to his impressively well-maintained NMR lab, but he trusted me enough to operate the spectrometer on my own at any time of the day or night. The equipment (and I) analyzed thousands of samples near-flawlessly, while Frode could always be counted on for quick trouble-shooting, a critical look at a fresh spectrum, or just a chat by the big magnet. Armin, in his own right, was the main driver in getting the project on its feet. He had already begun researching the methodology before I came into the picture, and it was in fact through him that I first made contact with the community that became my home in research. At this point, also, I would be remiss not to gratefully acknowledge the outstanding technical support our group received from Bruker engineer Eberhard Humpfer, who visited us in Oslo early on and guided us in setting up our procedures.
I would also like to thank my collaborators and co-authors from the two involved research programs, the STORK Groruddalen cohort study headed by Anne K. Jenum and the PRENU intervention study headed by Christian Drevon. From the two programs, I must particularly emphasize Sissel J. Moltu, Line Sletner and Anne Bærug, with whom I spent hours upon hours discussing details of the studies and poring over data and results.
But work is not just about the work. The time I spent between the office and the labs up at the hospital would not have been the same without my dear colleagues Camilla Stormo and Anne-Marie Siebke Trøseid, as well as Åshild Sudmann-Day, Runa Grimholt, Erik Amundsen, Reidun Øvstebø and so many more. Hans-Christian, Ole Kristoffer, Frederik. Peter, Marianne, Petter. Carola, Kari Bente, Berit. Thank you, all of you, it’s been an honor and a pleasure getting to know you, working with you, and enjoying your company on and off the job.
I am also deeply indebted to my family back in Berlin, or rather Hohen Neuendorf, Ortsteil Borgsdorf.
You supported my decision from day one, and on top of that you supported me for as long as I can remember – and then some.
Finally, my wife Lili: the very least I can say is that I owe you my gratitude, though in truth I probably owe you this whole degree. You are the reason I’m here in Oslo, you were the one who found this amazing project, and you encouraged me to work through the challenges whenever I needed it.
I love you, beeb.
Daniel Sachse
Oslo, September 2013
Contents
Preface
List of Abbreviations
List of Papers
1 Introduction
1.1 Metabolomics, Biomarkers and the Role of Clinical Chemistry
1.1.1 Historical Perspective
1.1.2 Current Technologies
1.2 The Metabolic Profile at the Crossroads of Pregnancy and Infancy
1.3 Diabetes
1.3.1 Gestational Diabetes Mellitus
1.4 Breastfeeding
1.5 Feeding Preterm and Low Birth Weight Infants
1.6 Metabolomics: Typical Methods and Workflow
1.7 About Urine
1.8 NMR Spectroscopy
1.8.1 Basic principles
1.9 Statistical Analysis
1.9.1 Univariate Tests
1.9.2 Multivariate Statistics
1.9.3 Principal Component Analysis (PCA)
1.9.4 Multiple Linear Regression (MLR) and Principal Component Regression (PCR)
1.9.5 Partial Least Squares Regression (PLS)
1.9.6 Validation of predictive models
2 Aims
3 Summary of Papers
3.1 Paper I: Pregnancy and Gestational Diabetes
3.2 Paper II: Breastfeeding
3.3 Paper III: Premature Infants
4 Discussion of the Findings
4.1 Maternal Urine Profile During and After Pregnancy
4.2 Prediction of GDM
4.3 Impact of Breastfeeding on Maternal Urine
4.4 VLBW Infant Urine Profile in First Weeks of Life
4.5 Differences and Continuity between Maternal and Infant Metabolism
5 Methodological Considerations
5.1 Study Materials
5.1.1 The STORK Groruddalen study
5.1.2 The PRENU nutritional intervention trial
5.2 Choice of Sample Material
5.3 Choice of Analytical Platform
5.4 NMR Spectroscopy
5.5 Spectral Processing
5.5.1 Baseline and Phase Correction
5.5.2 Normalization
5.5.3 Mean-Centering and Scaling
5.6 Reproducibility and Validation
5.6.1 Test Samples and Internal Quality Control
5.6.2 Validation of compound quantification
5.6.3 Biological Variation and Study Types
6 Conclusion
7 Future Perspectives
8 References
9 Papers I-III
List of Abbreviations
2h-PG Plasma glucose 2 hours after the glucose challenge of an OGTT
AGA Appropriate for gestational age
ANOVA Analysis of variance
CHC Child Health Clinic
CV Cross-validation
EBF Exclusively breastfeeding
FPG Fasting plasma glucose
GDM Gestational diabetes mellitus
GFR Glomerular filtration rate, a key parameter of kidney function
HMDB Human Metabolome Database
IADPSG International Association of the Diabetes and Pregnancy Study Groups
IFG Impaired fasting glucose
IGT Impaired glucose tolerance
MLR Multiple linear regression
MODY Maturity onset diabetes of the young, a rare autosomal dominant, monogenic form of diabetes
MRI Magnetic resonance imaging
MS Mass spectrometry
NBF Not breastfeeding
NMR Nuclear magnetic resonance
OGTT Oral glucose tolerance test, used to diagnose diabetes
PBF Partially breastfeeding, i.e. combining with e.g. formula, sugary drinks or other
PC Principal component, an axis of the new variable space calculated by a PCA
PCA Principal component analysis
PG Plasma glucose
PLS Partial least squares regression, sometimes also projection to latent structures
PLS-DA PLS discriminant analysis
PMA Postmenstrual age, a particular definition of gestational age
ppm Parts per million; in NMR, the resonance frequency (spectral position) relative to a reference
PRENU Not an abbreviation; name of an intervention study of nutrition for very preterm infants
ROC Receiver operating characteristic, visualizing binary classification performance
SGA Small for gestational age, as opposed to AGA
SIMCA Soft independent modeling of class analogy
STORK Not an abbreviation; symbolic name of a cohort study of pregnant women
SVM Support vector machine
T1DM Type 1 diabetes mellitus, characterized by failure of the pancreatic beta cells
T2DM Type 2 diabetes mellitus, with the development of insulin resistance
TCA cycle Tricarboxylic acid cycle, also: citric acid cycle
TSP Trimethylsilylpropionate-d4, an NMR reference compound
VLBW Very low birth weight (defined as < 1500 g)
WHO World Health Organization
List of Papers
I. Sachse D, Sletner L, Mørkrid K, Jenum AK, Birkeland KI, Rise F, Piehler AP, Berg JP: Metabolic Changes in Urine during and after Pregnancy in a Large, Multiethnic Population-Based Cohort Study of Gestational Diabetes. PLoS One 2012, 7(12): e52399.
II. Sachse D, Bærug A, Sletner L, Birkeland KI, Nakstad B, Jenum AK, Berg JP: Biomarkers for breastfeeding in maternal urine during and after pregnancy analyzed by NMR metabolomics in a large prospective cohort study. Submitted to the Scandinavian Journal of Clinical and Laboratory Investigation.
III. Moltu S, Sachse D, Blakstad EW, Strømmen K, Nakstad B, Almaas AN, Westerberg AC, Rønnestad A, Brække K, Veierød MB, Iversen PO, Rise F, Berg JP, Drevon CA: Profiling of urinary metabolites in very-low-birth-weight infants shows early postnatal metabolic adaptation and maturation. Submitted to Pediatric Research.
1 Introduction
1.1 Metabolomics, Biomarkers and the Role of Clinical Chemistry
1.1.1 Historical Perspective
The idea that the analysis of bodily fluids can be used to determine states of health and disease dates back to antiquity. As one of the first diseases ever described, diabetes was known to the ancient civilizations of Egypt, Greece and India as “honey urine disease” as far back as 1500 BC. By the late Middle Ages, diagnostic urine charts linking the colors, smells and tastes of urine to various medical conditions were in wide use; one example is Ulrich Pinder’s urine wheel (Fig. 1), published in his Epiphanie Medicorum in 1506 [1,2].
Figure 1: Ulrich Pinder's urine wheel links the colors, smells and tastes of urine to various medical conditions in an early form of biofluid profiling. Published in Epiphanie Medicorum, Nuremberg, 1506. Bavarian State Library, Res/4 Path. 42 a, used with permission.
As for diabetes, the distinction between the sweet-urine diabetes mellitus and the entirely separate condition of diabetes insipidus had been known since the second century, and the difference between type 1 and type 2 diabetes mellitus had been noted in the fifth century. Still, these observations were based purely on symptoms and clinical features. Even after the scientific curiosity of the Renaissance had paved the way for systematic medical discoveries, it was not until the late 18th century that the sweet-tasting substance in the urine of patients was shown to be sugar. Sugar was then also found in serum, establishing hyperglycemia as a hallmark of diabetes and glucose as its de facto biomarker.
As the links between diabetes, pancreatic dysfunction and, in the mid-19th century, glucose secretion from the liver emerged, researchers were finally closing in on the biological and biochemical mechanisms behind diabetes. The concept of health as the control of the body’s internal environment and the view of diabetes as a systemic disease stem from this time. Nonetheless, it took several more decades until the pancreatic hormone insulin was identified and finally extracted and purified in 1921 and early 1922 [3,4].
The example of diabetes illustrates that medical research and drug development depend critically on knowledge of the biochemical mechanisms of health and disease. Life, the biological function of organisms, is governed by layered, interwoven networks of biochemical reactions. Genetic information is transcribed and modulated to produce proteins, which execute the processes of metabolism. Feedback up and down the hierarchy weaves a complex web of relations. Virtually any medical condition manifests itself as malfunction or aberrant behavior somewhere in this web, and much of diagnostics and clinical pathology is about identifying which biochemical process is misbehaving.
Consequently, clinical chemistry has become ever more important in the modern era. It has been estimated that, at least in the developed world, up to 70 percent of all medical decisions are influenced by routine laboratory analysis [5]. While clinical chemistry traditionally deals with administering specific tests to measure the levels of particular substances [6], biomarker research has expanded from hypothesis-driven studies to include more and more of the data-intensive and largely hypothesis-free “omics sciences”, from genomics to metabolomics [7].
The omics sciences have been described as a cascade [8] corresponding to the hierarchy of life above: Genomics is reading the “instructions” of what can happen, transcriptomics deals with the expression of genes and what appears to happen, proteomics measures the presence of the proteins that make it happen, and finally metabolomics is observing what immediately does or did happen (Fig. 2).
Between the “top-down” and “bottom-up” approaches of biomedical research, systems biology has emerged over the course of not much more than the last decade as a “middle-out” approach [9] that combines reduction and integration of information to identify and characterize parts and explore the ways in which their interaction with one another and with the environment results in the maintenance of the entire system. Metabolomics, specifically, aims to monitor in an unbiased fashion the global metabolic phenotype composed of all the individual metabolite levels [10].
1.1.2 Current Technologies
The rise of systems biology in general and metabolomics in particular was made possible by the advent of broad profiling technologies such as nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry with the ability to generate large quantities of data, typically several thousand data points per sample, containing information on many analytes simultaneously – from dozens to thousands, depending on the sample and the profiling equipment. The availability of inexpensive computers and recent developments in multivariate statistics and extensive compound reference databases make it possible to conquer the deluge of data and turn numbers into knowledge [11].
Metabolomics has its roots in, and has developed in parallel with, chemometrics, i.e. data-driven chemical analysis [12]. Originating in plant research and toxicology, metabolomics has expanded into, among other fields, disease diagnosis, drug discovery and personalized medicine [13]. It is being applied to a wide spectrum of conditions, sample materials and organisms: from oncology to nutritional research and neonatology, from blood and urine samples to bronchoalveolar lavage and cerebrospinal fluid, and from humans to rats to cell lines in a flask [10,14–20].
Figure 2: Cascade of the omics sciences. The disciplines recapitulate the central dogma of molecular biology: DNA makes RNA makes protein… which in turn drives the metabolism.
1.2 The Metabolic Profile at the Crossroads of Pregnancy and Infancy
Few periods in life are more influential than pregnancy, childbirth and early infancy. This period is not only a huge psychological and physiological challenge to the mother, but also shapes much of the development of the child. The impact of the intrauterine environment, along with other factors, is studied under the acronym DOHaD (developmental origins of health and disease) [21]. Any effort that helps to understand the processes in these crucial months and leads to fewer complications and improved developmental trajectories will affect not one, but at least two lives at once.
The physiological adaptations of normal pregnancy are manifold [22,23], including increased metabolic rate, cardiac output and blood volume. Metabolically, serum cholesterol and phospholipids increase, and renal changes lead to increased glomerular filtration rate (GFR) and excretion of glucose, amino acids and proteins, while sodium and water are retained. Most important to the present work, pregnancy may lead to impaired glucose tolerance (IGT) if insulin secretion cannot compensate for pregnancy-associated insulin resistance, which may progress to diabetes [24].
The overall physiological demands of pregnancy may act as a transient stress test that temporarily unmasks vulnerabilities and risks to a woman's health in later life, particularly cardiovascular disease and diabetes [25].
1.3 Diabetes
The carbohydrate and fat metabolism of the human body depends strongly on the pancreatic hormone insulin, which regulates the uptake of glucose from the blood into the liver, skeletal muscle and fat tissue. Diabetes, specifically diabetes mellitus, is a metabolic disorder in which the body cannot produce enough insulin or cannot use it effectively. As a consequence, a person with diabetes experiences hyperglycemia, i.e. elevated blood glucose levels, which damages tissues over time and leads to life-threatening health complications [26].
The International Diabetes Federation distinguishes between three main types of diabetes [26]:
• Type 1 diabetes (T1DM), characterized by deficient or absent insulin production caused by an auto-immune attack on the pancreatic beta cells,
• Type 2 diabetes (T2DM) with a relative insulin deficiency and a development of insulin resistance,
• Gestational diabetes (GDM).
Besides these three, there are less common types of diabetes caused by or associated with specific conditions such as endocrine disorders or infections, as well as a set of rare monogenic forms, often referred to as maturity onset diabetes of the young (MODY) or simply monogenic diabetes [27].
Individuals whose blood glucose levels are elevated but do not (yet) meet the diagnostic criteria for diabetes may additionally be identified as “pre-diabetic”, with impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT) [2].
The most common form of diabetes is T2DM, which is considered one of the most challenging health problems of this century. Its incidence is rising rapidly worldwide, and it is projected that by 2030 more than 500 million people will suffer from diabetes. The escalating costs threaten the health care system of any nation, and complications associated with the disease are a major cause of disability, reduced quality of life, and death [26,28]. However, T2DM appears to be eminently preventable: epidemiological data suggest that nine out of ten cases may be attributed to habits and modifiable behavior, i.e. poor diet and a sedentary lifestyle. Obesity is the primary risk factor, and the risk is further increased by a family history of diabetes [29,30]. The effect of bariatric surgery, after which the symptoms of T2DM clear within days and long before the actual weight loss, provides evidence that weight and T2DM are not in a direct cause-and-effect relationship [31].
1.3.1 Gestational Diabetes Mellitus
Gestational diabetes mellitus (GDM) is defined as any degree of glucose intolerance with onset or first recognition during pregnancy [32,33], and it shares pathophysiological similarities with T2DM, even though the exact mechanisms involved are not yet known. Accordingly, the prevalence of GDM is increasing along with that of obesity and T2DM in women of reproductive age [34,35].
Several sets of diagnostic criteria for GDM exist, based on fasting plasma glucose (FPG) concentrations or plasma glucose (PG) after an oral glucose tolerance test (OGTT) [33]. In the present work, we focused on the criteria set forth by the World Health Organization (WHO) [36], defining GDM as FPG ≥ 7.0 mmol/L or PG 2 hours after the OGTT (2h-PG) ≥ 7.8 mmol/L, and on criteria based on those of the International Association of the Diabetes and Pregnancy Study Groups (IADPSG) [32], with FPG ≥ 5.1 mmol/L or 2h-PG ≥ 8.5 mmol/L. These thresholds are somewhat lower than those for T2DM [37], and most of them do not need to be confirmed by repeat testing.
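To make the two criteria sets concrete, the sketch below applies both to a single OGTT result. It is illustrative only: `OGTTResult`, `has_gdm`, `WHO_1999` and `IADPSG_FPG_2H` are hypothetical names, and the thresholds are treated as inclusive (≥), as in the published WHO and IADPSG criteria. The IADPSG 1-hour value is not included, since only fasting and 2-hour values are used above.

```python
from dataclasses import dataclass

# Thresholds in mmol/L, treated as inclusive (>=). Only the fasting and
# 2-hour values are used; the IADPSG 1-hour value is deliberately omitted.
WHO_1999 = {"fpg": 7.0, "pg_2h": 7.8}
IADPSG_FPG_2H = {"fpg": 5.1, "pg_2h": 8.5}


@dataclass
class OGTTResult:
    fpg: float    # fasting plasma glucose, mmol/L
    pg_2h: float  # plasma glucose 2 h after the glucose challenge, mmol/L


def has_gdm(result: OGTTResult, criteria: dict) -> bool:
    """Return True if either glucose value reaches its threshold."""
    return result.fpg >= criteria["fpg"] or result.pg_2h >= criteria["pg_2h"]


# Usage: one hypothetical woman, classified under both criteria sets
woman = OGTTResult(fpg=5.3, pg_2h=7.2)
print(has_gdm(woman, WHO_1999))       # False
print(has_gdm(woman, IADPSG_FPG_2H))  # True
```

The example woman (FPG 5.3 mmol/L, 2h-PG 7.2 mmol/L) counts as having GDM under the IADPSG-based criteria but not under the WHO criteria, which is one reason why prevalence estimates depend so strongly on the criteria used.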
The prevalence varies widely depending on the population studied and the diagnostic criteria. A common estimate is that between 2% and 10% of all pregnancies in developed countries are affected by GDM. One-third of women with GDM experience recurrence in subsequent pregnancies. Risk factors for recurrence are older maternal age, greater parity, and weight gain between pregnancies.
Although in most women with GDM, hyperglycemia remains moderate and resolves rapidly after delivery, the consequences are far-reaching [33]:
• The pregnancy is more likely to develop complications, both maternal (gestational hypertension and preeclampsia, need for operative delivery with its associated risks) and fetal (macrosomia and shoulder dystocia, small-for-gestational-age and preterm delivery, birth defects, even stillbirth)
• The mother is at greater risk of developing metabolic syndrome, predisposing her to coronary artery disease and T2DM with all associated complications later in life [38]. In addition, women with a history of GDM have increased risk of obesity, hypertension and subclinical atherosclerosis, leading to increased cardiovascular disease even in the absence of T2DM.
• The child is more likely to suffer immediate consequences (hypoglycemia, lung immaturity) and, crucially, has a higher lifetime risk of obesity and T2DM itself. Exposure to elevated glucose levels in utero has been linked to reduced insulin sensitivity as well as impaired pancreatic β-cell function [39].
Clearly, an intervention during pregnancy would offer the opportunity to have a substantial positive impact on the lives of both mother and child, and to break the vicious circle of diabetes begetting diabetes. In fact, it has been shown that treatment with diet and lifestyle advice and, if needed, medication can improve short-term pregnancy outcomes and reduce the risk of later T2DM [34].
This opportunity relies crucially on better biomarkers than blood glucose alone: markers that can be detected earlier and enable an intervention before hyperglycemia develops, that can monitor the condition and the intervention as it unfolds, or that can identify mothers at particular risk of complications.
Since the causal factors of T2DM and in particular GDM are not fully understood, it was one of the aims of the present metabolomics study to shed light on the metabolic footprint of this disease.
1.4 Breastfeeding
Although the causality is debated [40,41], breastfeeding is generally considered to be beneficial to the infant and the mother, and the WHO strongly recommends exclusive breastfeeding for the first six months of life [42–44]. Particularly in the developing world, breastfed infants are at a substantially lower risk of death from diarrhea and pneumonia [45,46], the two primary causes of child mortality worldwide. In the long term, adults who were breastfed as babies often have better overall cardiovascular health, as well as lower rates of overweight, obesity and type 2 diabetes [47,48]. Breastfeeding mothers return to their pre-pregnancy weight faster, experience lower rates of obesity, and have lower risks of breast and ovarian cancer later in life [49–51].
Globally, less than 40% of all infants are exclusively breastfed for the first six months of life [42].
Insufficient milk production is the most commonly stated reason for breastfeeding cessation [52,53].
Risk factors for insufficient milk production are multidimensional and include socio-economic, demographic and lifestyle factors as well as the quality of lactation counseling in health services [54], but ultimately they must all exert their effect through biological mechanisms.
The initiation of lactation can be divided into two stages: The first is the secretory differentiation, when the mammary epithelial cells develop milk secretory capacity and differentiate into lactocytes during pregnancy, and the second is the secretory activation after delivery, with abundant milk secretion and major changes in milk composition [55]. Whereas the secretory differentiation depends on lactogenic hormones such as prolactin, estrogen and progesterone, the secretory activation is triggered by the rapid fall in progesterone levels postpartum. Once breastfeeding is established, the maintenance of lactation depends on milk removal and on suckling, which provides a continuous stimulus for prolactin release [56].
Simple biomarkers to evaluate milk production and infant nutrition would be helpful in the clinical management of problems related to breastfeeding and in studies of human lactation [57]. Excretion of lactose in urine during pregnancy and lactation is well documented [58–60] and has been proposed as a biomarker of milk production [61,62]. Metabolomics has been applied to lactation research in animals and humans [63], but not with breastfeeding as the endpoint.
1.5 Feeding Preterm and Low Birth Weight Infants
The age of a newborn infant can be reported as days or weeks of life, i.e. since birth, or expressed in terms of the continuum of fetal development as gestational age. The latter is typically counted from the first day of the last menstrual period, about 14 days before fertilization, and is therefore also called postmenstrual age (PMA). The birth of a baby before 37 weeks of gestational age is defined as preterm or premature [64].
The birth weight of an infant, preterm or otherwise, can be classified in two different ways. Consider Fig. 3, based on Norwegian male births [65]:
• Birth weights below fixed thresholds of 2500, 1500 and 1000 g independent of gestational age are called low birth weight (LBW), very low birth weight (VLBW) and extremely low birth weight (ELBW), respectively.
• Relative to a predetermined weight distribution by gestational age, infants below the 10th or above the 90th percentile are called small (SGA) or large for gestational age (LGA), or appropriate for gestational age (AGA) otherwise.
Figure 3: Smoothed birthweight percentiles, male single births in Norway (1987-1998) by gestational age. A developmental trajectory between the 10th and 90th percentile is considered appropriate for gestational age (AGA).
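The two classification schemes can be expressed as a small function. This is a sketch under stated assumptions: `birth_categories` is a hypothetical helper, and the 10th and 90th percentile weights must be supplied by the caller from a reference distribution such as the one in Fig. 3; the percentile values in the usage line are made up for illustration and are not read from that figure. Because the fixed thresholds are nested, the function reports the narrowest weight category (an ELBW infant is also VLBW and LBW).

```python
def birth_categories(gest_age_weeks: float, birth_weight_g: float,
                     p10_g: float, p90_g: float) -> dict:
    """Classify a birth by gestational age and birth weight.

    p10_g and p90_g are the 10th and 90th birth-weight percentiles for the
    infant's gestational age (and sex), taken from a reference distribution.
    """
    # Fixed thresholds, independent of gestational age (narrowest class wins).
    if birth_weight_g < 1000:
        weight_class = "ELBW"
    elif birth_weight_g < 1500:
        weight_class = "VLBW"
    elif birth_weight_g < 2500:
        weight_class = "LBW"
    else:
        weight_class = "not LBW"

    # Size relative to the weight distribution at the same gestational age.
    if birth_weight_g < p10_g:
        size = "SGA"
    elif birth_weight_g > p90_g:
        size = "LGA"
    else:
        size = "AGA"

    return {"preterm": gest_age_weeks < 37, "weight": weight_class, "size": size}


# Usage with made-up percentile values (not taken from Fig. 3):
print(birth_categories(28, 1100, p10_g=850, p90_g=1400))
# {'preterm': True, 'weight': 'VLBW', 'size': 'AGA'}
```

The same VLBW weight can be AGA at one gestational age and SGA at a later one, which is exactly why the two schemes are reported separately.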
As a consequence of these definitions, preterm infants are more likely to have low birth weight, simply because the fixed thresholds do not take the gestational age at birth into account. About 13% of deliveries in the U.S. in 2006 were preterm, and the incidence of LBW and VLBW was approximately 8% and 1.5%, respectively [66]. The primary causes of VLBW are premature birth and intrauterine growth restriction (IUGR), the latter usually due to problems with the placenta, maternal health, or birth defects. Many VLBW babies with IUGR are also preterm and are thus both physically small and physiologically immature.
Preterm infants have greater morbidity than term infants during the first year of life, suffering from infections of the respiratory and gastrointestinal tracts, neurological abnormalities and continuing postnatal growth retardation [67,68]. Small stature, neurosensory impairments and educational disadvantages are known to persist into early adulthood [69] and beyond [70], and low birth weight is linked to elevated risks of cardiovascular disease, hypertension and impaired glucose regulation [71,72]. There is mounting evidence that fetal programming in utero represents a mechanism of non-genetic inheritance, even across more than one generation [73,74].
As discussed above with respect to breastfeeding and maternal gestational diabetes, postnatal nutritional support of premature infants offers an opportunity to exploit a critical epoch of growth and improve long-term health outcomes.
Biologically, the prenatal-postnatal transition is characterized by the change from a continuous supply of nutrients via the placenta to a cyclic supply of nutrients through oral feeding. It takes some time to establish the recommended dietary intake, and both term and preterm infants experience an initial weight loss. After birth weight is regained by about two to three weeks of age, it is recommended that the relative growth of the preterm infant parallel that of the fetus at the same gestational age [75]. However, estimates of the minimal energy requirements vary, and there is controversy about whether current recommendations exploit the full growth potential of preterm infants. Nutritional interventions have been devised to minimize energy deficits and improve growth rates, but there are also concerns that intensive feeding may lead to overfeeding and obesity [76] and, in turn, to metabolic syndrome [77].
Metabolomics strategies have been applied to studies of neonates, both in animal models and in humans. For example, Nissen et al. investigated IUGR in pigs close to term and found possible markers for fetal programming of subsequent T2DM [78], while Atzori et al. studied differences in urinary metabolite profiles between term and preterm human infants [79]. Metabolomics should also be useful for monitoring postnatal metabolic maturation and the infants’ response to different nutritional protocols. Yet there are currently no nutri-metabolomics studies on neonates available in the literature [80].
1.6 Metabolomics: Typical Methods and Workflow
The flowchart in Fig. 4 shows a simplified overview of the analytical pipeline, or workflow, of a classical, untargeted NMR metabolomics study. More detailed, generalized presentations can be found in articles by Xia et al. and Dettmer et al. [8,81]. Samples are run on the analytical platform, i.e. the NMR spectrometer, in an automated fashion under standardized conditions. The resulting spectra are then processed to make them readily comparable by minimizing unwanted variation. This step typically includes baseline and phase correction, peak alignment or referencing to correct shift variations, removal of e.g. solvent signals, and normalization. A data matrix is then created in which each row represents a sample and each column a variable, i.e. the signal intensity at a given spectral position. Further transformations, e.g. log-transformation and scaling or weighting of the variables, can be applied.
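A minimal sketch of the processing steps that follow baseline and phase correction (which are usually done in the spectrometer software and are omitted here) might look as follows, using NumPy. The function names are hypothetical, the excluded water window of 4.5 to 5.0 ppm is an assumption, and total-area normalization and Pareto scaling are only two of several common choices, not necessarily those used in this work.

```python
import numpy as np


def reference_to_tsp(ppm, observed_tsp_ppm):
    """Shift the ppm axis so that the TSP reference signal lies at 0.0 ppm."""
    return ppm - observed_tsp_ppm


def remove_region(X, ppm, lo, hi):
    """Drop the columns between lo and hi ppm (e.g. the residual water signal)."""
    keep = (ppm < lo) | (ppm > hi)
    return X[:, keep], ppm[keep]


def total_area_normalize(X):
    """Scale each spectrum (row) so that its intensities sum to 1."""
    return X / X.sum(axis=1, keepdims=True)


def pareto_scale(X):
    """Mean-center each variable and divide by the square root of its SD."""
    sd = X.std(axis=0, ddof=1)
    scale = np.where(sd > 0, np.sqrt(sd), 1.0)  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / scale


def preprocess(X, ppm, water=(4.5, 5.0), log_offset=None):
    """Rows of X are samples; columns are intensities at the positions in ppm."""
    X, ppm = remove_region(X, ppm, *water)
    X = total_area_normalize(X)
    if log_offset is not None:
        X = np.log(X + log_offset)  # optional log-transformation
    return pareto_scale(X), ppm
```

Pareto scaling sits between no scaling (dominated by the largest peaks) and unit-variance scaling (which inflates noise in empty regions), which is why it is a frequent default for NMR data.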
A common first step in the statistical analysis, besides simply inspecting the spectra visually, is to run a principal component analysis (PCA) to get an overview of the data set. PCA arranges the samples along mutually perpendicular axes of decreasing spectral variance, the so-called principal components, which can be visualized as scatter plots. This gives an impression of the main variations and groupings among the samples, and also reveals outliers. The processing parameters, such as the choice of normalization and scaling methods, will affect the results of the PCA.
Figure 4: Simplified flowchart of the analytical pipeline of a classical, untargeted NMR metabolomics study.
PCA is called an unsupervised method because it models the “natural” variations in the data set regardless of study endpoints. The resulting plots, on the other hand, can of course be color-coded with respect to the study objectives. If e.g. a disease classification coincides with the dominant variations in the data, then this will become apparent in the plots and one can inspect the PCA loadings to find which spectral signals contribute to the relevant principal components.
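PCA itself can be computed from the singular value decomposition of the mean-centered data matrix. The sketch below, assuming NumPy and simulated rather than real spectra, returns scores for plotting, loadings for interpretation, and the fraction of variance explained per component; the group effect placed in variable 10 is hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Principal component analysis via singular value decomposition.

    X: samples x variables (e.g. preprocessed spectra).
    Returns scores (samples x PCs), loadings (variables x PCs) and the
    fraction of total variance explained by each returned component.
    """
    Xc = X - X.mean(axis=0)  # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained


# Simulated example: 60 "spectra" with 20 variables, where the first 30
# samples have a raised signal in variable 10 (a hypothetical group effect).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
X[:30, 10] += 8.0
scores, loadings, explained = pca(X)
top_variable = int(np.argmax(np.abs(loadings[:, 0])))
print("PC1 explains", round(float(explained[0]), 2), "of the variance")
print("Most influential variable on PC1:", top_variable)
```

With a strong simulated effect, PC1 separates the two groups in the score plot, and the largest absolute PC1 loading points to variable 10, mirroring how one would trace a separation back to a spectral region.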
In most cases, however, supervised methods that actively search for a link between the spectra and the study endpoints are required. Partial least squares regression (PLS) and its classification variant, PLS discriminant analysis (PLS-DA), are popular approaches, not only yielding statistical models that can be used for prediction but also being fairly straightforward to interpret. Just as in PCA, one reads from the loadings which variables, i.e. spectral regions, are most influential in and relevant to the model. Classical multiple linear regression (MLR) or principal component regression (PCR) are other options for fitting continuous outcomes, while soft independent modeling of class analogy (SIMCA) and support vector machines (SVM) are available for classification tasks, i.e. building models that can distinguish between e.g. intervention and control group or categorical endpoints such as ethnic background. Clustering methods such as k-means, by contrast, group samples without reference to known class labels.
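To illustrate what a PLS model computes, here is a sketch of single-response PLS (PLS1) using the NIPALS algorithm, one of several equivalent ways of fitting PLS, together with PLS-DA obtained by coding two classes as 0 and 1. It assumes NumPy and simulated data; `pls1_nipals` is a hypothetical helper, not the routine of any particular software package. In practice, established implementations such as scikit-learn's `PLSRegression` or dedicated chemometrics software would be used, and predictive performance must be assessed by cross-validation rather than on the training data shown here.

```python
import numpy as np


def pls1_nipals(X, y, n_components=2):
    """PLS1 regression (one response variable) via the NIPALS algorithm.

    Returns coefficients b, the centering means, and the X weights W and
    loadings P that are inspected when interpreting the model.
    Predictions: y_hat = (X_new - x_mean) @ b + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centered data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)  # weights: direction of max covariance with y
        t = E @ w                  # scores for this component
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        c = (f @ t) / tt           # y loading
        E = E - np.outer(t, p)     # remove this component from X
        f = f - c * t              # and from y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, x_mean, y_mean, W, P


# PLS-DA sketch: code two hypothetical classes as 0/1 and threshold at 0.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 30))
y = (np.arange(50) < 25).astype(float)
X[:, 5] += 5.0 * y  # simulated class difference in variable 5
b, x_mean, y_mean, W, P = pls1_nipals(X, y)
y_hat = (X - x_mean) @ b + y_mean
accuracy = np.mean((y_hat > 0.5) == (y == 1))
print("Training accuracy:", accuracy)  # optimistic; needs proper validation
```

The first weight vector `W[:, 0]` is dominated by variable 5, which is how a PLS-DA model points back to the spectral region that separates the classes.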
Differences between classes can also be characterized variable by variable using t-tests or analysis of variance (ANOVA).
The spectral signals that are identified as relevant to the study endpoints must then be linked to the underlying metabolites. Compound identification involves comparing the respective signal positions and shapes to reference databases such as the Human Metabolome Database (HMDB) [82] and the Biological Magnetic Resonance Bank [83], or to examples in published literature [84–87]. Software packages like Chenomx (Chenomx Inc., Edmonton, AB, Canada) can simplify this task and also quantify known metabolites. In cases where the signals cannot be assigned directly, more complex statistical, NMR or mass spectrometry analyses may aid the identification [88–92].
A distinction is sometimes made between targeted and untargeted metabolomics based on when in the workflow the identification and selection of metabolites takes place. Targeted metabolomics bases its statistical analysis on pre-defined sets of metabolites while the untargeted approach works with the full width of available data, e.g. raw spectra. There are, however, no clear boundaries: A targeted study may cover so many metabolites that it is in effect no more selective than an untargeted study where the analytical platform or the sample preparation limit the range of detectable compounds.
Typical biological sample materials for metabolomics include urine, blood plasma and serum, but also e.g. cerebrospinal fluid, saliva or amniotic fluid [87,93]. Outside of human metabolomics, it is also common to analyze cell culture supernatants and lysates as well as e.g. cell, bacterial, tissue or plant extracts [94,95]. Besides the analysis of fluids, intact tissue samples can be studied using solid-state NMR [96].
1.7 About Urine
Since this thesis focuses on urine metabolomics, it may be helpful to briefly review how urine is produced in the kidneys. Consider Fig. 5: Blood from the abdominal aorta enters the kidneys via the renal artery (3) and proceeds to the glomerular capillaries of the nephrons (6), of which each kidney contains approximately one million. The nephrons reach from the renal cortex (1) into the medulla (2); the blood eventually leaves the kidney via the renal vein (4). The three basic processes that take place in the nephrons are filtration, reabsorption and secretion: The blood that enters through the afferent arteriole (7) is first filtered in the capillaries of the glomerulus (8). The filtrate (pre-urine) is collected in Bowman's capsule (9) and passes through the tubuli and the loop of Henle (10), which descends into the renal medulla. There, the filtrate again meets the blood, which has traveled from the glomerulus via the efferent arteriole (11) into the capillaries (12) surrounding the tubuli and the loop of Henle. Another exchange takes place, consisting of both reabsorption from the filtrate into the blood and secretion from the blood into the pre-urine. This adjusts the sodium and potassium content, salinity, volume and pH of the urine before it leaves the kidneys and travels via the ureter (5) to the bladder.
Figure 5: Schematics of kidney and nephron. 1: Renal cortex. 2: Medulla. 3: Renal artery. 4: Renal vein. 5: Ureter. 6: Nephrons. 7: Afferent arteriole. 8: Glomerulus. 9: Bowman's capsule. 10: Tubuli and loop of Henle. 11: Efferent arteriole. 12: Peritubular capillaries. (CC-BY-SA-3.0, Daniel Sachse, Wikimedia Commons)
The human kidneys filter the entire blood plasma volume about 60 times per day, generating ca. 180 L of filtrate (pre-urine) which, after reabsorption, produces ca. 2 L of urine. The primary indicator of kidney function is the glomerular filtration rate (GFR), i.e. the volume of filtrate produced per unit of time, disregarding subsequent reabsorption or secretion in the tubuli. GFR depends on the net pressure across the semipermeable membrane between the glomerular capillaries and Bowman's capsule, and on a filtration coefficient that incorporates the size and permeability of the membrane.
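As a rough plausibility check of these figures, the sketch below computes GFR as the product of the filtration coefficient and the net filtration pressure. The pressures and the coefficient of about 12.5 mL/min per mmHg are assumed, typical textbook estimates chosen for illustration, not measurements from this work.

```python
def net_filtration_pressure(p_glomerular, p_bowman, oncotic_glomerular):
    """Net pressure (mmHg) driving fluid out of the glomerular capillaries."""
    return p_glomerular - p_bowman - oncotic_glomerular

def gfr_ml_per_min(k_f, net_pressure):
    """GFR = filtration coefficient (mL/min per mmHg) x net filtration pressure (mmHg)."""
    return k_f * net_pressure

# Assumed textbook-style values (mmHg), not data from this thesis.
nfp = net_filtration_pressure(p_glomerular=60.0, p_bowman=18.0, oncotic_glomerular=32.0)
gfr = gfr_ml_per_min(k_f=12.5, net_pressure=nfp)
litres_per_day = gfr * 60 * 24 / 1000
print(f"net pressure {nfp} mmHg, GFR {gfr} mL/min, {litres_per_day} L/day")
```

With these assumed inputs the daily filtrate volume comes out at about 180 L, in line with the figure quoted above.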
Since all endogenous substances are subject to varying degrees of tubular reabsorption or secretion, none of them perfectly reflects the true GFR. However, the renal clearance of creatinine correlates well enough with GFR to allow reasonable estimates, even though some tubular secretion occurs and inflates the results. Creatinine is a metabolite of creatine and creatine phosphate that is produced in muscle tissue at a constant rate and, in a dynamic equilibrium, is excreted at the same rate as it is produced. In urine analysis, the creatinine concentration is therefore frequently used as a normalization reference in order to minimize dilution artifacts.
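The two uses of creatinine mentioned here reduce to simple arithmetic: clearance is C = (U × V) / (P × t), and normalization divides each urinary concentration by the creatinine concentration of the same sample. The function names and all input values below are hypothetical, chosen only to show the calculation.

```python
def creatinine_clearance_ml_min(urine_cr_mM, urine_volume_L, plasma_cr_mM, collection_hours):
    """Renal clearance C = (U x V) / (P x t), returned in mL/min."""
    excreted_mmol = urine_cr_mM * urine_volume_L             # U x V
    plasma_cleared_L = excreted_mmol / plasma_cr_mM          # divided by P
    return plasma_cleared_L * 1000.0 / (collection_hours * 60.0)

def per_creatinine(metabolite_mM, creatinine_mM):
    """Urinary concentration relative to creatinine, in mmol/mol creatinine."""
    return 1000.0 * metabolite_mM / creatinine_mM

# Hypothetical 24 h collection: urine creatinine 10 mM, 1.8 L, plasma creatinine 90 µM.
clearance = creatinine_clearance_ml_min(10.0, 1.8, 0.090, 24)
print(f"creatinine clearance ≈ {clearance:.0f} mL/min")

# The same hypothetical excretion in dilute and in three-fold concentrated urine:
dilute = per_creatinine(metabolite_mM=2.0, creatinine_mM=4.0)
concentrated = per_creatinine(metabolite_mM=6.0, creatinine_mM=12.0)
print(dilute, concentrated)   # identical after normalization
```

The second example shows why the ratio is useful: a three-fold difference in urine dilution changes both raw concentrations but cancels in the creatinine-normalized value.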
In summary, urine analysis by metabolomics or other means provides a view of the patient’s metabolism, but it does so through the lens of the kidneys.
1.8 NMR Spectroscopy
Since the discovery of nuclear magnetic resonance in 1946 [97], NMR spectroscopy has become a precise and reliable tool for analytical chemistry, biochemistry and many related fields. Today, NMR spectroscopy finds application in basic and applied research in physics, chemistry and material science, and has become an invaluable tool in molecular structure elucidation, particularly of proteins and other biological compounds. In the medical context, NMR is most notable as the principle behind magnetic resonance imaging (MRI) and all the diagnostics and research modalities derived from it [98].
In metabolomics, one predominantly exploits the fact that many small metabolites possess a pattern, or fingerprint, of distinct and often unique proton resonance frequencies. Each constituent of a biofluid sample simply adds its own resonances to the overall NMR spectrum, which is therefore a superposition of the individual fingerprints. In principle, the presence and concentration of each constituent can thus be determined from the spectrum, a principle that also lies at the heart of chemometric mixture analysis and reaction monitoring.
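A hypothetical illustration of the superposition principle: three made-up compounds are given Lorentzian reference spectra on a synthetic ppm axis, mixed at known concentrations with added noise, and the concentrations are recovered by ordinary least squares. Real urine spectra need far more care (pH-dependent peak shifts, overlapping multiplets, baselines), so this is only a sketch of the idea behind mixture fitting, not the procedure of any particular software package.

```python
import numpy as np

ppm = np.linspace(0.0, 5.0, 2001)           # synthetic chemical shift axis

def lorentzian(x, center, hwhm=0.005):
    """Unit-height Lorentzian line at `center` ppm with half width at half maximum `hwhm`."""
    return 1.0 / (1.0 + ((x - center) / hwhm) ** 2)

# Reference spectra of three hypothetical compounds at unit concentration
# (peak positions are arbitrary and for illustration only).
references = np.column_stack([
    lorentzian(ppm, 1.33) + lorentzian(ppm, 4.11),   # compound A: two signals
    lorentzian(ppm, 1.48),                            # compound B: one signal
    lorentzian(ppm, 3.03) + lorentzian(ppm, 4.05),   # compound C: two signals
])

true_conc = np.array([2.0, 0.5, 1.2])
rng = np.random.default_rng(0)
# The mixture spectrum is the concentration-weighted sum of the fingerprints, plus noise.
mixture = references @ true_conc + rng.normal(0.0, 0.01, ppm.size)

# Recover the concentrations by linear least squares against the reference spectra.
est_conc, *_ = np.linalg.lstsq(references, mixture, rcond=None)
print("estimated:", np.round(est_conc, 3), "true:", true_conc)
```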
1.8.1 Basic Principles
Figure 6: Principles of NMR. (A) The intrinsic, microscopic magnetic moments of atomic nuclei are called spins. (B) The spins align themselves along or against an external magnetic field. (C) The alignment leads to split potential energy levels, depending linearly on field strength. The energy difference corresponds to an electromagnetic resonance frequency. (D) In a given molecule, the electrons shield the nuclei to a varying degree from the external field, causing chemical shift of the resonance of a given group. Neighboring nuclei cause further splitting of the resonances. (E) The resulting spectrum of a molecule is a unique fingerprint. (F) The spectrum of a biological sample is the superposition of the fingerprints of its constituents.
The theory of NMR is highly developed, and the dynamics of nuclear spin systems are well understood [11,99–101]. Atomic nuclei with unpaired protons or neutrons possess a property called spin that can be thought of as a small magnetic moment. In a macroscopic sample, these spins point isotropically, i.e. randomly, in all spatial directions, as shown in Fig. 6(A). However, when exposed to a static magnetic field B0, they align themselves either with or against the external field, see panel (B). For spin-1/2 nuclei such as hydrogen, the parallel and antiparallel states are called α and β, respectively, and their orientation with respect to the external field gives them different potential energy levels. As shown in panel (C), the difference between these levels, and thus the electromagnetic frequency f that induces transitions between them (the resonance), depends linearly on the strength of B0:
$$\Delta E = h\gamma B_0 = hf, \qquad f = \gamma B_0$$
where h is Planck's constant and γ is the gyromagnetic ratio, a constant that depends on the particular nucleus (given in MHz/T in the table below).
The table below shows the properties of some of the most important nuclei for NMR.
Nuclide | Unpaired protons | Unpaired neutrons | Net spin | γ (MHz/T) | Natural abundance
--- | --- | --- | --- | --- | ---
¹H | 1 | 0 | 1/2 | 42.58 | 99.985 %
²H | 1 | 1 | 1 | 6.54 | 0.015 %
³¹P | 1 | 0 | 1/2 | 17.25 | 100 %
¹⁴N | 1 | 1 | 1 | 3.08 | 99.63 %
¹⁵N | 1 | 0 | 1/2 | 4.31 | 0.37 %
¹³C | 0 | 1 | 1/2 | 10.71 | 1.11 %
¹⁹F | 1 | 0 | 1/2 | 40.08 | 100 %
Because of its presence in virtually all biological substances and its high natural isotopic abundance, hydrogen is the most commonly used nucleus. In fact, NMR spectrometers are usually not classified by their field strength in tesla (e.g. 14.1 T), but by their corresponding hydrogen resonance frequency (600 MHz). If all hydrogen nuclei resonated at one and the same frequency, however, the technique would be useless for metabolic profiling: every sample would produce the same single peak, differing at most in height.
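The relation f = γB0 can be checked directly against the table: a 14.1 T magnet gives a hydrogen resonance of about 600 MHz. The sketch below uses the γ values from the table (magnitudes only; ¹⁵N actually has a negative γ) and the exact SI value of Planck's constant; the function names are illustrative.

```python
GAMMA_MHZ_PER_T = {  # |gamma| in MHz/T, copied from the table above
    "1H": 42.58, "2H": 6.54, "31P": 17.25, "14N": 3.08,
    "15N": 4.31, "13C": 10.71, "19F": 40.08,
}
PLANCK_J_S = 6.62607015e-34  # exact SI value

def resonance_frequency_mhz(nucleus, b0_tesla):
    """Resonance (Larmor) frequency f = gamma * B0, in MHz."""
    return GAMMA_MHZ_PER_T[nucleus] * b0_tesla

def energy_gap_joule(nucleus, b0_tesla):
    """Energy difference between the alpha and beta states, Delta E = h * f."""
    return PLANCK_J_S * resonance_frequency_mhz(nucleus, b0_tesla) * 1e6

f_h = resonance_frequency_mhz("1H", 14.1)
print(f"1H at 14.1 T: {f_h:.1f} MHz, Delta E = {energy_gap_joule('1H', 14.1):.2e} J")
print(f"13C at 14.1 T: {resonance_frequency_mhz('13C', 14.1):.1f} MHz")
```

The tiny energy gap, of the order of 10⁻²⁵ J, is one reason why NMR is comparatively insensitive and benefits from high field strengths.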
It turns out that hydrogen atoms bound to different positions in different molecules experience slightly different local magnetic fields. Consider Fig. 6(D): The hydrogen nuclei of e.g. ethanol are surrounded by their electrons, which act as a shield and diminish the effective magnetic field at the location of the nucleus. However, the presence of the carbon and oxygen atoms and the structure of the molecule distort this protective cloud, such that the green group of hydrogens is shielded less than the red group, experiencing a stronger effective field. Therefore, the green resonance is at a higher frequency than the red. (It is a convention in NMR to arrange the resonance axis from high to low frequencies.) This process and the relative position of a given hydrogen group on the frequency axis are called chemical shift. Chemical shifts are typically reported as a dimensionless quantity, namely the relative distance from a reference signal in parts per million (ppm) of that signal’s frequency. By definition, the ppm scale is independent of the applied outer magnetic field strength.
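The field independence of the ppm scale can be shown with a short calculation. The sketch assumes, for simplicity, that the reference frequency equals the nominal spectrometer frequency; the helper names are illustrative.

```python
def chemical_shift_ppm(f_hz, f_ref_hz):
    """Chemical shift delta = (f - f_ref) / f_ref * 1e6, in parts per million."""
    return (f_hz - f_ref_hz) / f_ref_hz * 1e6

def shift_to_hz(delta_ppm, spectrometer_mhz):
    """Frequency offset from the reference in Hz: 1 ppm of X MHz is X Hz."""
    return delta_ppm * spectrometer_mhz

# A signal 2400 Hz above the reference on a 600 MHz instrument ...
delta = chemical_shift_ppm(600e6 + 2400.0, 600e6)
# ... has the same ppm value on a 400 MHz instrument, but lies only 1600 Hz from the reference.
print(round(delta, 3), "ppm;", round(shift_to_hz(delta, 400.0), 1), "Hz at 400 MHz")
```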
In water, the blue hydroxyl group exchanges its proton with those of the solvent. Because this happens on the same time scale as the NMR acquisition, its signal is lost. The signals that are not lost, however, have another useful property: Their intensity, or rather the area under their resonance peaks, is proportional to the number of contributing protons and to the concentration of the compound, making NMR highly quantitative.
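Because peak area scales with both proton count and concentration, an unknown concentration follows from the ratio to a reference compound of known concentration: C = C_ref × (I / N) / (I_ref / N_ref). The sketch assumes fully relaxed spectra and non-overlapping signals; the reference concentration and both integrals are hypothetical. TSP is used as the example reference because its trimethylsilyl group contributes nine equivalent protons.

```python
def concentration_mM(integral, n_protons, ref_integral, ref_n_protons, ref_conc_mM):
    """Concentration from peak areas: C = C_ref * (I / N) / (I_ref / N_ref)."""
    return ref_conc_mM * (integral / n_protons) / (ref_integral / ref_n_protons)

# Hypothetical numbers: TSP (9 equivalent protons) at an assumed 0.5 mM integrates to 3.0;
# a CH3 signal (3 protons) of an unknown metabolite integrates to 2.0.
c = concentration_mM(integral=2.0, n_protons=3, ref_integral=3.0,
                     ref_n_protons=9, ref_conc_mM=0.5)
print(f"{c:.2f} mM")
```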
Furthermore, neighboring hydrogen groups influence each other in an interaction called J-coupling:
The red group can find the two neighboring green hydrogen atoms with equal probability in the configurations α-α, α-β, β-α and β-β. Their spins add to or subtract from the local magnetic field; since α-β and β-α have the same net effect, the red resonance splits into a pattern of three lines with intensity ratio 1:2:1, a triplet. Conversely, the three red hydrogen spins split the green resonance into a quartet (1:3:3:1). Together, the chemical shift and the splitting pattern give many metabolites a unique fingerprint, see Fig. 6(E), that can be identified and quantified in the complex mixture spectrum of a biological sample like urine, shown in Fig. 6(F).
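The triplet and quartet patterns follow from counting configurations. The brute-force sketch below enumerates all α/β states of n equivalent spin-1/2 neighbors and counts how many give each net spin. It assumes equal coupling to all neighbors and equal α/β probabilities, as in the ethanol example.

```python
from collections import Counter
from itertools import product

def multiplet(n_neighbors):
    """Relative line intensities produced by n equivalent spin-1/2 neighbors.

    Each neighbor is alpha (+1/2) or beta (-1/2). Configurations with the same
    total spin shift the resonance equally and therefore add up to one line.
    """
    totals = Counter(sum(spins) for spins in product((0.5, -0.5), repeat=n_neighbors))
    return [totals[m] for m in sorted(totals, reverse=True)]

print(multiplet(2))  # CH3 next to a CH2: [1, 2, 1], a triplet
print(multiplet(3))  # CH2 next to a CH3: [1, 3, 3, 1], a quartet
```

The resulting intensities are the binomial coefficients (rows of Pascal's triangle), which is the familiar n+1 rule for first-order spectra.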
1.9 Statistical Analysis
After preprocessing the spectral data, metabolomics studies often use a combination of classical, univariate tests and multivariate statistics to investigate the relations between the spectra and the study endpoints [102].
1.9.1 Univariate Tests
Since NMR spectra in practice are nothing more than a large set of numerical variables, the full range of classical statistics can in principle be applied. These include t-tests and ANOVA between spectral variables and categorical endpoints, and Pearson correlations with continuous endpoints. The variables may also be log-transformed, for example, to approximate a normal distribution, or non-parametric alternatives may be used. However, the large number of variables increases the risk of false positives, and an appropriate correction for multiple testing may not always be straightforward due to the often highly intercorrelated internal structure of the data. A related problem affects simple and multiple linear regression, as will be discussed below.
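One widely used correction for multiple testing is the Benjamini-Hochberg procedure, which controls the false discovery rate rather than the family-wise error rate. It is shown here only as an illustration and is not taken from this thesis. Its guarantee assumes independent or positively dependent tests, and strongly intercorrelated spectral variables can make that assumption questionable. The p-values below are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of discoveries controlling the false discovery rate at `alpha`."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m       # i * alpha / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()                 # largest rank passing its threshold
        reject[order[: k + 1]] = True                  # reject it and all smaller p-values
    return reject

p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_vals))
```

Here only the two smallest p-values survive. Five tests fall below the uncorrected 0.05 level, and a Bonferroni threshold of 0.005 would keep only the first.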
In the current work these classical methods were not used directly on spectral data, but rather on quantitated metabolite levels or concentrations. Even though they sometimes are problematic, these methods have the advantage of being well-established and understood, familiar to many researchers particularly in interdisciplinary settings, and communicating results with a definite, intuitive clarity.
They do, however, not take advantage of the full multivariate width of a metabolomics data set.
1.9.2 Multivariate Statistics
Whenever an object, i.e. a patient or a sample, is described by more than one variable, i.e. feature or parameter, it can be helpful not only to analyze each variable on its own, but to combine and integrate them and exploit their internal relationships. This is multivariate analysis [103,104].
Consider the simulated example in Fig. 7 as a motivation. Panel (A) shows a scatter plot of two variables for one hundred objects that are supposed to belong to two classes, e.g. cases and controls in a clinical study. Even though the classes clearly are separated, the objects are distributed in such a way that neither of the variables alone can separate them perfectly, as illustrated by the overlapping histogram-like densities along variable 1 shown in panel (B), and the corresponding receiver operating characteristic (ROC) curve in panel (C). By integrating the two variables in a multivariate analysis, however, perfect separation can be achieved. A PCA scores plot is shown in panel (D), incorporating the difference between the two classes as the main variation in the first principal component. Panel (E) shows the prediction results of 50 cross-validated PLS regression runs, producing the perfect classification shown in panel (F).
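The effect can be reproduced with a few lines of simulation. This is not the simulation behind Fig. 7 but a hypothetical data set built along the same lines. Both classes spread widely along variable 1, and class B is offset by +1 in variable 2 relative to the diagonal. A single cut-off on either variable cannot separate the classes, but the linear combination x2 − x1 can. Here the combination is chosen by hand, whereas PCA or PLS would find such directions from the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50  # objects per class

# Hypothetical two-class data with bounded noise (+/- 0.2) around two parallel lines.
x1_a, x1_b = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
X_a = np.column_stack([x1_a, x1_a + rng.uniform(-0.2, 0.2, n)])
X_b = np.column_stack([x1_b, x1_b + 1.0 + rng.uniform(-0.2, 0.2, n)])
X = np.vstack([X_a, X_b])
y = np.r_[np.zeros(n, dtype=bool), np.ones(n, dtype=bool)]

def best_threshold_accuracy(values, labels):
    """Best accuracy achievable with one cut-off on a single score, in either direction."""
    best = 0.0
    for t in np.unique(values):
        acc = np.mean((values > t) == labels)
        best = max(best, acc, 1.0 - acc)
    return best

print("variable 1 alone:", best_threshold_accuracy(X[:, 0], y))
print("variable 2 alone:", best_threshold_accuracy(X[:, 1], y))
combo = X @ np.array([-1.0, 1.0])          # x2 - x1
print("x2 - x1:         ", best_threshold_accuracy(combo, y))
```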
The data in multivariate analysis is arranged in a matrix X with n rows representing the study subjects or samples and m columns representing their features (Fig. 8). In the present work the columns contain spectral intensities at a given chemical shift or concentrations of a given metabolite. Each row is therefore the profile of one sample. Outcomes or end points, such as blood glucose levels or diabetes status, are collected in a corresponding matrix Y with the same n rows for the respective samples and k columns for the end points. In a classical sense, X are the independent and Y the dependent variables.
Figure 7: Simulated example highlighting the utility of multivariate approaches. Top row: Univariate classification along e.g. Variable 1 is suboptimal due to overlap. Bottom row: Multivariate analysis produces perfect class separation. Details in text.
It is not uncommon for Y to contain only k=1 column if only one end point is studied at a time.
Categorical end points, on the other hand, are usually transformed (recoded) into dummy columns, one less than the number of categories, that only contain binary representations of class membership. An example is shown in Fig. 8(B).
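The recoding in Fig. 8(B) can be sketched as follows. The first category (E) is taken as the default and encoded as a row of zeros, so four categories need three binary columns. The function name is illustrative. Some software instead uses one column per category, which is a design choice of the respective implementation.

```python
import numpy as np

def dummy_matrix(labels, categories):
    """Encode categorical labels as (number of categories - 1) binary columns.

    The first entry of `categories` is the default and is represented by all zeros.
    """
    labels = np.asarray(labels)
    return np.column_stack([(labels == c).astype(int) for c in categories[1:]])

Y = dummy_matrix(["E", "F", "G", "H", "F"], categories=["E", "F", "G", "H"])
print(Y)
```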
In terms of statistical methods, one then distinguishes between unsupervised and supervised approaches. Unsupervised methods model the X matrix as it is, whereas supervised methods focus on the relationship between X and Y, typically with the aim of producing a predictive model of the end points Y based on the input data X. Naturally, unsupervised analyses of X can also lead to discoveries relevant to Y, as in the motivating example above: the scatter plot in Fig. 7(A) relies entirely on the numerical values of the X variables, but it reveals a structure in the data that happens to be significant for the class membership stored in Y and visualized purely by coloring.
Figure 8: Data matrices in multivariate analysis. (A) The X-matrix contains m measurements for each of the n samples. These are related to k end points for each of the n samples, stored in the Y-matrix; the end points can be continuous numeric values, binary categories or multiple categories. Unsupervised methods model X alone, while supervised methods model the relationship between X and Y. (B) Construction of a dummy matrix from a categorical end point with four categories E, F, G and H. With E considered the default, three columns are required to encode F, G and H.
1.9.3 Principal Component Analysis (PCA)
Statistical analysis begins and ends with good visualization. However, in the multivariate case an exhaustive presentation of the data set is often difficult. Principal Component Analysis is an unsupervised method that offers a possibility to detect and survey the structure of a data set by degree of variation.
Like in a scatter plot, the approach is based on interpreting the numeric values of variables as coordinates in an abstract coordinate system as illustrated in Fig. 9(A). The method then finds new orthogonal coordinates, called principal components (PCs), as linear combinations of the original variables, such that the first PC is aligned with the direction of strongest overall variation in the data set, the second PC along the strongest remaining variation thereafter, and so forth. The projections of the data points onto the new axes become their new coordinates, the PCA scores, while the coefficients of the linear combination, the PCA loadings, describe the contribution each original variable gives to the new axes.
Figure 9: (A) Illustration of the elements of PCA. Mean-centered scatter plot of two variables of the original data matrix X. A new axis, PC1, is defined along the direction of strongest variation. The projection of a given object onto PC1 is its score, the remaining distance from PC1 its residual. The correspondence between PC1 and the original variables is called the loading. (B) Representation of the matrix algebra of PCA. The original matrix X is represented by pairs of scores t and loadings pᵀ, plus the residuals in E.
Algebraically, this means that the original data matrix X is represented by the pairs of scores and loadings along the PCs plus a residual matrix E, see Fig. 9(B). By definition, the more PCs are incorporated into the PCA model, the more of the original variation is included. The first PCs will contain the most prominent structures of the data, while higher-order PCs will to an increasing degree capture noise. By focusing on and plotting the first, i.e. most influential, PCs instead of scatter plots of all original variables and their combinations, PCA allows one to quickly and efficiently get a first impression of the structure of the data set, and to detect outliers and special cases, sample clusters, and internal correlations.
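The decomposition X = TPᵀ + E can be computed from the singular value decomposition of the mean-centered data. The sketch below is a minimal version of that route. The data are hypothetical: 30 random "spectra" with a second cluster built into the first 50 variables, which the first PC picks up. The signs of scores and loadings are arbitrary in PCA.

```python
import numpy as np

def pca(X, n_components):
    """PCA via the singular value decomposition of the mean-centered data matrix.

    Returns scores T (n x A), loadings P (m x A), residual matrix E (n x m)
    and the fraction of total variance explained by each of the A components.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores: projections onto the PCs
    P = Vt[:n_components].T                        # loadings: unit-length PC directions
    E = Xc - T @ P.T                               # variation the A components leave unexplained
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, explained

# Hypothetical data: 30 samples x 200 variables; the last 15 samples are
# shifted upwards in the first 50 variables and form a second cluster.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
X[15:, :50] += 2.0

T, P, E, explained = pca(X, n_components=3)
print("explained variance per PC:", np.round(explained, 3))
print("PC1 scores, group 1:", np.round(T[:15, 0], 1))
print("PC1 scores, group 2:", np.round(T[15:, 0], 1))
```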
The structure of a data set is determined by how similar or dissimilar the samples are with respect to the numeric values of their variables. Similar samples are represented by similar coordinates in both the original and the new axes, and will show up in similar regions of the scores plots, whereas larger differences result in larger distances between the samples. Groups of similar samples will consequently show as clusters in the plots. By the same logic, variables that behave similarly across samples, i.e. are correlated, will be assigned similar coefficients, i.e. loadings.
1.9.4 Multiple Linear Regression (MLR) and Principal Component Regression (PCR)
Supervised methods model the relation between X and Y. The most immediate choice for linking end points Y to multivariate data X is multiple linear regression (MLR). The model for a single (k=1) end point takes the form
$$y_i = c_1 x_{i1} + \dots + c_m x_{im} + \varepsilon_i, \qquad i = 1, \dots, n$$
for m spectral variables and n available samples with corresponding end points. The coefficients c1 through cm constitute the solution of this system of equations and describe the relation between X and Y; the error terms εi contain the residuals.
Several constraints, however, make MLR impractical for metabolomics. Most importantly, the model is mathematically underdetermined and cannot be solved uniquely when the number of samples n is smaller than the number of predictors m. Metabolomics studies typically deal with thousands of spectral intensities but rarely more than a few dozen or a hundred samples; even ambitious research projects cannot collect as many samples as modern analytical platforms deliver spectral variables. Another issue is that of multicollinearity in the predictor matrix X: When two or more input variables are correlated, their regression coefficients can cancel each other's contributions and become unstable. While the resulting model may still predict end points correctly, its coefficients can no longer be interpreted in terms of which metabolites contribute, and how. This is a problem for biomedical research in general, where metabolites are naturally correlated, but especially for NMR spectroscopy, where most compounds have multiple contributions to the spectrum. In other words, multicollinearity cannot be avoided in metabolomics. Lastly, MLR can only model a single Y variable at a time, which can be cumbersome, especially for classification with dummy matrices.
An approach that solves two of these three issues is principal component regression (PCR). In essence, one performs a multiple linear regression on a PCA model instead of the original X matrix.
There can be no collinearity because the principal components are by definition orthogonal, and there are never more principal components than samples, meaning the regression model is always well-defined. However, since PCR still is MLR at heart, it can also only handle one Y variable.
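The sketch below illustrates both points on hypothetical data with fewer samples than variables. First, the matrix XᵀX that MLR must invert is rank-deficient. Second, PCR regresses y on a handful of orthogonal PCA scores instead, which is always well defined. Because the score columns are orthogonal, the least-squares fit reduces to one independent regression per column. The components are chosen by X variance, not by relevance to y, which is one motivation for PLS.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 20, 100                                  # fewer samples than variables
X = rng.normal(size=(n, m))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)   # hypothetical end point
Xc, yc = X - X.mean(axis=0), y - y.mean()

# MLR needs the inverse of X^T X, but with n < m this m x m matrix has rank n - 1 at most.
rank = np.linalg.matrix_rank(Xc.T @ Xc)
print(f"rank of X^T X: {rank} of {m}")

def pcr_coefficients(Xc, yc, n_components):
    """Principal component regression on mean-centered data."""
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]      # PCA scores, mutually orthogonal
    b = (T.T @ yc) / np.sum(T ** 2, axis=0)         # least squares, column by column
    return Vt[:n_components].T @ b                  # map back to the original variables

r2_by_components = {}
for a in (1, 2, 5, 10):
    coef = pcr_coefficients(Xc, yc, a)
    r2_by_components[a] = 1 - np.sum((Xc @ coef - yc) ** 2) / np.sum(yc ** 2)
    print(f"{a:2d} PCs: R2 = {r2_by_components[a]:.3f}")
```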
1.9.5 Partial Least Squares Regression (PLS)
PLS, also called “Projection on Latent Structures”, can be seen as a supervised extension of PCA.
Here, both the data matrix X and the end point matrix Y are modeled using two respective sets of scores, loadings and residual matrices.
$$X = TP^T + E, \qquad Y = UQ^T + F, \qquad U \approx TB$$
Crucially, the scores U of the Y matrix are rewritten as a regression on the scores T of the X matrix, linking input and end points together.
As before, the more components one includes, the more variation (of X and Y simultaneously) the model captures. The first components will capture the clearest, most systematic relations between X and Y, while later ones will increasingly model the irrelevant peculiarities of the particular data set. For the purpose of the regression, i.e. describing the end points Y through the data X, the most relevant diagnostic parameter is the explained Y variation R² (or more precisely R²Y):
$$R^2 = 1 - \frac{\sum (Y_\text{expl} - Y)^2}{\sum Y^2}$$
where Yexpl are the approximations of the original end points Y captured in the Y-scores and Y-loadings, and Y is mean-centered. The more components are included, the smaller the residual deviations become, and the closer R² approaches 1.
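To make the behavior of R² concrete, the following is a minimal PLS1 (single end point) implementation of the NIPALS algorithm. It is a textbook-style sketch, not the software used in this thesis. It is deliberately applied to pure noise: 20 hypothetical samples with 300 random variables and an end point unrelated to them. R² still climbs towards 1 as components are added, so R² alone says little about predictive ability.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (one end point) via NIPALS; X and y must be mean-centered.

    Returns coefficients b such that the fitted y equals X @ b.
    """
    X, y = X.astype(float), y.astype(float)   # astype returns copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)            # X-weights
        t = X @ w                         # X-scores
        tt = t @ t
        p = X.T @ t / tt                  # X-loadings
        c = (y @ t) / tt                  # y-loading (scalar for PLS1)
        X = X - np.outer(t, p)            # deflation: remove what this component explains
        y = y - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))   # b = W (P^T W)^-1 q

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 300))     # hypothetical: 20 samples, 300 variables
y = rng.normal(size=20)            # end point unrelated to X by construction
Xc, yc = X - X.mean(axis=0), y - y.mean()

r2 = {}
for a in (1, 2, 5, 10):
    b = pls1_nipals(Xc, yc, a)
    r2[a] = 1 - np.sum((Xc @ b - yc) ** 2) / np.sum(yc ** 2)
    print(f"{a:2d} components: R2Y = {r2[a]:.4f}")
```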
Thus far, PLS links continuous, numerical end points Y to a data matrix X. Where end points are binary, such as the presence or absence of a disease, they can simply be represented as numerical codes, e.g. 0 and 1. PLS can then predict these codes with a certain margin of error and, using a cut-off, discriminate between the original binary end points. This turns PLS into PLS-DA (PLS discriminant analysis). The approach can be extended to multiple categories by rewriting them as a dummy matrix of binaries, as explained earlier.
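A sketch of PLS-DA on hypothetical two-class data, repeating a minimal NIPALS PLS1 so that the block runs on its own. The classes are coded 0 and 1, PLS predicts a continuous value, and a cut-off of 0.5, halfway between the codes, assigns the class. The accuracy printed here is computed on the training data and is therefore optimistic; an honest estimate requires validation.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal PLS1 via NIPALS on mean-centered X, y; returns b with y_fit = X @ b."""
    X, y = X.astype(float), y.astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X, y = X - np.outer(t, p), y - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

rng = np.random.default_rng(4)
n_per_class, m = 20, 100
labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]   # 0 = control, 1 = case
X = rng.normal(size=(2 * n_per_class, m))
X[labels == 1, :5] += 3.0          # hypothetical: cases differ in the first five variables

x_mean, y_mean = X.mean(axis=0), labels.mean()
b = pls1_nipals(X - x_mean, labels - y_mean, n_components=1)
y_pred = (X - x_mean) @ b + y_mean                 # continuous predictions of the 0/1 code
predicted_class = (y_pred > 0.5).astype(float)     # cut-off halfway between the codes
accuracy = np.mean(predicted_class == labels)
print("training accuracy:", accuracy)
```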
1.9.6 Validation of predictive models
Validation means testing the predictive power of a statistical model according to given specifications on data that have not been used in the development of the model. The objective and importance of validation are two-fold: In general, validation gives an estimate of the performance and robustness, in other words the significance, of an analysis, which of course is crucial for the discussion of findings in a study. But even before final results are presented and evaluated, validation is used as a guide to optimize and fine-tune the relevant parameters of a statistical method, such as scaling or the number of components.
The latter is particularly crucial for flexible methods such as PLS that are prone to overfitting: Given enough components and (as is often the case in metabolomics) more variables than objects in the X matrix, PLS will always produce a model that perfectly fits the data set, whether there is an actual relation between X and Y or not, simply by learning to identify the individual samples and match them to their Y values. Naturally, such a model would be useless in practice, because new, previously unseen samples could not be identified in this way. Here, validation helps to find the optimal number of components to include in the model in order to capture the systematic relationships between X and Y while avoiding modeling individual samples.
The ideal case is the application of the model to a new and entirely independent study or data set.
This approach, called test set validation, delivers the most realistic estimate of the performance of the model. For example, a PLS model could be developed between patients' urine spectra X and their blood glucose levels Y as the training or calibration set. A new set of measurements Xnew and Ynew would then be acquired as the test set, and the model would be used to calculate blood glucose values Ypred based on Xnew. Clearly, the smaller the deviation between the predicted Ypred and the measured Ynew, the better the PLS model.
True test set validation is rarely used in practice due to constraints on time, funds or available subjects. An alternative is cross validation (CV), where X and Y are divided into N corresponding segments or blocks. N PLS models are then built, each from all segments but one, and each model is used to predict Ypred values for precisely the segment that was left out; these predictions are then compared to the actual Y values as above. In this way, each segment acts once as the test set and (N-1) times as part of the training or calibration set. A variant of CV is leave-one-out validation, where single samples are used as segments, so that each CV step leaves out exactly one sample.
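A minimal sketch of N-fold cross validation that yields Q². Mean-centering is recomputed inside each fold, so that no information from the left-out segment leaks into the model. The block repeats a minimal NIPALS PLS1 and uses simulated data: one end point driven by a hidden factor that also shapes X, and one unrelated to X. Setting `n_folds` to the number of samples gives leave-one-out validation.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal PLS1 via NIPALS on mean-centered X, y; returns b with y_fit = X @ b."""
    X, y = X.astype(float), y.astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        c = (y @ t) / tt
        X, y = X - np.outer(t, p), y - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def cross_validated_q2(X, y, n_components, n_folds=5, seed=0):
    """Q2 = 1 - PRESS / sum of squares of mean-centered y, from N-fold CV."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds          # segment label of each sample
    press = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()   # centering inside the loop
        b = pls1_nipals(X[train] - x_mean, y[train] - y_mean, n_components)
        y_pred = (X[test] - x_mean) @ b + y_mean
        press += np.sum((y_pred - y[test]) ** 2)
    return 1 - press / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(5)
n, m = 40, 300
latent = rng.normal(size=n)                                   # hypothetical hidden factor
X = np.outer(latent, rng.normal(size=m)) + 0.5 * rng.normal(size=(n, m))
y_signal = latent + 0.1 * rng.normal(size=n)                  # end point driven by the factor
y_noise = rng.normal(size=n)                                  # end point unrelated to X

for name, y in (("signal", y_signal), ("noise", y_noise)):
    print(name, "Q2 =", round(cross_validated_q2(X, y, n_components=2), 2))
print("leave-one-out Q2 (signal):", round(cross_validated_q2(X, y_signal, 2, n_folds=n), 2))
```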
The difference between the predicted values Ypred and the actual values of Y or Ynew is most commonly described by the diagnostic parameter Q² (or Q²Y), defined in a manner corresponding to R² as
$$Q^2 = 1 - \frac{\sum (Y_\text{pred} - Y)^2}{\sum Y^2}$$
In the same way as R² is related to the residuals of the model in the training set or segments, Q² describes the residual error when the model is applied to the test set or segments, and is interpreted as the fraction of Y variation that the model can predict. A Q² close to 1 means that the model developed from training set X and Y values can accurately predict test set Y values. A derived parameter is the ratio Q²/R², which describes the correspondence between training and test sets, i.e. how consistently the explained variation can be reproduced in the test set. In practice, when both R² and Q² are low, the data set may contain significant trends but no individual predictability, whereas a low Q² combined with a high R² implies that the PLS model is overfitting, i.e. modeling noise.
Whereas cross validation answers the question of how robust a relation between the X and Y matrices in a given data set is, the complementary approach of permutation testing investigates the risk that the given combination of X and Y values is coincidental. In permutation testing, the labels of the end points relative to the input variables are randomly rearranged. The chosen statistical analysis, e.g. PLS regression, is performed on a large number of such randomized scenarios to establish the distribution of a diagnostic measure, e.g. Q², under the null hypothesis that the labels do not matter. Finally, the analysis is run with the true end points, and the diagnostics are compared to the randomized distribution, assessing whether the true end points perform significantly better than the randomized ones.
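A sketch of the permutation procedure with a pluggable diagnostic. To keep it short, a simple statistic stands in for the cross-validated Q² of a PLS model: the largest absolute correlation between y and any variable. Any function of (X, y) could be passed instead. The "+1" in numerator and denominator counts the observed labelling as one of the permutations, which keeps the p-value above zero. Data and function names are hypothetical.

```python
import numpy as np

def max_abs_correlation(X, y):
    """Largest |Pearson r| between y and any column of X."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.max(np.abs(r))

def permutation_test(statistic, X, y, n_permutations=199, seed=0):
    """One-sided permutation p-value: how often do shuffled end points score as well?"""
    rng = np.random.default_rng(seed)
    observed = statistic(X, y)
    null = np.array([statistic(X, rng.permutation(y)) for _ in range(n_permutations)])
    p_value = (1 + np.sum(null >= observed)) / (n_permutations + 1)
    return observed, p_value

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 50))
y_related = X[:, 0] + 0.3 * rng.normal(size=30)     # hypothetical true relation
y_unrelated = rng.normal(size=30)                   # no relation to X
for name, y in (("related", y_related), ("unrelated", y_unrelated)):
    obs, p = permutation_test(max_abs_correlation, X, y)
    print(f"{name}: statistic {obs:.2f}, p = {p:.3f}")
```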
Some consideration must also be given to the level at which validation is applied. In general, all transformations and calculations that could influence the results should be inside the validation scope. This prevents the prediction step from taking advantage of any global information that would not be available in a future real-world application, which would artificially inflate the estimated model performance. This explicitly includes the scaling and centering steps, because by definition they depend on variations across samples, but may also apply to normalization methods such as probabilistic quotient normalization [105], which uses a median spectrum as reference. However, the choice of the normalization method and e.g. the number of components of a PLS model are usually tested out independently and then kept consistent.
Another bias that could skew validation results may arise in studies where multiple samples from the same subject are used to predict the same outcome, e.g. several urine samples from different time points to predict one blood glucose measurement. If samples from the same patient ended up in both the calibration and the test sets, there would be the danger that, given enough components, a PLS regression would model the identities of individual patients instead of common systematic changes in the urine profiles, again overestimating predictive performance. In such cases one would define the validation segments at the patient level instead of the sample level.
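Assigning CV segments at the patient level can be sketched as follows. Each subject is mapped to exactly one segment, so all of that subject's samples are left out together. The subject IDs and the function name are hypothetical.

```python
import numpy as np

def group_folds(subject_ids, n_folds, seed=0):
    """Assign CV segments per subject, so all samples of a subject share one segment."""
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    subject_fold = dict(zip(subjects, rng.permutation(len(subjects)) % n_folds))
    return np.array([subject_fold[s] for s in subject_ids])

# Hypothetical: several urine samples per patient, all predicting the same end point.
ids = np.array(["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3", "p4", "p5"])
folds = group_folds(ids, n_folds=3)
for k in range(3):
    print("segment", k, "leaves out patients", sorted(set(ids[folds == k])))
```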
2 Aims
The principal purpose of this thesis was to investigate to what extent the urinary excretion profile, as analyzed by NMR-based metabolomics, could contribute to the prediction and monitoring of risks to health and development during pregnancy and in newborns. In particular, we wanted to:
1. Create a comprehensive, quantitative overview of how pregnancy affects the urinary excretion profile.
2. Search for signs of gestational diabetes (GDM) in the urine profiles – ideally early, predictive markers.
3. Investigate whether pre- or postpartum urine profiles can be linked to breastfeeding.
4. Characterize the (urinary) metabolic trajectory of very-low-birth-weight infants in the first weeks of life.
5. Assess the effect of an intensive nutritional intervention on said trajectories.
3 Summary of Papers
3.1 Paper I: Pregnancy and Gestational Diabetes
In this paper, we analyzed urine samples from the STORK Groruddalen project, a prospective cohort study of 823 healthy pregnant women of multiethnic background from three north-eastern districts of Oslo, Norway. The aim of STORK Groruddalen was to improve the identification of high-risk pregnancies and reduce adverse short- and long-term outcomes for mothers and offspring by identifying predictors for GDM and fetal growth. Paper I represents the application of urinary metabolomics to the search for GDM biomarkers and to the characterization of the metabolic adaptations of pregnancy.
Fasting morning midstream clean-catch urine samples were collected at three visits (V1: gestational week 8–20; V2: week 28±2; V3: 10–16 weeks postpartum), aliquoted and stored at -80°C. For the analysis, urines were buffered 9:1 using a potassium salt solution in pure D2O at pH 7.4 containing 5.9 mM trimethylsilylpropionate-d4 (TSP) as a spectral and concentration reference, and then centrifuged and transferred to 5 mm NMR tubes. Proton spectra were acquired at 300.0 K on a 600 MHz spectrometer equipped with a cryoprobe and an automatic sample changer. An example spectrum, along with two others, is shown in Fig. 10. After accounting for missing samples and removing a small number of low-quality spectra, a total of 1,911 urine profiles from 790 of the 823 participants were eligible for further analysis (667, 671 and 573 from visits V1, V2 and V3, respectively). Among these were 572 corresponding pairs between visits V1 and V2, 509 between V2 and V3, and 494 between V1 and V3.
Figure 10: NMR spectra of three urine samples. Green: Healthy Norwegian woman of the STORK Groruddalen study, ca. 28 weeks pregnant. Black: Premature infant of the PRENU intervention study, first week of life, standard diet. Orange: Urine sample pooled from healthy volunteers and used as quality control throughout the measurements done for this thesis.
The GDM status of the participants was determined at V2 using FPG and 2h-PG after OGTT according to two independent sets of diagnostic criteria, (a) the binary WHO criteria and (b) a set of graded criteria. The WHO criteria diagnosed 13% of the study cohort with GDM. The graded criteria, based on the recommendations of the IADPSG, diagnosed 32% with GDM, further subdivided into 26% with mild and 6% with more pronounced hyperglycemia. Ethnic origin was defined by the country of birth of the participant or her mother, whichever was more relevant. The largest groups were Europeans (n=379, including a small number of North Americans of European descent) and South Asians (n=200).
Multivariate analysis was first carried out on creatinine-normalized NMR spectra and then on a table or matrix of concentrations of metabolites (relative to creatinine) derived from the spectra. PCA indicated systematic differences between urine profiles from the three respective visits. Cross-validated PLS-DA confirmed this by correctly classifying upwards of 90% of samples between any two visits. Univariate median concentrations and interquartile ranges at the three visits were calculated along with pairwise individual fold-changes for all substances and signals in the metabolite matrix in order to characterize the development of the urine profiles over time. This development was found to be dominated by increasing lactose excretion over all three visits, and an increased excretion of several unidentified metabolites with NMR signals between 0.5 and 1.1 ppm from V1 to V2, followed by a dramatic decrease at V3. Further contributions were the similar increase and subsequent decrease of the amino acids glycine, alanine and the combined threonine and lactate signal, and the decrease of tyrosine and formate after birth.
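The difference between comparing group medians and summarizing individual, paired fold-changes can be illustrated with made-up numbers. Only participants with samples at both visits enter the paired ratios. The values and the helper function below are hypothetical and do not come from Paper I.

```python
import numpy as np

# Hypothetical creatinine-normalized levels of one metabolite for the same five
# participants at two visits (element i of v1 and v2 belongs to the same woman).
v1 = np.array([12.0, 8.5, 15.2, 10.1, 9.7])
v2 = np.array([18.3, 11.0, 20.9, 16.4, 12.2])

def median_and_iqr(x):
    """Median and interquartile range (75th minus 25th percentile)."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return median, q3 - q1

for name, values in (("V1", v1), ("V2", v2)):
    median, iqr = median_and_iqr(values)
    print(f"{name}: median {median:.1f}, IQR {iqr:.1f}")

fold_changes = v2 / v1   # one ratio per participant
print("ratio of medians:", round(np.median(v2) / np.median(v1), 3))
print("median individual fold-change:", round(np.median(fold_changes), 3))
```

With these numbers the ratio of the group medians (about 1.62) overstates the typical individual change (median fold-change 1.375), which is why the paired view is informative.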
Multivariate models could not predict the GDM status according to either of the criteria based on the urine profiles from any of the visits. However, univariate analyses identified a small number of metabolites that differed between the respective WHO and graded categories at visits V1 and V2. In both cases and at both visits, healthy participants had lower levels of urinary citrate and two unidentified metabolites with doublet signals at 1.08 and 1.11 ppm. No differences with respect to GDM were found at the postpartum visit V3.
At all three visits, European participants were found to have generally lower levels of formate, alanine, an unidentified substance at 0.55 ppm and the combined lactate/threonine signal than South Asians and Others.