
1 A Brief History of Metabolomics and Its Relevance in Systems Biology

With the advent of the systems biology paradigm, which proposes to explore how interactions between biological components (biomolecules) affect the functionality (biological processes) of an organism as a whole [1], several bioanalytical methods have been proposed and/or improved. Formerly, molecular biology and physiology approaches were employed to acquire biomolecular and functional information, respectively. However, both strategies only provided limited data considering a target biomolecule and the directly related pathways, being incapable of characterizing a biological system in a complete and integrated way. For that reason, the development of the omics strategies caused a real revolution in this scientific area, and nowadays they are widely used in systems biology. Omics strategies aim at identifying the entire set of biomolecules (genes, proteins, metabolites, etc.) contained in a biological tissue, cell, fluid, or organism, thus generating a huge amount of data that are evaluated by biostatistics and bioinformatics tools. Figure 1.1 shows how the omics approaches are correlated and their respective objects (biomolecules) of study. Genomics, transcriptomics, and proteomics are beyond the scope of this book; therefore, we will herein focus on metabolomics as a key systems biology strategy.

Fig. 1.1 A correlation between the main omics strategies used in systems biology studies

The term metabolome first appeared in the literature in 1998, when Oliver et al. [2] measured the change in the relative concentrations of metabolites as the result of deletion or overexpression of a gene. The term metabolome is therefore used to denote the entire set of metabolites an organism expresses. In 2001, metabolomics was defined by Fiehn as the comprehensive and quantitative analysis of all metabolites of the biological system under study [3]. Previously, in 1999, Nicholson et al. [4] used the term metabonomics to refer to the “quantitative measurement of the dynamic multiparametric metabolic response of living systems to physiopathological stimuli or genetic modifications.” Alterations in endogenous metabolite levels that may result from disease processes, drug toxicity, or gene function have been evaluated in cells, tissues, or biological fluids by metabonomics [5–8]. Latent biochemical information obtained from metabonomics may be used for diagnostic or prognostic purposes. Such information reflects actual biological events rather than the potential for disease, which gene expression data provide [9].

In the last decade, other terms have appeared in the literature to define and classify metabolism studies. Metabolite (or metabolic) profiling was first described as the analysis of a small number of predefined metabolites for investigation of selected biochemical pathways and has its origin in the early metabolism studies of Horning and Horning in 1971 [10]. Metabolic fingerprinting was defined by Fiehn [3] as “a rapid classification of samples according to their origin or their biological relevance.” Finally, in 2005, Kell et al. [11] proposed the term metabolic footprinting to refer to the exometabolome, i.e., what a cell or system excretes under controlled conditions. Most recently, in 2015, the term real-time metabolome profiling was proposed by Link et al. [12], referring to the direct injection of bacteria and cells into a high-resolution mass spectrometer and the monitoring of hundreds of metabolites in cycles of a few seconds over several hours.

However, before those terms were coined, studies involving metabolomics notions were first reported in the literature in the late 1940s by Williams and coworkers [13]. These studies were based on data from over 200,000 paper chromatograms obtained from body fluid samples from different subjects, including alcoholics and schizophrenics, which produced evidence that there were characteristic metabolic patterns associated with each one of these groups, considering a hypothesis of “biochemical individuality.” Gates et al. published, in 1978, a review compiling these historical events [14]. The development of novel analytical techniques and improvements in biostatistics in the 1980s allowed enormous progress in metabolic profiling studies. Then, at the end of the 1990s, many acronyms related to omics strategies appeared, and at that point, the terms metabolome, metabolomics, and metabonomics were proposed. A review by van der Greef and Smilde [15] discusses the symbiosis of metabolomics and chemometrics and presents an interesting timeline of the evolution of metabolomics.

Lipidomics is a subdivision of metabolomics defined as “the full characterization of lipid molecular species and their biological roles with respect to expression of proteins involved in lipid metabolism and function, including gene regulation” [16]. This term was proposed in 2003 by Han and Gross [17] to define the research area that focuses on identifying alterations in lipid metabolism and lipid-mediated signaling processes that regulate cellular homeostasis during health and disease. Currently, lipidomics research emphasizes identifying alterations in the lipid levels of cells and body fluids, revealing environmental disturbances, pathological processes, or response to drug treatments [18].

2 Definitions and the Metabolomics Workflow

Although several terms have been devised in the literature to classify metabolomics/metabonomics studies [2, 4, 19–21], there is still no real consensus regarding terminology. A much simpler general definition, based on whether the researcher knows a priori which metabolites to search for, is proposed here, and it will guide the decisions on the metabolomics workflow presented in Fig. 1.2. In this context, a targeted metabolomics approach is defined as a quantitative analysis (concentrations are determined) or semiquantitative analysis (relative intensities are registered) of a few metabolites and/or substrates of metabolic reactions that might be associated with common chemical classes or linked to selected metabolic pathways. Metabolic profiling, as mentioned earlier, thus belongs to this definition. An untargeted metabolomics approach is based primarily on the qualitative or semiquantitative analysis of the largest possible number of metabolites from a diversity of chemical and biological classes contained in a biological specimen. Both fingerprinting and footprinting metabolomics belong to this definition.

Fig. 1.2 Analytical workflow for studies in metabolomics

Lipidomics could be considered as a targeted metabolomics strategy, since it involves the study of a subset of specific metabolites (lipids). However, due to the complexity of the lipids, lipidomics itself is categorized as targeted or untargeted lipidomics, when the objects of study are specific lipids or global exploratory analyses are performed, respectively. The term focused lipidomics was proposed in 2009 as a strategy “for detecting molecules in some categories while comprehensively utilizing specific fragments (product and precursor ion scanning) or neutral loss caused by a specific feature of the partial structures of the molecules (neutral loss scanning)” [22]. However, the execution of focused lipidomics is only possible when working with mass spectrometry techniques, and actually this is not so distant from the targeted lipidomics concept. More details about lipidomics can be found in Chap. 11.

The metabolomics workflow, shown in Fig. 1.2, comprises the sequential steps that underlie both targeted and untargeted metabolomics analyses, which are described below.

Biological problem and experimental design.

The initial step of the metabolomics workflow relies on a clear and straightforward formulation of the biological problem to be addressed. This step is of crucial importance because it will govern the experimental design that follows. According to the biological problem, the type of metabolomics approach (targeted vs. untargeted metabolomics), sample type (biological fluids, tissues, cells, and/or intact organisms), sample size (number of specimens to be assessed), experimental conditions to which samples will be submitted, frequency of sample collection, metabolic quenching to interrupt enzymatic activity (addition of organic solvents and/or immediate freezing of samples by the use of dry ice or liquid nitrogen), storage conditions (−80 °C is usually preferred for long-term storage of biological fluids) [23], analytical platforms to be employed, and also sample preparation strategies must all be defined at this point, since they are interrelated [24]. It is important to emphasize that metabolomics studies are always comparative in character; therefore, a group of control samples (samples that did not undergo the investigated condition) and a group of test samples (carrying information on the investigated condition) are usually defined in the experimental design.

Sample preparation.

Once the biological problem is defined and experimental conditions for sample collection and storage are established, a further step of sample preparation prior to analysis might be considered. Sample preparation is intimately related to the sample type (whether it is a cell, a tissue, or a biological fluid), the selected metabolomics approach (targeted vs. untargeted analysis), and the chosen analytical platform.

For targeted metabolomics, the extraction procedure is usually optimized for the specific metabolites or metabolite chemical classes under consideration and may involve steps such as cleanup for removal of sample matrix interferents and/or pre-concentration strategies, such as liquid-liquid and solid-phase extraction, to enhance the compound detectability [25].

For untargeted metabolomics of biofluids, sample preparation is usually minimal. Protein precipitation is sometimes considered as a means to preserve column integrity in liquid chromatographic experiments or to prevent capillary clogging in capillary electrophoretic experiments. In general, a simple filtration and a few-fold dilution are performed. Tissue and/or cell preparations require more elaborate extraction procedures, usually carried out by solid-phase extraction with pure solvents or mixtures, followed by centrifugation and dilution. Gas chromatographic analyses of biofluids and cell/tissue extracts demand further derivatization steps to convert polar metabolites into volatile adducts [26]. These steps are time-consuming and prone to errors, limiting the total number of samples that can be processed in a single metabolomics experiment. Nuclear magnetic resonance experiments usually require sample dilution in proper deuterated solvents. More details about sample preparation in clinical metabolomics can be found in Chap. 2.

Data acquisition.

Unlike other omics sciences, metabolomics poses a great analytical challenge due to the immense variety of chemical composition that biological samples exhibit, which spans compounds with distinct chemical properties, structural features, and functionality, as well as widely differing concentration levels. It is important to emphasize that currently no single analytical platform leads to a comprehensive identification and quantification of the entire metabolite set of a biological system [27, 28]. The chemical diversity of the metabolome, as well as its wide dynamic range [29], demands that different analytical techniques be combined to generate complementary results that will ultimately enhance metabolic coverage [5, 28].

The analytical techniques commonly employed for data acquisition in metabolomics studies are nuclear magnetic resonance (NMR) spectroscopy [23, 30] and mass spectrometry (MS) [31]. NMR spectroscopy can be considered as a universal metabolite detection technique, where samples can be analyzed directly with minimal manipulation and many classes of small metabolites can be measured simultaneously [5, 7, 32]. Major drawbacks in NMR for metabolomics are poor sensitivity and spectral complexity with superimposition of signals at certain spectral regions compromising clear identification. In a recent metabolomics study, NMR allowed characterization of 49 metabolites in human serum, with concentrations above 10 μmol L⁻¹ (“normal NMR-detected serum metabolome”), whereas MS techniques were able to characterize more than 90 metabolites with concentrations lower than 10 μmol L⁻¹ [33].

While MS is more sensitive and specific in comparison to NMR spectroscopy, it usually requires a previous separation step, using a hyphenated separation technique [27], such as gas chromatography (GC), high-performance liquid chromatography (HPLC) or ultra-performance liquid chromatography (UPLC), and capillary electrophoresis (CE). Separation techniques coupled to MS are important to reduce sample complexity and to minimize ionization suppression effects, thus enhancing the detection sensitivity and increasing the metabolome coverage [27].

GC-MS is a well-established technique in the metabolomics arena [34–37]. However, the need for time-consuming sample derivatization schemes limits its applicability to small sets of samples. Nevertheless, the structural specificity of the generated adducts makes it easy to build dedicated spectral libraries that aid metabolite identification. Furthermore, due to derivatization, quite distinct classes of important and rather polar metabolites, such as amino acids, biogenic amines, and carboxylic acids, can be assessed in a single chromatographic run and ionization mode. In addition, GC-MS via headspace techniques covers the volatile portion of the metabolome.

Perhaps due to the extensive sample manipulation that GC-MS demands, LC-MS has been the technique of choice in many metabolomics studies [38–42], covering the moderately polar fraction of the metabolome. Several modes and different column chemistries, including reversed-phase liquid chromatography (RPLC), hydrophilic interaction liquid chromatography (HILIC), and more rarely ion-pairing liquid chromatography (IPLC), allow LC-MS to cover a wide range of metabolite categories and polarities.

Although not as prominent as the chromatographic techniques, capillary electrophoresis coupled to mass spectrometry (CE-MS) has joined the metabolomic analytical arsenal due to its unique characteristics, particularly the ability to assess directly the most polar and/or ionic fraction of the metabolome [43–45].

It is also possible to perform a metabolomics analysis by direct infusion (DI) mass spectrometry, but a lot of information is lost due to ionization suppression of many metabolites present at very low concentrations in complex biological matrices [46]. Matrix-assisted laser desorption ionization coupled to MS (MALDI-MS) and MALDI mass spectrometry imaging (MALDI-MSI) are increasingly being used for metabolomics studies, especially those assessing tissues, cells, and their compartments [47]. The inability to differentiate metabolite isomers is a shortcoming of MS technology. More details about NMR spectroscopy in metabolomics can be found in Chap. 3. Chapters 4 and 5 describe MS coupled to chromatographic and electrophoretic techniques, respectively.

Analytical methodologies used for lipidomics were recently reviewed [18, 48]. Basically, the only difference between a metabolomic and a lipidomic experiment is sample preparation: for lipidomics it is necessary to include a lipid extraction step, usually by liquid-liquid extraction or solid-phase extraction, prior to NMR or MS analysis [18].

Analytical platform stability issues often arise when untargeted metabolomic studies are performed, since all samples are analyzed just once in a series of randomized sequential runs. To circumvent these issues and to ensure data reliability for further processing, a quality control (QC) pool sample, prepared by mixing small volumes of all control and test samples, is often employed. Instrumental stability is checked by running the QC sample several times up front and then, during sample analyses, by intercalating a QC injection every four or five samples, depending on run time. Repeatability of QC spectra and/or mass chromatograms is inspected visually and statistically. The importance of QC samples in metabolomics studies is thoroughly discussed in the review articles of Dunn et al. [49] and Theodoridis et al. [38].
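The QC scheme described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the number of conditioning injections, and the use of the relative standard deviation (RSD) as the repeatability statistic are choices made here, not prescriptions from the cited reviews.

```python
import random
import statistics


def build_run_order(sample_ids, qc_every=5, n_conditioning=3, seed=0):
    """Return an injection list: conditioning QCs, then randomized samples
    with one QC injection after every `qc_every` samples."""
    rng = random.Random(seed)          # fixed seed so the order is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)

    order = ["QC"] * n_conditioning    # QC runs up front (assumed count)
    for i, sample in enumerate(shuffled, start=1):
        order.append(sample)
        if i % qc_every == 0:
            order.append("QC")
    if order[-1] != "QC":
        order.append("QC")             # close the sequence with a QC injection
    return order


def percent_rsd(values):
    """Relative standard deviation (%) of one feature across QC injections."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical study with 12 samples (control and test mixed together).
samples = [f"S{i}" for i in range(1, 13)]
run_order = build_run_order(samples, qc_every=5)

# Hypothetical peak areas of one feature in three QC injections.
qc_rsd = percent_rsd([100.0, 102.0, 98.0])
```

A low RSD for a feature across the QC injections suggests that the platform remained stable for that feature during the sequence; the acceptance limit is left to the analyst and is not fixed here.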

Data processing.

For untargeted metabolomics, the acquired raw data are submitted to a preprocessing step according to the type of analytical platform employed. For NMR, data treatment includes phasing, baseline correction, alignment, and normalization. Software packages and algorithms, such as PERCH (PERCH Solution Ltd.), Chenomx NMR Suite (Chenomx Inc.), MestReNova (MestreLab Research), MetaboLab [50], AutoFit [51], TopSpin (Bruker Corp.), and MATLAB (The MathWorks Inc.), are routinely used. For hyphenated MS techniques, data treatment includes spectral deconvolution, dataset creation, grouping, alignment, filling data gaps, normalization, and data transformation. Several free-access and proprietary software packages are available to process MS data, as discussed comparatively by Sugimoto et al. [52]: XCMS [53], Mass Profiler Professional (MPP, Agilent Technologies), MZmine [54], MetAlign [55], MassLynx (Waters Corp.), and AMDIS [56], among others [57].
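Three of the steps listed above (filling data gaps, normalization, and data transformation) can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of any of the packages cited: half-minimum gap filling, total-sum normalization, and a log10 transformation are assumed here as common choices, and the peak table is hypothetical.

```python
import math


def fill_gaps(feature_values):
    """Replace missing values (None) with half the smallest observed value."""
    observed = [v for v in feature_values if v is not None]
    if not observed:
        raise ValueError("feature has no observed values")
    fallback = min(observed) / 2.0
    return [fallback if v is None else v for v in feature_values]


def total_sum_normalize(sample_intensities):
    """Scale one sample so that its feature intensities sum to 1."""
    total = sum(sample_intensities)
    if total <= 0:
        raise ValueError("sample has no signal to normalize")
    return [v / total for v in sample_intensities]


def log_transform(values):
    """Log10-transform strictly positive values to reduce skewness."""
    return [math.log10(v) for v in values]


# Hypothetical peak table: rows are features, columns are three samples.
peak_table = [
    [200.0, None, 400.0],
    [800.0, 500.0, 600.0],
]

filled = [fill_gaps(row) for row in peak_table]         # gaps filled per feature
samples = [list(col) for col in zip(*filled)]           # transpose: one list per sample
normalized = [total_sum_normalize(s) for s in samples]  # normalization per sample
transformed = [log_transform(s) for s in normalized]
```

Gap filling operates along each feature, whereas normalization operates along each sample, which is why the table is transposed between the two steps.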

For targeted metabolomics, analyte quantitation or semiquantitation is a relevant part of data processing and is more commonly carried out using MS [58, 59] rather than NMR, although recommendations are available [60]. Methods are usually developed and conditions optimized for the selected metabolite(s), and the proposed method undergoes extensive validation following regulated protocols for the parameters specificity/selectivity, precision, accuracy, linearity, limits of detection and quantification, and robustness before it is applied. In targeted analyses, the use of internal standards, especially isotope-labeled internal standards, is recommended to improve precision and to handle matrix effects [61]. Selected reaction monitoring (SRM) is a versatile tool for targeted metabolomics when triple-quadrupole mass spectrometers are used [61].
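The role of the internal standard in quantitation can be illustrated with a short Python sketch. It assumes a linear calibration of the analyte-to-internal-standard peak area ratio against concentration, fitted by ordinary least squares; all numerical values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(analyte_area, istd_area, slope, intercept):
    """Convert an analyte/internal-standard area ratio into a concentration."""
    ratio = analyte_area / istd_area
    return (ratio - intercept) / slope


# Hypothetical calibration standards (concentrations in arbitrary units).
cal_conc = [1.0, 2.0, 5.0, 10.0]
cal_analyte_area = [1000.0, 2100.0, 4900.0, 10200.0]
cal_istd_area = [5000.0, 5250.0, 4900.0, 5100.0]

# Dividing by the internal standard area cancels injection-to-injection drift.
cal_ratio = [a / s for a, s in zip(cal_analyte_area, cal_istd_area)]
slope, intercept = fit_line(cal_conc, cal_ratio)

# Hypothetical unknown sample.
unknown_conc = quantify(3000.0, 5000.0, slope, intercept)
```

In this invented data set the raw analyte areas are not strictly proportional to concentration, but the area ratios are, which is the effect the internal standard is meant to provide.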

An interesting alternative approach for comprehensive targeted metabolomics has recently been put into practice using commercially available kits [62]. High-throughput quantitative MS analyses of hundreds of endogenous metabolites of a few chemical classes (acylcarnitines, amino acids, hexoses, phospho- and sphingolipids, biogenic amines, etc.) are performed up front, and statistical evaluation of results selects the discriminant metabolites. Although a large number of compounds are usually disregarded after statistical analyses, at the end the discriminant metabolites have already been quantified. This approach contrasts with the more traditional one, in which untargeted metabolomics indicates qualitatively the potentially discriminant metabolites, usually a small number of compounds, that are next quantified by targeted metabolomics, followed by statistical evaluation of importance. Drawbacks of comprehensive targeted metabolomics relate to the fact that a lot of analytical effort is placed on the quantitation of hundreds of metabolites that might render no significant information, and the search for discriminant metabolites is carried out on a limited number of chemical classes imposed by the commercial kit composition, with no room for discovery of novel metabolites. Chapter 6 gives more details about data processing in metabolomics.

Statistical analysis.

Metabolomic data are quite complex and require chemometric tools to reveal discriminant metabolites between control and test samples [63]. Multivariate analyses, comprising unsupervised methods, such as principal component analysis (PCA), and supervised methods, such as partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA), are often employed for sample overview and classification. Univariate analysis based on Student’s t-test, the Mann-Whitney U test, etc. is also used to corroborate multivariate results. The mathematical models must be validated, which is carried out by cross-validation procedures and permutation tests [63–65]. Chapter 7 gives more details about chemometrics in metabolomics.
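The logic of a permutation test can be shown on a single metabolite. The Python sketch below is a univariate illustration of the resampling idea only; it is not the validation procedure applied to PLS-DA or OPLS-DA models, where class labels are permuted and the whole model is refitted. The function name and the intensity values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(control, test):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(control) + list(test)
    n_control = len(control)
    observed = abs(mean(test) - mean(control))

    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_control of the pooled values as "control".
    for chosen in combinations(indices, n_control):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so that float rounding does not exclude exact ties.
        if abs(mean(group_b) - mean(group_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical normalized intensities of one metabolite.
control_values = [1.0, 1.2, 0.9, 1.1]
test_values = [2.0, 2.3, 1.9, 2.2]
p_value = exact_permutation_p_value(control_values, test_values)
```

With four samples per group there are 70 possible relabelings, and in this invented example only the observed grouping and its mirror image reach the observed difference, so the p-value equals 2/70. Exhaustive enumeration is feasible only for small groups; larger studies would rely on random relabelings instead.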

Metabolite identification.

Metabolite identification is required only for untargeted metabolomics studies, since for targeted metabolomics, the metabolite or metabolite class of interest is already defined. For this purpose, free databases and libraries, such as HMDB [66], KEGG [67], PubChem [68], Metlin [69], MassBank [70], LIPID MAPS [71], ChEBI [72], MMD [73], BioMagResBank [74], MetaboID [75], and Chenomx NMR Suite (Chenomx Inc.), are among the most commonly accessed. MassTRIX [76] is also a searching tool that uses some of the databases listed above. Once a putative identity has been assigned to a metabolite, confirmation must be pursued. This can be accomplished by spiking techniques with authentic standards followed by comparison of fragmentation patterns between sample and standard (MS spectra) or by 2-D NMR.
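For MS data, a database search of this kind usually starts from an accurate mass. The Python sketch below shows the principle with a three-entry lookup table standing in for a real database; it does not reproduce the query interface of any of the resources cited above. The table, the 5 ppm tolerance, and the restriction to the protonated adduct [M+H]+ are assumptions made for the illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da, used for the [M+H]+ adduct

# Hypothetical miniature library: name -> monoisotopic neutral mass (Da).
LIBRARY = {
    "alanine": 89.047679,
    "glucose": 180.063388,
    "citric acid": 192.027003,
}


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def putative_matches(measured_mz, tolerance_ppm=5.0, library=LIBRARY):
    """Return library entries whose [M+H]+ m/z lies within the tolerance."""
    hits = []
    for name, neutral_mass in library.items():
        theoretical_mz = neutral_mass + PROTON_MASS
        error = ppm_error(measured_mz, theoretical_mz)
        if abs(error) <= tolerance_ppm:
            hits.append((name, round(error, 2)))
    return hits


# Hypothetical measured m/z of a feature detected in positive ionization mode.
hits = putative_matches(181.0708)
```

A hit obtained in this way is only putative: isomers such as glucose and fructose share the same monoisotopic mass, so an accurate mass alone cannot distinguish them, which is why confirmation with authentic standards is needed.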

Metabolic pathways association.

Biological interpretation is an important step of any metabolomics study, targeted or untargeted. Once putative metabolites are listed and their identification confirmed, the corresponding metabolic pathways are searched. Several databases are available for this purpose: KEGG [67], MetaCyc [77], SMPDB [78], MetaboLights [79], and Reactome [80], among others. For details on the information compiled in databases, the review of Karp and Caspi can be consulted [81]. When altered metabolites are associated with their respective metabolic pathways, a rationale can be elaborated in an attempt to answer the original biological question that guided the metabolomics study. In Chapter 8, more details about metabolite identification and pathway analysis can be found.

Biological validation.

Although biological validation is not commonly pursued after a metabolomics study is completed, many authors consider that the results will only make broader sense if confirmed by validation. Usually an external validation is recommended [63, 82], in which an entirely new set of samples is collected and processed, as the work of Barbas et al. [83] exemplifies. Alternatively, the discriminant metabolites found preliminarily in the untargeted metabolomics study can be quantitatively analyzed in the same sample set (internal validation). Biological validation can also be achieved by independent specific studies conducted with the discriminant metabolites found in the original metabolomics study. Ganti et al. [84] performed an untargeted metabolomics study using urine samples of kidney cancer patients and control subjects that revealed high levels of acylcarnitines associated with cancer status and kidney cancer grade. The study was then validated by in vitro experiments establishing that acylcarnitines affect cell survival and are indicative of inflammation. Biological validation therefore serves to corroborate the results obtained preliminarily in the original metabolomics study and to consolidate the biological interpretation of results.

3 The Importance of Metabolomics in Clinical Studies

From the beginnings of metabolomics until the present, most applications have focused on plant metabolomics. Nevertheless, with the recent advent of precision medicine, clinical metabolomics is in the spotlight for being able to provide molecular phenotyping of biofluids, cells, or tissues. In this context, clinical metabolomics is increasingly being applied to diagnose diseases, understand disease mechanisms, identify novel drug targets, customize drug treatments, and monitor therapeutic outcomes [85, 86].

As metabolites indicate the end points of gene expression and cell activity, metabolomics can provide a holistic approach for understanding the phenotype of an organism, playing a fundamental role in systems biology [27]. The characterization of metabolic phenotypes supports precision medicine by pointing out the metabolic imbalances that underlie diseases, discovering new therapeutic targets, and indicating potential biomarkers that may be used either to diagnose disease or to monitor the action of therapeutics [87].

Clinical metabolomics is thus an area of intense investigation and has been reviewed periodically for different conditions and diseases [63, 88–97]. Chapters 9, 10, 11, 12, and 13 of this book compile many applications of clinical importance following the metabolomics framework.