Introduction

Ideally, a metabolomic study provides a picture of every metabolite in the organism and insight into the metabolic response to a biological situation or experimental manipulation. The assumptions are that every metabolite will be measured and that the measurements will be biologically informative. In reality, there are problems with these assumptions, and careful experimental design and methodology are required to overcome them, at least partially. The bases of these potential problems and approaches to address them are discussed below, first in the most general sense applying to all experimental systems (issues inherent in drawing inference from metabolic pool measurements), and then with respect to specific aspects of mass spectrometric (MS) measurements (pre-analytical, analytical, and post-analytical processes).

At present, metabolomics experiments are performed with either mass spectrometry (Want et al 2007; Dunn et al 2011; Reaves and Rabinowitz 2011) or nuclear magnetic resonance (Fan and Lane 2016; Dietz et al 2017). Nuclear magnetic resonance (NMR) has the potential to measure metabolite levels in intact tissues, but its sensitivity is limited (Tognarelli et al 2015; Fan and Lane 2016), and even with increased field strength (Righi et al 2012; Dietz et al 2017), it is not possible to detect low abundance compounds with currently available technology. This manuscript discusses only liquid chromatography (LC) based mass spectrometry (MS) approaches to untargeted metabolomics, with emphasis on inborn errors of metabolism (IEM).

Targeted versus untargeted metabolomics

There is some confusion and ambiguity in the application of the terms “targeted” and “untargeted” in metabolomics. In targeted studies, specific compounds are quantified and compared to established reference ranges. In practice, this corresponds to setting the mass spectrometer to monitor selected transitions reflecting individual target analytes (and their internal standards) through the time course of the chromatography. This is no different from what Biochemical Genetics laboratories have traditionally done in performing amino acid, organic acid, and acylcarnitine analyses. Using modern instrumentation and stable isotope dilution, target analytes can be fully quantified to clinical laboratory standards, using formal calibration, validation, and quality control (FDA 2001), though in cases where absolute quantification is not necessary, a semi-quantitative approach may be useful and is often used instead. Untargeted metabolomics (Want et al 2005), on the other hand, seeks to analyze all detectable metabolites, known and unknown, to determine whether any are significantly perturbed, and then to identify them. Untargeted metabolomics is a “discovery mode” process that relies on differential comparison between groups of samples (Dudzik et al 2017), for example cases versus controls; it is not applicable to individual samples. In its strictest form, untargeted metabolomics is agnostic, comparing peaks as chromatographic “features” and only then seeking to identify the compounds. The settings of the mass spectrometer reflect that (i.e., acquisition is in scan mode), and identification is made on review and extraction of the data collected during the chromatography. Naturally, with experience, a laboratory will accrue a library of identities of chromatographic features and spectra, so that many peaks can be immediately identified. Untargeted metabolomics, however, is truly intended for discovery: it is not limited to a pre-determined list of metabolites or class of compounds, and its aim is to span the breadth of the metabolome.

Challenges/pitfalls and solutions/workarounds

Discussion of issues, pitfalls, and workarounds is organized into the phases of a metabolomics experiment: experimental planning/conceptualization, pre-analytical, analytical, and post-analytical.

Experimental planning/conceptualization

There are realities which raise challenges to the basic assumptions of metabolomics; in some cases, there is nothing that the experimentalist can do to overcome the challenges, but in others there are solutions or at least methods to minimize the problem. LC-MS methodology involves extraction of body fluids or tissues. The source of the material will determine which analytes are present, so in any given sample there may be groups of compounds which will never be seen at more than trace amounts. For example, certain sugar phosphates and nucleotides will not be expected in extracellular fluids, and hydrophobic compounds such as fatty acids may be seen in blood, but not in urine or CSF. The metabolomic picture will differ greatly depending upon the fluid studied and so it is imperative to choose the most relevant sample type that will demonstrate the metabolic perturbation.

It is possible that the key and most informative biological event in metabolism takes place as a trigger or nucleation event and will not be evident at any later time. That may be the case in transient niacin deficiency, which can cause defects in embryogenesis (Shi et al 2017) but might not be evident in the mother at a later time. It also may be true that, through cascade effects or biochemical amplification, a widespread change may result from a small perturbation in a key regulator, creating a sort of “butterfly effect,” as for example with microRNA species (Dorn 2013) or the trace concentration of cAMP initiating the cascade of glycogenolysis (Fischer 2013). That regulator may be either inaccessible in the study or present at such a low concentration that it would never be measured when the experiment is performed. Instrumental dynamic range or interference from much more abundant analytes may make it impossible to monitor changes in both the regulatory and the bulk substrates. Performing longitudinal studies, when possible, can help detect transient changes that might not be observed at a single static timepoint. In other scenarios, the levels of observed intermediates may not reveal a regulatory change, particularly when metabolite pools are defended by side reactions (such as anaplerosis), but measurement of flux could be mechanistically informative. In the last decade, initially pioneered through microbial metabolism studies (Blank et al 2005; Kummel et al 2006), researchers have used 13C (and/or 15N) labeled nutrients to follow the utilization of substrates such as glucose or amino acids, not only in cell culture but in live mammals as well (Fan et al 2009). A bolus of stable isotopically labeled material can reveal altered ratios of labeled to unlabeled intermediates and of isotopologues (containing both labeled and unlabeled atoms), and MS/MS fragmentation patterns can reveal changes in isotopomers (varying in the location of labeled atoms). This can provide insight into changes in metabolic flux in disease and allow construction of metabolic network models, revealing linkage among pathways not otherwise obviously related.
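As an illustration of how isotopologue intensities translate into a flux-relevant readout, the following sketch computes fractional 13C enrichment from integrated isotopologue peaks; the metabolite, the intensities, and the assumption that natural-abundance correction has already been applied are all hypothetical.

```python
# Minimal sketch: fractional 13C enrichment from isotopologue intensities.
# Intensities are hypothetical and would come from integrated LC-MS peaks
# (ideally corrected for natural isotope abundance beforehand).

def fractional_enrichment(isotopologue_intensities):
    """Mean fraction of labeled carbons: sum(i * M_i) / (n * sum(M_i)),
    where M_i is the intensity of the isotopologue carrying i labeled atoms
    and n is the number of labelable atoms in the metabolite."""
    n = len(isotopologue_intensities) - 1          # intensities are M0 .. Mn
    total = sum(isotopologue_intensities)
    weighted = sum(i * m for i, m in enumerate(isotopologue_intensities))
    return weighted / (n * total)

# Example: citrate (6 carbons) after a 13C6-glucose bolus, M0..M6 intensities
citrate = [4.2e5, 1.1e5, 2.3e5, 0.9e5, 0.7e5, 0.3e5, 0.2e5]
print(f"fractional enrichment: {fractional_enrichment(citrate):.3f}")
```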

Sample size

A bad outcome for a metabolomics experiment would be finding no meaningful associations; worse would be reaching spurious conclusions. Since untargeted metabolomics depends inherently on statistical comparison between or among experimental groups (controls vs. cases, treatment A vs. treatment B, etc.), meaningful results require an adequate number of samples in each group. In general, the objective is to predict the number of samples needed to achieve a given power (e.g., 0.8) at a given degree of confidence (e.g., an adjusted p-value ≤0.05), given the experimental variability between replicate runs. One approach is to use data sets from pilot studies or from related samples in public data repositories. The power for a given false discovery rate (FDR) may be estimated from a given set of pilot data by a number of methods, including a module of the publicly available MetaboAnalyst package (Xia and Wishart 2016). The larger the number of samples, the less work in the post-analytical phase and the more definitive the results. As a rule of thumb, it is not practical to perform untargeted analysis with fewer than 5–10 individual samples per group, and it is not realistic to consider running single samples for untargeted metabolomics. The metabolomics standards initiative (MSI) recommends a minimum of five biological replicates in its minimum reporting standards (Sumner et al 2007), but of course the true number required depends heavily on the intrinsic variation in the biological samples as well as the magnitude of the observed perturbation, all factors incorporated into power analysis. It is possible that a pathognomonic metabolite will by chance be seen in a single sample from a patient with a given disease, and Miller et al (2015) recently demonstrated the ability to identify such elevations in metabolomic studies of various inborn errors of metabolism by comparison to previously established reference ranges. However, only known metabolites were evaluated in this single-sample fashion, while for biomarker discovery, multiple patient samples were processed in cohorts, a key aspect of untargeted metabolomics. Novel biomarker discovery poses a specific challenge with low sample numbers, as various analytical, environmental, and even dietary factors may result in aberrant levels of certain features in any single sample/run, normally evaluated by rigorous false discovery analysis in untargeted experiments. The variation posed by these factors is discussed in detail throughout this review, but it is worth considering that colleagues’ requests to “run untargeted metabolomics on a single sample” or on small sample cohorts should be handled with a discussion about experimental design, and redirected either to exhaustive targeted analysis (similar to extended Biochemical Genetics assays) or to extending the study population to provide appropriate statistical power.
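As a rough illustration of the kind of calculation involved, the sketch below estimates the per-group sample size for a simple two-group comparison using statsmodels; the effect size, feature count, and Bonferroni-style adjustment are illustrative assumptions, and dedicated tools such as MetaboAnalyst's power module handle FDR across the full feature set more appropriately.

```python
# Minimal sketch: estimating per-group sample size for a two-group comparison
# at a crudely adjusted alpha, given an effect size estimated from pilot data.
# The effect size and number of tested features are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

effect_size = 1.2               # Cohen's d estimated from pilot replicates (assumed)
n_features = 2000               # features surviving pre-processing (assumed)
alpha_adj = 0.05 / n_features   # crude Bonferroni-style adjustment

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha_adj, power=0.8, alternative="two-sided"
)
print(f"samples per group needed: {n_per_group:.1f}")
```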

Pre-analytical

Sample preparation

When blood is sampled, there are advantages to using plasma over serum, since the specimen can be immediately placed on ice prior to separation. It is possible to use dried blood spot (and urine) cards for some applications (Barri and Dragsted 2013), but there is some uncertainty about extraction efficiency, depending upon the compound’s polarity. There is controversy regarding the choice of anticoagulant for plasma preparation. There may be interferences and serious matrix effects depending on the particular experimental setup, the specific anticoagulant (EDTA or heparin), the counter-ion (Na, K2, K3, Li), and the type (glass versus polypropylene) or brand of the tube. Some investigators favor heparin for plasma samples and state that EDTA should be avoided (Barri and Dragsted 2013), whereas others favor EDTA (Yin et al 2015; Metabolon 2017). Citrate should be avoided when studying central metabolism. There may also be artefactual features from surfactants and detergents used to treat the subject’s skin (Denery et al 2011). The best advice is to perform pretesting and, above all, to be consistent throughout the sample acquisition phase of the experiment, so that all samples are handled identically. Urine samples, which do not require special collection tubes (and should generally not include additives), must also be considered carefully. Metabolite concentrations may vary significantly in an individual throughout the course of the day based on hydration and diet. Often this is managed by normalization to creatinine levels, but that process may be compromised in kidney dysfunction. Alternative normalization methods include osmolality and the MS “total useful signal” (MSTUS), in which the summed intensity of the many (hundreds or thousands of) ions common to all samples is used for scaling (Warrack et al 2009). Other factors to take into account when acquiring any animal or human samples include control of diet (or fasting time) to prevent exogenous metabolite interferences and to minimize variation, in addition to variables associated with sample storage and repeated freeze/thawing (Alvarez-Sanchez et al 2010).
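A minimal sketch of the MSTUS idea, assuming a small hypothetical samples-by-features intensity matrix, is shown below; real implementations differ in how the shared ions are selected.

```python
# Minimal sketch of MS "total useful signal" (MSTUS) normalization: each
# sample's feature intensities are scaled by the summed intensity of features
# detected in every sample. The intensity matrix here is hypothetical.
import numpy as np

# rows = samples, columns = features; 0 means "not detected"
X = np.array([[1.0e5, 3.0e4, 0.0,   8.0e3],
              [2.0e5, 5.0e4, 1.2e4, 1.5e4],
              [1.5e5, 4.0e4, 9.0e3, 1.1e4]])

shared = np.all(X > 0, axis=0)          # features present in all samples
mstus = X[:, shared].sum(axis=1)        # per-sample total useful signal
X_norm = X / mstus[:, None]             # scale every feature by that total
print(X_norm)
```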

Tissue/cell harvesting, metabolite extraction, and quenching of metabolism

Quenching metabolism during extraction is a critical step in any metabolomics experiment. The need to effectively deproteinize the biological sample while solubilizing the metabolome is of course important, but if additional metabolism or compound degradation occurs during this process, the readout by LC-MS may no longer be biologically valid. Certain compound classes are especially labile and are represented in many of the primary energy pathways. These include sugar phosphates (glycolysis and pentose phosphate pathways), nucleotides (ATP, GTP, etc.), and coenzymes and cofactors whose stability, especially in terms of phosphorylation state, is greatly influenced by factors such as pH and temperature (Sellick et al 2011; Vuckovic 2012; Leon et al 2013). These are for the most part intracellular metabolites and are rarely considered when extracting extracellular material such as plasma (or serum), CSF, or urine. Researchers interested in bacterial metabolism and flux analysis have increasingly considered such issues, often employing filtration systems that avoid perturbation from centrifugation and allow quick washing and sampling (Aragon et al 2006; McCloskey et al 2014). There may be advantages to blood spots in limiting ex vivo metabolism (Hill et al 2017), but that approach may entail differences in recovery and stability of different classes of metabolites (Koulman et al 2014). Adherent cell lines present a unique set of challenges for limiting artifactual metabolic perturbation. In general practice, adherent mammalian cell cultures are washed with PBS, trypsinized, harvested, and centrifuged for further media washing, a process that has been implicated as poorly compatible with preservation of the metabolome (Teng et al 2009). This has recently led to alternative, creative strategies for quick harvesting and quenching of cellular material (Lorenz et al 2011; Martano et al 2015), where the intracellular energy metabolites mentioned above may be critical to the study. The commonality among these methods is that trypsinization and centrifugation steps are avoided, and cells are quenched quickly, directly on the surface on which they are grown. They are then scraped off manually, often after freezing, before final preparation for LC-MS analysis. Validation of proper quenching can be performed by calculating ratios of the intact to degraded forms of labile metabolites such as nucleotides. For example, concentrations of ATP, ADP, and AMP can be incorporated into the adenylate energy charge, ([ATP] + 0.5[ADP])/([ATP] + [ADP] + [AMP]), and compared to established ranges in various cell types, generally centered near 0.9 under normal conditions (Chapman et al 1976).
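The energy charge check described above reduces to a one-line calculation; the sketch below uses hypothetical ATP/ADP/AMP values purely for illustration.

```python
# Minimal sketch: adenylate energy charge as a quenching-quality check.
# Peak areas (or concentrations) for ATP, ADP, and AMP are hypothetical;
# values well below the ~0.9 expected in healthy cells suggest ex vivo degradation.

def energy_charge(atp, adp, amp):
    return (atp + 0.5 * adp) / (atp + adp + amp)

print(energy_charge(atp=9.0, adp=1.2, amp=0.3))   # ~0.91, consistent with good quenching
```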

A variety of extraction/quenching methodologies have been compared for tissue that has been excised or biopsied from animals. Issues that have warranted extensive investigation include the need to cryo-freeze tissue, the use of freeze clamping, and variables associated with animal anesthesia and euthanasia methods (Belanger et al 2002; Want et al 2013; Overmyer et al 2015). In addition, the extraction solution used can have a major influence on the scope of the metabolome observed. For an untargeted metabolomics experiment that assumes many compound classes will be represented, it is critical to test the extraction efficiency of both highly polar metabolites, such as organic and amino acids, and various lipid classes of varying hydrophobicity. To solubilize both polar and hydrophobic compounds, biphasic strategies such as the Bligh-Dyer (Bligh and Dyer 1959) or Folch (Folch et al 1957) methods, or one of several variations (Rose and Oklander 1965; Jensen 2008), are commonly used. These primarily use a combination of chloroform, methanol, water, and in some cases acid, resulting in a separation of the aqueous and organic solvent layers with a protein/DNA layer in between. More recently, a method that uses methyl-tert-butyl ether (MTBE) instead of chloroform has improved two important aspects of biphasic extraction (Chen et al 2013): 1) MTBE is less toxic than chloroform and safer to handle, and 2) the DNA/protein pellet localizes to the bottom of the tube following centrifugation, allowing simple removal of the two phases without contamination by the insoluble material. A variety of monophasic methods are also widely used; these include solvents such as methanol, acetonitrile, ethanol, and perchloric acid, among others, used either cold or boiling, and are preferred for certain classes of compounds (Kolarovic and Fournier 1986; Canelas et al 2009; Dietmair et al 2010; Yanes et al 2011). It is important to note that there are significant differences in the coverage of the metabolome among the various extraction methods, muddying the true definition of “untargeted” metabolomics.

Analytical

Once sample acquisition and extraction have been achieved, the analytical aspects of LC-MS analysis are the next key part of a successful experiment; though seemingly straightforward, they involve a number of choices that the experimenter must make. As with the extraction steps described above, none of these will be ideal for all subsets of metabolites. The analytical choices, which include sample resuspension, chromatography, and instrumentation, will determine the breadth of the metabolome covered and the degree of reliability of the collected data. The following section highlights some of the areas where careful consideration must be applied.

Importance of chromatography

Though several groups have published methods that utilize direct injection into mass spectrometers for analysis of metabolites (Madalinski et al 2008; Fuhrer et al 2011), the vast majority of researchers utilize inline chromatography in their platforms to minimize ionic suppression and increase both sensitivity and specificity of the analytes they report. Added complexity, be it in the form of non-volatile salts, buffers, or even other metabolites, can greatly influence the ionization efficiency of any given compound and introduce interferences that confound accurate reporting of data, issues that can be greatly alleviated with successful chromatographic methods. From the early days of untargeted LC-MS based metabolomics, dating back a little over a decade, C18 reverse phase columns have been a stalwart of many platforms. There are many iterations of C18 columns, and nearly every manufacturer sells a version of these, though with sometimes distinguishing features that result in varying degrees of performance. Differences in particle technology, particle size, uniformity, column dimensions, and other factors will affect binding, separation, and elution properties, as well as back pressure. Smaller particle sizes result in increased column efficiency but cause an increase in back pressure that necessitates ultra high performance LC (UHPLC) systems and fast-acquisition mass spectrometers to match narrow elution profiles (Guillarme et al 2010). Ultimately, though, their frequent use throughout the LC-metabolomics era is based on their high reproducibility, which is a necessity for accurate run-to-run alignment, their versatility in retaining many non-polar and hydrophobic compound classes, and the simple mobile phase compositions (often acetonitrile/water or methanol/water gradients with small amounts of additives such as formic acid) required for their use. The latter factor ensures ideal compatibility with electrospray (ESI) and atmospheric pressure chemical ionization, the two primary LC-MS ionization techniques. The weakness of these columns is in the polar regime of the metabolome: many such compounds will have poor retention, eluting near the solvent front of a run where the greatest amount of ionic suppression and potential interference resides. Unfortunately, many of the metabolites of interest, especially in the realm of primary energy metabolism (e.g., organic acids and amino acids) related both to human disease and to intracellular studies, are highly polar. An example of this was an early study involving our group that demonstrated the utility of untargeted metabolomics to detect known biomarkers of IEM (Wikoff et al 2007). In this study of a small group of patients with propionic acidemia, methylmalonic acidemia, and controls, patients with propionic and methylmalonic acidemia were of course distinguished from controls by an elevation of propionyl-carnitine and related acylcarnitines. The distinction between methylmalonic and propionic acidemia, however, was less clear, because methylmalonyl-carnitine was not detected (presumably attributable to the lack of a stationary phase suitable for such a highly polar compound). That study demonstrated feasibility, but also limitations: no single chromatography will permit “global” untargeted metabolomics.

Normal phase and HILIC columns, with stationary phases containing polar groups such as amino, cyano, and silica, among others, are now frequently employed for additional runs to analyze the polar chemical realm (Jandera and Janas 2017; McCalley 2017). In the past, these columns were more difficult to use reproducibly, as they generally required longer re-equilibration times and more complex mobile phases incorporating buffers and higher ionic strength for efficient metabolite elution. A more recent alternative to normal phase is the use of reverse phase stationary phases containing polar groups, such as pentafluorophenyl (PFP) columns (Csató et al 1990), which we have previously validated for use in a combined targeted/untargeted metabolomics platform (Gertsman et al 2014). Various versions of these exist, including with a propyl (PFPP) linker (manufacturers include Phenomenex, Restek, ES Industries, and UCT) or combined with a C18 stationary phase for a mixed-mode effect (Mac-Mod). Mixed-mode columns, which generally utilize both non-polar and polar stationary phases to extend versatility in metabolite selection, can often be used under standard reverse phase conditions and have been a preferred choice in some untargeted studies (Yanes et al 2011; Gertsman et al 2015). These columns have drawbacks as well, including poor elution of polar lipids and other compound classes that carry both polar and hydrophobic moieties.

Yet another important aspect of chromatography lies in the ability to separate isomers, isobaric compounds, and other interferences. An example of an unexpected and often ignored interference is a co-eluting compound that has a different parent mass but undergoes an in-source fragmentation that contributes to the signal of the other. This can occur for the organic acids fumarate and malate, for example, where a water loss from malate (m/z 133.014) during electrospray ionization in negative ion mode produces an m/z 115.004 ion that is indistinguishable in MS and even MS/MS profiles from fumarate (Fig. 1a). If chromatography cannot distinguish these two, fumarate, a critical TCA cycle intermediate, will be falsely reported. An example of the need for chromatographic distinction of isomers can be seen in Fig. 1c, where a certain C18 column was unable to resolve 2- and 3-hydroxybutyrate under a typical reverse phase gradient, while a C18-PFP column could (Fig. 1b). Though it is nearly impossible to qualify the separation of all such possible pairs of isomers, it is worthwhile to qualify a platform for the critical metabolites that are routinely measured and reported (e.g., major energy pathways). In the above example, 2- and 3-hydroxybutyrate stem from completely different metabolic pathways (threonine/methionine metabolism and fatty acid metabolism, respectively), and their combined signal will obscure potentially significant results from either.

Fig. 1

Separation of isomers and isobaric species. (a) Water loss during ionization of the malate ion results in a peak that is indistinguishable in mass from fumarate, necessitating their chromatographic resolution. (b) A PFP-containing mixed-mode column resolves 2- and 3-hydroxybutyrate, isomers that were not well distinguished using a common C18 stationary phase (c)
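To illustrate why the malate/fumarate interference in Fig. 1a cannot be resolved by mass alone, the following sketch compares the m/z of an in-source water loss from malate with the [M-H]- m/z of fumarate; the monoisotopic masses are standard values, and the comparison simply reports the mass difference in ppm.

```python
# Minimal sketch: checking whether an in-source water loss from one analyte
# collides with another analyte's [M-H]- mass within instrument tolerance.
PROTON = 1.007276
H2O = 18.010565

def mz_neg(neutral_mass):              # [M-H]- m/z
    return neutral_mass - PROTON

malate = 134.021523                    # C4H6O5 neutral monoisotopic mass
fumarate = 116.010959                  # C4H4O4 neutral monoisotopic mass

water_loss = mz_neg(malate) - H2O      # in-source fragment of malate
target = mz_neg(fumarate)
ppm = abs(water_loss - target) / target * 1e6
print(f"malate-H2O = {water_loss:.4f}, fumarate = {target:.4f}, delta = {ppm:.2f} ppm")
# The two are essentially identical in mass, so only chromatographic
# separation prevents mis-assignment of the m/z 115.004 signal.
```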

Analytical variation: the case for internal standards and/or QCs

An obstacle in comparing peak area differences from one run to another is that signal variation occurs for any given compound of interest. Some of this is due to instrument signal drift over the course of a batch, while other factors include slight differences in the matrix of one sample compared to another, resulting in differences in ionic suppression (especially with different sample types, e.g., plasma vs. urine). A clear example of such variation is shown in Fig. 2 (unpublished data from one of our own studies), comparing non-normalized peak areas of palmitoylcarnitine (C16-carnitine) to peak areas normalized to a deuterated version of the compound spiked in during extraction. The figure shows that one of the lower values in the un-normalized estimate (peak area) was actually one of the higher measurements for that group when normalized appropriately (Fig. 2b). Overall, omitting a stable isotope for comparison would not have changed the mean value of the metabolite in this cohort, but the concentration would have been underestimated if the single sample had been studied individually. Appropriate stable isotopes can be especially useful in instances where sample numbers are low, or where compounds fall in chromatographic regions with known ionic suppression. Many groups now make use of stable isotope dilution in untargeted experiments, which is useful not only for normalization of endogenous compounds but also for assessing drift in both signal intensity and retention time (Sysi-Aho et al 2007; Miller et al 2015). Stable isotope dilution is especially helpful in longitudinal studies acquired over years, where there may be large differences in instrument performance, different column batches, or even different operators. However, care must be taken to ensure that standards are adequately assessed for stability and degradation during storage times relevant to the breadth of the study. Also, if one is to use internal standards for untargeted studies, it is necessary to match the chemical diversity of the run with the standards selected, making sure to cover the width of the chromatographic run, as intensity drift may not affect all compounds or sections of the run equally. As this can be cost prohibitive or otherwise burdensome, alternatives to internal standards are used to compensate for analytical errors. These include replicate samples or QCs run at intervals throughout a batch (Dunn et al 2011; Wehrens et al 2016), and, in one method, run in serially diluted form throughout the batch to test for signal linearity of different compounds (Kouassi Nzoughet et al 2017). A variety of processing tools have also been developed to deal with signal and retention time drift, as well as batch effects and outliers that can plague data analysis, as discussed in the post-analytical section of this review (Salerno et al 2017; Thonusin et al 2017).

Fig. 2

Diminished variation with stable isotope dilution. (a) Comparison of the endogenous and stable isotope peaks of C16-carnitine from a 9-sample (plasma) cohort. (b) Box plot depicting non-normalized peak areas of C16-carnitine compared to those normalized to 2H3-C16-carnitine in identical samples. One of the samples, labeled A, has a peak area in the lower part of the C16-carnitine range, while residing in the upper half of the range following stable isotope normalization
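A minimal sketch of the normalization shown in Fig. 2, assuming hypothetical peak areas and an equal spike of deuterated standard in each sample, follows.

```python
# Minimal sketch: normalizing an endogenous peak area to a co-extracted
# deuterated internal standard (stable isotope dilution). Peak areas are
# hypothetical; each sample received the same amount of labeled standard.
import numpy as np

c16_carnitine    = np.array([8.1e6, 6.4e6, 9.8e6, 5.2e6])   # endogenous peak areas
d3_c16_carnitine = np.array([2.0e6, 1.3e6, 2.4e6, 1.0e6])   # spiked 2H3 standard

response_ratio = c16_carnitine / d3_c16_carnitine
print(response_ratio)   # ratios correct for per-sample suppression/recovery
# With a calibration curve of ratio vs. known concentration, these ratios
# could be converted to absolute concentrations.
```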

Post-analytical

Pre-processing

Following completion of the mass spectrometry runs, a number of pre-processing and post-processing tools are available for identifying analytes of interest from untargeted data sets. Though many researchers incorporate specific target compounds that are always integrated and compared in such runs, the general processing strategy in untargeted workflows is to focus on compounds that are statistically altered. The runs must first be properly aligned, either with the aid of several freely available software packages (Lommen 2009; Tautenhahn et al 2012a, b; Li et al 2017) or with the many proprietary software packages often distributed by MS vendors. Non-linear alignment is preferred in pre-processing, as chromatographic shifts are often non-uniform throughout the run, and improved alignment enables more accurate peak selection and integration when unique analytes with similar m/z have small deviations in elution time. Metabolomics software packages often allow signal normalization as a pre-processing step, either with the use of internal standards or by other methods. Other pre-processing options prior to thorough statistical analysis include the removal of outliers and other batch effects. The following section highlights some of the intricacies and bottlenecks associated with the processing and analysis of pre-processed data.
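As a schematic illustration (not the algorithm of any particular package), the sketch below matches a feature between two aligned runs using an m/z tolerance in ppm and a retention-time window; the tolerances and feature lists are assumptions.

```python
# Schematic sketch: matching a feature between two aligned runs by m/z
# tolerance (ppm) and retention-time window. Tolerances are illustrative.

def match_feature(query, features, ppm_tol=10.0, rt_tol=0.2):
    """query and features are (mz, rt_minutes) tuples; returns the closest
    m/z match within both tolerances, or None."""
    best, best_err = None, None
    for mz, rt in features:
        ppm = abs(mz - query[0]) / query[0] * 1e6
        if ppm <= ppm_tol and abs(rt - query[1]) <= rt_tol:
            if best_err is None or ppm < best_err:
                best, best_err = (mz, rt), ppm
    return best

run_b = [(115.0036, 4.31), (133.0142, 3.05), (162.1125, 7.80)]
print(match_feature((115.0037, 4.25), run_b))   # -> (115.0036, 4.31)
```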

Compound identification

Identification of unknown compounds in untargeted metabolomics is considered the greatest bottleneck of data interpretation and requires a number of tools and proper instrumentation to overcome successfully. A high resolution mass spectrometer (Q-TOF, Orbitrap, or FT-ICR instruments) using a standard reverse phase platform may detect many thousands of peaks in a single run, the number depending on factors such as instrument sensitivity, solvent composition and purity, matrix complexity, and in-source fragmentation. Each peak does not necessarily represent a unique metabolite, though: a single metabolite may be represented by a dozen or more features, including adducts (salt or solvent complexes), dimeric or even trimeric states, and fragments produced during ionization or transmission of ions. For compounds of interest, it is therefore important to identify the elemental composition of the ion, and useful guidelines have been published to narrow down the possibilities for any given ion (Kind and Fiehn 2007; Watson 2013). Common considerations to reduce the number of possibilities include: 1) the nitrogen rule (better suited for masses <500 Da), which dictates that a compound with an even nominal mass will have an even number of nitrogen atoms, and one with an odd nominal mass an odd number of nitrogens; 2) likely hydrogen/carbon ratios and elemental probability analysis; and 3) the isotopic distribution of the analyte, as atoms have different isotopic abundances. In addition, since atoms have unique mass defects due to differences in nuclear binding energy (e.g., the most common isotope of sulfur, 32S, has a mass of 31.972, while 12C is exactly 12.000), high resolution mass spectrometry can use such properties to narrow down the possibilities. In Fig. 3, we show the parent mass of oxidized glutathione analyzed on an Orbitrap Lumos instrument at three different resolutions: 30,000, 120,000, and 500,000. Most current Q-TOF instruments have ~30,000 resolution, and at this resolution (along with accurate mass) we demonstrate that the third peak (M + 2) for oxidized glutathione (GSSG) has a lower non-integer mass than the previous two isotopic peaks due to the mass defect of 34S, the next most abundant isotope of sulfur after 32S. This shift to the left can be identified by a Q-TOF, but the distribution of the isotope-containing atoms within this peak is not clear. When the resolution is increased to 120,000, one can see a shoulder next to that peak that distinguishes the carbon and nitrogen isotopes from the sulfur, the latter being more predominant. At an ultra high resolution of 500,000, the two forms are very clearly separated and can actually be integrated accurately, enabling one both to implicate and to rule out various combinations of atoms present in the analyte. In addition to resolution, high mass accuracy (<~2–3 ppm in most instruments used for untargeted metabolomics) helps further narrow down possible elemental compositions. A number of chemical libraries can be searched for annotated compounds that match a possible elemental composition, including METLIN (Tautenhahn et al 2012a, b), HMDB (Wishart et al 2009), ChemSpider (Williams and Tkachenko 2014), PubChem (Wang et al 2009), GnPS (Wang et al 2016), LIPID MAPS (Sud et al 2012), MassBank (Horai et al 2010), Metabolomics Workbench (Sud et al 2016), and MetaCyc (Caspi et al 2014).

Fig. 3

High resolution for elemental composition reconstruction. Oxidized glutathione (GSSG) was collected on an Orbitrap Fusion Lumos (Thermo Fisher) mass spectrometer. The M + 2 isotope is shown to have a lower non-integer mass than the previous isotope, potentially indicative of one or more sulfurs present, since the 34S isotope has a smaller fractional mass than 32S. GSSG was collected at 30 K, 120 K, and 500 K resolution to demonstrate how ultra high resolution allows one to separate the M + 2 peak of GSSG into separate peaks, one reflective of 34S, and the other reflective of 13C and 15N
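Two of the filters described above, the nitrogen rule and accurate-mass matching of candidate formulas, are easy to express in code; the sketch below uses GSSG-like values, and the decoy composition and 3 ppm tolerance are illustrative assumptions (a real workflow would also score isotope patterns).

```python
# Minimal sketch: nitrogen rule and accurate-mass filtering of candidate
# elemental compositions for a measured neutral monoisotopic mass.
MONO = {"C": 12.0, "H": 1.0078250319, "N": 14.0030740052,
        "O": 15.9949146221, "S": 31.97207069}

def formula_mass(counts):
    return sum(MONO[el] * n for el, n in counts.items())

def passes_nitrogen_rule(counts, measured_mass):
    # even nominal mass -> even number of nitrogens (for masses < ~500 Da)
    return (round(measured_mass) % 2) == (counts.get("N", 0) % 2)

def within_ppm(calc, measured, tol=3.0):
    return abs(calc - measured) / measured * 1e6 <= tol

measured = 612.1520   # hypothetical measured neutral mass (GSSG-like)
candidates = [
    {"C": 20, "H": 32, "N": 6, "O": 12, "S": 2},   # glutathione disulfide
    {"C": 20, "H": 32, "N": 6, "O": 16},           # decoy: same nominal mass, no sulfur
]
for c in candidates:
    calc = formula_mass(c)
    print(c, round(calc, 4),
          "nitrogen rule:", passes_nitrogen_rule(c, measured),
          "mass match:", within_ppm(calc, measured))
```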

An important experiment for identification of unknown compounds involves the fragmentation of isolated ions of interest. Depending on the instrument, collision induced dissociation (CID), electron transfer dissociation (ETD), or electron capture dissociation (ECD) is used for this experiment. Many workflows allow automatic selection of a number of ions during each scan cycle for fragmentation (data dependent acquisition) to provide a library of MS/MS spectra that can later be used for compound identification. For trap instruments, MSn can be useful for more thorough fragmentation and improved structure elucidation, where daughter ions are isolated and further fragmented (Rojas-Cherto et al 2012; Vaniya and Fiehn 2015). Several of the repositories mentioned above, including METLIN, HMDB, LIPID MAPS, and GnPS, have libraries of MS/MS data for matching unknown spectra. XCMS2, an updated version of the very widely used XCMS metabolomics software package, enables fragments from MS/MS spectra to be searched against the METLIN library during data processing and scored for their similarity to known product ion spectra to enable compound identification (Benton et al 2008), while an online version can also be used for both analysis and spectral library searches (Tautenhahn et al 2012a, b). GnPS, a recently established repository for natural products, allows the metabolomics community to upload data acquisition files online, which can be searched against previously identified product ion spectra and scored for possible matches (Wang et al 2016). The MS community can update the annotations and grade the quality of spectra submitted to this database. These automated search tools greatly reduce the time required to manually compare new data to existing spectral libraries. The future of quick compound identification and thorough untargeted metabolomics analysis will in large part be tied to the advancement of such spectral libraries and how researchers add, share, and search spectral data, as this bottleneck is much too large for any laboratory to tackle independently.
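The core idea behind such spectral library searches, scoring binned fragment spectra by cosine similarity, can be sketched as follows; the spectra, bin width, and lack of m/z weighting are simplifying assumptions relative to production tools.

```python
# Minimal sketch of spectral matching by cosine similarity after binning
# fragment m/z values. Spectra and bin width are illustrative.
import math
from collections import defaultdict

def cosine_score(spec_a, spec_b, bin_width=0.01):
    """spec_* are lists of (mz, intensity); returns cosine similarity in [0, 1]."""
    def binned(spec):
        d = defaultdict(float)
        for mz, inten in spec:
            d[round(mz / bin_width)] += inten
        return d
    a, b = binned(spec_a), binned(spec_b)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

library = [(84.0446, 30.0), (130.0503, 100.0), (56.0497, 15.0)]
query   = [(84.0448, 25.0), (130.0498, 90.0)]
print(f"cosine similarity: {cosine_score(library, query):.3f}")
```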

If an unknown can be matched by some of the methods listed above, such as accurate mass, isotopic distribution, and fragmentation pattern, other factors should also be considered, such as whether the elution time of the unknown is consistent with the chemical class of the candidate compound, and whether the sample type is likely to contain such a metabolite. At this stage, purchasing an authentic standard is the best way to fully confirm identity, which can be difficult or cost prohibitive if custom synthesis is required. Nonetheless, such an investment is often necessary for targeted quantitation or further study of the compound of interest. Misidentification is obviously a major pitfall for data interpretation, and though compound matching using the tools described above can be very helpful, issues like isobaric or even isomeric species will often pose an additional hurdle. Having effective chromatography for the compound class of interest that can distinguish potential isomers is critical for final confirmation. Standards and guidelines for reporting identification or annotation of compounds have been authored by the metabolomics standards initiative (MSI), which has outlined criteria for reporting new compound identities in the literature (Fiehn et al 2007; Salek et al 2013). Within these reports, the MSI outlines the recommended levels of compound identification, ranging from the highest (level 1), where properties of an authentic standard are compared to experimental data, down through putatively annotated and characterized compounds (levels 2 and 3), and finally unknowns (level 4). De novo identification of a compound that does not have an accessible fragmentation pattern is especially difficult, but this is unfortunately the case for most analytes from a typical metabolomics study. In addition to elemental composition identification, a mass spectrometrist can also use tools to perform in silico fragmentation of candidate structures, focusing especially on functional groups that are likely to fragment and ionize well, and then match these to the acquired MS/MS spectra.

Statistical analysis

Untargeted metabolomics experiments generally use a combination of univariate and multivariate analysis to help identify compounds and pathways that are altered between cohorts. There are many commercial as well as freely available statistical packages that can perform these functions, and in recent years several freely available online tools have been introduced that carry a wide range of analysis features geared specifically toward metabolomics. Two widely used online platforms are MetaboAnalyst and the Metabolomics Workbench mentioned above. Data can be uploaded to these sites, normalized and scaled as necessary, and then analyzed by tools such as t-tests, ANOVA, principal component analysis (PCA), partial least squares discriminant analysis (PLSDA), orthogonal projections to latent structures discriminant analysis (OPLSDA), heatmaps, dendrograms, volcano plots, and correlation analysis, among other useful tools for data reduction and chemometrics.

Both univariate and multivariate analyses require special considerations to limit false interpretation of metabolomics data. Multivariate analysis is often a useful strategy for differentiating cohorts based on the covariances, or correlations, of the many independent variables. Prior to using such methods, the signals of the analytes are often scaled so that high intensity ions do not overly bias the modeling. Common scaling methods such as mean centering or Pareto scaling (Tugizimana et al 2016) are used, depending on whether one favors treating all analytes equally regardless of intensity (mean centering), or believes that high intensity analytes (compounds with high concentrations and/or high ionization efficiencies by ESI-LC-MS) should retain greater weight because they are measured with higher confidence (Pareto). An unbiased, or unsupervised, method usually used as a first pass in evaluating metabolomics data is principal component analysis (PCA), which reduces the dimensionality of the many variables into principal eigenvectors that capture the variance (Jolliffe and Cadima 2016). PCA is blind to any classification in the data, and since it considers the relationships of all the independent variables simultaneously, it is generally not useful as a modeling tool in comprehensive metabolomics studies, where most of the variables are irrelevant. For generating models that are more apt to find the independent variables that best discriminate the classifiers (dependent variables), researchers most often use PLSDA or OPLSDA. These are termed supervised methods, as the user inputs the classifiers (Y) along with the independent variables (X), which are projected in multi-dimensional space to enhance the variation related to Y (Barker and Rayens 2003). OPLSDA differs from PLSDA in that it separates the variation in X that is uncorrelated with Y from the predictive variation, whereas in PLSDA, variation not correlated with the Y-classifiers is still present in the model (Trygg and Wold 2002). The predictive power of both methods is thought to be the same, though (Bylesjö et al 2006).
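For concreteness, the two scaling options mentioned above can be written as simple matrix transforms; the intensity matrix below is hypothetical.

```python
# Minimal sketch of the two scaling options discussed above, applied to a
# hypothetical samples-by-features intensity matrix before PCA/PLSDA.
import numpy as np

def mean_center(X):
    return X - X.mean(axis=0)

def pareto_scale(X):
    # divide centered values by the square root of each feature's std dev,
    # shrinking but not eliminating the influence of high-intensity features
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

X = np.array([[1.0e6, 2.0e3, 4.0e2],
              [1.4e6, 1.5e3, 9.0e2],
              [0.8e6, 2.4e3, 3.0e2]])
print(mean_center(X))
print(pareto_scale(X))
```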

The pitfalls of these supervised methods are generally associated with overfitting. These methods will often show apparent distinction of cohorts even from randomly generated data, as they are designed to accentuate any covariances that differentiate the response (Y) variables, and with multiple-comparison testing using large numbers of independent variables and relatively low numbers of replicates, false positives are a given. From PLSDA plots, for example, a variable importance in projection (VIP) score describes the loadings that fit the model and helps researchers determine which analytes should be kept in further iterations and which should be removed (those not related to variation between cohorts). This process can itself lead to further overfitting of a model. It is therefore critical to perform validation analysis when generating these models, to better ensure that false relationships between metabolites are not causing misinterpretation of the data. Permutation tests can assess whether the assigned classes from the experiment are any more significant than randomly assigned class labels applied to the same samples (Golland and Fischl 2003). Cross-validation is an important process for model development and refinement, where the cohorts are split into smaller subsets and the model is repeatedly fitted with exclusion of various subjects (Westerhuis et al 2008; Wheelock and Wheelock 2013). Ideally, the samples can be randomly divided into a training set, validation set, and test set, where individual models can be tested and evaluated on unique sample subsets, and then applied both to other sample groups and to the entire sample set. A cross-validated correlation (Q2) after subsequent iterations of such modeling can be assessed and compared to the R2 of the total model fit (Wheelock and Wheelock 2013). One point often not discussed in modeling from untargeted metabolomics is the use of unidentified metabolites in the model. Though one can try to eliminate contaminants, adducts, and other multiply represented features during pre-processing, a number of unknown features may persist, many of which are not biologically relevant, and these are thought to comprise the majority of features from an untargeted metabolomics run (Benton et al 2015). Some of these may be critical to the findings and are often part of the reason for choosing untargeted metabolomics in the first place. True unknowns that are significant to a data set should be identified if possible, but as mentioned previously, this is often a difficult task. If the unknowns cannot be identified, one is left to wonder how they should be considered in a published multivariate model.
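A minimal sketch of a permutation test, assuming scikit-learn's PLSRegression with a 0/1 class vector as a stand-in for PLSDA and purely random data, is shown below; with no real class structure, the permutation p-value should be non-significant.

```python
# Minimal sketch: permutation test of a two-component PLS model on random data.
# PLSRegression with a binary Y is used here as a stand-in for PLSDA.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))      # 20 samples, 200 features, pure noise
y = np.tile([0, 1], 10)             # two cohorts of 10, interleaved

score, perm_scores, pvalue = permutation_test_score(
    PLSRegression(n_components=2), X, y, cv=5, n_permutations=200
)
print(f"cross-validated R2: {score:.2f}, permutation p-value: {pvalue:.2f}")
```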

Univariate analysis is one of the most common approaches to identify specific analytes that are significantly altered between cohorts. Assessment of normality (parametric vs. non-parametric) should be done prior to choosing the univariate method, most often either Student's t-tests or ANOVA. The statistical significance (i.e., p-values) of these tests is harder to interpret in untargeted metabolomics data, where thousands of individual features (or hypotheses) are being tested with comparably few unique samples. This multiple comparisons problem has led to various approaches to correction in univariate testing, not just in the field of metabolomics, but in other omics disciplines such as genomics/transcriptomics and proteomics. The most conservative approach has been Bonferroni correction, where the significance threshold is divided by the total number of hypotheses tested (Dunn 1961). Such a procedure can limit false positives (type I errors), but unfortunately results in higher numbers of false negatives (type II errors) (Perneger 1998). False discovery rate (FDR) approaches have been developed to apply corrections that are more careful in limiting false negatives (Benjamini and Hochberg 1995; Genovese and Wasserman 2002). One such approach, made popular in genome-wide studies but also used in metabolomics, is the q-value correction, an FDR approach that compares the distribution of p-values from a data set to the distribution expected when all features are null (e.g., no difference between control and disease) in order to calculate the correction (a q-value from each p-value) most appropriate for a given dataset (Storey and Tibshirani 2003). Though there is no perfect method to remove false positives, an appropriate correction for multiple testing is nonetheless required in untargeted metabolomics reporting and interpretation.
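As an illustration of FDR-style correction (here Benjamini-Hochberg rather than the q-value method itself), the sketch below adjusts a set of hypothetical per-feature p-values using statsmodels.

```python
# Minimal sketch: Benjamini-Hochberg FDR adjustment of per-feature p-values
# from univariate tests. The p-values here are illustrative.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.0004, 0.0031, 0.012, 0.049, 0.21, 0.63, 0.88])
reject, qvals, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for p, q, r in zip(pvals, qvals, reject):
    print(f"p={p:.4f}  q={q:.4f}  significant at FDR 0.05: {r}")
```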

Conclusion

Untargeted metabolomics is an exciting technology for discovering novel metabolic perturbations in various biological systems. As LC-MS metabolomics methods have developed over the last decade or two, sophisticated targeted methods have greatly expanded the breadth of the metabolome that can be accurately quantified (Zhou et al 2016). Still, the allure of discovering novel biomarkers in disease states keeps the untargeted approach valuable: it allows the investigator to evaluate a diverse swath of the metabolome, with less chance of missing an association than when a particular analyte is targeted on the basis of a single hypothesis. A multitude of caveats, however, accompany the choice of untargeted metabolomics. We have attempted to address various aspects of untargeted metabolomics, including pre-analytical, analytical, and post-analytical aspects, all of which have associated pitfalls that can jeopardize the usefulness of the data. From sample acquisition to sample extraction and chromatographic selection, one can heavily bias the metabolites resolved, necessitating careful scrutiny and validation of each facet of the experiment. Identification of novel compounds of interest presents another obstacle, but fortunately, as the field grows, better tools have become available to address such issues. As these platforms further develop, we believe future untargeted studies will help fill in the many gaps of uncharacterized metabolic perturbation in biological systems, and further benefit the clinical community by discovering novel diagnostic and therapeutic markers in disease.