Automated metabolite identification from biological fluid 1H NMR spectra

Filntisi, Arianna; Fotakis, Charalambos; Asvestas, Pantelis; Matsopoulos, George K.; Zoumpoulakis, Panagiotis; Cavouras, Dionisis

doi:10.1007/s11306-017-1286-8


Original Article
Published: 25 October 2017

Volume 13, article number 146, (2017)



Metabolomics



Arianna Filntisi¹,
Charalambos Fotakis²,
Pantelis Asvestas³,
George K. Matsopoulos¹,
Panagiotis Zoumpoulakis² (ORCID: orcid.org/0000-0001-9348-6078) &
Dionisis Cavouras³


Abstract

Introduction

Metabolite identification in biological samples using Nuclear Magnetic Resonance (NMR) spectra is a challenging task due to the complexity of the biological matrices.

Objectives

This paper introduces a new, automated computational scheme for the identification of metabolites in 1D ¹H NMR spectra based on the Human Metabolome Database.

Methods

The methodological scheme comprises the sequential application of preprocessing, data reduction, metabolite screening and combination selection.

Results

The proposed scheme has been tested on the 1D ¹H NMR spectra of: (a) an amino acid mixture, (b) a serum sample spiked with the amino acid mixture, (c) 20 blood serum samples, (d) 20 human amniotic fluid samples, and (e) 160 serum samples from a publicly available database. The methodological scheme was compared against widely used software tools, exhibiting good performance in terms of correct assignment of the metabolites.

Conclusions

This new robust scheme automatically identifies peak resonances in ¹H-NMR spectra with high accuracy and less human intervention, and has a wide range of applications in metabolic profiling.


1 Introduction

Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) have emerged as key technologies for metabolite analysis (Lenz and Wilson 2007; Lindon and Nicholson 2008; Larive et al. 2015) by examining various biofluids and elucidating biomarkers of disease (Fischer et al. 2014; Jobard et al. 2014; Smolinska et al. 2012; Deng et al. 2016; Kang et al. 2015; Kordalewska and Markuszewski 2015; Psychogios et al. 2011). One of the main issues in these studies is metabolite identification, since interpreting ¹H NMR spectra is a challenging, time-consuming task (Li et al. 2013; Smolinska et al. 2012).

To this end, computational metabolite recognition has been the objective of numerous research efforts (Domingo-Almenara et al. 2016; Chignola et al. 2011; Mihaleva et al. 2009; http://www.chenomx.com). BQuant is based on Bayesian modelling and addresses metabolite ¹H-NMR detection as a variable selection problem (Zheng et al. 2011). A probabilistic method based on Markov chain Monte Carlo (MCMC) and Metropolis–Hastings block updates has been implemented in the BATMAN package (Hao et al. 2012). Mercier et al. (2011) have proposed an automated spectral fitting algorithm (AutoFit). Bayesil is a metabolite identification tool based on a sequential Monte Carlo inference method and a probabilistic graphical model (Ravanbakhsh et al. 2015). MetaboHunter is a software tool that matches the input ¹H NMR spectrum or peak list with the peaks of the reference metabolites from HMDB and MMCD (Tulpan et al. 2011). Despite the amount and sophistication of this dedicated work, the problem of automated metabolite recognition has not yet been resolved.

In this context, a new methodological scheme is proposed for automatic preprocessing and peak recognition of metabolites from 1D ¹H-NMR spectra. This software constitutes a key advance in peak annotation by reproducibly detecting large numbers of metabolites with sufficient confidence for discovery-based approaches. The methodology can be described as the matching of a set of reference metabolites with the input spectrum, based on metabolite-specific information from the Human Metabolome Database (HMDB) (Wishart et al. 2013). The proposed methodology performs thresholding, denoising and data reduction on the input spectrum before proceeding to peak identification. A key element of the software is that it simultaneously considers J-coupling values, height ratios for the peaks of each separate multiplet, height ratios for the peaks of the whole metabolite, and relative distances between the multiplets of a metabolite. The proposed scheme has been tested on (a) an artificial amino acid mixture, (b) a spiked sample (a serum sample spiked with the amino acid mixture), (c) 40 in-house biological samples (twenty blood serum and twenty human amniotic fluid samples), and (d) 160 biological serum samples, available from the MetaboLights database (Kale et al. 2016). The performance of this methodology has been evaluated in terms of accuracy, specificity and sensitivity, and it has been assessed against the MetaboHunter and Bayesil metabolite recognition techniques. A comparison of the proposed methodology with other metabolite identification software packages, in terms of their key methodological features and the spectra used for their evaluation, can be found in Supplementary Table S-1 of Supplementary File 1.

2 Materials and methods

2.1 Sample preparation and data acquisition

2.1.1 Artificial samples

An amino acid solution (S1) was created in vitro containing l-Alanine, l-Valine, l-Leucine, l-Isoleucine, l-Glutamic acid, l-Methionine, l-Arginine, l-Proline and Trigonelline. This mixture was added to a blood serum sample (S2) to test the platform’s ability to identify the peaks of these metabolites in complex biological matrices.

2.1.2 Biological samples

CPMG NMR spectra from twenty human amniotic fluid samples (S3–S22) and twenty blood serum samples (S23–S42) from previous studies were used (Fotakis et al. 2016). The spectra were normalized to the standardized area of the internal standard (sodium maleate) and converted to ASCII format using MestreNOVA (http://mestrelab.com). Additionally, the methodology was tested on 18 serum samples from Wruck et al. (2015), 50 serum samples from Hart et al. (2017), 50 serum samples from Gralka et al. (2015) and 42 serum samples from Singh et al. (2017), publicly available from the MetaboLights database (Kale et al. 2016).

2.2 Methodological processes

The automated methodological scheme performs preprocessing, data reduction and, primarily, metabolite search. Initially, the input spectrum is subjected to noise thresholding, denoising and data reduction. Subsequently, the metabolite search begins, consisting of a metabolite screening process followed by a combination selection process. During the metabolite screening process, a number of peak sets are considered for each candidate metabolite, and their fitness is assessed through scoring functions. During the combination selection process, the optimal peak combination for a candidate metabolite is selected. The methodological scheme is fully automated, so the user is not required to set any parameters. A flowchart of the methodology is presented in Fig. 1.

2.2.1 Spectrum preprocessing

2.2.1.1 Thresholding

Thresholding is applied to the input spectrum to remove the low intensity noise peaks. The noise threshold is calculated automatically for each input spectrum (see Online Appendix). The effect of thresholding on a spectral area of sample S25 is displayed in Supplementary Figure S-1 (a) (Supplementary File 1).

2.2.1.2 Denoising

Gaussian denoising (Haddad and Akansu 1991) is performed on the thresholded spectrum to reduce the noise peaks with intensity above the noise threshold. The parameter values of Gaussian denoising are calculated automatically (see Online Appendix). The effect of denoising on sample S25 is depicted in Supplementary Figure S-1 (b) (Supplementary File 1).
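The two preprocessing steps can be illustrated with a minimal sketch. The actual threshold and filter parameters are derived automatically as described in the Online Appendix, which is not reproduced here, so the rule below (mean plus `k` standard deviations of a signal-free region) and the plain Gaussian kernel with width `sigma_pts` are assumptions chosen only for illustration. The function name and arguments are hypothetical.

```python
import numpy as np

def threshold_and_denoise(intensity, noise_region, k=3.0, sigma_pts=2.0):
    """Zero out sub-threshold points, then smooth with a Gaussian kernel.

    Assumptions (not from the paper): the threshold is mean + k * std of a
    user-chosen signal-free region, and sigma_pts is the kernel width in points.
    """
    intensity = np.asarray(intensity, dtype=float)
    noise = intensity[noise_region]
    thr = noise.mean() + k * noise.std()

    # Thresholding: keep only points at or above the noise threshold.
    thresholded = np.where(intensity >= thr, intensity, 0.0)

    # Gaussian denoising: convolve with a normalized, truncated Gaussian.
    half = int(np.ceil(3 * sigma_pts))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_pts) ** 2)
    kernel /= kernel.sum()
    smoothed = np.convolve(thresholded, kernel, mode="same")
    return thr, thresholded, smoothed
```

Because the kernel is normalized, the smoothing spreads a peak over neighbouring points without changing its total area, which is one reason a narrow kernel is preferable when small peaks must survive.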

2.2.2 Spectrum reduction

Data reduction through binning is applied to make the spectrum data more manageable. The Adaptive Intelligent binning method was selected after comparative examination of published binning methods (De Meyer et al. 2008; Anderson et al. 2008, 2010; Davis et al. 2007; Sousa et al. 2013) because it captures the true peaks of the spectrum and does not require further definition of the parameter values.

Results of spectrum reduction are depicted in Supplementary Figure S-1 (c) for the thresholded spectrum, and in Supplementary Figure S-1 (d) for the thresholded and denoised spectrum S25 (Supplementary File 1).

2.2.3 Metabolite search

Once a set of spectral peaks and bin boundaries has been determined, the metabolite database is loaded and the metabolite search begins. Henceforward, the variable $m=1, \ldots, M$ refers to a metabolite, where $M$ is the total number of metabolites. The variable $l^m=1, \ldots, L^m$ refers to a multiplet of a metabolite $m$, where $L^m$ is the total number of multiplets of $m$. $N=N^{l^m}$ is the number of peaks of a multiplet $l^m$, while $N^m$ is the number of peaks of a metabolite. The frequency area defined in the database for a multiplet $l^m$ will be referred to as $area_{DB}^{l^m}$. A glossary of the metabolite search terms can be found in Supplementary Table S-2 (Supplementary File 1).

2.2.3.1 Metabolite screening

The objective of the metabolite screening process is to obtain a number ($K$) of potential peak sets and their corresponding scores separately for each multiplet $l^m$ of a metabolite across a wide frequency range, without determining the final peaks of the metabolite. In a ¹H NMR spectrum, metabolite peaks can shift in frequency due to the pH or concentration of the mixture, moving outside the frequency range defined in the database $(area_{DB}^{l^m})$ (Supplementary Figure S-2 in Supplementary File 1). For this reason, a multiplet $l^m$ is searched for separately in a number $(K)$ of areas $(area_{k}^{l^m})$, which have the same width as $area_{DB}^{l^m}$ but a different frequency centre (Eq. 1). The $offset^{a,K}$ variable denotes the quantity by which the initial area $area_{DB}^{l^m}$ is shifted (Eq. 2). The constant $width_{\mu}$ is the mean multiplet area width, calculated from 850 metabolites ($width_{\mu}=0.07509$ ppm) (see Online Appendix).

$$area_{k}^{l^m}=area_{DB}^{l^m}+\left(-\frac{K}{2}+k\right)\times a\times width_{\mu},\quad k=1,2,\ldots,K$$

(1)

$$offse{t^{a,K}}=~\frac{K}{2} \times a \times widt{h_\mu }=\frac{K}{2} \times f{d^a}$$

(2)

An extensive analysis of how the parameters $offse{t^{a,K}},~a,~K$ affect the behaviour of the algorithm, as well as the determination of the optimal values for the parameters $offse{t^{a,K}},~a$ can be found in the Appendix. Supplementary Table S-3 (Supplementary File 1) contains the tested parameter values. Supplementary Figure S-3 (Supplementary File 1) contains the average sensitivity, specificity and accuracy of the proposed methodology over the samples S1–S42 with respect to each tested a value. The preprocessing parameters are set automatically (Sect. 2.2.1) and the optimal values of $a,~offse{t^{a,K}}$ have been determined, therefore the user does not make any decisions on methodology parameters.
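Equations (1) and (2) can be made concrete with a short sketch that generates the $K$ shifted search areas for one multiplet. The representation of an area as a `(low_ppm, high_ppm)` tuple and the function names are assumptions for illustration; the optimal values of $a$ and $K$ are given in the Appendix and are not reproduced here.

```python
WIDTH_MU = 0.07509  # ppm, mean multiplet area width reported in the text

def search_areas(area_db, K, a, width_mu=WIDTH_MU):
    """Return the K shifted copies of a database area, following Eq. (1).

    area_db is a (low_ppm, high_ppm) tuple; every copy keeps the same width.
    """
    low, high = area_db
    areas = []
    for k in range(1, K + 1):
        shift = (-K / 2 + k) * a * width_mu
        areas.append((low + shift, high + shift))
    return areas

def max_offset(K, a, width_mu=WIDTH_MU):
    """Offset of Eq. (2): K/2 * a * width_mu."""
    return K / 2 * a * width_mu

# Illustrative (not optimal) parameter values.
for area in search_areas((1.00, 1.10), K=4, a=0.5):
    print(area)
```

With an even $K$, the copy at $k=K/2$ coincides with the database area itself, and the remaining copies are spaced $a \times width_{\mu}$ apart on either side of it.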

In every $area_{k}^{l^m}$, numerous peak sets are considered as a fit for the multiplet $l^m$ of metabolite $m$ and are scored according to Sect. 2.2.3.3. The peak set $P_{k}^{l^m}$ with the optimal score $Score_{k}^{l^m}$ for every combination of $l^m$ and $k$ values is selected and saved. Therefore, the result of the metabolite screening process for a metabolite $m$ is, for each combination of $l^m$ and $k$, one peak set $P_{k}^{l^m}$ (Eq. 3) and its corresponding $Score_{k}^{l^m}$ (Eq. 4).

$$P~_{k}^{{{l^m}}}=\left\{ {p_{{k,1}}^{{{l^m}}}, \ldots ,~p_{{k,n}}^{{{l^m}}},~ \ldots ,~p_{{k,N}}^{{{l^m}}}} \right\}$$

(3)

$$Score_{k}^{l^m}=\begin{cases} {Score_S}_{k}^{l^m} & \text{if } l^m \text{ is a singlet (Eq. (A.10), Appendix)} \\ {Score_A}_{k}^{l^m} & \text{if } l^m \text{ is an A order multiplet (Eq. (A.15), Appendix)} \\ {Score_B}_{k}^{l^m} & \text{if } l^m \text{ is a B order multiplet (Eq. (A.18), Appendix)} \\ {Score_C}_{k}^{l^m} & \text{if } l^m \text{ is a C order multiplet (Eq. (A.18), Appendix)} \\ {Score_D}_{k}^{l^m} & \text{if } l^m \text{ is a D order multiplet (Eq. (A.18), Appendix)} \\ {Score_M}_{k}^{l^m} & \text{if } l^m \text{ is a multiplet without rules (Eq. (A.22), Appendix)} \end{cases}$$

(4)

$$l={l^m}=1,~ \ldots ,~{L^m},~n=1,...,{N^{{l^m}}},k=1,~ \ldots ,~K$$

2.2.3.2 Small peak rejection

Spectrum denoising is a valuable preprocessing step but does not eliminate all unwanted noise peaks. Additionally, heavy denoising of a spectrum can eliminate important peaks. Therefore, the function ${V_{bp}}$ was introduced to discriminate prominent peak sets from less prominent ones without relying on denoising and was used to reject candidate peak sets with a ${V_{bp}}$ value below a specific threshold. The ${V_{bp}}$ function represents the percent difference between the mean intensity of the multiplet peaks and the mean intensity of the bin borders corresponding to the multiplet peaks. The definition and the application of the ${V_{bp}}$ function are presented in the Appendix.
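A minimal sketch of the idea behind ${V_{bp}}$ follows. The exact definition and the rejection threshold are given in the Appendix and are not reproduced here, so the normalization by the mean peak intensity, the dictionary layout of a candidate and the `min_v_bp` parameter are assumptions made only for illustration.

```python
def v_bp(peak_heights, border_heights):
    """Percent difference between mean peak intensity and mean bin-border intensity.

    Assumption: the difference is expressed relative to the mean peak
    intensity, which is taken to be positive.
    """
    mean_peak = sum(peak_heights) / len(peak_heights)
    mean_border = sum(border_heights) / len(border_heights)
    return 100.0 * (mean_peak - mean_border) / mean_peak

def keep_prominent(candidates, min_v_bp):
    """Reject candidate peak sets whose V_bp falls below the threshold."""
    return [c for c in candidates
            if v_bp(c["peaks"], c["borders"]) >= min_v_bp]

candidates = [
    {"name": "prominent", "peaks": [10.0, 10.0], "borders": [2.0, 2.0]},
    {"name": "shallow", "peaks": [10.0, 10.0], "borders": [9.0, 9.0]},
]
print([c["name"] for c in keep_prominent(candidates, min_v_bp=50.0)])
```

A peak set that barely rises above its bin borders receives a low value and is rejected, regardless of how strongly the spectrum was denoised.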

2.2.3.3 Multiplet scoring

The process of finding the best peak set for a multiplet ${l^m}$ in an $are{a^{{l^m}}}$ can be described as (i) the application of a novel scoring function to each candidate peak set ${P^{{l^m}}}$ in that area to assess its fitness for ${l^m}$, and (ii) the selection of the candidate peak set with the minimum score as a fit for ${l^m}$ in $are{a^{{l^m}}}$. However, the scoring function that is applied to a candidate peak set varies depending on the type of multiplet, the different types being singlets, first (A) order multiplets, second (B) or higher (C, D, …, symbolized as X) order multiplets, and multiplets that do not comply with specific rules. The scoring functions for first or higher order multiplets have been designed to incorporate features such as the J-coupling values and height ratios defined in HMDB for a given multiplet ${l^m}$. In contrast, since singlets are not associated with J values or height ratios, the scoring function for singlets has been designed to consider the horizontal distances and height differences between a candidate peak and its neighbouring peaks. Moreover, the scoring function for a multiplet without rules considers the height ratios defined for it in HMDB and the height monotonicity of the candidate peak set, which are the most prevalent features of this multiplet type. The scoring functions for each multiplet type are presented in the Appendix.

Illustrations of scoring for each multiplet category can be found in Supplementary Figure S-4 (Supplementary File 1). Examples of the metabolite screening process on the spectrum of the serum sample S25 are shown in Supplementary Figures S-5 (a–d), Supplementary Text S-1 (Supplementary File 1).
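To illustrate how J-coupling values and height ratios can enter a multiplet score, the sketch below scores candidate doublets. It is not the scoring function of Eq. (A.15), which is defined in the Appendix; the equal weighting of the two error terms, the percent-error form, the function names and the example values are all assumptions.

```python
from itertools import combinations

def pct_error(observed, expected):
    """Absolute percent error of an observed value against a reference."""
    return 100.0 * abs(observed - expected) / abs(expected)

def score_doublet(pair, j_ref_hz, ratio_ref, freq_mhz):
    """Lower is better: mean percent error in J-coupling and height ratio."""
    (ppm1, h1), (ppm2, h2) = pair
    j_obs_hz = abs(ppm1 - ppm2) * freq_mhz  # ppm distance converted to Hz
    return 0.5 * (pct_error(j_obs_hz, j_ref_hz) + pct_error(h2 / h1, ratio_ref))

def best_doublet(peaks, j_ref_hz, ratio_ref, freq_mhz):
    """Score every pair of peaks in an area and return (score, pair) of the best."""
    scored = [(score_doublet(p, j_ref_hz, ratio_ref, freq_mhz), p)
              for p in combinations(peaks, 2)]
    return min(scored, key=lambda item: item[0])

# Hypothetical peaks as (ppm, height) and hypothetical reference values.
peaks = [(1.470, 1.0), (1.482, 1.0), (1.500, 0.3)]
score, pair = best_doublet(peaks, j_ref_hz=7.2, ratio_ref=1.0, freq_mhz=600.0)
print(score, pair)
```

The candidate with the minimum score is retained, mirroring step (ii) of the process described above.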

2.2.3.4 Selecting peak combinations

This step consists of processing the metabolite screening results and selecting a final peak set for metabolite $m$. A combination $kc^m$ of $k$ values has to be selected as the final fit for metabolite $m$; this will be the combination with the minimum score among those that have survived the elimination process. First, all possible $kc^m$ combinations for the metabolite $m$ are formed. Subsequently, all combinations are screened and rejected if they do not meet certain criteria (see Appendix and Supplementary Figure S-6 of Supplementary File 1). The remaining combinations are scored according to Eqs. (5)–(7). $Score_{Multiplets}^{kc^m}$ (Eq. 6) represents the mean of the multiplet scores, which are calculated according to Eqs. (A.10)–(A.24) (see Online Appendix). $Score_{Heights}^{kc^m}$ (Eq. 7) represents how well the heights of all the peaks of $kc^m$ fit the ideal height ratios. The combination with the optimal (minimum) score $Score^{kc^m}$ is selected as the best fit for $m$. The combination selection process is described in the Appendix, with examples available in Supplementary Figures S-5 (e–h), Supplementary Text S-1 (Supplementary File 1).

$$Scor{e^{k{c^m}}}~={w_M} \times Scor{e_{Multiplets}}^{{k{c^m}}}+~{w_H} \times Scor{e_{Heights}}^{{k{c^m}}}$$

(5)

$$Score_{Multiplets}^{kc^m}=\mu\left(\begin{cases} {Score_S}^{kc^m,l^m} & \text{if } l^m \text{ is a singlet (Eq. (A.10), Appendix)} \\ {Score_A}^{kc^m,l^m} & \text{if } l^m \text{ is an A order multiplet (Eq. (A.15), Appendix)} \\ {Score_B}^{kc^m,l^m} & \text{if } l^m \text{ is a B order multiplet (Eq. (A.18), Appendix)} \\ \ldots & \\ {Score_M}^{kc^m,l^m} & \text{if } l^m \text{ is a multiplet without rules (Eq. (A.22), Appendix)} \end{cases}\right)$$

(6)

$$Scor{e_{Heights}}^{{k{c^m}}}=\mu \left( {\begin{array}{*{20}{c}} {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{1}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{1}}}} \right),~~~n=2,~ \ldots ,~{N^m}} \\ {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{{n - 1}}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{{n - 1}}}}} \right)~~n=3,~ \ldots ,{N^m}} \\ {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{{n - 2}}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{{n - 2}}}}} \right)~~n=4,~ \ldots ,{N^m}} \end{array}} \right)$$

(7)
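Equations (5)–(7) can be sketched as follows. The weights $w_M$ and $w_H$ are not stated in this section, so the defaults of 0.5 are placeholders, and the percent-error form is an assumption; the candidate dictionaries and function names are likewise hypothetical. At least two peaks per metabolite are assumed so that Eq. (7) has at least one term.

```python
from statistics import mean

def pct_error(observed, expected):
    """Absolute percent error of an observed ratio against the database ratio."""
    return 100.0 * abs(observed - expected) / abs(expected)

def score_heights(h, h_db):
    """Eq. (7): mean percent error of height ratios against the database.

    Ratios are taken to the first peak (n = 2..N), to the previous peak
    (n = 3..N) and to the peak two positions back (n = 4..N).
    """
    n_peaks = len(h)
    errors = []
    for n in range(1, n_peaks):   # n = 2..N in the paper's 1-based indexing
        errors.append(pct_error(h[n] / h[0], h_db[n] / h_db[0]))
    for n in range(2, n_peaks):   # n = 3..N
        errors.append(pct_error(h[n] / h[n - 1], h_db[n] / h_db[n - 1]))
    for n in range(3, n_peaks):   # n = 4..N
        errors.append(pct_error(h[n] / h[n - 2], h_db[n] / h_db[n - 2]))
    return mean(errors)

def combination_score(multiplet_scores, h, h_db, w_m=0.5, w_h=0.5):
    """Eq. (5), with Eq. (6) taken as the mean of the multiplet scores."""
    return w_m * mean(multiplet_scores) + w_h * score_heights(h, h_db)

def select_combination(candidates, h_db, w_m=0.5, w_h=0.5):
    """Return the surviving combination with the minimum score."""
    return min(candidates, key=lambda c: combination_score(
        c["multiplet_scores"], c["heights"], h_db, w_m, w_h))

h_db = [1.0, 2.0, 1.0, 2.0]
candidates = [
    {"id": "kc1", "multiplet_scores": [10.0, 20.0], "heights": [2.0, 4.0, 2.0, 4.0]},
    {"id": "kc2", "multiplet_scores": [10.0, 20.0], "heights": [1.0, 1.0, 1.0, 1.0]},
]
print(select_combination(candidates, h_db)["id"])
```

In this toy case both combinations have the same multiplet scores, so the choice is decided entirely by how well the observed heights reproduce the database height ratios.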

3 Results and discussion

3.1 Methodology performance

The methodological scheme’s output is a list of metabolites as well as the peaks assigned to each metabolite. Its results are assessed in terms of metabolite presence as well as peak correctness. The mean accuracy, sensitivity and specificity of the proposed methodology over all samples are presented in Table 1 and Supplementary Tables S-4, S-5 (Supplementary File 1).
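For reference, the three performance measures can be computed from metabolite presence as in the sketch below. It uses the standard definitions of accuracy, sensitivity and specificity and covers presence only; the additional peak-correctness check applied in the paper is not modelled. The function name and the example metabolite sets are hypothetical, and both `predicted` and `present` are assumed to be subsets of `candidates`.

```python
def classification_metrics(predicted, present, candidates):
    """Accuracy, sensitivity and specificity over a fixed candidate list."""
    predicted, present, candidates = set(predicted), set(present), set(candidates)
    tp = len(predicted & present)                # reported and truly present
    fp = len(predicted - present)                # reported but absent
    fn = len(present - predicted)                # present but missed
    tn = len(candidates - predicted - present)   # absent and not reported

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(candidates)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }

metrics = classification_metrics(
    predicted={"alanine", "valine", "proline"},
    present={"alanine", "valine", "leucine"},
    candidates={"alanine", "valine", "leucine", "proline", "arginine"},
)
print(metrics)
```

When most candidate metabolites are expected to be present, the true-negative count is small, so a single false positive has a large effect on specificity.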

Table 1 The average results of the proposed methodology, MetaboHunter and Bayesil on the sample groups S1–S2, S3–S22, S23–S42, MTBLS174 1–18, MTBLS424 1–50, MTBLS242 1–50, MTBLS326 1–42


3.2 Benchmarking

The performance of the proposed methodology has been compared to MetaboHunter, Bayesil and the Autofit function incorporated in the Chenomx software suite 8.3. Table 1 contains the average accuracy, sensitivity and specificity results of these software tools in each sample group. Supplementary Tables S-4, S-5 (Supplementary File 1) contain the accuracy, specificity and sensitivity results of these software tools on each tested sample. The performance of all methods for samples S1–S42 was assessed based on 45 metabolites, while for samples MTBLS174 1–18, MTBLS424 1–50, MTBLS242 1–50, MTBLS326 1–42 the performance was assessed only on metabolites identified by the authors of those studies.

In MetaboHunter, the input spectra were subjected to baseline and phase correction, and the noise threshold defined as a parameter was similar to the one calculated by our method. The MH2 method of MetaboHunter was used, with a shift tolerance of 0.1 ppm and a confidence threshold of 0.5, since a lower confidence threshold would yield more false positive results, while a higher value would yield fewer true positive results. The results were assessed in terms of metabolite presence as well as peak correctness. This means that even when a metabolite is correctly reported as present, it may be counted as a false positive when the peaks assigned to it are not correct. More specifically, the peaks assigned to a specific metabolite by MetaboHunter have a 2-digit resolution and always belong to the same peak set, regardless of the input spectrum (Supplementary Figure S-7 in Supplementary File 1). The sensitivity and specificity of MetaboHunter were calculated according to a criterion proposed by Ravanbakhsh et al. (2015) and Everett (2015). Specifically, each peak proposed by MetaboHunter that lay more than 0.025 ppm from the equivalent metabolite peak in the spectrum was considered a false prediction. The higher accuracy scores of the proposed methodology may be explained by the fact that it considers potential peak shifting and overlapping.
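The 0.025 ppm criterion can be expressed as a simple check. The sketch assumes that the "equivalent" peak is the nearest reference peak, which is an interpretation rather than something stated in the text, and that the reference list is non-empty; the function name and the example values are hypothetical.

```python
def peaks_match(predicted_ppm, reference_ppm, tol=0.025):
    """True if every predicted peak lies within tol ppm of its nearest reference peak."""
    return all(
        min(abs(p - r) for r in reference_ppm) <= tol
        for p in predicted_ppm
    )

print(peaks_match([1.47, 1.48], [1.475, 1.49]))  # every peak within tolerance
print(peaks_match([1.47, 1.60], [1.475, 1.49]))  # second peak is 0.11 ppm away
```

Under this check, a metabolite whose reported peaks fail the tolerance would be counted as a false prediction even if the metabolite itself is present in the sample.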

Bayesil’s results were assessed in terms of the presence of metabolites and not peak or concentration accuracy, since it does not assign specific peaks for each metabolite. Bayesil is designed mainly for serum, plasma and cerebrospinal fluid samples, but it performed well on amniotic fluid samples.

The accuracy of Autofit (currently embedded in a commercial software package) was estimated based on the metabolites that were automatically profiled. Autofit identified at most 6 metabolites in any tested biological sample (Supplementary Table S-6, Supplementary File 1). The underperformance of this algorithm may be attributed to pH sensitivity. Autofit is optimized to handle specific sample preparations and acquisition parameters (Temperature = 25 °C, Acquisition time = 4 s, Relaxation delay = 1 s, Spectral width = 12 ppm, NOESY pulse sequences with tmix of 100 ms). Similar results have been reported, with Autofit achieving 54% accuracy on synthetic urine spectra (Tardivel et al. 2017).

Our methodology achieved higher accuracy on the amniotic fluid samples than on the serum samples, because broad peaks of lipid molecules in serum overlap with the peaks of other metabolites. When the broad lipid peak does not cover a multiplet completely, as in the case of L-Lactic acid, the proposed methodology is able to identify the multiplet (Supplementary Figure S-7 (c) in Supplementary File 1).

Finally, as observed in Table 1 and Supplementary Tables S-4, S-5 (Supplementary File 1), the specificity values are consistently lower than the accuracy and sensitivity values. This is because the three methods are examined for a set of metabolites that are expected to be present in a biological sample, so the number of true negatives is usually lower than or similar to the number of false positives. These differences can be explained by the fact that our algorithm considers the J-coupling values, the height ratios for the peaks of each separate multiplet, the height ratios for the peaks of the whole metabolite, the relative distances between the multiplets of a metabolite, the frequency shifts due to pH and concentration, the importance of each multiplet, and the percent difference in the intensity of a peak from local minima (indicative of how prominent the peak is relative to the baseline).

3.3 Challenges

The metabolite search process cannot detect spectral peaks that have not been recognized during data reduction, which is why denoising was applied sparingly to the input spectra. Even though stronger denoising would reduce execution time, it could remove important peaks and thereby introduce false negatives.

The execution time ranges from 200 to 4000 s, depending on the complexity of the input spectrum. The experiments were executed on a personal computer with an x64-based Intel i5 processor at 2.5 GHz and 4 GB of RAM. The program was developed in the MATLAB 7.10.0 programming environment and is supported on MS Windows.

4 Conclusions

A new methodological scheme for the automatic preprocessing and recognition of molecular structures from the 1D ¹H-NMR spectra of biological samples has been presented and compared against MetaboHunter and Bayesil. The proposed methodology matches metabolites to spectral peaks based on scoring functions specific to each multiplet type. It was tested on 42 in-house samples and 160 publicly available biological samples from four studies. The methodology achieved a mean accuracy of 77.32% over all 160 publicly available spectra, indicating that it could be used to support metabolite identification in ¹H-NMR spectra of biological samples.

References

Anderson, P. E., Mahle, D. A., Doom, T. E., Reo, N. V., DelRaso, N. J., & Raymer, M. L. (2010). Dynamic adaptive binning: an improved quantification technique for NMR spectroscopic data. Metabolomics, 7(2), 179–190. doi:10.1007/s11306-010-0242-7.
Anderson, P. E., Reo, N. V., DelRaso, N. J., Doom, T. E., & Raymer, M. L. (2008). Gaussian binning: A new kernel-based method for processing NMR spectroscopic data for metabolomics. Metabolomics, 4(3), 261–272. doi:10.1007/s11306-008-0117-3.
Chignola, F., Mari, S., Stevens, T. J., Fogh, R. H., Mannella, V., Boucher, W., & Musco, G. (2011). The CCPN metabolomics Project: A fast protocol for metabolite identification by 2D-NMR. Bioinformatics (Oxford, England), 27(6), 885–886. doi:10.1093/bioinformatics/btr013.
Davis, R. A., Charlton, A. J., Godward, J., Jones, S. A., Harrison, M., & Wilson, J. C. (2007). Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform. Chemometrics and Intelligent Laboratory Systems, 85(1), 144–154. doi:10.1016/j.chemolab.2006.08.014.
De Meyer, T., Sinnaeve, D., Van Gasse, B., Tsiporkova, E., Rietzschel, E. R., De Buyzere, M. L., et al. (2008). NMR-based characterization of metabolic alterations in hypertension using an adaptive, intelligent binning algorithm. Analytical Chemistry, 80(10), 3783–3790. doi:10.1021/ac7025964.
Deng, L., Gu, H., Zhu, J., Nagana Gowda, G. A., Djukovic, D., Chiorean, E. G., Raftery, D. (2016). Combining NMR and LC/MS using backward variable elimination: Metabolomics analysis of colorectal cancer, polyps, and healthy controls. Analytical Chemistry, 88(16), 7975–7983. doi:10.1021/acs.analchem.6b00885.
Domingo-Almenara, X., Brezmes, J., Vinaixa, M., Samino, S., Ramirez, N., Ramon-Krauel, M., et al. (2016). eRah: A computational tool integrating spectral deconvolution and alignment with quantification and identification of metabolites in GC/MS-based metabolomics. Analytical Chemistry, 88(19), 9821–9829. doi:10.1021/acs.analchem.6b02927.
Everett, J. R. (2015). A new paradigm for known metabolite identification in metabonomics/metabolomics: Metabolite identification efficiency. Computational and Structural Biotechnology Journal, 13, 131–144. doi:10.1016/j.csbj.2015.01.002.
Fischer, K., Kettunen, J., Würtz, P., Haller, T., Havulinna, A. S., Kangas, A. J., et al. (2014). Biomarker profiling by nuclear magnetic resonance spectroscopy for the prediction of all-cause mortality: An observational study of 17,345 persons. PLoS Medicine, 11(2), e1001606. doi:10.1371/journal.pmed.1001606.
Fotakis, C., Zoga, M., Baskakis, C., Tsiaka, T., Boutsikou, T., Briana, D. D., et al. (2016). Investigating the metabolic fingerprint of term infants with normal and increased fetal growth. RSC Advances, 6(83), 79325–79334. doi:10.1039/C6RA12403H.
Gralka, E., Luchinat, C., Tenori, L., Ernst, B., Thurnheer, M., & Schultes, B. (2015). Metabolomic fingerprint of severe obesity is dynamically affected by bariatric surgery in a procedure-dependent manner. American Journal of Clinical Nutrition, 102(6), 1313–1322. doi:10.3945/ajcn.115.110536.
Haddad, R. A., & Akansu, A. N. (1991). A class of fast Gaussian binomial filters for speech and image processing. IEEE Transactions on Signal Processing, 39(3), 723–727. doi:10.1109/78.80892.
Hao, J., Astle, W., De Iorio, M., & Ebbels, T. M. D. (2012). BATMAN—An R package for the automated quantification of metabolites from nuclear magnetic resonance spectra using a Bayesian model. Bioinformatics (Oxford, England), 28(15), 2088–2090. doi:10.1093/bioinformatics/bts308.
Hart, C. D., Vignoli, A., Tenori, L., Uy, G. L., Van To, T., Adebamowo, C., et al. (2017). Serum metabolomic profiles identify ER-positive early breast cancer patients at increased risk of disease recurrence in a multicenter population. Clinical Cancer Research, 23(6), 1422–1431. doi:10.1158/1078-0432.CCR-16-1153.
Jobard, E., Pontoizeau, C., Blaise, B. J., Bachelot, T., Elena-Herrmann, B., & Trédan, O. (2014). A serum nuclear magnetic resonance-based metabolomic signature of advanced metastatic human breast cancer. Cancer Letters, 343(1), 33–41. doi:10.1016/j.canlet.2013.09.011.
Kale, N. S., Haug, K., Conesa, P., Jayseelan, K., Moreno, P., Rocca-Serra, P., Nainala, V. C., Spicer, R. A., Williams, M., Li, X., Salek, R. M., Griffin, J. L., & Steinbeck, C. (2016). MetaboLights: An open-access database repository for metabolomics data. Current Protocols in Bioinformatics, 53, 14.13.1–14.13.18. doi:10.1002/0471250953.bi1413s53.
Kang, J., Zhu, L., Lu, J., & Zhang, X. (2015). Application of metabolomics in autoimmune diseases: Insight into biomarkers and pathology. Journal of Neuroimmunology, 279, 25–32. doi:10.1016/j.jneuroim.2015.01.001.
Kordalewska, M., & Markuszewski, M. J. (2015). Metabolomics in cardiovascular diseases. Journal of Pharmaceutical and Biomedical Analysis, 113, 121–136. doi:10.1016/j.jpba.2015.04.021.
Larive, C. K., Barding, G. A., & Dinges, M. M. (2015). NMR spectroscopy for metabolomics and metabolic profiling. Analytical Chemistry, 87(1), 133–146. doi:10.1021/ac504075g.
Lenz, E. M., & Wilson, I. D. (2007). Analytical strategies in metabonomics. Journal of Proteome Research, 6(2), 443–458. doi:10.1021/pr0605217.
Li, L., Li, R., Zhou, J., Zuniga, A., Stanislaus, A. E., Wu, Y., et al. (2013). MyCompoundID: Using an evidence-based metabolome library for metabolite identification. Analytical Chemistry, 85(6), 3401–3408. doi:10.1021/ac400099b.
Lindon, J. C., & Nicholson, J. K. (2008). Analytical technologies for metabonomics and metabolomics, and multi-omic information recovery. TrAC Trends in Analytical Chemistry, 27(3), 194–204. doi:10.1016/j.trac.2007.08.009.
Mercier, P., Lewis, M. J., Chang, D., Baker, D., & Wishart, D. S. (2011). Towards automatic metabolomic profiling of high-resolution one-dimensional proton NMR spectra. Journal of Biomolecular NMR, 49(3–4), 307–323. doi:10.1007/s10858-011-9480-x.
Mihaleva, V. V., Verhoeven, H. A., de Vos, R. C. H., Hall, R. D., & van Ham, R. C. H. J. (2009). Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index. Bioinformatics (Oxford, England), 25(6), 787–794. doi:10.1093/bioinformatics/btp056.
Psychogios, N., Hau, D. D., Peng, J., Guo, A. C., Mandal, R., Bouatra, S., et al. (2011). The human serum metabolome. PLoS ONE, 6(2), e16957. doi:10.1371/journal.pone.0016957.
Ravanbakhsh, S., Liu, P., Bjorndahl, T. C., Bjordahl, T. C., Mandal, R., Grant, J. R., et al. (2015). Accurate, fully-automated NMR spectral profiling for metabolomics. PloS ONE, 10(5), e0124219. doi:10.1371/journal.pone.0124219.
Singh, A., Sharma, R. K., Chagtoo, M., Agarwal, G., George, N., Sinha, N., & Godbole, M. M. (2017). 1H NMR metabolomics reveals association of high expression of inositol 1, 4, 5 trisphosphate receptor and metabolites in breast cancer patients. PLoS ONE, 12(1), e0169330. doi:10.1371/journal.pone.0169330.
Smolinska, A., Blanchet, L., Buydens, L. M. C., & Wijmenga, S. S. (2012). NMR and pattern recognition methods in metabolomics: From data acquisition to biomarker discovery: A review. Analytica Chimica Acta, 750, 82–97. doi:10.1016/j.aca.2012.05.049.
Sousa, S. A. A., Magalhães, A., & Ferreira, M. M. C. (2013). Optimized bucketing for NMR spectra: Three case studies. Chemometrics and Intelligent Laboratory Systems, 122, 93–102. doi:10.1016/j.chemolab.2013.01.006.
Tardivel, P. J. C., Canlet, C., Lefort, G., Tremblay-Franco, M., Debrauwer, L., Concordet, D., & Servien, R. (2017). ASICS: An automatic method for identification and quantification of metabolites in complex 1D 1H NMR spectra. Metabolomics, 13(10), 109. doi:10.1007/s11306-017-1244-5.
Tulpan, D., Léger, S., Belliveau, L., Culf, A., & Čuperlović-Culf, M. (2011). MetaboHunter: An automatic approach for identification of metabolites from 1H-NMR spectra of complex mixtures. BMC Bioinformatics, 12, 400. doi:10.1186/1471-2105-12-400.
Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y., et al. (2013). HMDB 3.0—The Human Metabolome Database in 2013. Nucleic Acids Research, 41(Database issue), D801–D807. doi:10.1093/nar/gks1065.
Wruck, W., Kashofer, K., Rehman, S., Daskalaki, A., Berg, D., Gralka, E., et al. (2015). Multi-omic profiles of human non-alcoholic fatty liver disease tissue highlight heterogenic phenotypes. Scientific Data, 2, 150068. doi:10.1038/sdata.2015.68.
Zheng, C., Zhang, S., Ragg, S., Raftery, D., & Vitek, O. (2011). Identification and quantification of metabolites in 1H NMR spectra by Bayesian model selection. Bioinformatics (Oxford, England), 27(12), 1637–1644. doi:10.1093/bioinformatics/btr118.

Acknowledgements

This work was funded by a State Scholarships Foundation (IKY) Fellowship of Excellence for postgraduate studies in Greece—Siemens Program. The authors confirm that the funder had no influence over the study design, content of the paper, or selection of this journal.

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou str., 15780, Athens, Greece
Arianna Filntisi & George K. Matsopoulos
Institute of Biology, Medicinal Chemistry and Biotechnology, National Hellenic Research Foundation, 48 Vas. Constantinou Ave., 11635, Athens, Greece
Charalambos Fotakis & Panagiotis Zoumpoulakis
Department of Biomedical Engineering, Technological Educational Institute of Athens, 17 Ag. Spyridonos Street, 12243, Athens, Greece
Pantelis Asvestas & Dionisis Cavouras

Corresponding author

Correspondence to Panagiotis Zoumpoulakis.

Additional information

The binary file is freely available for download at http://biomig.ntua.gr/downloads/software/MIDTool.zip.

Electronic supplementary material

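The electronic supplementary material comprises Supplementary material 1 (DOCX, 45 KB), Supplementary material 2 (DOCX, 24663 KB), and Supplementary material 3 (CSV, 11 KB).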

About this article

Cite this article

Filntisi, A., Fotakis, C., Asvestas, P. et al. Automated metabolite identification from biological fluid ¹H NMR spectra. Metabolomics 13, 146 (2017). https://doi.org/10.1007/s11306-017-1286-8

Received: 05 June 2017
Accepted: 19 October 2017
Published: 25 October 2017
DOI: https://doi.org/10.1007/s11306-017-1286-8
