1 Introduction

Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) have emerged as key technologies for metabolite analysis (Lenz and Wilson 2007; Lindon and Nicholson 2008; Larive et al. 2015) by examining various biofluids and elucidating biomarkers of disease (Fischer et al. 2014; Jobard et al. 2014; Smolinska et al. 2012; Deng et al. 2016; Kang et al. 2015; Kordalewska and Markuszewski 2015; Psychogios et al. 2011). One of the main issues in these studies is metabolite identification, since interpreting 1H NMR spectra is a challenging, time-consuming task (Li et al. 2013; Smolinska et al. 2012).

To this end, computational metabolite recognition has been the objective of numerous research efforts (Domingo-Almenara et al. 2016; Chignola et al. 2011; Mihaleva et al. 2009; http://www.chenomx.com). BQuant is based on Bayesian modelling and addresses metabolite 1H-NMR detection as a variable selection problem (Zheng et al. 2011). A probabilistic method based on Markov chain Monte Carlo (MCMC) and Metropolis–Hastings block updates has been implemented in the BATMAN package (Hao et al. 2012). Mercier et al. (2011) have proposed an automated spectral fitting algorithm (AutoFit). Bayesil is a metabolite identification tool based on a sequential Monte Carlo inference method and a probabilistic graphical model (Ravanbakhsh et al. 2015). MetaboHunter is a software tool that matches the input 1H NMR spectrum or peak list with the peaks of the reference metabolites from HMDB and MMCD (Tulpan et al. 2011). Despite the amount and sophistication of the dedicated work, the problem of automated metabolite recognition has not yet been resolved.

In this context, a new methodological scheme is proposed for automatic preprocessing and peak recognition of metabolites from 1D 1H-NMR spectra. This software constitutes a key advance in the peak annotation direction by reproducibly detecting large numbers of metabolites with sufficient confidence for discovery-based approaches. The methodology can be described as the matching of a set of reference metabolites with the input spectrum, based on metabolite-specific information from the Human Metabolome Database (HMDB) (Wishart et al. 2013). The proposed methodology performs thresholding, denoising and data reduction on the input spectrum before proceeding to peak identification. A key element of the software is that it considers concomitantly J-coupling values, height ratios for the peaks of each separate multiplet, height ratios for the peaks of the whole metabolite, as well as relative distances between the multiplets of a metabolite. The proposed scheme has been tested on (a) an artificial amino acid mixture, (b) a spiked sample (a serum sample spiked with the amino acid mixture), (c) 40 in-house biological samples (twenty blood serum and twenty human amniotic fluid samples), and (d) 160 biological serum samples, available from the MetaboLights database (Kale et al. 2016). The performance of this methodology has been evaluated in terms of accuracy, specificity and sensitivity and it has been assessed against the MetaboHunter and the Bayesil metabolite recognition techniques. A comparative presentation of the proposed methodology and other metabolite identification software packages in terms of their key methodological features and the spectra used for their evaluation can be found in Supplementary Table S-1 of Supplementary File 1.

2 Materials and methods

2.1 Sample preparation and data acquisition

2.1.1 Artificial samples

An amino acid solution (S1) was created in vitro containing L-Alanine, L-Valine, L-Leucine, L-Isoleucine, L-Glutamic acid, L-Methionine, L-Arginine, L-Proline and Trigonelline. This mixture was added to a blood serum sample (S2) to test the platform’s ability to identify the peaks of these metabolites in complex biological matrices.

2.1.2 Biological samples

CPMG NMR spectra from twenty human amniotic fluid samples (S3–S22) and twenty blood serum samples (S23–S42) from previous studies were used (Fotakis et al. 2016). The spectra were normalized to the standardized area of the internal standard (sodium maleate) and converted to ASCII format using MestreNOVA (http://mestrelab.com). Additionally, the methodology was tested on 18 serum samples from Wruck et al. (2015), 50 serum samples from Hart et al. (2017), 50 serum samples from Gralka et al. (2015) and 42 serum samples from Singh et al. (2017), publicly available from the MetaboLights database (Kale et al. 2016).

2.2 Methodological processes

The automated methodological scheme performs preprocessing, data reduction and primarily metabolite search. Initially, the input spectrum is subjected to noise thresholding, denoising and data reduction. Subsequently, the metabolite search begins consisting of a metabolite screening process followed by a combination selection process. During the metabolite screening process, a number of peak sets is considered for each candidate metabolite, the fitness of which is assessed through scoring functions. During the combination selection process, the optimal peak combination for a candidate metabolite is selected. The methodological scheme is fully automated, thus the user is not required to set any parameters. A flowchart of the methodology is presented in Fig. 1.

Fig. 1 Flowchart of the proposed methodological scheme

2.2.1 Spectrum preprocessing

2.2.1.1 Thresholding

Thresholding is applied to the input spectrum to remove the low intensity noise peaks. The noise threshold is calculated automatically for each input spectrum (see Online Appendix). The effect of thresholding on a spectral area of sample S25 is displayed in Supplementary Figure S-1 (a) (Supplementary File 1).

2.2.1.2 Denoising

Gaussian denoising (Haddad and Akansu 1991) is performed on the thresholded spectrum to reduce the noise peaks with intensity above the noise threshold. The parameter values of Gaussian denoising are calculated automatically (see Online Appendix). The effect of denoising on sample S25 is depicted in Supplementary Figure S-1 (b) (Supplementary File 1).
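The two preprocessing steps can be illustrated with a minimal Python sketch. The noise threshold and the Gaussian parameters are computed automatically in the proposed scheme (Online Appendix), and that calculation is not reproduced here. The function names and the arguments `noise_threshold` and `sigma_points` are therefore hypothetical placeholders supplied by the caller, and clamping at the spectrum edges is an assumed boundary treatment.

```python
import math

def threshold_spectrum(intensity, noise_threshold):
    """Set every point below the noise threshold to zero (assumed behaviour)."""
    return [y if y >= noise_threshold else 0.0 for y in intensity]

def gaussian_denoise(intensity, sigma_points, truncate=4.0):
    """Smooth a 1D spectrum by convolution with a normalized Gaussian kernel.

    sigma_points is the kernel standard deviation in data points; it is a
    placeholder for the automatically derived value described in the Appendix.
    """
    radius = max(1, int(truncate * sigma_points + 0.5))
    kernel = [math.exp(-0.5 * (i / sigma_points) ** 2)
              for i in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(intensity)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Clamp indices at the edges (nearest-value padding).
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * intensity[idx]
        smoothed.append(acc)
    return smoothed

# Toy spectrum: two peaks on a low-intensity noise floor.
spectrum = [0.1, 0.2, 5.0, 0.1, 0.3, 4.0, 0.2, 0.1, 0.2, 0.1]
thresholded = threshold_spectrum(spectrum, noise_threshold=1.0)
denoised = gaussian_denoise(thresholded, sigma_points=1.0)
```

Because the kernel is normalized, a flat region keeps its intensity, while isolated peaks are lowered and broadened.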

2.2.2 Spectrum reduction

Data reduction through binning is applied to make the spectrum data more manageable. The Adaptive Intelligent binning method was selected after comparative examination of published binning methods (De Meyer et al. 2008; Anderson et al. 2008, 2010; Davis et al. 2007; Sousa et al. 2013) because it captures the true peaks of the spectrum and does not require further definition of the parameter values.

Results of spectrum reduction are depicted in Supplementary Figure S-1 (c) for the thresholded spectrum, and in Supplementary Figure S-1 (d) for the thresholded and denoised spectrum S25 (Supplementary File 1).

2.2.3 Metabolite search

Once the set of spectral peaks and bin boundaries has been determined, the metabolite database is loaded and the metabolite search begins. Henceforward, the variable \(m=1, \ldots ,M\) refers to a metabolite, where \(M\) is the total number of metabolites. The variable \({l^m}=1, \ldots ,~{L^m}\) refers to a multiplet of a metabolite \(m\), where \({L^m}\) is the total number of multiplets of \(m\). \(N={N^{{l^m}}}\) is the number of peaks of a multiplet \({l^m}\), while \({N^m}\) is the number of peaks of a metabolite. The frequency area defined in the database for a multiplet \({l^m}\) will be referred to as \(are{a_{DB}}^{{{l^m}}}\). A glossary of the metabolite search terms can be found in Supplementary Table S-2 (Supplementary File 1).

2.2.3.1 Metabolite screening

The objective of the metabolite screening process is to obtain a number (\(K\)) of potential peak sets and their corresponding scores separately for each multiplet \({l^m}\) of a metabolite across a wide frequency range, without determining the final peaks of the metabolite. In a 1H NMR spectrum, metabolite peaks can shift in frequency due to the pH or the concentration of the mixture, moving outside the frequency range of the database \((are{a_{DB}}^{{{l^m}}})\) (Supplementary Figure S-2 in Supplementary File 1). For this reason, a multiplet \({l^m}\) is searched for separately in a number \(\left( K \right)\) of areas \(\left( {area~_{k}^{{{l^m}}}} \right)\), which have the same width as \(are{a_{DB}}^{{{l^m}}}\) but a different frequency centre (Eq. 1). The \(offse{t^{a,K}}\) variable denotes the quantity by which the initial area \(are{a_{DB}}^{{{l^m}}}\) is modified (Eq. 2). The constant \(widt{h_\mu }\) is the mean multiplet area width, calculated from 850 metabolites (\(widt{h_\mu }=0.07509\) ppm) (see Online Appendix).

$$area_{k}^{{{l^m}}}=are{a_{DB}}^{{{l^m}}}+\left( { - \frac{K}{2}+k} \right) \times a \times ~widt{h_\mu },~~k=1,~2, \ldots ,~K$$
(1)
$$offse{t^{a,K}}=~\frac{K}{2} \times a \times widt{h_\mu }=\frac{K}{2} \times f{d^a}$$
(2)
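As an illustration of Eqs. 1 and 2, the following Python sketch generates the \(K\) shifted search areas for one multiplet. It assumes that a database area is represented as a (lower, upper) pair in ppm and that both bounds are shifted by the same amount. The function names and the example values of \(K\) and \(a\) are hypothetical and are not the optimal values determined in the Appendix.

```python
WIDTH_MU = 0.07509  # mean multiplet area width in ppm, as stated in the text

def search_areas(area_db, K, a, width_mu=WIDTH_MU):
    """Return the K shifted areas of Eq. 1 for a database area (lo, hi) in ppm."""
    lo, hi = area_db
    areas = []
    for k in range(1, K + 1):
        shift = (-K / 2 + k) * a * width_mu
        areas.append((lo + shift, hi + shift))
    return areas

def offset(K, a, width_mu=WIDTH_MU):
    """Return offset^{a,K} of Eq. 2."""
    return K / 2 * a * width_mu

# Hypothetical example: K = 4 areas, a = 0.5, database area 1.40-1.50 ppm.
areas = search_areas((1.40, 1.50), K=4, a=0.5)
max_shift = offset(K=4, a=0.5)
```

With \(k\) running from 1 to \(K\), the shifts in Eq. 1 run from \((1-K/2)\times a\times width_\mu\) to \((K/2)\times a\times width_\mu\), so for even \(K\) the area with \(k=K/2\) coincides with the database area and the last area is shifted by exactly \(offset^{a,K}\).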

An extensive analysis of how the parameters \(offse{t^{a,K}},~a,~K\) affect the behaviour of the algorithm, as well as the determination of the optimal values for the parameters \(offse{t^{a,K}},~a\), can be found in the Appendix. Supplementary Table S-3 (Supplementary File 1) contains the tested parameter values. Supplementary Figure S-3 (Supplementary File 1) contains the average sensitivity, specificity and accuracy of the proposed methodology over the samples S1–S42 with respect to each tested value of \(a\). The preprocessing parameters are set automatically (Sect. 2.2.1) and the optimal values of \(a,~offse{t^{a,K}}\) have been determined; therefore, the user does not make any decisions on methodology parameters.

In every \(area_{k}^{{{l^m}}}\), numerous peak sets are considered as a fit for the multiplet \({l^m}\) of metabolite \(m\) and are scored according to Sect. 2.2.3.3. The peak set \(P~_{k}^{{{l^m}}}\) with the optimal score \(Score~_{k}^{{{l^m}}}\) for every \({l^m},k\) value combination is selected and saved. Therefore, the result of the metabolite screening process for a metabolite \(m\) is, for each \({l^m},k\) combination, one peak set \(P~_{k}^{{{l^m}}}\) (Eq. 3) and its corresponding \(Score~_{k}^{{{l^m}}}\) (Eq. 4).

$$P~_{k}^{{{l^m}}}=\left\{ {p_{{k,1}}^{{{l^m}}}, \ldots ,~p_{{k,n}}^{{{l^m}}},~ \ldots ,~p_{{k,N}}^{{{l^m}}}} \right\}$$
(3)
$$Score_{k}^{{{l^m}}}=\left\{ {\begin{array}{*{20}{l}} {Scor{e_S}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;singlet\;(equation\;(A.10),\;Appendix)} \\ {Scor{e_A}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;A\;order\;multiplet\;(equation\;(A.15),\;Appendix)} \\ {Scor{e_B}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;B\;order\;multiplet\;(equation\;(A.18),\;Appendix)} \\ {Scor{e_C}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;C\;order\;multiplet\;(equation\;(A.18),\;Appendix)} \\ {Scor{e_D}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;D\;order\;multiplet\;(equation\;(A.18),\;Appendix)} \\ {Scor{e_M}_{k}^{{{l^m}}}}&{if\;{l^m}\;is\;multiplet\;without\;rules\;(equation\;(A.22),\;Appendix)} \end{array}} \right.$$
(4)
$$l={l^m}=1,~ \ldots ,~{L^m},~n=1, \ldots ,{N^{{l^m}}},~k=1,~ \ldots ,~K$$

2.2.3.2 Small peak rejection

Spectrum denoising is a valuable preprocessing step but does not eliminate all unwanted noise peaks. Additionally, heavy denoising of a spectrum can eliminate important peaks. Therefore, the function \({V_{bp}}\) was introduced to discriminate prominent peak sets from less prominent ones without relying on denoising and was used to reject candidate peak sets with a \({V_{bp}}\) value below a specific threshold. The \({V_{bp}}\) function represents the percent difference between the mean intensity of the multiplet peaks and the mean intensity of the bin borders corresponding to the multiplet peaks. The definition and the application of the \({V_{bp}}\) function are presented in the Appendix.
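The idea behind small peak rejection can be sketched in Python as follows. The exact definition of \({V_{bp}}\) and the rejection threshold are given in the Appendix and are not reproduced here. This sketch assumes that the percent difference is taken relative to the mean peak intensity, and both the function names and the threshold value `min_v_bp` are hypothetical.

```python
def v_bp(peak_heights, border_heights):
    """Percent difference between mean peak intensity and mean bin-border intensity.

    Assumption: the difference is normalized by the mean peak intensity.
    """
    mean_peaks = sum(peak_heights) / len(peak_heights)
    mean_borders = sum(border_heights) / len(border_heights)
    if mean_peaks == 0:
        return 0.0
    return 100.0 * (mean_peaks - mean_borders) / mean_peaks

def reject_small_peak_sets(candidates, min_v_bp):
    """Keep only candidate peak sets whose V_bp reaches the threshold."""
    return [c for c in candidates
            if v_bp(c["peaks"], c["borders"]) >= min_v_bp]

# Hypothetical candidates: one prominent doublet, one barely above its borders.
candidates = [
    {"name": "prominent", "peaks": [10.0, 10.0], "borders": [1.0, 1.0, 1.0, 1.0]},
    {"name": "shallow", "peaks": [2.0, 2.0], "borders": [1.8, 1.8, 1.8, 1.8]},
]
kept = reject_small_peak_sets(candidates, min_v_bp=50.0)
```

Under these assumptions a peak set that rises well above its bin borders scores close to 100, while a set that barely exceeds them scores close to 0 and is rejected.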

2.2.3.3 Multiplet scoring

The process of finding the best peak set for a multiplet \({l^m}\) in an \(are{a^{{l^m}}}\) can be described as (i) the application of a novel scoring function on each candidate peak set \({P^{{l^m}}}\) in that area to assess its fitness for \({l^m}\), and (ii) the selection of the candidate peak set with the minimum score as a fit for \({l^m}\) in \(are{a^{{l^m}}}\). However, the scoring function that is applied on a candidate peak set to assess its fitness varies depending on the type of multiplet, the different types being singlets, first (A) order multiplets, second (B) or higher (C, D,… symbolized as X) order multiplets and multiplets that do not comply with specific rules. The scoring functions for first or higher order multiplets have been designed to incorporate features such as J-coupling values and height ratios defined in HMDB for a given multiplet \({l^m}\). In contrast, since singlets are not associated with J values or height ratios, the scoring function for singlets has been designed to consider the horizontal distances and height differences between a candidate peak and its neighbouring peaks. Moreover, the scoring function for a multiplet without rules considers the height ratios defined in HMDB for it and the height monotonicity of the candidate peak set, which are the most prevalent features of this multiplet type. The scoring functions for each multiplet type are presented in the Appendix.

Illustrations of scoring for each multiplet category can be found in Supplementary Figure S-4 (Supplementary File 1). Examples of the metabolite screening process on the spectrum of the serum sample S25 are shown in Supplementary Figures S-5 (a–d), Supplementary Text S-1 (Supplementary File 1).
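The two-step selection described in this subsection (score every candidate peak set, then keep the one with the minimum score) can be sketched in Python as follows. The actual scoring functions are those of the Appendix; the `doublet_spacing_score` used here is only a toy stand-in that compares the peak spacing of a doublet with an expected J-coupling expressed in ppm. Enumerating candidates as all subsets of the peaks in the area is also an assumption, and all names are hypothetical.

```python
from itertools import combinations

def best_peak_set(peaks_in_area, n_peaks, score_fn):
    """Return (best_set, best_score), minimizing score_fn over all n_peaks-subsets."""
    best_set, best_score = None, float("inf")
    for candidate in combinations(sorted(peaks_in_area), n_peaks):
        score = score_fn(candidate)
        if score < best_score:
            best_set, best_score = candidate, score
    return best_set, best_score

def doublet_spacing_score(expected_j_ppm):
    """Toy score: absolute deviation of the doublet spacing from the expected value."""
    def score(candidate):
        return abs((candidate[1] - candidate[0]) - expected_j_ppm)
    return score

# Hypothetical peak positions (ppm) found in one search area.
peaks = [1.460, 1.472, 1.480, 1.495]
best, best_score = best_peak_set(peaks, n_peaks=2,
                                 score_fn=doublet_spacing_score(0.012))
```

Passing the scoring function as an argument mirrors the way the scheme applies a different function to each multiplet type while keeping the minimum-score selection unchanged.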

2.2.3.4 Selecting peak combinations

This step consists of processing the metabolite screening results and selecting a final peak set for metabolite \(m\). A combination \(k{c^m}\) of \(k\) values has to be selected as the final fit for metabolite \(m\); this will be the combination with the minimum score among those that have survived the elimination process. First, all possible \(k{c^m}\) combinations for the metabolite \(m\) are formed. Subsequently, all combinations are screened and rejected if they do not meet certain criteria (see Appendix and Supplementary Figure S-6 of Supplementary File 1). The remaining combinations are scored according to Eqs. (5)–(7). \(Scor{e_{Multiplets}}^{{k{c^m}}}\) (Eq. 6) represents the mean of the multiplet scores, which are calculated according to Eqs. (A.10)–(A.24) (see Online Appendix). \(Scor{e_{Heights}}^{{k{c^m}}}\) (Eq. 7) represents how well the heights of all the peaks of \(k{c^m}\) fit the ideal height ratios. The combination with the optimal (minimum) score \(Scor{e^{k{c^m}}}\) is selected as the best fit for \(m\). The combination selection process is described in the Appendix, with examples available in Supplementary Figures S-5 (e–h), Supplementary Text S-1 (Supplementary File 1).

$$Scor{e^{k{c^m}}}~={w_M} \times Scor{e_{Multiplets}}^{{k{c^m}}}+~{w_H} \times Scor{e_{Heights}}^{{k{c^m}}}$$
(5)
$$Scor{e_{Multiplets}}^{{k{c^m}}}=\mu \left( {\left\{ {\begin{array}{*{20}{l}} {Scor{e_S}^{{k{c^m},{l^m}}},~~~if~{l^m}~is~singlet~(equation~\left( {A.10} \right),~Appendix)} \\ {Scor{e_A}^{{k{c^m},{l^m}}},~~~if~{l^m}~is~A~order~multiplet(equation~(A.15),~Appendix)} \\ {Scor{e_B}^{{k{c^m},{l^m}}},~~~if~{l^{m~}}~is~B~order~multiplet~(equation~(A.18),~Appendix)} \\ \ldots \\ {Scor{e_M}^{{k{c^m},{l^m}}},~~if~{l^m}~is~multiplet~without~rules~(equation~(A.22),~Appendix)} \end{array}} \right.} \right)$$
(6)
$$Scor{e_{Heights}}^{{k{c^m}}}=\mu \left( {\begin{array}{*{20}{c}} {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{1}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{1}}}} \right),~~~n=2,~ \ldots ,~{N^m}} \\ {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{{n - 1}}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{{n - 1}}}}} \right)~~n=3,~ \ldots ,{N^m}} \\ {\% error\left( {\frac{{h_{n}^{{k{c^m}}}}}{{h_{{n - 2}}^{{k{c^m}}}}},~\frac{{{h_{DB}}_{n}}}{{{h_{DB}}_{{n - 2}}}}} \right)~~n=4,~ \ldots ,{N^m}} \end{array}} \right)$$
(7)
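Equations 5–7 can be illustrated with a short Python sketch. The text does not define \(\% error\) or give the weights \({w_M}\) and \({w_H}\). This sketch assumes the relative absolute error in percent and equal placeholder weights of 0.5; it also assumes strictly positive peak heights. All function names are hypothetical.

```python
from itertools import product
from statistics import mean

def pct_error(observed, expected):
    """Assumed definition: relative absolute error in percent."""
    return 100.0 * abs(observed - expected) / abs(expected)

def score_heights(h, h_db):
    """Eq. 7: mean % error of the ratios h_n/h_1, h_n/h_(n-1) and h_n/h_(n-2)."""
    n_peaks = len(h)
    errors = []
    for n in range(1, n_peaks):   # n = 2..N in the paper's 1-based indexing
        errors.append(pct_error(h[n] / h[0], h_db[n] / h_db[0]))
    for n in range(2, n_peaks):   # n = 3..N
        errors.append(pct_error(h[n] / h[n - 1], h_db[n] / h_db[n - 1]))
    for n in range(3, n_peaks):   # n = 4..N
        errors.append(pct_error(h[n] / h[n - 2], h_db[n] / h_db[n - 2]))
    return mean(errors) if errors else 0.0

def combination_score(multiplet_scores, h, h_db, w_m=0.5, w_h=0.5):
    """Eq. 5 with Eq. 6 taken as the mean of the per-multiplet scores."""
    return w_m * mean(multiplet_scores) + w_h * score_heights(h, h_db)

def all_combinations(K, n_multiplets):
    """Every assignment of one k value (1..K) to each multiplet of a metabolite."""
    return list(product(range(1, K + 1), repeat=n_multiplets))

# Hypothetical metabolite with two multiplets and two peaks in total.
score = combination_score([10.0, 30.0], h=[1.0, 2.0], h_db=[1.0, 4.0])
combos = all_combinations(K=3, n_multiplets=2)
```

In this example the multiplet term is 20 and the height term is 50, so the equal-weight score is 35. The number of combinations grows as \(K^{L^m}\), which is consistent with the need for an elimination step before scoring.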

3 Results and discussion

3.1 Methodology performance

The methodological scheme’s output is a list of metabolites as well as the peaks assigned to each metabolite. Its results are assessed in terms of metabolite presence as well as peak correctness. The mean accuracy, sensitivity and specificity of the proposed methodology over all samples are presented in Table 1 and Supplementary Tables S-4, S-5 (Supplementary File 1).

Table 1 The average results of the proposed methodology, MetaboHunter and Bayesil on the sample groups S1–S2, S3–S22, S23–S42, MTBLS174 1–18, MTBLS424 1–50, MTBLS242 1–50, MTBLS326 1–42

3.2 Benchmarking

The performance of the proposed methodology has been compared to MetaboHunter, Bayesil and the Autofit function incorporated in the Chenomx software suite 8.3. Table 1 contains the average accuracy, sensitivity and specificity results of these software tools in each sample group. Supplementary Tables S-4, S-5 (Supplementary File 1) contain the accuracy, specificity and sensitivity results of these software tools on each tested sample. The performance of all methods for samples S1–S42 was assessed based on 45 metabolites, while for samples MTBLS174 1–18, MTBLS424 1–50, MTBLS242 1–50, MTBLS326 1–42 the performance was assessed only on metabolites identified by the authors of those studies.

In MetaboHunter, the input spectra were subjected to baseline and phase correction, and the noise threshold defined as a parameter was similar to the one calculated by our method. The method MH2 of MetaboHunter was used, with a shift tolerance of 0.1 ppm and a confidence threshold of 0.5, since a lower confidence threshold value would yield more false positive results, while a higher value would mean fewer true positive results. The results were assessed in terms of metabolite presence as well as peak correctness. This means that even when a metabolite is correctly characterized as positive, it may be considered a false positive when the peaks assigned to it are not correct. More specifically, the peaks assigned to a specific metabolite by MetaboHunter have a 2-digit resolution and always belong to the same peak set, regardless of the input spectrum (Supplementary Figure S-7 in Supplementary File 1). The sensitivity and specificity of MetaboHunter were calculated according to a criterion proposed by Ravanbakhsh et al. (2015) and Everett (2015). Specifically, each peak proposed by MetaboHunter lying more than 0.025 ppm from the equivalent metabolite peak in the spectrum was considered a false prediction. The higher accuracy scores of the proposed methodology may be explained by the fact that it considers potential peak shifting and overlapping.
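The 0.025 ppm criterion can be expressed as a small Python check. This sketch assumes that the "equivalent" peak is any reference peak of the same metabolite lying within the tolerance, which may differ from the pairing actually used in the evaluation. The function names and example peak positions are hypothetical.

```python
TOLERANCE_PPM = 0.025  # criterion cited in the text

def peak_is_correct(proposed_ppm, reference_ppm, tol=TOLERANCE_PPM):
    """True if a proposed peak lies within tol ppm of the reference peak."""
    return abs(proposed_ppm - reference_ppm) <= tol

def count_false_peaks(proposed, reference, tol=TOLERANCE_PPM):
    """Count proposed peaks that have no reference peak within tol ppm."""
    return sum(
        1 for p in proposed
        if not any(peak_is_correct(p, r, tol) for r in reference)
    )

# Hypothetical peak lists (ppm) for one metabolite.
proposed = [1.48, 1.33, 3.90]
reference = [1.47, 1.46, 4.10]
n_false = count_false_peaks(proposed, reference)
```

Here only the first proposed peak falls within the tolerance, so two of the three proposed peaks would count as false predictions.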

Bayesil’s results were assessed in terms of the presence of metabolites and not peak or concentration accuracy, since it does not assign specific peaks for each metabolite. Bayesil is designed mainly for serum, plasma and cerebrospinal fluid samples, but it performed well on amniotic fluid samples.

The accuracy of Autofit (currently embedded in commercial software) was estimated based on the metabolites that were automatically profiled. Autofit identified at most 6 metabolites in any tested biological sample (Supplementary Table S-6, Supplementary File 1). The underperformance of this algorithm may be attributed to pH sensitivity. Autofit is optimized for specific sample preparations and acquisition parameters (temperature = 25 °C, acquisition time = 4 s, relaxation delay = 1 s, spectral width = 12 ppm, NOESY pulse sequences with tmix of 100 ms). Similar results have been reported, with a 54% accuracy of Autofit on synthetic urine spectra (Tardivel et al. 2017).

Our methodology exhibited higher accuracy on the amniotic fluid samples than on the serum samples, owing to the presence of broad peaks of lipid molecules in serum, which overlap with peaks of other metabolites. When the broad lipid peak does not completely cover a multiplet, as with L-Lactic acid, the proposed methodology is able to identify the multiplet (Supplementary Figure S-7 (c) in Supplementary File 1).

Finally, as observed in Table 1 and Supplementary Tables S-4, S-5 (Supplementary File 1), the specificity values are consistently lower than the accuracy and sensitivity values. This is because the three methods are examined for a set of metabolites that are expected to be present in a biological sample; therefore, the number of true negatives is usually lower than or similar to the number of false positives. The differences between the methods' results can be explained by the fact that our algorithm considers the J-coupling values, the height ratios for the peaks of each separate multiplet as well as the height ratios for the peaks of the whole metabolite, the relative distances between the multiplets of a metabolite, the frequency shifts due to pH and concentration, the importance of each multiplet, as well as the % difference in the intensity of a peak from local minima (indicative of how prominent the peak is from the baseline).
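The three performance measures follow the standard confusion-matrix definitions, which the Python sketch below computes. The counts in the example are hypothetical and chosen only to show the effect described above for a panel of 45 metabolites that are mostly present in the sample; they are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical counts: 36 metabolites present, 9 absent.
metrics = classification_metrics(tp=30, fp=5, tn=4, fn=6)
```

With these counts the sensitivity is about 0.83 and the accuracy about 0.76, while the specificity is only about 0.44, because the few true negatives (4) are outnumbered by the false positives (5).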

3.3 Challenges

The metabolite search process does not perceive spectral peaks that have not been recognized during data reduction, which is why denoising was applied frugally to the input spectra. Even though stronger denoising would be beneficial in terms of execution time, it could cause the disappearance of important peaks, introducing false negatives.

The execution time ranges from 200 to 4000 s, depending on the complexity of the input spectrum. The experiments were executed on a personal computer with an x64-based Intel i5 processor at 2.5 GHz and 4 GB of RAM. The program was developed in the MATLAB 7.10.0 programming environment and is supported on MS Windows.

4 Conclusions

A new methodological scheme for the automatic preprocessing and recognition of molecular structures from the 1D 1H-NMR spectra of biological samples has been presented and compared against MetaboHunter and Bayesil. The proposed methodology matches metabolites to spectral peaks based on scoring functions specific to each multiplet type. It was tested on 42 in-house and 160 publicly available biological samples from four studies. The methodology performed efficiently, achieving a mean accuracy of 77.32% over all 160 publicly available spectra, indicating that it could be used to support the metabolite identification in 1H-NMR spectra of biological samples.