1 Introduction

Pancreatic cancer (PC) is currently one of the leading causes of cancer-related death worldwide, with a median survival of six months (Oberstein and Olive 2013). Despite its lethality, PC usually has no early warning signs. Hence, a large number of patients are diagnosed at advanced and incurable stages of the disease. Available screening approaches, moreover, are effective at detecting PC only among high-risk individuals (Shin and Canto et al. 2012). With the advancement of genomics and other post-genomic high-throughput technologies, a tremendous number of genes and proteins have been reported in the literature as potential biomarkers. Notably, an extensive investigation has introduced a compendium of 2516 genes, with evidence of their expression levels, that were associated with PC; these accounted for approximately 12–13% of the known human coding genes (Harsha et al. 2009). Still, there are currently no clinically relevant biomarkers for accurate diagnosis of early-stage PC. Various types of recurrent somatic mutations in PC were observed in a recent integrated genomics, transcriptomics, and proteomics study, which indicated the remarkable complexity of the disease (Cancer Genome Atlas Research Network 2017). Moreover, the precise mechanisms of PC initiation and progression remain poorly understood despite recent significant progress (Hezel et al. 2006; Makohon-Moore and Iacobuzio-Donahue 2016). This gap has hampered both the discovery of useful biomarkers for early detection within the general population and the development of accurate diagnostic tests for clinical practice (McCormick and Lemoine 1998).

The integration of metabolomics in cancer biomarker research, including PC, is an emerging field (Fan et al. 2012; Halbrook and Lyssiotis 2017; Kumar et al. 2017; Perez-Rambla et al. 2017). This approach is expected to facilitate a comprehensive understanding of the metabolic alterations and to reveal underlying connections between PC and non-PC conditions. General advantages and disadvantages of the employed analytical platforms for PC metabolomics studies as well as a discussion of possible strategies were reviewed intensively (Gangi et al. 2014). Also, several metabolomics-based studies aiming to discover and validate metabolic alterations that are distinctly associated with the status of PC have been conducted (Tumas et al. 2016). The detection of distinct features of PC using biofluids and/or tissue at the metabolome level can be used not only to provide new insights into cancer mechanisms but also to develop new diagnostic approaches (Nguyen et al. 2015).

Increasing interest in applying metabolomics to discover and validate biomarkers for early detection and diagnosis of PC has resulted in a variety of profiles regarding the metabolomics approaches and potential candidates. However, there has been no overall survey of the status of this dominant topic. The complexity of a high-throughput technology like metabolomics combined with the unique challenges of a biomarker study requires considerable effort to translate the findings into meaningful knowledge. Additionally, the clinical applicability of novel diagnostic biomarkers provided by omics-based studies is commonly overstated, possibly due to biases in the study design, small sample size, lack of epidemiological data, inadequate analysis, and/or insufficient validation (Lumbreras et al. 2009). In the current study, we conducted a systematic review to provide a comprehensive discussion on the clinical significance, current perspectives, and potential pitfalls of metabolomics-based biomarkers in PC. Finally, the putative biological relationships of reported biomarker candidates were also explored.

2 Materials and methods

The current systematic review was accomplished according to a predefined reporting protocol recommended by Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA, http://www.prisma-statement.org/). The full checklist is shown in Supplementary file 1.

2.1 Systematic literature search

As shown in Fig. 1, a systematic search of PubMed, Scopus, and Web of Science was carried out to identify studies that evaluated the application of metabolomics in diagnostic biomarker discovery and validation in PC. According to a previously published guideline, the following terms were used: “(pancreas or pancreatic) and (tumor or tumour or malignancy or neoplasm or cancer or carcinoma or adenoma) and (“metabolite profiling” or “metabolite analysis” or “metabolic profiling” or “metabolic fingerprinting” or “metabolic characterization” or metabolome or metabolomics or metabolomic or metabonomics or metabonomic or lipidome or lipidomics or lipidomic)” (Aromataris and Riitano 2014). There was no limit in terms of the search duration. The literature search was conducted on 16 September 2017. In addition, the references to relevant articles were reviewed to identify the suitable studies for further assessment.
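For readers who wish to rerun the search, the Boolean string quoted above can be assembled from its three term groups. The following is a minimal sketch only: the helper names (`or_group`, `build_query`) are illustrative, straight quotes replace the typographic quotes of the printed string, and database-specific field tags or syntax are not modeled.

```python
# Term groups copied from the search string quoted in Sect. 2.1.
ORGAN = ["pancreas", "pancreatic"]
DISEASE = ["tumor", "tumour", "malignancy", "neoplasm",
           "cancer", "carcinoma", "adenoma"]
METHOD = [
    "metabolite profiling", "metabolite analysis", "metabolic profiling",
    "metabolic fingerprinting", "metabolic characterization",
    "metabolome", "metabolomics", "metabolomic",
    "metabonomics", "metabonomic",
    "lipidome", "lipidomics", "lipidomic",
]


def or_group(terms):
    """Join the terms of one group with 'or'; multi-word phrases are quoted."""
    quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
    return "(" + " or ".join(quoted) + ")"


def build_query(*groups):
    """Combine the groups with 'and', as in the published search string."""
    return " and ".join(or_group(g) for g in groups)


QUERY = build_query(ORGAN, DISEASE, METHOD)
print(QUERY)
```

Keeping the term groups as separate lists makes it easier to document exactly which synonyms were searched and to adapt the string to the syntax of each database.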

Fig. 1 The whole workflow of the systematic review

2.2 Inclusion and exclusion criteria

The title, year of publication, authors, and abstract of each record were exported to EndNote X6 (Thomson Reuters, NY, USA). Duplicate articles among the databases were removed. The remaining articles were then screened by reading their titles and abstracts. Potentially eligible and uncertain articles were further evaluated by reading their full texts. Eligibility was reviewed by at least two authors to avoid personal biases, and inconsistencies were resolved by consensus.

A study qualified when it examined diagnostic biomarker discovery and validation in PC using any metabolomics platform and described most, if not all, of the following elements: (1) biospecimen (tissue, serum, and/or plasma, etc.); (2) metabolomics platform (GC/MS, LC/MS, and/or NMR, etc.); (3) statistical biomarker selection and evaluation (univariate analysis, multivariate analysis, or statistical learning) for the discrimination of PC from other cancerous and non-cancerous conditions; (4) suitable prediction metrics of the biomarkers (accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC), etc.). We excluded studies if (1) no suitable control groups were used; (2) they were conducted on animal tissues or cell lines; (3) they contained duplicated data or no reliable data; or (4) they were letters, reviews, theses, or conference posters or proceedings. There was no restriction on study design, race, geographical area, or population.

2.3 Study data extraction

A data extraction form was developed in accordance with previously published systematic reviews and guidelines. Basic information of the study (first author, year of publication), patient characteristics (country, patient participant status, clinical stage, prior treatment), control participant status, age, gender ratio, and follow-up were first extracted. Criteria for diagnosis were examined by three levels: guideline, histological confirmation, or not described. Next, the biological specimen, employed metabolomics or lipidomics platforms, nature of the approach (targeted or untargeted), fasting condition, sample collection and storage condition, sample preparation or pretreatment description, internal standard, analytical validation, and metabolite identification methods were described. Finally, we assessed the validation protocol (external validation or cross-validation), outlier detection, suggestive biomarkers (individual markers or panels of markers), quantitative prediction model, and the measurement of predictive performance (Findeisen and Neumaier 2009). The performances of the biomarker panels were preferably reported. Although R2 (coefficient of determination) and Q2 (cross-validated R2) are not the direct indicators for the clinical utility or the future prediction of the diagnostic models, they are commonly employed in metabolomics biomarker research (Xia et al. 2013). Thus, the R2 and Q2 values of PLS-DA or OPLS-DA models were extracted. For each included article, at least two reviewers read and extracted relevant data for the qualitative synthesis. The reviewers retrieved data independently to avoid unexpected error and/or personal biases. Any inconsistencies were discussed to reach a final consensus.
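For reference, the two statistics extracted from PLS-DA and OPLS-DA models are conventionally defined as follows. This is the textbook form, given here as an assumption; individual software packages may differ in detail (for example, in how the cross-validation folds are formed). Here y_i is the coded class label of sample i, \bar{y} is the mean label, \hat{y}_i is the value fitted by a model built on all samples, and \hat{y}_{(i)} is the value predicted for sample i by a model built without the cross-validation fold that contains it.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
Q^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Because Q2 is computed from held-out predictions, a Q2 that falls far below R2 suggests that the model fits the training data better than it predicts new samples.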

2.4 Reporting methodological quality assessment

According to the Metabolomics Standards Initiative, the minimum reporting standard for metabolomic experiments includes adequate reports of sample preparation, experimental analysis, quality control, metabolite identification, and data pre-processing (Sumner et al. 2007). For the evaluation of a biomarker study, a clear objective, proper clinical context, appropriate data source, robust statistical analysis, and external validation are desired (Azuaje et al. 2009; Bossuyt et al. 2015, 2003; Parker et al. 2010; Turakhia and Sabatine 2017). Combining these requirements, we developed a reporting quality assessment panel that covers both the diagnostic study and the metabolomics study aspects of the included studies: (1) the study characteristics, including study design, characteristics of the studied population, diagnostic criteria, criteria for inclusion and exclusion, prior treatment, follow-up, and comparative analysis with a reference (e.g., CA19.9); (2) the sampling and experimental analysis, including sample collection and storage, sample preparation, experimental condition, the use of internal standards, analytical validation method, outlier detection, and metabolite identification; and (3) the computational analysis, including data pre-processing, statistical analysis, and quantitative prediction modeling. At least two authors independently assessed the quality of the included studies in accordance with the extracted data. We used the term “low risk of bias” (L) for an adequate description, “high risk of bias” (H) for a simple description, and “unknown risk of bias” (U) when the item was not described, and we visualized the quality assessment using the QUADAS-2 tool (Whiting et al. 2011). In addition, QUADOMICS, a quality assessment panel specialized for omics-based diagnostic studies, was adopted and applied simultaneously (Lumbreras et al. 2008).

2.5 Pathway enrichment analysis

Compound names were standardized using the Human Metabolome Database version 4.0 prior to the analysis (Wishart et al. 2017). The pathway analysis was conducted using MetaboAnalyst version 3.5 (http://www.metaboanalyst.ca) (Xia and Wishart 2016). The Kyoto Encyclopedia of Genes and Genomes (KEGG) and the small molecule pathway (SMP) were used as the knowledge bases to identify relevant pathways. The over-representation analysis used a hypergeometric test, and relative betweenness centrality was selected for the pathway topology analysis. Pathways with an adjusted P-value (false discovery rate, FDR) of less than 0.05 were considered significantly enriched.
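The statistical core of the over-representation analysis can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reproduction of MetaboAnalyst: the FDR adjustment is assumed to be the Benjamini-Hochberg procedure, the function names are ours, the example counts other than the 46 submitted compounds are hypothetical, and the pathway topology (betweenness centrality) step is not covered.

```python
from math import comb


def hypergeom_sf(hits, background, pathway_size, submitted):
    """P(X >= hits) when `submitted` compounds are drawn without replacement
    from `background` compounds, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, submitted)
    total = comb(background, submitted)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, submitted - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 46 submitted compounds, a background of 1000
# compounds, and three pathways given as (pathway size, observed hits).
pathways = {"pathway A": (20, 5), "pathway B": (35, 2), "pathway C": (12, 1)}
raw = [hypergeom_sf(h, 1000, size, 46) for size, h in pathways.values()]
for name, p, q in zip(pathways, raw, benjamini_hochberg(raw)):
    print(f"{name}: P = {p:.3g}, FDR = {q:.3g}, enriched = {q < 0.05}")
```

The size of the background set strongly influences the resulting P-values, so it should be reported together with the enrichment results.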

3 Results

The whole process of the screening step is shown in Fig. 1. From 972 reports of Web of Science, Scopus, and PubMed, 664 studies were included for title and abstract screening after deleting duplications. We then excluded 590 studies that were not related to the study question, conducted on animal tissues or cell lines, or had no suitable control groups or no reliable data. Letters, reviews, theses, conference posters or proceedings were also ruled out. Only 74 studies were eligible for further full-text assessment. We excluded 49 publications because they also met the exclusion criteria. There were no other papers identified by cross-referencing the relevant articles. Eventually, the final list of included studies that contained 25 papers was subjected to systematic review.
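The counts reported above can be checked for internal consistency with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are ours, and the number of removed duplicates is derived on the assumption that duplicate removal alone accounts for the difference between identified and screened records.

```python
# Counts taken from the screening description in Sect. 3.
identified = 972             # records retrieved from the three databases
screened = 664               # records left after duplicate removal
excluded_at_screening = 590  # excluded on title and abstract
excluded_at_full_text = 49   # excluded after full-text assessment

duplicates_removed = identified - screened
full_text_assessed = screened - excluded_at_screening
included = full_text_assessed - excluded_at_full_text

print(duplicates_removed, full_text_assessed, included)  # 308 74 25
```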

3.1 Characteristics of included studies

The main characteristics of the included studies are described in Table 1. Three studies collected samples from two distinct populations [Xie et al. (2015) from USA and China, Ritchie et al. (2013) from Japan and USA, and Lindahl et al. (2017) from Germany and Sweden]. Biological samples utilized for metabolomic analysis included serum/plasma (22 studies), urine (two studies), and saliva (one study). Apart from the study of Fukutake et al. (2015), in which 360 cancer and 8400 control cases were involved, the sample sizes of the remaining studies were relatively small, and only three studies contained more than 100 PC patients (median of 40, range from 5 to 360). The mean/median age of the PC patients tended to be higher than that of the control groups. The male/female ratio among studies was greater than one. Nevertheless, 11 studies utilized age- and gender-matched control samples. Healthy controls were used in 22 studies. Twelve studies used one or a mixture of non-cancerous or other cancerous conditions as controls. One study used retrospective, 21 used prospective, and two studies used both retrospective and prospective methods for sample allocation. Notably, clinical and laboratory criteria for the diagnosis of PC in metabolomics-based biomarker studies were not adequately reported. In particular, only ten studies (40%) explicitly stated that pathological information was used for the diagnosis. Tumor stage was provided in 19 studies, in which samples in the resectable stages (0–II) accounted for approximately half of the total. The training set of Hirata et al. (2017), the test set of Xie et al. (2015), and the whole data sets of Suzuki et al. (2017) and Urayama et al. (2010) included only PC patients in resectable stages. In the study of Sugimoto et al. (2010), all patients were diagnosed with primary disease. External validation was employed in nine studies (36%). Follow-up of non-cancerous controls was not a common practice, as only 24% of the studies applied it to confirm the diagnosis and grouping.

Table 1 Demographic characteristics of the included studies

3.2 Metabolomics design properties of included studies

The study design and metabolomics approaches of the included studies can be found in Table 2. Most studies used untargeted metabolomics (14 studies, 56%), lipidomics (two studies, 8%), or both (three studies, 12%). Four studies used targeted metabolomics, which focused on amino acids (Fukutake et al. 2015; Hirata et al. 2017; Leichtle et al. 2013) or fatty acids (Di Gangi et al. 2016). Mass spectrometry (MS) was the most commonly used platform for the assessment of novel biomarkers in PC (19 studies), followed by NMR (four studies) and a combination of both (one study). Serum and plasma were the two most important biological specimens in finding potential biomarkers for accurate diagnosis of PC. Thirteen studies used samples collected under fasting conditions. Samples in the included studies were generally stored at −80 °C after collection (22 studies). Before the analysis, sample preparation was generally performed using defined protocols. The use of internal standards was common and was observed in 17 studies. However, only 14 studies reported analytical validation using quality control samples. Compound identification using authentic standards was described in ten studies; in-house library matching and database matching were also popular methods. Finally, although outlier detection is useful for statistical analysis and modeling, only three studies performed it, mostly using principal component analysis (PCA).

Table 2 Study designs and metabolomics approaches of the included studies

3.3 Metabolite-based biomarkers are novel for early diagnosis of PC from healthy controls and other non-cancerous conditions

The included studies focused mainly on the differentiation of PC from healthy controls, PC from non-malignant diseases, and PC from other malignant conditions. Only one study used multiple benign conditions as the control group (Bathe et al. 2011). In general, the reported quantitative values (AUC, accuracy, sensitivity, and specificity) were promising. The AUC of diagnostic models constructed using multivariate analysis ranged from 0.68 to 1.00. The accuracy ranged from 0.78 to 0.99, while sensitivity and specificity ranged from 0.43 to 1.00 and from 0.73 to 1.00, respectively. The performance of the biomarker panels in the included studies can be found in Table 3, and the properties of the included studies are described in detail in Supplementary File 2.
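To make the four performance measures concrete, the sketch below computes them for a small set of scores. The labels, scores, and the 0.5 decision threshold are hypothetical and do not come from any included study; the AUC is computed in its rank (Mann-Whitney) form, i.e., the proportion of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels, where 1 = PC and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def auc_roc(y_true, scores):
    """Area under the ROC curve as the share of correctly ordered pairs."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical data: three PC cases followed by four controls.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # illustrative threshold

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(y_true)
auc = auc_roc(y_true, scores)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"accuracy={accuracy:.3f} AUC={auc:.3f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes performance over all thresholds; this is one reason why the two kinds of measure should be reported together.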

Table 3 The performances of the panels of biomarkers of the included studies

3.4 Reporting methodological quality assessment

The two quality assessment results of the included studies can be found in Fig. 2 and Supplementary File 3. Our in-house quality assessment contains 16 domains based on both the diagnostic study and the metabolomics study aspects. Twenty of 25 studies (80%) adequately reported at least eight of the 16 domains. The remaining five studies adequately reported at least six domains. The diagnostic criteria, criteria for inclusion and exclusion, follow-up, comparative analysis with a reference, and outlier detection were among the least adequately reported domains (Fig. 2a). Only five studies had a follow-up program. According to the QUADOMICS tool, 21 of 25 studies (84%) adequately reported at least eight of the 16 items (Fig. 2b). All 25 studies were either phase I or phase II. Therefore, items 2 and 14, regarding the use of metabolomics testing in practice, were not applicable to any of the studies. According to item 16, a risk of over-fitting was present in 16 of the studies (64%) owing to the lack of an independent validation set.

Fig. 2 The two quality assessment results of the included studies. a Our self-developed quality assessment. b QUADOMICS tool. Items 2 and 14, regarding the use of metabolomics testing in practice, are not applicable to any of the studies and are therefore not presented in the figure

3.5 Pathway enrichment analysis of potential biomarkers

From the included studies, we extracted 132 potential biomarker candidates. Among them, amino acids were dominant. Glutamic acid and histidine were reported in seven studies, followed by glutamine and isoleucine, each of which was reported in five studies. Only the 46 compounds that were reported more than once were submitted to pathway enrichment analysis. As shown in Fig. 3, alanine, aspartate and glutamate metabolism (impact value of 0.75, FDR of 8.44E−04); glycine, serine and threonine metabolism (impact value of 0.42, FDR of 3.39E−04); and taurine and hypotaurine metabolism (impact value of 0.36, FDR of 3.67E−02) were the most prominent pathways derived from the selected biomarkers. In addition, seven other pathways were also enriched: arginine and proline metabolism; aminoacyl-tRNA biosynthesis; methane metabolism; valine, leucine and isoleucine biosynthesis; nitrogen metabolism; cyanoamino acid metabolism; and synthesis and degradation of ketone bodies. Detailed information regarding the reported metabolites and enriched pathways can be found in Supplementary File 4.

Fig. 3 Pathway enrichment analysis of potential biomarkers. Alanine, aspartate and glutamate metabolism; glycine, serine and threonine metabolism; and taurine and hypotaurine metabolism were the three most prominent pathways

4 Discussion

CA19.9 has been the only biomarker approved by the Food and Drug Administration for the clinical management of PC (Partyka et al. 2012). However, the level of CA19.9 also increases in other malignancies and even in benign conditions, whereas it cannot be detected in Lewis antigen-negative patients (Ballehaninna and Chamberlain 2012). Furthermore, CA19.9 had poor clinical utility as a diagnostic biomarker of PC, and its level did not help clinicians decide on additional testing or procedures or manage patients (Singh et al. 2011). Although many papers encompassing various disciplines have introduced novel biomarker candidates for early detection and diagnosis of PC, most of the markers have not proceeded beyond the discovery phase (Chan et al. 2013). Moreover, the reporting quality of the development and validation of the proposed markers has been generally poor among medical disciplines, including cancer research (Moons et al. 2015). The important role of cellular metabolism has recently been acknowledged as a hallmark of cancer, which can facilitate the discovery of promising metabolomics-based biomarkers and novel treatments (Halbrook and Lyssiotis 2017; Pavlova and Thompson 2016). A metabolite whose level is significantly altered in PC and that can be detected accurately may become a biomarker candidate. Nevertheless, PC cells are highly heterogeneous, and this characteristic hampers the development of a straightforward method for accurate diagnosis of this cancer (Sidaway 2017). In a literature review, Tanase et al. (2009) concluded that only the combination of soluble biomarkers could provide adequate sensitivity and specificity for clinical applications. Additionally, circulating biomarkers are of great interest for developing promising biomarker panels (Cappelletti et al. 2015; Konforte and Diamandis 2013). In this paper, we examined the status of oncometabolomics-based biomarker studies for the diagnosis of PC. Many individual markers and marker panels were introduced, and some of them were externally validated. Despite the promising results, these studies primarily focused on the identification of potential biomarkers rather than the validation of their predictive capacity. There are also some remaining issues that may affect the value of the proposed markers for a broader range of clinical use. For example, the sample size was generally small, except in the study of Fukutake et al. (2015). Concrete epidemiological characteristics, diagnostic criteria, and tumor grading were usually missing. Similar problems were also reported in a systematic review of metabolomic profiling of oesophago-gastric cancer (Abbassi-Ghadi et al. 2013). Moreover, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, recommended for multivariable prediction model studies, had not yet been applied (Moons et al. 2015). Herein, we discuss crucial aspects and provide recommendations for future investigations on metabolomics-based diagnostic biomarker studies.

Problems with the study design and validation protocol of a biomarker study lead to the proposal of false-positive candidates or the rejection of genuine candidates (false negatives). In biomarker research, essential information is often not sufficiently provided in diagnostic studies, which eventually hinders study assessment, replication, and application (Azuaje et al. 2009; Bossuyt et al. 2015). The lack of clinically relevant biomedical context in investigations of diagnostic accuracy is also a major shortcoming (Irwig et al. 2002; Pepe et al. 2008). According to our assessment, reporting standards for the metabolomic aspects were not strictly followed. Hence, there is an urgent need to improve the reporting of important elements in diagnostic accuracy research. The Standards for Reporting of Diagnostic Accuracy (STARD), updated in 2015, is an important reference for improving the accurate reporting of diagnostic studies (Bossuyt et al. 2015). It is also worth mentioning that a biomarker study for diagnostic purposes may require different criteria in terms of clinical indicators than a screening-based biomarker study (Pepe et al. 2008). This domain-specific property requires careful consideration of the particular purpose of the biomarker research. For instance, the prospective-specimen-collection, retrospective-blinded-evaluation (ProBE) study design guideline suggests that a diagnostic study should demand a very high sensitivity, while the acceptable false-positive rate (1 − specificity) can be up to 75% (Pepe et al. 2008). In addition, a diagnostic biomarker does not have to be the gold standard for the diagnosis of PC to be valuable. Instead, it can be a companion tool to assist other clinically relevant biomarker panels or diagnostic tests, such as endoscopic ultrasound, enhanced MRI, and ultrasound-guided fine-needle aspiration (Ryan et al. 2014; Takhar et al. 2004).

Surgical resection of the primary tumor followed by adjuvant chemotherapy is the only curative option for PC. Resectable PC mainly corresponds to stage I and stage II disease (Ryan et al. 2014). Therefore, biomarker panels that can accurately diagnose PC at those stages are essential. There is an agreement that an effective approach for early diagnosis should be able to detect PC at the pancreatic intraepithelial neoplasia or T1N0M0 stage (Canto et al. 2013). However, our findings show that none of the included metabolomics-based biomarker studies focused on premalignant lesions, such as pancreatic intraepithelial neoplasia and intraductal papillary mucinous neoplasm, and only five studies paid attention to the early stages of PC when introducing new biomarkers (Fukutake et al. 2015; Hirata et al. 2017; Suzuki et al. 2017; Urayama et al. 2010; Zhang et al. 2014). This may stem from the fact that PC is usually asymptomatic and not visible on conventional imaging studies at early stages (Kamisawa et al. 2016). Furthermore, the inclusion of more PC patients at stage III or IV may lead to an overoptimistic estimate of performance for early diagnosis (Kobayashi et al. 2013). A similar problem was also observed for diagnostic biomarkers of bladder cancer (Rodrigues et al. 2016).

Errors and biases can occur at all phases of biomarker discovery and validation studies and eventually increase the risk of misleading results (Diamandis 2010). For example, a significant difference in age between patients and controls may become a confounding factor that affects the reliability of the biomarker panels (Kuan 2014). Hence, the effects of suspected confounding factors should be measured using proper analyses. In addition, several studies applied a gender- and/or age-matched control approach. Although this method is statistically efficient and helps control potential confounders, it is usually problematic and is not preferred (Pepe et al. 2008).

Among statistical approaches, multivariate regression analysis, partial least squares discriminant analysis, and orthogonal projections to latent structures discriminant analysis have been the most commonly applied methods in PC biomarker research. Nonetheless, there is an increasing trend of using so-called “black-box” machine learning models. On the one hand, machine learning methods substantially increase the effectiveness of discriminant analysis; on the other hand, they raise considerable concerns in the biomedical community, especially because of their lack of interpretability (Foster et al. 2014). This lack of interpretability can be partially addressed by applying the local interpretable model-agnostic explanations (LIME) method to interpret how a particular model works (Ribeiro et al. 2016). It is important to note that consecutive analyses using different statistical and machine learning methods on the same data set, in which the later analyses use the outcome of the previous models as supervised methods for feature selection, may eventually produce overfitted results (Smialowski et al. 2010). This practice of data over-interpretation should be avoided. The clinical value of a set of biomarkers is reflected by the positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio; authors are encouraged to report these parameters, when applicable, in future studies (Usher-Smith et al. 2016). Additionally, there is no universal rule for developing criteria to determine potential biomarkers. Our results show that most of the studies applied univariate and/or multivariate analysis for this purpose. However, the relative difference of a particular compound between groups (fold change) is rarely employed as a criterion. A small change, such as a fold change of 0.9 or 1.1, may be purely due to noise in the data-gathering process and may not be reproducible. However, there is no consensus on what cut-off should be applied. For instance, Crews et al. (2009) suggested a fold change of greater than two in their variability analysis of human plasma, while other authors applied a fold change of 1.5 for their analyses (Patti et al. 2012; Vinaixa et al. 2012). Therefore, researchers may flexibly consider this indicator as an additional criterion when interpreting metabolomics results.
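The predictive values and likelihood ratios recommended above follow directly from sensitivity, specificity, and disease prevalence. The sketch below illustrates the calculation; the function names are ours, and the input values (sensitivity and specificity of 0.90, prevalence of 1%) are hypothetical and were chosen only to show how strongly the positive predictive value depends on prevalence.

```python
def safe_div(num, den):
    """Divide, returning inf for x/0 with x > 0 and nan for 0/0."""
    if den == 0:
        return float("nan") if num == 0 else float("inf")
    return num / den


def predictive_values(sensitivity, specificity, prevalence):
    """Return PPV, NPV, LR+ and LR- for a test applied at a given prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    # Expected fractions of the tested population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return {
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "lr_pos": safe_div(sensitivity, 1.0 - specificity),
        "lr_neg": safe_div(1.0 - sensitivity, specificity),
    }


# Hypothetical test characteristics and prevalence.
result = predictive_values(sensitivity=0.90, specificity=0.90, prevalence=0.01)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Unlike sensitivity and specificity, the predictive values are not fixed properties of a biomarker panel; they must be recalculated for the prevalence of the population in which the test is intended to be used.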

The class imbalance problem, even though it is usually neglected in biomarker research, often results in poor performance of a diagnostic model for the minority class in clinical use (Kondo 2014; Saito and Rehmsmeier 2015). This issue becomes even more problematic when combined with the high-dimensional nature of a high-throughput study (Yin et al. 2013). Additional factors that affect the performance of the prediction include, but are not limited to, an inadequate amount of exploratory data (e.g., small sample size), data complexity, and considerable similarity between the metabolomes of different conditions. Excellent discussions of this particular issue can be found elsewhere (Chen et al. 2014; Lin and Chen 2013). It is important to mention that untargeted metabolomics data contain many redundant and noisy features that negatively affect the performance of classification models (Grissa et al. 2016). Feature selection aims to reduce the data complexity derived from a high-throughput experiment by selecting the features that discriminate between case and control samples. The smaller subset of valuable features can then be used to build different supervised learning models for classification. However, to avoid overfitted results, a separate portion of the data may need to be reserved for feature selection before the typical training and test sets are used to search for the optimal predictive models. In addition, integrating demographic information, clinical signs, and typical laboratory tests with the metabolomics-based biomarker panels may significantly enhance the diagnostic performance (Liesenfeld et al. 2013). For instance, hyperglycemia, increased CA19-9, and elevated branched-chain amino acids (BCAAs) are strongly associated with the status of PC (Pannala et al. 2009). However, this approach is currently not widely applied in metabolomics-based biomarker discovery and validation studies.
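The effect of class imbalance on headline metrics can be illustrated with the group sizes of the largest included study (360 PC cases and 8400 controls; Fukutake et al. 2015). The sketch below evaluates a deliberately useless rule that labels every participant as a control; the rule is a thought experiment and is not a model from any included study, and balanced accuracy is shown only as one commonly used alternative summary.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all participants that are classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of sensitivity and specificity; insensitive to group sizes."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


cases, controls = 360, 8400  # group sizes reported for Fukutake et al. (2015)

# Trivial rule: call everyone a control, so no case is ever detected.
tp, fn, fp, tn = 0, cases, 0, controls

trivial_accuracy = accuracy(tp, fn, fp, tn)
trivial_balanced = balanced_accuracy(tp, fn, fp, tn)
print(f"accuracy = {trivial_accuracy:.3f}, "
      f"balanced accuracy = {trivial_balanced:.3f}")
```

The rule detects no cancer at all, yet its accuracy is close to 0.96 simply because controls dominate the sample. For this reason, accuracy alone is not an adequate summary when the groups are strongly imbalanced.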

Common pre-analytical, analytical, and data processing pitfalls and best practices for different platforms of metabolite measurement have recently been addressed in depth (Baran 2017; Lu et al. 2017). Nevertheless, reporting standards have not been strictly followed in metabolomics studies. According to the Metabolomics Society Data Quality Task Group, there is still a lack of consensus on quality assurance (QA) and quality control (QC) practices among metabolomics laboratories, suggesting the need for courses, specialist meetings and reports, and expert panels to create practical QA and QC recommendations and guidelines as well as to enhance practice in the metabolomics community (Dunn et al. 2017). Besides the lack of QC methods, errors in sample processing and storage can also cause misleading results in biomarker discovery. For instance, metabolic perturbation during quenching and extraction can be avoided by rapid manipulation, and metabolite degradation and interconversion after extraction can be addressed by shortening the time interval between sample preparation and analysis or by using preservatives (Lu et al. 2017). One prominent problem is the short-term and long-term stability of metabolites in biospecimens, since the process of sample acquisition in a biomarker study usually takes months or years (Dunn et al. 2011; Yang et al. 2013). Yang et al. indicated that the concentration of some common metabolites in plasma was altered within as little as one hour when stored at 4 °C. Moreover, the metabolome of the specimens could change significantly over a 5-year period even when stored at −80 °C (Yang et al. 2013). Vulnerable metabolites, therefore, should be considered for exclusion from the biomarker panel. The lack of harmonization in quantitative untargeted lipidomics has been an issue that prevents inter-study and inter-laboratory comparisons (Bowden et al. 2017). However, a recent comparative investigation of 126 identical plasma samples and 29 quality control samples analyzed on nine MS instruments has demonstrated that untargeted lipidomics analysis using different instruments can achieve almost comparable results. The study partly answered the long-standing question of whether different conclusions in biological studies are mainly attributable to the MS instruments (Cajka et al. 2017).

Misidentification of peaks for significantly different compounds in untargeted metabolomics, especially LC–MS-based untargeted metabolomics, is currently a bottleneck that prevents the application of metabolomics in clinical settings (Kind et al. 2017). Peak misidentification may be caused by co-eluting isomers, interferences of similar molecular weight, or in-source fragmentation products (Lu et al. 2017). Estimating the false discovery rate is recommended to improve the robustness of metabolite annotation in untargeted MS experiments (Scheubert et al. 2017). In silico prediction algorithms and computational annotation solutions for putatively identifying compounds have recently become an active research area (Aksenov et al. 2017; Domingo-Almenara et al. 2017). In addition, adding ion mobility spectrometry to a typical LC–MS-based metabolomics or lipidomics study can improve the reliability of metabolite identification (Paglia and Astarita 2017). Authors are also encouraged to include the corresponding Human Metabolome Database (HMDB), PubChem, KEGG, and Chemical Entities of Biological Interest (ChEBI) compound identifiers of the identified metabolites to support further analysis.
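One common way to estimate the false discovery rate of spectral-match annotations is a target-decoy strategy, in which query spectra are searched against both a real (target) library and a decoy library of comparable size. The sketch below is a minimal illustration under that assumption and is not the implementation of any cited tool; the function name `target_decoy_fdr` and all scores are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Assumes the decoy library has the same size as the target library, so the
    number of decoy hits approximates the number of false target hits.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # no annotations accepted, so nothing can be false
    return min(1.0, n_decoy / n_target)


# Hypothetical match scores (e.g., spectral similarity between 0 and 1).
target_scores = [0.95, 0.91, 0.88, 0.80, 0.72, 0.65, 0.60, 0.55]
decoy_scores = [0.70, 0.62, 0.50, 0.41, 0.33, 0.30, 0.25, 0.20]

fdr_strict = target_decoy_fdr(target_scores, decoy_scores, threshold=0.75)
fdr_loose = target_decoy_fdr(target_scores, decoy_scores, threshold=0.60)

print(f"threshold 0.75: estimated FDR = {fdr_strict:.3f}")
print(f"threshold 0.60: estimated FDR = {fdr_loose:.3f}")
```

With these illustrative scores, lowering the threshold accepts more annotations but raises the estimated proportion of false ones, which is the trade-off an FDR estimate is meant to make explicit.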

Depositing raw data in public repositories will greatly contribute to the development of biomarker research. Thus, there is an urgent need to publish papers together with their data on MetaboLights, Metabolomics Workbench, Metabolomics Repository Bordeaux, XCMS Online, or MetabolomeXpress to facilitate the validation, comparison, and meta-analysis of proposed biomarkers (Spicer and Steinbeck 2017). Along with data sharing, national and international efforts to establish and maintain repositories or biobanks of specimens will hopefully accelerate the discovery and validation of new biomarkers.

Among differentially expressed metabolites, amino acids were the most frequently reported. This suggests that amino acid-related pathways might play an important role in the differentiation between PC and non-malignant conditions. The alanine, aspartate and glutamate metabolism pathway and the taurine and hypotaurine metabolism pathway were also enriched in lung cancer (Kumar et al. 2017). The taurine and hypotaurine metabolism pathway was also associated with renal clear cell carcinoma (Yang et al. 2014). In addition, the role of serine and glycine metabolism in cancer cell growth has been thoroughly reviewed and discussed (Amelio et al. 2014). Notably, our study indicated that the valine, leucine and isoleucine biosynthesis pathway was significantly enriched in PC, in agreement with a previous study stating that the concentrations of BCAAs (valine, leucine, and isoleucine) were elevated in the early stage of PDAC development (Mayers et al. 2014). Insulin resistance-related conditions, such as diabetes and obesity, are well-known risk factors for PC, and circulating biomarkers of peripheral insulin resistance were found to be significantly associated with a higher risk of developing PC (Bao et al. 2013; Li 2012; Wolpin et al. 2013). Increased levels of BCAAs were associated with type 2 diabetes mellitus and metabolic syndrome, which may eventually contribute to PC carcinogenesis (Lynch and Adams 2014). Indeed, a recent study suggested that an elevated level of BCAAs in plasma was an independent risk factor for PC (Leake 2014). Furthermore, altered metabolism of BCAAs was observed in many other cancer types such as glioblastoma, lung cancer, and breast cancer (Ananieva and Wilkinson 2018). It is worth noting that we only performed pathway enrichment analysis for biomarkers reported across all groups of patients. At this time, it was not feasible to conduct subgroup pathway analysis by patient characteristics such as age, tumor stage, or prior treatment.
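For readers unfamiliar with pathway enrichment, the general idea can be sketched with a hypergeometric over-representation test, a widely used approach. This is offered as an assumption about a typical method, not as a description of the analysis performed in this review, and the function name and all counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, hits_total, hits_in_pathway):
    """One-sided hypergeometric p-value for pathway over-representation.

    Probability of observing at least `hits_in_pathway` pathway members when
    `hits_total` metabolites are drawn at random from a background of
    `background_size` metabolites, of which `pathway_size` belong to the pathway.
    """
    upper = min(pathway_size, hits_total)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, hits_total - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return favourable / comb(background_size, hits_total)


# Hypothetical counts: 10 background metabolites, 3 of them in the pathway,
# 4 reported biomarkers, 2 of which fall in the pathway.
p_small = enrichment_p_value(10, 3, 4, 2)
print(f"p = {p_small:.4f}")
```

In practice, such p-values are computed for many pathways at once and are then adjusted for multiple testing.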

5 Conclusions

The specific properties of PC have made it a legitimate target for numerous efforts to improve current diagnostic approaches. Most current studies suggest that accurate diagnosis of PC using metabolic alterations in biospecimens is feasible, although further improvement and standardization are needed. Our systematic review and discussion of the current issues in oncometabolomics-based diagnostic biomarker discovery and validation may assist advancement in this area. Given the strong potential of the metabolomics approach, further multi-center investigations with rigorous study designs should be conducted to fully explore its diagnostic utility.