Introduction—metabolites and their importance

Metabolites represent the final downstream layer of the central dogma, beyond the genome, transcriptome, and proteome. Their activity is embedded in large, interwoven networks of biochemical pathways, and studying these networks helps in understanding physiology in both healthy and diseased conditions. Metabolites drive various cellular functions such as energy production, signal transduction, and homeostasis. Metabolism is also dynamic: a change in physiological conditions can alter metabolic pathways and lead to the formation of various metabolites, which can be either essential or non-essential. For example, in aerobic glycolysis, the end product is pyruvate, which regulates insulin release and thus helps maintain blood glucose levels, and can therefore be regarded as essential. However, under anaerobic conditions, especially during intense workouts when energy demand is high and oxygen availability is low, pyruvate is converted to lactate by the enzyme lactate dehydrogenase; the lactate builds up in the bloodstream and may not be essential at that point. In disease conditions such as cancer, cancerous cells deliberately channel pyruvate into anaerobic metabolism, resulting in lactate production and an acidic microenvironment [1]. It is estimated that about 3000 metabolites are essential for normal growth and that 20,000 non-essential metabolites are known to date [2]. Metabolites are generally categorized into two types: primary metabolites and secondary metabolites. Primary metabolites are essential for the growth of the organism. They are the intermediary products of crucial metabolic pathways and play a role in the generation of energy and the formation of biomolecules such as amino acids, carbohydrates, and nucleic acids. Secondary metabolites are not required for metabolic activities and are formed in response to external stimuli such as pharmaceuticals, pollutants, or contaminants.

Alteration in the biochemical processes via genetic aberrations in critical genes can lead to the formation of intermediary metabolites which can be deleterious to the health of the organism [3]. These metabolic shifts can be investigated using diverse metabolomics strategies that aim to identify the specific metabolic changes in patients compared to healthy controls. Subsequently, these findings can be systematically analyzed and validated in larger cohorts for the development of biomarker panels to be used for the diagnosis of diseases [4,5,6,7,8,9,10].

It is now well-appreciated that metabolites play key roles in diverse disease pathophysiology including cancer, and understanding these metabolic alterations can aid in a better understanding of the disease mechanisms and would facilitate the development of biomarkers for early diagnosis. The current review is aimed at summarizing the metabolomic methods dedicated to understanding the alterations in cancer metabolism and the prospect of metabolomics for the development of reliable metabolite biomarkers for cancer diagnosis.

Metabolism and cancer

Cancer results from a complex interplay of epigenetic, genetic, and environmental factors that disrupt the orderly events of cell death and confer on cells the ability to grow uncontrollably. The intricate regulation that directs cells to undergo programmed cell death is lost, and the cells proliferate unchecked, leading to cancer development and spread (metastasis). Notably, cancer initiation, progression, and maintenance are closely linked to alterations in metabolism, and therefore ‘reprogramming energy metabolism’ is recognized as one of the hallmarks of cancer [11].

Reprogramming energy metabolism as a hallmark of cancer

Deregulation of cellular energetics and metabolism is one of the critical hallmarks of cancer, through which cancer cells reprogram metabolic pathways for their own benefit and sustenance. Glucose is the primary source of energy in most mammalian cells, and it is metabolized into pyruvate via the glycolysis pathway. Under aerobic conditions, pyruvate enters the Krebs cycle, where it is completely oxidized to generate ATP (oxidative phosphorylation), which fulfills the energy demands of the cell. Notably, in rapidly proliferating cancer cells that require large amounts of ATP for their sustenance, the majority of the pyruvate is channeled to form lactate via the action of lactate dehydrogenase, a pathway typically active under anaerobic conditions. This aberrant phenomenon of lactate production under aerobic conditions is known as the ‘Warburg effect’ or aerobic glycolysis [12]. Although there is no binary switch between aerobic glycolysis and oxidative phosphorylation, accumulating research indicates that these metabolic alterations in cancer cells are governed by genetic as well as epigenetic changes and ultimately result in enhanced bioenergy production that supports tumor progression [13]. Along these lines, mutations in mitochondrial DNA (mtDNA) that compromise the oxidative phosphorylation pathway drive aerobic glycolysis and the generation of reactive oxygen species (ROS) in cancer cells. In addition, several signaling pathways including mTOR, HIF1α, p53, PI3K/Akt, Erk1/2 MAPK, AMPK, and ULK1 drive metabolic reprogramming and contribute to the Warburg effect [14]. Moreover, cancer cells are characterized by increased glucose uptake, which compensates for the lower ATP yield of the shorter aerobic glycolytic pathway (energy) and feeds the pentose phosphate pathway for the generation of nucleic acid precursors and of NADPH for fatty acid synthesis (biomass) and maintenance of redox homeostasis [15].

In addition to glucose, cancer cells also frequently rely on glutamine as a source of energy and precursor for other amino acids and lipids. This blood-borne amino acid is crucial for the replenishment of the Krebs cycle intermediates, pyruvate, and other building blocks that contribute to cancerous growth. Although glutamine primarily supplements glucose for the production of energy and biomaterials, in times of glucose deficiency, cancer cells can uptake and metabolize glutamine oxidatively to fulfill their energy needs [16].

Another key player in the metabolic reprogramming of cancer cells is lactate, an organic molecule secreted by the cells that undergo anaerobic and aerobic glycolysis. Although lactate was long considered a toxic waste, recent developments point to its tumor-promoting capabilities [17, 18]. More specifically, lactate present in the cytosol of the tumor cells enters the mitochondria via the mitochondrial monocarboxylate transporters where it is converted to pyruvate, which serves as a source of ATP production [19]. In addition, lactate has also been established as a key player in promoting cell migration, metastasis, angiogenesis, immune evasion, cancer cell self-sustenance, and shaping the tumor microenvironment [1]. Furthermore, the accumulation of huge amounts of lactate produced by aerobic glycolysis results in acidification of the tumor microenvironment and it has been found that this acidosis supports angiogenesis, metastasis, and immuno-suppression of tumor cells [20].

Deregulation of the optimal metabolic functioning of a cell, especially that of mitochondrial metabolism, may favor tumor development and progression. In 2020, a team of Italian researchers reported that the production of reactive oxygen species (ROS) within a cellular environment may set off various causative factors that activate different oncogenes within a cell, leading to cancer development, an observation that contradicts the general hypothesis that oncogenes are involved in abnormal ROS production [21]. In another recent study, the level of aconitase 2 (ACO2), an enzyme found in mitochondria, was observed to be diminished in MCF7 breast cancer cells [22]. Moreover, its overexpression leads to impaired cellular proliferation and to deregulation of the pyruvate metabolism pathway by rerouting pyruvate into the mitochondria, which erodes the Warburg effect-like bioenergetic characteristics and suggests that cancers with downregulated ACO2 levels are effectively rewired through metabolic reprogramming [22]. Notably, mitochondrial metabolism not only provides the necessary amounts of ATP but also caters to the requirement of building blocks for anabolism (anaplerosis) of the rapidly dividing cancer cells. Citrate, one of the key Krebs cycle intermediates, is positioned at a critical juncture between the anabolic and catabolic pathways. Apart from fulfilling the needs of oxidative phosphorylation for energy production, citrate can also be converted back to acetyl-CoA by the enzyme ATP-citrate lyase (ACLY); this acetyl-CoA can participate in fatty acid synthesis in the cytoplasm (to meet the requirements of membrane precursors). Moreover, when transported to the nucleus, acetyl-CoA can also take part in acetylation reactions that regulate gene transcription and other cellular processes including autophagy [15].

Overall, it is well-established and evident that metabolic reprogramming offers proliferative advantages to cancer cells, and understanding these metabolic alterations contributes to a better understanding of the disease.

Metabolic diseases as comorbidities in cancer

The co-existence of disorders in addition to a primary disease of interest is defined as comorbidity [23]. Considering the importance of metabolic rewiring in cancer development, it is not surprising that there is a significant association between cancer and other diseases, particularly metabolic disorders such as diabetes, obesity, chronic kidney disease (CKD), and liver cirrhosis. These associations are complex and draw commonalities so that the mechanistic insights into this interdependence can be of significant relevance to widening the therapeutic prospect for cancer [24]. From the disease pathology point of view, this metabolic overlap between cancer and metabolic diseases is often responsible for the occurrence of comorbidity among cancer patients, which is seemingly increasing and affecting the diagnosis, treatment, and prognosis of the patients. Unfortunately, cancer patients with comorbidities have poor survival, worse quality of life, and increasing healthcare expenses [23].

Diabetes, the most prevalent metabolic disease, is closely associated with an increased risk of several cancers including endometrial, colorectal, breast, pancreatic, and liver cancers [25]. Although specific signaling pathways have been implicated in connecting diabetes to cancer, the most relevant mechanism for this association is related to the circulation of higher levels of insulin and insulin-like growth factors in the blood that regulate cell proliferation and apoptosis, thereby increasing the risk of cancer prevalence [26]. In addition, higher circulating levels of glucose have also been found as a risk factor for breast and pancreatic cancer occurrence. Mechanistically, hyperglycemia promotes O-GlcNAcylation modification of ribonucleotide reductase resulting in an imbalance in the nucleotide pools that subsequently drives oncogenic KRAS mutations to support cancerous growth [27].

Obesity, a condition resulting from an excessive amount of fat accumulation in the body, increases the risk of diabetes, cardiovascular diseases, and cancer. Recent epidemiological reports have revealed associations between obesity and several cancers including breast, kidney, colorectal, liver, esophageal, endometrial, thyroid, bladder, and pancreatic cancers [28]. In particular, a study conducted by Calle and co-workers on a cohort of U.S. adults suggested that 19.8% of cancer mortalities in women and 14.2% in men are directly linked to obesity [29]. The metabolic alterations during the development of obesity are potent enough to induce epigenetic as well as genetic changes that support cancer growth and maintenance. More specifically, obesity-mediated insulin resistance developed due to the secretion of leptin, cytokines, free fatty acids, and triglycerides from adipocytes promotes oncogenic alterations that drive cancer development [30].

Chronic kidney disease (CKD) is a medical condition characterized by steady but continuous loss of kidney function over time. Notably, CKD and cancer are linked in a complex way; while CKD can be a risk factor for malignant transformation, cancer therapies are also shown to contribute to CKD development. In particular, end-stage CKD patients undergoing kidney transplantation and dialysis are at a higher risk for some cancers such as melanoma, renal carcinoma, Kaposi sarcoma, and thyroid cancer. Conversely, several cancer subtypes, in particular prostate, lung, breast, and colorectal cancers are also frequently implicated as risk factors for the development of CKD [31].

Hypertension has also been recognized as one of the most common comorbidities in cancer patients which results from the use of angiogenesis inhibitors, alkylating agents, and immunosuppressants as chemotherapeutic agents [32, 33]. Cancer treatment with angiogenesis inhibitors such as tyrosine kinase inhibitors (sorafenib, pazopanib, and sunitinib) and anti-vascular endothelial growth factor antibody (bevacizumab) leads to the development of hypertension, although the mechanism is still under investigation. Further, alkylating agents cause vasoconstriction and arterial endothelial dysfunction that significantly contribute to chemotherapy-induced cardiotoxicity [34].

Therefore, it is evident that several metabolic disorders develop as comorbidities in cancer, or they can drive cancer development, and affect the overall survival of the patients. Nevertheless, sufficient data and mechanistic studies on the specific comorbid conditions are warranted for better management and treatment of cancer patients.

Oncometabolites and their mechanism of action in cancer and metastasis

Alterations in the metabolic pathways during tumor development primarily result from the gain of function or the loss of function mutations in genes encoding enzymes that are involved in the respective metabolic pathway. Subsequently, this leads to changes in metabolite turnover in the cell and some of the metabolites aberrantly accumulate in the cancer cells. Such abnormally accumulated metabolites are termed oncometabolites [35, 36]. Although several metabolic intermediates were initially proposed as oncometabolites, only a handful of them including succinate, fumarate, and 2-hydroxyglutarate have been well-established (Fig. 1). Notably, these oncometabolites share significant structural similarity and they operate in the metabolic proximity of the Krebs (TCA) cycle [37].

Fig. 1
Mechanism of action of the oncometabolites (fumarate, succinate, L-2-hydroxyglutarate and D-2-hydroxyglutarate) that are generated from the Krebs cycle due to mutations in fumarate hydratase (FH), succinate dehydrogenase (SDH) and isocitrate dehydrogenase (IDH) enzymes

2-Hydroxyglutarate was the first oncometabolite to be identified, and the two other structurally similar members were included subsequently. There are two stereoisomeric forms of this oncometabolite, D-2-hydroxyglutarate and L-2-hydroxyglutarate, and they are generated via discrete metabolic pathways. Overproduction and accumulation of D-2-hydroxyglutarate are linked to gain-of-function mutations in the IDH1 or IDH2 genes encoding the enzyme isocitrate dehydrogenase, which catalyzes the reversible conversion of isocitrate to α-ketoglutarate. Mutated isocitrate dehydrogenase attains a neomorphic activity that reduces α-ketoglutarate to D-2-hydroxyglutarate instead of isocitrate, thereby leading to abnormal accumulation of the oncometabolite [38]. In contrast, the accumulation of L-2-hydroxyglutarate results from the nonspecific activity of malate dehydrogenase, which catalyzes the reduction of α-ketoglutarate to L-2-hydroxyglutarate [39]. Similarly, the accumulation of succinate and fumarate results from mutations in the succinate dehydrogenase and fumarate hydratase enzymes, respectively [40, 41].

Although the oncometabolites produce distinct downstream effects in cancer cells, they also share a few common targets due to their structural similarity. Along these lines, all these oncometabolites can inhibit 2-ketoglutarate-dependent dioxygenases including prolyl hydroxylase domain proteins (PHDs), lysine demethylases (KDMs), and ten-eleven translocation (TET) enzymes [37]. In normal cells under normoxia, PHDs promote hydroxylation of proline residues of hypoxia-inducible factors (HIFs), which are subsequently ubiquitinated and degraded by the ubiquitin–proteasome system. However, in cancer cells, the accumulated oncometabolites inhibit the function of PHDs, thereby stabilizing HIFs, which promote transcription of genes responsible for angiogenesis and cancer cell growth. In addition, succinate can also promote angiogenesis in an HIF-independent manner by upregulating the expression of vascular endothelial growth factor [42]. Moreover, fumarate can promote the stabilization of HIF-1α by non-canonical activation of NF-κB signaling [43]. KDMs are responsible for demethylating lysine residues in histones, whereas TET enzymes catalyze the hydroxylation of 5-methylcytosine in DNA CpG dinucleotides. Therefore, inhibition of KDMs and TET enzymes results in aberrant histone and DNA methylation, respectively, which promotes epigenetic modifications in the cancer cells [44].

A growing volume of literature suggests that mitochondrial oncometabolites alter distinct biological pathways in cancer; in-depth mechanistic insights into their involvement are therefore still required for a comprehensive understanding of disease progression and for the development of diagnostic and therapeutic interventions. Notably, mitochondrial metabolism plays a crucial role in cancer, and targeting altered mitochondrial metabolites can therefore lead to novel and personalized strategies for cancer therapy [45]. Apart from the above oncometabolites, a wide range of metabolomics studies have also identified several metabolites that are altered in different cancers (see Section "Data processing and preprocessing tools"). However, the metabolic pathways associated with these alterations and their involvement in cancer progression still warrant deeper investigation. In particular, the enzymes positioned upstream or downstream of the metabolic defects can be investigated to gain insight into the cause of the metabolic alterations. For instance, the knockdown of genes encoding fatty acid synthase (FASN) and acetyl-CoA carboxylase (ACC), two important enzymes involved in fatty acid synthesis, promotes apoptotic cell death of cancer cells. Similarly, inhibition of the enzyme ATP-citrate lyase (ACLY), which catalyzes the conversion of citrate to acetyl-CoA, leads to growth arrest in cancer cells via inactivation of the Akt pathway [46]. Stearoyl-CoA desaturase 1 (SCD) is a crucial metabolic enzyme involved in the conversion of saturated fatty acids to mono-unsaturated fatty acids. A study by Rueda-Rincon and co-workers demonstrated, using targeted lipidomics analysis, that the knockdown of SCD leads to the accumulation of saturated phospholipids of the phosphatidylinositol headgroup class, which could attenuate the Akt pathway [47].

Further, the metabolic regulation of cancer cells during metastasis is a complex and dynamic process that, despite extensive research in cancer metabolomics, is still incompletely understood. Metastasis involves the spread of cancer cells from the primary tumor site to distant organs of the body, including unrelated organs. Involving several biochemical reactions and cascades, metastasis is metabolically highly inefficient, as most cancer cells do not survive the process. However, some cancer cells undergo metabolic adaptations, dysregulating intrinsic metabolic pathways at different stages of the metastatic cascade to optimize their survival in these inhospitable microenvironments [48].

Numerous studies provide evidence that neoplastic cells in primary tumors frequently reside within a hypoxic tumor microenvironment (TME), where they utilize anaerobic glycolysis to support cellular growth and proliferation [49,50,51]. Owing to significant intratumoral heterogeneity, some cancer cells capable of metastasizing detach from the primary tumor and encounter elevated oxidative stress, which necessitates various metabolic and transcriptional adaptations to endure the hostile milieu of the bloodstream [52,53,54]. Following migration and implantation in remote organs, cancer cells reconfigure their metabolic pathways to survive and propagate using the nutrients and oxygen available at these non-primary (metastatic) sites. At the onset of metastasis, cancer cells primarily attempt to invade the tumor-associated stromal regions and undergo the epithelial-to-mesenchymal transition (EMT), which results in loss of polarity, a substantial increase in invasive and stress-resistant potential, and eventually a mesenchymal-like phenotype [55]. In metastatic cancers, several studies have reported that metabolites such as 2-hydroxyglutarate modulate EMT via the regulation of several transcription factors [56,57,58].

Importance of metabolite fluxes in cancer and its metastasis

Several critical intermediate steps in various biochemical pathways of a living system are frequently perturbed and impaired in various diseases, which contributes to their complexity. The metabolism at a cellular level is highly dynamic and coordinated wherein the metabolite nutrients are perpetually utilized towards energy production. Metabolic biochemical reactions can be described in terms of the rate of these reactions, technically known as metabolic flux. Metabolic flux refers to the rate of transformation of a substrate to product metabolites in units of moles per unit of time per cell in complex networks of several biochemical reactions [59]. One or more metabolite fluxes may be altered as the overall result of metabolic disorders and tumorigenesis at the organ and organism levels [60, 61]. Metabolic flux is governed by several components of the living organism such as genotypic and environmental features and is decisive in determining the disease or healthy phenotypes [62,63,64]. Isotope tracing is the method of choice widely used in metabolic flux analysis and primarily uses nutrient metabolites labeled either with stable isotopes (like 13C, 2H, and/or 15N) or radioactive isotopes (like 18F, 3H, 14C) [65]. Broadly, there are two categories of metabolic fluxes, namely, intracellular metabolic fluxes and extracellular metabolic fluxes. Intracellular metabolic fluxes are those in which the fluxes are limited within the cell and do not cross the cell membrane, while extracellular metabolic fluxes are the fluxes that are not limited within the cell and can cross the cell membrane facilitating intercellular metabolic crosstalk. The impacts of extracellular metabolic flows or fluxes, for instance, affect the rate at which amino acids or carbohydrates like glucose are taken up or the rate at which cellular biomass grows, which can be easily evaluated in the extracellular environment (Fig. 2). 
As a result, these fluxes can be assessed directly by monitoring the change in concentration of extracellular metabolites or biomass over time. However, a similar strategy cannot be applied to the measurement of intracellular fluxes. These measurements can instead be achieved through the incorporation rates and labeling patterns of isotopically labeled metabolites, such as 13C-glucose, within a cell or organism [66]. The labeled metabolites are consumed within the cells to carry out physiological functions, and the biochemical reactions are thereby traced by the labeled metabolites, which eventually helps in the measurement of metabolic fluxes within a system [67]. Biologists strive to understand the perturbations and intricacies of the various molecular events that play a role in oncological diseases. One of the research goals in the metabolomics of cancer and metastasis is to deduce the complex metabolic rewiring that supports the extensive biosynthetic and energetic demands of cancer cells and tumors. Metabolic flux analysis is one analysis pipeline that helps cancer biologists resolve the complex metabolic cascades in various cancers. In cancer research, comprehending the tumor-specific changes in metabolic flux can aid in identifying the reliance on particular enzymes, which can then be selectively targeted by pharmacological inhibition to combat cancer cells [61, 68]. Several researchers have used metabolic flux analysis to report the intricacies of altered metabolism in cancer cells when they metastasize to distant organs. Using human and mouse models across several cancers, researchers have compared metabolic flux in metastatic tumors with that in non-metastatic, localized primary tumors [54, 69,70,71,72,73,74,75,76,77]. In one isotope tracing study, Labuschagne et al. reported that the clustering of detached cancer cells is closely tied to their metastatic potential, as these cells are protected from the reactive oxygen species (ROS) present in the circulation through metabolic alterations that buffer oxidative stress [78]. Similarly, using in vivo 13C tracer analysis, Christen et al. assessed pyruvate carboxylase-dependent anaplerosis in breast-cancer-derived lung metastases and primary breast tumors and found that lung metastases show higher pyruvate carboxylase-dependent anaplerosis than primary breast cancers [71]. Such studies have shed light on the potential of isotope tracing methods to provide a deeper understanding of the metabolic switches in cancer and metastasis.
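
The two kinds of measurement described above can be illustrated with a small worked example. The Python sketch below is not taken from the cited studies; the function names, unit choices, and numbers are hypothetical. It assumes that an extracellular flux can be approximated from the concentration change in the medium divided by the time-averaged cell number, and that isotope labeling can be summarized as the mean fractional 13C enrichment of a mass isotopologue distribution (M+0, M+1, ...), without correction for natural isotope abundance.

```python
def extracellular_flux(conc_start_mM, conc_end_mM, volume_mL,
                       cells_start, cells_end, hours):
    """Net exchange flux in fmol per cell per hour (negative = uptake).

    Assumption: the cell number is approximated by the mean of the
    start and end counts over the measurement interval.
    """
    # mM * mL = micromoles; convert to femtomoles (1 umol = 1e9 fmol)
    delta_fmol = (conc_end_mM - conc_start_mM) * volume_mL * 1e9
    mean_cells = (cells_start + cells_end) / 2
    return delta_fmol / (mean_cells * hours)


def mean_enrichment(isotopologue_fractions):
    """Mean fractional 13C enrichment of a metabolite.

    isotopologue_fractions[i] is the abundance of the M+i isotopologue,
    so a metabolite with n carbons has n + 1 entries. No natural
    abundance correction is applied (simplifying assumption).
    """
    n_carbons = len(isotopologue_fractions) - 1
    total = sum(isotopologue_fractions)
    labeled = sum(i * f for i, f in enumerate(isotopologue_fractions))
    return labeled / (n_carbons * total)


if __name__ == "__main__":
    # Hypothetical culture: glucose drops from 25 mM to 23 mM in 2 mL over 24 h
    print(extracellular_flux(25.0, 23.0, 2.0, 1e6, 1e6, 24.0))
    # Hypothetical glucose pool: half unlabeled (M+0), half fully labeled (M+6)
    print(mean_enrichment([0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]))
```

With these hypothetical inputs the glucose uptake is roughly 167 fmol per cell per hour and the mean enrichment is 0.5. Estimating intracellular fluxes in practice requires fitting a network model to such labeling data; the enrichment value is only a first summary of label incorporation.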

Fig. 2
A representation of metabolic flux during cellular metabolism where metabolic nutrients are constantly consumed, utilized for cellular biomass and energy production, and ultimately secreted out (Figure was created with BioRender under a paid subscription)

OMICS approaches in cancer research and biomarker discovery

The omics approaches consist of the study of genes (Genomics), mRNAs (Transcriptomics), proteins (Proteomics), and metabolites (Metabolomics) using high throughput techniques such as next-generation sequencing, RNA-Seq, liquid chromatography-mass spectrometry (LC–MS), nuclear magnetic resonance (NMR), etc. The omics approach aims to achieve a holistic view of various biomolecules involved in disease conditions such as cancers and other disorders in comparison to healthy individuals. Notably, an integrated omics strategy is crucial for a deeper understanding of the initiation, progression, and maintenance of cancer. In addition, omics technologies are competent enough to identify the cellular, biochemical, or molecular alterations during cancer development that can be measured in biological media such as tissues, cells, or biofluids.

Genomics, the pioneer omics technology, deals with the identification of alterations such as deletions, mismatches or transitions, and single nucleotide polymorphisms between nucleotide base pairs of the DNA in genes. Several gene mutations implicated in cancers have been identified using techniques like PCR, microarray, whole genome sequencing (WGS), and next-generation sequencing (NGS). Despite great advancements in genomics, genetic information is not sufficient to predict the outcome of the disease [79]. Transcriptomics involves the study of transcription products such as mRNA of the whole cell. The main advantage of studying a transcriptome over a genome is that the transcript or mRNA in a cell reflects the expression of the active genes under the given conditions. Therefore, understanding the transcriptome is necessary for elucidating the functional part of the genome that is active during cancer or normal conditions. Transcriptomic analysis employs a wide variety of techniques right from northern blotting, RT-PCR, and microarrays, to high throughput RNA-Seq. Largely with the advent of RNA-Seq, the detection of differentially expressed genes has become more robust compared to microarray techniques [80].

The proteome, which encompasses the complete set of proteins, is the most complex component that reflects the dynamics of the cell. Proteins drive the cellular processes and pathways promoting growth, proliferation, and apoptosis (or evading cell death), and the proteome significantly varies in healthy and disease states. The altered expression of proteins and unwanted post-translation modifications (PTMs) affect the protein structure and thereby their function and such events have been reported in diverse malignancies. These factors cannot be evaluated solely by genomic and transcriptomic approaches. The introduction of a wide array of techniques, for example, 2D electrophoresis, MALDI, SELDI, Electrospray ionization, reverse phase protein array, etc., to the field of clinical proteomics has helped to understand the mechanism of carcinogenesis as well as to discover biomarkers for diagnosis and therapeutic targets, thus making proteomics an essential tool to bridge the gap between genomics and cellular physiology [81, 82].

Metabolomics is currently an evolving field that deals with extracting information about all the metabolites present in a biological sample. Metabolites are the downstream products of central dogma; in other words, metabolites are the continuation of central dogma after proteins. Slight perturbations in the gene, transcript, or protein level prominently influence the metabolism of the organism which indicates a highly dynamic behavior of the metabolome. Numerous studies have revealed alterations in metabolism during malignant disorders. With the advancement of high throughput technologies such as Nanostructure-Initiator Mass Spectrometry (NIMS) that can detect metabolites in the concentration of yoctomole range within tissues and biofluids in a matrix-free desorption ionization mode, metabolomics provides a unique opportunity to understand cancer pathogenesis and to identify biomarkers for cancer diagnosis [83]. In addition, the evolution of robust computational tools has also broadened the ability to collect data at various levels [84, 85]. Figure 3 depicts a schematic representation of the metabolomics-based cancer biomarker discovery pipeline.

Fig. 3
A schematic representation of untargeted metabolomics-based cancer biomarker discovery workflow for developing novel candidate biomarkers (Figure was created with BioRender under a paid subscription)

Integration of multi-layered omics data

Metabolites are closely associated with several biological processes that are linked to a multitude of variables, such as genetic variation, environmental influences, and altered enzyme levels or kinetic activity. Recent studies have shown that the integration of multi-omics data provides a better understanding and a more vivid picture of the system under study. For instance, integrative analysis of metabolomics and transcriptomics data from human prostate cancer tissues revealed molecular perturbations associated with prostate cancer. More specifically, the authors found impaired sphingosine-1-phosphate receptor 2 signaling (associated with sphingosine metabolism) in cancer patients, representing the loss of a tumor suppressor gene and a potential key oncogenic pathway for therapeutic targeting [86]. Therefore, integration of multi-omics data is essential to comprehend the interrelationship among biomolecules and to understand the flow of information from one omics level to the other. Meaningful biological insights from high throughput metabolomics analysis can be obtained through network-based data integration tools such as MetaCore™, InCroMAP, 3Omics, and MetaboAnalyst. These platforms offer varying degrees of data integration, from visual representation of multi-omics data networks to specific modules for integrated pathway analysis. MetaboAnalyst, for instance, allows users to map genes and metabolites onto KEGG pathways, thereby reflecting changes in both gene expression and metabolite concentrations to identify implicated pathways [87,88,89]. Several reviews are available that discuss the methodologies and analytical frameworks for multi-layered omics data integration [90,91,92,93,94].
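
The idea of mapping altered genes and metabolites onto a pathway to identify implicated pathways can be sketched with a generic over-representation test. The Python example below is only an illustration under stated assumptions: it is not the API or the algorithm of MetaboAnalyst or any other tool named above, the identifiers are hypothetical, and it assumes that genes and metabolites are pooled into one identifier set and tested with a one-sided hypergeometric test.

```python
from math import comb


def pathway_enrichment_p(altered, pathway, background):
    """One-sided hypergeometric p-value for pathway over-representation.

    altered    -- set of gene/metabolite IDs found to be changed
    pathway    -- set of IDs annotated to the pathway
    background -- set of all IDs that could have been measured
    Returns P(overlap >= observed) under random sampling.
    """
    altered = altered & background
    pathway = pathway & background
    n_background = len(background)
    n_pathway = len(pathway)
    n_altered = len(altered)
    observed = len(altered & pathway)

    total = comb(n_background, n_altered)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_altered - i)
        for i in range(observed, min(n_pathway, n_altered) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical pooled identifiers (genes and metabolites together)
    background = {f"id{i}" for i in range(10)}
    pathway = {"id0", "id1", "id2", "id3"}
    altered = {"id0", "id1", "id9"}
    print(pathway_enrichment_p(altered, pathway, background))
```

In a real analysis the test would be repeated for every pathway and the resulting p-values corrected for multiple testing; dedicated tools additionally weight pathway topology, which this sketch ignores.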

Of note, integrating multi-omics datasets to gain comprehensive insights into biological processes and diseases presents numerous challenges. These hurdles encompass the inherent heterogeneity within individual omics data, the computational intensity necessitated by the substantial dataset sizes, and the absence of definitive studies guiding the selection of appropriate analytical tools. Multi-omics data are acquired on a wide range of platforms, leading to considerable variation in data storage and formats. Consequently, most integrative analysis tools accept specific data formats, typically requiring preprocessing of individual omics datasets. This preprocessing phase includes data filtering, systematic normalization, batch effect removal, and rigorous quality assessments. The careful application of these preprocessing steps is vital, given their profound impact on subsequent integrative analyses. Notably, data filtering plays a pivotal role in noise reduction and feature selection, a critical aspect of computational-intensive integrative models. Nevertheless, establishing appropriate filtering criteria remains challenging, underscoring the absence of universal standards in the field. Addressing these challenges is essential for realizing the full potential of multi-omics data integration in elucidating complex biological phenomena and disease mechanisms [89, 94, 95].
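The filtering and transformation steps named above can be illustrated with a minimal sketch. It assumes a small feature table held as a dictionary of intensity lists with None marking missing values; the 50% missingness cutoff, half-minimum imputation, and log2 transform are illustrative choices, not recommendations taken from the cited studies, and all names are hypothetical.

```python
import math

def filter_and_transform(table, max_missing_fraction=0.5):
    """Drop sparse features, impute remaining gaps, and log2-transform.

    table: dict mapping feature name -> list of positive intensities
           (None = missing). Cutoff and imputation rule are assumptions.
    """
    processed = {}
    for feature, values in table.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if not observed or missing_fraction > max_missing_fraction:
            continue  # data filtering: feature is too sparse to keep
        fill = min(observed) / 2  # half-minimum imputation for missing values
        processed[feature] = [
            math.log2(v if v is not None else fill) for v in values
        ]
    return processed

# Hypothetical intensities for two features across four samples.
raw = {
    "feature_A": [1024.0, 2048.0, None, 4096.0],
    "feature_B": [None, None, None, 512.0],
}
clean = filter_and_transform(raw)
print(clean)
```

Here feature_B is removed because three of its four values are missing, which shows how strongly the chosen cutoff shapes the feature set passed to downstream integrative models.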

Mapping cancer metabolism: global and targeted strategies

Metabolic alterations in cancer can be investigated using diverse analytical approaches, and each of these strategies has its advantages and disadvantages [96]. These metabolomics approaches can be broadly classified under untargeted (global) and targeted categories [97]. Generally, biomarker discovery or hypothesis generation studies are performed in an untargeted manner in which metabolites from healthy individuals and cancer patients are identified, and the relative abundances of the metabolites are compared. Subsequently, a panel of biomarkers containing significantly altered metabolites in the cancer samples is identified which can be further validated in a separate larger cohort (Fig. 3). In addition, global metabolomics studies also aim to map the metabolic dysregulation in terms of altered metabolic pathways during cancer progression and provide a better understanding of the disease pathology. Furthermore, such untargeted approaches can be extended to map the metabolic alterations during response to chemotherapy, chemoresistance as well as cancer prognosis [96]. On the other hand, targeted approaches are employed for hypothesis testing or validation experiments. These studies concentrate on a specific set of metabolites intending to measure them in a quantitative or semi-quantitative manner across the samples under investigation (Fig. 4). Notably, findings from a global metabolomics approach are usually validated using chemically characterized standard metabolites in a targeted approach [97]. Currently, the metabolomics platforms that are widely used for investigating alterations in cancer metabolism either rely on mass spectrometry (MS) or nuclear magnetic resonance (NMR) spectroscopy [98]. However, matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI) and NMR-based in vivo imaging techniques have found restricted applications in cancer research [99, 100].

Fig. 4

A schematic representation of MS-based targeted metabolomics strategy used for subsequent validation of the findings of untargeted metabolomics approach

NMR spectroscopy can be used to study the chemical structure of metabolites through the magnetic properties of their constituent atoms. In principle, during NMR acquisition, the metabolites are placed in a strong static magnetic field and subsequently pulsed with radiofrequency waves, typically in the range of tens to hundreds of megahertz depending on the nucleus and the field strength. The nuclei of many atoms (for example 15N, 1H, 31P, and 13C) have an inherent spin and electrical charge, so in the external magnetic field their spin states separate into discrete energy levels. The radiofrequency pulse transiently excites these nuclei from the lower to the higher energy spin state. Subsequently, when the nuclei return to the ground state, they emit radiofrequency signals with discrete spectroscopic patterns (also known as the NMR spectrum). Analysis of such a spectrum can provide information on the electromagnetic environment and the position of the excited atoms, and therefore the identity of the metabolite (Fig. 5). The major advantage of this method is that it can be used to study metabolites in any form of biological sample, whether in the solid, liquid, or gaseous phase. As a result, it does not demand elaborate sample processing, which enables rapid data acquisition and analysis. However, the major limitation of NMR spectroscopy is its sub-optimal sensitivity of detection, which is usually in the micromolar range. Since many metabolites in biological samples are of low abundance, capturing a comprehensive metabolic snapshot with this platform becomes quite challenging [98, 101]. In addition, peak overlapping and peak shifting (due to pH or metal ions) in NMR spectroscopy can also affect the accuracy of biomarker identification and validation in biological samples [102,103,104].
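The frequency of the excitation pulse follows from the Larmor relation, in which the resonance frequency of a nucleus equals its gyromagnetic ratio (divided by 2π) multiplied by the static field strength. The sketch below is only a rough illustration: the gyromagnetic ratios are approximate literature values, and the 14.1 T field is an assumed example magnet, not a value from the cited studies.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (approximate values).
# The ratio for 15N is negative in sign; only its magnitude is used here.
GAMMA_OVER_2PI_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
    "31P": 17.235,
    "15N": 4.316,
}

def larmor_frequency_mhz(nucleus, field_tesla):
    """Resonance frequency nu = (gamma / 2*pi) * B0, returned in MHz."""
    return GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * field_tesla

# Assumed example field strength of 14.1 T.
for nucleus in GAMMA_OVER_2PI_MHZ_PER_T:
    frequency = larmor_frequency_mhz(nucleus, 14.1)
    print(nucleus, round(frequency, 1), "MHz")
```

At this assumed field strength, 1H resonates at roughly 600 MHz, and the other listed nuclei resonate at lower frequencies in the same field.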

Fig. 5

A schematic depiction of NMR-based metabolomic profiling for candidate biomarker discovery

Alongside NMR, MS-based platforms are the most widely used for metabolomic investigations due to their superior sensitivity and increased metabolome coverage [98]. MS-based approaches involve the determination of the mass-to-charge ratio (m/z) of the ionized metabolites (precursor ions) as well as their fragment/daughter ions, which are distinct to each metabolite. The metabolites present in a biological specimen are ionized upon introduction to the mass spectrometer in a gaseous or liquid phase and are thereafter separated based on their m/z values under the influence of the electromagnetic field of the mass analyzer. Notably, the complexity of the samples can be reduced by introducing an initial chromatographic separation of the sample before acquiring MS measurements. This significantly enhances the resolution and aids in the identification of low-abundant as well as isobaric (same mass) metabolites in a given biospecimen. The two most frequently used platforms for achieving an initial chromatographic separation for metabolomics studies are gas chromatography (GC) and liquid chromatography (LC). In addition, capillary electrophoresis (CE) has also been employed in a few recent metabolomics studies. Initial separation by GC followed by MS (GC–MS) analysis has found tremendous applications in cancer metabolomics, including biomarker discovery studies. Furthermore, the development of universal fragment ion libraries of diverse metabolites has made the identification of metabolites present in any biological sample relatively easy [105]. However, GC–MS relies on the chemical derivatization of metabolites so that they can be converted into volatile forms for ease of separation in the gas phase. Therefore, while it can efficiently analyze most types of metabolites, including polar, non-polar, organic, or inorganic metabolites, it is limited in detecting phosphorus-containing metabolites, which are often tedious to derivatize [106].
Besides GC, LC setups can also be employed for the initial separation of metabolites before MS measurements (LC–MS). In fact, due to the limitations of GC–MS in analyzing phosphorus-containing metabolites, the application of LC–MS platforms is now more prevalent in metabolomics studies, as they do not rely on chemical derivatization and thus can efficiently detect phosphorus-containing compounds. In these platforms, the separation of metabolites is achieved using a solid-phase column that has variable affinities for different types of metabolites. Thereafter, the bound metabolites are sequentially eluted by gradually changing the chemical nature of the flowing liquid phase. Despite these advantages, MS-based platforms are limited by throughput. The introduction of initial chromatographic separation enhances the resolution and coverage of the metabolome; however, it significantly reduces the rate of sample analysis (number of samples analyzed per day or hour) [107]. On the other hand, introducing the samples directly into the mass spectrometer, a method known as direct-infusion mass spectrometry (DIMS), avoids this delay but can compromise the metabolome coverage [98].
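To make the m/z concept concrete, the following sketch computes the expected m/z of protonated and deprotonated ions from a monoisotopic mass, and shows why same-mass compounds need a prior separation step. The masses are approximate values for well-known compounds, and the function names are hypothetical helpers rather than part of any vendor software.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton

def mz_protonated(monoisotopic_mass, charge=1):
    """m/z of an [M + zH]z+ ion (positive ionization mode)."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge

def mz_deprotonated(monoisotopic_mass, charge=1):
    """m/z of an [M - zH]z- ion (negative ionization mode)."""
    return (monoisotopic_mass - charge * PROTON_MASS) / charge

def ppm_error(observed_mz, theoretical_mz):
    """Mass accuracy of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

GLUCOSE = 180.06339     # C6H12O6, approximate monoisotopic mass in Da
LEUCINE = 131.09463     # C6H13NO2
ISOLEUCINE = 131.09463  # C6H13NO2, same formula and therefore same mass

print(round(mz_protonated(GLUCOSE), 4))
print(round(mz_deprotonated(GLUCOSE), 4))
print(mz_protonated(LEUCINE) == mz_protonated(ISOLEUCINE))
```

Leucine and isoleucine give identical precursor m/z values, so the mass analyzer alone cannot separate them; a chromatographic step (or another orthogonal separation) is what distinguishes such pairs.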

A major challenge associated with the MS-based metabolomics approach is the low identification confidence for certain metabolites. This is because metabolite fragmentation patterns are relatively unpredictable or uninformative (similar fragments for different species), and thus the MS/MS data is often insufficient to differentiate structural and stereo-isomers [97]. Nevertheless, the introduction of ion mobility separation has been able to resolve some of the issues associated with LC-based approaches. At present most MS systems have time scales that are well-integrated with rapid (millisecond) ion mobility separations; for every LC peak, several ion mobility spectra are acquired, and for every ion mobility spectrum, multiple mass spectra are recorded. The coupling of ion mobility separations with LC–MS-based platforms ensures improved mass spectra quality and sensitivity, the potential to distinguish co-eluting precursors, and the power to shorten chromatographic times without compromising resolution [108]. The ion mobility estimations can also be employed to calculate collision cross-sections for individual metabolites. Notably, since collision cross-section values are based on the physical properties of the metabolites and are independent of MS or LC settings, they offer remarkable inter-laboratory precision for the identification of a broad range of metabolites [109]. Stable isotope-labeled internal standards have been widely used to generate calibration curves for calculating metabolite concentration in clinical samples. However, the use of labeled internal standards for every metabolite may not be practical, and thus, can affect the feasibility of such targeted approaches for metabolite quantification [110].
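Quantification against a stable isotope-labeled internal standard can be sketched as a straight-line calibration of the analyte-to-standard peak-area ratio against known concentrations. The example below is a minimal illustration with made-up calibration points and an unweighted least-squares fit; real assays often use weighted regression and validated concentration ranges.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibration standards: analyte concentration (micromolar)
# and measured peak-area ratio (analyte / labeled internal standard).
concentrations = [1.0, 2.0, 5.0, 10.0]
area_ratios = [0.2, 0.4, 1.0, 2.0]
slope, intercept = fit_line(concentrations, area_ratios)

def quantify(analyte_area, internal_standard_area):
    """Convert a sample's peak-area ratio into a concentration estimate."""
    ratio = analyte_area / internal_standard_area
    return (ratio - intercept) / slope

# A sample with an area ratio of 0.6 maps back onto the calibration line.
estimate = quantify(analyte_area=6.0e5, internal_standard_area=1.0e6)
print(round(estimate, 3), "micromolar")
```

Because each analyte ideally needs its own labeled standard and calibration series, the bookkeeping in this sketch multiplies quickly, which is the practical limitation noted above.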

Another promising technique, Seahorse XF technology, introduced in 2006 and now marketed by Agilent Technologies, measures the oxygen consumption rate (OCR) and the extracellular acidification rate (ECAR) of living cells through fluorophore-based sensors. These readouts report on mitochondrial respiration and glycolysis, which are essential components of altered energy metabolism, especially in cancer research [14, 111, 112]. The technique allows inhibitors, stimulators, or substrate compounds to be added and mixed during the assay, so that the effect of such compounds on OCR and ECAR can be measured automatically in real time. OCR, the rate at which cells consume oxygen (O2) from the medium, is an indicator of mitochondrial respiration, while ECAR largely reflects the efflux of protons (H+) and is an indicator of glycolysis (Fig. 6). The metabolic response of cells to drugs or other inhibitors can therefore be followed in real time [113].

Fig. 6

A schematic representation of the Seahorse-based strategy used for measuring the oxygen consumption rate (OCR) and extracellular acidification rate (ECAR) of live cells

In modern fluxomics or metabolic flux analysis experiments, stable isotope tracing is the most popular and straightforward strategy for investigating intracellular metabolic flux in cancer cells. This method involves feeding the cancer cells with isotopically labeled nutrients and then measuring the isotopic labeling pattern of metabolites through analytical approaches such as mass spectrometry or NMR [66, 114, 115]. The isotopic labeling pattern of metabolites provides functional (and sometimes real-time) insight into the relative involvement of various pathways in the biosynthesis of such metabolites. This information can be used to infer which metabolic pathways are active in cancer cells and their relative contributions to biosynthesis in the context of cancer. Further, the information from this method can be translated into the identification of perturbed metabolic pathways resulting from chemoresistance in various cancers.
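The labeling pattern of a metabolite is commonly summarized as a mass isotopologue distribution, that is, the fraction of molecules carrying 0, 1, 2, ... labeled atoms. The sketch below computes these fractions and the mean enrichment from hypothetical intensities; it deliberately omits the correction for natural isotope abundance that a real analysis would need.

```python
def isotopologue_fractions(intensities):
    """Normalize M+0 ... M+n intensities so that they sum to 1."""
    total = sum(intensities)
    return [value / total for value in intensities]

def mean_enrichment(intensities):
    """Average fraction of labeled atoms: sum(i * f_i) / n."""
    fractions = isotopologue_fractions(intensities)
    n_atoms = len(intensities) - 1  # number of positions that can be labeled
    weighted = sum(i * f for i, f in enumerate(fractions))
    return weighted / n_atoms

# Hypothetical M+0 to M+3 intensities for lactate (three carbons) measured
# after feeding cells uniformly 13C-labeled glucose.
lactate = [400.0, 50.0, 50.0, 500.0]

fractions = isotopologue_fractions(lactate)
print([round(f, 2) for f in fractions])
print(round(mean_enrichment(lactate), 2))
```

In this made-up example half of the lactate pool is fully labeled (M+3), which would suggest that a large share of lactate derives from the labeled glucose through glycolysis.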

Recent advances in ultra-sensitive high-resolution MS technologies have made it possible for researchers to investigate various analytes at single-cell resolution. Single-cell techniques have dominated the field of genomics for the past decade and have had a substantial presence in the field of proteomics for the past few years [116]. Even though metabolomics at the single-cell level is not a new concept, it remains a challenging methodology due to the dynamic nature of the cellular metabolome as well as the small quantity of metabolites that can be recovered from a single cell. In a nutshell, analytical technologies for single-cell metabolomics require high selectivity and sensitivity, fast acquisition and response speeds, the ability to capture data from small sample amounts, and sample preparation that does not perturb the cellular state, while the data analysis must extract information from the analytical data using complex computational techniques and robust models [117]. One of the simplest strategies for sample preparation in single-cell metabolomics is to rapidly freeze the cells in liquid nitrogen just before the metabolites are analyzed, which halts metabolic changes inside the cell [118]. In the past, researchers have used methods in which the contents of the cell under study are aspirated directly into a nanoelectrospray ionization tip, dissolved in ionization solvent, and injected directly into an MS instrument through nanospray ionization [119,120,121]. In another strategy, Nascimento et al. developed electrochemical sensors based on nanopipette tips covalently functionalized with glucose oxidase, which turned the tip into a nano-sensor for glucose and allowed the quantification of intracellular glucose at the single-cell level [122]. This strategy could be applied to distinguish cancerous cells from non-cancerous cells, as cancer cells are known to have higher glucose requirements than normal cells.
Zhu et al. designed a silica capillary-fused micropipette needle in which Paternò-Büchi (PB) reactions could be induced at C=C bonds, allowing the positions of C=C bonds in unsaturated lipids to be identified in cell lysates at the single-cell level. They studied a single human colon cancer cell (HCT-116) and demonstrated that the capillary-fused needle served several functions (single-cell metabolite extraction probe, cell lysis container, micro-reactor, and nano-ESI emitter) and allowed direct introduction of the processed sample into the MS system [123].

Bioinformatics and computational biology tools in cancer metabolomics

In the past few decades, researchers have gained insights into disease mechanisms and identified potential biomarkers and novel therapeutic targets by analyzing the metabolic profiles of various cancer specimens [124, 125]. The data originating from metabolomics experiments usually contain an enormous amount of metabolic information, and processing such data depends heavily on fast and robust computational techniques. The sheer complexity and volume of metabolomics data therefore necessitate the development of sophisticated, metabolomics-centric computational approaches. Metabolomics data analysis is a complex, multi-step sequential pipeline that runs from data preprocessing and feature extraction to pathway analysis and machine learning-based predictive model building, and bioinformatics and computational biology play an essential role at each step. Computational tools are thus critical for advancing our understanding of cancer metabolism and paving the way for future personalized treatment strategies.

During the acquisition of metabolomics data, inter-sample variations often mask the actual biological differences. Sample-to-sample variation, if not properly addressed, can lead to misleading results in MS analyses. This type of variation primarily reflects disparities in the initial sample composition and obscures the true metabolic changes caused by the underlying biological processes. Therefore, implementing robust sample normalization protocols is crucial to minimize these variations, ensuring that observed MS signal differences represent actual biological phenomena rather than inconsistencies in sample preparation. In the metabolomics data analysis pipeline, data analysis software therefore applies post-acquisition sample normalization and accounts for factors such as instrument-related signal drift, the linearity of the MS signal, and computational variation [126].
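One simple form of post-acquisition sample normalization is median scaling, in which every sample is rescaled so that its median intensity matches a common target. The sketch below is only an illustration of that idea with hypothetical sample names and values; it is not the specific procedure described in [126], and other schemes (total-signal or probabilistic quotient normalization, for example) are also in common use.

```python
from statistics import median

def median_normalize(samples):
    """Rescale each sample so its median equals the median of all sample medians.

    samples: dict mapping sample name -> list of feature intensities,
             with features in the same order for every sample.
    """
    medians = {name: median(values) for name, values in samples.items()}
    target = median(medians.values())
    return {
        name: [value * target / medians[name] for value in values]
        for name, values in samples.items()
    }

# Two hypothetical samples with the same metabolic profile, where twice as
# much starting material was loaded for the second one.
samples = {
    "patient_1": [100.0, 200.0, 300.0],
    "patient_2": [200.0, 400.0, 600.0],
}
normalized = median_normalize(samples)
print(normalized)
```

After scaling, the two samples become identical, which is the intended behavior: the apparent twofold difference came from sample amount, not from biology.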

A variety of bioinformatics and computational biology tools play a crucial role in handling the intricate data sets derived from metabolomics studies. These tools facilitate a range of tasks, from data processing and statistical analysis to pathway mapping and machine learning. The major tools and software packages used in cancer metabolomics are briefly discussed further.

Data processing and preprocessing tools

Data processing is crucial in metabolomics studies, as it involves refining raw data, reducing noise, and aligning peaks for further analysis. Several software packages can be used to process metabolomics data. MS-DIAL is a comprehensive software package designed for untargeted metabolomics and lipidomics studies. It supports various types of mass spectrometry data and offers functions for peak detection, alignment, and identification. In cancer metabolomics, MS-DIAL is valuable for processing complex datasets and identifying metabolites across multiple samples. It integrates with metabolite databases, enabling researchers to annotate metabolites and perform pathway analysis. Likewise, XCMS is an R-based software package that is widely used for processing and analyzing MS-based metabolomics data. It provides tools for peak detection, retention time alignment, and feature quantification. XCMS uses the METLIN library in the background to identify metabolites from the MS data. In cancer metabolomics, this tool helps identify and quantify metabolites in a robust and reproducible manner. Further, MetaboAnalyst is an open-source platform that, with its recent updates, provides a range of data preprocessing functionalities, including normalization, scaling, and transformation. MetaboAnalyst is useful in cancer metabolomics for ensuring data consistency and comparability across different experiments and conditions. Another package, MZmine, is a user-friendly platform for processing MS data that offers visualizations to facilitate peak detection and alignment. Researchers can manually inspect and adjust peaks, ensuring high-quality data for subsequent analysis.

Statistical analysis tools

After data processing, statistical analysis tools help identify significant metabolites, patterns, and correlations between different sample groups in cancer studies. SIMCA is a multivariate data analysis software package that offers mathematical modeling techniques such as principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), and orthogonal partial least squares-discriminant analysis (OPLS-DA). These methods are crucial in cancer metabolomics for identifying distinct metabolic profiles and potential biomarkers. Another freely available open-source tool widely used in the metabolomics community is MetaboAnalyst. Beyond data preprocessing, MetaboAnalyst provides a range of statistical tools, including univariate and multivariate analyses. It supports various statistical tests and visualizations, such as clustering, heatmaps, and volcano plots, allowing researchers to find significant metabolites that distinguish cancerous from non-cancerous samples through a simple graphical interface [87]. Many other R- and Python-based packages can also be used for statistical analysis of metabolomics data; however, they are less user-friendly and require some programming skills.
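The univariate comparison behind a volcano plot can be sketched with two quantities per metabolite: a log2 fold change between groups and a test statistic. The example below uses Welch's t statistic on hypothetical abundances and stops short of p-values; a real analysis would add p-values and a multiple-testing correction across all metabolites.

```python
import math
from statistics import mean, variance

def log2_fold_change(case, control):
    """log2 of the ratio of group means (case over control)."""
    return math.log2(mean(case) / mean(control))

def welch_t(case, control):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(case) / len(case) + variance(control) / len(control)
    )
    return (mean(case) - mean(control)) / standard_error

# Hypothetical relative abundances in four cancer and four control samples.
abundances = {
    "metabolite_X": {"cancer": [8.0, 9.0, 7.0, 8.0], "control": [2.0, 2.5, 1.5, 2.0]},
    "metabolite_Y": {"cancer": [5.0, 4.0, 6.0, 5.0], "control": [5.0, 6.0, 4.0, 5.0]},
}

results = {
    name: (
        log2_fold_change(groups["cancer"], groups["control"]),
        welch_t(groups["cancer"], groups["control"]),
    )
    for name, groups in abundances.items()
}
for name, (fold_change, t_statistic) in results.items():
    print(name, round(fold_change, 2), round(t_statistic, 2))
```

In this toy data set metabolite_X shows a fourfold increase with a large t statistic, whereas metabolite_Y is unchanged; only the former would appear in the upper corner of a volcano plot.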

Metabolic pathway and network analysis tools

An important bioinformatics requirement in metabolomics research is the need to map the metabolic findings to biological pathways and processes. Metabolic pathway analysis tools help researchers map identified metabolites to known biological pathways, providing insights into the metabolic networks involved in cancer. KEGG (Kyoto Encyclopedia of Genes and Genomes) is one such resource, offering comprehensive pathway maps that integrate genomic, proteomic, and metabolomic data. In cancer metabolomics, KEGG is used to identify pathways altered in cancer cells, guiding researchers toward potential therapeutic targets [127]. Likewise, the MetaboAnalyst platform offers integrated pathway analysis of metabolomics data, allowing researchers to map metabolites to known pathways and identify those significantly impacted by cancer. When the study is designed to address such questions, this package helps reveal key metabolic pathways involved in cancer progression and resistance to treatment. Similarly, network analysis tools are an important part of metabolomics data analysis and are mainly used to visualize and analyze complex interactions between metabolites, proteins, and genes, providing a holistic overview of the molecular interplay in cancer. Such tools provide a broader perspective on the biological context of cancer metabolomics data. Cytoscape is a popular network visualization tool that allows researchers to create and analyze complex biological networks through its plugins. In cancer metabolomics data analysis, it helps visualize the interactions between metabolites and other cellular components, offering insights into the broader metabolic landscape of cancer.
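A common statistic behind pathway mapping is over-representation analysis, which asks whether more of the significantly altered metabolites fall into a pathway than chance would predict. The sketch below uses a hypergeometric tail probability with hypothetical counts; it is a generic illustration of the idea and is not claimed to reproduce how KEGG or MetaboAnalyst compute their scores.

```python
from math import comb

def enrichment_p_value(total, in_pathway, selected, overlap):
    """Hypergeometric probability of seeing at least `overlap` pathway members.

    total:      number of metabolites measured (the background)
    in_pathway: background metabolites annotated to the pathway
    selected:   number of significantly altered metabolites
    overlap:    altered metabolites that belong to the pathway
    """
    upper = min(in_pathway, selected)
    favorable = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(total, selected)

# Hypothetical counts: 100 measured metabolites, 10 annotated to a pathway,
# 10 significantly altered, 4 of which belong to that pathway.
p_value = enrichment_p_value(total=100, in_pathway=10, selected=10, overlap=4)
print(p_value)
```

A small probability indicates that the pathway contains more altered metabolites than expected by chance; with many pathways tested, the resulting p-values would still need correction for multiple testing.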

Integration tools

Cancer involves complex interactions between various biological systems. Data integration tools help combine data from different omics disciplines with metabolomics, enabling a more comprehensive understanding of cancer biology. Bioconductor is a collection of R packages for analyzing and integrating various omics data. It facilitates the integration of genomics and metabolomics data, allowing researchers to explore how genetic variations affect metabolic pathways in cancer. MetaboAnalyst also offers modules for the integration of multi-omics data through an intuitive, user-friendly graphical interface.

Machine learning and predictive modeling tools

In recent times, machine learning and predictive modeling have become increasingly important in biological research, including cancer metabolomics, providing tools for building predictive models for diagnosis, prognosis, and treatment response. Scikit-learn, a Python library for machine learning, offers a wide range of tools for building predictive models and performing clustering and classification. In cancer metabolomics, these models can help identify metabolic signatures that distinguish between different cancer types or stages. Similarly, TensorFlow and PyTorch are deep learning frameworks that enable the development of complex models for pattern recognition and classification in large datasets. In cancer metabolomics, these frameworks can additionally be used to uncover hidden patterns and relationships that may be indicative of specific cancer behaviors.

Overall, bioinformatics and computational biology tools are integral to cancer metabolomics, enabling researchers to process, analyze, and interpret the vast amounts of data generated in these studies. Tools like MS-DIAL, XCMS, MetaboAnalyst, and others are crucial for data processing and analysis, while network and pathway analysis tools help uncover the broader biological context. Machine learning and integration tools are at the forefront of predictive modeling and cross-omics analysis, contributing to a deeper understanding of cancer metabolism and the development of novel therapeutic strategies.

Metabolomics in cancer and its metastasis: exploration of potential biomarkers for diagnosis and novel theranostic targets

Advances in omics technologies have revolutionized biology for a better understanding of biological processes and disease etiology. Integration of metabolomics with bioinformatics grants in-depth analysis of metabolites which helps to decipher the biological complexity of cancer and provide insights into the characterization of the disease. Furthermore, metabolomics can also aid in identifying metabolites that can serve as biomarkers for cancer diagnosis [4,5,6].

Metabolite biomarkers for cancer diagnosis

The primary aim of most metabolomic investigations has been to establish a biomarker panel for early diagnosis of cancer. According to the US Food and Drug Administration (FDA), a biomarker is a ‘characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or biological responses to a therapeutic intervention’. The metabolome shares close connectivity with cancer, and thus metabolomics has been implemented by researchers to identify therapeutic and diagnostic targets in lung cancer [128, 129], hepatocellular carcinoma (HCC) [130, 131], colorectal cancer (CRC) [125, 132, 133], leukemia [132], bladder cancer [134], esophageal adenocarcinoma (EAC) [135], pancreatic cancer [136,137,138,139], head and neck cancer [140], gastric cancer [141], prostate cancer [124, 142], oral cancer [143, 144], ovarian cancer [145] and breast cancer [146,147,148,149] (Table 1).

Table 1 A list of biomarker panels identified in tissue and biofluids of cancer patients by metabolomics approach

In a study that characterized the metabolic adaptations of invasive ductal carcinoma (IDC), a form of breast cancer, in order to identify diagnostic markers and potential therapeutic targets, 42 and 32 metabolites were found to be significantly altered in the tissue and serum of patients with IDC, respectively [146]. These metabolites include amino acids, nucleic acids, amino sugars, fatty acids, and other organic compounds. More importantly, a three-metabolite panel comprising tryptophan, tyrosine, and creatine was identified in both tissue and serum samples that might be beneficial for distinguishing patients with IDC from controls [146]. Another LC–MS-based metabolomics investigation in the plasma of breast cancer patients (n = 70) and healthy controls (n = 46) identified a panel of four metabolites, including L-octanoylcarnitine, 5-oxoproline, hypoxanthine, and docosahexaenoic acid, that showed potential as biomarkers for early diagnosis of breast cancer [147]. Putluri et al. [149] analyzed the metabolome of breast cancer sub-types, including luminal A, luminal B, HER2-enriched, and basal-like, as well as tamoxifen-resistant cancer cells, using an LC–MS-based metabolomics strategy and found that pyrimidine metabolism is significantly upregulated in the luminal B, HER2-enriched, and basal-like subtypes as well as in tamoxifen-resistant breast cancer cells. Downstream investigation of RRM2 (ribonucleotide reductase subunit M2), an enzyme involved in the pyrimidine biosynthesis pathway, revealed that it is upregulated in the above breast cancer sub-types as well as in tamoxifen-resistant breast cancer cells. Subsequent knockdown of the RRM2 gene in resistant cancer cells resulted in attenuation of cell growth and proliferation [149].

Sreekumar et al. [124] investigated tissue (n = 42), urine (n = 110), and plasma (n = 110) samples of prostate cancer patients using GC–MS and LC–MS-based metabolomics strategies. The study identified several altered metabolites, and sarcosine, an N-methyl derivative of glycine, was found to be highly upregulated in all three prostate cancer biospecimens. Synthesis of sarcosine is catalyzed by the enzyme glycine N-methyltransferase, and knockdown of this enzyme resulted in a significant reduction of prostate cancer invasion and migration [124]. GC–MS-based tissue metabolomics was employed for investigating metabolic alterations in ovarian cancer patients (n = 101). The authors found a significant alteration in the levels of 172 metabolites that were largely involved in lipid and amino acid metabolism. Among the altered metabolites, N-acetylaspartate (NAA), a neuron-specific metabolite that demonstrated the highest fold change (28.4-fold, p < 0.001), was further successfully validated in a separate cohort of ovarian cancer samples (n = 145). Further, N-acetylaspartate synthetase, the enzyme involved in the synthesis of NAA, was knocked down in HEYA8 ovarian cancer cells, resulting in reduced levels of the metabolite with a concomitant attenuation of cell proliferation [145]. Urine and plasma samples from lung cancer patients (n = 178 and 156) and healthy individuals (n = 351 and 60) were analyzed using LC–MS-based metabolomics strategies in two independent investigations that resulted in the identification of two biomarker panels: creatine riboside (CR) and N-acetylneuraminic acid (NANA) in the first, and β-hydroxybutyric acid, LysoPC 20:3, PC ae C40:6, citric acid, and fumaric acid in the second [128, 129].

Biomarker development for early detection of hepatocellular carcinoma (HCC) has gained pace as it has become a significant cause of patient deaths in chronic liver disease. The most extensively used biomarker in HCC is Alpha-fetoprotein (AFP). Elevated serum AFP levels correlate well with increased HCC development, and it has reached phase 5 of biomarker development for its clinical utility. Interestingly, a multicentric LC–MS-based serum metabolomics study led by Luo and co-workers identified a panel of biomarkers consisting of Phenylalanyl-tryptophan and glycocholate that could outperform AFP in differentiating HCC patients from healthy controls [130]. The study involved a total of 1448 individuals, including healthy controls (n = 290) and patients with HCC (n = 645), liver cirrhosis (n = 310), intrahepatic cholangiocarcinoma (n = 25), and chronic hepatitis B virus infection (n = 150). Another GC–MS-based metabolomics profiling of serum and urine samples of 822 patients with HCC, 24 patients with benign liver tumors, and 71 healthy individuals found that inosine and chenodeoxycholic acid levels can successfully differentiate HCC patients from healthy controls [131].

Several efforts have also been dedicated to identifying potential metabolite biomarkers for the diagnosis of pancreatic cancer. In such an LC–MS-based metabolomics study on serum exosomes from 22 patients with pancreatic cancer and 57 healthy individuals, Tao and co-workers identified a panel of lipid biomarkers including LysoPC 22:0, PC (P-14:0/22:2) and PE (16:0/18:1) that were closely associated with tumor stage and patient’s overall survival [137]. Another metabolomics investigation employing both GC–MS and LC–MS platforms identified glutamate, choline, 1,5-anhydro-D-glucitol, betaine, and methyl guanidine as potential biomarkers in the plasma of 100 patients with pancreatic cancer and 100 healthy individuals [139].

The application of LC–MS-based metabolomics has also found its way to analyzing serum and urine samples of CRC patients for the identification of candidate metabolite biomarkers. In such a multicentric investigation, Deng and co-workers analyzed urine samples from 171 CRC patients and 171 healthy individuals on an LC–MS platform and identified a biomarker panel containing diacetylspermine and kynurenine. The two metabolites could discriminate CRC patients with an AUC of 0.864, a sensitivity of 80.0%, and a specificity of 80.0% [133]. A couple of years later, serum samples from 98 CRC patients and 50 healthy individuals were investigated using LC–MS-based metabolomics and another panel of potential biomarkers containing hexadecanedioic acid, 4-dodecylbenzene sulfonic acid, 2-pyrocatechuic acid, and formyl anthranilic acid was identified [125].
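The performance figures quoted for such panels (AUC, sensitivity, and specificity) can be computed from a per-sample panel score and the known diagnosis. The sketch below uses made-up scores and labels, not data from the cited studies, and computes the AUC as the probability that a randomly chosen patient scores higher than a randomly chosen control.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    true_pos = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    false_neg = sum(1 for s, y in pairs if y == 1 and s < threshold)
    true_neg = sum(1 for s, y in pairs if y == 0 and s < threshold)
    false_pos = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))

# Hypothetical panel scores: label 1 = cancer patient, 0 = healthy control.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(roc_auc(scores, labels))
print(sensitivity_specificity(scores, labels, threshold=0.5))
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes discrimination across all cutoffs, which is why studies usually report all three.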

Saliva is yet another non-invasive biofluid that can be easily accessed and is an attractive source of metabolites, particularly for oral cancers. Ishikawa and co-workers employed a capillary electrophoresis (CE)-TOF–MS-based metabolomics platform to identify the dysregulated metabolites in the saliva of 24 patients with oral cancer and 44 healthy individuals. They found that two metabolites, namely S-adenosylmethionine (SAM) and pipecolate, could discriminate oral cancers from controls with a high area under the receiver operating characteristic (ROC) curve (0.827; 95% confidence interval, 0.726–0.928, P < 0.0001) [143]. Another biomarker panel consisting of L-phenylalanine and L-leucine was proposed for detecting oral cancers based on an LC–MS-based metabolomics study conducted on the saliva of 30 patients with oral cancer and 60 healthy individuals [144].

While tumor tissues, serum, plasma, and saliva have been frequently employed for identifying metabolic alterations, a recent study used a dried blood spot (DBS) sampling technique coupled with direct infusion MS to investigate metabolic alterations in 166 gastric cancer patients and 183 healthy individuals [141]. DBS relies on withdrawing microvolumes of blood from subjects by heel or finger puncture and therefore has advantages over conventional blood withdrawal methods in terms of stability, simpler storage, and easier transfer. The study identified a panel of biomarkers including alanine, arginine, glycine, ornithine, tyrosine/citrulline, valine/phenylalanine, 3-hydroxybutyrylcarnitine, isovalerylcarnitine/propionylcarnitine, and decadienoylcarnitine that could distinguish gastric cancer patients from healthy individuals with appreciable sensitivity (87.5 to 95.4%) as well as specificity (86.3 to 90.0%) [141].

Emergence of single-cell metabolomics in cancer and metastasis

The prowess of metabolomics has been evaluated at the single-cell level as well. Many researchers have explored this technology to understand the biology of tumor drug resistance and metastasis. By evaluating the genomic and metabolic information of individual cells, researchers can discriminate the genes and regulatory pathways steering the development of drug resistance and metastasis in cancer cells. Sun et al. used single-probe MS technology to scrutinize the metabolic alterations in colorectal cancer stem cells and non-stem cancer cells and observed that colorectal cancer stem cells had higher amounts of TCA cycle metabolites and unsaturated lipids. They further administered inhibitors of SCD1, NF-κB, and ALDH1A1 to colorectal cancer stem cells and found that the inhibitors reduced the abundance of unsaturated lipids and impeded tumor spheroid formation, indicating reduced stemness of the colorectal cancer stem cells [150]. In another study, Liu et al. exposed cells of the human colorectal cancer cell line HCT-116 to the mitotic inhibitors taxol and vinblastine and employed single-cell MS to characterize the metabolomic alterations at the single-cell level, revealing four biological pathways that could be implicated in the chemotherapy intervention of colorectal cancer [151]. Similarly, single-cell metabolomics has been explored in several studies related to cancer metastasis. Abouleila et al. used an untargeted approach to understand the differences between gastric cancer (GC) and colorectal cancer (CRC) patient samples, deciphering the metabolic cues at the level of single circulating tumor cells (CTCs) using microfluidics-based live-cell enrichment coupled to single-cell MS analysis. 
They reported several statistically significant differences in metabolites and lipids: acylcarnitines, sterol lipids, and eicosanoids were elevated in CRC CTCs, while glycerophospholipids showed higher abundances in GC CTCs [152]. Chen et al. utilized a single-cell MS-based metabolomics approach to decipher the metabolic picture in drug-resistant cancer cells. They exposed the HCT-116 cell line model of CRC to irinotecan (IRI), a drug broadly used to treat metastatic CRC, to make the cells resistant to IRI. They then repurposed the anti-diabetic drug metformin, which is reported to selectively kill cancer stem cells, hypothesizing that metformin can re-sensitize IRI-resistant HCT-116 cells and rescue the therapeutic effect. Using a single-probe MS technique to analyze live IRI-resistant cells, they observed that metformin treatment was correlated with a decrease in lipids and fatty acids, probably through fatty acid synthase (FASN) inhibition. The authors were thus able to show that metformin-IRI synergy can overcome drug resistance in IRI-resistant CRC cells [153]. Overall, single-cell metabolomics holds the potential to extend metabolomics research to rare cell types, especially in cancer.

From the above-mentioned studies, it is evident that metabolomics holds great promise for identifying potential biomarkers for cancer diagnosis and thereby for accelerating treatment. However, no metabolite biomarker has yet been approved by the FDA for clinical use; the biomarker panels identified in the discovery phase therefore need to be validated in larger multicentric cohorts and clinical trials.

Potential of metabolomics in the development of novel theranostic targets

Theranostics, a combination of diagnostics and therapeutics, has gained the attention of researchers for advancing personalized oncology. Theranostic agents could be used for parallel tumor targeting and tumor imaging, since they can destroy cancer cells while sparing normal cells. Therapeutics in oncology target altered pathways involved in growth, proliferation, and metastasis. The metabolite profile of patients is altered whether the cause is external factors or internal dysfunction. Thus, metabolomics is opening new opportunities in the investigation of therapeutic responses [154].

Bhujwalla et al. introduced the term ‘metabolotheranostics’ for specifically targeting disease-based alterations in metabolic pathways with image-guided delivery platforms to achieve disease-specific therapy [155]. Molecular imaging can be used to identify theranostic targets specific to cancer with the overall purpose of minimizing damage to healthy tissue. Theranostic imaging requires delivery of therapeutic cargo to targets, which are mostly receptors and antigens specific to cancer cells that can be imaged without affecting normal cells. Metabolic imaging creates new opportunities for metabolotheranostics in cancer, where cancer-specific metabolic alterations can be detected [155].

Biomarker studies would help extensively to identify, validate, and optimize therapeutic targets, determine drug mechanisms, and predict treatment response and resistance to cancer therapy. Metabolomics has an important role in all of these applications. Determining such markers as part of drug development would support the personalization of therapies and improve treatment performance for patients [156].

Challenges and future perspectives in the development of cancer metabolite biomarkers

Deciphering the various realms of metabolism to find cues for life-threatening diseases like cancer has been in practice for many decades. Although metabolomics holds great potential, concrete conclusions cannot yet be drawn with confidence from cancer metabolomics studies.

Challenges in cancer metabolomics

The major limitation in cancer metabolomics is that current technology is insufficient to decipher subtle metabolic changes, which restricts researchers from identifying effective disease-specific biomarkers and therapeutic targets. Given its dynamicity, the study of the metabolome needs very precise and strictly controlled biological sampling strategies to avoid false positives. Moreover, dietary habits, lifestyle, and demographic diversity among individuals play an important role in shaping the metabolome of an individual under healthy as well as diseased conditions [157]. In particular, the variability in demographic conditions needs to be independently evaluated and taken into consideration for metabolomics biomarker discovery [158]. These aspects of strict, well-planned study design, as well as the demographic variables to be considered, have been elaborated in a recent review article [157].

Currently, cancer is among the diseases with the greatest socioeconomic impact and is a major cause of death worldwide. Therefore, the search for new biomarkers is essential for early diagnosis and effective treatment. The majority of cancers produce characteristic alterations in the metabolite profile before clinical symptoms manifest, so these altered metabolites could be promising novel biomarkers/biosignatures for diagnosis and therapeutics. Advances in mass spectrometry technologies have provided a powerful platform for identifying such biomarkers. Nevertheless, no single metabolite biomarker or metabolite biosignature has yet reached clinical utility in cancer. The majority of metabolomics studies in clinical research so far have been performed in small cohorts and are limited to the discovery phase. It is essential to validate discovery-phase findings in larger cohorts of fresh samples before proposing clinical trials. An interdisciplinary approach, with extensive collaboration between biologists, chemists, and clinicians, is needed to succeed in metabolomics-driven cancer biomarker discovery. Moreover, there is a major need for innovations in mass spectrometry technology and updated metabolite libraries, which will help to identify metabolites at very low concentrations. Longitudinal clinical metabolomics studies have the potential to uncover the metabolite changes that are responsible for disease progression and drug resistance and to clarify disease mechanisms. Unfortunately, such studies are very limited in metabolomics-driven biomarker discovery due to the lack of a proper collaborative environment between hospitals and research laboratories. 
Although many obstacles remain, the growing use of metabolomics in clinical research may soon yield metabolite biomarkers that are useful for the diagnosis, prognosis, and treatment of cancer.

Challenges of achieving quantitative accuracy in untargeted metabolomics

Untargeted metabolomics, while offering a comprehensive view of the metabolome, faces significant challenges in achieving quantitative accuracy. Major issues like ion suppression in MS and overlapping signals in NMR data can significantly compromise the reliability of metabolite quantification and subsequent biomarker validation. Therefore, users must understand the strategies available to improve quantitative accuracy and their inherent limitations, along with the potential impact on scalability and sensitivity. Ion suppression in MS is a well-known phenomenon in which the presence of certain metabolites inhibits the ionization of others, leading to skewed results [159,160,161,162]. This issue can arise from co-eluting compounds or matrix effects and can significantly affect the quantification of metabolites in complex biological samples. The addition of internal standards (IS), especially stable isotope-labeled standards, to the sample is a critical strategy in MS to mitigate the effects of ion suppression, ensuring more reliable and accurate quantification of metabolites in complex biological samples [163]. Generally, internal standards are chosen to closely resemble the target metabolites in chemical structure and properties while being different enough (for example, in mass) to be distinguished from them in the MS data. IS serve as reference points for quantification in MS analysis. Researchers can normalize the data by comparing the response of the target analytes to that of the internal standards, which compensates for fluctuations in ionization efficiency and thereby ensures more accurate quantification [163]. Multiple Reaction Monitoring (MRM) uses specific ion transitions to detect and quantify target metabolites, reducing the impact of ion suppression. 
This targeted approach can enhance sensitivity and selectivity and could be a great strategy to validate the analytes of interest once they have been screened from the untargeted metabolomics approach [164].
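The internal-standard normalization described above can be sketched as a simple response-ratio calculation. This is a minimal illustration under stated assumptions: the function name, the peak areas, the spiked concentration, and the default response factor of 1.0 (reasonable only when the IS is a stable isotope-labeled analog that ionizes like the analyte) are all hypothetical and do not come from the cited work.

```python
def is_normalized_concentration(
    analyte_area: float,
    is_area: float,
    is_conc: float,
    response_factor: float = 1.0,
) -> float:
    """Estimate analyte concentration from the analyte/IS peak-area ratio.

    response_factor is the analyte-to-IS response ratio at equal
    concentration (an assumed calibration constant, 1.0 by default).
    """
    if is_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    if response_factor <= 0:
        raise ValueError("response factor must be positive")
    return (analyte_area / is_area) * is_conc / response_factor


# Hypothetical peak areas for one sample, IS spiked at 5.0 concentration units.
clean_estimate = is_normalized_concentration(2.0e6, 1.0e6, is_conc=5.0)

# Same sample with 40 % ion suppression acting equally on analyte and IS:
# both areas shrink, but their ratio, and hence the estimate, is preserved.
suppressed_estimate = is_normalized_concentration(
    2.0e6 * 0.6, 1.0e6 * 0.6, is_conc=5.0
)
```

The compensation holds only to the extent that the analyte and the IS co-elute and are suppressed to the same degree, which is why isotope-labeled analogs are preferred over structurally unrelated standards.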

NMR spectroscopy is another powerful tool in metabolomics, offering non-destructive analysis and a broad detection range. However, overlapping signals from different metabolites can complicate accurate quantification, especially in untargeted metabolomics applications. Overlapping signals in NMR occur when two or more metabolites have resonance frequencies that are very close or identical, leading to the superimposition of their NMR peaks. During NMR data analysis, it becomes difficult to assign overlapping peaks to specific metabolites, and this ambiguity complicates metabolite identification and quantification [165]. Advanced software algorithms like Spectral Automatic NMR Decomposition (SAND), MetaboLab, and Automated Quantification Algorithm (AQuA) can separate overlapping signals, allowing for more accurate quantification [166,167,168]; however, they require significant computational resources. 2D NMR techniques like Total Correlation Spectroscopy (TOCSY), Correlation Spectroscopy (COSY), or Heteronuclear Single Quantum Coherence (HSQC) can provide additional structural information that aids in resolving overlapping signals, but they require expertise in NMR data analysis. While effective, these methods are time-consuming and may not be practical for high-throughput studies.
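The idea behind separating overlapping peaks can be shown with a toy example. This is a simplified sketch and not how SAND, MetaboLab, or AQuA work: it assumes two Lorentzian peaks whose positions and widths are already known, so that only their amplitudes need to be estimated by linear least squares. All names and numerical values are hypothetical.

```python
from typing import List, Sequence, Tuple


def lorentzian(x: float, center: float, hwhm: float) -> float:
    """Unit-height Lorentzian line shape; hwhm is the half width at half maximum."""
    return hwhm ** 2 / ((x - center) ** 2 + hwhm ** 2)


def fit_two_amplitudes(
    xs: Sequence[float], ys: Sequence[float], c1: float, c2: float, hwhm: float
) -> Tuple[float, float]:
    """Least-squares amplitudes of two overlapping peaks with known shapes.

    Solves the 2x2 normal equations for y ~ a1 * L(x; c1) + a2 * L(x; c2).
    """
    b1 = [lorentzian(x, c1, hwhm) for x in xs]
    b2 = [lorentzian(x, c2, hwhm) for x in xs]
    s11 = sum(u * u for u in b1)
    s12 = sum(u * v for u, v in zip(b1, b2))
    s22 = sum(v * v for v in b2)
    t1 = sum(u * y for u, y in zip(b1, ys))
    t2 = sum(v * y for v, y in zip(b2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("peaks are indistinguishable; amplitudes not identifiable")
    a1 = (t1 * s22 - s12 * t2) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


# Synthetic, noise-free spectrum: two peaks 0.010 ppm apart, each with a
# 0.005 ppm half width, so they merge into one distorted envelope.
ppm_axis: List[float] = [1.0 + 0.001 * i for i in range(101)]
true_a1, true_a2 = 3.0, 1.0
spectrum = [
    true_a1 * lorentzian(x, 1.045, 0.005) + true_a2 * lorentzian(x, 1.055, 0.005)
    for x in ppm_axis
]

fitted_a1, fitted_a2 = fit_two_amplitudes(ppm_axis, spectrum, 1.045, 1.055, 0.005)
```

In real spectra the peak positions, widths, and baseline are also unknown and noise is present, which turns the problem into a nonlinear fit and accounts for much of the computational cost mentioned above.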

Further, accurate quantification in metabolomics also relies on multipoint calibration curves, which establish a relationship between metabolite concentration and instrument response. Calibration can be complex in untargeted metabolomics due to the wide range of metabolites and their varying chemical properties. To address these challenges, researchers have implemented advanced quantitative techniques like isotope dilution, in which stable isotope-labeled standards for each curated metabolite of interest are used to create highly accurate calibration curves. Although reliable, this approach is resource-intensive and may not be feasible for large-scale studies. Standard spiking is another technique, in which adding known amounts of a standard to the sample helps correct for matrix effects and improves quantification accuracy. However, it requires careful experimental design and may not be suitable for all types of samples. Quantitative Nuclear Magnetic Resonance (qNMR) is an approach that uses specific internal standards to create calibration curves in NMR; it offers high reproducibility but also requires specialized equipment and expertise.
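Both a multipoint calibration curve and standard spiking reduce to fitting a straight line, as the following minimal sketch shows. It assumes a linear instrument response over the working range; the function names and all concentrations and responses are hypothetical illustration values.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_calibration(
    response: float, slope: float, intercept: float
) -> float:
    """Invert the calibration line to read a concentration off a response."""
    return (response - intercept) / slope


def concentration_by_standard_addition(
    added: Sequence[float], responses: Sequence[float]
) -> float:
    """Original concentration from spiked samples: intercept / slope.

    Assumes zero response at zero total analyte and a linear response
    in the sample's own matrix.
    """
    slope, intercept = fit_line(added, responses)
    return intercept / slope


# Multipoint calibration with external standards (hypothetical values).
cal_conc = [0.0, 1.0, 2.0, 5.0, 10.0]
cal_resp = [50.0, 250.0, 450.0, 1050.0, 2050.0]
cal_slope, cal_intercept = fit_line(cal_conc, cal_resp)
unknown_conc = concentration_from_calibration(850.0, cal_slope, cal_intercept)

# Standard addition: the same sample spiked with increasing amounts.
spiked_amounts = [0.0, 2.0, 4.0, 6.0]
spiked_resp = [450.0, 750.0, 1050.0, 1350.0]
sample_conc = concentration_by_standard_addition(spiked_amounts, spiked_resp)
```

Because the standard-addition line is measured in the sample's own matrix, its slope already reflects any suppression, which is why the method corrects for matrix effects at the cost of several extra measurements per sample.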

In summary, achieving quantitative accuracy in untargeted metabolomics presents significant challenges due to ion suppression in MS and overlapping signals in NMR. Strategies such as internal standards, sample cleanup, spectral deconvolution, and advanced calibration techniques are essential to improving reliability. While these methods offer solutions, they come with limitations regarding scalability and sensitivity, highlighting the need for continued innovation in this field.

Conclusion

Although it is a highly complex and challenging field of research, metabolomics holds tremendous potential in disease biology. Metabolomics, being highly dynamic, complements the other omics platforms in understanding the metabolome in terms of disease indicators or biomarkers [169]. Therefore, utmost care must be taken while devising strategies for sample collection, thereby minimizing variability in metabolite profiling. Biomarker discovery gains confidence when the differential metabolite profiling from the discovery cohort can be verified and replicated in an independent cohort of disease-specific and clinically well-characterized samples. Furthermore, cross-validating study results through multicentric and multinational cohorts would demonstrate the potential to translate results from bench to bedside. In the long run, researchers should be encouraged to adopt such collaborative and multicentric metabolomics studies across various transnational settings. These approaches, schemes, and policies in the metabolite biomarker discovery pipeline should improve the overall evaluation of the human population while also expediting the path toward novel drug discovery and precision medicine in the near future.