1 Introduction

Lung cancer is the malignant tumor with the highest morbidity and mortality in the world, with a five-year survival rate of less than 20% (Bray et al., 2020; Herbst et al., 2018). Non-small cell lung cancer (NSCLC) accounts for approximately 85% of lung cancers. Among NSCLC, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the most common subtypes (Ettinger et al., 2017). The two subtypes differ in molecular features, genetic characteristics and treatment methods, so it is very important to distinguish between them accurately.

LUAD is an alveolar epithelial cell carcinoma arising from adenoid differentiation or mucus-producing cancer cells, while LUSC is an epithelial basal cell carcinoma that shows keratinization and intercellular bridges (Travis, 2020). Clinical treatment strategies differ considerably between LUAD and LUSC (Relli et al., 2019). Chemotherapy drugs such as pemetrexed are very effective in treating LUAD, but they are not recommended for LUSC patients. In addition, antiangiogenic drugs are generally not used to treat LUSC because bleeding is frequent in this population. Driver gene targeted therapy is mainly suited to LUAD, whereas immunotherapy is more effective in treating LUSC (Minguet et al., 2016). Therefore, to obtain a satisfactory clinical outcome, a comprehensive examination and pathological classification are necessary before treatment.

Tissue biopsy is the gold standard for clinical pathological diagnosis and molecular typing of NSCLC. Initially, examination of hematoxylin-eosin (HE)-stained tumor tissue under an optical microscope was used to distinguish between LUAD and LUSC. However, it is difficult to make an accurate diagnosis when the tumor structure is unclear due to low differentiation, necrosis or serious extrusion. Nowadays, immunohistochemistry is widely recommended in clinical practice to distinguish LUAD from LUSC by employing multiple sensitive markers [for example, napsin-A (NAPSA), thyroid transcription factor-1 (TTF-1) and tumor protein p63 (TP63)] (Ao et al., 2014; Schwartz & Rezaei, 2013; Zhan et al., 2015). Although genetic differences between LUAD and LUSC have been clarified in depth, the metabolic differences between these two subtypes are still unclear (Wang et al., 2020).

Metabolomics is an emerging discipline that characterizes a comprehensive profile of endogenous metabolites in biological systems. In particular, metabolomics has been widely used in diagnosing diseases, understanding disease mechanisms, identifying novel drug targets and customizing drug treatments (Wishart, 2016). The metabolome represents the downstream output of the genome and proteome. In contrast to the genome and proteome, which indicate what may occur, the metabolome reveals what is currently happening. In the field of precision medicine, the metabolome can promote accurate diagnosis and personalized treatment by characterizing the metabolic phenotype of a cell, organ or biological system. In the present study, we aimed to distinguish the metabolic phenotypes of LUAD and LUSC using a comprehensive targeted metabolomic platform. The investigation and classification of the metabolic phenotypes of these two NSCLC subtypes will facilitate a deeper understanding and more precise therapy of lung cancer.

2 Methods

2.1 Study participants

A total of 162 patients were initially enrolled from the cancer center of Wuhan Union Hospital, all of whom were confirmed NSCLC patients. Heparin-anticoagulated plasma samples were collected before treatment and centrifuged at 3500 rpm for 10 min at 4 °C; the supernatant plasma was then pipetted off and immediately transferred to a −80 °C freezer until metabolic analysis. The present study analyzed 128 of these plasma samples (28 LUSC and 100 LUAD patients). The other patients were not included because 21 of them were diagnosed with other subtypes, such as poorly differentiated carcinoma and sarcomatoid carcinoma, and the pathological diagnosis of the remaining patients was unknown. The pathological subclassification of NSCLC patients was performed according to the acknowledged WHO guidelines (Osmani et al., 2018).

2.2 Targeted metabolomic analysis

The plasma samples were thawed on ice and mixed with three volumes of ice-cold methanol. The mixture was vortexed for 3 min and centrifuged at 12,000 rpm and 4 °C for 10 min. The supernatant was then collected and centrifuged again at 12,000 rpm and 4 °C for 5 min. Finally, the supernatant was collected once more for LC-MS/MS analysis.

The sample extracts were analyzed using a liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) system (Shim-pack UFLC SHIMADZU CBM A system, https://www.shimadzu.com/; MS, QTRAP® System, https://sciex.com/). The analytical conditions were as follows: UPLC: column, Waters ACQUITY UPLC HSS T3 C18 (1.8 μm, 2.1 mm × 100 mm); column temperature, 40 °C; flow rate, 0.4 mL/min; injection volume, 5 µL; solvent system, water (0.1% formic acid): acetonitrile (0.1% formic acid); gradient program, 95:5 v/v at 0 min, 10:90 v/v at 11.0 min, 10:90 v/v at 12.0 min, 95:5 v/v at 12.1 min, 95:5 v/v at 14.0 min.
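The gradient program above can be read as a table of set points. The following minimal Python sketch returns the acetonitrile fraction at any time in the run. It assumes linear ramps between consecutive set points, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are illustrative only.

```python
from bisect import bisect_right

# Set points from the gradient program in the text:
# (time in min, % water phase A, % acetonitrile phase B).
GRADIENT = [
    (0.0, 95.0, 5.0),
    (11.0, 10.0, 90.0),
    (12.0, 10.0, 90.0),
    (12.1, 95.0, 5.0),
    (14.0, 95.0, 5.0),
]


def percent_b(t_min, table=GRADIENT):
    """Return % phase B at time t_min, ASSUMING linear ramps between set points."""
    times = [row[0] for row in table]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the gradient program")
    # Index of the last set point at or before t_min.
    i = bisect_right(times, t_min) - 1
    if i == len(table) - 1:
        return table[-1][2]
    t0, _, b0 = table[i]
    t1, _, b1 = table[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.0, 5.5, 11.5, 14.0):
        print(f"t = {t:4.1f} min, %B = {percent_b(t):.1f}")
```

Under this assumption the run ramps from 5% to 90% acetonitrile over the first 11 min, holds for 1 min, and then returns to the starting composition for re-equilibration.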

Linear ion trap (LIT) and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (QTRAP® LC-MS/MS System) equipped with an ESI Turbo Ion-Spray interface, operating in positive and negative ion modes and controlled by Analyst 1.6.3 software (Sciex). The ESI source operation parameters were as follows: source temperature, 500 °C; ion spray voltage (IS), 5500 V (positive) and −4500 V (negative); ion source gas I (GSI), gas II (GSII) and curtain gas (CUR) set at 55, 60 and 25.0 psi, respectively; collision gas (CAD), high. Instrument tuning and mass calibration were performed with 10 and 100 µmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. A specific set of multiple reaction monitoring (MRM) transitions was monitored for each period according to the metabolites eluted within that period.

2.3 Targeted lipidomic analysis

The plasma samples were thawed on ice, vortexed for 10 s and then centrifuged at 3000 rpm and 4 °C for 5 min. A 50 µL aliquot of each sample was homogenized with 1 mL of extraction mixture (containing methanol, MTBE and internal standard). The mixture was vortexed for 2 min, 500 µL of water was added, and the mixture was vortexed again for 1 min. After centrifugation at 12,000 rpm and 4 °C for 10 min, 500 µL of supernatant from each sample was taken and concentrated. The extract was then dissolved in 100 µL of mobile phase B and stored at −80 °C. Finally, the reconstituted solution was transferred to a sample vial for LC-MS/MS analysis.

The sample extracts were analyzed using an LC-ESI-MS/MS system (Shim-pack UFLC SHIMADZU CBM A system, https://www.shimadzu.com/; MS, QTRAP® System, https://sciex.com/). The analytical conditions were as follows: UPLC: column, Waters ACQUITY UPLC HSS T3 C18 (1.8 μm, 2.1 mm × 100 mm); column temperature, 40 °C; flow rate, 0.4 mL/min; injection volume, 5 µL; solvent system, water (0.04% acetic acid): acetonitrile (0.04% acetic acid); gradient program, 95:5 v/v at 0 min, 5:95 v/v at 11.0 min, 5:95 v/v at 12.0 min, 95:5 v/v at 12.1 min, 95:5 v/v at 14.0 min. The subsequent MS detection was the same as that described above for the targeted metabolomic analysis.

2.4 Statistical analysis

The following results were obtained with R software. First, the homogeneity and reproducibility of the endogenous metabolite data were visualized by principal component analysis (PCA). Then, orthogonal partial least squares discriminant analysis (OPLS-DA) was applied to remove irrelevant variables. The variable importance in the projection (VIP) value of each metabolite was obtained to measure the contribution of that variable to the model. The validity of the OPLS-DA model was judged by R2Y (the interpretability of the model for the categorical variable Y) and Q2 (the predictability of the model).
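For readers unfamiliar with the two validity metrics, the sketch below shows how R2Y and Q2 are conventionally defined: both are one minus a residual sum of squares divided by the total sum of squares of Y, with R2Y using the fitted values and Q2 using cross-validated predictions. This is a generic illustration and not the R code used in the study; the 0/1 class coding and all numbers are hypothetical.

```python
def explained_fraction(observed, predicted):
    """Return 1 - (residual sum of squares / total sum of squares about the mean)."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


if __name__ == "__main__":
    # Hypothetical class coding: 0 = LUAD, 1 = LUSC.
    y = [0, 0, 1, 1]
    fitted = [0.1, 0.0, 0.9, 1.0]           # predictions from the model fitted on all samples
    cross_validated = [0.3, 0.2, 0.6, 0.9]  # predictions for samples held out during fitting

    r2y = explained_fraction(y, fitted)           # goodness of fit
    q2 = explained_fraction(y, cross_validated)   # predictive ability
    print(f"R2Y = {r2y:.2f}, Q2 = {q2:.2f}")
```

Because Q2 is computed on held-out samples, it is normally lower than R2Y; a large gap between the two is a common sign of overfitting.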

The following data were analyzed using SPSS 23.0 software (SPSS, Chicago, IL, USA). The levels of differential metabolites were displayed as violin plots (Fig. 4). The non-parametric Kolmogorov-Smirnov test was used to check whether the data were normally distributed. Differences between the two groups were examined using a two-tailed independent-samples t-test. Levene’s test was used to assess homogeneity of variance, and P values were reported under the equal-variance or unequal-variance assumption accordingly. Logistic regression analysis was performed to evaluate the diagnostic value of the combined biomarker model. Model performance was assessed by the receiver operating characteristic (ROC) curve.
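The choice between the equal-variance and unequal-variance forms of the t-test changes both the standard error and the degrees of freedom. The sketch below illustrates the two standard formulas (pooled-variance Student and Welch with Welch-Satterthwaite degrees of freedom). It is not the SPSS procedure itself, it omits the P value because the Python standard library has no t distribution, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b, equal_var=True):
    """Return (t, degrees of freedom) for two independent samples.

    equal_var=True  -> Student's t-test with pooled variance.
    equal_var=False -> Welch's t-test with Welch-Satterthwaite degrees of freedom.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)
    if equal_var:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        se = sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df


if __name__ == "__main__":
    group_a = [1, 2, 3, 4, 5]    # hypothetical metabolite levels
    group_b = [2, 4, 6, 8, 10]
    print(t_statistic(group_a, group_b, equal_var=True))
    print(t_statistic(group_a, group_b, equal_var=False))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ; with unbalanced groups such as 100 versus 28 patients, the two forms can diverge noticeably.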

Metabolic pathway analysis was conducted using the Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.kegg.jp/) (Kanehisa & Goto, 2000). The enrichment of differentially expressed metabolites was visualized as a bubble chart. Metabolite functions were assigned according to the Human Metabolome Database (HMDB, https://hmdb.ca) (Wishart et al., 2009).

3 Results

3.1 Baseline of the study population

The clinical characteristics of the LUAD and LUSC patients are displayed in Supplementary Table 1. LUSC patients were significantly older than LUAD patients (p < 0.001). Males accounted for a greater proportion of LUSC patients (86%) than of LUAD patients (64%). Notably, a large proportion of LUAD patients (54%) were non-smokers, while only 25% of LUSC patients had no smoking history. The two groups showed no significant difference in BMI or disease stage.

3.2 Detection of endogenous metabolites in plasma of NSCLC patients

The present study was conducted based on an integrated platform of targeted metabolome and lipidome, which contained more than 3000 characterized compounds (Supplementary Table 2). A total of 128 plasma samples were tested in the present study, which were extracted by hydrophilic and hydrophobic methods, respectively, and each sample was then detected using UPLC-MS/MS in positive and negative modes (Fig. 1). Finally, 1141 compounds were detected, which were qualitatively and quantitatively categorized into 32 types (Table 1).

Fig. 1 Representative chromatograms of A metabolomic detection in positive mode; B metabolomic detection in negative mode; C lipidomic detection in positive mode; and D lipidomic detection in negative mode

In this study, quality control (QC) samples (a mixture of sample extracts) were inserted into the queue after every ten samples to monitor the repeatability of the analytical method. Method stability was assessed by overlaying the total ion current (TIC) diagrams of the different QC samples. The TIC curves overlapped closely, and the retention times and peak intensities were consistent, demonstrating that the signal was stable throughout the analysis (Supplementary Fig. 1A–D). In addition, a PCA score plot was used to estimate the degree of variability and the overall metabolic difference. The QC samples were not separated from each other, indicating the stability of the analytical method (Supplementary Fig. 1E, F).
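The QC scheme described above can be sketched as a simple injection queue builder. This is an illustration only: the exact position of the QC injections (for example, whether a QC also precedes the first sample) is not stated in the text, and the function name and labels are hypothetical.

```python
def build_injection_queue(sample_ids, qc_every=10, qc_label="QC"):
    """Return the run order with one pooled QC injection after every `qc_every` samples."""
    queue = []
    qc_count = 0
    for position, sample_id in enumerate(sample_ids, start=1):
        queue.append(sample_id)
        if position % qc_every == 0:
            qc_count += 1
            queue.append(f"{qc_label}{qc_count}")
    return queue


if __name__ == "__main__":
    samples = [f"S{i}" for i in range(1, 129)]  # 128 plasma samples, hypothetical IDs
    queue = build_injection_queue(samples)
    print(len(queue), queue[8:12])
```

Under these assumptions, 128 samples yield 12 QC injections, which is the set of runs whose TIC traces would be overlaid to check signal stability.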

Table 1 The detected compounds and differentially expressed metabolites

3.3 Identification of differential metabolites in LUAD and LUSC groups

The raw metabolomic and lipidomic data were displayed in an OPLS-DA score map, which showed a clear distinction between the LUAD and LUSC groups (Fig. 2A). A subsequent permutation test of the generated model gave R2X = 0.365, R2Y = 0.929 and Q2 = 0.547 (Fig. 2B), indicating that the model was reliable with respect to predictive performance. Figure 2C shows the S-plot of the OPLS-DA, in which metabolites near the upper right and lower left corners are differentially expressed (red dots, VIP ≥ 1; green dots, VIP < 1).

Fig. 2 All 1141 detected metabolites shown as: A OPLS-DA score map, B permutation test and C S-plot of OPLS-DA

Among the 1141 detected endogenous metabolites, we screened differential components based on both fold change (FC) and VIP values (FC > 2 or FC < 0.5, and VIP > 1.0; Fig. 3A, B). The screening result was visualized in a volcano map (Fig. 3C). A total of 19 differential metabolites were identified, including 3 down-regulated and 16 up-regulated metabolites (Table 1). Moreover, Pearson correlation analysis was performed on these 19 metabolites (Fig. 3D). Several pairs showed strong correlation, including 2-Hydroxybutanoic Acid/d-Glyceric Acid, cis-1-Pentadecenoic Acid (C15:1)/2-(Methylthio)ethanol, Inosine/Hypoxanthine-9-β-d-Arabinofuranoside, Inosine/6-Methylnicotinamide, 6-Methylnicotinamide/Hypoxanthine-9-β-d-Arabinofuranoside, 5-Aminosalicylic Acid/N-Acetylhistamine, 5-Aminosalicylic Acid/1,4-Dihydro-1-Methyl-4-Oxo-3-Pyridinecarboxamide and Riboflavin/Lumichrome.
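The dual-threshold screen described above amounts to a simple filter. The sketch below applies the stated cut-offs (FC > 2 or FC < 0.5, and VIP > 1.0) to a list of records. The metabolite names and values are hypothetical, and the direction of the fold change (which group is the numerator) is an assumption, since the text does not specify it.

```python
def screen_metabolites(records, fc_hi=2.0, fc_lo=0.5, vip_min=1.0):
    """Keep metabolites that pass BOTH the fold-change and the VIP criteria.

    records: iterable of (name, fold_change, vip) tuples.
    Returns a list of (name, direction), where direction is "up" or "down".
    """
    selected = []
    for name, fold_change, vip in records:
        passes_fc = fold_change > fc_hi or fold_change < fc_lo
        if passes_fc and vip > vip_min:
            direction = "up" if fold_change > fc_hi else "down"
            selected.append((name, direction))
    return selected


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    records = [
        ("metabolite_1", 2.5, 1.3),  # passes both criteria, up-regulated
        ("metabolite_2", 0.4, 1.1),  # passes both criteria, down-regulated
        ("metabolite_3", 3.0, 0.8),  # large fold change but VIP too low
        ("metabolite_4", 1.2, 1.9),  # high VIP but fold change too small
    ]
    print(screen_metabolites(records))
```

Requiring both criteria is stricter than using either alone: a metabolite must show a large difference in level and also contribute strongly to the OPLS-DA separation.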

Fig. 3 The 19 differentially expressed metabolites shown by: A fold change, B VIP values, C volcano map and D Pearson correlation analysis

3.4 Identification of diagnostic efficacy of differential metabolites

As mentioned above, 19 differential metabolites were screened out between the LUAD and LUSC groups (Fig. 4). We explored whether the demographic characteristics of the patients influenced their plasma metabolic profiles by plotting four heatmaps (Supplementary Fig. 2). The levels of the 19 differentially expressed metabolites were not markedly affected by these factors, except that inosine, cis-1-pentadecenoic acid (C15:1) and hypoxanthine-9-β-d-arabinofuranoside appeared to fluctuate to a certain extent between groups.

Fig. 4 The 19 differentially expressed metabolites displayed as violin plots. The data were analyzed by two-tailed unpaired Student’s t test and p values are shown in each plot

To evaluate the diagnostic efficacy of these metabolites, ROC curves were plotted for each metabolite and for different combinations of metabolites, based on logistic regression models. The Akaike Information Criterion (AIC) was used to select the optimal combination, with a lower AIC value indicating a better model. Among all the diagnostic models, a logistic regression model including four differential metabolites [2-(Methylthio)ethanol, Cortisol, d-Glyceric Acid and N-Acetylhistamine] was selected because it had the lowest AIC value (data not shown). This model performed well as a diagnostic tool, with an area under the ROC curve of 0.946 (95% CI 0.886–1.000). The cut-off value was calculated using Youden’s J statistic and gave a satisfactory efficacy, with 92.0% sensitivity and 92.9% specificity (Fig. 5).
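The three quantities used in this evaluation (AIC, area under the ROC curve and the Youden cut-off) can be illustrated with a short standard-library sketch. It is not the SPSS workflow used in the study. It assumes the model outputs a probability for the class coded 1, which class is coded 1 is a labeling choice not stated in the text, and all data are hypothetical.

```python
from math import log


def aic(labels, probs, n_params):
    """AIC = 2k - 2 ln(L) for a binary model with predicted probabilities `probs`."""
    log_lik = sum(log(p) if y == 1 else log(1.0 - p) for y, p in zip(labels, probs))
    return 2 * n_params - 2 * log_lik


def auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive scores higher; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1.
    A sample is called positive when its score is >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        candidate = (thr, tp / n_pos, tn / n_neg)
        if best is None or candidate[1] + candidate[2] > best[1] + best[2]:
            best = candidate
    return best


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]                # hypothetical class labels
    scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]   # hypothetical predicted probabilities
    print("AUC:", round(auc(labels, scores), 3))
    print("Youden cut-off:", youden_cutoff(labels, scores))
    print("AIC (2 parameters):", round(aic(labels, scores, n_params=2), 3))
```

When comparing candidate metabolite combinations, the AIC penalty term 2k discourages adding metabolites that improve the likelihood only marginally, which is why a four-metabolite model can be preferred over larger ones.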

Fig. 5 A Cut-off value and B ROC curve of the combined diagnostic model integrating four differentially expressed metabolites [2-(Methylthio)ethanol, Cortisol, d-Glyceric Acid and N-Acetylhistamine]

3.5 KEGG functional enrichment analysis

In addition, we performed a pathway enrichment analysis of the screened metabolites using the KEGG database. The related pathways could be classified into metabolic pathways, vitamin digestion and absorption, riboflavin metabolism, glycerolipid metabolism, ABC transporters, etc. (Supplementary Fig. 3A). According to the rich factors of the KEGG pathway classification, these differentially expressed metabolites were most enriched in riboflavin metabolism, steroid hormone biosynthesis, prostate cancer, etc. (Supplementary Fig. 3B).
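The rich factor is commonly defined as the number of differential metabolites in a pathway divided by the number of all detected metabolites annotated to that pathway. The sketch below computes it, together with a hypergeometric enrichment P value. Both the definition and the test are assumptions based on common practice, since the text does not state how enrichment was computed, and the counts are hypothetical.

```python
from math import comb


def rich_factor(n_diff_in_pathway, n_detected_in_pathway):
    """Fraction of the pathway's detected metabolites that are differential."""
    return n_diff_in_pathway / n_detected_in_pathway


def hypergeom_pvalue(n_total, n_in_pathway, n_diff, n_diff_in_pathway):
    """P(X >= n_diff_in_pathway) when n_diff metabolites are drawn without
    replacement from n_total, of which n_in_pathway belong to the pathway."""
    upper = min(n_diff, n_in_pathway)
    tail = sum(
        comb(n_in_pathway, i) * comb(n_total - n_in_pathway, n_diff - i)
        for i in range(n_diff_in_pathway, upper + 1)
    )
    return tail / comb(n_total, n_diff)


if __name__ == "__main__":
    # Hypothetical counts: 10 annotated metabolites, 3 in the pathway,
    # 2 differential overall, 1 of them in the pathway.
    print("rich factor:", round(rich_factor(1, 3), 3))
    print("P value:", round(hypergeom_pvalue(10, 3, 2, 1), 4))
```

A pathway with few annotated metabolites can reach a high rich factor from a single differential metabolite, so the rich factor is best read alongside the number of metabolites involved.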

It is noteworthy that the pathways with the highest enrichment factors (steroid hormone biosynthesis and prostate cancer) were all driven by the differential expression of cortisol. Therefore, we further studied cortisol-related genes in The Cancer Genome Atlas (TCGA) database via the Gene Expression Profiling Interactive Analysis website (GEPIA, http://gepia.cancer-pku.cn) (Tang et al., 2017). The results showed that steroidogenic acute regulatory protein (StAR), a rate-limiting protein in cortisol production, was an important factor in LUAD and LUSC. Specifically, as shown in Supplementary Fig. 4A, StAR expression was downregulated in LUAD tumor tissue compared with adjacent normal tissue. In contrast, StAR expression in LUSC tumor tissue was upregulated compared with the paired adjacent normal tissue. Moreover, StAR expression significantly stratified the overall survival of both LUAD and LUSC patients (Supplementary Fig. 4C, E), with high StAR expression predicting a better prognosis.

4 Discussion

Lung cancer is one of the most prevalent malignant tumors worldwide, with NSCLC accounting for an estimated 80–85% of cases (Torre et al., 2016). Despite the remarkable progress made in the treatment of lung cancer in recent decades, the overall survival rate of this disease is still not satisfactory (2019). Accurate diagnosis of the histopathological subtypes of NSCLC is of great significance for precise lung cancer treatment. Currently, the gold standard for distinguishing adenocarcinoma from squamous cell carcinoma in NSCLC is tissue biopsy. Telling LUAD from LUSC requires visual inspection by an experienced pathologist, and an accurate diagnosis is difficult when the tumor structure is unclear. Therefore, complementary diagnostic biomarkers are still needed to assist NSCLC pathological typing and subsequent precise treatment decisions.

A previous study demonstrated that LUAD and LUSC are vastly distinct diseases at the molecular, pathological and clinical levels (Relli et al., 2019). The differences between them have been discussed on the basis of DNA methylation, RNA and miRNA expression, and protein analyses (Sun et al., 2017; Zhang et al., 2020). Fan et al. discussed mRNA-related biomarkers as potential treatment strategies for LUAD and LUSC by analyzing the TCGA database (Liao et al., 2020; Liao et al., 2020). Dong et al. demonstrated that DSG3 and KRT14 were differentially expressed between LUAD and LUSC patients by analyzing Gene Expression Omnibus (GEO) microarray data and 40 pairs of tumor tissue samples (Dong et al., 2020). Zhang et al. found that plasma circulating tumor DNA (ctDNA) was correlated with histology type in NSCLC patients, and then established a ctDNA-based histologic classification model with an accuracy of 90% (Zhang et al., 2019). Xiang et al. found that the plasma levels of homovanillic acid and serotonin hydrochloride were significantly different between patients with LUAD and LUSC, but the underlying relationship remained unknown (Xiang et al., 2018). Despite extensive studies of the genetic and transcriptional differences between the two pathological subtypes, progress in differentiating them at the metabolic level is still limited. Therefore, it is necessary to conduct a comprehensive study of the metabolic characteristics of these two main subtypes of lung cancer, which will facilitate a deeper understanding and more precise therapy of NSCLC.

Metabolomics and lipidomics (a subfield of metabolomics) are emerging disciplines that characterize the comprehensive profile of endogenous metabolites in biological systems. As a high-throughput platform, metabolomics can capture metabolic variations that reflect the disease process. In a recently published work, Kowalczyk et al. explored the differences in metabolic characteristics between LUAD and LUSC patients at both early and late stages. Their results showed that the levels of PC 15:0/22:6 and PC 18:1/22:6 were significantly different between early-stage LUAD and LUSC patients (Kowalczyk et al., 2021). In contrast, more differential metabolites, including deoxycholic acid, glycocholic acid, linoleic acid, arachidonic acid and several kinds of phospholipids, were found when the plasma of LUAD, LUSC and large cell cancer patients was compared (Kowalczyk et al., 2021).

Interestingly, the lipidome results in the current study show that the difference in lipid profiles between LUAD and LUSC is very limited, with merely four differentially expressed metabolites among 809 detected lipid compounds. In contrast, as many as 15 differentially expressed metabolites, belonging to organic acids, hormones, nucleotides and others, were found even though only 332 other endogenous metabolites were detected in the plasma of NSCLC patients. These results seem to be inconsistent with the findings of Kowalczyk et al., in whose work most of the differentially expressed metabolites were lipids. The reason remains unclear, and conflicting metabolic profiles are common in studies of the differential metabolites of NSCLC subtypes. To the best of our knowledge, several studies have attempted to identify differential metabolites in the plasma of NSCLC patients and have reported specific plasma metabolomic profiles including organic acids, amino acids, fatty acids, lipids or carbohydrates, but the results are not always consistent across studies (Chen et al., 2018; Kumar et al., 2017; Ni et al., 2019; Puchades-Carrasco et al., 2016; Xiang et al., 2018). We speculate that population differences and different detection methods may lead to these inconsistent results. Nevertheless, organic acids are usually screened out as differentially expressed metabolites in these studies. For example, linoleic acid is overexpressed in the plasma of LUAD patients compared with LUSC counterparts both in our study and in the work of Kowalczyk et al.

In the present study, 1141 metabolites were detected in the plasma of NSCLC patients. An OPLS-DA model based on the levels of all metabolites revealed that the metabolic profile of LUAD patients differs significantly from that of LUSC patients, reflecting the two distinct pathological conditions. Further, 19 of the 1141 metabolites were selected as differentially expressed metabolites based on strict screening standards (both FC > 2 or FC < 0.5 and VIP > 1.0). As mentioned in the Methods section, all the enrolled patients were newly diagnosed NSCLC patients and had not yet received any treatment. Therefore, the differences in these metabolites between the two groups are unlikely to be related to drug treatment. To classify the NSCLC pathological subtype more accurately, we tested different combinations of the 19 differentially expressed metabolites and identified the optimal combination, containing four metabolites, based on the AIC. Finally, 2-(Methylthio)ethanol, cortisol, d-Glyceric Acid and N-Acetylhistamine were integrated to establish a potent diagnostic model. This integrated model distinguished efficiently between LUSC and LUAD, with an area under the ROC curve of 0.946 (95% CI 0.886–1.000) as well as high sensitivity and specificity.

Taken together, in our study the concentrations of non-lipid plasma endogenous metabolites (such as alcohols/amines, heterocyclic compounds, nucleotides and organic acids) differed much more markedly between the LUAD and LUSC groups than lipid levels did. According to the annotations of the KEGG and HMDB databases, it is noteworthy that among the 19 differentially expressed metabolites, 4-Pyridoxic Acid, d-Glyceric Acid, Pantothenate, 2-Hydroxybutanoic Acid, Undecanedioic Acid, Inosine and Riboflavin (none of which are lipids) have been reported to be related to colorectal cancer (Brown et al., 2016; Goedert et al., 2014; Ni et al., 2014; Sinha et al., 2016). However, their relevance to lung cancer remains unclear. Therefore, the physiological effects of these endogenous metabolites deserve the attention of NSCLC researchers.

Notably, KEGG functional enrichment analysis of the 19 differential metabolites showed that cortisol was involved in the pathways with the highest enrichment factors. Therefore, we further studied cortisol-related genes using TCGA data and found that StAR expression might be an important factor in LUAD and LUSC. Abnormal expression of membrane transporter proteins has already been shown to be associated with tumor progression and development in lung cancer (Wang et al., 2017). StAR, as a membrane transporter protein, mediates the transport of cholesterol from the outer to the inner mitochondrial membrane, which is the initial and rate-limiting step of cortisol production (Manna & Stocco, 2005). The production of cortisol depends on StAR, and a previous study demonstrated that increased cortisol production in human adrenocortical cells was associated with elevated expression of StAR (Ping et al., 2012). Consistent with the metabolomics data (the level of cortisol in the plasma of LUSC patients was significantly higher than in LUAD patients), the expression of StAR in LUSC tumor tissues was also upregulated compared with LUAD tumor tissues. StAR was significantly correlated with the prognosis of LUSC and LUAD patients (Supplementary Fig. 4). However, whether it is upregulated in the adrenocortical cells of LUSC patients remains to be studied.

In addition, d-glyceric acid, an organic acid that was selected as one of the discriminators of LUAD and LUSC in our study, is derived from the biosynthesis of glycerol (Habe et al., 2009). Aquaporin 3 (AQP3) is also a membrane transporter, which delivers glycerol for synthesizing the lipid second messenger involved in regulating keratinocyte proliferation and differentiation (Wang et al., 2017). The analysis of TCGA data showed that AQP3 is over-expressed in the tumor tissues of LUAD patients compared with those of LUSC patients (Supplementary Fig. 5). Interestingly, our findings reveal that the level of d-glyceric acid is significantly higher in the plasma of LUAD patients than in that of LUSC counterparts, suggesting that the increased content of d-glyceric acid might be related to the over-expression of AQP3. However, the overall survival and disease-free survival of LUAD and LUSC patients did not appear to be correlated with the expression of AQP3. The specific role of d-glyceric acid in the proliferation and differentiation of NSCLC needs to be further explored.

Despite these interesting results, the present study has several limitations. First, the sample size is small and unevenly distributed between the LUAD and LUSC groups. Second, a validation set for the ROC analysis is lacking. Third, this study did not include plasma samples from healthy individuals for comparison with LUAD/LUSC patients. Last but not least, tissue metabolomic analysis is lacking, although it is crucial for identifying potential biomarkers derived from tumor tissue secretion.

5 Conclusions

In summary, the present study has mapped comprehensive metabolic profiles of LUAD and LUSC by employing a platform of targeted metabolomics and lipidomics. This investigation identifies novel endogenous metabolites that may serve as key small molecules in the development and differentiation of NSCLC. However, more in-depth mechanistic research and larger cohort studies are needed to verify these small-molecule differences and metabolic pathway abnormalities before clinical application to the precise diagnosis, treatment and prevention of NSCLC.