Introduction

Mutations in amino acid metabolism-related genes have been reported to promote the tumorigenesis and metastasis of various cancers (Pavlova and Thompson 2016; Jain et al. 2012). Amino acid metabolism is a crucial mediator of cell growth and proliferation, and it helps maintain cellular redox, genetic, and epigenetic states (Fu et al. 2023; Li and Zhang 2023). Amino acid metabolism is closely linked to the metabolism of lipids, glucose, and nucleotides, all of which are important for cancer cell proliferation and metastasis (Li and Zhang 2016; Vettore et al. 2020; Zhu et al. 2022). Various genes encoding metabolic enzymes have been implicated in tumorigenesis. For instance, glutaminase 2 (GLS2) has been reported to enhance tumor drug resistance via a p53-mediated signaling pathway (Mates et al. 2020).

Immunotherapy can enhance the capacity of the immune system to detect and clear cancer cells. Recently, immunotherapy has emerged as a highly effective treatment option for various cancers, such as melanoma, lung cancer, head and neck cancers, and renal cell carcinoma (RCC) (Reda et al. 2022; Fasano et al. 2022; Yao et al. 2022; Huang and Zappasodi 2022). However, it is not clear how immunotherapy affects amino acid metabolism in cancer.

The overall survival outcomes of hepatocellular carcinoma (HCC) patients vary significantly across the world, with a 5-year survival rate of 19% in the USA (Dasgupta et al. 2020). The risk factors for HCC include chronic HBV/HCV infection, alcohol consumption, exposure to aflatoxins, and non-alcoholic fatty liver disease (Yang et al. 2019). Chemotherapy and radiotherapy have markedly improved the survival outcomes of HCC patients. Pathologically, HCC is a highly heterogeneous disease, and its heterogeneity has been documented at the interpatient, intertumoral, and intratumoral levels (Torrecilla et al. 2017; Alawyia and Constantinou 2023). Currently, there is no effective immunotherapy developed for liver cancer. Given the limited treatment strategies for HCC, effective markers are urgently needed to identify HCC patients who are likely to benefit from immunotherapy.

In this study, we systematically and comprehensively investigated the characteristics of gene sets related to amino acid metabolism in liver cancer. First, we demonstrated that gene sets associated with amino acid metabolism could stratify liver cancer based on clinical and molecular features. Next, we developed a 4-gene signature based on amino acid metabolism-related genes using liver cancer RNA-seq data from TCGA, and we validated it with a liver cancer dataset from ICGC. Our results show that the 4-gene signature accurately predicted prognosis and responses to immunotherapy, which offers further insight into individualized treatment of liver cancer.

Materials and Methods

Data Collection and Processing

Level 3 RNA expression data and the associated clinical data for 50 normal liver tissues and 374 HCC tissues were retrieved from the TCGA database (https://portal.gdc.cancer.gov/repository). RNA-seq data and the associated clinical data for 231 liver cancer cases were downloaded from the ICGC database (https://dcc.ICGC.org/projects/LIRI-JP). The “limma” R package was then used to normalize read count values with the scale method. Because the ICGC and TCGA datasets are publicly available, ethical approval was not required. The gene set associated with amino acid metabolism (REACTOME_METABOLISM_OF_AMINO_ACIDS_AND_DERIVATIVES, Supplementary Table S1) was obtained from the Molecular Signatures Database (MSigDB) v7.1.

Identification of Differentially Expressed AMGs

The “limma” R package was used to identify differentially expressed AMGs, with |log2 fold change (FC)| > 1 and FDR < 0.05 as cut-offs. The differentially expressed AMGs were then subjected to KEGG pathway and GO enrichment analyses with the “clusterProfiler” R package to identify enriched terms and pathways (Wu et al. 2021a, b).
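The threshold step can be sketched in Python. This is a minimal sketch. It assumes that per-gene log2FC values and raw p-values have already been produced, for example by limma's moderated t-test, which is not reproduced here. The sketch applies the Benjamini–Hochberg correction, which is the default adjustment in limma's topTable, and then applies the |log2FC| > 1 and FDR < 0.05 cut-offs. The function names are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # ascending raw p-values
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def select_degs(genes, log2fc, pvals, fc_cut=1.0, fdr_cut=0.05):
    """Keep genes with |log2FC| > fc_cut and FDR < fdr_cut; split by direction."""
    fdr = benjamini_hochberg(pvals)
    log2fc = np.asarray(log2fc, dtype=float)
    keep = (np.abs(log2fc) > fc_cut) & (fdr < fdr_cut)
    up = [g for g, k, fc in zip(genes, keep, log2fc) if k and fc > 0]
    down = [g for g, k, fc in zip(genes, keep, log2fc) if k and fc < 0]
    return up, down
```

For example, `select_degs(["A", "B", "C", "D"], [2.0, -1.5, 0.5, 3.0], [0.001, 0.01, 0.0001, 0.2])` keeps A as upregulated and B as downregulated. C fails the fold-change cut-off, and D fails the FDR cut-off.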

Molecular Subgroup Classification by Consensus Clustering

The prognosis-related AMGs were screened by univariate Cox regression analysis and used for unsupervised clustering. The “ConsensusClusterPlus” R package (Wilkerson and Hayes 2010) was used to determine the number of clusters among TCGA HCC samples with the consensus clustering algorithm. Samples were classified into k clusters (k = 2–9) using the squared Euclidean distance metric and the k-means (KM) clustering algorithm, with 100 iterations and 80% of samples drawn in each iteration. The consensus matrix was visualized as a heatmap with the “pheatmap” R package. Consensus cumulative distribution function (CDF) plots and delta area plots were used to determine the optimal cluster number (Wilkerson and Hayes 2010). The optimal number of clusters was chosen according to the following criteria: high within-cluster consistency, a low coefficient of variation, and no appreciable increase in the area under the CDF curve.
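A minimal Python sketch of the consensus idea follows. It assumes a samples-by-genes matrix and uses a plain k-means clusterer as a stand-in for ConsensusClusterPlus, whose internals differ. The names `consensus_matrix` and `cdf_area` are hypothetical. For each pair of samples, the consensus value is the fraction of subsamples containing both samples in which the two fall in the same cluster. The delta area plot tracks how the area under the CDF of these values grows as k increases.

```python
import numpy as np


def kmeans_labels(x, k, rng, n_iter=50):
    """Lloyd's k-means with squared Euclidean distance; returns one label per row."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old centre if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_matrix(x, k, reps=100, p_item=0.8, seed=0):
    """Fraction of subsamples (containing both samples) in which a pair co-clusters."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))  # times a pair landed in the same cluster
    sampled = np.zeros((n, n))   # times a pair was drawn together
    m = int(round(p_item * n))   # about 80% of samples per iteration
    for _ in range(reps):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans_labels(x[idx], k, rng)
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def cdf_area(cm, bins=100):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    vals = cm[np.triu_indices_from(cm, k=1)]
    grid = np.linspace(0.0, 1.0, bins + 1)
    cdf = np.array([(vals <= g).mean() for g in grid])
    # Trapezoidal rule written out explicitly.
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(grid)))
```

To build the delta area curve, compute `cdf_area` for k = 2, 3, … and take the relative increase from k − 1 to k. A plateau in that increase is the signal used to pick k.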

Establishment and Verification of a Prognostic AMGs Signature

Univariate Cox regression analyses were performed on these DEGs to identify prognostic genes related to amino acid metabolism (p < 0.05). LASSO regression analysis was then performed with the “glmnet” R package to further screen hub genes (Tibshirani 1997). We drew 1000 bootstrap resamples (sampling with replacement) of the dataset and retained genes selected in more than 900 of them. Finally, multivariate Cox regression analysis was performed to build the final gene signature and to estimate the contribution of each gene. The regression coefficients from the multivariate Cox model were used to calculate the prognostic index (PI) as PI = (coefficient of mRNA1 × expression of mRNA1) + (coefficient of mRNA2 × expression of mRNA2) + … + (coefficient of mRNAn × expression of mRNAn). Patients were classified into high- and low-risk groups using the median risk score as the cut-off. Principal component analysis (PCA) based on the expression of the signature genes was performed with the “prcomp” function in R. t-SNE analysis was performed with the “Rtsne” R package. Both were used to assess the distribution of the risk groups. Time-dependent ROC curve analysis was performed with the “survivalROC” R package to assess the predictive ability of the gene signature. Finally, correlations among the hub genes in HCC were analyzed with the Xiantao online tool (https://www.xiantao.love/products).
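The prognostic index is a weighted sum of expression values. The hypothetical Python helper below uses the four coefficients reported in the Results (TXNRD1 0.193, PSMD14 0.362, SMOX 0.168, EEF1E1 0.270) and splits patients at the median. Assigning ties at the median to the low-risk group is an assumption of this sketch. The exponential function is monotone, so the median split is the same whether the linear index or its exponential is used.

```python
import numpy as np

# Multivariate Cox coefficients reported for the 4-gene signature.
COEFS = {"TXNRD1": 0.193, "PSMD14": 0.362, "SMOX": 0.168, "EEF1E1": 0.270}


def prognostic_index(expr):
    """PI = sum over genes of coefficient * expression, one value per patient.

    expr maps a gene symbol to a 1-D array of normalized expression values,
    with patients in the same order for every gene.
    """
    n = len(next(iter(expr.values())))
    pi = np.zeros(n)
    for gene, coef in COEFS.items():
        pi += coef * np.asarray(expr[gene], dtype=float)
    return pi


def median_split(scores):
    """Label a patient 'high' if the score is above the median, else 'low'."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > np.median(scores), "high", "low")
```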

Independence of the AMG-Based Model from Other LIHC Patients’ Clinical Features

Univariate and multivariate Cox regression analyses that included the clinical features of LIHC patients (age, grade, stage, and T stage) were conducted to determine whether the prognostic model was an independent prognostic factor. To further confirm the prognostic significance of the model, LIHC samples were stratified according to clinical characteristics into the following subgroups: grade I/II, grade III/IV, stage I/II, stage III/IV, age < 65, age ≥ 65, T1–T2, and T3–T4. Survival analysis was then performed within each subgroup to verify the prognostic significance of the gene signature. The optimal cut-off value of the risk score was determined with the surv_cutpoint function of the “survminer” R package.
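The idea behind an optimal cut-off can be sketched as a scan over candidate thresholds. The scan keeps the threshold that maximizes the two-group log-rank statistic, and it requires at least 10% of patients in each group (survminer's default `minprop = 0.1`). This is a simplified stand-in. survminer delegates the calculation to the maxstat package, which uses a standardized log-rank-score statistic, so its selected cut-off can differ. The function names are hypothetical.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 df); group is a boolean mask."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutpoint(score, time, event, minprop=0.1):
    """Return (cut-off, statistic) maximizing the log-rank chi-square."""
    score = np.asarray(score, dtype=float)
    n = len(score)
    best_cut, best_stat = None, -1.0
    for c in np.unique(score):
        high = score > c
        if min(high.sum(), n - high.sum()) < minprop * n:
            continue                          # group too small
        stat = logrank_chi2(time, event, high)
        if stat > best_stat:
            best_cut, best_stat = c, stat
    return best_cut, best_stat
```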

Gene Set Variation Analysis (GSVA) and KEGG Pathway Analysis

To identify the biological pathways and processes enriched in each risk group, we used GSVA to assess differences in biological process activities and signaling pathways between the risk groups (Hanzelmann et al. 2013). To this end, we obtained the “c2.cp.kegg.v7.2.symbols” gene set file from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/) and performed GSVA with the “GSVA” R package to identify significantly enriched pathways (adjusted p < 0.05). Next, DEGs between the low- and high-risk groups (|log2FC| ≥ 1, FDR < 0.05) were identified. KEGG pathway analysis was then performed with the “clusterProfiler” R package (Wu et al. 2021a, b) to predict their functions, and pathways with p < 0.05 were considered significantly enriched.

Analysis of Tumor Mutation Burden

Somatic mutation data for liver cancer patients were retrieved from the TCGA database (https://www.cancer.gov/tcga/). The mutational burden was determined by counting the total number of non-synonymous mutations. The “maftools” R package was used to identify driver genes (Mayakonda et al. 2018), with p < 0.05 indicating genes that were significantly differentially mutated between the low- and high-risk groups. The top 20 driver genes with the highest mutation frequencies were evaluated further. The burden of copy number gains and losses at the arm and focal levels was compared between the high- and low-risk groups as previously described (Shen et al. 2019).
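A per-sample count of non-synonymous mutations can be sketched as follows. The sketch assumes MAF records with the standard `Tumor_Sample_Barcode` and `Variant_Classification` columns. The set of non-synonymous classes is assumed to match maftools' default and should be checked against the maftools version actually used. Dividing by an exome size in megabases is optional and is likewise an assumption here. Samples with no mutations at all do not appear in a MAF, so they would need to be added separately.

```python
from collections import Counter

# Assumed to match maftools' default non-synonymous Variant_Classification set.
NON_SYNONYMOUS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site",
    "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation",
}


def tmb_per_sample(maf_rows, exome_size_mb=None):
    """Count non-synonymous mutations per sample (optionally per megabase).

    maf_rows is an iterable of dicts holding MAF columns.
    """
    maf_rows = list(maf_rows)
    samples = {row["Tumor_Sample_Barcode"] for row in maf_rows}
    counts = Counter(
        row["Tumor_Sample_Barcode"]
        for row in maf_rows
        if row["Variant_Classification"] in NON_SYNONYMOUS
    )
    if exome_size_mb is None:
        return {s: counts[s] for s in samples}
    return {s: counts[s] / exome_size_mb for s in samples}
```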

Assessment of Immune Cell Type Fractions

The activities of 13 immune-associated pathways and the infiltration levels of 16 immune cell types were evaluated by ssGSEA (Rooney et al. 2015) using the “GSVA” R package. The annotated gene set file is provided in Supplementary Table S2.
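For a single sample, ssGSEA ranks genes by expression. It then accumulates a rank-weighted running sum for genes in the set and compares it against an unweighted running sum for genes outside the set. The sketch below is a simplified Python version, not the GSVA package's implementation. It uses a weight exponent of 0.25, the default weight used for ssGSEA in GSVA, and it omits the final normalization across samples. The function name is hypothetical.

```python
import numpy as np


def ssgsea_score(expr, genes, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one gene set.

    expr: expression values for one sample; genes: matching gene symbols.
    """
    expr = np.asarray(expr, dtype=float)
    n = len(expr)
    order = np.argsort(-expr)                      # genes by decreasing expression
    ranks = np.arange(n, 0, -1, dtype=float)       # top gene gets rank n
    in_set = np.array([genes[i] in gene_set for i in order])
    if in_set.all() or not in_set.any():
        raise ValueError("gene set must contain some, but not all, genes")
    weights = ranks ** alpha * in_set              # weights only for in-set genes
    p_in = np.cumsum(weights) / weights.sum()      # weighted ECDF of in-set genes
    p_out = np.cumsum(~in_set) / (~in_set).sum()   # ECDF of the remaining genes
    return float(np.sum(p_in - p_out))
```

A set concentrated among the most highly expressed genes gives a positive score, and a set concentrated at the bottom gives a negative one.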

Prediction of Patients’ Responses to ICI and Chemotherapy

The PD-1, CTLA4, PD-L1, and LAG3 immune checkpoints were used to evaluate the associations between risk scores and immunotherapeutic efficacies (Charoentong et al. 2017). The independent datasets (IMvigor210, GSE135222, and GSE91061) were analyzed to assess the ability of the gene signature to predict immunotherapeutic responses. Clinical information and expression data in the IMvigor210 dataset were retrieved from http://research-pub.gene.com/IMvigor210CoreBiologies. The expression data as well as clinical information for GSE91061 and GSE135222 datasets were acquired from GEO (https://www.ncbi.nlm.nih.gov/geo/).

Estimation of Drug Responses

The chemotherapy sensitivity of the low- and high-risk groups was assessed as previously described (Villanueva 2019). Briefly, drug sensitivity data were obtained from CTRP (https://portals.broadinstitute.org/ctrp) and PRISM (https://depmap.org/portal/prism/). The CCLE expression data used for the drug sensitivity analyses were extracted from the CTRP and PRISM datasets. A drug was considered to differ significantly between the low- and high-risk groups when two conditions were met. First, log2FC had to exceed 0.10 (in both datasets). Second, the Pearson correlation coefficient with the risk score had to be < −0.30 (CTRP) or < −0.35 (PRISM).
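The selection rule can be written as a simple filter. This sketch assumes that each drug has an AUC value per sample and that a lower AUC means higher sensitivity. It also assumes that log2FC is defined as log2(mean AUC in the low-risk group / mean AUC in the high-risk group), so positive values indicate greater sensitivity in the high-risk group. The helper name is hypothetical.

```python
import numpy as np


def candidate_drugs(auc, risk_score, is_high, fc_cut=0.10, r_cut=-0.30):
    """Return drugs passing log2FC > fc_cut and Pearson r < r_cut.

    auc maps a drug name to an array of AUC values (one per sample);
    risk_score and is_high are aligned with those samples.
    """
    risk = np.asarray(risk_score, dtype=float)
    high = np.asarray(is_high, dtype=bool)
    hits = []
    for drug, values in auc.items():
        v = np.asarray(values, dtype=float)
        # Positive when the high-risk group has the lower mean AUC.
        log2fc = np.log2(v[~high].mean() / v[high].mean())
        r = np.corrcoef(v, risk)[0, 1]   # Pearson correlation with the risk score
        if log2fc > fc_cut and r < r_cut:
            hits.append(drug)
    return hits
```

For the PRISM dataset, the same function would be called with `r_cut=-0.35`.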

Establishment and Evaluation of a Predictive Nomogram

A nomogram was built based on gender, stage, grade, age, and risk score as previously described (Iasonos et al. 2008). The area under the receiver operating characteristic (ROC) curve (AUC), 1-, 3-, and 5-year calibration curves, and decision curve analysis (DCA) were used to assess the predictive accuracy and discriminatory capacity of the nomogram (Vickers and Elkin 2006).
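Two of the nomogram metrics reduce to short formulas. The first is Harrell's C-index. It is the fraction of usable patient pairs in which the patient who died earlier had the higher predicted risk, with ties in risk counted as one half. The second is the decision-curve net benefit at a threshold probability p_t, defined as NB = TP/n − FP/n × p_t/(1 − p_t). The sketch below computes both. For net benefit, it simplifies by treating the outcome as a binary event by a fixed time point and by ignoring censoring. The function names are hypothetical.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Harrell's concordance index for right-censored survival data."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                      # pairs are anchored on observed deaths
        for j in range(len(time)):
            if time[j] > time[i]:         # j outlived the death at time[i]
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def net_benefit(y_true, prob, threshold):
    """Decision-curve net benefit of treating patients with prob >= threshold."""
    y = np.asarray(y_true, dtype=bool)
    p = np.asarray(prob, dtype=float)
    n = len(y)
    treat = p >= threshold
    tp = np.sum(treat & y)
    fp = np.sum(treat & ~y)
    return tp / n - fp / n * threshold / (1 - threshold)
```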

Drug Susceptibility Analysis

Drug susceptibility associated with the hub genes was analyzed with the CellMiner database, restricted to FDA-approved drugs and compounds in clinical trials. We focused on the relationship between the expression of the four hub genes and drug sensitivity. Correlations between hub gene expression and drug susceptibility were assessed by Spearman correlation analysis, and the 16 drugs with the most significant correlations were selected.
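The gene–drug ranking can be sketched as a Spearman correlation followed by sorting. Here the Spearman correlation is computed as the Pearson correlation of tie-averaged ranks. Pairs are ranked by |rho| as a proxy for significance, which is equivalent when every pair uses the same set of cell lines. The data layout and the function names are assumptions.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def top_pairs(expr, z_scores, n_top=16):
    """expr: gene -> values over cell lines; z_scores: drug -> values over the same lines."""
    pairs = [(g, d, spearman(e, z)) for g, e in expr.items() for d, z in z_scores.items()]
    pairs.sort(key=lambda t: abs(t[2]), reverse=True)
    return pairs[:n_top]
```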

Transcription Factor (TF) Regulatory Network Analysis

The regulatory network between transcription factors (TFs) and hub genes was constructed with the online analysis tool NetworkAnalyst (https://www.networkanalyst.ca/NetworkAnalyst/). The hub genes (TXNRD1, PSMD14, SMOX, and EEF1E1) were entered as input. Human TF targets were derived from the JASPAR TF binding site profile database to build the TF–gene interaction regulatory network.

Statistical Analysis

Gene expression in normal vs tumor tissues was compared using Student's t-test, and the chi-square test was used to compare differences in proportions. ssGSEA scores of immune cells and immune pathways were compared between the low- and high-risk groups using the Mann–Whitney U test. Survival differences between the low- and high-risk groups were assessed by Kaplan–Meier analysis. Independent predictors were identified by univariate and multivariate Cox regression analyses. Analyses were performed with R (version 4.0.1) or SPSS (version 22.0). Unless otherwise stated, p < 0.05 was considered statistically significant.
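The Kaplan–Meier product-limit estimator behind the survival curves can be sketched as follows. At each distinct event time, the survival estimate is multiplied by (1 − deaths / patients at risk), and censored patients leave the risk set after their last follow-up. This is a minimal illustration, not the R survival package.

```python
import numpy as np


def kaplan_meier(time, event):
    """Return (event_times, survival) for the Kaplan-Meier estimator."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(time[event])           # distinct times with at least one death
    surv, s = [], 1.0
    for t in times:
        n_at_risk = np.sum(time >= t)        # still under follow-up just before t
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return times, np.array(surv)
```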

Results

Prognostic and Amino Acid Metabolism-Associated DEGs in the TCGA Dataset

This study included 374 liver cancer patients from the TCGA dataset and 231 liver cancer patients from the ICGC (LIRI-JP) dataset; the workflow is shown in Fig. 1. Analysis of these datasets with the limma R package identified 374 AMGs, of which 132 were differentially expressed (23 downregulated and 109 upregulated) between liver cancer and normal tissues (Fig. S1A, B). GO analysis revealed that these AMGs were enriched in mRNA catabolic process, protein localization to the endoplasmic reticulum, and protein targeting to membrane (Fig. S1C). KEGG pathway analysis revealed that these genes were enriched in the following pathways (Fig. S1D):
- arginine and proline metabolism
- alanine, aspartate, and glutamate metabolism
- cysteine and methionine metabolism
- tryptophan metabolism
- biosynthesis of amino acids
- selenocompound metabolism

Fig. 1

Flowchart showing the process of analyzing AMGs in liver cancer

Identification of Distinct Molecular Clusters Based on Prognostic AMGs

Consensus clustering based on the 132 DEGs was performed with the ConsensusClusterPlus R package. Consensus matrices were first constructed, and samples were assigned to 2, 3, and 4 clusters to evaluate clustering quality (Fig. S2A–C). As shown in Fig. S2D, k = 4 gave good clustering. The CDF delta area curve showed that the area stabilized at a cluster number of 4 (Fig. S2E). PCA showed that the 4 clusters were well separated (Fig. S2F). The cluster subgroups were significantly correlated with gender (Fig. S2G). Kaplan–Meier analysis revealed that cluster 1 had markedly better survival than the other 3 clusters, whereas cluster 2 had the worst survival (Fig. S2H).

The TCGA Dataset-Based Prognostic Model

To establish an amino acid metabolism-associated gene signature, we performed univariate Cox regression analysis of the TCGA dataset and obtained 50 OS-associated, amino acid metabolism-related DEGs, 11 of which were favorable prognostic factors for liver cancer (Fig. 2A, B). These 50 AMGs were differentially expressed between liver cancer and normal tissues (Fig. 2D). LASSO regression analysis was then performed, and 30 genes were retained (Fig. 2C, E). Finally, 4 AMGs were selected by multivariate stepwise Cox regression analysis and used to build the predictive model (Table 1, Fig. 2F, G). The risk score of each HCC patient was calculated as: risk score = e^(0.193 × TXNRD1 expression + 0.362 × PSMD14 expression + 0.168 × SMOX expression + 0.270 × EEF1E1 expression).

Fig. 2

Identification of amino acid metabolism-associated genes in the TCGA dataset. A Venn diagram of DEGs and survival-related genes. B Correlation network map of candidate genes. C Distribution map of LASSO coefficients for 50 amino acid metabolism genes. D Heatmap of 26 differentially expressed genes significantly associated with amino acid metabolism. E Six candidate genes were identified using LASSO regression analyses with ten-fold cross-validation. F Multivariate Cox regression coefficient distribution of 4 amino acid metabolism-associated genes. G Multivariate Cox regression forest plot of the prognostic significance of the 4 amino acid metabolism-related genes

Table 1 Identification of prognosis-related hub AMGs using multivariate Cox regression analysis

Patients were assigned to low-risk (n = 183) and high-risk (n = 182) groups based on the median risk score (Fig. 3A). The expression patterns of the prognostic genes in the high- and low-risk groups of the TCGA training set were visualized with a heatmap (Fig. 3B). Time-dependent ROC analysis of the risk score's capacity to predict OS yielded AUC values of 0.757, 0.678, and 0.669 at 1, 2, and 3 years, respectively (Fig. 3C). Patients with high risk scores were more likely to die earlier than those with low risk scores (Fig. 3D). PCA and t-SNE analyses showed that patients in the two risk groups were distributed in two distinct directions (Fig. 3E, F). Kaplan–Meier analysis revealed that the high-risk group had markedly worse OS and PFI than the low-risk group (Fig. 3G, I), whereas DFI did not differ significantly between the two groups (Fig. 3H). The risk score was significantly associated with grade, stage, T stage, and survival status of LIHC patients (Table 2; Fig. S3).

Fig. 3

Prognostic assessment of the 4-gene signature model in the TCGA dataset. A Distribution characteristics and median risk score values in the TCGA dataset. B Heatmap showing differences in the expression of the 4-gene signature between the high- and low-risk groups. C AUC of the time-dependent ROC curve used to evaluate the prognostic value of the risk score in the TCGA dataset. D Survival of patients in the low- and high-risk groups. E, F PCA and t-SNE analyses of the risk model in the TCGA dataset. G–I Kaplan–Meier analyses of DFI, OS, and PFI in the low- and high-risk groups in the TCGA dataset

Table 2 Relationship between patients’ clinical features and the risk score in the training set

Validity of the 4-Gene Signature in the ICGC Dataset

To assess the robustness of the model developed in the TCGA dataset, ICGC patients were classified into high- and low-risk groups based on the median risk score. A heatmap of the expression profiles of the prognostic risk genes in the two groups was then generated (Fig. 4A–C). Consistent with the findings in the TCGA cohort, t-SNE and PCA analyses confirmed that patients in the two groups were distributed in distinct directions (Fig. 4D, E). The AUC values of the 4-gene signature were 0.697, 0.693, and 0.678 at 1, 2, and 3 years, respectively (Fig. 4F). At 1 year, the 4-gene signature had a higher AUC than gender, age, and stage (Fig. 4G). The high-risk group had worse survival than the low-risk group (Fig. 4H, p = 0.001).

Fig. 4

Verification of the 4-gene signature in the ICGC dataset. A Median risk scores and distribution characteristics in the ICGC dataset. B Heatmap of differences in the expression of the 4-gene signature between the low- and high-risk groups. C Survival of the high- and low-risk patient groups. D, E t-SNE and PCA analyses of the model in the ICGC dataset. F AUC of the time-dependent ROC curve used to evaluate the prognostic value of the risk score in the ICGC dataset. G AUC of the time-dependent ROC curve used to assess the prognostic significance of the risk score, gender, age, and stage in the ICGC dataset. H Kaplan–Meier analysis of OS in the high- and low-risk groups in the ICGC dataset

Prognostic Significance of the 4-Gene Signature

To assess the prognostic value of the model in different clinicopathological settings, samples were stratified into 2 subgroups for each of TNM stage, age, grade, and gender. Patients in each subgroup were then assigned to high- and low-risk groups using the optimal cut-off value of the prognostic model. KM survival analysis was performed in the following subgroups: age < 65 and ≥ 65 years, female and male, grade I–II and III–IV, stage I–II and III–IV, and T1–T2 and T3–T4. In every subgroup except the female subgroup, the prognostic model was significantly associated with LIHC patient survival (Fig. 5).

Fig. 5

KM survival subgroup analyses for LIHC patients in TCGA dataset based on the 4-gene signature stratified by clinical characteristics. A Aged ≥ 65 years. B Aged < 65 years. C Female. D Male. E Grade I–II. F Grade III–IV. G Stage I–II. H Stage III–IV

Immune Microenvironments in High- vs Low-Risk Groups

To explore the immune relevance of the risk model, the enrichment scores of 16 immune cell types and the activities of 13 immune-associated pathways were compared between risk groups in the TCGA and ICGC datasets. In the TCGA cohort (Fig. 6A), the high-risk subgroup generally exhibited higher levels of immune cell infiltration. This was especially true for activated dendritic cells (aDCs), dendritic cells (DCs), immature dendritic cells (iDCs), natural killer (NK) cells, macrophages, T helper (Th) cells (Tfh, Th1, and Th2 cells), and regulatory T (Treg) cells. Five pathways were exceptions: cytolytic activity, inflammation promotion, T-cell co-inhibition, type I interferon response, and type II interferon response. The remaining 7 immune pathways were more active in the high-risk group than in the low-risk group in the TCGA cohort (Fig. 6B). Similar observations were made in the ICGC dataset (Fig. 6C, D). We also assessed associations of the signature genes with immune cells and immune-associated pathways. Most of the signature genes were significantly correlated, positively or negatively, with immune cell infiltration in cancer tissues (Fig. S4).

Fig. 6

Comparison of immune cells and immune-associated functions between risk groups in the TCGA dataset (A, B) and the ICGC dataset (C, D). Differences in immune cell scores (A, C) and immune-associated function scores (B, D) in the TCGA and ICGC datasets. ns = not significant. *, **, and *** denote p < 0.05, p < 0.01, and p < 0.001, respectively

The 4-Gene Signature and TMB

The correlation between TMB and the risk score was not significant (Fig. 7A). When patients were grouped into low- and high-TMB groups, low TMB was associated with better OS than high TMB (log-rank test, p = 0.001, Fig. 7B). Given the distinct prognostic value of TMB and the 4-gene signature, we assessed their combined use for prognostic stratification of TCGA-HCC. Stratified survival analysis showed that TMB status did not interfere with predictions based on the 4-gene signature (p < 0.001, Fig. 7C). These findings suggest that the risk score may be a prognostic predictor that is independent of TMB, as well as a potential indicator of immunotherapeutic response.

Fig. 7

Associations between the 4-gene signature and somatic variations. A TMB differences between the low- and high-risk groups. Wilcoxon test, p < 0.0001. B Kaplan–Meier analysis of the low- and high-TMB groups in the LIHC dataset. p = 0.0067, by log-rank test. C Kaplan–Meier analysis of LIHC patients stratified by TMB and the 4-gene signature. D, E OncoPrints built for the high- and low-risk groups. F Focal and broad copy number alterations in the low- and high-risk patient groups

Then, we assessed the distribution of somatic variations in TCGA-HCC driver genes in low- and high-risk groups of the TCGA-HCC dataset using the maftools package in R and selected the top 20 driver genes with highest frequencies (Fig. 7D, E). Twenty genes, including TP53 (p = 2.63e−07), SPEG (p = 0.007), NLRP12 (p = 0.007), and DYNC2H1 (p = 0.009) differed markedly with regards to somatic variations in low- vs high-risk group (Table 3). To assess the differences in genetic changes in high- and low-risk group, we next assessed their copy number changes and found that the high-risk patient group had a high burden of copy number gain at focal and arm levels and a higher burden of copy number loss at the arm level, relative to low-risk group patients (Fig. 7F).

Table 3 Relationships between risk scores and somatic variations

KEGG and GSVA Analyses of Low- and High-Risk Groups

To elucidate the biological functions associated with the risk score, GSVA was used to identify biological differences between the high- and low-risk groups. The cell cycle, ubiquitin-mediated proteolysis, spliceosome, and RNA degradation pathways were highly enriched in the high-risk groups of both the ICGC and TCGA cohorts (Fig. S5A). In contrast, the PPAR signaling, primary bile acid biosynthesis, and linoleic acid metabolism pathways were enriched in the low-risk group (Fig. S5B). Comparison of the high- and low-risk groups identified 933 DEGs in the ICGC cohort and 1772 DEGs in the TCGA cohort (|log2FC| > 1, FDR < 0.05). KEGG pathway analysis showed that the DEGs from both cohorts were significantly enriched in glycolysis/gluconeogenesis, carbon metabolism, bile secretion, metabolism of xenobiotics by cytochrome P450, fructose and mannose metabolism, and drug metabolism (p < 0.05, Fig. S5C, D). In addition, the DEGs from the TCGA cohort were significantly enriched in retinol metabolism, ECM–receptor interaction, and leishmaniasis pathways (p < 0.05, Fig. S5C).

Responses of Patients to ICI

We examined the expression of immune checkpoints (PD-1, PD-L1, LAG3, and CTLA4) in the two risk groups. All four were significantly upregulated in the high-risk group, suggesting a “hot” immune microenvironment (Fig. 8A). To assess the value of the risk score in predicting therapeutic benefit, we analyzed the GSE91061 dataset and classified patients who received immunotherapy by high or low 4-gene signature scores. Notably, in the GSE91061 cohort, the high-risk group markedly outlived the low-risk group (log-rank test, p = 0.0012, Fig. 8B). The high-risk patients in this cohort had a high probability of dying earlier than those in the low-risk group (log-rank test, p = 0.00056, Fig. 8B). The clinical response rate (CR/PR vs SD/PD) was also higher in the high-risk group (Fig. 8B). Comparable findings were obtained in the validation dataset (GSE135222) and the IMvigor210 cohort (Fig. 8C, D). These data suggest that the 4-gene signature may predict responses to immunotherapy.

Fig. 8

Role of the 4-gene signature in predicting immunotherapeutic benefits. A Levels of immune checkpoints, CD274, PDCD1, CTLA4, and TIM-3 in low- and high-risk groups. B Risk scores in various immunotherapy response groups. Kaplan–Meier analysis and rate of clinical response in indicated risk score groups in dataset GSE91061. C Risk scores in different immunotherapy response groups. Kaplan–Meier curves and rate of clinical response in different risk score groups in dataset GSE135222. D Risk scores in different immunotherapy response groups. Kaplan–Meier curves and rate of clinical response in different risk score groups in dataset IMvigor210

Drug Responses

The PRISM and CTRP drug response datasets were analyzed to identify candidate drugs to which high-risk patients are highly sensitive. In the CTRP dataset, high-risk patients were highly sensitive to MLN2238, SB-743921, GSK461364, clofarabine, paclitaxel, and BI-2536. In the PRISM dataset, they were highly sensitive to volasertib, epothilone-b, and ispinesib. These compounds had lower AUC values in the high-risk group and correlated negatively with the risk score (Fig. S6A, B), indicating that they may be effective in high-risk liver cancer patients.

Establishment and Validation of the Predictive Nomogram

To assess the clinical prognostic value of the risk model, we used univariate and multivariate Cox regression analyses to identify independent prognostic factors in LIHC. The risk score and stage were identified as independent prognostic factors (Fig. 9A). To apply the risk model in clinical settings, we constructed a nomogram based on gender, grade, age, stage, and risk group. We then tested its ability to predict 1-, 3-, and 5-year OS in the LIHC dataset (Fig. 9B). In ROC analysis, the nomogram outperformed the 4-gene signature and the other 4 clinical indicators in predicting 1-, 3-, and 5-year OS (Fig. 9C–E). Moreover, the 1-, 3-, and 5-year OS calibration curves showed good agreement between the OS predicted by the nomogram and the observed OS (Fig. 9F). We compared the net benefit of several models: none, all, risk score, clinical indicators, and nomogram. The nomogram provided a higher net benefit across a wider range of threshold probabilities (Fig. 9G). The nomogram also had a higher concordance index (C-index) than the other clinical markers and the risk score (Fig. 9H). Thus, the ROC, DCA, calibration curve, and C-index analyses indicate that the nomogram offers greater clinical benefit than the risk score based on the 4-AMG signature alone.

Fig. 9

Forest plot from univariate and multivariate analyses of the 4-gene signature and nomogram for predicting 1-, 3-, and 5-year OS in the LIHC training set. A Univariable and multivariable analyses of the 4-gene signature in LIHC patients. B Nomogram for OS prediction at 1-, 3-, and 5-year. C–E ROC analysis of 1-, 3-, and 5-year OS prediction. F Calibration curves for prediction of 1-, 3-, and 5-year OS. G DCA curve. H Concordance index revealing measure of concordance of the predictor with patient survival

Protein Expressions of Hub Genes in HPA Database

To further verify the hub genes at the protein level, the 4 AMGs were examined in immunohistochemistry data from the Human Protein Atlas (HPA) database. The protein levels of the four hub genes (TXNRD1, PSMD14, EEF1E1, and SMOX) were markedly higher in LIHC tumor tissues than in normal tissues (Fig. 10). Moreover, these hub genes were expressed not only in liver tissue but also in 26 other human organ types (Fig. S8).

Fig. 10

Immunohistochemical detection of the hub genes in liver cancer and normal tissues from the HPA database

Drug Susceptibility Based on Hub Genes

Correlations between drug Z-scores and hub gene expression were analyzed, and the top 16 significant drug–gene pairs are shown in Fig. S9. In total, 246 drugs showed statistically significant correlations. Among the top pairs, Irofulven, Staurosporine, Amonafide, and 3-Bromopyruvate were strongly positively correlated with hub gene expression, whereas the other 12 drugs were negatively correlated with hub gene expression (Fig. S9).

Transcription Factor (TF) Regulatory Network Based on Hub Genes

To further understand the regulatory mechanisms of the hub genes, a TF regulatory network centered on the hub genes was constructed. As shown in Fig. S10, the hub genes (TXNRD1, PSMD14, SMOX, and EEF1E1) occupy central positions in the regulatory network and interact with multiple TFs. In addition, the hub genes showed mutual regulatory relationships. Furthermore, all of the hub genes in this study were significantly correlated with each other (Fig. S7).

Discussion

Recent studies have linked metabolism to cancer epigenetics (Lee and Kim 2022; Thakur and Chen 2019). Abnormal metabolism promotes tumor proliferation and metastasis. Numerous metabolic genes are effective prognostic biomarkers, and altered amino acid metabolism is a key metabolic feature of HCC. Bioinformatics approaches have been used to investigate how metabolism affects HCC risk (Liu et al. 2020; Tang et al. 2020). Immunotherapy is effective against several cancers; however, its relationship with amino acid metabolism genes in HCC has not been investigated. Here, we established a signature of 4 amino acid metabolism-associated genes and found that it is an effective prognostic biomarker and predictor of immunotherapeutic efficacy in HCC. Notably, this 4-gene signature achieved relatively high prognostic value for HCC, whereas some previous signatures required more genes to achieve similar results (Zhao et al. 2021).

Based on differentially expressed genes associated with amino acid metabolism, we divided the 374 liver cancer cases in the TCGA dataset into 4 subtypes (k = 4). Their survival outcomes and clinical characteristics differed significantly (Fig. S2), suggesting that amino acid metabolism is associated with the occurrence and development of liver cancer.

Then, we established a 4-gene signature to assess the amino acid metabolic status of liver cancer patients and assigned the patients to low- and high-risk groups based on the risk score. K–M analysis revealed that low-risk patients had longer survival times than high-risk patients. ROC curve analysis showed that the risk signature can efficiently predict the 1-, 2-, and 3-year survival of HCC patients. Univariate and multivariate Cox analyses verified the independent prognostic significance of the risk signature. Recent studies have reported that metabolic gene signatures can efficiently predict OS of HCC patients (Hu et al. 2020; Wu et al. 2021a). Prognostic markers based on lipid metabolism have been developed and shown to be closely associated with clinical features, immune cells, and various biological roles in HCC (Zhu et al. 2021). Wu et al. developed a six-gene metabolism risk signature for HCC that was largely based on lipid and nucleotide metabolism (Wu et al. 2021b). Liu et al. developed a prognostic marker for glioma based on amino acid metabolism and showed that the risk score was closely associated with various aspects of glioma malignancy (Wu et al. 2021a, b). In a recent study, a comprehensive approach incorporating risk model construction, immune cell infiltration analysis, and gene expression analysis was used to develop a 9-gene signature associated with amino acid metabolism, which was further used to build a prognostic nomogram for predicting OS in HCC (Zhao et al. 2021). In contrast, our risk model, comprising only four amino acid metabolism genes, showed that the risk score was significantly correlated with the response to immunotherapy.
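The risk stratification step can be sketched as follows, assuming (as is typical for Cox-derived signatures, but not stated in this section) that each patient's risk score is the coefficient-weighted sum of the signature genes' expression and that patients are split at the median score. The coefficients and gene names in the test are hypothetical placeholders, not the fitted values of the 4-gene model. The Kaplan-Meier function computes the product-limit estimate that underlies K–M analysis.

```python
from statistics import median


def risk_scores(expr, coefs):
    """Linear risk score per patient: sum of coefficient * expression.

    expr:  list of {gene: expression} dicts, one per patient
    coefs: {gene: Cox coefficient} (hypothetical values in the tests)
    """
    return [sum(c * patient[g] for g, c in coefs.items()) for patient in expr]


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    Patients exactly at the median are labelled 'low'.
    """
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve
```

Running `kaplan_meier` separately on the high- and low-risk groups yields the two curves compared in a K–M plot; the log-rank test used to compare them is not shown.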

Notably, after adjusting for molecular and clinical features, we found that the amino acid metabolism-associated risk signature remained an independent prognostic factor. Next, we constructed a nomogram to predict 1-, 3-, and 5-year OS in the LIHC dataset. ROC, DCA, calibration curve, and concordance index analyses showed that the nomogram had greater clinical value than the risk score derived from the 4-gene signature alone. Thus, amino acid metabolism status refined the prognostic information provided by clinicopathological characteristics and shows great promise for accurately predicting prognosis in liver cancer. By integrating the identified risk signature with other pertinent features, a more comprehensive and precise prognosis prediction for liver cancer can be achieved. This integrated approach may improve the design of prognostic models and, in turn, the clinical management of patients with liver cancer.
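The concordance index used to evaluate the nomogram can be made concrete with a short sketch of Harrell's C. This is an illustrative implementation under simplifying assumptions (pairs with tied follow-up times are skipped, and tied risk predictions count as one half); it is not necessarily the implementation used in this study. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's follow-up time. It is concordant when patient i
    also has the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Because only the ranking of predictions matters, the same function can score either the 4-gene risk score or the nomogram's predicted risk, which is how the two can be compared on one scale.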

Risk score-based biomarkers offer valuable insights for prognostication and for guiding targeted therapy in precision oncology. An extensive analysis of 6125 compounds identified promising candidates from CTRP-derived drug response data, including MLN2238, SB-743921, GSK461364, clofarabine, paclitaxel, and BI-2536. In addition, compounds identified from both the PRISM and CTRP data, such as volasertib, epothilone B, and ispinesib, were significantly associated with high-risk liver cancer patients. These findings highlight the potential of risk score-based biomarkers for identifying targeted therapeutic options for liver cancer patients.
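One common way to nominate candidate compounds from CTRP- and PRISM-derived drug response estimates, assumed here because the exact procedure is not restated in this section, is to correlate each compound's estimated area under the dose-response curve (AUC) with the risk score using Spearman's rank correlation. Because a lower AUC indicates greater sensitivity, a negative correlation suggests that high-risk patients may respond better. The cutoff of -0.30 and the drug names in the test are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def candidate_drugs(risk, auc_by_drug, rho_cutoff=-0.30):
    """Return {drug: rho} for drugs whose estimated AUC falls as risk rises.

    risk:        risk score per patient
    auc_by_drug: {drug: [estimated AUC per patient]}, same patient order
    """
    hits = {}
    for drug, auc in auc_by_drug.items():
        rho = spearman_rho(risk, auc)
        if rho <= rho_cutoff:
            hits[drug] = rho
    return hits
```

Rank-based correlation is used in this sketch because estimated AUC values are often skewed, and only their ordering relative to the risk score matters here.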

GSVA and KEGG pathway enrichment analyses uncovered key features of high-risk patients. Pathways related to cell cycle regulation and biological macromolecule synthesis were enriched, highlighting the importance of amino acid metabolism-associated pathways in toxicity-related metabolism. These results suggest that amino acid metabolism-associated genes can affect amino acid levels and that many high-risk patients may be influenced by toxicity and drug metabolism. Ferroptosis-related genes have been closely associated with the immune microenvironment in HCC (Zhu et al. 2023), and in this study, immune response-associated pathways were also enriched in high-risk patients. We also found increased infiltration of memory B cells, follicular helper T cells, activated memory CD4+ T cells, and naive CD4+ T cells, and reduced infiltration of naive B cells, in high-risk patients. B cell infiltration has been reported to be higher in liver cancer patients than in patients with liver cirrhosis or healthy subjects (Zhang et al. 2020). Elevated plasma cell levels and low levels of immature B cells are associated with poor prognosis (Zhang et al. 2019). A recent study found that reduced levels of CD8+ T cells result in immune dysregulation in HCC patients, which may promote HCC progression. Our results also indicated low infiltration of NK cells in high-risk patients. Elevated tryptophan (Trp) and arginine (Arg) catabolism has been reported to trigger NK cell apoptosis and to enhance tumor immune escape (Grohmann and Bronte 2010).

Further analysis revealed no significant correlation between the risk score and tumor mutational burden (TMB) (correlation coefficient = 0.14), although high TMB is associated with greater sensitivity to immunotherapy. Stratified analysis showed that the prognostic significance of the risk score in LIHC was independent of TMB. The lack of association, together with their separate predictive values and the GSEA results, suggests that TMB and the risk score reflect distinct aspects of tumor immunobiology. Furthermore, the risk score independently predicted immunotherapeutic responses. Analysis of data from patients receiving immunotherapy (datasets GSE91061, GSE135222, and IMvigor210) revealed significantly higher risk scores in patients who responded to immunotherapy, highlighting the predictive value of the risk score. Together, these findings suggest that high-risk patients may benefit from immunotherapy. Previous studies have reported that immunodiagnostic markers are valuable for the early prediction of HCC (Xing et al. 2021). The four amino acid metabolism-related genes identified in this study may also act as immunotherapeutic markers for hepatocellular carcinoma.
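The comparison of risk scores between responders and non-responders can be sketched with a Mann-Whitney U (Wilcoxon rank-sum) test. The test actually used for the GSE91061, GSE135222, and IMvigor210 comparisons is not stated in this section, so this choice is an assumption. The sketch also reports U / (n_a * n_b), which equals the ROC AUC of the risk score for separating the two groups.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for group_a, AUC, p_value). AUC = U / (n_a * n_b) is the
    probability that a random member of group_a scores higher than a random
    member of group_b, with ties counting one half. No tie or continuity
    correction is applied, so p-values are approximate for small or heavily
    tied samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1
            elif a == b:
                u += 0.5
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return u, u / (n_a * n_b), p
```

For example, `mann_whitney_u(responder_scores, nonresponder_scores)` (hypothetical variable names) returns an AUC above 0.5 when responders tend to have higher risk scores.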

This study has several limitations. First, the risk signature based on four amino acid metabolism genes was validated only in an ICGC dataset; therefore, its clinical utility needs to be validated with real-world prospective data. Second, we built the prognostic model on a single hallmark, which may exclude other important factors associated with HCC prognosis. Finally, the relationship between the risk score and immunity warrants further investigation.

In conclusion, we developed a prognostic model based on four genes associated with amino acid metabolism. Our analysis shows that this model is independently associated with overall survival in both the derivation and validation cohorts, providing valuable insights into the prognostic prediction of HCC. Moreover, our study highlights the effectiveness of the 4-gene signature in predicting HCC prognosis and response to immunotherapy.

Conclusion

In summary, we developed a 4-gene amino acid metabolism-associated signature. Univariate and multivariate analyses revealed that the 4-gene signature was an independent prognostic factor in liver cancer. GSVA and KEGG analyses demonstrated a significant association between high-risk tumors and various malignant characteristics of liver cancer. Moreover, the high-risk group exhibited a higher number of mutated genes and elevated levels of immune infiltration. This observation was further validated in three immunotherapy cohorts, in which patients with a low risk score exhibited notable therapeutic and clinical advantages. Finally, a prognostic nomogram was established based on the TCGA cohort. Overall, this study suggests that the 4-gene signature can serve as a reliable prognostic marker and a predictive marker for immunotherapy.