Introduction

As a subtype of non-small cell lung cancer (NSCLC), lung adenocarcinoma (LUAD) primarily arises in the bronchial mucosal epithelium of the lungs [1, 2]. It has high incidence and mortality rates and a relatively young age of onset [3]. Although most LUAD patients show no obvious symptoms in the early stages, as the disease progresses they often experience dry cough, hemoptysis, chest pain, dyspnea, and weight loss [4, 5]. The etiology of LUAD is not yet clear, but extensive research has identified smoking, age, radon exposure, and environmental pollution as risk factors [6]. Fiberoptic bronchoscopy, serum tumor marker testing, chest X-ray, and chest CT are common diagnostic measures for LUAD [7,8,9,10]. Although progress has been made in detection methods and targeted therapies, most cases of LUAD are still diagnosed at an advanced stage, with poor prognosis and a 5-year survival rate of less than 20% [11]. Therefore, identifying additional biomarkers may help improve the prognosis of LUAD.

Mutations or abnormal expression of cancer driver genes (CDGs), including oncogenes and tumor suppressor genes, can drive tumor formation [12, 13]. Studies have shown that mutations in oncogenes can increase the proliferation and survival of tumor cells, while mutations in tumor suppressor genes can lead to uncontrolled cell proliferation [14, 15]. Several known tumor driver genes (such as TP53, EGFR, and HER2) have been confirmed to be linked with the malignant progression of tumors. Mogi et al. [16] reported that the tumor suppressor gene TP53 is frequently mutated in human cancers and that, in NSCLC, this leads to a poorer prognosis and relatively stronger resistance to chemotherapy and radiotherapy. EGFR mutations are the second most common oncogenic driver events in NSCLC; most EGFR-mutated NSCLC patients carry exon 19 deletions or L858R substitutions, which are considered predictive of sensitivity to EGFR-TKI treatment and have significant implications for the treatment and prognosis of EGFR mutation subtypes [17, 18]. Alterations in human epidermal growth factor receptor 2 (HER2, or ERBB2) have been identified as oncogenic drivers and potential therapeutic targets in various cancers, including lung cancer (LC), breast cancer (BC), and metastatic urothelial carcinoma [19,20,21]. Riudavets et al. [22] found in their study on NSCLC that HER2 activation occurs through three mechanisms: gene mutation, gene amplification, and protein overexpression, each with different implications and predictive outcomes. They therefore suggested adopting different treatment approaches for different types of HER2 alterations to improve patients' survival outcomes [22]. Taken together, these findings indicate that targeting CDGs has the potential to alter cancer progression to varying degrees. However, the relationship between CDGs and LUAD has not yet been fully elucidated.

This study explored the biological significance of CDGs in LUAD patients based on differentially expressed CDGs (DE-CDGs), seeking potential biomarkers for LUAD and characterizing their relationship with prognosis, the immune landscape, and candidate drugs. This work may deepen the understanding of the pathogenesis of LUAD from the perspective of CDGs. The flowchart of this study is displayed in Fig. 1.

Fig. 1

The flowchart

Materials and methods

Acquisition of public data

The gene expression profiles and clinical information (age, gender, tumor grade, TNM stage) of the LUAD samples in the training set were obtained from The Cancer Genome Atlas database (TCGA, https://portal.gdc.cancer.gov/), comprising 541 cancer samples and 59 normal samples. The GSE72094 dataset (386 cancer samples) was downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/) database as the validation set. Samples with complete survival information and survival time greater than 30 days were retained in both the training set and validation set for subsequent analysis. A total of 568 CDGs were obtained from the literature [23] (Table S1).

Identification of DE-CDGs in LUAD

In the training set, differential expression analysis was performed using the edgeR package with thresholds of FDR < 0.05 and |log(FC)| > 1. The differentially expressed genes (DEGs) obtained from this analysis were then intersected with the CDGs obtained from the literature to yield the DE-CDGs.
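The filtering and intersection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the result table `de_results`, its values, and the short `cdg_list` are hypothetical stand-ins for the edgeR output and the 568 literature CDGs, and the original analysis was performed in R.

```python
# Hypothetical differential expression results: gene -> (logFC, FDR).
# The values are invented for illustration and are not taken from the study.
de_results = {
    "MET": (1.8, 0.001),
    "EGR2": (-1.5, 0.003),
    "GAPDH": (0.2, 0.600),
    "KLF4": (-1.2, 0.040),
    "TP53": (0.4, 0.010),
}

# Hypothetical stand-in for the literature-derived CDG list.
cdg_list = {"MET", "EGR2", "KLF4", "TP53"}


def select_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with FDR below the cutoff and |logFC| above the cutoff."""
    return {
        gene
        for gene, (lfc, fdr) in results.items()
        if fdr < fdr_cutoff and abs(lfc) > lfc_cutoff
    }


degs = select_degs(de_results)
de_cdgs = sorted(degs & cdg_list)  # DE-CDGs = DEGs intersected with CDGs
print(de_cdgs)
```

In this toy input, TP53 passes the FDR threshold but not the fold-change threshold, so it is excluded from the DE-CDGs even though it is on the CDG list.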

Construction and validation of DE-CDGs related prognostic model

To identify genes associated with LUAD survival among the DE-CDGs, we used univariate Cox regression to screen for genes with P < 0.05 for further analysis. In LASSO regression analysis, we narrowed down the set of survival-related genes and selected the optimal parameter λ through tenfold cross-validation. Based on the optimal cutoff values calculated by the survminer R package, we performed multivariate Cox regression analysis on the selected genes to establish the final prognostic model. The formula is as follows:

$$Riskscore = \sum_{i=1}^{n} Coefficient(gene_{i}) * Expression\,value(gene_{i}).$$

We calculated the sample risk scores for the training set and validation set using the model formula and analyzed the survival outcomes of patients with different risk scores. According to the median risk score, we grouped the patients into high-risk (HR) and low-risk (LR) groups. We compared the survival differences between different risk groups using K-M analysis. By plotting the ROC curve, we evaluated the sensitivity and specificity of the model. Finally, we drew a heatmap to display the differential gene expression between different risk groups.
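As a rough illustration of the grouping and survival comparison described above, the following Python sketch splits patients at the median risk score and computes a Kaplan-Meier estimate per group. The patient tuples are hypothetical, the function names are ours, and the study itself used R packages; the log-rank test and ROC analysis are not shown.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label a patient 'HR' if the score is above the median, otherwise 'LR'."""
    cutoff = median(risk_scores)
    return ["HR" if score > cutoff else "LR" for score in risk_scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # deaths and censored cases leave the risk set
        i += len(same_time)
    return curve


# Hypothetical patients: (risk score, follow-up in days, event flag).
patients = [
    (0.9, 200, 1), (0.2, 900, 0), (0.7, 350, 1),
    (0.1, 1200, 0), (0.8, 500, 0), (0.3, 700, 1),
]
groups = split_by_median([p[0] for p in patients])
for label in ("HR", "LR"):
    times = [p[1] for p, g in zip(patients, groups) if g == label]
    events = [p[2] for p, g in zip(patients, groups) if g == label]
    print(label, kaplan_meier(times, events))
```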

Independent prognostic analysis of prognostic model related to DE-CDGs

To select independent prognostic factors, we carried out univariate/multivariate Cox regression analyses. In addition, a prognostic nomogram was constructed to generate a calibration curve to evaluate the deviation between the nomogram and actual outcomes. Age, TNM stage, and model risk score were considered in the above analyses.

Identification and analysis of DEGs in HR and LR groups

Differential expression analysis between the HR and LR groups was performed using the edgeR package with thresholds of FDR < 0.05 and |log(FC)| > 1. Subsequently, a protein–protein interaction (PPI) network of the DEGs between the HR and LR groups was constructed using the Search Tool for the Retrieval of Interacting Genes (STRING) database. Interactions with confidence scores higher than 0.9 were selected as the basis for constructing the PPI network. KEGG and GO enrichment analyses of the DEGs were performed using the clusterProfiler package, and the results were visualized with the enrichplot R package.
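Over-representation analysis of the kind performed by clusterProfiler rests on a hypergeometric test. The Python sketch below shows that test for a single gene set; the counts in the example call are hypothetical, the function name is ours, and multiple-testing correction across terms is omitted.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of background genes
    K: background genes annotated to the term
    n: number of DEGs tested
    k: DEGs annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 40 of 942 DEGs fall in a term that covers
# 300 of 20000 background genes.
p_value = hypergeom_pvalue(k=40, n=942, K=300, N=20000)
print(f"{p_value:.3e}")
```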

Immune landscape assessment and prediction of immune therapy response in the HR and LR groups

Using the ESTIMATE algorithm, we assessed the tumor microenvironment of each sample in the HR and LR groups by calculating the immune score, stromal score, and ESTIMATE score; the two groups were compared with the Wilcoxon test, and the results were displayed as violin plots. We employed the CIBERSORT method and the single-sample gene set enrichment analysis (ssGSEA) method to compare, respectively, immune cell infiltration and the abundance of immune function-related signatures between the HR and LR groups.
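For readers unfamiliar with the group comparison, the sketch below implements the Wilcoxon rank-sum test in its equivalent Mann-Whitney U form with a normal approximation. It is a simplified illustration: the scores are hypothetical, no tie or continuity correction is applied, and statistical software may therefore report slightly different P values.

```python
from math import erfc, sqrt


def rank_sum_test(a, b):
    """Mann-Whitney U statistic for group a and a two-sided approximate p-value."""
    n1, n2 = len(a), len(b)
    # U counts the pairs in which the value from a exceeds the value from b;
    # tied pairs contribute one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Hypothetical immune scores for the two risk groups.
lr_scores = [1500, 1800, 2100, 1700]
hr_scores = [900, 1200, 1600, 1000]
u_stat, p_val = rank_sum_test(lr_scores, hr_scores)
print(u_stat, round(p_val, 4))
```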

To predict the immune therapy response of the HR and LR groups, the TIDE score was introduced. Furthermore, we performed the Wilcoxon test and plotted the violin plot. The immunophenoscore (IPS) of each patient was obtained from The Cancer Immunome Atlas (TCIA) database (https://tcia.at/home). We further investigated the differences in IPS between the HR and LR groups.

Tumor mutation analysis of the HR and LR groups

The maftools package was used to analyze and plot the mutation profiles of the HR and LR groups based on LUAD SNV mutation data. By comparing mutation types, SNV classes, and mutation rates between the HR and LR groups, we selected the ten genes with the highest mutation rates in each risk group. Furthermore, we drew waterfall plots displaying the mutation status of the model genes in the two groups.

Screening of potential drugs

We used the CellMiner database (https://discover.nci.nih.gov/cellminer/home.do) to identify drugs related to the model genes. The results were visualized with the ggplot2 R package.
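The gene-drug association can be illustrated with a correlation coefficient computed across cell lines. The sketch below assumes a Pearson correlation, which is not stated in the text, and uses invented expression and drug activity values rather than CellMiner data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical values across five cell lines: expression of one model gene
# and the activity score of one drug.
gene_expression = [2.1, 3.4, 1.8, 4.0, 2.9]
drug_activity = [0.3, 0.9, 0.1, 1.2, 0.6]
cor = pearson(gene_expression, drug_activity)
print(round(cor, 3))
```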

Cell culture

The cell lines used in this study included the human LUAD cell lines A549, Calu-3, and NCI-H1975 and the human normal lung epithelial cell line BEAS-2B. The cells were cultured in DMEM (Gibco, USA) supplemented with 10% fetal bovine serum (Gibco, USA) and 1% penicillin–streptomycin (Yeasen, China) in an incubator at 37 °C with 5% CO2.

Real-time fluorescence quantitative PCR (qRT-PCR)

First, total RNA was extracted from the cells using TRIzol reagent (Invitrogen, USA) according to the manufacturer's instructions. cDNA was then synthesized by reverse transcription using the PrimeScript RT kit (Takara, Japan). qRT-PCR analysis was performed using SYBR Green PCR premix (Takara, Japan) on the Applied Biosystems 7500 sequence detection system (Applied Biosystems), and relative mRNA expression was calculated using the 2^(−ΔΔCt) method. GAPDH was used as the internal reference. The primer sequences are shown in Table 1.
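The relative quantification used for qRT-PCR can be written out as a short worked calculation. The Ct values in this Python sketch are hypothetical; only the arithmetic of the 2^(−ΔΔCt) method with GAPDH as the reference gene is illustrated.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


# Hypothetical Ct values: a target gene in a LUAD cell line versus BEAS-2B,
# each normalized to GAPDH.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,    # dCt = 6
    ct_target_control=26.0, ct_ref_control=18.0,  # dCt = 8
)
print(fold_change)  # ddCt = -2, so the fold change is 4.0
```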

Table 1 Primer sequences

Western blot

First, total protein was extracted with RIPA lysis buffer (Beyotime, China) and quantified with a BCA kit (Thermo Fisher Scientific, USA). The protein samples were then separated by 10% SDS-PAGE and transferred to PVDF membranes. After blocking with 5% skim milk for 1 h, the membranes were incubated with primary antibodies at 4 °C overnight. After three washes with TBST, the membranes were incubated with horseradish peroxidase (HRP)-conjugated goat anti-rabbit secondary antibody (ab6721, 1:2000) at room temperature for 1 h. The blots were then visualized with an enhanced chemiluminescence (ECL) solution, and protein expression was quantified with ImageJ. β-actin was used as the loading control. The primary antibodies were as follows: HOXD13 (ab19866, 1:1000, Abcam), FANCD2 (ab108928, 1:1000, Abcam), EGR2 (ab108399, 1:1000, Abcam), KLF4 (ab215036, 1:1000, Abcam).

Statistical analysis

SPSS 23.0 (USA) was used for data analysis. Data are expressed as mean ± standard deviation. Student's t-test was used to compare differences between two groups, and one-way analysis of variance was used to evaluate differences among multiple groups. All experiments were repeated three times. P < 0.05 was considered statistically significant.

Results

Identification of DE-CDGs and construction and validation of the prognostic model

Through differential expression analysis, we obtained a total of 5576 DEGs related to LUAD. Intersecting them with the CDGs yielded 123 genes, namely the 123 DE-CDGs (Fig. 2A). Subsequently, we obtained 40 genes associated with LUAD survival through univariate Cox regression analysis (threshold: P < 0.05) (Table S2). Then, LASSO regression analysis was performed to eliminate genes that may exhibit multicollinearity (Fig. 2B). Finally, multivariate Cox regression analysis was carried out to determine the final prognostic model (Fig. 2C). The formula for the prognostic model is as follows:

Fig. 2

A Venn diagram identified DE-CDGs. B Plot of LASSO regression analysis based on DE-CDGs. C Multivariate Cox regression analysis based on DE-CDGs. * P < 0.05, ** P < 0.01

$$\begin{aligned} Riskscore = & - 0.1842*ZNF93 + 0.1648*COL1A1 \\ & + 0.0934*MET + 0.0295*HOXD13 \\ & - 0.2248*EGR2 + 0.1791*PABPC1 \\ & + 0.0787*FBN2 - 0.1296*CBFA2T3 \\& + 0.2293*FANCD2 - 0.1084*ZNF208 \\ & + 0.1812*KLF4. \\ \end{aligned}$$
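To show how the formula above is applied to one patient, the following Python sketch multiplies each gene's coefficient by its expression value and sums the products. The coefficients are those reported in the formula; the expression values are hypothetical, and the expression scale (normalization, log transformation) expected by the model is an assumption not specified here.

```python
# Coefficients taken from the risk score formula above.
COEFFICIENTS = {
    "ZNF93": -0.1842, "COL1A1": 0.1648, "MET": 0.0934, "HOXD13": 0.0295,
    "EGR2": -0.2248, "PABPC1": 0.1791, "FBN2": 0.0787, "CBFA2T3": -0.1296,
    "FANCD2": 0.2293, "ZNF208": -0.1084, "KLF4": 0.1812,
}


def risk_score(expression):
    """Sum of coefficient * expression value over the 11 model genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values for one patient (illustrative only).
example_patient = {
    "ZNF93": 2.0, "COL1A1": 9.5, "MET": 6.1, "HOXD13": 1.2,
    "EGR2": 3.3, "PABPC1": 8.7, "FBN2": 2.4, "CBFA2T3": 1.9,
    "FANCD2": 4.0, "ZNF208": 0.8, "KLF4": 5.5,
}
print(round(risk_score(example_patient), 4))
```

Genes with positive coefficients (for example FANCD2 and KLF4) raise the score as their expression increases, whereas genes with negative coefficients (for example EGR2 and ZNF93) lower it.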

Based on the formula, we calculated the risk scores for all patients in the training set and validation set and recorded their clinical outcomes (survival/death). Risk scores varied among LUAD patients in both sets, and patients with higher risk scores had higher mortality rates. The survival rate of the HR group (patients with risk scores higher than the median) was considerably lower than that of the LR group (patients with risk scores lower than the median) (Fig. 3A-D). ROC curves revealed that the prognostic model constructed in this study had good predictive performance in both the training set and validation set, with AUC values greater than 0.67 for 1-year, 3-year, and 5-year survival (Fig. 3E-F). Finally, the expression heatmap demonstrated that CBFA2T3, EGR2, ZNF93, and ZNF208 had elevated expression levels in the LR group, while KLF4, MET, COL1A1, FBN2, PABPC1, HOXD13, and FANCD2 had elevated expression levels in the HR group (Fig. 4). The good predictive performance of the model was also verified in the validation set GSE13213 (Supplementary Fig. 1).

Fig. 3

A–B Risk scores and survival outcomes of patients in the training set (A) and the validation set (B). C–D K–M curves of the training set (C) and the validation set (D). E–F ROC curves of the training set (E) and the validation set (F)

Fig. 4

A-B Heatmap of the gene expression levels of the model in the training set A and the validation set B

Selection of independent prognosis factor in LUAD

The results of univariate and multivariate Cox regression analyses demonstrated that the T stage, N stage, and risk score can all serve as independent prognostic factors for LUAD (P < 0.05) (Fig. 5A-B). Based on the age, gender, tumor stage, TNM stage, and risk score, the results of nomogram analysis allowed us to predict the 1-year, 3-year, and 5-year survival rates of LUAD patients (Fig. 5C). The calibration curves demonstrated a high consistency between the predicted 1-year, 3-year, and 5-year survival rates generated by our nomogram model and the actual survival rates of the patients in the dataset (Fig. 5D-F).

Fig. 5

A-B Univariate A and multivariate B Cox regression analyses identified the independent prognosis factor for LUAD. C Nomogram constructed based on age, gender, tumor stage, TNM staging, and risk score. D-F 1-year, 3-year, and 5-year calibration curves of the nomogram

To further validate the predictive power of the model, we analyzed the association between the HR and LR groups and clinical features. The results showed that the risk score was significantly correlated with gender, stage, T stage, and N stage (P < 0.05, Fig. 6A). To verify the predictive power of the prognostic model in patients with different clinical features, we also performed subgroup survival analyses of the HR and LR groups. Significant differences in prognosis were observed in different age groups (≤ 65 years and > 65 years) (Fig. 6B), gender groups (female and male) (Fig. 6C), stages (stage I–II and stage III–IV) (Fig. 6D), T stages (T1 + T2 and T3 + T4) (Fig. 6E), N stages (N0 and N1) (Fig. 6F), and M stages (M0 and M1) (Fig. 6G), which implied that the prognostic model of the present study had good predictive and discriminatory ability in clinical practice.

Fig. 6

A Correlation analysis between HR and LR groups and clinical data of LUAD patients. B K–M analysis for different age groups (≤ 65 and > 65 years). C K–M analysis of gender-specific (male and female) groups. D K–M analysis for different stages (stage I–II and stage III–IV). E K–M analysis of different T stages (T1 + T2 and T3 + T4). F K–M analysis for different N stages (N0 and N1). G K–M analysis of different M stages (M0 and M1)

Identification and analysis of DEGs between HR and LR groups

First, differential expression analysis between the HR and LR groups was performed using the training set data, yielding 481 upregulated and 461 downregulated genes, for a total of 942 DEGs (Table S3). In the PPI network constructed using the STRING database, 938 of the 942 DEGs interacted with each other, forming 387 pairs of interacting relationships (Fig. 7A). In addition, GO enrichment analysis revealed that DEGs between the HR and LR groups may be associated with biological processes including antimicrobial humoral immune response mediated by antimicrobial peptides, epidermis development, and keratinocyte differentiation, as well as cellular components such as ion channel complex, collagen-containing extracellular matrix, and intermediate filament cytoskeleton. The DEGs were also related to molecular functions such as hormone activity, receptor ligand activity, neurotransmitter receptor activity, channel activity, passive transmembrane transporter activity, and solute:sodium symporter activity (Fig. 7B). Furthermore, KEGG enrichment analysis revealed that these DEGs were related to pathways such as neuroactive ligand–receptor interaction, metabolism of xenobiotics by cytochrome P450, estrogen signaling pathway, cAMP signaling pathway, and Wnt signaling pathway (Fig. 7C).

Fig. 7

A PPI network of DEGs between the HR and LR groups. B-C GO B and KEGG C Enrichment of DEGs between the HR and LR groups

Evaluation of the difference in immune levels between the HR and LR groups and prediction of the possibility of immune therapy response

Based on the immune characteristic data in the training set, scoring of the main components of the immune microenvironment showed that the LR group had higher immune and ESTIMATE scores than the HR group (P < 0.05) (Fig. 8A), indicating a potentially higher immune level. As evidenced by CIBERSORT, only activated CD4 memory T cells, M0 macrophages, and neutrophils were highly infiltrated in the HR group, while other immune cells such as memory B cells, follicular helper T cells, and regulatory T cells (Tregs) were highly infiltrated in the LR group (P < 0.05) (Fig. 8B). ssGSEA revealed that the infiltration levels of immune cells such as dendritic cells (aDCs, iDCs, and pDCs), B cells, and mast cells were higher in the LR group. The abundance of immune functions such as APC co-stimulation, CCR, HLA, and T cell co-stimulation was also higher in the LR group, while the HR group had a lower abundance of immune cells and immune functions (P < 0.05) (Fig. 8C-D). The LR group also had higher IPS and lower TIDE scores, indicating that, compared with the HR group, patients in the LR group may respond more sensitively to immunotherapy and have a reduced likelihood of immune escape (P < 0.05) (Fig. 8E-F).

Fig. 8

A Scores of major components in the immune microenvironment of the HR and LR groups. B–D Immune levels of the HR and LR groups shown by the CIBERSORT (B) and ssGSEA (C–D) algorithms. E–F IPS and TIDE scores of the HR and LR groups. * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, ns P > 0.05

Genetic mutation assessment of the HR and LR groups

Based on the training set data, analysis of the mutation landscape in the HR and LR groups revealed that missense mutations were the most frequent mutation type in both groups, with the HR group having a greater number of missense mutations than the LR group (Fig. 9A, C). Furthermore, analysis of the mutation status of the model genes in each sample showed that, among the 242 samples in the LR group, 69 cases of gene mutation were observed, with FBN2, ZNF208, and COL1A1 having the highest mutation frequencies (Fig. 9B). In the HR group, analysis of the 245 samples also revealed 69 cases of gene mutation, with FBN2, ZNF208, and MET being the three genes with the highest mutation frequencies (Fig. 9D). In addition, given the important role of driver genes in the development of LUAD, we further analyzed the correlation between the risk score and common LUAD driver genes. The results showed that the risk score was significantly negatively correlated with the driver genes ROS1 and ALK, and significantly positively correlated with EGFR, MET, and NTRK2 (P < 0.05, Fig. 9E).

Fig. 9

A Mutation landscape of the LR group. B Gene mutation profiles of individual samples in the LR group. C Mutation landscape of the HR group. D Gene mutation profiles of individual samples in the HR group. E Correlation between the risk score and LUAD driver gene

Prediction of candidate drugs and drug sensitivity analysis for LUAD

Fulvestrant, S-63845, sapacitabine, lomustine, BLU-667, SR16157, motesanib, AZD-9496, XK-469, dimethylfasudil, P-529, and imatinib were identified in this study as candidate drugs significantly correlated with the model genes. Among them, CBFA2T3 was significantly positively correlated with fulvestrant (Cor = 0.587), sapacitabine (Cor = 0.568), SR16157 (Cor = 0.526), AZD-9496 (Cor = 0.517), XK-469 (Cor = 0.516), and imatinib (Cor = 0.502) (P < 0.05). MET was significantly negatively correlated with S-63845 (Cor = −0.570) and lomustine (Cor = −0.534) (P < 0.05). COL1A1 was significantly positively correlated with BLU-667 (Cor = 0.533), dimethylfasudil (Cor = 0.508), and P-529 (Cor = 0.504) (P < 0.05). EGR2 was significantly positively correlated with motesanib (Cor = 0.519) (P < 0.05) (Fig. 10). The box plots showed that the MET high-expression group was more sensitive to S-63845 and lomustine, and the COL1A1 low-expression group was more sensitive to BLU-667 and dimethylfasudil (P < 0.05) (Fig. 11).

Fig. 10

Correlation analysis between model genes and drugs predicted by the CellMiner database

Fig. 11

Statistics of drug IC50 values corresponding to groups with model genes high or low expressed

Verification of model gene expression levels

Based on the relevant literature, we selected the model genes HOXD13, FANCD2, EGR2, and KLF4 to verify their expression levels. First, qRT-PCR results showed that, compared with the control group, the mRNA expression levels of HOXD13 and FANCD2 in LUAD cells were significantly higher (P < 0.05), while the expression levels of EGR2 and KLF4 were significantly lower (P < 0.05, Fig. 12A). Furthermore, the trend in protein expression shown by western blot was consistent with the qRT-PCR results (Fig. 12B).

Fig. 12

The expression levels of HOXD13, FANCD2, EGR2, and KLF4 in LUAD cells were detected using A qRT-PCR and B western blot

Discussion

LC is a common malignant tumor of the respiratory system, and CDGs are considered important factors affecting tumor progression and patient survival [24, 25]. Dantoing et al. [26] found that mutations in CDGs can alter the tumor immune microenvironment and may promote resistance to PD-1/PD-L1 inhibitors in NSCLC. Therefore, targeting CDGs may be a promising alternative approach for treating LUAD patients [26]. In addition, Zou et al. [23] elucidated the biological role of CDGs in hepatocellular carcinoma survival and tumor immunity, identifying multiple CDG-related biomarkers that can predict hepatocellular carcinoma prognosis, and found that these CDGs are linked with different immune cell infiltrations in the tumor microenvironment. However, the mechanism of action of CDGs in LUAD remains largely unexplored. Exploring the effects of CDGs on LUAD prognosis, immunity, and treatment is therefore of great biological significance. Here, we investigated the relationship between CDGs and LUAD to reveal their association with the malignancy and prognosis of LUAD.

CBFA2T3, EGR2, ZNF93, ZNF208, KLF4, MET, COL1A1, FBN2, PABPC1, HOXD13, and FANCD2 were identified as 11 CDGs associated with LUAD prognosis in this study. Among them, CBFA2T3, EGR2, ZNF93, and ZNF208 showed elevated expression levels in the LR group, while KLF4, MET, COL1A1, FBN2, PABPC1, HOXD13, and FANCD2 had elevated expression levels in the HR group. Previous studies have found that CBFA2T3 can facilitate the occurrence and development of tumors in different cancers. In LC, CBFA2T3 is a protective gene, with its high expression promoting the survival of LC patients [27]. In BC, research has found that CBFA2T3 acts as a transcriptional repressor when connected to the binding domain of GAL4 DNA, thus identifying it as a potential candidate gene for BC tumor suppression [28]. As a member of the zinc finger transcription factor family, EGR2 is considered an important regulatory factor for systemic autoimmunity [29]. Recent studies by Liu et al. [30] have unearthed that inhibiting the expression of EGR2 can promote the proliferation of NSCLC cells. ZNF93 and ZNF208 are also members of the zinc finger protein family [31, 32]. Currently, research in ovarian cancer (OC) has pointed out that ZNF93 can facilitate the migration and proliferation of OC cells. Moreover, high expression of ZNF93 is tightly linked to the clinical staging of patients, indicating poor prognosis [33]. In the exploration of LC risk factors, a strong association has been found between ZNF93 and susceptibility to LC [31]. Although the correlation of ZNF208 with LC has not been discovered yet, multiple studies have already found its close association with disease progression in various tumors such as laryngeal cancer, esophageal cancer, and pancreatic cancer [32, 34, 35]. KLF4 is a transcription factor that can regulate cell proliferation, differentiation, and self-renewal of stem cells. 
It mainly participates in suppressing the differentiation and proliferation of cancer cells in lung tumors and is thus considered an important tumor suppressor [36]. Previous studies have shown that MET is overexpressed in many human cancers, including LC [37]. In preclinical and clinical studies of NSCLC, MET activation has been identified as a major oncogenic driver in a subset of LC, mediating malignant progression by influencing cancer cell invasion, survival, and growth [38]. Moreover, MET is also considered a secondary driver of acquired resistance to targeted therapy [38]. COL1A1 has been identified as a key biomarker and potential drug target in LC, with its expression correlating with overall survival (OS) and progression-free survival of LC patients [39]. Wang et al. [40] found experimentally that elevated expression of COL1A1 can drive LUAD cells to grow, migrate, and invade. FBN2, a novel pathogenic gene, was found by Hong et al. [41] to be highly expressed in LC cells and to facilitate their proliferation, invasion, and migration. A growing number of studies suggest that PABPC1 is aberrantly expressed in various cancers, such as LC [42], gastric cancer [43], and esophageal squamous cell carcinoma [44]. Li et al. [45] found in their study on LC that enhancing the ubiquitination of PABPC1 represses the proliferation of cancer cells. HOXD13 is a member of the HOX family and regulates organ development. Xu et al. [46] found that low expression of HOXD13 promotes the progression of prostate tumors. At present, there has been relatively little investigation of the correlation between HOXD13 and the progression of LC, but Han et al. [47] have suggested that HOXD13 may be a promising prognostic biomarker for the diagnosis and treatment of LUAD. 
FANCD2 is considered a gene associated with autophagy-dependent ferroptosis, with higher expression levels in LUAD tissues than in normal tissue specimens, and has been reported to be associated with advanced TNM stage, lower chemotherapy sensitivity, and lower immune scores in LUAD [48]. Taken together, the 11 genes identified in this study are linked to the occurrence and progression of tumors, especially LC.

Next, through immune analysis, we identified differences in the immune landscape between the risk groups. The results revealed that the LR group had higher immune scores than the HR group, with high infiltration of many immune cells such as memory B cells and follicular helper T cells. Studies have shown that both B cells and follicular helper T cells are important components of the immune system, and their high infiltration in the tumor microenvironment has been shown to influence tumor progression [49, 50]. According to Tu et al. [51], high infiltration of B cells in LUAD is linked with longer OS. In addition, Cui et al. [52] found that antigen-driven B cells and CD4+ follicular helper T cells collaborate to effectively facilitate anti-tumor CD8+ T cell responses. Therefore, we speculated that the highly infiltrated B cells and follicular helper T cells in the LR group may be associated with the better prognosis of this group. Furthermore, the prediction of immunotherapy response and immune escape revealed that the LR group had higher IPS and lower TIDE scores, suggesting that LR patients may be more sensitive to immunotherapy and less likely to experience immune escape, and thus more likely to benefit from immunotherapy than the HR group. However, due to the lack of corresponding experimental verification, in-depth research is still required on the actual benefit of immunotherapy in the HR and LR groups.

Our study also identified candidate drugs that are significantly correlated with the model genes, such as fulvestrant, S-63845, and sapacitabine. In addition, based on the comparison of IC50 values, we speculate that the MET high-expression group is more sensitive to S-63845 and lomustine, while the COL1A1 low-expression group is more sensitive to BLU-667 and dimethylfasudil. Despite the encouraging results of targeted therapy research in LC in recent years, most patients eventually develop resistance to targeted drugs, mainly due to changes in oncogenic drivers, the most common of which include epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), and TP53 mutations [53, 54]. This study aimed to cover as many driver genes as possible and to explore the prognosis, immune landscape, and candidate drugs of LUAD by constructing a prognostic model, thereby shedding light on its potential pathogenesis. We found that the risk score was correlated with the common LUAD driver genes to varying degrees, which may provide a reference and research direction for addressing the drug resistance and off-target problems of targeted therapy in driver gene-positive LUAD patients. For example, the role of the model gene KLF4 is controversial. As reported by Liu et al. [55], KLF4 is significantly underexpressed in cisplatin-resistant LC cell lines, and overexpression of KLF4 inhibits the viability, EMT, migration, and invasion of drug-resistant cells and promotes apoptosis. However, Zheng et al. [56] revealed that KLF4 overexpression caused by circUBAP2 dysregulation promotes NSCLC proliferation and chemotherapy resistance. In gefitinib-resistant NSCLC cells and tissues mediated by c-Met amplification, KLF4 is overexpressed and favors tumor progression [57]. 
The same study also showed that KLF4 can enhance gefitinib resistance by inhibiting β-catenin expression and interfering with β-catenin inhibition of c-Met phosphorylation, thereby activating the c-Met/Akt signaling pathway. The carcinogenic or anticancer effect of KLF4 in NSCLC may be related to its subcellular localization [58]. For the recognized resistance driver gene MET, the TATTON clinical trial demonstrated that osimertinib (EGFR-TKI) combined with volitinib (MET inhibitor) is a promising therapy for patients with advanced EGFR-mutated (EGFRm) NSCLC with MET amplification/overexpression whose disease has progressed following previous EGFR-TKI treatment. Therefore, we speculate that the combined action of the model genes KLF4 and MET may be a potential mechanism leading to chemotherapy or targeted drug resistance in LUAD. The model gene COL1A1 has been reported to help solid tumors adapt to the hypoxic conditions of the tumor microenvironment, thereby promoting tumor aggressiveness and drug resistance [59]. In LUAD, COL1A1 usually contributes to drug resistance as a downstream regulated gene, as in the regulatory axes miR-150/NOTCH3/COL1A1 [60], LINC00313/miR-218-5p/COL1A1 [61], and miR-29b-3p/COL1A1 [62]. Therefore, the model genes and drugs screened in this study may provide not only new ideas for exploring the resistance mechanisms of targeted therapy in LUAD patients, but also a research direction for future drug development.

Certain limitations still exist in this study. First, the risk signature was constructed based on the TCGA-LUAD dataset, and clinical trials are needed to further validate the prognostic value of the risk score model and the nomogram. Second, although the expression levels of the model genes have been verified experimentally, their prognostic mechanisms in LUAD remain to be further explored. Third, the role of the prognostic model in drug resistance or off-target effects in LUAD needs further exploration. In summary, our investigation identified the biological value of CDGs in LUAD prognosis, immune response, and treatment, offering new prospects for future LUAD research.