Introduction

Lung cancer, also known as bronchopulmonary carcinoma, is one of the most common malignancies and the leading cause of cancer-related death worldwide [1]. It comprises two main histological types: non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC). NSCLC, which includes adenocarcinoma, squamous cell carcinoma, and large cell carcinoma, accounts for approximately 85 % of all lung cancers [2]. Although diagnostic and treatment methods have advanced considerably, the prognosis of NSCLC remains unfavorable, with an overall 5-year survival rate of less than 15 % [2, 3]. Therefore, to provide better prognostic prediction and individualized treatment, further work to identify prognostic markers and potential drug targets is urgently needed.

As in many other carcinomas, the occurrence and development of NSCLC are closely related to abnormal cell cycle regulation [4, 5]. Whether and when a cell proliferates, enters reversible quiescence, differentiates, or dies is controlled by the cell cycle clock apparatus [6]. Deregulation of the cell cycle process is a necessary step in malignant transformation [7].

The E2F activators (E2F1–3), which belong to the E2F family of transcription factors (Table 1) [8–12], play an important role in controlling the cell cycle, proliferation, differentiation, and apoptosis [13–17]. They are thought to determine the timing of the G1/S transition [18, 19]. An experiment in mice demonstrated that higher expression of E2F activators leads to higher expression of E2F target genes and spontaneous cancer formation [17]. Deregulated expression of E2F activators has been observed in several human malignancies, including bladder, breast, ovarian, and prostate cancers; gastrointestinal carcinomas; and lung cancer [20–26]. Although high-level expression of E2F activators and its relationship with clinicopathological features and prognosis have been partly reported in human NSCLC [24–26], to the best of our knowledge, bioinformatics analysis has not previously been used to explore the role of E2F activators in NSCLC.

Table 1 Classification and characterization of E2F family

Material and methods

Expression evaluation and analysis

In order to evaluate and analyze E2F activator expression, we used The Cancer Genome Atlas (TCGA) datasets. TCGA is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). The tumor and normal tissues from more than 11,000 patients have been profiled, covering 37 types of genetic and clinical data for 33 types of cancer [27]. Comprehensive profiling data have been published on cancers of the breast, ovary, skin, head/neck, lung, and other organs and will soon be available for many other cancer types. With rigorous control by the NCI and individual institutes, the data are of high quality. This makes TCGA a useful source of information for gene expression alteration [28], tumor molecular subtype classification [29, 30], and other applications.

Three datasets, named TCGA_LUNG_exp_HiSeqV2-2015-02-24, TCGA_LUAD_exp_HiSeqV2-2015-02-24, and TCGA_LUSC_exp_HiSeqV2-2015-02-24, were downloaded from the UCSC cancer browser website (https://genome-cancer.ucsc.edu/). These datasets contain cancer-related information on 1013 NSCLC tissue samples, including 108 paired NSCLC tissue samples: 57 pairs of lung adenocarcinoma (LUAD) tissues and 51 pairs of lung squamous cell carcinoma (LUSC) tissues. E2F activator expression values for the tissue samples were obtained from the file “genomicMatrix.” The “clinical_data” files in the datasets were then used to analyze the association between E2F activator expression and clinical characteristics.
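The pairing of tumor and normal tissue samples described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes that sample identifiers follow the standard TCGA barcode layout, in which the first three hyphen-separated fields identify the patient and the fourth field begins with a two-digit sample-type code (01 to 09 for tumor, 10 to 19 for normal). The function name `pair_samples` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def pair_samples(sample_ids):
    """Group TCGA sample barcodes by patient and return tumor/normal pairs.

    Assumes barcodes such as 'TCGA-XX-XXXX-01' (tumor) and
    'TCGA-XX-XXXX-11' (normal); this layout is an assumption here.
    """
    by_patient = defaultdict(dict)
    for sid in sample_ids:
        parts = sid.split("-")
        if len(parts) < 4:
            continue  # not a sample-level barcode
        patient = "-".join(parts[:3])
        code = int(parts[3][:2])  # two-digit sample-type code
        if 1 <= code <= 9:
            # keep the first tumor sample seen for this patient
            by_patient[patient].setdefault("tumor", sid)
        elif 10 <= code <= 19:
            # keep the first normal sample seen for this patient
            by_patient[patient].setdefault("normal", sid)
    # keep only patients with both a tumor and a normal sample
    return {
        patient: (s["tumor"], s["normal"])
        for patient, s in by_patient.items()
        if "tumor" in s and "normal" in s
    }


# Hypothetical identifiers, for illustration only
example_ids = ["TCGA-05-4244-01", "TCGA-05-4244-11", "TCGA-05-4249-01"]
pairs = pair_samples(example_ids)
```

Only patients with both sample types are retained, so a patient with tumor tissue alone does not contribute to the paired comparison.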

Prognosis analysis

An online database, the Kaplan-Meier plotter (KM plotter) [31], was used to assess the correlation between E2F activator expression and overall survival (OS). At present, the database contains breast cancer [32], gastric cancer, ovarian cancer [33], and lung cancer [31] data. The gene expression data and overall survival information of NSCLC patients in the database were downloaded from the Cancer Biomedical Informatics Grid (caBIG, http://cabig.cancer.gov/; microarray samples are published in the caArray project), the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), and TCGA (http://cancergenome.nih.gov) [31]. The database was established using gene expression data and survival information of 1926 NSCLC patients downloaded from GEO, EGA, and TCGA. Briefly, the three E2F activators (E2F1, E2F2, and E2F3) were entered into the database (http://kmplot.com/analysis/index.php?p=service&cancer=lung) to obtain Kaplan-Meier survival plots. The hazard ratio (with 95 % confidence interval) and log-rank P value were calculated and displayed on the main plots.
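For readers unfamiliar with the survival curves produced by the KM plotter, the product-limit (Kaplan-Meier) estimate can be sketched as below. This is a minimal illustration under stated assumptions and does not reproduce the KM plotter implementation: the function name `kaplan_meier` and the example follow-up data are hypothetical, and the hazard ratio and log-rank test are not computed here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # product-limit step: multiply by the fraction surviving t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # deaths and censored patients both leave the risk set
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

In this toy example the estimated survival falls to about 0.8, 0.6, and 0.3 at months 5, 8, and 12. Censored patients reduce the number at risk without lowering the curve.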

Statistical analysis

The three TCGA datasets and the online database mentioned above were used to extract data, analyze correlations, and evaluate prognosis. Student’s t test and the χ² test were performed using SPSS software version 22.0. P < 0.05 was considered statistically significant. Graphs were produced with GraphPad Prism 6.02 software.

Results

Analysis in TCGA datasets validates high-level expression of E2F activators in lung cancer, LUAD, and LUSC tissues

Focusing on the 108 paired NSCLC tissues (57 LUAD tissues and 51 LUSC tissues) from the three TCGA datasets, we first found that E2F1 was on average 1.19-fold overexpressed in lung cancer tissues (1.14-fold change in LUAD and 1.25-fold change in LUSC, all P values <0.0001) (Fig. 1a). We then examined the expression levels of E2F2 and E2F3 in lung cancer tissues. E2F2 was on average 1.56-fold overexpressed in lung cancer tissues (1.48-fold change in LUAD and 1.64-fold change in LUSC, all P values <0.0001) (Fig. 1b), and E2F3 was on average 1.17-fold overexpressed in lung cancer tissues (1.19-fold change in LUAD and 1.16-fold change in LUSC, all P values <0.0001) (Fig. 1c).
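A comparison of this kind on matched tumor and normal samples can be sketched as follows. The text does not state how the fold change was defined or whether the t test was paired, so both choices here are assumptions: the fold change is taken as the ratio of the mean tumor value to the mean normal value, and the test statistic is that of a paired t test. If the expression matrix holds log-transformed values, a ratio of means is not a fold change on the linear scale. The function name `paired_summary` and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_summary(tumor, normal):
    """Return (fold_change, t_statistic, degrees_of_freedom) for matched pairs.

    fold_change is mean(tumor) / mean(normal); this definition is an
    assumption, not taken from the study.
    """
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    fold_change = mean(tumor) / mean(normal)
    # paired t statistic: mean difference over its standard error
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return fold_change, t_stat, len(diffs) - 1


# Hypothetical expression values for four matched pairs
fc, t_stat, df = paired_summary([10.0, 12.0, 11.0, 13.0], [8.0, 9.0, 10.0, 9.0])
```

The P value would then be read from a t distribution with the returned degrees of freedom, for example in a statistics package. The sketch raises an error if every pair has the same difference, because the standard error is then zero.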

Fig. 1

E2F activators are highly expressed in NSCLC, LUAD, and LUSC tissues. a E2F1 is on average 1.19-fold overexpressed in lung cancer tissues, 1.14-fold changed in LUAD, and 1.25-fold changed in LUSC (all P values <0.0001). b E2F2 is on average 1.56-fold overexpressed in lung cancer tissues, 1.48-fold changed in LUAD, and 1.64-fold changed in LUSC (all P values <0.0001). c E2F3 is on average 1.17-fold overexpressed in lung cancer tissues, 1.19-fold changed in LUAD, and 1.16-fold changed in LUSC (all P values <0.0001)

E2F activator expression shows significant correlation with certain clinical characteristics in NSCLC and subtypes

After further analyzing the clinical_data files in the three TCGA datasets, we extracted 725 patients (337 LUAD patients and 388 LUSC patients) with complete clinical data (age, gender, TNM stage, pathologic stage, and survival information) from the 1013 patients mentioned above. Among the 725 patients, the median age at diagnosis was 66 years (range, 38 to 87 years), and 36.6 % of the patients were female. For NSCLC patients, we chose the median expression value of each E2F activator as the cutoff and divided the patients into two groups: high expression and low expression. The same grouping method was used for LUAD and LUSC patients. We first explored the relationship between E2F activator expression and clinicopathological features in NSCLC patients and found that E2F1 expression was significantly correlated with age (P = 0.049838), sex (P = 0.007762), and tumor stage (P = 0.023432) (Table 2). E2F2 expression was significantly correlated with sex (P = 0.000003) and tumor size (P = 0.008569) (Table 3), whereas E2F3 expression showed no correlation with any of these clinical characteristics (Table 4). In LUAD patients, E2F1 expression was significantly correlated with tumor size (P = 0.047061) and tumor stage (P = 0.043911) (Table 5), and E2F2 expression was significantly correlated with lymph node status (P = 0.016263) and tumor stage (P = 0.007615) (Table 6). However, no significant correlation was observed between E2F3 and any clinical characteristic (Table 7). In LUSC patients, E2F1 expression was significantly correlated with tumor stage (P = 0.004436) (Table 8), whereas E2F2 and E2F3 were not correlated with any of these clinical characteristics (Tables 9 and 10).
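The median-based grouping and the χ² comparison of groups can be sketched as below. This is a minimal illustration rather than the SPSS procedure used here: values equal to the median are assigned to the low group (the text does not say how ties were handled), the χ² statistic is the Pearson form without continuity correction, and the function names and counts are hypothetical.

```python
import math
from statistics import median


def median_split(values):
    """Label each value 'high' if above the median, otherwise 'low'.

    Placing ties in the low group is an assumption.
    """
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows could be high/low expression and columns two levels of a
    clinical feature. Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 and 20 in the high group, 20 and 30 in the low group
groups = median_split([1.0, 2.0, 3.0, 4.0])
chi2, p_value = chi_square_2x2(30, 20, 20, 30)
```

With these illustrative counts the statistic is 4.0 and the P value is just under 0.05. Clinical features with more than two categories need a larger table and more degrees of freedom, which this sketch does not cover.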

Table 2 Correlation between E2F1 expression and clinical characteristics in NSCLC patients
Table 3 Correlation between E2F2 expression and clinical characteristics in NSCLC patients
Table 4 Correlation between E2F3 expression and clinical characteristics in NSCLC patients
Table 5 Correlation between E2F1 expression and clinical characteristics in LUAD patients
Table 6 Correlation between E2F2 expression and clinical characteristics in LUAD patients
Table 7 Correlation between E2F3 expression and clinical characteristics in LUAD patients
Table 8 Correlation between E2F1 expression and clinical characteristics in LUSC patients
Table 9 Correlation between E2F2 expression and clinical characteristics in LUSC patients
Table 10 Correlation between E2F3 expression and clinical characteristics in LUSC patients

Different prognostic value of E2F activators in NSCLC and subtypes

We next examined the prognostic value of E2F activator expression. Kaplan-Meier survival information for all E2F activators can be found at www.kmplot.com.

We first determined the predictive value of E2F1 expression at www.kmplot.com. The Affymetrix probe ID used was 204947_at (E2F1). Survival curves were plotted for all patients (n = 1928) (Fig. 2a), for LUAD patients (n = 866) (Fig. 2b), and for LUSC patients (n = 675) (Fig. 2c). High E2F1 expression was associated with worse OS in all NSCLC patients followed for 200 months, hazard ratio (HR) 1.46 (1.28–1.66), P = 5e−09. High E2F1 expression was also associated with worse OS in LUAD patients, HR 1.74 (1.37–2.21), P = 3.6e−06. However, high E2F1 expression was not associated with OS in LUSC patients, HR 1.15 (0.91–1.46), P = 0.25.

Fig. 2

The predictive value of E2F1 expression at www.kmplot.com. The Affymetrix probe ID used was 204947_at (E2F1). High E2F1 expression is significantly associated with worse OS in all NSCLC patients (n = 1928) (P = 5e−09) (a), as well as in LUAD patients (n = 866) (P = 3.6e−06) (b). High E2F1 expression is not associated with OS in LUSC patients (n = 675) (P = 0.25) (c)

We then determined the predictive value of E2F2 expression at www.kmplot.com. The Affymetrix probe ID used was 228361_at (E2F2). High E2F2 expression was associated with worse OS in all NSCLC patients, HR 1.84 (1.56–2.18), P = 4.7e−13 (Fig. 3a), as well as in LUAD patients, HR 2.23 (1.73–2.87), P = 1.6e−10 (Fig. 3b), but not in LUSC patients, HR 1.01 (0.74–1.38), P = 0.93 (Fig. 3c).

Fig. 3

The predictive value of E2F2 expression at www.kmplot.com. The Affymetrix probe ID used was 228361_at (E2F2). High E2F2 expression is significantly associated with worse OS in all NSCLC patients (n = 1928) (P = 4.7e−13) (a), as well as in LUAD patients (n = 866) (P = 1.6e−10) (b), but not in LUSC patients (n = 675) (P = 0.93) (c)

Figure 4 shows the predictive value of E2F3 expression at www.kmplot.com. The Affymetrix probe ID used was 203693_s_at (E2F3). High E2F3 expression was not associated with OS in all NSCLC patients, HR 0.93 (0.82–1.06), P = 0.27 (Fig. 4a), or in LUSC patients, HR 1.06 (0.84–1.34), P = 0.63 (Fig. 4c). However, high E2F3 expression was associated with better OS in LUAD patients, HR 0.62 (0.49–0.79), P = 8.4e−05 (Fig. 4b).

Fig. 4

The predictive value of E2F3 expression at www.kmplot.com. The Affymetrix probe ID used was 203693_s_at (E2F3). E2F3 expression shows no association with OS in NSCLC patients (n = 1928) (P = 0.27) (a). High E2F3 expression is significantly associated with better OS in LUAD patients (n = 866) (P = 8.4e−05) (b), but not in LUSC patients (n = 675) (P = 0.63) (c)

Discussion

NSCLC is a highly malignant and aggressive tumor type with a poor 5-year survival rate [2, 3, 34]. E2F activator overexpression has been reported in many cancers in recent years, and such overexpression may promote carcinogenesis [22, 23]. Although the role of E2F activators in tumorigenesis and prognosis has been partially studied and confirmed in several cancers [23–25], further bioinformatics analysis has not been reported in NSCLC. In the present study, we mainly explored the relationship between E2F activators and the clinical characteristics of NSCLC, as well as the relationship between E2F activators and OS in NSCLC. We hope that this work will enrich previous research findings and help in designing treatment and estimating the prognosis of NSCLC patients.

Among the E2F activators, E2F1 is the best studied in NSCLC [24, 35, 36]. It was reported that the overexpression of E2F1 contributes to the development of NSCLC, and this role is enhanced by the deregulated pRb-p53-MDM2 circuitry [6]. Moreover, in lung cancer, some miRNAs exert their function by regulating E2F1 [36, 37]. Furthermore, a recent experimental study showed that during the progression of NSCLC, E2F1 overexpression could produce more aggressive tumors with a high proliferation rate and chemoresistance [24]. But Volm et al. demonstrated that E2F1 showed no correlation at all with LUSC patients [38]. In our study, TCGA datasets revealed higher expression of E2F1 in NSCLC, LUAD, and LUSC tissues. We also demonstrated that E2F1 expression was significantly correlated with age, sex, and tumor stage in all NSCLC patients; with tumor size and tumor stage in LUAD patients; and with tumor stage in LUSC patients. Then, using the KM plotter, we determined the prognostic value of E2F1 in NSCLC patients. High E2F1 expression was significantly associated with worse OS in all NSCLC patients followed for 200 months, as well as in LUAD patients. However, high E2F1 expression was not associated with OS in LUSC patients.

The E2F2 gene is located on 1p36 [39]. It was reported that, depending on the cancer type, E2F2 may act as either a tumor suppressor or an activator [40]. Many studies revealed that E2F2 overexpression is related to larger tumor size and advanced clinical stage in ovarian cancer [41, 42] and hepatocellular carcinoma [43]. Chen et al. indicated that E2F2 acted as a tumor activator in NSCLC and was an independent indicator for OS in NSCLC patients [25]. In our report, higher expression of E2F2 in NSCLC, LUAD, and LUSC tissues was demonstrated. In addition, E2F2 expression was significantly correlated with sex and tumor size in all NSCLC patients, and with lymph node status and tumor stage in LUAD patients. However, E2F2 expression showed no correlation with the clinical characteristics in LUSC patients. Furthermore, high E2F2 expression was significantly associated with worse OS in all NSCLC patients, as well as in LUAD patients, but not in LUSC patients.

E2F3 encodes two different proteins, E2F3a and E2F3b [44, 45]. E2F3a, like E2F1 and E2F2, is inhibited by pRB in quiescent cells and recruits coactivators to E2F-responsive genes in G1, and its promoter is E2F-responsive. E2F3b, like E2F4 and E2F5, acts as a transcriptional repressor [46]. E2F3 overexpression has been shown to be an oncogenic event during the development of human bladder cancer [47, 48] and prostate cancer [49]. Overexpression of E2F3 was also observed in LUAD and LUSC patients [26]. In this report, we demonstrated higher expression of E2F3 in NSCLC, LUAD, and LUSC tissues. However, no significant correlation was observed between E2F3 and any clinical characteristic in all NSCLC patients, in LUAD patients, or in LUSC patients. We also observed that high E2F3 expression was significantly associated with better OS in LUAD patients, but not in all NSCLC patients or in LUSC patients. We consider that E2F3b may be responsible for the better OS in LUAD patients; however, no information about E2F3b is available in the TCGA datasets or the Kaplan-Meier plotter. More research is needed to better understand the role that E2F3 plays in NSCLC patients.

Our results indicate that higher expression of E2F1 and E2F2 may play an important role in the malignancy of NSCLC, especially in LUAD. E2F1 and E2F2 might be useful markers of poor prognosis and potential therapeutic targets in LUAD patients. High E2F1 and E2F2 expression could also serve as a molecular marker to identify high-risk subgroups among LUAD patients. In LUSC patients, however, no clinical significance was observed.