Introduction

Lung cancer causes the largest number of cancer-related deaths worldwide, and more than 85 % of cases are non-small-cell lung cancer (NSCLC) [1, 2]. The predicted 5-year survival rate of NSCLC patients is 15.9 %, and little improvement has been achieved during the past few decades [3]. Because of this poor clinical outcome, substantial research focuses on uncovering the molecular mechanisms of NSCLC to provide insights into potential therapeutic targets.

Microarray analysis, which can measure gene expression on a genome-wide scale simultaneously, is widely used in cancer genetics research [4]. Microarray technology helps to better understand the mechanisms of various diseases [5]. Previous studies have used this technique to find the differentially expressed genes (DEGs) between NSCLC and normal tissues. However, the results of these studies are inconsistent, probably because of differences in sample sources, microarray platforms, and analysis techniques. To overcome these problems, a meta-analysis method was developed to detect DEGs by integrating multiple microarray studies [6]. This method has been applied to various types of tumors, including hepatocellular carcinoma [7], nasopharyngeal carcinoma [8], colorectal cancer [9], and osteosarcoma [10], to detect key genes, i.e., oncogenes or tumor suppressor genes involved in the development of cancers.

In this study, we employed a meta-analysis method to identify DEGs between NSCLC and normal control (NC) tissues, and then performed functional annotation of these genes to discover the biological processes and signaling pathways associated with NSCLC. Finally, we used qRT-PCR to validate the meta-analysis approach.

Materials and Methods

Strategy for Identification of NSCLC Gene Expression Datasets

We searched the PubMed database and the gene expression omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo) database to identify microarray-based expression profiling studies of NSCLC. The following keywords and their combinations were used: lung adenocarcinoma and homo sapiens. Original studies that compared gene expression profiles between NSCLC and NC biopsy tissues or cultured cells were included in this study. Non-human studies, reviews, and meta-analysis articles were excluded.

After background correction and normalization of the raw data, we used significance analysis of microarray (SAM) to identify the DEGs by t test. A false discovery rate (FDR) <0.01 was selected as the criterion for significant differences.
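The FDR cutoff described above can be illustrated with a minimal sketch. SAM estimates the FDR by permutation; the example below substitutes the simpler Benjamini-Hochberg adjustment as a stand-in, so it is not the procedure used in this study. The function names and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: this is a stand-in for the permutation-based FDR that SAM
    computes; it only illustrates how an FDR threshold is applied.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, fdr=0.01):
    """Keep genes whose adjusted P value is below the FDR cutoff."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: q for g, q in zip(genes, adjusted) if q < fdr}


# Hypothetical per-gene P values from a t test (not data from this study).
example = {"GENE_A": 0.00001, "GENE_B": 0.5, "GENE_C": 0.002}
significant = select_degs(example, fdr=0.01)
```

With a cutoff of 0.01, only genes whose adjusted P value falls below the threshold are retained as DEGs.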

Functional Classification of DEGs

We performed gene ontology (GO) enrichment analysis of the DEGs to investigate their biological functions in NSCLC using the online software GENECODIS (http://genecodis.cnb.csic.es) [11]. We also performed the pathway enrichment analysis by utilizing the Kyoto encyclopedia of genes and genomes (KEGG) database.

PPI Network Construction

Protein–protein interactions (PPIs) play a central role in the regulation of biological processes and reveal the function of proteins at the molecular level. Constructing a PPI network on a genome-wide scale is important for interpreting protein function. The biological general repository for interaction datasets (BioGRID) (http://thebiogrid.org/) was used to construct the PPI network, and the top 10 up- and down-regulated DEGs were visualized in the network in Cytoscape [12].
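Once interaction pairs have been exported from a PPI database, the most connected proteins can be found by counting each node's degree. The sketch below is a minimal illustration under that assumption; it does not call BioGRID or Cytoscape, and the partner names `P1` to `P3` are hypothetical placeholders.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners for every protein.

    `edges` is an iterable of (protein_a, protein_b) pairs. Self-interactions
    and duplicate pairs (in either orientation) are ignored.
    """
    degree = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        key = frozenset((a, b))
        if key in seen:
            continue  # skip duplicates such as (b, a) after (a, b)
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return degree


def top_hubs(edges, n=3):
    """Return the n proteins with the most interaction partners."""
    return node_degrees(edges).most_common(n)


# Hypothetical edge list; P1..P3 are placeholder partner proteins.
edges = [
    ("CAV1", "P1"),
    ("CAV1", "P2"),
    ("P2", "CAV1"),    # duplicate of the previous pair
    ("ADRB2", "P3"),
    ("CAV1", "CAV1"),  # self-interaction, ignored
]
degrees = node_degrees(edges)
```

Degree is only one possible measure of how central a protein is; it is used here because it matches the idea of a protein being connected with many partners.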

The Collection of Clinical Specimens

The clinical specimens were provided by the First Affiliated Hospital of PLA General Hospital, with the permission of the patients. Before the study, the protocols and documents were approved by the Medical Ethics Committee of the hospital. Written informed consent forms were obtained from the patients or their legal guardians. The samples were used in strict accordance with the National Regulation of Clinical Sampling in China. The tumor tissues were immediately frozen and stored in liquid nitrogen until RNA extraction.

RNA Preparation and qRT-PCR

Total RNA was extracted from each sample using the RNeasy Mini Kit (Qiagen, Valencia, CA) according to the manufacturer’s protocol. Two micrograms of total RNA was reverse transcribed into single-stranded cDNA using SuperScript II reverse transcriptase (Invitrogen/Life Technologies, Carlsbad, CA). We used PrimerPlex 2.61 (PREMIER Biosoft, Palo Alto, CA) to design primers (see Supplementary Table 1 for primers used). cDNA was amplified using Power SYBR Green PCR Master Mix (Applied Biosystems/Life Technologies, Carlsbad, CA) according to the manufacturer’s instructions. Quantitative PCR was performed with the ABI 7500 real-time PCR system (Applied Biosystems, Carlsbad, CA). The results were analyzed by the comparative Ct (ΔΔCt) method using DataAssist software version 3.0 (Applied Biosystems/Life Technologies). The data were normalized to ACTIN gene expression.
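Assuming the Ct analysis refers to the standard comparative Ct (2^-ΔΔCt) calculation, the relative expression of a target gene in a tumor sample versus its paired normal sample can be sketched as follows. The function name and the Ct values are hypothetical, and the sketch assumes roughly 100 % amplification efficiency for both the target and the reference gene.

```python
def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Relative expression (tumor vs. normal) by the comparative Ct method.

    Each target Ct is first normalized to the reference gene (here ACTIN)
    measured in the same sample, then tumor is compared with normal.
    """
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values for one tumor/normal pair (not data from this study).
up = fold_change(22.0, 18.0, 25.0, 18.0)    # target crosses threshold earlier in tumor
down = fold_change(25.0, 18.0, 22.0, 18.0)  # target crosses threshold later in tumor
```

A value above 1 indicates higher expression in the tumor sample. For a down-regulated gene the result is below 1, and its reciprocal gives the fold decrease.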

Results

Differential Gene Expression Analysis by Meta-analysis

We identified 15 expression profiling studies eligible for the meta-analysis, including a total of 637 cases of NSCLC and 298 cases of NC. The details of the selected studies are summarized in Table 1. We found 1063 DEGs with FDR <0.01, among which 464 genes were up-regulated and 599 genes were down-regulated in NSCLC tissues. The top 20 most significant DEGs are listed in Table 2. The full list of the DEGs can be found in Supplementary Table 2.

Table 1 Characteristics of the individual studies
Table 2 The top ten most significantly up- or down-regulated DEGs

Functional Annotation

We conducted a GO category enrichment analysis to investigate the biological roles of the identified DEGs. We separately examined the three groups of GO categories, namely biological process, cellular component, and molecular function, using the web-based software GENECODIS. Categories with P < 0.01 were selected, tested against the background set of all genes with GO annotations. The most enriched GO term for biological process was regulation of cell proliferation (GO:0042127), for cellular component was plasma membrane part (GO:0044459), and for molecular function was growth factor binding (GO:0019838). The full list of GO terms is given in Table 3.

Table 3 The enriched GO categories of DEGs

We also performed KEGG pathway enrichment analysis to further explore the biological significance of the DEGs. A hypergeometric test with P value <0.05 was used as the criterion for pathway detection. The KEGG pathway analysis showed that 37 DEGs belonged to the focal adhesion signaling pathway, which was significantly enriched, suggesting that these genes may be related to NSCLC metastasis (Fig. 1).
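A hypergeometric enrichment test asks how likely it is to see at least the observed number of DEGs in a pathway if the DEGs were drawn at random from the background gene set. The sketch below is a minimal illustration of that calculation. The function name is hypothetical, and the background size (20000) and pathway size (200) in the usage lines are assumed values for illustration only; only the 1063 DEGs and the 37 pathway members come from the text.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, deg_count, overlap):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    total_genes   : size of the background gene set
    pathway_genes : number of background genes annotated to the pathway
    deg_count     : number of DEGs (genes drawn without replacement)
    overlap       : number of DEGs that fall in the pathway
    """
    upper = min(pathway_genes, deg_count)
    denominator = comb(total_genes, deg_count)
    # Sum the probabilities of observing `overlap` or more pathway genes.
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, deg_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denominator


# 1063 DEGs and 37 pathway members are from the text; 20000 and 200 are assumed.
p_value = hypergeom_enrichment_p(20000, 200, 1063, 37)
enriched = p_value < 0.05
```

Exact integer arithmetic with `math.comb` avoids rounding problems in the binomial coefficients, at the cost of speed for very large gene sets.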

Fig. 1 The enriched KEGG pathway of DEGs (focal adhesion pathway). Red icons indicate DEGs identified in this study

PPI Network Construction

Based on PPI data in BioGRID, PPI networks of the top 20 most significant DEGs were constructed with Cytoscape software. The PPI network consisted of nodes and edges, which represent proteins and interactions, respectively. There were 410 edges and 404 nodes in the PPI network, among which ADRB2, CAV1, and COL1A1 were connected with the most proteins (Fig. 2).

Fig. 2 PPI networks of the top ten most significantly up- or down-regulated DEGs. Nodes and edges represent proteins and interactions between proteins. Genes up-regulated in NSCLC are marked in red, and genes down-regulated in NSCLC are marked in deep blue

qRT-PCR Validation

Five pairs of NSCLC and NC tissues were used to validate the results of the meta-analysis. We selected the top ten up- or down-regulated genes for validation. MMP12, COL11A1, THBS2, ADAM12, and FAP were selected as the up-regulated genes in NSCLC, while FABP4, CDH5, CAV1, TCF21, and ADRB2 were selected as the down-regulated genes in NSCLC.

The qRT-PCR results showed that the expression patterns of the selected genes in NSCLC and NC tissues were similar to those in the meta-analysis. The expression profiles of the up- and down-regulated genes in each sample are shown in Fig. 3. The average fold changes of the up-regulated genes MMP12, COL11A1, THBS2, ADAM12, and FAP were 6.6-, 17.76-, 4.46-, 4.69-, and 2.77-fold, respectively. MMP12 and COL11A1 were dramatically up-regulated in three of five samples, while the other three genes were mildly up-regulated in most of the NSCLC samples. The average fold changes of the down-regulated genes FABP4, CDH5, CAV1, TCF21, and ADRB2 were 24.9-, 3.5-, 18.28-, 8.3-, and 3.4-fold, respectively. FABP4 and CAV1 were dramatically down-regulated in four of five patient samples.

Fig. 3 qRT-PCR validation of the top ten most significantly up- or down-regulated DEGs in five pairs of NSCLC and NC tissues. ACTIN was used as an internal reference gene for normalization. The graph shows the relative expression levels of ADAM12, ADRB2, CAV1, CDH5, COL11A1, FABP4, FAP, MMP12, TCF21, and THBS2 in each pair of NSCLC and NC tissues. Bars represent mean ± SEM. Z denotes normal control tissues; C denotes non-small-cell lung cancer tissues

Discussion

Lung cancer is still the leading cause of cancer-related mortality all over the world [13]. Although molecular techniques and lung cancer biology have advanced greatly, many of the genetic alterations related to lung carcinogenesis remain unknown. Microarray analysis can detect expression changes in a large number of genes simultaneously within tumors, which may help discover new signaling pathways or molecular mechanisms associated with tumorigenesis. In this study, we combined 15 microarray data sets to detect DEGs. We also performed GO term, KEGG, and PPI analyses of the DEGs and detected some important molecules and signaling pathways, which may extend our understanding of the pathology of NSCLC and further guide the development of new therapeutic targets.

In this study, we found 1063 DEGs between NSCLC and NC tissues. GO analysis showed that 266 DEGs were enriched in cell proliferation regulation and 96 DEGs were enriched in DNA binding and growth factor binding. The results indicate that cell proliferation is dysregulated in NSCLC tissues. The fundamental abnormality of cancer cells is continual unregulated cell proliferation [14], and numerous studies have demonstrated the important role of EGFR in NSCLC, which supports the plausibility of the GO analysis in this study.

KEGG pathway analysis revealed that the DEGs between NSCLC and normal tissues are enriched in the focal adhesion signaling pathway. Focal adhesions are the contact sites between the cytoskeleton and the extracellular matrix, formed through transmembrane proteins called integrins [15]. Cells receive signals from the extracellular microenvironment through focal adhesions to maintain proper cell survival, proliferation, differentiation, and motility through integrin-related signaling pathways [16–18]. Loss of the tight regulation of focal adhesions can lead to cancer progression and metastasis [19–21]. PPI network analysis of the top 20 most significant DEGs indicated that the significant hub proteins included CAV1 and COL1A1, which are important components of focal adhesions [21]. Our data indicate that focal adhesion components and related signaling pathways may play important roles in the pathology of NSCLC and may shed light on the discovery of new therapeutic targets for NSCLC.

In order to validate our meta-analysis data, we performed qRT-PCR to detect the expression of the top ten significant DEGs in NSCLC and NC tissues. We found that the expression patterns of the selected ten DEGs, including MMP12, COL11A1, THBS2, ADAM12, FAP, FABP4, CDH5, CAV1, TCF21, and ADRB2, were consistent with our meta-analysis and previous reports.

The qRT-PCR results show that the mRNA levels of MMP12, COL11A1, THBS2, ADAM12, and FAP are higher in NSCLC tissues than in NC tissues. MMP12 is a 22 kDa metal-dependent proteinase which can degrade elastin, type IV collagen, fibronectin, laminin, gelatin, vitronectin, entactin, heparin, and chondroitin [22]. Many of the MMP12 substrates, such as collagen, laminin, and fibronectin, are important extracellular matrix molecules which can regulate cell shape, migration, and survival through focal adhesion [23]. Previous studies also reveal that MMP12 is correlated with early cancer-related deaths in NSCLC, especially for patients who were exposed to cigarette smoke [24].

Both COL11A1 and THBS2 participate in focal adhesion signaling pathways. COL11A1 is linked to ovarian cancer recurrence and poor survival. The invasion ability and oncogenic potential of ovarian cancer cells are suppressed by COL11A1 knockdown [25]. COL11A1 and THBS2 are overexpressed in lung cancer and can be regarded as markers of lung cancer [26–28]. FAP is a serine protease selectively and highly expressed on the surface of cancer-associated fibroblasts, and it is important in the progression and prognosis of diverse malignancies [29]. The expression level of FAP is closely associated with tumor recurrence and poor clinical outcome in rectal and pancreatic cancer [30, 31]. However, the functions of FAP in NSCLC are poorly understood. One study reports that FAP is highly expressed in lung cancer stroma, and that its high expression is a predictor of poor survival of NSCLC patients [32].

The expression of CAV1 and ADRB2 is lower in NSCLC tissues than in NC tissues, and they are also significant hub proteins of the PPI network in our study. Furthermore, CAV1 is involved in the focal adhesion signaling pathway. CAV1 is a major structural component of caveolae, which are specialized plasma membrane invaginations [33]. The function of CAV1 in tumorigenesis depends on tumor type and tumor stage. The expression of CAV1 is down-regulated in tumor cells and tissues isolated from breast, cervix, lung, and ovary [34–38], indicating that CAV1 may act as a tumor suppressor. Currently, the in vivo function of ADRB2 in NSCLC is largely unknown.

In summary, we used a meta-analysis approach to integrate 15 microarray data sets of NSCLC and identified DEGs and their biological functions. We used qRT-PCR to validate the meta-analysis approach by detecting the expression of the top ten most significant DEGs. Our study suggests that some DEGs, including MMP12, COL11A1, THBS2, FAP, and CAV1, might participate in the pathology of NSCLC and might be potential therapeutic targets.