Introduction

Esophageal carcinoma (EC) is the eighth most common and the sixth most lethal cancer worldwide [1]. Despite advances in multidisciplinary treatment of esophageal squamous cell carcinoma (ESCC), the disease generally has a very poor prognosis, with a 5-year survival rate ranging from 10 to 25 % [1, 2]. Esophageal adenocarcinoma (EAC) and ESCC [1] are the two main clinical subtypes of EC. Approximately 70 % of the worldwide cases of ESCC occur in China [1]. Although the application of targeted therapy is primarily limited to EAC, ESCC remains the dominant histological type of esophageal cancer both in China and worldwide [3]. Thus, there is an urgent need for novel strategies aimed at improving our understanding of ESCC biology and for providing targets for therapy or early detection of the disease.

Multiple important signaling pathways in tumorigenesis have been uncovered via expression profiling of coding genes. More recently, actively transcribed long non-coding RNAs (lncRNAs), which are endogenous cellular RNA transcripts longer than 200 nucleotides that lack protein-coding capacity [4], have been identified by high-throughput platforms and shown to be involved in even more complex genome regulatory networks in cancer. lncRNAs are emerging as crucial regulators of cancer biology; these molecules are generally expressed at lower levels than coding genes but display higher tissue specificity [4–6].

When located at or near the same genomic locus, lncRNAs are involved in the cis regulation of gene expression [7]. lncRNAs can also regulate distal gene expression through a trans-acting mechanism by associating with multiple protein partners, such as chromatin modifiers, transcription factors, and splicing factors, or by serving as decoys, guides, or scaffolds [4, 8]. As lncRNAs appear to be involved in nearly all aspects of gene regulation, analysis of the co-expression of lncRNAs and messenger RNAs (mRNAs) can help predict their roles in the development of various diseases, including cancer, and lay a foundation for uncovering their mechanisms of activity.

Altered lncRNA profiles have been identified in breast cancer [9, 10], lung cancer [11, 12], colorectal cancer [13], renal cell carcinoma [14], and hepatocellular carcinoma [15–17], indicating that aberrant expression of certain lncRNAs contributes to carcinogenesis. Over the past 3 years, studies on lncRNAs have become common in esophageal cancer biology research. For example, Wu et al. found that the long non-coding RNA transcript AFAP1-AS1 is highly expressed in esophageal adenocarcinoma, and functional experiments showed that AFAP1-AS1 promotes invasion and metastasis in esophageal cancer cells [18]. More recently, HOTAIR [19, 20], ANRIL [21], UCA1 [22], PCAT1 [23], and MALAT1 [24] were reported to be upregulated in ESCC and were significantly associated with disease prognosis. Despite the considerable progress in understanding lncRNAs that has accompanied over a decade of research, only a few have been identified. Indeed, most lncRNAs remain largely unstudied, particularly with regard to ESCC.

Therefore, to investigate the potential role of lncRNAs in ESCC, we performed a comprehensive analysis of lncRNA and mRNA profiles in ESCC tissue. In particular, we evaluated the lncRNA and mRNA co-expression network during the genesis of ESCC.

Materials and methods

Data curation and processing

Transcriptomic sequencing data under the accession number GSE32424 [25] were downloaded from the publicly available Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). Normalized reads were downloaded from the Co-LncRNA database (http://www.bio-bigdata.com/Co-LncRNA/) [26]. In brief, raw RNA-seq reads were aligned and mapped using TopHat v2.0.9, and transcriptome assemblies were generated using Cufflinks v2.1.1 with the default parameters. Only expressed genes were considered, and the threshold of the expression value for inclusion in the analysis was set to 0.001 [26]. In this study, human lncRNA and protein-coding gene annotations were downloaded directly from GENCODE v22. All of the categories in the “long non-coding RNA gene annotation” GTF file were considered to be lncRNAs. To obtain genome-wide lncRNA and protein-coding gene expression profiles, normalized expression data were subsequently analyzed for differentially expressed lncRNAs and protein-coding genes using the Bioconductor package limma (version 3.26.1) [27] in R (version 3.2.2) with default parameters. Differentially expressed lncRNAs and mRNAs were identified through fold change filtering.
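
The fold change filtering step can be illustrated with a minimal Python sketch. It is not the limma workflow used in this study: it only computes a log2 ratio of group means and applies the fold change cutoff of 4 reported in the Results, without the moderated statistics that limma provides. The gene names and expression values are hypothetical, and reusing the expression threshold of 0.001 as a floor to avoid division by zero is an assumption made for illustration.

```python
import math

# Inclusion threshold for expression values stated in the text; reused here
# as a floor to avoid log(0) (an assumption of this sketch).
EXPRESSION_FLOOR = 0.001


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(tumor, normal, floor=EXPRESSION_FLOOR):
    """log2 ratio of mean tumor expression to mean normal expression."""
    return math.log2(max(mean(tumor), floor) / max(mean(normal), floor))


def filter_by_fold_change(profiles, min_fold=4.0):
    """Return {gene: log2FC} for genes changed at least min_fold in either direction."""
    cutoff = math.log2(min_fold)
    selected = {}
    for gene, (tumor, normal) in profiles.items():
        lfc = log2_fold_change(tumor, normal)
        if abs(lfc) >= cutoff:
            selected[gene] = lfc
    return selected


# Hypothetical expression values: (tumor samples, normal samples).
profiles = {
    "lnc_up": ([8.0, 10.0, 12.0], [1.0, 1.5, 0.5]),    # 10-fold higher in tumor
    "lnc_down": ([0.5, 0.5, 0.5], [4.0, 4.0, 4.0]),    # 8-fold lower in tumor
    "lnc_flat": ([2.0, 2.0, 2.0], [1.5, 1.5, 1.5]),    # about 1.3-fold, filtered out
}
hits = filter_by_fold_change(profiles)
```

In this toy input, only the first two genes pass the filter; a real analysis would combine the fold change cutoff with the P value from the statistical test.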

Construction of the lncRNA and mRNA co-expression network

Spearman’s correlation test was used to estimate co-expression relationships between lncRNAs and protein-coding genes, and the P value of each correlation coefficient was estimated. A set of co-expressed genes for each lncRNA was then identified by applying a coefficient threshold of 0.95 and a significance threshold of P < 0.001. The filtered co-expressed genes were defined as potential targets of the lncRNAs in this study. The resulting network, drawn with Cytoscape (version 3.2.1), was defined as an lncRNA-mRNA regulatory network. A direct connection between an lncRNA and an mRNA is represented as a solid line.
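
The correlation filter described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in the study: the expression vectors are hypothetical, the threshold is applied to the absolute value of the coefficient (an assumption, made because both positively and negatively correlated pairs are reported), and the P value filter is omitted. In practice, a library routine such as `scipy.stats.spearmanr` returns both the coefficient and its P value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no rank information
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def coexpressed_pairs(lncrnas, mrnas, min_abs_rho=0.95):
    """Return (lncRNA, mRNA, rho) for every pair with |rho| >= min_abs_rho."""
    pairs = []
    for l_name, l_values in lncrnas.items():
        for m_name, m_values in mrnas.items():
            rho = spearman(l_values, m_values)
            if abs(rho) >= min_abs_rho:
                pairs.append((l_name, m_name, rho))
    return pairs


# Hypothetical expression profiles across six samples.
lncrnas = {"lncA": [1, 2, 3, 4, 5, 6]}
mrnas = {
    "gene_pos": [2, 4, 5, 9, 11, 20],   # rises with lncA
    "gene_neg": [9, 7, 6, 4, 2, 1],     # falls as lncA rises
    "gene_noise": [3, 1, 4, 1, 5, 9],   # weaker association, filtered out
}
pairs = coexpressed_pairs(lncrnas, mrnas)
```

Because the test operates on ranks, any strictly monotonic relationship reaches a coefficient of 1 or -1, regardless of whether the relationship is linear.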

Bioinformatic analysis

Gene Ontology (GO) analysis is a functional analysis associating differentially expressed mRNAs with GO categories. The predicted target genes were uploaded into the Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/), which utilizes GO to identify the molecular function(s) represented in the gene profile. Furthermore, we also used the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http://www.genome.ad.jp/kegg/) to analyze the potential functions of these target genes in pathways. The lower the P value, the more significant the correlation, and we used the recommended P value cutoff of 0.05.

Patient samples

We retrospectively collected paired tumor and adjacent normal tissues from 50 patients with ESCC and examined the expression of selected lncRNAs to validate the RNA sequencing data. All patients had surgically proven primary ESCC and underwent surgery at the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College between June 2008 and June 2009. The clinical and pathological information for the patients is listed in Table 1. Samples were obtained with informed consent, and the study was approved by the medical ethics committee of the National Cancer Center/Cancer Hospital.

Table 1 Clinical and pathologic characteristics of the ESCC patients in this study

Quantitative RT-PCR

PCR analysis was performed on 50 pairs of ESCC and matched adjacent normal tissues. As an independent validation, the three top-ranked upregulated lncRNAs (RP11-334E6.12, DNM3OS, and RP11-150O12.6) and the three top-ranked downregulated lncRNAs (AC103563.9, RP11-7K24.3, and RP11-351J23.1) with only one transcript were chosen as candidates. Total cellular RNA was isolated from ESCC tissues using the Oligotex mRNA mini kit (QIAgen) and then reverse transcribed using TransScript II One-Step gDNA removal and cDNA Synthesis SuperMix (Transgen) in accordance with the manufacturer’s instructions. The expression of selected lncRNAs was assayed by SYBR Green-based qRT-PCR using a 7900HT fast real time PCR system (Applied Biosystems/Life Technologies). Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) mRNA was used as an internal control; the primers used are listed in Table 2.

Table 2 Primers used for qRT-PCR analysis of lncRNA expression

Statistical analyses

The expression levels of lncRNAs and mRNAs that were differentially expressed between ESCC and normal tissues were compared using the Bioconductor package limma (version 3.26.1) and R (version 3.2.2) software. Co-expression relationships between the lncRNAs and the protein-coding genes were estimated by Spearman’s correlation test. The false discovery rate (FDR) was also calculated to correct the P value for multiple testing, and unless otherwise stated, statistical significance was considered at P < 0.05.
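
The text does not name the FDR procedure that was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values never
    # increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from five tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

In this toy input, four raw P values fall below 0.05, but only two remain below 0.05 after adjustment, which mirrors how an enrichment can lose significance after FDR correction.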

Results

Differentially expressed lncRNAs and mRNAs in ESCC tissues

Fragments per kilobase of exon per million mapped fragments (FPKMs) were calculated for normalization of the expression level of each lncRNA and mRNA. We identified 127 lncRNAs that were differentially expressed (fold change ≥ 4, P < 0.01) between ESCC and normal tissues (Fig. 1a, b, Table S1). Among them, 98 lncRNAs were upregulated, and 29 lncRNAs were downregulated (Fig. 1a, b, Tables 3 and S1). RP11-334E6.12, VCAN-AS1, DNM3OS, AC093850.2, and RP11-150O12.6 were the five most significantly upregulated lncRNAs in ESCC; CYP4F35P, HCG22, LINC00675, C5orf66-AS1, and AC103563.9 were the five most significantly downregulated lncRNAs (Table 3).
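
For reference, the FPKM definition can be written as a short Python function. This sketch shows only the arithmetic of the definition; Cufflinks estimates fragment abundances with its own statistical model, so the function is not a substitute for that tool. The counts used in the example are hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments.

    fragments: fragments assigned to the transcript
    exon_length_bp: summed exon length of the transcript in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical transcript: 500 fragments, 2000 bp of exon, 20 million mapped fragments.
value = fpkm(500, 2_000, 20_000_000)
```

Dividing by transcript length and by sequencing depth is what makes values comparable across transcripts and across samples.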

Fig. 1

Differentially expressed lncRNAs and mRNAs in ESCC. a Heatmap of expression profiles for the 127 lncRNAs that showed significant expression changes (29 downregulated and 98 upregulated). The red to green color gradient indicates a high to low level of expression. b Volcano plot of P values as a function of the weighted fold change for lncRNAs in five normal and seven tumor tissues. Black dots represent lncRNAs that are not significantly differentially expressed (fold change < 4, P > 0.01), and red dots represent lncRNAs that are significantly differentially expressed (fold change ≥ 4, P < 0.01). c Heatmap of expression profiles for the 3077 mRNAs that showed significant expression changes (219 downregulated and 2858 upregulated). The red to green color gradient indicates a high to low level of expression. d Volcano plot of P values as a function of the weighted fold change for mRNAs in five normal and seven tumor tissues. Black dots represent mRNAs that were not significantly differentially expressed (fold change < 4, P > 0.001), and red dots represent mRNAs that were significantly differentially expressed (fold change ≥ 4, P < 0.001).

Table 3 The five most significantly downregulated and upregulated lncRNAs

mRNA expression profiles in ESCC tissues were also compared with those in non-cancerous tissues. A total of 3077 mRNAs were found to be differentially expressed (fold change ≥ 4, P < 0.001) between ESCC and non-cancerous tissues: 219 were downregulated, and 2858 were upregulated (Fig. 1c, d, Table S2).

Next, we investigated whether the 127 non-coding and 3077 coding RNAs could distinguish ESCC from normal tissues. The heatmaps showed that the seven ESCC samples clustered together in one group, clearly separated from the normal tissue samples (Fig. 1a, c). The normal and cancer states were thus separated by differences in the expression profiles of both lncRNAs and mRNAs (Fig. 1). These observations suggest that a potential dynamic interaction between lncRNAs and coding RNAs may be reshaping the landscape of the entire transcriptome during ESCC development.

Significantly co-expressed mRNAs in ESCC tissues

Genome-wide gene expression profiling of both lncRNAs and coding genes from ESCC and normal tissues was conducted to detect possible associations of lncRNAs with ESCC. We predicted the potential target mRNAs of 127 differentially expressed lncRNAs using Spearman’s correlation test, revealing 1720 mRNAs (Coef > 0.95, P < 0.001) targeted by 119 lncRNAs (8 had no targets). Among them, 165 mRNAs were negatively correlated with lncRNAs, and 1555 mRNAs were positively correlated (Table 3, Table S3).

Construction of the co-expression network

We constructed a co-expression network of the dysregulated lncRNAs and their target mRNAs; differentially expressed lncRNAs and their significantly correlated mRNAs were used to draw the network with Cytoscape (version 3.2.1). The co-expression network was composed of 1469 network nodes and 1720 connection edges between 119 lncRNAs and 1350 coding genes (Fig. 2). Within this co-expression network, 1555 pairs were positively correlated and 165 pairs were negatively correlated (Table S3). Interestingly, through shared mRNAs, approximately two thirds (76 of 119) of the lncRNAs and their correlated mRNAs were integrated into one complex network. This co-expression network indicates that one lncRNA could target up to 122 coding genes and that one coding gene could correlate with up to 5 lncRNAs (Fig. 2).
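
The network statistics reported above (node and edge counts, the sign of each correlation, the maximum number of partners per node, and the number of lncRNAs joined into one connected network) can be derived from an edge list. The following Python sketch shows one way to do so; it does not use Cytoscape, the edge list is hypothetical, and it assumes that each lncRNA-mRNA pair appears once and that at least one edge is present.

```python
from collections import defaultdict


def summarize_network(edges):
    """Summarize a bipartite network given (lncRNA, mRNA, rho) edges."""
    neighbors = defaultdict(set)
    positive = negative = 0
    for lnc, mrna, rho in edges:
        # Tag node names by type so an lncRNA and an mRNA never collide.
        lnc_node, mrna_node = ("lncRNA", lnc), ("mRNA", mrna)
        neighbors[lnc_node].add(mrna_node)
        neighbors[mrna_node].add(lnc_node)
        if rho > 0:
            positive += 1
        else:
            negative += 1

    lnc_nodes = [n for n in neighbors if n[0] == "lncRNA"]
    mrna_nodes = [n for n in neighbors if n[0] == "mRNA"]

    # Depth-first search to find the largest connected component.
    seen, largest = set(), set()
    for start in list(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > len(largest):
            largest = component

    return {
        "nodes": len(neighbors),
        "edges": positive + negative,
        "positive": positive,
        "negative": negative,
        "max_targets_per_lncRNA": max(len(neighbors[n]) for n in lnc_nodes),
        "max_lncRNAs_per_mRNA": max(len(neighbors[n]) for n in mrna_nodes),
        "lncRNAs_in_largest_component": sum(1 for n in largest if n[0] == "lncRNA"),
    }


# Hypothetical edges: (lncRNA, mRNA, Spearman coefficient).
edges = [
    ("lncA", "m1", 0.97),
    ("lncA", "m2", 0.96),
    ("lncB", "m2", -0.98),
    ("lncC", "m3", 0.99),
]
summary = summarize_network(edges)
```

Here lncA and lncB fall into the same component because they share the target m2, which is the same mechanism by which shared mRNAs join lncRNAs into one larger network.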

Fig. 2

Predicted lncRNA and mRNA co-expression network in ESCC. The co-expression network was established between 119 differentially expressed lncRNAs and 1350 co-expressed mRNAs that had a Spearman correlation coefficient equal to or greater than 0.95. Within this co-expression network, 1555 pairs were positively correlated, and 165 pairs were negatively correlated. The diamonds represent lncRNAs, and the circles represent mRNAs. The red to green color gradient indicates a high to low level of expression.

GO and KEGG pathway analyses

A GO enrichment analysis was conducted to explore the function of the co-expressed mRNAs identified in this study. Genes were organized into hierarchical categories to uncover gene regulatory networks on the basis of biological process, cellular component, and molecular function. Specifically, a two-sided Fisher’s exact test was used to determine whether the overlap between each GO category and the gene list was greater than expected by chance (using the recommended P value cutoff of <0.05). Through GO analysis, we found that these dysregulated lncRNA transcripts are associated with developmental process and multicellular organismal development (ontology: biological process), proteinaceous extracellular matrix and extracellular matrix (ontology: cellular component), and protein binding and binding activity (ontology: molecular function). Among the genes corresponding to the identified mRNAs, 1040 are involved in biological processes, 1164 in cellular components, and 1098 in molecular functions (Fig. 3, Table S4).
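
To make the enrichment test concrete, the sketch below implements a two-sided Fisher’s exact test on a 2 × 2 table using only the Python standard library. It is an illustration of the test named in the text, not the DAVID implementation, and the gene counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    For GO enrichment, one possible layout is:
      a = target genes in the category     b = target genes outside it
      c = background genes in the category d = background genes outside it
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x category genes among the targets.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one; the small
    # tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts: 10 of 100 target genes and 5 of 900 remaining
# background genes are annotated to the category.
p_enriched = fisher_exact_two_sided(10, 90, 5, 895)

# Small balanced table used as a sanity check.
p_balanced = fisher_exact_two_sided(3, 1, 1, 3)
```

Because one such test is run per GO category, the resulting P values are usually adjusted for multiple testing before categories are reported as enriched.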

Fig. 3

GO and KEGG analyses of significantly correlated mRNA targets of lncRNAs. The ontology covers three domains: a biological process, b cellular component, and c molecular function

To further characterize the target mRNAs among the 1350 identified genes, significant pathways of co-expressed mRNAs were analyzed using the KEGG database. Without FDR correction, this analysis revealed 12 significantly enriched pathways among the transcripts (Table S5). Among these pathways, extracellular matrix (ECM)-receptor interaction (hsa04512) and chondroitin sulfate biosynthesis (hsa00532) were the only significantly enriched pathways remaining after FDR correction. Some of the identified pathways, such as the classical “PI3K-Akt” and “TGF-beta” signaling pathways, have been reported to be involved in the induction of neoplasms in ESCC, but the enrichments were not significant after FDR correction (Table S5).

qRT-PCR validation of lncRNA expression

Based on the fold change, significance, and number of transcripts, three upregulated lncRNAs (RP11-334E6.12, DNM3OS, and RP11-150O12.6) and three downregulated lncRNAs (AC103563.9, RP11-7K24.3, and RP11-351J23.1) with only one transcript were chosen as candidates for further validation. We verified the expression of these lncRNAs by qRT-PCR using GAPDH as the reference gene with the 2^(−ΔΔCT) method. Log2-transformed fold changes and dot plots of expression in tumor tissues vs. adjacent normal tissues are shown in Fig. 4. The results of qRT-PCR were consistent with the RNA sequencing data for five of the six candidates (Fig. 4). RP11-334E6.12 and RP11-150O12.6 were significantly upregulated in ESCC tissues (P < 0.05, Fig. 4a, c), though DNM3OS was not (P > 0.05, Fig. 4b), and AC103563.9, RP11-7K24.3, and RP11-351J23.1 were significantly downregulated in ESCC tissues compared to adjacent normal tissues (P < 0.05, Fig. 4d–f).
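
The 2^(−ΔΔCT) calculation for one tumor/normal pair can be sketched in Python as follows. The CT values are hypothetical, and the formula assumes that the amount of product doubles with every cycle (amplification efficiency close to 100 %) for both the target lncRNA and GAPDH.

```python
import math


def relative_expression(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Fold change of a target in tumor vs. normal tissue by the 2^(-ddCT) method."""
    delta_tumor = ct_target_tumor - ct_ref_tumor      # dCT in tumor tissue
    delta_normal = ct_target_normal - ct_ref_normal   # dCT in matched normal tissue
    delta_delta = delta_tumor - delta_normal          # ddCT
    return 2 ** (-delta_delta)


# Hypothetical CT values for one patient; GAPDH serves as the reference gene.
fold = relative_expression(
    ct_target_tumor=24.0, ct_ref_tumor=18.0,
    ct_target_normal=27.0, ct_ref_normal=18.0,
)
log2_fold = math.log2(fold)
```

A lower CT value means that the transcript was detected earlier and is therefore more abundant, so a negative ΔΔCT corresponds to upregulation in the tumor sample.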

Fig. 4

qRT-PCR validation of expression of selected lncRNAs. Both log2-transformed fold changes and dot plots of lncRNA expression in 50 pairs of tumor tissues vs. normal tissues are presented for each selected lncRNA. a, c RP11-334E6.12 and RP11-150O12.6 were significantly upregulated in ESCC tissues compared to normal tissues (P < 0.05). b Expression of DNM3OS showed no significant difference between ESCC tissues and normal tissues. d–f AC103563.9, RP11-7K24.3, and RP11-351J23.1 were significantly downregulated in ESCC tissues compared to normal tissues (P < 0.05).

Discussion

During the past two decades of molecular biological studies of human cancer, a number of coding genes have been determined to be genetically or epigenetically responsible for ESCC development. However, the pathogenesis of the disease remains poorly understood, and many of the alterations in gene expression and regulation involved in ESCC remain to be clarified. Moreover, the majority of lncRNAs described to date are thought to be functional, though few lncRNAs have been experimentally confirmed to be biologically relevant. For example, lncRNAs have been demonstrated to be involved in basal transcription machinery, RNA splicing and translation, and epigenetic regulation in cells [9, 14]. Overall, the cellular functions of lncRNAs remain largely unstudied. Thus, we conducted the current study to better understand the role of lncRNAs and co-expressed mRNAs in the development of ESCC. Recently, increasing evidence has confirmed that lncRNAs are important regulatory factors of gene expression, acting in either a cis (neighboring genes) or trans (distant genes) manner that is not easily predicted based on the lncRNA sequence [28, 29]. Therefore, predicting potential cancer-related lncRNAs by integrating various types of biological data represents an important topic in such research and is attracting much attention.

Previous studies have shown that the long non-coding RNA CCAT1 promotes gall bladder cancer development via negative modulation of miRNA-218-5p [30], regulates long-range chromatin interactions at the MYC locus [31], and promotes hepatocellular carcinoma progression by functioning as a let-7 sponge [32]. CCAT1 has been reported to be a biomarker that is significantly associated with prognosis in colorectal cancer [31], hepatocellular carcinoma [33], and gastric cancer [34], as well as with smoking in ESCC [35]. In the present study, significant CCAT1 upregulation was also observed in ESCC tissue relative to normal tissue and was correlated with U2AF2, HNRNPK, and SLC4A1AP, which, according to GO functional annotation analysis, are associated with RNA binding and splicing. Additionally, lncRNA DNM3OS (dynamin 3 opposite strand), which is located within an intron of the Dnm3 gene, has been identified as possibly regulated by the transcription factor Twist-1 during mouse embryonic development [36]. In humans, DNM3OS encodes a miR-199a and miR-214 cluster, supporting the role of these miRNAs as novel intermediates in the pathways that control the development of hepatic stellate cells [37] and specific neural cell populations [38]. DNM3OS is also associated with Sézary syndrome [39]. In the present study, DNM3OS was upregulated in ESCC tissue compared with normal tissue and may play an important role in the development of ESCC. Moreover, the lncRNA PCAT-1 was also upregulated in ESCC compared with normal tissue, and its upregulation has been reported to be correlated with an advanced clinical stage and a poor prognosis in ESCC [23]. Thus, aberrant expression of the above lncRNAs has been linked to ESCC development.

To date, a few studies of differentially expressed lncRNAs in ESCC tissues have been reported. However, these studies were based on microarray technology, which tends to yield false positives and/or false negatives [40, 41]. Our study is the first to show by RNA sequencing a total of 127 differentially expressed lncRNAs, with a fold change of at least four, in ESCC tissues. A total of 119 differentially expressed lncRNAs and 1350 potential mRNA targets were then integrated into the lncRNA and mRNA co-expression network, and bioinformatic analysis revealed that these dysregulated lncRNAs are associated with cellular processes (ontology: biological process), cell (ontology: cellular component), and binding (ontology: molecular function). These lncRNAs are also associated with 12 gene pathways corresponding to transcripts involved in the cell cycle, ECM-receptor interaction, and focal adhesion, which were also enriched in another microRNA array study of ESCC [18] that was highly consistent with our study. ECM-receptor interaction leads to direct or indirect control of cellular activities, such as adhesion, migration, differentiation, proliferation, and apoptosis [42]. Our qRT-PCR results showed that the levels of RP11-334E6.12, RP11-150O12.6, AC103563.9, RP11-7K24.3, and RP11-351J23.1 expression were highly consistent with the RNA sequencing data. In contrast to the RNA sequencing data, DNM3OS was not significantly upregulated in ESCC tissues based on qRT-PCR, which may be due to either a false positive result or the relatively small cohort in our validation. Overall, our results suggest that lncRNAs have a probable role in ESCC development and progression.

ESCC is a common malignant neoplasm worldwide, with an especially high incidence in China. The etiology, pathophysiology, and underlying molecular mechanisms of ESCC are largely unknown, and additional functional studies of candidate lncRNAs are needed to fully understand the roles of these molecules in ESCC and to effectively control this disease. This proof-of-principle with regard to the potential link between lncRNAs and ESCC presents a novel area for further investigation into the target genes of such lncRNAs, which may lead to the development of new therapeutic strategies for this disease.