Introduction

Cervical cancer is the third most common cancer and the fourth leading cause of cancer death among women worldwide. More than 500,000 new cases and around 275,000 deaths occur each year [1]. Cervical squamous cell carcinoma (CSCC) accounts for 75∼80 % of cases [2]. Human papillomavirus type 16 (HPV16) and HPV18 are the types most frequently detected in cervical cancer worldwide, and the carcinogenicity of human papillomavirus (HPV) depends on the activities of the viral oncoproteins E5, E6, and E7 [3]. These oncoproteins inhibit various cellular targets, including the tumor-suppressor proteins p53, pRb, p21, and p27, and disrupt critical cellular processes, including the cell cycle and apoptosis, which leads to malignant transformation of cervical basal cells [4]. However, in most cases, HPV infections remain asymptomatic and are cleared by the immune system within 6∼12 months [5], and only a few cases progress to cervical cancer [6]. It has been demonstrated that HPV infection alone is insufficient to cause cervical cancer and that abnormalities in host genes are critical in its development [7]. Therefore, in addition to HPV DNA testing, the discovery of diagnostic and therapeutic molecular markers for CSCC should be prioritized.

Microarrays have been a powerful tool for measuring the expression of thousands of genes at one time and for discovering new genes related to the progression of cervical cancer [8]. However, microarrays have several limitations; for instance, hybridization background reduces the accuracy of expression measurements, especially for low-abundance transcripts. Furthermore, probes differ in their hybridization properties, and an array can only measure transcripts for which it carries relevant probes [9]. Sequencing-based approaches overcome these limitations. High-throughput messenger RNA (mRNA) sequencing (RNA-seq) has been used to screen new genes, transcripts, and differentially expressed genes with more accuracy than traditional methods [10]. RNA-seq has been used to profile the transcriptomes of colorectal cancer, prostate cancer, and breast cancer [11–13]. However, transcriptome profiling of cervical squamous cell carcinoma has not been performed.

In this study, we thoroughly annotated the transcriptomes of CSCC and matched adjacent nontumor (ATN) tissues from three patients by RNA-seq. First, we screened differentially expressed genes (DEGs) between CSCC and ATN tissues. Second, we identified potential pathways and gene function clusters using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. A protein interaction network was constructed to reveal the interactions of the DEGs. Quantitative real-time PCR (RT-qPCR) was used to confirm the gene expression differences between CSCC and ATN tissues. Immunohistochemistry was used to examine the relationship between the DEGs and the clinicopathological parameters of CSCC.

Patients and methods

Patients and tissue specimens

Forty-seven paired fresh-frozen tissue samples (CSCC and ATN tissues) were collected from CSCC patients who underwent radical hysterectomy between January 2012 and August 2013 (Peking Union Medical College Hospital, China). The diagnosis of all cases was histologically confirmed by two independent pathologists, and all tumor tissues were assessed by HE staining; only those with more than 90 % tumor cells were used (Fig. 1). Three paired samples were randomly selected for RNA-seq. Informed consent was obtained from each patient. The procedures were approved by the Ethics Committee of Human Experimentation in this country or are in accordance with the Helsinki Declaration of 1964.

Fig. 1

HE staining of CSCC and ATN tissues. HE-stained sections of CSCC and ATN tissues at ×10 and ×20 magnification

cDNA library preparation and sequencing

Total RNA was extracted from CSCC and ATN tissues with TRIzol according to the manufacturer’s protocol (Invitrogen). The Illumina standard kit was used according to the TruSeq RNA SamplePrep Guide (Illumina). Magnetic beads carrying oligo(dT) were used to isolate poly(A) mRNA from total RNA. The purified mRNA was then fragmented. Using these short fragments as templates, random hexamer primers and reverse transcriptase (SuperScript II, Invitrogen) were used to synthesize first-strand complementary DNA (cDNA). Second-strand cDNA was synthesized using buffer, dNTPs, RNase H, and DNA polymerase I. Short double-stranded cDNA fragments were purified with the QIAquick PCR extraction kit (Qiagen) and eluted with EB buffer for end repair and the addition of an “A” base. The short fragments were ligated to Illumina sequencing adaptors. DNA fragments of the selected size were gel-purified with the QIAquick PCR extraction kit (Qiagen) and amplified by PCR. The library was then sequenced on an Illumina HiSeq™ 2000 sequencing machine. The library size was 400 bp, the read length was 116 nt, and the sequencing strategy was paired-end sequencing.

Raw read filtering and mapping

The raw RNA-seq reads were filtered according to the following criteria: (1) reads containing sequencing adaptors were removed, (2) nucleotides with a quality score lower than 20 were removed, (3) reads shorter than 50 were discarded, and (4) artificial reads were removed. The clean reads were used for subsequent analysis and were mapped to the reference genome by TopHat [14].
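
The first three filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the adaptor argument, and the choice to trim low-quality bases from the 3' end only are assumptions made for illustration, the length threshold is assumed to be in nucleotides, and criterion (4) is omitted because "artificial reads" are not defined in the text.

```python
from typing import List, Optional, Tuple

MIN_QUALITY = 20  # quality threshold stated in the text
MIN_LENGTH = 50   # length threshold stated in the text (assumed to be nucleotides)


def trim_and_filter(
    seq: str, quals: List[int], adaptor: str
) -> Optional[Tuple[str, List[int]]]:
    """Return the cleaned read, or None if the read should be discarded.

    Hypothetical helper: `adaptor` is whatever adaptor sequence the
    library preparation used; it is not given in the text.
    """
    # Criterion 1: discard reads that contain the adaptor sequence.
    if adaptor in seq:
        return None
    # Criterion 2 (one possible reading): trim low-quality bases from the 3' end.
    end = len(seq)
    while end > 0 and quals[end - 1] < MIN_QUALITY:
        end -= 1
    seq, quals = seq[:end], quals[:end]
    # Criterion 3: discard reads that are too short after trimming.
    if len(seq) < MIN_LENGTH:
        return None
    return seq, quals
```

A read of 60 bases whose last five qualities fall below 20 would be kept at 55 bases, whereas one that drops to 40 bases after trimming would be discarded.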

Differentially expressed gene testing

Gene expression was measured as the number of uniquely mapped fragments per kilobase of exon per million mapped fragments (FPKM). We identified differentially expressed genes (DEGs) between CSCC and ATN tissues using Cuffdiff and TopHat. The selection criteria were a false discovery rate (FDR) ≤ 0.05 and a fold change ≥ 4 [15].
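
As a rough illustration of the FPKM definition and the selection thresholds, the following Python sketch may help. It does not reproduce Cuffdiff's statistical model; the function names, the pseudocount used to avoid division by zero, and the symmetric treatment of up- and downregulation are assumptions.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    # (fragments / (length / 1e3)) / (total / 1e6) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def is_deg(
    fpkm_tumor: float,
    fpkm_control: float,
    fdr: float,
    min_fold: float = 4.0,      # fold-change threshold from the text
    max_fdr: float = 0.05,      # FDR threshold from the text
    pseudocount: float = 1e-3,  # assumption: guards against zero FPKM values
) -> bool:
    """Apply the two thresholds; a gene may change in either direction."""
    ratio = (fpkm_tumor + pseudocount) / (fpkm_control + pseudocount)
    fold = max(ratio, 1.0 / ratio)
    return fdr <= max_fdr and fold >= min_fold
```

For example, 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments correspond to an FPKM of 12.5.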

Functional annotation and pathway analysis

We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) bioinformatics resource to annotate gene functions and pathways [16]. An FDR cutoff of 5 % was used. Only the results from GO FAT and KEGG pathways were used. A Fisher’s exact P value <0.05 was considered significant.
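
The idea behind the enrichment test can be sketched as a one-sided Fisher's exact test on the overlap between a gene list and an annotation category. This is a plain hypergeometric tail written with the Python standard library, not DAVID's own implementation, whose modified score may give different values; the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(
    hits_in_list: int,
    list_size: int,
    hits_in_background: int,
    background_size: int,
) -> float:
    """Probability of drawing at least `hits_in_list` annotated genes
    when `list_size` genes are sampled from the background without
    replacement (one-sided Fisher's exact test)."""
    max_hits = min(list_size, hits_in_background)
    total = comb(background_size, list_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total
```

A category would then be reported as enriched when the returned value is below the chosen threshold, for example 0.05.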

Protein-protein interaction analysis

Protein-protein interaction (PPI) data were downloaded from the HPRD database (http://www.hprd.org/). In total, 39,240 interaction pairs were obtained and subsequently used to build the PPI network. We extracted a sub-PPI network consisting of the 13 known cervical cancer-related genes from the National Cancer Institute (NCI), in which the genes were connected to each other by shortest paths. The average shortest path of the 13 cervical cancer-related genes was calculated, and the detected DEGs connected to the 13 cervical cancer-related genes with the same average shortest path were further included in the subnetwork. Finally, we acquired a subnetwork with 137 genes. On this basis, we computed the betweenness centrality of each gene in the network to identify the essential DEGs in the PPI network.
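
The two graph quantities used here, shortest path length and betweenness centrality, can be sketched in plain Python for an unweighted, undirected network. The text does not say which software was used, so the code below is an assumption: it uses breadth-first search and Brandes' algorithm, reports unnormalized centrality, and all function names are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(pairs: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from interaction pairs."""
    graph: Graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph: Graph, source: str, target: str) -> Optional[int]:
    """Breadth-first search; returns None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None


def betweenness_centrality(graph: Graph) -> Dict[str, float]:
    """Brandes' algorithm: number of shortest paths passing through each node."""
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}
```

On a four-node chain A-B-C-D, the two inner nodes each lie on two shortest paths and the end nodes on none, which is the sense in which a gene with high betweenness centrality is a connector in the network.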

Differentially expressed gene validation

Three genes (retinol dehydrogenase 12 (RDH12), ubiquitin D (UBD), and serum amyloid A1 (SAA1)) were identified as playing critical roles in the progression of CSCC according to the results of the PPI network. Real-time quantitative polymerase chain reaction (RT-qPCR) was used to measure the mRNA expression of the DEGs in the 47 pairs of samples using the Applied Biosystems 7500 Fast Real-Time PCR System. The PCR volume included 10 μl 2× SYBR Green Master Mix (KAPA Biosystems), 1 μl cDNA template, 1 μl sense primer, 1 μl antisense primer, and 7 μl ddH2O. The primer sequences were as follows: RDH12 sense primer, ATAATGAACAGGGACCAAGGA, antisense primer, GCCATAAGCCAGTCTCACAAG; UBD sense primer, AAGATGATGGCAGATTACGG, antisense primer, GTGGTCACCCTCCAATACAA; SAA1 sense primer, GATCAGCGATGCCAGAGAGA, antisense primer, GTCGGAAGTGATTGGGGTCT; and GAPDH sense primer, GTCAAGGCTGAGAACGGGAA, antisense primer, AAATGAGCCCCAGCCTTCTC. The expression levels of the target genes in the experimental condition (CSCC tissues) were compared with those in the control condition (ATN tissues) using the 2^(-ΔΔCt) method. GAPDH was used as the reference gene. All RT-qPCR reactions were performed in triplicate to capture intra-assay variability.
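
The 2^(-ΔΔCt) calculation can be written out as a short Python sketch. The function name and the averaging of triplicate Ct values before taking differences are assumptions for illustration; the Ct values in the note below are invented.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_tumor: Sequence[float],
    ct_ref_tumor: Sequence[float],
    ct_target_control: Sequence[float],
    ct_ref_control: Sequence[float],
) -> float:
    """Fold change of the target gene in tumor relative to control tissue.

    Each argument holds the replicate Ct values of one gene in one tissue;
    the reference gene plays the role of GAPDH in the text.
    """
    # Delta Ct: target normalized to the reference gene within each tissue.
    delta_tumor = mean(ct_target_tumor) - mean(ct_ref_tumor)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    # Delta-delta Ct: tumor relative to control; a lower Ct means more transcript.
    return 2.0 ** -(delta_tumor - delta_control)
```

With a target Ct of 22 in tumor and 25 in control tissue and a reference Ct of 18 in both, ΔΔCt is -3 and the relative expression is 8, that is, eightfold upregulation.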

Immunohistochemistry and clinicopathological parameters analysis

Immunohistochemistry was performed to detect the expression of the DEGs in the 47 pairs of CSCC and ATN tissues. Paraffin-embedded cervical tissue samples were dewaxed in xylene and rehydrated in graded ethanol. Antigen retrieval was performed in 10 mmol/L boiling sodium citrate buffer at pH 6.0 for 10 min at 92∼98 °C, and the specimens were then incubated with 0.3 % H2O2 for 15 min. Nonspecific binding was blocked with normal horse serum for 20 min at room temperature. The sections were incubated with monoclonal mouse antihuman RDH12, UBD, and SAA1 antibodies (diluted 1:200, Santa Cruz) at 4 °C overnight. The sections were washed with PBS and incubated with biotinylated secondary antibody for 30 min (diluted 1:1000, DAKO Evision™). The sections were then treated with ABC solution at 37 °C for 30 min and incubated with DAB (3,3′-diaminobenzidine) for 5 min. The sections were counterstained with Harris hematoxylin. The expression of the three DEGs and its clinicopathological significance in CSCC were evaluated.

Statistical analyses

The χ² test was used for comparisons between immunohistochemical and clinicopathological parameters. Statistical significance was assumed at P < 0.05. The statistical analyses were performed using the SPSS 13.0 statistical software.
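
For a 2 × 2 table (for example, high versus low expression against presence or absence of lymphatic metastasis), the χ² test can be sketched with the Python standard library. This is an illustration rather than a reproduction of the SPSS procedure: it computes the Pearson statistic without continuity correction, assumes that no row or column total is zero, and uses an invented function name.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

An association would be called significant under the criterion above when the returned P value is below 0.05.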

Results

Characterization of sequencing and mapping

Three pairs of matched samples (samples 1, 2, and 3) were randomly selected for RNA-seq. After mapping the data to the reference genome, 22.0 million, 43.7 million, and 5.9 million read pairs were obtained from the CSCC tissues of samples 1, 2, and 3, respectively, and 15.8 million, 21.5 million, and 17.0 million read pairs were obtained from the matched ATN tissues. The proportion of reads mapped to the human genome ranged from 63 to 73 %.

Analysis of differentially expressed genes

The normalized expression level of each gene was measured by FPKM. The global gene expression profiles of CSCC and ATN tissues were highly correlated, with Pearson correlation coefficients ranging from 0.798 to 0.928 (Fig. 2a). We detected DEGs with a fold change ≥4 for each pair of samples. The overlap of the DEGs and the common DEGs among the three pairs of samples are shown in Fig. 2b. We detected 3538 DEGs in sample 1 (CSCC vs. ATN), 2481 DEGs in sample 2, and 2280 DEGs in sample 3. In total, there were 347 significant common DEGs, including 104 consistently upregulated genes and 148 consistently downregulated genes among the three pairs of samples, while the direction of change of the remaining 95 significant DEGs was not consistent across the three paired samples (S1). The clustering analysis and “volcano plots” indicated that the DEG profiles of CSCC tissues were clearly distinguishable from those of ATN tissues (Fig. 2c, d).
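
The grouping of common DEGs into consistently upregulated, consistently downregulated, and nonconsistent genes can be sketched in Python as a set intersection followed by a sign check. The input format (one mapping from gene name to log2 fold change per sample pair) and the function name are assumptions for illustration.

```python
from typing import Dict, List


def classify_common_degs(per_sample: List[Dict[str, float]]) -> Dict[str, List[str]]:
    """Group the DEGs shared by all sample pairs by direction of change.

    Each dictionary maps a DEG to its log2 fold change (CSCC vs. ATN)
    in one sample pair; positive values mean upregulation in CSCC.
    """
    # Genes called as DEGs in every sample pair.
    common = set(per_sample[0]).intersection(*per_sample[1:])
    result: Dict[str, List[str]] = {"up": [], "down": [], "nonconsistent": []}
    for gene in sorted(common):
        changes = [sample[gene] for sample in per_sample]
        if all(c > 0 for c in changes):
            result["up"].append(gene)
        elif all(c < 0 for c in changes):
            result["down"].append(gene)
        else:
            result["nonconsistent"].append(gene)
    return result
```

Applied to the three sample pairs, the three groups returned by such a function would correspond to the 104, 148, and 95 genes reported above.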

Fig. 2

Analysis of differentially expressed genes in CSCC and ATN tissues. a Scatter plots of gene expression for each pair of samples, with the Pearson correlation coefficient. b Venn diagram showing the overlap of differentially expressed genes with fold change ≥4 among the three pairs of samples. c Hierarchical clustering of differentially expressed genes between CSCC and ATN tissues of the three pairs of samples. d Volcano plots of all the differentially expressed genes in CSCC and matched ATN tissues of the three pairs of samples; the red dots indicate the significantly upregulated or downregulated genes (P < 0.05)

Functional annotation and pathway analysis

The 347 significant common DEGs were categorized into 73 functional categories, the 104 upregulated DEGs were categorized into 34 significant categories, and the 148 downregulated DEGs were categorized into 80 functional categories (S2). We classified the 347 DEGs according to the GO domains of cellular component, molecular function, and biological process (Fig. 3). Most of the DEGs fell into the cellular component domain.

Fig. 3

Functional annotation of the 347 differentially expressed genes. The 347 common significant DEGs were categorized into 73 functional categories according to the GO domains of cellular component, molecular function, and biological process

The six pathways significantly enriched in DEGs (modified Fisher’s exact P value <0.05) are listed in Table 1: cytokine-cytokine receptor interaction, complement and coagulation cascades, retinol metabolism, chemokine signaling pathway, metabolism of xenobiotics by cytochrome P450, and melanogenesis.

Table 1 KEGG pathway analysis of differentially expressed genes

Protein-protein interaction analysis

We extracted a sub-PPI network consisting of the 13 known cervical cancer-related genes from the National Cancer Institute (NCI), in which the genes were connected to each other by shortest paths. The average shortest path of the 13 cancer genes was calculated, and the detected DEGs connected to the 13 cancer genes with the same average shortest path were further included in the subnetwork. The betweenness centrality of each gene in the PPI network was calculated (S3). The retinol dehydrogenase 12 (RDH12), serum amyloid A1 (SAA1), and ubiquitin D (UBD) genes showed relatively high betweenness centrality and high fold change values (eightfold change). These three DEGs interacted with other genes in the PPI network and may thereby participate in the progression of CSCC.

Differentially expressed gene validation

Based on the results of the protein-protein interaction analysis, three candidate genes (RDH12, UBD, and SAA1) were selected for validation, and their mRNA expression levels were measured in 47 pairs of CSCC and ATN tissues by RT-qPCR (Fig. 4a). The RDH12 gene was downregulated in CSCC tissues (P < 0.001), while the UBD gene (P < 0.001) and the SAA1 gene (P < 0.01) were upregulated in CSCC tissues compared with ATN tissues.

Fig. 4

Validation of differentially expressed genes. a The expression of the RDH12, UBD, and SAA1 genes was validated by RT-qPCR in 47 pairs of samples (***P < 0.001; **P < 0.01). b Immunohistochemistry was used to detect the expression of the three genes (RDH12, UBD, and SAA1) in the 47 pairs of CSCC and matched ATN tissues.

Immunohistochemistry and clinicopathological parameters analysis

Immunohistochemistry was performed to detect the expression of the three candidate genes (RDH12, UBD, and SAA1) in the 47 pairs of samples (Fig. 4b). As shown in Table 2, RDH12 expression was decreased in 74.5 % of CSCC tissues. The RDH12 protein appeared to be located in the cytoplasm of tumor cells. RDH12 expression was negatively associated with tumor size and depth of cervical invasion. The UBD protein was located in the cytoplasm of CSCC cells and was overexpressed in 61.7 % of CSCC tissues. UBD expression was positively associated with tumor size and lymphatic metastasis. The SAA1 protein was located in both the nucleus and the cytoplasm of CSCC cells and was overexpressed in 57.4 % of CSCC tissues. SAA1 expression was positively associated with tumor size, lymphatic metastasis, and depth of cervical invasion.

Table 2 Correlation between DEGs and the clinicopathologic characteristics of the CSCC patients

Discussion

Both HPV infection and host genetic factors contribute to the development of cervical cancer. Thus, detailed genetic analyses could provide a better understanding of the genetic mechanisms of cervical cancer. In this study, we investigated the transcriptomes of three pairs of CSCC and ATN tissues through RNA-seq for the first time.

To assess whether our findings were reliable, we compared the DEGs with those identified in previous studies. For instance, secreted phosphoprotein 1 (SPP1), a direct transcriptional target of p53 [17], was upregulated in cervical cancer [18]. Increased expression of cystic fibrosis transmembrane conductance regulator (CFTR) is associated with malignancy, progression, and prognosis of cervical cancer [19]. Dickkopf-1 (DKK1) is involved in embryonic development through inhibition of the WNT signaling pathway. The silencing of DKK1 might promote tumorigenesis of invasive cervical squamous cells without β-catenin mutations [20]. The expression levels of these genes in our study were consistent with previous studies.

A total of six significant pathways were detected. Several studies have demonstrated that IL1R2 is involved in the TMPRSS2/REG and IL-mediated signaling pathways, which have been implicated in the progression, migration, and invasion of cancers [21]. The inflammatory cytokine genes (CXCL5, CXCL9, CXCL10, CXCL12, CCL11, CCL13, CCL14, CCL23) are linked to inflammation and tumor biology. The CXCR4/CXCL12 axis induces integrin expression and regulates tumor radioresistance by activating the SAPK/JNK, MEK1/2, PI3K/AKT, and NF-κB signaling pathways [22–25]. CXCL5 contributes to tumor metastasis and recurrence through the PI3K-Akt and ERK1/2-MAPK signaling pathways [26]. It has been reported that the coagulation and complement cascades participate in tumor development [27]. However, the expression of the coagulation-related genes F10 and CFD in tumors remains unclear. MASP1 and C4PB might be novel target genes in squamous cell lung carcinoma [28, 29]. All-trans retinoic acid (ATRA), a vitamin A metabolite, plays an essential role in embryonic development and in the regulation of cell proliferation, differentiation, and migration [30]. Cytochrome P450 is considered a key participant in the progression of lung cancer and a potentially valuable target for anticancer drugs [31]. It has been reported that the MEK/ERK, PI3K/Akt, and Wnt/β-catenin pathways are involved in the melanogenesis signaling cascade [32]. The DEGs in these pathways might be involved in the progression of CSCC.

The RDH12, UBD, and SAA1 genes might play essential roles in the progression of CSCC according to the analysis of the protein interaction network. RDH12 is an NADPH-dependent all-trans-retinol dehydrogenase involved in the metabolism of retinoids [33]. Retinoids have been used as chemotherapeutic drugs in acute promyelocytic leukemia (APL) because of their antiproliferative and antioxidant activity [34]. It has been reported that reduced expression of RDH12 could lead to the dysregulation of cell proliferation and differentiation and initiate cancer development in human gastric cancer [35]. In this study, we found that the expression of RDH12 was decreased in CSCC tissues and was negatively associated with tumor size and depth of cervical invasion. According to the KEGG pathway analysis and the protein interaction network, the RDH12 gene was involved in the retinol metabolism pathway and interacted with genes such as MMP1 and CDKN2A, which indicates that RDH12 might play an important role in the prevention of carcinogenesis in CSCC. The UBD gene, which is also known as HLA-F-adjacent transcript 10 (FAT10), is a small ubiquitin-like modifier. It has been demonstrated that interference with FAT10 could inhibit cell proliferation by blocking entry into the S phase of the cell cycle and inducing apoptosis in hepatocellular carcinoma cells, and that adenovirus–siRNA/FAT10 significantly suppressed tumor growth and prolonged the lifespan of tumor-bearing mice [36]. UBD also plays an important role in promoting malignant cell transformation, including proliferative, invasive, and migratory functions. The malignant properties of FAT10 in nontumorigenic and tumorigenic cells were mediated via the NF-κB-CXCR4/7 pathway [37]. Serum amyloid A (SAA) is a positive acute-phase protein with four different isotypes in humans. SAA is mainly produced by the liver in response to inflammation, infection, and tissue injury [38]. SAA1 is an acute-phase SAA whose concentration can increase several hundredfold under inflammatory stimulation [39]. SAA could activate the transcription factor NF-κB to suppress apoptosis [40]. SAA may be involved in local inflammation in the microenvironment of malignant tissue by inducing the production of tumor necrosis factor-α (TNFα), interleukin-1β (IL-1β), and the chemokines CCL1, CCL3, and CCL4 [41, 42]. SAA may enhance tumor cell invasion and metastasis by directly increasing the activity of matrix-degrading enzymes (MMP/TIMP-1) and by enhancing TNFα production [43]. In this study, the expression of UBD was upregulated in CSCC and was positively associated with tumor size and lymphatic metastasis. SAA1 was overexpressed in CSCC and was positively associated with tumor size, lymphatic metastasis, and the depth of cervical invasion. Together with the KEGG pathway analysis and the protein interaction network, these findings suggest that the UBD and SAA1 genes might promote the progression of CSCC and could be used as molecular diagnostic markers and therapeutic targets in CSCC.

This research identified the DEGs, functional categories, and related pathways in CSCC and suggested that the RDH12, UBD, and SAA1 genes might play critical roles in the development of CSCC by interacting with related proteins. This study should broaden our understanding of the pathogenesis of CSCC.