Introduction

Oral squamous cell carcinoma (OSCC) comprises cancers arising at various anatomic subsites, including the oral cavity and the tongue. Because of this subsite heterogeneity, these tumors show distinct biological and clinical behaviors. Tongue squamous cell carcinoma (TSCC), one of the most common types of OSCC, is significantly more aggressive than other types, with frequent distant metastasis and locoregional recurrence [1]. Despite recent advances in cancer research and treatment modalities, the 5-year overall survival rate has remained below 50% over the past few decades [2].

The accumulation of multiple genetic alterations, together with environmental factors such as tobacco use, alcohol consumption, chronic inflammation, and human papillomavirus (HPV) infection [3], contributes to the development of TSCC. A recent cancer exome sequencing study revealed that genomic profiles and mutational spectra were similar between young nonsmoking and older smoking TSCC patients, except that TP53 mutations were less frequent in young nonsmoking patients [4]. Although a large body of research has been performed, understanding of the disease mechanism remains limited, hindering the exploration of new therapies.

Advances in low-cost, rapid high-throughput technologies have enabled a systematic understanding of the complex biological processes underlying diseases, including head and neck cancer [5–7], thereby aiding early diagnosis and the development of new treatment approaches. As a next-generation sequencing technology, RNA-Seq can measure genome-wide gene expression with higher resolution and at lower cost than conventional microarrays [8, 9].

In this study, we first used RNA-Seq to compare the gene expression profiles of primary TSCC with those of matched paratumor tissue and normal mucosa, and then validated the RNA-Seq findings by real-time PCR. The purpose of the study was to characterize the genetic and molecular alterations associated with TSCC, enhancing our understanding of its molecular pathogenesis.

Material and methods

Tissue collection

Paired TSCC samples from the central portion of the tumor, and adjacent histologically normal tissue taken at least 1.5 cm from the tumor margins, were obtained from 20 patients admitted to Xiangya Stomatological Hospital & School of Stomatology in 2015. Normal mucosa samples were obtained from 6 oral trauma patients. None of the patients had received antitumor treatment before radical surgery, and none had other types of tumors. The characteristics of the samples used for RNA-Seq are presented in Table S1. Frozen sections of the surgical samples were stained with hematoxylin and eosin and carefully examined by a pathologist. All samples were frozen in liquid nitrogen immediately after surgery and stored at −80 °C until RNA extraction. All protocols were approved by the Human Ethics Committee of Xiangya Stomatological Hospital & School of Stomatology. Written informed consent was obtained from all participants or their legal guardians.

Total RNA isolation and next-generation sequencing

Total RNA was isolated from each sample with TRIzol Reagent (Invitrogen, USA) according to the manufacturer’s instructions, and its integrity was examined using an Agilent 2100 Bioanalyzer (Agilent Technologies). After the extracted RNA had been processed into sequencing libraries, sequencing was performed on an Illumina HiSeq™ 2500 platform.

Statistical analysis

TopHat v1.3.1 [10] was used to align raw sequencing reads to the UCSC H. sapiens reference genome (build hg19), and Cufflinks v1.0.3 [11] was used to estimate the relative abundance of transcripts in fragments per kilobase of exon per million fragments mapped (FPKM). Genes with a q value < 0.01 were regarded as differentially expressed genes (DEGs). Heatmaps of genes with statistically significant expression changes were generated in R.
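The two quantities named above, FPKM and the q value cutoff, can be illustrated with a minimal Python sketch. It uses the textbook definition of FPKM (fragment count × 10^9 divided by exon length in bp × total mapped fragments), which is a simplification of the likelihood-based abundance estimates that Cufflinks actually reports, and it assumes the q value is a Benjamini-Hochberg FDR-adjusted p value. The function names and example numbers are hypothetical and are not taken from this study's data.

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Simplified FPKM: fragments per kilobase of exon per million mapped fragments."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so q values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def call_degs(pvalues_by_gene: dict[str, float], q_cutoff: float = 0.01) -> list[str]:
    """Hypothetical helper: genes whose BH q value is below q_cutoff."""
    genes = list(pvalues_by_gene)
    qvalues = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return sorted(g for g, q in zip(genes, qvalues) if q < q_cutoff)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2 kb exon model, 40 million mapped fragments.
    print(fpkm(500, 2000, 40_000_000))  # 6.25
    print(call_degs({"GENE_A": 0.0001, "GENE_B": 0.5, "GENE_C": 0.002}))  # ['GENE_A', 'GENE_C']
```

Applying the cutoff to adjusted rather than raw p values controls the expected proportion of false positives among the thousands of genes tested at once.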

Functional classification of DEGs

To identify cancer-specific functional categories, we first performed parallel enrichment tests on the significantly differentially expressed genes detected by pairwise comparisons among TSCC, paratumor tissue, and normal mucosa, using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7, a set of web-based functional annotation tools [12]. The lists of DEGs were submitted through the web interface. Only Gene Ontology (GO) FAT terms and KEGG pathways with a false discovery rate (FDR) ≤ 0.05 were selected as functional annotation categories. GO categories fall into three ontologies (biological process, cellular component, and molecular function), which were examined separately in this study.
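DAVID scores term enrichment with the EASE score, a modified Fisher exact test. As a hedged illustration of the underlying idea only, the sketch below computes the standard one-sided hypergeometric (Fisher exact) p value for over-representation of a GO term or KEGG pathway in a DEG list. It does not reproduce DAVID's EASE penalty or its FDR calculation, and the function name and numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits: int, list_size: int, term_size: int, population: int) -> float:
    """One-sided hypergeometric p value P(X >= hits).

    hits:        DEGs annotated to the term
    list_size:   number of DEGs submitted
    term_size:   background genes annotated to the term
    population:  background genes with any annotation
    """
    upper = min(list_size, term_size)
    total = comb(population, list_size)
    # Sum the probabilities of observing `hits` or more annotated genes by chance.
    tail = sum(
        comb(term_size, x) * comb(population - term_size, list_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical: 12 of 252 DEGs fall in a term covering 150 of 18,000 background genes.
    print(enrichment_pvalue(12, 252, 150, 18_000))
```

The per-term p values would then be corrected for multiple testing before the FDR ≤ 0.05 cutoff is applied.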

Real-time quantitative PCR validation

RNA was reverse-transcribed to cDNA using HiScript™ Q RT SuperMix for qPCR (Vazyme) according to the manufacturer’s protocol. SYBR Green PCR amplification was performed on a real-time PCR detection system (Bio-Rad, USA). Each 20 μl qRT-PCR reaction contained 10 μl of AceQ™ qPCR SYBR® Green Master Mix (Q111-02, Vazyme), 20 ng of diluted cDNA, and 5 μM of each primer (Table S2). Cycling conditions were 95 °C for 5 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s, and then melt-curve analysis from 60 to 95 °C. Relative expression was calculated using the 2^−ΔΔCt method, with GAPDH as the internal reference gene. Statistical analysis was performed with SPSS version 13.0 (SPSS, Inc.; Chicago, IL, USA).
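The 2^−ΔΔCt calculation can be written out as a short Python sketch. It follows the standard formulation with GAPDH as the reference gene; the Ct values in the example are hypothetical, and the helper name is illustrative rather than part of SPSS or the instrument software.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene in a sample vs a control by 2^-ddCt.

    ct_ref_* are Ct values of the reference gene (GAPDH in this study).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control     # compare sample with control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: tumor target 24, tumor GAPDH 18; paratumor target 26, paratumor GAPDH 18.
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0, i.e. 4-fold higher in tumor
```

A value above 1 indicates higher expression in the sample (e.g., TSCC) than in the control (e.g., paratumor), and a value below 1 indicates lower expression. The method assumes near-100% amplification efficiency for both the target and the reference gene.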

Results

Analysis of transcriptome sequencing

In total, we obtained 48.5 million, 49.1 million, and 44.6 million read pairs from TSCC, paratumor tissue, and normal mucosa, respectively. Uniquely mapped reads ranged from 40.4 million to 44.9 million pairs across the three groups, and 81 to 89% of reads mapped to Ensembl reference genes. A total of 252 genes were differentially expressed between TSCC and paratumor tissue (117 up-regulated and 135 down-regulated), and 234 genes were differentially expressed between TSCC and normal mucosa (67 up-regulated and 167 down-regulated). Notably, both tumor comparisons yielded more DEGs than the comparison between the two nontumor tissues (78 genes in normal mucosa vs paratumor), indicating cancer-specific transcriptome reprogramming, as shown in the volcano plots of the gene expression profiles (Fig. 1).
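How each gene is placed and colored in the volcano plots can be sketched in Python. The sketch assumes that the x-axis is the log2 fold change between FPKM values and the y-axis is −log10(q), and it labels a gene only by direction and q < 0.01, since no fold-change cutoff is stated. The pseudocount of 1 that avoids division by zero is an assumption, and all names and values are hypothetical.

```python
import math
from collections import Counter


def volcano_point(fpkm_tumor: float, fpkm_control: float, q_value: float,
                  q_cutoff: float = 0.01, pseudocount: float = 1.0):
    """Return (log2 fold change, -log10 q, label) for one gene."""
    log2_fc = math.log2((fpkm_tumor + pseudocount) / (fpkm_control + pseudocount))
    neg_log10_q = -math.log10(max(q_value, 1e-300))  # guard against q = 0
    if q_value < q_cutoff and log2_fc > 0:
        label = "up"      # plotted red
    elif q_value < q_cutoff and log2_fc < 0:
        label = "down"    # plotted blue
    else:
        label = "ns"      # not significant
    return log2_fc, neg_log10_q, label


def count_labels(genes: dict[str, tuple[float, float, float]]) -> Counter:
    """Tally labels for genes mapped to (FPKM tumor, FPKM control, q value)."""
    return Counter(volcano_point(*values)[2] for values in genes.values())


if __name__ == "__main__":
    example = {
        "GENE_A": (15.0, 3.0, 0.001),
        "GENE_B": (3.0, 15.0, 0.005),
        "GENE_C": (0.5, 0.5, 0.2),
    }
    print(count_labels(example))
```

Counting the "up" and "down" labels per comparison is how totals such as 117 up-regulated and 135 down-regulated genes would be obtained.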

Fig. 1

Volcano plots for genes in TSCC vs paratumor and TSCC vs normal mucosa. Red and blue dots indicate up- and down-regulated DEGs, respectively, significant at q < 0.01

For the 135 genes commonly identified in both the TSCC vs paratumor and TSCC vs normal comparisons, bootstrap hierarchical clustering was performed across TSCC, paratumor, and normal mucosa tissues. As illustrated in Fig. 2a, the gene expression signature of TSCC was distinctly different from those of the corresponding normal samples. Gene expression correlation was also analyzed among the TSCC, paratumor, and normal mucosa groups. The global gene expression profiles were highly correlated, as measured by pairwise Spearman correlation (TSCC vs paratumor, Spearman’s rho = 0.93; TSCC vs normal, Spearman’s rho = 0.93; Fig. 2b, c).
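Spearman's rho is the Pearson correlation computed on ranks, which also makes it insensitive to whether FPKM values are log-transformed first. A minimal, dependency-free Python sketch (tied values receive average ranks) is shown below; the input vectors are hypothetical, not this study's expression data.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    tumor = [0.5, 12.0, 3.1, 150.0, 40.0]     # hypothetical FPKM values
    paratumor = [0.7, 9.5, 4.0, 60.0, 75.0]
    print(spearman_rho(tumor, paratumor))     # 0.9
```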

Fig. 2

Correlative analysis of gene expression among TSCC, paratumor, and normal mucosa tissues. a Hierarchical clustering of differentially expressed genes across TSCC, paratumor, and normal mucosa tissues. b Spearman correlation analysis between TSCC and paratumor tissue. c Spearman correlation analysis between TSCC and normal mucosa tissue

Functional classification of DEGs

We selected the GO categories in which the DEGs from the TSCC vs paratumor and TSCC vs normal comparisons were significantly enriched. In total, the DEGs between TSCC and paratumor were assigned to 31 GO categories across the three ontologies, and the DEGs between TSCC and normal mucosa to 33 GO categories (Fig. 3). Muscle contraction (GO:0006936), epidermis development (GO:0008544), epithelial cell differentiation (GO:0030855), and keratinization (GO:0031424) were enriched in both comparisons. Interestingly, the GO terms chemotaxis (GO:0006935) and defense response (GO:0006952) were enriched only among DEGs between TSCC and normal mucosa. Pathway analysis further showed that the DEGs were enriched in pathways such as cardiac muscle contraction (hsa04260), salivary secretion (hsa04970), calcium signaling pathway (hsa04020), GnRH signaling pathway (hsa04912), tight junction (hsa04530), and NOD-like receptor signaling pathway (hsa04621) (Fig. 4).

Fig. 3

The significantly enriched GO categories (FDR < 0.05) under the three ontologies of biological process, cellular component, and molecular function. a DEGs from the TSCC vs paratumor comparison. b DEGs from the TSCC vs normal comparison. BP biological process, CC cellular component, MF molecular function

Fig. 4

The significantly enriched pathways (FDR < 0.05). a DEGs from the TSCC vs paratumor comparison. b DEGs from the TSCC vs normal comparison

Quantitative real-time RT-PCR validation

We performed quantitative real-time RT-PCR (qRT-PCR) to validate the RNA-Seq findings in 10 samples of TSCC, matched paratumor, and normal mucosa, respectively. From the DEGs, we selected several previously described tumor-related genes: FOLR1, NKX3-1, TFF3, PIGR, NEFL, MMP13, and HMGA2. As shown in Fig. 5, FOLR1, NKX3-1, TFF3, and PIGR mRNA expression was significantly decreased, and NEFL, MMP13, and HMGA2 mRNA expression was significantly increased, in TSCC compared with matched paratumor or normal mucosa. The concordance between the qRT-PCR and RNA-Seq results supports the reliability of the RNA-Seq findings.
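Concordance between the two platforms is judged here by direction of change. A minimal Python sketch of that check, assuming RNA-Seq results are given as log2 fold changes and qRT-PCR results as linear 2^−ΔΔCt fold changes, is shown below; the gene labels and values are hypothetical placeholders.

```python
import math


def directions_agree(rnaseq_log2_fc: float, qpcr_fold_change: float) -> bool:
    """True if RNA-Seq and qRT-PCR both report up-regulation or both report down-regulation."""
    qpcr_log2_fc = math.log2(qpcr_fold_change)  # put qRT-PCR on the same log2 scale
    return (rnaseq_log2_fc > 0 and qpcr_log2_fc > 0) or (rnaseq_log2_fc < 0 and qpcr_log2_fc < 0)


def concordance_rate(results: dict[str, tuple[float, float]]) -> float:
    """Fraction of genes whose (RNA-Seq log2 FC, qRT-PCR fold change) agree in direction."""
    agreeing = sum(directions_agree(rna, qpcr) for rna, qpcr in results.values())
    return agreeing / len(results)


if __name__ == "__main__":
    hypothetical = {
        "GENE_A": (2.0, 3.5),    # both up
        "GENE_B": (-1.5, 0.4),   # both down
        "GENE_C": (1.0, 0.5),    # disagree
        "GENE_D": (0.7, 2.0),    # both up
    }
    print(concordance_rate(hypothetical))  # 0.75
```

This check covers direction only; a stricter comparison could also correlate the magnitudes of the two sets of fold changes.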

Fig. 5

qRT-PCR analysis of FOLR1, NKX3-1, TFF3, PIGR, NEFL, MMP13, and HMGA2 expression in TSCC, matched paratumor, and normal mucosa. Asterisks indicate p < 0.05

Discussion

In this study, we first used RNA-Seq to examine genome-wide gene expression patterns in TSCC by comparing primary tumors with paratumor tissue or normal mucosa. Our findings provide a genome-wide gene expression profile of TSCC and new clues for understanding the molecular mechanisms of TSCC pathogenesis. In total, 252 and 234 DEGs were identified in the comparisons of primary tumor with paratumor tissue and with normal mucosa, respectively. Aberrant expression of some of these genes has previously been associated with TSCC, such as KRT1 [13], KRT10 [14], CASP14 [15], CRISP3, MUC7, and DMBT1 [16]. However, we also identified some novel genes associated with TSCC. KLK14 was the up-regulated gene with the lowest p value in both the TSCC vs paratumor and TSCC vs normal comparisons. KLK14, a novel extracellular serine protease, has been reported to be aberrantly expressed in breast, ovarian, prostate, and testicular cancer and in peeling skin syndrome [17], and elevated KLK14 mRNA expression has been linked to prognosis in breast and ovarian cancer patients [18]. KLK14 may play either a stimulatory or an inhibitory role in carcinogenesis, depending on the cancer type and tumor microenvironment [19]. In TSCC, we propose that KLK14 may promote cancer progression via extracellular matrix (ECM) digestion, suggesting its potential use as a biomarker and therapeutic target for TSCC.

GO analysis revealed that these DEGs were significantly enriched in many TSCC-related functions, such as muscle contraction (GO:0006936), epidermis development (GO:0008544), epithelial cell differentiation (GO:0030855), keratinization (GO:0031424), chemotaxis (GO:0006935), and defense response (GO:0006952). Pathway analysis highlighted several pathways potentially related to TSCC carcinogenesis, such as cardiac muscle contraction (hsa04260), salivary secretion (hsa04970), calcium signaling pathway (hsa04020), GnRH signaling pathway (hsa04912), tight junction (hsa04530), and NOD-like receptor signaling pathway (hsa04621).

When the tumor invades the tongue muscle, muscle contraction is affected, resulting in increased salivary secretion. Our analysis identified genes involved in muscle contraction and salivary secretion, which may explain this phenotype of TSCC at the molecular level. Genes enriched in epithelial development and differentiation, such as CALML5, CASP14, KRT10, CDSN, and KRT1, were dysregulated in TSCC, suggesting their involvement in its development. Previous microarray studies also detected altered expression of the cytokeratins KRT16, KRT17 [20], and KRT1 [21]. Enhanced KRT1 and KRT10 synthesis has been observed in moderate oral epithelial dysplasia, emphasizing the important roles of cytokeratins in the development of TSCC. Tight junctions are widely accepted to play a vital role in cancer metastasis [22]. Interestingly, we also found that the tight junction pathway was significantly enriched owing to altered expression of related molecules, including MYL2, MYLPF, MYH7, MYH2, and CLDN3, which may be related to the high nodal metastatic rate of TSCC.

The RNA-Seq results were confirmed by qRT-PCR validation of seven tumor-related genes: FOLR1, NKX3-1, TFF3, PIGR, NEFL, MMP13, and HMGA2. The qRT-PCR results were fully consistent with those of RNA-Seq. MMP-13, a member of the collagenase family, participates in the matrix metalloproteinase (MMP) activation cascade; MMPs degrade the extracellular matrix and basement membranes. Elevated MMP-13 expression has been found in many types of malignancies, including oral squamous cell carcinoma [23–25]. In oral squamous cell carcinoma, high MMP13 expression was significantly associated with lymph node metastasis and tumor stage and may serve as an independent prognostic factor [26]. Together with previous studies, our results suggest that MMP13 plays an important role in the invasion and metastasis of oral squamous cell carcinoma.

HMGA2, an architectural transcription factor, is highly expressed in undifferentiated mesenchymal cells during development and in most malignant epithelial tumors, including TSCC [27]. HMGA2 contributes to carcinoma aggressiveness by up-regulating Snail expression and inducing epithelial–mesenchymal transition (EMT) [27]. Additionally, HMGA2 expression has been associated with poor prognosis in patients with oral squamous cell carcinoma [28]. Based on previous reports and the present results, HMGA2 may be a therapeutic target for TSCC.

TFF3, a member of the mammalian trefoil factor (TFF) family, has been associated with endocrine response in breast cancer. TFF3 mRNA is expressed in breast cancer [29], prostate cancer [30], and gastric cancer [31], among others. However, TFF3 was strongly down-regulated in OSCC in a study of oral mucosal tissues from 23 healthy subjects and 23 OSCC patients [32], consistent with our results, suggesting that TFF3 plays different roles in the carcinogenesis of different cancer types. Of the remaining four validated genes, FOLR1 [33], NKX3-1 [34], and NEFL [35] have previously been linked to oral squamous cell carcinoma, whereas PIGR has not.

Conclusions

In summary, we provide a genome-wide gene expression profile of TSCC. In addition to genes previously shown to be involved in TSCC, such as KRT1, KRT10, CASP14, CRISP3, MUC7, and DMBT1, we identified some interesting novel candidate genes associated with TSCC, such as KLK14 and PIGR. Furthermore, we selected seven genes to validate the RNA-Seq results. Although the present RNA-Seq study is based on a relatively small number of patients, the findings for the selected genes were confirmed by qRT-PCR. This study therefore provides a valuable reference dataset for the future identification and validation of biomarkers for the detection, diagnosis, and prognosis of TSCC, and adds new clues for understanding the molecular mechanisms of TSCC pathogenesis.