Introduction

Oral cancer is the thirteenth most common cancer in the world, accounting for an estimated 300,373 new cases and 145,238 deaths globally [1]. In India, oral cancer is a major public health problem. It is the most common cancer in males, with an annual incidence of 53,842 and mortality of 36,436, and the fourth most common cancer in females, with an annual incidence of 23,161 and mortality of 15,631 [1]. The high incidence of oral cancer in India is primarily attributed to the prevalent habit of chewing tobacco, which is often initiated at an early age of 9–12 years [2, 3]. In addition, a majority of Indian patients are diagnosed at advanced stages of the cancer, and the 5-year survival of oral cancer patients is about 40 % [4]. The prognosis remains poor despite the easy accessibility of the oral cavity, the presence of preceding precancerous lesions such as oral leukoplakia, erythroplakia and submucous fibrosis in several geographic regions, and advances in treatment modalities including surgery, radiotherapy, chemotherapy and targeted therapy such as epidermal growth factor receptor antagonists. Moreover, the overall survival of oral cancer patients has not improved significantly in the past four decades. An alarming trend of increased incidence of oral cavity cancers in young adults aged 20–45 years, including a 60 % increase in tongue cancers in patients younger than 40 years, has also been reported over the past 25 years [5–7]. Hence, it is imperative to understand the biology and biological behavior of malignant oral cells. Further, identifying a panel of biomarkers overexpressed in oral cancer tissues as compared to clinically normal tissues from oral cancer patients may reveal specific therapeutic targets in oral cancer.

In the past two decades, the molecular basis of oral cancer has been extensively investigated [2, 8–11]. Oral intraepithelial neoplasia develops by clonal evolution, a stochastic multi-step process in which several genetic events contribute to malignancy, followed by clonal expansion and metastasis [11, 12]. The pathways leading to the cancer are complex and include deregulation of oncogenes, tumor suppressor genes, DNA damage repair genes, and genes associated with signal transduction, metastasis and angiogenesis [11, 13]. While trends can be discerned, no specific alteration is present in a majority of oral cancer patients. It is therefore important to anticipate inherent heterogeneity in oral cancer cell genotype, phenotype and pathogenetic events, as well as in host genotype. Differences between individual patients in cancer phenotype, including aggressiveness of the disease, response to treatment, recurrence and survival, may be reflected in the expression profiles of their tumors.

Advances in biotechnology and information technology have facilitated genome-wide differential expression profiling of oral cancer, reported primarily from patients in western countries [14–19]. However, there is a dearth of information on chewing-tobacco-related oral cancer in Indian patients. In the current pilot study, a high-throughput Illumina microarray platform was used to examine differentially expressed genes in oral cancer tissues as compared to clinically normal oral tissues.

Materials and methods

Study subjects

The study included 30 untreated patients with histopathologically confirmed oral squamous cell carcinoma, admitted to the Department of Ear-Nose-Throat, Seth G.S. Medical College and King Edward Memorial Hospital, Mumbai, India. Oral cancer biopsies were collected at surgery from the non-necrotic central portion of the resected tissues. A majority of the patients were habitual long-term tobacco chewers, with a habit duration of at least 10 years. Three oral cancer patients with a chewing-tobacco habit of more than 10 years had stopped chewing tobacco for 2 months following diagnosis of the cancer. The patient details are given in Table 1. Briefly, the patient group comprised 19 males and 11 females, with an age range of 28–65 years and a mean age of 49 years. The primary sites of oral cancer were buccal mucosa [11 cases], tongue [11 cases], gingiva [7 cases] and lower alveolus [1 case]. The tumor size at diagnosis was T1 in 5 cases, T2 in 14, T3 in 8 and T4 in 3. Histopathologically positive lymph node involvement was observed in 18 patients, and 12 patients were node negative. As per TNM classification, 9 patients had early-stage [I and II] and 21 had advanced-stage [III and IV] disease. Clinically normal oral tissues from the contralateral site were available from 27 oral cancer patients. Paired cancer and normal tissues were available from 16 patients. The remaining 14 cancer tissues and the remaining normal oral samples were unpaired, i.e., derived from different oral cancer patients, because RNA from one of the two samples of a patient was degraded and showed a low RNA integrity number [RIN]. No clinical appearance of leukoplakia, erythroplakia or melanoplakia was observed in the patients at diagnosis of oral cancer. The tissues were immersed in 1 ml RNAlater solution [Ambion Inc, Texas, USA] overnight at 4 °C and then stored at −80 °C until RNA isolation.

Table 1 Clinicopathological data of oral cancer patients

Informed consent for voluntary participation in the study was obtained from the participants, and the project was approved by the Institute Ethics Committee.

RNA isolation

Total RNA was isolated from tissue samples using the RNeasy Mini Kit according to the manufacturer’s instructions [Qiagen, Hilden, Germany]. The extracted RNA was quantified on a NanoDrop ND-1000 spectrophotometer [NanoDrop, Delaware, USA]. The quality and quantity of total RNA were characterized using the RNA 6000 Nano chip kit on the Bioanalyzer 2100 [Agilent Technologies, Foster City, California, USA]. RNA samples with distinct 18S and 28S ribosomal peaks and a RIN between 7 and 10 were used in the microarray experiments.

Illumina microarray assay protocol

Total RNA from cancer tissues and clinically normal tissues was used for microarray assays according to the manufacturer’s instructions. Briefly, total RNA was reverse transcribed to cDNA using the High-Capacity cDNA Archive Kit [Applied Biosystems, California, USA]. The cDNA was subjected to in vitro transcription in the presence of biotinylated nucleotides [Ambion Inc., Texas, USA]. The biotin-labeled cRNA was fragmented and hybridized to high-density oligonucleotide Illumina Sentrix HumanRef-8 v2 Expression BeadChip arrays [Illumina Inc., San Diego, USA]. Each array contained 22,184 probes representing curated human genes and ESTs. The arrays were scanned using a confocal laser scanner, the BeadArray Reader, and analyzed using GenomeStudio software [Illumina Inc., San Diego, USA].

Microarray data analysis

The intensity output files were initially assessed for probe hybridization quality control parameters, including average background, target intensity and raw noise values. The non-normalized fluorescence intensity of each probe on the chip was obtained using the DirectHyb gene expression module of the GenomeStudio software [Illumina Inc., San Diego, USA]. The raw data were subjected to average normalization and filtered to select probes with a detection p < 0.01 in both cancer tissues and normal buccal mucosa. Differential gene expression analysis was performed using the Illumina Custom algorithm. Genes were considered differentially expressed when the transcripts showed a 1.5-fold or greater increase or decrease in the tumor tissues as compared to the normal tissues [p < 0.05].
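The filtering and fold-change criteria above can be illustrated with a minimal Python sketch. It assumes a probes × samples intensity matrix, implements average normalization simply as rescaling each sample to the grand mean intensity, and interprets "detection p < 0.01 in both groups" as detection in at least one cancer and at least one normal sample. These are assumptions, and the function names are hypothetical. The p < 0.05 test of the Illumina Custom algorithm is not reproduced here.

```python
import numpy as np


def average_normalize(raw):
    """Rescale each sample (column) so its mean intensity equals the grand mean.

    Assumption: a simple stand-in for GenomeStudio average normalization.
    """
    raw = np.asarray(raw, dtype=float)
    col_means = raw.mean(axis=0)  # mean intensity per sample
    return raw * (col_means.mean() / col_means)


def select_differential(expr, det_p, is_cancer, fold=1.5, det_cut=0.01):
    """Return (up_indices, down_indices, ratio) for probes passing the filters.

    expr      : probes x samples normalized intensities
    det_p     : probes x samples detection p-values
    is_cancer : boolean per sample, True for cancer tissue
    """
    expr = np.asarray(expr, dtype=float)
    is_cancer = np.asarray(is_cancer, dtype=bool)
    detected = np.asarray(det_p) < det_cut
    # Assumption: probe must be detected in >= 1 cancer AND >= 1 normal sample
    keep = detected[:, is_cancer].any(axis=1) & detected[:, ~is_cancer].any(axis=1)
    # Fold change as the ratio of group means (cancer / normal)
    ratio = expr[:, is_cancer].mean(axis=1) / expr[:, ~is_cancer].mean(axis=1)
    up = np.flatnonzero(keep & (ratio >= fold))
    down = np.flatnonzero(keep & (ratio <= 1.0 / fold))
    return up, down, ratio


# Toy example: 4 probes, 2 cancer + 2 normal samples (values are invented)
expr = np.array([[200.0, 220.0, 100.0, 100.0],   # up in cancer
                 [100.0, 100.0, 100.0, 100.0],   # unchanged
                 [50.0, 50.0, 100.0, 100.0],     # down in cancer
                 [300.0, 300.0, 100.0, 100.0]])  # up, but not detected in normals
det_p = np.full(expr.shape, 0.001)
det_p[3, 2:] = 0.5
is_cancer = [True, True, False, False]
up, down, ratio = select_differential(expr, det_p, is_cancer)
print(up, down)  # prints: [0] [2]
```

Using a ratio of group means is only one way to define fold change; a ratio of log-scale means or a per-pair ratio for matched samples would give somewhat different gene lists.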

Unsupervised hierarchical clustering of the expression data set from the tumor [n = 30] and control samples [n = 27] was performed using Pearson’s correlation as the distance metric, to group genes according to the similarity of their expression levels [20]; the results were visualized using the TreeView program.
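The clustering step can be sketched as follows, assuming a Pearson distance defined as 1 − r and average linkage. The linkage method is not stated in the text, and the naive implementation below is an illustrative stand-in for the clustering software [20], not that software itself. Here samples are clustered on their gene profiles; clustering genes works the same way on the transposed matrix.

```python
import numpy as np


def pearson_distance(X):
    """Pairwise distance 1 - Pearson r between the rows of X."""
    return 1.0 - np.corrcoef(X)


def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain; returns lists of row indices.
    """
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy example: rows are samples, columns are genes (values are invented)
samples = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [2.0, 4.0, 6.0, 8.0, 10.5],
                    [5.0, 4.0, 3.0, 2.0, 1.0],
                    [10.0, 8.0, 6.0, 4.0, 2.2]])
groups = average_linkage(pearson_distance(samples), n_clusters=2)
print(groups)  # prints: [[0, 1], [2, 3]]
```

Because Pearson distance ignores overall intensity level, samples 0 and 1 group together despite a twofold difference in magnitude; only the shape of the expression profile matters.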

The differentially regulated genes were functionally annotated with respect to biological process, molecular function and cellular localization using the Gene Ontology [GO] database [21] and the Database for Annotation, Visualization and Integrated Discovery [DAVID] [22] bioinformatics tools. Significant gene clusters were queried against the known components of biological pathways in the Kyoto Encyclopedia of Genes and Genomes [KEGG] database [23] to identify important pathways involved in disease development.

Results

Microarray analysis

The microarray analysis demonstrated >1.5-fold differential expression of 425 genes [524 probes] in oral cancer tissues [n = 30] as compared to clinically normal oral tissues [n = 27], with 255 genes upregulated [Supplement data S1] and 170 genes downregulated [Supplement data S2]. The associated biological functions of the genes [25, 26] are indicated in Supplement data S1 and S2, and the percentage distribution of the genes across functional groups is summarized in Table 2. The upregulated genes were distributed across functions such as immune response [22 %], cell metabolism [20.4 %], signal transduction [11 %], cell proliferation [9 %], invasion [8.6 %], cell development [6 %], transcription factors [5.5 %], apoptosis [5.5 %] and transport proteins [4.3 %]. Two genes [<1 %] belonged to the angiogenesis and xenobiotic metabolism groups, and functions were unknown for 13 genes [5 %] [Table 2]. The downregulated genes were functionally categorized as cell metabolism [25 %], signal transduction [12 %], cell regulation [10 %], transport proteins [9 %], cell development [8 %], apoptosis [5 %], invasion [3.5 %], transcription factors [3 %] and cell proliferation [3 %]. Four genes [2 %] each were categorized as immune response and xenobiotic metabolism. The functions were unknown for 16 % of the genes [Table 2].

Table 2 Biological functions of differentially regulated genes (≥1.5-fold) in oral cancers

Twofold or greater overexpression was observed in 32 genes and downregulation in 12 genes, as indicated in Tables 3 and 4, respectively. The chromosome location, fold change, percentage of oral cancer patients with overexpression or downregulation and the biological functions of the genes are also indicated in Tables 3 and 4. A majority [21/32, 65.6 %] of the genes were overexpressed in 50–77 % of the cancer samples, with fewer genes showing upregulation in 30 to <50 % of the cancer samples [Table 3], whereas twofold or greater downregulation was observed in 57–83 % of the oral cancer samples [Table 4].
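The per-gene percentages of samples reported in Tables 3 and 4 can be illustrated with the sketch below. Because 14 cancer samples lacked a matched normal, it assumes (hypothetically) that each cancer sample is compared against the per-gene mean of the normal samples; the function name and this reference choice are assumptions, not the procedure stated in the study.

```python
import numpy as np


def percent_samples_changed(cancer_expr, normal_expr, fold=2.0):
    """Percentage of cancer samples per gene with a >= fold up or down change.

    cancer_expr : genes x cancer samples
    normal_expr : genes x normal samples
    Assumption: each cancer sample is compared with the per-gene mean of
    all normal samples (not with a matched normal).
    """
    cancer_expr = np.asarray(cancer_expr, dtype=float)
    ref = np.asarray(normal_expr, dtype=float).mean(axis=1, keepdims=True)
    ratio = cancer_expr / ref  # per-sample fold change against the reference
    pct_up = 100.0 * (ratio >= fold).mean(axis=1)
    pct_down = 100.0 * (ratio <= 1.0 / fold).mean(axis=1)
    return pct_up, pct_down


# Toy example: 2 genes, 4 cancer and 2 normal samples (values are invented)
cancer = [[2.0, 4.0, 1.0, 1.0],
          [0.4, 0.5, 1.0, 1.0]]
normal = [[1.0, 1.0],
          [1.0, 1.0]]
up, down = percent_samples_changed(cancer, normal)
print(up, down)
```

Here gene 0 is at least twofold overexpressed in 2 of 4 cancer samples (50 %) and gene 1 at least twofold downregulated in 2 of 4 (50 %). For the 16 paired samples, a per-patient ratio against the matched normal would be a natural alternative.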

Table 3 Overexpression of genes (≥2-fold) in oral cancer
Table 4 Downregulated genes (≥2-fold) in oral cancer

The biological pathways and molecular networks were analyzed using the DAVID bioinformatics tool with the gene enrichment method [22]. A high enrichment score of >12 was obtained for the differentially expressed genes. The 32 upregulated genes were functionally categorized into immune response [12, 37.5 %], signal transduction [2, 6.3 %], cell cycle and proliferation [7, 22 %], invasion and metastasis [4, 12.5 %], apoptosis [2, 6.3 %], cell differentiation [1, 3 %], angiogenesis [1, 3 %], transport [1, 3 %] and xenobiotic metabolizing enzymes [1, 3 %], whereas the 12 downregulated genes were functionally associated with cell metabolism [4, 33 %], cell differentiation [2, 17 %], signal transduction [1, 8 %], invasion and metastasis [1, 8 %], xenobiotic metabolizing enzymes [1, 8 %], cell cycle and cell proliferation [1, 8 %] and apoptosis [1, 8 %]. The function of a single gene remained unknown.

Hierarchical clustering analysis

Clustering analysis of genes was performed with GenomeStudio software [Illumina]. The heat map of unsupervised cluster analysis of cancer samples [n = 30] and clinically normal samples [n = 27] defined three clusters [Fig. 1]. The color grid representing various degrees of gene expression delineated a first cluster of 14 samples with a pattern of specific downregulated genes, a distinct second cluster of 15 samples with specific upregulated genes, and a third cluster of 29 samples with a combination of upregulated and downregulated genes [Fig. 1]. Data from a single sample were excluded from further analysis due to incomplete clinicopathological information. We observed clustering of five tongue tissues and of five buccal mucosa samples from both cancer patients and normal controls. Six of the 16 paired cancer and clinically normal samples demonstrated near-identical gene expression profiles.

Fig. 1

Heat map generated by hierarchical clustering of differentially expressed representative genes between oral cancer and control samples. The rows represent genes, and the columns represent samples. Hierarchical clustering of gene expression data for representative genes in 58 samples (oral cancer samples = 30 and control samples = 27; a single oral cancer sample was excluded from further analysis due to insufficient clinical information). The color grid indicating expression levels is shown in the right-hand side bar. The expression analysis shows four clusters: cluster 1 comprises 14 normal samples, 12 in the left panel and 2 on the extreme right panel; clusters 2 and 3 are two closely related clusters of 15 and 14 oral cancer samples; and cluster 4 comprises 13 clinically normal samples and 2 oral cancers.

Discussion

The decoding of the human genome by the Human Genome Project and the concurrent advances in biotechnology and information technology provided the basic infrastructure for high-throughput genome-wide association and expression analyses aimed at understanding the biological processes in human cancers. In the current pilot study, to examine biomarkers associated with frankly malignant tissues in the oral cavity, we investigated the expression of genes in oral cancer tissues as compared to clinically normal tissues from oral cancer patients. We observed >1.5-fold differential regulation of 425 genes in oral cancer tissues as compared to paired and unpaired clinically normal buccal mucosa from oral cancer patients. Further, twofold or greater overexpression was observed in 32 genes in 30–77 % of oral cancer tissue samples, and downregulation in 12 genes in 57–83 % of samples. A significant association of the differentially expressed genes with clinicopathological features, e.g., site, lymph node involvement, metastasis and cancer stage, could not be established because of the small sample numbers in the stratified subgroups. We observed near-identical gene expression profiles in 6 of 16 paired cancer and clinically normal samples, suggesting that the clinically normal tissues may contain initiated, potentially malignant cells as a result of field cancerization and may be at high risk of conversion to a malignant phenotype.

Biological pathway analysis of the differentially expressed genes in oral cancer, performed with the KEGG pathway database [23], demonstrated associations with critical biological pathways. Genes associated with cell metabolism and signal transduction comprised 31 % of the upregulated genes and 37 % of the downregulated genes, indicating a critical role for these genes in oral carcinogenesis. Immune response and cell cycle/cell proliferation genes together accounted for 31 % of the upregulated genes. The functions of 20 % of the differentially expressed genes are not known.

A diagrammatic representation of the upregulated genes involved in cell survival, proliferation, migration, angiogenesis and apoptosis is shown in Fig. 2. Thus, tumor necrosis factor (ligand) superfamily, member 13b (TNFSF13B) binds to its receptor, leading to the activation of BCL and BFL for cell survival [25], whereas interleukin-24 (IL-24), tumor necrosis factor α-induced protein 6 (TNFAIP6), chemokine (C-C motif) receptor 1 (CCR1) and absent in melanoma 2 (AIM2) may enhance cell proliferation by activating the JAK-STAT, MAPK and AKT pathways [24]. SPP1/osteopontin, overexpressed 3.6-fold in oral cancer tissues, has been associated with wound healing, inflammation, immune response, bone remodeling and tumorigenesis [26]. Several transcription factors, including PPRA via GLIPR1 and a combination of TCF, EP300 and β-catenin, activate WNT1 inducible signaling pathway protein 1 (WISP1), resulting in deregulation of apoptosis [25]. Further, invasion and metastasis are mediated through EGF-like module-containing mucin-like hormone receptor-like 2 (EMR2) and collagen triple helix repeat containing 1 (CTHRC1) [25]. Overexpressed chemokine (C-X-C motif) ligand 1 (CXCL1) binds to CXCR2 and promotes angiogenesis, whereas C-type lectin domain family 4, member A (CLEC4A) activates the NF-κB pathway [25] (Fig. 2). Hence, overexpression of these genes and the consequent deregulation of multiple cellular pathways may contribute to oral cancer.

Fig. 2

Upregulated genes associated with cellular pathways including cell proliferation, migration, angiogenesis, apoptosis and inflammatory immune response

Besides the dominant role of the overexpressed genes, several genes associated with cell regulation, protein transport and cell development were downregulated in 26 % of cancers, indicating disruption of these functions. Downregulation of death-associated protein-like 1 (DAPL1) leads to decreased caspase activation and thereby decreased apoptosis [24, 25]. Aldehyde dehydrogenase 3 family, member A1 (ALDH3A1) and cytochrome P450, family 4, subfamily F, polypeptide 12 (CYP4F12) are associated with detoxification, and their deregulation may lead to increased toxicity [25]. Decreased expression of glycine amidinotransferase (GATM) and beta-carotene 15,15′-monooxygenase (BCMO1), which are associated with cell metabolism, may alter biological pathways, promoting cancer development or progression [24, 25].

Several of the differentially expressed genes have previously been reported in oral squamous cell carcinomas. Thus, the 2.3-fold upregulation of IFIT2 in 57 % of our oral cancer samples is in concordance with the report by Lai and co-workers of enhanced IFIT2 expression in oral cancers as compared to matched non-cancerous oral tissues in a Taiwanese group of patients [26]. On the other hand, IFIT2 downregulation has been associated with increased invasiveness and epithelial-to-mesenchymal transition [27]. The chemokine receptor CCR1, upregulated 3.58-fold in 37 % of the oral cancer samples in our study, mediates a dual role of directional migration and local host defense against the tumor in oral cancer [28] and hepatocellular cancer [29], suggesting multiple functions of this gene in tumorigenesis.

The genes overexpressed in our oral cancer tissues have been associated with several other human cancers. FAP and AIM2 promoted an invasive phenotype in colorectal cancer [30–32], and FAP and SPP1 were overexpressed in breast cancer [33–35]; SPP1 was also associated with poor prognosis in lung cancer [36], NMMT with invasive renal cell carcinoma [37], TNFSF13B with glioblastoma [38], CTHRC1 with esophageal cancers [39] and CXCL1 with melanoma [40]. On the other hand, TMPRSS11A, downregulated in the cancer tissues in our study, has been associated with breast cancer [41], and EPHB6, a metastasis suppressor gene, is downregulated in non-small cell lung cancers [42].

The current pilot study, using high-throughput microarray analysis to differentiate clinically normal buccal mucosa from frankly malignant oral cancers, may point to predictive and prognostic biomarkers and to markers of treatment response. The expression findings should be validated in larger sets of cancer and control samples from various geographic regions using an alternative technology such as real-time PCR. The identification of the deregulated genes and cellular pathways may provide potential new treatment targets, an understanding of the mechanistic pathways and insight into oral cancer development, i.e., the transformation of normal cells and their progression to a malignant phenotype.