Introduction

Oral cancer, the most common form of head and neck cancer, has a particularly grim prognosis because of its aggressive invasion of the lips, oropharynx, and oral cavity [1]. Cancers of the gingiva, floor of the mouth, tongue, and palate make up the oral squamous cell carcinoma (OSCC) spectrum. Leukoplakia, erythroplakia, and submucous fibrosis are examples of precancerous lesions that are associated with the advanced phases of OSCC [2]. The high incidence and delayed diagnosis of OSCC are attributed primarily to etiological factors (such as tobacco use and alcohol consumption), host genetic factors, infection with human papillomavirus and Epstein–Barr virus, and environmental factors [3]. Tobacco use places a disproportionate burden on the health systems of the world's poorest nations [4]. In India, tobacco use, predominantly in smokeless form, accounts for the largest number of oral cancer cases [5]. Tobacco use causes 45% of all cancers in men and 20% of all cancers in women in India [6]. According to the second Global Adult Tobacco Survey (GATS 2), conducted in 2016–2017, 28.6% of all adults in India use tobacco, including 42.4% of men and 14.2% of women [7]. Although tobacco use in India has declined significantly, the number of people diagnosed with oral cancer has risen steadily. According to GLOBOCAN 2020, India recorded 135,929 new cases of oral cancer, excluding oropharyngeal cancer, of which 104,661 were in men and 31,268 in women [8]. Figure 1 depicts OSCC cases over the past 14 years and projections for 2025 based on incidence and prevalence.

Fig. 1

Incidence and prevalence of OSCC cases in the last 14 years and prediction for 2025 based on previous trends [9,10,11,12,13,14]
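The caption does not state how the 2025 values were projected. As a minimal sketch, assuming an ordinary least-squares linear trend fitted to yearly counts (the method actually used for Figure 1 may differ), the extrapolation can be written as follows. The function name `linear_trend_forecast` is hypothetical, and the example numbers are synthetic, not the data plotted in Figure 1.

```python
from typing import Sequence


def linear_trend_forecast(years: Sequence[int], counts: Sequence[float], target_year: int) -> float:
    """Fit counts = intercept + slope * year by least squares and extrapolate.

    Hypothetical helper: assumes a straight-line trend, which is only one of
    several ways a projection like the one in Figure 1 could be made.
    """
    n = len(years)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two (year, count) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


if __name__ == "__main__":
    # Synthetic 14-year series that rises by exactly 50 cases per year.
    yrs = list(range(2006, 2020))
    cases = [1000 + 50 * (y - 2006) for y in yrs]
    print(linear_trend_forecast(yrs, cases, 2025))
```

With the synthetic series above, the script prints 1950.0; a real projection would also report uncertainty, which this sketch omits.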

Tobacco-induced alterations

Tobacco is consumed in a variety of ways, most commonly by smoking and in smokeless forms such as chewing tobacco and dip [15]. In India, smokeless tobacco (SLT) products are used far more widely than smoked products [16]. Smoked products include cigarettes, beedis, hookah, and chutta, among others. Tobacco consumption alone can cause or contribute to a number of cancers and cardiovascular diseases [15]. Oral cancer typically develops from potentially premalignant oral epithelial lesions, such as leukoplakia, erythroplakia, submucous fibrosis, and lichen planus [17]. The reported rate of malignant transformation of leukoplakia ranges from 0.13 to 34% [18]. Even though tobacco use is the major contributor to oral cancer, the molecular changes it causes are poorly understood [19]. The International Agency for Research on Cancer (IARC) classifies smokeless tobacco as a Group 1 carcinogen (carcinogenic to humans). Tobacco contains tobacco-specific N-nitrosamines (TSNAs), namely 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), nitrosonornicotine (NNN), nitrosoanatabine (NAT), and nitrosoanabasine (NAB) [8]. India has the second-highest number of tobacco users in the world, 229 million, after China with 311 million [20]. Chronic exposure to SLT products is harmful because they contain nicotine, which is highly addictive, and TSNAs, which are hazardous carcinogens [21]. Smokeless tobacco products are well established to contain toxicants and carcinogens, including nicotine, N-nitrosamino acids, volatile N-nitrosamines, aldehydes (formaldehyde and acetaldehyde), hydrocarbons, and polonium-210 [22]. Most of the nicotine consumed by humans is metabolized to cotinine.
Cotinine is the major metabolite and a specific marker of nicotine exposure, and its concentration depends on the rate of nicotine metabolism. Nicotine from smokeless tobacco is absorbed primarily through the mucous membranes. Absorption begins slowly, accelerates to a peak after about five minutes, and slows after thirty minutes even though the tobacco is still in the mouth [23]. Superoxide anion and hydroperoxides are the main sources of nicotine-induced free radicals and serve as markers of oxidative stress [24]. Evidence suggests that smoke-free legislation, higher taxes on smoking, and high social acceptance during working hours are positively associated with smokeless tobacco consumption among adolescent males [25].

DNA adduct formation

Tobacco products contain an extremely diverse array of chemicals. Some of these bind directly to DNA, whereas others must first be converted to reactive intermediates [26]. The resulting products of the interaction between DNA and these chemicals are called DNA adducts [26]. Mutations caused by DNA adducts can have far-reaching effects on the body's ability to maintain homeostasis across its many regulatory systems [27]. Figure 2 is a simplified depiction of the effects of tobacco use on the oral mucosa. DNA adducts can be grouped into several categories: methyl, pyridyloxobutyl, pyridylhydroxybutyl, and aldehyde DNA adducts; DNA phosphate and DNA base adducts; bulky/aromatic adducts; adducts formed from aromatic and heterocyclic aromatic amines; adducts formed by methylating and ethylating agents; adducts derived from 1,3-butadiene, formaldehyde, acetaldehyde, crotonaldehyde, and acrylamide; and adducts formed by reactive oxygen species generated by tobacco smoke [28, 29]. NNK, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone, is extensively metabolized to another potent pulmonary carcinogen, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), and both compounds can form DNA adducts [27]. These adducts are genotoxic and can cause mutations in multiple genes. For example, among methyl adducts, O6-mdGuo causes GC-to-AT transitions; 7-mdGuo causes base substitutions and nucleotide deletions; O4-mdThd causes TA-to-CG or AT-to-TA mutations; and O2-mdThd causes TA-to-AT mutations. Pyridyloxobutyl DNA adducts such as O6-pobdGuo, and aldehyde DNA adducts, can cause AT-to-TA, AT-to-CG, and GC-to-TA mutations [29].
NNK and other carcinogens in tobacco products, such as nitrosamines, polycyclic aromatic hydrocarbons (PAHs), aromatic amines, and reactive oxygen species (ROS), induce GC-to-TA mutations at codon 12 of K-ras [26]. In a prototypical activation pathway for nitrosamines, cytochrome P450 catalyzes hydroxylation of the methyl carbon, yielding the unstable intermediate hydroxymethyl NNK, which spontaneously loses formaldehyde to form a diazohydroxide. This decomposes to diazonium ions, which bind to DNA to form adducts [27]. DNA adducts can be studied and measured in various human tissues and fluids, for example, leukocytes in blood, buccal cells in saliva, and exfoliated epithelial cells in urine [30]. Examples of DNA adducts and their target genes are shown in Table 1. NNK can be detected in the urine of all types of tobacco users, including smokers and users of smokeless tobacco [27]. In the urine of smokers, 3-methyladenine is the predominant adduct, whereas in non-smokers, 7-methylguanine and 1-methyladenine are more likely to be found [31]. DNA adducts are typically quantified to determine the carcinogenicity of various adducts [32]. A wide variety of analytical techniques can be used, including 32P-postlabelling assays, immunoassays, electrochemical detection, MS-based methods with electrospray ionization (ESI) and nanoESI, triple quadrupole (QqQ) MS, ion trap (IT) MS, high-resolution MS instruments, and front-end ion mobility for rapid separation and quantification of DNA adducts [26].
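The mutation classes named above follow standard nomenclature: a transition replaces a purine with a purine (or a pyrimidine with a pyrimidine), as in the GC-to-AT change attributed to O6-mdGuo, whereas a transversion swaps a purine for a pyrimidine or vice versa, as in the GC-to-TA change at K-ras codon 12. The sketch below makes that rule explicit; the function name `classify_substitution` is hypothetical, and it assumes each base pair is written top strand first, as in the text (G:C written "GC").

```python
PURINES = frozenset("AG")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def classify_substitution(ref_pair: str, alt_pair: str) -> str:
    """Classify a base-pair substitution such as "GC" -> "AT" (G:C to A:T).

    Hypothetical helper for illustration; it only inspects the top-strand base.
    """
    for pair in (ref_pair, alt_pair):
        # Each pair must be a Watson-Crick pair written top strand first.
        if len(pair) != 2 or COMPLEMENT.get(pair[0]) != pair[1]:
            raise ValueError(f"not a Watson-Crick base pair: {pair!r}")
    ref, alt = ref_pair[0], alt_pair[0]
    if ref == alt:
        raise ValueError("reference and alternate base pairs are identical")
    # Purine -> purine or pyrimidine -> pyrimidine is a transition.
    if (ref in PURINES) == (alt in PURINES):
        return "transition"
    return "transversion"
```

For example, `classify_substitution("GC", "AT")` returns `"transition"` and `classify_substitution("GC", "TA")` returns `"transversion"`.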

Fig. 2

A graphical representation of the effects of tobacco-specific nitrosamines on the oral mucosa. Tobacco-specific nitrosamines, such as 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK), nitrosonornicotine (NNN), nitrosoanatabine (NAT), and nitrosoanabasine (NAB), can cause DNA adducts and epigenetic alterations. Both impaired repair mechanisms and an excessively high rate of DNA damage can contribute to the development and accumulation of mutations. Mutations in several oncogenes and tumor suppressor genes, including tumor protein p53 (TP53), A-kinase anchoring protein 9 (Akap9), and Ankyrin repeat and PH domain 2 (Arap2), together with methylation of death-associated protein kinase (DAPK), O-6-methylguanine-DNA methyltransferase (MGMT), adenomatous polyposis coli (APC), and Survivin, have been linked to cancer of the oral cavity

Table 1 DNA adducts and target genes

Non-tobacco risk factors

Alcohol

Chronic alcohol consumption increases the risk of several cancers, including oral cancer [36]. Heavy alcohol consumption is an established risk factor that accounts for 2–4% of all cancers [36]. Bolesina et al. reported that heavy alcohol consumers have higher levels of toll-like receptor 9 (TLR9), which promotes inflammation and tumor progression [37]. Alcohol is well known to act synergistically with tobacco in the onset and progression of oral carcinoma, and concomitant alcohol consumption and smoking are associated with a higher prevalence of advanced clinical stage [34].

Candida

Candida albicans infection has traditionally been regarded as an opportunistic complication of cancer, but recent evidence suggests that it may actively promote cancer [38, 39]. Although the infection is associated with OSCC, further clarification is needed on whether Candida is involved in the genesis and progression of oral cancers or whether the tumors promote fungal growth [40]. Saxena et al. reported a shift toward non-albicans Candida species, with the highest prevalence in OSCC patients, followed by smokeless tobacco users and then non-users [41]. Other experimental studies linked Candida with the induction of oral leukoplakia and malignant transformation [42]. In vivo and in vitro studies by Vadovics et al. reported that Candida upregulates oncogenes, potentiates a premalignant phenotype, and is involved in both early and late stages of the malignant promotion and progression of oral cancer [40]. Therefore, identifying the dysregulation caused by Candida and developing therapeutic approaches against it could improve patient outcomes.

Viral infections

Globally, human papillomavirus (HPV) infection is found in 6–58% of OSCC cases and Epstein–Barr virus (EBV) infection in 25.9–82.5% [43, 44]. Infection with EBV alone can increase the likelihood of OSCC 2.5- to 5-fold [45, 46].

HPV is a dsDNA virus with a genome of approximately 8 kb and more than 100 types, a subset of which are oncogenic; these viruses synthesize oncoproteins, such as E6 and E7, that suppress p53 and Rb and thus disable tumor suppressor activity [47]. Although HPV types 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, and 59 are associated with OSCC, HPV 16 alone accounts for 90% of HPV-associated OSCC [48, 49]. The E6 protein facilitates proteasomal degradation of p53, whereas the E7 protein targets Rb, causing excessive release of the E2F transcription factor and, in turn, excessive cell growth [50]. HPV-associated OSCC carries more alterations in DNA repair genes than non-HPV-associated OSCC [51]. HPV affects the host by integrating into the genome, downregulating tumor suppressor activity, amplifying DNA, and generating altered transcripts [52]. HPV-positive OSCC shows higher p16 expression, greater CD8+ T-cell infiltration, and dysregulated IL-8, which together reduce neutrophil infiltration [53]. One of the most affected miRNAs in HPV-positive OSCC, with perturbed promoter activity, is miR-181 [54]. HPV can be detected in OSCC patients by p16 immunohistochemistry (IHC), HPV DNA in situ hybridization (ISH), E6/E7 HPV RNA-ISH, HPV DNA polymerase chain reaction (PCR), and E6/E7 HPV RT-PCR [52]. Patients with HPV-positive OSCC who smoke have five times more distant metastasis and more metastatic sites than patients with HPV-negative OSCC [51, 55]. The risk of HPV-associated OSCC increases with the frequency and duration of oral sex, vaginal sex, and oral–anal contact, and with the number of oral sexual partners [49]. B cells recognize HPV antigens and, with help from recruited TH2 cells, produce antibodies against them, chiefly IgG, IgA, and IgM [56].
Figure 3 is a visual representation of how human papillomavirus and Epstein-Barr virus fuel the development of oral squamous cell carcinoma tumors.

Fig. 3

A graphical representation of the roles of HPV and EBV in the development of OSCC tumors. The viral E6 protein promotes degradation of p53, with increased murine double minute 2 (MDM2)-mediated ubiquitination. The HPV E7 protein downregulates retinoblastoma protein, which in turn causes excessive release of E2 factor (E2F), driving cell proliferation. Epstein–Barr nuclear antigen 1 (EBNA1) inhibits B-cell leukemia/lymphoma 2 protein (Bcl2), resulting in decreased apoptosis in cancerous cells

Epstein–Barr virus (EBV) is an oncogenic human herpesvirus with a dsDNA genome that causes lifelong persistent infection, although the infection is generally harmless [57]. The virus has a biphasic life cycle and maintains its genome as an extrachromosomal episome; the oral cavity is the primary route of EBV transmission, and the virus is present in the saliva of infected individuals throughout their lifetime [58]. EBV-associated OSCC shows high expression of latent membrane protein 1 (LMP-1), a protein associated with cell transformation. LMP-1 affects several signaling pathways, including JAK-STAT, ERK-MAPK, JNK-p38, PI3K-AKT, and NF-κB, ultimately promoting cell proliferation, resistance to apoptosis, angiogenesis, and metastasis [59]. EBV-associated OSCC also expresses very high levels of inactive p53 (p53i), an inactivation caused by EBV nuclear antigen (EBNA) protein; as a result, bcl-2 activation is limited or absent, and c-myc expression is increased [60]. Infected individuals also show decreased antioxidant enzyme activity [61].

According to a recent study, coinfection with HPV and EBV carries a higher risk of oral cancer than infection with either virus alone [62].

Comprehensive mutational analysis in OSCC

OSCC is characterized by many mutated genes. DNA changes can arise from the disease process itself or from mutagenic exposures. Mutations can be grouped by their associated factors, including tobacco use, HPV or EBV infection, alcohol consumption, age, and sex. Tobacco consumption can be further divided into smoking and smokeless tobacco use. All mutated genes reported in the primary studies in The Cancer Genome Atlas (TCGA) are listed in the supplementary files. The OSCC mutation spectrum is shown in Table 2.

Table 2 Mutational spectrum of OSCC

OSCC has a remarkably high rate of somatic mutations. The most frequently reported mutations in India are in FAT1, NOTCH1, CDKN2A, FAT2, LRP1B, TP53, CASP8, PTEN, EGFR3, and ARID2, along with the co-occurring combinations TP53-CASP8, PTEN-LRP1, and TP53-ATRX. Samples from China, Saudi Arabia, and the United States additionally showed BRAF, NSD1, NRAS, HRAS, and CREBBP mutations, as well as mutations in other genes such as CSMD3, KMT2A, SMARCA4, PABPC1, NOTCH2, DNMT3A, NF1, FANCD2, ZFHX3, STAG2, MYH1, and CDKW. Figure 4 maps the chromosomal loci of these mutations.
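The text does not describe how per-gene frequencies or co-occurring combinations such as TP53-CASP8 were tallied. A minimal sketch, assuming per-sample sets of mutated genes have already been extracted (for example from TCGA mutation files), is shown below. The function names and sample identifiers are hypothetical, and the code reports raw counts only, without the significance testing (for example Fisher's exact test) that a real co-occurrence analysis would need.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Mapping


def mutation_frequencies(calls: Mapping[str, Iterable[str]]) -> Counter:
    """Count the samples in which each gene carries at least one mutation."""
    # set() ensures a gene mutated twice in one sample is counted once.
    return Counter(gene for genes in calls.values() for gene in set(genes))


def co_mutation_counts(calls: Mapping[str, Iterable[str]]) -> Counter:
    """Count the samples in which each unordered gene pair is co-mutated.

    Pairs are stored alphabetically, e.g. ("CASP8", "TP53").
    """
    pairs: Counter = Counter()
    for genes in calls.values():
        for a, b in combinations(sorted(set(genes)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical samples, not real patient data.
    calls = {
        "S1": {"TP53", "CASP8", "FAT1"},
        "S2": {"TP53", "CASP8"},
        "S3": {"TP53", "NOTCH1"},
    }
    print(mutation_frequencies(calls).most_common(2))
    print(co_mutation_counts(calls)[("CASP8", "TP53")])
```

Sorting each sample's genes before pairing means a combination is always looked up in alphabetical order, regardless of how it is written in the text.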

Fig. 4

Chromosome maps showing the localization of mutations at different loci. White lines on the chromosomes indicate the presence of a mutation. “The construction of this diagram is based on data generated by the TCGA Research Network: https://www.cancer.gov/tcga”. The supplementary file containing the locations and gene names is attached herein (Annexure I)

Tobacco-induced mutations

In oral cancers, the most common mutation is in the TP53 tumor suppressor gene, which encodes the p53 protein that maintains genomic stability and regulates apoptosis and other cellular functions [64, 70]. Mutations in TP53 prevent p53 from performing its normal functions [71]. p53-null murine oral carcinoma cell lines carried mutations in Akap9, Arap2, Cdh11, Hjurp, Mroh2a, Muc4, Muc6, Sp110, and Sp140, and also showed stemness markers and loss of E-cadherin expression [72]. Germline polymorphisms in CDKN2A increase susceptibility to tobacco carcinogens and thus further raise the risk of tobacco-induced tumorigenesis [73]. Hras, an isoform of RAS, was first recognized as an oncogene in chemically induced squamous cell carcinoma (SCC), and Hras mutations are now known to be prominent in various SCCs [74]. The combined effect of Hras and TP53 mutations contributes to the dismal prognosis of oral cancer [75]. A study conducted in Indonesia found that p16 protein expression is significantly reduced in smokers [76].

Non-tobacco-induced mutations

Mutations have also been reported in genes that are not conventional oncogenes or tumor suppressor genes, such as CASP8, FAT1, and Notch1 [77, 78]. Because it can activate p53, Notch1 acts as a tumor suppressor in a variety of cancers, including hepatocellular carcinoma and lung cancer. Notch1 overexpression has been found in cutaneous squamous cell carcinomas [78].

EP300, ARID, KMT2D, PTEN, NSD1, and FGFR3 are frequently mutated in HPV-induced OSCC [79, 80]. Mutation of EP300 inactivates its protein product. The HPV E6 oncoprotein prevents EP300-mediated acetylation of TP53, which initiates TP53 degradation by MDM2 [79]. The lysine methyltransferase KMT2D carries many mutations; these are associated with upregulated transcriptional activity of CTNNB1 through cooperation with MEF2A, thereby increasing WNT signaling [79,80,81]. Mutations in ARID and other somatic drivers are attributed to viral integration and transfer, as well as to the transcription of genes such as ARID [82]. Hypermethylation of the promoter of TSC2, a tumor suppressor gene in the mTOR signaling pathway, leads to its deregulation [83]. PTEN is also mutated in HPV-positive OSCC, most often at residue R130 in the phosphatase domain [79]. Mutation of NSD1, another lysine methyltransferase, reduces the production of pro-inflammatory chemokines [79]. FGFR3 encodes targetable proteins, and activating mutations in this gene further activate the PIK3CA/PTEN pathway [79]. In non-tobacco users, OSCC arises from mutations in numerous additional genes.

Epigenetic changes drive oral cancer

Epigenetic mechanisms include DNA methylation, histone modification, and microRNA-mediated post-transcriptional gene downregulation. These modifications can alter gene expression and function, activating oncogenes and inactivating tumor suppressor genes, without changing the underlying DNA sequence [84, 85]. In CpG islands (CpGIs) within the promoter regions of genes such as tumor suppressor genes and proto-oncogenes, cytosine can be methylated to 5-methylcytosine; this DNA methylation leads to chromosomal abnormalities, aberrant gene expression, and atypical function of both tumor suppressor genes and proto-oncogenes [86]. A subset of enzymes known as DNA methyltransferases adds methyl groups to DNA [87].
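CpG islands are referred to above without a definition. A common operational definition is the one proposed by Gardiner-Garden and Frommer: a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG dinucleotides × length) / (number of C × number of G). The sketch below checks a single window against those thresholds; the function names are hypothetical, and genome-wide tools scan sliding windows and often apply stricter variants of the criteria.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    # "CG" occurrences cannot overlap, so str.count gives the CpG count.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_length: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds to a single window."""
    if len(seq) < min_length:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A GC-rich window can still fail the test: 150 C's followed by 150 G's has 100% GC content but only one CpG dinucleotide, so its observed/expected ratio falls far below 0.6. This is why the ratio, not GC content alone, identifies CpG-dense promoter regions.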

Many non-tobacco risk factors can contribute to epigenetic modification of DNA; one example is alcohol consumption. Alcohol consumption causes histone modification and DNA methylation and is therefore a major contributor to OSCC. Two components of alcohol are responsible for these modifications: ethanol and its metabolite, acetaldehyde. Other mechanisms, such as transmethylation reactions and changes in folate metabolism, can also enhance DNA methylation activity [88]. Alcohol consumption also causes DNA hypomethylation, which leads to aberrant changes in oncogene and tumor suppressor gene expression [89]. Different alcoholic drinks contain different amounts of ethanol, which is oxidized to acetaldehyde; acetaldehyde is genotoxic and can suppress tumor suppressor genes and drive overexpression of oncogenes [86]. One study showed that alcohol consumption alters two key long non-coding RNAs, lncPSD4-1 and lnc-NETO1-1, thereby promoting carcinoma [90]. Alcohol consumption also induces expression of the anti-apoptotic gene Bcl-2 through the expression of miR-30a and miR-934 [91]. Other factors that contribute to the progression of oral cancer through epigenetic changes include diet and nutrition, environmental factors such as viral, fungal, and bacterial infections, occupational risks, poor oral health, genetic factors, and age [86].

Epigenetic modification of promoter regions alters the expression of many genes, which in turn contributes to OSCC. These genes are listed in Table 3.

Table 3 Methylation status of different genes in OSCC

Epigenetic changes also include histone modification, which plays a significant role in the progression of OSCC. Histone acetylation and histone methylation both alter the expression of various genes. Histone deacetylases can abruptly change transcription rates, causing many genes to function improperly [92].

Changes in the oral microbiome in OSCC

Distinct shifts in the relative abundance of individual oral bacteria suggest that particular combinations of bacteria could serve as markers for OSCC diagnosis. In addition, oral bacteria such as Porphyromonas gingivalis and Fusobacterium nucleatum can participate in many cancer-promoting pathways, contributing to the growth of OSCC [93].

Significance of non-invasive and liquid biopsies

Recent biomarker research has explored many noninvasive methods for detecting OSCC at its earliest stages. Noninvasive samples such as saliva, brush biopsies, and plasma can be used for testing, and early biomarkers discovered in these samples, especially saliva, have proven highly useful for disease management [94]. As previously stated, risk factors can cause genetic mutations in addition to interfering with multiple pathways involved in cellular and metabolic functions. Saliva contains a wide variety of molecules, including cytokines, proteins, RNAs, extracellular non-coding RNAs, DNA, and microbial metabolites [35, 95]. Salivary proteomic biomarkers can detect tumor relapse, severe dysplasia, virus-induced carcinogenesis, lymphatic metastasis, plasma markers in OSCC, and treatment response [47]. Although many candidate salivary biomarkers have been identified, only a subset is thought to reliably differentiate OSCC from premalignant states. The most significant markers include cathepsin B, cyclin D1, interleukin (IL)-1β, IL-6, IL-8, tumor necrosis factor (TNF), complement factor H, defensins, carbonic anhydrase 2, matrix metalloproteinase 9 (MMP9), PF-4, 8-OHdG, transferrin, M2BP, MRP14, CD59, profilin, and telomerase [47, 96].
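Whether a salivary marker reliably separates OSCC from premalignant states is usually expressed as sensitivity (the fraction of OSCC samples called positive) and specificity (the fraction of premalignant samples called negative) at a chosen cutoff. The sketch below assumes a single marker whose higher values indicate OSCC. The function name and the example values are hypothetical, and real validation would use ROC analysis across cutoffs and independent cohorts rather than one fixed threshold.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    oscc_levels: Sequence[float],
    premalignant_levels: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when values >= cutoff are called OSCC."""
    if not oscc_levels or not premalignant_levels:
        raise ValueError("both groups need at least one measurement")
    # True positives: OSCC samples at or above the cutoff.
    tp = sum(1 for v in oscc_levels if v >= cutoff)
    # True negatives: premalignant samples below the cutoff.
    tn = sum(1 for v in premalignant_levels if v < cutoff)
    return tp / len(oscc_levels), tn / len(premalignant_levels)
```

With hypothetical levels `[5, 6, 7, 2]` for OSCC and `[1, 2, 3, 6]` for premalignant samples at a cutoff of 4, the function returns `(0.75, 0.75)`: one OSCC sample is missed and one premalignant sample is misclassified.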

Conclusions

Oral cancer tumorigenesis is attributed to genetics and a variety of risk factors, such as tobacco use, alcohol consumption, and HPV and Epstein–Barr virus infection. The poor prognosis of the disease results from the molecular alterations caused by these factors. Despite clinical differences in presentation, the molecular landscapes of HPV-positive and HPV-negative cancer appear to be intertwined. Biomarkers have recently been the focus of research, but it is equally important to pursue new approaches to individualized treatment. As knowledge of the disease has grown, so has the need for more thorough research to identify the key drivers of tumorigenesis in ethnically diverse populations. Studies have shown that a variety of salivary proteomic biomarkers can serve as indicators of disease onset and progression. In saliva samples, detecting substance-specific deregulated genes and their downstream products can provide crucial information about the outcome of emerging clinicopathological conditions. These methods may help develop more personalized treatment options for patients of various ethnicities, resulting in improved treatment outcomes and better disease management.