Background

Oral squamous cell carcinoma (OSCC) has a high rate of morbidity and mortality worldwide.1,2,3,4,5 Around 30–50% of patients with OSCC die from the disease within 5 years, and survival rates have not improved over many decades.2,5 Such adverse outcomes have mostly been attributed to late presentation of the disease, as early-stage disease can be cured with effective treatment.1,2,6 Early detection of OSCC is feasible as it is usually preceded by clinically identifiable lesions termed ‘oral potentially malignant disorders’ (OPMD).1,2,7,8

OPMD are defined as clinical disorders having an increased risk of developing OSCC compared to clinically ‘normal’ oral mucosa.1,7 The majority of OPMD do not transform to cancer; consequently, the challenge is identifying those lesions that are most likely to undergo malignant transformation.9,10,11,12

Clinical and histopathological features, though informative, are not very accurate in predicting the clinical behaviour of these lesions.13 Nevertheless, the presence and grade of oral epithelial dysplasia (OED) is currently considered to be the most useful indicator of malignant transformation in OPMD and provides the basis for patient stratification endorsed by the World Health Organization.1 A systematic review and meta-analysis indicates that excision of oral dysplastic lesions reduces the risk of malignant transformation by ~3-fold.11 Generally, severe epithelial dysplasia or high-grade epithelial dysplasia is treated empirically by surgical excision;14,15,16 however, it is not clear how patient outcomes can be improved across all grades of dysplasia and in those patients with non-dysplastic OPMD. Currently, it is unknown whether all OPMD should be excised or if only certain lesions benefit from a surgical intervention.

Numerous studies have assessed the prognostic ability of various biomarkers in OPMD; however, no molecular test has proved to be particularly useful in clinical practice.17,18,19,20,21,22,23 Discovering a molecular signature that is altered in OPMD and indicative of the progression to oral cancer could facilitate personalised management protocols for individual patients.

Contemporary gene expression profiling is being used to develop prognostic and predictive gene signatures in various cancers, including head and neck cancers.24,25 A study by Saintigny et al. (2011) proposed a gene expression-based prediction model for OPMD that showed superior prognostic accuracy when compared to models using clinico-pathological risk factors alone.26 However, the patients in their study were enrolled in a clinical trial in which some patients received active intervention in the form of drugs that may have influenced clinical outcome and gene expression.26 Furthermore, the findings of their study have yet to be validated.

Whilst formalin-fixed paraffin-embedded (FFPE) tissue is an invaluable resource linked to longitudinal disease-related outcome, it is often not possible to extract adequate amounts of high-quality nucleic acid for downstream analysis. A novel gene expression profiling system that relies on direct measurement of transcripts using colour-coded oligonucleotide probes producing molecular barcodes, the NanoString nCounter platform (NanoString Technologies, Seattle, USA), has been able to provide accurate gene expression data using RNA obtained from FFPE material.27,28 Recent studies have shown that mRNA expression analysis using the NanoString platform was equivalent to that achieved through quantitative real-time polymerase chain reaction (qPCR) and possibly superior to microarrays.27,28,29,30,31 Furthermore, the Prosigna™ breast cancer prognostic gene-signature assay is based on NanoString technology and is approved by the US Food and Drug Administration and recommended by the UK National Institute for Health and Care Excellence. The test is used to guide adjuvant chemotherapy decisions for women with oestrogen receptor-positive, human epidermal growth factor receptor 2-negative and lymph node-negative early breast cancer.

Despite the global health burden and relatively poor prognosis associated with OSCC, a robust prognostic biomarker or prognostic model for predicting malignant transformation in OPMD has yet to be developed and validated. This study was undertaken to discover and then validate a transcriptomic-signature that identifies OPMD with a high risk of undergoing malignant transformation using FFPE-derived RNA analysed on the NanoString nCounter platform.

Methods

Inclusion and exclusion criteria

Consecutive OPMD cases were identified from a database at Newcastle University. Cases with any one of the following characteristics were excluded: (i) patients with hereditary conditions that are linked to an increased risk of head and neck SCC (such as ataxia telangiectasia, xeroderma pigmentosum, Fanconi anaemia); (ii) history of head and neck cancer; (iii) history of radiotherapy to the head and neck region; (iv) patients that were diagnosed as having chronic hyperplastic candidosis/chronic candidosis.

OPMD were classified as having undergone malignant transformation (MT) when there was progression from an OPMD to OSCC after a period of 6 months or more from the time of initial diagnosis. Those patients with OPMD who were recorded as not having developed OSCC at their last known follow-up appointment were classified as non-transforming (NT) cases with the caveat that the patients were followed up for at least 12 months after diagnosis. All cases were assessed for high-risk human papillomavirus (HR-HPV), and positive cases were excluded from the study.
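The outcome definitions above can be expressed as a small decision rule. The following Python sketch is illustrative only: the function names, the whole-calendar-month counting and the handling of cases that meet neither definition (returned as `None`) are assumptions, not details reported in this study.

```python
from datetime import date
from typing import Optional

MIN_MONTHS_TO_TRANSFORMATION = 6   # from the text: OSCC arising 6 months or more after diagnosis
MIN_MONTHS_FOLLOW_UP = 12          # from the text: NT cases followed up for at least 12 months


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (hypothetical helper)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def classify_case(diagnosis: date, last_event: date, developed_oscc: bool,
                  hr_hpv_positive: bool = False) -> Optional[str]:
    """Return 'MT', 'NT', or None when the case meets neither definition.

    last_event is the date of malignant transformation, or the date of the
    last follow-up appointment when no OSCC was recorded.
    """
    if hr_hpv_positive:
        return None  # HR-HPV-positive cases were excluded from the study
    elapsed = months_between(diagnosis, last_event)
    if developed_oscc:
        return "MT" if elapsed >= MIN_MONTHS_TO_TRANSFORMATION else None
    return "NT" if elapsed >= MIN_MONTHS_FOLLOW_UP else None
```

For example, a lesion diagnosed on 15 January 2010 with OSCC recorded on 1 September 2010 would be labelled MT under these assumptions, whereas the same lesion with no OSCC and a last appointment on 31 December 2010 would meet neither definition.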

Patients

Patients were selected from a database containing patients from two different hospitals: (i) Newcastle upon Tyne Hospitals NHS Foundation Trust; and (ii) City Hospitals Sunderland NHS Foundation Trust. Patients from Newcastle Hospitals were selected as the ‘training set’ while patients from Sunderland Hospitals were selected for the ‘test set’.

Clinico-pathological data

Demographic and clinico-pathological features as well as outcome data were recorded for all cases. The following data points were collected and entered into a Microsoft Excel spreadsheet: (i) age at first diagnosis of OPMD; (ii) sex; (iii) clinical diagnosis of lesion; (iv) clinical outcome of OPMD; (v) date of malignant transformation or last follow-up; (vi) World Health Organization (WHO) 2017 OED grading; (vii) binary OED grading.

OED grading was performed following a modified three-tier system adapted from the work published by Speight et al.32 The cases were graded using two different classification systems: (i) WHO 2017 (mild, moderate or severe);1 (ii) binary (low-grade or high-grade).1,33 All data were coded, link-anonymised and stored in password-protected computer files.

RNA extraction

Ten-micrometre sections were cut from the FFPE blocks and placed in 2 ml microcentrifuge tubes after discarding the first two sections. Whole sections that included both epithelium and underlying connective tissue were used. The number of sections per sample was dependent on the size of the FFPE tissue; as a guide, four sections were taken for small samples (<5 mm of epithelium), three for medium samples (5–10 mm) and two for larger samples (>10 mm). If the amount of RNA extracted was not sufficient, RNA extraction was repeated using an increased number of sections. RNA extraction was performed using the RNeasy® FFPE kit (QIAGEN, Manchester, UK) according to the manufacturer’s protocol. FFPE blocks were sectioned immediately before RNA extraction. The concentration and the quality of the isolated RNA were measured using a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Swindon, UK). RNA was diluted to 150 ng/μL, aliquoted and stored in a −80 °C freezer prior to NanoString assay. RNA with a 260/280 ratio of 1.7–2.3 and a 260/230 ratio in the range of 1.8–2.3 was considered to be of acceptable quality for downstream assays.34 RNA content for all samples was normalised to 30 ng/μL, and 150 ng of total RNA per sample was used for the assay.
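The purity thresholds and input mass quoted above lend themselves to a simple check. The Python sketch below is a minimal illustration; the function names are hypothetical, and treating the range limits as inclusive is an assumption.

```python
def rna_passes_qc(a260_280: float, a260_230: float) -> bool:
    """Apply the purity windows quoted in the text to NanoDrop absorbance ratios.

    The limits are treated as inclusive, which is an assumption.
    """
    return 1.7 <= a260_280 <= 2.3 and 1.8 <= a260_230 <= 2.3


def input_volume_ul(target_mass_ng: float = 150.0,
                    concentration_ng_per_ul: float = 30.0) -> float:
    """Volume of normalised RNA needed to load the target mass into the assay."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_mass_ng / concentration_ng_per_ul
```

With the defaults taken from the text (150 ng of total RNA at 30 ng/μL), `input_volume_ul()` gives 5 μL per sample.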

NanoString nCounter customised panel

A list of target genes for the NanoString nCounter Customised Panel (42 genes; 38 target and 4 housekeeping genes) was compiled based on the results of previous experiments: a whole-transcriptome analysis with total RNA sequencing (RNA-Seq),35 results of previous differential gene expression work performed by our group and review of published literature. The selection of candidate genes was discussed and finalised through consensus by the authors; the gene list is shown in Supplementary Table 1. Housekeeping/internal reference genes were selected on the basis of low variation and even coverage across samples.36,37,38,39

NanoString nCounter hybridisation

The NanoString nCounter platform uses hybridisation of short length probes (35- to 50-base sequence) that are subsequently fixed to a biotin-coated cartridge, which is then digitally imaged and counted to quantify mRNA expression. In-depth details regarding NanoString technology can be obtained from Geiss et al.27 The NanoString assay was carried out at the Newcastle NanoString Unit, Newcastle University, using the nCounter MAX/FLEX system (NanoString Technologies, Seattle, Washington, USA). Each assay comes with engineered External RNA Controls Consortium (ERCC) synthetic internal negative and positive control probes. The summarised laboratory workflow for the Customised CodeSet Panel gene expression assay according to the manufacturer’s protocol is outlined in the Supplementary Methods.34

Normalisation of data and development of prognostic gene signature

NanoString codeset profiling data were pre-processed using the R package NanoStringNorm version 1.2.1. Data were assessed for batch effects using the R package FactoMineR version 1.39. Data were normalised using a grid search over the parameter space as detailed previously,40 resulting in the choice of parameters: ‘geometric mean’ of positive controls, ‘mean’ of negative controls and ‘geometric mean’ of top genes, and finally log2 transformed. Genes with zero counts in >50% of samples were removed from subsequent analyses. This resulted in the removal of five genes: CDKN2A, MMP1, DSPP, CERS1 and IBSP. All visualisations were generated in the R statistical environment version 3.6.1.
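To make the order of the normalisation steps concrete, the following Python sketch walks through them on raw counts. It is a simplified illustration and not the NanoStringNorm implementation: the function names, the number of top genes (`n_top`), flooring background-subtracted counts at zero and the pseudocount of 1 before the log2 transform are all assumptions.

```python
import math
from statistics import mean


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(mean([math.log(v) for v in values]))


def scaling_factors(per_sample_summary):
    """Scale each sample so its summary value matches the cross-sample average."""
    target = mean(per_sample_summary)
    return [target / s for s in per_sample_summary]


def normalise(counts, positive, negative, n_top=2):
    """Each argument maps a probe name to a list of raw counts, one per sample."""
    n = len(next(iter(counts.values())))
    # 1. Technical normalisation: geometric mean of the positive controls.
    pos_sf = scaling_factors(
        [geometric_mean([positive[p][i] for p in positive]) for i in range(n)])
    scaled = {g: [c * f for c, f in zip(v, pos_sf)] for g, v in counts.items()}
    # 2. Background correction: subtract the mean of the (scaled) negative
    #    controls, flooring at zero (assumed handling of negative values).
    background = [mean([negative[p][i] * pos_sf[i] for p in negative]) for i in range(n)]
    corrected = {g: [max(c - b, 0.0) for c, b in zip(v, background)]
                 for g, v in scaled.items()}
    # 3. Remove genes with zero counts in more than 50% of samples.
    kept = {g: v for g, v in corrected.items() if sum(c == 0 for c in v) <= n / 2}
    # 4. Sample-content normalisation: geometric mean of the most abundant genes.
    #    Assumes these genes have non-zero counts in every sample.
    top = sorted(kept, key=lambda g: mean(kept[g]), reverse=True)[:n_top]
    content_sf = scaling_factors(
        [geometric_mean([kept[g][i] for g in top]) for i in range(n)])
    # 5. log2 transform with a pseudocount of 1 (assumption) to keep zeros finite.
    return {g: [math.log2(c * f + 1.0) for c, f in zip(v, content_sf)]
            for g, v in kept.items()}
```

In this sketch, a sample loaded with twice as much material as the others ends up with the same normalised values once the positive-control scaling has been applied, which is the behaviour the technical normalisation step is intended to produce.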

Statistical analysis and multivariable prognostic/survival modelling

Statistical analysis and prognostic model building were performed using IBM-SPSS for Windows (version 24, IBM-SPSS Inc., Chicago, Illinois, USA) and the R statistical environment version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). Continuous data were always assessed for normality of distribution prior to choosing appropriate statistical tests. Parametric and non-parametric tests were used for initial analysis of demographic, clinical, pathological and molecular variables. For continuous data, descriptive results were appropriately expressed as either median with interquartile range (IQR) or mean with standard deviation (SD). For cross tabulations and Chi-squared tests, exact p-values were calculated where possible.

The Newcastle cohort was used as the training set, while the Sunderland cohort was used as a held-out test set. mRNA abundance data for genes were transformed to z-scores. A multivariable generalised linear model with L1-penalty was fitted in cross-validation (four-fold) settings to identify features predictive of patient outcome. This process was repeated 100 times to select the optimal lambda minimising cross-validation error. The final model was used to predict risk scores in the test set, and predicted risk scores were dichotomised into risk groups (using the median risk score from the training set). These risk groups were tested for association with patient outcome using a Cox proportional hazards model. The survival model was adjusted for age, sex, site, OED grade and type of OPMD. Survival modelling was performed using R packages survival version 3.1-12 and glmnet version 4.0.
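The scoring and dichotomisation steps described above can be sketched as follows. This Python illustration does not fit the penalised model itself (that was done in R with glmnet); it assumes a vector of fitted coefficients is already available. The function names are hypothetical, and standardising each cohort with its own mean and standard deviation and assigning scores equal to the cut-off to the low-risk group are assumptions.

```python
from statistics import mean, median, stdev


def zscores(matrix):
    """Standardise each gene (column) to mean 0 and SD 1 across samples (rows)."""
    columns = list(zip(*matrix))
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in matrix]


def risk_scores(z_matrix, betas):
    """Linear predictor: weighted sum of z-scored abundances for each sample."""
    return [sum(b * z for b, z in zip(betas, row)) for row in z_matrix]


def assign_risk_groups(test_scores, train_scores):
    """Dichotomise test-set scores at the median of the training-set scores."""
    cutoff = median(train_scores)
    # Scores exactly equal to the cut-off are placed in the low-risk group (assumption).
    return ["high" if s > cutoff else "low" for s in test_scores]
```

Fixing the cut-off on the training set, rather than re-estimating it on the test set, keeps the test set untouched by any tuning, which is the point of a held-out cohort.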

All statistical tests were two-sided and results were considered statistically significant at p < 0.05 unless stated otherwise.

Results

Of the cases that fulfilled the selection criteria, 134 cases were considered to have sufficient tissue for RNA extraction. The majority of cases (91%, 122 of 134) yielded RNA of suitable quality and quantity for the NanoString assay. All of these cases were successfully analysed using this assay, and the raw data generated passed the relevant quality control parameters. The training set (n = 56) comprised 20 cases of OPMD that underwent malignant transformation (MT) and 36 cases that were non-transforming (NT). The clinico-pathological features of the training set are shown in Table 1. The test set (n = 66) was made up of 23 MT and 43 NT cases. The clinico-pathological features of the test set are shown in Table 2. All the OPMD in the study had oral epithelial dysplasia. Kaplan–Meier time to event analyses (time to malignant transformation) for low- and high-grade epithelial dysplasia are shown in Supplementary Fig. 1 for both the training and the test sets. An accompanying swimmer plot of the timing of individual events and censor dates is presented in Supplementary Fig. 2.

Table 1 Clinico-pathological features of training set (n = 56).
Table 2 Clinico-pathological features of test set (n = 66).

Following pre-processing and normalisation of the NanoString gene expression data (see Methods), univariable prognostic association of genes in the training and test sets was assessed. Of the 33 genes, eight were prognostic in the training set (Wald p < 0.05; Supplementary Table 2) and five were prognostic in the test set (Wald p < 0.05; Supplementary Table 3). Three genes (NOTCH1, CD274 and ITGB8) were prognostic in both sets and also demonstrated consistency in the direction of the estimated hazard ratio (Training set: NOTCH1 HR = 0.26 and p = 0.009, CD274 HR = 2.76 and p = 0.032, ITGB8 HR = 3.04 and p = 0.023; Test set: NOTCH1 HR = 0.19 and p = 6.7 × 10⁻⁴, CD274 HR = 4.81 and p = 0.001, ITGB8 HR = 5.55 and p = 0.002; Supplementary Fig. 3). Lower NOTCH1 transcription and higher levels of CD274 and ITGB8 transcripts were associated with malignant progression.

Next, we used the training set to identify a prognostic gene signature associated with malignant transformation. A multivariable prognostic model (Cox model with L1-regularisation; 4-fold cross-validation) was created, which comprised 11 genes. The gene list together with the relevant weightage of each gene is shown in Table 3. This prognostic model was used to predict patient risk scores in the test set. The predicted risk scores were dichotomised into high- and low-risk groups (using the median risk score derived from the training set). The risk groups demonstrated two clusters of patients in the test set when assessed against the mRNA abundance data of the underlying genes in the multivariable prognostic model (Fig. 1). These risk groups were further tested for association with patient outcome using a Cox proportional hazards model adjusting for age at diagnosis, sex, site, type of OPMD and binary OED grade. The prognostic gene signature remained an independent predictor of malignant transformation when assessed in the test set, with the high-risk group showing worse prognosis (hazard ratio (HR) = 12.65, p = 0.0003; Fig. 2a and Table 4). In the multivariable setting, in addition to the gene-signature-derived risk scoring, binary OED grading was also statistically significant (p = 0.017). Predicted risk groups were also tested for association with malignant transformation using the C-index, which confirmed strong concordance between the predicted risk groups and survival times (Concordance index = 0.82, 0.75–0.88).
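For readers unfamiliar with the concordance index, the Python sketch below computes Harrell's C for right-censored data. It is a generic illustration and not necessarily the estimator used in this study (which relied on R): tied risk scores receive half credit, and pairs with tied times are ignored, both of which are assumptions.

```python
def concordance_index(times, events, scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    times  : time to malignant transformation or censoring
    events : 1 (or True) if transformation was observed, 0 if censored
    scores : predicted risk, where higher means earlier expected transformation
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in score count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. When the score is a two-level risk group, many pairs are tied, so the half-credit rule has a large influence on the result.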

Table 3 Characteristics of the genes in the prognostic signature, along with the estimated beta coefficients (weightage).
Fig. 1: Bar plot showing the predicted risk scores of the samples in the test set.

The clinico-pathological covariates age at diagnosis, sex, OED grade, type of OPMD and site of the index lesion are shown in rows below the bar plot. A heatmap shows the mRNA abundance (z-score) of the genes from the prognostic gene signature for the test set samples.

Fig. 2: Kaplan–Meier time to event analysis using Cox proportional hazards model comparing malignant transformation in the test set samples divided into low- or high-risk groups.

Predicted risk scores were categorised into low- and high-risk groups using a threshold estimated as the median risk score of the training set (a). The gene expression-derived classifier was informative in an independent cohort (GSE26549 dataset)26 (b) and was biologically relevant as the predicted risk scores were significantly higher in tongue squamous cell carcinoma samples compared with normal oral mucosa samples (GSE9844 dataset;41 Wilcoxon rank-sum test) (c).

Table 4 Multivariable Cox proportional hazards model (test set).

The predicted risk groups were verified for potential bias in the expression of the housekeeping genes (GAPDH, SDHA, TBP, TUBB), which showed stable expression levels across both groups except for a nominal difference in TUBB expression in the test set (log2 fold change = 0.23, p = 0.01, Wilcoxon rank-sum test) (Supplementary Fig. 4).

Although our predictor was trained and tested using the NanoString nCounter platform, we tested it in an external cohort (GSE26549),26 which was profiled using a microarray platform (Supplementary Methods). Our classifier accurately predicted the risk of oral cancer-free survival in this independent cohort (HR = 2.38, p = 0.041) despite the differences arising from the RNA quantifying platform (Fig. 2b). Furthermore, we used the gene signature to explore the association with normal and malignant states in another microarray profiled dataset (GSE9844).41 We observed significantly elevated risk scores in tongue squamous cell carcinoma samples compared to normal oral mucosa samples, confirming oncogenic roles of the signature genes exclusive to tumour samples (p = 3.2 × 10⁻⁵, Wilcoxon rank-sum test, Fig. 2c and Supplementary Methods).
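The Wilcoxon rank-sum comparison of risk scores between two groups can be illustrated with the Python sketch below. It uses midranks for ties and a normal approximation without continuity correction, so its p-values may differ slightly from those of R's `wilcox.test`; the function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns the U statistic for x and the approximate two-sided p-value.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i and give each
        # member the average (mid) rank of the run.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))
```

The normal approximation is adequate for moderately sized groups; for very small groups an exact test would be preferable.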

Discussion

Currently, risk-stratification of OPMD patients in clinical practice is usually based on a combination of clinical and histopathological features.1,23,42 However, the prognostic utility of these features has been found to be lacking and inconsistent.1,11,15,43 In this study, when considering clinico-pathological parameters, only OED grading was found to be statistically significant in the training set. When the clinico-pathological variables were fitted together using a Cox proportional hazards model, only the binary OED grading of cases was found to be statistically significant. This suggests that of all the clinico-pathological parameters, OED grading is the most useful prognostic indicator for malignant transformation in OPMD and supports the use of the binary grading system in clinical practice. This is consistent with the findings of most studies that have indicated that OED grading is currently the ‘gold-standard’ for prognosticating clinical outcome in OPMD cases.1,23 A confounding factor in the accurate risk assessment of the patients in this study was the lack of data on smoking habits. Smoking status is typically presented in broad categories such as current smoker, ex-smoker and never smoker; however, there are very few studies that provide detailed life-time exposure in pack-years. Furthermore, there is evidence that patients tend to under-report their smoking habits, leading to inaccurate risk estimates.44

Archived FFPE tissues are an invaluable resource that can be successfully used for molecular-based assays despite the degradation of nucleic acids that accompanies fixation and embedding of samples in paraffin wax.27,28,45,46,47,48 Our study provides evidence of the clinical utility of the NanoString nCounter platform in providing robust gene expression outputs using RNA from FFPE tissue.27,28,30,31,49,50 Although relatively new, the NanoString nCounter assay has been shown by several studies to be sensitive and reproducible, with sensitivity and accuracy levels that are better than microarrays and comparable to real-time quantitative PCR (qPCR).28,30,31 A recent study by Veldman-Jones et al. (2015) that evaluated the robustness of the nCounter platform in analysing clinical samples showed that the platform has high sensitivity of target detection and good reproducibility even with low RNA amounts, making it suitable for developing clinical tests.30 There are two main advantages of NanoString technology compared to conventional gene expression analysis methods such as qPCR and microarrays. In the nCounter platform, transcript levels are measured from non-amplified total RNA, unlike other platforms, thus reducing errors/biases that may be introduced through increased sample manipulation and enzymatic reactions.27,28 Another advantage of NanoString is that it can be multiplexed to measure up to 800 target genes in one sample, unlike qPCR-based methods that are usually only able to measure the expression of a few genes at a time.27,28,30,31 These features were key to developing Prosigna™, which is a licensed prognostic test for breast cancer.

The gene signature developed in our study using the NanoString assay shows good potential in prognosticating clinical outcome. Our findings are analogous to the findings reported by Saintigny et al. (2011) where the authors showed that gene expression-based methods were superior to clinical and histological variables in determining clinical outcome in OPMD patients.26 In their study, Saintigny et al. (2011) compared microarray-derived gene expression-based models against a model that contained only age, histology (dysplasia vs hyperplasia) and two biomarkers (ΔNp63 and podoplanin).26 The two models containing microarray data showed much better performance compared to the model without any microarray data. Their final model, which combined the microarray data with clinical and pathological covariates, showed a slight improvement compared to the model with only microarray data. However, only nine transcripts were similar between the two microarray-based models, highlighting the rather unstable methodology employed in constructing their prognostic model. Aside from that, other major differences between their study and the current study are the type of tissue, the platform utilised to obtain the gene expression data and the statistical methodology used to arrive at the final gene expression profile.26 Nevertheless, our gene classifier accurately predicted the risk of oral cancer-free survival in the Saintigny dataset.26 We also discovered that our gene signature was significantly different in matched normal oral mucosa samples and tongue squamous cell carcinoma.41 Together, these data suggest that the gene expression-derived classifier reported in this study is potentially generalisable and is likely to be underpinned by biologically relevant changes in oral carcinogenesis. Several novel genes (TLX1, CCNE1, ITGB8 and COL4A5) with no known prior associations with oral carcinogenesis contributed to the gene signature that was developed. The characteristics of all the genes in the classifier are summarised in Table 3.51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75

One major issue with prognostic/predictive models is clinical validation. For example, the molecularly driven prognostic model for malignant transformation of oral leukoplakia developed by Saintigny et al. (2011), though initially promising, has not been translated into clinical practice.26 To promote translation into clinical practice, new prognostic/predictive models should be validated by an independent research team using independent patient cohorts.76 Lack of independence between the training and test/validation cohorts can lead to an over-estimation of the prognostic ability of such models. Another barrier for successful validation of a prognostic gene signature is the presence of inter- and intra-tumour heterogeneity in OSCC, as well as heterogeneity in OPMD.

Even though the current study has demonstrated the value of a molecularly driven prognostic model over traditional risk-stratification methods for OPMD patients, molecular-based methods are not without their drawbacks. A major limitation of the current study is the sample size and the almost equal number of MT and NT cases, which is not truly representative of the population, where the rate of MT is variable and ranges between 0.13 and 36.4% depending on the cohort.9 However, this study was designed to be a proof-of-principle study to explore the possibility of using FFPE-derived material for development of a gene signature prognostic of clinical outcome in OPMD patients. As such, we acknowledge that our study is only the first step in the development of a definitive gene expression-based prognostic model for OPMD. We also recognise that NanoString is an expensive ‘research use only’ assay; nevertheless, it is conceivable that development of a clinical test would reduce costs by economy of scale. Prosigna™, a NanoString-based breast cancer test, is proof that the technology can be translated into a cost-effective clinical test.

Although our study has successfully shown that the prognostic model developed is superior to conventional risk-stratification methods in a test set, the patients were obtained in a retrospective manner and the number of samples was small. Future studies require external validation in a sufficiently powered, prospective cohort study, recruiting consecutive patients with OPMD or as an observational component in a clinical trial. Ideally, such studies should be large enough to allow for data to be analysed by dysplasia grade, since this would provide valuable insight into the strengths and limitations of the molecular classifier against the current gold standard for risk assessment.

Conclusions

We have shown proof of principle that RNA extracted from FFPE tissue, when analysed on the NanoString nCounter platform, can be used to model a gene expression signature that accurately predicts the risk of oral potentially malignant disorders undergoing malignant transformation. The molecular classifier was developed on a training set and validated on a test set, but still requires external validation in an appropriately powered cohort study before it can be used in clinical practice.