Abstract
Background
The prognosis for patients with head and neck cancer (HNC) is poor and has improved little in recent decades, partially due to lack of therapeutic options. To identify effective therapeutic targets, we sought to identify molecular pathways that drive metastasis and HNC progression, through large-scale systematic analyses of transcriptomic data.
Methods
We performed meta-analysis across 29 gene expression studies including 2074 primary HNC biopsies to identify genes and transcriptional pathways associated with survival and lymph node metastasis (LNM). To understand the biological roles of these genes in HNC, we identified their associated cancer pathways, as well as the cell types that express them within HNC tumor microenvironments, by integrating single-cell RNA-seq and bulk RNA-seq from sorted cell populations.
Results
Patient survival-associated genes were heterogenous and included drivers of diverse tumor biological processes: these included tumor-intrinsic processes such as epithelial dedifferentiation and epithelial to mesenchymal transition, as well as tumor microenvironmental factors such as T cell-mediated immunity and cancer-associated fibroblast activity. Unexpectedly, LNM-associated genes were almost universally associated with epithelial dedifferentiation within malignant cells. Genes negatively associated with LNM consisted of regulators of squamous epithelial differentiation that are expressed within well-differentiated malignant cells, while those positively associated with LNM represented cell cycle regulators that are normally repressed by the p53-DREAM pathway. These pro-LNM genes are overexpressed in proliferating malignant cells of TP53 mutated and HPV + ve HNCs and are strongly associated with stemness, suggesting that they represent markers of pre-metastatic cancer stem-like cells. LNM-associated genes are deregulated in high-grade oral precancerous lesions, and deregulated further in primary HNCs with advancing tumor grade and deregulated further still in lymph node metastases.
Conclusions
In HNC, patient survival is affected by multiple biological processes and is strongly influenced by the tumor immune and stromal microenvironments. In contrast, LNM appears to be driven primarily by malignant cell plasticity, characterized by epithelial dedifferentiation coupled with EMT-independent proliferation and stemness. Our findings postulate that LNM is initially caused by loss of p53-DREAM-mediated repression of cell cycle genes during early tumorigenesis.
Similar content being viewed by others
Background
Head and neck cancers (HNCs) arise from squamous epithelial cells within the mucosal linings of the oral cavity, larynx, oropharynx, and hypopharynx. HNC is a leading cause of mortality worldwide, accounting for approximately 4.7 million cancer-related deaths per year [1]. The prognosis of HNC (~ 50% 5-year survival) has remained poor over recent decades [2]; however, there is considerable variability in survival and treatment response between patients. This variability likely reflects the inherent heterogeneity of HNC, which occurs in multiple subanatomic regions and can be caused by different etiological factors [3, 4]. Prognosis is currently assessed using clinical stage at presentation, based on the size and location of the primary tumor, presence of lymph node metastases (LNMs) and distant metastases, and by clinical examination and cytology [5]. LNMs represent an independent prognostic factor and are associated with increased risk of metastasis to distant organs [6,7,8]. Distant metastases confer dismal prognosis, yet most HNC-related deaths occur without evidence of them [6, 7], in contrast to many other solid tumor types.
Greater understanding of the biological factors that influence prognosis in HNC could enable development of clinical biomarkers to improve risk stratification and could lead to novel targeted therapies. These are needed since most patients either do not respond to standard HNC treatment, which includes a combination of surgery, radiotherapy, and chemotherapy; or develop resistance to it [9]. Only a minor subset of patients responds to immune checkpoint immunotherapies [10, 11] or to the epidermal growth factor receptor (EGFR) inhibitor cetuximab, the only targeted agent currently in use [12, 13].
There is a growing understanding of the pathobiological factors that influence HNC progression. A subset of HNCs that occur within the oropharynx are caused by human papillomavirus (HPV). HPV positive (HPV + ve) oropharyngeal cancer (OPC) is associated with favorable survival and therapeutic response relative to HPV − ve [14, 15]. HPV + ve HNC represents a biologically and clinically distinct entity from HPV − ve HNC, which is primarily caused by smoking and/or alcohol use and is associated with TP53 mutations in ~ 80% of cases [16,17,18]. Within HPV − ve HNC, TP53 mutations are associated with worse survival [16] as well as increased incidence of lymph node [19, 20] and distant metastases [21]. Other prognostic factors include smoking, driver mutations, and molecular pathways, and variability in the cellular composition of the tumor microenvironment (TME) [4, 22,23,24]. In particular, higher levels of infiltrating CD4 + and CD8 + T lymphocytes are associated with better prognosis [25, 26]. Processes promoting cancer progression and invasiveness have also been reported, including epithelial to mesenchymal transition (EMT), hypoxia, and angiogenesis [4, 27].
Here we sought to identify pathways and cellular processes that drive HNC progression based on analysis of transcriptomic data. Prior studies have reported on genes that are associated with clinical outcomes in HNC; however, these studies provide conflicting results, perhaps due to differences in sample processing, data generation, cohort composition, and inconsistent sample annotation. Here we used a comprehensive meta-analysis approach to identify genes that are robustly associated with two clinical outcomes—LNM status and patient survival. We identified cell types that express these prognostic genes through integration with single-cell and bulk-sorted RNA-seq data, revealing prognostic cell types and cellular processes.
Methods
Curating HNC gene expression datasets
HNC gene expression studies were primarily accessed from GEO and ArrayExpress. Relevant studies were identified using the search terms “Cancer” in combination with the terms “Head and neck,” “Oral,” “Laryngeal,” “Oropharyngeal,” and “Hypopharyngeal,” and by reviewing all datasets that were retrieved by these searches. For GEO searches, datasets were restricted to those with a minimum of ten samples. We identified additional datasets by searching the reports that were associated with these datasets as well as additional review articles, until we were unable to identify any additional suitable datasets. Clinical data was accessed from the metadata that accompanied each dataset within databases, as well as from relevant reports. Where data that was needed to perform the survival and LNM meta-analysis was incompletely reported, authors and journals were contacted to request these data.
All clinical metadata related to survival (Any survival measure) and LNM was retrieved. Also retrieved, where available, were data indicating tumor grade. Other variables that were retrieved included demographic information (patient age, sex, and reported ancestry (race or ethnicity)), clinicopathological variables (tumor subsite, HPV status, measure of HPV status), details of the patient study (country or sample collection), and data pertaining to HNC-related risk habits (smoking and alcohol consumption status and intensity measures). For the TCGA study, HPV status data was obtained from a publication that applied VirusScan [28] to detect HPV RNA within raw RNA sequencing reads, representing the most complete source of HPV status data in terms of patient numbers. To spot-check the accuracy of clinical data, patient sex was inferred based on the ratio of expression of the XIST and RPS4Y1 genes and compared with clinical annotation of sex. This resulted in exclusion of two studies that had inconsistent clinical data.
The curated data compendium included a combined total of 2074 primary HNCs derived from 29 studies (Additional file 2: Table S1). Meta-analyses were performed to identify genes associated with patient survival, LNM status, and tumor grade, applied to the subset of HNCs that were annotated for each variable: These included 1638 HNCs (across 16 cohorts) with survival outcome data, 1449 HNCs (20 cohorts) with LNM status data, and 1139 HNCs (13 cohorts) with tumor grade data.
Processing gene expression data (meta-analysis datasets)
Gene expression datasets that were generated using Affymetrix arrays (N = 21) were processed as follows. To ensure accurate annotation of microarray probes, raw data (.CEL files) were accessed and processed using the “affy” R package in combination with platform-specific custom CDF files that were accessed from Brainarray (http://brainarray.mbni.med.umich.edu/). Expression datasets were normalized using the mas5 algorithm. Samples were next restricted to primary tumors, followed by quantile normalization of the expression data. Probe-level data was next summarized to gene-level data using the WGCNA package [29], using the default “maxmean” method for probe filtering. For each gene, this method selects the probe with the maximum mean expression across all samples as a representative measure of the gene. Summarized gene data were log2 transformed and converted to standard gene expression scores. For each gene, standard gene expression scores were calculated for each patient sample by subtracting the mean expression of the gene and dividing by the standard deviation. Statistical pipelines that were used to perform meta-analyses were applied to standard scores.
Eight datasets were generated using non-Affymetrix microarrays (Microarrays that were manufactured by Agilent, Illumina, and the German Cancer Research Center). These datasets were downloaded from GEO as series matrix files using the GEOquery R package. These datasets were preprocessed as follows: Gene names were converted to Entrez IDs using array annotation “Platform” files that accompanied each dataset. Where Entrez IDs were not included in the annotation file, gene names were converted to Entrez IDs using biomaRt [30]. Datasets were restricted to primary tumors and were filtered to remove samples with missing data for 10% or greater of genes, and to remove genes that had missing data for 10% or greater of samples. Datasets were then quantile normalized. For genes with multiple probes, the WGCNA package was used to identify the probe with the maximum mean expression across samples, which was selected to be a representative measure for each gene. Datasets were then log2 transformed if not already in log2 space and converted to standard gene expression scores as described for Affymetrix-based datasets.
Preprocessed TCGA bulk RNA-Seq data (gene-level HTSeq counts) were downloaded from TCGAbiolinks [31]. TCGA data was processed for meta-analyses using an approach that was consistent with array-based datasets: The dataset was restricted to primary tumor samples and then quantile normalized. Gene names were converted from Ensembl IDs to Entrez IDs using biomaRt [32]. Ensembl ID-level data was summarized to Entrez gene-level data using the WGCNA package “CollapseRows” function. The default “maxmean” method was used to select features with higher expression where Entrez IDs matched multiple Ensembl IDs. The datasets were then log2 transformed and converted to standard gene expression scores as described for Affymetrix-based datasets.
For applications other than meta-analyses, TCGA RNA-Seq data was processed using an alternative normalization approach in order to process primary HNC and tumor-adjacent normal samples in parallel, as quantile normalization assumes similar data distributions across samples [33]. HTSeq counts were converted to standard scores such that expression data for each HNC sample had a mean of zero and standard deviation of 1. Standard scores were then log2 transformed and batch corrected (correcting for sample plate) using COMBAT [34]. Gene names were converted from Ensembl IDs to Entrez IDs using biomaRt [30]. Ensembl ID-level data was summarized to Entrez gene-level data using the WGCNA package “CollapseRows” function [29]. The default “maxmean” method was used to select features with higher expression where Entrez IDs matched multiple Ensembl IDs.
Meta-analysis of genes associated with survival
This meta-analysis included all datasets that had at least 20 primary HNCs with survival and gene expression data (N = 16 studies with a combined total of 1638 HNCs). Clinical data pertaining to all measures of survival was accessed for each study, and survival time was converted to months. Survival analysis was performed using overall survival (OS) where possible, and other survival measures (progression-free survival, or distant metastasis-free survival) where OS was not reported (Table 1). For each dataset separately, Cox regression models were used to calculate z-scores for association of each gene with survival. For genes that were represented in two or more datasets (N = 23,558), Liptak’s weighted meta-z test [35, 36] was used to combine z-scores for each dataset into a single “meta-z-score,” a summary statistic that indicates the association of gene with survival across studies. Liptak’s meta-z test was applied with weights set to the square roots of dataset sample sizes. Genes were considered to be significantly adversely associated with survival (anti-survival) if they had a meta-z-score of 3.09 or greater (i.e., P < 0.001) and favorably prognostic (pro-survival) if they had a meta-z-score of − 3.09 or less.
Meta-analysis of genes associated with LNM
This meta-analysis included all datasets that had at least five LNM + and five LNM0 primary HNCs (N = 20 studies with a combined total of 1449 patient primary). LNM data was accessed from reports or metadata files as either LNM status (presence or absence of LNM) or was converted from a continuous measure of LNM burden (LNM stage, ratio, or number of LMs). For each gene that was available in at least half of the studies, the following statistics were calculated for each study separately: The standardized mean difference in expression between LNM + and LNM0 primary HNCs, the standard deviation of expression in each of these groups, and the number of samples in each group. Next, we used random effects models [58] to calculate meta-z-scores and effect size summary statistics for the association of each gene with LNM status across studies, based on the combined standardized differences and standard deviations, weighted by study sample size. Genes were considered to be positively associated with LNM (Pro-LNM) if they had a meta-z-score of 3.09 or greater (i.e., P < 0.001) and negatively associated with LNM (Anti-LNM) if they had a meta-z-score of − 3.09 or less.
Meta-analysis to identify genes associated with tumor grade
A meta-analysis was performed to identify genes that were associated with tumor grade (i.e., level of differentiation), where grade was reported either using a numeric grading system of the level of differentiation upon histological analysis (well, moderate, poor). This meta-analysis consisted of 13 studies with a combined total of 1139 primary HNCs. In each study separately, linear regression was applied to test the association of each gene with grade or differentiation level, treating grade, and differentiation level as ordinal variables. For genes that were represented in two or more datasets (N = 25,058 genes), Liptak’s weighted meta-z test was used to combine z-scores for each dataset into a single “meta-z-score,” a summary statistic that indicates the association of gene with grade across studies. Liptak’s meta-z test was applied with weights set to the square roots of dataset sample sizes. Genes were considered to be positively associated with grade (pro-grade) if they had a meta-z-score of 3.09 or greater (i.e., P < 0.001) and negatively associated with grade (anti-grade) if they had a meta-z-score of − 3.09 or less.
Testing the independence of prognostic gene signatures from HPV status
Regression models were used to test the association of survival gene signatures with survival, adjusted for HPV status, and to test the association of LNM gene signatures with LNM status, adjusted for HPV status. Expression scores were calculated for each prognostic gene signature (i.e., set of prognostic genes) as the mean of expression (standardized gene expression scores) of all genes within the signature. Each patient (primary HNC) was thereby assigned an expression score for each prognostic signature. Survival gene signatures included all genes that were negatively (anti-survival) and positively (pro-survival) associated with survival, as well as genes within survival gene clusters (S1-6). Cox regression models were used to test for association of each survival gene expression score with survival, adjusting for HPV status, in all studies (N = 4) that had at least ten patients with complete data for survival, HPV status, and gene expression. LNM gene signatures included all genes that were negatively (anti-LNM) and positively (pro-LNM) associated with LNM, and genes within each LNM gene cluster (L1-6). Logistic regression models were used to test for association of each LNM gene expression score with LNM status, adjusting for HPV status, in all studies (N = 6) that had at least ten patients with complete data for LNM status, HPV status, and gene expression. For each prognostic (Survival or LNM) gene signature, an HPV-adjusted meta-z-score was calculated using Liptak’s weighted meta-z test to combine z-scores across studies, weighted by study sample size. Additional analyses performed to investigate effects of HPV status and other potential effect-modifier on gene-survival and gene-LNM associations are described in Additional file 1: Supplementary Methods & Results.
Gene set enrichment analysis
GSEA was applied to all genes that were analyzed as part of the survival and LNM gene meta-analyses, to identify curated genes that were most significantly associated with each outcome, from a database of 18,993 curated gene sets. GSEA was applied to survival and LNM-associated genes using the “fgsea” R package (bioRxiv. https://doi.org/10.1101/060012). For consistency between survival and LNM-associated genes, genes were ranked by meta-z-scores, as this summary statistic was available for both. Curated gene sets were accessed from the Molecular Signatures Database (MSigDB) [59]. Selected for analysis were all gene sets in the “C1,” “C2,” “C5,” “C6,” and “H” gene categories, except for gene sets in the “CGP” (chemical and genetic perturbations) subcategory (N = 18,993 gene sets). CGP subcategory gene sets as well as gene sets in other categories (C3, C4, C7, and C8) were excluded due to the sparsity of their annotation, which makes them difficult to interpret. Gene sets with fewer than fifteen or more than 500 gene sets were removed to exclude enrichment that are less statistically and biologically meaningful. GSEA scores and p-values were calculated for the p53-DREAM target gene set (A set of 201 p53-DREAM target genes accessed from [60]) by adding this gene set to the list of MSigDB gene sets before performing GSEA.
Gene set overrepresentation analysis
GSOA was applied to identify gene sets that most significantly overlapped with survival and LNM-gene clusters, from a database of 18,437 curated gene sets. GSOA was performed using the msigdbr package, which was used to access gene sets from MSigDB, in combination with the clusterProfiler package, which was used to perform hypergeometric tests. Selected for analysis were all MSigDB gene sets that were used GSEA (See: Gene set enrichment analysis). When applying GSOA to genes derived from each meta-analysis, the background gene list used for GSOA consisted of all genes considered in the meta-analysis, i.e., for which meta-z-scores were calculated. These represented genes for which data was available in a sufficient number of studies to be considered.
Unsupervised clustering of prognostic genes based on co-expression in HNC populations
Phenograph [61], an unsupervised clustering method, was used to cluster survival and LNM-associated genes (That were previously identified by meta-analyses) based on their co-expression within HNC bulk transcriptional data. Phenograph was selected over other unsupervised clustering methods to avoid the step of selecting the number of gene clusters (K) a priori, which is required for other methods and introduces bias. For prognostic genes that were associated with each outcome (survival and LNM), Phenograph was applied to a combined matrix of uniformly processed gene expression profiles from twenty studies. Since Phenograph-based clustering does not tolerate missing data, the gene expression matrices were generated using an approach that maximized the number of prognostic genes and HNC that had complete data. To achieve this, for genes associated with each clinical outcome, studies were restricted to HNCs that included data for at least 80% of genes, and data from these studies was combined into a single matrix. Clustering was then applied to genes that had complete (non-missing) data for all samples within the combined matrix. For survival-associated genes, this approach yielded a combined matrix of 1642 HNCs derived from 20 studies, which had complete data for 958/1212 (79%) of all survival-associated genes. For LNM-associated genes, the combined matrix also included 1642 primary HNCs that were derived from twenty patient studies, which included complete data for 742/877 (85%) of the LNM-associated genes. Phenograph was then applied to the combined matrix of gene expression profiles in order to identify co-expressed gene clusters in the sets of survival and LNM-associated genes. All genes that were represented in the combined gene expression matrices were included in the resulting gene clusters.
Analysis of periodic expression of LNM-associated genes based on expression in synchronously dividing cells
Data that was previously published by Dominguez et al. [62] was used to investigate cell cycle phase-specific expression of LNM-associated genes. These data were generated by applying bulk RNA-Seq to map transcription in synchronously dividing cells (HeLa) that were collected at fourteen timepoints over the course of two mitotic cycles. Normalized gene expression data in the form of fragments per kilobase of transcript per million mapped reads (FKPM) was accessed from the Dominguez et al. report [62], as was data indicating the cell cycle phase within which each gene was expressed.
Processing the Stanford scRNA-Seq dataset
This dataset was described in our recent report [63] and is accessible from GEO (Accession number: GSE140042). For the current study, analysis was restricted to primary HNCs that were processed using enzymatic digestion, for consistency with the Puram dataset. Cell Ranger was used to align RNA-Seq reads to the latest GENCODE human transcriptome (Genome build hg38) and to quantify RNA counts. Sparse data matrices were then loaded into a Seurat object, which was filtered to remove genes that were present in ten cells or fewer. Low-quality and dying cells were removed by excluding cells with a unique feature count of fewer than 200 (N = 72 cells) as well as cells with a mitochondrial genome fraction of 0.4 or greater. Potential doublets were removed by excluding cells with a unique feature count of greater than 4000 (n = 444 cells). Integration was then performed by splitting the dataset into separate Seurat objects, with each object containing all the cells that derived from one HNC sample. Gene expression counts for each cell were normalized using regularized negative binomial regression, and variable genes (N = 2000) were found for each sample using the “vst” method [64]. Samples were then integrated into a single gene expression object by finding integration anchors using the “FindIntegrationAnchors” and “IntegrateData” commands. The combined genes were then scaled and centered using linear models. This integration approach removed sample batch effects such that cells clustered by cell type rather than by sample. Unsupervised clustering was applied to the integrated Seurat object in order to identify cell clusters using nearest neighbor modularity optimization [65]. PCA performed with 50 principal components (PCs) and elbow plots were then used to select the appropriate number of PCs. Unsupervised clustering was then applied to cells. To identify the appropriate number of cell clusters (i.e., the appropriate level of granularity), cell clustering was performed at multiple resolutions ranging from 0.3 to 1 in increments of 0.1. The optimal resolution was identified based on visualization of the resulting cell clusters using principal component analysis (PCA) and Manifold Approximation and Projection (UMAP). This approach was used to select the number of clusters that separated the major cell types into different clusters and that separated cell types into subclusters (cell subtypes) only where separate subclusters were clearly visible based on PCA and UMAP visualization. Cell clusters were manually annotated with their cell type and subtype by visualizing their expression of known cell type marker genes. Also visualized were gene expression scores of cell type marker gene signatures that were accessed from PangloaDB [66]. Gene expression scores were calculated for each signature as the mean expression (scaled normalized counts) of all genes in the signature, which indicated the expression of the signature in each cell. Cell type assignments were confirmed by applying Seurat to transfer cluster labels from the preexisting Puram HNC scRNA-Seq dataset [67]. This approach used a model that was trained on primary HNCs from the Puram dataset, for which cell types had been previously annotated, to classify cell clusters in the new Stanford scRNA-Seq dataset. Where multiple cell clusters of the same cell type were observed (such as classic fibroblasts and myofibroblasts), cell subtypes were manually annotated by visualizing gene expression signatures for cell subtypes.
Processing the Puram scRNA-Seq dataset
The Puram scRNA-Seq dataset was accessed from GEO (Accession number: GSE103322) as a preprocessed series matrix file. The dataset was then loaded into a Seurat object and was split into separate sample objects, with each object containing all of the cells that derived from one sample. Samples were restricted to primary HNCs with a minimum of 200 cells (N = 9 samples). The Puram dataset was subsequently processed using Seurat, as described for the Stanford scRNA-Seq dataset (See “Processing the Stanford scRNA-Seq dataset”). Cell type labels that were previously assigned by Puram et al. [67] were accessed from the GEO metadata. The validity of these cell type assignments was confirmed by UMAP-based visualizing expression of cell type marker genes and signatures.
Cells were labeled in a way that was consistent between the Puram and Stanford scRNA-Seq datasets, in order to facilitate comparison between these datasets. For this reason, cells that were labeled as myocytes in the Puram dataset were excluded from all analysis, as cells expressing myocyte markers were not observed in the Stanford dataset. Moreover, while macrophages or dendritic cells were labeled by Puram et al., these cells are labeled as myeloid cells in the current study, as we found that in both the Puram and Stanford scRNA-Seq datasets, myeloid lineage cells clearly separated from other cell types but expressed markers of macrophages, dendritic cells, and monocytes. This is consistent with emerging evidence that cells of the mononuclear phagocyte system (macrophages, dendritic cells, and monocytes) do not represent discrete cell types but have overlapping transcriptional profiles and functions [68, 69].
Prediction of additional cell phenotypes/states in scRNA-Seq datasets
Cell cycle phase was inferred using Seurat, based on expression of cell phase-specific marker genes. CytoTRACE [70] was applied to the raw count gene expression matrix for all epithelial cells, as per user protocol. CytoTRACE was applied to malignant cells only, according to the user manual recommendation that CytoTRACE be applied separately to cells of different lineages. Epithelial to mesenchymal (EMT) score was calculated as the sum of normalized counts for mesenchymal genes (VIM, CDH2, FOXC2, SNAI1, SNAI2, TWIST1, GSC, FN1, ITGB6, MMP2, MMP3, MMP9, and SOX10) minus the sum of normalized counts of epithelial genes (CDH1, DSP, and TJP1), as previously described [71, 72].
Bulk transcriptional profiling of flow-sorted cells
Bulk RNA-Seq was used to profile transcriptomes of distinct cell populations that were isolated from primary HNCs using fluorescence activated cell sorting (FACS):
Patient samples
Primary tumor tissue samples were collected between March 2017 and April 2018 from patients (n = 15) undergoing surgical resection of HNSCC (including squamous cell carcinoma of the oral cavity, oropharynx, and larynx) at the Stanford Hospital, Stanford, CA, after informed consent. Inclusion criteria included a diagnosis of HNSCC and age ≥ 18 years. Fresh tumor tissue specimens, with clinical annotation, were collected at the time of extirpative surgery and freshly frozen within 6 h after surgical resection. This study was performed in compliance with ethical regulations outlined in a Stanford Institutional Review Board (IRB)-approved protocol (protocol no. 11402). Details of patient clinicopathologic features are provided in Additional file 2: Table S2.
Sample preparation for florescence-activated cell sorting (FACS)
FACS sample preparation included obtaining tumor tissue from consented patients within 4 h after tumor tissue removal. Tumor tissue was placed on ice in 50 μl tissue digestion media, DMEM-F12 + with magnesium and calcium (Corning Cellgro, Manassas, VA), 1%FBS (heat inactivated), 10 units/ml Penicillin-10ug/ml Streptomycin (Gibco, Grand Island, NY), and 25 mM hepes (Gibco, Grand Island, NY). Tumor tissue was thoroughly diced with a sterile scalpel and placed in a gentleMACS C-tube (Miltenyi Biotec, Sunnyvale, CA) containing 1.5 ml of tissue digestion media. Tissue was mechanically digested on the GentleMACS dissociator five times under the human tumor tissue program h_tumor_01. Two milliliters of tissue digestion media and 0.5 ml of 3000U/ml collagenase/1000U/ml hyaluronidase (StemCell Technologies, Vancouver, BC) were added to the C-tube after mechanical digestion. The tissue in the C-tube was incubated at 37° C rotating for 1 h, then filtered with a 40-μm nylon cell strainer (Falcon, Corning, NY) into a 14-ml tube containing 14 ml tissue digestion media and centrifuged at 4 °C for 10 min at 514RCF. The enzymatically digested cell pellet was resuspended in 1–4 ml ACK lysis buffer (Gibco, Grand Island, NY) depending on the pellet size and number of red blood cells, for 2 min on ice. Cells were filtered as before, washed with 14 ml of FACS buffer (phosphate buffered saline) without calcium or magnesium (Corning, Manassas, VA), 2%FBS heat inactivated, 10 units/ml Penicillin-10ug/ml Streptomycin (Gibco, Grand Island, NY), and 1 mM Ultra-pure EDTA (Invitrogen, Carlsbad, CA), and centrifuged at 4 °C for 10 min at 514RCF. Cells were resuspended in FACS buffer, counted on a hemacytometer and washed one more time with FACS buffer. Cells were kept in FACS buffer on ice until flow cytometry staining.
Flow cytometry staining and sorting
Cells were incubated in the dark on ice for 30 min with fluorescent markers (Additional file 2: Table S3), at the manufacturers’ recommended concentration, except for DAPI, which was added after the last wash. Cells were washed three times with FACS buffer and sorted on a BD Aria II SORP using the BD FACSDIVA v8.0.1 software into 4 groups, CD3 + CD19 + CD45 + CD68 + (leukocytes), unstained (malignant cells), FAP + (fibroblasts), and CD31 + or CD140a + (endothelial cells) in tissue digestion media containing 30% FBS. Cell sorts had an average efficiency of 86.8% on Purity precision sorting, rerunning sorted samples to test for purity was not performed due to the need for enough RNA to sequence. Cells were spun at 4 °C for 10 min at 514RCF and resuspended in RNAlater stabilization solution (Invitrogen, Carlsbad, CA) at the recommended manufacturer’s concentration and stored at 4 °C for less than a week before RNA extraction.
Flow cytometry gating
Cells were analyzed using FlowJo V. 10.6.1 and first gated on single cell size using FSC width and height and cell granularity using SSC width and height (Additional file 1: Figure S1). Live cells were gated using the DAPI stain. From the live cell gate, the leukocyte group in FITC and endothelial group in PE were used to separate out CD3 + CD31 leukocyte cells from CD3 − CD31 endothelial cells. Leukocyte and endothelial negative populations were used to gate further for fibroblasts in APC and the malignant unstained (leukocyte, endothelial, and fibroblast negative) group.
Bulk RNA sequencing of flow-sorted cells
RNA was extracted from sorted cells within a week of cell sorting. After washing in PBS, cell pellets were used to prepare RNA with the RNAeasy + micro kit with column removal of genomic RNA. RNA samples were quality controlled using the Agilent 2100 Bioanalyzer system. Library preparation was performed using the SMARTer Stranded Total RNA-seq v2 Pico input mammalian kit (Clontech) at the Stanford Protein And Nucleic acid (PAN) facility. Bulk RNA sequencing was performed using the Illumina Hiseq4000 System, inputting 500 pg–5 ng of total RNA per sample and pooling 8–12 samples into each sequencing lane. Sequencing was performed at the Stanford Center for Genomics and Personalized Medicine (SCGPM) facility. This dataset is accessible from Gene Expression Omnibus (Accession number: GSE113839).
Preprocessing and analyzing the Stanford flow-sorted cell bulk RNA-Seq data
Trim Galore! was used to perform adaptor trimming and filtering of raw reads. Kallisto [73] was used to align reads to the GENCODE 34 human transcriptome (Genome build hg38). MultiQC was used to perform quality control of RNA-Seq samples based on the output of Trim Galore! and Kallisto. Transcript-level counts were summarized to gene level using tximport [74]. DESeq2 [75] was used to convert gene-level count data (The output of tximport [74]) to a “DESeqDataSeq” object and to normalize the RNA-Seq counts by dividing them by estimated size factors. These normalized RNA-Seq counts were used to identify the cell type that featured highest expression of each prognostic gene, representing the cell type with the maximum mean normalized count value. Normalized counts were log2-transformed prior to data visualization.
Estimating cell fractions within the Stanford flow-sorted cell bulk RNA-Seq dataset
CIBERSORTx [76] was applied to gene-level transcripts per million (TPM) data in combination with a signature matrix derived from the Puram scRNA-Seq dataset. This signature matrix was derived from HNC scRNA-Seq data, ensuring that the gene signatures used to infer cell fractions were representative of cell types found within HNC tumor microenvironments.
Preprocessing and analyzing the Huang bulk RNA-Seq dataset
Raw fastq files were accessed from the European Genome Phenome Archive (Dataset ID: EGAD00001004489) and were processed as described for the Stanford bulk RNA-Seq dataset (see “Preprocessing and analyzing the Stanford flow-sorted cell bulk RNA-Seq data”). Wilcoxon rank sum tests were used to test for differences in mean expression (DESeq2-normalized counts) of anti-LNM and pro-LNM genes between primary HNCs and patient-matched LNMs.
Testing association of gene expression with TP53 mutation status
Association of pro-LNM cluster 4 genes with somatic TP53 mutations was established based of differential expression analysis within the TCGA [16] and Wichmann [19] bulk gene expression datasets, as well as the Puram scRNA-Seq dataset. Within each bulk gene expression dataset, Wilcoxon rank sum tests were used to test for differences in mean gene expression (normalized counts) of cluster L4 genes between subgroups of HNCs that were stratified based on their TP53 mutation and HPV status. Cluster L4 gene expression was compared between p53 proficient HNCs (HPV − ve and TP53wt) and two separate groups of p53-deficient HNCs, including TP53mut (HPV − ve) HNCs, and HPV + ve HNCs. For the TCGA study, TP53 mutation data was accessed from the MC3 Public MAF file [77]. For the Wichmann study, TP53 mutation data was accessed from GEO metadata (Accession: GSE65858). Excluded from the analysis were Wichmann study HNCs that were annotated as having “non-disruptive” TP53 mutations and that were HPV negative, due to the ambiguity of their p53 proficiency. In the Puram scRNA-Seq dataset, multiple linear regression was used to test for association of mean cluster L4 gene expression (Normalized counts) with TP53 mutation status (the mutation status of the overall tumor), adjusted for cell cycle phase (estimated by Seurat), within malignant cells. TP53 mutation status (as indicated by targeted or whole exome sequencing) were accessed from the Puram et al. report.
Analysis of expression of LNM-associated genes in oral premalignant lesions
OPL data was accessed from GEO (Accession: GSE26549) [78]. This dataset included gene expression array data for 86 OPL (oral leukoplakia) biopsies that were annotated with follow-up (oral cancer-free survival) information, 84 of which were also were annotated for histology. Raw expression array. CEL files (Affymetrix Human Gene 1.0 ST Array) were processed using the “affy” R package in combination with a platform-specific custom CDF file that was accessed from Brainarray (http://brainarray.mbni.med.umich.edu/). Expression data were normalized using the mas5 algorithm. Probe-level data was next summarized to gene-level data using the WGCNA package [29], using the default “maxmean” method for probe filtering, and summarized gene data were log2 transformed. Subsequent statistical analyses were applied to log2-transformed data.
Data analysis software
Data analysis was performed using R versions 3.6.1 and 4.1.0. Bulk RNA-Seq reads were trimmed and filtered using Trim Galore! (Version 0.6.0) and were quality controlled using MultiQC v1.9 within Python 2.7.5. Bulk RNA-Seq reads were pseudoaligned using Kallisto (linux-v0.46.0). Aligned bulk RNA-Seq reads were converted to gene-level estimates using Tximport 1.14.2. Gene-level bulk RNA-Seq counts were normalized using DESeq2 (1.26.0). Single-cell RNA-Seq reads (10x Genomics) were processed and aligned using Cell Ranger (6.1.2). Subsequent processing and analysis of single-cell RNA-Seq data was performed using Seurat (4.1.0). Flow cytometry data were analyzed using FlowJo Cytometry Analysis Software (BD Biosciences). Other programs and tools are indicated in the relevant “Methods” and “Results” sections.
Results
Curation of a resource for meta-analysis of HNC gene expression
We assembled a compendium of 29 primary HNC gene expression datasets with accompanying clinical data, representing the largest such resource for HNC. This resource was specifically built to identify genes associated with two outcome variables: patient survival and lymph node metastasis (LNM) status. Meta-analyses were applied to uniformly preprocessed gene expression data, as in our PRECOG resource [79]. Briefly, datasets were quality controlled, normalized, log transformed, and standardized to calculate gene expression profiles. Clinical data were manually curated and included survival and LNM status as well as variables relevant to HNC prognosis, such as tumor grade, tumor subanatomic location, and HPV status. The resulting 29 cleaned studies included 2074 HNC tumors (Additional file 2: Table S1). In total, 1638 patients (across 16 cohorts) had survival outcome data and 1449 patients (20 cohorts) had LNM status (Table 1).
HNC survival-associated genes reflect TME composition, EMT, and hypoxia
We first identified genes associated with HNC survival across 16 studies. Overall survival (OS) was used where available, while progression-free survival or distant metastasis-free survival was used for other studies (Additional file 2: Table S1). Cox regression models were used to calculate z-scores for association of each gene with survival in each dataset. Z-scores were then aggregated into a per-gene meta-z-score using Liptak’s weighted meta-z [35, 36]. Four hundred seventy-nine genes were favorably associated with survival (pro-survival genes; meta-z ≤ − 3.09, i.e., P < 0.001) and 730 were adversely associated with survival (anti-survival genes; meta-z ≥ 3.09) (Fig. 1A and Additional file 2: Table S4). Cox regression z-scores were generally consistent between studies that did and did not include HPV + ve OPC (Fig. 1A) and remained significantly associated with survival in a meta-analysis that excluded HPV + ve OPC (Additional file 1: Supplementary Results), indicating that HPV does not drive the association of these genes with survival. Moreover, these genes remained significantly associated with survival after adjustment for age and sex (Additional file 1: Supplementary Results).
We next applied gene set enrichment analysis (GSEA) to identify pathways and functions that were enriched within the survival-associated genes (Fig. 1A). GSEA indicated that pro-survival genes were enriched for immune pathways, particularly genes related to antigen receptor-mediated signaling (e.g., CD247, CD19, IL2RG) and immune activation (e.g., CD2, CD3D/E/G, CD5-7). Anti-survival genes were for epithelial-mesenchymal transition (EMT) (e.g., SNAI2, ITGB6, FN1) and hypoxia (e.g., P4HA1/2, GAPDH, ENO1), as well as genes regulated by polycomb repressive complex 1 (PRC1) component enzymes BMI and PCGF2 (e.g., VEGFC, CXCL1, CCND1). These enrichments are consistent with known roles for leukocyte infiltration [26, 80] and EMT [81, 82] in protecting against and promoting HNC progression, respectively.
Since survival genes were enriched for multiple oncogenic processes, we next sought to delineate the various prognostic pathways and processes that are represented by these genes. To achieve this, we applied Phenograph [61], an unsupervised clustering method, to cluster the survival-associated genes identified by meta-analysis based on their co-expression within a large HNC population. Phenograph identified six gene clusters: Three (S3, S5-S6) primarily consisted of pro-survival genes, and three (S1-S2, S4) of anti-survival genes (Table 2, Additional file 2: Table S4 and Additional file 1: Figure S2A). Most of these genes had similar pan-cancer survival associations when compared to our previous analysis of survival-associated genes (Additional file 1: Figure S3A) [79], indicating that these genes are generally prognostic across cancer types. All survival-associated signatures remained significantly associated with survival in linear models after adjusting for HPV status (Additional file 1: Figure S4A). Gene set overrepresentation analysis (GSOA) (i.e., hypergeometric analysis) was used to identify MSigDB gene sets that most significantly overlapped with each survival gene cluster. Anti-survival cluster S1 was also overrepresented for EMT markers, as well as genes negatively regulated by PRC1, or involved in hypoxia or focal adhesion. Anti-survival cluster S2 included genes involved in ribosome and ribonucleoprotein biogenesis, MTORC1 signaling, and response to protein misfolding. Anti-survival cluster S4 was overrepresented for EMT markers, genes upregulated by TGF beta signaling, and genes encoding extracellular matrix components. Pro-survival gene clusters S5 and S6 were overrepresented for genes related to T cell activation and squamous epithelial differentiation, respectively, but no gene sets significantly overlapped with pro-survival cluster S3.
Epithelial differentiation is a key factor in HNC lymph node metastasis
We next identified genes that are differentially expressed in primary tumors of LNM positive (LNM +) patients relative to primary tumors of LNM negative (LNM0) patients. Random effects models applied to 20 datasets with a total of 1449 primary HNCs identified 420 genes more highly expressed in primary tumors of LNM + patients (pro-LNM genes; meta-z ≤ − 3.09) and 457 genes that were lower expressed (anti-LNM genes; meta-z ≥ 3.09) (Fig. 1B, Additional file 2: Table S4). Associations of gene signatures with LNM status were independent of HPV status, age, and sex (Fig. 1B, Additional file 1: Figure S4, Supplementary Results).
Pro-LNM genes were enriched for mitosis and cell cycle genes, particularly ones regulated by E2F transcription factors (TFs) and that are involved in the G2/M checkpoint (CDK1, CCNB2, CHEK2, AURKA/B) (Fig. 1B). They included proliferation markers MKI67 and minichromosomal maintenance complex genes (MCM3, MCM6-8), as well as genes involved in DNA replication, recombination, and repair (POLA1, RAD51, MSH6). Strikingly, anti-LNM genes were enriched for squamous epithelial terminal differentiation processes, including cornification and keratinization. They included IVL, multiple keratins (KRT2, KRT6B, KRT10, KRT16, KRT23, KRT75, KRT78, KRT80), and nine contiguous kallikrein related peptidases (KLK6-14) within the 19q13 gene cluster, which regulate skin desquamification [83]. This suggests that LNM is strongly associated with loss of dedifferentiation or loss of epithelial identity within primary tumors. Indeed, anti-LNM and pro-LNM signatures were respectively strongly positively and negatively associated with tumor grade (i.e., level of pathological differentiation within malignant cells) across studies (Fig. 2A). Expression of LNM-associated genes displayed a stepwise progression from histologically normal tumor-adjacent tissue to tumors with increasing grades of dedifferentiation (Fig. 2B). We also identified genes associated with grade across studies (Additional file 1: Figure S5 and Additional file 2: Table S5). Three hundred twenty-four of 420 (77%) pro-LNM and 374 of 457 (82%) anti-LNM genes were also associated with grade (Additional file 2: Table S5), and gene meta-z’s for these associations were highly correlated (Pearson correlation coefficient = 0.63) (Fig. 2C).
Phenograph was used to cluster LNM-associated genes based on their co-expression within a combined dataset of 1642 primary HNCs (20 studies), which again identified six LNM-associated gene clusters. Three primarily consisted of anti-LNM genes (L1–L3), and three of pro-LNM genes (L4–L6) (Additional file 1: Figure S2B). All six clusters were strongly associated with tumor grade (Fig. 2A). Both anti-LNM clusters L1 and L2, the two major anti-LNM clusters that comprise 81% of anti-LNM genes combined, were overrepresented for epithelial differentiation genes (Additional file 2: Table S6). Thirty (14%) of the gene within cluster L2 overlapped with survival cluster S6 (Additional file 1: Figure S6; hypergeometric test P = 5 × 10−31), identifying a set of differentiation-related genes that are positively associated with survival and negatively associated with metastasis. Anti-LNM cluster L3 (N = 15 genes) was enriched for intracellular protein transport and RNA splicing.
Pro-LNM L5 genes were overrepresented for members of the oncogenic KRAS and IL2-STAT5 signaling pathways, as well as genes that regulate diverse immune processes including activation and proliferation of T cell and B lymphocytes, mononuclear cell differentiation, and immunoglobulin production. Pro-LNM cluster L6, the smallest LNM gene cluster (N = 14 genes) was (like anti-LNM cluster L3) overrepresented for RNA splicing, further suggesting a role of splicing regulation in metastasis.
Disruption of the P53-DREAM pathway is a pro-LNM factor in HNC
Pro-LNM gene cluster L4 was the largest pro-LM gene cluster and included six out of ten genes with the strongest pro-LNM associations. This cluster consisted of cell cycle and proliferation-related genes such as E2F TF target genes, which accounted for the enrichment of cell cycle genes in pro-LNM genes overall. Many of the genes within cluster L4 represent periodically expressed (oscillating) G2/M and G1/S phase genes [62] (Fig. 3A, B, Additional file 2: Table S6) indicating roles that are specific to different stages of the cell cycle. A subset of cell cycle genes that are repressed by E2F4/5 are indirectly regulated by p53 through the p53-p21-DREAM-CDE/CHR pathway [60, 85] (the “p53-DREAM” pathway). Since p53 inactivation occurs in most HNCs and is associated with LNM [1, 13], we investigated overlap of LNM signatures with a set of 201 p53-DREAM-repressed genes [60] (Fig. 3A). P53-DREAM targets were strikingly overrepresented within pro-LNM cluster L4 (Fig. 3A) and displayed a much stronger enrichment within pro-LNM genes than any of the MsigDB gene sets previously analyzed (normalized GSEA score = 3.5, FDR-adjusted p-value = 4 × 10-52). We tested the hypothesis that cluster L4 genes are overexpressed in HNCs due to p53 inactivation by analyzing their expression in relation to both TP53 mutations and HPV status (Fig. 3C), since p53 is targeted for ubiquitination-mediated degradation by the HPV E6 oncoprotein [86]. Indeed, cluster L4 genes were upregulated in both HPV + ve tumors and TP53mut/HPV − ve HNCs relative to TP53wt/HPV − ve HNCs, indicating that these genes might be upregulated in HNC due to loss of repression by p53-DREAM. Moreover, cluster L4 genes were upregulated in HNCs with different functional categories of TP53 mutation (Fig. 3C). This is consistent with the hypothesis that they are upregulated due to loss of p53 TF activity, which is associated with all TP53 mutation types, rather than acquired functions of mutant p53, which are conferred by specific (usually missense) TP53 mutations [87]. Importantly, p53 is primarily a transcriptional activator and represses transcription indirectly by transcriptionally activating CDKN1A (encoding p21), whose cyclin-dependent kinase-inhibitory activity facilitates assembly of the DREAM complex [60, 88]. Indeed, CDKN1A was negatively associated with LNM in our meta-analysis (included in anti-LNM cluster L1), suggesting that p21 could prevent metastasis by mediating transcriptional repression of P53-DREAM target genes within cluster L4. Importantly, other than CDKN1A, LNM-associated genes were not significantly enriched for genes that are directly activated by p53 [89], even though pro-survival genes significantly overlapped with p53-activated genes (P = 8 × 10−5) (Fig. 3A). Indeed, enrichment of pro-survival genes for p53-activated genes (e.g., TP53INP1, TP73, BTG3) could be expected, since TP53 mutations are associated with adverse survival [16]. Together these observations suggest that p53-inactivation supports LNM by upregulating genes that are repressed by p53-DREAM, while downregulation of p53-activated genes influences survival thorough LNM-independent mechanisms, such as by conferring therapy resistance [90].
Interestingly, cluster L4 genes were expressed at particularly high levels in HPV + ve HNCs (Fig. 3C), which could be explained due to inactivation of both p53 and DREAM by the HPV E6 [86, 92]and E7 [93] oncoproteins, as well as HPV E7-mediated repression of the tumor suppressor Rb1 [94], which transcriptionally represses a subset of proliferation genes that are also repressed by DREAM. Upregulation of these pro-LNM genes could explain the paradoxical observation that HPV + ve HNCs have particularly high rates of LNM, despite their favorable prognosis [95, 96]. Moreover, high expression of cluster L4 genes in HPV + ve HNCs could account for why these genes were not associated with survival, since they are associated with both adverse (LNM and p53 inactivation) and favorable (HPV positivity) prognostic factors. Indeed, while not associated with survival in HNC specifically, they were strongly adversely associated with survival in prior pan-cancer analysis [79] (Additional file 1: Figure S3B).
Specific cell types in the HNC TME express survival-associated genes
Identification of cell types that express prognostic genes could yield insight into their roles in disease progression. We sought to identify cells types that express survival-associated genes using two primary HNC scRNA-Seq datasets: a set of five HPV − ve primary HNCs that we profiled on the 10X Genomics platform (Stanford scRNA-Seq dataset [63]); and a published scRNA-Seq dataset of nine HPV − ve primary HNCs profiled using Smart-Seq2 technology (Puram dataset [67]) (Additional file 1: Figure S7 & S8). In each scRNA-Seq dataset, we identified the cell types that most highly expressed each prognostic gene (Table 3, Additional file 1: Figure S9, Additional file 2: Table S4). We further validated our observations of cell type-specific expression by analyzing the expression of survival and LNM genes in bulk RNA-Seq data of four cell populations that we flow sorted from primary HNCs (Table 3, Additional file 1: Figure S10, Additional file 2: Table S4) (Stanford bulk RNA-Seq dataset). These included malignant cells (n = 13), fibroblasts (n = 10), lymphocytes (n = 15), and endothelial cells (n = 12). To confirm the enrichment of target cell types by flow cytometry, CIBERSORTx was applied to infer the fractions of cell types within the Stanford bulk RNA-Seq dataset (Additional file 1: Figure S10B).
Expression of anti-survival cluster S1 and S2 genes were expressed in both malignant and mesenchyme-derived stromal cells in all three datasets (Table 3, Additional file 1: Figure S9 & S10). Consistent with their enrichment for EMT drivers (e.g., SNAI2 and ITGB6), expression of cluster S1 genes was correlated (Pearson r = 0.55, P < 2.2 × 10-16) with “EMT score,” a commonly used measure of EMT based on the expression of epithelial and mesenchymal genes [72] (Figure S11). Within malignant cells, expression of cluster S1 genes was strongly elevated within a distinct subpopulation of cells that displayed high EMT score, consistent with the existence of an aggressive mesenchymal malignant cell population within HNC. In contrast with cluster S1 genes, cluster S4 genes were largely restricted to fibroblasts. They included known fibroblast markers (e.g., FAP, FN1, SERPINH1) [66] and were mostly expressed in a subset termed cancer-associated fibroblast 1 (CAF1) [67] (Additional file 1: Figure S12). Indeed, recent single-cell studies have indicated that many adversely prognostic genes that were previously considered to represent cancer EMT markers are highly expressed in cancer-associated fibroblasts [97].
Expression of pro-survival cluster S3 genes were ubiquitously expressed across cell types, while S5 genes were restricted to cytotoxic T cells and other lymphocytes (Table 3, Additional file 1: Figure S9 & S10). Consistent with enrichment for epithelial markers, pro-survival cluster S6 was restricted to a minor subpopulation of non-proliferating (G1 phase) malignant cells that was particularly well differentiated, as indicated by high expression of anti-grade genes as well as a reported epithelial differentiation signature [67] (Table 3, Additional file 1: Figures S9, S10, & S13).
LNM genes are associated with malignant cell dedifferentiation and proliferation linked to loss of P53-DREAM-mediated repression
Analysis of LNM gene signatures in scRNA-Seq data confirmed downregulation of anti-LNM genes and upregulation of pro-LNM genes in combined cells of LNM + primary HNCs (N = 6) relative to those of LNM0 primary HNCs (N = 3), adjusting for cell type, cell cycle phase, and TP53 mutation status (Fig. 4A). This validated the association of the meta-analysis-derived genes with LNM status in combined cell types, analogous to bulk gene expression data. Interestingly, however, analyses within each cell type revealed strong downregulation of all three anti-LNM gene clusters, and upregulation of all three pro-LNM clusters, within malignant cells of LNM + HNCs; however, no consistent pattern of deregulation was observed in any other cell type (Fig. 4A). This suggests that the LNM-associated genes are deregulated primarily (or specifically) within malignant cells of LNM + HNCs. Consistent with this, LNM-associated genes, both anti-LNM and pro-LNM genes, were primarily expressed within malignant cells (Figs. 2D and 4A, Additional file 1: Figures S9B, S10, & S13), in contrast with the heterogenous expression of survival-associated genes across cell types. Both major clusters of anti-LNM genes (L1 and L2) were restricted to the well-differentiated malignant cell cluster that also expressed pro-survival cluster S6 genes (with which cluster L2 overlapped) (Table 3, Fig. 2D, Additional file 1: Figures S9B, S10, & S13). This concurs with the epithelial differentiation-related functions of these genes. Anti-LNM cluster L3 genes were ubiquitous across cell types, consistent with their roles in basic cellular processes.
Pro-LNM cluster L4, the largest pro-LNM gene cluster, was primarily expressed in malignant cells (Table 3, Additional file 1: Figures S9, S10), particularly those in G2/M cell cycle phase (Figs. 2D and 3D, & Additional file 1: S13), consistent with our earlier findings. Moreover, they were upregulated in malignant cells of TP53 mutated (TP53mut) HPV − ve HNCs (N = 5) relative to TP53 wild-type (TP53wt) HPV − ve HNCs (N = 3) (Fig. 3D), implying that their overexpression is caused by p53 inactivation. Furthermore, within malignant cells, cluster L4 genes were negatively correlated with CDKN1A (Fig. 3E), consistent with their being transcriptionally repressed by the p53-DREAM pathway [88]. Interestingly, they were anticorrelated with CDKN1A expression within malignant cells of both TP53mut and TP53wt HNCs, suggesting that in the absence of TP53 mutations or HPV, downregulation of p21 by other mechanisms such as TP53 deletion [99] could disrupt DREAM-mediated repression of pro-LNM genes.
Pro-LNM Cluster L5 genes were ubiquitously expressed, consistent with their diverse functional repertoire (Table 3, Additional file 1: Figures S9 & S10). Many cluster L5 genes were primarily expressed in leukocytes, confirming a previous report of an immune gene signature that was associated with LNM [19]. Cluster L6 genes were ubiquitously expressed across cell types (Table 3, Additional file 1: Figure S9 & S10), consistent with their roles in RNA splicing.
Taken together, our findings indicate that LNM-associated genes are primarily deregulated within the malignant cells. This is consistent with the functional roles of LNM-associated genes in epithelial differentiation and cell cycle regulation, as well as the observation that they are associated with tumor grade, a measure of dedifferentiation of malignant cells.
Less differentiated LNM-associated malignant cells express non-EMT-related stemness genes
Interestingly, in both scRNA-Seq datasets, the subset of L4-expressing malignant cells that lacked expression of differentiation markers also highly expressed “stemness signatures,” including genes positively associated with tumor grade in HNC and embryonic stem cell (ESC)-specific genes (Fig. 2D, Additional file 1: Figure S13). This malignant cell subpopulation displayed high transcriptional diversity—a hallmark of pluripotent cells—as assessed by CytoTRACE analysis [70] (Fig. 2D, Additional file 1: Figure S13). Within malignant cells, cluster L4 genes were linearly correlated with dedifferentiation and stemness signatures (Fig. 2D). Taken together, these findings suggest that a subset of malignant cells expressing LNM cluster L4 genes could be cancer stem-like cells that have the potential to seed metastases [100].
Pro-LNM cluster L4 genes and anti-LNM genes displayed a “mutually exclusive” expression pattern in scRNA-Seq (Additional file 1: Figure S14), suggesting an antagonistic relationship between them. This inverse association is consistent with the general observation that upregulation of cell cycle genes is inversely correlated with expression of cell type-specific differentiation-related genes across multicellular organisms [101, 102]. Thus, loss of epithelial differentiation transcriptional programs in malignant cells might represent an oncogenic switch to a proliferative state occurring as a secondary consequence of loss of p53-DREAM-mediated repression. Importantly, while EMT has been implicated in both LNM and stemness [103], in our analysis pro-LNM genes did not include EMT or mesenchyme-related genes, and did not correlate with EMT score (Fig. 2D & Additional file 1: S11B).
Enhanced deregulation of LNM-associated genes after lymph node metastasis
Our LNM meta-analysis identified genes that are deregulated in primary tumors of LNM + HNC cases; we next investigated whether they are further deregulated after metastasis, by comparing their expression in LNMs (metastatic tumors) relative to patient-matched primary tumors. We first investigated this at the patient population level by comparing mean expression of pro-LNM genes and anti-LNM between bulk RNA-Seq profiles of primary HNCs (N = 29), and patient-matched LNMs (N = 72), using a dataset published by Huang et al. [98]. This revealed strong downregulation of anti-LNM genes in LNMs relative to primary tumors, coupled with modest but statistically significant upregulation of Pro-LNM genes (Fig. 4B). Since our scRNA-Seq analyses indicated that LNM-associated genes are deregulated primarily within malignant cells, we next compared their expression between malignant cells of primary HNCs and patient-matched LNMs within the Puram and Stanford scRNA-Seq datasets. This confirmed downregulation of anti-LNM genes in malignant cells of LNMs relative to those of primary tumors in both datasets, adjusted for known modifiers of LNM gene expression including cell cycle phase and TP53 mutation status (In the Puram dataset where TP53 mutation status was available) (Fig. 4B). Pro-LNM genes were marginally upregulated in malignant cells of LNMs within the Stanford dataset, but not the Puram dataset; therefore we could not confirm the upregulation of pro-LNM genes in LNMs observed in bulk RNA-Seq data.
We investigated the expression patterns of LNM-associated genes within LNMs by analyzing their expression in combined LNMs and primary HNCs of the Puram and Stanford scRNA-Seq datasets. Consistent with our observations in primary HNCs, pro-LNM gene expression was highest in a stem-like, proliferating malignant cell cluster that consisted of both primary tumor and LNM-derived malignant cells (Additional file 1: Figure S14). This indicates that LNMs maintain a subpopulation of stem-like malignant cells after metastasis. Also consistent with primary HNCs, anti-LNM genes were highest expressed within a well-differentiated malignant cell cluster within LNMs; however, their expression was lower within malignant cells of each unsupervised cell cluster in LNMs compared to those of the same cluster in primary HNCs (Additional file 1: Figure S14). This indicates that anti-LNM genes are generally downregulated throughout the malignant cell compartments of LNMs.
Deregulation of LNM-associated genes as an early tumorigenic event
Our observation that pro-LNM genes are upregulated in p53-inactivated malignant cells led us to hypothesize that transcriptional repression these p53-DREAM target genes could be lost during early tumorigeneses, since p53 inactivation occurs in premalignant lesions and is understood to cause tumorigenesis in HNC [17, 104, 105]. To investigate this, we analyzed expression of LNM-associated genes in 86 oral premalignant lesions (OPLs) [78]. Indeed, we found that pro-LNM cluster L4 genes were strongly upregulated with advancing stages of premalignant disease progression and increasing epithelial dysplasia, a histological phenotype used to grade OPLs that identifies OPLs at higher risk of malignant transformation [106] (Fig. 2E). Conversely, anti-LNM were negatively associated with progression (Fig. 2E). Together these findings indicate that upregulation of cluster L4 p53-DREAM target genes is concomitant with epithelial dedifferentiation, precedes malignant transformation, and could be an early driver of LNM.
Discussion
Robust identification of pathways and cell types with clinical prognosis in HNC can yield insights into the biology of HNC progression and be used to nominate targeted therapies. Through large-scale meta-analysis, we identified genes associated with survival and LNM. Unsupervised clustering applied to these genes highlighted clusters of co-expressed genes which were associated with distinct pathways. Analysis of these prognostic gene clusters in HNC scRNA-Seq and flow-sorted cells indicated that some were associated with distinct cell subtypes, revealing cell subtypes and processes that influence clinical outcomes.
A key finding is that genes associated with LNM status are primarily deregulated within the malignant cells and that their deregulation is intrinsically tied to epithelial dedifferentiation, as indicated by their associations with grade and stemness within malignant cells. Genes negatively associated with LNM (anti-LNM genes) were enriched for epithelial-specific functions and are expressed within well-differentiated malignant cells. Conversely, pro-LNM genes were strongly associated with grade; and the largest cluster of them (cluster L4) was primarily expressed in undifferentiated malignant cells. Dedifferentiation is a hallmark of cancer that is indicative of aggressiveness and poor prognosis across cancer types [107, 108]. This subset of undifferentiated malignant cells expressing pro-LNM genes is thus consistent with an aggressive subpopulation of metastatic cancer stem-like cells [100]. Our findings postulate that epithelial dedifferentiation is a major driver of LNM in HNC and could represent an early driver, since LNM-associated genes were associated with epithelial dysplasia in OPLs.
We identified loss of p53-DREAM-mediated repression as a potential initiating mechanism of LNM in HNC, since pro-LNM cluster L4 genes were overrepresented for known targets of this pathway and were overexpressed in malignant cells of p53mut HNCs, as well as in HPV + ve HNCs, where it is disrupted by HPV [86, 93]. Furthermore, expression of this gene cluster was negatively associated with expression of CDKN1A, an essential mediator of p53-DREAM-mediated repression, which was itself negatively associated with LNM. Since TP53 mutations and HPV infection represent tumor initiating events in HNC [4, 109], p53 inactivation likely represent the initial cause of deregulation of cluster L4 genes, particularly since they were overexpressed in high-grade OPLs, in which TP53 mutations frequently occur and are understood to predispose to malignant transformation [17, 104]. The hypothesis that p53 inactivation leads to LNM is supported by experimental evidence in animal models [110, 111], as well as by the observations that TP53 mutations are associated with LNM [19, 20] and extranodal extension [112]. While p53 regulates many oncogenic processes both through its TF activity as well as transcription-independent mechanisms [113,114,115], TP53 mutations are understood to promote cancer primarily by disrupting TF activity, since virtually all disrupt DNA binding [87, 116]. Our findings suggest that loss of p53 TF activity drives LNM specifically due to loss of p53-DREAM-mediated repression, since genes that are activated by p53, apart from CDKN1A, were not generally associated with LNM, even though they were associated with longer survival.
Anti-LNM genes could be downregulated as a secondary consequence of loss of p53-DREAM-mediated repression, as their expression is lost in the stem-like malignant cells that express pro-LNM genes. Moreover, the pro-LNM proliferation-related genes were also found to be strongly associated with tumor grade, indicating that proliferation coincides with stable epithelial dedifferentiation. This pattern of mutual exclusion is consistent with the observation across multicellular organisms that expression of genes that promote cell cycle progression and proliferation is inversely associated with expression of cellular differentiation-related genes [101, 102]. While the mechanisms underlying this phenomenon are not fully understood, terminal differentiation is thought to require exit from the cell cycle, which is facilitated by repression of cell cycle genes by DREAM and its E2F components [102, 117]. Our findings therefore lead us to postulate a model wherein loss of p53-DREAM-mediated repression of cell cycle arrest causes hyperproliferation of malignant cells, in turn causing epithelial dedifferentiation and stemness by antagonizing expression of epithelial pathways (Fig. 5).
Genes downregulated in LNM + primary tumors (anti-LNM genes) were further downregulated in malignant cells of lymph node metastases. This suggests that loss of these epithelial differentiation-related genes not only predisposes to metastasis but could contribute to later stages of metastasis. Moreover, this indicates that dedifferentiation is enhanced during metastasis in HNC, in contrast with the hypothesis that pro-metastatic cellular plasticity is reversible after metastatic colonization [123,124,125]. LNMs were found to contain a subpopulation of stem-like malignant cells transcriptionally similar to those observed in primary tumors, albeit with lower expression of anti-LNM genes; these cells therefore feature particularly metastatic transcriptomes, suggesting their potential to seed further metastases.
Favorably survival-associated genes included a subset of the epithelial differentiation-related genes that were negatively associated with LNM; genes that presumably prolong survival by preventing metastasis. Despite this, most survival-associated genes were involved in immune and stroma-related processes that were not associated with LNM. Indeed, over half of survival-associated genes were primarily expressed in non-malignant cells, indicative of the importance of tumor immune and stromal microenvironments in HNC progression [26, 80,81,82, 126]. Some of our findings were expected, such as a major role for T cells/NKT cell-expressed gene programs in promoting survival. More surprising was the striking overrepresentation of mesenchyme-related genes among anti-survival genes, including many that have been implicated in EMT. A subset of these genes, particularly those within anti-survival cluster S1, were highly expressed in malignant cells that also had high EMT score. But cluster S1 genes were also high in fibroblasts, which have increasingly been implicated in HNC progression [127, 128]. Additionally, anti-survival genes in cluster S4 were mostly restricted to fibroblasts, and therefore cannot be related to malignant cell EMT. These genes were overrepresented for ECM component genes, further suggesting that they promote HNC progression by modulating fibroblast remodeling of the ECM [129,130,131]. Alternatively, CAFs could promote HNC progression by expressing growth factors that promote proliferation and growth of malignant cells, or by modulating the immune system to avoid detection of malignant cells [132]. While our findings support a possible role for EMT in HNC survival, they are inconsistent with the hypothesis that EMT promotes either LNM or stemness, despite the widespread hypothesis that EMT causes metastasis by giving rise to stem-like malignant cells [82, 103]. Indeed, a recent report showed that EMT scores were generally not associated with metastasis after controlling for stromal expression of mesenchymal genes [97].
Conclusions
In conclusion, our results suggest that LNM is primarily driven by loss of p53-DREAM-mediated repression resulting in proliferation and EMT-independent dedifferentiation of malignant cells, while patient survival is influenced by epithelial differentiation in addition to tumor microenvironmental factors. Experimental studies are needed to confirm a causal pro-metastatic role of p53-DREAM target genes in HNC, which would nominate them as potential therapeutic targets for this deadly disease.
Availability of data and materials
The Stanford bulk RNA-Seq dataset is accessible from Gene Expression Omnibus (Accession number: GSE113839) [133]. The Stanford scRNA-Seq dataset, which was recently described [63], is also available from GEO (Accession number: GSE140042). All other datasets that were analyzed and are publicly available. Accession numbers for datasets that were analyzed as part of these meta-analyses are provided in Table 1 and Additional file 2: Table S1. The uniformly processed gene expression and accompanying clinical datasets used to perform meta-analyses are publicly available at Zenodo [134]: https://zenodo.org/record/7679088.
Publicly available datasets analyzed
Publicly available datasets analyzed within this study were accessed from Gene Expression Omnibus [135], Array Express [136], and the European Genome Phenome Archive [137]. These datasets are accessible using the following accession codes (with references and URL links):
Gene Expression Omnibus:
GSE113839 [133]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113839
GSE140042 [63]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140042
GSE103322 [67]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103322
GSE26549 [78]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26549
GSE39366 [37]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39366
GSE2837 [46]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2837
GSE31056 [48]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31056
GSE65858 [19]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65858
GSE41613 [39]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41613
GSE42743 [39]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42743
GSE33205 [51]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33205
GSE41116 [42]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41116
GSE85195 [43]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE85195
GSE3292 [44]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3292
GSE95805 [45]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95805
GSE23558 [47]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23558
GSE2379 [49]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2379
GSE75538 [138]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75538
GSE75539 [138]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE75539
GSE27020 [38]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE27020
GSE30784 [139]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30784
GSE23036 [50]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23036
GSE6791 [140]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6791
GSE10121 [52]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE10121
GSE78060 [53]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78060
GSE9844 [54]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9844
GSE2280 [55]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2280
GSE6631 [56]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6631
GSE3524 [57]—https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE3524
ArrayExpress
E-MTAB-1328 [40]—https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-1328
European Genome Phenome Archive
EGAS00001003233 [98]—https://ega-archive.org/studies/EGAS00001003233
The Cancer Genome Atlas
TCGA data [16] – TCGA data was accessed using TCGAbiolinks [31].
The code used to perform the meta-analyses, as well as the code used to process and analyze the bulk and single-cell RNA-Seq datasets are available at GitHub [141].
References
Ferlay J, et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int J Cancer. 2019;144:1941–53.
Pulte D, Brenner H. Changes in survival in head and neck cancers in the late 20th and early 21st century: a period analysis. Oncologist. 2010;15:994.
Brennan K, Koenig JL, Gentles AJ, Sunwoo JB, Gevaert O. Identification of an atypical etiological head and neck squamous carcinoma subtype featuring the CpG island methylator phenotype. EBioMedicine. 2017. https://doi.org/10.1016/j.ebiom.2017.02.025.
Leemans CR, Snijders PJF, Brakenhoff RH. The molecular landscape of head and neck cancer. Nat Rev Cancer. 2018. https://doi.org/10.1038/nrc.2018.11.
Begg AC. Predicting recurrence after radiotherapy in head and neck cancer. Semin Radiat Oncol. 2012;22:108–18.
Duprez F, et al. Distant metastases in head and neck cancer. Head Neck. 2017;39:1733–43.
Garavello W, Ciardo A, Spreafico R, Gaini RM. Risk factors for distant metastases in head and neck squamous cell carcinoma. Arch Otolaryngol Head Neck Surg. 2006;132:762–6.
Cho J-K, et al. Significance of lymph node metastasis in cancer dissemination of head and neck cancer. Transl Oncol. 2015;8:119.
Belcher R, Hayes K, Fedewa S, Chen AY. Current treatment of head and neck squamous cell cancer. J Surg Oncol. 2014;110:551–74.
Ferris RL, et al. Nivolumab for recurrent squamous-cell carcinoma of the head and neck. N Engl J Med. 2016;375:1856–67.
Cohen EEW, et al. Pembrolizumab versus methotrexate, docetaxel, or cetuximab for recurrent or metastatic head-and-neck squamous cell carcinoma (KEYNOTE-040): a randomised, open-label, phase 3 study. Lancet Lond Engl. 2019;393:156–67.
Mei M et al. Comparative efficacy and safety of radiotherapy/cetuximab versus radiotherapy/chemotherapy for locally advanced head and neck squamous cell carcinoma patients: a systematic review of published, primarily non-randomized, data. Ther Adv Med Oncol. 2020;12 https://doi.org/10.1177/1758835920975355.
Hitt R, et al. Phase II study of the combination of cetuximab and weekly paclitaxel in the first-line treatment of patients with recurrent and/or metastatic squamous cell carcinoma of head and neck. Ann Oncol. 2012;23:1016–22.
Wise-Draper TM, et al. Future directions and treatment strategies for head and neck squamous cell carcinomas. Transl Res J Lab Clin Med. 2012;160:167–77.
Kimple RJ, et al. Enhanced radiation sensitivity in HPV-positive head and neck cancer. Cancer Res. 2013;73:4791–800.
Lawrence MS, et al. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature. 2015;517:576–82.
Zhou G, Liu Z, Myers JN. TP53 mutations in head and neck squamous cell carcinoma and their impact on disease progression and treatment response. J Cell Biochem. 2016;117:2682–92.
Campbell JD, et al. Genomic, pathway network, and immunologic features distinguishing squamous carcinomas. Cell Rep. 2018. https://doi.org/10.1016/j.celrep.2018.03.063.
Wichmann G, et al. The role of HPV RNA transcription, immune response-related gene expression and disruptive TP53 mutations in diagnostic and prognostic profiling of head and neck cancer. Int J Cancer. 2015;137:2846–57.
Biswas NK, et al. Lymph node metastasis in oral cancer is strongly associated with chromosomal instability and DNA repair defects. Int J Cancer. 2019;145:2568–79.
Neskey DM, et al. Evolutionary Action score of TP53 (EAp53) identifies high risk mutations associated with decreased survival and increased distant metastases in head and neck cancer. Cancer Res. 2015;75:1527.
Johnson DE, et al. Head and neck squamous cell carcinoma. Nat Rev Dis Primer. 2020;61:1–22.
Ferris RL. Immunology and immunotherapy of head and neck cancer. J Clin Oncol. 2015;33:3293–304. https://doi.org/10.1200/JCO.2015.61.1509.
Brennan K, et al. NSD1 inactivation defines an immune cold, DNA hypomethylated subtype in squamous cell carcinoma. Sci Rep. 2017;7:17064.
Fridman WH, Pagès F, Saut̀s-Fridman C, Galon J. The immune contexture in human tumours: impact on clinical outcome. Nat Rev Cancer. 2012;12:298–306.
Borsetto D, et al. Prognostic significance of CD4+ and CD8+ tumor-infiltrating lymphocytes in head and neck squamous cell carcinoma: a meta-analysis. Cancers. 2021;13:781.
Alsahafi E, et al. Clinical update on head and neck cancer: molecular biology and ongoing challenges. Cell Death Dis. 2019;10:1–17.
Cao S, et al. Dynamic host immune response in virus-associated cancers. Commun Biol. 2019. https://doi.org/10.1038/s42003-019-0352-3.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:1–13.
Gao J, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6:pl1.
Colaprico A, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2016;44:e71.
Durinck S, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005;21:3439–40.
Zhao Y, Wong L, Goh WWB. How to do quantile normalization correctly for gene expression data analyses. Sci Rep. 2020;10:1–11.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostat Oxf Engl. 2007;8:118–27.
Stouffer SA, Suchman EA, Devinney LC, Star SA, Williams RM Jr. The American soldier: Adjustment during army life. (Studies in social psychology in World War II), Vol. 1. The American soldier: Adjustment during army life. (Studies in social psychology in World War II), Vol. 1 (Princeton Univ. Press, 1949).
Lipták T. On the combination of independent tests. Magy Tud Akad Mat Kut Int Kozl. 1958;3:171–97.
Walter V, et al. Molecular subtypes in head and neck cancer exhibit distinct patterns of chromosomal gain and loss of canonical cancer genes. PLoS One. 2013;8:e56823.
Fountzilas E, et al. Identification and validation of a multigene predictor of recurrence in primary laryngeal cancer. PLoS One. 2013;8:e70429.
Lohavanichbutr P, et al. A 13-gene signature prognostic of HPV-negative OSCC: discovery and external validation. Clin Cancer Res Off J Am Assoc Cancer Res. 2013;19:1197–203.
Jung AC, et al. A poor prognosis subtype of HNSCC is consistently observed across methylome, transcriptome, and miRNome analysis. Clin Cancer Res Off J Am Assoc Cancer Res. 2013;19:4174–84.
Thurlow JK, et al. Spectral clustering of microarray data elucidates the roles of microenvironment remodeling and immune responses in survival of head and neck squamous cell carcinoma. J Clin Oncol Off J Am Soc Clin Oncol. 2010;28:2881–8.
Pickering CR, et al. Integrative genomic characterization of oral squamous cell carcinoma identifies frequent somatic drivers. Cancer Discov. 2013;3:770–81.
Bhosale PG, et al. Chromosomal alterations and gene expression changes associated with the progression of leukoplakia to advanced gingivobuccal cancer. Transl Oncol. 2017;10:396–409.
Chung CH, et al. Increased epidermal growth factor receptor gene copy number is associated with poor prognosis in head and neck squamous cell carcinomas. J Clin Oncol Off J Am Soc Clin Oncol. 2006;24:4170–6.
García-Escudero R, et al. Overexpression of PIK3CA in head and neck squamous cell carcinoma is associated with poor outcome and activation of the YAP pathway. Oral Oncol. 2018;79:55–63.
Chung CH, et al. Gene expression profiles identify epithelial-to-mesenchymal transition and activation of nuclear factor-kappaB signaling as characteristics of a high-risk head and neck squamous cell carcinoma. Cancer Res. 2006;66:8210–8.
Ambatipudi S, et al. Genome-wide expression and copy number analysis identifies driver genes in gingivobuccal cancers. Genes Chromosomes Cancer. 2012;51:161–73.
Reis PP, et al. A gene signature in histologically normal surgical margins is predictive of oral carcinoma recurrence. BMC Cancer. 2011;11:437.
Cromer A, et al. Identification of genes associated with tumorigenesis and metastatic potential of hypopharyngeal cancer by microarray analysis. Oncogene. 2004;23:2484–98.
Pavón MA, et al. Gene expression signatures and molecular markers associated with clinical outcome in locally advanced head and neck carcinoma. Carcinogenesis. 2012;33:1707–16.
Stansfield JC, et al. Toward signaling-driven biomarkers immune to normal tissue contamination. Cancer Inform. 2016;15:15–21.
Sticht C, et al. Activation of MAP kinase signaling through ERK5 but not ERK1 expression is associated with lymph node metastases in oral squamous cell carcinoma (OSCC). Neoplasia N Y N. 2008;10:462–70.
Enokida T, et al. Gene expression profiling to predict recurrence of advanced squamous cell carcinoma of the tongue: discovery and external validation. Oncotarget. 2017;8:61786–99.
Ye H, et al. Transcriptomic dissection of tongue squamous cell carcinoma. BMC Genomics. 2008;9:69.
O’Donnell RK, et al. Gene expression signature predicts lymphatic metastasis in squamous cell carcinoma of the oral cavity. Oncogene. 2005;24:1244–51.
Kuriakose MA, et al. Selection and validation of differentially expressed genes in head and neck cancer. Cell Mol Life Sci. 2004;61:1372–83.
Toruner GA, et al. Association between gene expression profile and tumor invasion in oral squamous cell carcinoma. Cancer Genet Cytogenet. 2004;154:27–35.
Viechtbauer W. Conducting meta-analyses in R with the metafor. J Stat Softw. 2010;36:1–48.
Liberzon A, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–40.
Fischer M, Quaas M, Steiner L, Engeland K. The p53–p21-DREAM-CDE/CHR pathway regulates G2/M cell cycle genes. Nucleic Acids Res. 2016;44:164.
Levine JH, et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell. 2015. https://doi.org/10.1016/j.cell.2015.05.047.
Dominguez D, et al. A high-resolution transcriptome map of cell cycle reveals novel connections between periodic genes and cancer. Cell Res. 2016;26(8):946–62.
Zhang W, et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat Methods. 2022;19:759–69.
Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 2019;20(1):1–15.
Waltman L, van Eck NJ. A smart local moving algorithm for large-scale modularity-based community detection. Eur Phys J B. 2013;86:1–14.
Franzén O, Gan L-M, Björkegren JLM. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database. 2019;2019:46.
Puram SV, et al. Single-cell transcriptomic analysis of primary and metastatic tumor ecosystems in head and neck cancer. Cell. 2017. https://doi.org/10.1016/j.cell.2017.10.044.
Ferenbach D, Hughes J. Macrophages and dendritic cells: what is the difference? Kidney Int. 2008;74:5–7.
Guilliams M, et al. Dendritic cells, monocytes and macrophages: a unified nomenclature based on ontogeny. Nat Rev Immunol. 2014;14:571.
Gulati GS, et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Science. 2020. https://doi.org/10.1126/science.aax0249.
Gibbons DL, Creighton CJ. Pan-cancer survey of epithelial–mesenchymal transition markers across the Cancer Genome Atlas. Dev Dyn. 2018;247:555–64.
Creighton CJ, Gibbons DL, Kurie JM. The role of epithelial-mesenchymal transition programming in invasion and metastasis: a clinical perspective. Cancer Manag Res. 2013;5:187–95.
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016. https://doi.org/10.1038/nbt.3519.
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. 2015;4:1521.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010. https://doi.org/10.1186/gb-2010-11-10-r106.
Newman AM, et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat Biotechnol. 2019. https://doi.org/10.1038/s41587-019-0114-2.
Ellrott K, et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 2018;6:271–281.e7.
Saintigny P, et al. Gene expression profiling predicts the development of oral CancerGene expression profiling predicts oral cancer development. Cancer Prev Res(Phila Pa). 2011;4:218–29.
Gentles AJ, et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat Med. 2015;21:1–12.
Spector ME, et al. Prognostic value of tumor-infiltrating lymphocytes in head and neck squamous cell carcinoma. JAMA Otolaryngol Neck Surg. 2019;145:1012–9.
González-González R, et al. Epithelial-mesenchymal transition associated with head and neck squamous cell carcinomas: a review. Cancers. 2021;13:3027.
Pal A, Barrett TF, Paolini R, Parikh A, Puram SV. Partial EMT in head and neck cancer biology: a spectrum instead of a switch. Oncogene. 2021;40:5049–65.
Zani MB, Sant’Ana AM, Tognato RC, Chagas JR, Puzer L. Human tissue kallikreins-related peptidases are targets for the treatment of skin desquamation diseases. Front Med. 2022;8:1–11.
Assou S, et al. A meta-analysis of human embryonic stem cells transcriptome integrated into a web-based expression atlas. Stem Cells Dayt Ohio. 2007;25:961–73.
Peuget S, Selivanova G. p53-dependent repression: DREAM or Reality? Cancer. 2021;13:48505.
Martinez-Zapien D, et al. Structure of the E6/E6AP/p53 complex required for HPV-mediated degradation of p53. Nature. 2016;529:541–5.
Kennedy MC, Lowe SW. Mutant p53: it’s not all one and the same. Cell Death Differ. 2022;29:983–7.
Fischer M, Steiner L, Engeland K. The transcription factor p53: not a repressor, solely an activator. Cell Cycle. 2014;13:3037–58.
Fischer M. Census and evaluation of p53 target genes. Oncogene. 2017;36:3943–56.
Perrone F, et al. TP53 mutations and pathologic complete response to neoadjuvant cisplatin and fluorouracil chemotherapy in resected oral cavity squamous cell carcinoma. J Clin Oncol. 2010;28:761–6.
Dietz A et al. Association of head and neck cancer (HNSCC) subgroups defined by HPV RNA status, gene expression patterns, and TP53 mutations with lymph node metastasis and survival. J Clin Oncol. 2015;33:6046. https://doi.org/10.1200/jco.2015.33.15_suppl.6046.
Scheffner M, Werness BA, Huibregtse JM, Levine AJ, Howley PM. The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes the degradation of p53. Cell. 1990;63:1129–36.
Fischer M, Uxa S, Stanko C, Magin TM, Engeland K. Human papilloma virus E7 oncoprotein abrogates the p53–p21-DREAM pathway. Sci Rep. 2017;7:1–11.
Munger K, Jones DL. Human papillomavirus carcinogenesis: an identity crisis in the retinoblastoma tumor suppressor pathway. J Virol. 2015;89:4708–11.
Bauwens L, et al. Prevalence and distribution of cervical lymph node metastases in HPV-positive and HPV-negative oropharyngeal squamous cell carcinoma. Radiother Oncol. 2021. https://doi.org/10.1016/j.radonc.2021.01.028.
Husain N, Neyaz A. Human papillomavirus associated head and neck squamous cell carcinoma: controversies and new concepts. J Oral Biol Craniofacial Res. 2017;7:198.
Tyler M, Tirosh I. Decoupling epithelial-mesenchymal transitions from stromal profiles by integrative expression analysis. Nat Commun. 2021;12:1–13.
Huang L, et al. Molecular classification of lymph node metastases subtypes predict for survival in head and neck cancer. Clin Cancer Res. 2019. https://doi.org/10.1158/1078-0432.CCR-18-1884.
Donehower LA, et al. Integrated analysis of TP53 gene and pathway alterations in the cancer genome atlas. Cell Rep. 2019;28:1370–1384.e5.
Steinbichler TB, et al. Cancer stem cells and their unique role in metastatic spread. Semin Cancer Biol. 2020. https://doi.org/10.1016/j.semcancer.2019.09.007.
Xia K, et al. Identification of the proliferation/differentiation switch in the cellular network of multicellular organisms. PLoS Comput Biol. 2006;2:1482–97.
Ruijtenberg S, van den Heuvel S. Coordinating cell proliferation and differentiation: antagonism between cell cycle regulators and cell type-specific gene expression. Cell Cycle. 2016;15:196–212. https://doi.org/10.1080/15384101.2015.1120925.
Wilson MM, Weinberg RA, Lees JA, Guen VJ. Emerging mechanisms by which EMT programs control stemness. Trends Cancer. 2020;6:775–80.
Sawada K, et al. Immunohistochemical staining patterns of p53 predict the mutational status of TP53 in oral epithelial dysplasia. Mod Pathol. 2021;35:177–85.
Boyle JO, et al. The incidence of p53 mutations increases with progression of head and neck cancer. Cancer Res. 1993;53:4477–80.
Nakano K, Nagatsuka H. Diagnosis of oral squamous cell carcinomas and precancerous lesions. Inflamm Oral Cancer. 2022:19–41 https://doi.org/10.1016/B978-0-323-88526-3.00002-6.
Ben-Porath I, et al. An embryonic stem cell-like gene expression signature in poorly differentiated aggressive human tumors. Nat Genet. 2008;40:499–507.
Malta TM, et al. Machine learning identifies stemness features associated with oncogenic dedifferentiation. Cell. 2018. https://doi.org/10.1016/j.cell.2018.03.034.
Smeets SJ, et al. Immortalization of oral keratinocytes by functional inactivation of the p53 and pRb pathways. Int J Cancer. 2011;128:1596–605.
Ku TKS, et al. Loss of p53 expression correlates with metastatic phenotype and transcriptional profile in a new mouse model of head and neck cancer. Mol Cancer Res MCR. 2007;5:351–62.
Wang J, et al. Epithelial mutant p53 promotes resistance to anti-PD-1-mediated oral cancer immunoprevention in carcinogen-induced mouse models. Cancers. 2021;13:1471.
Sandulache VC, et al. High-risk tp53 mutations are associated with extranodal extension in oral cavity squamous cell carcinoma. Clin Cancer Res. 2018;24:1727–33.
Ho T, Tan BX, Lane D. How the other half lives: what p53 does when it is not being a transcription factor. Int J Mol Sci. 2020;21:13.
Capaci V, et al. Mutant p53 induces Golgi tubulo-vesiculation driving a prometastatic secretome. Nat Commun. 2020;11:1–19.
Muller PAJ, et al. Mutant p53 drives invasion by promoting integrin recycling. Cell. 2009;139:1327–41.
Kato S, et al. Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci U S A. 2003;100:8424–9.
Fischer M, Müller GA. Cell cycle transcription control: DREAM/MuvB and RB-E2F complexes. 2017;52:638-662. https://doi.org/10.1080/10409238.2017.1360836.
Cao X et al. The DNMT1/miR-34a/FOXM1 axis contributes to stemness of liver cancer cells. J Oncol. 2020;2020:8978930. https://doi.org/10.1155/2020/8978930.
Wu Y, et al. SUZ12 is a novel putative oncogene promoting tumorigenesis in head and neck squamous cell carcinoma. J Cell Mol Med. 2018;22:3582–94.
Warrier NM, Agarwal P, Kumar P. Emerging importance of survivin in stem cells and cancer: the development of new cancer therapeutics. Stem Cell Rev Rep. 2020;16:828.
Liu L, et al. An RFC4/Notch1 signaling feedback loop promotes NSCLC metastasis and stemness. Nat Commun. 2021;12:1–16.
Elkashty OA, Ashry R, Tran SD. Head and neck cancer management and cancer stem cells implication. Saudi Dent J. 2019;31:395–416.
Bakir B, Chiarella AM, Pitarresi JR, Rustgi AK. EMT, MET, plasticity, and tumor metastasis. Trends Cell Biol. 2020;30:764–76.
Jehanno C, Vulin M, Richina V, Richina F, Bentires-Alj M. Phenotypic plasticity during metastatic colonization. Trends Cell Biol. 2022;32:854–67.
Pérez-González A, Bévant K, Blanpain C. Cancer cell plasticity during tumor progression, metastasis and response to therapy. Nat Cancer. 2023:1–20 https://doi.org/10.1038/s43018-023-00595-y.
Reticker-Flynn NE, et al. Lymph node colonization induces tumor-immune tolerance to promote distant metastasis. Cell. 2022. https://doi.org/10.1016/J.CELL.2022.04.019.
Knops AM, et al. Cancer-associated fibroblast density, prognostic characteristics, and recurrence in head and neck squamous cell carcinoma: a meta-analysis. Front Oncol. 2020;10:2621.
Peltanova B, Raudenska M, Masarik M. Effect of tumor microenvironment on pathogenesis of the head and neck squamous cell carcinoma: a systematic review. Mol Cancer. 2019;2019(18):1–24.
Yu B, et al. Periostin secreted by cancer-associated fibroblasts promotes cancer stemness in head and neck cancer by activating protein tyrosine kinase 7. Cell Death Dis. 2018;9:1–18.
Su S, et al. CD10+GPR77+ cancer-associated fibroblasts promote cancer formation and chemoresistance by sustaining cancer stemness. Cell. 2018;172:841–856.e16.
Ji Z, et al. Cancer-associated fibroblast-derived interleukin-8 promotes ovarian cancer cell stemness and malignancy through the notch3-mediated signaling. Front Cell Dev Biol. 2021;0:1655.
Gieniec KA, Butler LM, Worthley DL, Woods SL. Cancer-associated fibroblasts—heroes or villains? Br J Cancer. 2019;121:293–302.
Gentles AJ, B. K. RNA-seq from HNSCC and melanoma populations. Gene Expression Omnibus [GSE113839]. 2023. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113839.
Gentles, Andrew J., B., K. Compendium of primary head and neck cancer gene expression datasets with accompanying clinical data. Zenodo [7679088]. 2023. https://zenodo.org/record/7679088.
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10.
Kauffmann A, et al. Importing ArrayExpress datasets into R/Bioconductor. Bioinforma Oxf Engl. 2009;25:2092–4.
Lappalainen I, et al. The European genome-phenome archive of human data consented for biomedical research. Nat Genet. 2015;47:692–5.
Krishnan NM, et al. A minimal DNA methylation signature in oral tongue squamous cell carcinoma links altered methylation with tumor attributes. Am Assoc Cancer Res. 2016;14:805–19.
Chen C, et al. Gene expression profiling identifies genes predictive of oral squamous cell carcinoma. Cancer Epidemiol Biomark Prev Publ Am Assoc Cancer Res Cosponsored Am Soc Prev Oncol. 2008;17:2152–62.
Pyeon D, et al. Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head/neck and cervical cancers. Cancer Res. 2007;67:4605–19.
Brennan K. Brennan_2023_HNC_meta_analyses. Github. 2023. https://github.com/kevinbrennan/Brennan_2023_HNC_meta_analyses.
Acknowledgements
We would like to thank Laura D. Attardi, Ph.D., Stanford Department of Radiation Oncology, Edgar Engleman, M.D., Ph.D., Stanford Department of Pathology, and Nathan Reticker-Flynn, Ph.D., Stanford Department of Otolaryngology, for advice and consultation related to this research.
Funding
We acknowledge funding from the National Cancer Institute grants U54CA209971, R01CA276828, and R01CA260271.
Author information
Authors and Affiliations
Contributions
Research concept and design: KB, AG, SKP, JBS. Data acquisition: KB, AG, SC, NK, SS, JHS, SKP, JBS. Data analysis: KB, AG, AEP. Data interpretation: KB, AG, AEP, OG, SKP, JBS. Manuscript drafting and revision: KB, AG. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
This study was performed in compliance with ethical regulations outlined in a Stanford Institutional Review Board (IRB)-approved protocol (protocol no. 11402). All patient samples in this study were collected with informed consent for research use and approved by Stanford Institutional Review Board (protocol numbers 18225 and 18691) in accordance with the Declaration of Helsinki.
Consent for publication
All research participants provided consent to publish their data.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Methods & Results. Supplementary Discussion. Supplementary References.
Figure S1. FlowJo contour plots illustrating the gating strategy that was used to isolate four cell types from head and neck cancer tumors using fluorescence-activated cell sorting (FACS): Cells were analyzed using FlowJo V. 10.6.1 and first gated on single cell size using FSC width and height and cell granularity using SSC width and height. Figure S2. Unsupervised clustering of survival and LNM-associated genes based on co-expression. Figure S3. Pan-cancer survival meta-z scores of genes that were associated with survival and lymph node metastasis (LNM) in head and neck cancer (HNC). Figure S4. Independence of prognostic signatures from potential confounding factors. Figure S5. Meta-analysis-based identification of genes associated with tumor grade in HNC. Figure S6. Overlap of genes between LNM gene clusters and survival gene clusters. Figure S7. UMAP representations of primary HNCs within the Stanford scRNA-Seq dataset. Figure S8. UMAP representations of the Puram scRNA-Seq dataset. Figure S9. Expression of prognostic gene signatures in two primary HNC scRNA-Seq datasets. Figure S10. Expression of prognostic gene signatures in four major cell types, as indicated by bulk RNA-Seq-derived transcriptional profiles of flow sorted cells. Figure S11. Correlations of prognostic gene signatures with an epithelial to mesenchymal transition (EMT) transcriptional score within primary HNC malignant cells. Figure S12. UMAPs (Seurat feature plots) showing fibroblast and myofibroblast gene signatures in primary HNCs of the Puram and Stanford primary HNC scRNA-Seq datasets. Figure S13. Expression of lymph node metastasis (LNM) and differentiation-related gene signatures in two primary HNC single cell RNA-Seq datasets (Supplementary to figure 2D). Figure S14. Expression of LNM-associated gene signatures within distinct subpopulations of malignant cells in primary HNCs and patient-matched lymph node metastases (LNMs).
Additional file 2: Table S1.
Details of all curated head and neck cancer gene expression studies. Table S2. HNC patients within the Stanford bulk RNA-Seq dataset (Bulk RNA-sequencing of flow sorted-cells). Table S3. Antibodies used for flow cytometry. Table S4. Survival and LNM-associated genes. Table S5. Genes associated with cancer grade based on a meta-analysis. Table S6. Top 10 gene sets that most significantly overlapped with each prognostic gene signature.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Brennan, K., Espín-Pérez, A., Chang, S. et al. Loss of p53-DREAM-mediated repression of cell cycle genes as a driver of lymph node metastasis in head and neck cancer. Genome Med 15, 98 (2023). https://doi.org/10.1186/s13073-023-01236-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13073-023-01236-w