Overview of Transcriptomics

Transcriptomics is the study of RNA molecules and is used to interrogate the activity of the genome in a cell or tumor by measuring its RNA makeup. Although at least 11 types of RNA are known (e.g. mRNA, rRNA, tRNA, snRNA, snoRNA, siRNA, hnRNA, gRNA, tmRNA, telomerase RNA, catalytic RNA), the RNA of greatest interest in oncology is currently messenger RNA (mRNA), which is actively transcribed from DNA and ultimately translated into protein. More recently, lncRNAs have also attracted increasing interest [1, 2]. As discussed in the prior chapter, DNA is largely similar across cells of an organism, often with specific alterations that define specific genomic subtypes of cancer. mRNA, in contrast, is highly dynamic and is less binary and static than a DNA mutation. Gene expression typically reflects the functional activity of a cell more closely than DNA does: even if an upstream gene is mutated or has lost function, mRNA expression may remain constant or even increase if alternative pathways become activated.

Given that many genes have similar expression and are highly correlated with one another, transcriptomic data are often synthesized into gene expression signatures to capture subtypes of a particular cancer. These signatures reflect a snapshot of the tumor in time and, despite their dynamic nature, can reproducibly capture more static genomic and biologic subtypes, and even serve as reliable prognostic and predictive biomarkers. In prostate cancer specifically, transcriptomics was initially often used to complement genomics. However, especially in localized disease, the genomes of prostate cancers contain a relatively small number of somatic driver mutations and/or copy number alterations, and there is currently limited utility in routinely searching for DNA alterations. As a result, gene expression profiling alone is increasingly being studied in localized and recurrent prostate cancer.

Technology

The most common technologies used clinically for the assessment of gene expression include real-time PCR (RT-PCR) and microarray. Research studies have increasingly transitioned to RNA sequencing (RNA-seq), but commercial tests almost exclusively use PCR or microarray technology [3]. Each technology has strengths and weaknesses that must be weighed, including cost, breadth of transcriptome covered, customizability, data analysis, throughput, resolution, and dynamic range [3, 4]. Most whole-transcriptome studies have used a discovery process with data generated using either microarray or RNA-seq technology. The resulting signature, typically of <50 genes, is then recreated with either a targeted sequencing process or simply RT-PCR, given the reduced costs. Table 1 summarizes the differences between microarray and RNA-seq technologies.

Table 1 Comparison of microarray and RNA-seq technology

Methods of Subtyping

The goal of subtyping is to define a subgroup of prostate cancer that is unique using transcriptomics (Table 2). One aim is to capture previously identified genomic subtypes, defined by unique DNA profiles, using gene expression data. Alternatively, guided or semi-supervised subtyping methods can be performed, using gene expression to capture known biologic characteristics, such as basal- or luminal-ness, cell cycle activity, or neuroendocrine differentiation. More commonly, subgrouping by prognosis is performed with commercially available subtyping signatures, rather than looking at a biologic feature. In contrast, one can use gene expression data, which may or may not be rooted in known biologically driven mechanistic data, to identify patients who intrinsically are most or least likely to benefit from treatment. Finally, the least common approach is to perform unsupervised hierarchical clustering to determine which genes statistically form unbiased subgroups. To illustrate the relationship of many of the developed prognostic, predictive, and biological subtypes in prostate cancer, Fig. 1 shows a heatmap of many of the subtypes and transcriptomic signatures that will be discussed in this chapter.

Table 2 Common transcriptomic subtyping methods

Fig. 1 Select transcriptomic subtypes of localized prostate cancer

Subtypes

Capture Genomic Subtypes

As described in the previous chapter, the most recognized subtypes of prostate cancer are classically defined based on DNA alterations. This is reflected in genomic data available from The Cancer Genome Atlas (TCGA) from localized prostate cancer specimens, which show frequent ERG and ETS-family rearrangements and SPOP mutations. It is also reflected in genomic data from multiple large metastatic CRPC cohorts, which have shown common alterations in p53, RB loss, DNA repair alterations, and PTEN loss, among a list of frequently occurring mutations. In many instances it is not practical to perform genomic sequencing to identify all of these alterations, especially given that it usually requires fresh-frozen tissue. Thus, investigators have developed methods to accurately and reliably capture these subtypes with gene expression data.

  1. ERG, ETS, SPINK1 [5]: A gene expression signature that accurately captures ERG+ tumors was developed with a random forest supervised model to predict FISH-assessed ERG rearrangement status. The model was developed and trained (n = 252 samples) and validated with 155 tumors with known FISH-ERG status. Additionally, a classification method based on gene expression for ETV1, ETV4, ETV5, FLI1, and SPINK1 was developed with an unsupervised outlier analysis using the extremevalues on the expression of core probe sets for each gene. Tumors were ultimately grouped into four subtypes (ERG+, ETS+, SPINK1+, or triple negative). Over 1500 patients were used to train and validate this signature. ERG+ tumors typically had lower baseline serum PSA levels and lower Gleason scores compared to the triple negative subtype. SPINK1+ tumors typically had higher PSAs and were more common in African-Americans. Subsequently, these microarray expression-based signatures were analytically validated against established immunohistochemical and FISH assays. Despite these findings, there were no significant differences in time to biochemical recurrence or distant metastases, suggesting these subgroups are biologically based and not prognostic.

  2. SPOP mutant signature [6]: A novel gene expression signature and decision tree were developed to accurately predict SPOP mutant cancers from gene expression data. Starting with TCGA data, including RNA-seq data and known SPOP mutant status, differential gene expression analysis was performed and samples were clustered based on SPOP status. Ultimately, 212 genes were used to define the SPOP mutant subclass. This signature, which was validated in a cohort from Weill Cornell Medicine (n = 68), had an 89% sensitivity and 95% specificity for SPOP mutant prediction compared with DNA mutation calling. Together with the prior classifier for ERG+ and ETS+ status [5], which are mutually exclusive with the SPOP mutant subtype, this decision tree was able to identify tumors most likely to harbor SPOP mutations in samples without DNA data (n > 8000). Tumors predicted to be SPOP mutant were less likely to be high grade, to have positive surgical margins, or to have T3 disease. However, they were associated with higher PSAs. Despite the other clinicopathologic factors being favorable, SPOP mutant tumors identified by this gene expression classifier had worse outcomes.

  3. PTEN loss signature [7]: PI3K is frequently activated in prostate cancer, especially in metastatic CRPC, through PTEN loss. PTEN mRNA expression levels are the primary determinant of PTEN protein levels. A PTEN status signature was developed in breast cancer samples, using microarray data to identify the genes most significantly associated with PTEN IHC status. From this, a consensus ranked gene list was generated by sorting the average p-value from each cross-validation analysis. A total of 246 genes were ultimately included in the PTEN signature, which had an area under the receiver operating characteristic (ROC) curve of 0.758. This signature was also applied to other tumor types, including prostate cancer, where it was shown to correlate with worse survival.
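The SPOP signature above is reported in terms of sensitivity and specificity against DNA mutation calling. As a minimal sketch of how such figures are computed from paired calls, the following Python function compares expression-based predictions with DNA-based labels. The function name and the toy labels are hypothetical and are not taken from the cited study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predicted calls against reference calls."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # mutant, called mutant
    fn = sum(1 for p, t in pairs if not p and t)      # mutant, called wild-type
    tn = sum(1 for p, t in pairs if not p and not t)  # wild-type, called wild-type
    fp = sum(1 for p, t in pairs if p and not t)      # wild-type, called mutant
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one mutant and one wild-type sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels (True = SPOP mutant); illustration only.
dna_calls = [True, True, True, True, False, False, False, False, False, False]
expression_calls = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(expression_calls, dna_calls)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of DNA-confirmed mutants that the expression classifier recovers, and specificity is the fraction of wild-type tumors it correctly leaves unflagged; both depend on treating the DNA call as the reference standard.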

Unsupervised Hierarchical Clustering

One method of obtaining subtypes is to simply let the data determine what genes are differentially expressed across samples in a manner that clearly divides patients into a limited number of groups based on a list of genes. This usually requires a large panel (e.g. >1000) of genes to be assessed. Given that this method does not force or restrict the clustering to predict either an outcome (e.g. recurrence) or a feature (e.g. genomic subtype), the genes discovered may be of unclear importance in prostate cancer at first glance.

  1. TCGA 3 clusters [8]: The TCGA performed a multi-center study to interrogate primary prostate cancer comprehensively at the molecular level. Using 333 tumors, analyses were conducted on the genome, transcriptome, proteome, and epigenome, ultimately identifying seven molecularly defined subtypes (ERG, ETV1, ETV4, FLI1, SPOP, FOXA1, IDH1, and others). Integrative clustering based solely on mRNA data was also performed, via unsupervised expression clustering of prostate tumors using mRNA-seq data. The top 3000 most variable genes were used to develop mRNA subtypes. Three mRNA clusters were identified that largely grouped ERG and ETV positive tumors into one cluster, and SPOP, FOXA1, and IDH1 subtypes together into another cluster. The FLI1 genomic subgroup contained most of the third mRNA cluster. However, these mRNA subtypes did not optimally recapitulate the seven genomic subgroups. Thus, the ERG+ and SPOP mutant signatures derived by the Tomlins [5] and Barbieri [6] laboratories appear to be more accurate than the TCGA subtypes, suggesting the original findings were likely over-fit, that is, subject to modeling error due to inference on a limited set of patient samples.

  2. In another study, microarray profiles of 62 primary prostate cancer tumors, 41 normal prostate specimens, and 9 lymph node metastases captured >26,000 coding and non-coding genes [9]. Unsupervised hierarchical clustering was performed on all of the samples. Using 5153 cDNAs whose expression varied most across samples, tumor samples were distinguishable from normal samples. Additionally, three subtypes of prostate cancer were identified based on distinct gene expression patterns. However, the biological, prognostic, or predictive relevance of these molecular subtypes is unclear, which is one disadvantage of clustering performed in a completely unsupervised manner.
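Both studies above follow the same general recipe: keep the genes whose expression varies most across samples, then cluster the samples without reference to any outcome. The Python sketch below illustrates that recipe on a toy matrix. Euclidean distance and average linkage are assumptions chosen for illustration (the text does not specify the metric or linkage used), and the gene names and values are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def top_variable_genes(expr: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k genes with the highest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage_clusters(profiles: List[List[float]],
                             n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy matrix: gene -> expression in samples 0..5.
expr = {
    "GENE_A": [9.0, 9.2, 8.9, 2.0, 2.1, 1.9],
    "GENE_B": [1.0, 1.2, 0.9, 7.0, 7.3, 6.8],
    "GENE_C": [5.0, 5.0, 5.1, 5.0, 4.9, 5.0],  # nearly constant, filtered out
}
genes = top_variable_genes(expr, k=2)
n_samples = len(next(iter(expr.values())))
profiles = [[expr[g][s] for g in genes] for s in range(n_samples)]
clusters = average_linkage_clusters(profiles, n_clusters=2)
print(genes, clusters)
```

Because nothing in the procedure refers to recurrence or to a known genomic subtype, the resulting groups are defined only by expression similarity, which is why their clinical meaning has to be established separately.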

Supervised Clustering to Capture Specific Biologic Characteristics

  1. Prostate Cancer Subtypes 1-3 [10]: An integrated classification of prostate cancer was performed on a large training cohort of 1321 tumor samples and a validation set using 10 patient cohorts and 19 laboratory models of prostate cancer (cell lines and genetically engineered mouse models). Twenty-two pathway-activation gene expression signatures relevant to prostate cancer were employed to perform the clustering. These were subsequently collapsed to 14 pathway signatures that were grouped into three categories: (1) AR, AR-V, EZH2, FOXA1, RAS, and PRC, (2) SPOP, TMPRSS2-ERG, PTEN, and (3) stemness, proliferation, epithelial-mesenchymal transition, pro-neural, and neuroendocrine differentiation. At this point unsupervised clustering was performed using the 14 pathway-activation profiles, and three distinct clusters were identified and termed PC1, PC2, and PC3. These subtypes were validated in both localized prostate cancer and mCRPC. Interestingly, the TCGA subtypes, including ERG, ETV1/4, SPOP, and FOXA1, were found across all of the new subtypes identified, with differential enrichment by subtype. This study also looked at the association of basal and luminal expression with the PC1-3 subtypes, and found a strong association between luminal genes and PC1 and PC2, and between basal genes and PC3. The PC1-3 subtypes were also prognostic, in that the PC1 subtype had shorter metastasis-free survival than either PC2 or PC3. Ultimately, the subtypes were simplified into a 37-gene signature that could reasonably recapitulate the three subtypes. The clinical utility and clinical relevance of this signature are unclear, highlighting the immense biological heterogeneity of prostate cancer.

  2. AR-activity [11, 12]: The androgen receptor (AR) gene, which is near ubiquitously expressed in prostate cancer, regulates thousands of genes. In localized prostate cancer, AR expression shows limited heterogeneity, whereas in metastatic castration-resistant prostate cancer (mCRPC) there is more diversity in AR expression. However, the activity of the AR (also termed AR-signaling or AR-activity), which is measured by the expression of canonical AR-target genes, is significantly more heterogeneous in both localized prostate cancer and mCRPC. Recent work demonstrated that ~10% of localized prostate cancer has lower AR-activity as measured by nine canonical AR-targets. This subset appears to closely resemble advanced mCRPC in that both have similar AR-activity. Furthermore, expression of neuroendocrine markers and immune signaling signatures is increased in this low AR-active subset. Not surprisingly, low AR-active localized prostate cancer has a poor prognosis, with a more rapid progression to metastatic disease compared to high AR-activity tumors. Not only does low AR-active prostate cancer have a worse prognosis, it also appears to have unique treatment sensitivities. High AR-active prostate cancer is more sensitive to ADT and taxane chemotherapy, while low AR-active prostate cancer appears more sensitive to PARP inhibition and cisplatin chemotherapy. Further work is in development to assess whether AR-activity can serve as a prognostic biomarker and a predictive biomarker to guide treatment selection.

  3. Neuroendocrine Prostate Cancer (NEPC) signature [13]: A gene expression signature of neuroendocrine and primary small cell prostate cancer was developed using samples from eight cohorts to identify genes that are either up- or down-regulated in NEPC compared to adenocarcinoma samples. A 69-gene signature was identified that captured at least 80% of NEPC patients. These genes generated three subgroups that were termed atypical small cell prostate cancer, prototypical adenocarcinoma, and prototypical small cell prostate cancer.
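The AR-activity entry above describes scoring tumors by the expression of a small set of canonical AR-target genes and identifying a low-activity subset of roughly 10%. One simple way to build such a score, sketched here in Python, is to average per-gene z-scores across the target genes and flag the lowest fraction of the cohort. The averaging scheme, the cutoff rule, the placeholder gene names, and the toy values are all assumptions for illustration and are not the published method.

```python
from statistics import mean, pstdev
from typing import Dict, List


def activity_scores(expr: Dict[str, List[float]],
                    target_genes: List[str]) -> List[float]:
    """Per-sample score: mean z-score across the supplied target genes."""
    n_samples = len(expr[target_genes[0]])
    z_by_gene = []
    for gene in target_genes:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information; score it as 0.
        z_by_gene.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(z[s] for z in z_by_gene) for s in range(n_samples)]


def flag_lowest_fraction(scores: List[float], fraction: float) -> List[bool]:
    """Mark samples whose score falls in the lowest `fraction` of the cohort."""
    n_low = max(1, round(fraction * len(scores)))
    cutoff = sorted(scores)[n_low - 1]
    return [s <= cutoff for s in scores]


# Hypothetical toy data: two placeholder target genes measured in ten tumors.
expr = {
    "AR_TARGET_1": [8.0, 8.2, 7.9, 8.1, 8.0, 8.3, 7.8, 8.1, 8.0, 3.0],
    "AR_TARGET_2": [6.0, 6.1, 5.9, 6.2, 6.0, 6.1, 5.8, 6.0, 6.1, 2.0],
}
scores = activity_scores(expr, ["AR_TARGET_1", "AR_TARGET_2"])
low_activity = flag_lowest_fraction(scores, fraction=0.10)
print(low_activity)
```

Standardizing each gene before averaging keeps a single highly expressed target from dominating the score; a cohort-relative cutoff such as the lowest decile is only meaningful when the cohort itself is representative.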

Subtypes Developed for Prognosis

Commercial Classifiers

  1. Decipher [12, 14,15,16,17,18,19]: The Decipher assay is a clinical-grade transcriptome-wide gene expression profiling assay, based on the Human Exon 1.0 ST oligonucleotide microarray (GenomeDx, Inc). While the assay measures over 46,000 protein-coding and non-coding RNAs, the current Decipher clinical test result is a prognostic biomarker based on the expression of 22 genes. A cohort of radical prostatectomy samples was used to train the signature for the primary endpoint of clinical failure (e.g. metastases), comparing patients who did and did not develop failure post-treatment. Forty-three RNA transcripts were identified that were differentially expressed between groups. Through random forest machine learning, 22 genes were ultimately identified that yielded the best performance for the prediction of metastatic disease. These 22 genes include both coding and non-coding genes that have roles in cell cycle progression, proliferation, immune response, cell adhesion, and motility. The Decipher test has since been validated in over 3000 patients in >40 studies. Most notably, the performance of the classifier was validated in a meta-analysis using 975 patients across five cohorts. Decipher was shown to independently predict the development of metastatic disease, and outperformed currently used clinicopathologic variables (e.g. Gleason score, T-stage, margin status, PSA, etc). The C-index of the clinical model was 0.76, which increased to 0.81 with the addition of Decipher to the model. Furthermore, Decipher performed similarly across all subgroups by age, race, and treatment performed. Decipher has also recently been combined into an integrated clinical-genomic risk grouping system that mirrors NCCN risk groups. This study validated the superior performance of Decipher over clinical factors in both surgical samples and pre-treatment biopsy samples. The C-index for the combined clinical-genomic system was 0.84, and approximately 67% of patients were reclassified from NCCN risk groups to new clinical-genomic risk groups. The Decipher test has been used in prospective trials as well. The PRO-IMPACT trial assessed the clinical utility of changing management decisions based on the Decipher test. Furthermore, the G-MINOR trial has completed enrollment, and has randomized patients and providers to the receipt of the Decipher test as compared to the best available clinical nomogram (CAPRA-S model). This trial will be the first randomized trial to assess the clinical utility of any commercial genomic classifier in prostate cancer. Decipher is also being used in multiple ongoing national randomized trials, including NRG GU-002, which is stratifying patients by the use of Decipher. Other trials, such as NRG GU-006, are leveraging the Decipher assay rather than just the Decipher score, since the microarray used to assess genes in the Decipher test provides hundreds of additional signatures through the Decipher GRID, given that >46,000 genes are analyzed on every sample. Recently, the performance of Decipher in the first randomized trial of any commercial gene expression classifier has been reported. The SPARTAN trial, a randomized trial assessing the benefit of apalutamide in M0CRPC, ran the Decipher test on a subset of trial patients with banked tissue. They showed that Decipher was highly prognostic and predictive of first line ADT failure.

  2. Oncotype Dx [20]: The Oncotype Dx Genomic Prostate Score (GPS) is a 17-gene signature designed for pre-treatment biopsy use. It is run on an RT-PCR platform. Its intended use is to help guide active surveillance decision making. To derive the signature, 198 genes were identified that correlated with recurrence, death from prostate cancer, and adverse pathology. This gene list was truncated to 81 genes that were associated with aggressive disease within the validation cohort. Ultimately, the signature was refined to 17 genes based on consistency of expression across cohorts. These genes are involved in four primary pathways: stromal response, androgen signaling, cellular organization, and proliferation. Although the GPS signature has been validated in multiple prostatectomy cohorts of patients eligible for active surveillance, until recently it had not been validated in actual active surveillance patients. Recently, the Canary PASS trial performed Oncotype Dx testing on 634 men entering active surveillance. Unfortunately, the Oncotype Dx test was not associated with subsequent biopsy upgrade on either uni- or multi-variable analysis. These results bring into question the clinical accuracy of the Oncotype Dx test. Future studies are needed to assess the role of GPS testing in prostate cancer.

  3. Prolaris [21]: Prolaris, also known as the Cell Cycle Progression (CCP) score, measures 31 cell cycle progression genes and 15 housekeeping genes. It is run on an RT-PCR platform. CCP was initially developed in breast cancer patients and has since been tested and validated in prostate cancer patients. It remains unclear whether the test was fully optimized for prostate cancer, but it has been validated in prostate cancer needle biopsies and prostatectomy samples. It has been tested in patients undergoing active surveillance, prostatectomy, and radiotherapy, and for multiple outcomes, including biochemical recurrence, metastasis, and prostate cancer-specific mortality. The test has not been used in any randomized trials to date, and future prospective studies are needed to demonstrate its clinical utility and benefit in intact and post-treatment patients.
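Several of the results above are reported as a C-index (for example, 0.76 for a clinical model versus 0.81 with Decipher added). As a minimal sketch of what that statistic measures, the Python function below computes Harrell's concordance index for right-censored data: among pairs of patients whose ordering is known, it is the fraction in which the patient with the earlier event has the higher risk score. The function name and the toy follow-up data are hypothetical, and this simple version does not handle tied event times specially.

```python
from typing import List


def concordance_index(times: List[float], events: List[bool],
                      risk: List[float]) -> float:
    """Harrell's C: among comparable pairs, the fraction in which the patient
    with the earlier observed event has the higher risk score (ties count 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an event strictly before j's time,
            # so we know i failed first even if j was later censored.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event indicator, classifier score.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [True, True, False, True, False]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c_index = concordance_index(times, events, risk)
print(f"C-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so an increase from 0.76 to 0.81 means the combined model ranks a larger share of comparable patient pairs correctly.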

Non-commercial Classifiers

  1. There have been dozens, if not hundreds, of prognostic gene expression signatures reported in the literature. A brief list is shown in Table 3. They have been developed for various indications with various degrees of validation. None have robust clinical data to support their use, and none are commercially available or covered by Medicare (in contrast to Decipher, Oncotype Dx, and Prolaris). When these signatures were optimized to predict the development of metastatic disease, it was found that the Decipher 22-gene signature outperformed all of the other signatures when run on the same microarray platform [22].

Table 3 Select list of prognostic gene expression signatures in prostate cancer

Subtypes Developed for Predicting Treatment Response

  1. RSI [23]: The Radiation Sensitivity Index (RSI) was developed to predict intrinsic sensitivity to ionizing radiotherapy. It claims to be a pan-cancer signature and was developed from the National Cancer Institute panel of 60 cell lines. Thirty-five of these cell lines were ultimately used to determine which genes correlated with clonogenic survival after 2 Gy of radiation therapy. In cell line data the signature has only a 62% accuracy in predicting cellular radiosensitivity. Ultimately, the RSI was developed as a linear algorithm of 10 genes (AR, cJun, STAT1, PKC, RelA, cABL, SUMO1, CDK1, HDAC1, and IRF1), each with its own weight, that are summed to yield a final score. There are limited data using this signature in prostate cancer, and it does not appear to be able to predict outcomes in patients treated with radiotherapy. Future work will be necessary to determine whether RSI can be applied to patients with prostate cancer.

  2. PORTOS [24]: Leveraging the Decipher GRID, a 24-gene Post-Operative Radiation Therapy Outcomes Score (PORTOS) was developed and validated to predict benefit from post-operative radiotherapy. Using a training cohort of 198 patients, 1800 DNA damage repair and previously annotated radiation response genes were ranked based on outcomes after post-operative radiotherapy. Twenty-four genes were identified that predicted benefit from post-operative radiotherapy. Patients with high PORTOS scores derived a significant benefit, measured as a reduction in distant metastasis, from receipt of post-operative radiotherapy. In contrast, patients with low PORTOS scores failed to derive benefit from the addition of post-operative radiotherapy. In the validation cohort (n = 330) it was confirmed that PORTOS was a predictive biomarker of post-operative radiotherapy benefit (p-interaction = 0.016). Importantly, PORTOS may not be purely a measure of intrinsic radiation sensitivity, but rather a predictor of patients who harbor micrometastatic disease outside the radiation field.

  3. ADT-RS [25]: The Decipher GRID was used to access data from 1212 patients who underwent a radical prostatectomy with adverse pathology. Patients who received early adjuvant ADT were matched to patients who did not receive early ADT. Rather than performing a purely unsupervised analysis, the investigators limited the analysis to a curated list of 1632 genes identified from studies investigating neuroendocrine differentiation, castration resistance, and resistance to ADT. This gene list was then filtered based on feature ranking and model training. Ultimately, 49 genes were identified and validated that were predictive of early ADT benefit. This was demonstrated with a significant interaction test (p = 0.035): patients with a high ADT-RS score benefited, while low ADT-RS patients did not derive any benefit from early use of adjuvant ADT. Notably, ADT-RS was not prognostic but was in fact highly predictive, and on multivariable analysis the interaction for ADT-RS was even stronger after adjusting for other clinicopathologic factors.

  4. PAM50 [26]: The PAM50 classifier was originally developed in breast cancer. It is the basis for the commercially available Prosigna product run using NanoString. PAM50 successfully classifies breast cancers as luminal A, luminal B, HER2, and basal subtypes. These subtypes are not only prognostic, they are also predictive of benefit from endocrine therapy and HER2 targeted therapy. Given that multiple cancers, including prostate cancer, also have luminal and basal subtypes, the PAM50 signature was applied to localized prostate cancer leveraging the Decipher GRID gene expression database. Notably, the HER2 subtype was removed, since ERBB2/HER2 amplification does not occur in prostate cancer as it does in breast cancer. The authors used the transcriptome-wide Human Exon 1.0 ST microarray platform on 1567 retrospective samples with long-term follow up, which were further divided into a training and a validation cohort. Additionally, they used 2215 prospective samples to characterize the PAM50 subtypes in localized prostate cancer. All three subtypes, luminal A, luminal B, and basal, were identified in localized prostate cancer at similar distributions (~33% each). Known luminal markers, such as NKX3.1 and KRT18, were enriched in the luminal subtypes. Similarly, the basal marker CD49f was enriched in the basal subtype. Luminal B patients were the most likely to develop biochemical recurrence and distant metastasis, and to display worse prostate cancer specific survival and overall survival, as determined independently by multivariable analysis. Luminal A patients had the most favorable outcomes. Given the ability of PAM50 to predict responses of breast cancer to endocrine therapy, the benefit of ADT was tested. It was demonstrated that luminal B patients derived a significant improvement in metastasis-free survival from the addition of post-operative ADT, while luminal A and basal patients did not. The interaction test was significant (p = 0.006), indicating that PAM50 appears to be a predictive biomarker of post-operative ADT benefit. These results have led to an open randomized phase 2 trial testing whether the addition of apalutamide, a next generation anti-androgen, will improve outcomes over salvage radiotherapy alone (NRG GU006, NCT03371719).
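Several of the signatures in this section, such as the RSI, reduce to a weighted linear combination of gene expression values: each gene has its own weight, and the weighted values are summed into a single score. The Python sketch below shows that structure only. The function name is hypothetical, and the weights and expression values are placeholders that are not the published coefficients of RSI or of any other signature.

```python
from typing import Dict


def weighted_linear_score(values: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Sum of weight * value over the signature genes.

    Genes present in `values` but absent from `weights` are ignored;
    a signature gene missing from `values` raises an error.
    """
    missing = [g for g in weights if g not in values]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(weights[g] * values[g] for g in weights)


# Placeholder weights for illustration only; NOT published coefficients.
toy_weights = {"AR": -0.5, "STAT1": 0.25, "IRF1": 1.0}
# Hypothetical per-tumor expression values; "OTHER" is not in the signature.
toy_values = {"AR": 2.0, "STAT1": 4.0, "IRF1": 3.0, "OTHER": 9.0}
score = weighted_linear_score(toy_values, toy_weights)
print(score)
```

Failing loudly on a missing signature gene is a deliberate choice in this sketch: silently treating an unmeasured gene as zero would shift the score and could move a patient across a clinical cutoff.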

Conclusions

The transcriptome of prostate cancer continues to be unraveled. This chapter primarily focused on gene expression signatures that are based on the expression of protein coding genes. It is clear that gene expression data can recapitulate many of the important genomic alterations. Perhaps more importantly, the transcriptome has been leveraged to provide unparalleled accuracy in assigning a personalized prognosis for a patient above and beyond routine clinicopathologic parameters. Many of these signatures are now in clinical practice, and randomized data will be reported over the next 1–2 years to validate some of these signatures. The most exciting area, which is just beginning to emerge, is the ability of gene expression classifiers to serve as true predictive biomarkers that can identify patients most likely to benefit from standard of care treatments, such as radiotherapy or ADT. Some of these are currently in ongoing randomized trials and have the promise to change the clinical landscape of managing prostate cancer.