Introduction

Ependymomas are tumors of glial origin that morphologically resemble ependymal cells lining the ventricles of the central nervous system. These tumors may develop supratentorially, infratentorially or in the spinal cord [13]. Ependymoma accounts for 3–6% of all CNS tumors. The incidence is higher in children and young adults, in whom it is the second most common malignant brain tumor. In adults the most common location is the spinal cord, while in children these tumors tend to occur more commonly intracranially within the posterior fossa. Histologically, ependymomas are classified into three major subtypes: myxopapillary ependymoma (WHO grade I), ependymoma (WHO grade II) and anaplastic ependymoma (WHO grade III) [22]. Surgery represents the first line of treatment, followed by radiotherapy and chemotherapy for disease control [26]. The 5-year progression-free survival ranges between 30 and 60%, with incomplete surgical resection, anaplastic histology and younger age at diagnosis known to be associated with poor prognosis [23]. Whereas the 5-year progression-free survival rate in children is 14% [31], 5-year disease-free survival rates as high as 70% can be achieved in adult cases of infratentorial ependymoma [12]. Gross total resection, which is an important prognostic factor, is not always achieved [14]. In children, adjuvant radiotherapy, another prognostic factor in disease control, often cannot be employed due to the young age of the patients. Despite histological similarities, tumors within the same stage show diverse clinical behavior, indicating inherent underlying differences. Insights regarding the genetic alterations found in ependymoma have recently been forthcoming [16, 35]; however, little progress has been made in improving the survival rates of ependymomas, indicating the need for novel therapeutic approaches.

Several studies have shown that ependymomas arising in the infratentorial, supratentorial and spinal cord locations have distinct genetic signatures and must therefore be viewed as distinct tumor entities [15, 16, 19, 24]. Members of the Notch and Sonic Hedgehog pathway are highly expressed in intracranial ependymomas, whereas the Homeo-box containing (HOX) family genes have been implicated in spinal cord ependymomas [35]. Data from one of our laboratories suggest that ependymomas derive from regionally specific stem cells bearing a radial glial cell phenotype [35]. Apart from histological grade and incomplete resection, several prognostic markers have been studied in ependymomas including Ki-67 [3], survivin [1], human telomerase reverse transcriptase, ERBB family members [11] and nucleolin [30]. Whole genome approaches such as array comparative genomic hybridization [20, 24, 28] and gene expression profiling [8, 16, 25, 27, 34] have also been employed for the identification of prognostic biomarkers. However, despite these studies, molecular markers for disease prognosis have not been integrated into the clinical management of ependymomas. Availability of robust molecular markers of tumor recurrence would allow for the identification of those patients who would benefit from standard treatments. This would provide clinicians with the ability to identify those patients who would not respond to standard therapy thus allowing them to design more aggressive treatment strategies. Since ependymomas are largely chemo-resistant, identification of molecular markers of prognosis would also help identify novel therapeutic targets.

In this study, we examine biologic subtypes of infratentorial ependymoma identified by gene expression microarray data, as well as clinical implications of these subtypes. To achieve this, we examined gene expression profiling microarray data to identify robust biomarkers and applied them to clinically annotated routinely available formalin-fixed paraffin-embedded (FFPE) samples. The biologic subtypes and the genes identified as associated with prognosis are validated in an external dataset.

Materials and methods

Tumor samples and patient characteristics

Archived paraffin-embedded tissue specimens were collected from 56 patients with infratentorial ependymomas. All patients with tissue sufficient for gene expression assessment and a minimum follow-up of 3 years or recurrence within 3 years were considered evaluable. For data analysis, cases were dichotomized into short recurrence-free survivors (SRS) and long recurrence-free survivors (LRS). LRS patients were defined as having clinical follow-up data available for a minimum of 3 years without any recurrence within 3 years from date of surgery and SRS patients were defined as having had a recurrence within 3 years from the date of surgery. Histological diagnosis and tumor grading were performed based on WHO criteria [22]. Sections from paraffin-embedded tissue were reviewed for pathologic diagnosis and dissected if necessary by a neuropathologist (KA) to ensure that ≥90% of the sample represented tumor. The clinical and pathological characteristics of the patients in the study are listed in Table 1. The use of the tissue and clinical data for these studies was covered under a protocol approved by the MD Anderson IRB.

Table 1 Clinical characteristics of the 56 infratentorial ependymoma patients

Gene expression array datasets

Gene expression data from four independent data sets from individual institutions were used for initial candidate biomarker discovery. Publicly available Affymetrix GeneChip data (.cel files) were obtained for data sets from the St Jude Children’s Research Hospital (GSE21687) [16], Virginia Commonwealth University (GSE13267 unpublished), University of Colorado-Denver (GSE16155) [8] and the Instituto Nazionale dei Tumori [24]. The Affymetrix Human Genome U133A v2.0 Array platform was used in three of the four datasets. Expression data were available for 67 infratentorial ependymomas from these sources. Microarray data were background corrected and normalized using the guanine cytosine robust multi-array average (gcRMA) algorithm in the Genespring software (Agilent biotechnologies) resulting in log2 expression values. For further validation, publicly available processed data files for Affymetrix Exon Array data from the Hospital for Sick Children, Toronto (GSE27279) and Agilent microarray data from the DKFZ, Heidelberg (GSE27287) were obtained [37].

Identification of candidate biomarkers

Within the sample sets, average expression and standard deviation values were calculated for each probe. The top 250 highly expressed, highly variable probes were used for unsupervised clustering using Cluster 3.0 (http://bonsai.hgc.jp/~mdehoon/software/cluster) and Java TreeView (http://jtreeview.sourceforge.net/) in order to identify subgroups based on gene expression patterns. Each sample was assigned to one of two identified gene expression subgroups. Fold change was used to identify genes whose expression differed between samples in each subgroup compared to tumors of the other subgroup. Genes were functionally annotated in DAVID (http://david.abcc.ncifcrf.gov/). Pathways altered in Group 1 and Group 2 ependymomas were identified using Gene Set Enrichment Analysis. The KEGG, Biocarta and GO databases were curated for pathway analysis. Candidate biomarkers for infratentorial ependymomas were identified based on high expression, high variability and fold change between the expression-defined subclasses. For candidate biomarker identification, average expression and differential fold change between Group 1 and Group 2 samples were calculated for each probe set. Among highly expressed genes (top 50th percentile of mean expression), genes with the largest fold change between subclasses were identified. In total, 295 genes were selected for further study. Additional genes (n = 71) that were defined in a prior publication on ependymoma subclasses [16] were also included.
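The probe filtering and fold-change steps described above can be sketched as follows. This is an illustrative outline only, not the pipeline used in the study (which relied on Genespring and Cluster 3.0). The function names, the use of the median of probe means as the "top 50th percentile" cutoff, and ranking by standard deviation are assumptions made for the example.

```python
from statistics import mean, median, stdev


def select_variable_probes(expr, n_top=250):
    """Pick the n_top most variable probes among the highly expressed ones.

    expr maps a probe ID to its list of log2 expression values (one per sample).
    Assumption: "highly expressed" means mean expression at or above the median
    of all probe means; variability is ranked by sample standard deviation.
    """
    means = {probe: mean(values) for probe, values in expr.items()}
    cutoff = median(means.values())
    expressed = [probe for probe in expr if means[probe] >= cutoff]
    expressed.sort(key=lambda probe: stdev(expr[probe]), reverse=True)
    return expressed[:n_top]


def log2_fold_change(expr, group1, group2):
    """Difference of group means on the log2 scale (Group 1 minus Group 2).

    group1 and group2 are lists of sample (column) indices.
    """
    return {
        probe: mean(values[i] for i in group1) - mean(values[i] for i in group2)
        for probe, values in expr.items()
    }
```

Because the input values are already on a log2 scale, a difference of 1.0 corresponds to a two-fold difference in expression between the subgroups.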

RNA extraction

Total cellular RNA was isolated from core punches using the Epicenter RNA isolation kit according to the manufacturer’s protocol (Epicenter Biotechnologies, Madison, WI, USA) following de-paraffinization and proteinase K treatment. Genomic DNA contamination was removed through a DNAse treatment step in the protocol. Quality of the RNA was assessed using a real-time PCR assay for the control gene EEF1A (Assay Id: Hs00265885_g1; Applied Biosystems, Foster City, CA, USA) using 1 μl of undiluted cDNA and standard TaqMan cycling conditions on a 7900HT real-time PCR machine (Applied Biosystems). Samples with a Ct value of ≤28 were considered eligible for gene expression analysis.

Real-time PCR

A custom real-time PCR-based microfluidics card (Applied Biosystems) was designed and populated with 366 candidate biomarker genes identified in the discovery microarray expression dataset. In order to optimize amplification of the fragmented RNA found in FFPE processed tissue, TaqMan gene expression primers were picked with predicted amplicon sizes of 85 bp or less (Applied Biosystems) (Supplementary Table 1). Quantitative real-time PCR (QRT-PCR) measurements were performed using a set of 18 FFPE ependymomas (9 SRS and 9 LRS). Total tumor RNA was reverse transcribed to single-stranded cDNA using ABI’s high capacity reverse transcription kit using the maximum allowed concentration of total RNA as per the manufacturer’s instructions (100 ng/μl). The fold change associated with survival for each gene was determined from the QRT-PCR measurements by the ∆∆Ct method.
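For readers unfamiliar with the ∆∆Ct calculation, a minimal sketch is given below. It assumes the standard formulation (fold change = 2 raised to the negative ∆∆Ct, which presumes roughly 100% amplification efficiency) and a generic reference Ct per sample. The text does not state which reference was used for normalization, so the reference value and the function names here are illustrative assumptions.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Normalize a target-gene Ct to a reference Ct from the same sample."""
    return ct_target - ct_reference


def fold_change_ddct(dct_group_a, dct_group_b):
    """Fold change of group A relative to group B by the delta-delta-Ct method.

    Each argument is a list of per-sample delta-Ct values for one gene.
    A lower Ct means higher expression, hence the negative exponent.
    """
    ddct = mean(dct_group_a) - mean(dct_group_b)
    return 2.0 ** (-ddct)
```

For example, if a gene has a mean ∆Ct two cycles lower in SRS samples than in LRS samples, this sketch reports a four-fold higher expression in the SRS group.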

Since FFPE-derived RNA can be highly degraded, two quality metrics were employed. First, we determined the ability to amplify a housekeeping gene (EEF1A). This measurement was then used to guide the RT-PCR for the mRNA microfluidics gene card. Second, since gene assays with very high Ct values can mean either low expression or poor quality RNA, we examined the Ct values of the control genes GLUD2, NTF3 and PPIC, which were selected for high expression and relatively uniform expression across ependymoma samples in the discovery microarray datasets. If the average Ct value of the three control genes was over 34, the sample was considered to have RNA quality too low for a reliable measurement. In order to prioritize highly expressed genes, which are more reliably measured in FFPE tissues, we used an average delta Ct cutoff of four cycles or less for a gene to be considered for further testing. This occurred in 55% of gene assays among all the samples. Candidate biomarkers from this subset that were found to be associated with recurrence-free survival were identified and included in a validation test. Validation was performed on an independent set of 38 ependymomas (17 LRS and 21 SRS). Genes associated with survival were identified based on fold change (1.5-fold and higher) and t test (P ≤ 0.05).
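The two Ct-based filters described above can be expressed as a short sketch. The thresholds (mean control Ct of 34, average delta Ct of four cycles) come from the text. Computing delta Ct against the mean of the three control genes is an assumption, as the exact normalization is not spelled out, and the function names are hypothetical.

```python
from statistics import mean

CONTROL_GENES = ("GLUD2", "NTF3", "PPIC")


def control_mean_ct(sample):
    """Average Ct of the three control genes; sample maps gene name to Ct."""
    return mean(sample[gene] for gene in CONTROL_GENES)


def sample_passes_qc(sample, max_control_ct=34.0):
    """A sample is rejected when its mean control-gene Ct is over the limit."""
    return control_mean_ct(sample) <= max_control_ct


def measurable_genes(samples, max_mean_delta_ct=4.0):
    """Genes whose average delta Ct across QC-passing samples is within the cutoff.

    Assumption: delta Ct = gene Ct minus the sample's mean control-gene Ct.
    """
    usable = [s for s in samples if sample_passes_qc(s)]
    if not usable:
        return []
    candidates = sorted(set(usable[0]) - set(CONTROL_GENES))
    return [
        gene
        for gene in candidates
        if mean(s[gene] - control_mean_ct(s) for s in usable) <= max_mean_delta_ct
    ]
```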

Calculation of metagene score

In order to determine the association of the overall gene expression classifier with patient outcome, we calculated a single “metagene” score as described in our previous publication [6] with the exception that all the genes were weighted equally. For each case the metagene score was calculated based on the set of ten genes by averaging the normalized expression values for all the genes associated with poor prognosis and then subtracting the average of the normalized expression values for all the genes associated with good prognosis for each case. This resulted in a single numerical score for each tumor, and each tumor was then ranked according to this metagene score. The samples were dichotomized into two groups using median metagene score as the cutoff. Samples with metagene scores above the median were defined as having a favorable 10-gene signature while samples with metagene scores below the median value were defined as having an unfavorable 10-gene signature.
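As an illustration of the equal-weight metagene calculation described above, the following sketch computes the score for each sample and splits the cohort at the median. The gene lists passed in are placeholders, and the handling of samples whose score equals the median exactly (assigned here to the "unfavorable" group) is an assumption, since the text only defines the groups above and below the median.

```python
from statistics import mean, median


def metagene_score(expr, poor_genes, good_genes):
    """Mean of poor-prognosis genes minus mean of good-prognosis genes.

    expr maps gene name to the normalized expression value for one sample.
    All genes are weighted equally.
    """
    return mean(expr[g] for g in poor_genes) - mean(expr[g] for g in good_genes)


def dichotomize(scores):
    """Label each sample relative to the cohort median metagene score.

    Following the text, scores above the median are labeled "favorable".
    Assumption: a score equal to the median is labeled "unfavorable".
    """
    cutoff = median(scores.values())
    return {
        sample: "favorable" if score > cutoff else "unfavorable"
        for sample, score in scores.items()
    }
```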

Immunohistochemistry

Immunohistochemical staining on a set of 39 archival FFPE infratentorial ependymoma tumors for TOP2A was carried out using a mouse monoclonal antibody (clone Ki-S1; Dako; 1:100). Antigen retrieval was done using heat-induced antigen retrieval in 10 mM sodium citrate buffer, pH 6.0. Staining was detected using the Envision kit from Dako. Expression of TOP2A protein was scored as present (1) or absent (0).

Statistical analysis

All data were analyzed with the Statistica 6.1 software package (Statsoft Inc). Univariate and multivariate Cox regression analyses were used to evaluate the association of individual and multiple covariates with recurrence-free survival. The Kaplan–Meier method was used to estimate the time-to-event functions for recurrence-free and overall survival. Recurrence-free survival was defined as the time from the date of surgery to the date of recurrence, death or last follow-up. A P ≤ 0.05 was considered significant.
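The Kaplan–Meier product-limit estimate referred to above can be sketched in a few lines. This is a generic textbook implementation for illustration, not the Statistica routine used in the study. The function names are hypothetical, and the convention that observations censored at an event time are still counted as at risk at that time is the usual one and is assumed here.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g. weeks from surgery)
    events: 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

A return value of `None` from `median_survival` corresponds to a median that is "not reached" within the available follow-up.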

Results

Unsupervised clustering to identify transcriptomal groups

The overall experimental scheme is outlined in Supplementary Fig. 1. We first attempted to identify transcriptomal subtypes of infratentorial ependymoma, by obtaining publicly available Affymetrix U133 profiling data from four sources (including data from one of our laboratories) comprising samples from 67 patients. After normalization, unsupervised clustering was performed on genes that were both highly expressed and highly variable in the expression array data. Unsupervised clustering revealed two major subclasses of these tumors, as shown in Fig. 1. Patient age was available for 59/67 cases (Fig. 2). Distribution of the patient samples showed that Group 1 was composed predominantly of children. Of the 24 patients for whom age at diagnosis data were available, 23 were under the age of 10 years and only 1 child was over 10 years. In contrast, 7/30 samples from Group 2 were from patients over the age of 10, which included 4 adults (age > 18) (χ2 P = 0.025). Our findings have been further validated in the Witt et al. publication wherein the authors have found a strong correlation between younger patient age and a molecular subgroup characterized by the up-regulation of pathways known to be activated in aggressive tumors. Gene ontology (GO) analysis showed that genes over-expressed in Group 1 samples had ontologies correlating with response to wound healing, inflammation, migration and cell adhesion (Table 2; Supplementary Table 2). These gene ontologies are similar to previous findings from our laboratory and others [10, 29, 36] which described a mesenchymal signature in glioblastoma (GBM). To examine this further, we evaluated the expression of genes that we previously described as mesenchymal [29]. Of the 15 genes originally described in the manuscript, 13 were interrogated on the microarray platform used for these data (U133A GeneChip). 
All 13 of these genes were over-expressed in Group 1 compared to Group 2 (Supplementary Table 3), indicating concordance of the Group 1 signature with mesenchymal glioblastoma. No significant GO terms were identified for genes over-expressed in Group 2 (Supplementary Table 4). Comparison with the infratentorial subgroups (subgroups G, H and I) described in a prior publication [16] showed that Group 1 is essentially similar to subgroups H and I, while Group 2 resembles subgroup G (Supplementary Fig. 2).

Fig. 1

Expression profiling reveals two major subgroups of gene expression in infratentorial ependymomas. a Unsupervised clustering of 67 infratentorial ependymomas using 250 highly variable probe sets reveals two major sample clusters. Age: patients <10 years (orange); >10 years (blue); grade: II (red), III (green), unknown (white). b Heat map of the top 40 genes differentially expressed between Groups 1 and 2

Fig. 2

Distribution of age, grade and gender among the microarray-defined subgroups. Group 1 ependymomas were predominantly composed of younger patients

Table 2 Gene ontology analysis of genes over-expressed in microarray expression Group 1

Identification of prognostic gene set

Genes that define the mesenchymal and proneural subtypes of GBM have been found to function as robust biomarkers of clinical outcome [6]. Since several of the genes over-expressed in mesenchymal glioblastoma are also over-expressed in Group 1 ependymomas, we reasoned that some of the genes that defined transcriptomal subtypes might also be associated with patient outcome in ependymoma. To test this, we selected 366 genes that were differentially expressed between the two groups and conducted QRT-PCR for these genes on a set of 18 FFPE archival infratentorial ependymoma samples on which clinical data were available (Supplementary Table 5). Using 3-year recurrence-free survival as a cutoff, we identified a set of 51 genes (Supplementary Table 6) that were differentially expressed between patients with SRS and LRS. Of these 51 genes, 22 were found to be over-expressed in the SRS patient samples and the remaining 29 were over-expressed in the LRS patient samples. Comparison with unsupervised clustering showed a striking correlation with transcriptomal subgroups. Twenty of the 22 SRS-associated genes were over-expressed in cluster Group 1 tumors, while 27 of the 29 LRS-associated genes were over-expressed in cluster Group 2 (Supplementary Table 6). These results suggest a significant concordance of outcome-associated genes with genes that define global transcriptomal classes (Fisher’s exact test, two-tailed P < 0.001) (Supplementary Table 7).
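The concordance test reported above can be reproduced in outline with a two-tailed Fisher’s exact test on a 2 × 2 table. The sketch below uses the common convention of summing the probabilities of all tables that are no more likely than the observed one. The table used in the usage note assumes that the two remaining SRS-associated genes were over-expressed in Group 2 and the two remaining LRS-associated genes in Group 1, which the text implies but does not state explicitly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
```

Under the stated assumption, `fisher_exact_two_sided(20, 2, 2, 27)` gives a P value well below 0.001, in line with the reported result.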

We then tested these 51 genes in an independent set of 38 infratentorial ependymoma samples (21 SRS and 17 LRS). Using a metagene approach, we found that in univariate Cox analysis, the 51-gene signature was associated with recurrence-free survival (P = 0.016) as well as overall survival (P = 0.0068). In order to optimize this signature and increase its applicability to routine clinical testing, we focused on the top differentially expressed genes between the SRS and LRS groups of samples. To accomplish this, we used a fold-change cutoff of 1.8 or higher and a t test, P ≤ 0.05, which resulted in a 10-gene signature (Table 3). To determine how the 10-gene signature compares with current standard clinical parameters as a prognostic factor, univariate Cox regression analyses were performed on data from all 56 samples. In univariate analysis, the 10-gene signature was strongly associated with recurrence-free survival (P < 0.001) as well as overall survival (P < 0.001) (Table 4). Using a cutoff of median metagene score, Kaplan–Meier analyses demonstrated a significant difference in both recurrence-free and overall survival based on the 10-gene score (Fig. 3). The median recurrence-free survival for the metagene-defined unfavorable patient group was 80 weeks while the median for the favorable group was not reached. Consistent with the findings from microarray data, the unfavorable 10-gene signature was associated with a younger age group. Of the 28 patients with an unfavorable 10-gene signature, 22 were below the age of 10. In contrast, only 9/28 patients with the favorable signature were under 10 (Fisher’s exact test, two-tailed P = 0.001) (Fig. 4).

Table 3 The survival-associated 10-gene expression signature of infratentorial ependymoma
Table 4 Univariate analysis of outcome predictors
Fig. 3

Kaplan–Meier analysis of survival association with 10-gene signature. Formalin-fixed, paraffin-embedded ependymoma samples were subjected to QRT-PCR as described in “Materials and methods”. The samples were ranked by metagene score and dichotomized into two groups using the median metagene score as the cutoff. Survival is shown for the lower metagene scores (blue) versus the higher metagene score (red). a Recurrence-free survival according to the 10-gene set. The median recurrence-free survival for the metagene-defined unfavorable group was 80 weeks while for the favorable group it was not reached. b Overall survival according to the 10-gene set. The median overall survival for the unfavorable group was 345 weeks, and was not reached for the favorable group. The log rank test was used to determine statistical significance

Fig. 4

The unfavorable 10-gene signature is significantly associated with younger age at diagnosis. Statistical significance of age was determined using the Fisher’s exact test, two tailed

Multivariate analyses were then performed to determine whether the 10-gene signature was independently associated with survival outcomes in the context of prognostic factors found to be significant in univariate analysis. In multivariate analysis, the 10-gene signature was significantly associated with recurrence-free survival (P = 0.003) (Table 5). One of the genes in the ten-marker set, Topoisomerase 2 alpha (TOP2A), was tested using immunohistochemistry (IHC) on a set of 39 infratentorial ependymomas. TOP2A expression was significantly associated with the metagene score (Fisher’s exact test, two-tailed P < 0.001) (Supplementary Fig. 3). When grouped based on expression of TOP2A, Kaplan–Meier analyses demonstrated a significant difference in both recurrence-free and overall survival between groups with low and high expression of TOP2A (Supplementary Fig. 3). In the Heidelberg dataset, however, expression of TOP2A mRNA was not significantly associated with recurrence-free survival (log rank test P = 0.12) or overall survival (log rank test P = 0.08) when grouped by median gene expression.

Table 5 Multivariate Cox regression analysis

Validation of microarray-defined subgroups

Following primary analysis of our data an independent group reported results from mRNA-based expression profiles in infratentorial ependymoma [37]. Briefly, two major subtypes (Groups A and B) were identified in this report, with biologic and clinical implications. We obtained the gene expression data used in this paper (GSE27287, GSE27279) to validate our findings. The paper describes two datasets consisting of a total of 84 infratentorial samples, the first tested with an Affymetrix exon array platform (n = 47, “Toronto” dataset) and the second with an Agilent platform (n = 37, “Heidelberg” dataset). We clustered samples from these datasets using the gene list that defined our Groups 1 and 2 subtypes (Fig. 1b). This resulted in two major subgroups in the two datasets (Fig. 5), showing significant concordance in subgroup assignment (Table 6). Conversely, we applied the independently defined gene list to cluster the microarray data from the 67 samples described above, which again resulted in a significant concordance in subgroup assignment (Table 7). In the Heidelberg dataset clustered using the gene list that discriminates Groups 1 and 2, molecular subclass was significantly associated with recurrence-free survival (P = 0.041) as well as overall survival (P = 0.015) in univariate analysis (Supplementary Table 8). Finally, we examined the two subgroup-defining genes indicated in the combined Toronto–Heidelberg datasets LAMA2 and NELL2 in our dataset. The Group A marker LAMA2 was highly expressed in microarray Group 1 (Student’s t test, P < 0.001) while the Group B marker NELL2 was over-expressed in microarray Group 2 (Student’s t test, P < 0.001) (Fig. 6). Analysis of the pathways defined by gene ontologies in Group 1 versus Group A in the Toronto and Heidelberg datasets by GSEA showed 32 commonly deregulated pathways. Among Group 2 and Group B from the two datasets there were 18 commonly deregulated pathways. 
By random chance, fewer than one pathway would be expected to be commonly altered between Group 1/Group A and Group 2/Group B, providing further evidence that the subgroups in our analysis display similar biology to subgroups recently described in an independent dataset.

Fig. 5

Subgroups A and B described in the Witt et al. report are similar to transcriptomal subgroups 1 and 2. Gene expression microarray data from the Toronto and Heidelberg datasets was used for clustering using the gene list described in Fig. 1b. Heat map of the top 40 genes differentially expressed between Group 1 and Group 2 in a Heidelberg dataset and b the Toronto dataset. 38 of the 40 most differentially expressed genes in Groups 1 and 2 were evaluable in the Heidelberg data set, while 38 of the 40 differential genes were present in the Toronto dataset. Color coding is as follows: Group A tumors, orange; Group B tumors, blue

Table 6 Concordance between the microarray-defined subgroups: clustering of external datasets (Witt et al.) using the gene signature described in Fig. 1b
Table 7 Concordance between the microarray-defined subgroups: clustering of samples described in Fig. 1b using gene signature described in Witt et al.
Fig. 6

Expression of LAMA2 and NELL2 in the transcriptomal subgroups. LAMA2 was over-expressed in Group 1 ependymomas, while NELL2 expression was significantly higher in Group 2 ependymomas. The Student’s t test was used to determine statistical significance

Finally, we applied the 10-gene signature to the external dataset to validate outcome association. Univariate analysis validated associations with recurrence-free survival and overall survival (Table 8; Fig. 7). In multivariate analysis, the 10-gene signature was validated as a predictor of recurrence-free survival in the independent dataset (Table 9).

Table 8 Univariate analysis of outcome predictors in the Heidelberg dataset
Fig. 7

Kaplan–Meier analysis of survival association with 10-gene signature in the Heidelberg dataset. Gene expression data from the Heidelberg dataset was used to calculate a metagene score for the 10-gene signature as described in “Materials and methods”. The samples were ranked by metagene score and dichotomized into two groups using the median metagene score as the cutoff. Survival is shown for the higher metagene scores (blue) versus the lower metagene score (red). a Recurrence-free survival according to the 10-gene set. The median recurrence-free survival for the metagene-defined unfavorable group was 110 weeks while for the favorable group it was not reached. b Overall survival according to the 10-gene set. The log rank test was used to determine statistical significance

Table 9 Multivariate Cox regression analysis of the Heidelberg dataset

Discussion

Despite histological similarities, ependymomas arising from the spinal cord and the infratentorial and supratentorial compartments of the central nervous system show diverse clinical behavior [2]. Several gene expression profiling studies have shown that ependymomas of different locations indeed have distinct expression profiles suggesting biological tumor heterogeneity [9, 15, 16, 24]. To address site-specific biological variation, we focused on infratentorial ependymomas, one of the common locations found in adult and pediatric tumors. Since infratentorial ependymomas have distinct gene expression signatures, we hypothesized that they would have a unique set of biologic and prognostic markers relative to ependymomas arising in other sites.

Using unsupervised clustering of publicly available infratentorial ependymoma microarray expression data, we found two distinct molecular subtypes of infratentorial ependymoma characterized by distinct biological pathways. Gene ontology analysis showed that one of the subgroups identified in unsupervised clustering (Group 1) showed over-expression of genes associated with angiogenesis, wound healing, migration and adhesion. These gene ontologies are similar to the mesenchymal gene signature reported in glioblastoma [29, 36]. In our analysis of genes that we previously described as prototypical mesenchymal genes, we found that these genes were uniformly over-expressed in Group 1 ependymomas. Patients within expression Group 1 were on average, younger than patients whose tumors clustered as Group 2. While the number of adult patients in the microarray analysis was small (n = 4), all four patient samples clustered within Group 2. All of the patients in Group 1 were children. These results may suggest age-specific biologic differences in infratentorial ependymoma. This conclusion is supported by prior studies that have highlighted genetic [19], epigenetic [32] and chromosomal characteristics [24] of pediatric ependymomas that differentiate them from the molecular profile of adult tumors. A recent report on infratentorial ependymoma expression profiling wherein two molecularly distinct subgroups were identified [37] validates our findings. This report describes a subgroup of infratentorial ependymoma that comprises younger patients and defines a more aggressive phenotype. We have shown that these subgroups are essentially similar to the ones we have described, and describe similar biology, despite differences in the individual genes that define them.

Several groups have reported the prognostic value of gene expression signatures identified by global expression profiling [4, 10, 29, 33]. In order to explore the clinical significance of the microarray-defined subclasses, we tested a panel of genes that were differentially expressed in the two expression groups on a set of clinically annotated infratentorial ependymomas. Using a small pilot sample (n = 18), we identified a set of 51 genes that showed an association with recurrence-free as well as overall survival. This 51-gene signature remained prognostic in a larger independent validation set. Correlation of the 51-gene signature with the microarray-defined classes showed that most (approximately 90%) of the genes over-expressed in the tumors that recurred (SRS) clustered within the microarray expression Group 1, which represented a more aggressive molecular profile. In contrast, the vast majority of genes highly expressed in the LRS tumors were over-expressed in microarray Group 2. These findings indicate close connections between microarray-defined subgroups, patient age and prognostic biomarkers in infratentorial ependymoma.

In order to increase the feasibility of the development of a clinical test applicable to routine testing, we selected the ten genes most significantly associated with survival. Expression values of these 10-gene assays were condensed into a single score (metagene) for statistical analyses. The 10-gene set was shown to be an independent predictor of recurrence-free survival. While there was a strong association between younger age and an unfavorable prognostic signature, the 10-gene signature remained significantly associated with patient outcome after adjusting for age. We have validated the 10-gene signature as an independent predictor of recurrence-free survival in the Heidelberg dataset as recently described [37]. Importantly, while the larger 51-gene signature and the more focused 10-gene signature were optimized using clinical outcome as the endpoint, these genes were derived from a set of genes differentially expressed between transcriptomic subgroups independent of patient outcome, indicating a connection between biologic subtypes of ependymoma and tumor aggressiveness (Supplementary Tables 6 and 7).

Analysis of the known functions of the genes in our 10-gene signature showed that poor survival is associated with the increased expression of genes associated with angiogenesis, extra-cellular matrix and proliferation. Increased expression of the oncogene Transketolase-like 1 (TKTL1) has been reported to be associated with poor survival in colorectal carcinomas [7]. Over-expression of TOP2A mRNA was associated with shorter metastasis-free survival in breast cancer, indicating that it could be a marker for aggressive tumor phenotype [5]. As seen in our study, over-expression of TOP2A has been reported to be associated with poor survival in ependymomas in univariate analysis [17, 18, 21, 38, 39]. However, TOP2A over-expression has not been shown to represent an independent prognostic marker after accounting for clinical variables such as extent of resection and grade thus limiting its applicability as a prognostic biomarker.

These data provide preliminary evidence that an mRNA-based test could serve as a format for a clinical test for infratentorial ependymoma which, along with the existing clinical markers, could be used to optimize therapeutic choices for individual patients, analogous to the predictive tests developed for optimization of patient therapy in glioblastoma [6] and breast cancer [33]. Specifically, with further validation, such a test could help identify the likelihood of response to standard therapies. In addition, the identification of specific clinical outcome-associated genes could also provide insights into tumor biology that could help identify novel therapies for patients resistant to standard therapy. Further validation of this 10-gene signature in a larger independent cohort of samples would be required to demonstrate the clinical utility of this molecular signature.