Introduction

Lower-grade gliomas (LGGs) are a group of heterogeneous neuroepithelial tumors comprising diffuse low-grade and intermediate-grade gliomas [World Health Organization (WHO) grades II and III] [1, 2]. The clinical course of the disease spans a broad spectrum, highlighting the need to stratify patients into distinct subgroups with more uniform clinical outcomes. According to the 2016 WHO brain tumor classification [3], a number of genetic and molecular abnormalities, including IDH mutation and 1p/19q codeletion, have been progressively incorporated as supportive markers to facilitate the comprehensive assessment of patients with LGGs and as an integral part of LGG subclassification [4, 5]; these features have been shown to distinguish biologically and clinically distinct subtypes. Nevertheless, a deep understanding of the biological differences between individual patients remains elusive, and new prognostic biomarkers are needed to better predict clinical outcomes and devise patient-tailored treatments.

In recent years, microRNAs (miRNAs) have emerged as promising biomarkers for both the diagnosis and prognosis of various tumors [6, 7]. miRNAs are short non-coding RNAs of approximately 22 nucleotides that regulate tumor biology by modulating multiple target genes at the post-transcriptional level. Ample evidence indicates that aberrant miRNA expression in glioma is closely associated with multiple biological processes, including cellular proliferation, angiogenesis, apoptosis, metastasis, and invasion [8,9,10]. However, whether a signature comprising one or more miRNAs can predict clinical outcomes in patients with LGGs remains unknown.

In the present study, we used available miRNA data from 100 patients with LGGs from the Chinese Glioma Genome Atlas (CGGA) to identify a unique miRNA signature as a putative prognostic biomarker, and further validated its predictive properties using data from a different cohort of 420 patients from The Cancer Genome Atlas (TCGA) database. Furthermore, we developed and validated a miRNA-based predictive model that integrated our newly discovered four-miRNA signature with traditional clinicopathological risk factors for patient-tailored survival prediction in patients with LGGs.

Methods

Patient selection

We included 520 patients with miRNA expression data and corresponding clinical information in this study: 100 samples downloaded from the CGGA database (http://www.cgga.org.cn) served as the training set, and 420 samples obtained from TCGA (cancergenome.nih.gov) served as the validation set. Selection criteria for both cohorts were as follows: (a) histopathologically confirmed LGG according to the WHO classification; (b) availability of high-quality miRNA expression data; and (c) no history of radiation therapy, chemotherapy, or corticosteroid therapy before surgery. Patients with incomplete prognostic information were excluded. Overall survival (OS) was calculated from the date of initial diagnosis to the date of death or last follow-up. All patients in the CGGA provided written informed consent to participate, and patient privacy was strictly protected. The study protocol was approved by the ethics committee of Beijing Tiantan Hospital.

miRNA expression, mRNA expression, and biomarker detection

For the training cohort, miRNA expression data were generated using the Human v2.0 miRNA Expression BeadChip (Illumina, Inc., San Diego, CA, USA), which covers 1146 miRNAs (97% of the miRBase 12.0 database) [11]; mRNA expression data were generated using the Agilent Whole Human Genome Array platform [12]; and IDH mutations were detected by pyrosequencing [13]. All of these methods have been described in detail in our previous work. For the validation cohort, miRNA expression data, level 3 RNA sequencing data (RSEM-normalized), and molecular data (IDH mutation, 1p/19q codeletion, and ATRX mutation status) were obtained from the TCGA database.

Feature selection and risk score construction

The association between each differentially expressed miRNA and patient OS was assessed using a univariate Cox proportional hazards regression model, with P < 0.05 considered statistically significant. Because it is suitable for regression analysis of high-dimensional data, the least absolute shrinkage and selection operator (LASSO) method was then used to select the most useful predictive features. A risk score was calculated for each patient as a linear combination of the expression levels of the selected miRNAs, each weighted by its regression coefficient (Coef) from LASSO:

$$\text{Risk score} = \left(\text{expr}_{\text{miRNA}_1} \times \text{Coef}_{\text{miRNA}_1}\right) + \left(\text{expr}_{\text{miRNA}_2} \times \text{Coef}_{\text{miRNA}_2}\right) + \cdots + \left(\text{expr}_{\text{miRNA}_n} \times \text{Coef}_{\text{miRNA}_n}\right).$$

The same coefficients (Coef) estimated in the training cohort were applied unchanged to the validation cohort.
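The univariate screening step can be illustrated with a minimal pure-Python sketch. It fits a one-covariate Cox proportional hazards model by Newton–Raphson on the partial likelihood, assuming the Breslow approximation for tied event times (R's `coxph()` uses the Efron approximation by default, so results on tied data may differ slightly) and a Wald test for the P value. The function names are hypothetical, and the subsequent LASSO step is not shown.

```python
import math


def cox_score_info(times, events, x, beta):
    """Score U(beta) and observed information I(beta) of the one-covariate
    Cox partial likelihood, using the Breslow approximation for ties."""
    score, info = 0.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Risk set: patients still under observation at time t
        risk = [xi for ti, xi in zip(times, x) if ti >= t]
        # Covariate values of patients who died at time t
        dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
        w = [math.exp(beta * xi) for xi in risk]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, risk))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
        d = len(dead)
        score += sum(dead) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Return (coefficient, hazard ratio, Wald P value) for one covariate.

    Assumes the covariate varies within risk sets (info > 0) and that the
    data are not perfectly separated (otherwise beta diverges).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(times, events, x, beta)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(times, events, x, beta)
    z = beta * math.sqrt(info)  # beta / SE, with SE = 1 / sqrt(info)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, math.exp(beta), p_value
```

Applied to each miRNA in turn, miRNAs with P < 0.05 would be passed on to the LASSO step; a hazard ratio above 1 indicates that higher expression is associated with shorter OS.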

Prediction of survival outcome using a miRNA-based risk score

Patients in the training and validation cohorts were classified into high-risk and low-risk groups using the median risk score as the cutoff. OS in the high-risk and low-risk groups was first compared in the training dataset using the Kaplan–Meier method and then validated in the validation dataset, and the log-rank test was used to assess survival differences between groups. Survival differences for each individual selected miRNA were evaluated in the same way. The prognostic impact of the four-miRNA signature within each LGG subtype was also investigated. Associations between the risk score and clinical characteristics were assessed with the chi-square or Fisher's exact test. Univariate and multivariate Cox regression analyses were used to evaluate whether the four-miRNA signature was an independent prognostic factor.
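The median dichotomization and log-rank comparison can be sketched in pure Python as follows. This is an illustrative sketch rather than the study's code: patients whose score equals the median are assigned to the low-risk group (an assumption), and the P value is taken from the chi-square distribution with one degree of freedom. In R, `survdiff()` from the survival package performs the same test.

```python
import math
from statistics import median


def median_split(scores):
    """Assign 'high' to scores above the median and 'low' otherwise."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def log_rank(times, events, groups, group="high"):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = at_risk.count(group)
        deaths = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        d = len(deaths)
        d1 = deaths.count(group)
        # Observed minus expected deaths in `group` at this event time
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in `group`
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With an even number of patients and distinct scores, this split yields equal-sized groups, consistent with the 50/50 and 210/210 splits reported in the Results.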

Individualized prediction model construction

Nomograms for individualized prediction of 1-, 2-, 3-, and 5-year OS were generated from the results of the multivariate analysis using the rms package in R [14]. To minimize information loss, a backward stepdown selection process based on the Akaike information criterion was used to select the independent prognostic factors included in the final nomogram [15]. Harrell's concordance index (C-index) was used to assess the discriminative ability of the nomograms, and calibration curves were used to assess agreement between predicted and observed survival [16]. Finally, the prognostic nomogram was validated in an independent external cohort.
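Harrell's C-index is the proportion of usable patient pairs in which the patient predicted to be at higher risk died first. A minimal sketch follows, assuming that pairs with tied follow-up times are skipped and that ties in predicted risk count as one-half; for a Cox-based nomogram, the predicted risk would be the model's linear predictor. The function name is hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i had an event and was followed
    for a shorter time than patient j. It is concordant when patient i also
    has the higher predicted risk; equal predicted risks count as 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable
```

A value of 0.5 corresponds to random prediction and 1.0 to perfect ranking, which is the scale on which the nomogram C-indices in the Results can be read.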

Bioinformatics analysis

Genes associated with the miRNA-based risk score were identified by Pearson correlation analysis in R (cran.r-project.org), using thresholds of P < 0.05 and a Pearson correlation coefficient > 0.3. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were then performed using DAVID (david.ncifcrf.gov) to explore the biological functions and pathways associated with the miRNA-based signature. False discovery rates (FDRs) were used to correct for multiple comparisons. The biological process network was visualized using Cytoscape.
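The correlation screen and a multiple-testing correction can be sketched as follows. Two assumptions are labeled in the code: the 0.3 threshold is applied to the absolute correlation (the Results report both up- and downregulated genes), and the P value uses the Fisher z approximation rather than the exact t test performed by R's `cor.test()`. The Benjamini–Hochberg procedure is shown as one common FDR method; DAVID's reported FDR may be computed differently. All function names are hypothetical.

```python
import math


def pearson_with_p(x, y):
    """Pearson r with an approximate two-sided P value (Fisher z transform).

    ASSUMPTION: Fisher z approximation; requires more than 3 samples.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return max(-1.0, min(1.0, r)), 0.0  # perfect correlation
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return r, math.erfc(abs(z) / math.sqrt(2))


def screen_genes(risk_scores, expression, r_min=0.3, p_max=0.05):
    """Return {gene: r} for genes with |r| > r_min and P < p_max.

    `expression` maps gene name -> values aligned with `risk_scores`.
    ASSUMPTION: the correlation threshold is applied to |r|.
    """
    hits = {}
    for gene, values in expression.items():
        r, p = pearson_with_p(values, risk_scores)
        if abs(r) > r_min and p < p_max:
            hits[gene] = r
    return hits


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes with positive r would correspond to those upregulated with increasing risk score, and genes with negative r to those downregulated.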

Results

Identification of prognostic miRNAs and association with OS

Univariate Cox proportional hazards regression analysis identified 133 protective (HR < 1) and 128 oncogenic (HR > 1) miRNAs that were significantly associated with patient OS (Fig. 1a). A LASSO Cox regression model was then used to identify the most informative prognostic miRNAs, yielding four miRNAs (miR-590-3p, miR-10b, miR-196a, and miR-15b-3p) with non-zero regression coefficients (Fig. 1b, c). For each of the four miRNAs, high expression (dichotomized at the median) was associated with shorter patient survival, and these findings were verified in the validation cohort (Fig. 2a–h).

Fig. 1

Identification of prognostic miRNAs. a Univariate Cox regression analysis identified 261 miRNAs significantly associated with the OS of patients with LGG. b Tenfold cross-validation for tuning parameter selection in the LASSO model. c LASSO coefficient profiles of the 261 miRNAs

Fig. 2

Kaplan–Meier plots of OS in patients with LGGs stratified by the expression of each of the four miRNAs. a, c, e, g Each of the four identified miRNAs divides patients into groups with significantly different prognoses in the training dataset. b, d, f, h The expression of each of the four miRNAs retained prognostic significance in the validation set

Development of a miRNA risk score and association with OS

A risk score formula was constructed based on the individual expression levels of the four miRNAs and their respective coefficients as follows:

$$\text{Risk score} = \left(\text{expression}_{\text{miR-590-3p}} \times 6.84 \times 10^{-5}\right) + \left(\text{expression}_{\text{miR-10b}} \times 4.18 \times 10^{-5}\right) + \left(\text{expression}_{\text{miR-196a}} \times 4.11 \times 10^{-5}\right) + \left(\text{expression}_{\text{miR-15b-3p}} \times 9.43 \times 10^{-6}\right).$$

The dichotomized risk score (using the median risk score as a cutoff) enabled us to segment patients into high- (n = 50) and low-risk (n = 50) groups [hazard ratio (HR) = 0.1213, 95% confidence interval (CI) 0.060–0.287] in the training cohort (Fig. 3a). To confirm that the four-miRNA signature had a similar prognostic value in different populations, we then applied it to predict OS in an independent validation cohort using the median risk score as a cutoff. We found that the dichotomized risk score could also stratify patients from the validation cohort into high- (n = 210) and low-risk groups (n = 210) (HR = 0.3892, 95% CI 0.256–0.593, Fig. 3d). When stratifying patients by IDH mutation status, the four-miRNA signature remained a significant prognostic factor in both cohorts (Fig. 3b, c, e, f). In addition, this signature allowed patients within each LGG subtype to be further stratified (Fig. S1).
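The reported formula can be applied to individual samples with a short sketch. The coefficients below are those stated in the text; the function names and any expression values passed in are hypothetical, and the sketch assumes expression values are on the same scale as those used to fit the coefficients, since the coefficients are applied unchanged to every cohort.

```python
# LASSO Cox coefficients for the four-miRNA signature, as reported in the text
COEFFICIENTS = {
    "miR-590-3p": 6.84e-5,
    "miR-10b": 4.18e-5,
    "miR-196a": 4.11e-5,
    "miR-15b-3p": 9.43e-6,
}


def four_mirna_risk_score(expression):
    """Weighted sum of the four miRNA expression values for one patient.

    `expression` maps miRNA name -> expression value; a KeyError is raised
    if any of the four signature miRNAs is missing.
    """
    return sum(expression[mirna] * coef for mirna, coef in COEFFICIENTS.items())


def score_cohort(cohort):
    """Apply the same (frozen) coefficients to every patient in a cohort."""
    return [four_mirna_risk_score(patient) for patient in cohort]
```

Because all four coefficients are positive, higher expression of any signature miRNA raises the score, matching the finding that high expression of each miRNA was associated with shorter survival.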

Fig. 3

Kaplan–Meier plot of OS in patients according to the four-miRNA signature risk score. The dichotomized risk score allowed the segmentation of patients into high- and low-risk groups in both the a training and d validation cohorts. When stratifying by IDH mutation status, the four-miRNA signature remained a significant prognostic factor in both the b, c training and e, f validation cohorts

Associations of the four-miRNA signature with clinicopathological variables and patient outcome

Next, we investigated whether there were associations between the four-miRNA signature and widely accepted prognostic factors in patients with LGGs. We found that the risk score was significantly associated with several known prognostic factors (age, sex, WHO grade, histology, IDH status, and 1p/19q status) in the large TCGA dataset (Table 1).

Table 1 Clinicopathological characteristics of patients with lower-grade gliomas in the CGGA and TCGA datasets

We then performed univariate and multivariate Cox regression analyses to determine whether the four-miRNA signature was an independent predictor for patients with LGGs. In univariate analysis, the risk score (P < 0.001), age (P = 0.039), WHO grade (P < 0.001), and IDH status (P = 0.042) were significantly associated with survival in the training cohort, and similar results were found in the validation cohort (Table 2). After multivariable adjustment for these factors, the risk score remained a powerful independent prognostic factor (P < 0.001) in both the training and validation cohorts (Table 2).

Table 2 Characteristics associated with OS using Cox regression for patients with lower-grade gliomas in the CGGA and TCGA datasets

Establishment and validation of the individualized prediction models

Using backward stepwise selection based on the smallest Akaike information criterion, we constructed prognostic nomograms integrating the independent prognostic parameters (WHO grade, age at diagnosis, IDH status, and risk score). The C-index of the nomogram for predicting OS was 0.68 in the CGGA cohort and 0.83 in the TCGA cohort. Calibration curves also demonstrated excellent agreement between predicted and observed probabilities of 1-, 2-, 3-, and 5-year OS in both cohorts (Fig. 4).
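The backward selection step can be illustrated with a generic greedy sketch. It assumes a hypothetical user-supplied callback `fit_loglik(subset)` that returns the maximized partial log-likelihood of a Cox model containing exactly the given variables (including the empty model); it is not the rms implementation used in the study. In R, `step()` from the stats package performs a comparable AIC-based backward search.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


def backward_aic(variables, fit_loglik):
    """Greedy backward elimination that minimizes AIC.

    At each step, drop the single variable whose removal lowers AIC the most;
    stop when no removal lowers AIC. Returns (selected variables, final AIC).
    `fit_loglik` is a hypothetical callback: subset tuple -> log-likelihood.
    """
    current = list(variables)
    best = aic(fit_loglik(tuple(current)), len(current))
    while current:
        candidates = []
        for v in current:
            subset = [u for u in current if u != v]
            candidates.append((aic(fit_loglik(tuple(subset)), len(subset)), v, subset))
        cand_aic, _, subset = min(candidates)
        if cand_aic >= best:
            break  # no single removal improves AIC
        best, current = cand_aic, subset
    return current, best
```

A variable is retained only if it raises the log-likelihood by more than one unit, the penalty AIC charges per parameter.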

Fig. 4

After final model selection, the four-miRNA signature, WHO grade, age, and IDH status were included in the nomogram. a Nomogram for predicting OS of patients with LGGs, with calibration plots for the b training cohort and c validation cohort

Significant functions and pathway enrichment analysis

Paired miRNA and mRNA expression data were available for 93 samples in the training data set and 420 samples in the validation data set, and these samples were used for subsequent analyses. In the training cohort, we identified 3012 genes that were significantly associated with the risk score groups, including 1487 that were upregulated and 1525 that were downregulated in high-risk patients (Fig. 5a). Gene ontology and network analyses of the genes associated with high-risk patients showed that the major enriched biological processes included cell cycle and modification, nucleosome assembly, RNA processing, respiratory electron transport, translational elongation, and DNA repair (Fig. 5b, c). KEGG pathway enrichment analysis showed that cell cycle, DNA replication, and mismatch repair were the main associated pathways (Fig. 5d). Similar signature-related biological processes and signaling pathways, which are widely thought to play important roles in tumor proliferation, were observed in the validation dataset (Fig. S2).

Fig. 5

Significant functions and pathway enrichment analysis of the four-miRNA signature in the training dataset. a Heat map of differentially expressed genes between the high- and low-risk four-miRNA signature groups in 93 LGG samples. b Network analysis and c gene ontology analysis showing that a high risk score was associated with biological processes such as cell cycle and nucleosome assembly. d KEGG pathway enrichment analysis showing that high-risk classification was associated with several specific pathways

Discussion

To elucidate the biological mechanisms underlying LGGs, previous studies have analyzed DNA mutations, RNA expression, DNA copy number, and DNA methylation data [1, 2, 5, 17, 18]. However, data at the post-transcriptional level could further improve our understanding of the biological processes underlying LGGs. In the present study, we analyzed post-transcriptional data and identified a four-miRNA signature for predicting OS in patients with LGGs. This signature robustly stratified patients into high-risk and low-risk groups in both the training (CGGA) and validation (TCGA) cohorts, and the four-miRNA-based nomogram provided an effective approach for individual survival estimation. Furthermore, in silico analyses suggested potential functional roles of the signature miRNAs in processes implicated in tumorigenesis.

Using a LASSO regression model, we identified four miRNAs (miR-590-3p, miR-10b, miR-196a, and miR-15b-3p) whose expression profiles were significantly associated with patient prognosis; previous studies indicate that some of these miRNAs play important roles in glioma pathogenesis and progression. For instance, increased miR-10b expression has been shown to be significantly associated with glioma grade progression [19], and in vitro studies have demonstrated that miR-10b overexpression promotes invasion, migration, extracellular matrix remodeling, and tumor progression [20,21,22]. In addition, functional analyses have revealed that miR-10b mediates tumor progression through the cell cycle and increased hypoxia [23, 24]. Similarly, miR-196a contributes to malignant progression in glioma and is correlated with OS in patients with glioblastoma [25]. Mechanistic studies have demonstrated that miR-196a overexpression may promote cell proliferation and suppress apoptosis by activating NF-κB in glioma cells [26]. Hence, we propose that the prognostic value of the identified miRNAs may derive from their roles in regulating the initiation and progression of LGGs. Furthermore, by using LASSO modeling to integrate multiple miRNAs into a single signature, we may better capture the complex biological processes underlying LGGs than any single miRNA can. Although TCGA follow-up times were longer than those in the CGGA database, the four-miRNA signature remained a significant predictor in both the training and validation cohorts, which demonstrates that the signature is robust and reliable.

IDH mutations and the 1p/19q codeletion are particularly notable molecular alterations that occur at a very early stage of gliomagenesis and are considered key prognostic factors for patients with LGGs under the updated WHO classification [1, 17, 27]. Although recent studies have shown that molecular stratification (i.e., IDH1/IDH2 mutation with 1p/19q codeletion, IDH1/IDH2 mutation without 1p/19q codeletion, and IDH1/IDH2 wild type) can categorize patients into clinically and etiologically similar groups [1, 5], genetic characterization alone may be insufficient to comprehensively define tumor behavior or mechanisms. Previous studies have identified and validated age, WHO grade, seizures, and other prognostic factors [28,29,30]; in this study, we generated a miRNA-based nomogram integrating molecular markers with clinical and genomic data that yielded a more comprehensive, individualized prognostic prediction for patients with LGGs. The key benefits of this model are that it offers a complementary perspective on each tumor and provides an individualized scoring system for patients.

Our study has several potential clinical applications; specifically, the identified miRNA signature may serve as a novel biomarker for prognosis and for predicting response to existing adjuvant treatments for LGGs. Although maximal safe resection combined with adjuvant radiotherapy and chemotherapy is the recommended treatment for patients with LGG and any poor prognostic feature [31], clinical outcomes remain highly variable under the same treatment regimen. This variability indicates that efforts should focus on identifying subgroups of patients who are more likely to benefit from adjuvant treatments, thereby enabling more personalized care. Houillier et al. demonstrated that patients with IDH-mutated gliomas have a significantly better response to the oral alkylating agent temozolomide [32]. Baumert et al. [33] performed an EORTC clinical trial and concluded that patients with IDH1/IDH2 mutations without 1p/19q codeletion benefited more from radiotherapy than from chemotherapy. Building on these findings, integrating miRNA data into molecular stratification may provide new clues for identifying candidate therapeutic targets in patients with LGGs and enable better personalized treatment for specific patient subgroups. In a previous study, several miRNAs were examined as potential biomarkers of response to temozolomide in patients with glioblastoma [11]. Therefore, suitable therapies combining more tolerable targeted drugs as adjuncts could be selected for patients with different risk scores, which may improve systemic disease control.

Our study has several limitations. First, although the study included a large sample with independent training and validation sets that supported the prognostic value of our miRNA signature in patients with LGGs, prospective studies are warranted to fully assess whether these miRNAs are clinically valuable for patient prognosis. Second, inclusion of additional known variables implicated in LGGs, such as the Karnofsky Performance Score, tumor location, neuroimaging data, and other genomic characteristics, is needed to better determine the prognostic contribution of our miRNA signature relative to existing measures. Third, 1p/19q codeletion status was not available for the training cohort. Fourth, because the study was retrospective, we cannot exclude the possibility that variable treatment regimens involving surgery, radiation, and chemotherapy confounded survival outcomes in our cohorts. Thus, the predictive value of this miRNA signature requires further characterization and validation in a separate dataset with a uniform treatment regimen.

In conclusion, we identified and validated a four-miRNA signature associated with OS in two cohorts of patients with LGGs and developed a miRNA-based nomogram for individualized prognostic assessment. Further understanding of the biological processes regulated by these miRNAs will provide new insights into the pathogenesis of LGGs and will be an important step towards improved decision making in the personalized clinical management of patients with gliomas.