We have read with interest the recent publication by Castel et al., describing the identification of two prognostic subgroups within diffuse intrinsic pontine glioma (DIPG) based on H3.1 versus H3.3 mutation status. They report that these mutations underlie mutually exclusive oncogenetic pathways and hence two phenotypic subgroups, which may eventually lead to subgroup-specific treatment for DIPG patients. Although we advocate the search for new predictors in DIPG, we have some concerns regarding the authors’ statement that histone mutation status is a better predictor of prognosis than our recently published prediction model (the DIPG risk score), which is based on clinical and radiological criteria [2].

In their study, Castel et al. attempt to compare the unifactorial predictive value of histone 3.1 and 3.3 mutational status with previously published classifications, including ACVR1 mutation status and the multifactorial DIPG risk score, which is based on three clinical variables (age at diagnosis, interval between onset of symptoms and diagnosis, and use of adjuvant chemotherapy in addition to radiotherapy) and one radiological variable (presence of ring enhancement on MRI) [1, 3]. The authors state “None of these risk factors was a stronger predictor for survival than the histone H3 mutation type, which remained following multivariate analysis (p value <0.0001).” We question whether this conclusion can be drawn from the French cohort study.

First, Castel et al. did not validate this statement in a comparable cohort of DIPG patients: their prediction model was developed in a selected group of 96 DIPG patients, of whom 91 were available for analysis and 79 (86 %) had either a histone 3.1 or 3.3 mutation. Twelve patients harboring wild-type histone 3 or another histone mutation were removed a priori from the analysis, although they had been diagnosed with a clinically and radiologically typical DIPG, which to date is the most widely accepted definition of the disease. Additionally, patients with a symptom duration of more than 3 months were excluded. In our prediction model, however, these patients were included, provided a typical DIPG was observed on MRI. Whether “onset of symptoms”, and its exact cutoff, should serve as a diagnostic criterion to define DIPG as typical or atypical is still a subject of debate. To allow a valid comparison with the DIPG risk score, Castel et al. should at least have included the whole cohort of 91 DIPG patients.

Second, the authors validate their model in a small external cohort of only 43 patients, and do not separately compare the performance of their predictor (mutation status) with that of the DIPG risk score in this validation cohort. Performance testing in a small external cohort may lead to uninformative results, because much larger sample sizes are needed to detect differences in external validation cohorts [4]. Furthermore, their method does not provide an appropriate comparison of predictive performance between their predictor and the DIPG risk score, and therefore cannot support reliable conclusions.
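To illustrate what a head-to-head comparison of two predictors in the same validation cohort could look like, the following minimal sketch computes Harrell's concordance index for each predictor. This is our own illustration and not the method used by Castel et al. or in the DIPG risk score publication; the function name, the variable names and the five-patient toy data are all hypothetical, and tied survival times are simply skipped for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risk_values):
    """Harrell's C: share of comparable patient pairs in which the patient
    with the higher predicted risk also has the shorter survival time.
    Pairs with equal predicted risk count as half concordant."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification: skip tied times, and skip pairs in which the
        # shorter follow-up is censored (the ordering is then unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_values[a] > risk_values[b]:
            concordant += 1.0
        elif risk_values[a] == risk_values[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


# Hypothetical toy cohort (NOT real patient data).
survival_months = [6, 9, 11, 14, 20]
death_observed = [1, 1, 0, 1, 1]   # 0 = censored
risk_score = [5, 4, 4, 2, 1]       # hypothetical multifactorial score
h3_3_mutated = [1, 1, 0, 1, 0]     # hypothetical binary predictor, 1 = higher risk

c_risk_score = concordance_index(survival_months, death_observed, risk_score)
c_mutation = concordance_index(survival_months, death_observed, h3_3_mutated)
print(f"C-index, risk score: {c_risk_score:.3f}")
print(f"C-index, mutation status: {c_mutation:.3f}")
```

In a real validation, such point estimates would need confidence intervals (for example by bootstrapping), which is precisely where a cohort of 43 patients is likely to be too small to separate two predictors.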

Third, we would like to point out that the DIPG risk score model actually performs quite well in the French cohort. Figure S7a from the manuscript shows the Kaplan–Meier curves of the DIPG risk groups. Although we question whether 60 patients are enough for an adequate validation, the Kaplan–Meier curves based on risk score interval are comparable to the curves published for the original study cohort; both show increasing overall survival time with decreasing risk score. The difference between the risk groups is not statistically significant, but this can be explained by the very low number of patients in the standard risk group (n = 5).

Finally, we underline that the authors have provided an important and apparently strong new variable for future multifactorial prediction modeling. Applicability is an issue, however: in contrast to the routinely usable DIPG risk score (which is based on clinical and radiological characteristics), histone mutation status requires a biopsy, which to date is, unfortunately, not routinely performed in most countries.

The challenge in prediction modeling is to find the combination of variables that best reflects their influence on survival. Given the strong predictive value of histone mutation status and the good performance of the DIPG risk score model, we recommend that these predictors be applied together in a new, large validation cohort to determine their combined value. Currently, the DIPG risk score prediction model is being validated in a large cohort from the US, Canada and Australia.