Introduction

Brain tumors are the second most common tumors in children. In contrast to adults, the histological types of childhood brain tumors vary widely. MRI is the most important imaging tool for the assessment of childhood intracranial neoplasms. Conventional T2-weighted, T1-weighted and contrast-enhanced T1-weighted imaging with gadolinium-based contrast agents (GBCAs) provide information on tissue morphology and extension of the disease. Aside from location, findings that help to indicate the benign or malignant nature of a tumor include hemorrhage, ectatic vessels within the tumor and a T2-hypointense appearance, which is secondary to high cell density [1]. The value of contrast enhancement in predicting tumor grade is limited in children, as 89 % of low-grade tumors show enhancement [2]. Moreover, lack of enhancement is also seen in high-grade tumors such as PNET and in up to 8 % of medulloblastomas [2]. Therefore, goals such as identifying the most aggressive area of a tumor before biopsy, grading the tumor, and assessing the response to recent therapies as well as the clinical effects on cognition after long-term treatments have driven the development of advanced MRI techniques. Diffusion-weighted imaging (DWI) provides information on cell density and tissue microarchitecture [3], perfusion-weighted imaging (PWI) on the features of capillaries and larger vessels, depending on the sequence used, and magnetic resonance spectroscopy (MRS) on tissue neurochemical characteristics [4, 5]. These techniques are considered useful tools for grading and typing brain tumors in children [6–8]. However, apparent diffusion coefficient (ADC) values and MRS profiles show overlaps between the different tumor types in children [9–12]. Intratumoral cerebral blood volume (CBV) usually correlates with tumor vascularity, with higher CBV in high-grade tumors [5]. However, low-grade gliomas such as oligodendrogliomas can show elevated CBV, as reported in adult brain gliomas [13].
To date, reports have been focused mostly on posterior fossa tumors. Therefore, the goals of this study were to assess the diagnostic accuracy of advanced MRI techniques (DWI, PWI, MRS) along with single imaging parameters in the presurgical diagnostic grading and typing of pediatric brain tumors irrespective of their location.

Methods

Patient population

This study is a retrospective analysis of advanced MRI techniques used to evaluate brain tumors in children prior to surgery. Informed consent was not required because DWI, PWI and MRS are currently approved diagnostic procedures. This study was approved by the Institutional Review Board.

From October 2006 to April 2013, 139 patients with a suspected brain tumor were referred by the Department of Paediatric Neurosurgery prior to any treatment. Brain MRI with conventional sequences, DWI, PWI and MRS was performed. General anaesthesia was administered in patients younger than 5 years. Inclusion criteria were the absence of motion or susceptibility artefacts and good-quality MRS spectra, arterial input function and PWI perfusion curves. Small tumors (smaller than the MRS voxel), hemorrhagic tumors and tumors with high susceptibility artefacts were excluded. In terms of location, suprasellar tumors such as craniopharyngiomas and extra-axial tumors such as meningiomas or tumors originating from bone were excluded. In terms of tumor subtype, tumors without histological data, such as some cases of diffuse brainstem glioma, pineal tumor and dysembryoplastic neuroepithelial tumor (DNET), were also excluded.

Seventy-six patients (50 boys, 26 girls; median age, 7.5 years; age range 1 month to 17.9 years) were finally included. Histological diagnosis was obtained in all patients within 1 week after MRI by tumor resection or surgical biopsy.

The cases were diagnosed by two neuropathologists independently and were also reviewed at the time of the study by the Director of Pathology (DFB). The diagnosis was always concordant.

Tumors were classified and graded according to the 2007 WHO classification [14]. All tumors were neuroepithelial, of which 54 % were located in the posterior fossa and 46 % in the supratentorial area. The tumor distribution was as follows: grade I (n = 38), grade II (n = 6), grade III (n = 12) and grade IV (n = 20). To avoid an excessive number of tumor subtypes, some of them were pooled into eight categories (A–H) (Table 1), based on similar pathological findings and/or genetic markers as well as clinicoradiological findings and biological behavior.

Table 1 Histological categories and distribution of tumor types

MRI protocol

All MR examinations were performed on a 1.5 T unit (Symphony TIM, Siemens, Erlangen, Germany) using an 8-channel head coil. The MRI protocol included T1-weighted gradient-echo, T2-weighted turbo spin-echo and FLAIR images in at least two anatomic planes, DWI, and MRS, followed by PWI and a 3D gradient-echo T1 sequence after GBCA injection.

Diffusion-weighted imaging

DWI was acquired in the axial plane with a spin-echo echo-planar sequence (TR = 3300 ms, TE = 92 ms, slice thickness = 4–5 mm, no spacing, three averages, FOV 240 × 240 mm, 128 × 128 acquisition matrix, acquisition time 51 s). Diffusion-sensitizing gradients were applied in three orthogonal directions with three b values (0, 500 and 1000 s/mm²).

Proton MR spectroscopy

Single-voxel 1H-MRS was performed with PRESS (point-resolved spectroscopy) at short (30 ms) and intermediate (135 ms) TE. The voxel of interest (8 cm³) was placed in the solid component of the tumor, away from calcified, hemorrhagic or necrotic areas and away from the scalp. Water-suppressed MRS was acquired with the following parameters: short TE (30 ms): TR 1500 ms, 64 averages, acquisition time 1 min 36 s; intermediate TE (135 ms): TR 1500 ms, 85 averages, acquisition time 2 min 07 s. Data were acquired with 1024 data points over a spectral width of 1000 Hz. Non-water-suppressed MRS was also acquired at short and intermediate TE within the same voxel to provide an internal reference for normalization of the data.

Perfusion MRI

PWI was performed with DSC-MRI using a fat-suppressed T2*-weighted gradient-echo echo-planar sequence (TR = 1480 ms, TE = 30 ms, flip angle 90°, bandwidth 1565 Hz/pixel, FOV = 230 mm, matrix = 128 × 128, slice thickness 4–5 mm, no gap). Fifty data sets of 20 images covering the whole brain were acquired at an interval equal to the repetition time for 1 min 21 s. Images were first acquired at baseline (five data sets) and then during the first pass of a bolus injection of GBCA (0.1 mmol/kg of Gd-DOTA (Dotarem, Guerbet, France)) administered via a 22-gauge to 20-gauge peripheral venous catheter with a power injector at 3 mL/s, followed by a saline flush at the same rate. In young patients with a thinner catheter (24-gauge), the injection was performed manually, as fast as feasible.

MR data post-processing

All measurements were performed by the same operator (MK), who was blinded to histological data at the time of analysis. The quality of the arterial input function and the regions of interest (ROIs) were also reviewed for the present study by another author (NG), and a consensus was reached to record the lowest ADC on DWI and the highest CBV in enhancing tumors, to minimize the possibility of bias.

MRS data

After zero filling and exponential filtering, PRESS data were fitted in the time domain using a metabolic database derived from MRS acquisitions of aqueous model solutions of pure metabolites, with the AMARES-MRUI FORTRAN code included in in-house software developed in the IDL environment (Interactive Data Language, Research Systems Inc., Boulder, CO, USA) [15, 16]. Metabolite signal concentrations for short TE (Cho (choline), NAA (N-acetylaspartate), Cr (creatine), mIns (myo-inositol), Glx (glutamine plus glutamate), CH2, CH3, lipids (sum of CH2 and CH3), acetate, aspartate, scyllo-inositol/taurine, glucose, sum of metabolites) and intermediate TE (Cho, NAA, Cr, lactate, sum of metabolites) were measured, normalized to water intensity at intermediate TE, and expressed in arbitrary units. Metabolite signal concentrations were also normalized to the sum of metabolites (S) at each TE. Numerous metabolite ratios were also analyzed. In total, 69 MRS parameters were used for the statistical analysis.
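The two normalizations described above can be illustrated with a minimal sketch. It assumes that the sum of metabolites S is NAA + Cr + Cho, as defined in the caption of Fig. 1; the function name, the dictionary layout and the numerical values are hypothetical and are not taken from the in-house IDL software.

```python
def normalize_metabolites(signals, water_signal, sum_keys=("NAA", "Cr", "Cho")):
    """Normalize fitted metabolite amplitudes to water and to the metabolite sum S.

    `signals` maps metabolite names to fitted amplitudes (arbitrary units).
    `sum_keys` lists the metabolites that make up S (assumed NAA + Cr + Cho).
    """
    if water_signal <= 0:
        raise ValueError("water reference must be positive")
    s = sum(signals[k] for k in sum_keys)
    if s <= 0:
        raise ValueError("metabolite sum must be positive")
    to_water = {name: value / water_signal for name, value in signals.items()}
    to_sum = {name: value / s for name, value in signals.items()}
    return to_water, to_sum


# Hypothetical fitted amplitudes at TE 135 ms (arbitrary units).
te135 = {"Cho": 3.0, "NAA": 1.0, "Cr": 2.0, "Lac": 0.5}
to_water, to_sum = normalize_metabolites(te135, water_signal=1000.0)

# A metabolite ratio such as Cho/NAA needs no reference signal.
cho_over_naa = te135["Cho"] / te135["NAA"]
```

In this illustration, Cho/S is the share of choline in the three-metabolite sum, whereas Cho/H2O depends on the unsuppressed water signal of the same voxel.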

DWI and perfusion data

DWI and PWI raw data were transferred to a workstation and analyzed with a dedicated software package (Oleasphere v2.2, OLEA Medical, La Ciotat, France) using a co-registration with structural MRI.

DWI data

Five to ten circular ROIs measuring at least 20 mm² were manually drawn on ADC maps within the solid tumor areas with the lowest signal intensity. Care was taken to avoid necrosis, hemorrhage and calcifications by examining all other MR sequences. The lowest ADC value (ADCmin) of the tumor was then recorded. Normalization was obtained with a control ROI placed in the normal-appearing contralateral cerebellar grey matter for posterior fossa tumors and in the normal-appearing contralateral white matter for supratentorial tumors. ADC tumor-to-normal-brain ratios (rADC) were calculated.
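As a simple illustration of the two DWI parameters, the sketch below takes the mean ADC of each tumor ROI, keeps the lowest one as ADCmin, and divides it by the mean ADC of the control ROI to obtain rADC. The function names and the ADC values are hypothetical; the measurements in this study were made in the Olea software, not with this code.

```python
def adc_min(tumor_roi_means):
    """Return the lowest mean ADC among the tumor ROIs (ADCmin)."""
    if not tumor_roi_means:
        raise ValueError("at least one tumor ROI is required")
    return min(tumor_roi_means)


def relative_adc(tumor_roi_means, control_roi_mean):
    """Return rADC, the ratio of ADCmin to the ADC of the control ROI."""
    if control_roi_mean <= 0:
        raise ValueError("control ADC must be positive")
    return adc_min(tumor_roi_means) / control_roi_mean


# Hypothetical mean ADC values per ROI, in 10^-3 mm^2/s.
tumor_rois = [0.82, 0.74, 0.91, 0.69, 0.88]
control_roi = 0.80

lowest = adc_min(tumor_rois)
radc = relative_adc(tumor_rois, control_roi)
```

Here the ROI with a mean ADC of 0.69 defines ADCmin, so rADC is 0.69 / 0.80.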

Perfusion data

The arterial input function (AIF) was generated automatically by the OLEA software, in most cases through automatic selection of pixels within arteries. The software used the oSVD (oscillatory Singular Value Decomposition) deconvolution method proposed by Wu et al. [17] because it is delay-insensitive and semi-adaptive. It provides robust and accurate estimates and performs better than any other existing SVD-based deconvolution method. The whole concentration–time curve was integrated from the arrival of the bolus until the end of the acquisition, which includes recirculation; when present, recirculation is visible in the AIF signal as well as in the signal of each voxel.

CBV maps were determined by numerically integrating the area under the contrast agent concentration–time curve. The CBF, TTP, MTT, K2 and Tmax maps were automatically generated by the OLEA software. Leakage was quantified by the K2 leakage coefficient, thus allowing the calculation of corrected CBV. For enhancing tumors, five to ten circular ROIs of at least 20 mm² were manually placed in tumor areas of increased CBV and matched to contrast enhancement. For non-enhancing tumors, the ROIs were placed in the solid portions of the tumor. The highest CBV value (CBVmax) was recorded. Great care was taken in placing the ROIs by overlaying different sequences on CBV maps in order to avoid hemorrhage and calcifications (precontrast T1-weighted and b0 images), necrosis and cysts (FLAIR sequence) and intratumoral vessels (post-contrast T1-weighted images, unprocessed perfusion images, MTT and time MIP (tMIP) maps) (Supplemental material 1). The control ROI was placed in the contralateral cerebellar grey matter for posterior fossa tumors, because perfusion measures were reported to be less variable there than in the cerebellar white matter [18], and in the normal-appearing contralateral white matter for supratentorial tumors. Tumor-to-brain ratios were obtained for all perfusion parameters (CBV, corrected CBV, CBF, TTP, MTT and Tmax). K2 values were not normalized because the control K2 values were close to zero, as previously reported [19]. Only positive K2 values, which are widely accepted to represent leakage, were taken into account.
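The numerical integration and the tumor-to-brain ratio can be sketched as follows. This is an illustration only: it assumes the trapezoidal rule, concentration–time curves that already start at bolus arrival, and no leakage correction, none of which is confirmed for the OLEA implementation. The function names and curve values are hypothetical.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need two or more samples of equal length")
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area


def relative_cbv(times, tumor_curve, control_curve):
    """Ratio of the tumor curve area to the control curve area (rCBV)."""
    control_area = trapezoid_area(times, control_curve)
    if control_area <= 0:
        raise ValueError("control area must be positive")
    return trapezoid_area(times, tumor_curve) / control_area


# Hypothetical concentration-time curves sampled once per TR (1.48 s),
# starting at bolus arrival; values are in arbitrary units.
TR = 1.48
times = [i * TR for i in range(6)]
tumor_curve = [0.0, 1.0, 3.0, 2.0, 1.0, 0.5]
control_curve = [0.0, 0.5, 1.5, 1.0, 0.5, 0.25]

tumor_area = trapezoid_area(times, tumor_curve)
rcbv = relative_cbv(times, tumor_curve, control_curve)
```

Because the hypothetical tumor curve is twice the control curve at every time point, the ratio is 2 regardless of the sampling interval, which shows why the normalized value is less sensitive to acquisition settings than the raw area.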

Statistical analysis

An automatic brain tumor classification was used based on the DWI, PWI, MRS techniques and a machine learning technique [20]. Statistical analysis was performed using the free computing environment R (R Development Core Team 2014). A total of 84 parameters were included as input data (two from DWI, 13 from PWI, and 69 from MRS) to perform multivariate statistical analysis.

We used the random forest approach for classification [21], which is accepted as a good predictive model in supervised machine learning. A random forest for classification is an ensemble method that combines several decision trees. A decision tree is a statistical model based on several binary decision rules of the form X < s, where X is any explanatory factor such as ADC or rADC, and s is a threshold on that factor. The data set at hand belongs to the root node of the tree. It is split into two subsets (new nodes) using a binary decision rule such that the two resulting subsets are the most homogeneous with respect to the output variable (grade or type of tumor). The best split for each subset is selected among all possible splits over mtry = √84 parameters selected at random among the 84 parameters. The splitting process is repeated until very few observations are left in each of the resulting subsets (leaves of the tree). As a classifier, a decision tree can assign a class (a type or a grade) to every new patient.
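The search for one binary rule X < s can be sketched as below. The sketch assumes the Gini index as the measure of homogeneity and midpoints between consecutive observed values as candidate thresholds; both are common choices but are assumptions here, as are the function names and the toy data. It also assumes that √84 is rounded down, giving 9 candidate parameters per split.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, candidate_features):
    """Return (feature, threshold, impurity) of the best rule `feature < threshold`.

    `rows` is a list of dicts mapping feature names to numeric values.
    The impurity is the size-weighted Gini impurity of the two subsets.
    """
    best = None
    n = len(rows)
    for feature in candidate_features:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] < threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Number of parameters tried at each split, assuming sqrt(84) is rounded down.
mtry = math.isqrt(84)

# Hypothetical toy data: two parameters, two grades.
rows = [
    {"ADC": 1.6, "rCBV": 1.0},
    {"ADC": 1.4, "rCBV": 2.5},
    {"ADC": 0.7, "rCBV": 1.2},
    {"ADC": 0.6, "rCBV": 2.8},
]
labels = ["I", "I", "IV", "IV"]
feature, threshold, impurity = best_split(rows, labels, ["ADC", "rCBV"])
```

On these toy data the rule on ADC separates the two grades completely, whereas no threshold on rCBV does; in a forest, the list of candidate features passed to `best_split` would be a fresh random draw of mtry parameters at every node.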

Each of the ntree = 500 trees is trained on a bootstrap sample. A bootstrap sample contains n observations drawn at random with replacement from the original data set. The final classifier defined by the random forest assigns a class to a new patient by applying a majority vote over the decisions given by the trees in the forest.
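Bootstrap sampling and the majority vote are simple enough to sketch directly. The function names below are hypothetical, and the tree predictions are invented for illustration; the sketch does not train any tree.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices at random with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)

# One bootstrap sample for a data set of 76 patients.
sample = bootstrap_indices(76, rng)

# Hypothetical grades predicted by five trees for one new patient.
tree_predictions = ["I", "IV", "I", "II", "I"]
forest_prediction = majority_vote(tree_predictions)
```

Because the draw is made with replacement, some patients appear several times in a bootstrap sample and others not at all, so each tree sees a slightly different data set.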

For both grade and histological types, models were developed to identify which combination of parameters (among DWI, MRS, PWI) gave the best diagnostic accuracy. Seven models were created: M1 (including only DWI parameters), M2 (PWI, including all 13 PWI parameters), M3 (MRS, including all 69 MRS parameters), M4 (DWI + PWI), M5 (DWI + MRS), M6 (MRS + PWI) and M7 (DWI + PWI + MRS).
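The seven models correspond to all non-empty combinations of the three modalities. The short sketch below enumerates them together with the number of input parameters each one uses, based on the counts given above (two for DWI, 13 for PWI, 69 for MRS). The function name is hypothetical.

```python
from itertools import combinations

# Number of parameters contributed by each modality, as stated in the text.
PARAM_COUNTS = {"DWI": 2, "PWI": 13, "MRS": 69}


def enumerate_models(counts):
    """List every non-empty combination of modalities with its parameter count."""
    names = list(counts)
    models = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            models.append((combo, sum(counts[name] for name in combo)))
    return models


models = enumerate_models(PARAM_COUNTS)
for number, (combo, n_params) in enumerate(models, start=1):
    print(f"M{number}: {' + '.join(combo)} ({n_params} parameters)")
```

With this ordering the enumeration reproduces M1 to M7 as defined above, from two parameters for M1 to all 84 for M7.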

The models were fitted on two-thirds of the data set (training sample) and tested on the remainder (test sample). Training and test samples were drawn randomly from the original data set with stratification. The predictive accuracy (100 − predictive error rate, in %) was calculated as the average accuracy obtained from 1000 such test samples. The seven models were fitted separately to predict grades and histological types.
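The repeated stratified hold-out procedure can be sketched as follows. The sketch uses the grade distribution of this cohort but replaces the random forest with a trivial classifier that always predicts the most frequent training grade, so that the code stays self-contained; the function names and the rounding of class sizes are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction, rng):
    """Split indices so that each class keeps about the same share in both parts."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def mean_holdout_accuracy(labels, fit_predict, n_repeats=1000, seed=0):
    """Average test accuracy (in %) over repeated stratified 2/3 vs 1/3 splits.

    `fit_predict(train, test)` must return one predicted label per test index.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        train, test = stratified_split(labels, 2 / 3, rng)
        predicted = fit_predict(train, test)
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        total += 100.0 * correct / len(test)
    return total / n_repeats


# Grade distribution of the cohort: 38, 6, 12 and 20 tumors of grades I to IV.
grades = ["I"] * 38 + ["II"] * 6 + ["III"] * 12 + ["IV"] * 20


def majority_baseline(train, test):
    """Stand-in classifier: always predict the most frequent training grade."""
    most_common = Counter(grades[i] for i in train).most_common(1)[0][0]
    return [most_common] * len(test)


baseline_accuracy = mean_holdout_accuracy(grades, majority_baseline)
```

With this grade distribution the stand-in classifier reaches 50 % on every split, since grade I accounts for half of each stratified test sample; this gives a reference level against which the grading accuracies of the seven models can be read.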

The MR parameters that contributed to accurate tumor grading and typing were subsequently identified in decreasing order of importance. As such it was possible to determine the subset of MR parameters giving the highest predictive accuracy with regard to grading and typing.

Results

The diagnostic accuracy of each modality or combined modalities for tumor grading and typing is shown in Table 2. Models M4–M7 combining two or more imaging modalities performed better than those using one imaging modality separately.

Table 2 Predictive diagnostic accuracy of models M1–M7 for the assessment of tumour grading and typing

Concerning tumor grading, the highest accuracy was obtained with the M4 model (DWI + PWI, 73.24 %) followed closely by model M7 (DWI + PWI + MRS, 72.9 %) and model M5 (DWI + MRS, 71.88 %).

Concerning tumor typing, the highest accuracy was obtained with the M7 model (PWI, DWI and MRS, 55.76 %), followed by model M5 (DWI + MRS, 54.78 %), model M4 (DWI + PWI, 53.01 %) and model M6 (PWI + MRS, 52.52 %).

In general, the predictive models performed less well for tumor typing than for tumor grading.

The best predictive model for the determination of tumor grade (model M4 combining DWI + PWI) revealed high diagnostic accuracy for grades I (88 %) and IV (86.62 %) but low accuracy for grades II (20.1 %) and III (36.16 %).

The best-performing predictive model for the determination of tumor type (model M7 combining DWI, PWI and MRS) (Table 3) revealed high diagnostic accuracy for categories H (embryonal tumors) (90 %) and A (pilocytic astrocytomas) (83.21 %). Predictive diagnostic accuracy was near 50 % for groups C (DNET) and F (ependymomas) and extremely low for the remaining tumor categories.

Table 3 Predictive diagnostic accuracy of model M7 (combining DWI, PWI and MRS) with regard to tumor typing

The 30 most powerful parameters for classifying the different tumor grades and types are given in Fig. 1 in decreasing order of importance. Figure 1 shows the variable importance as computed by the random forest. It corresponds to the mean decrease in accuracy (x-axis) induced in the random forest model when the corresponding variable is perturbed (its values in the test samples are randomly permuted).

Fig. 1 Importance of each parameter as a mean contribution to the accuracy of the global random forest classifier over test samples. Note: S = (NAA + Cr + Cho)135; S1 = (NAA + Cr + Cho)30; S2 = (S1 + Myo_3.56)30. L lactate, Scytau scyllo-inositol/taurine, SL S + L, 30 TE 30, 135 TE 135
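The mean decrease in accuracy can be illustrated with a minimal permutation sketch. It assumes an already fitted classifier, represented here by a hypothetical one-rule function on ADC, and averages the accuracy drop over several random permutations of one parameter; the function names, the threshold and the data are invented for illustration and do not reproduce the random forest implementation.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier returns the true label."""
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, n_repeats, rng):
    """Mean decrease in accuracy when the values of `feature` are shuffled."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [dict(row, **{feature: v}) for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def predict(row):
    """Hypothetical fitted classifier: one rule on ADC only."""
    return "IV" if row["ADC"] < 1.0 else "I"


# Hypothetical test sample; "noise" is a parameter the classifier ignores.
rows = [
    {"ADC": 0.6, "noise": 3.0},
    {"ADC": 0.7, "noise": 1.0},
    {"ADC": 1.4, "noise": 2.0},
    {"ADC": 1.6, "noise": 4.0},
]
labels = ["IV", "IV", "I", "I"]

rng = random.Random(0)
importance_adc = permutation_importance(predict, rows, labels, "ADC", 200, rng)
importance_noise = permutation_importance(predict, rows, labels, "noise", 200, rng)
```

A parameter that the classifier never uses keeps the accuracy unchanged when permuted and therefore has zero importance, whereas permuting a parameter on which the decision rules rely lowers the accuracy.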

ADC and rADC were the most powerful parameters for classifying the different tumor grades and types, followed by the choline level at intermediate TE. For tumor grading, additional parameters were Cho/H2O and Cho/NAA at short TE, Cho/S at intermediate TE and CBV. For tumor typing, the additional parameters were the sum of metabolites and lactate at intermediate TE, Cho/H2O and Cho/S at short TE and rTmax. Supplemental material 2 shows that, when parameters were added sequentially, the best diagnostic performance for tumor grading was obtained with the 18 most powerful parameters (76 % accuracy) and for tumor typing with the 16 best-performing parameters (62 % accuracy).

Discussion

Many pediatric brain tumors, mostly of the posterior fossa, have been studied using DWI [8, 9, 22, 23]. Reports using MRS [24, 25], DWI and MRS [12, 26], PWI and DWI [18] or the combination of these three techniques [4] are rare in the pediatric population. The present retrospective study consisted of a preoperative multiparametric MR assessment of childhood brain tumors that identified the best imaging protocol and the most powerful parameters for categorizing the grade and type of a variety of supra- and infratentorial tumors. Histopathological diagnosis is usually based on cell proliferation, high cell density and high vascularity, which are evaluated respectively by the choline level, the ADC, and the parameters from PWI, particularly CBV. However, pediatric brain tumors are heterogeneous and have specific histologic and imaging characteristics. Therefore, many of the assumptions made in adults regarding grade versus enhancement and choline level do not apply in children, e.g. in pilocytic astrocytoma [27].

Statistical analysis

Machine learning methods are modern statistical modeling tools that employ new estimation algorithms and new approaches, which may require intensive computation. These techniques are widely used for statistical modeling, regression, classification and prediction. Decision trees are among the fundamental methods in machine learning. Such approaches are used in all scientific domains including medicine, and many developments have been achieved, for instance in the context of bioinformatics.

The use of machine learning approaches is particularly appealing for high-dimensional problems (where the sample size is very small compared to the number of variables, i.e. the dimension of the data). Another point of interest is the selection of important features to aid classification and prediction. These are situations where classical models are not satisfactory because they are either parametric (making distributional assumptions, mainly Gaussian) or linear (like linear discriminant analysis), and are unable to provide good estimates in high dimensions, as they need large sample sizes that are often not available in practice.

Classification methods for pediatric brain tumors have previously been used with MRS [10, 24, 25, 28] and DWI + MRS [12], with linear discriminant analysis. The present study evaluated advanced techniques and multiple parameters with a very sophisticated analysis and extremely narrow confidence intervals, and will therefore report a lower mean accuracy for particular parameters. However, this should not be regarded as a poor result but rather as a much more precise evaluation of a technique.

Diagnostic accuracy of MR modalities and optimal MR protocol

The optimal brain imaging protocol to best determine tumor grade was obtained by combining DWI and PWI, with a predictive diagnostic accuracy of 73.24 %, while tumor types were better evaluated with the combination of DWI, PWI and MRS, although with a lower performance (55.76 %). These results also showed that the diagnostic performance for tumor grading was not highly affected when using DWI and MRS instead of DWI and PWI (diagnostic accuracy, 71.88 vs 73.24 %). In a similar fashion, the diagnostic performance in tumor type assessment was slightly lower when combining DWI and PWI or DWI and MRS compared to the three MR modalities (53.01 or 54.78 % vs 55.76 %). Therefore, DWI combined with either PWI or MRS can be performed interchangeably if necessary because the diagnostic efficacy of these models is quite similar for tumor grading and typing.

The only prior study of pediatric supra- and infratentorial brain tumors using multiparametric MR modalities showed statistical differences for choline and lipids between high- and low-grade tumors [4]. High accuracy (98 %) was obtained for cerebellar tumour assessment by MRS [24] with short and intermediate TE and using 20 metabolites to distinguish the three most frequent tumors (medulloblastoma, ependymoma, astrocytoma) [25]. The accuracy of MRS for tumor typing as a single modality was lower in our results (53.01 %) compared to the aforementioned study: this could be related to a wider range of histological types or a difference in post-processing of MRS data. Combined DWI and MRS data was also reported in posterior fossa tumors with complete separation of all tumor groups by linear discriminant analysis and six metabolites [12]. However, in our study the model combining DWI and MRS showed a low diagnostic accuracy (54.78 %), most likely due to the pooling of tumors irrespective of tumor location.

The diagnostic performance of advanced MRI techniques for each tumor grade and type is also crucial to enhance the presurgical diagnostic confidence level. Good diagnostic accuracy was found with the diagnosis of grade I and IV tumors (88 and 86.62 %), while poor accuracy was present with the evaluation of grade II and III tumors (20.51 and 36.16 %). The most likely explanation could lie in the intrinsic histological heterogeneity of brain tumors, which have histological similarities such as in grades II and III, and which are therefore difficult to distinguish from each other, particularly ependymomas and gliomas. High diagnostic accuracy was present with the evaluation of embryonal tumors (83.2 %) and pilocytic astrocytomas (89.15 %), while a low accuracy, although above 50 %, was obtained when evaluating DNET and ependymomas.

Performance of MR parameters

ADC and rADC were the most effective parameters to predict tumor grade and type. This concurs with previous studies that showed a strong negative correlation between ADC and cell density and grade [29, 30]. Recent paediatric studies showed that ADC alone in combination with conventional MRI could distinguish high- and low-grade tumours with high sensitivity and specificity, with a cut-off ADC minimum value distinguishing the two groups [11]. However, ADC taken alone for grading and typing tumours resulted in low accuracy in this study (62 and 42.49 %, respectively) because tumours were not pooled into low- and high-grade groups but were separated into grades I–IV. Although previous reports have shown that ADC can distinguish most tumour types specifically in the posterior fossa with high accuracy [8, 9, 22, 23], ADC is in fact limited and we cannot rely on a cut-off value due to the histological variability of ependymomas and medulloblastomas [11].

Choline at intermediate TE was the second most important parameter for grading and typing. A positive correlation between total choline and tumour grade has been demonstrated specifically for pilocytic astrocytoma and medulloblastoma of the posterior fossa [10, 12, 25, 28]. An elevated choline peak is known to be related to increased membrane turnover, cellularity and tumour growth [4], with the highest values seen in medulloblastoma and choroid plexus carcinoma [10]. Surprisingly, lipids were not an accurate MRS parameter in our study for grading, as has been previously reported [10].

PWI has become an essential tool to evaluate brain tumors. It is generally recognized that intratumoral CBV correlates with tumor vascularity, with the highest CBV values in high-grade tumors [5]. However, the present results showed that CBV was less powerful than ADC and choline for grading tumors and particularly for typing tumors. This may relate to the fact that childhood tumors differ from adult brain tumors not only pathologically but genetically as well. Indeed, increased cell density outweighs high vascularity in high-grade tumors such as embryonal tumors, which are more frequent in children than glioblastoma. A second probable explanation is that CBV values overlap between the different grades and types. A final reason may be related to the ROI placement technique. Vessels were excluded by using several maps (unprocessed perfusion images, tMIP and MTT) to assess tissue microvasculature more precisely and to avoid overestimation of CBV and overgrading. Our results also showed that rTmax is more powerful than CBV for tumor typing (see Fig. 1b), suggesting that some pediatric brain tumors behave similarly to ischemic tissue. The highest Tmax values were found in categories A (pilocytic astrocytomas) and B (gangliogliomas and pleomorphic xanthoastrocytomas). This is probably linked to hyalinized vascular sclerosis and infarct-associated necrosis in pilocytic astrocytomas, and to perivascular inflammatory cells in pilocytic astrocytomas and gangliogliomas (Prayson).

Finally, this study also showed that the combination of the best MR parameters was more accurate than combined MR modalities, particularly for tumor grading (over 75 %) and typing (although below 75 %), clearly indicating the need for the three MR modalities and their associated parameters.

Limitations of the study

The main limitations of this study, apart from its retrospective nature, are the small sample size for grade II tumors and the large number of tumor types, with a predominance of grades I and IV and types A and H. However, this corresponds to the usual distribution frequency of pediatric brain tumors [7].

Other limitations are technical and relate to the recent recommendations for DSC [13]: a GRE-EPI sequence with a high flip angle of 90° was used for DSC-MRI without a preload of GBCA, which could have affected CBV values. In addition, the baseline measurements were not as numerous as recommended, with a consequent decrease in signal-to-noise ratio.

Conclusion

The purpose of this study was to evaluate the accuracy of several MR techniques and derived parameters used by neuroradiologists to support the diagnosis. The reported accuracies were estimated in a very robust and unbiased way (random forests). Advanced MR techniques can be very accurate in identifying grade I and IV tumors in children, as well as in typing pilocytic astrocytomas and embryonal tumors, when the proper parameters are used.