1 Introduction

According to the Central Brain Tumor Registry of the United States (CBTRUS), diffuse infiltrating gliomas represent roughly a quarter of primary brain and central nervous system tumors in adults. This family of diseases is classified by way of histological and molecular criteria defined by the World Health Organization (WHO) as grades II, III, or IV, in order of increasing malignancy (Louis et al. 2016). The median survival for patients with grade IV glioma, or glioblastoma (GBM), is 14 months with standard treatment (Krex et al. 2007). Although patients with low-grade gliomas (LGG, grade II) carry a much better prognosis, their disease often recurs as grade III or IV after having undergone transformation to a more malignant phenotype (Rees et al. 2009).

Magnetic resonance imaging (MRI) is the clinical gold standard for diagnostic evaluation of glioma, but does not provide information regarding physiology. Integrating 1H magnetic resonance spectroscopic imaging (MRSI) with conventional imaging allows for further characterization of the tumor vis-à-vis metabolism (McKnight et al. 2001; Ullrich et al. 2008; Zhu and Barker 2011; Soher et al. 1996; Christiansen et al. 1993; Posse et al. 2013; Nelson 2011). Maps of the levels of total choline (tCho), creatine (Cr), N-acetylaspartate (NAA) and lactate/lipid may be obtained from long echo time (TE) MRSI data to provide an improved definition of the extent of tumor (Nelson 2004). However, evaluating treatment effects and assessing malignant progression likely requires additional information. Advanced MRSI techniques, including spectral editing and the use of short TE acquisitions, have the potential to monitor changes in the levels of brain metabolites with lower signal-to-noise ratios (SNR), at the cost of more challenging acquisitions that require careful optimization.

The first step in choosing the most appropriate acquisition for detecting changes in tumor is to identify those metabolites that are likely to provide sufficient discriminative information between grades, and between newly diagnosed, stable, and transformed lesions. The high SNR and superior spectral resolution that can be obtained using ex vivo high-resolution magic angle spinning (HR-MAS) spectroscopy provide an opportunity to identify promising candidates.

The aim of this study was to develop an automated algorithm for a large, retrospective database of HR-MAS spectra to classify grade II, III and IV glioma and to identify differences between newly diagnosed, stable and transformed lesions. Once representative spectral profiles had been identified for each sub-group, regions of the spectrum that provided the highest information content for classifying the samples were determined. This allowed for the unbiased identification of metabolites that were important for discriminating different sub-groups and were therefore likely to provide information about their biological properties.

2 Methods

2.1 Patient population

This study was approved by the Institutional Review Board (IRB) at UCSF and informed consent was obtained from each participating subject. Based on pre-surgical MRI parameters, image-guided tissue samples (n = 237) were collected from 152 patients with suspected new or recurrent gliomas, as summarized by Table 1.

Table 1 Summary of tissue samples and patient population

2.2 Image-guided tissue sampling

The pre-surgery MR data were acquired by previously published methods (Elkhaled et al. 2012) and used to plan the tissue sample locations prior to resection. The criteria for defining suspected tumor regions were based on decreased apparent diffusion coefficient (ADC) values derived from diffusion-weighted imaging and/or an elevated choline-to-N-acetylaspartate index (CNI) from MRSI (McKnight et al. 2001). Samples were defined as spherical targets with a 5-mm diameter on co-registered MR images and displayed using surgical navigation software (BrainLAB Inc., USA). The planned targets were viewed in the operating room and samples were only obtained if the tissue could be safely accessed by the surgeon. Once removed, part of each sample was snap-frozen in liquid nitrogen and stored for 1H HR-MAS spectroscopy and the other part was fixed for subsequent histological analysis.

2.3 HR-MAS

Experimental protocols for acquiring HR-MAS data have been described previously (Elkhaled et al. 2012). Briefly, the primary steps included the loading of the tissue samples (mean weight 9.56 mg) into a 35 µL zirconia rotor (Varian) with 3 µL of 99.9% atom-D deuterium oxide containing 0.75 wt % 3-(trimethylsilyl)propionic acid (Sigma–Aldrich) for chemical shift referencing. Data were acquired at 11.7 T, 1 °C, 2250 Hz spin rate in a 4-mm gHX nanoprobe with a Varian INOVA 500 MHz multinuclear spectrometer equipped with a magic angle gradient coil.

A rotor-synchronized 1D CPMG pulse sequence was used with the following parameters: TR/TE = 4 s/144 ms, 512 scans, 40,000 acquired points, 20 kHz spectral width, 2 s water saturation and 2 s readout acquisition time, for a total time of 35 min. The 90° pulse was calibrated independently for each sample at 56 dB. The Electronic Reference To access In vivo Concentrations (ERETIC) method was used to generate an artificial electronic signal that served as an external standard for comparison of metabolite levels (Albers et al. 2009). Data from individual samples were normalized by the ERETIC signal area and tissue weight.
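The normalization described above can be illustrated with a minimal sketch. It assumes that "normalized by the ERETIC signal area and tissue weight" means dividing each spectral point by both quantities; the function name and the numerical values are hypothetical and are not taken from the study.

```python
def normalize_spectrum(intensities, eretic_area, tissue_weight_mg):
    """Scale a spectrum by the ERETIC peak area and the tissue weight.

    Assumption: normalization is a division by both quantities.
    """
    if eretic_area <= 0 or tissue_weight_mg <= 0:
        raise ValueError("ERETIC area and tissue weight must be positive")
    scale = 1.0 / (eretic_area * tissue_weight_mg)
    return [value * scale for value in intensities]


# Hypothetical intensities and ERETIC area; 9.56 mg is the mean sample weight.
raw = [120.0, 480.0, 60.0]
normalized = normalize_spectrum(raw, eretic_area=50.0, tissue_weight_mg=9.56)
```

Because every point is multiplied by the same factor, the relative peak heights within a spectrum are unchanged; only the comparison between samples is affected.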

Spectra were only included in the analysis if the SNR of any of the choline species centered around 3.2 ppm was greater than ten. Due to potential anaerobic metabolism during tissue sample removal, the region around the lactate peak was not considered in the subsequent analysis. Spectra were first phase corrected and their intensities normalized based upon the weight of the sample and the weight of the internal reference (trimethylsilylpropionic acid, TSP). As a final step, the natural logarithm of the signal intensity was calculated to homogenize the range of peak amplitudes and avoid biasing the OPLS-DA algorithms (Worley and Powers 2013). Points in the spectrum that lay between 0.8 and 4.5 ppm were included in the analysis.
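The inclusion criterion and the pre-processing steps above could be sketched as follows. Several details are assumptions made for illustration only: SNR is taken as peak height divided by the noise standard deviation, the lactate exclusion window (1.25 to 1.40 ppm) is a hypothetical choice around the 1.33 ppm doublet, the 0.8 and 4.5 ppm limits are treated as inclusive, and a small positive floor is applied before the logarithm so that non-positive points do not raise an error.

```python
import math

PPM_MIN, PPM_MAX = 0.8, 4.5        # analysis window stated in the text
LACTATE_WINDOW = (1.25, 1.40)      # hypothetical exclusion bounds (assumption)
CHOLINE_SNR_THRESHOLD = 10.0       # inclusion criterion stated in the text


def passes_snr_check(choline_peak_height, noise_sd):
    """Return True if the choline SNR (height / noise SD, an assumption) exceeds ten."""
    return noise_sd > 0 and choline_peak_height / noise_sd > CHOLINE_SNR_THRESHOLD


def preprocess(ppm, intensity, floor=1e-6):
    """Keep points in the analysis window, drop the lactate region, take ln."""
    kept_ppm, kept_log = [], []
    for shift, value in zip(ppm, intensity):
        if not (PPM_MIN <= shift <= PPM_MAX):
            continue  # outside the 0.8-4.5 ppm window
        if LACTATE_WINDOW[0] <= shift <= LACTATE_WINDOW[1]:
            continue  # lactate region is excluded from the analysis
        kept_ppm.append(shift)
        kept_log.append(math.log(max(value, floor)))  # floor avoids log(<=0)
    return kept_ppm, kept_log


# Toy spectrum with five points; only 0.9 and 3.2 ppm survive the filters.
ppm = [0.5, 0.9, 1.33, 3.2, 4.7]
intensity = [5.0, 2.0, 9.0, math.e, 1.0]
kept_ppm, kept_log = preprocess(ppm, intensity)
```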

2.4 Multivariate analysis—OPLS-DA

Partial least squares (PLS) is a family of regression methods in which dimensionality reduction is performed by computing linear combinations of the original variables, and a linear regression is then applied to the new variables (Blekherman et al. 2011). The linear predictor that best describes the response variables (Y) based on the predictor variables (X) is selected for further analysis. In some cases, the target vector (Y) is not continuous but discrete, e.g. when classifying grade II versus grade III glioma based on 1024 points from an HR-MAS spectrum. Here, the Y vector is typically set to 0 for grade II samples and 1 for grade III samples. This form of PLS is called PLS discriminant analysis, or PLS-DA.
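To make the 0/1 coding concrete, the following is a minimal sketch of a one-component PLS-DA fit in plain Python. It is not the ropls implementation used in the study: the function names, the 0.5 decision threshold and the toy data (three variables instead of 1024 spectral points) are assumptions chosen for illustration.

```python
import math


def fit_pls_da_one_component(X, labels, positive_class):
    """Fit a one-component PLS model to a 0/1 coded class vector."""
    n, p = len(X), len(X[0])
    y = [1.0 if label == positive_class else 0.0 for label in labels]
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    # Ordinary least-squares regression of y on the scores.
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "b": b}


def predict(model, row, threshold=0.5):
    """Return the continuous prediction and the thresholded class (0 or 1)."""
    t = sum((v - m) * w for v, m, w in zip(row, model["x_mean"], model["w"]))
    y_hat = model["y_mean"] + model["b"] * t
    return y_hat, int(y_hat >= threshold)


# Toy data: two grade II and two grade III samples (hypothetical values).
X = [[1.0, 0.2, 3.0], [1.2, 0.1, 3.1], [2.0, 0.3, 2.0], [2.2, 0.2, 1.9]]
labels = ["II", "II", "III", "III"]
model = fit_pls_da_one_component(X, labels, positive_class="III")
predicted = [predict(model, row)[1] for row in X]
```

The continuous prediction is not bounded to the interval from 0 to 1, which is why a threshold is needed to convert it into a class assignment.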

One important feature of PLS is that, in contrast to standard linear regression, it projects X onto a lower-dimensional subspace and thus removes correlations in the dataset. This allows it to perform much better when highly correlated data, e.g. NMR data, are considered (Blekherman et al. 2011). Even though PLS-DA allows for a better separation between classes in scores space, some residual variance that is not correlated to the output Y remains present in the model (Worley and Powers 2013), which can make the interpretation of performance and class separation difficult. OPLS-DA can be utilized to overcome this problem through orthogonal signal correction, making it possible to separate intra-class variations in X that are unrelated to Y (orthogonal components) from inter-class variations in X that are related to Y (Trygg and Wold 2002; Wiklund 2008). As described previously, the natural logarithm of the raw spectra was taken to decrease the range of signal amplitudes, followed by mean centering and scaling the transformed signal to unit variance (Thevenot et al. 2015).
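The scaling and orthogonal correction steps can be sketched as below. This is an illustrative plain-Python version of removing a single Y-orthogonal component for a single response vector, following the general scheme of Trygg and Wold (2002); it is not the ropls code, and the helper names and toy data are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def xt_times(X, v):
    """Compute X^T v for a row-major matrix X."""
    return [sum(X[i][j] * v[i] for i in range(len(X))) for j in range(len(X[0]))]


def autoscale(X):
    """Mean-center each column and scale it to unit (sample) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def remove_one_orthogonal_component(X, y):
    """Remove from centered X one component that is orthogonal to centered y."""
    w = xt_times(X, y)                      # predictive weight direction
    w_norm = math.sqrt(dot(w, w))
    w = [v / w_norm for v in w]
    t = [dot(row, w) for row in X]          # predictive scores
    p = [v / dot(t, t) for v in xt_times(X, t)]   # X loading
    wp = dot(w, p)
    w_orth = [pj - wp * wj for pj, wj in zip(p, w)]  # part of p orthogonal to w
    o_norm = math.sqrt(dot(w_orth, w_orth))
    w_orth = [v / o_norm for v in w_orth]
    t_orth = [dot(row, w_orth) for row in X]         # orthogonal scores
    p_orth = [v / dot(t_orth, t_orth) for v in xt_times(X, t_orth)]
    X_filtered = [[x - ti * pj for x, pj in zip(row, p_orth)]
                  for row, ti in zip(X, t_orth)]
    return X_filtered, t_orth


# Toy log-intensity matrix (hypothetical) with a centered 0/1 class vector.
raw = [[1.0, 0.2, 3.0], [1.2, 0.1, 3.1], [2.0, 0.3, 2.0], [2.2, 0.2, 1.9]]
X_scaled = autoscale(raw)
y_centered = [-0.5, -0.5, 0.5, 0.5]
X_filtered, t_orth = remove_one_orthogonal_component(X_scaled, y_centered)
```

By construction the orthogonal scores are uncorrelated with the class vector, so subtracting them removes variation from X without discarding class-related information.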

2.5 Validation

Each of the OPLS-DA models was cross-validated using a bootstrapping method. In this iterative process, the data matrix X was randomly sampled 300 times, with the spectra that were chosen being representative of both classes of data (Batista et al. 2004). Each sampled subset of data was used to train the classifier algorithm and was validated using the remaining data. After each validation, the values of model parameters (accuracy, specificity, sensitivity, positive predictive and negative predictive value, R2Y, Q2 and RMSEE) and the variable-importance-for-projection (VIP) were saved.
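A minimal sketch of this validation loop is given below. It assumes that sampling is done with replacement within each class (so that both classes are always represented) and that the spectra not drawn in an iteration form the validation set; the function names and toy labels are hypothetical, and the classifier itself is omitted.

```python
import random


def stratified_bootstrap_split(labels, rng):
    """Resample indices within each class; undrawn indices are held out."""
    train = []
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        # Sampling with replacement inside the class keeps both classes present.
        train.extend(rng.choice(members) for _ in members)
    drawn = set(train)
    held_out = [i for i in range(len(labels)) if i not in drawn]
    return train, held_out


def confusion_metrics(truth, predicted, positive):
    """Compute sensitivity, specificity, accuracy, PPV and NPV."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# 300 resampling iterations, as in the text, on hypothetical class labels.
labels = ["II"] * 6 + ["IV"] * 6
rng = random.Random(0)
splits = [stratified_bootstrap_split(labels, rng) for _ in range(300)]

# Metrics for one hypothetical set of held-out predictions.
example = confusion_metrics(
    truth=["IV", "IV", "IV", "II", "II"],
    predicted=["IV", "IV", "II", "II", "IV"],
    positive="IV",
)
```

In a full analysis, each training index list would be used to fit the model, each held-out list to compute the metrics, and the per-iteration values would then be summarized as mean ± standard deviation.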

2.6 Analysis software

The pre-processing, data analysis and visualization of the results described in this study were performed using the R scientific language within the RStudio IDE (R Version 3.2.2, RStudio Version 0.99.902). All multivariate analyses were performed using the package ‘ropls’ (Thevenot et al. 2015).

3 Results

A total of 237 spectra from 152 patients were considered in the analysis. The breakdown according to each histological sub-group is summarized in Table 1.

3.1 Classification of newly diagnosed glioma

The classification of newly diagnosed grade II versus IV glioma samples exhibited a high model performance, as shown in Table 2: sensitivity, 0.94; specificity, 0.96; accuracy, 0.95. Figure 1a depicts the difference in mean spectra for grade IV minus grade II glioma.
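A difference spectrum of this kind can be computed as the point-wise difference between two class means. The sketch below assumes that all spectra share the same frequency axis; whether the study computed the means on raw or log-transformed intensities is not stated, and the values shown are hypothetical.

```python
def mean_spectrum(spectra):
    """Point-wise mean of a list of spectra sampled on a common axis."""
    n = len(spectra)
    return [sum(s[k] for s in spectra) / n for k in range(len(spectra[0]))]


def difference_spectrum(group_a, group_b):
    """Mean of group_a minus mean of group_b at every spectral point."""
    mean_a, mean_b = mean_spectrum(group_a), mean_spectrum(group_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


# Two hypothetical two-point spectra per grade.
grade_iv = [[1.0, 4.0], [3.0, 2.0]]
grade_ii = [[2.0, 5.0], [4.0, 5.0]]
diff = difference_spectrum(grade_iv, grade_ii)  # negative: lower in grade IV
```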

Table 2 Model performance of classification between newly-diagnosed gliomas (mean ± standard deviation)
Fig. 1

Newly diagnosed glioma. Difference in mean spectra for grade IV minus grade II (a); grade IV minus grade III (b); grade III minus grade II (c). GPC glycerophosphocholine, 2HG 2-hydroxyglutarate, mI myoinositol, PCr phosphocreatine, hTau hypotaurine, Ala alanine, Glc glucose, Glx glutamine, glutamate, Lip lipids, tCho total choline, Asp aspartate, Val valine

The metabolites that were determined to possess the highest discriminative power in this model were 2-hydroxyglutarate (2HG), glycerophosphocholine (GPC), myo-inositol (MI), creatine/phosphocreatine (Cr/PCr), tCho, taurine (Tau) and valine (Val). Levels of these metabolites were lower for samples from grade IV versus grade II lesions.

The comparison between newly diagnosed grade III versus IV glioma (Fig. 1b, difference spectra) identified similar metabolites as being important, with levels of 2HG, PCr, MI, GPC and hypotaurine (hTau) being higher in grade III samples, and levels of Ala, lipid (Lip) and Glc being higher in grade IV samples. The classification performance was: sensitivity, 0.76; specificity, 0.97; accuracy, 0.86 (Table 2).

For newly diagnosed grade II versus III glioma (Fig. 1c, difference spectra), the most significant metabolites were 2HG, the sum of glutamine and glutamate (Glx), glycine (Gly), hTau, aspartate (Asp), tCho, Val and alanine (Ala). The levels of these metabolites were found to be increased in the grade III lesions. The classifier did not perform as well for this comparison: sensitivity, 0.85; specificity, 0.51; accuracy, 0.68 (Table 2).

3.2 Classification of newly diagnosed vs. recurrent transformed glioma

The classification between newly diagnosed grade III glioma and lesions that were originally grade II but had recurred as grade III(II→III) (Fig. 2a, difference spectra) demonstrated 67% sensitivity and 91% specificity (Table 3). The metabolites with the greatest contribution to the model were Glx and Ala, which were decreased in the recurrent lesions, and hTau and 2HG, which were elevated in the same lesions. The classification between newly diagnosed grade IV glioma and lesions that were originally grade II but had recurred as grade IV(II→IV) (Fig. 2b, difference spectra) performed with 99% sensitivity and 99% specificity (Table 3). The metabolites with the highest discriminative power were 2HG, MI, PCr, Tau and tCho, which were higher in the transformed lesions (grade IV(II→IV)).

Fig. 2

Newly diagnosed vs. transformed glioma. Difference in mean spectra for newly diagnosed grade III minus recurrent grade III(II→III) (a); and recurrent grade IV(II→IV) minus newly diagnosed grade IV (b). 2HG 2-hydroxyglutarate, mI myoinositol, PCr phosphocreatine, Tau taurine, hTau hypotaurine, Cho choline, Ala alanine, Glc glucose, Glx glutamine, glutamate

Table 3 Model performance for classification between newly-diagnosed and recurrent gliomas (mean ± standard deviation)

3.3 Classification of pathological grade for recurrent glioma

The model for classifying grade III(II→III) vs. grade IV(II→IV) lesions had a specificity of 96% and a sensitivity of 88% (Table 4). The main resonances contributing to the classification were GPC, phosphocholine (PC), glucose (Glc), Cr/PCr, hTau, 2HG and Lip (Fig. 3a, difference spectra). The levels of most metabolites were increased in the grade IV(II→IV) lesions, with the exception of Cr/PCr and GPC, which were decreased.

Table 4 Model performance for classification between recurrent glioma subtypes (mean ± standard deviation)
Fig. 3

Recurrent glioma. Difference in mean spectra for transformed grade IV(II→IV) minus transformed grade III(II→III) (a); transformed grade IV(II→IV) minus non-transformed grade II (b); and transformed grade III(II→III) minus non-transformed grade II (c). GPC glycerophosphocholine, PC phosphocholine, PCr phosphocreatine, 2HG 2-hydroxyglutarate, mI myoinositol, hTau hypotaurine, Ala alanine, Glc glucose, Glx glutamine, glutamate, Lip lipids, tCho total choline, Asp aspartate

The comparison of newly diagnosed grade II vs. grade IV(II→IV) achieved a classification accuracy of 88% with mean specificity of 99% and mean sensitivity of 76% (Table 4). The main metabolites contributing to the classification were PC, Glc, MI, hTau, 2HG, tCho, Asp and Lip (Fig. 3b, difference spectra). In all cases, the levels were higher in the recurrent grade IV lesions (p values < 0.0001).

The performance of the classifier was worse for distinguishing newly diagnosed grade II vs. grade III(II→III) (specificity of 44% and sensitivity of 68%, Table 4). Again, the prominent peaks (see Fig. 3c, difference spectra) were found to be Glc, MI, 2HG (not all peaks), hTau, tCho, Ala, Asp and Lip, all of which had elevated levels in the recurrent samples.

Figure 4 summarizes the differences in metabolomic profiles that were identified in the current analysis. The inner blocks denote the classification of the newly diagnosed grades II, III and IV and the outer blocks represent the changes that occurred upon recurrence.

Fig. 4

Summary of metabolites contributing the most information to classification of newly diagnosed (de novo) and recurrent glioma subtypes. GPC glycerophosphocholine, 2HG 2-hydroxyglutarate, mI myoinositol, PCr phosphocreatine, Tau taurine, hTau hypotaurine, Ala alanine, Glc glucose, Glx glutamine, glutamate, Lip lipids, tCho total choline, Asp aspartate, Val valine

4 Discussion

The automated algorithm that was developed and used to classify different subtypes of glioma was successful in identifying a large number of spectral differences and hence metabolic profiles that were associated with higher tumor grade or malignant progression. The PLS-DA method with orthogonal correction utilized in this work exhibited high model performance. These results were achieved because of the nature of the OPLS-DA method, which removes variations in the spectra that do not contribute to the class separation. This, combined with the dimensionality reduction of the spectral data, allowed for an easier interpretation of the results and a better identification of relevant spectral regions. Using the VIP score of the variables, it was possible to discern frequencies that contributed the most information to the models considered in this study. The accuracy of the models suggests that the evaluation of the combined intensity changes of those spectral regions can be used to create a reliable biomarker for predicting disease progression.
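For reference, the classical VIP definition for a PLS model can be sketched as follows. This is a generic illustration only: the VIP values reported by ropls for OPLS models may be computed with a different variant, and the weights and explained sums of squares below are hypothetical.

```python
import math


def vip_scores(weights, explained_ss):
    """Classical PLS VIP.

    weights: one weight vector per component (list of lists).
    explained_ss: sum of squares of Y explained by each component.
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w, ss in zip(weights, explained_ss):
            norm_sq = sum(v * v for v in w)
            acc += ss * (w[j] ** 2) / norm_sq  # squared normalized weight
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


# One component, four spectral points (hypothetical values).
vip = vip_scores(weights=[[0.8, 0.6, 0.0, 0.0]], explained_ss=[2.0])
important = [j for j, score in enumerate(vip) if score > 1.0]
```

With this definition the squared VIP values average to one across variables, which is the basis for the common rule of thumb that variables with VIP greater than one carry above-average information.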

It is important to note that the approach described here differed from the one taken in previous studies (Elkhaled et al. 2012; Wright et al. 2010), which first extracted information about specific metabolites by fitting the peaks in the spectrum. While this yielded meaningful results, it relied upon prior knowledge of which metabolites were included in the spectrum and was only able to consider estimates of parameters for which the Cramér-Rao bounds were sufficiently small. Using the entire spectrum for the classification step avoided the potential loss of information associated with leaving out key metabolites or discarding measurements where the metabolite signal had low intensity.

For the newly diagnosed subtypes, the data support findings from other types of cancer (Tessem et al. 2008) that indicate amino acid concentrations are increased in higher grades. This is likely owing to the increased nutrient demand of proliferative tumor that requires an increase in metabolic activity. One exception from the current study is the amino acid valine, which was also previously shown to decrease in tumor tissue (Lai et al. 2005). As has been observed in prior in vivo studies, the spectrum of newly diagnosed grade IV glioma also exhibited a higher concentration of lipid signals compared to the lower grade lesions. This, combined with the decrease in Cr/PCr, suggests that there are small regions of necrosis appearing in the higher grade lesions, presumably associated with hypoxia due to abnormal perfusion. The decrease in GPC from grades II and III to grade IV and in tCho from grade III to grade IV is consistent with there being changes in choline metabolism with more malignant behavior, which has been observed in other cancers.

Two other key metabolites that we found to be associated with differences in pathological grade were 2HG and MI. The recent finding that mutations in the IDH1 and IDH2 genes are associated with elevated 2HG highlights the link between molecular and metabolic findings in glioma (Elkhaled et al. 2012). Differences in 2HG observed in the current study are thus explained by the high prevalence of IDH mutations in grade II and grade III glioma (~ 80%) and their relative absence among primary GBM. Changes in the levels of MI between grades have also been observed previously (Castillo et al. 2000). Of interest is that in the current study these differences follow the same pattern as for 2HG (Fig. 4). It remains to be seen whether IDH wild-type grade II and III lesions also have lower MI, as there were not enough patients in this category for the current study.

For lesions that had undergone malignant progression, there was a consistent pattern of metabolite levels increasing with pathological grade. This may reflect both higher tumor cellularity and the distinct changes in choline metabolism associated with malignant progression. The decrease in Cr/PCr and the elevation in Lip in lesions that progressed to grade IV, relative to those that progressed to grade III, support the presence of necrosis in the secondary grade IV lesions, though it was not as pronounced as for newly diagnosed grade IV lesions. Future studies will consider whether there are differences in metabolic profiles for lesions with wild-type IDH or in association with other types of mutations.

5 Conclusion

This study demonstrated that the application of OPLS-DA to raw HR-MAS data can be used to build models for the classification of glioma pathological grades and the prediction of malignant progression. Many of the metabolites that distinguished tumor subtypes, such as 2HG, MI, choline, Cr and Lip, can be measured non-invasively using in vivo 1H magnetic resonance spectroscopy. Utilizing these markers in vivo would increase the accuracy of tissue characterization and also provide supplemental information for non-invasively monitoring response to therapy.