Introduction

Gliomas are the most common primary neoplasm of the central nervous system, and the prognosis for this disease remains very poor [1]. In determining a treatment plan, tumour grade is a key consideration for minimizing the risk of unnecessary morbidity and mortality [2].

Given the limitations of conventional MRI in distinguishing low-grade gliomas (LGGs) from high-grade gliomas (HGGs) (accuracy between 55 % and 83 %), advanced multiparametric magnetic resonance (MR) techniques have been explored, including diffusion-weighted imaging (DWI), diffusion tensor imaging (DTI), proton MR spectroscopy (MRS) and perfusion imaging [3]. Because of its safety and non-invasive nature, MRS is of great advantage in patients with brain gliomas [4, 5]. MRS provides information about metabolic tissue composition, and advanced spectroscopic methods have been used to quantify markers of tumour metabolism (e.g. glucose), membrane turnover and proliferation (e.g. choline [Cho]), energy homoeostasis (e.g. creatine [Cr]), intact glioneuronal structures (e.g. N-acetyl-aspartate [NAA]) and necrosis (e.g. lactate [Lac] or lipids) [6]. Results are usually expressed as ratios between cerebral metabolites rather than absolute concentrations [7].

Numerous studies have evaluated the diagnostic accuracy of 1H-MR spectroscopy for distinguishing high- and low-grade gliomas. A systematic literature review [8] was performed approximately 10 years ago in order to provide evidence, but the interpretation was limited by the small number of studies and small sample sizes. In the current work, we performed a meta-analysis to measure the diagnostic performance of MRS in differentiating high- from low-grade gliomas.

Materials and methods

Search strategy

The PubMed and Embase databases were systematically searched to find relevant published articles (through 27 March 2015). We used the following keywords: (“magnetic resonance spectroscopy” or “MR spectroscopy” or “MRS”) AND (glioma). Additionally, the reference lists of all retrieved articles were checked for other eligible reports that were not identified in the initial search.

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) MRS was used to differentiate HGGs from LGGs in patients with no clinical history of previous surgery, chemotherapy or radiotherapy; (2) the diagnostic criteria of glioma grading were clearly stated, and the number of LGGs and HGGs could be obtained; (3) at least one individual metabolite ratio was used to evaluate the grading of glioma; (4) values of true positive (TP), false positive (FP), false negative (FN), true negative (TN), sensitivity (SEN), specificity (SPE), positive likelihood ratio (LR+) and negative likelihood ratio (LR−) could be accurately calculated from the data reported; (5) at least seven patients were included; (6) pathology and/or clinical follow-up were used as the reference standard; (7) no data were overlapping; and (8) only English language full-text publications were included. The following were excluded: animal studies, abstracts, reviews, case reports, letters, editorials, comments and conference proceedings.

Two authors (Q. Wang and H. Zhang) independently evaluated potentially relevant articles using the inclusion and exclusion criteria. If no agreement could be reached between these two authors, inconsistencies were discussed and resolved by a third author (B.N. Xu).

Data extraction and quality assessment

The screened articles were assessed independently by two authors (J.S. Zhang and X.L. Chen). For each included study, basal characteristics (authors, year of publication and country of origin), patient characteristics (number, mean age, sex, and grade of glioma) and technical aspects (imaging field strength, techniques of spectrum acquisition, device parameters, metabolite ratios, cut-off value, metabolite ratio mean value in different grades of glioma and reference standard) were noted. HGGs (grade III–IV gliomas) were considered positive; LGGs (grade I–II gliomas) were considered negative. The numbers of TP, FP, FN and TN results were calculated and recorded. The methodological quality of the studies was assessed using the Quality Assessment Tool for Diagnostic Accuracy Studies version 2 (QUADAS-2) [9]. Disagreements were resolved by consensus.

Statistical analysis

Standard methods recommended for meta-analysis of diagnostic accuracy were used [10, 11]. First, we evaluated whether heterogeneity between studies was caused by a threshold effect. A Spearman correlation coefficient was computed between the logit of SEN and the logit of (1 − SPE) to assess the threshold effect, and the extent of heterogeneity was then assessed using the chi-square test and the inconsistency index (I²) of the diagnostic odds ratio (DOR). When significant heterogeneity was present, test performance was summarized using a random-effects coefficient binary regression model; otherwise, a fixed-effects coefficient binary regression model was used [12, 13].
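The two checks described above can be sketched in a few lines. This is a minimal illustration, not the Meta-DiSc implementation: the 2×2 counts are hypothetical, the function names are our own, and I² is derived here from Cochran's Q with inverse-variance weights on the log DOR, which is one common formulation.

```python
import math

def logit(p):
    """Log-odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def cochran_q_and_i2(effects, variances):
    """Cochran's Q and I² (in %) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical 2x2 counts (TP, FP, FN, TN), for illustration only.
studies = [(20, 5, 5, 10), (30, 4, 10, 16), (15, 6, 3, 9), (25, 3, 8, 12)]

# Threshold effect: correlation between logit(SEN) and logit(1 - SPE).
logit_sen = [logit(tp / (tp + fn)) for tp, fp, fn, tn in studies]
logit_fpr = [logit(fp / (fp + tn)) for tp, fp, fn, tn in studies]
rho = spearman(logit_sen, logit_fpr)

# Heterogeneity of the DOR: Q and I² on the log DOR scale.
log_dor = [math.log((tp * tn) / (fp * fn)) for tp, fp, fn, tn in studies]
var_log_dor = [1 / tp + 1 / fp + 1 / fn + 1 / tn for tp, fp, fn, tn in studies]
q, i2 = cochran_q_and_i2(log_dor, var_log_dor)

print(f"Spearman rho = {rho:.3f}, Q = {q:.2f}, I2 = {i2:.1f} %")
```

A strong positive correlation would point to a threshold effect, whereas a large I² with a weak correlation indicates heterogeneity from other sources.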

The pooled SEN, SPE, LR+, LR− and DOR, with their 95 % confidence intervals (CIs), were calculated across the included studies for each metabolite ratio. The summary receiver-operating characteristic curve (SROC), area under the curve (AUC) and Q* index were calculated. AUC values of 51–70 %, 71–90 %, and >90 % indicated low, moderate and high diagnostic accuracy, respectively.
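For orientation, the accuracy measures named above follow directly from a 2×2 table. The sketch below uses hypothetical counts and our own function names. The Q* and AUC formulas assume a symmetric SROC curve (constant DOR across thresholds), which is a simplifying assumption and not necessarily the model fitted by Meta-DiSc.

```python
import math

def accuracy_measures(tp, fp, fn, tn):
    """SEN, SPE, LR+, LR- and DOR from one 2x2 table (no zero cells assumed)."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    lr_pos = sen / (1.0 - spe)
    lr_neg = (1.0 - sen) / spe
    dor = lr_pos / lr_neg
    return {"sen": sen, "spe": spe, "lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}

def q_star(dor):
    """Q*: the point on a symmetric SROC curve where SEN equals SPE."""
    root = math.sqrt(dor)
    return root / (1.0 + root)

def auc_symmetric_sroc(dor):
    """Area under a symmetric SROC curve with constant DOR (DOR != 1)."""
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))

# Hypothetical counts: 100 HGGs and 100 LGGs.
m = accuracy_measures(tp=80, fp=20, fn=20, tn=80)
print(m)
print(f"Q* = {q_star(m['dor']):.2f}, AUC = {auc_symmetric_sroc(m['dor']):.2f}")
```

With these counts SEN and SPE are both 0.80, so Q* coincides with them; this is the property that makes Q* a convenient single summary of the curve.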

Subgroup analysis was performed when some homogenous set of studies adopted similar design variables. Subgroups were constructed only when more than three studies could be included. Tests of interaction were performed to assess differences between subgroups [14]. The above-mentioned statistical analyses were performed using Meta-DiSc statistical software version 1.4 [11].
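A test of interaction between two subgroups can be sketched as a z-test on the difference between two independent estimates. This is one common form and is offered as an assumption about the general approach; the estimates and standard errors below are hypothetical and the function names are our own.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error on the log scale recovered from a 95 % CI of a ratio."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)

def interaction_test(est1, se1, est2, se2):
    """z statistic and two-sided P value for the difference est1 - est2."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical subgroup DORs with 95 % CIs, compared on the log scale.
log_dor_a, se_a = math.log(9.0), se_from_ci(4.0, 20.25)
log_dor_b, se_b = math.log(7.0), se_from_ci(3.5, 14.0)
z, p = interaction_test(log_dor_a, se_a, log_dor_b, se_b)
print(f"z = {z:.2f}, P for interaction = {p:.2f}")
```

The same function applies to any pair of independent estimates with known standard errors, for example subgroup AUCs.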

Publication bias was assessed by Deeks’ funnel plot. Formal testing for publication bias was conducted using a regression of the diagnostic log odds ratio against 1/√ESS (where ESS is the effective sample size), weighted by the effective sample size, with P < 0.10 indicating significant asymmetry [15]. This statistical analysis was performed using Stata 12.0 software (StataCorp LP, College Station, TX, USA).
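The regression behind the funnel plot asymmetry test can be illustrated as a weighted least-squares fit. The sketch below uses hypothetical 2×2 counts and our own function names, adds 0.5 to every cell for simplicity, and stops at the slope and its standard error; the P value would come from a t distribution with n − 2 degrees of freedom, which the standard library does not provide.

```python
import math

def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)

def log_dor(tp, fp, fn, tn, correction=0.5):
    """Log diagnostic odds ratio with a continuity correction on every cell."""
    tp, fp, fn, tn = (v + correction for v in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))

def weighted_linreg(x, y, w):
    """Weighted least squares for y = intercept + slope * x.

    Returns (intercept, slope, standard error of the slope).
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(resid / (len(x) - 2) / sxx)
    return intercept, slope, se_slope

# Hypothetical studies (TP, FP, FN, TN), for illustration only.
studies = [(9, 1, 1, 5), (18, 4, 6, 12), (40, 10, 12, 30), (60, 18, 20, 52), (12, 1, 2, 6)]

ess = [effective_sample_size(tp + fn, fp + tn) for tp, fp, fn, tn in studies]
x = [1.0 / math.sqrt(e) for e in ess]   # predictor: 1 / sqrt(ESS)
y = [log_dor(*s) for s in studies]      # outcome: ln(DOR)
intercept, slope, se_slope = weighted_linreg(x, y, ess)
print(f"slope = {slope:.2f}, SE = {se_slope:.2f}, t = {slope / se_slope:.2f}")
```

A slope that differs clearly from zero means that smaller studies report systematically different accuracy, which is the asymmetry the test looks for.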

Results

Study selection and characteristics

The study selection process is described in Fig. 1. The systematic literature search included 30 articles [16–45] comprising a total sample of 1335 patients (1228 patients with MRS of appropriate quality) with radiologically suspected untreated glioma. The detailed characteristics of the included studies are listed in Table 1.

Fig. 1 Flow diagram of the study selection process

Table 1 Characteristics of studies included in the meta-analysis of MRS for the differential diagnosis of HGGs from LGGs

Among these studies, 25 were retrospective cohort studies, and only five studies were prospective. In 1228 patients with MRS of appropriate quality, there were 426 LGGs and 802 HGGs. The detailed grading of HGGs was available in all but two studies [17, 25]. Histological results obtained from surgical biopsy and/or resection were the main reference standard in all studies. Pathology together with clinical and radiological follow-up was used as the reference standard in three studies [17, 19, 21].

Most studies used a single metabolite ratio, while five articles used only combined diagnostic thresholds, i.e. Cho/Cr >1.2 and the presence of a lipid/lactate peak [19], either Cho/Cr ratio or Cho/NAA ratio > 1.5 or the presence of lactate or lipids [24], Cho/Pcr-Cr and reduced NAA [41], Cho/Cr and/or LL/Cr [36], and Cho/Cr and NAA/Cho [43].

The risk of bias and concerns regarding the applicability of the studies are shown in Fig. 2. In most studies, the risk of bias was low or unclear. Three studies in which patients received a different reference standard were considered to have a high risk of bias.

Fig. 2 Methodological quality analysis of the 30 eligible studies using the QUADAS-2 tool

Quantitative synthesis

Cho/Cr ratio

Overall, among 19 of the included studies, a total of 852 MRS examinations of good quality were performed for the detection of HGGs by calculating the Cho/Cr ratio [16–18, 20, 22, 23, 25, 27, 28, 30, 31, 33–35, 38–40, 42, 45]. The diagnostic threshold of Cho/Cr ranged between 0.49 and 2.07. The Spearman correlation coefficient was 0.085 (P = 0.73), which indicated that there was no obvious threshold effect in the estimates of accuracy. Significant heterogeneity was observed in the overall comparisons (I² = 59 %), and therefore the test performance was summarized using a random-effects coefficient binary regression model.

The pooled weighted values were determined as follows: SEN, 0.75 (95 % CI: 0.71, 0.79); SPE, 0.60 (95 % CI: 0.55, 0.66); LR+, 2.78 (95 % CI: 1.62, 4.76); LR−, 0.35 (95 % CI: 0.25, 0.49); and DOR, 8.53 (95 % CI: 4.39, 16.59). The forest plots from 19 studies are shown in Fig. 3A. The AUC of the SROC curve was 0.83 (Fig. 4A). A total of 313 MRS examinations among six studies were performed on 1.5-T scanners [27, 34, 39, 40, 42, 45]. A 3.0-T unit was used for 539 scans among the remaining 13 studies [16–18, 20, 22, 23, 25, 28, 30, 31, 33, 35, 38]. There was no significant difference in AUC between the 1.5-T and 3.0-T subgroups (P for interaction = 0.77).

Fig. 3 Forest plot showing the sensitivity and specificity of different metabolite ratios for the differentiation of HGGs from LGGs. (A) Cho/Cr ratio; (B) Cho/NAA ratio; (C) NAA/Cr ratio

Fig. 4 Summary receiver-operating characteristic curve (SROC). (A) Cho/Cr ratio; (B) Cho/NAA ratio; (C) NAA/Cr ratio. AUC, area under the curve

Next, we performed subgroup analyses by spectrum acquisition technique (multi-voxel spectroscopy [MVS] and single-voxel spectroscopy [SVS]) and echo time of the spectroscopic sequence (short echo time [STE] and long echo time [LTE]). There was no significant difference in the AUC between the MVS and SVS groups (P for interaction = 0.96) or the STE and LTE groups (P for interaction = 0.90). The results of the included studies and subgroup analyses of diagnostic accuracy variables are summarized in Table 2.

Table 2 Subgroup analyses of diagnostic accuracy variables

Cho/NAA ratio

A total of 603 MRS examinations were analyzed among 16 studies [26–33, 35, 37, 39–42, 44, 45] to differentiate HGGs from LGGs by calculating the Cho/NAA ratio. The diagnostic threshold of Cho/NAA ranged between 0.80 and 3.77. No threshold effect (P = 0.27) or heterogeneity (I² = 34 %) was found among individual studies. The pooled SEN and SPE values were 0.80 (95 % CI: 0.76, 0.84) and 0.76 (95 % CI: 0.70, 0.82), respectively (Fig. 3B). The AUC of the SROC curve was 0.87 (Fig. 4B). The pooled SEN, SPE, LR+, LR−, DOR and AUC are summarized in Table 2. No significant difference in AUC was detected between the 1.5-T and 3.0-T subgroups (P for interaction = 0.76) or the MVS and SVS subgroups (P for interaction = 0.35).

NAA/Cr ratio

Ten studies [16, 28–30, 33, 34, 38, 40, 44, 45] comprising 333 MRS examinations used the NAA/Cr ratio to detect HGGs. The diagnostic threshold of NAA/Cr ranged between 0.44 and 2.44. Here again, no threshold effect (P = 0.35) or heterogeneity (I² = 0.0 %) was found among individual studies. The pooled SEN and SPE were 0.71 (95 % CI: 0.65, 0.77) and 0.70 (95 % CI: 0.61, 0.78), respectively (Fig. 3C). The AUC of the SROC curve was 0.78 (Fig. 4C). No significant difference in AUC was detected between the 1.5-T and 3.0-T subgroups (P for interaction = 0.39) or the MVS and SVS subgroups (P for interaction = 0.31).

LL/Cr

Only two studies [20, 36] detected HGGs by calculating the ratio of lipids and lactate (LL) to Cr. Quantitative synthesis of studies measuring other metabolite ratios was impossible because of limited data. Finally, we compared the AUCs among the three groups (Cho/Cr, Cho/NAA and NAA/Cr), which revealed a significant difference in AUC between the Cho/NAA and NAA/Cr groups (P for interaction = 0.03).

Sensitivity analysis

Among the 19 included studies in which HGGs were detected by calculating the Cho/Cr ratio, the DOR values were greater than 1.0 in all but two studies [17, 34], which were then excluded from the sensitivity analysis. The corresponding pooled SEN, SPE, LR+, LR− and DOR were 0.74 (95 % CI: 0.70, 0.78), 0.72 (95 % CI: 0.65, 0.77), 2.8 (95 % CI: 2.02, 3.88), 0.31 (95 % CI: 0.22, 0.43) and 10.92 (95 % CI: 6.26, 19.06), respectively. The AUC under the SROC was 0.84, which was not materially different from the overall AUC (0.83), indicating that our results were statistically robust.

Heterogeneity analysis and publication bias

Significant heterogeneity was present in the pooled analysis in the Cho/Cr group that could not be explained by threshold effect. Meta-regression analysis showed that factors including year of publication, country, study design, compact surface, voxel, and echo time did not contribute to heterogeneity. There was no significant heterogeneity in the pooled analysis in the Cho/NAA and NAA/Cr groups.

The Deeks’ funnel plot asymmetry test for the overall analysis in the Cho/Cr, Cho/NAA and NAA/Cr groups showed that publication bias may have existed in Cho/Cr and Cho/NAA groups (P = 0.01 and 0.02, respectively) (Fig. 5A, B), while the NAA/Cr group showed no significant publication bias (P = 0.31) (Fig. 5C).

Fig. 5 Funnel plot of publication bias. (A) Cho/Cr ratio; (B) Cho/NAA ratio; (C) NAA/Cr ratio

Discussion

Metabolite ratios from MRS have frequently been used to differentiate glioma grade. In an earlier article [8], the sensitivity and specificity of MRS were reviewed only in terms of differentiating high- from low-grade tumours. A recently published meta-analysis [46] concluded that MRS demonstrated moderate sensitivity and specificity for discriminating brain tumours (pooled estimates of 80.58 % and 78.46 %, respectively). The chief problem in that meta-analysis is that the authors combined all related articles without separating gliomas from other brain tumours or distinguishing the metabolite ratios used in each study, which would inevitably increase heterogeneity and render the conclusion unreliable [7]. We performed the present systematic review and meta-analyses based on accurate calculations of the relevant data.

Cho/Cr ratio

According to the quantitative synthesis, the AUC of the SROC curve was 0.83, suggesting a moderate level of overall accuracy. The DOR is a single indicator of test accuracy that combines the SEN and SPE data into a single number [47]. In this meta-analysis, the pooled DOR for diagnostic accuracy of glioma grading using the Cho/Cr ratio was 8.53, indicating that this ratio may be helpful in the diagnosis of glioma grade. As likelihood ratios are considered clinically more meaningful than the SROC curve and DOR, both LR+ and LR− were also calculated to assess diagnostic accuracy. An LR+ of 2.78 suggests that a positive test (> cut-off value) is about three times as likely in HGG patients as in LGG patients. On the other hand, the LR− was 0.35, meaning that a negative test (Cho/Cr ratio below the cut-off value) lowers the odds of HGG only to about one-third of the pre-test odds, which is not low enough to rule out HGG.
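Likelihood ratios act on odds rather than on probabilities, so their clinical meaning depends on the pre-test probability. The sketch below converts the pooled Cho/Cr likelihood ratios into post-test probabilities. As an illustrative assumption only, it takes the proportion of HGGs in the pooled sample (802 of 1228) as the pre-test probability; the function name is our own.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Assumed pre-test probability: share of HGGs in the pooled sample.
pre_test = 802 / 1228

# Pooled likelihood ratios reported for the Cho/Cr ratio.
after_positive = post_test_probability(pre_test, 2.78)
after_negative = post_test_probability(pre_test, 0.35)

print(f"pre-test probability of HGG:  {pre_test:.2f}")
print(f"after a positive Cho/Cr test: {after_positive:.2f}")
print(f"after a negative Cho/Cr test: {after_negative:.2f}")
```

Under this assumption, a negative test still leaves a probability of HGG of roughly 40 %, and a positive test raises it to roughly 84 %. A different pre-test probability would give different values.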

The results of subgroup analysis showed no significant difference in diagnostic accuracy between 3.0-T and 1.5-T MRS. However, in light of the limited data, further verification of these results is needed. The diagnostic performance of both the MVS and SVS techniques in distinguishing HGGs from LGGs was mediocre, and showed no statistical difference in either LTE or STE subgroup.

Significant heterogeneity was observed in the overall comparisons that could not be explained by threshold effect. Despite performing meta-regression analysis, we were unable to find the source of the heterogeneity. There may be publication bias with regard to the Cho/Cr ratio (Fig. 5A), as the results of Deeks’ funnel plot asymmetry test (P = 0.01) indicated that only small studies with high accuracy had been reported, while small studies with lower accuracy likely had not.

Cho/NAA ratio

For the Cho/NAA ratio, the AUC (0.87) indicated a moderate level of overall accuracy. The pooled DOR of diagnostic accuracy for HGG detection by the Cho/NAA ratio was 11.95, indicating that the ratio may be helpful in the diagnosis of HGGs. No statistically significant difference in diagnostic accuracy was detected between the 1.5-T and 3.0-T or MVS and SVS subgroups. Because of the limited STE study data, we were unable to compare the diagnostic performance of the Cho/NAA ratio between LTE and STE. No heterogeneity was present in the overall comparisons, but publication bias may exist, due to the small study effects (the tendency for the small studies in a meta-analysis to show high accuracy).

NAA/Cr ratio

With regard to the NAA/Cr ratio, the AUC (0.78) indicated a moderate level of overall accuracy. The summary DOR of the diagnostic accuracy of the NAA/Cr ratio in detecting HGGs was 6.67, indicating that the ratio may be helpful in diagnosing HGGs. No statistically significant difference in diagnostic accuracy was detected between the 1.5-T and 3.0-T or MVS and SVS subgroups. Heterogeneity and publication bias were not present in the overall comparisons, indicating that the results for the NAA/Cr ratio in our meta-analysis were statistically robust.

Overall comparisons between different metabolite ratios

Several studies have strongly recommended the use of the Cho/NAA ratio rather than the Cho/Cr ratio for differentiating HGGs from LGGs [44, 48–50], and one study noted no significant differences in the tumoural Cho/Cr ratio between LGGs and HGGs [48]. Cho/Cr is the ratio most frequently used, as mentioned in an overview of MRS [51] and supported by other authors [52, 53]. This ratio has also demonstrated reasonably low overall error during image processing [54]. Another reason for its use is that creatine concentration is known to be relatively stable during the formation of the anaplastic foci, in contrast to Cho concentration, which increases progressively [4]. However, this is a controversial issue, and one study [55] showed that creatine can change during the course of glioma development, and that creatine concentration can even carry prognostic information.

It is noteworthy that our meta-analysis showed no significant difference in AUC between the Cho/Cr and Cho/NAA groups (P for interaction > 0.05) or between the Cho/Cr and NAA/Cr groups, but that there was a significant difference in AUC between the Cho/NAA and NAA/Cr groups (P for interaction < 0.05), suggesting that the Cho/NAA ratio has higher diagnostic accuracy than the NAA/Cr ratio in distinguishing HGGs from LGGs.

Based on the overall study results, the sensitivity and specificity of glioma grade diagnosis were higher for the Cho/NAA ratio (SEN = 0.80, SPE = 0.76) than for the Cho/Cr ratio (SEN = 0.75, SPE = 0.60) or the NAA/Cr ratio (SEN = 0.71, SPE = 0.70). Taking this diagnostic performance into consideration, the Cho/NAA ratio may be a superior index for distinguishing HGGs from LGGs. However, given the limited data, further study is needed to confirm the efficacy of the Cho/NAA ratio.

Limitations

Although the present meta-analysis revealed moderate overall diagnostic accuracy for MRS in distinguishing HGGs from LGGs, some design limitations should be taken into consideration when interpreting our results.

First, most of the included articles focused on the diagnostic efficiency of a single metabolite ratio. Combined standards that use several metabolites or metabolite ratios to distinguish HGGs from LGGs have not been well established. Thus it was difficult to perform an accurate statistical analysis of such standards, and it remains unknown whether they could improve diagnostic accuracy. In addition, the analysis focused on metabolite ratios within tumour tissue, and comparisons between tumour tissue and healthy tissue were lacking, which may have led to lower specificity.

Second, as mentioned above, several innovative and advanced multiparametric magnetic resonance techniques, including DWI, DTI, and MR perfusion and multimodal imaging including PET/SPECT, also help to distinguish HGGs from LGGs. Though it is controversial which imaging technique is superior, most authors concluded that combined techniques would improve diagnostic accuracy [17, 22, 29, 34, 54].

Third, heterogeneity was found among the Cho/Cr ratio groups, and factors such as sample size, patient age and glioma grade may have also contributed to heterogeneity. Although heterogeneity was not present in the other two groups, there was considerable variation in study design, such as different types of coils, MRI devices and diverse methods of determining cut-off values. Fourth, there may have been publication bias regarding the Cho/Cr and Cho/NAA groups. Our meta-analysis was based only on published studies, which tend to report high accuracy; studies with lower accuracy are often rejected or not even submitted. In addition, this review was restricted to full-text articles published in English, thus omitting eligible studies that were unpublished or reported in other languages, which also likely resulted in bias.

Fifth, different spectroscopic patterns of glial tumour grades were not reflected. High-grade tumours in these studies represented grade III and IV cases grouped together, whereas a major diagnostic challenge sometimes is differentiating grade II from grade III and grade III from grade IV. Dividing the subgroups would have changed the cut-off threshold. Finally, the studies with small sample sizes would be greatly affected by adding 0.5 to each cell of the study to correct for zero entries in order to avoid potential problems in odds calculations for studies with sensitivity or specificity of 100 %.
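The influence of the zero-cell correction on small studies can be seen in a short numerical sketch. The counts are hypothetical and the function names are our own; the correction is applied to all four cells only when a table contains a zero, which is an assumption about the exact rule used.

```python
def corrected_table(tp, fp, fn, tn, correction=0.5):
    """Add the correction to every cell, but only if some cell is zero."""
    if 0 in (tp, fp, fn, tn):
        return tuple(v + correction for v in (tp, fp, fn, tn))
    return (tp, fp, fn, tn)

def corrected_sensitivity(tp, fp, fn, tn):
    tp, fp, fn, tn = corrected_table(tp, fp, fn, tn)
    return tp / (tp + fn)

def corrected_dor(tp, fp, fn, tn):
    tp, fp, fn, tn = corrected_table(tp, fp, fn, tn)
    return (tp * tn) / (fp * fn)

# Two hypothetical studies with perfect accuracy, differing only in size.
small = (7, 0, 0, 3)      # 10 patients
large = (70, 0, 0, 30)    # 100 patients

for name, table in (("small", small), ("large", large)):
    print(name, round(corrected_sensitivity(*table), 4), round(corrected_dor(*table), 1))
```

Both studies observed 100 % sensitivity and specificity, yet after correction the small study contributes a sensitivity of 0.9375 and a DOR of 105, while the larger study stays much closer to its observed values. The correction therefore shifts small studies far more than large ones.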

In conclusion, this meta-analysis provides evidence that MRS has moderate diagnostic performance in distinguishing HGGs from LGGs using tumoural metabolite ratios. The Cho/NAA ratio is superior to the NAA/Cr ratio in differentiating between HGGs and LGGs. We suggest that MRS should be combined with other advanced imaging techniques in order to improve diagnostic accuracy.