Introduction

Glioma grading is of critical clinical importance, as both prognosis and management strategy differ substantially according to grade [1]. Amide proton transfer-weighted (APTw) MRI, which generates image contrast based on chemical exchange saturation transfer (CEST), could be a useful noninvasive technique for glioma grading, as it provides indirect measurements of mobile proteins and peptides [2,3,4]. Since high-grade gliomas are associated with increased expression of cellular proteins and peptides relative to low-grade gliomas [5], APTw MRI may predict the cellular proliferation index [6,7,8] and differentiate low- from high-grade gliomas with higher diagnostic sensitivity and specificity than current imaging methods [6,7,8,9,10,11,12,13,14,15]. APTw MRI has been shown to be more accurate than MR spectroscopy for differentiating tumor progression from treatment-related change [12]. APTw MRI also has more direct biologic relevance than cerebral blood volume and apparent diffusion coefficient measurements [9, 11].

To become a clinically useful biomarker, an imaging technique must provide reliable estimates of disease status across different acquisition protocols and processing methods. Although APTw MRI has been reported to be a potentially useful tool in pre- and post-treatment tumors [6,7,8,9,10,11,12,13,14,15], APTw MRI protocols are not yet standardized, and the CEST effect depends strongly on RF power, saturation time, pulse sequence, and other imaging parameters [16]. This study addresses whether variations in APTw MRI protocols and post-processing analysis affect diagnostic accuracy in differentiating low- from high-grade gliomas. An additional goal of this meta-analysis is to increase statistical power by aggregating data, as most studies investigating APTw MRI have relatively small sample sizes.

Meta-analyses have systematically reviewed the clinical utility of advanced imaging techniques, such as diffusion-weighted imaging [17], perfusion-weighted imaging [18], magnetic resonance spectroscopy [19], and positron emission tomography [20], in glioma grading. Because APTw MRI is a novel molecular imaging technique that has not yet been systematically reviewed for glioma grading, this study aims to assess, based on the existing literature, the diagnostic accuracy of APTw MRI for differentiating low- from high-grade gliomas, its correlation with the cellular proliferation index, and its interobserver consistency, in light of the various imaging protocols and post-processing analyses used in prior studies.

Materials and methods

This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [21].

Literature search

A computerized literature search of Ovid-MEDLINE and EMBASE up to March 28, 2018, was performed to identify articles assessing the diagnostic performance of APTw MRI for differentiating low-grade and high-grade gliomas. The following search terms were used: ((glioma) OR (oligodendroglioma) OR (astrocytoma) OR (glioblastoma) OR (“brain tumor”)) AND ((amide proton transfer) OR (APT)). No restrictions on language, study subjects (human or animal), or earliest publication date were applied. The bibliographies of relevant articles were also searched to expand the search.

Eligibility criteria

Articles were selected if all of the following inclusion criteria were satisfied: (a) patients with histopathologically confirmed gliomas; (b) patients who underwent pre-treatment APTw MRI; (c) a reference standard based on histopathology; and (d) sufficient data to reconstruct 2 × 2 tables for differentiating low-grade and high-grade gliomas.

Articles were excluded if any of the following exclusion criteria applied: (a) conference abstracts; (b) reviews; (c) case reports or case series including fewer than 10 patients; (d) letters, editorials, or comments; (e) animal or phantom studies; and (f) articles with a partially overlapping patient cohort. For articles with a partially overlapping patient cohort, the study with the largest population was chosen. If a 2 × 2 table could not be obtained, the authors were contacted to request additional data.

Data extraction and quality assessment

The literature search, literature selection, data extraction, and quality assessment were performed independently by two reviewers (C.H.S. and J.E.P.). Disagreements were resolved by consulting a third reviewer (H.S.K.). Quality assessment of the selected studies was performed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria [22].

The following data were extracted from the relevant studies using a standardized extraction form: (a) study characteristics: study institution, period of patient enrollment, study design, method of patient enrollment (consecutive or nonconsecutive), reference standard, and interval between APTw MRI and reference standard; (b) patient characteristics: total number of patients, number of patients with high-grade and low-grade gliomas, mean age, age range, and male-to-female ratio; (c) technical characteristics of MRI: magnetic field strength, vendor, model, number of head coil channels, data acquisition (three-dimensional [3D] or two-dimensional [2D]), APTw MRI sequence, TR (ms), TE (ms), number of saturation offsets (frequencies), power of radiofrequency (RF) saturation (μT), type of RF saturation for CEST (pulsed or continuous), total RF saturation duration (ms), B0 inhomogeneity correction, total scan time, image analysis software, and other parameters; (d) MRI interpretation: number of readers, reader experience, and reader blinding to the reference standard; and (e) outcomes: diagnostic performance (sensitivity and specificity) of APTw MRI for differentiating low-grade (WHO grade I and II) and high-grade (WHO grade III and IV) gliomas, cutoff values of parameters for differentiating low-grade and high-grade gliomas, mean ± standard deviation (SD) of APT signal intensity for low-grade and high-grade gliomas, correlation between APT signal intensity and the cellular proliferation index (Ki-67), and interobserver agreement in APTw MRI. If the diagnostic performances of several APTw MRI parameters were reported separately, the values for the best-performing parameter were chosen.

Data synthesis and analysis

The pooled sensitivity and specificity and their 95% CIs were obtained using a bivariate random-effects model [23,24,25,26,27]. The pooled diagnostic odds ratio (DOR), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were also obtained. The DOR was defined as the odds of a positive APTw MRI result in patients with high-grade gliomas relative to the odds of a positive APTw MRI result in patients without high-grade gliomas.

$$ \mathrm{DOR}=\frac{\mathrm{True}\ \mathrm{positive}\times \mathrm{True}\ \mathrm{negative}}{\mathrm{False}\ \mathrm{positive}\times \mathrm{False}\ \mathrm{negative}} $$

PLR was defined as the ratio of the probability of a positive APTw MRI result (i.e., a result indicating a high-grade glioma) in patients with high-grade gliomas to the probability of a positive result in patients without high-grade gliomas.

$$ \mathrm{PLR}=\frac{\mathrm{Sensitivity}}{1-\mathrm{Specificity}} $$

NLR was defined as the ratio of the probability of a negative APTw MRI result in patients with high-grade gliomas to the probability of a negative result in patients without high-grade gliomas.

$$ \mathrm{NLR}=\frac{1-\mathrm{Sensitivity}}{\mathrm{Specificity}} $$
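The definitions above can be illustrated with a minimal Python sketch that computes sensitivity, specificity, PLR, NLR, and DOR from a single study's 2 × 2 table. The counts are hypothetical and do not come from any included study, and the class and function names are illustrative. Adding 0.5 to every cell when any cell is zero is a common convention assumed here, not a step stated in this study. The pooled estimates reported in this meta-analysis came from the bivariate random-effects model rather than from per-study arithmetic like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One study's 2 x 2 table; 'positive' = APTw MRI calls the tumor high-grade."""
    tp: int  # high-grade glioma, APTw MRI positive
    fp: int  # low-grade glioma, APTw MRI positive
    fn: int  # high-grade glioma, APTw MRI negative
    tn: int  # low-grade glioma, APTw MRI negative


def accuracy_measures(table: TwoByTwo, correction: float = 0.5) -> dict:
    """Sensitivity, specificity, PLR, NLR and DOR for a single 2 x 2 table."""
    cells = (table.tp, table.fp, table.fn, table.tn)
    # Assumed convention: if any cell is zero, add 0.5 to every cell
    # so that the ratios below stay finite.
    if 0 in cells:
        cells = tuple(c + correction for c in cells)
    tp, fp, fn, tn = cells
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PLR": sensitivity / (1 - specificity),
        "NLR": (1 - sensitivity) / specificity,
        "DOR": (tp * tn) / (fp * fn),
    }


# Hypothetical study: 40 high-grade and 20 low-grade gliomas.
example = accuracy_measures(TwoByTwo(tp=36, fp=2, fn=4, tn=18))
print({name: round(value, 3) for name, value in example.items()})
```

Because DOR = PLR/NLR follows algebraically from the definitions, only sensitivity and specificity carry independent information for a given table.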

A coupled forest plot of sensitivity and specificity and a hierarchical summary receiver operating characteristic (HSROC) curve with 95% confidence and prediction regions were constructed.

Heterogeneity across the selected studies was investigated as follows: (a) Cochran’s Q test, with p < 0.05 indicating heterogeneity; (b) the Higgins inconsistency index (I²), with a value > 50% indicating heterogeneity [28]; (c) visual inspection of the coupled forest plot for a threshold effect (i.e., a positive correlation between sensitivity and false positive rate among the included studies); (d) a Spearman correlation coefficient between sensitivity and false positive rate > 0.6, which also indicates a threshold effect [29]; and (e) visual assessment of the HSROC curve for the difference between the 95% confidence and prediction regions, with a large difference indicating heterogeneity.
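Criteria (a), (b), and (d) can be sketched in Python for a single accuracy measure. This is a simplified univariate illustration under stated assumptions: sensitivities are analyzed on the logit scale with inverse-variance weights, I² is computed as max(0, (Q − df)/Q), and the Q-test p value (which requires a chi-square distribution) is omitted. It is not necessarily how the Stata or R software used in this study computes these statistics, and the per-study counts and helper names are hypothetical.

```python
import math


def logit_and_variance(events: float, non_events: float) -> tuple:
    """Logit of a proportion and its approximate variance (0.5 added if a cell is zero)."""
    if events == 0 or non_events == 0:
        events, non_events = events + 0.5, non_events + 0.5
    return math.log(events / non_events), 1 / events + 1 / non_events


def cochran_q_and_i2(estimates, variances):
    """Inverse-variance Cochran's Q and Higgins I^2 (as a fraction, floored at 0)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-study counts: (TP, FP, FN, TN).
studies = [(18, 1, 2, 12), (25, 3, 5, 20), (14, 0, 1, 9), (30, 2, 10, 15)]
logit_sens = [logit_and_variance(tp, fn) for tp, _, fn, _ in studies]
q, i2 = cochran_q_and_i2([y for y, _ in logit_sens], [v for _, v in logit_sens])
sens = [tp / (tp + fn) for tp, _, fn, _ in studies]
fpr = [fp / (fp + tn) for _, fp, _, tn in studies]
print(f"Q = {q:.2f}, I^2 = {100 * i2:.1f}%, rho(sens, FPR) = {spearman_rho(sens, fpr):.2f}")
```

A positive rho above 0.6 between sensitivity and false positive rate would point to a threshold effect under criterion (d).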

A Deeks’ funnel plot was constructed to evaluate publication bias, and its statistical significance was assessed with Deeks’ asymmetry test [30]. Meta-regression was conducted to explore sources of heterogeneity, with the following covariates evaluated in a bivariate model: (a) study design (prospective vs. retrospective); (b) MRI vendor; (c) power of RF saturation (2 μT vs. 1 μT); and (d) data acquisition (3D vs. 2D). Data analyses were performed by one of the authors (C.H.S., with 5 years of experience in performing systematic reviews and meta-analyses) using the “metandi” and “midas” modules in Stata 15.0 (StataCorp, College Station, TX) and the “mada” package in R version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria). A p value < 0.05 was considered statistically significant.
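The regression underlying Deeks’ asymmetry test, as it is commonly described, regresses ln(DOR) on 1/√ESS and weights each study by its effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of patients with high- and low-grade gliomas. A slope that differs significantly from zero suggests funnel-plot asymmetry. The minimal Python sketch below computes the slope and its t statistic. The counts and helper names are hypothetical, the 0.5 zero-cell correction is an assumption, and the sketch is not intended to reproduce the output of the Stata modules used in this study.

```python
import math


def effective_sample_size(n_high: int, n_low: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2), with n1/n2 = high-/low-grade patients."""
    return 4 * n_high * n_low / (n_high + n_low)


def log_dor(tp, fp, fn, tn):
    """ln(DOR); 0.5 is added to every cell if any cell is zero (assumed convention)."""
    if 0 in (tp, fp, fn, tn):
        cells = tuple(c + 0.5 for c in (tp, fp, fn, tn))
        tp, fp, fn, tn = cells
    return math.log(tp * tn / (fp * fn))


def weighted_linear_fit(x, y, w):
    """Weighted least squares y = a + b*x; returns (a, b, Sxx), Sxx = sum w*(x - xbar)^2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b, sxx


def deeks_asymmetry(studies):
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS; return slope, t statistic, df."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    a, b, sxx = weighted_linear_fit(x, y, w)
    df = len(studies) - 2
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / df / sxx)
    return b, (b / se_b if se_b > 0 else math.inf), df


# Hypothetical per-study counts: (TP, FP, FN, TN).
slope, t_stat, df = deeks_asymmetry([(18, 1, 2, 12), (25, 3, 5, 20), (14, 0, 1, 9), (30, 2, 10, 15)])
print(f"slope = {slope:.2f}, t = {t_stat:.2f}, df = {df}")
```

A two-sided p value could then be obtained from a t distribution with df degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t_stat), df)` if SciPy is available.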

Results

Literature selection

The detailed literature selection process is shown in Fig. 1. The computerized literature search returned 186 articles from Ovid-MEDLINE (n = 92) and EMBASE (n = 94). After removal of 10 duplicates, the titles and abstracts of the remaining 176 articles were screened, and 151 articles were excluded: 81 were not in the field of interest, 33 were conference abstracts, 25 were reviews, 11 were case reports, and 1 was a letter. Full-text review of the 25 potentially eligible articles led to the exclusion of 15 articles (Supplementary materials). Two articles included patients enrolled at the same institution during overlapping time periods; we verified with the authors that the patient cohorts did not overlap [11, 12]. Finally, 10 original articles covering a total of 353 patients were included [6,7,8,9,10,11,12,13,14,15].

Fig. 1

PRISMA flow diagram of the study selection process

Characteristics of the included studies

The characteristics of the included studies are described in Table 1. Four of the ten studies had a prospective design [7, 8, 10, 15], three had a retrospective design [9, 11, 12], and the remaining three [6, 13, 14] did not report the design. Patient enrollment was consecutive in six studies [7,8,9, 11,12,13] and was not reported in the other studies [6, 10, 14, 15]. Histopathology was the reference standard in all studies, as this was one of the inclusion criteria.

Table 1 Characteristics of the included studies

The detailed parameters of the different APTw MRI sequences are described in Table 2. APTw MRI was performed using a 3D acquisition in five studies [9,10,11,12, 14] and a 2D acquisition in five studies [6,7,8, 13, 15]. A gradient echo sequence was used in four studies [6, 7, 11, 12], a spin echo sequence in three studies [8, 13, 15], and combined gradient and spin echo sequences in three studies [9, 10, 14]. Various saturation offsets and frequencies were used, with an RF saturation pulse power of 2 μT in seven studies [6,7,8,9,10, 14, 15] and 1 μT in three studies [11,12,13]. In all studies, the magnetization transfer ratio asymmetry (MTRasym) was calculated at an offset of 3.5 ppm; this value is the so-called APT-weighted signal. Image postprocessing was performed using Interactive Data Language in four studies [6, 10, 14, 15], MATLAB in four studies [9, 11,12,13], and ImageJ in one study [8]. In all studies, two or three neuroradiologists placed regions of interest (ROIs) on the solid component of the tumor on the APT map. Total acquisition time varied considerably, from 3 min 12 s to 10 min 42 s.
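The APT-weighted signal described above is the asymmetry of the Z-spectrum around the water resonance at ±3.5 ppm, MTRasym(3.5 ppm) = [S(−3.5 ppm) − S(+3.5 ppm)]/S0, usually expressed in percent. The minimal Python sketch below computes it for one voxel. The linear interpolation and the simple B0 correction (shifting all saturation offsets by a measured water-frequency offset, here the hypothetical parameter `b0_shift_ppm`) are assumptions for illustration and do not reproduce the IDL, MATLAB, or ImageJ pipelines used in the included studies; the Z-spectrum values are hypothetical.

```python
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Linear interpolation of ys(xs) at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("requested offset lies outside the sampled range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def apt_weighted_percent(offsets_ppm, signals, s0, b0_shift_ppm=0.0, delta_ppm=3.5):
    """MTRasym(delta) = [S(-delta) - S(+delta)] / S0, in percent.

    b0_shift_ppm (hypothetical input) is the measured water-frequency offset in
    the voxel; subtracting it re-references the saturation offsets to water.
    """
    corrected = [f - b0_shift_ppm for f in offsets_ppm]
    s_neg = interpolate(-delta_ppm, corrected, signals)
    s_pos = interpolate(+delta_ppm, corrected, signals)
    return 100.0 * (s_neg - s_pos) / s0


# Hypothetical single-voxel Z-spectrum; S0 is the unsaturated reference signal.
offsets = [-4.0, -3.5, -3.0, 0.0, 3.0, 3.5, 4.0]
signals = [0.82, 0.80, 0.77, 0.05, 0.76, 0.78, 0.81]
print(apt_weighted_percent(offsets, signals, s0=1.0))
print(apt_weighted_percent(offsets, signals, s0=1.0, b0_shift_ppm=0.5))
```

With the hypothetical 0.5-ppm B0 shift, the same data yield −4.0% instead of 2.0%, which illustrates why B0 inhomogeneity correction was recorded as a technical characteristic.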

Table 2 Amide proton transfer (APT)-weighted magnetic resonance imaging parameters

Quality assessment

The overall quality of the included studies was moderate, with 8 of the 10 studies satisfying at least 4 of the 7 QUADAS-2 domains (Supplementary Fig. 1). In the index test domain, seven studies had an unclear risk of bias because it was unclear whether APTw MRI was interpreted blinded to the reference standard [9,10,11,12,13,14,15]. In the reference standard domain, nine studies had an unclear risk of bias because it was unclear whether the reference standard was interpreted blinded to the APTw MRI results [6,7,8,9,10,11,12,13, 15]. However, we considered that these issues did not affect the applicability of the studies.

Diagnostic performance of APTw MRI for glioma grading

APT signal intensity (%) was the main parameter in all studies. APT90 (the 90th percentile of the APT signal intensity histogram) was used in three studies [11,12,13], and APTmax (the maximum APT signal intensity) in one study [10]. In all studies, APT signal intensity increased with glioma grade, and high-grade gliomas demonstrated significantly higher APT signal intensity than low-grade gliomas (Table 3). Cutoff values for APT signal intensity ranged from 1.53 to 3.70% (median, 2.23%). The sensitivities of the individual studies ranged from 62 to 100%, and the specificities from 71 to 100%.
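The ROI metrics mentioned above can be sketched as follows: the mean APT signal intensity, APT90 (the 90th percentile of the ROI histogram), and APTmax, followed by dichotomization at a cutoff. The voxel values are hypothetical. The linear-interpolation percentile definition (`method="inclusive"` in Python's `statistics.quantiles`) and the rule that values at or above the cutoff are called high-grade are assumptions. The 2.23% threshold is simply the median cutoff reported across studies, used here only as an example, since each study derived its own cutoff for its own metric.

```python
import statistics


def roi_apt_metrics(voxel_values_percent):
    """Summary metrics of APTw values (in %) inside a tumor ROI."""
    values = sorted(voxel_values_percent)
    # 'inclusive' quantiles interpolate linearly between data points; the last
    # of the nine decile cut points is the 90th percentile (APT90).
    apt90 = statistics.quantiles(values, n=10, method="inclusive")[-1]
    return {"mean": statistics.fmean(values), "APT90": apt90, "APTmax": max(values)}


def grade_call(metric_value, cutoff_percent):
    """Dichotomize: values at or above the cutoff are called high-grade (assumed rule)."""
    return "high-grade" if metric_value >= cutoff_percent else "low-grade"


# Hypothetical ROI values (%); 2.23% is used only as an illustrative threshold.
roi = [1.2, 1.8, 2.0, 2.1, 2.4, 2.6, 2.9, 3.1, 3.3, 4.0]
metrics = roi_apt_metrics(roi)
print(metrics, grade_call(metrics["mean"], 2.23))
```

Histogram metrics such as APT90 are less sensitive to a single outlying voxel than APTmax, while still emphasizing the most abnormal part of the tumor.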

Table 3 Outcomes of the included studies

The pooled sensitivity and specificity for the diagnostic performance of APTw MRI for differentiating low-grade and high-grade gliomas were 88% (95% CI, 77–94%) and 91% (95% CI, 82–96%), respectively (Fig. 2). The pooled DOR, PLR, and NLR were 73 (95% CI, 24–222), 9.5 (95% CI, 4.6–19.5), and 0.13 (95% CI, 0.06–0.26), respectively. The area under the HSROC curve was 0.95 (95% CI, 0.93–0.97).
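Because DOR = PLR/NLR by definition, the pooled estimates can be roughly cross-checked by plugging the pooled point estimates of sensitivity and specificity into the formulas given in the Methods. This is only a back-of-the-envelope check: the reported values were estimated within the bivariate model, so small discrepancies (e.g., 9.8 versus 9.5 for PLR) are expected.

$$ \mathrm{PLR}\approx\frac{0.88}{1-0.91}\approx 9.8,\qquad \mathrm{NLR}\approx\frac{1-0.88}{0.91}\approx 0.13,\qquad \mathrm{DOR}=\frac{\mathrm{PLR}}{\mathrm{NLR}}\approx\frac{9.78}{0.132}\approx 74 $$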

Fig. 2

Coupled forest plots of sensitivity and specificity for the diagnostic performance of APTw MRI for glioma grading. Horizontal lines indicate 95% CIs

The Q test did not indicate significant heterogeneity (Q = 4.328, p = 0.057), but the Higgins I² test indicated heterogeneity in sensitivity (I² = 68.17%) but not in specificity (I² = 44.84%). The coupled forest plot showed no evidence of a threshold effect (Fig. 2). The Spearman correlation coefficient was −0.081 (95% CI, −0.676 to 0.578), which also indicates no threshold effect. In the HSROC curve, there was a large difference between the 95% confidence and prediction regions, which suggests possible heterogeneity (Fig. 3). The Deeks’ funnel plot indicated a low likelihood of publication bias (p = 0.81; Supplementary Fig. 2).

Fig. 3

Hierarchical summary receiver operating characteristic (HSROC) curve of the diagnostic performance of APTw MRI for glioma grading

Meta-regression

Meta-regression was performed to explore sources of heterogeneity (Supplementary Table 1). Among the potential covariates, RF saturation power was associated with study heterogeneity: studies using a 2-μT RF saturation power showed significantly higher sensitivity (92% [95% CI, 87–97%]) than studies using 1 μT (69% [95% CI, 56–81%]). Study design, MRI vendor, and data acquisition (3D or 2D) were not significantly associated with heterogeneity.

Correlation between APT signal intensity and Ki-67

Four studies evaluated the correlation between APT signal intensity and Ki-67 (Table 3) [6,7,8, 10]. The correlation coefficients ranged from 0.430 to 0.597, indicating moderate correlations between APT signal intensity and Ki-67.

Interobserver agreement

Among the 10 studies, 8 evaluated interobserver agreement for quantitative APTw MRI parameters (Table 3) [6,7,8,9, 11,12,13, 15]. Six studies used the intraclass correlation coefficient (ICC) [6,7,8,9, 11, 12], one used Cohen’s kappa [13], and one used Cronbach’s alpha and standardized Cronbach’s alpha [15]. All studies showed excellent interobserver agreement (ICC, 0.81–0.95; Cohen’s kappa, 1; Cronbach’s alpha, 0.984; standardized Cronbach’s alpha, 0.986).
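For reference, one common ICC form, ICC(2,1) of Shrout and Fleiss (two-way random effects, absolute agreement, single rater), can be computed from a two-way ANOVA decomposition as in the minimal Python sketch below. The summary above does not specify which ICC form each included study used, so this choice is an assumption, and the reader measurements are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) after Shrout and Fleiss: two-way random effects, absolute agreement,
    single rater. `ratings` is a list of rows (patients), one column per reader."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical mean APT signal intensity (%) measured by two readers in six patients.
readers = [[1.6, 1.7], [2.1, 2.0], [2.8, 3.0], [3.4, 3.3], [1.9, 2.1], [3.9, 3.8]]
print(round(icc_2_1(readers), 3))
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC: ratings of [1, 2], [2, 3], and [3, 4] give ICC(2,1) = 2/3 even though both readers rank the patients identically.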

Discussion

This meta-analysis systematically reviewed the diagnostic accuracy of APTw MRI in grading gliomas, the correlation between APT signal intensity and Ki-67, and the interobserver agreement in interpreting APTw MRI, based on 10 studies with 353 patients. High-grade gliomas showed significantly higher APT signal intensity than low-grade gliomas. The pooled sensitivity was 88% (95% CI, 77–94%), the pooled specificity was 91% (95% CI, 82–96%), and the area under the HSROC curve was 0.95 (95% CI, 0.93–0.97). In addition, the correlation between APT signal intensity and Ki-67 was moderate, and the interobserver agreement for APTw MRI was excellent. Although the protocols used at 3 T were heterogeneous, APTw MRI demonstrated excellent diagnostic performance for differentiating low-grade and high-grade gliomas, suggesting that APTw MRI could be a reliable technique for glioma grading in clinical practice.

APTw MRI offers several advantages for brain tumor imaging. First, APTw MRI is noninvasive, and because its contrast is based on endogenous amide protons, no contrast agent is needed [9]. APTw MRI is therefore particularly helpful when administration of a contrast agent is contraindicated. Second, APTw MRI provides reliable quantitative APT parameters, as shown by the excellent interobserver agreement in our results (ICC, 0.81–0.94). Third, APTw MRI may reflect cellular proliferation, because abnormal protein synthesis causes overexpression of various proteins and peptides [31]; consistent with this, our results revealed moderate correlations between APT signal intensity and Ki-67.

This study highlights the importance of technical considerations for APTw MRI. In the meta-regression, studies using 2-μT RF saturation showed higher sensitivity than those using 1 μT; the higher APT signal intensity obtained at higher RF power may have enhanced the diagnostic value of APTw MRI. All studies applied pulsed RF saturation, but other parameters, including pulse sequences, numbers of saturation offsets and frequencies, saturation duration, and data analysis methods, varied between studies. A previous report demonstrated that different analytical approaches influence APT signal intensities [32]. These results emphasize that imaging protocols, post-processing analysis, and APT signal intensity values need to be standardized across institutions before APTw MRI can become clinically viable. Continued standardization efforts are required from both vendors and researchers.

Several clinical applications of APTw MRI in brain tumors should be noted. First, APTw MRI can accurately differentiate high-grade from low-grade gliomas; by reviewing all currently available articles, this study demonstrated its high diagnostic performance for this task. Second, APTw MRI could be used to evaluate treatment response in pre- and post-treatment gliomas. Previous studies showed high performance of APTw MRI for evaluating treatment response in newly diagnosed glioblastoma [33] and post-treatment glioma [12, 34]. Third, APTw MRI shows promising results for differentiating other malignant brain tumors (including metastases) from glioblastoma [35], and primary central nervous system lymphoma from high-grade glioma [36]. Last but not least, APTw MRI could assist tissue biopsy by increasing the accuracy of tumor sampling in patients with infiltrating gliomas [10]. APTw MRI is attracting growing interest, and both its technical and clinical validation are ongoing. Further validation across these clinical applications is warranted.

Aside from the small sample size, this study has several other limitations. First, the included studies used pulsed RF saturation, whereas recent advances have enabled continuous RF saturation for APTw MRI at 3 T [37, 38]; a future meta-analysis should update the findings with studies using this technique. Second, although the I² test revealed heterogeneity in sensitivity, the power of this test is low given the small number of included studies; meta-regression showed that RF saturation power was associated with study heterogeneity. Third, although the funnel plot did not suggest publication bias, it should be interpreted with caution given the small number of included studies. Finally, the effects of pH, the nuclear Overhauser effect, the magnetization transfer effect, water content, temperature, and the T1 of water protons on APTw MRI need further evaluation.

In conclusion, although the 3-T protocols used for APTw MRI were heterogeneous, the technique demonstrated excellent diagnostic performance for differentiating low-grade from high-grade gliomas. APTw MRI could be a reliable technique for glioma grading in clinical practice.