Introduction

The vertebral column is the region of the skeleton most frequently affected by metastatic disease (Ecker et al. 2005; Metser et al. 2004). Autopsy series have demonstrated that vertebral metastases are found in as many as 30 % of patients with malignancies (Sundaresan et al. 2004). Ten percent of all spinal metastases have no known primary tumor at the time of presentation (Constans et al. 1983). Vertebral metastases can cause severe pain, paralysis and impairment of activities of daily living. Therefore, early diagnosis and treatment, before permanent neurologic and functional deficits develop, are essential for a favorable prognosis (Bilsky et al. 1999). Imaging modalities are powerful tools for assessing malignant vertebral involvement (Ecker et al. 2005; Bilsky et al. 1999). The modalities widely used in diagnosing vertebral metastases include magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), bone scintigraphy (BS) and BS with single-photon emission computed tomography (SPECT) for focal lesions (Ecker et al. 2005). Extensive research on the diagnostic value of these modalities has been carried out, but no consensus has yet been reached on the optimal imaging modality for detecting vertebral metastases, because patient populations, imaging techniques, study designs and results vary widely. This variation makes it difficult to judge the relative diagnostic value of these imaging modalities. A meta-analysis of diagnostic tests is a powerful tool for summarizing findings in the literature because it takes into account, and enables analysis of, differences between studies (van Houwelingen et al. 2002).

Thus, the purpose of our study was to perform a meta-analysis to compare the diagnostic value of MRI, CT, PET, BS and BS with SPECT in detecting vertebral metastases to identify the potentially most useful diagnostic modality.

Methods and materials

Literature search

A comprehensive computerized literature search was performed to identify articles on the diagnostic performance of MRI, CT, PET, SPECT or BS imaging for the detection of vertebral metastases. The MEDLINE and EMBASE databases were searched for articles published from January 1995 to December 2015. Scopus, ScienceDirect, SpringerLink, Web of Knowledge, EBSCO and the Cochrane Library were also checked for relevant articles using similar keywords. The list of articles was supplemented by extensive cross-checking of the reference lists of all retrieved articles. No language limitation was applied.

Selection of studies

Two reviewers independently assessed potentially eligible studies. After reading all the abstracts, we obtained the full text of the potentially eligible articles to determine whether they were in fact eligible. The inclusion criteria were as follows: (a) MRI, CT, PET, SPECT or BS imaging was used to identify and characterize vertebral metastases. (b) Histopathologic analysis and/or close clinical and imaging follow-up and/or radiographic confirmation by multiple imaging modalities were used as the reference standard. (c) For per-patient or per-lesion statistics, sufficient data were presented to calculate true-positive, true-negative, false-positive and false-negative results of the imaging modalities compared with the reference standard. (d) When data or subsets of data were presented in more than one article, the article with the most details or the most recent article was chosen. Case reports, letters, editorials, comments, reviews, animal studies, studies in children, in vitro studies and studies that did not include raw data were excluded. Studies in which the results of different imaging modalities were presented in combination and could not be separated to assess the performance of each individual modality were also excluded.

Data extraction

The same observers independently extracted relevant data from each article by using a standardized form. Observers were not blinded with regard to information about the journal name, the authors, the authors’ affiliation or year of publication, since this has been shown to be unnecessary (Berlin 1997). To resolve disagreement between reviewers, a third reviewer assessed all discrepant items and the majority opinion was used for analysis.

Study design characteristic

The QUADAS quality assessment tool was used to extract relevant study design characteristics of each study. This tool and the definitions of the characteristics were fully described by Whiting et al. (2003). It is the first systematically developed evidence-based quality assessment tool to be used in meta-analysis of diagnostic accuracy studies.

Other study characteristics

In addition, the following characteristics were recorded: (a) year of publication; (b) sample size; (c) description of study population, which included age and male–female distribution; (d) tests of reference standard; (e) authors’ country; (f) type of primary tumor.

The following imaging features were extracted. For MRI, these features included magnetic field strength, whether a contrast agent was used and the sequences used. For CT, these features included type of scanner (number of helical sections), section thickness and whether a contrast agent was used. For PET, these features included type of tracer, amount of tracer and type of analysis (qualitative, quantitative or both). For BS and SPECT, these features included type of tracer and amount of tracer.

Examination results

2 × 2 tables were extracted both on per-patient basis and on per-lesion basis, including the numbers of true-positive, true-negative, false-positive and false-negative results. To avoid selection bias of data sets, the tabulated results for multiple MRI, CT, PET, SPECT and BS systems and/or techniques were counted as separate data sets.
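As an illustration of how one extracted 2 × 2 table yields the accuracy measures analyzed later, the following minimal sketch computes sensitivity and specificity from the four cell counts. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of one data set compared against the reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def sensitivity(self) -> float:
        # Proportion of reference-positive cases that the modality detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative cases that the modality cleared.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
print(f"sensitivity = {example.sensitivity():.2f}")  # 40 / 50 = 0.80
print(f"specificity = {example.specificity():.2f}")  # 45 / 50 = 0.90
```

The same structure applies whether the unit of analysis is the patient or the lesion; only the meaning of one "case" changes.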

Statistical analysis

Data were analyzed separately for MRI, CT, PET, SPECT and BS. We calculated pooled sensitivity, specificity and diagnostic odds ratio (DOR) for each modality, and we also constructed summary receiver operating characteristic (SROC) curves. We then used the Z test to determine whether the sensitivity, specificity and DOR of each modality differed significantly from those of the others. All analyses were performed by using Microsoft Excel 2003 (Microsoft, Redmond, Wash), SPSS 13.0 for Windows (SPSS, Chicago, Ill) and Meta-DiSc, a freeware program produced by Javier Zamora for meta-analysis of studies evaluating diagnostic and screening tests (Zamora et al. 2006).
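The pooling and comparison steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes that pooled sensitivity and specificity are obtained by summing cell counts across data sets and that the Z test is a two-proportion test. The text does not specify either detail, and all counts and function names below are hypothetical.

```python
import math

# Hypothetical (tp, fp, fn, tn) data sets for two modalities; illustration only.
modality_a = [(40, 5, 10, 45), (30, 4, 6, 60)]
modality_b = [(25, 8, 25, 42), (20, 6, 16, 58)]


def pooled_sensitivity(tables):
    """Return (pooled sensitivity, number of reference-positive cases)."""
    tp = sum(t[0] for t in tables)
    fn = sum(t[2] for t in tables)
    return tp / (tp + fn), tp + fn


def pooled_specificity(tables):
    """Return (pooled specificity, number of reference-negative cases)."""
    fp = sum(t[1] for t in tables)
    tn = sum(t[3] for t in tables)
    return tn / (tn + fp), tn + fp


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic and two-sided P value for the difference of two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


sens_a, n_a = pooled_sensitivity(modality_a)
sens_b, n_b = pooled_sensitivity(modality_b)
z, p = two_proportion_z(sens_a, n_a, sens_b, n_b)
print(f"A: {sens_a:.3f}, B: {sens_b:.3f}, z = {z:.2f}, P = {p:.4f}")
```

Under this sketch, a comparison with P < 0.05 corresponds to ">" and P > 0.05 to "=" in the ordering notation used in the Results.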

Results

Literature search and selection of studies

After the computerized search was performed and reference lists were extensively cross-checked, about 2545 abstracts were identified. After reading all the abstracts, we found 56 potentially eligible articles. After we read the full texts of these articles, 33 of the 56 relevant articles were excluded because (a) the aim of the articles was for identification and characterization of vertebral metastases (n = 1); (b) the reference standard used in the articles was not presented clearly (n = 1); (c) the aim of the article was to evaluate the efficiency of radiotherapy for spinal metastases (n = 1); (d) researchers in the articles did not report data that could be used to construct or calculate true-positive, false-positive, true-negative and false-negative results (n = 30). Finally, 23 articles, which comprised 33 studies, fulfilled all inclusion criteria and were selected for data extraction and data analysis.

Study design characteristics

Most studies (Table 1) had an optimal design, with three exceptions: examination of all patients with the same reference standard (78.8 % “no” responses to question 6), performance of the reference standard independently of the index test (81.8 % “no” responses to question 7) and interpretation of the reference standard results without knowledge of the index test results (100 % “no” responses to question 11).

Table 1 Results of distribution of study design characteristics in 33 studies

Other study design characteristics and examination results

There were a total of 1598 patients in the selected studies, and their ages ranged from 7 to 95 years. In 24 studies, the sex distribution was described: 543 patients were male, and 582 were female. In 12 studies, imaging data were presented about identification of patients. In 11 studies, imaging data were presented about identification of lesions. In ten studies, data were presented about the identification of both patients and lesions. The reference standard was clinical and imaging follow-up in two studies, both histopathologic analysis and clinical and imaging follow-up in four studies, both clinical and imaging follow-up and radiographic confirmation by multiple imaging modalities in ten studies and all three kinds in the other eighteen studies. The studies were from 13 countries, and most of the studies covered mixed primary tumors; the exceptions were lung cancer in 2 studies, breast cancer in 1 study and prostate cancer in 3 studies.

Table 2 presents the included data sets with the corresponding numbers of data sets and patients. A full list of all included articles with all relevant study characteristics and complete examination results is available on request from the authors of this article. For MRI, most of the studies used 1.5 T; one study used 0.2 T. Three studies used a contrast agent, and the other ten studies did not. Most studies used T1, T2 and STIR sequences. For CT, one study used an eight-section helical scanner, one a four-section helical scanner and one both sixteen- and sixty-four-section helical scanners. The section thickness was 1 mm in one study, 3 mm in one study and 4–5 mm in one study. For PET, four studies used 18FDG as the tracer, and the other used 18F. The amount of tracer for most of the studies was 3.3 MBq/kg. For SPECT and BS, all the studies used 99mTc-MDP as the tracer, and the amount of tracer was mostly 740 MBq.

Table 2 Study characteristics of included data sets for each kind of imaging modality

Summary estimates of sensitivity, specificity, DOR and summary ROC curves on per-patient basis

For sensitivity, MRI = PET = SPECT > CT = BS (“=” indicated no significant difference, P > 0.05; “>” indicated significantly higher, P < 0.05). For specificity, MRI = CT = BS > SPECT > PET. For DOR, MRI > SPECT > BS > CT = PET. The results are also given in Table 3. The SROC curves for MRI, PET, SPECT and BS on per-patient basis are shown in Fig. 1 (since only two studies of CT on per-patient basis were enrolled, there was no SROC curve for CT). The SROC curves showed that MRI and SPECT had better diagnostic accuracy than BS and PET.

Table 3 Summary estimates of sensitivity, specificity and diagnostic odds ratio (DOR) for MRI, CT, PET, SPECT and BS
Fig. 1 SROC curves for MRI (a), PET (b), SPECT (c) and BS (d) on per-patient basis; SROC curve for MRI and SPECT showed better diagnostic accuracy than PET and BS
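To make the construction of an SROC curve concrete, the sketch below fits the Moses-Littenberg linear model, D = a + bS, where D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), and then converts the fitted line back into a curve in ROC space. The choice of this model, the unweighted least-squares fit and the 0.5 continuity correction are assumptions; the text does not state how the curves were fitted. The data sets and function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def fit_sroc(tables):
    """Fit D = a + b*S by ordinary least squares (Moses-Littenberg model).

    Each table is (tp, fp, fn, tn); 0.5 is added to every cell so that
    no logit is undefined.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        d_vals.append(logit(tpr) - logit(fpr))  # D equals ln(DOR)
        s_vals.append(logit(tpr) + logit(fpr))  # S is a proxy for the threshold
    n = len(tables)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Sensitivity on the fitted SROC curve at a given false-positive rate."""
    # Solving D = a + b*S for logit(TPR) gives the expression below (b != 1).
    u = a / (1 - b) + logit(fpr) * (1 + b) / (1 - b)
    return 1 / (1 + math.exp(-u))


# Hypothetical data sets, for illustration only.
tables = [(45, 10, 5, 40), (40, 5, 10, 45), (30, 4, 6, 60), (35, 12, 15, 38)]
a, b = fit_sroc(tables)
for fpr in (0.05, 0.10, 0.20):
    print(f"FPR = {fpr:.2f} -> TPR = {sroc_tpr(fpr, a, b):.3f}")
```

A curve lying closer to the upper left corner of ROC space corresponds to better overall diagnostic accuracy, which is the basis for the comparison between modalities in the figure.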

Summary estimates of sensitivity, specificity, DOR and summary ROC curves on per-lesion basis

For sensitivity, PET = SPECT = MRI > BS > CT. For specificity, MRI = CT > PET = SPECT = BS. For DOR, MRI > SPECT > CT = PET > BS. The results are also given in Table 3. The SROC curves for MRI, CT, PET, SPECT and BS on per-lesion basis are shown in Fig. 2. SROC curve for each modality showed MRI had the best diagnostic accuracy and CT had the lowest diagnostic accuracy.

Fig. 2 SROC curves for MRI (a), CT (b), PET (c), SPECT (d) and BS (e) on per-lesion basis; SROC curve for each modality showed MRI had the best diagnostic accuracy and CT had the lowest diagnostic accuracy

Discussion

In this meta-analysis, we obtained summary estimates and summary ROC curves for the diagnostic accuracy of MRI, CT, PET, SPECT and BS in the detection of vertebral metastases. On per-patient basis, MRI, PET and SPECT had better sensitivity while MRI, CT and BS had the best specificity. Combined with the SROC curve for each modality, MRI and SPECT were found to be comparable and both better than BS and PET. On per-lesion basis, PET, SPECT and MRI had the best sensitivity; MRI and CT had the best specificity; MRI had the best DOR. Combined with SROC curve for each modality, MRI was found to be the best modality.

It has been reported that methodological shortcomings in studies of diagnostic performance may lead to overestimation of the accuracy of a diagnostic test (Lijmer et al. 1999). We therefore performed a meta-regression analysis to determine whether the diagnostic values were significantly affected by heterogeneity between individual studies. In the meta-regression, the dependent variable was D = ln(DOR), where DOR is the diagnostic odds ratio; after this log transformation, the values were approximately normally distributed. First, we performed single-factor regression analyses. The covariates were year of publication, sample size, type of reference standard, modality category, authors’ country, type of primary tumor and the answers to the 14 questions of the QUADAS quality assessment tool. We considered a covariate to be explanatory if its regression coefficient was statistically significant (P < 0.1). Subsequently, we developed a multivariable regression model, using a backward stepwise algorithm to retain only the most important characteristics. Only the modality category remained as an important characteristic (coefficient: 0.363, P = 0.06). This result showed that the diagnostic values were not significantly affected by heterogeneity between individual studies. The meta-regression procedures are available on request from the authors of this article.
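A single-factor regression of this kind can be sketched as follows. The sketch computes D = ln(DOR) and its approximate variance for each data set and then regresses D on one covariate by inverse-variance weighted least squares. The weighting scheme, the 0.5 continuity correction and the normal approximation for the P value are assumptions, since the fitting details are not reported here; the data sets, the binary covariate and the function names are hypothetical.

```python
import math


def ln_dor(tp, fp, fn, tn):
    """Return D = ln(DOR) and its approximate variance (0.5 added to each cell)."""
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    d = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return d, var


def weighted_slope(x, d, var):
    """Inverse-variance weighted least squares for D = b0 + b1 * x."""
    w = [1 / v for v in var]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (di - d_bar) for wi, xi, di in zip(w, x, d))
    b1 = sxy / sxx
    b0 = d_bar - b1 * x_bar
    se_b1 = math.sqrt(1 / sxx)  # standard error when the variances are taken as known
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return b0, b1, p_value


# Hypothetical data sets and a hypothetical binary covariate
# (for example, 1 = "yes" and 0 = "no" for one quality item).
tables = [(45, 10, 5, 40), (40, 5, 10, 45), (30, 4, 6, 60), (35, 12, 15, 38)]
covariate = [1, 1, 0, 0]
d_values, variances = zip(*(ln_dor(*t) for t in tables))
b0, b1, p_value = weighted_slope(covariate, d_values, variances)
print(f"coefficient = {b1:.3f}, P = {p_value:.3f}, explanatory = {p_value < 0.1}")
```

Because D is on the log scale, exp(b1) can be read as the ratio of the DOR between studies with and without the characteristic.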

The reference standard used in this meta-analysis was “histopathologic analysis and/or close clinical and imaging follow-up and/or radiographic confirmation by multiple imaging modalities.” We examined whether the kind of reference standard influenced the diagnostic accuracy and found no significant difference. There were also some drawbacks. As stated in the Standards for Reporting of Diagnostic Accuracy initiative, a reference standard can be either a single method or a combination of methods to establish the presence of the target condition (Bossuyt et al. 2003). The major problem, however, was the absence of critical information, such as data about the execution of the reference test, the confidence rating or the characterization of lesions; these data were insufficiently described or not described at all in a large subset of articles. Moreover, most of the articles did not report the exact number of patients diagnosed by each reference standard.

Another limitation of this meta-analysis is that the number of eligible articles was limited, so we could not perform a subgroup analysis of the technical differences within each modality. More well-designed studies that address these specific questions are needed for further research.

MRI, CT, PET, SPECT and BS are widely used for diagnosis of vertebral metastases. MRI and CT are anatomic imaging modalities, which analyze tumor tissues on the basis of their morphological appearance, while PET, SPECT and BS are functional imaging modalities. PET identifies viable tumors on the basis of higher glycolytic rates in neoplasms than normal tissue. SPECT and BS identify bone metastasis by detecting the osteoblastic response to bone destruction by tumor cells and the accompanying increase in blood flow (Tryciecky et al. 1997; Deeks 2001).

MRI has high soft-tissue contrast and good spatial and contrast resolution. It is an optimal imaging modality for bone marrow assessment. MRI can detect an early intramedullary malignant lesion before there is any cortical destruction or reactive process. Moreover, sagittal screening images of the entire spine on MRI can reveal bone, epidural and paraspinal tumor well. The extent and degree of spinal cord compression can be readily appreciated (Schiff et al. 1998). Imaging sequences used to evaluate spinal metastases typically are T1- and T2-weighted. Tumor on a T1-weighted image is hypointense relative to the normal marrow signal. Tumor is hyperintense relative to marrow on standard T2-weighted imaging and produces a myelogram effect, with cerebrospinal fluid appearing hyperintense. Unfortunately, the recently developed, time-saving fast spin-echo T2 techniques may decrease tumor conspicuity. This decreased conspicuity can be compensated for by using short tau inversion recovery (STIR) techniques. STIR images show enhanced contrast between the lipid marrow (hypointense) and tumor (hyperintense) (Traill et al. 1995; Jones et al. 1994; Mehta et al. 1995). MRI is therefore an excellent screening tool for vertebral metastases. In the present study, MRI was shown to be better than BS on per-patient basis and to be the best imaging modality on both per-patient and per-lesion basis.

CT is useful for assessing the degree of bone destruction because both cortical and trabecular bone components can be well defined. However, because considerable cortical destruction is required for visualization of a metastasis by CT, the sensitivity of this modality in detecting early malignant bone involvement is relatively low (Muindi et al. 1983). Moreover, cortical destruction may be especially difficult to determine in the presence of severe osteoporotic or degenerative changes. CT is not sensitive for assessment of malignant marrow infiltration, although the presence of the latter may occasionally be suggested because marrow infiltrated by tumor cells is more attenuated compared with normal marrow (Vogler and Murphy 1988). In this meta-analysis, CT was shown to have the lowest sensitivity.

PET may have a great potential advantage in detecting metastasis because the glucose metabolism of lesions generally changes before a physical change or symptoms appear. In the present study, PET was among the modalities with the best sensitivity. However, PET also has limitations. False-positive results occur with PET because FDG accumulates in metabolically active tissue, including sites of inflammation and infection, and some normal tissues with high FDG uptake, such as some muscles, can also lead to false-positive results (Even-Sapir 2005). The specificity of PET is therefore limited. Another limitation of PET is that it lacks anatomic detail, which hinders localization and characterization of increased tracer uptake.

BS is the most commonly used modality for detection of bone metastases because it is widely available and provides visualization of the entire skeleton within a reasonable amount of time and at reasonable cost (Tryciecky et al. 1997; Blake et al. 2001). For BS, benign processes that cause increased bone turnover, such as fractures and degenerative changes, result in a high false-positive rate and decrease the specificity of BS. False-negative findings can occasionally result when pure osteolytic metastases are growing rapidly, when bone turnover is slow or when the site is avascular (Cook and Fogelman 2000; Loeffler et al. 1975). Therefore, despite the usefulness of BS in diagnosing widespread, multifocal lesions, it should never be considered diagnostic when it produces equivocal findings. In the present study, BS was shown to have the lowest specificity on per-lesion basis.

SPECT has been reported to have better diagnostic accuracy than BS (Roland et al. 1995; Sedonja 1999). SPECT is useful in evaluations of complex areas that are extensively surrounded by soft tissue, such as the thoracolumbar spine and pelvis; it can also clarify “hot spots” obtained with other imaging modalities by virtue of its improved contrast resolution. SPECT is also useful for distinguishing benign from malignant lesions. In the present study, we found that SPECT was comparable with MRI and better than BS in sensitivity on both per-patient and per-lesion basis and was the second best modality on per-lesion basis.

Conventional radiography is another appropriate imaging modality and should be obtained for most patients. Common findings on plain radiography include vertebral body collapse, pedicle erosion, osteoblastic and osteolytic lesions and pathologic fracture dislocation (Algra et al. 1992; Asdourian et al. 1990). The disk margins usually are spared, in contrast to the marked disk destruction seen in infectious processes. Bone destruction and substantial sclerosis are reliable indicators of metastases. However, vertebral body collapse can be associated with nonneoplastic lesions in up to 22 % of cases (Wong et al. 1990). To be visible on plain radiography, 30–40 % of the bone must be eroded; therefore, lesions can be missed on plain radiography alone, and detection of a metastasis may be delayed by several months compared with other modalities (Heary and Bono 2001). Because of its much lower sensitivity and obvious disadvantages compared with other modalities, conventional radiography was not included in this meta-analysis. Ultrasound was not a part of this study because there are no data on its role in vertebral metastasis.

On the basis of the results of this meta-analysis, MRI was found to be better than the other techniques on both per-patient and per-lesion basis and was the best modality for the diagnosis of vertebral metastases.