Introduction

Bone marrow edema (BME) is pathologically associated with microfracture and hemorrhage in the trabecular bone, serving as a biomarker for bone injury [1]. The detection of BME can further elucidate the mechanism and extent of occult bone injuries, while also assisting with management, follow-up, and prognostication [1]. The superior contrast resolution of MRI for BME has established it as the standard of care for BME detection [2]. However, concerns about cost and about delays in diagnosis related to MR access and safety have prompted arguments for alternative diagnostic options [3]. Advances in dual-energy computed tomography (DECT) technology, including three-material decomposition techniques that allow digital subtraction of materials with a relevant photoelectric effect such as iodine and calcium, are establishing DECT as a potential imaging alternative to MRI when evaluating for BME [4,5,6].

Recent meta-analyses show DECT has a high diagnostic accuracy for detecting BME, particularly in the axial skeleton [4,5,6]. As such, DECT is becoming a recommended imaging modality for BME. However, location is a significant cause for variability with lower accuracies identified in joints of the appendicular skeleton [4]. The purpose of this systematic review and meta-analysis was to evaluate the diagnostic accuracy of DECT for detecting BME in the appendicular skeleton.

Methods

This systematic review and meta-analysis was performed according to current best practices and reported using guidance from the Preferred Reporting Items for Systematic Reviews and Meta-Analysis – Diagnostic Test Accuracy (PRISMA-DTA) guidelines [7,8,9]. A pre-established protocol was designed and submitted to the PROSPERO database prior to the initiation of the review (CRD42020168477). This analysis was exempt from ethical approval as it included only de-identified data with individual studies acquiring ethical approval from their home institution where necessary.

Literature search

A search of Ovid MEDLINE, Ovid EMBASE, Scopus, and the Cochrane Library (including the Cochrane Central Register of Controlled Trials, Cochrane Central Register of Protocols, and the Cochrane Database of Systematic Reviews) was performed from inception to January 31, 2020, to identify studies which used DECT for the detection of bone marrow edema in the appendicular skeleton. Variations of title/abstract/keywords and medical subject heading terms including “dual energy” AND “computed tomography” AND “bone” AND “edema” were modified dependent on database. Individual search strategies by database are outlined in Appendix Table 6. No language restrictions were applied and translation was performed when required. The search was completed according to best practices for electronic search strategies [10]. Articles from each database were then combined and duplicate articles were removed from the list. Titles and abstracts were screened for relevance and full-text review for potentially relevant studies was then performed by two reviewers with 0 and 6 years of experience with musculoskeletal imaging (K.L. and M.P.W.). Discrepancies in both processes were re-reviewed with consensus achieved between reviewers. A gray literature search was also performed by one author (M.P.W.), evaluating the most recent 2 years of conference proceedings from the Radiological Society of North America (RSNA) and the European Congress of Radiology (ECR) in addition to the most recent annual meeting for the American Roentgen Ray Society (ARRS). Conference abstracts which satisfactorily met the inclusion criteria and were not subsequently published were included in analysis. Finally, reference lists for key studies were checked and forward-searching of these key studies was performed in Google Scholar.

Selection criteria

All original articles evaluating the diagnostic accuracy of DECT for the detection of bone marrow edema in the appendicular skeleton compared with an MRI and/or clinical outcome reference standard were evaluated with full-text review. The sacroiliac joint was included as part of the appendicular skeleton if edema was evaluated in the iliac wing. Studies required sufficient data to reconstruct a 2 × 2 contingency table and authors were contacted via email when insufficient information was available. Studies were then included if sufficient information was provided by the corresponding author.

Based on a pre-established protocol, studies were excluded from the analysis if (1) fewer than 10 lesions were included in the assessment, (2) the study did not use DECT as the index test, (3) the target condition was not bone marrow edema, (4) the study evaluated bone marrow edema in the axial skeleton (including the skull, spine, sacrum, and/or coccyx), (5) studies and/or regions without bone marrow edema were not included, (6) a reference standard other than MRI and/or clinical outcome was used, or (7) the article was non-original research (including review articles, guidelines, consensus statements, and letters to the editor).

Data extraction

The relevant data from included studies were extracted independently by two separate reviewers. One reviewer with 6 years of experience in musculoskeletal imaging (M.P.W.) evaluated all included studies. Two additional reviewers with 0 and 6 years of experience in musculoskeletal imaging (K.L. and D.N.) each reviewed half of the included studies independently of the first reviewer. The datasets were then compared and discrepancies re-reviewed by consensus between the two more experienced reviewers (M.P.W. and D.N.). Patient characteristics were recorded including the total number of patients, mean age, number and percentage of male sex, total number of patients with bone edema (or total number of regions with edema where bones were segmentally evaluated), site of pathology, and study indication. Study characteristics recorded included first author, year of publication, country of publication, prospective versus retrospective study design, single versus multicenter data acquisition, reported reference standard, time between trauma and DECT (days), time between DECT and MRI (days), number of readers, presence of consensus reading, and reader experience. Details of DECT characteristics including brand, low and high tube voltages (kV), pre-defined tube current-time product(s) (mAs), use of a tin filter, collimation (mm), rotation time (seconds), slice width (mm), reconstruction kernel(s), post-processing imaging, and evaluation method were recorded when reported in the primary study. Finally, true positive (TP), false negative (FN), true negative (TN), false positive (FP), sensitivity, specificity, and accuracy were recorded for each assessment. If the study did not report TP/FN/TN/FP results directly but reported the sensitivity and specificity in combination with total patients/regions and total patients/regions with bone marrow edema as determined by the reference standard, the TP/FN/TN/FP results were calculated. Results were averaged when data for multiple reviewers performing qualitative analysis were reported within a single study [11]. Data were collected in Microsoft Excel v15.14.
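The back-calculation of a 2 × 2 table from reported sensitivity and specificity can be illustrated with a short sketch. The function name, the rounding to the nearest whole count, and the example numbers are assumptions made for illustration; the review does not describe the exact arithmetic used.

```python
def reconstruct_counts(sensitivity, specificity, n_total, n_positive):
    """Back-calculate TP, FN, TN, and FP from reported accuracy measures.

    sensitivity and specificity are proportions in [0, 1]; n_positive is the
    number of patients/regions with BME according to the reference standard.
    Rounding to the nearest integer is an assumption, not the review's method.
    """
    if not 0 <= n_positive <= n_total:
        raise ValueError("n_positive must lie between 0 and n_total")
    n_negative = n_total - n_positive
    tp = round(sensitivity * n_positive)  # edema present and detected
    fn = n_positive - tp                  # edema present but missed
    tn = round(specificity * n_negative)  # edema absent and correctly ruled out
    fp = n_negative - tn                  # edema absent but reported
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Hypothetical study: 100 regions, 40 with BME on the reference standard.
counts = reconstruct_counts(0.90, 0.95, n_total=100, n_positive=40)
print(counts)
```

Because reported sensitivities and specificities are usually rounded, the reconstructed counts should be checked against any totals the primary study does report.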

Risk of bias assessment

The risk of bias and applicability of each study were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [12]. The QUADAS-2 tool is used in diagnostic test accuracy systematic reviews to evaluate individual studies for potential sources of bias and concerns regarding applicability by assessing four separate domains including (1) patient selection, (2) index test(s), (3) reference standard, and (4) flow and timing. Studies with a high-risk evaluation for any single signaling question in a domain were considered “high risk of bias” for that domain.
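The domain-level rule described above can be sketched as a small function. Only the "any high-risk signaling question makes the domain high risk" step comes from the text; the handling of "unclear" and "low" ratings, the names, and the example ratings are assumptions for illustration.

```python
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def domain_rating(signaling_ratings):
    """Collapse signaling-question ratings ("low", "high", "unclear") into one rating."""
    if "high" in signaling_ratings:
        return "high"     # rule stated in the text
    if "unclear" in signaling_ratings:
        return "unclear"  # assumption: unclear outranks low
    return "low"


# Hypothetical ratings for one study, keyed by QUADAS-2 domain.
example_study = {
    "patient selection": ["low", "low", "unclear"],
    "index test": ["low", "high"],
    "reference standard": ["low", "low"],
    "flow and timing": ["low", "low", "low"],
}
summary = {domain: domain_rating(example_study[domain]) for domain in DOMAINS}
print(summary)
```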

Data analysis

At least one 2 × 2 contingency table was developed for each individual study. For studies where different types of performance were evaluated and reported (e.g., qualitative evaluation of BME with a binary or multi-point grading scale and quantitative comparison of average Hounsfield units (HU) in regions of interest with and without BME), multiple contingency tables were developed for an individual study. Meta-analysis was performed using a bivariate random-effects model. The level of analysis followed the reporting of the individual study (per-lesion or per-region basis), and both levels were included in the same model. Summary sensitivities, specificities, and area under the ROC curve values were evaluated.
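To give a feel for how per-study 2 × 2 tables become summary estimates, the sketch below pools sensitivity and specificity separately with a univariate DerSimonian-Laird random-effects model on the logit scale. This is a deliberately simplified stand-in and not the bivariate model used in the review, which estimates both measures jointly and accounts for their correlation. The function names, the 0.5 continuity correction, and the example tables are assumptions.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (estimate, lower 95% limit, upper 95% limit) as proportions.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_c, ne_c = e + 0.5, (n - e) + 0.5  # continuity correction (assumption)
        ys.append(math.log(e_c / ne_c))     # logit of the corrected proportion
        vs.append(1.0 / e_c + 1.0 / ne_c)   # approximate variance of the logit
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return tuple(inv_logit(x) for x in (y_re, y_re - 1.96 * se, y_re + 1.96 * se))


# Hypothetical per-study tables as (TP, FN, TN, FP); not data from the review.
tables = [(18, 2, 27, 3), (25, 5, 40, 2), (12, 3, 30, 1)]
sens = pool_logit_dl([t[0] for t in tables], [t[0] + t[1] for t in tables])
spec = pool_logit_dl([t[2] for t in tables], [t[2] + t[3] for t in tables])
print("pooled sensitivity (estimate, lower, upper):", sens)
print("pooled specificity (estimate, lower, upper):", spec)
```

Pooling the two measures separately ignores the trade-off between sensitivity and specificity across thresholds, which is the main reason a bivariate model is preferred for diagnostic test accuracy data.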

Inter-study variability was assumed and multivariable meta-regression was planned to explore for potential causes; variables that were homogeneous at the study level were chosen in order to minimize “ecological bias” [7]. Statistical analysis of the variability and publication bias were not performed based on the current recommendations from the PRISMA-DTA group [8]. Statistical significance for differences between groups was defined as p < 0.05. Analysis was conducted using the “mada” package for R version 3.6.2 (The R Project for Statistical Computing).

Results

Literature search

The literature search PRISMA flow diagram is demonstrated in Fig. 1. Forty articles comprising both conference abstracts and full-text articles were reviewed and twenty studies were included in the analysis [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32] with 18 studies evaluating the qualitative performance [13,14,15,16,17,18, 20,21,22,23,24,25, 27,28,29,30,31,32] and 8 studies evaluating the quantitative performance of DECT [15, 16, 18, 19, 21, 26, 29, 32]. One study was excluded for using a CT reference standard [33].

Fig. 1 PRISMA flow diagram showing the screening and selection of studies included in the systematic review

Patient, study, and DECT characteristics

A total of 790 patients were included in the review. The mean age ranged between 23 and 80 years and male sex ranged from 13 to 79% of patients depending on the study. Two studies evaluated BME in the wrist or hand [22, 25], 1 study evaluated the sacroiliac joint [32], 5 studies evaluated the hip and/or pelvis [21, 24, 27,28,29], 8 studies evaluated the knee [13, 14, 16, 17, 19, 23, 26, 31], 3 studies evaluated the ankle or hindfoot [15, 18, 20], and 1 study evaluated multiple joints [30]. Seventeen studies evaluated only post-traumatic patients and 3 studies evaluated non-traumatic patient populations including patients with rheumatoid arthritis with active clinical synovitis [22], hip pain but no prior trauma [29], and symptomatic axial spondyloarthropathy [32]. One study evaluated both post-traumatic patients and patients with chronic pain [18]. Four studies reported the time between trauma and DECT [13, 14, 17, 30] ranging from acutely after trauma [23] to within 100 days of trauma [13]. Patient characteristics of individual included studies are shown in Table 1.

Table 1 Patient characteristics of individual included studies

Study characteristics of individual studies included are detailed in Table 2. Three studies were performed in North America, 9 studies were performed in Europe, and 7 studies were performed in Asia. Studies were equally mixed between prospective and retrospective study designs. All studies were performed at a single academic center. MRI alone was the reference standard for 17 studies. Two studies used a mix of MRI and clinical/surgical follow-up in suspected hip fractures following trauma [21, 28], and one study evaluated patients with suspected hip fractures with a 30-day surgical and/or clinical outcome reference standard [24]. All but one study used multiple readers for evaluation. Studies primarily used board-certified radiologists with several years of experience for evaluation.

Table 2 Study characteristics of individual included studies

DECT characteristics of individual studies included are shown in Table 3. All but one reporting study used a Siemens DECT. Most studies used 80 kV for low voltage and 140 kV for high voltage. Variable current-time products were used dependent on the study. Most studies used a tin filter for higher voltage imaging. Studies used a mix of soft tissue and bone kernels with most studies reporting the use of a three-material decomposition algorithm in post-processing. Thirteen of the reporting studies used a binary evaluation method, with 7 studies reporting the use of a multi-point scale.

Table 3 Dual-energy CT characteristics of individual included studies

Diagnostic accuracy of DECT for bone marrow edema in the appendicular skeleton

Performance results for individual studies by the site of pathology are presented in Appendix Table 7. The pooled and weighted sensitivity, specificity, and area under the ROC curve (AUC) of DECT for detecting BME in the appendicular skeleton were 86% (95% confidence interval [CI] 82–89%), 93% (95% CI 90–95%), and 0.95, respectively (Figs. 2, 3, and 4). Nearly all studies reporting inter-observer and intra-observer agreement included kappa-statistic values within the substantial to perfect agreement range [34]. Two studies demonstrated lower inter-observer agreement when evaluating the knee joint [14] and carpal bones specifically [25].

Fig. 2 Forest plots of primary study sensitivity of dual-energy CT for detecting bone marrow edema in the appendicular skeleton

Fig. 3 Forest plots of primary study specificity of dual-energy CT for detecting bone marrow edema in the appendicular skeleton

Fig. 4 Summary area under the ROC curve of dual-energy CT for detecting bone marrow edema in the appendicular skeleton

Meta-regression was performed evaluating qualitative versus quantitative analysis, trauma versus non-trauma study indication, upper versus lower extremity location, and low versus high or unclear risk of bias studies (Table 4). Quantitative analysis had a statistically significantly higher sensitivity (p = 0.01) but no significant difference in specificity (p = 0.28) compared with qualitative analysis. No other differences in sensitivity or specificity were identified on meta-regression.

Table 4 Bivariate random-effects multivariable meta-regression evaluating for causes of variability among studies

Risk of bias assessment

The results of QUADAS-2 assessment for risk of bias and applicability in individual studies are shown in Table 5. Studies were predominantly low risk or unclear risk of bias across domains for both risk of bias and applicability. Studies were a mix of low risk and/or high risk of bias for index test dependent on the evaluation method. Where qualitative evaluation methods were pre-specified, risk of bias for index test was deemed low. Where performance of quantitative evaluation of continuous variables was assessed with a retrospective cut-off value, studies were deemed high risk of bias for index test. One study was deemed high risk of bias in the reference standard domain for using 3 different MRI magnet strengths (1 T, 1.5 T, and 3 T) [27]. Two studies were deemed high risk of bias in the flow and timing domain for using maximum intervals between DECT and MRI studies over 2 weeks’ time [13, 29].

Table 5 Results of QUADAS-2 assessment for risk of bias and applicability in individual studies. Studies using quantitative assessment with a retrospective cut-off value were deemed high risk of bias in index test. Studies using a prospective qualitative analysis and a retrospective cut-off for quantitative analysis were deemed low risk and high risk, respectively, for index test depending on the type of analysis [15, 16, 18, 21, 29, 32]

Discussion

This meta-analysis demonstrates that the sensitivity (86% [95% CI 82–89%]), specificity (93% [95% CI 90–95%]), and AUC (0.95) of DECT for detecting BME in the appendicular skeleton compared with an MRI and/or clinical outcome reference standard are similar to or better than those of prior meta-analyses largely evaluating the axial skeleton [4,5,6]. These findings counter a prior meta-regression analysis by Li et al indicating that bone position in the appendicular skeleton, notably the ankle in their analysis, was likely to contribute to variability when compared with studies including the axial skeleton [4]. This meta-analysis therefore contributes to the growing evidence suggesting that DECT can be a viable alternative imaging modality to MRI for BME, with predominantly substantial to perfect inter-observer agreement, despite an appendicular location.

This meta-analysis used a rigorous methodological design and is not subject to the small number of included studies seen previously [5, 6]. Meta-regression of key features thought to potentially account for presumed variability among studies was performed, but identified only one potential source of variability. A higher sensitivity and similar specificity were identified with quantitative rather than qualitative analysis. If this difference represents a true result, it may suggest that quantitative analysis offers a better ability to detect regions with BME while preserving specificity, compared with qualitative assessment. However, given that a similar effect was not identified for specificity, and numerous potential confounding variables were not included in the regression model, this result should be interpreted with caution. Some alternative patient- and study-specific causes which were not explored but may contribute to variability include age, specific location or region of bone marrow edema, bone size, differences in reader experience, and/or differences in the reference standard. The studies included in the analysis did use many similar DECT imaging characteristics, decreasing the likelihood that index test parameters contribute to variability.

There may also be some limitations in the generalizability of the findings of this review to non-expert readers. Most included studies used only board-certified radiologists with several years of experience for qualitative interpretation of BME, which could result in higher sensitivity and specificity values than are seen in general practice. In a study with multiple reviewers evaluating DECT for the presence of BME in patients with osteoporotic vertebral compression fractures, the most experienced reader demonstrated near MRI accuracy for BME while less experienced readers did not [35].

Despite the limitations regarding unexplored confounding variables impacting accuracy, as well as risk of bias in several studies, this meta-analysis indicates that DECT may be an alternative to MR imaging for evaluating BME in the appendicular skeleton when MR is not available or contraindicated. Further exploration of sources for variability, potentially through an individual patient data meta-analysis, would be helpful to increase the confidence of the generalizability of these results.