Introduction

Recent years have seen an increase in the use of whole-body MRI for detecting bone involvement in cancers with frequent bone metastasis, like prostate cancer (PCa) [1, 2], and in hematologic malignancies with frequent bone involvement, like multiple myeloma (MM) [2,3,4,5,6]. Whole-body MRI allows early detection of bone metastases and MM lesions by showing bone marrow invasion by malignant cells before bone remodeling occurs and subsequently becomes visible as osteosclerosis or osteolysis on conventional imaging (bone scintigraphy, radiographic skeletal surveys) [7,8,9].

Current guidelines recommend the use of multiple sequences in whole-body MRI, thereby providing a combination of anatomic and functional information [10, 11]. However, the respective diagnostic effectiveness of T1-weighted (T1), short tau inversion recovery (STIR), and diffusion-weighted imaging (DWI) remains unknown, and there is no consensus on the optimal protocol. Hence, heterogeneity exists among whole-body MRI protocols that vary depending on the institution, country, and target cancer [12]. Early studies included only coronal STIR sequences [13]. Most teams currently use coronal or axial T1 or STIR [14,15,16,17], though additional sagittal sequences are often obtained to optimize spinal lesion detection [14, 15, 18]. The addition of DWI sequences in the last decade has added functional information to MRI protocols, and the use of high b-value MR images has increased the sensitivity of the technique for detecting bone and extraskeletal disease [1, 16, 19,20,21].

The determination of the combination of sequences that reaches the best diagnostic accuracy and the elimination of superfluous sequences are key questions for the large-scale implementation of whole-body MRI into clinical practice and for workflow optimization [4, 11]. PCa and MM were chosen for this study because they are two of the most common and best-validated oncologic indications for whole-body MRI. In PCa, whole-body MRI emerged as an imaging method of choice thanks to its superiority over previously used modalities (mainly bone scintigraphy), as validated through multiple studies and meta-analyses [12, 22]. In MM, whole-body MRI is a diagnostic modality recommended by national and international authorities [4, 5, 23]. However, few data exist on the respective effectiveness of T1, STIR, and DWI for detecting bone involvement on whole-body MRI. Knowing the individual performance of these sequences would help build protocols specific to the target malignancy and decrease the time needed for MRI examinations [11, 24].

The purpose of this study was to compare the diagnostic accuracy of whole-body T1, STIR, high b-value DWI, and sequence combinations to detect bone involvement in PCa and MM.

Materials and methods

Patients

This two-center study was approved by our institution’s ethics committee, which did not require informed consent for the retrospective review of prospectively acquired data. Between January and December 2015, 50 consecutive patients with PCa at high risk for metastasis according to published criteria (newly diagnosed cancer with ≥ 20 ng/ml prostate-specific antigen (PSA), Gleason score ≥ 8, Union for International Cancer Control (UICC) clinical T stage 3 or 4, or suspicion of biochemical recurrence with a PSA doubling time ≤ 12 months) [25,26,27] were prospectively enrolled at the Cliniques Universitaires Saint Luc, Brussels, Belgium. During the same period at the Hôpital Lapeyronie, Montpellier, France, 50 consecutive patients with newly diagnosed and histologically proven MM were approached prospectively, and 47 were enrolled (2 participants were excluded because of claustrophobia, and 1 did not complete the MRI examination). All 97 patients were examined using the whole-body MRI protocol described below.

MRI protocol

The MRI studies were performed using 1.5-T MR magnets (PCa cohort: Achieva, Philips Medical Systems, Best; MM cohort: Magnetom Avanto, Siemens Healthineers). Patients were placed on the imaging table headfirst in the supine position and covered with head, neck, and spine coils and two 6-element body matrix coils. After acquiring five stacks of images in the coronal (T1 and STIR) and axial plane (DWI), a single stack of coronal whole-body images was reconstructed using the post-processing software provided by the manufacturers. For DWI, high b-value images (800 s/mm²) were reconstructed in the coronal plane. No contrast medium was administered. The imaging parameters are detailed in Table 1. All images were read on PACS workstations (Carestream Vue; Carestream Health).

Table 1 MR imaging parameters

MRI readings

Two musculoskeletal radiologists with 8 and 12 years of whole-body MRI experience performed all readings, in consensus. To assess interobserver agreement, 20 PCa and 20 MM patients were randomly selected and separately read by both observers.

The presence of bone involvement was assessed using the individual sequences (T1, STIR, DWI), the pairs of sequences (T1-DWI, T1-STIR, STIR-DWI), and all sequences together (T1-STIR-DWI), successively. Individual sequences and combinations of sequences were reviewed separately, in a random order and at 1-month intervals, to avoid any recall bias. The readers were blinded to patient identity, status, and clinical and biological data.
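The seven reading sets listed above (three individual sequences, three pairs, and the full set) are all the non-empty subsets of the three sequences. The following Python sketch enumerates them and draws one random reading order. It is illustrative only: the function names and the fixed seed are assumptions, and the randomization procedure actually used by the readers is not described in the text.

```python
import random
from itertools import combinations

# The three whole-body sequences compared in the study.
SEQUENCES = ("T1", "STIR", "DWI")


def reading_sets(sequences=SEQUENCES):
    """Return every non-empty subset of the sequences as a hyphenated label."""
    labels = []
    for size in range(1, len(sequences) + 1):
        for combo in combinations(sequences, size):
            labels.append("-".join(combo))
    return labels


def randomized_schedule(labels, seed=0):
    """Return the reading sets in a random order (hypothetical helper).

    The seed only makes this illustration reproducible; it is not a
    parameter reported by the study.
    """
    rng = random.Random(seed)
    order = list(labels)
    rng.shuffle(order)
    return order


all_sets = reading_sets()
print(all_sets)
print(randomized_schedule(all_sets))
```

Each entry of the shuffled list would correspond to one reading session, with sessions spaced 1 month apart as described above.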

Determination of bone involvement

The patterns of bone marrow involvement were classified according to widely accepted categories [28, 29]. Normal marrow was defined by homogeneous signal intensity higher than that of discs and muscles on T1, homogeneous low signal intensity on STIR, and very low to absent signal intensity on DWI [14, 30, 31].

A focal bone marrow lesion (focal bone metastasis in PCa or focal plasmocytoma in MM) was defined by low signal intensity on T1 (lower than or equal to the signal intensity of discs or muscles) and intermediate to high signal intensity on STIR. On DWI, a focal lesion was defined by an area with high signal intensity on high-b-value images [7, 32, 33]. To avoid partial-volume artifacts and in accordance with previous recommendations [32], the minimal lesion diameter was 10 mm, corresponding to twice the slice thickness.

Diffuse marrow infiltration (diffuse metastatic disease in PCa and diffuse bone involvement in MM) was defined by the homogeneous low signal intensity of the bone marrow on T1, which was similar to or lower than the signal intensity of discs and muscles, intermediate to high signal intensity of the marrow on STIR, and high signal intensity of the marrow on high-b-value images [7, 14, 31]. A fourth pattern of infiltration, the “salt-and-pepper” pattern, was considered in MM, defined by the presence of innumerable unmeasurable tiny foci with low signal intensity on T1 and intermediate to high signal intensity on STIR [29, 34].

Reference standard

In addition to whole-body MRI, all patients underwent routine examinations. In PCa patients, technetium-99m bone scintigraphy was performed to detect bone metastases, followed by targeted radiographs of equivocal foci of increased uptake if bone scintigraphy was non-diagnostic; abdominopelvic CT was performed for lymph node staging. These examinations were performed at baseline and at the 6-month follow-up evaluation. In MM patients, a radiographic skeletal survey was performed at diagnosis and repeated at the 6-month follow-up evaluation.

The reference standard for bone involvement, defined as the best valuable comparator (BVC), was constructed in consensus by the two readers together with a third reader (a musculoskeletal radiologist with 15 years of experience) and the referring uro-oncologist and hematologist. This BVC included (1) the review of all baseline and follow-up routine imaging examinations and biological and histological data and (2) the consensus reading of all available MRI sequences of a given patient obtained at baseline, along with the prospectively obtained 6-month follow-up examination. This BVC represents the best achievable evidence in the absence of systematic histologic evidence and was used in previous studies [26, 35].

Interpreting false-positive and false-negative findings

False-positive and false-negative findings of any reading were assessed during the consensus reading by three readers and categorized according to published criteria [7, 17, 30, 31]. False-positive findings were categorized as resulting either from benign conditions (degenerative disease, vertebral hemangioma, fracture, focal bone marrow hyperplasia, and diffuse heterogeneous or hyperplastic bone marrow) or from technical causes (susceptibility artifacts and “thoracic spine” artifacts).

False-negative findings were categorized as resulting either from a missed malignancy (sclerotic lesions, poor contrast between lesions and surrounding hypercellular bone marrow, MM lesions with a spontaneously high signal intensity on T1, difficult anatomy like ribs and pelvis) or from technical causes (peripheral location in the explored field and partial volume artifacts).

Statistical analysis

Interobserver agreement between the two readers was assessed in two samples of 20 patients (20 randomly selected PCa patients and 20 randomly selected MM patients) by calculating the weighted Cohen’s κ coefficient for each sequence individually and for each combination. The strength of agreement was interpreted using the Landis and Koch scale as follows: κ < 0, poor agreement; 0–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, good agreement; and 0.81–1.00, very good agreement [36].
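As an illustration of the agreement statistic described above, the following Python sketch computes a linear-weighted Cohen's κ for two readers and maps it onto the agreement labels of the scale. The reader ratings are invented for the example, the function names are hypothetical, and the choice of linear weights is an assumption, since the weighting scheme is not specified in the text. For a binary rating (involvement present or absent), weighted and unweighted κ coincide.

```python
def weighted_kappa(ratings_a, ratings_b, categories):
    """Linear-weighted Cohen's kappa for two readers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions: rows = reader A, columns = reader B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at maximal distance.
        return abs(i - j) / (k - 1) if k > 1 else 0.0

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        # Both readers used one and the same category: kappa is undefined,
        # returned as 1.0 here by convention of this sketch.
        return 1.0
    return 1.0 - d_obs / d_exp


def interpret(kappa):
    """Map kappa onto the agreement labels used in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Hypothetical readings of 10 patients (1 = bone involvement, 0 = none).
reader_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
reader_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

kappa = weighted_kappa(reader_1, reader_2, categories=[0, 1])
print(round(kappa, 2), interpret(round(kappa, 2)))
```

The thresholds in `interpret` are treated as continuous upper bounds, so a value such as 0.805 falls into the highest category.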

A receiver operating characteristic (ROC) analysis was performed to assess the performance of each individual sequence and combination of MR sequences for identifying patients with bone involvement according to the BVC. The area under the ROC curve (AUC), sensitivity, and specificity with 95% confidence intervals (CIs) were calculated. Finally, pairwise comparisons of the AUC values were performed to rank the individual MR sequences and combinations of MR sequences according to diagnostic accuracy, using a chi-squared test of the equality of the areas under the ROC curves [37]. A p value < 0.05 was considered statistically significant for all tests. All tests were performed using MedCalc version 12.7 software.
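The per-patient metrics described above can be sketched in a few lines of Python. The counts below are invented for illustration, and the function names are hypothetical. The Wilson score interval is used only because it needs nothing beyond the standard library; the interval method applied by the statistical software in the study is not stated in the text, so the numbers produced here should not be expected to match Table 3. For a single dichotomous reading (involvement present or absent), the ROC curve has one operating point and its area reduces to the mean of sensitivity and specificity. The chi-squared comparison of AUCs [37] is not reproduced.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, and single-threshold AUC from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": specificity,
        "specificity_ci": wilson_interval(tn, tn + fp),
        # With one operating point, the trapezoidal AUC is (Se + Sp) / 2.
        "auc": (sensitivity + specificity) / 2.0,
    }


# Hypothetical counts: 20 patients with and 10 without bone involvement.
summary = diagnostic_summary(tp=18, fp=1, fn=2, tn=9)
for name, value in summary.items():
    print(name, value)
```

Applied to each reading set in turn, a function of this kind yields one row of a performance table of the type shown in Table 3.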

Results

Demographics and disease status according to the reference standard

Fifty PCa patients at high risk for metastasis were enrolled (50 men; mean age, 67 ± 10 years; range, 59–87 years). Forty patients had newly diagnosed disease with high risk of metastasis based on a Gleason score ≥ 8 and/or ≥ 20 ng/ml PSA; 10 patients had PSA recurrence with a PSA doubling time ≤ 12 months after radical treatment or were receiving androgen-deprivation therapy. According to the reference standard, 38 of 50 patients (76%) had bone metastases (Fig. 1). Of these 38 patients, 34 (89.5%) had focal lesions and 4 (10.5%) had diffuse bone marrow involvement.

Fig. 1

Whole-body MR examination in a 50-year-old man with newly diagnosed prostate cancer illustrates agreement between sequences. a Coronal T1-weighted and b STIR images, and c reconstructed coronal maximum intensity projection (MIP) view from DWI (inverted grayscale, b = 800 s mm−2) show multiple areas typical for bone metastases (arrows)

Forty-seven patients with histologically proven, newly diagnosed MM were enrolled (27 women and 20 men; mean age, 62.5 ± 9 years; range, 47–90 years). According to the reference standard, 31 of 47 patients (66%) had bone marrow involvement on MRI (Fig. 2). Among these patients, 23 (74%) had focal lesions, 5 (16%) had diffuse bone marrow involvement, and 3 (10%) had a salt-and-pepper pattern.

Fig. 2

Whole-body MR examination in a 60-year-old man with newly diagnosed multiple myeloma illustrates agreement between sequences. a Coronal T1-weighted and b STIR images, and c reconstructed coronal MIP view from DWI (inverted grayscale, b = 800 s mm−2) show multiple areas of low signal typical for myeloma foci (arrows)

Interobserver agreement

In the subset of 20 PCa patients, interobserver agreement for the detection of bone metastases was very good on all sequences. In the subset of 20 MM patients, interobserver agreement for the detection of bone involvement ranged from good to very good across sequences (Table 2).

Table 2 Reproducibility of MRI readings in 20 randomly selected patients with prostate cancer and 20 randomly selected patients with multiple myeloma

Diagnostic performance

Results on diagnostic performance of sequences and combinations of sequences for PCa and MM are detailed in Table 3.

Table 3 Diagnostic performance of individual sequences and combination of sequences for detecting bone involvement in 50 prostate cancer patients and 47 multiple myeloma patients

For PCa, the highest performance (Se = 100%, 95% CI [90.5–100.0]; Sp = 100% [75.3–100.0]; AUC = 1.00 [0.93–1.00]) was achieved by the combinations of T1-DWI and T1-STIR and the combination of all sequences read together. There was no statistically significant difference between protocols (all p ≥ 0.07).

For MM, the highest performance was achieved by the combination of all sequences (Se = 100%, 95% CI [88.4–100.0]; Sp = 94.12% [71.3–99.9]; AUC = 0.97 [0.87–0.99]). The reading of all sequences combined performed significantly better than the reading of any individual sequence or pair of sequences (all p ≤ 0.04), except for the combined reading of T1-DWI (p = 0.49). The pair T1-DWI was superior to the T1 and DWI sequences read individually (p = 0.01 and p = 0.03, respectively).

False-positive and false-negative findings

Table 4 details the number and causes of the false-positive and false-negative findings. In PCa patients, T1, DWI, and STIR-DWI each yielded one false positive. In MM patients, T1 yielded six false positives, DWI yielded five false positives, and STIR, T1-STIR, and STIR-DWI each yielded three false positives (Fig. 3). T1-DWI and T1-STIR-DWI each yielded one false positive.

Table 4 Analysis of the false-positive and false-negative findings of using whole-body MRI to detect bone involvement in 50 metastatic prostate cancer and 47 multiple myeloma patients

Fig. 3

Whole-body MR examination in a 73-year-old man with newly diagnosed myeloma illustrates false-positive findings of the STIR and DWI sequences. a Coronal T1-weighted MR image shows a high signal intensity rounded focus typical for a vertebral hemangioma (arrow in a). b Coronal STIR image and c reconstructed coronal MIP view from DWI (inverted grayscale, b = 800 s mm−2) show rounded area of intermediate signal on STIR (arrow in b) and impeded diffusion (arrow in c) misinterpreted as a myeloma focus on both sequences

In PCa patients, STIR yielded three false negatives (Fig. 4). STIR-DWI yielded two false negatives. DWI yielded one false negative (Fig. 5). In MM patients, T1 yielded five false negatives. DWI, STIR, T1-STIR, T1-DWI, and STIR-DWI each yielded two false negatives. T1-STIR-DWI yielded no false negative.

Fig. 4

Whole-body MR examination in a 50-year-old man with newly diagnosed prostate cancer illustrates false-negative finding of the STIR sequence. a Coronal T1-weighted MR image and c reconstructed coronal MIP view from DWI (inverted grayscale, b = 800 s mm−2) show area of low signal on T1 (arrow in a) and impeded water diffusion on DWI (arrow in c) typical for bone metastasis. b This lesion was not detected on the coronal STIR MR image

Fig. 5

Whole-body MR examination in a 72-year-old man with newly diagnosed prostate cancer illustrates a false-negative finding of the DWI sequence. a Coronal T1-weighted and b STIR images show a supracentimetric area of low signal intensity within a midthoracic vertebra, considered a sclerotic metastasis (arrow in a and b). c Reconstructed coronal MIP view from DWI (inverted grayscale, b = 800 s mm−2) shows a signal void (arrow in c) that was not interpreted as a lesion; the sequence was therefore considered negative

Discussion

Our study suggests that, for detecting bone marrow involvement caused by PCa, the performance of a combination of two sequences (T1-DWI or T1-STIR) is similar to that of the entire set of sequences. In MM, T1-STIR-DWI achieved the best diagnostic accuracy but was not significantly superior to T1-DWI.

In PCa, individual sequences already had high diagnostic value, and either T1-DWI or T1-STIR was sufficient for detecting bone metastases, with no added value provided by a combined reading of the three sequences (T1-STIR-DWI). Among the individual sequences, STIR had the lowest sensitivity, as sclerotic, fibrotic, and poorly hydrated bone metastases from PCa may be occult on STIR [20]. Difficulty in detecting sclerotic lesions has also been reported for DWI because of their low apparent diffusion coefficient [24]. Our results therefore suggest that T1-DWI or T1-STIR achieves the same performance as the full set of three sequences in whole-body MRI protocols used to assess PCa. T1-DWI might be preferable to T1-STIR because it combines a morphologic sequence (T1) with a functional sequence (DWI) and because DWI also allows the detection of abnormal lymph nodes [14]. False-positive findings of DWI used alone have been reported; they explain the lower specificity of the technique, which contrasts with its high sensitivity, and the need to correlate observations made on DWI with anatomic sequences [38].

In MM patients, the individual sequences showed significantly lower diagnostic performance than the combination of sequences in detecting bone involvement, with T1 showing the lowest diagnostic value. T1-STIR and STIR-DWI combinations showed significantly lower performance than the combination of T1-STIR-DWI. T1-STIR-DWI achieved the best diagnostic value, but was not significantly superior to the combination of T1-DWI. There are several explanations for the reduced accuracy of individual MRI sequences for diagnosing MM, even when used in combination. MM may be occult at diagnosis, and bone marrow will appear normal on MRI, in 50%–75% of patients with an untreated, early form of MM, and in more than 20% of patients with advanced MM. This normal bone marrow appearance is noted as subtle diffuse infiltration on histological analyses and, along with the salt-and-pepper pattern, is classically observed in patients with a lower tumor burden than patients with diffuse or focal marrow involvement [34]. Moreover, MM involvement may be confused with normal findings because diffuse abnormalities can mimic normal marrow even on DWI, especially when the abnormalities are homogeneous. Finally, MM lesions may present a spontaneously high signal intensity on T1, leading to the poor detection of lesions within the high signal intensity of bone marrow, making T1 less effective at detecting bone involvement in comparison with other cancers; this is supported in our study by the lower diagnostic performance of the T1 sequence in MM patients compared to PCa patients.

Regarding the false-positive findings, a variety of nonmalignant conditions may present as bone marrow replacement with low signal intensity on T1, intermediate to high signal intensity on STIR, or high signal intensity on high-b-value DWI because of impeded diffusion or the T2 shine-through effect, which makes it difficult to distinguish these conditions from malignant disease. Hemangiomas, degenerative joint disease, bone marrow edema caused by fractures, and benign bone marrow hyperplasia were observed in this series, as in previous studies [17]. The addition of T1 or STIR to DWI reduced the number of false-positive observations in both PCa and MM patients, although not significantly, probably because of the limited sample size. In particular, the false-positive observations noted on the high-b-value DWI images were mitigated by interpreting DWI in combination with morphological imaging, especially T1: fewer false-positive findings were found using T1-DWI than using DWI alone [7, 31].

Regarding the false-negative findings, some lesions were missed because of the lack of contrast between the lesion and surrounding normal bone marrow (e.g., the relatively high signal intensity of some MM lesions on T1, the low signal intensity of some sclerotic bone metastases from PCa on STIR and DWI), or because the lesion was located in anatomic areas difficult to interpret (e.g., the thoracic cage and spine because of motion artifacts, or lesion location at the periphery of the explored field). Again, the combination of sequences minimized false-negative observations, supporting the recommendation to acquire both anatomic and functional sequences in whole-body MRI studies.

Our study has several limitations. First, our sample size may have been too small to confirm some comparisons. Studies with a larger number of patients are being conducted in a multicenter setting to validate the current results, also with the intent to improve the “ground truth” (comparison with other cutting-edge modalities, especially PET). Second, body coverage (head to upper thigh) was limited, although the risk of missing significant peripheral metastases in patients with lesions in the central skeleton is very low [32]. Third, we deliberately chose two very different pathologies in order to study the value of whole-body MRI to detect bone lesions: a solid cancer (PCa) and a hematologic cancer (MM). Our findings should not be generalized beyond these conditions. The value of a limited T1-DWI approach should be evaluated in other malignancies, such as breast cancer [21]. Fourth, our results obtained on 1.5-T MRI magnets should also be verified on 3-T scanners. Fifth, we did not assess the diagnostic contribution of low b-value DWI images and apparent diffusion coefficient (ADC) maps. In daily practice, DWI should be interpreted using low and high b-value images and ADC maps. These maps are used for the evaluation of treatment response in bone disease on serial MR examinations [39]. Their availability is likely to affect the diagnostic specificity of DWI (e.g., recognition of false-positive findings by identifying the T2 shine-through effect). Not using them might negatively impact the specificity of DWI images read alone and of combined STIR-DWI reading, but not of combined T1-DWI and T1-STIR-DWI readings, where anatomic T1 sequences increase the specificity [40]. Finally, the well-known diagnostic performance of the MRI sequences (alone or in combination) for detecting tumoral involvement in the lymph nodes or other organs (especially for PCa) was not assessed.

In summary, this study suggests that T1-DWI and T1-STIR are each sufficient to detect bone metastases in PCa. In MM, the combination T1-STIR-DWI has significantly higher diagnostic performance than every individual sequence and every pair of sequences except T1-DWI. Together, these findings suggest the value of a combined T1-DWI approach for bone screening in PCa and MM.

Of note, the use of DWI instead of STIR in PCa may have additional advantages in the perspective of a “one-step, whole-body, all-organ staging” of PCa. First, DWI may help detect the highly common nodal involvement, keeping in mind the limited diagnostic value of a technique relying on size criteria. Second, it may help detect visceral metastases, although these are relatively rare. Third, it contributes to the evaluation of local disease in the prostatic bed, where the use of high b-value images and ADC is crucial [41]. Finally, both in PCa and in MM, the availability of DWI images and ADC maps would allow comparisons of individual lesion characteristics and global tumor load between examinations performed before and after treatment [39, 42].