Introduction

Differentiation between malignant and benign soft tissue tumours is a commonly encountered problem in daily clinical practice. Some benign soft tissue tumours can be correctly diagnosed with standard magnetic resonance imaging (MRI). However, for soft tissue tumours with a nonspecific imaging appearance, standard MRI is often not reliable for distinguishing malignant from benign soft tissue tumours [1–5]. Diagnostic accuracy of standard MRI for distinguishing malignant and benign soft tissue tumours has been reported with a wide range (50–85 %) [1–6]. On dynamic contrast-enhanced MRI, rapid arterial enhancement followed by a plateau or washout phase has been reported to favour malignancy [7]. There have been inconsistent reports [8–14] using diffusion-weighted imaging (DWI) at 1.5 T for differentiation of malignant from benign soft tissue tumours. Benign lesions such as lipoma, epidermal inclusion cyst, and localized tenosynovial giant cell tumour (giant cell tumour of tendon sheath) can have low apparent diffusion coefficients (ADCs), while malignant tumours with myxoid or chondroid components have high ADCs [10, 11, 15]. However, the diagnosis can be easily made based on standard MRI in most cases of lipoma, epidermal inclusion cyst, and localized tenosynovial giant cell tumour. That said, various extracellular substances including haemorrhages, mineralization, and fat present in soft tissue tumours can cause wide variation in ADCs [16]. Therefore, we hypothesized that correlation of quantitative analysis with qualitative analysis on DWI and standard MRI could help differentiate between malignant and benign soft tissue tumours. The purpose of our study was to retrospectively determine the value of adding DWI to standard MRI to differentiate malignant from benign soft tissue tumours at 3.0 T.

Materials and methods

The study was approved by our institutional review board, and the requirement for informed consent was waived for this retrospective study.

Patient population

From June 2010 to August 2013, a total of 364 patients underwent 3.0 T MRI including DWI for soft tissue tumours in our institution. The MR images of 58 patients were excluded for various reasons (Fig. 1). Of the remaining 306 patients, 109 underwent pathologic confirmation. We excluded well-differentiated adipocytic tumours (n=17) such as lipomas and well-differentiated liposarcomas, because DWI was performed using a single-shot, spin-echo echo-planar imaging sequence with fat suppression [11]. Non-neoplastic lesions (n=17) such as ganglion cysts or epidermal inclusion cysts, and metastases (n=12) were also excluded [15]. Thus, 63 patients (mean age, 51 years; age range, 17–90 years; 35 men and 28 women) with 34 malignant and 29 non-malignant soft tissue tumours were included in the study. Table 1 shows the histological types of the included cases.

Fig. 1 Flow diagram of the study. DWI, diffusion-weighted imaging

Table 1 Histological types of included soft tissue tumours

MRI protocols

MRI was obtained before surgery in all patients. MRI was performed using a 3.0 T system (Verio; Siemens Medical Solutions, Erlangen, Germany) with a phased-array coil or an eight-channel extremity coil, depending on the anatomic region. The standard MRI protocol included a longitudinal fat-suppressed T2-weighted turbo spin-echo (TSE) sequence, an axial T1-weighted TSE sequence, axial T2-weighted TSE sequences with and without fat suppression, and longitudinal and axial fat-suppressed contrast-enhanced T1-weighted TSE sequences. Other parameters are shown in Table 2. Before contrast enhancement, a single-shot spin-echo echo-planar DWI sequence was obtained in the axial plane. Parallel imaging using GRAPPA (GeneRalized Autocalibrating Partially Parallel Acquisitions) was applied with an acceleration factor of 2. Sensitizing diffusion gradients were applied sequentially in the x, y, and z directions with b values of 0, 300, 800, and 1400 sec/mm2 [15, 16]. Pixel-based ADC maps were created by mono-exponential calculation from DWI using commercial software and a workstation (Leonardo MR Workplace; Siemens Medical Solutions, Erlangen, Germany).
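The mono-exponential model behind an ADC map, S(b) = S0 · exp(−b · ADC), can be illustrated with a short Python sketch. The fitting procedure of the vendor software is not described in this section, so the log-linear least-squares fit, the function name fit_adc, and the synthetic signal values below are assumptions for illustration only.

```python
import math

def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/sec) as the negative least-squares slope of ln(S) versus b.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC)
    and strictly positive signal intensities.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching b values and signals")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx

# b values from the protocol (sec/mm^2); the voxel itself is hypothetical.
b_values = [0, 300, 800, 1400]
true_adc = 1.2e-3  # mm^2/sec, hypothetical ground truth
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]

adc = fit_adc(b_values, signals)
adc_um2 = adc * 1e6  # 1 mm^2 = 1e6 um^2, so this is in um^2/sec
```

With two b values (for example 0 and 800 sec/mm2) the same function reduces to the familiar two-point formula ln(S0/S800)/800.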

Table 2 MR imaging parameters

MRI analysis

To assess the added value of DWI for differentiating between malignant and benign soft tissue tumours, the diagnostic performances of standard MRI (step 1) and standard MRI and DWI combined (step 2) were compared. For each step, two readers (W.H.J., S.Y.L. with 16 and 5 years of experience in musculoskeletal radiology, respectively) retrospectively interpreted MR images independently regarding malignancy. Malignancy was assessed with a five-level confidence score: 0, definitely benign; 1, probably benign; 2, indeterminate; 3, probably malignant; and 4, definitely malignant. The readers were blinded to the imaging reports, clinical history, and results of pathologic examination.

For the second step, using both standard MRI and DWI, the same radiologists again determined whether each case was malignant or benign. To prevent recall bias, the second step was performed six weeks after the first step and in a random order different from that of the first step.

In qualitative analyses of DWI, the signal characteristics of the solid portion of the soft tissue tumours were independently evaluated by the two readers. The solid portion of the tumour was selected after correlation with standard MRI, and sites of haemorrhage, necrosis, or calcification were carefully avoided. By comparison with the signal intensity of normal skeletal muscle, the signal of the soft tissue tumours was graded from 1 to 4 (1, hypointense relative to skeletal muscle; 2, isointense relative to skeletal muscle; 3, hyperintense relative to skeletal muscle and hypointense to fluid; and 4, isointense relative to fluid).

In quantitative analyses, the minimum value of ADC (ADCmin) and an average value of ADC (ADCav) were independently measured by the two readers. The ADCmin was measured with manually drawn regions of interest (ROIs) on the ADC map, within a solid portion that presented a hyperintense signal on DWI with a high b value, on a picture archiving and communication system (PACS) [17]. To select the lowest ADC value, ROIs were drawn three to five times and the minimum of these measurements was recorded as ADCmin. The ADCav was defined as the average ADC value obtained from an ROI drawn around the entire mass on one axial plane, excluding the most peripheral portions in order to avoid partial-volume effects. The ADCs were measured at four different combinations of b values (0 and 300, 0 and 800, 0 and 1400, and a combination of 0, 300, 800, and 1400 sec/mm2). ROIs were automatically reproduced on all ADC maps. In addition, the ADC of normal-appearing skeletal muscle (ADCms) was obtained to allow calculation of ADC values normalized to skeletal muscle, which we refer to as normalized ADC [18, 19].
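The ADCmin selection and the normalization to skeletal muscle described above can be sketched in a few lines of Python. The function names and all numeric readings are hypothetical, and the normalized ADC is assumed here to be a simple ratio of tumour ADC to ADCms.

```python
def adc_min(roi_readings):
    """Return the lowest of the repeated small-ROI readings (three to five per the text)."""
    if not 3 <= len(roi_readings) <= 5:
        raise ValueError("expected three to five ROI readings")
    return min(roi_readings)

def normalized_adc(tumour_adc, muscle_adc):
    """Ratio of tumour ADC to normal-appearing skeletal muscle ADC (assumed definition)."""
    if muscle_adc <= 0:
        raise ValueError("muscle ADC must be positive")
    return tumour_adc / muscle_adc

# Hypothetical readings in um^2/sec; not taken from the study data.
roi_readings = [845.0, 806.0, 912.0]
muscle_reading = 1500.0

lowest = adc_min(roi_readings)
ratio = normalized_adc(lowest, muscle_reading)
```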

Statistical analysis

The pathologic findings were used as the standard of reference. The Chi-square test and Fisher’s exact test were used to compare the results of qualitative analysis between malignant and non-malignant tumours. Repeated measures ANOVA and t-tests were used to compare the results of quantitative analysis between malignant and non-malignant soft tissue tumours, and between malignant and non-malignant myxoid tumours. Interobserver agreement for the ADC measurement was evaluated by the Bland-Altman method [20]. Receiver operating characteristic (ROC) curves with areas under the curve (AUC) were obtained to assess diagnostic performance. The optimal cutoff values of ADCs were determined using ROC curve analysis. MRI and DWI findings were considered benign if the score was 0–1 and malignant if the score was 2–4. The sensitivity, specificity, and accuracy of each step were calculated for the diagnosis of malignancy and were compared between the steps using McNemar’s test. The kappa coefficient (κ) was calculated to assess interobserver agreement between the two readers with regard to lesion characterization. The κ values were interpreted as follows: <0.20, poor; 0.21−0.40, fair; 0.41−0.60, moderate; 0.61−0.80, substantial; and 0.81−1.00, very good [21]. For all tests, P values of less than 0.05 were considered to indicate a statistically significant difference. All statistical analyses were performed using commercial software (SPSS, version 19, SPSS, Chicago, Ill and MedCalc Software, version 11.3.0.0, Mariakerke, Belgium).
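As a rough illustration of the ROC analysis, the Python sketch below computes an AUC for the rule "lower ADC suggests malignancy" and picks a cutoff. The text does not state which criterion defined the optimal cutoff, so the Youden index used here is an assumption, as are the function names and the ADC values.

```python
def auc_low_is_positive(malignant, benign):
    """AUC as the probability that a malignant ADC is lower than a benign ADC (ties count 0.5)."""
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m < b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))

def youden_cutoff(malignant, benign):
    """Cutoff c maximising sensitivity + specificity - 1 for the rule 'ADC <= c is malignant'."""
    best_j, best_c = -1.0, None
    for c in sorted(set(malignant) | set(benign)):
        sensitivity = sum(m <= c for m in malignant) / len(malignant)
        specificity = sum(b > c for b in benign) / len(benign)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_c, best_j

# Hypothetical ADC values in um^2/sec; not the study data.
malignant_adc = [700, 800, 900, 1000, 1300]
benign_adc = [1100, 1200, 1400, 1500, 1600]

auc = auc_low_is_positive(malignant_adc, benign_adc)
cutoff, youden_j = youden_cutoff(malignant_adc, benign_adc)
```

The pairwise AUC is equivalent to the Mann-Whitney statistic; dedicated packages additionally report confidence intervals and tests for comparing correlated ROC curves, which this sketch omits.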

Results

Qualitative analysis of DWI

The DWI signal intensity characteristics of soft tissue tumours are shown in Table 3. On DWI with a b value of 1400 sec/mm2, hyperintense signals compared to skeletal muscle (grades 3 and 4) were more common in malignant tumours than in non-malignant tumours (34/34 vs. 22/29, P=0.003, for both readers), whereas there were no significant differences on DWI with b values of 300 and 800 sec/mm2 (P>0.05) (Fig. 2). On DWI with a b value of 1400 sec/mm2, hyperintense signals similar to fluid signal (grade 4) were also more common in malignant tumours than in non-malignant tumours (21/34 vs. 8/29, P=0.007, for reader 1; 22/34 vs. 7/29, P=0.001, for reader 2); however, there were no significant differences on DWI with b values of 300 sec/mm2 and 800 sec/mm2 (P>0.05).
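The first comparison above (34/34 vs. 22/29) can be checked with a two-sided Fisher's exact test. The Python sketch below assumes that Fisher's exact test, rather than the Chi-square test, was applied to this table, which seems plausible given the zero cell; the function name is hypothetical. Under that assumption the result is about 0.003, in line with the reported P value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Rows: malignant, non-malignant; columns: grade 3-4, grade 1-2 at b = 1400 sec/mm2.
p_value = fisher_exact_two_sided(34, 0, 22, 7)
```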

Table 3 Grading of signal characteristics on diffusion-weighted imaging
Fig. 2 A 67-year-old man with angioleiomyoma in the hand. Interpretation was correctly changed during the second step by reader 2. Reader 1 made the correct interpretation during both the first and the second steps. (a, b, c) There is a well-defined mass (arrow) in the thumb. This lesion reveals heterogeneous signal on T2-weighted image (a, TR/TE 4000/73) and intermediate signal on T1-weighted image (b, 750/15), and shows intense enhancement on fat-suppressed contrast-enhanced T1-weighted image (c, 750/15). During the first step standard MRI was interpreted as indeterminate lesion (score 2) by reader 2 and definitely benign lesion (score 0) by reader 1. (d, e, f) The mass reveals hyperintense signal (arrow) on all diffusion-weighted images (8700/89) with b values of 300 (d, grade 4), 800 (e, grade 4), 1400 sec/mm2 (f, grade 3) even though the signal of the lesion decreases as the b value increases. (g) On ADC map (b=0, 300, 800, 1400 sec/mm2) the mass represents high ADCs (arrow): ADCav, 1351 μm2/sec and 1356 μm2/sec for reader 1 and reader 2, respectively; ADCmin, 1250 μm2/sec and 1189 μm2/sec, respectively. During the second step, MRI with DWI was correctly interpreted as definitely benign lesion (score 0) by reader 1 and probably benign lesion (score 1) by reader 2

Quantitative analysis of DWI

Comparisons of ADCs of soft tissue tumours between malignant and non-malignant tumours are summarized in Table 4. The ADCav, ADCmin, and normalized ADCs of malignant soft tissue tumours were significantly lower than those of non-malignant tumours on all b value combinations for both readers (P≤0.002) (Fig. 3). Overall, interobserver agreements of ADCmin and ADCav of all b value combinations were superior to those of two b value combinations: mean difference, 10.1 μm2/sec (95 % confidence interval, −45.2, 65.4) and 30.7 μm2/sec (−22.1, 83.3), respectively. Interobserver agreement of ADCmin was superior to that of ADCav (Table 5). The ADCms were not significantly different in malignant and non-malignant tumours in all four combinations of b values. The cutoff values of ADC and the AUCs are shown in Table 6. The AUCs of ADCmin (0.801−0.817) were higher than those of ADCav (0.737−0.772) at all b value combinations (Fig. 4). AUC of ADCmin (0.801) was significantly higher than that of ADCav (0.737) at the combination of all four b values (P=0.036), while there were no significant differences in other b value combinations (P≥0.068) (Fig. 5).
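For readers unfamiliar with the Bland-Altman method used for the interobserver comparison, a minimal Python sketch follows. It reports the mean difference between two readers and the conventional limits of mean ± 1.96 standard deviations; the function name and the paired readings are hypothetical, and the exact settings used in the statistical software are not known from the text.

```python
from statistics import mean, stdev

def bland_altman(reader1, reader2):
    """Return (mean difference, lower limit, upper limit) for paired measurements."""
    if len(reader1) != len(reader2) or len(reader1) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [x - y for x, y in zip(reader1, reader2)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)  # sample standard deviation
    return bias, bias - half_width, bias + half_width

# Hypothetical paired ADCmin readings in um^2/sec; not the study data.
reader1_adc = [1010, 900, 1200, 800]
reader2_adc = [1000, 920, 1180, 790]

bias, lower, upper = bland_altman(reader1_adc, reader2_adc)
```

A mean difference close to zero with narrow limits indicates that the two readers' measurements are interchangeable for practical purposes.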

Table 4 Quantitative analysis of DWI in soft tissue tumours
Fig. 3 Boxplots showing the distribution of ADCav (a) and ADCmin (b) in both malignant and non-malignant tumours. Boxes indicate first to third quartiles, each midline indicates median (second quartile), and whiskers represent maximum and minimum values within the interquartile range

Table 5 Mean difference and 95 % limits of agreement for ADC measurement
Table 6 ADC cutoff values for differentiating malignant and non-malignant soft tissue tumours
Fig. 4 Receiver operating characteristic curves of ADCav (a) and ADCmin (b) from each b value combination for diagnosis of malignancy

Fig. 5 A 65-year-old man with leiomyosarcoma in the thigh, correctly interpreted as malignancy (score 4) during the first and the second steps by both readers. On quantitative analysis of DWI, ADCmin correctly suggested malignancy, whereas ADCav incorrectly suggested benignity due to large necrosis. (a, b) There is an intramuscular mass (arrows) in the anterior compartment of the thigh, which shows heterogeneous signal intensity with large necrosis on T2- (a, TR/TE 5584/68) and T1-weighted images (b, 681/11). (c) Fat-suppressed contrast-enhanced T1-weighted image (736/11) shows intense enhancement and a large central non-enhancing area (arrows). (d, e, f) Diffusion-weighted images (8700/89) with b values of 300 (d, grade 4), 800 (e, grade 4), 1400 sec/mm2 (f, grade 4) show persistent, hyperintense signal in the peripheral portions of the mass (arrows) as the b value increases. (g) On ADC map (b=0, 300, 800, 1400 sec/mm2) the mass represents low ADCs (arrows); ADCav, 1607 μm2/sec and 1209 μm2/sec for reader 1 and reader 2, respectively; ADCmin, 911 μm2/sec and 783 μm2/sec for reader 1 and reader 2, respectively. During the second step, MRI with DWI was correctly interpreted as definitely malignant soft tissue tumour (score 4) by both readers

Diagnostic performance for the differentiation between malignant and non-malignant tumours

Table 7 lists the sensitivity, specificity, and accuracy of each reader for diagnosing malignant soft tissue tumours during the first and second steps. With standard imaging alone (step 1), reader 2 had lower specificity (83 % for reader 1 and 62 % for reader 2) and accuracy (89 % for reader 1 and 81 % for reader 2) than reader 1. With added information from qualitative and quantitative evaluation of DWI (step 2), sensitivity, specificity, and accuracy were higher than those of the first step for both readers. The specificity and accuracy for reader 2 were significantly different between step 1 and step 2 (P≤0.039). During the second step, there was improvement in differentiating malignant from non-malignant soft tissue tumours for both reader 1 [added value=6.3 % (4/63)] and reader 2 [added value=11.1 % (7/63)] (Fig. 6).
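The per-step figures rest on dichotomising the five-level confidence score (0–1 benign, 2–4 malignant) and comparing paired results between steps. The Python sketch below illustrates both parts. The exact binomial form of McNemar's test is an assumption, since the text does not say which variant was used, and all scores and discordant-pair counts are hypothetical.

```python
from math import comb

def diagnostic_performance(scores, is_malignant, threshold=2):
    """Sensitivity, specificity and accuracy when score >= threshold is called malignant."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_malignant):
        called_malignant = score >= threshold
        if truth and called_malignant:
            tp += 1
        elif truth:
            fn += 1
        elif called_malignant:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical reader scores (0-4) and pathology results; not the study data.
scores = [4, 3, 2, 1, 0, 1, 2, 3]
truth = [True, True, True, True, False, False, False, False]
performance = diagnostic_performance(scores, truth)

# Hypothetical counts: 9 cases correct only in step 2, 2 correct only in step 1.
p_mcnemar = mcnemar_exact(9, 2)
```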

Table 7 Diagnostic performance in differentiation of malignant and non-malignant soft tissue tumours
Fig. 6 A 70-year-old woman with malignant melanoma in the upper arm. Both readers interpreted the mass as indeterminate lesion (score 2) during the first step. Interpretation was correctly changed during the second step by both readers. (a, b, c) There is a well-defined ovoid lesion (arrow) within the biceps brachii muscle with intermediate to hyperintense signal on T2-weighted image (a, TR/TE 4000/63) and hypointense signal on T1-weighted image (b, 730/13), which shows relatively homogeneous enhancement on fat-suppressed contrast-enhanced T1-weighted image (c, 700/15). During the first step standard MRI was interpreted as indeterminate lesion (score 2) by both readers. (d, e, f) Diffusion-weighted images (8400/73) with b values of 300 (d, grade 4), 800 (e, grade 4), 1400 sec/mm2 (f, grade 3) show persistent, hyperintense mass (arrow) as the b value increases. (g) On ADC map (b=0, 300, 800, 1400 sec/mm2) the mass represents low ADCs (arrow); ADCav and ADCmin, 989 μm2/sec and 806 μm2/sec for reader 1 and reader 2, respectively. During the second step, MRI with DWI was correctly interpreted as definitely malignant soft tissue tumour (score 4) by both readers

Of the 28 indeterminate lesions, 100 % (9/9) and 84 % (16/19) were correctly interpreted in the second step by reader 1 and reader 2, respectively: 18 malignant lesions [score 1 (n=1), score 3 (n=11), score 4 (n=6)] and 10 benign tumours [score 1 (n=8), score 3 (n=2)] during the second step. Interpretation was changed in the wrong direction during the second step in three cases: deep benign histiocytoma by reader 1, and myxoid liposarcoma and schwannoma by reader 2.

Interobserver agreement of the first step was moderate (κ=0.557) and that of the second step was very good (κ=0.871). The AUCs of the second step were higher than those of the first step for both readers; however, the differences were not statistically significant: 0.908 vs. 0.972 for reader 1 (P=0.060), and 0.895 vs. 0.937 for reader 2 (P=0.148). Mean specificity, accuracy, and AUC from both readers were significantly improved during the second step, whereas mean sensitivity was not: 90 % (52/58) vs. 72 % (42/58) for specificity (P=0.006), 94 % (118/126) vs. 85 % (107/126) for accuracy (P=0.007), 0.955 vs. 0.954 for AUC (P=0.024), and 97 % (66/68) vs. 96 % (65/68) for sensitivity (P=1.000).
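The κ statistics above can be related to raw ratings with a small Python sketch. It computes unweighted Cohen's kappa, which is an assumption because the text does not say whether a weighted form was used, and it maps a κ value onto the interpretation bands given in the statistical analysis. The function names and the ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(ratings1, ratings2):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    n = len(ratings1)
    observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    counts1, counts2 = Counter(ratings1), Counter(ratings2)
    expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    return (observed - expected) / (1 - expected)

def interpret_kappa(kappa):
    """Map kappa onto the bands quoted in the text (poor to very good)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "very good"

# Hypothetical dichotomised calls (1 = malignant, 0 = benign); not the study data.
reader1_calls = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
reader2_calls = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(reader1_calls, reader2_calls)

step1_band = interpret_kappa(0.557)  # value reported for the first step
step2_band = interpret_kappa(0.871)  # value reported for the second step
```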

Analysis of soft tissue tumours with myxoid changes

There were no significant differences in ADCav and ADCmin between malignant myxoid tumours and non-malignant myxoid tumours on any b value combinations for each reader (P>0.541). Five of six malignant soft tissue tumours with myxoid change in our study were correctly interpreted as representing malignancy by each reader during both the first and second steps: myxoid liposarcoma (n=4), malignant peripheral nerve sheath tumour (n=1), and myxofibrosarcoma (n=1). One myxoid liposarcoma was incorrectly interpreted by reader 1 during both the first and second steps. Reader 2 defined it as an indeterminate lesion during the first step and incorrectly defined it during the second step. In this tumour, standard MRI did not reveal the typical findings of myxoid liposarcoma and DWI showed high ADC.

Discussion

There has been no previous report on qualitative analysis of DWI for soft tissue tumours. In qualitative analysis, we found that 1400 sec/mm2 was a helpful b value for distinguishing malignant from benign soft tissue tumours. However, 22 of 29 benign tumours showed hyperintensity on DWI with a high b value in our study. Therefore, quantitative analysis is necessary for evaluating soft tissue tumours because of the substantial number of false positive findings on qualitative analysis alone. To overcome heterogeneity in the components of soft tissue tumours, ADCmin, ADCav, and ADC ratios to normal muscle were analyzed in our study. Rijswijk et al. [8] reported the potential significance of true diffusion coefficients. They described a significant difference in true diffusion coefficients between 11 malignant and 23 benign soft tissue masses using early intravoxel incoherent motion DWI with five b values (0–701 sec/mm2) at 1.5 T, whereas ADC values between these two groups were not significantly different. In contrast, although the perfusion-related effect at low b values was inevitable in our study, ADCs showed a significant difference between malignant and non-malignant soft tissue tumours at 3.0 T MRI. This discrepancy could be related to the small number of patients and the measurement of ADCav in three myxoid malignant tumours (two myxoid liposarcomas and one low-grade myxofibrosarcoma) out of 11 malignant soft tissue tumours in their study [8], as well as to the use of high b values and combined standard MRI in our study.

Razek et al. [12] found significant differences in ADCs between malignant and benign soft tissue tumours. Similar to our study, they selected 1.34 × 10−3 mm2/sec as a cutoff ADCav for differentiating malignant soft tissue tumours from benign masses, which resulted in an accuracy of 91 %, sensitivity of 94 %, specificity of 88 % and an AUC of 0.869 using 1.5 T MRI with b values of 0, 500, and 1000 sec/mm2. However, the lower diagnostic performance observed in our study might be a result of the different histological types of the lesions included. Unlike the Razek et al. study [12], we excluded non-neoplastic lesions and well-differentiated liposarcomas, whereas they included venous malformation [29 % (4/14) of benign lesions] and liposarcoma [26 % (6/23) of malignant lesions] without stating the subtype, except for one myxoid liposarcoma. Since ADC is affected by various factors including magnetic field strength, machines, pulse sequences, and selection of b values, the absolute cutoff value between malignancy and benignity is not clear [22]. When MR parameters are considered carefully, the cutoff values suggested in our study can be of practical value in guiding judgment of ADCs in each institution. In our study, there were improvements in diagnostic specificity and accuracy for both readers with the addition of DWI. For reader 2, a significant difference in specificity and accuracy was observed between the two steps, and 84 % (16/19) of indeterminate lesions were correctly interpreted with the addition of DWI. We observed excellent interobserver agreement in the measurement of ADC, consistent with a recent report by Giles et al. [23]. There has been no consensus on which DWI parameters are optimal for evaluating soft tissue tumours. Although we found no significant differences in the diagnostic performance of different b value combinations, 1400 sec/mm2 performed better than the lower b values in qualitative analysis. Moreover, using DWI with a high b value, some cysts can be confidently defined using the T2 shine-through effect.

The ADCav can increase in malignant soft tissue tumours with large necrosis, as shown in Fig. 5, and decrease in benign soft tissue tumours containing fat or calcification, such as hemangioma. The ADCmin can be affected by small amounts of haemorrhage or mineralization that are not clearly delineated on standard MRI. As in our study, a recent report by Subhawong et al. [19] found that both ADCmin and ADCav were significantly different between malignant lesions, including primary and metastatic tumours, and benign lesions, including benign tumours and non-neoplastic lesions. Therefore, we suggest that both ADCmin and ADCav should be obtained and carefully interpreted after correlation with standard MRI. Both measurements can readily be obtained by a radiologist in routine practice. For the differentiation of malignant from benign soft tissue lesions, normalized ADCs showed no additional benefit in our study.

Our study shows that diagnostic performance in differentiation of malignant and benign soft tissue tumours is improved by adding qualitative and quantitative DWI with a high b value to standard MRI at 3.0 T. Most previous research found significant overlap in ADCs between malignant and benign soft tissue tumours, even though subgroup analyses revealed significantly different ADCs in soft tissue tumours without a myxoid component [8–11]. We assume that one cause of these varying results may be related to the difference in location and size of the ROI. To find the solid portion, it is important to evaluate DWI with corresponding standard MRI [24]. In addition, ADCs should be obtained from the solid area on standard MRI, which shows a hyperintense signal on DWI with high b value. Einarsdottir et al. [9] used ADCav in the tumour section with the largest diameter independently of the solid portion on conventional MRI. In the studies of Rijswijk et al. [8], Maeda et al. [10], Nagata et al. [11], and Razek et al. [12], conventional sequences were used to define ROIs for ADCs, whereas qualitative analyses were not correlated. If the soft tissue mass is interpreted as definitely benign or malignant on standard MRI, the interpretation should not be changed regardless of DWI findings.

In a study by Maeda et al. [10], the ADCs of benign and malignant soft tissue tumours were not significantly different on line-scan DWI with b values of 5 and 1000 sec/mm2 at 1.5 T. They suggested that the most important cause of considerable overlap was the significantly higher ADC of myxoid-containing soft tissue tumours compared to that of nonmyxoid soft-tissue tumours. Similarly, Nagata et al. [11] reported a significant overlap in mean ADC between malignant and benign myxoid soft tissue tumours, whereas there was a significant difference in mean ADC between malignant and benign nonmyxoid soft tissue tumours, using b values of 0, 1000 sec/mm2 at 1.5 T. The mean ADC of malignant myxoid tumours was significantly higher than that of benign nonmyxoid tumours in our study. We found that the relatively high ADC of malignant soft tissue tumours with myxoid changes might not be an issue when diagnosis is performed on both standard MRI and DWI, because those lesions were correctly interpreted as malignant tumours with myxoid changes on standard MRI, and neither reader changed their interpretation during the second step.

There were several limitations to our study. First, this is a retrospective study. Although we recruited consecutive patients who satisfied the inclusion criteria, there is still some possibility of selection bias. Second, the study population and range of tumours were relatively small, particularly for benign soft tissue tumours. This was because many tumours appeared definitely benign on MRI, which led the orthopaedic surgeons in our institution not to operate on them. Third, the ROIs could have contained calcifications that affect ADCs because calcifications cannot be completely excluded on MRI. Fourth, inclusion of the b value of 0 might have introduced perfusion-related diffusion effects into ADCs.

In conclusion, the addition of qualitative and quantitative DWI to a standard MRI protocol improves diagnostic accuracy for differentiation between malignant and benign soft tissue tumours at 3.0 T.