Introduction

Multiple myeloma (MM) is a monoclonal plasma cell proliferative disorder that causes bone marrow infiltration or bone destruction, the most prominent feature of MM, which occurs in approximately two-thirds of patients at diagnosis and in nearly all patients during the course of their disease [1, 2]. Quantifying the extent of bone marrow infiltration or bone destruction plays a key role in assessing tumor burden, guiding treatment, and evaluating prognosis [3].

Conventional radiography and computed tomography (CT) can show the number and size of bone destruction lesions, but their sensitivity is limited because they cannot show bone marrow infiltration [4]. This limitation is now often addressed with fluorodeoxyglucose (FDG) positron emission tomography (PET)/CT, which provides both morphologic and metabolic information about the tumor. However, FDG PET/CT is expensive, involves ionizing radiation, is insensitive to bone marrow infiltration and bone destruction located in the skull or ribs, and has high false-positive and false-negative rates [5,6,7]. Magnetic resonance imaging (MRI) is highly sensitive for detecting bone marrow infiltration because of its excellent soft-tissue contrast [8]. Moreover, whole-body (WB) MRI has been shown to have greater sensitivity and specificity than FDG PET/CT in detecting bone marrow infiltration or bone destruction [9]. There are five MRI patterns of bone marrow infiltration in MM: normal, focal, diffuse, combined diffuse and focal, and salt-and-pepper [10]. Previous studies have shown that tumor burden and prognosis differ among the five MRI patterns [11,12,13,14]. Subsequently, semi-quantitative tumor burden scoring methods based on MRI pattern began to emerge in MM. However, previous scoring methods had several shortcomings: (I) they did not cover all five MRI patterns (for example, they omitted the normal and salt-and-pepper patterns) [15], (II) the scoring weights for the number and size of focal lesions varied between studies [15, 16], and (III) the scoring weights for the diffuse and combined diffuse and focal patterns did not correspond to their tumor burden [1, 15].

However, WB MRI has several disadvantages: long scanning time, high technical and equipment requirements, and difficulty in observing humeral lesions because of the limited field of view (FOV). MM lesions are mainly located in the axial skeleton, and the whole spine is the most affected area [17]. A whole-spine scan is quick and convenient, and is widely used in clinical practice as an alternative to WB MRI.

In this study, we aimed to develop a new, easy-to-implement scoring method that covers all five MRI patterns on whole-spine scanning. We explored the prognostic significance of the new tumor burden score by evaluating its role in predicting the early treatment response and its association with progression-free survival (PFS), overall survival (OS), and the revised International Staging System (R-ISS) stage.

Materials and methods

Participant cohort

We prospectively recruited participants with newly diagnosed MM, defined according to the International Myeloma Working Group (IMWG) criteria [18], from August 2020 to October 2022. Eligible participants had not received any treatment and could tolerate MRI examination. Participants were excluded if they had any other disease of the whole spine, had not completed four cycles of induction chemotherapy in our hospital, or had MRI images of inferior quality (Fig. 1). Clinical data and laboratory test results, including sex, age, hemoglobin, platelet count, serum albumin, serum β2-microglobulin, serum lactate dehydrogenase (LDH), serum calcium, serum C-reactive protein (CRP), bone marrow plasma cell (BMPC) percentage, and flow cytometry of bone marrow cells, were collected at the first diagnosis. Before treatment, all participants were divided into three groups based on the R-ISS stage. All participants were then treated with one of the following first-line induction regimens: bortezomib, lenalidomide, dexamethasone (n = 43); bortezomib, dexamethasone (n = 9); bortezomib, thalidomide, dexamethasone (n = 6); bortezomib, cyclophosphamide, dexamethasone (n = 4). Treatment response was evaluated according to the IMWG response criteria after the completion of four cycles of induction therapy [19]. The treatment response categories were stringent complete response (sCR), complete response (CR), very good partial response (VGPR), partial response (PR), stable disease (SD), and progressive disease (PD). All participants were categorized into two groups: a good response group (sCR, CR, and VGPR) and a poor response group (PR, SD, and PD). PFS and OS were also recorded. PFS was defined as the time from the MRI scan to progression or death, and OS as the time from the MRI scan to death; for participants without such an event, follow-up ended in December 2023. The median follow-up was 24 months (range, 1–37 months).
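The response grouping and the PFS definition above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the cut-off date is left as an argument because the text gives only the month (December 2023), not the day. Returning the interval in days is also an assumption made for illustration; the study reports follow-up in months.

```python
from datetime import date
from typing import Optional, Tuple

GOOD_RESPONSE = {"sCR", "CR", "VGPR"}
POOR_RESPONSE = {"PR", "SD", "PD"}


def response_group(category: str) -> str:
    """Map an IMWG response category to the two groups used in the study."""
    if category in GOOD_RESPONSE:
        return "good"
    if category in POOR_RESPONSE:
        return "poor"
    raise ValueError(f"unknown response category: {category!r}")


def pfs_interval(mri_date: date,
                 event_date: Optional[date],
                 cutoff_date: date) -> Tuple[int, bool]:
    """Return (days from MRI, event_observed) for progression-free survival.

    event_date is the date of progression or death, or None if neither
    occurred. Participants without an event by cutoff_date are censored
    at cutoff_date (assumed handling, not spelled out in the source).
    """
    if event_date is not None and event_date <= cutoff_date:
        return (event_date - mri_date).days, True
    return (cutoff_date - mri_date).days, False
```

The same function would give OS if `event_date` were restricted to the date of death.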

Fig. 1

Participant inclusion and exclusion flowchart

MRI examination

All whole-spine scans were performed before treatment on a 3.0-T MRI scanner (Signa Pioneer, GE Healthcare, Milwaukee, WI, USA) with a 16-channel head-and-neck coil, a 64-channel spine coil, and a 16-channel phased-array body coil. The whole-spine sequences included cervical, thoracic, and lumbar sagittal fast spin-echo (FSE) T1WI, sagittal iterative decomposition of asymmetric echoes (IDEAL) T2WI, and axial fluid-attenuated inversion recovery (FLAIR) T2WI.

Two radiologists (J.N. and S.C., with 13 and 7 years of experience in musculoskeletal imaging, respectively) evaluated the tumor burden score of the whole spine. The average of the two observers' scores was used for statistical analysis.

Evaluation of tumor burden score

The whole spine consists of three anatomic sites: the cervical, thoracic, and lumbar spine. The scoring method for each anatomic site was developed based on the pattern and extent of bone marrow infiltration. The total tumor burden score was the sum of the scores of the three anatomic sites. A myeloma lesion was diagnosed when its signal intensity was lower than that of intervertebral disc or muscle on T1WI and higher than that of muscle on T2WI [10].

The scoring method for each anatomic site in the five MRI patterns was as follows (Table 1): (1) Focal: circumscribed lesions with a diameter of at least 5 mm [10]. A lesion count of ≥ 10 was scored as 3, 2–9 as 2, 1 as 1, and 0 as 0 (see exemplary MRI in Figs. 2 and 3) [15]. A largest lesion size of > 15 mm was scored as 3 and 5–15 mm as 2 (see exemplary MRI in Figs. 2 and 3) [15]. (2) Salt-and-pepper: widespread tiny nodular infiltrates with a diameter of less than 5 mm and preserved normal bone marrow between them [10, 20]. A largest lesion size of < 5 mm was scored as 1 [15]. We assumed that the number of lesions in this pattern was at least 10, giving a score of 3 for lesion number plus a score of 1 for the largest lesion size, so each anatomic site received a score of 4 (see exemplary MRI in Fig. 4). (3) Diffuse: homogeneous infiltration [10]. We considered the tumor burden of this pattern to be equivalent to that of the focal pattern with ≥ 10 lesions and a largest lesion size of > 15 mm, so the score for each anatomic site was 6 (see exemplary MRI in Fig. 5). (4) Combined diffuse and focal [10]: the score for this pattern was the same as that for the diffuse pattern (see exemplary MRI in Fig. 6). (5) Normal: similar in appearance to normal adult bone marrow [10]. We considered this pattern to have no macroscopic lesions, so the score for each anatomic site was 0 (see exemplary MRI in Fig. 7).
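The scoring rules above can be written as a short Python sketch. This is an illustration under stated assumptions, not the authors' software: the names `SiteFinding`, `site_score`, and `total_score` are hypothetical, the pattern is assumed to be assigned per anatomic site, and a focal site with no lesion reaching 5 mm is assumed to receive no size score.

```python
from dataclasses import dataclass
from typing import Dict

SITES = ("cervical", "thoracic", "lumbar")


@dataclass
class SiteFinding:
    pattern: str             # "normal", "focal", "salt-and-pepper", "diffuse", "combined"
    n_lesions: int = 0       # focal pattern only: number of lesions >= 5 mm
    largest_mm: float = 0.0  # focal pattern only: diameter of the largest lesion


def focal_number_score(n_lesions: int) -> int:
    if n_lesions >= 10:
        return 3
    if n_lesions >= 2:
        return 2
    return 1 if n_lesions == 1 else 0


def focal_size_score(largest_mm: float) -> int:
    if largest_mm > 15:
        return 3
    if largest_mm >= 5:
        return 2
    return 0  # assumption: no lesion of at least 5 mm, so no size score


def site_score(finding: SiteFinding) -> int:
    if finding.pattern == "normal":
        return 0
    if finding.pattern == "salt-and-pepper":
        return 4  # 3 (at least 10 lesions assumed) + 1 (largest lesion < 5 mm)
    if finding.pattern in ("diffuse", "combined"):
        return 6  # equated with focal: >= 10 lesions (3) and largest > 15 mm (3)
    if finding.pattern == "focal":
        return focal_number_score(finding.n_lesions) + focal_size_score(finding.largest_mm)
    raise ValueError(f"unknown pattern: {finding.pattern!r}")


def total_score(findings: Dict[str, SiteFinding]) -> int:
    """Sum the site scores over the cervical, thoracic, and lumbar spine."""
    return sum(site_score(findings[site]) for site in SITES)


# Hypothetical participant, not study data.
example = {
    "cervical": SiteFinding("focal", n_lesions=1, largest_mm=8.0),    # 1 + 2 = 3
    "thoracic": SiteFinding("focal", n_lesions=12, largest_mm=20.0),  # 3 + 3 = 6
    "lumbar": SiteFinding("normal"),                                  # 0
}
print(total_score(example))  # 9
```

Under these rules the total score ranges from 0 (normal pattern at all three sites) to 18 (a site score of 6 at all three sites).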

Table 1 Bone marrow infiltration patterns and scoring method in each anatomic site

Fig. 2

Exemplary images of MR focal pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Fig. 3

Exemplary images of MR focal pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Fig. 4

Exemplary images of MR salt-and-pepper pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Fig. 5

Exemplary images of MR diffuse pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Fig. 6

Exemplary images of MR combined diffuse and focal pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Fig. 7

Exemplary images of MR normal pattern. A Whole spine sagittal T2WI IDEAL and B whole spine sagittal T1WI. IDEAL, iterative decomposition of asymmetric echoes

Statistical analysis

For univariate analysis, the clinical and MRI variables were compared between the two response groups using the Mann–Whitney U test for continuous variables and the chi-square test or Fisher’s exact test for categorical variables. Variables with a P-value < 0.05 in univariate analyses were included as covariates in multivariate (logistic regression) analyses to identify independent predictors, and their predictive performance was assessed by receiver operating characteristic (ROC) curve analyses. Thresholds of the independent predictors for PFS and OS were calculated using X-tile, and their prognostic significance was assessed using Kaplan–Meier analyses. Differences in tumor burden score among the three R-ISS stages were compared using the Kruskal–Wallis H test, and P-values adjusted by Bonferroni’s correction were used for post-hoc tests. Statistical significance was defined as P < 0.05. All statistical analyses were performed using SPSS software (version 22.0; SPSS, Chicago, IL).

Results

Participant cohort

A total of 75 participants were initially enrolled in this study. Three participants had other diseases of the spine, eight had not completed the four cycles of induction chemotherapy in our hospital, and two had poor image quality. Ultimately, 62 participants were included in the analysis. Of these, 37 achieved a good response and 25 had a poor response after four cycles of induction chemotherapy (Table 2).

Table 2 Participant characteristics and group differences

There were 29 participants with the focal pattern (median score, 10; range, 3–18), 7 with the diffuse pattern (median score, 18; range, 12–18), 8 with the salt-and-pepper pattern (median score, 8; range, 4–12), 9 with the normal pattern (median score, 0; range, 0–0), and 9 with the combined diffuse and focal pattern (median score, 18; range, 12–18).

Comparison of clinical and MRI characteristics between two groups

Baseline characteristics of the two groups are shown in Table 2. The poor response group and the good response group had similar clinical characteristics at diagnosis, except that the poor response group had higher β2-microglobulin (6.91 mg/L vs. 3.41 mg/L; p = 0.004), higher creatinine (114.0 μmol/L vs. 73.0 μmol/L; p = 0.012), and a higher tumor burden score (16 vs. 6.5; p < 0.001). The distribution of R-ISS stages also differed between the two response groups (p = 0.010) (Table 2).

Regression analyses for the prediction of poor response

Logistic regression analyses indicated that the tumor burden score was independently associated with poor response (odds ratio, 1.266; 95% CI, 1.094–1.464; p = 0.002), whereas β2-microglobulin, creatinine, and R-ISS stage were not (Table 3).
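To make the size of this effect concrete, the following Python sketch rescales the reported odds ratio to larger score differences. It assumes that the tumor burden score entered the logistic model as a continuous covariate, so that 1.266 is the odds ratio per one-point increase; the text does not state this explicitly. The function name is hypothetical.

```python
import math

OR_PER_POINT = 1.266          # reported odds ratio for the tumor burden score
CI_PER_POINT = (1.094, 1.464)  # reported 95% CI


def odds_ratio_for_increase(or_per_point: float, delta_points: float) -> float:
    """Odds ratio implied by a delta_points increase in the score.

    In a logistic model the log-odds change linearly with the covariate,
    so the odds ratio for delta_points is exp(delta_points * ln(OR)).
    """
    return math.exp(delta_points * math.log(or_per_point))


# A 6-point difference corresponds to one anatomic site changing from
# a normal pattern (0) to a diffuse pattern (6).
print(round(odds_ratio_for_increase(OR_PER_POINT, 6), 2))  # about 4.12
print([round(odds_ratio_for_increase(b, 6), 2) for b in CI_PER_POINT])
```

Under that assumption, a 6-point higher score corresponds to roughly fourfold higher odds of poor response, with a wide interval reflecting the uncertainty in the per-point estimate.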

Table 3 OR of poor response

Diagnostic performance for prediction of early treatment response

In ROC analysis, the tumor burden score had a sensitivity of 96.0%, a specificity of 78.0%, and an area under the curve (AUC) of 0.838 for predicting poor response (Fig. 8).
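For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and AUC can be computed from scores in two groups. The data and the threshold are hypothetical; the operating threshold behind the reported sensitivity and specificity is not given in the text. The AUC is computed as the proportion of (poor, good) pairs in which the poor responder has the higher score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def auc(scores_poor: Sequence[float], scores_good: Sequence[float]) -> float:
    """AUC as the probability that a poor responder outscores a good responder."""
    wins = 0.0
    for p in scores_poor:
        for g in scores_good:
            if p > g:
                wins += 1.0
            elif p == g:
                wins += 0.5
    return wins / (len(scores_poor) * len(scores_good))


def sensitivity_specificity(scores_poor: Sequence[float],
                            scores_good: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Classify score >= threshold as predicted poor response."""
    true_pos = sum(s >= threshold for s in scores_poor)
    true_neg = sum(s < threshold for s in scores_good)
    return true_pos / len(scores_poor), true_neg / len(scores_good)


# Hypothetical scores, not study data.
poor = [16, 18, 12, 9]
good = [0, 4, 6, 12]
print(auc(poor, good))                         # 0.90625
print(sensitivity_specificity(poor, good, 9))  # (1.0, 0.75)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off the ROC curve traces out.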

Fig. 8

ROC curve of the tumor burden score for predicting poor early treatment response (AUC = 0.838). ROC, receiver operating characteristic; AUC, area under the curve

Tumor burden score for predicting prognosis of MM

Using X-tile, the cut-off of the tumor burden score was 12 for both PFS and OS. Kaplan–Meier survival curves showed that higher tumor burden scores were associated with shorter PFS (p = 0.002, Fig. 9A) and shorter OS (p = 0.011, Fig. 9B).
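The grouping by cut-off and the Kaplan–Meier estimate can be sketched in plain Python as follows. The records are hypothetical, and treating scores above 12 as "high" is an assumption, since the text reports only the cut-off value and not which side the boundary falls on. The sketch does not reproduce X-tile's cut-off search or the test that produced the reported p-values.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float, int]  # (tumor burden score, months, event: 1 observed, 0 censored)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve


def split_by_cutoff(records: Sequence[Record], cutoff: float = 12):
    """Split records into low (score <= cutoff) and high (score > cutoff) groups.

    Assumption: the boundary value belongs to the low group.
    """
    low = [r for r in records if r[0] <= cutoff]
    high = [r for r in records if r[0] > cutoff]
    return low, high


# Hypothetical records, not study data.
records = [(4, 30, 0), (9, 24, 0), (12, 18, 1), (16, 5, 1),
           (18, 8, 1), (18, 8, 0), (14, 12, 1), (13, 20, 0)]
low, high = split_by_cutoff(records)
high_curve = kaplan_meier([r[1] for r in high], [r[2] for r in high])
low_curve = kaplan_meier([r[1] for r in low], [r[2] for r in low])
```

In this toy example the estimated survival in the high-score group falls to 0.8, 0.6, and 0.3 at 5, 8, and 12 months, while the low-score group has a single event at 18 months.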

Fig. 9

A Kaplan–Meier curves illustrating progression-free survival differences between participants with low and high tumor burden scores. B Kaplan–Meier curves illustrating overall survival differences between participants with low and high tumor burden scores

Comparison of tumor burden score between different R-ISS stages

The tumor burden score differed significantly among the R-ISS stages. In pairwise comparisons between R-ISS stages, the tumor burden scores in stage I and stage II were lower than those in stage III, whereas there was no significant difference between stage I and stage II (Table 4).
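The Kruskal–Wallis comparison across three stages can be illustrated with a small pure-Python sketch. The scores are hypothetical and the function names are illustrative. With three groups the H statistic is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-H/2); this shortcut applies only to the three-group case. The pairwise post-hoc tests themselves are not reproduced here; the Bonferroni step simply multiplies each pairwise p-value by the number of comparisons.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (H, p) for exactly three groups, with tie correction."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value used here assumes three groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h /= correction
    return h, math.exp(-h / 2)  # chi-square upper tail with 2 degrees of freedom


def bonferroni(p_value: float, n_comparisons: int) -> float:
    return min(1.0, p_value * n_comparisons)


# Hypothetical scores by R-ISS stage, not study data.
stage_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat, p_value = kruskal_wallis_three_groups(stage_scores)
```

For the toy data above, H is 7.2 and the corresponding p-value is exp(-3.6), about 0.027.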

Table 4 Tumor burden score in different R-ISS stages

Discussion

The tumor burden based on bone marrow infiltration plays a key role in evaluating prognosis in MM [3]. We developed a new tumor burden scoring method based on the extent of bone marrow infiltration in the whole spine. Our results showed that the tumor burden score was an independent predictor of early treatment response in newly diagnosed MM, with good diagnostic performance, and was associated with PFS and OS. In addition, the tumor burden scores in R-ISS stages I and II were lower than those in R-ISS stage III.

Bone marrow infiltration is the feature that reflects the pathogenesis of MM. Earlier studies described a variety of semi-quantitative scoring methods to reflect the extent of bone marrow infiltration in MM, but there was no uniform standard. For example, the scoring method of Dong H. et al. has three limitations. First, it covers only the focal, diffuse, and combined diffuse and focal MRI patterns. Second, its score of 1 per anatomic site for the diffuse pattern likely underestimates the tumor burden, because the total amount of plasma cells in this pattern is large and the pattern has been shown to be associated with poor cytogenetics, advanced disease, and worse prognosis [21, 22]. Third, for the combined diffuse and focal pattern, we consider that combining the scores of focal and diffuse infiltration overestimates the tumor burden. In our study, we developed scoring methods for the normal and salt-and-pepper patterns and adjusted the scoring methods for the diffuse and combined diffuse and focal patterns. Therefore, our tumor burden score, which was developed on the basis of previous studies and takes into account the characteristics of all five MRI patterns, is a potential semi-quantitative tool for reflecting tumor burden more fully.

To further evaluate the prognostic value of the tumor burden score in participants with MM, we analyzed the relationship between the tumor burden score and R-ISS stage. The R-ISS combines the International Staging System (ISS) with high-risk cytogenetic abnormalities [deletion (17p), translocation t(4;14), or t(14;16)] and the serum LDH level [23, 24]. The ISS is based on the serum β2-microglobulin and albumin levels and identifies three prognostic groups, with a median OS of 62 months in ISS stage I, 44 months in ISS stage II, and 29 months in ISS stage III [23]. High-risk cytogenetic abnormalities and elevated LDH denote increased disease aggressiveness and a high rate of tumor proliferation. The R-ISS was validated in independent cohorts and shown to be more potent than the ISS [25,26,27,28,29]. Our results showed that the tumor burden score tended to increase with advancing R-ISS stage, and that the tumor burden scores of stage I and stage II were significantly lower than that of stage III, indicating that the tumor burden score could identify patients with poor prognosis. However, there was no significant difference in tumor burden scores between stage I and stage II, which may be related to the small sample size. We speculate that the tumor burden score, which reflects the radiological tumor burden, may serve as a supplemental marker for R-ISS.

There were several limitations in the current study. First, the sample size was small, and studies with larger samples are needed to confirm these findings. Second, because of the long acquisition time of WB MRI, the scoring method was applied only to the whole spine. In the future, WB MRI could be performed with deep learning-based image reconstruction of undersampled MRI data to shorten scan times. Deep learning image reconstruction of undersampled MRI data either interpolates the missing k-space data points directly with the help of training data sets or learns a mapping function between the undersampled and fully sampled MR images in order to estimate the reconstructed image with better accuracy [30, 31].

In conclusion, the new tumor burden scoring method was applicable to all five MRI patterns. The tumor burden score was a good predictor of early treatment response and prognosis in newly diagnosed MM and may serve as a supplemental marker for R-ISS.