Introduction

Multiple myeloma (MM) is a malignant disease characterized by the autonomous proliferation of monoclonal plasma cells in the bone marrow (BM) and by the overproduction of either intact immunoglobulin molecules (M-component or M-protein) or immunoglobulin free light chains kappa or lambda. MM is rare among individuals younger than 40 years, but its incidence rises in subsequent decades and exhibits a slight male predominance. Localized proliferation of malignant plasma cells in the marrow and bone may produce painful osteolytic lesions visible on plain films. Bone involvement has an important clinical impact (e.g. pathological fractures) leading to increasing pain and skeletal instability. Neurological problems are a cause of morbidity, since vertebral osteolytic lesions may produce spinal cord compression or radiculopathy. Correct assessment of BM infiltration, identification of bone lesions and extramedullary (EM) disease is crucial in the assessment of MM [1].

Recently, FDG PET/CT has been used to stage MM patients [2–4], to accurately evaluate response to therapy [5], and to detect sites of EM disease [6] and impending relapse [5, 7]. In recent years very promising and concordant results have been reported by different groups. FDG PET/CT has proved to be prognostic at staging, in interim and final treatment monitoring, and during follow-up [4, 7]. FDG PET/CT has some recognized advantages in the evaluation of MM patients:

  1. Extended field of view (including skull, ribs, upper limbs, femurs, pelvis and spine)

  2. Absence of possible collateral effects or adverse reactions to FDG

  3. Possibility to perform it even in patients with renal failure

  4. Short image acquisition time with 3-D tomographs (important in patients with fractures, bone pain or vertebral collapse)

  5. Free decubitus positioning

  6. Possibility to evaluate soft tissues and organs at the same time to detect EM disease

  7. Possibility to semiquantify disease activity using the maximum standardized uptake value (SUVmax)

  8. The low-dose CT images associated with the PET images allow the morphological appearance of the bone to be described

  9. No restrictions in relation to metallic bone implants

However, FDG PET/CT may be difficult to interpret in some patients:

  1. A significant percentage of patients are affected by MM-related anaemia that results in a significant increase in BM tracer uptake causing a hot background in bone

  2. The amount of FDG uptake is variable and therefore a baseline study is needed for reference

  3. Early PET-positive MM lesions may not correspond to an osteolytic area and may be difficult to call

  4. The low spatial resolution of PET imaging does not allow “salt and pepper” BM infiltration to be accurately detected

  5. Recent fractures may appear falsely positive

  6. Bone metallic implants may cause significant artefacts on CT images and may cause infections with resulting nonspecific FDG uptake

  7. Therapy response criteria are not defined

No standard interpretation criteria have been proposed for the evaluation of FDG PET/CT scans in MM. Some groups base their image interpretation mainly on semiquantitation, others rely on visual assessment and others use both methods, which hampers data reproducibility [7, 8]. For these reasons, a group of Italian nuclear medicine experts, haematologists and medical physicists defined new visual interpretation criteria (Italian Myeloma criteria for Pet Use; IMPeTUs) to standardize FDG PET/CT evaluation in MM patients, and tested them in a preliminary blinded independent central review process. The aim was to provide standard image interpretation criteria that make clinical trial results reproducible.

Materials and methods

Criteria designing process

Criteria were agreed during interdisciplinary meetings attended by nuclear medicine physicians, haematologists and medical physicists from different Italian institutions, who combined their experience in the use of FDG PET/CT in MM. A synthesis of views was reached and mainly descriptive criteria were defined. Semiquantitative reporting was excluded owing to high variability in the measurement of SUVs on different tomographs in clinical practice, especially in PET centres not undergoing any clinical trial qualification process. However, semiquantitative data were measured and collected for future analysis. All PET/CT scanners used in patients in this investigation underwent the clinical trial qualification endorsed by FIL (Fondazione Italiana Linfomi; Italian Lymphoma Foundation) in collaboration with the Italian Association of Nuclear Medicine (AIMN) and the Italian Association of Medical Physics (AIFM) in the Corelab at the Medical Physics Unit of Santa Croce e Carle Hospital, Cuneo, Italy. Variability among PET/CT scanners in the measurement of SUVs in anthropomorphic phantoms is guaranteed to be below 10 %.

Image acquisition protocol

PET/CT scans were acquired according to a local protocol (applying EANM PET procedure guidelines for FDG studies), but the following conditions were required for inclusion of scan data: (a) studies had to be carried out using full-ring PET/CT; (b) iterative reconstruction was applied to PET images; (c) CT and attenuation-corrected PET images at baseline (PET-0), after induction therapy (PET-AI) and the end of treatment (PET-EoT) were available for central review; (d) PET scans of the same patient had to be performed on the same scanner, and (e) both PET and CT scans had to cover the region from the top of the skull to the lower third of the femurs.
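As an illustration only, the five inclusion conditions (a) to (e) can be expressed as a simple checklist. The sketch below is not the software used in the study; the `Scan` record and all of its field names are hypothetical and simply mirror the wording of the conditions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Time points required for central review, as named in the text.
REQUIRED_TIMEPOINTS = ("PET-0", "PET-AI", "PET-EoT")


@dataclass
class Scan:
    """Hypothetical per-scan record; field names are assumptions."""
    scanner_id: str
    full_ring: bool
    iterative_reconstruction: bool
    has_ct: bool
    has_attenuation_corrected_pet: bool
    covers_skull_to_lower_femurs: bool


def failed_conditions(scans: Dict[str, Scan]) -> List[str]:
    """Return the letters (a-e) of the inclusion conditions that fail.

    `scans` maps a time point label to the scan acquired at that time.
    """
    present = list(scans.values())
    failed = []
    if not all(s.full_ring for s in present):
        failed.append("a")
    if not all(s.iterative_reconstruction for s in present):
        failed.append("b")
    missing = [tp for tp in REQUIRED_TIMEPOINTS if tp not in scans]
    complete = all(s.has_ct and s.has_attenuation_corrected_pet for s in present)
    if missing or not complete:
        failed.append("c")
    if len({s.scanner_id for s in present}) > 1:
        failed.append("d")
    if not all(s.covers_skull_to_lower_femurs for s in present):
        failed.append("e")
    return failed
```

A patient dataset would be eligible under this sketch only when `failed_conditions` returns an empty list.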

Administered activity was 279 ± 68 MBq on average. The uptake time was uniformly in the range 58 – 88 min. The average liver SUV was 2.3 ± 0.5. The differences in parameter values among PET scans at different time points were never significant (p > 0.60).

Image revision process

After anonymization, PET/CT scans were uploaded by the participating PET centres into WIDEN® (diXit srl, Torino, Italy). PET scans were excluded from the study if: (a) they were poor-quality images with low statistics that were not considered suitable for diagnostic interpretation, (b) the image dataset was incomplete, or (c) large violations with respect to uptake time and administered dose were found on analysis of the DICOM headers of the PET scans. Upon image upload, the WIDEN platform automatically distributed images to five expert reviewers (M.B., A.B., C.N., M.R. and A.V.) who downloaded the PET images onto their own workstations and independently reviewed the scans by completing the online form, patient by patient.

The review was considered complete when four of the five reviewers fulfilled the criteria. A two-step process for the reviews was set up. A preliminary set of 30 scans (belonging to patients not enrolled in the protocol) was distributed to the experts, who reviewed them blindly using their own workstations as a preliminary test of the criteria. The results were compared by calculating the interobserver variabilities of all the parameters. The reviewers then met in a consensus review session to report jointly the cases that showed discrepant results during the previous independent review. When needed, the criteria were adjusted. A new set of rules was established. This new set of rules was used to review blindly a new set of patients enrolled in the protocol. The main change was a simplification of the number of focal lesions described, which were merged into the following groups: 0 lesions, 1–3 lesions, 4–10 lesions and >10 lesions.
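The grouping of focal lesion counts can be sketched as a small function. This is only an illustration: the function name is hypothetical, and the numeric codes 1 to 4 follow the lesion number group (x) used in the listing of the criteria.

```python
from typing import Tuple


def lesion_number_group(n_lesions: int) -> Tuple[int, str]:
    """Map a focal lesion count to its group code (1-4) and label."""
    if n_lesions < 0:
        raise ValueError("lesion count cannot be negative")
    if n_lesions == 0:
        return 1, "0 lesions"
    if n_lesions <= 3:
        return 2, "1-3 lesions"
    if n_lesions <= 10:
        return 3, "4-10 lesions"
    return 4, ">10 lesions"
```

With this grouping, two reviewers who count 5 and 8 lesions in the same scan fall into the same group, which is the kind of discrepancy the simplification was meant to absorb.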

IMPeTUs criteria

The final version of the criteria is only descriptive and is based on five-point scales. This is important, since features correlated with the presence of active disease can be recognized “a posteriori” in the light of follow-up data; this process resembles that used in validation of the Deauville criteria. The final version includes the description, using five-point scales, of: metabolic state of the BM, number and site of focal PET-positive lesions with or without osteolytic characteristics, presence and site of EM disease, presence of paramedullary (PM) disease, and presence of fractures. The visual degree of uptake is defined for the target lesion and EM lesions according to the schema proposed in the Deauville criteria for the evaluation of lymphoma patients [8] (Fig. 1).

Fig. 1

FDG PET/CT imaging in a patient with MM. a MIP image, b-c transaxial fused images (red arrows extramedullary lesion (liver), filled and unfilled arrows extraspinal bone lesions). In this patient the descriptive criteria (IMPeTUs) are: BM3, F2 ExtraSp (5), L, EM EN (5), where BM3 indicates bone marrow uptake is < liver but > mediastinum, F2 indicates one to three lesions, ExtraSp indicates outside the spine with (5) indicating reference lesion uptake >> liver, L indicates at least one lesion is also lytic, EM indicates at least one extramedullary lesion, EN indicates the extramedullary lesion is extranodal (liver) with (5) indicating extramedullary lesion uptake >> liver

The IMPeTUs criteria are summarized in Table 1, and in detail are as follows:

  • Bone marrow (BM):

    • Deauville criteria:

      1. No uptake at all

      2. ≤ mediastinal blood pool uptake (SUVmax)

      3. > mediastinal blood pool uptake, ≤ liver uptake

      4. > liver uptake +10 %

      5. >> liver uptake (twice)

      “A” appended if there is hypermetabolism in limbs and ribs

  • Focal bone lesions (F):

    • Lesion number group (x):

      • x = 1: no lesions

      • x = 2: 1 to 3 lesions

      • x = 3: 4 to 10 lesions

      • x = 4: >10 lesions

    • S: skull

    • Sp: spine

    • ExtraSp: all the rest

    Target lesion is the hottest area

    • Plus Deauville criteria:

      1. No uptake at all

      2. ≤ mediastinal blood pool uptake (SUVmax)

      3. > mediastinal blood pool uptake, ≤ liver uptake

      4. > liver uptake +10 %

      5. >> liver uptake (twice or more)

  • Presence of at least one lytic lesion (L):

    • x = 1: no lesions

    • x = 2: 1 to 3 lesions

    • x = 3: 4 to 10 lesions

    • x = 4: >10 lesions

  • Presence of at least one fracture on CT images (Fr)

  • Presence of paramedullary disease (PM): a bone lesion involving surrounding soft tissues with bone cortical interruption

  • Extramedullary disease (EM):

    • Nodal disease (N) plus site:

      • LC: laterocervical

      • SC: supraclavicular

      • M: mediastinal

      • Ax: axillary

      • Rp: retroperitoneal

      • Mes: mesentery

      • In: inguinal

    • Extranodal disease (EN) plus site:

      • Li: liver

      • Mus: muscle

      • Spl: spleen

      • Sk: skin

      • Oth: other

    • Plus Deauville criteria for target EM lesions:

      1. No uptake at all

      2. ≤ mediastinal blood pool uptake (SUVmax)

      3. > mediastinal blood pool uptake, ≤ liver uptake

      4. > liver uptake +10 %

      5. >> liver uptake (twice or more)

Table 1 Summary of IMPeTUs criteria
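To make the descriptive notation concrete, the following minimal sketch assembles a descriptor string of the form shown in the legend of Fig. 1. It is not part of the published criteria: the class and field names are hypothetical, the lytic component is reduced to a present/absent flag as in the figure legend, and only a single EM category and score are handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImpetusReport:
    """Hypothetical container for one descriptive IMPeTUs readout."""
    bm_score: int                         # five-point score, 1-5
    bm_limbs_ribs: bool = False           # "A": hypermetabolism in limbs and ribs
    focal_group: int = 1                  # lesion number group x, 1-4
    focal_sites: List[str] = field(default_factory=list)  # "S", "Sp", "ExtraSp"
    focal_score: Optional[int] = None     # five-point score of the target lesion
    lytic: bool = False                   # "L": at least one lytic lesion
    fracture: bool = False                # "Fr": at least one fracture on CT
    paramedullary: bool = False           # "PM": paramedullary disease
    em_kind: Optional[str] = None         # "N" (nodal) or "EN" (extranodal)
    em_score: Optional[int] = None        # five-point score of the target EM lesion

    def __post_init__(self) -> None:
        for name in ("bm_score", "focal_score", "em_score"):
            value = getattr(self, name)
            if value is not None and not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5")
        if not 1 <= self.focal_group <= 4:
            raise ValueError("focal_group must be between 1 and 4")

    def describe(self) -> str:
        """Build a comma-separated descriptor string."""
        parts = [f"BM{self.bm_score}{'A' if self.bm_limbs_ribs else ''}"]
        focal = f"F{self.focal_group}"
        if self.focal_sites:
            focal += " " + " ".join(self.focal_sites)
        if self.focal_score is not None:
            focal += f" ({self.focal_score})"
        parts.append(focal)
        if self.lytic:
            parts.append("L")
        if self.fracture:
            parts.append("Fr")
        if self.paramedullary:
            parts.append("PM")
        if self.em_kind is not None:
            em = f"EM {self.em_kind}"
            if self.em_score is not None:
                em += f" ({self.em_score})"
            parts.append(em)
        return ", ".join(parts)


example = ImpetusReport(bm_score=3, focal_group=2, focal_sites=["ExtraSp"],
                        focal_score=5, lytic=True, em_kind="EN", em_score=5)
print(example.describe())  # BM3, F2 ExtraSp (5), L, EM EN (5)
```

The example values reproduce the descriptor given for the patient in Fig. 1.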

Semiquantitative data for physiological areas are also added. The semiquantitative indexes are measured using a region of interest (ROI) with a radius greater than 3 cm in the central portion of the liver far away from its edge and a ROI completely encompassed in the lumen of the aorta, taking care to avoid the edge of the vessel wall or areas of calcification, for the mediastinal blood pool structures. The following semiquantitative parameters are annotated:

  • BM SUVmax of the hottest lesion per macro area

  • Focal reference SUVmax of the hottest lesion per macro area

  • Focal reference SUVmean of the hottest lesion (five pixels) per macro area

  • Fr on REF: present/absent

  • REF in EM disease: present/absent

  • Liver SUVmax

  • Liver SUVmean

  • Mediastinal blood pool SUVmax

  • Mediastinal blood pool SUVmean

  • Comments: …
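The IMPeTUs scores are assigned visually, but the collected reference values allow the five-point scale to be approximated numerically for later analysis. The sketch below is one possible reading of the scale, not the authors' method. It assumes SUVmax is used for the lesion and for both reference regions, that "no uptake at all" is supplied by the reader as a flag, that uptake between liver and liver +10 % is kept at score 3, and that score 5 starts at twice the liver value. The function name and the example values are hypothetical.

```python
def five_point_score(lesion_suv: float, mbp_suv: float, liver_suv: float,
                     visible_uptake: bool = True) -> int:
    """Approximate the five-point score from SUV values (assumed mapping).

    lesion_suv -- SUVmax of the target lesion or bone marrow
    mbp_suv    -- SUVmax of the mediastinal blood pool reference
    liver_suv  -- SUVmax of the liver reference
    """
    if min(lesion_suv, mbp_suv, liver_suv) < 0:
        raise ValueError("SUV values cannot be negative")
    if not visible_uptake:
        return 1                      # no uptake at all
    if lesion_suv <= mbp_suv:
        return 2                      # <= mediastinal blood pool
    if lesion_suv >= 2.0 * liver_suv:
        return 5                      # >> liver (twice or more)
    if lesion_suv > 1.1 * liver_suv:
        return 4                      # > liver +10 %
    return 3                          # > blood pool, up to liver +10 % (assumption)


# Hypothetical values: liver SUVmax 2.3, mediastinal blood pool SUVmax 1.6.
for suv in (1.5, 2.0, 3.0, 5.0):
    print(suv, five_point_score(suv, mbp_suv=1.6, liver_suv=2.3))
```

Because SUV measurements vary between tomographs, any such numeric mapping would be meaningful only on scanners that have passed a qualification process like the one described above.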

Statistics

Krippendorff’s alpha was used to measure overall agreement among the reviewers [9]. This coefficient is 1 for perfect agreement, 0 for agreement no better than chance and below 0 for agreement lower than that expected by chance.
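For readers unfamiliar with the coefficient, a minimal sketch of Krippendorff's alpha computed from a coincidence matrix is given below. It is not the software used in this study. The default distance treats the scores as nominal categories, which is an assumption; for ordered five-point scales a different distance function can be passed in. The function name and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Hashable, Sequence


def krippendorff_alpha(
    units: Sequence[Sequence[Hashable]],
    distance: Callable[[Hashable, Hashable], float] = lambda a, b: float(a != b),
) -> float:
    """Compute alpha = 1 - D_o / D_e.

    units    -- one sequence of ratings per scan; missing ratings are omitted
    distance -- disagreement between two values (nominal by default)
    """
    # Coincidence matrix: every ordered pair of ratings within a unit,
    # weighted by 1 / (number of ratings in the unit - 1).
    coincidences: Counter = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # units with fewer than two ratings carry no information
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("at least one unit with two or more ratings is required")

    observed = sum(w * distance(a, b) for (a, b), w in coincidences.items()) / n
    expected = sum(
        totals[a] * totals[b] * distance(a, b) for a in totals for b in totals
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - observed / expected


# Toy data: four reviewers who always agree, then two reviewers who never do.
print(krippendorff_alpha([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]]))  # 1.0
print(krippendorff_alpha([[0, 1], [0, 1]]))  # -0.5
```

The second toy case shows how systematic disagreement drives the coefficient below 0, as noted above.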

Results

The first exploratory PET/CT series included 30 scans performed in ten patients. This dataset was used for a preliminary check of the applicability of the criteria and was therefore carried out in patients not enrolled in the multicentre protocol. Overall a good concordance was found for BM evaluation. A low concordance rate among the reviewers was found for focal lesion number, focal lesion score, presence of fracture, presence of lysis, and presence of PM and EM disease (Table 2).

Table 2 Percentage agreement on the first reading (ten patients, 30 scans)

After the criteria were modified on the basis of the findings in the first ten patients (in particular categorizing the number of lesions into different groups), another 17 consecutive patients were enrolled within the Italian multicentre protocol. They had all undergone a baseline PET scan (PET-0), a PET scan after induction therapy (PET-AI) and a PET scan at the end of therapy (PET-EoT). The patient characteristics are presented in Table 3.

Table 3 Patient characteristics

PET-0

Concordance in the evaluation of BM was 71.5 %. Alpha values for BM score and focal score were 0.33 and 0.47, respectively. Alpha values for number of focal lesions and lytic lesions were 0.40 and 0.32, respectively. Skull, spine and extraspinal involvement was found in 2, 14 and 12 patients, respectively, with mean concordances of 37 %, 78 % and 71 %. A fracture was reported by one reviewer in four patients and by two reviewers in one patient. A PM lesion was reported by one reviewer in two patients and by all reviewers in two patients. Four patients had nodal and extranodal EM lesions.

PET-AI

Concordance in the evaluation of BM was 64 %. Alpha values for BM score and focal score were 0.09 and 0.43, respectively. Alpha values for number of focal lesions and lytic lesions were 0.22 and 0.21, respectively. Skull, spine and extraspinal involvement was found in 1, 13 and 5 patients, respectively, with mean concordances of 25 %, 67 % and 79 %. A fracture was reported by one reviewer in four patients and by all reviewers in one patient. A PM lesion was reported by one reviewer in four patients and by all reviewers in two patients. No patients had EM disease.

PET-EoT

Concordance in the evaluation of BM was 68 %. Alpha values for BM score and focal score were 0.07 and 0.28, respectively. Alpha values for number of focal lesions and lytic lesions were 0.25 and 0.21, respectively. Skull, spine and extraspinal involvement was found in 1, 9 and 5 patients, respectively, with mean concordances of 25 %, 47 % and 70 %. A fracture was reported by one reviewer in five patients. No patients had PM or EM disease (Table 4).

Table 4 Krippendorff’s alpha values in 17 patients for agreement among four reviewers

Results regarding the agreement among the reviewers as to the presence/absence of bone lesions are presented in Table 5.

Table 5 Agreement among the reviewers as to the presence/absence of bone lesions

Discussion

In clinical practice, oncohaematologists rely on imaging to obtain prognostic data, to stratify patients into risk groups and to identify the best and personalized treatment for each patient. FDG PET/CT is a sensitive but rather nonspecific diagnostic method in oncology. Moreover, standard and shared interpretation criteria to ensure data reproducibility are still lacking. Some nuclear medicine physicians, for example, tend to report only findings with a very high positive predictive value (specificity higher than sensitivity), others tend to report even tiny and faint findings (sensitivity higher than specificity), and others may change their reporting approach depending on the disease under evaluation, clinical history, patient age, risk of relapse etc. Consequently, the same set of images with equivocal findings may result in a positive report, inconclusive report or negative report according to the approach of the nuclear medicine physician, which is usually based on his or her own practical experience and relationship with referring clinicians. This is why a standardized interpretation algorithm is of basic importance. Understanding which finding is relevant, which is collateral, which is prognostic and which remains equivocal is the key to speaking the same language in both clinical practice and scientific clinical research.

At about the turn of the millennium, imaging generally played a marginal role in MM. Only skeletal plain radiography was included in staging systems, but was unsuitable for assessing response to therapy [10]. More recently, MRI was included in staging procedures because of its exceptional sensitivity in the detection of tumour spread, especially in the spine [11–14]. However, MRI is also insensitive for the interim assessment of therapy response [15]. The use of PET in MM started relatively late in comparison to PET in lymphoma. Several reasons account for this: (1) the presence in MM of a reliable, readily available and cheap marker for diagnosis and treatment monitoring, the M-protein; (2) low availability of PET scanning facilities in multicentre settings for a disease associated with serious morbidity in more than 30 % of patients at disease onset; (3) lack of standard criteria for PET scan interpretation. MM is a very complicated disease from an imaging point of view. To increase the sensitivity of PET the field of view must be extended to also include the long bones and the skull, which are involved by disease in a significant percentage of patients. Moreover, the skeleton can be affected with a diffuse/infiltrative pattern or with focal lesions. Finally, extraskeletal lesions may be present. Concomitant myeloma-related conditions (including anaemia and fractures) are common and complicate image interpretation.

In recent years several studies have focused on FDG PET/CT for MM staging, restaging and treatment monitoring. Many studies have provided interesting preliminary results on the accuracy and predictive value of PET/CT, and its value in the early assessment of therapy response [4, 7, 16]. However, the interpretation criteria used in these studies differ: some used purely visual criteria, some semiquantitative criteria and some mixed criteria. Such criteria are rather concordant in providing a clear definition of a positive or a negative lesion, but may be strongly discordant if the findings are equivocal, such as in the presence of mild focal bone uptake (especially during or after therapy), leading to variable performance of FDG PET/CT [4, 7, 16].

In consideration of the increasing use of FDG PET/CT and of upcoming international multicentre trials on PET/CT in MM, a group of Italian nuclear medicine physicians, haematologists and medical physicists proposed new standardized interpretation criteria for PET/CT image reading. Building on the positive experience with PET in lymphoma, it was decided to adopt interpretation rules similar to the Deauville criteria based on visual assessment [17], including a semiquantitative analysis (SUVmax of lesions and normal structures) to clarify any equivocal results, within the limitations of quantitative assessment in the multicentre setting. A descriptive readout was adopted for the proposed criteria, and the follow-up criterion (e.g. the disappearance of the abnormality at the end of treatment) was used to define the described finding as attributable to disease. This method for setting up new diagnostic criteria takes longer than a standard validation of criteria predefined a priori, but is guaranteed to be more applicable and reproducible in use. Moreover, it allows nonspecific FDG uptake patterns that may be difficult to interpret but may not have a clinical impact or meaning (for example, increased BM activity or mild residual uptake in lytic lesions after therapy) to be ruled out. Accordingly, each abnormality is described but its clinical relevance is assessed a posteriori.

Nonetheless, the agreement among the reviewers seems at first glance to be suboptimal, even though there was some improvement from the first version of the criteria to the final version, which was simplified particularly in terms of the categorization of the number of lesions. This is because statistical analysis carried out with only one coefficient takes into account perfect concordance of different degrees of positivity, and is not only a dichotomous judgment in terms of positivity/negativity as reported in previous studies on Deauville criteria in lymphoma patients [8]. Finally, the agreement among the reviewers was comparable to that obtained following the first revision in previous studies on lymphoma [8, 18, 19]. In this perspective a consensual revision of the nonconcordant cases was very helpful for clarifying the most critical points, and the overall set of rules awaits prospective validation in a larger cohort of patients.

As expected, a lower concordance rate was found in the detection of skull lesions and of lesions active after therapy. The skull is a problematic bone area in MM. Lesions are usually very small there, and their detectability is impaired by the physiological high uptake of the underlying brain. After therapy the detectability of lesions strongly depends on the background activity of “normal” BM, and many lesions with minimal residual uptake may be hardly detectable in patients with reactive BM.

The current set of interpretation criteria will be adopted in a forthcoming study including a larger series of patients, which, according to the planned protocol, is expected to reach a sample size of 100 to 130 patients. Subsequently, in the light of follow-up data, a visual cut-off for positivity will finally be defined and provided.

Conclusion

This preliminary study demonstrated that these new, easily applied and descriptive criteria for FDG PET/CT in MM seem to be sufficiently reproducible and may become the shared basis for international multicentre clinical protocols.