Introduction

Evaluation of response to treatment of liver metastatic lesions is of great importance in clinical practice, as well as in research protocols dealing with the effectiveness of novel therapeutic strategies [1]. Response to treatment may be manifested either by reduction in size or by necrosis and colliquation of the tumor [2, 3]. Most of the traditional chemotherapeutic drugs aim to reduce tumor size and in these cases the standard method for assessing the response to treatment is to determine changes in tumor size on cross-sectional imaging examinations [4]. The World Health Organization (WHO) first attempted to propose guidelines and standards for this purpose [5]. According to WHO guidelines, alteration in a lesion’s size is assumed to be reflected by changes of the cross product of the lesion’s maximum diameter and its perpendicular diameter on imaging studies. The response to treatment is then categorized as: complete response (CR), i.e., disappearance of all known disease; partial response (PR), i.e., reduction greater than 50% in tumor size; stable disease (SD), i.e., absence of partial response or progressive disease; and progressive disease (PD), i.e., increase in size greater than 25% of one or more lesions, or the appearance of new lesions.
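The WHO categories described above can be sketched in a few lines of Python. This is a minimal illustration, not the guideline's official algorithm: the function names, the use of the summed cross products for partial response, and the example measurements are all assumptions made for the sketch.

```python
def cross_product(max_diameter_mm, perpendicular_diameter_mm):
    """Bidimensional WHO measure: longest diameter times its perpendicular."""
    return max_diameter_mm * perpendicular_diameter_mm


def who_response(baseline, followup, new_lesions=False):
    """Classify response from per-lesion cross products (same lesion order).

    Assumption of this sketch: PR is judged on the sum of cross products,
    PD on any single lesion, as the text describes.
    """
    if new_lesions:
        return "PD"
    # PD: increase greater than 25% in one or more lesions
    if any(b > 0 and (f - b) / b > 0.25 for b, f in zip(baseline, followup)):
        return "PD"
    # CR: disappearance of all known disease
    if all(f == 0 for f in followup):
        return "CR"
    change = (sum(followup) - sum(baseline)) / sum(baseline)
    # PR: reduction greater than 50% in tumor size
    if change < -0.50:
        return "PR"
    return "SD"


# Hypothetical measurements (mm) for two lesions
baseline = [cross_product(40, 30), cross_product(20, 20)]
followup = [cross_product(20, 20), cross_product(15, 10)]
print(who_response(baseline, followup))
```

In this hypothetical case the summed cross product falls from 1600 to 550 mm², a reduction of about 66%, so the sketch returns partial response.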

The WHO guidelines were replaced by new criteria for the estimation of tumor load, the response evaluation criteria in solid tumors (RECIST) [2]. The most important points of the new guidelines are: (a) a maximum of five lesions per organ and ten lesions overall (identified as target lesions) should be recorded and measured at baseline and at follow-up, (b) only the longest diameter of each lesion should be measured, and (c) lesions too small for accurate measurement should not be taken into account. The categories of response to treatment remained those proposed by WHO, but the thresholds for each category were adjusted. Both methods have the advantage of simplicity and reproducibility in everyday practice. However, newer, more precise methods of lesion size assessment using three-dimensional (3D) volume measurements have raised questions regarding the accuracy of linear measurements [6–8].

The aim of this study was to evaluate the accuracy of unidimensional measurements (RECIST criteria) compared with volumetric measurements in patients with liver metastatic disease under chemotherapy.

Materials and methods

This prospective study included 57 consecutive patients with newly diagnosed liver metastatic lesions from colorectal cancer who were treated with combination chemotherapy and evaluated by MRI examinations. Metastatic liver lesions were initially detected by CT during the routine staging workup of patients with primary malignancy, and all patients enrolled in the study had at least one lesion with a maximum diameter greater than 10 mm. All patients had a first MRI examination (examination A), performed before the initiation of chemotherapy, one in the middle of the chemotherapeutic scheme (examination B), and one immediately post-treatment (examination C). Twelve patients were excluded from the study because they had no follow-up examinations owing to intolerance of chemotherapy or death. One patient cooperated poorly and was also excluded because the MR images were unsuitable for measurements. Finally, measurements on serial MR examinations were performed successfully in 44 patients (27 male, mean age 68.48 years, 95% CI = 63.94–73.01, and 17 female, mean age 62.43 years, 95% CI = 56.67–68.20). All patients gave informed consent, and the study was approved by the ethical and scientific committee of our institution.

All examinations were performed on a 1-T MR unit (Signa Horizon 1.0 T, GE Medical Systems, Milwaukee, USA), using an abdomen coil (TORSOPA). Each examination comprised a scanogram and T1WI-SPGR and T2WI-FSE axial images with and without fat saturation. After iv administration of gadolinium, a dynamic examination at the arterial, portal, and equilibrium phases was performed using T1-SPGR. SPIO derivatives (SH U 555 A, Resovist®, Schering AG, Berlin, Germany) were then administered, and T2 and T2* images were obtained after 10 min to allow the SPIO particles to be taken up by the Kupffer cells. Eleven patients did not have a third examination, and in these patients comparisons were performed only between examinations A and B. The mean interval between examinations A and B was 3.39 months (range 2.6–5.9) and between examinations B and C was 3.29 months (range 3.3–5.2).

All examinations were transferred to a workstation (Centricity, GE Medical Systems, Milwaukee, USA). The five most appropriate lesions according to RECIST guidelines, namely those with the longest diameter that were suitable for accurate repeated measurements, were selected as target lesions (TLs) on the initial MR examination. Each TL was measured for its maximum diameter and its volume on T1-SPGR during the portal phase and on axial T2WI after the administration of SPIO. T2WI after SPIO administration (TR 1,920–2,400 ms, TE 92 or 34 ms, echo train length 21, matrix 256 × 192, reconstruction 380–460, WC 80–90, WL 160–180) were used for the measurements owing to their excellent contrast resolution. The volume of each TL was calculated using the “summation of areas” technique. First, the examining radiologist traced the outline of the lesion’s border in each slice in which the lesion appeared. The enclosed area was then calculated automatically by the workstation. The volume of the lesion in each slice was obtained by multiplying the area by the slice thickness, and the total volume of the lesion was calculated by summing the volumes of all slices.
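The “summation of areas” calculation can be illustrated with a short Python sketch. How the workstation computes the area of a traced outline is not stated in the text; the shoelace formula used here is an assumption, as are the function names, the slice thickness, and the example areas.

```python
def polygon_area_mm2(outline):
    """Area enclosed by a traced outline given as (x, y) points in mm.

    Uses the shoelace formula (an assumption; the workstation's actual
    method is not described in the text).
    """
    n = len(outline)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def lesion_volume_cc(slice_areas_mm2, slice_thickness_mm):
    """Sum of (area x slice thickness) over all slices, converted to cc."""
    volume_mm3 = sum(area * slice_thickness_mm for area in slice_areas_mm2)
    return volume_mm3 / 1000.0  # 1 cc = 1000 mm^3


# Hypothetical lesion visible on three slices of 5 mm thickness
square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 mm x 10 mm outline
areas = [polygon_area_mm2(square), 200.0, 100.0]
print(lesion_volume_cc(areas, slice_thickness_mm=5.0))
```

With these hypothetical values the three slices contribute 500, 1000, and 500 mm³, giving a lesion volume of 2 cc.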

Statistical analysis

The diameter and volume of each target lesion were recorded on an Excel spreadsheet (Microsoft Corp, USA) specifically designed for automatic stratification of each patient/lesion response category in the comparisons between examinations. The percentage change of tumor burden—taking into account all TLs—was calculated for each follow-up examination in each patient. The response to treatment was considered as CR if all lesions had disappeared; PR if there was reduction greater than 30% in the summation of diameters or 65% in the summation of volumes of target lesions; SD if there was neither PR nor PD; and PD in cases where the increase in the summation of diameters or volumes was greater than 20% or 73%, respectively (Table 1). The percentage change in diameter and volume of each single lesion was also calculated to assess the response to treatment of each separate lesion.
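The stratification rule described above can be sketched as follows. This is an illustrative Python version, not the authors' spreadsheet: the names are hypothetical, new lesions are not modeled, and the diameters in the example are invented, while the volumes are those quoted in the caption of Fig. 2.

```python
# (PR cut-off, PD cut-off) as fractional changes in the summed measurement
THRESHOLDS = {
    "diameter": (0.30, 0.20),  # RECIST: sum of longest diameters
    "volume": (0.65, 0.73),    # volumetric equivalents
}


def classify(baseline, followup, metric):
    """Return CR, PR, SD, or PD from summed target-lesion measurements."""
    pr_cut, pd_cut = THRESHOLDS[metric]
    total_baseline = sum(baseline)
    total_followup = sum(followup)
    if total_followup == 0:
        return "CR"  # all target lesions have disappeared
    change = (total_followup - total_baseline) / total_baseline
    if change < -pr_cut:
        return "PR"
    if change > pd_cut:
        return "PD"
    return "SD"


# Volumes (cc) from the Fig. 2 caption; diameters (mm) are hypothetical
# values chosen to represent an increase of less than 20%.
print(classify([50.0], [58.0], "diameter"))
print(classify([42.46], [91.99], "volume"))
```

The example reproduces the kind of discordance shown in Fig. 2: a 16% increase in diameter stays within stable disease, whereas the volume increase of about 117% exceeds the 73% cut-off for progressive disease.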

Table 1 Equivalence of measurement thresholds between criteria used in the evaluation of tumor response [2]

The agreements between the two measuring methods were examined using the sign test and the coefficient for inter-rater agreement Cohen kappa [9]. The interpretation of the test was based on literature guidelines [10, 11] and was translated into five scales: poor (κ = 0–0.20), fair (κ = 0.21–0.40), moderate (κ = 0.41–0.60), good (κ = 0.61–0.80), and excellent (κ = 0.81–1.00).
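For readers unfamiliar with the statistic, a minimal Python sketch of Cohen's kappa and of the five-level interpretation scale is given below. The function names and the four example ratings are hypothetical; the study itself used SPSS, and this sketch does not handle the degenerate case in which chance agreement equals 1.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long lists of category labels."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement from the marginal frequencies of each method
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map kappa to the five-level scale used in the text.

    Negative values are also labeled "poor" in this sketch.
    """
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical response categories assigned by the two methods
recist = ["PR", "PR", "SD", "SD"]
volumetry = ["PR", "SD", "SD", "SD"]
kappa = cohen_kappa(recist, volumetry)
print(kappa, interpret(kappa))
```

In the hypothetical example the observed agreement is 0.75 and the chance agreement is 0.50, so kappa is 0.5, which falls in the “moderate” band.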

A comparison between the volume of each lesion (as measured by the volumetric technique) and the volume of a sphere with a diameter equal to the lesion’s longest diameter was performed to evaluate the assumption that the shape of a sphere approximates the shape of any lesion. As the volumes of the lesions were not normally distributed the Wilcoxon signed ranks test was used. Sphere volumes were calculated using the equation $V = \frac{4}{3}\pi (D/2)^{3}$, where V represents the volume of a sphere and D the diameter of a lesion.
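The sphere model also explains where the volumetric thresholds of 65% and 73% quoted in this section come from. The short derivation below is a worked illustration; the subscripts 0 (baseline) and 1 (follow-up) are notation introduced here and do not appear in the original text.

```latex
% Sphere model: volume as a function of the longest diameter D
V = \frac{4}{3}\pi\left(\frac{D}{2}\right)^{3} = \frac{\pi}{6}D^{3}
\quad\Longrightarrow\quad
\frac{V_{1}}{V_{0}} = \left(\frac{D_{1}}{D_{0}}\right)^{3}

% 30% decrease in diameter (PR threshold)
\left(1 - 0.30\right)^{3} = 0.343
\;\Rightarrow\; \text{volume decrease} \approx 65.7\%

% 20% increase in diameter (PD threshold)
\left(1 + 0.20\right)^{3} = 1.728
\;\Rightarrow\; \text{volume increase} \approx 72.8\%
```

These values agree, after rounding, with the 65% and 73% volume cut-offs used for partial response and progressive disease.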

The Statistical Package for the Social Sciences (SPSS version 15.0, SPSS Inc., Chicago, IL, USA) was used for statistical analysis. A p value lower than 0.05 was considered statistically significant. The power of the statistical analysis was estimated using the software Study Size 2.0 (Creostat 2001–2007); the power for the kappa statistics was 0.93–0.97 for the different evaluations, which is considered strong.

Results

A total of 121 MRI examinations were performed and 170 lesions were characterized as target lesions, allowing for 77 comparisons between examinations and 301 comparisons among individual lesions on serial MRI examinations. The volumetric and RECIST methods were in agreement in 64 out of 77 classifications in treatment response categories (Table 2), resulting in a kappa value of 0.735 (p < 0.001) that corresponded to a “good agreement” between the two methods. Disagreement was observed in 13 out of 77 comparisons (16.88%); in 11 patients the response according to volumetry was worse than RECIST (9 PR results on RECIST corresponded to SD on volumetry and 2 SD on RECIST corresponded to PD on volumetry), while in 2 patients the opposite was found by the “sign test” (one categorized as SD by RECIST and PR by volumetry and one PD instead of SD, respectively). All these differences were at the level of one response grade. With respect to the response of each individual lesion, the two methods were in agreement in 253 out of 301 comparisons (Table 3), showing a kappa value of 0.741 (p < 0.001) that corresponded to a “good agreement”. The two methods demonstrated different response grades in 48 of 301 individual comparisons (15.95%). At the first follow-up (examination A–examination B) the kappa value of agreement was 0.732 (p < 0.001), while at the second follow-up (examination B–examination C) the kappa value was 0.746 (p < 0.001), both at the level of “good agreement”.

Table 2 Cross tabulation between RECIST and volumetry concerning patients’ response to treatment
Table 3 Cross tabulation between RECIST and volumetry regarding the individual lesions response

The response of each individual lesion did not differ from the overall tumor response of the patient in whom the lesion was located for 218 of 301 individual lesions by RECIST (κ = 0.542, “moderate agreement”, p < 0.001) and for 223 of 301 by volumetry (κ = 0.568, “moderate agreement”, p < 0.001). However, in 83 (27.6%) lesions by RECIST and 78 (25.9%) lesions by volumetry, the individual lesion response category did not match the overall response category of the corresponding patient (Diagram 1). Overall, in 35 of the 44 patients at least one lesion did not follow the overall tumor response of the patient (Fig. 1).

Diagram 1 Response to treatment of all individual lesions compared with the overall response of the patient in whom each lesion was located according to volumetry (left) and RECIST (right). Patient response categories are represented on the x-axis. Lesion response category is depicted in color: CR complete response (white), PR partial response (light gray), SD stable disease (dark gray), PD progressive disease (black). Corresponding cross tabulation tables are shown at the top

Fig. 1 Discrepancy in the behavior of metastatic liver lesions under treatment. T2* images after SPIO administration, initial examination (a) and follow-up (b) 4 months later. Two lesions (black arrows) decreased in size, responding to treatment. On the contrary, the small, hardly visible lesion in the right liver lobe (white arrow) is considerably larger in the follow-up examination

The average percentage change in the diameter of the lesions (301 comparisons) was 46.77% (RECIST measurements), while the average percentage change in volumes was 159.57%. Assuming that the lesions are spherical in shape and changes are equally distributed in all directions, one would expect a percentage change of 316.16% in volume when the diameter changes are 46.77%. The Wilcoxon test confirmed a statistically significant difference between the real volume changes of lesions and the expected volume changes (Fig. 2) considering the model of a sphere (z = −5.03, p < 0.001).
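The expected volume change under the sphere model follows from cubing the diameter ratio, which can be checked with a few lines of Python. The function name is hypothetical; the input is the average diameter change reported above.

```python
def expected_volume_ratio(diameter_change_pct):
    """Follow-up/baseline volume ratio if a sphere's diameter changes by
    the given percentage equally in all directions."""
    return (1 + diameter_change_pct / 100.0) ** 3


ratio = expected_volume_ratio(46.77)
print(f"volume ratio: {ratio:.4f}")
print(f"as % of baseline: {ratio * 100:.2f}%")
print(f"net change: {(ratio - 1) * 100:.2f}%")
```

Under this assumption the follow-up volume is about 3.16 times the baseline volume. This appears to correspond to the 316.16% figure quoted above when expressed as a percentage of baseline volume, that is, a net increase of about 216%.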

Fig. 2 Discrepancy between RECIST and volumetric measurements. A 53-year-old female patient with a single metastatic liver lesion on initial examination (a) and follow-up examination (b) 3 months later. The unidimensional measurement shows an increase of less than 20%, thus the disease was categorized as “stable”. However, volumetry discloses more than a doubling in volume (from 42.46 to 91.99 cc) and the disease should be considered as “progressive”. The discordance is probably due to the fact that changes in lesion dimensions are not equal

Discussion

Evaluation of the efficacy of an anticancer treatment is important in patient management and in clinical trials [4]. Over recent decades the methodology for estimating the response to treatment has been modified several times [2, 5, 12]. The current set of tumor response criteria (RECIST) [2] adopts the same categories of response as the previous WHO guidelines [5], i.e., complete response, partial response, stable disease, and disease progression, but the method of measurement has been modified [3]. RECIST proposes the use of the longest diameter of each lesion, since it was considered as accurate as the bidimensional measurements employed by the WHO guidelines [7]. Although the RECIST method is easy, quick, and reproducible, and thus widely used in clinical practice [13], the simplicity of the technique contrasts markedly with the increasing sophistication of imaging instrumentation [14]. Volumetric image acquisition with modern MR and multidetector CT systems allows precise three-dimensional measurement of tissue volumes [6]. Furthermore, post-processing of acquired images for three-dimensional study of organs [15] and lesions [16] is now easier. Quantification of metastatic tumor burden in the liver can be accurately performed using the volumetric acquisition provided by modern CT [17, 18] and MRI. Quantification using volumetry may be considered more representative of the actual size changes of lesions than unidimensional measurements (Fig. 2). Moreover, a number of authors have questioned the ability of unidimensional measurements to accurately assign patients with solid tumors to the appropriate response category [6–8, 19]. Instead, the use of volumetric measurements is proposed for proper patient management [6–8]. This issue has been considered in the discussion of a possible revision of the response criteria by the NCI and the group that was also involved in the development of the RECIST criteria [20].

Comparison between RECIST and volumetry demonstrated a “good” but not “excellent” agreement between the two methods in the present study; in 13 of 77 (16.88%) comparative evaluations the two methods stratified the patients into different response categories. In the majority of these cases, volumetry assigned the patients to a more unfavorable response category than RECIST. Similar results were found in the evaluation of individual lesions under treatment: in 48 of 301 comparisons (15.95%) the two methods disagreed on the response assessment of individual lesions. A difference in response category in such a significant proportion of patients and lesions (about 1/6) may influence therapeutic decisions in everyday clinical practice and may also be misleading in the interpretation of results in clinical trials. The data presented in our study may provide further support for revising the RECIST criteria towards the use of volume measurements in the assessment of metastatic tumor burden in the liver. The necessity of such a revision has also been recognized by the RECIST group, which has adopted the concept of the superiority of volumetry over unidimensional measurements [20]. On the other hand, volumetry is too time consuming and cumbersome, especially in lesions with irregular borders, to be widely adopted in everyday clinical practice. However, the rapid technological evolution of medical imaging promises solutions for easy and accurate volume measurements [21, 22].

An interesting observation in the present study is that a number of lesions defined as TLs did not behave like the rest of the lesions in the same patient. According to our data, 78 of 301 (25.91%) comparisons of individual lesions on serial MRI examinations resulted in a response category different from that of the patient in whom the lesion was located. Thus, 35 of the 44 patients enrolled in this study had at least one target lesion whose response category differed from the overall response of the same patient, suggesting variable sensitivity to chemotherapeutic agents among lesions in the same patient. These differences in response may reflect differences in chemosensitivity, which are mostly due to the development of resistant cancer cell clones, as has been previously described [23]; thus, at least theoretically, one patient may be classified into different response categories if different lesions are selected as TLs among lesions of similar size. Although selecting five lesions as representative of the total tumor burden facilitates the measurement procedure, it might lead to inaccurate conclusions concerning the response of the patient [24]. It might be more appropriate to include all lesions in the evaluation of response to treatment, and forthcoming software developments may compensate for the difficulties in measurement.

The concept of measuring only one dimension of each lesion is based on the principle that the diameter is in accordance with the lesion’s volume [25]. Moreover, the boundaries between response categories are usually translated from one metric system to another with the assumption that the tumors grow or shrink similar to a sphere and do so equally in all directions [26]. To explore this hypothesis we compared the volume of each lesion as measured by volumetry with the presumed volume of the lesion using its maximum diameter as the diameter of a sphere. The two volumes differed at a statistically significant level, with the calculated “sphere” volume being almost 100% larger than the actual volume of the lesions, raising questions about the appropriateness of the sphere model. These data further imply that changes in size of a lesion may not be equally distributed in all directions. If the sphere model ultimately proves insufficient for describing changes in size of metastatic lesions—as previous studies [19] and the present one have suggested—then measurement of actual volumes appears to be the more accurate approach.

The fundamental role of imaging in the evaluation of the response of metastatic liver disease to treatment is well established among clinicians worldwide. Most of the studies that have focused on this issue have used CT as the imaging modality. MR imaging is increasingly applied in the evaluation of oncology patients with metastatic disease [27], and its use is also suggested along with CT by the RECIST guidelines. The inherently high contrast resolution of MR imaging, which is amplified by the use of contrast agents, renders MRI a very powerful tool in the detection and demonstration of liver metastases [28–31]. The use of double contrast administration, i.e., Gd+SPIO, is superior to other imaging techniques and their combinations for intrahepatic lesion characterization [32–34]. Thus, false positive results in the selection of target lesions as metastases may theoretically be diminished. In addition, the use of SPIO allows for better contrast and delineation of lesion boundaries, thereby facilitating measurement procedures [35, 36] and overcoming issues related to the precise timing of image acquisition when gadolinium is used as the contrast medium. Nevertheless, this is, to the best of our knowledge, the first study employing MRI in assessing the response to treatment of metastatic liver neoplasms. The limited use of MRI, compared with CT, in such an application is probably related to cost and availability issues, and it may be suggested cautiously in clinical practice.

While conducting the measurements in this study, we faced some difficulties arising from the application of the RECIST guidelines. In a number of cases target lesions became confluent on follow-up MRI with other, nontarget neighboring lesions demonstrated on the initial MRI; thus, these TLs were rendered unsuitable for measurement. The opposite was also encountered when a large lesion responding to treatment split into several smaller lesions on follow-up MRI. The proper handling of such lesions is not described in the RECIST guidelines, and clarification of this issue is needed, since it is not uncommon in clinical practice. A possible suggestion for the forthcoming RECIST revision [20] might be the retrospective deselection of lesions that have become confluent with nontarget lesions. In cases of split lesions, it may be more appropriate to measure on follow-up examinations the sum of the smaller lesions derived from the larger lesion on the initial examination.

Another potential source of conflict is whether or not to measure a lesion that fulfills the size criteria for a target lesion at the initial imaging examination and becomes too small to be measured on a follow-up examination in a patient who responded to treatment (Fig. 3). This issue can lead to discrepancies between studies, and clarification of the handling of these lesions should be provided. In fact, lesions that become too small are strongly suggestive of a successful response to treatment and should not be deselected as target lesions. A possible suggestion for the RECIST revision might be the inclusion of tiny lesions in the measurements in such cases, even if small lesions are prone to significant measurement error.

Fig. 3 Two lesions were detected and measured in the right liver lobe on the initial MRI examination (a) in a male patient with colorectal cancer. On follow-up examination (b), one lesion is not considered measurable according to RECIST criteria (longest diameter less than double the slice thickness), complicating the comparison between the two examinations

In conclusion, the present study supports the use of volumetric techniques for the assessment of liver metastatic disease response to treatment as opposed to the application of unidimensional measurements that have certain drawbacks.