Colorectal cancer (CRC) is the third leading cause of cancer-related death worldwide.1 Over one-third of patients with CRC will progress to develop colorectal liver metastases (CRLM) during the course of their disease, resulting in two-thirds of all CRC-related deaths.1, 2

Chemotherapy may be provided to patients with upfront unresectable CRLM or for palliative intent.3 Based on institutional practice, and on surgeon, oncologist, and patient preferences, approximately 60% of patients with resectable disease receive chemotherapy prior to surgical resection.4 Radiologic response to chemotherapy occurs in approximately 65% of patients and is usually a prognostic indicator of favorable outcomes and tumor biology.3 However, complete disappearance of CRLM after chemotherapy, a phenomenon referred to as disappearing liver metastases (DLM) or complete radiologic response, poses a therapeutic dilemma because of uncertainty about residual microscopic disease. To date, there is no consensus on the management of patients with DLM after chemotherapy.5, 6 Current approaches vary, with some surgeons advocating for resection of all the original sites of disease, while others resect only visible residual disease. Furthermore, in patients with unresectable disease at baseline, the definition of conversion to resectability hinges on the interpretation of DLM.

Imaging studies, including computed tomography (CT), magnetic resonance imaging (MRI), [18F]fluoro-2-deoxy-D-glucose positron emission tomography (FDG-PET), and intraoperative ultrasound (IOUS), have an essential role in the detection and localization of DLM. Overall, DLMs can occur in up to 37% of patients who receive preoperative chemotherapy,5 and this is largely dependent on the type and quality of the imaging modality.5, 7 The objective of this study was to identify the imaging modality that best predicts pathological complete response in patients with complete radiological response after chemotherapy.

Methods

The protocol of this systematic review was registered in PROSPERO (CRD42019131949) and the reporting adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines (electronic supplementary Table 1).8

Search Strategy

The search strategy was developed by a health services librarian with experience in systematic reviews and meta-analyses, with input from the investigative team (electronic supplementary Methods). The accuracy of the search was validated based on the inclusion of a priori known eligible references. The search strategy was first piloted in MEDLINE before being adapted to the syntax of additional databases. MEDLINE, EMBASE, Web of Science, Scopus, the Cochrane Central Register of Controlled Trials, and Cochrane Database for Systematic Reviews were searched from inception to February 2019.

Online clinical trial registries and conference proceedings of relevant national and international societies were searched to ensure literature saturation. Reference lists of eligible studies were manually assessed in order to detect any potentially relevant article.

Eligibility Criteria

Randomized controlled trials, prospective and retrospective studies were considered for inclusion. The population of interest was patients of all ages across all care settings who received chemotherapy for the management of CRLM and had DLM on restaging imaging.

Eligible studies were considered for inclusion if they reported the type of imaging modality that demonstrated DLM after chemotherapy. To ensure that all lesions were accounted for, studies were only included if they reported the number of DLMs at the lesion level. Eligible studies were also required to report at least one of the following reference (gold) standard results: (1) the number of DLM disease recurrences at 1-year follow-up surveillance imaging if the DLM was not resected, or (2) the pathology report results if the DLM was resected.

Studies without chemotherapy treatment of CRLM or without the development of DLM on restaging images after chemotherapy were excluded, as were studies that did not explicitly state the type of imaging modality that diagnosed the DLM. In the latter case, articles were included only if it was possible to extract information regarding the number of DLMs and the imaging modalities that identified the DLM. All authors were contacted to attempt to acquire any missing information from potentially eligible studies.

Outcomes

The primary outcome of interest was the diagnostic ability of imaging to predict the true complete response of the DLM by calculating the negative predictive value (NPV). The NPV was calculated as the proportion of true-negative lesions of all DLMs. True-negative lesions were defined as lesions characterized as DLM by imaging that were confirmed to be a true complete response by the reference standard. False-negative lesions were defined as lesions characterized as DLM by imaging that were confirmed to be malignant based on the reference standard.
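As a minimal illustration of the NPV definition above (not part of the original analysis), the calculation at the lesion level can be sketched in Python. The function name and the lesion counts are hypothetical and chosen only to show the arithmetic.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = true-negative DLMs / all DLMs (true negatives + false negatives)."""
    total_dlm = true_negatives + false_negatives
    if total_dlm == 0:
        raise ValueError("At least one DLM is required to compute the NPV.")
    return true_negatives / total_dlm


# Hypothetical counts, not data from any included study:
# 30 DLMs confirmed as true complete response, 10 DLMs found to be malignant.
example_npv = negative_predictive_value(true_negatives=30, false_negatives=10)
print(f"NPV = {example_npv:.2f}")  # NPV = 0.75
```

Because every lesion in the denominator is a DLM, this quantity can be computed without any true-positive or false-positive counts, which is why it remained estimable when those counts were rarely reported.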

Other relevant definitions included true positive lesions, defined as malignant lesions diagnosed on imaging and confirmed by the reference standard (i.e. follow-up imaging or histology). False-positive lesions were defined as lesions diagnosed as malignant on imaging that turned out to be benign by the reference standard. Very few studies reported the numbers of ‘true positive’ and ‘false positive’ lesions and therefore we were unable to calculate the positive predictive value, sensitivity, specificity, likelihood ratio, and diagnostic odds ratio.

To reduce publication bias, both published abstracts and full-text articles were included. No restrictions were placed on the date of publication, publication status, article language, or study setting.

Study Selection

Three reviewers (HM, WC and SS) independently assessed all citations for eligibility in duplicate, in two stages, using DistillerSR software.9 The first stage selected all potentially eligible citations based on title and abstract. Full-text review was subsequently conducted to determine final eligibility. Reviewers were not blinded to study authors or institution. Inter-rater agreement between reviewers was assessed using the kappa statistic, and κ > 0.8 was required before proceeding to the next stage.
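For readers unfamiliar with the kappa statistic, the following Python sketch shows how agreement between two reviewers could be computed. It assumes pairwise Cohen's kappa; the source does not state which kappa variant was used for the three reviewers, and the screening decisions shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical title/abstract screening decisions for ten citations.
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 5 + ["exclude"] * 5
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}; proceed to next stage: {kappa > 0.8}")
```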

If multiple publications included the same patient population, the most recent study reporting the primary outcome of interest was included. All disagreements were resolved through discussion. Any remaining disagreements were resolved by consensus with a fourth author (PK).

Data Collection

Data were abstracted in duplicate by two reviewers (HM and SS) independently using a standardized electronic form, and discrepancies were resolved by discussion. Any remaining disagreements were resolved by consensus with a third author (PK). Data for DLM were recorded at the lesion level.

Quality Assessment

Two reviewers (HM and SS), unblinded to author and journal, independently assessed studies for methodological quality using the QUADAS-2 tool for systematic reviews of diagnostic accuracy.10 Any disagreements were resolved by consensus with a third author (PK).

Statistical Analysis

NPVs and 95% confidence intervals (CIs) were calculated for each imaging modality. Only outcomes from the same modality were combined. NPVs were logit-transformed to better approximate a normal distribution and then pooled. The pooled NPV and its 95% CI were then back-transformed using the antilogit. Statistical heterogeneity was measured using the I² statistic.11 Analysis was completed using RStudio (The R Foundation for Statistical Computing, Vienna, Austria).12
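The pooling steps described above can be sketched as follows. This is an illustrative Python version only; the original analysis was run in R, and the source does not name the between-study variance estimator or any continuity correction. The DerSimonian-Laird estimator, the 0.5 correction for zero cells, the function name, and the study counts are all assumptions made for this sketch.

```python
import math


def pool_npv_logit(studies, z=1.96):
    """Pool per-study NPVs on the logit scale with a random-effects model.

    studies: list of (true_negatives, false_negatives) tuples, one per study.
    """
    effects, variances = [], []
    for tn, fn in studies:
        if tn == 0 or fn == 0:
            tn, fn = tn + 0.5, fn + 0.5  # assumed continuity correction
        effects.append(math.log(tn / fn))  # logit(NPV) = log(TN / FN)
        variances.append(1 / tn + 1 / fn)  # approximate variance of the logit

    # Fixed-effect weights and Cochran's Q.
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance (assumed estimator) and I².
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects pooled logit and its standard error.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))

    def antilogit(x):
        return 1 / (1 + math.exp(-x))

    return {
        "npv": antilogit(pooled),
        "ci_low": antilogit(pooled - z * se),
        "ci_high": antilogit(pooled + z * se),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical (true-negative, false-negative) DLM counts for three studies.
hypothetical_studies = [(30, 10), (45, 15), (12, 8)]
print(pool_npv_logit(hypothetical_studies))
```

Computing the interval on the logit scale and then back-transforming keeps the confidence limits inside the 0 to 1 range, which a normal approximation on the raw proportion does not guarantee.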

Results

Study Selection

Our search resulted in 3488 potential citations and 12 additional potential articles were identified through clinical trial registries and conference proceedings (Fig. 1). Of these, 512 were excluded for duplication, 2846 after title and abstract screen and an additional 129 after full-text review. Twelve retrospective studies and one prospective study were deemed eligible.13,14,15,16,17,18,19,20,21,22,23,24,25 Characteristics of eligible studies are summarized in Table 1.
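The study flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts stated in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified_by_search = 3488
identified_by_other_sources = 12
excluded = {
    "duplicates": 512,
    "title and abstract screen": 2846,
    "full-text review": 129,
}

total_identified = identified_by_search + identified_by_other_sources
included_studies = total_identified - sum(excluded.values())
print(included_studies)  # 13
```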

Fig. 1 PRISMA diagram. PRISMA, preferred reporting items for systematic reviews and meta-analyses

Table 1 Study and patient characteristics

Study and Patient Characteristics

The 13 studies included 332 patients with DLM. Of the studies included, 38% were conducted in North America or Europe and 62% were conducted in Asia. All studies were published between 2006 and 2018 and patients were accrued between 1998 and 2014, with half of the studies accruing patients after the year 2008. Five imaging modalities were included in these studies reporting the incidence of DLM: CT, FDG-PET, MRI, IOUS, and contrast-enhanced IOUS (CEIOUS) (Table 1 and electronic supplementary Table 2).

Most studies reported the characteristics of patients with DLM, with the exception of three studies.15, 16, 20 Those articles either reported baseline characteristics of patients prior to the development of DLM15, 20 or patients with complete pathologic response.16 Although age data were not reported in three studies,15, 16, 20 all studies were conducted in adults aged ≥ 18 years and included patients in whom the reported median or mean age was >50 years.13, 14,17,18,19,21,22,23,24 Sex data were not reported in three studies,15, 16, 20 and six studies included more males than females (Table 1).13, 17, 21, 23, 24

All patients were treated with systemic chemotherapy, with the exception of two studies that used both systemic and hepatic artery infusion (HAI) chemotherapy.16, 25 A variety of chemotherapy regimens were utilized and often articles did not distinguish the chemotherapy regimen that resulted in a DLM. Therefore, we were unable to perform a subgroup analysis based on the modality of delivery and type of chemotherapy treatment administered. The median or mean number of cycles received among the five studies that reported the data was more than six cycles.14, 17, 18, 23, 24 Of the included articles, four studies described providing patients with DLM postoperative chemotherapy.13, 17, 18, 25 The remainder of the studies did not report on postoperative chemotherapy treatment.

The mean or median size of CRLM prior to chemotherapy that resulted in DLM was reported by 10 studies.13, 14, 17, 18,20,21,22,23,24,25 There was wide variability in the baseline mean or median size of DLM (prior to chemotherapy): it was < 1.5 cm in five studies,14, 17, 20, 21, 24 and between 1.5 and 3.4 cm in the remaining five studies.13, 18, 22, 23, 25

There were slight variations in the scanners and protocols used between studies (electronic supplementary Table 2). However, all CT scans were performed with intravenous contrast,13, 15, 16, 18, 22, 23, 25 all MRIs were performed with gadoxetate disodium (Primovist®) contrast,20,21,22,23,24 all CEIOUSs were performed with perflubutane injection,19, 22, 24 all FDG-PETs were performed with 18F-FDG,14, 15 and all IOUSs were performed with a linear probe at the beginning of the operation.17, 19

Eight studies assessed CT scans: three studies used a helical CT scan,13, 16, 18 two studies used a multidetector CT scan,22, 23 and three studies did not specify the system used.15, 24, 25 Section thickness between 1 and 5 mm was described in five of the included studies.13, 15, 18, 22, 23 MRI was evaluated in six studies: three studies used a 1.5 Tesla scanner,20, 22, 24 one study used a 3.0 Tesla scanner,21 one study did not specify,25 and one study used both 1.5 and 3.0 Tesla scanners.23 FDG-PET was assessed in two studies.14, 15 The amount of tracer used, imaging time, and interval between tracer injection and scanning varied between the two articles. One study reported using 15–20 mCi and scanning for 2–3 min after waiting for 45–75 min,14 while the other reported using 10–18 mCi and scanning for 5 min per table position after waiting for 60–120 min.15

Risk of Bias

The QUADAS-2 tool assessment is presented in electronic supplementary Table 3. Under risk-of-bias assessment, all studies were low risk of bias for patient selection and flow and timing domains. For the domains of index test and reference standard, all studies except one were rated as unclear because no details were provided regarding blinding investigators. The remaining study was rated as low because investigators were blinded to patient outcomes.23

For applicability assessment, all articles except one were rated as low applicability concern for index test and reference standard because sufficient information on their imaging protocol and reference test definition was provided.13,14,15,16,17,19,20,21,22,23,24 The one article rated as unclear for index test did not provide sufficient information on the imaging protocols.25 The three studies that did not provide baseline characteristics of patients with DLM were rated as unclear in the patient selection category because it would be difficult to judge whether their population is relevant to other centers.15, 16, 20

Synthesis of Results: Negative Predictive Value

By study design, the incidence of DLM on each imaging modality was reported by all included studies; there were 955 DLMs on CT, 104 on FDG-PET, 50 on IOUS, 585 on MRI, and 175 on CEIOUS. The pooled NPV of DLM for each imaging modality is summarized in Table 2. The pooled NPV was 0.79 (95% CI 0.53–0.93) for CEIOUS, 0.73 (95% CI 0.58–0.85) for MRI, 0.54 (95% CI 0.37–0.70) for IOUS, 0.47 (95% CI 0.34–0.61) for CT scan, and 0.22 (95% CI 0.11–0.39) for PET scan (Fig. 2).

Table 2 Summary of pooled NPV across different imaging modalities
Fig. 2 Pooled NPVs of imaging modalities. Pooled NPVs using a random-effects model of (a) contrast-enhanced intraoperative ultrasound; (b) magnetic resonance imaging; (c) intraoperative ultrasound; (d) computed tomography; and (e) positron emission tomography. DLM disappearing liver metastases, NPV negative predictive value, CI confidence interval

Discussion

DLMs occur after chemotherapy treatment due to tumor response and changes to the liver parenchyma, such as steatosis or sinusoidal obstruction, making it challenging for diagnostic imaging to accurately identify the liver lesions.26 With advances in chemotherapy efficacy, the frequency of DLM or complete radiological response on restaging imaging is rising. Patients with DLM continue to be a source of uncertainty for surgeons.6 In one recent survey, half of the surgeons relied on CT scans in this setting to identify DLM.6 In this meta-analysis, we demonstrated that, according to the currently available evidence, MRI using gadoxetate disodium and CEIOUS are the preferable imaging modalities for evaluation of DLM after chemotherapy, with pooled NPVs of 0.73 (95% CI 0.58–0.85) and 0.79 (95% CI 0.53–0.93), respectively. The benefit of MRI is that it can guide the decision of whether to proceed with an operative intervention, whereas CEIOUS requires an operation. Based on this rationale, we recommend re-imaging all patients with CT scan after chemotherapy and selectively proceeding with an MRI using gadoxetate disodium in patients with DLM prior to proceeding with surgery, as the probability of a true complete response of a DLM on an MRI using gadoxetate disodium is 73%. If a patient is to undergo surgery and there is the capacity to perform a CEIOUS, then it would be advisable to search for the DLM intraoperatively as well. In cases where the DLM is not resected, we recommend that those patients be followed closely with MRI (e.g. every 3 months for the first year, then every 6 months thereafter). This would allow for diligent monitoring of the DLM and prompt intervention through resection or ablation if it recurs.

Our findings are not unique to DLM. Another meta-analysis examining imaging for CRLM after chemotherapy did not focus on DLM but demonstrated similar findings to our study. MRI was the best imaging modality, with a higher sensitivity (85.7%) than CT (69.9%) or PET scans (54.5%).26 The poor diagnostic ability of PET scanning is likely due to scant FDG uptake in necrotic liver lesions and the lower sensitivity of PET scans in detecting lesions smaller than 1 cm.26, 27 Our study is unique in that it focused on examining the diagnostic ability of imaging modalities in detecting a true pathologic response.

Primary studies investigating factors that predict formation of DLMs prior to chemotherapy suggest that smaller size and larger number of liver lesions potentially increase the reporting of DLMs.19, 22, 23 Others investigated the normalization of carcinoembryonic antigen (CEA) values, the type of chemotherapy, and the number of chemotherapy cycles as risk factors of DLMs; however, the data were either limited or inconsistent across the studies.17, 22, 25 This diverse range of values rendered it difficult to arrive at any conclusions that are clinically informative and meaningful to clinicians and patients. We suggest dedicated future studies to examine risk factors and predictors of DLM using rigorous methodology and reporting guidelines.

The main reluctance to perform MRI scans with gadoxetate disodium is due to cost and accessibility;28 however, since MRI appears to be superior to alternative modalities for imaging of CRLM, it results in less additional imaging, lower diagnostic work-up costs, and fewer repeat hepatectomies.28, 29 This, combined with our findings, could be grounds to consider system-level and practice changes to scan all patients undergoing workup, restaging, and surveillance using MRI with gadoxetate disodium.

According to the QUADAS-2 tool, risk of bias was unclear for the reference standard and index test, as blinding details were not provided. This limitation could have affected the diagnostic performance of imaging by overestimating the NPV. Therefore, emphasis should be placed on methodological design in future studies assessing the accuracy of diagnostic imaging tests.

The limitations of this meta-analysis are mostly related to the available information from the primary studies and the limited number of publications in this field. Only 13 articles were included in this systematic review and meta-analysis because the majority of other studies assessing diagnostic accuracy of imaging did not separate between imaging modalities used or did not report the incidence of DLM. Of the included primary studies, few provided the true-positive and false-positive values and therefore it was not possible to compare the effect estimate of the diagnostic odds ratio or likelihood ratio, which are less affected by disease prevalence. Because of the paucity of studies reporting data on the patient level and long-term outcomes, it was not possible to calculate endpoints at the patient level. Additionally, other studies demonstrated that different chemotherapy modalities, in particular HAI chemotherapy, are more likely to result in DLM.25, 30 We were unable to reliably perform a subgroup analysis of HAI and systemic chemotherapy due to the heterogeneity of chemotherapy modalities used and the lack of granular data of DLM using each treatment. Lastly, there were slight variations in the imaging protocols between studies. It would be helpful for future studies to include more detailed documentation of the imaging system, patient-level outcomes, intent of treatment, and DLM based on chemotherapy modality.

Conclusion

The results of this meta-analysis suggest that MRI with gadoxetate disodium is the most appropriate imaging modality for the detection of DLM after chemotherapy. Patients with DLM on other preoperative imaging modalities should undergo re-imaging with MRI prior to contemplating resection of the remaining CRLM. Consideration should be given to proceeding with the resection of visible CRLM, even if DLMs (on MRI) are left in situ, due to the high NPV of this modality.