Introduction

Tumor spread to the peritoneum is commonly seen in patients with cancers of various origins. For example, approximately 75%, 17%, and 10% of patients with ovarian, gastric, and colorectal cancer, respectively, have peritoneal metastases (PM) at the time of initial presentation [1,2,3]. PM are an important cause of morbidity and mortality, with only palliative surgery and systemic chemotherapy as conventional treatment options. Median survival for patients with PM of gastric origin is 1–3 months; for those with PM of colorectal origin, it is up to 12.7 months [3, 4]. For ovarian carcinoma, in spite of new treatment modalities, 10-year survival for patients with advanced-stage disease has hardly improved in the last 25 years (10% versus 13%) [5].

Although the presence of PM is certainly considered a poor prognostic sign, it is not proof of distant metastasis [6]. In recent years, aggressive locoregional treatment strategies including cytoreductive surgery (CRS), possibly followed by hyperthermic intraperitoneal chemotherapy (HIPEC), have shown promising results for patients with limited and resectable peritoneal disease [1, 7]. Five-year survival rates as high as 50% in well-selected patient groups of both ovarian and colorectal cancer have been reported [8, 9]. This means that, for a subgroup of patients with PM, the outlook may shift from palliation to long-term survival or even cure. Accurate staging and selection of patients with a limited peritoneal tumor load is crucial to achieving these optimal results.

Explorative laparoscopy is the reference standard to assess peritoneal disease. However, this procedure is invasive, is often challenging and incomplete because of adhesions, and carries a small risk of complications. This underlines the need for a robust imaging modality to reliably quantify the extent of peritoneal disease. To date, computed tomography (CT) is still the preferred imaging method to diagnose PM in most centers. However, there is a growing interest in functional imaging techniques such as positron emission tomography (PET), whether or not combined with CT, and magnetic resonance imaging (MRI), especially with the addition of diffusion-weighted (DW) sequences.

The purpose of this meta-analysis is to compare the diagnostic performance of CT, PET(CT), and (DW)MRI in the detection of peritoneal metastases in patients with gastrointestinal (i.e., gastric, colorectal, appendiceal) and ovarian cancer in order to indicate the imaging modality most suitable for optimal preoperative selection of potential CRS candidates.

Materials and methods

Literature search

In accordance with the PRISMA guidelines, a systematic literature search was conducted in the databases PubMed, Embase (Ovid), and Scopus to identify relevant articles from January 1997 up to May 2018 [10]. An initial search strategy was set up and carried out in cooperation with an information specialist linked to our hospital (PB).

Search terms used included “peritoneal seeding,” “peritoneal metastases,” “peritoneal carcinomatosis,” “computed tomography,” “CT,” “FDG-PET,” “PET,” “PET(CT),” “magnetic resonance imaging,” and “MRI,” as well as terms such as “diagnosis,” “staging,” and “accuracy,” combined using “OR” and “AND.”

The PRISMA checklist is presented in Appendix 1. The literature search is described in detail in Appendix 2.

Study selection

References retrieved from the database searches were deduplicated and exported to Endnote X7 (Windows) software. First, two abdominal radiologists with respectively 12 and 10 years of clinical experience (M.L., I.vS.) independently screened titles and abstracts. Review articles, meta-analyses, case reports, conference abstracts, animal studies, comments, and letters to the editor were excluded.

Next, if the abstract fulfilled the criteria as stated below or if this was unclear, both readers reviewed the complete original article.

The defined inclusion criteria were:

  • Diagnosis of PM in patients with a newly diagnosed primary gastrointestinal (gastric/colorectal/appendix) or ovarian cancer of any histological type and any stage

  • CT, PET, PET(CT), and (DW)MRI or a combination of these imaging techniques were used to detect PM.

  • Histopathology, surgery, or clinical/radiological follow-up results were used as a reference standard.

  • Results were presented in a 2 × 2 contingency table or such a table could be extracted from the article.

  • Studies containing 15 or more patients

  • Articles were in English

We aimed to include only studies on the detection of PM in newly diagnosed patients, i.e., concerning pretreatment imaging. Studies comprising merely patients with recurrent disease or patient groups that all underwent previous surgery or neoadjuvant chemotherapy treatment (NACT) were excluded. If surgical history or prior NACT was not specifically stated in the article, it was assumed to be absent.

All histological tumor types were eligible for inclusion. However, because of the known lower detection rate of mucinous carcinomas by 18F-FDG-PET, studies comprising exclusively mucinous primary tumors were excluded in order to be able to compare the results of each imaging modality.

Both region-based and patient-based studies were included. Region-based studies assess peritoneal burden following predefined abdominal regions. In case of a region-based study, averages for all regions together should have been given or should be derivable.

The references of the retrieved articles were subsequently crosschecked for more potentially relevant studies. Inconsistencies between the readers regarding potential eligibility were resolved by consensus.

Data extraction

Data extraction and assessment of methodological quality were undertaken by one of the authors (I.vS.) and checked by a second author (M.L.). Consensus was reached on points of disagreement.

For each study, the following characteristics were extracted using a predesigned form:

Study characteristics

Study characteristics are the following: first author, country of origin, year of publication, study design (retro-/prospective), single-/multi-center, primary outcome, reference test (i.e., histopathology, surgical findings, follow up imaging), patient-based or region-based analysis, interval between imaging and reference standard, sample size, prevalence of PM, accuracy, sensitivity, specificity, true-positives/negatives (TP/TN), false-positives/negatives (FP/FN), positive predictive value, and negative predictive value (PPV/NPV).

Patient characteristics

Patient characteristics are mean age, gender, primary tumor and histology, and prior NACT.

Imaging characteristics

Imaging characteristics are the following: type of imaging modality, basic specifications of imaging modalities and techniques used (i.e., type of scanner, field strength MRI, slice thickness), use of intraluminal and intravenous contrast medium, use of bowel preparation or antispasmodics. Additionally, specific scoring systems for peritoneal tumor load, e.g., the peritoneal cancer index (PCI) as introduced by Sugarbaker [11], were noted.

If a study did not state TP/TN/FP/FN results, these results were derived from marginal totals or sensitivity and specificity. If mean results were given for multiple readers, these were used for the analyses; if not, results of the first reader were used. If the presence of PM was rated using a grading scale (three or five point with an intermediate “equivocal” grade), results at the threshold where the equivocal category was regarded as positive were used.
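The derivation of a 2 × 2 table from reported sensitivity, specificity, and marginal totals can be sketched as follows. This is a minimal illustration only: the function name `reconstruct_2x2` and the example numbers are hypothetical and are not taken from any included study, and rounding to whole patients is an assumption.

```python
def reconstruct_2x2(sensitivity, specificity, n_with_pm, n_without_pm):
    """Rebuild TP/FN/TN/FP counts from reported summary measures.

    n_with_pm and n_without_pm are the marginal totals according to the
    reference standard (patients or regions with and without PM).
    """
    # Assumption: counts are rounded to the nearest whole number
    tp = round(sensitivity * n_with_pm)
    fn = n_with_pm - tp
    tn = round(specificity * n_without_pm)
    fp = n_without_pm - tn
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Hypothetical study: 40 patients with PM, 60 without
table = reconstruct_2x2(0.80, 0.90, n_with_pm=40, n_without_pm=60)
print(table)
```

Because reported percentages are themselves rounded, counts rebuilt this way may differ by one from the true table, which is one reason to contact corresponding authors when data are unclear.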

Corresponding authors were contacted to clarify incomplete or unclear data.

Quality assessment

The methodological quality of all included articles was assessed with the revised tool for the quality assessment of diagnostic accuracy studies (QUADAS-2) [12]. This assessment was performed by one of the authors and checked by another (I.vS., M.L.). Final results were based on consensus discussion.

Statistical analysis

Based on the results derived from 2 × 2 contingency tables, all diagnostic parameters for the different modalities such as sensitivity, specificity, and diagnostic odds ratio were calculated. The diagnostic odds ratio (DOR) is a measure for the diagnostic performance of a test, which combines sensitivity and specificity into one measure [13]. A DOR of 1 implies that the test has no discriminatory power at all; the larger the DOR, the better the test discriminates between patients with and without the target disorder. Diagnostic performance between articles was summarized and compared with forest plots and hierarchical summary receiver operating characteristic (HSROC) curves with use of a random effects model as described by Moses and Littenberg [14].
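As an illustration of how these parameters follow from a single 2 × 2 table, the sketch below computes sensitivity, specificity, and the DOR with an approximate 95% confidence interval on the log scale. It is a per-study calculation under stated assumptions (a 0.5 continuity correction when a cell is zero, and a normal approximation for the interval); it does not reproduce the hierarchical pooling performed in Stata. The function name and input numbers are hypothetical.

```python
import math


def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and DOR (with approximate 95% CI) for one study."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Assumption: add 0.5 to every cell if any cell is zero, so the DOR is defined
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    # DOR = (TP/FN) / (FP/TN); a value of 1 means no discriminatory power
    dor = (tp * tn) / (fp * fn)

    # Standard error of ln(DOR), then back-transform the interval
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci_low = math.exp(math.log(dor) - 1.96 * se_log_dor)
    ci_high = math.exp(math.log(dor) + 1.96 * se_log_dor)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": (ci_low, ci_high),
    }


# Hypothetical 2 x 2 table
summary = diagnostic_summary(tp=32, fp=6, fn=8, tn=54)
print(summary)
```

The interval is computed on the log scale because the DOR is a ratio and its sampling distribution is strongly skewed.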

Subgroup analyses were performed for studies with different types of tumors to see whether tumor type (GI or ovarian) affected the diagnostic performance of imaging.

The heterogeneity of the study results was evaluated by calculating the I2 statistic. I2 values can vary from 0 to 100%. Percentages of around 25% (I2 = 25), 50% (I2 = 50), and 75% (I2 = 75) were classified as low, medium, and high heterogeneity, respectively.
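For reference, I2 is conventionally derived from Cochran's Q as I2 = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The sketch below shows that calculation with inverse-variance weights; the function name and the input values (log DORs with their variances) are hypothetical, and the exact routine used by the statistical software may differ.

```python
def i_squared(estimates, variances):
    """Return Cochran's Q and I2 (in %) for a set of study estimates."""
    # Inverse-variance weights: more precise studies count more
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    # Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I2: share of total variation attributable to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical log DORs from three studies with equal variances
q, i2 = i_squared([2.0, 3.0, 4.0], [0.25, 0.25, 0.25])
print(f"Q = {q:.1f}, I2 = {i2:.0f}%")
```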

All statistical analyses were performed using StataSE (StataCorp), version 11.

Results

Literature search and study selection

The initial search in electronic databases resulted in 3457 articles. After deduplication, 2187 articles were excluded by screening the title and/or abstract. The remaining 69 articles were selected for close review of the full article after which another 50 were excluded for various reasons (Fig. 1). After reference crosscheck of the remaining 19 eligible articles, five additional articles were found that fulfilled all inclusion criteria, adding to a total of 24 included studies. One corresponding author was contacted to clarify unclear data.

Fig. 1 PRISMA flow diagram of the search for eligible studies on diagnostic performance of CT, PET(CT), and (DW)MRI for the detection of peritoneal metastases

Study and patient characteristics of included studies

Characteristics of the included studies are shown in Table 1.

Table 1 Summary data of 24 included studies

The total number of included patients was 2302 (range 15–498, mean 96). The majority of studies had a retrospective design (14 out of 24); in one study, the design was not defined [15]. Half of the included studies originated from Asian countries. Two multi-center studies were included [16, 17]. The majority of included studies comprised patients with exclusively ovarian (n = 11) or gastric cancer (n = 10). Three studies comprised patients with colorectal or appendiceal primaries.

The 24 included articles provided 37 datasets as eleven studies evaluated multiple modalities (Fig. 1). Twenty datasets evaluated the presence of PM with CT [2, 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34], ten datasets with PET(CT) [15, 17, 19, 21, 25, 27, 30, 33,34,35], and seven datasets with (DW)MRI [16, 31, 33,34,35,36,37]. Of the CT oriented studies, eleven used multi-detector CT scans, six spiral CT scans, one study used both and two studies did not specify. Of the ten PET(CT) studies, four used low-dose CT, four diagnostic CT (all contrast-enhanced), and two studies used 18F-FDG-PET without accompanying CT scanning. Of the seven MRI studies, in addition to morphological sequences, two studies used contrast-enhanced sequences, two used DWI sequences, and the remaining three studies included both. None of these last-mentioned studies analyzed the accuracy of DWI versus contrast-enhanced sequences separately [31, 33, 34].

Fifteen studies were patient-based whereas nine had a region-based approach. Three of the region-based studies used the peritoneal cancer index (PCI) as a scoring method [31, 35, 37]. Two studies used an adjusted PCI score comprising nine instead of 13 abdominal regions (implants on small bowel serosa not assessed) [15, 34].
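As a rough sketch of how such a regional score is tallied: in the usual PCI convention, each region receives a lesion-size score from 0 to 3 and the scores are summed, giving a maximum of 39 for 13 regions and 27 for the nine-region variant. The lesion-size scale is an assumption drawn from the standard PCI definition rather than from this text, and the function name and example scores are hypothetical.

```python
def pci_total(region_scores, expected_regions=13):
    """Sum per-region lesion-size scores (0-3) into a PCI-style total."""
    if len(region_scores) != expected_regions:
        raise ValueError(f"expected {expected_regions} region scores")
    if any(score not in (0, 1, 2, 3) for score in region_scores):
        raise ValueError("each region score must be 0, 1, 2 or 3")
    return sum(region_scores)


# Hypothetical patient: nine abdominopelvic regions followed by four small bowel regions
scores = [3, 2, 0, 1, 0, 0, 2, 1, 0, 1, 0, 0, 1]

full_pci = pci_total(scores)                               # all 13 regions
adjusted_pci = pci_total(scores[:9], expected_regions=9)   # small bowel not assessed
print(full_pci, adjusted_pci)
```

Totals from the 13-region and nine-region variants are on different scales and are not directly comparable.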

The majority of patient-based studies (10 out of 15) comprised gastric cancer patients whereas the majority of region-based studies comprised ovarian cancer patients exclusively or as part of a mixed cohort (respectively 6 and 2 out of 9 studies). Of the patient-based studies, 67% (10 out of 15) were published in or before 2010 whereas all of the region-based studies were published after 2010 (2011–2018).

Three included studies used follow-up imaging as part of the reference standard [21, 27, 32]. In one study, postoperative CT and clinical data were used as a reference standard only for lesions identified on preoperative CT but not found at surgery [32]. The other two studies used follow-up imaging for patients who did not undergo surgery [21, 27]. The method of follow-up imaging was not clearly stated in these studies.

Quality assessment

The results of the QUADAS-2 assessments are presented in Fig. 2a, b.

Fig. 2 QUADAS-2 results. a Risk of bias and applicability concerns per study. b Overview of risk of bias and applicability concerns of all 24 included studies

In the reference standard domain, 58% (14 of 24 studies) scored “unclear.” In most of these studies, it was unknown whether the reference standard results were truly blinded to index test results, thus introducing potential bias [2, 17, 18, 21, 22, 24, 26,27,28, 33, 34, 36, 38, 39]. In a quarter of the included studies, there was a high risk of bias in the patient selection domain. This was mainly caused by inappropriate exclusions.

Seven studies provided insufficient information on timing between index test and reference standard. The remaining studies had an acceptable time frame between imaging and reference standard (range from days to 59 days) [18, 19, 21, 22, 25,26,27, 37].

Applicability concerns regarding patient selection, index test, and reference standard were generally low. No study needed to be excluded from the meta-analysis on the basis of the QUADAS-2 assessment.

Meta-analysis

Pooled analysis of sensitivity, specificity, and diagnostic odds ratio of CT, PET(CT), and (DW)MRI

Comparison between CT, PET(CT), and (DW)MRI in the region-based group shows (DW)MRI to have the highest sensitivity (91%; CI, 84–96%) and diagnostic odds ratio (63.3, CI, 31.5–127.3) (Table 2). PET(CT) has the highest specificity (90%; CI, 80–96%) however with only a marginal difference from (DW)MRI and CT (respectively 85% (CI, 78–91%) and 88% (CI, 81–93%)). Figure 3 shows the hierarchical summary receiver operating characteristics (HSROC) curves of region-based CT, PET(CT), and (DW)MRI studies.

Table 2 Summary of statistics on diagnostic performance of CT, PET(CT), and (DW)MRI in detecting peritoneal metastases (region based studies)

Fig. 3 Hierarchical summary receiver operating characteristic (HSROC) curves for the diagnostic performance of (a) CT, (b) PET(CT), and (c) (DW)MRI for determining the presence of peritoneal metastases on a per region basis. Each circle on the plot represents the pair of sensitivity and specificity from a study and the size of the circle is scaled according to the sample size of the study. The solid red block represents the summary sensitivity and specificity, and this summary point is surrounded by a 95% confidence region (orange dashed line) and 95% prediction region (green dotted line). The HSROC curve is plotted as a curvilinear line passing through the summary point. The curves indicate that (DW)MRI has the highest summary point and the highest confidence level, suggesting that, based on the current literature, (DW)MRI seems most adequate to detect peritoneal malignancies at region-based analysis

In the patient-based group, not enough studies were included to make a pooled analysis for (DW)MRI and PET(CT). Pooled analysis of patient-based CT studies (n = 14) showed a higher DOR than for the region based ones (n = 6): DOR 33.5 (16.3–68.9) versus 15.9 (4.4–58.0). However, sensitivity and specificity showed only small differences (Table 2).

The test for heterogeneity showed there was considerable heterogeneity among studies for all modalities (I2 > 75%). Forest plots of all studies, with their pooled estimates per modality and the I2 score per modality, are presented in Fig. 4.

Fig. 4 Forest plots of all studies and their pooled estimates per modality. a CT per region analysis. b PET(CT) per region analysis. c (DW)MRI per region analysis. d CT per patient analysis. Each horizontal line represents an individual study with the result plotted as a box and the 95% confidence interval displayed as the line. The diamond at the bottom of each plot shows the result of the individual studies combined and averaged. The horizontal points of the diamond are the limits of the 95% confidence intervals. From the forest plots, the specificities of all three modalities seem similar; however, the sensitivity of (DW)MRI is higher than that of CT and PET(CT) at region-based detection of peritoneal metastases. For each modality, the degree of heterogeneity between the studies is indicated by the I2 statistic in the bottom left corner

Sub-analyses

There were not enough data available to compare PET with low-dose CT versus PET with diagnostic CT scans, or MRI with versus MRI without DW sequences, in either the patient-based or the region-based group.

There was no significant difference in sensitivity and specificity between prospective and retrospective CT patient-based studies with sensitivity and specificity respectively of 71% (CI, 30–94%) and 95% (CI, 84–99%) versus 68% (CI, 54–81%) and 93% (CI, 82–97%).

Comparing type of CT scanner (spiral/multi-detector) and year of publication (≤ 2010/> 2010) did not show significant differences (Appendix 3). There were not enough data available to compare primary tumors (ovarian/gastric) and slice thickness (</> 5 mm).

Discussion

This meta-analysis showed that (DW)MRI provided the highest sensitivity for the detection of peritoneal metastases in patients with tumors from gastrointestinal or ovarian origin. PET(CT) showed a lower diagnostic performance than (DW)MRI, although the difference was not significant: pooled sensitivity, specificity, and DOR for PET(CT) 80% (CI, 57–92%); 90% (CI, 80–96%); 36.5 (CI, 6.7–199.5) and for (DW)MRI 92% (CI, 84–96%); 85% (CI, 78–91%); 63.3 (CI, 31.5–127.3), respectively. However, (DW)MRI is more easily available in daily practice than PET(CT) and is therefore potentially the imaging method of choice. CT demonstrated the lowest sensitivity compared with (DW)MRI and PET(CT) (68% (CI, 46–84%)).

As radiological manifestations of PM are variable and can be subtle, interpretation may be challenging. At present, there is no accepted reference standard for imaging peritoneal metastases. In this meta-analysis, CT by far comprised the largest dataset, reflecting the fact that in most centers, this is currently the preoperative imaging method of choice for diagnosing PM. This is probably due to its availability, speed of acquisition, and familiarity. In the literature, the reported sensitivity of CT in diagnosing PM ranges widely from 25 to 90% [2, 40]. Detection of peritoneal implants on CT is strongly influenced by lesion size, location, and presence of ascites [41]. Studies report a sensitivity for detection of individual peritoneal implants < 1.0 cm ranging from only 9.1 to 50% and for nodules < 0.5 cm of only 11% [40, 42]. This is mainly caused by the limited contrast resolution of CT, which makes small nodules in particular hardly discernible from adjacent normal tissue. As a result, CT consistently and significantly underestimates the peritoneal tumor load. Substantial understaging of PM is undesirable, as it erroneously labels patients as operable, which might lead to futile surgery.

Functional imaging techniques like PET-CT and DW MRI seem to overcome some of the limitations of CT. Both techniques show a high contrast between tumor and normal surrounding tissue (signal to noise ratio). In addition to detailed anatomical data, PET-CT reflects increased glucose metabolism in tumors whereas DW MRI reflects tumoral cellularity. The most important advantage of PET(CT) is the fact that it is a total body imaging technique that can possibly detect distant metastases elsewhere. Its major drawbacks are its radiation exposure, higher cost, and limited depiction of small tumor volumes (current spatial resolution 4 mm). MRI is radiation free and, with a dedicated protocol, might detect smaller peritoneal lesions potentially missed by PET-CT. Potential drawbacks of DW MRI are false-positive findings caused by, for instance, postoperative abnormalities or lymph nodes.

A recent meta-analysis on the diagnostic accuracy of 18F-FDG PET/CT for the detection of peritoneal carcinomatosis of various cancers found a sensitivity and specificity of respectively 87% (CI 0.77–0.93) and 92% (CI 0.89–0.94) [43]. In the first prospective study evaluating the diagnostic value of whole-body DW MRI compared with CT and PET-CT in assessing peritoneal staging in ovarian cancer patients, Michielsen et al. reported accuracies of 91%, 75%, and 71%, respectively. In particular, mesenteric and serosal deposits and subcentimetric lesions were better described by DW MRI than by CT [33]. It must be emphasized that this study used a dedicated MRI protocol that, besides DWI and post-contrast series, included proper bowel preparation combining antispasmodics with negative oral contrast fluid to suppress signal from bowel content. Combined gadolinium-enhanced and DWI images in addition to morphological sequences have the highest accuracy compared with either sequence alone [33, 44, 45]. In our experience too, these sequences further complement each other in many instances, thereby reducing false readings. Unfortunately, the number of MRI studies in this meta-analysis was insufficient to perform a sub-analysis between studies using both DWI and post-gadolinium series versus those using only post-gadolinium series in addition to morphological sequences.

Despite the suboptimal performance of CT, a previous meta-analysis on the diagnostic performance of CT and (DW)MRI for detecting peritoneal metastases still concluded that CT should be the preferred imaging modality for detecting peritoneal metastases [46]. This conclusion, however, was mainly drawn based on the robustness of data as the number of included MRI studies at that time was much smaller than the number of included CT studies (3 versus 19). More recent literature demonstrates that MRI with DW sequences significantly outperforms CT for estimation of spread of PM, overall staging and prediction of operability in colorectal cancer patients [47, 48].

A major strength of this study is that it is the first meta-analysis that could include enough MR studies to compare all currently available imaging techniques used to assess peritoneal disease. However, there are also several limitations. First, although heterogeneity of included patient groups was fairly limited (primary staging, only gastrointestinal and ovarian tumors), there remained considerable heterogeneity in study design and imaging variables (e.g., scanners, protocols, sequences, intravenous/oral contrast, readers’ profession and experience). Second, in more than half of the included studies, it was unknown whether the reference standard results were blinded to index test results. An explanation for this might be that most articles were published in radiological journals where there is less emphasis on surgical and histopathological aspects. In another three studies, reference standard results were reported not to be blinded to index test results. Moreover, the reference standard ranged from histopathological analysis to surgical findings to radiological follow-up. This may have introduced interpretation bias. Third, although the original search terms of this study included ovarian and gastrointestinal primaries, the vast majority of the included non-ovarian studies consisted of cohorts with merely gastric cancer patients. In just three studies, patients with colorectal or appendix tumors were included. Lastly, since PET imaging is uncommon in patients with mucinous primary tumors because of its very low sensitivity in the detection of these tumors, studies comprising exclusively mucinous tumors were excluded from this meta-analysis. In our opinion, including this patient population would have led to a bias because the results of CT, MRI, and PET imaging would be compared while different populations were studied for each modality.

In summary, the detection of peritoneal metastases plays an important role in the accurate staging of cancer patients, helping to optimally guide patient management. With all currently available and promising surgical and oncological treatment options, there is an increasing demand for preoperative detection of peritoneal disease in order to tailor the right treatment for an individual patient. This meta-analysis suggests that (DW)MRI is the preferred imaging method to assess peritoneal tumor load. In the future, DW MRI is expected to be increasingly implemented alongside and above CT and PET-CT as a standard workup tool for imaging peritoneal metastases.