Introduction

A major problem in treating peritoneal metastases (PM) originating from various intra-abdominal tumours (gastric, colorectal, ovarian) is how to identify these malignant implants early, so as to stage patients accurately [1] and to select those who should undergo cytoreductive surgery (CRS) plus hyperthermic intraperitoneal chemotherapy (HIPEC). Equally important, the outcome of these combined procedures also depends on detecting PM early, because the extent of peritoneal involvement, as assessed with Sugarbaker’s peritoneal cancer index (PCI) [2], correlates closely with long-term survival.

Laparoscopy has been proposed as a diagnostic tool for selecting candidates for CRS plus HIPEC [3]. The laparoscopic approach nevertheless remains underused (<10 %), and there are concerns about the incidence and prognostic importance of port-site metastases. Ultrasonography is among the tests initially used for diagnosing PM, but its findings are inaccurate for staging [4]. Other imaging techniques that can give accurate information about the location, size and morphology of PM are multidetector computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET) combined with computed tomography (PET/CT). Despite numerous publications on the role of imaging in detecting PM, none has systematically assessed the diagnostic yield of these techniques. More precise information would give physicians treating PM evidence-based advice on which imaging technique to prefer.

Seeking evidence-based information on diagnostic imaging techniques for detecting PM, we conducted a systematic review and meta-analysis. As our primary end point we assessed the diagnostic accuracy of CT and MRI in detecting PM. Secondary end points were per region CT sensitivity and specificity in assessing the extent of PM as scored by the PCI. To provide further information, we also determined the correlation between radiological and surgical PCI and compared the diagnostic yield of CT versus PET/CT.

Materials and methods

Methods for analysis and inclusion criteria were based on the recommendations established by the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) [5]. The review protocol was not published or registered in advance.

Literature search

In December 2014, two observers independently searched the MEDLINE, EMBASE, Cochrane Library, Sumsearch2 and Web of Science databases for studies reporting the accuracy of CT, MRI, PET and PET/CT in detecting PM in patients with cancer. The search used the following keywords, combined with “OR” and “AND”: peritoneal carcinomatosis, peritoneal metastases, computed tomography, positron emission tomography, magnetic resonance imaging, diagnosis, accuracy and imaging. The search strategy is described in detail in the electronic supplementary material (Appendix E1). Additional articles were identified with the “Related articles” function in PubMed, and the reference lists of the collected articles were cross-checked for further relevant studies.

Study selection

Two reviewers (DB and MR) initially screened potentially eligible papers on the basis of title and abstract. Studies were selected if the search terms appeared in the article’s title or abstract, and selection was limited to studies in human subjects. Non-relevant articles, review articles, case reports, comments and letters were excluded, and the full text of the remaining articles was retrieved. The same two radiologists (DB and MR) independently assessed and selected articles for review; discrepancies regarding potential eligibility and inclusion were resolved by consensus. The observers were not blinded to authors’ or journal names.

To be eligible for the systematic review, a study had to assess the use of CT or MR in detecting PM in patients with cancer of any type and at any stage, analyse patient-based and region-based data according to PCI, determine the correlation between radiological and surgical PCI, compare PET/CT and CT in detecting PM, provide sufficient data to (re)construct a 2 × 2 contingency table to assess diagnostic accuracy, analyse per patient data, and use the postoperative histopathological diagnosis as the reference standard.

Data extraction

Relevant data from the selected studies were independently extracted by the reviewers using a data extraction form. Discrepancies in data collection were resolved by consensus with a third reviewer (AL, an abdominal radiologist with more than 15 years’ experience) by referring back to the original article. The following characteristics were extracted from each primary study.

Study characteristics: year of publication; study design; country where the study was conducted; single-centre or multicentre setting; primary outcome; all reference tests (i.e. biopsies or intra-operative findings); time between imaging and the (histopathological) reference standard; lesion size; and total true-positive, false-positive, true-negative and false-negative findings.

Patient characteristics: number of included patients, age, gender and information about the primary tumour.

Imaging characteristics: detailed information about the imaging equipment and basic specifications (field strength for MRI, type of scanner for CT and PET), assessment techniques (sequences for MRI, slice thickness for CT), bowel preparation (fasting, laxatives, spasmolytic drugs) and use of luminal or intravenous contrast medium or both.

Multiple attempts were made to contact authors if data were incomplete or if an apparent conflict or inconsistency in the article had to be resolved.

To assess the methodological quality of the included primary studies and to detect potential bias, we used items from the QUADAS-2 tool (Quality Assessment of Diagnostic Accuracy Studies) [6], which provides several signalling questions for each of four domains (patient selection, index test, reference test and patient flow). Answers could be adequate (☺), inadequate (☹) or unclear (?) and were used to rate the risk of bias in each domain as high, low or unclear (Table 2).
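The step from signalling-question answers to a domain-level risk-of-bias rating can be sketched as follows. This uses a simplified, hypothetical rule: risk is low only when every answer is adequate, high when any answer is inadequate, and unclear otherwise. QUADAS-2 itself leaves the final rating to reviewer judgement. The names `Answer` and `domain_risk` and the example answers are illustrative only.

```python
from enum import Enum


class Answer(Enum):
    """Possible answers to a QUADAS-2 signalling question."""
    ADEQUATE = "adequate"
    INADEQUATE = "inadequate"
    UNCLEAR = "unclear"


def domain_risk(answers: list[Answer]) -> str:
    """Return 'low', 'high' or 'unclear' risk of bias for one domain.

    Simplified rule (an assumption, not an official QUADAS-2 algorithm):
    any inadequate answer flags high risk, all-adequate answers mean low
    risk, and any other combination is unclear.
    """
    if not answers:
        raise ValueError("a domain needs at least one signalling question")
    if any(a is Answer.INADEQUATE for a in answers):
        return "high"
    if all(a is Answer.ADEQUATE for a in answers):
        return "low"
    return "unclear"


# Hypothetical answers for the four domains of a single study.
study = {
    "patient selection": [Answer.ADEQUATE, Answer.ADEQUATE],
    "index test": [Answer.ADEQUATE, Answer.UNCLEAR],
    "reference test": [Answer.ADEQUATE, Answer.INADEQUATE],
    "patient flow": [Answer.ADEQUATE, Answer.ADEQUATE, Answer.ADEQUATE],
}
risks = {domain: domain_risk(ans) for domain, ans in study.items()}
```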

Summary measures

The primary end point of this systematic review and meta-analysis was the per patient diagnostic accuracy of CT and MRI in detecting PM. Secondary end points were CT sensitivity and specificity in detecting PM in each of the 13 regions of the PCI, a cumulative correlation value between radiological and surgical PCI, and a comparison of the diagnostic yields of CT and PET/CT.

Statistical methods

For each study, we calculated the following test measures: sensitivity, specificity, positive predictive value and negative predictive value. We estimated sensitivity and specificity as averages weighted by sample size.
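The per-study measures can be sketched as follows. This assumes each study is summarised by a 2 × 2 contingency table and that the sample-size weight is the total number of patients in that table. The class name and the two example tables are hypothetical and are not data from the included studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical per-study 2 x 2 table (index test vs reference standard)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def weighted_mean(values: list[float], weights: list[int]) -> float:
    """Average of per-study values weighted by sample size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two invented studies, used only to show the arithmetic.
studies = [TwoByTwo(tp=40, fp=5, fn=10, tn=45), TwoByTwo(tp=18, fp=2, fn=2, tn=8)]
sizes = [s.n for s in studies]
pooled_sens = weighted_mean([s.sensitivity() for s in studies], sizes)
pooled_spec = weighted_mean([s.specificity() for s in studies], sizes)
```

Weighting by total sample size is the simplest reading of the text. Weighting sensitivity by diseased patients only (TP + FN) would be an equally defensible alternative.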

For the meta-analysis, we calculated cumulative values for the per patient diagnostic accuracy of CT and MRI. For each analysis, we calculated the effect size (ES), reported as a Z value, and the percentage of heterogeneity between studies, expressed as I². Between-study heterogeneity was quantified with the equation I² = [(Q − df)/Q] × 100 %, where Q is the chi-squared (Cochran’s Q) statistic and df the degrees of freedom (number of studies minus one). I² values of 25, 50 and 75 % were taken to represent low, moderate and high heterogeneity, respectively. I² describes the percentage of variability in effect estimates that results from heterogeneity rather than from sampling error (chance).
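The heterogeneity statistic can be sketched as follows. This assumes per-study estimates (for example, logit sensitivities) with known within-study variances and inverse-variance weights for Cochran’s Q. By the usual convention, negative values of (Q − df)/Q are truncated to zero. The function names are illustrative.

```python
def cochran_q(estimates: list[float], variances: list[float]) -> float:
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(q: float, df: int) -> float:
    """I² = [(Q - df) / Q] x 100 %, truncated at 0 when Q <= df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical studies with equal variance but different estimates.
q = cochran_q([0.0, 2.0], [1.0, 1.0])
i2 = i_squared(q, df=1)
```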

For each meta-analysis, we computed cumulative values with 95 % confidence intervals (CIs) for sensitivity, specificity and other measures across studies, for CT and MRI, using random-effects or fixed-effects models according to the degree of heterogeneity. P values <0.05 were considered to indicate statistical significance. All data were analysed with Comprehensive Meta-Analysis (version 2.2.064, July 27, 2011), Excel 2010 (Microsoft Corporation, Redmond, WA, USA) and MetaDiSc (version 1.4).
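The text does not say how Comprehensive Meta-Analysis or MetaDiSc pool the estimates internally. As an illustration only, the sketch below substitutes one standard approach: DerSimonian–Laird random-effects pooling of logit-transformed proportions. It applies a 0.5 continuity correction when a study has 0 % or 100 % (an assumption). When there is no excess heterogeneity (τ² = 0), it reduces to the fixed-effect inverse-variance estimate. All function names and example counts are hypothetical.

```python
import math


def logit_and_variance(events: int, total: int, correction: float = 0.5) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance 1/x + 1/(n - x)."""
    x, n = float(events), float(total)
    if events == 0 or events == total:
        # Continuity correction keeps the logit finite (assumed convention).
        x += correction
        n += 2 * correction
    return math.log(x / (n - x)), 1.0 / x + 1.0 / (n - x)


def inv_logit(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def dersimonian_laird(estimates: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Return (pooled estimate, standard error, tau²) on the logit scale."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau².
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def pooled_proportion(pairs: list[tuple[int, int]]) -> tuple[float, float, float, float]:
    """Pool (events, total) pairs; return proportion, 95 % CI bounds and tau²."""
    logits, variances = zip(*(logit_and_variance(x, n) for x, n in pairs))
    pooled, se, tau2 = dersimonian_laird(list(logits), list(variances))
    z = 1.96  # normal quantile for a two-sided 95 % interval
    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se), tau2


# Hypothetical true positives / patients with PM in three studies.
sens, lo, hi, tau2 = pooled_proportion([(30, 50), (45, 50), (20, 50)])
```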

Results

Study selection

Database searching identified 1014 articles; after removal of duplicates, 529 were screened. Of these, 465 were excluded because neither the title nor the abstract indicated that the inclusion criteria were met, and 64 articles were retrieved for full-text assessment (Fig. 1). Cross-checking the reference lists yielded no additional studies. Of the 64 full-text articles, 42 were excluded: 1 because it was an editorial, 2 because they were reviews, 7 because their aims differed, 31 because they failed to fulfil the eligibility criteria and 1 because it was a case report. The remaining 22 articles fulfilled the eligibility criteria and were selected for quantitative synthesis. Of these, 18 evaluated CT [7–24], 3 MRI [22, 25, 26] and 6 compared the diagnostic yield of PET/CT with CT [10–12, 14, 27, 28]. Eleven studies offered data for assessing per patient diagnostic accuracy of CT [7–14, 16, 21, 22], and six were eligible for comparing per patient accuracy of CT with that of PET/CT [10–12, 14, 27, 28]. Eight studies offered data for assessing per region diagnostic accuracy of CT according to the PCI scheme [15–21, 24], and nine offered data for determining the correlation between radiological and surgical PCI [7, 15–21, 23, 24].
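The study counts in the selection flow can be checked with simple bookkeeping. The sketch below reuses only the numbers reported in this paragraph. The variable names are illustrative.

```python
identified = 1014
screened = 529  # after removal of duplicates
excluded_on_title_or_abstract = 465
full_text_exclusions = {
    "editorial": 1,
    "review": 2,
    "different aims": 7,
    "eligibility criteria not met": 31,
    "case report": 1,
}

duplicates_removed = identified - screened
full_text_assessed = screened - excluded_on_title_or_abstract
excluded_full_text = sum(full_text_exclusions.values())
included = full_text_assessed - excluded_full_text

print(duplicates_removed, full_text_assessed, excluded_full_text, included)  # 485 64 42 22
```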

Fig. 1 PRISMA flow diagram

Characteristics of the included studies

Of the 22 eligible studies, 21 were single-centre studies conducted in the United States, Europe, Asia and Australia; one was a multicentre study conducted across several countries (USA, Israel, Germany, Austria and Spain) (Table 1). In total, the 22 studies involved 37 centres. The patient selection and index test domains showed a lower risk of bias than the reference test (postoperative histopathological diagnosis) and patient flow domains. Concerns about the applicability of patient selection, index and reference tests were generally low (Table 2).

Table 1 Summary data for 22 eligible studies
Table 2 QUADAS-2 results

Patients’ characteristics

The 22 studies eligible for review involved 934 patients, of whom 630 had histopathologically confirmed PM. The median study size was 46 patients (range 19–130). Of the 630 patients with PM, 275 (43.6 %) had PM from gastro-intestinal malignancies, 289 (45.8 %) from gynaecological malignancies and 14 (2.2 %) from other cancers (Table 1). No information about the primary tumour was available for 52 patients (8.2 %).

Accuracy of CT in the diagnosis of PM

Per patient data Of the 22 studies analysed, 11 provided per patient data [7–14, 16, 21, 22] (Table 1). These data were pooled and displayed in a forest plot (Fig. 2). Cumulative diagnostic accuracy values from these studies were: sensitivity 83 % (95 % CI 79–86 %; I² 83.3 %), specificity 86 % (95 % CI 82–89 %; I² 65.5 %), pooled positive likelihood ratio (LR) 4.37 (95 % CI 2.58–7.41; I² 81.2 %) and pooled negative LR 0.20 (95 % CI 0.11–0.35; I² 85.4 %).
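For a single 2 × 2 table, the likelihood ratios follow directly from sensitivity and specificity, as sketched below. The pooled LRs reported here were estimated across studies in their own right. Plugging the pooled sensitivity (83 %) and specificity (86 %) into these formulas therefore gives an LR+ of about 5.9 rather than 4.37, and the two approaches need not agree. The helper names are hypothetical. The post-test probability function is a standard odds-based calculation added for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for one test; both inputs must lie strictly between 0 and 1."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.83, 0.86)
print(round(lr_pos, 2), round(lr_neg, 2))  # 5.93 0.2
```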

Fig. 2 Per patient CT accuracy; forest plot for diagnostic sensitivity, specificity, positive and negative likelihood ratios in detecting PM

Per region data Per region data according to the PCI were provided in eight studies [15–20, 24] and were analysed for 337 patients (Table 1). Because the lack of raw data prevented us from estimating cumulative accuracy values with meta-analytic methods, we calculated mean values and 95 % CIs (Table 3). CT sensitivity for detecting PM was relatively high in two regions: 78 % (95 % CI 64–92 %) in region 2 (epigastrium) and 74 % (95 % CI 64–83 %) in region 6 (pelvis). Sensitivity was lowest in region 9, upper jejunum (48 %, 95 % CI 14–81 %).
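The paper does not state how the per region 95 % CIs were derived. Assuming they are t-based confidence intervals for the mean of per-study values, a minimal sketch is shown below. The per-study sensitivities are hypothetical. The lookup table holds the standard 0.975 quantiles of Student’s t for up to 10 degrees of freedom. With only a handful of studies, the large t quantile and the between-study spread make such intervals wide, as seen for region 9.

```python
import math
import statistics

# 0.975 quantiles of Student's t (two-sided 95 %) by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_with_ci(values: list[float]) -> tuple[float, float, float]:
    """Mean of per-study values with a t-based 95 % confidence interval."""
    n = len(values)
    if n - 1 not in T_975:
        raise ValueError("this sketch supports 2 to 11 studies")
    mean = statistics.mean(values)
    half_width = T_975[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical per-study CT sensitivities for one PCI region.
mean, lower, upper = mean_with_ci([0.70, 0.80, 0.90])
```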

Table 3 Mean values of per region CT sensitivity and specificity according to Sugarbaker scheme

Specificity for detecting PM was particularly high in region 1 (right upper abdomen) and regions 9, 10, 11, 12 (small bowel), ranging from 91 % to 95 %. Specificity was lowest in region 6, pelvis (76 %, 95 % CI 64–89 %). The PCI is briefly described in the electronic supplementary material (Appendix E2).
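For readers unfamiliar with the scheme, the PCI sums a lesion-size score across 13 regions numbered 0–12, giving a total of 0–39. The score is 0 for no tumour, 1 for implants up to 0.5 cm, 2 for implants up to 5 cm, and 3 for implants larger than 5 cm or confluent disease. The sketch below assumes the largest implant diameter per region is known and does not model confluence. Region labels follow Sugarbaker’s scheme as used in this paper, and the function names are hypothetical. Only the largest implant in a region counts, so several implants in one region score the same as a single implant of the same size.

```python
from typing import Optional

REGIONS = {
    0: "central", 1: "right upper", 2: "epigastrium", 3: "left upper",
    4: "left flank", 5: "left lower", 6: "pelvis", 7: "right lower",
    8: "right flank", 9: "upper jejunum", 10: "lower jejunum",
    11: "upper ileum", 12: "lower ileum",
}


def lesion_size_score(largest_implant_cm: Optional[float]) -> int:
    """Lesion-size score from the largest implant diameter in a region (cm)."""
    if largest_implant_cm is None or largest_implant_cm <= 0:
        return 0  # no tumour seen
    if largest_implant_cm <= 0.5:
        return 1  # implants up to 0.5 cm
    if largest_implant_cm <= 5.0:
        return 2  # implants up to 5 cm
    return 3  # implants larger than 5 cm (confluence not modelled here)


def peritoneal_cancer_index(largest_by_region: dict[int, float]) -> int:
    """Sum lesion-size scores over the 13 regions (range 0-39)."""
    unknown = set(largest_by_region) - set(REGIONS)
    if unknown:
        raise ValueError(f"unknown PCI regions: {sorted(unknown)}")
    return sum(lesion_size_score(largest_by_region.get(r)) for r in REGIONS)


# Hypothetical findings: a 3 mm implant in the epigastrium and a 2 cm pelvic implant.
example_pci = peritoneal_cancer_index({2: 0.3, 6: 2.0})
```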

Accuracy of CT in assessing the PCI

Data were provided in nine studies [7, 15–20, 23, 24] (Table 1). The studies were highly heterogeneous in design, statistical analysis, correlation coefficients and the outcomes used for computing correlation (lesion size, PCI score, detection of peritoneal implants). Although we were unable to compute an overall value summarizing these results, CT-PCI and surgical PCI scores showed moderate to high concordance, ranging from 0.49 (95 % CI 0.41–0.56) [15] to 0.96 [18] (kappa statistic). CT underestimated surgical PCI by about 12–33 % [15–18, 24] (Table 4).
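The included studies used different agreement measures, so the following is only an illustration of two of them. The first is unweighted Cohen’s kappa for two sets of categorical ratings, for example CT-based and surgical lesion-size scores for the same regions. The second is the relative underestimation of surgical PCI by CT, assumed here to be (surgical PCI − CT-PCI)/surgical PCI. Both function names and the example scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal frequencies of each rater.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def relative_underestimation(ct_pci: int, surgical_pci: int) -> float:
    """Fraction by which CT-PCI falls short of surgical PCI (assumed definition)."""
    if surgical_pci <= 0:
        raise ValueError("surgical PCI must be positive")
    return (surgical_pci - ct_pci) / surgical_pci


# Hypothetical lesion-size scores for six regions: CT versus surgery.
ct_scores = [0, 1, 2, 3, 0, 1]
surgical_scores = [0, 1, 2, 2, 0, 0]
kappa = cohens_kappa(ct_scores, surgical_scores)
shortfall = relative_underestimation(ct_pci=20, surgical_pci=25)
```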

Table 4 Studies reporting correlation between surgical PCI and CT-PCI

Accuracy of MR in diagnosing PM

Few studies reported data for assessing MRI accuracy in detecting PM (Table 1). Three studies, by Fujii et al. [25], Soussan et al. [26] and Satoh et al. [22], assessed per patient MRI accuracy in about 186 patients. Cumulative diagnostic accuracy values from these three studies were: sensitivity 86 % (95 % CI 78–93 %; I² 0.0 %), specificity 88 % (95 % CI 83–92 %; I² 9.5 %), pooled positive LR 6.59 (95 % CI 4.66–9.33; I² 0.0 %) and pooled negative LR 0.16 (95 % CI 0.10–0.27; I² 0.0 %) (forest plot, Fig. 3).

Fig. 3 MRI per patient accuracy; forest plot of sensitivity, specificity, positive and negative likelihood ratios

Comparison of FDG PET/CT and contrast-enhanced CT in diagnosing PM

Six studies were eligible for assessing per patient sensitivity of CT in comparison with PET/CT [10–12, 14, 27, 28]; cumulative sensitivity values from these studies were 82 % (95 % CI 75–87 %; I² 76.3 %) for PET/CT and 66 % (95 % CI 58–73 %; I² 84.9 %) for CT.

Three studies were eligible for assessing per patient specificity of CT in comparison with PET/CT [10–12]; cumulative specificity values from these studies were 93 % (95 % CI 95–98 %; I² 0.0 %) for PET/CT and 77 % (95 % CI 66–86 %; I² 86 %) for CT (Table 1; Fig. 4).

Fig. 4 Cumulative values of sensitivity and specificity of FDG PET/CT and contrast-enhanced CT in the diagnosis of PM

Discussion

As expected, our systematic review shows that both MRI and CT have high per patient accuracy for detecting PM from various cancers. Nevertheless, the small number of MRI studies available for review (3 vs 19) and the resulting lack of evidence-based information on MRI suggest that, until MRI techniques advance, CT, which has an excellent diagnostic yield, remains the preferred imaging modality in patients with suspected PM.

Although we were unable to compute a clinically relevant cumulative value for the correlation between CT-PCI and surgical PCI (a secondary end point), the individual studies reported moderate to high concordance (Table 4). A new finding underlined by our review is that CT seems to underestimate surgical PCI by about 12–33 %. This finding is hardly surprising given the inherent limitations of the PCI staging system. For radiologists, reporting a CT scan according to the PCI is prone to error for several reasons. For example, the boundaries between the 13 regions are unclear, especially for the small bowel, and multiple implants in a single region receive the same score as a single lesion of the same size. In addition, the different presentations of PM (micro-nodular, nodular, plaque-like, omental cake, mass-forming, ascites) make PM harder to detect and consequently increase the likelihood of inaccurately estimating tumour volume. One way to increase per region CT diagnostic accuracy might be to make the PCI easier to calculate.

As a secondary end point, we compared the diagnostic accuracy of CT and PET/CT. We considered it unnecessary to analyse PET/CT accuracy for detecting and characterizing PM on its own, because another published meta-analysis had already done so [29]. That meta-analysis, including data from 513 patients with various types of cancer, showed a pooled sensitivity of 72 %, specificity of 97 %, positive LR of 10.4 and negative LR of 0.31 for both procedures combined. These per patient diagnostic accuracy results almost match those we present for CT. Conversely, direct comparison in the six studies included in our meta-analysis gave different results, with diagnostic accuracy favouring PET/CT (sensitivity 82 % vs 66 %; specificity 93 % vs 77 %). These discrepancies probably arose for three reasons: in one study [27] CT was reported by a nuclear medicine physician; in one study [14] neither the readers’ characteristics nor the CT acquisition parameters were clearly reported; and in one study [12] CT was interpreted on 5 mm reconstructed images. These results underline the need for a specific CT protocol to detect minimal peritoneal involvement properly. The abdomen and pelvis should be acquired with multidetector CT in a single breath-hold, with thin detector collimation and slice thickness to benefit from multiplanar reformats, and with luminal distention to assess the small bowel and colon.

Our study design has some inherent limitations. First, we included relatively few studies. However, this number is in line with the average number of eligible studies across medical areas [30], and our sample size is sufficiently large (over 930 patients), particularly compared with the medians and quartiles for studies in diagnostic imaging. The studies included in this meta-analysis are nevertheless consistent and free of major biases, which limits concerns about applicability. Second, excluding grey literature when designing our research and selecting articles for review may have introduced publication bias. Third, in some studies the diagnosis of PM lacked histopathological confirmation, possibly overestimating the true-positive rate. Fourth, heterogeneity among studies makes it hard to generalize the results. This heterogeneity arises mainly from different primary cancers, imaging methodologies and variables (scanner and acquisition protocol for CT, sequences for MRI, readers’ experience).

In conclusion, our systematic review and meta-analysis provides new evidence-based information suggesting that CT is the imaging modality of choice in patients with PM originating from various intra-abdominal cancers. Given its good overall per region diagnostic accuracy according to the PCI, CT may help surgeons refer patients to the best treatment option. Until more consistent information on their diagnostic yield in detecting PM is available, MRI and PET/CT should be considered second choices.