Introduction

Worldwide, over half a million people die from colorectal cancer (CRC) [1] annually. Population screening reduces mortality in two ways: established malignancy is detected sooner, facilitating cure; and precursor polyps (namely adenomas) can be removed, preventing cancer development subsequently. Internationally, most screening programmes use faecal occult blood testing [2], which is proven in meta-analysis of randomised trials to reduce CRC-related mortality [3]. Faecal testing (whether by guaiac-based techniques, gFOBt, or immunochemical techniques, FIT) is widely available [4], acceptable to screenees [5] and cost-effective [6].

Merely demonstrating occult blood loss cannot improve outcomes unless cancer is found and treated, or large adenomas removed. For most, colonoscopy both confirms cancer and permits endoluminal excision of smaller adenomas and cancers. However, colonoscopy may be incomplete, contraindicated or refused by some screenees [7]. To maximise neoplasia detection, an alternative test is required. One possibility is CT colonography (CTC), which has been recommended when colonoscopy is not feasible or incomplete [8]. This recommendation is largely based on meta-analysis of cohort studies [9–11] and two randomised trials of symptomatic older patients [12, 13]. Whilst extrapolation from such literature is intuitively logical, gFOBt/FIT-positive screenees have such a high prevalence of abnormality that subsequent tests require extremely high sensitivity to achieve an acceptable negative predictive value. Additionally, screen-detected cancers are earlier stage than symptomatic tumours [14], and could be more difficult to detect at CTC. Furthermore, advanced histologic features are more common in gFOBt/FIT-positives, even at equivalent adenoma diameter [15, 16]. Since CTC is less sensitive for small polyps [10, 17], this implies that more advanced neoplasia will be missed when testing gFOBt/FIT-positive subjects compared with asymptomatic individuals, in whom subcentimetre adenomas rarely harbour advanced neoplasia [18].
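The dependence of negative predictive value on prevalence can be made concrete with Bayes' rule. The sketch below is illustrative only: the sensitivity, specificity and prevalence values are assumed for demonstration and are not estimates from this review, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical inputs: the same test in a low- and a high-prevalence population
for prevalence in (0.05, 0.50):
    ppv, npv = predictive_values(0.90, 0.85, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed inputs, the NPV falls from about 99 % to about 89 % as prevalence rises from 5 % to 50 %, even though the test itself is unchanged.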

If CTC is to be adopted widely following positive gFOBt/FIT, sensitivity and specificity for cancer and adenomas should be known with precision. The relevant comparator is colonoscopy, since it is accurate, widely available and generally safe [7]. Without this information, clinicians and patients are unable to balance the risks of colonoscopy against the chance of missing neoplasia with CTC. To address this, we performed a systematic review and meta-analysis of the sensitivity and specificity of CTC for colorectal cancer and adenomatous polyps in gFOBt/FIT-positive individuals.

Materials and methods

Data sources

A literature search of the MEDLINE database was performed using PubMed (http://www.ncbi.nlm.nih.gov/pubmed). All primary studies for the period January 1994 (the year CT colonography was first described) to February 2013 were considered. To retrieve articles relevant to stool testing, the Medical Subject Headings (MeSH) terms feces, Occult Blood and Immunologic Tests were combined with the free-text terms faeces, feces, faecal, fecal, FIT, iFOB*, FOB* and occult. This search was combined with a search for CTC-related literature using the MeSH terms colonography, colography, CT colonography, CT colonoscopy, CT pneumocolon, virtual colonoscopy or virtual endoscopy and the free-text terms colonography, colography, CTC, computerized tomographic colonoscopy, computed tomographic colonoscopy and CT pneumocolon. Subsequently, the Cochrane Library, EMBASE, AMED and OVID were searched using the free-text terms fec*, faec*, FOB*, gFOB*, iFOB*, FIT, immunochem* and immunolog* combined with CT, comput* tomogra*, colonogra*, virtua* colonosc* and virtua* endosc*. Reference lists from reports eventually selected were also searched manually.

Study selection

Studies were eligible if the patient population had tested positive for faecal occult blood and been imaged by CTC (defined below). Studies describing other populations were potentially eligible if separate per-patient data were presented for gFOBt/FIT-positive participants. Only full reports of original data from in vivo research in human subjects were considered. Review articles, editorials, commentaries, book chapters, abstracts, guidelines and position statements were ineligible.

Target disorder

To be included, the focus of the study had to be the detection of colorectal neoplasia using CTC in comparison with a reference test. Studies assessing a technical development (e.g. computer-assisted detection, CAD) or alteration in CTC technique were potentially eligible if results were presented for CTC examinations conducted according to consensus standards [19].

CT test methods

On the basis of consensus documents for the performance of CTC [19, 20], all patients had to undergo bowel preparation (either cleansing, tagging or both) prior to imaging in a minimum of two positions, with or without intravenous contrast. Interpretation of CTC before the reference test (or blinding of the observer to reference test findings) was required. No stipulation was made regarding the mode of interpretation used by CTC observers, nor regarding the use of CAD.

Reference test

All CTC findings had to be verified by a reference test. Conventional colonoscopy, segmental unblinded colonoscopy and surgery (with subsequent histopathology) were acceptable alternatives.

Data extraction

All abstracts of primary studies were independently screened by two authors (AAP and DAP) who excluded clearly ineligible studies. The full text of potentially eligible studies was retrieved and scrutinised. Differences in opinion regarding eligibility were resolved by face-to-face consensus. From each eligible study, the following were extracted: (a) publication year, (b) number and age range of gFOBt/FIT-positive subjects, (c) single- or multi-centre study design, (d) CTC technique (reconstruction interval, use of cathartics, stool tagging, intravenous contrast medium and the approximate radiation dose), (e) approximate experience of CTC readers, (f) interpretation strategy (including two-dimensional or three-dimensional viewing, use of CAD and double-reporting), (g) reference standard against which CTC and colonoscopy were compared, (h) number of patients with cancer by the reference standard, (i) per-patient sensitivity of CTC for cancer, (j) per-patient sensitivity of colonoscopy for cancer, (k) number of patients with ≥10 mm and ≥6 mm polyps and adenomas, including advanced adenomas, by the reference standard, (l) per-patient sensitivity of CTC for ≥10 mm and ≥6 mm polyps, adenomas and advanced adenomas, (m) per-patient sensitivity of colonoscopy for ≥10 mm and ≥6 mm polyps, adenomas and advanced adenomas, (n) specificity of CTC for polyps, adenomas and advanced adenomas (at ≥10 mm and ≥6 mm thresholds) and (o) positive and negative predictive values of CTC for cancers, polyps, adenomas and advanced adenomas at ≥6 mm and ≥10 mm thresholds. Specificity of CTC for cancers cannot be calculated because large polyps and cancers are only distinguishable post hoc (i.e. histologically). Article quality was judged using QUADAS-2 (quality assessment tool of diagnostic accuracy studies) [21].

Analysis

Numbers of included and excluded studies (and reasons for exclusion), patient characteristics, study design, CTC technique, observer experience and viewing mode were tabulated and analysed with descriptive statistics. The QUADAS-2 assessment was converted into a summary score of either “high risk”, “low risk” or “unclear”, for both the risk of bias and concern over applicability to the systematic review question, as recommended by the QUADAS-2 authors [21].

Per-patient 2 × 2 contingency tables were constructed for meta-analysis of sensitivity and specificity. Forest plots of sensitivity and specificity were generated using the forest command of the metafor package [22] for R version 2.15.1 [23]. Heterogeneity between primary studies was assessed using the I² statistic, with values of 25 %, 50 % and 75 % taken to indicate low, moderate and high heterogeneity respectively. Meta-analysis of paired sensitivity and specificity was conducted via a bivariate random effects model that enables estimation of a summary receiver operating characteristic (ROC) curve using the R package mada. The results for single-reader CTC were used, since this is the most frequent mode of interpretation in current clinical practice. Bivariate models allow for possible correlation between sensitivity and specificity [24, 25]. The following factors that might increase heterogeneity were considered as moderator covariates in the bivariate model: (a) year of publication, (b) number of included participants, (c) prevalence of 6–9 mm and ≥10 mm adenomas or carcinoma, (d) single- or multi-centre design, (e) use of faecal tagging, (f) reader experience and (g) use of three-dimensional interpretation. Covariates a–c were treated as continuous variables, and d–g as binary variables.
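As an illustration of the quantities described above, the following Python sketch derives sensitivity and specificity from a per-patient 2 × 2 table and computes I² from Cochran's Q. It is not the code used in this review, which relied on the R packages metafor and mada. The per-study counts are hypothetical, and pooling logit-transformed proportions with a 0.5 continuity correction is an assumption made for this example.

```python
import math

def sens_spec(tp, fn, tn, fp):
    """Per-patient sensitivity and specificity from a 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)

def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance
    (0.5 continuity correction; an assumption for this sketch)."""
    e, n = events + 0.5, total - events + 0.5
    return math.log(e / n), 1 / e + 1 / n

def i_squared(estimates, variances):
    """I^2 (as a percentage) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical per-study tables: (TP, FN, TN, FP)
tables = [(45, 5, 80, 20), (30, 4, 50, 30), (18, 2, 40, 5)]
spec_logits = [logit_with_variance(tn, tn + fp) for _, _, tn, fp in tables]
i2 = i_squared([y for y, _ in spec_logits], [v for _, v in spec_logits])
print(f"I^2 for specificity: {i2:.1f} %")
```

I² expresses the share of between-study variation that exceeds what chance alone would produce, which is why it is truncated at zero when Q falls below its degrees of freedom.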

Results

Search results

A flow diagram of abstracts examined and articles retrieved, included and excluded (with reasons) is shown in Fig. 1. In summary, 122 studies were identified from the PubMed and Cochrane Library search and 416 from the EMBASE, AMED and OVID search. A total of 39 full-text articles were screened and ultimately 5 were included. Excluded studies are detailed in the “Appendix”.

Fig. 1 Flow chart of the systematic review

Characteristics of included studies

Five articles were included, reporting four distinct studies [26–30]. Two of these articles reported different primary outcome measures for the same patient cohort [26, 27]. All four studies were performed in Europe: one in the Netherlands [26, 27], one in Italy [29], one in France [30] and one international study in Italy and Belgium [28]. The Italian study was from a single centre, the Dutch study used two centres, the French study used 26 centres and the international study used 21 centres initially, although only 12 contributed patients to the final analysis. The two articles reporting the same patient group were both included (as relevant data were presented across the two articles) but individual subjects were not duplicated during analysis.

Patient characteristics and CTC technique

A total of 622 gFOBt/FIT-positive patients were enrolled in the selected studies, ranging from 49 to 302 per study (Table 1). Two studies were designed specifically to assess gFOBt/FIT-positive patients [26, 27, 29], whereas the other two included gFOBt-positive patients as a subgroup of other high-risk populations [28, 30]. Only the results of gFOBt/FIT-positive subjects are included here. The age range was 50 to 75 years (one study reported mean and interquartile range [30]). Prevalence of ≥6 mm adenomas or cancer ranged from 32.0 to 65.3 %. Cathartic bowel preparation was used by all except Liedenbaum et al., who used a reduced-laxative regime. Faecal tagging was used variably (see Table 1). All studies used dual patient positioning, multislice CT, low dose (<100 mAs), unenhanced acquisition and narrow reconstruction intervals. Reading strategy was left to radiologist preference in two studies [28, 30] and primary 2D in the other two [26, 27, 29]. Computer-aided detection (CAD) was not used. Liedenbaum et al. reported results for both single- and double-reporting [26]. A minimum level of radiologist experience was required by all studies, ranging from 50 to 100 cases. The reference standard was universally segmental unblinded colonoscopy (i.e. initial colonoscopy optimised by re-examination following revelation of CTC findings).

Table 1 Characteristics of the included studies

Study quality

Overall research study quality was good. In one study, 10 patients were excluded because CTC images were judged non-diagnostic and a further 2 had incomplete colonoscopy [26, 27]. In clinical practice, a variable proportion of patients will have poor quality CTC and it is not possible to simply exclude them. However, such cases were a small proportion of the total number in this particular study (12 exclusions, 302 participants), implying a negligible effect on overall results. In another report, patient flow through the study was not reported separately for gFOBt-positive participants [30]. All studies used segmental unblinded colonoscopy as the reference standard, a practice which theoretically may lead to incorporation bias. The summary QUADAS-2 results are presented in Table 2.

Table 2 QUADAS-2 quality assessment of the included studies

Sensitivity and negative predictive value

Two studies [27, 29] reported the sensitivity of CTC for colorectal cancer separately from the sensitivity for adenomas. Sensitivity for CRC was 100 % in one study [29] (2 of 2 cancers detected) and 95.5 % in the other (21 of 22 cancers detected) [27]. The two studies describing CTC for high-risk patients [28, 30] (including some gFOBt/FIT-positives) did not report sensitivity for cancer in the gFOBt/FIT-positive subset. Initial colonoscopy did not miss any cancers (vs unblinded colonoscopy) in the included studies.

Regarding sensitivity for adenomas, the four studies used slightly different outcome measures; nonetheless, heterogeneity between studies was low (I² = 0.0 %). Regge et al. [28] reported per-patient sensitivity for advanced adenomas or cancer measuring ≥6 mm, with CTC detecting 96 of 111 such patients (86.5 %). No data were presented at a ≥10 mm threshold. Liedenbaum et al. [27] reported a 91 % per-patient sensitivity of double-reported CTC for ≥6 mm lesions (of any histology), with 192 of 211 such patients being detected by CTC. Unusually for the CTC literature, this article used a size cut-off before CTC was termed a true-positive: for example, a 4-mm polyp reported at CTC which was ultimately measured as 6 mm by the reference standard was regarded as a CTC false-negative, since a CTC finding of a 4-mm polyp would not typically provoke colonoscopy. In the corresponding report of the same patients [26], a more conventional polyp-matching algorithm was used and results were presented for both double-reported and single-reader CTC. The mean sensitivity of double-reported CTC for ≥6 mm adenomas or carcinoma was 93 % versus 89 % for a single radiologist. Corresponding sensitivities for ≥10 mm adenomas or carcinomas were 95 % and 92 % for double- and single-reporting respectively. Heresbach et al. [30] described per-patient sensitivity at ≥6 mm and ≥10 mm thresholds. CTC was 88 % sensitive at the 6-mm threshold (correctly finding 14 of 16 patients) and 92 % sensitive at the 10-mm threshold (12 of 13 patients), for both polyps and adenomas. Finally, Sali et al. [29] reported per-patient sensitivity for cancer or adenomas measuring ≥6 mm, correctly identifying 21 of 22 patients (95.5 %). No per-patient data were presented at a ≥10 mm threshold. These data are summarised in Table 3 and the forest plot in Fig. 2.
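The per-study proportions quoted above can be reproduced from the reported counts. The sketch below also attaches an approximate 95 % interval to each. The method used to derive the intervals in Table 3 is not stated in the text, so the choice of the Wilson score interval here is an assumption and the printed intervals may differ from those in the table. Note also that the studies used different outcome definitions, and that the Liedenbaum et al. [27] counts refer to double-reported CTC.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95 %)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

# Patients detected by CTC / patients positive by the reference standard,
# as quoted in the text at the >=6 mm threshold
reported = {
    "Regge et al. [28]": (96, 111),
    "Liedenbaum et al. [27]": (192, 211),
    "Heresbach et al. [30]": (14, 16),
    "Sali et al. [29]": (21, 22),
}
for study, (detected, total) in reported.items():
    low, high = wilson_interval(detected, total)
    print(f"{study}: {detected / total:.1%} ({low:.1%} to {high:.1%})")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves sensibly for small samples and proportions close to 100 %, as in the two smaller studies.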

Table 3 Per-patient sensitivity (95 % confidence intervals) and negative predictive value (95 % CI) of CTC
Fig. 2 Forest plot of included studies showing individual and pooled estimates of sensitivity and specificity of CTC for ≥6 mm adenomas and cancers (Regge et al. reported histologically advanced neoplasia). For each study, marker area is proportional to precision, with greater precision indicated by larger area. Pooled values are derived from the bivariate random effects model. TP true positive, FN false negative, TN true negative, FP false positive

Only one study reported the per-patient sensitivity of colonoscopy for adenomas in comparison to the segmental unblinded reference standard in gFOBt/FIT-positives: Liedenbaum et al. [26] found a 98 % sensitivity for adenomas or carcinomas ≥6 mm and a 99 % sensitivity at a ≥10 mm threshold. Heresbach et al. [30] reported a per-patient sensitivity for colonoscopy of 99.5 % and 99.7 % for ≥6 mm and ≥10 mm polyps respectively, although they did not stratify by gFOBt status. Regge et al. [28] found that blinded colonoscopy only missed two advanced adenomas (measuring 13 and 18 mm), although whether or not these patients were gFOBt-positive was not stated.

Specificity and positive predictive value

Overall, specificity varied substantially between studies, ranging from 52 % to 91 % at a ≥6 mm threshold. Consequently, heterogeneity was high (I² = 78.3 %), as summarised in Table 4 and the forest plot in Fig. 2. Since different radiologists in different studies may vary the point at which they judge a test positive, sensitivity and specificity may vary simply because of the arbitrary threshold used by an individual radiologist. Furthermore, there may be differences in the spectrum of cases or sizes of polyps across the studies. A bivariate model was used to construct a summary ROC curve of the included studies (Fig. 3), taking this into account. None of the moderator covariates (year of publication, number of included participants, prevalence of abnormality, single- or multi-centre design, use of faecal tagging, reader experience or use of three-dimensional interpretation) was found to be significant, perhaps because of the small number of primary studies. From this model, the operating point has an average sensitivity of 88.8 % (95 % CI 83.6 to 92.5 %) and specificity of 75.4 % (95 % CI 58.6 to 86.8 %), with the summary curve being reasonably close to the top left corner of the ROC space.

Table 4 Per-patient specificity (95 % confidence intervals) and PPV (95 % CI) of CTC
Fig. 3 Summary ROC curve of included studies. The sensitivity of each individual study for ≥6 mm adenomas or cancer is plotted against 1 − specificity. Regge et al. (square) reported advanced adenomas only. Data for Liedenbaum et al. (circle) are for single-reader CTC. Heresbach et al. and Sali et al. are represented by a triangle and diamond respectively. Grey lines show 95 % confidence regions of each individual study. Black circle shows the overall estimate at the operating point

Discussion

CTC is a relatively novel technology that has matured and is now widely available [31]. It is replacing the barium enema for radiological evaluation of the colon, since randomised trials show it is more sensitive and misses fewer cancers and large polyps in older symptomatic adults [12]. The English Bowel Cancer Screening Programme recommends CTC for gFOBt-positive patients who are unsuitable for colonoscopy [8]. However, it is striking how little evidence exists regarding the diagnostic accuracy of CTC in gFOBt/FIT-positive patients. Only four studies have investigated this group, with only two having gFOBt/FIT-positive subjects as their direct focus. National policies are therefore governed largely by extrapolation from these small cohort studies and related reports of higher-risk patient groups.

Nonetheless, the estimated sensitivity of 88.8 % (and range of 86–96 % for the component studies) for adenomas or cancer ≥6 mm suggests that CTC is sufficiently sensitive to substitute for colonoscopy when necessary. Furthermore, heterogeneity was low, implying that the (limited) available literature is consistent. This very high sensitivity is greater than that reported in prior meta-analyses of CTC, which range from 69 % [17] to 86 % [10] for ≥6 mm polyps. We suspect this is due to increased average lesion size in our meta-analysis as a consequence of preselection by gFOBt/FIT (which preferentially detects larger polyps and cancers via their propensity to bleed). For example, patients with ≥1 cm adenomas/carcinomas heavily outnumbered those with 6–9 mm neoplasms in our meta-analysis (246 versus 100), whereas this pattern was reversed in a prior, unrestricted meta-analysis [10]. Since CTC is more sensitive for these large lesions, their relative over-representation inevitably increases the pooled estimate of CTC sensitivity. Although based on small numbers, pooled sensitivity for cancer was 96 % (95 % CI 79.8–99.8 %), identical to that derived from a broader meta-analysis of the diagnostic accuracy of CTC [9]. Notably, the sensitivity of colonoscopy for cancer (judged against segmental unblinded colonoscopy) was 95 % in that meta-analysis, implying that the two tests have very similar sensitivity for established malignancy. Sensitivity for cancer is particularly important since a common reason for performing CTC over colonoscopy is co-morbidity. Detection of smaller adenomas is less crucial, particularly those lacking advanced features. The estimated progression rate of even histologically advanced adenomas to carcinoma is approximately 3–4 % per annum [32], implying that the small chance of missing an advanced adenoma may be acceptable.

Specificity and PPV were not as good, with the latter ranging from 62 to 88 %, somewhat lower than the 92–93 % reported when CTC is used for asymptomatic screenees [33, 34]. The pooled estimate of specificity was 75.4 %, although heterogeneity was high. Low specificities may partly reflect the high prevalence of abnormality in the gFOBt/FIT-positive population, potentially leading radiologists to report equivocal findings as positive (to maximise sensitivity). Furthermore, the minimum level of radiologist experience (50 to 100 cases) was substantially lower than in the studies reporting high PPV (minimum 300 cases) [33, 34]. Additionally, faecal tagging was not used in the study with the lowest specificity [29], which reported that most of the false positives were due to faecal residue. Conversely, Regge et al. [28] found that most false positives were due to hyperplastic or diminutive polyps. Regardless of the cause, the implication is that CTC may direct a substantial proportion of normal patients to colonoscopy.
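To give a sense of scale for this referral burden, the sketch below applies the pooled operating point from this review (sensitivity 88.8 %, specificity 75.4 %) to a notional cohort of 1,000 gFOBt/FIT-positive patients. The cohort size and the three prevalence values are assumptions chosen to span the range reported across the included studies (32.0 to 65.3 %), and the calculation ignores the uncertainty in the pooled estimates.

```python
def expected_counts(sensitivity, specificity, prevalence, cohort=1000):
    """Expected 2 x 2 cell counts when a test is applied to a cohort."""
    diseased = cohort * prevalence
    healthy = cohort - diseased
    tp = sensitivity * diseased
    fp = (1 - specificity) * healthy
    return {"TP": tp, "FN": diseased - tp, "FP": fp, "TN": healthy - fp}

# Pooled operating point from this review; prevalence values are assumptions
for prevalence in (0.32, 0.50, 0.65):
    cells = expected_counts(0.888, 0.754, prevalence)
    referred = cells["TP"] + cells["FP"]
    print(f"prevalence {prevalence:.0%}: {referred:.0f} referred, "
          f"{cells['FP']:.0f} without ≥6 mm neoplasia, {cells['FN']:.0f} missed")
```

At an assumed prevalence of 50 %, this gives roughly 123 false-positive referrals and 56 missed patients per 1,000 tested.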

Since the randomised trials supporting gFOBt population screening employed colonoscopy to investigate a positive faecal test result, large-scale screening programmes follow a similar model. CTC is commonly used when colonoscopy is incomplete or contraindicated (including screenee refusal). Patients included in our meta-analysis were, by definition, able to undergo both CTC and colonoscopy. These data are therefore most applicable to a patient population deemed fit for colonoscopy i.e. those with an incomplete colonoscopy or who refuse it for reasons unrelated to their general health. Conversely, the sensitivity of CTC in frailer individuals with relative contraindications to colonoscopy is unknown. Observational data show that cancer and adenoma detection rates by CTC in gFOBt-positives are substantially lower than corresponding detection rates by colonoscopy [35]. However, these screenees were imaged with CTC because colonoscopy was judged inappropriate, meaning that this difference may arise from selection bias rather than reduced sensitivity of CTC. Nonetheless, the high sensitivity and moderate specificity of CTC found in our systematic review may not generalise to frailer patients.

Our review focused on sensitivity and specificity, and did not consider other factors such as safety, patient acceptability or cost. Furthermore, the impact on overall screening compliance by introducing an additional step in the diagnostic pathway (i.e. faecal testing, then CTC, then colonoscopy) is unknown. The high prevalence of abnormality after positive gFOBt/FIT suggested to the authors of one component study [27] that universal adoption of CTC as a “triage test” would not be cost-effective. Conversely, a recent cost analysis concluded that savings would arise via avoiding unnecessary colonoscopy by CTC triage [36].

The major limitation of this study is the small number of studies available in the primary literature for review and meta-analysis. Whilst this is unavoidable, it does imply that our estimates of heterogeneity may be inaccurate, and that the summary estimates may be substantially affected by a single outlying study. It is therefore reassuring that the two largest studies we included [26–28] had almost identical sensitivity and specificity. Additionally, such a small number of component studies precludes meaningful assessment of publication bias via funnel plots or alternatives, meaning that the result of the meta-analysis should be treated with appropriate caution. Assessment of moderator covariates in the bivariate model is also potentially limited by the small number of studies, meaning that we may have erroneously discounted these factors as affecting sensitivity or specificity.

In summary, by systematic review we conclude that few studies have directly addressed investigation of gFOBt/FIT-positive populations by CTC. Nonetheless, available studies suggest that the sensitivity of CTC for ≥6 mm adenomas or cancer following a positive gFOBt/FIT result is 88.8 % (95 % CI 83.6–92.5 %). Specificity is more variable between studies and the summary estimate is lower, at 75.4 % (95 % CI 58.6–86.8 %). Our review suggests that CTC may adequately substitute for colonoscopy when the latter is undesirable or incomplete. The high rate of subsequent testing (driven by the high prevalence of abnormality) and the relatively reduced sensitivity of CTC compared to colonoscopy suggest the latter should remain the preferred test where feasible.