Introduction

Breast cancer (BC) screening has been shown to be effective in reducing BC mortality based on meta-analysis of trials [1, 2]; it is currently recommended as a secondary prevention strategy [3] and is implemented on a national basis in many countries [4]. Individual population-based screening programmes routinely use surrogate indicators for timely evaluation of screening efficacy because mortality reduction requires very large studies with long follow-up periods and is methodologically difficult to assess in a service screening setting. Several surrogate measures are therefore used for ongoing monitoring in population breast screening practices, including screening coverage and attendance rates, cancer detection rates and, importantly, screening sensitivity as an indicator of potential screening impact. Proportional incidence of interval cancers (IC) [5] is considered a reliable method to assess screening sensitivity, being less affected by bias from lead time and overdiagnosis [6]. However, the IC proportional incidence is not easy to determine: challenges exist particularly in defining the underlying breast cancer incidence in the absence of screening, and in identifying all ICs in the absence of a regional cancer registry. Radiological review of screening mammograms preceding an IC, preferably using blinded review, is another indirect method to estimate sensitivity, based on the proportion of ICs deemed to be screening errors upon radiological review [7]. Standards for IC proportional incidence and for radiological review outcomes are provided in European (EC) guidelines [3].

While it is increasingly evident that both tumour biology and pathology affect BC outcomes [8], BC size remains an established strong prognostic indicator [9], with larger tumours (T2+) being associated with poorer outcomes. Reducing the rate of breast cancers that are T2+ is a necessary condition for maximising ‘early detection’ and for realising the anticipated reduction in BC mortality from screening. Thus, we hypothesised that measuring a reduction of T2+ cancers from screening (relative to pre-screening data) may be a valuable indirect indicator of screening sensitivity and hence its potential efficacy.

The aims of this study were to estimate the proportional incidence of T2+ breast cancers in a population screening programme (with documented quality evaluation including IC proportional incidence [10]), and to perform parallel radiological review of both screen-detected T2+ cancers and ICs in consideration of whether these measures may be comparable as surrogate indicators of screening efficacy.

Methods

Screening programme characteristics

A population-based mammography screening programme was started in the Trento province (Italy) in October 2000, following a preliminary 3-month pilot study in the district of Trento, with further extension to the whole Trento province territory. In accordance with EC and Italian Ministry of Health recommendations, screening was based on an active mail invitation of resident women aged 50–69 years. Screening tests were performed at seven different sites in the province territory, with centralised double reading at the Trento screening unit. The current study is based on the years 2001 to 2009: from 2005, digital mammography [direct (full-field digital mammography) at the Trento centre, indirect (computed radiography) at other screening sites] replaced film-screen mammography.

Screening programme performance indicators are regularly provided and are available online (National Screening Observatory http://win.osservatorionazionalescreening.it). Detailed screening performance indicators from the Trento programme have recently been reported [10] and indicate good coverage (approximately 85% from a population of 26,000 invited women), and demonstrate cancer detection rates and recall rates within recommended standards [10]. In addition, IC proportional incidence analysis and review showed good performance for the programme, fitting within EC standards [10].

Proportional incidence of T2+ breast cancers

Evaluation of the proportional incidence of T2+ cancers in screening attendees (either screen-detected at prevalent or incident rounds, or detected as interval cancers) considered all T2+ cancers observed during 2001–2009, with incidence data being complete up to December 2010. The estimate of underlying incidence of T2+ cancers was based on local cancer registry data for the years 1999 and 2000, the time-frame before screening was started. Screen-detected T2+ cancers were identified through screening archives. Interval cancers (histologically confirmed invasive breast cancers occurring within 2 years of a negative screening episode) were identified by linking the screening archives with those from the local cancer registry (http://www.registri-tumori.it/cms/?q=RTTrento), the local pathology department and hospital discharge records (HDR). Screening tests performed during the limited pilot study in the year 2000 were not included in this study. Proportional incidence was calculated as the ratio of observed T2+ cancers in screening attendees to the number of T2+ cancers expected in the absence of screening. The expected number of T2+ cancers was obtained by multiplying the women-years of screening attendance by the age-specific T2+ breast cancer incidence for the Trento province, taken from the local cancer registry data for women aged 50–69 years and relative to the pre-screening years 1999–2000.
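The calculation described above can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: O is the number of observed T2+ cancers in screening attendees, N_a the women-years of attendance in age band a, r_a the pre-screening age-specific T2+ incidence per woman-year, and E the expected number of T2+ cancers.

```latex
E = \sum_{a} N_a \, r_a , \qquad \mathrm{PI} = \frac{O}{E}
```

A proportional incidence (PI) below 1 indicates fewer T2+ cancers than expected in the absence of screening, and 1 − PI gives the corresponding relative reduction.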

Radiological review methods

Films of ‘negative’ screening mammograms preceding screen-detected T2+ cancers (subset of 54 cases for which mammograms were available in screening archives) and all ICs (subset of 50 cases occurring during 2009–2010) were considered for the study. For T2+ cancers detected at repeat screening rounds, the films from the preceding (2-yearly) screen were reviewed. Mammographic review was performed using methods that reduce bias in classification [7]: we used blinded (‘masked’) review methods and integrated screening negative controls (at least one subsequent negative screening) randomly drawn from screening archives and matched to ICs on a 3:1 ratio (total of 170 controls). Independent review was carried out by a panel of three dedicated breast radiologists (external to the screening programme) with varying screening experience (F.C. >10 years, A.F. >20 years, S.C. >30 years). Mammograms (screen-film or digital prints for the study set) were randomly mixed and mounted on rotating viewers: each reviewer was asked to mark, on a paper scheme of the four mammography views, whether recall was warranted and, if so, the exact site of the mammographic abnormality for further assessment. Classification into the “occult”, “minimal sign” or “screening error” review categories [7] was based on the majority report from the three reviewers. The proportion of T2+ cancers and ICs classified as ‘screening error’ (based on at least two radiologists recalling the exact abnormality) was determined. Observed differences were tested with the chi-square test, with statistical significance set at P < 0.05.

Results

T2+ proportional incidence

Underlying T2+ incidence for the years 1999 and 2000 (immediately preceding implementation of screening) in the 50–69 age group was 0.00091 per woman-year (91 per 100,000 women-years). Screening attendees during 2001–2009 accounted for 271,385 women-years (89,219 and 182,167 women-years for first and repeat screens, respectively). Based on these data, the estimated number of T2+ cancers expected during 2001–2009 in this screened population was 247 cancers (year-specific data shown in Table 1).

Table 1 Observed and expected numbers of T2+ cancers in the breast screening programme (2001–2009) for prevalent and incident screening

During 2001–2009, 48 T2+ cancers were detected at first screen and 67 at repeat screen, while 53 T2+ cancers were observed as ICs, giving a total of 168 T2+ cancers in the screened cohort. The proportional incidence of T2+ cancers was therefore 68% (168 observed to 247 expected), corresponding to an estimated 32% reduction in T2+ rate in the screened population relative to expected incidence in the absence of screening.
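The arithmetic behind these figures can be checked with a short sketch. It uses only the pooled totals quoted in the text and assumes a single crude baseline rate applied to all women-years; the year-specific calculation in Table 1 is not reproduced here, and the variable names are illustrative.

```python
# Inputs as reported in the text; variable names are illustrative only.
women_years = 271_385      # screening attendance, 2001-2009
baseline_rate = 0.00091    # pre-screening T2+ incidence (1999-2000), per woman-year
observed = 48 + 67 + 53    # first screen + repeat screen + interval cancers

# Expected T2+ cancers in the absence of screening (crude, pooled estimate).
expected = women_years * baseline_rate

# Proportional incidence uses the rounded expected count, as in the text.
proportional_incidence = observed / round(expected)
reduction = 1 - proportional_incidence

print(f"expected T2+ cancers: {expected:.1f}")
print(f"proportional incidence: {proportional_incidence:.0%}")
print(f"estimated reduction: {reduction:.0%}")
```

Under these assumptions the pooled calculation agrees with the reported 247 expected cancers and 68% proportional incidence.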

The number of screens, and the number of observed and expected T2+ breast cancers, by year and screening round, are provided in Table 1.

Radiological review of T2+ and ICs

Table 2 shows the results of the majority review from three radiologists. Fourteen of 50 (28%) ICs, and 15 of 54 (27.8%) T2+ cancers, were reviewed as screening errors (identified by at least two radiologists) (P = 0.84). The proportion of T2+ cancers and ICs correctly identified at blinded review differed slightly between radiologists: Radiologist A (T2+ 27.7% vs IC 36.1%, P = 0.49); Radiologist B (T2+ 29.6% vs IC 20.0%, P = 0.36); and Radiologist C (T2+ 35.1% vs IC 40.0%, P = 0.76). Recall rate (determined on negative controls) varied between radiologists (A = 15.2%, B = 8.8% and C = 8.8%), although the difference did not reach statistical significance (P = 0.08). The proportion of cases with abnormalities indicated by at least two reviewers was significantly higher for T2+ cancers (P = 0.004) and for ICs (P = 0.005) than for controls.
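As an illustration of the comparison between ICs and T2+ cancers, the following minimal sketch applies a chi-square test to the 2×2 table of majority-review results (screening error vs any other category). The text states only that a chi-square test was used; the Yates continuity correction applied here is an assumption, as are the function and variable names. With that assumption the sketch gives a P value close to the reported 0.84.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Classic continuity correction: subtract n/2 from |ad - bc| before squaring.
    stat = n * (abs(a * d - b * c) - n / 2) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Majority review counts from the text: screening errors out of cases reviewed.
ic_errors, ic_total = 14, 50
t2_errors, t2_total = 15, 54

stat, p = yates_chi2_2x2(
    ic_errors, ic_total - ic_errors,
    t2_errors, t2_total - t2_errors,
)
print(f"chi-square = {stat:.3f}, P = {p:.2f}")
```

Some statistical libraries cap the continuity correction when it exceeds the observed difference, which would give a different P value for this table; the uncapped classic formula is used here.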

Table 2 Results of blinded radiological review of previous screening mammograms of interval cancers and T2+ cancers from three radiologists [(−) = negative; (+) = positive]

Discussion

Surrogate measures of screening performance allow the timely monitoring of screening quality, which may help to increase the benefits of screening, and have been used in screening programmes throughout the world. Our study explored whether estimates of the proportional incidence of large (T2+) breast cancers, and radiological reviews of preceding mammograms for T2+ cancers in screening participants (measures not usually adopted when monitoring screening), may provide practical surrogate measures of screening sensitivity and potential effect.

We derived the numbers of T2+ cancers expected in the absence of screening from age-specific data from the local cancer registry for the period immediately preceding the screening programme implementation. This allowed estimation of the cancer incidence in the absence of screening because it excluded major long-term temporal shifts in incidence and took into account the presumed spontaneous access to mammography available immediately before initiation of the screening programme. Nonetheless, our estimates required an assumption of a relatively stable underlying risk of T2+ tumours and assumed minimal effect from migration for the region. We estimated a proportional incidence of T2+ cancers of 68% in the screened cohort, suggesting that screening may have prevented up to one-third of the T2+ cancers that would have occurred in the absence of screening. This appears to be a plausible estimate given expected screening effects from randomised trials [1]; however, there are currently no standards for the proportional incidence of large cancers, so we cannot compare our estimates with other studies or screening performance standards. It is unlikely that current standards for IC proportional incidence (EC standard <30% first year, <50% second year and 40% for biennial screening [3]) translate as a standard for T2+ proportional incidence, so we strongly encourage other investigators to extend the proposed concept of T2+ proportional incidence through screening research.

Radiological review of the screening mammogram preceding an IC is not a new concept [7], but our study is the first to investigate preceding mammograms for both ICs and screen-detected T2+ cancers occurring in the same time-frame, with consistent review methods performed by the same radiologists. A strength of the review strategy used in this work is the blinded classification and the case-mix (cancers and controls), which are recommended methods for radiological reviews to help reduce bias in the classification of mammograms [7, 11] and to approximate the scenario of screening practice [1, 2]. Nonetheless, in a research context, reviewers might be expected to achieve higher sensitivity and lower specificity by adopting a lower threshold for suspicion, as suggested by the recall rates (see Table 2), which were higher than in routine practice but comparable to other studies using radiological reviews [11–13]. The comparison of reviews of T2+ cancers and ICs is reliable (performed by the same radiologist panel in the same setting using a standardised set of negative controls) and was performed to examine the extent to which preceding mammograms for these cancers would be classified as screening errors. We found that the majority of radiological reviews classified approximately 28% of these cancers as screening errors [3, 7]. Based on recently published data on proportional IC incidence and review, the performance of the Trento screening programme has been shown to be good and within the limits recommended by the EC [10]. So, although the 28% screening error for IC review is slightly high, the more relevant issue here is that the distribution of radiological review results was similar for ICs and T2+ cancers (see Table 2), raising the possibility that review of T2+ cancers may be integrated into a review of ICs as a complementary (and potentially equivalent) surrogate measure of screening sensitivity.

The results from the radiological review of T2+ cancers in this study suggest that such review may have utility for monitoring screening performance; however, additional research is needed to examine this further, and to define whether standards for the proportion of ‘acceptable’ screening error [3] for radiological review of ICs could be adopted for review of T2+ cancers. Because of the logistical challenges inherent in identification and verification of ICs, our study has taken a pragmatic approach by exploring the review of T2+ cancers, as this may be more feasible to perform in routine screening clinical quality assurance, particularly where services encounter difficulties in identifying IC cases. Furthermore, there is an inherent educational value in conducting reviews of T2+ cancers and potentially characterising the features of ‘missed’ mammographic lesions that are subsequently detected as large cancers.

The combination of epidemiological surveillance (proportional incidence) and radiological review of T2+ cancers, as shown in this study, potentially provides practical indicators of screening performance and warrants further evaluation by breast screening programmes. In particular, radiological review of T2+ cancers could be integrated with review of ICs (as part of quality monitoring) and may prove more feasible than (and hence an alternative to) review of ICs for some screening services.