Introduction

In Austria, breast cancer is the most frequent cancer among women, with approximately 5500 new cases (30% of all female cancers) diagnosed in 2011 [1]. In Tyrol, a state in western Austria with a female population of approximately 368,000 (51%), the age-standardized incidence rate of breast cancer in 2012 was 81.1 cases per 100,000 women and the age-standardized mortality rate was 11.8 cases per 100,000 women [2], compared with 68.0 and 14.4 cases per 100,000 women in Austria, 80.3 and 15.5 cases per 100,000 women in EU‑28 (28 member states of the European Union) and 92.9 and 14.9 cases per 100,000 women in the USA, respectively [3]. As breast cancer is a progressive disease and tumor stage is the most important determinant of outcome, screening methods for the early detection of breast cancer are of great public health interest. The effect of mammography screening on breast cancer mortality has therefore been investigated over recent decades in several studies that provided firm evidence for the efficacy of breast cancer screening [4,5,6,7,8,9,10]. Following these studies, and after an International Agency for Research on Cancer (IARC) expert working group reviewed the evidence and confirmed that screening should be offered as a public health service, the European Parliament recommended the implementation of population-based breast screening programs in June 2003 [11]. In line with these EU guidelines, most European countries initiated screening programs [12], and in Tyrol the existing system of opportunistic screening was switched to an organized screening program in 2008 [13, 14]. Nevertheless, although randomized trials have shown that mammography screening can reduce breast cancer mortality, concerns have been voiced in recent years about the effectiveness of population-based mammography screening services, and interest in quantifying the benefit-harm trade-off for screening mammography has been growing [15,16,17,18,19].
The different interpretations of the evidence and the scholarly debates surrounding the balance of benefits and potential harms have resulted in numerous publications [20,21,22,23,24] that render mammography screening even more controversial and informed decision-making on screening participation challenging for women [25,26,27].

However, systems for the delivery of screening mammography vary among countries, and these differences can influence the effectiveness of screening. Hence, it is crucial to evaluate screening programs in various healthcare systems regarding performance measures that are associated with the provision and quality of the screening process, outcome measures possibly associated with a reduction in mortality as well as estimates of harms [11].

Subsequent to our analysis of the Tyrolean Breast Cancer Screening Program, which evaluated the benefits and adverse effects of combined screening with mammography and ultrasound (submitted for publication), an approach that differs from many other population-based screening programs, we report the main performance and early surrogate parameters as defined by the EU guidelines [11] as well as some important estimates of harms from a 4-year period of organized screening in Tyrol.

Methods

Screening program

In 2006, the Austrian health minister declared mammography to be one of the top health priorities, and in July 2006 a decision was made to implement organized mammography screening programs, initially in pilot regions, of which Tyrol was the largest. In Tyrol, a clear decision was made to set up the new program while making the best possible use of existing experience. The existing system of spontaneous mammography screening, which had been established over the previous 15 years, was therefore switched in 2007 to an organized program by gradually adapting the established framework. This strategy made it possible to establish a statewide mammography screening program in Tyrol in a very short time. The process followed most EU recommendations for organized mammography screening, with the following exceptions: women aged 40–49 years were part of the target population, screening was offered annually to the age group 40–59 years, breast ultrasound was available as an additional diagnostic tool, and double reading was not implemented within this pilot project. After a 1-year pilot phase conducted in two central counties of Tyrol from June 2007 to May 2008 [13], an organized mammography screening program was initiated in the entire state in June 2008. It was offered free of charge to all women in the age group 40–69 years living in Tyrol and covered by compulsory social insurance (approximately 98% of the female population). Women were personally invited to mammography screening, annually in the age group 40–59 years and every 2 years in the age group 60–69 years. Because no scheduling system existed, women could attend a screening unit at any time that was convenient to them. Screening mammography was offered by all radiologists (N = 13) in private practice in the study area and by all hospital outpatient radiology departments (N = 9). Double reading of mammograms was not implemented in the program.
Instead, supplementary breast ultrasound was offered to women immediately after mammography at the radiologist’s discretion. Women were informed of the screening result immediately after examination. The results of mammography, ultrasound or where applicable, of combined mammography with ultrasound examination were recorded according to the Breast Imaging Reporting and Data System (BI-RADS) scheme [28]. Women with BI-RADS score 0 were in the radiologist’s care (e.g. referred to magnetic resonance imaging), while women with BI-RADS 1 or 2 went back to routine screening. Women with BI-RADS 3 were invited for early recall in 6 months or were referred for further assessment at the radiologist’s discretion, and women with BI-RADS 4 or 5 were referred for further assessment. Further assessment was performed at all hospitals in the area (N = 9) and included non-invasive and invasive investigations, such as clinical examination of the breast, further mammography, ultrasound, magnetic resonance imaging or biopsy, where necessary. Additional aspects of the screening program are described in detail elsewhere [14].
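The follow-up pathway described above amounts to a simple lookup from BI-RADS score to next step. The following Python sketch is illustrative only: the function name and the returned labels are assumptions made for this example and are not part of the program's software.

```python
def next_step(birads: int) -> str:
    """Return the follow-up pathway for a screening BI-RADS score, as described in the text."""
    if birads == 0:
        # Incomplete result: the woman remains in the radiologist's care.
        return "radiologist's care (e.g. referral to MRI)"
    if birads in (1, 2):
        return "routine screening"
    if birads == 3:
        # The choice between the two options is at the radiologist's discretion.
        return "early recall at 6 months or further assessment"
    if birads in (4, 5):
        return "further assessment"
    # Scores outside 0-5 are not covered by the pathway described in the text.
    raise ValueError(f"unexpected BI-RADS score: {birads}")


if __name__ == "__main__":
    for score in range(6):
        print(score, "->", next_step(score))
```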

Screening database and statistical analysis

All screening and assessment units registered basic individual data which were transferred monthly to a central database at the Department of Clinical Epidemiology of Tirol Kliniken Ltd after pseudonymization of each woman’s social insurance number. The screening and assessment data were combined in this central database, which was maintained as a STATA dataset [29]. Independent of screening, data on tumor characteristics were collected by the Cancer Registry of Tyrol, which covers all cancer cases in the population of Tyrol with a high degree of completeness [30]. Linkage between screening data, assessment data and Cancer Registry data is based on the pseudonym. Numbers and percentages were reported as defined in the EU guidelines [11]. For some indices, population-based rates were computed using the official population data supplied by Statistics Austria. No statistical testing was performed. All reporting was done with STATA, version 13 [29]. This analysis is based on 4 years (from 1 June 2008 to 31 May 2012) of population-based mammography screening in Tyrol.
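The pseudonym-based linkage can be illustrated with a short Python sketch. The text does not state how pseudonyms were generated, so the keyed hash below, the key, the field names and the one-record-per-woman simplification are all assumptions made for illustration (the actual analysis was done in STATA).

```python
import hashlib
import hmac

# Assumption: a secret key held by the data custodian; key handling is not described in the text.
SECRET_KEY = b"hypothetical-key"


def pseudonymize(insurance_number: str) -> str:
    """Map a social insurance number to a stable pseudonym (keyed hash; assumed method)."""
    return hmac.new(SECRET_KEY, insurance_number.encode("utf-8"), hashlib.sha256).hexdigest()


def link(screening, assessment, registry):
    """Join three lists of dict records on their 'pseudonym' field.

    Simplification: at most one assessment and one registry record per woman.
    """
    assessment_by_id = {rec["pseudonym"]: rec for rec in assessment}
    registry_by_id = {rec["pseudonym"]: rec for rec in registry}
    linked = []
    for rec in screening:
        pid = rec["pseudonym"]
        linked.append({
            "screening": rec,
            "assessment": assessment_by_id.get(pid),  # None if never assessed
            "registry": registry_by_id.get(pid),      # None if no registered cancer
        })
    return linked
```

Because the same input always yields the same pseudonym, records from the screening units, the assessment units and the cancer registry can be matched without exposing the insurance number itself.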

Program evaluation

In accordance with the EU guidelines [11] key performance indicators and early surrogate parameters were calculated from all screening mammograms in women aged 40–49 years and 50–69 years performed between June 2008 and May 2012. Regarding participation rate, we estimated a 2-year participation rate by counting every woman only once in an observation period of 2 years to reflect the fact that nearly half of the women aged 40–59 years do not return for screening in the first subsequent year although invited annually.
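Counting every woman only once within a 2-year window can be sketched as follows. This is a minimal Python illustration; the record layout and the choice of denominator (`n_eligible`) are assumptions, since the text does not spell out either.

```python
from datetime import date


def two_year_participation(screens, window_start, window_end, n_eligible):
    """Share of eligible women with at least one screen in [window_start, window_end).

    `screens` is an iterable of (woman_id, screening_date) pairs; a woman who
    attended several times within the window is counted only once.
    """
    attendees = {woman for woman, screened_on in screens
                 if window_start <= screened_on < window_end}
    return len(attendees) / n_eligible


if __name__ == "__main__":
    # Hypothetical data: woman A attends twice, B once, C only after the window.
    screens = [
        ("A", date(2008, 7, 1)),
        ("A", date(2009, 7, 15)),
        ("B", date(2009, 3, 2)),
        ("C", date(2010, 8, 1)),
    ]
    rate = two_year_participation(screens, date(2008, 6, 1), date(2010, 6, 1), n_eligible=4)
    print(f"2-year participation: {rate:.1%}")
```

A per-invitation rate would count woman A's two attendances against two invitations, whereas the 2-year rate treats her as a single participant.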

The anticipated breast cancer incidence rate in the absence of screening (background breast cancer incidence rate) that is used to calculate the interval cancer rate and breast cancer detection rate was defined by years of diagnosis in 1988–1990, as spontaneous mammography screening was already introduced in Tyrol during the early 1990s. Interval cancer cases were assessed by linking the screening database and the cancer registry database, and all potential interval cancer cases were checked individually for documentation errors.
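The two ways of expressing the interval cancer rate (per 1000 screens and as a proportion of the background incidence rate) can be sketched in Python as follows. The function names and the example numbers are hypothetical, and the sketch assumes that both rates refer to the same time span (for example, 1 year).

```python
def interval_cancer_rate_per_1000(n_interval_cancers, n_negative_screens):
    """Interval cancers per 1000 negative screening examinations."""
    return 1000 * n_interval_cancers / n_negative_screens


def proportion_of_background(rate_per_1000, background_per_1000):
    """Interval cancer rate expressed as a fraction of the background incidence rate.

    Both rates must refer to the same period (assumption of this sketch).
    """
    return rate_per_1000 / background_per_1000


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the program data.
    rate = interval_cancer_rate_per_1000(30, 100_000)
    share = proportion_of_background(rate, background_per_1000=1.5)
    print(f"{rate:.2f} per 1000 screens, {share:.1%} of background incidence")
```

A higher background incidence rate lowers the proportion for the same number of interval cancers, which is why the choice of reference period matters.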

The false positive rate and the rate of unnecessary biopsies are important estimates of the harms of mammography screening. A false positive screening result was defined as a positive mammography result followed by a final assessment diagnosis of a benign finding. We accumulated the percentage of false positive results per screening round over a woman's screening life in two ways: by counting 25 mammography screens from age 40 to 69 years (invitation approach), or by counting 12 screens to account for the fact that the majority of women did not follow the 1‑year screening interval (actual attendance approach). In the same way, we computed a cumulative unnecessary biopsy rate (women undergoing a biopsy that proved negative for malignancy).
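The accumulation over screening rounds can be illustrated with a short Python sketch. The text does not give the exact formula, so the sketch assumes independent rounds with a constant per-round risk, which is a common simplification; it is not expected to reproduce the published estimates. The function names and the per-round risk in the example are hypothetical.

```python
def invited_rounds(start_age=40, end_age=69):
    """Number of invitations: annual for ages 40-59, biennial for ages 60-69."""
    annual = len(range(max(start_age, 40), min(end_age, 59) + 1))
    biennial = len(range(max(start_age, 60), min(end_age, 69) + 1, 2))
    return annual + biennial


def cumulative_risk(per_round_risk, n_rounds):
    """Probability of at least one event in n_rounds.

    Assumes independent rounds with the same risk in every round.
    """
    return 1 - (1 - per_round_risk) ** n_rounds


if __name__ == "__main__":
    print(invited_rounds(40, 69))  # 25 rounds under the invitation approach
    p = 0.01  # hypothetical per-round false positive risk
    print(f"invitation approach:        {cumulative_risk(p, invited_rounds()):.1%}")
    print(f"actual attendance approach: {cumulative_risk(p, 12):.1%}")
```

With the same per-round risk, fewer attended rounds give a lower cumulative risk, which is why the two approaches yield different estimates.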

Results

From June 2008 to May 2012, a total of 272,555 invitations were sent to women in the target population: 41.4% in the age group 40–49 years and 58.6% in the age group 50–69 years. In the same period, 176,957 screening examinations were transferred to the database and serve as the basis for the analyses: 76,431 (43.2%) and 100,526 (56.8%) in women aged 40–49 years and 50–69 years, respectively. As ultrasound is implemented as a second-line screening procedure, 76.2% of all women screened underwent supplementary ultrasound (82.3% in women aged 40–49 years). Breast density according to the American College of Radiology (ACR) scores of 3–4 was the reason for supplementary ultrasound in 59.5% and 45.9% of women aged 40–49 years and 50–69 years, respectively (Table 1).

Table 1 Utilization of supplementary ultrasound by age group

Key performance indicators are summarized in Table 2. The estimated 2‑year participation rate was 56.9% and was higher in women aged 40–49 years (60.3%) than in the age group 50–69 years (54.4%). The outcome of screening examinations was incomplete (BI-RADS 0) in a total of 209 (0.1%) cases and negative (BI-RADS 1, 2) in 97.4%. Of all women screened, 2336 (1.3%) were invited for an intermediate screening test within 6 months, and 2322 (1.3%) women were recalled for further assessment. In total 1351 biopsies were performed, 6.9 and 8.2 per 1000 screening examinations in age groups 40–49 years and 50–69 years, respectively. Of all biopsies 96.6% were core biopsies and 3.4% (N = 46) open biopsies. The positive predictive value was 28.2% for assessment and 48.5% for biopsy, with clear differences between the age groups. With respect to waiting times, further assessment was performed within 10 working days after a screening examination in 82.2% and surgery for all confirmed breast cancer cases was performed (without neoadjuvant chemotherapy) within 10 working days after the decision to operate in 82.2%.

Table 2 Key performance indicators by age group

Early surrogate indicators are presented in Table 3. In total, breast cancer was detected in 655 women: 564 (86.1%) invasive and 91 (13.9%) in situ breast cancer cases. The cancer detection rate was 3.7 per 1000 screens and was considerably lower in the age group 40–49 years (2.3 per 1000 screens). Tumor characteristics were analyzed for histopathologically confirmed breast cancer cases registered by the Cancer Registry of Tyrol (636 breast cancers: 571 invasive, 65 in situ). The proportion of invasive screen-detected cancers ≤10 mm in size was 31.3% and the proportion of node-negative cancers was 73.4%, with no differences observed between age groups. The proportion of all stage II+ screen-detected cancers was 35.5%.
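The overall detection rate and the positive predictive values follow directly from the counts reported in the text, as the short Python check below shows. The definition of the positive predictive value for biopsy as cancers per biopsy performed is an assumption, although it reproduces the reported figure; the function and key names are illustrative.

```python
def indicators(screens, cancers, recalled, biopsies):
    """Overall key indicators from raw counts (all age groups combined)."""
    return {
        "detection_rate_per_1000": 1000 * cancers / screens,
        "ppv_assessment_pct": 100 * cancers / recalled,   # cancers per woman assessed
        "ppv_biopsy_pct": 100 * cancers / biopsies,       # cancers per biopsy (assumed definition)
    }


if __name__ == "__main__":
    # Counts as reported in the Results section.
    result = indicators(screens=176_957, cancers=655, recalled=2_322, biopsies=1_351)
    for name, value in result.items():
        print(f"{name}: {value:.1f}")
```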

Table 3 Early surrogate indicators by age group

Interval cancer cases within the first year were assessed for all program years and those within the second year for the first 3 program years. We observed a total of 58 interval cancer cases within 0–11 months and 62 interval cancer cases within the second year after a negative screening examination (0.33 and 0.47 per 1000 screens, or 18.6% and 26.5% of the underlying background incidence rate, respectively). In the age group 40–49 years, the interval cancer rate as a proportion of the underlying background incidence rate was higher in both periods: 24.1% within the first year and 31.8% within the second year. Interval cancer rates are summarized in Table 4. So far, 118 interval cancer cases (including those from the pilot phase) have been discussed and classified according to EU guidelines in two interval cancer conferences. Of these, 28.0% were classified as true interval cancers, 17.8% as radiologically occult, 39.8% as false negative results of the screening examination, 9.3% as minimal signs and 5.1% as unclassifiable (data not shown).

Table 4 Interval cancer rates by age group

Important estimates of harms are presented in Table 5. For women following the invitation approach within the Tyrolean setting, the estimated cumulative risks for a false positive screening result and an unnecessary biopsy were 21.1% and 9.4%, respectively.

Table 5 Estimates of harms: cumulative proportion of false positive screening result and unnecessary biopsy by age group

Discussion

We evaluated 4 years of a population-based mammography screening program with supplementary ultrasound in Tyrol, Austria, and demonstrated satisfactory results for most of the relevant measures of program quality defined by EU guidelines.

Organizational aspects

Our program differs from other organized breast cancer screening programs that conform with EU guidelines in a few key respects: first, women aged 40–49 years are part of the target population and screening is offered annually to the age group 40–59 years; second, our program has not implemented double reading or a minimum caseload of 5000 mammograms per radiologist per year; and finally, it uses a combined screening approach consisting of mammography and supplementary ultrasound at the radiologist’s discretion. Despite these differences, the recall rate, biopsy rate and interval cancer rate were lower than in most other nationwide programs in Europe [31]. In our opinion, the main reason for the program’s favorable performance is the combination of mammography and ultrasound. Adding ultrasound to mammography screening is one of the main future directions of breast cancer screening; this approach has already been investigated in several studies [32,33,34] and the potential harms (higher recall rates and more unnecessary biopsies) were recently discussed [35, 36]. Furthermore, a retrospective analysis of the Tyrolean population-based screening program was performed to evaluate the benefits and harms of this procedure. The results show that supplemental ultrasound in the Tyrolean program increases sensitivity (by approximately 20%) when screening women at average risk for breast cancer. This increase is more than the 5–15% improvement achieved by double interpretation of screening mammograms reported in the literature [37]. Additionally, our results show that recall and biopsy rates can be kept within acceptable limits (submitted for publication). Another reason for the satisfactory results is that all participating radiologists are experienced, were trained at one central specialized institution, use high-quality technical equipment, are certified by the Austrian Radiology Association and each reach a caseload of ≥2000 mammograms per year in private practice.
Buist et al. recently showed that such a caseload appears to be sufficient for a high level of diagnostic quality for mammograms [38]. The required minimum annual reading volume varies between countries (e.g. 2000 readings in Australia, 5000 readings in the UK) and at present there is insufficient evidence to justify these differences [39]. In addition, a study within the National Health Service (NHS) Breast Screening Program assessed real-life reader performance as a function of both the volume of mammograms read and reading experience in a multicenter cohort [40]. The authors noted that the majority of studies regarding minimum reading volumes are based on test set cases and that studies examining the relationship between performance in real life and performance in test sets were inconclusive. Furthermore, other factors are known to influence film reading performance (e.g. years of experience, effect of specialist training) [41,42,43], and further studies are needed to evaluate differences in reading performance and to provide more useful evidence [40, 43].

Performance and early surrogate indicators

Only two indicators defined by the EU guidelines and evaluated in our analyses did not reach the target value. First, our participation rate of 56.9% was below the acceptable 70% level. A recent summary of participation and coverage rates in population mammographic screening programs for breast cancer in Europe showed an average participation rate of 53.4% (range 19.4–88.9% of personally invited) [12]. Possibilities for increasing participation could be to involve gynecologists and/or general practitioners in referrals for mammography, and to improve information campaigns. A survey among women invited for breast cancer screening in Tyrol showed that these two issues are relevant for participation in the screening program in our healthcare system [44]. Second, the proportion of all screen-detected cancers that are stage II+ (35.5%) was above the acceptable level of 25%. Although the proportion of invasive screen-detected cancers that are small in size achieved a desirable level, the proportion of node-negative cancers is slightly below the recommended level of 75%. One reason could be that the introduction of sentinel node diagnostics may have increased the detection of positive nodes, because the sentinel node technique increases the sensitivity for detecting breast cancer metastases [45].

At 37.2%, the positive predictive value of the screening test (number of cancers detected as a proportion of women undergoing further assessment) in our program for the age group 50–69 years is in good agreement with other European screening programs (6.8–49.5%) [31]. The interval cancer rate, an important measure of the effectiveness of breast cancer screening, is relatively favorable in our program even though our choice of background incidence rate follows a very conservative approach: if a more recent period (after 1988) had been used to calculate the background incidence rate, the background breast cancer rate would have been higher and thus the interval cancer rate as a proportion of the underlying background breast cancer incidence rate would have been even lower. It is well known that the breast cancer rate has increased since the 1990s, especially in European countries. The proportion of interval cancer cases classified as false negative in our program was almost twice as high as the EU recommended maximum level of 20% of the total number of interval cancers. This important matter is currently under discussion by our project team.

Harms

Since the question whether our program has an effect on mortality reduction demands a longer follow-up of the target population and the calculation of overdiagnosis lacks a clear separation of the screening and prescreening periods in Tyrol as well as a long enough observation period, we mainly concentrated on potential harms in terms of false positive results and unnecessary biopsies. Our results following the strict invitation approach show that of 1000 50-year-old women undergoing mammography until the age of 69 years within the Tyrolean setting 114 women will have at least one false positive result and 50 women will undergo at least one unnecessary biopsy. This is very low at the level of individual screening rounds and comparable to a review by the Euroscreen working group that estimated the cumulative risk range for a false positive screening result between 8% and 21% and for an invasive procedure with benign outcome between 1.8% and 6.3% [31].

Although the program proved feasible and can be evaluated at a high level of quality, it has some limitations. First, unlike in most European screening programs, double reading of mammograms was not performed. Although the effect of double reading on mammography sensitivity is relatively small (approximately 6%), we are not sure about its effect on our performance parameters. Second, the program protocol did not include a strict clinical path for BI-RADS 0 and 3, resulting in a lack of information on the final results in BI-RADS 0 cases and in too many BI-RADS 3 cases being recalled for further assessment instead of an intermediate mammography or ultrasound at 6 or 12 months. Third, the combined screening with mammography and ultrasound results in higher costs and excessive utilization of ultrasound in certain age groups. Fourth, although the total interval cancer rate is relatively low, the reason for the high proportion of false negative interval cancer cases should be analyzed further. Fifth, women were personally invited to mammography screening but we did not have a scheduling system; women were invited to consult the screening unit at any time that was convenient to them. In addition, women who did not respond to the invitation were not issued a reminder. Finally, although immigrants account for about 15% of the Tyrolean population and their number is still rising, no effort was made to offer mammography screening to immigrant women.

Conclusion

In summary, performance of our screening approach combining mammography and ultrasound is very favorable and these population-based results could contribute to the current discussion of future directions to be taken by breast cancer screening programs.