Introduction

In 2003, the European Council recommended mammography screening for women aged 50 to 69 years in accordance with European Guidelines [1]. These guidelines cover the whole process, from the invitation of the women to the epidemiological evaluation of screening data. In 2006, the European Guidelines were updated to incorporate newer insights and technical developments [2]. A new chapter was dedicated to digital mammography systems. The main aim of the additional physical tests was to guarantee that the new technology delivers at least the same quality as film-screen mammography (FSM).

Digital mammography encompasses a group of technologies. In this study we distinguish between ‘computed radiography’ (CR) and ‘direct digital radiology’ (DR). Computed radiography uses phosphor plates and a separate reader. At the time of data collection, powder-based CR systems were in use, with the majority using a double-sided reader as opposed to single sided. Direct digital radiology systems have an integrated X-ray system and detector. The image is available in the computer immediately after the X-ray exposure [3, 4].

The technical image quality of a digital mammogram depends on many factors, including the quality of the detector, the X-ray beam quality and the dose at which the system is operated [5]. In the European Guidelines [2], the technical image quality of the complete imaging chain is expressed via contrast thresholds. In practice, for a selection of disk diameters, the maximal gold thickness of the just visible disks is set (threshold thickness limit). This criterion is evaluated with the CDMAM (Contrast Detail for MAMmography) test object (Artinis, The Netherlands). The results of a performance analysis with the CDMAM test object are threshold levels expressed as X-ray contrast or in terms of the thickness of gold disks. This is evaluated for various diameters and characterised by the dose at which these threshold values have been obtained. Usually the threshold values of the 0.1-mm disk are evaluated in detail, as this is the most difficult criterion to meet in practice. A number of studies showed the superiority of DR technology when compared to CR [6–8].

The impact of digital mammography on screening indicators has been reported in a selected set of prospective and retrospective studies. Whereas the first large multicentre study used both DR and CR technology [9], most studies reported on screening programmes in which only DR technology was used [10–18]. Lipasti et al. [19] reported the use of CR mammography in screening and compared this technology with FSM via the diagnostic workup of the patients. In the CR arm, 1.71 % of those screened and in the film-screen arm 1.59 % of screened women were recalled for further workup. The PPV was significantly higher for CR (36 %) than for FSM (26 %). Skaane et al. concluded in their review study [20] that ten studies that compared digital mammography and FSM in breast cancer screening showed divergent and rather conflicting results.

The literature is very scarce in terms of data specifically for CR systems used in a screening setting, notwithstanding the widespread distribution of these systems in Europe. To our knowledge, the only investigation with large-scale use of CR in screening has been performed in France (http://www.invs.sante.fr/publications/2011/programme_depistage_cancer_sein/plaquette_depistage_cancer_sein.pdf) and has shown a significantly reduced cancer detection rate for the group of centres using CR versus DR technology. These facts were the direct trigger for a more detailed investigation of the screening data in Flanders, as there is mixed use of DR and CR technology in our mammographic units. This article reports an analysis of the technical and clinical screening performance indicators with CR versus DR for the 2008–2010 time period.

Material and methods

Breast cancer screening organisation in Flanders

Breast cancer screening was started in our region in 2001. Five (regional) cancer screening centres are responsible for the implementation of the Flemish breast cancer screening programme according to the European Guidelines.

Screening mammograms are taken in 171 mammographic units. The units participate with their CR or DR system, with the only requirements being that the systems have passed the type test [21] and the subsequent acceptance test and yearly tests in the mammographic unit and that daily QC procedures are performed for the X-ray system and monitor. At the end of 2010, there were 96 digital systems, distributed as follows: for DR: 7 Fuji Amulet MS 1000 systems, 10 GE Senographe Essential systems, 2 GE Senographe DS systems, 7 Hologic Selenia systems, 2 IMS Giotto Image systems, 5 Sectra MicroDose systems, 21 Siemens Mammomat Inspiration systems and 3 Siemens Mammomat Novation systems. The following CR technology was available: 3 Agfa MM3.0 systems, 4 Carestream EHR-M2 systems, 31 Fuji Profect systems and 1 Konica CP/Regius 190 system.

Screening population

This study is a retrospective analysis of data from the population-based Flemish breast cancer screening programme. The target population for biennial invitation was 715,106 women in 2006 and reached 760,633 women in 2010. Invitation coverage is complete. Over the last 5 years, between 163,793 (in 2006) and 183,384 (in 2010) women were screened each year.

The period of investigation was fixed from January 2008 until December 2010.

Between 2008 and 2010, 73,008 women were screened with CR [17,855 women at their first screening (first round) and 55,153 in subsequent rounds]. In parallel to this, 116,945 women were screened with DR (25,032 in the first round and 91,913 in subsequent rounds).

Mammography systems

As soon as digital mammography was conditionally accepted, a gradual introduction of digital mammography was observed. Whereas in 2006 less than 2 % of the screening mammograms were digital, this percentage reached 60 % in 2010 (Fig. 1).

Fig. 1 Percentage of digital mammograms in our breast cancer screening programme since 2006

Physico-technical data for the present study were collected in a subset of mammographic units controlled by the medical physics experts of either the University of Ghent or the University of Leuven. In total, 62 digital mammography units were involved, with 25 CR systems and 37 DR systems. The Ghent group performed the QA activities in 24 of these units and the Leuven group in 38 units (Table 1).

Table 1 Overview of the number of mammograms in the patient dose survey

Patient dose data were available from the 3-yearly legally required patient dosimetry surveys. We took care to select the physico-technical data closest in time to the patient dose survey. In addition, it was verified from daily QC data that there were no substantial changes in the systems between the technical tests and the patient dose survey.

Patient dosimetry

In our mammographic dose surveys, the mean glandular doses (MGDs) were calculated using Dance’s approach, from the following equation [22]:

$$ \mathrm{MGD} = K \, g \, c \, s $$
(1)

where K is the incident air kerma at the upper surface of the breast, measured without backscatter and under the compression paddle; the factor g converts incident air kerma to MGD for a glandularity of 50 %; the factor c corrects for the difference in breast composition from 50 % glandularity and was taken here as for the UK screened population; and the factor s corrects for the X-ray spectrum used.
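As a worked illustration of Eq. (1), the following minimal sketch multiplies the incident air kerma by the three factors. All numerical values are hypothetical and chosen only to make the arithmetic visible; in practice g, c and s are taken from published tables as a function of breast thickness and beam quality, which this sketch does not attempt to reproduce.

```python
def mean_glandular_dose(kerma_mgy: float, g: float, c: float, s: float) -> float:
    """Return the MGD in mGy from Eq. (1): MGD = K * g * c * s."""
    return kerma_mgy * g * c * s


# Hypothetical single exposure; none of these numbers come from the study.
K = 8.0    # incident air kerma at the upper breast surface, mGy (illustrative)
g = 0.20   # air kerma to MGD conversion for 50 % glandularity (illustrative)
c = 1.05   # correction for breast composition (illustrative)
s = 1.04   # correction for the X-ray spectrum (illustrative)

mgd = mean_glandular_dose(K, g, c, s)
print(f"MGD = {mgd:.2f} mGy")
```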

Following our legislation, MGDs have to be calculated for at least 50 successive patients (4 mammograms per patient). Either the radiographers note down all relevant exposure-related data in a data file and transmit this file to the medical physicist, or the images are collected in an automated way and the exposure data are then extracted automatically from the header information under supervision of a medical physicist, as in [23, 24]. In the latter case, data are usually collected for more than 50 patients per mammographic system. An overview of the patient dose data sample is shown in Table 1. We first calculated the mean and median values of the MGDs for each of the mammography systems. We then calculated, for CR and DR separately, the mean values of the mean and median data of the individual systems. Next, the same calculations were performed for the systems tested by the Ghent and Leuven teams separately. A t-test was performed to determine whether the group averages for CR and DR were significantly different.
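The two-level aggregation described above (per-system statistics first, then group averages of those statistics) can be sketched as follows. The dose values and system names are hypothetical, and the text does not state which variant of the t-test was used; Welch's unequal-variance form is assumed here purely for illustration. Only the test statistic and its degrees of freedom are computed, since a P-value would additionally require the t-distribution.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical MGD values (mGy) per mammogram, keyed by system; not study data.
cr_systems = {
    "CR-A": [2.1, 2.4, 1.9, 2.6],
    "CR-B": [2.0, 2.2, 2.5, 1.8],
    "CR-C": [2.3, 2.7, 2.1, 2.0],
}
dr_systems = {
    "DR-A": [1.2, 1.4, 1.3, 1.5],
    "DR-B": [1.1, 1.3, 1.2, 1.0],
    "DR-C": [1.5, 1.6, 1.4, 1.3],
}


def per_system(systems, statistic):
    """Apply a statistic (mean or median) to each system's own dose list."""
    return [statistic(doses) for doses in systems.values()]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


cr_means = per_system(cr_systems, mean)
dr_means = per_system(dr_systems, mean)
cr_medians = per_system(cr_systems, median)

group_mean_cr = mean(cr_means)   # mean of the per-system means
group_mean_dr = mean(dr_means)
t_stat, dof = welch_t(cr_means, dr_means)
print(f"CR {group_mean_cr:.2f} mGy, DR {group_mean_dr:.2f} mGy, t = {t_stat:.2f}, df = {dof:.1f}")
```

Averaging per-system statistics gives each system equal weight, regardless of how many mammograms it contributed to the survey.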

Mean glandular dose as a function of compressed breast thickness for CR and DR is visualised via the best exponential curve fit through all the individual data points.
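A minimal sketch of such an exponential fit is given below. It assumes the fit is obtained by ordinary least squares on the logarithm of the dose, which is one common way to fit MGD = a·exp(b·thickness); the text does not specify the fitting procedure. The data points are synthetic, generated from a known curve so that the recovered parameters can be checked.

```python
from math import exp, log


def fit_exponential(thickness_mm, mgd_mgy):
    """Least-squares fit of ln(MGD) = ln(a) + b * thickness; returns (a, b)."""
    n = len(thickness_mm)
    log_dose = [log(d) for d in mgd_mgy]
    mean_x = sum(thickness_mm) / n
    mean_y = sum(log_dose) / n
    sxx = sum((x - mean_x) ** 2 for x in thickness_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(thickness_mm, log_dose))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


# Synthetic, noise-free data generated from MGD = 0.7 * exp(0.02 * thickness).
thickness = [30, 40, 50, 60, 70, 80]
doses = [0.7 * exp(0.02 * t) for t in thickness]

a, b = fit_exponential(thickness, doses)
print(f"MGD = {a:.4f} * exp({b:.4f} * thickness)")
```

Fitting in log space weights relative rather than absolute deviations, so a direct non-linear fit on the untransformed doses may give slightly different coefficients on noisy data.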

Technical screening performance indicators

According to the European Guidelines, systems should operate below an MGD of 3 mGy for the 6-cm breast that is simulated by 5 cm of polymethyl methacrylate (PMMA) and below a threshold gold thickness of 1.68 μm for the 0.1-mm disk (acceptable level). The best systems operate at low dose and low threshold gold thickness. Mean glandular doses describing the dose level of the contrast detail test were calculated from an acquisition of a 5-cm PMMA slab that was imaged with automatic exposure control.
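The two acceptable-level criteria quoted above can be expressed as a simple check. The limits come from the text; the system names and their values are hypothetical.

```python
MAX_MGD_MGY = 3.0         # acceptable MGD limit for 5 cm PMMA (from the text)
MAX_THRESHOLD_UM = 1.68   # acceptable threshold gold thickness, 0.1-mm disk (from the text)


def meets_acceptable_level(mgd_mgy: float, threshold_um: float) -> bool:
    """True when both the dose and the threshold thickness are below their limits."""
    return mgd_mgy < MAX_MGD_MGY and threshold_um < MAX_THRESHOLD_UM


# Hypothetical systems: (MGD in mGy, threshold gold thickness in micrometres).
systems = {
    "system A": (2.3, 1.45),
    "system B": (1.3, 1.20),
    "system C": (3.2, 1.10),   # dose too high
    "system D": (2.0, 1.80),   # threshold thickness too high
}

results = {name: meets_acceptable_level(*values) for name, values in systems.items()}
for name, ok in results.items():
    print(name, "passes" if ok else "fails")
```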

Threshold thickness values were obtained from acquisitions of the CDMAM test object (Artinis, Zetten, The Netherlands) using a manual exposure identical to the automatic 5-cm acquisition for the MGD. Manual reading of the CDMAM was performed at acceptance along with an automated approach that was then continued afterwards if the results of the first automated reading were in line with the manual reading. Automatic evaluation was performed with the software cdcom1.5 (downloadable from the Euref website) and Erica2 (available from www.qaelum.com). This type of data processing was worked out by Young et al. [25] and has been proven to predict human readings from the computerised evaluation method successfully.

We then calculated, for CR and DR separately, the mean threshold thickness values and the corresponding doses. Next, by analogy with the patient dose data, the same calculations were performed for the systems tested by the Ghent and Leuven teams separately.

Clinical screening performance indicators

The following screening indicators were calculated for CR and DR technologies and for both the initial and the subsequent screening rounds: recall rate (RR), cancer detection rate (CDR), the percentage of DCIS, the percentage of tumours with size <1 cm and positive predictive value (PPV). The RR is defined as the proportion of screened women for whom further workup was recommended. The CDR is the number of pathologically proven malignant lesions of the breast (both in situ and invasive) detected in a screening round per 1,000 women screened in that round. It is assessed as a multiple of the underlying breast cancer incidence rate expected in the absence of screening, which is 1.25 per 1,000 women: the CDR should be higher than three times this incidence rate for initial screening rounds and higher than 1.5 times this rate for subsequent rounds. The T-staging represents the size of the tumours. The PPV is the percentage of cancers found in the recalled women. In some cases, the results of the follow-up exams are not known in the screening programme, as these investigations are performed outside the screening setting. We did not correct our numbers for the missing information. For the total group of patients scanned in the period investigated, 12.5 % of the follow-up results of the recalled cases were not available. As missing follow-up results are homogeneously spread over the cohort, our CDR may therefore be underestimated by up to 12.5 %.
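The definitions above translate into a few ratios, sketched below. The background incidence of 1.25 per 1,000 women is taken from the text; the counts passed to the function are hypothetical and do not correspond to the study cohorts.

```python
BACKGROUND_INCIDENCE_PER_1000 = 1.25  # incidence in the absence of screening (from the text)


def screening_indicators(screened: int, recalled: int, cancers: int) -> dict:
    """Compute RR (%), CDR (per 1,000 screened), PPV (%) and the CDR as a multiple of incidence."""
    cdr_per_1000 = 1000 * cancers / screened
    return {
        "rr_percent": 100 * recalled / screened,
        "cdr_per_1000": cdr_per_1000,
        "ppv_percent": 100 * cancers / recalled,
        "cdr_multiple": cdr_per_1000 / BACKGROUND_INCIDENCE_PER_1000,
    }


# Hypothetical initial screening round.
indicators = screening_indicators(screened=20_000, recalled=1_100, cancers=130)
meets_initial_target = indicators["cdr_multiple"] > 3.0   # would be 1.5 for subsequent rounds

for name, value in indicators.items():
    print(f"{name}: {value:.2f}")
```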

The results of CR and DR technologies for the investigated period were compared using Fisher’s exact test, and a P-value less than 0.05 was considered statistically significant.
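For readers unfamiliar with the test, a minimal sketch of a two-sided Fisher's exact test on a 2×2 table follows. It assumes the common definition of the two-sided P-value as the sum of the probabilities of all tables that are no more probable than the observed one; the text does not state which definition or software was used. The example table is hypothetical. Log-gamma is used so that the same function remains usable for cohort sizes in the tens of thousands.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided P-value for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(total, col1)

    observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as probable as the observed one (small tolerance for ties).
    p = sum(
        exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= observed + 1e-7
    )
    return min(1.0, p)


# Hypothetical table: rows are two technologies, columns are outcome present / absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p_value:.4f}")
```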

Results

Patient dosimetry

The MGDs were calculated for a total of 27,745 mammograms, of which 5,623 were acquired with CR and 22,122 with DR. The larger number of DR mammograms is due to the easier dose data acquisition. As we report only mean and median values, this situation does not bias the results. An overview of the dose data, differentiated according to the different X-ray modalities, is presented in Table 2. The mean and median MGDs of CR and DR systems are significantly different (P < 0.0001). The mean and median MGDs and their standard deviations (SD) were 2.16 mGy (SD 0.36 mGy) and 2.04 mGy (SD 0.36 mGy) for CR and 1.35 mGy (SD 0.32 mGy) and 1.26 mGy (SD 0.30 mGy) for DR. The mean and median MGDs were 60 % higher for CR than for DR.

Table 2 Overview of the mean and median MGD from the patient dosimetry study and the corresponding threshold thickness values for the 0.1-mm disk

Patient dose data for the units tested by Ghent and Leuven showed similar differences. For Ghent, the mean and median MGDs were 2.22 mGy (SD 0.33 mGy) and 2.10 mGy (SD 0.30 mGy) for CR and 1.57 mGy (SD 0.31 mGy) and 1.47 mGy (SD 0.31 mGy) for DR. Data for Leuven were 2.00 mGy (SD 0.41 mGy) and 1.88 mGy (SD 0.46 mGy) for CR and 1.30 mGy (SD 0.31 mGy) and 1.21 mGy (SD 0.28 mGy) for DR. The doses for CR showed no significant differences between the data obtained by Ghent and Leuven. For DR, the difference in mean MGDs between the two groups was of borderline significance (P = 0.05), whereas the median MGDs were significantly different (P = 0.03). These observations confirm the higher dose settings for CR than for DR in the units controlled by both groups.

Figure 2 visualizes the dose data, allowing an easy comparison of the data according to the limiting values of the European Guidelines. The averaged values remain below the acceptable dose levels. It must be remarked that the limiting values in the Guidelines refer to MGDs estimated from the PMMA acquisitions and are not, strictly speaking, intended for use in patient dose surveys.

Fig. 2 Mean glandular dose as a function of compressed breast thickness from grouping of all patient dose data collected by the Ghent and Leuven groups and separately for CR and DR technology. Mean glandular doses as a function of compressed breast thickness (mm) were represented by exponential functions that were fitted through the input data: for CR systems tested by Ghent: MGD = 0.6831*EXP(0.0202*thickness), R² = 0.442; for DR systems tested by Ghent: MGD = 0.8402*EXP(0.0103*thickness), R² = 0.177; for CR systems tested by Leuven: MGD = 0.56*EXP(0.0212*thickness), R² = 0.424; for DR systems tested by Leuven: MGD = 0.4326*EXP(0.0173*thickness), R² = 0.379

Physico-technical data

Physico-technical characteristics in terms of threshold gold thickness at the associated clinical dose setting are shown in Fig. 3.

Fig. 3 Graphical representation of mean glandular dose (MGD) and threshold gold thicknesses for 62 systems controlled by the Ghent and Leuven groups

The CR and DR systems are shown as two groups in the graph, with DR presenting clearly better performance parameters.

Threshold thickness values for the 0.1-mm disk of the DR systems had an average value of 1.20 μm (SD 0.13 μm). The same values for the CR systems were on average 1.43 μm (SD 0.13 μm). These values were obtained at the following doses: 2.26 mGy (mean) and 2.31 mGy (median) with SD 0.44 mGy for CR and 1.26 mGy (mean) and 1.25 mGy (median) with SD 0.27 mGy for DR. These dose values are very similar to the results from the patient dose survey. Both threshold thickness values and MGD were significantly different between CR and DR (P-values < 10⁻⁹).

Screening indicators

The screening performance parameters are summarised in Table 3. None of the parameters showed statistically significant differences between CR and DR technology.

Table 3 Overview of the clinical screening performance parameters separately for CR and DR technology in the Flemish Breast Screening Programme

The RR for the first round was 5.48 % and 5.61 % for CR and DR technology, respectively, well below the acceptable level of 7 % set by the European Guidelines. In subsequent rounds, the RR was 2.52 % for CR and 2.65 % for DR, which is lower than the achievable level of 3 %.

The CDR for the first round was 0.64 % and 0.63 % for CR and DR technology, respectively, above the required three times the incidence rate, estimated to be 0.375 %. In subsequent rounds, the CDR was 0.48 % for CR and 0.51 % for DR, which is above 1.5 times the incidence rate.

In the initial rounds the DCIS detection rate was 0.03 % for CR and 0.07 % for DR, and in subsequent rounds 0.05 % for CR and 0.08 % for DR. These values were not statistically significantly different. The percentage of DCIS in the detected cancers ranged from 15.6 % in CR to 19.8 % in DR.

The detection rates of cancers smaller than 1 cm were 0.09 % in initial and 0.12 % in subsequent rounds for CR, and 0.13 % and 0.10 %, respectively, for DR.

The positive predictive values were also higher than mentioned in the European Guidelines. For CR, the PPV was 13.9 % for initial rounds and 21.4 % for subsequent rounds; for DR, it was 13.3 % in initial rounds and 21.5 % in subsequent rounds.

Discussion

A unique feature of several breast cancer screening programmes is that the performance is measured continuously as recommended in the European Guidelines. This allows for a follow-up of the effect of new technologies in terms of parameters that are used throughout Europe. The present study collected the data of all patients that had been screened with digital mammography since the introduction of digital mammography and until 2010. In our region both CR and DR had been used.

There are a few study limitations in the screening performance data: (1) while it is known that some of the screening indicators, such as CDR and PPV, may depend on the age of the women, an age-specific analysis could not be performed here as the cohorts with CR and DR were already rather small [26]; (2) we did not document the experience of the second readers. Experience can be judged as acceptable or substantial as the groups have remained largely unchanged since the introduction of screening in 2001; (3) our cohorts were too small to perform a statistical analysis on the numbers of low-grade DCIS lesions versus high-grade lesions as detected with CR versus DR technology. Although the breast imaging community would be eager to learn about possible overdiagnosis, we could not perform the analysis carried out in Bluekens et al.’s study [15] in our cohorts; the groups were too small.

Our main result was that we could not observe any significant difference in the RR, CDR, detection rate of DCIS, percentage of cancers with T-stage <1 cm and PPV in cohorts screened with CR versus DR. This is different from the results in France and could be due to some differences between the French screening programme and our implementation. Our programme has generalised the second reading of all mammograms, whereas in France the first reader can immediately perform add-on exams if deemed necessary. We have soft copy viewing for the first and second reading, whereas in France hard copy reading was allowed. Our programme is unique in having generalised daily and centrally supervised quality control of X-ray systems and monitors, and the age of our screened population is limited to 50–69 years, whereas older women are also screened in France. A limitation of our data sample is of course that it is much smaller than the French sample, and therefore we plan continued evaluation of possible differences in performance. Aside from the differences in the screening setup, differences could also be due to different operational points of the mammography systems. The present study confirmed that in Flanders, CR and DR systems have been set at different MGD and quality levels. The average thickness of gold to detect the 0.1-mm disk was 1.49 μm for CR and 1.20 μm for DR. This was achieved at a mean MGD of 2.26 mGy for CR and 1.26 mGy for DR. From daily experience (results not shown), it follows that powder-based CR systems cannot achieve the threshold thickness values seen with DR within the given dose limits of the European Guidelines. Recent experience with needle-based CR systems is promising, as shown in a recent comparative technical evaluation of powder-based CR, needle-based CR and an a-Se detector [27]. These needle-based CR systems can achieve lower threshold thicknesses, close to the values observed with DR.

In the present study, we showed similar screening performance parameters for CR and DR, but with both types of digital mammography systems operated at different dose and quality levels. This does not prove that these dose levels are optimal or that they could be reduced. Our study is only an indirect approval of the technical requirements put forward in the European Guidelines. Differences in detection of clusters of subtle microcalcification between CR and DR have been quantified recently [7]. The study showed differences between CR and DR and between different dose levels for detection of clusters of microcalcification. Their data are in line with our data in which we observed similar clinical screening parameters with CR and DR but with CR operated at a significantly higher dose level than DR. The present study and the quantitative study do not support lowering the dose settings for either our CR or DR systems. It would be interesting to investigate whether our screening performance parameters could be improved with higher dose DR.

The higher doses used with CR technology (2 mammograms at 2.16 mGy) versus DR (2 mammograms at 1.35 mGy) for a screening examination at the age of 50 years could lead to an extra 1.6 radiation-induced cancers per 100,000 women screened. However, the expected benefit of reduction in premature mortality afforded by routine mammographic screening in terms of either lives saved or years of life saved greatly exceeds this risk [28].

The ultimate test of the performance of the screening programme is the number and time distribution of interval cancers. These data are available up to 2007, before the introduction of digital mammography. During the film-screen period, our programme complied with all European Guidelines in this respect. The data are suggestive of compliance in the digital era too.

In summary, all quality assurance efforts in the frame of breast cancer screening offer a unique opportunity to control the effects of the introduction of new imaging techniques. From the physico-technical point of view, DR performs better than powder-based CR in terms of dose and image quality. In the Flemish screening programme we did not see a difference in clinical screening indicators obtained with CR and DR.