Introduction

Digital breast tomosynthesis (DBT) has been shown to increase cancer detection and reduce false positive recalls in screening [1], but there is still a concern that small and/or low-density calcifications may not be depicted well given all the changes that have occurred in breast screening [2, 3]. Commercially available DBT systems use a limited-angle acquisition and a small number of low-dose projections that produce an under-sampled dataset, compromising the detection of clinically significant calcifications [3]. Detecting calcifications at screening is important because they are one of the earliest signs of breast cancer [4].

Calcification detection is degraded by noise and blurring in DBT [5]. DBT systems with continuous tube motion have shorter scan times compared to step-and-shoot systems; however, the trade-off is increased blurring due to focal spot motion [6]. A previous study has shown that detector element size affects the spatial resolution of flat panel detectors [7], influencing the effective signal-to-noise ratio and the overall system modulation transfer function [8]. Preliminary data have also shown that the reconstructed voxel size affects spatial resolution and image blur in DBT reconstructions [9]. However, there is a limit to how much the image quality can be improved by varying the reconstructed voxel size [10].

Virtual clinical trials (VCTs) have been widely used to evaluate the performance of virtual readers to detect calcifications [11,12,13,14,15]. VCTs are beneficial for the development and optimization of novel imaging systems prior to clinical trials [11, 13], making the design process more cost-effective and rapid [16]. VCTs are used to simulate human anatomy, image acquisition, and image interpretation [16]. VCTs have the potential to support regulatory and pre-clinical evaluations [15] and to evaluate and optimize the performance of imaging systems using virtual phantoms [11]. Simulations of the breast anatomy require high resolution to depict accurately the fine details associated with linear structures (e.g., ductal network, blood vessels, and Cooper’s ligaments) and more focal findings (e.g., calcifications, masses, and architectural distortion) [16].

This work presents a VCT method to determine the limit of calcification detection in DBT. Single calcifications of various sizes and densities were inserted into in-silico breast phantoms [11]. DBT acquisitions were simulated assuming various acquisition geometries. Variations of x-ray tube motion, detector element size, and reconstructed voxel size (allowing for super-resolution) were simulated to evaluate detection performance. Channelized Hotelling observers [17] (CHOs) were used to evaluate calcification detection with multiple-reader multiple-case (MRMC) receiver operating characteristic (ROC) analyses [18].

Material and methods

Screening data were collected retrospectively (Fig. 1A) [19] under a protocol that was institutional review board approved and Health Insurance Portability and Accountability Act compliant. Screening exams were performed on four Selenia Dimensions systems (Hologic) from Sep 2011 to Jan 2015. A case–control cohort consisting of 132 cancer patients and 528 matched controls was selected for modeling [19]. Control patients were matched using compressed breast thickness (CBT), body mass index (BMI), and BI-RADS density [20].

Fig. 1

(A) Patient accrual for developing analytical model of screening population. (B) VCT Pipeline and flowchart used to insert calcifications and simulate synthetic DBT images, and to model virtual observers to collect and compare performance measurements. The method of calcification insertion and reading trials with mathematical models (CHOs) are illustrated in I and II, respectively

A VCT pipeline was developed to model and conduct clinical trials of imaging systems (Fig. 1B). The steps used to conduct the VCTs include image pre-processing, image post-processing, and data analysis. The VCT pipeline consists of two major phases: virtual patient accrual and virtual reader simulation. We used this pipeline to calibrate calcifications below, at, and (slightly) above the threshold of lesion detectability (i.e., various levels of case difficulty) [21].

Determining representative calcifications at the threshold of detectability

In total, 600 anthropomorphic phantoms (voxel size of 100 × 100 × 100 µm³) were simulated using a principal component analysis (PCA) method to create compressed breast outlines [22] and a GPU-based recursive partitioning algorithm [23]. The phantoms were simulated using a comparable distribution of compressed breast thickness (CBT = 57.75 [49.93, 66.68] mm, p < 0.001) and volumetric breast density (VBD = 13.33 [8.55, 20.76]%, p = 0.24) data of a large screening population (Table 1) [22]. In addition, 24 breast phantoms (calibration set) were simulated to calibrate the composition of single-calcification models [21]. The lesion calibration is needed to match clinically known AUCs reported in a previous reader study, thereby duplicating the detection difficulty [24]. The CBT and VBD of the calibration set spanned [30, 80] mm (steps of 10 mm) and [5, 35]% (steps of 10%), respectively.
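
The size of the calibration set is consistent with a full grid over the stated CBT and VBD ranges. The short sketch below enumerates that grid; it assumes one phantom per (CBT, VBD) pair, which the text does not state explicitly, and the variable names are illustrative only.

```python
from itertools import product

# Ranges taken from the text: CBT 30-80 mm in 10 mm steps,
# VBD 5-35 % in 10 % steps.
cbt_mm = list(range(30, 81, 10))       # 30, 40, 50, 60, 70, 80
vbd_percent = list(range(5, 36, 10))   # 5, 15, 25, 35

# Assumption: one calibration phantom per (CBT, VBD) combination.
calibration_set = list(product(cbt_mm, vbd_percent))

print(len(cbt_mm), "x", len(vbd_percent), "=", len(calibration_set))
```

Six thickness values and four density values give 24 combinations, matching the 24 calibration phantoms.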

Table 1 Summary of patient demographics. CBT compressed breast thickness, BMI body mass index, VBD volumetric breast density

Single-calcifications acquired from microcalcification clusters were inserted into each breast phantom. The models of calcification clusters were extracted from diagnostic clinical images using a prone stereotactic breast biopsy system (Fischer MammoVision™) [25]. The single-calcification models varied in size (2³, 3³, and 4³ voxels, i.e., 8, 27, and 64 voxels). The composition of the calcifications was simulated using partial volumes of hydroxyapatite [11, 21]. The fraction of hydroxyapatite in each model was varied using a previously validated method (Figure S1) [21]. In total, 40 single-calcifications of each model were inserted in the central plane (cranial-caudal or CC view) corresponding to the mid-thickness of 24 breast phantoms selected for calibration. The calcifications were randomly distributed in the central plane of the phantoms; calcifications were distributed throughout the plane allowing some background breast tissue to surround each calcification within its bounding box (size of 6.4 mm with air exclusion, Fig. 1BI). This insertion method was repeated for each single-calcification model; results were collected and stratified as a function of calcification size (Figure S1). We simulated central projections of a digital breast tomosynthesis (DBT) system (Table 2) using clinical exposure settings reported in the literature [26]; the dose was varied as a function of CBT and VBD for each phantom. The projections were simulated using the X-ray mass attenuation reported by the ICRU [27]. A ray-tracing algorithm produced images that were processed using commercial software (Briona, version 9.02, real-time tomography LLC, Villanova, PA). Regions of interest (ROIs) of size 4.48 × 4.48 mm² (64 × 64 pixels) were presented with and without calcifications to virtual readers modeled as channelized Hotelling observers (CHOs, Fig. 1BII) [17]. For calibration, we used 15 channels and a spread parameter of 9 for each CHO. The threshold of detectability of calcifications was matched within the 95% confidence interval (CI) reported in a previous reader study (Figure S1) [24]. The VCT method and the parameters used to calibrate lesions are described in a previous publication [21].
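
To make the role of the channel count and spread parameter concrete, the following is a minimal sketch of a channelized Hotelling observer. It is not the implementation used in this work: it assumes rotationally symmetric Laguerre-Gauss channels (the text does not name the channel family), a spread expressed in pixels, and synthetic white-noise ROIs with a hypothetical Gaussian blob as the signal. All function names are illustrative.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval


def laguerre_gauss_channels(roi_size=64, n_channels=15, spread=9.0):
    """Return a (roi_size**2, n_channels) matrix of Laguerre-Gauss channels.

    Assumption: Laguerre-Gauss channels with `spread` given in pixels.
    """
    coords = np.arange(roi_size) - (roi_size - 1) / 2.0
    xx, yy = np.meshgrid(coords, coords)
    g = 2.0 * np.pi * (xx**2 + yy**2) / spread**2
    columns = []
    for j in range(n_channels):
        coeffs = np.zeros(j + 1)
        coeffs[j] = 1.0  # selects the j-th Laguerre polynomial L_j
        profile = (np.sqrt(2.0) / spread) * np.exp(-g / 2.0) * lagval(g, coeffs)
        columns.append(profile.ravel())
    return np.stack(columns, axis=1)


def train_cho(signal_rois, noise_rois, channels):
    """Estimate the Hotelling template in channel space from training ROIs."""
    v_sig = signal_rois.reshape(len(signal_rois), -1) @ channels
    v_noi = noise_rois.reshape(len(noise_rois), -1) @ channels
    cov = 0.5 * (np.cov(v_sig, rowvar=False) + np.cov(v_noi, rowvar=False))
    # The pseudo-inverse guards against an ill-conditioned covariance estimate.
    return np.linalg.pinv(cov) @ (v_sig.mean(axis=0) - v_noi.mean(axis=0))


def cho_scores(rois, channels, template):
    """Return one scalar decision variable per ROI."""
    return (rois.reshape(len(rois), -1) @ channels) @ template


# Toy demonstration on synthetic data (white noise plus a Gaussian blob).
rng = np.random.default_rng(0)
size = 64
coords = np.arange(size) - (size - 1) / 2.0
xx, yy = np.meshgrid(coords, coords)
blob = 2.0 * np.exp(-(xx**2 + yy**2) / (2.0 * 2.0**2))  # hypothetical signal
noise = rng.normal(size=(400, size, size))
signal = rng.normal(size=(400, size, size)) + blob

channels = laguerre_gauss_channels(size, n_channels=15, spread=9.0)
template = train_cho(signal[:200], noise[:200], channels)
scores_signal = cho_scores(signal[200:], channels, template)
scores_noise = cho_scores(noise[200:], channels, template)
print(scores_signal.mean() > scores_noise.mean())
```

The scores produced by `cho_scores` play the role of reader ratings: they can be passed to an ROC analysis in the same way as scores from human readers.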

Table 2 Summary of parameters used to simulate the combination of various DBT acquisition geometries. NA not applicable

Calculating changes in performance between virtual readers

Similarly, 20 calcifications of each model were selected and randomly distributed in the central slice (CC view) of the 600 breast phantoms selected for evaluating the limit of calcification detection using various DBT acquisitions. DBT projections of phantoms with and without calcifications were simulated. The acquisition geometries were varied in terms of detector element size, tube motion, and reconstructed voxel size (Table 2B) to evaluate changes in the performance of virtual readers trained for detecting calcifications. The DBT projection and reconstruction methods described in the previous section were used.

ROIs were selected and centered on each calcification in the central slice of the reconstructed DBT image. The ROIs also measured 4.48 × 4.48 mm². We varied the spread parameter of each CHO for ROIs reconstructed using 35, 50, 70, 100, and 140 µm voxel sizes. For each CHO, training and testing image sets for each acquisition condition included 600 ROIs with simulated lesions and 600 lesion-free ROIs.
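
Because the physical ROI size is fixed, the number of pixels per ROI side changes with the reconstructed voxel size. The arithmetic is sketched below; how non-integer pixel counts were handled is not stated in the text, so the values are reported without rounding.

```python
ROI_SIDE_UM = 4480  # 4.48 mm, expressed in micrometres

voxel_sizes_um = [35, 50, 70, 100, 140]
roi_side_px = {v: ROI_SIDE_UM / v for v in voxel_sizes_um}

for v, n in roi_side_px.items():
    print(f"{v:>3} um voxels -> {n:g} pixels per side")
```

At 70 µm the ROI is 64 pixels per side, which agrees with the 64 × 64 pixel ROIs used for calibration.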

The ROIs were separated into two main sets, with and without calcifications. Each main set was then separated into independent training and testing ROI sets. Each reader was trained with an independent training set but tested with a common test set, simulating a scenario where independent readers read the same cases. The ROIs used for training and testing were selected randomly for each reader. The scores of each reader were pooled to calculate a cumulative ROC [24].
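
The train/test scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the split sizes `n_test` and `n_train` are hypothetical (the text does not give them), and the training sets are drawn at random from a shared pool, so two readers may share some training ROIs.

```python
import numpy as np


def make_reader_splits(n_rois=600, n_readers=14, n_test=200, n_train=200, seed=0):
    """Return a common test index set and one random training set per reader.

    n_test and n_train are hypothetical; the text does not state the split sizes.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rois)
    test_idx = order[:n_test]      # read by every virtual reader
    train_pool = order[n_test:]    # never overlaps the common test set
    train_idx = [
        rng.choice(train_pool, size=n_train, replace=False)
        for _ in range(n_readers)
    ]
    return test_idx, train_idx


test_idx, train_idx = make_reader_splits()
print(len(test_idx), len(train_idx), len(train_idx[0]))
```

The same function would be applied separately to the lesion-present and lesion-free ROI sets; pooling then amounts to concatenating the test scores of all readers before computing the ROC curve.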

Statistical analyses: ROC with AUCs and signal detection with d’ index

One-shot MRMC ROC analysis was used to estimate the \(\mathrm{AUC}\) and its variance for lesion detectability with multiple lesions per case [18]. We simulated different blurring conditions to evaluate the virtual readers’ performance. Blurring conditions were analyzed as a function of calcification size: two tube motions (continuous and step-and-shoot), two detector element sizes (70 µm and 140 µm), and various reconstructed voxel sizes ([35, 100] µm).

The \(\mathrm{AUC}\) and \(d'\) were calculated based on the pooled ROC curves of the 14 virtual readers. AUC differences between two paired ROCs were reported using 2000 stratified bootstrap replicates computed using the DeLong method [28]. The 95% CI of the AUC was also reported using 2000 stratified bootstrap replicates computed based on the DeLong method. R packages “pROC” (version 1.18.0) and “psycho” (version 0.6.1) were used to collect the ROC and \(d'\) (sensitivity index) statistics, respectively [29]. Differences (Δ) in \(\mathrm{AUC}\) and ratios (λ) of \(d'\) were calculated to evaluate changes in performance between acquisition geometries. \(\mathrm{AUC}\) and \(d'\) were categorized by tube motion, detector element size, reconstructed voxel size, and calcification size.
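
The quantities Δ and λ can be illustrated with a NumPy-only sketch. It is not the pROC or psycho implementation: the AUC is the Mann-Whitney estimate, the confidence interval comes from a paired, stratified percentile bootstrap, and \(d'\) is obtained from the AUC with the equal-variance Gaussian relation \(d' = \sqrt{2}\,\Phi^{-1}(\mathrm{AUC})\), which is an assumption made here for illustration. The scores are synthetic.

```python
import numpy as np
from statistics import NormalDist


def auc_mann_whitney(signal_scores, noise_scores):
    """Probability that a signal score exceeds a noise score (ties count 0.5)."""
    s = np.asarray(signal_scores)[:, None]
    n = np.asarray(noise_scores)[None, :]
    return float(np.mean((s > n) + 0.5 * (s == n)))


def dprime_from_auc(auc):
    """Equal-variance Gaussian conversion (assumption); requires 0 < auc < 1."""
    return float(np.sqrt(2.0) * NormalDist().inv_cdf(auc))


def bootstrap_delta_auc(sig_a, noi_a, sig_b, noi_b, n_boot=2000, seed=0):
    """Paired, stratified bootstrap of AUC_a - AUC_b with a 95% percentile CI."""
    rng = np.random.default_rng(seed)
    sig_a, noi_a = np.asarray(sig_a), np.asarray(noi_a)
    sig_b, noi_b = np.asarray(sig_b), np.asarray(noi_b)
    deltas = np.empty(n_boot)
    for k in range(n_boot):
        # Stratified: signal and noise cases are resampled separately.
        # Paired: the same resampled cases are used for both conditions.
        i = rng.integers(0, len(sig_a), len(sig_a))
        j = rng.integers(0, len(noi_a), len(noi_a))
        deltas[k] = (auc_mann_whitney(sig_a[i], noi_a[j])
                     - auc_mann_whitney(sig_b[i], noi_b[j]))
    low, high = np.percentile(deltas, [2.5, 97.5])
    return float(low), float(high)


# Synthetic scores: condition B sees the same cases with a weaker signal.
rng = np.random.default_rng(1)
noise_a = rng.normal(0.0, 1.0, 100)
signal_a = rng.normal(1.5, 1.0, 100)
noise_b = noise_a.copy()
signal_b = signal_a - 0.5

auc_a = auc_mann_whitney(signal_a, noise_a)
auc_b = auc_mann_whitney(signal_b, noise_b)
delta = auc_a - auc_b                                    # difference in AUC
lam = dprime_from_auc(auc_a) / dprime_from_auc(auc_b)    # ratio of d'
ci_low, ci_high = bootstrap_delta_auc(signal_a, noise_a, signal_b, noise_b)
print(f"delta={delta:.4f}, lambda={lam:.2f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

A positive Δ together with λ > 1 indicates that condition A outperforms condition B, which is how the comparisons between acquisition geometries are read in the Results.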

Experimental validation

To validate our VCTs, we acquired DBT images of a BR3D breast phantom (Mod. 020, CIRS Inc.). The BR3D phantom consists of six individual semi-elliptical slabs (semi-major axis = 180 mm, semi-minor axis = 100 mm). Each slab is 10 mm thick and composed of a heterogeneous 50–50% adipose-glandular mixture of tissue-equivalent material. The slab that contains an assortment of target lesions was positioned at the central slice of the BR3D phantom (slab #3).

Images of the BR3D phantom were acquired on two different Hologic DBT systems: Selenia Dimensions and 3Dimensions (Hologic Inc.). The Hologic DBT systems acquire 15 equal dose projections of the breast over a 15° angle using continuous X-ray tube motion. The Selenia Dimensions and 3Dimensions systems have detector element sizes of 140 µm (binned) and 70 µm, respectively. Images of the BR3D phantom were first acquired on the Selenia Dimensions system using the AEC mode. The AEC mode corresponded to an acquisition technique of 33 kV, 75 mAs, and W/Al target-filter combination for a thickness of 60 mm. These same technique settings were then used to image the BR3D phantom on the 3Dimensions system. The projections were reconstructed using voxel sizes of 35, 50, 70, 100, and 140 µm (Piccolo, version 5.02, real-time tomography).

The signal-to-noise ratio (\(SNR=\mu/\sigma\), where µ is the mean signal of calcifications and σ is the standard deviation of the surrounding background) was calculated for each calcification in the reconstructed slice where the calcifications are most in-focus. Single-calcifications were segmented using the Otsu method [30] within ROIs equal to the nominal size of each speck group [30]. The mean \(SNR\) of each calcification size was calculated by averaging the \(SNR\) of each individual calcification (95% CI is also shown). The \(SNR\) ratio between the two imaging systems was then calculated.
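
The SNR measurement can be sketched as below. This is a minimal NumPy illustration under stated assumptions rather than the analysis code used here: Otsu's threshold is implemented directly on a 256-bin histogram, µ is taken as the raw mean of the segmented pixels (no background subtraction, following the definition as written), and the ROI is synthetic.

```python
import numpy as np


def otsu_threshold(values, n_bins=256):
    """Return the threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                    # pixel count at or below each bin
    w1 = w0[-1] - w0                        # pixel count above each bin
    csum = np.cumsum(hist * centers)
    m0 = csum / np.maximum(w0, 1)           # mean of the lower class
    m1 = (csum[-1] - csum) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    # Use the upper bin edge so every pixel in the chosen bin stays in class 0.
    return edges[1:][np.argmax(between)]


def calcification_snr(roi):
    """SNR = mean of segmented calcification / std of remaining background."""
    mask = roi > otsu_threshold(roi.ravel())
    return roi[mask].mean() / roi[~mask].std(), mask


# Synthetic ROI: noisy background with a bright 4 x 4 pixel speck.
rng = np.random.default_rng(0)
roi = rng.normal(10.0, 1.0, (32, 32))
roi[14:18, 14:18] += 20.0

snr, mask = calcification_snr(roi)
print(int(mask.sum()), round(float(snr), 1))
```

The per-size mean SNR and the ratio between two systems then follow by averaging `snr` over the specks of each size and dividing the averages.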

Results

VCT analysis

The pooled ROC results indicate that reconstruction with 70 µm voxels is superior to 100 µm voxels (Fig. 2A). Comparing 70 to 100 µm using continuous motion, the difference in \(\mathrm{AUC}\), Δ = 0.0236 (0.0651), and the \(d'\) ratio, λ = 1.12 (1.33), show an improvement in calcification detection (p < 0.001) for del = 140 µm (70 µm). For the same comparison with step-and-shoot motion, Δ = 0.0558 (0.0647) and λ = 1.27 (1.29) also show an improvement in calcification detection (p < 0.001) for del = 140 µm (70 µm).

Fig. 2

Pooled ROC curves from MRMC study. Results are categorized by (A) reconstructed voxel size (Rec), (B) tube motion (Tube), and (C) detector element size (Del). In the plot legends, ‘C’ denotes continuous and ‘S’ denotes step-and-shoot tube motion

The pooled ROC results indicate that step-and-shoot is superior to continuous tube motion (Fig. 2B). Comparing step-and-shoot with continuous using a 100 µm reconstructed voxel, Δ = 0.0541 (0.0438) and λ = 1.24 (1.23) show an improvement in calcification detection (p < 0.001) for del = 140 µm (70 µm). For a similar comparison using a 70 µm reconstructed voxel size, Δ = 0.0863 (0.0434) and λ = 1.40 (1.19) also show an improvement in calcification detection (p < 0.001) for del = 140 µm (70 µm).

Finally, the pooled ROC results did not indicate that a detector element size of 70 µm is superior to 140 µm (Fig. 2C) for most test conditions. The only exception was for continuous tube motion in combination with a reconstructed voxel size of 70 µm (Δ = 0.0259 and λ = 1.08, p < 0.001).

We observed that calcification size affected lesion detectability. The performance is summarized by calcification size and blurring source conditions in Table 3. Note that a reconstructed voxel size of 70 µm and step-and-shoot motion show the greatest improvement in calcification detection for most acquisition conditions (Table 3A, B), especially for the smallest calcifications (8 and 27 voxels). Since we simulate small calcifications at the threshold of detectability, noise is dominant for a detector element size of 70 µm, substantially affecting the performance in calcification detection (Table 3C).

Table 3 Summary of MRMC analyses (%AUC with 14 virtual readers) and d’ calculations of each imaging design. Results are categorized by calcification size, (A) reconstructed voxel size, (B) tube motion, and (C) detector element size. p values (p) calculated using pair-wise comparison between pooled ROCs (DeLong method with bootstrap n = 2000)

Experimental analysis

The SNR calculated using DBT images of a BR3D phantom produced results that closely matched the computer simulations (Table 4). Reconstructing images with super-resolution can increase the SNR of calcifications. For 140 µm detector element size, images reconstructed with 35 µm to 100 µm voxel sizes (super-resolution) resulted in higher SNR for small specks. Large calcifications benefitted from larger reconstructed voxel size; this is consistent with signal detection theory in which objects larger than the blurring kernel do not lose contrast and benefit from noise blurring.

Table 4 Summary of the SNR of each calcification speck and ratio of SNR between (A) Selenia Dimensions and (B) 3Dimensions systems. The calcification size of 0.136 mm (signal) was not detectable by human observers for signal-to-noise ratio (SNR) calculations

Smaller detector element sizes did not improve calcification detection. In fact, smaller detector elements resulted in noisier images with high σ values (Fig. 3B), reducing the SNR (Table 4B). For 70 µm detector element size, images reconstructed at 100 µm resulted in the highest SNR. The combination of the 70 µm detector element with the smallest reconstructed sizes does not improve the SNR of the calcifications, likely because the detection of small calcifications is dose limited. Computer simulations also demonstrated that a smaller detector element size does not improve calcification detection (Table 4D, E). The benefit of larger detector elements is consistent between experiment and simulation (Table 4C, E; 1.34 ± 0.13 vs. 1.16 ± 0.11).

Fig. 3

Images of calcification specks acquired using (A) a Selenia Dimensions system (140 µm detector size, binned) and (B) a 3Dimensions system (70 µm detector size), reconstructed with (top) 35 µm and (bottom) 70 µm voxel sizes. Examples of speck sizes of 0.400, 0.230, and 0.165 mm are shown, respectively

Discussion and conclusions

This study shows that tomosynthesis acquisition geometry plays an important role in calcification detection, affecting the performance of virtual observers. Both source motion and reconstructed voxel size show significant changes in the performance of imaging systems. Overall, a small detector element size does not substantially affect the detection of calcifications, except when tomosynthesis is used in combination with an additional blurring source (e.g., continuous tube motion). The greatest impact of these parameters is observed in the detection of small calcifications at the threshold of detectability. The best geometries made use of step-and-shoot motion and small reconstructed voxel sizes, taking advantage of the super-resolution abilities of tomosynthesis. Experimental results showed similar effects of the acquisition geometry.

The limited angle and the limited number of projections in DBT produce an under-sampled dataset that may compromise the detection of clinically significant calcifications. Early DBT studies showed that some calcium-only lesions were less conspicuous when compared with conventional DM [31]. In a survey, 61% of radiologists responded that DBT did not show an improvement in calcification characterization due to the poor image quality of synthetic 2D mammograms in comparison to DM [32]. Calcifications without an associated mass or area of distortion may be the earliest sign of breast cancer, specifically ductal carcinoma in situ (DCIS). In addition, in denser breasts, invasive cancers may often be detected by the identification of suspicious calcifications and these cancers may often have a worse prognosis than some non-calcified cancers [33]. A study conducted with over a million women shows that DBT should be considered the new standard of care for breast cancer screening [34]; therefore, improving calcification detection with the optimization of imaging systems is necessary since DBT has already been shown to improve substantially the detection and characterization of lesions, especially for small lesions composed of soft tissue.

VCTs and other task-based methods have become increasingly complex over the last few years. In this study, we have shown that VCTs can predict with high confidence the physical properties of imaging systems to provide guidance on how to improve calcification detection. Validations of imaging systems currently rely on experimental evaluation of physical phantoms and small clinical datasets [24, 35]. Physical phantoms provide preliminary data to indicate changes in performance between systems but may under-sample the full range of patients and disease complexity. Our VCT results can predict performance similar to that of clinical systems [11]. Importantly, the US Food and Drug Administration (FDA) supports the use of VCTs to decrease human trial size and duration [15].

Reconstructed DBT images of a BR3D phantom were used to compare and validate our VCT results. The DBT projections were acquired using two commercially available systems and reconstructed using third-party software at 35, 50, 70, 100, and 140 µm voxel sizes. We observed that images reconstructed with super-resolution [36] can improve the \(SNR\) of calcification specks for projections acquired using 140 µm detector elements. We observed that a smaller detector element size (70 µm) did not substantially improve calcification detection; small detector elements will increase the influence of quantum noise at the detector, resulting in sharper but noisier images. It is important to mention that the DBT projections of the BR3D phantom were acquired using two clinical systems designed with continuous tube motion. The benefits of step-and-shoot should also be investigated experimentally.

Experimental validation was performed with technique factors matched to the automatic exposure settings of the Selenia Dimensions system. Radiation dose can affect the threshold of detectability of calcifications; Hadjipanteli et al. have demonstrated in a human reader experiment that the limit in calcification detection varied for DBT images acquired at different doses, namely 0.243 ± 0.014, 0.211 ± 0.011, and 0.204 ± 0.011 mm for 1.25, 2.5, and 5.0 mGy, respectively [5]. Because of this, in our study, the kV and exposure (mAs) were kept the same for both tomosynthesis systems to eliminate the effect of dose from our comparison, and to analyze only the effect of detector element size in calcification detection. We acknowledge that the systems have small differences and that detection may differ if both systems are operated with a phototimer.

The effect of angular range in DBT was not investigated in this work. Hadjipanteli et al. have shown that narrow-angle DBT improves the limit in calcification detection (0.211 ± 0.011 mm) as compared to wide-angle DBT (0.257 ± 0.015 mm) [5]. In this study, we wanted to evaluate the impact of super-resolution and the effect of detector element size, tube motion, and reconstructed voxel size in calcification detection; the effect of angular range in DBT could be explored in a separate work. Importantly, the single-calcification models, defined at the threshold of detectability in our study, are in the same size range reported by Hadjipanteli et al. (≈0.2 mm for narrow-angle DBT) [5]. However, some differences in \(d'\) are observed; we attribute these differences to the composition of the simulated calcifications.

While there was general agreement, the \(d'\) ratio in the computer simulations did not exactly match the change in \(SNR\) observed in the physical experiments as a function of reconstructed voxel size. There are several possible reasons for the observed differences, including differences in the shape and size of the calcifications, limits to the x-ray physics and detector models, and constraints in the computational resources available. Importantly, \(SNR\) differs from \(d'\): \(d'\) includes the effects of the observer’s eye filter and internal noise. That said, the metrics are intrinsically correlated, and the average results comparing the two detector element sizes are remarkably similar.

The breast phantom population in the VCT was constrained by the simulation parameters. The breast phantoms and calcifications used a voxel size of 100 µm. The computational time and GPU memory required to simulate breast phantoms, DBT projections, and reconstructions increase with smaller voxel sizes and larger breast volumes. Thus, we did not use smaller voxels based on our limits on simulation time and computer memory. In addition, the calcifications were only inserted at the central plane (CC view) of phantoms to improve accuracy in estimating the precise location of the calcifications in the reconstructed images, avoiding rounding errors.

Although patient motion is also considered a blurring source, it was not simulated in this study. Patient motion affects lesion detectability and is more apparent in images acquired using step-and-shoot motion due to the longer scan time. Future work should investigate whether the benefits of step-and-shoot motion in calcification detection could be outweighed by the blurring effects of patient motion; preliminary data have shown that patient motion affects image quality more than tube motion [37]. Conducting such VCTs would require an accurate model of patient motion.

The field will benefit from the adoption of standards in VCTs, since clinical trials are limited by cost and duration. Clinical trials of DBT involve ionizing radiation that requires repeated irradiation of volunteers, which may be practically unfeasible today. In particular, screening populations do not have a high incidence of cancer; therefore, many cancer-free women are irradiated. In DBT, the incorporation of personalized acquisition geometries and reconstruction methods will allow better detection and characterization of breast findings.

In summary, calcification detection in DBT is degraded due to blurring from reconstructed voxel size, detector element size, and tube motion. The blurring is more evident in small calcifications. The impact on calcification detection ranked from most to least significant is (i) reconstruction voxel size, (ii) tube motion during acquisition, and (iii) detector element size. Acquisition geometries that use smaller reconstructed voxels and step-and-shoot tube motion have the potential to further improve the detection of calcifications.