Introduction

Digital breast tomosynthesis (DBT), a pseudo 3D imaging technique, is being integrated into clinical use as improved lesion visibility [1], reduced recall rates [1, 2], increased cancer detection [2] and diagnostic accuracy [3], and improved patient comfort [4] have been noted. Most of these results have come from breast cancer screening; however, use of DBT in the assessment clinic is still relatively new with little research performed to date [5]. In the assessment clinic, women are examined to evaluate potential abnormalities detected at screening [5]. The aim of the assessment clinic is to confirm the presence of malignancy and refer the patient to treatment or to reject the screen-detected findings and return the woman to routine screening.

DBT has previously been compared with digital mammography (DM) for its use in the assessment clinic [6,7,8] in combination with 2D mammography imaging (DM or synthesized mammography) as per FDA guidelines [9]. The results from these studies have largely been favorable, with an increase in the sensitivity [6,7,8], area under the receiver-operating characteristic curve (AUC) [6, 8], and jackknife free-response receiver-operating characteristic figure of merit [7, 8].

The workflow in the assessment clinic is very complex [5], as it not only requires evaluation of various images but may also involve biopsy. Diagnostic accuracy is critical in assessment; therefore, the need for superior lesion visibility and improved lesion characterization is paramount. Limited lesion visibility in standard mammography [DM and Spot-View Mammogram (SVM)], due to the tissue ‘overlap effect’ [4], has been deemed a challenge in assessment [5]. In the assessment clinic, ultrasound is used only to a limited extent to supplement the mammographic workup because of its high false-positive rate, and it is performed only on recommendation from radiologists [10]. Notably, with DM, 50% of the ultrasounds and 21% of the biopsies performed during assessment for confirmation of suspected malignancy are unnecessary [11].

Effective use of DBT in the assessment clinic requires DBT to outperform the workup views. Few studies [12,13,14] have evaluated DBT (in combined mode with DM) against workup views, and their results have been inconclusive: one study [14] showed statistically significant improvements, whereas others [12, 13] did not. Because of this discrepancy in the literature, we evaluated the efficacy of DBT in an Australian breast cancer assessment clinic in our trial, the Tomosynthesis Assessment Clinic Trial (TACT).

In TACT, conducted within the Australian National BreastScreen program, we evaluated the efficacy of DBT in breast cancer diagnosis and compared it with the standard DM workup. We evaluated whether DBT in combination with DM images from the screening examination would provide increased diagnostic accuracy, improve radiologists’ performance, reduce the number of additional images and biopsies required, and improve patient care by simplifying the assessment workflow.

Materials and methods

This study was approved by the Australian National Ethics Committee and supported by the Northern Sydney Local Health District, Northern Sydney Central Coast BreastScreen New South Wales (NSW), and the Cancer Institute NSW.

TACT compared the diagnostic accuracy of DBT with DM Workup Views (DMW) when used to evaluate an abnormality previously detected on standard two-view screening mammograms. Data were collected between 16 October 2014 and 19 April 2016 in the Northern Sydney NSW BreastScreen assessment clinic.

Image acquisition

As per routine assessment clinic protocol, the DMW of the recalled lesion were obtained. These consisted of three standard workup views, namely the mediolateral (ML) standard workup view and two-view spot magnification [craniocaudal (CC) and mediolateral oblique (MLO), with the exception of calcifications, for which the spot views were CC and ML]. In addition, all participating patients had two-view (CC and ML) DBT. The ML projection was chosen for DBT to correlate more closely with the ML view of the standard workup, although it resulted in less tissue being included in the acquisition. DBT images were obtained at an acquisition angle of 15° and displayed with a 1-mm slice separation. All images for each patient were acquired in one session by the same radiographer using a comparable technique on a Hologic Selenia Dimensions mammography system (Marlborough, MA, USA).

Readers

Participating radiologists (n = 15) were of varying experience levels in reading DM and DBT; they had 4–27 (mean 16) years of experience in the BreastScreen Program and read 1599–13,166 (mean 4810) screening mammograms per year (Table 1).

Table 1 Radiologists’ performance measured using receiver-operating characteristic (ROC) area under the curve (AUC). Each radiologist’s experience level is also listed

Study design and participants

In this balanced split-plot multireader, multicase (MRMC) study, 144 cases (48 cancer cases) were read by the participating radiologists in 3 non-overlapping blocks (as shown in Fig. 1). Each block thus contained 48 cases (16 cancer cases) and 5 radiologists. Each case was read by exactly 5 radiologists, and radiologists in each block read all 48 cases of their respective block using both DBT and DMW in separate reading sessions. Each radiologist’s reading sequence was randomized and scheduled such that a case in DBT and DMW was read at least a month apart to reduce any memory bias. The reading sessions were short (about 10 cases) and were a mix of cases in DBT or DMW, as specified by the radiologist’s randomized reading sequence.
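The block structure described above can be illustrated with a minimal sketch. It assumes hypothetical case and reader identifiers and a simple shuffle-and-deal allocation; the allocation procedure actually used in TACT is described in Mall et al. [15], and the scheduling of DBT and DMW readings at least a month apart is not modelled here.

```python
import random

# Design constants taken from the study description.
N_CASES, N_CANCER, N_READERS, N_BLOCKS = 144, 48, 15, 3


def build_split_plot(seed=0):
    """Deal cases and readers into non-overlapping blocks.

    Identifiers are hypothetical placeholders, not study data.
    """
    rng = random.Random(seed)
    cancer = [f"cancer_{i:02d}" for i in range(N_CANCER)]
    non_cancer = [f"noncancer_{i:02d}" for i in range(N_CASES - N_CANCER)]
    readers = [f"reader_{i:02d}" for i in range(N_READERS)]
    for group in (cancer, non_cancer, readers):
        rng.shuffle(group)
    # Every third item goes to the same block, so each block is balanced.
    return [
        {
            "cases": cancer[b::N_BLOCKS] + non_cancer[b::N_BLOCKS],
            "readers": readers[b::N_BLOCKS],
        }
        for b in range(N_BLOCKS)
    ]


def all_readings(blocks, modalities=("DBT", "DMW")):
    """List every (reader, case, modality) reading implied by the design."""
    return [
        (reader, case, modality)
        for block in blocks
        for reader in block["readers"]
        for case in block["cases"]
        for modality in modalities
    ]


blocks = build_split_plot()
print(len(all_readings(blocks)))  # 15 readers x 48 cases x 2 modalities
```

In this sketch each block holds 48 cases (16 cancer) and 5 readers, so every case is read by exactly 5 radiologists in each modality.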

Fig. 1 Pictorial representation of the workflow of the study design used in this study

The number of readers, cases (cancer and non-cancer ratio), and blocks used in TACT were calculated such that TACT would achieve the same variance (i.e., standard error) as in Alakhras et al. [8]. The stepwise process of calculations and comparison of these combinations of readers and cases are detailed in Mall et al. [15]. The details of the split-plot reader study are given in Obuchowski et al. and Gallas et al. [16, 17].

Ground truth

Patients participating in TACT had their routine assessment conducted alongside this study. The final outcome of routine assessment was used to establish the ground truth, which incorporated all 2D mammographic findings, ultrasound, clinical examinations, and biopsy if necessary. This also involved follow-up and screening surveillance wherever necessary.

Case selection criteria

All women aged 40 years and over who attended the single-site assessment clinic during the trial period were offered participation in the trial. From those patients who provided written consent, 144 were selected to participate.

The composition of normal, benign, and cancer cases in the study was equal (i.e., 48 of each). The selection of cancer cases was on a first-in basis (prospective), and once cancer cases were obtained, they were randomly assigned to blocks. A randomized selection of recent normal and benign cases, taken from groups matched for breast density with the cancer cases, was retrospectively harvested. The normal and benign cases were obtained from the immediately preceding 3 months to ensure timely review of the images in the unlikely event of the discovery of an unexpected significant finding on retrospective review. In total, there were 54% non-dense (i.e., breast density categories A and B) and 46% dense (i.e., categories C and D) cases. Radiologists were unaware of the mix of cancer, normal, and benign cases and were unaware of the breast density matching.

Except for three cases where two recall lesions were present, all cases in this study had only one lesion. Radiologists were provided with the reason for recall and were asked to evaluate the effectiveness of the images provided in evaluating the recall lesion as well as to report any additional findings.

Retrospective study workflow

DMW or DBT images were read alongside two-view (MLO and CC) DM screening images (hanging protocol as shown in Fig. 2). No radiologist reviewed any cases in which they had prior clinical involvement. If a prior round of comparison mammograms was available at the time of screen reading, it was also included in the study mammogram hangings. No clinical information was provided to the radiologists other than the reason for recall. Radiologists were asked to grade lesion severity, lesion conspicuity, and confidence in their assessment on 5-point scales, with 5 being the highest (malignancy on the severity scale, most conspicuous on the conspicuity scale, and 80%–100% confidence in the decision on the confidence scale). Last, radiologists were asked whether additional mammogram images were required, whether ultrasound was recommended, and whether biopsy was necessary. These questions are detailed in Fig. 3.

Fig. 2 Details of the hanging protocol used in this study for both the DM workup view and DBT assessments

Fig. 3 Pictorial representations of questions asked of the radiologists during their assessment of the cases

To address the review timeframe for non-treatment cases, at least two radiologists in each block read the cases within 3 months of routine clinical assessment. Only one patient, from the DBT group, required a repeat visit to the assessment clinic because of the retrospective review: the repeat assessment was normal, and the subsequent screening examination was also normal.

Data analysis

Receiver-operating characteristic (ROC) area under the curve (AUC) was used to analyze observer performance. Two-sided hypothesis testing was conducted to determine whether use of DBT is any different from DMW (H0: DBT = DMW). Diagnostic accuracy measures, namely sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated and reported following the Standards for Reporting of Diagnostic Accuracy (STARD). In addition, the prevalence rates [true (the proportion of the sample population that is positive) and apparent (the proportion of the sample population that is positive according to the diagnostic method)] were calculated. To understand the effect of using DBT in diagnosing the primary lesion and all lesions (that is, including additional findings), we conducted the STARD and ROC-AUC analyses twice: (1) focused on the primary lesion and (2) based on all lesions that radiologists identified. For the former, we also examined the effect of DBT on specific lesion types (mass, calcification, architectural distortion, non-specific density, and stellate lesions). For some of these lesion-specific analyses, the results may be limited because of the low number of cases available for that type.
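The accuracy measures named above follow directly from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name and the example counts are illustrative assumptions and are not data from this study.

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Standard accuracy measures from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive and one
    negative case, and at least one positive and one negative decision).
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Proportion of the sample that truly has cancer.
        "true_prevalence": (tp + fn) / total,
        # Proportion of the sample that the reading called positive.
        "apparent_prevalence": (tp + fp) / total,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
for name, value in diagnostic_accuracy(tp=45, fp=24, tn=72, fn=3).items():
    print(f"{name}: {value:.3f}")
```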

The radiologist decision was considered positive (implying presence of cancer) if the assigned grade for lesion severity was higher than 2 (1 = normal and 2 = benign, as shown in Fig. 3); otherwise, it was considered negative (implying absence of cancer). Based on whether the lesion in truth was cancerous or not, the radiologist decision was classified as true positive (TP)/false positive (FP)/true negative (TN)/false negative (FN). This resulted in benign lesions that were reported by the radiologists being classified as true negatives. This was done to accommodate the binary nature of ROC analysis, where only a binary decision (cancer/not-cancer) can be analyzed.
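The dichotomization rule described above can be written as a short function. This is a minimal sketch; the function and parameter names are illustrative and do not come from the study's analysis code.

```python
def classify_decision(severity_grade, is_cancer, threshold=2):
    """Map a 5-point severity grade and the ground truth to TP/FP/TN/FN.

    Grades above the threshold (1 = normal, 2 = benign) count as a
    positive call, so a lesion graded benign is treated as negative.
    """
    if not 1 <= severity_grade <= 5:
        raise ValueError("severity grade must be between 1 and 5")
    positive_call = severity_grade > threshold
    if positive_call:
        return "TP" if is_cancer else "FP"
    return "FN" if is_cancer else "TN"


# A benign grade on a non-cancer case counts as a true negative.
print(classify_decision(2, is_cancer=False))
```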

Mann-Whitney U tests were used to compare radiologists’ grading for lesion severity, conspicuity, and confidence on decisions between DBT and DMW. Chi-square tests (χ2) were used to compare the effect of DBT on need for additional image requirements and need for biopsy.
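For a 2 × 2 table, such as recommended versus not recommended by modality, the chi-square statistic and its p value can be computed as sketched below. The sketch assumes Pearson's test without continuity correction (the text does not state whether a correction was applied), and the counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    No continuity correction is applied. With 1 degree of freedom the
    p value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are DBT and DMW readings (720 each),
# columns are "additional views requested" and "not requested".
chi2, p_value = chi_square_2x2(200, 520, 280, 440)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```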

Most of these analyses were performed in R, while the ROC AUC analysis was performed with the iMRMC software [18]. The significance level for all analyses was set to 0.05.
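As an illustration of the quantity being estimated, the empirical AUC for a single reader can be computed from ordinal severity grades as the proportion of cancer/non-cancer pairs that are ranked correctly, with ties counted as one half. The sketch below shows only this point estimate, using hypothetical grades; the variance estimation for the split-plot MRMC design is not reproduced here.

```python
def empirical_auc(cancer_grades, non_cancer_grades):
    """Empirical AUC: fraction of cancer/non-cancer pairs ranked correctly.

    Ties contribute one half, which makes this equivalent to the
    Mann-Whitney U statistic divided by the number of pairs.
    """
    score = 0.0
    for pos in cancer_grades:
        for neg in non_cancer_grades:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5
    return score / (len(cancer_grades) * len(non_cancer_grades))


# Hypothetical 5-point severity grades for one reader.
print(empirical_auc([5, 4, 4, 3], [1, 2, 2, 3, 4]))  # 0.875
```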

Results

Each radiologist assessed 48 cases, examining the primary lesion while also reporting any additional lesions found. Thirty unique additional lesions (6 only in DBT, 14 only in DM, and 10 in both DBT and DM) were identified by a few radiologists; these were reported a total of 51 times in 31 cases: 23 times [16 FP, 7 TN (marked as benign)] in DBT and 28 times (19 FP, 9 TN) in DMW. Most radiologists identified at most one additional lesion, and these did not require further assessment.

Except for one radiologist, all radiologists (93.3%) performed better or comparably using DBT than DMW (Table 1, Figs. 4a, b, and 5). In diagnosing the primary lesion, 86.7% of radiologists performed better with DBT than DMW. On average, AUC for DBT was 0.06 higher than that for DMW (maximum difference 0.13) (Table 1).

Fig. 4 a Projection of the radiologists’ performance measured by the area under the curve for primary lesions only. b Projection of the radiologists’ performance measured by the area under the curve for all lesions (including primary and secondary lesions)

Fig. 5 Receiver-operating characteristic area under the curve plot for DBT and DM workup view

Both the sensitivity and specificity of DBT [primary lesion (0.94, 0.77) and all lesions (0.93, 0.75)] were higher than those of DMW [primary lesion (0.91, 0.57) and all lesions (0.90, 0.56)] (Table 2). The maximum difference in sensitivity was observed for calcifications (DBT-DMW = 0.07) and non-specific density (0.06). As for specificity, the maximum difference was observed for architectural distortions (greater on DBT by 0.20). The PPV and NPV of DBT were also consistently higher than those of DMW for all analyses (Table 2). Overall, the PPV and NPV of DBT (0.64, 0.96) were higher than those of DMW (0.49, 0.92) by 31% and 4%, respectively, in relative terms.
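The 31% and 4% figures are relative differences, that is, the difference between the DBT and DMW values divided by the DMW value. A minimal check of this arithmetic, using the rounded PPV and NPV values quoted above (the function name is illustrative):

```python
def relative_increase(new_value, baseline):
    """Percentage by which new_value exceeds baseline."""
    return (new_value - baseline) / baseline * 100.0


# Rounded overall PPV and NPV values quoted in the text.
ppv_gain = relative_increase(0.64, 0.49)
npv_gain = relative_increase(0.96, 0.92)
print(f"PPV: {ppv_gain:.1f}%, NPV: {npv_gain:.1f}%")
```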

Table 2 Results of diagnostic accuracy measured using Standards for Reporting of Diagnostic Accuracy (STARD)

DBT was different from DMW in diagnosing both the primary lesion (z = 2.62, p = 0.006) and all lesions (z = 2.74, p = 0.008). For both of these scenarios, the AUCs for DBT (primary lesion 0.935, all lesions 0.927) were higher than for DMW (0.876, 0.872).

Radiologists were more confident about their decisions using DBT than DMW (U = 297,990, p < 0.001) (Table 3). Interestingly, non-cancer lesions appeared less severe (U = 89,331, p < 0.001) on DBT; however, cancerous lesions appeared more severe (U = 33,172, p = 0.02) on DBT. Moreover, cancerous lesions were more easily seen (U = 24,207, p = 0.02) on DBT than DMW.

Table 3 Estimated lesion severity, conspicuity, and radiologists’ confidence in the decisions. Also described are the distributions of these values for cancer and non-cancer cases. Bold values indicate statistical significance

Additional view requirements were reduced using DBT (χ2 = 17.63, p < 0.001) (Table 4). The need for additional views was reduced for both the dense (χ2 = 7.73, p = 0.005) and non-dense (χ2 = 9.38, p = 0.002) breast groups. The most common images that radiologists said would have been helpful when using DMW were (1) another CC view (n = 77), (2) DBT (63), (3) another MLO view (54), and (4) magnification views (14). For DBT, these were (1) magnification views (n = 65), (2) another CC view (55), (3) another MLO view (35), (4) a spot view (38), and (5) DBT MLO (19).

Table 4 Number of times additional views, ultrasound, and biopsy were recommended. Also described are the distributions of such recommendations for dense and non-dense breast groups

Ultrasound recommendations were also reduced using DBT (χ2 = 8.56, p = 0.003); this difference, however, was not significant for non-dense breasts (χ2 = 1.98, p = 0.16) (Table 4). Lesion characterization (366 DBT, 448 DMW) and reassurance (383 DBT, 374 DMW) were the two leading reasons for ultrasound recommendation with the third being biopsy (244 DBT, 244 DMW).

Use of DBT led to an overall reduction in biopsy recommendation; however, a slight (not statistically significant) increase in recommendations for biopsy specifically in dense breasts was also noted. There was a 36.2% reduction in biopsy recommendations for dense breasts for false decisions (FP and FN) with DBT (36 FP, 1 FN; 37 in total) compared with DMW (57 FP, 1 FN; 58 in total). False decisions where biopsy was not recommended were (23 FP, 1 FN) for DBT and (52 FP, 12 FN) for DMW. Most of the increase in biopsy recommendations for the dense breasts group was due to true-positive decisions.

Overall, the percentage of reduction in recommendations using DBT (compared with DMW) was: (1) 6.3% for biopsy, (2) 35.6% for additional views, and (3) 5.3% for ultrasound.

Discussion

Based on our results, 93.3% of radiologists performed better (or comparably) when using DBT compared with DMW; this is much higher than the 75% reported by an earlier study of diagnostic accuracy that used selected cases [19]. Radiologists with below-average experience (less than 16 years) performed better [AUC (DBT-DM) = 0.062] with DBT compared with their more experienced peers [AUC (DBT-DM) = 0.054]. This is similar to the TOMMY trial results, which noted a greater increase in sensitivity in less experienced radiologists than in their more experienced peers [20].

DBT offers increased sensitivity, specificity, and predictive value (both positive and negative) in diagnosing lesions (both primary and additional) in the assessment clinic. Our results extend previous studies that have separately compared DBT with DM workup view/SVM [12,13,14, 19, 21,22,23] and DM [6, 8, 24].

Increased radiologist confidence in DBT has been previously reported [25]. We have shown that there is a statistically significant increase in radiologist confidence in diagnosis. Our observation that cancerous lesions appear statistically significantly more severe, and more conspicuous, suggests that detecting cancerous lesions may be easier using DBT as opposed to DMW. Our result of improved diagnostic accuracy with DBT supports this assertion.

Undergoing assessment examinations is emotionally distressing [26, 27]. The more procedures women undergo, the more anxious they are likely to become. For this reason, any reduction of special procedures, such as biopsy or ultrasound, without compromising patient care may improve the assessment experience. Ultrasound is generally performed in the assessment clinic on all patients, but particularly those who require biopsy. Our results suggest that DBT can significantly reduce the need for additional images (additional views and ultrasound) without reducing radiologists’ confidence. Notably, the most common reason radiologists cited for ultrasound recommendation with DBT was reassurance, which could be related to reduced experience with DBT or to the 6.3% reduction in biopsy recommendations for benign lesions, which is also a promising benefit of using DBT in assessment. Finally, in agreement with previous reports of improved lesion characterization with DBT, we noted a 35.6% reduction in additional views required with DBT [28]. Our results are in agreement with the 11% reduction in additional view requirements reported by Heywang-Kobrunner et al. [21].

Mhuircheartaigh et al. [23] indicated that SVM may be rendered obsolete by the introduction of DBT in the assessment clinic: DBT was better in all cases except one, in which a technical error (poor breast positioning) occurred. However, Philpotts et al. [29] noted only up to a 57% reduction in SVM use, while an 89% reduction was found in the study by Heywang-Kobrunner et al. [21].

The number of additional lesions detected with DBT that required further evaluation was low and slightly smaller than the number detected with DMW. It is interesting that an excessive number of additional lesions were not detected on DBT. None of the additional lesions detected required biopsy.

We have shown that DBT has the potential for successful use in the assessment clinics in Australia and that DBT (in combination with DM images from the screening examination) provides increased diagnostic accuracy, improves radiologists’ performance and confidence, reduces the number of additional images and biopsies required, and even improves patient care by simplifying the assessment workflow.

Limitations

This study had several limitations. A relatively small number of cases, 144 in total, were used, which could impact the generalization of the results. Moreover, cases were matched for breast density between cancer and non-cancer cases to ensure that the greater diagnostic challenges associated with breast density applied equally to cancer and non-cancer cases. It is possible that this case selection, although randomized and with an even density distribution, introduced its own selection bias.

Recalls for more than one lesion were largely excluded from the study to reduce complexity of data recording, and it is possible that this resulted in a selection bias.

Conclusion

DBT has the potential to improve healthcare by increasing diagnostic accuracy and simplifying the workup in the breast cancer assessment clinic, thereby improving the assessment experience for both patients and radiologists.