Abstract
Objectives
To compare interobserver variability (IOV), reader confidence, and sensitivity/specificity in detecting architectural distortion (AD) on digital mammography (DM) versus digital breast tomosynthesis (DBT).
Methods
This IRB-approved, HIPAA-compliant reader study used a counterbalanced experimental design. We searched radiology reports for AD on screening mammograms from 5 March 2012–27 November 2013. Cases were consensus-reviewed. Controls were selected from demographically matched non-AD examinations. Two radiologists and two fellows blinded to outcomes independently reviewed images from two patient groups in two sessions. Readers recorded presence/absence of AD and confidence level. Agreement and differences in confidence and sensitivity/specificity between DBT versus DM and attendings versus fellows were examined using weighted Kappa and generalised mixed modeling, respectively.
Results
There were 59 AD patients and 59 controls for 1,888 observations (59 × 2 (cases and controls) × 2 breasts × 2 imaging techniques × 4 readers). For all readers, agreement improved with DBT versus DM (0.61 vs. 0.37). Confidence was higher with DBT, p = .001. DBT achieved higher sensitivity (.59 vs. .32), p < .001; specificity remained high (>.90). DBT achieved higher positive likelihood ratio values, smaller negative likelihood ratio values, and larger ROC values.
Conclusions
DBT decreases IOV, increases confidence, and improves sensitivity while maintaining high specificity in detecting AD.
Key points
• Digital breast tomosynthesis decreases interobserver variability in the detection of architectural distortion.
• Digital breast tomosynthesis increases reader confidence in the detection of architectural distortion.
• Digital breast tomosynthesis improves sensitivity in the detection of architectural distortion.
Introduction
Architectural distortion (AD) is a subtle mammographic finding that can be a manifestation of breast cancer. While AD can be due to a variety of malignant and non-malignant causes [1], the positive predictive value for malignancy is approximately 75% [2]. AD may be the earliest manifestation of breast cancer [3] and is the most commonly missed abnormality on false-negative mammograms [4]. Earlier detection of AD may improve patient prognosis more than earlier detection of calcifications [3].
Compared to digital mammography (DM), digital breast tomosynthesis (DBT) is a newer imaging technology that displays thin slices of breast tissue, thus mitigating the effects of tissue overlap. DBT has been shown to increase cancer detection rates and decrease screening call-back rates [5,6,7,8,9]. In addition, use of screening DBT allows more recalled patients to undergo ultrasound alone [10], thus potentially improving diagnostic workflow efficiency for patients recalled from screening. The risk of malignancy in abnormalities detected only by DBT is significant [11], and has been reported at nearly 50% [12]. DBT is particularly helpful in detecting abnormalities in patients with dense breasts [13, 14].
Minimising disagreement in difficult cases is the best way to reduce inconsistencies in screening mammogram interpretation [15]. AD has high interobserver variability (IOV) [16], and while sensitivity for AD is lower than for non-AD manifestations of breast cancer [17], DBT improves detection of AD [11]. The purpose of this study was to compare IOV, reader confidence and sensitivity/specificity in detecting AD on DM versus DBT.
Materials and methods
Study design
This reader study, approved by the Institutional Review Board and compliant with the United States Health Insurance Portability and Accountability Act, used a counterbalanced experimental design to estimate the effects of DBT relative to DM regarding sensitivity/specificity, IOV (or reader agreement) and reader confidence in detecting AD. Informed consent was waived.
Selection of patient images
The radiology database at a tertiary breast centre was searched using the PenRad Management Information System (PenRad Technologies Inc., Buffalo, MN, USA) for all reports containing the words ‘architectural distortion’ or ‘possible architectural distortion’ on screening mammograms performed from 5 March 2012 to 27 November 2013. Unilateral examinations were excluded to reduce the chance of a true-positive call by guessing and to allow each breast to act as a within-subject control for the other. All studies consisted of standard two-view full-field DM images and DBT reconstructions in both the mediolateral oblique and craniocaudal projections. Both DM and DBT images were obtained on 3D units (Selenia Dimensions, Hologic, Marlborough, MA, USA) in a single compression episode for each projection. All patients had both acquired DM and DBT images (not synthetic two-dimensional views reconstructed from DBT data). DBT images were acquired with the x-ray tube moving over a 15° arc and reconstructed into 1-mm sections. Three radiologists (one breast imaging fellow and two fellowship-trained breast imagers) consensus-reviewed the screening DM and DBT images from all reports containing the words AD or possible AD on a 5-megapixel liquid crystal display (LCD) diagnostic workstation (SecurView, Hologic, Marlborough, MA, USA) to confirm the presence of AD or possible AD (Fig. 1). At the time of the consensus review, the two breast imagers had 6 and 16 years of breast imaging experience in practice, respectively. This consensus review took place 3 years before the current study. Our institution began using reconstructed two-dimensional images after the consensus review and before this study was performed; to maintain the integrity of the methods for the current experimental study, we did not collect additional AD cases imaged with reconstructed two-dimensional views. In addition, the 3-year interval allowed for memory decay and retrograde interference, reducing the likelihood that readers would recall individual cases.
Control cases were identified through searches of our institution’s PACS (GE Healthcare; Chicago, IL, USA) and with MONTAGE™ Search and Analytics software (Montage Healthcare Solutions, Philadelphia, PA, USA). Controls were matched for age, breast density (as described in the screening mammogram report by the reading radiologist), presence and side of prior malignancy, presence and side of new malignancy on the presented mammogram, presence and side of prior surgery, and, when possible, date of mammogram. Matching for breast cancer history may have increased the breast cancer risk profile of the study sample in both the AD and non-AD groups relative to the general screening population, but it controlled for the possibility that either group would be more complex or at higher risk than its counterpart. Cases and controls were matched 1:1 (one control patient per AD patient), and all cases and controls were bilateral examinations. Patient demographics, imaging findings, pathology findings and follow-up imaging results were obtained from the electronic medical record and recorded. Imaging and clinical follow-up through May 2016 was reviewed in the electronic medical record; thus, all cases had at least 2 years of follow-up available.
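The control-matching step can be illustrated with a minimal sketch. It assumes exact agreement on the categorical matching variables and nearest-age matching within a tolerance; the record fields, the `match_controls` helper and the 2-year `max_age_gap` are hypothetical, since the text does not say how closely age was matched or how ties were broken. Date of mammogram, which was matched only when possible, is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    pid: str
    age: int
    density: str                          # density category from the screening report
    prior_malignancy_side: Optional[str]  # "L", "R" or None
    new_malignancy_side: Optional[str]    # malignancy on the presented mammogram
    prior_surgery_side: Optional[str]


def categorical_key(p: Patient) -> tuple:
    """Variables that must agree exactly between a case and its control."""
    return (p.density, p.prior_malignancy_side,
            p.new_malignancy_side, p.prior_surgery_side)


def match_controls(cases: List[Patient], pool: List[Patient],
                   max_age_gap: int = 2) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching without replacement.

    Requires exact agreement on the categorical variables, then picks the
    closest age within the (hypothetical) max_age_gap tolerance. Cases with
    no eligible control map to None and would need manual review.
    """
    available = list(pool)
    pairs: Dict[str, Optional[str]] = {}
    for case in cases:
        candidates = [c for c in available
                      if categorical_key(c) == categorical_key(case)
                      and abs(c.age - case.age) <= max_age_gap]
        if not candidates:
            pairs[case.pid] = None
            continue
        best = min(candidates, key=lambda c: abs(c.age - case.age))
        available.remove(best)  # each control is used only once
        pairs[case.pid] = best.pid
    return pairs
```

A greedy pass like this is order-dependent; an optimal assignment would need a different algorithm, and the study does not say whether any automated matching was used.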
Review of images
Two breast radiologists (9 and 19 years of experience, respectively) and two breast imaging fellows in the second half of their 1-year breast imaging fellowships, all blinded to patient information, prior imaging and outcomes, independently reviewed the DBT and DM images from screening mammograms acquired with the combined technique in two separate reading sessions. In the first session, readers reviewed only the DBT images for half of the cases (1–59) and only the DM images for the other half (60–118). In the second session (at least 1 month later), readers reviewed only the DM images for the cases whose DBT images had been evaluated in session 1 (1–59) and only the DBT images for the cases whose DM images had been evaluated in session 1 (60–118). The sequence of AD and non-AD patients was randomly assigned and then held constant across sessions 1 and 2. This counterbalanced experimental design held patient images and readers constant while manipulating only the imaging technique (DBT vs. DM); therefore, observed differences in readers’ performance can be attributed to the direct effect of imaging technique.
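The counterbalanced schedule described above can be sketched as follows, assuming a simple seeded shuffle for the random AD/non-AD order (the study reports a block design, which this sketch does not reproduce). The function name and the seed are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple


def counterbalanced_sessions(patient_ids: Sequence[str], seed: int = 0
                             ) -> Dict[int, List[Tuple[int, str, str]]]:
    """Return {session: [(position, patient_id, technique), ...]}.

    One random AD/non-AD order is drawn and reused in both sessions.
    Positions in the first half are read with DBT in session 1 and DM in
    session 2; positions in the second half get the reverse.
    """
    order = list(patient_ids)
    random.Random(seed).shuffle(order)  # simple shuffle, not a block design
    half = len(order) // 2
    sessions: Dict[int, List[Tuple[int, str, str]]] = {1: [], 2: []}
    for position, pid in enumerate(order, start=1):
        first = "DBT" if position <= half else "DM"
        second = "DM" if first == "DBT" else "DBT"
        sessions[1].append((position, pid, first))
        sessions[2].append((position, pid, second))
    return sessions
```

With 118 patients, two breasts and two techniques, each reader contributes 118 × 2 × 2 = 472 per-breast observations, or 1,888 across the four readers, matching the totals reported in the Results.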
Measures
For each breast, readers recorded the presence or absence of AD or possible AD (i.e., Yes/No), the location in the breast using clock face position, and their confidence in that interpretation on a scale of 0–4 (i.e., How confident? 0 ‘Not at all’, 1 ‘Somewhat’, 2 ‘Confident’, 3 ‘Very confident’, 4 ‘Almost sure’). AD or possible AD that was due to post-surgical change and clearly identified as such by the reader was not coded as AD or possible AD. Only unexplained AD or unexplained possible AD was coded as AD or possible AD for the purposes of this study in order to reproduce the clinical setting in which architectural distortion that is clearly due to prior surgery is not clinically significant. The studies were interpreted on a 5 megapixel LCD diagnostic workstation (SecurView, Hologic).
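A hypothetical per-breast record makes the coding rule concrete: a distortion the reader clearly attributes to prior surgery is not counted as AD. The class, its field names and the breast-level `sensitivity_specificity` helper are illustrative assumptions; in particular, the sketch ignores whether the reader's clock-face location matched the reference lesion, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The 0-4 confidence scale used by the readers.
CONFIDENCE_SCALE = {0: "Not at all", 1: "Somewhat", 2: "Confident",
                    3: "Very confident", 4: "Almost sure"}


@dataclass
class BreastRead:
    # Hypothetical per-breast reading record.
    distortion_seen: bool          # AD or possible AD noted by the reader
    explained_by_surgery: bool     # reader clearly attributed it to post-surgical change
    clock_position: Optional[int]  # 1-12, None when there is no finding
    confidence: int                # 0-4 on the scale above

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_SCALE:
            raise ValueError("confidence must be an integer from 0 to 4")

    @property
    def coded_ad(self) -> bool:
        """Only unexplained AD or possible AD counts as a positive call."""
        return self.distortion_seen and not self.explained_by_surgery


def sensitivity_specificity(calls: List[bool], truth: List[bool]) -> Tuple[float, float]:
    """Per-breast sensitivity and specificity against the reference standard."""
    tp = sum(c and t for c, t in zip(calls, truth))
    fn = sum((not c) and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    fp = sum(c and (not t) for c, t in zip(calls, truth))
    return tp / (tp + fn), tn / (tn + fp)
```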
Statistical methods
All analyses were conducted using SAS Software 9.4 (SAS Institute Inc., Cary, NC, USA). Agreement was examined using weighted Kappa. Differences in confidence between DBT versus DM and between attending radiologists versus fellows were examined using generalised mixed modeling assuming a binomial distribution. Differences in sensitivity/specificity between DBT versus DM and between attending radiologists versus fellows were examined using generalised mixed modeling assuming a binary distribution. Modeling was accomplished with sandwich estimation using PROC GLIMMIX, with images nested within patients and patients nested within radiologists. Positive likelihood ratios were calculated as sensitivity/(1 − specificity) and negative likelihood ratios as (1 − sensitivity)/specificity. Areas under the receiver operating characteristic (ROC) curve were estimated using PROC LOGISTIC. Statistical significance was set a priori at the 0.05 level, and all interval estimates were calculated at the 95% confidence level. Multiple comparisons were adjusted using the Tukey correction.
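For reference, Cohen's weighted kappa for two readers can be computed as in the sketch below. The paper does not state the weighting scheme or how agreement was pooled across four readers (for example, averaged over reader pairs), so both are assumptions here; for binary Yes/No calls the linear and quadratic versions reduce to ordinary Cohen's kappa. This is a plain-Python illustration, not the SAS procedure the authors used.

```python
from typing import Any, Sequence


def weighted_kappa(r1: Sequence[Any], r2: Sequence[Any],
                   categories: Sequence[Any], weighting: str = "linear") -> float:
    """Cohen's weighted kappa for two readers rating the same items.

    kappa_w = 1 - sum(w * observed) / sum(w * expected), where w are
    disagreement weights (0 on the diagonal) and the expected proportions
    are the products of each reader's marginal proportions.
    Requires at least two categories.
    """
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d ** 2  # otherwise quadratic

    num = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    # Undefined (division by zero) if both readers use a single category.
    return 1.0 - num / den
```

For the binary case, two readers agreeing on three of four breasts with marginals of 50/50 and 75/25 give κ = 0.5, the same value as the unweighted formula (po − pe)/(1 − pe).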
Results
Of 25,369 screening DBT examinations, there were 84 reports of AD or possible AD. Thirty-two cases were excluded because AD or possible AD was not confirmed on consensus review. Four cases were excluded because they were unilateral examinations. The 59 remaining patients had 59 AD lesions (all cases were single lesions in one breast of a bilateral screening mammogram) and were matched to 59 controls for a total of 1,888 observations (59 × 2 (cases and controls) × 2 breasts × 2 imaging techniques × 4 readers, or 472 observations per reader). No differences were observed between AD and non-AD patients on matched variables (see Table 1). Results of biopsy or imaging follow-up of AD cases are shown in Table 2. Although the purpose of the present study was to examine consensus-reviewed AD and possible AD, almost half of our identified cases were confirmed AD that persisted on additional imaging and required biopsy (27 of the 59). Thirty-two out of the 59 cases resolved with additional imaging or turned out to be postsurgical change that was not readily apparent to the consensus reviewers.
Agreement (Tables 3 and 4)
Overall agreement among radiologists was fair to moderate [18] for DM (κ = 0.37) and moderate to good for DBT (κ = 0.61). Agreement between attendings was fair to moderate for DM (κ = 0.40) and good for DBT (κ = 0.72). Agreement between fellows was fair for DM (κ = 0.34) and moderate for DBT (κ = 0.57). For the 27 confirmed AD cases (Table 4), overall agreement was much higher with DBT (κ = 0.71) than with DM (κ = 0.35); this held true for both attendings (κ = 0.76 for DBT vs. 0.46 for DM) and fellows (κ = 0.60 for DBT vs. 0.34 for DM).
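The verbal labels follow Altman's bands [18] (≤0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good). A small helper applying those bands to the reported values is sketched below; note that the text describes some values near a band edge (for example 0.40 and 0.61) with a range of labels, whereas a strict cut-off assigns a single band.

```python
def altman_agreement(kappa: float) -> str:
    """Verbal agreement band for a kappa value, using Altman's (1991) cut-offs."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Kappa values reported in the Results (all AD and possible AD cases).
reported_kappa = {
    ("all readers", "DM"): 0.37, ("all readers", "DBT"): 0.61,
    ("attendings", "DM"): 0.40, ("attendings", "DBT"): 0.72,
    ("fellows", "DM"): 0.34, ("fellows", "DBT"): 0.57,
}

for (readers, technique), kappa in reported_kappa.items():
    print(f"{readers:11s} {technique:3s} kappa={kappa:.2f} -> {altman_agreement(kappa)}")
```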
Reader confidence (Tables 3 and 4)
Confidence in detecting AD was higher with DBT than with DM (3.2 vs. 2.6 on the 0–4 scale), p < .001. This was true for attendings (3.7 vs. 3.1), p < .001, and for fellows (2.4 vs. 1.8), p < .001. As seen in Table 4, confidence also increased with DBT (3.1) relative to DM (2.5) for the confirmed cases of AD, p < .0001.
Sensitivity and specificity (Tables 3 and 4)
Overall, DBT achieved higher sensitivity than DM (.59 vs. .32), p = .0006. Sensitivity for attendings was much higher with DBT than with DM (.69 vs. .29), p < .0001, and sensitivity for fellows was also higher with DBT (.49 vs. .35), p < .0001. Specificity remained high (>.90) with both DBT and DM for both attendings and fellows. The small reduction in specificity from .96 to .93 observed for attendings was offset by the increase in specificity from .91 to .94 for fellows, resulting in no significant change overall when the groups were combined. In addition, DBT achieved higher positive likelihood ratios, smaller negative likelihood ratios, and larger areas under the ROC curve than DM (Table 3). Sensitivity and specificity were also examined for the 27 confirmed AD cases. As seen in Table 4, overall sensitivity increased with DBT relative to DM (.86 vs. .41). In particular, sensitivity increased dramatically for attendings with DBT relative to DM (.97 vs. .38), p < .001, for confirmed AD cases; sensitivity also increased for fellows with DBT relative to DM (.75 vs. .43), p < .0001. Specificity remained high throughout (.90–.96). Figure 2 shows area under the curve values for attendings and fellows for all AD and possible AD cases and for confirmed AD cases only.
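As a rough check on the direction of the likelihood-ratio results, the formulas from the Statistical methods can be applied to the rounded group-level values reported above. This is a back-of-the-envelope sketch: the Table 3 values come from the fitted models and may differ, and the overall specificity is approximated here (an assumption) as the simple mean of the attendings' and fellows' values.

```python
from typing import Dict, Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) using LR+ = sens/(1 - spec) and LR- = (1 - sens)/spec."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


# Rounded (sensitivity, specificity) values reported in the Results, all AD cases.
reported: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("attendings", "DM"): (0.29, 0.96),
    ("attendings", "DBT"): (0.69, 0.93),
    ("fellows", "DM"): (0.35, 0.91),
    ("fellows", "DBT"): (0.49, 0.94),
}

for (group, technique), (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{group:10s} {technique:3s} LR+ = {lr_pos:5.2f}  LR- = {lr_neg:4.2f}")

# Assumption: overall specificity approximated by the mean of the two groups.
overall_sens = {"DM": 0.32, "DBT": 0.59}
overall_spec = {t: (reported[("attendings", t)][1] + reported[("fellows", t)][1]) / 2
                for t in ("DM", "DBT")}
overall_lr_pos = {t: likelihood_ratios(overall_sens[t], overall_spec[t])[0]
                  for t in ("DM", "DBT")}
print(f"overall LR+ ratio, DBT/DM: {overall_lr_pos['DBT'] / overall_lr_pos['DM']:.2f}")
```

With these inputs, LR+ rises from about 7.3 to 9.9 for attendings and from about 3.9 to 8.2 for fellows, LR− falls in both groups, and the approximate overall LR+ ratio of DBT to DM is about 1.8, in line with the near doubling described in the Discussion.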
Discussion
AD is a subtle but clinically important mammographic finding that may be the earliest manifestation of breast cancer [3]. The sensitivity for AD is lower than for non-AD manifestations of breast cancer [17], and AD is the most commonly missed abnormality on false-negative mammograms [4]. The results of our study indicate that, compared to DM, DBT decreases IOV, increases reader confidence, increases sensitivity, and maintains high specificity in the detection of AD. Because our study design held patient images constant (i.e., patient images serve as their own controls) and radiologists constant (i.e., radiologists serve as their own controls) while only imaging technique (DBT vs. DM) was manipulated, observed differences in radiologists’ performance (e.g., sensitivity, specificity) regarding patient images can be attributed to the direct effect of imaging technique.
IOV reflects the degree of consensus among radiologists and, particularly among expert readers, the ambiguity of difficult findings. Because there is no gold standard to confirm mammographic findings other than expert consensus, improving consistency in clinical practice is critical to providing the highest quality care. In a study performed before the introduction of DBT, AD was found to have high IOV compared with other mammographic abnormalities [16]. Our study shows that DBT decreases IOV for AD compared with DM. Agreement increased among our four readers together, between our two experienced readers, and between our two less experienced readers.
In addition, reader confidence was significantly higher when using DBT compared with DM. Confidence in mammogram interpretation is associated with improved accuracy, particularly among low volume readers [19]. Similarly, Tucker et al. recently showed that the increase in sensitivity with DBT is greater for readers with less than 10 years of experience [20]. Compared to DM, DBT allowed for increased detection of AD by all of our readers when examined together as well as by the experienced readers alone (i.e., when examined without the less experienced readers) and by the less experienced readers alone (i.e., when examined without the experienced readers).
The sensitivity for detecting AD in our study was 29–35% with DM and 49–69% with DBT, lower than the 87% reported by Suleiman et al. [17], although all readers in that study were ‘experienced breast screen readers’ and all cases contained a biopsy-proven malignancy. With DBT, the sensitivity of our attending readers was higher than that of our fellow readers but still lower than that reported by Suleiman et al. This difference is likely due to case mix. Our cases were consensus-reviewed as having AD or possible AD on screening images, although 32/59 cases resolved with additional imaging or turned out to be postsurgical change that was not readily apparent to the consensus reviewers. Scars from prior benign surgeries were not always marked by the technologist, so some cases of postsurgical distortion were not prospectively identified as such. Many of the cases that were not identified as having AD by our four readers were from this group of 32 cases. When we examined only confirmed AD cases (i.e., only the cases that persisted on additional imaging and required biopsy), DBT sensitivity increased to 97% for attendings and 75% for fellows, which is similar to the sensitivity reported by Suleiman et al. It is possible that the additional years of experience with DBT between the consensus review and this study helped the readers select only the true cases of AD. Alternatively, it may be that studies selected as having AD or possible AD during the consensus review were already known to have been read as AD or possible AD by the original radiologist, whereas in this study the readers also saw control cases and were blinded to the original interpretation. Patient information or previous imaging could have contributed to the original designation of AD, but our readers were blinded to both. A study by Partyka et al. showed that some AD is seen better or only on DBT compared with DM [11], and our cases were consensus-reviewed using both DM and DBT (as in clinical practice but not in our experiment); this likely accounts for some of the low sensitivity seen with DM alone in our study. Importantly, the increased sensitivity of DBT compared with DM did not come at a cost to specificity, which remained high for both DBT and DM for both attendings and fellows. Although a statistically significant difference in specificity was observed for each group individually, there was no statistically significant difference when the groups were combined. The 3% change (a decrease for attendings and an increase for fellows), with specificity remaining >90%, may not be clinically relevant.
The relative benefit in diagnostic performance of DBT compared with DM is also reflected in the likelihood ratio and area under the curve estimates. Specifically, the overall positive likelihood ratio almost doubled with DBT relative to DM, and the area under the curve increased by 10%, both indicating a stronger association with AD for DBT than for DM. Although both groups experienced these gains, fellows experienced particularly large gains in diagnostic performance with DBT.
Our study design controlled for bias in the following ways: (1) readers were blinded to patient information (e.g., patient identifiers, outcomes), reducing the chance of recall due to patient recognition; (2) reading sessions were separated by at least 1 month, and the same patients were viewed using different imaging techniques, reducing the likelihood that readers would recall images from the other imaging technique in the previous session (this interval was selected to allow for memory decay and retrograde interference of memory given the high volume of breast imaging interpreted by the radiologists in clinical practice); (3) recall bias, if present, would be the same for DBT and DM because of the counterbalanced design; in addition, although the cases were selected 3 years before the study, memory decay and retrograde interference after reading thousands of images in clinical practice over the intervening years would reduce recall bias; (4) to control for order effects, the sequence of AD and non-AD patients was randomly assigned using a block design and then held constant across sessions 1 and 2.
Our study has some limitations. As this was an efficacy study, a 1:1 matched AD/non-AD design was used to optimise the ability to detect changes in sensitivity and specificity equally; positive predictive value and false-positive rate cannot be assessed because of their inherent dependence on prevalence. In addition, as the goal of our study was to examine AD (known to arise from a variety of aetiologies), we included all cases of consensus-reviewed AD, not just those due to cancer. A recent study comparing single-view DBT and DM showed an increase in recalls for stellate distortions with DBT, most of which were assessed as normal breast tissue after additional imaging; the second most common aetiology was radial scar [21]. Radial scars diagnosed on percutaneous biopsy carry a risk of upgrade to malignancy and thus are often excised, but the increased detection of radial scars with DBT compared with DM warrants further investigation into whether excision is needed in all cases, particularly when there is no evidence of atypia [22, 23]. All AD and possible AD cases, benign and malignant, were included in our study to reflect the collection of cases seen in clinical practice. While our findings suggest that DBT may improve detection of cancer presenting as AD, our study did not specifically address this. Our study also compared DM alone with DBT alone, although in clinical practice DBT is used in conjunction with DM or synthesised two-dimensional imaging. When DM is read in conjunction with DBT, the presence of AD on DBT alone (i.e., AD not visible on the corresponding DM) would likely prompt further work-up by most radiologists, although our study design did not specifically address this scenario. Use of digitally reconstructed two-dimensional images from DBT, as opposed to a separate DM acquisition, is increasing. We believe that our results with DM would translate to reconstructed images because both are two-dimensional techniques, unlike DBT, although we did not examine this. Finally, our cases of AD were agreed upon at consensus review by three radiologists, but different consensus reviewers could disagree on the presence of AD in our case mix.
In conclusion, digital breast tomosynthesis decreases IOV, increases reader confidence, and improves sensitivity while maintaining high specificity in detecting architectural distortion.
Abbreviations
- AD: Architectural distortion
- DM: Digital mammography
- DBT: Digital breast tomosynthesis
- IOV: Interobserver variability
References
Shaheen R, Schimmelpenninck CA, Stoddart L, Raymond H, Slanetz PJ (2011) Spectrum of diseases presenting as architectural distortion on mammography: multimodality radiologic imaging with pathologic correlation. Semin Ultrasound CT MR 32:351–362
Bahl M, Baker JA, Kinsey EN, Ghate SV (2015) Architectural Distortion on Mammography: Correlation With Pathologic Outcomes and Predictors of Malignancy. AJR Am J Roentgenol 205:1339–1345
Gaur S, Dialani V, Slanetz PJ, Eisenberg RL (2013) Architectural distortion of the breast. AJR Am J Roentgenol 201:W662–W670
Burrell HC, Sibbering DM, Wilson AR et al (1996) Screening interval breast cancers: mammographic features and prognosis factors. Radiology 199:811–817
Andersson I, Ikeda DM, Zackrisson S et al (2008) Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and BIRADS classification in a population of cancers with subtle mammographic findings. Eur Radiol 18:2817–2825
Rafferty EA, Park JM, Philpotts LE et al (2013) Assessing radiologist performance using combined digital mammography and breast tomosynthesis compared with digital mammography alone: results of a multicenter, multireader trial. Radiology 266:104–113
Rose SL, Tidwell AL, Bujnoch LJ, Kushwaha AC, Nordmann AS, Sexton R Jr (2013) Implementation of breast tomosynthesis in a routine screening practice: an observational study. AJR Am J Roentgenol 200:1401–1408
Skaane P, Bandos AI, Gullien R et al (2013) Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program. Radiology 267:47–56
Durand MA, Haas BM, Yao X et al (2015) Early clinical experience with digital breast tomosynthesis for screening mammography. Radiology 274:85–92
Lourenco AP, Barry-Brooks M, Baird GL, Tuttle A, Mainiero MB (2015) Changes in recall type and patient treatment following implementation of screening digital breast tomosynthesis. Radiology 274:337–342
Partyka L, Lourenco AP, Mainiero MB (2014) Detection of mammographically occult architectural distortion on digital breast tomosynthesis screening: initial clinical experience. AJR Am J Roentgenol 203:216–222
Freer PE, Niell B, Rafferty EA (2015) Preoperative Tomosynthesis-guided Needle Localisation of Mammographically and Sonographically Occult Breast Lesions. Radiology 275:377–383
Haas BM, Kalra V, Geisel J, Raghu M, Durand M, Philpotts LE (2013) Comparison of tomosynthesis plus digital mammography and digital mammography alone for breast cancer screening. Radiology 269:694–700
Ray KM, Turner E, Sickles EA, Joe BN (2015) Suspicious Findings at Digital Breast Tomosynthesis Occult to Conventional Digital Mammography: Imaging Features and Pathology Findings. Breast J 21:538–542
Beam CA, Conant EF, Sickles EA (2002) Factors affecting radiologist inconsistency in screening mammography. Acad Radiol 9:531–540
Onega T, Smith M, Miglioretti DL et al (2012) Radiologist agreement for mammographic recall by case difficulty and finding type. J Am Coll Radiol 9:788–794
Suleiman WI, McEntee MF, Lewis SJ et al (2016) In the digital era, architectural distortion remains a challenging radiological task. Clin Radiol 71:e35–e40
Altman DG (1991) Practical statistics for medical research, 1st edn. Chapman and Hall, London
Geller BM, Bogart A, Carney PA, Elmore JG, Monsees BS, Miglioretti DL (2012) Is confidence of mammographic assessment a good predictor of accuracy? AJR Am J Roentgenol 199:W134–W141
Tucker L, Gilbert FJ, Astley SM et al (2017) Does reader performance with digital breast tomosynthesis vary according to experience with two-dimensional mammography? Radiology 283:371–380
Lang K, Nergarden M, Andersson I, Rosso A, Zackrisson S (2016) False positives in breast cancer screening with one-view breast tomosynthesis: An analysis of findings leading to recall, work-up and biopsy rates in the Malmo Breast Tomosynthesis Screening Trial. Eur Radiol 26:3899–3907
Kalife ET, Lourenco AP, Baird GL, Wang Y (2016) Clinical and Radiologic Follow-up Study for Biopsy Diagnosis of Radial Scar/Radial Sclerosing Lesion without Other Atypia. Breast J 22:637–644
Ferreira AI, Borges S, Sousa A et al (2017) Radial scar of the breast: Is it possible to avoid surgery? Eur J Surg Oncol. doi:10.1016/j.ejso.2017.01.238
Ethics declarations
Guarantor
The scientific guarantor of this publication is Ana P. Lourenco, MD.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Funding
The authors state that this work has not received any funding.
Statistics and biometry
One of the authors has significant statistical expertise.
Informed consent
Written informed consent was waived by the Institutional Review Board.
Ethical approval
Institutional Review Board approval was obtained.
Methodology
• Prospective
• Experimental
• Performed at one institution
Cite this article
Dibble, E.H., Lourenco, A.P., Baird, G.L. et al. Comparison of digital mammography and digital breast tomosynthesis in the detection of architectural distortion. Eur Radiol 28, 3–10 (2018). https://doi.org/10.1007/s00330-017-4968-8
DOI: https://doi.org/10.1007/s00330-017-4968-8