Introduction

Percutaneous core needle biopsy (CNB) is considered the standard technique for histological diagnosis of breast lesions. Previous studies have shown that while CNB is highly reliable and sensitive for distinguishing between malignant and benign disease [1–3], it is less reliable for diagnosing atypical ductal hyperplasia (ADH). For patients diagnosed with ADH at CNB, the rate of upgrade to ductal carcinoma in situ or invasive cancer at follow-up surgical excision is reported as 19–87% [4–9]. Therefore, follow-up surgical excision is generally recommended when ADH is diagnosed at CNB.

Several studies have examined whether certain clinical, radiological or pathological factors predict the likelihood of malignancy [10–12]. A lesion can be considered “probably benign” if there is a <2% possibility of carcinoma, as indicated by the definition of category 3 in the Breast Imaging Reporting and Data System (BI-RADS) lexicon of the American College of Radiology [13]. Jackman et al. investigated whether there is a subset of ADH lesions diagnosed at CNB that fits the “probably benign” definition, for which the most appropriate subsequent action would be imaging monitoring rather than surgical excision [10]. To our knowledge, while several clinical, radiological and pathological factors have been found to be associated with underestimation, no factor alone or in combination has been shown to define a subset with a <2% possibility of carcinoma at follow-up surgical excision.

The present study examined whether certain clinical, radiological or pathological factors were associated with malignancy in patients diagnosed with ADH at CNB. The aim was to develop and validate a scoring system that could be used to predict a <2% probability of cancer at follow-up surgical excision. Such a prediction would indicate whether a patient should undergo imaging follow-up or surgical excision.

Methods

Study population

Between January 2001 and February 2007, 4493 consecutive ultrasound-guided CNBs were performed on suspicious breast lesions at the Seoul National University Hospital (SNUH). A total of 102 CNBs led to a diagnosis of ADH, and 74 patients underwent follow-up surgical excision. This SNUH dataset of 74 patients was used to develop a prediction algorithm for histologic underestimation. The definition employed for “histologic underestimation” was a lesion diagnosed as ADH at CNB that was revealed to harbor malignant foci at follow-up surgical excision, including ductal carcinoma in situ and invasive cancer [4, 14]. The validation dataset consisted of 34 cases diagnosed with ADH at CNB that underwent subsequent surgical excision at the National Cancer Center, Korea (NCC) between June 2002 and September 2006, and 20 similar cases at the Seoul National University Bundang Hospital (SNUBH) between May 2003 and December 2006.

Ultrasound-guided biopsy

All patients in the study population underwent clinical and radiological examination, including mammography and ultrasound. Palpability was assessed by experienced surgeons, and the radiological appearance of the lesion was characterized according to the American College of Radiology Breast Imaging Reporting and Data System lexicon and the final assessment categories [13]. All lesions were evaluated for size on imaging and presence of microcalcification. Lesion size was defined as the greatest lesion dimension on ultrasound imaging for most patients, or mammography size for patients with microcalcification-dominant lesions.

The detailed CNB methodology was published previously [15]. Briefly, ultrasound-guided biopsies were used for sonographically visible lesions, and were performed with patients in a supine or decubitus position using high-resolution sonography units with 10- or 12-MHz linear transducers (Voluson 730, Kretz, Austria; HDI 5000, Advanced Technology Laboratories, Bothell, WA). The biopsy was performed using a spring-loaded device with a 14-gauge automated needle (Bard Peripheral Technologies, Covington, GA), or with an 11-gauge directional vacuum-assisted biopsy device (Mammotome; Biopsys/Ethicon Endo-Surgery, Cincinnati, OH). Choice of biopsy device depended largely on the preference of the radiologist performing the biopsy, although the preferences of the physician and patient also affected this decision. The vacuum-assisted biopsy device was preferred for lesions where it may have been particularly beneficial [16–18], such as calcified lesions, intraductal lesions and solid nodules with irregular margins. Automated gun biopsy was preferred for multifocal and subareolar lesions. The core biopsy tissue sections were fixed in 10% formaldehyde and embedded in paraffin. Each biopsy specimen was stained with hematoxylin and eosin. Immunohistochemistry was not routinely used in the pathological assessment. The biopsy slides were reviewed by an experienced pathologist and diagnosed according to the ADH diagnostic criteria in the WHO guidelines (Supplemental Table 1, Supplemental Fig. 1) [19].

Development of SNUH scoring system

Demographic data, mammography and ultrasound description, core needle size, number of passes, CNB and open surgical biopsy pathology results and follow-up data were collected for each patient. Each factor was first tested individually for association with histologic underestimation using a univariate logistic regression model. Factors with P values ≤0.1 were then included in a multivariate logistic regression model. P values ≤0.05 were considered to indicate significant factors in the multivariate logistic regression model, and a scoring system was developed based on those factors. Each significant factor was assigned a score equal to the multiple of 0.5 nearest to its β coefficient in the multivariate logistic regression model. For example, a score of 2.0 was assigned for a palpable lesion with a β coefficient of 1.92, and a score of 3.5 was assigned for a size >15 mm on imaging with a β coefficient of 3.34. The scores for the significant factors were then added, resulting in a total score for each patient. For any variable on which a patient had the opposite result, a score of 0 was added. The final scores ranged from 0 to 14.5. The discriminatory ability of the prediction algorithm was measured using the area under the receiver operating characteristic (ROC) curve. The sensitivity, specificity, and positive and negative predictive values were calculated for various cutoff values of the score. The cutoff value to define a subset of ADH diagnosed at CNB as probably benign (i.e., a less than 2% possibility of malignancy at surgical excision) was then determined. To validate this scoring system, the NCC and SNUBH dataset of 54 patients was used. Statistical analyses were performed using SPSS Version 12.0 software (SPSS, Chicago, IL).
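The score assignment described above can be illustrated with a minimal Python sketch. It uses only the two β coefficients quoted in the text; the remaining factors and their coefficients are in Table 2, which is not reproduced here, so they are left out. The function and key names are hypothetical, and the handling of exact ties (a β falling exactly halfway between two multiples of 0.5) is an assumption, since the text does not specify it.

```python
def beta_to_score(beta: float) -> float:
    """Return the multiple of 0.5 nearest to a regression coefficient.

    Note: Python's round() sends exact halves to the nearest even integer;
    how such ties were resolved in the study is not stated (assumption).
    """
    return round(beta * 2) / 2


# Only the two coefficients quoted in the text; other Table 2 factors omitted.
FACTOR_SCORES = {
    "palpable_lesion": beta_to_score(1.92),   # 2.0
    "size_over_15_mm": beta_to_score(3.34),   # 3.5
}


def total_score(findings: dict, factor_scores: dict = FACTOR_SCORES) -> float:
    """Sum the scores of the factors present; absent factors contribute 0."""
    return sum(score for name, score in factor_scores.items()
               if findings.get(name, False))


# A patient with a palpable lesion larger than 15 mm on imaging.
example = total_score({"palpable_lesion": True, "size_over_15_mm": True})
print(example)  # 5.5
```

With the full set of factors from Table 2 added to the dictionary, the same summation would yield totals in the reported range of 0 to 14.5.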

Results

Of 4,493 consecutive patients, 102 were diagnosed with ADH at CNB, resulting in a prevalence rate of 2.27%. Of those 102 patients, 74 underwent surgical excision at our institution. Of those 74 patients, 34 (45.9%) were diagnosed with a malignancy after surgical excision (23 with DCIS and 11 with invasive cancer; Fig. 1). Table 1 summarizes the underestimation rates in all patients according to clinical, radiological and pathological variables. Univariate analysis revealed that a palpable lesion on physical examination, microcalcification on mammography and size on imaging >15 mm were associated with underestimation. A diagnosis of diffuse ADH at CNB (P = 0.076) and age >50 years at the time of biopsy (P = 0.051) approached statistical significance in terms of association with underestimation. When those five factors were included in multivariate analysis, palpable lesion, microcalcification on mammography, size on imaging >15 mm and age >50 years were all found to be independent predictors of malignancy, whereas focal ADH was found to be a negative predictor. The odds ratio and β coefficient associated with each significant factor in the multivariate model, and the score assigned to each factor according to its β coefficient, are shown in Table 2. The total scores for individual patients ranged from 0 to 14.5. The discriminating ability of the scoring system, measured using the area under the ROC curve (AUC), was 0.903 (95% confidence interval, 0.835–0.970) for the SNUH dataset of 74 patients, and 0.850 (95% confidence interval, 0.746–0.953) for the validation dataset of 54 patients (Fig. 2).
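For readers unfamiliar with the AUC, the following Python sketch shows one standard way to compute it: as the proportion of malignant/benign patient pairs in which the malignant case has the higher total score, with ties counted as half. The scores in the example are made-up illustrative values, not study data, and the function name is hypothetical; the study itself used SPSS.

```python
def auc(malignant_scores, benign_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0      # malignant case ranked above benign case
            elif m == b:
                wins += 0.5      # tie counts as half
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical total scores, for illustration only.
toy_malignant = [4.0, 5.0, 6.0]
toy_benign = [1.0, 2.0, 4.0]
print(auc(toy_malignant, toy_benign))  # 8.5 of 9 pairs
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of malignant from benign cases.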

Fig. 1

Patient distribution and stratification. Underestimation rate 45.9%. CNB = core needle biopsy; VACB = vacuum assisted core biopsy; ADH = atypical ductal hyperplasia; DCIS = ductal carcinoma in situ

Table 1 Underestimation rates for 74 ADH cases according to clinical, radiological and histological variables
Table 2 Multivariate logistic regression model
Fig. 2

ROC curves and area under the ROC curve for (a) the SNUH dataset and (b) the validation dataset. ROC = receiver operating characteristic; AUC = area under the receiver operating characteristic curve

Table 3 shows the sensitivity, specificity, and positive and negative predictive values according to each cutoff value. A negative predictive value refers to the ability to predict a benign lesion without malignancy at follow-up surgical excision. When using a score of ≤3.5 as the cutoff value, none of the 16 (21.6%) patients with such scores were upstaged to malignancy. Thus, a score of ≤3.5 can be used to define a subset of “probably benign” lesions. In the validation dataset of 54 patients, 15 (27.8%) were classified as having “probably benign” lesions (i.e., score ≤3.5), and none of those patients were diagnosed with malignancies at follow-up surgical excision (Table 4b).
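The cutoff-based measures in Table 3 can be sketched as follows, treating a total score above the cutoff as a positive (malignant) prediction. This is a minimal illustration in Python; the function name and the patient records are hypothetical and do not reproduce the study data.

```python
def cutoff_metrics(records, cutoff):
    """Sensitivity, specificity, PPV and NPV for a score cutoff.

    records: iterable of (total_score, is_malignant) pairs.
    A score > cutoff is a positive prediction; a score <= cutoff is negative.
    """
    tp = sum(1 for s, m in records if s > cutoff and m)
    fp = sum(1 for s, m in records if s > cutoff and not m)
    tn = sum(1 for s, m in records if s <= cutoff and not m)
    fn = sum(1 for s, m in records if s <= cutoff and m)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Made-up records: (total score, malignant at excision?)
toy_records = [
    (0.0, False), (2.0, False), (3.5, False), (3.5, False),
    (4.0, True), (5.5, False), (7.5, True), (9.0, True),
]
print(cutoff_metrics(toy_records, cutoff=3.5))
```

In this toy example no malignant case falls at or below the cutoff, so sensitivity and negative predictive value are both 1.0, which is the pattern the text describes for a score of ≤3.5.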

Table 3 Sensitivity, specificity, positive predictive and negative predictive values according to various cutoff values
Table 4 Underestimation rate according to various scores

Discussion

The present study is the first to develop a scoring system to predict the probability of cancer at follow-up surgical excision in patients diagnosed with ADH at CNB. We identified clinical, radiological and pathological factors associated with malignancy in patients diagnosed with ADH at ultrasound-guided CNB. Using these factors, a scoring system was developed to predict malignancy, and a subset of “probably benign” lesions was identified (i.e., <2% possibility of malignancy at follow-up surgical excision). The accuracy of this tool was then validated using patient data from two external institutions.

In the current study, the underestimation rate was 45.9% (34/74), which comprised a 50.0% (25/50) rate for 14-gauge automated gun biopsies and a 37.5% (9/24) rate for 11-gauge vacuum-assisted biopsies (P = 0.314 for the comparison between the two methods). Many studies have investigated stereotactic biopsies, and taking larger core specimens or using the vacuum-assisted device is believed to improve accuracy [20–22]. However, there are few reports regarding ultrasound-guided procedures, and the benefit of vacuum-assisted devices remains debatable. In a study of ultrasound-guided CNB, Philpotts et al. [23] reported that there was no significant difference in outcomes when comparing 11-gauge vacuum-assisted devices with 14-gauge automated guns, while Grady et al. [24] suggested that vacuum-assisted biopsy was more accurate than automated gun biopsy under ultrasound guidance. A possible explanation for the present results is that the 11-gauge vacuum-assisted biopsies included a higher proportion of calcified lesions (18/24, 75%) than the 14-gauge automated gun biopsies (14/50, 28%). In our institution, calcifications are also sampled by vacuum-assisted biopsy under ultrasound guidance when the lesions are evident on sonography, because sampling the sonographically visible component often helps target the invasive component [15, 25, 26]. Such calcified lesions were a significant factor for underestimation not only in our study but also in other studies [1, 27]. Therefore, the underestimation rate for the 11-gauge vacuum-assisted biopsy in the present study was probably higher than it would otherwise have been, owing to selection bias.
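The comparison of underestimation rates between the two devices can be checked from the counts given above. The text does not state which test produced P = 0.314; the Python sketch below assumes a Pearson chi-square test without continuity correction on the 2×2 table, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: 14-gauge gun (25 underestimated, 25 not); 11-gauge vacuum (9, 15).
chi2, p_value = chi_square_2x2(25, 25, 9, 15)
print(round(chi2, 2), round(p_value, 2))
```

Under this assumption the statistic is about 1.02 and the P value about 0.31, in line with the reported value.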

The present study identified palpable lesions, microcalcification on mammography, lesion size on imaging >15 mm and age at the time of biopsy >50 years as independent predictors of malignancy at follow-up surgical excision in patients diagnosed with ADH at CNB, while focal ADH was a negative predictor. Other groups have also identified factors that appear to be associated with underestimation. Consistent with our findings, Jackman et al. [10] observed a decrease in underestimation rates (P = 0.01) when maximum lesion diameters were <10 mm, and Ely et al. [28] found that focal ADH was less associated with malignancy at follow-up surgical excision. While the present study identified five factors associated with underestimation in multivariate analysis, no single factor could define a subset with a <2% possibility of carcinoma at follow-up surgical excision. Consistent with these findings, while other series have identified factors associated with malignancy, no single clinical, radiological or pathological factor could identify lesions that could be safely followed rather than surgically excised [10, 12].

Nomograms, such as scoring systems, are statistical tools used to predict the probability of a specific outcome for an individual patient, and are developed to assist clinicians in clinical decision-making [29, 30]. Hwang et al. [31] developed a scoring system to predict the status of the non-sentinel lymph node in breast cancer patients with positive sentinel lymph nodes, based on the β coefficients obtained for the significant factors from multivariate logistic regression. Based on a modification of that method, the present scoring system was able to predict the possibility of malignancy at follow-up surgical excision. The model discrimination as measured by the area under the ROC curve was 0.903 (95% confidence interval, 0.835–0.970) in the SNUH dataset, and 0.850 (95% confidence interval, 0.746–0.953) in the validation dataset. These results reflect the fact that the prediction accuracy of a model can degrade as it is transported from one population to another [32]. As a general rule, a model with an AUC of 0.7–0.8 is considered acceptable, and an AUC of 0.81–0.9 is considered excellent (− 14). Therefore, the present scoring system can be considered excellent at discrimination, and its accuracy was both reproducible and transportable. The present scoring system is more useful for identifying lower-risk groups than higher-risk groups. The sensitivity and negative predictive value associated with a score of 3.5 or less were 100% because no patients with such scores were upstaged to malignancy. Thus, we suggest that lesions with scores ≤3.5 could be defined as “probably benign”, and be safely followed up rather than surgically excised. Patients with a low score will nevertheless require closer than normal follow-up by mammography or ultrasound, because ADH is not a normal finding and is regarded as a high-risk lesion for cancer. The choice between surgical excision and close follow-up should also take the patient’s tolerance and compliance into account.

However, the system was developed and validated under ultrasound-guided CNB conditions in relatively small populations, and it could therefore be difficult to transfer this system to another institution using different practice guidelines. Thus, this system awaits further testing in larger populations and under stereotactic CNB conditions.

Limitations of the present study include that it was retrospective and that it did not involve a randomized series of patients. These limitations can result in imbalances such as the greater proportion of calcified lesions in the 11-gauge vacuum-assisted biopsy group compared to the 14-gauge automated gun biopsy group. Furthermore, in the current study, 28 (27%) of the 102 ADH cases did not undergo surgical excision and were therefore excluded from the study. It is possible that cases with a lower likelihood of malignancy were recommended for imaging follow-up rather than surgical excision, which could affect the underestimation rate and other results. Further validation and/or prospective studies are required.

Conclusion

The present study demonstrated that palpable lesions, microcalcification on mammography, size on imaging >15 mm and age at the time of biopsy >50 years were independent predictors of malignancy, whereas focal ADH was a negative predictor. A scoring system based on these factors may be useful in determining the probability of malignancy, and could be helpful in determining whether or not it is necessary for a patient with ADH at CNB to undergo surgical excision.