Introduction

In the past, variability in the terminology used in mammographic reporting has often led to misinterpretations. When radiologists provide inconsistent recommendations for given assessments, referring clinicians may be confused about whether and how to conduct further evaluation [1–4].

The Breast Imaging Reporting and Data System (BI-RADS) of the American College of Radiology (ACR) is a tool created to reduce these inconsistencies [5]. The BI-RADS lexicon includes illustrations of each mammographic feature described, a section on auditing a mammography practice, and sample reports. Its purpose is to standardize (1) the terminology in a mammographic report, (2) the assessment of findings, and (3) the resulting recommendation for action. The relationship between assessment and management recommendations has implications for clinical care, teaching, and evaluating the screening interpretations of radiologists. The features of masses and calcifications described provide the basis for categorization [5]. The BI-RADS lexicon is increasingly used in the European region, e.g., in France, Austria, Switzerland, Turkey, and Germany. Evaluation studies of the lexicon have been published [6–12]. Several publications, however, have demonstrated observer variability in lesion description when using the BI-RADS categories, as well as limitations in the use of the lexicon [13–24]. In the following, mammographic examples from our institution, interpreted according to the BI-RADS lexicon of the ACR, are illustrated, and the literature on the usefulness and limitations of the lexicon is reviewed.

BI-RADS lexicon for mammography according to the ACR [5]

The BI-RADS lexicon describes four classes of breast parenchymal density and its effect on diagnostic accuracy (Table 1, Fig. 1).

Table 1 Classification of breast tissue density according to the BI-RADS lexicon
Fig. 1

Different types of breast tissue density according to the American College of Radiology (ACR type 1/fatty, ACR type 2/fibroglandular, ACR type 3/heterogeneously dense, ACR type 4/extremely dense breast tissue)

Masses

A mass is defined as a lesion seen in two different projections. If a lesion is seen only in a single projection, it should be called a density. Masses are further classified by shape, margin, and density (Table 2, Fig. 2).

Table 2 ACR BI-RADS categories for mammographic lesions according to their probability of being malignant and subsequent recommendations
Fig. 2

Shape (top), margin (middle), and density (bottom) of masses according to the BI-RADS lexicon. Shape: Aa round/oval, Ab lobular, Ac irregular. Margin: Ba well-defined, Bb ill-defined, Bc obscured, Bd spiculated. Density: Ca fat-containing, Cb isodense, Cc high density

Shape

The shape of a mass can be round, oval, lobular, or irregular (Fig. 2, top).

Margin

The margin of a mass can be described as circumscribed (well-defined or sharply defined), microlobulated (undulating with short cycles), obscured (hidden by superimposed adjacent tissue), indistinct (ill-defined), or spiculated (Fig. 2, middle).

Density

The density of a mass can be higher than, equivalent to (isodense), or lower than that of the surrounding parenchyma, or it can be fat-equivalent (Fig. 2, bottom).
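The three descriptor groups for masses (shape, margin, density) lend themselves to a structured record. The following is a minimal sketch, assuming a reporting tool that stores one term per group; the class names `Shape`, `Margin`, `Density`, and `MassDescription` are hypothetical and are not part of the BI-RADS lexicon itself. Only the terms named in the text above are used.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    ROUND = "round"
    OVAL = "oval"
    LOBULAR = "lobular"
    IRREGULAR = "irregular"


class Margin(Enum):
    CIRCUMSCRIBED = "circumscribed"
    MICROLOBULATED = "microlobulated"
    OBSCURED = "obscured"
    INDISTINCT = "indistinct"
    SPICULATED = "spiculated"


class Density(Enum):
    HIGH = "high density"
    EQUAL = "isodense"
    LOW = "low density"
    FAT = "fat-containing"


@dataclass(frozen=True)
class MassDescription:
    """Hypothetical record holding one lexicon term per descriptor group."""
    shape: Shape
    margin: Margin
    density: Density

    def summary(self) -> str:
        # Build a one-line description from the three chosen terms.
        return f"{self.shape.value} mass, {self.margin.value} margin, {self.density.value}"


example = MassDescription(Shape.IRREGULAR, Margin.SPICULATED, Density.HIGH)
print(example.summary())
```

Using enumerations rather than free text means a term outside the lexicon is rejected when the record is built, which is the kind of consistency the lexicon aims for.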

Calcifications

Different types of calcifications are distinguished (typically benign, intermediate-concern calcifications, and calcifications signifying a high probability of malignancy), and the distribution of microcalcifications is an important additional feature (Table 2, Figs. 3, 4).

Fig. 3

Types of microcalcifications according to the BI-RADS lexicon. Top, middle Typically benign findings: Aa vascular, Ab coarse or popcorn-like, Ac large rod-like, Ad round, Ae “eggshell”, Af lucent-centered, Ag suture, Ah milk of calcium calcifications. Bottom Intermediate-concern calcifications: B amorphous calcifications. Higher probability of malignancy: C left pleomorphic, C right, D fine linear branching calcifications

Types of calcifications

Typically benign (Fig. 3, top, middle)

  • Skin calcifications (dermal)

  • Vascular calcifications

  • Coarse or popcorn-like calcifications

  • Large rod-like calcifications

  • Round calcifications

  • Lucent-centered calcifications

  • “Eggshell” or “rim” calcifications

  • Milk of calcium calcifications

  • Suture calcifications

  • Dystrophic calcifications

  • Punctate calcifications

Intermediate-concern calcifications (Fig. 3, bottom, far left)

  • Amorphous or indistinct calcifications

Higher probability of malignancy calcifications (Fig. 3, bottom, right)

  • Pleomorphic or heterogeneous calcifications (granular)

  • Fine linear, fine linear branching (casting) calcifications

Distribution

  • Diffuse/scattered

  • Regional

  • Linear

  • Segmental

  • Grouped or clustered (Fig. 4).

Fig. 4

Distribution of microcalcifications according to the BI-RADS lexicon: diffuse (A), regional (B), linear (C), segmental (D), and clusters of microcalcification (E)
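The grouping of calcification morphologies by level of concern, together with the distribution terms, can be expressed as a simple lookup. This is a sketch only: the dictionary keys are shortened labels chosen for illustration, and the names `CONCERN_BY_MORPHOLOGY`, `DISTRIBUTIONS`, and `concern_level` are hypothetical. The grouping itself follows the lists given in this section.

```python
# Shortened morphology labels (illustrative) mapped to the concern group
# under which the text above lists them.
CONCERN_BY_MORPHOLOGY = {
    "skin": "typically benign",
    "vascular": "typically benign",
    "coarse": "typically benign",
    "large rod-like": "typically benign",
    "round": "typically benign",
    "lucent-centered": "typically benign",
    "eggshell": "typically benign",
    "milk of calcium": "typically benign",
    "suture": "typically benign",
    "dystrophic": "typically benign",
    "punctate": "typically benign",
    "amorphous": "intermediate concern",
    "pleomorphic": "higher probability of malignancy",
    "fine linear": "higher probability of malignancy",
    "fine linear branching": "higher probability of malignancy",
}

# Distribution terms as listed in the text.
DISTRIBUTIONS = ("diffuse/scattered", "regional", "linear", "segmental", "grouped/clustered")


def concern_level(morphology: str) -> str:
    """Return the concern group for a morphology label, or raise if unknown."""
    key = morphology.strip().lower()
    if key not in CONCERN_BY_MORPHOLOGY:
        raise ValueError(f"unknown morphology term: {morphology!r}")
    return CONCERN_BY_MORPHOLOGY[key]


print(concern_level("Amorphous"))
```

The lookup does not assign a final assessment category; as discussed later in the review, the lexicon does not state explicitly which features belong to which category.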

Architectural distortion

The normal architecture is distorted with no definite mass visible. Architectural distortion can also be an associated finding of malignancy.

Special cases

Special cases include tubular density or solitary dilated duct, intramammary lymph node, asymmetric breast tissue, and focal asymmetric density (Fig. 5).

Fig. 5

A few associated findings and special cases according to the BI-RADS lexicon. Special cases: intramammary lymph node (A), asymmetric breast tissue (B). Associated findings: skin retraction and thickening (C)

Associated findings

These findings are used with masses or calcifications or alone when no other abnormality is present (Fig. 5):

  • Skin retraction

  • Nipple retraction

  • Skin or trabecular thickening

  • Skin lesion

  • Axillary adenopathy

  • Architectural distortion

Location of lesion

The location of a lesion should include:

  • Side (left, right, or both)

  • Location (according to the face of the clock and subareolar, central, or axillary)

  • Depth (anterior, middle, or posterior)
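The three location elements can be captured in a small validated record. The sketch below is an assumption about how a structured report might store them; the class `LesionLocation` and its field names are hypothetical, and the rule that either a clock position or a named region must be given is an illustrative design choice rather than a requirement stated in the lexicon.

```python
from dataclasses import dataclass
from typing import Optional

SIDES = {"left", "right", "both"}
REGIONS = {"subareolar", "central", "axillary"}
DEPTHS = {"anterior", "middle", "posterior"}


@dataclass(frozen=True)
class LesionLocation:
    """Hypothetical record for side, location, and depth of a lesion."""
    side: str
    depth: str
    clock_position: Optional[int] = None  # 1 to 12, face of the clock
    region: Optional[str] = None          # subareolar, central, or axillary

    def __post_init__(self) -> None:
        if self.side not in SIDES:
            raise ValueError(f"invalid side: {self.side!r}")
        if self.depth not in DEPTHS:
            raise ValueError(f"invalid depth: {self.depth!r}")
        if self.clock_position is None and self.region is None:
            raise ValueError("give a clock position or a region")
        if self.clock_position is not None and not 1 <= self.clock_position <= 12:
            raise ValueError("clock position must be between 1 and 12")
        if self.region is not None and self.region not in REGIONS:
            raise ValueError(f"invalid region: {self.region!r}")

    def describe(self) -> str:
        # Prefer the clock position when present, otherwise the named region.
        if self.clock_position is not None:
            where = f"{self.clock_position} o'clock"
        else:
            where = self.region
        return f"side: {self.side}; location: {where}; depth: {self.depth}"


loc = LesionLocation(side="left", depth="posterior", clock_position=2)
print(loc.describe())
```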

BI-RADS categories

After the findings have been clearly described according to the above-mentioned parameters, the report must assign the lesion(s) to one of the BI-RADS categories, which implies the appropriate next course of action (Table 2).

Review and discussion of the literature

Usefulness of the BI-RADS lexicon

The final assessment categories of the BI-RADS lexicon are useful predictors of malignancy (Table 3). Liberman et al. [21] found a significantly higher frequency of carcinoma among category 5 lesions than among category 4 lesions for all mammographic findings and all interpreting radiologists. For malignancy, the positive predictive value (PPV) for BI-RADS category 5 ranged from 81% to 97% versus 23% to 24% for BI-RADS category 4 [21]. Partik et al. [11] retrospectively evaluated findings in mammography and sonography of male patients with pathohistologically proven diseases according to the BI-RADS lexicon. Invasive ductal carcinoma in male patients was predominantly a lobulated, ill-defined lesion on mammography and sonography. The differentiation of carcinoma from pseudogynecomastia and diffuse or dendritic gynecomastia was easily accomplished. The differentiation between carcinoma and some benign mass lesions, however, was not reliable. Bock et al. [6] analyzed the validity of the BI-RADS lexicon for clinical mammography in men. Assessment of the mammograms with the BI-RADS system correctly placed all cases of malignancy into categories 4 and 5, regardless of the investigators’ level of experience. Siegmann et al. [12] evaluated the correlation of the BI-RADS category with the malignancy rate of stereotactic vacuum-assisted breast biopsies in order to optimize the indication for performing this procedure. The rate of malignancy increased from 6.3% for BI-RADS category 3 to 16.7% for BI-RADS category 4 and to 85% for BI-RADS category 5 (P<0.001). Orel et al. [23] reported the PPV of the BI-RADS categorization. The PPV for category 0, 2, 3, 4, and 5 lesions was 13%, 0%, 2%, 30%, and 97%, respectively. Lacquement et al. [19] also analyzed the PPV of the BI-RADS lexicon: the overall PPV was 0.23 and increased with increasing level of suspicion for category 1, 2, 3, 4, and 5 with a PPV of 0.0, 0.04, 0.03, 0.23, and 0.92, respectively. Zonderland et al. [25] evaluated 2,762 mammograms with a PPV of BI-RADS category 1, 2, 3, 4, and 5 of 0.3%, 0.6%, 33.9%, 52.7%, and 100%, respectively. The difference between BI-RADS 1 and 2 vs BI-RADS 3 was statistically significant (P<0.01). Bérubé et al. [16] determined in a retrospective study whether the categories defined according to the BI-RADS lexicon were useful predictors of malignancy; core biopsy showed no malignancy in the lesions classified as BI-RADS 3, 4% malignancies and 5% atypical hyperplasias in category BI-RADS 4, and 54% malignant lesions in category BI-RADS 5. Gülsün et al. [9] reported a PPV of 17% and 25% for two readers for BI-RADS category 4 and 68% and 44% for category 5. The interobserver agreement was moderate in the evaluation of microcalcification morphology (kappa: 0.31), distribution of microcalcifications (kappa: 0.29), final assessment categories (kappa: 0.27), and associated findings (kappa: 0.31).
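The PPV figures quoted above are, for each category, the proportion of lesions that proved malignant. A minimal sketch of that arithmetic follows; the function name `positive_predictive_value` and the counts in `hypothetical_counts` are invented for illustration and are not taken from any of the cited studies.

```python
def positive_predictive_value(malignant: int, total: int) -> float:
    """PPV = lesions proven malignant / all lesions assigned to the category."""
    if total <= 0 or not 0 <= malignant <= total:
        raise ValueError("need 0 <= malignant <= total and total > 0")
    return malignant / total


# Hypothetical (malignant, total) counts per BI-RADS category, for illustration only.
hypothetical_counts = {3: (2, 100), 4: (30, 100), 5: (19, 20)}

for category, (malignant, total) in sorted(hypothetical_counts.items()):
    ppv = positive_predictive_value(malignant, total)
    print(f"BI-RADS {category}: PPV = {ppv:.1%}")
```

Because the denominator is the number of lesions placed in a category, the PPV depends on how each study selected lesions for biopsy or follow-up, which is one reason the reported values differ between studies.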

Table 3 Final assessment categories: positive predictive values for BI-RADS 2–5 classified lesions

The BI-RADS lexicon does not explicitly state which mammographic features should be included in the various final assessment categories. Several studies found that the features with the highest PPV for masses were spiculated borders and irregular shape, whereas those for calcifications were fine linear morphology with segmental or linear distribution [22]. Table 4 shows a classification of features for the assignment of findings to the various BI-RADS categories [16].

Table 4 Categorization of features of masses and calcifications according to BI-RADS [16]

BI-RADS category 3

BI-RADS category 3 has been the subject of debate in the literature. Caplan et al. [26] reported that 7.7% of 372,760 mammograms were classified as BI-RADS category 3. The probability of a BI-RADS category 3 classification was higher in women who were young, symptomatic, or had abnormal findings on clinical breast examination. Sickles [27] prospectively evaluated the value of short-term follow-up mammography in 3,184 patients with baseline mammographic lesions classified as probably benign. Of the 3,184 probably benign lesions included in the study, cancer was subsequently discovered in 17 (0.5%). Fifteen of the 17 cancers were diagnosed by means of interval changes on follow-up mammography before they were palpable. Cancer was discovered in 0.1% of clusters of round or punctate calcifications, 2% of solitary solid circumscribed masses, 0.4% of focal asymmetric densities, 0.2% of clustered calcifications, and 0.4% of multiple solid circumscribed nodules. Sickles [4] noted that the frequency of cancer among probably benign lesions was 0.7% overall, with 1.4% among solid circumscribed masses, 0.6% among focal asymmetric densities, 0.4% among localized microcalcifications, 0.3% among multiple circumscribed masses, and 0.2% among generalized microcalcifications. Varas et al. [28, 29] analyzed 544 (3%) of 18,435 lesions that were assigned to BI-RADS category 3 and that were followed up for a minimum of 2 years. Of the follow-up mammograms, 97% showed stability or regression of the BI-RADS category 3 findings, whereas 3% showed nonpalpable interval progression revealed by mammography and underwent biopsy. The breast cancer detection rate among the study population was 0.4%. Of patients who had undergone biopsy because of interval progression of the lesions, 14% were shown to have malignant lesions. In a comparison of the findings from the 1987–1989 study and the 1996 study, the frequency of BI-RADS category 3 lesions remained stable, patient compliance with follow-up increased, and the PPV of category 3 lesions for cancer decreased from 1.7% to 0.4% (P=0.04). Mendez et al. [30] evaluated the use of stereotactic vacuum-assisted breast biopsy for BI-RADS 3 lesions to determine the false-negative rate of category 3 mammograms. A total of 156 vacuum-assisted biopsies were performed for BI-RADS 3 abnormalities in a series of 947 vacuum-assisted procedures. The false-negative rate of BI-RADS 3 mammograms was 4.5%. Patients with linear microcalcifications had the highest rate of cancer (29%) compared with patients without microcalcifications (1.5%) and patients with nonlinear microcalcifications (2.9%). Monticciolo and Caplan [31] analyzed the recent use of the category 3 designation in a national cancer detection program. The percentage classified as category 3 was 7.7% in the initial phase (1991–1996) and 6.0% in the second phase (1996–1999). Overall, the percentage of category 3 mammographic findings decreased over time, whereas requests for additional examinations increased. In a study at our institution, we evaluated patients who had microcalcifications classified as BI-RADS category 3 and who underwent stereotactic vacuum biopsy [10]. We found a PPV for these BI-RADS category 3 lesions of about 4%, in accordance with the literature. Burns et al. [1] reported that tru-cut biopsy was performed in 400 lesions; 156 of the 400 lesions (39%) were classified as BI-RADS category 3. Moy et al. [32] analyzed lesions categorized as BI-RADS 3. In this study, eight of 13 carcinomas were detected at the 6-month follow-up and the remainder at the 12-month follow-up.

Breast tissue density (ACR types)

The BI-RADS lexicon standardizes the classification of breast parenchymal density. It is important that the breast tissue density is included in the report because dense breast tissue interferes with the interpretation of mammograms [33]. Mandelson et al. [34] evaluated breast density as a predictor of mammographic lesion detection. Mammographic sensitivity was 80% among women with fatty breast tissue (ACR type 1) but 30% in women with extremely dense breast tissue (ACR type 4). Satija et al. [35] studied 82,391 screening mammograms among 36,495 women aged 40–80 years and found that ACR type 1 and 2 breast density at age 40 was associated with a relative breast cancer risk of 0.39 with respect to the general population at the same age. At age 80, this relative risk was 0.61. The relative risk for women with breast tissue density ACR type 3 was 0.72 at age 40 and 1.13 at age 80.

Limitations of the lexicon

Variability in mammographic interpretation had been reported in several studies before the use of the BI-RADS lexicon. Elmore et al. [2] published a study in which ten radiologists reviewed 150 mammograms including 27 cancers. Work-up was recommended for 74% to 96% of women with cancer and 11% to 65% of women without cancer. Beam et al. [36] analyzed results of screening mammograms from 79 women with 45 cancers that had been reviewed by 108 radiologists. Sensitivity ranged from 47% to 100%, and specificity ranged from 36% to 99%.
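The sensitivity and specificity ranges quoted above are computed per reader from the counts of correct and incorrect calls. The sketch below shows the arithmetic for a single hypothetical reader; the case mix (27 cancers among 150 mammograms) mirrors the study size mentioned above, but the reader's individual counts are invented for illustration.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of cancers for which work-up was recommended."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-cancers for which no work-up was recommended."""
    return true_neg / (true_neg + false_pos)


# Hypothetical reader: 27 cancers and 123 non-cancers among 150 mammograms.
true_pos, false_neg = 24, 3    # cancers recalled / cancers missed (invented)
false_pos, true_neg = 30, 93   # non-cancers recalled / correctly dismissed (invented)

print(f"sensitivity = {sensitivity(true_pos, false_neg):.1%}")
print(f"specificity = {specificity(true_neg, false_pos):.1%}")
```

Repeating the calculation for each reader and taking the minimum and maximum gives ranges of the kind reported in the studies above.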

Since the introduction of the BI-RADS lexicon, observer variability has been re-evaluated by several authors [13, 14, 17, 23]. These studies indicate that, even in the presence of a standardized lexicon, variability in mammographic reports persists.

Baker et al. [13] analyzed the results of 60 mammograms independently reviewed by five radiologists. Each radiologist read each case twice. They found substantial inter- and intra-observer agreement in the choice of terms to describe masses. Considerable inter-observer and intra-observer variability was noted for associated findings and special cases. Kerlikowske et al. [17] published a study of 71,712 screening examinations performed by the Mobile Mammography Screening Program of the University of California, including 267 with cancer. They found moderate agreement between two radiologist readers in reporting the presence of findings when cancer was detected (kappa=0.54) and substantial agreement when cancer was not present (kappa=0.62). The variability in interpretation of mammographic examinations and the accuracy of mammography are neither improved nor diminished with the use of BI-RADS. Berg et al. [14] reported the inter-observer and intra-observer variability of five experienced mammographers in the use of BI-RADS terminology in 103 screening mammograms and 96 diagnostic mammograms. Lesion management was highly variable, e.g., the five observers agreed on management for only 55% of 86 lesions. Intra-observer agreement for management was seen in 85% of interpretations. The recommendation for additional evaluation or biopsy was made for 90%–97% of cancers on screening mammograms and for 91%–96% of cancers detected in diagnostic mammograms. Pijnappel et al. [37] reported that the overall agreement for lesion classification was moderate (kappa 0.54). The lowest kappa values were observed for the BI-RADS category 3 (kappa 0.59) and category 4 (kappa 0.44). The clinical management of non-palpable lesions that consist of microcalcifications and whose management depends upon radiological classification into BI-RADS 3 or BI-RADS 4 is therefore debatable. Berg et al. [15] analyzed the impact of BI-RADS training on reader agreement in describing lesion features and found that, after a 1-day training session, agreement with the expert consensus improved.
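The kappa values cited in these studies measure agreement between readers beyond what chance alone would produce. A minimal sketch of Cohen's kappa for two readers follows; the function name `cohen_kappa` and the two lists of category assignments are hypothetical and serve only to show the calculation, not to reproduce any study's data or its exact statistical method.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(reader_a)
    # Fraction of cases on which the two readers chose the same category.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance from each reader's category frequencies.
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical BI-RADS category assignments by two readers for ten lesions.
reader_a = [3, 3, 4, 4, 5, 5, 3, 4, 5, 4]
reader_b = [3, 4, 4, 4, 5, 4, 3, 3, 5, 4]
print(f"kappa = {cohen_kappa(reader_a, reader_b):.2f}")
```

In this invented example the readers agree on 7 of 10 lesions, but because 35% agreement would be expected by chance, kappa is noticeably lower than the raw agreement rate.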

Referring clinicians, however, have little knowledge of BI-RADS [24]. Of 86 clinicians, 46% were not aware that radiologists were required to report mammograms by using BI-RADS terminology, 64% had no information about further education regarding the BI-RADS classification, and only 35% were comfortable having BI-RADS in their reports. This study was published in 2000; further education and improved communication since then have improved the referring clinicians’ knowledge of the BI-RADS lexicon. Since its introduction, there has been improvement in the accuracy of BI-RADS application. Variation in assessment and recommendation, however, persists. Taplin et al. [38] showed that BI-RADS assessment and management recommendations were consistent for negative and benign findings, but inconsistencies were found in assessment and recommendations for mammographic abnormalities. For lesions classified as BI-RADS 3, additional imaging was recommended in 36.9%. For BI-RADS 4 lesions, biopsy was recommended in 48.7%, additional imaging in 38.7%, and clinical examination and/or surgical consultation in 9%. The majority of lesions classified as BI-RADS 5 were referred for biopsy (73.3%). A clinical examination and/or surgical consultation was recommended in 18.1% and additional imaging in 6.6% of these cases. Geller et al. [3] reported that BI-RADS assessment categories were generally used as intended for all categories except 0 and 3. Management recommendations for BI-RADS category 3 lesions had the highest variability. Only 40% of these cases were associated with the recommendation for short-interval follow-up. Additional imaging was recommended in 64% of BI-RADS 0 findings. In 20% of these cases, either a consultation or biopsy was recommended. Lehman et al. [20] reported that the overall discordance between BI-RADS assessment categories and recommendations was low (3%). The highest recommendation discordance was found for category 3 lesions (53.5%). Mammograms of women with dense breast tissue were 30% more likely to have lesions assigned discordant assessments and recommendations than those of women with fatty tissue.

Continued efforts to educate radiologists and referring clinicians in the use and classification of BI-RADS terms promote maximum consistency in reporting terminology. The BI-RADS lexicon, therefore, remains a work in progress and may be modified in the future.

Summary and future developments

The BI-RADS atlas is a helpful guide for use in everyday practice. Its purpose is to standardize mammographic reports, thereby improving clarity and enabling better communication, and to facilitate research. In several studies, the PPVs of specific mammographic features have been evaluated and have contributed to further refinement. Studies of inter- and intra-observer variability have shown, however, that further development and training of physicians in lexicon use are necessary. Similar lexicons for breast ultrasound and breast magnetic resonance imaging should be validated.