Introduction

Molecular breast imaging (MBI) is an FDA-approved breast imaging technique that relies on the detection of 99mTc sestamibi accumulation within the breast using a dual-head cadmium-zinc-telluride (CZT) gamma camera. 99mTc sestamibi has been shown to preferentially accumulate in breast malignancies as well as in some types of non-malignant tissue. MBI is a sensitive technique for detection of malignancy in the screening setting [1] and has demonstrated similarly high sensitivity in the diagnostic setting when a dual-detector system is used [2]. As more becomes known about the utility of this imaging method, a common language for describing MBI findings will become increasingly important to the wider community. To our knowledge, there exists neither standardized terminology nor interpretive criteria for MBI or other gamma-camera breast imaging studies.

The Breast Imaging Reporting and Data System (BI-RADS), developed through the American College of Radiology, is highly effective in standardizing reporting and communication of breast imaging findings and recommendations [3]. Training in BI-RADS for mammography [4] has been shown to improve observer agreement in mammographic interpretation [5]. BI-RADS lexicons for mammography [4], ultrasonography [6] and MRI [7] also serve as important tools for researching outcomes of particular imaging findings to guide management. We have developed a lexicon for the description of MBI images based on familiar BI-RADS lexicon terminology in existence for mammography, ultrasonography and MRI, as well as on the proposed BI-RADS-type lexicon for positron emission mammography (PEM) [8]. The goal of the current study was to determine interobserver agreement and diagnostic accuracy using this lexicon for standardized interpretation of MBI studies by breast radiologists.

Materials and methods

Institutional review

This observer study was deemed by our Institutional Review Board (IRB) as exempt, meaning that the protocol was approved for use without ongoing IRB oversight.

MBI system and imaging

MBI images were acquired using one of two research MBI systems under evaluation, both of which utilize a small field-of-view (20 × 16 cm or 20 × 20 cm) dual-head CZT gamma camera mounted on a modified mammographic gantry. FDA-approved versions of these MBI systems are now commercially available. One system was equipped with two LumaGem detectors (Gamma Medica-Ideas, Northridge, CA; intrinsic spatial resolution 1.6 mm) and the second system was equipped with GE prototype CZT detectors (GE Healthcare, Haifa, Israel; intrinsic spatial resolution 2.5 mm). The detectors of both systems were fitted with system-specific registered collimators which were optimized for high count sensitivity in the setting of near-field imaging with a dual-head gamma camera [9]. These detectors have a very small dead-space at the detector edge, permitting the breast to be positioned in close proximity to the detector, in positions analogous to those obtained with mammography.

MBI studies were selected from a database of over 3,000 examinations which were performed under a variety of IRB-approved research protocols between August 2005 and February 2011. Written informed consent had been obtained from each woman as a part of the applicable protocol. Intravenous injection of 296, 740 or 1,110 MBq (8, 20 or 30 mCi) 99mTc sestamibi was used as previously described [1, 2, 10, 11]. Imaging commenced 5 min after injection. Craniocaudal (CC) and mediolateral oblique (MLO) views of each breast were acquired over 10 min per view. For each image, the breast was positioned between the two detectors and gentle compression was applied to limit patient motion. Of note, the amount of axilla included in the MBI images varied from patient to patient, depending on patient anatomy and positioning.
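The paired activities quoted above follow from the standard conversion 1 mCi = 37 MBq. The following minimal sketch only illustrates that arithmetic; the function names are hypothetical and are not part of the study software.

```python
# Standard unit conversion: 1 Ci = 3.7e10 Bq, hence 1 mCi = 37 MBq.
MBQ_PER_MCI = 37.0


def mci_to_mbq(mci: float) -> float:
    """Convert an activity from millicuries to megabecquerels."""
    return mci * MBQ_PER_MCI


def mbq_to_mci(mbq: float) -> float:
    """Convert an activity from megabecquerels to millicuries."""
    return mbq / MBQ_PER_MCI


if __name__ == "__main__":
    # The three nominal injected activities named in the text.
    for dose_mci in (8, 20, 30):
        print(f"{dose_mci} mCi = {mci_to_mbq(dose_mci):.0f} MBq")
```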

Observers

Five radiologists participated in a 2-h didactic training session on MBI. All five met the requirements of the Mammography Quality Standards Act (MQSA) for physicians interpreting mammography, worked a minimum of 0.40 full-time equivalents in a Division of Breast Imaging and Intervention, and had a mean of 14 years of experience in breast imaging (range 7–23 years). A sixth observer was an MQSA-qualified board-certified radiologist currently enrolled in a breast imaging fellowship at the same institution who had completed 6 months of fellowship at the time of the interpretation task. None of the radiologists had prior training or experience in interpretation of MBI, and all agreed to have their results analyzed as part of participation in the training.

MBI training session

All observers first attended a 2-h live didactic session conducted by three of the authors. Imaging technique, gamma camera physics, lexicon definitions, and examples of malignant and non-malignant findings and artifacts were reviewed. Ten unknown cases were then presented, each followed by review of corresponding multimodality images and pathology results.

MBI lexicon

A lexicon, as given in Table 1, was developed for the interpretation of MBI by three fellowship-trained, dedicated breast radiologists (A.L.C., C.T. and R.M.) who had each interpreted at least 270 MBI studies (mean 401, range 273–600), together with input from a member of the BI-RADS committees (W.B.). The MBI lexicon was modeled after familiar terms used in BI-RADS for mammography, ultrasound, and MRI [4, 6, 7]. We referred to the proposed lexicon for PEM [8] in order to unify (where possible) the lexicons for existing nuclear breast imaging techniques.

Table 1 MBI lexicon

The lexicon includes classification of radiotracer uptake in both lesions and background parenchyma. Background uptake was defined as the degree of background radiotracer uptake in the parenchyma relative to subcutaneous fat and was categorized as photopenic, mild, moderate or marked. Lesions could be classified as either a mass or non-mass radiotracer uptake. Qualitative radiotracer uptake within a lesion was classified as either photopenic (lack of uptake), mild, moderate or marked relative to surrounding parenchyma. For non-mass uptake, the distribution was categorized as a focal area, regional, multiple regions, segmental, or diffuse.

The location of MBI findings is specified in terms of the clock face or quadrant and distance from the nipple, as in mammography, though consistency of such descriptions was not evaluated in this task. The size of a finding on MBI is determined on the image(s) where the finding is most discrete with an additional measurement taken in the orthogonal plane, though lesions were not measured by observers in this evaluation. Symmetry between breasts and associated findings of nipple uptake, axillary uptake or vessel uptake can be described, though these were not required for each case in this task.

Final assessment codes, modeled after those used in BI-RADS for other breast imaging modalities but tailored to MBI, were recorded for each breast. The assessment codes and corresponding recommendations were as follows: 1 – negative, routine follow-up; 2 – benign, routine follow-up; 3 – very low likelihood of malignancy, follow-up in 6 months if targeted diagnostic mammogram and ultrasound are negative; 4 – suspicious, consider biopsy; 5 – highly suggestive of malignancy, take appropriate action. Categories 0 (incomplete, needs additional imaging) and 6 (known malignancy, take appropriate action) were not included in this evaluation.
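For readers who record assessments electronically, the five codes used in this evaluation can be held in a simple lookup table. The sketch below is illustrative only: the labels and recommendations are taken from the paragraph above, while the names `Assessment`, `MBI_ASSESSMENTS` and `describe` are hypothetical.

```python
from typing import NamedTuple


class Assessment(NamedTuple):
    label: str
    recommendation: str


# Codes 1-5 as listed in the text; categories 0 and 6 were not used
# in this evaluation and are deliberately absent.
MBI_ASSESSMENTS = {
    1: Assessment("negative", "routine follow-up"),
    2: Assessment("benign", "routine follow-up"),
    3: Assessment(
        "very low likelihood of malignancy",
        "follow-up in 6 months if targeted diagnostic mammogram "
        "and ultrasound are negative",
    ),
    4: Assessment("suspicious", "consider biopsy"),
    5: Assessment("highly suggestive of malignancy", "take appropriate action"),
}


def describe(code: int) -> str:
    """Return a one-line description of an assessment code (1-5)."""
    if code not in MBI_ASSESSMENTS:
        raise ValueError(f"assessment code {code} was not used in this evaluation")
    entry = MBI_ASSESSMENTS[code]
    return f"{code}: {entry.label}; {entry.recommendation}"
```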

MBI interpretation task

The MBI interpretation task included studies of 27 patients in which 50 breasts were imaged. Both views of both breasts were shown together for 23 patients and only one breast was shown for 4 patients. The studies were selected from our file of more than 3,000 MBI patients to represent the breadth of pathology as well as the breadth of imaging findings seen on MBI, and were not previously shown as part of the training. Biopsy-proven malignancy was present in 20 of the 50 breasts (40%, 20 patients), including invasive ductal ± intraductal carcinoma in 9 (45% of malignancies); ductal carcinoma in situ in 5 (25% of malignancies); invasive lobular carcinoma in 3 (15% of malignancies); and mixed ductal and lobular carcinoma in 3 (15% of malignancies). For the 15 invasive cancers with size available, the median tumor size was 1.7 cm (mean 2.4 cm, range 1.0 to 6.3 cm). Benign lesions were diagnosed in 11 breasts (22%) including three biopsy-proven lesions (two fibroadenomata and one papilloma) and eight lesions with benign status verified by follow-up (including two benign cysts, two benign intramammary lymph nodes, one breast with benign nipple uptake and three areas of diffuse benign background uptake). The remaining 19 breasts had no diagnosed lesions. As the reference standard, a breast was considered disease-positive based on histopathologic diagnosis of malignancy from core biopsy and/or surgical excision. A breast was considered disease-negative based on either benign histopathologic findings or benign imaging findings at more than 1 year after the MBI study. All MBI studies included upper and lower head detector images in both the CC and MLO projections for a total of four images per breast. Lesions were not marked, so that observers decided whether a lesion was present and how many were present in each breast.

Observers were asked to describe the degree of background radiotracer uptake as well as any MBI findings according to the lexicon, and were asked to give a final assessment ranging from 1 to 5 for each breast based on the MBI findings. Observers were required to specify whether mass or non-mass-like uptake was present, the distribution of non-mass-like uptake (if relevant), and the intensity of uptake in each lesion described. Observers could comment on symmetry and additional findings (axillary uptake, nipple uptake, vessel uptake) but were not required to do so, and these data were not analyzed. Lesion location and size were not evaluated. After recording MBI findings and assessments, observers were then shown conventional imaging of low-resolution mammograms for comparison in all cases but one in which no mammogram was available. In four cases where relevant, an ultrasonogram or a prior comparison mammogram was also shown. Observers then gave a second final assessment based on the combined interpretation of MBI and conventional breast imaging. Observers were instructed not to attempt to give a primary interpretation of the mammogram.

Lexicon descriptor terms, as well as final assessments, were recorded in person by one of the study authors. The observers had access to the detailed BI-RADS style MBI lexicon to use as a reference during the interpretation task. A data recorder was allowed to remind the observers that the lexicon was available for their use, but otherwise did not guide the observers in interpretation.

Statistical analysis

Results were analyzed for MBI alone and again based on assessments of combined MBI and conventional imaging. A final assessment of 4 or 5 was considered test positive, while an assessment of 1, 2 or 3 was considered test negative. Sensitivity, specificity, positive predictive value and negative predictive value were calculated for each observer and for all observers as a group. A receiver operating characteristic (ROC) curve was generated for each observer, and the area under the curve was calculated for each. Agreement for each category was calculated at the per-breast level. In two cases in which two lesions in a single breast were present on the consensus read, analysis was performed using descriptors of the dominant lesion. Any additional lesions described by observers but not by the consensus readers were not included in the analysis of feature agreement.
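The dichotomization and per-observer metrics described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names and the toy data are hypothetical, and the rank-based (Mann-Whitney) estimate of the area under the ROC curve is an assumption, since the estimation method is not stated in the text.

```python
from typing import Dict, Sequence


def is_test_positive(assessment: int) -> bool:
    """Assessments 4 and 5 are test positive; 1, 2 and 3 are test negative."""
    if assessment not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected assessment code: {assessment}")
    return assessment >= 4


def diagnostic_metrics(
    assessments: Sequence[int], malignant: Sequence[bool]
) -> Dict[str, float]:
    """Per-breast sensitivity, specificity, PPV and NPV for one observer."""
    if len(assessments) != len(malignant):
        raise ValueError("one reference-standard label is needed per breast")
    tp = fp = tn = fn = 0
    for code, truth in zip(assessments, malignant):
        positive = is_test_positive(code)
        if positive and truth:
            tp += 1
        elif positive and not truth:
            fp += 1
        elif not positive and truth:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc_from_ratings(assessments: Sequence[int], malignant: Sequence[bool]) -> float:
    """Empirical AUC: probability that a malignant breast receives a higher
    assessment than a non-malignant one, counting ties as one half."""
    pos = [a for a, m in zip(assessments, malignant) if m]
    neg = [a for a, m in zip(assessments, malignant) if not m]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one non-malignant breast")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data for seven breasts, not study data.
    codes = [5, 4, 3, 1, 2, 4, 1]
    truth = [True, True, True, False, False, False, False]
    print(diagnostic_metrics(codes, truth))
    print(auc_from_ratings(codes, truth))
```

With ordinal ratings of this kind, the rank-based estimate equals the area under the empirical ROC curve joined by straight lines, which is why it is used here in place of an explicit curve.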

Cohen’s kappa was used to assess interobserver agreement among the six observers for background uptake, lesion type, distribution of non-mass uptake, intensity of lesion uptake and final assessment code [12, 13]. Grouped kappas were calculated for assessments 1 or 2, 3, and 4 or 5. Agreement between each observer and the expert consensus readers was also calculated. A kappa value of 1.0 indicates complete agreement, while kappa of 0 indicates no agreement beyond that expected by chance. According to Landis and Koch [14], a kappa value below 0.2 indicates slight agreement, between 0.21 and 0.4 fair agreement, between 0.41 and 0.6 moderate agreement, between 0.61 and 0.8 substantial agreement, and 0.81 or greater near-perfect agreement.
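As an illustration of the agreement statistic, the sketch below computes an unweighted Cohen's kappa for a single pair of raters and maps the result onto the agreement bands quoted above. It is a sketch under stated assumptions: the text does not say how agreement was pooled across the six observers, so only the two-rater case is shown, and the function names and example ratings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same breasts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of breasts")
    n = len(rater_a)
    # Observed proportion of breasts on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa: float) -> str:
    """Agreement band as quoted in the text. Values between 0.80 and 0.81
    are treated as near-perfect here, which is an assumption."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near-perfect"


if __name__ == "__main__":
    # Hypothetical lesion-type calls for six breasts, not study data.
    a = ["mass", "mass", "non-mass", "non-mass", "mass", "non-mass"]
    b = ["mass", "mass", "non-mass", "mass", "mass", "non-mass"]
    k = cohens_kappa(a, b)
    print(round(k, 2), landis_koch(k))
```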

Results

MBI interpretive performance

Across the six observers, median sensitivity for MBI alone was 20/20 (1.0, range 0.9–1.0) and median specificity was 26/30 (0.88, range 0.83–0.97). At least one observer assessed 11 breasts as 3, including 3 breasts with malignancy. The median positive predictive value was 0.85 (range 0.80–0.95, 20/25 to 19/20) and the median negative predictive value was 1.00 (range 0.93–1.00, 27/29 to 27/27). The median area under the curve was 0.94 (range 0.93–0.98). Three lesions in three breasts were described by the observers in addition to those known by the consensus readers, and were not included in the analysis.
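As a check on the arithmetic, the endpoints of the predictive value ranges quoted above follow directly from the stated counts, taking each numerator as the number of correct calls and each denominator as the number of test-positive (for PPV) or test-negative (for NPV) breasts for that observer:

```latex
\begin{align*}
\mathrm{PPV} &= \frac{TP}{TP + FP}, &
\mathrm{PPV}_{\min} &= \frac{20}{25} = 0.80, &
\mathrm{PPV}_{\max} &= \frac{19}{20} = 0.95,\\[4pt]
\mathrm{NPV} &= \frac{TN}{TN + FN}, &
\mathrm{NPV}_{\min} &= \frac{27}{29} \approx 0.93, &
\mathrm{NPV}_{\max} &= \frac{27}{27} = 1.00.
\end{align*}
```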

MBI lexicon agreement

Kappa values for interobserver agreement and agreement between the observers and the consensus readers are given in Table 2. There was only fair agreement on background parenchymal uptake with κ = 0.31 (95% CI 0.23–0.39). Interobserver agreement for lesion type (mass or non-mass) was substantial with κ = 0.79 (95% CI 0.73–0.85) and was substantial for lesion intensity with κ = 0.67 (0.61–0.73). There was moderate to substantial agreement for each subcategory of intensity.

Table 2 Interobserver agreement for MBI descriptors among six dedicated breast radiologists newly trained in MBI interpretation, and agreement with expert consensus interpretations for 50 breasts. The data presented are κ values (95% CI)

For distribution of non-mass-like uptake, interobserver agreement was substantial at 0.63 (0.51–0.75). Agreement was substantial for segmental and regional distributions, moderate for focal area, fair for diffuse uptake (of which there was only one breast) and slight for multiple regions (of which there were two breasts). Results for background and lesion features compared to the expert consensus were similar to those between observers (Table 2).

Interobserver agreement for final assessment of MBI alone (without the corresponding mammogram) was substantial, κ = 0.80 (95% CI 0.73–0.87). Agreement between the observer final assessments and the expert consensus was near-perfect, κ = 0.83 (95% CI 0.69–0.89). There was near-perfect agreement for combined categories 1 and 2 as well as for combined categories 4 and 5 (Table 2). Agreement for assessment category 3 was slight, with κ = 0.15.

MBI combined with conventional imaging

When interpreted in conjunction with mammography, agreement further improved to κ = 0.87 (95% CI 0.79–0.95), indicating near-perfect agreement. Agreement for assessment category 3 improved to κ = 0.3 (fair agreement) when interpreted together with conventional imaging, with a wide confidence interval. When MBI was interpreted with correlating images, an assessment of 3 was given by at least one observer in six breasts, none of which showed a malignant lesion (Fig. 1). Sensitivity of MBI combined with mammography was 20/20 (1.0) for all observers.

Fig. 1

MLO images of the left breast in a 64-year-old woman: MBI image following injection of 322 MBq of 99mTc sestamibi (a) and the mammogram (b) show a mild intensity focal area of non-mass-like uptake in the left axilla (arrow). This corresponds with a low axillary lymph node which had been mammographically stable for at least 4 years. Three observers (including consensus readers) who had given a suspicious or probably benign assessment based on the MBI alone, changed to a benign assessment when the mammogram was provided. Two observers gave a suspicious assessment which did not change with mammographic comparison. One observer rated the finding as benign both before and after mammographic comparison

Specific problem cases

There were a few specific cases for which agreement was only slight or fair. In two breasts, some observers agreed with the consensus and described a single larger area of heterogeneous non-mass-like uptake, while others picked out a brighter area within it and described that area as a mass constituting a second, additional lesion (Fig. 2).

Fig. 2

MLO images of the right breast in a 46-year-old woman: MBI image following injection of 1,099 MBq of 99mTc sestamibi (a) and mammogram (b). Three observers (as well as the consensus readers) classified this as heterogeneous non-mass uptake only (arrowhead), while three described it as non-mass uptake with a second finding of a mass (arrow). All observers as well as the consensus readers assigned assessment category 4 or 5. Final pathology showed multifocal invasive ductal carcinoma forming three adjacent masses

The lexicon includes axillary uptake as an additional finding, usually thought to represent a lymph node, which may or may not be pathologic. A clearly pathologic axillary finding in one breast was described by five observers as a distinct lesion (three of the observers also described the associated finding of axillary uptake). One observer described this as only the associated finding of axillary uptake.

In three breasts some observers described moderate or marked background uptake as diffuse non-mass-like uptake (Fig. 3). However, all were symmetric, and all observers agreed on a benign final assessment in these cases.

Fig. 3

MLO MBI image of the right breast in a 45-year-old woman following injection of 1,106 MBq of 99mTc sestamibi. Uptake in the right breast was classified by some observers (including the consensus readers) as moderate background uptake, and by other observers as diffuse non-mass-like uptake. All observers rendered a negative/benign final assessment. Pathology was benign at prophylactic mastectomy

Discussion

Substantial to near-perfect agreement and high diagnostic performance were found when the proposed lexicon was used for MBI interpretation. These results were achieved by observers with no prior MBI interpretation experience who had received only 2 h of training. These findings support the use of the proposed lexicon for reproducible communication of MBI findings and suggest that MBI studies can be interpreted well by newly trained radiologists with experience in breast imaging.

Interobserver agreement for final assessment was greater in our study than that seen in validation studies of mammography [15], ultrasonography [16–18] or MRI [8, 19, 20], and was also slightly greater than seen with PEM [8]. Interobserver agreement for lesion type was similar to or slightly lower than that seen with mammography or ultrasonography [15, 17], but was similar to or higher than that seen with MRI [8, 19]. In general, interobserver agreement for feature analysis was similar to or higher than that seen in lexicon studies of mammography, ultrasonography and MRI [8, 15–21]. In these studies and ours, as expected, kappa values were generally lower when a larger number of options were available within a category, and agreement was lowest (with widest confidence intervals) for features which were infrequently seen (such as multiple regions of non-mass-like uptake). Interobserver agreement for final assessment was slightly higher in our study than in a prior MBI study in which two observers retrospectively interpreted screening MBI images in isolation [1]. The diagnostic accuracy in this series was also higher than in the prior study, possibly related to the artificially high number of positive examinations included in this study for the purpose of comparing lesion descriptions.

Our study was similar in design to lexicon validation studies performed for other modalities. It differs because in most of those studies, the lesions to be described were specified on the image, whereas in this study they were not. Hence, it was possible for observers in our study to identify lesions in addition to those identified by the consensus readers. Of the 50 breasts included in the interpretation task, only three such additional lesions were identified.

This proposed MBI lexicon necessarily differs from other breast imaging lexicons. Because of the relatively low resolution of MBI, finer morphologic details such as mass shape and mass margins cannot be assessed and are not included. Additional terms such as lesion uptake intensity and background uptake intensity need to be included given the functional nature of imaging with 99mTc sestamibi.

Because we did not prespecify which lesions should be described, a few potentially confusing diagnostic situations were brought to light. In the case of consensus-determined marked background uptake, four observers described marked background uptake and two described diffuse non-mass-like uptake. We propose that diffuse non-mass-like uptake only be reported when it is believed to constitute a lesion; diffuse uptake would be more likely to be a suspicious finding when asymmetric with the contralateral breast. In one study, some observers described an enlarged axillary lymph node as an additional finding of axillary uptake, while others described a mass located in the axilla. In this situation, comparison with mammography and/or ultrasonography would be especially helpful to determine whether the axillary uptake represents a pathologic finding (as it did in this case).

We noted relatively frequent use of the “probably benign” assessment category, with 11 of 50 breasts given such an assessment by at least one observer, including three malignancies. In the work-up algorithm presented during training, a negative targeted diagnostic mammogram and ultrasonogram showing the area of probably benign uptake would be necessary before short-term follow-up MBI could be recommended. Observers may have been more comfortable with category 3 because additional imaging is the initial recommendation. Of note, only four breasts were given an assessment of 3 after the mammogram was provided (no malignancies). There was one malignant MBI study where no mammogram was available at the time of this observer study, as the mammography had been performed at another institution. Two malignant cases initially given an assessment of “probably benign” by at least one observer were appropriately upgraded to category 4 once the mammogram was shown. The use of category 3 has not been validated for MBI. Lesions designated as category 3 may have a higher rate of malignancy than the accepted rate of 2% for mammography [22–24], ultrasonography [25–27] and MRI [28, 29]; of note, a malignancy rate greater than 2% has been shown for category 3 lesions seen on PEM [8].

Importantly, interpretive performance improved further when mammograms (and occasionally breast ultrasonograms) were shown to the observers. Similar results were seen for PEM interpretation [8], reinforcing the view that breast imagers should be involved in interpreting breast nuclear medicine examinations. Validation among nuclear medicine physicians has not been performed.

Additional studies at other sites will be necessary to show that the proposed lexicon is also useful at other institutions. We do not yet know how much risk is associated with each lexicon descriptor, so the lexicon does not yet offer guidance to interpreters as to which assessment category to assign. Our hope is that the availability of this lexicon will lead to additional research which will further delineate the predictive value of each descriptor or combination of descriptors. Until those data are available, we determine our level of suspicion by factoring in lesion intensity, distribution, conspicuity and the pretest probability of malignancy, as well as findings on additional imaging, as recommended in the Society of Nuclear Medicine guidelines [30]. Higher final assessments are given when there is radiotracer uptake which represents a definite lesion distinct from background that is not accounted for by known benign findings.

This lexicon could likely be used to describe breast-specific gamma imaging (BSGI) and scintimammography findings as well, though validation for such uses is recommended.

Our study had a number of limitations. First, not all terms in the proposed lexicon were evaluated. Symmetry, internal enhancement pattern (homogeneous or heterogeneous) and additional findings (nipple, venous and axillary uptake) were not analyzed. Terms for the description of lesion size and location were also not studied, but are analogous to those in mammography. Quantitative measurement of uptake in breast lesions was not evaluated because of current software limitations, although this is technically possible [31, 32]. An additional limitation of our study was that all observers were from the same institution, which may in part account for the high feature analysis agreement. Cases were preselected in order to include discrete lesions which could be described for lexicon validation, which mimics a diagnostic population going for biopsy. This selection bias may have contributed to the high sensitivity observed. The artificial environment created in administering an interpretation task may not represent how observers would perform in practice [33, 34]. Finally, there was no histological proof of benignity for 8 of the 11 benign lesions included. In some of these cases, follow-up was not longer than 1 year. It is possible that an underlying malignancy was present which did not present itself until after the follow-up period.

In summary, the lexicon used in this study was an easily adopted tool for describing findings on MBI studies. The proposed MBI lexicon is based on BI-RADS terms that are familiar to breast radiologists. Substantial to near-perfect agreement and high diagnostic performance were achieved with relatively little MBI-specific training.