Introduction

The false-negative rate (FNR) of the sentinel node (SN) biopsy procedure is an important measure of procedural accuracy in the surgical management of breast cancer. Potential adverse outcomes from missing node metastases include understaging the patient and an increased risk of cancer recurrence. The FNR is measured by comparing the pathological status of the SNs to that of the remainder of the axillary nodes present in a completion axillary node resection (ANR).

A large randomized trial has demonstrated that when cancer has not metastasized to the SN, survival and recurrence are equivalent between SN biopsy alone and ANR [1]. Since morbidity is lower with SN biopsy alone, ANR is not typically justifiable in the SN-negative case. As a result, the era of gathering new data on FNR is essentially over.

There are many different methods for removing SNs. These have been devised to achieve practical and theoretical advantages such as decreased pain, increased success rate, and ease of performing the procedure. Many of these outcomes can be reliably evaluated with the number of patients typically seen in a single center. However, establishing an accurate FNR involves observations on a very large number of patients. For example, to establish a 5 % FNR (in a range of 0–7 %) for a single surgeon would take about 300 patients with positive axillary nodes [2]. Comparing FNRs between different SN biopsy methods likewise requires a very large number of patients. Few if any published reports based on a single patient population can define with sufficient statistical certainty what the FNR is for any given SN biopsy method.
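To illustrate why the required numbers are so large, the sketch below shows how the width of a confidence interval around a 5 % FNR shrinks with the number of node-positive patients. It uses the simple normal-approximation (Wald) interval, which is an assumption made here for illustration and is not necessarily the calculation used in [2]; the function name is hypothetical.

```python
import math

def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) CI for a proportion p
    observed in n node-positive patients (z = 1.96 for a 95 % interval)."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Assumed illustration: an observed FNR of 5 % at several sample sizes.
for n in (50, 100, 300, 1000):
    hw = wald_half_width(0.05, n)
    lower = max(0.0, 0.05 - hw)  # a proportion cannot fall below zero
    upper = 0.05 + hw
    print(f"n = {n:4d}: 95 % CI roughly {lower:.3f} to {upper:.3f}")
```

Because the half-width falls only with the square root of n, quadrupling the number of node-positive patients merely halves the uncertainty.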

We present here a meta-analysis of the FNR observed with different SN biopsy methods. The FNR analysis was based upon grouping published data according to the type of material injected and the location of injection. Only the axillary nodes were considered for this analysis; nonaxillary nodes were not included.

Materials and methods

Search strategy

An information service provided a manually categorized list of the 3,588 published articles related to SNs and breast cancer that were listed in PubMed from 1993 through mid-2011 (Sentinel Nodes and Breast Cancer, www.treeofmedicine.com). Of the 3,588 articles, 302 were categorized as having reported a FNR. These 302 articles were reviewed for inclusion in this meta-analysis. Additionally, a PubMed search was performed for meta-analyses that have already addressed SN biopsy in breast cancer and FNR [3–8]. Citations from these meta-analyses were also reviewed.

Selection criteria

To be included in the meta-analysis, articles must have contained groups or subgroups of patients who had pathologically negative SN biopsies followed by ANR. Injection material(s) and injection location(s) must have been clearly specified. A FNR reported for a group of patients must have been from a group that had the same injection material(s) and injection location(s) for each patient in the group. An article could have more than one group of patients and be included in our study. The number of patients with positive axillae and the number of patients with false-negative SNs had to be described separately for each group in the article.

Articles that reported results based largely on the same group of patients as a previous article were excluded. Articles that were not available in English were also excluded.

Data collection

A false-negative event was defined as a patient with a pathologically negative SN and at least one pathologically positive non-SN. For each article, the number of false-negative cases and the number of cases with positive axillary nodes were recorded for each group of patients that met our inclusion criteria.

Patients were grouped according to the injection location and the injection material. Injection material was classified as dye alone, radioactive tracer alone, or combined dye and tracer. Dyes included isosulfan blue, methylene blue, patent blue, indocyanine green, indigo carmine, and Evans blue. Tracers included 99mTc-colloidal albumin, technetium sulfur colloid, 99mTc tin, 99mTc phytate, 99mTc dextran, antimony sulfide, and 99mTc rhenium colloid.

Location of injection was classified as around the tumor (including peritumoral and subcutaneous over the tumor), areolar or periareolar, intratumoral, and intradermal. Combinations of injection locations were classified as around the tumor and areolar; around the tumor and intradermal; and intradermal and areolar. The classification of multiple injection locations was based on groups of patients for whom data were available and who met our inclusion criteria.

Statistical analysis

For a set of patients who had a successful SN biopsy, the FNR was defined as the number of false-negative cases divided by the number of cases with any axillary nodal metastases (FNR = FN/(TP + FN)). The reported FNRs were in the form of sample proportions (p = x/n), where n is the number of cases with axillary dissections and x is the number of false-negative cases from each citation. This sample proportion p can be viewed as being an estimate of an unknown binomial parameter that represents the true FNR (π) for a given study. It is known that the variance of the sample proportion p as an estimate of the binomial parameter π is equal to (π) × (1 − π)/n and that the usual method of estimating this sampling variance is to substitute the observed proportion p for π. However, in the two extreme cases where the observed FNR proportion value p equals zero or 1.0, such a substitution can result in a zero value for the sampling variance. Such a simple substitution scenario does not provide a reasonable estimate for the sampling variance of p that reflects the influence of the sample size n. The arc sine transformation of the sample proportion p (x/n) does provide a method that reflects the sample size in the transformed data scale. The arc sine transformation is the result of taking the arc sine of the square root of (x/n) [9]. On the transformed arc sine scale, the transformed FNR has a sampling distribution with a variance of approximately 1/(4 × n). Thus, the sample size n from each citation can be included to reflect the precision of the reported data. Formal meta-analysis of the transformed proportions (p = x/n) can be conducted and then the use of the appropriate back transformation allows summary effect size estimates to be converted back to the original FNR scale.
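The transformation and its back transformation can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the function names and the group of 20 node-positive patients with no false negatives are hypothetical, and clipping the interval to the range 0 to π/2 before back-transforming is a choice made here so that the back transformation stays monotone.

```python
import math

def arcsine_transform(x: int, n: int) -> float:
    """Arc sine of the square root of the sample proportion x/n (radians)."""
    return math.asin(math.sqrt(x / n))

def arcsine_variance(n: int) -> float:
    """Approximate sampling variance on the transformed scale: 1 / (4 n)."""
    return 1.0 / (4.0 * n)

def back_transform(t: float) -> float:
    """Convert a transformed value back to a proportion: sin(t) squared."""
    return math.sin(t) ** 2

# Hypothetical group: 0 false negatives among 20 node-positive patients.
x, n = 0, 20
p = x / n
naive_var = p * (1.0 - p) / n        # collapses to zero when p is 0 or 1
t = arcsine_transform(x, n)
var_t = arcsine_variance(n)          # 1/80, still reflects the sample size
half = 1.96 * math.sqrt(var_t)

# Clip to [0, pi/2] (an assumption of this sketch) before back-transforming.
lower = back_transform(max(0.0, t - half))
upper = back_transform(min(math.pi / 2.0, t + half))
print(naive_var, var_t, lower, upper)
```

Even for a group with an observed FNR of zero, the transformed scale yields a nonzero variance, so the group still contributes to the pooled estimate with a weight proportional to n.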

The arc sine-transformed FNR and sample size for each citation were entered into a data file with the corresponding covariate information, such as year of publication (broken down into approximate quartiles), the injection agents, and the locations used for injections. Furthermore, citations were divided into 17 possible groups by the combination of injection agent and location of injection used. Each covariate was examined relative to variation among the transformed FNR data with the use of a mixed model where citations within any given covariate subgroup were assumed to represent a random effect and the covariate subgroups were assumed to be a fixed effect. Summary forest plots for each of the three covariate subgroups were created based upon the random-effects models. Forest plots of the transformed FNR data were created for those covariate groups that showed reasonable mixed-model statistical heterogeneity based upon the Q test [10]. Significant Q statistic values were followed with pairwise comparisons to isolate the source of the FNR heterogeneity. The summary effect sizes and 95 % confidence limits within covariate subgroups were back-transformed to create summary covariate-specific subgroup FNR point estimates and their 95 % confidence intervals. A sensitivity analysis was conducted by deleting those citations with FNR equal to zero to determine if any of the primary results were altered.
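The general shape of such an analysis can be sketched as follows. This is a minimal illustration, not the procedure implemented in the software used for this study: it assumes the DerSimonian-Laird estimator for the between-citation variance, the function names are hypothetical, and the input data are invented for the example. Each subgroup needs at least two citations.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    a = 0.5 if df % 2 else 1.0
    sf = math.erfc(math.sqrt(half)) if df % 2 else math.exp(-half)
    while a < df / 2.0 - 1e-9:  # recurrence on the incomplete gamma function
        sf += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return sf

def back(t: float) -> float:
    """Back-transform to a proportion, clipping to the valid range first."""
    return math.sin(min(max(t, 0.0), math.pi / 2.0)) ** 2

def pool(groups):
    """Random-effects pooling of (false negatives, node-positive cases)
    pairs on the arc sine scale (DerSimonian-Laird, assumed here)."""
    t = [math.asin(math.sqrt(x / n)) for x, n in groups]
    v = [1.0 / (4.0 * n) for _, n in groups]
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * ti for wi, ti in zip(w, t)) / sum(w)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, t))  # Q statistic
    df = len(groups) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)          # between-citation variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * ti for wi, ti in zip(w_re, t)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"mu": mu, "se": se, "q": q, "df": df, "p_q": chi2_sf(q, df),
            "tau2": tau2, "fnr": back(mu),
            "ci": (back(mu - 1.96 * se), back(mu + 1.96 * se))}

def q_between(subgroups):
    """Q test for differences among pooled subgroup estimates
    (subgroups treated as a fixed effect)."""
    w = [1.0 / s["se"] ** 2 for s in subgroups]
    grand = sum(wi * s["mu"] for wi, s in zip(w, subgroups)) / sum(w)
    q = sum(wi * (s["mu"] - grand) ** 2 for wi, s in zip(w, subgroups))
    return q, chi2_sf(q, len(subgroups) - 1)

# Invented data for two hypothetical covariate subgroups.
subgroup_a = pool([(2, 30), (5, 48), (0, 15), (9, 70)])
subgroup_b = pool([(1, 40), (3, 65), (2, 52)])
print(subgroup_a["fnr"], subgroup_a["ci"], q_between([subgroup_a, subgroup_b]))
```

Pooling on the transformed scale and back-transforming only the summary estimate and its confidence limits keeps the interval inside the range 0 to 1.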

All primary data transformations and back transformations as well as recoding were conducted using SYSTAT ver. 11 (Systat, Chicago, IL). Mixed-model meta-analysis calculations and graphical displays were obtained using Comprehensive Meta-Analysis ver. 2.2 (Biostat, Englewood, NJ).

Results

Of the 302 articles that reported a FNR value, 183 met the inclusion criteria. Overall, these articles produced 202 unique patient groups for analysis. Seventeen articles had two clearly defined groups with either different injection materials or different locations and gave FNRs for each group. One article presented three separate groups. The total number of patients with positive axillary nodes included in these studies was 9,220. The total number of patients with false-negative axillae was 794. The crude overall FNR was 8.61 % (CI = 8.05–9.2 %). Using a fixed-effects model assuming homogeneity between studies, the overall FNR was calculated to be 7.5 % (CI = 7.0–8.1 %). Dropping the homogeneity assumption, it was estimated to be 7.0 % (CI = 6.1–7.9 %) using a random-effects model.
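The crude figures can be checked with simple arithmetic. The sketch below treats the 9,220 patients as the denominator of the FNR and uses a normal-approximation interval, which is an assumption; the method behind the reported interval is not stated, so only approximate agreement should be expected.

```python
import math

false_negatives = 794
node_positive = 9220  # treated here as the FNR denominator

crude_fnr = false_negatives / node_positive
se = math.sqrt(crude_fnr * (1.0 - crude_fnr) / node_positive)
lower = crude_fnr - 1.96 * se   # normal-approximation 95 % CI (assumed)
upper = crude_fnr + 1.96 * se
print(f"crude FNR = {crude_fnr:.2%} ({lower:.2%} to {upper:.2%})")

# Group count: each of the 183 articles contributes one group, 17 articles
# contribute a second group, and one article contributes two extra groups.
groups = 183 + 17 * 1 + 1 * 2
print(groups)
```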

By year

The number of groups published per year was small before 1997, grew rapidly from 1997 to a peak in 2000 and 2001, and then declined steadily over the following decade (Fig. 1). The dates of publication for the 202 groups were divided into approximate quartiles by year of publication as follows: 1993–1999, 2000–2001, 2002–2004, and 2005–2011. The FNRs for these quartiles were 5.4 % (CI = 3.8–7.3 %), 7.4 % (CI = 6.0–8.9 %), 6.1 % (CI = 4.5–7.8 %), and 8.9 % (CI = 7.1–11.5 %), respectively. There was no significant variation between quartiles (p = 0.09).

Fig. 1 Number of articles published per year that report false-negative rate

By injection material

FNRs were calculated for three categories of injection materials: dye-only, tracer-only, and dye-and-tracer. One study was excluded from this analysis because each patient received two different types of dye [11]. FNRs were 8.6 % (CI = 6.7–10.8 %) for dye-only, 7.4 % (CI = 5.6–9.3 %) for tracer-only, and 5.9 % (CI = 4.8–7.1 %) for dye-and-tracer (Fig. 2 shows overall group data and Fig. 3a–c shows individual study data). All three groups were first compared to determine if their FNRs were equal. The Q statistic for heterogeneity indicated that they were not all equal (p = 0.050). Subsequent pairwise comparisons indicated that there was a difference between the dye-only and the dye-and-tracer categories (p = 0.018). However, there was no difference between tracer-only and dye-only (p = 0.370), or tracer-only and dye-and-tracer (p = 0.178).
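A reader can roughly check a pairwise comparison from the published summary figures alone. The sketch below is an approximation under stated assumptions, not the mixed-model comparison used in this study: it recovers each standard error on the arc sine scale from the reported 95 % confidence limits and applies a two-sided z test. Because the inputs are rounded, the result should only be expected to fall near the reported p value. The function names are hypothetical.

```python
import math

def to_arcsine(p: float) -> float:
    """Arc sine of the square root of a proportion."""
    return math.asin(math.sqrt(p))

def approx_pairwise_p(est_a, ci_a, est_b, ci_b):
    """Approximate two-sided z test between two pooled FNRs, with standard
    errors recovered from the 95 % confidence limits on the arc sine scale."""
    se_a = (to_arcsine(ci_a[1]) - to_arcsine(ci_a[0])) / (2.0 * 1.96)
    se_b = (to_arcsine(ci_b[1]) - to_arcsine(ci_b[0])) / (2.0 * 1.96)
    z = (to_arcsine(est_a) - to_arcsine(est_b)) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Dye-only versus dye-and-tracer, using the rounded values reported above.
z, p_value = approx_pairwise_p(0.086, (0.067, 0.108), 0.059, (0.048, 0.071))
print(f"z = {z:.2f}, p = {p_value:.3f}")
```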

Fig. 2 The false-negative rate according to injection material of dye only, tracer only, or combination of dye and tracer

Fig. 3 False-negative rates for individual studies according to dye only (a), combination of dye and tracer (b), and tracer only (c)

By location of injection

There were seven categories based on location of injection. Four categories consisted of a single location site while three categories represented a combination of locations. The intratumoral location had the lowest FNR at 2.5 % (CI = 0.2–12.6 %), while FNRs at the other locations ranged from 4.9 to 8.3 % (Fig. 4). However, there was no statistically significant variation between these seven location categories (p = 0.95).

Fig. 4 The false-negative rate according to injection location

By location and by injection material

A total of 17 groups of patients could be identified based on the combination of materials injected and the location of injection (Table 1). The categories were first compared as a whole, and the Q statistic indicated a statistically significant variation existed among the FNRs of the 17 categories (p = 0.034).

Table 1 False-negative rate according to injection material and site of injection

There was significant FNR variation between the injection material categories for those injected around the tumor and those injected intradermally. In the patients who had injections restricted to around the tumor, there was a significant variation in the FNR depending on the injection material (p = 0.0069). The difference in FNR between the dye-only group (9.4 %: CI = 7.3–11.6 %) and the dye-and-tracer group (5.4 %: CI = 4.2–6.9 %) was statistically significant (p = 0.002). There was no significant difference in the FNR between the tracer-only and the other two injection material categories. For the patients in whom the location of injection was intradermal, the FNR was significantly different between dye-only (14.3 %: CI = 6–25.3 %) and tracer-only (0 %: CI = 0–10.3 %) (p = 0.03). However, there were only two groups of patients with injections restricted to this site.

There was no significant FNR variation detected between locations no matter the type of injection material(s) used.

Sensitivity analysis

A sensitivity analysis was performed by dropping all groups that had a FNR of 0 % from the analysis. After this, 166 groups remained. The resulting FNR was 8.7 % (CI = 8.2–9.3 %) using a fixed-effects model and 8.5 % (CI = 7.8–9.3 %) using a random-effects model. There continues to be no significant variation in FNR between the quartiles based on year of publication (p = 0.125) or between injection locations (p = 0.450). The significant variation between injection materials persists (p = 0.005).
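The mechanics of this kind of sensitivity analysis can be sketched as a filter followed by re-pooling. The example assumes fixed-effect inverse-variance pooling on the arc sine scale, where a variance of 1/(4n) gives each group a weight of 4n; the data and the function name are invented for illustration and do not come from the included citations.

```python
import math

def fixed_effect_fnr(groups):
    """Fixed-effect pooled FNR on the arc sine scale. With variance 1/(4n),
    the inverse-variance weight of each group is 4n."""
    weights = [4.0 * n for _, n in groups]
    t = [math.asin(math.sqrt(x / n)) for x, n in groups]
    mu = sum(w * ti for w, ti in zip(weights, t)) / sum(weights)
    return math.sin(mu) ** 2

# Hypothetical (false negatives, node-positive cases) pairs.
groups = [(2, 30), (0, 15), (5, 48), (0, 22), (9, 70)]

# Sensitivity analysis: drop every group with an observed FNR of zero.
nonzero = [(x, n) for x, n in groups if x > 0]

full = fixed_effect_fnr(groups)
reduced = fixed_effect_fnr(nonzero)
print(len(groups), len(nonzero), full, reduced)
```

Removing zero-FNR groups can only raise the pooled estimate, which is consistent with the higher rates reported after the 36 such groups were dropped here.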

Conclusions

This meta-analysis includes data from over 9,000 cases from 183 articles that met the inclusion criteria. As shown in Fig. 1, the number of articles reporting FNR peaked in the year 2000 and declined at a rate similar to the ramp-up rate. There was a nonsignificant trend for the FNR to increase with the last time-period quartile, with the highest at 9.2 %. It is unclear whether the most current quartile truly represents the stabilized rate (~9 %) or whether the trend for even higher FNRs will continue.

We focused on injection materials and injection locations as the procedural variables most likely to impact the FNR. These are the variables that can be most easily modified to achieve the best results. There are too few cases available in the literature to make strong conclusions about how other variables impact the FNR, such as those related to institution, surgeon, and the patient. Injection materials were grouped into three categories: dye-only, tracer-only, and dye-and-tracer. There were insufficient cases to perform a more detailed analysis of materials within these categories. Figure 2 shows that dye-only had the highest FNR of the three categories. No difference was observed between tracer-only and dye-and-tracer. These data support the inclusion of tracer when performing SN biopsy in order to achieve the lowest possible FNR.

Injection locations were grouped into seven categories: four single-site injection locations and three multiple-site injection locations. There were no statistically significant variations among these seven location categories. Intratumor injections had the lowest numerical FNR, but there were too few cases to establish that rate as statistically significant. Intradermal-only injections also had a low FNR, but fewer cases resulted in an even wider confidence interval range. Despite the trends, no location stands out as a compelling choice to achieve the lowest FNR. In the absence of such information, the surgeon may consider other factors such as convenience or success rate when choosing the injection location.

More detailed analysis based on material injected and injection location allowed 17 groups of patients to be identified. The results from this more detailed analysis provided no new conclusions. When compared within individual injection materials, location did not make a difference in the FNR. Interestingly, injection material made a difference only when it was injected around the tumor or intradermally.

The results of this meta-analysis are limited to the cases available in the published literature. Consequently, publication bias is possible because only English-language citations in peer-reviewed journals were used. In addition, because small studies may be prone to reporting small FNRs and a true FNR of zero would be extremely unlikely in practice, the sensitivity analysis excluded citations with FNR = 0. The subsequent analysis of this reduced set of citations did not alter the basic conclusions based upon the full set of citations. Other limitations relate to results for the injection sites. Here, intratumor injection (with intradermal injection close behind) may have the lowest FNR, but there are insufficient cases to determine this with certainty. The available data do not support any one injection location over another. There may also be a persistent trend of higher false-negative cases over time. This seemingly increasing trend, however, is not likely ever to be confirmed since at this point confirmatory ANR is not tenable.

This meta-analysis was restricted to axillary nodes and the status of nonaxillary nodes was not considered. There is considerable variation between the different SN biopsy methods to identify nonaxillary SNs. For example, intratumoral or deep tracer injections identify nonaxillary nodes in 15 % or more of patients. About 1 % of clinically node-negative patients have extra-axillary metastases when there are no axillary metastases [12, 13]. Ignoring nonaxillary nodes means that about 1 % of SN metastases will be missed. An interesting question is whether this 1 % of missed SN metastases should be added to the FNR of methods that are restricted to the axilla.

These results also are relevant to training and quality control. The number of cases needed to assess performance accuracy of an individual surgeon cannot be based on that surgeon’s FNR. This has always been true, given the large number of cases necessary to establish an individual surgeon’s FNR. Now, without the ability to perform confirmatory ANR, it will be impossible to determine a surgeon’s FNR. For the same reasons, long-term surgeon performance also cannot be pegged to a FNR. Other metrics for gauging surgeon performance of SN biopsy will need to be established.

In conclusion, the results of this meta-analysis indicate that the material injected had an impact on FNR and that blue dye alone was associated with the highest FNR. Location of injection did not have a significant impact on FNR, indicating that any of the reported locations will achieve a similar FNR. There was a nonsignificant trend for higher FNRs over time, and the FNR for the population of surgeons currently performing this procedure is unknown. Lastly, FNR should not be used as a training or quality control metric.