Introduction

Gene amplification of the human epidermal growth factor receptor 2 (erbB2, HER2/neu, or HER2) has been intensively evaluated in contemporary oncology because of its potential therapeutic implications for several human cancers, notably breast and gastric cancer [1–3]. HER2 is amplified in approximately 20% of invasive breast cancers (IBC) [1]; an overview reported HER2 positivity of 22.2% (range 9–74%) in breast cancer [4]. HER2, a proto-oncogene, has prognostic and predictive relevance in breast cancer in the adjuvant, neoadjuvant, and metastatic settings [2, 5–11]. HER2 status refers to the extent of HER2 protein over-expression on immunohistochemistry (IHC), or of HER2 gene amplification determined by in situ hybridization (ISH) techniques. Guidelines recommend evaluation of HER2 status in every IBC [1, 12–14], primarily to guide treatment with molecularly targeted agents directed against the HER2 receptor. In practice, the HER2 status of a primary breast cancer not only guides initial therapy but is often the basis for selecting patients with metastatic disease who may benefit from anti-HER2-directed therapies. This approach implicitly assumes that the HER2 status of the primary breast cancer reflects that of its corresponding metastasis.

A number of studies have examined HER2 status in primary breast cancer and associated metastases and have reported HER2 discordance in some cases, with the reported proportion of discordance ranging from none to 30% or higher [15–17]. Arguments for or against HER2 testing of metastases have therefore rested on divergent evidence, with little certainty as to whether the reported discordance reflects tumor-related behaviour or is largely a product of testing variability or measurement error [18, 19]. To date, there is little consensus on whether HER2 discordance, if any, between primary breast cancer and its metastasis is a consistent finding, or is of a magnitude that warrants re-testing of metastases.

In this systematic review, we examine and summarise the evidence on HER2 discordance between primary IBC and its paired loco-regional or distant metastasis, with the aim of (1) providing summary estimates for HER2 discordance using all the available evidence and (2) exploring methodological, clinical, tumor, and testing-related variables that may be associated with differences across studies in the proportion of HER2 discordant cases. We hypothesized that this approach might allow us to identify patterns in the proportion of HER2 discordance between primary and metastatic cancer that could point towards likely underlying mechanism(s).

Methods

We performed a systematic review and study-level meta-analysis of studies reporting HER2 status in primary breast cancer and its paired metastasis.

Study eligibility

Studies were eligible for inclusion if they reported HER2 status, based on IHC and/or ISH-based testing, in primary IBC and its paired loco-regional or distant metastases. We included studies reporting cross-classified HER2 status (primary vs. metastasis) that allowed calculation of the discordance rate (proportion of discordant cases) and of the direction of discordance between the primary cancer and associated metastasis. We also included studies where the cross-classified data could be derived from the reported results.

We excluded studies reporting paired HER2 data in fewer than 25% of subjects or in fewer than 20 subjects. We also excluded studies of circulating tumour cells, studies of sentinel node biopsy only (where metastases included micro-metastases or isolated tumour cells), and studies based on post-mortem testing. Studies published as abstracts were ineligible for this review. Appendix 1 provides further information on eligible studies [15–17, 20–42] and on studies excluded [43–50] based on pre-defined criteria.
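The two numeric exclusion rules can be expressed as a single screening check. The sketch below is illustrative only: the function name `meets_paired_data_criteria` is hypothetical, it assumes the 25% threshold is computed against all subjects in a study, and it does not encode the non-numeric criteria (circulating tumour cells, sentinel-node-only series, post-mortem testing, abstracts).

```python
def meets_paired_data_criteria(n_paired: int, n_subjects: int) -> bool:
    """Hypothetical screen: True if paired HER2 data are reported in at least
    20 subjects AND in at least 25% of the study's subjects."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    if not 0 <= n_paired <= n_subjects:
        raise ValueError("n_paired must lie between 0 and n_subjects")
    return n_paired >= 20 and n_paired / n_subjects >= 0.25


print(meets_paired_data_criteria(40, 120))  # True: 40 pairs, 33% of subjects
print(meets_paired_data_criteria(19, 30))   # False: fewer than 20 pairs
print(meets_paired_data_criteria(30, 200))  # False: 15% of subjects
```

A study with exactly 20 pairs in 80 subjects (25%) passes, because the rule excludes only studies below these thresholds.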

Literature search and data extraction

We systematically searched the literature (MEDLINE, EBM Reviews databases to August 2010) for primary studies that met eligibility criteria, using the search and study identification strategy summarised in Appendix 1. Descriptive and quantitative data were extracted by two of three authors (NH and either RB or MB), and disagreement was resolved by discussion and consensus; quantitative HER2 results were independently extracted into a spreadsheet. Extracted information included study and subject characteristics, type and site of metastases, time to metastasis, tumor grade, test-related parameters including scoring criteria, and HER2 status for primary and metastatic tumors.

Quality appraisal

Evidence tables summarising key study characteristics, methodology, and HER2 testing and interpretation parameters were developed with consideration of the American Society of Clinical Oncology/College of American Pathologists recommendations for human epidermal growth factor receptor 2 testing [1] and recommendations for reporting biomarker studies [51, 52]. For a valid comparison of HER2 status in this setting, we assessed whether testing methods and conditions (see “Testing parameters”) were consistent between testing of the primary tumor and of the metastasis, and whether paired testing was performed at the same time and in the same laboratory. Studies not meeting this quality criterion (but otherwise meeting eligibility criteria) were, a priori, not to be excluded; instead, they were pre-specified for sensitivity analysis. We also assessed whether HER2 testing of each tumor was interpreted independently of (“masked” to) the other.

Testing parameters

We examined information on type of test (IHC or ISH based), testing methods (sampling, assays used, interpretation criteria, and interpreters), and test interpretation relative to consensus recommendations [1], while recognising that current test-scoring standards were too recent to have been applied in studies published before 2008. We therefore accepted IHC 0–3+ scoring (based on HercepTest criteria) as “standard” where staining was reported in >10% of tumor cells (further testing details, including sampling and assays used, are provided in Online-only Table A). For analytic purposes, we classified study-specific HER2 scoring and interpretation into pre-defined categories. For IHC, the categories (from most to least consistent with current standards) were: (i) standard (HercepTest) scoring (0–3+) with 3+ counted as positive and equivocal (2+) results confirmed by ISH-based testing, or standard scoring (0–3+) with most subjects tested or confirmed by ISH; (ii) standard scoring (0–3+) with ≥2+ counted as positive, and either no ISH testing, ISH testing in only a minority of equivocal results, or ISH testing in all subjects but HER2 status classified by IHC alone; or (iii) a semi-quantitative scoring system (other than standard) described in the primary study without ISH testing, or other non-standard scoring.

Study-specific criteria for HER2 gene amplification in ISH-based testing were categorised into two groups based on standards [1], with the minimum FISH-ratio threshold lowered to 1.6 to allow the data to be classified: FISH ratio 1.6–2.2 or HER2 gene copy number 4.0–6.0; or FISH ratio >2.2 or HER2 gene copy number >6.0. Study-specific information was extracted on the criteria used for HER2 positivity, the type of antibody (monoclonal or polyclonal) for IHC assays, and the DNA probe for ISH.
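The two ISH categories amount to binning each study's positivity cut-off. A minimal sketch follows; the names `ish_cutoff_category`, `LOWER`, and `UPPER` are hypothetical, and the rule that a reported ratio cut-off takes precedence over a copy-number cut-off when a study reports both is our assumption, not something stated above.

```python
from typing import Optional

LOWER = "ratio 1.6-2.2 / copies 4.0-6.0"
UPPER = "ratio >2.2 / copies >6.0"


def _bin(value: float, low: float, high: float) -> Optional[str]:
    """Place a cut-off in the lower (inclusive range) or upper (> high) category."""
    if value > high:
        return UPPER
    if low <= value <= high:
        return LOWER
    return None  # below the categorised range


def ish_cutoff_category(ratio_cutoff: Optional[float] = None,
                        copy_cutoff: Optional[float] = None) -> str:
    """Hypothetical classifier for a study's ISH amplification threshold.
    Assumption: a FISH-ratio cut-off, if reported and classifiable, is used first."""
    if ratio_cutoff is not None:
        category = _bin(ratio_cutoff, 1.6, 2.2)
        if category is not None:
            return category
    if copy_cutoff is not None:
        category = _bin(copy_cutoff, 4.0, 6.0)
        if category is not None:
            return category
    raise ValueError("cut-off outside the categorised ranges")


print(ish_cutoff_category(ratio_cutoff=2.0))  # lower category
print(ish_cutoff_category(copy_cutoff=6.5))   # upper category
```

Note that a ratio of exactly 2.2 falls in the lower category, because the upper category is defined as strictly greater than 2.2.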

Statistical analysis

Descriptive data [median, inter-quartile range (IQR)] were examined for study-level variables: prevalence of HER2 positivity, subjects’ age, time to metastasis, and tumor grade distribution. For each study, the proportion discordant in HER2 status and its exact 95% confidence limits were computed and displayed in forest plots. The overall pooled estimate of the proportion discordant was based on a random-effects logistic regression model, which allows for both within-study sampling variability and between-study variability (heterogeneity) in estimates of the proportion discordant. Categorical covariates (such as test type, test-related parameters, or type of metastasis) were fitted separately in the model to obtain, for each category, the random-effects pooled estimate of the proportion discordant and its confidence limits, based on the studies that provided covariate-specific data. Quantitative covariates, such as age, were fitted as continuous covariates in the model to examine their association with the proportion discordant. In sensitivity analysis, all modelled analyses were repeated excluding any study considered to be an “outlier” for consistency in testing methodology for paired tests.
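The two computations in this paragraph can be sketched as follows, under stated assumptions. The exact limits use the Clopper-Pearson method (assumed here to be what "exact" refers to), found by bisection on the binomial CDF. For the pooled estimate, the sketch substitutes a simpler technique, DerSimonian-Laird random-effects pooling of logit-transformed proportions with a 0.5 continuity correction, in place of the random-effects logistic regression actually used; it is an approximation for illustration, not the reported analysis. All function names and study counts are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, tol: float = 1e-10) -> float:
    """Root on [0, 1] of a function that increases with p, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(d: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) limits for d discordant pairs out of n."""
    # Lower limit solves P(X >= d | p) = alpha/2; P(X >= d) increases with p.
    lower = 0.0 if d == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(d - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= d | p) = alpha/2; P(X <= d) decreases with p.
    upper = 1.0 if d == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(d, n, p))
    return lower, upper


def _expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def dl_pooled_proportion(counts, z: float = 1.959964):
    """DerSimonian-Laird pooling of logit proportions (stand-in technique).
    counts: list of (discordant, total); returns (pooled, lower, upper, tau^2)."""
    if len(counts) < 2:
        raise ValueError("need at least two studies")
    ys, vs = [], []
    for d, n in counts:
        a, b = d + 0.5, n - d + 0.5      # 0.5 continuity correction (assumption)
        ys.append(math.log(a / b))       # logit of the corrected proportion
        vs.append(1 / a + 1 / b)         # approximate within-study variance
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)                     # between-study variance
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return _expit(y_re), _expit(y_re - z * se), _expit(y_re + z * se), tau2


# Hypothetical (discordant, paired) counts for four studies.
studies = [(2, 40), (5, 60), (1, 35), (9, 50)]
for d, n in studies:
    lo, hi = clopper_pearson(d, n)
    print(f"{d}/{n}: {d / n:.3f} (95% CI {lo:.3f}-{hi:.3f})")
print(dl_pooled_proportion(studies))
```

A logistic random-effects model (a generalised linear mixed model) works on the binomial likelihood directly and avoids the continuity correction, which is one reason its estimates can differ from this moment-based approximation.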

The direction of HER2 discordance was examined by classifying discordant pairs, at the study level, according to whether the HER2-positive result was from the primary or the metastasis; the difference (primary minus metastasis) was then computed. The Wilcoxon signed rank test was used to assess whether there was any evidence of a systematic direction for this difference. McNemar’s test was also applied to these data after classifying the directional difference for each study as positive, negative, or equal.
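A study-level Wilcoxon signed rank test on these differences can be sketched as below. This is an illustration under assumptions: it uses the normal approximation (zero differences dropped, average ranks for ties, tie-corrected variance, no continuity correction), which may not match the variant actually used, and the per-study counts are hypothetical. A negative difference means more discordant pairs with a HER2-positive metastasis.

```python
import math


def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed rank test, normal approximation.
    Returns (W+, z, P)."""
    d = [x for x in diffs if x != 0]          # drop studies with equal counts
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:                              # assign average ranks to tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                 # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-study counts of discordant pairs:
# (HER2+ primary with HER2- metastasis, HER2- primary with HER2+ metastasis)
counts = [(1, 3), (0, 2), (2, 2), (1, 4), (0, 1), (3, 1), (0, 3)]
diffs = [a - b for a, b in counts]           # primary minus metastasis
print(wilcoxon_signed_rank(diffs))
```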

Results

Twenty-six studies met our inclusion criteria [15–17, 20–42] (Appendix 1); these provided paired HER2 data for primary and metastatic tumor in 2,520 subjects. The characteristics of these studies are summarised in Table 1, including definition of eligible subjects, site of metastases, and quality appraisal. Additional detailed testing methods (type of test and samples, assays, scoring and interpretation criteria) are in Online-only Table A. The median proportion of HER2 positivity in primary breast cancers in these studies was 26.1% (IQR 18.8–36.6%). Study-specific median age (21 studies) was 52.0 years (IQR 50.5–53.0 years). Median time to metachronous metastasis (13 studies) was 34.5 months (IQR 28.8–59.5 months). Median distribution for tumor grade (13 studies) was grade I in 8.2% (IQR 6.7–10.8%), grade II in 39.0% (IQR 29.0–45.2%), and grade III in 52.0% (IQR 48.1–63.8%).
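Study-level summaries of this kind (median and IQR across studies) can be reproduced with the Python standard library. The sketch assumes the "inclusive" quartile method; software packages use different quartile definitions, so IQR limits may differ slightly from those reported. The function name and input values are hypothetical.

```python
import statistics


def median_iqr(values):
    """Median and quartiles (Q1, Q3) of study-level values,
    using the 'inclusive' quartile method (an assumption)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


# Hypothetical study-level HER2-positivity percentages in primary tumours.
positivity = [14.0, 18.5, 22.0, 26.1, 30.4, 36.0, 41.2]
med, q1, q3 = median_iqr(positivity)
print(f"median {med:.1f}% (IQR {q1:.1f}-{q3:.1f}%)")
```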

Table 1 Primary studies of HER2 in IBC and paired metastases: study characteristics and quality appraisal

All studies were retrospective except for two prospective studies [37, 41]. As shown in Table 1, most studies included subjects who had undergone biopsy of metastases and had tissue specimens available for both primary and metastasis; only six studies included or sampled subjects from a well-defined cohort or therapeutic clinical trial. Santinelli et al. [37] examined several groups of subjects and is included in the quantitative analysis as two series; hence, estimates are reported for “27 studies”.

Consistency of (within-subject) testing conditions for paired primary cancer and metastasis

Table 1 and Online-only Table A summarise the evidence on testing parameters. Studies generally examined HER2 in the primary tumor and its paired metastasis (within subjects) using the same test and testing methods and the same scoring and interpretation criteria, in the same laboratory and under the same conditions. Primary and metastatic tumors were tested at the same time (predominantly using representative tissue sections from formalin-fixed, paraffin-embedded primary and metastatic breast cancer). The one exception was the study by Lower et al. [17], which included subjects attending a physician’s practice, so that HER2 testing was not performed under the same conditions (Table 1). All other studies tested primary and metastasis using generally consistent methods, and were examined in two groups in a preliminary analysis of within-subject consistency of HER2 testing: (1) 13 studies that maintained all conditions of testing for primary and metastatic tumor [16, 22–24, 27, 29, 30, 32, 33, 36, 37, 40]; and (2) 13 studies that used the same test and assay and largely maintained testing methods, but either omitted information on one testing aspect [15, 20, 21, 25, 26, 28, 31, 34, 35, 42] or were inconsistent in only one testing element [38, 39, 41]. In this second group, the gaps were mostly missing information on test interpreters rather than actual testing differences; only three studies were inconsistent, and only in the type of samples used [38, 39, 41] (Table 1). Pooled estimates of the proportion of HER2 discordance did not differ significantly between group 1 studies (HER2 discordant 5.9%; 95% CI 3.5–9.9%) and group 2 studies (HER2 discordant 4.6%; 95% CI 2.6–8.1%), P = 0.53. Therefore, based on the pre-defined quality appraisal criteria for consistency of paired testing (see “Methods” section), only the study by Lower [17] differed sufficiently on this criterion to warrant sensitivity analysis.
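As a rough check, the group comparison can be approximated from the published estimates alone. The sketch below is a back-of-envelope Wald test, not the model-based test used here: it assumes each 95% CI is symmetric on the logit scale and that the two pooled estimates are independent. With the group 1 and group 2 figures above it happens to give P close to 0.53, but it need not reproduce other model-based P-values exactly. Function names are hypothetical.

```python
import math

Z975 = 1.959964  # standard normal 97.5th percentile


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def wald_from_ci(est: float, lo: float, hi: float):
    """Logit estimate and approximate SE, recovered from a 95% CI
    assumed symmetric on the logit scale."""
    return logit(est), (logit(hi) - logit(lo)) / (2 * Z975)


def compare_pooled(a, b):
    """Approximate two-sided P for two independent pooled proportions,
    each given as (estimate, lower 95% limit, upper 95% limit)."""
    ya, sa = wald_from_ci(*a)
    yb, sb = wald_from_ci(*b)
    z = (ya - yb) / math.sqrt(sa ** 2 + sb ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Group 1 vs group 2 estimates reported above, as proportions.
z, p = compare_pooled((0.059, 0.035, 0.099), (0.046, 0.026, 0.081))
print(f"z = {z:.2f}, P = {p:.2f}")  # prints: z = 0.63, P = 0.53
```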

Pooled proportion of HER2 discordance between primary and metastasis

Forest plots of the studies included in this review, stratified by test used (study-specific data ordered by increasing HER2 discordance within test strata), are displayed in Fig. 1. The pooled estimate of the proportion of HER2 discordance between primary and metastasis (HER2 discordant%) was 5.5% (95% CI 3.6–8.5%), or 5.2% (95% CI 3.5–7.8%) when the study from Lower [17] was excluded. Modelled estimates of HER2 discordant% were not associated with the proportion of HER2 positivity in primary cancers (P = 0.42), median age (P = 0.78), or study time-frame (P = 0.22). In the subset of studies reporting time to metachronous metastases (13 studies), median time to metastasis was not associated with HER2 discordant% (P = 0.48).

Fig. 1

Study-specific estimates and pooled estimate for HER2 discordant % in studies reporting HER2 status in primary breast cancer and paired metastasis. D/N* = number of HER2 discordant cases/total number of paired primary breast cancers and metastases. Estimates for HER2 discordant % for each group of studies according to the test used are shown in Table 2 (details of test scoring and interpretation criteria are described in the “Methods” section). The group of studies indicated by IHC/FISH represents studies that assigned an overall HER2 status based on IHC and FISH using the following algorithms: Guarneri [36] considered HER2 positive where IHC was 3+, IHC was 2+ with FISH amplification, or FISH was amplified; Santinelli [37] considered HER2 positive where IHC was 3+, or IHC was 2+ with FISH amplification; Simon [35] considered HER2 positive where IHC was 2+ or 3+, or FISH was amplified, and reported 94% concordance between IHC and FISH in their data. The work of Santinelli [37] has been included as two series (nodes; LR/DM = local recurrence or distant metastases).
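The three combined IHC/FISH rules in the caption above can be encoded side by side to show how the same tumour may be classified differently. This is a literal encoding of the caption, with one labelled assumption: when FISH was not performed (`None`), it is treated as not amplified. The function name and the string keys are hypothetical.

```python
from typing import Optional


def her2_positive(rule: str, ihc: int, fish_amplified: Optional[bool]) -> bool:
    """Overall HER2 status under the IHC/FISH rules described in the caption.
    ihc: IHC score 0-3; fish_amplified: None if FISH was not done
    (treated as not amplified, an assumption)."""
    if ihc not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2 or 3")
    amp = bool(fish_amplified)
    if rule == "Guarneri":
        # IHC 3+, or IHC 2+ with FISH amplification, or FISH amplification
        return ihc == 3 or (ihc == 2 and amp) or amp
    if rule == "Santinelli":
        # IHC 3+, or IHC 2+ with FISH amplification
        return ihc == 3 or (ihc == 2 and amp)
    if rule == "Simon":
        # IHC 2+ or 3+, or FISH amplification
        return ihc >= 2 or amp
    raise ValueError(f"unknown rule: {rule}")


# An IHC 1+ tumour with FISH amplification is positive under two of the rules.
for rule in ("Guarneri", "Santinelli", "Simon"):
    print(rule, her2_positive(rule, 1, True))
```

Written this way, Guarneri's rule reduces to "IHC 3+ or FISH amplified", and Simon's rule is the most inclusive of the three because it counts IHC 2+ as positive without FISH confirmation.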

Table 2 summarises overall and covariate-specific estimates of HER2 discordant% for categorical variables, and the association of covariates in modelled estimates (study-level covariates examine between-study variability in HER2 discordance). The sensitivity analysis reports each estimate excluding the study from Lower et al. [17]. We took this approach because the study from Lower [17] was an “outlier” with respect to within-subject testing consistency, not because of the quality of this study, which focused on survival (and reported relatively better survival in subjects with HER2-negative primary cancer and HER2-positive metastasis) [17].

Table 2 Modelled estimates of HER2 discordant proportion in studies of HER2 status in primary breast cancer and its paired metastasis

Type of test used was not associated with HER2 discordant% across studies (P = 0.27). The small group of studies indicated by IHC/FISH represents studies reporting an “overall” HER2 status based on combined IHC and FISH results: although these studies had a relatively higher summary HER2 discordant%, this estimate had a wide confidence interval. Within the subset of IHC studies, neither scoring and interpretation criteria nor type of antibody was significantly associated with HER2 discordant%. There were too few studies primarily using ISH-based testing to allow comparative analysis; however, modelled estimates of HER2 discordant% were very similar across ISH scoring criteria categories and across types of probe (Table 2).

In contrast, type (site) of metastasis was significantly associated with modelled estimates of HER2 discordant% across studies (P = 0.0017); this association was unchanged in sensitivity analysis (Table 2). Figure 2 summarises study-specific estimates stratified by type of metastasis (data ordered by increasing HER2 discordance), and pooled estimates for each metastasis-type grouping (see also the Fig. 2 footnote). Models comparing different groupings for the type of metastasis paired with the primary are shown in Table 2: studies with distant metastases had higher HER2 discordant% (11.5%; 95% CI 6.9–18.6%) than those with nodal metastases only (4.1%; 95% CI 2.4–7.2%), P = 0.0082. Studies pairing the primary with distant metastases also had higher HER2 discordant% (11.5%; 95% CI 6.8–18.6%) than those including nodal metastases only or various metastases (3.3%; 95% CI 2.0–5.6%), P = 0.0011. These associations persisted in sensitivity analysis. The group of studies pairing the primary with various metastases (mostly nodal or loco-regional metastases, with a minority of distant metastases) had the lowest HER2 discordant% (1.42%; 95% CI 0.40–4.86%), although its confidence interval overlapped with that of the group of studies with nodes only. HER2 discordant% was also associated with whether the paired metastasis was synchronous or metachronous to the primary cancer (Table 2), being higher for studies that included only metachronous metastases. This variable correlates with the type of metastasis: almost all distant metastases were metachronous, and the vast majority of lymph node metastases were synchronous.

Fig. 2

Study-specific and pooled estimates for HER2 discordant % stratified by type of metastasis considered in each study, for studies reporting HER2 in primary breast cancer and paired metastasis. D/N* = number of HER2 discordant cases/total number of paired primary breast cancers and metastases. Study-specific data ordered by increasing HER2 discordance within each metastasis-type stratum. The group of studies that included a mix of metastases (metastatic nodes, local recurrence, or distant metastases) had ≤25% distant metastases. # Group of studies that included distant metastases or local recurrence comprised approximately 67% distant metastases and 33% local recurrence. The work of Santinelli [37] has been included as two series (nodes; LR/DM = local recurrence or distant metastases).

Direction of HER2 discordance

Based on the McNemar test, there was no evidence of a systematic direction of discordance across studies (P = 0.13): 14 studies had discordant pairs with an overall change in HER2 from negative primary cancer to positive metastasis, 7 studies had discordant pairs with an overall change from positive primary to negative metastasis, and 6 studies had neutral (equal) discordance direction or no discordance. However, exclusion of the outlier study [17] provided weak evidence (P = 0.074) that discordance in the direction of HER2-negative primary cancer to HER2-positive paired metastasis was more likely than the reverse. The Wilcoxon signed rank test gave very similar results (data not shown): after excluding Lower et al. [17], there was weak but consistent evidence of a systematic direction of HER2 discordance across studies, with more studies showing change to a HER2-positive paired metastasis (P = 0.067).
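Treating each study as a unit classified by its net direction, and setting aside the 6 neutral studies, the McNemar statistic can be computed from the counts above. This sketch assumes the uncorrected statistic, chosen because it reproduces the reported values; whether a continuity correction was used is not stated. With 14 versus 7 studies it gives P ≈ 0.127, consistent with the reported P = 0.13. If the excluded study [17] was one of the seven positive-to-negative studies (our inference from the P-values, not stated in the text), 14 versus 6 gives P ≈ 0.074.

```python
import math


def mcnemar(b: int, c: int, continuity: bool = False):
    """McNemar chi-square (1 df) for b vs c discordant units.
    P is the upper tail of chi-square(1): erfc(sqrt(stat / 2))."""
    if b + c == 0:
        raise ValueError("no discordant units")
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


print(mcnemar(14, 7))   # all 27 series: statistic 2.33, P about 0.13
print(mcnemar(14, 6))   # one positive-to-negative study removed
```

Applying the continuity correction instead (`continuity=True`) gives P ≈ 0.19 for 14 versus 7, which is why the uncorrected form appears to match the reported analysis.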

Discussion

Defining the potential for, and the implications of, discordance in HER2 status between primary breast cancer and its metastasis requires evaluation of two issues: first, quantifying the frequency of discordance and identifying associated factors (if any); and second, determining whether discordance in HER2 status affects clinical care, response to treatment, or prognosis. Our work contributes to the first of these issues through evidence synthesis from observational studies. The second requires prospective trials examining clinical end-points, particularly in the era of HER2-targeted adjuvant treatments. We examined HER2 discordance between paired primary and metastatic breast cancer in a meta-analysis to explore factors associated with differences in discordant proportions across studies. The pooled proportion of HER2 discordance was modest (5.5%; 3.6–8.5%). However, there was a significant association between HER2 discordant proportions and type of metastasis; in particular, studies pairing the primary tumor with distant metastases showed higher proportions of HER2 discordance (9.6%; 4.9–17.7%) than those pairing the primary with lymph node metastases only (4.2%; 2.5–7.1%).

There are many challenges to deriving clear answers from existing data on the stability of HER2 status between primary cancer and its metastasis, including the relatively small size of studies, differences between studies in HER2 testing without centralised evaluation, and predominantly retrospective data sets. Notwithstanding these limitations, and given that these studies provide an opportunity to explore HER2 discordance in the pre-adjuvant trastuzumab era (without selection based on HER2 positivity), our meta-analysis revealed consistent themes that provide insight into underlying mechanisms. Discordance in HER2 status between primary and metastatic breast cancer may be due to various factors, including tumor progression (genetic drift or clonal selection for a particular HER2 phenotype), test-related technical and interpretation parameters including imperfect test reproducibility, and intra-tumoral heterogeneity of HER2 gene amplification (clonal selection, or sampling differences causing ascertainment bias). Press et al. [53] have shown that intra-tumoral heterogeneity is rare, estimating that intra-tumoral HER2 heterogeneity occurred in <1% of women participating in clinical trials. Experts emphasize caution in interpreting HER2 discordance between primary and metastatic tumor because the contribution of each of the factors outlined above is unknown [18]; we did not aim to quantify these factors in this meta-analysis. Our work primarily explored patterns in HER2 discordance estimates across studies. In models examining test-related factors, we found that although HER2 testing varied between studies (while being performed at the same time for paired tumors and largely consistent within subjects in all but one study [17]), this variation did not account for the diversity in HER2 discordance estimates.
HER2 discordance was, however, associated with the type of metastasis, suggesting that, on the basis of this meta-analysis, it may reflect tumor-related factors. Our finding of weak evidence in sensitivity analysis (P = 0.074) that discordance in the direction of HER2-negative primary cancer to HER2-positive paired metastasis was more likely than the reverse further supports tumor-related factors as a possible underlying mechanism of HER2 discordance.

Our findings do not discount variability in HER2 testing, interpretation, or reproducibility [18, 54–57] as possible or partial explanations for HER2 discordance between primary and metastatic tumor in individual cases or studies; because HER2 testing has imperfect reproducibility, the absolute proportion of discordance may be overestimated. However, the effect of testing variability between studies was not significant in our pooled analysis, and the absence of association in our models for test-related factors (based on study-level measures that included test type and scoring and interpretation criteria) indicates that these factors did not account for differences across studies in HER2 discordant proportions. As outlined earlier, our modelled estimates show a statistically significant association with type of metastasis, with higher HER2 discordance in studies pairing the primary with distant metastases (or that included a majority with distant metastases) than in those pairing the primary with regional lymph node metastases only or various metastases (Table 2). We acknowledge that, because of inherent limits on test reproducibility, repeated HER2 measurements would be expected to give some discordant results by chance alone [18]. However, this cannot account for the associations shown in our meta-analysis: discordance caused by imperfect test reproducibility would not differ by type of metastasis (rather, it would be expected to occur randomly across studies) unless HER2 testing of primary and metastasis was separated in time. Importantly, as shown in Table 1, all studies except the one outlier [17] performed same-time HER2 testing with consistent testing conditions for paired tumors.

One potential limitation, given that HER2 testing is subject to variability in technique and reproducibility, is whether it is appropriate to pool the data. We addressed this in several ways: the effect of test-related parameters on discordance estimates was examined in our models; we used random-effects logistic regression models to allow for study heterogeneity; and study-specific descriptive information on HER2 testing (sampling, assays, and scoring) was summarized in evidence tables. Furthermore, our meta-analysis is agnostic with respect to the absolute accuracy of HER2 testing: its focus is the relative HER2 status of primary and paired metastatic tumor within subjects, compared analytically across (between) studies. The first requirement for valid comparisons is therefore that testing conditions were maintained within subjects (see “Consistency of testing conditions”). This was carefully considered in quality appraisal (Table 1), examined in our models, and formed the basis for excluding the outlier study [17] in sensitivity analysis. Once within-subject testing consistency was ascertained, our models considered covariates (Table 2) to investigate potential explanatory factors for between-study variability in HER2 discordance. Of note, the pre-defined categories for HER2 test interpretation criteria were used in our models to investigate differences in study-specific interpretation relative to current standards, not to judge the interpretive quality of testing; the latter cannot be determined in a study-level meta-analysis. It is also possible that our test-related parameters, while capturing pre-analytic (sampling), analytic (test, assay), and interpretation (scoring, positive/amplified threshold) variables, did not measure all sources of testing variability.
For example, pre-analytic factors such as the age and processing of archived tissue may have contributed to some of the observed discordance in paired testing and between studies. Neither study-level nor retrospective individual-person-data meta-analysis can fully allow for all technical variability in HER2 testing between studies; the optimal strategy would be a prospectively implemented individual-person-data meta-analysis with centralised paired HER2 testing.

Guidelines recommend HER2 testing of primary IBC (at diagnosis) or at the time of metastatic relapse [12], and some recommend testing metastases if the HER2 status of the primary is unknown [58]. Our models report several estimates of HER2 discordance between primary and paired distant metastases for different comparisons; the estimate based on the largest number of cases of primary paired with distant metastases was 11.5% (6.8–18.7%). Our findings should not be taken as grounds to change practice in HER2 testing of distant metastases. However, they support reviewing recommendations to consider selective HER2 re-testing of metastases where biopsy is justifiable on clinical grounds and the result would affect clinical decision-making. Two prospective studies in our analysis indicated that HER2 re-testing of metastases led, in some subjects, to a change to trastuzumab therapy or to participation in a clinical trial [41], or identified therapeutically relevant HER2 discordance [37]. Any change in the current approach to testing metastases should factor in the possibility that re-testing may yield a false-negative HER2 status [18]. Where feasible, re-testing should include simultaneous testing of primary and metastasis using the same methodology [18].

We report evidence that type of metastasis was the main factor associated with the diverse estimates of HER2 discordance between primary breast cancer and its paired metastasis in this meta-analysis. Supporting this conclusion are the observations that HER2 discordance was significantly more likely in studies pairing the primary with distant metastasis than in those with regional nodal metastasis; weak evidence suggesting systematic directionality in discordance; an association of HER2 discordant proportion with whether metastases were synchronous or metachronous to the primary cancer; and a lack of association between HER2 discordant proportion and test-related parameters. Further research is needed to determine whether the associations observed in this meta-analysis are due to biological factors related to type of metastasis, such as acquired changes in molecular phenotype during tumor evolution, or reflect other variables associated with type of metastasis.