Introduction

Human epidermal growth factor receptor 2 (HER2) serves as an important prognostic biomarker in breast cancer (BC), predicting recurrence risk and patient outcomes. Approximately 15–20% of BC cases exhibit HER2 gene amplification or protein overexpression. For over two decades, HER2 overexpression has been recognized as an adverse prognostic factor [1,2,3,4].

However, the advent of anti-HER2 targeted drugs, such as trastuzumab and pertuzumab, remarkably improved the clinical outcomes of patients with HER2-positive, but not HER2-negative, BC [3, 5,6,7]. Given anti-HER2 drugs’ side effects and significant cost, accurate determination of HER2 status is crucial before offering them to BC patients.

International guidelines have been established to standardize and optimize HER2 testing protocols [8]. According to the current HER2 scoring guideline, BC is classified into two groups: HER2-positive, defined as a score of 3 + on immunohistochemistry (IHC) staining or IHC score 2 + accompanied by HER2 gene amplification detected through in situ hybridization (ISH); and HER2-negative, encompassing HER2 IHC scores of 0 and 1 +, or IHC score 2 + and negative for gene amplification per ISH (as outlined in Table 1) [8]. However, recent insights from the DESTINY-Breast04 trial have led to the recognition that patients with metastatic BC with low levels of HER2 expression (HER2-low), specifically defined as HER2 IHC scores 1 + or 2 + and negative for gene amplification per ISH, can benefit from the novel anti-HER2 antibody–drug conjugate (ADC) trastuzumab deruxtecan (T-Dxd) [9]. This emerging therapy challenges the existing binary HER2 classification, prompting a reclassification of HER2-negative cancers into the categories of HER2-low and HER2-zero.
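The classification described above can be written out as a small decision rule. The following Python sketch is illustrative only: the function name `her2_category` and the string labels are assumptions made here, not part of any guideline or software, and the rule simply restates the binary scheme with the HER2-negative group split into HER2-low and HER2-zero.

```python
from typing import Optional


def her2_category(ihc_score: str, ish_amplified: Optional[bool] = None) -> str:
    """Map an IHC score (plus the ISH result for 2+ cases) to a HER2 category.

    Hypothetical helper: it restates the rule given in the text and is not
    taken from any guideline document or vendor software.
    """
    if ihc_score == "3+":
        return "HER2-positive"
    if ihc_score == "2+":
        # IHC 2+ is resolved by in situ hybridization.
        if ish_amplified is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return "HER2-positive" if ish_amplified else "HER2-low"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "0":
        return "HER2-zero"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")


# Under the binary scheme, HER2-low and HER2-zero are both HER2-negative.
print(her2_category("2+", ish_amplified=False))  # HER2-low
```

Under this rule, the two categories examined in the present study (IHC 0 and 1 +) fall on opposite sides of the HER2-zero/HER2-low boundary.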

Table 1 HER2 immunohistochemistry score criteria

Up to this point, the conventional approach for assessing HER2 status in BC involves a combined IHC and ISH test [8]. Nonetheless, this test remains susceptible to multiple pre-analytical and analytical variables that could impact testing sensitivity and reproducibility, particularly in distinguishing between HER2 IHC score 0 and 1 + BC. Among these analytical variables, the notable inter- and intra-observer variability introduced by pathologists is an important influencing factor in HER2 assessment [10,11,12,13]. Researchers have sought alternative methods to achieve more accurate HER2 status evaluation. Digital image analysis (DIA) has garnered attention for its potential use in quantitative IHC assays and has emerged as an objective and reproducible scoring technique for assessing HER2 IHC results [14,15,16,17,18,19,20,21,22,23]. Studies have indicated that DIA has the potential to reduce equivocal cases in HER2 IHC assessments, emphasizing its promising role in this field [19,20,21].

The primary objective of this pilot study was to evaluate interobserver reproducibility of HER2-low IHC scoring among breast subspecialty pathologists within our institution. Our investigation focused on cases of BC with HER2 IHC score 0 and 1 +, because the distinction between these two categories has not been consistently applied, and interobserver reproducibility in this context remains inadequately studied. Furthermore, we assessed the precision of DIA and agreement between DIA scoring and pathologist scoring of HER2-low BC within our study cohort.

Materials and methods

HER2 immunohistochemistry (IHC) scoring

We searched the pathology laboratory information system at our institution for records of breast core biopsy diagnosed as invasive mammary carcinoma with negative HER2 status (according to 2018 American Society of Clinical Oncology/College of American Pathologists [ASCO-CAP] guidelines) in the original pathology report between February 2022 and August 2022. Exclusion criteria were as follows: (1) microinvasive carcinoma, (2) presence of in situ components (ductal carcinoma in situ or lobular carcinoma in situ), and (3) tumor size < 2 mm or < 10% of the core specimen. A total of 50 cases were selected, including 25 with a HER2 IHC score 0 and 25 with a score 1 +. All biopsy specimens met the College of American Pathologists guidelines for specimen collection time (cold ischemia time less than 1 h) and fixation time (at least 6 h and no longer than 72 h of fixation in 10% formalin). HER2 IHC staining was performed using the 4B5 HER2/neu antibody and the Ventana BenchMark Ultra automatic immunostainer (Roche). The study received approval from the institution’s Quality Improvement Assessment Board.

Whole slides of HER2 IHC-stained core biopsy were digitally scanned using Aperio’s ImageScope v12.4.6.5001 (Leica Biosystems) and uploaded to a shared drive. The case numbers were anonymized, and the slide images were randomized for the participating pathologists. In the initial round of scoring, six subspecialized breast pathologists independently reviewed and scored the slide images. In the second round of scoring, all pathologists took part in an online consensus meeting to discuss the cases in which fewer than 5 of 6 pathologists were in agreement and to assign the final consensus scores.

Digital image analysis (DIA)

DIA was applied to the same cohort of slide images using Aperio’s Membrane Algorithm v9.1 (Leica Biosystems) as per the product instructions. In this study, the IHC Membrane Image Analysis algorithm automatically selected tumor regions for analysis. The algorithm identifies and quantifies membrane staining intensity and completeness in individual tumor cells within the selected regions. Because none of the study cases contained in situ carcinoma, all tumor cells identified by the algorithm were included in the analysis. Tumor cells are categorized as 0, 1 +, 2 +, or 3 + based on their membrane staining intensity and completeness: a cell is classified as 1 + when membrane staining is weak and incomplete, 2 + when it is moderate and complete, and 3 + when it is intense and complete. The slide score (DIA 0, 1 +, 2 +, or 3 +) is determined from the percentages of cells in each class. A score of DIA 3 + is assigned if > 10% of cells show 3 + staining, DIA 2 + if > 10% of cells exhibit 2 + or higher staining, DIA 1 + if > 10% of cells display 1 + or higher staining, and DIA 0 if < 10% of cells exhibit 1 + or higher staining.
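The slide-level rule can be illustrated with a short Python sketch. This is not the vendor's implementation: the function name `dia_slide_score` and its parameters are hypothetical, and the handling of a value of exactly 10% (assigned to the lower score here) is an assumption, because the rule as stated leaves that boundary unspecified.

```python
def dia_slide_score(pct_1: float, pct_2: float, pct_3: float,
                    cutoff: float = 10.0) -> str:
    """Derive a slide-level score from per-cell class percentages.

    pct_1, pct_2, pct_3 are the percentages of analyzed tumor cells
    classified as 1+, 2+, and 3+ (the remainder are class 0).
    Assumption: a cumulative percentage exactly equal to the cutoff is
    not counted as exceeding it.
    """
    if pct_3 > cutoff:
        return "3+"
    if pct_2 + pct_3 > cutoff:          # 2+ or higher staining
        return "2+"
    if pct_1 + pct_2 + pct_3 > cutoff:  # 1+ or higher staining
        return "1+"
    return "0"


# Illustrative values only, not data from the study.
print(dia_slide_score(pct_1=6.0, pct_2=1.5, pct_3=0.0))   # 0
print(dia_slide_score(pct_1=14.0, pct_2=2.0, pct_3=0.0))  # 1+
```

Because each threshold is applied to the cumulative percentage at that intensity or higher, the checks must run from 3 + downward, as in the sketch.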

Statistical analysis

Interobserver reproducibility among pathologists and concordance between pathologist consensus scores and DIA results were assessed using the Kendall coefficient of concordance (W), calculated with SPSS Statistics software. The W value indicates the level of agreement: 0.80–1.00 indicates excellent agreement; 0.60–0.79, good agreement; 0.40–0.59, moderate agreement; 0.20–0.39, slight agreement; and 0.00–0.19, poor agreement.
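For readers who wish to see how W is obtained, the following Python sketch computes the Kendall coefficient of concordance using the standard tie-corrected formula and maps the result to the agreement bands listed above. It is a minimal illustration and not the SPSS procedure used in this study; the function names are hypothetical, and scores are assumed to be coded numerically (for example 0, 1, 2 for IHC 0, 1 +, 2 +).

```python
from collections import Counter


def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[r][c] is rater r's score for case c."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(ranked[r][c] for r in range(m)) for c in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    # Tie correction: sum of (t^3 - t) over every group of tied scores per rater.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives one constant score")
    return 12 * s / denom


def agreement_label(w):
    """Map W to the agreement bands used in this paper."""
    for cutoff, label in [(0.80, "excellent"), (0.60, "good"),
                          (0.40, "moderate"), (0.20, "slight")]:
        if w >= cutoff:
            return label
    return "poor"


# Toy example (not study data): three raters, four cases, identical scores.
toy = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1]]
print(kendalls_w(toy), agreement_label(kendalls_w(toy)))
```

The tie correction matters in this setting because each rater assigns only two or three distinct scores across 50 cases, so almost every rank is shared.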

Results

Interobserver reproducibility for HER2 IHC scoring

Detailed scores for each case in both rounds of scoring are presented in Table 2 and Fig. 1. In the initial round, each of the six pathologists evaluated 50 cases, resulting in a total of 300 individual scores (150 scores for score 0 cases and 150 scores for score 1 + cases). Among the 25 cases originally scored as 0, 133 scores remained 0 (88.7%) and 17 scores changed to 1 + (11.3%). There were no higher scores within this group. For the 25 cases originally scored as 1 +, 113 scores remained 1 + (75.3%), 12 scores changed to 0 (8%), and 25 scores changed to 2 + (16.7%).

Table 2 Individual and consensus pathologist scoring of HER2 immunohistochemistry staining in 50 cases of invasive mammary carcinoma
Fig. 1 HER2 immunohistochemistry scoring of 50 cases of invasive mammary carcinoma, as determined by 6 individual pathologists, the pathologist consensus score, and the output from digital image analysis (DIA), where the left panel represents cases with an original score of 0 and the right panel represents cases with an original score of 1 + (note: one case could not be evaluated by DIA due to low cellularity, < 500 nuclei)

Complete agreement (6/6 pathologists) was achieved in 19 cases (38%), comprising 17 of the 25 cases originally scored as 0 (68%) and 2 of the 25 cases originally scored as 1 + (8%). Agreement from ≥ 5/6 pathologists was reached in 33 cases (66%), including 18 of the 25 cases originally scored as 0 (72%) and 15 of the 25 cases originally scored as 1 + (60%). The Kendall W value for overall agreement was 0.828, indicating excellent agreement among pathologists.
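The agreement tallies above depend on how many of the six pathologists gave the most common score for each case. The Python sketch below shows one way to compute that count; the function names are hypothetical, and the four example cases reproduce only the score splits described in the Fig. 2 caption, not the full study data.

```python
from collections import Counter


def agreement_level(scores):
    """Return (most common score, number of raters who assigned it)."""
    score, count = Counter(scores).most_common(1)[0]
    return score, count


def cases_with_agreement(cases, min_raters):
    """Count cases in which at least min_raters gave the same score."""
    hits = sum(1 for scores in cases if agreement_level(scores)[1] >= min_raters)
    return hits, 100 * hits / len(cases)


# Score splits taken from the four panels of Fig. 2 (A-D).
fig2_cases = [
    ["0"] * 6,                # A: 6/6 scored 0
    ["0"] * 2 + ["1+"] * 4,   # B: 2/6 scored 0, 4/6 scored 1+
    ["1+"] * 6,               # C: 6/6 scored 1+
    ["1+"] * 3 + ["2+"] * 3,  # D: 3/6 scored 1+, 3/6 scored 2+
]
print(cases_with_agreement(fig2_cases, min_raters=5))  # (2, 50.0)
```

Applied to all 50 cases, the same tally with `min_raters=6` and `min_raters=5` corresponds to the 19 (38%) and 33 (66%) cases reported above.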

In the second round, a consensus meeting was held to re-evaluate 7 cases from the 0 score group and 10 cases from the 1 + score group. This round resulted in the final consensus scores shown in Table 2. Two cases from the 0 score group were re-scored as 1 + (cases 18 and 21), and three cases from the 1 + score group were re-scored as 0 (cases 11, 17, and 25). Examples of HER2 IHC images with and without complete pathologist agreement in the first round of scoring are shown in Fig. 2.

Fig. 2 Example HER2 immunohistochemistry images (scale bar = 200 µm) included in the study set and assigned a score of A 0 (complete agreement from 6/6 pathologists), B 0 (2/6 pathologists) or 1 + (4/6 pathologists) with a final consensus meeting score of 0, C 1 + (complete agreement from 6/6 pathologists), and D 1 + (3/6 pathologists) or 2 + (3/6 pathologists) with a final consensus meeting score of 1 + (note: digital image analysis results were concordant with all consensus scores)

Concordance of pathologist scores and DIA

Aperio’s Membrane Algorithm v9.1 was applied to the 50 slide images to score HER2 IHC staining. This algorithm quantifies membrane staining intensity and completeness in individual tumor cells within selected regions. Of the 50 cases studied, one case from the 0 score group was excluded from analysis due to scant tumor nuclei (< 500, case 37). The average number of analyzed tumor cells per case was 4964 (range 722–16,949). Detailed DIA results are presented in Table 3.

Table 3 Digital image analysis (DIA) of HER2 immunohistochemistry staining in 50 cases of invasive mammary carcinoma

Concordance between DIA scores and pathologist scores was evaluated. In cases with agreement from ≥ 5/6 pathologists (17 score 0 cases and 15 score 1 + cases), DIA scores showed a 100% concordance rate with the first round of pathologist scores. When compared with the final pathologist scores, the concordance rate between DIA scores and pathologist scores was 96% (47/49). One case scored as 0 by pathologists was scored as 1 + by DIA (case 21), and one case scored as 1 + by pathologists was scored as 0 by DIA (case 25). The Kendall W value for agreement between final pathologist scores and DIA was 0.959, indicating excellent agreement.
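The concordance rate reported here is a simple proportion over the cases that DIA could evaluate. The following Python sketch shows the calculation under the assumption that a case DIA could not score is recorded as `None` and dropped from the denominator; the function name and the toy values are hypothetical and are not the study data.

```python
def concordance(consensus, dia):
    """Return (agreeing cases, evaluable cases, percent concordance).

    Cases with no DIA result (None) are excluded from the denominator.
    """
    pairs = [(c, d) for c, d in zip(consensus, dia) if d is not None]
    agree = sum(1 for c, d in pairs if c == d)
    return agree, len(pairs), 100 * agree / len(pairs)


# Toy data only: five cases, one of which DIA could not evaluate.
consensus_scores = ["0", "0", "1+", "1+", "0"]
dia_scores = ["0", "1+", "1+", "1+", None]
print(concordance(consensus_scores, dia_scores))  # (3, 4, 75.0)
```

With 47 concordant cases out of 49 evaluable cases, the same calculation gives 95.9%, reported above as 96%.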

Discussion

The significant benefit of the novel HER2-targeting ADC T-Dxd for metastatic HER2-low BC makes it clinically relevant to distinguish between HER2 IHC score 0 and 1 +. Interobserver reproducibility among pathologists is an important factor in HER2 IHC evaluation. The current study showed some interobserver discordance among breast pathologists in evaluating BC with HER2 IHC scores of 0 and 1 +, although concordance was higher than that reported in previous studies, possibly owing to the subspecialty status of the pathologists. Our results also indicated that HER2 IHC DIA is a feasible tool to determine HER2 status accurately.

Although previous studies have shown good concordance for HER2 IHC scores from different pathologists, a closer inspection of the results revealed high interobserver concordance for HER2-positive BC with IHC scores of 2 + and 3 + and lower concordance for HER2-negative BC with IHC scores of 0 and 1 + [10, 11, 13, 24,25,26]. In a recent study, Fernandez et al. [13] evaluated the concordance rate for HER2 IHC scoring among 18 pathologists who read 170 BC biopsy cases. For scores of 0 and 1 +, agreement among ≥ 17/18 pathologists was achieved in only 26% of cases (24/92), whereas for scores of 3 +, agreement among ≥ 17/18 pathologists was achieved in 58% of cases (26/45). In the current study, the first round of HER2 scoring showed that complete agreement (6/6 pathologists) was achieved in 19/50 cases (38%) and agreement among ≥ 5/6 pathologists was achieved in 33/50 cases (66%) for BC biopsy with IHC scores 0 and 1 +. These agreement rates are higher than those reported in previous studies. In our study, all observers were pathologists with breast subspecialty expertise who were aware that the purpose of the study was to classify HER2-negative BC as IHC score 0 or 1 +, and the scoring criteria were therefore applied more strictly in each case.

In the 25 cases with IHC score 0 in our study, complete agreement (6/6 pathologists) was observed in 17 cases (68%) and agreement among ≥ 5/6 pathologists was observed in 18 cases (72%). In comparison, in the 25 cases with IHC score 1 +, only 2 cases (8%) showed complete agreement and 15 cases (60%) showed agreement among ≥ 5/6 pathologists. The disagreement in scoring IHC 1 + arises from the subjective determination of the percentage of tumor cells stained (≤ 10% vs. > 10%) and of the staining intensity (faintly or barely stained vs. weakly to moderately stained). Among the six breast pathologists who assigned IHC scores, one tended to assign higher scores than the others, which reflects the subjectivity of the manual scoring system.

The HER2 IHC assay commonly used in routine clinical practice was not developed as a quantitative assay to measure levels of protein expression. Given this limitation, new technologies that could provide more comprehensive and accurate assessments of HER2-low BC are under active investigation. Moutafi et al. [27] developed a quantitative immunofluorescence assay coupled with a standardized mass spectrometry HER2 array to measure absolute amounts of HER2 protein (in units of amol/mm²) on conventional histologic sections. In this assay, a low range of HER2 expression in unamplified cell lines was considered 2 to 20 amol/mm². Among 364 BC cases subjected to the assay, 67% had HER2 expression above the limit of quantification and below the levels seen in HER2-amplified BC. The authors proposed that the assay could be used to determine the levels of HER2 required for response to T-Dxd or similar HER2 ADCs. Kennedy et al. [28] evaluated the analytical performance of immunoaffinity enrichment coupled to multiple reaction monitoring–mass spectrometry (immuno-MRM-MS) and showed that this method had higher concordance with predicate assays and could be used to quantify HER2, even at low expression levels. Xu et al. [29] suggested that molecular methods such as mRNA analysis could accurately define HER2-low BC, aiding in treatment decision-making, because these methods have a wide dynamic range. Although these studies suggest that more sensitive methods to detect HER2 expression could be developed, the validity of these methods in identifying HER2-low BC and predicting treatment response needs to be evaluated by prospective clinical trials. Currently, ASCO-CAP guidelines do not suggest changing the HER2 testing algorithm.

Precise and reliable methods based on DIA and artificial intelligence have also been evaluated to score HER2 IHC [14,15,16,17,18,19,20,21,22,23]. Using Aperio ImageScope, Jakobsen et al. [23] reported good agreement (kappa coefficient of 0.67) between manual assessment and DIA. Hartage et al. [15] applied the HER2-CONNECT algorithm in the Visiopharm Integrator System to score 612 primary invasive BC and compared these scores with pathologist manual scores. They reported 87.3% concordance between HER2 DIA scores and pathologist scores. In the current study, we used Aperio’s Membrane Algorithm v9.1 for DIA. We found that the DIA score was fully concordant with the majority pathologist score in all evaluable cases in which ≥ 5/6 pathologists agreed. The concordance rate between DIA scores and final pathologist scores for all evaluable cases was 96%.

Our study has some limitations. First, because this was a pilot study and quality improvement project, the number of cases included was low. Second, the study cohort included only cases in which HER2 IHC scores were 0 or 1 +. For validating the DIA algorithm, a broader spectrum of HER2 IHC scores, including 2 + and 3 +, should be included.

In conclusion, our study showed some interobserver discordance among breast pathologists in evaluating BC with HER2 IHC scores of 0 and 1 +, but concordance was higher than that observed in previous studies, possibly owing to the pathologists' awareness, as subspecialists, of the HER2-low status. Our results also revealed that HER2 IHC DIA is a feasible and valid tool to determine HER2 status accurately. Further investigation with a larger number of cases, including those with IHC scores of 2 + and 3 +, is needed to validate the performance of DIA.