Introduction

In the era of precision medicine, a growing number of immunohistochemical (IHC) markers have become class II biomarkers, i.e., markers that provide prognostic and predictive information used to select patients who may benefit from hormonal, targeted, and/or immune therapies [1, 2]. These IHC biomarkers are now tested routinely in pathology laboratories worldwide.

To achieve high inter-laboratory concordance, a number of guidelines have been developed in recent years to standardize the essential pre-analytical, analytical, and post-analytical components of IHC [3,4,5,6]. Any change in these critical components requires full revalidation of the IHC protocol [3]. For a modification in the pre-analytical phase, validation requires prospective procurement of paired tissue samples so that IHC results can be compared across the pre-analytical conditions.

Recent advances in targeted therapy and immuno-oncology, together with the approval of companion and complementary IHC biomarkers, have placed biomarker testing and interpretation under scrutiny. Programmed death-ligand 1 (PD-L1) IHC is one example: multiple commercial antibody clones exist, and the criteria for positivity vary considerably with antibody clone and cancer type [7]. PD-L1 expression is also subject to tumor heterogeneity, inconsistent staining among antibody clones, and inter- and intra-observer variability [7,8,9,10,11]. To date, the impact of pre-analytical factors on PD-L1 expression has been underexplored [12].

In this study, we aimed to assess the effect of tissue processing on the immunoexpression of several commonly used class II biomarkers, including estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), mismatch repair (MMR) proteins, BRAF V600E, and PD-L1, in a large prospectively collected cohort of 109 tumors of various types. Two commonly available tissue processors and four processing protocols were compared, and their effects on IHC were assessed using categorical, semiquantitative, and quantitative measurements.

Material and methods

Tissue procurement and processing

The study was approved by the institutional research ethics board. Formalin-fixed tissue from 109 tumor resections was prospectively collected and processed using two tissue processors: Pathos delta (Milestone Medical, Kalamazoo, MI, USA) and Leica ASP330S (Leica Biosystems Inc., Concord, Ontario, Canada). Four processing protocols were tested: protocol 1 (P1), a rapid protocol for biopsies on the Pathos; P2, the routine overnight processing protocol on the Pathos; P3, a protocol for fat-rich tissue on the Pathos; and P4, a protocol for fat-rich tissue on the Leica. Detailed protocols are provided in Table 1. For breast cases, the cold ischemic time was less than 1 h and the fixation time was 24–96 h; these times were not specifically recorded for the remaining cases, although most cases in our laboratory are handled in a similar manner. The 109 consecutive tumor resections comprised colorectal carcinomas (n = 28), breast carcinomas (n = 22), renal tumors (n = 21), head and neck squamous cell carcinomas (n = 13), melanomas (n = 11), bladder urothelial carcinomas (n = 11), endometrial carcinoma (n = 1), Merkel cell carcinoma (n = 1), and papillary thyroid carcinoma (n = 1). The procured tissue was sized appropriately for each protocol: for P1, the tissue mimicked a core biopsy 1–3 mm in diameter; for the remaining protocols, the tissue was up to the size of a nickel coin and 2–3 mm thick, depending on tumor availability.

Table 1 Processing protocols utilized in this study

IHC: staining and interpretation

Following processing, tissue microarrays (TMAs) were constructed using triplicate 1-mm cores to account for tumor heterogeneity. Sequential sections from each block were stained with the following class II IHC markers: PD-L1 (clones SP263, SP142, and 22C3), ER, PR, HER2, MMR proteins (MLH1, PMS2, MSH2, and MSH6), and BRAF V600E. Details of the antibodies used are summarized in Table 2.

Table 2 Details of IHC markers used

The biomarkers were scored in the following tumors: PD-L1 (all clones) in all tumors; ER, PR, and HER2 in breast carcinoma only; MMR proteins in colorectal and endometrial carcinoma only; and BRAF V600E in melanoma and papillary thyroid carcinoma only. IHC results from all three cores of each tumor were scored and averaged using the scoring algorithms described below.
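The scoring assignments and the averaging step above can be expressed as a small lookup plus an averaging helper. This is a minimal Python sketch, not the authors' workflow: the names `SCORED_IN`, `is_scored`, and `average_cores` are hypothetical, and the text does not say whether averaging was applied to continuous readouts (such as percent positive cells) before categorization, which is what this sketch assumes.

```python
from statistics import mean

# Hypothetical encoding of which biomarkers were read in which tumor types,
# following the scoring assignments listed in the text.
SCORED_IN = {
    "PD-L1": None,  # None means: scored in all tumor types
    "ER": {"breast"},
    "PR": {"breast"},
    "HER2": {"breast"},
    "MMR": {"colorectal", "endometrial"},
    "BRAF V600E": {"melanoma", "papillary thyroid"},
}


def is_scored(biomarker: str, tumor_type: str) -> bool:
    """Return True if the biomarker was read for this tumor type."""
    allowed = SCORED_IN[biomarker]
    return allowed is None or tumor_type in allowed


def average_cores(core_values: list[float]) -> float:
    """Average a continuous readout (e.g. % positive cells) over TMA cores.

    Assumes three 1-mm cores per tumor, as in the text; cores without
    evaluable tumor are assumed to have been dropped before this call.
    """
    if not core_values:
        raise ValueError("no evaluable cores")
    return mean(core_values)


print(is_scored("ER", "melanoma"))       # False
print(average_cores([20.0, 35.0, 5.0]))  # 20.0
```

Averaging before applying a cut-off keeps the three cores on equal footing; a different design (e.g. calling a tumor positive if any core is positive) would give different results near a threshold.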

PD-L1 immunopositivity in urothelial, breast, and head and neck squamous cell carcinomas was determined using the available clinical algorithms summarized in Table 3. These thresholds were established according to clinical responses to the associated immune checkpoint inhibitors in clinical trials for urothelial carcinoma [13,14,15,16], head and neck squamous cell carcinoma [17,18,19,20,21], and breast carcinoma [22, 23]. In brief, the combined positive score (CPS) was defined as the number of PD-L1-positive tumor cells (TC) and immune cells (IC) divided by the total number of TC, multiplied by 100. PD-L1 22C3 was considered positive if CPS ≥ 10 in urothelial carcinoma or CPS ≥ 1 in breast or head and neck squamous cell carcinoma; PD-L1 SP142 was considered positive if IC ≥ 5% in urothelial and head and neck squamous cell carcinomas or IC ≥ 1% in breast carcinoma; and SP263 was considered positive when ≥ 25% of TCs or ICs were stained in urothelial carcinoma or ≥ 25% of TCs were stained in head and neck squamous cell carcinoma. A positivity threshold for the SP263 clone in breast carcinoma has yet to be established. Additionally, we applied a universal semiquantitative six-tiered scoring system to positive TC%, positive IC%, and CPS for all tumors: (0) < 1% TC/IC or CPS < 1; (1) 1–4.9% TC/IC or CPS 1–4.9; (2) 5–9.9% TC/IC or CPS 5–9.9; (3) 10–24.9% TC/IC or CPS 10–24.9; (4) 25–49.9% TC/IC or CPS 25–49.9; and (5) ≥ 50% TC/IC or CPS ≥ 50.
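The CPS definition and the universal six-tiered cut-offs above translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, tier boundaries are assumed to be half-open intervals (tier 1 means 1 ≤ value < 5, written as 1–4.9 in the text), and capping CPS at 100 follows the 22C3 interpretation convention rather than anything stated in this study.

```python
from bisect import bisect_right

# Lower bounds of tiers 1..5; any value below 1 falls into tier 0.
TIER_LOWER_BOUNDS = [1, 5, 10, 25, 50]


def combined_positive_score(pos_tumor_cells: int, pos_immune_cells: int,
                            total_tumor_cells: int) -> float:
    """CPS = (PD-L1+ TC + PD-L1+ IC) / total TC x 100, capped at 100.

    The cap is an assumption borrowed from the 22C3 convention.
    """
    if total_tumor_cells <= 0:
        raise ValueError("CPS is undefined without viable tumor cells")
    raw = (pos_tumor_cells + pos_immune_cells) / total_tumor_cells * 100
    return min(raw, 100.0)


def six_tier(value: float) -> int:
    """Map a TC%, IC% or CPS value onto the universal tiers 0-5."""
    # bisect_right counts how many lower bounds are <= value, which is the tier.
    return bisect_right(TIER_LOWER_BOUNDS, value)


cps = combined_positive_score(pos_tumor_cells=30, pos_immune_cells=20,
                              total_tumor_cells=400)
print(cps, six_tier(cps))  # 12.5 3
```

Using `bisect_right` on the lower bounds places a value that sits exactly on a boundary (e.g. CPS = 10) in the higher tier, which matches the "≥" wording of the tier definitions.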

Table 3 Clinical algorithms to determine PD-L1 immunopositivity
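The clone- and tumor-specific thresholds summarized in the text can be kept in one lookup table so that every reading is classified the same way. This is a minimal Python sketch, assuming each reading already carries TC%, IC%, and CPS; the dictionary keys and tumor labels are hypothetical. It encodes only the simplified criteria stated in the text (the manufacturers' full algorithms may contain additional conditions), so it is an illustration, not a clinical scoring tool.

```python
from typing import Callable, Optional

Rule = Optional[Callable[[dict], bool]]

# (clone, tumor type) -> rule reading "tc" (%), "ic" (%) and "cps" from a dict.
# None means no clinical cut-off has been established.
RULES: dict[tuple[str, str], Rule] = {
    ("22C3", "urothelial"): lambda r: r["cps"] >= 10,
    ("22C3", "breast"): lambda r: r["cps"] >= 1,
    ("22C3", "hnscc"): lambda r: r["cps"] >= 1,
    ("SP142", "urothelial"): lambda r: r["ic"] >= 5,
    ("SP142", "hnscc"): lambda r: r["ic"] >= 5,
    ("SP142", "breast"): lambda r: r["ic"] >= 1,
    ("SP263", "urothelial"): lambda r: r["tc"] >= 25 or r["ic"] >= 25,
    ("SP263", "hnscc"): lambda r: r["tc"] >= 25,
    ("SP263", "breast"): None,
}


def pdl1_positive(clone: str, tumor: str, reading: dict) -> Optional[bool]:
    """Return True/False for PD-L1 status, or None if no cut-off exists."""
    rule = RULES[(clone, tumor)]
    return None if rule is None else rule(reading)


reading = {"tc": 10.0, "ic": 3.0, "cps": 12.0}
print(pdl1_positive("22C3", "urothelial", reading))   # True
print(pdl1_positive("SP142", "urothelial", reading))  # False
print(pdl1_positive("SP263", "breast", reading))      # None
```

Returning `None` rather than `False` for SP263 in breast carcinoma keeps "no established threshold" distinct from "negative", which matters when computing positivity rates.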

For ER, PR, and HER2, scoring was performed according to the latest American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) guidelines [24, 25]. In brief, a 1% cutoff was used for ER and PR, and HER2 was considered positive when > 10% of tumor cells showed strong, complete membranous staining. The percentage of positive tumor cells was also recorded for ER and PR.
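The categorical ER, PR, and HER2 calls described above reduce to simple threshold tests. This is a minimal sketch with hypothetical function names; it models only the cut-offs stated in the text (≥ 1% for ER and PR, > 10% strong complete membranous staining for HER2 positivity) and leaves the guideline's equivocal and negative HER2 categories, which depend on staining intensity and completeness, out of scope.

```python
def hormone_receptor_positive(percent_positive_nuclei: float) -> bool:
    """ER/PR: positive when at least 1% of tumor nuclei stain (1% cutoff)."""
    return percent_positive_nuclei >= 1.0


def her2_ihc_positive(percent_strong_complete_membrane: float) -> bool:
    """HER2 IHC positive: more than 10% of tumor cells with strong,
    complete membranous staining. Equivocal/negative grading is not modeled."""
    return percent_strong_complete_membrane > 10.0


print(hormone_receptor_positive(0.5), hormone_receptor_positive(40.0))  # False True
print(her2_ihc_positive(10.0), her2_ihc_positive(60.0))                 # False True
```

Note the asymmetry the text implies: the ER/PR boundary value (exactly 1%) counts as positive, whereas exactly 10% strong complete membranous staining does not make a case HER2 positive.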

BRAF V600E was considered positive if moderate to strong granular cytoplasmic staining was seen in virtually all tumor cells. Mismatch repair deficiency was defined as absent staining for an MMR protein in virtually all tumor nuclei in the presence of an acceptable internal control. BRAF V600E mutation status in melanoma and HER2 amplification status in breast carcinoma were confirmed from the patient records.
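The BRAF V600E and MMR rules above can be written as small decision functions. This Python sketch is hypothetical throughout: the `MMRReading` structure, the "indeterminate" label for apparent loss without an acceptable internal control, and the "equivocal" catch-all for BRAF readings that are neither clearly positive nor negative (for example weak staining) are assumptions for illustration, not definitions taken from the study.

```python
from dataclasses import dataclass


@dataclass
class MMRReading:
    protein: str               # "MLH1", "PMS2", "MSH2" or "MSH6"
    tumor_nuclei_lost: bool    # no staining in virtually all tumor nuclei
    internal_control_ok: bool  # internal control shows expected nuclear staining


def mmr_status(readings: list[MMRReading]) -> str:
    """'deficient' if any protein is lost with an acceptable internal control.

    'indeterminate' (hypothetical label) covers apparent loss without an
    acceptable internal control; otherwise the tumor is 'proficient'.
    """
    lost = [r for r in readings if r.tumor_nuclei_lost]
    if any(r.internal_control_ok for r in lost):
        return "deficient"
    if lost:
        return "indeterminate"
    return "proficient"


def braf_v600e_call(intensity: str, virtually_all_cells: bool) -> str:
    """intensity of granular cytoplasmic staining: 'none', 'weak', 'moderate', 'strong'."""
    if intensity in ("moderate", "strong") and virtually_all_cells:
        return "positive"
    if intensity == "none":
        return "negative"
    return "equivocal"  # e.g. weak staining; this catch-all is an assumption


panel = [
    MMRReading("MLH1", True, True),
    MMRReading("PMS2", True, True),
    MMRReading("MSH2", False, True),
    MMRReading("MSH6", False, True),
]
print(mmr_status(panel))               # deficient
print(braf_v600e_call("weak", True))   # equivocal
```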

Statistical analyses

All statistical analyses were performed using SPSS software version 24.0 (IBM Corporation, New York, NY, USA). Fleiss' kappa was used to determine the concordance among the four processing platforms for each class II biomarker. p values less than 0.05 were considered statistically significant.
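Fleiss' kappa generalizes Cohen's kappa to more than two raters; in this design each of the four processing platforms plays the role of a rater assigning a category to every case. The sketch below implements the standard Fleiss formula in plain Python so the calculation is transparent: with n raters per case and n_ij the number of raters putting case i in category j, observed agreement is the mean of P_i = (sum_j n_ij^2 − n) / (n(n − 1)), chance agreement is P_e = sum_j p_j^2 over the pooled category proportions p_j, and kappa = (P_bar − P_e) / (1 − P_e). The function name and input format are assumptions; this is not the SPSS implementation, and it returns only the point estimate, not the p value used in the study.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N cases, each rated by the same number n of raters.

    Here a "rater" is one processing platform (P1-P4), and each inner list
    holds one case's categorical calls, e.g. ["pos", "neg", "pos", "pos"].
    Returns NaN when every call falls in one category (kappa is undefined).
    """
    if not ratings:
        raise ValueError("no cases supplied")
    n = len(ratings[0])  # raters (platforms) per case
    if n < 2 or any(len(case) != n for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]  # n_ij for each case i
    categories = set().union(*counts)
    n_cases = len(ratings)

    # Observed agreement: share of agreeing rater pairs within each case.
    p_i = [(sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    p_bar = sum(p_i) / n_cases

    # Chance agreement from the pooled category proportions p_j.
    total = n_cases * n
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Four hypothetical cases read on four platforms.
example = [
    ["pos", "pos", "pos", "pos"],
    ["neg", "neg", "neg", "neg"],
    ["pos", "neg", "pos", "pos"],
    ["neg", "neg", "neg", "neg"],
]
print(round(fleiss_kappa(example), 3))  # 0.746
```

In this invented example, one discordant call out of sixteen already lowers kappa from 1 to 47/63 ≈ 0.746, because with only two categories chance agreement is high (P_e ≈ 0.51). The same effect makes kappa sensitive to small numbers of discrepant cases in small cohorts.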

Results

PD-L1

Fig. 1 shows the percentage of PD-L1-positive cases and the concordance across platforms using the clinical algorithms for urothelial, breast, and head and neck squamous cell carcinomas. In urothelial carcinoma, the PD-L1 positivity rate was 9–27% using SP263, 18–27% using 22C3, and 0–18% using SP142. No urothelial carcinoma was positive across all clones and platforms, whereas 4 (36%) were consistently negative. In breast carcinoma, the PD-L1 positivity rate was 48–62% using 22C3 and 24–40% using SP142. In head and neck squamous cell carcinoma, the positivity rate was 0–31% for SP263, 54–85% for 22C3, and 8–23% for SP142; no case was universally positive, and 1 (8%) was universally negative.
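The ranges reported above appear to be the lowest and highest positivity rates among the four platforms (for example, 1/11 ≈ 9% and 3/11 ≈ 27% for SP263 in the 11 urothelial carcinomas). Under that reading, the summary statistics for one clone and tumor type could be computed as in this hypothetical Python sketch, where each row holds one case's positive/negative calls on P1–P4. The function name is an assumption, and the data in the example are invented for illustration, not study results.

```python
def summarize_calls(calls: list[list[bool]]) -> dict:
    """Summarize positive/negative calls; calls[i][p] is case i on platform p."""
    n_cases = len(calls)
    n_platforms = len(calls[0])
    # Percentage of positive cases on each platform.
    rates = [100 * sum(case[p] for case in calls) / n_cases for p in range(n_platforms)]
    return {
        "rate_range": (min(rates), max(rates)),
        "all_positive": sum(all(case) for case in calls),      # positive on every platform
        "all_negative": sum(not any(case) for case in calls),  # negative on every platform
    }


# Invented example: four cases, platforms P1-P4.
calls = [
    [True, True, False, True],
    [False, False, False, False],
    [True, False, False, False],
    [False, False, False, False],
]
print(summarize_calls(calls))
# {'rate_range': (0.0, 50.0), 'all_positive': 0, 'all_negative': 2}
```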

Fig. 1

Concordance of PD-L1 immunopositivity in urothelial carcinoma, breast carcinoma, and head and neck squamous cell carcinoma (HNSCC). Heatmaps: each row represents an individual case and each cell represents a PD-L1 reading. Bold kappa value: significant, p < 0.05

Agreement among platforms was substantial using the SP263 clone in urothelial carcinoma (kappa = 0.614), moderate using 22C3 in breast carcinoma (kappa = 0.402), and fair using SP142 in breast carcinoma (kappa = 0.392) and using SP263, SP142, and 22C3 in head and neck squamous cell carcinoma (kappa = 0.357, 0.347, and 0.261, respectively). The kappa values across platforms for 22C3 and SP142 in urothelial carcinoma did not reach significance (p > 0.05).
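The agreement descriptors used above (fair, moderate, substantial, near perfect) appear to follow the widely used Landis and Koch bands for kappa. A minimal sketch of that mapping, assuming inclusive upper bounds at 0.20, 0.40, 0.60, and 0.80 (the function name is hypothetical):

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) agreement descriptor."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


for k in (0.614, 0.402, 0.392, 0.261, 0.925):
    print(k, landis_koch_label(k))
```

Applied to the kappa values reported here, this mapping reproduces the labels in the text (0.614 substantial, 0.402 moderate, 0.392 and 0.261 fair), with 0.925 falling in the "almost perfect" band that the text calls near perfect. Because 0.402 and 0.392 straddle the 0.40 boundary, the moderate/fair distinction between those two results is small.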

Fig. 2 shows the results using the universal six-tiered cutoffs. Overall, agreement among the four platforms was fair with this approach regardless of antibody clone and scoring method (TC%, IC%, or CPS), with kappa values ranging from 0.248 to 0.354. Figure 3 illustrates the differences in PD-L1 expression between protocols.

Fig. 2

PD-L1 immunostain and concordance using universal six-tiered cutoff values. Ca, cancer type; TC, tumor cells; CPS, combined positive score; IC, immune cells; NA, not available

Fig. 3

Comparison of PD-L1 expression between protocols. First column: head and neck squamous cell carcinoma (HNSCC) case 5, clone 22C3 (negative, CPS < 1, with protocol 4; positive, CPS ≥ 1, with the remaining protocols). Second column: breast carcinoma case 14, clone SP142 (positive, IC ≥ 1%, with protocols 1 and 3; negative, IC < 1%, with protocols 2 and 4). Third column: urothelial carcinoma case 4, clone SP263 (positive, TC or IC ≥ 25%, with protocol 2; negative, TC and IC < 25%, with the remaining protocols). Scale bar: 200 µm

ER, PR, and HER2 in breast carcinoma

The performance of ER, PR, and HER2 IHC is shown in Fig. 4. ER showed perfect concordance across platforms (kappa = 1.000), with 77% of cases ER positive. All positive cases showed diffuse ER staining in 100% of tumor cells.

Fig. 4

Comparison of the performance of other class II biomarkers across the four processing platforms. NA, not available; PTC, papillary thyroid carcinoma. The numbers for ER and PR represent the actual percentage of positive tumor cells. The positive percentage for BRAF V600E is calculated for melanoma cases only

Agreement for PR immunostaining across platforms was substantial (kappa = 0.695), with PR positivity rates ranging from 55 to 62%. Four breast carcinomas (18%) showed discrepant PR results, with the percentage of PR-positive tumor cells ranging from 0 to 40%.

Concordance of HER2 IHC was substantial (kappa = 0.787). Two cases (9%) were interpreted as HER2 positive and 14 (64%) as HER2 negative on all four platforms. Six cases were interpreted as HER2 equivocal on at least one platform. Protocol P4 yielded more equivocal cases (27%) than the other three protocols. The two cases that were equivocal on all four platforms underwent HER2 amplification testing by FISH; one showed HER2 amplification and the other did not.

BRAF V600E

BRAF V600E IHC was evaluated in 1 papillary thyroid carcinoma and 11 melanomas (Fig. 4). Concordance was near perfect (kappa = 0.925). Three melanomas (27%) and the papillary thyroid carcinoma were consistently positive for BRAF V600E, whereas 7 melanomas (64%) were negative on all platforms. One melanoma was BRAF V600E positive with protocol P3 (the protocol appropriate for this tissue type) but was interpreted as equivocal, with weak granular cytoplasmic staining, with the other three protocols. Chart review showed that this case, together with the 3 consistently BRAF V600E-positive melanomas, harbored the BRAF V600E mutation.

MMR

Perfect concordance (kappa = 1.000) was achieved for all MMR markers in colorectal (n = 24) and endometrial carcinoma (n = 1) (Fig. 4). The rate of MMR deficiency was 8% (2/25) for MLH1, 20% (5/25) for PMS2, and 0% for MSH2 and MSH6.

Discussion

Among the class II biomarkers tested, ER and MMR IHC were not affected by tissue processing; PR, HER2, and BRAF V600E were minimally affected, with strong agreement among platforms; whereas PD-L1 was strongly influenced by the processing protocol regardless of the antibody clone used.

The interpretation of both HER2 and PD-L1 IHC is known to be influenced by intratumoral heterogeneity [7, 11, 26,27,28] and interobserver variability [9, 11, 29, 30]. Because the TMA cores were sampled from different areas of each tumor, the differences across platforms observed for PD-L1 and HER2 may in part reflect intratumoral heterogeneity. On the other hand, because serial sections from the same TMA block were used for the IHC panel on each processing platform, comparisons among PD-L1 antibody clones within a platform should not be affected by intratumoral heterogeneity. In addition, all clones and platforms were interpreted and scored by the same pathologist to avoid interobserver variability, and other parameters, including fixation (pre-analytical) and the IHC staining protocol (analytical), were kept the same. It is therefore reasonable to conclude that the differences in staining and interpretation across platforms are largely attributable to tissue processing, although a contribution from intratumoral heterogeneity cannot be excluded.

Pre-analytical variables such as cold ischemic time and fixation time have been shown to have a significant impact on the performance and interpretation of biomarkers in breast cancer [31,32,33]. Accordingly, the current ASCO/CAP guidelines mandate documentation and standardization of these parameters when handling breast cancer specimens [24, 25]. Little is known, however, about the influence of tissue processing on biomarker IHC. Sujoy et al. compared ER immunostaining between conventional and rapid processing using semiquantitative Q scores and found the ER results to be identical [34]. Bulte et al. showed that accelerated tissue processing had no significant impact on HER2 status [35]. In the current study, we evaluated both the categorical classification based on the ASCO/CAP guidelines and the actual percentage of ER- and PR-positive tumor cells. Consistent with previous reports, we found that the processing platform has minimal, if any, impact on ER, PR, and HER2 interpretation. Together, these results suggest that the rapid protocol and the protocols designed for fatty tissue are suitable for biomarker evaluation in breast carcinoma.

In the current study, we also evaluated the performance of other class II biomarkers. Overall, processing appeared to have no or minimal impact on BRAF V600E in melanoma and papillary thyroid carcinoma, or on MMR proteins in colorectal and endometrial carcinoma.

The biomarker most affected by tissue processing was PD-L1. Variation in PD-L1 staining was observed in all tumor types tested, using either the clinical algorithms or the semiquantitative scoring scheme. Several recent studies have shown an impact of pre-analytical variables on PD-L1 expression. For example, the type of decalcification agent affects PD-L1 results for the 22C3 clone but not for the E1L3N clone [36]. Delayed fixation decreased PD-L1 expression in the study by Van Seijen et al. [37] but had no apparent impact on PD-L1 IHC in the study by Forest et al. [36]. Prolonged fixation does not appear to affect PD-L1 results [37]. In this study, we found that the processing protocol and platform have a significant impact on PD-L1 IHC. Validation and standardization of pre-analytical variables, including tissue processing, should therefore be considered for PD-L1 testing in clinical laboratories.

Conclusions

Aside from PD-L1, the class II IHC biomarkers tested, namely ER, PR, HER2, MMR, and BRAF V600E, show perfect or high concordance of readout across different tissue processors and processing protocols. In contrast, PD-L1 staining and interpretation are strongly influenced by a combination of tissue processing procedures and intratumoral heterogeneity. Optimization and validation of pre-analytical processes, including processing protocols, are essential for correct PD-L1 biomarker interpretation.