Introduction

Treatment of localized colorectal cancer (CRC) is based on the tumor, nodes, metastasis (TNM) staging system, with surgery alone recommended for stage I and most stage II disease and adjuvant chemotherapy recommended for high-risk stage II and stage III disease [1]. However, the TNM staging system offers only a rough assessment of tumor biology. The high-risk stage II classification includes patients with clinical findings of obstruction or perforation [1] but excludes patients with microsatellite instability (MSI)-high tumors. This subtle distinction represents a major shift in cancer care because it acknowledges the prognostic importance of tumor biology [2, 3]. The shift toward incorporating tumor biology into staging and treatment algorithms is further evidenced by the recent FDA approval of the anti-PD-1 antibody pembrolizumab for treatment of any MSI-high metastatic or unresectable tumor, regardless of histology or site of origin [4]. This unprecedented approval of a therapy across all tumor types based on a single biologic factor highlights the importance of accurately understanding tumor biology.

The incorporation of MSI status into staging and treatment recommendations [1] highlights recent interest in the tumor microenvironment (TME), which influences response to immunotherapy and serves as a marker of overall tumor biology and prognosis [5, 6]. In CRC, the immunoscore (a measure of CD3+ and CD8+ tumor-infiltrating lymphocyte (TIL) densities at the center of tumor (CT) and invasive margin (IM)) has been shown to predict recurrence independently of, and more accurately than, T or N stage [7, 8]. In addition, a greater degree of inflammation and higher CD8+ and FoxP3+ lymphocyte densities have been associated with improved survival [9,10,11]. Finally, higher levels of programmed cell death-ligand 1 (PD-L1, a component of the PD-1/PD-L1 checkpoint) within the TME have been correlated with higher CD8+ densities and with improved disease-free and overall survival [12]. Given the predictive value of the TME, it seems likely that future staging and treatment guidelines will continue to incorporate these features.

As TME characterization and MSI status become increasingly important in cancer care, it is vital that providers accurately define these factors in a pre-therapy setting before the final resection specimen is subjected to thorough pathologic analysis. This pre-therapy assessment is used to identify patients who would benefit from neoadjuvant therapy or be eligible for an immunotherapy window of opportunity trial, which involves administering an immunomodulatory agent over a limited period of time between diagnosis on biopsy and final surgical resection. Mischaracterization of the pre-intervention TME could lead to inappropriate enrollment in a trial and incorrect appraisal of the effect of the trial agent on the TME.

Despite the importance of TME evaluation in pre-treatment biopsies, little is known about how accurately the biopsy represents the TME or how closely the TME of biopsy specimens correlates with that of surgical specimens. The purpose of our study was to compare thoroughly the TME of stage I–III CRC sampled on endoscopic biopsy with that of the definitive surgical resection specimen in patients with no history of neoadjuvant chemo- or immunotherapy. We sought to determine the densities of CD3+, CD4+, CD8+, and FoxP3+ lymphocyte populations at both the CT and IM, and the correlation of these densities between biopsy and resection specimens. Additionally, we sought to assess the accuracy of endoscopic biopsy in determining the PD-L1 status of the tumor.

Methods

Patients

This study was conducted after approval by our institutional review board. Patients with pathologic stage I–III CRC diagnosed on endoscopic biopsy who underwent resection with curative intent at our center between 2006 and 2016 were included. Cases were identified through our institution's electronic pathology database (CoPath Plus, Cerner, Kansas City, MO) by searching for the terms “colonic adenocarcinoma” and “colorectal adenocarcinoma” in the “final diagnosis” field. To meet inclusion criteria, formalin-fixed, paraffin-embedded (FFPE) tissue blocks from both the initial biopsy and the resection specimen of the same malignancy had to be on site at our facility, with at least 2 mm² of tumor remaining in each block. Cases were excluded if either procedure had been performed at an outside institution, if blocks were unavailable, or if less than 2 mm² of tumor remained in the blocks. Patients with stage IV disease at diagnosis, only in situ disease on the final resection specimen, or neoadjuvant chemotherapy before resection were also excluded. Tumors were staged using the TNM classification of the American Joint Committee on Cancer (AJCC), 7th edition. Demographic, tumor-specific, and treatment data were collected by retrospective chart review.
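The inclusion and exclusion criteria above amount to a simple per-case filter. The sketch below restates them in Python purely for illustration. The `Case` record, its field names, and the example cases are hypothetical and are not drawn from the study database, and applying the 2006–2016 window to the resection year is an assumption.

```python
from dataclasses import dataclass

# Hypothetical case record; field names are illustrative only and do not
# reflect the actual pathology database export used in the study.
@dataclass
class Case:
    case_id: str
    resection_year: int
    pathologic_stage: str        # "I", "II", "III", or "IV"
    in_situ_only: bool           # only in situ disease on the resection specimen
    neoadjuvant_chemo: bool      # chemotherapy given before resection
    both_blocks_on_site: bool    # both procedures done here and both FFPE blocks available
    biopsy_tumor_mm2: float      # tumor area remaining in the biopsy block
    resection_tumor_mm2: float   # tumor area remaining in the resection block

MIN_TUMOR_MM2 = 2.0  # at least 2 mm² of tumor must remain in each block

def is_eligible(case: Case) -> bool:
    """Return True only if a case meets every stated inclusion criterion."""
    return (
        2006 <= case.resection_year <= 2016
        and case.pathologic_stage in {"I", "II", "III"}
        and not case.in_situ_only
        and not case.neoadjuvant_chemo
        and case.both_blocks_on_site
        and case.biopsy_tumor_mm2 >= MIN_TUMOR_MM2
        and case.resection_tumor_mm2 >= MIN_TUMOR_MM2
    )

# Illustrative (made-up) cases.
cases = [
    Case("A", 2012, "II", False, False, True, 3.1, 12.0),
    Case("B", 2014, "III", False, True, True, 4.0, 9.5),   # neoadjuvant chemotherapy
    Case("C", 2009, "I", False, False, True, 1.2, 8.0),    # too little biopsy tumor
    Case("D", 2015, "IV", False, False, True, 5.0, 10.0),  # stage IV
]
eligible_ids = [c.case_id for c in cases if is_eligible(c)]
print(eligible_ids)  # ['A']
```

Writing the criteria as one boolean expression makes it easy to see that a case must pass every condition; failing any single one excludes it.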

Tissue preparation, staining, and analysis

FFPE tissue blocks from the endoscopic biopsies and surgical resections were visually examined for tumor quantity. The blocks containing the most representative tumor material were then examined histologically (hematoxylin and eosin) to ensure that adequate tumor was available and that the selected blocks contained representative samples of the invasive margin and center of tumor. Sections were cut at 4 μm from each block and immunohistochemically stained for FoxP3, CD3, CD4, CD8, and PD-L1 (CAL10, Biocare Medical, Pacheco, CA). The immunostained slides were digitally captured at 400× magnification using the Aperio ScanScope AT Turbo digital imaging system (Leica Biosystems, Buffalo Grove, IL). De-identified images were uploaded into eSlide Manager (version 12.3.2.5030) so that all interpreting pathologists were blinded to the patient information corresponding to each slide. Image analysis was performed using Aperio ImageScope (version 12.0.1.5012) and an in-house analysis algorithm, which was tuned to detect nuclear and cytoplasmic positivity while excluding larger tumor nuclei in an effort to count only lymphocytes. To validate this algorithm, representative fields from ScanScope images were manually counted by three different pathologists, and these counts were compared with the results obtained using the algorithm. After validation, slides were scanned into the eSlide Manager server, and 1-mm² areas were manually selected from both the center of tumor and the invasive margin (Fig. 1). CD3+, CD4+, CD8+, and FoxP3+ lymphocyte counts were reported as positive cells/mm².
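Because each selected region measured 1 mm², a count of positive cells in that region corresponds directly to a density in cells/mm². The sketch below shows that conversion and one possible way to compare an algorithm count with the pathologists' manual counts (the signed percentage difference from the mean manual count). The validation metric actually used is not specified above, so `percent_difference`, `density_per_mm2`, and all numbers in the example are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence

def density_per_mm2(positive_cells: int, area_mm2: float = 1.0) -> float:
    """Convert a count of positive cells in a selected region to cells/mm²."""
    if area_mm2 <= 0:
        raise ValueError("area_mm2 must be positive")
    return positive_cells / area_mm2

def percent_difference(algorithm_count: int, manual_counts: Sequence[int]) -> float:
    """Signed % difference between the algorithm count and the mean manual count.

    Hypothetical validation metric; negative values mean the algorithm undercounts.
    """
    reference = mean(manual_counts)
    if reference == 0:
        raise ValueError("mean manual count must be non-zero")
    return 100.0 * (algorithm_count - reference) / reference

# Made-up example: one 1 mm² field counted by the algorithm and by three pathologists.
print(density_per_mm2(412))                    # 412.0
print(percent_difference(95, [100, 98, 102]))  # -5.0
```

With a fixed 1 mm² region the division is trivial, but keeping the area explicit guards against mistakes if regions of other sizes are ever measured.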

Fig. 1

CD3+ immunohistochemical staining of colorectal adenocarcinoma as seen on an Aperio® scanned slide. A: 1 mm² of tissue is selected from the invasive margin and the center of tumor at 1×. B: CD3+ lymphocytes at the invasive margin at 10×. C: CD3+ lymphocytes at the center of tumor at 10×.

PD-L1-stained slides were read manually by a pathologist and reported as positive or negative. The PD-L1 stain was interpreted according to the manufacturer's guidelines for non-small cell lung cancer (the only FDA-approved indication for this test at the time of this investigation; Fig. 2) [13]. Slides containing tissue from the CT, IM, and tumor surface were all examined for PD-L1 staining. A specimen was considered positive if at least 1% of tumor cells stained. Positive specimens were further classified as high positive if membrane staining was dark and continuous, or as low positive if it was faint or patchy.
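The PD-L1 scoring rule above can be written as a small decision function. In this sketch the percentage of stained tumor cells and the membrane staining pattern are assumed to be supplied by the reading pathologist; the function names and the pattern labels `"dark_continuous"` and `"faint_patchy"` are hypothetical.

```python
def score_pdl1(percent_tumor_cells_stained: float, membrane_pattern: str) -> str:
    """Classify a PD-L1 slide as negative, low positive, or high positive.

    membrane_pattern is a hypothetical label for the pathologist's visual read:
    "dark_continuous" or "faint_patchy". It is ignored for negative slides.
    """
    if not 0.0 <= percent_tumor_cells_stained <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percent_tumor_cells_stained < 1.0:
        return "negative"  # fewer than 1% of tumor cells stained
    if membrane_pattern == "dark_continuous":
        return "high positive"
    if membrane_pattern == "faint_patchy":
        return "low positive"
    raise ValueError(f"unknown membrane pattern: {membrane_pattern!r}")

def is_pdl1_positive(category: str) -> bool:
    """Low positive and high positive are both scored as positive."""
    return category != "negative"

print(score_pdl1(0.5, "faint_patchy"))      # negative
print(score_pdl1(1.0, "faint_patchy"))      # low positive
print(score_pdl1(20.0, "dark_continuous"))  # high positive
```

Note that the 1% threshold alone decides positivity; the membrane pattern only subdivides slides that are already positive.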

Fig. 2

PD-L1 immunohistochemical staining of colorectal adenocarcinoma. A: negative stain. B: low positive stain. C: high positive stain. Both B and C are scored as “positive”.

All photomicrographs were captured using the Aperio ScanScope AT Turbo (Leica Biosystems, Buffalo Grove, IL) at 400× magnification using a 20× lens (FN 26.5) with a 2× doubler. Images were processed using Aperio ImageScope software, version 12.

Statistical analysis

SPSS version 24 (IBM, Armonk, NY) was used for all statistical analyses. Continuous variables were assessed for normality using the Shapiro–Wilk test and reported as mean and standard deviation (SD) or median and interquartile range (IQR), as appropriate. Continuous variables were compared using Student’s t test (parametric) or the Mann–Whitney U test (nonparametric). Correlations between continuous variables were assessed using the Pearson correlation coefficient (r). Statistical significance was set at p < 0.05.
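For readers less familiar with the Pearson coefficient, r is the sum of cross-products of deviations from the two means divided by the square root of the product of the two sums of squared deviations. The minimal standard-library Python sketch below computes it for paired biopsy and resection densities. It is an illustration of the formula only; the study itself used SPSS, and the example densities are invented.

```python
from math import sqrt
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either sample has zero variance")
    return s_xy / sqrt(s_xx * s_yy)

# Made-up paired densities (cells/mm²) for five patients: biopsy vs resection.
biopsy = [120.0, 340.0, 80.0, 560.0, 210.0]
resection = [150.0, 200.0, 95.0, 300.0, 400.0]
r = pearson_r(biopsy, resection)
print(round(r, 3))
```

Because r measures only linear association, a biopsy could show a moderate r with the resection specimen while still systematically over- or under-estimating absolute densities, which is why correlation and group comparisons are reported separately.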

Results

Patients

Matched endoscopic biopsy and resection specimens with sufficient remaining tissue were identified for 78 patients, all of whom were included. Clinical and pathologic characteristics of these patients are summarized in Table 1. Median age was 61 years, and the majority of patients were male (56.4%). The most frequent tumor and node classifications were T3 (53%) and N0 (54%), respectively. The most frequent overall pathologic stage was stage III (46.2%). Tumors were most commonly located in the right colon (defined as all colon supplied by the superior mesenteric artery). MSI testing was performed in 52.6% of patients (41/78), of whom 85.4% (35/41) were reported as low probability. Lymphovascular and perineural invasion were present in 30.8% and 11.5% of cases, respectively.

Table 1 Patient demographics

TIL analysis

The TME was assessed at four sites: the CT and IM of the biopsy specimen (CT-B and IM-B) and the CT and IM of the resection specimen (CT-R and IM-R). Comparison of CT-B with CT-R (Table 2) demonstrated significantly larger CD3+ and CD4+ lymphocyte populations in CT-B than in CT-R (p < 0.001 and p = 0.004, respectively). The CD3+ lymphocyte population was also larger in IM-B than in IM-R (p = 0.001). There was no significant difference in FoxP3+ or CD8+ lymphocyte populations between biopsy and resection specimens at either the CT or the IM.

Table 2 Comparison of tumor microenvironment in biopsy and resection specimens

Two sets of correlation analyses were performed (Table 3). First, TIL populations at the CT and IM were correlated within the biopsy and resection specimens separately (Supplemental Figures S1 and S2). FoxP3+ and CD8+ lymphocyte populations at the CT moderately correlated with those at the IM in both the biopsy (CT-B vs IM-B: r = 0.700 and r = 0.617, respectively) and resection specimens (CT-R vs IM-R: r = 0.673 and r = 0.621, respectively). Second, biopsy and resection TIL populations were correlated at the CT and IM separately (Supplemental Figures S3 and S4). No lymphocyte population at either the CT or IM had a Pearson r > 0.5 when biopsy and resection specimens were compared. Between biopsy and resection specimens, CD3+ and CD8+ lymphocyte populations at the IM and CT were moderately correlated (r = 0.394 to 0.444), whereas CD4+ and FoxP3+ lymphocyte populations at either site were only weakly correlated (all r < 0.250).

Table 3 Tumor microenvironment correlations

PD-L1 analysis

Biopsy and resection specimens were then grouped by PD-L1 status (Table 4). Of 78 patients, 21 (26.9%) had PD-L1+ biopsies, 35 (44.9%) had PD-L1+ resection specimens, and 13 (16.7%) had both a PD-L1+ biopsy and a PD-L1+ resection specimen. PD-L1+ biopsy specimens had greater numbers of CD3+ and CD4+ cells at CT-B and of CD3+ cells at IM-B than PD-L1- biopsy specimens. PD-L1+ resection specimens had greater numbers of FoxP3+, CD4+, and CD8+ cells at CT-R and of FoxP3+ and CD3+ cells at IM-R (all p < 0.05) than PD-L1- resection specimens.

Table 4 PD-L1 status and the tumor microenvironment

Given the discordance in PD-L1 status between biopsy and resection specimens, the accuracy of the biopsy in predicting final specimen PD-L1 status was then assessed (Table 5), with the biopsy treated as the test and the resection specimen as the gold standard. Only 16.7% of specimen pairs were true positives and 44.9% were true negatives, whereas 28.2% were false negatives (PD-L1- on biopsy but PD-L1+ on resection). The overall accuracy of the biopsy in correctly identifying the PD-L1 status of the final tumor was 61.5% (95% CI 49.8–72.3%).
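The counts reported above (78 patients, 21 PD-L1+ biopsies, 35 PD-L1+ resections, 13 positive on both) are enough to rebuild the full 2×2 table and the derived metrics. The sketch below does this in Python with the standard library only. The confidence-interval method is not stated in the text; the exact Clopper–Pearson interval shown here is one common choice and is an assumption, so its output may differ slightly from the reported interval. Function names are hypothetical.

```python
from math import comb

def two_by_two(n_total: int, n_index_pos: int, n_reference_pos: int, n_both_pos: int):
    """Rebuild TP, FP, FN, TN from the marginal counts of a 2×2 agreement table."""
    tp = n_both_pos
    fp = n_index_pos - n_both_pos      # biopsy positive, resection negative
    fn = n_reference_pos - n_both_pos  # biopsy negative, resection positive
    tn = n_total - tp - fp - fn        # negative on both
    return tp, fp, fn, tn

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(f, target: float, iterations: int = 60) -> float:
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact (Clopper–Pearson) confidence interval for the proportion k/n."""
    # Lower bound: P(X >= k) = alpha/2; upper bound: P(X <= k) = alpha/2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

tp, fp, fn, tn = two_by_two(n_total=78, n_index_pos=21, n_reference_pos=35, n_both_pos=13)
n = tp + fp + fn + tn
accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ci_low, ci_high = clopper_pearson(tp + tn, n)
print(tp, fp, fn, tn)  # 13 8 22 35
print(f"accuracy {accuracy:.1%}, sensitivity {sensitivity:.1%}, "
      f"specificity {specificity:.1%}, false negatives {fn / n:.1%} of pairs")
print(f"95% CI {ci_low:.1%} to {ci_high:.1%}")
```

The 28.2% figure is the share of all specimen pairs that were false negatives; the conventional false-negative rate, fn / (tp + fn), equals 1 minus sensitivity and is a different quantity.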

Table 5 PD-L1 accuracy

Discussion

This study investigated the degree of correlation among TMEs measured at the CT and IM in CRC biopsy and resection specimens. Our data demonstrate concordance between IM and CT FoxP3+ and CD8+ lymphocyte populations within individual biopsy and resection specimens. However, TIL populations correlated at best moderately between biopsy and resection specimens at either location (all r < 0.5). PD-L1+ resection specimens had higher densities of several TIL populations at CT-R and IM-R, but the overall accuracy of the biopsy in predicting resection PD-L1 status was only 61.5%, and 28.2% of patients were false negatives on biopsy.

Immunotherapy has only recently expanded to CRC, and only two studies to date have compared the TME of endoscopic biopsies with that of surgical specimens in CRC. Koelzer et al. [14] compared CD8+ and CD45RO+ lymphocyte populations in pre-operative biopsies and resection specimens from 130 patients with stage I–III CRC. Higher CD8+ lymphocyte infiltration in the biopsy specimen was independently predictive of improved overall survival (p < 0.01), and lower CD8+ lymphocyte densities on biopsy were predictive of higher T stage and positive nodal status. Yet when matched biopsy and resection specimens were correlated, only a moderate correlation existed for CD8+ lymphocytes (r = 0.42), and a weak correlation existed for CD45RO+ lymphocytes (r = 0.16). The authors did not report any relationship between TIL populations in the resection specimen and survival; such an analysis would have helped clarify the clinical significance of the weaker biopsy-resection correlation and the prognostic value of the biopsy relative to the resection specimen.

Similarly, Park et al. [15] compared CD3+ lymphocytes, CD8+ lymphocytes, and tumor stroma percentage (TSP, an assessment of the degree of intratumoral stroma at the deepest point of tumor invasion) in matched biopsy and resection specimens from 115 patients with stage I–III CRC. No correlation coefficients were calculated, but TIL density was incorrectly characterized in 26.1% to 41.1% of cases, depending on the type and location of the T cells. The group also noted a significant difference between biopsy and resection specimens in the classification of TSP as high or low density (p = 0.001). The authors nonetheless concluded that the biopsy specimen provides a representative assessment of the TME, despite the statistically significant differences in lymphocyte populations and tumor stroma between biopsy and resection specimens.

Cumulatively, these data suggest a lack of correlation in certain TIL populations between biopsy and final surgical resection specimens. Although Pearson correlations and sensitivity/specificity values are difficult to compare across studies, because each can be influenced by factors inherent to the individual data set, our study supports a growing body of literature that questions how reliably a biopsy represents the TME of the resection specimen. Our biopsy-resection correlations for CD8+ lymphocytes at the IM (r = 0.417) and CT (r = 0.394) mirror the CD8+ correlation reported by Koelzer et al. (r = 0.40) [14]. Similarly, Park et al. reported that biopsy intraepithelial CD3+ lymphocyte density predicted resection specimen density with an accuracy of only 73% [15]. Our study corroborates these earlier data while providing a more comprehensive examination of the TME of both biopsy and resection specimens, including additional TIL populations (FoxP3+ and CD4+). We found no strong correlation (all r < 0.5) between biopsy and resection specimens for any of the examined TIL populations. This lack of correlation is concerning, as such assessments are becoming crucial both for deciding patient treatment and for determining the efficacy of therapies.

Our study also examined PD-L1 positivity and its concordance between biopsy and resection specimens. We found substantial discordance in PD-L1 status, with 38.5% of patients misclassified by the endoscopic biopsy. This inaccuracy was driven by low sensitivity (37.1%): most tumors that were PD-L1+ on resection were PD-L1- on biopsy, so that 28.2% of all patients were false negatives, which in clinical practice could result in withholding potentially effective immunotherapy. A possible explanation for this low accuracy is a high degree of intratumoral PD-L1 heterogeneity. PD-L1 heterogeneity is well documented in lung adenocarcinoma [16, 17] and breast cancer [18, 19], and it is reasonable to hypothesize that similar variability may be present in CRC. Such heterogeneity may extend beyond PD-L1 to different TIL phenotypes and other aspects of the TME.

Understanding TME heterogeneity is particularly important because individual TIL populations [10, 20,21,22,23] as well as PD-L1/PD-1 staining [24, 25] have been shown to have prognostic importance. However, these conclusions are based on studies that analyzed only the TME of surgical resection specimens. As researchers attempt to apply such analyses in the neoadjuvant setting, a biopsy-based analysis rendered inadequate by TME heterogeneity may result in the misclassification of patients. In addition, comparing biopsy and resection specimens separated by chemo- or immunotherapy to draw conclusions about the effect of therapy on the TME may be misleading, because no reliable correlation between biopsy and resection TMEs has been demonstrated. We must, therefore, use caution when making important treatment decisions based on a biopsy specimen alone, and similar caution when judging the efficacy of a therapy from changes in the TME calculated by comparing a pre-treatment biopsy with a post-treatment biopsy or resection specimen.

Future studies must focus on developing an adequate means of comparing biopsy and resection specimens. Although no study to date has addressed this problem, we advocate thorough mapping of the TME using multiple samples throughout the tumor, with initial analysis aimed at understanding the implications of differences throughout the TME. More extensive sampling is also needed both before and after treatment in window of opportunity trials to characterize more definitively the effects of any therapy on the TME. Similarly, additional studies are needed to understand the clinical implications of PD-L1 heterogeneity and whether reviewing a greater number of slides to better characterize a patient’s PD-L1 expression status can improve our ability to predict response to targeted therapies.

Our study has a number of limitations. First, this was a retrospective study constrained by the data available in the medical record and the tissue remaining for analysis. Insufficient biopsy tissue remaining after the original diagnosis limited the power of this study, because patients without available tissue had to be excluded. Second, a more thorough analysis would have addressed how individual TIL populations, particularly those from the biopsy specimen, correlate with clinical outcome. However, long-term follow-up data were lacking for a number of patients, which precluded an accurate analysis of this kind. Finally, we did not compare multiple CT or IM regions from the same specimen to evaluate spatial heterogeneity within each location. We would expect some degree of heterogeneity even within the CT or the IM, but determining whether a specific location more consistently represents the TME would be of great clinical relevance. This question should be investigated in future studies.

In conclusion, our results demonstrate only a weak-to-moderate correlation between the TME assessed on biopsy and the TME of the final resection specimen. In particular, the accuracy of endoscopic biopsy in assessing tumor PD-L1 status was only 61.5%. Until a more accurate means of assessing the TME from biopsy alone is available, we recommend caution both when excluding patients from specific therapies based solely on endoscopic biopsy findings and when interpreting the results of window of opportunity studies in colon cancer that rely on comparisons between biopsy and resection specimens.