Introduction

In hormone receptor-positive (HR+) and human epidermal growth factor receptor 2-negative (HER2−) breast cancers, first-generation genomic signatures that are highly associated with proliferation have been widely used to predict prognosis and as secondary markers for predicting chemotherapy response [1, 2]. Several prospective randomized trials are currently evaluating the utility of the first-generation genomic signatures, and some results have recently been reported [3, 4]. For example, genomic markers, especially for HR+ breast cancer, may identify patients who are unlikely to benefit from adjuvant chemotherapy, which is also associated with significant side effects. Cost-effectiveness has also been discussed, and these reports showed similar advantages, in terms of quality-adjusted life-years, for first-generation signatures compared to conventional clinicopathological markers [5, 6]. However, with the exception of the conventional clinicopathological markers, there are no standardized and clinically available prognostic and predictive markers for HER2+ or HR− breast cancers. Several recent clinical studies have revealed that morphological evaluation for tumour-infiltrating lymphocytes (TILs), using haematoxylin and eosin (HE) or immunohistochemistry (IHC) testing, can predict prognosis and chemotherapy response, independent of the effects of age, nodal status and tumour size, in cases of estrogen receptor-negative (ER−), triple-negative (TN) and HER2+ breast cancer [7–9]. A recent meta-analysis also revealed that high levels of TILs were significantly associated with favourable breast cancer outcomes in patients who predominantly had TN cancers [10]. Thus, using morphologically measured TILs in different breast cancer subtypes may provide clinically relevant information regarding chemotherapy response and prognosis.

Despite this information, most panellists at the 2015 St Gallen Consensus Conference did not recommend using TILs as a new prognostic factor, based on the absence of standardized evaluation guidelines and limited information regarding reproducibility and clinical validity [11]. However, a group of professionals who are experienced in TILs evaluation (the International TIL Working Group) recently issued recommendations for improving the consistency of TILs scoring, as well as detailed guidelines for annotating lymphocyte infiltration [12]. These recommendations are important, as HE or IHC testing for morphology is cumbersome and lacks objectivity and reproducibility in many instances. For example, there is broad inconsistency in the IHC evaluation of Ki-67 in moderately differentiated breast cancer, and there is controversy regarding whether Ki-67 is an appropriate biomarker for guiding treatment decisions for patients with breast cancer. Furthermore, previous studies have described inconsistent Ki-67 assessments during the routine diagnosis of breast cancer [13, 14]. Moreover, the inter- and intra-observer variability in Ki-67 assessments remains poor to moderate in cases of breast cancer, especially in the G2 breast cancer group (kappa: 0.2–0.4), despite recommendations from the International Ki-67 in Breast Cancer Working Group [13, 14]. Thus, the absence of standardized methodologies, cut-off values and information regarding inter-/intra-observer agreement for evaluating TILs has limited the use of morphological testing to detect TILs in clinical practice [12, 15].

The problems of reproducibility and consistency may be further exacerbated by the complex testing procedure, as Hida et al. have indicated that the morphological method is too detailed for pathologists to use in clinical practice [16]. Thus, a reproducible and objective method for evaluating TILs, such as gene expression profiling, is needed. Previously published results may provide valuable information regarding the use and logistical implementation of gene expression profiles, as several studies have addressed sample handling, testing reproducibility, quality control and standardization of genomic signatures [2, 17, 18]. However, little is known regarding whether a TILs-associated genomic signature (TILs-GS) can predict prognosis and treatment response. Therefore, the present study evaluated whether HE-measured stromal and intra-tumour TILs levels were associated with gene expression profiles, and whether TILs-GS could be used to predict chemotherapy response and prognosis in several breast cancer subtypes.

Materials and methods

Training dataset and TILs-GS

We retrospectively evaluated haematoxylin and eosin-stained slides and gene expression profiling data (Gene Expression Omnibus dataset: http://www.ncbi.nlm.nih.gov/geo/GSE6367) from 40 patients with primary breast cancer. The slides were evaluated for TILs at low magnification (×2−4) by a single pathologist from the Nihon University School of Medicine. The presence of TILs was evaluated at the edges of the tumour mass, in the tumour mass, and in the stroma surrounding the expanding mammary ducts that were packed with carcinoma cells. The HE-assessed TILs results were scored as 0 (no detected TILs), 1 (sparse TILs; <50% of the area had TILs) or 2 (dense TILs; >50% of the area had TILs). Among the 40 included cases, 11 cases were assigned a score of 0, 18 cases were assigned a score of 1 and 11 cases were assigned a score of 2. Our institutional ethics board approved the use of human tissues for the HE assessments of TILs.

We subsequently selected 29 samples (scores of 0 or 2) that had available TILs information and gene expression profiling data, and identified genes that were differentially expressed between the samples with TILs scores of 0 and those with scores of 2. To minimize noisy measurements, we removed probe sets whose average expression values fell within the lowest 15% of the expression distribution and, for each gene, retained only the probe set with the highest average expression. Thus, 7797 genes were included in the analysis.
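The filtering step described above can be illustrated with a minimal Python sketch. The function name, the input layout (probe-set ID mapped to a gene symbol and its expression values) and the way the 15% cut-off is taken from the sorted probe-set means are assumptions made for illustration; this is not the BRB Array Tools implementation.

```python
from statistics import mean


def filter_probe_sets(probes, low_fraction=0.15):
    """Select one probe set per gene after removing low-expression probe sets.

    probes: dict mapping probe_id -> (gene_symbol, [expression values]).
    Returns a dict mapping gene_symbol -> retained probe_id.
    """
    means = {pid: mean(values) for pid, (_, values) in probes.items()}
    ranked = sorted(means.values())

    # Assumed cut-off: the largest mean among the lowest 15% of probe sets.
    n_drop = int(len(ranked) * low_fraction)
    cutoff = ranked[n_drop - 1] if n_drop > 0 else float("-inf")

    best = {}
    for pid, (gene, _) in probes.items():
        if means[pid] <= cutoff:
            continue  # low-expression probe set, treated as noise
        # Keep the probe set with the highest mean expression for each gene.
        if gene not in best or means[pid] > means[best[gene]]:
            best[gene] = pid
    return best
```

With a real annotation file, the same logic would be applied once to the training data, and the retained probe sets would then be reused unchanged in every validation dataset.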

We also performed a class comparison test for mRNA gene expression using the samples with TILs scores of 0 or 2. Because HE-assessed TILs levels are highly associated with ER status [19], we blocked the samples by ER status (a randomized block design) using the BRB Array Tools software. This approach allowed us to adjust for a single covariate (i.e., ER status) while comparing the classes (i.e., TILs scores of 0 or 2). Parametric P values of <0.001 were considered statistically significant in the training analysis, and the 22 genes that were overexpressed in cases with TILs scores of 2 were selected as the TILs-GS (Supplementary Table 1). The overall TILs-GS score was calculated as the unweighted average expression of the 22 genes, in order to ensure comparability of results that were obtained using different chip types.
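The unweighted score can be sketched in a few lines of Python. The function name and the expression values below are hypothetical, and the four genes shown (named in the Discussion as TILs-associated) stand in for the full 22-gene list in Supplementary Table 1. Skipping genes that are absent from a chip is also an assumption, not a rule stated in the text.

```python
from statistics import mean


def tils_gs_score(sample_expression, signature_genes):
    """Unweighted mean of the log2 expression values of the signature genes.

    sample_expression: dict mapping gene_symbol -> log2 expression (one sample).
    Genes missing from the sample are skipped (assumption for chip types that
    lack some probe sets); at least one signature gene must be present.
    """
    values = [sample_expression[g] for g in signature_genes if g in sample_expression]
    if not values:
        raise ValueError("no signature genes found in this sample")
    return mean(values)


# Stand-in gene list and made-up expression values, for illustration only.
example_genes = ["ICOS", "TCF7", "LCK", "LCP1"]
sample = {"ICOS": 7.0, "TCF7": 8.0, "LCK": 9.0, "LCP1": 10.0, "ESR1": 11.2}
print(tils_gs_score(sample, example_genes))  # prints 8.5
```

Because every gene carries the same weight, the score does not depend on coefficients fitted to one platform, which is the comparability argument made above.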

Table 1 Logistic regression analysis for pCR in the anthracycline and taxane containing dataset

Validation analysis for TILs-GS

During the validation analysis, we retrieved publicly available cDNA microarray data from 2337 primary breast cancers (806 cases without systemic adjuvant therapy from GSE2034, GSE2990, GSE7390 and GSE11121; 625 cancers that received anthracycline and taxane-based neoadjuvant chemotherapy [NAC] from GSE20194, GSE20271, GSE22093, GSE23988 and GSE25066; 780 cases that received tamoxifen from GSE6532, GSE12093, GSE1705 and GSE26971; and 126 cases that received trastuzumab from GSE37946, GSE42822 and GSE50948). These data were annotated using the Affymetrix Human Genome Array (Affymetrix Inc., Santa Clara, CA). Expressions of ER and HER2 were identified based on ER (ESR1) and HER2 (ERBB2) mRNA expression levels, as previously described [20, 21]. All gene expression data were generated using Affymetrix gene chips and normalized using the MAS5 algorithm (http://www.bioconductor.org), with the mean expression centred to 600 and log2 transformation. Patients with ESR1 mRNA expression levels (probe set: 205225_at) of >10.18 were considered ER+, and patients with HER2 mRNA expression levels (probe set: 216836_s_at) of >12.54 were considered HER2+ [20, 21]. ER+ and HER2− breast cancers were stratified into two groups with luminal A-like low proliferation or luminal B-like high proliferation. The proliferation score was calculated as the average expression of 12 mitotic kinases (Mitotic Kinase Score), as previously described [22]. The cut-off point between luminal-low and -high proliferation was set at a Mitotic Kinase Score of 8.255 [20].
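The subtype assignment described above can be sketched as follows. The cut-offs are those given in the text; the function names are hypothetical. Two details are assumptions: cases at exactly 8.255 are treated as low proliferation, and HER2+ is assigned regardless of ER status. The `normalise` helper shows only the scaling and log2 steps, not the MAS5 algorithm itself.

```python
import math

ESR1_CUTOFF = 10.18   # probe set 205225_at, log2 scale
ERBB2_CUTOFF = 12.54  # probe set 216836_s_at, log2 scale
MKS_CUTOFF = 8.255    # Mitotic Kinase Score


def normalise(raw_values, target_mean=600.0):
    """Scale one array so that its mean equals target_mean, then take log2.

    raw_values must be positive signal intensities.
    """
    factor = target_mean / (sum(raw_values) / len(raw_values))
    return [math.log2(v * factor) for v in raw_values]


def classify_subtype(esr1, erbb2, mitotic_kinase_score):
    """Assign one of the four subtypes used in the analysis."""
    if erbb2 > ERBB2_CUTOFF:
        return "HER2+"  # assumption: irrespective of ER status
    if esr1 > ESR1_CUTOFF:
        # Assumption: a score exactly at the cut-off counts as low proliferation.
        if mitotic_kinase_score > MKS_CUTOFF:
            return "luminal-high"
        return "luminal-low"
    return "TN"  # ER-/HER2-
```

For example, `classify_subtype(11.0, 10.0, 9.0)` returns `"luminal-high"`, whereas the same case with an ERBB2 value of 13.0 would be labelled `"HER2+"`.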

First, we compared the TILs-GS according to breast cancer subtype (luminal-low, luminal-high, HER2+ and TN [ER−/HER2−]) using the Kruskal–Wallis rank sum test. The prognostic analysis was performed using datasets from patients who received no systemic adjuvant therapy or only adjuvant tamoxifen. The outcome of interest was defined as distant event-free survival (DEFS), and was evaluated according to the tertiles of the TILs-GS score. Survival was censored at 10 years. Survival curves were calculated using the Kaplan–Meier method and compared using the log-rank test. Survival was also evaluated using a Cox proportional hazards regression model to estimate the hazard ratios (HRs) and 95% confidence intervals (CIs). In the tamoxifen-treated dataset, we only used ER+ and HER2− cases for the predictive analysis.
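The tertile grouping, 10-year censoring and Kaplan–Meier estimate described above can be sketched in plain Python. The function names are hypothetical, tied scores are split by sort order (an assumption), and the sketch omits the log-rank test and Cox regression. It is not the R code used for the published analysis.

```python
def censor_at(time, event, horizon=10.0):
    """Administratively censor follow-up at `horizon` years."""
    if time > horizon:
        return horizon, 0
    return time, event


def tertile_groups(scores):
    """Label each score 0 (low), 1 (middle) or 2 (high) by rank tertile."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 3 // len(scores)
    return labels


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve
```

In use, each patient's follow-up would first pass through `censor_at`, and `kaplan_meier` would then be called once per tertile group within each subtype.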

Second, therapy sensitivity analysis was performed according to whether the patients had received anthracycline- and taxane-based NAC or a trastuzumab-containing regimen. The outcome of interest was defined as pathological complete response (pCR) in the breast and axilla. The samples for the NAC cohorts had been collected before any treatment using needle biopsy. The Wilcoxon test was used to evaluate the associations between TILs-GS and the responses to NAC or trastuzumab according to breast cancer subtype. Univariate and multivariate logistic regression analyses were also performed to evaluate the values of TILs-GS and clinicopathological variables for predicting NAC response. To avoid optimal cut-off selection bias [23], the univariate and multivariate logistic regression analyses were performed using metagene scores as continuous variables. The multivariate analyses included variables with a univariate P value of <0.1 to avoid overfitting of the data, based on the small number of events in each subgroup.
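The comparison of TILs-GS between response groups can be sketched as a two-sample rank-sum test. This assumes that the Wilcoxon test mentioned above is the rank-sum (Mann–Whitney) form, since the pCR and residual disease groups are independent. The function name is hypothetical, and the P value uses a normal approximation without tie or continuity correction, so it may differ slightly from the value reported by R.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for group_a, approximate P value). Tied values
    receive average ranks; the variance is not tie-corrected.
    """
    pooled = sorted(
        (value, label)
        for label, values in enumerate((group_a, group_b))
        for value in values
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        n_from_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_a += avg_rank * n_from_a
        i = j

    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_value
```

Here `group_a` would hold the TILs-GS scores of the pCR cases and `group_b` those of the residual disease cases within one subtype.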

All statistical analyses were performed using BRB Array Tools software (version 3.9.0a; http://linus.nci.nih.gov/BRB-ArrayTools.html) and R software (version 2.9.0; http://www.r-project.org). Two-sided P values of ≤0.05 were considered statistically significant.

Results

Figure 1 shows the associations between TILs-GS and the breast cancer subtypes in the prognostic dataset. Significantly higher TILs-GS expressions were observed for TN and HER2+ breast cancers, compared to the luminal types (rank sum test P < 0.001). As expected, higher TILs-GS expression levels were associated with the characteristics of more aggressive breast cancers.

Fig. 1

Tumour-infiltrating lymphocytes mRNA gene expressions according to breast cancer subtype. The box plots show the tumour-infiltrating lymphocytes (TILs) mRNA gene expression levels according to breast cancer subtype. P values were calculated using the Kruskal–Wallis rank sum test.

Prognostic and predictive values of TILs-GS

The Kaplan–Meier DEFS curves for the TILs-GS tertiles were plotted according to breast cancer subtype using the pooled prognostic datasets from tamoxifen-treated patients and patients without adjuvant treatment (Fig. 2, Supplementary Fig. 1). Among the patients without adjuvant treatment, TILs-GS did not predict prognosis in the luminal cases, regardless of the proliferative level (Fig. 2a, b). Similarly, in the tamoxifen-treated dataset, TILs-GS did not predict prognosis in either of the proliferative subtypes (Supplementary Fig. 1a, b). Among HER2+ cases, a high TILs-GS was significantly associated with a better prognosis, compared to cases with lower expressions (log rank P = 0.001), although this analysis only considered a small number of cases (n = 120) (Fig. 2c). Among TN cases, we observed a trend similar to that in the HER2+ cases, although the trend was not statistically significant (log rank P = 0.729) (Fig. 2d).

Fig. 2

The Kaplan–Meier curves according to tumour-infiltrating lymphocytes gene signatures in the dataset without adjuvant treatment. The Kaplan–Meier curves for a luminal-low, b luminal-high, c HER2+ and d triple-negative breast cancers. P-values were calculated using the log-rank test. Hazard ratios (HR) and 95% confidence intervals (CIs) were estimated for distant event-free survival using Cox regression analysis

We also assessed the predictive power of TILs-GS using the NAC-treated cases according to breast cancer subtype. With the exception of the luminal-low proliferation subtype, the Wilcoxon test revealed significantly different TILs-GS levels between the cases with pCR or residual disease. This result indicates that TILs-GS might be a strong marker for predicting chemotherapy response (Fig. 3). In contrast, TILs-GS had no predictive power in the two independent trastuzumab-treated datasets (Supplementary Fig. 2).

Fig. 3

Neoadjuvant therapy responses and tumour-infiltrating lymphocytes gene signatures according to breast cancer subtype. The regimens contained anthracycline and taxane. The boxplots show the associations between tumour-infiltrating lymphocytes gene signatures (TILs-GS) and neoadjuvant therapy responses according to breast cancer subtype (A: luminal-low, B: luminal-high, C: HER2+ and D: triple-negative). P values were calculated using Wilcoxon’s test. pCR pathological complete response, RD residual disease

Finally, we performed univariate and multivariate logistic regression analyses to determine whether TILs-GS and the clinicopathological variables could predict pCR (Table 1). In the univariate analyses, pCR was significantly associated with higher histological grade, ER negativity, HER2 positivity and higher TILs-GS scores. In the multivariate analysis, pCR was independently associated with smaller tumour size, higher histological grade, ER negativity, HER2 positivity and higher TILs-GS scores (OR 2.02, 95% CI 1.30–3.14, P = 0.025).

Discussion

The present study revealed that genomic markers were highly associated with TILs levels based on HE. Interestingly, Gu-Trantien et al. evaluated leukocyte infiltration in various breast cancers, and found that 75% of the cells were T lymphocytes, <20% of the cells were B-cells, <10% of the cells were monocytes and <5% of the cells were natural killer cells or natural killer T-cells [24]. Given that our TILs evaluations were based on HE, it is unsurprising that most TILs-associated genes had roles in immune function, especially in T lymphocytes (e.g., ICOS, TCF7, LCK and LCP1). Furthermore, we found that breast cancers with TILs scores of 2 (dense TILs) and 0 (no identified TILs) had distinct gene expression patterns, and that aggressive breast cancer subtypes (e.g., ER− or HER2+) were associated with higher TILs-GS levels. Similar results have been observed in previous studies [19, 25, 26]. However, we also investigated the clinical implications of these findings and discovered that TILs-GS was associated with chemotherapy response in several breast cancer subtypes.

Our finding that TILs-GS was highly associated with stromal and intra-tumour TILs status may be reasonable and reproducible. The International TILs Working Group recommends evaluating stromal TILs as the principal parameter, rather than intra-tumour TILs, because intra-tumour TILs do not provide the same information that is provided by stromal TILs [12]. However, recent evidence from the neoadjuvant setting suggests that both stromal and intra-tumour TILs can predict NAC response [27]. In addition, Dieci et al. reported that intra-tumour and stromal TILs strongly predicted overall survival (intra-tumour TILs, HR 0.85, P = 0.003; stromal TILs, HR 0.89, P = 0.005) [8]. Nevertheless, it is impractical to consider only stromal TILs during clinical practice, as samples for genomic signatures are usually obtained using core needle biopsy, which yields samples that contain tumour cells (50%), lymphocytes (20%) and stromal cells (30%) [17], or from surgical samples, whose composition is usually similar to that of core needle biopsy samples. Thus, without microdissection to separate the stromal and intra-tumour components, gene expression profiling inevitably involves intra-tumour components. Moreover, microdissection is a complex procedure that cannot be routinely performed during clinical practice, and our goal was to develop TILs-GS as a clinically useful tool. Therefore, the TILs-GS was developed using the signatures that were associated with both intra-tumour and stromal TILs, which allowed us to directly examine and compare TILs-GS with the morphologically evaluated TILs levels.

The present study also revealed that TILs-GS predicted chemotherapy response in most breast cancer subtypes, with the exception of the luminal-low proliferative subtype. Several previous reports have also revealed that immune-related genomic signatures have predictive value, especially in non-luminal breast cancers [22, 28, 29]. There are several possible explanations for the absence of predictive value in the luminal-low proliferative subtype. First, chemotherapy itself may not be effective for low-proliferative breast cancers [30–32]. Second, the pCR outcome after NAC may not be suitable for evaluating efficacy in luminal cases [33]. Third, it is possible that our analyses were underpowered, given the sample size and number of events.

Interestingly, our results revealed that TILs-GS had prognostic value only in HER2+ cases. Previous studies have evaluated the prognostic value of TILs in the context of randomized adjuvant trials for breast cancer. The results indicate that baseline TILs were associated with high-proliferative, high-grade and ER− breast cancers, and strongly predicted prognosis for specific breast cancer subtypes, especially TN cancers [7, 19]. However, these trials only considered patients who received adjuvant chemotherapy and/or targeted therapy, and the prognostic value of TILs in untreated patients remains unclear. In the present study, the untreated dataset included retrospectively evaluated outcomes in patients with stage I–II disease and without lymph node metastasis. Thus, the clinical and biological significance of TILs may be distinct in early and advanced breast cancers, and it might be useful to identify patients with a poor prognosis (who should not receive adjuvant therapy) and patients who are expected to experience a good response to therapy. This type of evaluation would require patients with advanced cancers who have not received adjuvant therapy, although it would be difficult to prospectively collect samples in this subgroup, given the related ethical issues.

To address this issue, we tested the predictive value of TILs-GS among trastuzumab-treated cases, as TILs can predict long-term survival in these cases [8, 34], as well as the efficacy of trastuzumab [26, 35]. In addition, trastuzumab treatment results in the activation or recruitment of multiple immune cell lineages, and increases the susceptibility of tumour cells to antibody-dependent cytotoxicity [36]. However, the N9831 trial revealed that TILs were not associated with prognosis among patients who received chemotherapy plus trastuzumab [37]. Nevertheless, patients who receive trastuzumab are a unique subgroup, as they typically receive trastuzumab combined with multiple chemotherapeutic agents, which can induce immunogenic cell death and carcinoma differentiation and inhibit TILs mitosis [38–40]. Thus, data from patients who received only a single agent are needed to evaluate a single marker’s predictive power (the “one agent needs one predictive marker” concept). In the present study, TILs-GS did not have prognostic value when we only considered the tamoxifen-treated dataset, and this result was independent of the proliferative level, which indicates that TILs may have distinct roles in cases that received hormone therapy or chemotherapy. Interestingly, Dowsett et al. reported that higher expression of immune-related genes was associated with a poorer response to aromatase inhibitors [41], which is the opposite of the association observed for chemotherapy response. The immune system has potentially conflicting roles in suppressing tumour growth and in promoting carcinogenesis through the production of cytokines and growth factors [42]. Therefore, the absence of predictive value in chemotherapy-treated luminal-low proliferative cases might be related to the distinct roles of TILs in different breast cancer subtypes. Additional studies are needed to validate our findings and address these issues.

The present study has an important limitation: the training and validation datasets were relatively small, and therefore some true but weaker prognostic and predictive variables may not have been detected as significant in our study. Also, our TILs-GS findings should be compared to the predictive powers of previously published immune-related signatures [43]. Nevertheless, we believe that our findings are generalizable and consistent with predictive results that were observed in datasets treated using homogeneous chemotherapy regimens. Furthermore, our methods for gene expression profiling using stromal and intra-tumour components, and our unweighted calculations of the gene expression profiles, should be relatively easy to validate using other datasets.

In conclusion, TILs-GS was associated with stromal and intra-tumour TILs levels, as evaluated using HE, and predicted chemotherapy response in several breast cancer subtypes. Further studies are needed to perform stratification according to TILs-GS levels and the conventional breast cancer subtypes.