Immunohistochemistry (IHC) is extensively used in contemporary pathology to visualize specific proteins in biological tissues, and it is undergoing an evolutionary shift to meet the requirements of personalized medicine [1]. Typically, IHC markers are interpreted in a binary manner to support pathology diagnosis, whereas quantification of in situ protein signals is sought predominantly for anti-cancer therapy decision-making. For these purposes, semi-quantitative scoring systems assess the location, relative intensity, and estimated percentage of positive cells [2]. Rather than providing data on the true concentrations of proteins, IHC presents a binary signal indicating the presence of a particular protein in the tissue structures examined. Enumeration of positive and negative signals in the context of tissue components generates data to quantify cell populations of interest. Importantly, this information can be interpreted in the context of two-dimensional space and biological tissue structures.

Conventional IHC interpretation is based on human visual perception to identify tissue structures and regions of interest and to produce semi-quantitative estimates of the intensity and distribution of IHC signals and the structures labeled. While this approach is sufficient for diagnostic pathology, it does not meet the expectations of personalized therapies, which increasingly rely on accurate biomarker measurement. This gap is eloquently illustrated by situations in which a manual count of IHC-positive and IHC-negative cell profiles remains the method of choice [3].

Digital IHC, enabled by recent progress in high-resolution scanning of microscopy slides and digital image analysis (DIA), provides new options for improving the accuracy, reproducibility, and capacity of analyzing the data contained in microscopic images. Numerous studies have elaborated on these characteristics and have shown the potential benefits of digital technologies [4, 5]. On the other hand, complex issues and challenges in digital IHC test validation to meet clinical standards have been highlighted, and they remain to be resolved [6, 7].

It has become apparent that IHC DIA-generated data per se cannot provide “quick fixes” or deliver all of the expected benefits. A set of methodologies for validation and standardization, quality assurance, informative metrics, and statistics of IHC DIA remains to be developed [8]. Moreover, the full benefits of DIA can be realized by going beyond the simple quantification of cell populations and retrieving information on the spatial context and intra-tissue variance of biomarker expression. Together, these efforts have led to the emerging field of comprehensive IHC. Furthermore, the success of new techniques for highly multiplex, truly quantitative, broad dynamic range tissue proteomics, introduced as next-gen IHC [9], will depend on the development of methodologies in bioimage informatics.

There are many areas of tumor pathology in which comprehensive IHC is urgently needed to both accurately measure biomarker expression and to evaluate the intra-tissue heterogeneity of biomarkers. The proliferative activity of breast cancer tissue, estimated by the Ki67 labeling index (Ki67 LI), is one particular example in which prognosis and clinical decision-making largely depend on this biological feature; nevertheless, its clinical utility is hindered by the absence of a harmonized methodology for the test [3, 10]. The situation is further complicated by the phenomenon of intra-tumor heterogeneity of Ki67 expression, raising the need for standardized sampling of tumor tissue for DIA, including the definition, detection, and evaluation of Ki67 hotspots.

A growing number of studies have addressed the intra-tissue heterogeneity of IHC in breast cancer and other tumors. While some investigators have employed texture-based metrics [11, 12], others have proposed automated algorithms to detect hotspots [13], and still others have proposed methodologies based on a manual count to improve their efficiency and account for heterogeneity [14]. The intra-tumor heterogeneity of IHC marker expression has been estimated in a tissue microarray (TMA) study with multiple sampling of tumor tissue [15]. Heterogeneity mapping of protein expression using quantitative immunofluorescence for estrogen receptor and HER2 expression has been proposed in ovarian cancer [16]. As recently reviewed [17], digital pathology methods can contribute to a more comprehensive understanding of the highly heterogeneous tumor microenvironment by mapping the spatial and morphological patterns of normal and cancer cells. Various texture and density-based spatial clustering methods have been employed in these studies to uncover visually undetectable spatial cell interactions in tumors; however, the potential of spatial statistics derived from regular lattices and grids has not been investigated. Meanwhile, in geography, ecology, and other fields that are advanced in spatial modeling, regular polygons in arrays have been shown to be the most efficient means of mapping spatial variation and to provide the most common framework for spatially explicit models [18, 19]. Although the rectangular (square) grid is the most commonly used owing to its relative mathematical simplicity, the hexagonal grid has long been known to be superior in many respects in image processing and machine vision [20].
In particular, hexagons benefit from being closer in shape to circles: a hexagon has a shorter perimeter than a square of equal area, which reduces sampling bias due to edge effects. More importantly, the hexagonal grid has a simpler and more symmetrical nearest neighborhood and offers important advantages in visualization [18].
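The perimeter and neighborhood arguments can be checked with elementary geometry. The following Python sketch (the function names are illustrative and not part of the study's software) compares the perimeters of a circle, a regular hexagon, and a square that enclose the same area, and lists the center-to-center distances from one tile to every tile that touches it.

```python
import math


def hexagon_side(area: float) -> float:
    """Side length of a regular hexagon with the given area: A = (3 * sqrt(3) / 2) * s**2."""
    return math.sqrt(2.0 * area / (3.0 * math.sqrt(3.0)))


def perimeters_for_equal_area(area: float) -> dict:
    """Perimeters of a circle, a regular hexagon, and a square that all enclose `area`."""
    return {
        "circle": 2.0 * math.sqrt(math.pi * area),
        "hexagon": 6.0 * hexagon_side(area),
        "square": 4.0 * math.sqrt(area),
    }


def neighbor_distances(area: float) -> dict:
    """Center-to-center distances from one tile to all tiles that touch it."""
    s = hexagon_side(area)
    a = math.sqrt(area)
    return {
        # 6 edge-sharing neighbors, all at twice the apothem = sqrt(3) * s
        "hexagon": [math.sqrt(3.0) * s] * 6,
        # 4 edge-sharing neighbors at a, plus 4 corner-touching neighbors at a * sqrt(2)
        "square": [a] * 4 + [a * math.sqrt(2.0)] * 4,
    }


if __name__ == "__main__":
    for shape, p in perimeters_for_equal_area(1.0).items():
        print(f"{shape:8s} perimeter: {p:.3f}")
    for shape, d in neighbor_distances(1.0).items():
        print(f"{shape:8s} neighbor distances: {sorted({round(x, 3) for x in d})}")
```

For a unit area, the perimeters come out at about 3.545 (circle), 3.722 (hexagon), and 4.000 (square). All six hexagon neighbors lie at the same distance (about 1.075), whereas a square has four edge neighbors at 1.0 and four diagonal neighbors at about 1.414, which is the asymmetry the hexagonal grid avoids.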

In this study, we propose a methodology for comprehensive Ki67 IHC evaluation in whole-slide images (WSI) of breast cancer tissue that is based on the systematic subsampling of DIA-generated data into a hexagonal tiling (HexT) array, thereby enabling the computation of texture and distribution indicators for Ki67 LI intra-tumor variability. Multivariate analyses extracted 4 major factors of variance, defined as entropy, proliferation, bimodality, and cellularity. The factor scores were further used in cluster analysis, which outlined subcategories of heterogeneous tumors with predominant entropy, bimodality, or both at different levels of proliferative activity. The HexT data also allowed the visualization of Ki67 LI heterogeneity in the WSI along with automated detection and quantitative evaluation of Ki67 hotspots that were based on the upper quintile of the HexT data, conceptualized as the “Pareto hotspot”.

Materials and methods

Study population and tumor characteristics

The tumor WSI were prospectively collected, as part of the diagnostic routine, from 302 patients with invasive breast carcinoma who had been treated by surgical excision at the National Cancer Institute. The WSI were investigated at the National Center of Pathology between 2013 and 2014. Pathology report data (patient age, tumor histological type, grade, estrogen and progesterone receptor scoring, HER2 status (verified by a HER2 FISH test in IHC 2+ cases), and Ki67) were collected. The Lithuanian Bioethics Committee approved this study.

The median age of the patients was 60 years (range, 24 to 88). Histologically, there were 271 invasive ductal (89.7 %), 22 invasive lobular (7.3 %), and 9 special-type carcinomas (3 %). Histological grading was performed according to the Nottingham grading system [21]: 21 cases were grade 1 (7 %), 123 cases grade 2 (41 %), and 157 cases grade 3 (52 %). The cases were categorized into tumor T and N stages as follows: T1 (183), T2 (123), T3 (11), T4 (9), N0 (171), N1 (83), N2 (28), N3 (9); the T and N stages were not established in 2 and 11 cases, respectively. Based on immunophenotyping, the tumors were categorized as hormone receptor (HR)-positive (Luminal, 189), HR-positive with HER2 co-expression (Luminal HER2, 33), HER2-positive (22), and triple negative (TN) (55).

Ki67 immunohistochemistry, image analysis, tiling of the tumor area, and computation of the HexT parameters

Ki67 IHC, image acquisition, and analysis were performed as reported in our previous study [5]; details are provided in Online Resource 1. Hexagonal tiling of the tumor area, the design of the co-occurrence matrix for the HexT, Haralick texture parameters, the temperature map, and the Gaussian mixture models used to compute bimodality indicators [18, 22–28] are also described in Online Resource 1. A visual representation of the tumor analysis performed by the HexT approach is presented in Fig. 1.
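As an illustration of the subsampling step only, the Python sketch below bins per-nucleus DIA output into hexagons. It assumes, hypothetically, that the DIA exports one record per nuclear profile with centroid coordinates (x, y) and a Ki67-positive flag. It uses pointy-top hexagons indexed by axial coordinates with cube rounding, a standard hex-grid technique, and a `size` parameter equal to the hexagon circumradius in the same units as the coordinates. How this parameter relates to the HexSize825 setting, and which inclusion criteria apply per Hex, are described in Online Resources 1 and 2 and are not reproduced here; `min_cells` is a placeholder.

```python
import math
from collections import defaultdict


def point_to_hex(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon of circumradius `size` that contains (x, y)."""
    q = (math.sqrt(3.0) / 3.0 * x - y / 3.0) / size
    r = (2.0 / 3.0 * y) / size
    # cube rounding: round the three cube coordinates, then repair the one with the largest error
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hex_ki67_li(nuclei, size, min_cells=1):
    """Per-Hex Ki67 LI (%) from (x, y, is_positive) records of nuclear profiles."""
    counts = defaultdict(lambda: [0, 0])  # hex index -> [positive, total]
    for x, y, positive in nuclei:
        tally = counts[point_to_hex(x, y, size)]
        tally[0] += int(bool(positive))
        tally[1] += 1
    # min_cells is a placeholder for whatever per-Hex inclusion rule is applied
    return {h: 100.0 * pos / tot for h, (pos, tot) in counts.items() if tot >= min_cells}
```

Because the DIA runs once on the whole slide and the tiling is applied afterwards to the exported records, different hexagon sizes can be tested on the same DIA result by changing `size` alone.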

Fig. 1

A visual representation of the tumor analysis performed by the HexT approach. A hexagon grid is overlaid on the original WSI of Ki67 IHC to reflect subsampling of the DIA-generated data. The magnified hexagons illustrate, side by side, the details of the original and DIA markup images. The hexagon colors represent different ranks of the local Ki67 LI, which are also used for the calculation of heterogeneity indicators and for the 3D visualization (bottom left) of the “proliferative surface” of the tumor.

Validation of HexT-based hotspot detection data by visual review of the WSI

Three observers independently reviewed 50 randomly selected WSI at low magnification and drew as many as 3 freeform annotations to delineate the Ki67 hotspots in the tumor tissue, if present. Inter-observer agreement of the visual hotspot detection was evaluated. The hotspot annotations from each observer were compared to the corresponding HexT data.

Statistical analysis

Summary statistics and distribution analyses were performed with significance tests based on the paired t test, one-way ANOVA, and Duncan’s multiple range test for pairwise comparisons. Fisher’s exact test was used to assess associations between categorical variables. Inter-observer agreement was tested by kappa statistics. Pearson correlations and single and multiple linear regression analyses were performed to test pairwise linear relationships. Factor analysis was performed using the factoring method of principal component analysis; 4 factors were retained based on a minimum eigenvalue threshold of 1.5, and a general orthomax rotation of the initial factors was performed. Cluster analysis was performed using the K-means algorithm. Statistical significance was set at p < 0.05. Statistical analysis was performed with SAS 9.3 software.
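The study used SAS 9.3. For readers who want to explore the same idea in an open-source environment, the numpy sketch below performs principal-component factoring of the correlation matrix, retains factors whose eigenvalues reach a threshold (1.5, as in this study), and applies an orthomax rotation. The orthomax weight `gamma` used in the study is not stated; `gamma=1.0` (varimax) is an assumption, and the results need not match SAS output exactly.

```python
import numpy as np


def principal_factors(data, min_eigenvalue=1.5):
    """Principal-component factoring of the correlation matrix with an eigenvalue cut-off."""
    corr = np.corrcoef(data, rowvar=False)   # variables are columns, cases are rows
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    # loadings = eigenvectors scaled by the square roots of their eigenvalues
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings


def orthomax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthomax rotation of a loading matrix; gamma=1 is varimax, gamma=0 is quartimax."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient direction
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break  # the criterion has stopped increasing
        criterion = new_criterion
    return loadings @ rotation
```

Orthogonal rotations such as orthomax redistribute variance among the retained factors but leave each variable's communality (the row sum of squared loadings) unchanged, which is a convenient sanity check when comparing rotated and unrotated solutions.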

Results

Sampling the DIA-generated data into HexT and summary statistics

Criteria and results of sampling the DIA-generated data into HexT and summary statistics of the data from the overall image analysis of WSI, from HexT, and from pathology reports are presented in Online Resource 2.

Correlation of the data from the overall image analysis of WSI, from HexT, and from pathology reports

The Ki67 LI calculated by the WSI DIA and the median Ki67 LI obtained from the HexT (HexSize825) data revealed a near-perfect correlation (r = 0.9967, p < 0.0001) without significant bias detectable by linear regression with the Hex median as the dependent variable (r² = 0.997, model p < 0.0001, intercept = −0.46808, slope = 1.02341). In contrast, a substantial bias was estimated in the regression model with the Hex Ki67 LI 90th percentile as the dependent variable (r² = 0.917, model p < 0.0001, intercept = 7.40159, slope = 1.21154). The Ki67 LI in the pathology reports could be predicted from the WSI DIA Ki67 LI with some bias (r² = 0.754, model p < 0.0001, intercept = −4.16059, slope = 1.21154).

A weak inverse correlation between the HexT median Ki67 LI and entropy was observed (r = −0.3872, p < 0.0001). Importantly, the scatter plot (Online Resource 3) revealed a nonlinear pattern, suggesting a positive relationship at lower levels of proliferation and a negative relationship at higher levels of proliferation. This trend was confirmed by the correlation analyses (Online Resource 3) of samples split at the median of the Hex Ki67 LI (less than 30 % and greater than or equal to 30 %), with correlations of r = 0.5906, p < 0.0001, and r = −0.4412, p < 0.0001, respectively. An example of two tumors with similar Ki67 LI but different entropy indicators is presented in Online Resource 4. Bimodality indicators were not significantly related to the Ki67 LI values.

As expected, the degree of proliferation measured by various Ki67 LI indicators was associated with higher histological grade (Fig. 2a) and more aggressive types of breast cancer (not shown). Importantly, grade 3 tumors showed lower entropy than grade 1 or 2 tumors (p = 0.0104); the frequency of grade 3 tumors within the upper quartile of entropy was 17.8 % (28/157) versus 33.3 % (7/21) and 32.8 % (39/119) for grade 1 and 2 tumors, respectively. This finding was also reflected by the ANOVA results, in which G3 tumors presented with lower entropy values than G1 and G2 tumors (p < 0.05, Fig. 2b), and it is consistent with the relationship between Ki67 LI and entropy noted above (Online Resource 3). It can be interpreted that high-grade tumors are more spatially homogeneous with respect to their proliferative activity. While this finding may seem trivial, it supports the principle of tumor entropy measurement by the HexT approach. The bimodality indicators did not reveal significant clinicopathological associations.

Fig. 2

Association of histological grade with the median Ki67 LI HexT and entropy. Box-whisker plots representing analysis of variance results with histological grade as the explanatory variable for a the median Ki67 LI HexT (labeled as PercentPosHex_Median) and b the entropy indicator.

Factor analysis of the Ki67 indicators

Factor analysis was performed on 297 patients with a complete set of data obtained from the DIA of WSI and from the HexT (HexSize825) analysis. Ki67 LI from pathology reports was also included in the data set. The rotated factor pattern of the 4 factors, extracted with eigenvalues of 5.8, 5.1, 2.3, and 1.9, respectively, is presented in Fig. 3. Factor 1 was characterized by strong loading of the majority of the Haralick texture parameters; the strongest positive loading was with the entropy indicator, and factor 1 was therefore named the entropy factor. Factor 2 was characterized by positive loading of the Ki67 LI indicators from WSI, HexT, and the pathology report accompanied by low skewness of the Ki67 LI distribution in the HexT, and it was therefore named the proliferation factor. Similarly, factor 3 was named the bimodality factor, while factor 4 was termed the cellularity factor, based on the data obtained from both the WSI DIA and HexT data. Factor analyses using the same data set provided similar factor patterns in the subgroups of HR-positive, HER2-positive, and TN tumors (not shown). The results of cluster analysis are presented in Online Resource 5.

Fig. 3

Rotated factor pattern of the whole-slide image analysis, hexagon grid, and pathology report data set (n = 297). The loadings of a factors 1 and 2, b factors 1 and 3, and c factors 1 and 4 are plotted. Labels: CellDensityWSI, nuclear profile density per tumor area in WSI; Ki67%, Ki67 LI from pathology report; Kurtosis, Ki67 LI HexT kurtosis; PercentPosHex_P90, Ki67 LI HexT 90th percentile; PercentPositiveHex_DispIndex, Ki67 LI HexT dispersion index; PercentPositiveHEX_Median, Ki67 LI HexT median; PercentPositiveWSI, Ki67 LI WSI; QRange, Ki67 LI HexT interquartile range; Skewness, Ki67 LI HexT skewness; TotalNucleiHex_DispersionIndex, dispersion index of total nuclear profiles per Hex; TotalNucleiHEX_Median, median of total nuclear profiles per Hex in a HexT.

In summary, the factor and cluster analyses present evidence for two linearly independent features with respect to the intra-tumor heterogeneity of proliferative activity as measured by Ki67 expression. The first is based on entropy and other texture indicators, and the other is based on bimodality indicators. Both aspects of heterogeneity are linearly independent of the Ki67 LI of the tumor. Although a large proportion of tumors can be considered as relatively homogeneous, those with predominant entropy, bimodality, or a combination of both may represent heterogeneous variants.

Automated hotspot detection and measurement—the concept of Pareto hotspot

The DIA data, when subsampled into the HexT, provided an opportunity to analyze the Ki67 LI distribution in the context of the 2D space of the tumor tissue. All ranges of biomarker expression variability in the HexT can be mapped back to the WSI for visualization and quality control. Furthermore, the tumor areas with high Ki67 expression can be specifically highlighted to reveal hotspots, which may determine the biological behavior of the tumor.

Detection of hotspots by either visual or digital means can be complicated by the nature of the tumor tissue: hotspots may vary in size, shape, and contrast in heterogeneous tumors and may be undetectable in homogeneous tumors. Therefore, stable definitions and criteria are needed. One simple approach would be to measure the Ki67 expression in a fixed proportion of the tumor tissue at the high end of the range, which may best reflect the biological potential of the tumor to proliferate. Following the Pareto principle (also known as the 80–20 rule), which states that for many events approximately 80 % of the effects come from 20 % of the causes, we propose the concept of a Pareto hotspot, represented by the upper quintile of the biomarker expression in the tissue. Because the Pareto hotspot spans the 80th to 100th percentiles, its median corresponds to the 90th percentile of the entire distribution and can serve as a quantitative indicator of the Ki67 LI inside the Pareto hotspot.
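A minimal sketch of the Pareto hotspot definition is given below, assuming the per-Hex Ki67 LI values of one tumor are already available as an array (for example, from a HexT binning step). It flags the Hex at or above the 80th percentile and reports the median of that upper quintile next to the 90th percentile of all Hex; the function name and output keys are illustrative.

```python
import numpy as np


def pareto_hotspot(hex_li, hotspot_fraction=0.2):
    """Flag the upper quintile of per-Hex Ki67 LI values and summarize it."""
    values = np.asarray(hex_li, dtype=float)
    # 80th percentile by default: Hex at or above it form the Pareto hotspot
    cutoff = np.percentile(values, 100.0 * (1.0 - hotspot_fraction))
    mask = values >= cutoff
    return {
        "cutoff": float(cutoff),
        "hotspot_mask": mask,
        "hotspot_median": float(np.median(values[mask])),
        "p90": float(np.percentile(values, 90.0)),
        "overall_median": float(np.median(values)),
    }
```

For evenly spread values such as 0, 1, ..., 100, the cutoff is 80, 21 Hex are flagged, and both the hotspot median and the overall 90th percentile equal 90; with tied or skewed real data the two agree only approximately.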

To gain insight into the potential usefulness of the Pareto hotspot concept, we performed regression analyses in subsets of tumors with Ashman’s D values above and below the upper quartile. The retrospective Ki67 LI from the pathology report was used as the dependent variable, while the median and 90th percentile Ki67 LI obtained by the HexT analysis were tested as explanatory variables in a multiple regression model. Notably, the Ki67 LI values in the pathology reports were somewhat better predicted by the 90th percentile than by the median of the HexT Ki67 LI in the bimodal tumors (upper quartile of Ashman’s D), whereas the reverse was true in the remaining tumors (Table 1). The retrospective Ki67 LI values in the pathology reports were not strictly standardized in our study; however, they are expected to reflect the pathologists’ effort to detect and estimate the Ki67 LI in the “hottest” areas of the tumors.
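The bimodality indicators were derived from Gaussian mixture models (Online Resource 1). As a hedged stand-in for that procedure, the sketch below fits a two-component one-dimensional Gaussian mixture to the per-Hex Ki67 LI of one tumor by expectation-maximization in plain numpy and computes Ashman’s D = √2·|μ1 − μ2| / √(σ1² + σ2²). The initialization at the quartiles, the iteration limits, and the variance floor are assumptions; a library implementation such as scikit-learn’s `GaussianMixture` could be used instead. A common rule of thumb treats D > 2 as a clean separation of the two components.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=500, tol=1e-9):
    """Two-component 1-D Gaussian mixture fitted by expectation-maximization."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25.0, 75.0])  # start the two components at the quartiles
    var = np.full(2, x.var())
    weight = np.full(2, 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and responsibilities, shape (n, 2)
        dens = weight / np.sqrt(2.0 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var))
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        # M-step: update mixing weights, means, and variances
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
        ll = np.log(total).sum()  # log-likelihood of the parameters used in this E-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weight, mu, var


def ashman_d(mu, var):
    """Ashman's D = sqrt(2) * |mu1 - mu2| / sqrt(var1 + var2)."""
    return float(np.sqrt(2.0) * abs(mu[0] - mu[1]) / np.sqrt(var[0] + var[1]))
```

Tumors can then be ranked by D and split at the upper quartile of its distribution across the cohort, as in the regression analysis described above.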

Table 1 Multiple regression models to predict pathology-based Ki67 LI in predominantly unimodal and bimodal tumor subsets

While the clinical utility of the Pareto hotspot metrics is yet to be established, the concept provides the immediate benefit of highlighting the most prominent areas of biomarker expression in the tumor tissue: the Hex within the fifth quintile of Ki67 LI are overlaid on the tumor tissue image, forming the Pareto web. This visualization approach does not depend on any assumptions about tumor heterogeneity, as approximately 20 % of the tumor tissue with the highest biomarker expression levels would be marked in all cases. Nevertheless, the relevance of the Pareto web must be appreciated in the context of the heterogeneity metrics for the individual tumor. Examples of the Pareto web are presented in Fig. 4.
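The percentile bands of the Pareto web shown in Fig. 4 (80–90th, 90–95th, 95–99th, and 99–100th percentiles) can be assigned per Hex as sketched below. The band boundaries and colors follow the Fig. 4 caption; the function and constant names are hypothetical, and values falling exactly on a boundary are assigned to the higher band.

```python
import numpy as np

# lower percentile bound and display color of each band, as listed in the Fig. 4 caption
PARETO_BANDS = [(80.0, "yellow"), (90.0, "orange"), (95.0, "red"), (99.0, "blue")]


def pareto_web(hex_li):
    """Return a color per Hex (None outside the upper quintile) for overlay on the WSI."""
    values = np.asarray(hex_li, dtype=float)
    cutoffs = np.percentile(values, [p for p, _ in PARETO_BANDS])
    colors = []
    for v in values:
        color = None
        for cutoff, (_, band_color) in zip(cutoffs, PARETO_BANDS):
            if v >= cutoff:
                color = band_color  # keep the highest band the value reaches
        colors.append(color)
    return colors
```

Because the cutoffs are recomputed from each tumor's own HexT distribution, roughly the same share of Hex is colored in every tumor, which is why the overlay does not depend on assumptions about heterogeneity.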

Fig. 4

Examples of tumors tested with the comprehensive Ki67 IHC methodology. Each panel includes the statistics for the tumor and the histogram of the HexT Ki67 LI distribution (upper-left quadrant), a 3D histogram of the HexT Ki67 LI (lower-left quadrant), and the WSI with the Pareto web highlighting the fifth (upper) quintile of the HexT Ki67 LI distribution (for increased detail, yellow Hex represent the 80–90th percentile, orange Hex the 90–95th percentile, red Hex the 95–99th percentile, and blue Hex the 99–100th percentile). a Ductal carcinoma, HR and HER2-positive, grade 3, cluster 2. The pathology-based Ki67 LI is similar to the Pareto hotspot Ki67 LI value but higher than the average value. b Ductal carcinoma, HR and HER2-positive, grade 2, cluster 1; the sample represents a tumor with medium entropy and high bimodality. A distinct hotspot is detected and reflected in the Ki67 LI indicators. c Ductal carcinoma, HR-positive, grade 2, cluster 4; the sample represents a tumor with medium entropy and bimodality. The pathology-based Ki67 LI is lower than all the DIA-generated indicators, while the Pareto hotspot Ki67 LI value is much higher and was obtained from the invasive margin of the tumor highlighted by the Pareto web. d Ductal carcinoma, HR-positive, grade 3, cluster 7. The sample represents a tumor with high entropy and bimodality. Importantly, the pathology-based Ki67 LI corresponds to the Pareto hotspot Ki67 LI value (90th percentile).

Validation of HexT data based on hotspot detection by visual review of the WSI

While reviewing 50 WSI, the 3 observers identified 20, 21, and 23 tumors, respectively, each with at least one hotspot. Pairwise agreement between the observers in detecting at least 1 hotspot was estimated by kappa coefficients of 0.55, 0.63, and 0.85. Overall, hotspots were identified in 27, 22, and 15 tumors by at least 1, at least 2, and all 3 observers, respectively. Analysis of the actual areas and overlaps of the hotspots outlined by all 3 observers in these 15 tumors revealed that, on average, hotspots represented 4.8 % of the tumor area (range, 0.6 to 17.0 %). On average, 26.0 % of the hotspot areas coincided for all 3 observers (range, 1.7 to 70.8 %). Pairwise comparisons revealed hotspot area overlaps of 42.0, 43.8, and 50.1 %.
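For reference, Cohen’s kappa for one pair of observers can be computed from per-WSI calls (hotspot present or absent) as in the sketch below. The study computed kappa statistics in SAS; the ratings in the usage example are invented to illustrate the arithmetic and are not the study’s data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used a single, identical category: kappa is undefined
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # hypothetical calls for 50 slides: 1 = at least one hotspot, 0 = none
    obs_1 = [1] * 20 + [0] * 30
    obs_2 = [1] * 18 + [0] * 2 + [1] * 3 + [0] * 27
    print(round(cohen_kappa(obs_1, obs_2), 2))
```

In this invented example, two observers call hotspots in 20 and 21 of 50 slides and agree on 45 slides, so the observed agreement is 0.90, the chance agreement is 0.4 × 0.42 + 0.6 × 0.58 = 0.516, and kappa is about 0.79.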

The hotspot annotations provided by the 3 individual observers revealed significantly higher Ki67 LI values by paired t test (mean differences of 8.4 %, 8.7 %, and 10.1 %; p < 0.0009, p < 0.0008, and p < 0.0003, respectively) compared to the remaining area of the same tumors. The mean differences in the hotspot Ki67 LI between the observers were not significant. Hex overlapping the freeform hotspot annotations provided by the 3 individual observers (Online Resource 6) revealed significantly higher Ki67 LI values by paired t test (mean differences of 5.8 %, 5.8 %, and 6.7 %; p < 0.0003, p < 0.0002, and p < 0.0001, respectively) when compared to the remaining Hex of the same tumors. Similarly, the Hex overlapping the hotspot annotations of the 3 observers more frequently fell within the higher quintiles of the Ki67 LI HexT values (p < 0.0001); in particular, over 20 % and over 40 % of these Hex were in the 4th and 5th quintiles, respectively. Finally, the mean hotspot Ki67 LI from all 3 observers was not significantly different (p = 0.0675) from the Ki67 LI 90th percentile (the median of the Pareto hotspot), although it was significantly lower (−4.4 %, p = 0.0346) for one of the observers.

In summary, the experiments revealed moderate to high inter-observer agreement in the visual detection of hotspots, along with variable hotspot size and spatial overlap. The hotspots showed higher Ki67 LI than the remaining tumor tissue; they were associated with Hex containing higher Ki67 LI values, and their Ki67 LI was comparable with the Pareto hotspot median Ki67 LI.

Discussion

Our study demonstrates that the informative value of IHC DIA can be greatly enhanced by the systematic subsampling of the data into a HexT. The approach provides numerous benefits: (1) multiple measurements of the IHC marker (or any other histological feature of interest) enable the application of distribution statistics to supplement and ensure the quality of marker expression measurement obtained by a single DIA of the entire region of interest (ROI); (2) data on biomarker expression in 2D space enable the calculation of texture indicators in the region of interest that reflect the global measure of intra-tumor heterogeneity; these indicators, along with the distribution statistics, can be used for inter-tumor comparisons and stratification of the tumors into homogeneous and heterogeneous categories; (3) various percentile ranges obtained from the HexT distribution statistics may prove to be more biologically relevant and clinically useful indicators of tumor proliferative activity than a simple Ki67 LI average, especially in heterogeneous cases (the Pareto hotspot is one possible automated indicator that mimics the current clinical practice of Ki67 LI evaluation in hotspots); (4) automated highlighting of potential hotspots on the WSI (e.g., with a Pareto web) can serve as a decision support and quality assurance tool, whereas 3D visualization of the “roughness of the proliferative surface of the tumor” enables a visual representation of the spatial heterogeneity of biomarker expression for better perception of the disease.

DIA enables greater data retrieval from IHC slides than could ever be achieved by manual counts. In our experiment, 121,000 tumor cell profiles were evaluated per WSI on average, with a range from 11,000 to 419,000. Interestingly, a total of 36 million cells were “processed” in the 297 WSI, with an overall tumor area of 15,000 mm². From this point of view, retrieval of only one indicator (such as Ki67 LI) in the DIA of the WSI from each tumor section can be regarded as a substantial underutilization of the data. The first step forward, enabled by the HexT, is the provision of the average (or median) Ki67 LI along with standard deviation, standard error, and other distribution analytics. These analytics add value, especially in cases where higher accuracy and precision are critical for clinical decision-making. Remarkably, the Ki67 LI detected by the WSI DIA and the median Ki67 LI obtained from the HexT revealed a near-perfect correlation (r = 0.9967) in our experiment; these data could be used to model the representativeness of TMA sampling techniques, which is beyond the scope of the present study.

In contrast to our previous study [5] based on breast cancer TMA data, in which we found that visual evaluation of Ki67 LI tended to underestimate the standard criterion values obtained by a stereology grid count, our current data, obtained using the same DIA tool with the WSI, revealed that the pathology report Ki67 LI slightly exceeded the WSI DIA values. Although we did not generate grid count-based standard criterion data in the present study, the most likely explanation for this “discrepancy” is that pathologists, in their routine practice, aimed to evaluate Ki67 LI in the hotspots whenever feasible.

DIA with HexT and computed distribution and texture indicators produced a rich data set, enabling the application of multivariate statistical analyses to understand the sources of variance and types of heterogeneity. We have previously shown that a factor analysis of 10 IHC markers measured by DIA in breast cancer TMA can be successfully used to explore multiple relationships in an immunophenotype [29]. Our current experiment demonstrates that single IHC biomarker expression, when evaluated by “multiple measurements” (HexT) that account for the spatial aspects of expression in the tumor tissue, enables evaluation of the comprehensive IHC characteristics of the tumor. Again, factor analysis facilitated the interpretation of the multiple interrelationships in this data set by extracting intrinsic factors that represent the main and linearly independent processes or features hidden behind the variance of the data. For example, if a factor is characterized by positive loading of entropy and dissimilarity and negative loading of energy and homogeneity (Fig. 3), it can be interpreted as an expression of disordered texture when measured by these variables and may be expressed best by a single indicator of entropy (the variable with the highest loading for this factor, in our case). Because the factor analysis results depend on the included data, we “balanced” the data set by including several indicators of Ki67 LI, its distribution and texture, and the absolute cell numbers per Hex and WSI. The proliferation factor, represented by various measurements of Ki67 LI along with negative skewness, reflected our observation that tumors with low and high Ki67 expression commonly present with opposite types of distribution asymmetry. Entropy was the most informative single indicator of the texture characteristics.
Remarkably, the bimodality factor, although less prominent, was independent of the texture statistics and was best characterized by specific bimodality indicators rather than by kurtosis. Notably, factor analyses on the same data set obtained from the HexT with HexSize550 and HexSize1100, or from a square tiling (SqxT) equivalent to HexSize825, produced very similar results (not shown).
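The texture indicators discussed above come from a co-occurrence matrix built on the HexT (Online Resource 1). The sketch below shows one plausible construction, not necessarily the one used in the study: per-Hex Ki67 LI values are quantized into equal-width levels (the number of levels and the binning rule are assumptions), neighboring Hex pairs are counted over all six hexagonal directions, which yields a symmetric matrix, and several Haralick-type features are computed from the normalized counts. Hex are addressed by hypothetical axial coordinates (q, r).

```python
import math
from collections import Counter

# axial offsets of the six edge-sharing neighbors of a hexagon
HEX_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def quantize(hex_li, n_levels=8):
    """Map per-Hex Ki67 LI (0-100 %) to integer levels using equal-width bins (an assumption)."""
    return {h: min(int(v / 100.0 * n_levels), n_levels - 1) for h, v in hex_li.items()}


def hex_cooccurrence(levels):
    """Normalized co-occurrence probabilities of levels in neighboring Hex."""
    pairs = Counter()
    for (q, r), a in levels.items():
        for dq, dr in HEX_NEIGHBORS:
            b = levels.get((q + dq, r + dr))
            if b is not None:  # neighbors outside the tumor area are skipped
                pairs[(a, b)] += 1  # each adjacent pair is visited from both sides
    total = sum(pairs.values())
    return {k: v / total for k, v in pairs.items()} if total else {}


def haralick_features(p):
    """A few Haralick-type texture features of a normalized co-occurrence dictionary."""
    return {
        "entropy": -sum(v * math.log2(v) for v in p.values()),
        "energy": sum(v * v for v in p.values()),  # angular second moment
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "dissimilarity": sum(v * abs(i - j) for (i, j), v in p.items()),
        "homogeneity": sum(v / (1.0 + (i - j) ** 2) for (i, j), v in p.items()),
    }
```

A patch of Hex that all share one level gives an entropy of 0 and an energy of 1, whereas two adjacent Hex in different levels give an entropy of 1 bit. More disordered patterns tend to raise entropy and dissimilarity and lower energy and homogeneity, which is the opposing pattern of loadings described for the entropy factor.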

The concept of intra-tumor tissue heterogeneity is not new and is commonly used in the context of IHC biomarker expression and microscopic and/or molecular features in general. Nevertheless, the definitions of heterogeneity are mainly arbitrary, while objective criteria may depend on the specific experimental design, biomarker, and tumor tissue [12, 16, 30]. Furthermore, heterogeneity may be regarded as a part of the biological continuum of tumor variation, in which subpopulations of biologically and morphologically distinct tumor cells find their place in the tumor ecosystem. Therefore, defining and measuring heterogeneity in the 2D space of a microscopic section is a challenging task. Additionally, it appears that tumors can be heterogeneous in different ways; for example, they may be finely or coarsely granular.

One might argue that coarsely granular heterogeneity can be better appreciated by visual evaluation and actually represents the hotspot concept used by pathologists; however, the absolute and relative size, shape, and contrast of hotspots with the background are difficult to define and evaluate. This difficulty is illustrated by the inter-observer variability in our hotspot validation experiment: although agreement in the detection of at least one hotspot was moderate to high between pairs of the 3 observers (kappa coefficients of 0.55, 0.63, and 0.85), the locations and the absolute and relative sizes of the hotspots were variable, and the area coinciding for all 3 observers ranged from 1.7 to 70.8 % of the hotspot area. Based on the DIA data from the overlapping Hex, the hotspots annotated by the pathologists contrasted with the background Hex by a mean difference of 5.8–6.7 %; however, this criterion is likely to be less precise than the automated detection of hotspots by the HexT approach. In addition, our data support the notion that automated hotspot detection using HexT brings the benefits of better standardization (both the absolute and relative size of the hotspots can be controlled) and more discriminatory evaluation of highly proliferative areas; for example, the mean (59.3 %) of the 90th percentile of Ki67 LI by HexT (equivalent to the median of the Pareto hotspot) was nearly twice as high as the mean (32.6 %) Ki67 LI by HexT (Online Resource 2). In other words, the HexT approach enables a much higher dynamic range of intra-tumor Ki67 LI variability measurements.

It is even more problematic to measure fine spatial irregularity or disorder in tissue biomarker expression. Thus, it is not surprising that the corresponding concepts of spatial entropy and other texture indicators are not used in daily pathology practice. To go beyond subjective impressions, one would need to enumerate the Ki67 LI in systematically subsampled areas of WSI or in consecutive high-magnification fields with a conventional microscope, which becomes impractical in terms of capacity and expected precision. Therefore, a standard criterion for spatial entropy measurements is hardly feasible. With respect to texture measurements, a more realistic route to validating the HexT approach lies in inter-tool comparisons and clinical utility studies.

Tumor heterogeneity of immunohistochemically detected HER2 expression in breast cancer tissue has been extensively investigated by Potts et al. [12]. They used ecological diversity statistics to evaluate cell-level and tumor-level heterogeneity and proposed a heterogeneity map to visualize individual tumor heterogeneity and HER2 expression levels in the context of a patient population. The study was based on 200 specimens from 2 different laboratories with 3 pathologists per laboratory, each of whom outlined tumor regions for scoring by DIA. Although the results were consistent between the 2 laboratories, the authors did not have patient outcome data to test the clinical utility of the approach. Importantly, they recognized that the number of sampled regions might be insufficient to determine tumor-level heterogeneity; thus, a methodology that samples the entire tumor on a slide may be required for this type of analysis.
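One widely used ecological diversity statistic is the Shannon index, H' = -Σ p_i ln p_i, computed over the proportions p_i of cells in each category. The sketch below applies it to per-cell HER2 score categories; it is a generic illustration of the statistic family, and we do not claim it reproduces the specific statistics or implementation of Potts et al. [12].

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon diversity index H' = -sum(p_i * ln p_i) over label categories."""
    counts = Counter(labels)
    n = sum(counts.values())
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


# hypothetical per-cell HER2 scores within one region
print(shannon_diversity(["3+", "3+", "2+", "1+", "0", "3+"]))
```

A region composed of a single score category gives H' = 0, and H' reaches ln(k) when k categories are equally represented.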

To the best of our knowledge, the HexT approach is the first proposed methodology that relies on systematic subsampling of automated DIA-generated data by arrays of regular polygons to measure the intra-tumor heterogeneity of IHC biomarker expression in WSI. To further highlight the benefits and novelty of the HexT approach, we note that: (1) it requires only a single DIA procedure per WSI rather than multiple DIA processes for subsampled tissue areas; (2) the DIA process is not affected by the subsampling (the surrounding tissue context is important for accurate tumor tissue segmentation); (3) multiple HexT settings can be applied and tested using the same DIA result; (4) the HexT process can be adapted to different DIA tools; and (5) the HexT approach generates more informative data than a SqxT approach without substantial differences in computation time. A similar image microarray technique has been proposed by Hipp et al. to extract actionable information from WSI by systematic capture of image tiles of constrained size and resolution [31]; although this approach could be used to retrieve tumor heterogeneity statistics, we suggest that our methodology has the important advantage of performing DIA on continuous, unfragmented tissue in a WSI while performing spatial tiling of the DIA data in the post-analytical phase.
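The post-analytical tiling step can be sketched as follows. Cell-level DIA output (centroid coordinates and a positive/negative call) is binned into a pointy-top hexagonal grid using standard axial-coordinate conversion with cube rounding, and a Ki67 LI is computed per hexagon. This is a minimal sketch under those assumptions, not our actual implementation; the function names and the `min_cells` threshold are hypothetical.

```python
import math
from collections import defaultdict


def cube_round(fq, fr):
    """Round fractional axial coordinates to the nearest hexagon."""
    fs = -fq - fr
    q, r, s = round(fq), round(fr), round(fs)
    dq, dr, ds = abs(q - fq), abs(r - fr), abs(s - fs)
    # fix the coordinate with the largest rounding error so that q + r + s == 0
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return q, r


def point_to_hex(x, y, size):
    """Axial (q, r) of the pointy-top hexagon with circumradius `size` containing (x, y)."""
    fq = (math.sqrt(3) / 3 * x - y / 3) / size
    fr = (2 / 3 * y) / size
    return cube_round(fq, fr)


def hex_tile_ki67(cells, size, min_cells=50):
    """Aggregate cell-level DIA output into per-hexagon Ki67 LI (%).

    cells: iterable of (x, y, is_positive) for detected tumor cells.
    Hexagons holding fewer than `min_cells` tumor cells are dropped.
    """
    pos, tot = defaultdict(int), defaultdict(int)
    for x, y, is_positive in cells:
        h = point_to_hex(x, y, size)
        tot[h] += 1
        pos[h] += bool(is_positive)
    return {h: 100 * pos[h] / tot[h] for h in tot if tot[h] >= min_cells}
```

Because the grid is applied only to the coordinates, the same DIA result can be re-tiled with different hexagon sizes or offsets without repeating the image analysis.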

We performed cluster analysis to explore whether a meaningful classification of breast cancer cases could be achieved by accounting for the percentage of Ki67-positive tumor cells along with the entropy and bimodality indicators. We did not use the cellularity factor in the cluster analysis, as it was the least important contributor to variation in the data set and was also potentially sensitive to technical variation caused by the variable efficiency of cell detection in tumors with different nuclear morphology. Notably, the extracted clusters and their associations with breast cancer types and grades reflected the well-known relationship between proliferative activity and disease aggressiveness. However, we focused on exploring the cluster distribution in two linearly independent features of intra-tumor heterogeneity with respect to Ki67 LI, namely entropy and bimodality. Our data indicate that although a large proportion of the tumors can be considered relatively homogeneous, predominant entropy, predominant bimodality, or a combination of both may represent heterogeneous variants. Again, robust stratification of the tumors into homogeneous and heterogeneous groups would require evidence-based definitions, preferably ones that reflect clinical outcomes. While formal definitions of bimodality do exist (for example, Ashman’s D greater than 2), we would prefer a bottom-up approach based on the percentile distribution of the real data. For example, tumors could be considered heterogeneous if their entropy and/or bimodality indicators were in the upper quartile of the distribution; in our data set, the upper entropy quartile contained 51 (17.2 %) tumors, the upper Ashman’s D quartile contained 52 (17.5 %), and the combined entropy and Ashman’s D upper quartile contained 23 (7.7 %) tumors (not shown). By these entropy and/or bimodality criteria, approximately 40 % of the tumors could be considered heterogeneous.
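The two criteria discussed above can be sketched side by side. Ashman's D, D = √2 |μ1 − μ2| / √(σ1² + σ2²), is computed from the parameters of a two-component Gaussian mixture, which we assume has already been fitted to a tumor's hexagon-level Ki67 LI distribution (the fit is not shown). The bottom-up rule then flags tumors whose entropy and/or Ashman's D lies above the third quartile of the cohort. Function names and the strict "above Q3" cut-off are hypothetical choices for illustration.

```python
import math
import statistics


def ashman_d(mu1, sd1, mu2, sd2):
    """Ashman's D for two Gaussian components; D > 2 is commonly read as clean separation."""
    return math.sqrt(2) * abs(mu1 - mu2) / math.sqrt(sd1**2 + sd2**2)


def upper_quartile_flags(values):
    """True for values strictly above the third quartile of `values`."""
    q3 = statistics.quantiles(values, n=4)[2]
    return [v > q3 for v in values]


def heterogeneous(entropy, bimodality):
    """Flag tumors in the upper quartile of entropy and/or bimodality (Ashman's D)."""
    e = upper_quartile_flags(entropy)
    b = upper_quartile_flags(bimodality)
    return [ei or bi for ei, bi in zip(e, b)]
```

The percentile rule adapts to the observed cohort, whereas the fixed D > 2 threshold does not depend on the data set at all.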
Interestingly, in a study of 25 breast cancer cases, Dodd et al. [30] found 8 (32 %) cases with significant and measurable variation in MIB-1 proliferative activity across different sectors of the tumor. In our study, up to 27 of 50 cases could be considered heterogeneous based on an independent review of the WSI by 3 pathologists, whereas this number dropped to 22 or 15 when agreement between 2 or 3 pathologists, respectively, was required.
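The effect of the consensus requirement can be expressed as counting the cases called heterogeneous by at least k observers. The sketch below does this for hypothetical per-case binary calls; the function name and example calls are illustrative only.

```python
def consensus_counts(calls):
    """calls: one tuple of binary 'heterogeneous' calls (one per observer) per case.
    Returns {k: number of cases called heterogeneous by at least k observers}."""
    n_obs = len(calls[0])
    return {k: sum(sum(c) >= k for c in calls) for k in range(1, n_obs + 1)}


# hypothetical calls by 3 observers for 4 cases
print(consensus_counts([(1, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]))
```

The counts can only decrease as k grows, which is the pattern reported for our 50 cases.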

We observed a peculiar relationship between the proliferation and entropy indicators in our experiment. While the factor analysis extracted the proliferation and entropy factors, which were by definition linearly independent, a weak inverse correlation between the Ki67 LI indicator and the entropy parameter was noted. Furthermore, a non-linear relationship could be demonstrated that was positive at lower levels of proliferation and negative at higher levels. This finding, though slightly unexpected, seems meaningful: it may be interpreted as the absence of entropy in the absence of biomarker expression, increasing entropy as the biomarker appears at the low end of the scale, and diminishing entropy as biomarker expression (proliferation) becomes diffuse. Consequently, maximum entropy is observed in the middle of the scale of biomarker expression. In other words, the tumors can be broadly subdivided into “cold homogeneous”, “medium heterogeneous”, and “hot homogeneous”. In practical terms, this means that the indicator of entropy (and other heterogeneity indicators) must be used in the context of the degree of proliferation (biomarker expression). A similar relationship between the level of immunohistochemically detected HER2 expression in breast cancer cells and its intra-tumor heterogeneity has been noted by Potts et al. [12].
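A simple analogue of this pattern is the Shannon entropy of a single positive/negative call, H(p) = −p log₂ p − (1 − p) log₂(1 − p), which is zero at p = 0 and p = 1 and peaks at p = 0.5. The sketch below tabulates it across Ki67 LI values. It is offered only as intuition: it ignores spatial arrangement and is not the texture entropy used in our study.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) positive/negative call."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for li in (0, 10, 30, 50, 70, 90, 100):
    print(f"Ki67 LI {li:3d} %: H = {binary_entropy(li / 100):.2f} bits")
```

The symmetric rise and fall mirrors the "cold homogeneous", "medium heterogeneous", and "hot homogeneous" groups and shows why entropy should be read together with the LI level.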

Multiple measurements of a tumor tissue provide a choice of variables for prognostic and predictive modeling of the disease. In a recent study of 105 pre-surgical biopsies from breast cancer patients, Brown et al. [32] performed multiple Ki67 expression measurements using quantitative immunofluorescence; the method enabled efficient analysis of the entire biopsy, removing the subjectivity of hotspot selection. They found that averaging all fields of view provided a more sensitive and specific assay for predicting the response to therapy than did the maximum field-of-view value. One might argue that maximum values are vulnerable to biological and technical aberrations; as an alternative, we propose a simple approach based on measuring Ki67 expression in a stable proportion of the tumor tissue at the high end of the range, a metric that may best reflect the biological potential of the tumor to proliferate in both homogeneous and heterogeneous tumors. The Pareto principle could thus be applied, leading to the definition of a Pareto hotspot as the upper quintile of biomarker expression in the tissue. Accordingly, the median of the Pareto hotspot, represented by the 90th percentile of the distribution, may be a measure that is more biologically relevant and less sensitive to intra-tissue heterogeneity and other artifacts than the median or maximum values. The Pareto hotspot approach may provide a solution to “a dual problem of accommodating individual sample heterogeneity while optimizing counting methods”, as noted by Romero et al. [14]. Certainly, the clinical utility of the Pareto hotspot concept remains to be tested in appropriate studies.
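The Pareto hotspot definition translates directly into percentile arithmetic. The sketch below assumes per-hexagon Ki67 LI values keyed by hexagon id, takes the hexagons at or above the 80th percentile as the Pareto hotspot, and reports its median together with the 90th percentile of the whole distribution. The function name and the use of Python's default `statistics.quantiles` interpolation are illustrative choices.

```python
import statistics


def pareto_hotspot(li_by_hex):
    """Return (hotspot hexagon ids, median LI of the hotspot, 90th percentile LI).

    The Pareto hotspot is taken here as the hexagons at or above the
    80th percentile (upper quintile) of per-hexagon Ki67 LI.
    """
    values = list(li_by_hex.values())
    deciles = statistics.quantiles(values, n=10)  # 9 cut points: 10th ... 90th
    p80, p90 = deciles[7], deciles[8]
    hotspot = [h for h, v in li_by_hex.items() if v >= p80]
    return hotspot, statistics.median(li_by_hex[h] for h in hotspot), p90
```

Because the hotspot always contains one fifth of the hexagons, its median lies at (approximately) the 90th percentile, regardless of how homogeneous the tumor is.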

Christgen et al. have recently reported a study focusing on the impact of ROI size on Ki67 quantification by computer-assisted image analysis in breast cancer [33]. After manually identifying highly proliferative areas on WSI, they incrementally increased the ROI size by expanding freeform annotations according to the number of cells detected by image analysis in the ROI, and showed that the median Ki67 index decreased from 55 to 15 % as the ROI size increased. This led to significant misclassification between low- and high-proliferative tumors depending on the size of the selected ROI. Just as manual Ki67 counts in hotspots are usually limited to a fixed number of cells, the authors proposed that automated image analysis should also standardize and document the ROI size. While we fully concur with this statement, one must take into account the complexity of defining, detecting, and measuring hotspots. We regard the task as a formula with “multiple unknowns” and potential analytical biases, starting with hotspot selection and the definition of its shape and size, and further complicated by variable spatial density gradients of positive cells (contrast to the surrounding tissue) in individual tumors and hotspots. This can be appreciated in the examples (Online Resource 6), where the variable success of the three observers in agreeing on spatial hotspot detection seems to depend on the properties of the hotspots in different tumors. In other words, both visual and computer-assisted evaluation of hotspots based on spatial clustering of positive cells are difficult to standardize. In this regard, the HexT approach is free of selection bias and generates data that enable multiple expressions of intra-tissue variation to be tested against clinical outcomes. Furthermore, the resolution of the HexT can be (self-)adjusted to different levels of heterogeneity granularity and to the types of tumors to be tested.
For example, highly cellular tumors would enable sufficient sampling for high-resolution HexT analysis as well as multiple-size HexT testing. In other words, the HexT resolution could be fine-tuned as a function of tumor cellularity and architecture.
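One way to tie hexagon size to cellularity is to invert the area of a regular hexagon, A = (3√3/2) s², so that a hexagon holds a chosen number of cells on average: s = √(2N / (3√3 ρ)), where ρ is the tumor cell density. The sketch below implements this; the target cell count is a hypothetical parameter, and the side length s equals the hexagon's circumradius.

```python
import math


def hex_side_for_target_cells(cell_density_per_mm2, target_cells):
    """Hexagon side length (um) so that a hexagon holds ~target_cells cells on average."""
    area_mm2 = target_cells / cell_density_per_mm2
    side_mm = math.sqrt(2 * area_mm2 / (3 * math.sqrt(3)))
    return side_mm * 1000


# hypothetical: 3000 tumor cells per mm2, aiming for ~200 cells per hexagon
print(round(hex_side_for_target_cells(3000, 200), 1))
```

Under this rule, a more cellular tumor receives smaller hexagons and thus a finer-grained HexT while the per-hexagon counting precision stays roughly constant.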

Last but not least, the HexT approach provides effective ways to visualize tumor heterogeneity. The Pareto web, represented by the upper-quintile Hex overlaid on the original image of the tumor, offers a simple way to highlight hotspots based on “bottom-up” information retrieved from an individual tumor. Importantly, the Pareto web is generated even in homogeneous cases; thus, it should be interpreted in the context of the heterogeneity metrics for the tumor. Furthermore, this approach may be useful in all tumors for identifying outlier Hex, which may contain tissue and/or image analysis artifacts, as part of quality assurance procedures. In addition, visual perception of tumor entropy or “roughness” can be facilitated by 3-D histograms of the HexT, which can also be aligned to the original image, adding an intuitive component to the understanding of the disease, especially in ambiguous cases where the clinical decision may benefit from understanding the “big picture”.
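Selecting the Pareto web and candidate outlier hexagons from per-hexagon Ki67 LI can be sketched as below. The Pareto web is the upper quintile, as defined above; for outliers we assume Tukey fences (values beyond 1.5 × IQR from the quartiles), which is a swapped-in convention for illustration and not the quality assurance rule used in this study. The function name and `fence` parameter are hypothetical.

```python
import statistics


def pareto_web_and_outliers(li_by_hex, fence=1.5):
    """Split hexagons into the upper-quintile 'Pareto web' and Tukey-fence outliers."""
    values = list(li_by_hex.values())
    q1, _, q3 = statistics.quantiles(values, n=4)
    p80 = statistics.quantiles(values, n=5)[3]  # 4 cut points: 20th ... 80th
    iqr = q3 - q1
    lo, hi = q1 - fence * iqr, q3 + fence * iqr
    web = {h for h, v in li_by_hex.items() if v >= p80}
    outliers = {h for h, v in li_by_hex.items() if v < lo or v > hi}
    return web, outliers
```

The two sets can be rendered as separate overlays: the web highlights the proliferative upper fifth of the tissue, while outliers point to hexagons worth checking for artifacts.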

Our present study has several limitations. First, it was designed as a proof of principle rather than to test the clinical utility of the HexT approach. Patient follow-up was not available in our data set; nevertheless, we observed some clinicopathological associations of texture indicators with the molecular subtypes and histological grade. Second, we used relatively large surgical excision samples of breast cancer tissue, and the approach was not tested on core needle biopsy material. This remains to be investigated: core biopsy samples may be insufficient for texture statistics because of the limited 2D data relative to the Hex size applied. Nevertheless, highly cellular tumors could potentially be tested with smaller hexagons. In contrast, Pareto hotspot detection and measurement by the HexT approach is less dependent on the amount of 2D data and is likely to work in biopsy samples. Third, we did not exclude the DCIS component from our analyses. Although we found no evidence that DCIS significantly affected hotspot detection in our study, a clinical study design would require manual or automated exclusion of DCIS. Finally, although the HexT approach enables multiple definitions of hotspots besides the Pareto principle, we left them outside the scope of the present study, since they would be best elaborated in the context of clinical outcome data.

In summary, we propose a methodology, based on HexT of IHC DIA data, to retrieve comprehensive information about biomarker expression, its intra-tissue variance, and spatial heterogeneity indicators and to provide effective ways to visualize intra-tissue heterogeneity for decision support and quality assurance. In this study, we demonstrated the concept of comprehensive IHC to measure the Ki67 LI in breast cancer tissue in a way that includes aspects of intra-tumor heterogeneity; however, this approach can potentially be applied to numerous different IHC markers and tissues.