Introduction

Roughly 90 % of cancer deaths are due to metastases [1]. Current methods to assess metastatic disease at diagnosis (examination of regional lymph nodes and radiologic imaging) do not always detect metastases that are present, leading to false diagnoses of local disease. For example, approximately 25 % of colorectal cancer (CRC) patients diagnosed with local disease later experience recurrence [2]. Markers of epithelial-mesenchymal transition (EMT) [3–5] measured in primary tumor cancer cells could provide an additional test of risk for metastatic disease to guide treatment decisions, even when metastases are not detected by the two standard tests.

EMT is a mechanism of cancer cell metastasis that involves epithelial cells temporarily becoming mesenchymal cells [4]. This occurs when cellular expression levels of EMT inducers increase, triggering decreased expression of epithelial markers and increased expression of mesenchymal markers. These changes lead to loss of adhesion to adjacent cells—enabling detachment of the transitioning cell from the primary tumor—as well as modifications that enhance motility and invasiveness.

Past studies of associations between EMT markers in primary tumors and patient outcomes have measured and analyzed marker expression data using inconsistent methods [6]. For instance, different studies of the same marker measured using the same laboratory technique have often used different scoring scales and defined high versus low expression differently. Ordinal marker expression data have frequently been collected, limiting the ability to identify clinically useful cut points to define marker expression status. The lack of uniform, clinically-informative methods across studies could contribute to inconsistent findings and hamper translation of EMT markers to clinical use.

To facilitate translation of EMT markers, we developed new approaches that permit direct evaluation of how an EMT marker could be implemented clinically. We focused on digital image analysis of immunostained tissue specimens to obtain continuous marker expression data, statistical evaluation of the clinical utility of a given cut point to define high versus low marker expression, and rigorous selection of covariates to include in multivariate models.

To illustrate our approach, we measured the EMT inducer Snail and epithelial marker E-cadherin in primary tumors from a population-based prospective cohort study of CRC mortality and estimated their associations with time-to-death. We hypothesized that low expression of E-cadherin and high expression of Snail would be associated with shorter times from surgery to death compared to opposite expression levels.

Materials and methods

Study population

Subjects were enrolled in the Cancer Care Outcomes Research and Surveillance Consortium (CanCORS), a population-based, prospective, case-only, multi-site observational study of colorectal and lung cancer patients [7]. Briefly, the study assessed the impact of health-system, provider, and patient factors on cancer outcomes. Patients were at least 21 years of age at diagnosis and were enrolled within 3 months of diagnosis during 2003–2006. The study collected patient surveys, surrogate surveys for patients who were deceased or too ill to participate, and medical records data [8]. Vital status for all subjects was verified using the Social Security Death Index on 4 May 2010, providing at least 42 months of follow-up observation per individual.

The North Carolina CanCORS site enrolled 990 CRC patients and was the only one to collect tumor specimens. Subjects in the present biomarker study came from a catchment area of 33 counties in eastern and central North Carolina at the time of diagnosis. Investigators obtained primary tumor and non-neoplastic adjacent colorectal tissue samples from 506 subjects.

Formalin-fixed, paraffin-embedded tissue specimens were sent from hospitals across the catchment area to the University of North Carolina at Chapel Hill (UNC), where they were used to construct tissue microarrays (TMAs) as described previously [9]. Most subjects had multiple cores from both primary tumor and “normal” (non-neoplastic) margin. We measured EMT markers in 12 representative TMAs that included specimens from 219 subjects. To be included in the study sample, subjects had to have at least one core of tumor tissue successfully stained for one marker, with that core having at least 50 epithelial cells and unambiguous histology. From the 12 TMAs, we excluded 26 subjects lacking adequate tumor tissue and an additional 3 subjects who could not be linked to medical records data, yielding a final study sample of 190 subjects.

Immunohistochemistry

We selected EMT markers for evaluation based on the results of previous studies and criteria discussed in our prior literature review [6]. Marker protein expression was measured using the following antibodies: E-cadherin [mouse monoclonal ready to use (RTU), clone 36B5 (cat #PA0387) from Leica Microsystems Inc. (Norwell, MA)] and Snail [goat polyclonal (ab53519) from Abcam (Cambridge, MA)].

Immunohistochemistry (IHC) was performed at the UNC Translational Pathology Laboratory (TPL) using the Bond fully-automated slide staining system (Leica Microsystems Inc., Norwell, MA). Slides were deparaffinized in Bond Dewax solution (AR9222) and hydrated in Bond Wash solution (AR9590). Antigen retrieval was performed at 100 °C: for 30 min in Bond epitope retrieval solution 1 at pH 6.0 (AR9961) for Snail, and for 20 min in Bond epitope retrieval solution 2 at pH 9.0 (AR9640) for E-cadherin. After pretreatment, the ready-to-use anti-E-cadherin antibody was applied for 15 min and anti-Snail (1:200) for 30 min.

Detection of Snail was performed using the Bond Intense R Detection System (DS9263) supplemented with the LSAB+ kit (DAKO, Carpinteria, CA). E-cadherin detection used the Bond Polymer Refine Detection System (DS9800). Stained slides were dehydrated and cover-slipped. Positive controls and negative controls (no primary antibody) were included for each antibody. All assays were single-marker (i.e., no multiplex assays).

To verify E-cadherin antibody specificity, tonsil tissue was stained both with and without primary antibody (positive and negative controls, respectively). Snail antibody specificity was confirmed in normal kidney (positive control) and normal liver (negative control) [10, 11]. Additional negative controls were performed with goat IgG (Santa Cruz, sc-2755) used in place of the Snail antibody. Both antibodies stained appropriately in the relevant subcellular compartments (E-cadherin, plasma membrane; Snail, nucleus) in each tissue (see Online Resource 3, Figs. S1 and S2).

Stained slides were digitally imaged at ×20 magnification using the Aperio ScanScope XT (Aperio Technologies, Vista, CA). Digital images were stored in the Aperio Spectrum Database.

Example images of staining for colorectal tissue from CanCORS subjects are provided in Online Resource 3 (Fig. S3).

Automated analysis of digital IHC images

Computer algorithms annotated and scored every eligible tissue core to obtain continuous marker expression data. We used approximately 65 cores originating from two TMAs for algorithm training and automated-analysis validation.

Definiens Composer Technology (Tissue Studio version 2.1.1 with Tissue Studio Library version 3.6.1; Definiens Inc., Carlsbad, CA) was used to annotate images for regions enriched in epithelial cells in IHC-stained TMA cores. To detect differences in cell shape and tissue structure, we developed two Composer algorithms per marker—for non-neoplastic adjacent and tumor tissue, respectively—as both types of tissue were present on each TMA.

We then developed two Tissue Studio scoring algorithms (“solutions”) per marker (non-neoplastic and tumor tissue). E-cadherin membrane and cytoplasmic staining were measured on a continuous average intensity scale of 0–3. Snail was measured as core percent positive nuclei on a continuous scale of 0–100. Several additional Snail scoring algorithms were developed, but only scores based on the first algorithm were used for modeling (see Online Resource 1 for further details).

To evaluate the reliability of computer annotations, one of us (ELB) used Aperio ImageScope (version 11.2; Leica Biosystems, Buffalo Grove, IL) to manually annotate the same 65 cores per marker that were used to optimize Tissue Studio solutions. He remained blinded to patient and tumor characteristics while annotating. Automated scores computed within the manually annotated regions and within the automatically annotated regions had Pearson correlations of 0.91 for E-cadherin and 0.94 for Snail. All 12 TMAs stained for E-cadherin and Snail were then analyzed (24 slides in total).

Subjects typically had multiple cores available of a given tissue type (tumor or non-neoplastic). To assign an expression value for each subject by marker and tissue type, we handled replicate cores in two ways: first, as a weighted average of cores, and second, by assigning the expression value of the subject’s “worst core” as the marker expression value. For weighted averages, the weights were area analyzed for E-cadherin and number of nuclei for Snail. The worst core by tissue type was assigned as the lowest core average intensity for E-cadherin and as the highest core percent positive nuclei for Snail.
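The two ways of handling replicate cores can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the code used in the study; the function name `aggregate_cores`, the tuple layout, and all example values are hypothetical.

```python
from collections import defaultdict

def aggregate_cores(cores, worst="min"):
    """Summarize replicate cores for one marker and one tissue type.

    cores: iterable of (subject_id, score, weight) tuples.
    worst: "min" if the lowest score is the worst core (E-cadherin),
           "max" if the highest score is the worst core (Snail).
    Returns {subject_id: (weighted_average, worst_core_score)}.
    """
    by_subject = defaultdict(list)
    for subject_id, score, weight in cores:
        by_subject[subject_id].append((score, weight))
    pick = min if worst == "min" else max
    summary = {}
    for subject_id, rows in by_subject.items():
        total_weight = sum(w for _, w in rows)
        weighted_avg = sum(s * w for s, w in rows) / total_weight
        summary[subject_id] = (weighted_avg, pick(s for s, _ in rows))
    return summary

# Hypothetical E-cadherin cores: (subject, average intensity 0-3, area analyzed)
ecad = [("A", 1.0, 100.0), ("A", 2.0, 300.0), ("B", 0.5, 50.0)]
ecad_summary = aggregate_cores(ecad, worst="min")

# Hypothetical Snail cores: (subject, percent positive nuclei, number of nuclei)
snail = [("A", 10.0, 200), ("A", 40.0, 600)]
snail_summary = aggregate_cores(snail, worst="max")
```

In this made-up example, subject A's E-cadherin weighted average is pulled toward the larger core, while the worst-core value simply takes the lower of the two intensities.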

See Online Resource 1 for further details on algorithm development and validation. See Online Resource 3 for example images of annotations and scoring in colorectal tumor and non-neoplastic tissue (Figs. S4, S5).

Outcome

In statistical models, the dependent variable was length of time in days from primary tumor surgery until all-cause mortality, with administrative censoring at 5 years after surgery.
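As a minimal sketch of how such an outcome could be derived from dates, the Python function below returns the observed time in days and an event indicator. The name `time_to_death`, the conversion of 5 years to 1825 days, and the example dates are assumptions made for illustration; the text does not state how the authors coded these details.

```python
from datetime import date

# Assumption: "5 years" is taken as 5 * 365 days; the source does not specify this.
FIVE_YEARS_DAYS = 5 * 365

def time_to_death(surgery, death, last_verified):
    """Return (days_observed, event) with administrative censoring at 5 years.

    death is None when the subject was alive at the vital-status check.
    event is 1 for death within the 5-year window and 0 for censoring.
    """
    end = death if death is not None else last_verified
    days = (end - surgery).days
    if days > FIVE_YEARS_DAYS:
        # Followed beyond 5 years: censor at the 5-year mark.
        return FIVE_YEARS_DAYS, 0
    return days, 1 if death is not None else 0

vital_status_date = date(2010, 5, 4)  # verification date given in the text
died_early = time_to_death(date(2004, 1, 10), date(2005, 1, 9), vital_status_date)
alive_long = time_to_death(date(2004, 3, 1), None, vital_status_date)
alive_short = time_to_death(date(2006, 6, 1), None, vital_status_date)
```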

Covariates

Covariates for multivariate statistical models were selected based on prior studies [6], considerations of biological plausibility, and directed acyclic graph theory [12].

Per directed acyclic graph theory, we adjusted for those variables considered to be common causes of both the independent variable of interest (EMT marker expression in primary tumor cancer cells) and the dependent variable (time from surgery to death): age, neoadjuvant chemotherapy, neoadjuvant radiation therapy, tumor size (T-stage), lymph-node metastasis diagnosis (N-stage), and distant-metastasis diagnosis (M-stage). We used overall TNM stage as a single covariate instead of adjusting for the component stages as three separate variables since including both overall stage and any of the component stages would constitute inappropriate overadjustment.

Pre-surgery treatments consisted of neoadjuvant chemotherapy and neoadjuvant radiation therapy as two separate variables, each coded as received or not received. Treatments administered after surgery were not included as covariates. While treatments at either time are related to time-to-death, only treatments given before surgery could have affected the EMT marker expression observed in primary tumor cancer cells, because the primary tumor was still in the body at that point, whereas it had been removed before any post-surgery treatment was administered. Based on directed acyclic graph theory, we concluded that other variables related to CRC prognosis (such as microsatellite status, tumor budding, and KRAS status) should not be adjusted for in multivariate models.

Identification of statistically-optimal marker expression cut point

For each continuous marker expression variable, we identified the cut point distinguishing high expression from low expression that was most strongly associated with time-to-death.

For every possible cut point along any marker expression continuum, we defined high expression as expression at or above the cut point and low expression as expression below the cut point. Whether high expression is clinically desirable depends on the particular marker. High E-cadherin status would be expected to correlate with better outcomes (i.e., longer time-to-death) [4, 6, 13], whereas high Snail [4, 6] status would be expected to correlate with worse outcomes.

To identify the statistically-optimal cut point for each continuous marker expression variable, we iteratively dichotomized marker expression at every possible cut point in the observed tumor tissue data, with each cut point corresponding to a different subject’s expression value. Each dichotomization of marker expression status was fit as the only independent variable in a Cox proportional hazards model with time-to-death as the outcome, producing a model fit statistic. The cut point with the lowest model fit statistic was considered statistically optimal.

For the SAS macro code and further details, see Online Resource 2.
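The search described above can be illustrated with a small, self-contained Python sketch. It is not the SAS macro from Online Resource 2. It assumes that the model fit statistic is −2 times the maximized log partial likelihood (the text does not name the statistic), fits the one-covariate Cox model by Newton-Raphson with Breslow handling of ties, and uses hypothetical function names and toy data.

```python
import math

def cox_binary_fit(times, events, x, max_iter=50):
    """Fit a Cox model with a single 0/1 covariate (Breslow ties).

    Returns (beta, -2 * log partial likelihood evaluated at beta).
    """
    # For each death, count the subjects still at risk in each covariate group.
    deaths = []
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i:
            n1 = sum(1 for t, g in zip(times, x) if t >= t_i and g == 1)
            n0 = sum(1 for t, g in zip(times, x) if t >= t_i and g == 0)
            deaths.append((x_i, n1, n0))
    beta = 0.0
    for _ in range(max_iter):
        grad = info = 0.0
        for x_i, n1, n0 in deaths:
            w1 = n1 * math.exp(beta)
            p1, p0 = w1 / (w1 + n0), n0 / (w1 + n0)
            grad += p0 if x_i == 1 else -p1   # score contribution
            info += p1 * p0                   # information contribution
        if info < 1e-12:                      # flat likelihood (e.g. separation)
            break
        step = max(-2.0, min(2.0, grad / info))  # capped Newton step
        beta += step
        if abs(step) < 1e-8:
            break
    loglik = sum(beta * x_i - math.log(n1 * math.exp(beta) + n0)
                 for x_i, n1, n0 in deaths)
    return beta, -2.0 * loglik

def optimal_cut_point(marker, times, events):
    """Try every observed value except the minimum as a cut point.

    High expression is marker >= cut point. Returns (cut_point, fit_statistic)
    for the dichotomization with the lowest fit statistic.
    """
    best = None
    for cut in sorted(set(marker))[1:]:
        high = [1 if m >= cut else 0 for m in marker]
        _, stat = cox_binary_fit(times, events, high)
        if best is None or stat < best[1]:
            best = (cut, stat)
    return best

# Toy data (hypothetical): the four lowest-expressing subjects die early.
marker = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
days = [1, 2, 3, 4, 10, 10, 10, 10]
died = [1, 1, 1, 1, 0, 0, 0, 0]
best_cut, best_stat = optimal_cut_point(marker, days, died)
```

In this toy example the selected cut point is 0.6, the value that separates the subjects who died from those who were censored. The minimum observed value is skipped because it would classify every subject as high expression.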

Statistical analysis

The data collection yielded four continuous marker expression variables: weighted average and worst core for each of the two markers. We first used unpaired two-sample t-tests to assess whether average continuous marker expression differed between tumor and non-neoplastic tissue. All subsequent analyses used tumor tissue only. We applied the macro to the tumor tissue data for the four continuous marker expression variables to identify the statistically-optimal cut point for each. Every optimal cut point was used to create a dichotomous marker expression variable (low versus high for E-cadherin, high versus low for Snail).

We generated Kaplan–Meier survival curves stratified by dichotomous marker expression status for one marker or two markers jointly, assessing differences between strata using the logrank test. Next, for each optimally-dichotomous marker expression variable, we fit unadjusted and adjusted Cox proportional hazards models of time-to-death. For marker expression variables found to be associated with patient outcomes at their statistically-optimal cut points, we evaluated additional cut points to explore whether the statistically-optimal cut point ought to be considered the clinically-optimal cut point.
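For readers unfamiliar with these procedures, the following Python sketch shows a Kaplan–Meier estimate and a two-group logrank test. It is an illustrative stand-in under simplifying assumptions (two strata only, hypothetical function names and data), not the SAS procedures used for the analyses reported here.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] evaluated at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times, events, group):
    """Two-group logrank test (group coded 0/1). Returns (chi_square, p_value)."""
    obs_minus_exp = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, group) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, group)
                 if u == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n          # observed minus expected deaths
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))      # chi-square tail, 1 df
    return chi2, p_value

# Hypothetical data: group 1 (e.g. low expression) dies before group 0.
days = [1, 2, 3, 4, 5, 6]
died = [1, 1, 1, 1, 1, 1]
low = [1, 1, 1, 0, 0, 0]
curve = kaplan_meier(days, died)
chi2, p_value = logrank(days, died, low)
```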

Prior to modeling, non-informative observations (e.g. “No Answer,” “Don’t Know,” “Unknown”) were recoded as missing. Missing data for all model variables were evaluated using multiple imputation. P values of 0.05 or below were considered statistically significant. All analyses were performed using SAS 9.3 (SAS Institute, Cary, NC). The Institutional Review Board at UNC approved the protocol. All subjects provided informed consent.

Results

Subject characteristics are shown in Table 1. We found no differences between overall North Carolina CanCORS and the subset for whom EMT markers were measured in primary tumors.

Table 1 Subject characteristics for overall North Carolina CanCORS and subset in whose primary tumors EMT markers were measured

On average, tumor tissue had lower E-cadherin expression than non-neoplastic adjacent tissue regardless of whether expression values were assigned as a weighted average of cores or as the worst core (Table 2). However, average Snail expression was higher in non-neoplastic tissue than in tumor tissue (Table 2). While the difference was not large, this relationship was consistent across both ways of assigning expression values and all three Snail scoring algorithms (Table S1 in Online Resource 3).

Table 2 Average continuous EMT marker expression in tumor tissue compared to non-neoplastic adjacent tissue

On the average intensity scale of 0–3 for E-cadherin, we found that the statistically-optimal cut point was about 0.52 for weighted averages and 0.42 for worst cores. On a percent positive nuclei scale for Snail using the first nuclear scoring algorithm, the statistically-optimal cut point was about 25.2 % for weighted averages and 63.6 % for worst cores. For each marker expression variable, Table 3 presents the number of subjects by statistically-optimal dichotomous marker expression status.

Table 3 Statistically-optimal dichotomous EMT marker expression status cross-tabulated with tumor stage and with risk of dying within 5 years of surgery

For E-cadherin weighted averages, subjects with low tumor expression had worse survival than those with high tumor expression (Fig. 1). None of the Kaplan–Meier curves stratified by the other three optimally-dichotomous marker expression variables revealed a statistically-significant difference in survival (not shown). A survival curve jointly stratified by E-cadherin and Snail weighted average status had a significant logrank test (see Online Resource 3, Fig. S6). Among the strata of joint expression status, no pattern consistent with the study hypotheses was observed.

Fig. 1 Kaplan–Meier overall survival stratified by expression status of E-cadherin measured as a continuous weighted average of tumor cores and then dichotomized at the statistically-optimal cut point (E+ = high expression, E− = low expression)

Unadjusted and adjusted proportional hazards model results paralleled the single-variable stratified survival curves. Low E-cadherin weighted average expression was associated with greater hazards of dying than high expression both when unadjusted [Hazard ratio (HR) = 2.84, 95 % Confidence Interval (CI) 1.29–6.28] and adjusted (HR = 2.57, 95 % CI 1.10–6.03), while no associations were found for any of the other optimally-dichotomous expression variables (Table 4).

Table 4 Unadjusted and adjusted Cox proportional hazards models of the effect of optimally-dichotomized marker expression status on time-to-death censored at 5 years after surgery (n = 190)

To evaluate whether the statistically-optimal cut point would be clinically optimal, we explored several trade-offs between strength of cut-point/time-to-death association and the number of subjects whose treatments might change due to clinical use of EMT markers. Specifically, we considered three different E-cadherin weighted average cut points that were either statistically significant or nearly so: about 0.52 (statistically-optimal value), 0.60, and 0.85 (Table 5). Setting the cut point to a value other than the statistically-optimal value led to hazard ratio point estimates that were weaker than the one at the optimal cut point, but marker expression status was still effectively associated with outcomes at each of these cut points. Notably, the precision of hazard ratio estimates was better at cut points other than the statistically-optimal value, with a confidence limit ratio of 5.48 at a cut point of 0.52, 3.48 at a cut point of 0.60, and 3.11 at a cut point of 0.85.
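The confidence limit ratio is the upper confidence limit divided by the lower confidence limit, so a smaller ratio indicates a more precise estimate. The short Python check below uses only the intervals reported above for the 0.52 cut point; it suggests that the quoted value of 5.48 corresponds to the adjusted model. The function name is hypothetical.

```python
def confidence_limit_ratio(lower, upper):
    """Upper confidence limit divided by lower confidence limit."""
    return upper / lower

# 95 % CIs reported for the statistically-optimal cut point (0.52)
clr_unadjusted = confidence_limit_ratio(1.29, 6.28)
clr_adjusted = confidence_limit_ratio(1.10, 6.03)

print(round(clr_unadjusted, 2))  # 4.87
print(round(clr_adjusted, 2))    # 5.48
```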

Table 5 Trade-offs between strength and precision of cut-point/time-to-death association and number of patients with low or high expression, by stage distribution and 5-year risk of death, for multiple cut points of E-cadherin weighted average expression

The number of subjects whose treatments might change based on E-cadherin measurements—those diagnosed with local disease but who had low E-cadherin expression—varied substantially with cut point. Of 99 subjects with E-cadherin measurements and diagnosed with local disease, 6 had low expression at the statistically-optimal cut point, 16 at a cut point of 0.60, and 56 at a cut point of 0.85 (Table 5).

Discussion

EMT markers measured in primary tumor cancer cells could potentially improve the accuracy of cancer staging by identifying patients at risk for metastatic disease who have false negative test results for both lymph node evaluation and radiologic imaging. This improvement in staging could lead to more appropriate treatment decisions that would reduce the number of patients diagnosed with local disease who later experience recurrence, ultimately improving survival outcomes.

Successful translation requires consistent, clinically-informative design of studies of EMT markers and patient outcomes. Although marker expression is naturally continuous, the clinical purpose of the markers is to guide therapy. Since treatment decisions are inherently binary, what matters is dichotomous marker expression. Therefore, the choice of cut point to dichotomize continuous marker expression is critical.

We developed approaches to EMT marker measurement and analysis that allow evaluation of the clinical utility of an EMT marker at different cut points. As a proof of principle, we measured two EMT markers in a set of CRC primary tumors from a population-based prospective cohort study. We found that E-cadherin expression measured as a weighted average of tumor cores was associated with time to all-cause mortality independent of tumor stage, but Snail expression was not associated with outcomes. This implied that E-cadherin has promise as a marker to identify CRC patients at risk for metastatic disease independent of lymph node evaluation and imaging results.

Our results suggested that at least three criteria should be used to evaluate the association between dichotomous EMT marker expression at a given cut point and patient outcomes: strength of point estimate, precision, and the number of patients whose treatments would be changed by implementing the marker clinically at that cut point. Table 5 illustrates the information needed to compare the performance of an EMT marker at different cut points.

The table shows that, while multiple E-cadherin cut points were associated with time to all-cause mortality, there were trade-offs between them in terms of clinical performance. On a continuous average intensity scale of 0–3, the statistically-optimal cut point (in terms of model fit) of 0.52 had the strongest point estimate in terms of being furthest from the null value. A cut point of 0.85 had the best precision.

The patients whose treatments would change if E-cadherin were implemented as a diagnostic tool are those diagnosed with local disease according to lymph node evaluation and imaging but who had low E-cadherin expression. Without knowledge of E-cadherin status, these patients would not generally receive adjuvant chemotherapy [14], but after introducing the marker, they might be considered for chemotherapy. Recall that about 25 % of CRC patients diagnosed with local disease later experience recurrence [2]. This suggests that a useful cut point for E-cadherin in CRC would yield an approximate distribution of 25 % low expression and 75 % high expression among those diagnosed with local disease according to lymph node evaluation and imaging. Our study had 99 subjects with E-cadherin measurements who were diagnosed with local disease. The recurrence data for CRC patients diagnosed with local disease imply that an effective cut point would classify about 25 of these 99 subjects as low E-cadherin and 74 as high E-cadherin. Of the three cut points in Table 5, a cut point of 0.60 came closest to this distribution.
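The arithmetic behind this comparison can be written out explicitly. The Python sketch below uses the counts of low-expression subjects with local disease reported in the Results (6, 16, and 56 of 99) and the 25 % recurrence figure; the function name is hypothetical, and choosing the cut point by absolute distance from the target count is an assumption made for illustration.

```python
def closest_cut_point(low_counts, n_local, recurrence_rate):
    """Return (target_low_count, cut_point) where the cut point's number of
    low-expression subjects is nearest to the expected number of recurrences."""
    target_low = round(n_local * recurrence_rate)
    best = min(low_counts, key=lambda cut: abs(low_counts[cut] - target_low))
    return target_low, best

# Low-expression counts among the 99 subjects diagnosed with local disease
low_counts = {0.52: 6, 0.60: 16, 0.85: 56}
target, best = closest_cut_point(low_counts, n_local=99, recurrence_rate=0.25)
print(target, 99 - target, best)  # 25 74 0.6
```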

Our approach had several important strengths. First, digital image analysis yielding continuous marker expression data maximized the opportunity to identify the most clinically useful cut point [15]. In contrast, collecting dichotomous or ordinal marker data, such as are obtained by manual IHC scoring, would obscure much of the natural variation that may be present and greatly reduce the number of cut points that can be explored. Second, our approach to analyzing the data permitted direct evaluation of the clinical impact of using a particular marker at a given cut point. It also allowed straightforward comparison of the strengths and weaknesses of different cut points for the same marker in the same study sample. Choosing cut points based on biologically arbitrary criteria such as percentiles of marker expression is not recommended [16].

Third, whereas the selection of covariates for multivariate models has varied considerably and lacked justification in previous studies [6], we identified a wide range of candidate covariates from prior literature and then decided whether to include or exclude each one based on directed acyclic graph theory [12]. Finally, our sample was population-based, implying that our results could have greater generalizability than the hospital-based samples that have dominated prior studies [6, 17]. Although the subjects in this analysis were approximately 20 % of those enrolled in North Carolina CanCORS, we found no differences in subject characteristics between overall North Carolina CanCORS and those included in the EMT study sample (Table 1). This suggested that the EMT study sample remained representative of the underlying source population.

In terms of limitations, first, the only available outcome was time from surgery to all-cause mortality. It would have been informative to also examine associations between marker expression and time to cancer-specific mortality as well as time to recurrence, but these alternative outcomes were not measured in CanCORS. Second, the sample size was small, which prevented division of the subjects into separate training and validation sets with sufficient power to detect an association between marker expression and outcomes.

Finally, our study did not sample tumors in a consistent way. Ideally, every tumor would have been sampled at each of the invasive front, tumor center, and an edge of the tumor away from the invasive front [6]. However, for any given tumor in our dataset, we did not know from which part of the tumor our tissue specimens came. EMT marker expression could vary throughout a tumor and it may be that, for clinical purposes, physicians should always sample a particular portion (e.g. the invasive front). Had systematic tumor sampling been performed, we could have calculated portion-specific estimates of, say, the association between invasive front E-cadherin and time-to-death, and separately, the association between tumor center E-cadherin and time-to-death. Unfortunately, we could not analyze the data this way because the tumor sampling was unsystematic.

An important implication of this study for clinical cancer pathology is that EMT marker immunohistochemical measurements should be incorporated into clinical practice using digital image analysis to obtain a continuous expression score on a scale standardized for the particular marker. This score should then be compared to a cut point specific to the marker that has been identified via epidemiologic studies and is based on a consideration of the criteria discussed above: strength of association with patient outcomes, precision of association, and proportion of patients diagnosed with local disease according to lymph node evaluation and radiologic imaging whose treatments would change based on EMT marker status. For CRC, our results suggest that a finding of low E-cadherin levels could signal to clinicians that cancer cells were likely detaching from the primary tumor prior to surgery.

This study furthers translational EMT research in both specific and general ways. Specifically, our results imply that E-cadherin is a promising marker to identify CRC patients at risk for metastatic disease even when metastases are not detected by lymph node evaluation and imaging. More generally, we have developed an approach to measurement and analysis of EMT marker data that is more rigorous, thorough, and clinically-informative than techniques described in previous studies. This approach is not specific to CRC and could benefit future studies in different tumor sites. Adoption of our approach could facilitate translation of EMT markers to clinical use and thereby improve outcomes across a wide spectrum of cancer patients.