Introduction

In the late nineteenth century, Von Hansemann [57] proposed that the nuclear morphology of tumor cells might foretell their ultimate biological behavior. This concept laid the foundation for a multitude of grading schemes that are in use today. A relationship between breast cancer histologic grade and survival was first documented in the 1920s by Greenhough et al. [21]. They assessed eight morphological factors including gland formation, secretory vacuoles, cell size, nuclear size, pleomorphism, degree of hyperchromasia, and number of mitoses. Later on, additional studies by Patey, Scarff, and Haagensen confirmed the relationship between histologic grade and patient survival. Patey and Scarff [39] highlighted the importance of tubular formation, variation in nuclear size, and hyperchromatism in histologic grading, whereas Haagensen [22] evaluated 15 histological features categorized under growth pattern, cell morphology, and the reaction of the surrounding stroma. In 1950, Bloom divided tumors into low, moderate, or high-grade malignancies according to the degree of tubule formation, nuclear features, hyperchromasia, and mitotic activity, and recognized a correlation between tumor grade and survival. Seven years later, Bloom and Richardson [3] proposed a numerical scoring system to facilitate the grading effort. Each of the above three features was examined and given a score of 1, 2, or 3 for a total possible score ranging from 3 to 9. Meanwhile, Black et al. [2] concluded that nuclear morphology was the most significant prognostic factor. They proposed a five-tier nuclear grading system that was later reduced to three tiers by Fisher et al. [18]. In the early 1990s, Elston and Ellis [16] re-examined and modified the grading system by combining Bloom and Richardson’s and Black’s approaches. They deleted “nuclear hyperchromasia” and refined the numerical method in assessing mitotic count. 
This system, also referred to as the “Nottingham modification of the Bloom–Richardson system”, has become a popular and widely used grading scheme with proven prognostic significance. The WHO currently recommends that this grade be reported for all invasive breast cancers [52]. Although the Scarff–Bloom–Richardson (SBR) system correlates well with prognosis, the literature regarding its routine use is divided. It has been criticized for being imprecise in the assessment of all three parameters, most notably mitotic frequency, which allows an element of subjectivity to influence tumor grade [4, 13, 24, 29]. Elsewhere, it has been shown to be acceptably reproducible when strict criteria are defined [10, 11, 19, 48, 54]. Ductoglandular differentiation has been shown to be the least predictive component of the SBR system, whereas nuclear pleomorphism and mitotic activity are the most useful [31].

Cell proliferation or mitotic activity as a predictor of tumor behavior has received increasing attention in recent years. The Ki-67 antigen, first described in 1983, is a non-histone protein [20]. The protein is expressed in cycling cells in the G1 phase, S phase, G2 phase, and during mitosis, so its detection provides an accurate estimate of the growth fraction of the tumor. MIB-1, an anti-Ki-67 antibody, has proven to be superior to other antibodies for assessing cells that have been triggered into the cell cycle [7]. Several studies have proven the prognostic significance of data obtained with this antibody in breast cancer and its positive relationship with histologic grade, tumor size, mitotic activity, hormonal and Her-2 status, and disease-free survival [5, 9, 27, 28, 33, 34, 37, 40–42, 46, 50, 56, 58]. Problems did exist, however, with inter-laboratory and inter-observer reproducibility of immunohistochemical (IHC) analysis of MIB-1 (Ki-67) labeling in breast cancer, leading some authors to question its utility as an independent prognostic factor [23, 32, 35, 37].

In the current study, our objective was to develop a new grading system [the nuclear grade plus proliferation (N+P) system] that is comparable to the SBR system in terms of defining prognostically relevant groups. The correlation between histologic grade and overall survival obtained with the N+P system was at least equivalent to that obtained with the SBR system. Our system relies on automated methods that, while not immune to the influence of subjective bias, may (given time and further study) prove to increase the objectivity and reproducibility of breast cancer grading.

Materials and methods

Patient cohort

A total of 650 primary breast carcinomas, ductal type, consisting of 137 SBR grade I, 247 grade II, and 266 grade III tumors were examined. The retrospective study was approved by the Institutional Research Committee at the Kansas University Medical Center. Histologic tumor samples were obtained from lumpectomy or mastectomy specimens as well as core needle biopsies. The samples were taken from 542 lumpectomy/mastectomy specimens and 108 core biopsies from patients ranging in age from 24 to 95 years with a median age of 55 years. Histopathologic parameters, including histologic grade, type, nuclear grade, and angiolymphatic invasion, were recorded for all patients. All tumors were graded using the modified “Nottingham” criteria of Bloom and Richardson [16]. Additional parameters including tumor size and lymph node metastasis were also recorded for lumpectomy/mastectomy specimens.

Patient outcome data were obtained from the Kansas Tumor Registry. Specifically, the date of death was recorded and used to calculate overall survival. Overall (rather than disease-specific) survival was reported because discrepancies and/or incomplete cause of death data were not uncommon.

Criteria for the newly proposed N+P grading system

The goal of this study was to develop a valid, reproducible, and user-friendly system for grading invasive breast cancers. The N+P grading system is a three-tiered system that evaluates two features: nuclear pleomorphism and the automated MIB-1 count. Tubule formation and the manual mitotic count were eliminated. The nuclear grade was scored conventionally from 1 to 3 using the SBR system. Specifically, nuclear grade 1 is characterized by small regular uniform nuclei with little variation in size and shape; nuclear grade 2 by moderate nuclear variation in size and shape; and nuclear grade 3 by marked variation in size and shape, including bizarre nuclei.

The automated MIB-1 count was likewise classified into three categories: ≤9%, 10–25%, and >25%. These cutoffs (9 and 25%) were selected initially based on tertiles to produce three equal groups, but then shifted slightly to avoid placing a cut-point at a value with a large number of cases (Table 1).

Table 1 N+P scheme for grading invasive ductal carcinoma of the breast

Next, the information from nuclear pleomorphism and MIB-1 was combined to yield a new N+P grading system based on the number of bad prognostic factors present in the tumor, i.e., whether the tumor was nuclear grade 3 (Fig. 1) and/or whether MIB-1 expression was >25% (Fig. 2). Thus, N+P grade I was defined as a tumor having nuclear grade 1 or 2 and MIB-1 ≤25%; N+P grade II as a tumor having either nuclear grade 3 and MIB-1 ≤25% or nuclear grade 1 or 2 and MIB-1 >25%; and N+P grade III as a tumor having both nuclear grade 3 and MIB-1 >25% (Tables 1 and 2).
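The combination rule can be illustrated with a minimal Python sketch. It assumes the grade is read as a count of adverse factors (nuclear grade 3; MIB-1 >25%), so that exactly one adverse factor gives N+P grade II. The function name and the input validation are hypothetical and are not part of the study's software.

```python
def n_plus_p_grade(nuclear_grade: int, mib1_percent: float) -> int:
    """Return the N+P grade (1, 2 or 3) from nuclear grade and MIB-1 count.

    Assumption: the grade equals one plus the number of adverse factors,
    where the adverse factors are nuclear grade 3 and MIB-1 > 25%.
    """
    if nuclear_grade not in (1, 2, 3):
        raise ValueError("nuclear grade must be 1, 2 or 3")
    if not 0 <= mib1_percent <= 100:
        raise ValueError("MIB-1 count must be a percentage between 0 and 100")

    adverse_factors = int(nuclear_grade == 3) + int(mib1_percent > 25)
    return adverse_factors + 1


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for nuclear, mib1 in [(2, 8.0), (3, 20.0), (1, 40.0), (3, 60.0)]:
        print(nuclear, mib1, n_plus_p_grade(nuclear, mib1))
```

Note that only the 25% cutoff enters the grade; under this reading, the 9% cutoff separates the low and intermediate MIB-1 categories but does not change the N+P grade.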

Fig. 1

Overall survival of patients with invasive ductal carcinoma classified by nuclear grade. Survival curves of patients with nuclear grade 3 are statistically significantly different from those of nuclear grades 1 and 2 (p = 0.0004; log-rank test)

Fig. 2

Overall survival of patients with invasive ductal carcinoma classified by level of immunohistochemical expression of MIB-1. The three survival curves are NOT statistically significantly different (p = 0.068; log-rank test)

Table 2 Classification of invasive ductal carcinomas graded according to the proposed N+P grading system

Immunohistochemical studies

Tissue blocks containing the most representative and well-preserved tumor areas were selected for IHC. Immunohistochemical analysis was performed on tissue fixed with 10% neutral buffered formalin. MIB-1, p53, epidermal growth factor receptor (EGFR), Bcl-2, and Her-2 IHC analyses were performed on all specimens using a Dako Autostainer (DAKO, Carpinteria, CA). The paraffin-embedded tissue blocks were cut into 5-μm sections, deparaffinized, and heat-treated for antigen retrieval. Her-2 was detected using the HercepTest (DAKO) per the manufacturer's protocol. The vendor, clone, titration titer, time of titration, epitope retrieval method, and method of detection for each antibody are listed in Table 3. Hematoxylin was used as a counterstain. Appropriate positive and negative controls were included. Positive controls for the markers were selected from surgical specimens received in the surgical pathology laboratory and confirmed to be positive when compared with other samples. Negative controls included samples run without the primary antibody or with non-immune serum. Positive and negative controls supplied in the HercepTest kit were used for the evaluation of the Her-2 stains. Nuclear morphology, tubular formation, and MIB-1 labeling index were then analyzed and assigned scores.

Table 3 Protocols for immunohistochemistry

Quantification of immunohistochemistry

Positive IHC reactions were defined as a dark brown circumferential reaction on the cell membrane for Her-2 and EGFR, distinct nuclear staining for MIB-1, ER, PR, and p53, and intense cytoplasmic staining for Bcl-2. Staining parameters were evaluated at 100×, and areas of high-density immunostaining were chosen for image analysis or manual scoring. For the proliferation index (PI) of MIB-1, the percentage of nuclei with immunopositivity was determined using the PI program of either the cell analysis system (CAS) 200 image analyzer (Bacus Laboratory, Chicago, IL) from 1991 to 2001 or (later) the Clarient automated cellular imaging system (ACIS™; San Juan Capistrano, CA). Five to ten areas with the highest staining intensity were selected for quantitation from each specific lesion (Fig. 3). An average score for all selected areas was then calculated. For ER, PR, and p53, both the CAS-200 and ACIS™ systems were used for automated counts. Manual microscopy was used to score tumor staining with antibodies to EGFR and Bcl-2. Her-2 staining was quantified using a score of 0 or 1+ to indicate “negative” results, and 2+ or 3+ to represent “positive” results, per the scoring instructions included in the HercepTest kit. Results were validated using the Her-2 scoring system of the ACIS machine. Using both manual and automated microscopy, up to ten high-power fields were evaluated with each marker to provide the final score. Staining of 10% or more of the tumor cells with the antibody to EGFR or Bcl-2 was considered positive, whereas for p53, any count greater than or equal to 5% was considered positive.
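The scoring rules in this section can be summarized in a short Python sketch. It is illustrative only: the function and variable names are hypothetical, the input percentages are assumed to come from the image analyzer or from manual counts, and the sketch does not reproduce the CAS-200 or ACIS™ software.

```python
from statistics import mean

# Positivity cutoffs stated in the text (percent of tumor cells stained).
POSITIVITY_CUTOFF_PERCENT = {"EGFR": 10.0, "Bcl-2": 10.0, "p53": 5.0}


def proliferation_index(area_percents):
    """Average percent of MIB-1-positive nuclei over the selected areas.

    The text describes five to ten areas per lesion; other counts are
    rejected here as an assumption of this sketch.
    """
    if not 5 <= len(area_percents) <= 10:
        raise ValueError("expected five to ten selected areas")
    return mean(area_percents)


def marker_is_positive(marker: str, percent_stained: float) -> bool:
    """Apply the >=10% (EGFR, Bcl-2) or >=5% (p53) cutoff."""
    return percent_stained >= POSITIVITY_CUTOFF_PERCENT[marker]


def her2_is_positive(score: int) -> bool:
    """Scores of 0 or 1+ are negative; 2+ or 3+ are positive."""
    if score not in (0, 1, 2, 3):
        raise ValueError("Her-2 score must be 0, 1, 2 or 3")
    return score >= 2


if __name__ == "__main__":
    # Hypothetical hot-spot readings, for illustration only.
    print(proliferation_index([12.0, 18.5, 22.0, 15.0, 30.5]))
    print(marker_is_positive("p53", 4.0), her2_is_positive(2))
```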

Fig. 3

Screen images from the ACIS™ program depicting the process of selecting tissue areas to be evaluated for MIB-1 expression. a A representative tumor resection specimen. The exclusive “hot spots” feature identifies areas with the most intense staining (left lower corner). Quick identification of the relevant regions is then possible. Circular areas of representative tumor are then automatically selected, and positive MIB-1 staining is calculated and expressed as a percentage of the total tumor cells (right image). b Highlights two additional valuable features of the ACIS™ system. It shows an image of a representative core biopsy specimen (left image), as well as the use of the manual mapping feature of irregular areas involved by tumor for calculation of percent of cells stained with MIB-1 (right image)

Statistical analysis

Overall frequencies and percentages were summarized for tumor grade (by both the N+P and SBR systems), ER, PR, p53, EGFR, Bcl-2, Her-2, vascular invasion, and node positivity. The frequencies of each variable stratified by the grading system were calculated, and their relationships to each grading system were evaluated using the chi-square test. Summaries of biomarker expression by N+P system stratified by SBR system are also given. The log-rank test was used to compare overall survival across the three grades for the N+P and SBR systems independently.
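For readers unfamiliar with the log-rank test, the following Python sketch shows the standard two-group form of the statistic. It is a minimal illustration, not the software used in this study: it handles only pairwise comparisons (such as one grade versus another), whereas the overall comparisons across three grades require the multi-group version. The function name and the example data are hypothetical.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each patient
    events_* : 1 if the patient died at that time, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths occurring exactly at time t, per group.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative death times")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical follow-up data (years), for illustration only.
    chi_square, p_value = logrank_two_groups(
        [1, 2, 3, 4, 5], [1, 1, 0, 1, 1],
        [4, 6, 7, 8, 9], [1, 0, 1, 0, 0],
    )
    print(chi_square, p_value)
```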

Results

Nuclear grade alone provided reasonable prognostic prediction for overall survival (p = 0.0017; log-rank test), with obvious separation of high grade versus low and intermediate grades (p = 0.0004; Fig. 1). The segregation of MIB-1 quantification into three categories resulted in a fairly even distribution between the low, intermediate, and high categories (194, 186, and 270 tumors, respectively), but the overall association with survival across the three categories did not reach statistical significance (p = 0.068; log-rank test). The highest level of MIB-1 expression did, however, differ significantly from the low and intermediate levels of expression (p = 0.022; log-rank test; Fig. 2). We evaluated whether shifting the cut-points by a few percentage points either way would alter the prognostic ability, but there was no improvement in the separation of the curves.

The nuclear grade and automated MIB-1 count were combined as previously described into a three-tier grading system (the N+P grading system). This classification system was prognostic for overall survival (p = 0.0013; log-rank test), with greatest separation of grades II and III versus grade I (Fig. 4). Comparing individual grades, grade I was statistically significantly different from both grade II (p = 0.0025) and grade III (p = 0.0004). There was a suggestion of separation between N+P grade II and N+P grade III for times less than 4 years, but this was of marginal statistical significance (p = 0.078; log-rank test).

Fig. 4

Overall survival of patients with invasive ductal carcinoma classified by the N+P grading system. The three survival curves are statistically significantly different (p = 0.0013; log-rank test)

The N+P grading system was compared to the standard SBR grading system. The SBR system was overall prognostic (p = 0.0032; log-rank test) and identified a cohort of patients with bad prognosis (grade III), but there was no clear separation between grades I and II (Fig. 5). Comparing individual grades, grade III was statistically significantly different from both grade I (p = 0.0096) and grade II (p = 0.0043).

Fig. 5

Overall survival of patients with invasive ductal carcinoma classified by the SBR grading system. The three survival curves are statistically significantly different (p = 0.0032; log-rank test)

Of more clinical relevance, if one simply takes SBR grade II patients, who have survival data indistinguishable from SBR grade I patients, and compares them on the basis of being N+P grade I versus N+P grade II, there is an obvious and statistically significant (p = 0.025) separation (Fig. 6). In other words, roughly one third of the patients would be identified who have an increased risk of death. Using SBR grading alone, these patients would have been given a moderately good prognosis.

Fig. 6

Overall survival for patients classified as SBR grade II, plotted as a function of the N+P grades I vs II. The two survival curves are statistically significantly different (p = 0.025; log-rank test)

Comparison of the N+P grading system to the SBR system

The two histologic grading systems (the SBR and N+P systems), in general, demonstrated similar frequencies for the different histologic grades (Table 4). More importantly, the greatest difference between the two systems was observed for those tumors initially classified as grade II by SBR. Whereas 91% of tumors initially classified as SBR I remained as N+P I and 70% of tumors initially classified as SBR III remained as N+P III, only 39% of tumors initially classified as SBR II remained as N+P II. The majority (53%) of the tumors initially classified as SBR II were “down-graded” to N+P I. As shown above, this shift provided definite prognostic value.

Table 4 Comparison of the SBR to the N+P grading systems for invasive ductal carcinoma

The two grading systems also demonstrated similar frequencies, grade for grade, for expression of other clinical and immunohistochemical prognostic factors studied (Table 5). At surgery, low-grade tumors were smaller (median diameter of 1.3 and 1.5 cm for SBR and N+P grade I, respectively, compared with 2.2 and 2.3 cm for SBR and N+P grade III, respectively) and less likely to be associated with angiolymphatic invasion or nodal metastasis. The majority of low-grade tumors were ER, PR, and Bcl-2 positive and negative for p53, EGFR, and Her-2. In contrast, the majority of the high-grade tumors were ER, PR, and Bcl-2 negative. In addition, more high-grade tumors showed increased expression of p53, EGFR, and Her-2, consistent with the biological aggressiveness of these tumors. Moderately differentiated tumors were, as expected, intermediate between well and poorly differentiated tumors.

Table 5 Frequency (%) of expression of various biomarkers and clinicopathologic parameters for tumors classified by SBR and N+P grading systems for invasive ductal carcinoma

The available demographic, clinical, and pathologic data were subjected to Cox regression analysis to identify predictors of overall survival. Variables included were age at diagnosis, evidence of vascular invasion, type of specimen (biopsy or surgical), MIB-1 percent positive cells, nuclear grade, and both the SBR and N+P grades. Lymph node status and tumor size were excluded, as information was not available on 280 and 108 subjects, respectively. Similarly, immunohistochemical expression of the other molecular markers was also excluded because of incomplete data. The first variable to enter the model was the N+P grade (p = 0.001), followed by vascular invasion (p = 0.017).

Discussion

In this study, we described a new three-tier grading system, the N+P system, which eliminates the ductoglandular differentiation component of the SBR system and replaces the manual counting of mitotic figures with an automated MIB-1 count. Our results demonstrate that the N+P system correlates well with a variety of tumor biomarkers and clinicopathologic parameters which have well-known prognostic value, including vascular invasion and lymph node status. The N+P system is at least comparable to the SBR system in terms of predicting overall survival and correlates similarly with prognostic biomarkers. Additionally, one potential advantage of the new system was highlighted by its ability to discriminate a group of intermediate-risk patients not identified by the SBR system. As a group, patients classified as SBR grade II exhibited a survival indistinguishable from that of SBR grade I patients. When regraded by the N+P system, about one third of these patients (those classified as N+P grade II) were shown to have a statistically significantly worse overall survival than would be predicted by the SBR system.

For decades, investigators have known that data collected from routine pathologic evaluation of breast tumors can be used to provide important prognostic information. Histologic grade [8, 15, 26, 41], tumor type [15], tumor size [6, 26, 47], angiolymphatic invasion [12] and lymph node status [14, 17, 26, 43] have all been shown to yield clinically valuable information. As medical understanding of tumor biology has expanded, breast cancer grading has evolved through many stages. The Elston modification of the SBR grading system is currently the most widely used system in North America and Europe [38] and is currently recommended by the WHO [52]. It provides criteria for the assignment of a histological grade which reflects both tumor morphology and biological behavior. Although defined cutoffs exist within the SBR system, the determination of these factors is arguably based on subjective judgments, leading to potential problems with intra- and inter-observer reproducibility. In fact, the latest Breast Task Force of the American Joint Committee on Cancer did not include histologic grading in its staging criteria due to “insurmountable inconsistencies” between institutions and pathologists.

The literature is divided. Some studies demonstrate a lack of precision in assessing all three parameters of the SBR system, specifically mitotic frequency [4, 13, 24, 29, 30, 36, 44, 45, 51]. Elsewhere, it has been shown that with experienced pathologists and strictly defined criteria, the SBR system is quite reproducible [51, 10, 11, 19, 48, 54]. Agreement ranges from poor (35%) to acceptable (75%) among pathologists using the SBR grading system, and there exists a low predictive value of prognosis for individual patients [4, 13, 24, 30, 31, 36, 44, 45, 51]. The source of most pathologist disagreement is within the SBR grade II group. There also exists doubt about the utility of the SBR system when grading small lesions [55].

An important feature common to both the N+P and SBR grading systems is the evaluation of tumor proliferative activity. Performing a manual mitotic count is time-consuming and somewhat subjective. Experience, interest, and diligence in counting mitoses can vary between individuals. It is also known that poor tissue fixation and slide quality have a negative effect on one’s ability to perform an accurate mitotic count. Furthermore, specimens containing limited tumor material (e.g., core needle biopsies or residual tumor after neoadjuvant chemotherapy) could further limit the accuracy of the count. These factors may impact tumor grade and, consequently, clinical management. The use of an automated MIB-1 count removes some of the subjectivity from the assessment of mitotic activity within a given tumor; however, biologic variability and examiner individuality can never be totally eliminated. While it is our belief that the automated MIB-1 count is less susceptible to these factors, the inherent proliferative variability within any given tumor, combined with the reliance of our system on accurately choosing the best “hot spots” to be analyzed, prevents the elimination of all sources of potential subjectivity. Newer systems have programs capable of extensively sampling an entire tumor on a slide, further reducing the potential for sampling error. It is our opinion that the N+P system represents a move toward a more objective approach to breast cancer grading; however, further studies need to be carried out to demonstrate the reproducibility of these methods.

The grading of special types of breast carcinoma, as well as invasive lobular carcinoma, is of particular interest. The SBR system is strongest when grading unselected cases of ductal carcinoma; however, investigations have also focused on its utility in invasive lobular carcinoma [1, 49]. In clinical practice, an SBR grade is often reported for lobular carcinoma; however, the impact of using predefined grades (i.e., SBR grade 2 for lobular carcinoma, grade 1 for tubular carcinoma, and grade 3 for medullary carcinoma) on the ability of the SBR to prognosticate these malignancies remains unknown [52]. No universally accepted grading system for non-ductal carcinomas currently exists. In developing the N+P system, it was our goal to not only show equivalence with the SBR system for grading ductal carcinoma but also to identify a single system which may potentially be used for grading other types of invasive breast carcinoma. It is our hypothesis that the N+P system will ultimately prove to be a viable means for grading all types of breast carcinoma. Our preliminary data analyzing ductal carcinoma in situ [53] and lobular carcinoma (unpublished) is promising.

Replacing the manual mitotic count with an automated MIB-1 count has been shown to be beneficial, providing both standardization and precision [25]. Our results demonstrate a positive correlation between the N+P grade and the MIB-1 status of these breast tumors. When nuclear grade and MIB-1 status are combined to arrive at an N+P grade, the results are comparable to the SBR system in terms of overall survival and prognostic biomarkers. In addition, the N+P system is able to accurately grade tumors in needle core biopsy specimens. Tumors graded in these limited tissue samples positively correlated with the different histologic and prognostic parameters studied, as well as with their paired excision specimens.

As stated previously, our goals were to introduce a method of breast cancer grading which is applicable to all forms of invasive and in situ carcinoma and reduces, as much as possible, the subjectivity inherent in many current grading schemes, specifically the Scarff–Bloom–Richardson system. Our results demonstrate the feasibility of the N+P system as a viable alternative to SBR grading; however, our findings need to be interpreted with caution. Two main shortcomings limit our conclusions. First, we evaluated both the SBR and the N+P grading systems by comparing overall survival rather than disease-free survival. Second, we lacked data regarding lymph node status or tumor size on a considerable number of specimens (280 and 108, respectively). Drawing conclusions regarding potential outcome without fully taking into account the impact of these well-established prognostic factors should be done with caution. Future study may be directed at further assessing the reproducibility of our methods or assessing the ability of the N+P system to prognosticate patients based on disease-free survival. Another potential avenue for investigation would involve attempting to modify the N+P system to segregate tumors into a two-tier system (low- or high-grade) as has been done for urothelial carcinoma.

In conclusion, we have demonstrated that the N+P system is at least equivalent to the SBR system for predicting overall survival and prognostic biomarker expression in invasive ductal carcinoma. Future research is needed (and encouraged) to assess the reproducibility of these methods. The applicability of the N+P system to the grading of other forms of breast carcinoma such as mucinous carcinoma, lobular carcinoma, or even ductal and lobular carcinoma in situ is currently under investigation.