Introduction

The molecular classification of ovarian epithelial carcinomas as low grade (type I) or high grade (type II) identifies two sets of cancers with contrasting incidence, molecular characteristics, and clinical outcomes [1–7]. The importance of ovarian cancer grade extends beyond the individual patient, with implications for cancer epidemiology and surveillance. It is therefore important to establish the reliability of microscopic (i.e., histological) ovarian cancer grading in the general medical community and as reflected in population-based cancer registries such as the National Cancer Institute’s SEER database. Accordingly, we examined the agreement between the ovarian carcinoma grade recorded in SEER’s Residual Tissue Repository (RTR) and the grade assigned by two independent gynecologic pathologists using three previously defined grading schemes: (1) the International Federation of Gynecology and Obstetrics (FIGO) grading system [8, 9], (2) the Shimizu and Silverberg system [9–11], and (3) the Malpica et al. scheme [12–15].

Community-based pathologists commonly use the FIGO system, a three-tier grading scheme (low, intermediate, or high grade) that is modeled after the system for endometrial (uterine) carcinoma and reflects the level of cellular organization into differentiated structures such as glands and papillae as opposed to solid sheets of tumor cells. Shimizu and Silverberg (herein referred to as SS) also devised a three-tier grading scheme, which is similar to microscopic grading for breast carcinoma and incorporates histological architecture, nuclear cytology, and mitotic index. Malpica et al. at the M.D. Anderson Cancer Center (herein referred to as MDACC) proposed a two-tier system (low or high grade) for serous ovarian carcinomas [12–15], which is based upon a dualistic conceptual framework where low grade and high grade carcinomas proceed along two separate cancer pathways [1–7].

Materials and methods

The National Cancer Institute’s SEER program established its Residual Tissue Repository (RTR) in 2003 to facilitate population-based cancer research using archival biospecimens [16, 17]. SEER’s RTR included Tumor Registries in Hawaii, Iowa, and Los Angeles County. The Los Angeles County Tumor Registry did not participate in this study. We retrieved the available formalin-fixed and paraffin-embedded tissue blocks for primary invasive ovarian carcinomas in the Hawaii and Iowa Tumor Registries, excluding tubal and peritoneal tumors. There were 664 ovarian tumors; 516 were from the Hawaii Tumor Registry that were diagnosed from 1983 through 2004, which represented 38 % of all ovarian tumors in the Hawaii catchment area during that time period. The remaining 148 ovarian cases were derived from the Iowa Tumor Registry diagnosed from 1987 through 2003, representing 4 % of all ovarian tumors in the Iowa catchment area during that time period. Because SEER’s RTR data were anonymized, the National Institutes of Health’s Office of Human Subjects Research designated the project as exempt from IRB approval; nonetheless, IRB approvals were provided at the Universities of Hawaii and Iowa.

Demographic data included age at diagnosis, year of diagnosis, and race [White, Asian or Pacific Islander (API), and other/unknown]. Tumor characteristics consisted of the American Joint Committee on Cancer (AJCC) tumor, node, and metastasis (TNM) stage [18], and histological type, behavior, and grade according to the International Classification of Diseases for Oncology 3rd edition (ICD-O-3) [19]. AJCC ovarian cancer stages were stage I (tumors limited to one or both ovaries), stage II (involvement of one or both ovaries with pelvic extension and/or implants), stage III (involvement of one or both ovaries with microscopically confirmed peritoneal metastasis), and stage IV (distant metastasis, excluding peritoneal metastasis). AJCC guidelines specify five histological codes for the microscopic assessment of grade (G) that are independent of TNM stage: GX = unknown, G1 = well differentiated, G2 = moderately differentiated, G3 = poorly differentiated, and G4 = undifferentiated. ICD-O-3 morphology codes have six digits: the first four digits are for histological type, the fifth is for behavior (benign or malignant), and the sixth is for tumor grade. Ovarian carcinoma histological codes were serous (8441, 8460, and 8461), mucinous (8470, 8471, 8480, and 8481), endometrioid (8380, 8381, 8560, and 8570), clear cell (8310), and other (8010–8580, excluding the previously listed ICD-O-3 codes, plus 9000 and 9014). SEER abstracted tumor grade from the sixth ICD-O-3 digit as grade 1 (well differentiated), grade 2 (moderately differentiated, moderately well differentiated, or intermediate differentiation), grade 3 (poorly differentiated), and grade 4 (undifferentiated or anaplastic).
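The coding scheme described above can be illustrated with a minimal sketch. The function names, the acceptance of a "/" separator, and the treatment of any sixth digit outside 1–4 as "unknown" are assumptions made for illustration only; the code lists and grade labels are those given in the text.

```python
# Histological code groups as listed in the text.
SEROUS = {8441, 8460, 8461}
MUCINOUS = {8470, 8471, 8480, 8481}
ENDOMETRIOID = {8380, 8381, 8560, 8570}
CLEAR_CELL = {8310}

# Grade labels for the sixth ICD-O-3 digit, as abstracted by SEER.
SEER_GRADE_LABELS = {
    1: "well differentiated",
    2: "moderately differentiated",
    3: "poorly differentiated",
    4: "undifferentiated or anaplastic",
}


def histological_type(histology: int) -> str:
    """Map a four-digit histology code to the study's histological type."""
    if histology in SEROUS:
        return "serous"
    if histology in MUCINOUS:
        return "mucinous"
    if histology in ENDOMETRIOID:
        return "endometrioid"
    if histology in CLEAR_CELL:
        return "clear cell"
    if 8010 <= histology <= 8580 or histology in (9000, 9014):
        return "other"
    return "outside study range"


def parse_morphology(code: str) -> dict:
    """Split a six-digit morphology code into histology, behavior, and grade.

    Assumption: an optional "/" separator (e.g. "8441/33") is tolerated,
    and a sixth digit outside 1-4 is reported as "unknown".
    """
    digits = code.replace("/", "")
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    histology = int(digits[:4])
    grade_digit = int(digits[5])
    return {
        "histology": histology,
        "type": histological_type(histology),
        "behavior_digit": int(digits[4]),
        "grade_digit": grade_digit,
        "grade_label": SEER_GRADE_LABELS.get(grade_digit, "unknown"),
    }


example = parse_morphology("8441/33")
print(example)
```

The behavior digit is returned undecoded because the text does not enumerate its values.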

Pathology review

The primary study pathologist (MES) reviewed approximately three H&E stained slides per case (all designated as invasive carcinoma in SEER) to independently re-assess behavior (benign, borderline, or malignant), histological type, and grade for all 664 cases retrieved from SEER’s RTR. A set of 19 % of the tumors (128 of 664) was selected for repeat pathology panel review. This set was constructed by taking a random sample of cancers stratified by histological type, with oversampling of rarer types. Sampling fractions for each histological type were serous (10 %, 30 of 298), mucinous (40 %, 30 of 75), endometrioid (20 %, 20 of 97), clear cell (45 %, 28 of 62), and other carcinomas (15 %, 20 of 132). The selected ovarian cancers were reexamined by MES to evaluate intra-pathologist agreement and reviewed by the second pathologist (OBI) to assess inter-pathologist agreement between MES and OBI. The pathologists had access to the gross pathologic descriptions but were masked to SEER’s recorded behavior and AJCC stage since histological grade is meant to provide a microscopic assessment of ovarian cancer prognosis that is independent of stage [18].
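The stratified sampling with oversampling of rarer types can be sketched as follows. This is not the procedure actually used to draw the panel sample; the case identifiers, the seed, and the function name are hypothetical, and only the stratum sizes and sample counts come from the text.

```python
import random

# (stratum size, number sampled) per histological type, as reported in the text.
STRATA = {
    "serous": (298, 30),
    "mucinous": (75, 30),
    "endometrioid": (97, 20),
    "clear cell": (62, 28),
    "other": (132, 20),
}


def draw_panel_sample(strata, seed=2013):
    """Draw a simple random sample without replacement within each stratum.

    Case identifiers are synthetic placeholders; the seed is arbitrary and
    only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    sample = {}
    for hist_type, (stratum_size, n_sampled) in strata.items():
        case_ids = [f"{hist_type}-{i:03d}" for i in range(1, stratum_size + 1)]
        sample[hist_type] = sorted(rng.sample(case_ids, n_sampled))
    return sample


panel = draw_panel_sample(STRATA)
total_cases = sum(size for size, _ in STRATA.values())
total_sampled = sum(len(ids) for ids in panel.values())
print(f"sampled {total_sampled} of {total_cases} cases")
```

Because the sampling fractions differ by stratum, any estimate pooled across histological types would need to account for the unequal selection probabilities.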

For a complete description of the three ovarian cancer grading systems, see supplemental table 1: (1) the International Federation of Gynecology and Obstetrics (FIGO) system [8, 9], (2) the Shimizu and Silverberg (SS) system [9–11], and (3) the MD Anderson (MDACC) system [14]. In brief, the FIGO/SS grading schemes are three-tier systems that assign all histological types to “low,” “intermediate,” or “high” grade. The MDACC grading system is a two-tier system that assigns serous types to “low” or “high” grade.

Statistical analysis

We assessed the representativeness of the SEER RTR ovarian tumors with chi square tests for heterogeneity, comparing demographic and tumor characteristics for the recovered ovarian carcinomas in SEER’s RTR with the ovarian carcinomas in the corresponding Hawaii and Iowa Tumor Registries. To compare the three-tier FIGO/SS grades to SEER grades 1 through 4, we reclassified SEER grade 1 as low, SEER grade 2 as intermediate, and SEER grades 3–4 as high (Table 1A). To compare the two-tier MDACC low and high grades with SEER grades 1–4, we dichotomized SEER grade 1 as low and SEER grades 2–4 as high (Table 1B). Finally, the three-tier FIGO/SS schemes were further collapsed to low and high (intermediate + high) grades for survival analyses.
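The reclassification rules above amount to two lookup tables plus one collapsing step, sketched here. The names are hypothetical, and mapping any unrecorded grade to "unknown" is an assumption for illustration.

```python
# SEER grades 1-4 mapped to the three-tier FIGO/SS categories (Table 1A).
SEER_TO_THREE_TIER = {1: "low", 2: "intermediate", 3: "high", 4: "high"}

# SEER grades 1-4 mapped to the two-tier MDACC categories (Table 1B).
SEER_TO_TWO_TIER = {1: "low", 2: "high", 3: "high", 4: "high"}


def reclassify(seer_grade, mapping):
    """Return the reclassified grade; anything unmapped is treated as unknown."""
    return mapping.get(seer_grade, "unknown")


def collapse_for_survival(three_tier_grade):
    """Collapse a three-tier grade to low versus high (intermediate + high)."""
    if three_tier_grade == "low":
        return "low"
    if three_tier_grade in ("intermediate", "high"):
        return "high"
    return "unknown"


for grade in (1, 2, 3, 4, None):
    print(grade, reclassify(grade, SEER_TO_THREE_TIER), reclassify(grade, SEER_TO_TWO_TIER))
```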

Table 1 Graphical representation between the reclassified categories of SEER’s 4-tier grading system to the 3-tier and 2-tier grading schemes

Agreement was assessed as percent observer agreement (p_o) and Cohen’s standard kappa coefficients (κ) [20]. Kappa coefficients ranged from 0.00 to 1.00 and were interpreted descriptively as poor κ < 0.20, fair κ = 0.20–0.40, moderate κ = 0.40–0.60, good κ = 0.60–0.80, and very good κ = 0.80–1.00. The Kaplan–Meier estimator [21] was used to calculate ovarian cancer-specific survival by low or high grade for all AJCC stages combined and then by early stage (AJCC I + II) or late stage (AJCC III + IV). The log-rank test was used to assess survival differences by low and high grade [22].
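For readers unfamiliar with these two measures, the following minimal sketch computes percent observer agreement and Cohen's unweighted kappa from a square contingency table and applies the descriptive labels above. The 2 × 2 table is a made-up toy example, not study data, and assigning each shared boundary value (0.20, 0.40, and so on) to the higher category is an assumption, since the stated ranges overlap at their endpoints.

```python
def cohen_kappa(table):
    """Return (p_o, kappa) for a square table of counts (rows: rater A, cols: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the marginals.
    row_props = [sum(table[i]) / n for i in range(k)]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_props, col_props))
    return p_o, (p_o - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Descriptive label; boundary values go to the higher category (assumption)."""
    if kappa < 0.20:
        return "poor"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "good"
    return "very good"


# Hypothetical toy table: two raters, two grades, 50 cases.
toy_table = [[20, 5],
             [10, 15]]
p_o, kappa = cohen_kappa(toy_table)
print(f"p_o = {p_o:.2f}, kappa = {kappa:.2f}")
```

In this toy table 35 of 50 cases fall on the diagonal, so p_o is 0.70, while the marginals imply a chance agreement of 0.50.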

Results

Descriptive statistics

The 664 ovarian carcinomas in SEER’s RTR are shown in Table 2. Approximately three-quarters of the tumors were contributed by the Hawaii RTR (77 %, 516 of 664). Mean age at diagnosis was 59.6 years. Serous carcinomas accounted for 45 % of the ovarian tumors (298 of 664), 59 % were late stage and 40 % were high grade. Clear cell carcinomas were more common among APIs (13 %, 51 of 379) than among Whites (4 %, 11 of 282), p < 0.01. Women with serous carcinomas were diagnosed at older age, later stage, and higher grade than women with other histological types (p < 0.05). Compared to the 664 tumors in the Hawaii and Iowa RTRs, the ovarian cancers (5347) reported to the full Hawaii and Iowa Tumor Registries demonstrated a higher percentage of White women, slightly older ages at diagnosis, a lower proportion of serous tumors, and lower stage at diagnosis.

Table 2 Distribution of demographic and tumor characteristics for all 664 ovarian tumors in the Hawaii (1983–2004) and Iowa (1987–2003) Surveillance, Epidemiology and End Results Residual Tissue Repositories (SEER RTR)

Pathology review

MES classified SEER’s 664 invasive ovarian tumors as primary invasive ovarian carcinoma (n = 586), benign (n = 3), borderline (n = 45), and other (n = 30). The other category included ovarian cancers diagnosed at distant metastatic sites (i.e., primary carcinoma in the ovary was unavailable for microscopic examination), non-epithelial ovarian cancers, and non-ovarian carcinomas that were metastatic to the ovary. Grade agreement between the pathologist and SEER was similar for the FIGO (Table 3) and the SS systems (Table 4). Percent agreement with FIGO ranged from 24 % for clear cell carcinoma to 57 % for serous carcinoma with poor to fair kappa coefficients ranging from 0.00 to 0.29 (Table 3A).

Table 3 Percent agreement and standard Kappa coefficients for MES versus SEER using FIGO three-tier grading schemes for: (A) all 664 ovarian tumors and (B) restricted to tumors with assigned grade (low, intermediate, or high) and also classified as invasive by the study pathologist
Table 4 Percent agreement and standard Kappa coefficients for MES versus SEER using SS three-tier grading schemes for: (A) all 664 ovarian tumors and (B) restricted to tumors with assigned grade (low, intermediate, or high) and also classified as invasive by the study pathologist

Bar graphs along with an inserted contingency table are used in Fig. 1 to supplement FIGO grade agreement between MES and SEER (Table 3A). Percent observer agreement was p_o = 49 % between MES and SEER with 327 of 664 tumors in the diagonal of the contingency table (Fig. 1). MES tended to grade lower than SEER. For example, MES moved 8 % of SEER-assigned high grade to MES low grade cancers (30 of 362) but did not move any SEER-assigned low grade to MES high grade tumors (0 of 362), Fig. 1. Consequently, p_o rose from low to high grade: p_o = 23 % for low grade (26 of 112), 37 % for intermediate grade (57 of 153), and 77 % for high grade (219 of 284). Tumor grade was unknown for 177 of the ovarian carcinomas, either because grade was not recorded by SEER or because MES could not classify the tumor grade owing to insufficient microscopic tissue (87 for SEER and 115 for MES, with 25 unknown for both). Grade agreement improved when restricted to those tumors with known grade (low, intermediate, or high) and also labeled as invasive by MES (Table 3B), that is, p_o = 62 % and fair kappa coefficient = 0.32 (95 % CI: 0.26–0.39). Similar improvement was observed for SS grade (Table 4A compared to Table 4B). Grade agreements for the three-tier FIGO/SS systems did not improve substantively even when there was histological type agreement (serous, mucinous, endometrioid, or clear cell) between MES and SEER.
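The arithmetic behind these percentages can be checked directly from the counts quoted in the text and in the Fig. 1 caption. The variable names below are illustrative; the counts are those reported above.

```python
N_TUMORS = 664

# Tumors on the diagonal of the Fig. 1 contingency table (MES and SEER agree).
agreed = {"low": 26, "intermediate": 57, "high": 219, "unknown": 25}

# Number of tumors assigned to each grade by MES (denominators in Fig. 1).
mes_totals = {"low": 112, "intermediate": 153, "high": 284}

# Overall percent observer agreement: diagonal count over all tumors.
diagonal_count = sum(agreed.values())
overall_p_o = diagonal_count / N_TUMORS

# Grade-specific agreement: agreed count over the MES total for that grade.
per_grade_p_o = {g: agreed[g] / mes_totals[g] for g in mes_totals}

# Unknown by either source: SEER unknown + MES unknown - unknown for both.
unknown_either = 87 + 115 - 25

print(f"overall: {diagonal_count}/{N_TUMORS} = {overall_p_o:.1%}")
for grade, value in per_grade_p_o.items():
    print(f"{grade}: {value:.1%}")
print(f"unknown by either source: {unknown_either}")
```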

Fig. 1

Contingency table for grade agreement between the study pathologist (MES) and SEER for all 664 ovarian tumors in the Hawaii and Iowa RTRs. The crosstab or contingency table (insert) shows percent observer agreement between MES and SEER in the diagonals with disagreements in the off diagonals. Bold fonts in the bar graph show greater percent agreement between MES and SEER for high grade (77 %, 219 of 284) than low grade (23 %, 26 of 112) or intermediate grade (37 %, 57 of 153). Grading was unknown and/or missing for 177 ovarian tumors

Percent agreement, but not the kappa coefficient, was generally better with the two-tier MDACC than with the three-tier FIGO system (Table 5). For example, overall agreement between the study pathologist and SEER grade with the MDACC system was p_o = 64 % with a poor kappa coefficient = 0.10 (95 % CI: 0.01–0.19), Table 5A, and improved when restricted to cases that were classified as invasive by the study pathologist (p_o = 95 %), Table 5B.

Table 5 Percent agreement and standard Kappa coefficients for MES versus SEER using MDACC two-tier grading schemes for: (A) all 298 ovarian serous carcinomas recorded in SEER and (B) restricted to serous tumors with assigned grade (low or high) and also classified as invasive by the primary study pathologist

The randomly selected ovarian carcinomas (19 %, 128 of 664) were reviewed a second time by MES for intra-pathologist agreement and reviewed by OBI for inter-pathologist agreement between MES and OBI. Inter-pathologist agreement was similar to the agreement between MES and SEER, p_o = 43 % and fair kappa = 0.25 (95 % CI: 0.13 to 0.35). Intra-observer agreement for the first and second review by MES yielded p_o = 66 % and moderate κ = 0.52 (95 % CI: 0.41 to 0.64).

Ovarian cancer-specific survival

With 20 years of follow-up (Fig. 2), ovarian cancer-specific survival for the 586 confirmed invasive ovarian carcinomas was better among low grade than high grade tumors for MES (Fig. 2a, log-rank test p < 0.001) and for SEER (Fig. 2d, log-rank test p < 0.001). Long-term ovarian cancer survival was worse for MES low grade than for SEER low grade, for example, cumulative cancer-specific survival after 15 years of follow-up for MES low grade was 64 % (95 % CI: 55–75 %) and for SEER low grade was 90 % (95 % CI: 80–100 %). On the other hand, short-term cancer survival was similar for MES high grade and SEER high grade, for example, cancer-specific survival after 5 years of follow-up for MES high grade was 48 % (95 % CI: 44–54 %) and for SEER high grade was 51 % (95 % CI: 47–55 %). Re-categorizing SEER low grade to include SEER grade 1 + grade 2 and SEER high grade to include SEER grade 3 + grade 4 did not substantively alter the survival analysis (graphs available upon request).
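The cumulative survival percentages reported here are Kaplan–Meier estimates. As a reminder of how the product-limit estimate is built, a minimal sketch follows. The follow-up times and event indicators are invented toy values, not study data, and the function name is hypothetical; confidence limits and the log-rank test are omitted.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: 1 if the subject died of the disease at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: five subjects, years of follow-up, 1 = death, 0 = censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

In the toy data, censored subjects reduce the number at risk without lowering the survival estimate, which is why the estimate differs from a crude proportion of survivors.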

Fig. 2

Ovarian cancer-specific survival with 95 % confidence limits by low and high grades for the study pathologist (MES) and SEER: all AJCC stages combined (a, d), early AJCC stages (b, e), and late AJCC stages (c, f)

When stratified by AJCC early and late stage, grade was not a robust independent predictor of cancer-specific survival (Fig. 2b, c, e, f). For those tumors that MES reclassified as non-invasive, that is, benign (n = 3) or borderline (n = 45), ovarian cancer-specific survival was 90 %, with only 5 of 48 recorded ovarian cancer deaths during follow-up.

Discussion

Our study yielded four main findings regarding ovarian carcinoma grade agreement between SEER and two independent gynecological pathologists. First, similar to other clinical studies [23], grade agreement was only fair irrespective of grading system and histological type. For example, Gilks et al. [23] reported inter-observer kappa coefficients of 0.26 and 0.40 for FIGO and SS grading systems, respectively. Second, agreement improved when restricted to tumors with known grade (low, intermediate, or high) and also classified as invasive by the study pathologist. Third, agreement was better for high grade than for low grade tumors, for two-tier than for three-tier grading systems, and for intra- than for inter-pathologist comparisons. Finally, tumor grade was not a strong independent prognostic factor apart from AJCC stage.

Several factors may have affected the generalizability of our results but not the internal validity of agreement for grade. The 664 ovarian tumors from SEER’s RTR represented only 38 and 4 % of the ovarian tumors in the Hawaii and Iowa Tumor Registries, respectively. More than 75 % of the data were contributed by the Hawaii Tumor Registry, enriching the study with APIs and clear cell carcinomas, a histological type that is more common among Japanese than White women [24, 25]. There were differences between patient characteristics in the RTR and the complete Hawaii and Iowa Tumor Registries, but grade agreement did not differ by any of these factors.

Percent observer agreement was generally higher than kappa coefficients, reflecting two limitations of the kappa statistic [26, 27]. First, though the kappa statistic attempts to measure the amount of non-random agreement [28], one limitation occurs when the categories for a given variable are not equally distributed [27]. Given that high grade is proportionately more common than low or intermediate grade, agreement on high grade would be more likely by chance alone. The second limitation arises with imbalance of the row and column totals of a contingency table (e.g., Fig. 1 insert) [27]. As shown in Fig. 1, the marginal totals are unequal: MES low, intermediate, high, and unknown grade accounted for 16.9, 23.0, 42.8, and 17.3 % of tumors, compared to the corresponding totals for SEER grade of 7.5, 24.8, 54.5, and 13.1 %.
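The effect of unequal marginals on kappa can be illustrated with the Fig. 1 totals quoted above. This sketch treats "unknown" as a fourth category and uses the overall 327 of 664 agreement; it is an illustration of the mechanism, under those assumptions, and is not intended to reproduce the coefficients in Table 3. The comparison against four equally likely categories is a hypothetical reference point.

```python
# Marginal proportions from Fig. 1: low, intermediate, high, unknown.
mes_marginals = [0.169, 0.230, 0.428, 0.173]
seer_marginals = [0.075, 0.248, 0.545, 0.131]

# Observed agreement on the diagonal of the Fig. 1 contingency table.
p_o = 327 / 664


def kappa_from(p_observed, p_expected):
    """Cohen's kappa given observed and chance-expected agreement."""
    return (p_observed - p_expected) / (1 - p_expected)


# Chance agreement implied by the actual (unequal) marginals.
p_e_actual = sum(m * s for m, s in zip(mes_marginals, seer_marginals))

# Hypothetical reference: four equally likely categories for both raters.
p_e_uniform = 4 * (0.25 * 0.25)

kappa_actual = kappa_from(p_o, p_e_actual)
kappa_uniform = kappa_from(p_o, p_e_uniform)

print(f"p_o = {p_o:.3f}")
print(f"chance agreement, actual marginals:  {p_e_actual:.3f} -> kappa {kappa_actual:.2f}")
print(f"chance agreement, uniform marginals: {p_e_uniform:.3f} -> kappa {kappa_uniform:.2f}")
```

Most of the chance agreement under the actual marginals comes from the high grade category, which is why the same observed agreement yields a lower kappa than it would with evenly distributed grades.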

Of note, percent observer agreement increased from low to high grade, possibly reflecting the fact that SEER grade comes from community-based pathologists with more clinical information than was available to the study pathologists, for example, AJCC stage. Even when conditioned upon early and late stage, we observed better agreement between MES and SEER for high than low grade (Table 3A). More specifically, for early stage cancers, percent observer agreement was 29 % for low grade and 71 % for high grade. For late-stage cancers, percent observer agreement was 11 % for low grade and 75 % for high grade. The knowledge of stage along with a heightened awareness of poor outcomes for advanced stage ovarian carcinomas may have influenced SEER’s pathologists to avoid classifying late-stage tumors as low grade. If true, this practice would tend to yield a more conservatively defined group of low grade carcinomas because of their association with early stage disease. Figure 2 supports this conjecture since cancer-specific survival was better for SEER early stage than for MES early stage, whereas survival was similar for SEER and MES late stage. Indeed, prior reports from individual pathology laboratories have found lower survival for low grade tumors than reported in SEER [29, 30]. Admixing benign and borderline tumors with low grade carcinomas [31] also would tend to improve prognosis for low grade. In fact, the 90 % cancer-specific survival for the reclassified benign and borderline tumors is clearly better than would otherwise be expected for typical invasive ovarian cancers.

Percent observer agreement (p_o) was generally better for the two-tier MDACC than for the three-tier grading schemes. Though we cannot exclude better agreement by chance alone, improvement with the two-tier scheme might reflect the dualistic nature of ovarian cancer. Contemporary clinicopathologic and molecular models implicate two main carcinogenic pathways by type I (low grade) or type II (high grade) [1–7]. Type I low grade cancers are believed to arise through a stepwise sequence from adenoma to borderline tumor to invasive cancer and are associated with oncogenic mutations that impact cell proliferation (KRAS and/or BRAF) [3, 32]. Type II high grade tumors constitute the majority of invasive ovarian cancers in the general population and typically show molecular changes that are associated with genetic instability [10, 11].

In sum, grade agreement was fair to moderate between SEER and two independent gynecological pathologists. Agreement improved with higher grade, with a two-tier grading scheme for serous tumors, and when restricted to tumors that were re-classified as invasive by the study pathologist. Additionally, grade was not a robust independent predictor of survival. Consequently, though molecular studies and individual clinical outcomes differ by grade, recorded grade in SEER should be used with caution and may not be a reliable metric for population-based cancer epidemiology. Nonetheless, given the compelling molecular evidence for type I and II ovarian cancers, the results of this study suggest that epidemiologists may need to supplement the microscopic assessment of grade for ovarian cancer with additional biological information such as protein and/or gene expression profiles similar to “genomic grade” for breast cancer [33, 34].