Background

Endometrial cancer (EC) is the most common female genital carcinoma and the fourth most common of all tumors in the Western world.1 Several prognostic factors for overall survival (OS) in EC patients have been identified, including International Federation of Gynecology and Obstetrics (FIGO) stage, histologic type and grade, lymphovascular space infiltration, myometrial invasion, and age.2,3 The combination of these factors on an individual level renders risk-adapted treatment decisions complex.4,5 Predictive models (e.g. algorithms, nomograms, risk-scoring systems) may be helpful instruments because they combine independent prognostic factors in one model.6,7 Theoretically, over- or undertreatment due to false risk estimation should then be less likely.8

To date, the most widely used classification system in EC patients is the FIGO staging system.9 Most EC patients are staged surgically, as approximately 75% are diagnosed at an early stage; however, some surgical strategies are still under debate (e.g. indications for lymphadenectomy).2 A shortcoming of the anatomical FIGO staging is that it does not consider continuous variables (e.g. mitotic index, grade, number of positive lymph nodes). Therefore, FIGO stage may not be directly related to the actual tumor biology and may not provide an adequate estimate of the course of the disease.4

Nomograms are predictive tools that address this issue and are increasingly used in oncology, either for patient stratification according to risk groups or for patient counseling. In a graphical display, nomograms incorporate multiple prognostic factors, aiming to simplify the individual prediction of an event.8 In general, nomograms discriminate patients with a future event from those without, and the performance of a nomogram is initially tested in an internal validation that reports discrimination and calibration values.8 Discrimination evaluates whether the model is able to distinguish patients with the event from those without and is generally expressed as area under the curve (AUC) values.10 For survival data, concordance probability estimates (CPEs) or concordance indexes (c-indexes) describe the ability of a prediction model to rank observed survival times according to predicted survival probabilities. Calibration describes how closely predicted outcomes agree with actual outcomes. All nomograms require thorough internal and external validation before entering clinical practice.8
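To make the idea of a concordance index concrete, the following minimal Python sketch computes Harrell's c-index for right-censored survival data. It is an illustration only: the function name, argument names, and the simplified handling of tied follow-up times are assumptions, and it is not the restricted-horizon CPE estimator described in the Methods of this study.

```python
from itertools import combinations


def harrell_c_index(times, events, predicted_survival):
    """Harrell's c-index for right-censored data (illustrative sketch).

    times: follow-up in months
    events: 1 = death observed, 0 = censored
    predicted_survival: predicted survival probability (higher = better prognosis)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient i has the shorter follow-up.
        if times[j] < times[i]:
            i, j = j, i
        # Assumption: pairs with tied follow-up times are skipped.
        # A pair is usable only if the patient with shorter follow-up died.
        if times[i] == times[j] or not events[i]:
            continue
        usable += 1
        if predicted_survival[i] < predicted_survival[j]:
            concordant += 1.0  # earlier death had the lower predicted survival
        elif predicted_survival[i] == predicted_survival[j]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable patient pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by the predicted probabilities.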

The EC nomogram of Abu-Rustum et al. predicts the 1-, 3-, and 5-year OS for EC patients and is based on 1735 patients who received treatment at the Memorial Sloan Kettering Cancer Center (MSKCC) in New York City.11 The model incorporates the variables of age, lymph node status, 1988 FIGO stage, final grade, and histological subtype. The initial internal validation showed a c-index of 0.75 (± 0.01), indicating satisfactory discrimination between women who die and those who survive.

To date, only two external validation studies on the MSKCC nomogram have been published. Koskas et al. used the Surveillance, Epidemiology, and End Results (SEER) database of 64,023 EC patients for the analysis and described a c-index of 0.81 (± 0.004) to predict 3-year OS.12 Polterauer et al.13 validated the nomogram in a multicenter Austrian cohort of 765 patients and described a c-index of 0.71 (95% confidence interval [CI] 0.68–0.74).

The aim of this study was to provide a third external validation of the MSKCC nomogram and to compare the discrimination and calibration values with the FIGO classification system of 2009.

Methods

Patient Population

Data from medical records and surgery and pathology reports of 609 patients diagnosed with EC between 1991 and 2011 were reviewed and entered into a local database of 68 covariates. Surgical treatment was either laparoscopic or open and followed the international recommendations for surgical EC treatment. If indicated, adjuvant treatment was administered following the recommendation of the Multidisciplinary Tumor Board, which discusses every postoperative gynecological oncology case in our department. Follow-up examinations were held at the department’s outpatient clinics and the follow-up data were recorded in the Comprehensive Cancer Center Freiburg (CCCF) database. Once follow-up was completed, patients were contacted by the CCCF biannually to enquire about their health status. The last follow-up patient contact for this study was in January 2016. Patients were included in the study if they received primary surgical treatment at our department and if all nomogram parameters were available. Patients with uterine sarcomas or incomplete data, or who did not receive primary surgical treatment at the Department of Obstetrics and Gynecology in Freiburg (UFK), were excluded from the analysis. The Institutional Review Board of the University Clinics of Freiburg approved this study.

Descriptive and Inferential Analysis

Overall, 454 EC patients (322 type I and 132 type II) met the inclusion criteria and their data were used for the regression analyses and 1-, 3-, and 5-year MSKCC OS calculations. Type I was defined as endometrioid cancers grade 1 and 2, and type II was defined as endometrioid grade 3 as well as serous and clear cell carcinomas. The primary outcome was OS, calculated from the time of surgery until death or last contact. OS was visualized using Kaplan–Meier curves. In the electronic supplementary material we provide the results of a multivariate analysis to present the independent risk factors for OS in our cohort. The predicted OS was calculated using the online MSKCC nomogram calculator14 and compared with the actual CCCF survival data. SAS Studio 3.2, University Edition (SAS Institute, Cary, NC, USA), was used for the descriptive and inferential data analysis. P-values below 0.05 were considered statistically significant.
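The Kaplan–Meier estimate underlying such OS curves can be sketched in a few lines of Python. This is a hedged illustration of the product-limit estimator only; the function names and the toy data in the usage note are hypothetical, and the study itself used SAS for these analyses.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function (illustrative sketch).

    times: follow-up from surgery in months
    events: 1 = death observed, 0 = censored at last contact
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Read the estimated survival probability at time t off the step curve."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s
```

For example, `survival_at(kaplan_meier(times, events), 60)` would give an estimated 5-year OS rate when `times` is measured in months.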

Nomogram Validation

The nomogram’s performance was assessed in a discrimination and calibration analysis. For the discrimination analysis, the CPE was used to test the nomogram’s predictive power for the individual survival probabilities. The CPE according to Gerds et al. allows a restricted time horizon to be considered and was calculated with R version 3.2.4 (https://www.cran.r-project.org), using the ‘pec’ package.15

Receiver operating characteristic (ROC) models for the nomograms and conventional FIGO staging systems were created. Sensitivity versus 1-specificity across a range of values was compared and AUC values were calculated (Fig. 1).16

Fig. 1

AUC values of the ROC curves comparing the predicted vs. actual OS using (a) the FIGO 2009 staging system, and the (b) 1-, (c) 3-, and (d) 5-year Memorial Sloan Kettering Cancer Center nomogram-predicted OS. AUC area under the curve, ROC receiver operating characteristic, OS overall survival, FIGO International Federation of Gynecology and Obstetrics
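The relationship between an ROC curve and its AUC can be illustrated with a short Python sketch. It assumes a binary outcome (1 = died within the horizon, 0 = alive) and a risk score such as one minus the nomogram-predicted OS; these inputs and the function names are hypothetical and do not reproduce the study's own ROC computation.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs across all observed thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Patients with a score at or above the threshold are called high risk.
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC as the probability that a patient with the event has a higher
    risk score than a patient without it (tied scores count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise formulation makes explicit why an AUC of 0.5 corresponds to chance-level ranking of patients.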

For the calibration analysis, the patient cohort was divided into five groups of approximately equal size according to quintiles of nomogram-based estimated OS probabilities. A calibration plot was generated to visualize how far the predictions were from the actual outcomes, displaying mean nomogram-based predictions in the five groups on the horizontal axis versus actual observed OS probabilities with accompanying 95% CI on the vertical axis. The diagonal line represents a perfect calibration (Fig. 2).

Fig. 2

Calibration plot of the predicted (online MSKCC nomogram calculator) and observed 3-year OS for five subgroups created based on the predicted risk. The diagonal line represents the ideal calibration, and the vertical lines represent the 95% CI of the five subgroups. MSKCC Memorial Sloan Kettering Cancer Center, OS overall survival, CI confidence interval
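The grouping step behind such a calibration plot can be sketched in Python as follows. The sketch makes two loudly stated simplifications that are not taken from the study: the 3-year survival status is assumed to be known for every patient (no censoring before 3 years), and the 95% CI uses a simple normal approximation. With censored data, the observed proportion in each group would instead come from a Kaplan–Meier estimate. All names are hypothetical.

```python
import math


def calibration_by_quintile(predicted, survived, n_groups=5):
    """Mean predicted vs. observed survival in groups of roughly equal size.

    predicted: nomogram-predicted 3-year OS probability per patient
    survived: 1 if alive at 3 years, 0 if dead (assumed known for everyone)
    Returns one (mean_predicted, observed, ci_low, ci_high) tuple per group.
    """
    # Rank patients by predicted survival, lowest first.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = sum(survived[i] for i in idx) / len(idx)
        # Assumption: normal-approximation 95% CI for a proportion.
        half_width = 1.96 * math.sqrt(observed * (1.0 - observed) / len(idx))
        rows.append((mean_pred,
                     observed,
                     max(0.0, observed - half_width),
                     min(1.0, observed + half_width)))
    return rows
```

Plotting the first element of each tuple on the horizontal axis against the second on the vertical axis gives the five points of the plot; points below the diagonal indicate that observed survival was worse than predicted.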

Results

Patient Population

Overall, 454 patients (322 patients with type I EC and 132 with type II EC) were included and analyzed. Compared with type I EC patients, type II EC patients had more frequent lymph node metastases (pelvic metastases 19.7% vs. 6.21%, para-aortic metastases 4.55% vs. 0.93%; p < 0.0001), distant metastasis (25% vs. 7.45%; p < 0.0001), local cancer recurrence (12.98% vs. 6.83%; p = non-significant), and disease progression (21.97% vs. 9.63%; p < 0.0001). All patient characteristics and surgical and adjuvant treatment modalities stratified for type I and II EC patients are shown in Table 1.

Table 1 Patient characteristics and treatment

Compared with the MSKCC cohort, the median age in our population was higher (65.2 vs. 62.2 years), but the distribution of the three histological types (adeno, serous-papillary, and clear cell EC) was similar (Table 2). Median age at diagnosis in the Austrian cohort was comparable to that in our cohort, but the risk profile of the Austrian cohort was lower (FIGO stage I: Austria 71.8% vs. Freiburg 62.2%; G1: Austria 43.3% vs. Freiburg 26.4%; serous-papillary/clear cell: Austria 8% vs. Freiburg 16.2%).

Table 2 Comparison of the external validation cohorts

Survival Analysis

At the end of surveillance (14 years and 2 months), 211 (46.5%) patients were reported dead. Median progression-free survival (PFS) after surgery for all patients was 94 months (95% CI 92.2–104.8) and median OS was 101 months (95% CI 95.6–107.9). The 5-year OS rate was 69.3%.

Patients with type I endometrial carcinoma showed a mean OS of 109 months (standard deviation [SD] 63.9) and a mean PFS of 105 months (SD 65.5 months). The mean OS of type II EC patients was only 85 months (SD 70.6) and mean PFS was 82 months (SD 72.5). Especially in advanced-stage disease, type II EC patients had a worse OS (FIGO III 50 months [SD 58.0] and FIGO IV 28 months [SD 30.0]) than type I patients (FIGO III 98 months [SD 64.71] and FIGO IV 41 months [SD 27.6]).

Memorial Sloan Kettering Cancer Center Nomogram Validation

The FIGO classification system of 2009 showed an AUC of 0.6 and a CPE of 0.63 in the total patient cohort. In contrast, the external discrimination analysis showed AUC values of 0.79, 0.8, and 0.8 (Fig. 1), and CPE values of 0.8, 0.77, and 0.77 for the MSKCC nomogram prediction of 1-, 3-, and 5-year OS. The discriminatory power of the 3-year MSKCC nomogram prediction based on FIGO stages is visualized in Fig. 3.

Fig. 3

Three-year overall survival prediction of the nomogram by 2009 FIGO stage. FIGO International Federation of Gynecology and Obstetrics

Five subgroups were generated according to the nomogram-predicted 3-year OS. The calibration plot shows how far the predictions were from the actual outcomes (Fig. 2). In general, the actual 3-year OS across all EC patients in our population was worse than predicted by the nomogram, which could be due to the higher risk profile of our cohort.

Discussion

In this German cohort of 454 Caucasian EC patients, the MSKCC nomogram predicted 3-year OS better than the conventional anatomical FIGO staging system of 2009. Overall, the nomogram showed good discrimination and calibration values, and therefore appears to be a useful tool to discriminate between high-, moderate-, and low-risk patients.

Accurate individual prognoses help to avoid oncological under- and overtreatment in EC patients. Clearly, patients desire an accurate estimation of their risk for disease progression and OS.4,5 Historically, FIGO staging has helped to standardize therapeutic management and predict OS, but many other important prognostic factors (e.g. grading, histology type, and lymphovascular space infiltration) significantly impact overall prognosis.4,12 Nomograms as visual predictive tools incorporate many different prognostic factors and weigh them to provide a more realistic individual risk estimation.6 Although nomogram-based decision making is still experimental, the increasing number of development and validation studies indicates the need for better prognostic models in clinical decision making.4,7,8

For EC, nomograms have been developed to predict PFS, OS, the risk of locoregional and distant disease recurrence, and the risk of lymph node metastases.17,18,19,20,21,22,23,24 However, validation studies on these nomograms are unfortunately sparse.25 In 2010, Abu-Rustum et al. developed a nomogram to predict the 1-, 3-, and 5-year OS based on 1735 EC patients treated at the MSKCC. The internal validation was performed using bootstrapping techniques and showed a satisfactory c-index of 0.75 (±0.01).11 Our study is the third external validation of this nomogram and, like both previous studies (the SEER database and Austrian validation studies), shows good discrimination and calibration values.12,13

A comparison of our patient population with the SEER database validation study is difficult since the SEER analysis excluded patients with EC recurrence and included only a few patients with advanced-stage disease.

The risk profile of our cohort was higher than that of both the original publication (median age UFK 65 years vs. MSKCC 62.2 years) and Polterauer et al. (Austria G1 43.3% and adenocarcinoma 92% vs. UFK G1 26.4% and adenocarcinoma 83.7%), which could explain the shorter median survival (UFK 102 months vs. Austria 134 months and MSKCC 154 months) and our poorer 3-year OS shown by the calibration plot. More patients in our study (46.5% vs. approximately 20%) were reported dead by the end of the observation period, likely due to the higher risk profile and the very long observation period in our study (UFK > 14 years vs. 29 months MSKCC and 53 months Austria).

The long observation period in our study is both a strength and a limitation. In general, long observation periods tend to decrease the AUC because they capture more events.8 The AUC of 0.8 in our study may thus be a conservative estimate. However, primary surgical and adjuvant treatment strategies changed over the years of the observation period, and we were unable to provide disease-specific survival data as these are not captured in the CCCF database. In general, discrimination analyses are performed by calculating c-indexes or AUC values. Hosmer et al. defined an AUC value between 0.5 and 0.7 as a ‘poor’ discriminator, an AUC value between 0.7 and 0.8 as ‘acceptable’, and an AUC value > 0.8 as ‘excellent’.26 In this regard, the 2009 FIGO staging system showed ‘poor’ discrimination (0.6), whereas the MSKCC models (0.79–0.8) lay at the upper end of the ‘acceptable’ range.26 The c-index of the Austrian external validation was 0.71, and our cohort showed a better 3-year CPE of 0.77 and AUC values between 0.79 and 0.8. A recent review of 28 institutional nomograms available online showed that only 12 nomograms present an AUC value > 0.75.8

Conclusion

The MSKCC nomogram is applicable in a German EC population and allows better risk assessment than the FIGO staging system. In future, it could be used, for example, for patient stratification in clinical trials.