Introduction

Prostate cancer (PCa) is the most common non-cutaneous malignancy among men in the United States, with an estimated 164,690 new cases and 29,430 cancer-related deaths in 2018 [1]. Visceral metastases occur in approximately 15% of PCa patients [2]. Owing to the widespread use of prostate-specific antigen (PSA) screening and extended prostate biopsy techniques, the detection of PCa has increased substantially. Huggins and Hodges [3] demonstrated the efficacy of androgen deprivation therapy (ADT) for advanced PCa in 1941. Although 80–90% of metastatic prostate cancers (mPCa) respond to initial androgen ablation, most patients eventually develop progressive disease. Unlike patients with localized PCa, for whom the 5-year survival rate approaches 100%, patients with mPCa have a 5-year survival rate of 20–30% [4]. In the US and Europe, newer and more effective agents are now available, such as enzalutamide, abiraterone, sipuleucel-T, and radium-223 [5]. However, treatment for castration-resistant prostate cancer (CRPC) is at a standstill [6]. Therefore, more accurate information on patient characteristics related to survival is needed.

The tumor-node-metastasis (TNM) staging system of the American Joint Committee on Cancer (AJCC) is periodically updated to support cancer management [7], and it performs well at the level of patient populations. Nevertheless, a growing number of studies indicate that other factors, including age, marital status, PSA, Gleason score (GS), and surgical margins, are also associated with the prognosis of patients with PCa [8,9,10,11]. A prognostic tool tailored to PCa patients is therefore needed.

The nomogram is a simple statistical tool used to predict cancer prognosis in clinical practice [12, 13]. Nomograms produce individualized survival estimates by combining clinical variables in a way that is technically feasible and reproducible. They are built by regression analysis and extend beyond the standard TNM anatomical criteria [14]. They have proved accurate and offer visual, quantitative output that is easy for doctors and patients to use [15,16,17,18]. Several prognostic nomograms have been reported for patients with PCa [19, 20]. Nevertheless, no model for PCa patients based on a large cohort has yet been reported. To address this need, we constructed nomograms to predict overall survival (OS) and cancer-specific survival (CSS) based on data from the Surveillance, Epidemiology and End Results (SEER) database.

Materials and methods

Patients

The SEER database is freely available to the public and is updated annually. It routinely collects general patient information, primary tumor characteristics, treatments, survival, and follow-up. It comprises 18 population-based cancer registries, which together cover nearly 25% of the total population of the United States [21]. The data used in this study were updated in November 2016 and released on April 16, 2018. Patients diagnosed between January 2010 and December 2014 were extracted. The inclusion criteria were as follows: age at diagnosis > 18 years; adenocarcinoma of the prostate pathologically confirmed by histology (site code: C61.9; histological code: 8140); and primary prostate adenocarcinoma at all stages, including M1a–c, according to the AJCC Cancer Staging Manual, 7th edition [22]. Patients were excluded if they had a history of previous malignancy. Further exclusion criteria were unknown AJCC stage, unknown biopsy GS, unknown PSA value, and unknown survival months. All included patients were randomly divided into a training set and a validation set at a ratio of 2:1. Because patients are de-identified in the database, approval for this study was waived by the local ethics committee and informed consent was not required.
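The random 2:1 division described above can be illustrated with a minimal sketch. The function name, the fixed seed, and the rounding rule are assumptions made for illustration; the paper does not report how the split was implemented.

```python
import random


def split_train_validation(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly partition patient IDs into a training and a validation list.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Toy example with nine hypothetical patient IDs.
train, validation = split_train_validation(range(9))
print(len(train), len(validation))
```

Every patient ends up in exactly one of the two sets, so the validation set is never used when the model is fitted.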

Variables

We extracted patient and tumor characteristics (age, race, marital status, PSA, biopsy GS, T stage, and bone metastasis) together with follow-up information. Age was categorized subjectively as ≤ 49 years, 50–59 years, 60–69 years, 70–79 years, and ≥ 80 years. PSA was classified as < 20 ng/mL, 20–50 ng/mL, and > 50 ng/mL. Biopsy GS was classified as ≤ 6, 7, 8, 9, and 10. The 7th edition of the AJCC TNM staging system was used; because it was published in 2010, the study was limited to the period from 2010 to 2014. Overall survival (OS) and cancer-specific survival (CSS) were the primary end points. OS was defined as the interval from diagnosis to death from any cause or to last follow-up. CSS was defined as the interval from the date of first diagnosis to the date of death due to mPCa. The SEER 2016 submission contains dates of death through 2014, so the study used a predetermined cutoff date of December 31, 2014.
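The categorization of age, PSA, and biopsy GS can be written down as a short sketch. The function names and string labels are hypothetical, and the code assumes integer ages and integer Gleason scores from 2 to 10; only the cut points come from the text.

```python
def age_group(age_years):
    """Map integer age at diagnosis to the five age bands used in the study."""
    if age_years <= 49:
        return "<=49"
    if age_years <= 59:
        return "50-59"
    if age_years <= 69:
        return "60-69"
    if age_years <= 79:
        return "70-79"
    return ">=80"


def psa_group(psa_ng_ml):
    """Map PSA (ng/mL) to <20, 20-50, or >50."""
    if psa_ng_ml < 20:
        return "<20"
    if psa_ng_ml <= 50:
        return "20-50"
    return ">50"


def gleason_group(gleason_score):
    """Map a biopsy Gleason score (assumed integer 2-10) to <=6, 7, 8, 9, or 10."""
    if not 2 <= gleason_score <= 10:
        raise ValueError("Gleason score must be between 2 and 10")
    return "<=6" if gleason_score <= 6 else str(gleason_score)


print(age_group(67), psa_group(35.0), gleason_group(9))  # prints: 60-69 20-50 9
```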

Statistical analyses

All categorical variables were described as frequencies and percentages and were compared between the training and validation sets with the Chi-squared test. The Kaplan–Meier method and log-rank test were used to analyze each potential prognostic variable. Cox proportional hazards regression was applied to identify significant prognostic factors, reported as hazard ratios (HRs) with 95% confidence intervals (95% CIs). Variables with P values < 0.05 in the univariate analysis were entered into the multivariate analysis. The nomogram was built from the potential risk factors identified by multivariate Cox regression in the training set. Final model selection was performed by backward step-down selection with the Akaike information criterion [23, 24]. The validation set was used to validate the nomogram. The concordance index (C-index) was used to estimate the predictive performance of the nomogram; the larger the C-index, the more accurate the model's predictions [25]. The calibration curves were based on 1000 bootstrap resamples. The 45-degree line in a calibration plot represents a perfect model, against which the actual outcomes were compared. SPSS version 23 (Chicago, IL, USA) and R version 3.5.1 (http://www.r-project.org) were used for all statistical analyses. All P values were two sided, and P < 0.05 was considered statistically significant.
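To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the R code used in the study; the function name and toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter time than patient j. The pair is concordant when the
    model gave patient i the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival months, event flag (1 = death), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.6, 0.7, 0.2]
print(concordance_index(times, events, risk_scores))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.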

Results

Patient characteristics

A total of 6659 eligible patients were included, of whom 4440 were assigned to the training set and 2219 to the validation set. Figure 1 shows the screening process. The largest age group was 60–69 years (34.7%), followed by 70–79 years (28.2%) and ≥ 80 years (17.3%); only 2.4% of patients were ≤ 49 years old. Most patients in both sets were white (73.7%), were married (57.2%), had stage T1–T3 disease (87.5%), and had PSA > 50 ng/mL (59.4%). The most common biopsy GS was 9 (46.7%). The two sets did not show any other major statistical differences for the remaining variables. Table 1 shows the demographic and pathological characteristics of the patients.

Fig. 1 Flow chart for screening eligible patients

Table 1 The demographics and pathological characteristics of included patients

Nomogram construction

To identify prognostic factors for OS, we performed univariate and multivariate analyses on the training set. As shown in Table 2, univariate analysis indicated that age, race, marital status, PSA, biopsy GS, T stage, and bone metastasis were associated with OS. After adjustment in the multivariate analysis, six variables remained independent predictors of OS: age, marital status, PSA, biopsy GS, T stage, and bone metastasis. A nomogram for 1-, 3-, and 5-year OS was therefore established with these independent variables (Fig. 2a). The same six variables were used to establish the CSS nomogram (Table 3, Fig. 2b).
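How a nomogram turns multivariate Cox coefficients into a points scale can be sketched as follows. The log hazard ratios below are hypothetical placeholders, not the fitted values from Tables 2 and 3, and only two of the six variables are shown. The sketch uses the common convention that the variable with the widest effect range spans 0 to 100 points.

```python
# Hypothetical log hazard ratios per category (reference level = 0.0).
# These are NOT the coefficients reported in the paper.
LOG_HR = {
    "psa": {"<20": 0.0, "20-50": 0.2, ">50": 0.4},
    "gleason": {"<=6": 0.0, "7": 0.1, "8": 0.3, "9": 0.6, "10": 1.0},
}


def build_point_table(log_hr):
    """Rescale coefficients so the widest-ranging variable spans 0-100 points."""
    widest = max(max(levels.values()) - min(levels.values())
                 for levels in log_hr.values())
    table = {}
    for variable, levels in log_hr.items():
        lowest = min(levels.values())
        table[variable] = {level: 100.0 * (beta - lowest) / widest
                           for level, beta in levels.items()}
    return table


def total_points(point_table, patient):
    """Sum the points for one patient's category in each variable."""
    return sum(point_table[variable][level]
               for variable, level in patient.items())


points = build_point_table(LOG_HR)
patient = {"psa": ">50", "gleason": "9"}  # hypothetical patient
print(round(total_points(points, patient), 1))
```

In a full nomogram the total points are then read against a survival-probability axis derived from the baseline survival of the fitted Cox model; that step is omitted here because it requires the fitted model itself.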

Table 2 Univariate and multivariate analyses of overall survival in the training set

Fig. 2 Nomograms to predict 1-, 3-, and 5-year OS (a) and CSS (b) for mPCa patients

Table 3 Univariate and multivariate analyses of cancer-specific survival in the training set

Nomogram validation

The nomograms were validated both internally and externally. In the training set, the C-index was 0.735 (95% CI 0.722–0.748) for OS and 0.734 (95% CI 0.721–0.747) for CSS. In the validation set, the C-index was 0.735 (95% CI 0.717–0.753) for OS and 0.742 (95% CI 0.723–0.761) for CSS. The calibration plots for the probability of OS and CSS showed no apparent departure from the ideal line, indicating good agreement between nomogram predictions and observed outcomes in both the training set and the validation set (Figs. 3 and 4).

Fig. 3 Calibration plots of the nomogram for 1-, 3-, and 5-year OS prediction of the training set (a, b, c) and validation set (d, e, f)

Fig. 4 Calibration plots of the nomogram for 1-, 3-, and 5-year CSS prediction of the training set (a, b, c) and validation set (d, e, f)

Discussion

In the current study, we used the population-based SEER database to establish clinical nomograms that predict survival in patients with mPCa. A total of 6659 patients were included. The nomograms predicted 1-, 3-, and 5-year OS and CSS for mPCa, and both internal and external validation indicated favorable calibration and discrimination. The nomograms highlighted the clinical significance of age, marital status, PSA, biopsy GS, T stage, and bone metastasis in mPCa patients. The proposed nomograms are therefore easy-to-use clinical tools that can support personalized treatment and patient counseling.

Nomograms are widely used to predict survival outcomes, cancer risk, and treatment outcomes in individual patients [12, 14]. They address the complexity of balancing different variables through statistical modeling and risk quantification. Their systematic approach also limits the influence of individual physician bias and of isolated abnormal clinical variables. A growing body of research has shown that nomograms outperform traditional staging systems in various types of cancer, highlighting their use as new standards or alternatives [26,27,28]. Additionally, nomograms can assist clinicians in handling complex situations that lack standard guidelines [29, 30]. They enable individualized risk stratification and help clinicians identify suitable patients for optimal management strategies.

GS was the most powerful factor for predicting survival in mPCa. Previous studies also indicated that GS plays an important role in the prognosis of localized PCa and mPCa [31,32,33]. Rusthoven et al. [34] performed a retrospective analysis of 4654 mPCa patients and found that survival differences for GS 7 vs. 8, 8 vs. 9, and 9 vs. 10 were highly significant in both univariate and multivariate analyses. Age has been identified as a predictor of survival for patients with solid tumors [35, 36], and several studies have reported that it plays a paradoxical role in the prognosis of PCa [37, 38]. Guo et al. [38] evaluated the effect of age on the prognosis of PCa based on SEER data. They divided patients into a younger group (≤ 70 years old), a middle-aged group (30.9%, 70–82 years old), and an elderly group (> 82 years old), and found that patients in the younger group had a better prognosis than those in the middle-aged and elderly groups. In the present study, the 6659 patients were divided into five age groups, and prognosis worsened with increasing age, which is consistent with previous findings. Marital status has been found to be an independent prognostic factor in multiple cancers [39, 40]. Married patients had fewer cancer-specific deaths and were more likely to receive definitive therapy [41]. They were also more likely to be diagnosed at an earlier stage and to receive surgical treatment, and they had a significantly lower risk of death in OS analyses [42]. Our results were consistent with these findings, possibly because married patients tend to receive more social and emotional support. Pretreatment PSA level is widely considered a powerful prognostic factor for PCa, which is not surprising because PSA level is a well-known indicator of the aggressiveness of PCa [43]. Our results were again consistent with this. In addition to the above factors, bone metastasis was also identified as an important prognostic factor.

The present study has several limitations. First, the nomograms were constructed from retrospective data, so the potential risk of selection bias cannot be ruled out. Second, several critical prognostic factors, such as performance status, serum hemoglobin, and lactate dehydrogenase, which have been reported as predictive factors for PCa patients, were unavailable in the SEER database. Third, external clinical data from independent sources, which are required to improve the utility of these nomograms, were lacking. These issues will be key areas for future studies.

In conclusion, we established and validated nomograms to predict 1-, 3-, and 5-year OS and CSS in individual patients with mPCa based on a large study cohort. These nomograms might be a useful tool for predicting the prognosis of patients with mPCa.