Introduction

Even in Western countries, where screening programs are widely used, prostate cancer remains one of the most frequent cancers and a leading cause of cancer death in men (Siegel et al. 2012). With the extension of life expectancy and the development of the prostate-specific antigen (PSA) test, the mortality of prostate cancer is rising. Therefore, the effective prevention and control of prostate cancer has become an important task in protecting men’s health (Gronberg 2003). At present, the treatment of prostate cancer depends on the patient’s clinical stage and prognostic factors. For selected patients with intermediate- and high-risk prostate cancer, radical prostatectomy (RP) or radiotherapy (RT) combined with hormone therapy (HT) is the standard therapy. Several randomized controlled trials (RCTs) have confirmed that prostatectomy or radiotherapy plus HT was better than surgery or radiotherapy alone for patients with intermediate- or high-risk prostate cancer (Roach et al. 2008; Pilepich et al. 2005; Denham et al. 2011). However, the optimal duration of hormone treatment remains unclear. Previous meta-analyses (Cuppone et al. 2010; Kumar et al. 2009) suggest that long-term HT may benefit patients, but long-term HT inevitably increases adverse reactions and cost, which makes appropriate clinical decisions difficult (Denham et al. 2012). Therefore, we performed this meta-analysis to evaluate the efficacy and safety of short-term versus long-term HT in prostate cancer, and we appraised the quality of the evidence using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to facilitate clinical decision-making.

Methods

Inclusion criteria

According to the PICOS framework, we defined the following inclusion criteria: (1) Participants (P): All patients diagnosed with prostate cancer by pathology and cytology were included in the systematic review; patients with metastatic prostate cancer were excluded. Nationality was not restricted, and no patient had serious cardiopulmonary disease or other severe underlying disease. (2) Interventions (I) and comparisons (C): short-term HT plus radiotherapy versus long-term HT plus radiotherapy, and short-term HT plus prostatectomy versus long-term HT plus prostatectomy, compared for efficacy and safety in prostate cancer. (3) Outcomes (O): For the comparison of short-term HT plus RT versus long-term HT plus RT, we evaluated overall survival (OS), biochemical failure rate (BF), clinical progression rate (CP), prostate cancer-specific mortality (PCSM) and disease-free survival (DFS); for the comparison of short-term HT plus RP versus long-term HT plus RP, we evaluated positive surgical margin rate (PSMR), prostate volume before RP (PVBR) and PSA level before RP (PSAL); hormone therapy adverse reactions were reviewed descriptively. (4) Study design (S): RCTs. Short-term HT was defined as a duration of not more than 6 months, and long-term HT as a duration of more than 6 months (D’Amico et al. 2007b; Denham et al. 2011).

We excluded the following publications: (1) studies that were not RCTs, for example, non-randomized trials, cohort studies and retrospective studies; (2) studies whose reported information was too incomplete for data extraction; (3) duplicate publications or reports of the same study at different follow-up times, in which case the article with the strictest methodology and the most complete data was chosen; (4) non-original research, such as reviews and letters; (5) studies in which the duration of long-term hormone therapy was ≤6 months.

Eligibility assessment was performed independently in an unblinded standardized manner by 2 reviewers. Disagreements between reviewers were resolved by consensus.

Literature search

We identified articles by searching EMBASE, PubMed, Web of Science and the Cochrane Library (CENTRAL, Issue 10 of 12, Oct 2012) up to October 2012. We used MeSH terms combined with free-text terms in all search strategies, which were adapted to each database (the “Appendix” shows the search strategy for EMBASE). In addition to the electronic search for original papers, we reviewed the references of included RCTs to look for potentially eligible articles. Furthermore, we checked abstracts published at major academic conferences (American Society of Clinical Oncology, European Society for Medical Oncology and American Society for Therapeutic Radiology and Oncology). No language restrictions were applied. We also contacted the corresponding authors to obtain information if the research results were unclear or more information was needed.

Assessing risk of bias of included studies

The methodological quality of included studies was evaluated with the Cochrane Collaboration’s tool for assessing risk of bias in RCTs (5.1.0) (Higgins and Green 2011). The items evaluated were sequence generation, allocation concealment, blinding of participants, personnel and outcome assessors, incomplete outcome data, selective outcome reporting and other sources of bias. For each study, we made a judgment about risk of bias in each of the six domains of the tool. In all cases, an answer of “Yes” indicated a low risk of bias and an answer of “No” indicated a high risk of bias; if insufficient detail was reported about what happened in the study, the judgment was usually “Unclear” risk of bias.

Quality of evidence

The GRADE approach is a method of grading the quality of evidence and the strength of recommendations in health care. It is based on the risk of bias (study limitations), indirectness, the consistency of results across studies, the precision of the overall estimate across studies and other considerations. For each outcome, the quality of the evidence was rated as high, moderate, low or very low using the following definitions: (1) High: further research is very unlikely to change our confidence in the estimate of effect. (2) Moderate: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. (3) Low: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. (4) Very low: we are very uncertain about the estimate (Balshem et al. 2011; Guyatt et al. 2011). The quality of the evidence in the meta-analysis was assessed by two reviewers using GRADEpro 3.6 software. Disagreements between the two reviewers were resolved through discussion with a third author, who made the final decision.

Data extraction

A dedicated data extraction form was used to extract relevant data from the included studies. Data extraction was performed independently by two reviewers. Reviewers were not blinded to authors or journals. Disagreements were resolved by discussion between the two review authors; if no agreement could be reached, a third author decided. The following information was extracted from each article: trial design, patient eligibility, baseline patient characteristics, interventions, duration of follow-up and the number of events for all the outcomes. If the trial results were reported in multiple publications, we extracted the data from the article with the strictest methodology and the most complete data.

Data analysis

All statistical analyses were performed using RevMan 5.1 software. Continuous data were analyzed using the standardized mean difference (SMD) as the effect size and count data using the risk ratio (RR), each with its 95 % confidence interval (CI). The chi-square test and the I² statistic were used to test heterogeneity between studies. If heterogeneity was not present (I² < 50 %, P > 0.10), the fixed effect model was adopted for analysis; otherwise, the random effect model was employed. In the presence of heterogeneity, we explored potential sources from three aspects: clinical, methodological and statistical. We explored heterogeneity through sensitivity analysis and dealt with it by conducting subgroup analysis. In the case of excessive heterogeneity, descriptive analysis rather than meta-analysis was adopted.
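The model-selection rule described above can be illustrated with a minimal Python sketch. It is not the RevMan implementation: it assumes generic inverse-variance pooling of log risk ratios (RevMan offers other weighting methods as well), the DerSimonian-Laird estimator for the random effect model, and a simplified rule that switches models on I² alone (the P > 0.10 criterion is omitted for brevity). The function names and the trial counts in the usage note are hypothetical and are not data from the included studies.

```python
import math

def log_rr(events_a, total_a, events_b, total_b):
    """Log risk ratio and its standard error for one two-arm trial."""
    rr = (events_a / total_a) / (events_b / total_b)
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    return math.log(rr), se

def pool(trials, i2_threshold=50.0):
    """Pool trials given as (events_a, total_a, events_b, total_b) tuples.

    Uses the fixed effect model when I^2 is below the threshold and the
    DerSimonian-Laird random effect model otherwise (simplified rule).
    """
    effects = [log_rr(*t) for t in trials]
    y = [e for e, _ in effects]
    w = [1 / se ** 2 for _, se in effects]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 < i2_threshold:
        est, se, model = fixed, math.sqrt(1 / sum(w)), "fixed"
    else:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)                 # between-study variance
        w_re = [1 / (se_i ** 2 + tau2) for _, se_i in effects]
        est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
        se, model = math.sqrt(1 / sum(w_re)), "random"
    ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
    return {"model": model, "rr": math.exp(est), "ci": ci, "q": q, "i2": i2}
```

For example, `pool([(30, 100, 20, 100), (45, 150, 30, 150), (12, 80, 10, 80)])` pools three hypothetical trials with similar effects under the fixed effect model, whereas two hypothetical trials with opposite effects, such as `[(50, 100, 20, 100), (20, 100, 40, 100)]`, trigger the random effect model.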

Results

Study selection and characteristics of included studies

A total of 3,863 relevant records were retrieved, and 1,418 duplicates were eliminated with the “find duplicates” function of EndNote X6. After reviewing the titles and abstracts of 2,445 articles, we excluded 2,401 articles as irrelevant. The full-text versions of 44 papers were obtained to further determine eligibility. We ruled out another 36 articles: 16 articles that did not meet the inclusion criteria; 5 articles that were not RCTs (Blas et al. 2010; D’Amico et al. 2007a, b; Horwitz et al. 2001; Zapatero et al. 2011a); 14 articles that reported the same study at different follow-up times (Armstrong et al. 2007; Crook et al. 2004, 2007, 2009b; Cuenca and Mazeron 2006; Daly et al. 2012; Gleave et al. 2009; Hanks et al. 2003; Laverdiere et al. 1997; Zapatero et al. 2008, 2009, 2010, 2011b, c); and 1 article on an ongoing study without complete data (Zapatero et al. 2012). Finally, 9 RCTs from 8 articles, with a total of 4,743 patients, were included in the systematic review and meta-analysis: 7 RCTs (4,152 patients in total) compared RT plus short-term HT with RT plus long-term HT (Laverdiere et al. 2004; Horwitz et al. 2008; Armstrong et al. 2011; Crook et al. 2009a; Bolla et al. 2009; Denham et al. 2012); 2 RCTs (591 patients in total) compared RP plus short-term HT with RP plus long-term HT (Gleave et al. 2001; Pu et al. 2007). The literature screening process is shown in Fig. 1. The baseline characteristics of the patients were balanced between the short-term and long-term HT groups (Table 1).
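The counts in the selection process are internally consistent, which a short Python tally makes explicit. The variable names are illustrative only; every number is taken from the paragraph above.

```python
# Study selection tally using the counts reported in the text.
records_identified = 3863
duplicates_removed = 1418
screened = records_identified - duplicates_removed          # titles/abstracts

excluded_on_title_abstract = 2401
full_text_assessed = screened - excluded_on_title_abstract

full_text_exclusions = {
    "did not meet inclusion criteria": 16,
    "not an RCT": 5,
    "same study, different follow-up": 14,
    "ongoing study without complete data": 1,
}
articles_included = full_text_assessed - sum(full_text_exclusions.values())

# Nine RCTs were reported in the eight included articles.
rcts = {"RT plus HT": 7, "RP plus HT": 2}
patients = {"RT plus HT": 4152, "RP plus HT": 591}

print(screened, full_text_assessed, articles_included)
print(sum(rcts.values()), sum(patients.values()))
```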

Fig. 1 Flowchart of the study selection process

Table 1 Characteristics of trials included in systematic review

Quality assessment

This systematic review included 9 RCTs. The baseline characteristics of patients were reported in all trials; 8 studies mentioned “random”; 5 studies reported adequate random sequence generation and allocation concealment; 8 trials described the reasons for incomplete outcome data. None of the trials mentioned whether blinding was adopted; however, this was unlikely to affect the quality assessment results. One study had a small sample size, with only 55 cases eventually entering the evaluation (Fig. 2).

Fig. 2 a Risk of bias graph: review authors’ judgments about each risk of bias item presented as percentages across all included studies; b risk of bias summary: review authors’ judgments about each risk of bias item for each included study

Results of systematic review in the comparison of RT plus short-term HT versus RT plus long-term HT

Overall survival

Four RCTs were included in the meta-analysis of overall survival. There was no significant difference between short-term and long-term HT [RR = 0.95, 95 % CI (0.91, 1.00)]. Heterogeneity between studies was acceptable (I² = 30 %, P = 0.23), so the fixed effect model was applicable. Subgroup analysis indicated that a longer HT duration (more than 2 years) might benefit patients more [RR = 0.93, 95 % CI (0.88, 0.99)] (Fig. 3a).

Fig. 3 Meta-analysis of OS and biochemical failure rate compared short-term HT plus RT versus long-term HT plus RT. a Meta-analysis results of OS; b meta-analysis results of biochemical failure rate

Biochemical failure rate

Six RCTs were included in the meta-analysis, which demonstrated that the rate of biochemical failure was significantly higher, by 34 %, in the short-term HT group than in the long-term HT group [RR = 1.34, 95 % CI (1.03, 1.74)]. Because obvious heterogeneity was observed (I² = 86 %, P < 0.00001), the random effect model was used to analyze the effect size. With the random effect model, the subgroup analysis did not show a significant difference [RR = 1.85, 95 % CI (0.92, 3.69)] (Fig. 3b).

Clinical progression rate

Three RCTs were included in the meta-analysis. The clinical progression rate was significantly higher in the short-term HT group than in the long-term HT group [RR = 1.70, 95 % CI (1.50, 1.93)]; there was no significant heterogeneity (I² = 48 %, P = 0.15), so the fixed effect model was applicable. Subgroup analysis indicated that extending the HT duration (more than 2 years) might widen the pre-existing difference [RR = 1.76, 95 % CI (1.54, 2.00)] (Fig. 4a).

Fig. 4 Meta-analysis of clinical progression rate, prostate cancer-specific mortality and disease-free survival compared short-term HT plus RT versus long-term HT plus RT. a Meta-analysis results of clinical progression rate; b meta-analysis results of prostate cancer-specific mortality; c meta-analysis results of disease-free survival

Prostate cancer-specific mortality

Three RCTs evaluated prostate cancer-specific mortality, which was significantly higher in the short-term HT group than in the long-term HT group [RR = 1.44, 95 % CI (1.16, 1.79)]; there was no significant heterogeneity (I² = 2 %, P = 0.36), so the fixed effect model was applicable. Subgroup analysis showed that more than 2 years of HT significantly decreased prostate cancer-specific mortality [RR = 1.51, 95 % CI (1.20, 1.89)] (Fig. 4b).

Disease-free survival

Two RCTs reported DFS. Because of obvious heterogeneity between the two studies (I² = 91 %, P = 0.0009), the random effect model was adopted to analyze the effect size. There was no significant difference between the short-term HT and long-term HT groups [RR = 0.73, 95 % CI (0.46, 1.13)] (Fig. 4c).
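The heterogeneity statistics reported for the five RT outcomes above can be cross-checked against one another. The sketch below assumes the usual definition I² = (Q − df)/Q with df equal to the number of studies minus one, back-calculates Cochran's Q from each reported I², and converts it to a chi-square P value. Because the reported I² values are rounded, the recovered Q and P are approximate, and the helper names are illustrative only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total

def q_from_i2(i2_percent, n_studies):
    """Back-calculate Cochran's Q from I^2 (valid only when I^2 > 0)."""
    df = n_studies - 1
    return df / (1.0 - i2_percent / 100.0)

# outcome: (number of RCTs, reported I^2 in %, reported P as text)
reported = {
    "OS": (4, 30, "0.23"),
    "BF": (6, 86, "< 0.00001"),
    "CP": (3, 48, "0.15"),
    "PCSM": (3, 2, "0.36"),
    "DFS": (2, 91, "0.0009"),
}

for outcome, (n, i2, p_text) in reported.items():
    q = q_from_i2(i2, n)
    p = chi2_sf(q, n - 1)
    print(f"{outcome}: Q ~ {q:.2f}, P ~ {p:.2g} (reported P = {p_text})")
```

Under these assumptions, the recomputed P values agree with the reported ones to within rounding, which supports the stated study counts and heterogeneity figures. The check cannot be applied when I² = 0 %, because Q is then only known to be no larger than df.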

Adverse events

The adverse events were analyzed descriptively in this systematic review because reporting formats and measurement methods were inconsistent across studies. In the Irish Clinical Oncology Research Group 97-01 study (Armstrong et al. 2011), gastrointestinal (GI) and genitourinary (GU) toxicity (any grade) was found in 50 and 51 % of patients, respectively, with no significant differences between 4 months of HT (Arm 1) and 8 months of HT (Arm 2). The cumulative incidence of Grade 2 or greater GI and GU toxicity was 12 and 16 % in Arm 1 and 12 and 17 % in Arm 2, respectively. Horwitz et al. evaluated a total of 14 HT adverse reactions, including nausea, vomiting, diarrhea, headache, fluid retention, male breast development, skin rashes, infection, increased AST (aspartate aminotransferase), thrombosis, heart disease, hot flushes, impotence and other adverse reactions. All of these adverse reactions were reported in accordance with the RTOG (Radiation Therapy Oncology Group) standard and were divided into five levels. No significant difference in adverse reactions was found between the two groups (P = 0.98) (Horwitz et al. 2008).

Bolla et al. reported that after radiotherapy plus 6 months of androgen blockade, fatigue, hot flushes and sexual problems increased significantly, both statistically (P < 0.001) and clinically. One year after the end of short-term androgen blockade and at 1.5 years of long-term androgen blockade, there were statistically significant differences between the two groups in terms of insomnia (P = 0.006), hot flushes (P < 0.001) and sexual interest and activity (P < 0.001); the differences were clinically relevant only for hot flushes, sexual interest and sexual activity. Overall quality of life did not differ significantly between the two groups (P = 0.37) (Bolla et al. 2009). In the TROG 03.04 study, significant detrimental changes in patient-reported outcome (PRO) scores (P < 0.01) had occurred in both groups by the end of radiotherapy. There were no significant differences in global health status between groups at any time point. At 18 months, the PROs that were significantly worse in the intermediate-term androgen suppression (ITAS) group than in the short-term androgen suppression (STAS) group were hormone-treatment-related symptoms (HTRS): [STAS, 10.20 (8.66–11.75); ITAS, 17.36 (13.63–21.08), P < 0.01]; sexual activity [STAS, 26.38 (23.50–29.27); ITAS, 14.40 (7.44–21.36), P < 0.01]; social function [STAS, 90.31 (87.89–92.73); ITAS, 87.35 (81.52–93.18), P = 0.09]; fatigue [STAS, 17.05 (14.58–19.51); ITAS, 24.52 (18.58–30.46), P < 0.01]; and financial problems [STAS, 3.39 (1.29–5.48); ITAS, 8.97 (3.92–14.02), P < 0.01] (Denham et al. 2012).

Results of systematic review in the comparison of RP plus short-term HT versus RP plus long-term HT

Positive surgical margin rate

Two RCTs evaluated positive surgical margin rate, which was significantly higher in the short-term HT group than in the long-term HT group [RR = 1.81, 95 % CI (1.22, 2.68)]; there was no significant heterogeneity (I² = 0 %, P = 0.58), and the fixed effect model was applicable (Fig. 5a).

Fig. 5 Meta-analysis of positive surgical margin rate, prostate volume before RP and PSA level before RP compared short-term HT plus RP versus long-term HT plus RP. a Meta-analysis results of positive surgical margin rate; b meta-analysis results of prostate volume before RP; c meta-analysis results of PSA level before RP

Prostate volume before RP

Two RCTs evaluated prostate volume before RP. Long-term HT decreased prostate volume before RP significantly more than short-term HT [SMD = 0.27, 95 % CI (0.10, 0.45)]; there was no significant heterogeneity (I² = 0 %, P = 0.44), so we adopted the fixed effect model to analyze the effect size (Fig. 5b).

PSA level before RP

Two RCTs evaluated PSA level before RP. The meta-analysis suggested no significant difference between short-term HT and long-term HT [SMD = 2.17, 95 % CI (−0.75, 5.09)]; because there was significant heterogeneity (I² = 97 %, P < 0.00001), the random effect model was chosen (Fig. 5c).

Adverse events

Gleave et al. reported no fatal adverse events and no differences between the two groups in the severity or causality of adverse events or in the incidence of increased liver enzymes or diarrhea. However, men in the 8-month treatment group had a higher number of newly reported adverse events and a higher proportion of hot flushes than men in the 3-month group, probably because of the longer treatment (Gleave et al. 2001). Pu et al. reported no significant difference in complication rates between the two groups (Pu et al. 2007).

Quality of evidence

There were 8 efficacy outcomes in this meta-analysis. OS and BF were critical outcomes; clinical progression rate (CP), prostate cancer-specific mortality (PCSM), disease-free survival (DFS), positive surgical margin rate (PSMR), prostate volume before RP (PVBR) and PSA level before RP (PSAL) were all important outcomes. The quality of the evidence for each outcome is shown in Table 2.

Table 2 GRADE evidence profile

Discussion

Main findings

This systematic review showed no significant difference in OS or DFS between the RT plus short-term HT and RT plus long-term HT groups. However, in absolute terms OS was 34 fewer per 1,000 (from 61 fewer to 0 more) in the short-term HT plus RT group (Table 2), that is, inferior to long-term HT plus RT. Long-term HT thus showed a trend toward improved overall survival, although the difference did not reach statistical significance. In addition, biochemical failure rate, clinical progression rate and prostate cancer-specific mortality favored RT plus long-term HT, and positive surgical margin rate and prostate volume before RP favored RP plus long-term HT. These results suggest that prostate cancer patients are likely to benefit from long-term HT and further confirm the results of previous meta-analyses (Cuppone et al. 2010; Kumar et al. 2009). It is worth noting that when we included the studies of Horwitz et al. and Bolla et al. (in which long-term HT lasted more than 2 years, clearly longer than in the other studies), more outcomes favored the long-term HT plus RT group; subgroup meta-analysis suggested that extending the HT duration to more than 2 years might widen the pre-existing differences and benefit patients more.
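The absolute effect of 34 fewer per 1,000 (from 61 fewer to 0 more) is consistent with the relative estimate for OS [RR = 0.95, 95 % CI (0.91, 1.00)]. The Python sketch below assumes the common conversion in which the absolute difference equals the comparator risk multiplied by (RR − 1). The comparator risk of 680 per 1,000 is not stated in the text; it is back-calculated here as the value that reproduces the reported figures and should be read as an assumption, and the function name is illustrative.

```python
def absolute_effect_per_1000(rr, baseline_per_1000):
    """Absolute difference in events per 1,000 implied by a risk ratio."""
    return round(baseline_per_1000 * (rr - 1))

# Assumed OS in the long-term HT plus RT group: 680 per 1,000
# (back-calculated, not reported in the text).
BASELINE = 680

point = absolute_effect_per_1000(0.95, BASELINE)   # RR point estimate
lower = absolute_effect_per_1000(0.91, BASELINE)   # lower CI bound of RR
upper = absolute_effect_per_1000(1.00, BASELINE)   # upper CI bound of RR
print(point, lower, upper)  # -34 -61 0
```

Negative values denote fewer survivors per 1,000 patients in the short-term HT plus RT group.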

According to the Cochrane Collaboration’s tool for assessing risk of bias in RCTs, the quality of the included studies was acceptable, with the exception of the study by Pu et al. (Pu et al. 2007). Based on the GRADE system, for the critical outcomes, the quality of evidence for OS was “high” and that for biochemical failure rate (BF) was “moderate”; for the important outcomes, the quality of evidence for clinical progression rate (CP) and prostate cancer-specific mortality (PCSM) was “high,” that for DFS, positive surgical margin rate (PSMR) and prostate volume before RP (PVBR) was “low,” and that for PSA level before RP (PSAL) was “very low.” The evidence for BF was downgraded because of inconsistency between the included studies, which may have been caused by differences in androgen deprivation drugs, dose, mode of administration and follow-up time. The evidence for DFS was downgraded because of inconsistency and imprecision; the evidence for PSMR, PVBR and PSAL was downgraded mainly because of risk of bias. Overall, the quality of evidence for the comparison of short-term HT plus RT versus long-term HT plus RT was acceptable, whereas that for the comparison of short-term HT plus RP versus long-term HT plus RP was low.
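The ratings for the RT outcomes follow the usual GRADE logic, in which evidence from RCTs starts at “high” and is lowered for each domain with serious concerns. The Python sketch below is illustrative only: it assumes that each domain named in the text lowers the rating by exactly one level and takes the second DFS domain to mean GRADE “imprecision”. The RP outcomes are omitted because the text does not state how many levels they were downgraded.

```python
LEVELS = ["high", "moderate", "low", "very low"]

def grade_rating(downgrades, start="high"):
    """Rating after applying per-domain downgrades (levels per domain)."""
    index = LEVELS.index(start) + sum(downgrades.values())
    return LEVELS[min(index, len(LEVELS) - 1)]

# Domains as stated in the text; one level per domain is an assumption.
rt_outcomes = {
    "OS": {},
    "BF": {"inconsistency": 1},
    "CP": {},
    "PCSM": {},
    "DFS": {"inconsistency": 1, "imprecision": 1},
}

for outcome, downgrades in rt_outcomes.items():
    print(outcome, grade_rating(downgrades))
```

Under these assumptions the sketch reproduces the reported ratings for OS, BF, CP, PCSM and DFS.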

Limitations and strengths

Four RCTs compared adverse reactions between the RT plus short-term HT and RT plus long-term HT groups, but the results were inconsistent. Horwitz et al. and Armstrong et al. reported no significant differences between the two groups with regard to adverse reactions (P > 0.05) (Armstrong et al. 2011; Horwitz et al. 2008). In contrast, Bolla et al. reported that the adverse reaction rate was significantly higher in the long-term HT group than in the short-term HT group in terms of insomnia (P = 0.006), hot flushes (P < 0.001) and sexual interest and activity (P < 0.001); however, they also found that overall quality of life did not differ significantly between the two groups (P > 0.05) (Bolla et al. 2009). The TROG 03.04 study suggested that, compared with 6 months of androgen suppression, 18 months of androgen suppression caused additional detrimental changes at the 18-month follow-up in some PRO scores but not in global quality-of-life scores and that, with the exception of HTRS, these differences resolved by 36 months (Denham et al. 2012).

It should be noted that obvious heterogeneity existed among the included studies, and there were several possible sources. First, the duration of short-term HT varied between studies, ranging from 3 to 6 months, as did the duration of long-term HT, which was 8 months, 10 months or more than 2 years. The difference in duration of HT may therefore be a cause of heterogeneity. Second, the type and dose of drug, as well as the route of administration, were not the same in every study, which may also have led to heterogeneity. In addition, the proportions of patients at different clinical stages differed between studies, which might be an important source of heterogeneity. Last but not least, follow-up time differed between studies.

In the comparison of RT plus short-term HT versus RT plus long-term HT, we evaluated only the difference in duration of HT; possible differences between neoadjuvant, concurrent and adjuvant HT were not accurately evaluated. Nevertheless, the meta-analysis was applicable, because Roach et al. found no significant difference between neoadjuvant and adjuvant HT in the RTOG 9413 study (Roach et al. 2003), the first and only head-to-head study comparing neoadjuvant and adjuvant HT in prostate cancer. In addition, the long-term follow-up results of two contemporary studies, RTOG 8531 and RTOG 8610, showed similar OS with neoadjuvant and adjuvant HT for locally advanced prostate cancer (Pilepich et al. 2005; Roach et al. 2008). Although the meta-analysis was applicable, clinical heterogeneity was inevitable. In the comparison of RP plus short-term HT versus RP plus long-term HT, both short-term and long-term HT were given as neoadjuvant treatment (Gleave et al. 2001; Pu et al. 2007).

In addition, although the quality of most included studies was acceptable, some outcomes, such as positive surgical margin rate (PSMR), prostate volume before RP (PVBR) and PSA level before RP (PSAL), were reported only in studies with small samples. All the studies included in the meta-analysis were from Europe, the United States and Australia rather than Asia. Therefore, whether the results apply to Asian patients needs to be confirmed by further research. Two RCTs compared RP plus short-term HT with RP plus long-term HT; because their follow-up time was short, few outcomes were reported, so we could not analyze long-term follow-up results such as OS, DFS and BF. None of the studies reported the economic burden of HT, so it is difficult to evaluate exactly the economic cost of short-term and long-term HT. Nevertheless, long-term HT will certainly increase the medical burden on patients and society; moreover, long-term treatment may delay the detection of progression or tumor symptoms. These problems must be considered when making clinical decisions.

Conclusion

Taking into account the current data available in the literature, prostate cancer patients are likely to benefit from long-term HT plus RT or RP. However, RT plus long-term HT did not clearly decrease overall mortality across all patients, although subgroup analysis suggested that more than 2 years of HT plus RT might be more beneficial in prostate cancer; RP plus long-term HT decreased PSMR and PVBR, but long-term follow-up results were lacking. Furthermore, adverse reactions are inevitable with longer-term HT, health care spending increases with long-term HT, and other objective limitations exist. All of these issues must therefore be considered to make appropriate clinical decisions.