Introduction

Complete remission (CR) and CR with incomplete platelet recovery (CRp) have been shown to be associated with prolonged overall survival (OS) for acute myeloid leukemia (AML) patients treated with “intense” therapy with cytarabine and an anthracycline (7 + 3) compared with patients not achieving either response [1].

Patients who achieved CR rather than CRp were more likely to be alive 3 and 5 years after beginning therapy. CR has also been shown to be associated with prolonged OS in AML patients given less intense therapies, such as the hypomethylating agent azacytidine [2]. However, it is not clear whether patients with higher-risk myelodysplastic syndromes (MDS) treated with azacytidine have longer OS if they achieve a CR, compared with lesser responses such as partial response (PR) or hematologic improvement (HI), which are uniquely defined for MDS. The value of PR or HI themselves is unclear. Outcomes such as CR and CRi (CR with incomplete platelet or neutrophil count recovery) are often used as interim markers of a longer-term endpoint, such as OS. Covariates potentially associated with a clinically meaningful survival benefit are often evaluated on a relative scale using hazard ratios (HRs). However, absolute survival benefit is also important. As survival is typically longer in AML patients given intense induction therapy than in AML patients treated with less intense therapy, it might be expected the former would have the greater absolute survival benefit. Here we evaluate whether the absolute and relative survival benefits of a remission in AML and higher-risk MDS differ according to intensity of treatment; specifically, we compare 7 + 3 versus azacytidine-based regimens. We also evaluate the survival benefits associated with HI in MDS and CRi in AML.

Patients and methods

Study population

We analyzed data from four recent SWOG studies. To evaluate less intense therapy among AML patients, we used data from S0703 (n = 133), a Phase II trial that tested azacytidine + gemtuzumab ozogamicin in an older, less-fit population (ClinicalTrials.gov #NCT00658814) [3]. To evaluate azacytidine among higher-risk MDS patients, we used data from S1117 (n = 277), a randomized Phase II/III trial with three arms: azacytidine, azacytidine + lenalidomide, and azacytidine + vorinostat (ClinicalTrials.gov #NCT01522976) [4]. To evaluate 7 + 3, we used data from the 7 + 3 arms of two sequential Phase 3 studies conducted by SWOG: S0106 (n = 301 randomized to 7 + 3, ClinicalTrials.gov NCT00085709), and S1203 (n = 261 randomized to 7 + 3, ClinicalTrials.gov NCT01802333) [5, 6].

For the AML studies (S0703, S0106, and S1203), CR and CRi (either incomplete platelet or neutrophil count recovery) were defined per International Working Group (IWG) criteria [7]. For the MDS study (S1117), the outcomes CR, PR, and HI were defined per MDS IWG criteria [8]. We note that the definition of CR was the same in all four of these studies. In S1117, blood values were assessed every 4 weeks for HI, marrows were assessed every 16 weeks for PR and CR, and patients remained on therapy until progression. In the AML studies (S0703, S0106, and S1203), CR was assessed following induction and, if given, following re-induction; patients not achieving CR (S0106), or not achieving either CR or CRi (S0703 and S1203), were not eligible for protocol consolidation therapy. For all studies, OS was measured from the date of study registration/randomization to the date of death due to any cause; patients last known to be alive were censored at the date of last contact. Institutional review boards of participating institutions approved all protocols, and patients were treated according to the Declaration of Helsinki.

Statistical methods

OS was estimated using the Kaplan–Meier method. To avoid survival by response bias, we performed time-dependent regression analyses based on date of response and landmark analyses of OS [9]. For landmark analyses, we present results based on the study-specific date on which 75% of patients who eventually achieved a CR had done so (S1117 152 days, S0703 56 days, S0106 44 days, and S1203 34 days). Time-dependent covariate analysis is not amenable to Kaplan–Meier plots, but landmark analyses are; thus, we chose to present results from both analyses. Patients who had not achieved a CR by the landmark date were analyzed with patients who never achieved a CR. Patients who died or were lost to follow-up before this date were excluded from the landmark analyses. In a sensitivity analysis conducted using the date by which 90% of patients had achieved a CR, results were similar. Log-rank tests were used to compare survival curves. Multivariable Cox regression models included baseline prognostic factors (quantitative unless otherwise specified): age, sex (male versus female), performance status (0–1 versus 2–3), white blood cell count, platelet count, marrow blasts percentage, disease status (de novo versus antecedent MDS or therapy-related disease, for AML studies), study arm (for S1117 only), and cytogenetic risk (International Prognostic Scoring System criteria for MDS patients on S1117 and SWOG criteria for AML patients on S0703, S0106, and S1203). Similar analyses were performed to evaluate the outcomes HI in MDS patients in S1117 and CRi in AML patients in S0703 and S1203. S0106 defined CRi patients as “treatment failures” and so the study could not be analyzed with S0703 and S1203 to evaluate CRi. We note that PR was considered a treatment failure in the AML studies (S0106, S0703, and S1203), HI was not defined in the AML studies (S0106, S0703, and S1203), and that CRi was not defined for the MDS study (S1117). 
Only two patients achieved PR as best response on S1117, so analyses of this specific endpoint were not feasible.
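The landmark construction described above can be illustrated with a minimal sketch. It is not the analysis code used for these trials: the field names (os_days, died, cr_day), the toy patients, and the choice to restart the survival clock at the landmark are assumptions made for illustration only.

```python
import math

def landmark_day(days_to_cr, fraction=0.75):
    """Earliest day by which at least `fraction` of the eventual CRs had occurred."""
    ordered = sorted(days_to_cr)
    if not ordered:
        raise ValueError("at least one CR is needed to define a landmark")
    k = math.ceil(fraction * len(ordered))
    return ordered[max(k, 1) - 1]

def landmark_cohort(patients, landmark):
    """Build the landmark analysis set from hypothetical patient records.

    Each patient is a dict with illustrative keys:
      os_days - days from registration to death or last contact
      died    - True for death, False for censoring
      cr_day  - day CR was achieved, or None if CR was never achieved
    """
    cohort = []
    for p in patients:
        if p["os_days"] < landmark:
            continue  # died or lost to follow-up before the landmark: excluded
        # A CR achieved after the landmark is analyzed with the no-CR group.
        cr_by_landmark = p["cr_day"] is not None and p["cr_day"] <= landmark
        cohort.append({
            "cr_by_landmark": cr_by_landmark,
            # Assumption: survival time is counted from the landmark onward.
            "os_from_landmark": p["os_days"] - landmark,
            "died": p["died"],
        })
    return cohort

# Toy data, not trial data.
patients = [
    {"os_days": 400, "died": True, "cr_day": 30},
    {"os_days": 20, "died": True, "cr_day": None},   # excluded (early death)
    {"os_days": 300, "died": False, "cr_day": 60},   # late CR, counted as no CR
    {"os_days": 100, "died": True, "cr_day": None},
    {"os_days": 500, "died": False, "cr_day": 40},
    {"os_days": 250, "died": True, "cr_day": 35},
]
landmark = landmark_day([p["cr_day"] for p in patients if p["cr_day"] is not None])
cohort = landmark_cohort(patients, landmark)
```

In this toy example the patient who died on day 20 drops out of the analysis and the patient with a CR on day 60 is grouped with the non-responders, which are the two features of the landmark approach noted in the text.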

Results

Table 1 summarizes patient characteristics from the four cohorts analyzed. Median age was 73, 70, 48, and 48 years for S0703, S1117, S0106, and S1203, respectively. The proportion of patients with unfavorable cytogenetics was 25%, 33%, 18%, and 23% across the four studies. The CR and CR + CRi rates were 26% and 40% for S0703, 21% and not applicable for S1117, 70% and 74% for S0106, and 63% and 76% for S1203.

Table 1 Summaries of studies analyzed

Effect of CR on survival in AML and MDS patients given 7 + 3 or azacytidine-based therapy

Analyses of the relationship between CR and OS are summarized in Table 2 (univariate analyses) and Fig. 1 (Kaplan–Meier plots). The disparate survival prognoses are evident in the Kaplan–Meier plots; the younger patients receiving 7 + 3 therapy on trials S0106 and S1203 had better survival, as expected. HRs from Cox regression models provide an estimate of relative benefit, specifically on the hazard scale, and are summarized in Table 2. In all four studies, patients with a CR by the landmark date had better survival than patients who did not achieve a CR by the landmark date (HR = 0.51, 95% confidence interval (CI) = 0.31–0.83, p = 0.007; HR = 0.60, 95% CI = 0.36–1.00, p = 0.05; HR = 0.44, 95% CI = 0.32–0.62, p = 0.02; HR = 0.39, 95% CI = 0.24–0.61, p < 0.001 for S1117, S0703, S0106, and S1203, respectively). Thus, the relative benefit ranges from 0.39 (in younger AML patients receiving curative intent therapy on S1203) to 0.60 (in older, less-fit AML patients receiving less intense therapy on S0703). We note that AML and MDS clinical trials are often powered to detect alternative HRs of 0.75 or less extreme. We also performed time-dependent Cox regression analyses based on the actual date that CR was achieved. As with the landmark analyses, CR was associated with significantly improved OS compared with not achieving a CR. The HRs were more extreme in the time-dependent analysis, which reflects the fact that the landmark analysis excludes patients who died before the landmark date and classifies patients as no CR if they achieved a CR after the landmark date. This pattern is most pronounced in S0703, the study with the smallest sample size (n = 133): in the landmark analysis HR = 0.60 with p = 0.054, whereas in the time-dependent covariate analysis HR = 0.51 with p = 0.001.
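To make the contrast with the landmark approach concrete, the following sketch shows one common way to prepare data for a time-dependent covariate analysis, the counting-process (start, stop] layout. The function name and tuple layout are illustrative assumptions, not the format used in the trial analyses.

```python
def split_at_response(os_days, died, cr_day):
    """Return (start, stop, cr, event) rows for one hypothetical patient.

    Follow-up before CR counts toward the no-CR risk set and follow-up
    after CR counts toward the CR risk set. The death indicator is
    attached only to the final interval.
    """
    if cr_day is None or cr_day >= os_days:
        # Never achieved CR during follow-up: one interval, cr = 0.
        return [(0, os_days, 0, int(died))]
    if cr_day <= 0:
        # Degenerate case: in CR for the whole follow-up.
        return [(0, os_days, 1, int(died))]
    return [
        (0, cr_day, 0, 0),                # alive and not yet in CR
        (cr_day, os_days, 1, int(died)),  # in CR from the response date on
    ]

# Toy examples, not trial data.
early_death = split_at_response(20, True, None)
late_cr = split_at_response(300, False, 60)
```

Unlike the landmark analysis, no patient is dropped here: an early death contributes an event to the no-CR risk set, and a patient who achieves CR on day 60 is counted as a responder from day 60 onward rather than as a non-responder.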

Table 2 Univariate summaries of CR versus no CR by landmark date
Fig. 1 Overall survival for CR versus no CR by the landmark date (date 75% of patients on study had achieved CR)

The CIs for the univariate HRs from the four studies all overlap; thus, to investigate whether there were strong differences in the relative benefit of CR across the four studies, we fit a multivariable regression model for all four studies including an interaction term between CR and study, and controlling for baseline prognostic factors; the results of the landmark model are summarized in Table 3. There was no significant interaction between CR and study, indicating there were no strong differences in the relative association between CR and OS across the four studies. In this model, the average relative benefit of achieving a CR (across all four cohorts) was HR = 0.48 with 95% CI = 0.29–0.78, p = 0.0036. The results were similar in the time-dependent analysis (interaction p-value = 0.36; HR for average benefit of achieving CR across all four cohorts 0.57, p = 0.008). Results were also similar when comparing the 7 + 3 studies (S0106 and S1203) analyzed together versus the azacytidine studies (S0703 and S1117) analyzed together (interaction p-value = 0.25, HR for average benefit of achieving CR across the cohort = 0.46, p < 0.001).
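For readers less familiar with interaction models, one way to write a Cox model of this kind is sketched below. The exact parameterization used in the analysis is not given in the text, so the indicator coding of study (with one study as reference), the symbols, and the covariate vector z are assumptions for illustration.

```latex
h(t \mid \mathrm{CR}, s, \mathbf{z})
  = h_0(t)\,\exp\!\Big(
      \beta\,\mathrm{CR}
      + \sum_{k=2}^{4} \gamma_k\,\mathbf{1}\{s = k\}
      + \sum_{k=2}^{4} \delta_k\,\mathrm{CR}\cdot\mathbf{1}\{s = k\}
      + \boldsymbol{\theta}^{\top}\mathbf{z}
    \Big)
```

Under this coding, the HR for CR versus no CR is exp(β) in the reference study and exp(β + δ_k) in study k, and the test for interaction corresponds to the null hypothesis δ_2 = δ_3 = δ_4 = 0.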

Table 3 Multivariable Cox regression model for CR versus no CR, including an interaction between CR and study (n = 878)

In contrast to the similar relative value of CR for OS regardless of intensity of therapy, the absolute benefit of CR varied widely across the four studies. The absolute benefit can be thought of as the area between the two Kaplan–Meier curves and can be summarized by a number of measures; in Table 2 we provide summaries for 1-year OS, 3-year OS, and median OS. For patient populations with short OS (e.g., less-fit AML patients receiving less intense therapy on S0703), 1-year OS and median OS may be more useful summaries than 3-year OS, because few patients, regardless of response, are alive at 3 years. For patient populations with longer OS (e.g., younger AML patients receiving 7 + 3 therapy on S0106 and S1203), 3-year OS may be a more useful summary that captures the proportion of long-term survivors. We note that all of these estimates in this analysis are higher than estimates in the full trial populations, because patients who died before the landmark date were excluded from the analysis. By all of these measures of absolute benefit, achieving a CR provided a benefit with respect to OS in all four studies.
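The summaries of absolute benefit discussed above (survival at a fixed time, median survival, and the area between curves) can all be read off a Kaplan–Meier estimate. The sketch below is a plain product-limit implementation on toy data; the function names and the restricted-mean horizon are illustrative assumptions and do not reproduce the analyses reported in Table 2.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def survival_at(curve, t):
    """Estimated survival probability at time t (e.g., 1 year or 3 years)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

def median_survival(curve):
    """First time the curve reaches 0.5 or below; None if not reached."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None

def restricted_mean(curve, horizon):
    """Area under the step curve from 0 to `horizon`."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for time, surv in curve:
        if time >= horizon:
            break
        area += prev_s * (time - prev_t)
        prev_t, prev_s = time, surv
    return area + prev_s * (horizon - prev_t)

# Toy data (time in months, 1 = death, 0 = censored), not trial data.
curve = kaplan_meier([5, 8, 12, 12, 20, 30], [1, 0, 1, 1, 0, 1])
```

The area between the CR and no-CR curves up to a chosen horizon is then the difference of the two restricted means, and the difference in survival_at at 1 or 3 years gives the corresponding fixed-time summary.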

Effect of CR with incomplete count recovery (CRi) in AML patients

In S0703 and S1203, both CRi and CR patients were eligible for post-induction therapy; thus, CRi can be compared to CR using methods analogous to those used above. S1117, the MDS study, did not include CRi as a potential response outcome following IWG guidelines for MDS responses [8] and S0106 treated CRi outcomes as induction failures and did not allow CRi patients to receive protocol consolidation therapy; consequently, these studies are excluded from this analysis. Earlier work on the relationship between CR and CRp and OS analyzed intensive therapies [1], so we took the opportunity afforded by this dataset to also evaluate the benefit of CR versus CRi with less intense AML therapy (S0703). We note that in the above analyses of CR, patients who achieved a CRi were analyzed in the “No response” cohort for each study. In the following analyses, we used the same landmark date as in the CR analyses to maintain comparability of the CR patients between the analyses.

Kaplan–Meier curves for CR, CRi, and no response patients are shown in Fig. 2. HRs are reported with CRi as the reference group. In univariate analyses, OS for CRi patients trended toward being shorter than for CR patients without reaching statistical significance (S0703 HR = 0.52 (0.24, 1.03), p = 0.059; S1203 HR = 0.48 (0.22, 1.02), p = 0.056) and was not significantly different from the OS of patients who did not achieve a response by the landmark date (S0703 HR = 0.83 (0.47, 1.45), p = 0.591; S1203 HR = 1.28 (0.63, 2.60), p = 0.49). Results were similar on multivariable analysis (CRi versus CR: S0703 HR = 0.67 (0.32, 1.39), p = 0.28; S1203 HR = 0.73 (0.22, 1.05), p = 0.07; CRi versus no response: S0703 HR = 1.01 (0.53, 1.91), p = 0.98; S1203 HR = 1.03 (0.47, 2.01), p = 0.94).

Fig. 2 Overall survival for CR versus CRi versus no CR by the landmark date (date 75% of patients on study had achieved CR)

Effect of HI on OS in MDS patients

We were also interested in evaluating the absolute and relative benefit of HI as the best response for MDS patients. As only two patients achieved PR as best response, analyses of this specific endpoint were not feasible. Kaplan–Meier curves for best response of CR, HI, and no response by the landmark date are shown in Fig. 3. In univariate analyses, patients with best response of HI had OS that was nonsignificantly worse than that of patients with best response of CR (HR = 0.69 (0.40, 1.20), p = 0.19) but significantly better than that of patients who did not achieve a response by the landmark date (HR = 1.55 (1.03, 2.32), p = 0.035). In multivariable analysis, however, patients with best response of HI had significantly worse OS than patients with best response of CR (HR = 0.49 (0.27, 0.90), p = 0.022) and significantly better OS than patients with no response (HR = 1.96 (1.26, 3.04), p = 0.003).

Fig. 3 Overall survival for CR versus HI versus no response by the landmark date (date 75% of patients on study had achieved CR)

Discussion

For many years, “response” in AML referred specifically to CR. This reflected findings that patients achieving CR lived longer than those who did not, with the difference largely resulting from time spent in CR. More recently, new categories of response have been defined. Criteria for these can either be more stringent (as in CR without measurable residual disease (MRD)) or less stringent than those for CR, as in CRi or HI. CR, CRi, and HI are often combined and called “composite” or “overall response.” Overall response may be a better indicator of a drug’s activity than CR. Furthermore, given additional therapeutic options and improved supportive care, patients may live longer once relapse from CR occurs compared with 20 years ago. Accordingly, it is relevant to question whether the previous association between CR and survival still applies, and how these lesser responses, characterized by lower blood counts than seen in CR, affect survival.

In line with previous studies [1], we found that, after accounting for the longer time needed to achieve CR than no response (guarantee time), CR was associated with a survival advantage in AML patients given intensive therapy, compared with lesser or no response. The advantage was seen considering both absolute (area between Kaplan–Meier curves for CR vs no CR) and relative (HR) benefit. We found a similar relative benefit of CR for higher-risk MDS patients and older AML patients given azacytidine-based therapies, although the absolute benefit in these MDS and older AML patients was more modest than in AML patients treated with 7 + 3 because of the poorer OS in the older and less-fit (Table 1) azacytidine-treated patients. The numerically poorer relative benefit of CR observed among the azacytidine-treated patients (Table 2) may reflect higher levels of MRD in the CRs among these patients, rather than being attributable only to differences in patient characteristics. Unfortunately, MRD was not evaluated across these studies.

We also found a more pronounced survival advantage for CR versus CRi (Fig. 2) than did the study of Walter et al. [1] for CR versus CRp. Although the latter found a survival benefit for CRp versus responses that fell short of CR and CRp, we did not find that CRi was associated with improved OS compared with either CR or no response (Fig. 2). Differences between CRp and CRi may provide one explanation. CRp requires a neutrophil count > 1000 per microliter, although platelet recovery to > 100,000 is not required. CRi does not require similar recovery of either neutrophils or platelets. Given that infection is the major cause of death in AML, and that a neutrophil count > 1000 reduces the risk of infection, CRp (as in Walter et al. [1]) may be more beneficial than CRi (as here). One of the studies we analyzed, S0703, was a trial of older AML patients receiving azacytidine-based therapy, whereas the studies analyzed in Walter et al. [1] used more intense therapies and analyzed patients treated 10–30 years before the studies we analyzed. In many of the trials analyzed in Walter et al. [1], CRi patients were not eligible for protocol consolidation therapy, which resulted in a different distribution of CR versus CRi patients. This different distribution can be seen by comparing the S0106 and S1203 trials in our analysis (Table 1). Requiring a strict morphologic CR to receive protocol consolidation, as in S0106 (patients with CRi were not eligible to receive protocol consolidation therapy), was associated with a higher proportion of CRs and a correspondingly lower proportion of CRis. This change in distribution may help explain the difference in OS patterns observed in this manuscript; only 2% of patients in Walter et al. [1] achieved CRp. We note that the number of CRi patients was modest in both our analysis and in Walter et al. [1], precluding definitive interpretations in either analysis.
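The distinction between CR, CRp, and CRi drawn above can be summarized as a small decision rule. This is a simplified sketch using only the count thresholds quoted in the paragraph; it assumes the marrow criteria for remission are already met and omits any other requirements of the formal response criteria. The function and argument names are hypothetical.

```python
def classify_remission(neutrophils_per_ul, platelets_per_ul):
    """Label a patient who already meets the marrow criteria for remission.

    Thresholds follow the text: neutrophils > 1000 per microliter and
    platelets > 100,000 per microliter.
    """
    neutrophils_recovered = neutrophils_per_ul > 1000
    platelets_recovered = platelets_per_ul > 100_000
    if neutrophils_recovered and platelets_recovered:
        return "CR"
    if neutrophils_recovered:
        # Incomplete platelet recovery only; such a patient also meets CRi.
        return "CRp"
    # Neutrophils not recovered, whatever the platelet count.
    return "CRi"

# Toy examples, not trial data.
labels = [
    classify_remission(1500, 150_000),
    classify_remission(1500, 50_000),
    classify_remission(500, 150_000),
]
```

The third case illustrates the point made in the text: a CRi can include patients with persistent neutropenia, who would not qualify as CRp and who may carry a higher risk of infection.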

In addition, the analytic methodologies used in the two works are not the same. The prior work performed a landmark analysis at 30 days and analyzed by eventual response, whereas the analysis herein analyzed by observed response at the landmark date and used time-dependent regression models. Since at day 30 (or at any chosen landmark point) eventual best response is not known for all patients, the analysis reported here provides values that can be used to directly describe the conclusions (including effect sizes) that can be drawn based on response results up to that landmark date. To better understand the discrepancy between the S1203 results here and the Walter et al. [1] results, we analyzed S1203 data based on eventual response rather than observed response and found results very similar to those presented earlier in this manuscript (with CRi as the reference, CRi was not associated with different OS compared with patients without a response: HR = 1.28, (0.63, 2.60), p = 0.49; CRi was associated with worse OS than CR: HR = 0.48, (0.22, 1.01), p = 0.056), leading us to conclude that the analytic methodology is not driving the difference.

In contrast to CRi in AML patients, we did find that among MDS patients receiving azacytidine-based therapy, HI was associated with improved OS compared to patients who did not achieve a response. The absolute and relative benefit of an HI is more modest than the benefit of CR but is still present. HI was not defined in the AML studies we analyzed nor was CRi defined in the MDS study.

A definitive analysis would require data from a trial randomizing patients between intensive and non-intensive therapies. Data from such a trial are not available, so we have used existing non-randomized datasets to describe the outcomes as observed in four trials. As our results are not from a randomized trial, our conclusions should not be considered definitive. Our principal conclusion is that the relative survival benefit of achieving CR was similar regardless of whether patients received 7 + 3 or less intensive azacytidine-based therapy as initial therapy for AML or high-risk MDS. Hence, with more or with less intense therapy, CR should continue to be recognized as a distinct response. However, given their inherently worse prognoses, the absolute survival benefit is smaller for the older, less-fit patients given less intense therapies. Although analyses of survival differences usually emphasize relative benefits (often HRs), the distinction between relative and absolute survival benefit may be important when decisions are made regarding the cost of a new medicine versus its effectiveness.

Future work should address whether achievement of CR, CRi, or HI is associated with improved quality of life (QOL), and whether most of the QOL benefit is derived from CR, as many clinicians suspect. In addition to the use of patient-reported instruments, this question might be addressed by separating patients according to whether they are in CR (or CRi or HI) or not at various landmarks (e.g., 3 months, 6 months, etc.) and examining the number of transfusions received and days spent in hospital over the ensuing 3, 6, etc. months, as these are plausible surrogates for QOL.

Since this work was started and the trials herein were completed, new guidelines for definitions of AML response have been released by the European LeukemiaNet (ELN); in particular, CR without MRD is now distinguished from CR [10]. The results here should be validated in future trials designed using these new and revised definitions. The US NCI National Clinical Trials Network (NCTN) is developing a multi-arm Phase 2/3 trial for older AML and high-risk MDS patients with azacytidine as the control arm (S1612). This trial will be an ideal data source to validate the S0703 and S1117 results presented here, with a larger sample size and also using the new ELN guidelines for response definitions.