Introduction

Aromatase inhibitors (AIs) are the most commonly prescribed adjuvant endocrine therapy for hormone-dependent early breast cancer in postmenopausal women [1, 2]. The Canadian Cancer Trials Group (CCTG; formerly NCIC Clinical Trials Group) MA.27 phase III trial (ClinicalTrials.gov identifier: NCT00066573) found that 5-year event-free survival (EFS), distant disease-free survival, disease-specific survival, and overall survival were similar among 7576 women with early breast cancer randomized to receive 5 years of anastrozole or exemestane [3]. The most common patient-rated adverse events based on Common Terminology Criteria for Adverse Events (CTCAE v3) included hot flashes and muscle pain [4]. Vasomotor and joint symptoms have been commonly reported as side effects of endocrine treatment [5,6,7,8]. Based on MA.27 CTCAE ratings, menopause-like symptoms, including hot flashes, arthritis, arthralgia, and myalgia, were not significantly different between treatment groups.

However, CTCAE grades may underestimate the frequency and severity of treatment-related symptoms in comparison to more robust patient-reported outcome (PRO) measures that assess target symptoms, particularly for more subjective symptoms such as fatigue [9,10,11,12,13]. PROs capture the status of a patient’s health condition directly from the patient, without interpretation of the patient’s response by a clinician [14], and psychometrically validated PRO measures are considered to be the gold standard for measuring the patient’s experience [9,10,11,12,13,14]. The discrepancy between clinician- and patient-rated treatment side effects has been previously documented with AI therapy-related symptoms [6, 15].

Adherence to long-term endocrine therapy is poor, despite known benefits of treatment [3, 16]. Among MA.27 participants, non-compliance with 5 years of therapy was high (29.4% anastrozole, 33.8% exemestane) and appeared to be driven by adverse events and comorbidities [3]. Given that treatment side effects compromise AI adherence [13] and may predict response to therapy [6, 17], the endocrine therapy-targeted measurement of patient-reported treatment-related symptoms (TRS) and health-related quality of life (HRQL) among MA.27 trial participants using the 56-item Functional Assessment of Cancer Therapy-Endocrine Symptoms (FACT-ES) provided a valuable opportunity to better understand the treatment experience from the patient’s perspective. The purpose of this study was to compare TRS and HRQL among women randomized to receive anastrozole versus exemestane on the MA.27 trial. A secondary objective was to examine the ability of these endocrine-related PROs to identify women at risk for not obtaining the full benefit of AI therapy because of early treatment discontinuation.

Methods

Study population and design

The ECOG-ACRIN Cancer Research Group (formerly Eastern Cooperative Oncology Group, ECOG) conducted this study (E1Z03; ClinicalTrials.gov identifier: NCT00090974) as a companion to the CCTG MA.27 trial. For this study (E1Z03), patients were required to be enrolled on MA.27 no more than four weeks prior to enrollment on E1Z03. The CCTG MA.27 trial design (randomized phase III) has been previously described [3]. The primary objective of E1Z03 was to compare TRS between postmenopausal women with receptor-positive primary breast cancer enrolled on MA.27 and randomized to receive anastrozole or exemestane. Secondary objectives were to compare HRQL between treatment arms and to identify patients at risk for early treatment discontinuation using endocrine-related PROs.

Eligibility criteria for MA.27 included: (1) postmenopausal women with histologically confirmed, adequately excised, locally determined, hormone receptor-positive primary invasive breast cancer; (2) randomization assigned more than 3 weeks and less than 3 months from completion of chemotherapy; (3) ECOG performance status of 0–2; and (4) minimum life expectancy of 5 years. Detailed MA.27 eligibility criteria have been published [3]. For E1Z03, patients were required to be able to read, understand, and complete the PRO measure in English.

The protocol was approved by the Institutional Review Boards at each registering institution. Written informed consent was obtained from all individual participants included in the study. All consecutive patients enrolled on MA.27 through ECOG between December 15, 2004 and December 16, 2005 were asked to participate in E1Z03 until the E1Z03 target accrual goal (n = 625) was met. All E1Z03 participants were enrolled after MA.27 was amended to remove the celecoxib arm.

Patient-reported outcome measures: treatment-related symptoms and HRQL

The 56-item Functional Assessment of Cancer Therapy-Endocrine Symptoms (FACT-ES) questionnaire was used to assess TRS and HRQL. The FACT-ES questionnaire includes the 27-item FACT-General (FACT-G) core items to assess physical, functional, social, and emotional well-being, and two subscales: the FACT-Breast Cancer Subscale (BCS, 10 items) to assess breast cancer-specific concerns, and the FACT-ES subscale (19 items) to assess endocrine therapy-related symptoms. Each item is scaled on a 5-point Likert scale from 0 (not at all) to 4 (very much). The FACT-ES was administered at baseline, 3, 6, 12, and 24 months. MA.27 median follow-up was 4.1 years.

Treatment-related symptoms

TRS was the primary endpoint and measured by the aggregate score for the 19 items from the FACT-ES subscale plus four items selected a priori based on prior research [18] from the FACT-G subscale to assess fatigue, sleep, nervousness, and bother from treatment side effects. Scores range from 0 to 92 and higher scores indicate lower symptom burden.
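As a minimal sketch of the aggregate described above, the following Python function sums 23 item responses into a 0 to 92 score. It assumes the responses have already been oriented so that 4 is the most favorable answer on every item. The prorating rule for missing items, the more-than-half-answered threshold, and the function name are illustrative assumptions, not the study's scoring code.

```python
from typing import Optional, Sequence

N_TRS_ITEMS = 23  # 19 FACT-ES subscale items + 4 FACT-G items
MAX_ITEM = 4      # each item is rated 0-4


def trs_score(items: Sequence[Optional[int]],
              min_answered: float = 0.5) -> Optional[float]:
    """Aggregate TRS score on a 0-92 scale (higher = lower symptom burden).

    `items` holds oriented responses, with None for a skipped item.
    Missing items are prorated (assumed rule); the score is None when
    half or fewer of the items were answered (assumed threshold).
    """
    if len(items) != N_TRS_ITEMS:
        raise ValueError(f"expected {N_TRS_ITEMS} items, got {len(items)}")
    answered = [x for x in items if x is not None]
    if any(not 0 <= x <= MAX_ITEM for x in answered):
        raise ValueError("item responses must be between 0 and 4")
    if len(answered) <= min_answered * N_TRS_ITEMS:
        return None
    # Scale the observed sum up to the full 23-item range.
    return sum(answered) * N_TRS_ITEMS / len(answered)
```

With every item answered, the function reduces to a plain sum, so a respondent giving the most favorable answer throughout scores 92.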

HRQL

HRQL was measured using the FACT Trial Outcome Index (TOI), an aggregate score of FACT Physical and Functional well-being subscale items and FACT-B subscale items. FACT-G items selected to assess TRS were removed from HRQL scores. Scores range from 0 to 84 and higher scores indicate better HRQL.

Treatment-emergent symptoms

FACT-ES individual items assessing symptoms (21 items) were examined to identify the most common moderate or severe symptoms at month 3. The proportion of participants who reported no symptom at baseline (item rating = 0, “not at all”) then rated the symptom as present (≥ 1, “a little bit” or more severe) at month 3 was calculated for each item to identify newly emergent symptoms.
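The treatment-emergent calculation for a single item can be sketched as follows. This is an illustration in Python, not the study's code: ratings are taken as raw symptom severity (0 = "not at all"), and patients without a month-3 rating are dropped from the denominator, which is an assumption the text does not state.

```python
from typing import Optional, Sequence, Tuple


def treatment_emergent(
    baseline: Sequence[Optional[int]],
    month3: Sequence[Optional[int]],
) -> Tuple[int, int, Optional[float]]:
    """Return (new_onset, at_risk, proportion) for one symptom item.

    at_risk   = patients rating the item 0 at baseline who also have a
                month-3 rating (assumed handling of missing data)
    new_onset = at-risk patients rating the item >= 1 at month 3
    """
    at_risk = 0
    new_onset = 0
    for b, m in zip(baseline, month3):
        if b != 0 or m is None:
            continue  # symptomatic or missing at baseline, or missing at month 3
        at_risk += 1
        if m >= 1:
            new_onset += 1
    proportion = new_onset / at_risk if at_risk else None
    return new_onset, at_risk, proportion
```

Running the function once per item and sorting by the resulting proportion gives a ranking of newly emergent symptoms.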

Early treatment discontinuation and duration of treatment

Early discontinuation of AI therapy was defined as discontinuing protocol therapy due to reasons other than completing 5-year treatment per study protocol or study termination. For patients with early treatment discontinuation, duration of treatment was defined as months from randomization to treatment discontinuation due to any reason. Patients who were still on treatment at the MA.27 study termination time (April 2010) were censored at last clinical assessment date. Patients who completed 5-year protocol therapy were censored at 60 months.
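The censoring rules above can be expressed as a small coding function. The sketch below is illustrative only; the `Patient` record and its field names are hypothetical and do not come from the MA.27 or E1Z03 data dictionaries.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    # Hypothetical fields, all measured in months from randomization.
    months_to_discontinuation: Optional[float]  # None if no early discontinuation
    completed_protocol: bool                    # finished 5-year protocol therapy
    months_to_last_assessment: float


def duration_and_event(p: Patient) -> Tuple[float, int]:
    """Return (duration in months, event flag) for time-to-discontinuation.

    event = 1: early discontinuation for any reason
    event = 0: censored at 60 months (completed therapy) or at the last
               clinical assessment (still on treatment at study termination)
    """
    if p.months_to_discontinuation is not None:
        return p.months_to_discontinuation, 1
    if p.completed_protocol:
        return 60.0, 0
    return p.months_to_last_assessment, 0
```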

Statistical analysis

This study was designed to have at least 90% power to detect a 0.33 standard deviation difference in TRS score changes between baseline and month 6 post-randomization assessments between anastrozole and exemestane, using two-sample t test with a two-sided type I error of 0.025. An accrual objective of 625 patients was set, assuming 10% ineligible rate and 20% attrition rate at 6 months after randomization.
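For orientation, the number of evaluable patients per arm implied by these design parameters can be approximated with the standard normal-approximation formula for a two-sample comparison of means, n = 2(z_{1-α/2} + z_{power})² / d². The sketch below uses only the Python standard library; it is an approximation under that formula and is not necessarily the calculation the investigators performed.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size: float, power: float = 0.90,
              alpha: float = 0.025) -> int:
    """Approximate evaluable patients per arm for a two-sided two-sample test.

    effect_size is the detectable difference in standard deviation units.
    Uses the normal approximation rather than the exact t distribution.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


required = n_per_arm(0.33)  # design values stated in the text
```

The accrual target is then obtained by inflating the evaluable sample size for the assumed ineligibility and attrition rates.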

All analyses were conducted in the intent-to-treat population. At each time point, TRS and HRQL were compared between the treatment arms using the Wilcoxon rank-sum test [19] (because their distributions were skewed). Change in TRS and HRQL between follow-up and baseline visits was tested using paired t tests and compared between the two arms using two-sample t tests. Multivariable linear mixed effects models with unstructured covariance matrices were employed to estimate the time profile of TRS and HRQL and to assess the treatment difference in the endpoints over time, assuming that any missing data were missing at random. A likelihood ratio test was used to determine how to code the time variable (i.e., continuous vs. categorical) in the mixed effects models. As a sensitivity analysis, we also implemented the lognormal survival model to adjust for potentially informatively censored data, using the expectation–maximization (EM) algorithm as described in Schluchter [20]. A multivariable generalized estimating equation (GEE) model with unstructured covariance matrices was fit to examine how TRS affected HRQL, adjusting for other variables (age, ECOG performance status, T-stage, N-stage, prior adjuvant chemotherapy, prior adjuvant radiotherapy, prior raloxifene therapy, prior hormonal replacement therapy, and whether any symptoms were experienced in the 7 days prior to randomization). FACT-ES individual items were examined to identify the most common moderate or severe symptoms at 3 months. The proportion of patients who reported no symptom at baseline (item rating = 0, “not at all”) then rated the symptom as present (≥ 1, “a little bit” or more severe) at month 3 was calculated for each item to identify newly emergent symptoms.

Kaplan–Meier methods [21] were used to estimate the probability of continuing protocol therapy. The difference in duration of treatment between groups was assessed by a stratified log-rank test. Landmark stratified Cox proportional hazards models [22, 23] were used to explore whether symptoms at baseline or symptom changes over the first 3 months were associated with duration of treatment on MA.27, with 3 months as the landmark time point. The Fine and Gray competing risk model was fit as a sensitivity analysis, in which treatment discontinuation due to disease recurrence or death was considered a competing risk. Factors associated with bother by side effects at baseline were explored using a multivariable linear regression model.
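To make the first and third steps concrete, the sketch below implements the Kaplan–Meier product-limit estimator and the landmark restriction in plain Python. It is an illustration, not the Stata code used in the study. Excluding patients whose follow-up ends at or before the landmark, and resetting the clock to the landmark, are conventional choices assumed here; the stratification and Cox regression steps are not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is 1 for treatment discontinuation and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_here = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            events_here += data[i][1]
            leaving += 1
            i += 1
        if events_here:
            surv *= 1 - events_here / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def landmark_subset(times: Sequence[float], events: Sequence[int],
                    landmark: float = 3.0) -> Tuple[List[float], List[int]]:
    """Keep patients still on therapy after the landmark; restart time there."""
    kept = [(t - landmark, e) for t, e in zip(times, events) if t > landmark]
    return [t for t, _ in kept], [e for _, e in kept]
```

Patients censored at a given time are counted as at risk for events at that same time, which is the usual tie-handling convention.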

No adjustment was made for multiple comparisons. All significance tests were 2-sided with a type I error of 5%. STATA 11.0 software was used for all analyses [24].

Results

Study population

A total of 688 patients were enrolled (12/04–12/05), of whom two were not enrolled on MA.27 and were not included in this analysis. Of the 686 patients included in the present analysis, 371 were randomized to anastrozole (Arm A) and 315 to exemestane (Arm E). Two patients never started protocol therapy (see CONSORT diagram, Fig. 1). Participants were a median of 65.7 (Arm A) and 64.7 (Arm E) years of age, predominantly white (96%), with an ECOG PS of 0 (86.5%, 87.9%) and early stage disease (Table 1). The majority of participants had undergone a partial mastectomy (64.4%, 67.3%), approximately 50% had prior radiotherapy and prior hormonal therapy, and a slightly higher proportion of women assigned to Arm A (30.5%) had prior chemotherapy, compared to Arm E (24.8%). Randomization resulted in well-balanced treatment arms with regard to demographic and disease characteristics (Table 1). Disease and demographic characteristics among E1Z03 participants were similar to those of the larger group of MA.27 trial participants.

Fig. 1 CONSORT diagram. (a) The most common reasons for non-completion of FACT-ES questionnaires (among patients who were still alive by that time) were “patient was not given form by staff” (18/46, 39.1%) and “patient refusal” (15/46, 32.6%) in both arms. Only 2–3 patients on Arm E did not submit the forms due to being too ill. Among patients who completed the FACT-ES questionnaires, more than 90% did so without assistance. Patients who needed assistance were mainly helped by staff or family members, who either read the questions aloud or clarified the questions/instructions. Overall, the response pattern was similar between the two treatment arms

Table 1 Patient demographic and disease characteristics

PRO data completion rates

Overall, compliance with completing longitudinal PROs was excellent and was similar between the two arms in all follow-up visits in the study (Fig. 1). More than 99% of patients submitted FACT-ES questionnaires at baseline in both arms. At month 6, the retention rate was 94.3% (Arm A) and 92.7% (Arm E). The proportion of patients who answered 80% or more FACT-ES questions was 97% or higher in both arms for all study assessments. In both arms, the questions with the most missing values were those regarding sexual function.

TRS and HRQL endpoints

No significant difference in TRS or HRQL was observed between treatment arms at any time point (Table 2 and Fig. 2). Linear mixed effect models showed similar results (Supplemental Table S1, Online Resource 1). The lognormal survival models further confirmed that TRS and HRQL were not significantly different between treatment arms (data not shown). For both treatment arms, TRS worsened significantly from baseline to follow-up assessments (p < 0.01 for all time points, Table 2) and HRQL remained relatively stable over time (Table 2 and Table S1). The TRS score decreased in the first 6 months then remained relatively stable over the next 18 months.

Table 2 Descriptive statistics for TRS and HRQL at each visit by treatment arm
Fig. 2 Mean Score and 95% CI of TRS and HRQL by Treatment. Note: a lb lower bound, ub upper bound; TRS treatment-related symptoms, higher scores indicate fewer treatment-related symptoms; HRQL health-related quality of life, higher scores indicate a better quality of life. *p > 0.05 for comparing differences in TRS and HRQL scores between two treatment arms at all time points, using Wilcoxon rank-sum test. **p > 0.05 for comparing difference in TRS and HRQL changes between baseline and month 6 visits between two treatment arms, using two-sample t test. ***p > 0.05 for treatment-by-visit interaction for both TRS and HRQL in multivariable linear mixed effect models

Association between TRS and HRQL

GEE model results showed that a higher TRS score (i.e., lower TRS burden) was associated with better HRQL. The mean HRQL score would improve by 0.57 points for each 1-point increase in TRS (p < 0.001), after adjusting for other covariates.

Individual FACT items assessing TRS

Individual FACT Items assessing TRS were examined to determine the proportion of patients with moderate or severe TRS at follow-up assessments and to identify new onset, treatment-emergent symptoms.

Moderate or severe TRS at 3 months

Among the 23 individual items (19 FACT-ES items + 4 FACT-G items) assessing TRS, the proportion of patients reporting moderate or severe symptoms (i.e., FACT-ES item response = 3 [quite a bit] or 4 [very much]) at 3 months was highest for joint pain (36.1%, 32.5%), hot flashes (29.9%, 29.1%), decreased libido (23.7%, 24.0%), fatigue (15.2%, 24.0%), and night sweats (17.7%, 17.2%), for Arms A and E, respectively (Fig. 3). Moderate or severe fatigue was significantly higher for Arm E (p = 0.005). For all other TRS, proportions were comparable between treatment arms. Vomiting and vaginal bleeding/spotting were experienced by few patients in the study (less than 5% in all follow-up assessments).

Fig. 3 Most common moderate or severe symptoms at month 3 based on individual FACT items (% of patients rating symptom as “quite a bit” or “very much” ≥ 10% in at least one arm)

Treatment-emergent symptoms

The proportion of patients who reported treatment-emergent symptoms (new onset TRS) was calculated by tabulating the number of participants who reported no symptom at baseline (FACT-ES item response = 0 [not at all]) and rated the FACT-ES item ≥ 1 (“a little bit” or more severe) at 3 months on 23 individual items assessing TRS. Among the 32% of women who reported no joint pain at baseline, the majority (Arm A 55.8%, Arm E 54.7%) reported joint pain at 3 months (Fig. 4). In addition to joint pain, the most common treatment-emergent symptoms at 3 months included weight gain, hot flashes, night sweats, mood swings, decreased libido, and breast sensitivity (Fig. 4). An examination of the 23 individual FACT symptom items across follow-up assessments indicated that vaginal dryness, pain or discomfort with intercourse, weight gain, and joint pain worsened during the 24-month follow-up period, whereas breast tenderness improved over time (Supplemental Fig. 1, Online Resource 2). These patterns were similar in the two treatment arms. Other symptoms remained stable over time in both arms (Supplemental Fig. 1, Online Resource 2).

Fig. 4 Most common new symptoms at month 3 in patients without symptom at baseline based on individual FACT items (% of patients rating items as “a little bit” or more severe among those reporting “not at all” at baseline)

Endocrine therapy persistence

Of the 686 patients, 248 (Arm A = 129, Arm E = 119) discontinued treatment before completing 5-year protocol therapy or before the termination of the MA.27 trial; approximately 57% (141/248) discontinued treatment within 24 months after entry onto the MA.27 study (Arm A = 68/129, Arm E = 73/119). The median duration of treatment was 21.7 months (95% CI 16.7, 29.5) for the 129 patients on Arm A (range, 0 to 57.5 months), and 18.1 months (95% CI 12.2, 23.2) for the 119 patients on Arm E (range, 0.56 to 57.6 months). On both treatment arms, the main reason for discontinuing treatment early was adverse events, including side effects and complications (Arm A = 68/129, Arm E = 64/119). Other reasons included death (n = 10), disease recurrence (n = 32), patient withdrawal (n = 28), other complicating disease (n = 32), loss to follow-up (n = 8), and other reasons (n = 6).

In the overall sample, the proportion of patients discontinuing AI therapy was 20.4% by month 24 and 32.0% by month 48. Duration of AI therapy was not significantly different between treatment arms (p = 0.30, Fig. 5a). Among patients on both treatment arms, those who were more bothered by treatment side effects at baseline had a significantly higher risk of discontinuing treatment before completing protocol therapy (adjusted HR 1.29, 95% CI 1.08, 1.55, Table 3) when bother by treatment side effects was included as a continuous variable. In a sensitivity analysis when treatment discontinuation due to disease recurrence or deaths was treated as competing risks, the results remained similar (data not shown).

Fig. 5 Kaplan–Meier estimates for duration of treatment

Table 3 Multivariable landmark Cox models for duration of treatment (N = 563)

When bother by treatment side effects was coded as a binary variable, for patients who reported no or little bother by treatment side effects at pre-treatment baseline, the rate of completing 4-year protocol therapy was 70.0% (95% CI 65.9, 73.6), compared to 53.6% (95% CI 43.2, 63.0) for patients who reported moderate or severe pre-treatment bother (log rank p = 0.001, adjusted HR 1.92, 95% CI 1.21, 3.03, Fig. 5b). Based on a linear regression analysis, patient-rated bother by treatment side effects at baseline was associated with prior chemotherapy (p < 0.001), prior radiation therapy (p = 0.005), and the number of current medications (p = 0.01; Table 4). Increased joint pain severity in the first 3 months was associated with increased risk for discontinuing treatment early, but it did not reach statistical significance (HR 1.11, Table 3).

Table 4 Linear regression analysis examining factors contributing to patient-reported bother by treatment side effects at baseline (n = 639)

Discussion

Postmenopausal women with hormone receptor-positive primary breast cancer randomized to exemestane or anastrozole (enrolled on MA.27) reported comparable TRS and HRQL for the first two years of AI therapy. Comparable 5-year event-free survival (EFS), distant disease-free survival, disease-specific survival, and overall survival among women with early breast cancer enrolled on MA.27 and randomized to receive 5 years of anastrozole or exemestane have already been reported [3]. Taken together, the comparable efficacy, TRS, and HRQL support either agent as a reasonable option for patients considering an aromatase inhibitor for adjuvant therapy.

Among E1Z03 participants, the most common moderate or severe TRS shortly after initiation of AI therapy (3 months) included joint pain, hot flashes, decreased libido, fatigue, and night sweats. The proportion of patients reporting moderate or severe joint pain (33–36%) was significantly higher than CTCAE-rated arthralgia of any grade (6–7%) [3]. Many women reported the new onset of symptoms from baseline to 3 months and the most common treatment-emergent symptoms included joint pain, weight gain, hot flashes, decreased libido, breast sensitivity, night sweats, and mood swings. TRS negatively affected HRQL. In the full MA.27 sample, TRS determined by patient-rated CTCAE grades were not associated with relapse-free survival [4].

A significant proportion of patients (36.2%) discontinued AI therapy before completion of the recommended course. This analysis showed that being bothered by treatment side effects at baseline was associated with higher risk of early treatment discontinuation. Factors contributing to bother by treatment side effects at baseline included prior chemotherapy, prior radiation therapy, and the number of current medications. This suggests that a patient’s pre-treatment disposition and prior treatment experiences should be taken into account when initiating AI therapy to identify patients who may be prone to difficulties with tolerability of treatment. Patients experiencing greater symptom burden from prior cancer therapies, compounded by medications for comorbid conditions, are at greater risk for ‘treatment fatigue’ when faced with 5 years of AI therapy. In discontinuing treatment early, these patients are unable to gain maximum benefit from therapy and may benefit from efforts to bolster AI adherence. An analysis of aggregated PRO data across four cancer clinical trials found that this same item from the FACT, “I am bothered by side effects of treatment” was correlated with clinician-rated adverse events and patients’ overall enjoyment of life [25]. Taken together, these findings indicate this single item may provide value at the start of therapy in identifying patients at risk for treatment non-adherence.

The questionnaire completion rate was excellent for this study. Relatively few patients in either treatment arm were lost to attrition due to death or progressive disease, which indicates that missing data did not pose a serious problem for comparisons between the two treatment arms in this study. The lognormal survival models provided supportive evidence for this assertion. Limitations of this study included the challenges inherent to interpreting symptoms as treatment-related, particularly for complex, multifactorial symptoms (e.g., fatigue), and the use of single items to assess domains of interest. There was a difference in allocation to anastrozole and exemestane arms, although results by treatment arm were comparable.

In conclusion, treatment-related symptoms and health-related quality of life were comparable among postmenopausal women with hormone receptor-positive breast cancer who received anastrozole or exemestane. Patient-reported bother with side effects of prior therapy and concurrent medications, assessed at the time of initiating AI treatment, may signal risk for subsequent early discontinuation of AI therapy.