Introduction

Aromatase inhibitors (AIs) are the recommended first-line adjuvant endocrine therapy in postmenopausal women with hormone-receptor-positive breast cancer, either as monotherapy or in sequence with tamoxifen [1]. AI-associated musculoskeletal symptoms (AIMSS) are reported in up to 50 % of women, leading to drug discontinuation in approximately 13 % of users [2, 3]. Since maximum benefit is observed with 5 years of adjuvant endocrine therapy, it is desirable for women to complete the recommended duration of treatment. In addition, a large retrospective trial suggested that AIMSS may predict improved breast cancer-related outcomes [4].

The etiology of AIMSS is not well understood. Hypotheses include estrogen deprivation, neurohormonal changes causing changes in pain sensitivity, or changes in circulating proinflammatory cytokines, such as interleukin (IL)-1, IL-6, and tumor necrosis factor (TNF)-α [5, 6]. A recent genome-wide association study identified an association between AIMSS and single nucleotide polymorphism (SNP) at the promoter region of the T cell leukemia 1A (TCL-1A) gene. TCL-1A is related to IL-17 and IL-17 receptor A (IL-17 RA) expression [7].

Current interventions for AIMSS are limited to oral analgesics and exercise [5, 6]. The efficacy of these approaches is limited, and long term use of oral analgesics can be challenging. Effective treatments for AIMSS will enable patients to complete their full recommended course of adjuvant endocrine therapy [8]. A randomized controlled clinical trial demonstrated that acupuncture significantly reduced AI-associated joint pain and stiffness compared to sham acupuncture [9], although no mechanistic explanation was provided.

We conducted a dual-center, randomized, sham acupuncture-controlled trial to evaluate the effect of acupuncture on both function and pain in women with AIMSS. We also examined the effect of acupuncture on serum hormones and proinflammatory cytokines, including IL-17, to help elucidate the mechanism of action of acupuncture at the molecular level.

Methods

Patients

Eligible patients were postmenopausal women with stage 0–III, estrogen receptor (ER)- and/or progesterone receptor (PR)-positive breast cancer who had been receiving standard doses of a third-generation aromatase inhibitor (AI) for ≥1 month and had physician-documented AIMSS. Patients were required to have a baseline Health Assessment Questionnaire Disability Index (HAQ-DI) score ≥0.3 and/or a pain score ≥20 on a 100-point visual analog scale (VAS). Patients were treated at the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins and the University of Maryland Greenebaum Cancer Center. Patients were excluded from the study if they had received acupuncture treatment within the previous 12 months. All patients provided written informed consent prior to randomization. This clinical trial was approved by the Institutional Review Boards of both centers and registered at clinicaltrials.gov (NCT00641303).

Randomization

Randomization lists were generated by the trial statistician using specialized randomization software before study initiation. Randomization assignments were made at the University of Maryland and provided to the acupuncturist at each center. Although the two acupuncturists were not blinded to the assignment, they followed a prepared script when conversing with participants in an effort to minimize bias. All other individuals involved in the care of the participant were blinded, including the treating oncologist(s) and the research coordinators who collected HAQ and VAS data. The participants were also blinded to the treatment assignment and were asked at the end of the eight weekly acupuncture sessions to guess which intervention group they were in.

Interventions

Subjects in the real acupuncture group received eight weekly acupuncture sessions at 15 acupoints: CV 4, CV 6, CV 12, and bilateral LI 4, MH 6, GB 34, ST 36, KI 3, and BL 65 (Fig. 1). These acupoints were selected based on our clinical experience suggesting that AIMSS resulted from Qi (vital energy) deficiency. We selected the major Qi-enhancing acupoints and acupoints that have been shown to alleviate musculoskeletal symptoms [10]. The skin was routinely disinfected with alcohol swabs. Sterilized, disposable filiform acupuncture needles (0.25 mm × 40 mm; DongBang AcuPrime®, United Kingdom) were then inserted. The needles were inserted 0.5 inch into the skin through the Park holding device, which consists of an adhesive tube, and remained in place for 20 min.

Fig. 1 Acupuncture point and sham acupuncture point location map

In the sham acupuncture group, sham needles (non-penetrating retractable needles) were placed, with the assistance of the Park holding device, at 14 sham acupoints, each located at the midpoint of the line connecting two real acupuncture points (Fig. 1). The selection of sham acupoints was based on a previous study of acupuncture for fibromyalgia [11]. All subjects were unblinded after the 12-week follow-up, and those who had been randomized to sham acupuncture were offered four free acupuncture sessions.

Outcome measures

The primary outcome measures were the between-group differences in the change in HAQ-DI and pain (VAS) scores from baseline to week 8. All participants in the study were evaluated using the Health Assessment Questionnaire (HAQ), a well-validated tool that has been used extensively for the past two decades to evaluate patients with rheumatic disorders [12]. The two-page HAQ includes the HAQ disability index (HAQ-DI) and a visual analog scale (VAS) to assess both function and pain. In the HAQ-DI, each of eight activities (dressing, rising, eating, walking, grooming, reaching, gripping, and performing errands) is scored as 0 (no difficulty), 1 (some difficulty), 2 (much difficulty), or 3 (unable to do), and the scores are then averaged across the eight activities to yield a HAQ-DI score ranging from 0 to 3. Average scores for the general population range from 0.25 to 0.49, whereas the average score is 0.80 for patients with osteoarthritis and 1.20 for patients with rheumatoid arthritis [13, 14]. Many studies have suggested that a change of 0.22 is the minimal clinically important difference [14]. The VAS is a standard measure of pain intensity and ranges from 0 (no pain) to 100 (severe pain). HAQ-DI and VAS were used to assess the severity of musculoskeletal symptoms at baseline and at weeks 4, 8, and 12. All subjects were allowed to receive usual medical care in addition to the study intervention.
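The HAQ-DI scoring rule described above can be illustrated with a minimal sketch. The function names and the example patient are hypothetical, and the code assumes only what the text states: eight activities scored 0 to 3, averaged, with a reduction of more than 0.22 counted as clinically important.

```python
from statistics import mean

MCID = 0.22  # minimal clinically important difference cited in the text


def haq_di(item_scores):
    """Average eight activity scores (each 0-3) into a HAQ-DI score."""
    if len(item_scores) != 8:
        raise ValueError("expected eight activity scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each activity is scored 0, 1, 2, or 3")
    return mean(item_scores)


def clinically_improved(baseline, follow_up, mcid=MCID):
    """True when the score fell by more than the MCID (lower is better)."""
    return (baseline - follow_up) > mcid


# Hypothetical patient, for illustration only (not trial data).
baseline = haq_di([1, 1, 0, 1, 2, 1, 1, 2])
week8 = haq_di([1, 0, 0, 1, 1, 1, 0, 1])
print(baseline, week8, clinically_improved(baseline, week8))
```

Because each item contributes one eighth of the total, a single one-step change on one activity moves the index by 0.125, so at least two such steps are needed to exceed the 0.22 threshold.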

Fifteen milliliters of blood was drawn from the subjects at baseline and week 8 to measure changes in estrogen concentrations, cytokine profile, and β-endorphin concentration. 25-Hydroxy vitamin D level was measured at baseline only.

Plasma estradiol concentrations were measured by radioimmunoassay at the Dowsett Lab in the Royal Marsden Hospital, London, UK. The quantitative range for estradiol is 3.0–1,500 pmol/L. Serum β-endorphin concentrations were measured using ELISA at the University of Maryland Cytokine Core lab. The quantitative range for β-endorphin is 0.01–100 ng/ml. Serum cytokine concentrations were measured using Mesoscale Discovery multiplexed ELISAs at the Johns Hopkins Bayview General Clinical Research Center Core Lab. All cytokines (IFN-γ, IL-1, IL-6, IL-8, IL-10, IL-12, IL-17, and TNF-α) have the same highest calibrator, 2,500 pg/ml, and serial dilutions are made down to the limit of detection, which varies by cytokine. Serum 25-hydroxy vitamin D (VD) level was determined by an immunochemiluminometric assay performed on the DiaSorin LIAISON® instrument at LabCorp.

Statistical/data analysis

A sample size of 50 patients per group was required to provide at least 80 % power to detect a clinically meaningful between-group difference of 0.22 points in ΔHAQ-DI score (week 8 minus week 0) at a two-sided significance level of 0.05. The standard deviation (SD) of the ΔHAQ-DI score used in the sample size calculation was 0.37, the average of four within-group SDs of HAQ-DI score from a study reporting long-term changes in HAQ-DI over time [15]. The trial was closed early for futility because of a low conditional power of 0.33 at a planned interim analysis with 36 evaluable patients. The current analysis of 47 participants includes 11 additional patients who had already begun treatment at the time of the interim analysis.
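As a rough check on these design parameters, the following sketch applies the standard normal-approximation formula for comparing two means. The exact method used for the trial is not stated, so the formula and the function names are assumptions. Under this approximation about 45 patients per group suffice for 80 % power, and 50 per group gives roughly 84 %, which is consistent with the "at least 80 %" statement.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (normal approximation, equal SDs and equal group sizes assumed)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power with n patients per group; the negligible
    contribution of the opposite tail is ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta / (sd * sqrt(2 / n)) - z_alpha)


# Values taken from the text: difference 0.22, SD 0.37, alpha 0.05.
print(n_per_group(delta=0.22, sd=0.37))
print(round(approx_power(50, delta=0.22, sd=0.37), 3))
```

A calculation based on the t distribution or on a rank-based test would require a few more patients per group than the normal approximation, which may account for part of the margin between 45 and 50.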

The intention-to-treat (ITT) analysis included all patients who received at least one session of treatment and had at least one post-baseline efficacy assessment. For outcomes in which multiple repeated measures were made (HAQ-DI and VAS-pain), the last observation carried forward (LOCF) method was used to replace any missing values. For biomarker levels, only baseline and week 8 assessments were performed; therefore, patients with missing values at either time point were excluded from the analysis. Baseline and week 8 measurements were compared within groups by Wilcoxon signed-rank test. Change scores at week 8 were compared between groups by Wilcoxon rank sum test, as well as by analysis of covariance (ANCOVA), with adjustment for the baseline scores. Since the ANCOVA analysis did not change the conclusions, it is the unadjusted analysis of change scores that is reported here. Fisher’s exact test was performed to compare proportions between two groups. Spearman’s rank correlation analysis was used to explore relationships between the measured biomarkers and the primary outcome measures. All p values are two-sided with p < 0.05 considered statistically significant. No adjustments for multiple comparisons were made.
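The LOCF rule used for the repeated HAQ-DI and VAS measures can be sketched as follows. The function name and the example values are hypothetical; missing assessments are represented as `None`, and the baseline is assumed to be present.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed one.

    `values` is ordered in time, e.g. [baseline, week 4, week 8, week 12].
    Leading missing values stay None because there is nothing to carry.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical patient who missed the week 8 and week 12 assessments.
print(locf([1.12, 0.75, None, None]))  # [1.12, 0.75, 0.75, 0.75]
```

LOCF treats a patient who stops attending as unchanged from the last visit, which is a simplifying assumption rather than an estimate of what the missing scores would have been.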

Results

From May 2008 to July 2011, 51 patients enrolled in the study and were randomized to sham or real acupuncture. Four patients were excluded from the analysis because of failure to initiate the intervention (n = 3) or early withdrawal from the study for a treatment-unrelated reason (n = 1). The remaining 47 patients are included in this report. Twenty-three were randomized to real acupuncture and 24 to sham acupuncture (Fig. 2).

Fig. 2 CONSORT diagram

Baseline characteristics were balanced between the two groups with the exception of the baseline HAQ-DI score, which was statistically significantly higher in the real acupuncture group (p = 0.047) (Table 1). The median HAQ-DI score at baseline was 1.12 and 0.62 in the real and sham acupuncture groups, respectively. At weeks 4, 8, and 12, the median HAQ-DI scores decreased to 0.62, 0.62, and 0.62 in the real acupuncture group, and to 0.50, 0.25, and 0.25 in the sham acupuncture group, suggesting improved function in both groups. There was no statistically significant difference in the change in HAQ-DI at week 8 between the real (median = −0.12, range −1.38 to 1.12) and sham (median = −0.25, range −1.00 to 0.12) acupuncture groups (p = 0.30) (Fig. 3a). At week 8, 14 of 24 (58 %) patients in the sham acupuncture group and 8 of 23 (35 %) patients in the real acupuncture group reached the minimal clinically important change in HAQ-DI, a reduction of more than 0.22 units [14]. These proportions did not differ significantly between the two groups (p = 0.147, Fisher’s exact test). The median VAS pain scores at baseline were 50 and 49 in the real and sham acupuncture groups, respectively. At weeks 4, 8, and 12, median VAS scores changed to 42, 43, and 37 in the real acupuncture group, and to 51, 32, and 25 in the sham acupuncture group. There was no statistically significant difference in the change in VAS at week 8 between the real (median = −2, range −68 to 53) and sham (median = −13, range −80 to 32) acupuncture groups (p = 0.31) (Fig. 3b). There was no statistically significant difference between the two groups in the change in pain medication usage during the trial (p = 0.93).
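The comparison of responder proportions can be reproduced from the counts given above (8 of 23 in the real group and 14 of 24 in the sham group). The sketch below is a plain standard-library implementation of the two-sided Fisher's exact test, not the software used for the trial; the function name is hypothetical. It returns a value close to the reported p = 0.147.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(k):
        # Unnormalized probability of k responders in the first row.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Rows: real, sham. Columns: reached the 0.22 threshold, did not.
p_value = fisher_exact_two_sided(8, 15, 14, 10)
print(round(p_value, 3))
```

The weights are kept as exact integers until the final division, which avoids floating-point ties when deciding which tables count as at least as extreme as the observed one.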

Table 1 Baseline patient clinical characteristics
Fig. 3 Summary of HAQ-DI and VAS scores and changes compared to baseline at each time point

Among the 38 patients who had estradiol measurements at both baseline and week 8, 17 were in the real acupuncture group and 21 in the sham acupuncture group. Eight patients receiving exemestane were excluded from the analysis because of a minor interaction between exemestane and the antibody used to detect estradiol, which results in a false elevation in estradiol levels. In the majority of the remaining 30 patients, the estradiol concentrations remained undetectable (<3 pmol/L, data not shown).

Baseline cytokines, β-endorphin, and 25-hydroxy vitamin D were well balanced between the two groups, with the exception of IL-17, which had a median of 8.03 pg/mL in the real and 10.39 pg/mL in the sham acupuncture group (p = 0.002) (Table 2). At week 8, no measured cytokine concentration other than IL-17 and IL-12 showed a statistically significant change. IL-17 concentrations were reduced significantly in both groups, with a median change of −1.14 pg/mL in the real acupuncture group (p = 0.009) and −1.53 pg/mL in the sham acupuncture group (p < 0.001) (Table 2). IL-12 concentrations were significantly increased in the sham group, rising from a median of 2.78 pg/mL at baseline to a median of 3.94 pg/mL at week 8 (p = 0.001). The correlations between the reduction in IL-17 and improvement in HAQ-DI score (r = 0.17, p = 0.31) and in VAS (r = 0.18, p = 0.27) were weak and not statistically significant. The correlation between baseline 25-hydroxy vitamin D level and baseline AIMSS severity was moderate for HAQ-DI (r = −0.30, p = 0.047) and negligible for VAS (r = −0.08, p = 0.62).

Table 2 Summary statistics for serum concentrations of β-endorphin and proinflammatory markers at baseline, week 8 and the change at week 8 from baseline

No significant side effects were reported in either arm. Patients were adequately blinded to their treatment assignment. About 55 % of the patients who guessed their treatment assignments did so correctly (9/17 sham acupuncture, 9/16 real acupuncture).

Discussion

In this study, we observed that AIMSS, measured by HAQ-DI and VAS, improved following 8 weeks of either real or sham acupuncture treatment. Furthermore, there were no significant changes in serum concentrations of estradiol, β-endorphin, or proinflammatory cytokines, with the exception of a significant reduction in IL-17 in both groups and a significant increase in IL-12 in the sham acupuncture group. Both treatments were well tolerated, with no significant side effects.

This is the second randomized clinical trial, to our knowledge, to study the effect of acupuncture in reducing AIMSS. The first study, conducted at Columbia University, showed that real acupuncture significantly reduced AI-induced joint pain and stiffness compared to sham acupuncture [9]. Our study did not demonstrate a significant inter-group difference.

Both studies included women with early stage breast cancer with AIMSS. Our study had a slightly larger number of patients, 47 compared to 38 in the Columbia trial, and we used a standardized acupuncture protocol whereas the Columbia study used an individualized protocol. Our study used non-penetrating retractable needles at non-acupuncture points, whereas the Columbia study used superficial needles at non-acupoints as sham acupuncture. We provided eight weekly acupuncture treatments compared to twice per week treatment for 6 weeks used in the Columbia study. In addition, we selected different primary end points to assess the severity of patients’ AIMSS including both HAQ-DI score to assess daily function and VAS, while the Columbia study used Brief Pain Inventory (BPI) to assess patients’ most severe pain. It is possible that our end points were not as sensitive as the ones used at the Columbia study to capture changes. In the Columbia study, the baseline BPI scores were balanced, while the baseline HAQ-DI scores in two arms in our study were not balanced, likely by chance, with patients randomized to the real acupuncture arm having significantly higher HAQ-DI (worse function). While the baseline VAS scores were balanced in the two arms of our study, there were no significant differences in terms of response to treatment.

Despite significant differences from the prior acupuncture AIMSS study, our results are consistent with the majority of the existing acupuncture literature for other indications, which shows that patients benefit from both real and sham acupuncture, with no significant difference between the responses to the two [16–18]. In addition, a recent systematic review of clinical trials that used sham acupuncture controls suggested that sham acupuncture may be as efficacious as real acupuncture [19]. These studies suggest that sham acupuncture may be associated with a true physiological effect. Indeed, studies have shown that both real and sham acupuncture caused release of endorphins [20] and activation of the pain-related neuromatrix [21].

In our study, we used non-penetrating needles at non-acupuncture points as sham acupuncture. The non-penetrating needles may have provided sufficient stimulation to trigger physiological effects. There are over 360 acupuncture points in the body, and the points are closely correlated. The non-acupuncture points might therefore still be sensitive to stimulation even though the needles did not penetrate the skin.

Interestingly, our study showed a significant reduction of IL-17 in both groups. IL-17 is a proinflammatory cytokine that has been linked to the severity of rheumatoid arthritis and psoriasis [22, 23]. Two recent clinical trials showed that anti-IL-17 pharmacotherapies effectively reduced the severity of psoriasis [24, 25]. More importantly, the IL-17 pathway has been linked to the development of AIMSS [7, 26]. It is possible that both real and sham acupuncture may reduce the severity of AIMSS symptoms through modulation of IL-17.

We have also demonstrated a significant increase in IL-12 in the sham acupuncture group. IL-12 is a key cytokine involved in the differentiation and commitment of undifferentiated CD4 positive T cells to IFN-γ producing Th1 cells. The elevation in IL-12 in the sham acupuncture group may reflect a shift toward Th1, a pro-inflammatory system. A lack of increase in IL-12 in the real acupuncture group might indicate an absence of shift toward Th1, and by extension, perhaps a predominant Th2 (anti-inflammatory) response.

Our study is limited by its relatively small sample size, although it is larger than some previous studies widely cited in the literature, and by the fact that the primary end point, HAQ-DI score, was not balanced between the two treatment groups at baseline, suggesting that different patient populations might have been studied by chance. A prior study showed that acupuncture significantly increased estrogen concentration [27]. In our study, the great majority of the estradiol values were below the detection limit both before and during treatment, suggesting that acupuncture does not produce a detrimental elevation of plasma estradiol that could explain the lessening of AIMSS. It is also the first study showing that both real and sham acupuncture significantly reduced IL-17 concentration, suggesting that sham acupuncture using non-penetrating needles at non-acupoints may have physiological effects similar to those of real acupuncture. Prior studies showed that baseline 25(OH)-VD level did not predict the risk of developing AIMSS [28], although high-dose VD3 did prevent the worsening of AIMSS symptoms [29]. Our study showed a moderate negative correlation between 25(OH)-VD level and AIMSS severity when measured by HAQ-DI.

Importantly, no significant side effects were associated with acupuncture treatment. It is a minimal risk procedure and provides 35–58 % of patients with clinically significant benefit. Our study also showed that after stopping acupuncture at week 8, the patients’ AIMSS remained less severe than baseline at week 12 follow-up (Fig. 3). The ongoing three-arm acupuncture AIMSS Southwest Oncology Group study (NCT01535066) will help to definitively answer if acupuncture is better than sham acupuncture and standard care alone. While awaiting completion of this important study, we believe that acupuncture remains a safe and viable option for early stage breast cancer patients with AIMSS.