Introduction

Spinal and bulbar muscular atrophy (SBMA) is an adult-onset, hereditary motor neuron disease characterized by muscle atrophy, weakness, and bulbar involvement [1–4]. SBMA is caused by the expansion of a CAG triplet repeat within the first exon of the androgen receptor (AR) gene [5, 6]. The disease is male-specific, which appears to stem from the testosterone-dependent accumulation of pathogenic AR proteins in affected neurons [7–13]. In SBMA mouse models, surgical castration delays disease onset and reverses the neuromuscular phenotype [10, 13]. Similar effects emerge when the mice are treated with leuprorelin, a luteinizing hormone-releasing hormone agonist that reduces testosterone release [11]. In a phase II clinical trial, leuprorelin suppressed the accumulation of pathogenic AR and slowed the deterioration of motor function in SBMA patients during an open-label follow-up period [14, 15]. However, in a large-scale phase III randomized controlled trial, the effects of leuprorelin on clinical endpoints were not clearly demonstrated [16]. This discrepancy appears to be partly attributable to placebo-related effects in the large-scale trial [16]. Specifically, the decline in motor function of placebo-treated patients in the phase III trial was milder than that observed in previous clinical studies without intervention [15].

The effects of placebo on outcome measures are likely to be an important concern in clinical trials of disease modifying therapies for neurodegenerative disorders, because placebo-related improvements may mask the true efficacy of interventions [17, 18]. It is, however, difficult to precisely estimate the magnitude of the placebo effect. The aim of this study was to analyze the effects of placebo on various outcome measures by comparing the progression of motor impairment in untreated and placebo-treated SBMA patients.

Methods

Patients

We analyzed two independent groups: the natural history group (NHG) and the placebo-treated group (PTG). For the NHG, a total of 34 male patients with a diagnosis of SBMA were recruited consecutively and received no specific treatment. The inclusion criteria for the NHG were as follows: (1) genetically confirmed male Japanese SBMA patients with more than one of the following symptoms: muscle weakness, muscle atrophy, or bulbar palsy; and (2) aged 25–75 years at the time of informed consent. Patients were excluded if they met any of the following criteria: (1) inability to attend periodic follow-up visits; (2) inability to stand upright for 6 min without assistance; (3) tachycardia (>120/min) or uncontrolled hypertension (>180/100 mmHg); (4) angina pectoris or myocardial infarction; (5) severe bulbar palsy or other severe complications such as malignancy; (6) medical history of allergy to barium; (7) hormonal therapy within 48 weeks before informed consent; (8) history of castration; and (9) participation in any other clinical trial within 12 weeks before informed consent. All of the patients were followed at Nagoya University Hospital. Data collected during the 48 weeks from the initial evaluation were used for this study. Data for the NHG patients were collected between January 2007 and May 2008.

For the PTG, we used the data from 99 male patients with SBMA who participated in the previous randomized double-blind phase III clinical trial and received placebo for 48 weeks [16]. The inclusion and exclusion criteria for the PTG were equivalent to those for the NHG [16]. Briefly, patients who were 30–70 years old without a desire to procreate were included.

Both studies conformed to the ethical guidelines for human genome/gene analysis research and those for epidemiological studies endorsed by the Japanese government. The Institutional Review Board of Nagoya University Graduate School of Medicine approved the study, and all of the participants gave their written informed consent.

Clinical evaluations

The common measurements for the NHG and PTG were the revised amyotrophic lateral sclerosis functional rating scale (ALSFRS-R), the 5-item amyotrophic lateral sclerosis assessment questionnaire (ALSAQ-5), a modified quantitative myasthenia gravis score (mQMG score), and the 6 min walk distance (6MWD). These outcome measures were used to evaluate the activities of daily living (ADL), quality of life, motor function, and walking capacity, respectively, and were measured at weeks 0, 24, and 48 in both NHG and PTG.

The ALSFRS is a validated questionnaire-based scale that measures physical function in ALS patients performing the ADL [19]. The revised version of this scale, the ALSFRS-R, was developed to correct the disproportionate weighting given to limb and bulbar function, relative to respiratory dysfunction, in the original test. The ALSFRS-R was translated into Japanese and validated [20]. The ALSFRS-R is divided into three domains: bulbar-related, limb- and trunk-related, and respiration-related [20].

The ALSAQ-5 is a subjective health measure that was designed for ALS. This questionnaire was developed from the original version, the ALSAQ-40, through the process of item reduction. The ALSAQ-5 and its original version were developed and validated by Jenkinson et al. [21, 22], and the Japanese translation of this test has also been validated [23].

The QMG score is a measure of muscle weakness that was originally designed for myasthenia gravis [24]. We used a modified QMG (mQMG) score that measures the muscle power of the extremities and neck flexion. The best possible score is 0 and the worst possible score is 15. The mQMG score has been shown to correlate well with the ALSFRS-R in the baseline data of the phase III trial of leuprorelin for SBMA [16].

The 6 min walk test is a popular clinical test that has been used to assess functional capacity. The distance traveled during 6 min, 6MWD, is a parameter that evaluates the global and integrated responses of all the systems involved in walking, including the neuromuscular, pulmonary, and cardiovascular systems. Its validity has been verified in various neuromuscular disorders including SBMA [25–27].

Genetic analysis

Genomic DNA was extracted from the peripheral blood of the SBMA patients using conventional techniques. PCR amplification of the CAG repeat in the AR gene was performed using a fluorescein-labeled forward primer (5′-TCCAGAATCTGTTCCAGAGGTGC-3′) and a non-labeled reverse primer (5′-TGGCCTCGCTCAGGATGTCTTTAAG-3′). The detailed PCR conditions were described previously [28]. Aliquots of the PCR products were combined with a loading dye and separated by electrophoresis with an autoread sequencer (SQ-5500; Hitachi Electronics Engineering, Tokyo, Japan). The size of the PCR standards was determined by direct sequencing, as described previously [28].

Statistical analysis

All variables were summarized using descriptive statistics, including range, mean, standard deviation (SD), and standard error (SE). Relationships between the various values obtained in this study were analyzed using Pearson’s correlation coefficient. The last observation carried forward method was used to impute missing values when the data were missing only at week 48. Patients who were evaluated at neither week 24 nor week 48 were excluded from the analysis. The difference between the data at weeks 0 and 48 was analyzed using the paired t test. The difference in progression at 48 weeks between the NHG and PTG was evaluated using the unpaired t test. The sample size required for clinical trials was estimated using the 48 week chronological data of each group (α = 0.05, β = 0.20). The difference in the slopes of the regression lines was tested using the t test. Calculations were performed using the statistical software package Dr. SPSS II for Windows (SPSS Japan Inc., Tokyo, Japan).
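As an illustration only, the imputation rule and the two t tests described above can be sketched in Python. This is not the analysis code used in the study, which was run in Dr. SPSS II. The function names are hypothetical, the unpaired test is assumed to be Student's pooled-variance form, and only the t statistic and degrees of freedom are computed (the p value would then be read from a t distribution).

```python
import math
from statistics import mean, stdev


def locf_week48(week24, week48):
    """Last observation carried forward for a missing week-48 value.

    Returns the week-48 value if present, otherwise the week-24 value.
    Returns None when both are missing; such patients are excluded.
    """
    if week48 is not None:
        return week48
    return week24


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for within-group change."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def unpaired_t(x, y):
    """Student's (pooled-variance) t statistic for two independent groups.

    Assumption: equal variances; the text does not say which variant was used.
    """
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

In this sketch, `paired_t` would be applied to the week 0 and week 48 values within one group, and `unpaired_t` to the per-patient 48 week changes of the NHG and PTG.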

Results

Clinical backgrounds of SBMA patients and correlations between the outcome measures

The baseline characteristics of the patients are shown in Table 1. There were a total of 133 subjects (NHG 34, PTG 99). The characteristics of the two groups did not differ at baseline except for the disease duration from onset, which was shorter in the NHG. All four outcome measures, ALSFRS-R, ALSAQ-5, mQMG, and 6MWD, correlated well with each other in the NHG and PTG, supporting the validity of the clinical evaluations used in this study (Table 2).

Table 1 Baseline characteristics and demographics of the SBMA patients
Table 2 Pearson’s correlation coefficient between outcome measures

48 week chronological change of outcome measures

The 48 week chronological changes in each group are shown in Table 3. One patient in the PTG was lost to follow-up at week 48 because the primary end point of the clinical trial was unmeasurable. In the NHG, one patient was lost to follow-up at weeks 24 and 48 and three patients were lost to follow-up only at week 48 because of inability to attend periodic follow-up visits. The values of the 6MWD significantly decreased over 48 weeks in both groups. Although the ALSFRS-R and mQMG significantly deteriorated in the NHG, neither of these measures showed significant changes in the PTG. We then compared the rate of the 48 week chronological change between the NHG and PTG. There was a significant difference in the chronological change rate of the ALSFRS-R between the NHG and PTG (p = 0.03). In contrast, no significant difference was found in the 48 week chronological change between the two groups with respect to the ALSAQ-5, mQMG, or 6MWD. The same analysis was performed on each domain of the ALSFRS-R: bulbar-related, limb- and trunk-related, and respiration-related. Although no significant differences in the bulbar- or respiration-related domains were seen between the two groups, the inter-group difference was significant in the limb- and trunk-related domain (p = 0.02). Similar results were obtained when we analyzed the data of patients who completed the 48 week follow-up (Supplementary Tables 1 and 2).

Table 3 48 week chronological change of outcome measures

The difference in disease duration between the two groups at baseline may influence the results of our comparison between the NHG and PTG. To ensure the comparability of the two study groups, we analyzed the subset of patients whose disease duration was less than 10 years in each group. The characteristics of these groups did not differ at baseline (Table 1). The differences of chronological changes in the outcome measures between the NHG and PTG within these subgroups were similar to those in the total cases (Table 3). These results indicate that the difference of background at baseline does not have a critical effect on the results of our comparison between the NHG and PTG.

We next investigated the relationship between the 6MWD and ALSFRS-R at weeks 0 and 48 in both groups (Fig. 1). The relationship between these measures at week 48 was similar to that at baseline in the NHG, suggesting that the evaluation of motor function using the ALSFRS-R was plausible at the end of the observation period in this group. However, the slope of the regression line shifted from 0.019 to 0.013 during the 48 week follow-up in the PTG (p < 0.05), suggesting that the placebo treatment affected the evaluation of motor function in these patients.

Fig. 1 Relationships between the 6MWD and ALSFRS-R in the NHG and PTG. The slopes of the regression lines in the NHG were not significantly different between week 0 (a) and week 48 (b). On the contrary, the slope of the regression line at week 48 (d) was significantly milder than at week 0 (c) in the PTG (p = 0.04). 6MWD 6 min walk distance, ALSFRS-R the revised amyotrophic lateral sclerosis functional rating scale, NHG natural history group, PTG placebo-treated group
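One common way to compare two regression slopes with a t test is sketched below, for illustration. The exact procedure used in the study is not described beyond "the t test", so this is an assumption: each line is fitted separately by ordinary least squares, the two fits are treated as independent, and the statistic is the slope difference divided by the combined standard error. The function names are hypothetical.

```python
import math


def slope_and_se(x, y):
    """Ordinary least-squares slope of y on x and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def slope_difference_t(x1, y1, x2, y2):
    """t statistic and degrees of freedom for the difference of two slopes.

    Assumption: the two fits are independent, so their variances add.
    """
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    t = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return t, len(x1) + len(x2) - 4
```

Here `x` would hold the 6MWD values and `y` the ALSFRS-R scores at one time point. Because weeks 0 and 48 involve the same patients, the independence assumption is a simplification.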

Sample size estimation

To clarify the influence of the placebo-related effects on the design of clinical trials, we calculated the sample size for placebo-controlled studies using data from the NHG and PTG. Table 4 shows the subject numbers per arm required for clinical trials to demonstrate a 50 or 100% reduction of deterioration in each outcome measure using chronological data. According to the estimation based on the chronological data in the PTG, over 10,000 subjects per arm are necessary to demonstrate a 50% therapeutic effect in the ALSFRS-R. In contrast, only 139 subjects are necessary when we calculate the sample sizes on the basis of the chronological data of the ALSFRS-R in the NHG. On the other hand, sample sizes were similar between the NHG and PTG, when they were calculated on the basis of the chronological data of the 6MWD.

Table 4 Sample size estimation for clinical trials
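The dependence of the required sample size on the observed rate of change can be illustrated with the standard normal-approximation formula for comparing two means. The sketch below is not the calculation used for Table 4: it assumes a two-sided α, equal arm sizes, and a common SD, and the function name and parameters are hypothetical. The detectable difference is taken as the stated fractional reduction of the mean 48 week change.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(mean_change, sd_change, reduction, alpha=0.05, beta=0.20):
    """Subjects per arm needed to detect a fractional reduction in deterioration.

    mean_change: mean 48 week change of the outcome in the reference group
    sd_change:   SD of that change
    reduction:   fraction of deterioration prevented (0.5 for 50%, 1.0 for 100%)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha (assumed)
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    delta = reduction * abs(mean_change)           # difference to be detected
    return math.ceil(2 * ((z_alpha + z_beta) * sd_change / delta) ** 2)
```

Because the required number scales with the inverse square of the mean change, a group whose score barely changes over 48 weeks yields a very large estimate, which is consistent with the contrast between the PTG-based and NHG-based estimates for the ALSFRS-R.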

Discussion

In the present study, we assessed the clinical severity of SBMA using quantitative outcome measures: the ALSFRS-R, ALSAQ-5, mQMG, and 6MWD. Two study groups, the NHG and PTG, were independently followed by similar protocols for 48 weeks.

The main aim of this study was to compare disease progression between untreated and placebo-treated patient groups that were evaluated with similar protocols. As the two groups were independent cohorts, and not randomly assigned from a common population, the findings of the present study should be interpreted cautiously. Our results, however, clearly demonstrate that the difference of chronological changes between the NHG and PTG varied according to the outcome measures. Although the 6MWD showed almost the same progression in both groups, there was a statistically significant difference of progression in the ALSFRS-R between the NHG and PTG. Further examination of the ALSFRS-R revealed that there was a significant difference in the limb- and trunk-related domain between the NHG and PTG, although no significant difference was found in either of the other domains. In addition, our results demonstrated that placebo treatment possibly alters the relationship between the ALSFRS-R and 6MWD over 48 weeks. The results of sample size estimation also demonstrated profound differences between the NHG and PTG with respect to the ALSFRS-R.

In the assessment of disease progression in neurodegenerative diseases, we should take into account several factors that can modify the true progression. Above all, the placebo effect should be considered in placebo-treated subjects. Although it is difficult to define the placebo effect, this term can be divided into at least two categories in clinical trials: the improvement due to the pharmacological effect of the placebo itself, and the improvement resulting from psychophysiological effects such as a positive expectation for a new treatment by patients and raters or a subconscious desire to meet the attending doctor’s expectations. Several studies have suggested that the placebo effect tends to be prominent in neurological disorders [18, 29]. On the other hand, so-called negative expectations can be generated in the evaluation of natural history without intervention, since the patients or raters may assume that the symptoms must worsen over time. The present study indicates that the degree of these biases is larger in the ALSFRS-R than in the other outcome measures of SBMA. Since this functional scale is evaluated by direct interview, its score may be easily affected by the subjectivity of patients and raters. This tendency has also been reported in a systematic review of clinical trials, where placebos demonstrated benefits in studies with continuous subjective outcomes, although they had no significant effects on objective outcomes [17]. Although the degree to which true progression, placebo effect, or negative expectations influence outcome measures is difficult to estimate, it is important to understand which types of outcome measures tend to be vulnerable to placebo effects and negative expectations when designing placebo-controlled interventional studies. Knowledge about these subjectivity-related biases is necessary for a correct choice of endpoints in clinical trials.

Although the 6MWD is merely a measure of walking capacity, the ALSFRS-R reflects the total motor function of patients, and thus has been widely used in clinical trials of motor neuron diseases [16, 30–32]. For rare diseases such as SBMA, natural history data on motor function are indispensable for designing clinical trials. The data of untreated patients, however, should be used cautiously, given the finding of the present study that functional scales such as the ALSFRS-R may have an innate subjectivity-related problem. Several different approaches could be used to overcome these concerns. For example, we should develop a new disease-specific outcome measure that is sensitive to changes in the severity of symptoms and is less susceptible to placebo-related effects. Functional scores in which questionnaire items are defined with concrete descriptions, together with deliberate inter-rater validation, may attenuate subjectivity and ambiguity in interviews. A combination of subjective and objective measures may be considered. We may also need to construct a new class of clinical trials for neurodegenerative diseases that includes a third arm with no intervention and that avoids the inclusion of a placebo group [33].

In conclusion, placebo-treated and untreated SBMA patient groups demonstrated a large difference in the chronological analysis of a motor functional score, but not for an objective measure of walking capacity. These findings should be thoroughly considered when designing clinical trials for SBMA and other neurodegenerative diseases with a slow progression.