Key Points

Concerns about the generalizability of pharmacotherapy efficacy trials to “real-world” patients have been raised for more than 40 years. However, almost all of this literature has focused on treatment studies of major depressive disorder (MDD).

We conducted the first review of placebo-controlled bipolar disorder efficacy trials (BDETs) and compared the psychiatric inclusion and exclusion criteria used in BDETs with those used in antidepressant efficacy trials (AETs) of MDD.

Similar to treatment studies of nonbipolar MDD, the treatment studies of bipolar depression frequently excluded patients with comorbid psychiatric and substance use disorders and insufficient severity of depressive symptoms as rated on standardized scales. These findings indicate that concerns about the generalizability of data from trials of recently approved medications for the treatment of bipolar depression are as relevant as the concerns that have been raised about studies of antidepressants for nonbipolar depression.

1 Introduction

Concerns about the generalizability of antidepressant efficacy trials (AETs) to “real-world” patients have been raised for more than 40 years [1–11]. An early line of inquiry compared the demographic, clinical, and treatment response characteristics of subjects who volunteered for efficacy treatment trials in response to advertisements (i.e., symptomatic volunteers) with patients who were referred for treatment in a more traditional manner. These studies generally found that the two groups were similar, and this was interpreted as support for the generalizability of efficacy trials [1–5]. However, other studies examined how many patients applying for an acute-phase efficacy trial were accepted into the trial and found low acceptance rates [6, 7]. In addition, more recently, the representativeness of samples treated in AETs was most directly examined by applying the inclusion and exclusion criteria to clinical samples. Several studies have found that most depressed outpatients treated in clinical practice would not qualify for an AET [8–11]. Almost all of this literature has focused on treatment studies of major depressive disorder (MDD).

During the past 15 years, greater attention has been given to the treatment of bipolar depression. For the first time, medications have received regulatory approval as monotherapy for the acute-phase treatment of bipolar depression without also being approved for the treatment of MDD. Accordingly, a sufficient literature of placebo-controlled studies has accumulated to warrant an examination of the psychiatric inclusion and exclusion criteria of trials that have assessed the efficacy of medications for bipolar depression (bipolar disorder efficacy trials [BDETs]).

We previously conducted a limited review of the psychiatric inclusion and exclusion criteria used in AETs by examining the criteria of 39 AETs published between 1994 and 2000 in five journals [12]. More recently, we conducted a comprehensive review of 170 placebo-controlled AETs published during the past 20 years [13]. We hypothesized that the concerns discussed a decade earlier would result in a broadening of the inclusion/exclusion criteria, thereby enhancing the generalizability of the data from AETs. Our hypothesis was not confirmed. Rather, an investigation of the inclusion/exclusion criteria of studies published during the past 5 years compared with those of studies published during the prior 15 years found that AETs have become more restrictive in the criteria used to select patients into the trials. The more recently published studies were significantly more likely to exclude depressed patients with any comorbid psychiatric disorder, were more likely to require a minimum symptom duration longer than the Diagnostic and Statistical Manual of Mental Disorders fifth edition (DSM-5) 2-week threshold, and used a significantly higher cut-off score on measures of depressive symptom severity for study inclusion.

The goal of the present study was to conduct a similar review of placebo-controlled BDETs published during the past 20 years and to compare the criteria used in BDETs with those used in AETs.

2 Methods

We conducted a search of the MEDLINE (via PubMed), Embase (via Ovid), and PsycINFO (via EBSCOhost) databases for articles published from January 1995 through December 2014. The search was limited to this timeframe to be consistent with our recently published review of the inclusion and exclusion criteria of AETs [13]. We used the search terms ‘depression’ or ‘depressive’ or ‘bipolar’ and ‘placebo’ and only included articles published in English. We also examined the reference lists of meta-analyses of AETs and BDETs and the studies identified from our literature review.

We excluded trials that focused on refractory depression; chronic depression; psychotic, atypical, or melancholic subtypes of depression; depressed patients with particular symptoms such as anxious features; inpatient samples; or samples that were limited to patients with a particular comorbid condition such as alcoholism, anxiety disorder, or medical illness. We excluded these studies from our analysis because, by definition, they focused on restricted subgroups of depressed patients, and including them would bias our findings toward suggesting that AETs and BDETs are poorly generalizable.

We only included trials focused on patients meeting DSM criteria for a current major depressive episode, and therefore did not include trials that were based on an admixture of patients with major depression, dysthymic disorder, cyclothymia, mixed bipolar disorder, and minor depression. Trials resulting in multiple publications based on the same sample (and the same set of inclusion/exclusion criteria) were included only once. We did not include trials of intravenous or injectable forms of medication or medication combinations or augmentation strategies. We included trials whether or not the medication had received regulatory approval for the treatment of depression.

Two of the authors independently reviewed each article and completed a pre-specified information extraction form listing the psychiatric inclusion and exclusion criteria used in the study. The reviewers met, compared the results of their data abstraction, and resolved discrepancies.

We identified 170 placebo-controlled AETs [13] and 22 BDETs published during the 20 years through December 2014 [14–30]. As shown in Table 1, three of the publications on bipolar disorder trials included the results of multiple trials [17, 20, 26]. In our tally of inclusion and exclusion criteria, we counted each trial separately. We compared the groups with the chi-square statistic or with Fisher’s exact test if the expected value in any cell of a 2 × 2 table was <5.
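
The test-selection rule described above can be illustrated with a minimal Python sketch. It is not the analysis code used for this review. The function names are hypothetical, the cell counts in the usage note are invented for illustration, and the sketch assumes an uncorrected Pearson chi-square (the text does not state whether a continuity correction was applied), a two-sided Fisher’s exact test, and nonzero row and column totals.

```python
from math import comb, erfc, sqrt

def expected_counts(a, b, c, d):
    """Expected cell counts for the 2 x 2 table [[a, b], [c, d]] under independence."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square (df = 1) and its p-value.

    Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def compare_groups(a, b, c, d):
    """Use Fisher's exact test if any expected cell count is < 5, otherwise chi-square."""
    if min(expected_counts(a, b, c, d)) < 5:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)[1]
```

With invented counts, `compare_groups(1, 21, 10, 160)` falls back to Fisher’s exact test because the smallest expected cell count is below 5, whereas `compare_groups(17, 5, 120, 50)` uses the chi-square statistic.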

Table 1 Psychiatric inclusion/exclusion criteria in 22 bipolar depression efficacy trials

3 Results

3.1 Frequency of Psychiatric Inclusion/Exclusion Criteria used in Placebo-Controlled Monotherapy Treatment Studies of Bipolar Depression

Table 1 lists the inclusion and exclusion criteria used in each of the 22 BDETs, and Table 2 summarizes this information. Across all 22 BDETs, six inclusion/exclusion criteria were used in at least half of the studies: minimum severity on a depression symptom severity scale (100.0 %), significant suicidal ideation (81.8 %), diagnosis of alcohol or drug abuse or dependence (77.3 %), presence of a comorbid nondepressive, nonsubstance use Axis I disorder (50.0 %), current episode of depression being too long (63.6 %), and current manic symptoms as reflected by scoring above a cut-off value on the Young Mania Rating Scale (77.3 %) [31].

Table 2 Commonly used psychiatric inclusion/exclusion criteria in 170 antidepressant efficacy trials [13] and 22 bipolar depression efficacy trials

The definition of some exclusion criteria varied between the studies. All 22 BDETs excluded patients whose symptom severity was too low. The most commonly used symptom severity measures were the 17-item Hamilton Rating Scale for Depression (HAMD) [32] (77.3 %, n = 17) and the Montgomery-Åsberg Depression Rating Scale (MADRS) [33] (27.3 %, n = 6). One study used both the HAMD and the MADRS [17]. Three different cut-offs on the HAMD were used for inclusion (cut-off of 16 in one study [18]; cut-off of 18 in nine studies [14, 15, 17, 26, 28]; cut-off of 20 in seven studies [16, 20, 22, 24, 25, 30]). Three different MADRS severity scores were used for study inclusion, the most common being a score of 20 (four studies [21, 23, 27, 29]).

Most studies excluded patients with current or recent substance use disorders (77.3 %, n = 17), though whether patients were excluded because of a history of drug/alcohol abuse, dependence, or either varied, as did the time period during which the patients could not have had substance use problems (current, past 3 months, past 6 months, or 1 year).

The range of nondepressive psychiatric pathology used as the basis for exclusion also varied among studies. Some listed the specific disorders that were the basis for exclusion, some listed diagnostic categories, and some indicated that patients with any comorbid diagnosis were excluded. The frequency of specific disorders used as exclusion criteria is presented in Table 3, which includes only disorders that were explicitly cited as exclusionary. For example, because no study specifically listed somatoform or impulse control disorders as exclusionary, these are not listed in the table. For studies that excluded patients with any comorbid disorder, we counted this as an exclusion for each disorder listed in the table. Some studies excluded patients with a limited number of disorders such as eating disorders [15, 17, 26], obsessive–compulsive disorder [15, 17, 26], or panic disorder [15, 17]. We did not count studies that limited exclusions to patients with a primary diagnosis of a nondepressive psychiatric disorder.
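
The counting rules in this paragraph can be made concrete with a short Python sketch. It is illustrative only: the record fields (`explicit`, `any_comorbid`, `primary_only`) and the example trials are hypothetical, and the sketch assumes that trials limiting exclusions to a primary diagnosis contribute neither counts nor table rows.

```python
def tally_exclusions(trials):
    """Count, per disorder, how many trials exclude patients with that disorder.

    Rules sketched from the text:
    - table rows are only disorders explicitly cited as exclusionary by some trial;
    - a trial excluding any comorbid disorder counts toward every row;
    - trials that exclude only a *primary* nondepressive diagnosis are not counted.
    """
    counted = [t for t in trials if not t.get("primary_only", False)]
    # Rows come from explicit mentions only, so never-listed disorders do not appear.
    rows = sorted({d for t in counted for d in t.get("explicit", [])})
    tally = {d: 0 for d in rows}
    for t in counted:
        excluded = set(rows) if t.get("any_comorbid", False) else set(t.get("explicit", []))
        for d in excluded:
            tally[d] += 1
    return tally

# Hypothetical trials, invented for illustration (not data from Table 3).
example_trials = [
    {"id": "A", "explicit": ["eating disorder", "panic disorder"]},
    {"id": "B", "any_comorbid": True},
    {"id": "C", "explicit": ["eating disorder"]},
    {"id": "D", "explicit": ["panic disorder"], "primary_only": True},
]
example_tally = tally_exclusions(example_trials)
```

Here trial B adds one count to every row, trial D is skipped, and no row is created for a disorder that no counted trial names explicitly.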

Table 3 Number of antidepressant efficacy trials [13] and bipolar depression efficacy trials excluding patients with different nondepressive disorders

Nearly two-thirds of the 22 studies excluded patients based on the duration of the depressive episode. Exclusion because of an episode being too long (i.e., exclusion of patients with chronic depression) was more common than exclusion due to an episode being too short (i.e., a minimum episode duration requirement that is greater than the DSM-IV and DSM-5 requirements of 2 weeks). Studies excluded patients whose episodes were longer in duration than 3 months (n = 1) [28], 6 months (n = 2) [20], 1 year (n = 9) [15–17, 21, 22, 24, 25, 30], or 2 years (n = 2) [26]. Eight studies required the episode duration to be greater than the 2 weeks required by the DSM-IV and DSM-5 to make the diagnosis. Of these eight studies, six required a minimum symptom duration of 1 month [16, 21, 22, 24, 25, 30] and two required a minimum duration of 2 months [17].

3.2 Comparison of the Psychiatric Exclusion Criteria used in Antidepressant Efficacy Trials (AETs) and Bipolar Disorder Efficacy Trials (BDETs)

BDETs were significantly less likely than AETs to exclude patients with current psychotic features/disorders or a history of them (Table 2). Nearly two-thirds of the BDETs placed an upper limit on the duration of the current depressive episode, more than three times the 20 % rate in the AETs. There was no difference between the AETs and BDETs in exclusion due to total score on a depression symptom scale, history of suicide attempts, current suicidal ideation, or the depressive episode being too short, though the BDETs were significantly more likely to exclude patients who did not score high enough on the first item of the HAMD.

Approximately one-third of the AETs excluded patients with any personality disorder, whereas none of the BDETs used this as an exclusion criterion (Table 3). Post-traumatic stress disorder, and borderline, schizotypal, and antisocial personality disorder were significantly more often listed as exclusions in AETs. There were no significant differences between the AETs and BDETs in exclusion due to other anxiety disorders, substance use disorders, or eating disorders.

4 Discussion

The goals of the present study were to review the psychiatric inclusion/exclusion criteria used in BDETs published during the past 20 years and to compare the criteria used in BDETs with those of AETs. More than a decade ago, our clinical research group raised concerns about the generalizability of AETs and suggested that the majority of patients seen in routine clinical practice would not qualify for an AET [8]. Our findings were subsequently independently replicated by other clinical research groups [9–11]. We recently updated and expanded our initial review of AETs, and found that AETs continue to recruit a narrow range of depressed patients [13]. To the best of our knowledge, the present review is the first to examine the inclusion/exclusion criteria of monotherapy treatment trials of bipolar depression. To enhance comparability to the monotherapy AETs we previously reviewed, we focused on monotherapy studies of bipolar depression and did not include augmentation trials.

Our comparison of BDETs and AETs found more similarities than differences. All studies required a minimum level of severity on a measure of depressive symptoms, and almost all studies excluded patients with at least some comorbid psychiatric disorders. Most BDETs and AETs excluded patients with suicidal ideation at the time of screening for the treatment study, and most also excluded patients with a history of substance use disorders. The studies of bipolar disorder were significantly less likely to exclude patients with psychotic features, a potentially important difference because patients with bipolar I depression are more likely to have psychotic symptoms than patients with nonbipolar depression [34].

To diagnose a major depressive episode, DSM-5, like its recent predecessors, requires that the symptoms of depression be present for at least 2 weeks. Yet, a significant minority of AETs and BDETs required a minimum symptom duration of at least 1 month. The minimum symptom duration requirement for AETs was more common than the maximum symptom duration exclusion. This was reversed for the BDETs, and exclusion due to chronicity was more frequent than exclusion due to brief symptom duration. The reason for this difference between AETs and BDETs is not clear, as few studies justify the basis for the exclusion criteria used.

Subject selection in BDETs and AETs must balance the issues of internal and external validity. In AETs, the inclusion/exclusion criteria have narrowed over the past 5 years, thereby suggesting that AETs may be even less generalizable than they were previously (when concerns about their generalizability had already been raised) [13]. We were unable to similarly examine whether the inclusion/exclusion criteria of BDETs have changed during the past 20 years because an insufficient number of studies have been conducted to permit such an analysis. The percentage of BDETs published during the past 5 years was not significantly different from the percentage of AETs published during the same 5-year period (40.9 vs. 32.9 %, χ² = 0.74, not significant).

A limitation of the present analysis is that it was based on published placebo-controlled studies. There is evidence of publication bias in AETs [35] and BDETs [36], and it is possible that the inclusion/exclusion criteria of published and unpublished studies systematically differ. However, recent reviews of the literature on agomelatine compared efficacy in all published and unpublished studies conducted on the medication and provided links to the unpublished studies [37, 38]. We downloaded the referenced unpublished trials from the internet and found that the inclusion/exclusion criteria of the published and unpublished trials were essentially the same. Similarly, a review of studies of lamotrigine found the criteria in published and unpublished studies were nearly identical [17].

5 Conclusions

This is the first study of the inclusion and exclusion criteria used in monotherapy treatment studies of bipolar depression. Similar to treatment studies of nonbipolar MDD, the treatment studies of bipolar depression frequently excluded patients with comorbid psychiatric and substance use disorders and insufficient severity of depressive symptoms as rated on standardized scales. Thus, concerns about the generalizability of data from trials of recently approved medications for the treatment of bipolar depression are as relevant as the concerns that have been raised about studies of antidepressants for nonbipolar depression.