Introduction

Major depression has been recognised as a high priority in primary health care, and treatment rates have increased (Kessler et al. 2005). However, many patients do not achieve remission, although there is a broad consensus that full remission and recovery should be the primary aims in treating depression (Paykel 2002; Solomon et al. 2000). In most studies, remission and recovery, the optimal treatment outcomes, are defined as an asymptomatic state with no or only minimal symptoms for a specified period after a depressive episode has resolved (Frank et al. 1991).

In recent years, several new anti-depressants have been developed with the aim of improving the efficacy and safety of pharmacological treatment. There has been much debate about whether the available anti-depressants differ in remission rates or in symptom reduction (Hansen et al. 2005; MacGillivray et al. 2003). Despite the considerable number of clinical trials, evidence of differential efficacy is sparse. The selective serotonin and noradrenalin re-uptake inhibitor (SNRI) venlafaxine has been claimed to be more effective than selective serotonin re-uptake inhibitors (SSRIs), particularly with respect to remission rates (Rudolph 2002; Shelton et al. 2005; Smith et al. 2002; Thase et al. 2001). The proposed mechanism for this difference in efficacy is the dual action of venlafaxine, implying an additive benefit from inhibiting the re-uptake of both serotonin and noradrenalin. The superiority of dual-acting substances has been supported by clinical data, e.g. RCTs of clomipramine vs SSRIs such as citalopram or paroxetine (Danish University Antidepressant Group 1986, 1990).

However, most guidelines do not recommend venlafaxine as a first-line option (American Psychiatric Association 2000; Fochtman et al. 2006; National Institute for Health and Clinical Excellence 2004). The discrepancy between national guideline recommendations and the reviews mentioned above may be explained by differences in the interpretation of the study evidence, by methodological choices in systematic review or guideline development, or by additional consideration of side effects (Cipriani et al. 2007). Although meta-analyses are intended to provide the best summary of the available evidence, those in depression research may have many limitations and shortcomings beyond the quality of the included primary studies (Anderson 2001). Recently, guidelines for quantitative reviews of anti-depressants have been published that address a variety of methodological issues (Lieberman et al. 2005).

To synthesise the study evidence on the comparative efficacy of venlafaxine vs other anti-depressants, we performed an up-to-date systematic review and meta-analysis of this substance and explored why effect sizes differed between this and other reviews in the field. We restricted this analysis to the comparison of venlafaxine against SSRIs. This comparison is of particular interest from both a pharmacological and a clinical viewpoint, as SSRIs are now the most frequently prescribed anti-depressants in developed countries. We then compared our findings with previous relevant reviews.

Materials and methods

Systematic review

This review was part of a larger review of venlafaxine vs other anti-depressants. We searched Medline, EMBASE, PsycINFO, PSYNDEX, the Cochrane Central Register of Controlled Trials, study registers (http://www.clinicalstudyresults.org; http://www.clinicaltrials.gov) and the manufacturer’s database (http://www.wyeth.com) for primary studies published between 1966 and January 2006. The literature search was updated in March 2007. In addition, the Cochrane Database of Systematic Reviews, DARE and guideline databases (http://www.guidelines.gov; http://www.g-i-n.net) were screened for systematic reviews to identify further primary studies. We used a sensitive search strategy, based on a combination of text and index terms, that was a modification of the search strategy of the Cochrane Depression and Anxiety group (available from the authors).

Study selection

In this review, we included double-blind randomised controlled trials in which venlafaxine was compared to citalopram, escitalopram, fluoxetine, fluvoxamine, paroxetine or sertraline, with or without a placebo control. We first searched for all publications comparing venlafaxine with other anti-depressants and then selected only those studies in which it was compared to an SSRI. We did not include conference abstracts unless we could obtain a full-text publication. We included only studies in which depression was diagnosed according to ICD, DSM or Research Diagnostic Criteria (RDC). To ensure comparability, we included only studies with at least 6 weeks of double-blind treatment. Long-term studies lasting more than 6 months were excluded because we focused on short-term effectiveness rather than on relapse and recurrence prevention. Similar to the criteria used in the British NICE guideline (National Institute for Health and Clinical Excellence 2004), we excluded studies in which more than 20% of participants had a primary diagnosis of dysthymia or more than 15% had a primary diagnosis of bipolar disorder. Studies in geriatric populations were included because there is no clear evidence of a general difference in anti-depressant efficacy between younger and older persons (Roose and Schatzberg 2005).

We only included studies that used the standard instruments Hamilton Depression Rating Scale (HAM-D) or Montgomery Asberg Depression Rating Scale (MADRS) as outcome parameters. Studies reporting only global ratings such as the Clinical Global Impression Scale (CGI) or Patient Global Improvement (PGI) were excluded because such poorly specified ratings are left to the clinician’s discretion and do not necessarily reflect clinical relevance. We did not require pre-defined remission and response definitions. Although a HAM-D score below 8 (Frank et al. 1991) is a very common remission criterion, many other definitions are used (Keller 2003), and there is still an ongoing debate on specific cutoff points (Zimmerman et al. 2005). However, we performed a sensitivity analysis including only studies that used the following standards: remission defined as a score of ≤7 on the 17-item HAM-D (HAM-D-17) or ≤10 on the MADRS, and response defined as a reduction of at least 50% in HAM-D or MADRS scores. Most studies used HAM-D or MADRS scores as the primary endpoint. However, a small number of studies reported only composite endpoints derived by combining several scales (e.g. in one study, response was defined as a reduction of at least 50% in the HAM-D or MADRS score together with a CGI improvement score of 1 or 2). These composite scores were included when no single-scale score was available, because the literature suggests that the choice of composite primary outcomes has relatively little influence (Freemantle and Calvert 2007).
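The standard criteria used in this sensitivity analysis can be written as two simple rules. The Python sketch below is illustrative only: the function names and the scale labels `"HAMD17"` and `"MADRS"` are hypothetical, and it assumes individual endpoint scores are available, whereas the review itself worked with study-level counts.

```python
def is_remission(endpoint_score: float, scale: str) -> bool:
    """Standard remission: HAM-D-17 <= 7 or MADRS <= 10 at endpoint."""
    cutoffs = {"HAMD17": 7, "MADRS": 10}  # hypothetical scale labels
    if scale not in cutoffs:
        raise ValueError(f"no standard remission cutoff defined for {scale!r}")
    return endpoint_score <= cutoffs[scale]


def is_response(baseline_score: float, endpoint_score: float) -> bool:
    """Standard response: at least 50% reduction from the baseline score."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    reduction = (baseline_score - endpoint_score) / baseline_score
    return reduction >= 0.5


# A patient moving from HAM-D-17 = 24 to 7 is both a responder and a remitter.
print(is_response(24, 7), is_remission(7, "HAMD17"))  # True True
```

Other scale versions, such as the 21-item HAM-D, deliberately raise an error here because the text defines no standard remission cutoff for them.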

Publications were screened in two steps. First, all abstracts from the primary search were screened for potentially relevant articles. For these publications, full-text versions were obtained and screened using our inclusion and exclusion criteria. A total of 981 publications comparing venlafaxine to other anti-depressants were identified with our search strategy (see the Electronic supplementary material). One hundred fifty-one publications were screened in full text.

Data collection and analysis

Two authors extracted data independently. We assessed study and publication characteristics such as primary and secondary outcome measures, blinding, allocation concealment, length of follow-up, handling of missing data, statistical methods (last observation carried forward [LOCF] vs observed case [OC] analysis) and study population characteristics such as sociodemographic and baseline psychopathological parameters.

To evaluate the adequacy of the intention-to-treat (ITT) analysis, we defined three categories. Data analysis was judged as “adequate ITT analysis” when all randomised patients were evaluated within their allocation group using LOCF or repeated measures analysis, or when only randomised patients with at least one post-baseline measurement were included and no more than 10% of randomised patients were excluded from the analysis. In addition, the handling of patients lost to follow-up had to be described. Data analysis was judged as “acceptable ITT analysis” when LOCF or repeated measures analyses were performed with no more than 10% of patients lost to follow-up and no more than 30% of patients having dropped out, and the reasons for dropout were adequately described. Per-protocol, OC or completer analyses, and analyses including fewer than 90% of randomised patients, were classified as “inadequate ITT analysis”. Studies with “inadequate ITT analysis” according to our judgement were excluded.
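The three ITT categories can be read as a decision rule. The Python sketch below is a simplified, hypothetical encoding. The `TrialReport` fields and the method labels are assumptions. The 90%, 10% and 30% thresholds are taken from the text. Cases that the stated criteria do not cover are returned as `"unclassified"` rather than guessed.

```python
from dataclasses import dataclass

IMPUTING_METHODS = {"LOCF", "repeated_measures"}      # hypothetical labels
NON_ITT_METHODS = {"OC", "completer", "per_protocol"}


@dataclass
class TrialReport:
    method: str                      # analysis method reported by the trial
    n_randomised: int
    n_analysed: int
    n_lost_to_follow_up: int
    n_dropouts: int
    lost_handling_described: bool    # handling of patients lost to follow-up described?
    dropout_reasons_described: bool  # reasons for dropout adequately described?


def classify_itt(t: TrialReport) -> str:
    # Inadequate: per-protocol, OC or completer analysis, or < 90% of randomised
    # patients analysed (integer arithmetic avoids rounding at the boundary).
    if t.method in NON_ITT_METHODS or 10 * t.n_analysed < 9 * t.n_randomised:
        return "inadequate"
    if t.method in IMPUTING_METHODS:
        # Adequate (simplified): >= 90% analysed, covering both "all randomised"
        # and "all with a post-baseline measurement", plus a described handling
        # of patients lost to follow-up.
        if t.lost_handling_described:
            return "adequate"
        # Acceptable: <= 10% lost to follow-up, <= 30% dropouts, reasons described.
        if (10 * t.n_lost_to_follow_up <= t.n_randomised
                and 10 * t.n_dropouts <= 3 * t.n_randomised
                and t.dropout_reasons_described):
            return "acceptable"
    return "unclassified"  # not covered by the criteria as stated


print(classify_itt(TrialReport("LOCF", 200, 190, 15, 50, False, True)))  # acceptable
```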

Effect size calculation

Standardised effect sizes were calculated as standardised mean differences of the change scores on the 17- or 21-item version of the HAM-D or on the MADRS. In addition, post-treatment scores were extracted for sensitivity analyses. The standardised mean difference (SMD) is the mean change score of the venlafaxine group minus the mean change score of the SSRI group, divided by the pooled standard deviation of the change scores. For dichotomous data, we calculated risk ratios (RR) and numbers needed to treat (NNT) or numbers needed to harm (NNH). Statistical significance was evaluated using the Mantel–Haenszel statistic with 95% confidence intervals (95%CI). The numbers of participants with response or remission were calculated including dropouts, using LOCF.
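The quantities defined above can be sketched for study-level data as follows. This Python example is illustrative and is not the Comprehensive Meta Analysis implementation used in the review. The function names are hypothetical. The SMD uses the pooled SD without the small-sample correction applied in Hedges’ g. The Mantel–Haenszel confidence interval uses the Greenland–Robins variance estimator for log RR, with 1.96 as the normal quantile. Because the Results report negative SMDs as favouring venlafaxine, the example assumes change scores are computed as post-treatment minus baseline.

```python
import math


def smd(mean_v, sd_v, n_v, mean_s, sd_s, n_s):
    """SMD: venlafaxine mean change minus SSRI mean change, over the pooled SD."""
    pooled_var = ((n_v - 1) * sd_v ** 2 + (n_s - 1) * sd_s ** 2) / (n_v + n_s - 2)
    return (mean_v - mean_s) / math.sqrt(pooled_var)


def risk_ratio(events_v, n_v, events_s, n_s):
    """RR of an event (e.g. remission) for venlafaxine relative to SSRI."""
    return (events_v / n_v) / (events_s / n_s)


def number_needed(events_v, n_v, events_s, n_s):
    """NNT (or NNH, for harmful events) = 1 / absolute risk difference."""
    risk_diff = events_v / n_v - events_s / n_s
    return math.inf if risk_diff == 0 else 1 / abs(risk_diff)


def mantel_haenszel_rr(tables, z=1.96):
    """Pooled fixed-effect RR over tables of (events_v, n_v, events_s, n_s)."""
    r = s = p = 0.0
    for a, n1, c, n2 in tables:
        n = n1 + n2
        r += a * n2 / n
        s += c * n1 / n
        p += (n1 * n2 * (a + c) - a * c * n) / n ** 2
    rr = r / s
    se_log = math.sqrt(p / (r * s))  # Greenland-Robins SE of log(RR_MH)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical numbers, for illustration only (not data from the review).
print(round(smd(-14.0, 8.0, 100, -13.0, 8.0, 100), 3))   # -0.125
print(round(risk_ratio(45, 100, 40, 100), 3))            # 1.125
print(round(number_needed(45, 100, 40, 100), 1))         # 20.0
```

With a single table, the Mantel–Haenszel estimate reduces to the crude RR. Pooling only pays off when several trials with different arm sizes are combined.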

Missing standard deviations (SDs) were imputed for each trial by calculating the pooled SD from all other studies that used the same depression scale in the same version. Borrowing SDs from other studies in this way has been shown to be safe and appropriate, with a low risk of bias (Furukawa et al. 2006; Thiessen Philbrook et al. 2007). For some studies, HAM-D values had to be estimated from figures in the publications.
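The imputation step can be sketched as follows. The review does not state how the SDs of the other studies were pooled. This Python example assumes the usual degrees-of-freedom-weighted pooled SD and treats each study arm as a separate record. The field names (`study`, `scale`, `n`, `sd`) are hypothetical.

```python
import math


def pooled_sd(arms):
    """Degrees-of-freedom-weighted pooled SD over arms with known SDs."""
    num = sum((a["n"] - 1) * a["sd"] ** 2 for a in arms)
    den = sum(a["n"] - 1 for a in arms)
    return math.sqrt(num / den)


def impute_missing_sds(arms):
    """Fill each missing SD from all *other* studies using the same scale version."""
    completed = []
    for arm in arms:
        if arm["sd"] is not None:
            completed.append(dict(arm, imputed=False))
            continue
        donors = [a for a in arms
                  if a["sd"] is not None
                  and a["scale"] == arm["scale"]
                  and a["study"] != arm["study"]]
        if not donors:
            raise ValueError(f"no donor studies for scale {arm['scale']!r}")
        completed.append(dict(arm, sd=pooled_sd(donors), imputed=True))
    return completed


arms = [
    {"study": "A", "scale": "HAMD17", "n": 101, "sd": 6.0},
    {"study": "B", "scale": "HAMD17", "n": 101, "sd": 8.0},
    {"study": "C", "scale": "HAMD17", "n": 50, "sd": None},
    {"study": "D", "scale": "MADRS", "n": 80, "sd": 10.0},
]
print(round(impute_missing_sds(arms)[2]["sd"], 3))  # 7.071
```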

We primarily used a fixed effects model for the pooled analyses because two drug classes were evaluated, but as part of the heterogeneity analysis, we compared the results with those of random effects models. The random effects model assumes a different underlying effect for each study and treats this as an additional source of variation (Egger et al. 2001). The fixed effects model attributes this variability exclusively to random sampling variation, and individual studies are simply weighted by their precision. To quantify heterogeneity between studies, we used the I² statistic, with values of 25% and 75% indicating low and high heterogeneity, respectively (Higgins et al. 2003). Statistical significance of heterogeneity was tested with chi-square tests. We defined substantial heterogeneity between study results as I² > 50% or p < 0.2. As measures of tolerability, RRs were calculated for the proportion of patients failing to complete the study (overall dropout rate) and for dropouts due to adverse events.
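The pooling and heterogeneity quantities described above can be illustrated with generic inverse-variance formulas. This Python sketch illustrates the ideas and does not reproduce the review’s computation. It pools effect estimates on an additive scale (e.g. SMDs or log RRs) with known variances. It assumes the DerSimonian–Laird estimator for the between-study variance, a common default for random effects models. I² is computed as (Q − df)/Q, truncated at zero. The p value of Cochran’s Q comes from a closed-form chi-square tail for integer degrees of freedom. The `substantial` flag applies the review’s threshold of I² > 50% or p < 0.2.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def pool(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooling with Q, I^2 and a flag."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed_est = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed_est) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p_q = chi2_sf(q, df)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)          # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed_est, "se_fixed": math.sqrt(1.0 / sw),
        "random": random_est, "se_random": math.sqrt(1.0 / sum(w_re)),
        "Q": q, "p_Q": p_q, "I2": i2, "tau2": tau2,
        "substantial": i2 > 0.5 or p_q < 0.2,  # the review's definition
    }


res = pool([0.0, 1.0], [0.01, 0.01])
print(round(res["I2"], 2), res["substantial"])  # 0.98 True
```

When Q does not exceed its degrees of freedom, τ² is zero and the random effects estimate coincides with the fixed effect one. This mirrors the Results, where low heterogeneity changed the estimates only marginally.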

Sensitivity analyses

We performed a variety of sensitivity analyses to determine whether results were altered by excluding certain studies or by using different types of data or statistical methods. We prospectively defined several sub-groups for analysis: (1) age of participants (non-geriatric vs geriatric), with a cutoff of 65 years; (2) type of trial (active control-only vs active- and placebo-controlled); (3) adequate vs acceptable ITT analysis; (4) use of change scores vs post-treatment scores in the standardised mean difference analysis; (5) inpatients vs outpatients; (6) fixed-dose vs flexible-dose venlafaxine; and (7) venlafaxine as experimental vs control substance. After the efficacy analyses had been completed, we added further sensitivity analyses comparing studies that did or did not use the standard response or remission criteria (see the “Study selection” section) and studies with or without data derived from figures. Furthermore, we evaluated the influence of dosage on effect size by means of meta-regression. Because the mean daily dosage at endpoint was reported in only some of the flexible-dose studies, we performed two separate meta-regression analyses, one with mean daily dosage and one with maximum allowed dosage as the predictor. A further regression analysis examined the influence of the year of publication on effect size. A funnel plot was used to evaluate possible publication bias.
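The dose meta-regression can be sketched as a weighted least-squares regression of study effect sizes on a single study-level covariate. This Python example is a simplified fixed-effect meta-regression with inverse-variance weights and a normal-approximation p value. The review’s software may have used a different estimator, for example one with a random effects term. The doses and SMDs in the usage lines are hypothetical.

```python
import math


def meta_regression(effects, variances, covariate):
    """Inverse-variance weighted regression of effect size on one covariate."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    if sxx == 0:
        raise ValueError("covariate does not vary across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # assumes within-study variances are known
    p_value = math.erfc(abs(slope / se_slope) / math.sqrt(2))  # two-sided normal test
    return slope, intercept, se_slope, p_value


# Hypothetical SMDs regressed on maximum allowed venlafaxine dose (mg/day).
doses = [150, 225, 300, 375]
smds = [-0.05, -0.12, -0.02, -0.10]
print(meta_regression(smds, [0.02] * 4, doses))
```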

We searched for other reviews with quantitative data synthesis and re-evaluated the recent comprehensive analysis of Smith et al. (2002) using our own study inclusion criteria, but restricting the studies to the period up to 2000, when that review had been undertaken. For both evaluations, we included full-text publications only. Where Smith et al. (2002) had included only abstracts, we searched for the corresponding full-text publications. We then compared our own results with those of this review and other previous analyses, and discussed the reasons for the differences in effect size estimates and conclusions.

Data were analysed using Comprehensive Meta Analysis Version 2.2 (Borenstein et al. 2005).

Results

Study characteristics

We identified 28 studies comparing venlafaxine with other anti-depressants as potentially appropriate. Seventeen studies had an SSRI control group and were included in this meta-analysis (Allard et al. 2004; Alves et al. 1999; Bielski et al. 2004; Clerc et al. 1994; Costa e Silva 1998; Dierick et al. 1996; McPartlin et al. 1998; Mehtonen et al. 2000; Montgomery et al. 2004; Nemeroff and Thase 2007; Rudolph and Feiger 1999; Schatzberg and Roose 2006; Shelton et al. 2006; Silverstone and Ravindran 1999; Sir et al. 2005; Tylee et al. 1997; Tzanakaki et al. 2000) (see the Electronic supplementary material). We included 15 studies with adult non-geriatric patients and 2 studies with geriatric patients aged at least 65 years. Among the studies of non-geriatric populations, three were active- and placebo-controlled trials with 3 treatment arms and 12 were active control-only trials with 2 treatment arms. Of the two studies in older patients, one was active- and placebo-controlled and one was active control-only. Study duration was between 6 and 24 weeks, with most studies (14 trials) having a double-blind duration of 6–8 weeks. Trial size varied between 68 and 382 patients. One study compared venlafaxine with citalopram, two with escitalopram, ten with fluoxetine, one with paroxetine and three with sertraline. Baseline HAM-D scores varied considerably between studies. For example, among the non-geriatric studies, the mean baseline 17-item HAM-D score in one study was 20.4 in the venlafaxine group and 19.9 in the escitalopram group (Montgomery et al. 2004), whereas in another it was 29.1 (venlafaxine) and 29.7 (fluoxetine) (Clerc et al. 1994; see the Electronic supplementary material).

All studies reported an ITT analysis using the LOCF method. However, only seven studies were judged to have an adequate ITT analysis (Allard et al. 2004; McPartlin et al. 1998; Montgomery et al. 2004; Nemeroff and Thase 2007; Rudolph and Feiger 1999; Schatzberg and Roose 2006; Sir et al. 2005). The randomisation method was adequately reported in only four studies (Rudolph and Feiger 1999; Sir et al. 2005; Tylee et al. 1997; Schatzberg and Roose 2006), and only one study reported methods to ensure adequate allocation concealment beyond the blinding of patients and physicians (Schatzberg and Roose 2006). Funnel plots did not show signs of publication bias.

Remission rates

Fifteen studies could be included in the pooled remission analysis. In two studies (Tylee et al. 1997; Montgomery et al. 2004), only MADRS data were available, whereas in 12 studies, the HAM-D was used alone or as part of a combined outcome parameter to calculate remission rates. Remission definitions differed considerably between studies.

Figure 1 shows the final pooled remission rates. The RR for remission was 1.07 (95%CI = 0.99 to 1.15, NNT = 34). There was no statistically significant difference between venlafaxine and the SSRI group. Heterogeneity between studies was low and non-significant (I² = 18%, p = 0.248). Therefore, using a random effects model changed the results only marginally (RR = 1.04, 95%CI = 0.96 to 1.13).

Fig. 1 Mantel–Haenszel RR for remission; fixed effects model

Response rates

For response analysis, we could include all 17 studies. For all but two studies (Allard et al. 2004; Montgomery et al. 2004), HAM-D scores were used to calculate the response rates. Response was defined as a reduction of at least 50% in HAM-D score or as a reduction of at least 50% in HAM-D or MADRS score and a CGI improvement score of 1 or 2.

The RR for response was 1.06 (95%CI = 1.01 to 1.12, NNT = 27), with venlafaxine marginally superior to the SSRI group (Fig. 2). Heterogeneity was moderate (I² = 32%) and significant according to our definition (p = 0.099). With the more conservative random effects approach, the point estimate changed only marginally (RR = 1.05, 95%CI = 1.00 to 1.10), but the difference was no longer statistically significant (p = 0.053).

Fig. 2 Mantel–Haenszel RR for response; fixed effects model

Effect size: standardised mean difference

All but one study provided data to calculate the standardised mean differences (SMD) in HAM-D or MADRS change scores. The SMD was −0.09 (95%CI = −0.16 to −0.02, p = 0.013) in favour of venlafaxine (Fig. 3). Heterogeneity was moderate (I² = 39%; p = 0.063), but statistical significance did not change when a random effects model was applied (SMD = −0.10, 95%CI = −0.19 to −0.02, p = 0.02).

Fig. 3 Standardised mean difference of change scores; fixed effects model

There was no difference in the overall symptom score between venlafaxine and SSRIs when post-treatment scores were used (SMD = −0.06, 95%CI = −0.13 to 0.00). The effect was homogeneous (I² = 0%, p = 0.582); therefore, a random effects model yielded nearly the same result (SMD = −0.06, 95%CI = −0.13 to 0.00, p = 0.07).

Tolerability

For the analysis of total treatment discontinuation rates, we had to exclude the study of Allard et al. (2004) because dropout rates were not reported separately for the study arms. However, all studies could be included in the analysis of dropouts due to adverse effects. The total rate of treatment discontinuation did not differ between venlafaxine and SSRIs (RR = 1.05, 95%CI = 0.93 to 1.20, NNH = 100). However, there were significantly more dropouts due to adverse effects in the venlafaxine group, with an RR of 1.38 (95%CI = 1.08 to 1.77, NNH = 32) in favour of SSRIs (see the Electronic supplementary material).

Sensitivity analyses

Including only studies with adequate ITT analysis according to our criteria (Allard et al. 2004; McPartlin et al. 1998; Montgomery et al. 2004; Nemeroff and Thase 2007; Rudolph and Feiger 1999; Schatzberg and Roose 2006; Sir et al. 2005) did not change the results substantially. In these studies, none of the outcome parameters showed an advantage of venlafaxine (Table 1). In studies with flexible venlafaxine dosage regimens, remission and response rates and effect sizes were slightly higher than in the SSRI group, although these results have to be interpreted with caution because of the low number of studies with fixed doses. None of the sub-group analyses showed a statistically significant difference between sub-groups (Table 1).

However, when venlafaxine was used as the experimental substance, an advantage was seen in all outcome parameters, whereas none of the analyses showed a difference between venlafaxine and SSRIs when venlafaxine was used as the comparator. Confounding effects (e.g. publication year) could not be evaluated because of the low number of studies. There were no significant differences in remission or response rates between studies using the pre-defined standard response and remission criteria and those using different cutoff points, composite endpoints or other definitions. Furthermore, there were no significant differences in SMD change scores between studies with fully reported data (SMD = −0.06, 95%CI = −0.14 to −0.03) and studies with data derived only from figures (SMD = −0.15, 95%CI = −0.26 to −0.03).

Table 1 Results of sub-group analyses

Meta-regression showed no influence of mean daily dosage (p = 0.732), maximum allowed dosage (p = 0.690) or publication year (p = 0.742) on remission rates or any other outcome (see the Electronic supplementary material for an example).

A re-analysis of the Smith et al. (2002) data applying our inclusion and exclusion criteria yielded eight SSRI-controlled venlafaxine studies, with an overall RR for remission of 1.11 (95%CI = 1.01 to 1.23), or an odds ratio (OR) of 1.22 (95%CI = 1.01 to 1.47). These results differed from those of Smith et al. (2002), who had included 16 studies with an SSRI comparison group and reported an OR of 1.43 (95%CI = 1.21 to 1.71) favouring venlafaxine vs SSRIs. A re-analysis of the response data of the 11 studies included in Smith et al. (2002) that met our inclusion criteria resulted in an OR of 1.26 (95%CI = 1.07 to 1.49) favouring venlafaxine, exactly matching the point estimate of Smith et al. (2002) (OR = 1.26, 95%CI = 1.02 to 1.58). Re-calculating the post-treatment score data of the Smith et al. (2002) analysis using our inclusion criteria showed no statistically significant superiority of venlafaxine: the overall SMD of the ten studies was −0.08 (95%CI = −0.17 to 0.00) vs SSRIs. This contrasted with the pooled effect size of −0.17 (95%CI = −0.27 to −0.08) favouring venlafaxine vs SSRIs in Smith et al. (2002).
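Because both this re-analysis and Smith et al. (2002) report ORs alongside RRs, it may help to recall a general property of the two measures. For common outcomes such as remission or response, an OR lies further from 1 than the RR computed from the same data. The Python sketch below uses hypothetical counts, not data from either review, to illustrate this.

```python
def risk_ratio(events_1, n_1, events_2, n_2):
    """Ratio of event proportions in group 1 vs group 2."""
    return (events_1 / n_1) / (events_2 / n_2)


def odds_ratio(events_1, n_1, events_2, n_2):
    """Ratio of event odds in group 1 vs group 2."""
    return (events_1 / (n_1 - events_1)) / (events_2 / (n_2 - events_2))


# Hypothetical remission counts: 45/100 (venlafaxine) vs 40/100 (SSRI).
rr = risk_ratio(45, 100, 40, 100)
or_ = odds_ratio(45, 100, 40, 100)
print(f"RR = {rr:.3f}, OR = {or_:.3f}")  # RR = 1.125, OR = 1.227
```

ORs and RRs should therefore not be compared directly across reviews when event rates are high.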

Discussion

In this systematic review of randomised controlled studies, we found no benefit of venlafaxine over SSRIs in remission rates and a small superiority in response rates. There were no differences in effect size when HAM-D and MADRS post-treatment scores were compared. The quality of most included trials was at best moderate, with detailed information lacking on the blinding of patients, physicians and outcome raters, on allocation concealment and on statistical analysis. The results of our pooled analysis provide no solid evidence of a superiority of venlafaxine over SSRIs. This agrees with the NICE depression guideline, which concluded that venlafaxine was no more effective in treating depression than other anti-depressants (National Institute for Health and Clinical Excellence 2004, p. 220), but contrasts with other published reviews in the field.

We could not identify a Cochrane Review comparing venlafaxine with SSRIs. In one Cochrane Review, venlafaxine was counted as an SSRI, and only studies comparing venlafaxine with older anti-depressants were evaluated (Geddes et al. 2006). We found seven reviews with formal meta-analysis of randomised controlled trials of venlafaxine (Rudolph 2002; Shelton et al. 2005; Smith et al. 2002; Thase et al. 2001; Einarson et al. 1999; Machado et al. 2006; Hansen et al. 2005), three of which did not perform a systematic literature search (Rudolph 2002; Shelton et al. 2005; Thase et al. 2001). The results of these non-systematic reviews should therefore be regarded with caution, because an unbiased sample of studies, selected with pre-defined inclusion and exclusion criteria on the basis of a systematic search of relevant electronic databases, is a necessary prerequisite of a high-quality meta-analysis. One review included only six studies comparing fluoxetine vs venlafaxine, and it remains unclear why many other relevant studies were not considered (Hansen et al. 2005). In one meta-analysis, response data from studies of venlafaxine extended release (XR), SSRIs or tri-cyclic anti-depressants (TCAs) were searched for and compared (Einarson et al. 1999); however, response data from placebo-controlled venlafaxine studies and from active-control studies were evaluated jointly. In a review with narrow inclusion criteria comparing remission, dropout and adverse drug reaction rates of SNRIs, SSRIs and TCAs (Machado et al. 2006), only four venlafaxine studies were included. In this analysis, remission rates were not taken from comparable active-controlled studies but were pooled indirectly using non-contemporary control groups.

The systematic review by Smith et al. (2002) included 16 studies for the remission analysis, 17 studies for the response analysis and 19 studies for the effect size analysis of venlafaxine vs SSRIs. However, one study included by Smith et al. (2002) (Rudolph et al. 1998) was only placebo-controlled and therefore did not fulfil our inclusion criteria. A second study (Alves et al. 1999) presented remission rates only at 3 weeks and not at the final endpoint of 12 weeks; therefore, only its response rate data were considered in our analysis. Furthermore, two studies included only in Smith et al. (2002) enrolled a considerable number of patients with dysthymia (Ballus et al. 2000) or bipolar disorder (Zanardi et al. 2000), which limits their relevance for major depression. With one exception (Zanardi et al. 2000), the ORs for remission in studies included by Smith et al. (2002) but excluded from our analysis were above 1.40. In contrast, the ORs for remission in studies included in both reviews were below 1.40, with three exceptions (Schatzberg and Roose 2006; Rudolph and Feiger 1999; Mehtonen et al. 2000). This trend towards higher ORs or larger effect sizes in studies included only by Smith et al. (2002) also held for the response analysis and the SMD effect size analysis. Some studies included in both reviews reported more than one remission definition. Although it was stated that preference had been given to the HAM-D scale, this was not done consistently. For example, one publication reports remission rates of 60.2% in both the venlafaxine and the fluoxetine groups (Costa e Silva 1998); the OR for remission of 1.15 in favour of venlafaxine calculated for this study by Smith et al. (2002) appears to result from the use of combined outcome parameters rather than the HAM-D scale. For another study, Smith et al. (2002) used an OR of 0.54 instead of the published value of 1.76 in the response analysis (Dierick et al. 1996). For one study (Mehtonen et al. 2000), OC data were used, resulting in higher ORs for remission. In one study included by Smith et al. (2002), neither DSM, ICD nor RDC criteria were used to establish the diagnosis of depression (Geerts et al. 1999). Finally, we did not include the data of Salinas and Venlafaxine-XR-Study-Group (1997) because they were available only as an abstract.

As Kavirajan (2004) mentions, the data on over half of the patients included in the Smith et al. (2002) analysis were taken from studies that had not been published as articles in peer-reviewed journals or were available as abstracts only. Excluding abstracts without full-text publications contributed to the differences in results, as abstract-only publications tended to have larger effect sizes than full-text studies in the Smith et al. (2002) analysis.

The manufacturer of venlafaxine provided funding for five of the previous reviews (Rudolph 2002; Shelton et al. 2005; Smith et al. 2002; Thase et al. 2001; Einarson et al. 1999). One possible advantage of this source of funding is that authors may have access to unpublished data. Indeed, the so-called file-drawer effect, i.e. researchers’ tendency to publish positive findings preferentially and not to submit “negative trials”, may be a major source of publication bias, potentially inflating effect sizes (Khan et al. 2002). This must be weighed against the potential lack of transparency (Jorgensen et al. 2006).

Excluding industry data prevented a full re-analysis of Smith et al. (2002) and may be seen as a limitation. Instead, we applied strict inclusion criteria to re-evaluate the frequently cited meta-analysis of Smith et al. (2002) in order to detect sources of variation. The results showed that the differences were not due to the newer studies published after 2000. However, our sub-group analysis indicated that the apparent efficacy of venlafaxine differed depending on whether it was used as the experimental or the comparator drug. Although the difference between these sub-groups was not significant, remission rates and effect sizes favoured venlafaxine over the comparator anti-depressants when venlafaxine was the experimental substance. As previously shown for fluoxetine (Barbui et al. 2004), these results may be interpreted as ‘wish bias’, which suggests that this kind of sub-group analysis has a role in systematic reviews of anti-depressants. The regression analyses did not show a dose–response relationship. However, because most studies used flexible doses, we could not establish whether a differential efficacy associated with dual action emerges at doses above 150 mg.

In this analysis, SSRIs are treated as a drug class. However, there may be differences in efficacy or effectiveness among them that are lost in the overall comparison with venlafaxine because of the small number of head-to-head studies. For example, there is some published evidence for a greater efficacy of escitalopram vs other SSRIs (Kennedy et al. 2006). Furthermore, the overall result could obscure a clinically important difference in sub-groups defined by depression severity, as we could not perform a sensitivity analysis on baseline depression severity. It is well known that in milder depression a drug–placebo difference may be hard to demonstrate, and a difference between two active treatment arms may be even harder to detect. Effects may become clearer in trials that include patients with more severe depression (Khan et al. 2002).

Excluding studies that do not meet certain quality standards from a systematic review is a controversial issue. There is evidence that randomisation, adequate allocation concealment and adequate ITT analysis are important to prevent bias (Moher et al. 1998). However, there is no consensus on how to integrate methodological quality of RCTs in systematic reviews (Moja et al. 2005). We decided not to include studies with major methodological problems. Excluding studies diminishes statistical power. However, this must be weighed against the potential bias introduced by these studies.

Tolerability data showed no difference between venlafaxine and SSRIs in overall dropout rates. However, the higher rate of study withdrawal due to adverse events adds to the argument that, for many patients, the risks of venlafaxine may outweigh its claimed benefits (Cipriani et al. 2007). Larger studies may be needed to identify patient characteristics that predict a differential benefit compared with other substances.

In conclusion, our review does not support the notion that the dual re-uptake inhibitor venlafaxine is clinically superior to SSRIs or offers a better trade-off between efficacy and side effects. Among the most important reasons for the differences in remission and response rates between our analysis and previous reviews were:

  • exclusion of studies with low methodological quality;

  • avoidance of pooling selectively reported study results or data from endpoints not prospectively defined and

  • exclusion of abstract-only studies.

Manufacturer sponsorship may have been a further reason for the differences in results.