1 Background

Advances in science have intensified the urgency of carefully examining the concept of enhancement, the use of pharmaceutical or technical interventions for the improvement of capacities or states of healthy individuals (Schöne-Seifert and Talbot, personal communication). Moreover, advances in neuroscience have widened the debate by introducing the concept of neuroenhancement. Humans have always been interested in amplifying their cognitive capacities, and the pursuit of happiness is anything but new. Yet nowadays, more than ever, ways to achieve these purposes are being researched and developed. While the technical interventions (e.g. neuroprosthetics, brain–computer interfaces) are about to leave the realms of science fiction, pharmaceutical interventions are already factual. Here and now modern medicine, often while failing to notice it, offers several possibilities for neuroenhancement, which is short for the enhancement of cognitive, emotional and motivational functions (Schöne-Seifert and Talbot, personal communication).

At the forefront of this debate are the modern antidepressants. The psychiatrist Peter Kramer noted that Prozac® (fluoxetine) and the other selective serotonin reuptake inhibitors (SSRIs) might have a (possibly positive) effect on the mood and personality of individuals in the absence of mood or personality disorder and coined the term “cosmetic psychopharmacology” (Kramer 1993). However, next to no research supports or opposes this assumption, and today a large number of different classes of antidepressants are available that could possibly serve as neuroenhancers (Cerullo 2006). Antidepressants mainly target emotional states to achieve a form of mood enhancement, but they may also affect motivational and cognitive functions, which raises the question of their effect as cognitive enhancers and thus, more broadly, as neuroenhancers. Obviously, there exists a broad range of other drugs affecting mainly cognition that could be used for neuroenhancement purposes. These include, among others, psychostimulants, such as methylphenidate (Ritalin®), which have already been shown to enhance performance in particular tests (Brown 1977; Camp-Bruno and Herting 1994; Cooper 2005) after a single dose and have also been extensively misused (Babcock and Byrne 2000; McCabe et al. 2005), and anti-dementia drugs such as acetylcholinesterase inhibitors, which in long-term use emerge as a possibility for people who strive to keep at the top of their performance (Ihl 2003). These preparations for cognitive enhancement pose equally interesting questions and also fall within the scope of our research project, but the focus of this paper is on modern antidepressants.

In the last decades, extensive research has led to the development of new generations of antidepressants that have fewer side effects and are selective for one or two neurotransmitters. This led to the reversible inhibitors of monoamine oxidase A (RIMA), such as moclobemide, which were developed primarily to increase safety, since they were devoid of severe cardiovascular side effects and food–drug interactions. The big boost in the popularity of antidepressants, however, was certainly triggered by the market introduction of the SSRIs (e.g. fluvoxamine, fluoxetine, paroxetine, sertraline, citalopram, escitalopram), the first drugs selective for serotonin. These compounds had an even more favourable side-effect profile, so they were also prescribed for less ill patients. They were followed by the serotonin-norepinephrine reuptake inhibitors (SNRIs: e.g. venlafaxine and duloxetine) and the norepinephrine reuptake inhibitors (NRIs) such as reboxetine. Finally, one should also mention bupropion, a norepinephrine-dopamine reuptake inhibitor also in use as a smoking cessation aid, and its newest sustained- and extended-release formulations (Raymond et al. 2007).

Surveys indicate that antidepressant use has increased rapidly in most developed countries (McManus et al. 2000; Olie et al. 2002; Raymond et al. 2007). This trend is driven not only by the availability and commercial promotion of new antidepressants, but also by an increased awareness of depression. Since it is known that in the past depression remained to a large extent undiagnosed and that the incidence of depression has risen (World Health Organisation 2001), health care providers are nowadays more prepared to screen for, diagnose and treat depression (Alonso and Lepine 2007). Furthermore, patients more readily accept SSRIs and their successors, not because they are more effective than traditional antidepressants, but because they are much better tolerated (Mulrow et al. 2000). Hence, the willingness to pharmaceutically treat milder forms of depression has increased, and the availability of new medication has led, according to some, to the reassessment of mild and moderate depression (Slingsby 2002). Just which human problems are called medical is an important social decision and we may choose to define less severe mood states as pathology. In any case, as many are warning, there may well be economic and political interests furthering the broadening of disease limits (Healy 2004).

Hardly anyone would oppose the use of medication for curing an illness, but some of the new antidepressants seem to go further than that. In his book, Listening to Prozac, Kramer (1993) described how some patients who had completed a course of Prozac® to relieve their complaints wished to continue taking it, although—medically speaking—they were no longer ill. While taking Prozac®, the patients felt “better than well” because, in their view, next to relieving their often uncertain and mild medical condition, the medication improved various aspects of their personality which never had been considered part of their illness. Shy persons became more assertive and people with low self-esteem more confident. Hence, a new trend of “cosmetic psychopharmacology” emerged, comprising people who, although never having been “ill”, still wished to benefit from the possibilities that the new drugs seemed to offer (cf. http://www.biopsychiatry.com, August 2007; http://www.transhumanism.org/index.php/WTA/index, August 2007). This trend probably contributed considerably to the marketing success of these drugs and led to a further debate regarding the appropriateness of prescribing a drug like Prozac® for someone who is not suffering from any medically-recognised condition, but who simply wants to improve well-being or personality (Bostrom and Roache 2007). It is this consumption of drugs by normally functioning people for non-therapeutic purposes which has been labelled as enhancement. “Normal”, however, is a relative term and needs to be defined according to its context (Slingsby 2002). Since the limit between normal and malfunctioning has to be drawn in a complex process of standardization, rather than being self-evident, this may lead, not only to the acknowledgement of mild forms of certain psychiatric disorders, but also to the redefinition of these disorders themselves. 
Nevertheless, whether we see it as an expansion of the illness criteria or as an enhancement in the absence of illness, the ethical and legal issues that arise have to be addressed on the basis of all the available evidence.

On this basis, in the larger framework of an interdisciplinary research project on pharmaceutical neuroenhancement (http://www.ea-aw.de/de/projektgruppen/projektuebersicht/pharmazeutisches-enhancement.html, May 2008) we set out to collect and analyse the pieces of evidence on the effect of antidepressants in healthy individuals. If antidepressants in trials are able to show a positive effect in healthy test subjects, then the question arises as to how their potential use for neuroenhancement purposes can be regulated. If no evidence of effect can be found in the existing literature, then their ability to enhance healthy people should be proven before we get engaged in further debate about neuroenhancement with these particular drugs. However, even if the lack of effect or even a negative effect in healthy people can be shown, we may find that some people are still willing to use them, simply because of an anticipated positive effect.

Unfortunately, the medical community has failed to follow the debate to its full extent. In line with the traditional role of medicine, to treat and to prevent illness, medical research has only partially and inadvertently addressed the question of the effectiveness of antidepressants in healthy individuals. The studies are sparse and in many cases not rigorous, and the results are ambiguous. In order to shed light on the existing evidence, we applied a methodology mostly used for solving clinical problems. When confronted with an arguable clinical question, evidence-based medicine gives a clear answer and guidelines through a systematic review. This is “a review of the evidence on a clearly formulated question that uses systematic and explicit methods to identify, select and critically appraise relevant primary research, and to extract and analyse data from the studies that are included in the review” (Center for Reviews and Dissemination 2001). Systematic reviews are “scientific investigations in themselves, with pre-planned methods and an assembly of original studies as their ‘subjects’. They synthesize the results of multiple primary investigations by using strategies that limit bias and random error” (Cook et al. 1997). These strategies include a comprehensive search of all potentially relevant articles and the use of explicit, reproducible criteria in the selection of articles for review (Cook et al. 1997). Although not a clinical problem, neuroenhancement is also a controversial topic, and therefore a systematic review of trials of antidepressants in healthy individuals was needed. Hence, we conducted such a review according to a pre-defined protocol. The methodology is documented in the next sections (Center for Reviews and Dissemination 2001; Egger et al. 2001; Higgins and Green 2006).

2 Objectives

The aim of this review was to assess the effect and safety of modern antidepressants for emotional, cognitive and motivational enhancement in healthy individuals. Antidepressants obviously affect mood; however, we chose not to narrow our research question to their effect as mood enhancers, but to look at their general effect as neuroenhancers.

3 Criteria for considering studies for this review

3.1 Types of studies

Included were all published single or double blind randomised or quasi-randomised controlled clinical trials, including cross-over clinical trials, that compared one or more of the antidepressants bupropion, citalopram, duloxetine, escitalopram, fluoxetine, fluvoxamine, moclobemide, paroxetine, reboxetine, sertraline or venlafaxine with a placebo.

3.2 Types of participants

Eligible studies were those involving healthy people of any age and either sex who showed no evidence of psychiatric disorder, cognitive decline or other disease.

3.3 Types of interventions

All interventions were carried out with the above-mentioned antidepressants in all doses and dosing schedules (single dose or multiple doses), for any duration and by any route of administration in comparison with a placebo.

3.4 Types of outcome measures

The primary outcomes of interest were outcome measures that had emotional, cognitive or motivational parameters, specifically: mood, wakefulness, attention, concentration, memory, learning and executive functions. The outcomes were not pre-defined any further. Secondary outcomes of interest were adverse effects and acceptability of the medication, measured by the number of people dropping out during the trial and the number of post-randomisation exclusions due to drug effects. Information on adverse effects from other types of studies (e.g. clinical trials with patients) was not included in the review.

4 Search methods for identification of studies

One author (DR), supported by a professional librarian, developed search strategies (available upon request) to identify potentially relevant studies. The MEDLINE and EMBASE databases were searched using the WebSPIRS® 5.12 search engine from OVID and no language restriction was applied. The search was performed in the first week of April 2007 (MEDLINE: 1950 to 2007/04-week 1, EMBASE: 1989 to 2007/03), with an additional search for newly published articles in the third week of July 2007. Reference lists from relevant primary and review articles were examined for additional studies. Furthermore, pharmaceutical companies were contacted for information on published and unpublished studies. This was done in March 2008 through the German Association of Research-based Pharmaceutical Companies (Verband Forschender Arzneimittelhersteller e.V., http://www.vfa.de), an organisation representing 45 research-conducting companies. The request was forwarded to the members of the association. Finally, experts who have conducted reviews on the topic in the past were contacted for further information and guidance.

5 Methods of the review

5.1 Selection of studies

The studies obtained through the search strategy were screened, and those which were clearly irrelevant were discarded on the basis of their title and abstract. The remaining references were retrieved in hard copy and compared against the review’s inclusion criteria (DR). If there was any doubt whether an article should be excluded or not, the article was assessed by a second reviewer (OL) and disagreements were resolved by discussion.

5.2 Quality assessments

Methodological quality was assessed with regard to randomisation and the method of randomisation, blinding, allocation concealment, recruitment, outcomes, reporting of results, and data analysis using, among others, the criteria of the three-item, five-point Oxford Scale (Jadad scale) (Jadad et al. 1996). These criteria focus on three dimensions of internal validity: randomisation, blinding, and withdrawals. A trial reported as randomised earns one point; an additional point is awarded if the method used to generate the sequence of randomisation is described and appropriate (table of random numbers, computer generated, etc.), and one point is subtracted if it is inappropriate (subjects were allocated alternately, or according to date of birth, hospital number, etc.). The same applies for double-blinding. Finally, one point is given if there is a description of withdrawals and dropouts. This scale gives more weight to the quality of reporting than to actual methodological quality and has been criticised for that (Egger et al. 2001), but it is sufficient to give a rough estimate of quality and stratify a large number of studies. Hence, studies with a Jadad score of 3 are regarded as good quality studies, while scores of 0–2 correspond to low quality and 4–5 to high quality studies.
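
The scoring rule described above can be illustrated with a minimal sketch. The class and function names below are hypothetical and were not part of the review's tooling; the sketch only encodes the point rules and the quality bands (0–2 low, 3 good, 4–5 high) as stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialReport:
    """Hypothetical record of what a published trial reports."""
    randomised: bool
    # None = method not described, True = appropriate, False = inappropriate
    randomisation_method_appropriate: Optional[bool]
    double_blind: bool
    blinding_method_appropriate: Optional[bool]
    withdrawals_described: bool


def _item_score(reported: bool, method_appropriate: Optional[bool]) -> int:
    """One point if reported, +1 for an appropriate method, -1 for an inappropriate one."""
    if not reported:
        return 0
    if method_appropriate is True:
        return 2
    if method_appropriate is False:
        return 0
    return 1


def jadad_score(report: TrialReport) -> int:
    """Sum the three items: randomisation, double-blinding, withdrawals."""
    score = _item_score(report.randomised, report.randomisation_method_appropriate)
    score += _item_score(report.double_blind, report.blinding_method_appropriate)
    score += 1 if report.withdrawals_described else 0
    return score


def quality_band(score: int) -> str:
    """Map a Jadad score to the bands used in this review."""
    if not 0 <= score <= 5:
        raise ValueError("Jadad score must lie between 0 and 5")
    if score <= 2:
        return "low"
    if score == 3:
        return "good"
    return "high"
```

For instance, a trial reported as randomised with an appropriate method, double-blind without a described method, and with documented withdrawals would score 2 + 1 + 1 = 4 under this sketch.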

5.3 Data extraction

Four types of data were extracted from the published reports onto a pre-tested abstraction form in a Microsoft Office Excel® spreadsheet: (1) study characteristics, design and quality (e.g. randomisation, blinding); (2) population characteristics; (3) study interventions (e.g. drug, dosage, frequency, duration of study); and (4) results of relevant tests, with all their corresponding parameters (e.g. both time and accuracy in a reaction time test). For data processing, these tests were grouped into test clusters according to the predominant neuropsychological domain that they were assessing (Spreen and Strauss 1998) and these clusters were aggregated for further analysis into the main factors, namely outcomes.

For continuous data, the summary statistics required for each trial and each outcome were the mean, the standard deviation and the number of participants for each treatment group at each time point; these were extracted if available. If available, the mean change from baseline was considered in each group. The baseline assessment was defined as the latest available assessment prior to randomisation, but no longer than two months prior to it. For binary data, the number of people in each treatment group and the number of people experiencing the outcome of interest were to be sought. If the only data reported were the treatment effects and their standard errors, these were extracted.

The outcomes measured in clinical trials often arise from ordinal rating scales. Whenever the rating scales used in the trials had a reasonably large number of categories, the data were treated as continuous outcomes arising from a normal distribution.

5.4 Data analysis

Based on the means and standard deviations of each group, a standardised effect difference, namely Cohen’s d, was calculated for the relevant test parameters of each study. Additionally, the variance of Cohen’s d was calculated. Cohen’s d was chosen since it allows comparing results measured with different psychometric scales. In order to take heterogeneity and correlation within studies into account, a linear mixed model was used for data analysis. Based on this linear mixed model, a meta-analysis and a meta-regression were performed. The results report the heterogeneity variance which measures structural variability between studies together with regression coefficients for fixed effects such as time or dose. All analyses were performed with PROC MIXED of the statistical package SAS 9.1.
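
As an illustration of the effect size calculation, the following minimal sketch computes Cohen's d from group means, standard deviations and sample sizes, together with a common large-sample approximation of its variance. The exact formulas used in the review are not stated in the text, so the pooled standard deviation and the variance approximation below are assumptions; they apply to two independent (parallel) groups and do not account for the within-subject correlation of cross-over trials. The function names are hypothetical.

```python
import math


def cohens_d(mean_drug: float, sd_drug: float, n_drug: int,
             mean_placebo: float, sd_placebo: float, n_placebo: int) -> float:
    """Standardised mean difference using the pooled standard deviation (assumed)."""
    pooled_var = (
        (n_drug - 1) * sd_drug ** 2 + (n_placebo - 1) * sd_placebo ** 2
    ) / (n_drug + n_placebo - 2)
    return (mean_drug - mean_placebo) / math.sqrt(pooled_var)


def cohens_d_variance(d: float, n_drug: int, n_placebo: int) -> float:
    """Common large-sample approximation of the variance of d (assumed)."""
    n_total = n_drug + n_placebo
    return n_total / (n_drug * n_placebo) + d ** 2 / (2 * n_total)


# Illustrative numbers only, not data from any included study.
d = cohens_d(12.0, 2.0, 10, 10.0, 2.0, 10)
var_d = cohens_d_variance(d, 10, 10)
```

Because d is expressed in units of the pooled standard deviation, values obtained with different psychometric scales can be placed on a common footing before they enter the mixed model.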

6 Results of the search

Our search yielded 2,512 relevant titles from the MEDLINE and EMBASE databases (including some duplicate records, where the two databases overlapped). We retrieved 346 articles for full-text evaluation, together with those found through references. Although no language constraint was applied, all the relevant publications were in English or German. Of these articles, 135 met our inclusion criteria and their results are considered here for answering our research question. (Because of space limitation, studies included in the systematic review, but not cited in the text, are not listed in the reference list. A complete reference list, as well as a list of excluded studies, can be obtained from the corresponding author). Some of the studies applied more than one of the drugs in question or several doses and therefore are represented in the analysis more than once, as different trials coming from the same study. In contrast, in seven cases, two articles (each with different outcome measures) presented findings from the same cohort and so the relevant articles were considered a single study. A few publications presented the same results as others (usually as a short report or preliminary results) and were excluded as duplicate publications.

Of the 45 research-conducting companies contacted through the German Association of Research-Based Pharmaceutical Companies, three replied to the request for data. They either informed us that no data were available or guided us to studies that had already been traced through the search. Therefore, all the studies included in this systematic review are published articles that were found either through the targeted search or through the reference lists of reviews and of the identified studies.

7 Description and methodological quality of included studies

The included studies represent many diverse fields of research, from highly sophisticated functional magnetic resonance imaging (fMRI) and behavioural experiments, to trials examining the effects of newly introduced antidepressants on cognitive and psychomotor performance, often in view of side-effects and the fitness to drive. The following descriptive analysis is conducted on all articles included in this systematic review, comprising also those that did not yield quantitative information.

In order to evaluate the results, a first crucial point was the duration of each trial, in other words, whether the drug was given only once (and hence the trial is defined as a single dose trial) or for a longer period of time (in which case it is a repeated doses trial). Two other decisive aspects were the study design and the quality of a study. With regard to study design, the studies were divided into two categories: cross-over design studies and parallel design studies. In the former, the test subjects had been allocated, usually randomly, to a placebo and one or more medications for a period of time and then, after an adequate wash-out period, to the alternative drug condition, allowing for a within-subject comparison. In the latter, the subjects had been randomly divided into two groups taking a drug or a placebo and a between-subject comparison was made. Obviously, using a cross-over design is a drawback in such a psychopharmacological study. The active substances can have an effect on the neuroplasticity of the central nervous system (CNS) and in this case it could make a difference whether someone receives a drug or a placebo first. Moreover, if the placebo effect is disregarded, there is no way to make sure that these healthy subjects could not differentiate the drug from the placebo when they first took a potent drug and then an inactive placebo. For these reasons, cross-over studies are usually not taken into account in systematic reviews, or only the results from the first phase are considered (Egger et al. 2001; Higgins and Green 2006). However, since neither many nor particularly rigorous studies were expected in this field, the quality standards were set a priori so that cross-over trials were included. Three further quality criteria were randomisation, blinding and withdrawals, summarised roughly in the Jadad score. Again, in order not to miss any relevant evidence, no strict criteria were applied.

Although it is known that the main effects of antidepressants on clinical populations are seen only after several weeks, the majority of the studies were, surprisingly, single dose trials. In the 135 included studies, there were 135 trials assessing the acute effects of one dose, while 75 trials tested the effect over a longer period of drug assessment (Fig. 1, for detailed information on the repeated dose trials please refer to Table 3). In some studies with repeated drug intake, there was also an assessment after the first dose and therefore the results from these measurements were also included in the analysis of the single dose trials.

Fig. 1 Number of trials assessing the effect of drugs over a longer period of time (repeated drug intake)

Most of the trials employed a cross-over study design, while there were also 20 single dose and 18 repeated dose trials with a parallel design. There were also four studies with a short run-in placebo period, where all participants took a placebo for a short period before taking the medication for a longer time (in a single blind fashion) or being allocated to two medications (of which usually only one was of interest for our review). More details on the studies and their quality are shown in Table 1 and the number of trials conducted with each drug in Table 2. Finally, the different preparations were given orally, except for citalopram, which in four cases was also given as a single intravenous dose (Bhagwagar et al. 2004a, b; Del-Ben et al. 2005; Harmer et al. 2002, 2003). The drugs were usually given in therapeutic doses and produced few, usually mild and transient, side effects.

Table 1 Description of the studies included in the systematic review
Table 2 Number of trials conducted with each preparation
Table 3 Included studies–repeated dose studies

8 Outcomes

Although many different methods have been used to evaluate the effect of antidepressants, those relevant to our review were objective and subjective ratings and neuropsychological tests. The latter were categorised according to a catalogue of neurocognitive tests (Spreen and Strauss 1998) of different neuropsychological domains (attention, memory, etc.). In total, the assessments were grouped into (a) mood, (b) emotional processing and emotional memory, (c) wakefulness, (d) attention and vigilance, (e) memory and learning, and (f) executive functions and information processing. This categorisation was based, to some extent, on previous research on the surrogate markers for the effects of drugs in healthy subjects that was obtained through personal communication (Dumont et al. 2005). A brief description of the domains and the most commonly applied tests follows.

8.1 Mood

One of the primary outcomes in our research question was the change in mood after drug administration. This could be measured by actual mood assessments or other parameters that interact with mood. Several instruments have been applied to measure a person’s genuine mood and most of them are also applied in clinical settings. A first major distinction should be made between objective ratings (observer-rated instruments applied by a health care expert such as a psychiatrist or a psychologist) and subjective self-ratings. The former were applied only occasionally, whereas the latter were used in the majority of the cases. Nevertheless, before inclusion in the trial, in almost all of the studies the subjects were screened by a professional for past or current psychiatric disorder. The standard testing procedure was a self-reporting instrument that was administered at baseline and after drug or placebo application. Then the mean change from baseline for all the subjects under medication and placebo was measured and compared. In some cases there was no baseline assessment and the mean value after drug intake was compared with the mean value after placebo intake. The most commonly used instrument was a visual analogue scale (VAS) (or a derived factor of several VAS or scales of ascending numbers), on which the subjects reported their current state of mood. Most individual scales corresponded to (the individual VAS lines of) the sub-scale “contentment” proposed by Norris and validated for CNS drug evaluation by Bond and Lader (Dumont et al. 2005). These included instruments such as the von Zerssen Befindlichkeits scale, scales from the Profile Of Mood States (POMS) and the Positive And Negative Affect Scale (PANAS). Other outcomes that correlate with mood, although not measuring mood directly, have also been categorised in this domain.
Anxiety was measured by analogous instruments such as the scales of the Spielberger State Trait Anxiety Inventory (STAI) and the POMS anxiety scale. All of them corresponded to the sub-scale “calmness” of Bond and Lader (Dumont et al. 2005). Aggression was mostly assessed by the Buss-Durkee Hostility Inventory (BDHI) but also by other subjective ratings such as VAS and the POMS sub-scales on irritability, assertiveness, hostility and anger. Furthermore, following the idea that personality variables could change after antidepressant administration, in some studies subjective ratings on extraversion/introversion, attentiveness, friendliness and confidence, etc. have been performed. Finally, a number of observations were found where the effect of the drug had been measured indirectly. A main aspect of these was changes in social behaviour under medication. A description of these complicated behavioural tests is outside the scope of this work (Bond 2001, 2005). However, what was measured was, for instance, changes in a roommate relationship, in an interaction with a confederate or in the way the subject behaved in a mixed motive game or a dyadic puzzle task. All these individual mood-measuring instruments were grouped in this outcome to give an overall estimate of the effect of antidepressants on healthy volunteers.

8.2 Emotional processing and emotional memory

Several other studies focused on the question of whether antidepressants could affect the perception and processing of emotionally valent stimuli. Recently, pharmacological modulation of emotions has become an important field of research and a common approach has been to demonstrate disease-specific, or as in this case, drug-specific effects on the recognition of facial expressions (Serra et al. 2006; Venn et al. 2006). This research has been supported by the fact that certain fundamental expressions are innate and the expressions of anger, disgust, fear, happiness, sadness and, to some extent, surprise have been widely accepted to represent the six “basic” emotions. Within this context, the immediate and the chronic effects of antidepressants were tested. In analogous paradigms, valent words had to be categorised and pictures, standardised for their arousing and emotional effect, had to be rated. Another aspect of this question investigated in these psychopharmacological studies with antidepressants was the manipulation of emotional memory. This refers to the formation, consolidation and retrieval of memories formed during times of heightened emotional arousal or stress. In a typical paradigm, the subjects, after receiving a drug or a placebo, were presented with items such as words, photos or slides accompanied by a narrative, which were either emotionally neutral or positively or negatively valenced. Arousal was assessed through self-rating scales and, at a later time or date, an unexpected memory test was performed to evaluate whether the medication had an effect on the kind of material that was remembered (Chamberlain et al. 2006). The interpretation of these experiments was based on the assumption that a subtle change in emotional memory can be observed even in the absence of a measurable effect on subjective mood.
Nevertheless, such an effect could also have an impact on social adaptation and therefore these studies indirectly measure a desirable effect of drugs for which there could be a demand.

8.3 Wakefulness

The majority of psychoactive medications have sedating or arousing effects. Sedation is a condition generally conceived as decreased or suboptimal wakefulness (Schmitt et al. 2002b), while arousal can be defined as the state of psychological and physiological reactivity of the subject. The first generations of antidepressants have sedative effects and therefore the newest antidepressants have been vigorously tested in order to explore if they also possess such an effect. That was usually done with a VAS measuring sedation (and equivalences, e.g. drowsiness) or arousal (and equivalences, e.g. alertness), or else with corresponding parts of subjective ratings such as the POMS fatigue, vigour scales or the energy sub-scale of the Befindlichkeits Scale. All these corresponded to the “alertness-drowsiness” sub-scale of Bond and Lader (Dumont et al. 2005) and were grouped therefore under the outcome “wakefulness”.

From the different assessment tools applied to evaluate arousal, particular reference should be made to the repeatedly used Critical Flicker Fusion frequency test (CFF), which also provides an index of overall CNS activity. This test requires subjects to discriminate flicker from fusion in a flashing light and the individual thresholds (the frequency at which the change from flicker to fusion, or vice versa, is seen to occur) are determined. Drug-induced decrements in the CFF threshold are believed to indicate sedation, whereas elevation of the CFF threshold might reflect arousal. However, pupil diameter is an important determinant of the CFF threshold and it has been argued that a rise in the CFF threshold might also result from mydriasis (dilatation of the pupil) (Freeman and O’Hanlon 1995). The SSRIs in particular can cause an up to 2 mm increase in pupil size after single or repeated doses (Deijen et al. 1989; McGuirk and Silverstone 1990; Saletu and Grunberger 1988; Schmitt et al. 2002a). Therefore, only investigations of the CFF threshold controlling for pupil size changes with an artificial pupil could be considered and since, in most of the cases this precaution was not met, the results of the CFF had to be disregarded.

8.4 Attention and vigilance

Another cognitive process which interacts closely with arousal is attention, defined as the appropriate allocation of processing resources to relevant stimuli (Coull 1998). Several tests have been developed to evaluate the effect of drugs on attention. Most of them demand a rapid but simple motor response to a stimulus, usually a light. Scoring is done by measuring the reaction time (RT), which can be separated into two components: the recognition reaction time (the time taken to spot the stimulus and move the finger from a starting position) and the motor reaction time (the time taken from lifting the finger to pressing the appropriate response button that extinguishes the stimulus) (Amado-Boccara et al. 1995). Simple reaction time tests (SRT) measure the response to one sensory cue, while in choice reaction time tests (CRT) the subject is required to extinguish one of several equidistant lights, illuminated at random. Selective attention (giving attentional priority to a relevant stimulus while ignoring distracting or competing irrelevant information) can also be tested by asking the subject to respond only to one stimulus out of many (e.g. Stroop Colour-Word Test) or to a specific cue combination (e.g. red light and high tone). Often an RT task is combined with a tracking task in order to assess divided attention, which is the ability to respond simultaneously to two or more different stimuli. In this case, one must, for instance, keep a joystick-controlled cursor in line with a moving target, while responding to a random stimulus, such as a light. Both the RT and the tracking error are recorded (Compensatory Tracking Task, CTT; Divided Attention Task, DAT). Moreover, vigilance or sustained attention (the ability to maintain a consistent behavioural response to a particular stimulus during continuous and repetitive activity over a prolonged period of time) was usually measured with the Mackworth Clock test, a 45-min-long task. In this test, 60 dots are arranged in a circle, simulating the second marks on a clock, and are briefly illuminated in clockwise rotation, proceeding with a 6 dots jump. At rare, irregular intervals the target proceeds with a 12 dots jump by skipping one of the dots in the normal sequence, and this jump has to be detected.
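The logic of the Mackworth Clock test can be sketched as a short simulation. The jump sizes (6 and 12 dots) and the 60-dot circle follow the description above; the target probability, the seed and the scoring into hits, misses and false alarms are illustrative assumptions and are not taken from the included studies.

```python
import random


def mackworth_sequence(n_steps, target_prob=0.02, normal_jump=6, n_dots=60, seed=0):
    """Generate (position, is_target) pairs for a simulated clock run.

    target_prob is a hypothetical rate for the rare double jumps.
    """
    rng = random.Random(seed)
    position = 0
    events = []
    for _ in range(n_steps):
        is_target = rng.random() < target_prob
        # A target is a double jump, i.e. one dot of the normal sequence is skipped.
        jump = 2 * normal_jump if is_target else normal_jump
        position = (position + jump) % n_dots
        events.append((position, is_target))
    return events


def score(events, responses):
    """Count hits, misses and false alarms.

    responses: set of step indices at which the subject signalled a double jump.
    """
    targets = {i for i, (_, is_target) in enumerate(events) if is_target}
    hits = len(targets & responses)
    misses = len(targets - responses)
    false_alarms = len(responses - targets)
    return hits, misses, false_alarms


events = mackworth_sequence(2000, seed=1)
perfect = {i for i, (_, is_target) in enumerate(events) if is_target}
print(score(events, perfect))
```

In a vigilance analysis, the hit rate would typically be followed over successive blocks of the 45-min task to capture the decline of sustained attention over time.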

The above-mentioned attention-measuring tasks were classified under this domain although many of them, such as the RT tasks, were more broadly defined as measuring “psychomotor performance” in the original studies. Under this term, the researchers tried to encapsulate the co-ordination of sensory and motor systems through the integrative and organisational processes of the CNS. The distinction between cognitive and psychomotor functions is artificial, but nevertheless the relevant cognitive components of the psychometric tests have been classified here (e.g. recognition RT). However, there were also a number of commonly used standardised psychometric tests, which mostly relied on co-ordination and had a predominant motor component. These included tracking tasks, where the subject had to keep a joystick-controlled cursor in line with a moving target, but also covered a broad spectrum of tasks such as tapping tests, for which the test subject was required to tap his/her finger as fast as possible. These tests are irrelevant to the objectives of this review and therefore their results are not mentioned here.

8.5 Memory and learning

Another domain of human cognition that is of great interest is memory. Depression is associated with cognitive deficits, which is why the newest compounds have been extensively tested in order to show whether they possess a memory enhancement effect that could ameliorate the cognitive deficits of depression. Nevertheless, studies of their effect on memory in the absence of illness also exist, since healthy volunteers have been tested as well. All the tests that were used are classified in this category, although they varied considerably in terms of information types, temporal characteristics and specific processes that were targeted. List learning tests were often used and typically consisted of one or more acquisition trials in which the items were presented, followed by recall and recognition trials to assess retrieval and storage, respectively. Varying the time interval between presentation and assessment allowed for a differentiation between short- and long-term memory functioning (Schmitt et al. 2006). Besides these assessments, this outcome comprises tests that measure changes in visual memory, spatial memory and learning capacities, and tests measuring working memory.

8.6 Executive functions and information processing

Finally, there is the domain of tests assessing information processing and executive functions. Several of the memory- or attention-measuring tests obviously capture, to some extent, cognitive flexibility and information processing capacities. However, some more complex test procedures have been carried out, the results of which do not rely merely on memory or attention but rather assess overall changes in cognitive performance. These tests extend from calculation tasks and logical reasoning tests to gambling and probabilistic learning tasks. Other examples are tests of verbal fluency or perceptual tasks, such as tests where the relative length of a tone had to be judged. Although often difficult to interpret, their results reflect reasonably well the overall effect of the medication on human cognition.

9 Results

Of the studies included in this systematic review, only those that published sufficient numerical data entered the further analyses. A separate analysis was performed for each outcome. The significance level was set at p < 0.05.

9.1 Single drug administration

After a single dose of an antidepressant, a significant effect was obtained for two of the outcomes, namely wakefulness and memory, while there was no effect on mood, attention, emotional processing or executive functions.

Included in the analysis were 24 single dose studies assessing wakefulness, which were represented by 106 test parameters. The studies varied considerably in their results, with a heterogeneity variance of 0.64 that decreased only marginally when time was also considered as a covariate (0.63). Wakefulness was the only outcome to show an overall and time-independent drug effect. There was a small but significant negative effect of drug vs. placebo with an estimate of −0.37 (p < 0.05). At the first assessment, which corresponds to the time point shortly after drug intake, a significant negative effect was again found (−0.25, p < 0.04). The assessments at the second and third time points showed a negative effect too; this, however, did not reach significance.
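To illustrate what a heterogeneity variance expresses, the following sketch computes a between-study variance with the DerSimonian-Laird method-of-moments estimator. This is a simpler, swapped-in technique chosen for illustration only; the estimates reported here stem from the model described in the methods, not from this estimator. The effect sizes and variances in the example are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2) for a random-effects pooling.

    effects: per-study effect estimates (drug vs. placebo)
    variances: per-study sampling variances of these estimates
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need matching lists with at least two studies")
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study (heterogeneity) variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return pooled, se, tau2


# Hypothetical standardised effects and variances for four studies
pooled, se, tau2 = dersimonian_laird([-0.9, -0.1, -0.5, 0.3], [0.04, 0.05, 0.03, 0.06])
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```

A tau^2 close to zero indicates that the studies agree beyond what sampling error explains, whereas a larger value, as for wakefulness after a single dose, signals that the study results diverge.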

Memory was examined in nine single dose studies (using 59 relevant tests) with little heterogeneity variance (0.18). At the first three time points no effect was found, but at the fourth and fifth assessments a strong significant positive drug effect was found, with estimates of 1.37 (p < 0.03) at the fourth and 1.16 (p < 0.05) at the fifth assessment, respectively.

9.2 Repeated drug administration

For all the outcomes, except emotional processing and executive function, a significant time effect was found, as measured by the fixed effects using type III estimable functions and significance tests.

Mood was examined in 14 studies comprising 85 parameters. The studies varied very little (heterogeneity variance 0.06) with a minimal tendency towards a larger heterogeneity variance when considering time (0.09). No significant drug effect was found in the first and second assessments, but a significant positive one was found in the third, and last, assessment, with an estimate of 0.51 (p < 0.002). Since in all three assessments the effect leaned in the same direction, the condition for using time as a linear covariate was fulfilled. A significant day-by-day increase of mood by 0.13 (p = 0.005) was found in this pool of studies which had an average duration of 16.9 days of drug intake with a standard deviation (SD) of 12 days and a range from 7 to 56 days.
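The idea of treating time as a linear covariate can be made concrete with a small sketch that fits an ordinary least-squares line through effect estimates observed on different days. This is an unweighted simplification for illustration; the analysis reported here accounted for the study structure, which this sketch does not. The days and effect values below are hypothetical.

```python
def ols_slope(days, effects):
    """Return (slope, intercept) of an unweighted least-squares line."""
    if len(days) != len(effects) or len(days) < 2:
        raise ValueError("need at least two matching observations")
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(effects) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("all observations fall on the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, effects))
    slope = sxy / sxx  # change in effect per day of drug intake
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical drug-vs-placebo mood effects at three assessment days
slope, intercept = ols_slope([1, 7, 14], [0.05, 0.20, 0.40])
print(round(slope, 4), round(intercept, 4))
```

The slope is meaningful only if the effects at all assessments point in the same direction, which is the condition stated above for using time as a linear covariate.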

With regard to attention, 17 studies with 258 relevant parameters provided data for analysis. With consideration of time effect, the studies varied very little (heterogeneity variance 0.1). For attention, no effect at the first assessment and only a small, positive, significant effect at the second assessment (estimate 0.14, p < 0.05) was found. A negative, but non-significant, effect emerged in the third assessment (estimate −0.15, p = 0.07), while a small but significant positive effect reappeared at the fourth assessment with an estimate of 0.29 and a significance of p < 0.05. The effect became negative and non-significant again at the fifth and last assessment with an estimate of −0.37 at p = 0.06.

Memory was tested by eight different repeated drug administration studies, with 127 relevant parameters included in the analysis. The heterogeneity variance between the studies, considering time, was very small (0.045). Starting from a minimal but significant negative estimate of group effect at the baseline assessment (−0.31, p < 0.04), up to the third assessment, increasingly significant but altogether small effects over time were found. At the first and second assessments a significant positive drug effect was shown, with estimates of 0.24 (p < 0.02) and 0.23 (p < 0.02), respectively. This effect became larger at the third assessment with an estimate of 0.4 (p < 0.005). At the last assessment no significant effect was observed.

For wakefulness there was a significant time effect as measured by the fixed effects using type III estimable functions, but no significant effect in any particular assessment point. Finally, for emotional processing and executive functions the small number of studies, 2 and 3, respectively, did not allow for any effect to emerge.

9.3 Adverse effects

In the majority of the studies, no standardised method of assessing adverse reactions and reporting drop-outs due to adverse effects was used and, in a number of studies (31), no comment on side effects was made. Therefore, no further analysis was performed and the results are presented here in a descriptive manner. In a small number of studies (20) no adverse effects were observed, while in 84 studies some side effects were reported. These were normally benign and led to drop-outs in only a few cases. Adverse reactions were usually observed after the initial administration, wore off with continued intake and were primarily gastrointestinal complaints (e.g. nausea, diarrhea, dry mouth, epigastric pain), sleep disturbances, restlessness, tremor, headache, dizziness, fatigue and drowsiness. Furthermore, sedation was a frequently reported adverse effect and was analysed in this report as part of the “wakefulness” outcome.

9.4 Discussion

This systematic review focused on studies of antidepressants in healthy individuals. An unexpected result was the rather large number (135) of these studies which then allowed for analyses of the data by different outcomes. This is due to the fact that cross-over and single blind studies were also included. Surprisingly, most of the studies examined the effects of a single drug administration despite the fact that in clinical populations the effects of antidepressants are detectable only after several weeks of daily intake. But even if the drug was given repeatedly, the studies had an adequate duration in only very few cases. For instance, only 17 studies lasted more than 2 weeks. In contrast, in clinical studies examining the effectiveness of an antidepressant, typically a 6-week cut-off point is chosen. In this systematic review, only one study lasted this long. A possible explanation might be that almost none of the studies were explicitly researching the neuroenhancement properties of antidepressants, since they were addressing different questions, such as the side effect profile or the ability to drive under the influence of a medication.

Unfortunately, many of the studies did not report their results in numbers and therefore, although they were formally included in the systematic review, their results were not considered in the analysis. This is a well-known weakness in the reporting of controlled trials (Egger et al. 2001; Higgins and Green 2006), especially of those failing to find any significant result, which was often the case in the studies not included in our statistical analysis. Consequently, the few significant results that were found through our analysis are to be taken with caution. Had the non-significant, unreported results been included in the analyses, the effects would probably have been less significant.

Nevertheless, from the analysis that was performed, a number of conclusions can be drawn. After a single dose, a robust negative main effect on wakefulness was found. This reflects the sedating potential of many antidepressants after a first dose, which is well known from clinical practice. Concerning the positive effect on memory found after several measurements, post hoc inspection of the original data showed that the effect in the fourth and fifth assessments could be ascribed to the only study that had that many assessment points (Harmer et al. 2002).

The main interest was obviously in the effect after a drug intake of several days. For most of the outcomes an effect emerging over time could be found. Most interesting was the positive effect on mood, which increased continuously over time. Had the trials been longer and included more assessment points, one could speculate that this effect would persist or even become stronger, or that a ceiling effect would emerge. One should also take into account that the majority of the studies included in this systematic review were not on enhancement and, therefore, the participants did not have any particularly high expectations that would generate a placebo effect. Nevertheless, a placebo effect did exist, and a small placebo-verum difference still remained that was statistically significant at the last assessment point. Consequently, it is important to note that antidepressants seem to be able to heighten mood even in healthy, non-depressed individuals.

Regarding attention, no firm conclusions can be made since the results fluctuated highly over time. Concerning memory, the fact that the two groups started with a group difference confounds the data and does not allow any further speculations on the meaning of the results.

Finally, no evidence of a significant adverse event profile could be found. Antidepressants seem to be relatively safe in healthy individuals and had a high acceptability in a population that had a rather uncertain and small anticipated benefit.

Some limitations of our systematic review should be mentioned. The results refer to different doses of several medications and have not been attributed to each drug and thus no conclusion on any particular drug can be drawn. Furthermore, the assessment methods that were clustered together are not necessarily comparable, and given the fact that many of them are taken from clinical settings and are not validated for healthy persons, it is uncertain if they can measure subtle changes in healthy people. This applies especially to the several mood assessment tools that have been used. The question of the comparability of the trials that have been summarised might be raised. The studies were of different quality (as measured by the Jadad scale), sample size, design and duration and the assessments represent different time points in each study. Finally, although no particular test was used to test for it, no obvious publication bias could be observed. That, of course, does not exclude the potential bias that the review process introduces, especially as no grey literature or unpublished results were included. Hence, the explanatory power of this review is limited.

10 Reviewers’ conclusion

10.1 Implications for practice

Through the analysis of the existing studies no consistent evidence for the enhancing effect of antidepressants in healthy individuals could be found. There is little evidence so far supporting the popular opinion that antidepressants have a positive effect on the mood of healthy individuals after repeated administration. No evidence of a significant adverse event profile could be found.

10.2 Implications for research

The existing research summarised in this systematic review not only provides insufficient evidence for or against any effect of antidepressants in healthy people, but is also ill-suited to answering the question of the possible neuroenhancement properties of these drugs. The majority of the studies did not specifically address this issue and therefore, in view of the growing public interest, there is a need for studies with the explicit research question of neuroenhancement. For these, appropriate assessment methods need to be developed and validated for healthy individuals. Studies should be of parallel-group, randomised, double-blind, and placebo-controlled design with clear prior specification of the outcome measures of interest. They should be not only of high quality but also of sufficient duration. The results should be reported in detail, provide numerical data and state precisely the rate and extent of adverse effects.