Overview

Recent high-profile clinical trials with psychedelic drugs have highlighted challenges related to rigorous study design and condition masking that have simmered in both psychotherapy and pharmacology research for decades (e.g., Basoglu et al. 1997; Enck and Zipfel 2019). Interrelated methodological challenges regarding the selection of appropriate control conditions, masking (also known as blinding), and expectancy effects have clouded our understanding of the source of clinical improvements in psychedelic studies and, in fact, across medicine. Studies on psychedelic therapy are particularly challenging as they must address methodological issues inherent to both psychotherapy and pharmacology research as well as issues that are distinctly problematic to the field, such as “hype” and salient psychoactive effects that compromise masking. In this paper, we delineate how many of the methodological limitations that have been raised as critiques of psychedelic science are common challenges across psychotherapy and pharmacology research more broadly and need to be addressed. This review allows us to share lessons across disciplines and provide recommendations for improving future psychedelic and non-psychedelic research. We conclude by highlighting that psychedelic studies should not be held to a different standard than other forms of psychotherapy or pharmacology research, and that the fields can leverage important lessons from one another by recognizing their shared limitations. To this end, we provide practical methodological recommendations to measure and manage expectations as well as to enhance masking in psychedelic studies. These recommendations can be deployed more broadly across clinical trials to improve the rigor and reproducibility of future research.

Treatment-nonspecific effects

To begin, we review the various reasons for including control conditions in clinical studies and examine what exactly is being controlled. In any clinical trial, changes in symptoms can be observed because of treatment-specific or treatment-nonspecific effects (Turner et al. 1994). Treatment-specific effects are changes directly attributable to the independent variable or intervention under study (e.g., drug dose or psychotherapeutic approach). Treatment-nonspecific effects are changes not related to the specific treatment arm (i.e., common to being in any clinical trial), as well as placebo and nocebo effects related to treatment expectations (Table 1). Including certain control conditions allows the trialist to filter out contributions of treatment-nonspecific effects from treatment-specific effects to attribute clinical improvements to the intervention under study (Fig. 1a).

Table 1 Key terms and definitions
Fig. 1 Treatment-nonspecific effects in clinical trials. (a) Hypothetical results of a clinical trial to delineate the sources of treatment-specific and treatment-nonspecific effects. Including placebo and no treatment control conditions allows trialists to identify treatment-specific effects (figure inspired by Wampold et al. 2016). (b) In a clear illustration of expectancy effects, Bingel et al. (2011) measured participants’ pain intensities before (i.e., Baseline) and after receiving remifentanil while manipulating participant expectancies across three groups (i.e., No expectancy, Positive expectancy, or Negative expectancy). They found that priming positive treatment expectancy doubled the analgesic effect of remifentanil when compared to no expectancy. In contrast, inducing negative treatment expectancies eliminated the analgesic effect. (c) Gold et al. (2017) demonstrated that treatment effect sizes vary as a function of the type of control group utilized.

The natural history or spontaneous variation of any given disease under study may be the least controllable source of treatment-nonspecific change that can confound clinical trial interpretation. Symptoms can change (e.g., spontaneous remission) independently of the study intervention as a function of an unidentified biological or psychosocial change in the individual’s life. Additionally, in most clinical trials, participants are screened and selected based on minimum criteria of symptom severity, and many individuals may be especially motivated to seek out research studies when their symptoms peak in severity (Whitney and Von Korff 1992). Subsequent measurements using the same scale may show an apparent improvement. This “regression to the mean” rather than a true treatment-specific effect may lead researchers to erroneously conclude a treatment is effective when participants may have improved over time without any treatment (Hengartner 2020). Regression to the mean is a ubiquitous statistical phenomenon that results whenever cases are selected for follow-up based on abnormally high or low scores at baseline, demonstrated in observational studies and clinical trials, and across multiple diseases (Bland and Altman 1994). Changes due to the natural course of the condition and regression to the mean are considered theoretically distinct but in practice are difficult to disentangle.
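
A minimal simulation sketch can make regression to the mean concrete. It assumes a hypothetical symptom scale on which each person has a stable severity plus independent measurement noise at each visit; all numbers (population mean, standard deviations, eligibility cutoff) are illustrative assumptions rather than values from any cited study. Nobody in the simulation receives treatment, yet the enrolled group appears to improve.

```python
import random
import statistics


def simulate_regression_to_mean(n=20000, true_mean=20.0, sd_between=4.0,
                                sd_within=4.0, cutoff=26.0, seed=0):
    """Simulate untreated people measured twice; enroll only high baseline scorers."""
    rng = random.Random(seed)
    baseline_scores, followup_scores = [], []
    for _ in range(n):
        # Stable severity for this person (between-person variation).
        severity = rng.gauss(true_mean, sd_between)
        # Two noisy measurements of the same, unchanged severity.
        baseline = severity + rng.gauss(0.0, sd_within)
        followup = severity + rng.gauss(0.0, sd_within)
        if baseline >= cutoff:  # hypothetical eligibility screen on symptom severity
            baseline_scores.append(baseline)
            followup_scores.append(followup)
    return statistics.mean(baseline_scores), statistics.mean(followup_scores)


baseline_mean, followup_mean = simulate_regression_to_mean()
print(f"enrolled baseline mean:  {baseline_mean:.1f}")
print(f"enrolled follow-up mean: {followup_mean:.1f}")
```

Because enrollment selects on a noisy baseline score, the follow-up mean of the enrolled group drifts back toward the population mean even though true severity never changed.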

Participant behavior can also change simply as a consequence of the interest, care, or attention received as part of a study. This well-established psychological phenomenon is known as the Hawthorne effect (Sedgwick and Greenwood 2015). This effect is associated with outcomes ranging from workplace productivity to cognitive functioning and quality of life in dementia patients (McCarney et al. 2007). Notably, researchers and study personnel, not just participants, can be susceptible to Hawthorne effects, thereby influencing clinical outcomes (Sedgwick and Greenwood 2015). That is, those caring for participants in an experimental trial are under increased scrutiny and observation as compared to those operating in an unobserved clinical setting, and this difference may impact both the quality and quantity of patient care. This bias can cause an overestimation of an experimental treatment’s therapeutic effect due to clinical improvements from treatment-nonspecific factors. A distinct but related issue is that the simple act of repeated observation and measurement of behaviors and symptoms can alter those same behaviors and symptoms. Repeated pain assessments can increase pain chronicity (Ferrari and Russell 2010), asking about illicit drug use can decrease use (D’Onofrio et al. 2012), and daily symptom assessments can worsen or improve symptom severity in PTSD (Dewey et al. 2015; Pedersen et al. 2014). Drawing extra attention to an issue can lead to symptom amplification or may provide more opportunities to resolve it (Barsky and Klerman 1983). In either case, it is clear that simply enrolling in a clinical trial can influence symptoms regardless of treatment assignment.

Taken together, issues related to the natural course of the disease, regression to the mean, and observation-related changes highlight that there are many mechanisms by which symptoms may change in a clinical trial irrespective of the treatment being tested. It is therefore important to include, at a minimum, control arms that do not receive the treatment, as treatment-nonspecific factors confound experimental and control arms to a similar extent. However, the simple inclusion of an untreated comparison group may not be enough to isolate treatment-specific effects (Gold et al. 2017; Enck and Zipfel 2019). Participants often have expectations regarding the efficacy of the treatment under study. If participants have knowledge about their treatment arm assignment (e.g., in an open-label study), or gain knowledge through their subjective experience (e.g., having a psychedelic trip) or somatic symptoms, their expectations about therapeutic efficacy can affect their clinical outcomes. This problem is common to most psychotropic trials (e.g., selective serotonin reuptake inhibitors [SSRIs]; Hieronymus et al. 2018) and is particularly salient for high-dose psychedelic trials in which subjective drug effects are especially pronounced. Without effective condition masking, it is virtually impossible to maintain the independence of the main variable under study (i.e., the treatment), as it is confounded by participant expectations. In addition to influencing participant outcomes, baseline expectancies about a treatment’s therapeutic effects can also impact masking efficacy (i.e., whether participants are aware of their treatment arm assignment), as those with noticeable improvements in symptoms often assume they were assigned to the active treatment group (Sackett 2007). We now consider several specific expectations and how they interact with masking and treatment outcomes.

Expectancies in psychotherapy and pharmacology research

Tambling (2012) differentiates between expectations about the process of treatment and expectations about the outcome of treatment. In the case of psychotherapy, process expectations are expectations about what will happen during therapy (e.g., patient’s thoughts about roles they and their therapist will assume, characteristics of their therapist, and what sessions will entail). In pharmacological trials, process expectations can include expectations about any acute drug effects, including psychoactive effects. Process expectancies may be particularly pertinent with psychedelic drug trials as expectations about the acute effects of the drugs are shaped by hours of psychotherapy, widespread representations in popular media, and a highly ritualized process of drug administration. When these expectations are matched by experience, a study participant may be especially confident in unmasking their treatment arm assignment.

Outcome expectations refer to whether the treatment is anticipated to reduce symptoms. In the case of psychotherapy, studies suggest that outcome expectancies are stronger predictors of therapeutic effects than are specific psychotherapy techniques (Horvath et al. 2011; Webb et al. 2010). Positive outcome expectations are related to stronger alliance with the therapist, which is associated with better outcomes (Vîslă et al. 2018; Yoo et al. 2014). A recent, well-powered meta-analysis (N = 12,722) compared patient outcome expectancies and clinical outcomes across a variety of diagnoses and psychotherapy interventions, revealing that greater positive outcome expectancy was consistently associated with better treatment results (Cohen’s d = 0.36; Constantino et al. 2018). Outcome expectancies also have strong effects relative to the active effects of psychotropic drugs (Rutherford and Roose 2013). In trials where patient-reported outcomes are the primary efficacy measures, the effects of outcome expectancies are particularly strong (Atlas 2021). Fillingim and Price (2005) concluded that in placebo analgesia studies outcome expectancies accounted for up to 81% of variance in post-treatment pain ratings. Thus, across clinical research contexts, participants’ outcome expectations about the specific treatment being administered influence clinical outcomes.

Negative outcome expectations can also influence clinical outcomes. When individuals are aware that they have been assigned to a treatment that they believe is unlikely to improve their symptoms, negative expectation alone can worsen patient outcomes, which is known as the nocebo effect (Gold et al. 2017; Planès et al. 2016). This effect was elegantly demonstrated in a study with remifentanil, an opioid analgesic, which found that priming negative expectations about the treatment completely negated the analgesic effect of the drug (Bingel et al. 2011; Fig. 1b). Furthermore, if a participant has positive expectations about the proposed experimental treatment but comes to believe they have been assigned to a control condition, outcomes may worsen as a result of disappointment or the belief that one will not improve without being assigned to the active treatment (Furukawa et al. 2014). Indeed, those put on a waitlist control condition typically have worse outcomes than those assigned to active placebo, or even no treatment, as they have less reason to expect an improvement in symptoms (Patterson et al. 2016). With waitlist control designs, those in the control condition do not receive treatment until after a waiting period, where they are compared with the active treatment group. However, participants are generally aware that they are in a control condition during their waiting period and thus may not expect to see improvements, whereas the active treatment group likely has the opposite expectation. Therefore, waitlist control designs may artificially inflate intervention effect size estimates (Fig. 1c; Cunningham et al. 2013; Zhipei et al. 2014). Possibly illustrating this effect, in a waitlist control study of psilocybin for the treatment of major depressive disorder, waitlisted participants reported higher anxiety scores at the end of the waitlist period compared to the beginning, enhancing the apparent therapeutic effect of psilocybin (Davis et al. 2021). 
The crucial role of expectancies in treatment outcomes across clinical contexts underscores the need for trial designs that control for expectation-related improvements, which we elaborate on in the following sections.

Importantly, outcome expectancies are rarely measured in psychotherapy and pharmacology studies (Doering et al. 2014). Constantino et al. (2011) noted that expectancies have often been thought of as nuisances to clinical research and disregarded rather than being considered important ingredients of the therapeutic process. Furthermore, the few studies that have included assessments of treatment expectations have used brief and study-specific measures, meaning there is surprisingly little overlap between studies in how expectations are quantified (Tambling 2012). Moreover, there is no manual or expert consensus for managing expectancies despite the extensive evidence of the important role of expectancy in treatment responses (Zilcha-Mano et al. 2019). Collectively, these findings highlight that challenges related to participant expectations are common across psychotherapy and pharmacology research, and that, to date, there is no standard for addressing expectation-related issues.

Psychedelic research and expectations

Briefly, the typical structure of a modern psychedelic therapy clinical trial involves an arduous screening process, multiple preparation sessions, single or multiple drug dosing sessions, and integration sessions after drug administration (Fig. 2). The preparation sessions are used for several purposes, including to build rapport between the participant and the therapists or facilitators, to inform the participant about common or possible psychedelic drug experiences, to reassure the participant about their safety during dosing day procedures, and to assist with establishing the participant’s intention(s) for their dosing session. The drug dosing session is highly structured with two therapists accompanying the participant throughout the 6–8-h session in a comfortable environment. During the dosing session, participants often remain reclined on a couch with eyeshades and headphones for music and are encouraged to focus on their inner experience throughout the drug session, exploring any content that arises with an open and accepting mindset. In the days following drug dosing, the participants work with the same clinical team in integration sessions to make meaning of their experiences and to incorporate any insights they may have had into their lives going forward. Given these fundamental elements, psychedelic therapy is best considered a complex, multicomponent intervention that includes aspects of both pharmacology and psychotherapy. Notably, throughout the course of a psychedelic therapy trial, a participant’s process expectations and outcome expectations are subject to change as they gather more information about possible drug effects, approach the sessions in a certain way (e.g., trust, let go, be open), and experience the actual drug effects. Hereafter, we refer to this package of procedures as psychedelic therapy and acknowledge that all of these aspects may determine treatment-specific effects.

Fig. 2 Stages of psychedelic therapy. Psychedelic therapy typically involves preparation, dosing, and integration sessions.

Participants’ expectations as well as intentions (i.e., what they desire from the psychedelic experience) are thought to play a prominent role in the drugs’ acute and long-term effects (Olson et al. 2020). Some have even termed psychedelics “placebo enhancers,” as they can enhance the perception of meaningfulness (Hartogsohn 2016, 2018) and induce a state of suggestibility (Carhart-Harris et al. 2015). It has been noted across popular culture that psychedelic experiences are heavily influenced by one’s expectations, and some have gone as far as to claim “no other class of drugs are more suggestible in their effects” (Pollan 2018). Hartogsohn (2021) noted that the fundamental role of expectations in psychedelic drug effects may reconcile the paradoxical conceptions that have been held about the drugs—views that are so varied, it at times sounds as though scientists are discussing completely different drugs (e.g., they have been used to both treat mental illness and to model psychosis). Utilizing pre-dosing expectations as well as the acute state of suggestibility induced by psychedelics in tandem may be an important component of the therapeutic process with psychedelic therapy, but this combination can also be co-opted for nefarious purposes. Historically, psychedelics have been used by cults as well as investigated for their alleged potential in “mind control” by the US government during MK Ultra (Cusack 2020; Kogo 2002; Ledford 2019). There is even concern about psychedelics’ potential for changing beliefs (e.g., political or metaphysical; de Wit et al. 2021; Pace and Devenot 2021; Timmermann et al. 2021) and memories, though that is beyond the scope of this review. Therefore, it may be ethical to include an enhanced informed consent process about possible belief changes induced by psychedelic therapy prior to enrolling participants into a clinical trial (Smith and Sisti 2021).

Although pre-dosing expectations have long been thought to be integral to the effects of psychedelics (Eisner 1997; Leary et al. 1963), very few studies have actually measured them. A recent “microdosing” (i.e., sub-hallucinogenic dosing) study found that positive expectations regarding psychedelics at baseline predicted subsequent increases in wellbeing irrespective of whether a participant received a psychedelic or an inert placebo (Kaertner et al. 2021). Similarly, a large-scale, placebo-controlled study of microdosing found that participants experienced comparable improvements in mood and cognition in the drug and placebo conditions (Szigeti et al. 2021). Another microdosing study found that after controlling for baseline expectancies, there was no difference between psilocybin and placebo on measures of awe (van Elk et al. 2021). However, to the best of our knowledge, only a single “macrodosing” (i.e., full hallucinogenic dosing) trial has recorded pre-treatment expectancies. An open-label ayahuasca study found that participants endorsing an expectancy of favorable change in neuroticism, extraversion, and conscientiousness in response to ayahuasca showed a greater decrease in neuroticism and greater increases in extraversion and conscientiousness following ayahuasca administration compared to participants with lower expectancies receiving the same treatment (Weiss et al. 2021). A recent systematic review found that those with a recreational intention tended to have less challenging experiences when they used a psychedelic (Aday et al. 2021; Haijen et al. 2018), again suggesting that what one desires and expects to experience with a psychedelic influences the drug’s effects. Thus, the few studies that have measured expectations and intentions to date support the prevalent assumption that pre-dosing expectations interact with psychedelic drug effects and outcomes.
Whether these same considerations apply to other drug classes (e.g., psychostimulants) is unknown, further emphasizing the need to measure and report therapeutic expectations in a systematic way across areas of clinical research.

High-dose psychedelic trials may also be particularly susceptible to a type of bias termed “hype” or the “Michael Pollan effect” (Carpenter 2020; Table 1). Some have argued that psychedelic therapy marks the most important innovation in psychiatry since the introduction of SSRIs, or possibly ever, and it is not uncommon to hear claims about the potential for psychedelics to “change the world” from industry leaders and enthusiasts (Dupuis 2021). This pervasive messaging may lead to amplified positive expectations compared with many other types of clinical interventions and perhaps motivates participants to “not let the movement down” by failing to clinically improve. This notion was illustrated in a recent ayahuasca study (Aday 2021), where one of the participants asked us (JSA) if they should stop participating in the study because they did not have a mystical experience and did not want to “ruin the research.” In our experience recruiting for psychedelic studies, many potential participants explicitly express a sense of pride and excitement in participating in a psychedelic trial as well as strong confidence in the benefit of psychedelics to their mental wellbeing. These motivations for participation and heightened positive expectations, coupled with the functional unmasking that often occurs, make identification of a treatment-specific effect in high-dose psychedelic trials particularly challenging and highlight the need for study designs that properly mask participants to conditions (Burke and Blumberger 2021).

Certain aspects of the study personnel, environmental context, and measures included in psychedelic drug trials may contribute to enhanced expectations as well. For example, the use of two therapists at a time and rituals like placing a fresh rose in the room on dosing day may serve to amplify positive expectations and signal that the experience is of particular significance (Gukasyan and Nayak 2021). Additionally, outcome expectancies of psychotherapists have been shown to have a marked effect on treatment engagement and clinical outcomes across therapeutic approaches (Doering et al. 2014; Leake and King 1977), suggesting this may be a treatment-nonspecific factor relevant to psychedelic studies as well. Lastly, the specific measures used in psychedelic trials can influence participant expectations; one study volunteer noted “I long to see some of the stuff hinted at in the questionnaire” in reference to questions they encountered on the Mystical Experience Questionnaire (MEQ; MacLean et al. 2012; Pollan 2018). Thus, in addition to preexisting attitudes about psychedelics, certain expectations may be engendered by characteristics of the trial.

Modern era clinical research design elements

Next, we will describe many of the study designs and methods that have been attempted to manage these issues across psychotherapy and pharmacology trials to date. Open-label study designs, in which both the patient and study personnel are aware of what specific treatment is administered, most closely resemble how psychotherapy and psychotropic drugs are administered in real-world, non-research settings. Although high in ecological validity, this type of design does not control for most of the confounding nonspecific factors that can affect clinical outcomes (e.g., Hawthorne effect, spontaneous variation of symptoms, regression to the mean).

Some treatment-nonspecific factors, such as regression to the mean, can be controlled if sufficient data are available at both the individual and group level, as a precise mathematical formula can be developed to predict the actual regression effect in a given experimental setting (Barnett et al. 2005). These authors have identified specific experimental strategies to mitigate or manage expected regression to the mean effects in a clinical trial. First, they recommend selecting cases based on multiple baseline observations. Requiring that eligible subjects have stable test scores over two or more baseline assessments will predictably reduce, although not necessarily eliminate, regression to the mean. Second, the authors suggest correcting for regression to the mean effects in the analyses by using either ANCOVA modeling or application of a correction formula. Of note, neither of these strategies has been systematically applied in studies of psychedelic therapy. Third, investigators may consider a waitlist control condition, although we refer the reader to limitations to this approach noted previously.
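
To illustrate how such a prediction might work, the sketch below computes the expected untreated drop in scores for participants enrolled above a severity cutoff. It assumes normally distributed scores with separate between-person and within-person (measurement) variances and relies on standard truncated-normal results; it is not necessarily the exact formula or notation used by the cited authors, and the function and parameter names are hypothetical. The `n_baseline` argument models the first strategy above by selecting on the mean of several baseline assessments.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_rtm_effect(mean, sd_between, sd_within, cutoff, n_baseline=1):
    """Expected drop from baseline to one follow-up with no treatment at all.

    Participants are enrolled when the mean of n_baseline baseline scores
    exceeds cutoff. Assumes a normal model (an illustrative assumption).
    """
    # Averaging several baselines shrinks the measurement-noise variance.
    var_within = sd_within ** 2 / n_baseline
    sd_selection = math.sqrt(sd_between ** 2 + var_within)
    z = (cutoff - mean) / sd_selection
    # Mean excess of a normal variable truncated below at z (in SD units).
    excess = normal_pdf(z) / (1.0 - normal_cdf(z))
    return var_within / sd_selection * excess


one_baseline = expected_rtm_effect(20.0, 4.0, 4.0, cutoff=26.0)
two_baselines = expected_rtm_effect(20.0, 4.0, 4.0, cutoff=26.0, n_baseline=2)
print(f"expected drop, one baseline:  {one_baseline:.2f}")
print(f"expected drop, two baselines: {two_baselines:.2f}")
```

Under these assumptions, selecting on two baseline assessments reduces the expected regression effect but does not remove it, consistent with the recommendation above.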

The double-blind randomized controlled trial (RCT) is considered the gold standard design for identifying a true treatment-specific effect, under conditions where neither investigator nor participant knows their treatment allocation. An RCT entails randomly assigning participants to treatment or control conditions and withholding knowledge of treatment arm assignment from participants and study personnel (i.e., masking). Effectively executing this design controls for expectancies as it is unknown which treatment each participant received, and therefore treatment-nonspecific factors can be ruled out as the source of treatment arm outcome differences. Treatment arm masking in RCTs is best achieved with active placebo comparators, in which the control condition is structurally equivalent and closely resembles the presentation and side effects of the experimental treatment without providing the therapeutic effects (Doering et al. 2014). Inert but identical-looking pills that lack the side effects of the treatment condition (i.e., inactive placebos) are often used but may be easy for participants to detect, and subsequent nocebo effects may confound analyses.

There has been considerable debate that continues today about what constitutes a proper “inert” placebo for psychotherapy in the same sense as an “inert” placebo in pharmacology, as some have argued that “there is no such thing as inert psychotherapy” (Rosenthal and Frank 1956; Wampold et al. 2016). In the context of psychedelic trials, to date, the psychotherapy component has been held constant across the treatment and control conditions, making this issue less relevant for the field for now. However, as researchers delineate the nuances of what specific forms of psychotherapy are most synergistic with psychedelics, this potential confound will become an increasingly important issue to address (Horton et al. 2021). A related challenge with psychedelic studies is that unmasking may lead to differences in how the psychotherapy component is administered and received, given that the context of the therapy shifts once the participant and/or therapist becomes aware of the treatment arm assignment. Therefore, improved masking procedures must be implemented into psychedelic science for the field to meet the assumptions of the current gold standard clinical trial design.

Crossover RCT designs have been used in many pharmacological studies as an efficient way to account for treatment-nonspecific confounds because participants act as their own control. In a crossover design, participants are randomly assigned to a sequence of treatments where they receive both the experimental and placebo treatments but at different timepoints (i.e., placebo then experimental treatment or vice versa). A major weakness of crossover designs, however, is the potential for carryover effects (i.e., the therapeutic benefits could “carry over” after the first treatment and misrepresent the true effect of the second treatment). Carryover effects are especially concerning in psychedelic trials because the effects of psychedelic therapy in some cases have been shown to be durable for over a year (Griffiths et al. 2008; Johnson et al. 2017; see Aday et al. 2020b for review). Thus, even a 12-month washout period is unlikely to achieve a return to pre-treatment levels on the variable of interest, which biases within-person analyses and threatens the validity of conclusions that can be drawn. Moreover, masking is likely to be compromised in crossover designs that involve a psychoactive drug (Wilsey et al. 2016). For example, almost all participants accurately identified their treatment condition in a crossover study that used psilocybin and niacin as a placebo control (Grob et al. 2011). Thus, simple crossover designs may be more confounded than a parallel (between-subjects) RCT design for psychedelic trials.
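
The direction of the carryover bias can be shown with a small worked calculation. The sketch assumes a two-sequence crossover with equal-sized sequences, additive effects expressed as symptom improvement, no period effects, and no carryover from placebo; these simplifications and the function name are illustrative assumptions, not a description of any cited trial.

```python
def crossover_estimate(treatment_effect, carryover):
    """Mean within-person (active minus placebo) improvement in a two-sequence crossover."""
    # Sequence 1: active first, then placebo. Residual benefit from the
    # active period inflates improvement measured in the placebo period.
    diff_active_first = treatment_effect - carryover
    # Sequence 2: placebo first, then active. Nothing carries over from placebo.
    diff_placebo_first = treatment_effect
    # Equal-sized sequences, so average the two within-person differences.
    return (diff_active_first + diff_placebo_first) / 2


# Hypothetical numbers: a true effect of 10 points, 6 of which persist
# through the washout period.
print(crossover_estimate(treatment_effect=10.0, carryover=6.0))
```

Under these assumptions the within-person estimate understates the true effect by half the carryover, so a durable treatment is penalized most heavily.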

We have repeatedly noted the importance of adequate masking in double-blind RCTs, and emphasize that it is impossible to know if the double-blind or masking was achieved without testing masking efficacy. Surprisingly, however, masking efficacy typically goes unmeasured or unreported in psychotherapy and pharmacology trials (Doering et al. 2014). Many researchers report their studies as being “double-blind” without testing such claims (Basoglu et al. 1997). A systematic review on methods of masking in randomized controlled trials with pharmacologic treatments concluded that reporting of condition masking is generally “quite poor,” and based on trials that have tested the success of masked methods, a high proportion of studies are effectively unmasked (Boutron et al. 2006; Rabkin et al. 1986). This corroborates a recent systematic review of studies published in top psychiatry journals in 2017 and 2018, which found that only 59% of the trial reports included adequate reporting of masking outcomes (Juul et al. 2020), as well as a meta-analysis that indicated a large majority of antidepressant RCTs do not assess masking efficacy, and when measured, masking often fails (Scott et al. 2022). Similarly, a comprehensive literature search found that masking was not maintained in 20/23 “double-blind” studies examining psychotropic drugs (Fisher and Greenberg 1993). The authors noted improvements in patient symptomology and side effects from the active drug were the major cause of unmasking. Long-term masking can be difficult, if not impossible, to achieve with highly efficacious treatments because it is clear to the patient that they experienced an improvement in symptoms (Muthukumaraswamy et al. 2021). 
Thus, many argue that end-of-trial assessments for masking cannot be done with validity, as they cannot disentangle masking from guesses based on efficacy (Mataix-Cols and Andersson 2021; Sackett 2007), although it should be noted that some researchers argue that it is not considered unmasking at the end of the trial if people guess their condition based on efficacy (Katz 2021).
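
One simple way to test masking efficacy is to ask participants to guess their assignment and compare the proportion of correct guesses with chance. The sketch below is a minimal illustration and not a validated blinding index; it assumes two arms with equal allocation and a forced-choice guess, so that chance corresponds to 50%. The function name and the data are hypothetical. As noted above, guesses collected at the end of a trial may reflect perceived efficacy rather than unmasking, so the result needs cautious interpretation.

```python
from math import comb


def masking_check(assignments, guesses):
    """Return (proportion correct, one-sided exact binomial p-value vs. 50% guessing)."""
    if len(assignments) != len(guesses) or not assignments:
        raise ValueError("need exactly one guess per participant")
    n = len(assignments)
    correct = sum(a == g for a, g in zip(assignments, guesses))
    # Probability of at least this many correct guesses if each guess
    # were a fair coin flip.
    p_value = sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n
    return correct / n, p_value


# Hypothetical data: ten participants, nine of whom guess correctly.
assignments = ["drug"] * 5 + ["placebo"] * 5
guesses = ["drug"] * 5 + ["placebo"] * 4 + ["drug"]
proportion, p_value = masking_check(assignments, guesses)
print(f"correct guesses: {proportion:.0%}, p = {p_value:.4f}")
```

A small p-value indicates that participants identified their condition more often than guessing would predict, which is the pattern reported in many of the trials reviewed here.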

Masking attempts in psychedelic studies

Multiple approaches have been attempted to address these methodological challenges specifically as they relate to psychedelic trials. First, active placebos have been used in an attempt to mask participants and therapists to treatment conditions, albeit generally unsuccessfully. This difficulty was infamously demonstrated in the “Good Friday Experiment,” where divinity school students were assigned to receive psilocybin or niacin, a B vitamin with mild physiological effects, in a group setting at a chapel (Pahnke 1963). Despite some initial confusion because of niacin’s fast-acting effects on vasodilation and general relaxation, before long, it became clear which participants had been assigned to which condition, as those in the psilocybin group had intense subjective reactions and often spiritual experiences, whereas the niacin group “twiddled their thumbs” while watching on (Prideaux 2021). By the end of the day, all participants correctly ascertained whether they were in the treatment or control group (Doblin 1991). Despite the clear masking failure, after more than 50 years, many researchers today still use niacin as the active placebo in clinical trials with psychedelics, perhaps for a lack of better alternatives (Grob et al. 2011; Ross et al. 2016; Siegel et al. 2021). Nevertheless, participants are now dosed individually rather than in a group to reduce potential unmasking from witnessing others’ experiences. Modern psilocybin trials have also employed methylphenidate (Griffiths et al. 2006) and dextromethorphan (DXM; Carbonaro et al. 2018) as active placebos, although the success of masking was typically less than 25% or unreported in these studies (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006). Uthaug et al. (2021) tested an innovative strategy at masking by mimicking the aesthetic and somatic features of the psychedelic brew, ayahuasca. 
The investigators used a mixture of cocoa powder, vitamins (unspecified), turmeric powder, quinoa, traces of coffee, and potato flour as a placebo to mimic the texture as well as gastrointestinal side effects of the drug. Despite effectively masking the profound effects of ayahuasca in several experienced users, a majority of participants were still able to accurately identify their treatment assignment (Uthaug et al. 2021). A review of ongoing clinical trials revealed that researchers are currently experimenting with a number of other potential control conditions in psychedelic studies, including mannitol, lactose, ketamine, microcrystalline cellulose, and nicotinamide (Siegel et al. 2021), but the effectiveness of these attempts remains to be seen.

Low doses of psychedelics have also been tried as a potential control condition to improve participant masking (Griffiths et al. 2016). One study combined a low dose of psilocybin with incomplete disclosure (see below) such that participants and study staff were unaware of the number of treatment arms in the study. Specifically, participants were informed that they could receive anywhere from 0.5 to 30 mg of psilocybin in the trial when in fact they could only receive 0.5 mg if they were in the control condition or 25 mg if they were in the treatment condition (Griffiths et al. 2016). An advantage of including the low dose of psilocybin is that all participants are truthfully told they will receive psilocybin, which presumably helps balance treatment expectations across both conditions. However, participants and therapists are still at risk for unmasking with this design because it is typically easy to ascertain whether the participant has an intense psychedelic experience or not. Schenberg (2021) also noted that this design may be limited by ethical considerations, given that 3,4-methylenedioxymethamphetamine (MDMA) research has shown that low-dose control conditions can be stressful and trying for patients, leading to dropouts and dissatisfaction (Oehen et al. 2013), and anecdotal lore in the underground psychedelic therapy community suggests that medium doses of psychedelics can agitate people without allowing them to “break through” (JDW, personal communication, 2021). On the other hand, low doses of classic psychedelics (i.e., microdosing) have been purported to be therapeutic (Fadiman 2011; Kuypers et al. 2019), which could also confound study results, although the therapeutic benefit of single microdoses seems unlikely to be durable or significant. Thus, including a low-dose psychedelic as part of an active control condition is a promising starting point.

Incomplete disclosure of certain aspects of the study design is a strategy that has been employed to enhance masking success and balance treatment expectations among conditions. For example, some studies incompletely disclose the number of treatment arms to participants in an attempt to obscure the study design and reduce the participants’ confidence in their treatment group allocation (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006; Reissig et al. 2012). Another compelling approach (in healthy subjects) involves consenting participants to possibly receiving one of several substances in order to reduce their certainty of treatment allocation. For example, in some experiments, participants consent to receive MDMA, methamphetamine, tetrahydrocannabinol (THC), benzodiazepine, and/or placebo (Bedi et al. 2010; Bershad et al. 2019), but in fact only receive one or two of these drugs in any particular study. Although this design is possible to implement in psychedelic studies of healthy individuals who are not seeking treatment, there are limitations to this approach, including reduced generalizability because a large proportion of the population may not be comfortable with receiving any one of the listed substances. Moreover, this design has not proven to be particularly effective to date, as participants accurately identify the experimental condition (e.g., MDMA and psilocybin) ~70–85% of the time (Bershad et al. 2019; Carbonaro et al. 2018). Thus, even with these more rigorous approaches, adequate masking remains a challenge. Taken together, there is a pressing need for methodological innovations that adequately address the problem of masking in psychedelic studies.

Muthukumaraswamy et al. (2021) made several recommendations for addressing masking in psychedelic clinical trials. The authors suggested that active placebos may need to be combined with alternative trial designs (e.g., dose-response parallel-groups design) as well as some vagueness about the acute effects of psychedelics when consenting participants. Dose-response parallel-groups designs compare the full dose of the active treatment drug with a low dose; the advantages and disadvantages of such an approach were discussed above. Vagueness regarding the acute effects of psychedelics has tradeoffs as well: although it may improve masking, there are clear ethical concerns as participants need to be able to give fully informed consent (Smith and Sisti 2021). This consideration is especially true with psychedelic studies, as psychedelic experiences have been described as “life changing” and have the potential to affect one’s social relationships (Ross et al. 2016), spirituality (Griffiths et al. 2006), and worldview (Timmermann et al. 2021). Another recommendation provided was the 2 × 2 balanced placebo design (Rohsenow and Marlatt 1981), or 2 × 2 factorial design, in which the intervention factor (psychedelic drug, placebo) and instructional set provided to each participant (receiving psychedelic drug, receiving placebo) are systematically crossed with each other. This design offers a potentially rigorous experimental means for separating pharmacological effects of the drug from participant expectations but is most suitable for mechanistic studies of acute drug effects, rather than clinical trials examining treatment efficacy. To date, there are no published reports of this design being used in psychedelic drug research, possibly because of its high costs (Schenberg 2021).
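To make the logic of the 2 × 2 balanced placebo design concrete, the following Python sketch enumerates its four cells and separates a drug effect from an instruction (expectancy) effect using cell means. It is only an illustration under stated assumptions: the function names, cell labels, and outcome values are hypothetical and are not taken from any published trial.

```python
from itertools import product
from statistics import mean

# The two crossed factors (labels are hypothetical).
DRUG = ("psychedelic", "placebo")                   # what is actually administered
INSTRUCTION = ("told_psychedelic", "told_placebo")  # what the participant is told


def design_cells():
    """Return the four (drug, instruction) cells of the 2 x 2 design."""
    return list(product(DRUG, INSTRUCTION))


def factorial_effects(cell_means):
    """Estimate main effects and the interaction from the four cell means.

    cell_means maps (drug, instruction) -> mean outcome score.
    """
    m = cell_means
    # Drug main effect: averaged over what participants were told.
    drug = (mean(m[("psychedelic", i)] for i in INSTRUCTION)
            - mean(m[("placebo", i)] for i in INSTRUCTION))
    # Expectancy main effect: averaged over what was actually given.
    expectancy = (mean(m[(d, "told_psychedelic")] for d in DRUG)
                  - mean(m[(d, "told_placebo")] for d in DRUG))
    # Interaction: does the drug effect depend on the instruction?
    interaction = ((m[("psychedelic", "told_psychedelic")]
                    - m[("placebo", "told_psychedelic")])
                   - (m[("psychedelic", "told_placebo")]
                      - m[("placebo", "told_placebo")]))
    return {"drug": drug, "expectancy": expectancy, "interaction": interaction}


# Made-up symptom-improvement scores, for illustration only.
example_means = {
    ("psychedelic", "told_psychedelic"): 10.0,
    ("psychedelic", "told_placebo"): 6.0,
    ("placebo", "told_psychedelic"): 5.0,
    ("placebo", "told_placebo"): 1.0,
}
effects = factorial_effects(example_means)
# effects == {"drug": 5.0, "expectancy": 4.0, "interaction": 0.0}
```

In this invented example the two factors contribute additively; a nonzero interaction would indicate that the pharmacological effect depends on what participants believe they received.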
Although researchers have begun to address the methodological challenges associated with masking, treatment expectations, and their combined impact that can bias study results, there is a need to advance the rigor of future research. We build upon this work in the next section by elaborating on recommendations for improving psychedelic clinical trials.

Novel recommendations to improve future research

Experimental confounds related to expectancies and placebo effects in psychedelic studies largely stem from inadequate masking. Therefore, our recommendations are primarily focused on how to improve masking in psychedelic trials through a combination of procedures intended to decrease participants’ confidence in their assigned treatment arm (Fig. 3). As our review of others’ pioneering work makes clear, adequate masking involves critical decision points at every step in the lifecycle of a clinical study. Our suggestions follow suit, noting elements for consideration in study development and design, participant recruitment and selection, outcomes and endpoints, study procedures, and analysis plans. It should be noted that masking is not an all-or-nothing phenomenon; incorporating a portion of these suggestions can incrementally reduce participants’ confidence in their treatment arm assignment and thereby attenuate the influence of treatment-nonspecific factors in interpretations of clinical trials.

Fig. 3

Recommendations for improving methodology in psychedelic trials. Overview of our recommendations for improving experimental methodology in future clinical trials with psychedelics

Study development and design

The choice of a control condition, the number of study arms, and overall design should be determined by the specific purpose of the study (Freedland 2020; Gold et al. 2017). For example, although an open-label study design does not mask participants or control for treatment-nonspecific factors, it may be appropriate when the purpose of the study is to examine safety, feasibility, or proof-of-concept. If the purpose is to examine treatment efficacy, inactive control conditions (e.g., treatment-as-usual, waitlist controls) should be included at the minimum to control for some treatment-nonspecific factors, such as natural history or regression to the mean. A stronger study design to test for efficacy would include an active control condition, such as an active placebo that mimics some of the acute effects of a psychedelic. Including both an active and inactive control condition (i.e., 3-arm design) is a promising way to disentangle placebo effects (Fillingim and Price 2005; Smith et al. 2020; Vase and Wartolowska 2019), because 3-arm trial designs allow for comparisons between both the treatment and the active placebo conditions with the inactive control condition to delineate treatment-specific effects from placebo effects (see Fig. 1a). There are also alternative study designs that may be especially useful because of psychedelic trials’ vulnerability to large placebo effects. Sequential parallel designs with a placebo run-in period can reduce the size of placebo effects by excluding “placebo responders” from the subsequent treatment phase (Campbell et al. 2019; Dworkin et al. 2010; Ivanova et al. 2016; Tamura and Huang 2007). This alternative design can be implemented in psychedelic trials by giving all participants an active placebo in the first phase and then randomly assigning only the participants who did not respond to the initial treatment (i.e., placebo nonresponders) to the psychedelic or placebo in the second phase.
This placebo run-in period creates a subgroup for analysis that increases the sensitivity to detect a treatment-specific effect (Dworkin et al. 2010; Ivanova et al. 2016); however, a recent systematic review challenges the notion that this design actually reduces the measured placebo response (Scott et al. 2021).
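The placebo run-in logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated randomization procedure: the response threshold, the participant identifiers, the arm labels, and the function name are all assumptions introduced here.

```python
import random


def run_in_allocation(phase1_improvement, response_threshold, seed=0):
    """Exclude placebo responders, then randomize the nonresponders.

    phase1_improvement maps participant id -> improvement observed while
    every participant received the active placebo (phase 1).
    Participants at or above response_threshold count as placebo responders.
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    responders = {pid for pid, change in phase1_improvement.items()
                  if change >= response_threshold}
    nonresponders = sorted(pid for pid in phase1_improvement
                           if pid not in responders)
    rng.shuffle(nonresponders)
    half = len(nonresponders) // 2
    return {
        "excluded_placebo_responders": sorted(responders),
        "psychedelic": sorted(nonresponders[:half]),
        "placebo": sorted(nonresponders[half:]),
    }


# Invented phase-1 improvement scores for six participants.
phase1 = {"p1": 12, "p2": 2, "p3": 1, "p4": 15, "p5": 0, "p6": 3}
allocation = run_in_allocation(phase1, response_threshold=10, seed=1)
```

Here participants p1 and p4 are excluded as placebo responders and the remaining four are split evenly between the two phase-2 arms. A real trial would predefine the responder criterion and use a concealed allocation system rather than a simple seeded shuffle.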

We also recommend designing studies with a single psychedelic administration when possible, given our current understanding regarding the efficacy of psychedelic therapy. There are compelling reasons to believe that multiple psychedelic dosing sessions may have therapeutic advantages (Bouso et al. 2013; Leger and Unterwald 2021; Mithoefer et al. 2019), and this treatment model is very likely to be adopted in clinical practice if these therapies become FDA-approved. On the other hand, the current controversies surrounding psychedelic therapy are focused on whether there is any drug-specific benefit of the complex therapeutic intervention. The answer to this basic question is very likely to inform regulatory decisions, cost-effectiveness models, and coverage by insurers, and is dependent on adequately masked trials. To that end, studies with only a single dosing session are likely to be superior in supporting adequate masking compared to studies with multiple dosing sessions. That is, once participants have experienced the subjective effects of a substance, they are more likely to identify that substance if it is readministered or recognize that a different substance has been given, compromising the conclusions that can be drawn from the trial (Wilsey et al. 2016). Therefore, we recommend between-subjects designs with a single dosing session when evaluating treatment efficacy.

Several trials have included an open-label crossover component, wherein patients assigned to the inactive control arm are offered the opportunity to receive open-label psychedelic therapy after completing the final post-treatment assessment (Wolfson et al. 2020). Some have argued that this design feature is ethically mandatory in order to provide the patient with the best possible chance of therapeutic response. We disagree with the idea that the standard of care, or optimal care, involves offering unregulated and unapproved psychedelic therapy, particularly when the goal of these trials is to establish the efficacy of these same interventions. We recommend incorporating well-established strategies to minimize harm to participants that may arise if an experimental therapy is either harmful, or conversely highly effective, rendering placebo treatment unethical. “Stopping rules” are predefined time points where an interim analysis for efficacy can be performed to identify these situations and minimize harm. Alternatively, adaptive randomization based on outcome (see below) can achieve a similar goal while maintaining statistical power (Dragalin 2011; see Footnote 3). We also emphasize the importance of including robust psychotherapeutic support in any treatment arm when dealing with high-risk populations selected for treatment resistance, both to maximize patient safety and monitoring and to better assess drug-specific enhancement of psychotherapy as discussed previously.

Participant recruitment and selection

We recommend recruiting psychedelic-naive participants when possible for clinical trials. Masking an individual’s treatment condition is much more feasible if they have no prior experience with that substance and are less certain about what effects to expect (i.e., process expectations; Tambling 2012; Wilsey et al. 2016). On that basis, participants should be naive to the active placebo as well. Ostensibly, psychedelic-naive individuals would have less confidence as to whether they received the treatment or active placebo, particularly if the active placebo had hallucinogenic effects. Carbonaro et al. (2018) demonstrated that experienced hallucinogen users are highly accurate at differentiating between whether they received psilocybin or DXM, but those without prior hallucinogen use may be easier to convince, especially if this strategy is combined with other recommendations given here (e.g., incomplete disclosure of study design, between-subjects designs with a single drug administration). It should be noted, however, that a challenge with this design is that several psychoactive substances (e.g., cannabis, opioids) are known to elicit different subjective and behavioral responses in drug-naive individuals compared to those with past experience (Solowij et al. 2019). This appears to be the case with psychedelics too, as demonstrated by a negative relationship between number of previous psychedelic uses and the intensity of acute effects (Aday et al. 2021). Thus, the phenomenological experience and intensity of drug effects may differ in first-time users, which could limit generalizability. If recruiting only psychedelic-naive participants is not feasible given the increasing number of recreational users (Yockey et al. 2020), then imposing clear exclusion criteria, such as restrictions on number of lifetime uses or use within the past 12 months, should be incorporated.

Outcomes, assessments, and endpoints

The choice of outcomes, assessments, and endpoints can have a large impact on the evaluation of treatment benefit and overall methodological rigor of psychedelic clinical trials. The primary endpoint for a trial should be well-defined, reliable, and represent a clinically meaningful outcome of how a patient feels, functions, or survives (e.g., Fleming and Powers 2012; US FDA 2009). Outcome measures should be consistent with expert recommendations or consensus statements for a given disease or condition under study when available (e.g., Deyo et al. 2014), and the minimal clinically important difference in the primary outcome measure that represents a treatment benefit should be set a priori (e.g., Dworkin et al. 2008, 2009). There are unresolved questions regarding the long-term efficacy of psychedelic therapy. Lasting, clinically significant improvements following psychedelic therapy, regardless of any placebo group difference, are likely more important to patients, providers, and stakeholders than an acute improvement that is not maintained. However, given the current level of evidence and controversy regarding the drug-specific efficacy of the treatment, we emphasize that the primary purpose of rigorous, well-controlled trials is to establish clear evidence of benefit that outlasts the acute drug effect. The specific timing of outcomes will depend heavily on the indication under consideration. Although long-term follow-ups provide a more complete understanding of treatment effects, especially in trials on chronic conditions, they are still susceptible to placebo effects and selection bias affecting trials from the outset. For example, a well-designed, masked RCT showed that arthroscopic knee surgery was never better than placebo surgery across 2 years of assessments (Moseley et al. 2002).

We recommend using multiple methods of measurement to comprehensively examine the effects of psychedelic therapy in clinical trials. Patient-reported outcomes (PROs) assess the status of a patient’s health condition (e.g., disease symptoms, functioning) directly from the patient and are commonly used as endpoints in clinical trials (Mercieca-Bebber et al. 2018; US FDA 2009). Including valid, reliable, and clinically informative PRO measures is valuable because they capture patient-centered perceptions of meaningful change and have downstream influence on clinical decision-making, drug labeling claims, and health policy (Calvert et al. 2018; Doward et al. 2010). Clinician-administered assessments or observer reports can also be useful in psychedelic trials as they avoid potential self-report biases of PROs; however, these types of assessments are also vulnerable to methodological issues, such as low interrater reliability and rater bias (Kobak et al. 2007). Therefore, when feasible, trials should also include objective and reliable measures, such as biomarkers and/or behavioral tasks that reflect component processes related to the index pathology. Two categories of biomarkers recognized by the FDA (Smith et al. 2017; US FDA 2020) that may be particularly relevant for psychedelic clinical trials are predictive biomarkers and surrogate endpoints. Predictive biomarkers indicate whether certain participants respond differentially to the treatment or placebo and can be used to stratify randomization on variables of interest that may maximize the efficiency of a trial and minimize the risk of exposing additional patients to an unproven treatment (Strimbu and Tavel 2010). Surrogate endpoint biomarkers include accurate and well-validated lab measures or physical signs that reliably predict or stand in for a clinically meaningful endpoint (e.g., biomarkers of abstinence; Johnson et al. 2014; Fleming and Powers 2012). 
Not all diseases or health conditions have biomarkers that predict treatment benefit or represent clinical endpoints, but when available, inclusion of these types of biomarkers may lead to more efficient trials with less bias (Fleming and Powers 2012). Because psychedelic clinical trials are particularly expensive, one must weigh the tradeoffs between trial costs and participant burden with the addition of biomarkers, long-term follow-ups, and lengthy assessments.
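As an illustration of how a predictive biomarker could be used to stratify randomization, the sketch below applies permuted-block randomization within each biomarker stratum so that arms stay balanced inside every stratum. The stratum labels, arm names, and function name are hypothetical, and the permuted-block scheme is one common choice assumed here rather than a method prescribed by the sources cited above.

```python
import random
from collections import defaultdict


def stratified_block_randomize(participants,
                               arms=("psychedelic", "active_placebo"),
                               seed=0):
    """Assign arms using permuted blocks within each stratum.

    participants is a list of (participant_id, stratum) tuples, where the
    stratum is a biomarker category such as "biomarker_high".
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for pid, stratum in participants:
        by_stratum[stratum].append(pid)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        # Each block contains every arm once, in a freshly shuffled order.
        for start in range(0, len(ids), len(arms)):
            block = list(arms)
            rng.shuffle(block)
            for pid, arm in zip(ids[start:start + len(arms)], block):
                assignment[pid] = arm
    return assignment


# Eight invented participants: four per biomarker stratum.
people = [(f"p{i}", "biomarker_high" if i < 4 else "biomarker_low")
          for i in range(8)]
arms_by_participant = stratified_block_randomize(people, seed=3)
```

With four participants per stratum and two arms, each stratum ends up with two participants per arm, whichever seed is used.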

Study procedures: managing and measuring treatment expectations

Several pragmatic steps can be taken at the beginning stages of a study to manage participants’ expectation bias. We do not currently have sufficient data to claim that psychedelic therapy is an effective treatment; therefore, investigators should emphasize the uncertainty regarding the treatment efficacy, rather than insinuating that the treatment will improve participants’ symptoms (Erpelding et al. 2020; Evans et al. 2021; Gewandter et al. 2020; Smith et al. 2020). This communication on the uncertainty of treatment efficacy should be consistent across recruitment materials, initial contact with potential participants, consent forms, and any interactions with participants. Moreover, in trials comparing psychedelic therapy to placebo, drug effects should be explained neutrally (Smart et al. 1966). For example, participants can truthfully be informed about possible drug effects while also noting that there is significant variability between people—some people have strong reactions to a psychedelic while others have very mild reactions (Griffiths et al. 2016). Similarly, in studies in which both treatment arms receive psychotherapy, the investigator can honestly describe psychotherapy as an effective treatment whether or not it is paired with a psychedelic. To ensure this clinical equipoise and manage participants’ expectations, all study staff should be masked to treatment arm assignment and trained to present the study and arms of the trial neutrally.

In addition to managing expectations, it is important to measure participants’ treatment expectations. We and others (e.g., Muthukumaraswamy et al. 2021) recommend the use of established measures of expectancy, such as the Stanford Expectations of Treatment Scale (Younger et al. 2012), which is a valid and reliable measure of participants’ positive and negative treatment expectancies. The scale includes six items that can easily be adapted across research contexts to identify differences in expectancies between treatment groups as well as relationships between treatment expectancies and outcomes. The Credibility and Expectancy Questionnaire (Devilly and Borkovec 2000) can also be used to measure the degree to which a participant thinks and feels the treatment will improve their symptoms or functioning. Furthermore, several face-valid questions, such as “how helpful do you believe the treatment will be for improving your [primary symptom]?”, have been used successfully to measure treatment expectations in previous research (e.g., Sherman et al. 2010). Another option is to conduct semi-structured interviews, possibly during participant preparation and integration sessions, and use qualitative analyses to assess participants’ positive and negative treatment expectations (e.g., Eaves et al. 2015). Because of the aforementioned issues with unmasking following a psychedelic session, and the interaction between masking and expectations, it may be useful to measure treatment expectations after the drug dosing session in addition to those at baseline. Arguably, expectations at baseline may be predictive of subjective effects during the psychedelic session, and expectations at post-session may be predictive of changes in clinical outcomes. This speculation remains to be tested, but it is worthwhile to systematically evaluate the natural dynamics of expectations during psychedelic trials and examine whether expectations change after the dosing session.
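The suggestion to measure expectations both at baseline and after the dosing session can be sketched as a simple change-score calculation. The code below assumes a six-item questionnaire with separate positive and negative expectancy items; which items belong to which subscale is passed in as a parameter and must be taken from the scoring instructions of the instrument actually used. The item positions and ratings shown are invented for illustration.

```python
from statistics import mean


def expectancy_scores(ratings, positive_items, negative_items):
    """Average the positive and negative expectancy items separately.

    ratings is a list of item ratings; positive_items and negative_items
    are lists of zero-based item positions (an assumption of this sketch).
    """
    return {
        "positive": mean(ratings[i] for i in positive_items),
        "negative": mean(ratings[i] for i in negative_items),
    }


def expectancy_change(baseline, post_session, positive_items, negative_items):
    """Return post-session minus baseline expectancy for each subscale."""
    before = expectancy_scores(baseline, positive_items, negative_items)
    after = expectancy_scores(post_session, positive_items, negative_items)
    return {scale: after[scale] - before[scale] for scale in before}


# Hypothetical item layout and ratings for one participant.
POSITIVE = [0, 2, 4]
NEGATIVE = [1, 3, 5]
baseline_ratings = [5, 2, 6, 1, 4, 3]
post_session_ratings = [7, 1, 7, 1, 7, 1]
change = expectancy_change(baseline_ratings, post_session_ratings,
                           POSITIVE, NEGATIVE)
# change == {"positive": 2, "negative": -1}
```

Collecting such change scores in every arm would allow investigators to examine whether expectations shift after dosing and whether the shift differs between conditions.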

Study procedures: incomplete disclosure

We have reviewed studies where incomplete disclosure has been used to reduce participants’ certainty regarding their treatment assignment (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006; Reissig et al. 2012). In designing a trial, it is critically important to distinguish “incomplete disclosure” from “deception.” Most institutional review boards have internally defined these respective procedures; however, “deception” is generally agreed to mean that the investigators provide false information to a participant whereas “incomplete disclosure” indicates that the subject is not fully informed about the purpose or design of the study. These strategies are controversial—the ethics of omitting important information about a study and misleading participants is an area of ongoing debate (Miller et al. 2005; Roulet et al. 2017). Implementing any deceptive practice requires thorough scientific justification and authorization by institutional review boards. Empirical evidence in healthy adults suggests that research participants may not be adversely affected by deception (Mundt et al. 2017); however, in the context of clinical trials in which therapeutic alliance is critical for patient safety and treatment efficacy, deception may be particularly ill-advised. If it is considered ethically appropriate, though, withholding information from participants as well as study staff about the number of study arms and the exact doses administered may be particularly effective for enhancing masking success. Providing a vague, incomplete description of the study structure and a range of possible dosages may be best suited for standard, two-armed RCT designs (and avoids the need to use an alternative study design that requires a significantly larger sample size for adequate statistical power). 
Without the cues of knowing that it is only possible to receive the experimental treatment or placebo (e.g., a high dose or an ultra-low dose of a psychedelic), it may be difficult for both the participant and staff to develop a firm belief about the participant’s treatment condition. Similarly, listing the side effects of all of the potential study drugs together—instead of listing effects specific to each substance—may be an ancillary strategy to reduce participants’ confidence in their treatment arm assignment while still fully informing them of all the drug effects they may be exposed to (Boutron et al. 2006). In a recent study with 5-MeO-DMT, researchers withheld the identity of the study drug but informed participants that they would be receiving a tryptamine psychedelic (Reckweg et al. 2021); this may be a useful method for managing expectations in cases where participants could have distinct expectations regarding specific psychedelic substances. A related recommendation to improve methodological rigor in the field is for researchers to report what drug effects participants were informed about prior to the study.

Incomplete disclosure to participants and study personnel regarding key elements of a study’s design may help to meet a central objective of masking: establishing “a state of ambivalence” about treatment allocation to minimize the impact of beliefs on study outcomes (Mathieu et al. 2014). Ensuring that study staff receive the same information as participants and remain unaware of the true design throughout the study is critical, as feedback from observers is known to influence participants’ clinical outcomes (Colagiuri and Boakes 2010; Hróbjartsson et al. 2012). It is important to acknowledge that undertaking this effort—concealing fundamentals of study design from staff as well as participants—is challenging from a practical standpoint, requiring careful management of access to information about the study (e.g., a “cone of silence”). Using incomplete disclosure or deception also necessitates appropriate debriefing protocols, as well as development of masking assessments that avoid revealing the true study design. Most assessment tools in the clinical trial literature measure perceived treatment assignment as nominal data and implicitly indicate study design (i.e., “Do you think you received the active treatment or placebo?”). Probing participants’ and staff members’ beliefs using ordinal/parametric scales may not only allow investigators to maintain uncertainty about the design, but also has the advantage of increasing statistical power (Laferton et al. 2017).
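One way to use an ordinal belief measure that does not reveal the study design is to ask every participant to rate how strong a dose they think they received and then compare the ratings between arms. The sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) for that comparison. The 0 to 10 rating scale, the question wording, and the function name are assumptions made here for illustration; values near zero would suggest that the arms could not be told apart on this measure.

```python
from math import sqrt
from statistics import mean, variance


def perceived_dose_separation(ratings_arm_a, ratings_arm_b):
    """Standardized difference in perceived dose strength between arms.

    Each argument is a list of ratings (e.g., 0 to 10) given in response to
    a question such as "How strong a dose do you think you received?".
    """
    n_a, n_b = len(ratings_arm_a), len(ratings_arm_b)
    # Pooled sample variance across the two arms.
    pooled_var = ((n_a - 1) * variance(ratings_arm_a)
                  + (n_b - 1) * variance(ratings_arm_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("ratings have no variability; d is undefined")
    return (mean(ratings_arm_a) - mean(ratings_arm_b)) / sqrt(pooled_var)


# Invented ratings: the arms are clearly distinguishable in this example.
high_dose_ratings = [8, 9, 10]
control_ratings = [1, 2, 3]
separation = perceived_dose_separation(high_dose_ratings, control_ratings)
# separation == 7.0
```

Because the question never mentions the number of arms or the word "placebo", it can be asked of participants and staff without disclosing the true design.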

Study procedures: active placebo

Use of an active placebo has a clear rationale for psychopharmacology studies. However, as reviewed above, efforts to mask the unique subjective effects of psychedelics have had limited success. Our choices are largely constrained by a limited understanding of how psychedelics produce therapeutic benefits. For example, a drug that mimics psychedelic effects but provides no therapeutic benefit could potentially be an excellent active placebo. However, the internal contradiction in this strategy becomes apparent if, as several researchers argue (Yaden and Griffiths 2020), the subjective effects produced by psychedelics (particularly mystical states) themselves drive therapeutic benefit. Although intuitive, this hypothesis is nonetheless unproven and a thorough evaluation is beyond the scope of this review; we instead refer the reader to an excellent summary of arguments for and against this idea (Olson 2020; Yaden and Griffiths 2020). We anticipate that future research will clarify whether mystical states induced by means other than psychedelics such as hypnosis (Lynn and Evans 2017), holotropic breathwork (Puente 2014), meditation (Russ and Elliott 2017), virtual reality (Glowacki et al. 2020), or non-psychedelic psychoactive drugs (Earleywine et al. 2021) are sufficient for therapeutic effects observed in psychedelic therapy trials, such as smoking cessation and symptomatic relief from depression in appropriate target populations.

A deeper understanding of the neural systems and neurochemistry required for psychedelics’ therapeutic effects may lead to highly effective comparators for use in clinical trials. A recent clinical study investigating the antidepressant mechanism of ketamine illustrates that the acute subjective effects of a psychedelic-class drug may be separable from its therapeutic effects. Williams et al. (2018, 2019) found that a high dose of an opioid antagonist, naltrexone, effectively blocked ketamine’s antidepressant and anti-suicidal effects but had a minimal impact on ratings of ketamine-induced dissociation. This small study was met with some controversy (Heifets et al. 2019; Marton et al. 2019; Yoon et al. 2019) and requires replication in a larger independent sample. Also, notably, the authors did not formally assess masking efficacy in the respective treatment conditions. Nonetheless, these findings suggest a powerful active placebo comparator for future studies of ketamine, and potentially other psychedelics. Similarly, for classical psychedelics like psilocybin, pharmacological agents may be discovered that interrupt neuroplastic processes triggered by psilocybin, but do not interfere with its acute psychedelic effects. Another highly innovative approach in development (NCT04842045) pairs psilocybin with an amnestic drug (midazolam, a benzodiazepine). This study is focused on safety. The broader hypothesis, yet to be tested, is that psychedelic and mystical states evoked in participants who do not form memories of the experience are not therapeutic, likely because participants’ amnesia prevents subsequent therapeutic integration of the psychedelic experience. An alternate outcome may be that participants do experience therapeutic benefit, but are effectively masked to their assigned treatment condition by virtue of midazolam-induced amnesia. 
In this case, a near-perfectly controlled, masked study design is achieved, with an easily interpretable finding for psilocybin’s efficacy, uncomplicated by differential placebo or nocebo effects in patients receiving midazolam alone versus midazolam plus psilocybin. We eagerly anticipate results from this pioneering line of inquiry and note several challenges. In addition to the ethical considerations of using amnestic agents in psychiatric populations, there are technical considerations that may confound this approach, including uncertainty as to whether midazolam retains its amnestic property when paired with a psychedelic, whether amnestic doses of midazolam produce a degree of sedation that precludes entry into a mystical state, or whether midazolam directly blocks therapeutic psychological or neural mechanisms induced by psychedelic medications.

Psychedelic therapy may be an uninterruptible whole, requiring the drug, psychedelic experience, and associated psychotherapy to achieve any therapeutic benefits (Sessa 2014). In this case, which should be assumed true until proven otherwise, there is still a pragmatic need to identify pharmacological and somatic placebo treatments that adequately mask psychedelic effects. Although we have no evidentiary basis to recommend specific active placebos beyond those that have been attempted, substances with hallucinatory effects (e.g., ketamine, DXM, and high doses of tetrahydrocannabinol) may be compelling options, especially when combined with drug-naive participants. We strongly support studies specifically devoted to developing and testing active placebos for use in therapeutic clinical trials. The need to develop active placebos for participants with past psychedelic use is particularly important given the likely decrease in psychedelic-naive participants that can be recruited for clinical therapeutic studies in the coming years.

Design of an active placebo ought to be considered in concert with other study design elements described above, with the overarching goal of reducing a prospective study participant’s certainty of their treatment condition. For example, if testing psilocybin’s efficacy for major depressive disorder, investigators may combine active placebo and incomplete disclosure to balance expectancy effects across treatment arms. For simplicity, the study could be designed as a two-arm comparison of high-dose psilocybin versus ultra-low-dose (ineffective) psilocybin plus an active placebo. During the informed consent process, participants would truthfully be informed that they will receive a dose from a range of psilocybin doses and may also receive an active placebo, with full disclosure that the purpose of the active placebo is to reduce their certainty of treatment assignment. The number of study arms (two, in fact) and the likelihood that their assigned psilocybin dose would be effectively non-therapeutic would not be disclosed. Furthermore, informed consent could include information that subthreshold (but not ultra-low) psilocybin may have therapeutic value, although, again, it would not be disclosed that no participants would be assigned to a subthreshold dose group. In this case, the specific goal of an active placebo might be to mimic aspects of a high dose of psilocybin, which could be achieved with DXM or perhaps a combination of a benzodiazepine and a mild stimulant. Taken together, participants would be informed of all the possible treatment conditions and may be reasonably uncertain as to whether they received a high therapeutic dose of psilocybin versus an ultra-low dose plus active placebo.

Analysis: assessing and reporting outcomes related to trial design

The set of treatment-nonspecific effects, collectively termed “the placebo effect,” and effective masking are key considerations for designing an interpretable study involving psychoactive drugs. Anticipating the placebo effect, measuring the contribution of expectancies, assessing the effectiveness of masking, and systematically reporting these data will set standards and lead to iterative improvements in trial design. These factors ought to be considered at every step in the lifecycle of a clinical study. We specifically recommend calculating statistical power based on known placebo effect sizes, obtaining repeat baseline measures of the primary outcome(s), measuring expectancies and masking success, and analyzing primary outcomes using expectancy and perceived (rather than actual) treatment arm as covariates.

Estimating the size of the placebo effect informs statistical power calculations, which, if resources are limited, may impact the feasible number of treatment arms. A common method of estimating the size of the placebo effect in a trial is to compare outcomes in the placebo arm to a “no treatment” arm (Hróbjartsson and Gøtzsche 2010; Wampold et al. 2016). However, given the previously discussed “hype” around psychedelics, participants randomly assigned to the “no treatment” arm would likely experience disappointment and nocebo effects from their knowledge of not being in the active treatment. An alternative method of partitioning the placebo effect from the treatment effect may be to compare against a “placebo benchmark” (Jones et al. 2021). Jones and colleagues found that the size of the placebo effect was uniform across different treatment approaches for depression (pooled Hedges’ g = 1.05). In areas where the size of the placebo effect has been well-established, researchers may be able to compare their anticipated effect size against such a criterion. Investigators can also take simple steps to minimize some components of the placebo effect, such as regression to the mean. We recommend that investigators perform repeat baseline assessments of their outcome of interest and only enroll participants with stable response characteristics. This procedure may be more cost-effective than including an untreated control condition to estimate regression to the mean.
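
To illustrate how the anticipated effect beyond placebo constrains the feasible number of arms, the following minimal sketch computes an approximate per-arm sample size for a two-arm, two-sided comparison of means using the standard normal approximation. The function name `n_per_arm` and the input `d` (the assumed standardized difference between the active and placebo arms) are hypothetical and chosen for illustration; none of the values are taken from the studies cited here, and a real trial would rely on a dedicated power analysis.

```python
import math
from statistics import NormalDist


def n_per_arm(d, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm comparison of means.

    d     : assumed standardized difference between active and placebo arms
            (a hypothetical planning input, not an estimate from the literature)
    alpha : two-sided type I error rate
    power : desired statistical power

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2
    """
    if d <= 0:
        raise ValueError("d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Smaller assumed differences beyond placebo demand many more participants.
    for d in (0.8, 0.5, 0.3):
        print(f"d = {d}: about {n_per_arm(d)} participants per arm")
```

Because the required sample size grows with the inverse square of `d`, a large placebo response that shrinks the expected between-arm difference can quickly make additional treatment arms infeasible when resources are fixed.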

We strongly recommend measuring the factors that make up the placebo effect. Prior to conducting any study procedures (e.g., preparation sessions), participants’ treatment expectations should be measured as described above. Measuring masking efficacy is similarly important and should be appropriately timed. In many cases, the clinical benefits of psychedelics may be rapid (Majić et al. 2015; Murphy-Beiner and Soar 2020). We recommend measuring participant- and therapist-perceived treatment allocation, certainty of treatment allocation, and the reason for their guess both immediately after the psychedelic dosing session(s) and at the end of the study.

Including two measurement occasions may help determine whether participants and therapists guessed the treatment allocation based on the subjective effects during the treatment session or from changes in clinical symptoms over time (Katz 2021; Kolahi et al. 2009). We agree with Katz (2021) that accurate guesses of treatment allocation due to treatment efficacy should not be considered unmasking. To further reduce the influence of unmasking, we suggest using clinical assessors who are unaware of the study design and participant treatment allocation to collect all relevant measures. Clinical assessors should also be asked about perceived participant treatment allocation at the end of the study (Katz 2021). We again emphasize that investigators should create protocols and adherence plans for all relevant study staff to maximize the chances that masking is maintained throughout the study.
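
One way to summarize treatment-allocation guesses at each measurement occasion is a per-arm blinding index. The sketch below uses the simple form of Bang's blinding index, which is not prescribed by this review and is swapped in here purely as an illustration; the guess counts are invented for the example. Within an arm, values near 0 are consistent with random guessing, values near 1 indicate that most participants guessed correctly, and values near -1 indicate that most guessed the opposite arm.

```python
def bang_blinding_index(n_correct, n_incorrect, n_dont_know=0):
    """Simple per-arm form of Bang's blinding index.

    Computed as (correct guesses - incorrect guesses) / all respondents in the
    arm, where respondents answering "don't know" count only in the denominator.
    """
    total = n_correct + n_incorrect + n_dont_know
    if total == 0:
        raise ValueError("at least one respondent is required")
    return (n_correct - n_incorrect) / total


if __name__ == "__main__":
    # Hypothetical counts of (correct, incorrect, don't know) guesses,
    # collected after the dosing session and again at the end of the study.
    guesses = {
        ("high dose", "post-dosing"): (18, 1, 1),
        ("high dose", "end of study"): (19, 1, 0),
        ("active placebo", "post-dosing"): (9, 7, 4),
        ("active placebo", "end of study"): (14, 4, 2),
    }
    for (arm, occasion), counts in guesses.items():
        index = bang_blinding_index(*counts)
        print(f"{arm:15s} {occasion:13s} index = {index:+.2f}")
```

Comparing the index across the two occasions, alongside participants' stated reasons for their guesses, may help separate guesses driven by acute subjective effects from guesses driven by later symptom change.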

Participant expectations and functional unmasking may be unavoidable sources of bias that impact internal validity and the inferences that can be drawn from study results (Higgins et al. 2011; Kolahi et al. 2009). However, modern adaptive trial designs can help investigators at least achieve an even distribution of these biases across conditions. A thorough discussion of adaptive designs is beyond the scope of this review, and we refer the reader to two useful summaries, including draft guidance from the FDA on adaptive trial design for industry (FDA 2019; Pallmann et al. 2018). In short, investigators may consider using expectancy and participant-assessed treatment conditions to create balanced randomization blocks (i.e., covariate-adaptive treatment assignment) just as other clinical trials stratify recruitment on the prevalence of comorbidities, sex, and other factors that may differentially impact treatment outcomes. For small exploratory trials, it may not be possible to balance on multiple pre-treatment variables; therefore, the decision to balance recruitment on treatment outcome expectations must be weighed against other recruitment priorities.
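
As a rough illustration of covariate-adaptive assignment, the sketch below implements a simplified minimization rule: each new participant is assigned to whichever arm currently holds fewer participants sharing their covariate levels, with ties broken at random. The arm labels, the covariate names (`expectancy`, `prior_use`), and the function `assign_by_minimization` are hypothetical, and production trials typically add a random element to every assignment and use validated randomization software.

```python
import random
from collections import defaultdict

ARMS = ("high_dose", "active_placebo")  # hypothetical arm labels


def assign_by_minimization(participants, factors, arms=ARMS, seed=0):
    """Assign participants sequentially, balancing arms on the given factors.

    participants : list of dicts, each with an "id" key and one key per factor
    factors      : names of the pre-treatment covariates to balance on
    Returns a dict mapping participant id to assigned arm.
    """
    rng = random.Random(seed)
    # counts[factor][level][arm] = participants already assigned
    counts = {f: defaultdict(lambda: {a: 0 for a in arms}) for f in factors}
    allocation = {}
    for person in participants:
        # For each arm, count prior assignments that share this person's levels.
        totals = {a: sum(counts[f][person[f]][a] for f in factors) for a in arms}
        lowest = min(totals.values())
        candidates = [a for a in arms if totals[a] == lowest]
        arm = rng.choice(candidates)  # random tie-break
        allocation[person["id"]] = arm
        for f in factors:
            counts[f][person[f]][arm] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical enrolment stream with two pre-treatment covariates.
    stream = [
        {"id": i,
         "expectancy": "high" if i % 3 else "low",
         "prior_use": "yes" if i % 2 else "no"}
        for i in range(12)
    ]
    for pid, arm in assign_by_minimization(stream, ["expectancy", "prior_use"]).items():
        print(pid, arm)
```

With a single factor, this rule keeps the two arms within one participant of each other at every level of that factor; with several factors, balance is approximate, which mirrors the trade-off noted above for small exploratory trials.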

A major benefit of measuring expectancies and masking efficacy is that these factors can be used as covariates in the analysis of primary study outcomes, and the specific effects of expectancy and treatment arm guess on outcome can be evaluated. In the previously discussed microdosing study by van Elk et al. (2021), researchers initially found that microdoses of psilocybin led to greater ratings of awe than placebo; however, after adding baseline expectations as a covariate to the analyses, the difference between conditions was non-significant. In a study that employs an effective active placebo, outcomes can be analyzed according to the drug that participants think they received compared to the drug they actually received. In a study measuring pleasantness of affective touch, Bershad et al. (2019) found a significant effect of MDMA compared to an active placebo, methamphetamine. A substantial number of participants who received methamphetamine believed they had received MDMA (38.9%). Analyzing outcomes using a participant’s guess as a covariate showed no effect in this latter group. This comparison strongly reinforced the authors’ conclusion that the effect of MDMA on affective touch was drug-specific and not a product of participants’ expectations.
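
The logic of adjusting a between-arm difference for baseline expectancy can be sketched with a classical analysis-of-covariance point estimate: the raw difference in mean outcomes is corrected by the pooled within-arm slope of outcome on expectancy multiplied by the between-arm difference in mean expectancy. The function name and the data below are invented for illustration and do not reproduce any cited study; a real analysis would use a statistics package that also provides standard errors and significance tests.

```python
from statistics import mean


def ancova_adjusted_difference(y_treat, x_treat, y_ctrl, x_ctrl):
    """Return (raw difference, expectancy-adjusted difference, pooled slope).

    y_* : outcome scores in each arm
    x_* : baseline expectancy scores for the same participants
    The slope is pooled within arms, so it is not driven by the arm difference.
    """
    sxy = 0.0
    sxx = 0.0
    for xs, ys in ((x_treat, y_treat), (x_ctrl, y_ctrl)):
        if len(xs) != len(ys):
            raise ValueError("outcome and covariate lists must match in length")
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("expectancy does not vary within arms")
    slope = sxy / sxx
    raw = mean(y_treat) - mean(y_ctrl)
    adjusted = raw - slope * (mean(x_treat) - mean(x_ctrl))
    return raw, adjusted, slope


if __name__ == "__main__":
    # Hypothetical data in which outcome tracks expectancy in both arms and the
    # active arm happens to hold higher expectations.
    raw, adjusted, slope = ancova_adjusted_difference(
        y_treat=[12, 14, 16], x_treat=[6, 7, 8],
        y_ctrl=[4, 6, 8], x_ctrl=[2, 3, 4],
    )
    print(f"raw = {raw:.1f}, adjusted = {adjusted:.1f}, slope = {slope:.1f}")
```

In this contrived example the apparent advantage of the active arm disappears after adjustment, the same qualitative pattern described for the microdosing study above. The same function could be applied with a numerically coded treatment guess in place of expectancy.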

Beyond the scope of RCTs

One notion to consider is embracing expectancy and placebo effects. The important role of expectancies in psychedelic therapy blurs the line between treatment-specific and treatment-nonspecific effects and raises the broader question: rather than eliminating treatment-nonspecific effects, should trialists be looking for ways to optimize and synergize them with treatment interventions to enhance clinical outcomes (Colloca and Barsky 2020; Enck et al. 2013)? Although no formalized manual exists on how to boost expectancy in psychotherapy, inducing positive expectations has been shown to enhance the effectiveness of a variety of health interventions (Bingel et al. 2011; Flowers et al. 2018; Kaptchuk et al. 2020), a strategy which could seemingly be tailored to—and be particularly synergistic with—psychedelic treatments as well. As discussed previously, placebo and drug-specific effects are likely to be interactive rather than additive (Kube and Rief 2017). Thus, it may be the case that the “therapeutic window” opened by psychedelics is an emergent property of a complex system comprising expectations, drug effects, setting, and therapeutic alliance. It may be impossible to isolate an individual component of this complex package in an RCT. Critically, this does not condemn psychedelic therapy as being no more effective than placebo, but means that the current gold standard clinical trial design may not be sensitive to detecting the therapeutic effect of an individual treatment element.

A potential solution to this dilemma may be to shift focus from efficacy trials and the use of explanatory or confirmatory RCT designs towards pragmatic clinical trial designs (PCTs) that have an alternative goal of assessing treatment effectiveness. Whereas internal validity (i.e., objective comparison of drug vs placebo in tightly controlled settings with homogenous groups) is the major objective of an explanatory or confirmatory trial, external validity and the generalizability of treatment effectiveness are the primary focus of a well-designed PCT. Consequently, PCTs offer potential “real-world” tests of clinical effectiveness and the generalizability of outcome data, rather than isolation of the active ingredient for change. To achieve these goals, PCTs typically include one or more alternative therapies to the treatment under study, rather than active or inactive placebos, and participants are normally recruited from a broad “real-life” clinical population, with few exclusions or restrictions on participation. Although pragmatic trials are normally conducted in the fourth, post-marketing phase of drug development, Carhart-Harris et al. (2021) have argued cogently for the potential benefits of pragmatic designs being used earlier to broadly assess the clinical effectiveness of current psychedelic treatments, either as an alternative or complement to the much narrower focus of current RCTs.

Lastly, a closely related approach to consider when testing the effectiveness of psychedelic therapy is to evaluate large-scale population data using so-called “natural experiments.” Natural experiments provide an alternative to RCTs by taking advantage of circumstances whereby naturally occurring events can be linked to variables of interest (Thapar and Rutter 2019). This type of design is necessary when randomly assigning individuals to masked conditions is not possible because of ethical or logistical constraints, such as when studying maltreatment or child neglect (Rutter 2007). If the challenges related to expectations and masking with psychedelics preclude rigorous RCTs, natural experiments may be another method of evaluating the treatment’s effects. With the recent legalization of psilocybin therapy in Oregon as well as successful decriminalization movements across the USA (Aday et al. 2020a; Marks and Cohen 2021), it is possible that objective indices related to mental health (e.g., suicide rates, emergency room visits for psychiatric issues) could precipitously decrease at the population level if psychedelics are indeed an effective treatment for a variety of psychiatric conditions. Although it is unclear what the initial accessibility of these treatments will be to individuals in states such as Oregon (Williams and Labate 2020), if positive trends in mental health are observed at the population level after the introduction of legal psychedelic therapy, the role of expectations may be considered immaterial to the broader benefits to society.

Conclusion

Accurate detection of treatment-specific effects in clinical trials is an intrinsically complex task across areas of research, as study personnel and participant expectations interact dynamically with masking and therapeutic outcomes. Psychedelic studies are particularly challenging as they must address additional confounds related to “hype” and salient psychoactive effects that substantially hinder treatment arm masking. On one hand, to characterize clinical efficacy and safety, it is an essential challenge for the field to separate pharmacological effects from multiple, interactive socio-psychological influences in psychedelic medicine. Innovative, disruptive experimental designs may be needed to this end. On the other hand, at a practical level, it is important from a public health standpoint to identify methods of optimizing psychedelic treatment outcomes, perhaps by utilizing expectancies. Findings from both lines of work could potentially guide clinical decision-making.

Traditional placebo masking with inert comparators is insufficient for high-dose psychedelic studies, and this review highlights that this issue often extends to psychotherapy and pharmacology research more broadly. Here, we present recommendations for improving the methodological rigor of future psychedelic studies that address issues related to expectations and participant masking. Specifically, we provide guidelines on study design (e.g., incomplete disclosure of treatment arms, neutral explanation of drug effects), participant recruitment and selection (e.g., include psychedelic- and active placebo-naive participants), outcomes and endpoints (e.g., include biomarkers and behavioral measures), control conditions (e.g., use active comparators), and analyses (e.g., test masking efficacy, control for pre-treatment expectations, compare against a placebo benchmark). Although these recommendations are tailored to psychedelic studies, they can be incorporated into psychotherapy and pharmacology research more broadly to increase precision in identifying treatment-specific effects. Doing so may improve methodological rigor and the identification of effective interventions across areas of medicine.