Introduction

Anxious individuals tend to interpret ambiguous information in a threatening way (Mathews and MacLeod 2005). According to cognitive models, this interpretation bias causes anxiety (Beck and Clark 1997; Clark and Beck 2010). To test this claim, researchers developed brief paradigms, called Cognitive Bias Modification for Interpretations (CBM-I), designed to shift interpretations to be more or less threatening. Initial studies demonstrated that it was possible to shift biases, and that shifting biases led to subsequent shifts in anxiety (Mathews and Mackintosh 2000; Mathews et al. 2007), providing support for cognitive models and initial evidence that CBM-I could be used as an intervention to treat or prevent anxiety.

Research has highlighted the clinical applications of CBM-I by showing that it can sometimes reduce symptom severity in anxious samples (Hirsch et al. 2016; Menne-Lothmann et al. 2014), can be disseminated online (Hoppitt et al. 2014), and can, in some circumstances, have effects as strong as gold standard treatments (Steinman and Teachman 2014). However, studies have obtained mixed results. For example, while some Internet-based CBM-I studies have been able to shift biases in expected directions and reduce subsequent psychopathology (Williams et al. 2013), others found that CBM-I does not affect subsequent anxiety (Steinman and Teachman 2015). Further, CBM-I tends to have stronger effects on subsequent anxiety when administered in a lab setting than over the Internet (Kampmann et al. 2016). Finally, although individual studies demonstrate that it is possible to reduce anxiety through CBM-I, meta-analyses suggest the effect is small (Hallion and Ruscio 2011) and inconsistent (Cristea et al. 2015; Menne-Lothmann et al. 2014). Importantly, a review of CBM meta-analyses suggests that CBM-I has a reliable, moderate effect on shifting interpretation bias and reducing anxiety symptoms, but does not reliably decrease emotional vulnerability (i.e., responses to stressors; Jones and Sharpe 2017).

Researchers have tested several variations of CBM-I, providing clues to how to optimize CBM-I’s effects. For instance, Holmes et al. (2006) found that effects of CBM-I are enhanced when participants are encouraged to vividly imagine the scenarios occurring. Similarly, Edwards et al. (2018) found that priming anxious imagery before CBM-I resulted in less anticipatory anxiety in response to a stressor but other results were mixed. Hoppitt et al. (2010) found that actively generating the positive endings of scenarios affected subsequent emotionality of ambiguous scenarios following training, while a condition without active generation did not. Further, Lee et al. (2015) found that asking participants to imagine a future event was more effective at reducing negative interpretations, compared to a condition in which participants were just asked to think about the content of the scenarios. A challenge in generalizing from these different attempts is that the variations are all tested in independent studies, making it difficult to compare effects.

The current study tests multiple variations of CBM-I in a large, analogue anxious sample. A team of students was tasked with generating new variants of the standard CBM-I that they thought would lead to strong effects. The goal was to determine which strategies are optimal, so that researchers can then follow up by testing the strongest variations in future full trials. This study included a common form of CBM-I, which we call “traditional CBM-I,” in which participants read and imagined themselves in ambiguous scenarios that ended with a word fragment that disambiguated the scenario in a non-threatening way, followed by a comprehension question that emphasized the non-threatening interpretation of the scenario (modified from Mathews and Mackintosh 2000). It also included 12 variations on this traditional CBM-I, a non-CBM-I condition designed to improve cognitive flexibility, a neutral scenarios condition, and a no task control condition. Data were collected via Amazon’s Mechanical Turk (MTurk), which provides quick and easy access to clinical and subclinical samples, and high quality data (Shapiro et al. 2013). Specifically, demographic and mental health data collected via MTurk have high test–retest reliability and internal consistency (Shapiro et al. 2013). Notably, participants report social anxiety levels that are approximately one standard deviation above the college mean, highlighting the high prevalence of anxiety among MTurk participants (Chandler and Shapiro 2016).

Testing New CBM-I Variants

Each variant of CBM-I built on prior theory or empirical findings to modify or add an element to training expected to enhance the effects of CBM-I.

Enhancing Imagery

Building from work showing that focusing on imagery, rather than text, led to stronger CBM-I effects (Holmes et al. 2006), in the Imagery Only/Audio variation, participants listened to an audio presentation of scenarios. Building from evidence that implementation intentions (if–then statements) can have strong effects on thought and action (Gollwitzer and Sheeran 2006), in the Implementation Intention condition participants were provided an implementation intention related to generating imagery when completing scenarios, which was expected to enhance visualization of the training material, promoting greater affective engagement with the material. (Note, the way different variants were grouped together was done for ease of discussion, but is certainly not the only grouping possible and many of the variants could fall into multiple categories.)

Increasing Engagement

Following results suggesting active generation and engagement leads to stronger CBM-I effects (Hoppitt et al. 2010), a Generate Positive Ending condition was included, in which scenarios were missing the final word (as opposed to including a fragment for the final word), and participants were instructed to type in a word to complete the scenario in a positive way (note, we were unaware when designing this condition that a related approach by Rohrbacher et al. (2014) did not strengthen effects). Another variant designed to increase the challenge, and hence engagement, was a Group of Words variation, in which participants were asked to select the final word of each scenario from a list of possible options.

Facilitating Readiness for Learning

Based on findings that priming anxiety in advance of attention training may improve CBM outcomes on the Internet (Kuckertz et al. 2014) and research suggesting that emotional memories can be effectively modified with activation and re-consolidation (Schwabe et al. 2014), the Anxious Imagery variation included an anxiety prime prior to CBM-I. Borrowing from contingency management research that suggests paying people for following through with tasks increases treatment effects (Prendergast et al. 2006), a Monetary Reward condition was included. In this condition, participants would intermittently receive a monetary bonus for answering comprehension questions correctly. Given that being present-focused and nonjudgmental might increase openness to learning new interpretive styles (Teper et al. 2013), a Mindfulness condition was included, in which participants completed a brief mindfulness task before CBM-I to induce at least a temporary state of nonjudgmental present focus.

Encouraging Flexibility

The benefits of CBM-I may be due to teaching participants to be more flexible in their interpretations, rather than teaching them to always interpret things positively. Therefore, a Switching condition was included, designed to increase flexibility by switching between interpreting blocks of scenarios in a negative or positive way. Analogously, a Reframing Negative Events variation was included, in which participants saw a mix of negative and positive scenarios, but following a negative ending, they saw a second resolution sentence that assigned a more positive outcome. This was intended to increase flexibility and teach participants to reappraise negative events (Beltzer et al. 2014). A variation in which scenarios ended positively half of the time and ended negatively half of the time in random sequence was also included (50/50). Although past researchers have considered this condition to be a control condition, results from past CBM-I studies suggest that this condition may not be inert, and may in fact be training cognitive flexibility (Menne-Lothmann et al. 2014).

Adding Pictures

Variants that added an explicit visual component to increase participant immersion were included. In the Adding Pictures variation, participants saw a picture related to the scenario they were reading. To emphasize the link between the meaning assigned and its emotional consequence, a Matching Facial Expressions variation was included. In this variation, following scenarios, participants were shown faces with different expressions and asked to select the face that matches the emotion in the scenario.

Non-interpretation Comparison Conditions

To test an alternate way to improve cognitive flexibility, rather than emotional interpretive style, Attention-switching was included. This condition required participants to switch rapidly between paying attention to one or the other of two simultaneously presented stimulus features; no scenarios were presented. Additionally, a control condition was included in which all scenarios were neutral and did not involve emotional ambiguity (Neutral). Finally, a No task Control condition was included.

We predicted that, in our analogue anxious sample, all 13 variations of CBM-I would lead to a less threatening interpretation bias compared to the No task Control condition; we made no specific predictions about how the CBM-I variations would differ from one another. Testing effects of the Attention-Switching condition was exploratory, given that no interpretation training was occurring but some increase in cognitive flexibility was plausible. We did not expect the Neutral condition to differ from the No task Control condition.

The goal in this exploratory study was to determine which CBM-I approaches are optimal so that researchers can then follow up by testing the strongest variations in future trials.

Method

Participants

An analogue anxious sample of N = 1120 (75.3% female) adults were recruited over the Internet through MTurk in exchange for $1.50 (or $1.75 for participants in the Monetary Reward condition). Participant ages ranged from 18 to 68 (M = 27.52, SD = 10.16; note that we excluded age data from four participants who reported being born in 1925 or earlier). All participants scored 10 or above, indicating moderate to extremely severe anxiety based on established norms, on the Depression, Anxiety, Stress Scales—Short Form: Anxiety Subscale (DASS-Anxiety; Lovibond and Lovibond 1995), which was used as a screener prior to participation in the main study. All participants reported US citizenship. Race was reported as: 69.9% White, 3.1% Black, 6.1% Asian, 8.0% as more than one race, and 11.7% as other or unknown. Ethnicity was reported as 9.3% Hispanic, 77.7% non-Hispanic, and 13.0% unknown or unreported.

Materials

Conditions

See Online Appendix for sample materials from each condition.

100% Positive

Participants were asked to read and imagine themselves in 70 scenarios (modified from Mathews and Mackintosh 2000). Scenarios covered multiple fear domains, including potential social threats (e.g., meeting with your boss; 35 scenarios), physical threats (e.g., feeling lightheaded; 20 scenarios), and other threats (e.g., strange noise in the middle of the night; 15 scenarios). Each scenario ended with a word fragment. All scenarios were ambiguously valenced until the fragment, which disambiguated the scenarios in a non-threatening way. Participants had three chances to complete the word fragment correctly; after three tries the program would move to the comprehension question. Following each scenario, participants answered a comprehension question, which ensured they read the scenario and reinforced the non-threatening interpretation. For example, participants might read, “Your partner asks you to go to an anniversary dinner that their company is holding. You have not met any of his/her work colleagues before. Getting ready to go, you think that the new people you will meet will find you fr_endly.” Participants would type “i” to complete the word “friendly.” Then, participants would be asked a comprehension question to reinforce the non-threatening interpretation of the scenario, such as, “Were you disliked by your new acquaintances?” and participants would type “n” for “no.” Participants had to answer the comprehension question correctly before proceeding to the next trial. All of the following conditions were identical to the 100% Positive condition, except for the specific differences noted below.

Imagery Only/Audio

Scenarios were presented via audio (instead of text); there were no word fragments. Comprehension questions were presented on screen.

Implementation Intention

Prior to training, participants were told to try their best to complete the task with the goal of imagining themselves in the stories as best they could. Participants were told to try to complete the task while thinking, “Whenever I fill in a new word fragment, I will do my best to imagine myself in the situation being described.”

Generate Positive Ending

Scenarios were missing the last word. Participants were asked to resolve the ambiguity of the scenario positively by typing in one or more words (vs. completing a word fragment) to complete the sentence (a subsample of entered words were later reviewed to confirm adherence to instructions).

Group of Words

Scenarios were missing the last word. Participants completed the scenarios by choosing a word from a list of four words as quickly as possible. All word choices were of similar valence, but only one option made sense grammatically and emotionally. Participants were prompted to select a different word if they chose incorrectly.

Anxious Imagery

Prior to training, participants were asked to complete a guided imagery exercise for 20 s in which they were asked to imagine themselves in a recent or upcoming situation that elicited/would likely elicit anxiety.

Monetary Reward

Participants were informed that some of the items would be eligible for a bonus, and if they got these items correct on the first try, they would receive a monetary bonus. After approximately one third of correctly answered comprehension questions, participants received feedback that they were correct and that they won additional bonus money. Following training, they were told they won an additional $0.25.

Mindfulness

Prior to training, participants completed a 3.5-min mindfulness task in which they listened to instructions to focus on their breathing, listen to surrounding sounds, and become aware of bodily sensations. Halfway through training, they were reminded to continue being mindful and led through a similar 1-min breathing exercise.

Switching

Half of the scenarios ended in a non-threatening way, and the other half ended in a threatening way. Participants practiced switching between non-threatening and threatening endings in a fixed order of blocks.

Reframing Negative Events

Half of the scenarios ended in a non-threatening way and the other half initially ended in a threatening way (presented in random order). Following each threatening ending, participants read another sentence that added a non-threatening outcome (often suggesting resilience) to the previous scenario. Comprehension questions always followed non-threatening endings.

50/50

Half of the scenarios ended in a non-threatening way and the other half ended in a threatening way (presented in random order).

Adding Pictures

Scenarios were accompanied by related neutral images.

Matching Facial Expressions

Twenty-five percent of comprehension questions were followed by a second “affective comprehension question,” in which participants were shown faces with different expressions and asked to select the face that matches the emotion in the scenario. Participants were told they were correct if they picked a positive face.

Attention-Switching

Instead of completing CBM-I, participants completed an “attention-switching” control task to improve cognitive flexibility (Karbach and Kray 2009). Participants were instructed to switch rapidly (every 1–2 trials) between paying attention to one or the other of two simultaneously presented stimulus features. On each trial, participants saw a stimulus appear in the center of the screen, and pressed one of two keys to sort the stimulus into the correct category.

Neutral

There was no emotional ambiguity in the scenarios and they referenced neutral situations.

No Task Control

Individuals completed assessments on the same schedule as the other participants, but did not complete any training. Following baseline assessments, participants were told they would complete a second set of questionnaires, and then immediately completed post-training measures (note that the RRT, described below, included a different set of scenarios the second time it was administered). They received the same study description as the other training conditions and were compensated equivalently.

Measures of Interpretation Bias

To evaluate if condition modified interpretation bias, participants completed the Recognition Ratings Task (RRT, modified from Mathews and Mackintosh 2000) before and after training (see Salemink and van den Hout 2010 for task validation). Note that this task is also referred to as the Similarity Rating Task (SRT) and Ambiguous Passages in the literature. Participants read and imagined themselves in seven novel scenarios about social situations that were similar in format to training scenarios, except each scenario included a title, and all scenarios remained ambiguous even when the word fragment was completed. For example, participants might see, “THE LOCAL CLUB: You are invited to attend a social event at a local club, although you don't know any of the members very well. As you approach the door you can hear conversation and loud music, but as you enter the room it stops for a mo_ent.” The matching comprehension question would be, “Do you know most of the club members very well?” Participants saw different scenarios before and after training.

Next, participants would see the title of each ambiguous scenario with a brief reminder of what the scenario was about, along with four disambiguated interpretations of each scenario. Two of the disambiguated interpretations were related to social concerns (labeled “targets,” one positive, one negative) and two were unrelated to social concerns (labeled “foils,” one positive, one negative). For example, participants would see “conversation stops and club members glare at you” (negative, target), “conversation stops so club members can greet you” (positive, target), “you realize your favorite song was just playing” (positive, foil), and “you realize you forgot your wallet at home” (negative, foil). Participants were asked to rate how similar each disambiguated interpretation was to what they believed was the meaning of the original scenario on a scale of 1 (“very dissimilar in meaning”) to 4 (“very similar in meaning”). The RRT was administered at both pre- and post-training. Cronbach’s alphas for negative and positive RRT for both time points ranged from 0.55 to 0.84. Of note, we assessed condition effects on positive and negative RRT separately (rather than creating a bias index) given recent work suggesting that positive and negative interpretations may not be a unidimensional construct (Steinman et al. 2019).
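The internal consistency values reported above are Cronbach's alphas. As a reminder of what that statistic summarizes, the following is a minimal sketch of the standard formula, not the authors' analysis code; the function name and the small ratings matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one row per participant, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance (n - 1) of each item across participants.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings on the 1-4 similarity scale (rows = participants).
ratings = [
    [1, 2, 1, 2],
    [2, 2, 3, 3],
    [3, 4, 3, 4],
    [4, 4, 4, 3],
    [2, 3, 2, 2],
]
alpha = cronbach_alpha(ratings)
```

Alpha rises as items covary more strongly relative to their individual variances, so a value near 0.55 indicates that the items share comparatively little common variance.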

Given that the RRT is similar in format to CBM-I, an additional measure of interpretation bias was included following training: a subset of items from the Brief Body Sensation Interpretation Questionnaire (BBSIQ; Clark et al. 1997). In the BBSIQ, participants are presented with fourteen ambiguous events related to physical (e.g., feeling lightheaded) or external (e.g., smelling smoke, social situations) concerns, along with three possible explanations for each ambiguous event (one negative, and two neutral or positive explanations). Participants were asked to rate the extent to which they believed each explanation for why the ambiguous event occurred on a scale of 0 (“not at all likely”) to 8 (“extremely likely”). To obtain an interpretation bias score, ratings for the negative explanations were averaged (following Steinman and Teachman 2010, 2015). The BBSIQ was administered post-training. Cronbach’s alpha = 0.97, suggesting good reliability.
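The BBSIQ scoring rule described above (average the ratings given to the negative explanations) can be sketched as follows. The data layout, key names, and example ratings are hypothetical and are not taken from the study materials.

```python
from statistics import mean


def bbsiq_negative_score(events):
    """Average the 0-8 ratings given to the negative explanation of each event."""
    ratings = [event["negative"] for event in events]
    if any(not 0 <= r <= 8 for r in ratings):
        raise ValueError("BBSIQ ratings must fall between 0 and 8")
    return mean(ratings)


# Hypothetical responses: one negative and two other explanations per event.
events = [
    {"negative": 6, "other": [2, 3]},
    {"negative": 2, "other": [5, 7]},
    {"negative": 4, "other": [4, 1]},
]
score = bbsiq_negative_score(events)
```

Higher scores indicate stronger endorsement of threatening explanations; the ratings for the neutral or positive explanations do not enter the score.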

Measures of Anxiety Symptoms and Emotional Vulnerability (Footnote 1)

Potential participants completed the Depression, Anxiety, Stress Scales—Short Form: Anxiety Subscale (DASS-Anxiety; Lovibond and Lovibond 1995) as a screener. In the DASS-Anxiety, participants are asked how often seven anxiety symptoms applied to them in the past week, on a scale of 0 (“did not apply to me at all”) to 3 (“applied to me very much, or most of the time”). The DASS has strong psychometric properties (Antony et al. 1998). The DASS-Anxiety was administered prior to pre-testing measures. Cronbach’s alpha in current sample = 0.81.

To further assess baseline anxiety, participants completed the Overall Anxiety Severity and Impairment Scale (OASIS; Norman et al. 2006). The five-item OASIS assesses anxiety frequency, severity, and associated avoidance, work and social interference. All items are rated on a scale of 0 (lowest impairment/severity) to 4 (highest impairment/severity). The OASIS was administered pre-training. The OASIS was selected due to its brevity and strong psychometric properties; Cronbach’s alpha in current sample = 0.77.

To test if condition affected responses to a hypothetical stressful situation, participants completed the Anticipated Stressful Situation Questionnaire (ASSQ; modified from Murphy et al. 2007). In the ASSQ, participants are asked to vividly imagine themselves in a feared situation (e.g., public speaking, being in a high place) and then rate their predicted anxiety, desire to avoid the situation, probability of the situation turning out badly, and if the situation were to turn out badly, how manageable the consequences would be. All items were rated on a scale of 1 (least anxiety/avoidance/likelihood of negative outcome/consequence) to 5 (most anxiety/avoidance/likelihood of negative outcome/consequence). The ASSQ was administered post-training. Cronbach’s alpha in current sample = 0.82.

To test if condition affected emotional vulnerability in response to a stressor, all participants completed an Anagram Task post-training. In this task, participants solved as many anagrams out of 40 as they could in three minutes, followed by two extremely difficult, multi-syllabic anagrams. To operationalize emotional vulnerability, we assessed change in anxiety (on a − 3 to 3 Subjective Units of Distress Scale) before and after the Anagram Task.

Procedure

This study was conducted over the Internet via Amazon’s MTurk. Interested participants completed an initial consent and the DASS-Anxiety. Those who scored 10 or above (indicating moderate to extremely severe anxiety; Lovibond and Lovibond 1995) were invited to participate. Participants completed a second consent form, followed by baseline assessments of interpretation bias (RRT) and anxiety (OASIS). Next, participants were randomly assigned to 1 of the 16 conditions. After completing their condition assignment, participants completed measures of interpretation bias (RRT and BBSIQ) and emotional vulnerability (ASSQ and Anagram Task). Participants were debriefed, and those assigned to a control condition were given the opportunity to complete the 100% Positive condition.
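The screening and allocation steps can be illustrated with a short sketch. It assumes simple (unblocked) random assignment, which the text does not specify; the function name and seed are hypothetical, and only the condition labels and the cutoff of 10 come from the article.

```python
import random

# Condition labels as named in this article (13 CBM-I variants plus 3 comparisons).
CONDITIONS = [
    "100% Positive", "Imagery Only/Audio", "Implementation Intention",
    "Generate Positive Ending", "Group of Words", "Anxious Imagery",
    "Monetary Reward", "Mindfulness", "Switching",
    "Reframing Negative Events", "50/50", "Adding Pictures",
    "Matching Facial Expressions", "Attention-Switching", "Neutral",
    "No Task Control",
]

DASS_CUTOFF = 10  # screener threshold reported in the text


def assign_condition(dass_anxiety_score, rng):
    """Return a condition for eligible participants, or None if screened out."""
    if dass_anxiety_score < DASS_CUTOFF:
        return None
    # Assumption: each condition is equally likely and draws are independent.
    return rng.choice(CONDITIONS)


rng = random.Random(2024)  # hypothetical seed, for reproducibility only
assigned = assign_condition(12, rng)
screened_out = assign_condition(6, rng)
```

Simple randomization of this kind does not guarantee equal group sizes, which would be consistent with degrees of freedom that vary across comparisons.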

Results

This was an exploratory study with multiple dependent variables investigating 16 different conditions. Thus, the number of planned statistical tests was large. This was considered appropriate as the goal was to identify patterns that would suggest a larger, confirmatory trial of some subset of CBM-I variants would be beneficial. For this reason, the results were interpreted primarily in terms of effect sizes, though we also report standard test statistics and p-values to be comprehensive. Along these lines, we invite other researchers to conduct additional comparisons of interest for their research questions—data are shared at https://osf.io/ch6kq/.

Participant Characteristics and Descriptive Statistics

As expected, conditions did not differ in terms of gender (χ2(15) = 17.01, p = 0.318), age (F(15,1063) = 0.45, p = 0.964), race (χ2(105) = 118.09, p = 0.180), or ethnicity (χ2(30) = 32.16, p = 0.360; see Table 1). Similarly, conditions did not differ in terms of participants’ baseline anxiety symptoms, as measured by the OASIS (F(15,1104) = 1.15, p = 0.309) or baseline interpretation bias, as measured by RRT (negative: F(15,1104) = 1.38, p = 0.151; positive: F(15,1104) = 1.61, p = 0.065; see Table 2). To test the effects of interest, each condition was compared to the No task Control condition using independent t-tests, and Cohen’s d values (a standardized indicator of mean differences) were computed. Missingness was minimal; per condition, between zero and three participants had missing data for each measure. All available data were used, but an analysis of only complete cases yielded similar results (Footnote 2).
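For readers less familiar with these statistics, the sketch below shows one common way to compute an independent-samples t statistic and Cohen's d for a single condition versus the control. It assumes a pooled standard deviation and equal variances, which the text does not confirm; the function names and data are hypothetical, and p-values are omitted because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def cohens_d(a, b):
    """Standardized mean difference (a minus b)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Return the equal-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    t = (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical change scores for one training condition and the control.
condition = [-0.5, -0.2, -0.8, -0.4, -0.6]
control = [0.1, -0.1, 0.0, 0.2, -0.2]
d = cohens_d(condition, control)
t, df = student_t(condition, control)
```

Under these assumptions t and d differ only by a factor that depends on the group sizes, which is why the article can report both from the same comparison.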

Table 1 Participant characteristics
Table 2 Descriptive statistics

Condition Effects on Interpretation Bias

See Table 2 for means and standard deviations of pre-training and post-training measures. To test if condition affected change in interpretation bias, changes in negative and positive recognition ratings were computed (RRT Change; post-CBM minus pre-CBM). For positive recognition ratings, positive RRT Change values indicate more positive interpretations over the course of the study, whereas negative RRT Change values indicate less positive interpretations. For negative recognition ratings, positive RRT Change values indicate more negative interpretations over the course of the study, whereas negative RRT Change values indicate less negative interpretations.
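The sign convention for RRT Change can be made concrete with a small sketch. It assumes each participant's score is the mean rating across the seven test scenarios, which is an assumption rather than a documented scoring rule; the function name and ratings are hypothetical.

```python
from statistics import mean


def rrt_change(pre_ratings, post_ratings):
    """Mean post-training rating minus mean pre-training rating (1-4 scale)."""
    return mean(post_ratings) - mean(pre_ratings)


# Hypothetical ratings of the negative target interpretations (seven scenarios).
pre_negative = [3, 3, 2, 4, 3, 3, 2]
post_negative = [2, 2, 2, 3, 2, 3, 1]
change = rrt_change(pre_negative, post_negative)
```

Here the change is below zero, which under the convention above means the participant endorsed negative interpretations less after training.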

The RRT Change values for negative and positive recognition ratings were computed for each condition and compared to the No task Control condition. All conditions except the Attention-Switching, Neutral, and Imagery Only/Audio conditions had a larger decrease in negative interpretations (see Fig. 1; − 4.80 < ts(126–146) < − 2.32, ps < 0.05, − 0.80 < ds < − 0.39) compared to the No task Control condition. All conditions except the Attention-Switching and Neutral conditions had a larger increase in positive interpretations (see Fig. 2; 2.53 < ts(119–148) < 4.75, ps < 0.05, 0.44 < ds < 0.80) compared to the No task Control condition. The 95% confidence intervals overlapped for the 12 effective conditions for negative interpretations and the 13 effective conditions for positive interpretations, suggesting effects of similar magnitude across effective conditions; the Cohen’s d values were mainly in the medium range.
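To illustrate how overlapping confidence intervals can arise even for the smallest and largest effects reported above, the sketch below uses a common large-sample approximation to the standard error of Cohen's d. The article does not state how its intervals were computed, and the group size of 70 is an assumption (1120 participants spread evenly over 16 conditions).

```python
from math import sqrt


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d from a large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Endpoints of the reported range of ds for negative RRT Change.
smallest = d_confidence_interval(-0.39, 70, 70)  # 70 per group is assumed
largest = d_confidence_interval(-0.80, 70, 70)
overlap = intervals_overlap(smallest, largest)
```

With groups of this size each interval spans roughly 0.33 on either side of d, so effects in this range are hard to tell apart; overlapping intervals are a conservative heuristic rather than a formal test of the difference between conditions.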

Fig. 1 Effect of training conditions on negative RRT Change, compared to No task Control condition

Fig. 2 Effect of training conditions on positive RRT Change, compared to No task Control condition

Next, BBSIQ ratings for all training conditions were compared to the No task Control condition. The physical and external BBSIQ subscales were highly correlated (r(1097) = 0.88, p < 0.001), so they were combined. Surprisingly, only the Imagery Only/Audio condition had less negative interpretations compared to the No task Control condition (see Fig. 3; t(131) = − 2.20, p = 0.030, d = − 0.38), and this effect size was small to medium.

Fig. 3 Effect of training conditions on Post-training BBSIQ, compared to No task Control condition

Training Condition Effects on Anticipated Anxiety

To test if condition affected responses to a hypothetical stressor, ASSQ ratings for all training conditions were compared to the No task Control condition. All ASSQ items were moderately correlated (0.51 < rs < 0.57), so they were combined. The Mindfulness, Switching, and Reframing Negative Events conditions had less negative responses compared to the No task Control condition (see Fig. 4; Mindfulness: t(138) = − 2.11, p = 0.036, d = − 0.35; Switching: t(143) = − 2.79, p = 0.006, d = − 0.46; Reframing Negative Events: t(139) = − 2.62, p = 0.010, d = − 0.44), with all effect sizes falling within the small to medium range.

Fig. 4 Effect of training conditions on Post-training ASSQ, compared to No task Control condition

Training Condition Effects on Response to a Stressor

To test if condition affected response to an actual stressor, change in anxiety following the Anagram Task (post-Anagram Task minus post-CBM) for all training conditions was compared to the No task Control condition. Only the 50/50, Reframing Negative Events, and Matching Facial Expressions conditions differed from the No task Control condition, with less change in anxiety following the Anagram Task (see Fig. 5; 50/50: t(119) = − 2.19, p = 0.030, d = − 0.38; Reframing Negative Events: t(129) = − 2.27, p = 0.025, d = − 0.38; Matching Facial Expressions: t(138) = − 2.49, p = 0.014, d = − 0.41). Again, all effect sizes were in the small to medium range.

Fig. 5 Effect of training conditions on change in anxiety following Anagram Task, compared to No task Control condition

Training Condition Effects compared to Neutral Condition

As a secondary post-hoc analysis, to provide a more stringent comparison for the various CBM-I conditions, we reran the main tests of condition effects comparing conditions to the Neutral (rather than No task Control) condition. The Neutral condition has the advantage of matching the CBM-I conditions for format, time, and task demands. For negative interpretations, as measured by the negative RRT change, the Anxious Imagery, 100% Positive, and Adding Pictures conditions were no longer significant when compared to the Neutral condition (Anxious Imagery: t(128) = − 1.64, p = 0.104, d = − 0.28; 100% Positive: t(113) = − 1.84, p = 0.068, d = − 0.33; Adding Pictures: t(116) = − 1.71, p = 0.090, d = − 0.30), though effect sizes indicated small effects. For positive interpretations, as measured by the positive RRT change, the 50/50 condition was no longer significant when compared to the Neutral condition (t(122) = 1.53, p = 0.128, d = 0.27). For negative interpretations, as measured by the BBSIQ, the Imagery Only/Audio condition was no longer significantly different when compared to the Neutral condition (t(120) = − 1.07, p = 0.286, d = − 0.19). For the ASSQ, the Mindfulness condition was no longer significantly different when compared to the Neutral condition (t(124) = − 1.76, p = 0.081, d = − 0.30). For anxiety elicited by the Anagram Task, no conditions differed from the Neutral condition; that is, the 50/50, Reframing Negative Events, and Matching Facial Expressions conditions were no longer significant (50/50: t(126) = − 1.69, p = 0.093, d = − 0.30; Reframing Negative Events: t(133) = − 1.77, p = 0.080, d = − 0.30; Matching Facial Expressions: t(136) = − 1.93, p = 0.056, d = − 0.33). Notably, these comparisons all showed small effects based on Cohen’s d values, though they were no longer significant.

Effects of Combined Active Training Conditions

Our active CBM-I conditions may have effects that are too small to detect when conditions are analyzed individually. Therefore, as a final post-hoc analysis, we combined all active CBM-I conditions (excluding Attention-Switching, Neutral, and 50/50; see Footnote 3), re-computed the effect size, and compared it to the No task Control and Neutral conditions. When compared to the No task Control condition, the combined effect of active CBM-I conditions was significantly different for positive and negative interpretations, as measured by change in RRT (positive: t(85) = 5.99, p ≤ 0.001, d = 0.70; negative: t(81) = − 4.11, p ≤ 0.001, d = − 0.56) and the ASSQ (t(86) = − 2.14, p = 0.035, d = − 0.25), but not the BBSIQ (t(81) = − 1.19, p = 0.239, d = − 0.16) or anxiety elicited by the Anagram Task (t(89) = − 1.82, p = 0.072, d = − 0.19). When compared to the Neutral condition, the combined effect of CBM-I was only significantly different for positive and negative interpretations, as measured by change in RRT (positive: t(82) = 4.92, p ≤ 0.001, d = 0.53; negative: t(80) = − 3.66, p = 0.001, d = − 0.41), but not the ASSQ (t(86) = − 1.73, p = 0.087, d = − 0.17), BBSIQ (t(79) = 0.57, p = 0.569, d = 0.07) or anxiety elicited by the Anagram Task (t(77) = − 1.06, p = 0.291, d = − 0.13). Taken together, these combined analyses suggest that our method of delivering CBM-I over the Internet had a medium to large effect on positive and negative interpretations (as measured by RRT), and a small effect on response to a hypothetical stressor (as measured by the ASSQ, though this effect was no longer reliable when compared to the Neutral condition).

See Online Appendix for supplemental table of Pearson correlations between post-training measures (note the unexpected direction of relationships between negative RRT and other post-training measures).

Discussion

The current study compared the effects of 13 active variations of CBM-I and two alternate conditions to a No task Control condition in an analogue sample with moderate to severe anxiety. Results suggested that all conditions, except the Neutral and Attention-Switching alternate conditions, reduced negative interpretations and increased positive interpretations of ambiguous information (though the Imagery Only/Audio condition did not significantly affect negative interpretations). However, only a few of the conditions differed from the No task Control condition on other post-training measures, and it is plausible these differences may be due to chance.

As expected, all active CBM-I conditions shifted interpretation bias when compared to the No task Control condition (and all but a few when compared to the Neutral condition), with effect sizes typically in the small to medium range. This supports research highlighting the malleability of interpretation bias (Menne-Lothmann et al. 2014) and supports the potential of using the Internet as a dissemination strategy for CBM-I. Surprisingly, the 13 effective conditions for positive interpretations and the 12 effective conditions for negative interpretations (when compared to the No task Control condition) had effects of similar magnitude (based on overlap across their 95% confidence intervals), suggesting the variations did not substantively alter the training programs’ ability to shift interpretations in a single-session online format. Even the 50/50 condition, which is often used as a control condition, led to a more benign and less threatening interpretation bias when compared to the No task Control condition (but not the Neutral condition). This is in line with prior findings that conditions initially believed to be controls (such as the 50/50 condition) may not actually be inert (perhaps because they train flexibility), and may help explain mixed findings in the literature (Menne-Lothmann et al. 2014). This highlights the necessity of determining what constitutes an appropriate control condition for future CBM-I studies (see also Blackwell et al. 2017).

A few CBM-I conditions stood out because training effects were evident on other post-training measures besides the RRT. Specifically, participants in the Reframing Negative Events and the Switching conditions both responded differently than the No task Control and Neutral conditions in response to a hypothetical stressful situation. Similarly, the 50/50 and Reframing Negative Events conditions resulted in significantly less anxiety in response to the Anagram Task, compared to the No task Control condition (but not the Neutral condition). This is in line with past work suggesting that cognitive flexibility enhances positive outcomes (Parsons et al. 2016). Notably, our results suggest that targeting flexibility related to emotional materials (as done in the Switching, 50/50, and Reframing Negative Events conditions), and not just cognitive flexibility in general (as done in the Attention-Switching condition) may be helpful to shift interpretive bias and emotional vulnerability. However, with the current measures it is difficult to tell whether changes in cognitive flexibility occurred or mediated results.

A few additional scattered condition effects occurred on various post-training measures. Specifically, the Matching Facial Expressions condition resulted in significantly less anxiety in response to the Anagram Task, compared to the No task Control condition (but not the Neutral condition). When compared to the No task Control condition (but not the Neutral condition), the Mindfulness condition also resulted in significantly different responses to a hypothetical stressful situation. Also, when compared to the No task Control condition (but not the Neutral condition), the Imagery Only/Audio condition was the only variation to result in significantly different interpretation bias as measured by the BBSIQ. However, we are reluctant to over-interpret these scattered effects on post-training measures given the inconsistent results across comparisons and the large number of tests conducted. The most consistent finding is that many variations of active CBM-I alter interpretation bias; common to these variations is the presentation of emotionally ambiguous scenarios and their resolution. Additionally, results suggest the likely importance of training flexibility related to emotional materials to shift emotional vulnerability. It is not known from these findings whether the presentation and resolution of emotional ambiguity (a key hypothesized mechanism in CBM-I) is necessary, or whether simply the presentation of emotional material accounts for some of the observed effects. Further research that dismantles the different components of CBM-I training materials will be valuable.
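The caution about the large number of tests can be made concrete with a familywise error correction. The sketch below applies the Holm step-down adjustment to a list of p-values. It is illustrative only: the text does not report applying this or any other correction, the function name is hypothetical, and the p-values in the example are made up rather than taken from the results.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    example = [0.01, 0.04, 0.03]
    print(holm_adjust(example))
```

An effect would be treated as reliable under this scheme only if its adjusted p-value stays below the chosen alpha level, which is a stricter standard than the uncorrected tests.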

While the majority of active conditions shifted interpretations to be more positive and less negative, most conditions did not affect other outcomes. The lack of effects on “downstream” outcomes (i.e., anticipated anxiety to a stressful situation) was even more pronounced when comparing active CBM-I conditions to the Neutral condition. This is contrary to the causal claim in cognitive models of anxiety (Beck and Clark 1997; Clark and Beck 2010), and is in line with meta-analyses documenting inconsistent effects of CBM-I on anxiety (Cristea et al. 2015; Hallion and Ruscio 2011; though see Fodor et al. 2020).

While results may suggest that interpretation bias change does not alter anxiety, it is also quite plausible that design choices precluded our ability to see CBM-I effects on downstream outcomes. First, our decision to do a brief, single-session, Internet-delivered CBM-I likely decreased the potency of CBM-I, given that more trials, multiple sessions, and laboratory delivery all increase effects of CBM (Jones and Sharpe 2017; Menne-Lothmann et al. 2014; Zhang et al. 2019). Second, our selection of outcome measures (i.e., anagrams) may have limited our ability to see effects. Namely, the performance concerns that were expected to be activated by the anagram task may not have matched the idiographic anxiety concerns of our heterogeneous sample. It is possible that we would have seen more condition effects on emotional vulnerability measures had we restricted training to participants with a specific type of anxiety (e.g., spider fear). Then, we could have used CBM-I training scenarios (e.g., scenarios about spiders and ability to cope with anxiety around spiders) and stressor tasks that were specific to that fear (e.g., behavioral tasks related to spiders). Third, time between training and assessment (e.g., to practice new interpretations in daily life) may be needed to affect downstream outcomes. Finally, our analyses combining active CBM-I conditions’ effects suggest that CBM-I, in general, had a small effect on response to a hypothetical stressful situation, but such effects may be too small (or the measures too insensitive) to be detected when each condition is tested individually. Thus, while we cannot determine the specific reasons for the null results with these data, we think it will be important to conduct further tests to distinguish between these possible explanations, because doing so will be critical for determining the ultimate clinical utility of CBM-I.

The results from this study highlight the significant challenge of how best to test many candidate variations of the paradigm in order to improve results. We elected to try a ‘proof of principle’ randomized clinical trial (RCT) approach in which we delivered a small dose of each variant to see if there was a signal that would warrant testing in a future larger trial. This raised the challenge of not being able to clearly interpret whether null results were due to the low dose or to the variant not having an effect. Conducting well-powered RCTs is highly resource-intensive and time-consuming, so future researchers might want to consider alternate designs that more clearly allow for rapid testing of many conditions, such as the Leapfrog design (Blackwell et al. 2019).

Results should be interpreted in light of several limitations. First, this study used an analogue sample. Future studies are needed to determine if results are consistent with diagnosed anxious samples. Second, this study only included one session of training; previous research (Menne-Lothmann et al. 2014) suggests that multiple training sessions lead to stronger effects, so it is unclear from the current study whether lack of transfer in some conditions is due to weaker CBM-I variants, or due to not receiving an “adequate dose” of the variants. This also may explain the limited change observed in negative interpretations; while statistical significance was achieved, considerable negativity still remained based on the post-training Negative RRT scores (though effect sizes for most conditions were in the medium range). Future trials testing variants of CBM-I should ideally include multiple sessions, with more trials in each session to increase the robustness of effects and reduce risk of Type II error. Third, although this study included the ASSQ and Anagram Task as proxies for emotional vulnerability, the online nature of this study prevented administration of actual measures of behavioral avoidance and approach. Fourth, the BBSIQ, ASSQ, and Anagram Task were only administered post-training (to reduce participant burden), which limits our ability to know if training actually changed these constructs. Fifth, we did not include manipulation checks for our conditions (e.g., did not test if our Mindfulness condition affected state mindfulness). Finally, despite its widespread use for psychological research, there remain interesting open questions about whether a research infrastructure like mTurk is well-suited to intervention studies like this one. On the one hand, it is the ideal vehicle for exploratory research of this type that requires a very large sample and access to many anxious individuals. On the other hand, the motivation and reward structure of mTurk workers may not align well with this type of intervention development work.

Despite these limitations, results suggest that presenting valenced interpretations of ambiguous information during CBM-I, regardless of the specific format, leads to less threatening, more benign interpretations. However, most conditions did not differ from the No task Control condition on other post-training assessments.