Acceptance and commitment therapy (ACT; Hayes, Strosahl, & Wilson, 2012b) is a transdiagnostic, contextual–behavioral approach to psychotherapy and other behavioral health concerns that is considered foundational to third-wave cognitive and behavioral therapies (Hayes, 2004). ACT has demonstrated efficacy across a range of clinical problems (cf. A-Tjak et al., 2014; Powers, Vörding, & Emmelkamp, 2009), resulting in its designation as an empirically supported psychological treatment with strong research support for chronic pain and modest research support for depression, mixed anxiety, obsessive–compulsive disorder, and psychosis (Division 12 of the American Psychological Association, 2016). ACT’s philosophical and theoretical development and its research, implementation, and dissemination strategies are closely linked to contextual behavioral science (CBS), a psychological scientific paradigm heavily influenced by early radical behavioral thinking and behavior analysis (cf. Hayes, Barnes-Holmes, & Wilson, 2012a; Levin, Twohig, & Smith, 2016; Vilardaga, Hayes, Levin, & Muto, 2009).

Among CBS’s strategic proposals for scientific progress is the importance of “middle-level” terms: clinically useful interfaces that describe complex sets of functional relations between the individual and the environment that in turn are based on behavioral principles (Vilardaga et al., 2009). They possess looser precision and lower levels of process specificity than basic behavioral processes such as reinforcement. Nevertheless, some argue that middle-level terms may be useful for training, dissemination, and clinical practice, especially for people without a thorough foundation in behavioral theory who are faced with complex human behavioral situations and challenges (Hayes et al., 2012a; Levin et al., 2016; Vilardaga et al., 2009). Currently, middle-level terms are at the core of ACT’s psychological flexibility model, which presents six middle-level processes to target during treatment in order to develop and maintain psychological flexibility: acceptance, cognitive defusion, contact with the present moment, self-as-context, values, and committed action (Hayes & Strosahl, 2004).

Some authors have suggested that the pragmatic value of any middle-level term depends on the clarity of the links between the term and specific sets of underlying functional relations and have questioned the existence of such links with respect to the components of ACT’s clinical model (Barnes-Holmes, Hussey, McEnteggart, Barnes-Holmes, & Foody, 2016; McEnteggart, Barnes-Holmes, Hussey, & Barnes-Holmes, 2015). In this line of reasoning, the philosophical and theoretical foundations of any middle-level term should be as well defined as possible, an effort invited by the CBS approach (Hayes et al., 2012a). Such definition maximizes the pragmatic utility of middle-level terms by preventing the conflation of multiple processes in a single concept and minimizes the dissemination of middle-level terms into practice without functional clarity. Otherwise, middle-level terms may serve neither the CBS practitioner nor the behavioral psychologist more generally.

The objective of this article is to consider the middle-level term cognitive defusion as it is currently used within ACT and to clarify the functional link between this concept and basic learning processes. The article first contextualizes ACT in its theoretical roots, primarily the role of language in psychological problems. Following this, cognitive defusion is presented as a therapeutic intervention to address problems caused by human language, outlining its procedures, processes, and outcomes. Third, the proposed mechanism of change in cognitive defusion exercises is critically examined, and an alternative conceptualization is put forward. Finally, the conceptual, clinical, and research implications of the new proposal are considered.

Theoretical Foundations of ACT: Relational Frame Theory

The theoretical rationale for cognitive defusion arises from the functional account of language and cognition provided by relational frame theory (RFT; Dymond & Roche, 2013; Hayes, Barnes-Holmes, & Roche, 2001). The foundational position of RFT is that the generalized operant of arbitrarily applicable relational responding (AARR) is the basis of human language (Hayes et al., 2001). Relational responding refers to responding that is not solely controlled by a stimulus and its directly conditioned functions but rather by relations between stimuli. Arbitrarily applicable expresses that the stimuli in a controlling relation do not need to share formal (i.e., physical) characteristics.

AARR is a learned behavior that requires a history of multiple-exemplar training (Luciano et al., 2009). In other words, the social environment teaches the individual, through differential reinforcement, to respond relationally to several sets of stimuli (e.g., trees, houses, cars, toys) regarding numerous physical properties (e.g., color, size, weight) in direct and reverse order (A–B and B–A). This extensive history allows the individual to abstract the common element across these interactions: a verbal cue that signals the discriminated response that is likely to be reinforced (e.g., the word same). Once abstracted, this cue can exert contextual control over behavior (i.e., relational context) by indicating the type of relation that applies among stimuli (such as coordination, opposition, distinction, comparison, hierarchy, temporality, spatiality, and causality) and thereby control a particular pattern of responding in accordance with the specified relation. These different patterns of responding have been labeled relational frames.

Once AARR is well established in the person’s repertoire, the formation of a few relations among stimuli will lead to the emergence of new relations that were not directly taught, given an appropriate relational context. These derived relations are characterized by mutual entailment (if A → B, then B → A) and combinatorial entailment (if A → B and A → C, then B → C). Importantly, a stimulus may have its function transformed through direct or derived relations—a process documented experimentally across multiple functions and relational frames (e.g., Augustson & Dougher, 1997; Dougher, Augustson, Markham, Greenway, & Wulfert, 1994; Dougher, Hamilton, Fink, & Harrington, 2007; Dymond, Roche, Forsyth, Whelan, & Rhoden, 2007, 2008; Dymond et al., 2011; Gil, Luciano, Ruiz, & Valdivia-Salas, 2012, 2014; Greenway, Dougher, & Wulfert, 1996; Roche, Barnes-Holmes, Smeets, Barnes-Holmes, & McGeady, 2000; Whelan & Barnes-Holmes, 2004; Whelan, Barnes-Holmes, & Dymond, 2006). The transformation of function is controlled by another verbal cue responsible for signaling the specific stimulus function that will be transformed: the functional context (Dougher, Perkins, Greenway, Koons, & Chiasson, 2002; Perez, Fidalgo, Kovac, & Nico, 2015; Roche et al., 2000; Wulfert & Hayes, 1988).

The following example may clarify the aforementioned concepts. Suppose a person who is fearful of the box jellyfish is told: “The blue ring octopus is more dangerous than the box jellyfish.” This sentence is an instance of AARR in which the stimuli blue ring octopus and box jellyfish are being arbitrarily related. The expression more than acts as a relational context, controlling responding to the blue ring octopus in terms of its comparative relationship to the box jellyfish. The word dangerous acts as a functional context, indicating that the comparison is based on a dimension of threat rather than size or beauty, for example. Consequently, even if the person had never heard of the blue ring octopus before, he or she can say that it is more dangerous than, for example, a pufferfish (which he or she knows is less dangerous than the box jellyfish). More interestingly, the mention or sight of the octopus may elicit fear and evoke escape responses and will do so to a higher degree than the mention or sight of the jellyfish.
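The derivations in this example can be made concrete with a minimal sketch. It is only an illustration and not a model proposed in the RFT literature: the function names (`derive_comparisons`, `mutually_entailed`, `transform_fear`), the numeric fear value, and the scaling factor `step` are hypothetical assumptions introduced here. The sketch takes two directly stated "more dangerous than" relations, derives the untaught relations through combinatorial and mutual entailment, and then lets the functional context of danger transform the fear function of a stimulus that was never directly conditioned.

```python
from itertools import product


def derive_comparisons(trained):
    """Return every 'more dangerous than' pair entailed by the trained pairs.

    Each pair (a, b) reads "a is more dangerous than b". Combinatorial
    entailment is modeled here, as an assumption, by the transitive closure.
    """
    more = set(trained)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(more), repeat=2):
            if b == c and (a, d) not in more:
                more.add((a, d))
                changed = True
    return more


def mutually_entailed(more):
    """Mutual entailment: 'a more than b' entails 'b less than a'."""
    return {(b, a) for (a, b) in more}


def transform_fear(more, known_fear, step=1.5):
    """Give an unconditioned stimulus a fear value through its relations.

    `step` is an arbitrary illustrative multiplier, not an empirical value.
    """
    fear = dict(known_fear)
    for a, b in more:
        if a not in known_fear and b in known_fear:
            fear[a] = max(fear.get(a, 0.0), known_fear[b] * step)
    return fear


if __name__ == "__main__":
    trained = {
        ("blue ring octopus", "box jellyfish"),  # stated in the sentence
        ("box jellyfish", "pufferfish"),         # already known to the person
    }
    more = derive_comparisons(trained)
    less = mutually_entailed(more)
    fear = transform_fear(more, {"box jellyfish": 4.0})
    print(sorted(more))
    print(sorted(less))
    print(fear)
```

In this sketch, the pair relating the octopus to the pufferfish is never supplied, yet it appears in the derived set, and the octopus receives a larger fear value than the jellyfish without any direct conditioning.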

Consequences of AARR

According to RFT, AARR’s importance to human behavior derives from some of the characteristics of the stimulus relations it constructs (Hayes et al., 2001). First, the arbitrariness of responding means that, given an appropriate contextual cue, any stimulus can be related to any other. Thus, small gestures, sounds, or notes on paper (responses that are easy to produce and perceive) may acquire stimulus functions and act as words. Second, the indirectness of responding suggests that people may respond to a stimulus that they never encountered before through its relation to other stimuli. This enables learning about remote or improbable events before they actually occur and about dangerous situations without risk to the individual. Third, the derivativeness of responding means that, through mutual and combinatorial entailment, the establishment of a few relations will lead to multiple derived relations, dramatically increasing learning rates and creating complex relational networks of stimuli. Together, these characteristics suggest that AARR may be foundational to what is commonly referred to as human intelligence insofar as AARR is involved in fundamental aspects of communication, reasoning, and problem solving (Cassidy, Roche, & O’Hora, 2010).

Therefore, it is no surprise that the social community will favor the development of AARR, modeling and reinforcing relational responses through socially mediated consequences (e.g., parents’ approval when their child says, “The stove is hot”). Shortly thereafter, the enhanced probability of more adaptive responses to environmental challenges due to transformations of functions (e.g., the child avoiding touching the stove because “the stove is hot”) will also contribute to increased frequency of AARR over time. Initially, this will occur in a public manner, as observed in children’s increased engagement in speech even when it does not serve a social function (cf. Vygotsky, 1978) and later, as some speech instances are socially punished, in a private manner as thinking (Skinner, 1957). In adults, AARR becomes so recurrent, fluent, and fluid that some authors have described it metaphorically as a stream (e.g., James, 1890).

According to RFT, over normal human development, ever-increasing portions of the environment are responded to in terms of arbitrarily applicable (i.e., verbal) relations to other stimuli. Thus, many stimuli may acquire their functions solely or mainly through verbal relations. Verbal relations come to dominate over nonverbal learning processes unless these normal language-learning processes are disrupted through unusual circumstances (e.g., trauma or developmental disorders). When a person responds almost exclusively to the verbally conditioned functions of a stimulus to the detriment of other, nonverbal stimulus control, ACT therapists use the middle-level term cognitive fusion (Hayes et al., 2012b).

RFT suggests that cognitive fusion is not inherently detrimental to the individual but becomes so when it leads to maladaptive and rigid behavioral patterns. For example, faced with an extended hand inviting a handshake, a person may respond to multiple functions of the stimulus hand. These may include the formal properties of the person’s hand, such as its size, skin color, or shape, or the social consequences of accepting or rejecting the handshake offer, such as approval or criticism. It is also possible to respond to the hand as an object that contains germs, which in turn cause diseases. For many individuals, this set of verbal relations between disease and hand may exert a weak influence under normal conditions of multiple sources of stimulus control. For other individuals, the verbal relations between disease and hand may dominate, transforming the aversive functions of disease to the hand, and may render other potential sources of stimulus control irrelevant. We now have cognitive fusion. In this case, the verbally acquired, derived functions have more strength than the other formal or directly established stimulus functions, and the derived relation’s influence over behavior will be extensive, eliciting and evoking specific responses, such as fear and avoidance, with great probability. If the functional transformations that occur are not sensitive to the current context (e.g., a person’s hand will always be diseased and aversive, whether it is clean or dirty), their influence over behavior will become more generalized across different situations, further increasing the probability of fear and avoidance responses.

The high magnitude and low context sensitivity of stimulus functions combine to dramatically increase the likelihood of a specific response (or set of responses). Given that time is finite, when a response increases in probability, other incompatible ones must decrease in likelihood (Baum, 2002; Herrnstein, 1970). All other responses are improbable in this situation: One’s response pattern is rigid regarding that stimulus. As Wilson and Murrell (2004) pointed out: “The primary problem with conditioned aversives is not that the individual avoids or becomes aroused. The problem is that they only become aroused and avoid” (p. 129). In this example, such a rigid avoidance pattern prevents meaningful social interactions and contributes to a narrowing of the person’s life.
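The competition among responses invoked here can be stated compactly. The notation below is an assumption introduced for illustration and does not appear in the source: $t_i$ is the time allocated to response $i$, $T$ is the total time available, and the $n$ response classes are taken to be mutually exclusive and exhaustive. The second line gives the two-alternative matching relation from Herrnstein (1970), in which $B_1$ and $B_2$ are response rates and $R_1$ and $R_2$ are the reinforcement rates they obtain.

```latex
\sum_{i=1}^{n} t_i = T, \qquad p_i = \frac{t_i}{T}, \qquad \sum_{i=1}^{n} p_i = 1
\quad\Longrightarrow\quad
\sum_{i \neq 1} \Delta p_i = -\,\Delta p_1

\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
```

Under these assumptions, any increase in the share of time taken by one response is offset exactly by a decrease in the combined share of all the others, which is the sense in which a dominant avoidance response leaves alternative responses improbable.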

To this individual, the experience of cognitive fusion is one of being surrounded and dominated by thoughts, with these thoughts automatically generating behavior, so much so that the actual functional verbal relations controlling behavior are not apparent. Other sources of influence are minimized, and the experience of choice is reduced. The person merely reacts to the content of these verbal relations, even when the person is aware that this rigid pattern is not in his or her best interest and leads to problematic consequences. This is well described by ACT’s metaphors of entanglement and oneness: being stuck, caught up, and fused.

Because unlearning is not possible (Bouton, 2002; Falls, 1998), it is necessary for the individual to obtain new and different learning experiences to diminish the strength of derived stimulus functions and counteract their extensive dominance over behavior that leads to rigidity. However, AARR’s characteristics may interfere with this process. More specifically, AARR enables the individual to respond to an event in an indirect manner through its relation with other stimuli. Despite the advantages gained by this characteristic, it also allows changes in the salience or probability of the event to go unnoticed by the person, who will continue to respond to it through previously established verbal relations. In other words, the individual may be insensitive to the current environment and remain under the control of derived verbal relations (cf. Galizio, 1979; Hayes, Brownstein, Zettle, Rosenfarb, & Korn, 1986; Matthews, Shimoff, Catania, & Sagvolden, 1977; Shimoff, Catania, & Matthews, 1981).

Furthermore, even if the person comes into direct contact with the altered contingencies, it may not be sufficient to alter his or her responding (cf. Hayes et al., 1986; Pilgrim & Galizio, 1990, 1995; Roche, Barnes, & Smeets, 1997; Shimoff et al., 1981). After all, the maintenance of verbal coherence is a powerful automatic reinforcer (Bordieri, Kellum, Wilson, & Whiteman, 2015; Quiñones & Hayes, 2014; Wray, Dougher, Hamilton, & Guinther, 2012), and it may be stronger than the reinforcement instated by the new contingencies. As repeatedly shown by the extensive literature on confirmation bias, people tend to search for and overvalue information congruent with their previously established relations (Nickerson, 1998), therefore reducing the impact of new learning experiences.

The Objective of Cognitive Defusion

For clients whose psychological problems are maintained or exacerbated by verbally conditioned stimuli that promote rigid and maladaptive response patterns, cognitive–behavioral therapists have commonly chosen cognitive restructuring interventions. In brief, cognitive restructuring is a collaborative effort in which both the client and therapist logically and empirically evaluate and dispute the client’s thought content and underlying beliefs, assumptions, and schemas (Dobson, 2009; Leahy, 2003). In other words, the therapist helps the client to contact incoherencies between his or her cognitions and other thoughts (logical) or environmental events (empirical).

From an RFT perspective, cognitive restructuring’s main objective is that the client will engage in AARR, developing new stimulus relations that are more sensitive to current conditions and contextual subtleties than the previously established ones (Blackledge, Moran, & Ellis, 2009). Strengthening these new verbal relations will favor their influence on behavior, diminishing the dominance of the functional control that resulted from transformations due to the previous relations. Therefore, the verbal control over behavior is maintained but altered.

Over the last 30 years, ACT has constructed an alternative model that explicitly minimizes cognitive restructuring while emphasizing cognitive defusion interventions. The aim of cognitive defusion is not to create new stimulus relations but to disrupt the transformations of functions that occur in instances of AARR by means of its functional context. Therefore, rather than altering the content of verbalizations, cognitive defusion diminishes their impact on behavior (Blackledge, 2007; Hayes, Luoma, Bond, Masuda, & Lillis, 2006; Hayes et al., 2012b). By decreasing the verbally conditioned functions of a stimulus, its directly conditioned functions as well as other stimuli (especially nonverbal) become more likely to exert influence over behavior, possibly evoking more adaptive responses. This may reduce the experience of entanglement and fusion and enhance the experience of choice among different alternatives, increasing response flexibility (i.e., reducing rigidity).

The Procedures of Cognitive Defusion

A large number of defusion exercises, created within the practice of ACT or borrowed from other practices and cultures, have been used to diminish the impact of AARR on behavior. Some of them, drawn from Blackledge (2015), Hayes et al. (2012b), and Luoma and Hayes (2009), are briefly outlined in the following sections.

Playing With Words

In these exercises, the client selects a word or small phrase that exerts control over behavior. He or she is instructed to “play” with the word, repeating it out loud for 30 s (word repetition); repeating it using funny voices, such as those of cartoon characters (silly voices); repeating it slowly, one syllable at a time (slow speech); singing it (singing thoughts); or using its synonym in a foreign language (word translation).

Questioning Verbal Coherence

These procedures question commonly attributed causes of behavior. The client may be asked to write his or her life story but change its consequences for his or her present self (create a new story); list all of his or her defining characteristics, notice the list’s incongruities, and eliminate the characteristics from the list one by one (I am); or repeatedly answer “Why?” questions regarding behavior until he or she has no reasons left (Why, why, why?).

Disrupting Thought–Action

These exercises produce experiences of incongruity between thoughts and actions. They include having clients convince themselves not to perform a simple physical action and then do it (thoughts and feelings aren’t causes); carry cards with written thoughts, including I can’t walk, while walking around (carrying cards); or participate in a role-playing exercise in which the therapist and client alternate between the person (who must walk around and guide the pair) and the person’s mind (which must constantly describe, evaluate, and compare) without either one interrupting the role of the other (take your mind for a walk).

Observing the Process of Relational Responding

These interventions require the client to attend to the activity of relational responding. In order to do so, various mindfulness exercises are used in which the client must observe his or her thoughts, for example, as leaves falling on a river (leaves on a stream) or posters carried by soldiers marching (soldiers in the parade) without interrupting the flow. The client can also be asked to observe his or her flow of private events across multiple instances in his or her past and to attend to the person who is observing these experiences (observer exercise).

Identifying Relational Responses

In these procedures, the client is asked to identify different instances of relational responses. He or she can do so by labeling them as worries, judgments, comparisons, and so on (cubbyholing); differentiating formal from evaluative properties (bad cup metaphor); introducing the prefix “I am having a thought that . . .” (having thoughts); or being thankful for the thought (mental appreciation).

Converting Into Physical Objects

These exercises attribute physical characteristics to thoughts, turning them into objects with properties such as size, color, and texture (physicalizing); passengers on a bus that the client is driving (passengers on a bus); fingers that can be close or far from one’s face (hands as thoughts); or words written on a sheet of paper (content on cards).

Exercises’ Outcomes in Component Studies

Some defusion exercises have been investigated in component studies that attempt to separate a clinical procedure from its larger therapeutic package, such as ACT, in order to isolate its effects (Levin, Hildebrandt, Lillis, & Hayes, 2012). Several studies have investigated the effects of the word repetition exercise on the emotional discomfort associated with and the believability of thoughts, obtaining positive outcomes on both measures (Barrera, Szafranski, Ratcliff, Garnaat, & Norton, 2015; Deacon, Fawzy, Lickel, & Wolitzky-Taylor, 2011; De Young, Lavender, Washington, Looby, & Anderson, 2010; Keogh, 2008; Mandavia et al., 2015; Masuda, Feinstein, Wendell, & Sheehan, 2010b; Masuda, Hayes, Sackett, & Twohig, 2004; Masuda et al., 2009; Masuda et al., 2010a; Ritzert, Forsyth, Berghoff, Barnes-Holmes, & Nicholson, 2015; Tyndall, Papworth, Roche, & Bennett, 2017; Watson, 2007).

The emotional discomfort and believability of thoughts were further explored using the having thoughts exercise, attaining mixed results (Healy et al., 2008; O’Sullivan, 2013; Pilecki & McKay, 2012). This procedure has also been found to alleviate the effects of a learned helplessness induction on problem solving (Hooper & McHugh, 2013) and decrease cigarette smokers’ approach and consumption behavior (Beadman et al., 2015).

The carrying cards exercise coupled with ACT’s “swamp metaphor” (cf. Hayes et al., 2012b) has been shown to decrease escape or avoidance behavior due to aversive stimulation without significantly affecting emotional discomfort (Gutiérrez, Luciano, Rodríguez, & Fink, 2004; Kehoe et al., 2014; McMullen et al., 2008). The same has been found with the content on cards exercise combined with a variation of ACT’s observer exercise (Luciano et al., 2014). A variation of the observer exercise has also been used to reduce disruptive behavior in children, with positive results (Luciano et al., 2011); in combination with word repetition, it has also been used to reduce implicit anxiety measures (Kishita, Muto, Ohtsuki, & Barnes-Holmes, 2014).

Mindfulness strategies (e.g., leaves on a stream) have diminished the emotional discomfort occasioned by thoughts (Foody, Barnes-Holmes, Barnes-Holmes, Rai, & Luciano, 2015; Marcks & Woods, 2005), whereas silly voices as an adjunct intervention to exposure and response prevention reduced the frequency of problem behavior in children with autism (Eilers & Hayes, 2015).

Finally, studies that combine more than one cognitive defusion procedure into a larger protocol have obtained positive results with spider fear (Golijani-Moghaddam, 2011; Wagener & Zettle, 2011); overeating (Hooper, Sandoz, Ashton, Clarke, & McHugh, 2012; Jenkins & Tapper, 2014; Moffitt, Brinkworth, Noakes, & Mohr, 2012); dysphoria (Hinton & Gaynor, 2010); and discomfort and willingness to experience negative self-relevant thoughts (Larsson, Hooper, Osborne, Bennett, & McHugh, 2016). However, one study did not obtain positive results with avoidance in claustrophobia (Dublin, 2012).

The Process of Cognitive Defusion

The aforementioned studies suggest that defusion procedures are capable of producing important behavioral changes. However, they do not address the underlying change processes involved (Dymond, Roche, & Bennett, 2013; Levin & Villatte, 2016). The most widely used account of the mechanism of change in cognitive defusion interventions is offered by Blackledge (2007). The author’s proposal is based on experiments that demonstrated contextual control over functional transformations in AARR (e.g., Dougher et al., 2002; Perez et al., 2015; Roche et al., 2000; Wulfert & Hayes, 1988). According to Blackledge (2007), a “context of literality” exists at the societal level that supports AARR and its effects, including fusion. This context includes properties common to everyday language, such as grammatical structure, speech velocity, congruence between emotional tone and content, maintenance of verbal coherence, and attending to the products of relational responding (i.e., derived relations and transformations of functions). As Blackledge stated, these contextual commonalities across verbal interactions enable function transformation. Under normal social conditions, the context of literality is in place and AARR will occur naturally and at high strength. Alternatively, cognitive defusion procedures call attention to and disrupt the context of literality. With the context of literality disrupted, thoughts may still occur, but the functional transformations produced by AARR may be reduced. In the words of Blackledge (2007):

In other words, certain contextual conditions must be in place for verbally specific processes to change stimulus functions (i.e., for cognitive fusion to occur). We can thus logically assume that changing these contextual conditions in certain ways would lead to a disruption of these verbally based functional transformations. This, it is argued, is the essence of cognitive defusion. (p. 561)

This conceptualization poses some difficulties. First, as the author stated, it logically assumes that the process of defusion is the opposite of fusion (i.e., introduction or removal of contextual cues) because their results are opposite (i.e., an increase or decrease of function transformations through verbal relations). However, that is not necessarily the case. An event can be opposed to another along one dimension but not another. For example, a fridge is opposed to a stove regarding temperature (i.e., one is cold and the other is hot) but not weight (i.e., both are heavy). By analogy, although cognitive fusion and defusion are opposed in their results, it does not necessarily follow that they must be opposed in terms of process.

This reasoning is supported by experimental evidence that considers a variety of learning processes neglected by Blackledge’s (2007) account. Equifinality is probable in human learning: Different learning pathways (processes) can lead a stimulus to acquire similar functions (result). For example, analogous fear and avoidance responses can be generated by stimuli whose functions were learned through direct conditioning, observation, instruction, and derived relational responding (Cameron, Roche, Schlund, & Dymond, 2016; Dymond, Schlund, Roche, De Houwer, & Freegard, 2012).

The importance of considering multiple behavioral processes is heightened by the fact that a stimulus whose function was acquired through verbal relations can have its magnitude diminished through nonverbal processes. For example, extinction effects produced nonrelationally through respondent or operant extinction may generalize to verbally related stimuli (Dougher et al., 1994; Luciano et al., 2014; Roche, Kanter, Brown, Dymond, & Fogarty, 2010; Vervoort, Vervliet, Bennett, & Baeyens, 2014). As Kanter (2013) states, “even if clinical problems do arise in ways suggested by experimental RFT research, this does not justify the conclusion that the clinical problems need to be targeted with relational interventions” (p. 230).

To sum up, a resulting functional transformation does not by itself specify the process that produced it, and the process of function reduction does not have to be a functional opposite of the process of function acquisition. Furthermore, Blackledge’s (2007) conceptualization does not readily account for the differences in behavioral effects between different defusion procedures. For instance, the literature reviewed previously indicates that the word repetition, leaves on a stream, and soldiers in the parade exercises produce reductions in emotional discomfort (e.g., Masuda et al., 2010a). However, both the carrying cards and content on cards exercises produce significant changes in escape or avoidance behavior (in some cases, decreasing it to a minimum) but do not produce reductions in emotional discomfort (e.g., Gutiérrez et al., 2004; Luciano et al., 2014). It is difficult to comprehend how a single process could produce such distinct results.

An Alternative Proposal

An alternative account that can better accommodate the empirical evidence and the potential for a plurality of learning mechanisms is possible. In what follows, we will attempt to provide such an account. Our main argument is that the reduction of transformation of function through verbal relations is an outcome that can be obtained through different basic behavioral processes. This idea is not entirely novel and has been briefly considered in at least one previous publication (Healy et al., 2008).

As a starting point, consider ACT exercises that involve word play (word repetition, silly voices, slow speech, singing thoughts, and word translation). These are basically exposure procedures in which the client repeatedly contacts a stimulus of high magnitude, without other associated stimuli, until its verbally conditioned eliciting functions are reduced. Thus, these exercises allow other stimulus features to be responded to, such as the sensations of the mouth while pronouncing the word and the word’s different auditory aspects, as reported by clients (Hayes et al., 2012b). Sometimes a playful context supplements the intervention, eliciting emotional responses incompatible with the previous response pattern.

As is the case with other exposure interventions, the precise mechanism of change involved in these exercises is unclear (cf. Tryon, 2005). More specifically, it is possible to conceptualize them as working through the process of respondent extinction, in which the repeated presentation of the stimulus is sufficient to decrease its function strength (Tryon, 2005). Alternatively, counterconditioning processes in which one response is replaced by another may be at play (Tryon, 2005). A third possibility is that inhibitory learning processes may occur, creating new relations among stimuli that establish the previously feared stimuli as safe (Craske, Treanor, Conway, Zbozinek, & Vervliet, 2014).

Basic research indicates that the effects of exposure may generalize to other verbally related stimuli, diminishing their functional strength (Dougher et al., 1994; Luciano et al., 2013; Roche et al., 2010; Vervoort et al., 2014). This suggests not only that defusion procedures may decrease the functional dominance of particular stimuli or functions but also that this decrease may in turn occur for verbally related stimuli. Thus, function transformation is reduced. Indeed, some authors have argued that cognitive defusion, in some cases, may involve no more than exposure and derived extinction (Roche et al., 2010; Wilson & Murrell, 2004).

In contrast, cognitive defusion exercises that disrupt the link between thought and action (thoughts and feelings aren’t causes, carrying cards, and take your mind for a walk) may be conceptualized as procedures that work through differential reinforcement. These procedures require the client to come into contact with a stimulus of high magnitude and emit responses that differ from the rigid response usually evoked by the stimulus. These responses will be reinforced by either the therapist (social reinforcement) or naturally through engagement in valued activities. Through differential reinforcement of alternative responses (DRA) in the clinical context (cf. Vollmer & Iwata, 1992), the correlation between the stimulus and the operant response is reduced; consequently, the stimulus’s evocative functions are diminished. The thought becomes a discriminative stimulus for other responses.

Basic research suggests that function reduction through differential reinforcement transforms the function of verbally related stimuli (Bones et al., 2001; Broothaerts, 2015). Thus, defusion procedures may reduce function transformation through this process. Dymond, Dunsmoor, Vervliet, Roche, and Hermans (2014) offered a similar interpretation:

For example, one technique known as “defusion” (Masuda et al., 2004) teaches the client how to perform other instrumental responses in the presence of fear stimuli and to thereby broaden the response functions of fear stimuli rather than narrow them. In effect, the multiplicity of response functions that get established in the therapeutic setting . . . compete with the normally dominant fear and avoidance functions and reduce the probability of fear and avoidance emerging on each occasion. (p. 38)

This last category of procedures maps most closely onto the definition of defusion offered by Blackledge (2007) insofar as it alters the context around the verbal relations targeted. However, it diverges from it regarding the contextual operation performed. If, as Blackledge (2007) stated, a functional contextual cue is removed, it is very likely that another variable will, for historical reasons, take its functional role. The same would not occur if a procedure recontextualizes relational responses, introducing cues that act as alternative functional contexts and directly diminish function transformation. Some cognitive defusion exercises attempt this strategy and typically introduce one of two types of functional contexts.

The first functional context is descriptive autoclitics, or verbal stimuli that offer information regarding the variables controlling one’s verbal behavior (cf. Skinner, 1957). For example, to describe a speaker as a liar, insane, or biased changes the impact of his or her verbal responses on the listener (e.g., McHugh, Barnes-Holmes, & Barnes-Holmes, 2004), usually minimizing it. Some defusion exercises use descriptive autoclitics to depict the thought flow as a narrative that is coherent but, nevertheless, does not necessarily correspond to reality, being arbitrary, creative, flawed, and with distinguishable story components (e.g., create a new story; I am; why, why, why?; mental appreciation; bad cup metaphor; having thoughts; cubbyholing; observer exercise).

The second commonly used functional context locates one’s relational response at a spatial distance from oneself (there as opposed to here). The literature on discounting (cf. Rachlin, 2006) establishes that a stimulus loses value as its probability of occurrence decreases and as its temporal delay increases. Studies with primates (Kralik & Sampson, 2012; Long & Platt, 2005; Stevens, Rosati, Ross, & Hauser, 2005) and with humans (Brown, Reed, & Harris, 2002; Hannon, 1994; Pate & Loomis, 1997) suggest that the same holds true with respect to spatial distance: Increased distance from a stimulus leads to reduced stimulus control. Additional evidence comes from self-distancing studies, in which reexperiencing an emotional memory in the third person (distanced perspective) reduces overt and covert emotional responding in comparison to reimagining it in the first person (immersed perspective; e.g., Ayduk & Kross, 2008; Kross, Ayduk, & Mischel, 2005; Kross, Gard, Deldin, Clifton, & Ayduk, 2012; Mischkowski, Kross, & Bushman, 2012). Therefore, defusion exercises that verbally recontextualize the client’s relational responses as physical objects in the distance (posters in “soldiers in the parade”; leaves in “leaves on a stream”; passengers in “passengers on a bus”; imaginary objects in “physicalizing”; hands in “hands as thoughts”; and written words in “content on cards”) may diminish ensuing transformations of function through this process. This process is clearly central to ACT. In fact, the distance metaphor was emphasized in ACT’s original name, “comprehensive distancing” (Zettle, 2005).

To conclude, the present account proposes that cognitive defusion (i.e., reductions in verbal function transformations) can occur through different mechanisms of change, and procedures described in the literature may operate through one of these processes or more than one in combination (see Table 1). Extinction strategies diminish the eliciting functions of stimuli through extinction, counterconditioning, or inhibitory learning processes, whereas DRA exercises diminish the evocative functions of stimuli through differential reinforcement. In both cases, the reduced functions in turn weaken the functions transformed to verbally related stimuli. In contrast, contextual strategies reduce the transformation of function directly through the introduction of alternative functional contexts that disrupt this process.

Table 1. Summary of the different pathways to cognitive defusion

Conceptual and Clinical Implications

Throughout this article, the distinction among cognitive defusion procedure, process, and outcome has been emphasized. We define a procedure as the manipulation of environmental events, a process as the changes in the interaction between the organism and its environment, and an outcome as the change in the organism’s dispositional state or behavioral tendency (Lopes, 2008). More often than not, discussions about cognitive defusion do not differentiate among these three phenomena, and the term cognitive defusion is used to refer to all of them interchangeably (Barnes-Holmes et al., 2016; McEnteggart et al., 2015). By doing so, the concept’s precision is reduced. Given that precision is pursued by both CBS and science in general and that pragmatic utility in CBS is partially a function of defining its terms at the most useful levels of precision (Hayes et al., 2012a), it may be useful to distinguish these phenomena herein. Specifically, there are several defusion exercises used during therapy, such as word repetition, take your mind for a walk, and having thoughts (procedures), and distinct defusion mechanisms of change underlying these exercises, such as extinction, differential reinforcement, and recontextualization (processes). Given this diversity of procedures and processes, their outcomes may vary, for example, in the specific stimulus function affected. McEnteggart et al. (2015) expressed this concern as follows:

Of course, we are neither denying that defusion techniques exist, nor that fusion can be reduced, nor that any of this cannot happen through a process of defusion, instead we are simply saying that the same concept cannot be all three types of phenomena. (p. 57)

Assuming that the term cognitive defusion is maintained as a purportedly pragmatically useful concept in ACT, it may benefit the stated goals to develop a conventional use of the term in which its referent is unambiguous. We suggest that its use as an outcome descriptor may be the most beneficial and the least susceptible to confusion. To be more specific, we discuss defusion as the effect of reducing the function transformations that occur in verbal relations through one of several processes. These processes, in turn, may be produced by various exercises that can now be more meaningfully distinguished from each other on the basis of their differing underlying processes.

The present discussion has parallels with Cordova’s (2001) thesis regarding the eponymous ACT term acceptance. The author suggested that “acceptance might be operationally defined as a change in the behavior evoked by a stimulus from that functioning to avoid, escape or destroy to behavior functioning to maintain or pursue contact” (p. 215). Cordova describes multiple processes and procedures responsible for such change, suggesting that acceptance is best defined as an outcome. Indeed, all of the terms of ACT’s psychological flexibility model may be defined as outcomes rather than procedures or processes, suggesting that ACT’s overall aims include increased contact with private events (acceptance), decreased control of verbal relations (cognitive defusion), increased sensitivity to present contingencies (contact with the present moment), development of a different perspective of the self (sense of self-as-context), increased clarity about what is important (values), and engagement in behaviors congruent with one’s values (committed action).

Our proposed clarification of cognitive defusion as an outcome rather than a process has implications for the practitioner. Specifically, one of the cornerstones of clinical behavior analysis (which includes ACT) is functional analysis (Callaghan & Darrow, 2015; Dougher, 1999), which is the identification and description of functional relations among environmental events and the person’s actions (cf. Haynes & O’Brien, 1990; Sturmey, 1996). Functional analysis relates to both case conceptualization and intervention planning through the identification of problematic organism–environment relations that are causing psychological suffering, the selection of the behavioral processes capable of altering these relations, and the choice of intervention procedures that target these behavioral processes. Without proper functional assessment, the selection of intervention procedures may be ineffective or iatrogenic (cf. Iwata, Dorsey, Slifer, Bauman, & Richman, 1982). Thus, as a clinical behavioral therapy, ACT’s decision-making process must be primarily guided by functional analysis and not simply protocols and procedures (Bach & Moran, 2008; Westrup, 2014). By achieving clarity over the basic processes involved in the outcome of cognitive defusion, the clinician can more effectively choose between interventions based on their processes and not their procedures. Having determined that the client would benefit from a reduction of function transformation caused by verbal relations (i.e., defusion), the therapist can select which behavioral process has the highest probability of being effective given the specifics of the case (e.g., extinction, differential reinforcement, or recontextualization). In turn, the therapist can choose from an array of procedures in the literature or even formulate new ones to bring about the relevant process. Consequently, the therapist’s decision making increases in coherence, flexibility, and creativity.

As an example of the aforementioned decision-making process, consider two hypothetical clients, Alexandra and Rodrigo. Alexandra presents as a high-functioning executive who goes to work every day but spends hours behind her closed office door crying uncontrollably, overwhelmed by thoughts that she is a failure. Alexandra’s absorption in her private events and her high emotional arousal interfere with the quality of her work to such an extent that the therapist may choose defusion exercises that target respondent extinction. Rodrigo, in contrast, suffers from social anxiety, is dominated by thoughts that he will suffer a social humiliation if he goes out in public, and has difficulty leaving home. Because his avoidance is fairly complete, he does not display high levels of emotional arousal. His therapist may choose exercises that target operant processes that focus more on the avoidance that results from verbal transformations of function. In both cases, the therapist’s focus is on valued living. However, the process to achieve that varies because the case conceptualization differs.

Finally, imagine a therapist who decides, on the basis of a functional case conceptualization, to recontextualize the client’s thoughts. By being under the control of the process that he or she ought to foster, the therapist is able to select (or even design) and implement a procedure that resonates with the client’s current repertoire and strengths, enhancing the probability of a successful intervention. For example, an architect who has great capacity to imagine three-dimensional objects may easily be able to objectify his or her thoughts, whereas an avid reader may be better able to see his or her thoughts as a narrative.

Future Research Directions

This article offers an account of the processes of change in cognitive defusion. The formulation was based on the behavior–analytic literature of stimulus function transformations and existing component studies of cognitive defusion procedures. Additional research is necessary to elucidate the processes involved in cognitive defusion as an outcome. Three research implications are relevant. First, future analog component process studies on cognitive defusion procedures (Levin et al., 2012) would benefit from carefully considering how they conceptualize defusion in generating hypotheses about the independent and dependent variables involved. If different processes underlie various defusion interventions, results should differ across studies. For example, exposure procedures should readily alter eliciting functions, whereas DRA procedures may readily affect evocative functions. The aforementioned component studies offer preliminary support for this idea. Specifically, word repetition (an exposure exercise) decreases emotional arousal (an eliciting function; e.g., Masuda et al., 2004, 2009; Masuda et al., 2010a), and carrying cards (a DRA exercise) decreases avoidance (an evocative function) despite the maintenance of high emotional arousal (Gutiérrez et al., 2004). However, no study has yet directly compared the effects of different cognitive defusion procedures. This matter requires urgent empirical attention.

Second, experimental analog studies offer many of the benefits of component studies insofar as their dependent variables are behaviors analogous to clinical problems (e.g., emotional arousal, avoidance). These studies differ, however, regarding the independent variable. Instead of complete exercises, the relational processes themselves are manipulated, allowing for greater experimental control over the process of change (Dymond et al., 2013; Levin & Villatte, 2016). This type of research is encouraged. Research examining derived extinction provides such an example (e.g., Luciano et al., 2013; Roche et al., 2010; Vervoort et al., 2014). These studies create de novo equivalence classes between arbitrary stimuli. Through direct conditioning, one member of the class typically acquires an eliciting or evocative function (e.g., fear or avoidance), which transforms the function of all other class members. Then, one member of the class is submitted to an extinction procedure to allow for a subsequent test for generalization of extinction to other class members. Thus, derived extinction may be a suitable experimental analog for interventions that promote cognitive defusion through respondent processes and may inform practitioners with data from controlled environments.

Third, studies using clinical samples and clinical practices are required. Single-subject designs, if used systematically, may show the direct benefits of defusion procedures. Clarity about the use of the concept of cognitive defusion as an outcome as opposed to a process will make defusion interventions more amenable to direct empirical investigation and allow for easier and more effective comparison across case studies and studies using single-subject designs. Process analyses of group designs may also observe and report on the more precise independent variable–dependent variable relations proposed herein. To do this research, development of clinically useful measures of defusion processes (e.g., distinguishing eliciting from evocative functions of defusion procedures) would be useful.

Conclusion

As ACT continues to accumulate empirical support across diagnoses and presenting problems, efforts to train and disseminate ACT that emphasize its clinical model of middle-level terms will undoubtedly increase. However, as we have shown, at least one of these terms (cognitive defusion) has evolved in popularity and practice without being used with great precision, accommodating a large relational network of procedures, processes, and results. The conflation of multiple distinguishable phenomena in a single concept may be an obstacle to the overall agenda of CBS in developing “basic and applied scientific concepts that are useful in predicting and influencing the contextually embedded actions of whole organisms, individually and in groups, with precision, scope and depth” (Hayes et al., 2012a, p. 2). Throughout the conceptual development of cognitive defusion, the clarification of basic processes (mechanisms of change) underlying these interventions and responsible for these outcomes was not emphasized. However, because of the functional nature of behavior–analytic explanations, this discussion is important and should guide research and clinical practice.

We hope to stimulate discussion on the clarity of middle-level terms; to provide an alternate, more precise conceptualization of cognitive defusion; and to provide a framework that may clarify future analog component, experimental, and clinical research studies on defusion and the practice of clinicians. All three types of research ought to be valuable additions to the body of research on ACT because they will aid in clarifying the basic processes that contribute to ACT’s efficacy. The overall goal is not only to improve the quality of treatment delivery but also to identify functionally defined, empirically supported principles of change (cf. Rosen & Davison, 2003) that in the future may guide clinical practice independent of protocolized treatment packages, with greater integration among different contextual–behavioral psychotherapies (Callaghan & Darrow, 2015).