1 Introduction

The ability to learn and identify threat is critical for survival; hence it is highly conserved and supported by multiple neural processes. Threat-relevant information consists not only of discrete cues (e.g., a gun) but also of the context in which a threat occurs (e.g., a gun show versus in a dark alley). Context governs the predictive value of fear and safety cues and facilitates the selection of appropriate cognitive, behavioral, and neurobiological responses. A context may act as a modulator of threat associations and/or an occasion setter for another cue and can itself also serve as a stimulus that acquires associative value (Maren et al. 2013; Urcelay and Miller 2014). Contextual information plays an important role in constraining inappropriate memory recall (Chun and Phelps 1999). Impairments in contextual fear learning during and following trauma may be involved in the etiology of posttraumatic stress disorder (PTSD) by contributing to inappropriate recall of traumatic memory (Acheson et al. 2012; Liberzon and Abelson 2016). Improved understanding of contextual fear learning may inform development of novel PTSD treatment and prevention efforts (Glenn et al. 2014; Risbrough et al. 2016), underscoring the need to better delineate the neural mechanisms associated with contextual fear.

Much of what is known about the neural mechanisms of contextual fear learning is based upon animal research (e.g., Bouton 1993; Fanselow 2000, 2010). Animal studies indicate that contextual fear may be learned through two distinct processes: elemental and configural processing of contextual information (Rudy et al. 2004). Elemental processing involves learning contextual information through separate associations with each of the salient individual elements present, which primarily requires only the amygdala. Alternatively, configural processing reflects the binding together of multimodal individual contextual features into a single gestalt representation of the entire context. The hippocampus is thought to form configural representations of a context which are subsequently used to recognize a similar context (pattern completion) or distinguish between contexts (pattern separation) (Rudy 2009) and communicate with the amygdala to control fear behavior. Configural and elemental processes compete over encoding of contextual information such that under normal circumstances hippocampal-driven configural learning takes priority. In circumstances involving compromised hippocampal functioning, contextual information can potentially be learned through multiple amygdala-driven elemental associations.

Configural versus elemental learning of contextual information during a traumatic experience may play an important role in the etiology of PTSD. Impaired hippocampal encoding of contextual details during and following trauma could lead to the amygdala taking over, resulting in elemental encoding of the traumatic context. An important consequence of amygdala-driven elemental encoding of trauma is that subsequently encountering a single element related to the trauma may trigger recall of the traumatic memory and elicit a fear response. This differs from recall of a configural memory of trauma which is likely to be triggered only in circumstances similar to the overall traumatic context (Acheson et al. 2012; Liberzon and Abelson 2016). An example is illustrated in the following vignette: A soldier, Joe, is driving in a long convoy through a desolate desert area of Afghanistan where prior convoys had been attacked by enemy combatants. Joe experiences the trauma of seeing a truck unexpectedly blown up in front of him by a roadside explosive. If encoding of Joe’s traumatic memory is configural, it is likely that he will only subsequently have strong recall of the trauma in situations with many of the same co-occurring contextual features (e.g., desert, roadside trash, war zone, military convoy, hot sun) but not in situations with only a single trauma reminder (e.g., seeing sand at the beach or trash by the side of the road while in the United States). Alternatively, if memory of the traumatic context was formed elementally, there is increased probability that single reminders of the event (sand, trash, hot sun, smell of automobile exhaust) will cause Joe to have increased physiological arousal related to frequent re-experiencing of the trauma. Frequent re-experiencing of traumatic memory is a core symptom of PTSD.

There is a strong theoretical rationale for researching neural mechanisms related to contextual fear learning and PTSD risk, resilience, and etiology (Acheson et al. 2012; Liberzon and Abelson 2016). However, a significant barrier to investigating these areas is a lack of clarity over what exactly the term “context” means and how to operationalize it. For example, what constitutes the context in the vignette about Joe’s traumatic experience? The context shaping Joe’s memory of his trauma likely includes a combination of the multimodal sensory details of the environment, internal affective and cognitive states, and the unpredictability of threat over an extended period of time. Context is a broad, multifaceted construct, so it is perhaps unsurprising that the operational definitions and methods used to investigate human contextual fear learning are also broad and varied. Human neuroimaging studies of contextual fear acquisition can be grouped into four broad categories in terms of the paradigms used to manipulate contextual learning: colored background, static picture background, virtual reality (VR), and configural stimuli. Across these broad paradigms, there is much variability in how context is defined and measured. Contextual fear learning may represent a behavioral phenotype with the potential to inform PTSD treatment development (Risbrough et al. 2016). In order to make progress, though, there is a need to clarify how contextual fear has been operationalized in human research and to elucidate the neural circuitry involved in distinct aspects of contextual learning.

2 Review

2.1 Review Method

The aim of the current review is to highlight the primary methods used in neuroimaging studies of human contextual fear acquisition, summarize findings regarding the neural circuitry involved, and provide methodological recommendations. There are a number of research domains related to but outside of the scope of this review, which is restricted to human neuroimaging studies with acquisition of contextual fear as a primary construct of interest. Thus, this review will not cover studies in the following areas: human neuroimaging of cued fear conditioning (e.g., Greco and Liberzon 2016), contextual fear conditioning in humans that does not involve neuroimaging (e.g., Ameli et al. 2001), contextual modulation of fear extinction in both humans and animals (e.g., Anagnostaras et al. 2001; Milad and Quirk 2012), and non-contextual fear-related hippocampal functioning (e.g., Hayes et al. 2012; Chen and Etkin 2013; Hannula and Helmstetter 2016; Nees and Pohlack 2014). This review was conducted through PubMed searches of “context fear,” “contextual fear,” “fMRI,” and “imaging,” with studies only included if they used human subjects and focused on acquisition of contextual fear. Here we will review four types of paradigms used in neuroimaging studies of contextual fear acquisition (colored background, static background, VR, configural), outline findings for the primary neural circuitry involved in each paradigm, and make methodological recommendations for human studies of contextual fear. To facilitate interpretation of the neuroimaging findings, we additionally conducted a custom meta-analysis using Neurosynth based on the studies identified and reviewed (Yarkoni et al. 2011).

2.2 Background Color or Picture

The most basic method investigators have used in imaging studies to manipulate context is changing the background color of a screen (Armony and Dolan 2001; Barrett and Armony 2009; Cavalli et al. 2017; Lang et al. 2009; Pohlack et al. 2012). In this paradigm, the threat context (CON+) and safety context (CON−) are designated by distinct colors which correspond to whether or not an aversive unconditioned stimulus (US) will be administered. The earliest neuroimaging example of this paradigm used distinct background colors that changed to represent different contexts (Armony and Dolan 2001; Barrett and Armony 2009). Variations of this paradigm have used background color contexts that slowly transitioned back-and-forth between CON+ and CON− (Lang et al. 2009) rather than abruptly switching between colors as in prior studies. It is noteworthy that all of the studies using a slow transition between the color of CON+ and CON− reported increased hippocampus activity for CON+ compared to CON− (see Table 1), while the distinct color background studies did not. Instead, the distinct color background studies found only amygdala and parietal activity differences in CON+ versus CON− (Armony and Dolan 2001; Barrett and Armony 2009). The absence of hippocampus activity in distinct color background studies may reflect that the simplicity of stimuli in this paradigm could be solved through elemental processing and does not require forming a configural representation of the context. This suggests that the additional perceptual component of slowly transitioning color between contexts may be sufficiently complex to require hippocampus-related processing.

Table 1 Summary of neuroimaging findings for contextual fear learning paradigms

Another common paradigm for studying contextual fear acquisition uses static pictures or photographs of distinct environments to serve as contexts (Marschner et al. 2008; Steiger et al. 2015). In this paradigm, pictures of two “similar but easily distinguishable rooms” serve as CON+ and CON−. The room pictures are presented for long trial durations (~60s trial), during which discrete cues (geometric shapes presented over background picture; CS+ or CS−) are presented, with timing of US presentation predictable during CS+ but unpredictable during CON+. In general, studies using this paradigm report elevated skin conductance response (SCR), US expectancy, and hippocampus activity during CON+ compared to CON− and elevated amygdala activity during discrete CS+ versus CS− trials. Most imaging studies of contextual fear use psychiatrically healthy samples, but one study using background pictures as contexts examined differences between PTSD patients (n = 14), trauma-exposed healthy subjects (n = 12), and healthy controls (n = 11) (Steiger et al. 2015). Self-reported arousal, negative valence, and US expectancy were higher to CON+ than CON− for all subjects, but PTSD patients had poorer contingency awareness than healthy controls and higher differential hippocampal response to CON+ versus CON− than both other groups. The authors hypothesized that the PTSD-related increase of hippocampal activity for CON+ compared to CON− may reflect compensatory neural engagement to perform a low-load contextual task for the PTSD patients. In other words, for non-PTSD subjects, the task was sufficiently easy that it required minimal hippocampus-dependent processing, while for PTSD patients task completion required greater hippocampal engagement.

In contextual fear paradigms with background colors or pictures as contexts, there are several ways in which the simplicity of the visuospatial information is a methodological strength. First, using colors or pictures as contexts may be more easily replicable across research laboratories than VR contexts due to the relative simplicity and low cost of designing the contextual stimuli. Second, neuroimaging findings from studies using background color as context are easier to interpret as being related to unpredictability of US timing rather than confounding unpredictability with learning about complex multimodal contextual features.

Alternatively, background pictures and particularly background colors have much less ecological validity as contexts than VR environments. It is debatable to what extent different background colors or pictures actually represent different contexts versus different simple cues. Distinguishing between background colors almost certainly does not require configural processing. In studies using background pictures, without careful methodological control, distinguishing CON+ from CON− may be accomplished by solely remembering a single element present in each picture. For example, the room pictures utilized by Steiger et al. (2015) were similar in terms of overall shape and layout, but the contexts could have been differentiated by attending only to whether the left edge of each picture included a door (hallway) or books (library). It is problematic for studies ostensibly investigating hippocampus-dependent configural learning that findings of neural activity could reflect elemental rather than configural processing.

2.3 Virtual Reality

VR has been utilized in numerous studies of contextual fear conditioning (e.g., Baas et al. 2004; Grillon et al. 2006), though only a small number include neuroimaging (Andreatta et al. 2015; Alvarez et al. 2008; Alvarez et al. 2011; Indovina et al. 2011). In this paradigm, subjects passively move through VR rooms (e.g., house, airport) that serve as contexts, usually for an extended trial duration (30–40 s). Typically, the aversive US is presented unpredictably within CON+ with no US delivery during CON− or ITI. The unpredictable timing of the US in CON+ is designed to maximize conditioning to the overall environment rather than to specific features within the context. VR studies of contextual fear generally find increased SCR, US expectancy, post-training anxiety, and hippocampus and amygdala activity for CON+ versus CON−.

A modification of this paradigm adds discrete cues (e.g., 3 s auditory tone, virtual actor raising arms to ears for 4–6 s) which are presented multiple times during longer CON+ or CON− trials (Alvarez et al. 2011; Indovina et al. 2011). In these studies, the offset of the discrete CS+ is predictably paired with US administration, while timing of US presentation is unpredictable during CON+ trials. Findings using this paradigm generally demonstrate higher self-reported anxiety and SCR to the unpredictable versus predictable context and to the CS+ versus CS−. VR neuroimaging studies have found sustained activation in the bed nucleus of the stria terminalis (BNST) and in frontal and parietal regions during CON+ compared to CON− (Alvarez et al. 2011). Activation of the extended amygdala to predictable and unpredictable cues (Andreatta et al. 2015) and hippocampus and BNST activation to cues during US unpredictability have also been observed (Alvarez et al. 2011). Using a similar VR paradigm, Indovina et al. (2011) examined whether aberrant contextual fear learning is associated with individual variation in trait anxiety, a key risk factor for internalizing psychopathology (Shackman et al. 2016). Hippocampus activity during the unpredictable CON+ was increased for high trait anxious individuals, but ventromedial prefrontal cortex (vmPFC) to hippocampus functional connectivity was decreased, suggesting an anxiety-related deficit in recruiting context-relevant neurocircuitry. The anxiety-related enhancement of the hippocampus during the unpredictable CON+ may reflect compensatory recruitment of contextual fear learning (Steiger et al. 2015).

A major strength of using VR environments is that they represent perhaps the most ecologically valid paradigm used in fMRI studies of contextual fear acquisition. Subjects move through immersive virtual environments that mimic complex visual aspects of real-world environments. Yet, it is worth noting that VR studies of contextual fear acquisition have thus far been limited to visual elements. With continued technical advances, future studies in the VR paradigm will hopefully include other forms of sensory information such as auditory, olfactory, and tactile stimulation to create truly multimodal VR contexts as has been done in research on VR exposure therapy (e.g., Norrholm et al. 2016; Rothbaum et al. 2014).

One important methodological weakness in these VR neuroimaging studies is that configural processing was not required to differentiate contexts. In two of these studies, the different VR environments could have been distinguished solely based on the background or floor color (Andreatta et al. 2015; Indovina et al. 2011). Furthermore, none of these VR studies report having included the same elements within different contexts (i.e., feature-identical design), so contextual differentiation could be accomplished through learning the presence or location of only a few elements. It is problematic for VR imaging studies hoping to draw conclusions about hippocampus-dependent processes that contextual differentiation could be completed via hippocampal-independent means (i.e., elemental associations).

2.4 Configural

A novel methodological approach uses “feature-identical contexts” containing identical elements within them but which are rearranged in different contexts (Baeuchl et al. 2015). This methodology aims to require configural processing in order to differentiate contexts, i.e., a gestalt representation of each context must be learned through hippocampal-dependent processes (Rudy et al. 2004; Rudy 2009). Baeuchl et al. (2015) used two pictures of rooms as CON+ and CON−, each including seven identical elements (TV set, bookshelf, door, painting, couch, lamp, chair), four of which were rearranged differently between contexts. This design should prevent differentiating contexts through focusing only on the presence of a single element. Subjects had higher SCR, self-reported arousal, negative valence, US contingency ratings, and enhanced activity in the anterior and posterior hippocampus and the basolateral amygdala to CON+ relative to CON−. Yet, even this feature-identical design did not necessarily require learning a representation of the entire context, as contextual differentiation could have been achieved through learning only a pair of elements (e.g., always CON+ if door is next to couch, always CON− if painting is next to couch). For studies aiming to investigate hippocampus-dependent configural learning, it is imperative that experimental methodology ensures measurement of the configural learning, or findings cannot be definitively attributed to hippocampus-dependent processing.
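The argument above, that a feature-identical design rules out discrimination by element presence but not by a pair of neighboring elements, can be illustrated with a minimal sketch. The seven element names come from the description of Baeuchl et al. (2015); the left-to-right layouts, the adjacency rule, and all function names are hypothetical and chosen only for illustration, since the actual stimulus arrangement is not given here.

```python
# Seven elements named in the text; the two layouts below are HYPOTHETICAL
# left-to-right orderings, not the arrangements used by Baeuchl et al. (2015).
CON_PLUS = ("door", "couch", "TV set", "bookshelf", "painting", "lamp", "chair")
CON_MINUS = ("painting", "couch", "TV set", "bookshelf", "door", "chair", "lamp")


def elements_present(layout):
    """Set of elements in a context, ignoring where they are."""
    return frozenset(layout)


def moved_elements(layout_a, layout_b):
    """Elements of layout_a whose slot is occupied by something else in layout_b."""
    return [a for a, b in zip(layout_a, layout_b) if a != b]


def adjacent_pairs(layout):
    """Unordered pairs of elements that sit next to each other."""
    return {frozenset(pair) for pair in zip(layout, layout[1:])}


# Presence alone cannot separate the contexts: both contain the same elements.
same_elements = elements_present(CON_PLUS) == elements_present(CON_MINUS)

# A single neighboring pair is enough, so a full gestalt is not required.
only_in_plus = adjacent_pairs(CON_PLUS) - adjacent_pairs(CON_MINUS)
only_in_minus = adjacent_pairs(CON_MINUS) - adjacent_pairs(CON_PLUS)

if __name__ == "__main__":
    print("identical element sets:", same_elements)
    print("elements rearranged:", moved_elements(CON_PLUS, CON_MINUS))
    print("pairs unique to CON+:", sorted(sorted(p) for p in only_in_plus))
    print("pairs unique to CON-:", sorted(sorted(p) for p in only_in_minus))
```

In this toy layout, "door next to couch" occurs only in CON+ and "painting next to couch" only in CON−, mirroring the example in the text. A design intended to demand a representation of the whole context would need to ensure that no such small subset of elements is diagnostic.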

2.5 General Issues in Contextual Fear Learning Paradigms

One of the earliest paradigms to investigate human contextual fear learning used variants of the no-shock, predictable shock, and unpredictable shock task (NPU task; Grillon and Davis 1997; Grillon and Morgan 1999). In these paradigms, a colored background, scene, or cue signals whether an oncoming aversive US will be administered in response to a specific cue, occur randomly, or not occur at all (Schmitz and Grillon 2012). The context, in these studies, is described as the cognitive state of certainty versus uncertainty associated with the background (Baas et al. 2004; Vansteenwegen et al. 2008). Human neuroimaging work utilizing variants of this paradigm is growing rapidly (Grupe and Nitschke 2013), but few studies continue to use the term “context” to describe the different conditions based on shock predictability. Research on predictable versus unpredictable threat has become independent of the role of contextual fear processing and more focused on differentiating sustained anxiety from phasic fear in the human brain (Shin and Liberzon 2010; Tovote et al. 2015). Notably, hippocampus-driven contextual processing does not appear to be heavily involved in threat uncertainty. Generally, investigations of threat uncertainty or unpredictability focus on the extended amygdala, primarily the central nucleus of the amygdala (CeA) and BNST (Shackman and Fox 2016; Fox et al. 2015). Most studies do not find increased hippocampal activity to uncertain threat if that is the sole manipulation (Somerville et al. 2013; Grupe et al. 2013), unless another context is manipulated (Alvarez et al. 2008, 2011) or healthy subjects are compared to psychiatric populations (Dretsch et al. 2016). This suggests that cues with temporal unpredictability (the subject knows the US is coming but not when it will occur) have their own unique underlying neurocircuitry (e.g., CeA, BNST) which may shape or interact with hippocampus-based contextual learning.
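The three conditions of the NPU task can be summarized as a rule for when the US is delivered within a block. The following is a minimal sketch under stated assumptions: the function name, parameters, and the uniform random draw for the unpredictable condition are illustrative choices, not the timing procedure of any specific study cited here.

```python
import random


def us_schedule(condition, cue_onsets, cue_duration, block_duration,
                n_us=1, rng=None):
    """Return sorted US delivery times (in seconds) for one block.

    condition: "N" (no US), "P" (US at cue offset), or "U" (US at random times).
    All parameter names and the uniform sampling are hypothetical.
    """
    if condition == "N":
        # No-shock condition: the US never occurs.
        return []
    if condition == "P":
        # Predictable condition: the US is tied to the offset of each cue.
        return sorted(onset + cue_duration for onset in cue_onsets)
    if condition == "U":
        # Unpredictable condition: US timing is unrelated to the cues.
        rng = rng or random.Random()
        return sorted(rng.uniform(0.0, block_duration) for _ in range(n_us))
    raise ValueError(f"unknown condition: {condition!r}")


if __name__ == "__main__":
    cues = [10.0, 40.0]
    for cond in ("N", "P", "U"):
        times = us_schedule(cond, cues, cue_duration=8.0, block_duration=60.0,
                            n_us=2, rng=random.Random(0))
        print(cond, [round(t, 1) for t in times])
```

Only the "U" rule decouples US timing from the cue, which is the manipulation the text links to sustained anxiety rather than to learning about multimodal contextual features.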

It is frequently stated that US unpredictability is necessary to elicit contextual fear conditioning (e.g., Baeuchl et al. 2015; Grillon et al. 2004), but this may only be true insofar as it is necessary to elicit sustained anxiety. Just as with a simple CS+, phasic (not sustained) fear to CON+ can be trained through pairing context offset with US onset. There is ample evidence from human and animal research that hippocampus-dependent pattern separation and pattern completion do not require stimulus uncertainty (Bakker et al. 2008; Yassa and Stark 2011; Rolls 2013). Much of the interest in contextual fear conditioning paradigms pertains to their relevance for investigating hippocampus-dependent configural processes in relation to PTSD (Acheson et al. 2012; Steiger et al. 2015), yet perhaps the most ubiquitous methodology in contextual fear studies (US unpredictability) does not depend on hippocampus activity (Shackman and Fox 2016; Somerville et al. 2013). This mismatch between methodology and neural region of interest may result from ambiguity around the term “contextual fear” which led to the conflation of two distinct contextual characteristics (temporal unpredictability versus multimodal features).

The broad definition of context has led to methodological challenges for identifying neural activity which is unique to different forms of contextual manipulation. A methodological limitation of most fMRI studies of contextual fear acquisition is that different contextual characteristics (i.e., US unpredictability, context duration, simple versus complex multimodal features) are commonly confounded with one another. For example, it is impossible to draw firm conclusions about which contextual characteristic accounts for findings of elevated hippocampal activity during VR threat contexts (Alvarez et al. 2008; Andreatta et al. 2015) or static picture contexts (Marschner et al. 2008). The neural circuits underlying configural learning versus sustained anxiety are likely different, and they cannot be distinguished in studies such as these, which confound distinct contextual characteristics.

A general critique about imaging studies of contextual fear learning pertains to the use and reporting of psychophysiological measures. Fear-potentiated startle is one of the most commonly used and well-validated methods for measuring fear learning in humans (e.g., Grillon 2008), yet none of the studies reviewed here utilized startle. The lack of imaging studies in this area utilizing startle is likely due to the challenge of presenting auditory startle pulses over the loud background noise of an fMRI environment (>100 dB; Ravicz et al. 2000; Moelker and Pattynama 2003) or to the need to avoid introducing electrical artifacts and noise into acquisition of brain images and vice versa (scanner pulses and changing magnetic field causing noise in EMG collection). However, simultaneous fMRI and startle measurement is possible with scanner-safe equipment and additional preprocessing steps (van Well et al. 2012; Gorka et al. 2017). Relying on SCR as the sole physiological measure of fear learning is problematic because SCR and startle reactivity are distinct biological responses with different neural underpinnings and potential for translation to model organisms (Davis 2006; Nagai et al. 2004; Risbrough 2010). SCR, but not startle reactivity, may be dependent on contingency awareness (Sevenster et al. 2014), and SCR may be a less sensitive measure than startle in detecting differences in fear responding related to PTSD (Acheson et al. 2014). Finally, few studies report associations (positive or negative) between physiological measures of fear and neural activity (Indovina et al. 2011; Pohlack et al. 2012). This paucity of reporting is likely related to the difficulty of relating SCR and BOLD changes over long stimulus durations, as well as the broader issue in the literature of not reporting negative results. It is worth reiterating the value of reporting negative findings (e.g., Teixeira da Silva 2015), particularly for research domains such as brain activity during fear learning, in which the high cost of equipment limits both sample size and the overall number of studies.

2.6 Summary of Neuroimaging Findings

Much of the work defining the neural circuitry associated with contextual fear learning has been done in animals (Maren et al. 2013) with little neuroimaging work investigating this area in humans (Greco and Liberzon 2016). Findings from the human neuroimaging studies reviewed here are generally consistent with findings from animal models (see Table 1 for summary of imaging findings across contextual fear paradigms). Marschner et al. (2008) found dissociable roles for the hippocampus and amygdala for contextual and cued fear acquisition, respectively, and replicated animal findings implicating the hippocampus as the key region for acquiring contextual/configural fear but not elemental fear (Fanselow 2010; Maren et al. 2013) and the amygdala as the primary region for elemental processing (Rudy 2009). Other human studies find contextual modulation of both the hippocampus and amygdala (Andreatta et al. 2015) or the amygdala alone (Armony and Dolan 2001). One potential explanation is that the hippocampus forms a conjunctive representation of the discrete fear-related cues, which strengthens connections with the amygdala to control defensive responses (e.g., freezing; Rudy et al. 2004) and inhibits elemental contextual processing by the amygdala (Rudy 2009). This is consistent with converging findings from connectivity analyses, in which increased functional connectivity between the hippocampus and amygdala was observed (Baeuchl et al. 2015), and from a path analysis which showed a negative association between the hippocampus and the amygdala (Alvarez et al. 2008). These findings are correlational in nature and do not provide a causal explanation, but they provide preliminary evidence that contextual fear learning in humans recruits neural circuitry similar to that extensively mapped in animal models (Rudy 2009; Fanselow 2010).

Beyond the hippocampus and amygdala, many of the reviewed studies observed medial PFC (mPFC) activation, including the vmPFC and dorsal anterior cingulate cortex (dACC) (Andreatta et al. 2015; Marschner et al. 2008; Pohlack et al. 2012). Animal models demonstrate that the mPFC is an important region for encoding contextual associations (Hyman et al. 2012; Euston et al. 2012) and plays a pivotal role in fear conditioning and extinction in humans and animals (Giustino and Maren 2015). Specifically, the vmPFC has been shown to mediate fear regulation and extinction (Sehlmeyer et al. 2009; Milad and Quirk 2012) and encoding of contextual cues (Rozeske et al. 2015; Quinn et al. 2008). In contrast dorsal regions of the ACC are thought to integrate cognitive, affective, and physiological signals (Shackman et al. 2011) and be involved in the expression of fear during fear conditioning (Etkin et al. 2011; Milad et al. 2007) including contextual fear (Rozeske et al. 2015). These data suggest that subregions within the mPFC are critical to contextual fear expression.

The results of our review also indicate that a broad frontoparietal network responds to contextual fear processing. For example, Lang et al. (2009) found contextual fear was associated with sustained activation in the superior frontal gyrus, inferior frontal gyrus (IFG), frontal gyri, supramarginal gyrus, and the insula. Marschner et al. (2008) found increased activity in the bilateral parietal cortex, bilateral insula, dACC, and orbital frontal cortex for CON+. Baeuchl et al. (2015) found activity related to configural contextual fear learning in the IFG and middle frontal gyri (MFG), bilateral parietal cortices, and bilateral insula. Such frontoparietal activity may reflect enhanced cognitive and attentional allocation to threatening contexts (Corbetta and Shulman 2002; Scolari et al. 2015; Zanto and Gazzaley 2013), or it could reflect emotional regulation processes (Etkin et al. 2015). The lateral PFC activity is consistent with research showing this region’s importance for cognitively demanding fear learning such as contextual fear extinction and trace conditioning (Delgado et al. 2008; Knight et al. 2004; Gilmartin et al. 2014). The insular cortex plays an important function in anticipating aversive events (Carlson et al. 2011; Simmons et al. 2011) and in predicting cognitive control demands (Jiang et al. 2015), especially in paradigms that use unpredictable shock (Alvarez et al. 2015; Grupe and Nitschke 2013). It has also been shown to facilitate the expression of contextual fear in rodents (Alves et al. 2013). Collectively, these data suggest that contextual fear learning is associated with broad cortico-limbic circuits underlying cognitive-emotional function.

3 Contextual Fear Acquisition Meta-Analysis

To facilitate the summary of the neuroimaging findings associated with contextual fear learning, we conducted a custom meta-analysis by searching the Neurosynth database (Yarkoni et al. 2011) for the studies surveyed in our review. This strategy enabled us to aggregate the findings across multiple studies, thereby increasing the power to detect effects and visualize “core” brain circuits underlying contextual fear learning. A total of six studies were available for which we created the term “contextual fear acquisition.” As detailed by Yarkoni et al. (2011), the Neurosynth database extracts the coordinates of brain activation maps from each of the identified studies and then creates Z-scored brain maps representing the strength of the association of the term (i.e., contextual fear acquisition) to the rest of the brain. The meta-analysis shows the probability that specific brain regions consistently activate given the custom term (i.e., contextual fear acquisition). The forward and reverse inference maps of contextual fear acquisition and a list of the six investigations used can be found online: http://www.neurosynth.org/analyses/custom/31dcd438-7888-413f/. To aid interpretation of the custom meta-analysis, we assessed where the contextual fear acquisition brain map is unique and where it overlaps with the broader Neurosynth automated meta-analysis terms “fear” and “conditioning.” A total of 298 studies were included for the term “fear” and 137 studies for the term “conditioning” as of 6/5/17. For ease of comparison between the three terms, we computed a three-way conjunction map to visualize brain regions that overlap and differ. All brain maps are visualized at a threshold of Z > 5.0 (FDR < 0.01).
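The logic of a three-way conjunction map can be sketched in a few lines: each term's Z map is thresholded, and every voxel is then labeled by the set of terms that survive the threshold at that location. The sketch below is a toy illustration under stated assumptions. It uses short flat lists in place of brain volumes, the Z values are invented for demonstration, the function names are hypothetical, and FDR correction is not modeled; it is not the Neurosynth implementation.

```python
Z_THRESHOLD = 5.0  # visualization threshold stated in the text (Z > 5.0)


def threshold(z_map, z_min=Z_THRESHOLD):
    """Binary mask: True where the Z value strictly exceeds the threshold."""
    return [z > z_min for z in z_map]


def conjunction_labels(z_maps, z_min=Z_THRESHOLD):
    """Label each voxel with the set of terms whose Z map exceeds z_min there.

    z_maps: dict mapping a term name to a flat sequence of Z values.
    Returns one frozenset of term names per voxel.
    """
    lengths = {len(values) for values in z_maps.values()}
    if len(lengths) != 1:
        raise ValueError("all maps must contain the same number of voxels")
    masks = {term: threshold(values, z_min) for term, values in z_maps.items()}
    n_voxels = lengths.pop()
    return [
        frozenset(term for term, mask in masks.items() if mask[i])
        for i in range(n_voxels)
    ]


# INVENTED toy values for five voxels; not taken from any real map.
toy_maps = {
    "contextual fear acquisition": [6.1, 0.0, 7.2, 5.5, 1.0],
    "fear":                        [8.0, 6.3, 7.9, 0.0, 2.0],
    "conditioning":                [6.5, 0.0, 0.0, 0.0, 5.0],
}

if __name__ == "__main__":
    for i, terms in enumerate(conjunction_labels(toy_maps)):
        print(i, sorted(terms) or "below threshold")
```

Voxels labeled with all three terms correspond to regions of full overlap, while voxels labeled with a single term correspond to regions unique to that term at the chosen threshold.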

As Fig. 1 shows, contextual fear acquisition (red clusters) is associated with a distributed network of cognitive and affective neural regions that differ from the broader constructs of fear (yellow clusters) and conditioning (green clusters). As hypothesized, studies of contextual fear acquisition consistently report activity in the hippocampus, particularly in mid to posterior subregions. The anterior hippocampus appears to share neural circuitry with contextual fear acquisition, fear, and conditioning (purple clusters). This pattern suggests that learning or expression of fear is associated with anterior portions of the hippocampus, whereas contextual aspects reliably activate mid to posterior portions. The overlap in the anterior hippocampus extends into the amygdala, where the three terms also show strong activation. These results converge with animal studies (Fanselow 2010) implicating the hippocampus and amygdala as key regions in discriminating safe versus threatening contexts.

Fig. 1

Conjunction analysis showing similarities and differences between the custom meta-analysis of “contextual fear acquisition” and the automated meta-analyses of “fear” and “conditioning” using Neurosynth. Brain maps shown are forward inference maps at a threshold of Z > 5 and FDR corrected (FDR < 0.01)

Within the PFC, contextual fear acquisition and fear overlapped in the vmPFC. All three terms (contextual fear acquisition, fear, and conditioning) converged in the dACC, with larger clusters found for fear and conditioning. Consistent MFG activation was unique to studies of contextual fear acquisition and was not observed for fear or conditioning; the same was true of activation in the parietal cortex. These results suggest that regions supporting higher-order cognition and attention may be important for contextual fear learning.

All three terms consistently show activation in the bilateral anterior insula, and contextual fear acquisition and fear overlap in the BNST. The insula and BNST have been shown to be important for sustained anxiety and modulating the anticipation of threat certainty (Fox et al. 2015; Paulus and Stein 2006; Shackman and Fox 2016). In regard to contextual fear learning, insula and BNST activity may be a result of the methodological convention to administer the US unpredictably. Future research is needed to determine the role the insula and BNST play in contextual fear paradigms that use predictable threat timing.

Because this custom contextual fear acquisition meta-analysis consisted of only six studies, cautious interpretation is warranted. Moreover, because so few studies were available, we present only the forward inference maps. Thus, the unique neural circuitry of contextual fear acquisition reported here provides only tentative evidence that those regions contribute to contextual fear learning. As future contextual fear acquisition studies are published, they can be added to the meta-analysis, which will enhance understanding of the neural mechanisms associated with contextual fear acquisition.

4 Recommendations

4.1 Terminology

Future studies of contextual fear learning should use language that clearly identifies which types of contextual characteristics are investigated. Used alone, the broad terms “context conditioning” and “contextual fear learning” can refer to any of several distinctly different paradigms. Just as it would be problematic to lump studies of fear acquisition, extinction, and reinstatement together without distinction under the term “fear learning,” failing to specify significant methodological differences in the study of contextual fear acquisition hinders scientific progress. Indeed, in human research the term “contextual fear learning” is vague enough that it means almost nothing when used without a specifier (e.g., unpredictable contextual fear, configural contextual fear). Until there is widespread agreement about what constitutes an associative context, researchers should specify exactly how a context is operationalized, both in new studies and in the previous findings they cite. Further, any discussion of the neural circuits involved in contextual fear learning in relation to psychiatric etiology should differentiate between distinct forms of contextual fear processes (e.g., Is PTSD associated with dysfunction in configural processing of multimodal environmental elements, uncertainty of threat, extended duration threat, or some combination of all three?).

4.2 Stimulus and Response Characteristics

Investigations of contextual fear should utilize experimental designs that control for rather than confound distinct contextual characteristics (US unpredictability, long stimulus duration, multimodal configuration). Such methodological considerations are particularly important for imaging studies aiming to draw conclusions about neural activity. As long as the neural circuitry underlying a given process is not fully understood, experimental designs should aim to isolate processing of only a single contextual characteristic. This strategy could include examining (a) configural learning in visually complex short duration (6–8 s) contexts with predictable US timing, (b) US unpredictability in visually simple short duration contexts, or (c) stimulus duration in visually simple contexts with predictable US timing. Once there is improved understanding of neural circuits involved in processing specific types of contextual characteristics, interactions between distinct circuits might be investigated through study designs such as a 2 × 2 approach with two characteristics manipulated, while the third is held constant [e.g., visuospatial complexity (simple, complex) × US timing (predictable, unpredictable) in short duration contexts].
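The 2 × 2 approach described above can be written out as a small enumeration of design cells. This is only an illustrative sketch: the factor names, level labels, and the helper `design_cells` are hypothetical choices made for the example, not part of any published protocol.

```python
from itertools import product

# Two manipulated characteristics, each with two levels (labels are illustrative).
factors = {
    "visuospatial_complexity": ("simple", "complex"),
    "us_timing": ("predictable", "unpredictable"),
}

# The third characteristic is held constant across all cells.
held_constant = {"context_duration": "short"}


def design_cells(factors, held_constant):
    """Return one dict per cell of the full factorial design."""
    names = list(factors)
    return [
        dict(zip(names, levels), **held_constant)
        for levels in product(*(factors[name] for name in names))
    ]


cells = design_cells(factors, held_constant)
for cell in cells:
    print(cell)
```

Because every cell shares the same context duration, differences between cells can be attributed to complexity, US timing, or their interaction rather than to duration.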

Linking neuroimaging findings with functional indicators of fear will be critical for understanding functional relationships between circuit activity and contextual modification of fear expression. This complementary approach will greatly enhance our understanding of abnormalities in contextual fear expression and circuits in patient populations such as individuals with PTSD. Beyond SCR and self-report US expectancy/contingency ratings, additional measures of fear learning include fear-potentiated startle (van Well et al. 2012; Gorka et al. 2017) and pupil dilation (Visser et al. 2015; Korn et al. 2017), which probe different fear circuitry than SCR. Moreover, both positive and negative results regarding associations between measures of fear learning and neural activity should be reported.

4.3 Control for Cue Associations

A major reason for interest in contextual fear learning is that PTSD has been associated with hippocampal dysfunction (Acheson et al. 2012; Liberzon and Abelson 2016), and contextual fear conditioning is viewed as a useful paradigm for investigating hippocampal-dependent processes of pattern completion and pattern separation (Rudy et al. 2004; Rudy 2009). Unfortunately, much of the fMRI research on contextual fear conditioning has used methodology that either does not specifically probe hippocampal-dependent (i.e., configural) fear learning or confounds complex visuospatial environmental details with threat uncertainty and extended stimulus duration. Studies investigating configural fear learning of contextual information should use feature-identical contexts that can be distinguished only by learning the overall arrangement of elements within a context. One study design that could accomplish this goal would pair a single threat context (CON+) composed of several features with multiple safe contexts (CON−), each composed of a different arrangement of the same elements. If designed properly, such an approach ensures that CON+ can be distinguished from CON− only by learning the overall contextual configuration (see Fig. 2 for example). Such a feature-identical approach to contextual fear learning could also utilize VR environments.

Fig. 2

Schematic design of feature-identical CON+ and CON−s with four features that require configural learning for differentiation. In order to correctly predict CON+ (which is paired with US presentation), it is necessary to learn the full configuration of ABCD in CON+ rather than learning the position of a single feature or pair of features (AB, AD, AC, BD, BC, CD)
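One way to check that a set of feature-identical contexts truly demands configural learning is sketched below. It is a possible construction under stated assumptions, not necessarily the arrangement depicted in Fig. 2: CON+ is taken to be the ordering A, B, C, D across four positions, and the CON−s are taken to be the six rearrangements that keep exactly two features in their CON+ positions. The function names `safe_contexts` and `shared_with_some_safe_context` are hypothetical.

```python
from itertools import combinations, permutations


def safe_contexts(threat):
    """Rearrangements of `threat` that keep exactly two features in place."""
    return [
        p for p in permutations(threat)
        if sum(a == b for a, b in zip(p, threat)) == 2
    ]


def shared_with_some_safe_context(threat, safes, max_size):
    """True if every set of up to `max_size` feature positions found in the
    threat context also occurs, unchanged, in at least one safe context."""
    for size in range(1, max_size + 1):
        for positions in combinations(range(len(threat)), size):
            if not any(all(s[i] == threat[i] for i in positions) for s in safes):
                return False
    return True


threat = ("A", "B", "C", "D")   # assumed CON+ arrangement
safes = safe_contexts(threat)   # assumed CON- arrangements

# Single features and pairs never identify CON+ on their own ...
print(shared_with_some_safe_context(threat, safes, max_size=2))
# ... whereas three correctly placed features already imply the full configuration.
print(shared_with_some_safe_context(threat, safes, max_size=3))
```

In this construction, each of the six feature pairs (AB, AD, AC, BD, BC, CD) appears in its CON+ positions in exactly one CON−, so neither a single feature position nor a pair predicts the US by itself.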

4.4 Timing of Recall

Configural learning about contextual information is believed to require the hippocampus (Rudy 2009), but long-term encoding and retrieval of contextual information may depend upon mPFC (Quinn et al. 2008). Given that the timing of memory recall may be a critical determinant of the neural circuitry probed, imaging studies of contextual fear learning should examine short-term versus long-term recall of contextual information.

4.5 Psychiatric Samples

Given recent interest in the role of hippocampal dysfunction as a risk factor for PTSD, it is unfortunate and somewhat surprising that only a single extant neuroimaging study has examined contextual fear in individuals with PTSD (Steiger et al. 2015). The completion of more neuroimaging studies of contextual fear learning in PTSD and psychiatric samples will likely advance understanding of psychiatric risk, resilience, and etiology, particularly if improved methodology suggested above is incorporated.

5 Summary

In conclusion, there are relatively few experimental investigations of human contextual fear learning using neuroimaging (Greco and Liberzon 2016). Findings from these studies generally support the role of the hippocampus, amygdala, and mPFC in contextual fear learning as well as a broad frontoparietal network. A custom Neurosynth meta-analysis provides additional evidence that MFG, parietal cortex, and mid and posterior subregions of the hippocampus contribute to acquisition of contextual fear as compared to neural activity associated with broader constructs of “fear” and “conditioning.” Unfortunately, many of the studies we reviewed have methodological confounds which limit interpretation and understanding of the neural circuitry involved. In order to make advances in understanding how humans acquire fear in complex, multimodal environments, it is necessary to increase specificity in terminology about “context” and “contextual fear.” Tasks that avoid confounding distinct types of contextual characteristics (i.e., configural processing, US predictability, stimulus duration) will further refine our understanding of neural circuits underlying context modulation/mediation of learned fear. Ultimately, improved understanding of the neural circuitry involved in different aspects of human contextual fear learning may contribute to advances in characterizing risk and resilience for PTSD as well as treatment development.