Executive functions generally refer to “higher-level” cognitive functions involved in the control and regulation of “lower-level” cognitive processes and goal-directed, future-oriented behavior. Over 2500 scientific articles have been published on this topic in the past 10 years. They have examined the role of executive functions in normal development (e.g., Espy and Kaufmann, 2002), Attention-Deficit/Hyperactivity Disorder (e.g., Sergeant et al., 2002), Antisocial Personality Disorder (e.g., Morgan and Lilienfeld, 2000), Parkinson's disease (e.g., Dagher et al., 2001), and neuropsychiatric disorders including Schizophrenia and Obsessive-Compulsive Disorder (e.g., Nieuwenstein et al., 2001; Perry et al., 2001).

The component processes of executive functions have been investigated by means of factor-analytic studies. For example, factor-analytic work on the Wisconsin Card Sorting Test (WCST) suggests that it taps neuropsychological processes involving (a) cognitive flexibility, (b) problem-solving, and (c) response maintenance (Greve et al., 2002). Across empirical and theoretical papers, the processes that emerge as component factors underlying executive functions are: (a) inhibition and switching (Baldo et al., 2001; Burgess et al., 1998; Miyake et al., 2000; Rabbitt, 1997; Sergeant et al., 2002; Troyer et al., 1998; Welsh, 2002), (b) working memory (Barcelo and Knight, 2002; Barcelo and Rubia, 1998; Barkley, 1996; Denckla, 1996; Dunbar and Sussman, 1995; Pennington et al., 1996; Sergeant et al., 2002; Stuss et al., 1998; Stuss et al., 2001; Welsh, 2002; Zelazo et al., 1997), and (c) sustained and selective attention (Barcelo, 2001; Barkley, 1996; Manly and Robertson, 1997; Stuss et al., 1998; Stuss et al., 2001).

There remains an ongoing debate regarding whether executive functions are regulated by the frontal lobes (e.g., Miyake et al., 2000; Welsh, 2002), and this debate has left the construct ambiguously defined (Eslinger, 1996; Stuss and Alexander, 2000; Tranel et al., 1994). Tests of abstract reasoning and verbal fluency are commonly referred to as “frontal lobe” measures because persons with severe lesions in this area do poorly on them (e.g., Benton, 1968; Milner, 1963). Eventually it became standard practice to conclude that individuals who perform poorly on executive function tests have a “frontal lobe deficit” (Stuss et al., 2000). In fact, it “is virtually impossible to find a discussion of prefrontal lobe lesions that does not make reference to disturbances of executive functions and, in parallel fashion, there is rarely a discussion of disturbances of executive functions that does not make reference to dysfunction in prefrontal brain regions” (Tranel et al., 1994, p. 126). Despite the circularity of linking an anatomical region (the frontal lobes) with a neuropsychological construct (executive functions), the frontal lobes continue to be linked to measures of executive function (Duke and Kaszniak, 2000; Duffy and Campbell, 2001). In addition to this circularity, the sensitivity and specificity of executive function measures to frontal lobe lesions are inconsistent (e.g., Bigler, 1988; Costa, 1988; Wang, 1987). Several researchers have found that persons with frontal lesions perform within normal limits on executive function tests (e.g., Ahola et al., 1996; Damasio, 1994; Eslinger and Damasio, 1985; Heck and Bryer, 1986; Shallice and Burgess, 1991b), and others have found that persons with non-frontal or diffuse lesions perform as poorly as persons with frontal lesions on these tests (e.g., Anderson et al., 1991; Axelrod et al., 1996; Crockett et al., 1986; Grafman et al., 1990; Heaton, 1981; Robinson et al., 1980).

Because of the ambiguity surrounding the relationship between executive functions and the frontal lobes (e.g., Denckla, 1996; Stuss and Alexander, 2000), the literature establishing the construct validity of classic executive function tasks is limited (e.g., Barcelo, 2001; Kafer and Hunter, 1997; Phillips, 1997; Rabbitt, 1997; Reitan and Wolfson, 1994). The current paper aims to examine the executive function construct and its method of assessment by reviewing lesion and neuroimaging studies using three executive function measures: the Wisconsin Card Sorting Test, Phonemic Verbal Fluency, and the Stroop Color Word Interference Test (Stroop). These three tasks were chosen because they are among the most frequently used executive function measures (e.g., Baddeley, 1996; Butler et al., 1991; Carlin et al., 2000; Goodglass and Kaplan, 1979; MacLeod, 1991; Stuss and Levine, 2002) and because several theoretical papers have examined the cognitive processes underlying each (e.g., Greve et al., 2002; Miyake et al., 2000). Lesion studies will address whether these tests distinguish between persons with frontal lobe injuries and controls, and functional neuroimaging studies will address whether the frontal lobes are activated when healthy individuals perform these tasks. Lesion studies will also be examined quantitatively across all three measures (where effect size data are available). There are only three known meta-analytic studies in this area, and they are limited to the WCST (Demakis, 2003; Rhodes, 2004) and verbal fluency tests (Henry and Crawford, 2004).

We shall limit the current review to studies of adult populations because both executive functions and the frontal lobes are still developing in children (e.g., Levin et al., 1997; Malloy and Richardson, 2001; Welsh, 2002). The paper will conclude with a brief review of alternative executive function measures that could augment existing protocols, such as the gambling task and the Multiple Errands Test (Bechara et al., 1994; Burgess, 2000), along with possible directions for future research.

FRONTAL-SUBCORTICAL CIRCUITRY

Several researchers (e.g., Cummings, 1995; Duke and Kaszniak, 2000; Sbordone, 2000; Stuss and Benson, 1984) suggest that there are three principal frontal-subcortical circuits involved in cognitive, emotional, and motivational processes: (a) dorsolateral; (b) ventromedial; and (c) orbitofrontal. The dorsolateral frontal cortex projects primarily to the dorsolateral head of the caudate nucleus and has been linked to executive functions, including verbal and design fluency, ability to maintain and shift set (as measured by the WCST), planning, response inhibition, working memory, organizational skills, reasoning, problem-solving, and abstract thinking (Cummings, 1993; Duke and Kaszniak, 2000; Ettlinger et al., 1975; Grafman and Litvan, 1999; Jonides et al., 1993; Malloy and Richardson, 2001; Milner, 1971; Stuss et al., 2000). The ventromedial circuit, which is involved in motivation, begins in the anterior cingulate and projects to the nucleus accumbens. Lesions to this region often produce apathy, decreased social interaction, and psychomotor retardation (Sbordone, 2000). The orbitofrontal cortex projects to the ventromedial caudate nucleus and is linked to socially appropriate behavior. Lesions to this area cause disinhibition, impulsivity, and antisocial behavior (Blumer and Benson, 1975; Cummings, 1995).

In addition to the primary projections noted above, the frontal lobes have multiple connections to cortical, subcortical, and brain stem sites and should “be conceived as one aspect of an executive system involving many structures of the central nervous system” (Duffy and Campbell, 2001, p. 116). “Higher-level” cognitive functions such as inhibition, flexibility of thinking, problem solving, planning, impulse control, concept formation, abstract thinking, and creativity often arise from much simpler, “lower-level” forms of cognition and behavior. Thus, the concept of executive function must be broad enough to include anatomical structures that represent a diverse and diffuse portion of the central nervous system.

Table 1. Procedures and Scoring of Executive Function Measures

EXECUTIVE FUNCTION MEASURES

Wisconsin Card Sorting Test

Origin and History

The WCST (Heaton et al., 1993) has often been cited as the most frequently used measure of executive functioning (Baddeley, 1996; Barcelo and Knight, 2002; Reitan and Wolfson, 1994; Spreen and Strauss, 1998; Stuss and Benson, 1986; Stuss and Levine, 2002), and it is regularly used by over 70% of neuropsychologists (Butler et al., 1991). The original construction of the task (Berg, 1948) was based partly on sorting test methods used to assess abstract reasoning and set shifting in humans and animals (Goldstein and Scheerer, 1941; Vigotsky, 1934; Weigl, 1941; Zable and Harlow, 1946). Milner (1963) adapted the procedure developed by Grant and Berg (1948), and her version became the model for the current “standard” administration of the WCST (known as the “Heaton version”). The WCST eventually became a popular neuropsychological test (Butler et al., 1991) because of its reported sensitivity to frontal lobe lesions (e.g., Drewe, 1974; Milner, 1963) and the publication of a standardized procedure (Heaton, 1981). Table 1 presents a condensed version of the WCST procedures and scoring protocols (as well as those for the other two executive function measures reviewed in this paper). Readers interested in a more detailed description of procedures and scoring should consult the respective manuals (Benton and Hamsher, 1976; Heaton et al., 1993; Stuss et al., 2001).

Lesion Studies

Twenty-five studies examining the effect of various brain lesions on WCST performance in adult populations were reviewed (Table 2). There have been two previous qualitative reviews of this literature (Mountain and Snow, 1993; Reitan and Wolfson, 1994) and two meta-analytic studies in this area (Demakis, 2003; Rhodes, 2004). The state of the evidence regarding WCST sensitivity and specificity to frontal lobe lesions follows.

Table 2. WCST Lesion Studies for Qualitative Review

Twelve studies indicate that adults with frontal lobe lesions perform worse than healthy controls, and 10 of 16 studies suggest that persons with frontal lobe lesions perform worse than persons with non-frontal lobe lesions (see Table 2). In contrast, two single-case studies found that persons with focal frontal lobe damage showed no deficits relative to norms (Eslinger and Damasio, 1985; Heck and Bryer, 1986), and four studies found no differences between frontal and diffuse or basal ganglia comparison groups (Eslinger and Grattan, 1993; Heaton, 1981; Heaton et al., 1993; Robinson et al., 1980). In general, these studies support the sensitivity of the WCST to frontal lobe lesions relative to non-frontal lobe lesions, but they do not support its specificity to frontal lobe lesions.
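The distinction between sensitivity and specificity that runs through this review can be made concrete with the conventional 2 × 2 classification formulas. The sketch below is only an illustration of those definitions, not a procedure used in the reviewed studies; the function names are our own, and the counts are hypothetical, chosen to show a test that flags most frontal patients but also many non-frontal patients.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of frontal-lesion patients whom the test flags as impaired."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of non-frontal patients whom the test correctly does NOT flag."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (illustration only, not data from any reviewed study):
# 40 frontal patients, 32 score in the impaired range  -> TP = 32, FN = 8
# 40 non-frontal patients, 22 score in the impaired range -> FP = 22, TN = 18
sens = sensitivity(true_pos=32, false_neg=8)
spec = specificity(true_neg=18, false_pos=22)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A test with this profile would be "sensitive but not specific" in the sense used here: most frontal patients are impaired, but so are many patients whose lesions lie elsewhere.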

In addition, five studies failed to demonstrate significant differences between frontal and non-frontal groups (Anderson et al., 1991; Axelrod et al., 1996; Crockett et al., 1986; Grafman et al., 1990; van den Broek et al., 1993). As seen in Table 2, these five studies do not appear to differ from those in which persons with frontal lobe lesions had impaired performance relative to controls. Further evidence that the WCST is a sensitive but not specific marker of frontal lobe damage comes from studies finding poorer performance among patients with posterior lesions (Teuber et al., 1951) and patients with thalamic lesions (Wallesch et al., 1983) relative to controls.

Functional Neuroimaging Studies

There is one known qualitative review of brain activation during WCST performance (Barcelo, 2001). Five additional studies not covered by Barcelo (2001) are reviewed here (Catefau et al., 1998; Haines et al., 1994; Rezai et al., 1993; Riehemann et al., 2001; Weinberger et al., 1986). The central question of this section is whether the frontal lobes are activated during WCST performance in healthy adult populations (see Table 3). For studies that also included clinical populations in their analyses (e.g., persons with schizophrenia), only the data for the healthy control group are reported in Table 3. Event-related potential (ERP), electroencephalographic (EEG), and magnetoencephalographic (MEG) studies (e.g., Barcelo et al., 2000; Çiçek and Nalçaci, 2001) have relatively poor spatial resolution (Cabeza and Nyberg, 2000) and were excluded from this review. In addition, the ERP literature has been reviewed extensively elsewhere (Barcelo, 2001).

Table 3. WCST Neuroimaging Studies for Qualitative Review

Several studies found increased activation in the dorsolateral prefrontal cortex during WCST performance (see Table 3), which is consistent with the results of lesion studies (Ettlinger et al., 1975; Grafman et al., 1986; Heck and Bryer, 1986; Milner, 1963; Milner, 1971; Stuss et al., 2000). Activation also occurs in other frontal areas, including the ventromedial and orbitofrontal cortices. Collectively, these results suggest that a bilaterally intact prefrontal cortex, especially the dorsolateral prefrontal cortex, is necessary for “normal” WCST performance, but it is unclear whether the WCST primarily activates the left or the right prefrontal cortex (see Table 3).

A number of non-frontal brain regions are also activated by the WCST, including the inferior parietal cortex (Berman et al., 1995; Nagahama et al., 1997; Nagahama et al., 1996; Tien et al., 1998), basal ganglia (Mentzel et al., 1998), temporo-parietal association cortex (Konishi et al., 1998), and occipito-temporal, temporal pole, and occipital cortices (Marenco et al., 1993; Nagahama et al., 1996; Ragland et al., 1998). These results are consistent with lesion studies reporting no significant differences between frontal groups and non-frontal, diffuse, or basal ganglia comparison groups (Anderson et al., 1991; Axelrod et al., 1996; Crockett et al., 1986; Eslinger and Grattan, 1993; Grafman et al., 1990; Heaton, 1981; Heaton et al., 1993; Robinson et al., 1980; van den Broek et al., 1993). That the WCST activates a widespread network of brain regions is also consistent with distributed neuronal network models of the test (e.g., Dehaene and Changeux, 1991).

The most parsimonious explanation of the WCST results is that a distributed network of neural circuits is activated when task demands require integrated functioning. Activities of daily living illustrate the point. Planning a trip to the store, for example, involves both overt and covert behavioral components. At the overt level, the individual may search for a writing instrument, write down directions, and make a list of items. At the covert level, the individual may engage long-term and short-term memory, visualize a path to the store and the locations of items within it, and plan a budget within the available resources. These activities can be regarded as internal and external (or implicit and explicit) representations of cognitive ability that fall within the purview of executive functions. A fundamental tenet of this review is that executive functions recruit a wide range of functional abilities that are orchestrated in part by the frontal lobes. Thus, any high-level cognitive task is likely to require the participation of both subcortical and cortical regions, many of which have neural pathways leading to the frontal lobes.

More than half of the WCST neuroimaging studies had samples of fewer than 15 participants, slightly fewer than half did not include adequate control tasks in their experimental designs, and all used the subtraction method to compare activation between a target and a reference condition, a method that several authors have called into question (see Cabeza and Nyberg, 2000, for elaboration). Additional factors limit the interpretation of the WCST imaging results, including the absence of analyses of non-frontal brain areas, the averaging of brain activation across the entire duration of the WCST, and the poor temporal resolution of hemodynamic neuroimaging techniques (e.g., PET and fMRI). Despite these concerns, the results suggest that WCST performance activates a distributed neural network involving both frontal and non-frontal brain regions. Thus, like the lesion studies, the neuroimaging studies indicate that the WCST is a sensitive, but not specific, marker of frontal lobe functioning.

Phonemic Verbal Fluency

Origin and History

Verbal fluency is one of the most frequently used measures of executive functioning (Baddeley, 1996; Baldo et al., 2001; Goodglass and Kaplan, 1979; Stuss and Levine, 2002; Warburton et al., 1996), and it is used regularly by approximately 50% of neuropsychologists (Butler et al., 1991). There are two types of verbal fluency tasks: phonemic and semantic. Phonemic fluency tasks require participants to say (or write) as many words as possible beginning with a specific letter. Semantic fluency tasks require participants to say (or write) as many words as possible within a given category (e.g., animals). In general, persons with frontal lobe damage show impaired phonemic fluency, while their semantic fluency remains relatively intact (Troyer et al., 1998).

Feuchtwanger (1923, cited in Zangwill, 1966) reported that persons with frontal lobe damage produced less spontaneous speech, but it was Thurstone (1938) who developed a written test of verbal fluency, the Thurstone Word Fluency Test (TWFT), which was the first standardized procedure for the psychometric assessment of word fluency. About 30 years after the TWFT, clinical neuropsychologists developed an oral word fluency task that is now the most popular phonemic fluency task for brain-damaged populations (Benton and Hamsher, 1976). This Controlled Oral Word Association test (COWA) of Benton and Hamsher (1976) requires participants to generate orally as many words as possible beginning with the letters “F,” “A,” and “S,” with 60 seconds allowed for each letter.

Lesion Studies

Sixteen studies examining the effect of various brain lesions on phonemic verbal fluency performance in adult populations are reviewed below (see Table 4). There is one known qualitative review (Reitan and Wolfson, 1994) and one meta-analytic study (Henry and Crawford, 2004) in this area. Our updated review will summarize the data that both support and challenge the sensitivity and specificity of phonemic verbal fluency to frontal lobe lesions.

Table 4. Verbal Fluency Lesion Studies for Qualitative Review

All 10 studies comparing persons with frontal lobe lesions to healthy controls found that the frontal group produced significantly fewer words, and eight of nine studies indicate that persons with frontal lobe lesions perform worse than persons with non-frontal lobe lesions (see Table 4). In support of these findings, Henry and Crawford (2004) conducted a meta-analysis of 31 studies with 1,791 participants and found a large deficit in phonemic verbal fluency (r = .52) among individuals with focal frontal lesions relative to healthy controls. Collectively, these results suggest that phonemic verbal fluency is sensitive to frontal lobe lesions, but they do not address whether it is specific to frontal lobe lesions. To establish specificity, persons with any other type of brain damage would have to perform as well as healthy controls, and persons with frontal lobe lesions would have to perform significantly worse than all other brain-damaged groups on this task.
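To put the Henry and Crawford (2004) estimate on the more familiar scale of a standardized mean difference, a correlation-type effect size can be converted to Cohen's d with the standard formula d = 2r / √(1 − r²). The sketch below assumes that r is a point-biserial correlation between group membership (frontal versus control) and fluency score with roughly equal group sizes; the conversion is approximate when groups are unbalanced, and the function name is our own.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a point-biserial r to Cohen's d.

    Assumes roughly equal group sizes; the formula is approximate otherwise.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r ** 2)


d = r_to_d(0.52)  # effect size reported for phonemic fluency, frontal vs. controls
print(round(d, 2))
```

Under these assumptions, r = .52 corresponds to a d of roughly 1.2, which is a large effect by Cohen's conventional benchmarks.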

Patients with diffuse lesions and patients with frontal lobe lesions do not differ in the total number of words produced, although both groups perform worse than healthy controls (Pendleton et al., 1982). Patients with Alzheimer's disease perform as poorly as patients with frontal lobe lesions (Miller, 1984), and persons with either frontal or non-frontal left hemisphere lesions perform worse than persons with right hemisphere frontal and non-frontal lesions (Perret, 1974) and healthy controls (Stuss et al., 1998). Thus, these studies do not support the specificity of phonemic verbal fluency tasks to frontal lesions.

Persons with left frontal lesions often perform significantly worse than any other brain-damaged group (Baldo et al., 2001; Pendleton et al., 1982; Perret, 1974; Ramier and Hecaen, 1970; Stuss et al., 1998; Troyer et al., 1998), but right frontal (Bornstein, 1986; Miceli et al., 1981; Miller, 1984; Pendleton et al., 1982; Perret, 1974; Ramier and Hecaen, 1970; Troyer et al., 1998) and bilateral frontal (Benton, 1968; Janowsky et al., 1989) lesions also impair phonemic verbal fluency. Although non-frontal and right-sided lesions have been found to impair phonemic verbal fluency, impaired verbal fluency is most typically associated with left frontal lobe damage (Ramier and Hecaen, 1970).

Table 5. Verbal Fluency Neuroimaging Studies for Qualitative Review

Several limitations of the lesion studies in Table 4 may be related to the lack of specificity of phonemic verbal fluency tasks to frontal lobe lesions. First, several studies did not include appropriate control groups (e.g., Benton, 1968; Stuss et al., 1998). Second, several studies did not report the exact localization of the lesions, which may have compromised the “purity” of the groups’ compositions (e.g., Butler et al., 1993). Third, some studies used the COWA while others used the TWFT. Differences between written and oral word fluency tasks, as well as differences in their time limits, may have affected the findings. Lastly, some of these studies excluded persons with dysphasia (e.g., Miceli et al., 1981), while others provided no information regarding the incidence of dysphasia among participants (e.g., Milner, 1964). Reitan and Wolfson (1994) noted “that the incidence and possible effects of dysphasia should be identified when tests that require production or processing of verbal material are used in comparative assessment of brain-damaged groups” (p. 172), because dysphasia, rather than lesion location per se, could account for limited verbal production.

Functional Neuroimaging Studies

A few studies have examined whether the frontal lobes are activated during phonemic verbal fluency performance in healthy adult populations (see Table 5). Studies that used semantic fluency tasks were excluded from this review because semantic fluency remains relatively intact in persons with frontal lobe damage (Troyer et al., 1998).

As seen in Table 5, the results of the phonemic verbal fluency neuroimaging studies vary widely. Nevertheless, some consistent findings emerge. Several studies found increased activation in the left dorsolateral prefrontal cortex (Frith et al., 1995; Frith et al., 1991; Warkentin and Passant, 1997), anterior cingulate (Frith et al., 1995; Frith et al., 1991; Phelps et al., 1997), and left inferior frontal gyrus (Paulesu et al., 1997; Phelps et al., 1997). These findings of increased activation in specific frontal areas, together with the increased overall frontal lobe activation reported by Parks et al. (1988), suggest that an intact frontal cortex, especially on the left, is required for phonemic verbal fluency performance. Phonemic verbal fluency also activates a number of non-frontal brain areas, including the thalamus (Frith et al., 1995; Paulesu et al., 1997), parietal lobes (Parks et al., 1988), and temporal lobes (Parks et al., 1988).

These brain-imaging studies suffer from methodological shortcomings similar to those of the WCST studies. Despite these limitations, the results suggest that phonemic verbal fluency performance activates a number of frontal and non-frontal brain areas, indicating the sensitivity, but not the specificity, of phonemic verbal fluency tasks to frontal lobe functioning.

Stroop Color Word Interference Test

Origin and History

The Stroop test is one of the most extensively studied measures of selective attention (Blenner, 1993; Carter et al., 1995; Goodglass and Kaplan, 1979; Lezak, 1995; MacLeod, 1991; Stuss et al., 2001), and it is used by approximately 50% of neuropsychologists (Butler et al., 1991). The test typically consists of three sets of stimuli: (a) color words printed in black ink; (b) color patches or colored X’s; and (c) color words printed in incongruent ink colors (e.g., the word “RED” printed in blue ink). The participant must read the color words on the first sheet, name the colors on the second sheet, and name the color of the ink (i.e., not the word) on the third sheet. On this third task, the habitual tendency to read the words, rather than name the color of the ink in which they are printed, produces a significant slowing in reaction time (RT) known as the “Stroop effect” or the “interference effect.” Stroop (1935) found that healthy college students’ mean RT increased by 74 percent from naming color patches to naming the incongruent ink colors in which color words were printed.
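Stroop's 74 percent figure is a relative increase in mean naming time from the color-patch condition to the incongruent color-word condition. A minimal sketch of that arithmetic follows; the function name and the example times are hypothetical, chosen only so that they reproduce a 74 percent increase, and clinical versions of the test often use other interference indices.

```python
def percent_interference(baseline_rt: float, incongruent_rt: float) -> float:
    """Percentage increase in mean RT from the baseline (color-patch) condition
    to the incongruent color-word condition."""
    if baseline_rt <= 0:
        raise ValueError("baseline RT must be positive")
    return 100.0 * (incongruent_rt - baseline_rt) / baseline_rt


# Hypothetical mean completion times in seconds (illustration only):
print(percent_interference(baseline_rt=50.0, incongruent_rt=87.0))
```

Expressing interference relative to each person's own baseline keeps a generally slow participant from appearing to show a larger Stroop effect simply because all of his or her responses are slow.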

Nearly 50 years before Stroop (1935) published his seminal paper on attentional interference using the paradigm described above (now known as the “Stroop test”), Cattell (1886) showed that participants took longer to name colors (and pictures of objects) than to read aloud the corresponding words. He also found that participants took less time to recognize a color than a word or letter, but longer to name the color than the word or letter, “because in the case of words and letters the association between the idea and name has taken place so often that the process has become automatic, whereas in the case of colours [sic] and pictures we must by a voluntary effort choose the name” (p. 65). In other words, Cattell distinguished between automatic and voluntary attentional processes in naming colors and words: attending to the lexical features of words is automatic, whereas attending to ink color is not. As MacLeod (1991) noted, Cattell's hypothesis influenced both his contemporaries (e.g., James, 1890; Quantz, 1897) and later psychologists (e.g., Stroop, 1935; Posner and Snyder, 1975).

More than one thousand articles have been published on the Stroop effect over the past 67 years. A review of the entire Stroop literature is beyond the scope of this paper (interested readers should consult MacLeod, 1991); this section focuses only on how the test came to be used by neuropsychologists to detect frontal lobe impairment. The Stroop appears to have been adopted as a “frontal lobe test” on the basis of a single study, which found that persons with left frontal lobe lesions had significantly longer interference-trial RTs than persons with non-frontal lobe lesions (Perret, 1974). Since then, only five additional studies examining the effect of frontal lobe lesions on Stroop performance have been located, even though the test is widely used as a measure of frontal lobe functioning (Butler et al., 1991; Stuss et al., 2001).

Lesion Studies

The most recent qualitative review of studies employing the Stroop test (MacLeod, 1991) did not address the sensitivity and specificity of the task to frontal lobe lesions. Only two studies have found that persons with frontal lobe lesions perform worse than healthy controls (Stuss et al., 2001; Vendrell et al., 1995). Another study found that bilateral medial frontal lesions increase “susceptibility” to the Stroop effect (Holst and Vilkki, 1988, p. 80). In sum, only certain areas of the frontal lobes appear to underlie Stroop performance, namely the lateral and superior medial regions rather than the orbitofrontal cortex (see Table 6).

Table 6. Stroop Lesion Studies for Qualitative Review

Two other studies have found differences between frontal and non-frontal groups. Persons with left frontal lobe lesions perform significantly worse on the incongruent color-naming condition than persons with right frontal, right non-frontal, and left non-frontal damage, and persons with left-sided frontal lobe lesions are significantly slower on all three conditions than persons with non-frontal lobe lesions (Perret, 1974; Stuss et al., 2001). In contrast, Blenner (1993) found no differences between groups with frontal and temporal lobe lesions, although the combined lesion group performed worse than a healthy comparison group on all three conditions.

Collectively, the results of the Stroop lesion studies are less consistent than those of the WCST and phonemic verbal fluency lesion studies. The Stroop test appears to be sensitive to lateral and superior medial lesions of the frontal lobes, but it is not specific to frontal lobe functioning as a whole.

Functional Neuroimaging Studies

In general, the hemodynamic brain imaging data both support and challenge the sensitivity and specificity of the Stroop test to frontal lobe functioning (see Table 7). Despite a number of differences among the studies, some findings are consistent. In particular, the increased activation of the anterior cingulate cortex observed during Stroop performance suggests that this region is critical for selective attention (e.g., Bench et al., 1993; Carter et al., 1995; Pardo et al., 1990; Posner and Dehaene, 1994; Posner and Petersen, 1990). The one study that did not find increased activation in the anterior cingulate cortex differed from the others in experimental design (Banich et al., 2000). Despite this one negative finding, the anterior cingulate cortex appears to play an important role in Stroop performance. Indeed, Peterson et al. (1999) described it as a “central executor [emphasis added] that coordinates and integrates the task-oriented sensory, receptive and expressive language, alerting, working memory, response selection, motor planning, and motor response processes within the CNS [central nervous system]” (p. 1253).

Table 7. Stroop Neuroimaging Studies for Qualitative Review

In addition to the prominent role of the anterior cingulate during Stroop performance, several studies also indicate that the Stroop test activates the middle frontal gyrus (Banich et al., 2000; Bush et al., 1998; Leung et al., 2000; Taylor et al., 1997), parietal lobe regions (Brown et al., 1999; Bush et al., 1998; Carter et al., 1995; Leung et al., 2000; Peterson et al., 1999; Taylor et al., 1997), motor areas (Bush et al., 1998; Pardo et al., 1990; Peterson et al., 1999), and temporal lobe regions (Bush et al., 1998; Carter et al., 1995; Leung et al., 2000; Pardo et al., 1990). Taken together, several studies demonstrate that the Stroop test activates a distributed neural network of brain regions (Brown et al., 1999; Bush et al., 1998; Carter et al., 1995; Leung et al., 2000; Pardo et al., 1990; Peterson et al., 1999; Taylor et al., 1997). The finding that a task as complex as the Stroop activates a large number of brain areas is consistent with parallel distributed processing models of the Stroop effect (e.g., Cohen et al., 1990). Despite a number of methodological limitations of the Stroop neuroimaging studies, the results suggest that Stroop performance activates a distributed neuronal network of frontal and non-frontal brain regions.

SUMMARY OF QUALITATIVE REVIEW

A qualitative review of three popular executive function measures (WCST, phonemic verbal fluency, and Stroop) suggests that these measures are sensitive, but not specific, indicators of frontal lobe damage. Typically, persons with frontal lobe lesions perform more poorly than healthy controls on these tests, although several studies indicate that such patients perform within normal limits (e.g., Ahola et al., 1996; Damasio, 1994; Eslinger and Damasio, 1985; Heck and Bryer, 1986; Shallice and Burgess, 1991b). Moreover, persons with frontal lobe lesions usually perform worse than persons with non-frontal lobe lesions, but some studies found that persons with non-frontal or diffuse brain lesions do as poorly as frontal lobe lesion patients (e.g., Anderson et al., 1991; Axelrod et al., 1996; Crockett et al., 1986; Grafman et al., 1990; Heaton, 1981; Robinson et al., 1980). Overall, then, findings regarding the sensitivity and specificity of these three executive function measures to frontal lobe lesions have been inconsistent, indicating that these tasks should not be used as “frontal lobe tests” per se, but rather as tests of specific executive functions (e.g., problem-solving, cognitive fluency).

In addition, functional neuroimaging studies using these three executive function measures demonstrated that the tests activate a distributed neural network of frontal and non-frontal brain regions. In other words, the brain imaging data do not implicate the frontal lobes as the only brain region responsible for executive functions. It is not surprising, however, that multiple brain areas are involved in cognitive processes as complex as the executive functions tapped by these measures, including shifting and maintaining cognitive set, inhibition of prepotent responses, selective attention, and planning. Moreover, the frontal lobe regions have multiple connections with various other cortical, subcortical, and brain stem sites and, thus, the frontal lobes should “be conceived as one aspect of an executive system involving many structures of the central nervous system” (Duffy and Campbell, 2001, p. 116). Commensurate with the results of the lesion studies, the neuroimaging data support the sensitivity, but not specificity, of these three executive function measures to frontal lobe functioning.

RATIONALE FOR QUANTITATIVE REVIEW

In order to examine further the relationship between these three executive function measures and frontal lobe damage, lesion studies were examined quantitatively. The meta-analytic approach increases statistical power, permits the estimation of a population effect size, and allows an examination of variables that may be moderating the relationship between lesion location and performance on the executive function measures.

The rationale for the quantitative analyses is that a qualitative review, by itself, relies only on the statistical significance of the original findings, which may be unreliable because of low statistical power (Schmidt, 1992). Thus, studies with small sample sizes may produce results that are “not significant” despite the presence of large effect sizes. The qualitative review was performed in order to identify potential moderator variables, inclusion criteria, and coding strategies. The procedures and results of the meta-analysis are presented next.
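The power problem can be illustrated with a normal-approximation power calculation for a two-group comparison. This is a rough sketch under stated assumptions (equal group sizes, two-tailed alpha = .05, and a normal rather than noncentral t approximation); the function names and sample sizes are hypothetical and are not taken from the reviewed studies.

```python
import math


def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power(d: float, n_per_group: int, z_crit: float = 1.959964) -> float:
    """Approximate two-tailed power (alpha = .05) for two equal groups.

    Normal approximation: the expected z statistic is |d| * sqrt(n / 2);
    the negligible probability of rejecting in the wrong tail is ignored.
    """
    expected_z = abs(d) * math.sqrt(n_per_group / 2.0)
    return normal_cdf(expected_z - z_crit)


for n in (10, 20, 50):
    print(n, round(approx_power(0.8, n), 2))
```

Under this approximation, a large effect (d = .80) tested with 10 participants per group has power of only about .43, so a nonsignificant result from such a study says little about whether the effect exists.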

METHOD

Search Strategy

Several strategies were employed to identify studies for inclusion in the meta-analysis. First, searches of computerized databases, including PsycINFO and MEDLINE, were conducted using keywords such as “executive function,” “frontal lobe,” “Wisconsin card sort,” “Stroop,” “verbal fluency,” and “word fluency,” as well as variants on these terms. After collecting all available published articles and abstracted studies, their reference sections were scanned in order to locate additional articles that may have been missed in the previous searches. Lastly, authors were contacted to inquire whether additional research had been conducted that would have been overlooked by the previous search methods.

Inclusion Criteria

Studies needed to satisfy the following criteria to be included in the meta-analysis:

  1. The sample consisted of adult participants only (i.e., the mean or median age of the sample was equal to or above 18 years).

  2. The study did not consist solely of persons with “suspected” frontal lobe damage, such as psychiatric populations (e.g., persons with schizophrenia) or demented populations (e.g., persons with Alzheimer's disease or frontal lobe dementia).

  3. The study either had a healthy control group or a non-frontal lobe lesioned control group.

  4. The study included verification of frontal lobe damage either through a brain imaging technique (e.g., CT scan or MRI) or through surgical reports.

  5. The study employed the standard version of the WCST (Milner, 1963; Heaton, 1981; Heaton et al., 1993), the COWA test of phonemic verbal fluency (Benton and Hamsher, 1976), or the standard version of the Stroop (Stroop, 1935; Stuss et al., 2001).

  6. The study reported the following scores, depending on which test was administered: (a) number of perseverative errors on the WCST because “perseverative errors are regarded as the main signs of frontal dysfunction” (Barcelo and Knight, 2002, p. 349); (b) total number of “FAS” words generated on the COWA test; and (c) interference trial RT for the Stroop.

  7. Adequate data (i.e., means and standard deviations, t-values, F-values, or p-values) were provided for calculation of effect sizes.
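Criterion 7 refers to converting reported statistics into a common effect size. The sketch below shows standard conversions to Cohen's d from group means and standard deviations (pooled SD), from an independent-samples t, and from a two-group F with one numerator degree of freedom. It assumes the frontal group is entered first, so that negative d indicates poorer frontal-group performance, and flips the sign for scores where higher is worse (perseverative errors, interference RT). The exact formulas and sign conventions used in the meta-analysis are given in its appendices, not here, so the function names and conventions are hypothetical.

```python
import math


def d_from_means(m_frontal, sd_frontal, n_frontal,
                 m_control, sd_control, n_control,
                 higher_is_worse=False):
    """Cohen's d (frontal minus comparison) using the pooled standard deviation.

    If higher scores mean worse performance (e.g., perseverative errors,
    interference RT), the sign is flipped so negative d = frontal group worse.
    """
    pooled_var = (((n_frontal - 1) * sd_frontal ** 2 + (n_control - 1) * sd_control ** 2)
                  / (n_frontal + n_control - 2))
    d = (m_frontal - m_control) / math.sqrt(pooled_var)
    return -d if higher_is_worse else d


def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def d_from_f(f, n1, n2, sign=-1):
    """Cohen's d from a two-group F (df1 = 1): F = t**2, so the direction must be supplied."""
    return sign * d_from_t(math.sqrt(f), n1, n2)


# Hypothetical FAS totals: frontal 28 (SD 10, n 20) vs. controls 40 (SD 10, n 20).
print(d_from_means(28, 10, 20, 40, 10, 20))  # -1.2
```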

Study Sample

Initially, 52 lesion studies were identified for the meta-analysis; 27 of them satisfied the inclusion criteria and were retained. These 27 studies tested 1,992 participants, with sample sizes ranging from 18 to 415. Mean sample ages ranged from 26.5 to 66.9 years (overall mean = 45.33 years, SD = 10.14).

Examination of Moderator Variables

The qualitative review identified potential moderators of the relationship between executive function measures and the frontal lobes, including type of test, comparison group, and age. Each variable is discussed briefly below.

Type of Test

Several studies have found relatively low intercorrelations (r < .40) among executive function tests (e.g., Cockburn, 1995; Crockett et al., 1986; Duncan et al., 1997; Humes et al., 1997; Miyake et al., 2000; Welsh et al., 1999), indicating the possibility that each executive function test measures something unique (Duncan et al., 1997; Rabbitt, 1997; Vandierendonck, 2000). Despite the low correlations among executive function measures, it is important to determine whether there are higher correlations among executive function measures than between executive function measures and other measures not hypothesized to tap executive functions (e.g., recognition memory). One study found that the median correlation between four executive function tests and five non-executive function tests was .29, while the median correlation among the four executive function measures was only .26 (Duncan et al., 1997), indicating that the executive function measures are no more related to one another than they are to other tasks. Furthermore, Miyake et al. (2000) found that correlations among executive function measures were higher when they were thought to tap the same underlying cognitive process while correlations were lower when they were thought to tap different cognitive processes, suggesting that executive function measures show signs of both convergent and discriminant validity.
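The Duncan et al. (1997) comparison amounts to contrasting two medians drawn from a correlation matrix: correlations among pairs of executive tests versus correlations between executive and non-executive tests. A minimal sketch follows; the matrix is invented, and the function and index lists are hypothetical.

```python
from itertools import combinations
from statistics import median


def median_within_and_between(r, ef_idx, other_idx):
    """Median r among executive tests and between executive and other tests.

    r is a symmetric correlation matrix (list of lists); ef_idx and other_idx
    are the row/column indices of the executive and non-executive tests.
    """
    within = [r[i][j] for i, j in combinations(ef_idx, 2)]
    between = [r[i][j] for i in ef_idx for j in other_idx]
    return median(within), median(between)


# Invented matrix: tests 0-2 are "executive", test 3 is a non-executive task.
r = [[1.00, 0.20, 0.30, 0.30],
     [0.20, 1.00, 0.25, 0.28],
     [0.30, 0.25, 1.00, 0.35],
     [0.30, 0.28, 0.35, 1.00]]
print(median_within_and_between(r, [0, 1, 2], [3]))  # (0.25, 0.3)
```

If the within-executive median is no higher than the cross-domain median, as in this toy matrix and in the study described above, the executive tests show no more convergence with one another than with unrelated tasks.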

Taken together, these findings suggest that type of executive function measure may moderate the relationship between frontal lobe functioning and test performance. Type of test was coded as a categorical variable.

Comparison Group

The qualitative review of lesion studies indicated that most studies using a healthy comparison group found that the executive function measures were sensitive to frontal lobe lesions, while many of the studies using a non-frontal lobe lesioned comparison group did not find differences between groups. Thus, type of comparison group was explored as a possible moderator variable of the relationship between executive function measures and integrity of the frontal lobes. It was hypothesized that studies comparing persons with lesions in the frontal lobes to persons with lesions in posterior brain regions would yield smaller effect sizes than studies comparing persons with lesions in the frontal lobes to healthy control participants. Comparison group was coded as a categorical variable.

Age

A moderating effect of age was predicted based on several studies that found significant correlations between executive function measures and age of participants (Anderson et al., 1991; Axelrod and Henry, 1992; Berg, 1948; Bryan and Luszcz, 2000; Crockett et al., 1986; Grafman and Litvan, 1999; Heaton et al., 1993; Little and Hartley, 2000; Nagahama et al., 1997; Nelson, 1976; Pendleton et al., 1982; Rhodes, 2004; van den Broek et al., 1993; Wang, 1987; Zelazo et al., 1997). For instance, Little and Hartley (2000) found that the interference effect on the Stroop test was greater for older adults than for younger adults. Additionally, Heaton et al. (1993) found a quadratic relationship between WCST performance and age: scores improved during childhood (ages 6½–19), stabilized during adulthood (ages 20–50), and finally declined at an accelerated rate during late adulthood (ages 60–90). Moreover, Malloy and Richardson (2001) found that the frontal lobes do not fully mature until adolescence and that there is a greater loss of neurons during normal aging in the frontal lobes than in posterior regions. In a meta-analysis, Rhodes (2004) found robust age effects on the number of categories achieved and the number of perseverative errors committed; these effects were moderated by education and test version. Thus, age appears to affect executive functions at both a psychological level (i.e., executive function test performance) and a neuroanatomical level (i.e., frontal lobe development and degeneration). Mean age was coded as a continuous variable for each study in which this information was provided.

Fig. 1.

Scatterplot of relationship between unweighted effect sizes and mean age of samples in the meta-analysis.

RESULTS

Primary Analysis

Details regarding the statistical procedures employed in this meta-analysis are reported in Appendix A. The effect sizes were averaged across executive function measures to produce an unweighted grand mean effect size of large magnitude (Cohen, 1992), d = −.83, with a 95 percent confidence interval of −1.08 to −.58. The effect sizes were then weighted by their respective sample sizes, yielding a weighted grand mean effect size of moderate magnitude (Cohen, 1992), d+ = −.78, with a 95 percent confidence interval of −.88 to −.68. The fail-safe N statistic (Orwin, 1983) indicated that 78 additional studies with null results (i.e., effect sizes equal to zero) would be necessary to reduce the weighted mean effect size to a trivial level.
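The two summary quantities in this paragraph can be sketched as a sample-size-weighted mean effect size and Orwin's (1983) fail-safe N, N_fs = k(|mean d| − d_c)/d_c, where k is the number of studies and d_c is a criterion effect size. Orwin's statistic asks how many zero-effect studies would pull the mean down to d_c, not to nonsignificance. The criterion used in this meta-analysis is not stated in this section; the sketch assumes d_c = .20 (Cohen's "small" benchmark). Under that assumption, k = 27 and |d+| = .78 give N_fs ≈ 78.3, consistent with the 78 studies reported above. The function names are hypothetical.

```python
import math


def weighted_mean_d(ds, ns):
    """Sample-size-weighted mean effect size: sum(n_i * d_i) / sum(n_i)."""
    return sum(n * d for d, n in zip(ds, ns)) / sum(ns)


def orwin_failsafe_n(k, mean_d, criterion_d=0.20):
    """Orwin's (1983) fail-safe N: how many added zero-effect studies would pull
    the magnitude of the mean effect down to criterion_d (assumed .20 here)."""
    return k * (abs(mean_d) - criterion_d) / criterion_d


print(math.floor(orwin_failsafe_n(k=27, mean_d=-0.78)))  # 78
```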

A test of homogeneity using the weighted effect sizes was significant, Q(26) = 99.11, p < .0001, indicating that the effect sizes vary more than would be expected from sampling error alone and thus are likely to come from two or more populations. Thus, moderator variables may be influencing the relationship between executive function measures and frontal lobe functioning.
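The homogeneity test can be sketched with inverse-variance weights in the Hedges and Olkin style: each d receives weight w = 1/v, where v = (n1 + n2)/(n1·n2) + d²/(2(n1 + n2)) is its large-sample variance, and Q = Σ w(d − d_w)² is referred to a chi-square distribution with k − 1 degrees of freedom. Whether Appendix A uses exactly these weights is an assumption, and the function names are hypothetical. The chi-square tail probability is computed from exact closed forms for integer degrees of freedom so that the sketch needs only the standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with a positive integer df.

    Uses exact closed forms of the regularized upper incomplete gamma
    function Q(df/2, x/2) for integer and half-integer shapes.
    """
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i - 0.5) / math.gamma(i + 0.5)
                               for i in range(1, (df - 1) // 2 + 1))
    return tail


def d_variance(d, n1, n2):
    """Large-sample variance of d (Hedges and Olkin style)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))


def homogeneity_q(ds, variances):
    """Fixed-effect Q = sum w_i (d_i - d_w)^2 with w_i = 1 / v_i; returns (Q, df, p)."""
    weights = [1.0 / v for v in variances]
    d_w = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    q = sum(w * (d - d_w) ** 2 for w, d in zip(weights, ds))
    df = len(ds) - 1
    return q, df, chi2_sf(q, df)


print(chi2_sf(99.11, 26) < 1e-4)  # True
```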

Analysis of Moderator Variables

Details regarding the statistical procedures for the moderator analyses are reported in Appendix B. Due to the finding of significant heterogeneity of the effect sizes, several factors were examined as potential moderator variables.

Type of Test

Type of test was a significant moderator of the relation between executive function test performance and integrity of the frontal lobes, QB = 97.85, p < .0001. However, there was significant within-group heterogeneity at each level of the variable. Follow-up contrasts revealed significant differences for each comparison (i.e., WCST vs. verbal fluency, WCST vs. Stroop, and verbal fluency vs. Stroop). Overall, the results indicated greater sensitivity to frontal lobe damage for the WCST (d = −0.97) and phonemic verbal fluency (d = −0.80) than for the Stroop test (d = −0.30).
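A categorical moderator test partitions the total heterogeneity into a between-levels part, QB = Σ_j W_j (mean_j − grand mean)², where W_j is the summed weight at level j, and a within-levels part, QW, with QT = QB + QW. The sketch below computes both under assumed inverse-variance weights; the data structure (a dict from moderator level to (d, variance) pairs), the function names, and the numbers are hypothetical. QB would be compared with a chi-square distribution on (number of levels − 1) degrees of freedom, and a significant QW at each level means the levels themselves still contain unexplained variability.

```python
def _weighted_mean(ds, ws):
    # Inverse-variance weighted mean of a set of effect sizes.
    return sum(w * d for d, w in zip(ds, ws)) / sum(ws)


def q_between(groups):
    """Partition fixed-effect heterogeneity by a categorical moderator.

    groups maps each moderator level (e.g., "WCST") to a list of
    (d, variance) pairs. Returns (Q_B, df_B, Q_W).
    """
    all_d, all_w = [], []
    level_means, level_weights = [], []
    q_within = 0.0
    for pairs in groups.values():
        ds = [d for d, _ in pairs]
        ws = [1.0 / v for _, v in pairs]
        mean_j = _weighted_mean(ds, ws)
        q_within += sum(w * (d - mean_j) ** 2 for d, w in zip(ds, ws))
        level_means.append(mean_j)
        level_weights.append(sum(ws))
        all_d.extend(ds)
        all_w.extend(ws)
    grand = _weighted_mean(all_d, all_w)
    q_b = sum(W * (m - grand) ** 2 for m, W in zip(level_means, level_weights))
    return q_b, len(groups) - 1, q_within


# Invented effect sizes and variances for three test types.
groups = {
    "WCST": [(-1.0, 0.10), (-0.9, 0.12)],
    "Fluency": [(-0.8, 0.10), (-0.7, 0.15)],
    "Stroop": [(-0.3, 0.10), (-0.2, 0.10)],
}
print(q_between(groups))
```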

Comparison Group

Type of comparison group also was a significant moderator, QB = 109.47, p < .0001. The results supported the hypothesis that studies comparing a frontal group to a non-frontal group yield a smaller grand mean effect size than studies comparing a frontal group to a healthy control group, d = −0.57 and d = −1.05, respectively.

Age

A scatterplot of effect sizes (see Fig. 1) suggests that effect sizes are larger in magnitude (i.e., more negative) for younger and older adult groups, but smaller (i.e., closer to zero) for middle-aged adults. Regression analysis indicated a significant quadratic relationship between age and effect size, β = −4.422, t = −3.306, p = .003, and the LOWESS line fit to the scatterplot of age and effect size is arched, suggesting that effect size is curvilinearly related to age. Although somewhat counterintuitive, this pattern may arise because frontal lobe damage is more detrimental in younger and older adults, owing to ongoing developmental and degenerative processes, and thus causes greater impairment on executive function measures in these individuals (leading to the larger observed effect sizes).
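The curvilinear age effect corresponds to a least-squares fit of d = b0 + b1·age + b2·age². Because the effect sizes are coded negative, a negative b2 gives the arched curve described above: fitted values closest to zero at the vertex age, −b1/(2·b2), and more negative toward both ends of the age range. The sketch uses numpy.polyfit on raw ages and reports unstandardized coefficients, so its values are not directly comparable with the standardized β above; the synthetic data and the function name are invented for illustration.

```python
import numpy as np


def fit_quadratic(age, d):
    """Least-squares fit of d = b0 + b1*age + b2*age**2.

    Returns (b0, b1, b2, vertex_age); numpy.polyfit lists the highest power first.
    """
    b2, b1, b0 = np.polyfit(age, d, deg=2)
    vertex_age = -b1 / (2.0 * b2)  # age at which the fitted curve turns
    return b0, b1, b2, vertex_age


# Synthetic, noise-free effect sizes: closest to zero at age 45, more negative at both ends.
ages = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0])
d = -0.5 - 0.001 * (ages - 45.0) ** 2
b0, b1, b2, vertex = fit_quadratic(ages, d)
print(round(b2, 4), round(vertex, 1))  # -0.001 45.0
```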

DISCUSSION

Summary of Qualitative and Quantitative Reviews

The results of the qualitative and quantitative reviews of the WCST, phonemic verbal fluency tasks, and Stroop test suggest that these measures are sensitive (not specific) indicators of frontal lobe damage, but there are inconsistencies in the results. The WCST has the strongest and most consistent relationship to the frontal lobes, phonemic verbal fluency has the second strongest relationship, and the Stroop test has a less consistent and weaker relationship than the WCST and phonemic verbal fluency.

It may be that the three tests are tapping different underlying cognitive processes and, therefore, the construct of executive function may not be unitary. Several authors have suggested that the executive function construct is “fractionable” (Baddeley, 1996; Bryan and Luszcz, 2000; Burgess, 1997; Burgess et al., 1998; Denckla, 1994; Duke and Kaszniak, 2000; Duncan et al., 1995; Miyake et al., 2000; Owen et al., 1995; Robbins, 1998; Shallice, 1988; Shallice and Burgess, 1991a; Stuss and Levine, 2002; Stuss et al., 1995; Vandierendonck, 2000; Zelazo et al., 1997) and that “there is no frontal homunculus, no unitary executive function” (Stuss and Alexander, 2000, p. 291). That executive functioning may involve participation of diffuse areas of the brain, and that different tests of this function appear to tap varied cognitive processes, do not mean that the construct lacks unity. On the contrary, if one assumes that “executive” is distinct from “non-executive” function, the implication would be that greater overall coordination of brain activity is a necessary condition for higher-level cognitive processing. Thus, tasks designed to tap executive function naturally would be sensitive to frontal lobe damage, but not specific to focal frontal lesions, because executive functioning requires participation and coordination of activity among diffuse anatomical and functional brain areas. It simply may be that the frontal lobes participate to a greater extent than other areas of the brain in functions considered to be “executive.” Without input from other cortical and subcortical areas, however, executive functioning would be compromised. Therefore, it may be more worthwhile to conceptualize executive functions as a “macroconstruct” in which multiple executive function subprocesses work in conjunction to solve complex problems and execute complicated decisions (Zelazo et al., 1997).

Alternatively, there are several additional potential moderators that were not explored due to insufficient data, including general intelligence, exact localization of lesion, and time since injury. Differences in task administration and scoring procedures as well as poor psychometric properties also may have influenced the findings. Surprisingly, there are only a handful of studies examining the reliability and validity of executive function measures, and these studies usually find low reliability and inadequate validity (Bowden et al., 1998; Humes et al., 1997; Kafer and Hunter, 1997; Miyake et al., 2000; Schnirman et al., 1998; Stuss and Alexander, 2000; Vandierendonck, 2000). Parks et al. (1992) suggested that parallel distributed processing (PDP) models of executive function tasks may help to circumvent some of these problems with reliability and validity. It is important, however, to discuss briefly what PDP modeling is before explaining how it addresses the reliability and validity of executive function measures. The theory behind PDP modeling developed out of early psychological associationist ideas such as Hebbian learning principles (Hebb, 1949). Subsequently, investigators produced empirical physiological data that were compatible with the psychological theories (Parks et al., 1992). Following the advent of “supercomputers” that could integrate the physiological and psychological data, PDP modeling was developed. In general, PDP “refers to a complex mathematical methodology used to model neuropsychological functions and other neurobehavioral tasks” (Parks et al., 1992, p. 215). PDP methodology addresses reliability insofar as each network model is internally consistent due to neuroanatomical and biological constraints. In terms of validity, computer simulations of experimental data (e.g., WCST scores) have replicated actual neuropsychological test performance in persons with frontal lobe damage and healthy controls (Levine and Prueitt, 1989).

One might logically ask whether it is necessary to require frontal lobe involvement in order to qualify a test as a measure of executive functioning. Some circularity of reasoning emerges in the argument that the construct validity of executive function tests should be established on the basis of their sensitivity and specificity to frontal lobe damage. Rather, it should be established on their ability “to assess the theoretical concept of executive function and the group of cognitive processes it entails” (Bryan and Luszcz, 2000, p. 41). None of the tasks reviewed measures the entire executive function domain because it is not a unitary construct (e.g., Miyake et al., 2000; Vandierendonck, 2000). Moreover, executive functions depend on the integrity of other “lower-level” aspects of cognition that were not specifically assessed in the majority of these studies, including visual-spatial perception, visual and auditory attention, and short- and/or long-term memory (Phillips, 1997). In other words, people may display impairments on these tasks due to a deficit in one of the “lower-level” cognitive processes that underlie the target executive function, rather than due to frontal lobe dysfunction.

In summary, the use of executive function tests as “frontal lobe indicators” is not supported by the data reviewed (i.e., the articles failed to demonstrate the specificity of these measures to frontal lobe functioning). Discussing the validity of these tests solely in terms of the frontal lobes, however, not only confounds psychology with anatomy but it also ignores the importance of linking the neuropsychological construct of executive functions to behaviors that are both measurable and important in the real world. Despite the seemingly paradigmatic shift within psychology where the study of behavior has become the study of the brain (i.e., many psychologists now are studying which brain regions underlie certain behaviors rather than studying the behaviors themselves), it is important to ground the executive function construct in the measurement of observable behaviors that have real-world significance. There is a need for more ecologically valid executive function measures (Burgess et al., 1998; Cripe, 1996; Ready et al., 2001; Sbordone, 1996; Wilson, 1993).

Several authors have begun to examine the executive function construct in this manner and they have developed new tasks and measurement tools for this endeavor (e.g., Bechara et al., 1994; Shallice and Burgess, 1991a). Ironically, many of these procedures originated from the failure of traditional executive function measures to detect impairment in persons with frontal lobe lesions (e.g., Eslinger and Damasio, 1985; Shallice and Burgess, 1991b). Nevertheless, the development of these procedures represents a movement away from the “strict localizationist approach” of clinical neuropsychology (Duffy and Campbell, 2001, p. 113), where “psychology and anatomy are inseparable” (Tranel et al., 1994, p. 126), to a more integrative approach that incorporates the behavioral, theoretical, cognitive, and neuroanatomical approaches. This movement has resulted in the development of “alternative” executive function measures.

Alternative Executive Function Measures

Several researchers (e.g., Eslinger and Damasio, 1985; Shallice and Burgess, 1991b) have demonstrated executive function impairments in persons who performed within normal limits to exceptionally well on standard neuropsychological executive function measures (e.g., WCST) and standard IQ tests (e.g., Wechsler Adult Intelligence Scale). The executive function deficits manifested themselves only in complex “real-life” situations constructed by the examiners, such as shopping tasks. One of these tasks, the Multiple Errands Test (MET), requires participants to buy various grocery items on a shopping list with money given to them by the examiner (Burgess, 2000). They also are given a written copy of instructions asking them to find out specific information, be at a particular location at a certain time, and follow a number of rules such as “you must not enter a shop other than to buy something” (Burgess, 2000, p. 281). Another alternative executive function measure is the Cognitive Estimates Test (CET; Shallice and Evans, 1978), which requires participants to provide reasonable estimates in response to a series of ten questions to which they are unlikely to know the answer (e.g., “What is the length of an average man's spine?”). An increasingly popular alternative executive function measure is the gambling task (Bechara et al., 1994), which measures real-life decision-making skills and sensitivity to future consequences. The gambling task requires participants to repeatedly choose a card from one of four decks that differ in their monetary rewards and punishments (the schedules are unknown to the participants), and it measures the ability to learn which decks are risky and which are more profitable over time.
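Performance on the gambling task is commonly summarized as a net score, (C + D) − (A + B), where decks A and B are the high-reward, high-penalty (disadvantageous) decks and C and D the lower-reward, lower-penalty (advantageous) decks, often computed in blocks of 20 trials to show learning across the task. The deck labels and block size are assumptions drawn from common practice with Bechara et al.'s task, not details given in this section, and the function names are hypothetical; this is a scoring sketch, not the task's official procedure.

```python
from collections import Counter

# Assumed deck labels (common practice, not stated in the text above).
ADVANTAGEOUS = ("C", "D")     # smaller rewards, smaller losses: profitable over time
DISADVANTAGEOUS = ("A", "B")  # larger rewards, larger losses: costly over time


def net_score(choices):
    """(C + D) - (A + B): positive when advantageous decks are chosen more often."""
    counts = Counter(choices)
    unknown = set(counts) - set(ADVANTAGEOUS) - set(DISADVANTAGEOUS)
    if unknown:
        raise ValueError(f"unexpected deck labels: {sorted(unknown)}")
    return sum(counts[k] for k in ADVANTAGEOUS) - sum(counts[k] for k in DISADVANTAGEOUS)


def block_net_scores(choices, block_size=20):
    """Net score for consecutive blocks of trials, showing learning across the task."""
    return [net_score(choices[i:i + block_size]) for i in range(0, len(choices), block_size)]


# Hypothetical 40-trial sequence: mixed choices early, advantageous decks later.
choices = ["A"] * 10 + ["C"] * 10 + ["D"] * 20
print(block_net_scores(choices))  # [0, 20]
```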

Impaired performance on these alternative executive function measures is said to reflect a “dysexecutive syndrome” (Burgess et al., 1998) rather than a “frontal lobe syndrome” (Stuss and Benson, 1984). The change in terminology not only represents a movement away from the linkage of psychology and anatomy but also a movement towards more ecologically valid indicators of executive functions. In other words, persons with a dysexecutive syndrome have difficulties with decision-making, risk-taking, and problem-solving that are not measured adequately by the classic neuropsychological executive function measures (Damasio, 1994). These difficulties significantly impair their ability to work and/or attend school and function well interpersonally (Grafman and Litvan, 1999).

SUMMARY AND FUTURE DIRECTIONS

There has been a long-standing tradition within clinical neuropsychology to link the “highest cognitive functions” such as planning, organization, decision-making, problem-solving, and logical analysis with the largest and most enigmatic brain region, the frontal lobes (Luria, 1966; Reitan and Wolfson, 1994). Before systematic studies were carried out to illuminate the functions of the frontal lobes, “higher-level” processes were attributed to the anterior brain regions because neurological studies already had mapped the majority of “lower-level” functions onto posterior brain areas (Reitan and Wolfson, 1994).

In our review, we found inconsistent support for the historical association between executive functions and the frontal lobes. Rather, the results indicated the sensitivity, but not the specificity, of these measures to frontal lobe functioning. In other words, both frontal and non-frontal brain regions appear to be necessary for intact executive functions. One may ask why sensitivity is fairly robust and reliable among commonly used tests of executive function, yet specificity is modest at best. The answer may reside with the notion that executive function is a “macroconstruct” (Zelazo et al., 1997), that is, multiple executive function subprocesses (e.g., working memory, inhibition, and selective attention) work in conjunction to solve complex problems and execute complicated decisions. Thus, participation of the frontal lobes in virtually any “executive process” is probably a necessary, but largely insufficient, requirement.

In the past decade, there has been a growing interest in studying executive functions in both normal and disordered populations. It has been found that persons with executive function deficits are significantly impaired in their ability to work, attend school, and function well interpersonally (Damasio, 1994; Grafman and Litvan, 1999). For instance, a study by Bayless et al. (1989) found that low scores on the Tinker Toy Test (an alternative executive function measure of planning, goal formulation and execution) were strongly predictive of unemployment. In addition, scores on the Behavioral Assessment of Vocational Skills test (a newer and more ecologically valid executive function measure), as compared to more classic neuropsychological tests (e.g., Trails A & B and the Wechsler Adult Intelligence Scale-Revised), were found to be the only significant predictor of vocational performance (Butler et al., 1993). Furthermore, the Behavioral Assessment of the Dysexecutive Syndrome (a series of six “real-life” tests hypothesized to cause difficulties in persons with executive function deficits) was a better predictor of executive functions in real-world situations than the WCST (Wilson, 1993). Future research should be devoted to the development of ecologically valid executive function measures and more emphasis should be placed on the remediation of executive function deficits considering their often profound negative impact on social and occupational functioning. Investigators also should conduct additional studies examining the underlying cognitive subprocesses of the executive function construct. Finally, clinical neuropsychologists may consider abandoning the conceptualization of executive functions in terms of the frontal lobes in favor of a more integrative approach that incorporates behavioral, theoretical, cognitive, and neuroanatomical approaches.