Introduction

Epilepsy is one of the most common neurological disorders. Epidemiological studies found a prevalence of 0.5 to 1 % of the European and North American population, and economic costs are high (MacDonald et al. 2000; Pachlatko 2008; Vivas et al. 2012). About 60 % of epilepsy cases are classified as focal epilepsies (Loiseau et al. 1990). In focal epilepsy, up to 60 % of patients develop drug resistance (Siegel 2004). Some forms of focal epilepsy, especially temporal lobe epilepsy (TLE), benefit from epilepsy surgery regarding seizure freedom and post-surgical quality of life (Wiebe et al. 2001).

Neuropsychology is an essential part of presurgical diagnostics in focal epilepsy. It can provide information regarding the localization and lateralization of the epileptogenic focus. Furthermore, in combination with MRI, PET and EEG results as well as the medical examination, neuropsychological results are an important part of patient counseling regarding the prognosis and evaluation of cognitive outcomes of surgical procedures. To date, no evidence-based standards have been put forward for presurgical neuropsychological assessment (Brückner et al. 2010; Brückner 2012). Test batteries have been assembled on the basis of neuropsychological experience gained from lesion studies and have been in use for many years (Jones-Gotman et al. 2010).

Verbal fluency can be impaired in temporal (TLE) and frontal lobe epilepsy (FLE). Semantic verbal fluency is seen as a part of semantic memory assessment (Gardini et al. 2013; Sheldon and Moscovitch 2012). Commonly, phonemic verbal fluency impairments are interpreted as pointing to a frontal dysfunction of the language-dominant hemisphere, and semantic fluency impairments are seen as pointing to either frontal or temporal dysfunctions of the language-dominant hemisphere (e.g., Piazzini et al. 2008; Jones-Gotman et al. 2010; Giovagnoli and Bell 2011). Troyer et al. (1998) found semantic verbal fluency relatively unimpaired in patients with frontal lobe lesions compared to patients with temporal lobe lesions. Semantic and phonemic fluency deficits in TLE compared to healthy controls have been observed in a number of primary studies (Martin et al. 1990; Arnold et al. 1996; Lehericy et al. 2000). Left-sided TLE (LTLE) seems to be especially impaired in comparison with healthy subjects (Martin et al. 1990; Arnold et al. 1996). However, some studies have shown verbal fluency deficits also in right-sided TLE (RTLE) patients in comparison to healthy controls (Martin et al. 1990; N’Kaoua et al. 2001), although theoretical assumptions and current clinical interpretations do not emphasize such impairment.

To date, most systematic reviews on verbal fluency have been performed on lesion studies with heterogeneous patient populations or imaging studies with healthy adults (e.g., Alvarez and Emory 2006). Although patients with non-frontal and right-sided lesions have shown deficits in phonemic verbal fluency, impaired verbal fluency typically is interpreted as a product of left-sided frontal lobe damage (Alvarez and Emory 2006). However, it is problematic to generalize results from other neurological patient populations to patients with focal epilepsy. A qualitative review by Risse (2006) stated that verbal fluency is sensitive to frontal lobe epilepsy when compared to performance of control subjects and that overall, greater impairment is noted in the LFLE group compared to RFLE. Sherman et al. (2011) reviewed general cognitive outcomes after epilepsy surgery and found improvement in semantic verbal fluency in LTLE after surgery in approximately 25 % of cases.

The present review is aimed at providing the first systematic presentation of research results regarding verbal fluency performance in presurgical patients with focal epilepsy, in order to assess verbal fluency performance in different groups of focal epilepsy and compare the results with current theoretical assumptions, which have an impact on clinical practice. The main goals of the review were a) to compare different lobar groups of focal epilepsy against healthy control subjects, b) to compare different lobar groups of focal epilepsy against one another, and c) to compare focal epilepsies of differing lateralization regarding semantic and/or phonemic verbal fluency performance. Furthermore, we conducted a subgroup analysis of studies including only patients with mesiotemporal pathology.

Methods

Methods and results are reported in accordance with the applicable standards (Moher et al. 2009; Stroup et al. 2000).

Eligibility Criteria

Only studies including adult presurgical (or non-surgical) patients with unilobar focal epilepsy (see results on diagnostic methods) explicitly measuring verbal fluency (semantic or phonemic, oral or written) were included. For semantic fluency all measures of semantic fluency requiring the subject to name exemplars of one distinct category as quickly as possible were included. For phonemic fluency all measures of letter fluency requiring the subject to name words beginning with a certain letter of the alphabet as quickly as possible were included. Switching measures, i.e. measures of semantic or phonemic fluency requiring the subject to switch between different semantic categories or different letters within the same task, were not included as they are less commonly used or recommended in clinical practice (e.g., Brückner 2012). In order to be included, findings had to be reported in a peer-reviewed journal. Table 1 summarizes the applied criteria.

Table 1 Inclusion criteria

A further inclusion criterion not listed in Table 1 refers to a subgroup of studies including patient groups with mesiotemporal epilepsy. In order to be included in these subgroup analyses, relevant study populations had to consist of at least 90 % of patients with mesiotemporal epilepsy (MTLE), as shown by EEG and brain imaging data.

Study Selection and Data Collection

PubMed was searched using a comprehensive search string: (seizure OR seizures OR epilepsy OR epileptic OR ictal) AND (fluency OR “word production” OR “Controlled Oral Word Association” OR “speech production”). The search included publications available via PubMed until December 31, 2011. An additional hand-search was performed in the reference lists of relevant reviews that had been identified in the database search. Studies were rated for inclusion by two independent reviewers (see Appendix for details). Data on study characteristics, participants, risk of bias, and outcomes were extracted by one reviewer and cross-checked by a second reviewer using a piloted and standardized form.

Assessment of Confounding and Risk of Bias

Possible confounding was addressed by examination of comparability of groups regarding age, intelligence (if not reported, level of education was used as proxy), duration and/or onset of epilepsy, number of prescribed antiepileptic drugs (AED) and seizure frequency (for patient subgroup comparisons) in each study. Methodological rigour was additionally assessed by examining the quality of language dominance measures, recruitment process, psychometric quality and blinding of outcome assessments, and sample size (see Appendix for details). Risk of bias assessment was performed by two independent raters. Disagreements were resolved by discussion.

Data Synthesis

We used meta-analysis to statistically summarize results of the primary studies regarding the questions of interest. For each comparison we calculated the standardized mean difference, Hedges’ g (Hedges and Olkin 1985), which expresses the size of an effect in a study in units of the variability (standard deviation) of the data in that study. Conventionally, standardized mean differences of 0.2, 0.5, and 0.8 are interpreted as small, medium, and large effects, respectively (Cohen 1988). If used for diagnostic purposes, these values correspond to approximately 15, 33, and 47 % of diseased subjects showing a worse performance than control participants, respectively. Statistical variability of the findings was examined using the I² statistic, which describes the percentage of the variability in effect estimates that is attributable to systematic differences between studies rather than chance (Higgins et al. 2003). Usually, values above 50 or 60 % are considered to indicate substantial and non-ignorable systematic heterogeneity. We performed subgroup analyses including studies in patients with solely mesiotemporal pathology, studies at low risk of bias, and studies with a control group, respectively. Risk of publication bias was assessed by visual examination of whether considerable asymmetry was present in the corresponding funnel plots (Egger et al. 1997). Further details on the data synthesis are reported in the Appendix.
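The quantities described in this paragraph can be illustrated with a minimal Python sketch. It is not the analysis code used for this review: the function names are hypothetical, the small-sample correction is the common approximation rather than the exact factor, and the heterogeneity statistic is derived from Cochran's Q with inverse-variance weights. The text does not name the index behind the 15, 33, and 47 % figures; the sketch assumes they correspond to Cohen's U1 (the non-overlapping share of two normal distributions), which reproduces those values.

```python
import math
from statistics import NormalDist


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (A minus B) with small-sample correction."""
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # approximation of the exact correction factor
    return d * correction


def i_squared(effects, variances):
    """Heterogeneity in percent, from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def cohen_u1(d):
    """Assumed index: Cohen's U1, the non-overlapping share of two normal curves."""
    p = NormalDist().cdf(abs(d) / 2)
    return (2 * p - 1) / p


# Conventional small, medium, and large effects expressed as rounded percentages.
print([round(100 * cohen_u1(d)) for d in (0.2, 0.5, 0.8)])  # [15, 33, 47]
```

With hypothetical inputs, two groups of 20 participants whose means differ by exactly one pooled standard deviation give g of about 0.98 rather than 1.0, which shows the size of the small-sample correction.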

Results

Study Selection

A search in PubMed resulted in 247 records (after duplicate elimination), and a hand search identified another 11 potentially relevant studies leading to a total of 258 potentially relevant reports. Of these, 192 were eliminated as non-relevant for a preliminary sample of 66 studies. After review of the full-texts of the latter, an additional 27 reports were eliminated, resulting in a final sample of 39 studies that met inclusion criteria (see Fig. 1).
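As a quick consistency check, the record counts in this paragraph can be reproduced with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Record counts as reported in the study selection paragraph.
database_records = 247       # PubMed, after duplicate elimination
hand_search_records = 11
excluded_at_screening = 192
excluded_at_full_text = 27

identified = database_records + hand_search_records
full_text_reviewed = identified - excluded_at_screening
included = full_text_reviewed - excluded_at_full_text

print(identified, full_text_reviewed, included)  # 258 66 39
```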

Fig. 1 Flow chart of study inclusion process

While most excluded studies fulfilled more than one exclusion criterion, the most common reason for exclusion was C1: 182 of all 258 studies were excluded because their study groups did not include a group of preoperative patients with focal epilepsy and/or no appropriate control group (see Table 1). 56 studies were excluded due to C2, i.e. no fluency measure was reported or at least not adequately presented. In 56 studies the subjects were children (C3). 14 studies were excluded because they were published in Spanish, Portuguese, Polish, Russian, Japanese or Chinese (C4). For two studies no abstract and/or full-text were available (C5). One duplicate publication was excluded.

Characteristics of Included Studies

Table 2 provides an overview of the characteristics of the included studies. Ten studies were conducted in the United States of America, seven in Germany, five in Italy, five in the United Kingdom and four in France. Two studies were conducted in Brazil, three in Australia, one in Austria, one in China and one in the Netherlands. 33 out of 39 studies applied measures of phonemic verbal fluency, 23 used measures of semantic fluency. 17 studies applied both semantic and phonemic fluency measures. Sample sizes for assessment of verbal fluency ranged from 8 to 284 patients across all groups. Approximately one third of the included studies investigated small samples (N < 15), whereas in the remaining two thirds of studies samples were medium sized or large (N ≥ 15). Distribution of studies with small samples was similar between phonemic and semantic fluency measures.

Table 2 Overview of included studies

Diagnostic procedures were relatively homogeneous: 30 out of 39 studies conducted continuous ictal and interictal scalp video-EEG monitoring as well as MRI. Some of these studies reported additional EEG with depth electrodes and additional imaging with SPECT or PET. In three studies it was unclear whether video was acquired during EEG recording (“ictal and interictal scalp EEG recordings”) in addition to MRI. In four studies, due to insufficient reporting, it remained unclear which type of EEG recording was conducted in addition to MRI. One study reported only interictal EEG recording and MRI (however, here the patients were not seen as part of routine presurgical assessment as in most other studies), and in one study the assessment procedure remained unclear. Consequently, at least 33 out of 39 studies applied continuous ictal and interictal (video-) EEG monitoring. 14 out of 39 studies reported post-surgical data to confirm presurgical focus determination.

Mean participant age was relatively homogeneous, ranging from 25.2 to 44.7 years. Regarding gender, studies were homogeneous, with no clear majority of either male or female participants. All of the 39 studies included TLE patients. Eight studies included FLE patients. 23 studies included healthy controls. Most study patients showed focal pathologies on MRI. On average, mean full scale or verbal IQ was < 100 (most mean values between 90 and 100) in the included patient groups. Mean age at onset of epilepsy varied across studies, ranging from 6.2 to 26 years. Mean duration of epilepsy ranged from 11 to 31.9 years (mostly between 15 and 25 years).

Outcome Measures

For the assessment of phonemic verbal fluency, assessment time was usually 1 min. per letter. The most commonly applied measure was the Controlled Oral Word Association Test (COWAT, Benton et al. 1994) including the letters F, A and S. The parallel forms (C, F, L or P, R, W) were rarely used. Other tests used were an Italian Oral Fluency Test (OFT, Novelli et al. 1986) with the letters P, F, L, and the Leistungsprüfsystem subtest 6 (LPS6, Horn 1983), a written test with the letters L, P, R or F, K, R. One study used a French phonemic fluency test (P, R, V) by Cardebat et al. (1990). Another study used a fluency task from the Delis-Kaplan Executive Function System (D-KEFS, Delis et al. 2011) (F, A, S) and one a fluency task from the Protocole Montréal d’Évaluation de la Communication (MEC, Joanette et al. 2004): P-words in 2 min. For the assessment of semantic verbal fluency, the most common category by far was animal naming in 1 min. (Benton et al. 1994). One study used animal naming in 2 min. from the Stichting Afasie Nederland word-fluency test (Deelman et al. 1980). The OFT used sum scores across the categories animals, fruits and car brands. One study applied the MEC semantic fluency task (items of clothing, 2 min.), another one the Supermarket Fluency Test (Troyer 2000). The remaining studies applied experimental phonemic or semantic fluency tasks.

Risk of Bias

Regarding comparability of important demographic and clinical variables (age, IQ/education, onset/duration of epilepsy), 24 studies (62 %) had sufficiently (most parameters matched) or well matched samples (all parameters matched). In the remaining 15 studies (38 %), matching was either insufficient (only some parameters matched) or non-existent. However, out of the studies including healthy control groups, approximately half did not achieve sufficient matching of IQ/education, especially between controls and patients. Gender was distributed relatively similarly across the study groups in most studies (see Table 2). Only 15 out of 39 studies reported data on seizure frequency. In merely two of these studies patient groups were not comparable on seizure frequency. Twenty studies provided some data on antiepileptic drugs (AED), although not always in a clear fashion. Number (no.) of AED per patient was most commonly reported. In one study patient groups were not comparable on no. of AED. Thirteen studies reported some data on different types of AED. Across these studies about 52 % of patients were on monotherapy, and approximately 9 % of patients were on topiramate or phenobarbital (and one patient on zonisamide). Only two studies reported data on secondary generalization of seizures. Seizure frequency and no. of AED were rated separately from other clinical variables in risk of bias assessment due to the large number of studies providing no or insufficient data.

For the assessment of language dominance, 21 studies used the Wada test or fMRI; one study reported Wada results for the majority of their TLE but not the FLE sample (Ramirez et al. 2010); 10 studies reported data from handedness inventories; and the remaining 7 studies did not report any information on language dominance. Sample size was sufficient (N ≥ 15) in two thirds of studies. A common problem was a lack of blinded assessment regarding verbal fluency measures. Most studies did not report whether the raters were blinded or not. This can be explained by the fact that verbal fluency testing in most cases took place as part of standard presurgical neuropsychological assessment. Patient recruitment was unclear in most studies as the authors did not mention whether patients were recruited in a consecutive fashion or not. A small number of studies also used patient databases for retrospective assessment. However, in a large number of studies, patients were investigated as part of routine preoperative assessment. In these cases, consecutive recruitment might be assumed. The quality of outcome measures was high in the majority of studies, i.e. commonly used measures of verbal fluency or less commonly used measures with normative data were applied. Only four of the 39 studies did not report on the fluency measures they applied or used experimental measures without normative data.

On the whole, study quality is at least in the medium range regarding variables crucial to the present review. In the summary risk of bias assessment, the sum scores for the 39 studies ranged from −7 to 10; approx. one third of studies (15) were classed as high-risk and almost two thirds (24) as low-risk studies.

Phonemic Verbal Fluency

Meta-analytic results for phonemic verbal fluency of focal epilepsy patients are summarized in Table 3.

Table 3 Summary results of the main and subgroup analyses

Patients with TLE were impaired on phonemic verbal fluency in comparison to healthy control subjects (overall g = 1.22, p < 0.001). Heterogeneity was moderate within comparisons of subgroups with healthy controls (I² < 50 %) but notable between subgroup comparisons (61 %). Only one study compared phonemic verbal fluency in frontal lobe epilepsy patients (with mixed lateralization) and healthy controls, showing a large and significant effect in favor of the control group (g = 1.54, p < 0.001).

With respect to lateralization in temporal lobe epilepsy patients, a small statistically significant effect in favor of RTLE over LTLE was observed (g = 0.35, p < 0.001). Only two studies compared phonemic verbal fluency between right and left frontal lobe epilepsy patients, showing a medium effect size in favor of RFLE over LFLE (g = 0.71, p < 0.001).

In the comparison of phonemic verbal fluency according to localization of the dysfunction, meta-analysis showed a moderate advantage of TLE over FLE patients (overall g = −0.47, p < 0.001). Heterogeneity was notable (62 %) only in the comparison between FLE and mixed TLE patients, though most comparisons included only a very limited number of studies. Across all subgroups, overall heterogeneity remained acceptable (49 %).

Semantic Verbal Fluency

Meta-analytic results for semantic verbal fluency of focal epilepsy patients are summarized in Table 3.

With regard to semantic verbal fluency, all comparisons between healthy controls and temporal lobe epilepsy patients showed large and statistically significant effects (overall g = 1.31, p < 0.001). The heterogeneity statistic indicated considerable differences only between (I² = 76 %) but not within subgroups. One study compared semantic verbal fluency between frontal lobe epilepsy patients (left-sided) and healthy controls, showing no statistically significant effect (g = 0.90, p = 0.08).

With respect to lateralization in temporal lobe epilepsy patients, semantic verbal fluency proved to be slightly but statistically significantly better in RTLE than in LTLE patients (g = 0.27, p = 0.002). In frontal lobe epilepsy patients, one study found no statistically significant difference between RFLE and LFLE (g = 0.39, p = 0.10).

In comparisons of semantic verbal fluency between frontal and temporal lobe epilepsy patients, none of the subgroup comparisons nor the overall comparison revealed a significant difference (overall g = −0.08, p = 0.55).

Subgroup Analyses in Mesiotemporal Epilepsy Patients

Meta-analytic results for verbal fluency of mesiotemporal epilepsy patients (MTLE) are displayed in Table 3.

For phonemic verbal fluency, large effects showed better performance of healthy controls than MTLE patients (overall g = 1.22, p < 0.001). With regard to lateralization, a small effect favoring RMTLE over LMTLE patients was observed (g = 0.35, p < 0.01).

On semantic verbal fluency, large effects showed better performance of HC than MTLE (overall g = 1.39, p < 0.001). RMTLE patients performed slightly better than LMTLE patients (g = 0.35, p = 0.01).

Influence of Risk of Methodological and Publication Bias

In comparisons of studies at low and high risk of bias, meaningful moderation was revealed only in RTLE vs. LTLE contrasts. Regarding semantic verbal fluency, superiority of RTLE patients was substantially smaller (but still statistically significant at p = 0.002) in studies at low risk of bias than in studies at high risk of bias (g = 0.23 and 1.24, respectively). Regarding phonemic verbal fluency, a similar but less prominent trend was identified (g = 0.31 in studies at low risk of bias and 0.50 in studies at high risk of bias). A sufficient number of studies were present for contrasting the results of studies with and without healthy control groups in the RTLE vs. LTLE comparisons. Although superiority of RTLE patients over LTLE patients seemed somewhat more pronounced in studies with healthy control groups, this difference was very small and did not reach statistical significance.

Visual investigation of funnel plots revealed that in comparisons of RTLE vs. LTLE the distribution of imprecise (small) studies was somewhat unbalanced with missing studies that would support better semantic and phonemic verbal fluency in LTLE. Although it is unlikely that the summary effects would be nullified if such studies were found and included, the true advantage of RTLE over LTLE may be somewhat smaller than reported here. For the other investigated comparisons, funnel plots were largely balanced.

Discussion

This is the first systematic review focusing on verbal fluency deficits in presurgical patients with focal epilepsy. Its results may be of particular interest at this time, as recommendations and assessment standards are currently being discussed and published (Brückner 2012).

While it is true that the demands on executive and semantic retrieval functions vary between semantic verbal fluency, which obviously places higher demands on the semantic retrieval system, and phonemic verbal fluency, which is more dependent on executive functioning, both tasks rely on both components. Therefore, some impairment on semantic verbal fluency can also be observed in patients with frontal-executive dysfunction and, conversely, some phonemic verbal fluency impairment in patients with disturbances in semantic networks (Laisney et al. 2009). Hence, some extent of phonemic verbal fluency impairment in TLE can be explained by the semantic task components. Therefore, a pattern of results showing more marked impairment of FLE patients on phonemic and of TLE patients on semantic verbal fluency could be expected. The results of the review confirm this partly. Patients with TLE are impaired on phonemic verbal fluency compared to healthy control subjects, and as expected FLE does seem to be associated with even worse performance on phonemic verbal fluency tasks than TLE. Impaired phonemic verbal fluency is viewed as a typical indicator of frontal lobe dysfunction, especially in patients with left-frontal lesions (e.g., Baldo et al. 2001). Accordingly, (L-) FLE patients should show the largest impairment of all focal epilepsy groups. As expected, a superiority of RFLE over LFLE can be observed, albeit in a sample of only two studies. Unfortunately, the study sample for comparisons between FLE and healthy controls is too small to draw any reliable conclusions (one study).

The results for semantic verbal fluency mirror those for phonemic verbal fluency. As expected, the effect sizes showing impairment of the TLE groups are somewhat larger. However, no significant effects between FLE and TLE, HC and FLE, or RFLE and LFLE could be observed. The impairment of FLE patients on phonemic verbal fluency seems to be more marked than on semantic fluency. However, numbers of included studies are small in some of these comparisons. Consequently, more studies on semantic as well as phonemic fluency in FLE patients are needed. As TLE patients are impaired on both tasks in comparison with healthy controls, localization of dysfunction to the frontal lobes should not automatically be assumed when a patient shows phonemic verbal fluency deficits and even less so for semantic verbal fluency deficits. A spread of the epileptic activity from temporal to frontal areas and reduced functional connectivity between the temporal and frontal lobes may be responsible for frontal-executive deficits in TLE patients (Haneef et al. 2012).

Regarding lateralization of verbal fluency deficits in TLE, an interesting pattern of results emerged. On phonemic as well as semantic verbal fluency, both LTLE and RTLE patients show marked deficits relative to the control group. On a direct comparison RTLE patients show only slightly better performance than LTLE patients. The overall impairment as well as the relatively stronger impairment in LTLE were expected. However, taking the fact into consideration that language functions should be primarily impaired in LTLE under the assumption that most included patients would show typical cerebral language dominance, the impairment in RTLE patients is more pronounced than expected. Semantic deficits have been associated with left temporal lobe dysfunction in previous research (e.g., Sheldon and Moscovitch 2012). As part of semantic memory retrieval semantic fluency has been linked to an extended predominantly left-lateralized network including temporal and frontal areas (Verma and Howard 2012). The relatively strong impairment of the RTLE group has important implications: Verbal fluency deficits are usually viewed as a sign of left-hemispheric dysfunctions, but RTLE patients’ impairment as a group - especially in comparison with LTLE patients - demonstrates that semantic fluency testing as part of the language assessment provides less lateralizing information than might be assumed based on theoretical assumptions regarding the semantic network (e.g., Binder et al. 2009, see also Hermann et al. 2001).

There are a number of possible explanations for the verbal fluency deficits seen in RTLE. Antiepileptic drugs (AED) can reduce cognitive speed (Ortinski and Meador 2004), which in turn could influence fluency performance. Inspection of AED data in included studies did not show a selective bias for RTLE patients (see also: possible sources of bias). Assuming similar AED effects for RTLE and LTLE, the fact that effect sizes in the comparisons between these groups in both fluency tasks were relatively small cannot be explained by AED effects. Consequently, regarding the comparisons between RTLE and HC, it is unlikely that the effects can solely be explained by the influence of AED, as this would imply that the remaining genuine verbal fluency impairment of patients with LTLE was very small indeed. This does not seem to be the case as studies in drug-naïve epilepsy patients show marked deficits in verbal tasks (Baker et al. 2011; Aikia et al. 2001). Instead, it is possible that the spread of seizure activity from right to left temporal areas as well as interictal discharges could cause language impairment to some extent (Badawy et al. 2012). Also, language task improvements that have been observed after temporal lobectomy might be due to the reduction of ictal and interictal epileptic activity (Hermann and Wyler 1988). Furthermore, the right temporal lobe possibly bears some language-related functions. For instance, the right hemisphere seems to be involved in processing semantic information in language comprehension (Yang 2014). Other studies have shown bitemporal metabolic activation during verbal fluency tasks, suggesting that not only left-lateralized areas, but possibly an interhemispheric network might be necessary for normal fluency performance (e.g., Parks et al. 1988). Finally, some research results suggest a higher incidence of depression in epilepsy patients after right temporal lobectomy (Quigg et al. 2003). As depression is associated with reduced semantic and phonemic verbal fluency performance (Henry and Crawford 2005), it could be speculated that in RTLE depressiogenic mechanisms might be at work even before surgery, causing fluency deficits as a side effect. For instance, Doucet et al. (2013) concluded that RTLE has a more maladaptive impact on amygdala-based emotion processing compared to LTLE. The authors speculate that amygdala-related functional connectivity differences might reflect emotional perturbations at a subclinical threshold or at a level inaccessible to introspection.

The results of the subgroup analyses in patients with mesiotemporal epilepsy (MTLE) are similar to the findings on phonemic as well as semantic verbal fluency in the whole study sample. Interestingly, in relation to healthy controls, patients with MTLE seem to be just as impaired as an unselected group of TLE patients. However, there was a considerable overlap between the entire study sample and the MTLE subgroups, since many of the included studies mainly or solely investigated patients with MTLE. Research has shown fluency deficits in MTLE or involvement of the hippocampus in verbal fluency tests (e.g., Gleissner and Elger 2001; Sheldon and Moscovitch 2012). Sheldon and Moscovitch argue that semantic verbal fluency for categories with episodic content (such as “people you work with”) activates the hippocampus more than fluency for categories with less episodic content (such as “famous people” or “animals”). However, they found small hippocampal activations also in the “non-episodic” categories. Furthermore, it could be argued that animal fluency (which was analyzed in conjunction with 14 other “non-episodic” categories in their study) does actually bear a relatively high episodic content as it might activate memories of pets owned, zoo visits etc. This could partly explain why MTLE patients showed a marked semantic verbal fluency deficit in the present review. The results show a small difference in effect sizes between impairment on semantic and phonemic verbal fluency tasks in MTLE patients compared to HC, with semantic verbal fluency being slightly more impaired. However, MTLE patients are also markedly impaired on phonemic verbal fluency, which seems to suggest that propagation of epileptic activity from mesiotemporal to lateral temporal regions plays an important role in their verbal fluency performance (see also Bonelli et al. 2011).

All in all, verbal fluency deficits can be an indicator of frontal and/or temporal lobe dysfunction. The results point to a higher suitability of phonemic verbal fluency tasks for objectifying frontal lobe dysfunction in comparison to semantic fluency. However, more studies including frontal lobe groups are needed. Attempts at clear localization and lateralization are hindered by the fact that most patient groups show some verbal fluency impairment, with larger-than-expected deficits in the RTLE patients. On the other hand, identification of verbal fluency impairment can be a valuable resource when viewed in the context of an entire neuropsychological test profile. Of course, the interpretation of individual neuropsychological test results in epilepsy is influenced by the fact that cognitive deficits reflect not only the zone of functional impairment but also clinical features of epilepsy such as seizure frequency and medication.

With regard to clinical heterogeneity, studies were rather similar considering the included patients’ demographic variables with the exception of duration/onset of epilepsy which showed somewhat higher variation than the other parameters. The majority of studies used samples from standard presurgical assessment, which explains the relative similarity of patient populations and diagnostic methods. Thus, clinical heterogeneity poses no major threat to the validity of the meta-analytical results.

Statistical heterogeneity did not pose a problem in most of the meta-analytical comparisons. It was mainly a concern in some of the comparisons with “mixed” FLE and TLE groups. Here, the numbers of included studies were small and apparent clinical differences between study groups may be too large to produce homogeneous results.
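To illustrate how statistical heterogeneity is commonly quantified in a meta-analytical comparison, the following minimal sketch computes Cochran's Q and the I² statistic from study-level effect sizes. This section does not state which heterogeneity statistic the review used, so the choice of Q and I², the function name `heterogeneity`, and all input values are assumptions made for illustration only.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2 in percent) for study effect sizes and their variances.

    Assumes inverse-variance (fixed-effect) weights; hypothetical helper,
    not the procedure reported in the review.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variation beyond what chance alone would produce
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical standardized mean differences and variances from three studies
q, i2 = heterogeneity([0.45, 0.60, 0.52], [0.04, 0.05, 0.03])
print(f"Q = {q:.2f}, I2 = {i2:.1f} %")
```

With only a few studies in a comparison, as in the “mixed” FLE and TLE groups, both statistics are estimated imprecisely, which is one reason such comparisons are hard to interpret.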

Possible Sources of Bias

Some noise in the results might be expected taking into consideration that different epilepsy care institutions apply different evaluation techniques and differ in expertise. However, regarding diagnostic standards, the primary studies are relatively homogeneous: at least 33 (up to 37) out of 39 studies applied continuous ictal and interictal scalp EEG monitoring as well as MRI plus other imaging techniques. This should allow for a high-quality assessment of the epileptogenic focus. Hence, bias due to incorrect diagnoses should be limited. Furthermore, there is no reason to assume that this would affect patient groups in different ways. However, localization can only be definitely inferred post-surgically by achieving seizure freedom. This issue could be resolved in a future systematic review only including studies with sufficient reporting of post-surgical data.

Regarding the matching of crucial parameters across all study groups (IQ/education, age, duration, onset of epilepsy), around two thirds of the studies were sufficiently matched. The criterion applied (mean differences ≤ ½ pooled SD) was rather strict, leading to a conservative estimate. Therefore, apart from the IQ/education matching concerning comparisons with healthy controls in some studies (see below), matching of crucial variables may be considered sufficient across the study pool, lowering the risk of bias.
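The matching criterion can be made concrete with a short sketch. It checks whether the absolute difference between two group means is at most half of the pooled standard deviation. The pooled SD formula used here is the standard one for two independent groups, which is an assumption, since the section does not spell out the formula; the function names and all numbers are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups (assumed formula)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def is_matched(mean1, sd1, n1, mean2, sd2, n2, threshold=0.5):
    """True if |mean1 - mean2| is at most threshold * pooled SD."""
    return abs(mean1 - mean2) <= threshold * pooled_sd(sd1, n1, sd2, n2)


# Hypothetical mean ages (years) of two groups of 25 patients, SD 10 in both:
# the pooled SD is 10, so the groups count as matched up to a difference of 5.
print(is_matched(34.0, 10.0, 25, 38.0, 10.0, 25))  # difference of 4 years
print(is_matched(34.0, 10.0, 25, 41.0, 10.0, 25))  # difference of 7 years
```

Under these assumed values, the first pair meets the criterion and the second does not.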

Verbal IQ or educational achievement are related to language functioning and hence, also to verbal fluency performance. Thus, a bias in the results caused by slightly higher mean IQ scores for healthy subjects compared to epilepsy patients cannot be ruled out, since approximately half of the 23 studies which included healthy controls did not report sufficiently on IQ or education (N = 4) or were not well-matched regarding these parameters (N = 8). However, even if perfect matching with regard to IQ/education between healthy controls and patient groups in all of the included studies might reduce effect sizes slightly, it is unlikely to influence their statistical significance.

Due to insufficient reporting in the primary studies, it was not possible to systematically assess the effect of secondary generalization of seizures (see results). Data on seizure frequency and antiepileptic drugs (AED) were missing or insufficiently reported in a large number of studies: seizure frequency was reported in less than half of the included studies, and data on AED in about half. Imbalance between groups regarding seizure frequency or AED occurred in only a minority of these studies. The studies not reporting any data on seizure frequency or AED were conservatively assigned a negative score on these variables in the risk of bias assessment. Furthermore, patient groups are unlikely to be affected differentially by these potential sources of bias. Regarding comparisons with healthy controls, risk of bias cannot be ruled out completely. However, this has been addressed in the risk-of-bias analyses (see below). Furthermore, in the included studies only a minority of patients received drugs that are clearly associated with attention and/or language deficits such as topiramate, zonisamide or phenobarbital (e.g., Lee et al. 2003; Ortinski and Meador 2004). With older AED (e.g., carbamazepine, phenytoin) only medium-sized effects on cognition have been observed (e.g., Vermeulen and Aldenkamp 1995), whereas newer AED (e.g., levetiracetam, lamotrigine) are generally associated with fewer or no cognitive side effects (Ortinski and Meador 2004). Therefore, cognitive side effects of AED cannot solely explain the large effects observed in the present review.

Twenty-one studies (54 %) used Wada or fMRI to determine cerebral language dominance. Eleven (28 %) reported handedness data or included only right-handed subjects. Seven (18 %) did not make any mention of language dominance or handedness. Three of these seven studies only included “mixed TLE” groups and were therefore not included in lateralization comparisons, leaving only four studies in which this might have served as a source of bias regarding lateralization results (Field et al. 2000; Giovagnoli and Avanzini 2000; Helmstädter and Elger 2000; McDonald et al. 2008). Furthermore, in healthy subjects right-handedness has been reported to be associated with left-hemispheric language dominance in 93-96 % of cases of moderate to strong right-handers (Knecht et al. 2000). In epilepsy patients left-hemispheric language dominance has been reported in 74 % (moderate right-handers) to 91 % of cases (strong right-handers) (Isaacs et al. 2006). In a sample of 174 epilepsy patients the overall incidence of non-left hemisphere language dominance was 24 % (Isaacs et al. 2006). All in all, for our whole study sample, an incidence of left-hemispheric dominance of roughly 90 % can be estimated, i.e., the risk of bias in analyses concerning lateralization of the epileptogenic focus through lack of language dominance reporting in included studies is rather low.

As most studies do not report on blinded assessment, one must assume that it did not take place. This could pose a problem as most participants are likely to have been assessed as part of standard interdisciplinary presurgical diagnostic procedures. Neuropsychologists performing the cognitive assessment are not always blind to EEG results or patient history at the time of assessment. On the other hand, verbal fluency assessment is relatively straightforward: for written fluency tests (e.g., LPS6), objectivity is high as the patients write the words down themselves. Oral fluency tests can either be recorded and the number of words written down later, or the examiner writes down the words as the patient utters them. Objectivity is relatively high in this case as well, as there is generally not much doubt as to whether a specimen belongs to a certain category (e.g., animals) or whether a word beginning with a certain letter complies with the test rules (e.g., in COWA FAS). Hence, as opposed to tests that leave the examiner greater freedom of interpretation (e.g., assessment of spontaneous speech), verbal fluency is less sensitive to examiner-based sources of bias.

Considering the limited statistical heterogeneity, the effect of test variability on the present results with regard to varying task duration, semantic category, letter combination or mode of presentation (oral vs. written), if present at all, is expected to be negligible. However, a systematic investigation of potential effects of test variability in focal epilepsy might be an interesting objective in further research.

Most studies made no mention of how patients were recruited. At least for studies including patients who were assessed for surgery, it can be assumed that recruitment was consecutive as part of standard preoperative assessment. In the absence of higher standards of reporting in the included publications, it is, however, not possible to rule out recruitment-based bias entirely.

In analyses on the possible effects of methodological and publication bias, most results were found to be robust. However, one cannot rule out that pooled estimates are somewhat biased in the comparison between RTLE and LTLE patients, where small and imprecise studies at high risk of bias reported exaggerated effects favoring RTLE. Even if the statistical significance is very unlikely to be affected by these limited bias sources, the true effect sizes in favor of RTLE might be slightly overestimated here. This finding yet again demonstrates the importance of questioning the concept of relatively preserved verbal fluency skills in patients with RTLE.

External Validity Issues

Only presurgical data were evaluated here. Thus, no conclusions should be drawn from the evidence regarding post-surgical deficits in verbal fluency. A further systematic review will be needed in order to clarify that issue. More than one third of the included studies investigated mainly or solely patients with mesiotemporal epilepsy. Therefore, the results primarily refer to mixed samples as frequently seen in clinical practice and should be generalized to groups of patients with exclusively lateral temporal lobe epilepsy only with caution.

Summary and Outlook

All in all, quality of the included studies is in the medium range. In this area of research, one has to rely on observational (non-randomized) evidence. Therefore, quality standards such as matching of study groups are especially important. Even though in the present study pool certain sources of bias cannot be ruled out completely (e.g., IQ, blinded assessment, recruitment), the risk of bias to the main results seems comparatively low, and sensitivity analyses show that the findings are largely robust. The fact that effects for TLE and FLE are in line with theoretical assumptions lends further credibility to the results. Heterogeneity and small numbers of included studies prevent meaningful interpretation of some of the results, particularly concerning studies with “mixed” TLE samples and some of the analyses with FLE groups. Additional studies are needed. In general, higher quality of publications is desirable for future research, especially reporting on potentially confounding variables such as AED, seizure frequency and seizure type. Regarding the findings on mesiotemporal epilepsy, studies comparing phonemic and semantic fluency performance between patients with mesiotemporal and lateral temporal lobe epilepsy are needed.

The main findings regarding presurgical or non-surgical patients with focal epilepsy are: TLE (and MTLE) patients are impaired on semantic as well as phonemic verbal fluency regardless of lateralization of focus, with only slightly better performance for RTLE patients. FLE patients in comparison with TLE patients seem to be more impaired on phonemic but not semantic verbal fluency, with insufficient data for comparisons with healthy controls. The implications for neuropsychological assessment are that verbal fluency impairment in general is sensitive but not specific to LTLE and possibly FLE. The significant verbal fluency impairment seen in patients with RTLE can most probably be explained by a combination of factors including, among others, the bihemispheric nature of language networks and possibly impaired emotional processing in RTLE. Hence, language impairment in patients with RTLE as seen in the current review could prove another interesting avenue for future research.