Introduction

Semantic verbal fluency

Verbal fluency tests, with both phonemic and semantic cues, are widely used in clinical and experimental neuropsychology to assess executive functions, lexical retrieval and production [1]. Several brain regions, in particular left frontal and temporal areas, are involved in this task [2]. Accordingly, verbal fluency is impaired not only in aphasic patients but also in patients with frontal damage and no aphasic deficits [3]. Verbal fluency tasks are therefore administered to patients with dysexecutive disorders due to brain injury [4], schizophrenia [5], mild cognitive impairment [6], Alzheimer’s disease [7], fronto-temporal dementia or Parkinson’s disease [8]. However, phonemic fluency is more sensitive to left frontal damage and to psychiatric syndromes [9], whereas semantic fluency is impaired especially in patients with temporal lesions, Alzheimer’s disease, semantic dementia or Parkinson’s disease [10, 11].

The distinction between living and non-living categories seems to be the main feature along which object knowledge is organized. Different neuropsychological models have been proposed to explain category-specific deficits. The first model assumes that the living/non-living distinction is based on the different weight that visuo-perceptual and functional attributes have in identifying members of these categories [12]. A second model suggests that evolutionary pressure led to dedicated neural mechanisms for the domains of living (animals and plants) and non-living (artifacts) things [13]. A third model proposes that the different degree of interconnection between perceptual and functional features in living and non-living things may be more important than the weighting of these features [14]. Numerous authors have suggested that the living/non-living opposition is probably too coarse and have highlighted the need for finer distinctions within each domain [15, 16]. These studies described patients who, within the living domain, showed selective impairments for animals or for fruits and vegetables. Two hypotheses have been proposed to explain such category-specific deficits within the living domain. Caramazza and colleagues [14, 16] proposed the “domains of knowledge” hypothesis, which assumes that natural selection may have generated specialized and separate neural circuits for different subcategories of living things, in particular for animals and for fruits and vegetables, because these played different roles in human survival (animals are potentially dangerous, whereas fruits and vegetables are a source of food).
Gainotti [17], based on a review of the literature, suggested that only the dissociation between living and non-living things reflects a real neuroanatomical organization, whereas difficulties within the living domain (e.g., animals vs fruits) are probably the result of social roles, familiarity and gender effects. Moreover, Gainotti and colleagues [18] studied the importance of various sensorimotor modalities for different categories of living things: visual, auditory and language information seems to play a dominant role in the representation of “wild animals”, whereas taste, olfactory and visual information is very important in the representation of fruits, vegetables and flowers. These authors hypothesized that the differences within the living domain could reflect the representation of different categories in different anatomical parts of the brain.

Neuropsychological, neuroimaging and direct stimulation studies in neurosurgical patients have confirmed the existence of specific areas for different semantic categories [12, 13, 19–22]. Papagno and co-workers [22], using direct electrical stimulation (DES) in patients with low-grade and high-grade glioma (HGG) undergoing awake surgery, found that patients made errors in naming living objects during stimulation of the posterior part of the left middle temporal gyrus (BA 21) and of the left inferior frontal gyrus (BA 45), whereas stimulation of the posterior part of the supramarginal gyrus (BA 40) and of the superior temporal gyrus (BA 22) interfered with naming of non-living objects, at both cortical and subcortical levels. These sites differ from those typically observed in neuropsychological and neuroimaging studies [21–24], probably because of the cerebral reorganization that occurs with slow-growing tumors. Finally, functional neuroimaging studies have shown that naming animals elicits bilateral activation of the inferior temporal lobe and of the fusiform gyrus, whereas naming artifacts activates the posterior medial temporal gyrus, the inferior temporal gyrus bilaterally, the left medial temporal gyrus and the left premotor region [13, 19, 23].

Normative data

Normative data on semantic fluency are specific to each language and population. In almost all variants of this task, participants are asked to produce as many words as possible belonging to a given semantic category within 60 s per category. For the Italian population, normative data have been collected by Novelli et al. [24], by Spinnler and Tognoni [25], by Capitani and colleagues [26] and, more recently, by Costa and colleagues [27].

In Novelli’s version [24], the categories are animals, fruits and brands of cars, and the score corresponds to the total number of words produced. The score is affected by age and education, but not by gender. Spinnler and Tognoni [25] assessed colors, animals, fruits and cities, allowing 2 min for each category; the global score is the mean number of words produced per category. In this version, too, age and education affected performance. Capitani and colleagues [26] assessed animals, tools, fruits and vehicles and found an effect of age and education; in addition, performance on fruits and tools was significantly affected by gender, with women performing better than men on fruits and men better than women on tools. Costa and colleagues [27] assessed animals, fruits and colors, with the score corresponding to the total number of words produced in 1 min for each category. The score is affected by age, education and gender (women produced significantly more words than men).

Normative data for other languages (English, Portuguese, Spanish, Swedish and German) have also considered the effects of demographic variables such as age, education and sex [28–33], with some studies considering only the semantic category of animals [30–33]. Age always had a significant negative effect on performance [28–33]. Except in the Spanish and German studies [29, 33], education also significantly affected performance [28, 30–32], while gender was significant only for the category “professions”, with an advantage for males [33]. In the Portuguese and English studies [28, 32], the number of subcategories or “clusters” (e.g., wild animals, domestic animals, farmyard animals, fish) and the number of switches between clusters [32] were also considered. In both studies, education significantly affected cluster size and the number of switches between clusters [28, 30], whereas the effect of age on the number of switches differed between the two studies.

Before the recent work of Costa and colleagues [27], normative data for the Italian population were available only for people under 75 years of age [24]. In recent years, life expectancy has increased significantly and can now be estimated at 85 years [34]. Our aim was therefore to update the normative data for semantic fluency, also considering that the increasing use of modern mass media may have enhanced the availability of knowledge related to different semantic categories [35]. Finally, unlike Costa and colleagues [27], we collected normative data for each category separately.

Materials and methods

Participants

Two hundred and ninety healthy Italian volunteers (142 males and 148 females) took part in this study between March 2010 and March 2012. Participants’ mean age was 54.10 years (SD = 19.2, range 19–98) and mean education was 12.26 years (SD = 4.26, range 3–23). Inclusion criteria were: (1) age ≥18 years; (2) no neurological or psychiatric disease, no history of alcohol and/or drug abuse, and no other potentially relevant medical conditions; (3) right-handedness. Participants were balanced for the demographic variables that may affect performance (age, education and sex; see Table 1) and were divided into seven age groups (19–29, 30–39, 40–49, 50–59, 60–69, 70–79, ≥80 years) and five education groups (≤5, 6–8, 9–13, 14–16, ≥17 years). They were recruited from two sources: (1) relatives, friends and colleagues of the authors; (2) spouses, relatives and caregivers of in-patients and out-patients of the hospital where two of the authors (BZ and AC) worked (Fondazione IRCCS Ca’ Granda, Ospedale Maggiore Policlinico, Milan, Italy). All participants were Caucasian, lived in Italy and had been educated in Italian. Participants received no financial reimbursement or other compensation. The study was approved by the local ethics committee of the University of Milano-Bicocca.

Table 1 Distribution of the study group according to age, educational level and gender

Test and procedure

Participants were asked to produce, in 1 min per category, as many words as possible belonging to a given category. As in Novelli’s version [24], the three categories were animals, fruits and brands of cars. The examiner said: “Now you should tell me all the names that come to mind belonging to a specific category that I will indicate. For example, I could say: flowers, and you could tell me: tulips, roses, primroses, etc. I will tell you when you can stop. Let’s start with animals”. The number of correct items generated for each category was recorded, and the global score was the total number of correct words produced across the three categories. A word repeated more than once was counted only once.
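The scoring rule just described (distinct correct items per category, summed into a global score) can be sketched as follows. This is a minimal illustration, not the authors' scoring software: the sets of accepted items, the trimming and lower-casing of responses, and the function names `category_score` and `fluency_scores` are hypothetical; in practice the examiner judges whether each word is a correct member of the category.

```python
from typing import Dict, Iterable, List, Set


def category_score(responses: Iterable[str], valid_items: Set[str]) -> int:
    """Count distinct correct items produced for one category.

    A word repeated more than once is counted only once. Correctness is
    approximated by membership in a hypothetical set of accepted items
    (stored in lower case), after trimming and lower-casing each response.
    """
    correct: Set[str] = set()
    for word in responses:
        normalised = word.strip().lower()
        if normalised in valid_items:
            correct.add(normalised)
    return len(correct)


def fluency_scores(
    responses_by_category: Dict[str, List[str]],
    valid_by_category: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Score each category separately, then add the global (summed) score."""
    scores = {
        category: category_score(responses_by_category.get(category, []), valid)
        for category, valid in valid_by_category.items()
    }
    total = sum(scores.values())  # computed before the "global" key is added
    scores["global"] = total
    return scores


# Usage with made-up responses and a made-up list of accepted items:
# "Dog" repeats "dog" and "table" is an intrusion, so animals scores 2.
responses = {"animals": ["dog", "cat", "Dog", "table"], "fruits": ["apple"]}
accepted = {
    "animals": {"dog", "cat", "horse"},
    "fruits": {"apple", "pear"},
    "brands of cars": {"fiat"},
}
print(fluency_scores(responses, accepted))
# {'animals': 2, 'fruits': 1, 'brands of cars': 0, 'global': 3}
```

Keeping the per-category scores alongside the global score mirrors the choice, in this study, to provide norms for each category separately.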

Statistical analysis

Statistical analysis and scoring followed the method described by Capitani [36]. Multiple regression analyses were performed to evaluate the effect of demographic variables (age, education and sex) on performance. For the global score and for each semantic category, the first step was to identify the linear model through an analysis of covariance. For each demographic variable we tried the untransformed values as well as logarithmic, quadratic and reciprocal transformations; the untransformed values of all demographic variables (age, education and gender) proved most effective in reducing the residual variance and were therefore adopted. Raw scores were then adjusted for the variables that had a significant effect, according to the size of their estimated effects (regression coefficients).
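A minimal sketch of such a regression-based adjustment is given below, assuming the common form in which the estimated effect of each demographic variable is removed relative to the normative sample mean, so that a person of average age, education and sex keeps the raw score. The data are made up and noise-free, the coding of sex (0 = female, 1 = male) and all function names are hypothetical, and a plain least-squares fit via the normal equations stands in for whatever statistical package was actually used.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(predictors: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Least-squares fit of score = b0 + b1*x1 + ... ; returns [b0, b1, ...]."""
    rows = [[1.0, *map(float, p)] for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)


def adjust_score(raw: float, person: Sequence[float],
                 coefs: Sequence[float], sample_means: Sequence[float]) -> float:
    """adjusted = raw - sum_j b_j * (x_j - mean_j), re-centred on the sample means."""
    return raw - sum(b * (x - m) for b, x, m in zip(coefs[1:], person, sample_means))


# Made-up, noise-free sample: (age, years of education, sex coded 0/1).
people = [(20, 5, 0), (35, 8, 1), (50, 13, 0), (65, 17, 1),
          (80, 5, 1), (90, 8, 0), (25, 18, 1), (70, 13, 0)]
raw = [30 - 0.1 * a + 0.5 * e - 2 * s for a, e, s in people]

coefs = fit_ols(people, raw)
means = [sum(col) / len(people) for col in zip(*people)]
adjusted = [adjust_score(y, p, coefs, means) for y, p in zip(raw, people)]
print([round(b, 3) for b in coefs])
print([round(v, 3) for v in adjusted])
```

Because the made-up scores contain no noise, the fit recovers the generating coefficients and every adjusted score collapses to the same value; with real data, the adjusted scores retain individual variation but no longer depend on the corrected variables.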

Correction grids were derived to adjust, when necessary, the performance of each newly tested participant for the effects of age, education and gender. Adjusted scores were then used to compute tolerance limits: a participant’s score is considered normal when it lies within the highest 95 % of the population, and pathological when it falls within the lowest 5 %. Inferential cut-off scores were then derived, defining the score at or below which the probability that an individual belongs to the normal population is <0.05; scores equal to or lower than the cut-off were considered pathological. Adjusted scores were finally converted into a five-point scale of equivalent scores, from 0 to 4, following a method used for other neuropsychological tests [25]. An equivalent score of 0 corresponds to an adjusted score at or below the cut-off (5 % tolerance limit); 1, 2 and 3 are intermediate scores; and 4 corresponds to a score better than the median. Equivalent scores thus combine non-parametric tolerance limits with demographic adjustment [36].
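The tolerance-limit and equivalent-score logic can be sketched as follows. The cut-off is taken as the r-th lowest adjusted score, where r is the largest rank for which, with 95 % confidence, at most 5 % of the population lies below it (a standard non-parametric order-statistic result based on the binomial distribution). How equivalent scores 1 to 3 are delimited is a simplifying assumption in this sketch (three roughly equal rank intervals between the cut-off and the median), not necessarily the exact rule of [36]; all function names are hypothetical.

```python
from math import comb
from statistics import median
from typing import Optional, Sequence, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def tolerance_rank(n: int, proportion: float = 0.95,
                   confidence: float = 0.95) -> Optional[int]:
    """1-based rank (from the lowest score) of the lower tolerance limit.

    The r-th lowest of n observations lies at or below the population's
    (1 - proportion) quantile with probability P(Binomial(n, 1 - proportion) >= r).
    Returns the largest r for which this is >= confidence, or None if n is too small.
    """
    q = 1.0 - proportion
    best = None
    for r in range(1, n + 1):
        if 1.0 - binom_cdf(r - 1, n, q) >= confidence:
            best = r
        else:
            break  # the probability only decreases as r grows
    return best


def equivalent_score_bounds(normative: Sequence[float]) -> Tuple[float, float, float, float]:
    """Upper bounds of ES 0, 1, 2 and 3; scores above the median get ES 4.

    Assumption: ES 1-3 split the ranks between the cut-off and the median
    into three roughly equal parts.
    """
    scores = sorted(normative)
    r = tolerance_rank(len(scores))
    if r is None:
        raise ValueError("sample too small for a 95 %/95 % tolerance limit")
    lo = r - 1                    # 0-based index of the cut-off score
    hi = (len(scores) - 1) // 2   # 0-based index of the lower median
    step = (hi - lo) / 3
    return (scores[lo], scores[lo + round(step)],
            scores[lo + round(2 * step)], median(scores))


def equivalent_score(adjusted: float, bounds: Tuple[float, float, float, float]) -> int:
    """Map an adjusted score to an equivalent score from 0 to 4."""
    for es, upper in enumerate(bounds):
        if adjusted <= upper:
            return es
    return 4


# Usage with a made-up normative sample of 290 adjusted scores.
sample = [float(s) for s in range(290)]
bounds = equivalent_score_bounds(sample)
print(tolerance_rank(290), bounds, equivalent_score(5.0, bounds))
```

Working on ranks rather than on a fitted distribution is what makes the cut-off non-parametric: no normality assumption is needed for the adjusted scores.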

Results

The mean and median scores for the three categories, separately and together, and the cut-off values are given in Table 2. Multiple regression analyses with age, gender and education as independent variables were performed for the global score [F(3,286) = 84.196, p < 0.0001] and for each category separately: animals [F(3,286) = 55.139, p < 0.0001], fruits [F(3,286) = 55.139, p < 0.0001] and brands of cars [F(3,286) = 70.376, p < 0.0001]. Age and education significantly affected the global score (t = −2.867, p < 0.01 and t = 2.768, p < 0.01, respectively) and each category considered separately: animals (t = −2.395, p = 0.01 and t = 2.470, p = 0.01, respectively), fruits (t = −2.514, p = 0.01 and t = 2.096, p < 0.05, respectively) and brands of cars (t = −2.393, p = 0.01 and t = 2.417, p = 0.01, respectively). Gender did not affect performance on animals and fruits, but did affect performance on brands of cars (t = −4.929, p < 0.0001). Correction grids are reported in Tables 3 and 4.

Table 2 Means, medians, standard deviations and cut-off scores obtained by the subjects in the semantic fluency test and subtests
Table 3 Correction grid and equivalent scores for the semantic fluency test
Table 4 Correction grid and equivalent scores for the categories “animals”, “fruits” and “brands of cars” of the semantic fluency test

Discussion

We collected normative data for semantic fluency in order to update the Italian normative data of Novelli and colleagues [24], extending the age range to 98 years in view of the increase in the mean age of the population; furthermore, we analyzed each category separately in order to provide a tool for evaluating category-specific deficits in patients with brain lesions.

As in previous standardizations on the Italian population [24–27], we found a significant effect of age and education. In particular, performance decreased with age, as in most neuropsychological tests, with a few exceptions such as naming by description [24]. Education improved performance, whereas gender did not affect the global score; the one exception at the category level was brands of cars, on which men scored better than women. The cut-off did not change appreciably compared with the normative data collected by Novelli and colleagues [24]. However, the correction grids clearly differ: in the present study, age and education required larger corrections. This result suggests that changes in the world can affect the performance of the normal population, and therefore that normative data for neuropsychological tests need to be updated periodically.

To highlight the importance of neural structures located in the left temporo-parietal cortex for retrieving and naming objects belonging to different categories, living (animals and fruits) and non-living (brands of cars), we report a comparison between two small samples of patients with temporal and parietal brain tumors. We compared 30 patients: 10 with a left parietal glioma and 20 with a left temporal glioma. All patients with a parietal glioma had a normal score before surgery, and only two of them (20 %) had a pathological score in at least one category after surgery. In contrast, in the temporal group, 3 patients (15 %) had an abnormal score in at least one semantic category before surgery, and 14 patients (70 %) after surgery. In particular, patients with temporal tumors were impaired mainly on the living categories (animals and fruits: 55 % for each category), as reported in the literature [37–39]. Patients with glioma have probably undergone a reorganization of the neural structures involved in this task, depending on tumor location and growth rate [22].

To illustrate how to use the correction grids, both for the global score and for each category, we report the case of a patient (MV), a 42-year-old man with 8 years of education affected by a left temporal HGG. After surgery, the patient obtained a global score of 35: 13 for animals, 7 for fruits and 15 for brands of cars. Given his age and education (and, for brands of cars, his sex), 1.3 should be added to the raw global score. Similarly, 0.5 should be added to the raw score for “animals” and 0.4 to that for “fruits”, whereas 0.8 should be subtracted from the score for “brands of cars”. The adjusted global score is therefore 36.3, which corresponds to a normal performance with an equivalent score of 3; the adjusted score for animals (13.5) is also in the normal range, with an equivalent score of 3, as is the score for “brands of cars” (14.2), with an equivalent score of 4. In contrast, the adjusted score for fruits (7.4) corresponds to an equivalent score of 0 (abnormal score). This example shows that scoring each category separately is a considerable advantage in clinical and experimental settings, since it makes it possible to detect a single-category deficit that the global score alone would miss.
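The arithmetic of this worked example can be checked with a few lines of code. The corrections are simply those quoted in the text for MV (including the negative correction for brands of cars); nothing is recomputed from Tables 3 and 4, the dictionary keys and the function name `apply_corrections` are hypothetical, and the subsequent look-up of equivalent scores, which requires the thresholds in those tables, is not reproduced here.

```python
from typing import Dict

# Raw scores and signed corrections for patient MV as quoted in the text
# (42-year-old man, 8 years of education, left temporal HGG).
raw_scores: Dict[str, float] = {"global": 35, "animals": 13, "fruits": 7, "brands of cars": 15}
corrections: Dict[str, float] = {"global": 1.3, "animals": 0.5, "fruits": 0.4, "brands of cars": -0.8}


def apply_corrections(raw: Dict[str, float], correction: Dict[str, float]) -> Dict[str, float]:
    """Add the signed correction-grid value to each raw score, rounded to one decimal."""
    return {key: round(raw[key] + correction[key], 1) for key in raw}


# Consistency check: the global raw score equals the sum of the three categories.
assert raw_scores["global"] == sum(v for k, v in raw_scores.items() if k != "global")

adjusted = apply_corrections(raw_scores, corrections)
print(adjusted)  # {'global': 36.3, 'animals': 13.5, 'fruits': 7.4, 'brands of cars': 14.2}
```

Note that the global correction (1.3) is read from its own grid and is not the sum of the category corrections.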

Verbal fluency tests (both phonemic and semantic) are among the most widely used neuropsychological tools for assessing and monitoring dementia [35]. In particular, semantic fluency seems to be more sensitive than phonemic fluency in detecting early deficits in semantic dementia and Alzheimer’s disease [7]. The sensitivity and specificity of the semantic fluency test could therefore help in the early diagnosis of language impairment [40].