Introduction

The Stroop color and word test (SCWT) [1] is widely used to evaluate selective attention, inhibition, and sustained attention.

The classical version of the SCWT is composed of three tables showing, respectively, color words, colored squares or circles, and color words printed in incongruent ink (e.g., the word "red" printed in blue ink). The Stroop effect consists of a delayed response when the color of the ink has to be named while ignoring the meaning of the printed word.

The SCWT evaluates the reaction times to non-ambiguous stimuli (reading words in black ink or naming colors in painted forms) and to ambiguous stimuli (color words printed in incongruent ink), thus assessing the inhibition mechanisms that are crucial to executive functions [2, 3].

Traditionally, the SCWT has been used to evaluate frontal function, although neuroimaging studies show that a distributed network is activated during the Stroop effect. Indeed, fMRI studies demonstrate the activation not only of the dorsolateral prefrontal cortex (DLPFC) and anterior cingulate cortex (ACC), but also of the posterior parietal cortex (PPC) during the Stroop effect [4–6], even if the role of each structure is a subject of debate [4, 7].

Performance on the SCWT is affected by many pathological conditions. The test is frequently part of neuropsychological batteries for the evaluation of cognitive functions in Alzheimer’s disease (AD) [8], fronto-temporal lobar degeneration (FTLD) [9, 10], dementia with Lewy bodies (DLB) [10], vascular cognitive impairment [11–13], depression [14], schizophrenia [15], and anorexia [16]. Reaction times increase physiologically with aging, and some activation patterns are slightly different between elderly and young people [17, 18].

In this context, updated norms in the Italian language are lacking for the complete form of the SCWT (which uses five colors and 100 items per table, whereas short versions use fewer colors and fewer items). Indeed, the main reference for the SCWT was published in 1998 and analyzed a sample with only 15 subjects between 70 and 80 years and no subjects older than 80 [19]. A more recent study that included subjects in the ninth decade [20] used a short version of the SCWT (30 items and three colors) and evaluated only one index, while another study, aimed at the assessment of multiple sclerosis [21], did not assess older age groups and, again, considered only one index.

Updating normative values is crucial in the evaluation of cognitive functions, which are affected by demographic and cultural transformation. Changes in living conditions, such as increased life expectancy (http://www.worldlifeexpectancy.com/country-health-profile/italy, http://www.istat.it/it/archivio/99464) and higher education levels, modify the cultural background of the general population.

The aim of our study was to produce updated normative values for the SCWT in the Italian language, drawn from a sample balanced for age, gender, and education level.

Materials and methods

Subjects

At first, we planned to enroll 32 subjects for each decade between 20 and 90 years, for a total of 224 subjects. This sample size was established in order to provide reliable correction values, by applying power analysis for multiple regression [22] using the pwr package in R [23], with the following parameters: probability level (α): 0.05; desired statistical power (1 − β): 0.80; effect size (Cohen’s f²): 0.05; number of linear predictors: 3. These 32 subjects would have included 16 subjects per gender, evenly divided among four education levels (primary school, middle school, high school, and university degree). During the recruitment period it became progressively clear that subjects with a primary-school level of education could be found only among the elderly, because current Italian legislation requires a minimum of 8 years of education starting from age 6 (law 1859/62). Thus, we decided to omit the enrollment of the 24 subjects aged 20–49 with primary school education. The new target population consisted of 200 subjects to be enrolled within 18 months.
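The arithmetic of the recruitment plan can be checked with a short sketch. The decade boundaries below (in particular a last class spanning 80–90) and the cell labels are assumptions made for illustration; only the counts of 32 per decade, 16 per gender, and four education levels come from the text.

```python
from itertools import product

# Assumed decade boundaries; the text only states "each decade between 20 and 90".
DECADES = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 90)]
GENDERS = ["female", "male"]
EDUCATION = ["primary", "middle", "high", "university"]
PER_CELL = 32 // (len(GENDERS) * len(EDUCATION))  # 4 subjects per cell


def planned_cells(drop_young_primary):
    """Return {(low, high, gender, education): planned subjects} for the design."""
    cells = {}
    for (low, high), gender, education in product(DECADES, GENDERS, EDUCATION):
        # Revised plan: no primary-school subjects in the 20-49 age classes.
        if drop_young_primary and education == "primary" and high <= 49:
            continue
        cells[(low, high, gender, education)] = PER_CELL
    return cells


original_total = sum(planned_cells(False).values())
revised_total = sum(planned_cells(True).values())
print(original_total, revised_total, original_total - revised_total)
```

Under these assumptions the two totals reproduce the planned 224 subjects and the revised target of 200, with 24 subjects removed.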

With the exception of people under 50 years old with the lowest education level, a group that has almost disappeared from the Italian population, our study was planned to analyze a sample evenly distributed with respect to age, education, and gender. This choice was aimed at estimating the expected values of the Stroop test indexes as a function of significant predictors rather than at being representative of the whole population. Subjects were healthy volunteers screened by means of a general medical history and a clinical and neurological examination. Cognitive status was assessed by means of a clinical interview and the MMSE. Depression was rated by means of the Montgomery–Åsberg depression rating scale (MADRS) [24].

The inclusion criteria were the following: (i) age 20–90; (ii) education from primary school to university degree; (iii) signed informed consent; (iv) MMSE >27 (>25 for age >65); (v) a MADRS score <10 [25].

The exclusion criteria were (i) diabetes mellitus, either treated or not; (ii) severe arterial hypertension not properly controlled by drug therapy (diastolic blood pressure >109 mmHg); (iii) history of a cerebrovascular accident; (iv) history of transient global amnesia in the last 3 years; (v) history of brain injury with a loss of consciousness longer than 30 min; (vi) history of brain injury with a loss of consciousness of at least 10 min in the last 6 months; (vii) central nervous system diseases (Parkinson’s disease, epilepsy, migraine, etc.); (viii) major psychiatric disorders (psychosis, major depression, etc.); (ix) chronic use of benzodiazepines, neuroleptics, and other sedative drugs (stable low doses of benzodiazepines, SSRIs, or other hypnotic drugs were allowed); (x) evidence of severe systemic pathology not properly controlled including, but not limited to, renal failure (creatinine level >2 mg/dL), liver failure (transaminase levels >3 × ULN), untreated thyroid disease, anemia (hemoglobin levels <10 g/dL), and cancer in the last 5 years; (xi) chronic or occasional use of illicit psychotropic substances in the last month; (xii) habitual consumption of >750 cc of wine per day or an equivalent daily alcohol intake; (xiii) illiteracy or less than 2 years of education; (xiv) history of polychemotherapy or radiotherapy; (xv) developmental disorders of infancy or adolescence; (xvi) sensory deficit (hypoacusia, visual deficit).
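The numeric part of the inclusion criteria (age, MMSE, and MADRS thresholds) lends itself to a small screening check. The following is a minimal sketch; the function name and argument layout are hypothetical, and the clinical exclusion criteria, which require medical judgment, are deliberately left out.

```python
def meets_numeric_inclusion(age, mmse, madrs):
    """Check the numeric inclusion thresholds only (hypothetical helper).

    age   : years, must lie in 20-90
    mmse  : must be >27, or >25 for subjects older than 65
    madrs : must be <10
    """
    if not 20 <= age <= 90:
        return False
    mmse_floor = 25 if age > 65 else 27  # the score must be strictly above this
    return mmse > mmse_floor and madrs < 10


# Illustrative calls with made-up values.
print(meets_numeric_inclusion(age=45, mmse=28, madrs=3))
print(meets_numeric_inclusion(age=70, mmse=26, madrs=3))
print(meets_numeric_inclusion(age=45, mmse=27, madrs=3))
```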

At the end of the recruitment period, 192 subjects fulfilled these inclusion and exclusion criteria, thus almost reaching the intended number, and were enrolled. The distribution of subjects by age, gender, and education is reported in Table 1.

Table 1 Number of subjects enrolled arranged by age and education

Overall average values of the 192 subjects were as follows: age 57.3 ± 19.6 years (range 20–90), education 11.5 ± 4.3 years (range 5–19), MMSE score 29.5 ± 0.8 (range 27–30), MADRS score 2.5 ± 2.5 (range 0–9).

The Stroop color and word test

The present version of the SCWT consists of three tables of one hundred stimuli each. The three tables, each arranged in 10 rows and 10 columns, are composed of: (1) color words printed in black ink, (2) colored squares, (3) color words printed in an incongruent ink. Each table measures 420 mm (width) × 480 mm (height), the font used for the first and third tables is Arial 24, and the squares in the second (color) table measure 20 mm × 20 mm. The colors used in this version are blue, green, red, brown, and purple, printed by the Organizzazioni Speciali (Florence, Italy) [26].

The following parameters were obtained and further considered: the correct answers achieved in the first 30 s for each table, generating three scores, namely word items (WI), color items (CI), and color word items (CWI); and the total time needed for reading each table, generating three more scores, labeled word time (WT), color time (CT), and color word time (CWT).
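As an illustration of how the two kinds of index relate to a single administration, the sketch below derives an item score and a time score from a list of timed responses. The recording format (elapsed seconds plus a correct/incorrect flag per item) and the inclusive 30 s boundary are assumptions; the text does not describe how responses were logged.

```python
def score_table(responses, window_s=30.0):
    """Return (items, total_time) for one Stroop table.

    responses : list of (elapsed_seconds, is_correct) tuples, one per stimulus,
                in reading order (hypothetical recording format).
    items     : correct answers given within the first `window_s` seconds.
    total_time: elapsed time at the last response, i.e. time to finish the table.
    """
    items = sum(1 for elapsed, correct in responses if correct and elapsed <= window_s)
    total_time = max(elapsed for elapsed, _ in responses)
    return items, total_time


# Made-up example: 100 stimuli read at a steady 0.5 s each, one error on item 10.
demo = [(0.5 * i, i != 10) for i in range(1, 101)]
word_items, word_time = score_table(demo)
print(word_items, word_time)
```

Applied to each of the three tables in turn, the same function would yield WI/WT, CI/CT, and CWI/CWT.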

Table 2 Comparison between regression models as fitted with or without transformation of predictors

The study protocol was approved by the local ethics committee. According to the recommendations of the Helsinki Declaration of 1975, as revised in 2008, all subjects were informed about the objectives and methods of the research, and they agreed to take part in the study. The study was explained to all participants both orally and by written instructions.

Statistics

The preliminary analysis evaluated the distribution of raw scores for each of the six indexes (WT, WI, CT, CI, CWT, and CWI). Power analysis had been computed in advance to fix a sample size that would provide reliable correction values; because the enrolled sample was smaller than planned, it was re-computed for the actual sample size. Subsequently, multiple regression analysis was performed for each index, using the demographic variables (age, education, and gender) as independent variables.

The rationale of this procedure was based on the methodology of equivalent scores originally described by Spinnler and Tognoni [27] and then applied to several normative studies [30], including those on the Stroop test in the Italian language [19–21]. Following this approach, we applied multiple regression analysis to study the effect of age, education, and gender on the Stroop indexes. Moreover, in order to take non-linear effects into account, we applied the same data transformations suggested by Spinnler and Tognoni [27], namely the square root of education and the logarithmic transformation of age [ln(100 − age)]. We then evaluated four regression models in which age and education were both included as raw values, both as transformed values, or one as raw and the other as transformed values, alternately. Among these four models, we chose the one with the best R² value. Since the models are non-nested, we applied the Akaike information criterion (AIC) [28] to compare the adequacy of the different models [29]. For each model the AIC weight is reported and can be directly interpreted as the likelihood, or relative probability, of being the best model. The ratios between AIC weights were also computed to compare the adequacy of pairs of models.
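The predictor transformations and the comparison by AIC weights can be sketched as follows. The weight formula is the standard Akaike-weight definition; the AIC values in the example are invented placeholders, not results from this study, and the function names are hypothetical.

```python
import math


def transform_age(age):
    """Logarithmic transformation of age, ln(100 - age), for ages below 100."""
    return math.log(100 - age)


def transform_education(years):
    """Square-root transformation of years of education."""
    return math.sqrt(years)


def akaike_weights(aic_values):
    """Akaike weights: exp(-delta_i / 2) normalised over the candidate models."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]


# Placeholder AIC values for the four candidate models (NOT the study's results):
# raw/raw, raw age + sqrt(education), ln(100 - age) + raw education, both transformed.
candidate_aic = [1500.0, 1498.0, 1492.0, 1490.0]
weights = akaike_weights(candidate_aic)
best_index = weights.index(max(weights))
# Ratio between two weights: how many times more likely one model is than the other.
ratio_best_vs_third = weights[3] / weights[2]
print([round(w, 3) for w in weights], best_index, round(ratio_best_vs_third, 2))
```

Because the weights sum to one, each can be read as the relative probability that the corresponding model is the best of the set considered.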

In the following step, the equations to adjust scores for age and education were drawn from the best-fitting model for each of the six indexes. They were used to standardize all raw values and to build the tables reporting a correction value for each class of age and education, as computed for predictors at their central value. Gender was not considered, as it was not significant in the regression analysis. Reference limits were then computed by analyzing the whole sample of age- and education-corrected values. The corrected score was used to define the cut-off, in accordance with the system of equivalent scores adopted in many Italian normative studies [19–21, 27, 30]. The cut-off for each index was computed by solving Wilks’ integral equations [31] for 95 % tolerance limits at the 95 % confidence level.
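A minimal sketch of the two steps just described is given below: removing the predicted age and education effect from a raw score, and locating a distribution-free one-sided tolerance limit by rank. The regression coefficients and centring values are hypothetical placeholders, and the rank rule shown (the binomial form of a nonparametric "outer" limit) is an assumption about the variant used; it is offered as one standard way to obtain such a limit, not as the study's exact computation.

```python
import math


def adjusted_score(raw, age, education, coef, centre):
    """Subtract the predicted age/education effect relative to the sample centre.

    coef and centre are dicts with keys "age" and "education", expressed on the
    transformed scales ln(100 - age) and sqrt(education). Values are placeholders.
    """
    shift = (coef["age"] * (math.log(100 - age) - centre["age"])
             + coef["education"] * (math.sqrt(education) - centre["education"]))
    return raw - shift


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def tolerance_rank(n, coverage=0.95, confidence=0.95):
    """Largest rank r such that the r-th worst observation bounds at least
    `coverage` of the population with the given confidence (0 if n is too small)."""
    tail = 1 - coverage
    rank = 0
    while rank < n and binom_cdf(rank, n, tail) <= 1 - confidence:
        rank += 1
    return rank


def cutoff(corrected_scores, higher_is_worse):
    """Pick the observation at the tolerance rank, counting from the worst end."""
    rank = tolerance_rank(len(corrected_scores))
    if rank == 0:
        raise ValueError("sample too small for the requested tolerance limit")
    ordered = sorted(corrected_scores, reverse=higher_is_worse)
    return ordered[rank - 1]


print(tolerance_rank(192))
```

For time indexes, where larger values indicate worse performance, `higher_is_worse=True` would count from the slowest scores; for item indexes the count would start from the lowest scores.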

The cut-off value separates pathological performances from normal performances and defines the values corresponding to the equivalent score of zero. According to the method of equivalent scores, the scores were classified into five ranges corresponding to five categories (0, 1, 2, 3, 4). The equivalent score of 4 identifies the performances above the median value while the equivalent scores of 1, 2, 3 partition the intermediate range (between cut-off and median value) according to specific percentile ranks [27, 30].
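Once the four boundaries of an index are known, assigning an equivalent score is a simple lookup. The sketch below assumes the boundaries are supplied by the caller (the numbers in the example are invented) and that a score exactly on a boundary falls into the lower category; neither detail is stated in the text.

```python
from bisect import bisect_left


def equivalent_score(corrected, boundaries, higher_is_better=True):
    """Map a corrected score to an equivalent score from 0 to 4.

    boundaries : the four thresholds [cut-off, limit 1/2, limit 2/3, median],
                 listed from worst to best performance (hypothetical input).
    For time indexes, where lower is better, pass higher_is_better=False.
    """
    if higher_is_better:
        ordered, value = list(boundaries), corrected
    else:
        # Flip the sign so that "better" is again the larger number.
        ordered, value = [-b for b in boundaries], -corrected
    # Count how many boundaries the score strictly exceeds.
    return bisect_left(ordered, value)


# Invented boundaries for an item index and for a time index.
item_bounds = [10, 14, 17, 20]
time_bounds = [100, 80, 70, 60]
print([equivalent_score(s, item_bounds) for s in (9, 12, 15, 18, 21)])
print([equivalent_score(s, time_bounds, higher_is_better=False)
       for s in (110, 90, 75, 65, 50)])
```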

Statistical analysis was performed with the statistical software SPSS 17.00 (SPSS Inc., Chicago, USA) and MATLAB R2014a (MathWorks, Natick, Massachusetts, USA), and with the pwr package in R (http://CRAN.R-project.org/package=pwr).

Results

All subjects completed the test, with the following mean times: WT: 52.7 ± 21.5 s, CT: 79.1 ± 25.9 s, CWT: 151.9 ± 60.7 s, and with the following mean numbers of items read in the first 30 s: WI: 64.92 ± 16.82, CI: 43.9 ± 11.8, CWI: 23.8 ± 8.1.

Power analysis was performed considering 3 predictors, a significance level of 0.05, and a power of 0.80 in the actual sample of 192 subjects. Results showed that the available number of participants allowed the detection of a significant effect with an effect size of 0.0588, which lies between the small (0.02) and the moderate (0.15) range [22] and is reasonable for a reliable regression analysis.

The best linear regression model for each index always included age and education as significant regressors, while gender never reached statistical significance. The probability levels found for the effect of gender were, respectively: WT: p = 0.152; WI: p = 0.071; CT: p = 0.704; CI: p = 0.931; CWT: p = 0.597; CWI: p = 0.235. Gender was therefore excluded from the regression models.

Using the Akaike information criterion (AIC) [28], we found that the ranking of AIC weights across the four models was the same for all indexes. The model based on transformed age and education was the best, and the differences were mainly associated with the logarithmic transformation of age (see Table 2).

In all cases, the best fit was associated with transformed independent variables (i.e., log-transformed age, ln(100 − age), and square root of education) and yielded significant models for WT (R² = 0.197, F(2,189) = 23.176, p < 0.001), WI (R² = 0.267, F(2,189) = 34.369, p < 0.001), CT (R² = 0.362, F(2,189) = 45.148, p < 0.001), CI (R² = 0.377, F(2,189) = 57.204, p < 0.001), CWT (R² = 0.479, F(2,189) = 86.908, p < 0.001), and CWI (R² = 0.605, F(2,189) = 144.726, p < 0.001). Statistical data relevant to the effect of age and education on each index are reported in detail in Table 3.
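For readers who wish to relate the two statistics, the overall F of a linear model with k predictors and n observations follows from R² by the usual identity F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below applies it to two of the reported models; the function name is hypothetical, and the inputs are the rounded R² values quoted above, so small discrepancies from the reported F values are to be expected.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic of a linear model from its R-squared.

    r2 : coefficient of determination
    n  : number of observations
    k  : number of predictors (numerator degrees of freedom)
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Two predictors (transformed age and education) and 192 subjects give F(2, 189).
for label, r2 in (("WT", 0.197), ("CWI", 0.605)):
    print(label, round(f_from_r2(r2, n=192, k=2), 2))
```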

Table 3 Values of linear regression models

The raw values of the six indexes were thus corrected according to the equations of the multiple regression models as shown in Table 4, which also reports the correction values arranged for age and education classes. The equivalent scores and the relevant score range for each index are reported in Table 5.

Table 4 Correction table for the raw scores of the six indexes
Table 5 Equivalent scores for six indexes (age- and education-corrected scores)

Discussion

The aim of our study was to produce updated normative values for one of the most widely used attention tests, the SCWT, in an Italian population by recruiting subjects balanced for education, gender, and age. This sampling arrangement was aimed at estimating the expected values of the Stroop test indexes as a function of significant predictors. The sample was deliberately not designed to reflect the age, education, and gender distribution of the general population, since a population-representative sample could have introduced bias, or lower precision, in the estimation of expected values for some predictor intervals, in particular for the highest age and education values.

To the best of our knowledge, only one of the three studies published so far on the Italian version of the Stroop test enrolled subjects up to the age of ninety (52.1 ± 19.56, range 20–90 years) [20], using the brief form of the SCWT (with thirty items and three colors). The main study that used the full version did not enroll subjects beyond the age of 81 (mean age: 49 ± 15.6, range 18–81 years) and was published more than 15 years ago [19], while in the third study only one index was validated, in a sample with a low mean age (40.9 ± 11) [21]. In the present study, we provide norms for an extended age range with balanced groups (mean age: 57.3 ± 19.6 years, range 20–90), using the complete version of the SCWT (with one hundred items for each of the three tables and five colors) and analyzing several indexes.

The multiple regression analysis showed that age and education had a statistically significant influence on performance as evaluated by all six indexes of the SCWT, whereas gender did not show significant effects.

The decline in performance with increasing age agrees with previous Italian data for both the extended [19] and short forms [20, 21]. This effect has also been found in previous studies of the SCWT implemented in other languages [32–36] and in experimental studies with fMRI [17, 37]. Education is a second variable that correlated positively with SCWT performance. This effect is in keeping with two of the three previous Italian studies [19, 20]. The positive relationship between education and performance has also been found in other populations worldwide [32].

The correlation with gender did not reach statistical significance for any index. The effect of gender on SCWT performance is controversial. Among the previous Italian normative studies, only one reported a significant effect of gender, found with the classical ‘paper’ version but not with the computer version [19]. The remaining two studies [20, 21] are in keeping with our findings, even though partially different indexes were used. The international normative data do not describe an effect of gender in a Caucasian population [33], but only in an Asian sample [32]. Moreover, even in the original study of Stroop [1], data were inconclusive, because in the first experiment reaction times were similar between males and females, while in the third experiment females performed better than males. Stroop suggested that the difference could be ascribed to differences in educational background between males and females.

A possible limitation of this study arises from the regional composition of the sample, as subjects were enrolled in a single Northern Italian region, even if their origin was composite, due to inter-regional mobility. Further studies might clarify if additional regional or social factors affect the scoring.

This study introduced three indexes (WT, CT, CWT), computed as the total time needed for reading each table. These indexes were not described in the previous Italian studies and were considered here in order to specifically evaluate sustained attention through the reaction times to unambiguous and ambiguous stimuli. These indexes, together with the other three (WI, CI, CWI), could be useful in identifying the attentional profile both in brain pathologies of old age, such as AD [8], FTLD [9, 10], and DLB [10], and in brain pathologies typical of adulthood, such as brain injury [38] and multiple sclerosis [39].

In conclusion, we have collected an updated set of norms for the classical, full card version of the SCWT in healthy subjects from 20 to 90 years old, balanced for age, education, and gender. Several indexes, exploring different aspects of attention, have been analyzed and cut-off values derived. This normative data set paves the way for further studies exploring the usefulness of these indexes, either alone or in various combinations, for the detection of attentional disorders in specific pathological conditions.