Introduction

Verbal reasoning is a skill that characterizes and distinguishes human beings, and can be defined as the ability to draw inferences from given information [1]. This complex, multicomponent function involves various cognitive abilities, such as language, attention, working memory, abstraction, and categorization. Verbal reasoning develops gradually alongside language and abstract thought, and reaches completion in early adulthood with the maturation of its underlying functional and anatomical substrates (see, e.g., [2] for the development of the capacity to understand metaphoric language).

It has long been known that brain injury of diverse aetiology can be associated with verbal reasoning difficulties [3–5]. Several studies have demonstrated the involvement of different areas of both cerebral hemispheres in verbal reasoning tasks. fMRI studies show activation of combinations of brain regions in the frontal, parietal, temporal, and occipital lobes, the basal ganglia, and areas of the cerebellum [6]. The parietal cortex seems to play a critical role in resolving transitive inferences [7]. Deficits in deductive reasoning have been reported in patients with left focal temporal lobe lesions [8, 9] and left frontal lobe lesions [10–12]. The right frontal gyrus and the right anterior insula seem to be involved in conceptualization tasks [13]. The bilateral ventromedial prefrontal regions, the right orbitofrontal cortex, the medial prefrontal cortex, and the anterior cingulate cortex also seem to be involved [14]. Indeed, the frontal lobes contribute to verbal reasoning by integrating and analyzing information and using it appropriately in context [15]. Many studies have also shown a high correlation between working memory and verbal reasoning performance in healthy subjects [12, 16, 17]. Furthermore, functional imaging studies suggest that the prefrontal cortex is crucial for analogical reasoning [18].

The first studies of verbal reasoning were probably those by Aleksandr Luria [19] (published in 1976, although the study was originally conducted at the end of the 1920s), who used “odd one out” tasks to investigate classification and abstraction capacities. Many subsequent studies have used single verbal reasoning tasks to relate function to neural substrates [e.g., 11, 12], but few have attempted to produce standardized material for assessing patients with acquired brain injury in clinical practice. Although brain activation during verbal reasoning tasks is widespread, and impaired verbal reasoning is therefore a frequent and disabling consequence of brain injury, few standardized tests of this function are available to clinicians. In batteries for assessing executive functions, such as the FAB [20], and in certain IQ tests, such as the WAIS-IV [21], some subtests assess the capacity to perceive conceptual relations between words. Although a recent systematic review [22] recognized the WAIS-IV Similarities subtest as a valid measure of verbal abstract reasoning, a single task may be insufficient for assessing such a complex function. The recent Family Relation Reasoning Test [23], developed in German, assesses several cognitive operations, such as inference, working memory, and deduction, but it is not available in Italian. The “Giudizi Verbali” test [24] has several limitations: its normative data have not been updated, and it only provides norms for individuals aged 40 years and over. Furthermore, young people have difficulty with the Proverb subtest, because proverbs are no longer common in everyday language.

The present study was designed to construct a new verbal reasoning test that is also suitable for young individuals. In the first part of the paper, we describe how we constructed the test; in the second, we provide normative data for the total score and the seven subtest scores in a sample (N = 380) of cognitively healthy participants covering a wide range of age (16–75 years) and education. We used consolidated analytical procedures [25] to generate a correction grid for raw scores that takes into account the influence of the main socio-demographic variables (gender, age, and education), and to transform adjusted scores into equivalent scores (ES) [24, 26]. Moreover, z-score grids are provided to facilitate comparison of scores obtained on the entire test or on the seven subtests.

Methods

Phase 1: development of the verbal reasoning test (VRT) and pilot study

Seven subtests were designed: absurdities, intruders, relationships, differences, idiomatic expressions, family relations, and classifications. The subtests were chosen to tap different aspects of verbal reasoning, which serves as their common denominator. Ten items plus a warm-up item were created for each subtest. The stimuli were constructed with different degrees of difficulty, determined by factors such as the degree of abstraction or the amount of working memory involved.

Absurdities This test consists of sentences containing conflicting information. The subject has to identify the logical incongruence (e.g., “Outside the farm there was a bright sunshine, while inside it was raining”).

Intruders In this test, the participant has to identify the “intruder” among four words (e.g., “physician, hospital, dentist, nurse”).

Relationships This task requires the participant to identify the relationship between a pair of terms and to express the same type of relationship between two other words (e.g., “The relationship between cold and hot is the same as that between open and…”).

Differences In this task, the subject is asked to identify the main characteristic distinguishing two objects or concepts (e.g., “What is the main difference between eye and ear?”).

Idiomatic Expressions This test requires the subject to explain the meaning of certain common idiomatic expressions (e.g., “What does it mean: lift your elbow?”).

Family Relations The participant is asked to specify the degree of familial relationship between relatives in a statement (e.g., “Lucy and Mary are sisters. Mary has a daughter, Anne. What kind of family relation is there between Lucy and Anne?”).

Classifications In this task, the participant has to determine the category to which triplets of words belong (e.g., “What are Milan, Rome and Naples?”).

Participants

One hundred and seven Italian participants were included in a preliminary sample. All were healthy, with no history of brain injury, depression, alcohol and/or drug abuse, severe medical conditions (e.g., neoplasms and organ failure), stroke, or clinically evident cognitive disability. A normal score on the forward Digit Span [27] was used as an inclusion criterion to ensure normal verbal short-term memory. These healthy volunteers were distributed across age classes (mean age 45.5 years, SD 16.9, range 16–75 years), gender (56 women and 51 men), and education levels (primary school to university; mean formal education 14 years, SD 5.27). Subjects under 45 years of age or with fewer than 8 years of education were excluded. Informed consent was obtained from all individuals participating in the study.

Procedure

Participants were assessed individually with the forward Digit Span test [27] and the VRT. The VRT was administered as follows: (a) items were presented orally, and (b) each item could be repeated once at the participant’s request.

Analyses

Participants’ responses on all seven tasks were scored 2, 1, or 0: 2 for a correct response; 1 for a partially correct response, for a concrete example without any elaboration, or for a correct answer given after the item had been repeated at the subject’s request; and 0 for a wrong response or for a repetition of the item without any elaboration. Three experienced neuropsychologists independently scored the responses to each item and compared their scores in cases of disagreement. A scoring manual with examples was drawn up (manual available at: http://vrt.sstefano.it). We selected seven of the ten original items for each subtest, excluding items with excessively high response variability, an excessively low mean response rate (<1), or fewer than 55% of responses scored 2. A final version of the test was then created, composed of seven subtests of seven items each (49 items in total). The test can be downloaded from the website: http://vrt.sstefano.it.
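The scoring rules and the two numeric item-selection criteria above can be sketched as follows. This is a minimal illustration, not the authors’ procedure: we assume item scores are stored as lists of 0/1/2 values, we read “mean response rate” as the mean item score, and we omit the response-variability criterion because the text gives no threshold for it. The names `subtest_score`, `total_score`, and `item_passes_screening` are hypothetical.

```python
from statistics import mean

SUBTESTS = ("absurdities", "intruders", "relationships", "differences",
            "idiomatic expressions", "family relations", "classifications")
ITEMS_PER_SUBTEST = 7          # final version of the VRT
VALID_ITEM_SCORES = (0, 1, 2)  # each item is scored 0, 1 or 2


def subtest_score(item_scores):
    """Sum of the seven item scores of one subtest (range 0-14)."""
    if len(item_scores) != ITEMS_PER_SUBTEST:
        raise ValueError(f"expected {ITEMS_PER_SUBTEST} item scores")
    if any(s not in VALID_ITEM_SCORES for s in item_scores):
        raise ValueError("item scores must be 0, 1 or 2")
    return sum(item_scores)


def total_score(responses):
    """Total VRT score (range 0-98) from a dict mapping subtest -> item scores."""
    missing = set(SUBTESTS) - set(responses)
    if missing:
        raise ValueError(f"missing subtests: {sorted(missing)}")
    return sum(subtest_score(responses[name]) for name in SUBTESTS)


def item_passes_screening(scores, min_mean=1.0, min_prop_full=0.55):
    """Pilot screening of one candidate item over all participants' scores.

    Keeps the item only if its mean score is at least `min_mean` and at
    least `min_prop_full` of the responses earned the full score of 2.
    The response-variability criterion is not implemented (no threshold given).
    """
    if not scores:
        raise ValueError("no scores")
    prop_full = sum(1 for s in scores if s == 2) / len(scores)
    return mean(scores) >= min_mean and prop_full >= min_prop_full
```

Because every item is worth at most 2 points, seven items give a subtest maximum of 14 and 49 items a total maximum of 98, which matches the ranges reported in the Results.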

Phase 2: Collection of normative data for an Italian population

Participants

To obtain normative data, we recruited another 290 healthy participants meeting the same inclusion criteria and with the same socio-demographic characteristics as in the pilot study. Six centers from Southern to Northern Italy took part in the study to ensure representativeness across regional variants of Italian. The final normative data were thus collected from a total sample of 397 healthy volunteers distributed across age classes (mean 45.9, SD 17.0, range 16–75 years), gender (204 women and 193 men), and education levels (from primary school to university, expressed as years of formal education, including postgraduate education and/or specialization courses; mean = 13.1 years, SD 4.7, range 3–29). Participants were native Italian speakers without a history of neurological deficits. Informed consent was obtained from all individuals participating in the study.

Procedure

The same procedure as in Phase 1 was used. Responses were scored by the examiners at each center using the scoring manual created by the authors at the end of the pilot study.

Analyses

Two sets of analyses were carried out separately on the total VRT score and on each of the seven subtest scores: (a) the equivalent score procedure and (b) the z-score procedure. Statistical analyses were performed with R [28].

(a) Equivalent score procedure The influence of the demographic variables age, education, and gender on the total VRT score and on each subtest score was examined with simultaneous multiple regressions. We first ran linear regressions on the raw score data. Age and education had significant effects but gender did not, so gender was not considered in the subsequent analyses (see below). Age and education were then entered together in several multiple linear regressions to partial out any overlapping effect. Following Capitani and Laiacona [25], we applied a logarithmic transformation to age [ln(100 − age)] and a square root transformation to education. Four regression models were fitted, with age and education entered both as raw values, both as transformed values, or one raw and the other transformed. The models were compared in terms of R² and the Akaike information criterion (AIC) [29], and the model with the lowest AIC was selected as the best regression model (see [30] for the same approach).
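The model comparison can be illustrated with a small Python sketch (the authors used R; this is not their script). It fits the four age/education specifications by ordinary least squares with NumPy and ranks them by a Gaussian AIC computed the way R’s `AIC()` does for `lm` fits. The function names `gaussian_aic` and `best_model` are hypothetical.

```python
import math

import numpy as np


def gaussian_aic(y, predictors):
    """AIC of an OLS fit with Gaussian errors, as R's AIC() reports for lm()."""
    n = len(y)
    design = np.column_stack([np.ones(n), predictors])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * k, beta


def best_model(score, age, edu):
    """Fit the four age/education specifications and return the lowest-AIC one."""
    age = np.asarray(age, dtype=float)
    edu = np.asarray(edu, dtype=float)
    score = np.asarray(score, dtype=float)
    f_age = np.log(100.0 - age)  # ln(100 - age)
    g_edu = np.sqrt(edu)         # square root of years of education
    candidates = {
        ("age", "edu"): (age, edu),
        ("ln(100-age)", "edu"): (f_age, edu),
        ("age", "sqrt(edu)"): (age, g_edu),
        ("ln(100-age)", "sqrt(edu)"): (f_age, g_edu),
    }
    # Each entry maps a model label to (AIC, coefficient vector).
    fits = {name: gaussian_aic(score, np.column_stack(cols))
            for name, cols in candidates.items()}
    best = min(fits, key=lambda name: fits[name][0])
    return best, fits
```

`best_model(scores, ages, educations)` returns the label of the lowest-AIC specification together with all fitted AICs and coefficients. Since the four models have the same number of parameters, ranking them by AIC is equivalent here to ranking them by residual sum of squares (and hence by R²).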

In the following step, the equations derived from the best-fitting model were used to adjust scores for age and education (separately for the VRT total score and for each subtest score). These equations were used to standardize all raw values and to build tables of correction values for age and education ranges. Since gender had no effect on participants’ performance, it was not included in the best-fitting model. The resulting correction grid allows the raw scores of newly tested individuals to be adjusted immediately for age and education. Correction factors were calculated for four age ranges (16–30, 31–45, 46–60, and 61–75 years) and four education ranges (years of formal education: 3–7, 8–12, 13–15, and >15 years). Finally, reference limits were computed on the whole sample of age- and education-corrected scores. The cutoff for each index was obtained by solving Wilks’s [31] integral equations for 95% tolerance limits at a 95% confidence level. The cutoff separates pathological from normal performance and defines the values corresponding to an equivalent score of 0. Following the equivalent score method, scores were classified into five ranges corresponding to five categories (0, 1, 2, 3, and 4). An equivalent score of 4 indicates above-median performance (above the 50th percentile), while equivalent scores of 1, 2, and 3 partition the intermediate range between the cutoff and the median according to specific percentile ranks [24].
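The adjustment and cutoff steps can be sketched as below, under explicit assumptions. The adjustment subtracts the regression-predicted effect of the subject’s transformed age and education relative to the normative sample means, which is our reading of the usual Capitani and Laiacona form; the slopes and means are placeholders to be taken from the fitted model (Table 2), not values from this paper, and subtests whose best model uses raw age would use `age` in place of `ln(100 - age)`. The cutoff is the standard distribution-free (Wilks) one-sided lower tolerance limit, expressed as the rank of an order statistic. The split of the intermediate range into ES 1–3 assumes equal intervals in z units between the tolerance-limit percentile and the median, which is also our assumption about [24, 25]. All function names are hypothetical.

```python
import math
from math import comb
from statistics import NormalDist


def adjusted_score(raw, age, edu, b_age, b_edu, mean_f_age, mean_g_edu):
    """Adjust a raw score for age and education (hypothetical helper).

    Assumes the transformed-age / transformed-education model with
    predictors f(age) = ln(100 - age) and g(edu) = sqrt(edu). The slopes
    and the sample means of f and g must come from the fitted regression.
    """
    f = math.log(100 - age)
    g = math.sqrt(edu)
    # Remove the predicted effect of deviating from the sample's mean age/education.
    return raw - b_age * (f - mean_f_age) - b_edu * (g - mean_g_edu)


def binom_sf(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p), computed from the lower tail."""
    return 1.0 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r))


def wilks_lower_rank(n, coverage=0.95, confidence=0.95):
    """Largest rank r such that the r-th smallest observation is a
    distribution-free lower tolerance limit: with probability >= confidence,
    at least `coverage` of the population scores above it.
    Returns None when n is too small for any such limit.
    """
    q = 1.0 - coverage
    best = None
    for r in range(1, n + 1):
        if binom_sf(r, n, q) >= confidence:
            best = r
        else:
            break  # binom_sf decreases with r, so no larger rank can qualify
    return best


def es_rank_boundaries(n, r0):
    """Ranks separating ES 1|2, 2|3 and 3|4 (assumed rule).

    The interval between the tolerance-limit percentile r0/n and the median
    is split into three parts of equal width in z units.
    """
    nd = NormalDist()
    z0 = nd.inv_cdf(r0 / n)  # negative: the limit lies below the median
    return [round(nd.cdf(z0 * frac) * n) for frac in (2 / 3, 1 / 3, 0.0)]
```

With n = 59, `wilks_lower_rank` returns 1 (the smallest observation serves as the limit), the classic minimum sample size for a one-sided 95%/95% limit; with 58 or fewer observations it returns `None`.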

(b) Z-score procedure We calculated z scores for the total score and for each subtest score, and constructed a conversion grid on the basis of these scores.

Results

Seventeen subjects were excluded on the basis of pathological Digit Span scores. The final sample comprised 380 participants (mean age = 45.9 years, SD 17, range 16–75; mean education = 13.2 years, SD 4.7, range 3–29). The distribution of the sample by gender, age, and education ranges is reported in Table 1. The mean raw total score of the whole sample was 82.9 ± 11.7 (range 25–98). Mean raw subtest scores were as follows: Absurdities 11.5 ± 2.5 (range 1–14), Classifications 12.9 ± 1.6 (range 5–14), Differences 12.4 ± 2 (range 3–14), Idiomatic Expressions 11.5 ± 2.1 (range 2–14), Intruders 12.2 ± 2.4 (range 2–14), Relationships 11.8 ± 2.7 (range 2–14), and Family Relations 10.5 ± 2.9 (range 0–14).

Table 1 Distribution of the sample by gender, age, and education levels

(a) Equivalent score procedure Linear regression analysis of the VRT total score and of the single subtest scores showed a significant effect of age (all ps < 0.024) and education (all ps < 0.001). No significant effect was found for gender (all ps > 0.429). The Akaike information criterion (AIC) indicated that the best model was the one based on transformed age and transformed education for six scores (the Absurdity, Intruder, Relationship, Difference, and Classification subtests and the total score) and the one based on raw age and transformed education for two subtests (Idiomatic Expression and Family Relation).

All best-fitting models were significant. Statistics for the best model of each score are reported in Table 2. The correction grid for raw VRT and subtest scores, derived from the multiple regression equations, is shown in Table 3, and the equivalent scores for the total and subtest scores are reported in Table 4.

Table 2 Parameters of the best-fitting linear regression models (see the main text for details)
Table 3 Correction grid of raw scores for the entire VRT and its single subtests (adjusted score = raw score − corrected score)
Table 4 Equivalent scores (ES) for adjusted values of the VRT total score and the single subtest scores (age- and education-corrected scores)

(b) Z-score procedure Tables 5 and 6 show the z scores for the whole test and the seven subtests. Scores lying about two standard deviations below the mean, that is, z scores below −1.96, are expected in only 2.5% of the population; they can therefore be considered pathological and are marked with an asterisk in the tables. The cutoff is shown for the total score and for each subtest score. We suggest using the correction grid (Table 3) to adjust the raw score before transforming it into a z score.
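The z-score route reduces to a standardization followed by a fixed threshold. A minimal sketch, assuming the score has already been adjusted with the correction grid and that the normative mean and SD are taken from the relevant table; the figures 82.9 and 11.7 used below are the whole-sample raw-score mean and SD from the Results, chosen only for illustration. The helper names are hypothetical.

```python
from statistics import NormalDist

Z_CUTOFF = -1.96  # scores below this are flagged as pathological


def z_score(adjusted, norm_mean, norm_sd):
    """Standardize an (already age/education-adjusted) score."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (adjusted - norm_mean) / norm_sd


def is_pathological(z, cutoff=Z_CUTOFF):
    """True if z falls below the cutoff."""
    return z < cutoff


def lower_tail_proportion(cutoff=Z_CUTOFF):
    """Share of a normally distributed population expected below the cutoff."""
    return NormalDist().cdf(cutoff)


if __name__ == "__main__":
    z = z_score(55, 82.9, 11.7)  # illustrative figures only
    print(f"z = {z:.2f}, pathological: {is_pathological(z)}")
    print(f"expected share below cutoff: {lower_tail_proportion():.3f}")
```

The last line makes the threshold explicit: under a normal distribution, about 2.5% of healthy individuals fall below z = −1.96.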

Table 5 Z score for each of the seven subtests
Table 6 Z score for the total raw score (RS)

Discussion and conclusion

This paper describes a new test designed to assess verbal reasoning abilities in adults with acquired brain injury. We created seven tasks investigating different aspects of verbal reasoning and tested this first set of tasks in a pilot study on 107 healthy subjects, evenly distributed across gender, age, and education ranges. On the basis of the pilot results, the number of items in each task was reduced from ten to seven, and a scoring manual was created.

In the second phase, we recruited the normative sample of 380 participants. Multiple linear regression analysis on the normative sample revealed no gender effect but a significant effect of age and education. These results replicate those of Spinnler and Tognoni [24] for the “Giudizi Verbali” test, in which no sex effect was found either (see [32] for a comprehensive review of sex differences in cognitive abilities, and [33] for sex differences in verbal reasoning). We then built a grid for correcting raw scores according to the patients’ socio-demographic characteristics, and introduced two different ways of analyzing the results, equivalent scores and z scores, to meet two different requirements. Equivalent scores are mainly suited to discriminating pathological from normal performance, but they are not sensitive to changes in a patient’s performance over time, as when brain-damaged patients are tested before and after treatment, or in follow-up studies of patients with degenerative brain disease. Z scores are more sensitive for comparing follow-up performance, for example, of the same individual at different time points.

The test has several advantages. First, it is also suitable for young people, since it was standardized on individuals aged 16 years and over. Second, it includes different subtests investigating different aspects of reasoning. Providing corrections for eight different scores (the total VRT score and the seven subtest scores) also allows the tool to be used in whole or in part. For example, only certain subtests may be administered to patients with severe brain injury, whereas the whole protocol is indicated for other patients.

Although we consider verbal reasoning testing useful for all patients undergoing neuropsychological evaluation, patients with bilateral frontal damage or with left temporal or subcortical lesions of different aetiology (vascular, traumatic, or neurodegenerative) can benefit from a more detailed investigation of verbal reasoning capacity.

Further studies are needed to compare patients’ performance across the different subtests. It will be interesting to determine to what extent the results correlate across subtests, with those of other cognitive tasks, and with the site of brain damage. Certain subtests are likely to correlate better with certain cognitive functions (e.g., the Family Relation subtest requires good verbal working memory, whereas the Classification and Intruder subtests are more closely linked to lexical-semantic and conceptual capacities), and damage to specific brain areas is likely to affect some subtests more than others. Overall, this test seems to fill a gap in the range of Italian verbal reasoning tasks.