Background

Cognitive dysfunction is common and often severe in multiple sclerosis (MS) and can be a major clinical feature of the disease. Studies suggest that neuropsychological evaluation plays a crucial role in monitoring disease progression and, compared with physical symptoms and signs, in predicting earlier conversion to MS in patients with a clinically isolated syndrome (CIS) [1, 2]. Cognitive impairment disrupts daily life and can have meaningful negative consequences for employment status, caregiver burden, and social functioning, thereby affecting the overall quality of life of people with MS and their families [3,4,5,6,7]. Cognitive deficits are more frequent in progressive MS and appear to be related mainly to older age and higher levels of disability [8, 9]. Across studies, the reported prevalence of cognitive dysfunction in MS varies widely, from 40 to 70% of patients [10]. Symptoms and severity can be extremely variable and may be influenced by mood disorders (e.g., depression and anxiety) and/or fatigue [5, 11]. In adults with MS, attention and concentration, information processing speed, executive functions, and memory are the most affected cognitive domains, whereas dementia is rare [12]. Because an extended neuropsychological assessment is costly in time and money for the routine evaluation of MS patients, a number of brief cognitive screening batteries have been developed [13,14,15].

The National Multiple Sclerosis Society (NMSS) in the USA has recently published recommendations for cognitive screening and management in MS care [16]. In particular, the NMSS recommends the Symbol Digit Modalities Test or a similar validated instrument for screening adults with stable MS. To monitor disease progression over time and the effects of treatment (e.g., starting or changing disease-modifying therapy), the experts recommend administering the same test annually and supplementing it with a more comprehensive neuropsychological assessment when impairment or worsening performance on the screening tool emerges. Several instruments have been developed specifically to evaluate cognitive impairment in individuals with MS. The Brief Repeatable Battery (BRB) and the Minimal Assessment of Cognitive Function in MS (MACFIMS) are the most widely used neuropsychological batteries, in both clinical practice and research, owing to their excellent psychometric properties and the availability of alternate forms for longitudinal assessment [17, 18]. The BRB and MACFIMS overlap on a number of measures, although only the MACFIMS includes tasks exploring executive functions. A recent study comparing the sensitivity of the two batteries in detecting cognitive impairment in patients with MS found them comparable in overall sensitivity; the visuospatial memory test of the MACFIMS showed greater discriminative validity, whereas the verbal memory tests of the two batteries showed similar sensitivity [19]. Currently, the BRB is the only battery with normative values for the Italian population available for detecting cognitive dysfunction in clinical practice.
With this background, the aim of the study was to complete the Italian translation of the MACFIMS and to validate two alternate forms, providing normative values adjusted for age, sex, and education.

Methods

A total of 200 healthy subjects were recruited from the community in 8 cities across Italy (Bari, Brescia, Catania, Chieti, Crema, Florence, Milan, Rome). The sample size was determined on the basis of recent normative studies of neuropsychological batteries in the Italian population [20, 21], with stratification by age, sex, and education intended to ensure representativeness with respect to the MS population. Exclusion criteria were severe visual or hearing impairment, any neurological or psychiatric diagnosis, a history of alcohol or drug abuse, and learning or intellectual disability. Depression was assessed with the Beck Depression Inventory II (BDI-II); because a score of 14 or above indicates depressive symptoms [22,23,24], subjects scoring 14 or above were excluded from the study. Subjects were evaluated at baseline and re-evaluated with an alternate form of the same battery after 12 months.

The study was approved by the Ethics Committee for the Region of Liguria (P.R. 549REG2015). Study subjects provided signed informed consent.

Procedure

At each site, all subjects were evaluated by the same neuropsychologist, who had been trained to standardize test administration, recording, and scoring. Tests were administered in the same order, according to the recommendations of the expert panel that developed the MACFIMS battery [18].

MACFIMS neuropsychological battery

The Controlled Oral Word Association Test (COWAT) measures phonemic fluency, that is, language efficiency and lexical search speed [25]. In each trial, the subject has 1 min to generate as many words as possible beginning with a stimulus letter provided by the examiner. Three trials with three different letters are conducted. For use in the Italian population, the standard COWAT stimulus letters were replaced with letters that are more common in Italian [26, 27].

The Brief Visuospatial Memory Test-Revised (BVMT-R) is a measure of visuospatial learning and memory [28]. The subject observes a matrix of six abstract figures for 10 s and then reproduces it as faithfully as possible with a pencil on a blank sheet of paper. The matrix is presented and reproduced three times. Each figure is scored 0, 1, or 2, depending on whether it is drawn accurately and placed in the correct position on the page. The total learning score is the sum of the scores obtained over the three trials. After about 20–25 min, the subject is asked to reproduce the matrix again from memory (delayed recall). The test also assesses visuospatial memory through a recognition task.
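The BVMT-R learning score is simple arithmetic: each design in a trial receives 0, 1, or 2 points, and the total is the sum over the three learning trials. The Python sketch below assumes the per-design scores have already been assigned by the examiner; the function name `bvmt_total_learning` and the example scores are hypothetical illustrations, not study data.

```python
from typing import Sequence


def bvmt_total_learning(trials: Sequence[Sequence[int]]) -> int:
    """Sum per-design scores (each 0, 1, or 2) over the three learning trials.

    Hypothetical helper: design scores are assumed to be already assigned
    by the examiner according to the test manual.
    """
    if len(trials) != 3:
        raise ValueError("expected three learning trials")
    total = 0
    for trial in trials:
        for score in trial:
            if score not in (0, 1, 2):
                raise ValueError(f"invalid design score: {score}")
            total += score
    return total


# Hypothetical per-design scores for one subject (not study data).
example = [
    [1, 2, 1, 0, 2, 1],
    [2, 2, 1, 1, 2, 2],
    [2, 2, 2, 1, 2, 2],
]
print(bvmt_total_learning(example))
```

Validating each design score keeps transcription errors (e.g., a 3 entered by mistake) from silently inflating the total.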

The Paced Auditory Serial Addition Test (PASAT) is a measure of sustained attention and information processing speed [29]. The subject listens to an audio recording of a voice stating 61 single-digit numbers and must add each number heard to the one immediately preceding it. Two trials are administered, with numbers presented at 3-s and 2-s intervals, respectively. The score for each trial is the number of correct answers (range 0–60).
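The PASAT scoring rule can be made concrete: with 61 digits there are 60 expected answers, and answer i is correct only if it equals the sum of digit i and digit i + 1. The Python sketch below is a hypothetical helper (`pasat_trial_score` is an illustrative name) and uses a short made-up digit sequence rather than the actual PASAT stimuli.

```python
from typing import Optional, Sequence


def pasat_trial_score(digits: Sequence[int],
                      responses: Sequence[Optional[int]]) -> int:
    """Count correct responses in one PASAT trial.

    After the first digit, each response should be the sum of the digit just
    heard and the digit immediately before it (not a running total).
    None marks an omitted response.
    """
    if len(responses) != len(digits) - 1:
        raise ValueError("expected one response per digit after the first")
    correct = 0
    for i, response in enumerate(responses):
        target = digits[i] + digits[i + 1]
        if response is not None and response == target:
            correct += 1
    return correct


# Hypothetical 5-digit excerpt (a full trial has 61 digits, hence 60 responses).
digits = [3, 5, 2, 8, 1]
responses = [8, 7, None, 10]  # last answer is wrong: 8 + 1 = 9
print(pasat_trial_score(digits, responses))
```

Because the score counts correct pairwise sums, a full 61-digit trial yields the 0–60 range reported above.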

The Judgment of Line Orientation (JLO) test is a measure of visuospatial perception [30]. Subjects are asked to match the orientation of two target lines to lines in an array of 11 lines arranged in a 180° semicircle. The total score is the number of correct answers out of 30 items.

The California Verbal Learning Test-II (CVLT-II) is a measure of episodic verbal learning and memory [31]. The examiner reads a list of 16 words aloud, at 1-s intervals and in a fixed order, over five learning trials. After each trial, the subject is asked to recall as many words as possible in any order (free recall). After 30 min, the subject is asked to recall the previously learned words again (long-delay free recall). The CVLT-II includes standard and alternate forms. At baseline, the word list included in the Brief International Cognitive Assessment for MS (BICAMS) battery was used, since it was already available in Italian [21]. To minimize practice effects, an alternate form was used at follow-up. The alternate version was translated into Italian by a bilingual translator and then back-translated by an independent translator to verify its accuracy (copyright permission Pearson, 2017).

The Symbol Digit Modalities Test (SDMT) is a measure of attention, information processing speed, and visual scanning based on symbol-digit substitution [32]. Using a reference key that pairs nine abstract symbols with the digits 1–9, subjects must give the digit corresponding to each symbol as quickly as possible. The oral response format was administered, preceded by a 10-item practice trial. The total score is the number of correct pairings in 90 s (range 0–110).

The Delis–Kaplan Executive Function System Sorting Test (D-KEFS ST), free sorting condition, is a measure of executive function, in particular concept formation and the ability to explain sorting concepts abstractly [33]. The words on card sets 1–2 (standard form) and 3–4 (alternate form) were used at baseline and follow-up, respectively, since they were available in Italian [34]. The test is preceded by a screening pre-test in which each participant reads 24 words and the examiner verifies the participant’s knowledge of each word. A practice set of six cards is then shown to explain the task, illustrating different ways of sorting and classifying the cards (perceptual and verbal criteria). The task consists of sorting the six stimulus cards into two groups of three cards each (categorization) and describing the concept underlying each sort (description), on the basis of perceptual features (e.g., color, shape) or semantic-lexical features (the words printed on the cards). Eight target sorts are possible for each card set: three verbal and five perceptual. The maximum categorization and description scores are 8 and 32, respectively, for each card set.

Statistical analysis

Group comparisons were performed with the unpaired Student’s t test, the Mann–Whitney test, or the χ² test, as appropriate, with statistical significance set at α = 0.05.

Regression-based norms were calculated at each time point (baseline and follow-up) following the procedure previously described for the English version of the MACFIMS [35]. In particular, the control group’s raw scores on each neuropsychological measure were converted to scaled scores (M = 10, SD = 3) using the cumulative frequency distribution of each measure. We then regressed the resulting scaled scores on age, age-squared, sex (male = 1; female = 2), and education, entered en bloc. The age-squared term accounts for the nonlinear relationship between age and cognition. Normality of the residuals, an assumption of the regression analysis, was checked with the Kolmogorov–Smirnov test, for which a nonsignificant result was required.

Normative data were then established as follows. Each participant’s raw test scores were converted to scaled scores using the raw-to-scaled score conversions derived from the healthy controls. Next, the multiple regression equations derived from the healthy controls were applied to compute each participant’s demographically predicted score. The predicted score was subtracted from the participant’s actual scaled score, and the difference was divided by the standard deviation of the control group’s unstandardized residuals for that measure, yielding a z-score. All analyses were performed using SPSS 24 for Windows (SPSS, Chicago, IL, USA).
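The norming steps described in this section can be sketched in Python with NumPy. This is a minimal illustration, not the SPSS procedure used in the study: the mid-rank percentile convention, the rounding and clipping of scaled scores to 1–19, and the use of the sample SD of the residuals are assumptions made for this sketch, and all function names are hypothetical. The usage part runs the pipeline on simulated data only.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for percentile -> z


def raw_to_scaled(raw, reference):
    """Convert raw scores to scaled scores (M = 10, SD = 3) using the
    cumulative frequency distribution of the control (reference) sample.

    Assumptions: mid-rank percentiles, clipping away from 0 and 1, and
    rounding/clipping of scaled scores to the 1-19 range.
    """
    ref = np.sort(np.asarray(reference, dtype=float))
    raw = np.atleast_1d(np.asarray(raw, dtype=float))
    n = ref.size
    below = np.searchsorted(ref, raw, side="left")
    ties = np.searchsorted(ref, raw, side="right") - below
    pct = np.clip((below + 0.5 * ties) / n, 0.5 / n, 1 - 0.5 / n)
    z = np.array([_STD_NORMAL.inv_cdf(p) for p in pct])
    return np.clip(np.rint(10 + 3 * z), 1, 19)


def design_matrix(age, sex, education):
    """Intercept, age, age squared, sex (male = 1, female = 2), education."""
    age = np.asarray(age, dtype=float)
    return np.column_stack([
        np.ones_like(age), age, age ** 2,
        np.asarray(sex, dtype=float), np.asarray(education, dtype=float),
    ])


def fit_norms(scaled, age, sex, education):
    """Regress control scaled scores on all demographics entered en bloc.

    Returns the OLS coefficients and the sample SD of the residuals.
    """
    X = design_matrix(age, sex, education)
    y = np.asarray(scaled, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, residuals.std(ddof=1)


def demographic_z(scaled, age, sex, education, beta, sd_resid):
    """(observed scaled - demographically predicted scaled) / residual SD."""
    predicted = design_matrix(age, sex, education) @ beta
    return (np.asarray(scaled, dtype=float) - predicted) / sd_resid


# Simulated controls for illustration only (not study data).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 65, n)
sex = rng.integers(1, 3, n)          # 1 = male, 2 = female
education = rng.integers(8, 19, n)   # years of schooling
raw = 50 - 0.2 * age + 0.8 * education + rng.normal(0, 5, n)

scaled = raw_to_scaled(raw, raw)
beta, sd_resid = fit_norms(scaled, age, sex, education)
z = demographic_z(scaled, age, sex, education, beta, sd_resid)
```

By construction, the controls’ own z-scores have a mean of about 0 and an SD of 1, which is a quick sanity check when re-implementing the procedure. A formal check of residual normality (the Kolmogorov–Smirnov step) is left to a statistics package.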

Results

The demographic characteristics of the sample are reported in Table 1. The age of the enrolled subjects ranged from 18 to 65 years. There were no significant differences between the healthy controls recruited in the different cities. Neuropsychological performance at baseline and follow-up is reported in Table 2; there were no significant differences between the scores obtained at baseline and at the 12-month follow-up. For the baseline assessment (MACFIMS version A), Table 3 reports the raw-to-scaled score conversions (M = 10, SD = 3) based on the cumulative frequency distribution of each MACFIMS measure. Table 4 shows the healthy control regression models for the baseline version of the MACFIMS. All models include age, age-squared, sex (male = 1; female = 2), and education. The Kolmogorov–Smirnov test on the distribution of the residuals was nonsignificant for all models (p > 0.06). The standard deviations of the residuals for MACFIMS version A are reported in Table 5.

Table 1 Characteristics of the study sample
Table 2 Neuropsychological performance
Table 3 MACFIMS version A: raw score to scaled score conversions
Table 4 MACFIMS version A: standard deviation of the residual
Table 5 MACFIMS version A: final regression models

For the follow-up assessment (MACFIMS version B), Table 6 (online resource 6) reports the raw-to-scaled score conversions (M = 10, SD = 3) based on the cumulative frequency distribution of each MACFIMS measure. Table 7 (online resource 7) shows the healthy control regression models for the follow-up version of the MACFIMS. All models include age, age-squared, sex (male = 1; female = 2), and education. The Kolmogorov–Smirnov test on the distribution of the residuals was nonsignificant for all models (p > 0.06). The standard deviations of the residuals for MACFIMS version B are reported in Table 8 (online resource 8).

These models can be applied to convert the raw scores of a subject with MS into regression-based T scores. Figure 1 provides an example.
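Applying the published tables to a new examinee follows the same steps as the norming procedure: look up the scaled score for the raw score, compute the demographically predicted scaled score from the regression coefficients, divide the difference by the residual SD to obtain z, and rescale to T = 50 + 10z (T scores have a mean of 50 and an SD of 10). In the Python sketch below, the class `NormTable`, the function `regression_t_score`, and all conversion rows, coefficients, and the residual SD are hypothetical placeholders, not values from Tables 3–8; the published values for the test and form in use would be substituted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NormTable:
    """Hypothetical container for one test's published norms."""
    conversion: List[Tuple[int, int, int]]  # (lowest raw, highest raw, scaled)
    intercept: float
    b_age: float
    b_age_sq: float
    b_sex: float        # sex coded male = 1, female = 2
    b_education: float  # years of education
    sd_residual: float


def regression_t_score(raw: int, age: float, sex: int, education: float,
                       norms: NormTable) -> float:
    """Raw score -> scaled score -> demographically adjusted z -> T score."""
    scaled = next((s for lo, hi, s in norms.conversion if lo <= raw <= hi), None)
    if scaled is None:
        raise ValueError(f"raw score {raw} not covered by the conversion table")
    predicted = (norms.intercept + norms.b_age * age + norms.b_age_sq * age ** 2
                 + norms.b_sex * sex + norms.b_education * education)
    z = (scaled - predicted) / norms.sd_residual
    return 50 + 10 * z


# Placeholder values for illustration only (NOT the study's norms).
example_norms = NormTable(
    conversion=[(0, 39, 7), (40, 49, 9), (50, 59, 11), (60, 110, 13)],
    intercept=12.0, b_age=-0.05, b_age_sq=0.0, b_sex=0.0,
    b_education=0.1, sd_residual=2.5,
)
t = regression_t_score(raw=45, age=40, sex=2, education=13, norms=example_norms)
print(round(t, 1))
```

Storing the conversion as (lowest raw, highest raw, scaled) rows mirrors the layout of a printed raw-to-scaled table, so values can be transcribed directly; a raw score outside every row raises an error instead of returning a misleading T score.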

Fig. 1 An example of using the models to convert raw scores from a subject with MS to regression-based T scores

Discussion

For years, the literature and scientific organizations have recognized the importance of cognitive assessment in MS, and various neuropsychological batteries have been developed for both clinical practice and research. Recently, an algorithm for the management of cognitive function in people with MS was proposed [2]. In particular, the authors suggest that cognitive function should be evaluated annually with short batteries such as the BICAMS and that, when screening reveals cognitive decline, a more in-depth assessment should follow. While brief screening batteries are increasingly recognized as key tools for everyday clinical practice, more extensive and comprehensive neuropsychological batteries remain critically important for obtaining the precise cognitive profile needed to plan tailored rehabilitation.

Among the tools available in MS, the BRB and the MACFIMS are the most widely used batteries worldwide, and evidence from several studies has shown their excellent psychometric properties in detecting and characterizing cognitive dysfunction in MS. Compared with the BRB, the MACFIMS has the advantage of exploring executive functions, providing a more comprehensive picture of the neuropsychological profile of individuals with MS [18, 19]. To use these cognitive batteries in different countries, validated versions and country-specific normative data are essential [36,37,38].

While both forms of the BRB have been validated and normative data produced for the Italian population, a complete validation of the MACFIMS was lacking [20, 39]. The current study provides the Italian translation and validation of both the baseline and the alternate forms of the MACFIMS. Because both versions were validated on the same normative sample, they can be used in longitudinal evaluations while limiting possible practice effects. Normative scoring was developed following the procedure applied to the original version by Parmenter and colleagues [35]. This approach provides regression-based norms that account for demographic influences on test performance, using the entire normative sample rather than smaller subgroups to adjust for demographic characteristics. Moreover, regression-based adjustment has been suggested to offer some measurement advantages over discrete norms [35].

Two Italian validation studies of the MACFIMS have been published previously [40, 41]. The paper by Migliore and colleagues focused mainly on criterion validity and, in part, construct validity, demonstrating good performance of the tests in the Italian population, but did not provide normative values [40]. The second paper [41] provided demographically adjusted normative values for the baseline version of the MACFIMS only. Moreover, the authors excluded the PASAT from the battery and calculated an overall cognitive impairment index without assessing performance on each individual task. Furthermore, corrected scores were not provided, limiting the applicability of the procedure. Compared with previous studies, the current study shows higher scores on the majority of neuropsychological tests (in particular, CVLT-II total learning and SDMT compared with both studies, and BVMT-R and D-KEFS scores compared with the study by Argento et al. [41]). This may be due, at least in part, to the younger sample in the current study. Indeed, scores on the CVLT-II total learning, SDMT, and BVMT-R were similar to those reported in the Italian validation of the BICAMS, whose sample was of similar age to that of the present study [21].

In interpreting the study findings, a few limitations should be taken into account. Although healthy subjects were recruited in 8 Italian cities across the entire country, the study sample may not be fully representative of the general Italian MS population. Further, because the cohort was relatively young (age range 18–65 years), the normative scores are difficult to apply to elderly individuals.

In conclusion, to our knowledge, this is the first study to provide regression-based normative values for the complete MACFIMS battery in the Italian population. These data can help neurologists and neuropsychologists in Italy characterize disease-related cognitive impairment and thus promote a tailored rehabilitation approach. Moreover, a validated longitudinal assessment tool can contribute to evaluating the efficacy of pharmacological and non-pharmacological therapies in both research and clinical practice.