Introduction

Since the early description of the disease, cognitive disorders have been known to be a common feature of multiple sclerosis (MS); the precise nature of these deficits, their frequency and their evolution throughout the disease process are yet to be completely established. Recent evidence suggests that cognitive impairment (CImp) may be detectable at any stage of MS, with an overall prevalence ranging from 40 to 65 % [1] and even after a single demyelinating episode [2, 3]. Prevalence of cognitive disturbance varies among MS subtypes, and it can be the main or only clinical symptom [4, 5]. Studies comparing selected samples of patients with relapsing–remitting MS and progressive MS (PMS—secondary and primary) demonstrated that CImp is more frequent and severe in progressive than in the relapsing–remitting group [6, 7].

Attention, visuospatial abilities, learning and memory, information processing speed and executive functions [1, 8, 9] are the most involved domains. Processing speed and visual learning and memory seem to be most commonly affected in MS [1, 10]. Areas of cognition that are not usually affected are “simple” attention (e.g. repeating digits) and essential verbal skills (e.g. word naming and comprehension) [1, 11].

Several tools are available to evaluate neuropsychological deficits. Two neuropsychological batteries are most commonly used: the Rao Brief Repeatable Neuropsychological Battery (BRBN) [12] and the Minimal Assessment of Cognitive Function in MS (MACFIMS) [10, 13]. MACFIMS was developed by an expert panel of neuropsychologists convened by the Consortium of MS Centers in April 2001 [13]. It requires approximately 90 min and consists of seven tests: Controlled Oral Word Association Test (COWAT; [14]), Judgement of Line Orientation Test (JLO; [14]), California Verbal Learning Test, second edition (CVLT-II; [15]), Brief Visuospatial Memory Test-Revised (BVMT-R; [16]), Paced Auditory Serial Addition Test (PASAT; [17]), Symbol Digit Modalities Test (SDMT; [18]) and Delis–Kaplan Executive Function System Sorting Test (DKEFS sorting test; [19]). These tests have good reliability [20] as well as validity, in that they generally correlate well with brain MRI metrics [21, 22] and employment status [10]. A recent study [23] comparing the BRBN and MACFIMS batteries found that they are comparable in their discriminative validity. However, the BVMT-R appears to be more sensitive than the 10/36 Spatial Recall Test, used for the assessment of visual memory; moreover, the MACFIMS battery investigates executive functions that the BRBN does not explore. Further studies are necessary to confirm and extend these results.

Despite well-documented psychometrics in English, the MACFIMS is not easily applied in non-English-speaking countries. Currently, the MACFIMS cannot be applied to the Italian population, either for research purposes or in everyday clinical practice, since the validity of this battery in Italy has never been demonstrated.

The aim of this study was to explore the validity of the MACFIMS in the Italian population; several aspects of validity were assessed as follows: criterion (ability to discriminate between MS patients and healthy controls) and construct (factorial structure of the MACFIMS).

Materials and methods

Subjects

One hundred and thirty patients with clinically diagnosed MS (according to McDonald’s criteria; [24]) and 60 healthy controls (HCs) matched by sex, age and education were enrolled. The demographic and clinical characteristics of the MS and HC groups are reported in Table 1.

Table 1 Demographics and clinical characteristics of study samples

All subjects were recruited at the MS centre of the Department of Neuroscience of San Giovanni Calibita “Fatebenefratelli” Hospital (Rome), Policlinico “Tor Vergata” (Rome) and Campus Bio-Medico University (Rome). The research participants (including all HCs) were contacted by mail/telephone or were approached during the course of their usual clinical care at the MS centre. All patients referred to MS centres from January 2012 to June 2013 and meeting the inclusion/exclusion criteria were enrolled. About 35 % of them were assessed for routine monitoring of cognitive functioning, 30 % for specific clinical reasons (i.e. disability evaluation, differential diagnosis of CImp vs depressive disorder, suspected cognitive impairment), and about 35 % were research volunteers. HCs were selected from hospital employees (physicians, nurses, clerks, cleaners, porters) and their relatives.

All subjects met the following inclusion criteria: age 18 years or older; fluent in Italian; and able to provide informed consent to all procedures. Exclusion criteria were neurological disorder other than MS; psychiatric disorder other than mood, personality or behaviour change following the onset of MS; medical condition that might influence cognition; history of developmental disorder (e.g. ADHD, learning disability); history of substance or alcohol dependence, or current abuse; motor or sensory defect that might interfere with cognitive test performance; relapse and/or corticosteroid use within 4 weeks of assessment (only patients). Patients were under routine treatment, and we did not include or exclude patients based on medications.

In data analysis, we considered together relapsing–remitting and clinically isolated syndrome patients (Relapsing group) and secondary progressive and primary progressive patients (Progressive group).

A detailed clinical interview was performed to verify the inclusion and exclusion criteria; all patients underwent a complete neurological examination. Each HC and MS patient was asked to sign a written informed consent form (previously approved by the local ethics committee) to participate in the study.

Neuropsychological assessment

Tests were administered in a standardized manner, during daytime, in a quiet room, and in a fixed order, in accordance with consensus panel recommendations [10, 13]. The MACFIMS includes seven tests. The COWAT is a measure of verbal fluency and mental flexibility; it was administered in the standard manner, following the method of Arthur Benton [14]. In successive 1-min trials, participants generated as many words as possible beginning with each of three designated letters (F-A-S). The dependent measure was the total number of correct words over the three trials. The JLO is a measure of visual–spatial ability; it required participants to identify the angle defined by two stimulus lines from among those defined by a visual array of lines covering 180°. The dependent variable was the total number of correct responses over 30 items. The CVLT-II is a measure of verbal learning and memory; the ability to learn a 16-word list was first examined over the course of five trials. On each trial, the examiner read the 16 words of List A in full and asked participants to repeat as many words as possible. After 25 min, participants were asked to recall the list again without another exposure. Outcome measures were total learning over the five trials and the number of correct recalls following the delay. The BVMT-R is a measure of visuospatial learning and memory; it consists of three free-recall learning trials followed by a 25-min delayed recall of a matrix of six designs. Each design received a score of 0, 1 or 2 based on accuracy and location scoring criteria. The PASAT is a test of auditory working memory and mental speed; it included 60 trials presented at inter-stimulus intervals of three (PASAT-3) and two (PASAT-2) seconds. The dependent measure was the number of correct responses at each of the two presentation rates. The SDMT is a test of visual processing speed consisting of nine abstract symbols, each paired with a number from one to nine. Participants responded by voicing the digit associated with each symbol as quickly as possible. The dependent measure was the number of correct responses in 90 s. The DKEFS sorting test was employed to assess higher executive function. The examinee was asked to sort six different cards into two groups in as many different ways as possible. For each sort, the examinee explained the rationale for sorting, which was scored for accuracy. We recorded the total number of correct sorts (CS) and calculated the verbal description score (DS), which was based on the abstractness and accuracy of the sort descriptions.

In addition, the participants completed the Beck Depression Inventory (BDI; [25]) to evaluate depression. The entire test battery required 90 to 100 min of face-to-face testing time.

A neuropsychological test score was considered impaired if it was at or below the 5th percentile of the normative data; patients impaired on two or more tests were defined as cognitively impaired. Italian normative data are available for COWAT [26], JLO [27], PASAT [28], SDMT [29] and the DKEFS sorting test [30]; for CVLT-II and BVMT-R, we used the US normative samples [15, 16].

On the basis of the number of test failures, patients were classified as mildly (two test failures), moderately (three test failures) or severely (four or more test failures) impaired, following the paper of Nocentini et al. [31]. This classification reflects different levels of cognitive deterioration in order to highlight different degrees of severity of cognitive dysfunction.

In addition to the number of tests failed, the number of cognitive domains impaired was considered. More specifically, a domain was considered impaired when at least one test in the domain was impaired. MS patients were considered multi-domain cognitively impaired (mDCI) when at least two domains were impaired. We considered five cognitive domains: verbal memory [CVLT total learning (TL); CVLT delayed recall (DR)], visual memory [BVMT total learning (TL); BVMT delayed recall (DR)], information processing speed (PASAT-3; PASAT-2; SDMT), executive functions (DKEFS-CS; DKEFS-DS; COWAT) and visuospatial perception (JLO).
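The two classification rules described above (severity by number of failed tests, and mDCI by number of impaired domains) can be sketched as follows. This is a minimal illustration, not the authors' scoring software: the function name `classify` and the input layout (a mapping from test label to a flag meaning "score at or below the 5th percentile") are assumptions made for the example.

```python
# Domain membership as listed in the text; the data layout is hypothetical.
DOMAINS = {
    "verbal memory": ["CVLT-TL", "CVLT-DR"],
    "visual memory": ["BVMT-TL", "BVMT-DR"],
    "information processing speed": ["PASAT-3", "PASAT-2", "SDMT"],
    "executive functions": ["DKEFS-CS", "DKEFS-DS", "COWAT"],
    "visuospatial perception": ["JLO"],
}


def classify(impaired):
    """Classify one participant from a dict: test label -> True if impaired."""
    n_failed = sum(1 for failed in impaired.values() if failed)

    # Severity by number of failed tests (two, three, four or more).
    if n_failed >= 4:
        severity = "severe"
    elif n_failed == 3:
        severity = "moderate"
    elif n_failed == 2:
        severity = "mild"
    else:
        severity = "not impaired"

    # A domain counts as impaired when at least one of its tests is impaired.
    impaired_domains = [
        name
        for name, tests in DOMAINS.items()
        if any(impaired.get(test, False) for test in tests)
    ]

    return {
        "tests_failed": n_failed,
        "severity": severity,
        "impaired_domains": impaired_domains,
        "mDCI": len(impaired_domains) >= 2,
    }


# Hypothetical participant failing two tests within the same domain.
same_domain = classify({"CVLT-TL": True, "CVLT-DR": True})
# Hypothetical participant failing two tests in different domains.
two_domains = classify({"CVLT-TL": True, "SDMT": True})
```

In this sketch the first participant is cognitively impaired (mild) but not mDCI, because both failures fall in verbal memory, whereas the second is both.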

Translation and cross-cultural adaptation

Italian versions of COWAT [26], JLO [27], PASAT [28], SDMT [29] and DKEFS sorting test [30] were available.

To preserve semantic equivalence in the remaining tests (BVMT-R, CVLT-II), bilingual translators with a good understanding of Italian and English first translated the tests into Italian. Next, back translation into English was performed by independent translators with a good understanding of Italian and English. To ensure their equivalence, the original and back-translated versions were compared.

BVMT-R is a visual–spatial test; thus only the instructions needed translation into Italian. Moreover, the stimuli have no semantic associations in the Italian culture or language. For CVLT-II, an iterative process of modification, including cultural adaptation of some words, was undertaken, as this test emphasizes verbal stimuli. Each word of the English version of List A was replaced by an Italian word chosen to preserve the test’s properties, matching word frequency and keeping an appropriate similarity of meaning.

For every test, care was taken to keep the original items wherever possible, given the resemblance between the two languages.

Statistical analysis

Data were presented as mean (SD) or, if necessary, as median (min–max). A log-transformation was applied to some variables to improve the fit to a Gaussian distribution, to limit the undue influence of extreme values and to reduce heteroscedasticity in the residuals. Group differences on demographic and clinical data were assessed using parametric tests (Student’s t test or univariate ANOVA) or, if necessary, nonparametric tests (Mann–Whitney U test, Kruskal–Wallis test or chi-square test). The correlations among all neuropsychological subtests and demographic and clinical data [BDI log scale (log-BDI)] were evaluated using Pearson’s or Spearman’s correlation coefficients. Analyses of covariance (ANCOVA) were performed to evaluate the differences in test performance among the HC and diagnosis groups (Relapsing and Progressive groups), adjusting for sex, age and log-BDI. To describe the effect size, Cohen’s d was calculated; it is the difference between means divided by the pooled SD, and its magnitude was assessed using the thresholds provided in Cohen [32].
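As an illustration of the effect size defined above, the following minimal sketch computes Cohen's d as the difference between two group means divided by the pooled SD. The scores are invented for the example and are not study data; the labelling function uses the conventional cut-offs of 0.2, 0.5 and 0.8 for small, medium and large effects.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def magnitude(d):
    """Label the absolute effect size with the conventional thresholds."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Invented scores for two small groups (not data from the study).
controls = [52, 48, 55, 50, 45]
patients = [44, 40, 47, 42, 37]

d = cohens_d(controls, patients)
print(round(d, 2), magnitude(d))
```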

Because scores on four measures (BVMT-TL, BVMT-DR, PASAT-3 and PASAT-2) showed a right-skewed distribution inflated at the minimum value, dedicated statistical models were used in addition to the general linear models previously applied. Specifically, zero-truncated negative binomial (ZTNB) regression was used to model BVMT-TL and BVMT-DR data (a value of 0 could not occur, and the distributions were inflated at their minimum value of 20).

PASAT-3 and PASAT-2 data, by contrast, were semi-continuous, with a continuous distribution apart from a probability mass at 0. A two-part model was therefore applied, separating the modelling into two stages. In the first stage, a binary model (logistic regression) was used for the dichotomous event of a zero versus a positive score; in the second stage, conditional on a positive value, data were modelled using a log-gamma generalized linear model. In the second part of the model, sex, age and log-BDI were considered as covariates.
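The logic of a two-part model can be illustrated with a deliberately simplified, covariate-free sketch. It assumes that group membership is the only predictor and that "log-gamma" denotes a gamma model with a log link; under those assumptions the first part reduces to the odds ratio of a zero score from a 2 × 2 table, and the second part to the ratio of the group means of the positive scores. The published analysis also adjusted for covariates, which this sketch does not attempt. All scores and function names below are invented for the example.

```python
from statistics import mean


def split_scores(scores):
    """Separate the point mass at zero from the positive scores."""
    positives = [s for s in scores if s > 0]
    return len(scores) - len(positives), positives


def two_part_summary(group, reference):
    """Covariate-free two-part comparison of a group against a reference."""
    zeros_g, pos_g = split_scores(group)
    zeros_r, pos_r = split_scores(reference)

    # Part 1: odds of scoring zero, compared between the two groups.
    odds_ratio_zero = (zeros_g / len(pos_g)) / (zeros_r / len(pos_r))

    # Part 2: among positive scores only, ratio of the group means.
    mean_ratio_positive = mean(pos_g) / mean(pos_r)

    return odds_ratio_zero, mean_ratio_positive


# Invented scores (not data from the study).
controls = [0, 30, 35, 40, 45, 50, 38, 42, 47, 33]
patients = [0, 0, 0, 0, 20, 25, 30, 35, 28, 32]

or_zero, ratio_pos = two_part_summary(patients, controls)
print(round(or_zero, 2), round(ratio_pos, 2))
```

Keeping the two parts separate makes it possible for a group to differ in the chance of a zero score without differing in the level of its positive scores, or the reverse.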

A logistic regression model was applied to analyse the association between CImp (two or more tests below the cut-off) and diagnosis groups (HC, Relapsing and Progressive group), adjusting for sex, age and log-BDI.

Construct validity analysis was performed using principal component analysis (PCA) with varimax rotation. All neuropsychological (NP) tests were included. As a criterion to choose the number of components, we required that at least 80 % of the total variance be explained. We considered NP tests with a loading ≥0.5 on a component to be the most representative of that component. Overall, a p value of less than 0.05 was considered significant, and a Bonferroni adjustment was applied when performing multiple comparisons. All statistical analyses were performed using SPSS 16, except for the ZTNB regression, which was performed using STATA 10.
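The rule of retaining enough components to explain at least 80 % of the total variance can be sketched as below. The eigenvalues are hypothetical: they were chosen only so that 11 standardized variables give a plausible illustration, and they are not taken from the study output.

```python
def components_needed(eigenvalues, target=0.80):
    """Smallest number of components whose cumulative share of the total
    variance reaches the target, together with that cumulative share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k, cumulative / total
    return len(ordered), 1.0


# Hypothetical eigenvalues for 11 standardized variables (they sum to 11).
eigenvalues = [5.687, 1.32, 1.05, 0.776, 0.60, 0.45, 0.38, 0.30, 0.22, 0.14, 0.077]

k, share = components_needed(eigenvalues)
print(k, round(share, 3))
```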

Results

Overall, MS patients and HCs were matched by age, sex and years of education. When the Relapsing and Progressive groups were considered separately, a significant age difference was observed; in particular, the Progressive group was older than the other groups (HC vs Progressive group, p < 0.001; Relapsing vs Progressive group, p < 0.001). For sex and years of education, no differences were observed (see Table 1). A significant difference in log-BDI scores was found between MS patients and HCs (p = 0.003). We also evaluated the correlation between MACFIMS scores and depression (BDI score on a log scale). We observed only weak, though significant, correlations of log-BDI with PASAT-2 (p = 0.045), DKEFS-CS (p = 0.006), DKEFS-DS (p = 0.015), JLO (p = 0.019) and COWAT (p = 0.035). The statistical analyses took this difference into account (see “Statistical analysis” section).

Cognitive performance on all tests administered to MS patients and HCs is shown in Table 2a. ANOVA showed significant between-group effects for all tests. Effect sizes ranged from 0.33 for PASAT-3 to 1.18 for DKEFS-DS. Moreover, to highlight any differences between subgroups, we considered multiple comparisons (see Table 2b). ANCOVA showed that all tests significantly discriminated HCs from the Relapsing group and HCs from the Progressive group; CVLT-TL (p = 0.014) and BVMT-TL (p < 0.001) also significantly discriminated the Relapsing from the Progressive group. However, when adjusted for EDSS, the differences in BVMT-TL and CVLT-TL did not remain significant (p = 0.587 and p = 0.362, respectively).

Table 2 Neuropsychological tests scores comparing MS patients and HC

Because the PASAT and BVMT-R distributions were inflated at their minimum values (PASAT at zero and BVMT-R at 20), the semi-continuous and zero-truncated models were applied (see “Statistical analysis” section). The zero-truncated model indicated that the disease had a significant effect on BVMT-TL scores. According to this model, the expected BVMT-TL score for the Relapsing group was equal to 0.87 (95 % confidence interval (CI) = 0.78–0.96) times the reference value of the HC group (T score = 50). Thus, we estimated that BVMT-TL in the Relapsing group was 43.4 (95 % CI = 42.8–52.6). Similarly, the expected BVMT-TL score in the Progressive group was 37.6 (95 % CI = 32.2–45), holding constant all other variables in the model. For the BVMT-DR score, the model also revealed an effect of the disease. The expected BVMT-DR score for the Relapsing group was equal to 0.86 (95 % CI = 0.78–0.95) times the reference value of the HC group (T score = 54.7), so the estimated BVMT-DR score in the Relapsing group was 46.9 (95 % CI = 42.5–51.8). The expected BVMT-DR score in the Progressive group was 37.6 (95 % CI = 32.0–44.2), holding constant all other variables in the model.
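The step from a multiplicative model coefficient to an expected score is a simple rescaling of the reference value. The sketch below applies it to the BVMT-DR figures quoted above for the Relapsing group (ratio 0.86, 95 % CI 0.78–0.95, HC reference T score 54.7). The function name is hypothetical, and because the inputs are the rounded values printed in the text, the output is expected to differ slightly from the reported 46.9 (42.5–51.8).

```python
def expected_score(reference, ratio, ratio_low, ratio_high):
    """Scale the reference-group value by a rate ratio and its CI bounds."""
    return reference * ratio, reference * ratio_low, reference * ratio_high


# Rounded values as printed in the text for BVMT-DR, Relapsing vs HC.
estimate, lower, upper = expected_score(54.7, 0.86, 0.78, 0.95)
print(round(estimate, 1), round(lower, 1), round(upper, 1))
```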

For PASAT-3, the two-part model showed a significant association between the probability of a zero score and the diagnosis, in particular for the Progressive group (OR for Relapsing vs HC = 3.42, 95 % CI = 0.95–12.25, p = 0.177; OR for Progressive vs HC = 14.93, 95 % CI = 3.67–60.79, p < 0.001), but when a positive score was obtained, no difference could be attributed to the disease.

With regard to PASAT-2, the two-part model suggested that the diagnosis was associated with a high probability of a zero response (OR for Relapsing vs HC = 6.41, 95 % CI = 1.44–28.79, p = 0.045; OR for Progressive vs HC = 22.79, 95 % CI = 4.53–114.65, p < 0.001) and was also significantly related to positive values, although only two HCs obtained a 0 score. In particular, the Relapsing group had a lower probability than HCs of obtaining a high positive score on PASAT-2; in the Progressive group, a significant effect did not emerge because a high percentage of MS patients obtained 0 scores. The patients with a 0 score on the PASAT-2 were unable to perform the test because of general cognitive deficits. The two HCs, instead, refused to perform the task because of the difficulty of the calculations and the resulting stress.

Based on scores equal to or below the 5th percentile, frequencies of impairment in MS patients were as follows: BVMT-TL (total learning) 35.4 %, PASAT-3 32.3 %, PASAT-2 34.6 %, SDMT 37.7 %, BVMT-DR (delayed recall) 25.4 %, CVLT-DR (delayed recall) 24.6 %, JLO 26.9 %, CVLT-TL (total learning) 16.9 %, DKEFS-DS 49.2 %, DKEFS-CS 37.7 % and COWAT 20.8 % (Fig. 1). Generally, the Relapsing group performed better than the Progressive group, as shown in Fig. 2.

Fig. 1 Frequencies of impairment using a cut-off of the 5th percentile

Fig. 2 Frequencies of cognitive impairment in MS patients and HC

According to the definition of CImp given above, 70.8 % of patients were impaired. Graded by severity, 18.5 % showed a mild level of impairment (two tests failed), 10.0 % a moderate level (three tests failed) and 42.3 % a severe level (four or more tests failed). Regarding the number of cognitive domains impaired, 57.7 % of MS patients showed at least two compromised domains (20 % mildly, 16.9 % moderately and 20.8 % severely impaired).

The logistic regression analysis, with CImp as the dependent variable, revealed a significant effect of the disease (OR for MS patients vs HC = 14.79, 95 % CI = 6.02–36.31, p < 0.001); considering diagnosis groups (HC, Relapsing and Progressive groups), we observed that the odds of failure on two or more tests, relative to HCs, were higher in the Progressive than in the Relapsing group (OR for Relapsing vs HC = 13.88, 95 % CI = 5.47–35.19, p < 0.001; OR for Progressive vs HC = 19.00, 95 % CI = 4.83–74.70, p < 0.001). Considering only MS patients, the association between the probability of failure on two or more tests and the disease course did not remain after adjustment for disease duration (OR for Relapsing vs Progressive = 1.37, 95 % CI = 0.35–5.43, p = 0.651).
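Odds ratios from logistic regression are usually reported with intervals that are symmetric on the log scale, so the point estimate sits at the geometric mean of the two bounds. The sketch below, which assumes Wald-type 95 % intervals (an assumption, since the interval type is not stated in the text), recovers the point estimate and the standard error of the log odds ratio from the three intervals quoted above. The function name is hypothetical.

```python
from math import exp, log


def from_interval(ci_low, ci_high, z=1.96):
    """Recover the odds ratio and the SE of log(OR) from a Wald-type CI."""
    log_low, log_high = log(ci_low), log(ci_high)
    point = exp((log_low + log_high) / 2)      # geometric mean of the bounds
    std_error = (log_high - log_low) / (2 * z)  # half-width on the log scale
    return point, std_error


# Intervals as quoted in the text (each comparison is against HC).
intervals = {
    "MS patients": (6.02, 36.31),
    "Relapsing": (5.47, 35.19),
    "Progressive": (4.83, 74.70),
}

recovered = {name: from_interval(*bounds) for name, bounds in intervals.items()}
for name, (point, std_error) in recovered.items():
    print(name, round(point, 2), round(std_error, 2))
```

The recovered point estimates agree with the reported odds ratios of 14.79, 13.88 and 19.00 to within rounding, and the wider interval for the Progressive group reflects its larger standard error.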

Principal component analysis (PCA)

To evaluate construct validity, PCA was performed. Four components were obtained for MS patients (Table 3): (1) visual–spatial memory and processing speed, (2) working memory, (3) executive functions and (4) verbal memory. Overall, the four components explained 80.3 % of the variance of the 11 NP subtests included in the analysis, and the first component alone explained 51.7 % of the total variance. PASAT-3 and PASAT-2, in addition to the second component, also loaded moderately onto the first component (0.370 and 0.321), being measures of processing speed. Similarly, the two CVLT-II variables loaded moderately onto the first component (0.317 each), as measures of memory. COWAT loaded moderately onto the third component (0.370) as a measure of executive functions. Overall, the communality for JLO and COWAT was lower than for the other tests (0.58 and 0.48, respectively), indicating that the four components explained a modest percentage of the variance of these two tests.
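The share of a test's variance explained by the retained components (its communality) is the sum of its squared loadings across those components, and the ≥0.5 rule picks out the components a test is taken to represent. The sketch below illustrates both steps with one invented row of loadings; only the general method is taken from the text, and the numbers are not those of Table 3.

```python
def communality(loadings):
    """Sum of squared loadings of one test across the retained components."""
    return sum(value * value for value in loadings)


def representative_components(loadings, threshold=0.5):
    """1-based indices of the components on which the test loads >= threshold."""
    return [
        index
        for index, value in enumerate(loadings, start=1)
        if abs(value) >= threshold
    ]


# Invented loadings of a single test on four rotated components.
example_loadings = [0.55, 0.15, 0.37, 0.14]

h2 = communality(example_loadings)
main = representative_components(example_loadings)
print(round(h2, 2), main)
```

A communality near 0.5, as in this invented row, means that about half of the test's variance is left unexplained by the four components.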

Table 3 Results of Principal Component Analysis

Discussion

Our study has shown the validity of the Italian version of the MACFIMS battery. Our data confirmed the group differences between MS patients and HCs in cognitive performance as assessed by the MACFIMS. Overall, compared with the Relapsing group, the Progressive group performed worse on every test considered [33, 34] (Fig. 2). Effect sizes ranged from medium to very large, and verbal memory and executive function tests were most sensitive to the disease.

In line with previous literature [1, 7, 10], more than half of MS patients (70.8 %) exhibited CImp (at least two tests failed); 18.5 % were mildly, 10 % moderately and 42.3 % severely compromised. The frequency of impairment can be expected to differ across studies because of variability in testing methods and sample composition. This sample is representative of patients attending a hospital-based MS centre. Lower frequencies of impairment would be expected in a population-based sample, and higher frequencies might be found in samples of patients seen only for specified clinical purposes.

The cognitive functions most affected were visual memory and information processing speed/executive functions (Fig. 1). Considering the number of cognitive domains impaired, we found that 57.7 % of MS patients showed at least two compromised domains (20.0 % mildly, 16.9 % moderately and 20.8 % severely). This classification may be more specific in identifying MS patients with a clear cognitive impairment; in fact, patients who failed two tests in the same domain are not considered mDCI.

In the PCA, we identified four components: visual–spatial memory/processing speed; working memory; executive functions; verbal memory. The first component explained 51.7 % of the total variance. This high percentage could be due to the number of variables with non-negligible loadings on this component: BVMT-R, SDMT and COWAT had their highest loadings on C1, and PASAT and CVLT-II showed loadings higher than 0.3. Our data identified two clear components for verbal episodic memory and executive function, and a more general component involving processing speed, visual memory and spatial functions. It is important to clarify that the MACFIMS represents a minimal assessment of cognitive functions in MS [13] and does not assess cognitive functioning in a comprehensive manner. Therefore, these results should not be viewed as a comprehensive representation of cognitive functioning in MS. However, we observed two separate components representing verbal memory and executive functions, the two domains whose tests were most sensitive to the disease in our study.

There are some limitations in our study. We enrolled MS patients at three MS centres, all in Rome; to increase the external validity of the study, it would be desirable in future to recruit MS patients at centres in other Italian cities. Moreover, the high percentage of cognitive impairment observed could reflect recruitment at hospital MS centres that pay particular attention to cognitive assessment. Another limitation was the lack of data on MS patients’ cognitive reserve and its correlation with the neuropsychological results; cognitive reserve could indeed be an important factor in explaining the cognitive profile in MS. Finally, the limited size of our HC sample did not allow us to provide Italian normative data for the CVLT-II and the BVMT-R, so we used US normative data for these tests.

In conclusion, our study is the first to validate an Italian version of the MACFIMS. Several aspects of validity have been demonstrated: criterion (all MACFIMS subtests discriminate MS patients from HCs) and, partially, construct (factorial structure of the MACFIMS). Future work will investigate the test–retest reliability of these tests, their relationship with vocational status (since cognitive dysfunction places a major burden on the level of employment reached) and the longitudinal course of NP dysfunction in MS using these measures.