Introduction

Multiple sclerosis (MS) is a chronic autoimmune disease characterised by inflammatory demyelination affecting the white and grey matter of the central nervous system. It typically affects multiple functional systems, resulting in a variety of symptoms, which may include sensory disorders, muscle weakness, ataxia, tremor and, especially, lower and upper limb impairments [1]. Hand movements are essential in daily living, as demonstrated by Johansson et al. [2]. Although disability in people with MS (PwMS) is thought to be mainly related to ambulation, upper limb function is also often impaired in MS subjects [3]. Moreover, impaired upper limb functioning may negatively affect the quality of life of PwMS [4, 5], especially because many everyday tasks are bimanual, such as changing clothes, toileting or washing both hands [6]. As reported by Bertoni et al. [7], 75 out of 110 subjects with moderate disability on the Expanded Disability Status Scale (EDSS) [8] had bilaterally impaired manual dexterity, as assessed with the Nine Hole Peg Test (9-HPT), and this impairment correlated with ability in activities of daily living.

Recently, following a review of clinical tools for measuring upper limb function in MS, the 9-HPT has been considered the gold standard for upper limb assessment [9], especially for its close correlation with performance in activities of daily living and with quality of life [10]. Moreover, normative data for the 9-HPT have recently been published [11].

Based on a recent review [12], the most used patient-reported outcome measures of perceived upper limb function in MS are the Disabilities of the Arm, Shoulder and Hand (DASH), the Manual Ability Measure-36 (MAM-36) and the Abilhand questionnaires [13]. Despite the importance of upper limb function as a predictor of patients’ outcomes [14], in a recent overview of patient-reported outcome measures of manual ability, Kahraman et al. [15] concluded that no MS-specific scales are available. The MAM-36 has been used to assess subjective upper limb function in several neurological and non-neurological diseases, such as Charcot–Marie–Tooth disease [16], rheumatoid arthritis [17], acquired brain injuries [18], spinal cord injuries [18] and upper extremity orthopaedic disease [18, 19]. The psychometric validation of the MAM-36 [18] was performed on 337 subjects affected by a variety of neurological and musculoskeletal disorders, including a small sample of 44 subjects with MS. The correlation between the MAM-36 and the EDSS was reported in 44 PwMS [20].

A validated version of the MAM-36 is not currently available for the Italian context. The aim of the study was to translate the MAM-36 to Italian and explore its psychometric properties in a large sample of PwMS, and to investigate construct validity by examining its association with demographic and clinical variables, and the 9-HPT. For the purpose of the present study, the psychometric validation of the MAM-36 was performed using a combination of confirmatory factor analysis (CFA) and Rasch modelling techniques.

Research plan and methods

Translation and adaptation of the MAM-36

The MAM-36 includes 36 items assessing perceived ease or difficulty in performing common tasks (e.g. eating, dressing, buttoning clothes) using one’s hands, regardless of which hand is used and excluding the use of adaptive equipment. Items are rated on a 4-point scale from 1 (cannot do it) to 4 (easy); a zero-response option is also included, indicating tasks that are almost never performed, with or without hand impairment. Scores on the 36 items are summed to create a raw total score with a range from 36 to 144. The MAM-36 was translated into Italian by a professional translator with knowledge of health terminology. The translation was evaluated to ensure semantic equivalence and acceptability. During an initial meeting of MS experts, a list of possible alternatives for the controversial item stems and response choices was developed. Problematic items and response choices were retranslated into Italian from the original version, and a definitive version was determined by consensus. The questionnaire was then completed by 5 MS patients, and a second experts’ meeting followed. Subsequently, the Italian version was back-translated into English and compared with the original one.

Subjects

Study participants were a consecutive sample of patients followed at five MS outpatient clinics: “Department of Rehabilitation, Mons. Luigi Novarese Hospital” of Moncrivello; “Department of Neurology, University of Catania”; “Don Gnocchi Foundation” of Milano; “Rehabilitation Service of Liguria of the Italian Multiple Sclerosis Society (AISM)” of Genova; and “Sant’Andrea Hospital” of Rome. The ethics committee of each participating centre approved the study (P.R.196REG2015). Signed informed consent was obtained from each patient prior to enrolment in the study, in accordance with the Declaration of Helsinki.

Inclusion criteria were as follows: a minimum age of 18, a diagnosis of MS according to the revised McDonald criteria at the time of recruitment [21], a stable disease course without worsening of more than 1 EDSS point over the last 3 months, being relapse free at the time of enrolment into the study, a completed MAM-36, and the ability to understand and provide signed informed consent. Exclusion criteria were the presence of bilateral plegia, and orthopaedic, neurological or other diseases besides MS. For each patient, we also collected demographic (age, gender) and clinical characteristics (disease course and duration, EDSS), and the 9-HPT.

Strategy of analyses

First, we computed descriptive statistics for the study measures. Next, we investigated the psychometric properties of the MAM-36. As a first step, we examined the fit of the MAM-36 data to a one-factor confirmatory factor analysis (CFA) model. The CFA was performed in Mplus 7.3 with the WLSMV estimator. We evaluated model fit using the comparative fit index (CFI), the Tucker-Lewis index (TLI) and the root mean-square error of approximation (RMSEA). We considered values of CFI > 0.95, TLI > 0.95 and RMSEA < 0.05 as indicating good model fit, and CFI and TLI > 0.90, and RMSEA < 0.08, as indicating acceptable fit [24]. Significant standardised item loadings > 0.70 were considered to indicate good convergent validity, while values equal to or below 0.70, but exceeding 0.50, were considered acceptable [25]. Values of average variance extracted (AVE) ≥ 0.50 and of the omega (ω) composite reliability coefficient ≥ 0.70 were considered further evidence of convergent validity of the item set [25, 26, 27].
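The fit and convergent-validity thresholds above can be illustrated with a minimal sketch. It assumes the usual formulas for a one-factor model with standardised loadings (AVE as the mean squared loading, and ω with uncorrelated residual variances equal to 1 minus the squared loading); the function names and the example loadings are hypothetical and are not taken from the study or from Mplus output.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def omega_composite_reliability(loadings):
    """Omega for a one-factor model with standardised loadings.

    Assumes uncorrelated residuals with variance 1 - loading**2.
    """
    squared_sum = sum(loadings) ** 2
    residual_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + residual_variance)


def classify_fit(cfi, tli, rmsea):
    """Apply the good / acceptable thresholds stated in the text."""
    if cfi > 0.95 and tli > 0.95 and rmsea < 0.05:
        return "good"
    if cfi > 0.90 and tli > 0.90 and rmsea < 0.08:
        return "acceptable"
    return "below acceptable thresholds"


# Hypothetical loadings, for illustration only.
example_loadings = [0.80, 0.75, 0.60, 0.85]
ave = average_variance_extracted(example_loadings)
omega = omega_composite_reliability(example_loadings)
convergent_ok = ave >= 0.50 and omega >= 0.70
```

Note that the strict inequality on RMSEA means a value of exactly 0.05 falls in the "acceptable" band under these rules.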

Next, we implemented the Rasch rating scale model [28] to examine the functioning of the MAM-36 scale. Adequate functioning of the 4-point rating scale was established if results met the following criteria: (1) a minimum of ten responses in each rating category, (2) rating category measures increasing monotonically and (3) Outfit mean-square values for each rating category below 2.0 [29]. Adequate item fit was determined by Infit and Outfit mean-square values in the range of 0.6 to 1.4 [30], while items showing values beyond this range but not exceeding 2.0 were considered unproductive for measurement, but not degrading [31]. Reliability of person scores was determined using the Rasch reliability index, assuming values ≥ 0.90 as appropriate for clinical applications [32]. Additionally, we computed the person separation index and used it to determine the number of statistically distinguishable ability groups (i.e. person strata [33]). The dimensionality of the scale was examined by performing a principal component analysis (PCA) on Rasch residuals. Unidimensionality was established if Rasch measures explained at least 40% of the variance in the data, and the first contrast had an eigenvalue lower than or equal to 2.0 and accounted for less than 5% of the total variance and less than 10% of the unexplained variance [34, 35]. Rasch analyses were performed using Winsteps 3.68.2.
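As an illustration of the model family used here, the sketch below computes category probabilities under the Andrich rating scale model, where the log-odds of adjacent categories depend on person ability, item difficulty and a set of thresholds shared by all items. It also encodes the item-fit rule stated above. This is a minimal sketch and not the Winsteps estimation procedure; the function names and all numeric values are hypothetical, and the four MAM-36 response options (1 to 4) are indexed 0 to 3.

```python
import math


def rsm_category_probabilities(theta, item_difficulty, thresholds):
    """Category probabilities under the rating scale model (logit units).

    thresholds holds the m step parameters shared by all items, giving
    m + 1 response categories (three thresholds for a 4-point scale).
    """
    # Cumulative sums of (theta - difficulty - tau_j); category 0 is the empty sum.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - item_difficulty - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def classify_item_fit(infit, outfit, low=0.6, high=1.4, degrading=2.0):
    """Apply the mean-square fit rule described in the text."""
    if low <= infit <= high and low <= outfit <= high:
        return "productive"
    if max(infit, outfit) <= degrading:
        return "unproductive, not degrading"
    return "degrading"


# Hypothetical person, item and thresholds, for illustration only.
probs = rsm_category_probabilities(theta=2.0, item_difficulty=0.5,
                                   thresholds=[-1.0, 0.0, 1.0])
```

With ability well above item difficulty, as in this hypothetical call, most of the probability mass falls on the highest category, which is the pattern that produces a ceiling effect when it holds for most items.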

Next, we examined ceiling and floor effects of the MAM-36 score, which we considered significant if more than 15% of patients reported either the minimum or the maximum possible score. Additionally, we computed 25th, 50th and 75th percentile values for the MAM-36 score, used them to stratify patients into four quartile ability groups and inspected the distribution of scores in each group.
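The ceiling and floor rule and the quartile stratification can be sketched as follows. This is an illustration under stated assumptions: the scores are hypothetical, the function names are invented for this example, and percentile conventions differ between packages, so cut points computed with Python's `statistics.quantiles` are not guaranteed to match SPSS output.

```python
import statistics


def extreme_score_shares(scores, min_score=36, max_score=144):
    """Return (floor_share, ceiling_share) for MAM-36 raw totals."""
    n = len(scores)
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return floor_share, ceiling_share


def has_significant_effect(share, threshold=0.15):
    """The effect counts as significant when more than 15% sit at the extreme."""
    return share > threshold


def quartile_group(score, cut_points):
    """Map a score to group 1..4 given the 25th, 50th and 75th percentiles.

    A score equal to a cut point is assigned to the lower group.
    """
    return 1 + sum(score > c for c in cut_points)


# Hypothetical raw totals, for illustration only.
scores = [144, 144, 130, 120, 110, 100, 90, 144, 125, 135]
floor_share, ceiling_share = extreme_score_shares(scores)
cut_points = statistics.quantiles(scores, n=4)
groups = [quartile_group(s, cut_points) for s in scores]
```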

Criterion validity of the MAM-36 score was investigated by examining its association with patients’ demographic and clinical variables, and 9-HPT scores for both arms. The association of the MAM-36 score with age, disease duration, EDSS and 9-HPT scores was examined using Spearman’s rank correlation coefficient. Gender differences on test scores were examined using Mann–Whitney U test. Association between the MAM-36 score and disease course was examined using Kruskal–Wallis test (with Dunn’s post hoc test).
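The rank-based statistics named above can be sketched in plain Python to show what they compute. This is an illustration only, not the SPSS procedure used in the study: it returns Spearman's coefficient and the tie-corrected Kruskal–Wallis H statistic but no p-values, and the data are hypothetical. In practice a statistics library would normally be used, for example `scipy.stats.spearmanr`, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal`.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def kruskal_wallis_h(*groups):
    """Tie-corrected H statistic (undefined if all pooled values are equal)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical data: higher disability paired with lower scores.
rho = spearman_rho([1.0, 2.5, 4.0, 6.0], [140, 131, 120, 98])
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```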

Finally, we inspected the distribution of MAM-36 scores among groups of patients showing different levels of impairment based on existing normative data for the 9-HPT. Scores were categorised as indicating an “overt” impairment if they were ≥ 2 SDs from the normative 9-HPT values [36]. Based on this threshold, we distinguished between patients showing unilateral overt impairment (N = 40), those showing bilateral overt impairment (N = 115), and those in whom neither arm showed overt impairment (N = 63), and investigated differences across these groups on the MAM-36 using the Kruskal–Wallis test (with Dunn’s post hoc test). Except where indicated, analyses were performed in SPSS, version 23.
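The grouping rule can be sketched as below. Two assumptions are made loudly here: because the 9-HPT is a timed test, "≥ 2 SDs from the normative values" is read as a completion time at least 2 SDs slower than the normative mean, and the normative mean and SD in the example are hypothetical placeholders, not the published norms cited as [36] (real norms typically vary by age, sex and hand).

```python
def is_overtly_impaired(time_seconds, norm_mean, norm_sd, n_sd=2.0):
    """True if the 9-HPT time is at least n_sd SDs slower than the norm.

    Assumption: impairment is one-sided (slower than normative).
    """
    return time_seconds >= norm_mean + n_sd * norm_sd


def impairment_group(right_time, left_time, right_norm, left_norm):
    """Classify a patient from both arms; norms are (mean, sd) tuples."""
    impaired_arms = (
        is_overtly_impaired(right_time, *right_norm)
        + is_overtly_impaired(left_time, *left_norm)
    )
    labels = (
        "no overt impairment",
        "unilateral overt impairment",
        "bilateral overt impairment",
    )
    return labels[impaired_arms]


# Hypothetical normative values (mean, SD) in seconds, for illustration only.
HYPOTHETICAL_NORM = (18.0, 2.5)
group = impairment_group(25.0, 21.0, HYPOTHETICAL_NORM, HYPOTHETICAL_NORM)
```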

Results

Two hundred and eighteen patients were recruited. Descriptive statistics for patients’ demographic, clinical characteristics and the MAM-36 score are reported in Table 1.

Table 1 Descriptive statistics for recruited patients

Psychometric properties of the MAM-36

Results of the investigation of the psychometric properties of the MAM-36 are reported in Tables 2 and 3, and Fig. 1. As regards the CFA, results showed that the one-factor model had acceptable fit based on the recommended thresholds (CFI = 0.96, TLI = 0.96, RMSEA = 0.05). Standardised loadings were > 0.70 for all items except three (items 10, 18 and 31), whose loadings ranged from 0.58 to 0.65 (see Table 3). Combined with an AVE of 0.64 and excellent composite reliability (ω = 0.98), these results supported the convergent validity of the item set.

Table 2 MAM-36 response categories: number of responses, estimated Rasch measure and fit statistics
Table 3 MAM-36 items: one-factor CFA loadings, Rasch difficulty parameters and fit statistics
Fig. 1 Rasch item-person map for the MAM-36

Rasch analyses showed that the rating scale diagnostics met all three essential criteria for a functioning rating scale: there were more than 10 responses in each category, Rasch category measures increased monotonically, and Infit and Outfit mean-square statistics were within the suggested thresholds (Table 2). Item-level results are reported in Table 3. Item difficulty ranged from −1.84 to 1.59 logits; item 22 (“Carry a shopping bag with a hand loop”) was the most difficult item, while item 1 (“Eat a sandwich”) was the easiest. Compared with the distribution of person ability (see Fig. 1), which ranged from −1.89 to 5.43 logits with a mean of 2.32 logits, it is easy to see that most of the items were perceived by patients as relatively easy, as the average person ability was more than 2 logits above the average item difficulty. The Rasch person separation reliability index was 0.91, indicating excellent reliability. The person separation index was 3.21, resulting in a person strata value of 4.61, which indicates that approximately four distinct ability groups could be detected using MAM-36 scores.
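The relationship between the separation index, reliability and strata reported above can be checked with a short sketch. It assumes the standard Rasch formulas, reliability R = G² / (1 + G²) and strata H = (4G + 1) / 3, where G is the person separation index; the function names are chosen for this illustration.

```python
import math


def reliability_from_separation(separation):
    """Rasch reliability implied by a separation index G."""
    return separation ** 2 / (1 + separation ** 2)


def separation_from_reliability(reliability):
    """Inverse of the relationship above (requires reliability < 1)."""
    return math.sqrt(reliability / (1 - reliability))


def person_strata(separation):
    """Number of statistically distinct ability levels, (4G + 1) / 3."""
    return (4 * separation + 1) / 3


# Using the separation index reported in the text.
strata = person_strata(3.21)
reliability = reliability_from_separation(3.21)
```

Rounded to two decimals, these give 4.61 strata and a reliability of 0.91, in line with the values reported above.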

As regards item functioning, four items showed Infit and Outfit mean-square values sitting either above the upper limit (item 18: “Use a remote control”, item 31: “Write 3 to 4 sentences legibly”), or below the lower limit (item 1: “Eat a sandwich”; item 20: “Turn door knob to open a door”) for productive measurement. Interestingly, all the items showing problematic fit assessed prevalently unimanual activities. Because Infit and Outfit statistics for these items did not exceed the 2.0 threshold, we decided not to drop the items from the scale.

Results of the PCA performed on Rasch residuals indicated that the Rasch dimension explained 51.1% of the variance in the data. The eigenvalue for the first contrast was 2.9, thus exceeding the proposed criterion and suggesting the existence of a secondary dimension in the data with roughly the strength of three items. Items 22 (“Carry a shopping bag with a hand loop”) and 34 (“Use a hammer or screwdriver”) revealed positive loadings on the contrast, while items 32 (“Turn pages of a book”) and 36 (“Take a CD/DVD out of its case and put it into a player/drive”) showed negative loadings, possibly suggesting that patients had different familiarity with activities involving carrying heavy loads. However, the first contrast accounted for only 3.8% of the total variance and 8.6% of the unexplained variance, both below the suggested criteria, indicating no substantial violation of scale unidimensionality.
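The four dimensionality criteria can be laid out as a simple checklist, which makes it explicit which criterion was missed and which were met. This is a minimal sketch: the function name is hypothetical, and the explained-variance criterion is assumed to mean "at least 40%".

```python
def unidimensionality_check(explained_pct, contrast_eigenvalue,
                            contrast_pct_total, contrast_pct_unexplained):
    """Evaluate each criterion separately; True means the criterion is met."""
    return {
        # Assumption: the 40% criterion is read as "at least 40%".
        "measures_explain_enough": explained_pct >= 40.0,
        "eigenvalue_ok": contrast_eigenvalue <= 2.0,
        "total_share_ok": contrast_pct_total < 5.0,
        "unexplained_share_ok": contrast_pct_unexplained < 10.0,
    }


# Values reported in the text for the MAM-36.
check = unidimensionality_check(51.1, 2.9, 3.8, 8.6)
unmet = [name for name, met in check.items() if not met]
```

With the reported values, only the eigenvalue criterion is unmet, which matches the interpretation given above.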

Distribution of the MAM-36 score

In our sample, the MAM-36 raw score had a mean value of 126.46 (SD = 18.75; skewness = −1.56) and a median value of 131, with an observed range of 39–144 (see Table 1). We found indication of a significant ceiling effect (but no floor effect), as 18% of the patients (N = 40), above the 15% threshold, had the highest possible score on the scale. Consistently, based on the 25th, 50th and 75th percentiles computed for the MAM-36 score (25th pct = 116.75; 50th pct = 131.00; 75th pct = 141.25), most of the patients appeared to be clustered in the upper range of the score distribution.

Criterion validity of the MAM-36

Correlation analyses showed that the MAM-36 score had a weak negative correlation with age (r = −0.14, p < .05) and disease duration (r = −0.25, p < .01), and a moderate negative correlation with EDSS (r = −0.47, p < .01) and with the 9-HPT scores for both arms (right: r = −0.35, p < .01; left: r = −0.30, p < .01). There was no significant association between MAM-36 score and gender (p = .72). MAM-36 scores differed significantly among patients with different disease courses (relapsing-remitting: M = 129.05; secondary progressive: M = 121.64; primary progressive: M = 121.58; KW(2) = 8.75, p = .01); post hoc analyses showed that only relapsing-remitting and secondary progressive PwMS were statistically different (p = .01).

We also found a significant difference in MAM-36 score when comparing patients with different levels of upper limb impairment (no overt impairment: M = 131.22; unilateral overt impairment: M = 127.55; bilateral overt impairment: M = 123.28; KW(2) = 10.92, p < .01). However, only patients with bilateral overt impairment showed significantly different scores from patients with no overt impairment in either arm (p < .01).

Discussion

The present study had multiple aims. The first was to adapt the MAM-36 to Italian and to investigate its psychometric properties, including dimensionality, rating scale and item functioning, and targeting, in a sample of MS patients. The second was to examine the construct validity of the MAM-36 by investigating its association with theoretically related demographic and clinical criteria.

As regards the investigation of the psychometric properties of the instrument, the CFA and Rasch model analyses supported the unidimensionality of the scale, score reliability, and adequate functioning of response categories. Still, Rasch analyses showed a few items did not fully comply with proposed standards, thus potentially introducing undesired noise in the measurement process. Interestingly, all items showing problematic functioning assessed unimanual activities, as opposed to bimanual activities, which instead represent the majority of tasks assessed by MAM-36 items. Overall, given the relatively low number of items showing problematic functioning, and low item bias, we decided not to remove the items from the scale. Still, future studies employing large samples of MS patients should consider examining the feasibility of producing independent scores for unimanual and bimanual MAM-36 tasks, as well as exploring the use of alternative models including secondary ability dimensions [37].

As regards the targeting of the scale to the sample, both the Rasch analysis and the examination of the score distribution showed that the scale failed to target patients with low disability, a finding that has already been reported by other authors [18]. Based on the comparison of the Rasch item difficulty and person ability measures, most of the items seemed to cluster at the lower end of the ability spectrum, and none of the included items targeted individuals showing low disability. Consistently, the scale also showed indications of a ceiling effect. Taken together, these findings suggest that the scale might be more suitable for assessing upper limb impairment in patients with moderate-to-high disability. For use in an unselected MS population, a revision of the scale is advisable.

Lastly, criterion validity of MAM-36 scores was examined by investigating their association with the clinician-rated EDSS score and the 9-HPT scores for both arms, and results indicated a moderate convergence between the MAM-36 and these measures. As expected, the MAM-36 score also correlated negatively with age and disease duration, and was significantly lower among patients with secondary progressive MS than among relapsing-remitting patients.

Overall, findings from the present study indicate that MAM-36 shows adequate fit to Rasch assumptions and provides a reliable assessment for upper limb disability for MS patients, in particular in the medium-to-high area of the disability continuum.

Limitations

The present study is not without limitations. Primarily, at 218 patients, the sample size was relatively small, which may limit the robustness of our findings. However, it sits just above the threshold of 200 observations typically suggested for conducting CFA [38] and for obtaining stable parameter estimates in Rasch analyses [39]. Further, it is worth noting that Rasch mean-square fit statistics are expected to provide reliable information even at low sample sizes [40]. Still, studies performed on larger samples may help to further clarify the functioning of the scale among PwMS.

Conclusions

In summary, this study indicates that the Italian adaptation of the MAM-36 provides an assessment of upper limb function in MS that shows good measurement properties. The scale demonstrated excellent score reliability, scale unidimensionality and a well-functioning rating scale. Still, in line with previous findings, the scale showed indications of problematic targeting to patients with low disability. For this reason, use of the scale appears to be more suitable among patients with moderate-to-severe disability.