Introduction

Progression of disability is the most important outcome measure in multiple sclerosis (MS), but the individual course of MS is still unpredictable [1, 2]. As a result, distinguishing patients at high risk of disease progression from those at low risk is difficult, and better guidance is still needed for choosing the most appropriate and cost-effective among the several treatment options now available [3]. Evoked potentials (EP) have been used for more than 20 years in MS to confirm the involvement of sensory and motor pathways or to disclose clinically silent lesions [4]. The diagnostic value of EP in establishing the diagnosis of MS has been addressed by several studies. Patients with suspected MS and EP abnormalities had a 71% higher risk of clinical deterioration during a 2-year follow-up compared to individuals with normal EP [5]. Correlations between EP and the Expanded Disability Status Scale (EDSS) at first test and at 1 [1], 3 [4], 5, and 10 years [2] were also studied using nonparametric approaches which, however, showed insufficient predictive power when applied to individual cases. Although there is general agreement that EP reflect the integrity of sensory and motor pathways and that the combination of EP abnormalities correlates with disability, the usefulness of multimodal evoked potentials (mEP) in monitoring the evolution of multiple sclerosis has not yet been clarified [4]. Moreover, the diagnostic value of EP has been diminished by the inclusion of only visual EP (VEPs) in the McDonald’s diagnostic criteria since 2001 [6]. The aim of this study was to re-evaluate the usefulness of mEP for short-term prediction of the EDSS by considering mEP not as a single predictor but within a multivariate statistical approach derived from economics which can be easily implemented and tested. A two-step process was employed to this end. First, we carried out a thorough retrospective analysis of all MS patients who had serial multimodal EP tests performed in our lab since 1989 to describe time-dependent variations and correlations of disability scores and EP measures in a large clinical sample. Second, we developed a multivariate predictive model of EDSS at 1 year by selecting patients who had at least three EP tests at regular 1-year intervals.

Materials and methods

Patients

A total of 221 MS patients referred to our MS center during the period 1989–2008 for clinical, neuroimaging, and neurophysiological assessments (908 exams) were selected from our clinical database. To be included, patients needed to have a diagnosis of clinically definite MS according to research criteria at the time of assessment (McDonald’s criteria [6, 7] or Poser’s criteria [8] for those diagnosed before 2001) and at least two multimodal sensory EP tests with simultaneous clinical and EDSS assessments during the 20-year period. Results of incomplete EP tests as well as tests performed during a clinical relapse were not included. Data from a subgroup of 58 patients who were assessed yearly on three consecutive occasions were used to build and test the predictive model.

Sensory evoked potentials

Visual (VEP), auditory (BAER), and somatosensory (SEP) evoked potentials were recorded according to recommended protocols [9]. Stimulation and recording were carried out using PC-based neurophysiological recording systems (Nicolet® CA 2000 and XLTEK® Protector) by the same experienced technicians. SEPs were obtained by electrical stimulation of the median nerves at the wrists, and of the posterior tibial nerves at the ankles. Latencies of the main peripheral, spinal, and cortical components were measured along with the N20-N9 latency difference and N20-N25 amplitude difference for median nerve SEPs, and the P37-N19 latency difference and P37-N45 amplitude difference for tibial nerve SEPs. VEPs to black and white pattern-reversal stimulation with checks of 30 and 15 min of arc were recorded over Oz of the 10–20 international system, with Fz as the reference. The latency of the P100 component and the N75-P100 amplitude difference were measured. BAERs to clicks at 70 dB above subjective threshold with contralateral white noise masking were recorded at the Cz electrode referred to the ipsilateral and contralateral ears. The latencies of the main peaks I, III, and V, the inter-peak latencies (I–III, III–V, and I–V), and the I:V amplitude ratio were measured.

Multimodal sensory evoked potentials score (mEPS)

For all EP modalities, latencies and amplitudes (when measurable) were compared with normative data obtained in our laboratory. SEP latencies were corrected for body height. Absolute or inter-side difference values exceeding 2.5 SD of normal values, and absence or gross morphological abnormality of a major component, were considered abnormal. Abnormalities were quantified separately for each modality [VEP 30′, VEP 15′, BAER, SEP lower limb (LL), SEP upper limb (UL)], according to a six-point graded scale drawn from the work of Jung et al. [1] (0 = normal; 1 = pathological side difference of latency; 2 = latency above the normal range but below 1.1 of upper limit, or >50% side difference of amplitude; 3 = latency 1.1–1.3 of upper limit; 4 = latency above 1.3 of upper limit; 5 = absent EP component). The multimodal evoked potentials score (mEPS) was calculated as the sum of the left and right abnormalities in the five modalities (inter-side differences were included only for the pathological side); thus, the worst possible score was 50 (5 points × 2 sides × 5 EP modalities).
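The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the modality labels, and the rule that the worst applicable grade wins are hypothetical, and `upper_limit` stands for the laboratory's own upper limit of normal for the measured latency.

```python
from typing import Dict, Optional, Tuple

# Hypothetical labels for the five modalities scored in the text.
MODALITIES = ("VEP30", "VEP15", "BAER", "SEP_LL", "SEP_UL")


def grade_side(latency: Optional[float], upper_limit: float,
               latency_side_diff: bool = False,
               amplitude_side_diff: bool = False) -> int:
    """Grade one side of one modality on the 0-5 scale.

    latency=None marks an absent EP component. The side-difference
    flags should be set only for the pathological side.
    Assumption: when several criteria apply, the worst grade wins.
    """
    if latency is None:
        return 5                      # absent EP component
    ratio = latency / upper_limit
    if ratio > 1.3:
        return 4                      # above 1.3 of upper limit
    if ratio >= 1.1:
        return 3                      # 1.1-1.3 of upper limit
    if ratio > 1.0 or amplitude_side_diff:
        return 2                      # mildly prolonged, or >50% amplitude asymmetry
    if latency_side_diff:
        return 1                      # pathological side difference of latency only
    return 0                          # normal


def meps(grades: Dict[str, Tuple[int, int]]) -> int:
    """Sum the (left, right) grades over the five modalities (range 0-50)."""
    return sum(left + right for left, right in (grades[m] for m in MODALITIES))
```

For instance, a patient with an absent component on both sides in every modality would reach the maximum of 50, while a single mildly prolonged latency on one side would contribute 2 points.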

Statistics

In step 1, the impact of the type of disease course (i.e., relapsing-remitting, benign MS, secondary progressive, and primary progressive) on individual numeric variables (EDSS, mEPS, age, age at onset, and disease duration) was assessed by means of one-way ANOVA models. Correlations between experimental variables were assessed by bivariate parametric statistics (Pearson’s r), whereas an ordinary least-squares (OLS) linear regression analysis was applied to the whole-sample data, with the EDSS score as the dependent variable and mEPS, age, age at onset, gender, and type of disease course as the regressors. The model assumptions (normality, homoscedasticity, linearity, collinearity, specification errors, and independence of the data as well as outliers and leverage points) were all verified according to recommended procedures [10, 11]. As a result, a square-root transformation was applied to the mEPS and five outliers were removed, leaving a total of 903 observations for analysis. To deal with heteroscedasticity and non-independence of the data, the Huber–White sandwich estimator for robust standard errors with the cluster option was finally applied to the regression [10–12].
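As a rough illustration of the cluster-robust (Huber–White) idea, the following pure-Python sketch fits a regression with an intercept and a single regressor and computes sandwich standard errors with residual scores summed within each patient. It is a simplification and not the analysis reported here: the actual model had several regressors, the function name is hypothetical, and the finite-sample correction that packages such as Stata apply to clustered standard errors is omitted.

```python
import math
from collections import defaultdict


def cluster_robust_ols(x, y, cluster):
    """Fit y = a + b*x by OLS; return (a, b, se_a, se_b).

    Standard errors use the sandwich formula
    (X'X)^-1 [sum_g s_g s_g'] (X'X)^-1, where s_g is the sum of
    x_i * residual_i over the observations of cluster g.
    No finite-sample correction is applied (assumption of this sketch).
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of X'X for the design matrix with columns [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy

    # Accumulate the score vector of each cluster (e.g., each patient).
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - (a + b * xi)
        scores[g][0] += resid
        scores[g][1] += resid * xi

    # "Meat" of the sandwich: sum of outer products of cluster scores.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    var = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    # max(..., 0.0) guards against tiny negative values from rounding.
    return a, b, math.sqrt(max(var[0][0], 0.0)), math.sqrt(max(var[1][1], 0.0))


# Illustrative, made-up values: square-root transform of mEPS before fitting.
meps_scores = [0, 4, 9, 16, 1, 25]
edss_scores = [1.0, 2.0, 2.5, 4.0, 1.5, 6.0]
patient_ids = [1, 1, 2, 2, 3, 3]
sqrt_meps = [math.sqrt(m) for m in meps_scores]
fit = cluster_robust_ols(sqrt_meps, edss_scores, patient_ids)
```

Clustering on the patient identifier lets repeated tests of the same patient share correlated errors, which is why the point estimates are unchanged and only the standard errors differ from plain OLS.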

In step 2, the same OLS linear regression was applied to statistically balanced data (i.e., equal time intervals between successive EP tests) of a subsample of 58 patients. The dependent variable was again the EDSS score, whereas the quantitative regressors (l.EDSS, l.mEPS) were lagged by one period (i.e., 1 year or t−1), giving the following model:

$$ Y_{t} = \beta_{0} + \beta_{1} Y_{1,t-1} + \beta_{2} X_{2,t-1} + \beta_{3} X_{3,t} + \cdots + \beta_{6} X_{6,t} + \varepsilon_{t} \qquad (1) $$

where $Y$ is the EDSS; $Y_1$, the lagged EDSS; $X_2$, the lagged mEPS; $X_3$ to $X_6$, the fixed clinical variables; $\beta$, the regression coefficients; and $\varepsilon$, the error term. To verify the results, the difference between observed EDSS and estimated EDSS was individually calculated for all patients. All the analyses were performed with commercial packages (Stata® and SAS System®) with an academic license.
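The lag structure of Eq. (1) amounts to pairing each yearly assessment with the EDSS and mEPS of the previous year. A minimal sketch of that data preparation step is given below; the function name, the dictionary layout, and the numeric coding of the fixed clinical variables are assumptions made for illustration only.

```python
def build_lagged_rows(panel, fixed):
    """Turn yearly visits into (y_t, regressors) rows for a lagged model.

    panel: {patient_id: [(edss, meps), ...]} with visits ordered in time
           and spaced at equal (1-year) intervals.
    fixed: {patient_id: tuple of time-invariant covariates, already
           coded numerically (hypothetical coding)}.
    Each row is (EDSS at t, [EDSS at t-1, mEPS at t-1, *fixed covariates]).
    """
    rows = []
    for pid, visits in panel.items():
        # Pair every visit with the one that precedes it.
        for previous, current in zip(visits, visits[1:]):
            rows.append((current[0], [previous[0], previous[1], *fixed[pid]]))
    return rows


# Made-up example: one patient with three yearly tests yields two rows.
example_panel = {"p1": [(2.0, 4), (2.5, 6), (3.0, 9)]}
example_fixed = {"p1": (30, 0)}
example_rows = build_lagged_rows(example_panel, example_fixed)
```

A patient with three consecutive yearly tests therefore contributes two observations to the regression, which is why equally spaced tests are required.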

Results

Step 1

The demographic and clinical characteristics of the study samples are summarized in Table 1. Concerning the total sample, the relapsing–remitting (RR) course was the most represented (83.7%), as were female patients in all disease courses except in the secondary progressive (SP) group; patients with an SP or a benign course had longer disease durations. The largest time interval between age at onset and age at first EP test was observed in the benign MS group (about 14 years on average), the shortest in the primary progressive (PP, 3 years) group. As expected, mean EDSS and mEPS values were higher in the SP and PP groups. One-way ANOVA main effects of disease course were significant (all p < 0.0001) for age, age at first test, age at onset, duration of illness at first test, EDSS, and mEPS.

Table 1 Clinical findings of 221 MS patients analyzed in step 1 and of a subsample of 58 patients analyzed in step 2

EDSS and mEPS scores were moderately correlated (r = 0.44) both in the whole panel data (903 tests) and at the time of the first test (221 tests, r = 0.42), whereas among the other quantitative variables only age at onset showed a mild but significant correlation with EDSS (r = −0.15, p < 0.005).

The results of the OLS employing mEPS and the other clinical variables as regressors to predict EDSS are summarized in Table 2. Despite a significant overall F test, the coefficient of determination, which measures how well future EDSS scores are likely to be predicted by the model, was not satisfactory ($R^2$ = 0.24).

Table 2 Regression analysis performed in step 1

Step 2

In step 2, a subsample of 58 patients fulfilling the criterion of three consecutive tests performed at yearly intervals was drawn from the original panel data (Table 1). This subgroup did not show significant differences in EDSS and mEPS mean values by two-sample t tests when compared to the remaining 163 patients, while significant differences were found concerning duration of illness, age, and age at onset (two-sample t tests: all p < 0.001), as well as the distribution of gender and type of disease course (Chi-square tests: all p > 0.05). At the time of the first test, the duration of illness ranged between 1 and 15 years and was rather homogeneously distributed: 37.9% of patients had a disease duration between 1 and 5 years, 24.1% between 6 and 10 years, and 37.9% above 10 years. The group mean EDSS and mEPS showed a small (1%) increase from the first to the third year. As in the whole sample, mEPS at the time of the first test weakly correlated with EDSS (Pearson’s r = 0.34, p = 0.01) but after 1 and 2 years the cross-sectional correlations increased moderately (r = 0.44, p = 0.0004 and r = 0.50, p < 0.0001, respectively). Changes (delta) in EDSS scores after 1 and 2 years were not significantly correlated with the corresponding delta mEPS values. However, EDSS scores after 1 and 2 years were moderately correlated with mEPS at the first test and after 1 year, respectively (r = 0.38, p = 0.003 and r = 0.44, p = 0.0005); the best correlation was found between mEPS after 2 years and EDSS scores after 3 years (r = 0.53, p < 0.0001).

The results of the OLS linear regression analysis employing lagged mEPS, lagged EDSS, and clinical data as regressors for EDSS prediction are summarized in Table 3. Age was subsequently excluded as a regressor because it did not pass the Akaike Information Criterion and Bayesian Information Criterion tests [13]. The individual regression coefficients for lagged EDSS and lagged mEPS were significant, as was the overall F test. The value of the coefficient of determination indicated an overall goodness of fit compatible with an adequate prediction of EDSS scores ($R^2$ = 0.79).
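For readers unfamiliar with the two information criteria, the sketch below shows how they can be computed for a Gaussian OLS fit from the residual sum of squares. It is a generic illustration rather than the procedure used in this study: the function name and the input values are hypothetical, and `k` counts the estimated regression coefficients including the intercept (conventions differ on whether the error variance is also counted, which shifts all models by the same amount).

```python
import math


def gaussian_aic_bic(rss, n, k):
    """Return (AIC, BIC) for an OLS model with Gaussian errors.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated coefficients (intercept included).
    """
    # Maximized Gaussian log-likelihood expressed through the RSS.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * math.log(n)
    return aic, bic


# Hypothetical comparison: a regressor that barely lowers the RSS
# is penalized by both criteria (lower values are preferred).
with_regressor = gaussian_aic_bic(rss=40.0, n=100, k=6)
without_regressor = gaussian_aic_bic(rss=40.2, n=100, k=5)
```

A regressor is typically dropped when removing it lowers both criteria, that is, when its contribution to the fit does not outweigh the penalty for the extra parameter.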

Table 3 Regression analysis with lagged (L1) variables

To verify the above findings, the difference between fitted and observed values of EDSS assessed during 2008 was computed (Fig. 1). We arbitrarily chose an error threshold of ±0.5 as an acceptable individual error estimate of EDSS in clinical practice. Using this threshold, 72% of the patients had an error within ±0.5.
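The verification step reduces to counting how many individual prediction errors fall inside the chosen threshold. A minimal sketch follows, with a hypothetical function name and made-up values rather than the study data.

```python
def share_within_threshold(observed, predicted, threshold=0.5):
    """Fraction of patients whose |observed - predicted| EDSS is <= threshold."""
    errors = [o - p for o, p in zip(observed, predicted)]
    hits = sum(1 for e in errors if abs(e) <= threshold)
    return hits / len(errors)


# Made-up observed and predicted EDSS values for four patients.
observed_edss = [2.0, 3.0, 4.0, 6.0]
predicted_edss = [2.2, 2.5, 5.0, 6.4]
share = share_within_threshold(observed_edss, predicted_edss)
```

Because the EDSS moves in half-point steps, a threshold of ±0.5 corresponds to a prediction that is off by at most one step of the scale.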

Fig. 1 Histogram of differences between measured and predicted values. 72% of patients lie within the ±0.5 error threshold

Discussion

The relation between mEPS and EDSS was evaluated in this study by means of parametric approaches. In the retrospective study of 908 multimodal EP tests performed on 221 patients during the period 1989–2008, Pearson’s correlations confirmed previous findings of a moderate relationship between the two measures [1, 2, 4]. Furthermore, EDSS at the first EP examination also correlated moderately (r = 0.42) with mEPS, in line with the study by Leocani et al. [4], but at variance with two other studies [1, 2] showing no correlation. These apparently conflicting findings can be explained not only by differences in methods and variables, but also by the different mean duration of the disease at the time of the first EP test, which was longer than 5 years in our study and in that of Leocani et al. [4], but less than 2.5 years in those of Jung et al. [1] and Kallman et al. [2]. More specifically, a poor correlation between mEPS and EDSS during the first few years after disease onset is likely due to a substantial rate of subclinical lesions being detected only by EP tests, which only later progress into clinically overt signs caught by disability scales such as the EDSS [1, 14]. Accordingly, our result is consistent with a longer duration of the disease at the time of the first EP test, implying a change of the ratio of subclinical versus clinical lesions in favor of the clinical ones. Indeed, a significant correlation between EP scores and EDSS at 5 years was also found by Kallman et al. [2].

Despite significant bivariate correlations of moderate strength, the multivariate OLS analysis including mEPS along with several clinical variables as regressors failed to predict EDSS scores at an acceptable level ($R^2$ = 0.24; Table 2). Again, this result supports the previous consideration on the ratio of clinical versus subclinical abnormalities. We, therefore, reasoned that if EDSS measures overt clinical impairment while mEPS adds information concerning future clinical changes, because of its sensitivity to subclinical impairment, the linear relation between mEPS and EDSS should be better analyzed by time-lagging the former term. This reasoning was supported by the increasing values of the correlation coefficients in the cross-sectional yearly correlations between EDSS and mEPS, as well as between $\mathrm{mEPS}_{t-1}$ and $\mathrm{EDSS}_{t}$. Specifically, mEPS at first test ($\mathrm{mEPS}_{t}$) weakly correlated with $\mathrm{EDSS}_{t}$, but after 1 and 2 years the correlations increased. By the same token, EDSS after 1 year ($\mathrm{EDSS}_{t+1}$) weakly correlated with $\mathrm{mEPS}_{t}$, but after 2 years $\mathrm{EDSS}_{t+2}$ was moderately correlated with $\mathrm{mEPS}_{t+1}$, and this correlation further increased when $\mathrm{EDSS}_{t+3}$ and $\mathrm{mEPS}_{t+2}$ were compared (Fig. 2).

Fig. 2 a Cross-sectional correlations between EDSS and mEPS and b correlation between one-period lagged mEPS and EDSS during the observation period (2005–2008) in a subsample of 58 patients

In addition, the lagged-variables strategy made it possible to include lagged EDSS ($\mathrm{EDSS}_{t-1}$) as a lagged dependent variable, leading to a predictive multivariate model that uses lagged mEPS ($\mathrm{mEPS}_{t-1}$) and lagged EDSS ($\mathrm{EDSS}_{t-1}$) along with the other fixed clinical variables (age at onset, gender, and course) to predict the EDSS at 1 year.

Longitudinal correlations of delta values between EDSS and mEPS across the 3-year period were not significant, as already noted by Leocani et al. [4], but in contrast with Jung et al. [1], who found a significant correlation between delta mEPS and EDSS in RR patients. As previously discussed, such discrepancies may be partly due to the long versus short duration of the disease at the first observation, i.e., the longer the MS duration, the lower the variability of delta values between tests.

A problem with our retrospective data panel was that the new model required equal time intervals between successive tests. It was, therefore, necessary to draw a subsample of patients who had at least three equally spaced (i.e., 1-year interval) EP tests. Fifty-eight patients who happened to be tested during the period 2005–2008 had statistically balanced data fulfilling this criterion (Table 1). This subsample showed mean EDSS and mEPS values similar to those of the remaining patients.

The new OLS regression analysis employing one-period lagged EDSS and lagged mEPS confirmed the hypothesis outlined above by returning a better global $R^2$ value (0.79). Using a variable as both a dependent and a lagged dependent term in a multivariate longitudinal model is a common technique in economics [11]. For example, to predict future prices it is crucial to take into account previous prices [15]. In the medical field, Deloire et al. [16] predicted the EDSS at 5 and 7 years using a multivariate approach which included baseline EDSS as a predictor. Our interest, however, was not in the obvious relation between the dependent and the lagged dependent variable, which served only as a necessary step for evaluating the contribution of the other regressors. Rather, we sought to underline the ability of the lagged independent mEPS variable to explain the change between the two EDSS measurements. As shown above, its individual contribution was statistically significant while that of the other predictors was not.

The performance of the model was further tested employing the difference between estimated and measured EDSS values. Using an error threshold of ±0.5, the EDSS prediction at 1 year was correct in 72% of the patients. This figure is substantially higher than that of preceding works using similar multivariate approaches. Yaldizli et al. were able to explain 56% of EDSS variance after a mean of 7.1 years by using EDSS at diagnosis along with other clinical and MRI variables, while Deloire et al. explained 56 and 71% of EDSS change at 5 and 7 years, respectively, using EDSS at baseline and age along with either Symbol Digit Modalities Test or consistent long-term retrieval [16, 17]. Due to major differences in patients’ characteristics, methods, and regressor variables, none of the preceding works is directly comparable with ours; notably, neurophysiological variables have never been included as regressors in spite of still being widely used to diagnose and monitor the disease [18, 19]. Nonetheless, a generalized difficulty in accurate long-term prediction of disability with multivariate models seems to emerge. Accordingly, our findings support the application of multivariate models particularly for short-term (e.g., <5 years) prediction of MS disability, as well as the utility of EP in explaining future disability. A possible drawback of our study resides in the limited prediction time span; for example, because of the small mean variation in EDSS and mEPS scores within the 3-year period of observation, our findings might appear obvious at first glance. However, a few issues that support our conclusions should be considered. First, individual changes of up to ±3 EDSS points were found in the 58-patient group and were accurately predicted by the model; second, an accurate short-term prediction is more important for timely decisions than a long-term one, as stressed by recent treatment optimization guidelines [20]; third, the sum of successive accurate short-term predictions may be more efficient than long-term ones in detecting changes in disease course; fourth, in the work of Vercellino et al. [21], 50% of 174 relapses observed in 304 patients caused EDSS increases between 0.5 and 3.5 after 1 year, supporting the utility of short-term prediction.

In conclusion, our study extended previous findings based on nonparametric approaches [1, 2, 4] by showing the efficacy of a multivariate parametric model combining the effect of mEPS and routine clinical variables including the EDSS itself to predict short-term variability of EDSS. In spite of a more complex statistical approach requiring the control of regression assumptions and of balanced data, our model could be applied in clinical practice to guide treatment and follow-up decisions in individual patients.

Our future aim is to test how the model performs with more heterogeneous patient subgroups, e.g., as far as disease course is concerned, as well as to ascertain its predictive power beyond 1 year. Though we are aware that lengthening the prediction term will also increase data variability and estimation errors, as the results of a recent work tend to confirm [17], a 2-year prediction term could still be a feasible target to pursue with a multivariate parametric approach. The overall predictive ability of our model could also be improved by including the motor evoked potentials, which were not available in our retrospective data panel and which have been found to be well correlated with the EDSS in cross-sectional analyses [1, 2, 4] and with EDSS changes at 2 years [14]. On the other hand, the recent introduction of novel targeted immune modulators, which appear to markedly affect disability progression over time in RR patients [22] and EP abnormalities as well [23], could limit the applicability of predictive models devised in the era of corticosteroids and β-interferons; nonetheless, we think a multivariate approach could still be needed to tackle the inherent low predictability of MS outcomes and to deal with new important covariates.