Introduction

Since the introduction of the McDonald criteria in 2001, conventional magnetic resonance imaging (MRI) has become one of the cornerstones of diagnosing multiple sclerosis (MS) [1]. However, the association between common neuroradiological markers such as T1 or T2 lesion load and clinical disability in patients suffering from MS is weak [2] and was referred to as the ‘clinico-radiological paradox’ [3]. Although various confounders have been discussed [4], the significance of conventional MRI in describing the clinical status of MS patients has been questioned [5, 6]. Please note that the term conventional MRI generally refers to standard MRI sequences such as T1- and T2-weighted images as compared to advanced MRI techniques such as diffusion tensor imaging.

In other neurological diseases such as Huntington’s or Alzheimer’s disease, however, multivariate pattern recognition methods have been demonstrated to be a sensitive tool for predicting the clinical status of individual subjects, including the presence of a disease [7, 8] as well as symptom severity [9, 10]. In the present study, we investigated the amount of predictive information contained in local tissue intensity patterns extracted from conventional T1- (magnetization prepared rapid gradient echo, MPRAGE) and T2-weighted (fluid-attenuated inversion recovery sequence, TIRM) MRI data for clinical disability in MS patients. We conducted separate cross-validated canonical correlation analyses to predict several clinical scores, including the disease duration, motor disability (9-Hole Peg Test [9-HPT], Timed-Walk Test [TWT], [11]), cognitive dysfunction (paced auditory serial addition test [PASAT], [11]) and overall clinical disability (expanded disability status scale [EDSS], [12]).

Methods

Patients

Forty-one patients with clinically definite MS (relapsing-remitting type, [1]) recruited for a recent study of our group entered the analysis [13]. Patients were required to be between 18 and 55 years and to have at least one gadolinium (Gd-DTPA)-enhancing lesion on a qualifying T1-weighted brain MRI scan [14]. The patients were scored on the EDSS [12] and the subtests 9-HPT, TWT, and PASAT of the Multiple Sclerosis Functional Composite (MSFC; [11]). One subject was excluded because his EDSS score exceeded the 97th percentile. Demographic and clinical details are listed in Table 1. Importantly, MRI and clinical ratings were performed on the same day and all patients were in remission at this time point (interval to the last relapse/corticosteroid treatment was at least 30 days).

Table 1 Demographic and clinical details of MS patients

Consent was obtained according to the Declaration of Helsinki, and the study was approved by the research ethics committee of the Charité - Universitätsmedizin Berlin. All subjects gave written informed consent.

MRI acquisition

Whole-brain high-resolution three-dimensional T1-weighted images (MPRAGE, TR 2110 ms, TE 4.38 ms, TI 1,100 ms, flip angle 15°, resolution 1 × 1 × 1 mm) and a T2-weighted fluid-attenuated inversion recovery sequence (TIRM, TR 10,000 ms, TE 108 ms, TI 2,500 ms, resolution 1 × 1 × 3 mm, 44 contiguous axial slices) were acquired using a 1.5-Tesla MRI (Magnetom Sonata, Siemens, Erlangen, Germany) with an eight-channel standard head coil.

Lesion load

Lesion load for MPRAGE and TIRM images was routinely measured using the MedX v.3.4.3 software package (Sensor Systems Inc., Sterling, VA, USA, [15]). Lesion load of TIRM images was additionally measured using in-house software [13].

Preprocessing

Several preprocessing steps were performed. First, a clinician and experienced reader (CP) applied in-house software to conduct lesion mapping based on individual TIRM images. To be as conservative as possible, CP was instructed to mark any hyperintensities visible in the TIRM images and not only oval lesions, as is common in clinical practice. Next, correction of field inhomogeneities, coregistration of high-resolution MPRAGE and TIRM images, and normalization of these high-resolution images to the Montreal Neurological Institute (MNI) brain template (voxel size: 2 × 2 × 2 mm) were conducted using SPM5 (Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, London, http://www.fil.ion.ucl.ac.uk/spm; Fig. 1). The normalization parameters for the MPRAGE images were estimated by the ‘unified segmentation approach’ [16] and then applied to the MPRAGE and co-registered TIRM images as well as to individual lesion masks. Importantly, lesion areas identified by the clinician were excluded from the estimation to avoid lesion-mediated artifacts in the normalization routine (see SPM5 manual: www.neuro.nl/fmri/docs/SPM5manual.pdf). Finally, we obtained MPRAGE and TIRM images from all subjects as well as their individual lesion masks in MNI space (voxel size: 2 × 2 × 2 mm).

Fig. 1

Overview of data processing. Raw MPRAGE and TIRM images were co-registered and normalized to the Montreal Neurological Institute (MNI) template. Individual lesions that were marked beforehand by a clinician were excluded from the estimation of normalization parameters. The search space was reduced to only voxels within the standard brain mask that were not cerebrospinal fluid with a probability of greater than 0.8. For the normalized MPRAGE and TIRM images separately, we aimed to predict the clinical markers disease duration, motor disability, cognitive dysfunction, and overall clinical disability.

For the pattern-based analyses, only voxels within the SPM standard brain mask that were not cerebrospinal fluid (CSF) with a probability of more than 0.8 (based on SPM CSF prior map) were included. The rather conservative threshold of 0.8 was chosen to avoid misinterpretation of tissue-free voxels as brain tissue. To account for different intensity levels, the images were standardized within subjects based only on normal-appearing (i.e., non-lesional) brain tissue. This was done to ensure that a higher lesion load did not introduce any biases into the standardization. We continued by using a general linear model to regress out the variance contained in voxel intensities that could be explained by local deformation parameters determined during spatial normalization. The deformation (i.e., shift) parameters in x, y, and z direction were calculated for individual MPRAGE images using the deformations toolbox implemented in SPM5. This step was performed in order to rule out that pattern-based prediction could rely on intensity differences induced by the spatial transformation that correlate with clinical disability (e.g., due to a stronger correction of regional atrophy in patients with a higher disability score).
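The two intensity corrections described above can be illustrated with a minimal NumPy sketch. It assumes that the standardization is a z-score computed from non-lesional voxels inside the brain mask and that the general linear model is fitted per voxel across subjects by ordinary least squares; the function and argument names are hypothetical and are not taken from the in-house software.

```python
import numpy as np


def standardize_within_subject(image, lesion_mask, brain_mask):
    """Z-score one subject's image using only non-lesional in-mask voxels."""
    normal = brain_mask & ~lesion_mask      # normal-appearing tissue only
    mean = image[normal].mean()
    std = image[normal].std()
    return (image - mean) / std


def regress_out_deformations(intensities, deformations):
    """Remove the variance explained by deformation parameters at one voxel.

    intensities  : (n_subjects,) standardized intensities at the voxel
    deformations : (n_subjects, 3) shift parameters in x, y, and z
    The model includes an intercept, but only the deformation-related
    part of the fit is subtracted (assumption of this sketch).
    """
    design = np.column_stack([np.ones(len(intensities)), deformations])
    beta, *_ = np.linalg.lstsq(design, intensities, rcond=None)
    return intensities - deformations @ beta[1:]
```

After this step, the corrected intensities at a voxel are uncorrelated with each of the three deformation parameters across subjects, which is the property the preprocessing aims for.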

Pattern-based canonical correlation analysis

In order to decode symptom severity from the preprocessed MRI data, we conducted a pattern-based and cross-validated canonical correlation analysis using in-house software [17] independently for MPRAGE and TIRM images and for each of the following clinical variables: disease duration, motor disability (as measured by 9-HPT and TWT), cognitive dysfunction (as measured by PASAT) and overall clinical disability (as measured by EDSS; Figs. 1, 2).

Fig. 2

Illustration of pattern-based canonical correlation analysis. In the training phase, local patterns (i.e., so-called searchlights) of n-1 subjects were extracted from the normalized MPRAGE and TIRM images, respectively. The dimension of these feature vectors was reduced to half the number of samples using principal component analysis (PCA; not shown here). The canonical correlation analysis (CCA) then finds a linear relationship between PCA-projected voxel intensities and clinical scores for all training subjects. In the testing phase, a pattern of a new ‘unseen’ subject is presented and CCA is used to predict the clinical score of this person. For validation, we performed a leave-one-out cross-validation over all subjects. This whole procedure was repeated for each searchlight position, so that each voxel in the brain was the center voxel of the searchlight once.

In all analyses, local intensity patterns were extracted from the given MRI data by using a ‘searchlight’ approach [18, 19], which searches across the whole brain for local tissue intensity patterns informative about clinical disability. A searchlight is defined as a spherical cluster of N voxels that is created around a given center voxel v_i. Here, we used a searchlight radius of four voxels (i.e., 8 mm, corresponding to at most 257 voxels in one searchlight). For each voxel in a searchlight, we extracted individual tissue intensity values based on MPRAGE and TIRM images, respectively. In the next step, these feature vectors were used to decode disease duration, motor disability, cognitive dysfunction, or overall clinical disability by carrying out a canonical correlation analysis (CCA).
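The figure of at most 257 voxels per searchlight can be checked by counting the integer voxel offsets that lie within a sphere of radius four voxels. The short sketch below does this; the function name is hypothetical, and it assumes that a voxel belongs to the searchlight when its Euclidean distance to the center is at most the radius.

```python
from itertools import product


def searchlight_offsets(radius_vox):
    """Integer (dx, dy, dz) offsets lying within a sphere of the given radius."""
    r = int(radius_vox)
    return [
        (dx, dy, dz)
        for dx, dy, dz in product(range(-r, r + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius_vox ** 2
    ]


offsets = searchlight_offsets(4)
print(len(offsets))  # 257
```

Searchlights centered near the edge of the brain mask contain fewer voxels, because offsets that fall outside the mask are dropped; hence "at most" 257.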

CCA is a standard tool in statistics to study linear relationships between two sets of multidimensional variables [20]. The goal of CCA is to find two bases, one for each set of variables, such that the correlation between the projections of the variables onto these two bases is mutually maximized. Because it is independent of the coordinate system in which the variables were originally described, it can overcome one of the major drawbacks of ordinary correlation analysis.
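For reference, the CCA objective can be written compactly. The notation below is an assumption introduced here for illustration: x and y denote the two (centered) variable sets, Σ_XX, Σ_YY, and Σ_XY their covariance and cross-covariance matrices, and a and b the two basis vectors.

```latex
\rho \;=\; \max_{a,\,b}\; \operatorname{corr}\!\left(a^{\top}x,\; b^{\top}y\right)
     \;=\; \max_{a,\,b}\;
     \frac{a^{\top}\Sigma_{XY}\,b}
          {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
```

Replacing x by Ax for any invertible matrix A leaves ρ unchanged, because a can be replaced by A^{-⊤}a. This is the sense in which the canonical correlation does not depend on the coordinate system in which the variables are described.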

In our case, the first set of variables is given by the N-dimensional local tissue intensity vector of all subjects; the second set is given by one or two clinical scores of the subjects. Prediction of motor disability comprised two variables, 9-HPT and TWT; all other clinical variables were modeled using one variable.

Since CCA is ill-conditioned for redundant sets of variables, we reduced the dimensionality of the feature vectors to 20 (i.e., half the number of subjects) using principal component analysis (PCA, [21]). A fixed number of principal components was chosen to guarantee equal feature vector lengths across all regions.

To assess the generalizability of performance using an independent data set while at the same time avoiding circular inference [22], we performed a leave-one-out cross-validation. This means that we used the feature vectors of all but one subject as ‘training data’, first to reduce the dimensionality by PCA and second to train the CCA algorithm to learn a relationship between intensity values and clinical scores. We then tested it on the remaining, independent ‘test’ subject. This procedure was repeated so that each subject was the test subject once.
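The leave-one-out procedure for a single searchlight can be sketched as follows. This is a minimal NumPy illustration, not the in-house implementation [17]: the function names are hypothetical, CCA is computed with a standard QR/SVD formulation, and the sign convention used to keep the canonical weights comparable across folds is an assumption of this sketch.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and a (p, k) projection onto the top k PCs."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def cca_first_pair(X, Y):
    """First canonical weight pair for centered X (n, k) and Y (n, q)."""
    qx, rx = np.linalg.qr(X)
    qy, ry = np.linalg.qr(Y)
    u, s, vt = np.linalg.svd(qx.T @ qy)
    a = np.linalg.solve(rx, u[:, 0])
    b = np.linalg.solve(ry, vt[0])
    # Fix the joint sign so that weights are comparable across folds.
    if b[np.argmax(np.abs(b))] < 0:
        a, b = -a, -b
    return a, b, s[0]


def loo_decode(features, scores, n_components):
    """Leave-one-out decoding accuracy for one searchlight.

    features : (n_subjects, n_voxels) intensities in the searchlight
    scores   : (n_subjects, q) clinical scores (q = 1 or 2)
    Returns Pearson's r between held-out canonical variates.
    """
    n = features.shape[0]
    pred = np.empty(n)
    true = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        # PCA and CCA are fitted on the training subjects only.
        mean_x, proj = pca_fit(features[train], n_components)
        z_train = (features[train] - mean_x) @ proj
        mean_y = scores[train].mean(axis=0)
        a, b, _ = cca_first_pair(z_train, scores[train] - mean_y)
        # Project the held-out subject with the training-set parameters.
        pred[i] = ((features[i] - mean_x) @ proj) @ a
        true[i] = (scores[i] - mean_y) @ b
    return np.corrcoef(pred, true)[0, 1]
```

With the settings described in the text, `n_components` would be 20 and `features` would have up to 257 columns. Fitting both PCA and CCA inside the loop keeps the test subject fully independent of the training step.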

Decoding accuracy is then given by Pearson’s correlation coefficient between true and predicted clinical scores (i.e., canonical variates) and is assigned to the center voxel of the searchlight. A high correlation implies that the local cluster of voxels surrounding the center voxel spatially encoded information about the clinical variables under investigation. Please note that we extracted only the first canonical variate since it accounts for the largest canonical correlation. After repeating the procedure for each searchlight position, we generated a parametric map indicating the correlation for each searchlight. Corresponding probabilities were calculated using Student’s t distribution. To account for the multiple comparison problem, we report searchlight center coordinates that exhibit a significant correlation at a Bonferroni-corrected level of p < 0.01 (one-tailed). This very conservative threshold was chosen to increase the specificity of the analyses.
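The step from a correlation coefficient to a probability can be made explicit. The sketch below assumes the usual conversion of Pearson's r into a t statistic with n − 2 degrees of freedom, and a Bonferroni correction that divides the significance level by the number of tested searchlights; the text does not state either formula, so both are assumptions, as are the function names.

```python
import math


def correlation_t_statistic(r, n_subjects):
    """t statistic for Pearson's r, with n_subjects - 2 degrees of freedom."""
    return r * math.sqrt((n_subjects - 2) / (1.0 - r * r))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

The one-tailed p value is then the upper tail probability of Student's t distribution at the computed statistic, and a searchlight is reported when this p value falls below `bonferroni_alpha(0.01, n_tests)`.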

To characterize the underlying tissue alterations, we also reported the average grey matter and lesion ratio for each significant cluster. For this purpose, individual grey matter and lesion ratios for all voxels contained in the searchlights of the cluster were calculated and then averaged. The individual grey matter ratio was determined by the number of grey matter voxels (probability of being grey matter >0.5 based on individual grey matter probability maps provided by SPM5 during spatial normalization) divided by the number of all voxels in the cluster. Similarly, individual lesion ratio was determined by the number of lesion voxels (according to the individual lesion masks) divided by the number of all voxels in the cluster.
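The two ratios can be computed per subject as shown in the following minimal sketch and then averaged across subjects. The function and argument names are hypothetical; the grey matter threshold of 0.5 follows the text.

```python
import numpy as np


def tissue_ratios(gm_probability, lesion_mask, cluster_mask, gm_threshold=0.5):
    """Grey matter and lesion ratio of one subject within one cluster.

    gm_probability : array of grey matter probabilities per voxel
    lesion_mask    : boolean array, True for lesion voxels
    cluster_mask   : boolean array, True for voxels belonging to the cluster
    """
    n_voxels = cluster_mask.sum()
    gm_ratio = (gm_probability[cluster_mask] > gm_threshold).sum() / n_voxels
    lesion_ratio = lesion_mask[cluster_mask].sum() / n_voxels
    return gm_ratio, lesion_ratio
```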

In a further analysis, we investigated the predictability of clinical disability based on lesion load alone. Here, we conducted a CCA where the first set of variables is given by T1 and/or T2 lesion load and the second set is given by the clinical markers.

Results

Statistical analysis of clinical and volumetric measures

Significant correlations (Spearman’s rho) were found between the univariate variables disease duration and EDSS (r = 0.50, p < 0.01), disease duration and 9-HPT (r = 0.38, p < 0.05), EDSS and 9-HPT (r = 0.40, p < 0.05), EDSS and TWT (r = 0.33, p < 0.05), 9-HPT and TWT (r = 0.46, p < 0.05), and TWT and PASAT (r = −0.47, p < 0.01). T2 lesion load calculated using in-house software correlated strongly with the T2 lesion load calculated with MedX (r = 0.88, p < 10⁻¹², Pearson), but due to the sensitive lesion mapping performed by the clinician, it resulted in a higher overall mean (in mm³; mean (M) = 12,556, standard deviation (SD) = 9,196 as compared to M = 5,302, SD = 4,142). Decoding accuracies based on T1 and/or T2 lesion load (calculated with MedX) are shown in Table 2.

Table 2 Decoding of clinical scores based on lesion load

Pattern-based canonical correlation analysis

Local patterns informative about disease duration, motor disability, cognitive dysfunction, and overall clinical disability are shown in Fig. 3, separately for MPRAGE and TIRM images. Clusters reaching statistical significance after correction for multiple comparisons (p < 0.01, Bonferroni corrected) based on independent test data are listed in Tables 3 and 4. Please note that the decoding accuracy based on tissue intensity patterns was clearly higher than for T1 and T2 lesion load. The anatomical regions were identified using ‘automated anatomical labeling’ (AAL, [23]) and differed between MPRAGE and TIRM images, but included both white matter and task-related grey matter. We found patterns in the somatosensory cortex and posterior parietal cortex linked to disease duration. For motor disability, we identified regions in areas related to motor control such as cerebellum, thalamus, and primary motor cortex as well as in areas related to the planning and coordination of movements such as posterior parietal cortex and middle frontal gyrus. PASAT scores could be accurately predicted from areas known to be involved in working memory [24] such as posterior parietal cortex, but also from inferior temporal lobe, cingulum, and fusiform gyrus. When lowering the threshold to p < 0.05 (Bonferroni corrected), we additionally found clusters in the right prefrontal cortex (inferior frontal gyrus for MPRAGE images and middle frontal gyrus for TIRM images; Fig. 3). For EDSS, one cluster in the inferior frontal gyrus was maximally informative.

Fig. 3

Results of pattern-based canonical correlation analysis. Brain regions encoding information about disease duration (pink), motor disability (blue), cognitive dysfunction (red), and overall clinical disability (green) are overlaid on a rendered T1-weighted anatomical template image. For illustrative purposes, searchlight center coordinates are shown at p < 10⁻⁵ (uncorrected) with a cluster threshold of five voxels. Peaks of correlation reaching statistical significance after correction for multiple comparisons (p < 0.01) are listed in Tables 3 and 4. AG angular gyrus, CER cerebellum, FFG fusiform gyrus, IFG inferior frontal gyrus, IPL inferior parietal lobe, M1 primary motor cortex, ITL inferior temporal lobe, IOL inferior occipital lobe, MFG middle frontal gyrus, MOL middle occipital lobe, MTL middle temporal lobe, PC paracentral, PCN precuneus, PH parahippocampus, S1 somatosensory cortex, SFG superior frontal gyrus, SMA supplementary motor area, SMG supramarginal gyrus, SPL superior parietal lobe, TP temporal pole. All other areas correspond to white matter.

Table 3 Regions encoding symptom severity based on MPRAGE images
Table 4 Regions encoding symptom severity based on TIRM images

A relatively low grey matter ratio (between 16.32 and 79.01 %) in areas classified as grey matter by AAL indicated that most areas were located in the conjunction of white and grey matter. The lesion ratio found was quite low (at most 14.69 %) in all significant clusters, but tended to be higher in white matter.

Maps obtained from MPRAGE and TIRM images indicating the correlation coefficient for each searchlight position were moderately correlated (r = 0.34 for disease duration, r = 0.22 for motor disability, r = 0.29 for cognitive dysfunction, and r = 0.31 for EDSS; Pearson). This means that correlation values between predicted and true clinical scores calculated based on either MPRAGE or TIRM images are to some extent related. The number of significant searchlights (p < 0.01, Bonferroni corrected) was higher for MPRAGE images (n = 36) compared to TIRM images (n = 29). In both modalities, more significant searchlights were found for motor disability and cognitive dysfunction than for disease duration and overall clinical disability.

Discussion

In the present study, we demonstrated that local tissue intensity patterns extracted from conventional MRI of MS patients, analyzed with a canonical correlation analysis, encode clinically relevant information about symptom severity in normal-appearing brain parenchyma. Symptom severity was quantified in terms of disease duration, motor disability, cognitive dysfunction, and overall clinical disability. In particular, these patterns were more informative than the global lesion load. Remarkably, decoding accuracy based on local tissue intensity patterns was two to three times higher than for T2 lesion load.

In recent years, numerous efforts have been made to link clinical disability scores in MS patients and MRI-derived markers. Generally, this association seems to be rather poor and led to the formulation of the so-called clinico-radiological paradox [3]. However, these early studies mostly tried to establish a dependency between T2 lesion load and clinical scores. For T1 lesion load, atrophy and measures based on non-conventional MRI, the results tend to be better [25, 26], but are still not applicable in clinical practice. We used a complex pattern-based decoding approach to establish a relationship between tissue characteristics from conventional MRI and clinical scores. The main advantage over common correlation analysis is that it is capable of detecting subtle tissue alterations in local brain areas that are not distinguishable by the naked eye or measurable by global markers such as lesion load or brain volume. Therefore, it allows for a sensitive mapping of clinically relevant regions. By using a cross-validated procedure, we additionally ensured the generalizability to new data sets.

Although we used a very conservative significance threshold, we found several regions that robustly encode symptom severity. For disease duration, we identified the precuneus and postcentral gyrus as areas with high predictive information. Interestingly, both structures have been linked to significant grey matter loss in MS and seem to be associated with fatigue [27] and sensory disturbances [28]. We hypothesize that this grey matter loss in conjunction with clinical symptoms accumulates over time and therefore correlates with disease duration. For motor disability, we mostly found regions involved in motor control (cerebellum, thalamus, and precentral gyrus) and planning of coordinated movements (posterior parietal cortex). These regions have been linked to various structural and functional abnormalities in MS patients [29, 30]. For cognitive dysfunction, we mainly found working memory areas including posterior parietal cortex and, when lowering the significance threshold, prefrontal cortex. Both areas have been shown to be functionally related to numerical information processing [24]. In MS patients, the PASAT score has been shown to correlate with atrophy [31], reduced diffusivity [32], and a higher functional activation [33] in these areas. Additionally, we found patterns in the cingulum, for which structural abnormalities have been reported to correlate with the PASAT score [34]. In line with several studies arguing that atrophy in frontal areas is a good indicator of the EDSS [35, 36], we found one area in the inferior frontal gyrus to accurately predict the EDSS score. However, most other studies did not find a connection between regional brain MRI abnormalities and the EDSS score, which has been explained by the low specificity of the EDSS [37] and a major involvement of the spinal cord in constituting the EDSS score [38].

Although the result maps based on MPRAGE and TIRM images were moderately correlated, peak regions encoding symptom severity differed between MPRAGE and TIRM images. This might be due to the fact that T1- and T2-weighted images measure different tissue and disease properties. For instance, it has been argued that T1 signal intensity is a marker of neuronal density, whereas T2 hypointensity rather correlates with the myelin content [39]. In line with several studies highlighting the importance of T1-relaxation time measurements for disease progression and clinical disability [40, 41], predictions based on MPRAGE images (both lesion load and intensity patterns) were superior to predictions based on TIRM images.

The question regarding the underlying tissue characteristics explaining the high predictability in certain areas is the most challenging one in the present study. This is due to the fact that T1- as well as T2-weighted imaging is relatively unspecific [42]. A hallmark of MS pathology is the development of focal inflammatory lesions. However, since the proportion of lesions was below 5 % in most brain areas listed in Tables 3 and 4, we argue that lesions visualized by TIRM images play a minor role in revealing symptom severity. Several studies [3], as well as our own data, confirm that the T2 lesion load only partially accounts for the individual clinical deficits. In addition to focal lesions, histological studies [43] and advanced imaging techniques such as diffusion-weighted imaging [34, 44], magnetization transfer imaging [45], and proton spectroscopy [29] revealed that tissue damage in MS is more widespread than previously believed. This tissue damage usually remains undetected on conventional MRI and thus the corresponding parenchyma is termed normal-appearing brain tissue (NABT). We claim that our algorithm captures not only macroscopic features as presented by lesions, but also subtle signal alterations in the NABT that remain occult to the human eye. We showed that these diffuse abnormalities help to elucidate the extent of clinical disability in MS patients when suitable and more complex analysis techniques are employed. This is especially true for brain areas involved in the most common symptoms of MS patients, such as visual disturbances and motor as well as sensory pathways, which were revealed in both T1- and T2-weighted sequences.

Several limitations should be pointed out. First, the exact histopathological processes in MS remain unclear, although several possible pathomechanisms are discussed that could account for an accurate prediction. Future studies should correlate prediction accuracy with histopathological findings and additionally assess whether our findings can be confirmed with other quantitative MRI techniques that give additional insight into tissue integrity, such as magnetization transfer imaging or diffusion tensor imaging. Second, our sample size is rather small and might cause effects that are not representative of all MS patients. Therefore, our results need to be confirmed in a larger patient cohort, preferably also including other forms of MS (e.g., primary or secondary progressive MS) as well as other neurological diseases. Additionally, it would be interesting to assess whether our findings can be confirmed in patients with greater overall clinical disability and impairment in specific clinical domains. Third, the common clinical parameters, in particular the EDSS, have been criticized as rather unspecific. Future studies might consider the sub-scores of the EDSS or additional markers of symptom severity. Fourth, the spatial normalization procedure, which is necessary to conduct a group-based analysis, may be influenced by cerebral atrophy. Although we aimed to minimize this error by regressing out the variance induced by the deformations, we cannot guarantee a totally bias-free normalization. Fifth, although we have found a strong association between local brain patterns and the clinical disability markers, the results have to be interpreted with caution. This is due to the fact that we cannot estimate the clinical contribution of spinal cord lesions, in particular to the scores reflecting motor disability, at the current stage of study.

Finally, even though our proposed method is not yet applicable in clinical practice, we generally believe in the potential of pattern-recognition methods in complementing macrotexture information already used by neuroradiologists. With the emergence of large databases (such as the Alzheimer’s Disease Neuroimaging Initiative [ADNI] database, www.loni.ucla.edu/ADNI) and more powerful computers, the elaborate methods described in the present study might be transferable to large-scale clinical practices. However, future studies are necessary to evaluate the prognostic information of local intensity patterns in clinical trials and to make this information usable for individual patients.

In conclusion, we have shown that predictive information contained in local brain tissue intensity patterns of MS patients clearly outperforms information contained in conventional neuroradiological markers such as the lesion load when using suitable analysis techniques. Predictive areas were located in the conjunction of white and task-related grey matter and consisted mostly of NABT. We hypothesize that our proposed algorithm uses a mixture of slight intensity changes due to several pathomechanisms, including diffuse T2 hyperintensities with lesions as endpoint, T1 hypointensities, and atrophy. Our findings suggest that local intensity patterns might serve as clinically relevant biomarkers of clinical disability in MS and should be considered in future studies.