Introduction

Secondary progressive multiple sclerosis (MS) is characterised by an unremitting, slow accumulation of disability that follows an earlier relapsing remitting phase, and is considered to be a consequence of progressive neuroaxonal loss within the central nervous system [1, 2].

Clinical trials in secondary progressive MS that use disability measures as the primary outcome are necessarily long and expensive, because disability progression is slow and variable and the currently used disability scales are insensitive. It is therefore relevant to develop surrogate markers of disease progression that are sensitive to neuroaxonal loss and have greater statistical power to detect treatment effects. In this respect a number of MRI measures have emerged as potential candidates, including measures of brain and spinal cord (SC) atrophy [3–6], T1 lesion load [7] and spectroscopic quantification of N-acetylaspartate [8].

Measures of brain and SC atrophy have been shown to correlate with disability scores in MS [3–6]. In addition, whole brain (WB) atrophy measures, particularly those utilizing registration-based algorithms, are highly sensitive and reliable in comparison to other candidate MRI measures, with relative methodological simplicity [9–12]. However, regional atrophy measures may more accurately reflect the neurodegenerative process. In particular, grey matter (GM) atrophy may reflect neuroaxonal loss and demyelination [13], whereas white matter (WM) atrophy may additionally be influenced by inflammation, oedema and gliosis [14, 15].

In this 2 year follow-up study of subjects with secondary progressive MS we evaluate a number of volumetric MRI measures in terms of their potential use as surrogate markers of disease progression and neuroaxonal loss.

Methods

Subjects and clinical assessment

This study includes 56 subjects with secondary progressive MS allocated to the placebo group in a randomized controlled trial of neuroprotection with lamotrigine in secondary progressive MS. MRI and clinical data were acquired at baseline, 12 and 24 months. The inclusion criteria for the trial required subjects to be aged 18–60 years and to have an Expanded Disability Status Scale (EDSS) score of 4.0–6.5. Disease progression was defined as a sustained deterioration in disability over the preceding 2 years with evidence of progression from clinical documentation or an increase of at least 1.0 point on the EDSS. Subjects were excluded from the study if there had been use of corticosteroids within the previous 2 months, immunosuppressive therapy within the previous 6 months or mitoxantrone within the previous 12 months. This research was approved by the Joint UCL/UCLH Committee on the Ethics of Human Research and all subjects gave written informed consent.

Clinical assessments included the EDSS and the Multiple Sclerosis Functional Composite (MSFC) score, which consists of three separate components: the 25 foot walk (TW), the 9 Hole Peg Test (9HPT) and the Paced Auditory Serial Addition Test-3 (PASAT-3). The composite z score of these three tests was calculated for a within-group comparison. All subjects had three practice attempts at the PASAT-3 in the 2 weeks preceding the baseline clinical assessment.

Progression of the EDSS score over the 24 month follow-up was defined as an increase of 1.0 point for baseline EDSS scores of 4.0–5.0 and an increase of 0.5 point for baseline EDSS scores of ≥5.5. Progression of the MSFC z score was defined as a decrease of 0.34; this figure was calculated as the change in the MSFC z score resulting from a 20% worsening from baseline median values in each of the three components, as previously described [16].
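The two progression rules above reduce to simple threshold checks. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that "an increase of 1.0 point" means an increase of at least that size, with scores compared between baseline and month 24.

```python
def edss_progressed(baseline: float, month24: float) -> bool:
    """Apply the EDSS progression rule described in the text.

    Required increase: 1.0 point for baseline scores of 4.0-5.0,
    0.5 point for baseline scores of 5.5 or above.
    Assumption: "increase of X" is read as "increase of at least X".
    """
    required_increase = 0.5 if baseline >= 5.5 else 1.0
    return (month24 - baseline) >= required_increase


def msfc_progressed(baseline_z: float, month24_z: float,
                    threshold: float = 0.34) -> bool:
    """Apply the MSFC rule: a decrease in z score of at least `threshold`."""
    return (baseline_z - month24_z) >= threshold


if __name__ == "__main__":
    # Hypothetical subjects, not study data.
    print(edss_progressed(4.5, 5.5))    # 1.0 point rise from a baseline of 4.5
    print(edss_progressed(6.0, 6.5))    # 0.5 point rise from a baseline of 6.0
    print(msfc_progressed(0.10, -0.30)) # z score fell by 0.40
```

EDSS scores move in 0.5 point steps, so the floating-point comparison is exact for valid scores.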

MRI protocol

All imaging was performed on a Signa 1.5T machine (General Electric, Milwaukee, Wisc, USA) at a single centre (Institute of Neurology, London). The following sequences were acquired at 0, 12 and 24 months: (1) a 2D T1 weighted spin echo sequence (TE 15 ms, TR 550 ms) in the axial plane yielding 3 mm contiguous slices; (2) a T2 weighted dual fast spin echo (TE 20 and 80 ms, TR 2,500 ms) in the axial plane yielding 3 mm contiguous slices; (3) a 3D T1 weighted gradient echo sequence (TR 15 ms, TI 450 ms, TE 5 ms) yielding 124 contiguous 1.5 mm thick coronal slices covering the WB; (4) a 3D IR prepared T1 gradient echo sequence (TR 15 ms, TI 450 ms, TE 5 ms) of the cervical spine yielding 64 partitions in the sagittal plane of 1 mm thick equivalents.

Post-acquisition analysis

1. WB atrophy. Fully-automated SIENA (FSL software, Oxford, UK) was applied to axially reoriented 3D T1-weighted images. Brain centre of mass was specified to optimize the automated brain segmentation. Image pairs are registered using the skull as a constraint and percentage brain volume change (PBVC) between the registered images is estimated. The SIENA algorithm failed in two subjects due to movement artifact in one of the image sets and these were excluded from the analysis.

2. GM and WM atrophy. GM and WM segmentations were performed using an established and automated segmentation tool, SPM5 (Statistical Parametric Mapping, University College London, UK) [17]. The 3D T1-weighted sequence was reformatted into a pseudoaxial plane and then registered to the 2D proton density images using a normalised mutual information algorithm [18]. The reformatted T1-weighted images were then segmented using SPM5, generating WM and GM masks. A lesion mask was applied to the GM segment to remove any misclassified voxels, and all GM segments were visually checked to ensure adequate segmentation and exclusion of WM lesions. Lesion voxels were reclassified as WM. Baseline tissue fractions relative to the total intracranial volume were also calculated (BPF, GMF, WMF). One subject was excluded from this analysis due to a segmentation failure.

3. Central cerebral (CC) atrophy. CC volume was measured on the 2D T1-weighted images. This measure is based upon six contiguous 3 mm axial slices with the most caudal at the level of the velum interpositum. A semi-automated thresholding tool, MIDAS (Medical Image Display and Analysis System) [19], was used to segment the brain from surrounding skull and CSF.

4. SC atrophy. Mean cross-sectional cervical cord area was measured using in-house software. The algorithm is based on five contiguous 3 mm pseudoaxial slices using the centre of the C2/C3 intervertebral disc as a caudal landmark, with slices perpendicular to the SC, as previously described [4].

5. T2 lesion volume (T2LV). This was estimated using a semiautomated local thresholding technique on in-house software (DispImage-Plummer, University College London, UK). T2 lesions were contoured on the corresponding proton density image. Manual editing of lesion boundaries was occasionally necessary in poorly defined or confluent lesions.

Reliability

The scan–rescan reproducibility of the volumetric MRI measures was assessed using MRI scans from three healthy controls, each of whom was imaged according to the same MRI protocol as the main study on five occasions over a period of 4 weeks. Because SIENA measures volume change and does not give a cross-sectional volume, scan–rescan variability for this technique was assessed in the three healthy controls by registering each of the five repeated images to the baseline image.

Statistical analysis

All statistical analysis was performed using SPSS version 14.0. The mean annual volume change for each of the volumetric MRI measures was calculated. The responsiveness of each of the MRI measures was estimated with the paired t statistic (mean change/standard error of the change) using comparisons between the baseline and month 24 data. The coefficient of variation was used to assess scan–rescan variability for each of the MRI measures in the healthy control scans. Since SIENA measures PBVC across registered images, the coefficient of variation for this measure is simply expressed as the mean deviation of the percentage changes from zero, rather than as a proportion of the mean, to allow a direct comparison between measures.
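The responsiveness and reliability statistics described above can be sketched in a few lines of Python. This is not the SPSS procedure used in the study; the function names are hypothetical, and reading "mean deviation of the percentage changes from zero" as the mean absolute PBVC is an assumption.

```python
import math
import statistics


def paired_t_statistic(baseline, followup):
    """Responsiveness: mean change divided by the standard error of the change."""
    changes = [f - b for b, f in zip(baseline, followup)]
    mean_change = statistics.mean(changes)
    standard_error = statistics.stdev(changes) / math.sqrt(len(changes))
    return mean_change / standard_error


def coefficient_of_variation(repeated_volumes):
    """Scan-rescan variability as a percentage of the mean measurement."""
    return 100.0 * statistics.stdev(repeated_volumes) / statistics.mean(repeated_volumes)


def mean_deviation_from_zero(pbvc_values):
    """Assumed reading of the SIENA variability: mean absolute PBVC (%)."""
    return statistics.mean([abs(v) for v in pbvc_values])


if __name__ == "__main__":
    # Hypothetical numbers, not study data.
    print(paired_t_statistic([100.0, 100.0, 100.0], [98.0, 97.0, 99.0]))
    print(coefficient_of_variation([9.0, 10.0, 11.0]))
    print(mean_deviation_from_zero([0.2, -0.2, 0.3, -0.1]))
```

In a healthy control rescanned over 4 weeks the true volume change should be close to zero, which is why the deviation of PBVC from zero can stand in for a coefficient of variation.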

The z scores for the MSFC, TW, 9HPT and PASAT were calculated for comparison of subjects within the study cohort [20]. The time given for failure at the TW was 180 s and for failure at the 9HPT was 777 s. Based on prior determination that the MRI measures had a parametric distribution, Pearson correlation coefficients were used to assess correlations between the MRI measures and the change in the MSFC z score and its component measures. A stepwise multiple linear regression model was used to identify independent MRI correlates of change in the MSFC z score, with the MSFC as the dependent variable and the MRI atrophy measures as well as T2LV as independent variables.
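As a minimal illustration of the correlation analysis, the Pearson coefficient between an atrophy rate and the change in a z score can be computed directly from its definition. The function name and the example values are hypothetical; the study itself used SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values: annual atrophy (%) and change in MSFC z score.
    atrophy = [-0.2, -0.6, -1.1, -1.5]
    msfc_change = [0.05, -0.10, -0.20, -0.45]
    print(round(pearson_r(atrophy, msfc_change), 2))
```

With atrophy expressed as a negative percentage change and worsening as a fall in z score, a positive coefficient means that faster tissue loss accompanies greater clinical decline.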

Subjects were divided into two groups in terms of the progression of both their EDSS and MSFC scores over the 2 year follow-up, i.e. stable or progressed. It was felt that this method of assessing associations with the clinical measures would be useful, particularly in assessing the EDSS score which is insensitive and remained stable in the majority of subjects during the trial. Independent sample t tests were used to identify differences between clinically stable and progressive groups in terms of the clinical and MRI measures, with the exception of change in the EDSS score for which a Mann–Whitney U test was used.
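The group comparison can be sketched as an independent-samples t statistic for atrophy rates in stable versus progressed subjects. This is a minimal sketch that assumes equal variances (pooled Student's t); whether the study used the pooled or unequal-variance form is not stated, and the function name is hypothetical.

```python
import math
import statistics


def independent_t_statistic(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    degrees_of_freedom = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * var_a + (n_b - 1) * var_b) / degrees_of_freedom
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical annual atrophy rates (%), not study data.
    progressed = [-2.9, -2.5, -3.1, -2.6]
    stable = [-1.2, -1.5, -1.1, -1.4, -1.3]
    print(independent_t_statistic(progressed, stable))
```

The P value would then be read from a t distribution with the returned degrees of freedom, a step that statistical packages perform automatically.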

Results

The baseline clinical and MRI characteristics of the 56 secondary progressive MS subjects in the cohort are described in Table 1.

Table 1 Baseline clinical and MRI characteristics

Sensitivity and responsiveness of the volumetric MRI measures (Table 2)

The SCCA had the highest annual rate of atrophy of the MRI measures at −1.63%. Annual GM atrophy at −1.18% was greater than WM [WM volume increased slightly (+0.12%)], and annual WB atrophy as measured by SIENA was −0.59%. Due to its superior precision, the SIENA measurement was the most responsive of the MRI measures in detecting change (t statistic −12.02). CCV, GMV and SCCA all had good responsiveness in our cohort. WMV change was inferior to all the other MRI measures in its sensitivity and responsiveness, and in the paired t test analysis the month 0 volumes were not significantly different from the month 24 volumes for this measure.

Table 2 Mean annual atrophy and paired t statistic for the volumetric MRI measures

Reliability (Table 3)

SIENA PBVC had the smallest scan–rescan variability on the control scans at 0.22% and CCV was also a reproducible measure with a variability of 0.37%. The SCCA was the least reliable measure with a variability of 1.51%.

Table 3 Scan–rescan coefficient of variation (COV) for each MRI measure in the control subjects

Longitudinal MRI-clinical correlations (Table 4; Fig. 1)

Rates of WB, GM and SC atrophy all correlated significantly with change in the MSFC z score over the 2 year follow-up with the strongest correlate being GM atrophy (r = 0.42) (Fig. 1). GM atrophy was also significantly correlated with change in the PASAT z score (r = 0.33) and 9HPT z score (r = 0.31), with a trend for WB and SC atrophy to be associated with the latter. WB atrophy was significantly correlated with change in the TW z score (r = 0.39).

Table 4 Correlations between MRI atrophy measures and change in the MSFC z score and its components
Fig. 1 Scatter-plots of atrophy against change in the MSFC z score

In a stepwise multiple linear regression model to identify the best and independent MRI correlates of change in the MSFC z score, GM atrophy (β = 0.46, P ≤ 0.001) and SC atrophy (β = 0.38, P = 0.002) emerged as independent correlates and were able to explain 32% of the variance of change in the MSFC z score.

Longitudinal MRI correlates of disease progression as defined by both EDSS and MSFC scores (Table 5)

When the cohort was divided into those with stable disease versus disease progression over the 2 year follow-up, according to the EDSS criteria 17 subjects progressed and 39 remained stable, and according to the MSFC criteria 16 progressed and 38 remained stable (two missing data points at month 24 due to non-compliance with the PASAT). Subjects with worsening disease according to the EDSS criteria were not significantly different from those with stable disease in terms of the MRI atrophy measures. In contrast, those with worsening disease according to the MSFC criteria had a significantly greater rate of WB (−0.76 vs. −0.52% per year, P = 0.05) and SC (−2.77 vs. −1.30% per year, P = 0.02) atrophy compared to those with stable disease. There was also a trend for GM and WM atrophy to be greater in this group of subjects. There was no difference in relapse rate between progressive and stable subjects based on either the EDSS or MSFC criteria.

Table 5 Rates of atrophy in subjects with disease progression as defined by the EDSS and MSFC criteria

Discussion

Given the increasing focus on potential neuroprotective therapies in MS, it is relevant to identify and develop reliable imaging tools to monitor the neurodegenerative process. This is particularly so in progressive forms of MS, when ongoing and widespread neuroaxonal loss is evident [2, 14, 21, 22] and is the likely basis of clinical deterioration. Whilst there are many MRI studies in relapsing-remitting and mixed cohorts of MS subjects, relatively few have specifically focused on atrophy measures in secondary progressive MS and there is some evidence that both the rate of atrophy [23, 24] and its effect on disability [25] may be different at different stages of the disease.

Whole brain atrophy

WB atrophy was measured at an annual rate of 0.59% which is highly consistent with previous observations in secondary progressive MS [9, 26] and is substantially greater than would be expected with normal aging. Of the volumetric MRI measures assessed in our cohort, the SIENA measurement of WB volume change was the most reliable and most responsive to change. This is similar to a previous investigation in a secondary progressive MS cohort in which SIENA was compared with CCV and segmented-subtracted SIENA normalized WB volumes [27]. The key methodological difference between SIENA and the other MRI techniques assessed here is that SIENA utilizes image registration to measure change rather than through a process of quantifying and then subtracting cross-sectionally measured volumes. This important attribute of the SIENA measure may explain its superior reliability in this study.

Longitudinal correlations were found between WB atrophy and change in the MSFC z score over the 2 year follow-up (r = 0.35) and this is comparable to a previous correlation reported between these two measures over an 8 year follow-up in relapsing-remitting disease [5]. WB atrophy was the only measure to show a correlation with progression of walking time. This supports the notion that walking impairment in progressive MS reflects more global disease and not only pathology confined to spinal WM tracts. Global measures of brain pathology such as WB atrophy [28] and MTR [29] have been associated with walking times in previous cross-sectional studies of progressive MS. We provide longitudinal data to confirm this association and also assess a SC measure in the same cohort to allow direct comparison.

Grey matter and white matter atrophy

The patterns of GM and WM atrophy observed in this study are consistent with previous reports of greater GM atrophy in MS subjects [23, 30]. However, our observed annual GM atrophy rate of 1.18% is greater than has previously been reported in secondary progressive MS (−0.39%) [23], although it is similar to reported rates in relapsing and primary progressive MS [30–32]. A possible explanation for this discrepancy is an underestimation of change in the previous secondary progressive study, since annual loss of BPF was also reported in a range of 0.15–0.39%, which is much lower than previous estimates in MS. GM volume changes are likely to reflect destructive pathologies including neuroaxonal loss and demyelination [13, 14], whereas changes in WM volume may additionally be influenced by fluctuating pathologies such as inflammation, both widespread and focal, oedema and gliosis [14, 15], which may mask actual tissue loss.

Technical factors related to image analysis algorithms can potentially affect the calculation of volumes. GM and WM segmentation may be slightly influenced by the presence of WM lesions when using SPM, even after reclassification of lesions as WM, with a greater burden of lesions causing an artefactual increase in GM volumes and decrease in WM volumes [28, 32, 33]. Although this might lead to a slight underestimation of GM atrophy and an overestimation of WM atrophy, it would seem unlikely to have had much effect in our longitudinal study in which the frequency of new lesions was relatively small and GM atrophy was the predominant finding.

GM atrophy was the best longitudinal correlate of change in the MSFC z score (r = 0.42) and GM and SC atrophy were identified as independent correlates of the clinical measure in a multiple linear regression model. GM atrophy was also the only significant correlate of change in the PASAT-3 and 9HPT performance. Given the correlation with change in the MSFC z score, it was surprising that GM volume change was not significantly greater in the binary group that had progression according to the MSFC criteria. It is likely that this study lacked statistical power to detect this difference since the rate of GM atrophy was still greater in the group that progressed (−1.62 vs. −1.11% per year).

A number of recent cross-sectional [34, 35] and longitudinal studies [36] have demonstrated that GM atrophy is more correlated with clinical status than WM atrophy and the correlations we find are in line with those previously reported, although there are no previous studies examining correlation coefficients between GM atrophy and change in the MSFC z score. Longitudinal studies suggest that GM atrophy occurs early in the course of the disease [32, 37, 38] but becomes more evident in established relapsing-remitting and secondary progressive disease [23]. The correlations we find between GM atrophy and deterioration in limb and cognitive function may indicate that GM damage, such as that caused by GM lesions and secondary neuroaxonal injury, is having a direct effect on clinical status. It is also possible that GM atrophy more accurately reflects a global neurodegenerative process than WM atrophy, the latter being influenced by other pathologies.

Central cerebral volume

A high rate of central volume loss has been observed in MS subjects, and has been related to progression of the EDSS score in relapsing [3] and secondary progressive disease [39]. Central atrophy has also been used successfully in the clinical trial setting [39, 40]. We also observe a high annual rate of change in this study at 1.51% and as such this measure demonstrates good responsiveness in our cohort. However we found no correlation with change in either the EDSS or MSFC score. It is possible that, in only having limited brain coverage, this technique is omitting clinically relevant regions of atrophy in secondary progressive MS, particularly frontal cortex and structures within the posterior fossa.

Spinal cord atrophy

The SCCA measured the highest rate of change of all the MRI measures and as a result it had comparable responsiveness when compared to CCV and GMV despite less reproducibility. The SC is a small structure with a high surface area to volume ratio and since measurement noise is generated at the surface, it is not entirely surprising that the SC area measurement has less reproducibility. In addition the SC measure is not fully automated and there is some dependence on the operator to classify the cord and CSF regions. The rate of cervical cord atrophy observed in this study (−1.8% per year) is higher than has previously been reported in relapsing remitting MS and secondary progressive MS [6, 41] but lower than previous estimates in primary progressive MS [42]. Our cohort of MS subjects are being studied at a time when their locomotor function is steadily deteriorating and their ambulation, whilst still present, is severely limited. The high rate of atrophy in the upper cervical spine may reflect neurodegeneration during this period of clinical deterioration in locomotor function.

We also find that SC atrophy is an independent correlate of disability progression as measured by the MSFC z score, and the trends to also associate with TW and 9HPT performance are consistent with an effect on motor function. A markedly higher rate of SC atrophy was observed in the binary group that had MSFC progression (−2.77 vs. −1.30% per year) highlighting the potential utility of this measure as a surrogate marker. Cross-sectional studies have previously demonstrated that SC atrophy is correlated with clinical status in secondary progressive MS [25, 35], but here, for the first time, we demonstrate longitudinal correlations with disability in a secondary progressive cohort which are independent from correlations with the brain atrophy measures.

Disability measures

When progressive and stable subjects were compared according to the EDSS criteria, no MRI measure distinguished the two groups. However, according to the MSFC criteria, both WB atrophy and SC atrophy were significantly greater in subjects who had disability progression. A number of previous studies have demonstrated that change in the MSFC z score correlates better with underlying pathology measured on MRI than change in the EDSS score [38, 43], and here we confirm those findings in a secondary progressive MS cohort. The EDSS is very heavily weighted towards lower limb function in subjects with an EDSS greater than 4.0, failing to take account of cognitive and upper limb deficits which are more strongly associated with MRI abnormality [35]. Lower limb dysfunction and mobility may be affected by factors other than underlying tissue destruction such as a previous severe transverse myelitis, spasticity, use of walking aids, exercise, psychological factors and other co-morbidities.

In conclusion, measures of WB, GM and SC atrophy are all responsive and all demonstrate significant correlations with change in clinical status over a 2 year follow-up in secondary progressive MS. The WB atrophy measure has the advantage of high reliability since it utilizes a registration based technique to measure volume change. Improvements in the reliability of the GM and SC segmentation, perhaps by introducing registration-based methods, may make these the regions of choice for use as surrogate markers in the clinical trial setting. However, based on our data the SIENA WB atrophy measure currently appears to strike the best balance between clinical relevance and responsiveness.