Introduction

The international controversy over the best treatment for degenerative spine disorders, especially cervical spondylogenic myelopathy (CSM), continues unabated. Although a complete recovery is clearly impossible, the number of recommended methods and the frequency of operative intervention are increasing, driven mostly by medical research, the expectations of the population, and the industrial development of systems and reconstructive components. Physicians, patients, and health-care paying authorities worldwide want to know which procedure is the most successful, with “success” meaning more than the improvement of single functional data. An international exchange of experience and studies is therefore essential, especially in the case of CSM, which is characterised by complex symptoms. The ability to compare results that measure clinical deficits and outcome is a prerequisite for a productive worldwide discussion about the disease and its treatment. The study presented here compares the Nurick-score [21, 22], the Japanese orthopaedic association score (JOA-score) [34], the Cooper-myelopathy-scale (CMS) [4], the Prolo-score [26] and the European-myelopathy-score (EMS) [10], and analyses how they reflect outcome, clinical and diagnostic data in patients with cervical myelopathy.

Materials and methods

Clinical data

In a retrospective study, 43 patients were examined, all of whom showed clinical and morphological signs of CSM and underwent surgery (anterior approach; 40 patients at 2 levels, 3 patients at 3 levels) between January 1995 and December 2001. The mean age was 53.9 years (27–81 years). At the beginning of the investigation 14 of the 43 patients were already retired and 29 were of employable age. The mean duration of symptoms was 20.7 months (3 days to 10 years). The postoperative examination took place approximately 12 months after the operation (6–24 months). We evaluated gait function, paresis and sensibility of the lower extremity, vegetative symptoms, diadochokinesia, funicular symptoms (pathological reflexes, muscle tone and reflexes of the upper and lower extremity), paresis and sensibility of the upper extremity, pain of the neck, shoulder and arm (radicular symptoms), economic situation and severity of work. The postoperative clinical status was compared to the preoperative status and classified as worse, unchanged, regressive or normalized. Unilateral deterioration of paresis or sensibility was graded as worsening, while unilateral improvement was graded as regression.

The most frequent symptoms were gait dysfunction (86%), increased muscular reflexes (79.1%), pathological reflexes (65.1%), paraesthesia of the upper limb (69.8%) and pain (67.4%). The frequency of the symptoms corresponds with the frequency described by other authors [6, 12, 15, 17]. The data also confirm that funicular and radicular symptoms are usually combined in CSM [9, 10, 18]. Postoperatively 51.9% of the patients showed improvement of the clinical symptoms. Specifically, radicular symptoms improved in 58.2% of the patients and funicular symptoms in 49.9%. Similar results in clinical development were also found by Chiles et al. (1999) [4].

The scores

The scores investigated are shown in Tables 1, 2, 3, 4, and 5.

Table 1 Nurick-score (the higher the grade, the more severe the deficit)
Table 2 Japanese orthopaedic association score (JOA-score modified by Keller 1993)
Table 3 Cooper myelopathy scale
Table 4 Prolo-score (modified for CSM)
Table 5 European myelopathy score (EMS)

Results

Score-results and changes of the scores in the course of investigation

Table 6 shows the pre- and postoperative results of each score, including the recovery-rate as a measure of the cumulative improvement.

Table 6 Mean pre- and postoperative score-results and recovery-rate of the scores

Postoperative improvement was highly significant (Wilcoxon-test: Nurick-score P < 0.004; all the other scores P < 0.0001). Similar results concerning the improvement of the scores investigated have been reported by different authors [5, 7, 8, 12, 24, 25, 27, 30, 31].
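The paired pre- and postoperative comparison relies on the Wilcoxon signed-rank test. As a rough illustration of what that test computes, the sketch below derives the two signed rank sums from paired scores. The function names and the example scores are hypothetical and are not study data; the P-value itself would normally be taken from tables or a statistics package and is deliberately not computed here.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sums(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired scores; zero differences are dropped."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one value per patient")
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical scores of five patients (higher = better); illustrative only.
pre_scores = [10, 11, 12, 9, 13]
post_scores = [14, 13, 12, 8, 16]
w_plus, w_minus = wilcoxon_rank_sums(pre_scores, post_scores)
```

A large imbalance between the two rank sums indicates a systematic shift between the paired measurements; with equal sums there is no evidence of change.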

Correlation of the scores

The correlation between the scores was measured using Spearman’s rho. All the scores show a significant pre- and postoperative correlation. A strong correlation is demonstrable throughout the whole investigation between the Nurick-score, the JOA-score, the CMS of the lower extremity and the EMS (P < 0.0001). The Prolo-score shows a good correlation with the Nurick-score (P = 0.001), the JOA-score and the EMS (P = 0.002). The correlation between the Prolo-score and the CMS is weaker preoperatively (P < 0.05) than postoperatively (P < 0.001). The CMS of the upper extremity and the Nurick-score correlate at the level of P < 0.05, whereas the JOA-score and the EMS show a strong correlation with the CMS (UE) (P < 0.001). Both of these scores assess upper extremity function separately, which they, according to the correlation result, reflect better than the Nurick-score. As expected, there is no significant correlation between the CMS of the upper and the lower extremity, since the two dysfunctions do not depend on each other. The weaker correlation between the Prolo-score and the other scores, especially preoperatively, can be explained by the stronger weight of the economic status in this score, which the other scores consider only in connection with functional deficit (example: Nurick-score grade three) and not separately. Postoperatively the correlation between the Prolo-score and the other scores becomes stronger. Persistent functional deficit leads to persistent inability to work and assignment to higher grades of severity, whereas patients with improved symptoms were again able to work and achieved better score-results.
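For readers unfamiliar with rank correlation, the following minimal sketch shows one way to compute Spearman's rho as the Pearson correlation of tie-averaged ranks. The function names and the six pairs of grades are hypothetical and are not study data. Note that a scale on which higher grades mean a more severe deficit and a scale on which higher values mean better function will correlate negatively in raw values even when they agree closely.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical grades for six patients; illustrative only.
severity_grade = [1, 2, 2, 3, 4, 5]        # higher = more severe deficit
function_score = [16, 14, 13, 11, 9, 6]    # higher = better function
rho = spearman_rho(severity_grade, function_score)  # strongly negative
```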

Reflection of the clinical symptoms in the scores

The scores assess gait function, fine motor function, sensibility, proprioception and coordination as well as vegetative symptoms. In addition, the Prolo-score and the Nurick-score also assess the economic situation of the patient. To clarify which symptoms have an impact on the score-result, we investigated, using the Mann–Whitney U-test, whether the presence or absence of the funicular and radicular symptoms makes a significant difference to the height of the scores (Table 7).
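The group comparison described above can be sketched as follows. This is a simplified illustration of the Mann–Whitney U-test, assuming a normal approximation with tie correction and no continuity correction; it is not necessarily the exact procedure used in the study, and the approximation is rough for small groups. The function names and the grades are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U_a, U_b, approximate two-sided P) for two independent groups."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # rank sum of group a, shifted
    u_b = n1 * n2 - u_a
    n = n1 + n2
    # Tie correction: each group of t tied values reduces the variance.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u_a - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_a, u_b, p_two_sided


# Hypothetical severity grades; illustrative only.
with_symptom = [3, 4, 4, 5, 3]
without_symptom = [1, 2, 2, 3]
u_with, u_without, p_value = mann_whitney_u(with_symptom, without_symptom)
```

Here `u_with` counts the pairs in which a patient with the symptom has the higher grade (ties count half), so a value close to the product of the group sizes indicates that the symptom is associated with higher grades.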

Table 7 Correlation between scores and clinical symptoms

For clinical symptoms leading to significantly worse score-results in at least one of the scores (Table 7: + or ++) [14], bar diagrams were designed (Fig. 1, example: gait dysfunction).

Fig. 1 Bar diagrams of the grades of severity (in percent) for each score depending on the occurrence (yes) or non-occurrence (no) of gait dysfunction. Taking gait competence as an example, the figure shows that, with the exception of the Prolo-score, all scores tend to higher grades of severity if gait dysfunction occurs (yes), yet this tendency is not statistically significant for the JOA-score and the EMS. The same tendency was shown for paresis of the lower extremity, vegetative symptoms, dysdiadochokinesia, paresis of the upper extremity and paraesthesia in the lower extremity for all the scores

Recovery-rate

The recovery-rate shows the changes of the scores, taking improvement, worsening and no changes of the scores into consideration. It is assessed by the following formula [10]

$$ \text{Recovery-rate} = \frac{\text{postoperative score} - \text{preoperative score}}{\text{total score} - \text{preoperative score}}. $$
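As a worked illustration of the formula, the sketch below evaluates it for a hypothetical patient. The function name, the 17-point total and the individual scores are assumptions chosen for the example, and the sketch assumes a scale on which higher values mean better function. When the preoperative score already equals the total score the denominator is zero, so the rate is treated as undefined.

```python
from typing import Optional


def recovery_rate(pre: float, post: float, total: float) -> Optional[float]:
    """Recovery-rate = (post - pre) / (total - pre).

    Returns None when the preoperative score equals the total score,
    because the denominator is then zero and the rate is undefined.
    """
    if total == pre:
        return None
    return (post - pre) / (total - pre)


# Hypothetical patient on an assumed 17-point scale; illustrative only.
improved = recovery_rate(pre=10, post=14, total=17)   # 4 / 7
unchanged = recovery_rate(pre=10, post=10, total=17)  # 0 / 7
worsened = recovery_rate(pre=10, post=8, total=17)    # -2 / 7
```

Improvement gives a positive rate, no change gives zero and worsening gives a negative rate, which is how a single figure can take all three courses into account.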

Table 8 shows the recovery-rates of the scores in comparison with the number of patients showing improved score-results.

Table 8 Comparison of the recovery-rates and the number of patients showing improved symptoms

The recovery-rates vary less than the number of patients showing improvement. The difference between the recovery-rates of the Nurick-score and the EMS on the one hand and the recovery-rate of the JOA-score on the other is statistically significant (P < 0.05). The comparison of the recovery-rates of all the other scores showed no significant difference (Wilcoxon-test: P > 0.05). Therefore an assessment of the recovery-rate is recommended, as it allows a better comparison of the results of differently structured scores.

Reflection of diagnostic and anamnestic data in the scores

Duration of symptoms

The duration of myelopathic symptoms has no influence on the height of the scores (Spearman’s rho: P > 0.05).

Age

With the exception of the Prolo-score, younger patients showed significantly better preoperative score results than older patients (Spearman’s rho: P < 0.015). The Wilcoxon-test showed that the increase of the scores, and thereby the improvement of the preoperative clinical symptoms, was more significant for patients of working age (P ≤ 0.001) than for those in retirement (P < 0.05). Younger patients seem to benefit more from operation than older patients.

Economic situation

Preoperatively, the economic situation (capable/incapable of gainful employment) is significantly reflected only by the Prolo-score (Kruskal–Wallis-test: P = 0.002), which assigns patients incapable of gainful employment to higher grades of severity. The Prolo-score lacks a preoperative differentiation between patients with a severe functional deficit and those with slight incapacity. The other scores evaluate the grade of severity by the grade of functional disorder rather than by the economic situation and preoperatively show no correlation with the economic situation. Postoperatively this correlation was significant for all the scores, with the exception of the CMS of the lower extremity (Kruskal–Wallis-test: P < 0.05). The postoperative course shows that the evaluation of the economic situation is appropriate, since persistent functional deficit can affect the ability to work and is therefore important to consider when evaluating the outcome.

Severity of work

There was no significant correlation between the height of the scores and the severity of work the patients had to cope with in their jobs (Kruskal–Wallis-test: P > 0.05).

Somatic evoked potentials

Somatic evoked potentials of the N. medianus (MSEP) and the N. tibialis (TSEP) were assessed pre- and postoperatively. Preoperatively 82% of the patients showed pathological TSEP and 43% a pathological MSEP. Postoperatively 86% showed a pathological TSEP, and a pathological MSEP was registered in 36% of the patients.

There was no significant correlation between the SEP and the height of the scores (Kruskal–Wallis-test: P > 0.05).

Radiological findings

The radiological examination showed a normal lordotic alignment of the cervical spine in 58% of the patients, 42% showed an abnormal alignment (steep or kyphotic cervical spine). High signal intensity of the myelon appeared in 47% of the patients. A narrow anterior–posterior diameter of the spinal canal smaller than 14 mm over at least five levels was detected in 28%. A compression of the myelon was observed in 56% of the patients, while 12% showed a radicular compression.

Abnormal curvature of the cervical spine was described as a negative predictor for the postoperative recovery by several authors [1, 20, 30].

The CMS of the lower extremity preoperatively showed significantly lower grades of severity in cases of false alignment of the cervical spine (U-test: P < 0.05). The other scores showed no significant correlation to any of the radiological findings. Signal intensity and myelon or radicular compression had no impact on the height of the scores (U-test: P > 0.05).

Discussion

The “Nurick-score” is the oldest of the scores investigated, and like the JOA-score it is well established in the literature. However, the Nurick-score judges the postoperative outcome less accurately than the other scores. Changes of the upper extremity are difficult to detect with the Nurick-score, since its main focus is on gait function. Investigating the clinical dynamics, more improvement was found in radicular than in funicular symptoms [3].

The “CMS” shows this change by analysing functions of the upper and lower extremity separately, and it is the only score that significantly reflects both of these symptoms. Therefore the CMS should be preferred to the Nurick-score when evaluating the patient’s functional status. Nevertheless, one benefit of the Nurick-score is the assessment of the economic situation in connection with gait function, which has a great impact on the score-result, especially postoperatively, while the CMS does not consider the economic situation.

The “JOA-score” best measures the outcome when compared to the other scores. Changes of sensitivity disorders of the upper extremity are very well reflected, since they are assessed in a differentiated manner. The JOA-score is frequently used in the literature, which is why it can be recommended for consideration, as the results of different studies are more easily compared when the same score is used [13].

However, to evaluate the total function of a person with CSM, the “EMS” should be preferred to the JOA-score, since the EMS seems to be able to reveal functional deficit better by additionally assessing proprioception and coordination. Analysing those symptoms allows conclusions to be drawn about the abilities of the patient at work and in everyday life.

As shown in Table 7, the Nurick-score only reflects symptoms of the lower extremity, while JOA-score, CMS and EMS also take paresis of the upper extremity as a symptom of root compression and dysdiadochokinesia (disturbance of coordination) into account. They significantly judge function of the second motor neuron, the Funiculus posterior, and the proprioceptive system, while only the CMS significantly judges symptoms of the upper as well as the lower extremity.

The “Prolo-score” does not reflect clinical symptoms significantly, but still shows a recovery rate similar to the JOA-score and the CMS. Therefore the Prolo-score is useful to assess changes of symptoms in CSM after operative intervention. Regaining the ability to work and being able to perform housework or retirement activities can be understood as a measure of normalisation and rehabilitation. However, the Prolo-score is not suitable to evaluate the preoperative grade of severity, since it does not differentiate clinical symptoms [28].

The analysis shows that, with the exception of the Prolo-score, all scores measure the essential, mostly funicular symptoms of cervical myelopathy. Paraesthesia of the upper extremity, pain, muscle reflexes, pathologic reflexes and muscle tone show no significant correlation with the height of the score-results, meaning that the presence or absence of those symptoms has no influence on the grade of severity in any of the scores.

Younger patients seem to benefit more from operation than older patients; however, these results are only partially confirmed in the literature [16, 20]. Yamazaki et al. (2003) [32] and Fessler et al. (1998) [8] showed that the preoperatively evaluated scores (using the JOA-score or the Nurick-score) were significantly lower in older patients than in younger ones, but could not find age to be a predictive factor for better or worse outcome.

Davis (1996) [6] showed in his study on operative therapy in patients with cervical radiculopathy that patients in jobs with less demanding work and/or housewives had better score-results than patients that carry out strenuous jobs. This could not be proven to be significant in patients with cervical myelopathy.

There was a weak correlation between the height of the recovery-rate and the height of the Prolo-score (P < 0.05), meaning that patients with a higher preoperative Prolo-score show a higher recovery-rate. This result corresponds with a statement given by Chiles et al. (1999) [4], that preoperatively higher scores are predictive for better outcome.

For this reason, and because the correlation with the Prolo-score is only weak, it seems that the preoperative severity of clinical symptoms is not correlated with the height of the recovery-rate or with the outcome. The results of Fessler et al. (1998) [8], Schön (1999) [30] and Restuccia et al. (1992) [29] support this conclusion.

Schön (1999) [30] describes a very weak correlation between a lower EMS and pathological SEP. Lyu et al. (2004) [19] found the same using the JOA-score. The results of the present study do not confirm these findings.

In the radiological findings an abnormal curvature of the cervical spine was described as a negative predictor for the postoperative recovery by several authors [1, 20, 30].

In the present study, by contrast, the CMS of the lower extremity preoperatively showed lower grades of severity in patients with abnormal alignment of the cervical spine. This circumstance might be due to an earlier indication for operative intervention, in cases of slight clinical but severe radiological findings, to prevent progression of the cervical myelopathy. Furthermore, it is widely known that symptoms of cervical myelopathy occur more often in patients with steep or kyphotic alignment of the cervical spine [2, 10, 11, 16, 18, 20, 21, 2325].

The other scores showed no significant correlation with any of the radiological findings. Signal intensity and myelon or radicular compression had no impact on the height of the scores (U-test: P > 0.05). Naderi et al. (1998) [20] describe a postoperative improvement of the neurological status for patients with and without high signal alteration in the MRI, with the recovery-rate being better for patients without signal alteration. The difference between the two groups was not significant. However, some researchers have reported a significant correlation between the neurological status and the presence of high signal intensity within the spinal cord [33, 34].

Conclusion

The investigated internationally used scores clearly have different benefits and problems, and so one must carefully choose the score best fitting the particular study.

To evaluate the clinical state and the grade of severity of CSM, the EMS and the CMS (assessing upper and lower extremity function separately) seem particularly appropriate. If the interest is focused on the regained ability to work or to perform leisure-time activities, which might interest public health and paying authorities, the Prolo-score should be considered.

Since the scores discussed here are internationally well established, it is neither necessary nor feasible to create another improved score-system. Nevertheless, it is always important to differentiate the therapy results of CSM published worldwide. To assess postoperative success, the evaluation of the recovery rate is essential. Since there is no significant difference in the recovery rate amongst the majority of the scores, this allows a good comparison of the results from different studies.