Introduction

A variety of patient-reported outcome (PRO) measures have been developed and used for patients with degenerative cervical myelopathy, such as Short Form-36 (SF-36) [1], Neck Disability Index (NDI) [2], and EuroQOL (EQ-5D) [3]. Having such tools to assess quality of life or disability associated with cervical disorders from the patient’s own perspective is of absolute importance. The Japanese Orthopaedic Association (JOA) developed the JOA score in 1975 as a quantitative measure of the severity of myelopathy, and various modifications have been in wide use [4,5,6]. Contrary to a common misunderstanding, JOA and modified JOA are not PROs, but rather are scales measured by healthcare providers with limited objectivity, although they remain in use as primary outcome measures for cervical myelopathy. In 2007, the JOA produced a PRO questionnaire specifically designed for cervical myelopathy called the Japanese Orthopaedic Association Cervical Myelopathy Evaluation Questionnaire (JOACMEQ) [7]. The JOACMEQ comprises 24 questions in 5 domains (cervical spine function, upper extremity function, lower extremity function, bladder function, and quality of life [QOL]), yielding five domain summary scores each ranging from 0 to 100. The validity and reliability of the JOACMEQ have been established in both the original form and translated versions [8,9,10].

When treatment success is discussed using PROs, minimum detectable change (MDC) and minimum clinically important difference (MCID) are two important benchmarks that can be referenced. The MDC is defined as the smallest definite change that can be detected by a measurement or perceived by a patient, while the MCID represents the smallest change recognized as clinically meaningful by a patient [11]. The challenge is that these thresholds can differ based on patient cohort, diagnosis, and surgical procedure. Moreover, the definition of “clinical importance” has yet to be conclusively established, and no gold standard exists for addressing this question. As a result, various cut-off values have been applied for each PRO reported in the literature.

Since 2002, we have investigated patients’ recognition of health transitions as well as satisfaction with treatment after cervical decompression surgery by distributing specific questionnaires in addition to standard PRO measures [12]. We believe that such results are useful for dichotomizing patients in order to estimate MDC and MCID. The purpose of the present study was to elucidate MDCs and MCIDs for the JOACMEQ, SF-36, NDI, and EQ-5D used for degenerative cervical myelopathy patients undergoing laminoplasty.

Materials and methods

Patient sample and outcome measurements

A consecutive series of patients ≥ 18 years old who underwent laminoplasty for degenerative cervical myelopathy at a single academic institution from 2002 to 2010 was retrospectively reviewed. Patients with rheumatoid arthritis, which can compromise accurate assessment of motor function, and those with a history of malignancy, which can negatively affect QOL, were excluded from this analysis. Preoperative physical and mental dysfunction and QOL were assessed using the PRO questionnaires (JOACMEQ, SF-36, NDI, and EQ-5D). The NDI is expressed as a percentage of the full score in the present study. All patients were followed for more than 12 months, and postoperative status was re-assessed using the same questionnaires.

Postoperatively, patients were also asked to answer two additional anchor questions. Both questions were prepared with responses using 7-point Likert scales. The first question asked about the patient’s health transition, i.e., how much the patient deemed his or her postoperative condition had changed from the preoperative status, with possible answers of “much worse,” “worse,” “somewhat worse,” “about the same,” “somewhat better,” “better,” and “much better.” The other question asked if the patient was satisfied with the treatment results, with possible answers of “very dissatisfied,” “dissatisfied,” “somewhat dissatisfied,” “unsure,” “somewhat satisfied,” “satisfied,” and “very satisfied.”

Distribution-based method

The distribution-based method was used to estimate the MDC cut-off based on the statistical characteristics of the score distribution. With this method, the minimum amount of change potentially detectable was estimated to be greater than the standard error of measurement (SEM) with a 95% confidence interval [13]. The SEM was calculated as \(\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - R}\), where SD is the standard deviation and R represents reliability. A study by Chien et al. [9] was referenced for the reliability of the five domains in the JOACMEQ (range 0.793–0.903), while 0.90 was used for the SF-36 physical component summary (PCS) and mental component summary (MCS) [14], 0.90 for NDI [15], and 0.69 for EQ-5D [16], according to previous reports.
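As an illustration of the SEM formula above, the following minimal Python sketch computes the SEM from a standard deviation and a reliability coefficient. The text does not spell out the multiplier applied to the SEM to reach 95% confidence, so the `mdc95` helper assumes the commonly used form 1.96 × √2 × SEM, which may differ from the calculation actually used in this study. The function names and the SD of 10 points are hypothetical; only the reliability of 0.90 is taken from the text.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - R)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sd: float, reliability: float, z: float = 1.96,
          difference_of_two: bool = True) -> float:
    """Distribution-based MDC at 95% confidence.

    ASSUMPTION: uses z * sqrt(2) * SEM, a common convention for a change
    score built from two measurements. Set difference_of_two=False to
    obtain z * SEM instead. The source text does not state which form
    was applied.
    """
    factor = math.sqrt(2.0) if difference_of_two else 1.0
    return z * factor * sem(sd, reliability)


# Hypothetical SD of 10 points; reliability of 0.90 as quoted for the NDI.
example_sem = sem(10.0, 0.90)
example_mdc = mdc95(10.0, 0.90)
print(f"SEM = {example_sem:.2f}, MDC95 = {example_mdc:.2f}")
```

Because the SEM scales with √(1 − R), an instrument with lower reliability (such as the 0.69 quoted for EQ-5D) yields a proportionally larger SEM for the same SD.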

Anchor-based method

The anchor-based method was also used for calculating cut-offs for MDC and MCID. Using this method, “anchors” as gold standards for assessing the change in the condition of a patient were utilized, and cut-offs were estimated based on receiver operating characteristic (ROC) curve analysis. ROC curves were created by plotting sensitivity and specificity. In this context, sensitivity was defined as the proportion of patients in whom the change in score was greater than the MDC/MCID on each measurement among those who met the gold standard criteria. For the MDC, this external criterion was defined as the patient perceiving their health status to be “somewhat better,” “better,” or “much better.” For the MCID, it was defined as the patient being “somewhat satisfied,” “satisfied,” or “very satisfied” with the treatment results. In contrast, specificity was defined as the proportion of the patients with a change in score smaller than the MDC/MCID among those who did not meet the gold standard criteria described above. Cut-offs were set at the points on the ROC curve where \((1 - \text{sensitivity})^2 + (1 - \text{specificity})^2\) was smallest according to the least-squares method.
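The cut-off selection described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the SPSS procedure used in the study. It assumes that candidate cut-offs are the midpoints between consecutive observed change scores, and the function name and the toy data are hypothetical. Sensitivity and specificity follow the definitions in the text: responders with a change greater than the cut-off, and non-responders with a change smaller than the cut-off.

```python
from typing import Sequence, Tuple

# Anchor answers counted as "responders" for the MDC, as stated in the text.
MDC_RESPONDER_ANSWERS = {"somewhat better", "better", "much better"}


def closest_to_corner_cutoff(
    changes: Sequence[float], responder: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) minimising
    (1 - sensitivity)^2 + (1 - specificity)^2.

    ASSUMPTION: candidate cut-offs are midpoints between consecutive
    distinct change scores; ties keep the smallest cut-off.
    """
    pos = [c for c, r in zip(changes, responder) if r]
    neg = [c for c, r in zip(changes, responder) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")

    values = sorted(set(changes))
    candidates = [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct change scores")

    best = None
    for cut in candidates:
        sens = sum(c > cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        dist = (1.0 - sens) ** 2 + (1.0 - spec) ** 2
        if best is None or dist < best[0]:
            best = (dist, cut, sens, spec)
    return best[1], best[2], best[3]


# Toy data (hypothetical): score changes and dichotomised anchor answers.
toy_changes = [-5, 0, 2, 3, 8, 10, 15, 20]
toy_answers = ["worse", "about the same", "somewhat better", "about the same",
               "somewhat worse", "better", "better", "much better"]
toy_responder = [a in MDC_RESPONDER_ANSWERS for a in toy_answers]
print(closest_to_corner_cutoff(toy_changes, toy_responder))
```

The same function applies to the MCID by dichotomising the satisfaction question instead of the health transition question.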

Statistical analyses

All analyses were carried out using IBM SPSS Statistics version 19 software (SPSS, Somers, NY). To analyze differences in scores before and after surgery, a paired t test or Wilcoxon’s signed-rank test was used. Correlations between the variables were tested by either Pearson’s correlation coefficient or Spearman’s rank correlation coefficient rho. For all statistical tests, values of p < 0.05 were considered significant. Approval for this study was given by the institutional review board of the Clinical Research Support Center at the University of Tokyo Hospital.

Results

Demographics

A total of 109 cases of cervical laminoplasty for degenerative cervical myelopathy were reviewed. Four patients had rheumatoid arthritis and 5 had a history of malignancy, with 1 patient showing both. After excluding these 8 patients, 101 patients (64 males, 37 females) were included in the analysis. Mean age was 66.1 years (standard deviation [SD]: 10.8 years; range: 33–91 years). The most common diagnosis at surgery was cervical spondylotic myelopathy (n = 60, 59%), followed by ossification of the posterior longitudinal ligament (n = 38, 38%), disk herniation (n = 2, 2%), and ossification of the ligamentum flavum (n = 1, 1%). Mean follow-up was 27.5 months (SD 14.8 months; range 12–89 months).

Pre- and postoperative outcome measurements as well as their postoperative changes are summarized in Table 1. Domain scores of the JOACMEQ showed significant postoperative improvement, except for the bladder function score. All other outcome measurements also showed significant postoperative improvement, except for SF-36 MCS. Among these improvements, only the JOACMEQ upper extremity function score showed a very weak but significant correlation with the follow-up period (r = 0.222, p = 0.04); none of the other score changes was significantly correlated with the follow-up period (p = 0.07–0.89). Regarding the anchor questions, 68% of patients reported that their health condition was at least “somewhat better” than before surgery, whereas 66% were at least “somewhat satisfied” with the treatment results and 49% were “satisfied” or “very satisfied.”

Table 1 Comparisons of pre- and postoperative outcome measures

MDCs and MCIDs

Based on the observed standard deviations and reliability values previously reported in the literature, the distribution-based MDC was calculated for the five domain scores of JOACMEQ, SF-36 PCS and MCS, NDI, and EQ-5D (Table 2).

Table 2 Minimum detectable changes and minimum clinically important differences for outcome measures

Next, ROC curves were determined for the outcome measures used, except for JOACMEQ bladder function and SF-36 MCS, which did not show significant postoperative improvements. First, sensitivity and specificity were plotted for the health transition question results in order to determine cut-offs for the MDC (Figs. 1 and 2). Area under the curve (AUC) varied from 0.575 to 0.695, depending on the score of interest (Table 2). Cervical spine function score of the JOACMEQ showed the worst discriminant capability, whereas QOL score showed the largest AUC. Anchor-based cut-offs for the MDC of the JOACMEQ, SF-36 PCS, NDI, and EQ-5D are shown in Table 2.

Fig. 1 Receiver operating characteristic (ROC) curve to determine the minimum detectable changes (MDCs) for Japanese Orthopaedic Association Cervical Myelopathy Evaluation Questionnaire (JOACMEQ) domain scores. CS cervical spine function; UE upper extremity function; LE lower extremity function; QOL quality of life

Fig. 2 ROC curve to determine the MDCs for Short Form-36 (SF-36) physical component summary (PCS), Neck Disability Index (NDI), and EuroQOL (EQ-5D)

In the same manner, ROC curves were created using the results from the patient satisfaction question to determine the MCID (Figs. 3 and 4). AUCs ranged from 0.584 to 0.753, with JOACMEQ cervical spine function being smallest and NDI being largest. AUCs for the patient satisfaction ROC curves tended to be larger than those for the health transition ROC curves, with some exceptions (Table 2). Cut-offs for MCIDs of the JOACMEQ were 2.5 for cervical spine function, 13.0 for upper extremity function, 9.35 for lower extremity function, and 9.5 for QOL. The MCID was 3.9 for SF-36 PCS, 4.2 for NDI, and 0.0485 for EQ-5D. For reference, the same analyses were performed with more stringent criteria, in which only “satisfied” and “very satisfied” patients were included as responders; under these criteria, the MCIDs for JOACMEQ cervical spine function, QOL, SF-36 PCS, and NDI were larger.

Fig. 3 ROC curve to determine the minimum clinically important differences (MCIDs) for JOACMEQ domain scores. CS cervical spine function; UE upper extremity function; LE lower extremity function; QOL quality of life

Fig. 4 ROC curve to determine the MCIDs for SF-36 PCS, NDI, and EQ-5D

Discussion

The present study elucidated MDCs and MCIDs for the JOACMEQ, SF-36, NDI, and EQ-5D, all of which are commonly used for assessing dysfunction and QOL of degenerative cervical myelopathy patients, in the context of cervical laminoplasty.

The MDC and MCID are often confused with each other, but represent two conceptually different properties of outcome measurements [11]. As we have mentioned, inconsistencies in calculation and definition have resulted in the various threshold values reported in the literature. A distribution-based cut-off reveals the score change beyond statistical measurement error. We therefore believe that it should be defined as MDC. As for anchor-based cut-offs, the choice of anchors has a substantial impact on the results. In the present study, patients were instructed to answer two independent questions about their health transition and post-treatment satisfaction, and MDC and MCID were differentiated using these two separate anchors.

Several previous reports have attempted to calculate MCIDs for the outcome measures used in the present study. A review of the literature is summarized in Table 3. One study only reported distribution-based MCID [17], while others also employed anchor-based methods [18,19,20,21,22,23]. Anchors varied from an external criterion of health transition or patient satisfaction as answered in independent questionnaires, to the cut-off of NDI as a surrogate. Three studies used the SF-36 health transition item (HTI) as the anchor [18, 19, 22], and another study employed global rating of change, which uses a standard 11-point Likert scale [21]. The HTI is essentially a five-level Likert-type question that assesses the patient’s recognition of whether their health status has improved since treatment [14], and is equivalent to the 7-level health transition scale we used. Based on our definition, the cut-offs reported as MCIDs were equivalent to MDCs in the present study.

Table 3 Review of the literature

MDCs and MCIDs for the five domain scores of the original version of the JOACMEQ have never been reported, although Chien et al. mentioned MCIDs for the Chinese-translated version [21]. The JOA has proposed that an increase in score of ≥ 20 in each domain be considered as “effective” [24]. However, the present study revealed that the MCID of the original JOACMEQ ranged from 2.5 to 13 depending on the domain, suggesting that the criterion previously proposed by the JOA was too stringent. Interestingly, the MDC of the JOACMEQ cervical spine function score was larger than the MCID. Typically, the MCID tends to be larger than the MDC, and any changes that fall between these two cut-offs indicate “statistically significant,” but not clinically important, changes. On the other hand, when the MDC exceeds the MCID, changes between these two values can be clinically important, but not distinguishable from measurement error. Therefore, using the MDC rather than the MCID as a threshold to evaluate recovery in cervical spine function using the JOACMEQ could represent a safer option.
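The relationship between the two thresholds discussed above can be made concrete with a small Python sketch. The function name, the use of ≥ at each boundary, and the numeric values in the example are all hypothetical and chosen only to illustrate the two orderings (MCID above MDC, and MDC above MCID); they are not values from Table 2.

```python
def interpret_change(change: float, mdc: float, mcid: float) -> str:
    """Classify a score improvement against the MDC and the MCID.

    ASSUMPTION: a change equal to a threshold counts as meeting it.
    """
    detectable = change >= mdc   # beyond measurement error
    important = change >= mcid   # clinically meaningful to the patient
    if detectable and important:
        return "detectable and clinically important"
    if detectable:
        return "detectable but below the MCID"
    if important:
        return "meets the MCID but within measurement error"
    return "neither detectable nor clinically important"


# Typical ordering (MCID > MDC): a change between the two is detectable only.
print(interpret_change(5.0, mdc=3.0, mcid=8.0))
# Reversed ordering (MDC > MCID): a change between the two cannot be
# separated from measurement error, which is why the larger threshold
# is the more conservative choice.
print(interpret_change(5.0, mdc=8.0, mcid=3.0))
```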

Comparison of the present results with those from previous reports is not simple. As we have previously mentioned, cut-offs can be affected by the method of calculation, the anchor used in the anchor-based method, the patient cohort, and the surgical procedure. Table 3 shows that MCIDs previously reported in the literature were within the same range as ours, except for JOACMEQ and NDI. Chien et al. [21] reported much smaller values for the MCIDs of JOACMEQ upper and lower extremity functions. The biggest difference between their study and ours was the time frame for postoperative assessment. They assessed score changes as early as 3 months postoperatively, whereas the majority of studies (including our own) used a 1-year time point. This might have affected patients’ recognition of health transitions, in that a relatively smaller improvement could have been deemed satisfactory. Our MCID for NDI was smaller than that reported in any previous study. One possible explanation is that our present cohort might have consisted of patients with relatively minor neck pain. For instance, Carreon et al. [18] reported an average preoperative NDI of 53.0%, much higher than our result, in a cohort defined as patients undergoing cervical fusion surgery for degenerative conditions. In general, Asian populations with a background of developmental canal stenosis are subject to spinal cord compression at an earlier stage of degenerative change, and thus with less neck pain, than other populations.

A few limitations need to be considered when interpreting the results of the present study. First, our cohort consisted only of patients who underwent cervical laminoplasty. Thus, the present cut-offs for MDCs and MCIDs may not be applicable to other cohorts, such as patients undergoing anterior procedures or posterior fusion surgery. In particular, outcomes after fusion surgery indicated for cervical deformity with neck pain could differ significantly from those after decompression surgery for myelopathy. Second, as we have repeatedly mentioned, no anchor has been established for determining the MCID. Although several authors have used health transition scales as an external anchor, we believe that patient satisfaction is more suitable, despite being affected not only by the treatment result, but also by the hospital environment, the patient’s experience through the treatment course, and various other non-clinical factors. Another problem is that no gold standard to assess satisfaction has yet been established. Parker et al. [20] selected a patient satisfaction anchor derived from the North American Spine Society Patient Satisfaction Questionnaire, whereas we used a 7-point Likert scale. Of note, the previous studies that used the HTI included the smallest level of difference, recorded as “somewhat,” in the responder group, but Parker et al. defined responders as only those with the highest level of satisfaction, who answered “The treatment met my expectations” on the scale. Therefore, we calculated MCIDs based on these two criteria with different levels of stringency and found that the cut-off values with more stringent criteria tended to be larger. Lastly, our follow-up period varied among the patients, although all the patients were followed for more than 1 year. Given that there was little to no correlation between the postoperative outcome score improvements and the follow-up period, and that the perception of the postoperative health transition and satisfaction was linked to the individual timing of outcome measurements, we believe the possible bias introduced into the calculation of MCID was negligible.

Conclusion

In conclusion, the present study revealed that MCIDs among patients undergoing cervical laminoplasty were 2.5 for JOACMEQ cervical spine function, 13.0 for upper extremity function, 9.35 for lower extremity function, 9.5 for QOL, 3.9 for SF-36 physical component summary score, 4.2 for NDI, and 0.0485 for EQ-5D, based on the anchor-based method using a patient satisfaction scale.