Introduction

Neck pain (NP) is nearly as common as low back pain (LBP) and causes almost as many lost working days [1]. When pain persists for more than 12 weeks, it is usually defined as chronic. Chronic NP has a 1-year prevalence ranging from 1.7 to 11.5 % in the general population [2] and can be a significant burden to both the patient and society.

In recent years, patient-reported outcomes have become a useful means of quantifying disability and monitoring the effectiveness of interventions for patients with spinal diseases, including those resulting in neck and/or arm pain. It has been recommended that at least five domains should be assessed: pain symptoms (axial and radiating pain), function, symptom-specific well-being, work disability and social disability [3]. Given this premise, in 1998 a multidimensional set of questions was recommended and introduced as “the core set” [3]. More recently, the questions were put together in an instrument entitled the “Core Outcome Measures Index” (COMI), covering the aforementioned domains and also including a question about quality of life [4]. The instrument showed excellent psychometric characteristics in patients with back pain undergoing either surgical or conservative management [5, 6]. Multilingual adaptations of the COMI for use in patients with LBP confirmed that it displayed adequate psychometric properties, and the COMI has subsequently become the main patient-oriented outcome tool for the surgical and conservative registries of the “Spine Tango Registry” of EuroSpine [7].

To complement the COMI-back, a COMI-neck was developed, with its psychometric properties being investigated in patients with chronic NP [8] and patients with degenerative problems of the cervical spine undergoing disc arthroplasty [9]. It was shown to be an effective instrument, without notable floor or ceiling effects, and to have good construct validity when compared with other instruments commonly used to evaluate neck disorders. The instrument was also responsive, being able to discriminate well between patients with a good outcome and those with a poor outcome after the surgery [9]. In order to evaluate the wider applicability of this disease-specific questionnaire (COMI-neck), it was recommended that its applicability be examined in other languages as well as in patients undergoing other treatment modalities [9].

The aim of this study was to describe the validation of the cross-culturally adapted Italian version of the COMI-neck in patients with chronic NP undergoing conservative treatment.

Methods

The process of cross-cultural adaptation of the Italian COMI-back has already been described in detail [10] and was carried out in accordance with established guidelines [11]. The questionnaire was adapted for the cervical spine by enquiring about neck pain rather than back pain, and arm/shoulder pain rather than leg/buttock pain, and by making reference to the “neck problem” rather than the “back problem”. Otherwise the wording in the instrument was identical to that in the Italian version of the COMI-back. Both the English and Italian versions of the COMI-neck are shown in Appendices 1 and 2. This longitudinal study was approved by the Institutional Review Board of our hospital. Patients gave their written consent to take part.

Patients

Outpatients attending the Physical Medicine and Rehabilitation Unit of our Hospital and an affiliated rehabilitation centre were recruited between March and December 2012. Inclusion criteria were chronic NP (lasting more than 3 months), age ≥18 years, and the ability to read and speak Italian fluently. Exclusion criteria were specific causes of NP (e.g. disc herniation, stenosis, deformity, fracture), central and peripheral neurological signs, systemic illness (e.g. tumours and rheumatologic diseases), and psychiatric disorders. Patients with recent cerebrovascular events, myocardial infarction or chronic lung or renal disease were also excluded.

Demographic and clinical characteristics were recorded by research assistants.

Procedures

Two research assistants were involved in providing the participants with written information about the study procedures. Patients satisfying the admission criteria underwent an 8-week rehabilitation programme that included exercises aimed at improving postural control, strengthening and stabilising the neck muscles, and stretching. Patients also received education in ergonomic principles.

The Italian version of the COMI-neck was administered to all patients as part of a comprehensive pre- and post-rehabilitation assessment that included evaluations of disability, quality of life, pain and the global treatment outcome (GTO). The GTO was evaluated using the question “Overall, how much did the treatment you received help your neck problem?” and was answered on a 5-point Likert scale, ranging from “helped a lot” to “made things worse” [4, 12]. The GTO was dichotomised as “good” (helped, helped a lot) and “poor” (helped only little, did not help, made things worse) for further analyses.
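The dichotomisation of the GTO described above can be sketched as a simple lookup. This is an illustrative sketch only: the response labels follow the wording quoted in the text, while the names `GTO_GROUP` and `dichotomise_gto` are hypothetical and not part of the original analysis.

```python
# Hypothetical lookup table: the five GTO response options mapped to the
# two outcome groups used in the responsiveness analyses.
GTO_GROUP = {
    "helped a lot": "good",
    "helped": "good",
    "helped only little": "poor",
    "did not help": "poor",
    "made things worse": "poor",
}


def dichotomise_gto(response: str) -> str:
    """Return 'good' or 'poor' for one GTO response (case-insensitive)."""
    return GTO_GROUP[response.strip().lower()]
```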

Outcome measures

COMI-neck

This is a self-administered measure composed of seven questions aimed at evaluating pain (item 1a for NP and item 1b for arm/shoulder pain), neck-related function (item 2), symptom-specific well-being (item 3), general quality of life (item 4), and disability (item 5 for social and item 6 for work activities). All of the items relate to how the patient felt in the last week, except for items related to disability, which refer to the last 4 weeks. Items concerning pain use a 0–10 graphic rating scale (GRS) while the remaining items use a 5-point adjectival scale (see Fankhauser et al. [9] for details). For items 1a–1b, the higher of the two scores is used in order to represent “pain” and for items 5–6 the average is used to represent “disability”. Hence, the COMI-neck includes five domains (pain, function, symptom-specific well-being, quality of life, disability). To form the COMI summary score, each of the domain scores is transformed to a 0–10 scale and these are then averaged to give a score ranging from 0 to 10, with higher scores indicating a worse status [9].
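The scoring rule described above can be illustrated with a short sketch. It assumes that the 5-point items are coded 1 to 5 and rescaled linearly to 0–10; the text states only that each domain is transformed to a 0–10 scale, so the exact mapping, like the function and argument names, is an assumption made for illustration.

```python
def rescale_5_point(raw: int) -> float:
    """Map a 1-5 adjectival response onto 0-10 (assumed linear mapping)."""
    if raw not in (1, 2, 3, 4, 5):
        raise ValueError("5-point items are assumed to be coded 1-5")
    return (raw - 1) * 2.5


def comi_summary(neck_pain, arm_pain, function, wellbeing,
                 quality_of_life, social_disability, work_disability):
    """COMI summary score (0-10, higher = worse) from the seven items."""
    # Items 1a/1b are already on a 0-10 scale; the higher one is "pain".
    pain = max(neck_pain, arm_pain)
    # Items 5 and 6 are averaged to give "disability".
    disability = (rescale_5_point(social_disability)
                  + rescale_5_point(work_disability)) / 2
    domains = [
        pain,
        rescale_5_point(function),
        rescale_5_point(wellbeing),
        rescale_5_point(quality_of_life),
        disability,
    ]
    # The five domain scores are averaged to give the summary score.
    return sum(domains) / len(domains)
```

With these assumptions, a patient answering with the best option on every item scores 0 and a patient answering with the worst option on every item scores 10.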

Neck Pain and Disability Scale (NPDS)

This allows a comprehensive evaluation of neck pain and disability. Each of the 20 items is scored using an NRS ranging from 0 (normal function) to 5 (the worst possible situation your problem has led to), leading to a total score ranging from 0 (no disability) to 100 (maximum disability) [13]. Patients completed the validated Italian version, which consists of three subscales (NPDS 1: neck dysfunction related to general activities; NPDS 2: NP and cognitive-behavioural aspects; NPDS 3: neck dysfunction related to activities involving the cervical spine) [14].

EuroQol-5 dimensions (EQ-5D) and EuroQol visual analog scale (EQ-VAS)

The EQ-5D is a generic, self-administered questionnaire that measures health-related quality of life. It consists of five items concerning mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each item is rated on a 3-point adjectival scale. The EQ-VAS is used to quantify the “overall health state”, with the patient indicating his/her current health status on a 0 (worst score) to 100 (best score) VAS [15, 16]. The EQ-5D summary index scores [ranging from −0.594 (worse than death) to 1 (best possible health)] were calculated using the unweighted method of Prieto and Sacristan [17].

Pain numeric rating scale (NRS)

This is an 11-point rating scale ranging from 0 (no pain at all) to 10 (the worst imaginable pain) [18].

Psychometric properties

Acceptability

The time needed to answer the questionnaire was recorded. All of the data were checked for missing or multiple responses.

Floor/ceiling effects

Descriptive statistics were calculated in order to identify floor/ceiling effects, which were considered to be present when >15 % of the subjects obtained the lowest or highest possible score. These percentages indicate the proportion of patients for whom no meaningful deterioration (floor effect) or improvement (ceiling effect) could be detected, since they were already at the extreme of the range [9]. Due to the different scoring polarity of the questionnaires, for the COMI, the NRS and the NPDS, the highest scores represented floor effects (worst status) and the lowest scores ceiling effects (best status); the converse was true for the EQ-5D and EQ-VAS scores.
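As an illustration of the 15 % criterion, the following sketch computes the share of patients at each extreme. The function name and the example data are hypothetical; the caller supplies which score value represents the worst and the best status, so that the polarity of each questionnaire is handled explicitly.

```python
def floor_ceiling(scores, worst, best, threshold=0.15):
    """Share of patients at the worst (floor) and best (ceiling) score."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == worst) / n
    ceiling_share = sum(1 for s in scores if s == best) / n
    return {
        "floor": floor_share,
        "ceiling": ceiling_share,
        # An effect is flagged only when the share exceeds the threshold.
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical COMI scores: 10 is the worst status, 0 the best.
example = floor_ceiling([0, 0, 0, 2, 5, 5, 7, 10, 4, 3], worst=10, best=0)
```

For the EQ-5D and EQ-VAS, the same function would be called with the lowest score as `worst` and the highest as `best`.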

Reliability

Reliability of the COMI-neck was assessed by evaluating the test–retest stability, which measures reliability over time (intraclass correlation coefficient, ICC (2,1), with good and excellent reliability indicated by values of 0.70–0.85 and >0.85, respectively) [19]. The COMI-neck was completed by the patients on two occasions, 7 days apart. This time interval was chosen to optimise the trade-off between recall effects (more likely with shorter intervals) and true change (more likely with longer intervals).
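For readers unfamiliar with the ICC (2,1), the sketch below computes it from the mean squares of a two-way analysis of variance (two-way random effects, absolute agreement, single measurement). It is a plain-Python illustration, not the SPSS procedure used in this study, and the function name and input layout are assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores.

    `scores` is a list of rows, one per patient, each holding that
    patient's score on every occasion (two columns for test-retest).
    """
    n = len(scores)       # number of patients
    k = len(scores[0])    # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for patients, occasions and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because the occasion mean square enters the denominator, a systematic shift between the first and second administration lowers this coefficient, which is why it is suited to assessing agreement rather than mere consistency.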

Minimum detectable change score

The smallest change in score that is likely to reflect a true change rather than a measurement error was estimated by means of the minimum detectable change (MDC). This was calculated by multiplying the standard error of the repeated measurements (SEM) by the z-score associated with the desired level of confidence (95 % in our case) and the square root of 2, which reflects the additional uncertainty introduced using difference scores based on measurements made at two time points (in our case on days 1 and 7). The SEM was estimated using the formula $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - R}$, where SD is the baseline standard deviation of the measurements, and R the test–retest reliability coefficient (i.e. ICC) [19].
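The MDC calculation can be written out as a minimal sketch. The z-score of 1.96 corresponds to the 95 % confidence level mentioned above; the function names are hypothetical, and the baseline SD of 2.0 used in the example is an invented value for illustration, not a result of this study.

```python
import math


def sem(sd_baseline: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - R)."""
    return sd_baseline * math.sqrt(1.0 - icc)


def mdc(sd_baseline: float, icc: float, z: float = 1.96) -> float:
    """Minimum detectable change: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem(sd_baseline, icc)


# Hypothetical baseline SD of 2.0 points combined with an ICC of 0.87.
example_mdc = mdc(sd_baseline=2.0, icc=0.87)
```

The sketch makes the dependence explicit: for a fixed SD, the MDC shrinks as the reliability coefficient approaches 1.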

Construct validity

Construct validity, i.e. the extent to which an instrument’s scores relate, to the expected degree, to the scores of another instrument measuring a theoretically related construct, was investigated by means of hypothesis testing [19]. It was hypothesised a priori that the following pairs of COMI items and corresponding items/questionnaires would achieve a level of correlation ranging from 0.40 to 0.80 (Pearson’s correlation):

  • the COMI “worst pain” score and the NRS and the NPDS 2 subscale;

  • the COMI item “neck function” and the NPDS 1 and 3 subscales;

  • the COMI item “symptom-specific well-being” and the EuroQol-5D and EQ-VAS;

  • the COMI item “general quality of life” and the EuroQol-5D and EQ-VAS; and

  • the COMI “disability” average score and the NPDS 1 and 3 subscales.
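Testing one of the hypotheses listed above amounts to computing a Pearson coefficient and checking it against the hypothesised range, as in the sketch below. The function names are hypothetical. Comparing the absolute value of the coefficient with the 0.40–0.80 range is an assumption, made here because the EQ-5D and EQ-VAS are scored in the opposite direction to the COMI and therefore yield negative coefficients.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hypothesis_met(r: float, low: float = 0.40, high: float = 0.80) -> bool:
    """True when the magnitude of r lies in the hypothesised range."""
    return low <= abs(r) <= high
```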

Responsiveness

The ability of the instrument to detect change over time in the construct being measured was investigated by means of both distribution and anchor-based methods [20, 21].

  • Effect size (Standardized Response Mean, SRM) was calculated by dividing the mean change score (pre- to post-test) by the SD of the change scores; an effect size of 0.2 is regarded as small, 0.5 as moderate and 0.8 as large [22, 23]; the SRM allows a group-level interpretation of the study population undergoing treatment [24].

  • Unpaired t tests were used to detect significant differences between the change scores (i.e. from pre-treatment to post-treatment) for the good and the poor outcome groups.

  • A Receiver Operating Characteristic (ROC) analysis was performed on the COMI change scores using the dichotomous GTO (“good” and “poor”) as the external criterion. Using the ROC curve, responsiveness is described in terms of sensitivity (the probability of the measure correctly classifying patients who demonstrate change on an external criterion of clinical change; good GTO) and specificity (the probability of the measure correctly classifying patients who do not demonstrate change on an external criterion; poor GTO). Values for sensitivity and for false-positive rates (1 − specificity) are plotted on the y- and the x-axis of the curve, respectively, and the area under the curve (AUC) represents the probability that the measure correctly classifies patients as improved or unchanged. This area theoretically ranges from 0.5 (no accuracy in discriminating) to 1.0 (perfect accuracy), and an AUC of at least 0.70 is considered to be acceptable [25]. The optimal cut off point was computed using the Youden index [25].
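The three responsiveness statistics listed above can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study: change scores are taken as baseline minus follow-up, so that positive values indicate improvement; the AUC is computed through its rank interpretation (the probability that a randomly chosen good-outcome patient has a larger change score than a randomly chosen poor-outcome patient, ties counting one half); and all function names are hypothetical.

```python
import statistics


def srm(changes):
    """Standardized response mean: mean change / SD of the change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def auc(good, poor):
    """Area under the ROC curve from pairwise comparisons of change scores."""
    wins = sum(
        1.0 if g > p else 0.5 if g == p else 0.0
        for g in good
        for p in poor
    )
    return wins / (len(good) * len(poor))


def youden_cutoff(good, poor):
    """Cut off (change >= c) maximising J = sensitivity + specificity - 1.

    Returns a tuple (cutoff, J, sensitivity, specificity).
    """
    best = None
    for c in sorted(set(good) | set(poor)):
        sensitivity = sum(g >= c for g in good) / len(good)
        specificity = sum(p < c for p in poor) / len(poor)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (c, j, sensitivity, specificity)
    return best
```

When several cut off values share the same Youden index, this sketch keeps the smallest one; other tie-breaking rules are equally defensible.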

Statistical analyses

The analyses were carried out using the Italian version of SPSS 20.0 software.

Results

Patients

A total of 150 patients were originally invited to participate and, of these, 103 satisfied the inclusion criteria. There were 77 females (75 %) and 26 males (25 %) with a mean (±SD) age of 53.0 ± 12.5 years (range 21–79). The mean (±SD) duration of NP was 14.5 ± 13.2 months. The mean (±SD) body mass index was 23.8 ± 4.38 kg/m². Table 1 shows the patients’ general characteristics.

Table 1 General characteristics of the study population (N = 103)

Psychometric properties

Acceptability

All of the questions of the COMI-neck were well accepted. The questionnaire was completed in a mean (±SD) time of 2.8 ± 1.3 min. No missing responses or multiple answers were given by any of the patients. There were no problems in comprehension.

Floor/ceiling effects

Table 2 shows the mean ± SD scores and the floor and ceiling effects for all of the outcome measures. The COMI summary score and the COMI items “pain” and “general quality of life” showed low floor and ceiling effects at both baseline and follow-up. At baseline, the COMI item “symptom-specific well-being” showed a high floor effect (21.4 %), and the COMI item “disability” showed a high ceiling effect (43.7 %). At follow-up, the COMI items “neck-related function” and “disability” showed high ceiling effects (26.5 % and 59.8 %, respectively) and the EQ-5D a high ceiling effect (17.5 %), meaning that a high percentage of people reached the best status.

Table 2 Mean (SD) values and floor and ceiling effects for the different outcome measures

Reliability

Test–retest reliability was excellent for the COMI summary score (ICC = 0.87; 95 % CI: 0.81–0.91; Table 3).

Table 3 COMI-neck test–retest reliability

Minimum detectable change score

The MDC score was 1.8 points.

Construct validity

The relationships between each of the COMI item scores and the corresponding questionnaire scores are shown in Table 4. Most (75 %) of the a priori hypotheses were accepted. The COMI “worst pain” score showed moderate correlations with the NRS and the NPDS pain and cognitive-behavioural aspects subscale (0.45 and 0.48, respectively); the COMI item “neck function” showed moderate correlations with the NPDS 1 and 3 subscales (0.55 and 0.49, respectively); the COMI item “general quality of life” showed moderate to low correlations with the EuroQol-5D and EQ-VAS (−0.44 and −0.23, respectively); and the COMI “disability” average score showed moderate correlations with the NPDS 1 and 3 subscales (0.45 and 0.48, respectively). Only the relationships between the COMI item “symptom-specific well-being” and the EuroQol-5D and EQ-VAS (−0.15 and 0.24, respectively) failed to reach the hypothesised moderate correlation of 0.4–0.8.

Table 4 Correlation coefficients with 95 % confidence interval describing the relationship between the COMI-neck domains and the reference instruments at baseline

Responsiveness

The GTO was distributed as follows: 24 (23.3 %) helped a lot, 62 (60.2 %) helped, 13 (12.6 %) helped only a little, 2 (1.9 %) did not help, 2 (1.9 %) made things worse. As a consequence, the “good outcome” group consisted of 86 patients (83.5 %) and the “poor outcome” group of 17 patients (16.5 %). There was a significant difference in the mean COMI-neck total change scores for the good and poor outcome groups (2.02 ± 1.50 and 0.59 ± 1.50, respectively; p = 0.002). An SRM of approximately 0.40 was obtained for the poor outcome group and an SRM of about 1.23 for the good outcome group. Hence, changes in the COMI-neck total score showed a good ability to discriminate between outcome groups (high SRM in the good outcome group and low SRM in the poor outcome group). The ROC analysis carried out on the COMI-neck change scores revealed an AUC of 0.73 (95 % CI 0.62–0.85), showing significant discriminative abilities; the cut off point that best discriminated between good and poor outcomes was a change score ≥2.0 points (sensitivity 55 %, specificity 88 %).

Discussion

This paper describes the validation of the cross-culturally adapted COMI-neck in a sample of Italian patients with chronic NP. The Italian COMI-neck displayed acceptable psychometric characteristics, and required about 3 min to complete. It would, therefore, appear to be an appropriate instrument for use in everyday clinical practice and for the longitudinal assessment of conservative treatment outcomes related to NP [26].

Despite some floor and ceiling effects for single COMI items, the COMI-neck summary score showed no critical floor or ceiling effects. It should be borne in mind that floor and ceiling effects are population-dependent; the present study involved patients undergoing conservative treatment for NP, who typically suffer from only moderate disability and who generally have satisfactory improvements after the treatment. As expected, our findings differ from those reported for patients undergoing cervical spine surgery, who typically suffer from severe functional restrictions, neurological deficits and high pain preoperatively and who generally have only minimal symptoms after the treatment; in such patient groups, there are typically greater percentages of people with the worst status preoperatively and the best status postoperatively [9].

Test–retest reliability was satisfactory in the present study, suggesting good repeatability over time in subjects with chronic NP. This property was also investigated in English-speaking subjects, and the previously reported ICCs ranging from 0.64 to 0.99 (p always <0.001) support our results [8].

The COMI-neck proved to be sensitive to change in patients with chronic NP. At a 95 % confidence level, the minimum detectable change score indicated that a change of more than 1.8 points after conservative treatment would be unlikely to be due to measurement error.

Similar to the findings for the German COMI-neck [9], the individual COMI items in our Italian version showed moderate correlations with their reference scales, confirming the a priori hypotheses. The only exception was for “symptom-specific well-being”, which showed a low correlation with the corresponding full-length questionnaires; interestingly, similar findings were reported for the original versions of the COMI-neck and COMI-back [5, 9], suggesting that this item is likely delivering unique information.

The low SRM for the poor outcome group and high SRM for the good outcome group, as well as the significant difference between the mean COMI-neck change scores for the two outcome groups, indicated good discriminative ability of the instrument. In addition, the AUC value obtained for the ROC analysis of the Italian COMI-neck confirmed its ability to discriminate between the two GTO groups. Although different types of treatment were used in the present study, our responsiveness findings were in line with those for a previous study on the COMI-neck that showed a similarly good ability to discriminate between good and poor outcome groups [9]. Our cut off point for indicating a “good global outcome” (a ≥2 point reduction on the COMI), estimated on the basis of ROC analysis, was in line with that previously published [9]. This cut off score represents the minimum clinically important change score (MCIC) and the fact that it exceeded the MDC (1.8 points) indicated that the instrument was able to detect a “signal” (MCIC) in excess of the “noise” (MDC). The MCIC is useful for researchers and clinicians when assessing relevant individual change in longitudinal studies. However, the estimates of the MCIC may have been affected by the dichotomous “good”/“poor” classification and the subsequent division of the sample into sub-groups, because the greater the imbalance between the sub-groups (in terms of the percentage of patients in each), the less reliable the estimates.

This study has some limitations. Firstly, the relationships between COMI-neck and physical tests of neck function were not considered; only other questionnaires were used to assess the construct validity. Secondly, the number of patients in the poor outcome group was rather low, which may limit the external validity of the responsiveness analysis. Thirdly, GTO was assessed using a 5-point Likert scale, and clinically important changes would probably have been more discriminating if a 7-point scale had been used [27].

In conclusion, our findings show that the cross-culturally adapted Italian COMI-neck is a reliable, valid and responsive instrument for use in assessing the outcome of patients undergoing conservative treatment for chronic NP. The scale is recommended both for evaluating group outcomes in clinical trials and for individual patient monitoring.