Introduction

Patients’ views are important when measuring functional health status, in order to understand and document both the impact of pain and symptoms and the effect of treatment in low back pain (LBP) patients [1, 2].

Patient-based outcome measures are usually classified as generic or disease-specific [3]. The generic measures are designed to measure the domains of general health, overall disability and quality of life and are important for broad comparisons across conditions. This is often at the expense of the responsiveness to clinically relevant change in specific diseases [4]. Therefore, disease-specific instruments measuring attributes of symptoms and functional status relevant to a particular disease or condition were developed and they are often found to be more responsive to the target condition when compared to generic measures [2, 5–7].

A plethora of back-specific instruments have been developed over the last decade, and in a recent review a total of 36 back-specific questionnaires attempting to address patient perceptions of their back trouble have been identified [8]. Choosing the “ideal” outcome measure for a clinical trial is virtually impossible since most instruments offer advantages and disadvantages depending, for example, on the type of study or patient population. As a result, Deyo and colleagues [9] proposed a standardised core set of instruments measuring five domains: pain symptoms, back related function, generic well-being, disability and satisfaction with care. These recommendations were updated by an expert panel in 2000 [4]. In the domain of back related function, the recommended and most widely used measures are the Oswestry Disability Index (ODI) and the Roland Morris Disability Questionnaire (RDQ). Consequently, a MEDLINE search revealed more than 300 citations in which the ODI had been used to assess disability in LBP and it has been found to be reliable, valid and responsive, in particular in patients with a higher level of disability [10–13]. The ODI exists in four versions, and to facilitate a comparison of results among studies, version 2.1 is recommended [8, 10, 11].

The ODI is a self-administered questionnaire initially developed by John O’Brien in 1976 and version 1.0 was published and validated in 1980 [14]. Version 1.0 has been adapted by the American Academy of Orthopaedic Surgeons (AAOS), omitting sections 1, 8 and 9 and changing the score of each item from 0–5 to 1–6 [9, 11]. Another revision of version 1.0 was carried out by the Medical Research Council and was published as version 2.0 in 1989 [15]. This is not to be confused with the revised (modified) ODI also published in 1989 by a chiropractic study group [16]. In 2000 Fairbank and Pynsent [11] published a thorough review of the ODI with reprints of the four versions. However, in section 10 (travelling) of version 2.0 as published by Fairbank and Pynsent [11] the third and fourth response options have subtle mistakes. This was corrected in a subsequent publication by Roland and Fairbank [10] and is now referred to as version 2.1 [17].

Most questionnaires are developed in English-speaking countries and a direct translation for use in a different language may be problematic. Published guidelines for standardised translation and cross-cultural adaptation exist [18, 19]. The ODI has been cited in nine languages, some of which have followed a rigid cross-cultural adaptation process, and published in the literature. Several publications refer to a Danish version of the ODI [20–23]; however, a systematic search of the literature revealed no published translation, cross-cultural adaptation or validation into the Danish language.

The objectives of this two-article series are twofold: (1) to translate and cross-culturally adapt the ODI version 2.1 into the Danish language and (2) to investigate the psychometric properties of the Danish ODI in a large population of back pain patients seen in the primary (PrS) and secondary sectors (SeS) of the Danish health care system. In paper 1 of this series, the translation and cross-cultural adaptation process, test–retest stability, scale width and construct validity are examined in two distinct back pain populations, and in paper 2 we examine the sensitivity, specificity and clinically significant improvement in the same two LBP populations.

Materials and methods

Translation and cross-cultural adaptation

The translation and cross-cultural adaptation process followed the five stages outlined in the recent guidelines [18, 19]. Written documentation was produced for each stage of the process serving as a memory aid for the expert committee review. Version 2.1 of the ODI was translated from English to Danish by two different and independent translators whose mother tongue was Danish. Translator 1 (T-1) was a professional translator with a secretarial job and, thus, naïve to the purpose and health concepts of the questionnaire. Being naïve to the purpose and concepts was useful in eliciting unexpected meanings from the original instrument. The other translator (T-2), a professor in clinical biomechanics, was aware of the purpose and the concepts involved in the instrument. This would improve the reliability of the ODI by allowing for a better idiomatic and conceptual rather than literal equivalence between the two versions of the questionnaire.

Both Danish translations (T-1 and T-2) were compared with one another to produce a preliminary translated version (T-12p). To evaluate the quality of the translation process, two independent raters judged the T-12p version of the ODI before retranslation into English. The quality was rated according to clarity of translation, common language use and conceptual equivalence on a scale ranging from 0 (not at all perfect) to 100 (perfect) [24]. A panel consisting of the forward translators, the independent raters and the main author evaluated the comments from the two raters to produce the final translated version (T-12).

The final T-12 version was then retranslated into English by two independent translators with English (British English and Australian English) as their mother tongue and Danish as their secondary language. Back-translator 1 (BT-1) was a teacher of English and back-translator 2 (BT-2) was a professor at the local University. Both had been living in Denmark for several years and were blinded to the original version of the ODI. In preparation for the expert committee review, the two retranslations were compared with the original version of the instrument resulting in a checklist highlighting major discrepancies in content between the two versions.

A bilingual expert committee including the forward and back-translators, a language specialist, a methodologist, a clinician and a recorder/coordinator was assembled to review all the versions of the forward and back translations. The purpose of the expert committee was to resolve major discrepancies detected in the translation and retranslation process, detect errors of interpretation and missed nuances, and assess the necessity of performing a cultural adaptation for use among Danish back pain patients. All issues raised during the expert committee review process were resolved by consensus and documented in a written report.

During the final part of the adaptation process the pre-final version was tested for content (face validity), wording, ease of understanding and missing items. Forty patients participated in the pre-testing of the questionnaire; 20 patients seen in the PrS of the Danish health care system (a chiropractic clinic) and 20 in the SeS (an out-patient hospital back pain clinic). Each patient completed the questionnaire followed by a questionnaire developed for the purpose of detecting comprehension. At completion, they were briefly interviewed to explore any problem areas in-depth. The findings were discussed among the translators resulting in only minor changes to the pre-final version. Further psychometric testing of the final version of the Danish ODI was carried out in a validation study.

Validation study

The study was reported to and accepted by The Danish Data Protection Agency.

Patients and setting

Back pain patients’ initial entry point into the Danish health care system is the primary health care sector comprising general practitioners, chiropractors and physical therapists (via the general practitioner). Patients who do not respond to the initial treatment may get referred to a hospital-based multidisciplinary spinal unit in the SeS for further evaluation and management. Thus, sociodemographic and illness profiles of the patients in the two sectors are very different [25] and we recruited participants in both sectors. The PrS patients were recruited from seven chiropractic practices, whereas the SeS patients were enrolled from a multidisciplinary spinal unit (Backcenter Funen, Ringe). A total of 301 consecutive patients were recruited: 168 from the PrS and 133 SeS patients from the Danish health care system (Fig. 1). Sixty-eight patients refused to participate resulting in a baseline study population of 233.

Fig. 1 Flow of participants and dropouts in the validation study. Groups A and B received a 7-point and 15-point transition question, respectively

Questionnaires

A questionnaire booklet was constructed for the validation study which included the final version of the Danish ODI, the 23-item RDQ [26, 27], the two subscales of the LBP Rating Scale—Pain (LBPRSpain) and disability (LBPRSdisability) [23]—and the two subscales of the SF36—physical function SF36 (pf) and bodily pain SF36 (bp) scales [24, 28, 29]. Furthermore a global 0–10 numeric rating scale (NRSpain) measuring back and/or leg pain intensity “today” was included.

The questionnaire booklet used for test–retest reliability contained the Danish version of the ODI with the questions rearranged at follow-up and a single question asking whether the patient had experienced any change since the last time completing the questionnaire.

The patients’ global retrospective assessment of treatment effect (transition question) was measured using a 7-point Likert scale ranging from “much better” to “much worse” [30]. All patients were told their baseline global rating of pain severity (NRSpain) before answering the transition question [31, 32]. In addition, they were asked to rate the importance of any changes in their back/leg pain since baseline using a 0–10 NRS.

Data collection

Patients eligible to participate in the study had to fulfil the following criteria: (1) age above 18, (2) presence of LBP and/or leg pain and (3) ability to read and understand Danish. Exclusion criteria were: (1) suspected pathological disorder of the spine (fractures, spinal infections or malignancy, ankylosing spondylitis, rheumatoid arthritis, or other inflammatory diseases) and (2) a known psychiatric disorder.

Twenty minutes before the initial consultation, the purpose of the study was explained and oral consent was obtained. The patients filled in the baseline questionnaire booklet. A test–retest questionnaire booklet was completed 1 day after for the PrS patients and 1 week after for the SeS patients. The shorter interval for the PrS test–retest patients was selected as these patients are likely to demonstrate true change due to the natural history of back pain and a possible treatment effect [33, 34]. This is less likely in SeS patients as the duration of LBP is longer. Only patients reporting to be stable were included in the test–retest analysis. At 8 weeks the patients received the final questionnaire booklet. All patients who completed the questionnaire at 8 weeks participated in a telephone interview carried out by a professional interviewer from the Danish National Institute of Social Research. Information on the patient’s retrospective assessment of the treatment was obtained, and to reduce dependence between the transition question and the questionnaires the interview was conducted 3–5 days after the 8 weeks follow-up [35].

Analysis

Data transformation. The two subscales of both the SF36 and the LBPRS, the RDQ and the NRSpain were transformed to cover an interval ranging from 0% to 100%, with a high score representing higher disability or pain [36].

The raw change score for each outcome measure was obtained by subtracting the 8 weeks follow-up score from the baseline score. For the last part of the concurrent validity calculations, the raw change scores were converted into standard scores with a mean of 0 and a standard deviation of 1, thus, allowing for between-scale comparisons [37].
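The two transformations described above can be illustrated with a minimal sketch. It assumes the sample standard deviation is used for standardisation (the text does not specify this); the function names and the scores are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev

def change_scores(baseline, follow_up):
    """Raw change score: baseline score minus 8-week follow-up score."""
    return [b - f for b, f in zip(baseline, follow_up)]

def standardise(values):
    """Convert values to standard scores with mean 0 and SD 1.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Made-up scores on the 0-100% scale, for illustration only
baseline = [40.0, 22.0, 58.0, 30.0]
follow_up = [20.0, 18.0, 30.0, 26.0]

raw = change_scores(baseline, follow_up)  # positive value = improvement
z = standardise(raw)                      # comparable across instruments
```

Because every instrument is expressed in SD units after this step, change on the ODI can be compared directly with change on the external scales.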

Reliability

Psychometricians have for years used reliability as a generic term to indicate both homogeneity (internal consistency) of a scale and reproducibility of scores [38, 39].

Homogeneity (internal consistency) assesses the extent to which the items in a scale are interrelated and tap different aspects of the same attribute (unidimensionality). We used item-total correlations and Cronbach’s alpha coefficient to assess internal consistency. Item-total correlation is the correlation of the individual item with the scale total omitting that item. Cronbach’s alpha was calculated from the baseline values and homogeneity is considered acceptable when Cronbach’s alpha exceeds 0.7, although it is often recommended that values should not be above 0.9 as this suggests item redundancy [37, 40]. To further evaluate each item’s contribution to the total score, we graphed the item score against the five score categories as described by Fairbank et al. [14]. If the item correlates well with the latent variable (pain related function) an increase in the line is expected as the total ODI score increases. On the other hand, a more horizontally oriented line may represent an item which belongs to a different latent variable [27].
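As an illustration of the two homogeneity statistics, the following sketch computes Cronbach’s alpha from the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), and the item-total correlation with the item omitted from the total. The function names and the response data are hypothetical; the analyses in this study were carried out in STATA, not with this code.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_total_correlations(rows):
    """Correlation of each item with the scale total omitting that item."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # total without item j
        result.append(pearson_r(item, rest))
    return result

# Made-up responses (0-5 per item) from five respondents to a 3-item scale
responses = [[1, 2, 1], [3, 3, 4], [0, 1, 1], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(responses)
item_total = item_total_correlations(responses)
```

Omitting the item from the total avoids inflating the correlation, since an item always correlates with a sum that contains it.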

Reproducibility was measured using the intraclass correlation coefficient (ICC) for repeated trials [39] and using the limits of agreement (LOA) as outlined by Bland and Altman [41, 42]. The Bland and Altman method has several advantages when compared to all correlation coefficients. First, correlation coefficients depend on the range and distribution of the variables and, hence, the way in which the sample of subjects was chosen. Lastly, correlation coefficients may be high despite a poor agreement between the repeated measurements [43].
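The LOA calculation can be sketched as follows: the mean of the test–retest differences estimates systematic bias, and the limits lie 1.96 SD of the differences on either side of it. The direction of the difference (test minus retest), the function name and the scores are assumptions made for illustration.

```python
from statistics import mean, stdev

def limits_of_agreement(test, retest, z=1.96):
    """Bland-Altman bias and limits of agreement for paired measurements.

    Assumption: differences are taken as test minus retest.
    z = 1.96 gives 95% limits; z = 1.645 gives 90% limits.
    """
    diffs = [a - b for a, b in zip(test, retest)]
    bias = mean(diffs)   # systematic difference between the two occasions
    sd = stdev(diffs)    # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd

# Made-up ODI scores (0-100%) for four stable patients
test = [10.0, 20.0, 30.0, 40.0]
retest = [12.0, 18.0, 33.0, 37.0]
bias, lower, upper = limits_of_agreement(test, retest)
```

Unlike a correlation coefficient, the limits are expressed in the units of the scale, so they can be read directly as the change that exceeds measurement error.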

Scale width

The lowest and highest possible scores of a scale are known as the “floor” and “ceiling”. If a high proportion of patients score at or very close to the floor or ceiling, no further improvement or deterioration can be detected resulting in biased results [44].

Scale width is defined as the region of the score range of an instrument with the capacity to allow detection of change in scores over time and is an extension of the “floor” and “ceiling” concepts [45]. In addition to reporting floor and ceiling effects, we used the LOA interval at each end of the scale to be 95% confident that a change greater than instrument measurement error can take place [45].
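A minimal sketch of the scale width idea follows: patients whose score lies within one LOA margin of the floor or ceiling cannot change by more than measurement error in that direction. The function name, the inclusive boundaries and the example scores are assumptions; the margins are passed in as parameters and would be taken from the test–retest LOA.

```python
def outside_scale_width(scores, floor_margin, ceiling_margin,
                        floor=0.0, ceiling=100.0):
    """Proportion of scores too close to the floor or ceiling to show
    a change larger than measurement error.

    floor_margin / ceiling_margin: size of the LOA at each end of the scale.
    Assumption: boundary values are counted as outside the scale width.
    """
    n = len(scores)
    low = sum(1 for s in scores if s <= floor + floor_margin)
    high = sum(1 for s in scores if s >= ceiling - ceiling_margin)
    return low / n, high / n

# Made-up baseline scores (0-100%) and illustrative margins
scores = [0.0, 8.0, 11.5, 40.0, 90.0, 50.0, 60.0, 70.0, 20.0, 30.0]
low_share, high_share = outside_scale_width(scores, 11.5, 13.0)
```

This is a stricter criterion than the traditional floor and ceiling effect, which counts only patients scoring exactly the minimum or maximum.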

Validity

Cross-sectional discriminant validity assesses whether the scales under investigation can differentiate among groups of patients with different levels of a chosen factor (e.g. symptom location). We chose to assess the following baseline factors from the medical history at two levels: (1) location of symptoms (LBP only vs. leg pain ± LBP) [46, 47], (2) pain duration of the current episode (≤ 30 days vs. > 30 days) [48] and (3) frequency of taking medication during the last week (less than a couple of times during the last week vs. more than a couple of times during the last week) [46].

Concurrent validity analysis was carried out at baseline and 8 weeks follow-up. We tested the ODI and the external instruments for within- and between-scale systematic differences in patient grading by calculating the difference in the mean score of the instruments for the two patient populations. Between-scale systematic differences were tested using an interaction term in the regression model. Second, the ability of an instrument to distinguish between different degrees of patient disability can be expressed as how well the patients are spread out on the response scale (0–100%). Using a variance comparison test, we compared the spread of the ODI scores to the other instruments. Lastly, we examined whether the individual patient score level on the ODI scale was comparable to the external instruments. Bland–Altman LOA plots of standardised scores were used for this analysis [41, 43].

Longitudinal external construct validity examines whether or not a scale measuring a certain domain over time correlates appreciably well with other scales that theory suggests should be related to it [49, 50]. Longitudinal external construct validity was assessed by comparing the change score of the ODI with that of the external measures using Pearson’s correlation coefficient (r).

All statistical calculations were carried out using the statistical package STATA® v. 8.2 SE (StataCorp). Robust variance estimation was applied whenever possible in order to reduce the dependency on the normality assumption, and statistical significance was accepted at the P < 0.05 level.

Results

Translation and cross-cultural adaptation

During the translation process, several noteworthy issues arose. First, in section 1 there was disagreement among the expert committee members as to how to scale the severity of pain in Danish. Many words exist describing for example “mild pain” or “moderate pain”. Consensus was reached by close scrutiny of (a) common language and (b) conceptual equivalence. Second, as noted in the German translation of the ODI [51], it seems illogical to have “very painful” in answer categories 2 and 3 of section 2 (personal care) as category 2 reflects less disability compared to category 3. Thus, we omitted the word “very” from answer category 2 in this section of the Danish ODI. Third, the expert committee discussed how to translate “travelling” as the equivalent Danish word “rejse” is conceptually slightly different. However, for lack of a more precise word, the committee agreed on using this word.

The quality of the translation process showed an overall difficulty rating (average of clarity, common language and conceptual equivalence) well above 90 for all sections of the questionnaire (data not shown). Items 1 (pain) and 6 (standing) showed the poorest difficulty ratings (91 and 94), corresponding to a high number of comments but only minor wording changes. The Danish version of the ODI is available from the official ODI website [17] or from the authors on request.

Validation study

Participants and missing data

Three hundred and one consecutive patients (PrS: n = 168; SeS: n = 133) were eligible for inclusion into the study (Fig. 1). The baseline response rate was 77% leaving 233 included patients at baseline (PrS: n = 128; SeS: n = 105). At 8 weeks the follow-up response rate was 82% of the baseline entry; thus, 191 patients (PrS: n = 94; SeS: n = 97) were available for analysis at 8 weeks follow-up. An additional ten patients dropped out at the 9 weeks telephone interview mostly from the SeS.

The baseline demographics of the two study populations are shown in Table 1. Age distribution and the ratio of male/female were similar in the two groups whereas all the other characteristics were distinctly different. Patients from the PrS had mostly LBP only, shorter duration of the current LBP episode and used less medication compared to SeS patients.

Table 1 Baseline descriptive data for the two study groups

A dropout analysis showed a lower mean age for the dropouts (8 years lower) in both PrS and SeS patients and dropouts from the SeS were more likely to be males with longstanding problems but lower medication use.

At baseline 25 patients (11%) failed to answer item 8 (sex life) and 15 patients (6%) failed to answer item 10 (travelling) and this was equally distributed between PrS and SeS patients.

Reliability and stability

Homogeneity was assessed using Cronbach’s alpha and item-total correlations at baseline (n = 233). For the whole group alpha was 0.88. For PrS and SeS patients we found an alpha of 0.89 and 0.85, respectively. Item-total correlations ranged from 0.54 (item 7, sleeping) to 0.73 (item 10, travelling) in the whole group.

The influence of each item on the total ODI score is depicted in Fig. 2. In general, all item scores increase with an increasing total ODI score. Thus, each item contributes to the total score and belongs to the same latent variable (pain related function). Items 8 and 10 (sex life and travelling) seem to respond better at higher ODI scores; however, caution should be taken as to the validity of this since the number of patients is low (n = 5).

Fig. 2 Mean score for each ODI item in relation to the five total score categories. The average score of each item is depicted as a function of the total baseline entry score (divided into five score categories). An increase in the average item score with an increasing total baseline entry score signifies good correlation with the latent variable of the instrument. *Number of patients in each category

Repeatability was carried out on 93 stable patients (PrS: n = 36; SeS: n = 57). The mean (SD) time interval for completion of the two questionnaires was 9.1 (10.6) days for all patients, 4.4 (9.8) days for PrS patients and 12.0 (10.1) days for SeS patients. The ODI showed excellent test–retest reliability, as evidenced by the ICC and LOA. ICC was 0.91 among all patients, 0.93 in PrS patients and 0.89 in SeS patients. The mean difference and 95% LOA for all patients were 0.8 (−11.5 to +13.0) with no noteworthy difference between PrS and SeS patients [2.2 (−9.2 to +13.6) and −0.1 (−12.7 to +12.4), respectively]. Thus, no systematic bias was found between the test and retest and the spread of the dots was uniform (Fig. 3). All normal plots of the differences were acceptable.

Fig. 3 Limits (95%) of agreement plot for repeated measures of the ODI in stable patients. The plot shows the difference in ODI score against the average ODI score

Scale width

Only one patient obtained the lowest possible score (floor effect) whereas no patients reached the ceiling of the scale at baseline. However, the proportion of patients scoring outside the scale width (as indicated by the LOA) showed a different picture. A total of 25 patients (10.7%) scored within the lower score range (0–11.5%) with 18 (14.1%) being PrS patients and 7 (6.7%) being SeS patients. No patients scored within the upper score range (87–100%).

Validity

Cross-sectional discriminant validity. Table 2 provides a summary of the findings for the cross-sectional discriminant validity analysis. The results show a small monotonic decrease in the ODI score with more proximal symptoms (P < 0.001), shorter pain duration of the current episode (P < 0.05) and a larger increase in ODI score with more medication usage (P < 0.001). No differences were observed between the PrS and SeS patient groups.

Table 2 The ability of the ODI to distinguish between clinically important subgroups (cross-sectional discriminant validity) at baseline

Concurrent validity. We looked at three different aspects of concurrent validity. First, the ODI was tested for systematic differences when compared to the other instruments. At baseline the ODI measured ≈ 10% (P < 0.01) lower compared to the external disability measures [RDQ, LBPRSdisability and SF36 (pf)] and ≈ 21% (P < 0.01) lower compared to the external pain measures [LBPRSpain, SF36 (bp) and NRSpain]. The same trend was noted at 8 weeks follow-up and between PrS and SeS patients. We also looked at the within- and between-scale systematic differences at baseline between the two study populations to evaluate if any differences existed. The within-scale mean difference between the PrS and SeS patients for the ODI was 5 points. A similar result was found for the RDQ (5 points); however, LBPRSdisability and SF36 (pf) showed a somewhat higher mean difference of 10 and 11 points, respectively (data not shown). Between-scale mean differences are shown in Table 3. No statistically significant differences were found between the mean score of the ODI and the external instruments except for the two subscales of the SF36. The results from the 8 weeks between-group comparison are not included as the data in the PrS patients were biased due to a floor effect.

Table 3 Baseline comparisons of systematic differences between mean scores of the ODI and the external instruments in PrS and SeS patients

Second, we compared the spread of the ODI scores to the disability and pain measures at baseline and 8 weeks follow-up. At baseline, the ODI scores were spread over a narrower window (SD ± 15.85) when compared to the external measures (SD range 17.40–25.38). This was statistically significant (P < 0.01) for all comparisons except the RDQ (SD ± 17.40; P = 0.16). When comparing ODI and RDQ for the PrS and SeS patients at baseline, no significant difference in the score spread was seen. The same trend was observed at 8 weeks follow-up in both patient populations.

Finally, the individual patient score level was examined by Bland–Altman LOA plots of standardised scores (Fig. 4). ODI score level at baseline is within ± 1.3 SD when compared to the other disability measures and within ± 1.7 SD in comparison to the pain measures. Furthermore, the ODI score level is comparable in PrS and SeS patients. The same pattern was seen at 8 weeks follow-up (data not shown).

Fig. 4 Limits of agreement between the mean standardised ODI score and the standardised RDQ, LBPRSdisability, LBPRSpain, SF36 (pf), SF36 (bp) and NRSpain scores at baseline

Longitudinal external construct validity. Correlations between the change score of the ODI and the external measures were calculated using Pearson’s r. The results showed correlation coefficients of 0.78 (RDQ), 0.69 (LBPRSdisability), 0.75 (SF36 (pf)), 0.56 (LBPRSpain), 0.65 (SF36 (bp)) and 0.61 (NRSpain). As expected, the ODI correlated less strongly to the pain measures compared to the disability measures. All correlations were statistically significant (P < 0.01), indicating acceptable external longitudinal construct validity of the ODI change score.

Discussion

This paper reports on the Danish cross-cultural adaptation of the frequently used back-specific ODI, and presents results of the first part of the psychometric testing. The validation procedures were carried out in two different back pain populations for several reasons. First, few studies have cross-culturally adapted and validated functional scales in patients with LBP of differing severity [47]. Second, we specifically wanted to psychometrically test the ODI in a broad range of LBP patients since a cross-culturally adapted outcome measure should be tested in target populations relevant for clinical research and clinical practice.

We included consecutive patients in the study to get a true representation of LBP patients in the two patient populations. The dropout analysis did show some differences between the participants and dropouts; however, we consider these differences minor.

Translation of the ODI

During the translation and cross-cultural adaptation procedures we followed the recommendations described by Guillemin et al. and Beaton et al. [18, 19]. The problems encountered during the process were minor and documented at all stages, and we conclude that our attempt to translate the ODI into Danish is both reliable and conceptually valid.

Reliability

Homogeneity (internal consistency), as measured by Cronbach’s alpha, was found to be 0.88 for the whole study population (PrS 0.89; SeS 0.85) which falls well within the recommended interval of 0.7–0.9 for group comparisons [37]. Our ODI alpha is in the top end when compared to previously reported coefficients ranging from 0.76 to 0.94 [46, 52–55]. Item-total correlations ranged from 0.54 to 0.73 for all patients and were generally higher for the PrS patients.

We used the ICC and LOA as a measure of repeatability. The study showed that the ODI had an excellent ICC of 0.91 which compared well with the literature [15, 45, 56]. We found a mean difference of 0.8 and a 95% LOA of −11.5 to +13.0 with no noteworthy difference in the two patient populations. This indicates that the ODI showed negligible systematic bias on the repeated measurements. The 95% LOA signifies change greater than the measurement error and is therefore conceptually equivalent to the minimum detectable change (MDC) as reported by Stratford and Binkley [57]. Thus, a worsening greater than 12 points and improvement greater than 13 can be considered a “real change” at the very stringent 95% confidence level. At the less stringent 90% confidence level the LOA was found to be −9.6 to +11.0. To the authors’ knowledge, this is the first time LOAs for the ODI have been reported in the literature [13]. In several studies values for the MDC for the ODI have been reported; however, the comparability is questionable as the ODI version and level of confidence differ. Hägg et al. [58] reported an MDC95% of 10 points for ODI version 1.0, Fritz et al. and Grotle et al. found an MDC95% of 13 and 11 points, respectively, for the modified (revised) ODI and Mannion et al. [51] found an MDC95% of nine points for ODI version 2.1. Furthermore, the MDC90% was reported to be 10.5 points for the modified ODI [45]. Thus, our LOA of 13 points is in the high end in comparison to reported values. Apart from ODI version and confidence level, we ascribe this to differences in the patient population and test–retest time interval.

The mean time spans between completions of the two questionnaires were 4.4 and 12.0 days for the PrS and SeS patients, respectively. The shorter test–retest interval in PrS patients was carefully chosen, balancing the risk of not finding stable patients against the risk of introducing bias from patients memorising their previous answers. To reduce the memory effect, the sequence of the ten ODI items was changed at the retest. When examining the LOAs for the two patient populations no differences were found.

Scale width

Traditionally floor and ceiling effects describe the percentage of subjects scoring maximal or minimal points. As a benchmark McHorney and Tarlov [44] suggested that questionnaires with more than 15% of the respondents scoring at the floor or ceiling initially should not be used. We did not find any floor or ceiling effect of the Danish ODI using this criterion as only one patient reached the floor of the scale. However, using the more sensible scale width approach, the Danish ODI showed a fairly pronounced floor effect in the PrS patients (14.1%) compared to the SeS patients (6.7%). Similar results were found by Patrick et al. [26] in a non-surgical patient group and it is thus questionable how useful the Danish ODI is as a primary outcome measure in a PrS patient population.

Validity

We examined several aspects of criterion and construct validity of the Danish ODI. The results of the cross-sectional discriminant validity analysis showed that the ODI can discriminate between groups of subjects that are expected to differ in their level of disability for all the chosen variables (symptom location, pain duration and medication usage). Interestingly, the group score difference was the largest for medication usage (13 points) in comparison to symptom location (7 points) and pain duration (3 points) indicating that this variable is important for discriminating among LBP patients when using the ODI.

In the concurrent validity analysis we looked at the differences between the ODI and external disability and pain measures at baseline and 8 weeks follow-up. Three aspects were analysed: systematic differences among the instruments, patient spread on the response scale and specific response scale scores for the different instruments. In comparison to the disability and pain measures the mean score of the ODI was 10% and 21% lower, respectively. This confirms previous findings that the ODI may be more appropriate for patients with a greater degree of disability [10], particularly so when the pain level is high. Comparing PrS and SeS patients, the results showed similar systematic differences for the ODI and RDQ (5 points) but higher for the LBPRSdisability and SF36 (pf) (10–11 points). This is important when comparing results of similar patient populations in clinical trials. Further comparisons of the two patient populations showed that the difference between the mean scores of the external instruments compared to the mean ODI score was negligible except for the two subscales of the SF36. We suspect this to be due to the generic nature of the SF36 and the finding supports the validity of disease-specific instruments such as the ODI.

The second analysis evaluated the ability of the ODI to distinguish between different patient disabilities (patient spread). Several interesting points were noted. First, compared with all the external pain and disability scales, the ODI showed the narrowest window, indicating a poorer spread of the PrS and SeS patients on a scale ranging from 0 to 100%. Second, the ODI and RDQ seem to be almost equally good at differentiating patient disabilities in both study populations except at lower disabilities where the ODI has a tendency to reach the floor of the scale (data not shown). Third, the pain scales showed a superior ability at differentiating patients (in particular NRSpain) in comparison to the disability scales, highlighting the importance of including both pain and disability measures in clinical trials. Lastly, the global scale of SF36 (pf) showed a better differentiating ability compared to the disease-specific scales (ODI and RDQ), indicating that disease-specific scales are not necessarily the best scales for the cross-sectional differentiation of LBP patients.

In the last analysis we compared the ODI score level to the external pain and disability scales using standardised LOAs. Agreement on the individual score level ranges from ± 1.3 SD for the disability measures to ± 1.7 SD for the pain scales, reflecting that pain and disability are two related but different dimensions. We consider the agreement between the ODI score level and the external measures acceptable.

Kirshner and Guyatt [49] recommended that evaluative measures be tested for longitudinal external construct validity. In the absence of a “gold standard” we examined the correlation of the ODI change scores against well-validated instruments purporting to measure the domains of pain and disability. The moderate to strong correlation coefficients ranging from 0.69 to 0.78 for the disability measures and from 0.56 to 0.65 for the pain measures supported a good longitudinal external construct validity of the ODI.

Finally, our SeS population contains chronic LBP patients ranging from the moderately disabled patient to the surgical patient. Thus, the mean pain and disability scores are lower compared to a purely surgical population such as those reported by Fairbank et al. [59] and Fritzell et al. [60]. In other words, our estimates apply to the majority of the chronic LBP patients but specific values may vary between subgroups.

Conclusion

The Danish ODI version 2.1 was translated, culturally adapted and psychometrically tested in two different LBP populations relevant for future clinical research. The ODI is a reliable and valid tool to assess pain related function when compared to well-established pain and disability scales. It is probably a more appropriate outcome measure in patients seen in the SeS due to a negligible floor effect and its ability to assess patients with a greater degree of disability and pain.