Introduction

Hip osteoarthritis (OA) is common, painful, and sometimes disabling [16]. To determine the influence of the disease and its treatments on pain, function, and quality of life in patients with hip OA, surgeons increasingly use patient-reported questionnaires. These should be reliable, valid, and sensitive to clinical changes [23]. The Oxford Hip Score (OHS) is a 12-item, hip-specific, self-reported questionnaire for patients with hip diseases. It has been widely used as an outcome measure of functional ability, daily activities, and pain from the patient’s perspective [5]. Each of the 12 items is scored on a self-reported 5-point Likert scale; the OHS sum score therefore ranges from 0 (worst) to 48 (best). The OHS has been studied extensively and has proven to be reliable, valid, and responsive [5, 21]. It also has been translated and validated in several languages, including German, Dutch, Japanese, French, and Italian [7, 10, 17, 20, 24].

When a reliable, valid questionnaire is used in a population with a different culture, its psychometric properties must be tested in that population; simply translating the content is not sufficient. China has the largest population in the world (approximately 1.3 billion), and Chinese is one of the most commonly spoken languages; however, no Chinese version of the OHS (OHS-C) has been available so far.

Therefore, we aimed to perform an intercultural adaptation of the OHS for the Chinese-speaking population with hip OA and to evaluate the psychometric properties of the Chinese version in Chinese patients undergoing THA. Specifically, we tested the (1) reliability; (2) validity; and (3) responsiveness of the Chinese version of the OHS.

Materials and Methods

Translation and Crosscultural Adaptation

The translation of the original OHS followed previously published guidelines [1, 11]. The process consisted of five steps. Step 1 (forward translation): the forward translation from English to simplified Chinese was performed independently by three bilingual translators who were native Chinese speakers. Two of the translators were orthopaedic surgeons in our hospital (authors of the article, WZ and JL); the third was a professional bilingual translator with no medical background who was unaware of the study purpose. Step 2 (synthesis of the translation): the first Chinese version of the OHS was obtained after a consensus meeting of the three translators. Step 3 (backtranslation): three native English speakers (YJ, FA, GD) with a medical background, fluent in Chinese and blinded to the original English version of the OHS, translated the Chinese version back into English. Step 4 (consensus meeting): a meeting with all translators was held to compare the backtranslation with the first Chinese version and the original English version and to resolve discrepancies, ambiguities, or any other problems to reach a prefinal Chinese version of the OHS. Step 5 (pretesting): the prefinal version was tested on 30 consecutive patients with hip OA to identify any problems with it. The translators then discussed any problems found and developed the final Chinese version of the OHS (OHS-C), which was used for further psychometric testing.

Psychometric Assessments and Statistical Analysis

Participants

Between July and December 2012, all 136 standard Chinese-speaking patients undergoing THA were invited to participate in this study. The inclusion criteria were as follows: age > 18 years, able to read and speak Chinese, primary hip OA diagnosed based on the criteria of the American College of Rheumatology, and willing to receive a THA in our hospital. Patients were excluded if they were unable or unwilling to complete the questionnaire or if they had symptomatic OA in the other lower limb joints, a history of lower limb or spine surgery, inflammatory arthritis, spondyloarthritis, or severe lung, heart, or other diseases. Ultimately, 108 patients (79% of those invited; 63 women and 45 men) met the prespecified inclusion criteria and participated. Mean age of participants was 66 years (range, 35–87 years). Duration of hip OA was 5.8 ± 2.5 years (range, 1–12 years) (Table 1). The sample size met the recommendation of Terwee et al. [22] that a study should enroll at least 100 patients for internal consistency analysis and 50 patients for floor or ceiling effects, reliability, and validity analysis. All 108 patients signed informed consent to participate in the study, and the clinical research ethics committee of our hospital approved the study.

Table 1 Characteristics of participants

Instruments

The OHS is widely used to assess patients with diseases of the hip and includes 12 items (each scored on a 0–4 Likert scale). The questionnaire generates an overall score ranging from 0 to 48, with a higher score representing better hip status. The OHS has been translated and validated in several languages [7, 10, 17, 20, 24].
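The scoring rule described above can be illustrated with a minimal sketch. The function name `ohs_total` and the example responses are hypothetical and are not part of the study's procedures; the sketch only assumes that each of the 12 items has already been coded from 0 to 4.

```python
def ohs_total(item_scores):
    """Sum twelve item scores (each coded 0-4) into an OHS total of 0-48."""
    if len(item_scores) != 12:
        raise ValueError("the OHS has exactly 12 items")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(item_scores)


# Hypothetical responses from one patient (not study data).
print(ohs_total([1, 2, 1, 0, 2, 1, 1, 2, 1, 1, 2, 1]))  # prints 15
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, mirrors the check for missing or multiple responses described for the questionnaire data.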

To determine construct validity, we compared the OHS with the Harris hip score, the SF-36, and the visual analog scale (VAS) score for pain. The Harris hip score (HHS), a joint-specific health status questionnaire, is frequently used by clinicians to assess the outcome of the hip. The HHS contains four domains (pain, function, deformity, and ROM), and its total score ranges from 0 (maximum disability) to 100 (no disability) [12]. The SF-36 is a generic health status questionnaire that contains eight domains: Physical Functioning, Role-Physical, Bodily Pain, General Health, Vitality, Social Functioning, Role-Emotional, and Mental Health. The SF-36 has been translated and validated in Chinese populations in many studies. Each subscale ranges from 0 to 100, and higher scores represent better health status [15, 25, 27]. The VAS is a simple and widely used method to measure the intensity of a patient’s pain. It allows patients to rate pain intensity along a 100-mm line ranging from “no pain” (at the left end) to “worst pain” (at the right end) [8].

Participants completed the OHS-C, HHS, VAS, and SF-36 in the orthopaedic outpatient clinic of our hospital. Two weeks later, when they were in the hospital waiting for surgery, they were asked to complete the questionnaires a second time. Six months after surgery, the participants were asked to complete the OHS-C a third time.

Acceptability and Score Distribution

To evaluate acceptability, all patients were asked whether they had any difficulties filling in the questionnaire. The data were checked for missing or multiple responses. The completeness of the OHS-C and the time needed to complete it were also measured. The average time required to complete the OHS-C was 96 ± 24 seconds. All participants completed the OHS-C, and no missing responses or difficulties were observed. Scores of the OHS-C ranged from 3 to 31 (Fig. 1). We also summarized the scores of the VAS, HHS, and SF-36 (Table 2).

Fig. 1 Distribution of the OHS-C scores

Table 2 Score distribution of the OHS-C, HHS, VAS, and SF-36 (N = 108)

Reliability

Reliability was assessed by internal consistency and test-retest reliability. Internal consistency was measured by Cronbach’s alpha; a value > 0.7 is considered to indicate good reliability [22]. We measured test-retest reliability by comparing the scores of the first and second administrations. The health status of patients with such a chronic disease is unlikely to change much during 2 weeks without medical intervention, and patients are also unlikely to recall the answers they chose before. The intraclass correlation coefficient (ICC) was used to assess test-retest reliability, where a value > 0.8 is considered to indicate good reproducibility [9]. A Bland-Altman plot, which shows the mean of the two assessments against the difference between them, was also used to assess whether there was systematic bias between the test and retest of the OHS-C [2, 3].
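As an illustration of the internal consistency statistic, the following is a minimal sketch of Cronbach’s alpha in its standard form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The study’s analyses were run in SPSS; the function name `cronbach_alpha` and the data layout (one list of item scores per respondent) are assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (4 respondents, 3 items); not study data.
alpha = cronbach_alpha([[1, 2, 1], [2, 2, 3], [3, 4, 3], [0, 1, 1]])
print(round(alpha, 2))
```

For the OHS-C, `rows` would hold 12 item scores per patient, and a result above 0.7 would meet the threshold cited above.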

Validity

Construct validity was evaluated by calculating the Pearson correlation coefficients between the OHS-C and the HHS, the VAS, and the eight domains of the SF-36. The correlations were judged as poor (r = 0–0.20), fair (r = 0.21–0.40), moderate (r = 0.41–0.60), good (r = 0.61–0.80), or excellent (r = 0.81–1.0). Because the OHS was designed to evaluate the physical health of the hip, we hypothesized that the OHS-C would correlate strongly with the physical health-related domains (Physical Functioning, Bodily Pain) of the SF-36 and weakly with the mental health-related domains (Vitality, Mental Health, Role-Emotional) of the SF-36. Floor and ceiling effects were considered present if > 15% of all participants achieved the lowest (0) or highest (48) possible score on the OHS-C [18].
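The interpretation bands and the 15% floor/ceiling criterion can be expressed as a short sketch. The function names are hypothetical; the sketch assumes that the bands apply to the absolute value of r (so that a negative correlation such as that with the VAS is graded by its magnitude) and that values falling between the published bands (for example 0.205) are assigned to the lower band.

```python
def correlation_strength(r):
    """Grade a Pearson r by magnitude using the bands stated in the text."""
    a = abs(r)
    if a > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if a <= 0.20:
        return "poor"
    if a <= 0.40:
        return "fair"
    if a <= 0.60:
        return "moderate"
    if a <= 0.80:
        return "good"
    return "excellent"


def floor_ceiling(totals, lowest=0, highest=48, threshold=0.15):
    """Return (floor_effect, ceiling_effect) flags for a list of total scores."""
    n = len(totals)
    floor_share = sum(t == lowest for t in totals) / n
    ceiling_share = sum(t == highest for t in totals) / n
    return floor_share > threshold, ceiling_share > threshold


print(correlation_strength(0.89))   # prints excellent
print(correlation_strength(-0.79))  # prints good
print(correlation_strength(0.31))   # prints fair
```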

Responsiveness

The responsiveness [4, 13, 19] of the OHS-C was assessed by comparing the preoperative scores with the 6-month postoperative scores. We calculated the effect size as the mean change between preoperative and postoperative scores divided by the SD of the preoperative OHS-C scores [18]. We also calculated the standardized response mean as the mean change between pre- and postoperative scores divided by the SD of the changes.
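Both responsiveness indices can be sketched under their conventional definitions: the effect size is the mean change divided by the SD of the baseline (preoperative) scores, and the standardized response mean is the mean change divided by the SD of the change scores. The function name `responsiveness` and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def responsiveness(pre, post):
    """Return (effect_size, standardized_response_mean) for paired scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    changes = [after - before for before, after in zip(pre, post)]
    mean_change = mean(changes)
    effect_size = mean_change / stdev(pre)   # scaled by baseline variability
    srm = mean_change / stdev(changes)       # scaled by variability of the change
    return effect_size, srm


# Hypothetical pre- and postoperative totals for five patients.
es, srm = responsiveness([10, 15, 20, 12, 18], [30, 34, 38, 33, 35])
print(round(es, 2), round(srm, 2))
```

Because the two indices differ only in the denominator, they diverge when patients improve by very different amounts: a wide spread of change scores lowers the standardized response mean but leaves the effect size untouched.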

SPSS Version 13.0 (SPSS Inc, Chicago, IL, USA) was used to analyze the data from all the questionnaires.

Results

Reliability

Internal Consistency

The internal consistency was good. Cronbach’s alpha was 0.91 for the overall OHS-C and ranged from 0.90 to 0.91 if an item was deleted. The item-total correlations ranged from 0.43 to 0.77, which also indicated good correlation between each item and the overall OHS-C (Table 3).

Table 3 Correlation of each item and total OHS-C scores

Test-retest

The OHS-C showed excellent test-retest reliability. The mean score of the retest was 15.7 ± 5.0, which was similar to that of the first test (15.3 ± 5.3; p > 0.05). The ICC for the test-retest was 0.937 (95% confidence interval, 0.909–0.957; Table 4). The Bland-Altman plot (Fig. 2) showed no systematic bias. The limits of agreement ranged from −4.01 to 3.20, which also indicated good reproducibility of the OHS-C [3].
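The limits of agreement follow the usual Bland-Altman form, mean difference ± 1.96 SD of the differences. As a consistency check on the reported values, the midpoint of the limits, (−4.01 + 3.20)/2 ≈ −0.41, matches the difference between the test and retest means (15.3 − 15.7 = −0.4). A minimal sketch of the calculation follows; the function name and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman_limits(test, retest):
    """Return (bias, lower_limit, upper_limit) for paired test-retest scores."""
    if len(test) != len(retest):
        raise ValueError("test and retest must be paired")
    diffs = [a - b for a, b in zip(test, retest)]
    bias = mean(diffs)        # mean difference (systematic bias)
    sd = stdev(diffs)         # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired totals for six patients.
bias, lower, upper = bland_altman_limits(
    [10, 12, 14, 16, 20, 9], [11, 12, 15, 16, 19, 10]
)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A bias close to zero with limits that are narrow relative to the 0–48 score range would be read as the absence of systematic bias between the two administrations.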

Table 4 Intraclass correlation coefficient between the test and retest groups (n = 108)
Fig. 2 Bland-Altman plot of test-retest reliability of the OHS-C. The interval between the two assessments was 2 weeks. Dashed lines show the 95% limits of agreement (mean ± 1.96 SD).

Validity

The correlation between the OHS-C and the HHS was excellent (0.89, p < 0.01). The OHS-C also correlated well with the VAS (−0.79, p < 0.01) and with the Physical Functioning (0.79, p < 0.01) and Bodily Pain (0.70, p < 0.01) domains of the SF-36. These data indicated convergent validity. The correlation between the OHS-C and the Role-Physical (0.52, p < 0.01), General Health (0.55, p < 0.01), and Social Functioning (0.51, p < 0.01) domains of the SF-36 was moderate. However, the correlation between the OHS-C and the Vitality (0.31, p < 0.01), Role-Emotional (0.31, p < 0.01), and Mental Health (0.29, p < 0.01) domains of the SF-36 was weak, indicating divergent validity. We also observed that the OHS-C correlated more strongly with the SF-36 than the HHS did (Table 5).

Table 5 Pearson correlations among the OHS-C, HHS, VAS, and SF-36 (n = 108)

Responsiveness

The Chinese version of the OHS showed good responsiveness to treatment. Responsiveness was evaluated by comparing the pre- and postoperative scores of the THA group. The mean OHS-C score improved from 15 ± 5 to 34 ± 4 (p < 0.01). The mean change was 19 ± 5. The effect size and standardized response mean for the OHS-C were 3.52 and 3.31, respectively.

Discussion

In China, surgeons are paying more attention to self-reported outcome assessment. Several hip-specific instruments have been translated and crossculturally adapted into Chinese, including the Hip Disability and Osteoarthritis Outcome Score [26]. At present, there is no agreement on which questionnaire should be used to evaluate the status of patients with hip OA. The OHS is widely used as a joint-specific measure for patients with hip OA [14], but to our knowledge, it has not been validated in a Chinese population. The purpose of this study therefore was to interculturally adapt the OHS into Chinese and to evaluate the psychometric properties of the OHS-C in a Chinese population with hip OA undergoing THA. We found the Chinese version of the OHS to be a valid tool, demonstrating a high degree of reliability, validity, and responsiveness.

Before discussing our results further, some limitations of our study should be considered. First, the participants did not represent the entire Chinese population with hip OA. Most of the patients recruited had severe hip OA and intended to undergo THA. However, there was enough variability in the population to demonstrate responsiveness, and no floor or ceiling effects were observed. Second, we translated the OHS into standard simplified Chinese, the official language of China, but traditional Chinese is also widely used in several southern areas of China. It will therefore be necessary to translate the OHS into traditional Chinese and validate that version in the future. Third, all of the participants underwent THA. We did not assess responsiveness in patients receiving conservative treatments. Thus, more validation research in patients with hip OA receiving other treatments is required.

Cronbach’s alpha for the OHS-C (0.914) indicated excellent internal consistency, comparable to that reported in other studies of the OHS [6, 7, 10, 17, 20, 24]. The item-total Pearson coefficients (ranging from 0.427 to 0.770) also indicated good correlation between each item and the overall score. As for test-retest reliability, the ICC for the OHS-C (0.937; 95% confidence interval, 0.909–0.957) and the Bland-Altman plot (Fig. 2) indicated good reproducibility, in accordance with other validation studies [10, 17, 20].

Construct validity was demonstrated by calculating the correlation between OHS-C scores and the HHS, the VAS, and the eight individual domains of the SF-36. The OHS-C correlated significantly with the HHS (0.890) and the VAS (−0.788), which suggested that the OHS-C measures aspects similar to those measured by the HHS and VAS. We also observed that the OHS-C showed a significant correlation with the Physical Functioning (0.79, p < 0.01) and Bodily Pain (0.70, p < 0.01) domains of the SF-36 and a weak correlation with the Vitality (0.31, p < 0.01), Role-Emotional (0.31, p < 0.01), and Mental Health (0.29, p < 0.01) domains of the SF-36 (Table 5). These construct validity results were consistent with previous validation studies [6, 7, 10, 17, 24]. No floor or ceiling effects were observed in the pre- and postoperative patients, similar to previous studies [10, 17].

Responsiveness, or sensitivity to clinical change, is the most important characteristic in a prospective outcome study. Our results showed that the OHS-C was able to detect change after surgical treatment with excellent responsiveness. The effect size of the OHS-C was 3.52. This was larger than the effect size of the OHS reported for patients who received hyaluronic acid injection (1.98) [17]. It was also larger than the effect size in patients receiving a THA in other studies of the OHS [6, 10]. Our explanation is that the participants in our study were in worse health status than those of other validation studies, which might have led to a larger response to surgical treatment.

In summary, we found that the OHS could be interculturally adapted into Chinese with good psychometric properties. As a self-reported questionnaire, the Chinese version of the OHS is a joint-specific, reliable, valid instrument for a Chinese population with hip OA undergoing THA. Therefore, we suggest that the OHS-C can be used by surgeons in practice to evaluate the impact of hip OA and its treatments on patients’ pain and function.