Introduction

Knee osteoarthritis is the most common degenerative joint disease in older adult populations, often affecting the patient’s daily activities and quality of life [28]. According to the sixth national population census of the People’s Republic of China in 2010, there were 177.65 million people older than 60 years in China, accounting for 13.26% of the total population [43]. The overall prevalence of symptomatic knee osteoarthritis in the population 45 years or older in China reached 8.1% [53]. One would expect a portion of this aging population with symptomatic knee osteoarthritis to consider total knee arthroplasty (TKA) in the future, and patient-reported outcomes will be an important measure of success.

Several scoring systems in the Chinese language are available for assessing knee disorders, some of which have been scientifically validated, such as the new Chinese Knee Society Scoring System [38]. There is increasing interest in using patient-reported outcome measures (PROMs) in clinical practice across the spectrum of osteoarthritis. The Oxford Knee Score (OKS) is a disease-specific PROM that is used to evaluate joint pain and physical function before and after TKA in patients with knee osteoarthritis [14, 16, 30]. The OKS has been shown to be valid, responsive, and reliable for measuring patients’ perceptions of their knee disorder [11]. It correlates with the achievement of patients’ expectations and satisfaction after TKA [1, 7, 8, 9, 32] or nonsurgical management [25] with no observer bias. The OKS has been reported to be an independent predictor of knee ROM after TKA [40]. Although not designed for this purpose, the OKS is increasingly used to assess eligibility for primary [12] or revision [47] surgery. The OKS is short and does not require a clinical review to obtain objective data, in contrast to some other functional scoring systems. These factors have resulted in its widespread use in large cohort studies and comparative trials that assessed outcomes after TKA and the long-term durability of knee components [20, 41]. Although a Singapore Chinese version of the OKS has been validated in Chinese-speaking Singaporeans with knee osteoarthritis [60], it may not be culturally appropriate for patients with knee osteoarthritis in mainland China, given the sociocultural differences in lifestyle, language, and family support structures between Singapore and China.

We therefore sought to translate and cross-culturally adapt the OKS for use in mainland China, and test it for (1) reliability, (2) construct validity, (3) dimensionality, and (4) responsiveness.

Methods

The medical ethics committee of Nanfang Hospital approved this study (NFEC-201209-K9), and all participants provided written informed consent.

Translation and Cross-Cultural Adaptation

The cross-cultural adaptation of the OKS to Chinese was performed according to the ISIS Outcomes Translation and Linguistic Validation Process (Oxford University Innovation [formerly ISIS Innovation], Oxford, UK) and a published guideline [59]. Briefly, the original English version of the OKS was translated independently into Chinese by three native Chinese bilingual translators (a professional English translator [NL], an experienced orthopaedic surgeon [JW], and an advanced-practice nursing specialist [XL]). A consensus panel was formed to discuss the three preliminary translations, which resulted in the first synthetic version. The synthetic version then was back-translated by two bilingual nonmedical, professional English translators (ZW, DC), who were blinded to the original English version. This process was repeated until a final version was produced with no disagreements between the English and Chinese versions. A consensus meeting that included all members of the consensus panel and the translators involved in the original translation process reviewed all of the reports and reached consensus on any discrepancies, thereby finalizing the pretesting Chinese version.

After the translation process was complete, 20 Mandarin-speaking patients with knee osteoarthritis who were scheduled to undergo TKA were asked to (1) decide if all of the items on the questionnaire were understandable and (2) complete the questionnaire. These subjects comprised patients with different socioeconomic and educational backgrounds (mean age, 63 ± 6 years; range, 52–75 years; male, 25%; living in rural area, 60%; unemployed, 50%; illiteracy, 20%; primary school, 35%). Each item and response option were read aloud to the patients and they were asked about their understanding of the meaning of the item and the chosen response. Responses were recorded by the interviewer. No difficulties encountered by the respondents were noted in the pretest phase. Therefore, no further modification was required for the pretesting version. The final Chinese version that had been approved at the consensus panel meeting and all the versions of the scale were sent to and approved by the copyright holders of the OKS (Oxford University Innovation Limited, Oxford, UK) (https://process.innovation.ox.ac.uk/p/oks/stage/1/oxford-knee-score.aspx).

Study Design and Subjects

This is a prospective cohort study of patients scheduled to undergo TKA. Between March 2013 and March 2015, 253 patients underwent TKA at the department of arthroplasty of a tertiary referral hospital in Guangzhou, China. We applied inclusion and exclusion criteria to this group, and excluded 139 patients (55%), leaving 114 (45%) who participated (Group 1) and whose surveys were analyzed for psychometric properties of the Chinese version of the OKS (OKS-CV). A flow chart outlines the assignment of patient groups (Fig. 1). Eligibility criteria were (1) knee osteoarthritis with a scheduled primary unilateral TKA, (2) fluency in Mandarin, and (3) consent to participate. The exclusion criteria were (1) lack of understanding of Mandarin and (2) inability to comprehend the questionnaires owing to cognitive impairment. After consenting to the research, all patients were interviewed by a trained interviewer (KL) and asked to complete four questionnaires before their planned TKA: the OKS-CV, WOMAC [2], Short Form-8 Health Survey (SF-8™; Quality Metric Incorporated, Lincoln, RI, USA), and the EuroQol Group 5-Dimension Self-Report Questionnaire (EQ-5D) [3]. Sociodemographic data and the presence of chronic medical conditions also were recorded. At a mean postoperative followup of 2.7 years (SD, 0.5 years), the 114 patients who completed the preoperative questionnaires were asked to complete the OKS-CV and the WOMAC again.

Fig. 1

A flow chart shows the assignment of patient groups in our study. OKS-CV = Oxford Knee Score-Chinese Version; EQ-5D = EuroQol Group 5-Dimension Self-Report Questionnaire; SF-8 = Short-Form 8 Health Survey.

For assessing test-retest reliability of the OKS-CV, a different group of 35 outpatients with knee osteoarthritis (Group 2) was recruited between December 2016 and January 2017 at the same hospital. Eligibility and exclusion criteria for Group 2 were largely the same as those for Group 1. Because the measured condition of the patients was expected to change after TKA, the requirement of being scheduled for a primary TKA was removed from the eligibility criteria for Group 2. Moreover, to minimize changes in clinical status, patients who were scheduled to undergo intraarticular injection between the test and retest assessments were excluded. In these patients, the OKS-CV was administered face to face at the first interview and again 1 week later by telephone interview. The sample size for evaluating the test-retest reliability in this study is consistent with that of Naal et al. [42].

All patients were diagnosed with knee osteoarthritis by an orthopaedic surgeon (JW) according to the radiographic and clinical diagnostic criteria of the American College of Rheumatology [29]. There were no differences in age, sex, or socioeconomic and educational backgrounds between the study sample and our routine patient population of the last 5 years (n = 715) at our hospital (all p > 0.05). As one of the three largest hospitals in Guangdong province, our hospital attracts patients from all cultures and income levels. Access to the hospital is open to every patient, and our patients are a mixture of urban and rural inhabitants. The study cohort therefore was considered representative.

Patient Demographics and Clinical Background

A total of 114 patients underwent TKA (Group 1: mean age, 67 ± 7 years; range, 55–84 years), completed the preoperative questionnaires, and were followed up by telephone interview, with a mean postoperative followup of 2.7 years (SD, 0.5 years). Among them, one patient died, 13 patients had moved and thus were lost from the sample, and one patient refused to answer the questionnaires because of hospitalization for pneumonia. The final cohort consisted of 99 patients (87% of the preoperative number of patients; mean age, 67 ± 7 years; female, 80%). For assessing test-retest reliability, a different group of 35 outpatients (Group 2) completed the OKS-CV twice, 1 week apart. There were no differences in sex and socioeconomic or educational backgrounds between the groups (p > 0.05). However, age (F = 17.3, p < 0.001), duration of osteoarthritis (F = 13.7, p < 0.001), and BMI (F = 24.3, p < 0.001) were lower in Group 2 than in Group 1 (Table 1). There were no missing data for any item of the OKS-CV. Patients did not report any difficulty understanding or completing the questionnaire.

Table 1 Demographic characteristics of the subjects

Measures Tested

The OKS is a joint-specific, patient-reported outcome measure containing 12 items regarding knee symptoms and function, each with five response levels. According to the original scoring system, the score of each item ranges from 1 to 5, and the summary score ranges from 12 (least difficult) to 60 (most difficult) [14]. According to a revised scoring system [41], the score of each item ranges from 0 (severe problems) to 4 (no problems), with a summary score ranging from 0 (worst outcome) to 48 (best outcome). We used the revised scoring system. Because it is difficult to compare original and revised scores among studies, we collected radiologic data of all 114 participants in Group 1 from their medical records to get a better sense of the degree of osteoarthritis and establish external validity with other patient populations. The Kellgren and Lawrence grading system was used to classify the severity of osteoarthritis [33] (Supplemental Table 1; supplemental material is available with the online version of CORR®).
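The relationship between the two OKS scoring conventions described above can be illustrated with a short sketch. It assumes the item-level mapping revised = 5 − original, which follows from the stated ranges (an original item score of 1, least difficulty, corresponds to a revised score of 4, no problems). The function names are hypothetical and are not part of any official scoring software.

```python
def revised_item_score(original_item: int) -> int:
    """Map an original item score (1 = least difficulty, 5 = most difficulty)
    to the revised scale (0 = severe problems, 4 = no problems)."""
    if not 1 <= original_item <= 5:
        raise ValueError("original item scores range from 1 to 5")
    return 5 - original_item


def revised_total(original_items: list[int]) -> int:
    """Sum the 12 converted items to obtain the revised 0-48 summary score."""
    if len(original_items) != 12:
        raise ValueError("the OKS has 12 items")
    return sum(revised_item_score(x) for x in original_items)


# Hypothetical respondent answering the middle option on every item.
print(revised_total([3] * 12))
```

Under this mapping, a revised summary score equals 60 minus the original summary score, which is one way to place results from older studies on the 0 to 48 scale.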

The WOMAC is a 24-item, disease-specific, functional measurement commonly used to assess symptoms and physical disability in patients with knee and/or hip osteoarthritis [2]. Several versions of the WOMAC are available; we used the five-point Likert version, in which each item is scored 0 to 4. The WOMAC consists of three domains: stiffness (two items; range, 0–8), pain (five items; range, 0–20), and physical function (17 items; range, 0–68). The total score is calculated by summing the three domain scores (range, 0–96), with higher scores reflecting worse pain, stiffness, and physical function. The Chinese version of the WOMAC has been validated for use in China [51].

The EuroQol Group 5-Dimension Self-Report Questionnaire (EQ-5D) is a self-reporting questionnaire that measures quality of life using a self-classifier and a VAS (EQ-VAS) [46]. The self-classifier contains five dimensions—mobility, self-care, pain/discomfort, usual activities, and anxiety/depression—with three levels of response to each dimension. Utility valuations for all 243 EQ-5D health states are based on time tradeoff evaluations by 1222 members of the general public in China [39]. The EQ-VAS is a vertical, graduated (0–100 points) 20-cm VAS, where 100 represents the “best imaginable health state” and 0 the “worst imaginable health state” [61]. The Chinese version of EQ-5D has been validated for use in China [57].

The SF-8 is derived from the SF-36® for the purpose of yielding comparable scores for the eight health dimensions and two summary measures of the SF-36 with minimal respondent burden [58]. It consists of eight items: (1) general health, (2) physical functioning, (3) role physical, (4) bodily pain, (5) vitality, (6) social functioning, (7) emotional roles, and (8) mental health. Each item uses a five- or six-point Likert-type scale. All eight items are classified in two summaries, with the physical component summary generated by items 1 to 4 and the mental component summary produced from items 5 to 8 [36, 56]. The item scores and summary scores are calibrated on a scale of 0 to 100, with higher scores reflecting better subject status. The Chinese version of SF-8 has been validated for use in China.

Reliability Testing of the OKS-CV

Internal consistency reflects the strength of the relationship among the items in the scale. In the current study, Cronbach’s alpha was used to assess internal consistency of the OKS-CV. Internal consistency was deemed satisfactory if this index was 0.7 to 0.9 [30]. The coefficient also was recalculated after deleting each of the 12 items in turn. All items were examined for correlation with the overall score [54]. The items should be at least moderately correlated with each other, and each item should correlate with the total scale score by greater than 0.30 [55]. The data for the analysis of internal consistency of the OKS-CV were collected at the pre-TKA assessment in Group 1 (n = 114).

Test-retest reliability, referring to the reproducibility of the measurement, gives an indication of the stability of the test instrument with time. The test-retest reliability of the OKS-CV involved 35 outpatients in Group 2, and was assessed using an intraclass correlation coefficient (ICC) from a one-way random effects model. The ICC ranges from 0 (no agreement) to 1 (perfect agreement), with an ICC greater than 0.80 being considered an indicator of high reproducibility [34]. To ensure that the patients have a relatively stable clinical condition, the test-retest reliability of the OKS-CV was assessed using a recommended interval of 1 week between two measurements [18, 45, 52, 55].
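A one-way random effects ICC can be computed from the between-subject and within-subject mean squares, as in the sketch below. The text above does not state whether the single-measurement or average-measurement form was reported, so the single-measurement form, ICC = (MSB − MSW) / (MSB + (k − 1) MSW), is assumed here. The function name and the example scores are hypothetical.

```python
def icc_oneway(scores):
    """Single-measurement, one-way random effects ICC.

    scores: one list per subject, each holding k repeated measurements.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]

    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest totals for four outpatients.
print(icc_oneway([[30, 32], [18, 17], [41, 43], [25, 25]]))
```

Because the one-way model does not separate a systematic shift between the two administrations from random error, any consistent difference between test and retest lowers the coefficient.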

Construct Validity Testing of the OKS-CV

The construct validity was examined by means of convergent validity and divergent validity evaluations [35]. Construct validity refers to the degree to which the scale assesses the underlying theoretical construct it is supposed to measure. Convergent validity and divergent validity are two aspects of construct validity. To assess the construct validity of the OKS-CV, Spearman’s rank correlation coefficients were calculated between the domains of the OKS-CV and WOMAC, and the related EQ-5D and SF-8 subscores. Hypotheses were formulated regarding the expected magnitude and direction of relations between the subscales of the OKS-CV and the above-mentioned instruments. For convergent validity, three hypotheses were proposed. The OKS-CV should (1) have moderate to strong correlations with the pain and physical function domains of the WOMAC; (2) have moderate to strong correlations with the SF-8 physical component summary [14, 21]; and (3) have moderate to strong correlations with EQ-5D mobility and usual activities. For divergent validity, we postulated that (1) the correlation between the OKS-CV and EQ-5D VAS would be less than 0.35; and (2) the OKS-CV would be least correlated with EQ-5D anxiety/depression and the SF-8 mental component summary. All data for the analysis of construct validity of the OKS-CV were collected at the pre-TKA assessments in Group 1 (n = 114).

Dimensionality Testing of the OKS-CV

Dimensionality, referring to a measure of whether all items in a subscale relate to a single latent variable, was assessed using the exploratory factor analysis with nonorthogonal promax rotation [22, 60]. The exploratory factor analysis consists of two steps, namely finding an initial solution, and then rotating that solution [31]. Each retained factor had eigenvalues greater than 1 [19]. Eigenvalues indicate the importance of each factor in explaining the variability and correlations in the observed sample of data. Items with a factor loading of 0.40 or greater were considered acceptable [37]. Factor loadings represent how much a factor explains a variable in factor analysis. The scores for the analysis of dimensionality of the OKS-CV were completed at administration of the pre-TKA assessment in Group 1 (n = 114).

Responsiveness Testing of the OKS-CV

Responsiveness refers to a measurement property concerning change scores that might be used to assess improvement or deterioration in health status. The responsiveness of the OKS-CV was evaluated by calculating the standardized response mean (SRM) and effect size (ES) in patients who completed pre- and post-TKA assessments of the OKS-CV (n = 99, 87% of the patients pre-TKA in Group 1). The SRM is defined as the mean change in score between preintervention and postintervention divided by the SD of the change scores. The ES is defined as the mean change in score between preintervention and postintervention divided by the SD of the baseline (preintervention) score. It has been suggested that ES values of 0.2, 0.5, and 0.8 be regarded as small, medium, and large degrees of change, respectively [14]. A previous study suggested that function improvement may reach a plateau 1 year after TKA and is maintained at 3 years after TKA [50]; the followup of our study cohort (mean, 2.7 years; SD, 0.5 years) therefore was considered to be long enough after TKA that the patients would have been expected to reach maximal recovery.
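The two responsiveness indices defined above differ only in the denominator, as the following sketch shows. It assumes that the sample SD (n − 1 denominator) is used for both indices, which the text does not specify. The function names and the example scores are hypothetical.

```python
from statistics import mean, stdev


def change_scores(pre, post):
    """Postoperative minus preoperative score for each patient."""
    return [after - before for before, after in zip(pre, post)]


def standardized_response_mean(pre, post):
    """Mean change divided by the SD of the change scores."""
    change = change_scores(pre, post)
    return mean(change) / stdev(change)


def effect_size(pre, post):
    """Mean change divided by the SD of the baseline scores."""
    return mean(change_scores(pre, post)) / stdev(pre)


# Hypothetical pre- and post-TKA totals for three patients.
pre = [10, 20, 30]
post = [20, 32, 44]
print(standardized_response_mean(pre, post))
print(effect_size(pre, post))
```

When patients improve by similar amounts, the SD of the change scores is small and the SRM can greatly exceed the ES, so the two indices are best reported together.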

We also examined floor and ceiling effects of the OKS-CV in the patients who completed the pre-TKA assessment of the OKS-CV in Group 1 (n = 114) by analyzing the proportion of individuals obtaining the lowest (0 points) and highest (48 points) scores, respectively. The presence of a floor or ceiling effect usually is defined as 15% of individuals in a sample achieving the extreme of the range (maximum or minimum level) [42]. Such effects represent a measurement limitation, indicating that a scale may not be able to detect a meaningful improvement or deterioration of a target condition.
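The floor and ceiling check described above amounts to counting the respondents at each extreme of the score range. The sketch below assumes the revised 0 to 48 scale and reads the 15% criterion as “at least 15%,” which is one possible reading of the definition; the function name and the example scores are hypothetical.

```python
def floor_ceiling(totals, lowest=0, highest=48, threshold=0.15):
    """Return the share of respondents at each extreme and whether it
    reaches the threshold (assumed here to mean 'at least 15%')."""
    n = len(totals)
    floor_share = sum(t == lowest for t in totals) / n
    ceiling_share = sum(t == highest for t in totals) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Hypothetical sample of 20 summary scores.
sample = [0] * 2 + [48] * 1 + [24] * 17
print(floor_ceiling(sample))
```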

Sample Size Determination

The required sample size was determined based on the assumptions made for establishing the convergent validity. The convergent validity was established via Spearman’s rank correlation coefficients, reflecting the strength of the association between the OKS-CV and the criterion-related scales. A moderate effect size of 0.30 for convergent validity was assumed [10], and the power and type 1 error probability set at 0.80 and 0.05 [6], respectively. With these assumptions, the required number of participants was 41 [6].

Statistical Analyses

Demographic data were summarized using descriptive statistics, with the mean and SD obtained for continuous variables and the percentage for categorical variables. Internal consistency was estimated by using Cronbach’s alpha. The test-retest reliability was assessed by using the ICC. Convergent validity and divergent validity indicate the scale’s construct validity. Spearman’s rank correlations were used because of the nonparametric nature of the data. The correlation was considered strong, moderate, or weak if the coefficient was greater than 0.5, 0.35 to 0.5, or less than 0.35, respectively [15]. Higher correlation coefficients are expected for convergent validity, and lower correlation coefficients are expected for divergent validity [16, 23]. Dimensionality was assessed using exploratory factor analysis. The Kaiser-Meyer-Olkin test and Bartlett’s test of sphericity were performed to confirm sampling adequacy and the suitability of the items for factor analysis. The Kaiser-Meyer-Olkin measure should be greater than 0.5, and the significance level (p) for Bartlett’s test should be less than 0.05 for a satisfactory factor analysis to proceed. Responsiveness was evaluated by using the SRM and ES, and paired t tests were used to compare the preoperative and postoperative scores. Correlations were calculated with bivariate two-tailed Pearson’s correlations. Missing data were treated as described by Nilsdotter et al. [44]. The percentage of missing data was tabulated for each item; less than 5% was considered acceptable. All tests were two-tailed. A probability less than 0.05 was considered statistically significant. Data were analyzed with IBM SPSS Statistics Version 19.0 software (IBM Corp, Armonk, NY, USA).

Results

Reliability

Internal consistency was good, with Cronbach’s alpha at 0.885. Cronbach’s alpha coefficients were slightly higher (0.888) after removing Item 8 “Pain in bed at night.” The corrected item-total correlations exceeded 0.3 for all items (Table 2).

Table 2 Reliability of the Chinese version of the Oxford Knee Score

All 35 patients completed test-retest reliability. Mean scores for the first and second administrations of the OKS-CV were 30 ± 9 and 32 ± 8, respectively. The OKS-CV showed excellent relative reliability with an ICC of 0.93 (95% CI, 0.87–0.97) for the total score.

Construct Validity

Three a priori assumptions of convergent validity and one assumption of divergent validity were confirmed with moderate to strong correlations (r > 0.35) between the OKS-CV and the WOMAC, the SF-8 physical component summary, the EQ-5D mobility and usual activities, and a weak correlation (r < 0.35) between the OKS-CV and EQ-5D VAS (Table 3). With respect to convergent validity, there were moderate to strong correlations between the OKS-CV and all subscales of the WOMAC (r = −0.80, p < 0.001), the SF-8 physical component summary (r = 0.65, p < 0.001), and the EQ-5D mobility (r = −0.35, p < 0.001) and usual activities (r = −0.41, p < 0.001). Divergent construct validity was satisfied only by the weak correlation between the OKS-CV and EQ-5D VAS (r = 0.30, p < 0.001). The OKS-CV correlated strongly with the SF-8 mental component summary (r = 0.58, p < 0.001) and moderately with EQ-5D anxiety/depression (r = −0.35, p < 0.001).

Table 3 WOMAC, EQ-5D, and SF-8™ scores of the study population and their correlations with the OKS-CV

Dimensionality

The exploratory factor analysis with nonorthogonal promax rotation method obtained three factors. The retained factors had eigenvalues greater than 1, and explained 66% of the general variance (Table 4). The Kaiser-Meyer-Olkin measure of sampling adequacy (0.85) and Bartlett’s test of sphericity (chi-square = 666; p < 0.001) were satisfied. Six items loaded on factor 1 (Items 3, 7, 9, 10, 11, 12), which was labeled “Physical function” and showed a strong correlation with the physical function subscale of the WOMAC (r = −0.73, p < 0.001). Factor 2, composed of five items (Items 1, 2, 4, 5, 6), was labeled “Pain” and showed a strong correlation with the pain subscale of the WOMAC (r = −0.66, p < 0.001). Factor 3 had only one item (Item 8), which was defined as “Sleeping problem” and showed a moderate correlation with the EQ-5D pain/discomfort (r = −0.35, p < 0.001) and anxiety/depression (r = −0.36, p < 0.001) and the SF-8 emotional roles (r = 0.36, p < 0.001) and mental health (r = 0.39, p < 0.001).

Table 4 Exploratory factor analysis of the Chinese version of the Oxford Knee Score

Responsiveness

The mean scores of the OKS-CV improved from 23 (SD, 9) to 37 (SD, 6) after surgery (t = −14.3, p < 0.001). The responsiveness values of the OKS-CV (SRM = 1.52, ES = 1.52) were close to those of the WOMAC (SRM = 1.78, ES = 1.68) (Fig. 2). The change scores of the OKS-CV were associated with the change scores of the WOMAC (Fig. 3).

Fig. 2

The standardized response mean (SRM) and effect size (ES), calculated between the preoperative baseline and postoperative followup at 2.7 (SD, 0.5) years, are shown (n = 99). OKS-CV = Oxford Knee Score-Chinese Version; Item 8 = Pain in bed at night.

Fig. 3

The scatterplot shows the correlation between the OKS-CV change score and the WOMAC change score (r = 0.68, p < 0.001). A fitted linear regression line slopes upward from left to right. Change score = postoperative score − preoperative score; OKS-CV = Oxford Knee Score-Chinese Version.

The results showed no floor or ceiling effect for the OKS-CV (Fig. 4). Two patients obtained the lowest score of 1 point (0.9%), and one patient obtained the highest score of 48 points (0.9%).

Fig. 4A–B

The graphs show the (A) floor and (B) ceiling effects in the OKS-CV and the WOMAC at preoperative baseline (n = 114). OKS-CV = Oxford Knee Score-Chinese Version; Item 8 = Pain in bed at night.

Discussion

From a cultural perspective, the Chinese mainland population has its own linguistic features, family structure, and lifestyle. For example, in our study, 92% of patients lived with their adult offspring, 37% lived in a building without an elevator, 14% lived in a bungalow with stairs, and 29% used squat toilets. This unique population is growing rapidly and the demand for TKA in an aging population can be expected to increase as well. PROMs are increasingly important tools to study the results of TKA in varied populations, and the OKS has been a favored instrument as it is easy for patients to complete, widely reported, and validated in many languages. Our study cross-culturally adapted the OKS for use in mainland China, taking into account some of the unique aspects of the culture noted above, and validated it through standard psychometric pathways.

This study has several limitations. First, our subjects were recruited from one arthroplasty center and represented mainly Chinese Mandarin-speaking patients, and this may limit generalizability elsewhere in mainland China. Second, we conducted the test-retest reliability assessment in a different group of patients who were younger and had a shorter duration of osteoarthritis than our main study cohort; our test-retest reliability result may not replicate in a larger representative sample. Third, factor 3 (Item 8, “pain in bed at night”) obtained from the exploratory factor analysis may not be as robust as the other two factors, because its eigenvalue was close to 1. Fourth, the interval between the preoperative and postoperative assessments in our cohort was variable, because we used the longest followup available for the OKS-CV measurements. A more-uniform followup would have provided more precise results. However, we considered this variability acceptable because we were judging the responsiveness of the OKS-CV and not specific eventual outcomes of TKA. In line with a previous longitudinal study suggesting that function improvement may reach a plateau 1 year after TKA and be maintained at 3 years after TKA [50], there were no differences in the postoperative scores of the OKS-CV and the WOMAC between the 2-year followup (24–35 months, n = 68) and the 3-year followup (36–46 months, n = 31) in our cohort (Supplemental Table 2; supplemental material is available with the online version of CORR). Fifth, the patients were predominantly women, and the sample size of this study restricts further subgroup analysis. A larger prospective cohort study would be needed to address these issues.

Reliability, referring to the consistency or reproducibility of measurement, was evaluated by internal consistency and test-retest reliability. In published studies, Cronbach’s alpha coefficients of different language versions of the OKS range from 0.81 to 0.95 [5, 14, 16, 17, 18, 21, 26, 30, 42, 45, 48, 52, 55, 60], and ICCs range from 0.85 to 0.99 [16, 17, 18, 21, 26, 42, 45, 52, 55]. Regarding internal consistency, Cronbach’s alpha coefficient of the OKS-CV in this study was high (0.89), similar to the English version (0.87) [14] and cross-culturally adapted Italian, Thai, French, Japanese, Portuguese, and Turkish versions [5, 21, 30, 45, 52, 55] (Supplemental Table 3; supplemental material is available with the online version of CORR). The high Cronbach’s alpha coefficient for the OKS-CV and acceptable corrected item-total coefficients for all 12 items confirmed that the OKS-CV is internally consistent, with the corresponding items properly correlated with each other. Regarding test-retest reliability, the value of the ICC (0.93; 95% CI, 0.87–0.97) indicated excellent test-retest reliability of the OKS-CV, similar to the Swedish, German, and Persian versions [16, 17, 42]. However, 40% (six of 15) of published studies regarding translation and cultural adaptations of the OKS did not assess the test-retest reliability (Supplemental Table 3; supplemental material is available with the online version of CORR).

Evaluation of convergent validity and divergent validity was accomplished through testing of prespecified hypotheses regarding the relationship between the PROM domain scores and evaluation systems already adequately validated. The convergent construct validity was good in this study. As hypothesized, the OKS-CV correlated moderately or strongly with domains in the WOMAC and SF-8 that measure similar health aspects. However, unlike the Singapore Chinese version [60], the correlations between the OKS-CV and the EQ-5D anxiety/depression were much higher than expected, suggesting that knee osteoarthritis affects not only the physical but also the mental health-related quality of life of our patients. Using a different scoring system, there were similar findings in the Swedish [16] and Singapore Chinese [60] versions. Potentially, pain and poor knee function may explain many of the emotional problems in patients with knee osteoarthritis. In support of this possibility, a study that developed a mapping algorithm between the OKS and EQ-5D utilities, based on a large sample size, suggested that the OKS predicted anxiety/depression responses quite accurately [13]. Furthermore, a potential reason that the OKS-CV was weakly correlated with the EQ-5D pain/discomfort in our study is that the patients were asked to report their status “during the past 4 weeks” in the OKS-CV, whereas in the EQ-5D they were asked to report their status as of “today.”

To our knowledge, there have been no consistent results among the published studies that have reported dimensionality. The items measuring pain and physical disability in the OKS did not load on the same factors [1, 18, 24]. In the current study, the results of exploratory factor analysis indicated a three-factor structure of the OKS-CV. Factor 1 (composed of Items 3, 7, 9, 10, 11, and 12 and labeled “physical function”) showed a strong correlation with the physical function subscale of the WOMAC. Factor 2 (composed of Items 1, 2, 4, 5, and 6 and labeled “pain”) showed a strong correlation with the pain subscale of the WOMAC. Item 8, “pain in bed at night,” loaded on the third factor independently. This finding was echoed by the moderate correlations between Item 8 and the EQ-5D pain/discomfort and anxiety/depression and the SF-8 emotional roles and mental health. This item may not perform as it was designed to. A possible explanation is the long average duration of osteoarthritis in the patients in our study (9 ± 6 years), and the high percentage of patients (79%) choosing the response option “some nights,” “most nights,” or “every night.” Previous studies suggest that patients with osteoarthritis commonly report sleep disturbance that may be attributable to pain [4, 49]. The pain of osteoarthritis has negative effects on patients’ mood, sleep, and ability to participate in social and recreational activities [27]. Thus, our findings suggest that the OKS-CV may have a factor structure that encompasses more than two underlying health domains, with Item 8 representing a pain-related sleeping problem. Our findings also raise the possibility that it might be worthwhile to study the role of knee osteoarthritis in sleep and emotional problems, and the role of the OKS in predicting the anxiety/depression responses of the patients from a clinical perspective. Further studies with a larger sample size are needed to determine dimensionality of the scale in the intended population, and to evaluate the mental state of patients with knee osteoarthritis.

In the current study, the SRM and ES values of the OKS-CV showed excellent responsiveness. Moreover, the change scores of the OKS-CV were associated with the change scores of the WOMAC. These results suggest that the OKS-CV is sensitive to changes in the patient’s status between the preoperative and postoperative assessments, which can help clinicians and researchers determine whether clinical symptoms have improved after an intervention such as a TKA. However, only 13% (two of 15) of the published studies regarding translation and cultural adaptation of the OKS have conducted responsiveness testing (Supplemental Table 3; supplemental material is available with the online version of CORR).

The Chinese version of the OKS is culturally and linguistically equivalent to the original English version. The results of this study support the use of the OKS-CV as a reliable, valid, responsive instrument for assessing patient-reported outcomes in Chinese Mandarin-speaking patients with knee osteoarthritis. This version also will allow cross-cultural comparisons of patient-reported outcomes between Chinese Mandarin-speaking patients with knee osteoarthritis and the populations of previously published studies in which the OKS was assessed in other languages.