Introduction

Osteoarthritis (OA) is the most common joint disease in the adult population, and its incidence and prevalence increase with age worldwide [1–3]. OA has a major impact both on national economies, owing to expensive treatment and surgical techniques, and on patients’ quality of life [4]. The World Health Organization (WHO) Scientific Group on Rheumatic Diseases estimates that 10 % of the world’s population aged 60 years or older have significant clinical problems that can be attributed to OA [5]. Because incidence and prevalence increase with age, longer life expectancy and the obesity epidemic will result in an increase in OA [6–9]. The age- and sex-standardized incidence of hip OA is 88 per 100,000 person-years [2].

Osteoarthritis is characterized by degeneration of cartilage, narrowing of the joint space, pain, and disability [5]. Although OA may affect any joint in the body, it most frequently affects the knee, followed by the hip [10, 11].

Patients with hip OA complain of pain and functional impairment during activities of daily living such as walking and climbing stairs [12, 13]. Stiffness, strength deficits, loss of joint movement, and gait disturbances such as walking asymmetries and decreased walking velocity are frequent clinical findings in hip OA [8]. Pain and impaired physical function negatively affect quality of life [12, 14]. Patient-reported questionnaires are increasingly used to determine the effects of the disease and its treatments on pain, function, and quality of life in patients with hip OA. These questionnaires must be reliable, valid, and sensitive to clinical change [15].

The Oxford hip score (OHS) measures specific forms of pain and mobility problems in patients with hip problems [6, 16]. The OHS is a 12-item, hip-specific, self-reported questionnaire developed for patients undergoing total hip replacement (THR). It has been widely used as an outcome measure of functional ability, daily activities, and pain from the patient’s perspective in hip OA patients [17]. The OHS has been extensively studied and has proven to be reliable, valid, and responsive in hip OA patients [6, 17–19]. The original version from 1996 was updated in 2007 with a new scoring system [6, 7]. It has also been translated into and validated in several languages, such as Dutch, German, French, Japanese, Italian, Korean, and Chinese [20–27]. Since the OHS is not available in Turkish, the aim of this study was to cross-culturally adapt the OHS for Turkish-speaking patients. Another aim was to determine the clinimetric properties, reliability, and validity of the Turkish version of the OHS (OHS-TR) in patients with hip osteoarthritis.

Materials and methods

Patients

Seventy patients with hip osteoarthritis evaluated at the Hacettepe University Department of Orthopedics and Traumatology were recruited into the study. Exclusion criteria were rheumatic diseases potentially responsible for secondary OA, severe articular inflammation, and traumatic hip lesions. Patients with cardiac or peripheral vascular diseases were also excluded. All patients were native Turkish speakers. The study protocol was approved by the local research and ethics committee of Hacettepe University. Consent was obtained from each patient prior to participation, after the patient had received complete information on the study.

Demographic characteristics of patients were recorded. All patients received and completed the following questionnaires: OHS-TR, Turkish version of Western Ontario and McMaster Universities Arthritis Index (WOMAC) [28] and Turkish version of the Short-Form 36 Health Survey (SF-36) [29].

The Oxford hip score

The OHS is a practical 12-item questionnaire for completion by patients with hip osteoarthritis; it is reliable, valid, and sensitive to clinically important changes [6, 19]. Each question has five response categories, corresponding to a score ranging from 0 to 4. The overall score ranges from 0 (worst) to 48 (best). The revised scoring system was used [7].
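The scoring rule described above can be illustrated with a minimal sketch. The function name `score_ohs` is hypothetical, and the sketch assumes that each response has already been converted to its 0–4 item score; it is not the scoring code used in the study.

```python
def score_ohs(item_scores):
    """Sum twelve OHS item scores (each 0-4) into a total of 0 (worst) to 48 (best).

    Assumption: responses are already coded 0-4 per the revised scoring system.
    """
    if len(item_scores) != 12:
        raise ValueError("the OHS has exactly 12 items")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 4")
    return sum(item_scores)


# Illustrative (made-up) responses for one patient
example_total = score_ohs([3, 2, 4, 1, 0, 2, 3, 4, 2, 1, 3, 2])
```

Rejecting incomplete questionnaires rather than imputing missing items is a design choice of this sketch; the study reports no unanswered questions, so no imputation rule is implied here.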

Patients completed all questionnaires at the first interview. The retest was conducted by telephone interview 7 days after the first test [30]. The time required to answer the questions was recorded during the first administration of the OHS. Comprehensibility and acceptance of the questionnaire were determined by the proportion of unanswered questions. After the OHS was completed, the SF-36 and WOMAC questionnaires were administered to all patients.

Short-Form 36

The Short-Form 36 Health Survey is the most widely used questionnaire to assess health-related quality of life (HRQoL). SF-36 consists of eight subscales including functional status, well-being, and overall evaluation of health status (i.e., physical functioning, physical role, bodily pain, general health, vitality, social functioning, emotional role, mental health). Scores for each subscale range from 0 (poor) to 100 (good health) [29, 31].

Western Ontario McMaster Universities Arthritis Index

The WOMAC, a 24-item disease-specific functional measure, consists of three subscales: pain (5 items), stiffness (2 items), and physical function (17 items). Each of the 24 items is graded either on a five-point Likert scale or on a 100-mm visual analog scale [32–34]. The five-point Likert (0–4) version of the WOMAC was used in this study. Subscale scores were calculated by summing the item scores for pain (range 0–20), stiffness (range 0–8), and physical function (range 0–68). The total score was calculated by summing the three subscale scores (range 0–96), with higher scores reflecting worse pain, stiffness, and physical function [28].
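The subscale arithmetic above can be sketched as follows. The function name `score_womac` is hypothetical, and the item order (5 pain items, then 2 stiffness items, then 17 physical function items) is an assumption of this sketch rather than something stated in the text.

```python
def score_womac(items):
    """Compute Likert WOMAC subscale and total scores from 24 item scores (0-4).

    Assumption: items are ordered pain (5), stiffness (2), physical function (17).
    Higher scores reflect worse pain, stiffness, and physical function.
    """
    if len(items) != 24:
        raise ValueError("the WOMAC has exactly 24 items")
    if any(s not in (0, 1, 2, 3, 4) for s in items):
        raise ValueError("each item score must be an integer from 0 to 4")
    pain = sum(items[0:5])          # range 0-20
    stiffness = sum(items[5:7])     # range 0-8
    function = sum(items[7:24])     # range 0-68
    return {
        "pain": pain,
        "stiffness": stiffness,
        "function": function,
        "total": pain + stiffness + function,  # range 0-96
    }
```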

Translation and adaptation process

The OHS-TR was developed according to international guidelines under license from the OHS copyright holder (© Isis Innovation Limited 1998. All rights reserved) [35–37]. Two translations from English to Turkish were produced by two independent translators who were native Turkish speakers fluent in English, which allowed detection of errors and of divergent interpretations of ambiguous items in the original instrument. To obtain better idiomatic and conceptual (rather than literal) equivalence between the two versions of the questionnaire and to render the intended measurement more reliable, one translator was informed of the study purpose and the concepts of the instrument. The other translator was unaware of the translation objective, which helped to reveal unexpected meanings in the original tool. The two Turkish translations were then synthesized by these translators and two bilingual health professionals, and the synthesized version was back-translated into English by two native English speakers who were blind to the original version. Each back-translation was then compared with the original English OHS and checked for inconsistencies. To assess the need for cultural adaptation and to fine-tune the questionnaire for use among Turkish patients, the Turkish version was jointly reviewed by an expert committee composed of the authors, two experienced professional translators, and health professionals, all of whom were bilingual. To detect errors of interpretation and nuances that might have been missed, the committee again compared the Turkish version with the original English version. The final stage of the adaptation process was to test the pre-final version. Pretesting of the pre-final Turkish version for comprehensibility in 10 randomly selected patients revealed no further difficulties with the questionnaire. After this testing in a limited number of patients, the translation committee approved the questionnaire without any changes for use in the study population [30, 35, 36, 38].

Statistical analyses

The sample size was based on the general recommendation of Altman of at least 50 subjects in a method comparison study [39]. Quantitative and qualitative variables were presented as mean ± standard deviation (X ± SD) and percent (%), respectively. Minimum and maximum scores for individual items and for the total OHS score were examined for possible floor or ceiling effects. Floor or ceiling effects were considered to be present if more than 15 % of respondents achieved the lowest or highest possible score, respectively. Statistical analyses were performed using SPSS 11.5 software (SPSS Inc, Chicago, IL). Bland–Altman plots were created using MedCalc Statistical Software version 15.2 (MedCalc Software bvba, Ostend, Belgium). Values of p < 0.05 were considered significant.
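The 15 % criterion for floor and ceiling effects can be expressed as a short check. The function name `floor_ceiling` and the example data are hypothetical; the sketch only illustrates the rule and does not reproduce the SPSS analysis.

```python
def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Flag floor/ceiling effects when more than `threshold` of respondents
    achieve the lowest or highest possible score."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Made-up item scores for 70 respondents: 11 at the lowest score (0), none at the highest (4)
example = floor_ceiling([0] * 11 + [2] * 59, lowest=0, highest=4)
```

In this made-up example, 11 of 70 respondents (about 15.7 %) sit at the lowest score, which is just above the threshold and is therefore flagged as a floor effect.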

Reliability

Cronbach’s α coefficient was used to measure internal consistency. Cronbach’s α is an estimate of the reliability of a scale’s measurement calculated from a single administration of the scale. The coefficient normally varies between 0 and 1, and a higher value indicates a higher correlation between the questions. The coefficient was also recalculated with each of the 12 items eliminated in turn. All items were examined for correlation with the overall score [30, 34, 39, 40].
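As an illustration of the internal consistency analysis, the following minimal sketch computes Cronbach’s α and the α obtained when each item is eliminated in turn, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total score). The function names and the data layout (one list of item scores per respondent) are assumptions of this sketch, not the SPSS procedure used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for `rows`, a list with one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed (total) score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed k times, each time leaving one item out."""
    k = len(rows[0])
    return [
        cronbach_alpha([list(row[:i]) + list(row[i + 1:]) for row in rows])
        for i in range(k)
    ]
```

An item whose removal raises α above the full-scale value would be a candidate for review; the study reports that no such item was found.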

Intraclass correlation coefficients (ICCs) were also used to assess reliability. A two-way random-effects model was used in the present study. The ICC was calculated with confidence intervals for each item and for the total score [30, 39–41].
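The text does not state which form of the two-way random-effects ICC was computed. As an illustration only, one common form for a single measurement with absolute agreement, written ICC(2,1) in the Shrout and Fleiss notation, is

$$\text{ICC}(2,1) = \frac{MS_{R} - MS_{E}}{MS_{R} + (k - 1)\,MS_{E} + \frac{k}{n}\left(MS_{C} - MS_{E}\right)},$$

where MS_R is the between-subjects mean square, MS_C is the between-administrations mean square, MS_E is the residual mean square, k is the number of administrations (two in a test–retest design), and n is the number of subjects. These symbols are introduced here for illustration and do not appear in the original analysis.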

Reproducibility, or test–retest reliability, was assessed by asking patients to complete the OHS-TR again 7 days after the first administration. The changes in mean scores between the first and second tests were calculated for each item and for the total score. Test and retest scores were compared with the Wilcoxon signed-rank test to detect any systematic difference between the two administrations. The correlation between the two test scores was also determined with Spearman’s correlation coefficient to analyze reproducibility [39–41].
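Spearman’s coefficient is the Pearson correlation of the ranks of the two score series, with tied values given their average rank. The sketch below shows this calculation with the standard library only; the function names are hypothetical, and the study itself used SPSS.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (sx * sy)
```

Applied to first-test and retest total scores, a value close to 1 indicates that patients keep nearly the same rank order across the two administrations.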

Several agreement parameters can be found in the literature. Most references describe the use of the standard error of measurement (SEM) and the repeatability coefficient, as recommended by Altman and Bland [42–46]. The SEM is defined as the standard deviation of the errors of measurement associated with the test scores for a specified group of test takers. The SEM was calculated using the following equation:

$$\text{SEM} = \text{SD}_{x}\sqrt{1 - r},$$

where SD_x is the standard deviation of the observed scores (x) and r is the reliability estimate for the measure. Agreement was estimated using both the SEM and the repeatability coefficient [30, 43, 44]. Furthermore, the instrument should be able to distinguish clinically important change from measurement error. Therefore, the minimal important change (MIC) was calculated according to the formula 1.96 × SD_change [47]. In addition, a Bland–Altman plot was constructed [47–49].
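The agreement formulas above can be written out as a minimal sketch. The function names are hypothetical. The limits of agreement use the usual Bland–Altman definition (mean difference ± 1.96 × SD of the differences), which is an assumption here because the text does not spell it out.

```python
import math


def sem(sd_x, r):
    """Standard error of measurement: SD_x * sqrt(1 - r)."""
    if not 0 <= r <= 1:
        raise ValueError("the reliability estimate r must lie between 0 and 1")
    return sd_x * math.sqrt(1 - r)


def mic(sd_change):
    """Minimal important change as defined in the text: 1.96 * SD_change."""
    return 1.96 * sd_change


def limits_of_agreement(mean_diff, sd_diff):
    """Bland-Altman 95 % limits of agreement (assumed definition, see lead-in)."""
    half_width = 1.96 * sd_diff
    return mean_diff - half_width, mean_diff + half_width


# Using the mean difference (-0.31) and standard deviation (1.17) reported in the Results
lower, upper = limits_of_agreement(-0.31, 1.17)
mic_value = mic(1.17)
```

With these reported values, the MIC formula gives about 2.29, close to the 2.30 reported in the Results, and the limits of agreement come out at roughly −2.60 and 1.98 points.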

Validity

Validity is an index of how well a test measures what it is supposed to measure. Validity was assessed by calculating Spearman’s correlation coefficient between the OHS-TR and the WOMAC and SF-36. Spearman’s correlations were used because of the non-parametric nature of the data. To evaluate the convergent validity of the OHS-TR, Spearman’s correlation coefficients were calculated between the OHS-TR and the WOMAC scores and the related subscores of the SF-36. We hypothesized that the OHS-TR would have moderate to high (0.50–0.80) correlations with these scores.

Discriminant validity was evaluated by calculating Spearman’s correlation coefficients between the OHS-TR and the mental component summary, mental health, vitality, and emotional role subscores of the SF-36. We hypothesized that the OHS-TR would have lower correlation coefficients (r < 0.50) with the mental components of the SF-36 [30, 39–41, 47, 50].
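The two hypotheses can be checked mechanically once the coefficients are known. In the sketch below, the function name `supports_hypothesis` is hypothetical, and two assumptions are made: the absolute value of r is used, because the OHS (higher is better) and the WOMAC (higher is worse) run in opposite directions, and any |r| of at least 0.50 is treated as supporting convergent validity.

```python
def supports_hypothesis(r, kind):
    """Check a correlation against the stated validity hypothesis.

    Assumptions of this sketch: the strength is |r|; convergent validity
    requires |r| >= 0.50 and discriminant validity requires |r| < 0.50.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    strength = abs(r)
    if kind == "convergent":
        return strength >= 0.50
    if kind == "discriminant":
        return strength < 0.50
    raise ValueError("kind must be 'convergent' or 'discriminant'")
```

For example, the coefficients reported in the Results for the WOMAC total score (−0.848), vitality (0.380), and the mental component summary (0.434) satisfy their respective hypotheses under these assumptions.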

Results

A total of 70 patients fulfilled the inclusion criteria and agreed to participate in the study. After clinical evaluation, all patients completed the questionnaires. The demographic and clinical characteristics of the study population are presented in Table 1. Comprehensibility and acceptance of the questionnaire, as determined by the proportion of unanswered questions, were good, since no questions were left unanswered. Patients did not report any difficulties in understanding and completing the OHS-TR. The Turkish translation of the OHS is presented in the Appendix. The mean time to complete the OHS-TR was 168.26 ± 83.15 s (range 44–377 s). The absolute values of all scores are presented in Table 2.

Table 1 Demographic and clinical characteristics of study population (n = 70)
Table 2 Absolute values of all scores

A floor effect was present in only one item: 15.7 % of patients achieved the lowest possible score on item 1. No floor or ceiling effect was observed for the total OHS-TR score.

Reliability

The internal consistency of OHS-TR tested by Cronbach’s α was high for the total score (Cronbach’s α 0.93). Corrected item-total correlations ranged between 0.54 and 0.81. All items correlated with the total score and the elimination of one item did not result in an α higher than 0.93 (Table 3).

Table 3 Internal consistency of Oxford hip score

All patients completed the OHS-TR twice to test reproducibility. The second test was performed 7 days after the first one. The Spearman’s correlation coefficient between the two tests was high (r = 0.980, p < 0.001) (Fig. 1) (Table 4).

Fig. 1 The correlation scatter of the first and second OHS tests

Table 4 Correlation between OHS and WOMAC and SF-36

For each item, test–retest reliability was analyzed with both intraclass correlation coefficients (ICCs) and related-samples Wilcoxon signed-rank tests. The ICCs were very high and ranged between 0.80 and 0.99. Although there was no significant difference between the first and second tests for any of the 12 items (p > 0.05), the total scores were significantly different (p = 0.01) (Table 5). The mean difference between the total scores was −0.31 (standard deviation 1.17; 95 % confidence interval −0.59 to −0.03). The Bland–Altman plot is shown in Fig. 2. The difference between the two measurements was within the limits of agreement in most cases.

Table 5 Test–retest scores of the Turkish version of OHS to evaluate reliability of patients with osteoarthritis of hip
Fig. 2 Reliability of the OHS-TR presented as a Bland–Altman plot

The calculated SEM was 0.16, within-subject standard deviation (Sw) was 1.17, and the repeatability coefficient was 3.63. The calculated MIC for OHS-TR was 2.30.

Validity

OHS-TR was significantly correlated with both WOMAC and SF-36 scores (p < 0.001). The highest degree of correlation was observed with the WOMAC total score (r = −0.848), and with respect to discriminant validity the lowest degree of correlation was observed with the vitality (r = 0.380) and mental component summary (r = 0.434) subscores of SF-36 (Table 4).

Discussion

In the present study, it is demonstrated that the OHS-TR is a valid and reliable tool for the assessment of pain and function in Turkish-speaking patients with hip OA.

The OHS-TR had an excellent response rate. In the literature, a response rate of 80 % is considered sufficient [51]. We had a 100 % response rate in both the first and second tests. The five-point Likert system enables quick answering by patients as well as uncomplicated and time-saving evaluation by the investigator, which is an advantage in clinical routine. The absence of missing data and the short time required to complete the questionnaire reflect good acceptance and comprehension by native Turkish-speaking patients. Translating and culturally adapting short, practical, reliable, valid, and sensitive instruments of this kind into Turkish is important not only for use in Turkey, with a population currently approximating 80 million, but also for use in other countries in which Turkish people live and work. The Turkish population in European Union countries alone currently stands at 10 million [52].

In the presence of floor or ceiling effects, agreement parameters may be over- or underestimated, because an extreme value of an item in the test is more likely to be identical on the retest. In the present study, there was a mild floor effect in only one item, and there were no floor or ceiling effects in the total score of the OHS-TR. We believe that such minor item-based floor or ceiling effects could be expected owing to the heterogeneity of the patients in terms of social differences and disease severity. Authors of some of the OHS version studies reported such minor item-based effects, which is in accordance with our study [47, 51].

Internal consistency is a measure of the extent to which the items in a questionnaire are correlated and thus measure the same concept [53]. As demonstrated by the internal consistency (Cronbach’s α 0.93), the test–retest reliability (r = 0.98), and the non-significant differences between the first and second test scores of each item (p > 0.05), the psychometric properties of the OHS-TR were in accordance with those of the original English, German, Dutch, Italian, Danish, French, Korean, and Persian versions [6, 20–24, 51, 54]. The total scores of the first and second tests differed significantly in our study. Although we did not calculate the smallest detectable difference (SDD), Martinelli and colleagues reported an SDD of 6.1 [23]. This finding means that only a change greater than 6.1 points between two subsequent measurements can be interpreted as a real change. For the same purpose, we calculated the MIC to distinguish measurement error from clinical change. In the present study, the calculated MIC was 2.30. The mean difference was 0.31 points, which is well below this level of clinical significance for the OHS.

The ICC is considered to be the most suitable and most commonly used reliability parameter for continuous measures [47]. Our results (ICCs ranging between 0.80 and 0.99) were comparable with those of previous version studies, which indicates that our translation and cultural adaptation succeeded in conveying the same meaning for each item as the original English version [6, 20–24, 51, 54]. The slight differences in psychometric properties among the previous version studies [6, 20–24, 51, 54] could be related to demographic and clinical differences between the study populations.

The hypothesis for construct validity was confirmed for the OHS in the present study. As assumed, the OHS-TR was strongly correlated with the WOMAC and the related SF-36 subscores (physical component summary, physical functioning, bodily pain, and physical role subscales). We expected low correlations (r < 0.50) between the OHS-TR and the mental component summary, role emotional, vitality, mental health, and social functioning subscores of the SF-36. Interestingly, the mental health and social functioning scores showed stronger than expected correlations with the OHS-TR. Although these parameters showed low correlations in the Western-language version studies as well as in the original English version, our results were in accordance with the Japanese version study [26]. As in Japanese culture, Turkish people commonly squat and sit on the floor or on low surfaces for cultural reasons (such as eating on the floor or using Turkish-style toilets, which require squatting) and religious reasons (crouching during Islamic prayer). In both cultures, limitations in these activities may cause psychological and social limitations. Therefore, when questionnaire results are compared across clinical studies performed in different countries, differences in socio-cultural factors, healthcare systems, and disease severity should be considered.

Conclusion

The results of the present study support the use of the Turkish version of the OHS as a reliable and valid outcome instrument in Turkish-speaking patients with osteoarthritis of the hip. The questionnaire should be tested extensively for its ability to detect changes over time, for follow-up and, especially, for routine clinical assessment in different hip problems.