Introduction

Patient-reported outcome measures (PROMs) can reliably and validly quantify a patient’s degree of pain, impairment, disability, and quality of life. They are a critical tool in evaluating the efficacy of orthopedic procedures and are increasingly used in clinical trials to assess the outcomes of health care. Total hip arthroplasty (THA) for osteoarthritis has been shown to significantly improve patients’ health-related quality of life [1]. The Hip disability and Osteoarthritis Outcome Score (HOOS) was developed as an extension of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), which has been used worldwide as a hip osteoarthritis (OA)-specific questionnaire [2, 3]. The HOOS contains five subscales. A normalized score is calculated for each subscale, where 100 indicates no symptoms and 0 indicates severe symptoms. The original version of the HOOS was shown to be valid, reliable, and responsive in patients with hip OA and is considered useful for evaluating patient-relevant outcomes after THA [2, 3]. It has since been translated, validated, and published in Chinese, Dutch, French, German, Japanese, Korean, Thai, and Turkish [4,5,6,7,8,9,10,11]. The German cross-cultural adaptation and evaluation was performed in a small sample of Swiss-German participants (n = 51) and did not assess responsiveness [7]. In prospective outcome studies, responsiveness, that is, the ability of an outcome measure to detect change when change has occurred, is an essential aspect of the measure's validity [12,13,14].

The aim of this study was to estimate responsiveness and minimal important change (MIC) and to reassess the reliability and validity of the German HOOS in a large cohort of patients with OA undergoing THA. To identify potential differences between language versions, we translated the English HOOS into German and compared it with the Swiss-German version by Blasimann [7].

Materials and methods

Translation

Forward and backward translation of the HOOS was performed according to the guidelines of the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) [15]. Two bilingual translators whose native language was German independently translated the English version into German. Two native English speakers back-translated the German questionnaire into English. The final version was tested on 15 patients with hip OA to ascertain acceptance and comprehension.

Patients and validation procedure

From November 2014 to January 2016, a total of 251 patients (154 [61%] women and 97 [39%] men; mean age 68 years, range 36–89 years) with OA undergoing THA were consecutively recruited at a single institution. Eligible patients were adults undergoing primary THA. Patients were asked to complete the German HOOS, the German Oxford Hip Score (OHS), the German Short-Form 36 Health Survey (SF-36), and a numeric rating scale (NRS) for pain and disability. The HOOS, OHS, SF-36, and NRS were completed 3–14 days before surgery (t1) and again on the morning before surgery (t2) for reliability testing. Six months after surgery (t3), all participants were asked to complete the HOOS a final time.

Instruments

The HOOS contains five subscales: symptoms (Sym), pain, activities of daily living (ADL), sports and recreation (S/R), and hip-related quality of life (QoL) [3]. The pain subscale comprises the five items of the original WOMAC pain subscale plus five additional items, and the Sym subscale includes the two items of the WOMAC stiffness subscale plus three additional items. The ADL subscale contains the 17 WOMAC function items. The S/R subscale (four items) and the QoL subscale (four items covering global difficulty, lack of confidence in the hip, lifestyle changes, and awareness of the hip problem) are newly generated and aim to evaluate the consequences of hip OA on more demanding activities and on QoL [2, 16].

Patients rate each item on a five-point Likert scale scored from 0 to 4, with 0 representing the worst state.

The OHS is a 12-item instrument that evaluates hip-related pain and function [12]. Each item is scored on a five-point Likert scale from 0 to 4, with 0 representing the worst state. The items are summed into a single overall score ranging from 0 to 48, where 48 represents the best health state. A German OHS has been translated and validated in patients with OA and THA and has been shown to be a reliable instrument for patients with hip OA undergoing THA [17].

The SF-36 is a widely used generic patient-reported instrument for measuring health-related quality of life. It consists of eight domains: physical functioning (pf), role physical (rp), role emotional (re), social functioning (sf), mental health (mh), energy/vitality (e/v), bodily pain (bp), and general health perception (gh). It has been translated into and validated in German [18].

The NRS was used to determine pain and disability of the hip. On a 0–10 scale, 10 represents the most severe pain or disability.

Statistical analysis

The HOOS and OHS scores were entered into a Microsoft Excel spreadsheet (Microsoft Corporation, Redmond, WA) and analyzed using SPSS v24 (SPSS Inc., Chicago, Illinois). A p value < 0.05 was considered statistically significant.

Reliability

Reproducibility

Reproducibility, in the sense of test–retest reliability, was assessed by calculating the intraclass correlation coefficient (ICC; two-way random-effects model, absolute agreement) between the HOOS scores obtained at the first visit 3–14 days before surgery (t1) and on the morning before surgery (t2). An ICC of 0.7 or above was considered good [19, 20].
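As a hedged illustration of the ICC described above, the sketch below computes the single-measure, two-way random-effects, absolute-agreement ICC (Shrout and Fleiss ICC(2,1), also written ICC(A,1)) from two-way ANOVA mean squares in plain Python. The function name `icc_a1` is hypothetical and is not an SPSS routine; the text does not state whether single- or average-measure ICCs were reported, so the single-measure form is an assumption.

```python
from statistics import fmean


def icc_a1(ratings: list[list[float]]) -> float:
    """Single-measure ICC, two-way random effects, absolute agreement.

    `ratings` has one row per patient and one column per session,
    e.g. [[hoos_t1, hoos_t2], ...] for test-retest data.
    """
    n = len(ratings)     # patients (rows)
    k = len(ratings[0])  # sessions (columns)
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic session differences (ms_cols) count as error
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six subjects rated by four judges (example data of Shrout and Fleiss, 1979)
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_a1(example), 2))  # prints 0.29
```

For the test–retest design used here, each row would simply hold one patient's HOOS score at t1 and t2 (k = 2). Penalizing the session mean square is what distinguishes absolute agreement from a consistency ICC, which would ignore a systematic shift between t1 and t2.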

Internal consistency

Reliability also includes internal consistency [20], the extent to which the items within a scale are homogeneous and thus measure the same construct [21, 22]. Cronbach's alpha (α) was calculated to assess the internal consistency of the HOOS items. α values of 0.7, 0.8, and 0.9 are considered to represent fair, good, and excellent internal consistency, respectively [23].
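The alpha coefficient referred to above can be computed directly from item-level responses. The following minimal Python sketch (hypothetical function names, toy data rather than study data) uses the usual formula α = k/(k − 1) × (1 − sum of item variances / variance of the summed score) and maps the result onto the fair/good/excellent thresholds stated in the text.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for one subscale.

    `items` has one row per patient and one column per item.
    """
    k = len(items[0])  # number of items in the subscale
    item_vars = [variance(row[j] for row in items) for j in range(k)]
    total_var = variance(sum(row) for row in items)  # variance of summed scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rate_alpha(alpha: float) -> str:
    """Label alpha using the thresholds given in the text (0.7/0.8/0.9)."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "fair"
    return "below fair"


# Toy responses (0-4 Likert) for a hypothetical three-item subscale
responses = [[0, 1, 1], [2, 2, 3], [3, 4, 4], [1, 1, 2], [4, 3, 4]]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f} ({rate_alpha(alpha)})")
```

Sample variances (n − 1) are used throughout; because the same denominator appears in the numerator and denominator of the ratio, population variances would give the same α.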

Floor and ceiling effects

Floor and ceiling effects were considered present if more than 15% of respondents achieved the lowest or highest possible score [24].
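A minimal sketch of the 15% criterion, assuming normalized HOOS subscale scores on the 0–100 scale, so that 0 and 100 are the lowest and highest possible scores. The function name, its defaults, and the toy scores are illustrative only.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0, threshold=0.15):
    """Share of patients at the lowest/highest score and whether it exceeds the threshold."""
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,  # "more than 15%", so exactly 15% is not an effect
        "ceiling_effect": ceiling_share > threshold,
    }


# Toy S/R scores: 4 of 20 patients (20%) at the minimum, none at the maximum
sr_scores = [0.0] * 4 + [12.5, 25.0, 31.25, 37.5] * 4
print(floor_ceiling(sr_scores))
```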

Validity

Construct validity

Construct validity describes the extent to which a score relates to other scores [24]. As no gold standard exists, the HOOS subscales were compared with the OHS, the SF-36, and the NRS for pain and disability using non-parametric correlation coefficients (Spearman's rho). Correlation coefficients < 0.4 were considered low, 0.4–0.59 moderate, and 0.6–0.79 high. For convergent validity, high correlations were hypothesized between the HOOS pain subscale and the OHS pain domain, the SF-36 bodily pain (bp) domain, and the NRS. For the HOOS ADL and S/R subscales, high correlations were expected with the OHS function domain and the SF-36 physical functioning (pf) domain. Low correlations were expected between subscales measuring different constructs.
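To make the correlation step concrete, the sketch below computes Spearman's rho as the Pearson correlation of average (tie-corrected) ranks and labels it with the thresholds above. The function names and data are hypothetical. Classifying by magnitude is an assumption: the text does not say how negative coefficients were handled, and they are expected when the HOOS (100 = best) is correlated with the NRS (10 = worst). If SciPy is available, `scipy.stats.spearmanr` should give the same coefficient together with a p value.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> 1-based mean rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rate_correlation(rho: float) -> str:
    """Text thresholds: < 0.4 low, 0.4-0.59 moderate, 0.6-0.79 high.

    Uses the magnitude only; values >= 0.8 are also labelled "high" (assumption).
    """
    r = abs(rho)
    if r < 0.4:
        return "low"
    if r < 0.6:
        return "moderate"
    return "high"


# Toy data: HOOS pain (100 = no pain) vs NRS pain (10 = worst pain)
hoos_pain = [20.0, 35.0, 35.0, 50.0, 70.0]
nrs_pain = [9, 7, 8, 5, 2]
rho = spearman_rho(hoos_pain, nrs_pain)
print(f"rho = {rho:.2f} ({rate_correlation(rho)})")
```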

Responsiveness

Responsiveness is the extent to which a questionnaire is able to detect change over time or change due to an intervention such as surgery [25]. All patients completed the HOOS before surgery (t1) and 5–6 months after surgery (t3). To assess responsiveness, the effect size (ES) and the standardized response mean (SRM) were calculated. The ES was calculated as the difference between the means before and after the intervention divided by the standard deviation (SD) of the same measure before treatment [26]. The SRM was calculated as the difference between the means before and after treatment divided by the SD of the change. For both ES and SRM, values of 0.2, 0.5, and 0.8 were regarded as small, moderate, and large effects, respectively [20, 26].
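The two responsiveness indices defined above differ only in their denominator, which the following minimal Python sketch makes explicit. Function names and toy scores are hypothetical. Change is taken as post minus pre so that improvement on the HOOS is positive, and the sample SD (n − 1) is used; both are assumptions about conventions the text does not spell out.

```python
from statistics import fmean, stdev


def effect_size(pre: list[float], post: list[float]) -> float:
    """ES: mean change divided by the SD of the baseline scores."""
    return (fmean(post) - fmean(pre)) / stdev(pre)


def standardized_response_mean(pre: list[float], post: list[float]) -> float:
    """SRM: mean change divided by the SD of the individual changes."""
    changes = [b - a for a, b in zip(pre, post)]
    return fmean(changes) / stdev(changes)


def rate_effect(value: float) -> str:
    """Thresholds from the text: 0.2 small, 0.5 moderate, 0.8 large."""
    v = abs(value)
    if v >= 0.8:
        return "large"
    if v >= 0.5:
        return "moderate"
    if v >= 0.2:
        return "small"
    return "below small"


# Toy HOOS scores for three patients before and after surgery
pre = [30.0, 40.0, 50.0]
post = [70.0, 80.0, 95.0]
es = effect_size(pre, post)
srm = standardized_response_mean(pre, post)
print(f"ES = {es:.2f} ({rate_effect(es)}), SRM = {srm:.2f} ({rate_effect(srm)})")
```

Because patients tend to improve by similar amounts, the SD of the change is often smaller than the baseline SD, which is why the SRM can exceed the ES for the same data.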

Minimal important change (MIC)

Minimal important change is the smallest change in a treatment outcome that a patient or physician would identify as important. The MIC describes a threshold above which a change in outcome is experienced as relevant by the patient, thereby avoiding reliance on statistical significance alone [27]. One distribution-based approach to estimating the MIC is the minimal detectable change (MDC), defined as the minimum amount of change that lies above the threshold of measurement error. If the change in a score exceeds the MDC, it can be considered a true change [27]. The MDC is calculated from the standard error of measurement (SEM), which depends on the internal consistency (reliability) of the score, expressed as Cronbach's alpha: SEM = SD × √(1 − α). To allow comparison with other studies, the MDC was calculated at a 90% confidence level: MDC90 = 1.65 × SEM × √2 [28, 29].
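The two formulas above can be written as a short Python sketch (hypothetical function names). The factor √2 reflects that a change score combines the measurement error of two assessments. Applied to the SEM values reported in the Results of this article, it reproduces the published MDC90 values to within about 0.01 points; the small discrepancies are presumably due to the SEMs having been rounded to two decimals.

```python
from math import sqrt


def sem_from_alpha(sd: float, alpha: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - alpha)."""
    return sd * sqrt(1 - alpha)


def mdc90(sem: float) -> float:
    """Minimal detectable change at 90% confidence: 1.65 * SEM * sqrt(2)."""
    return 1.65 * sem * sqrt(2)


def is_true_change(change: float, mdc: float) -> bool:
    """A change larger in magnitude than the MDC is treated as true change."""
    return abs(change) > mdc


# SEM values reported in the Results for each HOOS subscale and the total
reported_sem = {"Sym": 2.81, "pain": 2.61, "ADL": 2.81,
                "S/R": 3.70, "QoL": 3.52, "total": 2.64}
for domain, sem in reported_sem.items():
    print(f"{domain}: MDC90 = {mdc90(sem):.2f}")
```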

Results

Our translation of the HOOS showed no significant differences compared to the Swiss-German questionnaire. This was determined by an independent expert group for cross-cultural adaptation of questionnaires and German researchers who compared each question with regard to linguistic and content-related differences.

Reliability

Reproducibility

All five HOOS subscales demonstrated excellent test–retest reliability, with ICC values of 0.79 for Sym, 0.87 for S/R, 0.85 for pain, 0.89 for ADL, 0.86 for QoL, and 0.88 for the HOOS total. The mean index scores at the baseline (t1) and retest (t2) assessments were 30.6 (SD 15.1) and 29.4 (SD 14.3), respectively (Table 1).

Table 1 Reliability—Cronbach´s alpha and ICC

Internal consistency

Cronbach's alpha (α) values of 0.69 for Sym, 0.82 for S/R, 0.91 for pain, 0.96 for ADL, 0.82 for QoL, and 0.97 for the HOOS total demonstrated strong internal consistency, although the Sym value fell just below the 0.7 threshold for fair consistency (Table 1).

Floor and ceiling effects

No floor or ceiling effects were observed for the HOOS subdomains except S/R which showed a floor effect (Table 2).

Table 2 Floor and ceiling effects

Validity

Construct validity

To examine construct validity, Spearman's correlation coefficients between the HOOS, SF-36, OHS, and NRS were calculated (Table 3). Convergent validity of the HOOS ADL and S/R subscales was shown by strong correlations (> 0.6) with the OHS function domain and the SF-36 pf domain. As hypothesized, the HOOS pain subscale correlated strongly with the OHS pain domain, the SF-36 bp domain, and the NRS. All of these correlations were statistically significant (p < 0.05).

Table 3 Validity

Responsiveness

Table 4 shows the responsiveness of the HOOS. All subscales demonstrated excellent responsiveness (ES/SRM > 0.8) between the preoperative assessment (t2) and the postoperative follow-up (t3), indicating that a very large degree of change was detected after surgery. The pain subscale showed the highest effect size (2.86), indicating the greatest responsiveness, whereas the S/R subscale showed the lowest (2.07), which still represents a large effect.

Table 4 Responsiveness

The SEM was 2.81, 2.61, 2.81, 3.70, 3.52, and 2.64 for the German HOOS Sym, pain, ADL, S/R, QoL, and HOOS total, respectively. The MDC90 (90% confidence level) was 6.55, 6.09, 6.55, 8.63, 8.22, and 6.16 for Sym, pain, ADL, S/R, QoL, and HOOS total, respectively. The mean differences between the preoperative and postoperative assessments are shown in Table 4 and ranged from 20.23 for the pain subscale to 27.30 for QoL. All subscales and the total score changed by more than the MDC, indicating true change [25].

Discussion

The HOOS is an internationally used PROM that has been translated, validated, and published in Chinese, Dutch, French, German, Japanese, Korean, Thai, and Turkish [4,5,6,7,8,9,10,11]. Responsiveness has been assessed for the Chinese, French, Japanese, Korean, and original versions of the HOOS [3, 4, 6, 8, 9]. An evaluation of responsiveness including the minimal detectable change has not been published for any language version.

Test–retest reliability of the German HOOS was excellent, with ICC values ranging from 0.79 (Sym) to 0.89 (ADL) across the subscales. These results are comparable to those of the other HOOS versions, where ICC values ranged from 0.75 to 0.98 [4,5,6,7,8,9,10,11].

Strong internal consistency was demonstrated for the German version, with Cronbach's alpha values of 0.97 for the HOOS total, 0.82 for S/R, 0.91 for pain, and 0.96 for the ADL subscale (Table 1). This is comparable to other language versions of the HOOS, where Cronbach's alpha values range between 0.70 and 0.97 [4,5,6,7,8,9,10,11].

No floor or ceiling effects were observed for the HOOS subscales except S/R, which showed a floor effect (Table 2). This is in line with the Dutch HOOS, whereas the Chinese, French, Korean, Thai, and Swiss-German versions showed no floor or ceiling effects at all [4, 6, 7, 9, 10]. Floor and ceiling effects have not been evaluated for the Japanese HOOS [8].

Construct validity was determined by comparing the German HOOS with the German SF-36 and OHS (Table 3). Comparisons between the HOOS and the SF-36 or SF-12 have been published for all translated versions, whereas a comparison between the HOOS and the OHS has been published only for the Japanese questionnaire [8].

The HOOS ADL and S/R subscales showed strong correlations (> 0.6) with the OHS function and SF-36 pf domains. As hypothesized, the HOOS pain subscale correlated strongly with the OHS pain domain, the SF-36 bp domain, and the NRS. The Japanese version also showed strong correlations between the HOOS and the OHS, but lower correlations between the HOOS and the SF-36 subdomains; in that version, the HOOS pain subscale and SF-36 bp demonstrated only moderate correlation (0.53), which could be explained by cultural differences between the study populations. Divergent validity was shown by low correlations between the HOOS domains and the SF-36 re and physical subscales.

Table 4 illustrates the responsiveness of the HOOS. All subscales showed excellent responsiveness (ES/SRM > 0.8) between the preoperative assessment (t2) and the postoperative follow-up (t3). The pain subscale showed the highest effect size (2.86), indicating the greatest responsiveness, whereas the S/R subscale showed the lowest (2.07), which still represents a large effect. Responsiveness has also been evaluated for the Chinese, French, and Japanese versions, which likewise showed excellent results [4, 6, 8].

The MIC, the smallest important change in a treatment outcome, has not been described for the HOOS so far. Our aim was to determine the smallest changes in the HOOS subscale scores that are likely to be clinically meaningful and beyond measurement error in patients with hip OA. The SEM, MDC90, and the mean differences between the preoperative and postoperative assessments are shown in Table 4. All subscales and the total score changed by more than the MDC, indicating true change.

Our evaluation of the HOOS showed results similar to those of the validated HOOS versions in other languages. Cultural differences, smaller sample sizes, and different hip pathologies and surgeries may explain the differences observed in some aspects of the other HOOS versions. Responsiveness was excellent. To our knowledge, we are the first to determine the MIC of the HOOS. We estimated distribution-based MDC values for the HOOS of between 6.1 and 8.6 score points. If a patient improves or deteriorates beyond the MDC90 value, we can be fairly certain that this is not due to random variation in the score [30]. We corresponded with the developer of the HOOS to compare our German HOOS with the Swiss-German HOOS by Blasimann [7]. Our translation showed no relevant differences from the Swiss-German version, indicating that there is no need for another German questionnaire. This was confirmed by an independent expert group for the cross-cultural adaptation of questionnaires and by German researchers.

In conclusion, the German HOOS demonstrated good psychometric properties. Our study showed that the German questionnaire is a valid and reliable instrument for patients with OA undergoing THA. It can be used to evaluate the efficacy of surgical procedures and, in clinical trials, to assess the outcomes of health care.