Introduction

Total hip arthroplasty (THA) provides significant improvement in both mobility and quality of life for patients with osteoarthritis (OA) of the hip [1]. A recent epidemiologic study projected an increase in THA by 71% to 635,000 procedures per year by 2030 [2, 3]. As the prevalence of OA and the procedure volumes of THA continue to grow, focused use of outcome metrics will be required as clinicians and researchers seek to evaluate the effectiveness of THA.

As payers and providers seek to standardize the quality and cost of care for THA, patient-reported outcome measures (PROMs) have become central to defining success after THA [4, 5]. The development of new outcome metrics has accelerated as researchers seek to better understand the perceived nuances of post-surgical populations. These new metrics often refine older ones, retaining their strengths while introducing novel methods of analysis [6]. Across the literature, both the number of distinct outcome metrics and the frequency of their use continue to rise. This increases heterogeneity and may limit the ability to make reliable comparisons across studies [7,8,9].

This increase in outcome measures may be driven in part by the Centers for Medicare & Medicaid Services (CMS) Merit-based Incentive Payment System (MIPS) and by a contemporary environment that incentivizes clinicians to collect value-focused metrics [10, 11]. Moving forward, the THA community should reach consensus on which outcome metrics are preferred, so that it can generate useful measures and facilitate clinical practice recommendations.

Although a number of studies have sought to validate individual PROMs within a population cohort, few have examined the underlying trends in the utilization of these outcome metrics [1, 5, 7]. The utilization rates of these metrics continue to change in THA reporting, yet no review has considered these trends in detail. Investigating the rise or decline of commonly used metrics may also reveal shifts in clinician preferences over time. Such trends would show which outcome metrics have become more commonly used, and which have become obsolete or less preferred by the THA provider community.

To better understand the complexity in outcomes reporting within the THA literature, this review examined the trends in utilization of outcome metrics. We hypothesized that outcome metric utilization would demonstrate significant heterogeneity, and that utilization rates of numerous outcome metrics may have significantly changed over the past 15 years.

Materials and methods

A review of studies published between January 1, 2005 and December 31, 2019 was performed to identify relevant quality metrics in THA. Quality metrics were separated into five subcategories: joint-specific, joint-agnostic, general health, quality of life, and patient satisfaction. These subcategories were chosen because they are commonly used to categorize outcome variables when evaluating THA [12,13,14].

The Harris Hip Score (HHS), Hip Disability and Osteoarthritis Outcome Score (HOOS), Hip Disability and Osteoarthritis Outcome Score Short Form, Joint Replacement (HOOS-JR), Modified Harris Hip Score (mHHS), Oxford Hip Score (OHS), and Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) were categorized as joint-specific outcome metrics. All of these are joint-specific instruments used to evaluate THA [15,16,17]. The Numeric Rating Scale (NRS), University of California at Los Angeles Activity Score (UCLA), and Visual Analogue Scale for Pain (VAS) were categorized as joint-agnostic [18,19,20]. These metrics are used to evaluate THA but are not primarily directed at general health, quality of life, or patient satisfaction. The Forgotten Joint Score-12 (FJS-12), Patient-Reported Outcome Measurement Information System Global-10 (PROMIS-10), and Veterans RAND 12-Item Survey (VR-12) were categorized under general health because they are used primarily to evaluate general health [12, 21]. The EuroQol 5-Dimension Health Outcome Survey (EQ-5D), Short Form-12 Health Survey (SF-12), and Short Form-36 Health Survey (SF-36) were categorized under quality of life because they are used primarily to evaluate quality of life [10, 22, 23]. Lastly, a literature search revealed that patient satisfaction is often evaluated in THA, but no outcome metric focused solely on patient satisfaction is yet available for THA [16, 18, 21, 24]. Thus, papers reporting “Patient Satisfaction” were also included in this analysis.

A review of the historical literature was carried out for the outcome metrics selected in this study [10, 12, 15, 21, 25,26,27,28]. For each metric, Table 1 lists the year it was developed, its validity in measuring THA outcomes, and its number of questions. Notably, HOOS-JR was first validated in 2008, after the start of the study period [24]. Likewise, the FJS-12 was validated in 2012 [29]. All other outcome metrics were validated before the start of the study period.

Table 1 Summary of outcome metrics and corresponding outcome tools

The aggregate number of primary THA papers in the literature was determined using the following PubMed search: Hip Arthroplasty [Title/Abstract] AND (“2005/01/01” [Date - Publication] : “2019/12/31” [Date - Publication]) NOT Hemiarthroplasty [Title/Abstract] NOT resurfacing [Title/Abstract]. A PubMed search was performed for each individual outcome metric using the following template, where X is the quality metric searched: Hip Arthroplasty [Title/Abstract] AND (X) [Title/Abstract] AND (“2005/01/01” [Date - Publication] : “2019/12/31” [Date - Publication]) NOT Hemiarthroplasty [Title/Abstract] NOT resurfacing [Title/Abstract]. Titles and abstracts were then manually screened to ensure that each study included in the analysis both related to primary THA and contained the quality metric searched. Review papers, protocols, and articles not primarily related to THA were excluded from the analysis.
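Generating the search template programmatically would make the per-metric queries reproducible. The following Python sketch assembles query strings that mirror the template as written. The helper names (`aggregate_query`, `metric_query`, `esearch_count_url`) are hypothetical, and the metric spellings passed in as X are assumptions rather than the exact terms used in this study. The sketch also builds, without sending, a request URL for NCBI's E-utilities `esearch` endpoint, which can return a hit count for a query. Any such count would still require the manual title and abstract screening described above.

```python
from urllib.parse import urlencode

# Filters shared by every query in the search template (study window 2005-2019).
DATE_FILTER = ('("2005/01/01"[Date - Publication] : '
               '"2019/12/31"[Date - Publication])')
EXCLUSIONS = "NOT Hemiarthroplasty[Title/Abstract] NOT resurfacing[Title/Abstract]"
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def aggregate_query() -> str:
    """Query for all hip arthroplasty papers in the window, minus exclusions."""
    return f"Hip Arthroplasty[Title/Abstract] AND {DATE_FILTER} {EXCLUSIONS}"


def metric_query(metric_term: str) -> str:
    """Per-metric query: the template with X replaced by one metric term."""
    return (f"Hip Arthroplasty[Title/Abstract] AND ({metric_term})[Title/Abstract] "
            f"AND {DATE_FILTER} {EXCLUSIONS}")


def esearch_count_url(query: str) -> str:
    """Build (but do not send) an E-utilities esearch URL requesting only a count."""
    params = {"db": "pubmed", "term": query, "rettype": "count"}
    return f"{ESEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical search terms for three of the metrics.
    for term in ["Harris Hip Score", "WOMAC", "EQ-5D"]:
        print(metric_query(term))
    print(esearch_count_url(aggregate_query()))
```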

The number of papers using each outcome metric per year between January 1, 2005 and December 31, 2019 was tabulated from its specific PubMed search. The number of papers in each subcategory of outcome metrics (joint-specific, joint-agnostic, general health, and quality of life) per year was determined by summing the numbers of papers using each outcome metric within that subcategory (Table 2). The total number of studies reporting outcome metrics per year was obtained by summing the totals across outcome subcategories.
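The summation described above can be written as a small aggregation over per-metric yearly counts. In this Python sketch, the metric-to-subcategory mapping follows the classification given earlier in this section. The function names and the example counts are hypothetical illustrations, not data from the study. A straight sum counts a paper once per metric it reports, so a paper reporting both HHS and WOMAC would contribute two to the joint-specific total.

```python
from collections import defaultdict

# Metric-to-subcategory mapping as classified in Materials and methods.
SUBCATEGORY = {
    "HHS": "joint-specific", "HOOS": "joint-specific", "HOOS-JR": "joint-specific",
    "mHHS": "joint-specific", "OHS": "joint-specific", "WOMAC": "joint-specific",
    "NRS": "joint-agnostic", "UCLA": "joint-agnostic", "VAS": "joint-agnostic",
    "FJS-12": "general health", "PROMIS-10": "general health", "VR-12": "general health",
    "EQ-5D": "quality of life", "SF-12": "quality of life", "SF-36": "quality of life",
    "Patient Satisfaction": "patient satisfaction",
}


def subcategory_totals(metric_counts):
    """Sum {(metric, year): papers} into {(subcategory, year): papers}."""
    totals = defaultdict(int)
    for (metric, year), n_papers in metric_counts.items():
        totals[(SUBCATEGORY[metric], year)] += n_papers
    return dict(totals)


def yearly_totals(sub_totals):
    """Sum {(subcategory, year): papers} into {year: papers}."""
    totals = defaultdict(int)
    for (_subcategory, year), n_papers in sub_totals.items():
        totals[year] += n_papers
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    example = {("HHS", 2019): 10, ("mHHS", 2019): 4,
               ("EQ-5D", 2019): 3, ("Patient Satisfaction", 2019): 2}
    by_subcategory = subcategory_totals(example)
    print(by_subcategory)
    print(yearly_totals(by_subcategory))
```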

Table 2 Utilization of outcome metrics within each subcategory between 2005 and 2019

The percentage of manuscripts using each outcome metric was calculated within each subcategory, and the frequency with which each outcome metric was used each year was also determined. Linear regression analysis was performed to evaluate the fraction of THA papers that included at least one outcome metric. Linear regression was also used to evaluate the fraction of all THA papers that included each outcome subcategory, and the frequency with which each outcome metric within each subcategory was used. In all regression analyses, publication year served as the independent variable. A P < 0.01 indicated a significant change in utilization of an outcome metric over time. This stricter threshold was chosen to adjust for multiple comparisons within the study.
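The trend test described above is an ordinary least-squares regression of a yearly fraction on publication year. The following Python sketch computes the slope, R², and the slope's t statistic using only the standard library. The function name `ols_trend` and the yearly fractions in the usage example are hypothetical, not the study's data. Rather than computing an exact P value, the sketch compares |t| with 3.012, the two-sided critical value of Student's t at α = 0.01 with 13 degrees of freedom (15 yearly points minus two fitted parameters). A statistics package such as `scipy.stats.linregress` would report the P value directly.

```python
import math

# Two-sided critical t for alpha = 0.01 with 13 df (15 years, 2005-2019, minus 2).
T_CRIT_ALPHA01_DF13 = 3.012


def ols_trend(years, fractions):
    """Regress a yearly fraction on year; return (slope, intercept, r_squared, t)."""
    n = len(years)
    if n != len(fractions) or n < 3 or len(set(years)) < 2:
        raise ValueError("need matching sequences with at least 3 points")
    mean_x = sum(years) / n
    mean_y = sum(fractions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in fractions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, fractions))
    if syy == 0:
        # A perfectly flat series has no trend to test.
        return 0.0, mean_y, 0.0, 0.0
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares; clamp tiny negative values from rounding.
    sse = max(syy - slope * sxy, 0.0)
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.copysign(math.inf, slope)
    return slope, intercept, r_squared, t_stat


if __name__ == "__main__":
    # Hypothetical fractions: a rising trend with alternating +/-0.005 noise.
    years = list(range(2005, 2020))
    fractions = [0.15 + 0.01 * (yr - 2005) + (0.005 if (yr - 2005) % 2 == 0 else -0.005)
                 for yr in years]
    slope, intercept, r2, t = ols_trend(years, fractions)
    significant = abs(t) > T_CRIT_ALPHA01_DF13
    print(f"slope={slope:.4f}/year  R^2={r2:.3f}  t={t:.1f}  significant={significant}")
```

Using the t statistic keeps the sketch dependency-free. The fixed critical value is valid only for 15 yearly points, so a different number of years would require a different critical value.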

Results

From January 1, 2005 to December 31, 2019, 14,744 THA studies were published in the English language. Of these, 3736 (25.3%) non-duplicate articles used at least one of the selected outcome metrics and therefore met the inclusion criteria of the study (Fig. 1). The proportion of studies using outcome metrics increased significantly between 2005 and 2019 [78/515 (15.1%) vs. 522/1772 (29.5%), respectively; P < 0.001; R² = 98.1%] (Fig. 2).

Fig. 1 Flowchart of studies selected

Fig. 2 Percentage of hip arthroplasty publications per year utilizing at least one outcome metric

Of the five outcome subcategories considered in this analysis, three demonstrated a significant change in use between 2005 and 2019 (Fig. 3). Use of the joint-specific subcategory decreased significantly from 2005 to 2019 [58/78 (74.4%) vs. 314/522 (60.2%), respectively; P < 0.001]. Joint-agnostic reporting increased significantly from 2005 to 2019 [6/78 (7.7%) vs. 71/522 (13.6%), respectively; P < 0.001], as did reporting in the general health subcategory [0/78 (0%) vs. 17/522 (3.3%), respectively; P < 0.001]. In contrast, quality of life reporting did not change significantly from 2005 to 2019 [5/78 (6.4%) vs. 66/522 (12.6%), respectively; P = 0.607], nor did patient satisfaction reporting [9/78 (11.5%) vs. 54/522 (10.3%), respectively; P = 0.905].
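The subcategory percentages above can be recomputed from the reported counts. This also checks that the five subcategory counts for each year sum to the yearly totals of 78 and 522 studies. In this Python sketch, the 2019 general health count is taken as 17. That value is consistent with both the 17-study denominator used for that subcategory in the individual-metric results and the 522-study total. The function name `subcategory_shares` is hypothetical.

```python
# Studies per subcategory in the first and last years of the study window.
# General health in 2019 is 17, consistent with the 522-study total.
COUNTS = {
    2005: {"joint-specific": 58, "joint-agnostic": 6, "general health": 0,
           "quality of life": 5, "patient satisfaction": 9},
    2019: {"joint-specific": 314, "joint-agnostic": 71, "general health": 17,
           "quality of life": 66, "patient satisfaction": 54},
}
YEAR_TOTALS = {2005: 78, 2019: 522}


def subcategory_shares(year):
    """Percentage of that year's outcome-reporting studies in each subcategory."""
    counts = COUNTS[year]
    total = YEAR_TOTALS[year]
    if sum(counts.values()) != total:
        raise ValueError(f"subcategory counts for {year} do not sum to {total}")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for year in (2005, 2019):
        print(year, subcategory_shares(year))
```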

Fig. 3 Utilization of outcome metrics within each subcategory between 2005 and 2019

Within the joint-specific subcategory, use of HHS significantly decreased from 2005 to 2019 [48/58 (82.8%) vs. 180/314 (57.3%), respectively; P < 0.001], use of HOOS significantly increased from 2005 to 2019 [0/58 (0%) vs. 21/314 (6.7%), respectively; P < 0.001], and use of mHHS significantly increased from 2005 to 2019 [0/58 (0%) vs. 33/314 (10.5%), respectively; P < 0.001] (Fig. 4). Within the joint-agnostic subcategory, none of the three outcome metrics in this study (NRS, UCLA, and VAS) showed a significant change in utilization. Within the general health subcategory, PROMIS-10 demonstrated a significant increase in usage from 2005 to 2019 [0/0 (0%) vs. 9/17 (29.4%), respectively; P < 0.001]. In addition, VR-12 demonstrated a significant increase in usage from 2005 to 2019 [0/0 (0%) vs. 3/17 (17.6%), respectively; P < 0.001]. Lastly, in the quality of life subcategory, EQ-5D demonstrated a significant increase in usage from 2005 to 2019 [0/5 (0%) vs. 23/66 (34.8%), respectively; P < 0.001], while SF-36 exhibited a significant decrease in usage from 2005 to 2019 [5/5 (100%) vs. 18/66 (27.3%), respectively; P = 0.008].

Fig. 4 Changes in utilization of outcome metrics by subcategory between 2005 and 2019. a Individual metric use in the joint-specific subcategory. b Individual metric use in the joint-agnostic subcategory. c Individual metric use in the general health subcategory. d Individual metric use in the quality of life subcategory

Discussion

This review considered five categories of outcome metrics, as well as the individual instruments within each category, from 2005 to 2019. To our knowledge, this is the first study to examine how reporting trends for individual outcome metrics in THA have changed over time. We hypothesized that outcome metric utilization rates would reflect evolving clinician preferences over time, resulting in heterogeneity throughout the literature.

We found that the proportion of publications per year using at least one outcome metric increased significantly between 2005 and 2019. Recent studies have reported that, while the reporting of outcome metrics continues to increase, clinicians' choice of individual instruments is often highly variable [8, 30]. Despite this heterogeneity, the increase in instrument utilization may reflect strategies adopted by both payers and providers to reduce costs and improve patient care. Increased reporting of outcome metrics generates additional data on patient outcomes, but it should be done judiciously, because additional metrics may introduce further heterogeneity into an already complex literature [31]. Indeed, this analysis identified 16 widely used outcome metrics for evaluating THA across five subcategories. This heterogeneity in outcomes reporting may cause unnecessary confusion within the field and may ultimately hinder subsequent research and analysis. A consensus among experts may streamline data reporting and ultimately improve quality of care.

Among the subcategories of outcome metrics, this study found that from 2005 to 2019, joint-agnostic and general health reporting increased significantly, while joint-specific reporting decreased significantly. The decline in the joint-specific category, accompanied by the increase in more patient-centered metrics (e.g., the general health and joint-agnostic categories), may indicate a shift in clinician preference toward monitoring the health of the patient as a whole, rather than only the pain and function of a single joint [1, 6].

Within the joint-specific subcategory, three individual outcome measures changed significantly from 2005 to 2019: use of HOOS and mHHS increased, while use of HHS decreased. Numerous studies have demonstrated the validity of HHS, as well as its ability to produce comparable data across many countries and languages. This may explain why HHS remained the most widely used joint-specific metric despite its decline [32,33,34,35]. As HHS use decreased significantly, mHHS use increased significantly. These results suggest that clinicians may have shifted their preference from HHS to mHHS. This shift is in line with several studies that have recommended mHHS for its greater simplicity, reliability, and reproducibility in a variety of clinical settings [36, 37].

HOOS-JR, OHS, and WOMAC showed no significant change in utilization within the joint-specific subcategory. For HOOS-JR, the lack of change may reflect its recent introduction; it first appeared in our results in 2018. Although OHS has been found to be both reliable and valid, its use did not change significantly from 2005 to 2019 [38]. WOMAC utilization also did not change significantly, although WOMAC was the second most used metric in the joint-specific subcategory, accounting for 16.1% of all joint-specific metric use between 2005 and 2019. This stability may reflect a consistent clinician preference for the metric.

Concerning individual outcome measures within the joint-agnostic subcategory, no individual metric showed a significant change in utilization. While the review found no significant change in utilization rate, use was consistent at 23.1% of all joint-agnostic metrics from 2005 to 2019.

Within the general health subcategory, both PROMIS-10 and VR-12 demonstrated significant increases in usage from 2005 to 2019. PROMIS-10 is a relatively new metric that first appeared within the search parameters in 2018 and was rapidly adopted (29.4% of general health metrics in 2019). PROMIS-10 has been studied more extensively in total knee arthroplasty than in THA [27]. VR-12 can monitor the effects of comorbidities such as depression and smoking on THA outcomes over time, which may explain its increased utilization [39]. Several studies have also shown that VR-12 can provide insight into the complex relationship between the psychological and physical factors affecting patients [39,40,41,42].

Within the quality of life subcategory, EQ-5D use increased significantly from 2005 to 2019, while SF-36 use decreased significantly. SF-12 use did not change significantly over time. Several studies have found that both SF-36 and SF-12 are acceptable quality of life metrics for the majority of patients [10, 30, 43]. Our results may indicate that clinicians tend to prefer shorter quality of life instruments. This preference could explain the decline in SF-36 use, although SF-12 use did not increase significantly [44].

Regarding patient satisfaction as an outcome metric, this review found no significant change in its utilization from 2005 to 2019. Patient satisfaction is difficult to measure because it encompasses the entire patient experience in THA, from the initial communication of preoperative expectations to postoperative follow-up and recovery. Several studies have shown that long-term patient satisfaction after THA correlates most strongly with pain during activity and with lifestyle factors [45, 46]. Despite these findings, reporting of patient satisfaction did not increase significantly. This may be because patient satisfaction is often measured as a component of other outcome metrics in the quality of life and joint-agnostic subcategories.

Similar complexity in outcome reporting exists in several fields of orthopedics. A review of outcome reporting in total knee arthroplasty found considerable complexity in the reporting of outcome measures, which made it difficult to compare outcomes across studies [47]. The current review of reporting trends demonstrates similar complexity in THA. The use of outcome metrics continues to increase, both in frequency of utilization and in the number of individual instruments. We therefore recommend that the THA community form a consensus on the preferred outcome metrics. Such an agreement would allow clinicians and researchers to work together to deliver quality patient care through efficient use of the large volume of THA outcome data generated each year. As the transition from volume-focused to value-focused health care continues, a more comprehensive understanding of these metrics will be necessary. The consensus should address both the most useful subcategories and the most reliable and valid outcome metrics within each subcategory. Such a consensus would improve uniformity in the THA literature and may translate into unified patient management strategies. These changes may ultimately improve quality of care and outcomes for patients undergoing THA. Similar changes could also be implemented in related fields, such as total knee arthroplasty or spinal surgery.

Several limitations must be kept in mind when interpreting the results of this study. First, not every outcome metric was considered. The sheer number of outcome metrics currently used to evaluate THA itself indicates significant heterogeneity within the literature. To address this limitation, we performed a comprehensive literature search before conducting our analyses to identify the outcome metrics most frequently reported in the literature and used by clinicians. This search included analysis of past review articles for relevant outcome metrics, as well as evaluation of articles against the inclusion criteria outlined in Fig. 1. In addition, this study did not incorporate geographic data. Country-level analysis is therefore outside the scope of this paper but may be an important area for future investigation.

Conclusions

Outcome metric reporting in THA has increased over the last 15 years, reflecting a rise in both aggregate outcomes reporting and the number of individual instruments used to evaluate THA. As researchers and clinicians seek to improve patient care, the complexity of outcomes reporting may limit the ability to analyze, compare, and extrapolate study results. We recommend a rigorous evaluation of outcome metrics to enable community-wide consolidation toward the most preferred, valid, and reliable metrics. Such consolidation would permit cross-study comparison and facilitate the development of clinical practice guidelines.