Introduction

In 2016, the Comprehensive Care for Joint Replacement (CJR) bundled payment model incentivized the voluntary use of patient-reported outcome measures (PROMs) by hospitals following total hip (THA) and knee arthroplasty (TKA) [1]. As the Centers for Medicare & Medicaid Services (CMS) works to contain costs associated with these two most commonly billed Medicare inpatient surgical procedures, incorporation of PROMs in the bundled payment model represents a shift from a “fee-for-service” to a “pay-for-performance” model of reimbursement [2, 3]. Utilization of and research on PROMs have increased accordingly and can be expected to continue to grow as payment models transition to mandated reporting of these metrics.

Recent studies have reported methods facilitating relevant and cost-effective data collection to report on short-term outcomes after orthopaedic procedures [4, 5, 6]. The arbitrary 2-year follow-up rule enforced by many orthopaedic journals in reporting clinical outcomes after total joint arthroplasty has been questioned. The International Society of Arthroplasty Registries Working Group’s “best practices” recommendations for reporting PROMs, including data collection immediately before and up to 1 year post-operatively with a 60% threshold for acceptable follow-up, have been supported [7]. Furthermore, as large prospective studies and national registries are relied on to provide sample sizes that adequately power clinical outcomes studies, a minimum 2-year follow-up is cost prohibitive in this patient population.

The purpose of this study was to test for differences between PROMs collected at 1 and 2 years following THA and TKA, using a prospective data outcome collection system at a single, large, academic healthcare system. We hypothesized PROMs obtained at 1 year would not be significantly different from those obtained at 2 years post-operatively.

Methods

Patients

Details regarding our institution’s data collection system (referred to as CC-OME) have been previously reported [8, 9]. Patients enrolled in CC-OME who underwent THA or TKA from July of 2015 to June of 2016 were included in this study. CC-OME utilizes the Research Electronic Data Capture (REDCap) database for data storage and HIPAA compliance [8]. Clinical, demographic, and PROMs data were collected at 1 and 2 years post-operatively. The THA and TKA cohorts were analyzed and reported separately. The average response time for assessments was 417 days (± 57) for 1 year and 898 days (± 105) for 2 years.

There were n = 486 THAs performed during the study time period. Exclusion criteria encompassed revision, contralateral arthroplasty, death, inability to complete PROMs questionnaires, or missing pre- and/or post-operative data. For the THA cohort, n = 469 patients were enrolled and baseline PROMs were collected (Fig. 1a). Of these, n = 12 patients underwent revision, n = 13 had a contralateral arthroplasty, and n = 1 patient died prior to 1-year follow-up and were excluded. N = 145 patients (30.9%) were lost to follow-up at 1-year, leaving n = 298 patients in the THA cohort with 1 year PROMs. Of these patients, n = 7 patients underwent revision, n = 5 had a contralateral arthroplasty, and n = 1 patient died prior to 2-year follow-up and were further excluded. Between 1 and 2 years n = 80 patients were lost to follow-up, leaving n = 205 patients with responses for both 1- and 2-year follow-up time points.

Fig. 1

a STROBE diagram for THA cohort. b STROBE diagram for TKA cohort

There were n = 419 TKAs performed at the institution during the study period (July 2015–June 2016). The TKA cohort included n = 414 patients from whom baseline PROMs were collected (Fig. 1b). Of these, n = 10 underwent revision, n = 25 had contralateral arthroplasty, and n = 4 patients died prior to the 1-year follow-up and were excluded. An additional n = 135 patients were lost to follow-up at 1-year (32.6%), leaving n = 240 in the TKA cohort with 1 year PROMs. Of these patients, n = 3 had revision, n = 7 had contralateral arthroplasty, and n = 3 patients died prior to 2-year follow-up and were further excluded. Between 1 and 2 years n = 53 patients were lost to follow-up, leaving n = 174 patients with both 1- and 2-year follow-up responses.
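The cohort attrition described above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts stated in the text; the names `Stage`, `remaining`, `attrition`, and `loss_rate` are illustrative and are not part of the CC-OME system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Patients removed from the cohort before one follow-up time point."""
    revision: int
    contralateral: int
    died: int
    lost: int  # lost to follow-up


def remaining(start: int, stage: Stage) -> int:
    """Patients left after exclusions and loss to follow-up at one stage."""
    return start - stage.revision - stage.contralateral - stage.died - stage.lost


def attrition(baseline: int, year1: Stage, year2: Stage) -> tuple:
    """Return (n with 1-year PROMs, n with both 1- and 2-year PROMs)."""
    n_year1 = remaining(baseline, year1)
    n_year2 = remaining(n_year1, year2)
    return n_year1, n_year2


def loss_rate(lost: int, baseline: int) -> float:
    """Percentage of the baseline cohort lost to follow-up, to one decimal."""
    return round(100 * lost / baseline, 1)


# Counts taken from the text for each cohort.
tha = attrition(469, Stage(12, 13, 1, 145), Stage(7, 5, 1, 80))
tka = attrition(414, Stage(10, 25, 4, 135), Stage(3, 7, 3, 53))

print(tha, loss_rate(145, 469))  # (298, 205) 30.9
print(tka, loss_rate(135, 414))  # (240, 174) 32.6
```

The results agree with the reported cohort sizes (205 THA and 174 TKA patients with both follow-up responses) and with the reported 1-year loss to follow-up percentages.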

PROMs

PROMs selected for this study were Veterans RAND 12-Item Health Survey (VR-12) scores for both THA and TKA cohorts [10, 11], the Hip Disability and Osteoarthritis Outcome Score (HOOS) [12] Pain subscale and the HOOS-Physical Function Short-form (PS) for the THA cohort, and the Knee Injury and Osteoarthritis Outcome Score (KOOS) Pain subscale and the KOOS-PS for the TKA cohort [13]. The VR-12 is a health-related quality of life assessment which encompasses eight scales (vitality, physical functioning, bodily pain, general health perceptions, physical role functioning, emotional role functioning, social role functioning and mental health) and two summary measures: Physical Component Summary (PCS) and Mental Component Summary (MCS) [14]. The VR-12 is included in the Medicare Health Outcomes Survey, which has sampled Medicare beneficiaries since 1998 [15]. The VR-12 has been specifically recommended by the CMS as a validated, non-proprietary and relatively short instrument, and is included in the CJR bundled payment model [16].

With regard to disease-specific instruments, the HOOS pain subscale was utilized for the THA cohort given the extensive validation of the HOOS questionnaire for patients with hip osteoarthritis [17]. Currently, the HOOS questionnaire is the only CMS-recommended hip-specific outcome measure [18]. Similarly, for the TKA cohort, the KOOS pain subscale was utilized as it has been extensively validated for knee osteoarthritis [19, 20] and is the only knee-specific CMS-recommended outcome measure, particularly the pain subscale [16]. Regarding measures of function, the HOOS and KOOS physical function short-forms (PS) were utilized in lieu of the full HOOS activities of daily living subscale and KOOS sports and recreation subscale to decrease respondent burden and eliminate redundant items. The HOOS-PS and KOOS-PS are validated shorter versions of the HOOS activities of daily living subscale and the KOOS sports and recreation subscale and have demonstrated similar construct validity and responsiveness [13].

To address the potential selection bias created by the loss of follow-up in both THA and TKA cohorts between baseline and 1-year follow-up (Tables 1 and 2), and again between baseline and 2-year follow-up (Tables 3 and 4), demographic and clinical variables as well as PROMs were compared between patients who were lost to follow-up (i.e., non-responders) and those who were not lost to follow-up (i.e., responders).

Table 1 Comparison of preoperative total hip arthroplasty patient demographics, characteristics, and patient-reported outcomes for responders and non-responders at 1 year post-operative
Table 2 Comparison of preoperative total knee arthroplasty patient demographics, characteristics, and patient-reported outcomes for responders and non-responders at 1 year post-operative
Table 3 THA Summary by Cohort, Year 2 vs All Lost to FU
Table 4 TKA Summary by Cohort, year 2 vs all lost to FU

Statistics

Analyses were done separately for THA and TKA outcomes. T tests were used to analyze continuous variables and Pearson Chi-square tests to analyze categorical variables. Patients were sorted into groups according to year-to-year change. If a score increased by the minimal clinically important difference (MCID) (Table 5) [6, 17, 21] or more from year 1 to year 2, the patient was assigned to the “Better” category. If a score decreased by the MCID or more, the patient was assigned to the “Worse” category; changes in score smaller than the MCID were considered “No Change.” Paired t tests were used to test for differences between year 1 and year 2 values. Multivariate Imputation by Chained Equations (MICE), via the R package mice, was used to fill in outcome data for patients lost to follow-up at either 1 or 2 years, and paired t test results for the multiply imputed dataset were presented alongside the t test results for the available data. Rubin’s rules were used to pool the test results across the multiply imputed datasets.
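The categorization rule described above can be sketched as follows. This is an illustration only: the function name `classify_change`, the example scores, and the MCID of 10 points are hypothetical placeholders (the MCID values actually used are those in Table 5), and the sketch follows the rule as stated in the text, in which an increase in score counts as improvement.

```python
def classify_change(year1: float, year2: float, mcid: float) -> str:
    """Assign a year-to-year change category using the MCID as the threshold."""
    delta = year2 - year1
    if delta >= mcid:
        return "Better"      # increased by the MCID or more
    if delta <= -mcid:
        return "Worse"       # decreased by the MCID or more
    return "No Change"       # absolute change smaller than the MCID


# Hypothetical (year 1, year 2) score pairs and a hypothetical MCID of 10 points.
example_pairs = [(80.0, 92.0), (90.0, 78.0), (85.0, 88.0)]
categories = [classify_change(y1, y2, mcid=10.0) for y1, y2 in example_pairs]
print(categories)  # ['Better', 'Worse', 'No Change']
```

A change exactly equal to the MCID is placed in “Better” or “Worse”, matching the “by the MCID or more” wording.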

Table 5 Patient-reported outcome measure changes from 1 year to 2 years post-operative

Continuous variables were summarized with mean and standard deviation (SD), and categorical variables were summarized with frequency (%). The analysis was done in R using the “Hmisc”, “tidyverse”, “rms”, “compareGroups”, and “equivalence” packages. P values less than 0.05 were considered statistically significant.
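For readers unfamiliar with pooling across multiply imputed datasets, the sketch below shows the standard form of Rubin’s rules applied to a paired mean difference. It is not the authors’ R code: the function names and the example inputs are hypothetical, the degrees of freedom are the classical large-sample version, and statistical software may apply a small-sample adjustment instead.

```python
import math
from statistics import mean, variance


def paired_mean_difference(year1, year2):
    """Return the mean paired difference and its squared standard error."""
    diffs = [b - a for a, b in zip(year1, year2)]
    return mean(diffs), variance(diffs) / len(diffs)


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance
    total = u_bar + (1 + 1 / m) * b   # total variance
    if b == 0:
        df = math.inf                 # no between-imputation variability
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return {
        "estimate": q_bar,
        "total_variance": total,
        "df": df,
        "t": q_bar / math.sqrt(total),
    }


# Hypothetical per-imputation results (three imputed datasets).
pooled = pool_rubin(estimates=[1.0, 2.0, 3.0], variances=[1.0, 1.0, 1.0])
print(pooled)
```

In this layout, `paired_mean_difference` would be run once per imputed dataset and its outputs passed to `pool_rubin`; the pooled t statistic is then compared with a t distribution on the returned degrees of freedom.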

Results

THA cohort

Mean HOOS pain scores at 1 and 2 years were 90.6 (13.3) and 90.0 (14.7), respectively. Most patients (n = 179; 87.3%) had unchanged or improved HOOS pain at 2 years post-operatively compared with the 1-year measures (Table 5), and the mean scores showed no statistically significant difference (p = 0.445) (Table 6). Similarly, no significant differences were found between 1 and 2 years for HOOS-PS scores (p = 0.265), VR-12 PCS (p = 0.239), and VR-12 MCS scores (p = 0.342) (Table 6). Baseline comparisons between responders and non-responders at 1 year demonstrated that non-responders tended to have fewer years of education (p = 0.003), lower HOOS pain (p = 0.044), and lower VR-12 MCS scores (p = 0.001) (Table 1). Comparisons between responders and non-responders at 2 years again showed that non-responders had fewer years of education (p = 0.001) and lower HOOS pain (p = 0.007) at baseline (Table 3).

Table 6 Hypothesis test results

TKA cohort

Mean KOOS pain scores at 1 and 2 years were 83.8 (16.8) and 85.0 (18.5), respectively. Most patients (n = 145; 83.3%) had unchanged or improved KOOS pain at 2 years post-operatively compared with the 1-year measures (Table 5), with no statistically significant difference (p = 0.242) (Table 6). Similarly, no significant differences were found between 1 and 2 years for KOOS-PS scores (p = 0.088), VR-12 PCS (p = 0.275), and VR-12 MCS scores (p = 0.075) (Table 6). Baseline comparison between responders and non-responders at 1 year showed that the non-responder group had a higher percentage of patients who were nonwhite (p = 0.001), higher BMI (p = 0.038), less education (p = 0.018), and higher KOOS-PS (p = 0.004) (Table 2). The comparison at 2 years revealed that the non-responder group again had a higher proportion of patients of nonwhite ethnicity (p < 0.001), fewer years of education (p < 0.001), higher baseline KOOS-PS (p = 0.001) and lower baseline VR-12 MCS (p = 0.001) (Table 4).

Discussion

This prospective observational study longitudinally assessed PROMs after primary THA and TKA at 1 and 2 years post-operatively. The main finding of the study was that no statistically significant differences were found between 1 and 2 years in any of the subscales analyzed (HOOS pain, HOOS-PS, KOOS pain, KOOS-PS, VR-12 PCS, VR-12 MCS). The majority of patients experienced no change, or improvement, in joint-specific PROMs at 2 years compared to the 1-year measures, indicating that little additional information is gained from the added time and cost invested in the subsequent patient follow-up year.

The high cost associated with conducting large prospective studies sufficiently powered to identify individual risk factors related to any type of orthopaedic surgery has led to major deficiencies in identifying modifiable predictors of outcomes. A recent systematic review of PROMs collection in the setting of orthopaedic procedures revealed that only four surgeries are represented with a minimum of 1000 cases: THA, TKA, anterior cruciate ligament (ACL) reconstruction and hip fracture surgery [8]. Strategies for lowering costs and improving the efficiency and reproducibility of clinical outcomes studies are, therefore, vital to improving value in healthcare. In accordance with a recently reported meta-analysis [4], the findings of the present study suggest that PROMs for pain and function following primary THA and TKA did not change between the 1- and 2-year follow-up marks. For short-term outcome studies in which PROMs are the primary outcome variable, our data suggest that there might not be additional value in collecting VR-12, HOOS and KOOS Pain and PS scores beyond the 1-year time point. For studies in which implant survivorship is the primary outcome variable, longer term follow-up would still be required and even the arbitrary minimum follow-up mark of 2 years might not be sufficient. With the development of evidence-based approaches in clinical research, minimum follow-up periods required by journals will likely be determined by the outcome (PROMs, implant failure, infection rates, revision risk, among others) for which the study is powered.

In a meta-analysis including six THA and nine TKA outcome studies, Ramkumar et al. [4] tested for equivalence between PROMs collected at 1 and 2 years post-operatively. No statistically significant difference was found at these two time points for the four most commonly reported PROMs: the Harris Hip Score (p = 0.22), Short Form (SF) scores (including SF-12, SF-36 and SF-6D) (p = 0.94), the Western Ontario and McMaster Universities index (WOMAC) (p = 0.49), and the Knee Society Score (p = 0.13) [4]. That meta-analysis, however, included mostly level III and IV data, which attests to the paucity of higher quality prospective cohort studies reporting on changes in PROMs between 1 and 2 years after THA and TKA.

Loss to follow-up was a major concern in this study. A similar study comparing outcomes of ACL reconstruction at 1 and 2 years utilizing the Swedish National Ligament Register had a 70% loss to follow-up rate [5], which is inherent to large registry database studies. To address the potential selection bias related to this in our study, we compared baseline characteristics of patients who were lost to follow-up with those of patients who were not lost to follow-up between baseline and 1 year, and 1 and 2 years, and reported the differences (Tables 1, 2 and 3, 4). While statistically significant, these differences in PROMs were not clinically significant. The baseline differences seen in HOOS pain, KOOS-PS, and VR-12 MCS between responders and non-responders at 2 years were below the MCID for each specific score and, therefore, not clinically relevant. Two other studies have reported no differences, or small differences below the minimal clinically detectable change, in KOOS between responders and non-responders at 1- and 2-year follow-up [22, 23], which supports minimal selection bias in the present study. If the lost to follow-up population were to increase with later time points and if this population were associated with poorer outcomes, results could be improperly skewed to demonstrate improvement when there is none, thus highlighting the need for an evidence-based minimum follow-up period. A ceiling effect may also have contributed to the absence of statistical difference between PROMs obtained at 1 and 2 years post-operatively. While there are very few reports of a ceiling effect for either HOOS or KOOS [17, 21], this is a limitation common to all studies including these PROMs and would not affect the conclusions drawn from our findings.

Significant differences were found in education level between those who were lost to follow-up and those who were not between baseline and 1 year, and between 1 and 2 years (Tables 1, 2 and 3, 4). As follow-up involves patients filling out extensive questionnaires, a limitation may exist in that patients may find these questionnaires too demanding and be unwilling to participate. Another limitation of this study is the relatively small sample size, which was not large enough to reliably analyze the subset of patients who either improved or worsened between the 1- and 2-year time points. This information would be valuable in identifying predictors of functional decline after 1 year and would enable clinicians to detect patients who would benefit from early follow-up and intervention. It should also be noted that the results presented here apply only to the PROMs included, and do not apply to complications, survivorship, etc. Lastly, while there was no statistical difference in VR-12 PCS and MCS between 1 and 2 years post-operatively, a larger percentage of patients in both the THA and TKA cohorts experienced worsening of those scores compared to the joint-specific outcome measures. This finding indicates that general health measures may not necessarily correlate with hip and knee function between years 1 and 2 post-operatively, and future, larger studies may be required to appropriately assess how these measures fluctuate after THA and TKA.

Clinical outcomes research is paramount in driving treatment decisions and allocating healthcare resources appropriately [24]. Identifying modifiable patient-related risk factors pre-operatively can improve patient care, optimize functional outcomes, and decrease healthcare costs. Minimum follow-up periods for clinical outcomes studies should be tailored to the specific primary outcome variables of interest. Furthermore, these periods should be established based on how specific outcome variables fluctuate over time following intervention. For studies in which PROMs are the primary outcome variable in patients who did not undergo revision surgery, our study suggests that a minimum follow-up period of 1 year is adequate, as no significant differences were found from 2-year measures. Additional advantages of using PROMs at 1 year include lower rates of loss to follow-up compared to 2 years, as well as reduced cost of implementation, which is more sustainable for healthcare providers.