Introduction

The value of patient-reported outcome measures (PROMs) for evaluating outcomes after surgery is gaining recognition, and the need to integrate these into clinical practice is becoming increasingly important [1,2,3,4,5,6]. For surgical interventions such as total knee replacement (TKR), patient-reported outcomes generally include pain, function and health-related quality-of-life (QoL). Patients typically experience a significant improvement in these outcomes within the first year following surgery and the effects tend to plateau in subsequent years [7,8,9,10,11]. Although this pattern of recovery is well-known, it is unclear if it can be universally applied because up to 20% of TKR patients do not gain clinically meaningful improvement following surgery [12, 13]. Recent research on longer-term functional outcomes identified a subgroup with delayed functional gains [14], which indicates that longer-term recovery patterns, and consequently the effectiveness gained from surgery, differ considerably across patients. Therefore, it is important to better understand the longer-term implications of TKR, particularly in patients who experience poorer outcomes, and to determine whether these patients can be identified early to optimize their outcomes.

There is growing evidence that QoL is an important predictor of outcomes such as complications, hospitalisation and mortality [15,16,17,18]. This suggests that a better understanding of patients’ QoL trajectories can reveal important information on disease progression and outcomes. QoL PROMs have the additional benefit of capturing the necessary information for cost-effectiveness analysis allowing decision makers to compare the value of health interventions and prioritize resource allocation. Further, associating patterns of QoL with patient characteristics may help identify groups for whom TKR may be of higher or lower value. This can help facilitate the rational deployment of TKR to those who stand to benefit the most while targeting others for more appropriate alternative interventions or management strategies. This is important as healthcare systems are transitioning from volume- to value-based health care as a means of improving sustainability of the healthcare system whilst also optimizing patient outcomes and experience [19, 20]. This is particularly relevant for surgical interventions such as TKR, which are performed in high volumes annually and are associated with considerable health care costs, amounting to $11.8 billion in 2014 in the US alone [21].

In this study, we aimed to identify unique QoL trajectory groups for TKR patients from routinely collected PROMs, demonstrate the distinct variations in health gains and explore the individual characteristics related to group membership, using a rich data source with 5 years of QoL data. By quantifying health gains using quality-adjusted life years (QALYs), a commonly used outcome in economic evaluations such as cost-effectiveness analyses, we demonstrate how QoL trajectories can feasibly be translated into valuable information for guiding patient-centered care. This will also provide a better understanding of the value of surgery across different trajectory groups.

Material and methods

Data

The St. Vincent’s Melbourne Arthroplasty Outcomes (SMART) Registry prospectively captures clinical and patient-reported outcomes in all patients undergoing elective hip and knee replacement at the study institution in Melbourne, Australia. The study institution is a tertiary referral centre for joint replacement surgeries and receives state-wide referrals. Registry data collection started in 1998 and, to date, the registry has recorded over 10,000 procedures, with approximately 800 registered annually [11, 22, 23]. This dataset is well suited to answering the research question regarding longer-term trajectories as at least 5 years of annual follow-up data are available. The study cohort comprised patients who had TKR between January 1, 2006 and December 31, 2011. Individuals were excluded if they had a missing baseline QoL score, had no QoL scores at any subsequent time point, underwent early revision or died within 2 years of surgery. Our analysis therefore included patients with at least two QoL scores. For individuals who underwent bilateral knee surgery during the study period, only the most recent TKR was included in the analysis. Sensitivity analyses were conducted to assess the effect of our exclusion criteria.

Baseline data on patients were prospectively collected and included socio-demographic and patient characteristics such as age, gender, body mass index (BMI) and smoking status, and co-morbidity measures such as the Charlson Co-morbidity Index (CCI) and the American Society of Anesthesiologists (ASA) Physical Status Classification. Cultural and linguistic diversity was measured via the need for an interpreter, socioeconomic status was measured via the Socio-Economic Indexes for Areas (SEIFA) [24] and a geographical accessibility index (ARIA+) [25] reflected rurality. Clinical variables included bilateral knee surgery, the Kellgren-Lawrence scale [26] describing radiographic severity of osteoarthritis and the Knee Society Score (KSS) [27] subscales for pain and function.

Quality-of-life measurements

Patients completed the 12-item Short Form Survey (SF-12) prior to surgery and annually post-operatively. Baseline and annual QoL scores up to 5 years post-surgery were considered for analysis. SF-12 responses were transformed into utility values using a published algorithm [28]. A utility value is a general index of wellbeing used for economic evaluation where 1 is equivalent to ‘full health’ and 0 is equivalent to being ‘dead’ with scoring algorithms based on public preferences for health states.

Statistical analysis

Latent class trajectory analysis

Latent class growth analysis (LCGA) was used to identify subgroups of patients according to their trajectory of QoL (described using utility values) pre-surgery and up to 5 years following TKR. LCGA is a semi-parametric technique used to classify distinct subgroups that follow a similar pattern of change over time and is hence appropriate for analyzing longitudinal data [29]. This means that patients exhibiting similar patterns of QoL are grouped together, forming a set of homogeneous classes. LCGA is able to accommodate missing data such that patients with missing QoL values at several time points are not excluded from the analysis, thus minimizing the exclusion of patients [30, 31].

Identifying trajectory groups

As the number of potential trajectories is unknown, a series of models considering 1 to 8 classes were estimated. The censored normal model was selected as the most appropriate for the available data. The Bayesian Information Criterion (BIC) is a commonly used criterion to assess model fit, where higher (less negative) BIC values indicate better model fit [32]. The choice of optimal model was guided by a combination of factors including our research objective, goodness-of-fit statistics such as Akaike’s Information Criterion (AIC), model interpretability and posterior group-membership probability diagnostics [31, 32]. The latter set of diagnostics included ensuring that all groups displayed average group posterior probabilities above 0.7 [29] and that the odds of correct classification (OCC) were greater than 5 [32]. Patients were assigned to the trajectory for which they had the highest posterior probability of membership.
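The posterior-probability diagnostics described above can be illustrated with a short sketch. The Python code below is not the Traj plugin's implementation. It assumes a hypothetical matrix of posterior membership probabilities (one row per patient, one column per class) and, as a simplifying assumption, estimates each class proportion pi_j as the mean posterior probability for that class. It then computes the average posterior probability (AvePP) for each class and the odds of correct classification using the usual definition, OCC_j = [AvePP_j / (1 - AvePP_j)] / [pi_j / (1 - pi_j)].

```python
import numpy as np

def classification_diagnostics(posterior):
    """Return modal class assignments and per-class (AvePP, OCC).

    posterior: array of shape (n_patients, n_classes); each row holds one
    patient's posterior probabilities of class membership (hypothetical input).
    """
    posterior = np.asarray(posterior, dtype=float)
    # Assign each patient to the class with the highest posterior probability.
    assigned = posterior.argmax(axis=1)
    diagnostics = []
    for j in range(posterior.shape[1]):
        # AvePP: mean posterior probability of class j among patients assigned to it.
        ave_pp = posterior[assigned == j, j].mean()
        # Assumption: class proportion estimated as the mean posterior probability.
        pi_j = posterior[:, j].mean()
        # OCC: odds of correct classification relative to the odds implied by pi_j.
        occ = (ave_pp / (1.0 - ave_pp)) / (pi_j / (1.0 - pi_j))
        diagnostics.append((ave_pp, occ))
    return assigned, diagnostics

# Toy data, invented for illustration only (not study data).
toy_posterior = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
assigned, diagnostics = classification_diagnostics(toy_posterior)
for j, (ave_pp, occ) in enumerate(diagnostics, start=1):
    meets = ave_pp > 0.7 and occ > 5
    print(f"Class {j}: AvePP={ave_pp:.2f}, OCC={occ:.2f}, meets thresholds={meets}")
```

In this sketch a class passes when its AvePP exceeds 0.7 and its OCC exceeds 5, mirroring the thresholds stated in the text. A class with no assigned patients would yield an undefined AvePP and would need separate handling.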

Estimating QALYs

The quality-adjusted life-year (QALY) is a common metric used to measure health benefit, and incremental QALYs are of interest in economic analysis to quantify the value of interventions [33]. QALYs for each QoL trajectory were calculated from patient-level utility values using the area under the curve method [34]. To quantify the effectiveness (health benefit) gained from the intervention, QALYs gained (incremental QALYs) were calculated for each patient, assuming that the patient's utility would have remained unchanged from baseline had the patient not undergone TKR [10, 35,36,37,38].
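As an illustration of the area under the curve calculation, the following Python sketch applies the trapezoidal rule to one patient's utility values and subtracts a counterfactual in which utility stays at its baseline value for the whole follow-up period. The function names and the example utilities are hypothetical, and the sketch assumes complete data at every time point; it does not show how missing utility values were handled in the study.

```python
import numpy as np

def total_qalys(years, utilities):
    """Area under the utility curve by the trapezoidal rule."""
    years = np.asarray(years, dtype=float)
    utilities = np.asarray(utilities, dtype=float)
    interval_lengths = np.diff(years)                      # time between assessments
    mean_utilities = (utilities[1:] + utilities[:-1]) / 2  # average utility per interval
    return float(np.sum(interval_lengths * mean_utilities))

def incremental_qalys(years, utilities):
    """QALYs gained relative to 'no change from baseline utility'."""
    years = np.asarray(years, dtype=float)
    baseline_utility = float(utilities[0])
    # Counterfactual: baseline utility held constant over the follow-up period.
    counterfactual = baseline_utility * (years[-1] - years[0])
    return total_qalys(years, utilities) - counterfactual

# Invented example: baseline (year 0) plus five annual assessments.
years = [0, 1, 2, 3, 4, 5]
utilities = [0.56, 0.70, 0.72, 0.72, 0.71, 0.70]
print(f"Total QALYs: {total_qalys(years, utilities):.2f}")
print(f"Incremental QALYs: {incremental_qalys(years, utilities):.2f}")
```

For this invented patient the counterfactual is 0.56 x 5 = 2.80 QALYs, so any area above that line counts as health gained from surgery under the stated assumption.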

Multinomial logistic regression analysis

Based on assigned trajectories, multinomial logistic regression analysis weighted by probability of class membership was performed to examine the association between trajectory groups and baseline patient characteristics. The multivariable model included variables identified as potentially important discriminators of class membership in the univariable multinomial logistic regression analyses (those displaying associations at P < 0.10). Tests for collinearity were conducted, with a variance inflation factor (VIF) greater than 10 and a tolerance of less than 0.1 considered to indicate the presence of multi-collinearity. The trajectory group with the highest incremental QALYs was used as the reference category against which other trajectory groups were compared. All analyses were conducted using Stata SE14 (StataCorp, College Station, TX, USA), employing the Traj plugin for LCGA.
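The two collinearity thresholds are equivalent because tolerance is the reciprocal of VIF: for covariate j, tolerance_j = 1 - R_j^2 and VIF_j = 1 / tolerance_j, where R_j^2 comes from regressing covariate j on the remaining covariates. The Python sketch below illustrates this calculation with ordinary least squares; it is not the Stata routine used in the study, and the function name and toy design matrix are hypothetical.

```python
import numpy as np

def vif_and_tolerance(covariates):
    """Return a list of (VIF, tolerance) pairs, one per covariate column."""
    X = np.asarray(covariates, dtype=float)
    n_rows, n_cols = X.shape
    results = []
    for j in range(n_cols):
        target = X[:, j]
        # Regress covariate j on an intercept plus all other covariates.
        design = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coefficients
        # Tolerance = 1 - R^2 = residual variance / total variance.
        tolerance = residuals.var() / target.var()
        results.append((1.0 / tolerance, tolerance))
    return results

# Invented example with two uncorrelated covariates (not study data).
toy_covariates = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
for j, (vif, tolerance) in enumerate(vif_and_tolerance(toy_covariates), start=1):
    flagged = vif > 10 or tolerance < 0.1
    print(f"Covariate {j}: VIF={vif:.2f}, tolerance={tolerance:.2f}, collinear={flagged}")
```

Uncorrelated covariates give a VIF and tolerance of 1, whereas a covariate that is almost a linear combination of the others drives the VIF above 10 and the tolerance below 0.1.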

Results

Study population

A total of 1553 TKR patients were included in the analysis after 339 cases were excluded for the following reasons: missing baseline QoL (n = 3), no follow-up QoL (n = 36), early death (n = 14), early revision (n = 32) and bilateral surgeries where the most recent surgery was already included (n = 254). Table 1 displays the baseline characteristics of the patients, who were on average 70.1 years old (SD, 8.5), 67.4% of whom were female, and who had a mean pre-operative QoL utility of 0.56 (SD, 0.11). Of those included, complete QoL data at baseline and across all 5 years were available for 1218 patients (78%).

Table 1 Baseline characteristics

Model selection

The model with six classes was chosen to balance model parsimony against adequate identification of distinct QoL patterns. This choice demonstrates the heterogeneity within the cohort and provides insights into longer-term QoL outcomes and the potential value of surgical intervention across different patient groups. The 6-class model produced six distinct QoL trajectories (Fig. 1) and met all diagnostic criteria. The average probability of membership for the allocated class ranged between 0.78 and 0.85, and all classes displayed OCC above the minimum value of 5 (full results can be found in Supplementary Material Tables S1 and S2). The addition of excluded individuals in the sensitivity analysis produced similar results.

Fig. 1

QoL trajectory profiles and class membership for six-class model (coloured figure). Traj 6—High baseline, moderate sustained improvement (6%). Traj 5—Low baseline, large sustained improvement (19%). Traj 4—Low baseline, large unsustained improvement (18%). Traj 3—Low baseline, moderate improving (8%). Traj 2—Low baseline, moderate sustained improvement (31%). Traj 1—Low baseline, small sustained improvement (18%)

Characterization of classes

The trajectories were characterized by 3 main phases: pre-surgery, post-operative improvement (period between pre-surgery and Year 1) and maintenance (after Year 1). Table 2 provides a description of each trajectory, together with the total and incremental QALYs gained over the 5-year period.

Table 2 Description of each QoL trajectory by phase and estimated QALYs over 5 years

Total QALYs of the trajectory with lowest QoL (Trajectory 1) was 2.62 (SD, 0.19) and the number of QALYs increased with higher utility values for subsequent trajectories. In terms of effectiveness gained from TKR, incremental QALYs were lowest for Trajectory 1 (0.16 (SD, 0.35)) and greatest for Trajectory 5 (1.42 (SD, 0.40)). Although patients in Trajectory 6 had the greatest number of QALYs, estimated incremental gain from TKR was small at 0.39 (SD, 0.37) compared to most other trajectories.

Characterization of patients across trajectory groups

Baseline patient socio-demographic and clinical characteristics were compared across the 6 QoL trajectories and are provided in Supplementary Material Table S3. Patient characteristics differed across trajectories. The mean age of patients in Trajectories 3 and 5 was lower than in other trajectories. There was a higher proportion of females in trajectories reporting poorer QoL. Trajectory 1 had the largest proportion of patients reporting severe baseline pain (71.3%) and lowest baseline KSS function (26.7 (SD, 19.8)) compared to others.

Although there was a low chance of collinearity (tolerance range between 0.79 and 0.98; mean VIF = 1.14) when all variables were included, to achieve a parsimonious model, only patient characteristics displaying associations of P < 0.10 in the univariable regression models (Supplementary Material Table S4) were included in the final multivariable model. These were age, gender, BMI, interpreter, CCI, ASA, rurality, and baseline KSS pain and function. Results from the multivariable multinomial logistic regression are presented in Table 3 (Supplementary Material Figure S1), showing the relative risk ratio of belonging to the respective trajectory for each patient characteristic.

Table 3 Multivariable multinomial logistic regression showing relative risk ratios (RRR) of belonging to each of the trajectory groups compared to Trajectory 5 (highest incremental QALYs/health gains)

Compared to patients with the greatest incremental QALYs (Trajectory 5), patients with the lowest gains from TKR (Trajectory 1) were more likely to have co-morbidities, have a high ASA score, need an interpreter and report a lower KSS function score (poorer mobility), and were less likely to live in a rural area. Patients with moderate, sustained gains (Trajectory 2) were more likely to be older, be female, require an interpreter, have co-morbidities and report a lower KSS function score, and were less likely to live in a rural area or to report severe pain, compared to those with large gains (Trajectory 5). Patients exhibiting slow progressive improvement (Trajectory 3) were more likely to have co-morbidities and to report mild rather than moderate/severe pain compared to those whose improvement peaked earlier (Trajectory 5). Compared to Trajectory 5, patients in Trajectory 4 were more likely to be older and have co-morbidities, and less likely to report moderate/severe pain. Patients consistently reporting high QoL (Trajectory 6) were more likely to be older, less likely to report moderate/severe pain and more likely to report a higher KSS function score compared to Trajectory 5. A summary of these findings is presented in Table 4.

Table 4 Patient characteristics associated with each of the trajectories compared to Trajectory 5 (highest incremental QALYs/health gains)

Discussion

Using latent class growth analysis, we identified 6 distinct QoL trajectories indicating the presence of significant heterogeneity in QoL outcomes among TKR patients. Although most patients exhibited a trajectory profile that is commonly reported in the literature (improvement within 1 year followed by a plateau), the key difference observed in this study is that gains following surgery varied across patients and not all patients maintained their improvement. This highlights that patients will not universally achieve the large QoL improvement following TKR that is commonly reported in the literature. Trajectory 5 (large sustained improvement after surgery) was identified as the most positive QoL trajectory, with the greatest gain in QALYs. However, only 18.4% of the patients were classified in this trajectory; they were likely to be younger, have no co-morbidities and report greater pre-surgery pain than most patients in other QoL trajectories.

While much research has focused on identifying potential risk factors and integrating these to improve medical decision making, associating patient-reported outcomes such as QoL with these patient characteristics can facilitate delivery of individualized health care as it allows patient engagement in shared decision making to help optimize outcomes [20, 39]. The unique QoL trajectories identified in this study clearly show variations in the benefits of TKR, both at one year post-surgery and in the longer term, as well as the combination of patient characteristics associated with each trajectory. Whilst there are limitations in employing the current findings to deterministically identify the patient subgroup most likely to have poor outcomes, knowledge of the combination of characteristics (Table 4) that predisposes a patient to trajectories with poor health gains (for example, Trajectories 1, 2 and 6) can be useful in anticipating possible outcomes and mitigating such risks. This may include managing pre-surgery expectations [40], personalizing self-management plans [41], careful planning in managing co-morbidities to optimize patients prior to surgery [42] and tailoring pre-surgery management through mindfulness training to maximize outcomes in these patients [43].

There is also potential to use this information to improve post-surgical management to optimize care. Correlating patient characteristics with patient-reported QoL responses can help clinicians track progress and identify patients who are unlikely to obtain the maximum effectiveness from the treatment; for example, elderly female patients with moderate pain pre-surgery who consistently report low QoL may not benefit fully from the standard prescribed post-surgical management and may require an individualized approach. This gives both the patients and providers opportunity to engage and plan follow-up consultations based on goals and expectations for physical [44,45,46] or mental health [43] therapies to improve outcomes. Recognizing the variability in health trajectories could also enable patients to have realistic expectations, to better understand their clinical course and facilitate discussions with their surgeons [1]. This allows for the opportunity to tailor the evolving care post-surgery on an as-needed basis. While understanding the patient characteristics associated with these trajectories is important, it is acknowledged that beyond these characteristics, psychological factors such as pain-related beliefs and psychological distress can also influence TKR outcomes [47,48,49] and should be considered alongside.

To date, trajectory analyses of TKR patients have mostly focused on pain and function trajectories and have also demonstrated heterogeneity within the TKR population, commonly identifying a subgroup with poor pain and/or function outcomes comprising between 14 and 23% of the study cohort [11, 50, 51]. While it is unclear if patients with a low QoL trajectory (Trajectory 1) were non-responders or those reporting poor pain/function outcomes after surgery, some similarities in the characteristics of these patients were observed. Patients in Trajectory 1 had higher BMI and were more likely to have co-morbidities, report severe pain and have low mental and physical well-being (Supplementary Material Table S3), which is consistent with the predictors of non-response [22] or poor pain and function outcomes [11, 50]. For these patients, the prescribed standard surgical treatment and follow-up plans are unlikely to be adequate, thus resulting in poor patient outcomes and low-value care. Therefore, by maximizing the use of PROMs to better understand potential QoL trajectories, clinicians can be better informed on how to manage subgroups with these characteristics and can assign patients to a more appropriate level of surveillance and better supportive care or alternative rehabilitation programs, optimizing the outcomes of those who are truly experiencing low QoL in the long term after surgery.

This study showed that improvement in QoL following surgery was greatest among younger patients (Trajectory 5 and, albeit over a longer period, Trajectory 3). Although the observed associations were statistically significant, they were relatively weak, and this could be due to the small number of patients under the age of 60 (approximately 12% of the sample). Historically, younger patients have been considered less appropriate candidates than elderly patients due to their higher risk of revision [52]. This is likely related to the duration of prosthesis survivorship and higher levels of activity among younger patients [53]. While revision risk is an important consideration, the current study provides additional insights. It may be useful for clinical practice to consider the potential benefits and value to be gained from the intervention when making surgical recommendations, particularly in younger patients [54]. Post-marketing surveillance and advances in technology have led to improvements in prosthesis survivorship, which has now reached 90% at 20 years and even 82% at 25 years [55]. Therefore, having to wait for advanced age to be suitable for surgery may represent a missed opportunity to improve an individual’s well-being and labor force productivity.

As PROMs including QoL are increasingly recognized as an important consideration in clinical care, it is important that these are routinely captured pre- and post-surgery using relevant tools to evaluate the effectiveness and value of intervening [56]. These findings also reinforce the need to encourage PROMs collection beyond the one-year post-surgery mark as delayed improvers (Traj 3) or diverging trends (e.g. Traj 4 and 5) can be indicators of sub-optimal care. Beyond routine collection of PROMs, consideration also needs to be given to integrating these into shared decision-making tools and identifying suitable approaches to implement these in practice to better guide clinical care and improve the value of surgery. Additionally, risk stratification is an important approach in advancing research [57]; thus, the ability to identify homogeneous subgroups based on a combination of characteristics within a heterogeneous cohort can be useful in selecting the right patients for trials of novel interventions, allowing for a more targeted approach.

Because of the rapidly growing rates of utilization and large costs associated with TKR, the judicious use of scarce healthcare resources is ever more important to ensure sustainability for health insurers and health systems. Further, the appropriateness of the surgery in selected patients has also been called into question, with studies showing that up to one-third of TKRs were deemed to be inappropriate procedures [58, 59]. Therefore, it is important to target those for whom we can maximize outcomes and improve value of care. We found that patients reporting good QoL prior to surgery (Trajectory 6) were among those with small gains. Though it is uncertain if these patients have merely adapted to their condition and hence report higher levels of QoL than others with the same condition [60], it may be important to understand the rationale for surgical intervention in these patients. While TKR is widely regarded as a cost-effective procedure in general, this raises the question of whether TKR is necessarily cost-effective for all patients. Some groups of patients may require additional care and place additional demands on healthcare services to improve their outcomes. This may be relevant to patients exhibiting poor long-term QoL outcomes with small gains in health benefit such as those in Trajectories 1, 2 and 6, which in combination account for a significant proportion (55%) of the cohort. Therefore, further research to quantify the healthcare needs and assess the cost-effectiveness across these sub-groups would be helpful in understanding the true value of surgery amongst the heterogeneous group of TKR patients.

Limitations

Several limitations should be considered when interpreting these results. The generalizability of the findings could be limited as patients were from a single center. However, the demographics of patients in this study closely reflect those reported in our National Joint Replacement Registry [61]. It is acknowledged that changes to modifiable characteristics such as comorbidity over time can affect QoL trajectories [62]. However, it is difficult to ascertain the extent of this in the current study unless such information is also captured over time. QoL assessments can be subject to biases known as response shifts, where patients change the way they evaluate themselves and respond to surveys over time [63]. While studies have shown that changes in health outcomes were underestimated when response shifts were not accounted for in TKR patients [64, 65], another has shown that adjusting for large response shifts did not change the authors’ clinical interpretation of the results [66]. In the context of our study, where all patients were surveyed in the same manner across time, response shift is unlikely to change the conclusions drawn from our analysis. It is noted that our assumption of no change from baseline made in the calculation of incremental QALYs may result in an overestimation of QALYs as a result of regression to the mean [67]. Conversely, deterioration in QoL due to aging or absence of surgery may also result in an underestimation. The application of our assumption follows published economic evaluations [10, 34, 35, 37, 38]. Variables such as co-morbidity, ASA, KL scores and socio-economic indicators (SEIFA) were dichotomized to avoid small cell sizes, which could reduce the sensitivity of our analysis.

Conclusion

There is strong evidence indicating important heterogeneity in QoL trajectories in TKR patients, resulting in variable gains in QoL and QALYs across different trajectory groups. This indicates that not all patients benefit from the surgical procedure in the same way. With the growing recognition of the need to support patient-centered care, PROMs may be particularly useful when employed alongside patient characteristics for tracking and guiding clinical care to maximize patient outcomes and for justifying the costs of surgical intervention. Future research should focus on identifying approaches for implementing PROMs in clinical practice.