Introduction

In recent years, patient-reported outcome measures (PROMs) have received significantly more attention as a tool to measure the outcome of total knee arthroplasty (TKA) [1]. This patient-centric philosophy uses patient satisfaction as a metric of quality of care, as opposed to the traditional physical outcome measurements. The relation between these subjective and objective measurement tools would seem self-evident, but earlier results have been inconclusive [2,3,4,5].

The main impetus for this paradigm shift is the understanding that the concerns and priorities of patients and surgeons may differ [5, 6]. Despite good scores in the traditional outcome measurement tools developed in the eighties [7,8,9,10], only 68–93% of patients report satisfaction after TKA [6, 11,12,13,14,15,16]. Considering this discrepancy, the current research project is focused on pinpointing the underlying causes of dissatisfaction. Whilst PROMs are deemed relevant by policy makers, registries and the general public, the relation between PROMs and objectively measured functional tests remains to be demonstrated by the orthopaedic community.

The aim of this paper is twofold: (1) to analyze whether functional outcomes differ between patients with high and low PROMs after TKA, and (2) if so, to determine whether the PROMs can be predicted by the improvement in (a) the range of motion (ROM), (b) the quadriceps force, (c) the sit-to-stand test (STS) and (d) the 6-min walk test (6MWT). This research investigates the relation between these objective outcomes and subjective ‘patient satisfaction’ by setting thresholds for these functional tests. This would help to establish the improvement in physical outcome needed to achieve patient satisfaction after TKA.

Materials and methods

After obtaining institutional review board approval (EC number B670201419601), informed consent was obtained from 77 patients planned for primary TKA at our center. These patients were subsequently evaluated preoperatively and followed prospectively. Twenty patients refused to come back for the postoperative testing at 6 months for personal reasons. Fifty-seven patients (37 female, 20 male) were evaluated preoperatively and at 6 months postoperatively. Mean age of the patients was 64 years (SD 9.0) with a mean BMI of 31 (SD 5.7).

Variables and outcome measures

Maximum flexion and flexion contracture were measured by a single investigator using a goniometer to the nearest 5°, with the patient in supine position. ROM was calculated by subtracting flexion contracture from maximum flexion. Jakobsen et al. demonstrated that even a change of knee joint ROM less than 5° could be detected by experienced testers using a standardized protocol, with very good intra-tester reliability (ICC 0.89–0.97) and inter-tester reliability (ICC 0.92–0.96) [17].
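The ROM calculation described above can be sketched as follows. This is a minimal illustration only; the function names and the example angles are hypothetical and are not patient data from this study.

```python
def range_of_motion(max_flexion_deg, flexion_contracture_deg):
    """ROM as defined in the text: maximum flexion minus flexion contracture."""
    return max_flexion_deg - flexion_contracture_deg


def rom_improvement(pre_rom_deg, post_rom_deg):
    """Change in ROM from the preoperative to the postoperative measurement."""
    return post_rom_deg - pre_rom_deg


# Hypothetical example values (degrees), recorded to the nearest 5 degrees.
pre_rom = range_of_motion(max_flexion_deg=105, flexion_contracture_deg=10)
post_rom = range_of_motion(max_flexion_deg=115, flexion_contracture_deg=5)
improvement = rom_improvement(pre_rom, post_rom)
print(pre_rom, post_rom, improvement)
```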

Quadriceps and hamstring force were measured using maximum voluntary isometric contraction. This was tested with a hand-held dynamometer (Microfed Handheld Dynamometer®, HHD) by a single investigator [18]. Quadriceps force was measured with the patient sitting down holding their leg in 60° of flexion. The handheld dynamometer was placed on the distal anterior third of the lower leg. Subsequently, the patient was asked to sustain maximal force during extension. The hamstring force was measured in a prone position. The leg was positioned in 30° flexion, whilst the handheld dynamometer was placed on the distal posterior third of the lower leg. Three trials were performed on each side, and the maximum voluntary force generated normalized to the participant’s weight was used to quantify quadriceps and hamstrings strength.
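As an illustration of the strength measure described above, the following sketch takes the best of three trials and normalizes it to body weight. The function name, the units (newtons and kilograms) and the example readings are assumptions made for illustration; they are not taken from the study data.

```python
def normalized_peak_force(trial_forces_n, body_weight_kg):
    """Maximum voluntary force over the trials, divided by body weight.

    Units (N and kg) are an assumption; the text only states that the
    maximum force was normalized to the participant's weight.
    """
    if len(trial_forces_n) == 0:
        raise ValueError("at least one trial is required")
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return max(trial_forces_n) / body_weight_kg


# Hypothetical dynamometer readings for three trials on one side.
quadriceps_trials = [250.0, 300.0, 280.0]
print(normalized_peak_force(quadriceps_trials, body_weight_kg=75.0))
```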

To evaluate the sit-to-stand (STS) test, patients were requested to sit on a standard chair without armrests. Their feet were placed comfortably on the floor with knees flexed slightly more than 90°. Patients were then asked to stand up to a fully upright standing position five times as quickly as possible without using their hands (arms folded across the chest). Timing with a stopwatch started on “start” (after a countdown from 3) and ended on the fifth stand. This test is easy to perform in clinical practice and has shown excellent intra- and interrater reliability (ICC 0.89) in patients with hip or knee OA. It was also found to accurately predict disability across populations [19].

The procedure for the 6MWT was based on published guidelines [6]. A 40-m long section of the hallway was demarcated for this test. The participant was instructed to walk as far as possible for 6 min, up and down the demarcated path, pivoting to turn at the end of each lap. Timing commenced as the participant stepped over the start line. Standardized encouragement was given to the patient after each minute. The patient was instructed to stop at 6 min, or earlier if they were unable to complete 6 min, and to maintain their position whilst the assessor measured the final partial lap with a validated measurement tool. High repeatability of this test has been established in knee osteoarthritis patients (ICC 0.94) [20].
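The total walking distance follows from the number of completed course lengths plus the measured final partial length. The sketch below assumes that one "lap" corresponds to one 40-m length of the hallway, which the text does not state explicitly; the function name and the example values are hypothetical.

```python
COURSE_LENGTH_M = 40.0  # length of the demarcated hallway section (from the text)


def six_minute_walk_distance(completed_lengths, final_partial_m):
    """Total distance: completed 40-m lengths plus the final partial length."""
    if completed_lengths < 0:
        raise ValueError("completed_lengths must be non-negative")
    if not 0 <= final_partial_m <= COURSE_LENGTH_M:
        raise ValueError("partial length must lie within the course length")
    return completed_lengths * COURSE_LENGTH_M + final_partial_m


# Hypothetical example: 10 completed lengths and 12.5 m of the last one.
print(six_minute_walk_distance(completed_lengths=10, final_partial_m=12.5))
```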

Three different PROM tools were used. (1) The knee injury and osteoarthritis outcome score (KOOS) consists of 42 knee-related items, with each item scored from 0 to 4. Five subscales of symptoms, pain, activities of daily living (ADL), sports and quality of life (QOL) are converted to 100 points. Decreasing scores indicate worsening of knee symptoms [9, 21]. Since a low completion rate of the sports subscale was found in our study, this subscale was excluded from the analysis. (2) The satisfaction subscore of the 2011 Knee Society Score (KSS) consists of five questions (40 points). The higher the score, the better the outcome [22, 23]. (3) The Oxford Knee Score (OKS) consists of twelve questions assessed on a Likert scale with values from 0 to 4. A summative score is then calculated, where 48 is the best possible score (least symptomatic) and 0 is the worst possible score (most symptomatic) [21]. The Oxford Knee Score was only evaluated at 6 months postoperatively.

Dutch translations of all these scores have been validated [8, 21, 23] for use in Dutch-speaking countries or regions, including Flanders, where this study was conducted.

Statistical analysis

Statistical analysis was performed using SAS 9.4 (SAS Institute Inc, Cary, NC, USA) and SPSS version 22 (IBM Corp., Chicago, IL, USA).

A uniform agglomerative two-step cluster methodology was applied to identify groups of patients with the same characteristics. The input variables were the PROMs: KOOS (Pain, Symptoms, ADL and QOL), OKS and KSS (Satisfaction). The number of clusters was specified to be 2 using Schwarz’s Bayesian Criterion (BIC).

Overall goodness-of-fit of the clusters was evaluated using the silhouette coefficient. A silhouette measure of less than 0.2 was classified as poor, between 0.2 and 0.5 as fair and more than 0.5 as good solution quality [24]. A good solution quality (0.6) was found in this study. Predictor importance (PI) was highest for KSS Satisfaction (PI = 1.00), followed by KOOS Symptoms (PI = 0.75), KOOS Pain (PI = 0.66), KOOS QOL (PI = 0.62) and KOOS ADL (PI = 0.28). Patients in each cluster were more similar to each other than to patients in the other cluster.
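The quality bands for the silhouette measure can be written down as a short sketch. The per-patient formula shown, s = (b - a) / max(a, b), is the standard textbook definition of the silhouette value and is an assumption here; the statistical package used in this study may compute its silhouette measure of cohesion and separation somewhat differently. Function names and the example distances are hypothetical.

```python
def silhouette_value(a, b):
    """Standard silhouette value for one observation.

    a: mean distance to the other members of its own cluster
    b: mean distance to the members of the nearest other cluster
    """
    return (b - a) / max(a, b)


def solution_quality(silhouette):
    """Quality bands as stated in the text: <0.2 poor, 0.2-0.5 fair, >0.5 good."""
    if silhouette < 0.2:
        return "poor"
    if silhouette <= 0.5:
        return "fair"
    return "good"


# The average silhouette reported in this study was 0.6.
print(solution_quality(0.6))
# Hypothetical distances for a single observation.
print(silhouette_value(a=2.0, b=5.0))
```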

After cluster definition, independent t tests were used to compare the between-group differences and paired t tests were used to evaluate the within-group differences.

ROC-curve analysis was used to identify thresholds for the functional outcomes that discriminated between the patient satisfaction groups defined by the aforementioned PROMs.

Youden’s index (the point where sensitivity + specificity - 1 is highest) was used to define the cut-off. Maximizing this index allowed us to find an optimal cut-off point independently of the prevalence. These cut-off values were used to dichotomize the improvement in functional outcomes. The dichotomized parameters were then used to build a multiple logistic regression model that estimates the odds ratio of being in the cluster with high PROMs (cluster 1).
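The cut-off selection by Youden’s index can be illustrated with a small sketch. It is not the SAS or SPSS procedure used in this study: the function name, the rule that a patient counts as positive when the improvement is greater than or equal to the cut-off, and the example data are all assumptions made for illustration.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing J.

    J = sensitivity + specificity - 1. A patient is called positive when
    value >= cutoff; labels are 1 for cluster 1 (high PROMs) and 0 otherwise.
    Both classes must be present in labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        # Keep the first cut-off that reaches the highest J.
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical improvements in 6MWT distance (m) and cluster membership.
improvements = [10, 20, 30, 45, 50, 55, 60, 80, 90, 120]
in_cluster_1 = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(improvements, in_cluster_1)
print(cutoff, j, sens, spec)

# Dichotomize the improvement at the selected cut-off for use in a regression.
dichotomized = [1 if v >= cutoff else 0 for v in improvements]
```

Because sensitivity and specificity are each computed within one class, the selected cut-off does not depend on how many patients fall in each cluster, which is the prevalence independence mentioned above.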

Because extensive studies of this kind generally involve relatively small study groups, the sample sizes of comparable studies were reviewed, and a minimum of 50 patients was targeted [5, 25, 26].

Results

Two-step cluster analysis based on the postoperative KOOS, OKS and KSS satisfaction subscore led to the formation of two clusters. Cluster 1 contained 45 patients with high PROM scores, whilst cluster 2 contained 12 patients with low to medium PROM scores (silhouette coefficient = 0.6). The cluster specific demographic information, functional scores and postoperative PROMs are listed in Tables 1, 2 and 3.

Table 1 Cluster specific demographics
Table 2 Preoperative, postoperative and difference between pre-and postoperative for all functional variables
Table 3 Preoperative, postoperative and difference between pre-and postoperative for all PROMs

Preoperatively, the normalized quadriceps force and the KOOS ADL subscore were significantly higher in cluster 1 (cluster with high PROM scores). All other preoperative functional tests and preoperative PROMs were similar between the two clusters (Tables 2, 3).

Postoperatively, all PROM subscores as well as all functional outcomes, except 6MWT, were significantly higher in cluster 1 (Tables 2, 3).

A significant improvement in PROM scores and functional outcome scores was found in cluster 1, but not in cluster 2 (Tables 2, 3).

Cut-off values for each functional outcome improvement were based on the Youden point of the ROC curve (Table 4; Fig. 1). These cut-off values were used to dichotomize the functional outcomes, and odds ratios were evaluated (Table 4). Backward logistic regression with all dichotomized functional outcomes allowed us to retain a significant model containing ROM and 6MWT improvement. Because simple logistic regression of the demographics showed near-significance for gender, this variable was added to the model, with the chance of being in the cluster with high PROMs being higher for male patients (OR 8.0, CI 0.95–67.69, p = 0.055, AUC 0.67).

Table 4 Cut-off points for improvement of functional outcomes with odds ratio, 95% confidence intervals (CI), p values and area under the curve (AUC)
Fig. 1 ROC curves of all continuous functional outcomes to define cut-off values using Youden’s point

The final multiple logistic regression model consisted of gender, improvement of ROM ≥ 5° and improvement of 6MWT ≥ 50 m. This model was statistically significant (p < 0.01) with a sensitivity of 91.1% and a specificity of 75%. Odds ratios were 10.85 (95% CI 1.69–69.56) for improvement of 6MWT, 2.56 (95% CI 0.38–17.06) for ROM and 13.17 (95% CI 1.06–163.83) for gender (Fig. 2).
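The reported sensitivity and specificity can be checked against the cluster sizes (45 patients in cluster 1 and 12 in cluster 2). The counts used below (41 of 45 and 9 of 12 correctly classified) are inferred from the reported percentages and are not stated in the source, so they should be read as an assumption consistent with those figures rather than as reported data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Inferred counts (assumption): cluster 1 has 45 patients, cluster 2 has 12.
sens, spec = sensitivity_specificity(tp=41, fn=4, tn=9, fp=3)
print(f"sensitivity {sens:.1%}, specificity {spec:.0%}")
```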

Fig. 2 ROC curve of the logistic regression model based on improvement of range of motion ≥ 5°, 6-min walk test ≥ 50 m and gender

Discussion

The most important finding of the present study was the establishment of cut-off values for improvement of ROM and 6MWT, enabling us to predict the classification into clusters based on PROMs. This is, to our knowledge, the first study using cluster analysis to group different PROMs into clusters that represent satisfaction levels. Although this is a common technique in psychology, genetics, microbiology and marketing, it can also be applied to subjective outcomes after TKA, as these represent certain patient profiles [27, 28]. The clusters proposed in this study represent one group of patients with good to excellent PROMs and another group with poorer results. Patients in each cluster are more similar to each other than to those in the other cluster. The significant between-cluster differences found for all PROMs imply a good execution of the two-step cluster analysis [29].

This between-cluster difference was also observed for all postoperative functional outcomes. The cluster with the high PROM scores consistently included the patients with better ROM and quadriceps force, further walking distance and faster execution of the STS test. Furthermore, patients in this cluster experienced a significant improvement for all PROMs and functional outcomes, compared to the patients in cluster 2. Considering these differences, a relation between the PROMs and the functional outcomes can be postulated, especially with regard to the difference in improvement between the clusters.

Besides the postoperative difference between the clusters, patients in the cluster with high postoperative PROMs also exhibited significantly stronger quadriceps strength and a higher KOOS ADL subscore preoperatively. However, a different gender distribution between the clusters may play a role, with only one male in the cluster with low PROMs. Previous studies have reported that men undergoing TKA have significantly higher physical function scores than women [20, 30, 31]. Additionally, higher preoperative quadriceps force was shown to result in decreased postoperative pain and improved quality of life [32]. Therefore, pre- and postoperative quadriceps training should probably be recommended to improve patient satisfaction.

Since the two clusters exhibited a distinct difference in functional improvement between the pre- and postoperative setting, we investigated whether cut-off values would be appropriate to predict the cluster allocation.

Considering the 6MWT, patients who improved by 50 m or more were more than 8 times more likely to have high PROMs and thus to be satisfied with the result of their TKA surgery. This cut-off value is similar to the results of Naylor et al., who described the threshold of improvement for the 6MWT, at 26 weeks post TKA, as between 26 and 55 m [33]. While they used a 7-point transition scale of improvement, we used the aforementioned clusters to define the threshold. Ko et al. analyzed the absolute walking distance and found that the WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) was significantly predicted by the 6MWT [23]. In our study, the absolute walking distance did not differ between the two clusters. Therefore, we recommend focusing on the improvement in walking distance, as this reflects the postoperative progression.

Patients with a postoperative increase in ROM of 5° or more were six times more likely to be classified in the ‘satisfied cluster’. While patients in this cluster gained 12° on average, the ROM in the ‘less satisfied cluster’ barely increased by 2°. We acknowledge that it is difficult to accurately measure these small increments in ROM, but Jakobsen et al. demonstrated that a change in knee joint ROM of less than 5° can be detected by experienced testers using a standardized protocol, with very good intra-tester reliability (ICC 0.89–0.97) and inter-tester reliability (ICC 0.92–0.96) [17].

The influence of ROM on patient satisfaction after TKA, which is indicated in our results, is in contradiction with the study by Miner et al. [4]. They observed that patient satisfaction and quality of life are driven by the overall functional status and not influenced by the ROM of the index knee. Again, our findings suggest the improvement of ROM is the important factor associated with satisfaction.

Even though improvement in the STS test was not retained as a significant predictor, the ‘satisfied cluster’ had a significantly faster STS performance time and also a significantly greater improvement than the ‘less satisfied cluster’. The cut-off value for the improvement of the STS test of 1.05 s is associated with a three times higher likelihood of allocation to the cluster with high PROM scores. Until now, clear STS-test cut-off values for TKA patients have not been defined. Estimated values for normal performance for community-dwelling elderly, older than 60, are found to be 11.4–14.8 s [34]. While all patients in this study were community-dwelling, the recorded STS values were slower (cluster 1: 15 s and cluster 2: 20 s on average). The relatively short follow-up of 6 months may account for this finding.

Another important parameter correlated with functional performance [35, 36] is the normalized quadriceps force. Although it was not a significant predictor of cluster allocation, an increase of 1.5 N/BMI or more represented an odds ratio for high PROM scores of 2.5. Better quadriceps strength has been observed to increase the walking distance in the 6MWT and improve the STS-test time [25, 35, 37]. As these functional parameters are interrelated, the 6MWT may reflect the quadriceps force.

The thresholds for increase in ROM and 6MWT were established as significant predictors and were also retained by the backward logistic regression model. Since an unequal gender distribution was observed, the model was adjusted for gender. The final model, with a sensitivity of 91% and a specificity of 75%, was therefore considered a clinically relevant prediction tool. Male patients improving by more than 50 m on the 6MWT and by more than 5° ROM were 6–8 times more likely to be classified in the ‘satisfied cluster’.

There are some limitations to the present study. First, the follow-up of 6 months is relatively short and further improvement could occur later on. However, a steep clinical improvement is usually only noticed between 3 and 6 months after TKA, followed by a significantly slower, more gradual improvement up to 2 years postoperatively [38].

A second limitation is related to the floor and ceiling effects of the PROMs. Ceiling effects of the KOOS pain subscale have been reported to be 15% at 6 months and 22% at 12 months [39]. Since ceiling effects occur when a majority of the patients reach the top scores, further improvements in outcome may remain undetected. However, as the cut-off for floor and/or ceiling effects should not exceed 15%, the use of the KOOS at 6 months does not pose a problem [19]. Previous studies showed no occurrences of floor or ceiling effects of the KSS at 6 months after TKA [40], which makes the KSS satisfaction subscore a strong evaluation tool for outcome after TKA.

A third limitation is the relatively small sample size. Although there were statistically significant differences within the data, larger patient groups would permit more substantial comparisons. However, the nature of these exhaustive studies generally leads to relatively small study groups, as reflected in the significant number of drop-outs after the initial testing. Nevertheless, the sample size of this research was comparable to other published studies evaluating functional outcomes [5, 25, 26].

Finally, we did not correct the model for age. However, Bourne et al. could only demonstrate a weak correlation for poorer satisfaction with advancing age. Robertson et al. did not find any correlation between satisfaction and age [2].

The new cut-off values for improvement of ROM and 6MWT established in this study could be helpful in clinical practice. These tests and their cut-off values could be used as markers for patient satisfaction. In patients who are not likely to achieve the cut-off for satisfaction, rehabilitation should be intensified. This could help patients to reach their functional goals and hopefully lead to a higher satisfaction with the TKA. Whilst a correlation between functional outcomes and PROMs seems intuitive, earlier results have been inconclusive [2,3,4,5]. This study, however, provides new cut-off values that are a useful addition to the PROMs in the evaluation of TKA patient populations.

Conclusions

The current study established cut-off values for the improvement of ROM and 6MWT 6 months after TKA, predicting the classification into clusters based on PROMs. To increase the likelihood of being satisfied, patients should be able to walk 50 m further during the 6MWT and increase their ROM by 5° or more as compared to the preoperative situation. This model has a sensitivity of 91% and a specificity of 75%, and can easily be applied in clinical practice.