Introduction

Knee osteoarthritis (KOA) constitutes a significant portion, approximately four-fifths, of the global osteoarthritis burden [1]. Among individuals aged 40 years and above, the global prevalence of KOA is reported to be 22.9% [2], representing a substantial disease burden. For end-stage unicompartmental KOA, unicompartmental knee arthroplasty (UKA) emerges as a crucial treatment option, offering advantages such as a short surgery time, quick recovery, and preservation of knee proprioception [3].

The alignment of implants is critical to the success of UKA and is closely linked to aseptic loosening, polyethylene wear, and the progression of KOA in the contralateral compartment, consequently influencing the revision rate [4]. The manual cutting guide used in conventional UKA (C-UKA) is the most common option, but it relies on subjective assessment of anatomical landmarks and manual positioning. In recent years, the use of patient-specific instruments (PSI) and robotic systems in UKA has increased to achieve more precise implant alignment. PSI is a personalized osteotomy guide based on a three-dimensional model of the knee derived from preoperative magnetic resonance imaging (MRI) or computed tomography (CT) [5]. Robotic systems, on the other hand, provide feedback on tracking and ligament balance, and robotic arms assist surgeons in executing predetermined bone resection plans within tactile boundaries [6].

However, uncertainty remains regarding whether PSI-assisted UKA (P-UKA) and robot-assisted UKA (R-UKA) can improve the accuracy of implant alignment. Furthermore, it is unclear whether the potential positional benefits outweigh the time and economic costs associated with these new technologies, and whether they lead to improvements in patient-reported outcome measures (PROMs). Notably, current studies have primarily compared outcomes between R-UKA and C-UKA or between P-UKA and C-UKA, and direct comparisons of the advantages and disadvantages of P-UKA and R-UKA are lacking.

Therefore, this study aims to compare the imaging and functional outcomes of the three osteotomy guides through Bayesian network meta-analysis (NMA), exclusively including randomized controlled trials (RCTs) with evidence level I.

Methods

This systematic review and NMA was performed according to Cochrane Guidelines [7], and was in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [8] (Appendix 1). The research protocol was registered with PROSPERO.

Search strategy

A comprehensive search strategy was developed following a pre-search, and subsequent modifications were made based on the search results. Finally, we searched five electronic databases [PubMed, Embase, Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science, and Scopus] and major orthopedic journals (Bone & Joint Journal, Bone & Joint Research, Clinical Orthopaedics and Related Research, Journal of Arthroplasty, Journal of Bone and Joint Surgery American Volume, and Knee Surgery Sports Traumatology Arthroscopy) up to September 24, 2023. The search used a combination of MeSH terms and free-text terms. The specific search strategy for each electronic database is shown in Appendix 2. We also screened the reference lists of systematic reviews and meta-analyses for additional relevant references.

Inclusion and exclusion criteria

Included studies met the following criteria: (1) medial UKA due to KOA; (2) involved at least two of the three interventions, not limited to two-arm RCTs; (3) human studies; (4) reported at least one of the primary or secondary outcomes of interest; (5) published in English. Excluded studies comprised single-arm studies, abstracts, letters to the editor, commentaries, reviews, systematic reviews, meta-analyses, conference articles, case reports, model studies, and cadaveric studies.

When the same author or team had published multiple articles on the same topic, we included only the most recent one. If these articles differed in study population or outcomes, all relevant studies were included in the NMA.

Two independent reviewers screened titles, abstracts, and full texts for study eligibility. Disagreements were resolved by a third reviewer, and, if needed, the corresponding author of the original article was consulted for clarification.

Data extraction

Data extraction was independently performed by two reviewers, with disagreement resolved by a third reviewer.

Standardized criteria were established by pre-extracting data from three randomly selected studies. The purpose-designed data extraction form included: year, author, country, type of comparison, sample size (in knees), patient characteristics, follow-up time, PSI imaging device, robotic system, UKA prosthesis type, funding, and the primary and secondary outcomes of interest.

Primary outcomes included the deviation angle of the hip-knee-ankle angle (HKAA) and the deviation angles of the femoral and tibial components in the coronal and sagittal planes. Secondary outcomes included surgery time, blood loss, length of hospitalization, revision rate, complication rate, and various PROMs. Definitions of primary and secondary outcomes are provided in Appendix 3.

Mean and standard deviation (SD) were collected for continuous variables, and frequencies were collected for categorical variables. When outcome data did not meet the requirements, we used approximation formulas (Appendix 4) to estimate mean or SD. Efforts were made to contact corresponding authors for missing data.
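The formulas in Appendix 4 are not reproduced in this text, so the exact approximations applied are not known here. Purely as an illustration, the following sketch shows two widely used approximations for recovering a mean and SD from a median with interquartile range, or from a median with range. The function names, and the choice of these particular formulas, are assumptions and may differ from those in Appendix 4.

```python
def mean_sd_from_median_iqr(q1, median, q3):
    """Approximate mean and SD from a median and interquartile range.

    Assumption: large-sample approximation for roughly normal data,
    mean ~ (q1 + median + q3) / 3 and SD ~ (q3 - q1) / 1.35.
    """
    mean = (q1 + median + q3) / 3.0
    sd = (q3 - q1) / 1.35
    return mean, sd


def mean_sd_from_median_range(minimum, median, maximum):
    """Approximate mean and SD from a median and range.

    Assumption: mean ~ (min + 2 * median + max) / 4 and SD ~ range / 4,
    a rule of thumb intended for moderate sample sizes.
    """
    mean = (minimum + 2.0 * median + maximum) / 4.0
    sd = (maximum - minimum) / 4.0
    return mean, sd


# Hypothetical input values, not taken from any included study.
print(mean_sd_from_median_iqr(4.0, 10.0, 16.0))
print(mean_sd_from_median_range(0.0, 10.0, 20.0))
```

Such approximations assume an approximately symmetric distribution, so estimates for strongly skewed outcomes should be interpreted with caution.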

Quality assessment

The Cochrane risk-of-bias tool [9] was used for risk of bias assessment, covering six domains with seven items: selection bias (random sequence generation, allocation concealment), performance bias, detection bias, attrition bias, reporting bias, and other bias. The risk of bias was categorized as low, high, or unclear for each item. We drew the risk of bias graph and risk of bias summary based on the quality assessment results.

The quality of evidence for primary outcomes was assessed using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) framework [10]. The GRADE framework contains five factors: study limitation (risk of bias), indirectness, inconsistency, imprecision, and publication bias. For RCTs, the initial quality is high, and the evidence is downgraded by one level for each factor that is met, down to a minimum of very low.
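The downgrading rule described above can be sketched as a small function. This is a minimal illustration under the one-level-per-factor rule stated in the text; the names `GRADE_LEVELS`, `GRADE_FACTORS`, and `grade_rating` are hypothetical and not part of any GRADE software.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]
GRADE_FACTORS = (
    "study limitation",
    "indirectness",
    "inconsistency",
    "imprecision",
    "publication bias",
)


def grade_rating(concerns):
    """Return the evidence level for RCT evidence.

    `concerns` maps a GRADE factor name to True when that factor is met.
    Evidence starts at "high" and drops one level per factor met,
    with "very low" as the floor.
    """
    unknown = set(concerns) - set(GRADE_FACTORS)
    if unknown:
        raise ValueError(f"Unknown GRADE factors: {sorted(unknown)}")
    downgrades = sum(1 for factor in GRADE_FACTORS if concerns.get(factor, False))
    return GRADE_LEVELS[min(downgrades, len(GRADE_LEVELS) - 1)]


# Hypothetical example: concerns about risk of bias and imprecision.
print(grade_rating({"study limitation": True, "imprecision": True}))
```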

Quality assessment was conducted independently by two reviewers, with a third reviewer resolving any inconsistencies.

Statistical analysis

The unit of analysis was the knee. Categorical outcomes were compared using risk ratios (RR) and 95% confidence intervals (CI), while continuous outcomes were compared using mean differences (MD) and 95% CI. We used a random effects model to conduct a meta-analysis of direct comparisons among studies and quantified between-study heterogeneity using the I² statistic.
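For readers unfamiliar with these quantities, the sketch below computes a study-level MD and log RR with standard errors, then pools study effects under a random effects model and reports I². It assumes the DerSimonian-Laird estimator for the between-study variance, which is one common choice and is not confirmed as the estimator used in this analysis. All function names and input values are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference between two arms and its standard error."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, se


def log_risk_ratio(events1, n1, events2, n2):
    """Log risk ratio and its standard error (requires non-zero event counts)."""
    log_rr = math.log((events1 / n1) / (events2 / n2))
    se = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    return log_rr, se


def random_effects_pool(effects, ses):
    """Pool study effects (DerSimonian-Laird) and return (estimate, lower, upper, I2 in %)."""
    w = [1 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled, i2


# Hypothetical deviation-angle data from two studies (degrees).
studies = [mean_difference(3.0, 2.0, 50, 4.0, 2.0, 50),
           mean_difference(2.5, 1.5, 40, 4.5, 2.5, 40)]
print(random_effects_pool([s[0] for s in studies], [s[1] for s in studies]))
```

For an RR, pooling is done on the log scale and the pooled estimate and interval limits are exponentiated afterwards.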

NMA combined direct and indirect comparison results within a Bayesian framework using a Markov chain Monte Carlo approach. The Bayesian method uses posterior probabilities to rank all analyzed interventions. It also avoids a shortcoming of the frequentist method, which estimates parameters by iteratively maximizing the likelihood function and is therefore prone to instability and biased results. The Bayesian framework does not calculate P values; instead, conclusions are drawn from whether the null value (1 for RR, 0 for MD) falls within the 95% CI. When the 95% CI of the RR included 1 or the 95% CI of the MD included 0, the result was considered not significant. The analysis used a normal prior distribution, four chains, a thinning interval of one, 50,000 iterations, and an initial 20,000 annealing iterations to eliminate the effect of the initial values. When the number of studies was insufficient for NMA, a descriptive report was provided.
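Because P-UKA and R-UKA are connected only through the common comparator C-UKA, their contrast is estimated indirectly. The sketch below illustrates the idea with a frequentist adjusted indirect comparison (the Bucher approach), used here as a simplified stand-in and not as the Bayesian model fitted in this analysis. It also applies the interval-based significance rule described above. Function names and input values are hypothetical.

```python
import math


def indirect_comparison(d_pc, se_pc, d_rc, se_rc, z=1.959963984540054):
    """Indirect estimate of P-UKA vs R-UKA through the common comparator C-UKA.

    d_pc: effect of P-UKA vs C-UKA; d_rc: effect of R-UKA vs C-UKA
    (mean differences, or log RR for categorical outcomes).
    The variances of the two direct estimates add.
    """
    d_pr = d_pc - d_rc
    se_pr = math.sqrt(se_pc ** 2 + se_rc ** 2)
    return d_pr, d_pr - z * se_pr, d_pr + z * se_pr


def excludes_null(lower, upper, null_value=0.0):
    """True when the interval does not contain the null value (0 for MD, 1 for RR)."""
    return not (lower <= null_value <= upper)


# Hypothetical direct estimates of deviation angle (degrees) against C-UKA.
estimate, lower, upper = indirect_comparison(1.0, 0.3, -2.0, 0.4)
print(estimate, lower, upper, excludes_null(lower, upper))
```

The indirect interval is always wider than either direct interval, which is one reason indirect evidence tends to be rated as less precise.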

For each outcome of interest, we plotted a network diagram illustrating the number of studies and the comparisons between interventions. Model convergence was assessed using the Brooks-Gelman-Rubin diagram (Appendix 5). A “rank probability” for each intervention was calculated and a rank order plot was generated to assess the treatment hierarchy.
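A rank probability is the share of posterior draws in which an intervention occupies a given rank. The following sketch shows that calculation on simulated draws. The draws are randomly generated for illustration and are not the posterior samples of this analysis, and the function name `rank_probabilities` is hypothetical.

```python
import random
from collections import Counter


def rank_probabilities(draws, lower_is_better=True):
    """Estimate rank probabilities from posterior draws.

    `draws` is a list of dicts mapping treatment name to its sampled value
    (for example, deviation angle relative to a reference treatment).
    Returns {treatment: [P(rank 1), P(rank 2), ...]}.
    """
    treatments = sorted(draws[0])
    counts = {t: Counter() for t in treatments}
    for draw in draws:
        ordered = sorted(treatments, key=lambda t: draw[t],
                         reverse=not lower_is_better)
        for rank, treatment in enumerate(ordered, start=1):
            counts[treatment][rank] += 1
    n = len(draws)
    return {t: [counts[t][r] / n for r in range(1, len(treatments) + 1)]
            for t in treatments}


# Simulated draws (assumed values): effects relative to C-UKA, fixed at 0.
rng = random.Random(0)
draws = [{"C-UKA": 0.0,
          "P-UKA": rng.gauss(0.5, 1.0),
          "R-UKA": rng.gauss(-1.0, 1.0)} for _ in range(5000)]
probs = rank_probabilities(draws)
for treatment, p in probs.items():
    print(treatment, [round(x, 3) for x in p])
```

A treatment hierarchy derived this way reflects ordering only. An intervention can rank first even when its interval against the others includes the null value, as in several outcomes reported here.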

We used Confidence in Network Meta-Analysis (CINeMA) [11] to generate network diagrams as well as study limitation bars in the GRADE evaluation. Conventional meta-analysis with pairwise comparisons was performed using STATA 17.0 (StataCorp LLC, Texas, USA) to obtain I². The risk of bias graph and risk of bias summary were created using RevMan [12]. All NMAs were performed using the gemtc package (version 1.0-2) in R 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria).

Results

Characteristics of included studies

Following screening, 12 studies with 871 knees were included in the NMA (Fig. 1). Among these, two studies used fixed-bearing UKA, eight studies used mobile-bearing UKA, and two studies used both prosthesis types. Six studies (391 knees) compared P-UKA and C-UKA, with follow-up times ranging from 3 to 24 months. Four of these studies used MRI-based PSI designs and two used CT-based designs (Table 1). Another six studies (480 knees) compared R-UKA and C-UKA, with follow-up times ranging from 4.5 to 24 months; three studies used the Mako Robotic system, two studies used the Acrobot system, and one study used the BlueBelt Navio image-free robotic surgical system (Table 2). No RCT directly compared P-UKA and R-UKA (Fig. 2). All studies comparing R-UKA and C-UKA received funding, whereas only 16.7% (1/6) of the studies comparing P-UKA and C-UKA did. A network diagram for each outcome of interest is provided in Appendix 6.

Fig. 1 Flow diagram

Table 1 Characteristics of the included studies comparing P-UKA and C-UKA
Table 2 Characteristics of the included studies comparing R-UKA and C-UKA
Fig. 2 Global network diagram. The size of the node represents sample size, the edge width represents the number of studies, the solid line represents direct comparison, and the dotted line represents indirect comparison. The proportion of risk of bias in the original studies is shown in each node. Red indicates high risk, yellow indicates unclear risk, and green indicates low risk

Risk of bias

The primary source of bias was the blinding of outcome assessment, with additional high-risk ratings for participant blinding and incomplete outcome data. Approximately 58.3% of the studies clearly described their methods of random sequence generation and allocation concealment (Fig. 3).

Fig. 3 (A) Risk of bias graph and (B) risk of bias summary

Primary outcomes

HKAA deviation angle

Seven studies (461 knees) reported HKAA deviation angle [13,14,15,16,17,18,19]. The differences in deviation angle among P-UKA, R-UKA and C-UKA were not significant (Table 3). The treatment hierarchy, based on rank probability, was as follows: (1) P-UKA, (2) C-UKA, and (3) R-UKA, suggesting that P-UKA exhibited the smallest HKAA deviation angle, while R-UKA showed the largest (Appendix 7).

Table 3 Network analysis of P-UKA, R-UKA and C-UKA

Femoral component deviation angle in coronal plane

Seven studies (512 knees) reported femoral component deviation angle in the coronal plane [14, 15, 17,18,19,20,21]. The differences in deviation angle between P-UKA, R-UKA and C-UKA were not significant (Table 3). The treatment hierarchy indicated that R-UKA had the smallest deviation angle, followed by C-UKA, and P-UKA had the largest (Appendix 7).

Femoral component deviation angle in sagittal plane

Six studies (484 knees) reported femoral component deviation angle in the sagittal plane [14, 15, 17, 18, 20, 21]. The deviation angle of P-UKA was significantly larger than that of R-UKA, by 4.16°, with no significant differences observed between C-UKA and either P-UKA or R-UKA (Table 3). The treatment hierarchy showed that R-UKA had the smallest deviation angle, followed by C-UKA, and P-UKA had the largest (Appendix 7).

Tibial component deviation angle in coronal plane

Eight studies (583 knees) reported tibial component deviation angle in the coronal plane [13,14,15, 17,18,19,20,21]. The differences among the three interventions were not significant (Table 3). The treatment hierarchy revealed that R-UKA had the smallest deviation angle, followed by P-UKA, and C-UKA had the largest (Appendix 7).

Tibial component deviation angle in sagittal plane

Eight studies (604 knees) reported tibial component deviation angle in the sagittal plane [13,14,15, 17,18,19,20,21]. NMA indicated that the deviation angle of R-UKA was significantly smaller than that of C-UKA, by 2.45°. No significant differences were observed in the remaining comparisons (Table 3). The treatment hierarchy showed that R-UKA had the smallest deviation angle, followed by P-UKA, and C-UKA had the largest (Appendix 7).

Secondary outcomes

Surgery time

Six studies (288 knees) reported surgery time [14, 16, 17, 19, 20, 22]. Compared with C-UKA, the surgery time of R-UKA was significantly longer, by 15.98 min (Table 3). The treatment hierarchy based on rank probability showed that R-UKA had the longest surgery time, followed by P-UKA, and C-UKA had the shortest (Appendix 7).

Functional score

Four studies (408 knees) reported the Forgotten Joint Score (FJS) [13, 17, 23, 24], four studies (387 knees) reported the Oxford Knee Score (OKS) [17, 20, 23, 24], and three studies (236 knees) reported the Short Form 12 (SF-12) mental and physical scores [18, 19, 23]. NMA results showed no significant differences in FJS, OKS, or SF-12 mental or physical scores in any of the three comparisons (Table 3). Treatment hierarchies are detailed in Appendix 7. Data on other types of PROMs were insufficient for NMA, and a descriptive summary is provided in Appendix 8.

Revision rate

Five studies (464 knees) reported revision rate [13, 17, 20, 23, 24]. The revision rate of P-UKA was 2.40 times that of C-UKA and 6.12 times that of R-UKA, and the revision rate of R-UKA was 0.40 times that of C-UKA, although none of these differences was significant (Table 3). The treatment hierarchy showed that R-UKA had the lowest revision rate, while P-UKA had the highest (Appendix 7). Appendix 9 summarizes the specific reasons for revision.

Complication rate

Six studies (492 knees) reported complication rate [13, 14, 17, 20, 23, 24]. No significant differences were observed in the complication rate among the three comparisons (Table 3). The treatment hierarchy showed that R-UKA had the lowest complication rate, while P-UKA had the highest (Appendix 7). Specific reasons for complications are summarized in Appendix 9.

Blood loss

Only one study reported blood loss [15], with an average of 50 ml for P-UKA and 75 ml for C-UKA. The difference was not significant (P = 0.42).

Length of hospitalization

Only one study reported the length of hospitalization [17]. The average length for P-UKA and C-UKA was 1 ± 0.9 days and 1 ± 1 days, respectively, with no significant difference between the two groups.

GRADE evaluation

We conducted GRADE evaluations of five primary outcomes. For HKAA deviation angle, the evidence for the P-UKA and C-UKA comparisons, and the P-UKA and R-UKA comparisons, was moderate. For femoral deviation angle in the coronal plane, the evidence for the R-UKA and C-UKA comparisons, and the R-UKA and P-UKA comparisons, was moderate. For tibial deviation angle in sagittal plane, the evidence for the R-UKA and C-UKA comparisons was very low. The levels of evidence between the comparisons for the other primary outcomes were low (Appendix 10).

Discussion

The main finding of this NMA is that in the sagittal plane, R-UKA significantly improves the alignment accuracy of both femoral and tibial components. Moreover, within the treatment hierarchy, R-UKA emerges as the leader in terms of alignment accuracy for femoral and tibial component deviation angles in the coronal plane. Unfortunately, the notable improvement in implant alignment accuracy does not translate into discernible differences in PROMs.

Numerous studies have consistently reported that early failures in UKA, such as prosthesis loosening, polyethylene wear, and accelerated progression of contralateral KOA, are associated with poor implant position [25,26,27,28]. On the tibial side in particular, excessive tibial slope can cause increased bone stress, anterior cruciate ligament tear, and tibial component loosening [29].

In the sagittal plane, R-UKA demonstrated an advantage in the alignment of both the femoral and tibial components, with significant reductions in deviation angle of 4.16° (compared with P-UKA) and 2.45° (compared with C-UKA), respectively. Importantly, existing studies have shown that a misalignment of 2° in UKA indicates implant failure [30, 31], and from this point of view, R-UKA has a clinically relevant impact on component alignment. The robotic system combines preoperative planning with patient imaging, and the robotic arm assists the surgeon through haptic, auditory, tactile, and visual feedback when the predetermined resection parameters are approached, preventing over-resection and malpositioning during bone resection [32]. It should be noted that all R-UKA studies included in our NMA received funding from robot manufacturers with an interest in the results, which may have affected the research findings.

In contrast, the improvement in component accuracy due to P-UKA was not significant, and in the treatment hierarchy the femoral component deviation angles of P-UKA in the coronal and sagittal planes were even greater than those of C-UKA. This discrepancy may be caused by inaccurate three-dimensional imaging reconstruction and PSI manufacturing, which often necessitates further manual bone resection intraoperatively and results in errors [17].

Currently, there is only one study that has conducted a three-arm comparison. The authors prospectively enrolled 30 knees of P-UKA and compared them with the retrospective cohorts of R-UKA (13 knees) and C-UKA (14 knees). The study found no significant differences in component alignment between P-UKA and R-UKA in the coronal, sagittal, and axial planes. However, the accuracy in the coronal and sagittal planes was significantly higher in P-UKA compared to C-UKA [33]. It is imperative to highlight the small sample size and the potential for selection bias in this study, given its reliance on a combination of prospective and retrospective cohorts. High-quality three-arm RCTs are still warranted to substantiate these findings.

In contrast to a previous NMA focusing on outlier rates [34], our direct analysis of component deviation angles provides a more intuitive understanding of the variation introduced by different techniques, given the absence of a consensus among surgeons regarding the safe alignment range of the implant [35,36,37,38].

Whether improving component alignment translates into improvements in PROMs remains controversial. In our NMA, we collected all PROMs reported in the original literature and performed Bayesian analysis on FJS, OKS, and SF-12 physical and mental scores. Surprisingly, the superior implant alignment observed did not yield tangible benefits in terms of functional outcomes. Among the descriptive PROMs analyzed, only Gilmour et al. [24] reported that compared to C-UKA (mean: 14.5, IQR: 3.8, 31.0), R-UKA (mean: 3.5, IQR: 1.0, 25.3) had a significant improvement (P = 0.043) in the stiffness visual analogue scale. However, this positive finding stands in contrast to the overall non-significant differences observed in PROMs between the groups. This may be because UKA patients are usually younger and have better preoperative knee function, so it is difficult to distinguish between good and excellent results after surgery (the ceiling effect); the insufficient sensitivity of clinical functional scores may further minimize this difference. However, a recent meta-analysis showed that R-UKA significantly improved the Knee Society Score (KSS) and postoperative FJS compared to C-UKA [39]. Considering that clinical outcomes and patient satisfaction are the fundamental goals of UKA, further in-depth studies on the improvement of PROMs are still needed.

Regarding surgery time, R-UKA was significantly longer than C-UKA by 15.98 min. The prolonged surgery time associated with R-UKA can be attributed to several factors, including the placement of trackers, bone registration, bone preparation, and the acquisition of ligament balance data. It can also be attributed to the learning curve [40]. A systematic review showed that the learning curve threshold ranged from 5 to 10 knees [41], depending on the type of robotic system. As surgeons progress through the learning phase, the surgery time may decrease [42,43,44] or even become no different from that of C-UKA [45, 46]. P-UKA requires continuous intraoperative adjustment of the PSI position to achieve a tight fit with the bone surface, and may therefore also prolong the surgery time, although this difference was not significant in this NMA.

R-UKA and P-UKA did not increase the revision rate compared with C-UKA. Two revisions were reported for P-UKA. One was attributed to a tibial plateau fracture, which may have had a technical cause [17]; the other was prompted by persistent knee pain, a known factor contributing to 8% of early failures according to an existing study [47]. Revisions for persistent knee pain also occurred in the R-UKA and C-UKA groups in this NMA. In addition, one C-UKA patient underwent revision due to aseptic loosening. It is important to consider that the follow-up duration across the studies included in our NMA ranged from 3 to 24 months. Given that 50% of UKA failures typically occur within the first 5 years postoperatively [48, 49], the relatively short follow-up period in these studies may be insufficient to detect potential differences in revision rates among the three guides.

Similarly, R-UKA and P-UKA did not increase the complication rate. The reference point bone screws used by R-UKA to receive optical signals and the bone pins used to fix the reference frame additionally increase the stress concentration on cortical bone, posing a certain risk of fracture; prolongation of the surgery time may lead to infection-related complications [14]. However, most of the complications reported in this NMA, such as stiffness, numbness or wound leakage, are not related to the choice of guide.

Compared with P-UKA and C-UKA, R-UKA improves sagittal alignment, but its economic cost is higher [46, 50]. The purchase and maintenance of robotic systems and the cost of additional advanced CT machines are all obstacles to its wider adoption. R-UKA is cost-effective compared with C-UKA only when the annual case volume exceeds 94 cases [46]; it is not cost-effective for low- and medium-volume centers. There is still a lack of unified standards for PSI design and implant position parameters. When the PSI is inaccurate, experienced surgeons can make adjustments based on the actual intraoperative conditions, but beginners may be unable to adapt to such changes, which may have serious consequences.

The strength of this NMA is that it included only RCTs, which provide the highest level of evidence, ensuring the quality of the original studies. It conducted a comprehensive analysis of multiple imaging and functional outcomes of UKA and provided a treatment hierarchy. Our NMA also addressed the lack of head-to-head comparative studies of R-UKA and P-UKA through indirect comparison. This NMA has the following limitations. Firstly, due to the lack of original literature directly comparing P-UKA and R-UKA, we could not perform consistency testing between the three interventions. However, we still estimated the effect between each pair of comparisons by adjusted indirect comparisons and assessed consistency in the network. Secondly, the included studies involved two types of implants, mobile and fixed bearing, and the robotic systems also differed. Different surgical principles may cause differences in results. Thirdly, fewer than 10 studies were available for each outcome of interest, so publication bias could not be assessed through funnel plots, but we consulted CENTRAL to identify possible unpublished negative results. Finally, the short follow-up period of the included studies may not detect potential differences in complication and revision rates, and future analyses of studies with longer follow-up periods are needed.

Conclusion

R-UKA has advantages in implant alignment, especially in the sagittal plane of the femoral and tibial components. There was no difference in PROMs, complications and revision rates between R-UKA, P-UKA and C-UKA. In clinical practice, cost-effectiveness, imaging outcomes and functional results need to be comprehensively considered to select the appropriate UKA method.