Introduction

Total knee arthroplasty (TKA) is highly effective for treating advanced knee osteoarthritis [1]. Despite its great success and rapid development over the past two decades, 20% of patients experience unsatisfactory clinical outcomes after surgery [2, 3]. The precise positioning of components and alignment of the limb are critical factors influencing patient satisfaction and functional results following TKA [4]. In practice, achieving these standards manually can be very challenging for surgeons. The development of orthopedic robots has facilitated the widespread adoption of robot-assisted total knee replacement systems in clinical settings [5,6,7,8]. Many reports have indicated that robotic-assisted total knee arthroplasty (RA-TKA) enables more accurate bone cutting and implant placement and achieves balanced extension and flexion gaps tailored to the patient's skeletal anatomy and natural ligament balance, reducing the probability of detrimental stress and wear [9, 10]. The approach can theoretically promise more significant improvements in clinical outcomes. However, controversy persists regarding whether RA-TKA yields superior functional and clinical outcomes compared to manual TKA (M-TKA). Multiple studies report no statistically significant difference in clinical outcomes, despite favorable radiological outcomes during follow-up visits [11,12,13]. Based on that, many scholars have disputed its clinical significance. Therefore, we collected relevant articles and performed a meta-analysis. The study aims to compare the clinical outcomes and radiological results of RA-TKA and M-TKA through the analysis of relevant studies, thus providing a basis for physicians’ decision-making processes. The hypothesis of this study is that RA-TKA yields significantly superior outcomes than M-TKA, both clinically and radiologically.

Materials and methods

Literature retrieval

This study complied with the standards for Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA 2020) [14] and registered in PROSPERO prospectively (CRD420234731153). Articles published in English were systematically retrieved from PubMed, Web of Science, Cochrane Library and Embase up to June 1, 2024 to comprehensively compare the efficacy and/or safety of RA-TKA and M-TKA in treating knee osteoarthritis. The following terms were searched in the database: "robot-assisted", "robotic-assisted", "robot", "robotic", “Arthroplasty”, "Knee Replacement Arthroplasties", "Robotic Assisted Surgery" and "Total Knee Arthroplasty” (Table S1). Two authors independently and impartially examined the articles that met the inclusion criteria according to the search strategy, performed data extraction and manually examined the reference list of all included studies.

Inclusion and exclusion criteria

Studies that contained the following features were included: 1. studies with a randomized control, cohort or case–control design; 2. patients with end-stage knee osteoarthritis; 3. studies that compared RA-TKA and M-TKA; 4. evaluations including no less than one of the following indicators: American Knee Society Score (KSS), Western Ontario McMaster Universities Osteoarthritis Index (WOMAC), Oxford Knee Score (OKS), joint range of motion (ROM), 36-Item Short Form Health Survey (SF-36) score, Hospital for Special Surgery (HSS) score, Forgotten Joint Score (FJS), pain score, patient satisfaction score, operation length (min), intraoperative blood loss (ml), hip–knee–ankle (HKA) angle, frontal femoral component angle, frontal tibia component angle, lateral femoral component angle and lateral tibia component angle; 5. articles containing enough data for calculating odds ratio (OR) or weighted mean difference (WMD).

Studies were excluded if they were reviews, letters, comments, case reports, abstracts for conference presentation, articles on pediatrics and unpublished articles. We included studies on patients who underwent unilateral RA-TKA or M-TKA, and also excluded studies involving single-compartment knee arthroplasty.

Data extraction

Data was extracted systematically and independently by two investigators (Fu Xinyu and She Yiming). Disagreements were finally resolved by the third researcher (Jin Ri). We extracted the data on first author and publication year of the article, research duration, research country, research design, sample size, patient’s age and body mass index (BMI), follow-up time and interventions. Continuous variables in the included studies presented as the median and interquartile range (IQR) or range were calculated to obtain the mean ± standard deviation using verified mathematical methods [15, 16]. For studies with missing or unreported data, the corresponding author was contacted to request for complete (if any) data.

Quality assessment

Randomized controlled trials (RCTs) as well as cohort studies were evaluated, respectively, using the Cochrane Quality Assessment Scale and Newcastle–Ottawa Scale (NOS) [17]. The scale mainly includes three dimensions: subject selection, comparability between groups and measurement of results. Studies were given a score from 0 to 9, with a score of 7–9 representing high quality [18]. Research quality and level of evidence were separately reviewed by two researchers, and differences were handled via discussion.

Statistical analysis

Evidence synthesis was conducted using Review Manager 5.4 (Cochrane Collaboration, Oxford, UK). Weighted mean difference (WMD) and risk ratio (OR) were adopted to assess continuous and binary variables, respectively. Indicators were all presented using 95% confidence interval (CI). Then heterogeneity (Cochran's Q) and inconsistency index (I2) of all studies were assessed by the Chi-square (X2) test [19]. A p value for the χ 2 test beneath 0.05 or I2 over 50% was considered as remarkable heterogeneity. If there was remarkable heterogeneity, a random effects model was employed to approximate the pooled WMD or OR. If not, the fixed effects model was utilized. To assess the influence of the eligible studies on the pooled results containing remarkable heterogeneity, one-way sensitivity analysis was also performed. The funnel plot was produced using Review Manager 5.4 (Cochrane Collaboration, Oxford, UK). The results involving ≥ three studies were tested by the Egger’s regression test in Stata 15.0 (Stata Corp, College Station, TX, USA) [20], and the publication bias was visually assessed. A p value of lower than 0.05 was deemed to have statistical significance.

Results

Literature retrieval and study characteristics

The process of literature retrieval and screening is displayed in detail in Fig. 1. Through systematic literature retrieval, 1,711 related articles were finally obtained from PubMed (n = 342), Embase (n = 644), Cochrane (n = 150) and Web of Science (n = 575). Then 1,033 titles and abstracts of studies were screened after deleting duplicates. Finally, 12 full-text articles were included, concerning 2,863 patients (1,449 RA-TKAs and 1,414 M-TKAs) for pooled analysis [11,12,13, 21,22,23,24,25,26,27,28,29]. Among them, five were RCTs [11, 12, 21,22,23] and seven were cohort studies [13, 24,25,26,27,28,29]. The specific features of all included studies are presented in Table 1. Quality evaluation of eligible cohort studies and RCTs is shown in Table S2 and Fig. 2, respectively.

Fig. 1
figure 1

Flowchart of the systematic search and selection process

Table 1 Baseline characteristics of the included studies
Fig. 2
figure 2

Risk of bias graph summary for randomized controlled trials

Change of Knee Society Score

In the analysis of KSS improvement from preoperative to postoperative stages, the RA-TKA and M-TKA groups exhibited similar changes in KSS scores (WMD: −1.18; 95%CI: −3.41, 1.05: p = 0.30), with significant heterogeneity observed (I2 = 98%, p < 0.00001) (Fig. 3a). The subgroup analysis of > 6-month follow-up duration revealed a significantly lower KSS score in the RA-TKA group compared to the M-TKA group (WMD: −0.61; 95%CI: −0.89, −0.33: p < 0.00001) (Table 2).

Fig. 3
figure 3

Forest plots of score: a KSS, b WOMAC, c OKS, d ROM, e SF*36, f HSS, g FJS

Table 2 Subgroup analysis of RA-TKA versus M-TKA

Change of Western Ontario McMaster Universities Osteoarthritis Index

Five studies reported WOMAC. The pooled analysis indicated that RA-TKA and M-TKA groups demonstrated similar changes in WOMAC scores (MD: 2.32; 95%CI: -4.27, 8.91; p = 0.49), with salient heterogeneity observed (I2 = 98%, p < 0.00001) (Fig. 3b). The subgroup analysis of > 6-month follow-up duration unveiled a significantly lower WOMAC score in the RA-TKA group in contrast to the M-TKA group (WMD: −3.40; 95%CI: −3.72, −3.08; p < 0.00001) (Table 2).

Change of Oxford Knee Score

OKS was reported in four studies, which unraveled that the RA-TKA and M-TKA groups presented with similar OKS scores (WMD: 5.70; 95% CI: -0.97, 12.37; p = 0.09), with salient heterogeneity found (I2 = 100%, p < 0.00001) (Fig. 3). Sensitivity analysis showed that after excluding the research published by Lincon et al. in 2016 [23], the result changed from insignificant to significant, indicating instability of the index (Fig. 4c). The subgroup analysis of cohort studies, > 6-month follow-up duration, and European population revealed a significantly higher ROM score in the RA-TKA group compared to the M-TKA group (Table 2).

Fig. 4
figure 4

Sensitivity analysis of a KSS, b WOMAC, c OKS, d ROM, e pain score, f operative duration, g HKA, h FFC, i FTC, j LFC, k LTC

Change of joint range of motion

Seven studies reported ROM. The comprehensive analysis suggested a significantly lower ROM score improvement in the RA-TKA group in contrast to the M-TKA group (WMD: −2.84; 95% CI: −3.72, −1.95; p = 0.00001), with significant heterogeneity observed (I2 = 98%, p < 0.00001) (Fig. 3d). Sensitivity analysis uncovered that after excluding the research published by Kim et al. in 2020 [12] or Jin et al. in 2018, the result changed from significant to insignificant, indicating instability of the index (Fig. 4d).

Change of the 36-Item Short Form Health Survey score

The primary analysis demonstrated no statistically significant difference in postoperative SF-36 values between the RA-TKA group and the M-TKA group (WMD: −1.47; 95% CI: −4.02, 1.08; p = 0.26) (Fig. 3e), with significant heterogeneity (I2 = 6%, p < 0.30). Subgroup analysis similarly found no statistical difference.

Change of hospital for special surgery

In the analysis of KSS improvement from preoperative to postoperative stages, the RA-TKA group and the M-TKA group indicated no statistical difference (WMD: −1.06; 95% CI: −3.75, 1.64; p = 0.44) (Fig. 3F) and notable heterogeneity was discovered (I2 = 98%, p < 0.00001).

Forgotten Joint Score (FJS)

Six studies reported FJS. The pooled analysis unveiled a significantly higher FJS score in the RA-TKA group in contrast to the M-TKA group (WMD: 5.59; 95% CI: 3.53, 7.66; p < 0.0001) (Fig. 3g), with no significant heterogeneity (I2 = 21%, p = 0.27).

Change of pain score

Four studies reported pain scores. The comprehensive analysis found no significant difference in pain scores between the RA-TKA and M-TKA groups (WMD: 1.51; 95% CI: -0.34, 3.36; p = 0.11). Sensitivity analysis showed that after excluding the research published by Linkon et al. in 2016, the result changed from insignificant to significant, indicating instability of the index (Fig. 4d).

Patient satisfaction score

Three studies reported patient satisfaction scores. The pooled analysis showed that the RA-TKA and M-TKA groups exhibited similar patient satisfaction scores (WMD: -0.08; 95% CI: −1.40, 1.24; p = 0.91), with no salient heterogeneity (I2 = 0%, p = 0.82) (Fig. 5b).

Fig. 5
figure 5

a Pain score, b patients' satisfactory score, c operative duration (min), d intraoperative blood loss (ml)

Operative duration

The operation time was reported in three studies. The comprehensive analysis found that the RA-TKA group exhibited a significantly longer operation time than the M-TKA group (WMD: 25.97; 95% CI: 12.59, 39.34; p = 0.0001), with significant heterogeneity (I2 = 85%, p = 0.001) (Fig. 5c). Sensitivity analysis uncovered that after excluding the research published by Kim et al. in 2020 or Xu et al. in 2022, the result changed from significant to insignificant, indicating instability of the index (Fig. 4f).

Intraoperative blood loss

The comprehensive analysis revealed no statistically significant difference in intraoperative blood loss between the RA-TKA group and the M-TKA group (WMD: −5.53; 95%CI: −1.90, 12.95; p = 0.14), with no salient heterogeneity (I2 = 0%, p = 0.65) (Fig. 5d).

Change of hip-knee-ankle angle

HKA was reported in six studies. The comprehensive analysis suggested a significantly superior HKA score improvement in the RA-TKA group than that in the M-TKA group (WMD: 1.10; 95%CI: 0.40, 1.80: p = 0.002), with significant heterogeneity (I2 = 99%, p < 0.00001) (Fig. 6a).

Fig. 6
figure 6

Radiographic findings: a HKA, b FFC, c FTC, d LFC, e LTC

Frontal femoral component angle

Frontal femoral component (FFC) angle was reported in three studies. The comprehensive analysis found no clinical difference in the FFC angle between the RA-TKA group and the M-TKA group (WMD: 0.61; 95%CI: −0.19, 1.42; p = 0.14), with no salient heterogeneity (I2 = 90%, p < 0.0001) (Fig. 6b). Sensitivity analysis indicated that after excluding the research published by Jin et al. in 2018 [26], the result changed from insignificant to significant, indicating instability of the index (Fig. 4h). The subgroup analysis of RCT in a study revealed a significantly higher FFC score in the RA-TKA group compared to the M-TKA group (WMD: 1.01; 95%CI: 0.81, 0.21; p < 0.0001) (Table 2).

Frontal tibia component angle

Three studies reported frontal tibia component (FTC) angle. Pooled analysis revealed no meaningful difference in FTC angle between the RA-TKA group FFC and the M-TKA group (WMD: 0.13; 95%CI: -0.99, 1.25; p = 0.82), and there was remarkable heterogeneity (I2 = 93%, p < 0.00001) (Fig. 6c).

Lateral femoral component angle

Three studies reported lateral femoral component (LFC) angle. Pooled analysis revealed no significant difference in the LFC angle between the RA-TKA group FFC and the M-TKA group (WMD: −0.80; 95%CI: −3.73, 2.13; p = 0.59), and there was remarkable heterogeneity (I2 = 99%, p < 0.00001) (Fig. 6d). The cohort study subgroup analysis discovered a significantly lower LFC score in the RA-TKA group than in the M-TKA group (WMD: −2.90; 95%CI: −3.42, −2.38; p < 0.00001) (Table 2).

Lateral tibia component angle

Three studies reported lateral tibia component (LTC) angle. Pooled analysis found no meaningful difference in LTC score between the RA-TKA group and the M-TKA group (WMD: 0.65; 95%CI: -0.52, 1.81; p = 0.28), and remarkable heterogeneity was noted (I2 = 93%, p < 0.00001) (Fig. 6f). Sensitivity analysis discovered that when the research published by Jin et al. in 2018 [26] was excluded, the result changed from insignificant to significant, indicating that the index was unstable (Fig. 4k). RCT subgroup analysis revealed a significantly higher FFC score in the RA-TKA group than in the M-TKA group (WMD: 1.06; 95%CI: 0.80, 1.31; p < 0.0001) (Table 2).

Publication bias

A visual assessment of funnel plots for all outcome measures (Fig. 7) was performed. Outcome measures showing potential publication bias were subjected to Egger’s test. No statistically significant publication bias was observed.

Fig. 7
figure 7

Funnel plots of a KSS, b WOMAC, c OKS, d ROM, e FJS, f pain score, g patients' satisfactory score, h operative duration, i intraoperative blood loss, j HKA, k FFC, l FTC, m LFC, n LTC

Discussion

The most important findings of this meta-analysis are as follows: 1. The RA-TKA group demonstrated significantly better improvements in HKA angle and postoperative FJS score compared to the M-TKA group. 2. In contrast to the RA-TKA group, the operation time was shorter in the M-TKA group. Subgroup analysis of > 6-month follow-up results showed that the M-TKA group exhibited more significant improvements in KSS score and WOMAC score.

With the ultimate goal of building a stable, painless and long-lasting joint, M-TKA relies on surgical instruments to measure knee parameters, select a prosthesis and execute a surgical plan. Despite the proven efficacy and reproducibility of conventional knee arthroplasty, and ongoing innovations in prosthetics and surgical instruments, a notable number of patients remain dissatisfied with knee arthroplasty, attributed to various known and uncertain reasons [30, 31]. The primary reason often stems from the stringent standards of TKA for prosthesis placement, lower limb reconstruction and postoperative stability, while M-TKA struggles to consistently meet these criteria [32,33,34]. RA-TKA has been developed to eliminate potential inaccuracies in implant positioning and alignment, thus mitigating patient dissatisfaction. Numerous studies have unraveled that RA-TKA results in fewer outliers in component positioning, especially in the sagittal plane, irrespective of the knee alignment and balancing techniques employed. Surgeons may balance the knee more precisely with RA-TKA than with M-TKA [35]. However, it remains to be validated whether RA-TKA can improve postoperative function recovery and deliver superior clinical efficacy compared with M-TKA. On this basis, we conducted an up-to-date systematic review and meta-analysis.

The study has demonstrated that the RA-TKA group exhibited certain advantages in improving the HKA angle, thereby enhancing the alignment accuracy of the prosthesis and reducing the deviation of the lower limb force line from the neutral position. It has been reported that maintaining the (HKA) angle within a safe range of ± 3° can increase implant survival [36]. HKA angle is a crucial factor affecting the longevity of knee prosthesis. Abnormal HKA angle following TKA may result in prosthetic knee prosthesis dislocation, early postoperative prosthesis loosening, compromised functional recovery and heightened revision rate [37, 38]. Research by Mary K. Richardson et al. indicates that patients undergoing RA-TKA have a significantly closer-to-neutral postoperative HKA angle. In addition, compared to patients treated with conventional methods, fewer RA-TKA patients experience HKA angles outside the range of 0° ± 3° [39]. Byung Sun Choil et al. have demonstrated that the RA-TKA achieves superior accuracy and precision in femoral and tibial prosthesis placement compared to M-TKA, and all X-ray measurements were reproducible. These findings uncover that RA-TKA can enhance the accuracy and repeatability of component positioning and overall limb alignment [40,41,42,43]. It is worth noting that RA-TKA demonstrated the accuracy of prosthetic positioning and the enhancement of early patient-reported outcomes [35]. However, the correlation between accurate implant positioning and clinical outcomes remains contentious. A systematic review by Bensa et al. indicates that both procedures significantly improve patients’ symptoms, with no significant difference in clinical outcomes observed between RA-TKA and M-TKA, aligning with our own findings [44]. This raises questions about whether achieving a 180° alignment is universally “normal” and whether it should be the goal of TKA for all patients. Multiple studies have found that the HKA angle deviates from 0° in the general non-arthritic population. A study by Bellemans et al. has reported a varus angle of 1° in women and 2° in men based on a study of 250 healthy adults [45]. In addition, Almaavi et al. have reported a large variation in natural knee anatomy among 4884 CT scans of the knee, with only 5% of the general population exhibiting a natural neutral alignment (HKA angle: 0°) [46]. In most patients undergoing TKA, the knee may be compelled into an unnatural position, potentially contributing to the lack of corresponding clinical outcomes despite achieving better natural neutral alignment. Given the variability of coronal knee alignment in non-osteoarthritic knees and the wide variability of all coronal alignment parameters, the necessity is underscored for a more anatomically precise and individualized approach to knee arthroplasty [47].

The FJS is a joint-specific questionnaire designed to assess a patient’s ability to “forget" about a joint issue following joint treatment. It reflects not only the difference between “good” and “bad”, but also distinguishes between “good”, “very good” and “excellent” results [48]. In this study, the RA-TKA group was found to have a superior postoperative FJS in contrast to the M-TKA group. A study by Kafelov M et al. has unveiled that RA-TKA achieved a higher FJS at 1 year postoperatively compared with M-TKA [29]. Similarly, Kaanni et al. have reported that robot-assisted total knee arthroplasty is relevant to a statistically significant improvement in FJS compared to conventional total knee arthroplasty, although these differences fail to reach a minimal clinically important difference (MCID) at any follow-up interval [28]. Therefore, further comprehensive randomized controlled trials are needed for validation. The lower ceiling effect of the FJS allows monitoring of long-term outcomes, particularly in groups that show favorable outcomes following total joint arthroplasty. Measurable clinically significant differences between RA-TKA and M-TKA may be better demonstrated in future long-term studies [40].

The setup and registration of the robotic system in RA-TKA are unique and may lead to increased total operative time [49]. This study observed that the RA-TKA group had longer operative times than the M-TKA group, possibly due to the complexity of robotic surgical steps, operator inexperience, and the longer learning curve associated with RA-TKA. The study by Xu et al. has unraveled that in RA-TKA, a significant portion of operative time is devoted to tasks such as setup, femoral and tibial fixation, and alignment [22]. Longer operative time may elevate the rate of TKA infection, causing devastating consequences of TKA [50]. This is one of the disadvantages of RA-TKA, which can be improved by reducing the time allocated to non-surgical activities. As surgeons gain proficiency and RA-TKA techniques are refined, operative times may further decrease. The change of postoperative ROM in the M-TKA group was superior than that in the RA-TKA group. However, given the variability of the results, this result should be interpreted cautiously. In the subgroup analysis of the > 6-month follow-up, it was suggested that the improvement of KSS and WOMAC scores in the M-TKA group was significantly higher than that in the RA-TKA group, indicating potential advantages of long-term outcomes for the former. Short-term follow-up results were similar between the two groups. A recent meta-analysis has revealed that short-term patient-reported outcomes (KSS and WOMAC) are improved in the RA-TKA group compared with the conventional TKA group. However, these differences do not exceed the threshold for MCID, suggesting that they may not be clinically significant [35].

Limitations of the study: firstly, not all of the included studies were RCTs (5 RCTs and 4 retrospective cohort studies); secondly, due to limited data availability, no subgroup analysis was performed for different brands of robotic assistance systems. Considering the underlying factors, the results should be interpreted cautiously and may not be applied to all systems. As robotic systems evolve, new high-quality studies are warranted to assess the latest advancements in robotic systems. Thirdly, the study was constrained by a limited number of included studies and relatively small sample sizes, potentially limiting its representativeness for the broader population. Future studies with long-term follow-up are needed to establish more definitive conclusions regarding outcomes and benefits. Despite these limitations, our study incorporated recent analyses from both prospective and retrospective cohort studies. Sensitivity analysis and subgroup analysis were used to test the stability and publication bias of the results, providing a theoretical basis for large-scale prospective clinical trials and evidence support for clinical workers’ treatment choices.

Conclusion

Our research results uncovered that the improvement of the HKA angle in the RA-TKA group was more significant than that in the M-TKA group. In terms of the operation time, improvement of ROM, KSS over 6 months and WOMAC score, M-TKA outperformed RA-TKA. The experimental follow-up time of this study was relatively short. Therefore, large-scale and well-designed clinical research with longer follow-up time is needed to comprehensively evaluate the advantages and disadvantages of the two surgical methods.