Introduction

Conventional hip and knee arthroplasty are common procedures effectively employed to manage advanced degenerative and inflammatory joint disease [1], providing acceptable survivorship and significantly improving quality of life (QOL) by reducing pain and restoring function [1, 2]. Previous studies have demonstrated that hip and knee arthroplasty survivorship is correlated with intra-operative factors such as leg alignment, component alignment, component fixation, joint line maintenance and soft tissue balancing [3, 4]. To improve surgical accuracy and precision, and ultimately QOL and survivorship, several robotic-assisted arthroplasty surgery systems have been developed in the last two decades [5]. While there is evidence suggesting that robotic surgery improves the accuracy of prosthesis implantation when compared to conventional hip and knee arthroplasty [3, 4], it is currently unclear whether robotic-assisted surgery results in improved functional outcomes, pain, QOL and satisfaction with surgery.

Robotic surgery may be classified as a passive, semi-active or active robotic system, depending on how independently the system performs the manoeuvres [6]. Passive robotic systems assist the surgeon by displaying the surgical plan to be followed, while semi-active and active robotic systems directly influence the surgeon’s operative technique by respectively providing feedback or carrying out a pre-programmed surgical plan under the supervision of the surgeon. A large proportion of the existing evidence investigating the effectiveness of robotic surgery has focussed on clinical outcomes [4, 5, 7,8,9]. Recent systematic reviews comparing semi-active and active robotic hip and knee arthroplasty versus conventional surgery reported that robotic surgery was associated with better control of prosthesis positioning, fixation and alignment [4, 8].

Controlling intra-operative clinical factors has been reported to improve the success of hip and knee arthroplasty [4, 5]; however, whether these advantages translate into improved patient function, pain and QOL outcomes in the short, medium and long term is currently unknown. While meta-analyses of these factors have previously investigated passive robotics compared to conventional surgery, this has not been performed for semi-active or active robotic systems [4]. Previous systematic reviews have attempted to provide an overview and describe the scope of semi-active and active robotic surgery in this area, but have not attempted to pool results for patient-reported outcomes [4, 8]. Therefore, the effect of robotic surgery, as compared to conventional surgery, from a patient’s perspective has not been established.

Patient-reported outcomes inform clinicians of the impact the treatment has on the patient and therefore have a significant role in guiding decision making for treatment [10]. As a result, knowledge of the effectiveness of robotic surgery for hip or knee arthroplasty on functional, pain and QOL outcomes will provide valuable information to clinicians, patients and policy makers. Therefore, the aim of this systematic review was to evaluate the effectiveness of semi-active and active robotic hip and knee arthroplasty on patient-reported outcomes. Primary outcomes of interest were function, pain, QOL and patient satisfaction with these surgeries in the short, medium and long term.

Materials and methods

The protocol of this systematic review and meta-analysis was registered on PROSPERO (https://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42017059932; registration number CRD42017059932) prior to the start of the study and was written in accordance with PRISMA-P [11]. The review followed the methods recommended by the Cochrane Handbook for Systematic Reviews of Interventions [12], and was written in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [13]. The PRISMA checklist is reported in Online Resource 1.

Search strategy

The electronic databases of PubMed, Medline, Embase and Cochrane Central Register of Controlled Trials (CENTRAL) were searched (via Ovid) from inception until 20 March 2017. In addition to the electronic database searches, Google Scholar was searched and pearling of reference lists of studies was conducted for additional relevant articles. We also contacted an expert in the field, not involved in the review, to check whether any published report had been missed.

The search strategy was based on key terms for “arthroplasty”, “robotic surgery”, “hip” and “knee” (Online Resource 2). For each potentially eligible study, the full-text article was obtained and assessed against the inclusion criteria. Studies were screened by two independent review authors (SK and MD), and conflicts regarding included studies were resolved by discussion with a third review author (DS).

Inclusion criteria

This review included comparative studies (e.g. randomised controlled trials [RCTs], cohort studies) reporting the effectiveness of semi-active and/or active robotic arthroplasty compared to any other surgical intervention (e.g. conventional, passive robot). Semi-active robots were defined as any device that provided feedback or constrained the surgical plan within a pre-determined area greater than a conventional cutting guide (e.g. Mako, Acrobot, Navio Precision Freehand Sculptor). Active robots were defined as any device that performed surgical procedures without the direct intervention of a surgeon (e.g. ROBODOC, CASPAR).

Eligible studies met the following criteria: (1) reported on adults (≥ 18 years old) of any gender; (2) investigated any type of hip or knee arthroplasty (e.g. total hip [THA] or knee [TKA] arthroplasty, partial hip arthroplasty [PHA], unicompartmental knee arthroplasty [UKA], bi-unicompartmental knee arthroplasty [Bi-UKA]); and (3) presented at least one patient reported outcome measure of function, pain, QOL or overall patient satisfaction with the surgery at any follow-up time. Follow-up periods were categorised into short (≤ 3 months), medium (3–12 months) or long term (≥ 12 months). If studies reported multiple time points within each follow-up interval, the time point closest to two months was considered as the shortest time point, the medium time point was considered as the closest to six months and the longest time point was considered as the closest to 12 months. No language or publication restrictions were employed, with translations attempted for all non-English published studies.
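The follow-up rules above can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of the shared boundary values is an assumption: exactly 3 months is treated as short term and exactly 12 months as long term, because the stated ranges overlap at those points.

```python
# Target time point (in months) for each follow-up interval, as stated in the text.
TARGET_MONTHS = {"short": 2, "medium": 6, "long": 12}


def follow_up_category(months):
    """Classify a follow-up time in months as short, medium or long term.

    Assumption: 3 months counts as short term and 12 months as long term,
    since the stated ranges share these boundary values.
    """
    if months <= 3:
        return "short"
    if months < 12:
        return "medium"
    return "long"


def select_time_points(reported_months):
    """Keep one time point per interval: the one closest to that interval's target."""
    selected = {}
    for months in reported_months:
        category = follow_up_category(months)
        target = TARGET_MONTHS[category]
        best = selected.get(category)
        if best is None or abs(months - target) < abs(best - target):
            selected[category] = months
    return selected
```

For a hypothetical study reporting outcomes at 1.5, 3, 6, 9, 12 and 24 months, this sketch would retain the 1.5-, 6- and 12-month time points.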

Data extraction process

A standardised piloted data extraction form was employed to collate study information, participants’ baseline characteristics, intervention, control characteristics and outcome data. Two reviewers (SK and MD) independently extracted data from the included studies, and disagreements were resolved by discussion and consensus. If consensus could not be reached, a third reviewer (DS or MH) was consulted. Studies that were published in duplicate were only included once, but all versions were considered for maximal data extraction. If missing data were found, we made attempts to contact authors.

Assessment of studies

Risk of bias of the included studies was assessed by two independent reviewers (SK and MD) using the Downs and Black Quality Checklist for Health Care Intervention Studies [14]. This tool evaluates the study’s reporting quality (10 items), external validity (3 items), bias (7 items), confounding and selection bias (6 items) and power of studies (1 item) (Online Resource 3). Scores using this tool range from 0 (high risk of bias) to 32 (low risk of bias).

The strength of the evidence of the included studies was assessed from high to very low quality for each outcome using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system [11]. In brief, the quality of evidence was downgraded from high quality by one level according to the following criteria: (1) risk of bias (> 25% of patients from studies with high risk of bias [Downs and Black score < 26]); (2) inconsistency (statistically significant heterogeneity [I² > 50% or ≤ 75% of studies with findings in the same direction]); and (3) imprecision (only 1 study or > 1 study with < 300 patients for each outcome). Two reviewers (SK and MD) independently rated the overall quality of evidence, and any disagreement was resolved by consensus.
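As an illustration only, the three downgrading criteria can be expressed as a small function. This is a sketch of the rules as worded above, not the authors' software; the function and parameter names are hypothetical, and passing None for I² when it cannot be estimated (for example, with a single study) is an assumption.

```python
GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def grade_quality(prop_patients_high_rob, i_squared, prop_same_direction,
                  n_studies, n_patients):
    """Return the GRADE level after applying the three downgrading criteria.

    prop_patients_high_rob: proportion (0 to 1) of patients from studies with
        high risk of bias (Downs and Black score < 26).
    i_squared: I² in percent, or None if not estimable (assumption).
    prop_same_direction: proportion (0 to 1) of studies with findings in the
        same direction.
    """
    downgrades = 0
    # (1) Risk of bias: more than 25% of patients from high risk studies.
    if prop_patients_high_rob > 0.25:
        downgrades += 1
    # (2) Inconsistency: I² > 50% or at most 75% of studies agreeing in direction.
    if (i_squared is not None and i_squared > 50) or prop_same_direction <= 0.75:
        downgrades += 1
    # (3) Imprecision: a single study, or fewer than 300 patients in total.
    if n_studies == 1 or n_patients < 300:
        downgrades += 1
    return GRADE_LEVELS[downgrades]
```

Under these rules, a single study of 60 patients with a high risk of bias would be downgraded twice, from high to low quality.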

Statistical analysis

For dichotomous outcomes, an estimate of the relative risk (RR) and its 95% confidence interval (CI) was calculated, where an RR value < 1 favoured robotic surgery. Continuous outcomes were analysed by calculation of the mean difference (MD) and 95% CI, where a negative MD value favoured robotic surgery.
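The two effect measures can be sketched with the standard large-sample formulas. The text does not state which formulas or software were used, so the log-scale CI for the RR and the unpooled standard error for the MD are assumptions, as are the function names. Which arm is passed as group A or group B (and hence the sign of the MD) is left to the caller.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def relative_risk(events_a, n_a, events_b, n_b):
    """Return (RR, lower, upper) for group A versus group B.

    The CI is computed on the log scale. Zero event counts are not handled
    in this sketch and would raise ZeroDivisionError.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - Z_95 * se_log_rr)
    upper = math.exp(math.log(rr) + Z_95 * se_log_rr)
    return rr, lower, upper


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (MD, lower, upper), where MD is mean_a minus mean_b."""
    md = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return md, md - Z_95 * se, md + Z_95 * se
```

A CI for the RR that includes 1, or a CI for the MD that includes 0, corresponds to the "no significant difference between groups" wording used in the Results.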

Results from included studies were converted to a score out of 100, where a higher score indicated a better outcome, for ease of comparison across all outcomes (e.g. Oxford Knee Score of 48/60 was converted to 75/100) [15]. Homogeneous outcome measures from individual studies were combined through a meta-analysis using a random-effects model, with correlated outcomes (r > 0.5) combined where appropriate [16,17,18,19,20]. If a meta-analysis was not possible, the results were reported qualitatively. Statistical heterogeneity was assessed visually and using the I² statistic.
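A minimal sketch of the score conversion and pooling steps follows. Two assumptions are made and should be read as such: the conversion is a linear rescaling between the instrument's minimum and maximum (a range of 12 to 60 for the Oxford Knee Score reproduces the 75/100 in the example above), and the random-effects model uses the DerSimonian-Laird estimator, which the text does not specify. Reversing instruments on which lower raw scores are better is not shown, and all names are hypothetical.

```python
import math


def rescale_to_100(score, minimum, maximum):
    """Linearly map a raw score onto a 0 to 100 scale."""
    return 100.0 * (score - minimum) / (maximum - minimum)


def random_effects_pool(effects, standard_errors):
    """Pool study effects with the DerSimonian-Laird estimator (assumed).

    Returns (pooled effect, 95% CI lower, 95% CI upper, I² in percent).
    """
    k = len(effects)
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    # I²: share of total variability attributable to heterogeneity.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, i2
```

With identical study effects the between-study variance is zero and the result matches a fixed-effect analysis; as the effects diverge, I² rises and the pooled CI widens.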

Results

Search results

The search yielded 2957 articles across all databases after duplicates were removed. Following elimination of clearly irrelevant references, 118 full-text articles were screened for eligibility. After screening of the full-text articles, 100 articles were excluded based on being of ineligible study design (n = 60), not reporting outcomes of interest (n = 28) and not evaluating an intervention of interest (n = 12) (Online Resource 4). Therefore, 18 articles were included of which 14 reported on a unique sample (Fig. 1).

Fig. 1 Flow diagram of review process

Study design and characteristics

Included studies investigated the effectiveness of active robotic versus conventional THA (n = 7; reported by four RCTs and three prospective cohorts) [21,22,23,24,25,26,27], semi-active robotic versus conventional THA (n = 1; reported by one retrospective cohort) [28], active robotic versus conventional TKA (n = 4; all reported by RCTs) [29,30,31,32] and semi-active robotic versus conventional UKA (n = 2; reported by one RCT and one prospective cohort) [33, 34] (Table 1).

Table 1 Characteristics of included studies for hip and knee arthroplasty

Sample sizes ranged from 28 [33] to 200 [28], with a total of 955 hip arthroplasties (mean age ± SD = 60.54 ± 11.94 years) and 387 knee arthroplasties (mean age ± SD = 66.59 ± 6.80 years) reported. Functional outcomes were reported by all included studies, with follow-up times ranging from six weeks [33] to 13 years [21]. Six studies reported on the difference of pain scores [22, 25, 26, 29, 33, 34].

Osteoarthritis was the predominant diagnosis in the included robotic hip arthroplasty studies [21,22,23, 25,26,27,28], with one study mostly reporting on osteonecrosis patients [24]. Additionally, four of these studies included patients treated for pathologies other than osteoarthritis [24,25,26,27]. Studies investigating knee robotic surgery examined patients presenting with osteoarthritis [29,30,31,32,33], although one study did not specify the diagnoses of their participants [34]. Table 1 presents the characteristics of the included studies.

Risk of bias assessment

Table 2 summarises the risk of bias for individual studies. All studies had at least one domain considered to be at high risk of bias, and five studies (36%) had more than two such domains. The most common methodological flaw was found in the confounding domain, which investigated biases in the selection of study subjects (n = 11; 79%). This was followed by power (n = 9; 64%), where no study evaluating knee arthroplasty was considered to meet the criteria. The fewest methodological flaws were found in the domains of external validity (n = 12; 86%) and reporting quality (n = 11; 79%).

Table 2 Downs and Black Checklist for measuring the methodological quality of the included studies

Summary of findings and strength of evidence

THA active robotic versus conventional surgery

The outcomes of active robotic compared to conventional THA surgery were investigated by seven studies [21,22,23,24,25,26,27], including 755 hip arthroplasties (Table 3). Short-term results for function showed no significant difference between groups when results were pooled for the Merle d’Aubigne Score (MD − 0.41; 95% CI − 5.31 to 4.48) (Fig. 2). This was also found when short-term function was evaluated by the modified Harris Hip Score (mHHS) (MD 0.40; p > 0.05; 95% CI not reported) [21], Harris Hip Score (HHS) (MD − 0.90; 95% CI − 4.84 to 3.04) and Mayo Clinical Hip Score (MD 4.75; 95% CI − 0.48 to 9.98) [23], providing low to very low quality of evidence. A single study evaluated function in the medium term using the Merle d’Aubigne Score (MD − 2.22; 95% CI − 8.40 to 3.95) and HHS (MD − 6.10; 95% CI − 12.34 to 0.14), providing low-quality evidence of no difference between groups [23]. However, this study also presented very-low-quality evidence of a difference between groups favouring active robotic surgery using the Mayo Clinical Hip Score (MD − 9.50; 95% CI − 16.55 to − 2.45). At long term, pooled estimates of function using the Merle d’Aubigne Score (MD − 1.25; 95% CI − 3.90 to 1.41) and combined mHHS and HHS (MD − 2.90; 95% CI − 9.04 to 3.24) provided low-quality evidence of no difference between groups (Fig. 2). We also performed a sensitivity analysis including only RCTs (i.e. removing non-randomised comparative studies) and found similar pooled estimates for the Merle d’Aubigne Score (MD − 3.96; 95% CI − 9.32 to 1.40) and combined mHHS and HHS (MD − 4.62; 95% CI − 13.32 to 4.09) (Online Resource 5). Lim et al. assessed function using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and reported no difference between groups (MD − 1.0; 95% CI − 2.47 to 0.47). However, low-quality evidence of an increase in function, favouring active robotic surgery, was reported when assessed by the Mayo Clinical Hip Score (MD − 12.88; 95% CI − 18.27 to − 7.48) [23] and Japanese Orthopaedic Association (JOA) Score (MD − 2.00; 95% CI − 3.89 to − 0.11) [25].

Table 3 Summary of findings and quality of evidence assessment (GRADE)

Fig. 2 Mean difference for functional outcomes in trials comparing active robotic versus conventional total hip arthroplasty. Studies are ordered chronologically within functional outcomes. Short-term indicates follow-up periods < 3 months. Long-term indicates follow-up periods > 12 months. Negative values favour active robotic surgery. MD mean difference. *modified Harris Hip Score

Pain was assessed in three studies [22, 25, 26], including 344 hip arthroplasties (Table 3). Low-quality evidence was found in one study (n = 156) that reported no significant difference between groups when patients reported whether they experienced thigh pain at short term (RR 2.75; 95% CI 0.91 to 8.27) [26] and at medium term (RR 1.50; 95% CI 0.26 to 8.73) [26]. The pain domain of the Merle d’Aubigne Score was reported in one study (n = 58) at long term with no difference between groups (MD 0.00; p > 0.05; 95% CI not reported) [22]. Nakamura et al. (n = 130) investigated whether patients experienced pain in their thigh (RR 4.23; 95% CI 0.48 to 36.9) or knee (only two patients in the robotic group reported knee pain) and found no differences between robotic and conventional surgery at long term [25].

Bargar et al. (n = 103) evaluated QOL using the Short Form 36 Health Survey (SF-36) overall scaled score in the short (MD 2.60; p > 0.05; 95% CI not reported), medium (MD 3.20; p > 0.05; 95% CI not reported) and long term (MD − 2.40; p > 0.05; 95% CI not reported), and provided very low quality of evidence of no difference between groups [21] (Table 3).

THA semi-active robotic versus conventional surgery

One study (n = 200) compared long-term outcomes of semi-active robotic versus conventional THA surgery and provided low-quality evidence for all outcomes [28] (Table 3). No difference between groups was found when function was evaluated by the mHHS (MD − 6.00; 95% CI − 9.78 to 2.22) and WOMAC (MD − 1.30; 95% CI − 5.51 to 2.91).

In terms of QOL (Short Form 12 Health Survey), there was no difference between groups in both the physical component summary (PCS) (MD − 1.60; 95% CI − 4.58 to 1.38) and the mental component summary (MCS) (MD − 1.60; 95% CI − 4.28 to 1.08) at long term.

TKA active robotic versus conventional surgery

Four studies reported data from 282 TKAs comparing active robotic to conventional surgery (Table 3) [29,30,31,32]. In the short term, one study (n = 60) provided very-low-quality evidence of no significant difference between groups when evaluating function using the Hospital for Special Surgery (HSS) Score (MD − 0.60; 95% CI − 3.97 to 2.77) and WOMAC (MD 0.40; 95% CI − 5.77 to 6.57) [31]. Low quality of evidence of no difference between groups in the medium term was reported when estimates of the Knee Society Function Score (KSS-F) and HSS function scores were pooled (MD 0.04; 95% CI − 2.94 to 3.01) (Fig. 3). No difference between groups was reported in the medium term when function was assessed by the WOMAC (MD 0.20; 95% CI − 5.14 to 5.54) [31] and Oxford Knee Score (OKS) (MD 0.63; 95% CI − 1.17 to 2.42) [29]. In the long term, all four studies reported low quality of evidence of no difference between groups when estimates of function were pooled using the KSS-F and HSS (MD − 0.51; 95% CI − 1.83 to 0.82) or KSS-F and WOMAC (MD − 0.51; 95% CI − 1.95 to 0.94) (Fig. 3). No difference between groups was also found when function was evaluated by the OKS (MD 1.25; 95% CI − 0.16 to 2.66) [29], providing very low quality of evidence.

Fig. 3 Mean difference for functional outcomes in trials comparing active robotic versus conventional total knee arthroplasty. Studies are ordered chronologically within functional outcomes. Medium term indicates follow-up periods between 3 and 12 months. Long term indicates follow-up periods > 12 months. Negative values favour active robotic surgery. MD mean difference, WOMAC Western Ontario and McMaster Universities Osteoarthritis Index. *Knee Society Score-Function

Patient response to pain was reported by one study (n = 60) which found no significant differences between groups at medium (MD 4.80; 95% CI − 8.90 to 18.50) and long term (MD − 3.90; 95% CI − 16.48 to 8.68) when responding to the SF-36 Bodily Pain Score [29].

QOL was evaluated by one study (n = 60) that reported the PCS and MCS scores of the SF-36, providing very low quality of evidence [29]. In the medium term, no difference between groups was found in the PCS (MD 0.50; 95% CI − 4.76 to 5.76) or MCS (MD − 4.40; 95% CI − 9.08 to 0.28). The same was found in the long term for the PCS (MD − 4.10; 95% CI − 9.61 to 1.41) and MCS (MD − 4.60; 95% CI − 9.69 to 0.49) (Table 3).

Patient satisfaction with the outcome of surgery was investigated by one study (n = 60), which reported no significant difference between groups at medium (RR 0.99; 95% CI 0.83 to 1.17) and long term (RR 0.96; 95% CI 0.82 to 1.12) [29].

UKA semi-active robotic versus conventional surgery

Two studies compared semi-active robotic to conventional surgery in patients undergoing UKA (n = 105) (Table 3) [33, 34]. Coon et al. evaluated function in the short term using the combined Clinical and Function Knee Society Score (cKSS) and reported no difference between groups (MD not reported; p > 0.05; 95% CI not reported) [34]. The cKSS was assessed by two studies, evaluating function in the medium term. Cobb et al. (n = 28) found a difference between groups favouring semi-active robotic surgery (MD not reported; p = 0.004; 95% CI not reported) [33], whereas Coon et al. (n = 77) found no difference between groups (MD not reported; p > 0.05; 95% CI not reported) [34], with both studies providing very low quality of evidence. Cobb et al. also reported no differences between groups when function was measured by the WOMAC (MD not reported; p = 0.06; 95% CI not reported) [33].

Pain was assessed in both studies (n = 105), and no significant difference between groups was found at medium term using the pain component of the cKSS (MD not reported; p > 0.05; 95% CI not reported) [34] or WOMAC (MD not reported; p > 0.05; 95% CI not reported) [33].

Discussion

Statement of principal findings

This systematic review and meta-analysis demonstrates that comparable patient-reported outcomes are achieved at short, medium and long term in hip or knee arthroplasty when performed by a semi-active or active robotic system compared to conventional surgery. No study reported outcomes favouring conventional surgery over robotic surgery. Two individual studies showed that active robotic total hip arthroplasty provided significantly better functional outcomes at medium- and long-term follow-up, and one study reported that semi-active robotic unicompartmental knee arthroplasty provided better functional outcomes at medium-term follow-up. No significant difference in pain, QOL and satisfaction with surgery was reported in individual studies.

Strengths and weaknesses of the study

The strengths of this review include the adherence to a pre-specified protocol registered on PROSPERO, inclusion of all comparative papers and following of the PRISMA recommendations including the use of the GRADE system to appraise the quality of evidence. We also assessed each study’s risk of bias with the Downs and Black Quality Checklist for Health Care Intervention Studies, which has been shown to have acceptable validity and reliability [14]. To facilitate the interpretation of our meta-analysis, we provided precise estimates and clinically interpretable scores on a 0–100 scale. Furthermore, the review applied no restrictions on the publication language, date of publication or patient-reported outcome measure employed, and we contacted an expert in the field to ensure relevant studies were not missed.

However, we were only able to identify a small number of studies (n = 14), half of which were published over a decade ago. As a result, we encountered difficulty in obtaining full data sets, and advances in technology since publication may have influenced the reported results [35]. This, combined with the assessed quality of included studies, meant that the level of evidence across all studies was low or very low. Furthermore, the inclusion of non-randomised comparative studies, and the heterogeneity of the surgical approaches regarding the instrumentation, manual rasping technique and prostheses implanted, may have also influenced the results presented. In addition, heterogeneity of outcomes employed meant that scores could not always be pooled in our meta-analysis. We also found that while pain was explicitly reported by six studies, only two studies provided quantifiable results to explore pain intensity.

Strengths and weaknesses in relation to other studies

To our knowledge, this is the first systematic review and meta-analysis to rigorously evaluate the effectiveness of semi-active or active robotic hip and knee arthroplasty compared to conventional surgery on patient-reported outcomes. Previous systematic and narrative reviews in this area have commonly provided general overviews or summaries, with few employing a systematic approach. Many of these articles have focussed on knee arthroplasty in terms of the clinical outcomes associated with this technology, with secondary reference to patient-reported outcome measures [4, 5, 7,8,9]. Additionally, results from three new studies, of which two are RCTs, have also been included in our review [24, 28, 29].

Meta-analysis of results from these studies through collation of individual study results, namely those from consistent or correlated measures, had not been attempted previously. However, Karthik et al. [8] and Jacofsky et al. [5] presented tables descriptively summarising the overall results from multiple studies, and based on their individual study summaries, the pattern of no difference between groups was broadly consistent with the results of this review.

When collating these results, semi-active and active robotic systems were evaluated separately. While the differences between semi-active and active robotic systems are well established in the literature [5, 6], previous reviews have commonly grouped the two in an attempt to provide an overall interpretation of robotic surgery. Jacofsky et al. [5] attempted to distinguish between semi-active and active robotic systems and then evaluated the individual robotic devices within each group. Similarly, in our review, we grouped studies according to robotic system (i.e. semi-active or active), surgery performed (i.e. THA, TKA or UKA) and patient-reported outcomes (i.e. function, pain, QOL or patient satisfaction).

Unanswered questions and future research

This systematic review found a paucity of studies comparing the effectiveness of robotic and conventional surgery in terms of patient-reported outcomes. A lack of consistency of outcome measures in these studies further compounded the difficulty in pooling the data for meta-analysis and establishing recommendations. This is a well-documented phenomenon in arthroplasty, with governing bodies [36, 37] and international experts [38, 39] supporting the establishment of standardised questionnaire sets to be collected.

Additionally, pain was only reported in six studies [22, 25, 26, 29, 33, 34], and comparable outcome measures were not employed. While many outcome measures employed by the included studies evaluated pain as a component of their overall score (e.g. HHS, Merle d’Aubigne, WOMAC, OKS), explicit discussion of this was lacking. With pain being one of the primary reasons for consideration of arthroplasty [40,41,42], future studies should aim to comprehensively assess pain using standardised measures. Furthermore, a proper sample size calculation should be employed.

While the results of this review indicate that robotic and conventional surgery are comparable in terms of patient-reported measures of function, pain, QOL and satisfaction, the true effect of robotic surgery on these outcomes remains uncertain. Due to the low or very low quality of evidence presented across the included studies, further high-quality research may change the results of this review and caution should be taken when interpreting our results. Whenever possible, future studies should investigate the effect of different surgical approaches on patient-reported outcomes and further evaluate the influence of robotic surgeries on clinical outcomes such as length of stay and intra-operative and post-operative complications [43]. Key study design components to address in future include consistency in outcome measures employed, as suggested by governing bodies [36, 37] and experts [38, 39], and increased power via larger sample sizes, particularly in knee arthroplasty studies.

Conclusion

In conclusion, this review’s findings indicate that post-operative functional outcomes for patients undergoing robotic or conventional total hip and knee arthroplasty are comparable. However, evidence of significant improvements in function has been reported in single studies following robotic surgery at medium- and long-term follow-up. Whether these results translate to improvements in post-operative pain, quality of life and satisfaction with surgery remains unclear.