Introduction

There has been a growing interest in variations in surgical approach as a means of improving early function and reducing postoperative pain in total hip arthroplasty (THA) [1]. The posterior approach and the lateral (direct and anterolateral) approaches account for almost 90% of cases worldwide, while the direct anterior approach (DAA) accounts for up to 10% [2].

The popularity of the DAA has increased dramatically over the last 15 years. Proponents of the DAA cite its muscle-sparing nature and its potential to reduce postoperative pain and improve early postoperative function [3]. However, the literature remains mixed regarding these perceived benefits, and concerns remain about the reported higher rates of complications, including early reoperation, postoperative wound complications and intraoperative fractures [4].

Several reviews have compared outcomes of THA based on surgical approach. However, the majority of these reviews include non-randomized studies, do not focus on short-term outcomes, include fewer studies and do not compare all major surgical approaches [5,6,7,8]. Furthermore, previously published reviews do not consider the minimal clinically important difference (MCID) to determine whether the differences found are clinically relevant. Two network meta-analyses (NMAs) have compared surgical approaches in THA. Docter et al. [6] examined the differences in complications among surgical approaches. However, this review was limited by the lack of patient-reported outcomes and the inclusion of retrospective cohorts, reducing the quality of the evidence. Putananon et al. [7] also performed an NMA comparing various surgical approaches and included 14 randomized controlled trials (RCTs). Given that a substantial amount of new information has become available, with seven new RCTs published in the last 3 years, an updated review is needed.

The purpose of this systematic review and network meta-analysis is to compare the short-term outcomes of the common surgical approaches (DAA, posterior approach [PA], direct lateral [DL] and anterolateral [AL]) up to 12 weeks postoperatively. Specifically, the aim is to answer the following questions: (1) Which surgical approach results in the highest functional outcome scores? (2) Which surgical approach results in the lowest postoperative pain scores and opioid consumption? (3) Are there differences in reported surgical complications and reoperation rates among the common surgical approaches?

Materials and methods

Protocol and registration

This systematic review and NMA was performed according to the guidelines set out by PRISMA and the Cochrane Collaboration for performing and reporting network meta-analyses [9]. The review was registered in the PROSPERO database prospectively.

Eligibility criteria

All inclusion and exclusion criteria were defined a priori. Randomized controlled trials involving skeletally mature patients undergoing primary total hip arthroplasty and comparing at least two different surgical approaches were included. Specific inclusion criteria were (1) adult patients ≥ 18 years old, (2) undergoing primary THA, (3) randomized controlled trials, (4) comparison of two or more different surgical approaches, (5) extractable outcomes of interest including pain, functional outcomes and opioid consumption up to 12 weeks. Exclusion criteria consisted of (1) non-randomized studies, (2) patients undergoing THA for fracture, (3) studies without the aforementioned outcomes of interest, and (4) studies in which the surgical approach was not defined. Studies that compared the same fundamental surgical approach performed in a minimally invasive fashion vs. the standard approach were not considered for inclusion, given the variability in what is considered minimally invasive and the lack of differences previously reported in the literature.

Search strategy

A systematic literature search of PubMed, MEDLINE, Embase, Web of Science and SCOPUS was undertaken from inception to May 7th, 2020 (Supplementary Appendix). Search terms included “total hip arthroplasty”, “direct anterior approach”, “anterolateral approach”, “Watson-Jones”, “lateral approach”, “Hardinge Approach”, “Posterior Approach” and “Posterolateral approach”. All searches were limited to RCTs in humans and no language limits were placed. In addition, retrieved papers and recent reviews were manually assessed.

Study selection

Two blinded reviewers (N.H. and C.K.) independently reviewed and screened all articles at both the title/abstract and full-text stages using Rayyan QCRI (Qatar Computing Research Institute, Doha, Qatar). Discrepancies at both stages were resolved by the lead author (A.G.). Inter-observer agreement for study eligibility was determined using Cohen’s kappa (κ) statistic, and the κ values were interpreted according to McHugh et al. [12].
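As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two reviewers making include/exclude decisions. The function name and the screening decisions are hypothetical and are not taken from this review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical title/abstract screening decisions for ten records.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this made-up example the reviewers agree on 8 of 10 records, but because most records are excluded by both, the chance-corrected agreement is considerably lower than the raw 80%.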

Data extraction

Prior to formal extraction, a collaborative data-extraction tool was created. The extracted data included study and protocol characteristics, intervention specifics, length of stay, postoperative functional outcome and pain scores, opioid consumption and postoperative complications.

The main outcomes of interest were postoperative function at 6 and 12 weeks and postoperative pain scores on postoperative days 1 (POD1) and 2 (POD2), and at 2 weeks and 6 weeks. When these timepoints were unavailable, we utilized the closest data point available. Cumulative postoperative opioid consumption was recorded when available. Postoperative complications were recorded by type and incidence. The following postoperative complications were compared between groups: total reoperations, intraoperative fractures, wound complications, deep infections and dislocations.

Network geometry

A network diagram was created to visually depict the network geometry in terms of the number of different surgical approaches used, the frequency at which they were evaluated and the direct and indirect comparisons made. The network diagram included nodes specific to the different surgical approaches. These nodes were weighted based on the frequency that they were performed. Weighted links were used to represent the number of studies that compared the connected nodes.
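The weighting described above can be sketched as a simple tally. The example below assumes that node weight is the number of trials evaluating an approach and that link weight is the number of trials directly comparing two approaches; the trial list is hypothetical and the function name is illustrative only.

```python
from collections import Counter
from itertools import combinations


def network_geometry(trials):
    """Count node and link weights from a list of trials.

    Each trial is a tuple of the approach labels it compared.
    """
    node_weights = Counter()
    link_weights = Counter()
    for arms in trials:
        approaches = sorted(set(arms))
        node_weights.update(approaches)
        # Every pair of approaches within a trial is one direct comparison.
        for pair in combinations(approaches, 2):
            link_weights[pair] += 1
    return node_weights, link_weights


# Hypothetical two-arm trials (not the trials included in this review).
example_trials = [
    ("DAA", "PA"),
    ("DAA", "DL"),
    ("DL", "PA"),
    ("DAA", "PA"),
    ("AL", "DL"),
]
nodes, links = network_geometry(example_trials)
```

Pairs that never appear in `links` (here AL vs. DAA and AL vs. PA) are the comparisons that can only be estimated indirectly through the network.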

Study appraisal

The Cochrane risk of bias assessment tool was used by two reviewers (N.H. and C.K.) to assess the methodological quality of each study [13]. Each domain was assessed and determined to be at low, unclear or high risk of bias. Each study was given an overall risk of bias grade.

Measures of treatment effect

As per the guidelines set out by the Grades of Recommendation Assessment, Development and Evaluation (GRADE), all scores for each measured outcome were converted to the scale of the most commonly reported instrument, if applicable [14]. According to GRADE guidelines, this is the preferred method for combining continuous outcomes when a comparison to the MCID is important, and when most studies already report outcomes in the most familiar scale [14]. The visual analogue scale (VAS, 0–10) for pain and the Harris Hip Score (HHS, 0–100) for function were the most common scales used in the included studies; therefore, all other scales were converted to the VAS and HHS, respectively [15, 16]. After data conversion, mean differences (MDs) with 95% credible intervals (CrIs) were calculated and reported accordingly. The common scale allowed for consideration of the MCID. The MCID for the VAS was set at 1.9 based on previously defined values [17]. The MCID for the HHS was set at 7–10 based on previous calculations [18]. The majority of studies did not report data as a change from baseline; therefore, a direct comparison of change from baseline across interventions was not performed.
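To make the conversion and MCID comparison concrete, the sketch below assumes a simple linear rescaling between instruments whose scores run in the same direction. This is an assumption made for illustration; the exact conversion rules applied in this review are not reproduced here, and the function names are hypothetical.

```python
def rescale(score, old_min, old_max, new_min, new_max):
    """Linearly map a score from one instrument's range onto another's."""
    if old_max <= old_min:
        raise ValueError("old_max must be greater than old_min")
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


def rescale_sd(sd, old_min, old_max, new_min, new_max):
    """A standard deviation scales by the ratio of the two range widths."""
    return sd * (new_max - new_min) / (old_max - old_min)


def exceeds_mcid(mean_difference, mcid):
    """True when the absolute mean difference reaches the MCID threshold."""
    return abs(mean_difference) >= mcid


# Hypothetical pain score of 35 (SD 20) reported on a 0-100 scale,
# expressed on the 0-10 VAS.
vas_mean = rescale(35, 0, 100, 0, 10)
vas_sd = rescale_sd(20, 0, 100, 0, 10)

# A mean difference of 0.9 VAS points against the MCID of 1.9 used in this review.
clinically_relevant = exceeds_mcid(0.9, 1.9)
```

For the HHS, where the MCID is given as a range of 7–10, the lower bound could be passed as the threshold for a less conservative check and the upper bound for a more conservative one.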

Managing missing data

If the mean or standard deviation (SD) was missing, the mean score at each timepoint was calculated by subtracting the mean difference from the baseline score, and the SD was imputed using the SDs of the other included studies. If values were presented as medians and ranges, they were converted to means and SDs as per the Cochrane handbook and Hozo et al. [19]. For data presented exclusively in graph format, we utilized validated data-extraction software (WebPlotDigitizer, version 4.1; Ankit Rohatgi) to record outcomes.
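The median-and-range conversion can be sketched as follows. The formulas and sample-size thresholds below are the commonly cited rules of thumb attributed to Hozo et al. and should be checked against the original publication before use; the function name and the input values are hypothetical.

```python
import math


def mean_sd_from_median_range(minimum, median, maximum, n):
    """Approximate the mean and SD from a median, range and sample size."""
    if not minimum <= median <= maximum:
        raise ValueError("expected minimum <= median <= maximum")
    # Mean: weighted formula for small samples, otherwise the median itself.
    if n <= 25:
        mean = (minimum + 2 * median + maximum) / 4
    else:
        mean = median
    # SD: variance formula for very small samples, then range/4 and range/6.
    value_range = maximum - minimum
    if n <= 15:
        variance = ((minimum - 2 * median + maximum) ** 2 / 4 + value_range ** 2) / 12
        sd = math.sqrt(variance)
    elif n <= 70:
        sd = value_range / 4
    else:
        sd = value_range / 6
    return mean, sd


# Hypothetical trial arm: median length of stay 5 days (range 2-12), 50 patients.
mean_los, sd_los = mean_sd_from_median_range(2, 5, 12, 50)
```

These approximations are sensitive to skewed data and outliers at the extremes of the range, which is one reason imputed values are usually flagged in sensitivity analyses.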

Network meta-analysis

To directly and indirectly compare the four major surgical approaches, a network meta-analysis was performed. The following outcomes were compared: length of stay, pain scores at POD1, POD2, 2 weeks and 6 weeks, functional outcomes (Harris Hip Score) at 6 weeks and 12 weeks and major complications.

The mean effect size was used to compare outcomes for each surgical approach. Forest plots and ranking diagrams were used to visually depict differences. The surface under the cumulative ranking curve (SUCRA) values were reported. SUCRA values are presented as percentages, with the larger percentages indicating a greater chance of that treatment being the best treatment option for that outcome.
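For readers unfamiliar with SUCRA, the following minimal sketch computes it from a treatment's rank probabilities, that is, the probability of being ranked first, second, and so on for a given outcome. The probabilities shown are hypothetical and are not results from this review.

```python
def sucra(rank_probabilities):
    """SUCRA (%) from the probabilities of holding rank 1 (best) to rank a (worst)."""
    a = len(rank_probabilities)
    if a < 2:
        raise ValueError("at least two treatments are required")
    if abs(sum(rank_probabilities) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    # Sum the cumulative probabilities over the first a - 1 ranks.
    for probability in rank_probabilities[:-1]:
        cumulative += probability
        total += cumulative
    return 100.0 * total / (a - 1)


# Hypothetical rank probabilities for one approach among four.
example_sucra = sucra([0.5, 0.3, 0.15, 0.05])
```

A treatment that is certain to rank first has a SUCRA of 100%, one that is certain to rank last has 0%, and one that is equally likely to hold any rank has 50%.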

Assessment of heterogeneity and transitivity

The network meta-analysis was performed within a Bayesian framework using a random-effects model. Transitivity was assessed across comparative groups by evaluating study characteristics and patient demographics. The I² value was used to describe global heterogeneity across the networks.
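As a simplified illustration of the heterogeneity statistic, the sketch below computes I² from Cochran's Q for a single pairwise comparison. The global I² reported for a Bayesian network is derived differently by the fitting software, so this is an assumption-laden teaching example rather than the procedure used in this review; the effect sizes are hypothetical.

```python
def i_squared(effects, variances):
    """I-squared (%) from study effect estimates and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100.0


# Hypothetical mean differences in HHS from two trials, each with variance 1.
example_i2 = i_squared([0.0, 4.0], [1.0, 1.0])
```

Values near 0% suggest that the observed variation is compatible with sampling error alone, whereas higher values indicate that a larger share of the variation reflects genuine between-study differences.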

Results

Study selection

The results of the search are depicted in Fig. 1. A total of 825 studies were identified after duplicates were removed. Of the 52 articles included in the full-text review, 25 RCTs (n = 2339) were included in the NMA. The agreement between the two reviewers was substantial (κ = 0.78) at the title/abstract stage and moderate (κ = 0.58) at the full-text stage.

Fig. 1 PRISMA diagram

Characteristics of included studies

The characteristics of the included studies are described in Table 1. A total of 17 studies examined the DAA (n = 835), 15 studies examined the DL (n = 727), 13 studies examined the PA (n = 534) and 5 studies examined the AL (n = 243). Figure 2 demonstrates the network comparison of all four surgical approaches. The median sample size of included trials was 87 (range 44–164). The mean age was 62.7 (± 11) years and the proportion of women was 54%.

Table 1 Study characteristics
Fig. 2 Overall network diagram for all treatment comparisons

The risk of bias summary for each included RCT is available in Appendix Fig. 1. Ten trials were considered to be at low risk of bias, three were high risk and 12 studies were deemed to have some concerns regarding risk of bias. Overall, the highest risk of bias was due to concerns regarding the randomization process and measurement of outcome.

Postoperative function

The DAA led to significantly improved HHS at 6-week follow-up when compared to both DL and PA (Fig. 3). However, this improvement failed to reach the previously defined MCID range of 7–10. There was no significant difference in HHS at 12-week follow-up among surgical approaches when compared to the DAA group. At both the 6- and 12-week follow-ups, the DAA had the largest probability of being the best approach for HHS scores.

Fig. 3 Forest plots demonstrating differences in HHS outcomes at 6 and 12 weeks postoperatively. DAA direct anterior approach, AL anterolateral, DL direct lateral, PA posterior approach

Postoperative pain

The only significant differences in postoperative pain scores were at POD2 and 2 weeks, when the DAA recorded significantly less pain than the DL (Fig. 4). However, these differences (0.9 and 1.3) failed to meet the previously defined MCID. The AL had the largest probability of having the lowest pain scores at POD1 and the 12-week follow-up, the DAA had the largest probability of having the lowest pain scores at POD2 and the 6-week follow-up, and the PA had the largest probability of having the lowest pain scores at 2 weeks. The overall ranking based on the SUCRA values is shown in Table 2.

Fig. 4 Forest plots demonstrating differences in VAS pain scores at postoperative day 1 (A) and 2 (B), 2-week (C) and 6-week (D) follow-up. DAA direct anterior approach, AL anterolateral, DL direct lateral, PA posterior approach

Table 2 SUCRA values

Length of stay

The AL approach had a significantly shorter length of stay (LOS) when compared to the DAA. There was no significant difference in LOS for the DL and PA when compared to the DAA. The AL followed by the DL had the largest probability of the shortest LOS. The overall ranking based on the SUCRA values is shown in Table 2. The mean LOS across the included trials ranged from 0.8 to 12.5 days.

Opioid consumption

The variability in reporting did not allow for quantitative pooling of results with regard to postoperative opioid consumption. Five trials reported on inpatient opioid consumption while one study examined the total opioid consumption at 2-week follow-up. Barret et al. [20] compared DAA to PA and found no significant differences in postoperative opioid consumption on postoperative days 1 and 2. Taunton [21] demonstrated significantly less inpatient postoperative opioid use in the DAA compared to PA. Cheng et al. [22] demonstrated lower overall opioid consumption in patients undergoing the DAA compared to PA at the 2-week follow-up mark. Brismar [23] and Nistor [24] found that the DAA had lower cumulative opioid consumption during the inpatient postoperative period compared to the DL. Mjaaland [25] demonstrated a reduction in opioid consumption on the day of surgery with the DAA but no lasting differences when compared to the DL. Martin [26] demonstrated no difference in postoperative opioid consumption between the AL and DL approaches.

Complication rates

Complication rates were recorded in 19/25 studies (n = 1723). There were a total of 20 reoperations, 24 intraoperative fractures, 21 wound complications, 12 dislocations and 8 deep infections. With the exception of lateral femoral cutaneous nerve (LFCN) palsies, there were no significant differences in the aforementioned complications between the approaches. When combined into a composite of major complications (reoperations, intraoperative fractures, wound complications, dislocations and deep infections), there was no significant difference among the included approaches. With respect to major complications, the DL had the lowest probability of complications followed by the DAA, PA and finally the AL (Table 2).

Discussion

This review demonstrates that there were no clinically significant differences in functional outcomes between the approaches, as the improvements seen with the DAA failed to meet MCID cutoffs. Pain scores varied throughout the postoperative course. The AL had a significantly shorter length of stay when compared to the DAA. There were no significant differences in major postoperative complications between the four THA approaches.

The DAA has seen a resurgence in popularity in recent years due to its purported benefits of reduced pain and earlier recovery [27]. The DAA has been marketed as a minimally invasive operation with reduced pain, quicker recovery and decreased dislocation rates [27]. Given the lack of long-term functional differences demonstrated between surgical approaches, this review opted to focus on the potential short-term differences that may be important to the changing demographic of patients undergoing THA [28]. Given the rise in demand for THA in younger patients, early functional recovery may be an important consideration for some patients, particularly those returning to work or sport following surgery [29]. However, the results of the current study do not support the notion of clinically meaningful differences in functional outcomes between surgical approaches.

The differences in functional outcomes failed to meet the previously defined MCID for patients undergoing THA. However, there is a lack of a standardized MCID for comparing different surgical approaches in THA. The majority of MCIDs quoted in the literature compare pre- to postoperative functional scores rather than differences in postoperative scores between interventions (i.e., surgical approach) [30]. Concerns have also been raised about the validity of the HHS due to the potential ceiling effect associated with this scoring system. The ceiling effect occurs when several patients achieve the highest possible score, which is common in this patient population [31]. In addition, the HHS was designed in the 1960s and may not be appropriate for the changing demographic of patients receiving THA [31]. Given these limitations, the extent to which patients can appreciate the differences found in the HHS scores is unclear. Perhaps a more focused patient-reported outcome measure needs to be developed to discern the differences between surgical approaches that patients have qualitatively described [32]. For example, the relatively new Forgotten Joint Score appears to be more responsive to higher levels of functional outcomes and is impacted less by the ‘ceiling effect’ when compared to traditional scoring systems [33, 34].

The improvements in early functional outcomes did not translate to reduced LOS in DAA patients. These results differ from previous reviews suggesting a reduced LOS in patients undergoing DAA [35]. Given the wide range of mean LOS of the included studies, the lack of differences found may be secondary to a host of other variables and not the surgical approach itself. The LOS in THA patients has decreased significantly over the last decade which may account for some of the variability in the included trials [36]. Similarly, higher surgeon and hospital volume have been shown to reduce the LOS in patients undergoing total joint arthroplasties [37, 38]. Finally, this review included patients from a range of countries and health care systems which have been shown to impact LOS in the past [39].

Despite the widespread notion of reduced postoperative pain with the DAA, this approach did not demonstrate a meaningful reduction in pain scores when compared to the other common approaches. These results mirror recent non-randomized cohorts in which the DAA led to statistically significant but not clinically significant reductions in postoperative pain [40]. Although the DAA is muscle-sparing, studies comparing postoperative inflammatory markers have shown no differences between surgical approaches [41]. This may be secondary to the increased stretching of the musculature required to visualize the bony anatomy in the DAA, which may actually increase inflammation to a greater extent than detaching the musculature [41].

The majority of studies reporting on opioid use demonstrated lower consumption in the DAA when compared to the DL and PA. Seah et al. [42] demonstrated similar results and found that the DAA led to significantly less daily opioid consumption when compared to both the DL and PA. These results showed 13% of opioid-naïve patients undergoing THA continue to use opioids at 1 year postoperatively, highlighting the risks of opioid prescription and consumption in the perioperative period. Given the current opioid consumption and the high proportion of patients who remain on chronic opioids following surgery, the differences in postoperative opioid consumption between surgical approaches warrant attention. Future trials should document postoperative opioid consumption to develop a deeper understanding of the differences in opioid consumption among surgical approaches in both the short and long term.

The current review demonstrates no significant differences in complication rates across surgical approaches, which differs from previous reports in the literature [43]. There are several possible reasons why the results of the current review differ. First, given the low complication rates seen with THA, the included studies were not adequately powered to detect differences in complication rates. Second, with the exception of Nistor et al. [24], the studies included in this review did not include learning cases. The included trials also had relatively small sample sizes and are underpowered to detect differences in complications, given their rarity in THA. Given this, the current findings must be interpreted with caution.

This review is strengthened by its comprehensive nature and rigorous adherence to the PRISMA guidelines. This review consists of exclusively Level 1 evidence providing the highest quality of evidence available. A major limitation of this NMA is the relatively small sample size of included RCTs, which should be a consideration when designing future trials [44].

The current review suggests that there were no clinically relevant short-term functional differences between surgical approaches as the improvements found with the DAA failed to reach clinical significance. No significant differences in pain scores or complications were found. All major surgical approaches led to large improvements in function by 12 weeks with relatively low complication rates, emphasizing the success of total hip arthroplasty.