Introduction

The world has an aging population and, consequently, increasing pressure on healthcare budgets. Hip fracture is a serious injury that mostly occurs in older patients. Every year, there are 1.3 million hip fractures worldwide, including more than 70,000 in the UK [1]. These figures are projected to rise to over 100,000 by 2020 in the UK [1] and more than six million by 2050 worldwide [2]. The global cost of this clinical problem is estimated at 1.75 million disability-adjusted life years lost and represents 1.4% of the total healthcare burden in established market economies [3].

A hip fracture is a potentially catastrophic event; approximately 25% of patients will die during the first year following this injury, and those who survive will have a significant reduction in their quality of life. At present, hip fractures constitute a major health burden, as they account for 24% of all fractures in the elderly [4]. Many forms of arthroplasty are used to manage displaced fractures of the femoral neck in the elderly, including hemiarthroplasty and different designs of total hip arthroplasty including dual mobility implants [5,6,7,8]. Nevertheless, the current literature lacks agreement regarding the best implantation technique (cemented or uncemented) for the hip fracture population. Most of the literature that guides practice consists of relatively old studies that compared first-generation prostheses such as the Austin Moore and Thompson implants. Moreover, these studies have been criticized for their size and poor eligibility criteria, as well as poor randomization, inadequate follow-up, and suboptimal reporting of clinical and functional outcomes [9, 10].

The aim of this systematic review and meta-analysis of randomized controlled trials (RCTs) and observational studies was to investigate whether contemporary cemented or cementless hemiarthroplasty demonstrates superior outcomes, excluding obsolete prostheses such as the Thompson and Austin Moore implants. We hypothesized that, although there might be no difference in hip function between the two groups, contemporary cemented hemiarthroplasties would be associated with higher costs but fewer perioperative complications.

Methods

We followed the PRISMA statement guidelines during the preparation of this systematic review and meta-analysis. Moreover, all steps were performed in strict accordance with the Cochrane Handbook for Systematic Reviews of Interventions [11].

Literature search strategy

We searched the electronic medical databases PubMed, Embase, Cochrane Library, Scopus, EBSCO, and Web of Science through May 2017 using the following keywords: “Hemiarthroplasty,” “arthroplasty,” “femoral neck fractures,” “intracapsular hip fractures,” “hip prosthesis,” “cemented,” “cementless,” “uncemented.” No restrictions by language or publication time were employed. We also checked the clinical trial registry (REF) for additional ongoing and unpublished studies. Additionally, we searched the reference lists of the most relevant articles.

Eligibility criteria

We included studies meeting the following inclusion criteria: Randomized controlled trials (RCT) and observational studies; patients older than 70 years with femoral neck fractures; cemented and uncemented hemiarthroplasty (CH; UCH); all designs of contemporary hemiarthroplasty. We excluded studies based on the following criteria: Thompson and Austin Moore prosthesis used, patients with a previous fracture of the same hip or with a pathological fracture, non-English studies, and duplicate references.

Selection of studies

Three authors independently applied the selection criteria. Eligibility screening was conducted in two steps: screening of titles and abstracts against the inclusion criteria, followed by full-text screening for eligibility for meta-analysis. Disagreements were resolved by discussion and, where necessary, by consultation with the senior author.

Outcomes of interest

We included studies reporting at least one of the following outcomes: Post-operative hip function with a follow-up of three months, one year, and five years; post-operative pain scores; re-operation and revision rate; implant-related complications including intra-operative fractures, periprosthetic fractures, dislocation, loosening of prosthesis, wound infection, and wound haematoma formation; operative details including operative duration, intra-operative blood loss, and numbers of patients requiring blood transfusion; hospital stay; and cost.

Data extraction

Three reviewers independently extracted and tabulated data on first author, publication year, study design, number of participants in each group, mean age, gender, type of intervention including type of prosthesis, study period, follow-up period, and relevant outcomes data. Another senior reviewer resolved disagreements, and reasons for exclusion were recorded.

Risk of bias assessment

For clinical trials, two authors independently applied the Cochrane Risk of Bias (ROB) assessment tool [11]. For observational studies, we used the Newcastle Ottawa scale (NOS) [12], and each included study was assessed on three essential domains: (a) selection of the study subjects, (b) comparability of groups on demographic characteristics and important potential confounders, and (c) ascertainment of the pre-specified outcome (exposure/treatment). To assess the risk of bias across included studies, we compared the reported outcomes between all studies to exclude selective reporting of outcomes. To investigate the possibility of publication bias, we used Egger’s test [13] and the funnel plot method. In case of significant publication bias, the trim and fill method was used for correction and the effect estimate was recalculated accordingly.
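To illustrate the idea behind Egger’s test, the minimal Python sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); an intercept far from zero suggests funnel plot asymmetry. This is an illustration under stated assumptions, not the routine implemented in the software used for this review: the function name `egger_intercept` and the example inputs are hypothetical, effects for ratio measures are assumed to be on the log scale, and the p value (obtained from a t distribution with n − 2 degrees of freedom) is left out to keep the sketch dependency-free.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger regression of (effect / SE) on (1 / SE).

    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    Hypothetical helper for illustration only.
    """
    n = len(effects)
    if n < 3 or n != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")

    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Ordinary least squares fit: y = intercept + slope * x
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the intercept from the residual variance
    df = n - 2
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt((residual_ss / df) * (1.0 / n + mean_x ** 2 / sxx))

    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, df


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors, not data from this review
    log_rr = [-0.35, -0.10, -0.60, -0.25, -0.90]
    se = [0.10, 0.15, 0.30, 0.20, 0.45]
    print(egger_intercept(log_rr, se))
```

The slope of the same regression estimates the underlying effect, which is why studies that share one true effect and differ only in precision yield an intercept close to zero.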

Assessment of heterogeneity

We tested for heterogeneity among included studies using the chi-square and I-square tests. The chi-square test was used to detect significant heterogeneity, while I-square quantifies the proportion of variability in effect estimates that is due to heterogeneity rather than chance. The I-square statistic was interpreted according to the recommendations of the Cochrane Handbook for Systematic Reviews of Interventions (0–40%, might not be important; 30–60%, may represent moderate heterogeneity; 50–90%, may represent substantial heterogeneity; and 75–100%, considerable heterogeneity) [11]. A fixed-effect model was used if no significant heterogeneity was present (I2 < 50%; p > 0.1). Otherwise, a random-effects model was used, and a sensitivity analysis was conducted if heterogeneity existed among the studies.
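The heterogeneity assessment described above can be sketched in a few lines of Python. The example computes Cochran's Q (the chi-square heterogeneity statistic), its p value, and I-square from study-level effects and standard errors, then applies the decision rule stated in this section (I2 < 50% and p > 0.1 selects a fixed-effect model). It is a sketch under assumptions rather than the procedure of the software used here: the function names and inputs are hypothetical, inverse-variance weights are assumed, and the chi-square tail probability is computed with a closed-form expression valid for integer degrees of freedom.

```python
import math


def heterogeneity(effects, std_errors):
    """Return (fixed-effect pooled estimate, Cochran's Q, df, I-square in %)."""
    if len(effects) < 2 or len(effects) != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i2


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def choose_model(i2, p_value):
    """Decision rule from the text: fixed-effect if I2 < 50% and p > 0.1."""
    return "fixed" if (i2 < 50.0 and p_value > 0.1) else "random"


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors, not data from this review
    log_rr = [-0.40, -0.15, -0.55, -0.30]
    se = [0.20, 0.25, 0.30, 0.15]
    pooled, q, df, i2 = heterogeneity(log_rr, se)
    p = chi2_sf(q, df)
    print(pooled, q, df, i2, p, choose_model(i2, p))
```

Because Q has low power when few studies are pooled, the rule uses the more lenient threshold of p > 0.1 rather than 0.05 alongside I-square.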

Data analysis

As the analysis included interventional and observational studies, we stratified the outcomes according to the study design. In case of missing standard deviation (SD), we calculated it from the corresponding standard error or confidence interval according to Altman [14]. For dichotomous data, we calculated relative risks (RR) and 95% confidence intervals (CI) for each outcome. For continuous data, we calculated mean differences (MD) and 95% CIs for each outcome. The statistical analysis was conducted using Comprehensive Meta-Analysis software (version 3, Biostat, USA, 2015). A p value < 0.05 was considered statistically significant.
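The calculations named above follow standard large-sample formulas, sketched below in Python. The sketch shows how an SD can be recovered from a standard error or a 95% confidence interval of a mean, and how a relative risk or mean difference and its 95% CI are obtained for a single study. All function names and inputs are hypothetical, a normal approximation with z = 1.96 is assumed throughout (small samples would call for a t quantile instead), and the code is not taken from the software used for this analysis.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumption)


def sd_from_se(se, n):
    """SD of a sample from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=Z_95):
    """SD from a confidence interval of a mean: mean +/- z * SD / sqrt(n)."""
    return math.sqrt(n) * (upper - lower) / (2.0 * z)


def relative_risk(events_a, total_a, events_b, total_b, z=Z_95):
    """Relative risk of group A versus group B with a CI built on the log scale."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1.0 / events_a - 1.0 / total_a
                       + 1.0 / events_b - 1.0 / total_b)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se_log), math.exp(log_rr + z * se_log)


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=Z_95):
    """Mean difference (A minus B) with its confidence interval."""
    md = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return md, md - z * se, md + z * se


if __name__ == "__main__":
    # Hypothetical single-study inputs, not data from this review
    print(sd_from_se(2.0, 25))
    print(relative_risk(10, 100, 20, 100))
    print(mean_difference(80.0, 10.0, 100, 75.0, 10.0, 100))
```

The relative risk formula requires at least one event in each group; studies with zero events need a continuity correction, which this sketch does not apply.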

Results

Demographics and characteristics

Our search yielded 872 unique studies, of which 29 studies were included in our meta-analysis. The 29 included studies (9 RCTs and 20 observational studies) investigated a total of 42,046 patients. Thirty-two thousand one hundred eighty-six hips underwent cemented hemiarthroplasty (77%), and 9860 underwent uncemented hemiarthroplasty (23%). The flow diagram of study selection is shown in Fig. 1.

Fig. 1 Flow diagram of study selection

Risk of bias assessment

Six out of nine RCTs achieved adequate random sequence generation and allocation concealment, and seven maintained blinding. Incomplete outcome reporting was at low risk of bias in eight RCTs, and selective reporting was at low risk in seven RCTs (Fig. 2a). Observational studies achieved a mean of 7 out of 9 points on the NOS, indicating moderate quality (Fig. 2b). In summary, the quality of included studies ranged from moderate to high.

Fig. 2 Risk of bias assessment of the included studies

Post-operative hip function

Various functional scores were used to assess hip function in the included studies. Six studies reported hip function score at three months, one year, and five years post-operatively. Pooled effect size showed no statistically significant difference between the CH and UCH groups at three months (MD = 1.81, 95% CI [− 2.36, 5.97], p = 0.40), at one year (MD = 0.37, 95% CI [− 2.40, 3.14], p = 0.80), or at five years (MD = − 3.72, 95% CI [− 14.53, 7.09], p = 0.50).

Post-operative pain

Pooled effect size of six studies (2 RCTs and 4 observational studies; n = 12,182) showed no significant difference between the CH and UCH groups (RR = 0.72, 95% CI [0.50, 1.05], p = 0.09), with moderate heterogeneity (I2 = 48.07%, p = 0.09). When studies were stratified by study design, the observational studies showed significantly less post-operative pain in the CH group (RR = 0.41, 95% CI [0.23, 0.74], p = 0.003), while RCTs showed no significant difference between the two compared groups (RR = 1.06, 95% CI [0.65, 1.72], p = 0.82).

Re-operation and revision rate

Pooled effect size of 16 studies (5 RCTs and 11 observational studies, n = 14,796) showed no significant difference between CH and UCH (RR = 0.80, 95% CI [0.54, 1.19], p = 0.27), with moderate heterogeneity (I2 = 53%, p = 0.007). This effect estimate was consistent in both observational studies and RCTs when each design was analysed separately. Egger’s test showed no evidence of publication bias (p = 0.37).

Intra-operative fractures

Pooled effect size of nine studies (3 RCTs and 6 observational studies, n = 2189) showed significantly lower intra-operative fractures in CH compared to UCH (RR = 0.36, 95% CI [0.28, 0.45], p < 0.0001), with no evidence of heterogeneity (I2 = 0%, p = 0.56) (Fig. 3).

Fig. 3 Pooled effect size of nine studies showing significantly lower intraoperative fractures in CH compared to UCH
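A reported relative risk and its confidence interval can be checked for internal consistency by back-calculating the standard error on the log scale, as in the hedged Python sketch below. It assumes the interval is symmetric around the log relative risk and was built with z = 1.96; the function name `z_and_p_from_ratio_ci` is hypothetical and the approach is a reader's check, not part of the analysis pipeline of this review. Applied to the intra-operative fracture result (RR = 0.36, 95% CI [0.28, 0.45]), it gives a p value far below 0.0001, in line with the reported value.

```python
import math


def z_and_p_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Back-calculate (SE of log ratio, z statistic, two-sided p) from a ratio and its CI.

    Assumes the CI is symmetric on the log scale (hypothetical helper).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(ratio) / se_log
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return se_log, z, p_two_sided


if __name__ == "__main__":
    # Intra-operative fractures, CH versus UCH, as reported in the text
    print(z_and_p_from_ratio_ci(0.36, 0.28, 0.45))
```

Small discrepancies between a back-calculated and a reported p value are expected, because published estimates and interval limits are rounded to two decimal places.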

Periprosthetic fractures

Pooled effect size of six observational studies (n = 1371) showed that CH was associated with a lower incidence of periprosthetic fractures than UCH (RR = 0.44, 95% CI [0.21, 0.91], p = 0.03), with no evidence of heterogeneity among these studies (I2 = 0%, p = 0.74) (Fig. 4).

Fig. 4 Pooled effect size of six observational studies showing CH was associated with lower incidence of periprosthetic fractures than UCH

Dislocations of prosthesis

Pooled effect size of 13 studies (2 RCTs and 11 observational studies, n = 14,696) showed no significant difference between the two groups (RR = 0.70, 95% CI [0.49, 1.01], p = 0.09), with no evidence of heterogeneity (I2 = 0%, p = 0.84). This effect estimate was consistent in RCTs when analysed separately. Subgroup analysis of observational studies showed that the CH group had fewer dislocations than the UCH group (RR = 0.67, 95% CI [0.46, 0.96], p = 0.03). Egger’s test detected significant publication bias (p = 0.005). Following correction with the trim and fill method, the adjusted RR favoured CH over UCH (RR = 0.63, 95% CI [0.44, 0.89]).

Aseptic loosening of prosthesis

Pooled effect size of six observational studies (n = 12,656) showed no statistically significant difference between the CH and UCH groups (RR = 0.57, 95% CI [0.13, 2.48], p = 0.45), with high heterogeneity among these studies (I2 = 67.5%, p = 0.009). A further sensitivity analysis was performed after excluding one observational study [4]. The sensitivity analysis was consistent with the previous analysis and indicated no significant difference between the compared groups (RR = 1.04, 95% CI [0.38, 2.82], p = 0.95), with no substantial heterogeneity (I2 = 2.84%, p = 0.39).

Wound infections

Pooled effect size of 11 studies (4 RCTs and 7 observational studies, n = 12,516) showed no statistically significant difference between the two compared groups (RR = 0.80, 95% CI [0.61, 1.06], p = 0.12), with no evidence of heterogeneity (I2 = 0%, p = 0.79). Subgroup analysis according to study design did not change this result significantly. Egger’s test showed no evidence of publication bias (p = 0.27).

Wound haematoma

Two observational studies (n = 11,254) reported data on wound haematoma. Under the random-effects model, the overall effect size showed no significant difference between the two compared groups (RR = 0.38, 95% CI [0.08, 1.85], p = 0.23).

Heterotopic ossifications

Pooled effect size of two studies (n = 330) showed that CH was associated with a higher incidence of heterotopic ossification than UCH (RR = 1.79, 95% CI [1.11, 2.88], p = 0.02), with no evidence of heterogeneity (I2 = 0%, p = 0.65).

Operative time

Pooled effect size of 16 studies (8 RCTs and 8 observational studies, n = 2679) showed that operative time was shorter in the UCH group than in the CH group (MD = 11.25 min, 95% CI [9.85, 12.66], p < 0.0001), with low heterogeneity (I2 = 25%, p = 0.17). When stratified by study design, this result held in both subgroups. Egger’s test showed no evidence of publication bias (p = 0.40) (Fig. 5).

Fig. 5 Egger’s test showing no evidence of publication bias

Intra-operative blood loss

Pooled effect size of seven studies (5 RCTs and 2 observational studies, n = 1955) showed that intra-operative blood loss was significantly higher in CH than in UCH (MD = 68.72 ml, 95% CI [50.76, 86.69], p < 0.0001), with moderate heterogeneity (I2 = 42.17%, p = 0.11). This result held in subgroup analysis of observational studies (MD = 77.40 ml, 95% CI [43.73, 111.06], p < 0.0001) and RCTs (MD = 65.27 ml, 95% CI [44.03, 86.51], p < 0.0001).

Post-operative blood transfusion

Pooled effect size of three RCTs (n = 681) did not favour either of the two groups (RR = 1.07, 95% CI [0.89, 1.28], p = 0.49), with moderate evidence of heterogeneity (I2 = 33.3%, p = 0.2).

Hospital stay

Pooled effect size of eight studies (n = 1609) showed no significant difference between the two compared groups (MD = 0.01 days, 95% CI [− 0.35, 0.37], p = 0.96), with no evidence of heterogeneity (I2 = 1.15%, p = 0.4). Subgroup analysis likewise found no significant difference in observational studies (MD = 0.10 days, 95% CI [− 0.30, 0.51], p = 0.61) or RCTs (MD = − 0.30 days, 95% CI [− 0.35, 0.37], p = 0.96).

Costs

Three studies assessed the cost-effectiveness of prosthesis (n = 25,374). Tripuraneni et al. [15] reported lower operative and anaesthetic times and observable cost savings with uncemented femoral implants. Similarly, Yli-kyyny et al. [16] showed that cemented implants were more expensive than uncemented implants, whereas Santini et al. [17] showed that uncemented implants were more expensive than cemented implants.

Discussion

This meta-analysis includes the outcomes of 42,046 hemiarthroplasties reported in 29 studies. The main finding is that contemporary cemented hemiarthroplasty is associated with fewer periprosthetic fractures but longer operative time than contemporary uncemented hemiarthroplasty. Contemporary cemented hemiarthroplasty is also associated with more intra-operative blood loss, although there was no evidence that this translated to a higher transfusion requirement or a greater risk of post-operative complications. Previous work has examined the impact of timing of surgery, but very little has been published regarding the relative differences between implant types [18]. The rate of heterotopic ossification was also higher in the cemented hemiarthroplasty group. No statistically significant differences were demonstrated between cemented and uncemented hemiarthroplasty in post-operative hip function, pain, revision rate, dislocations, aseptic loosening, infection, and length of hospital stay. Many studies agree with our findings with regard to operative time [19]. Cemented implants are associated with longer operating times, mainly because of the added time for canal preparation and cement setting. However, current data do not demonstrate whether this increased operative time translates to greater morbidity or overall healthcare costs.

We found no significant difference in revision rates between the two groups. Other studies have presented conflicting rates of revision between CH and UCH. Gjertsen et al. [9] analyzed revision rates in the Norwegian Hip Fracture Registry; they reported an increased revision rate in the uncemented group, with revisions mainly for fractures, aseptic loosening, infection, and dislocation. Although we found that fractures (intra-operative and post-operative) were more frequent in the uncemented group, revision rates were similar between the two groups. However, in the Gjertsen et al. [9] study, there were differences in baseline characteristics between the groups; the uncemented group included more patients with cognitive impairment. On the other hand, Bell et al. [19] reported a significant increase in the revision rate in the cemented group (p = 0.02). Others, similar to our findings, have found no differences in the rates of further surgery [20]. Also, there was no significant difference in terms of aseptic loosening, either radiologically or as a cause for subsequent revision.

Only three studies reported on cost analyses, and they reached conflicting conclusions regarding the most cost-effective implant. A weakness of this study is the inability to draw robust conclusions regarding functional outcomes between the two implant groups. A lack of standardization in the reporting of outcome data in the studies analyzed has prevented conclusions being drawn regarding the functional differences between the UCH and CH groups. Firstly, with respect to postoperative hip function, several different outcome measures and Patient Reported Outcome Measurement (PROM) questionnaires were utilized, each with its own strengths and limitations. A lack of uniformity in PROM data collection may result in small differences being overlooked. More work is needed to establish a gold standard PROM score that can be used universally and improve the ability of future meta-analyses to draw robust conclusions [21]. Clearly, a fundamental aim of surgical treatment is to ensure a good functional outcome, and further high-quality research is required to determine if differences in functional outcome exist between the two implant groups. Similarly, of the six observational studies reporting on wound infection, there was only a brief description of their definition of infection and whether this was superficial or deep. Few of the studies defined their identified “infections” as “superficial incisional,” “deep incisional,” or “organ or space infection” in concordance with the Centers for Disease Control and Prevention (CDC) definitions of Surgical Site Infection (SSI) [22]. Therefore, without a robust system for the diagnosis and reporting of postoperative SSI, true differences in infection rates may also be overlooked.

We used the Cochrane Collaboration tool to assess risk of bias of the included RCTs, and for observational studies, we used the Newcastle Ottawa scale. The quality of included studies ranged from moderate to high. Consequently, the evidence generated by this systematic review and meta-analysis is credible.

This meta-analysis provides a direct comparison between contemporary cemented and uncemented hemiarthroplasty prostheses and helps avoid the reliance upon, and tendency to quote, outdated work that assessed older implant models. A key strength of this meta-analysis is the high number of patients included; it represents the largest meta-analysis to compare contemporary CH and UCH implants. For example, a recent systematic review and meta-analysis, of RCTs only, comparing CH and UCH by Veldman et al. [23] included just 950 patients compared to 42,046 in this work.

The most notable strength of this study is the large number of patients. We conducted a comprehensive literature search that yielded a large number of studies with a total sample size of 42,046 hips. This large sample size adds to the generalizability and validity of our results. We nevertheless recognize limitations to this work. Firstly, only nine of the 29 included studies were RCTs while 20 were observational studies, a lower level of evidence. However, the observational studies that were included in the analysis achieved a mean NOS score indicative of at least moderate quality. In addition, to the best of our knowledge, this is the first meta-analysis that has included both RCTs and observational studies in comparing contemporary cemented versus uncemented hemiarthroplasty. Secondly, as noted above, we have been unable to draw robust conclusions regarding differences in functional outcomes due to the absence of a standardized reporting procedure. A further limitation of this work is that only English articles were included in the literature search and analysis. However, we feel that the number of included studies and the large sample size are likely to be representative, and alternative outcomes would be unlikely in the event of inclusion of non-English studies.

Here, we have demonstrated that, for the most part, outcomes between contemporary cemented and uncemented hemiarthroplasty implants are at least equivalent, with the exception of the increased rate of peri-operative periprosthetic femoral fracture in the uncemented group. However, this increased periprosthetic fracture risk did not translate to greater risk of revision. Having said that, the price of modular uncemented hemiarthroplasties is higher. Also, 11 min of extra time on the trauma list does not, in practice, translate into another case to be done on the same list; this short period of time may also be lost with interest when cables are applied for the periprosthetic fractures. Also, in our experience of hip surgery, poorly functioning hemiarthroplasties are rarely revised as hip surgeons do not usually feel they can make a poor result better by revising it, so revision is not a great surrogate for “badly done hemiarthroplasty” of either variety. These findings might support the recent shift toward preferential use of cemented components.

Further high-quality RCTs are required to determine whether the choice of contemporary cemented or uncemented implants affects mortality rates and quality of life indices. Similarly, given the relative equivalence demonstrated here, further work is required to build on existing knowledge regarding other, non-implant-related variables that affect hip fracture outcome [24,25,26,27]. There remains a need for a methodologically reliable, comprehensive multicenter RCT comparing contemporary cemented and uncemented hemiarthroplasty stems, concentrating not only on mortality and complications but also on patient-reported outcome measures.