Introduction

Ultrahigh-molecular-weight polyethylene wear and associated aseptic loosening and osteolysis are leading causes of long-term THA revision [26]. The rates of loosening and osteolysis in metal-on-conventional polyethylene THAs have been reported to range from 9% to 47% [3, 5, 6]. Highly crosslinked polyethylene (HXLPE) was introduced to reduce wear and THA revision rates; however, there is limited information about the reduced risk of revision associated with HXLPE compared with conventional polyethylene in THA.

Several hip simulator studies and randomized clinical trials (RCTs) have evaluated HXLPE versus conventional polyethylene wear. Simulator studies report decreased femoral penetration and wear in HXLPE compared with conventional polyethylene [13, 15]. Radiological evaluations of in vivo liner wear in RCTs have also found lower wear of HXLPE versus conventional polyethylene [2, 5, 12, 16, 27]. Meta-analyses and systematic reviews also suggest that HXLPE has lower femoral penetration and wear than conventional polyethylene [8, 10, 14]. Although these findings suggest decreased wear of HXLPE liners, these studies have not evaluated reduction in the risk of THA revision.

Findings from studies that have examined THA revision rates in relation to polyethylene formulation are conflicting. Although some studies report a lower risk of revision for metal-on-HXLPE versus metal-on-conventional polyethylene [17], others did not find an increased risk for conventional polyethylene or did not investigate the risk of THA revision [3, 7]. These prior study findings are limited by the small sample sizes from single-center and academic institutions, loss to followup, and limited length of followup. Methodological differences and investigation of a variety of implant designs also limit the use of current findings.

Larger, registry-based studies have reported a higher risk of revision for conventional polyethylene versus HXLPE [1, 22]. These studies are important in that they provide large samples on a wide range of patients across multiple settings by surgeons with various experience levels. However, as a result of the limited availability of data from US registries, there is currently a reliance on information about THA bearing surface performance from other countries.

Therefore, the purpose of this study was to compare the risk of revision of metal-on-HXLPE with that of metal-on-conventional polyethylene bearing surfaces in primary THAs using a large US registry. Specifically: (1) Do primary THAs with a metal-on-conventional polyethylene bearing surface have a higher risk of revision (all-cause or aseptic) than metal-on-HXLPE? (2) Is the risk of revision (all-cause or aseptic) higher for metal-on-conventional polyethylene versus metal-on-HXLPE when the effects of femoral and acetabular components are controlled for in prosthesis-specific analyses?

Patients and Methods

A retrospective cohort study was conducted. Kaiser Permanente’s Total Joint Replacement Registry (TJRR) was used to identify cases during the study period. Data collection procedures, participation, and coverage of this TJRR have been published [19, 21]. In brief, the TJRR covers over 9 million members of an integrated healthcare system in seven geographical regions in the United States and enrolls over 20,000 joint arthroplasties a year. The registry has 95% voluntary participation and only 8% were lost to followup during the 10-year study period [20]. All elective nonbilateral primary THAs, in which patients were at least 18 years old at the time of their procedure and had metal-on-conventional polyethylene or metal-on-HXLPE bearing surfaces registered between April 1, 2001, and December 31, 2011, were included in the sample. Revision procedures, bilateral (same-day) primary procedures, and conversion procedures were not included. The overall study sample (N = 26,823) included all metal-on-conventional polyethylene and metal-on-HXLPE hips; cohorts for prosthesis-specific analysis to control for the femoral and acetabular components consisted of Duraloc (DePuy Inc, Warsaw, IN, USA) (N = 1146) and Reflection (Smith & Nephew Inc, Memphis, TN, USA) (N = 5202) THA cohorts. The cohort included cases from 51 medical centers and 333 surgeons.

The majority of the 26,823 primary THAs included in the study were performed in patients who were women (n = 16,170 [60%]), were white (n = 20,559 [77%]), had a body mass index < 30 kg/m2 (n = 16,233 [61%]), and had an American Society of Anesthesiologists (ASA) score of 1 or 2 (n = 15,374 [57%]) at the time of their surgery. The mean age of the total THA cohort was 70 years (SD = 10), and the prevalence of diabetes was 23% (n = 6239) (Table 1). Of the 26,823 THAs included in the study, 1815 (7%) had metal-on-conventional polyethylene bearing surfaces, and 25,008 (93%) had metal-on-HXLPE bearing surfaces. The median followup for this cohort was 2.9 years (interquartile range [IQR] 1.3–5.5 years). There were 1146 THAs in the Duraloc cohort, of which 382 (33%) had metal-on-conventional polyethylene and 764 (67%) had metal-on-HXLPE (Table 2). The median followup for this cohort was 8.2 years (IQR 5.8–9.2 years). There were 5202 THAs in the Reflection cohort, of which 753 (14%) had metal-on-conventional polyethylene and 4449 (86%) had metal-on-HXLPE (Table 3). The median followup for this cohort was 5.1 years (IQR 3.4–7.0 years). The conventional polyethylene cohorts included liners that were only gas-sterilized (uncrosslinked) or were gamma radiation-sterilized, corresponding to a dose of 25 to 40 kGy. The HXLPE cohorts included eight individual formulations with varying technical characteristics (Table 4).

Table 1 Patient, surgeon, implant, and hospital characteristics for the total THA cohort, 2001–2011
Table 2 Patient, surgeon, implant, and hospital characteristics for the Duraloc cohort, 2001–2011
Table 3 Patient, surgeon, implant, and hospital characteristics for the Reflection cohort, 2001–2011
Table 4 Summary of highly crosslinked polyethylene formulations in the present study and their technical characteristics

Revision was the outcome of interest. All-cause revision included procedures for any reason in which removal and reimplantation of a component occurred at any time after the original index procedure. Aseptic revision was defined as any revision performed after the original index procedure for which infection was not a reason. The TJRR prospectively monitors all registered hips for subsequent revisions. After identification of a possible revision by the TJRR through electronic algorithms or surgeon reporting, the hip in question was reviewed by trained clinical research experts (see Acknowledgments), who adjudicated the event and confirmed the reason for revision.

Exposure and Covariates

The type of bearing surface used was the exposure of interest (metal-on-conventional polyethylene versus metal-on-HXLPE). Variables thought to be related to both bearing surface and revision-free survival time were included in a propensity score model to adjust for observed confounding. The variables included continuous covariates for age, operative time, body mass index, surgeon average annual volume, and hospital average annual volume as well as categorical covariates for gender, ASA score [18], diabetes diagnosis, race (six categories), and surgeon total joint arthroplasty fellowship training status.

Statistical Analysis

Frequencies, proportions, means and SDs as well as medians and IQRs were used to describe the total THA cohort and the Duraloc and Reflection cohorts within the two bearing surface groups. Crude cumulative incidence of all-cause and aseptic revision, revision rate/100 years of observation (revision density), and reasons for revision were calculated for the total THA cohort and the Duraloc and Reflection cohorts.
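As an illustration of the revision density measure, a minimal Python sketch follows. It is not the registry's code; the function name, the example counts, and the log-scale normal approximation used for the confidence interval are assumptions made for this sketch, because the method used for the interval is not stated here.

```python
import math

def revision_density(n_revisions, observation_years, z=1.96):
    """Revisions per 100 years of observation with an approximate 95% CI.

    Assumption: the CI uses a normal approximation on the log rate,
    SE(log rate) = 1 / sqrt(n_revisions). This is one common choice,
    not necessarily the one used in the study.
    """
    if n_revisions <= 0 or observation_years <= 0:
        raise ValueError("need at least one revision and positive followup")
    rate = 100.0 * n_revisions / observation_years
    factor = math.exp(z / math.sqrt(n_revisions))
    return rate, rate / factor, rate * factor

# Hypothetical counts, not taken from the registry.
rate, lower, upper = revision_density(n_revisions=50, observation_years=8000.0)
print(f"{rate:.3f} revisions/100 years (95% CI, {lower:.3f} to {upper:.3f})")
```

Because the denominator is accumulated observation time rather than the number of hips, the measure allows cohorts with different lengths of followup to be compared on the same scale.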

Revision rate/100 years of observation was compared using a Poisson regression. Because bearing surface material was not randomly assigned, we addressed observed confounding using a propensity score approach [4, 25]. The objective for using propensity scores was to remove or reduce confounding so that the magnitude of bias in the estimated treatment effect was negligible. Propensity score methods can minimize confounding by making the treatment groups equal (or approximately so) on a collection of measured variables. The fundamental theoretical property of propensity score methods is that hips with the same correctly estimated propensity scores will be comparable with respect to all covariates used to calculate the propensity scores so that it is only a matter of chance as to whether each actually receives one treatment or the other. In the specific approach used, the following steps were taken: (1) the propensity score was estimated in the conventional way by fitting a logistic regression model and estimating the conditional probability of treatment assignment for each record; (2) we checked that cases in one bearing group had comparable counterparts with respect to their covariate distribution in the other bearing group and those that did not were excluded based on a caliper width of 0.2 SD of the logit propensity score; (3) we stratified the sample into five strata based on the estimated logit propensity score; and finally (4) we calculated the weight for each record based on the number of units in a stratum multiplied by the proportion of units assigned to the treatment group of interest in the data and divided by the number of records assigned to the treatment group of interest in that particular stratum. Missing data were handled using multiple imputation. Ten imputed data sets were created and Rubin’s rules for aggregating parameter estimates and variances were used [23]. Logistic regression models were used to generate propensity scores.
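The weight described in step (4) can be sketched in a few lines of Python. This is an illustrative reading of the verbal definition, not the code used for the analysis: the function name and the toy data are hypothetical, and the "treatment group of interest" is taken to be each record's own bearing group.

```python
from collections import Counter

def stratum_weights(strata, groups):
    """Weight per record: n_stratum * P(group) / n_(stratum, group).

    strata: propensity score stratum label for each record
    groups: bearing group label for each record
    """
    n_total = len(strata)
    n_stratum = Counter(strata)            # records per stratum
    n_group = Counter(groups)              # records per bearing group
    n_cell = Counter(zip(strata, groups))  # records per stratum-by-group cell
    return [
        n_stratum[s] * (n_group[g] / n_total) / n_cell[(s, g)]
        for s, g in zip(strata, groups)
    ]

# Hypothetical toy data: two strata, two bearing groups.
strata = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
groups = ["conv", "hxlpe", "hxlpe", "hxlpe",
          "conv", "conv", "hxlpe", "hxlpe", "hxlpe", "hxlpe"]
weights = stratum_weights(strata, groups)

# Weighted size of each bearing group.
weighted_n = Counter()
for g, w in zip(groups, weights):
    weighted_n[g] += w
print(dict(weighted_n))
```

Under this definition the weights in each bearing group sum to that group's original size, and within each group the weighted distribution across strata matches the distribution in the full sample, which is what makes the weighted groups comparable on the propensity score.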

Marginal multivariate Cox regression models accounting for surgeon clustering using robust variance estimation were fit with stratification (five strata) by propensity score for each imputed data set and results were subsequently aggregated across data sets [11]. Additionally, some of the analytic models also used regression adjustment for surgeon volume, site volume, and hybrid fixation to address imbalance remaining in these variables after stratification by propensity score. All analyses used metal-on-HXLPE as the reference group. Hazard ratios (HRs) with 95% confidence intervals (CIs) and Wald p values are provided. For the primary analysis models, individuals not experiencing a revision were treated as censored as of whichever date came first: the study end date (December 31, 2011), a membership termination date, or date of death. Data were analyzed using SAS (Version 9.2; SAS Institute, Cary, NC, USA) and p < 0.05 was used as the threshold for statistical significance. In this study, hypothesis testing was focused on the adjusted HR for the comparison of the bearings for three groups (total THA cohort, Duraloc cohort, Reflection cohort) for each of two outcomes (all-cause and aseptic revision), leading to six tests and an increased chance of committing a Type I error. Under a conservative approach of assuming these tests are independent, the Bonferroni-adjusted alpha is 0.05/6 = 0.0083.
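The aggregation of results across imputed data sets follows Rubin's rules, which can be sketched as below. The analyses themselves were run in SAS; this Python version is only an illustration, with hypothetical log hazard ratios and standard errors, and it uses a normal rather than a t reference distribution for the interval, which is a simplifying assumption.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                 # pooled point estimate
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total variance
    return pooled, total

# Hypothetical log hazard ratios and standard errors from three imputed data sets.
log_hrs = [0.5, 0.6, 0.7]
std_errs = [0.2, 0.2, 0.2]

pooled_log_hr, total_var = pool_rubin(log_hrs, [se ** 2 for se in std_errs])
pooled_se = math.sqrt(total_var)
hr = math.exp(pooled_log_hr)
ci = (math.exp(pooled_log_hr - 1.96 * pooled_se),
      math.exp(pooled_log_hr + 1.96 * pooled_se))
print(f"HR {hr:.2f} (95% CI, {ci[0]:.2f} to {ci[1]:.2f})")
```

Pooling is done on the log hazard ratio scale, where the Cox model coefficients and their variances are estimated, and the result is exponentiated only at the end.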

Sensitivity Analysis

Based on the distribution of head size, the two bearing surfaces were not comparable for head size, with metal-on-HXLPE containing head sizes > 36 mm, whereas conventional PE did not. Conventional PE also contained very few 36-mm heads. To address this issue, we conducted a sensitivity analysis removing head sizes ≥ 36 mm and only included two categories: head size ≤ 28 mm and head size 32 mm. We included this head size variable in the propensity score model as well. We also examined whether the effect of the bearing was moderated by cup type. To do this, we compared the bearing surface effect estimate for Duraloc versus Reflection for each of the outcomes using Wald chi square tests.

Results

Risk of Revision, All THA: Conventional Polyethylene versus HXLPE

The adjusted risks of all-cause (HR, 1.75; 95% CI, 1.37–2.24; p < 0.001) and aseptic (HR, 1.91; 95% CI, 1.46–2.50; p < 0.001) revision were higher in patients with metal-on-conventional polyethylene bearing surfaces compared with metal-on-HXLPE (Table 5). At 7 years followup, the cumulative incidence of revision was 5.4% (95% CI, 4.4%–6.7%) for metal-on-conventional and 2.8% (95% CI, 2.6%–3.2%) for metal-on-HXLPE. The all-cause revision density for metal-on-conventional hips was 0.76 (95% CI, 0.68–0.84) revisions/100 years of followup and 0.60 (95% CI, 0.57–0.63) for metal-on-HXLPE hips (Table 6). The main reasons for revision in the metal-on-conventional polyethylene group were instability (49%), aseptic loosening (20%), infection (15%), and other (22%) (Table 6). The main reasons for revision in the metal-on-HXLPE group were instability (40%), infection (25%), periprosthetic fracture (13%), and other (14%). When accounting for differences in femoral head size distribution, the results were not substantively different from those previously reported for the overall effect (ie, without any cup restriction) (HR, 1.69; 95% CI, 1.19–2.40; p = 0.003 [all-cause]; HR, 1.73; 95% CI, 1.22–2.44; p = 0.002 [aseptic]). Therefore, it appears that the size of the femoral head is not able to explain most of the differences observed in the performance of the bearings.

Table 5 Propensity score-weighted regression results for risk of all-cause and aseptic revision in conventional bearings compared with HXLPE bearings for the overall THA, Duraloc, and Reflection cohorts
Table 6 Crude all-cause and aseptic cumulative incidence of revision, revision rate per 100 years of followup, and reasons for revision for the overall THA, Duraloc, and Reflection cohorts

Risk of Revision, THA Design-specific: Conventional Polyethylene versus HXLPE

Within the Duraloc cohort, the adjusted risks of all-cause (HR, 3.15; 95% CI, 1.65–6.02; p < 0.001) and aseptic (HR, 2.87; 95% CI, 1.43–5.78; p = 0.003) revision were higher in patients with metal-on-conventional polyethylene compared with those with metal-on-HXLPE bearing surfaces (Table 5). The 7-year cumulative incidence of revision was 8.3% (95% CI, 5.8%–11%) for metal-on-conventional polyethylene versus 2.6% (95% CI, 1.7%–4.2%) for metal-on-HXLPE (Table 7). The all-cause revision density for metal-on-conventional polyethylene hips was 1.06 (95% CI, 0.87–1.26) revisions/100 years of followup and 0.42 (95% CI, 0.33–0.51) for metal-on-HXLPE hips (Table 6). The main reasons for revision in the metal-on-conventional polyethylene group were instability (43%), aseptic loosening (27%), infection (20%), and other (33% each). The main reasons for revision in the metal-on-HXLPE group were instability (68%), aseptic loosening (14%), pain (14%), infection (9%), and periprosthetic fracture (9%).

Table 7 Yearly number of procedures at risk and cumulative incidence of revision by bearing surface for the overall THA, Duraloc, and Reflection cohorts

Within the Reflection cohort, the adjusted risks of all-cause (HR, 1.93; 95% CI, 1.23–3.01; p = 0.004) and aseptic (HR, 2.44; 95% CI, 1.49–3.99; p < 0.001) revision were higher in patients with metal-on-conventional polyethylene compared with those with metal-on-HXLPE bearing surfaces (Table 5). The 7-year cumulative incidence of revision was 4.6% (95% CI, 3.2%–6.6%) for metal-on-conventional polyethylene versus 2.2% (95% CI, 1.7%–2.7%) for metal-on-HXLPE (Table 7). The all-cause revision density for metal-on-conventional polyethylene hips was 0.63 (95% CI, 0.51–0.74) revisions/100 years of followup and 0.39 (95% CI, 0.35–0.44) for metal-on-HXLPE (Table 6). The main reasons for revision in the metal-on-conventional polyethylene group were instability (65%), other (26%), infection (13%), periprosthetic fracture (10%), and aseptic loosening (10%). The main reasons for revision in the metal-on-HXLPE group were instability (40%), infection (26%), other (17%), and periprosthetic fracture (12%).

The hypothesis testing, assuming tests for the outcomes by cohort are independent, found that all tests that would be significant under an alpha of 0.05 would still be significant under the stricter Bonferroni-adjusted threshold (Table 5). Despite apparent differences in the magnitude of the HR between the Duraloc and Reflection cohorts, the tests of whether the effect of the bearing was moderated by cup type did not achieve statistical significance: chi square (1) = 1.48, p = 0.223 (all-cause), chi square (1) = 0.14, p = 0.709 (aseptic).
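The Wald chi square comparison of the two cup-specific estimates can be approximated from the published all-cause HRs and CIs, as in the Python sketch below. It assumes that the intervals are Wald intervals symmetric on the log scale and that the Duraloc and Reflection estimates are independent; the helper names are hypothetical, and the standard errors are back-calculated from rounded values, so the result only approximates the reported test.

```python
import math

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Recover log(HR) and its standard error from a 95% Wald CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

def wald_chi_square(est_a, se_a, est_b, se_b):
    """1-df Wald test that two independent log-scale estimates are equal."""
    chi2 = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi square with 1 df
    return chi2, p

# All-cause revision, conventional polyethylene versus HXLPE, as reported above.
duraloc = log_hr_and_se(3.15, 1.65, 6.02)
reflection = log_hr_and_se(1.93, 1.23, 3.01)

chi2, p = wald_chi_square(*duraloc, *reflection)
print(f"chi square (1) = {chi2:.2f}, p = {p:.3f}")
```

With these inputs the statistic is about 1.49 with p of about 0.22, close to the reported chi square (1) = 1.48, p = 0.223; the small gap is consistent with rounding in the published intervals.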

Discussion

Osteolysis associated with polyethylene wear is a long-established cause of THA revision surgery [3, 9, 24, 26]. A reduction in polyethylene liner wear therefore should reduce THA revision. Although prior studies suggest differences in radiologically measured wear in metal-on-HXLPE versus metal-on-conventional polyethylene bearing surfaces, findings regarding reduction in risk of revision are conflicting [7, 17] and limited based on sample sizes, single-center and academic studies, limited length of followup, methodological differences, and investigation of a variety of implant designs. Larger population-based registry studies have primarily focused on countries outside of the United States [1, 22]. Our study provides the risk of THA revision associated with conventional polyethylene versus HXLPE in a large US representative sample. The strengths of our study include the large, representative US sample, the ability to evaluate different implant designs with different HXLPE formulations, and the inclusion of revision as the study endpoint, which has been reviewed and adjudicated by trained clinical content experts. In our study, the risk of all-cause and aseptic revision in primary THA was higher for metal-on-conventional polyethylene bearings compared with metal-on-HXLPE bearing surfaces.

This study had a number of limitations. First, this study is observational and it is possible that we did not address every potential confounding variable in our analyses. In our study, we addressed confounding using propensity score stratification and weighting to account for differences between the conventional and HXLPE groups. Second, to control for femoral and acetabular components, we included only two cohorts (Duraloc and Reflection) with sufficient samples in the subgroup prosthesis-specific analyses. Third, lack of radiological, functional, or patient-reported outcomes may be perceived as limitations. However, revision is the definitive endpoint of wear, which HXLPE was designed to address. Finally, followup for greater than 10 years is necessary to evaluate longer-term results. Despite this limitation, the benefits of HXLPE are already observed within our study.

Our study findings confirm the results of in vitro hip simulator and other clinical studies that compared HXLPE with conventional polyethylene liners. Similar to Nakashima et al [17], we found an increased risk of revision for conventional polyethylene versus HXLPE. Our results differ from those of Howard et al [7], who did not report a higher risk of revision in conventional polyethylene versus HXLPE. Most likely this difference is related to limited statistical power associated with sample size because rates were similar to those in our study but did not reach statistical significance. The higher risk of revision in metal-on-conventional polyethylene bearing surfaces in our US sample is consistent with results reported by the Australian Orthopaedic Association National Joint Replacement Registry [1]. Similar to the Australian Registry results, the difference between HXLPE and conventional polyethylene is evident within 10 years of followup. These findings suggest that metal-on-conventional bearing surfaces have a higher risk of revision in both populations.

Within the Duraloc and Reflection cohorts, metal-on-conventional polyethylene also had a higher risk of revision than metal-on-HXLPE bearings. This finding confirms findings from registries from other countries [1] in a US sample of THA and emphasizes that higher risk of revision for conventional polyethylene is consistent when controlling for femoral and acetabular components.

In conclusion, in a large US population-based study, metal-on-conventional polyethylene THA bearing surfaces had a higher risk of revision compared with metal-on-HXLPE bearing surfaces. Clinicians should consider the use of HXLPE when using a polyethylene bearing in THA.