Introduction

Total hip arthroplasty is a cost-effective operation that substantially improves patient quality of life [30]. Over the past decade, wear and osteolysis have become the foremost concerns in primary total hip arthroplasty (THA) [2, 4, 10, 15, 16]. The importance of a low wear rate for the durability of total hip replacement is therefore well accepted. Recent advances in minimizing the production of wear debris include highly cross-linked polyethylene bearing surfaces, improved tapers and locking mechanisms in modular implants, and hard-on-hard bearing surfaces such as ceramic-on-ceramic and metal-on-metal. The rationale for metal-on-metal articulation is that it produces less volumetric wear debris than metal-on-polyethylene and therefore may reduce the incidence of osteolysis-induced failure, particularly in young, active patients with a long life expectancy [12, 27]. The widespread acceptance of metal-on-metal articulations has been tempered by concerns about increased metal ion production from these devices compared with standard total hip arthroplasty [14, 18, 19, 25, 31]. To date, no published clinical data suggest adverse health consequences from the type of ion production associated with these devices; however, a recent paper suggests that pseudotumors may result from a reaction to high levels of metal wear debris [29].

The very low wear of metal-on-metal bearings led to the reintroduction of hip resurfacing in the 1990s. Resurfacing involves less resection of host bone on the femoral side and is therefore considered by some to be a conservative, bone-preserving arthroplasty for young patients with advanced osteoarthritis [28]. The large bearing surface improves range of motion and contributes to prosthetic joint stability. The metal-on-metal articulation has the potential to decrease wear and ultimately to reduce the incidence of implant failure. Early clinical results with these new designs have been favorable, with survivorship of 94% to 99% at 2- to 5-year followup in young patients [1, 11, 17, 33].

In addition, proponents have suggested that resurfacing arthroplasty may result in a higher activity level and better function than standard hip arthroplasty [32]. To date, only one randomized clinical trial has supported this proposed advantage: in a prospective randomized controlled trial, Vendittoli et al. [34] reported a slightly higher activity level (UCLA score, 7.1 versus 6.3) for resurfacing than for standard total hip arthroplasty with a 28-mm head.

Resurfacing arthroplasty also has potential disadvantages. In a proportion of cases it appears to require greater resection of acetabular bone than conventional arthroplasty, and there is concern about the long-term survivorship of the femoral component [24, 32]. At 3.5 years followup, Amstutz et al. [1] reported an overall failure rate of 6%. In addition, femoral neck fracture after resurfacing remains a problem; the Australian Registry reports a rate of 1.46% [32].

To overcome the potential drawbacks of resurfacing arthroplasty (femoral neck fracture, loosening, and late osteonecrosis), it is now possible to use the new metal-on-metal articulations with large heads mated to a standard femoral component. Several manufacturers have recently introduced such devices. The acetabular components are identical to those used in resurfacing, and the femoral components are identical to those used in standard hip replacement. A large head whose diameter matches the inside diameter of the socket is attached to the Morse taper of the stem, in some but not all systems through a secondary metal sleeve that sits between the taper of the head and the taper of the stem. The result is an articulation identical to that of hip resurfacing, but on a THA with a large-diameter head. The published loosening rate of conventional femoral stems is much lower than that of the femoral component in hip resurfacing [6]. This type of device could therefore offer the functional advantages of resurfacing while eliminating its disadvantages, including high ion levels [5].

We therefore hypothesized that quality-of-life outcomes (PAT-5D index, WOMAC, SF-36, and UCLA activity score) would be equivalent after metal-on-metal resurfacing arthroplasty and after total hip arthroplasty with a large metal-on-metal head. We further hypothesized that serum metal ion levels would not differ between the two groups. The study is being reported early because of safety concerns about markedly elevated serum metal ion levels in the large-head metal-on-metal total hip group.

Patients and Methods

We recruited patients between June 2005 and August 2008 at two university hospitals (Vancouver General and Montreal General Hospitals) and one community hospital (Red Deer Regional Hospital). Patients were eligible if they were aged 19 to 70 years and were judged by the treating surgeon to be suitable for hip resurfacing. Exclusion criteria were previous hip fracture requiring internal fixation, previous femoral or pelvic osteotomy, dysplasia requiring a structural graft, osteopenia or osteoporosis, and hepatic or renal insufficiency. The study was a prospective randomized trial. Assignments were made using permuted blocks of two and four, stratified by surgeon; seven surgeons recruited patients. The assignments were kept in sealed envelopes, which the study coordinator opened the day before surgery to allow proper setup of the operating room. The two arms of the study were standard resurfacing arthroplasty and large-head metal-on-metal (MOM) total hip (see below for details). Patients, nurses, and physiotherapists were blinded to treatment assignment, as was the laboratory that performed the metal ion analysis. One hundred seven patients were randomized, and 104 completed a baseline assessment (Table 1). Baseline comparisons between the groups, performed to check the randomization, revealed no unexpected differences. Eight patients were lost to followup, leaving 96 patients for assessment of quality of life. Of these, 23 had not yet reached 1 year of followup, so 73 patients form the basis of the quality-of-life results reported in this paper. The institutional review boards of all three hospitals and both universities approved the study protocol.
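Permuted-block allocation with blocks of two and four, stratified by surgeon, can be illustrated with a short sketch. It assumes 1:1 allocation and that each block size is drawn with equal probability; the seed, surgeon labels, and function names are hypothetical, and the trial's actual method of generating its envelope lists is not described here.

```python
import random

ARMS = ("resurfacing", "large-head MOM THA")


def permuted_block_schedule(n_patients, rng, block_sizes=(2, 4)):
    """Build an allocation list from randomly sized, internally balanced blocks.

    Each block holds equal numbers of both arms in random order, so at any
    point the arms differ by at most half the largest block size (here, 2).
    """
    schedule = []
    while len(schedule) < n_patients:
        size = rng.choice(block_sizes)      # assumed: block of 2 or 4, equally likely
        block = list(ARMS) * (size // 2)    # size/2 patients per arm
        rng.shuffle(block)                  # random order within the block
        schedule.extend(block)
    return schedule[:n_patients]


def stratified_schedules(patients_per_surgeon, seed=2005):
    """Generate an independent permuted-block list for each surgeon (stratum)."""
    rng = random.Random(seed)               # hypothetical seed for reproducibility
    return {
        surgeon: permuted_block_schedule(n, rng)
        for surgeon, n in patients_per_surgeon.items()
    }


if __name__ == "__main__":
    # Hypothetical strata; the real per-surgeon enrolment is not reported here.
    for surgeon, arms in stratified_schedules({"surgeon_A": 10, "surgeon_B": 6}).items():
        print(surgeon, arms)
```

Stratifying by surgeon keeps each surgeon's caseload close to balanced between arms, so surgeon-level differences in technique are spread across both treatments.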

Table 1 Demographic data for the study population (n = 104)

Preoperatively, we recorded patient demographics, including age, gender, education level, height, weight, and occupation. Comorbidity was assessed in all patients with a self-administered comorbidity questionnaire and by Charnley class. Clinical outcomes were assessed preoperatively, at 8 to 12 weeks postoperatively, and at 1 year and 2 years after surgery. At each assessment the patient completed four quality-of-life questionnaires: Paper Adaptive Test in 5 Domains of Quality of Life in Arthritis Questionnaire (PAT-5D) [22], WOMAC [7], Short Form-36 (SF-36) [21], and the UCLA [36] activity score. Serum cobalt and chromium levels were measured in a subset of patients treated at the primary center.

All surgeries were performed through the posterior approach. All implants were from one manufacturer (Zimmer Inc., Warsaw, IN). The acetabular component was identical in the two groups (Durom® cup). In the resurfacing group the femoral component was the Durom® femoral resurfacing component; in the large-head metal-on-metal total hip group it was the titanium M/L Taper® stem, onto which a large Metasul® head was fixed through a Cr-Co alloy metal sleeve adapter with a Morse taper matching the 12/14 taper of the stem (Fig. 1A–B).

Fig. 1A–B

The figure shows (A) the Durom® (Zimmer Inc., Warsaw, IN) acetabular cup and Durom® resurfacing femoral component; (B) the M/L Taper® (Zimmer Inc.) stem, Metasul® (Zimmer Inc.) large femoral heads, and Cr-Co alloy metal sleeve adaptors.

The bearing surface was identical in the two arms. The femoral and acetabular components were made of wrought-forged, high-carbon Cr-Co alloy (0.20% to 0.25% carbon), with a surface roughness below 0.005 μm and a radial clearance of 75 μm (manufacturer’s data, Zimmer, Warsaw, IN).

The four quality-of-life questionnaires completed at each assessment are described below.

The PAT-5D is a newly developed questionnaire with conditional branching. It assesses five domains relevant to patients with arthritis: (1) daily activities, (2) walking, (3) handling objects, (4) pain, and (5) emotions. The instrument is five pages long, and five Likert responses are solicited for each domain [22].

The WOMAC osteoarthritis index is the recommended disease-specific outcome measure for hip and knee arthroplasty [8]. It is a self-administered multidimensional index with dimensions for pain (five items), stiffness (two items), and function (17 items). Each item has five Likert response options, and the dimensions may be reported individually or in aggregate. The WOMAC is valid [7] and reliable in patients with osteoarthritis of the hip and knee, and it is currently the most frequently used measure of pain and functional disability among arthroplasty patients. WOMAC scores were normalized to a 0-to-100 scale, with 0 the worst and 100 the best possible score.
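The normalization to a 0 (worst) to 100 (best) scale is not spelled out above. A common approach, sketched here as an assumption, codes each of the five Likert responses 0 (none) to 4 (extreme), sums a dimension, and rescales and inverts the sum so that higher values mean better status. The function and constant names are hypothetical.

```python
# Items per WOMAC dimension, as given in the text.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM_SCORE = 4  # assumed Likert coding: 0 = none ... 4 = extreme


def normalize_womac(dimension, item_scores):
    """Rescale raw item scores for one dimension to 0-100, where 100 is best."""
    n_items = WOMAC_ITEMS[dimension]
    if len(item_scores) != n_items:
        raise ValueError(f"{dimension} expects {n_items} items, got {len(item_scores)}")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("item scores must lie between 0 and 4")
    raw = sum(item_scores)
    max_raw = n_items * MAX_ITEM_SCORE
    # Higher raw WOMAC scores mean worse symptoms, so invert after rescaling.
    return 100.0 * (1 - raw / max_raw)


if __name__ == "__main__":
    # Hypothetical responses for the five pain items.
    print(round(normalize_womac("pain", [2, 1, 0, 3, 2]), 1))
```

Rescaling each dimension by its own maximum puts pain (maximum raw score 20), stiffness (8), and function (68) on the same scale, which is what makes domain scores comparable across dimensions.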

The Short Form-36 (SF-36) is a self-administered generic measure of quality of life (QOL) with eight subscales: (1) physical functioning, (2) social functioning, (3) role limitations (physical), (4) role limitations (emotional), (5) pain, (6) mental health, (7) vitality, and (8) general health perception. The SF-36 is widely used and is reliable and valid across a broad spectrum of medical conditions [21].

The UCLA activity score is a valid measure of the intensity of lower-extremity use. Habitual activity is categorized into 10 levels, from 1 (housebound, with nearly complete inactivity) to 10 (regular participation in impact sports) [36].

Patients enrolled in the study were asked whether they were willing to undergo serum ion measurements, and 30 patients at the primary institution (Vancouver General Hospital) agreed. In this subset we measured serum cobalt and chromium levels preoperatively and at 2 months, 1 year, and 2 years after surgery. Twenty-six of the 30 patients (13 in each arm) had ion measurements at baseline and at a minimum of 1 year. Among these 26 patients there was no difference in baseline demographics between the resurfacing and large-head arms, and at 1 and 2 years postoperatively there was no difference between the arms in quality of life or activity level (mean UCLA activity score at 1 year: resurfacing, 6.8; large-head MOM total hip, 6.3). At 1 year we measured the cup abduction angle radiographically in the patients having serum metal ion testing; it did not differ between the groups (mean: resurfacing, 45.8°; large-head MOM total hip, 44.2°), and no cup in this subgroup had an abduction angle greater than 55°.

One tube of blood was collected from each patient in a plastic 7-mL nonadditive, red label, blue top BD Vacutainer tube (Trace Element, Serum, REF 368380, Becton Dickinson, Franklin Lakes, NJ). Blood was allowed to clot for 20 minutes and then centrifuged with the stopper on for 15 minutes. The serum was transferred with a polypropylene transfer pipette into a 7-mL Sarstedt polypropylene tube and stored at –20°C until analysis. All specimens were shipped to the Trace Elements Laboratory at the University of Western Ontario, which measures metal ions with a Thermo Fisher Element 2 high-resolution sector field inductively coupled plasma mass spectrometer (HR-SF-ICPMS; Thermo Scientific, Waltham, MA). This technique is considered the gold standard for trace metal ion analysis [23, 25].

Because the groups were successfully randomized and the quality-of-life (QOL) measures were approximately normally distributed on Q-Q plots, unadjusted two-sample t-tests were used to test for differences between the groups in QOL outcomes at baseline, 12 months, and 24 months. Cobalt and chromium concentrations were not normally distributed, so the nonparametric exact Wilcoxon rank-sum test was used for those measures. Intragroup comparisons between baseline and 12 months used one-sample t-tests on the within-patient changes (equivalent to paired t-tests) for the QOL measures and the nonparametric Wilcoxon signed-rank test for changes in cobalt and chromium concentrations.
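The exact Wilcoxon rank-sum test named above can be sketched with the standard library. This minimal version assumes no tied values (real ion data near the detection limit often contain ties, which would need a tie-corrected exact or permutation method); the function names are hypothetical. It counts, by dynamic programming, how many ways each rank sum can arise under the null hypothesis and reports a two-sided p-value as twice the smaller tail, capped at 1.

```python
from math import comb


def rank_sum_null_counts(n1, n2):
    """For each rank sum s, count the size-n1 subsets of ranks 1..n1+n2 summing to s.

    With no ties every subset is equally likely under H0, so
    counts[s] / C(n1 + n2, n1) is the exact null probability of rank sum s.
    """
    n = n1 + n2
    max_sum = n1 * (2 * n - n1 + 1) // 2      # sum of the n1 largest ranks
    # ways[k][s]: number of k-element subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, n + 1):
        # Walk k downward so ways[k - 1] still excludes rank r (each rank used once).
        for k in range(min(r, n1), 0, -1):
            row, prev = ways[k], ways[k - 1]
            for s in range(r, max_sum + 1):
                row[s] += prev[s - r]
    return ways[n1]


def wilcoxon_rank_sum_exact(x, y):
    """Two-sided exact Wilcoxon rank-sum test; returns (W, p), W = rank sum of x."""
    pooled = sorted(list(x) + list(y))
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    w = sum(rank[v] for v in x)
    counts = rank_sum_null_counts(len(x), len(y))
    total = comb(len(x) + len(y), len(x))
    p_lower = sum(counts[: w + 1]) / total     # P(W <= w)
    p_upper = sum(counts[w:]) / total          # P(W >= w)
    return w, min(1.0, 2 * min(p_lower, p_upper))
```

With 13 patients per arm the null distribution has C(26, 13) = 10,400,600 equally likely subsets, which the dynamic program counts without enumerating them. For a library route, SciPy's `scipy.stats.mannwhitneyu(x, y, method="exact")` reports the equivalent Mann-Whitney statistic U = W − n1(n1 + 1)/2.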

In addition to numerical comparisons, we produced box-and-whisker plots both cross-sectionally (between groups) and longitudinally (within groups). The plots followed the standard format: each box spans the interquartile range, and each whisker extends from the box edge to the most extreme observation lying within 1.5 times the interquartile range of that edge. Observations beyond the whiskers were plotted individually as outliers. To aid interpretation, the boxes were joined by lines through their means.
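The plotting convention described above (Tukey-style whiskers plus a mean marker) can be computed as in this sketch. Statistical packages differ in how they estimate quartiles; using Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation) is an assumption, and the function name is hypothetical.

```python
from statistics import quantiles


def box_whisker_stats(values):
    """Quartiles, whisker ends, outliers, and mean for a Tukey-style box plot.

    Each whisker reaches the most extreme observation within 1.5 x IQR of its
    box edge; observations beyond that are returned as outliers.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
        "mean": sum(data) / len(data),  # boxes are joined through their means
    }


if __name__ == "__main__":
    # Hypothetical, right-skewed values resembling ion concentrations.
    print(box_whisker_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

With skewed data like serum ion levels, the mean can sit well above the median box line, so joining boxes through their means (as described above) emphasizes the influence of high outliers.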

The sample size calculation was based on an equivalence test of the primary QOL outcome, the change (delta) in the ambulation domain of the PAT-5D from before surgery to 1 year after surgery. The standard deviation (SD) of delta was derived from the maximum likelihood estimate (MLE) of scores previously collected from 120 patients after arthroplasty [22]. This estimated SD was used in an additive two-sample equivalence model for sample size determination (SAS 9.1.3). With the equivalence limit set to this estimated SD, 48 subjects per group give 90% power to demonstrate equivalence (one-sided alpha = 0.05). Allowing for a 10% attrition rate, we planned to recruit 108 subjects (54 per group).
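The final step above, inflating 48 evaluable subjects per group for 10% attrition, is simple arithmetic; the sketch reproduces only that step, not the SAS equivalence power calculation that produced the figure of 48. Dividing by (1 − attrition) gives 54 per group, matching the text, whereas multiplying by 1.1 would give 53. The function name is hypothetical.

```python
def inflate_for_attrition(n_per_group: int, attrition_pct: int) -> int:
    """Smallest per-group enrolment leaving n_per_group after attrition_pct% dropout.

    Computes ceil(n * 100 / (100 - pct)) with integer arithmetic to avoid
    floating-point rounding at exact multiples.
    """
    if not 0 <= attrition_pct < 100:
        raise ValueError("attrition_pct must be in [0, 100)")
    return -(-n_per_group * 100 // (100 - attrition_pct))


if __name__ == "__main__":
    per_group = inflate_for_attrition(48, 10)
    print(per_group, 2 * per_group)  # planned enrolment per group and in total
```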

The PAT-5D was chosen as the primary outcome because it has a much lower ceiling effect than the WOMAC or SF-36 in patients undergoing total hip arthroplasty [22]. The final outcome was measured at 1 year because several studies have shown that quality of life plateaus by this time after surgery.

Results

We found no difference between the groups in the primary outcome, the PAT-5D ambulation domain score, at baseline or at 1 year. In all WOMAC domains, patients in both the large-head MOM total hip group and the resurfacing group improved at 1 year, with no difference between the groups at baseline or at 1 year. The SF-36 showed the same pattern: both groups improved by 1 year postoperatively, with no between-group difference at baseline or at 1 year (Table 2). UCLA activity scores also improved in both groups at 1 year, with no difference between the groups at baseline (resurfacing – mean, 4.9; large-head MOM total hip – mean, 4.7; p = 0.51) or at 1 year (resurfacing – mean, 6.8; large-head MOM total hip – mean, 6.3; p = 0.24).

Table 2 Quality-of-life outcomes

At 1 year after surgery, intragroup comparison in the large-head metal-on-metal total hip group showed that the serum cobalt level had increased 46-fold (p = 0.0010), from a preoperative median of 0.11 μg/L (parts per billion; interquartile range, 0.1–0.2) to a postoperative median of 5.09 μg/L (interquartile range, 3.0–7.5). In the resurfacing group, the serum cobalt level had increased 3.9-fold at 1 year (p = 0.0010), from a preoperative median of 0.13 μg/L (interquartile range, 0.1–0.2) to a postoperative median of 0.51 μg/L (interquartile range, 0.4–0.7). Intergroup comparison revealed no difference (p = 0.565) in preoperative median serum cobalt levels between the large-head metal-on-metal hip group (median, 0.11 μg/L) and the resurfacing group (median, 0.13 μg/L). However, at 1 year after surgery the median serum cobalt level was 10-fold higher (p < 0.001) in the large-head metal-on-metal total hip group than in the resurfacing group (large-head group median, 5.09 μg/L; resurfacing group median, 0.51 μg/L). The median cobalt level in the large-head metal-on-metal total hip group continued to rise through 2 years, to 5.38 μg/L (interquartile range, 3.5–7.2), whereas the median 2-year serum cobalt level in the resurfacing group was 0.54 μg/L (interquartile range, 0.4–0.7) (Fig. 2).

Fig. 2

The graph shows serum cobalt in the large-head metal-on-metal total hip and resurfacing groups preoperatively, at 2 months, 1 year and 2 years postoperatively. Boxes are joined by lines through their means.

At 1 year postoperatively, intragroup comparison in the large-head metal-on-metal total hip group showed that the serum chromium level had increased 10.7-fold (p = 0.0010), from a preoperative median of 0.20 μg/L (interquartile range, 0.1–0.3) to a postoperative median of 2.14 μg/L (interquartile range, 0.9–3.2). In the resurfacing group, the serum chromium level had increased 5.4-fold at 1 year (p = 0.0049), from a preoperative median of 0.15 μg/L (interquartile range, 0.1–0.2) to a postoperative median of 0.81 μg/L (interquartile range, 0.5–1.3). Intergroup comparison revealed no difference (p = 0.608) in preoperative median serum chromium levels between the large-head metal-on-metal hip group (median, 0.20 μg/L) and the resurfacing group (median, 0.15 μg/L). However, at 1 year after surgery the median serum chromium level was 2.6-fold higher (p = 0.023) in the large-head metal-on-metal total hip group than in the resurfacing group (large-head group median, 2.14 μg/L; resurfacing group median, 0.81 μg/L). The median chromium level in the large-head metal-on-metal total hip group also continued to rise through 2 years, to 2.88 μg/L (interquartile range, 1.1–4.0), whereas the median 2-year serum chromium level in the resurfacing group was 0.84 μg/L (interquartile range, 0.7–1.1) (Fig. 3).

Fig. 3

The graph shows serum chromium in the large-head metal-on-metal total hip and resurfacing groups preoperatively, at 2 months, 1 year, and 2 years postoperatively. Boxes are joined by lines through their means.
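The fold changes quoted in the cobalt and chromium results are ratios of group medians, and they can be reproduced from the reported medians as a quick arithmetic check. Note that a ratio of medians is not the same as the median of per-patient ratios; the dictionary layout and function names below are illustrative only.

```python
# Reported medians in ug/L (pre = preoperative, 1y = 1 year after surgery).
MEDIANS = {
    "cobalt": {
        "large_head": {"pre": 0.11, "1y": 5.09},
        "resurfacing": {"pre": 0.13, "1y": 0.51},
    },
    "chromium": {
        "large_head": {"pre": 0.20, "1y": 2.14},
        "resurfacing": {"pre": 0.15, "1y": 0.81},
    },
}


def fold_change(after, before):
    """Ratio of two medians, rounded to one decimal place."""
    return round(after / before, 1)


def summarize(medians):
    """Within-group (1 year vs. preoperative) and between-group (1 year) ratios."""
    rows = {}
    for metal, groups in medians.items():
        lh, rs = groups["large_head"], groups["resurfacing"]
        rows[metal] = {
            "large_head_1y_vs_pre": fold_change(lh["1y"], lh["pre"]),
            "resurfacing_1y_vs_pre": fold_change(rs["1y"], rs["pre"]),
            "large_head_vs_resurfacing_1y": fold_change(lh["1y"], rs["1y"]),
        }
    return rows


if __name__ == "__main__":
    for metal, ratios in summarize(MEDIANS).items():
        print(metal, ratios)
```

For cobalt, 5.09/0.11 is about 46.3 (reported as 46-fold) and 5.09/0.51 is about 10.0; for chromium, 2.14/0.20 = 10.7 and 2.14/0.81 is about 2.6, consistent with the text.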

Discussion

Total hip arthroplasty remains the gold standard for treatment of degenerative hip disorders, and most patients enjoy an excellent quality of life afterward. However, concerns about longevity in young, active patients [26] have spurred increasing use of hard-on-hard bearings in this population, particularly metal-on-metal resurfacing. The purpose of this study was to compare the outcomes of resurfacing arthroplasty and metal-on-metal total hip arthroplasty with a large-diameter head in a randomized clinical trial. Because the articulating portions of these implants are identical, we hypothesized that clinical outcomes would be equivalent and that serum metal ion levels would not differ between the two groups. We are reporting the results before all patients have reached the primary endpoint because of concerns about the excessively high metal ion levels in the large-head MOM total hip arthroplasty group.

This study has some limitations. First, the metal ion findings are based on the subset of 30 patients who agreed to metal ion testing. Although these patients had been randomized to their treatment arm, membership in the subset was determined by consent rather than by randomization, so balance between the arms within the subset is not guaranteed. While this may introduce some confounding, we detected no difference between the two arms of this subset in demographics, quality-of-life scores, or activity levels. Second, conclusions about equivalence of quality of life are limited because not all patients had reached the 1-year end point. For this reason, we describe quality of life in the two groups as comparable. Continued followup should allow a more definitive conclusion.

To date, only one randomized controlled trial (RCT) comparing resurfacing with standard total hip arthroplasty has been published [34]. Its authors compared resurfacing arthroplasty with standard metal-on-metal total hip arthroplasty with a 28-mm head and found no difference in quality of life at 1 year as measured by the WOMAC and Merle D’Aubigné-Postel scoring systems. They did, however, report a higher activity level as measured by the UCLA activity score, and more patients in the resurfacing group returned to moderate to heavy activities. In contrast to the Vendittoli study [34], we compared resurfacing arthroplasty with metal-on-metal THA with a large-diameter head. In our prospective, randomized, double-blind study we found no difference in quality of life as measured by three outcome tools (PAT-5D, WOMAC, and SF-36) and no difference in activity level as measured by the UCLA activity score. One explanation for the difference from the Vendittoli study is that all patients in our total hip group received large heads (in most cases > 50 mm in diameter). Another major difference is that our patients were blinded to their treatment, which eliminates an important source of bias that may have been present in the study by Vendittoli et al. [34].

Although quality of life and activity level improved substantially in the patients with metal-on-metal articulations in the current study, we remain concerned about the release of metal ions in these young, active patients. These concerns include local tissue toxicity, impaired renal function, hypersensitivity, chromosomal damage, and possibly malignant cellular transformation. To date, no study has shown that any of these theoretical concerns occurs clinically. More recently, Pandit et al. [29] have raised concerns about so-called pseudotumors. In all published studies of metal-on-metal bearings to date, serum cobalt and chromium levels have been elevated [3, 25, 35].

In our study, the two groups differed only on the femoral side. In the large-head metal-on-metal total hip group, the femoral head was attached to the neck of the femoral component through an adapter. This adapter allows the surgeon to vary leg length without increasing the femoral head inventory; however, it introduces two separate Morse tapers into the stem-head construct. We found markedly elevated cobalt and chromium levels in the large-head metal-on-metal total hip group. By contrast, in the resurfacing group serum cobalt and chromium levels tended to peak at 2 months after surgery, plateau by 1 year, and remain constant through 2 years.

Sources of metal ions in our patients include the bearing surface and the modular junction between the femoral head and neck. In both groups the bearing surface contributed to the increase in metal ions from baseline. Because the two groups had the same bearing surface, our data indicate that the excessive levels in the large-head metal-on-metal hip arthroplasty group probably do not come solely from the bearing surface. Furthermore, the serum cobalt and chromium levels in our resurfacing arm are consistent with the literature [3, 9, 35]. Because the articulations in the two groups were identical, the only plausible explanation for the markedly elevated serum cobalt and chromium levels relates to the two modular junctions that attach the femoral head to the stem. The head-neck junction is well known to release metal ions through fretting and corrosion [19]. In the large-head MOM THA group in this study, the two modular junctions and the mismatch of metals between the titanium stem and the Cr-Co alloy adapter could account for the elevated metal ion levels [13, 19, 20]. One published article has addressed metal ions in large-head metal-on-metal total hips [3]. Comparing its ion levels with ours is difficult because of differences in bearing surface, femoral head size, and ion measurement technique; however, the metal ion levels in our patients were several fold higher than those reported in that study [3]. In that study the femoral head was attached to the femoral stem by a single Morse taper, lending further support to the explanation that the two modular junctions at this interface contribute to the markedly elevated cobalt and chromium levels in our study group. Another important issue when there are two Morse tapers is taper tolerance. Tolerances that are acceptable for conventional THA may not adequately limit particle generation when a hip arthroplasty contains more than one taper. This area should also be investigated if metal ion levels are to be reduced in patients with large-head metal-on-metal total hips.

We found that metal-on-metal total hip arthroplasty with a large-diameter head can achieve quality of life and activity levels comparable to those of hip resurfacing, and patients in both groups achieved excellent improvements in both. However, serum cobalt and chromium levels were substantially higher, and clinically concerning, in patients with modular metal-on-metal THA with a large-diameter head than in patients with hip resurfacing, and these exceedingly high levels continued to increase through 2 years after surgery. As a result, we no longer recommend this particular design. Metal-on-metal THA with a large-diameter head should be redesigned to eliminate the intervening adapter and thereby one potential source of cobalt and chromium. Although our data do not allow us to comment on other metal-on-metal total hip arthroplasties with adapters, future studies are warranted to determine whether the problem is company dependent. In addition, we believe that all large-head metal-on-metal hip arthroplasty systems should undergo comprehensive in vivo metal ion testing to establish whether their ion levels are consistent with current standards and with trends considered acceptable for modern metal-on-metal joint replacements.