Introduction

The use of metal-on-metal (MoM) bearings has declined over the past 5 years and their future use has been a subject of considerable debate and controversy. There is a general perception in the orthopaedic community that periprosthetic adverse tissue reactions, which include pseudotumor formation, periprosthetic necrosis, and lymphocytic inflammation, are caused by high wear [1, 9, 12, 13]. This perception is in part because of the occasional appearance of metallosis, that is, gross tissue staining from metal deposition into periprosthetic tissues around some MoM hip arthroplasties at the time of revision, including those with pseudotumors. However, our previous observations have shown that this is an oversimplification of a complicated issue, because there are reports in which adverse local tissue reactions (ALTRs) were observed in the absence of high wear or metallosis [5, 7, 21]. This may be related to the occurrence in some patients of metal hypersensitivity reactions. The term aseptic lymphocytic vasculitis-associated lesions (ALVAL) has been used to describe histological features initially thought to be the result of metal hypersensitivity [24].

Although osteolysis sometimes is reported around failed MoM hips, ALTRs primarily affect the soft tissues and may be assessed histologically. In an effort to understand the broad range of tissue reactions around MoM implants, our center developed a histological score based on the initial descriptions of ALVAL to rate the following features that can be potentially found in all periprosthetic tissues: changes in the synovial lining, inflammatory cell infiltrates, and changes to the overall organization of the tissues (Fig. 1) [5]. In a group of 32 pseudotumors, substantial differences in the tissue features and resulting ALVAL score were found between hips with high and low wear [5]. The highest scoring hips were revised for suspected metal hypersensitivity and those hips had low wear. However, the degree to which component wear correlated with histological features was unclear in that small group.

Fig. 1A–C

Examples are shown of the application of the ALVAL score to rate changes in the synovial lining, inflammatory cell infiltrates, and changes to the overall organization of the tissues. (A) This tissue receives a low ALVAL score because the synovial lining (left) is intact, there are very few inflammatory cells, and the organization is typical of normal capsule (1 + 1 + 1) (Stain, hematoxylin and eosin [H&E]; original magnification, × 4; inset, × 20). (B) This tissue receives a medium ALVAL score for the loss of the synovium and attachment of fibrin (left), predominant macrophages with occasional small perivascular lymphocytic aggregates (inset), and generally good tissue organization (3 + 1 + 1) (Stain, H&E; original magnification, × 4; inset, × 20). (C) This tissue receives a high ALVAL score because the synovial lining (bottom) has been replaced by fibrin and a layer of necrosis, lymphocytes are the predominant inflammatory cell (inset), and they are arranged in a broad swathe at the rear of the tissue (3 + 3 + 4) (Stain, H&E; original magnification, × 4; inset, × 20).

The primary purpose of this study was to correlate the histopathological features with wear measurements in a larger group of MoM hip arthroplasties. Specifically, we sought to determine to what extent the magnitude of wear is associated with (1) the histological changes; (2) presence of metallosis; and (3) likelihood of pseudotumor formation in the periprosthetic tissues.

Materials and Methods

A retrospective study of 119 MoM THAs and hip resurfacings was performed at our implant retrieval center. The inclusion criteria were the availability of: (1) periprosthetic soft tissues; (2) information regarding the presence of pseudotumor or metallosis provided by the surgeon; and (3) wear measurements. From an implant retrieval collection containing over 500 MoM hip implant retrievals, 119 hip implants met these inclusion criteria: bearings with minimal postretrieval damage that had been measured by a coordinate measuring machine (and, in the case of hip resurfacing femoral components, before destructive sectioning), basic clinical information such as gender, time in vivo, and reason for revision, and, finally, at least one sample of periprosthetic tissues. These included 88 hip resurfacings (28 male, 60 female) and 31 THAs (12 male, 19 female) obtained at revision after a median of 40 and 25 months in vivo, respectively. The reasons for revision included aseptic loosening (n = 35), acetabular malposition (n = 25), unexplained pain (n = 22), and suspected metal allergy (n = 11) (Table 1). There is no objective test for metal allergy, but the surgeons who provided specimens to us generally arrived at the possibility of metal allergy by excluding other likely causes of pain such as infection and loosening, particularly when the bearings were considered to be well functioning and therefore not likely to be producing high amounts of wear. There were 39 hips with tissue metallosis, which was defined as tissue discoloration obvious intraoperatively, and 27 hips with a pseudotumor, which was defined as a periprosthetic solid mass, fluid-filled sac, or an enlarged bursa found intraoperatively (Table 2). Because these features were reported by different surgeons, it was not possible to quantify them retrospectively, so they were graded as absent or present for the present study.

Table 1 Demographic data of hip implants
Table 2 Implant characteristics at revision

Components

Of the 88 hip resurfacings, the majority were Birmingham Hip Resurfacings (Smith & Nephew, Memphis, TN, USA) (n = 41) and Conserve Plus resurfacings (Wright Medical Technology, Arlington, TN, USA) (n = 20). The implant types also included some earlier generation MoM resurfacings and 31 conventional metal-on-metal THAs (Table 3). The age of the patients ranged from 18 to 82 years and in vivo service ranged from 1 to 178 months; ball sizes ranged from 28 to 58 mm (Table 1). From routine implant retrieval analysis findings, in conjunction with clinical, intraoperative, and radiographic information provided by the revising surgeon, a mode of failure was assigned to each hip (Table 1).

Table 3 Type of hip implants

Histological Analysis

Archived tissue specimens were used. These were produced after fixation in 10% neutral-buffered formalin and paraffin processing for sectioning and staining with hematoxylin and eosin. A trained pathologist (DK), blinded to the implant history and unaware of the study design, analyzed at least two slides from each hip. Each slide was given an ALVAL score [5], from 0 to 10, based on the sum of three histomorphological constituents: loss of synovial lining (0–3), inflammatory infiltrate (0–4), and tissue organization (0–3) (Fig. 1).
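The additive structure of the score can be illustrated with a minimal sketch. The class name `AlvalScore` and its field names are hypothetical and chosen only for illustration; the component ranges are those stated above, and the example rating is the one given for Fig. 1B.

```python
from dataclasses import dataclass

# Component ranges as stated in the text: synovial lining 0-3,
# inflammatory infiltrate 0-4, tissue organization 0-3.
COMPONENT_RANGES = {
    "synovial_lining": (0, 3),
    "inflammatory_infiltrate": (0, 4),
    "tissue_organization": (0, 3),
}


@dataclass(frozen=True)
class AlvalScore:
    """Hypothetical container for the three component ratings of one slide."""

    synovial_lining: int
    inflammatory_infiltrate: int
    tissue_organization: int

    def __post_init__(self):
        # Reject ratings outside the published range of each component.
        for name, (low, high) in COMPONENT_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")

    @property
    def total(self) -> int:
        """Total ALVAL score: the sum of the three components (0-10)."""
        return (
            self.synovial_lining
            + self.inflammatory_infiltrate
            + self.tissue_organization
        )


# The medium-score example of Fig. 1B is rated 3 + 1 + 1.
slide_b = AlvalScore(synovial_lining=3, inflammatory_infiltrate=1, tissue_organization=1)
print(slide_b.total)  # 5
```

How the scores of several slides from one hip were combined is not stated here, so the sketch deliberately stops at the level of a single slide.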

Wear Analysis

The wear depths of both the femoral and acetabular components had been measured using a coordinate measuring machine (BMT 504; Mitutoyo, Aurora, IL, USA) with a resolution of 0.01 µm by digitizing 300 to 400 points over the surface of the implant.

Statistical Analysis

From the previously described implant retrieval database and clinical and revision surgery documents, the following variables were obtained: (1) maximal wear depth of the acetabular and/or femoral component; (2) presence of a pseudotumor; (3) presence of metallosis; (4) mode of implant failure; and (5) ALVAL score for both total and individual morphological characteristics. All continuous variables were described using mean, median, ranges, and box plots. The distributions of wear depths and ALVAL scores were checked to determine the appropriate type of analysis and representation. The correlations of ball wear, cup wear, and total wear depth with the total ALVAL score were evaluated using Spearman correlation coefficients and the associated p values. The correlations among the three constituents of the total ALVAL score were evaluated using Pearson correlation coefficients. The differences in ball wear, cup wear, and total wear depth between hips with and without metallosis were evaluated using the nonparametric Mann-Whitney test. Similarly, the differences in ball wear, cup wear, and total wear depth between hips with and without pseudotumors were evaluated using the Mann-Whitney test. A general linear model was established to compare the ALVAL scores among the hips with different modes of failure. After this analysis, the least significant difference method of post hoc multiple comparisons was used to compare each grouping of hips by failure mode with the others to determine significant differences, if any.
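As an illustration of the rank-based correlation named above, the following is a minimal standard-library sketch of a Spearman coefficient: each variable is converted to ranks (ties share the mean of their ranks) and a Pearson coefficient is then computed on the ranks. The function names and the data are hypothetical and are not the study's software or measurements; no p value is computed.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: wear depth (µm) against inflammatory infiltrate score.
wear_um = [5, 12, 40, 150, 300]
infiltrate = [4, 3, 3, 2, 1]
print(round(spearman(wear_um, infiltrate), 3))
```

Because only ranks enter the calculation, a few very large wear depths in a skewed distribution do not dominate the coefficient, which is the usual reason for preferring it over a Pearson coefficient for data of this kind.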

There was marked variation in the histological features and measured wear depths. Overall, the ALVAL scores were distributed normally with a mean of 5.4 and they were ranked low in 24, moderate in 86, and high in eight. The three constituents of the total ALVAL score were all found to correlate with each other (p < 0.001 for all three correlations, Pearson’s correlation coefficients between 0.296 and 0.604). Explant wear depths had a skewed distribution (Fig. 2) and ranged from 2 to 614 µm in the acetabular components (median 12 µm) and from 2 to 316 µm in femoral components (median 14 µm). As a result of the highly skewed distribution, the mode (most frequently occurring) values were most representative of the sample: 6 µm for the balls and 5 µm for the cups. When divided by followup time, the mode values for wear rates were 2.1 µm per year for balls and 3.0 µm per year for cups. There was a strong correlation between the acetabular and femoral component maximum wear depth (r = 0.77, p < 0.0001) (Fig. 3). Malpositioned cups were associated with high wear of both components, with a median of 46 µm for the cup (range, 2–614 µm) and 56 µm for the ball (range, 6–316 µm), higher than in hips that were not malpositioned, which had a median of 10 µm for the cup (range, 2–235 µm) and 13 µm for the ball (range, 3–202 µm). Additionally, 34 hips (28.6%) demonstrated subjective findings of edge wear.
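The effect of skew on the choice of summary statistic, and the conversion of a wear depth to an annual rate, can be sketched as follows. The wear depths are invented for illustration and are not the study's measurements, and `wear_rate_um_per_year` is a hypothetical helper that assumes the time in vivo is recorded in months.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed wear depths (µm): most values are small,
# with a few very large ones.
wear_depth_um = [5, 6, 6, 6, 8, 12, 14, 30, 150, 600]

print(mode(wear_depth_um))    # 6: the most frequently occurring value
print(median(wear_depth_um))  # 10.0: the middle of the sorted values
print(mean(wear_depth_um))    # 83.7: pulled upward by the two largest values


def wear_rate_um_per_year(depth_um, months_in_vivo):
    """Linear wear rate: wear depth divided by the time in vivo in years."""
    return depth_um * 12 / months_in_vivo


# A hypothetical component with 10 µm of wear after 40 months in vivo.
print(wear_rate_um_per_year(10, 40))  # 3.0
```

In this invented sample the mean is more than eight times the median, which shows why a mean would misrepresent the typical implant when the distribution has a long right tail.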

Fig. 2

Wear distribution of the femoral and acetabular bearings is shown.

Fig. 3

There is a correlation between the acetabular and femoral component maximum wear depth (µm).

Results

Overall, with the numbers available, the magnitude of wear was not associated with the total ALVAL score (ρ = −0.09, p = 0.42) (Fig. 4). Wear was mildly associated with only one of the three constituents of the ALVAL score, specifically, a negative correlation with inflammatory infiltration (coefficient −0.33, p = 0.003) (Fig. 5). Patients revised for suspected metal allergy had the highest ALVAL scores, higher than those revised for other reasons (GLM p = 0.033) (Fig. 6).

Fig. 4

A correlation analysis between ALVAL score and wear depth (µm) is demonstrated.

Fig. 5

A correlation analysis was performed between wear (µm) and inflammatory infiltration score.

Fig. 6

The distribution of ALVAL scores in different modes of failure is shown. Fx = fracture.

The maximum wear depths of both the ball and the cup were higher in patients with metallosis (median, 66–78 µm; range, 3–614 µm) than in those without (median, 8–9 µm; range, 2–156 µm). ALVAL scores were nearly identical in those with and without metallosis (mean, 5.30 ± 1.73 and 5.28 ± 1.69, respectively, p = 0.96).

Maximum wear depth was not predictive of the formation of a pseudotumor with the numbers available, because the wear depth was similarly distributed in hips with and without a pseudotumor (Fig. 7). Importantly, among those with a pseudotumor (n = 27), the median cup wear was 31 µm with a range of 3 to 614 µm, indicating that half of the hips with pseudotumors had less than 31 µm of wear. Conversely, those without a pseudotumor (n = 92) had a median cup wear of 10 µm, ranging from 2 to 368 µm. Median ball wear was 30 µm in those with a pseudotumor, ranging from 3 to 259 µm, and 13 µm in those with no pseudotumor, ranging from 3 to 316 µm. With the number of hips available, there were no statistically significant differences in cup (p = 0.18) or ball (p = 0.74) wear depth between those with and without a pseudotumor.
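The kind of two-group comparison reported above can be sketched with a minimal standard-library version of the Mann-Whitney U statistic. This is an illustrative simplification, not the study's software: the p value uses a normal approximation without tie or continuity correction, the function name is hypothetical, and the data are invented.

```python
from statistics import NormalDist


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, approximate two-sided p value).

    U counts, over all cross-group pairs, how often a value in group_a
    exceeds a value in group_b, with each tie counted as one half.
    """
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    # Under the null hypothesis U has mean n_a*n_b/2; the standard deviation
    # below ignores ties (a simplifying assumption of this sketch).
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u_a - mean_u) / sd_u
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_two_sided


# Hypothetical cup wear depths (µm) with and without a pseudotumor.
with_pseudotumor = [3, 9, 31, 48, 120, 614]
without_pseudotumor = [2, 5, 8, 10, 14, 27, 90, 368]
u, p = mann_whitney_u(with_pseudotumor, without_pseudotumor)
print(u, round(p, 3))
```

Because the statistic depends only on the ordering of the values, heavily overlapping groups can yield a nonsignificant result even when their medians differ, as they do for the two groups described above.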

Fig. 7

Box plots show the wear depth with and without pseudotumor.

Discussion

A decade ago, a large percentage of hip replacements performed in the United States and virtually all resurfacing arthroplasties used metal-on-metal bearing surfaces; however, today, there has been a major decline in the use of MoM bearings [1, 13]. Although some studies have implicated metal wear as a factor in the failure of some MoM implants, several studies have demonstrated that pseudotumors and other types of complications can occur in the absence of high wear [5, 6, 21] and in our previous study, the majority of implants were revised for reasons other than wear [7]. In the present study of 119 available MoM hips, wear was not associated with periprosthetic tissue reaction as measured by the ALVAL score or the formation of pseudotumors. Therefore, wear alone did not explain the histopathological periprosthetic tissue reactions associated with MoM implants.

Our study had a number of limitations, and our findings should be interpreted in light of these issues. First, we used a histopathological grading scheme, which, like other semiquantitative rankings of periprosthetic features that have been used in the past [16, 24], cannot be validated by a secondary objective measurement system. Although this may have obscured any true correlations between histopathological reactions and wear depth measurements, until validated tests for metal allergy or wear-debris induced histiocytic inflammation become available to provide validation for the ALVAL score, we must accept that the specificity and sensitivity of the ALVAL grading scale cannot be established. Moreover, the purpose of the ALVAL score was not diagnostic but, rather, to improve the standardization of reporting of periprosthetic histology and to decrease the effect of individual interpretation. The ALVAL scoring system has good interobserver agreement among experienced users [5] and is being increasingly used in the literature [9, 11, 13, 18, 19, 22]. Because it incorporates cellular and tissue features that are common to tissues around all types of implants, the use of the ALVAL score may be an objective reflection of the broad range of histological reactions in implanted joints. A second limitation was the use of linear wear depth to represent the magnitude of wear rather than volumetric wear loss. However, a recent study has indicated very high correlations between linear and volumetric measurements in MoM implants with correlation coefficients of 0.93 and 0.84 (p < 0.001) [8], suggesting that the conclusions of the present study were unlikely to have been affected by this limitation. Other limitations of the present study include the fact that a limited number of the MoM implants in our retrieval collection met the inclusion criteria. This limitation, along with the relatively small number of patients who had metallosis and/or pseudotumor, may have resulted in low statistical power to identify associations with wear depth. Furthermore, the fact that the majority of the hips had very low wear values may account for the lack of correlations. Specifically, the mode (most frequent value) of wear was only 2 to 3 µm per year in this retrieval cohort. Also, metallosis and pseudotumors were treated as categorical findings (absent or present), whereas their extent or size varies considerably in patients. Unfortunately, the retrospective nature of the present study limited our ability to quantify these important outcome variables, but this could be examined in future studies.

The association between metal wear and ALVAL has been examined in the literature. Recently, Grammatopoulos et al. [9] performed a similar study comparing wear and ALVAL in 56 hip resurfacings with and without pseudotumors. The results of the present study were consistent with those of Grammatopoulos et al., because they too could not find an ALVAL dosage dependence on wear when using the same histological criteria that were used in the present study. Our results demonstrated a negative correlation between the inflammatory infiltrate score and wear such that higher wear was associated with a lower score for that constituent, likely reflecting that more macrophages were present in the tissues. In a study of 94 MoM hip resurfacings and THAs, Nawabi et al. [19] also found a slight negative correlation (ρ = −0.237, p = 0.022) between linear wear and ALVAL score. Because macrophages are known to respond to particles [2, 4, 10, 20], they are likely to predominate in high-wear hips. However, it should be emphasized that this is also likely a simplification of a complex series of tribological and biological events that require more elaborate studies.

The present study found that wear was typically higher in patients with metallosis (n = 38) but that those tissues had nearly identical ALVAL total scores to tissues without metallosis. This was consistent with the findings of Pelt et al. [22] who, in a smaller series of 18 large-diameter THAs, also found no difference in ALVAL scores between hips with and without intraoperative metallosis and high or low metal load assessed histologically.

A causal association between the presence of pseudotumors and metal wear magnitude has not been established in the literature. Several studies found that pseudotumors were associated with high wear and metal hypersensitivity [8, 12]. In the Grammatopoulos et al. [9] study previously mentioned, the authors found that patients with pseudotumors (n = 45) had increased linear wear rates and ALVAL scores compared with those without pseudotumors. Additional studies by the same group also demonstrated increased linear wear in components associated with pseudotumors [8, 12]. In contrast, similar to the results of the present study, Matthies et al. [15] found that patients revised for pseudotumors (n = 72) had wear rates and metal ion levels comparable to those of patients revised without a pseudotumor. In addition, the authors found that the proportion of patients with a pseudotumor was similar between those with and without well-positioned components. Regardless of the association between high wear and pseudotumor presence, most of the aforementioned studies do note that pseudotumors can occur in the presence of low wear. Moreover, patients with high wear may not develop pseudotumors. These observations are generally consistent with a previous report by our group, which described 32 pseudotumors that were found in hips with both high and low wear [5]. In that study, the pseudotumors around implants with low wear had a higher ALVAL score than hips with high wear, leading to the suggestion that those represented metal hypersensitivity reactions. Although this may seem to be a likely explanation for the lack of dosage dependence of wear with ALVAL and pseudotumor formation observed in this and other studies, it is worth reiterating that the spectrum of tissue reactions is likely affected by complex individual host factors that are still poorly understood. For example, adverse tissue reactions may develop in patients without MoM bearings [3, 11, 14, 17, 23], in which metal or corrosion products are generated at modular junctions, which may have an exaggerated result in those with metal hypersensitivity.

In summary, our results demonstrate that wear alone is not predictive of pseudotumor formation or ALVAL features in periprosthetic tissues. The cause of the soft tissue reactions around MoM implants is multifactorial in nature and is not simply a matter of wear, although reducing the amount of wear occurring in patients is clearly a worthy goal. The present study should serve to emphasize that factors other than wear such as individual patient immunoreactivity are likely to play an important role in the outcome of a hip arthroplasty regardless of the bearings. Elucidating the role of these other factors should remain a research goal.