Introduction

Large-head metal-on-metal total hip arthroplasties (MoM THA) and metal-on-metal hip resurfacing arthroplasties (MoM HRA) were introduced because of their perceived advantages over conventional metal-on-polyethylene articulations [1]. However, there have been numerous alarming reports of peri-articular masses, usually referred to as pseudotumors, in patients with MoM arthroplasties. Pseudotumors can be small or large, solid or fluid-filled masses, with or without communication with the joint [2–4]. The etiology is probably a capsular reaction to metal debris that eventually leads to pseudotumor formation [5, 6]. The reported incidence in different screening cohorts ranges from 28 to 39 %, depending on the type of MoM prosthesis and the screening method used [4]. Although often benign, pseudotumors can be destructive, causing soft tissue damage, osteolysis, fractures, and (sub)luxation, with concomitant pain and discomfort. With large pseudotumors, revision surgery is often warranted in symptomatic cases [7]. Since 2010, metal-on-metal hip articulations have been under increased scrutiny from governmental regulatory agencies and national and international societies, leading to alerts, advice, post-marketing surveillance, and even outright discontinuation of metal-on-metal devices [8–10]. From that year onward, all patients in our hospital who received a MoM hip implant have been invited to a comprehensive screening protocol including CT imaging. Although there is no general consensus on whether all forms of capsular reaction found in screening populations are clinically relevant, adverse reactions to metal debris (ARMD) are prevalent in symptomatic as well as asymptomatic patients with MoM hip replacements.

Screening for capsular reactions by means of computed tomography (CT) is efficient and relatively quick. CT is much more widely available than magnetic resonance (MR) imaging, and its costs are estimated to be 2–4 times lower. In a previous study, we showed that CT correlates well with MR in detecting pseudotumors [2]. CT has the additional advantages that the anteversion of both the acetabular and the femoral components can be calculated and that it is much better at detecting osteolysis. To use CT in a clinical setting, a robust, easy-to-use grading system for the morphology of the hip capsule is mandatory. For this purpose, a five-point grading scale based solely on morphological changes of the hip capsule was developed in our hospital [4, 11]. As we gained experience in applying the grading system clinically in our first cohort of patients, it became apparent that the distinctions between types I and II and between types IV and V did not seem to influence decisions regarding follow-up and revision [2]. We therefore modified the existing grading system (I–V) into three classes (A, B, and C) of distinct capsular changes [11]. The primary aim of the present study was to develop a score classifying the absence or presence of MoM-related capsular pathology and to assess its association with revision, by means of a cross-sectional design [4, 11].

Materials and methods

All patients who underwent MoM THA in our hospital were invited for screening. Various clinical measures and a CT scan were obtained in a cross-sectional fashion, and a decision on revision surgery was made shortly after screening. We developed a score for capsular reactions and performed an intra- and interrater reliability study in a cohort with a uniform MoM THA design. The study comprises 582 patients, 82 of whom were treated bilaterally; all were invited for follow-up. The implant consisted of a Bi-Metric porous-coated uncemented stem with a metal-on-metal M2a-Magnum femoral head and ReCap acetabular component (Biomet, Warsaw, IN, USA). The modular head and the acetabular component are high-carbon, as-cast (single heated) components. The first cohort of this group consisted of 108 patients who were part of a prospective single-center study for which approval of the medical ethics board was obtained [2]. Subsequently, all treated patients in our clinic were contacted and invited for outpatient screening, again with approval of the institutional review board. Patients were scheduled for non-contrast CT on 48- and 64-slice scanners (Philips, Best, Netherlands) without iterative reconstruction (iDose4) and without orthopedic metal artefact reduction post-processing (OMAR). CT parameters: 140 kV, 250 mAs, slice thickness 0.9 mm, increment 0.45 mm, collimation 48 or 64 × 0.625 mm, pitch 0.675, rotation time 0.75 s. Reconstructions were made with the D filter (WL/WW 800/2000) and the A filter (WL/WW 50/350), each processed in the axial, sagittal, and coronal planes. Window width to window level values were set at 2000:650. All examinations were reviewed on a workstation running Agfa IMPAX version 6.3.1.4537 with BARCO MDCC3120-DL color monitors (resolution 1536 × 2048, portrait orientation, physical size 31.8 × 42.4 cm/12.52 × 16.69 inch).
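To make the window settings above concrete, the sketch below converts a window level (WL, the centre) and window width (WW, the span) into the range of Hounsfield units (HU) that is displayed in grey scale. It is a simplified linear approximation (lower bound WL − WW/2, upper bound WL + WW/2); the exact DICOM VOI LUT function adds small ±0.5/−1 offsets. The function name is hypothetical, and reading "2000:650" as WW 2000 with WL 650 is an interpretation of the text.

```python
def window_bounds(level_hu: float, width_hu: float) -> tuple[float, float]:
    """Approximate HU range shown for a window level (centre) and width.

    Simplified: lower = WL - WW/2, upper = WL + WW/2. Pixels below the lower
    bound display as black, pixels above the upper bound as white.
    """
    if width_hu <= 0:
        raise ValueError("window width must be positive")
    half = width_hu / 2.0
    return (level_hu - half, level_hu + half)


# WL/WW settings taken from the protocol text; "review" assumes 2000:650 = WW:WL
for name, wl, ww in [("D filter", 800, 2000), ("A filter", 50, 350), ("review", 650, 2000)]:
    lo, hi = window_bounds(wl, ww)
    print(f"{name}: WL {wl} / WW {ww} -> {lo:.0f} to {hi:.0f} HU")
```

Under this approximation the D-filter window spans roughly −200 to 1800 HU (suited to bone and metal), whereas the A-filter window spans roughly −125 to 225 HU (suited to the capsule and surrounding soft tissue).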
The Digital Imaging and Communications in Medicine (DICOM) data of the CT examinations were anonymized for the second reader (MM) using an available PACS anonymization tool. Both readers were board-certified musculoskeletal radiologists, had no previous experience in reporting capsular disease around metal-on-metal implants, and were blinded to the patients’ further history. The first reader trained the second reader in the newly designed CT classification system of morphological changes of the hip capsule, as the second reader had not encountered MoM-related capsular pathology before. They reached consensus on 20 cases before each observer scored all cases independently. The classification system comprises five categories (I–V), covering the entire spectrum of post-operative CT findings of the hip capsule [4, 11]. All MoM THA implants were subsequently classified into categories I–V (Figs. 1, 2, 3, 4, and 5). After all CT scans had been read and graded independently, the observers reached consensus on differing classifications. Based on this consensus experience, we simplified the five-point classification system into a three-point system, as the distinction between I and II did not seem clinically relevant and IV and V were considered different morphological expressions of the same underlying disease. This adapted system distinguishes category A, B, and C capsular changes [11]: categories I and II of the original system are merged into category A, category III coincides with category B, and categories IV and V correspond to category C (Table 1). The pre-clinical assumption is that only category C requires revision surgery. A category A capsular reaction, comprising category I and II reactions, is not considered clinically relevant, as this reaction is also present in patients with a conventional THA [12].
Category B corresponds to the category III capsular reaction; it is rarely observed and is considered clinically relevant, as it shows a bulging mass effect anteriorly and posteriorly. Category B and C patients are considered candidates for revision if they are symptomatic or if the peri-articular mass compromises the abductor apparatus or the neurovascular bundle. Category C comprises category IV and V capsules, which are under inflammatory pressure and consequently expand in the direction of least resistance. This expansion can be eccentric, most often inferomedial to the head of the THA or THR and in some cases above the neck of the prosthesis. In a category V lesion, the pressure is relieved by filling of the bursa iliopectinea, which may be non-communicating, communicating, and/or septated (13 %), or by filling of the bursa subtrochanterica. The bursa subtrochanterica is often damaged during the surgical approach to the hip joint in THA. When a fluid collection communicating between the bursa and the hip joint is observed, we believe this communication to be iatrogenic [13–15].
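The mapping from the original I–V grades to the simplified A–C categories, and the revision-candidacy rule stated above, can be written down as a small lookup and predicate. This is a hypothetical sketch for illustration only: the names are invented here, and the actual revision decision also weighed serum ion levels and the overall clinical picture.

```python
# Mapping from the original five-grade system to the simplified A-C system,
# as described in the text: I and II -> A, III -> B, IV and V -> C.
GRADE_TO_CATEGORY = {"I": "A", "II": "A", "III": "B", "IV": "C", "V": "C"}


def simplify(grade: str) -> str:
    """Convert a capsular grade I-V to the simplified category A-C."""
    try:
        return GRADE_TO_CATEGORY[grade]
    except KeyError:
        raise ValueError(f"unknown capsular grade: {grade!r}") from None


def revision_candidate(category: str, symptomatic: bool,
                       mass_compromises_structures: bool) -> bool:
    """Hypothetical encoding of the candidacy rule stated in the text.

    Category A is not considered clinically relevant. Category B and C hips
    are candidates if the patient is symptomatic or the peri-articular mass
    compromises the abductor apparatus or the neurovascular bundle.
    """
    if category not in ("A", "B", "C"):
        raise ValueError(f"unknown category: {category!r}")
    if category == "A":
        return False
    return symptomatic or mass_compromises_structures


print([simplify(g) for g in ("I", "II", "III", "IV", "V")])  # ['A', 'A', 'B', 'C', 'C']
print(revision_candidate("C", symptomatic=False, mass_compromises_structures=True))  # True
```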

Fig. 1

Axial CT image, type I hip capsule reaction on the left, thickening of the hip capsule anteriorly not more than 4–6 mm

Fig. 2

Axial CT image, type II hip capsule reaction on the right, thickening of the hip capsule more than 6 mm

Fig. 3

Axial CT image, type III hip capsule reaction on the right, bulging both anteriorly and posteriorly

Fig. 4

Axial CT image, type IV hip capsule reaction bilaterally, inferomedial enlargement of the hip capsule

Fig. 5

Axial CT image, type V hip capsule reaction on the right, filling of the bursa iliopectinea and bursa subtrochanterica, both in connection with the hip capsule

Table 1 Scoring method: simplified A–C classification system derived from the original I–V grading system

When different types of CT findings were present in one THA, the highest score was applied.
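The "highest score wins" rule amounts to taking the maximum over an ordered scale. A minimal sketch, assuming the scores are stored as the strings used in the text (the function and constant names are hypothetical):

```python
GRADES_I_V = ("I", "II", "III", "IV", "V")  # ordered from mildest to most severe
CATEGORIES_A_C = ("A", "B", "C")


def highest_score(findings, order=GRADES_I_V):
    """Return the most severe score among the findings seen in one THA.

    `order` lists the scores from mildest to most severe; an unknown score
    raises ValueError (via tuple.index).
    """
    if not findings:
        raise ValueError("at least one finding is required")
    return max(findings, key=order.index)


print(highest_score(["II", "IV"]))                     # IV
print(highest_score(["A", "C", "B"], CATEGORIES_A_C))  # C
```

Because the I–V to A–C mapping never reverses the order of severity, applying the rule on the original grades and then simplifying gives the same category as simplifying each finding first.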

Inter-radiologist agreement on MoM pathology was assessed by means of the weighted Cohen’s kappa (κw) for both classification systems (I–V and A–C). Intra-radiologist agreement for the simplified A–C classification system was studied in a random sample of 20 % of patients (n = 122). Categorical data are presented as n (%) and were tested by means of Fisher’s exact test. Continuous data are presented as median (min–max) and were tested by means of the Mann–Whitney U test (two-group comparison) or the Kruskal–Wallis test (three-group comparison). The decision on revision surgery was made immediately after screening, which also included serum ion levels, and took the clinical situation of the patient into account [16].
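For readers unfamiliar with the weighted kappa, the sketch below computes it from two raters' ordinal scores. The text does not state which weighting scheme was used, so linear disagreement weights are assumed by default, with quadratic weights as an option; confidence intervals are not computed. The function name and toy ratings are hypothetical and are not study data.

```python
from collections import Counter


def weighted_kappa(rater1, rater2, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters scoring the same items on an ordinal scale.

    categories: the ordered scale, e.g. ("A", "B", "C").
    weighting: "linear" uses disagreement weights |i - j| / (k - 1);
               "quadratic" squares them.
    kappa_w = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O are observed
    counts and E the counts expected from the two raters' marginals.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)
    observed = Counter((index[a], index[b]) for a, b in zip(rater1, rater2))
    marg1 = Counter(index[a] for a in rater1)  # rater 1 category counts
    marg2 = Counter(index[b] for b in rater2)  # rater 2 category counts
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            if weighting == "quadratic":
                w *= w
            weighted_obs += w * observed[(i, j)]
            weighted_exp += w * marg1[i] * marg2[j] / n
    if weighted_exp == 0:
        raise ValueError("kappa undefined: both raters used one identical category")
    return 1.0 - weighted_obs / weighted_exp


# Toy example: four hips, one one-step disagreement (A vs B)
print(round(weighted_kappa(list("AABC"), list("ABBC"), "ABC"), 3))  # 0.714
```

With weights, a one-step disagreement (A vs B) is penalized less than a two-step one (A vs C), which is why a weighted kappa suits an ordinal scale such as I–V or A–C.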

To assess whether the classification system correlated with the decision for revision surgery, logistic regression analysis was performed. For this analysis, we excluded patients revised for reasons other than adverse local tissue reaction, such as instability, infection, and aseptic loosening. Candidate variables included age, in situ time of the prosthesis, contralateral THA, acetabular version, cup size, cup inclination and anteversion, serum cobalt and chromium ion levels, and the simplified (A–C) classification system. Variables that were statistically significant in univariate analysis were entered into multiple regression models to study their independence. All statistical analyses were two-tailed, with α = 5 % as the significance level (SPSS version 22.0).
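The study used SPSS; as a language-neutral illustration of the univariate step, the sketch below fits a one-predictor logistic regression by Newton–Raphson and reports the Nagelkerke R², the measure used to compare models in the Results. Coding categories A, B, and C as 0, 1, and 2 (a linear trend) is an assumption made here; the study may have used categorical coding. All names and the toy data are hypothetical.

```python
import math


def fit_logistic_1d(x, y, max_iter=100, tol=1e-10):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Assumes the outcome is not perfectly separated by x (otherwise the
    maximum-likelihood estimate does not exist and the iteration diverges).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # (X'WX)^-1 times gradient
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def log_likelihood(y, probs):
    return sum(math.log(p) if yi else math.log(1.0 - p) for yi, p in zip(y, probs))


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Toy data (NOT study data): CT category coded A=0, B=1, C=2; y = 1 if revised
x = [0] * 4 + [1] * 4 + [2] * 4
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
p_null = sum(y) / len(y)  # intercept-only (null) model
r2 = nagelkerke_r2(log_likelihood(y, [p_null] * len(y)), log_likelihood(y, probs), len(y))
print(f"b1 = {b1:.3f}, Nagelkerke R^2 = {r2:.3f}")
```

The Nagelkerke R² rescales the Cox–Snell R² so that a perfect model can reach 1, which makes values such as 0.211 and 0.445 comparable across nested models fitted to the same patients.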

Results

In total, 664 MoM hips in 582 patients were scored by both observers and were available for analysis. CT was performed on average 37 months after surgery. Interobserver reliability for the original version (I–V) was κw = 0.71 (95 % CI: 0.62–0.79), which indicates good agreement between the two musculoskeletal radiologists. Interobserver reliability for the simplified version (A–C) was also κw = 0.71 (95 % CI: 0.65–0.76), again indicating good agreement. Intra-observer reliability for the simplified version (A–C) was κw = 0.78 (95 % CI: 0.68–0.87); as expected, intra-observer reliability was higher than interobserver reliability. Interobserver reliability was also compared between the original and simplified scales and, as expected, did not differ between versions (p < 0.87).

Table 2 shows MoM-related patient characteristics and the revision decision for all patients and for patients with unilateral or bilateral MoM THA. Figure 6 shows that the adapted CT category is associated with revision performed exclusively for MoM pathology, both in patients with unilateral MoM THA (p < 0.001) and in patients with bilateral MoM THA (p < 0.044). Table 3 shows that the adapted CT category is associated with several clinical measures. Table 4 shows the association of revision status with several clinical measures in patients with a unilateral MoM THA. Table 5 shows the logistic regression analysis of associates of revision surgery in patients with a unilateral MoM THA, with or without a contralateral conventional THA. In univariate logistic regression analysis of revision, cup size, cup anteversion, serum cobalt and chromium ion levels, and the simplified A–C classification system were statistically significant. The Nagelkerke R² of the simplified A–C classification system was 0.211. When all univariately significant variables were entered into a multiple logistic regression model, the Nagelkerke R² was 0.445; however, cobalt and chromium lost statistical significance. Cobalt and chromium were highly correlated (r = 0.931, p < 0.001). In a model including cup size, cup anteversion, chromium, and the simplified A–C classification system, all variables were statistically significant, with a Nagelkerke R² of 0.444. In an even smaller model including cup size, chromium, and the simplified A–C classification system, all variables were statistically significant, with a Nagelkerke R² of 0.433.
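The loss of significance for cobalt and chromium in the multiple model is consistent with collinearity. As an illustrative calculation that is not reported in the study, the variance inflation factor (VIF) implied by the reported correlation is shown below. The formula is borrowed from linear regression and is exact only when the two ions are the only predictors, so for the logistic models here it is an approximation:

```latex
\mathrm{VIF} = \frac{1}{1 - r^{2}} = \frac{1}{1 - 0.931^{2}} = \frac{1}{1 - 0.867} \approx 7.5,
\qquad \sqrt{\mathrm{VIF}} \approx 2.7
```

Under these assumptions, the standard error of each ion's coefficient would be inflated roughly 2.7-fold compared with uncorrelated predictors, which can render two individually significant predictors non-significant when they are entered together. This is in line with retaining only chromium in the smaller models.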

Table 2 Characteristics of the patients and revision decision
Fig. 6

Percentage of revision surgery by CT category. MoM, metal-on-metal; THA, total hip arthroplasty

Table 3 Association of CT category with patient characteristics in patients with a unilateral MoM THA** and patients with a bilateral MoM THA
Table 4 Association of revision status* with several clinical measures, in patients with a unilateral MoM THA**
Table 5 Logistic regression analysis for associates of revision*, in patients with a unilateral MoM THA** (n = 490)

Discussion

In this study, we show good intra- and interobserver reliability for a simplified system classifying swelling around large-head MoM hip arthroplasties. To our knowledge, no practical CT grading system with good intra- and interrater reliability has previously been described. Interrater agreement was good for both the extended and the simplified version, and intra-observer agreement (κw = 0.78) was slightly higher than inter-observer agreement (κw = 0.71).

Further analysis shows that the classification is associated with other MoM-related parameters in distinct patient groups. Perhaps unsurprisingly, the simplified A–C classification system proved to be an independent associate of revision in several multiple logistic regression models. Of all independent associates of revision, it is the least likely to be attributable to chance (p < 0.001).

The ability to measure femoral and acetabular anteversion on CT was of great added value to surgeons, especially in planning revision surgery. CT allowed us to assess the position (ante-/retroversion) of the acetabular component and the stem (with a supplementary knee scan), which is of great interest to the clinician, as it is one of the parameters that influence the indication for and execution of revision surgery. We have shown here that in a multiple logistic regression model, cup anteversion is indeed an independent associate of revision, notably a negative one.

Consensus meetings for clinical purposes showed that it is often difficult to assess the thickness of the capsule because of the presence of streak artefacts. The best location to assess the thickness in our opinion is the insertion on the trochanter anteriorly.

The demand for imaging studies in patients with MoM THA and THR has increased dramatically: first, as a consequence of the issued recommendations; second, because of the media attention these recommendations attracted; and third, because of the large number of patients invited for follow-up to screen for capsular reactions. This situation puts a demand on a hospital’s financial resources and calls for an efficient means of screening for significant pathology, to identify patients who could be candidates for revision. The diagnostic techniques used to assess MoM THR patients include ultrasound (US), CT, and MRI. MRI with metal artefact reduction sequences (MARS) in particular is an excellent tool to detect pseudotumors, and three reliable MR classifications of soft tissue changes in MoM THA have recently been published [17–20]. However, many centers do not have access to MARS. Moreover, lengthy scan times, absence of MARS software, costs, and the large numbers to be screened preclude widespread use of MRI for this purpose. In contrast, screening by means of CT is efficient and relatively quick. CT is generally much more widely available, and its costs are estimated to be 2–4 times lower in the Netherlands. It has the essential additional advantage of allowing the orientation of the individual prosthesis components to be calculated, as malposition is vital in the decision-making process prior to potential revision surgery.

Radiation exposure is nevertheless a reason for concern. By careful planning, the dose length product can be minimized. An upgrade from a 48- to a 64-slice system with iDose reduced the computed tomography dose index (CTDI) by approximately 30 %. Data from the literature suggest that the administered radiation dose can be decreased by more than 50 % [21–23]. Furthermore, new hybrid iterative and full-iterative reconstruction protocols are available from various manufacturers, and this development will almost certainly further reduce patient dose. New metal artefact suppression post-processing software is also available, improving visibility of the soft tissues immediately surrounding metal implants [24–26], and recent research suggests that streak artefact reduction can be improved further [27–31]. Owing to these ongoing technical developments, image quality in areas affected by metal artefacts can potentially be improved. The benefit can be twofold: these methods may reveal morphology that was not visible without correction, or they may offset the poorer photon statistics of dose-saving protocols so that similar image quality is achieved. Full-iterative reconstruction techniques, as currently available, are expected to reduce the radiation dose substantially. This study shows that a reliable, feasible, and easy-to-use CT classification system for capsular MoM-related disease exists. With ongoing dose reduction and the added benefit of measuring component position and osteolysis, we believe CT is an attractive alternative to MR. Streak artefacts, which often make capsular thickness difficult to assess, remain a concern; post-processing techniques such as metal artifact reduction combined with full-iterative reconstruction, and in time possibly spectral CT, will reduce them.
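The effect of careful planning and of a lower dose index on the dose length product (DLP) follows from its standard definition. The worked relation below assumes that the reported CTDI refers to the volume CTDI (CTDIvol) and that the scan length L is unchanged by the scanner upgrade:

```latex
\mathrm{DLP}\,[\mathrm{mGy\cdot cm}] = \mathrm{CTDI_{vol}}\,[\mathrm{mGy}] \times L\,[\mathrm{cm}],
\qquad
\frac{\mathrm{DLP}_{\mathrm{new}}}{\mathrm{DLP}_{\mathrm{old}}}
= \frac{\mathrm{CTDI_{vol,new}}}{\mathrm{CTDI_{vol,old}}}\cdot\frac{L_{\mathrm{new}}}{L_{\mathrm{old}}}
= 0.70 \cdot 1 = 0.70
```

Under these assumptions, a CTDI about 30 % lower at the same scan length yields a DLP about 30 % lower, and restricting the scan range to the region of interest reduces the DLP proportionally to the shortened length.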

This investigation adds clinical validity to a tool that in our hospital significantly helped communication between radiologists and orthopedic surgeons regarding the management of MoM patients.

Although the classification suggests a linear progression of capsular reaction, this study was not designed to evaluate progression of capsular reaction from category A to C. Patients who did not need or did not want revision surgery are currently followed up by CT in our institution at 1-, 5-, and 10-year intervals to evaluate whether such progression eventually occurs. Asymptomatic pseudotumors, however, seem to show little change within 1 year [32]. The classification in this study is now part of a comprehensive screening protocol together with physical symptoms and serum ion levels [11].

Conclusions

The presented simplified CT grading system (A–C), in its first clinical validation on 48- and 64-slice CT systems, is reliable, showing good intra- and interrater reliability, and is independently associated with revision surgery. In a multiple logistic regression prediction model including other univariately significant MoM-related variables of interest when considering revision, the simplified A–C version proved to be the independent predictor of revision least likely to be attributable to chance. Further clinical validation could consist of a multinational, multireader study, preferably using the latest CT techniques: scanners with more detector rows, partial or full iterative reconstruction, and dedicated metal artifact reduction protocols [33].