Introduction

Hip resurfacing arthroplasty (HRA) accounts for 10% of primary hip arthroplasty in the UK. The Birmingham Hip Resurfacing (BHR; Smith & Nephew, London, UK) system is a second-generation HRA introduced into practice in the UK in 1997 [1]. The first reports of hip resurfacing date back to the 1950s, and several variants were used until the 1980s with high early failure rates, leaving total hip arthroplasty (THA) as the only arthroplasty option for both the old and the young patient. The development of modern resurfacing was directed to the specific needs of young and active patients. In fact, avoidance of stress shielding and prevention of dislocation as realised with metal-on-metal resurfacings reflect those needs. The primary goal of HRA is to gain time for the young patient until conventional THA would be suitable [2, 3]. In 2006, the BHR system was approved by the US Food and Drug Administration and subsequently introduced to the US market. The theoretical advantages of resurfacing include less inflammatory debris and osteolysis, minimal resection of the femoral head, improved joint stability, and improved biomechanics [4]. With regard to implants, resurfacing arthroplasty represents the most conservative solution: it carries a high potential both for joint biomechanical restoration and femoral bone preservation. This potential is extremely attractive for use in young and active patients. The results of second-generation HRA suggest that the outcomes are comparable with THA but also have aroused some concerns regarding short- and mid-term follow-up. To assess outcomes and revision rates of arthroplasty, two major data sources are available: sample-based clinical trials, and national arthroplasty registers. Clinical studies are mainly conducted in specialised centres. However, these centres are not representative of the average in all aspects, for example, as regards the number of patients treated, staff training, or staff personal expertise. 
Study design or patient selection may introduce further bias. Publication bias can also have a relevant impact on published data. In contrast, national arthroplasty registers include all operations performed in a certain country and can thus avoid or considerably reduce these bias factors [5]. On the other hand, data from registers reflect the background of data collection, such as the surgical procedures used or the public health system concerned, which eventually has an influence on the outcome. Also, the evaluation procedures applied, such as designation of implant variants to cohorts, may potentially lead to misinterpretations [6]. Registers focus on outcome concerning revision rate, as do most outcome studies related to specific implants [7]. Uncontrollable factors, such as the impact of individual interests, the influence of specific study circumstances on the results of sample-based clinical studies, or the occurrence of publication bias, are still part of the scientific discussion. The crucial question in this context is to what extent study results are reproducible in everyday clinical practice. This applies not only to pharmaceutical studies but also to follow-up examinations regarding the outcomes of medical devices or surgical techniques [5]. Recent attention has focused on the comparison of results in terms of revision rate reported in peer-reviewed journals and national joint arthroplasty register data. The purpose of our study was to compare outcomes regarding revision rate as reported in peer-reviewed literature and national joint arthroplasty register data. Based on a comparison of average outcome data, the validity and reproducibility of published data in everyday medical service, and potential bias factors, were estimated.

Methods

A comprehensive Web-based literature analysis was performed. A search of the MEDLINE online database was followed by a manual literature search. The inclusion criteria for scientific articles to be considered in the subsequent evaluation comprised the following: unambiguous identification of the implant and revision rate data (revision for any reason) either presented in the text or unambiguously calculable from the data contained; unambiguous values were required for all items. An exception was made only in the case of follow-up times, where articles were also accepted that indicated a time period only. In that case, a linear function was assumed for patient distribution. Thirty-two publications were identified that fulfilled these criteria, and their full texts were analysed [1–3, 8–36]. The main outcome criterion assessed in the study was revision rate, which was calculated using a standardised methodology, by means of the parameter of revisions per 100 observed component years. This indicator is a recognised standard in epidemiology that was, for example, used as early as the middle of the twentieth century to provide evidence of the association between tobacco consumption and the incidence of lung cancer [37, 38]. In principle, this method is a calculation of the correlation between the incidence of a potential risk exposure (e.g., cigarette smoking) and a consequential event (e.g., development of lung cancer). It also allows for consideration of essential influencing factors (e.g., length of time of smoking or number of cigarettes smoked) in the calculation. Applied to arthroplasty, this means there is a risk for revision from the moment of implantation. The total number of individual years from implantation (= observed component years) is counted. The total number of revisions (for any reason) as the failure end-point is documented and expressed as revisions per 100 observed component years. 
A value of 1 represents a 1% revision rate at one year and a 10% revision rate at ten years of follow-up. This indicator was introduced in arthroplasty by the Australian register. The calculations in this study were performed according to the investigators’ guidelines published in their annual reports.
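
The indicator described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the authors or by the Australian register: the function names are hypothetical, the cohorts are invented, and taking the midpoint of a reported follow-up range as the mean follow-up is only one plausible reading of the linear patient distribution assumed for studies that indicated a time period only.

```python
def component_years_from_range(n_implants, min_followup_years, max_followup_years):
    """Estimate observed component years when only a follow-up range is reported.

    Assumption (not from the source): patients are spread evenly (linearly)
    across the range, so the mean follow-up is the midpoint of the range.
    """
    if n_implants < 0 or min_followup_years < 0 or max_followup_years < min_followup_years:
        raise ValueError("invalid implant count or follow-up range")
    mean_followup = (min_followup_years + max_followup_years) / 2.0
    return n_implants * mean_followup


def revisions_per_100_component_years(revisions, component_years):
    """Revisions (for any reason) per 100 observed component years."""
    if component_years <= 0:
        raise ValueError("component_years must be positive")
    return 100.0 * revisions / component_years


# Hypothetical cohort: 100 implants, each observed for 10 years, 10 revisions.
# Simplification: revised implants are counted for the full period.
print(revisions_per_100_component_years(10, 100 * 10))  # 1.0

# Hypothetical study reporting only "follow-up 2 to 6 years" for 200 implants.
years = component_years_from_range(200, 2.0, 6.0)  # 800.0 component years
print(revisions_per_100_component_years(4, years))  # 0.5
```

The first example reproduces the interpretation given above: a value of 1.0 corresponds to a 10% revision rate at ten years of follow-up.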

Clinical studies were compared to the data sets from arthroplasty registers. The most recent annual reports available from the Web site http://www.efort.org/education/registers.aspx were selected. Only national registers with documentation completeness of more than 90% and a published data validation procedure were considered for the analysis. For register data sets, precise values were strictly required. Eventually, the annual reports of Australia 2010 and New Zealand 2009 were included in the analysis [39, 40]. The journal publications were analysed with regard to their year and type of publication, follow-up period, authors, geographic region, and number of cases. Any publication indicating the McMinn Centre in Birmingham and Prof. Derek McMinn as author or coauthor was rated as a publication by the development team [41, 42].

To be rated as an outlier data set, the average value had to show a statistically significant difference in the outcome and at least a difference of 300% to the benchmark in register data sets. The national hip and knee registers in Sweden and Denmark publish outcome from individual departments with deviations of up to a ratio of 3 for the outlier departments [41]. These deviations in outcome were rated as explicable differences in average patient service due to cumulative effects of influencing factors, such as surgeons’ expertise, departmental training activities, internal and external quality control activities, patient selection, or the public health system. To determine statistical significance, 95% confidence intervals (CIs) were calculated using Circulator software version 4, an Excel-based program by the University of Adelaide, SA, Australia. Further statistical evaluations were not performed owing to basic data variability and study design.
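
The outlier rule can be expressed as a short check. The sketch below is an illustration under stated assumptions, not the procedure implemented in the software mentioned above: statistical significance is approximated by non-overlapping 95% CIs, the 300% difference is read as a ratio of at least 3 between the two rates, and all function names and example values are hypothetical.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def rate_ratio(rate_a, rate_b):
    """Ratio of the larger to the smaller rate (always >= 1)."""
    low, high = sorted((rate_a, rate_b))
    if low <= 0:
        raise ValueError("rates must be positive")
    return high / low


def is_outlier(rate, ci, benchmark_rate, benchmark_ci, min_ratio=3.0):
    """Outlier = non-overlapping CIs (assumed proxy for significance)
    AND a ratio of at least `min_ratio` to the register benchmark."""
    significant = not intervals_overlap(ci, benchmark_ci)
    return significant and rate_ratio(rate, benchmark_rate) >= min_ratio


# Hypothetical data sets compared with a hypothetical register benchmark
# of 0.90 revisions per 100 observed component years (CI 0.85-0.95).
benchmark = (0.90, (0.85, 0.95))
print(is_outlier(0.25, (0.15, 0.35), *benchmark))  # True: ratio 3.6, CIs apart
print(is_outlier(0.60, (0.50, 0.70), *benchmark))  # False: ratio only 1.5
print(is_outlier(0.25, (0.05, 0.90), *benchmark))  # False: CIs overlap
```

Requiring both conditions keeps a data set from being flagged merely because a very large sample makes a small difference statistically significant, or because a small sample produces a large but imprecise ratio.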

Results

The mean clinical follow-up period was 5.06 years [standard deviation (SD) 2.49], ranging from 1.0 to 10.9 years. The average follow-up period of the annual register reports was 3.44 years (SD 1.79) and ranged from 2.19 to 4.71 years. Overall, the clinical studies reported on 18,708 implants, equivalent to 106,565 observed component years, whereas annual reports comprised 9,806 primary cases, corresponding to 44,294 observed component years (Table 1, Fig. 1).

Table 1 Number of primary cases and revisions reported in the peer-reviewed literature and in annual reports of national joint arthroplasty registers
Fig. 1 Cumulative primary (blue) and revision (red) cases reported in the peer-reviewed literature and in annual reports of national joint arthroplasty registers

Thirty-two original articles were assessed in this study. Three publications (9.4%) were identified as being by the development team, 19 (59.4%) originated from other centres in the United Kingdom, four (12.5%) came from continental European centres, and the remaining six (18.8%) were from Asia, Australia and the USA (Fig. 2).

Fig. 2 Percentage of origins of different sources of the peer-reviewed literature concerning Birmingham Hip Resurfacing (BHR)

The average revision rate of the published follow-up studies was 2.65% (SD 2.16), which amounted to 0.46 revisions per 100 observed component years (CI 0.34–0.58%). The average revision rate from register data was 3.41% (SD 1.79) and corresponded to 0.74 revisions per 100 observed component years (CI 0.72–0.76%) (Table 1, Fig. 3).

Fig. 3 Different rates of revisions per 100 observed component years reported in the peer-reviewed literature. The horizontal line denotes the average of revisions per 100 observed component years as reported by national arthroplasty registers

Studies authored by the development team

Literature published by the development team accounted for 3,651 primary cases reported. The average follow-up was 7.4 years (SD 2.7). The 72 revisions reported correspond to a revision rate of 1.98% and 0.27 revisions per 100 observed component years (CI 0.14–0.40%). This means that more than one third (37.8%) of the implants analysed in the published literature were reported by the development team. Compared with the register value of 0.74 revisions per 100 observed component years (CI 0.72–0.76%), the results from Birmingham differ statistically significantly.

Studies from independent European centres

The average follow-up period of studies reported from European centres was 5.3 years (SD 2.5). Twenty-three published articles reported the results of 14,093 primary implants. There were 406 revision procedures, which corresponds to a revision rate of 2.88% (SD 2.31) and 0.54 revisions per 100 observed component years (CI 0.25–0.83%). This does not differ statistically significantly from the results of the development team or from register data.

Studies from other independent centres

There were six studies from independent centres outside Europe: two from Australia, two from Asia and two from the United States. The mean follow-up period in these studies was 3.5 years (SD 1.79). In 964 primary implants, 17 revisions were observed. This corresponds to a revision rate of 1.72% (SD 1.3%) and 0.50 revisions per 100 observed component years (CI 0.35–0.75%).

Register data

Analysis of annual national arthroplasty register reports of Australia and New Zealand revealed average outcomes ranging from 0.73 to 0.75 revisions per 100 observed component years. The reports comprised 9,806 primary implantations and 334 revisions. This corresponds to a revision rate of 2.58% (SD 1.38) and a value of 0.74 revisions per 100 observed component years. The mean follow-up period of the annual reports was 3.45 years (SD 1.78).

Discussion

Hip resurfacing prostheses predate the use of stemmed femoral components. Various materials were used between the 1930s and 1950s, including ivory, glass, and stainless steel. Femoral resurfacing coupled with cemented polyethylene acetabular resurfacing was popular in the 1970s, but it fell out of use because of high rates of bone resorption (osteolysis) and loosening within five years of surgery. New metallurgy allows resurfacing with metal-on-metal articulations, and there has been a resurgence in the use of total HRA to manage arthritides [3, 4]. Simple resurfacing of the worn articulation has less frequently been used as a means of THA. Advantages of HRA include preservation of bone on the femoral side, more physiological stress transfer at the proximal femur, which might avoid problems such as stress shielding, and a lower risk of dislocation due to the larger femoral head compared with conventional THA. Also, revision surgery of the femoral component is considered to be easier than in THA with an intramedullary anchored femoral stem [2, 4, 43]. However, resurfacing has several disadvantages. The lack of modularity of this device reduces the ability to adjust leg length. It is not appropriate in hips with loss of femoral-head and neck-bone stock or in hips with femoral cysts. Fracture of the femoral neck, the most common mode of hip resurfacing failure, has been well documented. This failure mode is unique to the procedure, with an incidence ranging from 0% to 4%. Aseptic loosening and metal degradation represent further disadvantages of HRA [4]. Aseptic lymphocyte-dominated vasculitis-associated lesions (ALVAL) have been identified as a new problem in arthroplasty surgery with metal-on-metal bearing surfaces [12]. In a review article, McGrory et al. found higher revision rates in HRA compared to conventional THA, rather than difficulties in comparing patient populations [44]. 
Revision rate is a recognised, well-defined and objective parameter after arthroplasty that covers a variety of possible complications. The necessity for revision surgery has serious consequences for the patient’s quality of life and causes high health-care expenditure. Decision making largely follows standard procedures in diagnostic assessment and indication. This indicator is therefore well-suited for comparative analysis, and the conclusions are relevant for all major parties involved in the health-care system. For that reason, outcome data concerning revision rate have a major impact on daily decisions by surgeons and health authorities.

The average results published by the development team differed markedly from the outcome shown in other data sources. In fact, the development team reached 0.27 revisions per 100 observed component years, whereas the comparative value from the overall peer-reviewed literature was 0.46, and analysis of national arthroplasty register data revealed a value of 0.74 revisions per 100 observed component years. This difference is statistically significant.

The majority of journal publications reviewed for this study report a lower probability of revision than do register data, but they show similar results in all regions worldwide. In fact, independent European centres report 0.54 revisions per 100 observed component years, whereas studies from independent centres in Asia, USA and Australia report a value of 0.50 revisions per 100 observed component years. The differences from register data might be explicable by the fact that a vast majority of publications originate from specialised centres, which are not representative of the worldwide or national average patient care in all aspects. As opposed to this, register data include virtually all operations performed in a country and therefore comprise the entire range of treatment, thus reducing bias and allowing better generalisation in the area covered by the register.

Surgery outcomes are, of course, subject to certain variations resulting from factors that are independent of the products used. They could be related to the profile of patients treated in the department concerned, the surgeon’s experience, specific surgical techniques, quality assurance measures, and to the effects of the particular public health system. In this study, a difference of a factor up to 3 between data sets was considered to be explicable by individual experience, particular circumstances in the hospital concerned, and other potential confounders. The value of 3 was chosen because it covers the variance among individual hospitals in countries in which national registers publish these data, such as the Swedish (Hip and Knee) Registers or the Danish National Arthroplasty Registers, as well as the deviation from the mean of revision rates of individual implants in various national registers [41, 42, 45]. The reason for this divergence can only be discussed theoretically. However, irrespective of the reason, the average surgeon should be aware of the fact that the outcome published by the inventing centre seems to be hardly reproducible in average patient services and other institutions. Thus, the published results of the development group are only of limited value for decisions being made by other users, as they cannot expect to reproduce those excellent mid-term results.

The variation in results is clearly lower in registers of different countries than it is in the clinical literature. Apart from the larger number of cases, it is probably the minimisation of confounding factors, which basically cannot be excluded in sample-based studies, that accounts for this effect. Recent attention has focused on the difference between revision rates reported in the peer-reviewed literature and those in national joint arthroplasty registers. Findings were similar to those of our study for implants used in other fields: THA, total knee arthroplasty, and total ankle arthroplasty [5–7, 46–50].

Limitations of our study include the difference in patient characteristics that might occur in individual data sets. In addition, there are some limitations to the validity of data used in this study resulting from estimations that were necessary to allow for comparison of different data sets.

In general, national joint arthroplasty registers provide high-quality outcome data regarding average patient care in the specific region. Compared with these data, the outcomes published by the inventors of BHR are significantly better and hardly reproducible by other users of this implant, including large centres that publish their series in peer-reviewed journals. Although McMinn has a remarkable impact on the publication and scientific rating of BHR and resurfacing hip arthroplasty, this limitation should be taken into account by other stakeholders when making their individual decisions.