Introduction

Total hip arthroplasty (THA) is a successful and cost-effective intervention for treating severe hip osteoarthritis with persistent pain and disability. Besides relieving the pain, restoration of biomechanical forces around the hip with appropriate femoral offset and leg length is an important goal [1–4]. The radiographic preoperative planning and postoperative evaluation of these parameters require good validity, interobserver reliability and intraobserver reproducibility. Computed tomography (CT) and magnetic resonance imaging (MRI) have shown excellent characteristics in this respect [5–7], but concerns such as high cost, radiation exposure and availability in relation to the huge number of planned THA operations make their routine use impractical and limited to selected cases. Therefore, surgeons use plain radiographs for this purpose and rely on proper standardisation of radiographs and measurement techniques to minimise the shortcomings associated with plain radiographs.

The authors have previously proposed a new way to measure the (global) femoral offset (FO) on plain radiographs (the Sundsvall method). We investigated its concurrent validity by comparing it with CT scans and its interobserver reliability and intraobserver reproducibility in a small sample of patients [8]. We found this method to be clinically applicable.

The aims of this prospective study are to evaluate the concurrent validity (called validity throughout the rest of the paper) of the Sundsvall method of measuring postoperative FO by comparing it to a standard method and to evaluate the interobserver reliability and intraobserver reproducibility of measurement of postoperative FO, leg length discrepancy (LLD) and acetabular cup inclination and anteversion.

Patients and methods

Patients

This prospective study was performed at Sundsvall Teaching Hospital, Sweden, between September 2010 and December 2013. The study was approved by the regional ethics committee at Umeå University, and informed consent was obtained from all patients. A power analysis using Bonett’s approximation [9] for three observers, a minimum value of 0.7 for the intraclass correlation coefficient (ICC) and a 95 % confidence interval (CI) width of 0.2 indicated that a sample size of 68 hips was required. Therefore, we chose to include 90 patients with unilateral THA in order to provide a safety margin.
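The required sample size quoted above can be reproduced with the commonly cited form of Bonett’s approximation, n = 8 z² (1 − ρ)² [1 + (k − 1)ρ]² / [k(k − 1) w²] + 1, rounded up. The following is a minimal sketch, assuming that the planning value ρ equals the stated minimum ICC of 0.7, that k is the number of observers and that w is the full CI width; the function name is illustrative and not taken from the study.

```python
import math
from statistics import NormalDist


def bonett_sample_size(rho, k, width, conf=0.95):
    """Approximate number of subjects needed so that the CI for an ICC
    has the requested full width (Bonett's approximation).

    rho   : planning value of the ICC (assumed here to be the stated minimum)
    k     : number of observers rating every subject
    width : desired full width of the confidence interval
    conf  : confidence level of the interval
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    n = (8 * z**2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
         / (k * (k - 1) * width**2)) + 1
    return math.ceil(n)


print(bonett_sample_size(rho=0.7, k=3, width=0.2))  # 68
```

With three observers, ρ = 0.7 and a width of 0.2, the sketch returns 68, which matches the number of hips stated in the text.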

A total of 90 consecutive patients with primary unilateral osteoarthritis (OA) who underwent THA between September 2010 and June 2012 were recruited into the study. The inclusion criteria were a unilateral total hip replacement with either a cemented Lubinus SP II system (Link, Germany) or an uncemented CLS stem and Trilogy cup (Zimmer, USA). Patients with secondary OA or with previous spinal, pelvic or lower limb injuries or fractures were excluded.

Image acquisition

All of the included postoperative radiographs were made on a computerised radiography system (Siemens, Erlangen, Germany). Radiographs were taken on the second postoperative day using a standardised protocol. The anteroposterior (AP) hip radiograph was made with the patient supine and both legs internally rotated 15° using a leg retainer and the X-ray beam centred on the pubic symphysis with a film focus distance of 115 cm. The lateral radiographs were made with the patient supine with the contralateral hip flexed and externally rotated with the X-ray beam angled at 45° inferomedial to superolateral through the hip joint. Acceptable radiographs were centred, straight (equal-sized obturator foramina) and included the proximal one-third of the femora [10]. All images were digitally acquired using the Picture Archiving and Communication System (PACS) (Impax: Agfa, Antwerp, Belgium), and all measurements on radiographs were subsequently made on a 19-inch LCD monitor using PACS software. The measurements performed on the anteroposterior radiograph were the FO, LLD and acetabular cup inclination, while the measurement performed on the lateral radiograph was the acetabular cup anteversion.

Measurement of leg length discrepancy

The LLD on radiographs was defined as the difference between the two sides in the perpendicular distance (in millimetres) from a line passing through the lower edges of the teardrop points to the tip of the corresponding lesser trochanter [2, 11] (Fig. 1).

Fig. 1 Radiographic measurement of the LLD. The LLD was defined as the difference between the two sides in the perpendicular distance (in millimetres) from a line passing through the lower edges of the teardrop points to the tip of the corresponding lesser trochanter

A positive LLD value was obtained when the operated limb was longer than the contralateral side, whereas a negative value indicated the opposite. Measurements were calibrated to a radiopaque standardised metal sphere to assess the degree of magnification. A 1-mm precision scale was used.
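The calibration and sign convention described above can be sketched as follows. This is an illustrative example only: the true sphere diameter, the function names and the sample values are assumptions, and the sketch further assumes that the teardrop-line-to-lesser-trochanter distance is larger on the longer limb.

```python
def magnification_factor(measured_sphere_mm, true_sphere_mm):
    """Ratio of the sphere diameter measured on the image to its known size."""
    if true_sphere_mm <= 0:
        raise ValueError("true sphere diameter must be positive")
    return measured_sphere_mm / true_sphere_mm


def leg_length_discrepancy(operated_mm, contralateral_mm, magnification=1.0):
    """LLD in millimetres, corrected for magnification and rounded to 1 mm.

    Inputs are the perpendicular distances from the inter-teardrop line to
    the tip of the lesser trochanter, as measured on the image.
    Positive result: operated limb longer (assumes a larger distance
    corresponds to a longer limb). Negative result: operated limb shorter.
    """
    corrected = (operated_mm - contralateral_mm) / magnification
    return round(corrected)


# Hypothetical values: a 25 mm sphere that measures 30 mm on the image.
factor = magnification_factor(measured_sphere_mm=30.0, true_sphere_mm=25.0)
print(leg_length_discrepancy(48.0, 42.0, factor))  # 5
```

Dividing the image difference by the magnification factor returns the value at true scale before it is rounded to the 1-mm precision used in the study.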

Measurement of femoral offset

Measurement of FO using the Sundsvall method was carried out on the AP view of the pelvis as the horizontal distance between the femoral axis (a line drawn through the centre of the femoral shaft) and the midline of the pelvis at the height of the lateral tip of the greater trochanter [8] (Fig. 2). The measurement was performed bilaterally to compare the femoral offset on the operated side to the nonoperated hip. A positive value was used when the FO of the operated hip was greater than the nonoperated side, while a negative value indicated the opposite.

Fig. 2 The Sundsvall method of FO measurement carried out on the AP view of the pelvis as the horizontal distance between the femoral axis and the midline of the pelvis at the height of the lateral tip of the greater trochanter

Measurement of FO with the standard method was carried out on the AP view as the sum of the distance from the longitudinal axis of the femur to the centre of the femoral head and the distance from the centre of the femoral head to a perpendicular line passing through the medial edge of the ipsilateral teardrop point of the pelvis [3]. Once again, the measurement was repeated bilaterally to compare the FO of the operated side to the nonoperated hip. A positive value was used when the FO of the operated hip was greater than that of the contralateral side, while a negative value indicated the opposite (Fig. 3).

Fig. 3 Measurement of FO with the standard method as the distance from the longitudinal axis of the femur to the centre of the femoral head plus the distance from the centre of the femoral head to a perpendicular line passing through the medial edge of the ipsilateral teardrop point of the pelvis
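The arithmetic behind the two FO measurements can be sketched as follows. The function names and the millimetre values are hypothetical and serve only to illustrate the sum used by the standard method and the sign convention shared by both methods.

```python
def global_fo_standard(femoral_offset_mm, cup_offset_mm):
    """Standard method: femoral axis to head centre, plus head centre to the
    perpendicular line through the medial edge of the ipsilateral teardrop."""
    return femoral_offset_mm + cup_offset_mm


def fo_difference(operated_mm, nonoperated_mm):
    """Side-to-side FO difference.

    Positive: FO of the operated hip is greater than the nonoperated side.
    Negative: FO of the operated hip is smaller.
    """
    return operated_mm - nonoperated_mm


# Standard method: two distances per side (hypothetical values in mm).
operated = global_fo_standard(femoral_offset_mm=44.0, cup_offset_mm=33.0)
nonoperated = global_fo_standard(femoral_offset_mm=41.0, cup_offset_mm=34.0)
print(fo_difference(operated, nonoperated))  # 2.0

# Sundsvall method: one distance per side (femoral axis to pelvic midline).
print(fo_difference(operated_mm=118.0, nonoperated_mm=116.0))  # 2.0
```

The Sundsvall method needs a single distance per side, whereas the standard method needs two distances and three reference points per side.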

Measurement of acetabular cup inclination

Cup inclination was measured on the AP view as the angle in degrees between a line drawn along the rim of the cup and the transischial line (a line drawn between the most inferior points of the ischial tuberosities) [12] (Fig. 4).

Fig. 4 Acetabular cup inclination is measured in degrees between a line drawn along the rim of the cup and the transischial line (a line drawn between the most inferior points of the ischial tuberosities)
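If the landmarks are available as image coordinates, the inclination angle described above reduces to the angle between two lines. The following is a minimal sketch under that assumption; the coordinates and the function name are hypothetical, and PACS software would normally report this angle directly.

```python
import math


def angle_between_lines_deg(p1, p2, q1, q2):
    """Acute angle in degrees between line p1-p2 and line q1-q2.

    Each point is an (x, y) tuple in image coordinates. The result does not
    depend on the direction in which either line was drawn.
    """
    a1 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a2 = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    diff = abs(a1 - a2) % math.pi
    return math.degrees(min(diff, math.pi - diff))


# Hypothetical landmarks: two points on the cup rim and the most inferior
# points of the two ischial tuberosities (transischial line).
rim = ((100.0, 200.0), (150.0, 150.0))
transischial = ((60.0, 300.0), (260.0, 300.0))
print(round(angle_between_lines_deg(*rim, *transischial), 1))  # 45.0
```

Referencing the angle to the transischial line rather than to the image edge compensates for a pelvis that is tilted in the coronal plane of the image.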

Measurement of acetabular cup anteversion

Acetabular cup anteversion was measured on the lateral radiograph as the angle formed by the intersection of a line drawn across the face of the acetabulum and a line perpendicular to the horizontal plane, according to the Woo and Morrey method [13] (Fig. 5).

Fig. 5 Acetabular cup anteversion is measured on the lateral radiograph as the angle formed by the intersection of a line drawn across the face of the acetabulum and a line perpendicular to the horizontal plane

Assessment of reliability and validity

The interobserver reliability of FO, LLD and cup inclination and anteversion was assessed from the measurements made by three independent observers (an orthopaedic surgeon, orthopaedic resident and radiologist). All measurements were made without any knowledge of the patient’s clinical information or the findings of the other examiners. After 8–10 weeks the orthopaedic surgeon and the radiologist repeated the same measurements and the intraobserver reproducibility was measured by comparing the first to the second measurements.

The results of the three observers using the Sundsvall method of FO measurements were compared with the results using the standard method to measure the validity of the Sundsvall method. The observers were blinded to their previous results when they made the measurements. Furthermore, we measured the degree of prediction of the three observers, i.e. the percentage of correct prediction for each observer of whether the FO was of positive (the operated hip had increased in FO) or negative (the operated hip had decreased in FO) value.
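The degree of prediction can be sketched as the percentage of hips in which the sign of the FO difference is the same with both methods. This reading is an assumption: the text does not state the reference against which a prediction counts as correct, nor how a difference of exactly zero is handled. The function name and the data are illustrative.

```python
def sign(value):
    """Return 1 for positive, -1 for negative and 0 for zero."""
    return (value > 0) - (value < 0)


def degree_of_prediction(sundsvall_mm, standard_mm):
    """Percentage of hips where both methods agree on the direction of the
    FO change (assumption: the standard method is taken as the reference,
    and a zero difference matches only another zero difference)."""
    if len(sundsvall_mm) != len(standard_mm) or not sundsvall_mm:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(sign(a) == sign(b) for a, b in zip(sundsvall_mm, standard_mm))
    return 100.0 * hits / len(sundsvall_mm)


# Hypothetical FO differences (operated minus nonoperated, in mm) for one observer.
sundsvall = [3, -2, 5, -1]
standard = [2, -4, -1, -3]
print(degree_of_prediction(sundsvall, standard))  # 75.0
```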

Statistical analysis

The ICC (with 95 % CI) was used to evaluate the interobserver reliability of the obtained measures among the three observers and to evaluate the intraobserver reproducibility between the first and second measurements done by the two observers. To determine the concurrent validity of the Sundsvall method, we used Pearson’s correlation coefficient (r) to measure its correlation with the standard method. The paired t-test was also used to compare the mean measurements obtained with the Sundsvall and standard methods. This was done to assess whether there was a significant difference between the two methods, which could mean an over- or underestimation of FO measured by the Sundsvall method. For both the ICC and Pearson’s correlation coefficient, values of 0.00 to 0.20 were considered slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial and 0.81 to 1.00 excellent [14].
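To illustrate how an ICC is obtained and graded with the bands listed above, the following sketch computes a single-measure ICC from a subjects-by-observers table. The text does not state which ICC model was used; a two-way random-effects, absolute-agreement form (often written ICC(2,1)) is assumed here. The ratings are illustrative and are not study data.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings: list of rows, one row per subject, one column per observer.
    """
    n = len(ratings)       # subjects
    k = len(ratings[0])    # observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(value):
    """Grade a coefficient using the bands given in the text."""
    if value < 0 or value > 1:
        return "not classified"  # the bands in the text only cover 0 to 1
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "excellent")):
        if value <= upper:
            return label


# Illustrative ratings: 6 subjects rated by 4 observers.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(example)
print(round(icc, 2), agreement_label(icc))  # 0.29 fair
```

The low value in this example is driven by systematic differences between observers, which an absolute-agreement ICC penalises; a consistency-type ICC on the same data would be higher.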

Statistical analysis was carried out using SPSS for Windows, version 20.0 (SPSS Inc., Chicago, IL). A statistical significance level of α = 0.05 was used throughout.
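The validity comparison described in this section combines a correlation with a paired comparison of means. The study used SPSS; the following standard-library sketch only illustrates the two statistics, with hypothetical data and function names. Obtaining a p-value additionally requires the t distribution, which is available for instance through `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's correlation coefficient between two paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the mean of x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical FO differences (mm) from one observer using both methods.
sundsvall = [5, -3, 2, 7, -1]
standard = [4, -2, 3, 6, -2]

r = pearson_r(sundsvall, standard)
t, df = paired_t(sundsvall, standard)
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")
```

A high r shows that the two methods rank the hips similarly, while a t statistic close to zero indicates that neither method systematically over- or underestimates the other; both are needed, since a constant bias would leave r unchanged.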

Results

There were 46 males and 44 females with a mean age of 68 years (44 to 85). We had eight patients with low-quality images (mainly a malrotated pelvis and unclear lateral view). These patients were reexamined with new radiographs to ensure adequate quality. The interobserver reliability of all measurements among the three observers was excellent, except for LLD, which was substantial (Table 1). The intraobserver reproducibility of measurement for the two observers was excellent (Tables 2 and 3).

Table 1 The interobserver reliability of radiographic measurements of FO, LLD, acetabular inclination and anteversion among the three observers. ICC Intraclass correlation coefficient, CI confidence interval, LLD leg length discrepancy, FO femoral offset
Table 2 The intraobserver reproducibility of observer 1 (the orthopaedic surgeon). ICC Intraclass correlation coefficient, CI confidence interval, r Pearson’s correlation coefficient, LLD leg length discrepancy, FO femoral offset
Table 3 The intraobserver reproducibility of observer 2 (the radiologist). ICC Intraclass correlation coefficient, CI confidence interval, r Pearson’s correlation coefficient, LLD leg length discrepancy, FO femoral offset

The validity of the Sundsvall method of FO measurement compared with the standard method was good, with a positive correlation. Pearson’s r for the three observers was excellent. The p-values comparing the means (SD) of the Sundsvall method and the standard method were >0.05 (Table 4), i.e. there were no significant differences between the two methods.

Table 4 The validity of the Sundsvall method of FO measurements compared to the standard method using Pearson’s correlation coefficient and degree of prediction. SD standard deviation, r Pearson’s correlation coefficient

Discussion

Previous studies have shown that LLD and altered FO after THA could affect the hip biomechanics and therefore the postoperative functional outcome and patient satisfaction [2–4, 15–17]. The degree of acceptable LLD after THA is controversial. Up to 10 mm is tolerated well by most patients. Konyves and Bannister [2] as well as Wylde et al. [9, 16] found that postoperative LLD was a common problem (affecting about one-third of THA patients) and when still perceived several months after the surgery affected the short- and mid-term Oxford Hip Score. Other drawbacks of postoperative LLD included altered gait and nerve palsy, especially when the LLD exceeds 25 mm [17]. On the other hand, the effect of altered FO after THA is less studied and documented [3]. When the FO is decreased, the risk of prosthetic impingement and abductor muscle weakness is increased. Cassidy et al. [18], for instance, found that a reduction of FO of more than 5 mm was associated with lower functional outcome and attributed this to the muscle weakness and loss of the abductor lever arm. However, it is still unclear what the cutoff value is for altered FO that requires surgical correction. Therefore, the restoration of leg length and FO is essential and requires meticulous preoperative templating to ensure proper prosthetic positioning and the use of a modular prosthesis where the length and angle of the femoral stem neck are variable.

The measurement of LLD after THA on the AP plain radiograph view has been widely evaluated and discussed in the literature [8, 12, 16, 19–21]. In the present study, we used the Woolson method [11] to measure LLD (Fig. 1). This method (inter-teardrop-lesser trochanter distance) was found to be as reliable as an orthoroentgenogram with improved correlation with full-leg radiographs compared to the bi-ischial-lesser trochanter distance [2, 22]. The teardrop points have previously been found to be vertically and rotationally constant landmarks despite altered pelvic rotation [19]. This could minimise the effect of pelvic rotation in plain radiographs.

We found a substantial interobserver reliability and excellent intraobserver reproducibility of this method among the observers. Our results are in agreement with those reported in the literature [12, 19–21]. However, these studies were not actually designed to evaluate the reliability but rather the effect of LLD after THA. Furthermore, the present study was adequately powered regarding the number of included patients and observers compared to previous studies. Also, all patients had unilateral osteoarthritis and this factor increased the accuracy of the LLD measurements, as the reference side is unaffected.

The FO is classically measured on the pelvis AP view as the radiological distance between the femoral axis and centre of the femoral head (hip rotational centre) [23]. However, this measurement does not take into account the FO changes caused by the positioning of the acetabular cup. The latter is usually measured separately as the distance between the centre of the femoral head and a perpendicular line passing through the medial edge of the ipsilateral teardrop. This is referred to as the cup offset [24]. By adding the cup offset to FO, the global FO is obtained. This standard method has shown good interobserver reliability and intraobserver reproducibility. However, the use of multiple reference points (femoral axis, centre of the femoral head and teardrop point) may increase the risk of erroneous measurement [8]. Also, in some cases the degree of osteoarthritic changes and/or the peroperative acetabular reaming is so extensive that the teardrop point becomes difficult to localise. Therefore, we advocated a new method, the Sundsvall method, to measure the global femoral offset (Fig. 2). In a previous pilot study, the Sundsvall method had an excellent correlation with the CT scan and standard method. A limitation of that study was its small sample size.

In the present study, we found a strong agreement between the Sundsvall and standard methods for all three observers. This indicates good validity of the Sundsvall method. Furthermore, the interobserver reliability and intraobserver reproducibility of both the Sundsvall and standard method were excellent. Therefore, we propose that the Sundsvall method could replace the standard method in measuring global FO. In cases where the evaluation of cup positioning or cup offset is needed, the standard method should be used.

The precise positioning of the acetabular cup is a crucial part of the surgical technique of THA. A number of studies have shown correlation of acetabular cup positioning with THA outcome [15, 25]. Improper positioning can also give rise to bone-bone or implant-implant impingement and/or prosthetic instability [17, 18].

In clinical practice, the pelvis AP view is used to determine the inclination angle of the cup (Fig. 4), while the lateral view is used to measure the anteversion angle of the cup (Fig. 5). The accuracy of these measurements has been criticised owing to the difficulty in standardising pelvic tilting and rotation during imaging. Therefore, a CT scan is recommended as the method of choice, especially for cup anteversion evaluation. The high cost and radiation dose of CT scans are some of the disadvantages associated with their use. Bearing in mind the large number of THAs performed each year, surgeons still use plain radiographs for this evaluation.

We measured the acetabular cup inclination angle (in degrees) between a line drawn along the rim of the cup and the transischial line (Fig. 4). Kalteis et al. [26] found that this method had good validity and reliability when compared with CT scans. Others [9, 21, 26–28] have evaluated the reliability of this method in different patient categories, e.g. primary OA, dysplastic hips and femoroacetabular impingement [29], and found it to be good to excellent. This agrees with the results of the present study. On the other hand, we measured the acetabular cup anteversion on the lateral view according to the Woo and Morrey method [13]. The cup ante- or retroversion is calculated in relation to the horizontal plane (Fig. 5), assuming that the pelvis is parallel to this plane. Nunley et al. [30], Nho et al. [31] and McArthur et al. [32] have shown that this method had good validity compared with CT scans. However, any pelvic rotation/tilting may affect the accuracy of this measurement, especially in patients with contralateral hip or spine diseases. In the present study, the included patients had no contralateral hip disease, a factor that could decrease this bias. We found the interobserver reliability and intraobserver reproducibility of this method to be excellent among the observers. This agrees with the results found by others [27, 31, 32].

The present study has a few limitations. Radiographic measurements made on the pelvic AP view are susceptible to error since horizontal dimensional parameters are influenced by variations in the positioning of the pelvis and the divergence of the X-ray beams [3, 33, 34]. In the measurement of acetabular cup anteversion, we assumed that the patient positioning on the X-ray table was standardised in such a way that the measurement of cup anteversion was accurate compared with the horizontal plane. This might not be the case in all patients. Although a standardised positioning protocol was used for obtaining the radiographs, patient position remains a possible source of error.

To evaluate the validity of the Sundsvall method, we compared its measurements with those obtained using the standard method on plain radiographs. To achieve a more precise evaluation of the validity of the Sundsvall method, comparison with a CT scan is desirable. However, the standard method is an appropriate method for the measurement of perioperative FO in THA and this comparison should give a reasonable estimate of the validity of the Sundsvall method. These limitations are offset by the strengths of this study, which is a prospective cohort with the required number of patients and observers. Only patients with unilateral osteoarthritis were included to improve the accuracy of measurements, as the contralateral reference hip is anatomically unchanged. The observers came from two different specialties and had different levels of experience. This should make the obtained results more generalisable and therefore applicable in routine clinical practice.

We included only two observers in the intraobserver reproducibility assessment because we considered this sufficient to test this parameter. This is common practice in agreement studies of this type, as it saves the observers time and effort. However, the sample size calculation was based on the interobserver reliability among three observers.

In conclusion, the evaluated radiographic measurement methods have the required validity and reliability to be used in clinical practice.