Introduction

Closed reduction (CR) and spica cast immobilization are routinely used to treat children aged six to 18 months with reducible developmental dysplasia of the hip (DDH) [1]. Several authors have shown that post-operative magnetic resonance imaging (MRI) helps detect hips that remain dislocated after CR and spica cast immobilization [2,3,4,5], and MRI has also been shown to correlate well with arthrography performed immediately before CR [6,7,8]. Although several studies have thus established the validity of MRI, none has focused on the reliability of plain radiograph assessment after CR and spica cast immobilization, even though radiograph interpretation can substantially affect the diagnosis and treatment of patients with DDH [8]. The quality of plain radiograph interpretation can also be influenced by external factors, such as experience and subspecialty training, as several studies of radiographs taken for other conditions have shown [9,10,11].

The primary objective of this study was to assess the quality of interpretation of immediate post-operative anteroposterior (AP) pelvis radiographs of children undergoing CR and spica cast immobilization for DDH. We aimed to evaluate intra- and inter-observer reliability among raters with different levels of experience, specialties, and institutions. We hypothesized that more experienced raters would score better than less experienced raters, regardless of subspecialty training or background.

Materials and methods

After obtaining institutional review board (IRB) approval from our institution (no. 2017102307), we included 28 randomly selected patients (30 hips) with pre- and post-operative AP pelvis radiographs and a post-operative MRI.

The inclusion criteria were (a) age between six and 18 months; (b) diagnosis of unilateral or bilateral DDH confirmed by pre-operative AP pelvis radiographs; (c) treatment by hip arthrogram, CR, and spica cast immobilization; (d) a usable set of pre- and post-operative AP pelvis radiographs; and (e) a post-operative MRI performed no later than 36 hours after the index surgery.

Patients not meeting all inclusion criteria (a through e) and those with teratologic, syndromic, or neuromuscular hip dislocation were excluded.

Rater selection and grouping

Participants were 16 raters with different levels of experience and subspecialties (13 paediatric orthopaedic surgeons and 3 paediatric radiologists) from two institutions, one in Asia and one in Europe. Paediatric orthopaedic surgeons were grouped according to their experience (number of years in clinical practice) and the location of their institution (Asia or Europe).

Paediatric orthopaedic surgeons from institution 1 (Asia) were divided into three groups of three raters each: group A (raters 1, 2, and 3) included raters with less than five years of experience, group B (raters 4, 5, and 6) included raters with five to ten years of experience, and group C (raters 7, 8, and 9) included raters with more than ten years of experience.

Three paediatric radiologists with more than five years of experience and a special interest in musculoskeletal disorders from institution 1 (Asia) were included in group D (raters 10, 11, and 12).

Four paediatric orthopaedic surgeons with more than five years of experience from institution 2 (Europe) were included in group E (raters 13, 14, 15, and 16).

Radiographic assessment

The 28 selected post-operative AP pelvis radiographs (30 hips treated for dislocation) were presented to each rater twice, two weeks apart, in a different random order at each session (n = 60 evaluations per rater).
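The presentation scheme described above (every rater sees all hips twice, in a fresh random order at each session) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the authors used: the function name `presentation_orders`, the hip labels `H01` to `H30`, and the seed value are all hypothetical.

```python
import random

def presentation_orders(hip_ids, n_raters, n_sessions=2, seed=None):
    """Return an independently shuffled copy of hip_ids for every (rater, session) pair.

    Hypothetical helper for illustration; hip_ids are assumed to be unique labels.
    """
    rng = random.Random(seed)
    orders = {}
    for rater in range(1, n_raters + 1):
        for session in range(1, n_sessions + 1):
            order = list(hip_ids)
            rng.shuffle(order)
            # Re-shuffle if the order happens to repeat this rater's previous session
            while len(set(order)) > 1 and session > 1 and order == orders[(rater, session - 1)]:
                rng.shuffle(order)
            orders[(rater, session)] = order
    return orders

hips = [f"H{i:02d}" for i in range(1, 31)]  # 30 hypothetical hip labels
orders = presentation_orders(hips, n_raters=16, seed=2017)
print(len(orders))  # 32 (16 raters x 2 sessions)
```

Using one seeded generator keeps the whole allocation reproducible, while the re-shuffle guard makes the "different order at each presentation" requirement explicit rather than merely very likely.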

Each post-operative AP pelvis radiograph was presented together with the pre-operative AP pelvis radiograph and a vignette stating the patient’s age, gender, and diagnosis (e.g., 12-month-old female with left DDH).

Raters were asked to classify each hip on the post-operative AP pelvis radiograph as in (reduced) or out (dislocated).

MRI was used as the standard reference for the final assessment of hip joint reduction (in or out). All raters were blinded to MRI results during the rating process. Data were collected and analyzed by two researchers not involved in the care or imaging analysis of any of the included patients.

Statistical analysis

Statistical analysis was performed using Stata software (version 13, StataCorp LP, College Station, TX, US). All tests were two-sided, with the type I error set at α = 0.05. Continuous data were presented as mean ± standard deviation or median [interquartile range] (normality assessed using the Shapiro-Wilk test), and categorical parameters as the number of patients and associated percentages. Generalized linear mixed models (logistic) were used to compare the percentage of errors made by raters according to the rating (first or second), the level of expertise, the specialty, and the geographic origin of raters. In these models, raters and patients were considered as random effects to model between- and within-rater and between- and within-patient variability. The kappa coefficient for correlated data and the percentage agreement (%) were calculated (1) to measure inter-observer reliability at the first rating and intra-observer reliability between the first and second ratings, and (2) to compare observers’ evaluations with MRI at the first rating. Following the usual recommendations [12, 13], concordance was interpreted as follows: < 0.2 (bad), 0.2–0.4 (low), 0.4–0.6 (moderate), 0.6–0.8 (good), and > 0.8 (excellent). With MRI as the reference standard, sensitivity and specificity were calculated and presented with 95% confidence intervals.
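To illustrate how a kappa value relates to raw percentage agreement and to the concordance bands listed above, the following Python sketch computes unweighted Cohen's kappa for one rater's first and second readings. It is a simplification, not the analysis performed here: the study used a kappa coefficient for correlated data in Stata, whereas this sketch handles a single pair of rating series. The function names, the toy ratings, and the treatment of band boundaries (lower bound inclusive, 0.8 counted as good) are assumptions.

```python
import math
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same category in both series
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each series' marginal category frequencies
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if p_exp == 1:
        return math.nan  # both series use one identical category; kappa is undefined
    return (p_obs - p_exp) / (1 - p_exp)

def interpret_kappa(kappa):
    """Map kappa to the concordance bands stated in the text (boundary handling assumed)."""
    if kappa < 0.2:
        return "bad"
    if kappa < 0.4:
        return "low"
    if kappa < 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "good"
    return "excellent"

# Hypothetical first and second readings of 10 hips by one rater
first = ["in", "in", "out", "in", "in", "out", "in", "in", "in", "out"]
second = ["in", "out", "out", "in", "in", "in", "in", "in", "in", "out"]
k = cohen_kappa(first, second)
print(round(k, 2), interpret_kappa(k))  # 0.52 moderate
```

In this toy example the two readings agree on 80% of hips, yet kappa is only about 0.52, because most hips are rated in and much of that agreement would be expected by chance. This is why a high percentage agreement can coexist with only moderate kappa.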

Results

Twenty-eight patients with unilateral (26 patients; 7 right, 19 left) or bilateral (2 patients) DDH, for a total of 30 dislocated hips, met the inclusion criteria. There were four male and 24 female patients, with a mean age of 12 ± 4 months (range, 6–18). According to the Tönnis classification, hips were type 1 in one case, type 2 in eight cases, type 3 in 16 cases, and type 4 in five cases (Table 1).

Table 1 Demographic and clinical characteristics of patients in this cohort

Based on post-operative MRI findings, a total of 6 hips (6/30; 20%) were identified as out (dislocated) after arthrogram, CR, and spica cast immobilization.

Radiographic assessment

Each of the 16 observers rated the 30 hips twice (60 ratings per observer), for a total of 960 ratings.

On average, raters misjudged 8.6 ± 2.5 hips (range, 6–13) at the first rating and 8.9 ± 2.7 hips (range, 5–14) at the second rating. No significant difference was found between the first and second ratings or among raters (P = 0.72). Table 2 shows the number and percentage of errors made by raters according to their level of expertise, specialty, and geographic origin. Neither the level of experience (< 5, 5–10, and > 10 years) nor subspecialty training (orthopaedic surgery versus radiology) influenced performance, expressed as the number of errors divided by the total number of ratings.

Table 2 Number and percentage of errors made by raters according to the level of expertise, specialty and geographic origin of raters

Table 3 shows inter- and intra-observer reliability of post-operative radiographs in DDH patients in terms of percentage agreement and Cohen’s kappa. Agreement among all readers at the first rating was κ = 0.12. Clinicians with less than ten years of experience showed a level of agreement similar to that of raters with more than ten years of experience (κ = 0.04). Intra-observer consistency between the two ratings, performed two weeks apart, was moderate (κ = 0.48; 82% agreement). Overall, with MRI as the reference standard, the sensitivity and specificity of post-operative AP pelvis radiographs were 32% (95% CI, 23–43%) and 81% (95% CI, 77–85%), respectively.
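Sensitivity and specificity against MRI can be computed from a 2 × 2 table, treating out (dislocated on MRI) as the positive class. The sketch below pairs each proportion with a Wilson score 95% interval. The counts are hypothetical, and the Wilson interval is a swapped-in choice that treats every rating as independent: it ignores the clustering of repeated ratings within raters and hips, so it is not the method behind the intervals in Table 3 and would not reproduce them.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def wilson_ci(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def sensitivity_specificity(tp, fn, tn, fp):
    """Return ((sens, ci), (spec, ci)), with 'positive' meaning hip out on MRI."""
    sens = tp / (tp + fn)  # dislocated hips correctly called out
    spec = tn / (tn + fp)  # reduced hips correctly called in
    return (sens, wilson_ci(tp, tp + fn)), (spec, wilson_ci(tn, tn + fp))

# Hypothetical counts for one rater: 6 hips out and 24 hips in on MRI
(sens, sens_ci), (spec, spec_ci) = sensitivity_specificity(tp=2, fn=4, tn=20, fp=4)
print(f"sensitivity {sens:.0%} (95% CI {sens_ci[0]:.0%}-{sens_ci[1]:.0%})")
print(f"specificity {spec:.0%} (95% CI {spec_ci[0]:.0%}-{spec_ci[1]:.0%})")
```

With only six dislocated hips, any single rater's sensitivity rests on a very small denominator, which is why its interval is wide; pooling ratings narrows the interval only if the dependence between ratings is modelled.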

Table 3 Inter- and intra-observer reliability of post-operative X-ray in DDH patients in terms of agreement (%), weighted Cohen’s kappa (κ), sensitivity, specificity, and 95% confidence interval (CI) according to MRI

Discussion

This study aimed to assess the quality and reliability of the interpretation of post-operative AP pelvis radiographs of children treated by CR and spica cast immobilization for DDH. Raters were required to classify each hip as reduced (hip in) or dislocated (hip out) in a homogeneous set of radiographs.

Our findings demonstrate that AP pelvis radiographs taken after CR and spica cast immobilization are frequently misjudged, regardless of the rater’s level of experience, subspecialty training, and geographic origin. The mean overall misjudgment rate was 29.3% (range, 16.7–46.7%). Additionally, the low sensitivity and moderate specificity of post-operative radiographic assessment (with MRI as the reference standard) suggest that AP pelvis radiographs alone, taken after cast immobilization, are probably inadequate to evaluate consistently whether the hip is reduced. Thus, MRI must be used as the reference standard for hip reduction and is required for the post-operative assessment (hip in or out) of patients undergoing CR for DDH.

Our data indicate that hip radiographs that produce discordant interpretations continue to do so on second review, in all groups of raters, and that some hips are at a higher risk of being wrongly rated. Among the misjudged hips, 8 were wrongly rated by at least eight of the 16 raters (50%) at both ratings. Five of these hips (62.5%) were rated as in (reduced) although MRI showed them to be posteriorly dislocated (out) (Fig. 2). The remaining three hips (37.5%) were rated as out (dislocated) although MRI showed them to be in (reduced) (Table 4) (Fig. 3). Thus, approximately one hip in four (8 of 30) is at risk of being wrongly rated, regardless of the experience, subspecialty training, and geographic origin of the raters (Tables 2 and 3) (Fig. 1).
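The criterion used to flag hips as consistently misjudged (wrongly rated by at least eight of the 16 raters at both ratings) can be expressed as a simple filter. In this sketch the data layout, the function name, and the toy values are hypothetical, and we assume that the threshold is applied separately to each rating session rather than requiring the same raters to err both times.

```python
def consistently_misjudged(errors, threshold=8):
    """
    errors maps a hip id to (session1, session2); each session is a list with one
    boolean per rater, True when that rater's in/out call disagreed with MRI.
    Returns the hip ids misjudged by at least `threshold` raters in BOTH sessions.
    """
    return [
        hip
        for hip, (first, second) in errors.items()
        if sum(first) >= threshold and sum(second) >= threshold
    ]

# Toy data for 16 raters (hypothetical)
errors = {
    "H01": ([True] * 9 + [False] * 7, [True] * 8 + [False] * 8),
    "H02": ([True] * 10 + [False] * 6, [True] * 5 + [False] * 11),
    "H03": ([False] * 16, [True] + [False] * 15),
}
flagged = consistently_misjudged(errors)
print(flagged)  # ['H01']
```

Requiring the threshold in both sessions separates hips that are intrinsically hard to read (such as H01) from hips that were misread in one session only (such as H02).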

Fig. 1

Polar plot histogram showing the percentage of correct answers for each patient (P1 through P28) and side (R or L) at the first (E1) and second (E2) evaluations. Dark gray indicates the percentage of incorrect answers (R software; CRAN R project; Vienna, Austria)

Table 4 Hips consistently misjudged by more than 50% of raters
Fig. 2

Sixteen-month-old female with a dislocated right hip (a). Arthrogram image (b). The hip was rated as in (“reduced”) on the post-operative AP radiograph by most raters, although the post-operative MRI showed it to be posteriorly dislocated (“out”) (c, d)

Fig. 3

Fourteen-month-old female with a dislocated left hip (a). Arthrogram image (b). The hip was rated as out (“dislocated”) on the post-operative AP radiograph by most raters, although the post-operative MRI showed it to be reduced (“in”) (c, d)

Why is the error rate so high? One reason could be that AP pelvis radiographs provide images only in the frontal (coronal) plane, and raters often missed posteriorly dislocated hips. AP radiographs may give raters the impression that the hip is reduced (hip in) in the frontal plane, although it is displaced posteriorly, which is best appreciated in the axial plane. Moreover, if the hip is not perpendicular to the X-ray beam, the projected image may lead the rater to a false interpretation. We believe it is very difficult to position the hip perpendicular to the beam because of multiple factors, such as the degree of hip abduction and rotation, asymmetry of the spica cast, and positioning of the patient [14]. Hence, radiological signs of reduction, such as Shenton’s line, the medial pool distance, the femoral head-acetabulum distance, and the femoral neck axis pointing to the triradiate cartilage, may not always be reliable [15, 16]. In some patients, the femoral neck axis did not pass through the centre of the triradiate cartilage on the radiograph, and the hips were rated as out although MRI showed them to be in (reduced). Similarly, in other patients, Shenton’s line was broken on AP pelvis radiographs, and the hip was rated as out although MRI showed it to be in (reduced).

A second reason could be related to the amount of contrast medium used for the intra-operative arthrogram. If contrast diffuses around the joint or if too much is injected, radiographic assessment may be more challenging and more prone to misjudgment. This seems particularly true when raters base their judgment on the femoral head-acetabulum distance, normally less than 4 mm [17, 18], and/or the “spur” sign described by Bowen [19]. Therefore, the quality of the arthrogram is important, and the amount of contrast should probably be standardized to avoid confounding post-operative radiographs.

Moreover, if the cast is too thick and/or not fully radiolucent (e.g., plaster of Paris), raters may have difficulty evaluating whether the hip is in or out. In this respect, 10 of the 16 raters (62.5%) consistently misjudged the hip of a patient immobilized in plaster of Paris (patient 26; Table 1). Plaster of Paris can be molded well; however, synthetic material has the advantage of greater radiolucency [20].

In summary, beam projection, arthrographic contrast, and cast material all influence the quality of radiographs and, consequently, their readability.

This study has some limitations. It is a preliminary study based on radiographs of patients treated at a single institution. Moreover, the relatively small number of radiographs did not allow a balanced distribution of DDH configurations.

Despite these limitations, this is the first study to document that board-certified clinicians with different levels of expertise, subspecialties, and geographic origins show low agreement when assessing hip reduction on post-operative AP pelvis radiographs of children aged six to 18 months treated for DDH by CR and spica cast immobilization.

In conclusion, post-operative AP pelvis radiographs alone appear inadequate to assess whether the hip is reduced (hip in) or dislocated (hip out). Contrary to our hypothesis, experience and subspecialty training did not protect against errors. MRI after closed reduction and immobilization is mandatory. Institutions without MRI equipment should be very cautious in treating such patients and should consider referring them to tertiary medical centers with MRI.

Based on the present findings, we recommend post-operative MRI rather than AP pelvis radiographs to assess whether the hip is reduced. Compared with standard radiographs, MRI allows more reliable interpretation while avoiding ionizing radiation.