Introduction

Fractures of the proximal humerus are increasingly frequent, with numbers tripling between the 1970s and the 2000s [1, 2]. Among these fractures, those involving the tuberosities and the anatomic neck are a therapeutic challenge. For this type of fracture (Neer’s four-part fracture), the degree of displacement needs to be understood in order to provide suitable treatment and to anticipate the risks in its evolution. In particular, this type of fracture carries a risk of humeral head ischaemia, which weighs considerably on the choice of treatment.

The usual classifications, such as the AO or the Neer classification, have shown their limitations in terms of reproducibility [3, 4] and are not suitable for the prognostic assessment of these four-fragment fractures of the proximal humerus. The radiographic parameters described by Hertel in 2004, on the other hand, seem to us to be far more relevant to routine clinical practice [5]. They enable an analysis of the fracture and an assessment of the risk of post-traumatic necrosis of the humeral head. They also help to plan treatment more efficiently, in particular via the assessment of the medial cortex (calcar), which is an important stability criterion for osteosynthesis [6,7,8]. These parameters were initially described on standard radiographs, but they can be applied equally well to 2D scans or 3D reconstructions.

The use of CT scans to improve the reproducibility of the classification of these proximal humerus fractures is still controversial [9,10,11]. For complex fractures, however, a CT scan is most often obtained to guide therapeutic strategy, although the reproducibility of the different assessment criteria has never been studied.

The aims of the present study were firstly to analyse inter-observer and intra-observer reproducibility for the different criteria proposed by Hertel, using three types of imaging (standard radiographs, 2D scans and 3D reconstructions), and secondly to assess the relevance of CT scans for improving reproducibility.

Material and method

Twenty radiographic files derived from the 2014 SOFCOT symposium on four-part fractures of the proximal humerus were chosen randomly among the 384 complete files. This multi-centre study involved 11 centres specialised in surgery of the shoulder, and it obtained approval from CPP-Est (2013-A00050-36). The radiological files all contained standard radiographs performed in emergency settings when the patient arrived in the care facility, a 2D scan with axial, sagittal and coronal sections, and high-resolution 3D reconstructions.

Analysis protocol

Three independent observers (PM, VB and XO) with differing experience assessed each radiographic file three times so that intra-rater reproducibility could be studied. An expert committee (CN and FS) convened for the analysis of the radiological files. Inter-rater reproducibility was thus assessed by comparing the first assessment of each observer with the expert committee assessment. The assessment of the different files was always conducted in the same order: analysis of the standard radiographs, then analysis of the 2D scan sections and finally analysis of the 3D reconstructions. An interval of at least two weeks was allowed between each round of assessments. A computerised document presenting and explaining the different parameters was issued to each participant before the start of the study.

The eight criteria assessed were:

  • Displacement of the humeral head on the frontal plane: not displaced/varus/valgus.

  • Displacement of the humeral head on the sagittal plane: not displaced/angle <20°/angle >20°.

  • Humeral head split: yes/no.

  • Calcar comminution: yes/no.

  • Medial hinge: yes/no.

  • Length of metaphyseal extension: <8 mm/>8 mm.

  • Displacement of the greater tuberosity: not displaced/>5 mm/<5 mm.

  • Displacement of the lesser tuberosity: not displaced/>5 mm/<5 mm.
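For illustration, the eight criteria and their response options listed above could be encoded as a simple lookup table so that each reading can be checked for completeness before analysis. This is only a sketch: the key names, the `CRITERIA` table and the `validate_reading` helper are hypothetical and were not part of the study protocol.

```python
# Hypothetical encoding of the eight criteria and their response options.
# Key names are illustrative; the option labels follow the list above.
CRITERIA = {
    "head_displacement_frontal": ("not displaced", "varus", "valgus"),
    "head_displacement_sagittal": ("not displaced", "angle <20°", "angle >20°"),
    "head_split": ("yes", "no"),
    "calcar_comminution": ("yes", "no"),
    "medial_hinge": ("yes", "no"),
    "metaphyseal_extension": ("<8 mm", ">8 mm"),
    "greater_tuberosity_displacement": ("not displaced", ">5 mm", "<5 mm"),
    "lesser_tuberosity_displacement": ("not displaced", ">5 mm", "<5 mm"),
}


def validate_reading(reading):
    """Return a list of problems found in one observer's reading of one file."""
    problems = []
    for name, options in CRITERIA.items():
        if name not in reading:
            problems.append(f"missing: {name}")
        elif reading[name] not in options:
            problems.append(f"invalid value for {name}: {reading[name]!r}")
    return problems


# Example: a complete reading produces no problems.
example = {name: options[0] for name, options in CRITERIA.items()}
print(validate_reading(example))
```

A reading with a missing or misspelt answer would be reported by name, which makes data-entry errors visible before any agreement statistic is computed.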

Statistical analysis

Intra-rater reproducibility was assessed using Cohen’s Kappa coefficient across the three series of assessments. Inter-rater reproducibility was also assessed using the Kappa coefficient, comparing series 1 of each rater with the expert committee assessment. We thus assessed the extent to which the type of imaging influenced concordance with expert opinion.

The Kappa coefficient is a tool designed to measure agreement between two qualitative variables that use the same categories. A Kappa of 0 indicates agreement no better than chance and a Kappa of 1 indicates perfect agreement; negative values, which indicate agreement worse than chance, are possible but uncommon. A Kappa coefficient between 0.00 and 0.20 indicates very poor agreement, between 0.21 and 0.40 poor agreement, between 0.41 and 0.60 moderate agreement, between 0.61 and 0.80 satisfactory agreement and between 0.81 and 1.00 excellent agreement.
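As a minimal sketch of how such a coefficient is obtained, the code below computes Cohen's Kappa for two series of categorical ratings as (observed agreement − chance agreement) / (1 − chance agreement) and maps the result onto the interpretation bands given above. The function names and the toy ratings are illustrative only; the study itself used SAS and SPAD, and the handling of values below 0 (placed in the lowest band) is an assumption of this sketch.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two equally long series of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both series must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of cases on which the two series agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each series' category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Map a Kappa value onto the agreement bands used in this study."""
    # Assumption: values below 0 are reported in the lowest band.
    if kappa <= 0.20:
        return "very poor"
    if kappa <= 0.40:
        return "poor"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "satisfactory"
    return "excellent"


# Toy example (not study data): one criterion rated on four files.
rater = ["yes", "yes", "no", "no"]
expert = ["yes", "no", "no", "no"]
k = cohen_kappa(rater, expert)
print(k, interpret_kappa(k))
```

In the toy example the two series agree on three of four files (0.75), chance agreement is 0.50, and the resulting Kappa of 0.50 falls in the moderate band.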

The analyses were performed on SAS software (version 9.3, SAS Institute Inc., Cary, NC, USA) and SPAD (version 8.2, Société Coheris, Suresnes, France).

Results

Tables 1, 2 and 3 describe the results for intra-observer reproducibility for the analysis of standard radiographs, 2D scan sections and 3D scan reconstructions, respectively.

Table 1 Intra-rater reproducibility after analysis using standard radiographies
Table 2 Intra-rater reproducibility after analysis of 2D scans
Table 3 Intra-rater reproducibility after analysis of 3D reconstructions

The overall analysis of intra-observer reproducibility for the criteria studied reached poor to moderate agreement (Kappa values ranging from 0.21 to 0.60) in 13 out of 24 cases (54%; 8 criteria and 3 observers) when the analyses were performed on standard radiographs. The number was 6 out of 24 (25%) for the 2D scans and 5 out of 24 (21%) for the 3D reconstructions.

In the other instances, the Kappa coefficient was satisfactory (8 for the standard radiographs, 11 for the 2D scans and 13 for the 3D reconstructions) or excellent (3 for the standard radiographs, 7 for the 2D scans and 6 for the 3D reconstructions).
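The counts reported above can be cross-checked with a short tally. The sketch below simply re-enters the figures given in the text (24 observer-criterion pairs per imaging type) and converts them to percentages; the variable and function names are illustrative.

```python
# Counts of observer-criterion pairs per agreement band, as reported in the text.
COUNTS = {
    "standard radiographs": {"poor to moderate": 13, "satisfactory": 8, "excellent": 3},
    "2D scans": {"poor to moderate": 6, "satisfactory": 11, "excellent": 7},
    "3D reconstructions": {"poor to moderate": 5, "satisfactory": 13, "excellent": 6},
}
TOTAL = 8 * 3  # 8 criteria assessed by 3 observers


def shares(band_counts):
    """Convert band counts to percentages of the 24 observer-criterion pairs."""
    return {band: 100.0 * n / TOTAL for band, n in band_counts.items()}


for imaging, band_counts in COUNTS.items():
    # Each imaging type should account for all 24 pairs.
    assert sum(band_counts.values()) == TOTAL
    summary = ", ".join(f"{band}: {pct:.1f}%" for band, pct in shares(band_counts).items())
    print(f"{imaging}: {summary}")
```

The tally confirms that the three bands sum to 24 for each imaging type, and that the share of satisfactory or excellent coefficients rises from 11 of 24 on standard radiographs to 18 of 24 on 2D scans and 19 of 24 on 3D reconstructions.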

Agreement between the observers and the expert committee was studied via inter-rater reproducibility (Table 4). The Kappa coefficient increased clearly with the type of imaging for the criteria medial hinge and length of metaphyseal extension. For the other criteria, the influence of the type of imaging on rater/expert agreement was less marked. The criterion calcar comminution was very poorly reproducible between raters 2 and 3 and the expert committee, while rater 1 reached Kappa coefficients between 0.29 and 0.60.

Table 4 Inter-rater reproducibility between first assessment of each rater and expert opinion

Discussion

Intra-rater reproducibility for the Hertel criteria was moderate for assessments based on standard radiographs, and it improved with the use of 2D scans and 3D reconstructions. Inter-rater reproducibility was better for several criteria when 2D and 3D scans were used than with standard radiographs.

Numerous authors have shown that the most widely used international classifications (Neer, AO, Codman, Duparc, AST, etc.) are not very reproducible, with at best only moderate inter-rater reproducibility (Kappa values between 0.41 and 0.60) [4, 10,11,12,13]. The main value of these classifications resides in the scope they offer for comparing clinical series with one another. Thus, all the fractures in the present series are type 12 fractures for Codman-Hertel, 11-C1 or 11-C2 for AO and four-part fractures for Neer.

The value of the Hertel criteria lies in their prognostic and therapeutic relevance. Indeed, certain criteria, such as displacement of the greater tuberosity or a humeral head angle greater than 20° in the sagittal plane, point towards surgical treatment. Other criteria, such as medial hinge or absence of calcar comminution, provide information on the stability of the fracture [5, 6, 8]. Finally, parameters such as fracture of the humeral head or metaphyseal extension under 8 mm are risk factors for osteonecrosis of the humeral head [5].

In assessments using standard radiographs, certain criteria exhibit moderate to good intra-observer reproducibility, for instance the greater tuberosity criterion, displacement of the humeral head in the frontal plane or fracture of the humeral head. These criteria are indeed fairly easy to interpret on standard radiographs such as those performed in an emergency setting. In contrast, intra-rater reproducibility was not as good for the assessment of the medial hinge (observer 1), metaphyseal extension (observer 2) or calcar comminution (observer 3). The assessment of the lesser tuberosity on radiographs was in all cases difficult and poorly reproducible. Inter-rater reproducibility was poor to moderate in all cases for radiographic analyses, with the exception of displacement of the humeral head in the frontal plane (Kappa range 0.55–0.65).

The use of 2D and 3D scans improved intra-observer reproducibility for most of the criteria considered for all three raters, with satisfactory to excellent Kappa coefficients for five of the eight criteria. The three criteria that were more problematic were the assessments relating to the greater and lesser tuberosities and calcar comminution.

This last criterion was not initially described by Hertel, but consideration of this parameter developed with the use of locking plates. It is in fact an important criterion for the assessment of the stability of the fracture, in particular when plate osteosynthesis is envisaged. In 2012 Osterhoff [8] defined calcar comminution as being present if there was an intermediate fragment on the medial curve of the humeral metaphysis below the anatomic neck. The rather uncertain definition of this criterion could well explain the very poor intra-rater reproducibility whatever the type of imaging used. The intra-rater reproducibility for this criterion, however, showed improvement in the assessment of 3D reconstructions, since this type of imaging enables the medial cortex to be viewed as a whole and thus any intermediate fragment to be detected. This criterion is thus worth studying, but requires a better definition to gain in reproducibility.

The analysis of lesser and greater tuberosity displacement is moderately reproducible on 2D and 3D imaging, both intra- and inter-rater. This can certainly be explained by the lack of precision in the distinction between “not displaced” and “displacement under 5 mm”. Indeed, in the case of a very minor displacement of 1 or 2 mm, the raters may have chosen different options without actually misinterpreting the images. To improve reproducibility for these parameters, the following response options could be suggested: not fractured/non-displaced fracture (<5 mm)/displaced fracture (>5 mm).
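The suggested response options could be applied as a simple decision rule, sketched below. The function name and parameters are hypothetical, and because the proposal does not say how a displacement of exactly 5 mm should be classified, this sketch assumes it counts as non-displaced.

```python
def classify_tuberosity(fractured, displacement_mm=0.0, threshold_mm=5.0):
    """Assign one of the three suggested response options to a tuberosity.

    Assumption: a displacement exactly equal to the threshold is treated as
    non-displaced, since the proposed options leave this case unspecified.
    """
    if not fractured:
        return "not fractured"
    if displacement_mm < 0:
        raise ValueError("displacement must be non-negative")
    if displacement_mm > threshold_mm:
        return "displaced fracture (>5 mm)"
    return "non-displaced fracture (<5 mm)"


# A 1 or 2 mm displacement now falls unambiguously into a single option.
print(classify_tuberosity(True, 2.0))
print(classify_tuberosity(True, 7.5))
print(classify_tuberosity(False))
```

With these options, a fracture displaced by 1 or 2 mm no longer sits between two possible answers, which is the ambiguity the paragraph above identifies.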

The value of 2D scans and 3D reconstructions for the assessment of proximal humerus fractures is still a subject of debate [9,10,11]. However, for complex fractures where treatment can be guided by precise criteria such as metaphyseal extension or the presence of a medial hinge, we have seen that these scans are very useful. They improve inter-rater assessment reproducibility for most of the criteria that are relevant to therapeutic decisions.

Conclusion

The criteria we have retained for complex four-fragment fractures of the proximal humerus exhibit intra- and inter-rater reproducibility that is at least moderate. Reproducibility could be considerably improved by combining 2D scans and 3D reconstructions, in particular for the criteria related to the prognosis for vascularisation of the humeral head.