Introduction

Forensic age estimation is a current focus of research interest in the field of forensic medicine. Over the last few years, it has become increasingly important to determine, in particular, the age of living persons [5, 19, 20, 32, 37, 38].

In the context of criminal proceedings, the Study Group on Forensic Age Diagnostics currently recommends that, in addition to a physical inspection and a dental examination, skeletal age should be determined [23]. In principle, this is accomplished initially by examining the degree of maturity of the skeletal elements of the hand. From a forensic viewpoint, assessment of the hand skeleton is applied in questionable cases to help establish whether any legally relevant age limits have been exceeded. As the persons in question may, under certain circumstances, face serious consequences, the result of the examination must meet a particularly high standard of validity. To be forensically applicable, the methods available must therefore offer a high degree of estimate accuracy and facilitate reliable diagnoses.

To date, a multiplicity of very varied techniques have been developed for determining skeletal age based on an X-ray image of the hand. All existing methods may essentially be classified according to two different basic principles: atlas and bone-specific techniques.

A large number of X-ray atlases are in existence, depicting the normal ossification of the human hand [10, 12, 17, 21, 24, 33, 39, 40]. These make it possible to estimate the chronological age of a child or adolescent by comparing the overall maturation pattern of a given X-ray image of the hand with a collection of standard images (Fig. 1). Amongst these atlas methods, the standard works by Greulich and Pyle [12] and Thiemann et al. [40], in particular, are applied in criminal age estimation practice. In recent years, the digital skeletal maturation atlas of Gilsanz and Ratib [10] has provided idealised standards of normal hand skeleton development classified specifically by gender and age.

Although the different ossification centres of the hand skeleton occur in a certain regular order, and changes to their size and shape as well as closure of the growth plates take place more or less following set principles, hand radiographs with inhomogeneous maturation patterns also have to be assessed occasionally. This phenomenon is taken more strongly into consideration in another methodological approach. Bone-specific techniques make it possible, by means of specific assessment systems, to estimate individual chronological age on the basis of the degree of maturity manifested by selected skeletal elements in an X-ray image of the hand (Fig. 2). Amongst the techniques most used in forensic work are the bone-specific methods of Tanner et al. [11, 34, 36]. The method of Roche et al. (FELS method [22]) has been established for a long time, primarily in clinical use.

To determine the most suitable method for use in criminal forensic work, the present study aims to compare the estimate accuracy of the atlas methods of Greulich and Pyle, Thiemann et al. and Gilsanz and Ratib, and the bone-specific methods of Tanner et al. (here, the RUS2 and RUS3 scores) and Roche et al. Furthermore, all these methods will also be studied comparatively in relation to the reproducibility of skeletal age diagnoses.

Test persons and methods

Radiographs of the left hand of a total of 48 male and 44 female children and adolescents aged between 12 and 16 years were evaluated retrospectively. Table 1 shows case group figures subdivided by gender and chronological age.

The X-ray images of the hand were created in the period from 1986 to 2002 in a specialised osteology practice in Papenburg, Germany. The evaluation included exclusively hand radiographs of children and adolescents whose physical development was appropriate to their age. If any indications of a disease which might influence skeletal maturation emerged, this led to exclusion from the study.

All the X-ray images were on a scale of 1:1. They had previously been digitised using an X-ray scanner and anonymised by assigning randomly distributed numerical file names. It was not possible to derive any person-related information from the images themselves. Only the head of the study, who was not involved in the collection of data, knew the age and gender of the persons under investigation, which were recorded in a key file.

With the use of the program Synedra view personal 3 version 3.2.0.0, all the X-ray images were assessed in the course of a blind study by two examiners with much experience in the evaluation of hand radiographs, independently of one another. One of the examiners repeated the evaluation following an interval of at least 3 months. In the case of each individual X-ray image, skeletal age was determined using the methods of Greulich and Pyle, Thiemann et al., Gilsanz and Ratib, Tanner et al. (both RUS2 and RUS3) as well as Roche et al.

Skeletal age according to Roche et al. can only be determined using special software (FELShw). The calculations also require the person's chronological age, which is assumed to be known. Since this information is not available under the conditions of forensic age estimation, we substituted the skeletal age previously ascertained according to Greulich and Pyle, in accordance with a recommended alternative approach [6].

The statistical analysis of the data was performed with the aid of IBM SPSS STATISTICS 19 software. All calculations were carried out separately by gender.

To study the criterion of estimate accuracy, we calculated Pearson’s correlation coefficient (ρ) between the skeletal age ascertained with each method and the chronological age of the test persons. To analyse the criterion of reproducibility, we calculated weighted kappa coefficients (κ) for the agreement between the method-dependent skeletal age diagnoses made by one and the same examiner (intraobserver agreement) and by the two examiners (interobserver agreement).
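The accuracy measure described above can be illustrated with a minimal sketch. The study itself used SPSS; the Python code below is not the authors' procedure, and the function names, record layout and numerical values are assumptions invented purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlation_by_gender(records):
    """Compute one coefficient per gender.

    records: iterable of (gender, chronological_age, skeletal_age) tuples
    (hypothetical layout, not the study's data format).
    """
    groups = {}
    for gender, chrono, skeletal in records:
        chrono_ages, skeletal_ages = groups.setdefault(gender, ([], []))
        chrono_ages.append(chrono)
        skeletal_ages.append(skeletal)
    return {g: pearson_r(c, s) for g, (c, s) in groups.items()}


# Invented illustrative values in years; NOT data from the study.
records = [
    ("f", 12.1, 12.5), ("f", 13.4, 14.0), ("f", 14.8, 15.0), ("f", 15.9, 16.0),
    ("m", 12.3, 12.0), ("m", 13.7, 13.5), ("m", 14.6, 15.0), ("m", 15.8, 15.5),
]
result = correlation_by_gender(records)
```

In this sketch, one coefficient per gender would be computed for each assessment method in turn, mirroring the gender-separated analysis described in the text.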

Results

As the result of the studies on estimate accuracy, Tables 2 and 3 show the correlation coefficients calculated for the observed method spectrum, divided by gender. These are taken here as a basic measure of the linear correlation between the characteristics of method-dependent skeletal age and the chronological age of the test persons.

The coefficients ascertained for the study population are characterised across the methods, in both genders, by values which for the most part lie close together. In detail, the coefficients determined for female test persons using the atlas methods of Greulich and Pyle and of Thiemann et al., but also using the bone-specific method of Roche et al., indicate a comparatively high linear correlation between the skeletal ages determined in each case and the relevant chronological age (0.766 ≤ ρ ≤ 0.802). Most notably, the skeletal ages determined in accordance with the bone-specific method of Tanner et al. lead, in this context, to markedly poorer results (0.615 ≤ ρ ≤ 0.717).

Particularly good estimate accuracies can also be achieved in the male gender using the atlas methods according to Greulich and Pyle or Thiemann et al. and the bone-specific method of Roche et al. (0.745 ≤ ρ ≤ 0.791). In contrast to the female gender, no comparable limitation applies to the method according to Tanner et al., either in the RUS2 or in the RUS3 score (0.760 ≤ ρ ≤ 0.793).

In Tables 4 and 5, the results of the study on the reproducibility of the method-dependent skeletal age diagnoses for one or both examiners, respectively, are shown with the gender-specific weighted kappa coefficients. The coefficients for the study population assume almost exclusively high values of over 0.9 in both genders, regardless of the method applied. More differentiated observation shows that, besides the bone-specific method of Roche et al., each of the atlas methods studied produces a comparatively high degree of intraobserver agreement (0.959 ≤ κ ≤ 0.980) amongst the female test persons. By contrast, the corresponding coefficients for the RUS2 and the RUS3 score of the bone-specific method of Tanner et al. display a markedly lower value (κ = 0.913). In the interobserver comparison, the weighted kappa coefficients calculated for the atlas method of Thiemann et al. and the bone-specific method of Roche et al. represent a very high degree of agreement of results amongst the female test persons (0.933 ≤ κ ≤ 0.953). Here too, the coefficients of the two tested versions of the bone-specific method of Tanner et al. are lower (0.849 ≤ κ ≤ 0.917). In the male gender, an overall tendency towards higher weighted kappa coefficients can be ascertained, and at the same time, methodologically induced discrepancies are less pronounced. Here, particularly high degrees of agreement emerge for the atlas method of Greulich and Pyle (intraobserver agreement, κ = 0.985; interobserver agreements, κ₁ = 0.949, κ₂ = 0.966) as well as the bone-specific method of Roche et al. (intraobserver agreement, κ = 0.992; interobserver agreements, κ₁ = 0.986, κ₂ = 0.984).

Discussion

Only recently has it become possible to use some of the now more or less established clinical methods for determining skeletal age from an X-ray image of the hand in forensic age diagnostics as well [25, 26, 28–31]. The methods available can only be used in forensic age estimation if they satisfy the criteria of a high degree of accuracy and reproducibility of the diagnoses.

To evaluate the criterion of the estimate accuracy of the methods analysed, use was made of Pearson’s correlation coefficient. This indicates the strength of the linear correlation between two metric variables and can thus be used as a measure of the linear correlation between the characteristics of method-dependent skeletal age on the one hand and of the chronological age of the test persons on the other [2]. As the correlation does not take into consideration whether there are differences in the mean values of the two distributions and whether the individual value ranges which result differ, the method-specific estimate accuracy can be evaluated without the additionally superimposing influence of the secular degree of acceleration in the different reference populations.
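The property invoked here, that the correlation ignores differences in mean and in value range, can be made explicit with a short derivation. The symbols are assumptions introduced for illustration only: S denotes the skeletal age estimated by a method, C the chronological age, b a constant offset (for example, a systematic shift due to a differently accelerated reference population) and a > 0 a constant scale factor.

```latex
\rho(aS + b,\, C)
  = \frac{\operatorname{Cov}(aS + b,\, C)}{\sigma_{aS+b}\,\sigma_C}
  = \frac{a\,\operatorname{Cov}(S, C)}{a\,\sigma_S\,\sigma_C}
  = \rho(S, C), \qquad a > 0 .
```

Under these assumptions, a method that consistently estimates every person as, say, half a year too old yields the same ρ as an unbiased method. The coefficient therefore reflects only how closely the estimates follow chronological age linearly, not whether they are shifted or stretched.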

The correlation coefficients submitted with this study make it clear that, overall, a fundamentally high degree of agreement exists between the characteristics studied. They thus express the generally good estimate accuracy of all the assessment methods observed. This becomes particularly apparent not only in the case of the atlas methods of Greulich and Pyle and Thiemann et al., but also in the case of the bone-specific method of Roche et al. By contrast, the RUS2 and RUS3 versions of the bone-specific method of Tanner et al. stand out because they manifest the lowest correlation coefficients in the female gender.

One possible cause for the differences ascertained between the methods in relation to estimate accuracy which should be discussed is, first of all, the limited representativity of the standard criteria selected. In accordance with the present results, a particular advantage of the atlas methods according to Greulich and Pyle and to Thiemann et al. seems to be that they combine hand radiograms that are representative in terms of age with the verbal definition of various age-specific maturity criteria for the assessment of the skeletal maturity pattern. In contrast, the atlas method according to Gilsanz and Ratib dispenses with the specification of standard criteria that must be checked as an obligatory measure; this seems to limit estimate accuracy particularly where an inexperienced examiner is concerned. Compared with the atlas methods, the bone-specific methods of Roche et al. and Tanner et al., with their varying emphases on different designated standard criteria of the state of maturity of individual elements of the hand skeleton, offer a significant difference. In particular, the extremely comprehensive and complex system proposed by Roche et al., with its 13 skeletal proportions and 98 maturity indicators, which are divided into 232 stages and entered into the overall result in a differentiated manner, is evidently well able to represent the process of maturation of the hand skeleton. By contrast, the principle of skeletal age estimation of the hand introduced by Tanner et al. in its application of the clinically most significant RUS2 and RUS3 scores to the female test persons in the study population leads to a noticeable decrease in estimate accuracy. 
In this context, the identical stage differentiation of the radius and ulna used in both systems must be regarded as insufficiently precise in view of the major influence of the state of maturity of the distal epiphyses of the forearm on overall skeletal age, particularly in the advanced phase of development [29, 41]. Thus, when comparing the genders, the earlier completion of maturation of the female hand skeleton increasingly leads to a lack of options for differentiation, particularly at a higher age, and hence entails the risk of incorrect estimates. As the carpal bones have also already achieved their mature state in the corresponding age segment, this problem cannot be overcome by including them in the more comprehensive TW20 score of the method of Tanner et al. either. In fact, the only conceivable solution would be to additionally take into consideration other characteristics of the development of the radius and ulna, such as, for example, the forensically significant regression of the epiphyseal scar [3, 27]. Finally, a further general circumstance which may potentially compromise the estimate accuracy of the method of Tanner et al. must also be seen in the completely arbitrary assignment of point values to certain manifestations of characteristics [9]. In contrast to the method of Roche et al., the two tested scores do not take into account the probabilities of the interindividually varying presence of maturity indicators in the reference population [18].

The limitation of the estimate accuracy of a method for determining the skeletal age of the hand is also conceivable as an expression of the inhomogeneity of the reference population in question. Amongst the atlas techniques analysed, the method of Greulich and Pyle is based on a more or less clearly defined population of 1,000 Americans of mainly North European origin, born in the USA, and primarily from economically well-situated families. They were studied between 1931 and 1942 at the age of 0 to 18 years in the course of the Brush Foundation longitudinal study. The method of Thiemann et al. is based on a random sample of 5,200 children and adolescents aged from 0 to 18 years, selected in a standardised manner and analysed in 1977 to 1979 in the former GDR. However, even the bone-specific method of Roche et al. is based on a study population, examined from 1932 to 1977 in the course of the FELS longitudinal study, of 355 male and 322 female US-American children and adolescents with a relatively coherent national social structure. By contrast, according to the current recommendations of the Study Group on Forensic Age Diagnostics (AGFAD) [23], the atlas method of Gilsanz and Ratib does not meet the criteria for a reference study applicable for purposes of criminal proceedings, as no details at all are given on the reference population used. Thus, amongst other factors, potential inhomogeneities in the composition of this reference population cannot be excluded as a cause of the overwhelmingly most unfavourable estimate accuracies amongst all the atlas methods studied. The random sample in the TW2 bone-specific method of Tanner et al. is composed of some 3,000 British children from the middle and lower classes. The relevant X-ray images were evaluated between 1946 and 1970. With the TW3 method of Tanner et al., which did not appear until 2001, this reference population was adapted to the secular trend. 
The updated reference values are now additionally based on studies from Belgium [4], Spain [14] and USA [35]. A further potential source of blurring of the linear combination of characteristics of skeletal age and the chronological age of the test persons, particularly in the female gender, may be identified in the mixture of test persons of differing socioeconomic status, different geographic origin and heterogeneous secular degree of acceleration which characterises the random samples used in the TW methods.

In evaluating the criterion of the reproducibility of skeletal age diagnoses, the weighted kappa coefficient was applied as a measure of observer agreement between the method-dependent diagnoses of one and the same examiner and of the two examiners [42].
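As a rough illustration of this measure, the sketch below computes a weighted kappa coefficient for two series of ordinal ratings. It is not the SPSS procedure used in the study. The text does not state which weighting scheme was applied, so the choice between linear and quadratic weights, the binning of skeletal ages into whole-year categories, the function name and the example ratings are all assumptions.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted kappa for two equal-length lists of ordinal ratings.

    categories: all possible ratings, listed in ascending order.
    weights: "linear" or "quadratic" disagreement weights (assumed options;
    the source does not specify the scheme).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(rater_a)
    k = len(categories)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    if k < 2:
        raise ValueError("need at least two categories")

    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        # 0 for identical categories, 1 for the two most distant ones.
        return (abs(i - j) / (k - 1)) ** power

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - observed_dis / expected_dis


# Invented whole-year skeletal age ratings; NOT data from the study.
categories = [12, 13, 14, 15, 16]
first_reading = [12, 13, 14, 15, 16]
second_reading = [13, 13, 14, 15, 16]
kappa = weighted_kappa(first_reading, second_reading, categories)
```

With weighted kappa, a disagreement of one category is penalised less than a disagreement of several categories, which is why it suits ordinal maturity ratings better than the unweighted coefficient.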

The weighted kappa coefficients identified in the course of the present study attest to the overall excellent agreement between the diagnostic results reached by one or both examiners in the case of all the methods studied. For evaluators with good experience, the achievable skeletal age diagnoses can therefore be considered reliable and, for the most part, independent of the observer.

The inter-method differences in the reproducibility of skeletal age diagnoses ascertained here may be explained, for the most part, by the margin of interpretation confronting the examiner, for example, as a result of the insufficiently exact description of methodological procedure and of the relevant estimate criteria, but also as a result of ambiguous reference images and illustrative graphics. In the overall view, the existing results initially reveal a generally lower potential for discrimination between the methods studied in the case of the male gender. This effect, in turn, may be derived as a consequence of a later completion of maturation accompanied by a longer-term conservation of maturity criteria which can be differentiated unequivocally. In the group of bone-specific methods studied, in particular, the specification of a few selected indicators of the development of the hand skeleton appears to have a limiting effect amongst female test persons. Thus, the lowest concordances of estimate results in the intra- and interobserver comparison were achieved with the method of Tanner et al. in the group of female study participants. Moreover, it proves problematic here too, independent of gender, that many examiners find it very difficult to understand the texts describing the stages, which promotes a superficial orientation by the accompanying illustrations [8]. However, the primary use of graphics reduced to the essential characteristics is a potential source of incorrect diagnoses [13]. Amongst the atlas methods analysed, the lowest rate of intra- and interobserver agreement is to be established for the method of Gilsanz and Ratib. In this connection too, the complete lack of a description and/or schematic characterisation of each of the most important age-relevant morphological criteria must be noted critically.

Much discussion has taken place on the potential advantages of the principle of the bone-specific technique as compared with the atlas method in age estimation practice. In particular, the weighting of characteristics, which is achievable using the relevant methods, is of outstanding interest for the clinician in the event of a dissociated skeletal maturation. Forensic age estimations are, however, contraindicated in the case of such maturation patterns, which is why questions need to be asked about the benefits of the bone-specific methods in this area of application in view of the greater difficulty of learning this procedure and the considerable additional expenditure of time in some cases. The data presented here confirm the assertion of various earlier studies that the greater expenditure of time is out of proportion to the potential increase in estimate accuracy as compared with established atlas methods [1, 7, 15, 16, 41]. The results also make it clear that bone-specific methods likewise make little contribution to an improvement in the reproducibility of skeletal age diagnoses. This becomes particularly apparent in the case of the clinically world-renowned second and third versions of the method of Tanner et al. In the overall comparison, both of these are least able to fulfil the qualitative criteria that are decisive for forensic application and should therefore not be favoured, especially in the context of criminal proceedings. By contrast, the results of the present study confirm the scientifically high significance of the method of Roche et al. In this case, however, the computer-supported calculation of the skeletal age relies on the person’s clinically known chronological age, which is an obstacle to the use of the method in forensic age estimation.
The recommendation implemented in this study to replace the chronological age by the skeletal age determined for the test persons according to Greulich and Pyle cannot be applied to the forensic use of the method according to Roche et al. in view of the incalculable risk of a systematic assessment error. Here, more far-reaching studies would be indispensable.

As the present results show, the demands made by age estimation practice in criminal proceedings on a method of skeletal age estimation of the hand are particularly well met by the atlas methods analysed. However, in relation to the method of Gilsanz and Ratib, besides the methodological weaknesses highlighted here, it must be kept in mind that in the case of the 14-plus age groups relevant to criminal proceedings, there is a risk of overestimating chronological age in the female gender by up to 7.2 months [30]. For this reason, it is only of very limited use for the purposes of forensic age estimation. Because the reference populations of the methods of both Greulich and Pyle and Thiemann et al. are very strongly accelerated, the risk of overestimating chronological age and thus of causing prejudice to a defendant in criminal proceedings is minimised. Both methods can be recommended unconditionally for use in the area of criminal proceedings. Compared with the method of Greulich and Pyle, which is today one of the most established worldwide, the method of Thiemann et al. has so far been applied mainly in the German-speaking world. Its particular strengths lie primarily in the more up-to-date reference population and an improved scientific study concept involving defined inclusion criteria.