Introduction

Forensic scientists use age estimation to help establish the identity of unidentified deceased individuals. In living persons, age estimation can be used, for example, to decide whether an alleged perpetrator of unproven age (in most cases a young immigrant without documents) has reached the age of criminal responsibility and, if so, whether that person is an adult or still a minor, which determines the applicable criminal code. The maturation of bones and teeth is strongly correlated with chronological age in young individuals and can therefore be used to estimate age. Current recommendations in Germany [1] and in other countries suggest incorporating dental age into the age estimation procedure. Research in this area has focused on maximizing the precision of age estimation methods to keep the frequency of erroneous decisions in legal proceedings as low as possible. For an overview of age estimation in clinical dentistry, the interested reader is referred to a recent work by Kirschneck and Proff [2].

Several schemes of assessing dental maturity have been proposed. One of the most popular is the classification of mineralization stages introduced in 1973 by Demirjian and colleagues [3]. Each tooth from the first incisor to the second molar in the left mandible is assigned a stage from A (beginning mineralization) to H (apex closed). These stages are converted to maturity scores using one table, then another table is used to convert the sum score into an estimated age.

The Demirjian method was updated in 2001 by Willems and co-workers [4] using newer data from Belgian children. The updated method also applies Demirjian stages A–H, but the resulting sum score directly provides the estimated age, meaning that there is no second conversion step. Both the Demirjian and Willems methods are applicable only up to 16 years of age: by then, mineralization of the incorporated teeth has finished in most individuals, so these methods cannot discriminate between ages beyond this threshold.

In 2010, an atlas method was proposed by a research group from London [5, 6]. The atlas consists of a series of schematic reference images of the whole dentition for certain ages. The expert performing age estimation determines which of the reference images best matches the panoramic radiograph of the individual being examined, and the age corresponding to that reference image is the estimated age assigned to the individual. After the first year of life, the London Atlas assigns age in 1-year ranges.

This study compared the performance of the three available methods (the “classical” Demirjian scheme, its update by Willems, and the London Atlas) for forensic age estimation.

Methods

Routine panoramic radiographs from children treated at the orthodontic department of the University Hospital Würzburg were assessed. Eligibility criteria were age 6–16 years, sufficient image quality, and presence of the teeth required to apply the London Atlas and to perform Demirjian staging. Individuals were excluded if there were any notes in their patient records indicating that they had or might have had a systemic disorder that could affect the speed of dental maturation.

Panoramic radiographs were digitized using an X-ray scanner and stored as bitmaps. The examiner was free to apply any standard image processing procedures, such as changing size, brightness, or contrast, until they felt subjectively comfortable rating dental age or the stages of particular teeth. Ratings were carried out by the second author, who had received previous training and was supervised by the last author during data capture. For each panoramic radiograph, age according to the London Atlas was rated first, and Demirjian staging was done at least 14 days later.

The London Atlas [5] provides a series of schematic one-sided reference images of the deciduous and permanent dentition for different ages (in steps of 1 year from age 1.5 years onwards). The observer had to select the reference image that best matched the panoramic radiograph under consideration. To assess the extent of matching, all teeth were included, and stages of mineralization, eruption, and root resorption (of deciduous teeth) were taken into account. The observer chose one of the reference images, and age corresponding to that image was the estimated age according to the London Atlas. In-between estimates were not used. The same series of reference images was applicable for both sexes.

For Demirjian staging, the observer used the schematic and radiographic reference images provided by Demirjian and colleagues [3] to assess the teeth of the left mandible from the first incisor through to the second molar. Each tooth was assigned one of eight stages of mineralization (from A through H). Again, in-between stages were not used. If any tooth in the left mandible was missing, the contralateral tooth in the right mandible was used as a substitute. Stages were converted into scores based on the tables provided [3], and the sum score was converted into estimated age. The conversion tables were implemented as program syntax in statistical software (SPSS), so that the computation of estimated age from mineralization stages was done automatically. The same reference images were applicable for both sexes, but there were separate conversion tables for boys and girls. The same tooth mineralization stages were used for age estimation using Willems’ method. Again, stages were converted into scores, the sum of which provided the estimated age.

Repeat assessment of 30 randomly selected panoramic images was carried out to assess intra-observer agreement for all assessments.

Statistical analyses

Methods were compared for bias (systematic over- or underestimation of age on average across all individuals) and mean absolute error (absolute difference between estimated and true age), using the one-sample t test for the assessment of each individual method. Repeated-measures analysis of variance was used to compare the methods with each other, and Cochran’s Q test for related samples was used to compare the frequency of absolute errors above 2 years. In addition, a case-wise comparison between each pair of methods was carried out by defining that for each two methods A and B and for each case X:

  • Method A wins if |AgeEstimate(A,X)–TrueAge(X)| < |AgeEstimate(B,X)–TrueAge(X)|

  • Method B wins if |AgeEstimate(A,X)–TrueAge(X)| > |AgeEstimate(B,X)–TrueAge(X)|

  • Tie of A and B if |AgeEstimate(A,X)–TrueAge(X)| = |AgeEstimate(B,X)–TrueAge(X)|

The null hypothesis of equal probabilities of winning of A and B was examined using the binomial test. As a fourth method of age estimation, we computed the average of the age estimates obtained from the two newer methods (Willems’ and London Atlas), and compared them to those of the three methods described above. Two-sided P values of < 0.05 were considered statistically significant.
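The error measures and the case-wise comparison defined above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, ages are assumed to be given in years, and ties are assumed to be excluded before the binomial test (the text does not state how ties were handled).

```python
from math import comb


def error_summary(estimates, true_ages, threshold=2.0):
    """Return bias, mean absolute error, and share of absolute errors above threshold."""
    errors = [e - t for e, t in zip(estimates, true_ages)]
    n = len(errors)
    bias = sum(errors) / n                      # negative = underestimation
    mae = sum(abs(err) for err in errors) / n
    share_large = sum(abs(err) > threshold for err in errors) / n
    return bias, mae, share_large


def casewise_comparison(est_a, est_b, true_ages):
    """Count cases won by method A, won by method B, and ties (by absolute error)."""
    wins_a = wins_b = ties = 0
    for a, b, t in zip(est_a, est_b, true_ages):
        err_a, err_b = abs(a - t), abs(b - t)
        if err_a < err_b:
            wins_a += 1
        elif err_a > err_b:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties


def binomial_two_sided_p(wins_a, wins_b):
    """Exact two-sided binomial test of H0: P(A wins) = 0.5.

    Assumption: ties have already been dropped, so n = wins_a + wins_b.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Under H0 the distribution is symmetric, so double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With hypothetical counts of 8 wins for method A and 2 wins for method B, `binomial_two_sided_p(8, 2)` returns about 0.11, so such a small sample would not demonstrate superiority of A at the 0.05 level.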

Sample size

It was calculated that 471 subjects would be required to achieve 95% power to detect a bias of ± 3 months for a method with a standard deviation of errors of 1.5 years. A sample of 327 subjects would provide 95% power to detect a superiority of 60:40 in the case-wise comparison of two methods. Therefore, sample size was set at 500.
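The order of magnitude of these sample sizes can be checked with textbook normal-approximation formulas. The sketch below is an assumption about how such numbers arise, not the calculation actually performed for the study; the function names are hypothetical, and a two-sided significance level of 0.05 is assumed in line with the threshold stated above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_for_bias(delta, sd, alpha=0.05, power=0.95):
    """Approximate n to detect a mean error of size delta (one-sample test).

    Normal approximation: n = ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)


def n_for_win_probability(p_win, alpha=0.05, power=0.95):
    """Approximate n to detect P(A wins) = p_win against H0: 0.5 (ties ignored)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * 0.5 + z_beta * sqrt(p_win * (1 - p_win))
    return ceil((numerator / (p_win - 0.5)) ** 2)


# Bias of 3 months (0.25 years) with an error SD of 1.5 years
print(n_for_bias(delta=0.25, sd=1.5))
# Superiority of 60:40 in the case-wise comparison
print(n_for_win_probability(0.60))
```

These approximations give roughly 468 and 319 subjects, somewhat below the 471 and 327 quoted above; the published figures presumably come from exact (t-based and binomial) calculations, which tend to require slightly more subjects.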

Ethics

This study retrospectively evaluated anonymous X-ray images obtained in routine clinical practice. Therefore, no ethical evaluation or approval was required according to German law (written statement by the Ethics Committee of the Medical Faculty of the University of Würzburg, reference number 20170317-01).

Results

A total of 500 patients were included (Table 1). The intra-class correlation coefficient for intra-observer agreement was 0.95 for Demirjian’s method and 0.98 for the age estimates obtained by the London Atlas.

Table 1 Age at panoramic imaging and sex of included subjects

Performance data for the three methods and their comparisons are summarized in Table 2. Demirjian’s method had only a modest bias towards estimating age as too low, whereas Willems’ method systematically underestimated age by an average of 4.5 months, and the London Atlas overestimated age by approximately 3.5 months. However, the mean absolute error of Willems’ method and the London Atlas was smaller than the mean absolute error of Demirjian’s method, but only the comparison between Willems’ and Demirjian’s method reached statistical significance. The same result was obtained when considering the number of individuals with an absolute age estimation error of > 2 years. In the case-wise comparison (Table 3, upper part), Willems’ method tended to perform better than the other two methods, but this trend did not reach statistical significance. The London Atlas was comparable with Demirjian’s method.

Table 2 Comparison of precision of prediction of chronological age by dental age based on Demirjian’s and Willems’ scoring methods, the London Atlas, and the average of Willems’ method and the London Atlas
Table 3 Case-wise comparison of Demirjian’s (DS) and Willems’ (WS) scoring methods, the London Atlas (LA), and the average of WS and LA (AM) to each other with respect to the precision of the prediction of chronological age by dental age

Bias almost disappeared when the average age estimates of Willems’ method and the London Atlas were used because these two methods had similar biases in opposite directions. Furthermore, compared with each of the three individual methods, the average method had a bias that was significantly closer to zero with a narrower confidence interval, and the mean absolute error and percentage of cases with an absolute error of > 2 years were significantly smaller (Table 2).
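The cancellation of biases follows from simple arithmetic: the bias of the mean of two estimates equals the mean of the two biases. The short check below uses the rounded biases quoted above (4.5 months underestimation and approximately 3.5 months overestimation), so the result is only approximate; the function name is hypothetical.

```python
MONTHS_PER_YEAR = 12


def bias_of_average(bias_a_months, bias_b_months):
    """Bias of the mean of two estimates is the mean of their biases."""
    return (bias_a_months + bias_b_months) / 2


willems_bias = -4.5  # months; negative = underestimation
atlas_bias = 3.5     # months; approximate overestimation
combined_months = bias_of_average(willems_bias, atlas_bias)
combined_years = combined_months / MONTHS_PER_YEAR
print(combined_months, round(combined_years, 2))  # prints: -0.5 -0.04
```

The resulting half-month underestimation agrees with the mean error of −0.04 years reported for the average method.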

In the case-wise comparison, the average of the Willems’ and London Atlas methods provided a significantly more precise estimate of age than each of the three individual methods (Table 3, lower part). Thus, the average of two methods performed best with respect to all criteria considered.

Figure 1 displays the errors quoted in Table 2 separately for both sexes. The significant overall bias of Demirjian’s method was attributable solely to underestimation of age in boys; there was no bias in girls. The underestimation of age by Willems’ method was comparable for boys and girls, whereas the overestimation of age by the London Atlas was considerably larger in girls than in boys. Notably, the average method had modest biases with different signs for boys and girls, and the difference between these two small biases was significant. Mean absolute errors of all methods were comparable for both sexes. However, the rates of absolute errors of > 2 years were larger in boys than in girls for Demirjian’s method, larger in girls than in boys for the London Atlas, and similar for both sexes using Willems’ and the average methods.

Fig. 1

Bias, mean absolute errors, and percentages of absolute errors of > 2 years in boys and girls for the three methods and the average of Willems’ method and the London Atlas. Comparison P values refer to the null hypothesis that errors are equal in boys and girls

Figure 2 shows the case-wise comparisons from Table 3 carried out for boys and girls separately. On the horizontal axis, differences in the percentage of cases won by the first method minus those won by the second method are shown. The extent of superiority or inferiority of methods did not differ significantly between boys and girls, apart from more pronounced superiority of the average method over the Willems method in girls.

Fig. 2

Case-wise comparison of Demirjian’s (DS) and Willems’ (WS) scoring methods, the London Atlas (LA), and the average of WS and LA (AM) to each other in boys and girls. Positive/negative differences of rates of “won” cases mean that the first/second method is superior, respectively. Comparison P values refer to the null hypothesis that the differences of rates are the same in boys and girls

Figure 3 displays the distribution of errors of age estimates obtained by the average of two methods. The mean error (bias) was − 0.04 years (i.e., a mean underestimation by 2 weeks), and the standard deviation was 0.96 years.

Fig. 3

Distribution of the age estimation errors of the average of Willems’ method and the London Atlas

Discussion

Willems’ update of Demirjian’s age estimation method improved the absolute precision of the original method but increased the bias towards underestimation of age. The London Atlas was competitive with Demirjian’s method, although its 1-year classification bands limit the precision of age estimation that can be achieved. Willems’ method tended to be slightly more precise than the London Atlas, but none of the comparisons achieved statistical significance. Beyond numerical performance, some technical advantages of the London Atlas should be mentioned. First, it is very easy to use and takes less time than staging of individual teeth. Second, the scoring methods cannot be applied when the same tooth is missing on both the left and right sides of the mandible. In particular, about 5% of the population have agenesis of at least one second premolar [7, 8], the mandibular teeth are more frequently affected, and both mandibular second premolars are often missing. Neither the Demirjian nor the Willems score can be computed in these cases, whereas the Atlas method can still be applied by looking at the rest of the dentition. The same holds when some teeth cannot be assessed due to problems with image quality.

To compare our results with other studies examining the accuracy of Willems’ method, we refer to a recent meta-analysis incorporating 23 studies in various populations [9]. The mean biases reported ranged from underestimation by 8 months to overestimation by 7 months, compared with the underestimation by 4.5 months in our sample. There was a wide range of biases across studies, and this did not appear to be associated with the region of origin (such an association might have been explained by delayed mineralization in some populations due to malnutrition resulting from poverty). Therefore, we suggest that discrepancies in classification into Demirjian stages by observers from all over the world might be responsible for the wide variations reported.

Validation studies for the London Atlas are more consistent. Three recent studies [10, 11, 12] reported a slight overestimation of chronological age. Of those, data from a European population [10] showed a mean overestimation by 3.5 months, which is consistent with our finding.

Each of the methods has specific limitations. Discrimination of ages based on first incisors through second molars is poor from the age of 14 years onwards and impossible beyond age 16 years [13]. The scoring methods by Demirjian and Willems do not incorporate third molars, limiting their diagnostic power at the upper bound of the age tables. In addition, overestimation of age becomes less likely as true age approaches 16 years (and is impossible once this age has been reached). This results in an age-dependent negative bias. On the other hand, tooth mineralization in girls is slightly ahead of that in boys when age is < 15 years (as confirmed in a recent study [13]). This probably explains the variation between sexes when using the London Atlas because this method is not sex specific. A combination of both methods might reduce the estimation errors occurring due to these limitations.

Indeed, our data showed that the best precision, with almost no bias, was achieved using the average of age estimates obtained from Willems’ method and the London Atlas. This is probably because the information obtained using each of these methods adds to the other. Scoring of particular teeth might be considered more objective and more detailed, while the overall assessment may have a larger subjective component. On the other hand, the atlas incorporates all teeth while Willems’ method scores only the teeth of the left mandible (or their substitutes in the right mandible). This means that Willems’ method ignores asymmetric tooth development, whereas an observer using the atlas can take this information into account. Furthermore, the biases of Willems’ method and the London Atlas are in opposite directions, so that they balance each other out in the average method (particularly the large bias of the atlas in girls).

Based on our findings, we suggest applying both methods and using their average for dental age estimation when age is < 16 years. This is a new approach compared with the current practice of using a single preferred method. When the methods are combined, the London Atlas should be applied first, followed by assessment of Demirjian stages in the left mandible. The rationale for this recommendation is that the overall view of the dentition when applying the atlas method will probably have little influence on subsequent Demirjian staging of individual teeth. Conversely, if Demirjian stages were determined first and used to calculate estimated age, this could bias subsequent interpretation of the overall dentition using the London Atlas. The two methods would then not provide independent information, and the error variance seen in our study would not be achieved.

In forensic age estimation, the combination of several methods involving different anatomic structures is recommended [1]. For combining age estimates obtained from teeth and hand bones [14, 15], the weighted average of both estimated ages has been suggested as the common estimate, where the weights should be inversely proportional to the variances of the errors of both methods [16]. Assuming that the standard deviation for the Thiemann hand atlas method was 0.97 years [16] and that for the average of Willems’ method and the London Atlas was 0.96 years, the weights for the two methods would be almost equal, and the common estimate would therefore be the usual average of dental and skeletal age (with a standard deviation of errors of 0.68 years).
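The inverse-variance weighting described above can be illustrated with a short sketch. It assumes that the errors of the dental and skeletal estimates are independent (otherwise the combined standard deviation would be larger); the function name and the example ages are hypothetical.

```python
from math import sqrt


def inverse_variance_combine(estimates, sds):
    """Combine age estimates with weights proportional to 1 / sd^2.

    Assumption: the errors of the methods are independent and unbiased.
    Returns the combined estimate, its error SD, and the normalized weights.
    """
    raw_weights = [1 / sd ** 2 for sd in sds]
    total = sum(raw_weights)
    weights = [w / total for w in raw_weights]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_sd = sqrt(1 / total)
    return combined, combined_sd, weights


# Hypothetical case: dental age 14.2 years, skeletal age 14.8 years
dental_sd = 0.96    # average of Willems' method and the London Atlas
skeletal_sd = 0.97  # hand atlas method, as quoted from [16]
age, sd, weights = inverse_variance_combine([14.2, 14.8], [dental_sd, skeletal_sd])
print(round(age, 2), round(sd, 2), [round(w, 3) for w in weights])
```

With these standard deviations the weights are approximately 0.505 and 0.495, and the combined standard deviation is approximately 0.68 years, matching the figure quoted above.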

The use of an orthodontic patient sample should be mentioned as a limitation of this study. Although patients with disorders potentially affecting dental mineralization were excluded, the variance in the speed of tooth mineralization and of eruption in individuals with an orthodontic treatment indication might be larger than in the general population. Therefore, age estimates in our sample might be less precise, and error rates for the detection of age thresholds could be increased. As a consequence, the standard deviation of the errors of the average of Willems’ method and the London Atlas in the general population might be less than the 0.96 years reported above. In the combined age estimation including skeletal age obtained from the hand, the teeth may then receive a higher weight, and the standard deviation of the errors of the combined estimates might be less than the above-quoted 0.68 years. However, the validity of the comparison of several methods of age estimation should not be diminished by this limitation, because the higher biological variability in the sample would influence the results of all methods in the same manner, and therefore the property of one method being superior to another should not be altered.

In conclusion, the results of this study showed that a combination of the London Atlas and Willems’ scoring method provides more precise estimates of dental age than the current practice of applying a single method.