Introduction

Age estimation is often an imperative procedure in circumstances where chronological age (CA) is ambiguous, for example, in determining whether a child has reached the age of criminal responsibility during judicial proceedings or whether undocumented individuals have attained the age at which employment, marriage, adoption, or immigration is permitted [1,2,3]. Age estimation of unknown corpses and skeletal remains is also important in anthropological and forensic science [4] and can be helpful for diagnoses and treatment planning in clinical dentistry and orthodontics [5, 6].

CA estimation based on dental panoramic radiography is the most commonly used age estimation method [7,8,9,10,11,12,13,14]. Dental calcification is genetically controlled and develops independently of somatic or sexual maturity, making it superior to other age estimation approaches, such as skeletal indicators [15, 16]. The system of tooth developmental stages compiled by Demirjian in 1973 and revised in 1976 [5, 6], based on French-Canadian individuals, is the most widely applied method, on account of its rationality, ease of application, and objectivity. Willems et al. [1] developed a modified version of Demirjian’s scoring system using a Belgian population, simplifying the conversion steps. Some previous studies have reported that estimated dental age (DA) determined by the method of Willems was more accurate than that generated using the Demirjian method [3, 8, 17,18,19]; however, other studies have reached the opposite conclusion [20, 21].

Ethnic differences significantly influence the development of teeth [18, 20]. Data regarding the accuracy of the Demirjian and Willems methods are lacking for children from Hunan province in central southern China. Accordingly, the objective of this study was to analyze the suitability of the Demirjian and Willems methods for estimation of the CA of Han population children from central southern China.

Subjects and methods

Subjects

Orthopantomograms (n = 1249) for this study were randomly selected from those obtained from patients attending XiangYa Stomatological Hospital of Central South University in Hunan Province. Data were collected from January 2016 to November 2017. The children included were 603 girls and 646 boys (age range, 8–16 years). Samples were divided into nine groups according to CA (one group per year). The sex and age distributions of each group are presented in Table 1.

Table 1 Age and sex distribution of the samples

Selection criteria

The inclusion criteria were as follows: (1) children of Han ethnic origin from Hunan province; (2) a clear, high-quality orthopantomogram; (3) no history of medication or surgery that could affect the eruption and mineralization of the mandibular permanent teeth. The exclusion criteria were as follows: (1) orthopantomograms with mandibular permanent teeth, other than the third molar, missing on both sides of the mandible; (2) deciduous teeth remaining in the mandible; (3) orthopantomograms from children with systemic disease; (4) gross pathology or tumor in the mandible.

Prior to conducting our research, ethical approval was obtained from the ethics committee of XiangYa Stomatological Hospital and informed consent was provided by all individual participants included in the study, according to the Declaration of Helsinki.

DA estimation

The Demirjian method presents eight development stages of dental maturity, from initial mineralization (stage A) to root completion (stage H), for each of seven left permanent mandibular teeth [5]. For the Demirjian method, a score was assigned to each tooth according to its stage, and the sum of these scores was converted into DA with reference to published conversion tables and/or percentile curves [5, 6]. For the Willems method, the sum of the corresponding scores for each tooth directly gave the DA of a subject, according to Willems’ tables [1], which are separate for boys and girls. CA was obtained by subtracting the date of birth from the date on which the radiograph was taken.
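The two calculations described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the 365.25-day year used to express CA in decimal years, and the example dates are all assumptions made for illustration. The score table is deliberately left as a parameter, to be filled from the published sex-specific tables; no score values are supplied here.

```python
from datetime import date

# Assumption: CA is expressed in decimal years using an average year length.
DAYS_PER_YEAR = 365.25


def chronological_age(birth: date, radiograph: date) -> float:
    """Return CA in decimal years at the date the radiograph was taken."""
    if radiograph < birth:
        raise ValueError("radiograph date precedes date of birth")
    return (radiograph - birth).days / DAYS_PER_YEAR


def sum_stage_scores(stages, score_table):
    """Sum the per-tooth scores for one subject.

    stages: mapping of tooth name -> stage letter ('A' to 'H').
    score_table: mapping of (tooth, stage) -> score, supplied by the caller
        from the published table for the subject's sex (hypothetical
        structure; no published values are reproduced here).
    """
    return sum(score_table[(tooth, stage)] for tooth, stage in stages.items())


# Hypothetical dates, not taken from the study data.
ca = chronological_age(date(2005, 3, 14), date(2016, 9, 14))
print(f"CA = {ca:.2f} years")
```

With the Willems tables, the returned sum would be read directly as DA; with the Demirjian tables, it would still need to be converted to DA through the published conversion table.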

Statistical analyses

SPSS version 20.0 for Windows (IBM, Armonk, NY, USA) and MS Excel (Microsoft Office) were used for statistical analyses and data management. The Kolmogorov-Smirnov test was used to assess whether the age differences were normally distributed for each age group and sex. The paired t test was used to analyze the statistical significance of differences between the means of CA and DA by age group and sex. The Wilcoxon signed rank test was applied instead in age groups where the differences were not normally distributed. The mean absolute error (MAE) was used to quantify the accuracies of the two methods [22]. A p value less than 0.05 was considered statistically significant.
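As an illustration of the quantities named above, the following Python sketch computes the mean signed difference, the MAE, and the paired t statistic for one group. It is not the SPSS analysis performed in the study. The function name and the sign convention (DA minus CA, so that positive values indicate overestimation) are assumptions. The p value is not computed; it would be obtained from the t distribution with n − 1 degrees of freedom in a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def age_error_summary(ca, da):
    """Summarize agreement between chronological and dental ages.

    ca, da: equal-length sequences of ages in years, paired by subject.
    Differences are taken as DA - CA (assumed sign convention).
    """
    if len(ca) != len(da) or len(ca) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [d - c for c, d in zip(ca, da)]
    mean_diff = mean(diffs)                  # signed bias
    mae = mean(abs(x) for x in diffs)        # mean absolute error
    sd = stdev(diffs)                        # sample SD of the differences
    # Paired t statistic: mean difference divided by its standard error.
    t_stat = mean_diff / (sd / sqrt(len(diffs))) if sd > 0 else float("nan")
    return {"mean_diff": mean_diff, "mae": mae, "t": t_stat, "df": len(diffs) - 1}
```

The signed mean difference and the MAE answer different questions: over- and underestimations cancel in the former but not in the latter, which is why a method can show almost no mean bias while still having an MAE of several months.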

Intra- and inter-observer reproducibility

All measurements were performed separately by two trained examiners, who were blinded to sex and CA. To assess the intra- and inter-observer reproducibility of determination of Demirjian stage for each tooth, we calculated Cohen’s Kappa coefficient values. An independent set of 53 randomly selected orthopantomograms was re-examined after an interval of 3 weeks.
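For reference, Cohen's Kappa compares the observed agreement between two sets of ratings with the agreement expected by chance. The sketch below is a minimal Python version of the unweighted coefficient; whether the study used the unweighted or a weighted form is not stated, so the unweighted form is an assumption, as is the function name.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two equal-length lists of stage labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of teeth assigned the same stage by both ratings.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal stage frequencies of each rating.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both ratings use a single identical category
    return (observed - expected) / (1 - expected)
```

For intra-observer reproducibility, the two lists would hold the first and repeat readings by the same examiner; for inter-observer reproducibility, the readings of the two examiners.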

Results

Intra- and inter-observer reproducibility was satisfactory, with Kappa coefficients of 0.963 and 0.934, respectively. The mean CA of all the children was 12.04 ± 2.55 years (boys, 12.12 ± 2.59 years; girls, 11.94 ± 2.49 years).

The Demirjian method

For all samples, mean DA estimated using the Demirjian method was slightly higher (0.002 years) than mean CA; the difference was not statistically significant (p = 0.26). In addition, the differences between DA and CA for girls and boys were not statistically significant (p = 0.48 and p = 0.59, respectively). For girls, the mean DA was underestimated, with a mean difference of 0.03 years, while for boys, the mean DA was overestimated by 0.03 years (Table 6).

Comparisons of the accuracies in each age group for girls and boys are presented in Tables 2 and 3 and Fig. 1. For girls, the age difference most frequently observed was between 0 and 0.5 years, and the largest under- and overestimations of age were both 4.5 years. For boys, the age difference most frequently observed was between −0.5 and 0.5 years, and the largest under- and overestimations of age were both 4.0 years. For both sexes, the mean difference between the CA and DA was < 1 year, apart from boys aged 16 years (1.21 ± 0.88 years). For girls, mean differences were only statistically significant in three age groups (8, 12, and 16 years), while for boys, there were statistically significant differences in five age groups (10, 12, 13, 14, and 16 years).
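The "most frequently observed" age difference refers to the modal interval when individual differences are grouped into 0.5-year bins. A minimal Python sketch of such a grouping follows; the half-open bin edges [lower, lower + 0.5) and the function names are assumptions, since the exact binning rule is not described.

```python
from collections import Counter
from math import floor

BIN_WIDTH = 0.5  # years; assumed from the 0.5-year intervals reported


def bin_differences(diffs, width=BIN_WIDTH):
    """Count age differences per half-open interval [lower, lower + width)."""
    counts = Counter(floor(d / width) * width for d in diffs)
    return dict(sorted(counts.items()))


def modal_bin(diffs, width=BIN_WIDTH):
    """Return (lower, upper) bounds of the most frequently occupied interval."""
    counts = bin_differences(diffs, width)
    lower = max(counts, key=counts.get)
    return (lower, lower + width)
```

The largest under- and overestimations correspond to the minimum and maximum of the same list of differences.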

Table 2 Comparison between chronological age and the Demirjian dental age for girls
Table 3 Comparison between chronological age and the Demirjian dental age for boys
Fig. 1 Bar graph of the distribution of age differences between chronological age and dental age using the Demirjian and Willems methods

The Willems method

For all samples, the mean DA estimated using the Willems method was underestimated with a mean difference of 0.49 years, which was a statistically significant difference (p < 0.001). The differences between DA and CA were also statistically significant for both girls and boys (p < 0.001). The mean DA was underestimated relative to the CA for both girls and boys, with mean differences of 0.54 and 0.44 years, respectively (Table 6).

Comparisons of the accuracy for each age group for both sexes are presented in Tables 4 and 5 and Fig. 1. For girls, the age difference most frequently observed was between 0.5 and 1.0 years and the largest under- and overestimations of age were 5.0 years and 3.5 years, respectively. For boys, the age difference most frequently observed was between 0 and 1.0 years, and the largest under- and overestimations of age were 4.5 and 4.0 years, respectively. For girls, the mean differences between CA and DA were < 1.0 year in all age groups; however, they were closest to 1.0 year in the age groups 10, 11, and 16 years. For boys, the age differences were < 1.0 year in all groups except 15 and 16 years. For girls, the mean difference only lacked statistical significance in two age groups (12 and 15 years), while for boys, there was no statistically significant difference in three age groups (8, 12, and 13 years).

Table 4 Comparison between chronological age and the Willems dental age for girls
Table 5 Comparison between chronological age and the Willems dental age for boys

Comparison of results obtained using the Demirjian and Willems methods

For girls, the Willems method resulted in clear underestimation of CA compared with the Demirjian method for all age groups, except for girls aged 13 years; in this age group, both methods overestimated the mean CA. In addition, the Demirjian method overestimated the CA in the age groups 11, 12, and 14 years.

For boys, the Willems method underestimated the mean DA in all age groups except those aged 8 years, while the Demirjian method underestimated the mean DA in four age groups (9, 11, 15, and 16 years). Overall, both the Demirjian method and the Willems method resulted in more accurate estimation in lower, relative to higher, age groups for both sexes (Fig. 2).

Fig. 2 Comparison of dental age between the Demirjian method and the Willems method by gender

The small difference in overall MAE between the two methods was not significant (Table 6); however, for girls, the MAE was higher when the Willems method was used (0.84 years vs. 0.72 years for the Demirjian method). For boys, the MAE was also higher with the Willems method (0.88 years vs. 0.66 years for the Demirjian method).

Table 6 Comparison between the Demirjian dental age and the Willems dental age for girls and boys

Discussion

Age estimation of children has a significant role in forensic personal identification, clinical dentistry, and particularly in the determination of legal responsibility. In China, different age groups have different levels of criminal liability, for example, children < 14 years old do not have any criminal liability, those aged from 14 to 16 years should have some criminal liability, and on reaching the age of 16, individuals assume full criminal liability [23].

Dental maturity is a widely used indicator to evaluate dental development and it can also be used to estimate human age. The Demirjian method has been established for more than four decades (since 1973), and it has become the most popular method for estimation of DA [5]. Although the Demirjian method is precise when used for the reference population of French-Canadian individuals, its authors noted that their system may not be accurate for other populations [5], and its inaccuracies have been reported in a number of publications [23,24,25,26,27,28,29,30]. A meta-analysis recently showed that the Demirjian method overestimated CA, with a weighted mean difference of 0.62 for males and 0.72 for females [19]. The Willems method, which is a modification of the Demirjian approach, has been the subject of a great deal of research interest, and a recent meta-analysis reported that the majority of studies using the Willems method did not report significant overestimation of age for either sex (0.26 and 0.29 for males and females, respectively) [19]. Nevertheless, some previous studies reported that the Willems method is inaccurate when used in various populations [31,32,33,34]. Hence, population-specific standards should be employed to achieve the most accurate age assessment, rather than a universal standard developed for use in other populations [19].

A total of 1249 panoramic radiographs, from children aged 8–16 years old, were finally chosen for use in estimation of the applicability of the two methods of DA estimation. Panoramic radiographs are rarely taken as routine dental radiographs for children below 8 years old in the clinic; therefore, no panoramic radiographs from children < 8 years old were included in the present study. Furthermore, the applicability of the Demirjian method is limited for estimation of ages > 16 years; hence, no panoramic radiographs from children > 16 years old were included.

In our study, the Demirjian method underestimated the age of girls and overestimated that of boys by 0.03 years in both cases. In contrast, the Willems method underestimated the ages of both girls and boys by 0.54 and 0.44 years, respectively. These data indicate that the Demirjian method is more accurate than the Willems method, contrary to the results of previous studies from other countries [8, 35,36,37,38,39]. For example, Maber et al. showed that the method of Willems was more accurate than the Demirjian method for estimation of the ages of children from London, United Kingdom, with mean underestimations of 0.05 years for boys and 0.20 years for girls [8]. Grover et al. found that the Demirjian method overestimated DA to a greater degree (0.66 years for boys and 0.56 years for girls) than the Willems method in North India [35]. Medina and Blanco reported mean differences between DA and CA of 0.62 and 0.15 years for the Demirjian and Willems methods, respectively; their data showed that the Willems method for age estimation was more accurate than the Demirjian method in Venezuelan children [39]. However, another study reported that the Demirjian method was more accurate than the Willems approach, consistent with our findings, with an overestimation of 0.1 years for both sexes [20]. The divergence between our results and those reported for other populations is likely due to biological variation among children with different ethnic origins. In addition, the sample size, age range, and age distribution of the samples, and the statistical approach used, may also contribute to the observed differences.

There are 56 ethnic groups in China, which covers a large geographical territory comprising more than 20 provinces, and there may be differences in the growth and the development of children in different provinces of China. This may explain why our results are inconsistent with those of previous studies comparing these two DA methods in other Chinese provinces [21, 22]. For example, Xiuxia Ye et al. tested the Demirjian and Willems methods in 941 children from southeastern China (Shanghai municipality) and reported that the Demirjian method clearly overestimated DA by 1.68 years for boys and 1.28 years for girls compared with the Willems method; they concluded that the Demirjian method was not accurate for use in southeastern Chinese children [22]. Although Yue Zhai et al. found that the Demirjian method was more accurate than the Willems method, consistent with our findings, the Demirjian method underestimated age, similar to the Willems method, by 0.47 and 0.63 years for boys and girls, respectively, in northern Chinese children [21]. Similarly, Tunc et al., Altunsoy et al., and Celikoglu et al. reported differences between geographical areas or cities within the same country (North, West, and East Turkey) [29, 40, 41].

Based on our findings, the smallest age differences for both methods were observed in younger age groups, consistent with some previous studies [29, 41,42,43]. Hagg and Matsson reported that, among the various stages of development involved in DA estimation, those occurring in younger children were of shorter duration than those in older individuals; thus, the higher accuracy for young children may be attributable to the evaluation of a large number of stages with shorter durations [42]. The maximum age difference was observed in the oldest age group (16 years), particularly among boys. This may be because, once the apical ends of the root canals of the seven left mandibular teeth have closed, the maximum maturity score in the table for conversion of the maturity score to DA is 98.4 for boys, rather than 100, and no more precise dental scoring is possible [8, 44]. Therefore, the age of any individual older than 16 years will be underestimated using this method. However, the generally reported observation that DA assessment for girls is more accurate than that for boys using these two methods was not confirmed by the present study [19, 20, 28, 44].

As this was a retrospective study of radiographs sampled from the patient records of Xiangya Stomatological Hospital in Hunan province, central southern China, our results can only be considered representative of Chinese children in this region; therefore, the accuracy of the method requires verification in other districts of China. Moreover, the applicability of the method to other age groups, or its use with the inclusion of third molar data, warrants further study.

Conclusions

According to our results, the standards developed by Demirjian are more appropriate for estimation of the DA of central southern Chinese children, compared with the Willems method; however, the method should be applied with caution for other age groups and we recommend further research to obtain more accurate results for examining groups of different ethnic and geographical origin.