Introduction

Estimating the biological age of patients is an important step in treatment planning in orthodontics, special care dentistry, and oral medicine [1,2,3,4]. Other applications lie in the forensic field, especially in assessing the age of legal majority [5,6,7,8] and in human identification [9,10,11]. Dental development plays a key role in the age estimation of children and adolescents [12]. Studies suggest that the development of human teeth is resistant to extrinsic and intrinsic factors, such as malnutrition [13], early/delayed extraction of deciduous teeth [14], and even systemic diseases [15].

Most of the dental age estimation techniques designed to assess dental development in children rely on radiographic analyses [16,17,18,19,20]. Operator-dependent procedures in radiographic analyses include linear measurements of tooth ratios [21] and the classification of crown/root developmental stages [22,23,24]. Authors have demonstrated that staging techniques may perform better than metric techniques [25]. In staging techniques, dental development is categorized into ordinal variables, and the stages are weighted into regression formulae, or maturity scores, to be converted into dental age [23].

The most popular technique developed to categorize developmental stages was proposed in 1973 by Demirjian et al. [23]. According to the authors, the seven permanent teeth of the left side of the mandible (3rd quadrant, except the third molar) can be classified into eight stages of crown/root formation [23]. The authors also provide a description of the stages to guide the operator during image analysis [23]. Some of Demirjian’s stages are based on the appearance of specific anatomic features (e.g., molar radicular bifurcation in stage E); other stages are based on the proportion of root/crown sizes (e.g., “the root length is equal to or greater than the crown height” in stage F). After staging, Demirjian’s method requires a conversion of dental development into self-weighted scores [23]. Subsequently, the scores are combined into a single maturity score that leads to an estimated age (EA) [23]. Despite the recurrent use of Demirjian’s technique in the scientific literature [26,27,28,29,30], studies have revealed consistent overestimations of the chronological age (CA) [31]. More specifically, a recent meta-analysis showed overestimations of 0.62 and 0.72 years in males and females, respectively [31].

In 2001, Willems et al. [32] performed an age estimation study with 2116 Belgian children to revisit Demirjian’s approach and provide alternative self-weighted values. In 2017, four meta-analyses confirmed Willems’ method as reliable for age estimation [31,33,34,35]. More recently, a meta-analysis [36] ranked Willems’ method as the best for age assessment of Brazilian children. The outcomes, however, were based on a sample of nearly 900 individuals from the South region of Brazil [37], which does not necessarily represent a country of continental size and its population. Other studies with Willems’ method in the Brazilian population are available, but they are clearly limited by unbalanced [38] and small [39,40] samples. Others were designed as case–control studies of patients with systemic diseases [41,42], and one tested the performance of a new statistical model [43].

A gap remains: large observational studies with balanced samples of Brazilian boys and girls are still needed. Equipped with over 60 dental offices and a centralized Oral Imaging and Diagnosis Unit, the Central Dental Clinic of the Brazilian Army (OCEx), in Rio de Janeiro/Brazil, provides nationwide dental treatment, which makes it an optimal source of panoramic radiographs for country-specific sample collection.

Based on the above, this study aimed to assess the dental development of Brazilian children from a large and balanced sample of panoramic radiographs collected from the Central Dental Clinic of the Brazilian Army (OCEx). Willems’ and Demirjian’s methods were used for dental age estimation.

Material and methods

Study model and ethical aspects

This was a cross-sectional study with retrospective sample collection. Ethical approval was obtained from the institutional committee of ethics in human research (protocol number: 43741421.0.0000.5374). The study was reported following the “Enhancing the Quality and Transparency of Health Research” (EQUATOR Network) guidance, using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist for cross-sectional studies [44].

Setting and participants

The sample consisted of panoramic radiographs (n = 1990) obtained in equal numbers from Brazilian boys (n = 995) and girls (n = 995) between 2008 and 2020 at the Central Dental Clinic (OCEx) of the Brazilian Army in Rio de Janeiro/Brazil. The sample was collected retrospectively from the Clinic’s database, and the radiographs were exclusively obtained for clinical reasons. The inclusion criteria consisted of (a) Brazilian nationality; (b) age between 3 and 15.9 years; and (c) record of (at least) a panoramic radiograph in the OCEx’s database. Exclusion criteria were (I) radiographs with bilaterally missing teeth in the mandible; (II) presence of extensive decay, restoration and root canal treatment bilaterally in the mandible (except third molars); (III) presence of visible bone lesions, surgical appliances in the mandible, deformation of maxillofacial bones, and visible dental anomalies; (IV) poor image quality; and (V) missing data about date of image acquisition and patient’s date of birth and sex. Sample collection was performed between January and June 2021. The sample size was set to approach Willems’ original sample size (n = 2116) as closely as possible [32]. Additionally, the number of panoramic radiographs was set to be higher, and the sample more balanced by age and sex, than in previous studies in Brazil [38,39,40,41,42]. The panoramic radiographs were imported to a personal computer equipped with a 15″ screen and Adobe Photoshop CS6™ (Adobe Inc., San Jose, CA, USA), used as an image viewer at 100% magnification with occasional adjustments of brightness and contrast prior to analysis.

Variables and data sources

From each panoramic radiograph, the date of image acquisition was registered, as well as the patient’s date of birth and sex. The difference between the two dates gave the patient’s CA. For methodological purposes, the CA was converted into categorical data by stratifying the sample into 10 age categories: 3├ 7; 7├ 8; 8├ 9; 9├ 10; 10├ 11; 11├ 12; 12├ 13; 13├ 14; 14├ 15; and 15├ 16 years (Table 1). The analysis of the panoramic radiographs was performed in a dimmed room under standard viewing conditions [45]. No more than 25 radiographs were analyzed per day to avoid visual fatigue [46]. Dental age estimation followed the technique proposed by Demirjian et al. [23] through the classification of dental development into eight stages (A–H) (Fig. 1). The EA was obtained by converting Demirjian’s allocated stages into Willems’ scores for boys and girls [32]. Similarly, the same stages were converted into EA according to Demirjian’s original maturity scores and age conversion tables [23].
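The derivation of the CA and its age category can be illustrated with a minimal Python sketch. The function names, the interval label format, and the use of 365.25 days per year are assumptions made for illustration; the text does not state how the date difference was converted to decimal years.

```python
from datetime import date

# The 10 age categories of the study as (lower, upper) bounds in years.
# Each interval is closed on the left and open on the right.
CATEGORY_BOUNDS = [(3, 7), (7, 8), (8, 9), (9, 10), (10, 11),
                   (11, 12), (12, 13), (13, 14), (14, 15), (15, 16)]


def chronological_age(birth: date, acquisition: date) -> float:
    """Age in decimal years at image acquisition.

    Dividing by 365.25 days per year is an assumption of this sketch.
    """
    return (acquisition - birth).days / 365.25


def age_category(ca: float) -> str:
    """Return the label of the left-closed category that contains ca."""
    for lower, upper in CATEGORY_BOUNDS:
        if lower <= ca < upper:
            return f"[{lower}, {upper})"
    raise ValueError(f"age {ca:.2f} is outside the 3 to <16 year range")


# Hypothetical patient, not taken from the study sample.
ca = chronological_age(date(2008, 6, 15), date(2020, 1, 10))
print(round(ca, 2), age_category(ca))
```

Because a fixed year length is used, an age computed exactly on a birthday can fall marginally below the whole number of years; a calendar-aware difference would avoid that edge case.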

Table 1 Sample distribution based on sex and age categories
Fig. 1 Panoramic radiograph of an 11.5-year-old Brazilian boy that shows the seven mandibular left permanent teeth classified with Demirjian’s stages F, G, and H, in this case illustrating intermediate, advanced and terminal phases of root formation. According to Demirjian’s original description: in stage F (for incisors, canines and premolars) the root length is equal to or greater than the crown height; in stage G the walls of the root canal are now parallel, and their apical end is still partially open; in stage H the apical end is completely closed [23]

Operator-dependent bias

The main observer was an orthodontist (12 years of experience) with a background (MSc) in oral radiology. To assess the consistency of the observer and the reproducibility of image analysis, an intra-observer agreement test was performed by re-assessing (T2) 100 panoramic radiographs within 30 days of the analysis of the original sample (T1). In parallel, an additional observer (oral radiology specialist) was recruited to assess the same 100 panoramic radiographs to enable an inter-observer agreement test. The two observers were supervised by a forensic odontologist with ten years of experience and a background in dental age estimation studies. The number of radiographs (n = 100) re-assessed for the examiner agreement tests was based on a previous dental age estimation study [47] that had a similar sample size (n = 1900). The panoramic radiographs selected for intra- and inter-observer agreement tests had 700 teeth analyzed. Since image analysis consisted of the classification of dental developmental stages according to the technique of Demirjian et al. [23], a balanced selection from the total sample (n = 1990 radiographs, n = 13,930 teeth of the third quadrant) was established. In this process, panoramic radiographs that were previously coded with alphanumeric identifiers were randomized (www.random.org) within each of the 10 age categories (resulting in 10 randomization steps). The first 10 panoramic radiographs ranked after randomization within each age category were selected for observer reproducibility tests. This strategy yielded radiographs from individuals of all the age categories, in equal numbers. Consequently, all the available developmental stages were covered by the sample. Intra- and inter-observer agreements were quantified with Weighted Kappa statistics following previous studies in the field [37, 47, 48]. Both the intra- and inter-observer agreements were calculated individually for each of the teeth of the 3rd quadrant (except the third molar).
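For readers unfamiliar with the statistic, the following Python sketch shows how a weighted kappa can be computed for two sets of Demirjian stages assigned to the same tooth. It is an illustration only: the text does not state whether linear or quadratic weights were used, so linear weights are assumed by default, and the function name and example ratings are hypothetical.

```python
from collections import Counter

STAGES = "ABCDEFGH"  # Demirjian's eight ordinal stages


def weighted_kappa(r1, r2, categories=STAGES, quadratic=False):
    """Cohen's weighted kappa for two equally long lists of ordinal ratings.

    Linear disagreement weights are used unless quadratic=True
    (the weighting scheme is an assumption of this sketch).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    idx = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(r1)
    power = 2 if quadratic else 1

    def disagreement(i, j):
        # 0 for identical stages, 1 for the two most distant stages
        return (abs(i - j) / (k - 1)) ** power

    pairs = Counter((idx[a], idx[b]) for a, b in zip(r1, r2))
    m1 = Counter(idx[a] for a in r1)  # marginal counts, first rating
    m2 = Counter(idx[b] for b in r2)  # marginal counts, second rating

    observed = sum(disagreement(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(disagreement(i, j) * m1[i] * m2[j]
                   for i in m1 for j in m2) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when only one stage was used")
    return 1.0 - observed / expected


# Hypothetical stages for one tooth at T1 and T2 (not study data).
t1 = ["A", "B", "C", "D"]
t2 = ["A", "B", "C", "E"]
print(weighted_kappa(t1, t2))
```

In the study design described above, such a function would be applied separately to each of the seven teeth, once for T1 versus T2 and once for the two observers.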

Quantitative variables and statistical methods

Data analysis was performed by means of descriptive and exploratory statistics of central tendency and dispersion. The Shapiro–Wilk test was used to assess data normality regarding CA and EA. The comparison between CA and EA for each of the methods was accomplished with paired t-tests, the intraclass correlation coefficient, and the Bland–Altman approach. The latter was also used to assess the variation between CA and EA and to estimate the limits of agreement. The non-parametric Kruskal–Wallis and Dunn tests were used to assess the variation of EA across the 10 age categories, while the Mann–Whitney test was used to assess variations between boys and girls. R software (R Foundation for Statistical Computing, Vienna, Austria) was used for the statistical analyses. Statistical significance was set at 5%.
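The Bland–Altman quantities used in this comparison can be sketched in a few lines of Python. This is not the authors’ R code; the function name is hypothetical, the difference is taken as EA minus CA so that positive values mean overestimation, and the 1.96 multiplier for the limits is the conventional choice rather than one stated in the text.

```python
from statistics import mean, stdev


def bland_altman(estimated, chronological, z=1.96):
    """Return (bias, lower limit, upper limit) for EA minus CA, in years."""
    if len(estimated) != len(chronological) or len(estimated) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [ea - ca for ea, ca in zip(estimated, chronological)]
    bias = mean(diffs)   # mean difference; positive means overestimation
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Made-up ages in years for illustration only (not study data).
ea = [8.0, 9.5, 11.0, 12.5]
ca = [7.5, 9.5, 10.5, 12.5]
bias, lower, upper = bland_altman(ea, ca)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```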

Results

Intra-observer agreement reached Weighted Kappa values of 1.0, 0.98, 0.94, 1.0, 0.97, 0.97, and 0.98 for the teeth #31, 32, 33, 34, 35, 36, and 37, respectively. For the same teeth, inter-observer agreement reached Weighted Kappa values of 0.97, 0.85, 0.81, 0.86, 0.80, 0.93, and 0.90, respectively. These results indicate almost perfect agreement (0.8–1.0) [49].

The Shapiro–Wilk test indicated that the CA and the EA (for Demirjian’s and Willems’ methods) departed from a normal distribution (p < 0.0001).

The initial exploratory description of the sample revealed a mild overestimation of the CA using Willems’ method and a more evident overestimation from the application of Demirjian’s method in boys and girls (Table 2).

Table 2 Descriptive overview of the sample’s chronological age and the age estimated with Demirjian’s and Willems’ methods

The Bland–Altman approach showed a small bias of the EA from Willems’ method relative to the CA: 0.02 years (0.06 in boys and −0.02 in girls). The intraclass correlation coefficient (0.97) suggested an excellent correlation between CA and EA [50]. For boys, girls and both sexes combined, there were no statistically significant differences (p > 0.05) between CA and EA with Willems’ method. Demirjian’s method, on the other hand, led to a considerably higher bias: 0.67 years (0.6 in boys and 0.74 in girls). Despite the excellent correlation between CA and EA (0.96), statistically significant differences were observed between CA and Demirjian’s EA for boys, girls, and both combined (p < 0.0001). The upper and lower Bland–Altman limits of agreement in boys using Willems’ method were 1.98 and −1.87, respectively. In girls, the values were 1.95 and −1.99, respectively (Fig. 2). With Demirjian’s method, the upper and lower limits of agreement were 2.64 and −1.43 in boys and 2.70 and −1.23 in girls, respectively (Fig. 3).
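As a consistency check on these figures, the bias and the standard deviation of the differences can be recovered from the reported limits. The sketch below assumes that the limits were computed as bias ± 1.96 standard deviations (the multiplier is not stated in the text); the function and dictionary names are hypothetical.

```python
def bias_and_sd_from_limits(upper, lower, z=1.96):
    """Recover the bias and SD of the differences from Bland-Altman limits.

    Assumes limits = bias +/- z * SD, with z = 1.96 by convention.
    """
    bias = (upper + lower) / 2
    sd = (upper - lower) / (2 * z)
    return bias, sd


# (upper limit, lower limit, reported bias) in years, as given in the text.
REPORTED = {
    ("Willems", "boys"): (1.98, -1.87, 0.06),
    ("Willems", "girls"): (1.95, -1.99, -0.02),
    ("Demirjian", "boys"): (2.64, -1.43, 0.60),
    ("Demirjian", "girls"): (2.70, -1.23, 0.74),
}

for (method, sex), (upper, lower, reported) in REPORTED.items():
    bias, sd = bias_and_sd_from_limits(upper, lower)
    consistent = abs(bias - reported) <= 0.01  # allow for rounding
    print(f"{method:9s} {sex:5s} bias={bias:+.3f} sd={sd:.2f} ok={consistent}")
```

Under that assumption, the midpoints of the reported limits agree with the reported biases to within rounding, and the implied standard deviation of the differences is close to 1 year in all four groups.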

Fig. 2 Visual representation of the difference between Willems’ (W) estimated age (EA) and the chronological age (CA) for boys (A) and girls (B). Mean difference between CA and EA: 0.06 in boys and −0.02 in girls

Fig. 3 Visual representation of the difference between Demirjian’s (D) estimated age (EA) and the chronological age (CA) for boys (A) and girls (B). Mean difference between CA and EA: 0.6 in boys and 0.74 in girls

Data analysis performed separately per age category revealed that the worst prediction by Willems’ method reached 0.38 in boys (age category: 13 ├ 14 years) and −0.33 in girls (age category: 14 ├ 15 years). Demirjian’s method reached median values up to 1.09 in boys (age categories: 13 ├ 14 and 14 ├ 15 years) and 1.45 in girls (age category: 11 ├ 12 years). Between the ages of 10 and 12, the differences between CA and EA were not only clinically significant (over 12 months) but also statistically significant (p < 0.001) (Table 3).

Table 3 Differences between chronological and estimated ages for Willems’ and Demirjian’s method for boys and girls in each of the age categories

Discussion

In the past, the biological age assessment for clinical purposes was mainly based on skeletal parameters, such as the radiographic growth markers of the hand and wrist [51,52,53]. Awareness of radiologic optimization and justification led to new methods based on the cervical vertebrae assessed from lateral cephalograms [54,55,56]. The use of age assessment tools for clinical purposes increased with studies on pubertal growth spurt correlating dental and skeletal parameters [4, 5]. However, knowledge of the best country-specific dental age estimation method is necessary prior to adding the dental component to the study of biological age estimation. This large sample-sized observational study retrospectively assessed panoramic radiographs of a database in Brazil to test the performance of Willems’ and Demirjian’s methods for dental age estimation of boys and girls.

In 2021, a systematic literature review [36] screened dental age estimation methods applied in Brazilian children. According to the study, the seven methods that were meta-analyzed showed standardized mean differences between CA and EA of 0.05 years for Willems’ method [32], −0.11 for Liliequist and Lundberg’s [57], 0.22 for Nolla’s [17], 0.27 for Mörnstad’s [58], −0.31 for Cameriere’s [21], 0.74 for Demirjian’s (seven teeth version) [23], and −0.87 for Haavikko’s [59]. The outcomes of the present study are consistent with the current scientific literature, showing a difference between CA and EA of 0.02 years for Willems’ method and 0.67 for Demirjian’s method (boys and girls pooled together).

From a global perspective, the performances of Willems’ and Demirjian’s methods follow a similar pattern. A systematic literature review and meta-analysis published in 2017 [31] revisited articles that used Willems’ and/or Demirjian’s methods in children worldwide. The authors found statistically significant overestimations with Demirjian’s method in boys and girls (p < 0.05), an outcome that converges with our findings. The overall overestimations for boys and girls, considering all the meta-analyzed studies, were 0.62 and 0.72 years, respectively [31]. In our study, the overall overestimations were 0.60 years for boys and 0.74 years for girls. These outcomes highlight limitations of Demirjian’s method in practice, since an overestimation of between 7 and 9 months may be clinically significant depending on the diagnostic/therapeutic need. From a forensic perspective, methods that lead to overestimations of nearly 1 year should be carefully considered as well, since they could categorize a child as an adolescent or even allocate a boy/girl into the legal age of sexual consent (i.e., > 14 years in Brazil). For Willems’ method, the overall overestimations for boys and girls reported in the literature are below 0.29 years (or 3.5 months), a value that may be both clinically and forensically acceptable in most scenarios [31]. In contrast, the performance of Willems’ method in our study was considerably superior, reaching overall overestimations of only 0.06 years (nearly 21 days) in boys. These outcomes confirm Willems’ method as a proper dental age estimation tool to assess the biological age of Brazilian boys and girls. Moreover, our findings validate the improvements made by Willems et al. [32] after revisiting Demirjian’s technique [23] in 2001, since overestimations were drastically reduced.

A deeper look into the data stratified by age category shows that the best performances of both methods occurred within the younger age categories. Willems’ method [32] performed better in children aged between 3 and 12 years, in which the difference between CA and EA was not higher than 0.13 years (boys and girls combined). Demirjian’s method [23] performed better in children between 7 and 10 years, in which the difference between CA and EA was below 0.65 years. In Demirjian’s outcomes, more specifically, the difference between CA and EA is more evident since statistically significant differences (p < 0.05) were found between age intervals. A reasonable explanation for this phenomenon is the reduced number of developing teeth over time. Consequently, dental age estimation in older children relies on less developmental information. The scientific literature corroborates this phenomenon by showing a vast number of dental age estimation studies that sample individuals below the age of 16 years [31, 38, 43]. Moreover, Demirjian et al. [23] state that their method is applicable to children between 3 and 17 years. Older individuals would have all the seven mandibular left permanent teeth in stage H (complete apical formation) and would lose reference for proper age estimation.

One of the inherent limitations of Willems’ and Demirjian’s methods, as well as of most existing dental age estimation methods based on radiographic analysis, is the subjective, operator-dependent procedure of allocating stages to developing teeth. Despite the almost perfect intra- and inter-observer agreement detected in this study, the performance of the methods could be biased by operator performance. This is one of the reasons that could explain the lower error rates detected in the present study compared to previous investigations of Willems’ method [31], for instance. Future studies in the field should focus on reducing operator-dependent procedures with automation tools, such as artificial intelligence. Machine learning fed with big data seems to be the next step in dental age estimation studies.

In this large sample-sized observational study, Willems’ method had evidently superior performance compared to Demirjian’s method for dental age estimation. The reliability of the present evidence is supported by the methodological decisions that enhanced the protocols of country-specific studies that were limited in the past.