Introduction

Age estimation is an important component of both archeological and forensic anthropology investigations. The need to assess the age of an individual, especially adolescents and young adults, has become increasingly important in legal proceedings involving unaccompanied or asylum-seeking minors who lack proper identity documents [1,2,3,4]. South Africa has a high rate of youth crime [5] and of legal and illegal immigration [6,7,8], and is home to more than 1.2 million children, representing 49% of all orphans in the country [9]. Given the nature of such cases, non-invasive methods of age estimation in living individuals are especially useful.

In 2006, Cameriere et al. [10] developed a linear regression model for age estimation in non-adults based on the relationship between age and the measurement of open apices in the seven left permanent mandibular teeth of 455 Italian Caucasian children aged 5–15 years. In 2007, the authors published a new study with additional samples of children aged 4–16 years from several European countries and provided a common formula for those countries [11]. Other researchers have applied this method to children from other countries, most reporting high accuracy with the European formula [12, 13], while a few have recommended population-specific regression formulas [14, 15].

The main purpose of this study was to assess the applicability of Cameriere’s European formula [11] for age estimation in two samples of black and white South African children and, if the formula proved unsuitable, to construct a new model specific to South African children, based on the relationship between the apical width and the tooth length (maturity index) of the seven permanent mandibular teeth.

Materials and methods

The study sample consisted of 1944 children, of whom 970 were of black South African ethnicity (491 girls, 479 boys) and 974 were of European ethnicity living in South Africa (493 girls, 481 boys) (Table 1). The radiographs were obtained from the database of a private orthodontic clinic in Pretoria, South Africa, and had been taken as part of orthodontic treatment planning. They were registered anonymously, and the sex, date of birth, and date of the X-ray examination were recorded for each panoramic radiograph. None was taken specifically for this research, and information on each participant’s ethnicity was available.

Table 1 Distribution of the South African sample according to age, sex, and race

The inclusion criteria were as follows: age between 6 and 14 years at the time the panoramic radiographs were obtained, good-quality radiographs, and healthy children with known age and free of systemic disorders. Socioeconomic status was not evaluated among the participants.

Measurements

The radiographs were evaluated for the developing left permanent mandibular teeth, excluding third molars (Fédération Dentaire Internationale (FDI) teeth 31 to 37) [16]. Mandibular teeth were chosen because they are easily visualized on panoramic radiographs, and the left side was selected by default. When a tooth was missing, decayed, extensively filled, crowded, or rotated, the contralateral homologous tooth was scored instead.

Dental age was estimated according to the method of Cameriere et al. [11]. The number of teeth with complete root development, i.e., with the apical ends of the roots completely closed (N0), was counted. For teeth with incomplete root development, i.e., with open apices, the distance between the inner sides of the open apex (Ai, i = 1, ..., 7) was measured. To account for possible differences in magnification and angulation among radiographs, each measurement was normalized by dividing it by the tooth length (Li, i = 1, ..., 7), giving the ratio xi = Ai/Li, i = 1, ..., 7. Dental maturity was then evaluated from the normalized measurements (xi) of the seven left permanent mandibular teeth, the sum of the normalized open apices (S), and the number of teeth with complete root development (N0) (Fig. 1).
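As an illustration of how the three maturity variables follow from the raw measurements, the Python sketch below computes xi = Ai/Li, N0, and S for one child. The `ToothMeasurement` class, the `maturity_variables` function, and the convention that a closed apex is recorded as Ai = 0 are hypothetical choices made for this example, not part of the published protocol. Ai and Li only need to be in the same image units (e.g., pixels), since the units cancel in the ratio.

```python
from dataclasses import dataclass

@dataclass
class ToothMeasurement:
    """One mandibular tooth; A_i and L_i in the same image units (e.g. pixels)."""
    fdi: int           # FDI tooth number (31-37, or a contralateral substitute)
    apex_width: float  # A_i; recorded as 0.0 when the apex is closed (assumed convention)
    length: float      # L_i, tooth length used for normalization

def maturity_variables(teeth):
    """Return (x, n0, s): x_i = A_i/L_i per tooth, count of closed apices, sum of x_i."""
    if len(teeth) != 7:
        raise ValueError("expected measurements for seven teeth")
    x, n0 = {}, 0
    for t in teeth:
        if t.length <= 0:
            raise ValueError(f"tooth {t.fdi}: length must be positive")
        if t.apex_width == 0:
            n0 += 1                          # apex completely closed
        x[t.fdi] = t.apex_width / t.length   # normalization removes magnification effects
    return x, n0, sum(x.values())

# Hypothetical measurements for one child (values invented for illustration)
example = [
    ToothMeasurement(31, 0.0, 200.0), ToothMeasurement(32, 0.0, 210.0),
    ToothMeasurement(33, 30.0, 250.0), ToothMeasurement(34, 40.0, 200.0),
    ToothMeasurement(35, 50.0, 200.0), ToothMeasurement(36, 0.0, 220.0),
    ToothMeasurement(37, 90.0, 180.0),
]
x, n0, s = maturity_variables(example)
print(n0, round(s, 2))  # 3 1.07
```

Closed apices contribute xi = 0 to the sum, so S reflects only the teeth that are still developing.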

Fig. 1 An example of maturity index (xi) measurements in left mandibular teeth

X-ray images were processed with ImageJ 1.49, an open-source program for processing and analyzing digital images.

Statistical analysis

Three observers with different levels of experience, namely two forensic odontologists (NA, LVP) and a PhD student (EC), analyzed the radiographs knowing only the sex of each individual.

As suggested by Ferrante and Cameriere [15], the intraclass correlation coefficient (ICC) was used to assess intra- and inter-observer variability.

Repeated observations by the first author (NA) were used to assess intra-observer agreement, while inter-observer agreement was assessed by comparing NA’s observations with those of the other two observers (LVP and EC). For this purpose, 50 panoramic radiographs were randomly selected and re-evaluated 4 weeks after the initial scoring to calculate the percentage of agreement for both intra- and inter-observer analyses.
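The text does not state which ICC form was used. As a sketch under that uncertainty, the Python function below computes ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single rater) from a table with one row per radiograph and one column per observer or scoring session. The function name `icc_2_1` and the choice of this particular form are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of n rows (subjects), each holding k scores (raters/sessions).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 subjects, >= 2 raters, and equal row lengths")
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in ratings for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two scoring sessions that agree perfectly give ICC = 1
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

For the intra-observer analysis each row would hold NA’s two scores of the same radiograph. For the inter-observer analysis it would hold the three observers’ scores. In practice the same statistic is available from established R packages, which also report the 95% confidence intervals given in the Results.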

Accuracy

All the radiographs were assessed to evaluate dental age (DA) of black and white South African children using the European formula [11]:

$$ \mathrm{Age}=8.387+0.282g-1.692{x}_5+0.835{N}_0-0.116s-0.139s\cdot {N}_0 \tag{1} $$

where g is a sex indicator (g = 1 for boys, g = 0 for girls), x5 is the normalized measurement (ratio) of the second premolar, N0 is the number of teeth with complete root development (closed apices), and s is the sum of the normalized open-apex widths of the seven left permanent developing mandibular teeth, including x5 (xi = Ai/Li, i = 1, ..., 7).
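The European formula, eq. (1), is a direct computation once g, x5, N0, and s are known. The Python sketch below implements it term by term. The function name is hypothetical, and the input values in the example are invented for illustration.

```python
def european_formula_age(g, x5, n0, s):
    """Dental age (years) from Cameriere's European formula, eq. (1).

    g:  1 for boys, 0 for girls
    x5: normalized apex width of the second premolar (A_5 / L_5)
    n0: number of teeth with closed apices (0-7)
    s:  sum of the seven normalized open-apex widths (includes x5)
    """
    if g not in (0, 1):
        raise ValueError("g must be 1 (boy) or 0 (girl)")
    return (8.387 + 0.282 * g - 1.692 * x5 + 0.835 * n0
            - 0.116 * s - 0.139 * s * n0)

# Hypothetical girl with x5 = 0.25, N0 = 3, s = 1.07
print(round(european_formula_age(0, 0.25, 3, 1.07), 2))  # 9.9
```

Note the interaction term −0.139·s·N0: the effect of the open-apex sum depends on how many teeth have already closed.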

The accuracy of DA estimation was defined as how closely DA predicted chronological age (CA), measured as the difference between DA and CA for each child. For the ith child, the error was calculated as:

$$ {E}_i={\mathrm{DA}}_i-{\mathrm{CA}}_i,i=1,\dots, n $$

where n is the sample size. The difference Ei is the error in the estimated age of the ith child: a positive value indicates overestimation and a negative value underestimation. Accuracy was evaluated by analyzing the error distribution; in particular, the mean absolute error (MAE) was calculated as:

$$ \mathrm{MAE}=\frac{1}{n}\sum \limits_{i=1}^{n}\left|{E}_i\right| $$

In addition, the slope (βERR) of the linear regression of the estimation error on chronological age and the interquartile range (IQRERR) of the error distribution were calculated.

Scatter plots, box plots, and tables were used to show the relationships between CA and DA for both sexes. The Shapiro-Wilk test [17] was used to assess whether the error distribution deviated significantly from normality. Statistical analysis was performed using the R statistical software (URL: https://www.R-project.org/) [18]. The significance level was set at 5%.

Results

The ICCs for intra- and inter-observer agreement were 0.991 (95% confidence interval (CI) 0.977–0.997) and 0.992 (95% CI 0.98–0.996), respectively.

The European formula underestimated the chronological age of the children: both the mean and the median of the error distribution were significantly less than 0 (p < 0.001) in both black and white children (Table 2). Furthermore, plots of the errors of eq. (1) against chronological age (Fig. 2) revealed an age-dependent bias. The slope of the regression line was −0.133 in white children and −0.178 in black children, and in both cases it was significantly different from 0 (p < 0.001). This indicates a trend in the errors: in both white and black children, the ages of younger children were overestimated and those of older children were underestimated.

Table 2 Measures of central tendency and variability of the error distribution obtained using eq. (1) for age estimation

Fig. 2 Plots of the errors of the ages estimated using eq. (1) against chronological age (years) in the black (left plot) and white (right plot) groups

Finally, the Shapiro-Wilk test [17] yielded p < 0.001 in both groups, indicating that the errors were not normally distributed (Fig. 3).

Fig. 3 Quantile-quantile (Q-Q) plots and Shapiro-Wilk tests of normality of the error distributions obtained using eq. (1) for age estimation

Plotting S as a function of age (Fig. 4) revealed a two-phase profile, with a breakpoint between 10 and 12 years of age.

Fig. 4 Plot of S (sum of the normalized widths of apices) as a function of age (years) in the four subsamples of black boys and girls and white boys and girls

To address both the age-dependent bias and the nonlinear profile of the data, age was estimated, separately for each sex and race, as a function of the sum of the normalized open apices, S, using a Bayesian calibration approach [19, 20]. Unlike previous approaches, however, the location parameter of the probability model was described by a segmented equation:

$$ S={\beta}_0+{\beta}_1\;\mathrm{Age}+{\beta}_2\;{\left(\mathrm{Age}-\gamma\right)}_{+} \tag{2} $$

where β0, β1, and β2 are the model parameters, γ is the breakpoint, and (Age − γ)+ equals Age − γ if Age ≥ γ and 0 otherwise. When β1 and β1 + β2 are both negative, i.e., S decreases with age on both sides of the breakpoint, eq. (2) is strictly monotone and therefore invertible, with inverse:

$$ \mathrm{Age}=\begin{cases}\left(S-{\beta}_0\right)\cdot {\beta}_1^{-1} & \mathrm{if}\ {\beta}_0+{\beta}_1\cdot \gamma <S\\ \left(S-{\beta}_0+{\beta}_2\cdot \gamma \right)\cdot {\left({\beta}_1+{\beta}_2\right)}^{-1} & \mathrm{if}\ 0<S\le {\beta}_0+{\beta}_1\cdot \gamma \end{cases} \tag{3} $$
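To make the segmented model of eq. (2) concrete, the Python sketch below fits β0, β1, β2, and γ to (Age, S) pairs. It does not use the study’s method. The authors used Bayesian calibration [19, 20], whereas this sketch is a frequentist stand-in: ordinary least squares for the β coefficients at each candidate breakpoint, keeping the γ with the smallest residual sum of squares. All function names and the candidate grid for γ are hypothetical.

```python
def hinge(age, gamma):
    """(Age - gamma)_+ : Age - gamma if Age >= gamma, else 0."""
    return max(age - gamma, 0.0)

def segmented_mean(age, b0, b1, b2, gamma):
    """Location of S at a given age, eq. (2)."""
    return b0 + b1 * age + b2 * hinge(age, gamma)

def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (breakpoint outside the data range?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_segmented(ages, s_values, gammas):
    """Least-squares fit of eq. (2), profiling over candidate breakpoints."""
    best = None
    for g in gammas:
        rows = [(1.0, a, hinge(a, g)) for a in ages]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * s for r, s in zip(rows, s_values)) for i in range(3)]
        try:
            b = _solve3(xtx, xty)
        except ValueError:
            continue  # no data beyond this breakpoint
        sse = sum((s - segmented_mean(a, b[0], b[1], b[2], g)) ** 2
                  for a, s in zip(ages, s_values))
        if best is None or sse < best[0]:
            best = (sse, b, g)
    if best is None:
        raise ValueError("no candidate breakpoint gave a solvable fit")
    _sse, (b0, b1, b2), gamma = best
    return b0, b1, b2, gamma

# Noise-free synthetic data generated from eq. (2) (parameters as in the worked example)
ages = [6 + 0.25 * i for i in range(33)]
s_vals = [segmented_mean(a, 6.611, -0.589, 0.483, 10.5) for a in ages]
print(fit_segmented(ages, s_vals, [k / 10 for k in range(80, 141)])[3])  # 10.5
```

With real, noisy data the grid search only approximates the breakpoint, and this approach gives none of the posterior uncertainty that the Bayesian calibration provides.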

Eq. (3) was then used to describe age as a function of the dental maturity index S (Fig. 5). To illustrate its use, consider a black girl with S = 1.2. From Table 3, b0 = 6.611, b1 = −0.589, b2 = 0.483, and γ = 10.5. Since b0 + b1γ = 6.611 − 0.589 × 10.5 = 0.4265 < S, the first branch of eq. (3) applies, and the estimated age is (1.2 − 6.611)/(−0.589) ≈ 9.19 years.
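The branch selection in eq. (3) can be written as a small Python function and checked against the worked example for the black girl with S = 1.2. The function name is hypothetical. The sketch assumes, as in the worked example, that b1 < 0 and b1 + b2 < 0, so that S decreases with age and the branch ordering of eq. (3) holds.

```python
def age_from_s(s, b0, b1, b2, gamma):
    """Invert the segmented model of eq. (2) following eq. (3).

    Assumes b1 < 0 and b1 + b2 < 0 (S decreasing with age on both segments).
    """
    if s <= 0:
        raise ValueError("eq. (3) is defined for S > 0")
    if not (b1 < 0 and b1 + b2 < 0):
        raise ValueError("branch ordering assumes b1 < 0 and b1 + b2 < 0")
    threshold = b0 + b1 * gamma                  # value of S at the breakpoint
    if s > threshold:
        return (s - b0) / b1                     # first phase: Age <= gamma
    return (s - b0 + b2 * gamma) / (b1 + b2)     # second phase: Age > gamma

# Worked example from the text: black girl, S = 1.2, parameters from Table 3
print(round(age_from_s(1.2, b0=6.611, b1=-0.589, b2=0.483, gamma=10.5), 2))  # 9.19
```

At S = b0 + b1γ = 0.4265 both branches return γ = 10.5, so the inverse is continuous at the breakpoint. A smaller value such as S = 0.2 falls in the second branch and gives about 12.64 years.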

Fig. 5 Plot of age as a function of S (sum of the normalized widths of apices) in the four subsamples of black boys and girls and white boys and girls, with the lines obtained using eq. (3) and the parameter values shown in Table 3

Table 3 Estimated parameters of model (2) by sex and race

When ages were estimated using eq. (3) with the parameters reported in Table 3, the model explained 76% and 80% of the total variance in white girls and white boys, respectively, and 76% and 78% in black girls and black boys, respectively (Table 4).
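The text does not define “variance explained” formally. A common definition, assumed here, is the coefficient of determination of the predicted ages, R² = 1 − SSres/SStot. The Python sketch below computes it. The function name and the toy values are hypothetical.

```python
def variance_explained(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot for ages predicted with eq. (3) (assumed definition)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_obs = sum(observed) / n
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)   # total variance of CA
    if ss_tot == 0:
        raise ValueError("observed ages have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

# Toy example: residuals of +/-0.5 years around ages 6-12
print(variance_explained([6, 8, 10, 12], [6.5, 7.5, 10.5, 11.5]))  # 0.95
```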

Table 4 Measures of accuracy and precision of the residual error distribution obtained using eq. (3)

The mean absolute error of the residuals (residual = predicted age minus observed age) ranged from 0.718 to 0.769 years, and the interquartile range (IQRres) from 1.19 to 1.32 years. The plot of residuals against age showed no significant bias (Fig. 6); that is, unlike with the European formula (1), the estimated ages did not tend to underestimate chronological age significantly as age increased. In fact, the bias of the estimates, bres, ranged from −0.01 to 0.01, corresponding to a bias of only a few days for all individuals in the sample. None of the bres values was significantly different from 0 (Table 4).

Fig. 6 Plots of the errors obtained using eq. (3) against chronological age (years) in the four subsamples of black boys and girls and white boys and girls

Discussion

The timing of dental development in South African children has not been studied extensively. Previous studies of specific groups of South African children (Zulu and Xhosa) showed that Demirjian’s age estimation method [21] overestimated age, but these findings were not considered reliable because of the small number of children in each age group [22]. According to Uys et al. [23], girls showed more advanced dental maturity than boys in all age groups. These results contrast with those of Hargreaves et al. [24], who found similar eruption times in girls and boys of black, white, and Indian origin. Other researchers have also found that children of African ancestry are significantly advanced in dental emergence and tooth formation compared with populations of European ancestry [25]. Finally, Willems et al. [26] observed a small but significant overestimation, only in females, when the Willems method [27] was applied.

To date, Cameriere’s European formula [11] has never been extensively tested on a large database of black and white South African children, nor has the maturity index (xi) been used to establish population-specific age prediction models for black and white South African children. Recent studies have confirmed the usefulness of the European formula [11] in population samples from Mexico [28], Turkey [29], India [14], and Bosnia-Herzegovina [30]. However, Nigro-Mazzilli et al. [31] showed that a new population-specific regression model gave more accurate age predictions in a Brazilian sample of children aged between 4 and 16 years.

In the present study, several findings warrant mention: (1) the examiners’ readings on a large sample of 1944 children were reliable and unbiased, consistent with the results of previous studies [11, 14, 28,29,30,31]; (2) the maturity index [11] was applied to the gathered data, yielding inappropriate prediction intervals; (3) the European formula produced a bias in the estimation errors: the ages of younger children were overestimated and those of older children were underestimated in both white and black children (a tendency of attraction to the middle) [32]. In forensic contexts, where age groups must be precisely discriminated, age estimates should be accurate and unbiased, and their uncertainty should be appropriately quantified.

For this reason, the full dataset was used to develop a new country-specific age model based on the maturity index (xi) of Cameriere et al. [11], yielding a new formula expressed directly in years. To this end, a sounder statistical approach was adopted: with Bayesian calibration, the estimated ages did not tend to underestimate chronological age significantly as age increased (MAE 0.718 to 0.769 years). Previous studies [33,34,35] used Bayesian statistics to estimate age from third molar development and reported low bias and similar accuracy. In a recent study combining permanent tooth and third molar development for age estimation, Metsäniitty et al. [35] reported a median error (ME) of 0.031 years in males and 0.011 years in females. The ME provides more useful information about the direction of the error, i.e., whether the model overestimates or underestimates children’s age.

According to the literature [36, 37], the Bayesian approach [38] offers several operational and theoretical advantages for age assessment: (1) it reduces the bias inherent in the regression model; (2) it is much more reliable than classical calibration methods; (3) it allows experts to choose the most appropriate probability distribution to describe the uncertainty associated with the variables characterizing the problem; (4) despite its greater mathematical complexity, it provides forensic personnel with an additional tool for decision-making during the age estimation process.
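The core idea of Bayesian calibration is to invert the model for S given age into a posterior distribution for age given S. The Python sketch below illustrates this on a grid. The prior is uniform on 6–14 years, mirroring the inclusion criteria. The likelihood is normal, S ~ N(μ(Age), σ²), with μ from eq. (2) and the black-girl parameters of the worked example. The prior, the normal likelihood, and the value σ = 0.3 are hypothetical choices for illustration only; the study’s actual probability model follows [19, 20] and is not reproduced here.

```python
import math

def segmented_mean(age, b0, b1, b2, gamma):
    """Location of S at a given age, eq. (2)."""
    return b0 + b1 * age + b2 * max(age - gamma, 0.0)

def posterior_age(s_obs, b0, b1, b2, gamma, sigma, lo=6.0, hi=14.0, steps=801):
    """Grid approximation of p(Age | S) with a uniform prior on [lo, hi]
    and a normal likelihood S ~ N(mu(Age), sigma^2) (hypothetical model)."""
    ages = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    # Log-likelihood on the grid; the uniform prior adds only a constant
    logp = [-0.5 * ((s_obs - segmented_mean(a, b0, b1, b2, gamma)) / sigma) ** 2
            for a in ages]
    top = max(logp)                       # subtract the max for numerical stability
    w = [math.exp(v - top) for v in logp]
    total = sum(w)
    post = [v / total for v in w]         # normalized posterior weights
    mean = sum(a * p for a, p in zip(ages, post))
    # Equal-tailed 95% credible interval from the cumulative posterior
    cum, lower, upper = 0.0, None, None
    for a, p in zip(ages, post):
        cum += p
        if lower is None and cum >= 0.025:
            lower = a
        if upper is None and cum >= 0.975:
            upper = a
    return mean, (lower, upper)

mean, (lower, upper) = posterior_age(1.2, 6.611, -0.589, 0.483, 10.5, sigma=0.3)
print(round(mean, 2), lower, upper)
```

Unlike a point estimate from eq. (3), the posterior carries the uncertainty directly. Because S changes slowly with age beyond the breakpoint, some posterior mass spreads toward older ages, which is the kind of asymmetry a symmetric regression interval cannot express.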

Conclusions

These results show that Cameriere’s maturity index (xi) is reproducible in both black and white South African children and can be used for forensic purposes. The new population-specific model provides greater accuracy, and a sounder statistical model based on a Bayesian calibration approach enables more precise estimation. Further studies with extended data on individuals of ascertained ethnic background should be undertaken, and a cross-validation protocol needs to be developed to confirm these results. This will allow the formulation of suitable probability models for measuring uncertainty about the variables involved.