Introduction

Forensic age estimation has become a challenging field in the forensic sciences as population movement and immigration increase because of political instability in several regions of the world. Estimating the age of asylum seekers is a key issue because it has many implications for the authorities. By law, the French state takes charge of all asylum seekers under 18 years of age [1]. More generally, this explains why the European Community has issued recommendations concerning unaccompanied young people [2]. As a consequence, asylum seekers frequently claim to be younger than 18 years old. The overall increase in this phenomenon is easily observed: for example, 85,726 individuals (of whom 14,436 declared themselves to be minors) requested asylum in France in 2016, versus 57,337 in 2011 [3, 4]. Examination of administrative documents is insufficient most of the time, and the authorities must then order a forensic estimation of biological age. Medicolegal physicians are therefore frequently asked to estimate the biological age of living persons claiming to be minors, i.e., under 18 years of age in a large proportion of countries [2, 5,6,7]. Forensic age estimation is a standardized process that combines a clinical examination with a step-by-step radiological procedure, according to guidelines published by the International Study Group of Forensic Age Diagnostics (AGFAD: Arbeitsgemeinschaft für Forensische Altersdiagnostik) [8,9,10,11,12,13,14].

A significant part of the age estimation literature concerns dental age estimation and the study of orthopantomograms (OPG) [15,16,17]. Many studies have focused on the development of the third molar (M3) because it is the only tooth still developing around the age of 18 years [18]. Two approaches to dental age estimation have been developed: non-metric and metric methods. The most common non-metric method is the Demirjian Staging System (DSS) [16]. It was created in 1973 and identifies eight developmental stages (A–H) based on the mineralization of a selected number of teeth [15, 16, 19, 20]. More recently, in 2008, Cameriere et al. developed a metric method for assessing whether an individual has reached 18 years of age, which is the upper limit for the definition of a minor in most states in the world [21]. This method is based on the third molar maturity index (I3M), defined as the sum of the distances between the inner sides of the two open apices divided by the tooth length, measured on the left mandibular M3 (tooth 38 in the World Dental Federation notation). If the I3M is less than 0.08, the individual can be considered to be 18 years old or older.

A comparison between the I3M and Demirjian’s stages G and H has already been performed and showed that the I3M is more precise for identifying the 18-year cutoff [22]. The original sample was composed of Caucasian individuals. The applicability of the method should be tested in several populations because M3 development varies between populations [23]. Since then, the method has been validated in various populations other than the reference sample [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. The main purpose of this study was to assess the I3M cutoff, to test and apply Cameriere’s method in a contemporary French population, and to compare our main results with those previously published.

Materials and methods

Sample

We retrospectively selected OPGs of individuals aged between 14 and 22 years, performed in the radiology department of the French University Hospital of Tours between January 2014 and August 2017. The age range was chosen in agreement with previous studies [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Exclusion criteria were as follows: hypodontia; extracted or badly rotated M3s; and M3s with caries, fillings, crown restorations, or developmental anomalies (dysmorphology, abnormally short roots). Individuals who had undergone an age estimation procedure in the forensic department were also excluded. As in the original study, both impacted and non-impacted M3s were included if their roots were radiologically distinguishable. Ancestry/ethnic origin was not studied and was not a selection criterion. Only the administrative age (declared at the hospital) and the sex were collected. No identification data were collected, and images were automatically anonymized before the selection process. This retrospective study, performed on anonymized non-pathological radiological data, was conducted in accordance with the ethical standards of the country and the local committee.

Measurements

All the OPGs were acquired with a Planmeca Promax 2D S2 (Planmeca Oy, Helsinki, Finland). The selected OPGs were anonymized, extracted, and analyzed. The measurements were performed on anonymized data, i.e., the observer did not know the sex and age of the individuals when the measurements were made. The left lower M3 (tooth 38) was evaluated for each included individual with the same methodology as the original study of Cameriere et al. [21]. The apical ends of the roots of the left lower M3 of each subject were analyzed to calculate the I3M, which is the sum of the distances between the inner sides of the two open apices divided by the tooth length. If root development of the M3 is complete, i.e., the apical ends of the roots are completely closed, then I3M = 0 [21].
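The ratio described above can be sketched in a few lines of Python. This is only an illustration of the definition, not the measurement workflow used in the study: the function name, the argument names, and the pixel values are hypothetical.

```python
from typing import Sequence


def third_molar_index(apex_widths: Sequence[float], tooth_length: float) -> float:
    """Return I3M: summed inner widths of the open apices divided by tooth length.

    Both inputs must be in the same unit (e.g., pixels); the unit cancels out.
    """
    if tooth_length <= 0:
        raise ValueError("tooth_length must be positive")
    if any(width < 0 for width in apex_widths):
        raise ValueError("apex widths cannot be negative")
    # A closed apex has zero width, so a fully mature tooth yields I3M = 0.
    return sum(apex_widths) / tooth_length


# Hypothetical measurements in pixels, for illustration only.
open_roots = third_molar_index([12.0, 9.0], 300.0)   # (12 + 9) / 300
closed_roots = third_molar_index([0.0, 0.0], 300.0)  # apices closed
```

With these made-up values the first tooth falls just under the 0.08 threshold discussed in the Introduction, while the second is fully mature.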

Statistical analysis

All statistical analyses were performed using the R statistical program, R 3.4.3 software (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria) [45].

Image format and measurement validation

Previous studies recorded all OPGs as computer files in JPG format and studied the digital images with an image processing program (Adobe Photoshop® or ImageJ® (IJ)) [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Two series of measurements were made by the same observer (LR) on 40 randomly selected OPGs, using IJ for the JPG format and the picture archiving and communication system (PACS) for the DICOM format (Carestream Health, Rochester, NY, USA). Three weeks separated the two series of measurements to avoid any bias related to software or data encoding. The two series of measurements were then compared with a paired-sample t test.

Intra-observer and inter-observer variabilities

To test intra-observer reproducibility, another random sample of 40 subjects was selected and re-examined after a period of 6 weeks by the first observer (LR). For inter-observer reproducibility, a third random sample of 40 individuals was evaluated by LR and the second observer (CR). Neither observer (LR or CR) had worked with this method before. The reproducibility of the method was studied with the intraclass correlation coefficient (ICC) [46]. Cohen’s Kappa was also calculated for intra-observer and inter-observer agreement [47]. Both ICC and Cohen’s Kappa were calculated as in the previous studies [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44].
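As an illustration of the agreement statistic, Cohen’s Kappa for two series of adult/minor classifications can be sketched as follows. The study itself used R; this Python function, its name, and the ten example ratings are assumptions made only to show the standard formula (observed agreement corrected for chance agreement).

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's Kappa for two raters who labelled the same subjects."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every subject
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications (I3M < 0.08 -> "adult"), one disagreement out of ten.
first_reading = ["adult"] * 5 + ["minor"] * 5
second_reading = ["adult"] * 4 + ["minor"] * 6
kappa = cohen_kappa(first_reading, second_reading)
```

In this made-up example the observed agreement is 0.9 and the chance agreement is 0.5, which gives a Kappa of 0.8.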

Population characteristics

The population was studied by sex, and seven I3M range categories were used to describe it: [0.00–0.04[; [0.04–0.08[; [0.08–0.3[; [0.3–0.5[; [0.5–0.7[; [0.7–0.9[; and [0.9–1.6[. These categories were chosen according to the literature [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44]. Descriptive statistics for age were reported for each category, and Tukey’s boxplots were plotted by sex to illustrate the distribution. An independent-sample t test was performed to compare the male and female samples.
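Assigning a measured I3M to one of these seven half-open categories can be sketched as below. The boundaries are those listed above; the function name and the output label format (written here as `[a, b)`) are assumptions for illustration.

```python
from bisect import bisect_right

# Lower bounds of the seven categories, plus the final upper bound (1.6).
I3M_EDGES = [0.00, 0.04, 0.08, 0.3, 0.5, 0.7, 0.9, 1.6]


def i3m_category(i3m: float) -> str:
    """Return the half-open range category that contains the given I3M value."""
    if not (I3M_EDGES[0] <= i3m < I3M_EDGES[-1]):
        raise ValueError("I3M outside the studied range [0.00, 1.6)")
    # bisect_right places a value equal to an edge into the category it opens.
    index = bisect_right(I3M_EDGES, i3m) - 1
    return f"[{I3M_EDGES[index]:g}, {I3M_EDGES[index + 1]:g})"


closed_apices = i3m_category(0.0)
just_below_cutoff = i3m_category(0.079)
at_cutoff = i3m_category(0.08)
```

A value exactly equal to 0.08 falls in the third category, which matches the strict inequality (I3M < 0.08) used for a positive test.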

I3M cutoff accuracy evaluation

This dental age assessment method was developed on a Caucasian population. As explained above, the methodology was then tested on several populations, especially in the latest publications [36,37,38,39,40,41,42,43,44]. Specific descriptive statistics (hereafter referred to as quantities) were calculated for each sex and for the different I3M cutoff values previously studied (0.04; 0.06; 0.08; 0.10; 0.12; 0.14) [38]: accuracy (AC); sensitivity (Se); specificity (Sp); Youden’s J index (J index); positive predictive value (PPV); negative predictive value (NPV); positive likelihood ratio (LR+); negative likelihood ratio (LR−); and Bayes post-test probability (p).
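The quantities listed above all derive from a 2 × 2 contingency table, using their standard definitions. The following Python sketch shows those definitions under the convention that a positive test means I3M below the cutoff and the condition is being 18 years old or older. The class name, function name, and counts are hypothetical and are not the study data; the sketch also assumes that every row and column of the table is non-empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    tp: int  # adults with I3M below the cutoff (true positives)
    fp: int  # minors with I3M below the cutoff (false positives)
    fn: int  # adults with I3M at or above the cutoff (false negatives)
    tn: int  # minors with I3M at or above the cutoff (true negatives)


def diagnostic_quantities(table: ContingencyTable) -> dict:
    """Standard diagnostic quantities; assumes no empty row or column."""
    total = table.tp + table.fp + table.fn + table.tn
    se = table.tp / (table.tp + table.fn)
    sp = table.tn / (table.tn + table.fp)
    return {
        "AC": (table.tp + table.tn) / total,
        "Se": se,
        "Sp": sp,
        "J": se + sp - 1.0,
        "PPV": table.tp / (table.tp + table.fp),
        "NPV": table.tn / (table.tn + table.fn),
        # The likelihood ratios are unbounded when the denominator is zero.
        "LR+": se / (1.0 - sp) if sp < 1.0 else float("inf"),
        "LR-": (1.0 - se) / sp if sp > 0.0 else float("inf"),
    }


# Hypothetical counts for illustration only.
quantities = diagnostic_quantities(ContingencyTable(tp=90, fp=20, fn=10, tn=80))
```

Repeating the calculation for each candidate cutoff and comparing the J index is one simple way to rank cutoffs, which is the criterion emphasized in the Results.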

The Bayes post-test probability (p) of being 18 years of age or older (i.e., the proportion of individuals with I3M < 0.08 who are 18 years old or older) was calculated using Bayes’ theorem [48, 49], which may be written as:

$$ p=\frac{p1\ast p0}{\left(p1\ast p0\right)+\left(1-p2\right)\left(1-p0\right)} $$
(1)

where, in (1), p1 is the sensitivity of the test, p2 is the specificity of the test, and p0 is the probability that the subject in question is 18 years old or older in the target population (individuals between 14 and 22 years of age in the French population). Probability p0 was calculated on the basis of data from the French National Institute of Statistics and Economic Studies (INSEE: Institut National de la Statistique et des Etudes Economiques) as the ratio of the number of individuals living in France aged between 18 and 22 years to the number aged between 14 and 22 years [50]. In January 2017, this proportion was 53.5% for females and 53.2% for males.
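Equation (1) can be evaluated directly. The sketch below is a minimal Python illustration, with a hypothetical function name, that plugs in the rounded sensitivities and specificities reported in the Results for the I3M = 0.08 cutoff together with the p0 values given above. Because the inputs are rounded, the output may differ from the published post-test probabilities in the third decimal place.

```python
def post_test_probability(p1: float, p2: float, p0: float) -> float:
    """Equation (1): p1 = sensitivity, p2 = specificity, p0 = prior probability."""
    for name, value in (("p1", p1), ("p2", p2), ("p0", p0)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positive_mass = p1 * p0
    false_positive_mass = (1.0 - p2) * (1.0 - p0)
    if true_positive_mass + false_positive_mass == 0.0:
        raise ValueError("the test is never positive for these inputs")
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Rounded values reported in this study for the I3M = 0.08 cutoff.
p_males = post_test_probability(p1=0.922, p2=0.8835, p0=0.532)
p_females = post_test_probability(p1=0.745, p2=0.882, p0=0.535)
```

Both results come out close to 0.90 for males and 0.88 for females, in line with the values reported in the Results.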

Receiver operating characteristic (ROC) curves were plotted by sex for each cutoff, and the most efficient cutoff according to the quantities was highlighted.

Most accurate I3M results

After the most accurate I3M cutoff had been identified, the contingency table for this cutoff was calculated by sex. Furthermore, the number and percentage of correct evaluations obtained with the most accurate I3M cutoff were calculated relative to the total number of participants in each age group.

Approximation of summary ROC curves and comparison

Finally, our results were compared with those described in the literature. A recent meta-analysis summarized the published literature on the methodology developed by Cameriere et al. [51]. The different populations studied were compared with one another by means of summary receiver operating characteristic (SROC) curves. SROC curves are a statistical method used to assess diagnostic data and to summarize the sensitivity and specificity of different studies in order to compare them [52]. Using the previously published data included in the meta-analysis, supplemented with the most recent studies [51], we plotted SROC curves with their 95% confidence interval. The point corresponding to the results of this study was then plotted in order to compare the data.

Results

We selected 431 OPGs (187 females and 244 males). The age and sex distribution of our French sample is presented in Table 1. The number of individuals per year of age ranged from a minimum of 37 (14 years) to a maximum of 59 (16 years). Numbers in brackets represent the number of subjects with closed apices (I3M = 0.00) in each age group. The first apical closure was observed at age 17 in females and at age 18 in males. Open apices were still present at age 22 in both sexes.

Table 1 Description of the population according to sex and age categories. Numbers in brackets correspond to the number of individuals with closed apices (I3M = 0.00)

There was no statistically significant difference between I3M obtained by using the PACS or IJ. The result of the t test was t = − 2.15; p = 0.03. Hence, the OPGs were analyzed on the PACS. ICC of intra-rater and inter-rater agreement of I3M was 0.98 [0.97; 0.99] and 0.98 [0.96; 0.99], respectively. Intra-rater and inter-rater Kappa for discriminating those who were 18 years old or older and those who were younger than 18 years old were 0.99 [0.977; 0.99] and 0.94 [0.899; 0.99], respectively.

As expected, chronological age gradually decreased as I3M increased in both males and females (Fig. 1). Table 2 gives the descriptive statistics for age in each I3M category (number of individuals, mean, standard deviation, minimum value, first quartile, median, third quartile, maximum value). The mean ages of the two sexes varied across I3M classes, and the differences between the sexes were statistically significant for two groups ([0.08–0.3[ and [0.7–0.9[; see Table 2).

Fig. 1 Tukey’s boxplots of age by sex for each I3M (index of third molar) range category

Table 2 Descriptive statistics of age for each I3M range category by sex. The last two columns give the result of the independent-sample t test (t with its 95% confidence interval and the related p value). n number of individuals; mean; SD standard deviation; min minimum value; Q1 first quartile; median; Q3 third quartile; max maximum value; t [CI 95%] t value with its 95% confidence interval; p value significant if < 0.05

Table 3 presents the quantities derived from the contingency table of the test for the age of majority (18 years old) when different I3M values were used to discriminate minors from adults, for both sexes. The 95% confidence intervals are indicated in brackets. The most accurate cutoff for both sexes was 0.08 according to the statistical parameters, especially Youden’s J index. For the I3M = 0.08 cutoff in males, the sensitivity of the test (the proportion of individuals aged 18 years or older whose test was positive) was 92.2% (95% CI 88.8–95.6%) and its specificity (the proportion of individuals younger than 18 years whose test was negative) was 88.35% (95% CI 84.3–92.3%). The proportion of correctly classified individuals (the accuracy) was 90.6% (95% CI 86.9–94.3%). The estimated post-test probability p (the probability that a subject with a positive test (I3M < 0.08) was 18 years old or older) was 0.901 (95% CI 0.864–0.938). For the I3M = 0.08 cutoff in females, the sensitivity of the test was 74.5% (95% CI 68.3–80.7%) and its specificity was 88.2% (95% CI 83.6–92.8%). The accuracy was 80.7% (95% CI 75.0–86.4%). The estimated post-test probability p was 0.878 (95% CI 0.831–0.925). In Fig. 2, the ROC curves for each I3M cutoff studied are plotted in gray, by sex. The ROC curve for the most accurate cutoff (I3M = 0.08) is plotted in black.

Table 3 Quantities derived from the contingency table of each cutoff studied (0.04; 0.06; 0.08; 0.10; 0.12; and 0.14) by sex, with their 95% confidence intervals. AC accuracy, Se sensitivity, Sp specificity, J index Youden’s J index, PPV positive predictive value, NPV negative predictive value, LR+ positive likelihood ratio, LR− negative likelihood ratio, p Bayes post-test probability

Fig. 2 Receiver operating characteristic curves plotted for each I3M cutoff studied (0.04; 0.06; 0.08; 0.10; 0.12; and 0.14) by sex. The point on each curve corresponds to the sensitivity and specificity associated with the I3M cutoff of that curve. The curves in black correspond to the best cutoff (I3M = 0.08) for both sexes

Table 4 is the contingency table for the 18-year threshold and the positive test value I3M < 0.08. It shows the close association between adult age and a positive test (I3M < 0.08) for both sexes. Table 5 gives the number and percentage of correct evaluations obtained with the best I3M cutoff value (0.08), relative to the total number of included individuals in each age group, for individuals aged 18 years or older and for those younger than 18 years.

Table 4 Contingency table describing the performance of the test at the 18-year threshold for the best cutoff value of the third molar maturity index (I3M = 0.08)
Table 5 Number and percentage of correct evaluations relative to the total number of included individuals in each age group, using the best cutoff value (I3M = 0.08), for individuals aged 18 years or older and for those younger than 18 years

SROC and present study results

We calculated the SROC curves on the basis of the bibliography of the meta-analysis of Santiago et al. and of subsequent studies. Eighteen articles were used to plot the SROC curves and their confidence interval [22, 24, 26, 28–41, 44]. The SROC curves are plotted in Fig. 3 for both sexes. The point corresponding to the sensitivity and specificity of the present study (I3M = 0.08) is plotted in Fig. 3 for each sex.

Fig. 3 Summary receiver operating characteristic (SROC) curves of previous data. The middle curve corresponds to the mean SROC curve. The upper and lower curves correspond to the 95% confidence interval of the SROC curve. The results of the present study are indicated by the dark gray point

Discussion

Results and discussion

Cameriere et al. presented their results for both sexes combined in the original study. In this study, the results are presented separately for each sex for two main reasons. The first is linked to the age assessment procedure: in this context, sex is not an unknown variable and is always established by the clinical examination. The second is illustrated by the results of the t test comparing the I3M categories of the two sexes. A significant difference between the sexes was found in two I3M categories ([0.08–0.3[ and [0.7–0.9[). This means that I3M measurements cannot be considered equivalent in the two sexes. Caution is therefore required, and results should always be presented separately for each sex. In the original study, the sexes were not studied separately, hence a direct comparison with our results is not possible. Nevertheless, our results on the French sample showed correct classification with respect to the age of 18 years in 80.7% of cases for females and 90.6% of cases for males with the I3M = 0.08 cutoff. The sensitivity of the test was 74.5% for females and 92.2% for males. The specificity was 88.2% for females and 88.3% for males. The post-test probability p was 0.878 for females and 0.901 for males. In their original study, Cameriere et al. found an overall correct classification of adults of 83%, an overall sensitivity of 70%, an overall specificity of 98%, and an overall post-test probability p of 98% [21]. A recent systematic review and meta-analysis summarized most of the results for the populations studied [51]. In order to compare our results with those of other studies, we plotted SROC curves with the data of the meta-analysis and of the studies published since, and added the point corresponding to our study [36,37,38,39,40,41,42,43,44, 51]. For males, our results lie within the SROC confidence interval curves, which supports the performance of the diagnostic test in the studied population. For females, the results lie below the lower 95% confidence limit of the SROC curves, but not far from it. The SROC methodology has the advantage of giving a simple graphical overview of the previous papers, which mainly concerned Caucasian populations. It was important to present updated SROC curves because, since the meta-analysis of Santiago et al., ten newer studies testing the methodology of Cameriere et al., mainly on Caucasian populations, were published in 2018 and 2019 [36,37,38,39,40,41,42,43,44, 53]. Unfortunately, not all the statistics were available for every article, and three of the ten articles could not be included in the SROC curves [42, 43, 53].

In order to explain these differences between the sexes (mainly in sensitivity), the population of this study and those of previous studies were analyzed. The age distribution in this study seemed to be more balanced than in some others [37, 42]. The smaller the I3M, the harder it is to measure precisely, except when the root apices are closed (I3M = 0.00). I3M values between 0.00 and 0.08 are therefore the hardest to measure, although they are critical in the statistical analysis. If the population is unbalanced and lacks individuals close to 18 years old (who are likely to have an I3M near 0.08), the accuracy, sensitivity, and specificity will be artificially improved by the large number of measurements giving I3M = 0.00 (mature adults) or I3M higher than 1 (young individuals). This explains why the results in Table 5 lie between those of Chu et al. and those of Tafrount et al., especially between 17 and 19 years of age. The difference between the distribution of Tafrount et al. and ours explains the large variation in sensitivity and specificity between the two studies. Another possible explanation is the imbalance between the number of individuals younger than 18 years and the number older than 18 years. The proportions in the two French populations studied are opposite: there were more individuals older than 18 years in the population of this study, especially in the female subgroup. As explained above, this is exactly the part of the population in which the I3M is hard to evaluate, which in turn led to a lower sensitivity and specificity. Our study was not the only one to find such a low sensitivity: when the method was tested on a Chilean population, the sensitivity for the whole population (male and female) was 70.5% [43]. Although that sensitivity concerned both sexes combined, it suggests that our low sensitivity is not linked to an unknown statistical bias in the female population. It should also be noted that R. Cameriere was author or co-author of every study published after the meta-analysis except one [38]. Santiago et al. indicated that this should be taken into account because better results can be expected when the creator of a methodology applies his own age assessment tool to a population [51]. This might partially explain why some of our results seem not to be as good as in other studies.

As in many previously published studies on Cameriere’s method, the ratio of analyzed OPGs to included ones was not calculated [22, 29,30,31, 33, 34, 36, 37, 39, 41, 42]. Such a ratio would not provide a good estimation of the clinical applicability of the method because of the retrospective nature of the study (for example, a significant number of OPGs in our study had been performed after surgical extraction of wisdom teeth). It would be interesting to calculate this ratio in a prospective study, which was not possible in the present case.

Sex dimorphism

Although the first apical closure in this study was observed in a 17-year-old female and in an 18-year-old male, the results indicated earlier maturation of M3 in males than in females, in agreement with most previously published studies [51]. Tooth development is a well-studied topic in both the odontological and the anthropological fields [54, 55]. More generally, skeletal maturation occurs earlier in females than in males. Tooth development is more complex. When tooth development is considered as a whole, female maturation occurs earlier than male maturation, which is the opposite of our results. The literature clearly shows that M3 differs from the other teeth [15, 16, 55, 56]. This difference is particularly visible in root development. Crown development begins earlier in females, but root development proceeds faster in males than in females. The method of Cameriere et al. depends strongly on root closure, which is why males complete their maturation earlier according to this method. This explains the better sensitivity of the I3M for males than for females, as in many other studies of this method [51]. The younger the individual, the easier the age estimation, because more teeth can be used to perform it [55]. This developmental pattern explains why assessing dental age around the 18-year cutoff is difficult and why it remains an active part of forensic age estimation research aimed at improving the recommendations [8].

Third molar and age estimation

The scientific community recognized nearly 60 years ago that wisdom teeth are essential for assessing the age of individuals in a judicial context [56]. The contribution of dental age evaluation to determining whether an individual is an adult or a minor is well established. Earlier studies highlighted the superiority of the method of Cameriere et al., based on the evaluation of the I3M, over Demirjian stages G and H, previously considered the reference method for estimating adult age [21, 22]. The original study, published in 2008, was conducted on a Caucasian sample. When a new methodology is developed, one of the main concerns is its validation in several different populations. Although Cameriere’s method was created only in 2008, it has been tested on populations from five continents, as highlighted in the recent systematic review and meta-analysis [51]. Interest in this method remains current, and dental age assessment is of interest all over the world [38]. Despite the recent publication of a French study, we studied our population for several reasons. First, the population in this study is about 27% larger than in the previous French study, and one of the objectives was to improve the effectiveness of the methodology in the French population [37]. Chu et al. highlighted that the cutoff may vary from one population to another. In their population from northern China, they found a difference between male individuals (best cutoff: I3M = 0.08) and female individuals (best cutoff: I3M = 0.10). In our study, we evaluated the different I3M cutoffs with the same strict methodology as presented by Chu et al. [38]. The results indicated that the I3M = 0.08 cutoff seemed to be the most accurate for Caucasian populations and confirmed the methodological choices of Tafrount et al. [37]. It is worth noting that in similar populations (two different French samples), the I3M cutoff did not vary. This argues against the influence of other unidentified factors and confirms the usefulness of the methodology of Cameriere et al.

For forensic purposes, reducing ethically unacceptable errors (minors classified as adults) is more important than reducing technically unacceptable errors (adults classified as minors) [57]. Although our study showed a good specificity of 88.23% for females and 88.35% for males, these results remain lower than those of several earlier studies [22, 24, 25, 27,28,29,30,31,32, 34,35,36,37,38,39,40,41, 44]. This variability may be due to several factors, and a study on a larger population sample should help to identify these factors in order to increase the efficiency of the methodology. It should be noted that a number of inaccurate classifications occur between the ages of 17 and 19, as shown in Table 5. This concentration of errors in the years surrounding the 18-year threshold is unfortunately constant in the literature [22, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44] and is not related to the I3M cutoff methodology [38]. It represents one of the main limits of this method and, more generally, of age estimation, whatever the method or the anatomical site studied. This limit is related more to the body maturation process than to the method itself.

Conclusion

In conclusion, the method of Cameriere et al. is useful for estimating adult age in the French population. The main limit of this technique is that the I3M cannot be measured in subjects with extracted or unusable third molars. To improve its accuracy, and according to the guidelines published by the International Study Group of Forensic Age Diagnostics, this method needs to be combined with a physical examination, a radiograph of the left hand, and, if necessary, a computed tomography (CT) scan of the sternal clavicular epiphysis.