Introduction

The development of methods to estimate the age of individuals represents a considerable part of the physical and forensic anthropology literature, because the identification of human skeletal remains is one of the basic tasks of the field [1]. In the case of infant skeletal remains, the constant development, growth, and maturation of the skeleton and teeth make it possible to estimate the age at death with a high degree of reliability [2, 3]. That is why the length of the long bones and dental development and eruption have usually been employed to estimate the age of infant individuals [4,5,6].

Most of the current methods used to estimate the age of non-adult individuals are far from following the recommendations of the scientific community for publication in terms of sample size, reliability, and error estimation, among other factors [7]. While these approaches offer a substantial amount of information regarding the growth and development of individual skeletal elements, it is essential that the new methods proposed by researchers are not only more precise in their mathematical expression, but also easier to use by the anthropologist [8].

The development of the skull as a criterion for age estimation in non-adult individuals is quite useful, especially in early childhood, given the large number of bones in the cranial structure and their different stages of fusion. The fetal growth of the temporal bone was described by Anson [9] and has been used to estimate age in several studies [10,11,12,13,14,15]. The lateral region of the temporal bone, including the squamous portion, shows very rapid growth between birth and 4 years of age. Although it continues to grow until approximately 20 years of age, this development is drastically reduced with respect to the first years of life [16]. Another favorable aspect is the high bone density of skeletal elements such as the pars basilaris and pars petrosa, which enhances their preservation; methods based on these two bones are therefore useful in both forensic and bioarchaeological contexts in those cases where the skeleton is not complete [17].

The objectives of the study were: (1) to assess the accuracy of the Fazekas and Kósa [10] and Nagaoka [13, 14] methods in a Mediterranean population and (2) to propose new regression formulas based on the metric development of the pars petrosa and the squamous portion of the temporal bone in infants from an identified Mediterranean collection. The results will allow updating the existing methodology on age estimation in gestational and infant individuals.

Materials and methods

The study sample belongs to the Granada Osteological Collection Sample of Identified Infants and Young Children (Spain) composed of 242 individuals (Table 1), which was contextualized by Aleman et al. [18]. This collection is currently located at the Department of Legal Medicine, Toxicology, and Physical Anthropology of the University of Granada. The individuals come from the Cemetery of San José of Granada (Spain) and are of contemporary chronology (twentieth century), for which plenty of official documentation is available. Hence, it is possible to obtain very reliable antemortem information, such as the date of birth, the date of death, months of gestation, sex, cause of death, and related pathologies, among other factors [18]. In addition, the remains are in an excellent state of preservation.

Table 1 Distribution by age and sex of Granada osteological collection of identified infants and young children

All individuals whose antemortem records indicate prematurity or any pathology that could imply an anomalous formation and/or development of the skeleton were eliminated from the study. Furthermore, individuals with taphonomic and/or traumatic damage or alterations and individuals whose skeletal development or biological age was not consistent with the age recorded in the official documentation were excluded from the study sample. The chronological age in the official documentation was given in years in the case of infants and in months of gestation in the case of gestational individuals.

Therefore, we multiplied the months of gestation by 30 and the years by 365, which gives the chronological age in days for each individual in the sample. To express all ages on a single gestational scale, 280 days were added to the chronological age of all postnatal individuals. Consequently, the results obtained with the regression formulas have two different interpretations: (1) if the estimated age is greater than or equal to 280, then 280 must be subtracted from it to obtain the estimated age in days of a postnatal individual; (2) if the estimated age is less than 280, the individual is considered prenatal and the estimate is the gestational age in days.
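The conversion and interpretation rules described above can be sketched in a few lines of Python. This is only an illustration: the function names `to_age_scale` and `interpret_estimate` and the unit labels are hypothetical and do not come from the study.

```python
GESTATION_DAYS = 280  # offset added to postnatal ages, as stated in the text


def to_age_scale(value, unit):
    """Convert a documented age to days on the common gestational scale.

    unit: "gestational_months" for prenatal individuals,
          "postnatal_years" for individuals born alive (hypothetical labels).
    """
    if unit == "gestational_months":
        return value * 30
    if unit == "postnatal_years":
        return value * 365 + GESTATION_DAYS
    raise ValueError(f"unknown unit: {unit}")


def interpret_estimate(estimated_days):
    """Split an estimate on the common scale into (period, age in days)."""
    if estimated_days >= GESTATION_DAYS:
        return ("postnatal", estimated_days - GESTATION_DAYS)
    return ("prenatal", estimated_days)


print(to_age_scale(5, "gestational_months"))   # 5 months of gestation -> 150
print(to_age_scale(1.5, "postnatal_years"))    # 1.5 postnatal years -> 827.5
print(interpret_estimate(300))                 # -> ('postnatal', 20)
```

With these two helpers, the youngest and oldest individuals of the study sample (5 months of gestation and 1.5 postnatal years) map to 150 and 827.5 days on the common scale.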

The study sample was composed of 109 individuals with an age between 5 months of gestation and 1.5 postnatal years. Because of the cranial synostosis process that interferes with measuring the WP, this measurement could only be taken in 85 individuals of the sample. The distribution of the sample by age and sex is shown in Table 2.

Table 2 Age and sex distribution of the sample

Data were collected by using a digital caliper, with an accuracy of 0.01 mm, following the methodology proposed by Fazekas and Kósa [10] (Fig. 1). The definitions used for each measurement taken in the pars petrosa and squamosal portion are listed below:

  • Length of the pars petrosa (LP): the maximum anteroposterior distance across the petrous portion;

  • Width of the pars petrosa (WP): the maximum right-angled distance of the length across the arcuate eminence;

  • Height of the squamosal portion (HS): the maximum distance from the center of the tympanic crest to the superior border of the bone, and

  • Width of the squamosal portion (WS): the distance from the posterior arch of the squamomastoid suture to the anterior edge of the squamosal part.

Fig. 1 Location of measurements. Width of the squamous portion (WS). Height of the squamosal portion (HS). Length of the pars petrosa (LP). Width of the pars petrosa (WP)

To assess the methods proposed by Fazekas and Kósa and Nagaoka [10, 13, 14], the chronological age of the individuals was compared with the age estimated by these methods by using the Wilcoxon test. In addition, the root mean square error (RMSE), that is, the square root of the average squared difference between the real and predicted values, was calculated to determine the reliability of each method [19]. The RMSE allows us to evaluate a predictive model through the estimation error of each method applied in the Mediterranean population.
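As a minimal sketch of the RMSE computation, the following Python function takes paired lists of chronological and estimated ages in days. The function names and the example values are illustrative assumptions, not data from the study. The mean signed error is included because it indicates whether a method tends to over- or underestimate age.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between real and predicted ages (days)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_signed_error(actual, predicted):
    """Positive values indicate overestimation, negative underestimation."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


# Made-up ages in days, for illustration only.
chronological = [150, 210, 280, 400]
estimated = [170, 230, 300, 420]
print(rmse(chronological, estimated))               # every error is +20 days
print(mean_signed_error(chronological, estimated))  # positive: overestimation
```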

The repeatability and reproducibility of the variables were analyzed by using Lin’s concordance correlation coefficient (CCC) [20], which compares two measurements of the same variable and evaluates the degree of agreement between them [21, 22]. For this purpose, measurements were repeated in 25% of the sample (30 randomly selected individuals) 2 weeks after initially taking the measurements. In addition, a second observer independently measured all of the selected individuals.
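For readers unfamiliar with the CCC, the sketch below implements the standard definition of Lin's coefficient with population (1/n) moments. It is not the software routine used in the study, and the function name `lin_ccc` and the example measurements are hypothetical.

```python
def lin_ccc(first, second):
    """Lin's concordance correlation coefficient for paired measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variances and covariance.
    """
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    var_x = sum((x - mean_x) ** 2 for x in first) / n
    var_y = sum((y - mean_y) ** 2 for y in second) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Made-up repeated measurements in mm, for illustration only.
session_1 = [30.1, 35.4, 41.2, 44.8]
session_2 = [30.3, 35.1, 41.5, 44.6]
print(lin_ccc(session_1, session_2))
```

Unlike Pearson's correlation, the CCC penalizes a systematic shift between the two series: two observers who differ by a constant offset are perfectly correlated but not perfectly concordant.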

The Kolmogorov–Smirnov test was applied to analyze the distribution of the sample [23]. Likewise, the degree of correlation between the study variables and age was evaluated with Spearman’s test [24], and descriptive statistics were applied for both variables distributed by age groups.
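Spearman's coefficient is the Pearson correlation of the ranks of the two variables, which makes it suitable for monotonic but non-linear relationships such as bone growth against age. The following is a minimal standard-library sketch of that definition, with ties given average ranks. The function names are hypothetical and it is not the SPSS routine used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up measurement (mm) and age (days) pairs, for illustration only.
print(spearman_rho([20.5, 28.0, 33.1, 40.2], [150, 230, 300, 640]))
```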

With the purpose of developing an equation to estimate the age of infant individuals from the temporal bone measurements, an exponential regression model was applied to each of the variables (Fig. 2), considering age in days as the dependent variable (y) and the metrical variable as the independent variable (x). The regression model was applied both separately by sex and with the sexes combined, like the models proposed to estimate the age of infant individuals with the pars lateralis, pars basilaris, and coxal bone [25,26,27,28]. In addition, we also offer a formula that combines the measurements of the squamous portion and the length of the petrous portion.
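One common way to fit an exponential model of this kind is ordinary least squares on the natural logarithm of age, since ln(age) is linear in the measurement. The sketch below assumes that approach; the text does not specify whether the software used in the study fits the model this way, and the function names and data are illustrative only.

```python
import math


def fit_exponential(measure_mm, age_days):
    """Fit age = b0 * exp(b1 * measure) by least squares on ln(age).

    Returns (b0, b1). Assumes all ages are strictly positive.
    """
    n = len(measure_mm)
    if n != len(age_days) or n < 2:
        raise ValueError("need at least two paired observations")
    log_age = [math.log(a) for a in age_days]
    mean_x = sum(measure_mm) / n
    mean_y = sum(log_age) / n
    sxx = sum((x - mean_x) ** 2 for x in measure_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measure_mm, log_age))
    b1 = sxy / sxx                         # slope on the log scale
    b0 = math.exp(mean_y - b1 * mean_x)    # back-transformed intercept
    return b0, b1


def predict_age(measure, b0, b1):
    """Estimated age in days for a measurement in mm."""
    return b0 * math.exp(b1 * measure)


# Synthetic data generated from known coefficients, for illustration only.
lengths = [20.0, 25.0, 30.0, 35.0, 40.0]
ages = [40.0 * math.exp(0.07 * x) for x in lengths]
b0, b1 = fit_exponential(lengths, ages)
print(b0, b1)                      # recovers roughly 40 and 0.07
print(predict_age(32.0, b0, b1))
```

Note that least squares on the log scale minimizes relative rather than absolute errors in age, so its coefficients can differ slightly from those of a direct non-linear fit.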

Fig. 2 Exponential relationship of pars petrosa length and age in days

According to Lucy et al. [29], this calibration method has the advantage of being easier to implement if multivariate indicators are evaluated. Thereby, the model fit was determined based on leave-one-out cross-validation (LOOCV), where each individual is classified by functions derived from all cases except for itself. In other words, the analysis is performed several times by excluding one individual at a time, as a way of establishing whether their classification is correct.
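The leave-one-out procedure can be sketched as follows for a single-variable exponential model. The fitting step (least squares on ln(age)) is an assumption made for this illustration, and the function names and synthetic data are hypothetical. What matters here is the loop structure: each individual is predicted by a model that never saw it.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = b0 * exp(b1 * x) by least squares on ln(y) (assumed method)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b1 = sxl / sxx
    return math.exp(mean_l - b1 * mean_x), b1


def loocv_errors(xs, ys):
    """Prediction error (days) for each individual when it is held out.

    xs and ys must be lists so that they can be sliced and concatenated.
    """
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]      # everyone except individual i
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_exponential(train_x, train_y)
        predicted = b0 * math.exp(b1 * xs[i])
        errors.append(predicted - ys[i])
    return errors


def loocv_rmse(xs, ys):
    """Root mean square of the held-out prediction errors."""
    errors = loocv_errors(xs, ys)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic, slightly noisy data, for illustration only.
lengths = [20.0, 24.0, 28.0, 32.0, 36.0, 40.0]
ages = [160.0, 215.0, 270.0, 390.0, 500.0, 660.0]
print(loocv_rmse(lengths, ages))
```

Because each error comes from a model fitted without the individual being predicted, the resulting RMSE is usually somewhat larger, and more realistic, than the error computed on the full training sample.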

The final proposed function was generated by WEKA Workbench software with the least squares algorithm [30]. The standard error of estimate (SEE) and the 95% confidence interval (CI) were obtained by applying the recommendations of Corron et al. [7], where the result of the equation to calculate SEE was multiplied by 1.96 [31]. One of the advantages of the SEE is that its result is expressed on the same scale as the dependent variable.

The exponential regression function of age in days for the study variables was determined by the following formula:

$$Age\ (days)={\beta}_0\times {e}^{measure\ (mm)\times {\beta}_1}\pm \left(1.96\times SEE\right)$$

where the term ± (1.96 × SEE) gives the 95% CI and:

  • Age: age in days;

  • β0: constant;

  • β1: measuring constant, and

  • Measure: measure in mm.

The following equation was used to calculate the SEE [31]:

$$SEE=\sqrt{\frac{\sum {\left(Y-{Y}^{\prime}\right)}^2}{n-2}}$$

where:

  • Y′: chronological age;

  • Y: estimated age result, and

  • n: observations.
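The SEE equation and the ± 1.96 × SEE interval can be computed directly from paired chronological and estimated ages. The sketch below is a literal transcription of the two formulas. The function names and example values are hypothetical and do not reproduce any result from the study.

```python
import math


def standard_error_of_estimate(chronological, estimated):
    """SEE = sqrt(sum((Y - Y')**2) / (n - 2)), in the units of age (days)."""
    n = len(chronological)
    if n != len(estimated) or n <= 2:
        raise ValueError("need more than two paired observations")
    squared = sum((e - c) ** 2 for c, e in zip(chronological, estimated))
    return math.sqrt(squared / (n - 2))


def interval_95(estimate_days, see):
    """Lower and upper bounds: estimate -/+ 1.96 * SEE."""
    half_width = 1.96 * see
    return (estimate_days - half_width, estimate_days + half_width)


# Made-up ages in days, for illustration only.
chronological = [100, 200, 300, 400]
estimated = [110, 190, 310, 390]
see = standard_error_of_estimate(chronological, estimated)
print(see)
print(interval_95(250, see))
```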

Statistical analyses were performed with SPSS 25 for Windows 10 (Spanish version; IBM Corp. Armonk, NY, USA). WEKA Workbench software 3.6.15 (University of Waikato) was used to generate the function [32].

Results

When the methods proposed by Fazekas and Kósa and Nagaoka were tested, the Wilcoxon test showed significant differences (p < 0.05) between the chronological age and the estimated values for each of the proposed methods, except for HS estimated with the Fazekas and Kósa method (Fig. 3). Furthermore, these methods tended to overestimate the age of the fetal individuals in the collection (Table 3). The method proposed by Fazekas and Kósa overestimated the age when applying the LP, while it underestimated it by more than 100 days when using the WP. On the other hand, the method proposed by Nagaoka overestimated the age by 42 days when using the WP and 38 days when using the LP. In addition, the RMSE was more than 100 days for the Fazekas and Kósa method and 45 days for the Nagaoka method (Table 3).

Fig. 3 Accuracy test plots of the Fazekas and Kósa and Nagaoka methods in the Mediterranean population

Table 3 Testing of accuracy results of the Fazekas and Kósa and Nagaoka methods

The intra- and inter-observer errors for each of the variables are reported in Table 4. The results were excellent, falling between substantial and almost-perfect correlations. Nevertheless, WP had the worst correlation coefficient: the inter-observer error was poor to moderate and the intra-observer error was moderate. These results could be explained by the heterogeneity of the posterior part of the pars petrosa. In addition, the difficulty of clearly identifying the point of measurement for the WP reduced the reliability of this measurement.

Table 4 Concordance correlation coefficient (CCC) to calculate the intra- and inter-observer error for each measurement

The sample did not follow a normal distribution (Kolmogorov–Smirnov test, p < 0.05), because a large part of the individuals in the sample are between birth and 0.5 years of age (Table 5). Descriptive statistics for each measure are shown in Table 5. The Wilcoxon test for nonparametric samples revealed no significant bilateral differences in the variables (LP, p = 0.14; WP, p = 0.43; HS, p = 0.19; WS, p = 0.06), so the left side was used for subsequent analyses. To maintain the sample size, when it was not possible to measure the left side, its value was replaced by that of the right side [6]. The correlation between the measurements and the age at death of the individuals, using Spearman’s test, was very strong [24] (Table 5).

Table 5 Descriptive statistics for each variable by age range

The regression formulas with the associated SEE and 95% CI are included in Table 6. They are presented with the sample separated by sex and combined because the osteological and dental development of boys and girls is not similar and indeed is even more differentiated in the case of teeth [33, 34]. Even though there are differences in the growth of boys and girls [34, 35], there were minimal differences in the SEE between the functions for female and male individuals, a finding that increases the value of the combined function as a method for estimating age. Furthermore, given the difficulty in estimating sex in infants [36, 37], the use of the function designed with the sample not separated by sex (the combined function) is recommended in cases where the sex is unknown.

Table 6 Age estimation functions for LP, WS, HS, and combined variables

The combined function for the three measurements produced the highest coefficient of determination (R2 = 0.87), while the function for female individuals for the WP produced the lowest (R2 = 0.55), the only value that was less than 0.60.

The estimation error is presented individually with a 95% CI for each of the functions, adding and subtracting this value from the estimated age [6, 31]. For example, for an estimate made from the LP of an individual of unknown sex, the result would be 300 ± 119.22 days.
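As a worked reading of this example, assuming the 300-day estimate is expressed on the gestational scale described in the methods (postnatal ages offset by 280 days), the interval bounds and the postnatal equivalents are:

$$300-119.22=180.78\ \text{days},\qquad 300+119.22=419.22\ \text{days}$$

$$300-280=20\ \text{postnatal days},\qquad 419.22-280=139.22\ \text{postnatal days}$$

Under that assumption, the point estimate corresponds to about 20 days after birth and the upper bound to about 139 postnatal days, while the lower bound (180.78 < 280) falls within the gestational period.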

Discussion

The methods proposed by Fazekas and Kósa [10] and Nagaoka [13, 14] showed significant differences between the chronological age and the estimated age when applied to a sample from the Mediterranean population (Granada Collection) (Table 3; Fig. 3), so we do not recommend using these methods to estimate gestational age. Both methods tend to overestimate the age of individuals. In addition, the RMSE values were greater than 150 days (5 months) in most cases. This represents a large estimation error for fetal individuals.

Although the functions for the LP in the Nagaoka method and the HS in the Fazekas and Kósa method produced age estimates more in accordance with the chronological age, with an RMSE less than or equal to 50 days, the overall low reliability of these methods in this population means that they cannot be recommended. Furthermore, in the work performed by Nagaoka on the Japanese population, specifically on the individuals from the Tohoku University collection, significantly smaller measurements of the pars petrosa were reported with respect to the European population [14], a factor that may explain the non-applicability of the method in our study population. Therefore, it was necessary to establish a methodology to estimate age in non-adult individuals through the petrous and squamous portions in the Mediterranean population.

Several authors have proposed a methodology based on regression models to estimate age in infants by using cranial [1, 26, 38] and postcranial skeletal remains [6, 27, 39,40,41]. Although these methods report the error of their estimations, it is essential to emphasize that this figure can be misleading: it is computed on the same training sample from which the method was derived, so it is not the error that should be associated with a new estimation. This is a common mistake in age estimation methods in forensic anthropology [19]. The optimal approach in these cases is to have a training dataset to develop the method and a testing dataset, where the error of the predictive model is calculated and, hence, the method is validated or refuted. However, due to the typical sample size in forensic anthropology, it is difficult to divide the sample into training and testing datasets [19]. For this reason, the use of techniques such as cross-validation (CV) is an excellent solution to this problem [42]. In addition, the use of leave-one-out cross-validation (LOOCV) is particularly recommended for cases where the sample size is small [19]; thus, it might be more applicable in forensic anthropology. Another advantage of using CV is that it provides a more realistic estimation of the validation error [19].

Although our results are positive, they should be interpreted with caution, because they are less accurate than the estimations obtained from long bones both in individuals younger than 2 years of age [6, 43] and in gestational age individuals [44, 45]. Carneiro et al. [44, 45] proposed methods for fetal individuals using long bones, which showed R2 of at least 0.90. Moreover, the estimated error in all cases was not less than 17 days. In contrast, the error of the estimation proposed with our methodology was always higher than 30 days. Thus, the functions proposed in this paper are not as efficient as the length of long bones to estimate age in prenatal individuals [44, 45]. However, if we take into account the range of error in the estimates, our results, especially the formula for the length of the petrosa, are close to AlQahtani’s method based on the eruption and development of teeth [4].

On the other hand, Cardoso et al. [6, 46] proposed regression models based on measurements of long bones, the shoulder girdle, and the pelvic girdle, for postnatal individuals up to 2 years of age; they had an R2 close to 0.90, which is higher than our functions. In addition, the best SEE values corresponded to the length of the femur, with a value of 0.23 years, and the height of the ilium in female individuals, with a value of 0.17 years. The worst SEE corresponded to the length of the tibia, with an SEE of 0.29 years, and the height of the pubis in female individuals, with an SEE of 0.48 years [6, 46].

Smith et al. [38] measured the skull and reported results that are not as positive as those calculated in long bones. The highest R2 was close to 0.80, associated with the frontal, occipital height, and mandibular ramus height. The best SEE corresponded to the width of the frontal bone, at 0.20 years, and the worst SEE was for the length of the zygomatic bone, at 0.58 years [38].

We have found that our functions are in the ranges of variability (based on R2) that are associated with the skull growth-based methods [38]. In contrast, when compared with models that are based on the development of long bones and the shoulder girdle and pelvic girdle, our formulas are not highly recommended [6, 46].

Even though the functions obtained in this paper presented a lower R2 than those mentioned above [6, 46], our functions for each measurement have an associated error of less than 90 days, except for the function based on the WS, for which the error was 110.77 days. In other words, the associated error is in most cases similar to or lower than the error in those functions. However, we only recommend using the function for the LP because it provided one of the best results, both in terms of R2 (0.85) and the acceptable estimated error (30–61 days). Regarding the other functions, especially those relating to the squamous portion, we do not recommend using them because R2 was less than 0.70. Likewise, we do not recommend using the combined function: although it presented a good R2 (0.81–0.87) and a low estimation error (43–67 days), it does not improve on the results obtained with the LP alone.

The formulas we proposed here are presented through the use of inverse calibration, applying LOOCV. In this sense, the application of CV provides a methodology with greater precision, reduces model overfitting, and offers a more realistic error [19].

The equations we have presented provide valuable information on the age at death of fetal and postnatal individuals up to 1.5 years of age with a minimum estimation error of 59 and a maximum of 132 days in a Mediterranean sample. However, the method needs further validation on documented individuals of different geographical origins.

Conclusions

Our validation results indicate that the methods of age estimation through the pars petrosa proposed by Fazekas and Kósa and Nagaoka are not applicable to the Mediterranean population. The proposed regression model relating age to the pars petrosa length, height, and width explains a significant percentage of the total age variation in the contemporary population of non-adult individuals from the San José cemetery in Granada, Spain. However, we do not advise using the WP to estimate age due to its low R2 and low degree of repeatability and reproducibility among researchers.