Introduction

Sex and age are probably the two main parameters estimated in physical anthropology to identify or contextualize skeletal remains. In particular, the estimation of sex, in addition to its obvious usefulness for human identification, enables more accurate estimates of age, given that developmental processes are strongly linked to sex [1–3].

However, when the skeletal remains belong to children, the estimation of sex presents great difficulties, mainly for two reasons:

  1. Unlike age estimation, sex estimation is much less precise in children than in adults. This is because sexually dimorphic features appear mostly as a result of the change in hormone levels at puberty, so in individuals who have not yet reached it, the discriminating bone features are minimal [4]. Therefore, the methodology used to estimate the sex of adults (mainly through features of the skull and pelvis) is not applicable to children. In addition, other indicators such as chemical differences in bone, based on levels of citrate, calcium, phosphorus, and strontium, are also seen only in individuals who have reached reproductive age [4].

  2. It has not been possible to develop good methods because of the scarcity of samples available for research [1, 4, 5]. Such samples must consist of well-preserved, identified, and reasonably contemporary skeletons. To date, the most important osteological collections used for research are the Fazekas and Kósa (1978) [6], Spitalfields [7], Lisbon [8], Argentina [9], South Africa [10], and Chile [11] collections. However, these do not fully meet the minimum requirements above.

Despite these difficulties, numerous studies have demonstrated morphometric differences between the sexes in the skeleton of children. This has made it possible to design methods that estimate sex in this age group, although for now they are much less accurate than those for adults. These methods rely on morphological differences in the humerus [12], the dentition [13], visual features of the jaw and the pelvis [14], cephalometric measurements [15], or geometric morphometric analysis [16].

The method published in 1993 by Schutkowski [14], based on the macroscopic analysis of morphological differences in the ilium and jaw, may be the most prominent method so far because of its promising results and its simplicity. It was developed on 37 boys and 24 girls, aged between 0 and 11 years, from the Spitalfields collection [7]. Subsequently, numerous studies attempting to validate his results have been published, including those by Loth and Henneberg [17] with 7 boys and 12 girls from the South Africa collection; by Sutter et al. [11] with 30 male and 55 female mummies from Chile; by Vlak et al. [18] with 23 girls and 33 boys from the Lisbon collection; and by Cardoso and Saunders [19], who used 57 boys and 40 girls also from the Lisbon collection.

In general, these authors obtained very different results. For example, for the angle of the sciatic notch, Schutkowski [14] obtained a 95.0 % chance of success when the male trait was observed, Sutter et al. [11] achieved 75 %, and Vlak et al. [18] 54.5 %. Some authors attributed these discrepancies to the small size of the samples [20], others to the difficulty of defining the different indicators [19], and others to the separation of the results by age group [3].

Perhaps the main conclusion we can draw when comparing these studies is that sex estimation in infant skeletons is still an unattained goal for research in physical anthropology. It is also possible that the contributions these publications have made to identifying the constraints on sex estimation in children are more important than the validation of each of the indicators studied. In this work, we sought to add to those contributions by describing two major errors that, in our opinion, are present in most studies addressing this subject. These errors prevent proper comparisons between studies and hinder the proper interpretation of the results.

Problem statement

The following is a widespread error in the interpretation of results when methods for estimating sex in subadults are evaluated. The usual approach is to estimate sex in an identified population in order to obtain the success rate of the estimate for each sex; in other words, the percentage of boys and of girls showing a specific indicator is quantified [11, 12, 15–17, 19–24]. This approach describes the distribution of these morphological features in the studied population; however, it is far from indicating the ability of the method to correctly estimate the sex of an individual.

Briefly, to evaluate a method for estimating sex, the success rate obtained for each sex should not be calculated. Instead, we need to calculate the probability that the estimate is correct when a male or a female indicator is observed, which is a very different quantity.

Although the idea is simple, it can be hard to grasp without numbers, so Table 1 shows a hypothetical example illustrating the two possible interpretations. The example depicts the hypothetical results of a study to validate a method for estimating sex. These results have been simplified and exaggerated. The number of boys and girls in the sample showing each indicator is given.

Table 1 Different interpretations of the results for validation studies using methods for estimating sex. Hypothetical results

Interpretation A: sex was correctly estimated in 100 % of boys, whereas this occurred in only 50 % of girls. In conclusion, this is a good method for estimating sex in boys but not in girls. This is a common but incorrect interpretation.

Interpretation B: when the male indicator is observed, there is a 66 % chance that the individual is a boy, whereas when the female indicator is observed there is a 100 % chance that the individual is a girl. So this is a very good method for identifying girls but a poor one for boys. This interpretation is less common but correct.

The same results lead to completely opposite conclusions depending on the interpretation. It is very important to note that, from a forensic perspective, specific morphological features of the skeleton are the only data the anthropologist can analyze; consequently, the method must express the probability of being a boy or a girl when that feature is observed. We are not interested in the opposite interpretation, namely, the probability that an individual of known sex presents a particular trait. As a result, many studies that evaluate a method for estimating sex report confusing or conflicting conclusions, and their results cannot be used in forensic identification or compared with similar studies unless they are reinterpreted.
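The contrast between the two interpretations can be made concrete with a short calculation. The following Python sketch is illustrative only: the counts (100 boys, all showing the male indicator, and 100 girls split evenly between the two indicators) are assumptions chosen to reproduce the percentages quoted above, and the function names are hypothetical.

```python
# Hypothetical counts, assumed only to reproduce the quoted percentages.
counts = {
    "boys": {"male_trait": 100, "female_trait": 0},
    "girls": {"male_trait": 50, "female_trait": 50},
}


def success_rate_by_sex(counts):
    """Interpretation A: share of each known sex showing the matching trait."""
    boys, girls = counts["boys"], counts["girls"]
    return {
        "boys": boys["male_trait"] / sum(boys.values()),
        "girls": girls["female_trait"] / sum(girls.values()),
    }


def predictive_value_by_trait(counts):
    """Interpretation B: probability of the matching sex given the observed trait."""
    boys, girls = counts["boys"], counts["girls"]
    male_total = boys["male_trait"] + girls["male_trait"]
    female_total = boys["female_trait"] + girls["female_trait"]
    return {
        "male_trait": boys["male_trait"] / male_total,
        "female_trait": girls["female_trait"] / female_total,
    }


rates_a = success_rate_by_sex(counts)
rates_b = predictive_value_by_trait(counts)
print("Interpretation A:", rates_a)
print("Interpretation B:", rates_b)
```

Under these assumed counts, interpretation A favors boys (1.0 versus 0.5), whereas interpretation B favors the female indicator (1.0 versus about 0.67), even though both are computed from the same table.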

This error has already been reported by other authors [25]; however, many validation studies still use the incorrect interpretation.

Different numbers of boys and girls in the sample

This error occurs when the number of male and female individuals in the sample is not the same, which is the case in most studies [11, 12, 14–24, 26–28]. In fact, we have not found any previous work that has taken this factor into account.

As explained in the previous section, validation studies should report the probability of a correct sex estimate when a certain feature is observed. However, for this probability to be representative of the actual population, the sample should have a similar sex distribution, which we assume to be 1:1. In other words, if the sample contains more male than female individuals, any given feature is more likely to be observed in a male simply because males are more numerous.

To illustrate this error, we have used the same example presented in the previous section, but now it shows the results that would have been obtained if the number of female individuals were half the number of males (Table 2).

Table 2 Different results obtained when the sample has a different distribution of sexes. Hypothetical results

In this example, 50 % of girls still show each variant of the indicator, so only interpretation “B” of the results is affected: it now shows a higher probability of success when the male trait is observed (80 %). This means that if the sample contains fewer females than males, the probability of a correct estimate when a masculine trait is observed is higher than it would be in a real population. This may be particularly evident in the studies published by Schutkowski [14], Scheuer [20], and Sutter et al. [11], in which the number of boys was much higher than the number of girls (up to three times), so the probability of correctly assigning sex was strongly biased in favor of male traits.
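The effect of the sex ratio on this probability can be checked with a few lines of Python. This is a sketch under assumed counts, not a reproduction of Table 2: every boy shows the male trait and half of the girls do, and only the number of girls changes. The function name is hypothetical.

```python
def probability_boy_given_male_trait(boys_with_trait, girls_with_trait):
    """Unweighted probability that an individual with the male trait is a boy."""
    return boys_with_trait / (boys_with_trait + girls_with_trait)


# Assumed balanced sample: 100 boys (all with the trait), 100 girls (50 with it).
balanced = probability_boy_given_male_trait(100, 50)

# Assumed unbalanced sample: 100 boys (all with the trait), 50 girls (25 with it).
unbalanced = probability_boy_given_male_trait(100, 25)

print(round(balanced, 3), round(unbalanced, 3))
```

The trait frequencies within each sex are identical in both samples; the probability rises from about 0.67 to 0.80 only because girls are underrepresented.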

When only small samples are available, as with osteological collections of identified infants and young children, the solution obviously should not be to reduce the number of males or females in the sample, because useful information would be lost; instead, the results must be weighted, increasing the importance of the data from the underrepresented sex.

Justification and objectives

As Rösing et al. [29] stated, methods for estimating sex can be divided into complex but effective methods and simple but less effective ones. The choice depends on the specific circumstances, such as the state of preservation of the remains, the clarity of the morphological indicators, the available technical equipment, and the level of precision required. Given the aforementioned problems with the current methods of physical anthropology, a DNA test is recommended in the forensic context if resources and the state of the remains permit. However, this technique is not applicable to burned, degraded, or contaminated remains.

Although new technologies, such as geometric morphometric techniques or chemical analysis, should be used to improve the methods for estimating sex, we consider that macroscopic analysis remains the simplest tool and a useful one in many cases, if only to provide a first orientation. Therefore, the overall objective of this study was to validate and, if necessary, readjust the criteria used by Schutkowski [14] for the estimation of sex.

To this end, we analyzed one of the best osteological collections available for research. It was recently acquired by the Laboratory of Anthropology of the University of Granada and contains a large number of individuals compared with other similar collections. The skeletons are in very good condition, and extensive antemortem information is available from birth, death, and burial certificates for almost all individuals [30]. Furthermore, the results were analyzed and interpreted from a perspective different from that of most previous studies.

The specific objectives addressed in this study were as follows:

  • Validate the method by Schutkowski [14] for estimating sex in infant and juvenile skeletons in a sample of Mediterranean origin.

  • Provide new rates of correct sex assignment for each of the indicators studied, avoiding the methodological errors in analysis and interpretation observed in previous studies.

  • Compare the results with those of previous studies in different populations.

Material and methods

The sample was obtained from the osteological collection of identified infants and young children held at the Laboratory of Anthropology of the University of Granada (Spain). This collection comes from the Cemetery of San Jose, located in the city. It is a relatively recent sample (mid to late twentieth century) in an almost perfect state of preservation. Thanks to the existence of official documents, it has extensive antemortem information, such as the precise dates of birth and death, immediate and fundamental causes of death, last address, and months of pregnancy in the case of fetuses.

For the selection of the study sample, the following exclusion criteria were used: individuals with developmental, traumatic, or taphonomic alterations, which do not allow for correctly identifying the selected indicators, cause of death involving possible developmental changes or lack of “sex” data in official documents. Once these criteria were employed for the total of 230 individuals in the osteological collection, 185 (109 boys and 76 girls) were used for this study, aged 5 months of gestation to 6 years. The sample distribution by age and sex is shown in Table 3.

Table 3 Age at death (years) and sex distribution of sample

The ilium and left jaw of each individual were evaluated macroscopically by the first author in order to assign a sex estimate for each of the seven criteria used by Schutkowski [14]. Before compiling data for a study, it is essential that the investigator be trained in the use of these criteria to become familiar with the variability they can display. To facilitate identification of the different indicators, each definition was accompanied by images that clearly show the male and female morphology. The criteria used by Schutkowski [14], slightly modified, and the corresponding images are listed below:

  • Angle of the greater sciatic notch (Fig. 1): viewed from the ventral aspect with the auricular surface side of the angle aligned vertically. This angle is close to 90° in males and more obtuse (greater than 90°) in females.

    Fig. 1 Angle of the greater sciatic notch

  • Depth of the sciatic notch (Fig. 2): the notch is deeper in boys than in girls.

    Fig. 2 Depth of greater sciatic notch

  • “Arch” criterion (Fig. 3): the “arch” formed by drawing an extension from the inferior side of the greater sciatic notch crosses the auricular surface in girls and passes over it in boys.

    Fig. 3 "Arch" criterion

  • Curvature of the iliac crest (Fig. 4): seen from the top of the ilium, the iliac crest shows a pronounced S-shape in boys and a softer one in girls.

    Fig. 4 Curvature of the iliac crest

  • Morphologic features in the mandible (Fig. 5):

    A. Protrusion of the chin region: the chin is more prominent and square in boys than in girls.

    B. Shape of the anterior dental arcade: the alveoli of the canines protrude with respect to the molars.

    C. Eversion of the gonion region: this region protrudes in boys, whereas it is flatter in girls.

      Fig. 5 Mandibular traits

After completion of the study, data collection was repeated by the same observer and by a different observer on 30 randomly selected individuals in order to calculate the intra- and interobserver errors for each of the criteria employed. The kappa coefficient and the percentage of agreement between the original and the repeated observations were calculated.
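For readers unfamiliar with these two agreement measures, the sketch below shows one way to compute them in Python. The ten paired scores are invented purely for illustration (they are not data from this study), and the function names are hypothetical; the kappa formula is the standard two-rater Cohen's kappa.

```python
from collections import Counter


def percent_agreement(first, second):
    """Percentage of individuals given the same score in both sessions."""
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100 * matches / len(first)


def cohen_kappa(first, second):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(first)
    observed = sum(1 for a, b in zip(first, second) if a == b) / n
    freq_first = Counter(first)
    freq_second = Counter(second)
    expected = sum(
        freq_first[label] * freq_second[label] for label in freq_first
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented scores for one trait ("M" = male form, "F" = female form).
original = ["M", "M", "F", "F", "M", "F", "M", "M", "F", "F"]
repeated = ["M", "M", "F", "M", "M", "F", "M", "F", "F", "F"]

agreement = percent_agreement(original, repeated)
kappa = cohen_kappa(original, repeated)
print(agreement, round(kappa, 2))
```

With these invented scores, 8 of 10 individuals are scored identically, but half of that agreement is expected by chance, which is why kappa is lower than the raw percentage suggests.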

To check if the traits investigated were actually related to the sex of the individuals, the data were analyzed using chi-square analysis.
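As an illustration of this kind of test, the following Python sketch computes Pearson's chi-square statistic for a 2 × 2 table of sex by trait form. The counts are invented, the function name is hypothetical, and no continuity correction is applied; whether the original analysis used such a correction is not stated, so this is an assumption.

```python
def chi_square_2x2(boys_male, boys_female, girls_male, girls_female):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table.

    Assumes every row and column total is greater than zero.
    """
    observed = [[boys_male, boys_female], [girls_male, girls_female]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE = 3.841

# Invented counts: 60 boys with the male form, 40 with the female form;
# 30 girls with the male form, 70 with the female form.
statistic = chi_square_2x2(60, 40, 30, 70)
print(round(statistic, 2), statistic > CRITICAL_VALUE)
```

A statistic above the critical value indicates that the trait is associated with sex in the sample; it does not by itself say how often an estimate based on that trait would be correct.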

To validate the method of Schutkowski, the positive predictive value was calculated for each trait observed. The percentage of successful estimates for each sex was not calculated, since this information does not actually represent the reliability of the method (see “Introduction”; interpretation A in Table 1) [25].

To calculate the probability of a correct estimate when female or male characteristics were observed, it was necessary to weight the results to account for the unequal numbers of boys and girls in the sample (detailed explanation in “Introduction”; Table 2). To do so, we calculated the equivalent number of individuals who would show each trait if the sample had the same number of boys as girls (0.5 probability for each sex):

$$ {A}^1=\frac{A\times 50}{X} $$
$$ {B}^1=\frac{B\times 50}{Y} $$

Where:

  • A: Number of boys showing each trait

  • B: Number of girls showing each trait

  • X: Percentage of boys in the study sample

  • Y: Percentage of girls in the study sample

  • A¹: Equivalent number of boys that would show this feature if the sample had 50 % of each sex

  • B¹: Equivalent number of girls that would show this feature if the sample had 50 % of each sex

The correct probability (weighted) was calculated using the following formula:

$$ \mathrm{Probability}\ \mathrm{of}\ \mathrm{correctly}\ \mathrm{estimating}\ \mathrm{sex}\ \mathrm{with}\ \mathrm{male}\ \mathrm{features}=\frac{A^1}{A^1+{B}^1} $$
$$ \mathrm{Probability}\ \mathrm{of}\ \mathrm{correctly}\ \mathrm{estimating}\ \mathrm{sex}\ \mathrm{with}\ \mathrm{female}\ \mathrm{features}=\frac{B^1}{A^1+{B}^1} $$
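The weighting above can be sketched in Python as follows. The function and variable names are hypothetical, and the example counts are invented (100 boys and 50 girls, with 100 boys and 25 girls showing the male form of a trait); they are not data from this study. Note that, because X and Y are the sex percentages of the sample, the weighted ratio is equivalent to comparing the proportion of boys with the trait to the proportion of girls with it.

```python
def weighted_probabilities(a, b, n_boys, n_girls):
    """Return (P(boy | trait), P(girl | trait)) weighted to a 1:1 sex ratio.

    a, b: number of boys and girls showing the trait (A and B in the text).
    n_boys, n_girls: number of boys and girls in the whole sample.
    Assumes at least one individual shows the trait.
    """
    total = n_boys + n_girls
    x = 100 * n_boys / total   # X: percentage of boys in the sample
    y = 100 * n_girls / total  # Y: percentage of girls in the sample
    a1 = a * 50 / x            # equivalent number of boys at a 50 % share
    b1 = b * 50 / y            # equivalent number of girls at a 50 % share
    return a1 / (a1 + b1), b1 / (a1 + b1)


# Invented unbalanced sample: 100 boys, 50 girls; male form seen in 100 boys, 25 girls.
unweighted = 100 / (100 + 25)
p_boy, p_girl = weighted_probabilities(100, 25, n_boys=100, n_girls=50)
print(round(unweighted, 3), round(p_boy, 3), round(p_girl, 3))
```

For a male trait the first returned value is the probability of a correct estimate; for a female trait it is the second. In this invented case the weighting lowers the probability for the male form from 0.80 to about 0.67, the value a sample with equal numbers of each sex would give.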

The results are shown jointly for all ages and also separately for individuals under 3 months of age (n = 108) and older (n = 77). This cutoff was chosen on the basis of changes in hormone levels at these ages: in males, circulating levels of luteinizing hormone, follicle-stimulating hormone, and testosterone, which are responsible for the emergence of sex differences, begin to rise around the tenth week of gestation and decline shortly before birth; after birth, they increase again and re-stabilize around 3 months of age. Until puberty, these levels are approximately equal in boys and girls [31–33].

Results

Table 4 lists the results for intra- and interobserver error.

Table 4 Intra- and interobserver error: Cohen’s Kappa coefficient (K) and % agreement on traits studied (N = 30)

The results of the chi-square analysis of the sexual dimorphism of the traits studied are shown in Table 5.

Table 5 Chi-square analysis for the sexual dimorphism of the traits studied

The numbers of male and female individuals showing each feature, as well as the weighted probability of correctly estimating sex with Schutkowski’s method [14], are shown in Table 6.

Table 6 Individuals with each feature and weighted positive predictive values for each sex

Discussion

The criteria proposed by Schutkowski [14] for estimating sex in subadults from the morphological features of the ilium and jaw have been evaluated. To this end, one of the largest collections of identified children [30] was used, and the results were interpreted differently from most previous studies. The values obtained were generally much lower than those reported by Schutkowski [14] (Table 6).

The results for intra- and interobserver error showed, in general, poor reliability in scoring the proposed traits. The best results were obtained with the angle and depth of the sciatic notch and with the shape of the anterior dental arcade, which showed good or very good agreement according to the Kappa coefficient. All other traits are very difficult to identify because of the subjectivity with which they are defined. Comparing these results with those reported by Cardoso and Saunders [19], who analyzed sex estimation using the arch criterion, we conclude that this trait should not be used to design methods for estimating sex in infants and young children. Unfortunately, research examining the reproducibility and repeatability of this method is very scarce, so it has not been possible to compare our results for the other traits. However, this study suggests that the difficulty of identifying the traits is an important factor that compromises the usefulness of this method.

A statistically significant relationship with sex was found for four of the traits studied when all age groups were analyzed, and for none of them when only individuals younger than 3 months (including fetuses) were analyzed (Table 5); however, in neither of these groups did the probability of a correct estimate exceed 0.67 for any criterion, a value that is unacceptable in forensic contexts.

The best results were obtained in the group older than 3 months of age. In this case, the positive predictive value was 0.73 with the angle of the sciatic notch and 0.80 with sciatic notch depth, but in both cases only when masculine traits were observed. If feminine traits were observed, the positive predictive value was lower (0.59 and 0.68, respectively). The other indicators in this age group also showed very low values.

Hormonal differences between females and males are conditioned by the presence or absence of testes. Any individual, in the absence of androgens, will develop typical female characteristics [34]. This is why, in our study and in previous studies that have attempted to estimate sex in children, the probability of a correct estimate is higher when male traits are observed. Put another way, in this age group it is easy to find boys who look like girls but hard to find girls who look like boys.

There is controversy among authors about separating the results by age group. On the one hand, it is widely known that the differences between the sexes appear gradually and become more accentuated with age; for this reason, most authors have created artificial age classes to separate the results. On the other hand, recent studies have suggested that these separations can increase the error rather than reduce it [3]; instead, they proposed methods that include age as a continuous variable in the estimation of sex itself. Indeed, such age classes do not exist at the biological level, because the changes occur gradually and continuously during development. When these classifications are used, they serve a concrete objective that is specific to each case, discipline, or area of study. Therefore, the limits of the age groups should be defined according to the characteristics of the process under study, and classifications designed for other contexts should not be used [35]. In this study, we chose to separate the results by age group; however, this separation was not arbitrary: 3 months was selected as the cutoff on the basis of information on differences in hormone levels between boys and girls during development [31–33], since the appearance of discriminating features depends on them.

Few studies have directly tested Schutkowski’s method, possibly because of the difficulty of accessing identified collections. To compare previous results with those obtained in this study, the previously published data were recalculated following the requirements outlined in this paper (Tables 1 and 2). Likewise, only the results for the group aged between 0 and 6 years were used (Table 7). Only the results published by Cardoso and Saunders [19] using the Lisbon collection could not be adapted, owing to the nature of the data. The positive predictive values obtained in this study were generally lower than those obtained by other authors, except for those published by Vlak et al. [18], whose values were abnormally low compared with the rest, all the more so considering that the angle and depth of the sciatic notch were precisely the best criteria for the estimation of sex in all other studies. The main difference from other studies concerned the curvature of the iliac crest and the chin, which proved very effective for the estimation of sex in the collections of London [5] and Chile [11] but gave very low probabilities in our study.

Table 7 Comparison of results with other studies

The differences observed among these studies were probably due to differences in the characteristics of the samples, the methods of analysis, or the interpretation of results. However, examining the similarities, we can conclude that the most reliable traits were those of the sciatic notch, namely its angle and depth. Other studies did not directly analyze the method of Schutkowski, but they can also be used to corroborate our results. Garcia-Mancuso and Gonzalez [16] analyzed the geometric morphometrics of the ilium in subadult individuals from a recently acquired identified collection from Argentina. They concluded, as we did, that the sciatic notch, and in particular its depth, was the most sexually dimorphic trait in subadults. However, these results should not be used to estimate sex in forensic contexts, since the positive predictive values are still low and the intraobserver and interobserver error is not acceptable.

From now on, it will be important to consider why these differences exist, in order to guide future studies toward effective methods for estimating sex. In this regard, Wilson et al. (2014) suggest that this variability could be due to differences in growth rates between boys and girls rather than to morphological differences, so that it would change continually during development. Therefore, the best solution might be to propose new methods that include age as a continuous independent variable in the estimation of sex. This should be done through improved techniques of observation and statistical analysis, using the new technologies and resources at our disposal.

In this study, the method proposed by Schutkowski [14] for the estimation of sex in subadults was tested. To do this, one of the most important collections of identified infants and young children available for research was used; however, it can still be considered insufficient for developing effective methods, because it did not allow the effect of other factors, such as population origin or age, to be assessed as independent variables. Although the conditions are not ideal, validation studies allow us to move closer to consensus, identify problems, and gradually perfect the methods of analysis.

Conclusions

Schutkowski’s method is not acceptable for forensic purposes. The rates of correct sex assignment in juvenile skeletons from the macroscopic analysis of morphological differences in the ilium and jaw were lower than those published by Schutkowski [14]. Moreover, only three of the seven traits analyzed showed low intra- and interobserver error: the angle and depth of the sciatic notch and the shape of the anterior dental arcade. Finally, it is important to note that Schutkowski’s study, as well as those subsequently published to validate its results, contains errors in the methodology of analysis and in the interpretation of results.