Introduction

In the forensic anthropological analysis of human skeletal remains, the estimation of sex is the first step in the construction of the biological profile of an individual [1]. Determination of sex by DNA analysis is more accurate than other available methods, but its application has practical limitations, such as cost and the lack of sufficient DNA in poorly preserved skeletons [2]. Therefore, forensic anthropologists use various morphological and metric methods to estimate sex from skeletal remains. Several bones of the human skeleton show marked sexual dimorphism and are suitable for sexing individuals.

Among morphological methods, the pelvis and the skull are the elements most often used to estimate sex. The pelvic bone is known to be the most sexually dimorphic bone of the human skeleton, especially in adults [3, 4]. Several methods have therefore been developed that rely on visual assessment or scoring of its morphological traits. In 1969, Phenice [4] proposed a method of sex estimation based on the presence or absence of the ventral arc, the subpubic concavity, and the medial aspect of the ischio-pubic ramus. The method has since been tested extensively by different researchers and has shown high accuracy (> 80%) [5,6,7,8,9]. In 2012, Klales and colleagues [8] revised the Phenice method by assigning ordinal scores to all three morphological traits, in contrast to the original method. More recently, this revised method has been recalibrated on skeletons from American, Hispanic, South African, and Thai populations, and a new validated regression formula for sex estimation has been developed [10].

In forensic cases in which only dismembered or fragmented skeletal remains are recovered, forensic anthropologists are expected to build the biological profile from whatever bones are available. In such conditions, well-established methods that require a complete innominate are less useful. The preauricular sulcus is a frequently studied dimorphic trait of the innominate, and its presence is said to indicate female sex [11, 12]. It was initially believed to be a marker of parturition [13], but it was later found in males as well [14]. Studies on preauricular sulcus morphology [15,16,17] showed that it can be used as a reliable indicator for sex estimation. The morphology of the greater sciatic notch is another dimorphic trait of the innominate: the notch tends to be narrow in males and relatively wide in females [18, 19]. Its shape is influenced by the size of the pelvic bone and by the development of structures around its margins, such as the ischial spine [18]. Studies on this trait have also shown that it can be used for sex estimation with reasonable accuracy [18,19,20].

Sexual differences are usually not apparent on visual inspection of long bones, but metric methods combined with a statistical approach can estimate sex with high accuracy [21,22,23,24]. Metric methods are considered to perform better than morphological methods because they largely eliminate the subjectivity inherent in morphological assessment and thus reduce inter- and intra-observer error [25, 26]. Although metric methods achieve reasonable accuracy, standards derived from one population should not be applied to another, because the degree of sexual dimorphism can vary significantly between populations [27].

The biomechanical environment is known to influence the morphology and structure of bone [28]. Hormone levels, differences in growth rates, and disease processes also affect bone morphology [29]. Aging influences skeletal morphology in different ways. During aging, bone is resorbed from the endocortical surface, and this loss is compensated by periosteal apposition and bone enlargement [30]. Moreover, estrogen deficiency after menopause leads to loss of trabecular bone. Age-related periosteal bone formation has also been reported to differ between the sexes [31,32,33]. Age-related changes in the skeleton can therefore sometimes lead to erroneous sex estimates [34]. Indeed, an increased tendency toward misclassification has been noted when metric methods of sex estimation were applied to the patella [35] and scapula [36] of older individuals.

Post-menopausal women have been reported to show more masculine cranial features than younger women [9]. The effect of age on pelvic morphological traits has not been studied extensively. Lovell [37] tested the Phenice method and noted that its accuracy decreases in old age. The morphology of the greater sciatic notch is also said to change with age, and Walker [20] reported that older females are likely to be misclassified as males on the basis of this trait.

In the past, many metric studies on sex estimation of long bones were undertaken on European populations [38,39,40,41,42,43,44,45]. Some studies have been conducted on elderly populations, but many of them focused on single bone measurements [39,40,41,42].

The aim of the present study is to assess the reliability of pelvic morphological traits and long bone metrics for estimating the sex of individuals in middle and late adulthood from an Italian skeletal collection. A further aim is to determine whether morphological traits should be preferred over metric methods for sex estimation when skeletal remains are fragmented.

Material and methods

This study was carried out on skeletons of known age and sex from the CAL Milano Cemetery Skeletal Collection [46]. The collection consists of 2127 skeletons housed in the LABANOF (Laboratorio di Antropologia e Odontologia Forense), in the Department of Biomedical Sciences for Health, University of Milan (Italy), and is available for research purposes in accordance with article 43 of the Italian National Police Mortuary Regulation [47]. The collection consists of skeletons that had been buried in the cemeteries of Milan and were then exhumed by cemetery workers, with the aid of machinery, after 10 years of burial. Each skeleton is associated with documentation that includes the dates of birth and death, age, sex, cause of death, and details of pathological and traumatic conditions. The collection contains individuals who died between 1910 and 2001; however, 85% of them died after 1980.

At the time of the study, only a few hundred skeletons had been cleaned. Among these skeletons, the best preserved ones were chosen for this investigation.

Accordingly, a total of 164 adult skeletons (74 males and 90 females) were selected from the CAL Milano Cemetery Skeletal Collection. All selected individuals were born after 1905 and died between 1986 and 1998, with a mean age-at-death of 74.9 years (Table 1). Bones with evidence of antemortem trauma, morphological deformity, or pathological or taphonomic changes that would alter the measurements were excluded. When some bones of a selected skeleton did not meet these criteria, only those bones were excluded, and the remaining bones of the same individual were retained for the study.

Table 1 Age distribution of the study sample

Two forensic pathologists who are trained in physical anthropology carried out the study.

Morphological methods

For the morphological methods, the subpubic concavity (SPC), medial aspect of the ischio-pubic ramus (MA), and ventral arc (VA) were scored as described by Klales and colleagues [8]. Sex was then estimated from these pubic bone traits using the logistic regression equation derived from the recalibration of the Klales et al. method on global pooled samples [10]: Y = 1.42969(VA) + 1.0415(SPC) + 0.9752(MA) − 10.0139. The value 0 was taken as the sectioning point: individuals with negative values were classified as female and those with positive values as male. The greater sciatic notch and the preauricular sulcus were scored as described by Buikstra and Ubelaker [11]. For the greater sciatic notch, scores 1 and 2 were classified as female, score 3 as indeterminate, and scores higher than 3 as male. For the preauricular sulcus, only its presence or absence was assessed: absence was classified as male and presence as female, irrespective of its morphological score. Left innominate bones were used for all morphological methods.
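The decision rules above can be summarized in a short Python sketch. It only transcribes the published coefficients and cut-offs; the function names and the integer encoding of scores are illustrative assumptions, and returning "indeterminate" when Y is exactly 0 is an added assumption, since the text does not cover that case.

```python
def klales_recalibrated_y(va: int, spc: int, ma: int) -> float:
    """Y from the recalibrated Klales et al. equation on global pooled samples [10]."""
    return 1.42969 * va + 1.0415 * spc + 0.9752 * ma - 10.0139


def sex_from_pubic_traits(va: int, spc: int, ma: int) -> str:
    """Sectioning point 0: negative Y -> female, positive Y -> male."""
    y = klales_recalibrated_y(va, spc, ma)
    if y < 0:
        return "female"
    if y > 0:
        return "male"
    return "indeterminate"  # Y == 0 is not covered by the text (assumption)


def sex_from_sciatic_notch(score: int) -> str:
    """Greater sciatic notch: scores 1-2 female, 3 indeterminate, >3 male."""
    if score <= 2:
        return "female"
    if score == 3:
        return "indeterminate"
    return "male"


def sex_from_preauricular_sulcus(present: bool) -> str:
    """Preauricular sulcus: present -> female, absent -> male, whatever its score."""
    return "female" if present else "male"


# Example with hypothetical scores (not study data):
print(sex_from_pubic_traits(va=2, spc=3, ma=3))  # female (Y is about -1.10)
```

Because the sectioning point is 0, only the sign of Y matters for classification; the magnitude of Y indicates how far an individual lies from the cut.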

In addition, inter- and intra-observer agreement was measured with Cohen’s kappa on 30 repeated assessments of each morphological trait: inter-observer agreement between two trained operators, and intra-observer agreement between two assessments by the primary investigator 1 week apart.
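As a reference for the statistic, a minimal sketch of unweighted Cohen's kappa for two sets of scores given to the same bones is shown below. It assumes the unweighted form (the text does not state whether a weighted kappa was used for the ordinal scores), and the function name is hypothetical. In practice, `sklearn.metrics.cohen_kappa_score` computes the same unweighted quantity.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(scores_a: Sequence[Hashable], scores_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two sets of scores given to the same items."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("both score lists must cover the same, non-empty set of items")
    n = len(scores_a)
    # Observed agreement: proportion of items given the identical score.
    p_observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies per category.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_chance == 1:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical repeated scores of one trait on five bones (not study data):
print(cohens_kappa([1, 2, 3, 1, 2], [1, 2, 3, 2, 2]))  # approximately 0.6875
```

Kappa corrects raw percentage agreement for the agreement expected by chance alone, which is why it is preferred over simple percent agreement for repeated trait scoring.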

Metric methods

Postcranial measurements were taken as described by Buikstra and Ubelaker [11] using an osteometric board and a digital sliding caliper (Table 2), and their inter- and intra-observer reliability was then assessed. Inter-observer reliability was assessed by having a second operator repeat the measurements on the entire study sample, and intra-observer reliability by having the primary investigator repeat 30 skeletal measurements after a 1-week interval. The intraclass correlation coefficient (ICC) was calculated to quantify the agreement between measurements.
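The study does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement) from a table with one row per bone and one column per rater or session; both the choice of form and the function name are assumptions.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` holds one row per subject (bone); each row contains the k
    measurements of that subject, one per rater or measuring session.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each measured by the same k >= 2 raters")
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout and Fleiss's textbook example: 6 targets rated by 4 judges.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # 0.29
```

The absolute-agreement form penalizes systematic offsets between raters (the column term), which matters when two observers measure the same bones with the same instruments.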

Table 2 List of postcranial measurements and morphological features assessed in this study

Right and left measurements were checked for asymmetry in both sexes, and no statistically significant asymmetry was found. All available left-side measurements were therefore used for metric sex estimation. For each measurement, the mean and standard deviation were calculated for each sex, and a t test was used to determine whether males and females differed significantly. The sectioning point for each variable was obtained by averaging the male and female mean values [25]. Individuals whose measurement lay above the sectioning point were classified as male, and those whose measurement lay below it as female.

The classification rates for each measurement were calculated as described by Barnes and Wescott [48].
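The sectioning-point procedure can be sketched as follows. The sketch assumes that males have the larger mean (as for the long bone measurements described here), treats a value exactly on the cut as indeterminate, and takes the classification rate to be the percentage of correctly classified individuals per sex and overall; the exact procedure of Barnes and Wescott [48] may differ, and the measurements in the example are invented.

```python
from statistics import fmean
from typing import Sequence, Tuple


def sectioning_point(male: Sequence[float], female: Sequence[float]) -> float:
    """Average of the male and female sample means for one measurement."""
    return (fmean(male) + fmean(female)) / 2


def classify(value: float, cut: float) -> str:
    """Above the sectioning point -> male, below -> female (assumes larger male mean)."""
    if value > cut:
        return "male"
    if value < cut:
        return "female"
    return "indeterminate"  # exactly on the cut: not covered by the text (assumption)


def classification_rates(
    male: Sequence[float], female: Sequence[float], cut: float
) -> Tuple[float, float, float]:
    """Percent correctly classified among males, among females, and overall (assumed definition)."""
    male_ok = sum(classify(v, cut) == "male" for v in male)
    female_ok = sum(classify(v, cut) == "female" for v in female)
    return (
        100 * male_ok / len(male),
        100 * female_ok / len(female),
        100 * (male_ok + female_ok) / (len(male) + len(female)),
    )


# Invented measurements in mm, not study data:
males = [80.0, 82.0, 78.0, 74.0]
females = [70.0, 72.0, 74.0, 68.0]
cut = sectioning_point(males, females)            # 74.75
print(classification_rates(males, females, cut))  # (75.0, 100.0, 87.5)
```

Averaging the two means places the cut midway between the sexes, which is simple and transparent but ignores differences in sample size and variance between males and females.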

Results

Intra- and inter-observer agreement for the morphological assessment of pelvic traits is reported in Table 3. Agreement was interpreted using the following criteria [49]: kappa < 0, poor agreement; 0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; 0.81–1.00, almost perfect. On this basis, inter-observer agreement ranged from slight to moderate, with the subpubic concavity and the greater sciatic notch showing the highest agreement. By contrast, intra-observer agreement was almost perfect for all morphological traits except the subpubic concavity, which nevertheless showed substantial agreement.
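The kappa bands above can be expressed as a small lookup. This is a minimal sketch; the function name is hypothetical, and assigning values that fall between the published two-decimal bands (e.g. 0.205) to the higher band is an assumption.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the agreement categories used in this study [49]."""
    if kappa < 0:
        return "poor"
    # Upper bounds of the published bands; values between two bands
    # (e.g. 0.205) fall into the higher band here (assumption).
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


print(landis_koch_label(0.55))  # moderate
```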

Table 3 Cohen Kappa values for the intra- and inter-observer agreement of morphological traits assessment

With regard to the application of the logistic regression equation derived from the recalibration of the Klales et al. method on global pooled samples [10], both males and females were correctly classified in 100% of the cases (Table 4).

Table 4 Sex estimation accuracy of the logistic regression equation derived from the recalibration of the Klales et al. method on global pooled samples

Assessment of the preauricular sulcus and of the greater sciatic notch morphology (Table 5) showed that the preauricular sulcus had the higher combined accuracy (89.6% vs 85.4%).

Table 5 Results of sex estimations by the assessment of pelvic morphological traits

For the measurements, intra- and inter-observer agreement was interpreted using the ICC criteria of Koo and Li [50]: values below 0.50 indicate poor agreement, values between 0.50 and 0.75 moderate agreement, between 0.75 and 0.90 good agreement, and above 0.90 excellent agreement. Most measurements showed “excellent” intra- and inter-observer agreement (Table 6). The only exceptions were the maximum length of the tibia and the vertical head diameter and epicondylar breadth of the humerus, which showed “good” agreement between the two raters.
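The ICC bands can likewise be written as a lookup. This is a minimal sketch with a hypothetical function name; because the published bands share their endpoints, placing exactly 0.75 and 0.90 in the "good" band is an assumption.

```python
def koo_li_label(icc: float) -> str:
    """Koo and Li [50] ICC bands; handling of the shared endpoints is assumed."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


print(koo_li_label(0.93))  # excellent
```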

Table 6 Intraclass correlation coefficients (ICC) for intra-rater and inter-rater agreement of metric measurements

Table 7 provides the descriptive statistics, sectioning points, and classification rates for all left-side measurements. All t tests comparing male and female values were significant (p < 0.005), and classification rates ranged from 75.19% to 90.73%. The highest classification rate was obtained with the maximum epiphyseal breadth of the proximal tibia, and the lowest with the maximum length of the femur.

Table 7 Descriptive statistics of all measurements taken for this study with sectioning points and classification rates

Discussion

Visual methods of sex estimation from pelvic morphological traits give relatively quick results [34]. In our study, the logistic regression equation derived from the recalibration of the Klales et al. method estimated sex with 100% accuracy. Similar results have been reported for a Mexican population [51], and the same formula has shown high accuracy (> 95%) for American white and South African white populations [10]. The present results confirm that the subpubic concavity, the medial aspect of the ischio-pubic ramus, and the ventral arc remain highly dimorphic in individuals of advanced age. They also confirm that the recalibrated Klales et al. formula can be used successfully with European individuals in middle and late adulthood.

The preauricular sulcus has been studied as a dimorphic trait in different populations, including American, Japanese, and German samples, with conflicting results regarding its accuracy [15,16,17]. In the present study, accuracy was higher for females (100%) than for males (79.2%), and the same trend was observed in the Hamann-Todd Human Osteological Collection [17]. The combined accuracy obtained in our study (89.6%) was higher than that reported for the William M. Bass and Terry Collections (78.8%) [16] and for the Hamann-Todd Human Osteological Collection (75.8%) [17]. This observation confirms that the preauricular sulcus is highly sexually dimorphic in the tested Italian population.

The ordinal scoring system of the greater sciatic notch is useful for sex estimation from fragmented pelvic bones, and the notch was the best-preserved morphological feature in the study sample. Based on greater sciatic notch morphology, accuracy was higher for females (88.5%) than for males (82.3%). Similar results were obtained by Novak and colleagues [16] in their study of the William M. Bass and Terry collections. Walker [20], in a study of Americans of European or African ancestry, found a relationship between age-at-death and greater sciatic notch morphology: elderly females tend to have narrower, more “masculine” sciatic notches and are therefore likely to be misclassified as males. In the present study, however, accuracy was higher for females than for males. The combined accuracy of 87.1% showed that greater sciatic notch morphology is highly sexually dimorphic in the tested Italian population and that age seems to have minimal influence on it.

In the current study, all the pelvic morphological traits tested showed good sexual dimorphism. The main limitations of these methods in the tested sample concerned their applicability, repeatability, and reproducibility. Applicability depends on the state of preservation of the bone, which varies considerably between regions of the pelvis. In this sample, the greater sciatic notch was better preserved than the pubic bone, and the same pattern has been observed in other studies [52, 53]. Although the Klales et al. method yielded 100% accuracy, poor preservation of the pubic traits restricted its applicability to less than half of the study sample.

Morphological scores depend on visual assessment, which is strongly affected by subjectivity [54, 55]. The main problem with ordinal scoring of morphological traits is the difficulty of applying it reliably and replicably to large samples [56]. A high degree of observer subjectivity, inconsistent evaluation of traits, and a strong influence of the observer’s previous experience are among the reported factors that can affect sex estimation based on morphological characteristics [18]. In the present study, inter-rater agreement was only slight to moderate, although intra-rater agreement was substantial to almost perfect, in line with previous studies that highlighted the limited repeatability and reproducibility of sex estimation methods based on morphological traits [57, 58].

For the metric method, ICC values showed good to excellent inter-observer agreement. Together with the excellent intra-observer agreement, this confirms the repeatability and reproducibility of the metric method in the tested sample.

The metric analysis revealed high classification rates (> 80%) for most parameters, indicating marked sexual dimorphism in the tested Italian population and showing that several long bone measurements can be used successfully for sex estimation. The maximum epiphyseal breadth of the proximal tibia and the epicondylar breadth of the femur discriminated best between males and females. The former also gave the highest classification rate in an American white population, and the latter in an American black population [25].

In agreement with previous studies on long bone metrics [27, 59, 60], bone length proved less useful than epiphyseal breadth for discriminating between males and females, although it still provided good classification rates. This applies to the tibia [26, 34, 61] and the femur [17, 48], as well as to the bones of the upper limb [37, 62, 63].

Conclusion

This is the first study of postcranial bones from an Italian sample of individuals in middle and late adulthood to compare the sexing accuracy of pelvic morphological traits and selected long bone measurements. The results show that the tested sexing methods are valid with a high degree of confidence, including for individuals in late adulthood.

When the pubic bone is available, the logistic regression equation derived from the recalibration of the Klales et al. method seems to be the most reliable way to estimate sex. However, in both archeological and forensic contexts these traits may not be assessable because of bone degradation. In such cases, other morphological traits of the pelvis, such as the greater sciatic notch and the preauricular sulcus, together with the epiphyseal breadths of long bones, discriminate well between males and females of Italian descent.