Introduction

Sex estimation of skeletal material is one of the most fundamental tasks of forensic and physical anthropologists. Despite the revolutionary advances in DNA methods in forensic science in recent years, the morphological methods used for estimating sex remain relevant for several reasons, such as the degradation of DNA under certain forensic circumstances (e.g., fires) [1].

Various methods for sex estimation, based on different parts of the skeleton, have been reported [2–12]. Some of these methods rely on morphological features (descriptive), whereas others are based on measurements. The metric methods have a major advantage over the descriptive ones, since they are less dependent on the judgment of the observer [13]. A relatively new method, geometric morphometrics, has been applied for sex estimation to overcome the disadvantages of the morphological method [14–17]. Although this method yields good results [16, 18], it requires both special equipment and a specialized researcher. It is therefore not practical for most forensic and physical anthropologists.

Mandibles are both sexually dimorphic and durable (i.e., recovered intact or in an adequate condition) and are thus good candidates for sex estimation of unknown individuals [7, 15, 19–21]. Some studies have used metric characteristics of mandibles to create discriminant functions for sex identification [15, 19, 22]. These studies focused on standard measurements of the mandible such as mandibular angle, bicondylar and bigonial breadths, ramus height, and symphysis height. Other studies used descriptive methods, e.g., flexure of the ramus, shape of the chin, and gonial flaring [6, 23–26]. These methods, based either on continuous or discrete variables, suffer from various deficiencies; to wit, the variables for sex estimation were either arbitrarily selected or statistically picked from a small pool of measurements; the selected metric variables were limited by the available measuring tools (e.g., caliper); most methods were not cross-validated and did not respond to forensic needs (e.g., often only a fragment of the mandible is available); and they were constructed on samples derived from homogeneous populations. With regard to the latter, nowadays, most societies have become more heterogeneous owing to the increase in human mobility between countries and continents [27].

In recent years, recognition of the contribution of various imaging techniques, especially computed tomography (CT) scans, to postmortem investigation has increased [28–32], accompanied by studies that ensure the validity and reliability of these techniques [33–37]. Accordingly, CT is becoming a common diagnostic tool in many forensic institutes. Thus, the need for a CT-based method to estimate sex from skeletal remains has emerged. The aim of this study was to develop a CT-based method for sex estimation using the mandible that overcomes many of the deficiencies of past studies.

Materials and methods

The study sample was derived from the current Israeli population. This population is particularly suitable for studying biological variation in heterogeneous populations due to its extensive mixture of people migrating to Israel from different parts of the world. The study design is retrospective. Head and neck CT scans of 438 individuals (214 males and 224 females), over the age of 20 years, were randomly selected from a pool of CT scans carried out between the years 2000 and 2012 at the Carmel Medical Center, Haifa, Israel (Brilliance 64, Philips Medical System, Cleveland, Ohio; slice thickness 0.9–3.0 mm, pixel spacing 0.3–0.5 mm, 120 kV, 250–500 mA, number of slices 150–950, and matrix 512 × 512). All CT scans were carried out for diagnostic purposes on patients for whom a CT exam was medically necessary. Inclusion criteria were as follows: age ≥20 years, intact lower incisors, and at least two teeth of the posterior unit (premolars and/or molars) on each side. Exclusion criteria included the absence of the lower incisors; dental implants and metal restorations that interfere with the measurement; prominent facial and mandibular asymmetry; cranio-facial, temporomandibular joint, or muscular disorders; trauma; previous surgery in the head and neck region (medical files or signs on the skull); and technically aberrant CT scans. This study was approved by the ethical board of the Carmel Medical Center (number 0066–11-CMC).

Two sets of measurements (Table 1) were taken using the Philips portal (thin client). The first set (n = 13) includes surface (external) linear measurements from a 3D reconstruction of the mandible, using the volume rendering application of the software (Fig. 1). The second set (n = 12) includes internal linear and area measurements from 2D images or cross sections of the mandible (Fig. 2). Measurements of the mandibular body and symphysis region were taken in relation to the mandibular plane.

Table 1 Definitions of CT-based external and internal mandibular measurements
Fig. 1

Measurements of the mandible taken from a 3D model using the volume rendering technique. Anatomical definitions for each measurement appear in Table 1. Note that measurements of mandibular body height were taken when the mandible was positioned in the mandibular plane (the inferior margins of the mandibular body are positioned parallel to the horizontal plane) in a lateral view (i.e., ascending rami overlap)

Fig. 2

Measurements of the mandible taken from lateral and cross-sectional images. Anatomical definitions for each measurement appear in Table 1. Note that cross sections of the mandibular body and symphysis were carried out perpendicular to the mandibular plane (the inferior margins of the mandibular body are positioned parallel to the horizontal plane). The antegonial notch is the space created between the inferior margin of the mandibular body and the mandibular plane

In many forensic/archeological cases, the mandible is incomplete; therefore, we calculated discriminant functions for sex estimation for five different states of completeness of the mandible (hereafter referred to as scenarios I to V). Scenario I relates to a complete mandible; therefore, all CT-based measurements could be included in the regression analysis. Scenario II relates to half a mandible (from ramus to chin); thus, measurements of the ramus (length, width, and cross-sectional area (CSA)), body (length, heights, and CSAs), coronoid (height, width, and CSA), condyle (width), mandibular angle region (angle, width, and CSA), and antegonial notch area could be included in the regression analysis. Scenario III relates to a fracture of the mandible where only the mandibular arch (without rami) exists. Here, three external measurements (body height at the premolar and molar regions and chin width) and six internal measurements of the symphysis and chin (heights, thicknesses, and areas) could be included in the regression analysis. Scenario IV included the ramus alone (from coronoid and condyle to the mandibular angle). Here, five external measurements (ramus length and width, coronoid length and width, and condyle width) and two internal measurements (ramus width CSA and coronoid width CSA) could be included in the regression analysis. In scenario V, a small fragment of the mandibular body was included. Thus, two external measurements (the mandibular body heights at the premolar and molar regions) and two internal measurements (their CSAs) were included in the forward analysis.

Statistical analysis

Data were analyzed using SPSS 21.0 software. Significance was set at p < 0.05. Intraobserver and interobserver variations were examined on 15 individuals using the intraclass correlation coefficient (ICC) analysis. For intraobserver variation, measurements were taken twice, 2 weeks apart, by the same researcher (TS). For interobserver variation, measurements were taken by an additional independent researcher (either HM or VS). ICC was interpreted according to the Cicchetti [38] categorization system: <0.40, poor agreement; 0.40–0.59, fair agreement; 0.60–0.74, good agreement; and 0.75–1, excellent agreement.
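The categorization above can be expressed as a small lookup. The following is a minimal sketch, not part of the original analysis (which was run in SPSS); the function name `interpret_icc` is hypothetical, and the handling of values falling between the printed bounds (e.g., 0.595) is an assumption, since the source lists the categories to two decimals only.

```python
def interpret_icc(icc):
    """Map an ICC value to the agreement category quoted in the text.

    Assumption: each category's lower bound is inclusive, so a value
    such as 0.595 (between the printed bounds 0.59 and 0.60) is "fair".
    """
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Example: two ICC values of the kind reported for reliability tests
for value in (0.71, 0.905):
    print(value, interpret_icc(value))
```

Keeping the thresholds in one function makes it easy to check that every reported ICC is labeled consistently.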

General summary information, i.e., mean and standard deviation (SD) for each measurement, was obtained via descriptive statistics. An independent sample t test was used to examine the significance of differences between males and females for each measurement. The rate (%) of sexual dimorphism for each measurement was calculated as follows: % dimorphism = [(mean males − mean females)/mean females] × 100. Discriminant functions for sex estimation and their success rates were calculated using logistic regression (forward analysis). Odds ratios (OR) and 95% confidence intervals (CIs) were given for variables included in the discriminant equations.
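As an illustration of the two calculations just described, the sketch below implements the percent dimorphism formula exactly as stated and derives an odds ratio with a 95% CI from a logistic regression coefficient. The function names are hypothetical, and the Wald-type interval (coefficient ± 1.96 × standard error, then exponentiated) is an assumption about how the intervals were obtained; the text does not specify the procedure.

```python
import math


def percent_dimorphism(mean_males, mean_females):
    """% dimorphism = [(mean males - mean females) / mean females] x 100."""
    if mean_females == 0:
        raise ValueError("mean_females must be non-zero")
    return (mean_males - mean_females) / mean_females * 100.0


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Return (OR, lower, upper) for one logistic regression coefficient.

    Assumption: a Wald-type 95% interval, exp(b +/- z * SE).
    """
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return odds_ratio, lower, upper


# Illustrative inputs only; these are not values from the study.
print(percent_dimorphism(110.0, 100.0))
print(odds_ratio_with_ci(0.25, 0.05))
```

A negative percent dimorphism indicates a measurement whose mean is larger in females than in males.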

Validation

To examine the validity of the success rates of the suggested discriminant functions for sex estimation, two cross-validation tests were conducted. Random sampling was carried out, using the function RAND (Excel 2013), to select 40 individuals for the test group, i.e., each of the 438 individuals received a random number; the 20 males and 20 females with the lowest numbers were included in the test group. The obtained discriminant functions, based on 398 individuals, were used to estimate the sex of the individuals in the test group. This procedure was carried out twice.
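The hold-out procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' Excel workflow: Python's `random` module stands in for the RAND function, and the record layout (dictionaries with a `"sex"` key coded `"M"`/`"F"`) and the function name `split_test_group` are assumptions.

```python
import random


def split_test_group(individuals, n_per_sex=20, seed=None):
    """Return (training_group, test_group).

    Every individual receives a random number; the n_per_sex males and
    n_per_sex females with the lowest numbers form the test group.
    """
    rng = random.Random(seed)
    # Pair each individual's index with a random number, lowest first.
    numbered = sorted((rng.random(), i) for i in range(len(individuals)))
    counts = {"M": 0, "F": 0}
    training_group, test_group = [], []
    for _, index in numbered:
        person = individuals[index]
        if counts[person["sex"]] < n_per_sex:
            test_group.append(person)
            counts[person["sex"]] += 1
        else:
            training_group.append(person)
    return training_group, test_group


# Synthetic sample with the same sex distribution as the study sample
sample = [{"id": i, "sex": "M"} for i in range(214)]
sample += [{"id": 214 + i, "sex": "F"} for i in range(224)]
training, test = split_test_group(sample, seed=1)
print(len(training), len(test))
```

Running the split twice with different seeds mirrors the two cross-validation rounds: the functions are refitted on the training group each time and then applied to the held-out individuals.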

Results

The studied population (n = 438) consisted of 49% males and 51% females with no significant differences (p > 0.05) in age (53.3 ± 19.9 and 56.2 ± 20.6 years, respectively). ICC values for intraobserver and interobserver variations are presented in Table 2. The intraobserver variation of both external and internal measurements showed excellent results (0.905 < ICC < 0.991 and 0.838 < ICC < 0.986, respectively). The interobserver variation of external measurements showed excellent results for all measurements (0.85 < ICC < 0.996), except for two, coronoid width and chin width, which yielded good results (0.71 and 0.715, respectively). Most internal measurements (10 out of 12) showed excellent results (0.763 < ICC < 0.98), except for two (the chin area and the antegonial notch area), which yielded good results (0.741 < ICC < 0.785).

Table 2 Intraobserver and interobserver reliability tests: intraclass correlation coefficient (ICC) analysis

Significant differences between males and females were found for all mandibular external measurements and for most of the internal measurements (Table 3). For all measurements except the mandibular angle, males had greater means than females (Table 3). Sexual dimorphism rates varied from 1.6 to 103.1%. The most dimorphic traits were the antegonial notch area (103%), the chin width (22.3%), the body height CSAs (premolar 15.7% and molar regions 16.6%), the symphysis area (13.9%), the ramus length (13.5%), the body height at the molar region (11.1%), the coronoid height (10.8%), the body height at the premolar region (10.5%), the condyle width (10.4%), and the symphysis height (10%). A logistic regression analysis (forward method) was carried out separately for each scenario (I–V). In scenario I (complete mandible), 6 out of the 25 measurements were included in the discriminant function: ramus length, coronoid height, chin width, bigonial breadth, symphysis height, and antegonial notch area (Table 4). The successful classification rate reached 90.8%, with similar rates for males and females (Table 5). In scenario II (half mandible), 4 out of 16 measurements were included in the discriminant function: ramus length, coronoid height, condyle width, and antegonial notch area (Table 4). The classification rate using these measurements was 85.6%, with similar rates for males and females (Table 5). In scenario III, when only the mandibular arch (mandible without rami) was considered for analysis, only three of the nine measurements were included in the discriminant function: chin height, chin width, and symphysis height (Table 4). The successful classification rate was 79.1%, with a slightly higher correct classification rate for males (80.3%) than for females (77.8%) (Table 5).
Scenario IV describes a situation where only the ramus was considered for sex estimation (from coronoid and condyle to the mandibular angle). Two out of seven measurements were included in the discriminant function, which are the ramus length and the coronoid height (Table 4). A correct classification rate of 82.2% (with similar rates for males and females) was achieved (Table 5). Scenario V included a fragment of the mandibular body. Only two measurements out of four were included in the discriminant function, which are the body height at the premolar region and its CSA (Table 4). A correct classification rate of 72.9% (76.9% for males and 68.3% for females) was achieved (Table 5). Table 5 presents the discriminant functions for sex estimation with correct classification rates for the five scenarios of the mandibular state of completeness.
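To show how a logistic discriminant function of this kind is applied to a new case, the sketch below computes a predicted probability from a set of measurements and assigns a sex using a cutoff. All names and numbers are placeholders: the coefficients and intercept are hypothetical and are not the values in Table 5, and both the coding of the outcome (male = 1) and the 0.5 cutoff are assumptions.

```python
import math


def male_probability(measurements, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(
        coefficients[name] * measurements[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-z))


def estimate_sex(measurements, coefficients, intercept, cutoff=0.5):
    """Classify as 'male' when the predicted probability reaches the cutoff."""
    p = male_probability(measurements, coefficients, intercept)
    return "male" if p >= cutoff else "female"


# HYPOTHETICAL coefficients for a ramus-only function (scenario IV uses
# ramus length and coronoid height); these are NOT the published values.
coefficients = {"ramus_length": 0.2, "coronoid_height": 0.1}
intercept = -18.0

case = {"ramus_length": 70.0, "coronoid_height": 65.0}
print(estimate_sex(case, coefficients, intercept))
```

In practice the coefficients would be read from the function matching the available fragment, so a ramus-only specimen uses the scenario IV function rather than the complete-mandible one.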

Table 3 Descriptive statistics (N, mean, and standard deviation (SD)) by sex; an independent sample t test for differences between males and females (p value) and percent of dimorphism are presented
Table 4 Mandibular measurements included in the discriminant functions (forward analysis) to estimate sex in various states of completeness of the mandible (scenario I to V)
Table 5 Discriminant functions for sex estimation and successful classification rates (%) for various states of completeness of the mandible (scenarios I to V)

Cross-validation analysis revealed that the fit of our models to a sample of observations, which was not used to estimate the model, was high (Table 6), yielding a mean success rate of 89% for scenarios I (complete mandible) and II (half mandible).

Table 6 Success rates of sex estimation based on cross-validation tests in various states of completeness of the mandible (scenarios I–V)

Discussion

The current study provides a series of discriminant functions for sex estimation based on measurements taken from CT scans of mandibles. Each function was constructed for a different state of completeness of the mandible. Our study shows a high rate of successful discrimination for complete (90.8%) and partially preserved (half) mandibles (85.6%). Successful classification rates of previous methods using different features of the mandible vary from 59 to 94% [14, 18, 20, 22, 24, 30]. The only study reporting a successful classification rate greater than ours (94.2%) was that of Loth and Henneberg [23], who relied on mandibular ramus flexure. However, researchers who tested their method found much lower accuracy rates (66–85.8%) [24, 25, 39]. Additionally, geometric morphometric analysis of the ramus flexure [16] showed that the accuracy of sex estimation using this feature is low and that it has better classification characteristics for males than for females. Considering the rate of sexual dimorphism in mandibular features (Table 3), it is clear that discrimination between the sexes based on a single trait is problematic.

The predictive rates of previous studies are lower than ours for two main reasons: (1) variables for sex estimation were either arbitrarily selected or were statistically taken from a small number of measurements and (2) various size and shape characteristics of the mandible could not be utilized by the traditional measuring tools (e.g., CSA and bone thickness).

Our method exhibits several major advantages over previous methods. First, the suggested method enables forensic anthropologists to change from descriptive evaluations (e.g., the robusticity rate, the gonial eversion magnitude) to numeric ones. Second, measurements included in the discriminant equations for sex estimation were taken from a large pool of mandibular measurements, tested statistically for their discrimination power. Third, it enables access to morphological features (e.g., CSA of the mandibular body) that cannot be measured with traditional measuring tools (e.g., caliper). Fourth, it provides clear knowledge of the success rates for males and females from a heterogeneous population, and these rates have undergone cross-validation. Fifth, it is better suited to forensic needs because it covers different states of completeness of the mandible.

Limitations of the study

The discriminant functions were developed based on a given population. Although the study population is heterogeneous, the equations should be tested on other populations as well. Although the presented functions can be applied to mandibles of all ages, their application to elderly individuals should be carried out carefully to ensure that the mandibles meet the inclusion criteria (e.g., intact incisors, the presence of molars or premolars at the measured location).

Conclusions

A simple, reliable, and valid method is suggested for forensic scientists for estimating sex, using CT scans of mandibles retrieved from a modern western industrial society. Five discriminant functions, based on mandibular measurements, were constructed to cover various conditions of completeness of the mandible. The more complete the mandible, the higher the rate of successful discrimination (up to 90.8%). This method is not age dependent and has specific inclusion and exclusion criteria.