Introduction

Sex assessment is among the principal tasks when examining human skeletal remains. Since human identification as well as various skeletal analyses heavily rely on sex, accurate sexing methods are of great importance in forensic anthropology and bioarchaeology. Over the years, a number of qualitative and quantitative methods have been proposed based on cranial and post-cranial traits [e.g., [1,2,3,4,5]] with varying rates of success and the highest correct sex classification concerning the pelvis and the cranium [4, 5]. Qualitative methods assess specific sexually dimorphic anatomical traits. In general, qualitative methods are easier to apply but heavily rely on the experience of the observer; hence, they tend to be more subjective and prone to inter-observer error [5, 6]. On the other hand, quantitative methods rely on measurements, which may be more time consuming and often require specialized equipment [7] but are less prone to observer bias and often produce more accurate results [5, 6, 8].

In recent years, quantitative methods have advanced to more sophisticated techniques based on geometric morphometrics, which allow a more detailed analysis of shape variation than traditional measuring methods [5, 9, 10]. Although the application of geometric morphometrics requires sophisticated equipment such as laser scanners and 3D digitizers, its more elaborate shape analysis of skeletal elements has revealed further information on the variation of the human skeleton such as asymmetry [11], age-related changes [12], and secular changes in morphology [13]. Additionally, geometric morphometrics have produced methods for sex estimation with improved correct classification results over traditional quantitative methods [4, 5, 8]. In contrast to the qualitative methods, geometric morphometrics not only produce far more robust and reproducible results due to the minimization of inter-observer error, but also allow more accurate measurement of the magnitude of sexual dimorphism [8], which is known to vary in its expression in different populations [14].

The use of ectocranial landmarks in geometric morphometrics for analyzing the shape variation of the human cranium with respect to sex has been previously studied [9, 13, 15,16,17,18]. The use of generalized Procrustes analysis (GPA) and principal component analysis (PCA), which constitute the common analytical workflow for such data, produces results that are difficult to apply in forensic cases unless all landmarks are present and the reference data are also available to the forensic anthropologist or bioarchaeologist who needs to examine an isolated sample. This has resulted in some criticism of the usability and effectiveness of geometric morphometrics in sex assessment [7, 19]. On the other hand, the traditional quantitative methods, which are straightforward to implement (simple measurements), often rely on cranial traits known from visual observation to be sexually dimorphic [20, 21]. As a result, significant sex-related variation, which is captured with GPA and PCA, is left unexplored and unused for sex discriminant functions [5]. The present study aims to fill this analytical gap by iteratively exploring the sexually dimorphic properties of almost a quarter of a million possible combinations of Euclidean distances and angles based on ectocranial landmarks in a modern Greek skeletal collection.

Materials and methods

The sample consisted of 176 adult crania of Greek individuals of known sex: 94 males and 82 females. These crania are part of the modern skeletal reference collection (known as the Athens Collection), which is housed in the Department of Animal and Human Physiology at the National and Kapodistrian University of Athens. The collection consists of 225 skeletons. Information on the sex, age at death, and cause of death as well as the place and year of birth of each individual in the collection is derived from death records [22]. All individuals are Greek nationals with recorded age at death ranging from 19 to 99 years and year of birth spanning from 1879 to 1965. Only individuals without any evidence of pathological deformation or craniotomy were selected.

Data acquisition

Eighty landmarks (12 midline and 34 bilateral pairs) on the outer surface of the skull, chosen on the basis of adequately illustrating the geometry of the cranium, were used (Table 1). Most of the landmark definitions were obtained after Bigoni et al. [9], while the definition of the supramastoid crest–squamous suture intersection was obtained after Franklin et al. [20]. Regarding the porion, glabella, and opisthocranion landmarks, definitions were obtained from White's book [23]. The landmarks were of type I, type II, or type III [24] and can be unambiguously located on the cranial surface [9]. Three-dimensional landmark coordinates were digitized (by MC and AB) with a Microscribe 3DX (Immersion Corp., San Jose, California), whose rated accuracy is ± 0.23 mm. Inter- and intra-observer errors regarding landmark digitization on the crania of the Athens Collection have been previously reported [12], ranging from 0.3 to 1.8 mm. Moreover, for the purposes of the present study, the most sexually dimorphic traits (13 distances and 7 angles) were measured (by AB) with a mechanical caliper on 20 randomly selected crania (10 males, 10 females) to assess the usability of the produced discriminant functions when measuring with simple equipment.

Table 1 List of digitized landmarks

Data analysis

All available coordinates of the 80 landmarks were analyzed in GNU Octave [25]. Purpose-specific programming functions, which are available upon request, iteratively calculated all possible combinations of distances and angles from the battery of landmarks included in the present study. For each of the 3160 unique distances and 246,480 unique angles, a linear discriminant analysis was performed to calculate the correct classification percentage of the original group for each single variable (distance or angle). The thirteen distances and seven angles with the highest correct classification scores were further analyzed in SPSS (IBM SPSS version 23.0, Armonk, NY). Discriminant function analysis (DFA) with leave-one-out cross-validated correct classification was performed for all 20 variables. Multivariate DFA was further performed on all angles and all distances separately as well as jointly. Both enter and stepwise methods were used in the multivariate analysis. Additionally, linear regression of each variable on the centroid size was performed to assess the relation of each variable to the size of the cranium. The centroid size was calculated from 63 landmarks that were available in 167 individuals, who were included in all 20 angle and distance variables. Finally, the technical error of measurement (TEM) as well as the scaled TEM was calculated [26] for each of the most sexually dimorphic traits between the measurements calculated from landmark coordinates and those taken with the caliper to assess the accuracy of the latter method and determine sampling variability between the two measurement techniques.
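The exhaustive enumeration described above can be illustrated with a minimal Python sketch. It is not the authors' Octave code: the function names are hypothetical, and the univariate classifier assumes a two-group linear discriminant analysis with equal prior probabilities, which for a single variable reduces to assigning each case to the nearer group mean.

```python
from itertools import combinations


def enumerate_measurements(n_landmarks):
    """Return index tuples for every distance (i, j) and every angle (i, vertex, k)."""
    idx = range(n_landmarks)
    # One distance per unordered pair of landmarks.
    distances = list(combinations(idx, 2))
    # One angle per vertex and unordered pair of the remaining landmarks.
    angles = [
        (i, v, k)
        for v in idx
        for i, k in combinations([p for p in idx if p != v], 2)
    ]
    return distances, angles


def resubstitution_accuracy(males, females):
    """Original-group accuracy of a nearest-group-mean rule for one variable.

    Assumption: equal priors, so univariate LDA cuts at the midpoint of the means.
    """
    m_mean = sum(males) / len(males)
    f_mean = sum(females) / len(females)
    hits = sum(abs(x - m_mean) < abs(x - f_mean) for x in males)
    hits += sum(abs(x - f_mean) < abs(x - m_mean) for x in females)
    return hits / (len(males) + len(females))


distances, angles = enumerate_measurements(80)
print(len(distances), len(angles))  # 3160 246480
```

With 80 landmarks the sketch reproduces the counts given in the text: 80·79/2 = 3160 distances and 80·(79·78/2) = 246,480 angles.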

Results

The iterative approach adopted in the present study for investigating the sex-related geometric variation of the human cranium revealed numerous sexually dimorphic distance and angle measurements that yield reasonably high correct classification. The univariate results included 60 distances with correct classification higher than 82% and 61 angles with correct classification higher than 75%. However, we decided to focus on the most sexually dimorphic distance and angle variables, which were further used in multivariate analysis. Thirteen distances with correct classification higher than 85% and seven angles with correct classification higher than 78% were chosen for this purpose. The descriptive statistics of these measurements along with their corresponding landmarks are presented in Table 2.

Table 2 Descriptives of highest correct classification distances and angles

The aforementioned correct classification percentages were used as a threshold for selecting the most sexually dimorphic measurements based on the original group classification, which was evaluated in the preliminary iterative analysis in GNU Octave. The DFA in SPSS of the selected 20 angles and distances altogether verified the original group correct classification results but also yielded the cross-validated correct classification along with the discriminant functions presented in Table 3. In addition to the total correct classification for each variable, the corresponding percentages for males and females are also provided, since there are certain measurements that provide remarkably better classification in favor of one sex over the other. It should be noted that the sectioning point for all the discriminant functions in Table 3 is zero with positive values classifying for male.
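The difference between original-group and cross-validated classification can be sketched as follows. This is an illustration only, not the SPSS procedure: `loo_accuracy` is a hypothetical name, and the classifier again assumes a single variable with equal priors, so each held-out case is assigned to the nearer of the two group means computed without it.

```python
def loo_accuracy(males, females):
    """Leave-one-out accuracy of a nearest-group-mean rule for one variable."""
    data = [(x, "M") for x in males] + [(x, "F") for x in females]
    hits = 0
    for held_out, (x, label) in enumerate(data):
        # Re-estimate both group means without the held-out individual.
        train = [d for i, d in enumerate(data) if i != held_out]
        m = [v for v, s in train if s == "M"]
        f = [v for v, s in train if s == "F"]
        m_mean, f_mean = sum(m) / len(m), sum(f) / len(f)
        predicted = "M" if abs(x - m_mean) < abs(x - f_mean) else "F"
        hits += predicted == label
    return hits / len(data)
```

Because each case is classified by a rule that never saw it, the cross-validated rate is typically equal to or lower than the original-group rate.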

Table 3 Discriminant functions and correct classification results of distances and angles

Whereas the highest cross-validated correct total classification for D1 and A1 was 87.4 and 80.6%, respectively, the multivariate analysis yielded even higher percentages. The correct classification results of the multivariate analysis using enter and stepwise DFA on the 20 variables are summarized in Table 4. Although grouping angles and distances in multivariate DFA (enter method) yielded higher correct classification percentages, the stepwise method revealed that the combination of only four measurements, namely angles A3 and A7 and distances D1 and D6, is the most accurate sex classifier with cross-validated correct classification 93.5%. Additionally, the same classifier provides almost the same accuracy for both males and females (93.3 and 93.8%, respectively). It should be noted that although the multivariate discriminant functions still produce different correct classification scores between males and females, these differences are reduced compared to univariate results. Again, the sectioning point for all the discriminant functions in Table 4 is zero with positive values classifying for male.
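Applying a discriminant function with a sectioning point of zero can be sketched as below. The coefficients and constant are placeholders chosen for easy arithmetic and are NOT the values of Table 4; the function names and the example measurements are likewise hypothetical.

```python
def discriminant_score(measurements, coefficients, constant):
    """Linear discriminant score: constant + sum of coefficient * measurement."""
    return constant + sum(coefficients[name] * value
                          for name, value in measurements.items())


def classify(score, sectioning_point=0.0):
    """Positive scores classify as male, following the convention in the text.

    A score exactly on the sectioning point is returned as female here,
    which is an arbitrary choice made for this sketch.
    """
    return "male" if score > sectioning_point else "female"


# PLACEHOLDER coefficients for illustration only; substitute the published ones.
coefficients = {"D1": 0.5, "D6": 0.25, "A5": -0.125}
constant = -50.0
# Hypothetical measurements (distances in mm, angle in degrees).
cranium = {"D1": 62.0, "D6": 100.0, "A5": 40.0}

score = discriminant_score(cranium, coefficients, constant)
print(score, classify(score))  # 1.0 male
```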

Table 4 Discriminant functions and correct classification results of multivariate analysis

The results of the regression analysis between sex discriminating variables and the centroid size are presented in Table 5. As expected, the distance variables explain a much higher percentage of the variance of the size of the cranium (mostly between 70 and 80%) than the angle variables (ranging from 17 to 27%). The only exceptions are variables D1 and D12, which correspond to the same measurement between the Ovale mediale and Mastoidale landmarks on the right and left side, respectively, and for which the explained variance is about 47%. All regression models presented in Table 5 are statistically significant.
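For readers unfamiliar with the quantities involved, the following sketch computes centroid size under its usual definition (the square root of the summed squared distances of all landmarks from their centroid) and the explained variance of a simple linear regression. The function names are hypothetical and the code is not taken from the study.

```python
from math import sqrt


def centroid_size(landmarks):
    """Square root of the summed squared distances of 3D landmarks to their centroid."""
    n = len(landmarks)
    centroid = [sum(p[axis] for p in landmarks) / n for axis in range(3)]
    return sqrt(sum((p[axis] - centroid[axis]) ** 2
                    for p in landmarks for axis in range(3)))


def r_squared(x, y):
    """Explained variance (R^2) of a simple linear regression between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```

In a simple regression with one predictor, R² is the squared correlation, so it is the same whichever of the two variables is treated as the response.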

Table 5 Regression analysis between sex discriminating variables and centroid size

TEM and scaled TEM for distance and angle variables are shown in Table 6. The scaled TEM for manually measured distances varies from 0.8 to 2.3% with respect to the distances calculated from landmark coordinates. Additionally, TEM is systematically lower than the standard deviation (see Table 2) of each variable for both males and females from the entire skeletal sample and, most importantly, is much lower than the difference between the upper and lower confidence interval bounds between sexes. The manual measurement of angles revealed high values of scaled TEM for angles A1, A3, and A7, ranging from 10.4 to 19.1%, whereas the scaled TEM for all other angles does not exceed 4%. Although TEM for angles A1, A3, and A7 is low (1.1, 0.6, and 0.6, respectively), suggesting high precision in measurement, it is of the same magnitude as the observed standard deviation, which may result in low accuracy when measuring these particular variables.
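As a point of reference, TEM for two sets of measurements of the same specimens is commonly computed as the square root of the summed squared differences divided by 2N. The sketch below uses that formula. The scaling shown (TEM as a percentage of the grand mean, often called relative TEM) is an assumption, since the text does not spell out how the scaled TEM of [26] is defined. Function names are hypothetical.

```python
from math import sqrt


def tem(first, second):
    """Technical error of measurement for paired measurements of N specimens."""
    n = len(first)
    return sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))


def relative_tem(first, second):
    """TEM as a percentage of the grand mean (assumed scaling, see lead-in)."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * tem(first, second) / grand_mean


# Hypothetical values: digitized versus caliper measurements of two crania (mm).
print(tem([10, 20], [14, 20]), relative_tem([10, 20], [14, 20]))  # 2.0 12.5
```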

Table 6 Technical error measurement between digitized landmarks and manual measurements

In light of the TEM results, a stepwise method DFA was conducted including all angle and distance variables from the entire skeletal sample except for high-level scaled TEM angles. The results (shown in Table 4) revealed even higher classification scores with cross-validated correct classification reaching 95.2% for the total sample, whereas males are correctly classified at 94.3% and females at 96.3%. This classification is based on a three-variable discriminant function including distances D1 and D6 as well as angle A5.

Discussion

The aim of the present study has been two-fold. We aimed to explore the geometry of the cranium in an iterative manner to identify the most sexually dimorphic traits that can be used for accurate sex identification, while at the same time minimizing the complexity of the application of modern geometric morphometric methods. Indeed, although our analysis was based on three-dimensional coordinates of landmarks, which requires specialized equipment (laser scanner or 3D digitizer in our case), all traits reported in our results can be easily and accurately measured with the use of simple equipment (caliper). For distance traits, all that is required is a straightforward distance measurement between two landmarks, whereas the traits regarding angles can be easily calculated from the distances associated with the respective landmarks that define the angle of interest. This can be achieved using the formula \( B=\arccos\left(\frac{a^2+c^2-b^2}{2ac}\right) \), which is derived from the law of cosines [27], or by simply using an online calculator such as http://www.calculator.net/triangle-calculator.html. For illustration purposes, considering the angle B formed by three landmarks, namely A, B, and C, with B at the vertex, distance a corresponds to the distance between landmarks B and C, distance b to the distance between landmarks A and C, and distance c to the distance between landmarks A and B. Although the discriminant functions reported in the present study can be easily implemented with a single vernier caliper, the use of geometric morphometrics and digitization of 3D coordinates of landmarks constitutes the backbone of our research objective, since almost a quarter of a million distinct combinations of distances and angles could not have been evaluated manually. The TEM results reported in the present study suggest that manual measurements are sufficiently accurate and precise for implementing the proposed sex discriminant functions without the need for specialized and expensive equipment (3D microscribe). With the exception of angles A1, A3, and A7, the scaled TEM lies well within acceptable levels [28].
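The angle calculation from three caliper distances can be sketched in a few lines of Python. The function name is hypothetical; the formula is the law of cosines solved for the angle at landmark B, with a, b, and c defined as in the paragraph above.

```python
from math import acos, degrees


def angle_at_b(a, b, c):
    """Angle at landmark B in degrees, from a = |BC|, b = |AC|, c = |AB|."""
    cos_b = (a * a + c * c - b * b) / (2 * a * c)
    # Guard against rounding slightly outside [-1, 1] for near-degenerate triangles.
    cos_b = max(-1.0, min(1.0, cos_b))
    return degrees(acos(cos_b))


# A 3-4-5 triangle has a right angle opposite the side of length 5.
print(round(angle_at_b(3.0, 5.0, 4.0), 6))  # 90.0
```

The side opposite the angle of interest (b) is the one that enters the numerator with a negative sign, so the three distances must be assigned to a, b, and c with care.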

With respect to exploring accurate sex identifiers, our results have identified sexually dimorphic traits that have not been previously reported, such as the distance between the Ovale mediale and Mastoidale landmarks. Moreover, all other distance traits exhibiting correct classification higher than 85% concern distances between landmarks on opposite sides of the cranium that are not symmetric counterparts (i.e., not the two sides of the same bilateral landmark). Our results show that the most sexually dimorphic distances correspond to non-symmetric traits, which are not usually evaluated by other researchers [20, 21, 29]. The same pattern of favoring non-symmetric traits over symmetric measurements is also present for the traits concerning angles, which, although they exhibit lower correct sex classification rates than distance traits, still produce better results than the symmetric angles often used in other research [30].

The application of iterative analysis of all possible distance and angle combinations based on the battery of landmarks used in the present study not only revealed different sex estimation traits with higher correct classification rates than those commonly used for producing discriminant functions for various populations [20, 21, 29,30,31,32,33,34], but also produced better classification results than previous work on the same population sample using the GPA–PCA workflow [15,16,17]. More specifically, Franklin et al. [20] reported 85% correct classification for bizygomatic breadth as the most accurate single variable and 90% for multivariate classification using three variables (namely, glabello-occipital length, bizygomatic breadth, and mastoid height) in a study based on 3D volume rendered multi-detector computed tomography scans of 400 adult individuals. Mahakkanukrauh et al. [21], working on a sample of 200 Thai skulls, reported a maximum accuracy of 90.6% when combining six measurements (maximum cranial length, bizygomatic breadth, biauricular breadth, nasal height, biorbital breadth, and right mastoid length). Marinescu et al. [29] produced multivariate discriminant functions achieving 88% correct classification for a Romanian population sample based on four cranial measurements (cranial length, cranial height, facial breadth, and nose height). Similar classification scores have been reported by Ogawa et al. [33] from a Japanese sample. Kranioti et al. [34], working on a modern Cretan population sample, which may be regarded as the most closely related to the Athens Collection since both consist of contemporary Greek individuals, reported bizygomatic breadth as the most discriminatory variable with 82% correct classification, whereas a stepwise method involving five measurements (bizygomatic breadth, cranial length, nasion–prosthion and mastoid heights, and nasal breadth) raised the classification accuracy to 88.2%.

Regarding the sample of the present study, the research of Chovalopoulou et al. reported cross-validated correct classification of 74.8% for the palate region (five landmarks) and 90.4% for the cranial base (25 landmarks) [15], original group correct classification of 79.4% for the midsagittal curve of the neurocranium (32 points: 2 landmarks and 30 semi-landmarks) and 89.2% for the cranial vault (31 landmarks) [16], as well as 86.7% for the upper face (31 landmarks) and 83% for the orbits (10 landmarks) [17]. In the present study, although the maximum cross-validated correct classification of a single distance variable (2 landmarks) reaches 87.4%, the multivariate analysis produced a discriminant function with only three traits (calculated from only seven landmarks), namely two distances and one angle, with a total cross-validated correct classification of 95.2%.

The direct comparison of our results with previous work on the same population sample not only shows better classification results with easier applicability on forensic cases for sex identification, but also illustrates the advantages of the methodological approach of the present study, where an exhaustive iterative approach (although computationally intensive) was selected over the GPA–PCA analysis often used in landmark-based analysis [9, 10, 15,16,17]. Of course, using discriminant analysis for landmark-based measurements is a long established method for sex identification [e.g., [35,36,37]], which is still adopted in recent research [20, 21, 29,30,31,32,33,34]. Nevertheless, the use of “standard” craniometric measurements or their variants poses limitations on the possible findings for each population being examined, which the present study tries to overcome by “ignoring” the prior knowledge of cranial sexual dimorphism expression. The benefits of nonstandard interlandmark distances (ILDs) over standard ILDs as well as GM methods have also been reported by Spradley and Jantz [38]. In their work on ancestry estimation of American Black and White and Hispanic males and females, they have demonstrated that nonstandard ILDs provide the best classifications for the groups used in their analyses.

Although our correct classification results are mostly higher than those reported for other populations regarding craniometric traits [20, 21, 29,30,31,32,33,34], direct comparison of our results with those from different populations does not necessarily reflect higher expression of sexual dimorphism in the Greek population, since different measurements are concerned. Similarly, the highest correct classification discriminant functions reported in the present paper do not necessarily outperform other craniometric discriminant functions in other populations, although they definitely outperform them in the Greek sample, since the extended list of landmarks in the present study includes virtually all landmarks used in other studies and all possible combinations have been assessed. Nevertheless, it is valuable to compare our results with the recent findings of Oikonomopoulou et al. [7], who examined the accuracy of existing binary logistic equations for sex prediction based on cranial traits in the same population sample as our study. Oikonomopoulou et al. used the cranial traits of the mental eminence, supra-orbital margin, glabella, nuchal area, and mastoid process as described by Walker [2] and additionally included the vertical femoral head diameter (FHD) to improve the classification results of their newly proposed functions based on the Athens Collection. According to their findings, Walker's equations produced a variable degree of success with correct classification ranging from 76 to 99% for males, but only from 22 to 61% for females. Additionally, applying the discriminant functions from Soficaru et al. [39] produced even poorer results (males: 66–98%; females: 26–77%) [7]. In both cases, the high sex bias reported by Oikonomopoulou et al. [7] indicates rather poor overall sex estimation for these discriminant functions. On the other hand, the newly proposed equations by Oikonomopoulou et al. [7] based on the Greek assemblage yielded correct classification accuracy ranging from 87.4 to 88.4% for males and 91.4 to 92.6% for females, which are almost as good as the discriminant functions reported in the present study. The inclusion of FHD in their functions further increased the achieved accuracy to the range of 92.6–94.7%, but this gain in classification accuracy is very small given the requirement that the femur belonging to the same individual as the cranium be present.

Usually, size produces the most prominent variation between males and females, whether it concerns the qualitative assessment of the mastoid process or the quantitative assessment of the maximum cranial length. Previous research by Chovalopoulou et al. [15,16,17] has shown that cranial sex discrimination based on form variation, which includes both shape and size variation, always produces better results than shape variation alone. Additionally, Oikonomopoulou et al. have shown that using a size-related variable as a proxy in sex estimation further improves the classification accuracy of discriminant functions [7]. Nevertheless, using size as a proxy renders the discriminant functions even more population specific, since size-based variables are more influenced by secular trends and are usually population specific [40]. Evaluating the correlation of our distance and angle variables with the centroid size of the crania used in the present study aims to illustrate how much of each sex discriminating variable is explained by the variation in size. Of course, this should not be regarded as a population specificity metric of our discriminant functions. Nevertheless, it provides some insight into how the expression of sexual dimorphism is influenced by the observed size variation. Further research on different population samples is required to evaluate the performance of the proposed discriminant functions for sex assessment and to establish how this performance is interrelated with the observed size variation.

Conclusions

Iteratively exploring nonstandard ILDs as well as interlandmark angles revealed a number of sex-discriminating single traits with reasonably high correct classification, which can be valuable when examining partial cranial remains. It also produced one of the most accurate sex-discriminating functions reported in the literature, involving only two distances and one angle, which have also been shown to be accurately measurable with a simple caliper, relaxing the need for specialized equipment. The results of our study further emphasize the need for revising the standard data collection protocol, as previously suggested by Spradley and Jantz [38]. Although the use of equipment such as the microscribe and of landmark-based GM methods has become increasingly popular in forensic anthropology, the use of discriminant functions based on simple measurements remains very important, since it relaxes the time constraints and the need for elaborate setups by the forensic examiner.