Introduction

Successfully determining the sex of the individual is an essential component in the process of identification, as it helps to narrow down the list of potential identifications (e.g., eliminates members of the opposite sex) and increases the accuracy of subsequent methods for estimating other biological attributes (e.g., sex-specific age and stature estimation standards) [1,2,3].

The pelvis is regarded as the best indicator of sex [4], but the often poor state of pelvic preservation makes the use of the skull, which tends to resist taphonomic phenomena better, more important [5, 6]. Besides, the skull is not only reported to be a good indicator for sex determination [4] but also offers the possibility for facial reconstruction of the subject [7], providing clues of utmost importance in the goal of identification.

Metric methods for sex determination from the skull were introduced in the middle of the twentieth century [8], and their main advantage is that more objective results can be obtained compared with the non-metric methods.

However, anthropometric methods remain most accurate when population-specific data are applied [9,10,11,12,13]. Many factors may influence skeletal morphology, such as genetics, environmental factors, migration flows, and secular trends [14,15,16]. Consequently, population-specific standards should be generated and regularly updated.

In the absence of documented archeological collections, medical imaging archives offer a suitable source of population-specific contemporary data, from which skeletal standards can be developed. Imaging techniques, especially multislice computed tomography (MSCT), have been shown to be reliable for use in anthropometric studies [17,18,19,20,21]. Discriminant functions developed from 3D CT scans of the skull have also reached accuracies of up to 90% [9].

In Tunisia, and in North Africa more broadly, there are no contemporary population-specific skeletal standards for the estimation of sex. This may be an issue in forensic practice, considering that the North African gene pool is an admixture of the original inhabitants (Berbers) with neighboring and distant populations [22, 23]. In addition, high (legal and illegal) migration flows have led to a rise in the number of unidentified skeletons and putrefied corpses, each posing its own identification challenges. This highlights the need to establish a set of accurate and reliable population-specific discriminant functions that allow sex estimation from 3D MSCT of bones of North African individuals.

The objective of the current study was to analyze the correlation between sex and metric parameters of the skull in a contemporary Tunisian population using CT scan analysis.

Methods

Study sample

In this study, we analyzed craniometric parameters acquired in multislice computed tomography (MSCT) of 510 adult individuals. The sample was collected from the cranial MSCT of living Tunisian patients who presented at the imaging department of Charles Nicolle’s Hospital, Tunis, Tunisia, for a medical investigation from January 2013 to June 2015. The inclusion criteria were: age ≥ 18 years, known sex, and CT scans without contrast enhancement.

We excluded CT scans showing skull malformation, infectious or neoplastic pathology, Paget’s disease of the bone, or signs of recent or ancient trauma. We randomly selected 510 scans equally distributed by sex (255 males, 255 females), with age ranging from 18 to 100 years (mean = 56 years, SD = 18 years). The scans were then anonymized. Only sex and age were recorded, and this information was then blinded.

Imaging technique

The scans were performed on a General Electric 16-channel Multidetector tomographer, Healthcare Brightspeed™, VDTEXTe model with a tube voltage of 120 kV and effective mAs of 160 (GE Healthcare, Milwaukee, Wisconsin, USA).

The CT scans were selected to ensure that the space between slices was less than the slice thickness (ensuring visualization of all anatomical structures). Axial slice thickness ranged from 0.9 to 1.3 mm. Multiplanar MSCT images (axial, coronal, and sagittal) were post-processed to create a volumetric three-dimensional reconstruction using ASIR™ software on a General Electric™ workstation (Advantage Workstation, GE Healthcare, Milwaukee, Wisconsin, USA). A preset image filter displaying the bone with a wide opacity ramp was set at a window/level operation (W/L) of 2000/500. The workstation software has a measurement tool that automatically calculates the linear distance between two points.

Craniometric parameters

A total of 37 landmarks (14 bilateral and 9 midline located landmarks) were identified (detailed definitions of the landmarks are given in Online Resources 1 and 2), defining 27 linear inter-landmark distances (Table 1).

Table 1 Inter-landmark measurements definitions

For the orbital breadth measurement, the maxillofrontale was employed as an internal limit instead of the dacryon (point on the medial border of the orbit that marks the junction of the sutures between the frontal, maxillary, and lachrymal bones [27]) for two main reasons: first, previous studies reported difficulties in identifying the dacryon in CT sections [7, 17, 21, 28], and second, the maxillofrontale landmark is, as suggested by Piquet [25], more suitable for the measurement of the orbital breadth and was reported to be easier to identify [7]. In addition to the traditional inter-landmark craniometric measurements, bone thickness was measured at two locations, as described by May et al. [29].

  • Frontal bone thickness (FBT): measured 1 cm anterior to the bregma on the midsagittal plane, perpendicular to the endocranial table surface.

  • Parietal bone thickness (PBT): measured 3 cm anterior to the lambda, perpendicular to the endocranial table surface.

Data acquisition

Anatomical landmarks were manually identified along the volume rendered (VR) images and verified on the multiplanar slices. Inter-landmark measurements and bone thickness were then calculated using the workstation software measurement tool (examples in Figs. 1, 2, and 3). Rotation, translation, zoom, and transparency tools were used to allow the precise identification of landmarks.

Fig. 1 Frontal chord (FRC) measurement

Fig. 2 Bimastoidale (BMS) measurement

Fig. 3 Bizygomatic breadth (ZYB) measurement

As the geometrically defined landmarks (i.e., type III) may be difficult to position accurately on the 3D surfaces, the Frankfurt Horizontal plane was constructed to guide the process. Two axes provided by the original software were placed on the CT slices so that the first one matched the right and left porion, and the second matched the left porion (po) and left inferior orbital margin (oi), representing the Frankfurt plane as defined by Martin [30]. A third axis, perpendicular to the established horizontal plane, was shifted toward the extremities until it was tangent to a bone structure; the point where the axis met the bone was identified as the landmark.

In some cases, only the neurocranium was considered of interest during the imaging process (depending on the clinical problem that motivated the CT scan), and the lower parts of the skull (facial skeleton and the mastoid processes) were partially scanned; therefore, measurements involving landmarks located on these structures could not be collected.

Statistical analyses

Descriptive and univariate analyses

The statistical analyses were performed with Microsoft Excel® and R® v 3.1.1 for Windows (R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).

A descriptive analysis was performed for each variable. Sample sizes, sample means, standard deviations, and maximum and minimum values were calculated for each variable in both sex groups. The normal distribution of the data was tested using the Shapiro-Wilk test and the equality of variances using Fisher’s test.

Sexual dimorphism was investigated by comparing the male and female mean values for each craniometric parameter using Student’s t test, the Aspin-Welch test, or the Mann-Whitney U test, depending on the results of the normality and variance-equality tests, with p < 0.05 considered significant. Two sexual dimorphism ratios were computed for each variable:

- The Lovich-Gibbons (LG) ratio = Male mean/Female mean [31].

- The logarithmic ratio = Ln (Male mean/Female mean) [32].
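The two ratios can be sketched in a few lines of Python. The function name and the example means are illustrative assumptions and are not values from Table 2.

```python
import math

def dimorphism_ratios(male_mean, female_mean):
    """Return (LG ratio, logarithmic ratio) for one craniometric variable."""
    if male_mean <= 0 or female_mean <= 0:
        raise ValueError("means must be positive")
    lg_ratio = male_mean / female_mean   # male mean divided by female mean
    log_ratio = math.log(lg_ratio)       # natural logarithm of the same ratio
    return lg_ratio, log_ratio

# Hypothetical means in mm, for illustration only
lg_ratio, log_ratio = dimorphism_ratios(130.0, 125.0)
print(f"LG ratio = {lg_ratio:.3f}, logarithmic ratio = {log_ratio:.4f}")
```

Under these definitions, an LG ratio above 1 (equivalently, a logarithmic ratio above 0) indicates that the male mean exceeds the female mean.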

Precision study

We randomly selected 28 CT scans. They were assessed twice by the main operator, at a 1-week interval (to evaluate the intra-observer error), and three times by a second operator, with the same time lapse (to evaluate the inter-observer error). The second operator received an introduction to the method, was given the landmark definitions, and practiced the different measurements on 20 CT scans before the evaluation.

For each craniometric parameter and for each operator, we computed the technical error of measurement (TEM) [17, 33, 34], the relative technical error of measurement (rTEM) [17, 33,34,35], and the reliability coefficient (R) [34, 36]. The relative TEM (rTEM) is calculated to compensate for the positive association between TEM and measurement size. The R coefficient allows an estimation of the proportion of total measurement variance that is unrelated to measurement error, with values ranging between 0 (not reliable) and 1 (completely reliable). A value of 0.9 indicates that 90% of the total variability is “true” biological variation, and that the remaining 10% is due to measurement error (imprecision and unreliability).
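The text cites the definitions of these indices without printing the formulas. The sketch below assumes the forms commonly used in anthropometry for two repeated sessions: TEM = sqrt(Σd² / 2N), rTEM = 100 × TEM / grand mean, and R = 1 − TEM² / SD². The function names and the repeated measurements are hypothetical.

```python
import math
from statistics import mean, stdev

def tem_two_sessions(first, second):
    """TEM for two repeated sessions: sqrt(sum(d^2) / (2N))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty series")
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    return math.sqrt(sum(squared_diffs) / (2 * len(first)))

def relative_tem(tem, first, second):
    """rTEM: TEM expressed as a percentage of the grand mean."""
    return 100.0 * tem / mean(list(first) + list(second))

def reliability(tem, first, second):
    """R = 1 - TEM^2 / SD^2, with SD taken over all measurements."""
    sd = stdev(list(first) + list(second))
    return 1.0 - (tem ** 2) / (sd ** 2)

# Hypothetical repeated measurements (mm) of one distance on five scans
session_1 = [128.0, 131.5, 125.0, 134.0, 129.5]
session_2 = [128.5, 131.0, 125.5, 133.5, 129.0]

tem = tem_two_sessions(session_1, session_2)
rtem = relative_tem(tem, session_1, session_2)
r = reliability(tem, session_1, session_2)
print(f"TEM = {tem:.3f} mm, rTEM = {rtem:.2f}%, R = {r:.3f}")
```

Dividing by the grand mean is what makes rTEM comparable between short and long distances, which is the compensation described above.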

Multivariate analyses

A multiple logistic regression using backward stepwise selection was applied to determine sex based on the average data of craniometric parameters. The variables that were included in the logistic model fitting process were those with a p value <0.2 in the mean comparison tests, rTEM ≤5% and R ≥ 0.9.

An outcome from the logistic regression of P > 0.5 would be classified as male while an outcome of P < 0.5 would be considered female (an outcome of 0.5 indicates indeterminate sex). Multiple other models were generated by direct logistic regression on single variables and by combining the most sexually dimorphic parameters according to the sexual dimorphism ratios. Three models were specifically designed for the cranial vault, the cranial base, and the facial skeleton; these models can be potentially helpful for sex estimation in case of fragmented skulls.

In order to validate the generated models, we calculated the posterior probabilities (PP) and sex bias and performed a leave-one-out cross-validation (LOOCV). The sex bias was calculated by subtracting the male correct prediction percentage from the female correct prediction percentage.
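The two validation quantities can be illustrated as follows. This is a minimal sketch: the counts and measurements are hypothetical, and the midpoint cut-off classifier is only a stand-in used to keep the example self-contained; it is not the logistic regression used in the study.

```python
def sex_bias(female_correct, female_total, male_correct, male_total):
    """Female correct-classification % minus male correct-classification %."""
    female_pct = 100.0 * female_correct / female_total
    male_pct = 100.0 * male_correct / male_total
    return female_pct - male_pct

def loocv_accuracy(values, labels, fit, predict):
    """Leave-one-out: refit without sample i, then predict sample i."""
    hits = 0
    for i in range(len(values)):
        train_x = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        hits += predict(model, values[i]) == labels[i]
    return 100.0 * hits / len(values)

def fit_midpoint(xs, ys):
    """Stand-in classifier: cut-off halfway between the two group means."""
    males = [x for x, y in zip(xs, ys) if y == "M"]
    females = [x for x, y in zip(xs, ys) if y == "F"]
    return (sum(males) / len(males) + sum(females) / len(females)) / 2

def predict_midpoint(cutoff, x):
    """Values above the cut-off are called male, the rest female."""
    return "M" if x > cutoff else "F"

# Hypothetical counts and measurements (mm), not taken from the study
bias = sex_bias(female_correct=210, female_total=250,
                male_correct=220, male_total=250)
zyb = [132.0, 130.5, 127.0, 134.0, 124.0, 126.5, 129.0, 122.5]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
accuracy = loocv_accuracy(zyb, sex, fit_midpoint, predict_midpoint)
print(f"sex bias = {bias:.1f}%, LOOCV accuracy = {accuracy:.1f}%")
```

With this sign convention, a negative sex bias means that males were classified correctly more often than females.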

Results

Precision study

All of the measurements fell within the acceptable range of intra-observer and inter-observer technical error of measurement (rTEM ≤5%), and 26 out of 29 measurements exhibited an acceptable reliability coefficient (R ≥ 0.9). The R value was low for the right orbital breadth (total TEM = 1.15 mm, SD = 1.84 mm, R = 0.61) and the left orbital breadth (total TEM = 1.47 mm, SD = 2.09 mm, R = 0.5). The R value of the right mastoidal height was 0.88 (total TEM = 0.97 mm, SD = 2.75 mm), while the left mastoidal height had a value of 0.91 (total TEM = 0.89 mm, SD = 3 mm) (detailed results of the precision study are provided in Online Resource 3).
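As a plausibility check, the reported R values can be compared with the commonly used definition R = 1 − TEM² / SD². Using this formula here is an assumption, since the text does not print it; the TEM, SD, and R figures are those quoted above.

```python
def reliability_from_tem(total_tem, sd):
    """R = 1 - TEM^2 / SD^2 (assumed definition of the reliability coefficient)."""
    return 1.0 - (total_tem ** 2) / (sd ** 2)

# (total TEM in mm, SD in mm, reported R) as quoted in the text
reported = {
    "right orbital breadth": (1.15, 1.84, 0.61),
    "left orbital breadth": (1.47, 2.09, 0.50),
    "right mastoidal height": (0.97, 2.75, 0.88),
    "left mastoidal height": (0.89, 3.00, 0.91),
}

for name, (tem, sd, r_reported) in reported.items():
    r = reliability_from_tem(tem, sd)
    print(f"{name}: computed R = {r:.3f}, reported R = {r_reported}")
```

Under that assumption, the computed values agree with the reported ones to within 0.01, which illustrates how a modest absolute error can still give a low R when the standard deviation of the measurement is small.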

Univariate analyses

The males’ mean values for the 27 inter-landmark distances and the two bone thickness measurements were significantly higher (p < 0.05) than the corresponding female values (Table 2), indicating the presence of sexual dimorphism in the skulls of the study sample.

Table 2 The basic statistics of the skull measurements

The mastoid processes measurements (MHLleft, MHLright, and BMS), the left orbital breadth (OBLleft), and the bizygomatic breadth (ZYB) had the highest sexual dimorphism ratios (Table 2).

Logistic regression

Direct single variable

Direct logistic regression performed separately on each variable enabled single-variable discriminant models to be established. The most accurate single predictors were ZYB, cranial base length (CBL), glabella-lambda length (GLL), maximum cranial length (MCL), left mastoidal height (MHLleft), bimastoidale (BMS), and nasio-occipital length (NOL), with sexing accuracy ranging from 79.41% for ZYB to 70% for NOL (Table 3).

Table 3 Single-variable discriminant models

Stepwise and direct multiple variable

Two main models were calculated, yielding the highest rates of classification accuracy. The first model (model1) had the highest cross-validated accuracy at 90.04% (Table 4).

Table 4 Multivariable model’s accuracy

Model 1: Z = (0.28408 x ZYB) – (0.13913 x JUB) + (0.11511 x CBL) – (0.04838 x FFB) + (0.77845 x MCL) – (1.06662 x NOL) + (0.34175 x GLL) + (0.10356 x BMS) + (0.2181 x MHLleft) - 55.58661.
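Applying Model 1 to a skull can be sketched as below. The coefficients are those printed above. Two points are assumptions: Z is treated as the log-odds of being male, converted with the standard logistic function P = 1 / (1 + e^(−Z)), and the measurements are taken to be in mm. The example skull is hypothetical.

```python
import math

# Coefficients and constant of Model 1 as printed in the text
MODEL_1 = {
    "ZYB": 0.28408, "JUB": -0.13913, "CBL": 0.11511, "FFB": -0.04838,
    "MCL": 0.77845, "NOL": -1.06662, "GLL": 0.34175, "BMS": 0.10356,
    "MHLleft": 0.2181,
}
INTERCEPT_1 = -55.58661

def model_score(measurements, coefficients, intercept):
    """Linear predictor Z = intercept + sum(coefficient * measurement)."""
    missing = sorted(set(coefficients) - set(measurements))
    if missing:
        raise KeyError(f"missing measurements: {missing}")
    return intercept + sum(c * measurements[k] for k, c in coefficients.items())

def probability_male(z):
    """Logistic transform of Z (assumed to be the log-odds of male)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(p):
    """Decision rule from the Methods: P > 0.5 male, P < 0.5 female."""
    if p > 0.5:
        return "male"
    if p < 0.5:
        return "female"
    return "indeterminate"

# Hypothetical skull measurements (mm), for illustration only
skull = {"ZYB": 125.0, "JUB": 110.0, "CBL": 100.0, "FFB": 95.0, "MCL": 180.0,
         "NOL": 178.0, "GLL": 175.0, "BMS": 100.0, "MHLleft": 30.0}

z = model_score(skull, MODEL_1, INTERCEPT_1)
p = probability_male(z)
print(f"Z = {z:.3f}, P(male) = {p:.3f}, classified as {classify(p)}")
```

Raising a `KeyError` when a measurement is missing reflects the fact that the full model cannot be evaluated on an incomplete skull.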

However, the second model was considered the most “practical” because it requires fewer measurements (parsimony principle) while still achieving 85.9% cross-validated accuracy (Table 4).

Model 2: Z = (0.15659 x ZYB) – (0.9574 x MCL) + (0.20961 x GLL) + (0.15035 x BMS) + (0.26275 x MHLleft) - 63.41141.

Furthermore, to handle cases with partial fragmentation of the skull, other models were generated using variables measured at the cranial vault alone (model 3), the cranial base (model 4), or the facial skeleton (model 5):

Model 3: Z = (−0.07097 x MFB) + (0.18826 x FFB) + (0.97636 x MCL) – (1.16835 x NOL) + (0.39094 x GLL) – (0.09968 x PC) - 36.85134;

Model 4: Z = (0.1422 x FOB) + (0.1357 x CBL) + (0.1883 x BMS) + (0.3443 x MHLleft) - 48.1572;

Model 5: Z = 0.3221 x ZYB - 41.2822.

These models had accuracy rates ranging from 79.41 to 83% (Table 4).
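For a single-variable model such as Model 5, the decision rule reduces to a cut-off on the measurement itself: setting Z = 0 (that is, P = 0.5) gives the bizygomatic breadth at which the classification flips. The sketch assumes that Z is the log-odds of being male and that ZYB is expressed in mm; the two test breadths are hypothetical.

```python
import math

SLOPE = 0.3221        # coefficient of ZYB in Model 5
INTERCEPT = -41.2822  # constant of Model 5

def model_5_probability(zyb_mm):
    """P(male) from Model 5, assuming a standard logistic link."""
    z = SLOPE * zyb_mm + INTERCEPT
    return 1.0 / (1.0 + math.exp(-z))

# Z = 0 corresponds to P = 0.5, so the cut-off is -intercept / slope
cutoff_mm = -INTERCEPT / SLOPE
print(f"sectioning point = {cutoff_mm:.1f} mm")

# Hypothetical bizygomatic breadths (mm) on either side of the cut-off
for zyb in (120.0, 135.0):
    print(f"ZYB = {zyb} mm -> P(male) = {model_5_probability(zyb):.2f}")
```

Under these assumptions the sectioning point is about 128.2 mm, with larger breadths classified as male and smaller ones as female.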

Discussion

The purpose of the current study was to analyze the differences in cranial measurements between males and females, that is, the sexual dimorphism of the skull. The univariate analyses revealed a significant difference between sex groups for all of the investigated craniometric parameters, with males displaying larger average values than females. These measurements were then used to generate formulas to determine the sex of an individual within the Tunisian sample, with accuracy reaching 90%.

Since the implementation of the Daubert ruling, forensic scientists are required to substantiate their assertions with testable, reliable, replicable, and scientifically valid methods [11, 37,38,39].

In anthropometric studies, an R value greater than 0.95 should be “sought where possible” [34]. Franklin et al. suggested that any measurement with an R value below 0.9 and rTEM above 5% should be treated with caution [17].

Usually, type II and III landmarks present with lower precision compared to type I landmarks [40, 41]; therefore, measurements inclusive of type II and III landmarks were expected to result in lower repeatability.

The coefficient of reliability was at its lowest for the right and left orbital breadths (0.61 and 0.5, respectively), which are defined by a type I landmark (maxillofrontale) and a type III landmark (ectoconchion). Although the intra- and inter-observer rTEM were acceptable (possibly because the maxillofrontale was used instead of the dacryon), the absolute measurement errors of the orbital breadths were large relative to their low standard deviations, which resulted in low coefficients of reliability.

The right mastoid height (MHLright) had a coefficient of reliability of 0.88, indicating that 12% of the variance of this variable may be due to measurement error. Although the absolute errors for the right and left mastoid heights were both small (total TEM of 0.97 and 0.89 mm, respectively), the difference in standard deviations (2.75 mm for MHLright and 3 mm for MHLleft) may explain the lower R coefficient observed for the right mastoid height. The remaining variables had R coefficients, TEM, and rTEM in line with published research [17, 21, 42].

With the Frankfurt plane method, only limited intra- and inter-observer error was observed in this study. The reliability of the measurement protocol was ensured by verifying the exact location of the landmarks on the multiplanar images as well as on the volume-rendered reconstructions.

The 27 inter-landmark distances and the bone thickness of the two locations exhibited a strong sexual dimorphism, with male values being significantly higher than those of females. Previous craniometric studies had similar results with an obvious “size” sexual dimorphism in the skull [8,9,10, 28, 43,44,45,46,47,48,49,50].

MCL, ZYB, mastoid height (MHL), and bimastoidale (BMS) included in the first and second models were frequently encountered within sex prediction formulas in a variety of diverse populations [8,9,10, 28, 43, 51, 52]. The GLL and bijugal breadth (JUB) retrieved in the first model had been previously reported as strong predictors of sex when included in a multivariate model [53].

The ZYB is well established as being among the most dimorphic measurements of the human skull in several populations (e.g., Australian [9], South African Black and White [46, 51], Turkish [28], Northern Indian [54], Japanese [10], and Cretan [43]). In our study too, this craniometric distance was highly accurate within a single-variable discriminant model, with a 79.4% level of accuracy and a sex bias of less than 1%. Previous research reported similar accuracies (78 to 85.5%) [9, 43, 46, 51, 54].

The accuracy reported in several previous studies ranged from 73.2 to 90% for functions constructed with three variables [9, 52, 54], from 80.8 to 91.1% for functions constructed with four variables [10, 55, 56], and from 85.9 to 87.1% for formulas constructed with five to six variables [28, 43].

The multivariable models generated by stepwise and direct logistic regression in the current study achieved high classification accuracies ranging from 85.9% (model 2) to 90% (model 1) and low sex bias (respectively −0.97 and −2.9%). These results are comparable to published accuracies in other populations (85.5 to 90%) [8,9,10, 28, 43, 54, 57].

Models generated for partially fragmented crania can be useful in skulls with vital trauma or postmortem damage. Although the 3rd, 4th, and 5th functions were less accurate, they provided an opportunity to correctly determine the sex of a damaged skull with a classification accuracy ranging from 79.4% (model 5) to 83% (model 4).

Since concerns exist regarding the accuracy of CT images in reflecting an object’s true dimensions, several papers have dealt with the extent to which CT scan craniometrics can be used in forensic anthropology. An acceptable concordance has been demonstrated between measurements taken directly from a dry skull (using a digitizer or a caliper) and those taken from the volume rendered CT image of the same dry element [17,18,19,20, 58]. When comparing the accuracy of cranial measurements collected by three different means (CT scan VR images including soft tissues, CT scan VR images after soft tissue removal, and direct measurements on the dry skulls), Stull et al. [21] considered the between-method differences to be an acceptable amount of error in forensic anthropology and attributed them more to measurement repeatability than to imaging artifacts. Furthermore, Stull et al. [21] stated that “CT scans inclusive of soft tissues are recommended for metric data collection if the goal is to create an applicable anthropological technique”. Therefore, given the high resolution of the CT scans in the present study and the small error rate of the measurements composing the proposed models, it can be presumed that the models developed in the present study are applicable to dry bone.

The relation between human cranial thickness and parameters of the biological profile is the subject of continuing discussion. Several studies have shown conflicting or inconclusive results concerning the relationship between cranial thickness and sex, age, and other biological parameters such as body build and ancestry [59, 60].

Countless factors account for the variation in research findings, such as sampling bias (e.g., small sample populations), the confounding effects of pathology, variations in data collection methodologies, and ways in which data were analyzed [61, 62].

Most researchers have noted that, for cranial thickness, females usually have higher mean values than males [59, 60, 63].

A correlation with aging could be observed for cranial thickness in some studies, with an increase in cranial thickness for ages over 50 years, especially for females [60, 64]. This pattern was attributed to the incidence of hyperostosis frontalis interna (HFI), and according to Ross et al. [60], cranial thickness is sexually dimorphic by the onset of HFI.

The current study reported a significant difference in both frontal and parietal bone thickness between males and females. Regarding the mean values, males had significantly thicker bones than females; these findings differ from the reported results of previous studies. Numerous hypotheses could be stated at this point:

  • This could be a trend specific to the Tunisian population.

  • Confounding factors that were not investigated could bias the interpretation, considering that significant correlations between bone thickness and other biological parameters, such as body build or age, have been reported in previous studies [60, 63, 64].

It is important to state that a comparison with other studies remains difficult, given the frequently different measurement sites, even for the same bone, and the different analysis methods.

This study is part of an ongoing research project aiming to create a specific and representative anthropological database that would improve the proficiency of Tunisian forensic practitioners when dealing with identification issues. As documented skeletal collections are not available in Tunisia, medical scans represent an appropriate source of data for collecting contemporary skeletal standards and, eventually, updating existing data. MSCT of living individuals performed for a diagnostic purpose offer an opportunity for obtaining virtual skeletons with a high resolution, avoiding any unnecessary radiation exposure, and providing data which are consistent with those obtained by traditional anthropological methods.

Conclusion

In conclusion, our study demonstrated that the skull was highly dimorphic and represented a reliable bone for sex determination in contemporary Tunisian individuals. Based on a CT scan study of 27 inter-landmark distances, we were able to identify a nine-variable model achieving a classification accuracy of 90% with −2.9% sex bias and a six-variable model yielding 85.9% sexing accuracy with −0.97% sex bias. In addition, three supplementary models were generated to allow sex determination from incomplete skulls.