Introduction

Sex estimation in damaged and mutilated dead bodies and skeletal remains constitutes the foremost step in medico-legal identification examinations. It allows the search to be restricted to missing persons of the estimated sex, and subsequently, sex-specific age estimation can be performed [1, 2]. Dental identifications are most frequently based on comparing the post mortem (PM)-collected odontological evidence with the ante mortem (AM) specifications registered in the provided dental files. If AM records are not available, a PM profile is established by the examining forensic odontologist. Characteristics of the individual likely to narrow the search for the AM resources, such as age, sex, ancestry, systemic disease, socio-economic status, occupation, and habits, are considered [2–6]. Sex estimation is an important part of diverse forensic disciplines. In forensic anthropology, sex estimation is based on morphological and metrical features of the skeletal bones, such as the skull and mandible [4, 7–10], scapula, clavicle, sternum, humerus, femur, hip, and sacrum [8, 11]. In forensic medicine, external and internal autopsies and DNA analysis of different sampled biological materials are used [4, 6, 8]. In forensic odontology, methods based on metric and non-metric dental features as well as DNA analysis of teeth (or tooth parts) have been developed for sex estimation [1, 4, 8]. Teeth have been used to estimate the sex of unknown individuals, based on the differences between sexes in the dimensions and the morphology of teeth [4, 11], the dissimilar patterns of dental development and tooth eruption [8], and the expression of the amelogenin protein [4]. Related to tooth morphology, aplasia or hypoplasia of the maxillary lateral incisor was found predominantly in females, and hyperodontia predominantly in males [8]. Amelogenin is a major matrix protein of the human enamel, with a different signature in the size and the pattern of the nucleotide sequence in males (M) and females (F) [4].
Several studies compared tooth crown dimensions between sexes, measured intraorally [12–14], on dental casts [15–24], or on skeletal and dental remains [25–27]. Mesiodistal (MD) and buccolingual (BL) diameters of the permanent tooth crown were the two most commonly used and studied dimensions [14, 18, 19, 21–27], followed by diagonal measurements (mesiobuccal-distolingual and distobuccal-mesiolingual) [16, 25, 27, 28], and the mandibular canine index, expressed as the ratio of the MD dimension of canines and the inter-canine arch width [29–31]. Most studies included measurements on different tooth positions, in particular on all the teeth [17, 19–22, 24, 27], only on maxillary teeth [23], or only on randomly chosen tooth positions [12–16, 18, 25]. The reported studies revealed that the dimensions of the canines provide the highest sexual dimorphism [14–16, 19, 21, 26, 27], followed by the premolars [19, 26, 27], the first and second molars [12, 16, 25–27], and the maxillary incisors [1, 19]. Moreover, these findings were similar when comparing samples of diverse biological origin [17, 19–22, 27]. Morphological features of tooth crown and root were studied mainly in incisors and molars of both dentitions. Different methods were reported in the literature, among which the Arizona State University Dental Anthropology System (ASUDAS) method stands out [11]. A non-metric feature that has been found to show sexual dimorphism is the distal accessory ridge of the canines, which shows a higher frequency and a more pronounced trait expression in males [1, 4, 11]. Sexual dimorphism has been shown to be more significant in the permanent dentition of young adults. Studies indicated that the early permanent dentitions provided the best conditions for tooth size measurements and morphological feature registration, because less mutilation and less attrition are observed in an early adulthood dentition [8, 32].
Panoramic radiographs are very commonly used tools for diagnosis in dental practice, and consequently allow for an easy retrospective collection of the registered information [33]. The radiographs permit (digital) measurements of different tooth crown and root parts [34–36]. The aims of this study were to assess the degree of sexual dimorphism in permanent teeth, and in particular to detect which tooth dimension, on which tooth position, was most sex-related and applicable for sex estimation in forensic practice. Moreover, the study aimed to explore whether combining specific tooth dimensions on particular tooth positions improved the accuracy of sex prediction in forensic identification.

Materials and methods

In the age range between 22 and 34 years, 200 digital panoramic radiographs (100 M, 100 F) were retrospectively collected from the dental clinic files of the University Hospitals UZ Leuven, Belgium. The panoramic radiographs were digitally captured according to the manufacturer’s recommendations for positioning and exposure. Images were acquired with Cranex Tome (Soredex, Finland), Veraviewpocs 2D (J. Morita, USA), Planmeca Promax 2D (Planmeca Oy, Finland), and Vistapano S (Durr Dental AG, Germany).

Ethical clearance was obtained from the Ethical Committee of University Hospitals UZ Leuven, Belgium (2014 12 11). The collected data were anonymized. Besides the panoramic radiographs, additional data were extracted from the related patient files, including date of radiographical exposure, date of birth, and sex. The selected radiographs met the following inclusion criteria: good image quality; all permanent teeth completely developed; no teeth extracted; no medical history of tooth pathology or disorders of skeletal development visible; and no crown restoration, occlusal wear, trauma, or orthodontic treatment detected. Images demonstrating major errors were excluded from further analysis. Panoramic radiographs were imported into image enhancement software (Adobe Photoshop CS6, Adobe Systems Incorporated, San Jose, CA, USA) [37] and resized 1:1, based on the technical specifications of the related dental radiography unit manufacturer.

Four landmarks were located on each considered tooth, namely the most occlusal tooth point (O), the root apex (A) (for multiradicular teeth, the mesial root apex (MA) was considered), the mesial cement-enamel junction (MCEJ), and the distal cement-enamel junction (DCEJ) (Fig. 1). The landmarks were used to measure tooth part dimensions. These variables were grouped into length and width variables, and ratios of variables were calculated (Table 1, Fig. 2). In particular, the established length measures were total tooth length (TTL), occlusal plane length (OPL), total crown length (CL), crown length (CEJL), and root length (RL). The width measures included maximal crown width (CW) and cement-enamel junction width (CEJW). The ratios of tooth lengths from the same tooth allowed correcting for radiographical deformation. In premolars and molars, a bucco-palatal inclination is sometimes present, so that the buccal and palatal cusps do not overlap on the radiograph. The ratio between OPL and TTL gave an indication of the degree of bucco-palatal inclination of premolars and molars (a ratio of 1 indicates no inclination). All the variables were measured on all permanent teeth in the upper and the lower left quadrant. If the considered tooth on the left side was absent or of poor image quality, the corresponding contralateral tooth was measured (e.g., Fédération Dentaire Internationale #43 instead of #33). In total, 212 variables (106 measurements and 106 ratios) were examined.
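The reasoning behind using ratios of lengths from the same tooth can be illustrated with a minimal sketch. It assumes that the radiographic deformation acts as a single magnification factor on all length measures of one tooth, which is a simplification; the tooth dimensions and the magnification value are hypothetical and are not taken from the study data.

```python
def length_ratio(numerator_mm, denominator_mm):
    """Ratio of two length measures taken on the same tooth."""
    return numerator_mm / denominator_mm

# Hypothetical true dimensions of one tooth (mm); not study data.
true_rl, true_ttl = 15.0, 25.0

# Hypothetical magnification, assumed identical for both lengths of the tooth.
magnification = 1.25
measured_rl = true_rl * magnification
measured_ttl = true_ttl * magnification

ratio_true = length_ratio(true_rl, true_ttl)
ratio_measured = length_ratio(measured_rl, measured_ttl)

# The common factor cancels, so both ratios agree (up to rounding).
print(ratio_true, ratio_measured)
```

If the deformation differs between the two measured segments, the cancellation is only approximate, which is why the ratios compensate for, rather than remove, possible distortion.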

Fig. 1

Tooth landmarks located on the first mandibular molar. The panoramic radiographs were imported into Adobe Photoshop CS6 and zoomed to 300 %, and the landmarks were marked with the ellipse tool. The landmarks were positioned on the mandibular first molar: most occlusal point (O), mesial root apex (MA)/in monoradicular teeth root apex (A), mesial cement-enamel junction (MCEJ), distal cement-enamel junction (DCEJ). The horizontal line represents the occlusal plane (OP) of the investigated tooth, and is defined as the line connecting the tips of the cusp(s), radiologically projected on other tooth material.

Table 1 Variables and ratios of variables based on tooth measures established on panoramic radiographs
Fig. 2

Placement of guides in order to perform the length and width measurements. To obtain optimal measurements, the panoramic radiographs were zoomed to 300 % and rotated so that the line connecting the mesial and distal cement-enamel junction landmarks was parallel to the "X" axis (horizontal line in the lower image). Guides were dragged to the selected landmarks, and the measurements were performed using the rectangular marquee tool. The upper image presents the horizontal guides placed for the length measurements of tooth #28: total tooth length (TTL), occlusal plane length (OPL), root length (RL), total crown length (CL), crown length (CEJL). The lower image presents the vertical guides placed for the width measurements of tooth #28: maximal crown width (CW), cement-enamel junction width (CEJW).

All the measurements were registered by a single examiner. To check for intra- and inter-observer reliability, after 1 month, 15 % of the radiographs were randomly selected and re-evaluated by the first and a second examiner. The intraclass correlation coefficients (ICC) were calculated to quantify the degree of reliability.
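As an illustration of how such a reliability coefficient can be computed, the following minimal sketch implements one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement). The text does not state which ICC form was used, so this choice is an assumption, and the function name and the example values are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an (n subjects x k raters/sessions) table of measurements.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares of the two-way ANOVA decomposition
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical repeated measurements (mm) of one variable on five radiographs:
# column 0 = first session, column 1 = repeated session.
example = [[24.1, 24.3], [26.0, 25.8], [22.5, 22.9], [27.2, 27.0], [25.1, 25.4]]
print(round(icc_2_1(example), 3))
```

In this sketch, one such coefficient would be computed per variable and the coefficients then averaged, which matches how a mean ICC over variables can be reported.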

For each of the 212 variables separately, males and females were compared using a Mann–Whitney U test. The discriminative ability was quantified using the area under the receiver operating characteristic curve (AUC). A value of 1 equals perfect discrimination, and 0.5 equals random prediction. If males have on average a higher score on a specific variable than females, the AUC can also be interpreted as the probability that a randomly chosen male subject has a higher score on that variable than a randomly chosen female. P values were adjusted for multiple testing using the false discovery rate (FDR) [38]. Since the number of variables was high compared to the number of subjects, in a first step, a principal component analysis (PCA) on the 212 variables was used to reduce the dimension of the data. The resulting principal component (PC) scores, each of them a linear combination of the original variables, were then used to discriminate between males and females. More specifically, a (multivariate) linear discriminant analysis (LDA) was applied separately for a varying number of principal components (1 to 30). The LDA is based on a multivariate normal distribution assuming the same covariance matrix in both groups and results in a score which is a linear combination of the used principal component scores. The misclassification error, the AUC, and the Brier score (i.e., the mean squared prediction error) were reported to quantify the performance of the PCA-LDA model. To obtain a fair assessment of the performance for future observations, a cross-validation procedure was applied, splitting the data 100 times at random into a calibration (80 %) and a test (20 %) set. The procedure was applied in the calibration set and evaluated in the test set. Mean performance (over the 100 samples) was compared with the (overoptimistic) observed performance. Analyses were performed using SAS software, version 9.2 of the SAS System for Windows (SAS Institute Inc., Cary, NC, USA).
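The univariate part of this analysis can be sketched as follows. The sketch is not the SAS code used in the study: the AUC is computed directly as the proportion of male-female pairs in which the male value is higher (ties counted as one half), and the multiple-testing adjustment assumes the Benjamini-Hochberg step-up procedure, which is one common way to control the FDR and may differ from the method of [38]. Function names and example values are hypothetical.

```python
import numpy as np

def auc_male_vs_female(males, females):
    """Probability that a random male value exceeds a random female value.

    Equals the Mann-Whitney U statistic divided by (n_males * n_females);
    ties contribute one half.
    """
    m = np.asarray(males, dtype=float)
    f = np.asarray(females, dtype=float)
    diff = m[:, None] - f[None, :]          # all male-female pairs
    u = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return u / (m.size * f.size)

def benjamini_hochberg(p_values):
    """FDR-adjusted p values (assumed Benjamini-Hochberg step-up procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p value downwards
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Hypothetical total tooth lengths (mm) for a few males and females
print(auc_male_vs_female([27.1, 26.4, 28.0, 25.9], [24.8, 26.0, 25.1, 24.2]))
# Hypothetical raw p values for four variables
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```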

Results

The subjects of the studied sample had a mean age of 27.1 years (SD 3.37 years) for M and 26.4 years (SD 2.92 years) for F.

The mean ICC for the intra-observer reliability was 0.95. For 153 and 208 variables, the ICC values were higher than 0.90 and 0.80, respectively. The mean inter-observer ICC was 0.71. For 92, 116, and 133 variables, the ICC values were higher than 0.90, 0.80, and 0.70, respectively.

For all the considered teeth, all mean tooth length and mean width measures were higher in M than in F. As an illustration, the sex-specific mean TTL and CW measures for M and F are listed in Table 2. The variables that differed significantly between males and females, with a p value <0.0001 after correction for multiple testing, are listed in Table 3. All these variables were tooth length measures, except for one ratio of lengths and three width measures. TTL for the mandibular canine was the most discriminative variable. In general, the mandibular and maxillary canines showed the greatest sexual dimorphism for the length and, to a lesser extent, for the width measures. In the univariate analyses, only three variables had an AUC higher than 0.75, i.e., TTL33, TTL23, and RL33 (Table 4).

Table 2 Sex-specific mean TTL and CW values for each measured tooth
Table 3 List of variables with p < 0.0001 after FDR correction for multiple testing, ordered on AUC value
Table 4 Distribution of the AUC values for the 212 variables

The results from the multivariate analyses (PCA-LDA models) revealed that increasing the amount of information (i.e., increasing the number of included PC) did not substantially increase the discriminative ability. Irrespective of the number of PC scores used, the cross-validated AUC stayed below 0.80 and the cross-validated misclassification error above 25 % (Table 5).

Table 5 Observed and cross-validated sex discriminating performance (misclassification error, area under the curve, Brier score) as based on the number of used principal components. The results of the cross-validation refer to the mean over 100 random samples
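The multivariate procedure summarized in Table 5 (PCA followed by LDA, evaluated over 100 random 80/20 splits) can be sketched as below. This is an illustrative reimplementation with NumPy, not the SAS analysis of the study: the data are synthetic, the function names are hypothetical, and details such as estimating the group priors from the calibration set are assumptions.

```python
import numpy as np

def fit_pca_lda(x, y, n_pc):
    """Fit PCA on the calibration data, then a two-group LDA on the PC scores."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    load = vt[:n_pc].T                       # loadings of the first n_pc PCs
    z = (x - mean) @ load                    # principal component scores
    m0, m1 = z[y == 0].mean(axis=0), z[y == 1].mean(axis=0)
    r0, r1 = z[y == 0] - m0, z[y == 1] - m1
    pooled = (r0.T @ r0 + r1.T @ r1) / (len(y) - 2)   # common covariance
    w = np.linalg.solve(pooled, m1 - m0)     # linear discriminant weights
    b = -0.5 * (m1 + m0) @ w + np.log(np.mean(y == 1) / np.mean(y == 0))
    return mean, load, w, b

def predict_proba(model, x):
    """Posterior probability of group 1 under the fitted PCA-LDA model."""
    mean, load, w, b = model
    return 1.0 / (1.0 + np.exp(-(((x - mean) @ load) @ w + b)))

def auc(scores, y):
    """Proportion of (group 1, group 0) pairs ranked correctly; ties count 1/2."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def cross_validate(x, y, n_pc, n_splits=100, test_frac=0.2, seed=0):
    """Mean misclassification error, AUC and Brier score over random splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    results = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        test, cal = idx[:n_test], idx[n_test:]
        p = predict_proba(fit_pca_lda(x[cal], y[cal], n_pc), x[test])
        err = np.mean((p > 0.5) != y[test])
        brier = np.mean((p - y[test]) ** 2)
        results.append((err, auc(p, y[test]), brier))
    return np.mean(results, axis=0)

# Synthetic stand-in data: 100 "females" (0) and 100 "males" (1), 30 variables,
# with a hypothetical shift in the first five variables for group 1.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
x = rng.normal(size=(200, 30))
x[y == 1, :5] += 1.5

mean_err, mean_auc, mean_brier = cross_validate(x, y, n_pc=10)
print(mean_err, mean_auc, mean_brier)
```

Fitting both the PCA and the LDA on the calibration part only, as done here, is what keeps the test-set performance from being overoptimistic.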

Discussions

In the current research, it was not feasible to collect the data from direct measurements on extracted teeth, because it is not practicable to sample 100 male and 100 female corpses with all permanent teeth present, at ages immediately after maturation of these teeth. Moreover, it would be hard (if not impossible) to get ethical clearance to extract all teeth from the sampled subjects. Therefore, tooth measurements on a collection of panoramic radiographs were chosen as the best alternative to collect data for the current indicative study.

Multiple reasons support this decision. First, panoramic radiographs allowed registration of the principal metric sex-related tooth features described in the literature [14, 18, 19, 21, 23–27]. Because panoramic radiographs show a clear distinction between the enamel, the dentine, the pulp, and the surrounding tooth structures, they permitted measurement of, in particular, the total tooth length, the crown length, the root length, and mesiodistal tooth widths at various levels (Table 1). Second, compared to tooth dimension data collected intraorally or on dental casts, panoramic radiographs allowed measurements of the whole tooth, including the root(s). In particular, tooth and root length(s) and MD root widths at different levels could be registered (Table 1). Consequently, more sex-related dental variables could be explored. Third, compared to measurements from previous studies performed on casts or intraorally, the measurements performed on panoramic radiographs corresponded more closely to the standard definitions of the variables; e.g., in the current study, CW was measured from the mesial to the distal contact point, and CL was considered from the most occlusal crown point perpendicular to the connection between the mesial and distal CEJ. Fourth, although in a forensic context periapical radiographs represent the standard radiographical procedure during post mortem dental data collection, in the current research, panoramic radiographs offered the possibility to study all the teeth present in a subject on one single image [33]. This reduced the working time and eliminated the registration errors that could occur with the repeated geometric radiographic settings necessary for standardized periapical radiograph exposure of the whole dentition [33, 39].
Moreover, a retrospective periapical x-ray collection of all tooth positions from each sampled subject was not available, because most periapical x-ray collections mainly include images of particular pathologic teeth with insufficiently known clinical diagnostic information.

The use of panoramic radiographs for data collection also had disadvantages. First, the image size needed to be calibrated according to the technical specifications of the panoramic unit used [34]. In order to obtain 1:1-sized images, the imported images required resizing according to the magnification factor and the panoramic image sizes mentioned in the technical specifications of the unit manufacturer. Second, due to tooth rotation, overlap, and/or interference with the surrounding anatomical structures [33], difficulties in locating landmarks could appear on panoramic radiographs. Therefore, during the radiograph collection process, only images free of these issues were selected and included. Twenty-three percent of the initially collected radiographs were excluded. Possible radiographical deformations were compensated for using tooth dimensional ratios. Third, the relation between the panoramically derived tooth ratios and the corresponding measures on intraoral radiographs of the same teeth in the same patient (or on PM-extracted teeth) remains to be established; future research may focus on it. This information is essential, because during forensic examinations, it is mostly not possible to take a panoramic radiograph of the presented dental evidence. Fourth, certain non-metric dental traits can be used for sex discrimination. They can be observed during a clinical oral examination or in investigations on extracted teeth or on dental casts [11] (e.g., the canine distal accessory ridge morphology [1], crown traits of (deciduous) teeth [40], together with BL tooth properties, tooth weight [41], and tooth form (combination of size and shape) [42]). These features are not registered or detectable on panoramic radiographs.

In a pilot setup, 21 subjects (11 males and 10 females) with both dental casts and a panoramic radiograph registered the same day were retrospectively collected from patient files. To detect possible distortion between the measures on casts and on panoramic radiographs, the CW of all the studied teeth was measured on both registrations, and their mean ratio was calculated. The obtained mean ratios varied between 1 (SD 0.16) and 1.31 (SD 0.18). The discrepancies between both measures cannot be attributed to radiographic deformation alone. On the casts, it was not possible to measure exactly from the mesial to the distal contact point, especially not in the dorsal tooth positions. This was reflected in the large differences in results between the frontal (mean ratio ≤ 1.08) and the dorsal teeth (1.13 ≤ mean ratio ≤ 1.31). The results of the pilot setup indicate that extrapolation of the study results to real tooth measures needs to take into account possible radiographical distortions. Because in the pilot setup only one studied variable could be validated (with an inherent measurement error), future research should validate all the studied variables by comparing the measurements on extracted teeth with those on their panoramic radiograph taken before extraction.

The age range of the studied sample was restricted to young adults (22–34 years), to ensure that the teeth of the investigated mature dentitions had the highest probability to be intact. Especially, tooth development and certain dental physiology or pathology could affect the tooth length measurements. Tooth wear (e.g., attrition) increases with increasing age [43]; the normal vertical loss of enamel from physiological wear in vivo is considered to be approximately 20–38 μm per annum [44]. A recent systematic review of 186 prevalence studies concluded that the percentage of subjects presenting with severe tooth wear increased from 3 % at the age of 20 years to 17 % at the age of 70 years [45]. The studied subjects were spread in the age range between 22 and 34 years. The youngest age truncation was necessary to include subjects with mature teeth. The oldest truncation was chosen to maximally reduce the influence of attrition. Indeed, according to the mentioned standard of vertical loss, the maximal vertical loss possibly appearing between the youngest and oldest included subject would be 456 μm (38 μm × 12 years). Taking additionally into account that severe attrition in the studied sample only appears in the smallest part of the 3 to 17 % range, it can be concluded that attrition is not affecting the current study outcomes. Most studies using intraoral measurements for sexual dimorphism set a similar age range for their selected study sample [12, 14].
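The bound on attrition used in this argument follows from simple arithmetic, reproduced in the short sketch below. The wear rates and the age limits are the values quoted above; the variable names are illustrative only.

```python
# Physiological vertical enamel loss quoted above, in micrometres per year
WEAR_RATE_UM_PER_YEAR = (20, 38)

# Age limits of the studied sample, in years
AGE_MIN, AGE_MAX = 22, 34

span_years = AGE_MAX - AGE_MIN

# Lower and upper bound of the wear accumulated across the age range
loss_um = tuple(rate * span_years for rate in WEAR_RATE_UM_PER_YEAR)
loss_mm = tuple(value / 1000 for value in loss_um)

print(span_years, loss_um, loss_mm)
```

The upper bound corresponds to the 456 μm mentioned in the text, i.e., less than half a millimetre between the youngest and the oldest included subject.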

The current univariate study results indicated that the dimensions of the mandibular and maxillary canines showed the highest sexual dimorphism. This result was consistent with the existing literature [14–16, 19, 21, 26, 27]. Statistically significant differences between M and F, based only on tooth length measures, were detected in the following teeth: first mandibular premolar, lateral and central maxillary incisors, second mandibular molar, second maxillary premolar, central and lateral mandibular incisors, second mandibular premolar, and first maxillary premolar (p value (FDR) <0.0001). These results were in agreement with studies performed on similarly sized samples of populations of different biological origin, reporting significant differences in tooth size between M and F in premolars [19, 26, 27], first and second molars [12, 16, 17, 25–27], and maxillary incisors [1, 19]. All referred studies were based only on BL and MD crown measurements. In the current study, the canine length measurements TTL and RL were found to have the highest discriminative values.

In forensic anthropology, the accuracy of determining the correct sex by morphological and metric assessment of different skeletal bones is between 80 and 90 % for the scapula [46, 47], sternum [48], humerus [49], and femur [50]. These values increase to nearly 100 % for combinations of the skull [51, 52], scapula and clavicle [53], or femur and hip bone [54], or the pelvic bones [55, 56]. Other methods like fingerprinting [57] and DNA analysis [58] have a high accuracy of 96.8 and 100 %, respectively. Teeth are considered a useful supplement and adjunct to sex discrimination, but are not recommended as the sole indicator of sex [1]. The main practical forensic application of sex estimation is to narrow the ante mortem search field based on the available post mortem evidence. Consequently, this search should depend on highly reliable information. Therefore, an accuracy of at least 80 % would be required for the studied dental variables to be used as sole sex predictor. In the current research, the highest AUC value was between 0.75 and 0.80, for only three tooth-specific variables. Combining tooth variables by using ratios did not increase the tooth-specific AUC values; a maximal AUC value of 0.68 was detected for the ratio of RL37 and TTL37. The performed multivariate analysis did not detect a substantial increase in the discriminative results compared to the univariate results. Combining variable information from 30 PCs, which explained 88 % of the total variability of the 212 variables, did not succeed in obtaining AUC values higher than 0.80 (cross-validated).

The ICC values obtained for the intra-observer reliability indicated an excellent level of reproducibility of the tooth dimension measurements. The lower ICC values obtained for the inter-observer reliability can be explained by a difference in experience between the two observers; this situation is transferable to forensic practice, where less experienced examiners need to perform the measurements according to the described protocols.

Future research should verify whether the current findings can be extrapolated to periapical radiographs. Furthermore, the exact influence of age on the sex discriminative values of the used dental variables should be examined and quantified for each possibly affected variable. Because teeth are often present in the skeletal evidence available in forensic anthropological examinations, the sex discriminative performance of combining the currently studied dental parameters with the available skeletal parameters should be explored.

Conclusions

The canines were the most sexually dimorphic teeth. The best sexually dimorphic parameters were tooth lengths, in particular TTL and RL. Combining multiple dental parameters did not provide additional sexually dimorphic information, compared to individual parameters or ratios of parameters. Future research should verify whether the current findings can be extrapolated to periapical radiographs. Furthermore, the exact influence of age on the sex discriminative values of the used dental variables should be examined and quantified for each possibly affected variable. All the studied variables should be validated by comparing the measurements on extracted teeth with those on their panoramic radiograph taken before extraction, in order to quantify possible radiographic distortions of their linear measures. Using only dental parameters obtained from panoramic radiographs for sex estimation should be avoided, since the discriminative ability is too low to obtain an acceptable misclassification error.