
1 Introduction

Craniofacial superimposition (CFS) [1] is one of the most relevant skeleton-based identification techniques. It involves overlaying a skull image (or a skull 3D model) with a number of ante-mortem images of an individual and analyzing their morphological correspondence to try to establish whether they correspond to the same individual. Three consecutive stages for the whole CFS process have been distinguished in [2]:

  • Acquisition and processing of the face and the skull photographs/models. In some approaches, this step also involves the location of anatomical landmarks on the skull and the face.

  • Skull-face overlay (SFO), which focuses on achieving the best possible superimposition of an image, video-frame or a 3D model of a physical skull, and a single ante-mortem image of a missing person.

  • Skull-face overlay assessment and decision making, in which the degree of support (strong support, moderate support, limited support, and undetermined [3]) that the skull and the available photograph belong to the same person or not (exclusion) is determined.

From the Computer Vision (CV) point of view, there is a clear relation between the SFO procedure and an Image Registration (IR) problem [4]. SFO can be tackled following an IR approach in order to superimpose the skull onto the facial photograph. To do so, the most convenient procedure is to guide the IR process by matching the corresponding cranial and facial landmarks.

Concerning soft computing, this matching process involves a very complex optimization task. The information guiding the process is incomplete and vague (two different objects, a skull and a face, are being matched). Thus the resulting search space is huge and presents many local minima, especially when a skull 3D model is considered. Consequently, exhaustive search methods are not useful. Furthermore, forensic experts demand highly robust and accurate results. IR approaches based on Evolutionary Algorithms (EAs) are a promising solution for facing this challenging optimization problem. Thanks to their global optimization nature, EAs are capable of performing a robust search in complex and ill-defined problems such as IR [5].

However, one drawback of the existing EA-based proposals dealing with the SFO problem is that they consider all landmarks equally important when they are not. For instance, landmarks located in the teeth represent the most reliable source of information since the teeth are the only bony part visible in the face. Thus, there is a need to properly model the different relative importance of the pairs of landmarks used to perform SFO as a 3D-2D image registration problem.

To overcome this problem, we incorporated into the existing automatic SFO method the expert knowledge related to the differences among landmarks based on their anthropometric characteristics. This has been modeled with two classic approaches from the area of multicriteria decision making [6]: weighted sum and lexicographical order. In addition, the obtained results have been compared with the state-of-the-art approach (rcga-mc45) [7] using a “ground truth” dataset [8].

This paper is structured as follows. Section 2 reviews the current state of the art and introduces the automatic CFS system. Section 3 describes our proposals for modeling and incorporating expert knowledge within the optimization process. Section 4 presents the experiments and results. The final conclusions are detailed in Sect. 5.

2 Craniofacial Superimposition

The diverse CFS approaches evolved as new technologies became available, building on foundations laid previously [1]. Although several authors have proposed different classifications of the technique, all of them recognize three different categories: photographic superimposition, video superimposition and computer-aided superimposition [1, 9]. Another classification was proposed by Damas et al. [2], who divide computer-aided approaches into two groups: non-automatic computer-aided methods and automatic computer-aided methods. The latter approaches deal with the SFO task within CFS and drastically reduce the time taken for SFO. Those proposals are based either on photograph to photograph comparison [10] or on skull 3D model to photograph comparison [7, 11–14].

2.1 Skull-Face Overlay as a Computer Vision Problem

Skull-face overlay requires positioning the skull in the same pose as the face in the photograph. From a pure CV point of view, the ante-mortem photograph is the result of the 2D projection of a real (3D) scene that was acquired by a particular (unknown) camera [15]. In such a scene, the living person was somewhere inside the camera field of view with a given pose.

The most natural way to face the SFO problem is to replicate the original scenario. The goal is thus to adjust the size and orientation of the skull 3D model with respect to the head in the photograph through geometric transformations in the camera coordinate system [1]. The specific characteristics of the camera must also be replicated to reproduce the original scene as far as possible, and hence the perspective projection of the skull 3D model onto the facial photograph.

2.2 Our Automatic Skull-Face Overlay Procedure

The 3D-2D IR approach is guided by the cranial and facial landmarks previously assigned by a forensic expert in the skull 3D model and the facial photograph.

Hence, given two sets of cranial and facial landmarks, \(\textit{C} = \{cl^1, ... , cl^n \}\) and \(\textit{F} = \{fl^1, ... , fl^n \}\), the process has to solve a system of equations with 12 unknowns [12]: the direction of the rotation axis \({\varvec{d}} = (d_x, d_y, d_z)\), the location of the rotation axis with respect to the center of coordinates \({\varvec{r}} = (r_x, r_y, r_z)\), the rotation angle \(\theta \), the factor \(s\) that scales the skull 3D model to the size of the face in the photograph, the translation \({\varvec{t}} = (t_x, t_y, t_z)\) that places the origin of the skull 3D model in front of the camera to replicate the moment of the photograph, and the camera angle of view \(\phi \). Although rotation parametrizations with only 3–4 parameters are also possible, and are usually employed in the literature, we used 7 (\({\varvec{d}}\), \({\varvec{r}}\) and \(\theta \)) in order to increase the interpretability of the corresponding transformation and the definition of its constraints. Those parameters determine the geometric transformation \(f\) that projects every cranial landmark \(cl^{i}\) of the skull 3D model onto its corresponding facial landmark \(fl^{i}\) of the photograph:

$$\begin{aligned} F=C \cdot R \cdot S \cdot T \cdot P \end{aligned}$$
(1)

where R, S, T, and P are rotation, scaling, translation and perspective projection matrices, respectively [12]. In addition, the method models two sources of uncertainty.
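As an illustration of how the 12 parameters act on a single cranial landmark, the following minimal sketch applies the four steps of Eq. (1) in order (rotation, scaling, translation, perspective projection) as plain functions instead of building the matrices explicitly. The function names are hypothetical, and two conventions are assumptions not stated in the text: the camera sits at the origin looking along the positive z axis, and the image plane is normalized so that the focal length equals \(1/\tan (\phi /2)\). It should not be read as the implementation used in [12].

```python
import math


def rotate_about_axis(p, d, r, theta):
    """Rotate point p by theta (radians) about the axis with direction d
    passing through point r, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(c * c for c in d))
    k = [c / norm for c in d]                      # unit axis direction
    v = [p[i] - r[i] for i in range(3)]            # move the axis to the origin
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(k[i] * v[i] for i in range(3))
    rotated = [v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
               for i in range(3)]
    return [rotated[i] + r[i] for i in range(3)]   # move the axis back


def project_landmark(cl, d, r, theta, s, t, phi):
    """Project one 3D cranial landmark cl onto the 2D image plane."""
    x, y, z = rotate_about_axis(cl, d, r, theta)          # R
    x, y, z = s * x, s * y, s * z                         # S
    x, y, z = x + t[0], y + t[1], z + t[2]                # T
    if z <= 0.0:
        raise ValueError("landmark must lie in front of the camera")
    focal = 1.0 / math.tan(phi / 2.0)                     # assumed convention
    return (focal * x / z, focal * y / z)                 # P
```

An EA individual would then encode the tuple \(({\varvec{d}}, {\varvec{r}}, \theta , s, {\varvec{t}}, \phi )\), and its quality would be evaluated by projecting every cranial landmark in this way and comparing the result with the corresponding facial landmark.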

Firstly, the location of facial landmarks refers to the difficult task of placing landmarks on a photograph [16]. The definition of many anthropometric landmarks is imprecise in nature [17]. Using precise landmarks, forensic anthropologists can only place the facial landmarks that they clearly identify in the facial photograph. The fuzzy approach developed in [13] allows experts to enclose a region where the facial landmark is placed without any doubt by using variable-size ellipses (fuzzy landmarks) instead of locating a precise point as usual. The number of landmarks placed by the expert can thus increase when those landmarks are considered. This leads to a better description of the skull-face correspondence thanks to the new pairs of cranial points and fuzzy landmarks in the face. The performance of the automatic SFO method is thus improved.

Secondly, facial soft tissue depth varies for each landmark correspondence and for different groups of people. This produces a mismatch between cranial and facial landmarks. Thus, the correspondence of a particular landmark on the surface of the skull and on the surface of the skin may not be symmetrical and perpendicular. This variability has been widely studied in many populations, considering different age and gender subgroups. The first and only proposal tackling this uncertainty within an automatic SFO process has been recently published in [7]. It directly incorporates the corresponding landmark spatial relationships and distances within the automatic SFO procedure. To do this, the authors model the minimum (min), mean (mean) and maximum (max) distances between a pair of cranial and facial landmarks. These distances can be obtained from any anthropometric study looking at the specific population group considered. They used two alternative approaches to deal with the landmark matching imprecision in SFO (using spheres or cones).

Using the cranial and facial landmarks together with the previous consideration, an EA iteratively searches for the best geometric transformation f, i.e. the optimal combination of the 12 parameters that minimizes the mean error (ME) fitness function [7]:

$$\begin{aligned} FME=\frac{\sum \limits _{i=1}^{Ncrisp} d^{\prime }(x_i,f(\tilde{C}^i))+\sum \limits _{j=1}^{Nfuzzy} d^{\prime \prime }(\tilde{F}^j,f(\tilde{C}^j))}{N}, \end{aligned}$$
(2)

where Ncrisp is the number of 2D facial landmarks precisely located (crisp points), Nfuzzy is the number of 2D facial landmarks imprecisely located and defined as bi-dimensional fuzzy sets, N is the total number of landmarks considered (N = Ncrisp + Nfuzzy), \(x_i\) corresponds to a 2D facial landmark defined as a crisp point (\(x_i \in F\)), \(\tilde{C}^i\) and \(\tilde{C}^j\) are fuzzy sets modeling each 3D cranial landmark and the soft tissue distance to the corresponding 3D facial landmark i or j; \(f\) is the function that determines the 3D-2D perspective transformation that properly projects every 3D skull point onto the 2D photograph; \(f(\tilde{C}^i)\) and \(f(\tilde{C}^j)\) are two fuzzy sets, corresponding to the result of applying the perspective transformation \(f\) to the 3D volume (either sphere or cone), which model the landmark matching uncertainty; \(\tilde{F}^j\) represents the fuzzy set of points of the imprecise 2D facial landmark; \(d^{\prime }(x_i, f(\tilde{C}^i))\) is the distance between a point and a fuzzy set of points, and \(d^{\prime \prime }(\tilde{F}^j, f(\tilde{C}^j))\) is the distance between two fuzzy sets.
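The structure of Eq. (2) can be sketched as follows. The two distance functions are passed in as arguments because their actual definitions (point to fuzzy set, and fuzzy set to fuzzy set) are given in [7, 13] and are not reproduced here. In the toy usage, a "set" is simply a list of 2D points and both distances are replaced by a minimum Euclidean distance; this stand-in and all names are assumptions made only to keep the example runnable.

```python
import math


def fuzzy_mean_error(crisp_pairs, fuzzy_pairs, d_point_to_set, d_set_to_set):
    """Average the landmark distances as in Eq. (2).

    crisp_pairs: list of (x_i, projected_cranial_set_i)
    fuzzy_pairs: list of (facial_fuzzy_set_j, projected_cranial_set_j)
    """
    n = len(crisp_pairs) + len(fuzzy_pairs)        # N = Ncrisp + Nfuzzy
    if n == 0:
        raise ValueError("at least one landmark pair is required")
    total = sum(d_point_to_set(x, fc) for x, fc in crisp_pairs)
    total += sum(d_set_to_set(ff, fc) for ff, fc in fuzzy_pairs)
    return total / n


# Stand-in distances (NOT the fuzzy distances of the cited works).
def min_point_to_set(point, points):
    return min(math.dist(point, q) for q in points)


def min_set_to_set(points_a, points_b):
    return min(math.dist(p, q) for p in points_a for q in points_b)


crisp = [((0.0, 0.0), [(3.0, 4.0), (6.0, 8.0)])]
fuzzy = [([(0.0, 0.0), (1.0, 0.0)], [(1.0, 1.0), (5.0, 5.0)])]
fme = fuzzy_mean_error(crisp, fuzzy, min_point_to_set, min_set_to_set)
```

Keeping the distances as parameters makes it possible to swap the sphere-based and cone-based models without touching the averaging step.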

3 Modeling Anthropometric Landmarks Relative Importance Within the Automatic SFO Process

3.1 Anthropometric Differences Among Landmarks for SFO Purposes

There are many possible rationales for differentiating or grouping landmarks. However, since not all of them can be tested to find the best grouping for every particular scenario, there is a need to choose among them. In this work, we focused on the anthropometric differences among landmarks in order to treat them differently within the SFO optimization process. In particular, we modeled the following three scenarios:

Landmark Classification I: According to Their Anatomical Definition. It has long been recognized that not all landmarks are equally identifiable. Accordingly, three types of landmarks are distinguished in [17], named type 1, 2, and 3 according to the decreasing precision of their anatomical location. Type 1 includes landmarks at which three different tissues meet. Type 2 defines points of maximum curvature or other local morphogenetic processes, usually with a biomechanical implication like a muscle attachment site. Finally, type 3 refers to external landmarks, which belong to a curve or surface. In addition, there is good reason to suspect that the identification precision differs among landmarks. In connection with the previous classification, a recent study analyzing the spatial distribution and precision of forensic experts while locating landmarks in facial photographs concluded that there is a significant correlation between the type of landmark and the precision of its location.

Landmark Classification II: According to the Rigid or Mobile Nature of the Region. The jaw is the only articulated part of the skull, hence slight or even large differences between the articulation of the mandible in the available facial photographs and in the 3D skull model are always expected. In fact, CFS practitioners call this region “terra incognita”, in the sense that they cannot precisely assess craniofacial correspondence in this region due to its mobile nature. Although jaw articulation has been widely studied and mathematically modeled, there is not a single CFS method or practitioner reproducing jaw articulation in ante-mortem images in a reliable and objective manner. An alternative way to address this source of error and uncertainty would thus be to introduce a mathematical model of the jaw articulation into the automatic SFO process so that it could be estimated for each particular ante-mortem image. However, even if the latter is successfully performed, there will always be a margin of error that justifies treating the landmarks within this region with more caution.

Landmark Classification III: According to the Presence or Absence of Soft Tissue. Most landmarks do not have an exact match between their position in the skull and in the face due to the facial soft tissue thickness. In contrast, a few landmarks (located in the teeth) have a direct matching relation since they are located in a bony region. Thus, it seems natural to consider this group of landmarks as the most representative for studying craniofacial anatomical correspondence, something recently corroborated by an experimental study developed within the framework of the European project MEPROCS. However, this higher importance has not been analytically modeled or tested within an automatic SFO procedure, which in any case will need the guidance of other landmarks because the teeth represent a mostly coplanar region.

3.2 Modeling the Differences Among Landmarks

Having distinguished different groups of landmarks, we model their different relative importance with the two previously mentioned approaches from the area of multicriteria decision making [6]: weighted sum and lexicographical order.

Weighted Sum. In this approach, all landmarks always contribute to the final fitness, although not all of them contribute equally. The contribution of each group depends on its relative importance and on the number of landmarks marked in that group in each case (so that the results of cases with a different number of marked landmarks per group can be compared fairly). More formally, the fitness of each individual of the Genetic Algorithm (GA) population is calculated according to Eq. (3):

$$\begin{aligned} Fitness = \frac{\sum _{i=1}^{n} w_{i}*fitnessLevel_{i}*nLevel_{i}}{\sum _{i=1}^{n} nLevel_{i}} \end{aligned}$$
(3)

where \(n\) is the number of groups, \(nLevel_{i}\) is the number of pairs of corresponding landmarks of group i located in a particular SFO case, \(w_{i}\), which ranges from 0 to 1, is the weight of group i (with \(\sum _{i=1}^{n} w_{i} = 1\)), and \(fitnessLevel_{i}\) is the result of calculating the fitness with just the landmarks of group i.
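Eq. (3) translates directly into a few lines of code. The sketch below is only illustrative: the function name is hypothetical, and the weights, per-group fitness values and landmark counts in the usage lines are made-up numbers rather than any parametrization used in the experiments.

```python
def weighted_sum_fitness(weights, fitness_levels, n_levels):
    """Combine per-group fitness values as in Eq. (3).

    weights[i]        -> w_i, relative importance of group i (sums to 1)
    fitness_levels[i] -> fitnessLevel_i, fitness using only group i landmarks
    n_levels[i]       -> nLevel_i, number of landmark pairs marked in group i
    """
    if not (len(weights) == len(fitness_levels) == len(n_levels)):
        raise ValueError("one weight, fitness and count per group is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total_landmarks = sum(n_levels)
    if total_landmarks == 0:
        raise ValueError("no landmarks were marked")
    numerator = sum(w * f * n
                    for w, f, n in zip(weights, fitness_levels, n_levels))
    return numerator / total_landmarks


# Illustrative values only: three groups with decreasing importance.
fitness = weighted_sum_fitness([0.5, 0.3, 0.2], [2.0, 4.0, 10.0], [4, 3, 3])
```

With these toy values the numerator is 4 + 3.6 + 6 = 13.6 and the ten marked landmarks give a fitness of 1.36. A group with no marked landmarks contributes nothing, which is what allows cases with different landmark counts per group to be compared.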

Once this proposal has been defined, the last step is to establish the value of the free parameters of this approach, i.e., the number of groups and the landmarks included in them, and the weight \(w_i\) of each particular group (their relative importance). While the three different and independent landmark grouping approaches have an anthropometric motivation, the weights could take any combination of values that sums to 1.

Lexicographical Order. This approach lexicographically minimizes the fitness of each individual of the GA population. The first group of landmarks is the most important and it always contributes to the final fitness. However, the information of a subsequent group is only used when two individuals are “similar” in all the previous groups. Two individuals are considered “similar” when the difference between their fitness values is lower than a threshold \(\epsilon \). Since the marked landmarks and the distances between them differ in each case, this epsilon has to be adaptive to each case, group and generation:

$$\begin{aligned} \epsilon _{i}^{it} = k*|bestFitness_{i}^{it}-worstFitness_{i}^{it}| \end{aligned}$$
(4)

where \(i\) is the group, \(it\) is the generation number, \(\epsilon _{i}^{it}\) is the adaptive \(\epsilon \) of group \(i\) at generation \(it\), \(bestFitness_{i}^{it}\) is the best value of the fitness at generation \(it\) calculated using only the landmarks of group \(i\), \(worstFitness_{i}^{it}\) is the worst value of the fitness at generation \(it\) calculated using only the landmarks of group \(i\), and \(k\) is a parameter that defines how strict the epsilon is.

The parameter \(k\) modulates how easily two individuals are considered “similar”. A high value of \(k\) produces more ties at each lexicographical level and thus the information of the less important groups of landmarks is considered more frequently.
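The comparison rule and the adaptive threshold of Eq. (4) can be sketched as follows. Each individual is represented by a list with one fitness value per landmark group, ordered from most to least important. The function names are hypothetical, and one detail is an assumption because the text does not specify it: when two individuals are similar in every group, the comparison is treated as a tie and neither is considered better.

```python
def adaptive_epsilons(population_fitness, k):
    """Compute one epsilon per group for the current generation (Eq. 4).

    population_fitness: list of individuals, each a list of per-group
    fitness values (lower is better).
    """
    n_groups = len(population_fitness[0])
    epsilons = []
    for i in range(n_groups):
        values = [individual[i] for individual in population_fitness]
        best, worst = min(values), max(values)
        epsilons.append(k * abs(best - worst))
    return epsilons


def lexicographic_better(a, b, epsilons):
    """Return True if individual a is lexicographically better than b."""
    for fit_a, fit_b, eps in zip(a, b, epsilons):
        if abs(fit_a - fit_b) >= eps:   # not "similar": this group decides
            return fit_a < fit_b
    return False                        # similar in all groups (assumed tie)


# Illustrative population: two groups, three individuals.
population = [[1.0, 5.0], [1.05, 2.0], [2.0, 9.0]]
loose = adaptive_epsilons(population, 0.1)    # larger k, more ties
strict = adaptive_epsilons(population, 0.01)  # smaller k, fewer ties
```

In this toy population the first two individuals differ by 0.05 in the first group. With k = 0.1 they count as similar there, so the second group decides and the second individual wins; with k = 0.01 the first group alone decides and the first individual wins.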

4 Experiments

A total of 324 different experiments were carried out. These involved 18 SFO problem instances corresponding to nine cases of live people (from Spain and Italy), three different parametrizations for both weighted sum and lexicographical order approaches, and the three different landmark classifications. Table 1 shows a summary of the experiments that have been carried out, along with the configuration of their parameters. Since all the approaches tested are based on stochastic processes, 10 independent runs were performed for each problem instance to compare the robustness of the methods and to avoid any possible bias.

Table 1. Experimental design for Landmark classification I, II and III

Table 2 shows the average error distance of our approaches for all the SFO cases. The error of each experiment was calculated by measuring the Euclidean distance from the ground truth (GT) point to the closest point of the backprojection line for the obtained geometric transformation f (see [7] for a detailed explanation of this validation metric). Weighted sum performs slightly better when the difference between weights is small (W3). Similarly, lexicographical order performs better when the similarity function is relaxed (larger k values, L3). This similar behaviour is more evident in the first and second groups of landmarks, and it does not apply to the third group, probably because of the limited number of bony landmarks. In fact, both approaches also reach very similar average errors when considering the latter group of landmarks. However, weighted sum performs better in the remaining two cases. The best parametrization for each particular approach and landmark group is marked in bold. G3W2 and G3L1 turned out to be the best-performing approaches, with similar average distance errors. Note that the third group of landmarks (G3) could only be tested on a small subset (four cases) because landmarks could not be located in the teeth in the remaining cases. Among the approaches that could be applied over the entire data set, the weighted sum with parametrization W3 is the best approach (G1W3 and G2W3).

Table 2. Mean error in mm regarding the ground truth of all the SFO cases (18 in total) for each particular approach, landmarks classification and parametrization
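The geometric core of the validation metric described above, the distance from a 3D ground truth point to its closest point on a line, can be sketched as follows. Representing the backprojection line by a point on it and a direction vector is an assumption made for illustration; how that line is obtained from the estimated transformation f is detailed in [7] and is not reproduced here.

```python
import math


def point_to_line_distance(gt, line_point, line_direction):
    """Euclidean distance from the 3D point gt to the closest point of the
    line passing through line_point with direction line_direction."""
    norm = math.sqrt(sum(c * c for c in line_direction))
    if norm == 0.0:
        raise ValueError("line direction must be non-zero")
    u = [c / norm for c in line_direction]               # unit direction
    v = [gt[i] - line_point[i] for i in range(3)]        # line point -> gt
    along = sum(v[i] * u[i] for i in range(3))           # component on the line
    perpendicular = [v[i] - along * u[i] for i in range(3)]
    return math.sqrt(sum(c * c for c in perpendicular))


# Toy check: a point 5 units away from the z axis.
error = point_to_line_distance((3.0, 4.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0))
```

Averaging this quantity over the landmarks of a case would give a per-case error in the same units as the 3D model.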

For all cases, there is no statistically significant difference between MC45 and the two proposals when they model the differences according to the presence or absence of soft tissue (G3W2, G3L1). However, in our dataset this classification is only available for frontal poses, so the comparison could be misleading. For the rest of the approaches, MC45 is significantly better than our proposals.

For lateral cases, our results are always significantly worse than those of MC45. However, for frontal cases, G2W3, G3W2 and G3L1 have shown a performance as good as that of MC45, and sometimes slightly better, although no significant differences have been found.

Since the two proposals clearly behave in completely different ways in frontal and lateral poses, the next step is to study in depth the reason behind this.

It is crucial to facilitate the location of a significant number of facial landmarks in order to properly determine the geometric transformation. Accordingly, the performance in those cases that did not have enough landmarks was unsatisfactory. We also performed a Pearson test in order to measure the correlation between the number of landmarks and the final performance. This showed that the performance is related not just to the number of landmarks but also to which landmarks are located.
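For reference, the Pearson correlation coefficient used in this kind of analysis can be computed as in the sketch below. The two lists in the usage lines are placeholder values chosen to be perfectly linear; they are not measurements from the experiments, and the variable names are hypothetical.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("two samples of equal length (>= 2) are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder data: more landmarks, lower error (perfectly linear by design).
landmark_counts = [1, 2, 3, 4]
placeholder_errors = [8.0, 6.0, 4.0, 2.0]
r = pearson_correlation(landmark_counts, placeholder_errors)
```

A coefficient close to -1 would indicate that the error decreases as more landmarks are marked, while a weak coefficient would be compatible with the observation that the identity of the landmarks also matters.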

5 Conclusion

This paper addressed the SFO stage and the problem of the relative importance of landmarks according to their anthropometric differences. We modeled this relative importance using two classic approaches from multicriteria decision making [6] (weighted sum and lexicographical order) and integrated them into the current SFO stage with three different ways of classifying landmarks.

The weighted sum obtained better results than the lexicographical order in almost all the experiments. At first sight, the performance of both approaches was significantly worse than that of the state-of-the-art method. However, when the cases were analyzed separately depending on the facial pose of the subject in the ante-mortem photographs, they showed very different behaviors. On the one hand, in lateral pose cases the performance was significantly worse than the state of the art. This poor performance appears to be closely related to the small number of landmarks in the first groups. On the other hand, in frontal pose cases the performance was slightly better than that of the state-of-the-art proposal, although the differences were not significant. In summary, although further testing seems necessary, promising results were obtained in those cases where the forensic expert located a large number of landmarks, and worse results in those cases with few landmarks.

Promising research lines for future work include the study of other ways of classifying landmarks as well as modeling other relationships among landmarks, such as their correlation due to face symmetry. Future work will also focus on progressively reducing the uncertainty in fuzzy landmarks. Lastly, another interesting research line is to use memetic algorithms [18] in the current proposal as a means of local refinement of the chromosomes.