
1 Introduction

Automatic recognition and authentication of individuals is becoming an important research field in computer vision. Its applications are varied and serve to protect goods and lives: they are deployed in airports, military facilities and banks, as well as by the police for criminal investigation.

3D authentication is also used in video surveillance, biometrics, autonomous robotics, intelligent human–machine interfaces, photography, and image and video indexing. Different biometric characteristics, such as fingerprint, vein, hand geometry and iris, can be used for automatic authentication. The drawback of these modalities is that the cooperation of the candidate is usually essential. Among biometric modalities, face recognition is particularly important. Its main advantages are that the candidate does not need direct physical contact with the system, and that it is a transparent process, since hidden cameras can be used without the candidate's knowledge.

It has been shown that 2D face recognition is sensitive to illumination changes and variations of pose and expression. 3D face recognition is likely to solve these problems: it is more robust against pose changes and can better handle illumination variations [1]. In the last five years, the use of 3D techniques for video, face recognition and virtual reality applications has increased rapidly in both academia and industry, and the acquisition technology for 3D images has become simpler and cheaper. The main advantage of 3D data is that it preserves all the geometric information of the object, providing a representation close to reality.

In this paper, we work in the region unaffected by pose changes and variations of facial expression. We focus on the anthropometric landmarks, which play an important role in performing quantitative measurements, and we extract the most important features that help us to recognize the 3D face. Our feature vector is composed of the Euclidean distances and angles between these keypoints, the entropy of the region of interest and the width of the forehead.

The paper is organized as follows. In Sect. 2, an overview of related research is presented. The 3D face databases are described in Sect. 3. In Sect. 4, we detail our contribution to 3D face characterization. The experiments and results are discussed in Sect. 5. The conclusion and perspectives are presented in Sect. 6.

2 Related Works

Recently, many authors have worked on 3D face analysis. Among them, the algorithm developed in [2] uses differential geometry descriptors such as derivatives, coefficients of the fundamental forms, different types of curvatures and the Shape Index. The authors then compute the geodesic and Euclidean distances between landmarks, the nose volume and the ratios between geodesic and Euclidean distances.

The method in [3] is based on a new triangular surface patch (TSP) descriptor to localize landmarks on the 3D face. This descriptor represents a surface patch of an object and is combined with a fast approximate nearest-neighbour lookup to find similar surface patches from training heads.

In the research work of Salazar et al. [4], each landmark is located automatically on the face surface by means of a Markov network. The network captures the statistics of a surface property around each landmark and the structure of the connections. The location of each landmark on a test model is estimated by probabilistic inference over the Markov network, performed with the loopy belief propagation algorithm.

Another work [5] combined a linear method, namely linear discriminant analysis, with a nonlinear method, namely AdaBoost, to characterize the 3D face.

The work of Böer et al. [6] used the Discriminative Generalized Hough Transform (DGHT) to describe 3D face scans.

Ballihi et al. [7] are interested in facial curves: level sets (circular curves) and streamlines (radial curves). They used a geometric shape analysis framework based on Riemannian geometry to compare pairs of facial curves, and a boosting method to highlight the geometric features relevant to the target application.

Ramadan and Abdel-Kader [8] suggested using a spherical wavelet parameterization of the 3D face image.

In [9], feature extraction is performed using geodesic distances and linear discriminant analysis (LDA).

The method of Han et al. [10] is based on mapping a deformable model to a given test image via two transformations, rigid and nonrigid. They utilized two intrinsic geometric descriptors, namely geodesic and Euclidean distances, to represent the facial models through a set of distance-based features and contours.

Berretti et al. [11] proposed three different signatures to locally describe the 3D face at the keypoints, namely the histogram of gradients (HOG), the signature of histograms of orientations (SHOT) and the geometric histogram (GH).

In [12], Fang et al. applied the principal component analysis (PCA) method to the feature space to reduce its dimensionality, and then applied LDA to find the optimal subspace that preserves the most discriminant information; the two methods are thus applied sequentially. The feature vectors combine geometric information of the landmarks with statistics on the density of edges and the curvature around the landmarks.

The method introduced in [13] is based on wavelet descriptors as a multiscale tool to analyze the 3D face. The authors used ellipsoidal cropping, guided by detected facial landmarks, to detect and crop the facial region. During preprocessing of the facial region, they applied a procedure called Sorted Exact Distance Representation to fill holes, and a Gaussian filter was applied to the range image to smooth the surface. They then used the Gaussian function as the mother wavelet to extract the landmarks.

In our work, we therefore choose to improve geometric descriptors based on anthropometric measurements and Euclidean distances between feature points of a 3D meshed face.

Experimental results are obtained on the GAVAB and FRAV 3D databases. Our technique outperforms recent state-of-the-art methods.

3 3D Face Databases

3.1 GAVAB Database

GAVAB is a database consisting of 549 3D facial range scans (Fig. 1) of 61 different subjects (45 male and 16 female), captured by a Minolta Vivid 700 scanner. The faces were placed at 1.5 ± 0.5 m from the scanner. Although the chair (with wheels) was at 1.5 m, the individuals were free to move their head normally, which produces resolution changes from one image to another. The height of the chair could be adjusted to keep the head visible to the scanner [14].

Fig. 1: 3D face scan

Each image is a mesh of connected 3D points of the facial surface. Texture information of each vertex has been eliminated to reduce the size of the face models. The subjects are all Caucasian, and most of them are aged between 18 and 40 [15]. Some examples from the GAVAB database are shown in Fig. 2.

Fig. 2: Example of scans from the GAVAB database

Each subject has been scanned nine times with different poses and facial expressions. The scans with pose variations comprise one facial scan looking up (+35°), one looking down (−35°), one right profile (+90°) and one left profile (−90°). Figure 3 shows some examples of pose variation [15].

Fig. 3: Example of pose variation

The facial scans without pose changes include four near-frontal scans: two with a neutral expression, one with a smile and one with an accentuated laugh. Figure 4 shows different facial expressions of some subjects in the database [15].

Fig. 4: Variation in facial expression

In addition, GavabDB is the noisiest dataset currently available to the public, with many holes and missing data, especially in the eyebrows (Fig. 5a). The images also contain a lot of noise and nonfacial regions such as the neck, hands, shoulders and hair (Fig. 5b). Moreover, each person has several scans with different poses and facial expressions.

Fig. 5: a Scans from the GAVAB database showing varying amounts of holes (missing data). b Scans from the GAVAB database showing varying amounts of noise and the presence of nonfacial regions

The most important problem with this database is that the scans of a given person do not all have the same number of faces and vertices (Fig. 6), which can cause problems for the extraction of the region of interest. Besides, this database shows a high interpersonal variation.

Fig. 6: Different scans of the same person showing intrapersonal variation in the number of vertices and faces

As a matter of fact, this database contains all types of scans: pose variations, many facial expressions, occlusions, fat and slim persons, etc. Figure 7 presents the variation between some individuals.

Fig. 7: Intrapersonal and interpersonal variations in the scans of the GAVAB database

Therefore, we use this database to evaluate our technique, since it has been used for automatic face recognition experiments and for other facial applications such as pose correction or the registration of 3D facial models.

3.2 FRAV 3D Database

This database contains 106 subjects, with approximately one woman for every three men. The data were acquired with a Minolta VIVID 700 scanner, which provides texture information and a VRML file. Some examples from the FRAV 3D database are shown in Fig. 8 [16].

Fig. 8: Example of scans from the FRAV 3D database

Each person has 16 captures with different poses and lighting conditions, seeking to cover all possible variations, including turns in different directions, as shown in Fig. 9 [16].

Fig. 9: Example of scans from the FRAV 3D database showing pose variation

Moreover, this database provides scans with different facial expressions, as shown in Fig. 10.

Fig. 10: Example of scans from the FRAV 3D database with different facial expressions

In every case, only one parameter was modified between two consecutive captures. This is one of the main advantages of this database with respect to others [16].

4 Proposed Approach

In this paper, a novel approach based on landmark localization and characterization of a 3D face is presented. It comprises five essential steps (Fig. 11). We begin by reading the 3D mesh. Then we extract the region of interest (ROI) using anthropometric proportions and a cropping filter. After that, the extracted part is preprocessed. Finally, after detecting the salient points, we use the Euclidean distances and the angles between them to determine the feature vector.

Fig. 11: Organization chart of our approach

4.1 Region of Interest Detection

It is well known that, according to anthropometric proportions, the face is divided into three parts: the first is between the upper forehead and the eyebrows, the second between the eyebrows and the nose tip, and the third between the nose tip and the chin.

We focus on the second part of the face, where the vertices lie in the interval [1/3, 2/3] of the image height (Fig. 12). We then work precisely in the region of the nose tip, located at the end of this second part.

Fig. 12: Detection and extraction of the region of interest

The regions of interest (eyes + nose) are automatically detected for frontal scans and for scans with pose variation. We choose to work on these regions, above the mouth, to avoid expression variations. This is called the static part because it is weakly affected by variations of facial expression, while the lower part, which includes the mouth, is strongly affected by them. In this part, we detect the keypoints that serve as the foundation for the most prominent features composing the characteristic vector.
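To make this concrete, the following minimal sketch (our illustration, not a verbatim implementation) crops the static band under the assumption that the scan's vertical axis is y and that the middle third of the bounding box spans the eyebrows-to-nose-tip region:

```python
import numpy as np

def extract_static_region(vertices):
    """Crop the static upper-face band using anthropometric proportions.

    A hedged sketch: keep vertices whose height lies in the middle third
    [1/3, 2/3] of the face bounding box, i.e. between the eyebrows and
    the nose tip. `vertices` is an (N, 3) array; y is assumed vertical.
    """
    y = vertices[:, 1]
    span = y.max() - y.min()
    lo = y.min() + span / 3.0
    hi = y.min() + 2.0 * span / 3.0
    mask = (y >= lo) & (y <= hi)     # vertices inside the static band
    return vertices[mask], mask
```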

4.2 3D Face Preprocessing

Preprocessing is a very important and nontrivial step in biometric recognition systems. We preprocess the 3D face to minimize the effects of the intrinsic noise of the scanner, such as holes and spikes, and to remove undesired parts such as clothes, neck, ears and hair. This step is common to all 3D scan processing pipelines. Our preprocessing has three stages: segmentation; hole filling via Delaunay triangulation; and denoising with a median filter and diffusion. The details of these stages are given in the following.

  • Segmentation

The first step of the preprocessing is segmentation, in which we split the 3D face into two areas. We rely on the anthropometric proportions of the face to detect the region of interest that we need (the eyes and the nose); indeed, we are interested in the second part of the face. Then, we use a cropping filter to extract the ROI. Figure 13 shows the regions of interest we work on. These regions are unaffected by variations of facial expression.

Fig. 13: 3D face segmentation

  • Filling holes and Delaunay triangulation

Holes are created either by the absorption of the laser in dark areas, such as eyebrows and mustaches, or by self-occlusion or open mouths [17]. These holes and the noise in a 3D scan reduce the accuracy of the recognition system, which is why it is important to fill the holes in the input mesh.

Building a triangular mesh from scratch is time-consuming and raises many issues, so we apply Delaunay triangulation (Fig. 14) to fill holes in the mesh of the extracted face region and to place the points on a regular grid. This technique is not only fast and simple, but also solves the problem efficiently.

Fig. 14: Delaunay triangulation to fill holes. a Delaunay triangulation. b Filling holes
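As a hedged sketch of this step, assuming near-frontal scans and SciPy's Delaunay implementation (the paper does not name a library), re-triangulating the vertices projected onto the XY plane yields a face list that spans interior holes:

```python
import numpy as np
from scipy.spatial import Delaunay

def retriangulate(vertices):
    """Fill holes by re-triangulating the ROI projected onto the XY plane.

    The 2D Delaunay triangulation of the (x, y) coordinates spans interior
    holes, while z values stay attached to their original vertices.
    """
    tri = Delaunay(vertices[:, :2])   # triangulate the projected points
    return tri.simplices              # new face list: indices into vertices
```

Note that a 2D Delaunay triangulation covers the convex hull of the projected points, so strongly concave boundaries may be bridged; the cropped ROI, however, is close to convex.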

  • Denoising

In this stage, we use a median filter to remove the undesirable spikes and clutter introduced by the 3D scanner (Fig. 15). This filter may, however, also suppress genuine small variations in the data that appear on the 3D surface.

Fig. 15: Median filter

Finally, we apply a mesh heat diffusion flow to eliminate as much of the remaining undesirable noise as possible (Fig. 16).

Fig. 16: Denoising the 3D face
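The two denoising steps can be sketched as follows (an illustration under stated assumptions, not the authors' code): a median filter on the range image suppresses spikes, and an explicit Laplacian smoothing loop approximates the heat diffusion flow. The neighbour structure, the step size `lam` and the iteration count are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_spikes(range_image, size=3):
    # Median filter: replaces each depth value by the median of its
    # neighborhood, suppressing isolated spikes (Fig. 15).
    return median_filter(range_image, size=size)

def heat_diffusion(vertices, neighbors, iterations=10, lam=0.5):
    """Explicit Laplacian smoothing as a stand-in for the heat diffusion
    flow (Fig. 16). `neighbors[i]` lists the vertex indices adjacent to
    vertex i (assumed non-empty); each vertex moves a fraction `lam`
    toward the centroid of its neighborhood at every iteration."""
    v = vertices.astype(float).copy()
    for _ in range(iterations):
        centroids = np.array([v[n].mean(axis=0) for n in neighbors])
        v += lam * (centroids - v)
    return v
```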

4.3 3D Face Characterization

Craniofacial measurements are used in different fields: in sculpture, to create well-proportioned facial ideals; in anthropology, to analyze prehistoric human remains; and, more recently, in computer vision (estimation of head orientation, detection of characteristic points of a face) and in computer graphics, to create parametric models of human faces.

In this paper, we propose an approach to 3D face characterization using anthropometric measures based on seven salient landmarks: left inner eye corner, left outer eye corner, right inner eye corner, right outer eye corner, centre of the root of the nose, nose tip and forehead point.

We extract the desired region of interest (eyes and nose), which is unaffected by variations of pose, occlusions and facial expressions. We focus on the third phase of the facial recognition system: characterization. This step, the most important and interesting of any recognition system, takes place after the detection and preprocessing phases and is based on an analysis of the 3D shape image.

After detecting the points of interest, we build our feature vector, which is composed of:

  • Entropy of the region of interest (see the sketch after this list);

  • Euclidean distances;
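The paper does not specify how the entropy is computed; one plausible reading, given here as a hedged sketch, is the Shannon entropy of the histogram of depth values over the region of interest:

```python
import numpy as np

def roi_entropy(depth_values, bins=64):
    """Shannon entropy of the ROI (an assumed interpretation: the entropy
    of the histogram of z values over the region of interest)."""
    hist, _ = np.histogram(depth_values, bins=bins)
    p = hist[hist > 0] / hist.sum()      # nonzero bin probabilities
    return float(-np.sum(p * np.log2(p)))
```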

Figure 17 shows the different Euclidean distances used in our approach as features to characterize the 3D face.

Fig. 17: Euclidean distances

where:

$$ A = \begin{pmatrix} x_{1} \\ y_{1} \\ z_{1} \end{pmatrix};\quad B = \begin{pmatrix} x_{2} \\ y_{2} \\ z_{2} \end{pmatrix};\qquad L_{1} = \sqrt{(x_{2} - x_{1})^{2} + (y_{2} - y_{1})^{2} + (z_{2} - z_{1})^{2}} $$
(1)
$$ C = \begin{pmatrix} x_{3} \\ y_{3} \\ z_{3} \end{pmatrix};\quad D = \begin{pmatrix} x_{4} \\ y_{4} \\ z_{4} \end{pmatrix};\qquad L_{2} = \sqrt{(x_{4} - x_{3})^{2} + (y_{4} - y_{3})^{2} + (z_{4} - z_{3})^{2}} $$
(2)
$$ B = \begin{pmatrix} x_{2} \\ y_{2} \\ z_{2} \end{pmatrix};\quad C = \begin{pmatrix} x_{3} \\ y_{3} \\ z_{3} \end{pmatrix};\qquad L_{3} = \sqrt{(x_{3} - x_{2})^{2} + (y_{3} - y_{2})^{2} + (z_{3} - z_{2})^{2}} $$
(3)
$$ \begin{aligned} & E = \begin{pmatrix} a_{1} \\ a_{2} \\ a_{3} \end{pmatrix};\quad F = \begin{pmatrix} b_{1} \\ b_{2} \\ b_{3} \end{pmatrix};\qquad D_{1} = \sqrt{(b_{1} - a_{1})^{2} + (b_{2} - a_{2})^{2} + (b_{3} - a_{3})^{2}} \\ & a_{1} = \frac{x_{1} + x_{2}}{2};\; a_{2} = \frac{y_{1} + y_{2}}{2};\; a_{3} = \frac{z_{1} + z_{2}}{2};\quad b_{1} = \frac{x_{3} + x_{4}}{2};\; b_{2} = \frac{y_{3} + y_{4}}{2};\; b_{3} = \frac{z_{3} + z_{4}}{2} \end{aligned} $$
(4)
$$ G = \begin{pmatrix} x_{2} \\ y_{2} \\ z_{2} \end{pmatrix};\quad H = \begin{pmatrix} x_{5} \\ y_{5} \\ z_{5} \end{pmatrix};\qquad D_{2} = \sqrt{(x_{2} - x_{5})^{2} + (y_{2} - y_{5})^{2} + (z_{2} - z_{5})^{2}} $$
(5)

Equations (1)–(5) determine, respectively, the length of the right eye, the length of the left eye, the length of the root of the nose, the distance between the eyes, and the distance between the eyes and the nose. These features are calculated as Euclidean distances between the keypoints detected in the region of interest. We extract the coordinates of each point along the three axes (x, y, z); the Euclidean distance between two points is then the square root of the sum of the squares of the differences between corresponding coordinates.
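As an illustration of Eqs. (1)–(5), the sketch below computes the five distance features with NumPy; the keypoint coordinates are hypothetical placeholders:

```python
import numpy as np

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

# Hypothetical keypoint coordinates, labelled as in Eqs. (1)-(5).
A = (0.0, 0.0, 0.0)   # right outer eye corner (x1, y1, z1)
B = (3.0, 0.5, 0.2)   # right inner eye corner (x2, y2, z2)
C = (5.0, 0.5, 0.2)   # left inner eye corner  (x3, y3, z3)
D = (8.0, 0.0, 0.0)   # left outer eye corner  (x4, y4, z4)
H = (4.0, -3.0, 1.5)  # nose tip               (x5, y5, z5)

L1 = euclidean(A, B)          # Eq. (1): length of the right eye
L2 = euclidean(C, D)          # Eq. (2): length of the left eye
L3 = euclidean(B, C)          # Eq. (3): length of the root of the nose
E = np.mean([A, B], axis=0)   # centre of the right eye (midpoint)
F = np.mean([C, D], axis=0)   # centre of the left eye (midpoint)
D1 = euclidean(E, F)          # Eq. (4): distance between the eyes
D2 = euclidean(B, H)          # Eq. (5): distance between eyes and nose
```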

  • Angles between keypoints

Figure 18 shows the angles between the keypoints, where (a) is the angle between the nose tip, the left outer eye corner and the right outer eye corner; (b) the angle between the nose tip, the left inner eye corner and the right inner eye corner; (c) the angle between the nose tip, the right inner eye corner and the right outer eye corner; (d) the angle between the nose tip, the left inner eye corner and the left outer eye corner; and (e) the angle between the nose tip, the centre of the root of the nose and the centre of the forehead.

Fig. 18: Angles between keypoints

We use trigonometric functions to determine the angles between the landmarks: specifically, the arccosine, which returns the angle whose cosine is the given value. This function computes the angle between two three-dimensional vectors, as sketched below.
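A minimal sketch of this computation (assuming NumPy; not the authors' code):

```python
import numpy as np

def angle_deg(vertex, p, q):
    """Angle at `vertex` formed by points p and q, in degrees: the
    arccosine of the normalized dot product of the two edge vectors,
    clipped to [-1, 1] to guard against floating-point rounding."""
    vertex, p, q = (np.asarray(x, float) for x in (vertex, p, q))
    u, v = p - vertex, q - vertex
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
```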

As shown in Fig. 19, noses can take many different shapes (dorsal hump, long nose, tension tip, overprojected tip, bulbous tip and underprojected tip). Therefore, we add another primitive: the angle between the nose tip, the centre of the root of the nose and the centre of the forehead (Fig. 18e). This angle differs greatly from one individual to another, as it depends on the shape of the nose.

Fig. 19: Different shapes of noses

The principal keypoints detected are as follows:

  • A: Right outer eye corner.

  • B: Right inner eye corner.

  • C: Left inner eye corner.

  • D: Left outer eye corner.

  • H: Nose tip.

  • I: Centre of the forehead

The keypoints E, F and G are computed from the principal keypoints:

  • E: Centre of the right eye.

  • F: Centre of the left eye.

  • G: Centre of the root of the nose.

  • Width of forehead;

Both the GAVAB and the FRAV 3D databases contain a variety of scans. Each person has specific characteristics and a specific face shape (see Fig. 20). We therefore add a specific characteristic, the width of the forehead, to the feature vector. This feature helps us to determine the shape of the face.

Fig. 20: Different forms of the human face

Fig. 21: Arc of the circle

We begin by extracting the face within a circle whose centre is the nose tip (Fig. 22). Then, applying Eqs. (6) and (7), we calculate the width of the forehead as an arc of this circle. In Fig. 21, L is the arc of the circle.

Fig. 22: Measure of forehead width

$$ \alpha = \frac{180\,\theta}{\pi} $$
(6)
$$ L = \frac{\alpha \pi r}{180} $$
(7)

where:

  • r = Radius: distance between the nose tip and the first vertex of the 3D face.

  • θ = Angle (in radians) between the nose tip, the left outer eye corner and the right outer eye corner (Fig. 18a).

  • L = Length of the arc = Width of the forehead.
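A small worked sketch of Eqs. (6) and (7); note that converting θ to degrees (Eq. 6) and back into an arc length (Eq. 7) composes to the classic arc-length formula L = rθ:

```python
import math

def forehead_width(r, theta):
    """Width of the forehead as an arc of the cropping circle.

    `r` is the circle radius and `theta` the angle in radians between
    the nose tip and the two outer eye corners."""
    alpha = 180.0 * theta / math.pi       # Eq. (6): radians to degrees
    return alpha * math.pi * r / 180.0    # Eq. (7): arc length, = r * theta
```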

The advantage of our approach is that we work only in the upper part of the face, so only half of the circle is needed to extract the pertinent characteristics of the 3D face. These characteristics are then fed into a standard classifier, such as a support vector machine (SVM), to handle the classification.
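As an illustration of this final stage, the following sketch uses scikit-learn's SVC on placeholder data; the actual kernel, feature dimension and training protocol are not detailed in the paper and are assumptions here:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one row per scan (distances, angles, entropy,
# forehead width), one identity label per row. Values are hypothetical.
rng = np.random.default_rng(0)
X = rng.random((20, 11))            # 20 scans, 11 geometric features
y = np.repeat(np.arange(10), 2)     # 10 subjects, 2 scans each

# Standardize the heterogeneous features, then train an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:1]))           # predicted identity of a probe scan
```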

5 Experiments and Results

We test our algorithm on the GAVAB and FRAV 3D face databases. Experimental results show that the proposed method is comparable to state-of-the-art methods in terms of robustness, flexibility and accuracy (Table 1).

Table 1 Example of measures of characteristics of scans from both GAVAB and FRAV 3D databases

We choose to work with geometric measurements because they are stable over time and perceptually salient. These signatures are among the most discriminative and consistent for handling pose variation, occlusions, missing data and facial expressions. Furthermore, the literature shows an absence of 3D biometric systems based on the fusion of these features.

The major contribution of this paper is thus a new geometric-feature approach for a 3D face recognition system that handles the challenge of facial expression.

Moreover, our approach is fully automatic and works only on the upper part of the 3D face; we succeed in extracting this region in 100% of cases. The approach is therefore not only robust but also fast in execution time.

6 Conclusion and Perspectives

In our approach, we use anthropometric measurements to extract the pertinent primitives of the 3D face. We focus on the upper face because this region is unaffected by variations of facial expression. We use seven salient points that are unaffected by variations of pose, occlusions and facial expressions, and we use the Euclidean distances and the angles between these points to build the feature vector of the 3D face. This vector is the core of the 3D face recognition system.

Future evaluations should be carried out on more challenging datasets with a larger number of images, also incorporating rotations and translations, in order to improve the success rate.