1 Introduction

The analysis of 3D faces is important in many applications, especially in the biometric and medical fields. Such applications aim to accurately relate information from different meshes in order to compare them. A common approach to comparing 3D meshes is rigid registration, where two or more meshes are fitted into exact alignment with one another. The localization of specific landmarks and regions on faces often plays an important part in these applications. Landmarks can help registration algorithms achieve a rough alignment of meshes, and in biometric applications they are often instrumental in the generation of signatures for faces [1]. In this paper, we propose a robust framework to accurately localize a landmark, namely the nose-tip, on three-dimensional faces obtained from 3D meshes. We have tested our method by applying it to the meshes generated from the landmark points available in the Bosphorus database, and finally measured how much the nose-tips of the generated model deviate from those of the original landmarked model. This paper is organized as follows. Section 2 discusses some of the related works on 3D landmarking. Section 3 describes the proposed algorithm. Section 4 presents the experimental results and discussions. Finally, conclusions and future scope are given in Sect. 5.

2 Related Works

In this section, we discuss some existing approaches to 3D face detection, landmark localization, face registration and statistical models for face analysis. Colombo [2] performed 3D face detection by first identifying candidate eyes and noses with a classifier. However, the authors highlight that their method is highly sensitive to the presence of outliers and holes around the eye and nose regions. Most of the existing approaches [3,4] target face localization rather than detection, where the presence and number of faces is known. In Ref. [4], face localization is performed by finding the nose tip, and segmentation is done with a cropping sphere centered at the nose tip. This approach is highly restricted to the database used, as each input mesh is assumed to contain only one frontal face and no pose variation is taken into consideration. Moreover, the cropping sphere has a fixed radius over the entire database, so the segmentation is not robust to scale variation. In Ref. [4], 3D point clustering using texture information is performed to localize the face. This method relies on the availability of a texture map, and the authors report reduced stability for head rotations greater than ±45° from the frontal pose. Once faces are detected and segmented, landmark localization is often used for face analysis. Many existing approaches rely on accurately locating corresponding landmarks or regions to perform a rough alignment of meshes. In Ref. [5], a curvature-based "surface signature" image is used to locate salient points for the estimation of the rigid transformation. The shape models used in these works do not involve fitting the model to 3D data devoid of texture. The dependence on prior knowledge of feature-map thresholds, orientation and pose is evident in most existing methods for landmark localization on meshes [6,7]. None of the above models takes pose variation into consideration to any significant degree. In contrast, the landmarking method described in this paper is invariant to pose variation.

3 Proposed Algorithm

A range image (Fig. 1) is a set of points in 3D, each carrying the intensity of an individual pixel. The technique also holds for 2.5D images, which may be described as containing at most one depth value for every (x, y) coordinate. The acquisition process is as follows. A 3D mesh image is normally captured with a 3D camera such as a Minolta Vivid 700, and a range image is generated from the 3D mesh. The image in our case is of the form z = f(x, y). Next, some pre-processing is applied to eliminate unwanted details such as facial hair, scars etc. Finally, using a geometric approach, the nose-tip is localized.

Fig. 1

Face images: 2.5D range image

Now, we shall discuss the proposed algorithm. Figure 2 gives an overview of the proposed technique.

Fig. 2

An overview of the present proposed method

The present technique makes use of the following steps:

3.1 Processing of Landmark Data from Bosphorus Database

We use the 3D facial models from the Bosphorus database. For each scan, the database provides a .lm3 file (the landmark file) and a .bnt file (the 3D data). From this landmarking model, i.e. the .lm3 file, we build a parameterized model Ω = Υ(b), where Ω = {ω1, ω2, …, ωN}, with ωi = (xi, yi, zi) representing each landmark.

As can be seen in Fig. 3, there are 14 landmarks in the Bosphorus database. We took this data, discarded the non-face regions and then built up the model, i.e. the landmarked range image. The landmarked range image generated by our system is shown in Fig. 4.

Fig. 3

Sample face showing the 14 landmarks used to generate the landmark model

Fig. 4

Sample face from the Bosphorus database showing the landmark model. a Frontal pose. b Rotated about x-axis. c Rotated about y-axis at 30°. d Rotated about yz-axis

Figure 4 shows the landmark models for the frontal pose and for rotations about the x-, y- and yz-axes. Although the entire model is built from only 14 landmarks, it still clearly represents a face. The 14 landmark points of each individual, as given in the Bosphorus database, have to be normalized and brought into a common coordinate system; otherwise the performance of our comparison-based system could not be judged fairly. Pre-processing of the landmark model is not required here because it contains hardly any noise.
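The normalization step just described can be sketched as follows. This is a minimal NumPy illustration; the function name `normalize_landmarks` and the particular convention (centering at the centroid and scaling to unit root-mean-square size) are our assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def normalize_landmarks(points):
    """Center a set of 3D landmarks at their centroid and scale them so
    that the root-mean-square distance from the centroid is 1, bringing
    every individual's landmarks into a common coordinate frame.
    `points` is an (N, 3) array of (x, y, z) landmark coordinates."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid          # remove translation
    rms = np.sqrt((centered ** 2).sum(axis=1).mean())
    return centered / rms                 # remove scale

# Toy example with three landmarks
pts = normalize_landmarks([[0, 0, 0], [2, 0, 0], [0, 2, 0]])
```

After this step, landmark sets from different scans differ only by rotation, which is what a comparison-based system requires.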

3.2 3D Range Image Acquisition

A range image is an array of numbers that quantify the distances from the focal plane of the sensor to the surfaces of objects within the field of view, along rays emanating from a regularly spaced grid. Unlike 3D mesh images, range images make the 3D information easy to utilize because the 3D information of each point is explicit on a regularly spaced grid. Figure 5 shows samples of range images from the Bosphorus database that we used for testing.

Fig. 5
Samples from the Bosphorus database corresponding to a single person: frontal pose (a), image rotated about y-axis (b), image rotated about x-axis (30°) (c), image rotated about yz-axis (d)

The mesh-grids corresponding to the range images shown in Fig. 5a–d are shown in Fig. 6.

Fig. 6
Mesh-grids from the Bosphorus database corresponding to a single person (female): frontal pose (a), image rotated about y-axis (b), image rotated about x-axis (30°) (c), image rotated about yz-axis (d)

3.3 Perform Pre-Processing on the 3D Image

Sometimes 3D face images are affected by noise and several other factors, so some smoothing technique has to be applied. In the present technique, we extend the concept of 2D weighted median filtering to 3D face images: the 3D dataset is filtered using the weighted median implementation [8] of mesh median filtering. The weighted median filter is a modification of the simple median filter in which each neighbourhood sample is counted according to its weight before the median is taken. After smoothing, the results corresponding to Fig. 6a–d are shown in Fig. 7a–d, respectively.
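A minimal sketch of such a filter applied to the depth values of a mesh grid might look as follows. The 3×3 weight mask and the untouched borders are illustrative assumptions for this sketch; the paper's actual implementation follows Ref. [8].

```python
import numpy as np

def weighted_median_filter(depth, weights=None):
    """Weighted median smoothing of a depth map: each interior pixel is
    replaced by the weighted median of its 3x3 neighbourhood, where
    `weights` holds integer repetition counts (the classic weighted-
    median scheme).  Border pixels are left unchanged for simplicity."""
    if weights is None:
        # Assumed default mask: emphasize the centre pixel.
        weights = np.array([[1, 1, 1], [1, 3, 1], [1, 1, 1]])
    out = depth.astype(float)
    h, w = depth.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = depth[i-1:i+2, j-1:j+2].astype(float)
            # Repeat each neighbour according to its weight, then take
            # the ordinary median of the expanded sample.
            sample = np.repeat(window.ravel(), weights.ravel())
            out[i, j] = np.median(sample)
    return out
```

With this mask an isolated depth spike is removed, since the eight surrounding values outvote the three copies of the centre.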

Fig. 7
Smoothed mesh-grids from the Bosphorus database corresponding to a single person (female): frontal pose (a), image rotated about y-axis (b), image rotated about x-axis (30°) (c), image rotated about yz-axis (d)

3.4 Feature Localization

Faces have a common general form with prominent local structures such as the eyes, nose, mouth and chin. Facial feature localization is one of the most important tasks of any facial classification system. To achieve fast and efficient classification, it is necessary to identify the features that matter most for the classification task.

3.4.1 Surface Generation

The next part of the present technique concentrates on generating the surface [9] of the 3D mesh image. For nose-tip localization we use the maximum-intensity concept as the selection tool. Each of the landmark models shown in Fig. 3 was inspected to localize the nose tip. A set of fiducial points is extracted from both frontal and rotated face images using a maximum-intensity algorithm [9]. As shown in Fig. 8, the nose tips have been labeled on the facial surface, and the local regions are constructed around these points. The maximum-intensity algorithm used for our purpose is given below:

Fig. 8

Sample face from the Bosphorus database showing the landmark model with nose-tips localized. a Frontal pose. b Rotated about x-axis. c Rotated about y-axis at 30°. d Rotated about yz-axis

  • Algorithm Find_Maximum_Intensity(Image)

    • Step 1:- Set max to 0

    • Step 2:- Run loop for I from 2 to width_of_Image − 1

    • Step 3:- Run loop for J from 2 to height_of_Image − 1

    • Step 4:- Set val to sum(Image(I−1:I+1, J−1:J+1))

    • Step 5:- Check if val is greater than max

    • Step 6:- Set max to val and record the position (I, J) as the nose-tip candidate

    • Step 7:- End if

    • Step 8:- End loop for J

    • Step 9:- End loop for I

  • End
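The pseudocode above can be rendered directly in NumPy. In this sketch we return the winning window position as the nose-tip candidate; returning the location (rather than only the maximum value) is our addition, since the position is what the localization step needs.

```python
import numpy as np

def find_maximum_intensity(image):
    """Scan the range image with a 3x3 window and return the (row, col)
    position whose neighbourhood sum is largest, together with that sum.
    This is a direct rendering of Find_Maximum_Intensity: for each
    interior pixel, val = sum of the 3x3 window; track the maximum."""
    image = np.asarray(image, dtype=float)
    best, best_pos = -np.inf, (0, 0)
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            val = image[i-1:i+2, j-1:j+2].sum()
            if val > best:
                best, best_pos = val, (i, j)
    return best_pos, best
```

On a face range image, where the nose tip is the point of greatest depth/intensity, the returned position is the nose-tip candidate.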

The same approach is then applied to the mesh-grids obtained earlier; Fig. 9 shows the result with the nose-tips localized.

Fig. 9

Smoothed mesh-grids from the Bosphorus database corresponding to a single person (female), with nose-tips localized. a Frontal pose. b Image rotated about y-axis. c Image rotated about x-axis (30°). d Image rotated about yz-axis

3.5 Calculate the Standard Deviation of the Points Generated from the Landmark Points

In this step, we extract the nose coordinates in the form (x, y) from the landmark model. The steps performed to check how much our generated nose-tips vary from the coordinates (x, y) extracted from the landmark model are listed in the following algorithm:

  • Algorithm CompareLandmark

    • Step 1:- At first calculate the mean for all the data points.

    • Step 2:- The next step is to get the deviations in these numbers. Do this by subtracting the mean from each of the numbers in the set of data.

    • Step 3:- Now, square the deviations calculated in Step 2.

    • Step 4:- Next, add all of the squares in Step 3 together.

    • Step 5:- Now divide the sum in Step 4 by the total number of data in both x and y coordinates.

    • Step 6:- Next take the square root of the result in Step 5.

    • Step 7:- The result in step 6 is the standard deviation of the original dataset from the landmarked model.

    • Step 8:- Calculate the co-variance of x and y

    • Step 9:- Calculate the standard deviation of x and y

    • Step 10:- Finally, apply the F-test to compare the two standard deviations.

  • End of Algorithm
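Steps 1–10 can be sketched as follows. The pooled-over-x-and-y definition of the standard deviation (Step 5 divides by the total number of data in both coordinates) and the function name are our reading of the algorithm, not the authors' code; the returned ratio is the F statistic of Step 10.

```python
import numpy as np

def compare_landmarks(generated, reference):
    """Standard deviation of each nose-tip point set about its own mean,
    pooled over the x and y coordinates (Steps 1-6), plus the F ratio
    s1^2 / s2^2 used in Step 10 to compare the two spreads.
    Inputs are (N, 2) arrays of (x, y) nose-tip coordinates."""
    def pooled_std(pts):
        pts = np.asarray(pts, dtype=float)
        dev = pts - pts.mean(axis=0)                 # Step 2: deviations
        return np.sqrt((dev ** 2).sum() / dev.size)  # Steps 3-6
    s1 = pooled_std(generated)
    s2 = pooled_std(reference)
    return s1, s2, (s1 ** 2) / (s2 ** 2)
```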

3.6 3D Registration

Registration basically means transforming shapes in such a way that they can be compared. For 3D face recognition, for example, it is common to locate a number of landmarks (e.g. eyes, nose and mouth) in each face and to rotate, translate and scale these landmarks so that they are projected onto fixed, predefined positions. The same geometric transformation is then applied to the facial image, which is thus transformed into an intrinsic coordinate system. In this paper only the feature localization is specified; registration will be performed on the basis of the generated landmark points in our future work. In the present method, we have concentrated on the localization of nose-tips across pose variations. We now distinguish rigid and non-rigid registration. The former only performs rotation and translation (and possibly scaling) of the point clouds; the latter also allows (small) deformations of the point cloud to realise an optimal registration. Non-rigid registration can be useful in handling facial expressions. Our method of registration would be rigid registration only.
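As an illustration of rigid landmark-based registration, the following sketch computes the least-squares rotation and translation between two corresponding landmark sets using the standard Kabsch/SVD construction. This is a generic textbook method shown for context, not the registration algorithm the paper defers to future work.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid registration (rotation + translation) of the
    landmark set `src` onto `dst` via the Kabsch/SVD construction.
    Both inputs are (N, 3) arrays of corresponding landmarks."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

# Applying the recovered transform: aligned = src @ R.T + t
```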

There are also some other classifications of registration techniques, listed as follows:

  • One-to-all registration (register one face to another).

  • Registration to a face model or atlas.

  • Registration to an intrinsic coordinate system using geometric properties of the face, such as landmarks.

Our proposed technique would register faces to an intrinsic coordinate system using geometric properties of the face, such as landmarks. Below, we propose an algorithm for registration which will be implemented as part of our future work.

  • Algorithm Registration_3D

    • Step 1:- Input the 3D image in frontal pose and the rotated image

    • Step 2:- Pre-process both the 3D images

    • Step 3:- Generate the nose-tips both in case of the 3D image and the rotated images.

    • Step 4:- While (x coordinate of rotated_image <= x coordinate of frontal_image)

    • Step 5:- If (|x coordinate of rotated_image − x coordinate of frontal_image| < e)

    • Step 6:- Output the registered image and exit

    • Step 7:- else

    • Step 8:- Rotate the image by 2°.

    • Step 9:- End if

    • Step 10:-End while loop

  • End Algorithm
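Algorithm Registration_3D can be sketched as follows. The tolerance e, the 2° step and the rotation are taken from the pseudocode; the choice of the y-axis as the rotation axis, the 180° search bound and the function interface are our assumptions for this sketch.

```python
import numpy as np

def register_by_nose_tip(tip_frontal, tip_rotated, e=0.01, step_deg=2.0):
    """Sketch of Registration_3D: rotate the detected nose-tip of the
    rotated image about the y-axis in `step_deg` increments until its
    x coordinate matches the frontal nose-tip within tolerance `e`.
    Returns the total rotation applied (degrees), or None if no step
    within 180 degrees brings the tips into agreement."""
    tip = np.asarray(tip_rotated, dtype=float)
    a = np.radians(step_deg)
    Ry = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0,       1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
    for k in range(int(180 / step_deg) + 1):
        if abs(tip[0] - tip_frontal[0]) < e:   # Step 5: tips agree in x
            return k * step_deg                # Step 6: done
        tip = Ry @ tip                         # Step 8: rotate by 2 deg
    return None
```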

4 Experimental Results

To evaluate the accuracy of the landmark localization, we compare the landmarks localized on 988 faces of the Bosphorus database with the provided ground truth. Since we are concentrating on pose variations, we first list the number of 3D range images on which the max-intensity algorithm correctly detects the nose-tip. Considering the pose variations, the results of nose-tip localization for all frontal poses, including expressions, are listed in Table 1 (Figs. 10, 11, 12, 13).

Table 1 Results of nose-tip localization in frontal pose with expressions
Fig. 10

Some samples in frontal pose from Bosphorus database

Fig. 11

Some samples in non-frontal pose from Bosphorus database with images rotated about yz-axis

Fig. 12

Some samples in non-frontal pose from Bosphorus database with images rotated about x-axis

Fig. 13

Some samples in non-frontal pose from Bosphorus database with images rotated about y-axis

In Table 2, we present the results of our nose-tip detection method for pose variations of up to 10° about the yz-axis.

Table 2 Results of nose-tip localization in rotated pose with respect to yz axes

In Table 3, we present the results of our nose-tip detection method for pose variations of 5° and 10° about the x-axis.

Table 3 Results of nose-tip localization in rotated pose with respect to x axes

In Table 4, we present the results of our nose-tip detection method for pose variations of 5° and 10° about the y-axis.

Table 4 Results of nose-tip localization in rotated pose with respect to y axes

In the following, we compare our technique on both the landmarked and the generated 3D mesh-grids:

As shown in Fig. 14, we have plotted some samples marked in red from the landmarked image and some marked in black from the generated 3D mesh. The samples in Fig. 14 are all expression faces taken from the Bosphorus database. Figure 15 shows the standard deviations obtained for the expression faces ANGER and DISGUST; we list the standard deviation of the generated range model and of the landmarked model.

Fig. 14

Some samples of nose-tips plotted from the 3D landmark model

Fig. 15

Some samples of nose-tips plotted from the 3D landmark model and 3D mesh

In Fig. 16, we plot the standard deviations obtained for expression, neutral and pose variations of the 3D faces.

Fig. 16

Standard deviations of landmarked model from 3D generated model of poses in case of expression faces, neutral faces, poses rotated about x axis and poses rotated about y-axis

In the last and final step, we applied the F-test. The test statistic is \( F = s_1^2 / s_2^2 \), where \( s_1 \) and \( s_2 \) are the two standard deviations. The result of the test must be one of the following two conclusions:

  • The population standard deviations are different from each other (if F > 1).

  • The population standard deviations are not different from each other (if F < 1).

The calculated standard deviations fall in the region where F < 1, shown in Fig. 17, which means there is very little difference between them. Thus the standard deviations of a particular 3D face mesh-grid and of its corresponding landmarked model for a given individual are essentially the same, which supports our claim.

Fig. 17

The f test curve

5 Conclusion and Future Scope

In this paper, we have presented a novel technique for localization of the nose-tip in 3D face images, and we have compared our method with the available landmarked data models. We detect the nose-tip in a way that is invariant to pose variations, i.e. the nose-tip is detected under pose variations about the x-, y- and z-axes. Experimental results demonstrate that our method performs well on the Bosphorus database. Performance evaluation of our method on other databases is in progress. As part of our future work, we propose to attempt the problem of registration across pose variations by finding the translation and rotation parameters. We also aim at performing 3D face recognition using standard techniques such as PCA, LDA, etc.