
1 Introduction

Currently, face recognition technologies exhibit good performance under normal conditions [1,2,3]. However, face recognition under pose variations remains a major challenge because an object viewed from different poses may appear considerably different owing to nonlinear deformation and self-occlusion [4]. Multi-pose face recognition based on a single view has high research value because it can be used in a wide range of practical applications. Research on multi-pose face recognition based on a single view can be divided into two primary categories, i.e., pose correction methods and virtual multi-pose image synthesizing methods. Virtual multi-pose image synthesizing methods can improve the accuracy of face recognition because they generate appropriate virtual multi-pose face images to enrich the training samples.

Among virtual multi-pose image synthesizing methods, three-dimensional (3D) face morphable models are an important area of research. Such a model is built from a set of 3D faces [5]. However, original 3D face data cannot be used directly for linear calculation because the numbers of vertices and patches differ for each 3D face; all vertices and patches of the 3D faces must first be placed in one-to-one correspondence. Many methods have been proposed for dense correspondence of 3D faces. Blanz et al. proposed an optical flow algorithm and a bootstrapping algorithm to solve the dense correspondence problem of 3D faces [6, 7]; the method is valid when the 3D face is similar to the reference face. Hu et al. proposed a dense correspondence method based on grid resampling that produced more accurate results [8]; however, the process was complicated because considerable manual interaction was required. Gu et al. proposed a uniform grid resampling algorithm [9]. Gong et al. proposed a grid resampling method based on a planar template [10], which locates features using a combination of two-dimensional (2D) and 3D texture information and divides the surface of the 3D face using these features.

To simplify the construction of a 3D face model, an improved dense correspondence method based on planar template resampling with geometric information is proposed in this paper. A 3D face is reconstructed using a 3D sparse morphable model, and multi-pose face images are generated using texture mapping, model rotation, and projection. The rest of the paper is organized as follows: In Sect. 2, an improved dense correspondence method based on planar template resampling with geometric information is proposed. In Sect. 3, 3D face reconstruction based on the 3D face morphable model is presented. In Sect. 4, the experiments performed using BJUT-3D and CAS-PEAL-R1 databases are described, and the proposed method is compared to other methods. Finally, conclusions are stated in Sect. 5.

2 An Improved Dense Correspondence Method Based on Planar Template Resampling with Geometric Information

2.1 Definition of Planar Template

After face segmentation on the cylindrical expansion of the original 3D faces, segmented 2D texture maps are aligned using the nose tip positions and overlapped. Among the overlapped points, the pixels that belong to more than half of the images are used to construct a planar template, which is shown in Fig. 1(a).

Fig. 1. Definition of planar template

The corresponding vertices of the selected pixels are scattered, and meshes are built from these vertices to construct a normalized 3D face model. The topological structure is constructed through the following steps: First, vertices are connected according to the connection rules of vertices on the vertical and horizontal iso-lines. Then, each small rectangular area is divided into two triangles by the diagonal between its bottom-left and top-right corners. Lastly, the triangles are numbered from top to bottom and left to right, as shown in Fig. 1(b). The template can then be used to generate the normalized 3D face model.
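The triangulation steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `grid_triangles` and its arguments are hypothetical names for a rows × cols vertex grid with row-major vertex indices:

```python
import numpy as np

def grid_triangles(rows, cols):
    """Triangulate a rows x cols vertex grid: each rectangular cell is split
    by the diagonal from its bottom-left to its top-right corner, and the
    triangles are emitted top to bottom, left to right."""
    tris = []
    for i in range(rows - 1):                 # cell rows, top to bottom
        for j in range(cols - 1):             # cell columns, left to right
            tl = i * cols + j                 # top-left vertex index
            tr = tl + 1                       # top-right
            bl = tl + cols                    # bottom-left
            br = bl + 1                       # bottom-right
            tris.append((tl, bl, tr))         # triangle above the diagonal
            tris.append((bl, br, tr))         # triangle below the diagonal
    return np.array(tris)
```

Because the cells are visited row by row, the resulting triangle numbering matches the top-to-bottom, left-to-right order described above.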

2.2 Resampling of Original 3D Faces

An original 3D face is not consistent with the definition of the planar template, so it must be resampled.

  (a) Vertical sampling

    For an original 3D face, \( f_{i} \) (\( 1 \le i \le m \)), the vertex on the kth line can be calculated using interpolation. \( pre_{ver} \) and \( fol_{ver} \), shown in Eqs. (1) and (2), represent the line numbers before and after the kth line, respectively.

    $$ pre_{ver} = floor(ratio_{ver} \times k) $$
    (1)
    $$ fol_{ver} = ceil(ratio_{ver} \times k) $$
    (2)

In Eqs. (1) and (2), \( floor \) is the round-down function, \( ceil \) is the round-up function, and \( ratio_{ver} \) is the ratio between the height of the segmented 2D texture image, \( f_{i} (h) \), and the height of the planar template, \( Tmpt(h) \), as given in Eq. (3).

$$ ratio_{ver} = f_{i} (h)/Tmpt(h) $$
(3)

The resampling data (including the shape and texture data) of the vertex, \( v_{k} (vert) \), on the kth line can be calculated using interpolation between the vertex, \( v_{pre} (ori) \), on line \( pre_{ver} \) and the vertex, \( v_{fol} (ori) \), on line \( fol_{ver} \).

$$ \begin{aligned} & v_{k} (vert) = v_{pre} (ori) \times (1 - \Delta r) + v_{fol} (ori) \times \Delta r \\ & \Leftrightarrow \left\{ {\begin{array}{*{20}l} {s_{k} (vert) = s_{pre} (ori) \times (1 - \Delta r) + s_{fol} (ori) \times \Delta r} \hfill \\ {t_{k} (vert) = t_{pre} (ori) \times (1 - \Delta r) + t_{fol} (ori) \times \Delta r} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(4)

where \( \Delta r = ratio_{ver} \times k - pre_{ver} \).
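Eqs. (1)–(4) amount to a one-dimensional linear resampling of the rows. A minimal sketch, not the authors' implementation (the function name is hypothetical, and the boundary clamp is an added guard against an out-of-range \( ceil \) index):

```python
import math
import numpy as np

def resample_vertical(data, template_h):
    """Resample the rows of per-vertex data (shape or texture, stored as an
    array of shape (orig_h, w, c)) onto template_h rows via Eqs. (1)-(4)."""
    orig_h = data.shape[0]
    ratio = orig_h / template_h                         # ratio_ver, Eq. (3)
    out = np.empty((template_h,) + data.shape[1:])
    for k in range(template_h):
        pre = min(math.floor(ratio * k), orig_h - 1)    # Eq. (1), clamped
        fol = min(math.ceil(ratio * k), orig_h - 1)     # Eq. (2), clamped
        dr = ratio * k - math.floor(ratio * k)          # delta r
        out[k] = data[pre] * (1 - dr) + data[fol] * dr  # Eq. (4)
    return out
```

Horizontal sampling (Eqs. (5)–(8)) applies the same operation along the other axis, e.g. by transposing the first two axes before and after the call.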

  (b) Horizontal sampling

    The interpolation operation used in vertical sampling is applied to calculate the lth data on the kth line. \( pre_{hor} \) and \( fol_{hor} \), shown in Eqs. (5) and (6), represent the column numbers before and after the lth column, respectively.

    $$ pre_{hor} = floor(ratio_{hor} \times l) $$
    (5)
    $$ fol_{hor} = ceil(ratio_{hor} \times l) $$
    (6)

where \( ratio_{hor} \) is the ratio between the width of the segmented 2D texture image, \( f_{i} (w) \), and the width of the planar template, \( Tmpt(w) \).

$$ ratio_{hor} = f_{i} (w)/Tmpt(w) $$
(7)

The lth resampling data on the kth line (including the shape data and texture data) of the vertex, \( v_{k,l} (final) \), can be obtained using interpolation between the vertex, \( v_{k,pre} (vert) \), in column \( pre_{hor} \) and the vertex, \( v_{k,fol} (vert) \), in column \( fol_{hor} \).

$$ \begin{aligned} & v_{k,l} (final) = v_{k,pre} (vert) \times (1 - \Delta r) + v_{k,fol} (vert) \times \Delta r \\ & \Leftrightarrow \left\{ {\begin{array}{*{20}l} {s_{k,l} (final) = s_{k,pre} (vert) \times (1 - \Delta r) + s_{k,fol} (vert) \times \Delta r} \hfill \\ {t_{k,l} (final) = t_{k,pre} (vert) \times (1 - \Delta r) + t_{k,fol} (vert) \times \Delta r} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(8)

where \( \Delta r = ratio_{hor} \times l - pre_{hor} \).

After arranging the resampled data according to the topological structure of the planar template, each 3D face can be represented as two vectors.

$$ \left\{ {\begin{array}{*{20}l} {s_{i} = (x_{1} ,y_{1} ,z_{1} , \ldots ,x_{j} ,y_{j} ,z_{j} , \ldots ,x_{n} ,y_{n} ,z_{n} )^{T} \in R^{3n} } \hfill \\ {t_{i} = (r_{1} ,g_{1} ,b_{1} , \ldots ,r_{j} ,g_{j} ,b_{j} , \ldots ,r_{n} ,g_{n} ,b_{n} )^{T} \in R^{3n} } \hfill \\ \end{array} } \right.,1 \le i \le N $$
(9)

In Eq. (9), \( s_{i} \) is the shape vector composed of three coordinates of the ith 3D face, \( t_{i} \) is its corresponding texture vector composed of R value, G value, and B value, N is the number of 3D faces, and \( n \) is the number of facial points on the 3D face. Values with the same subscript represent the same facial feature points on different face vectors.
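The flattening of Eq. (9) is straightforward. A sketch with hypothetical names, assuming the per-vertex data of one normalized face is stored as n × 3 arrays:

```python
import numpy as np

def to_shape_texture_vectors(vertices, colors):
    """Flatten an n x 3 vertex array and an n x 3 RGB array into the
    interleaved vectors s_i and t_i of Eq. (9), each in R^{3n}."""
    s = np.asarray(vertices, dtype=float).reshape(-1)   # (x1, y1, z1, ...)
    t = np.asarray(colors, dtype=float).reshape(-1)     # (r1, g1, b1, ...)
    return s, t
```

Because every normalized face shares the template topology, position j in any two such vectors refers to the same facial point, which is what makes the linear operations of Sect. 3 meaningful.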

3 3D Face Reconstruction Based on Sparse Morphable Model

3D face reconstruction based on a sparse morphable model consists of two steps, i.e., model construction and face shape reconstruction. The first step involves acquisition and normalization of prototype 3D face data and establishment of a 3D morphable model. The second step is to match a target face image with the morphable model and accomplish the reconstruction of the face.

3.1 Construction of 3D Morphable Model

Normalized 3D faces have equal numbers of vertices and patches. They can be regarded as elements of a linear space, in which each element can be expressed as a linear combination of the others. The shape of a 3D face can be expressed as Eq. (10).

$$ s_{i} = (x_{1} ,y_{1} ,z_{1} , \ldots ,x_{k} ,y_{k} ,z_{k} , \ldots ,x_{n} ,y_{n} ,z_{n} )^{T} \in R^{3n} ,i = 1, \ldots ,m $$
(10)

where \( (x_{k} ,y_{k} ,z_{k} ) \) are the coordinates of the \( k \)th vertex, \( v_{k} \), \( n \) is the number of vertices, and \( m \) is the number of 3D faces. The linear space constructed using the \( m \) 3D faces can be expressed as Eq. (11).

$$ S = (s_{1} , \ldots ,s_{m} ) \in R^{3n \times m} $$
(11)

The shape of a new face can be expressed as Eq. (12).

$$ s_{new} = S \cdot a,a = (a_{1} , \ldots ,a_{i} , \ldots ,a_{m} ) $$
(12)

where \( a_{i} \in [0,1] \) and \( \sum\limits_{i = 1}^{m} {a_{i} = 1} \).

After principal component analysis of the shape vectors, the eigenvectors corresponding to the \( m' \) (\( m' \le m - 1 \)) largest eigenvalues can be used to construct the feature matrix, \( Q = (q_{1} , \ldots ,q_{m'} ) \), and Eq. (12) can be rewritten as Eq. (13).

$$ s_{new} = \bar{s} + Q \cdot \beta = \bar{s} + \Delta s $$
(13)

In Eq. (13), \( \bar{s} = \frac{1}{m}\sum\limits_{i = 1}^{m} {s_{i} } \) and \( \beta = (\beta_{1} , \ldots ,\beta_{i} , \ldots ,\beta_{m'} )^{T} \in R^{m'} \). This implies that a specific face can be obtained using a deformation on the average face.
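The model construction of Eqs. (10)–(13) can be sketched as follows. This is a hypothetical implementation (the paper gives no code); computing the eigenvectors via an SVD of the centered data matrix is an assumed but mathematically equivalent route:

```python
import numpy as np

def build_morphable_model(S, m_prime):
    """Given a 3n x m matrix S whose columns are normalized shape vectors,
    return the mean face s_bar, the feature matrix Q of Eq. (13), and the
    leading singular values of the centered data."""
    s_bar = S.mean(axis=1, keepdims=True)              # average face of Eq. (13)
    A = S - s_bar                                      # centered shape vectors
    # The eigenvectors of the sample covariance are the left singular
    # vectors of the centered data matrix.
    U, sigma, _ = np.linalg.svd(A, full_matrices=False)
    Q = U[:, :m_prime]                                 # m' leading eigenvectors
    return s_bar.ravel(), Q, sigma[:m_prime]

def synthesize(s_bar, Q, beta):
    """A new face shape via Eq. (13): s_new = s_bar + Q . beta."""
    return s_bar + Q @ beta
```

Setting \( \beta = 0 \) recovers the average face, and varying the components of \( \beta \) deforms it along the principal shape directions.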

3.2 Face Shape Reconstruction Based on Sparse Morphable Model

Face shape reconstruction based on the sparse morphable model can be described as estimating the global shape deformation, \( \Delta s \), from the shape deformation of the feature points, \( \Delta s^{f} \).

$$ \Delta s^{f} = L(s - \bar{s}) = L(Q \cdot \beta ) = Q^{f} \cdot \beta $$
(14)

where \( L \) is the selection operator that extracts the rows corresponding to the selected feature points.

The computation of \( \beta \) can be transformed into determining the optimal solution of the objective function, as shown in Eq. (15).

$$ E(\beta ) = \left\| {Q^{f} \cdot \beta - \Delta s^{f} } \right\|^{2} + \eta \cdot \left\| \beta \right\|^{2} $$
(15)

where the first term represents the reconstruction error, the second term is a regularization term that suppresses random fluctuations and improves the robustness of the model to noise, and \( \eta \ge 0 \) is an adjusting parameter.

Let \( Q^{f} = U \cdot \Lambda \cdot V^{T} \) be the singular value decomposition of \( Q^{f} \), with singular values \( \lambda_{i} \). Taking the derivative of the objective function with respect to \( \beta \) and setting it to zero yields Eq. (16).

$$ \beta = \mathop{\arg \min }\limits_{\beta } E(\beta ) = V \cdot diag\left( {\frac{{\lambda_{i} }}{{\lambda_{i}^{2} + \eta }}} \right) \cdot U^{T} \cdot \Delta s^{f} $$
(16)

where \( U \in R^{l \times l} \) and \( V \in R^{m' \times m'} \).
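Eqs. (14)–(17) can be implemented directly from the SVD of \( Q^{f} \). A sketch under stated assumptions: the function and argument names are hypothetical, and `landmark_idx` selects the rows of the stacked shape vector that correspond to the observed feature points:

```python
import numpy as np

def reconstruct_shape(s_bar, Q, landmark_idx, ds_f, eta=0.1):
    """Recover the full face shape from feature-point displacements.

    s_bar: mean shape (3n,); Q: feature matrix (3n, m');
    landmark_idx: rows of the stacked shape vector that were observed;
    ds_f: observed displacements at those rows; eta: regularizer of Eq. (15)."""
    Qf = Q[landmark_idx]                                 # L(Q), Eq. (14)
    U, lam, Vt = np.linalg.svd(Qf, full_matrices=False)  # Q^f = U diag(lam) V^T
    # Regularized solution of Eq. (15), in the form of Eq. (16):
    # beta = V diag(lam / (lam^2 + eta)) U^T ds_f
    beta = Vt.T @ ((lam / (lam ** 2 + eta)) * (U.T @ ds_f))
    return s_bar + Q @ beta                              # Eq. (17)
```

With \( \eta = 0 \) this reduces to the pseudo-inverse solution; a positive \( \eta \) damps the directions with small singular values, which is what makes the reconstruction robust to noisy feature points.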

Combining Eq. (16) with Eq. (13), the final reconstruction result of the given face can be described as Eq. (17).

$$ s_{new} = \bar{s} + Q \cdot V \cdot diag\left( {\frac{{\lambda_{i} }}{{\lambda_{i}^{2} + \eta }}} \right) \cdot U^{T} \cdot \Delta s^{f} $$
(17)

4 Experiments and Analysis

4.1 3D Face Database and the Segmentation Performance

The BJUT-3D database [11] is used to evaluate the performance of the proposed method. Each 3D face model contains approximately 200,000 vertices and 400,000 triangular patches. An example of original 3D face data is shown in Fig. 2. Face segmentation results based on geometric information are shown in Fig. 3.

Fig. 2. Geometry and texture information of one person in the BJUT-3D database

Fig. 3. Face segmentation results based on geometric information

4.2 Dense Correspondence Performance of the Proposed Method

The data of 100 3D faces (50 female and 50 male) from the BJUT-3D database are used to evaluate the dense correspondence performance of the proposed method. A comparison between the original 3D point cloud data and the dense correspondence results is shown in Fig. 4. It can be observed from Fig. 4 that the 3D face database can be normalized using the proposed method, making it suitable for the subsequent linear operations.

Fig. 4. Dense correspondence results of the proposed method

4.3 3D Reconstruction Results of the Proposed Method

One hundred face images with a natural expression and no illumination variation from the CAS-PEAL-R1 database are used to evaluate the reconstruction performance of the proposed method. Two original face images are shown in Fig. 5(a). The images are normalized to a size of 164 × 146 pixels. The 3D reconstruction results obtained using the proposed method are shown in Fig. 5(b), and the texture mapping results in Fig. 5(c). Figures 5(d)–(n) show the multi-pose results obtained by rotation and projection of the 3D faces: Figs. 5(d), (e), and (f) show the projected results of 15°, 30°, and 45° rotation for the frontal (looking straight ahead) pose; Figs. 5(g), (h), (i), and (j) show the projected results of 0°, 15°, 30°, and 45° rotation for the looking-down pose; and Figs. 5(k), (l), (m), and (n) show the projected results of 0°, 15°, 30°, and 45° rotation for the looking-up pose.

Fig. 5. 3D face reconstruction results (a) Input images (b) Reconstructed results (c) Texture mapping results (d)–(n) Multi-pose mapping results of the 3D face

4.4 Recognition Results of the Proposed Method

One hundred subjects with a natural expression and no illumination variation from the CAS-PEAL-R1 database are used to evaluate the face recognition performance of the proposed method. Twelve multi-pose images per subject, obtained by rotation and projection of the 3D sparse morphable model as described in Sect. 4.3, are used as training samples. Five real multi-pose face images per subject are used as testing samples. The performance of the proposed method is compared to that of conventional methods, and the results are shown in Table 1.

Table 1. Recognition performance comparison of the proposed method and other methods

For the enhanced projection-combined principal component analysis ((PC)2A) method, the recognition rate is relatively low, primarily because the synthesized images depend on, and are highly correlated with, the original image. The recognition rates of the pose adaptive feature extraction (PAFE) method and the triplet pose sparse matrix (TPSM) method are relatively high because a priori information about 3D face data is used. For the proposed method, the training images are enriched by rotation and projection of the reconstructed 3D face. It can be observed from Table 1 that the recognition rate of the proposed method is 91%, which is superior to that of the compared state-of-the-art methods for pose-invariant face recognition.

5 Conclusions and Discussions

To synthesize virtual multi-pose faces, a 3D face reconstruction method based on a single view is proposed in this paper. The method comprises three parts. The first is planar template establishment based on geometric information. The second is vertical and horizontal resampling of 3D face data based on the geometric relationship between the planar template and the original 3D face data, which yields normalized 3D face data. The third is the construction of a 3D sparse morphable model from the normalized 3D face data, from which multi-pose face images are generated by texture mapping, rotation, and projection of the established 3D face. The proposed method makes two contributions. First, the dense correspondence process is performed automatically and model construction involves no manual interaction, which makes it superior to other methods. Second, virtual multi-pose face images synthesized using the proposed method enrich the training samples, which improves recognition performance.

A pose-invariant face recognition problem without illumination variations is considered in this study. Face recognition with pose and illumination variations is more difficult. Future research will focus on a pose-invariant face recognition problem with illumination variations based on a single view.