1 Introduction

Face recognition is a critical component in many computer vision applications, such as access control, video surveillance and public security. In the real world, face images are often corrupted by many unknown factors, such as illumination, occlusion and expression, so face recognition remains a challenging task. Moreover, many front-end computer vision applications need to recognize faces quickly and efficiently, which is still an open problem.

Among the proposed face recognition methods, deep learning (DL), manifold learning and sparse representation-based methods are three main research branches. Hinton proposed the deep learning approach, which converts high-dimensional data to low-dimensional codes by training a multilayer neural network (Hinton and Salakhutdinov 2006). Deep learning has achieved great success in the computer vision field (Farabet et al. 2013; Kavukcuoglu et al. 2010; Ciresan et al. 2012), and it has also achieved very good results in face recognition. Taigman et al. (2014) proposed a deep learning framework for unconstrained face recognition and presented a system (DeepFace) that approaches the human visual system. Despite its high recognition rate, deep learning must handle massive amounts of data and large multilayer networks, and at present it is very difficult to study the relationship between human face recognition and recognition by multilayer neural networks.

Tenenbaum et al. (2000) introduced manifold learning, which builds on classical multidimensional scaling (MDS) but aims to preserve the intrinsic geometry of the data. Silva and Tenenbaum (2003) gave a detailed mathematical description of manifold learning, and the field has developed continuously since then. Ataer-Cansizoglu et al. (2014) proposed a distance-order-preserving manifold learning algorithm that extends the basic mean-squared-error cost function used in MDS-based methods. Bhatia et al. (2014) presented a novel hierarchical manifold learning method that automatically discovers regional properties of medical image datasets; it extends conventional techniques by additionally examining local variations, producing spatially varying manifold embeddings that characterize a given dataset. Lu et al. (2013) proposed a multi-manifold analysis technique for face recognition, assuming that samples from the same face class define a single manifold in feature space and seeking better-separated feature dimensions in low-dimensional feature spaces. Manifold learning remains sensitive to noise, which affects the robustness of recognition and calls for further study.

Sparse representation was first introduced into face recognition by Wright et al. (2009). In this method, a testing image is first sparsely represented over all training images, and the classification result is then obtained by finding the class that yields the minimal representation error. Sparse representation has been widely used in both computer vision and signal processing applications. However, the SRC algorithm has limitations when facing the pose, illumination and occlusion variations of real databases. Hence, many researchers have tackled these difficulties and proposed effective methods. To overcome illumination variations and occlusions, Yang and Zhang (2010) introduced Gabor features, which make the occlusion dictionary compressible. Yang et al. (2011) and He et al. (2011) studied robust sparse representation to avoid the identity occlusion dictionary, so that sparse representation can be accelerated and a higher recognition rate reached, especially when the face is occluded by disguise. Wang et al. (2015) proposed a Manifold Regularized Local Sparse Representation (MRLSR) model, which guarantees that all coding vectors in the sparse representation are group sparse, meaning each representation satisfies both individual sparsity and local similarity. Zhuang et al. (2013) proposed the SIT technique, which trains a sparse illumination dictionary to represent the different illumination patterns between training and testing images. To overcome the need for large numbers of samples per class, Deng et al. (2012, 2013) proposed Extended SRC (ESRC), which constructs an additional intra-class dictionary from available data to refine recognition performance: a testing sample is represented as a sparse linear combination of the training and intra-class dictionary samples, and the intra-class dictionary effectively models face variations. Iliadis et al. (2014) combined sparsity-based approaches with additional least-squares steps, mitigating the need for many training images since the method proves robust to varying numbers of training samples. Liu et al. (2014) proposed local structure based sparse representation classification (LSSRC), which divides the face into local blocks, classifies each block, and integrates all the classification results by voting; it successfully alleviates the dilemma of high data dimensionality and small sample sizes. To avoid over-emphasizing sparsity and overlooking correlation information, Rigamonti et al. (2011) proposed a correlation representation-based classifier (CRC) built on collaborative representation, and Wang et al. (2014) proposed adaptive sparse representation-based classification (ASRC), in which sparsity and correlation are considered jointly. To address the alignment problem, Liao et al. (2013) developed an alignment-free face representation based on Multi-Keypoint Descriptors (MKD): each face image is represented by a set of keypoint descriptors (GTP and SIFT), and a large dictionary is constructed from all gallery descriptors.

Although big data computing can now deliver high face recognition rates, it needs strong computing power and normally runs in the cloud. However, many computer vision applications, especially front-end applications, need to recognize faces quickly and efficiently. Inspired by the rapid and accurate human identification of familiar faces, we conjecture that a class of fast computing mechanisms plays a role in human face recognition and thereby improves recognition accuracy. Bonnen et al. (2013) considered this problem from another point of view and proposed a component-based recognition method. Drawing on psychological research (Richler et al. 2011; Schwaninger et al. 2006; Piepers and Robbins 2012), this method suggests that facial features comprise “detail” features and “holistic” features, with the “detail” features playing the main role; face recognition can then be realized by extracting and discriminating the “detail” features. Pursuing this idea further, we study nonlinear least-squares computation in face recognition and find that it indeed improves the recognition rate; more importantly, it can handle any combination of face features, such as “detail” and “holistic” features, while obtaining a high recognition rate. We also study the SRC algorithm in depth and find that it is acceptably robust for the rigid facial regions (e.g. eyes, nose and mouth). Interestingly, the recognition rate of a single facial component using SRC on an actual-environment database is close to that of human face recognition. We therefore propose a hierarchical face recognition algorithm based on nonlinear least-squares, named HSRC. HSRC has three steps: the first is multichannel analysis of the holistic features of a face; the second is analysis of the components of the rigid facial region; the third is the final judgement by nonlinear least-squares.

This paper is organized as follows. Sect. 2 introduces the basic principle and main process of SRC. Sect. 3 describes the main contents of HSRC and presents the whole algorithm. Sect. 4 reports the experimental results on facial components and on the judgement by nonlinear least-squares. Sect. 5 summarizes the paper and outlines a direction for further research.

2 Sparse representation-based classification (SRC)

Sparse Representation-based Classification (SRC) was proposed by Wright et al. (2009) and is based on the theory of compressed sensing (CS). The main process of SRC is as follows:

  1. Form the training set:

    $$\begin{aligned} A = [{A_1},{A_2},\ldots ,{A_k}] \end{aligned}$$

    where \({A_i}\;(1 \le i \le k)\) contains the sample images of the i-th person and k is the total number of classes. \({A_i} = [{\nu _{i,1}},{\nu _{i,2}},\ldots ,{\nu _{i,n}}]\), where \({\nu _{i,j}}\;(1 \le j \le n)\) is the j-th sample image of the i-th person.

  2. Represent the testing image y over the training set:

    $$\begin{aligned} y = \sum \limits _{i = 1}^k {\sum \limits _{j = 1}^n {{\alpha _{i,j}}{\nu _{i,j}}} } = Ax \end{aligned}$$

    \(x = [{\alpha _1},{\alpha _2},\ldots , {\alpha _k}]\) is the sparse solution for the testing image, and \({\alpha _i} = [{\alpha _{i,1}},{\alpha _{i,2}},\ldots ,{\alpha _{i,n}}]\;(1 \le i \le k)\) contains the sparse coefficients of person i. If the testing image belongs to class i, y can, to some degree, be sparsely represented using only the sample images of person i.

  3. Obtain the sparse representation via \(\ell _1\)-norm minimization:

    $$\begin{aligned} {{\hat{x}}} = \arg \min {\left\| x \right\| _1} \quad s.t. \quad y = Ax \end{aligned}$$
  4. Calculate the residuals:

    $$\begin{aligned} {r_i}(y) = {\left\| {y - A{\delta _i}({\hat{x}})} \right\| _2},\quad i = 1,\ldots ,k \end{aligned}$$

    Determine the class of the testing image:

    $$\begin{aligned} Identity(y) = \arg \min _i ({r_i}(y)) \end{aligned}$$

    To identify the class of the testing image intuitively, we calculate the residuals between y and \(A \cdot {\delta _i}({\hat{x}})\), where \({\delta _i}({\hat{x}})\) denotes the coefficient vector in which \({\alpha _j} = {\mathbf{0}}\;(j \ne i)\) and \({\alpha _i} \ne {\mathbf{0}}\).
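The four steps above can be condensed into a short sketch. The following is a minimal illustration under stated assumptions, not the original implementation: `labels` is a hypothetical array giving the class of each training column, and scikit-learn's Lasso serves as an \(\ell _1\)-regularized stand-in for an exact basis-pursuit solver.

```python
# Minimal SRC sketch (assumptions: A is column-normalized, shape d x N;
# `labels` is a length-N class array; Lasso approximates min ||x||_1 s.t. y = Ax).
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, alpha=0.01):
    """Classify the test vector y against the training matrix A."""
    solver = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    solver.fit(A, y)                                  # l1-regularized least squares
    x_hat = solver.coef_
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x_hat, 0.0)       # delta_i(x_hat)
        residuals[c] = np.linalg.norm(y - A @ x_c)    # r_i(y)
    return min(residuals, key=residuals.get)          # Identity(y) = argmin_i r_i(y)
```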

3 A hierarchical face recognition algorithm based on nonlinear least-squares (HSRC)

The SRC algorithm has been widely used in both computer vision and signal processing applications, and its recognition rate is very high on some databases such as ORL. However, on other databases, such as actual-environment databases, SRC suffers severe performance degradation, as Wu et al. (2014) analyzed. To deal with this degradation, SRC needs to relax the confirmation conditions when recognizing holistic face pictures and to enhance the accuracy when recognizing face details (Wu et al. 2014). Along this technical route, we further suggest that, at the top recognition level, people recognize a face by combining the “detail” and “holistic” features, and that the “detail” features, such as the mouth and eyes, may be accurately identified by sparse representation. We therefore design a new hierarchical face recognition algorithm named HSRC. The hierarchy is embodied in the analysis of holistic face features and facial “detail” features, and we use nonlinear least-squares to combine these features to enhance recognition performance.

HSRC has three steps:

Step 1: Multichannel analysis. We use bidirectional PCA, linear discriminant analysis and GradientFace to extract complementary “holistic/configural” face features, then select the face images closest to the testing image to generate the 1-step face database, which serves as the sample set for the next step.

Step 2: Extract facial components. After processing the 1-step face database with the Self Quotient Image (SQI) (Wang et al. 2004), we extract the facial components using a newly proposed algorithm inspired by ESRC to generate the 2-step face database, which includes four sub-databases: left eye, right eye, nose and mouth.

Step 3: Judge by nonlinear least-squares. We apply the SRC algorithm in each sub-database and combine the recognition results with the nonlinear least-squares method, which improves the recognition rate.

The flow diagram of HSRC is shown in Fig. 1.

Fig. 1 The main process of the HSRC algorithm

3.1 Multichannel analysis

The first step of HSRC follows the first step of T_SRC (Wu et al. 2014). We select three algorithms as the basis of the multichannel analysis: bidirectional PCA (BDPCA) (Zuo et al. 2006), linear discriminant analysis (LDA) and GradientFace (Zhang et al. 2009). Since each basic algorithm captures part of the “holistic” facial information, combining several of them yields more complete “holistic” information; this motivates the multichannel analysis. The three channels apply BDPCA, PCA \(+\) LDA and GradientFace \(+\) PCA \(+\) LDA respectively to recognize the face. We then combine the candidates, i.e. the 1–3 most likely face images in each channel, into the 1-step face database. By relaxing the candidate constraints, HSRC introduces more “holistic” facial information as recognition conditions.
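The candidate-union step can be sketched as follows. This is an illustrative fragment under our own naming, not the paper's code: each `rank_fn` stands for one hypothetical channel recognizer (BDPCA, PCA \(+\) LDA, or GradientFace \(+\) PCA \(+\) LDA) that returns class labels sorted from most to least similar.

```python
# Sketch of the multichannel candidate union (the channel recognizers are
# hypothetical placeholders; top_k follows the 1-3 candidates in the text).
def build_step1_database(test_image, channels, top_k=3):
    """Union of the top-k candidate classes from every analysis channel."""
    candidates = set()
    for rank_fn in channels:              # e.g. [bdpca, pca_lda, gf_pca_lda]
        candidates.update(rank_fn(test_image)[:top_k])
    return candidates                     # classes kept as the 1-step database
```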

3.2 Extract facial components

Mainly based on results from psychology research, HSRC extracts facial components as facial “detail” features. Existing methods for locating facial components are not ideal. Apple's Core Image framework includes two relevant parts: detecting a face in an image and locating its components. It locates components well provided the whole face contour is recognized correctly; otherwise it fails. Gabor filters, used for edge detection in image processing, can also detect facial components, but the detection accuracy is degraded by eyebrows, spectacle frames or cheekbones, causing localization deviations. The AdaBoost algorithm with Haar features is commonly used for face detection, but it does not perform well for facial component detection. In this paper, we propose a new method for extracting facial components that is adaptive and robust.

The motivation for our proposed algorithm comes from the fact that different images of the same subject share many similarities, and facial components such as eyes, noses and mouths from different people look alike to some degree: they have similar structures and contour lines. On this basis, we can assemble existing facial components into an average template and locate new facial components using SRC.

The model for extracting facial components is:

$$\begin{aligned} y = B{x_0} + D{\xi _0} + \varepsilon \end{aligned}$$
(1)

\(B{x_0}\) denotes the average facial component, \(D{\xi _0}\) denotes the individual deviation, and \(\varepsilon\) is the small dense noise. B and D are dictionaries of the average facial components and the individual deviations respectively. \({x_0}\) and \({\xi _0}\) are the sparse representations, which can be recovered simultaneously by \(\ell _1\) minimization. \((Bx,{\Delta _1})\) and \((D\xi ,{\Delta _2})\) are two template–threshold pairs. If \(\left| {B{x_0} - Bx} \right| \le {\Delta _1}\) and \(\left| {D{\xi _0} - D\xi } \right| \le {\Delta _2}\), the testing sample is accepted as the facial component represented by Bx.
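The acceptance test implied by Eq. (1) might look as follows. This is only a sketch of our reading of the thresholds: the dictionaries B and D, the recovered codes \(x_0, \xi_0\) and the reference parts Bx and \(D\xi\) are assumed given, and the element-wise interpretation of \(|\cdot|\) is our assumption.

```python
# Sketch of the threshold test of Eq. (1); all inputs assumed precomputed.
import numpy as np

def accept_component(B, x0, Bx_ref, D, xi0, Dxi_ref, delta1, delta2):
    """Accept the candidate as the facial component represented by Bx_ref."""
    close_to_average = np.all(np.abs(B @ x0 - Bx_ref) <= delta1)
    small_deviation = np.all(np.abs(D @ xi0 - Dxi_ref) <= delta2)
    return close_to_average and small_deviation
```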

Fig. 2 Before and after processing with SQI

Fig. 3 The four big blocks

The algorithm steps are described as follows:

First, to remove the effect of varying illumination, we preprocess the images in the 1-step face database using SQI. Figure 2 shows an image before and after SQI illumination-invariant extraction.
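A simplified single-scale SQI can be sketched as below; the full method of Wang et al. (2004) uses weighted multi-scale Gaussian filters, so the plain Gaussian smoothing and the value of sigma here are simplifying assumptions.

```python
# Simplified single-scale Self Quotient Image: dividing the image by a
# smoothed copy of itself suppresses slowly varying illumination.
import numpy as np
from scipy.ndimage import gaussian_filter

def self_quotient_image(img, sigma=3.0, eps=1e-6):
    smoothed = gaussian_filter(img.astype(np.float64), sigma=sigma)
    return img / (smoothed + eps)     # illumination-normalized quotient
```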

Second, after SQI processing, we cut each image into four big blocks containing the left eye, right eye, nose and mouth respectively; Fig. 3 shows the result. We then slide a rectangle of size \(M \times N\) over these big blocks pixel by pixel along the x and y axes and extract many small blocks, which are used as training samples. In this paper, \(M = N = 31\) pixels.

Third, we select some candidate facial component images of size \(M \times N\) as testing samples and find the most likely one using the training samples and SRC.
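The second step's sliding-window sampling is straightforward; a sketch under the paper's settings (\(64 \times 64\) big blocks, \(31 \times 31\) windows, one-pixel steps) is given below.

```python
# Collect every MxN window of a big block as a flattened training sample.
import numpy as np

def extract_small_blocks(big_block, M=31, N=31, step=1):
    H, W = big_block.shape
    samples = []
    for y0 in range(0, H - M + 1, step):
        for x0 in range(0, W - N + 1, step):
            samples.append(big_block[y0:y0 + M, x0:x0 + N].ravel())
    return np.stack(samples, axis=1)  # columns become SRC training samples
```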

The algorithm process is expressed as follows:

  1. Input: a matrix of training samples

    $$\begin{aligned} A = [{B_1} + {D_1},{B_2} + {D_2},\ldots ,{B_k} + {D_k}] \end{aligned}$$

    where \({B_i}\) is the average facial feature, \({D_i}\;\left( {1 \le i \le k} \right)\) is the individual deviation, and y is a testing sample.

  2. Normalize the columns of A and y.

  3. Solve the \(\ell _1\) minimization problem:

    $${{\hat{x}} = \arg \min {{\left\| x \right\| }_1}} \quad {s.t.} \quad y = Ax$$
  4. Compute the residuals:

    $$\begin{aligned} {r_i}(y) = {\left\| {y - A{\delta _i}({\hat{x}})} \right\| _2},\quad i = 1,\ldots ,k \end{aligned}$$

    where

    $$\begin{aligned} {\delta _i}({\hat{x}}) = [0,0,\ldots ,{x_i},0,\ldots ,0], \quad {x_i} = {x_{i1}},\ldots ,{x_{im}} \end{aligned}$$

    where m corresponds to the number of training samples coming from the i-th individual.

  5. Output:

    $$\begin{aligned} Identity(y) = \arg {\min _i}({r_i}(y)) \end{aligned}$$

Using this algorithm, we successfully locate the facial components and use the resulting images as the 2-step face database. The accuracy of the algorithm, judged by human inspection, is shown in Table 1.

Table 1 The accuracy of the algorithm of extracting facial components (%)

3.3 Judge by nonlinear least-squares

How to combine facial component judgements into a holistic one is still an open research problem. We use the nonlinear least-squares method to combine the facial component judgements and obtain a good holistic recognition rate. Interestingly, although each individual component judgement is not very accurate, just as for human beings, the holistic judgement obtained via nonlinear least-squares is highly accurate, close to human face recognition.

3.3.1 Nonlinear least-squares method

Consider a curve \(y = f(x,\beta )\) that depends on the variable x and the parameter \(\beta\); we want the curve to fit the data in the least-squares sense:

$$\begin{aligned} Q = \sum \limits _{k = 1}^m {{{\left[ {{y_k} - f({x_k},\beta )} \right] }^2}} \end{aligned}$$
(2)

\({x_k}\) is the k-th factor; \({y_k}\) is the actual score of the k-th factor; \(f({x_k},\beta )\) is the target score of the k-th factor; \(\beta\) contains the weight of each factor; and Q is the nonlinear least-squares error.
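Equation (2) translates directly into code. The linear target form \(f({x_k},\beta ) = {\beta _k}{x_k}\) used below is our assumption for illustration; the text only requires f to depend on the factor and the weights.

```python
# Q of Eq. (2) with an assumed linear target score f(x_k, beta) = beta_k * x_k.
import numpy as np

def q_value(y, x, beta):
    y, x, beta = map(np.asarray, (y, x, beta))
    return float(np.sum((y - beta * x) ** 2))   # nonlinear least-squares error
```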

3.3.2 Judge by nonlinear least-squares

The 2-step face database involves four sub-databases (left eye, right eye, nose, mouth), which contain only facial component information and lack “configural” face information, so we introduce GradientFace to supplement the holistic information of the face. We apply the SRC algorithm separately to these five factors and obtain the recognition results. First, according to our practical experience, the weights \(\beta\) of the eye, nose, mouth and GradientFace factors are assigned in the ranges 0.2–0.3, 0.1–0.2, 0.1–0.2 and 0.2–0.25 respectively. Then, for one testing sample, we extract five factors, namely the left eye, right eye, nose, mouth and GradientFace. For the k-th \(\left( {1 \le k \le 5} \right)\) factor of each testing sample, we sort all training samples by similarity according to their residual errors and assign a “similar score” to each class, represented by the absolute difference \(|{y_k} - f({x_k},\beta )|\). We stipulate that greater similarity corresponds to a lower score. The rule is specified as follows:

  1. Initialize each class:

    $$\begin{aligned} similar\mathrm{{\_}}scor{e_{clas{s_j}}} =S;\quad 1 \le j \le n \end{aligned}$$

    where n denotes the total number of classes, and \(clas{s_j}\) denotes the j-th class.

  2. Each factor of the testing sample, such as the left eye, is compared with all training samples, and according to the residual errors the “similar score” of the corresponding class is reassigned. The specific steps are given in figure a.

Now, for a testing sample, each class is assigned a “similar score” for each factor. We then use Eq. (2) to calculate the score of each class, and the class corresponding to the minimum value \({Q_{\min }}\) is the final recognition result.
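One possible implementation of this final judgement is sketched below; `score_tables`, `targets` and the form of f are our illustrative assumptions, with factor k ranging over left eye, right eye, nose, mouth and GradientFace.

```python
# Final judgement: the class whose per-factor "similar scores" minimize Q.
def classify_by_q(score_tables, targets, weights, classes):
    """score_tables[k][c] = similar score of class c under factor k."""
    def q(c):
        return sum((score_tables[k][c] - weights[k] * targets[k]) ** 2
                   for k in range(len(score_tables)))
    return min(classes, key=q)   # class achieving Q_min
```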

4 Experimental results and analysis

4.1 Experiment databases

To evaluate the performance of the proposed HSRC scheme, we conduct experiments on three typical face image databases: the ORL database, the Extended YaleB database and an actual-environment database. The ORL database contains 40 people with ten images each; the image size is \(112 \times 92\). The Extended YaleB database contains 39 people with 25 images each; the image size is \(192 \times 168\). For convenient comparison, we randomly select ten images per person (matching ORL) from the 25. The actual-environment database was collected by our computer's front camera via OpenCV; it contains 36 people with 10 images each, and the image size is \(128 \times 128\). Sample images from the three databases are shown in Fig. 4.

Fig. 4 Sample images from the three databases

4.2 Experimental results of HSRC 1-step and 2-step

Multichannel analysis is the first step of the HSRC algorithm, and its result is the 1-step database, which includes the most likely classes. Images No. 1–5 of each subject are selected from every database as the training set, and the remaining five images serve as testing images. For the actual-environment database, the three most likely classes in each channel are used, and the union of these classes generates the 1-step database. The 1-step recognition rate on the actual database is 95.5 %.

Although 1-step HSRC has a high recognition rate, this is not the real recognition rate; it only means that the set of most likely classes contains the correct class of the testing image. The next task is therefore to choose the correct class from these candidates from the perspective of facial components.

Next we elaborate the process of extracting facial components, using the actual-environment database as an example. First, we apply SQI to the 1-step database. The images are \(128 \times 128\), and we cut them into four big blocks of size \(64 \times 64\), containing the left eye, right eye, nose and mouth respectively. By sliding \(31 \times 31\) rectangles over these big blocks, we pick out many small blocks as training samples. Using these training samples, we locate the facial components in the testing samples with the method described in Sect. 3.2. Figure 5 shows some results of locating facial components in the actual-environment database. The extracted facial components are collected and stored as the 2-step database.

Fig. 5 Part of the located facial components

The 2-step database involves the left eye, right eye, nose and mouth sub-databases. In each sub-database, we select 5, 6, 7 or 8 images as training samples, with the rest as testing samples. Tables 2, 3 and 4 report the accuracy of component recognition in the 2-step databases using the SRC algorithm. The first rows of Tables 2, 3, 4, 5, 6 and 7 denote the number of training images: 5, 6, 7 or 8.

Table 2 The accuracy rate of the components recognition in actual environment database (%)
Table 3 The accuracy rate of the components recognition in ORL database (%)
Table 4 The accuracy rate of the components recognition in Extended YaleB database (%)

Tables 3 and 4 show that the recognition rates of facial components are very high. The reason is that, when the three basic algorithms are used for multichannel analysis, the per-channel recognition rates are already high enough on the ORL and Extended YaleB databases: the most likely classes selected by each channel concentrate on the correct class, so the candidates in the 2-step database are few and almost always correct, and component recognition achieves high accuracy. Table 2 shows that the recognition rate of the eyes using SRC on the actual database ranges from 65 to 75 %, which is close to the accuracy of human facial component recognition (Bonnen et al. 2013).

4.3 Experimental results of HSRC 3-step

Next, we assign similar scores to all classes and distribute the weights for each factor according to the rules in Sect. 3.3. Specifically, S is 15, N is 5, i runs from 1 to 5, and the weights of the factors are 0.25, 0.25, 0.15, 0.15 and 0.2 in turn. Using nonlinear least-squares, the facial component recognitions are combined into one result, which gives the face recognition rate. We compare the recognition results of HSRC with those of several existing algorithms in Tables 5, 6 and 7 and Figs. 6, 7 and 8.

Table 5 The recognition rate of actual environment database (%)
Table 6 The recognition rate of ORL database (%)
Table 7 The recognition rate of Extended YaleB database (%)
Fig. 6 Comparison of recognition rates on the actual environment database

Fig. 7 Comparison of recognition rates on the ORL database

Fig. 8 Comparison of recognition rates on the Extended YaleB database

As shown, on the actual database HSRC does improve over the other algorithms owing to its robustness: it can tolerate low accuracy in any single component recognition while the accuracy of the combined components remains high. If the facial components could be extracted more accurately, the recognition rate of HSRC would be higher still. Compared with SRC+BDPCA and T_SRC, HSRC does not perform as well on the ORL database; we infer the reason is that the accuracy of extracting facial components is not ideal there, and more component identification errors degrade the robustness of HSRC. The accuracy of HSRC is very high on the Extended YaleB database, because the main problem of that database, varying illumination, is solved by SQI. In summary, the HSRC algorithm performs well overall. Moreover, we further analyze the robustness of HSRC by extracting the facial components more accurately: we manually mark the exact coordinates of the facial components in the actual database and use HSRC to recognize the testing images. The results are shown in Table 8. With 5 or 6 training images the improvement from manual extraction is obvious, while the recognition rates with 5–8 training images are close to each other. It can be inferred that once the number of training images reaches a certain threshold, HSRC achieves the same accuracy as with manual extraction of the facial components. Furthermore, it is generally believed that people locate facial components accurately when recognizing a face image, and that the result is robust when the face is familiar; HSRC achieves this effect to some extent. The crucial part of the algorithm is that the nonlinear least-squares method provides good robustness during recognition.

Table 8 The recognition rates of HSRC and HSRC (manually) in actual database (%)

5 Conclusions

With increasing recognition rates, face recognition technology is widely used in daily life, for example in public security forensics and home security management. Recently, researchers have proposed new methods to handle varying illumination, pose and incomplete faces in face recognition. We focus on whether a face recognition algorithm can be correlated with human face recognition, which may yield more efficient algorithms for recognizing familiar faces. We study the SRC algorithm in depth and find that the recognition rate of facial components using SRC is close to the human recognition rate. We then propose a hierarchical face recognition algorithm based on nonlinear least-squares, named HSRC. HSRC reduces the effects of varying illumination, pose and incomplete face images and improves the recognition rate. Experiments show that HSRC performs better than SRC and T_SRC.

However, combining only the facial component features in HSRC may not be enough to reach a higher recognition rate. In the next phase of research, we will try to introduce the “configural” information among the facial components into face recognition. We believe there is still room for improvement in face recognition, and we will pursue it in further research.