1 Introduction

Face recognition technology is widely used in business and security applications such as attendance systems, criminal surveillance, and identity verification, so face recognition [1,2,3,4,5,6,7] has been a hot research topic in pattern recognition over the past decades. Researchers have proposed many algorithms for face recognition. Among them, subspace analysis methods such as principal component analysis [8,9,10,11,12,13,14], linear discriminant analysis [15,16,17,18,19,20], and independent component analysis [21, 22] have been the most widely used. These methods usually consist of two stages: feature extraction and classification. In the feature extraction stage, a projection transformation matrix is learned from the training set, and samples are then projected onto the feature subspace by this matrix. In the classification stage, a classifier such as the nearest neighbor or support vector machine classifier is used to classify the test samples.

Recently, a novel method called the sparse representation classifier (SRC) [23] was proposed and achieved excellent experimental results for face recognition. The core of the algorithm is to represent the test sample as a linear combination of all the training samples [23,24,25,26,27,28]. The combination coefficient vector is constrained to be sparse, that is, most of its components are zero. Once the combination coefficients are obtained, the representation residual of the test sample with respect to each class can be calculated from the test sample and the linear combination of the training samples of that class. Finally, the test sample is assigned to the class with the minimum residual.

Previous research has already shown that the sparse representation method can achieve very good performance for face recognition [29]. However, the computational cost of the method is high because it relies on an iterative algorithm to compute the sparse representation coefficients. Zhang et al. [30] proposed a method called the collaborative representation classifier based on regularized least squares (CRC_RLS). This method uses collaborative representation (a linear combination of all training samples) to represent the test sample, and the combination coefficients are obtained by regularized least squares. The recognition rate of CRC_RLS is comparable with that of SRC, but its computational cost is far lower. Xu et al. [26] proposed a simple and fast representation-based (SFR) face recognition method. The method selects only the training sample nearest to the test sample from every class and then expresses the test sample as a linear combination of the selected training samples; it classifies more accurately than the nearest neighbor classifier. Tang et al. [27] proposed a novel sparse representation method based on virtual samples, where the virtual samples are created by adding small amounts of noise to the original training samples. Wang et al. [35] improved the method proposed in [27]: they divided an original image into several parts and generated a corresponding virtual training sample by adding different random noise to the pixels of different parts, so that various illumination changes could be simulated.

In this paper, a novel collaborative representation method based on virtual samples (CRVS) is proposed. First, virtual samples are produced from the training samples by principal component analysis (PCA) reconstruction and the image mirror transform. Then, using the original samples together with the new virtual samples as the dictionary, CRC_RLS is employed to express the test sample over the augmented training set and to classify it. A number of face recognition experiments show that the proposed method outperforms the method proposed in [26] and other methods based on virtual samples.

The rest of this paper is organized as follows: Sect. 2 introduces the proposed algorithm, Sect. 3 presents the experimental results for face recognition, and Sect. 4 concludes the paper.

2 Representation classification based on an augmented sample set

2.1 Generating virtual images

The proposed method consists of two steps. First, virtual samples are created by PCA reconstruction and the image mirror transform. Second, the new training set, consisting of the original and virtual samples, is used as the dictionary to represent the test sample; CRC_RLS is utilized to calculate the representation coefficients and to classify the test sample.

To keep each virtual sample similar to the original image, PCA reconstruction is employed to produce it. PCA is a common statistical method, often used for dimensionality reduction. Here, we exploit the minimum-reconstruction-error property of PCA and select its most important basis vectors to reconstruct the original sample.

Let \(x_{k,i}\ (k = 1,2,\ldots,C;\ i = 1,2,\ldots,m)\) denote a training sample, where \(k\) indicates that the sample belongs to the \(k\)th class and \(i\) indicates that it is the \(i\)th sample of that class. \(C\) and \(m\) are the number of classes and the number of samples per class, respectively.

In PCA, the mean is first subtracted from each sample. Let \(x'_{k,i} = x_{k,i} - \overline{x}\), where \(\overline{x}\) denotes the mean of all training samples. Then, the covariance matrix is formed by Eq. (1).

$$M = \sum_{k = 1}^{C} \sum_{i = 1}^{m} x'_{k,i} \, {x'_{k,i}}^{T}$$
(1)

By computing the eigenvalues and eigenvectors of the matrix \(M\), a series of eigenvectors is obtained. Let \(v_{j}\ (j = 1,2,\ldots,N - 1)\) denote an eigenvector, where \(N\) is the number of training samples. Each eigenvalue indicates how important the corresponding eigenvector is for PCA reconstruction: eigenvectors with large eigenvalues reconstruct the original sample better. We therefore sort the eigenvectors in descending order of their eigenvalues. A centered sample is projected into the PCA subspace, and the projection coefficients are obtained by Eq. (2).

$$a_{j} = {x'_{k,i}}^{T} v_{j}$$
(2)

If the first \(L\) eigenvectors are selected as the basis vectors, the reconstruction of the original sample is obtained by the following equation:

$$x^{1}_{k,i} = \overline{x} + \sum_{j = 1}^{L} a_{j} v_{j}$$
(3)
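To make the reconstruction step concrete, the following is a minimal sketch of Eqs. (1)–(3) in Python/NumPy. The data layout (one flattened image per column), the function name, and the way the eigenvector fraction is interpreted are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def pca_virtual_samples(X, frac=0.95):
    """X: d x N matrix with one flattened training image per column.
    Returns the PCA-reconstructed (virtual) samples, same shape as X."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                 # centered samples x'_{k,i}
    M = Xc @ Xc.T                                 # scatter matrix, Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(M)          # eigendecomposition of symmetric M
    order = np.argsort(eigvals)[::-1]             # sort by eigenvalue, descending
    eigvecs = eigvecs[:, order]
    # one reading of "the first 95% of eigenvectors" (Sect. 3 uses 95%, Fig. 1 uses 90%)
    L = max(1, int(frac * (X.shape[1] - 1)))
    V = eigvecs[:, :L]                            # d x L reconstruction basis
    A = V.T @ Xc                                  # projection coefficients, Eq. (2)
    return mean + V @ A                           # reconstruction, Eq. (3)
```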

As shown in Fig. 1, the PCA-reconstructed sample is similar to the original sample.

Fig. 1

An original face and virtual faces produced from it by PCA reconstruction and the mirror transform. a The original face. b The PCA-reconstructed face (the first 90% of eigenvectors are selected). c The mirror face

To improve recognition under variations in the test samples, the proposed method employs the mirror transform to generate new samples. A frontal face image is roughly symmetric, and the mirror transform does not destroy the overall structure of the face, so a new face with a mild pose change is obtained; with this new face, the probability of matching the training samples to the test samples is improved. The mirror image is obtained by Eq. (4).

$$x^{2}_{k,i} (r,c) = x_{k,i} (r,nc - c + 1)$$
(4)

In the above equation, \((r,c)\) denotes the pixel position in an image and \(nc\) is the width of the image. Figure 1 shows an original training sample and the virtual samples produced from it.
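As a quick illustration, Eq. (4) amounts to a horizontal flip; a one-line NumPy sketch (function name illustrative) is:

```python
import numpy as np

def mirror(img):
    """Horizontal flip: out[r, c] = img[r, nc - c - 1] in 0-based indexing,
    matching Eq. (4), which is stated in 1-based indexing."""
    return img[:, ::-1].copy()
```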

2.2 Collaborative representation classification with the augmented samples

Through the previous process, we obtain virtual samples that are used in the second phase of the proposed method. We create a matrix \(X\) containing all the original training samples and their virtual samples as columns. Let \(y\) denote the test sample. Regarding \(X\) as the dictionary, we represent \(y\) as a linear combination of the column vectors of \(X\). In other words, we assume that the following equation is approximately satisfied:

$$y = \sum\limits_{i = 1}^{N} {a_{i} x_{i} }$$
(5)

where \(N\) is the number of columns of \(X\), \(x_{i}\) is the \(i\)th column of \(X\), and \(a_{i}\) is the coefficient of \(x_{i}\) in the linear combination. The equation can be rewritten as:

$$y = Xa$$
(6)

All face images have a similar appearance, so it is reasonable to express a new face collaboratively as a linear combination of all the training images. The regularized least squares method can be used to solve the equation, and the linear combination coefficients are given by Eq. (7).

$$a = (X^{T} X + \lambda I)^{-1} X^{T} y$$
(7)

where \(I\) is the identity matrix and \(\lambda\) is a positive constant serving as the regularization parameter. After the combination coefficient vector \(a\) is obtained, the representation result from the training samples of each class is computed by the following equation:

$$y_{k} = \sum_{i = 1}^{n_{k}} a_{k,i} \, x_{k,i}, \quad k = 1,2,\ldots,C$$
(8)

where \(n_{k}\) is the number of samples of the \(k\)th class in \(X\) and \(a_{k,i}\) is the coefficient associated with \(x_{k,i}\). The deviation between the test sample and its class-wise representation is calculated by the following equation:

$$d_{k} = \left\| {y - y_{k} } \right\|$$
(9)

The deviation reflects the ability of each class to represent the test sample: the smaller \(d_{k}\) is, the greater the contribution the samples of the \(k\)th class make to the representation. Therefore, \(y\) is classified into the class that produces the smallest deviation; that is, if \(d_{t} = \min_{k}(d_{k})\), the test sample is assigned to the \(t\)th class.
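The classification stage of Eqs. (6)–(9) can be sketched as follows, assuming the columns of `X` hold the (original plus virtual) training samples, `labels[i]` gives the class of column `i`, and `lam` is the regularization parameter \(\lambda\); all names are illustrative.

```python
import numpy as np

def crc_rls_classify(X, labels, y, lam=0.01):
    """Return the predicted class of test vector y given dictionary X."""
    labels = np.asarray(labels)
    N = X.shape[1]
    # regularized least-squares coefficients, Eq. (7)
    a = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)
    best_class, best_dev = None, np.inf
    for k in np.unique(labels):
        idx = labels == k
        y_k = X[:, idx] @ a[idx]          # class-wise representation, Eq. (8)
        d_k = np.linalg.norm(y - y_k)     # deviation, Eq. (9)
        if d_k < best_dev:
            best_class, best_dev = k, d_k
    return best_class
```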

The whole flow of recognizing a face in our proposed method is shown in Fig. 2.
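Putting the pieces together, a hedged end-to-end sketch of the flow in Fig. 2 might look as follows; it reuses the functions sketched above, and the helper names and data layout are assumptions made for illustration.

```python
import numpy as np

def recognize(train_imgs, train_labels, test_img, lam=0.01):
    """train_imgs: list of 2-D grayscale arrays; train_labels: class of each image."""
    X = np.stack([im.ravel().astype(float) for im in train_imgs], axis=1)
    X_pca = pca_virtual_samples(X)                       # PCA-reconstructed virtuals
    X_mir = np.stack([mirror(im).ravel().astype(float) for im in train_imgs], axis=1)
    X_aug = np.hstack([X, X_pca, X_mir])                 # augmented dictionary
    labels_aug = np.concatenate([train_labels] * 3)      # virtuals inherit class labels
    # unit-length columns, as in the experimental setup of Sect. 3
    X_aug = X_aug / np.maximum(np.linalg.norm(X_aug, axis=0), 1e-12)
    y = test_img.ravel().astype(float)
    y = y / max(np.linalg.norm(y), 1e-12)
    return crc_rls_classify(X_aug, labels_aug, y, lam)
```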

Fig. 2

The flow diagram of the proposed method

3 Experimental results

To evaluate the performance of the proposed method, a number of experiments were conducted on three benchmark face image databases: ORL, Yale, and AR. To demonstrate the effectiveness of our method, the similar methods proposed in [26, 27, 35] were implemented and evaluated on the same face data. Moreover, as a preprocessing step before training, the virtual-sample generation method proposed in this paper can be combined with many classification algorithms. Recently, some complex but efficient SRC-based algorithms have been proposed, such as Regularized Robust Coding (RRC) [36]. In the experiments, the RRC method was also used for face recognition, and the combination of virtual samples with RRC was evaluated as well.

The ORL face database [31] consists of 40 subjects, each represented by 10 images with various facial expressions, varying illumination, and different facial details. We use the cropped images from [26], whose size is 56 × 46. Figure 3 shows some cropped images of one person in the ORL database. Because the dictionary in representation-based classification needs to be overcomplete (more columns than feature dimensions), we resize each image to 14 × 11 in the experiments.

Fig. 3

Images of a person from the ORL database

The Yale face database [32] consists of 165 face images of 15 individuals, each providing 11 different images. The images are upright and frontal, with various facial expressions and lighting conditions. In our experiments, each image is manually cropped and resized to 32 × 32 [33]. Figure 4 shows some cropped images of one person in the Yale database. We further resize each image to 16 × 16 in the experiments.

Fig. 4

Images of a person from the Yale database

For the experiments on the ORL and Yale face databases, if \(m\) of the \(n\) samples per subject are chosen as training samples and the remaining samples are used as test samples, then the number of possible training/test splits is \(C_{n}^{m}\). The reported results are the means over all these splits.
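For concreteness, a small sketch of enumerating all \(C_{n}^{m}\) splits (assuming, as is common for these protocols, that the same image indices are chosen for every subject) could be:

```python
from itertools import combinations

def all_splits(n, m):
    """Yield (train_indices, test_indices) for each of the C(n, m) choices."""
    for train in combinations(range(n), m):
        test = tuple(i for i in range(n) if i not in train)
        yield train, test
```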

The AR face database [34] contains over 3200 frontal face images of 126 individuals. Each individual has 26 images taken in two sessions separated by a two-week interval, and each session consists of 13 faces with different facial expressions, illumination conditions, and occlusions. In our experiments, we choose a subset of AR consisting of 50 men and 50 women, as in [30]; all images are cropped to 60 × 43 pixels and converted to grayscale. Figure 5 shows the images of one person used in our experiments. We resize each image to 15 × 11 in the experiments.

Fig. 5

The images of a person from the AR database used in our experiments (the images in the first row are from session 1 and those in the second row are from session 2)

We segment the AR database into three subsets, and experiments were conducted on each. Subset one includes the 1st–4th images of each person in session 1; the images in this subset vary in facial expression. Subset two includes the 1st and 5th–7th images of each person in session 1; the images in this subset vary in illumination. In the experiments on subsets one and two, the first image is selected as the training sample and the remaining samples are used as test samples. Subset three includes all the images shown in Fig. 5; the images from session 1 form the training set and the others form the test set.

In all experiments, each sample vector is first converted to a unit vector of length 1. We use regularized least squares to obtain the linear combination coefficients and set the regularization parameter \(\lambda\) to 0.01. When generating virtual samples by PCA reconstruction, we select the first 95% of the eigenvectors as the reconstruction basis.

For the Yale database, the number of training samples per class ranges from 1 to 5. The experimental results are shown in Table 1.

Table 1 Means of the recognition rates (%) of our proposed method and other methods on the Yale face database

From the table, we can see that the recognition rate of the proposed method is consistently better than that of the other methods on this database, regardless of how many images are selected as training samples. In addition, combining the virtual samples with RRC increases the recognition accuracy of RRC on the Yale face database. The average recognition accuracy is 13.22% higher than that of the method proposed in [26]. When five samples per person are selected as training samples, 462 training sets can be formed by combination. Figure 6 shows the recognition rate (%) on each training set; the recognition rate of our method is greater than that of [26] on most training sets.

Fig. 6

Comparison of the recognition rates on each training set of the Yale database between the proposed method and the method in [26]

For the ORL database, the number of training samples per class ranges from 1 to 4. The experimental results are shown in Table 2.

Table 2 Means of the recognition rates (%) of our proposed method and other methods on the ORL face database

From the table, we can see that the accuracy of the proposed method is higher than that of the others when 1 or 2 images are selected as training samples. When 3 or 4 images are selected, the accuracies of all methods are similar. This shows that our method is more effective in undersampled scenarios, especially when the number of training samples is very small, such as a single training sample. When 4 images per person are selected as training samples, there are 210 training sets. The variances of the accuracy of our method and of the method in [27] are 1.61% and 1.84%, respectively, which shows that our method is more stable than the method in [27].

The experimental results on the AR database are shown in Table 3.

Table 3 The recognition rates (%) of our proposed method and other methods on the AR face database

From the table, we can see that although the performance of the proposed method is inferior to that of the other methods on the expressions subset, its recognition rate is comparable to that of the others on the illumination subset. On this database, the RRC method achieved the highest recognition rate, but its performance on the expressions subset is poor. When the images in session 1 are selected as training samples (subset three), the recognition accuracy of our method is 4.43% higher than that of the method proposed in [26].

The algorithm in [26] is simple and fast. Similarly to that algorithm, our method calculates the representation coefficients by regularized least squares, so the computational efficiency of our method is comparable with that of [26]. Unlike [26], our method uses more samples, including both original and virtual samples, to represent a test sample, so an additional step is needed to generate the virtual samples, which increases the computational burden. However, this additional work can be done in the training step, so the time efficiency of the recognition step is preserved. We recorded the running times over the 462 training sets of the Yale database when five samples per person are selected as training samples: our method took 25.23 s, while the method of [26] took 45.53 s. Thus, in the time efficiency of test sample recognition, the proposed method is far better than the method proposed in [26].

4 Conclusions

To represent the test sample better, we exploit both original and virtual images to construct a new training set. The method has two advantages. First, training samples are typically undersampled in face recognition, and the dictionary composed of these samples in the collaborative representation problem is usually not overcomplete; by generating virtual samples, the number of training samples increases and the new dictionary can become overcomplete. Second, each newly generated sample represents some variation of a face, so when matching a test face with an unknown facial change, a more similar training sample is likely to be found among the augmented samples. Extensive experiments show that the proposed method outperforms the similar methods proposed in [26, 27, 35].