1 Introduction

Face Recognition (FR) has attracted an enormous amount of attention in recent years, as it has a wide variety of applications in the identification or verification of facial images. Since FR algorithms are accurate and non-intrusive, they are being used in applications such as access control, smart cards, information security, and law enforcement [6, 9, 49]. With the advancements in other related technologies, even more doors have been opened for advanced face recognition systems. For example, reference [41] proposes a robust method for labeling mobile images on cloud platforms. As another example, reference [40] proposes an efficient person re-identification algorithm, in which pedestrians are identified across multiple cameras with different viewing geometries.

While there exists a huge body of work on designing reliable FR systems, these systems still face difficulties, e.g., when human faces look different under various lighting conditions or facial expressions. The majority of the existing literature employs shallow or deep learning for face recognition. Traditional algorithms based on feature extraction utilize local or non-local facial features to train a model that is later used to recognize an unseen face image [8, 10,11,12, 14,15,16,17, 19, 22, 28, 30,31,32, 38, 39, 43, 45, 46, 48, 49]. Deep learning approaches, on the other hand, do not explicitly generate features; instead, they use the entire face image database to learn and predict unseen face images [18, 34,35,36,37, 50]. State-of-the-art deep learning methods have been reported to achieve very high recognition accuracy [18].

Moreover, most of the current FR approaches assume that multiple images of the same face are available for training [1, 12, 48]. However, in practical situations, only a limited number of images is available for the same face. Moreover, employing more images would increase the complexity and the processing time of the algorithm. In the case where only one sample image is available for training the system, it is difficult to recognize the person's face under different lighting conditions. Reference [39] demonstrates that the performance of FR algorithms degrades even more in the “one sample per person” case when there is no limitation on the lighting conditions of the images under test. To address this problem, some studies have focused on employing local features in FR [12, 17, 19, 38]. The authors in [11] propose a face description and similarity measuring technique for Visual Similarity Retrieval (VSR) in single model face databases. Instead of using local operations on isolated points, the algorithm in [11] employs Directional Corner Point (DCP) matching, in which directional information describing the connectivity of a point to its neighbors is utilized. Unlike neural network approaches that require multiple model faces per person to train the system for an optimal setting [13], the DCP method is suitable for single model face database retrieval as it is computationally fast. In the graph matching techniques of [21, 26, 44], feature vectors are extracted at points on the face image that carry more information. In most feature based methods, the facial features are assumed to be the eyes, nose, and mouth. In [21, 26, 44], however, the locations and the number of feature points are not fixed; therefore, the number of feature vectors and their locations can vary to represent different facial characteristics of different human faces. In this way, feature points are located around the main facial features (e.g., eyes, nose, and mouth) as well as around the special facial features of an individual, such as dimples.

Unlike the first group of researchers, who are interested in local features, other works in the literature indicate that instead of using only local features, the entire face should be taken into account. This line of work has been discussed in [45], where the authors employ principal component analysis to enrich the information extracted from the facial images, and in [46], where two-dimensional principal component analysis is used. In such scenarios, a large number of features are used for extracting the information, which increases the computational cost of the system. To reduce the computational complexity of such an FR process, dimensionality reduction mechanisms are applied before data classification. Linear Predictive Coding (LPC) [47] and Linear Discriminant Analysis (LDA) [10] are two classic approaches for such dimensionality reduction. In addition to these classic methods, Compressive Sensing (CS) is an efficient mechanism for decreasing the dimensionality of feature vectors. In comparison to other feature extraction methods, in a CS-based approach the feature extraction is performed by taking random samples. CS-based approaches generally result in lower-dimensional measurements of images without significantly compromising the recognition performance.

In most of the previous works on CS-based face recognition, e.g., [3, 7, 25, 28], face images are processed in the frequency domain and then classified based on the gender of the human subjects appearing in the image. In [7], different images of the same gender are considered as an ensemble of inter-correlated signals. These signals are assumed to be sparse due to the variations of the faces with respect to the whole image. The authors in [7] exploit this sparsity using CS, which enables them to grossly represent the images of a given gender by two kinds of features: one that represents the common features of different faces, and another that captures the differences among the faces in all training samples. The two features are combined by a cascaded filter for robust gender recognition. Reference [28] also views the different images of the same subject as an ensemble of inter-correlated signals, where the signals are assumed to be sparse with respect to the variations in expressions. The proposed scheme in [28] exploits this sparsity using distributed CS theory to represent the training images of a given subject by only two feature images: one that captures the holistic (or common) features of the face, and another that captures the different expressions in all training samples.

Some CS-based face recognition algorithms use Sparse Representation based Classifiers (SRCs) for classification [25], in which a query sample is encoded as a sparse linear combination of all training samples. The face images are then classified by evaluating the representation error, i.e., selecting the class with the minimum representation error. It is demonstrated in [25] that the \(\ell_{1}\)-norm sparsity constraint on the coding coefficients plays a key role in the success of SRCs. By combining the classifier presented in [25] with our Compressive Sensing based two-level FFT (CS-2FFT) feature extraction process, we propose a robust and fast CS-based face recognition algorithm that generates a sparse feature vector for each image. In this paper, we propose a new CS-based technique that utilizes the FFT to extract a sparse feature vector with the least possible number of non-zero transform coefficients. In the proposed method, a Hamming windowing function is applied to the image, and the FFT coefficients obtained from each window are then summed. Therefore, we end up with one number per window, corresponding to the summation of the normalized FFT coefficients in that window, which gives us a vector for each image. We take a second FFT of this vector and show that a random sample of the resulting vector contains enough information to discriminate the corresponding facial image from the other images in a database.

With the above insights in mind, this paper presents, for the first time, a new modeling and analysis framework for the face recognition task, along with a sparse FFT-based feature extraction mechanism for the characterization of face images. We represent the image as a vector obtained by connecting the image columns sequentially together. To further reduce the dimensionality of our face recognition problem, a CS-2FFT feature vector is synthesized for each image vector. This feature extraction process takes into account both locally extracted features of the image, such as those used in VSR [11] and F-FR [26], and globally extracted features, such as those in FLDA [8], C-LDA [14], and FFR [20]. The proposed classifier is based on the principle of the SRC, with an additional random measurement on the extracted features to reduce the dimensionality of the image vector fed to the classifier. In the proposed face recognition algorithm, we use CS theory twice. First, in the feature extraction process, for extracting the feature vectors from the face images (described in Section 3); this process does not involve CS reconstruction, so no knowledge of the underlying dictionaries is required. Second, as described in Section 4, in our classification process we use CS reconstruction for selecting the true class. As a result, a significant reduction in the dimensionality of the image vectors is achieved. Extensive experiments have been conducted to evaluate the performance of the proposed scheme in comparison to existing shallow learning based methods, such as the robust VSR in single model face databases [11] and the FLDA applicable to face recognition with one sample per person [8]. Experimental results show that Compressive Sensing combined with Sparse Representation Classification (SRC) [25] achieves high recognition accuracy on the Olivetti Research Laboratory (ORL) face database (96%), the Labeled Faces in the Wild (LFW) face database (42%, when trained/tested only on a subset of the database), the Extended Yale B face database (95.9%), and the Alex Martinez and Robert Benavente (AR) face database (95.6%), while maintaining a reasonable computational complexity.

The rest of this paper is organized as follows: Section 2 reviews the compressive sampling strategy, Section 3 explains the proposed sparse feature extraction mechanism, Section 4 elaborates on the sparse representation-based classification, Section 5 provides performance evaluations, experimental results and interpretation of the findings, and finally Section 6 concludes the paper.

2 Overview of compressive sensing strategy

In our proposed approach, Compressive Sensing is used as an efficient mechanism for the dimensionality reduction of the feature vectors. Thus, it is essential to give a brief overview of the utilized Compressive Sensing scheme in this section. As illustrated in Fig. 1, suppose \(\mathbf{x}_{t}\) is a sparse vector of order K on the basis Ψ, i.e., \(\mathbf{x}_{t}\) has only K non-zero coefficients on the basis matrix Ψ. In addition, suppose Φ denotes an M × N sampling matrix with \(M \ll N\), whose rows are incoherent with the columns of Ψ. If Θ is the sparse coefficient vector, then we define

$$ \mathbf{x}_{t}=\boldsymbol{\Psi} \boldsymbol{\Theta}. $$
(1)

According to the Compressive Sensing theory, \(\mathbf{x}_{t}\) can be recovered using \(M=O(K\log (N))\) non-adaptive samples as follows [32, 43]:

$$ \mathbf{Y}=\mathbf{\Phi} \mathbf{x}_{t}=\mathbf{\Phi}\boldsymbol{\Psi}\boldsymbol{\Theta}=\mathbf{D}\boldsymbol{\Theta}, $$
(2)

where Y is an M × 1 vector representing the samples and D is an M × N matrix. To recover the signal, the sparse coefficient vector Θ is computed using the following \(\ell_{0}\)-norm optimization problem:

$$ \min \| \boldsymbol{\Theta} \|_{0}\quad s.t. \quad \mathbf{Y}=\mathbf{\Phi} \mathbf{x}_{t}=\mathbf{D}\boldsymbol{\Theta}. $$
(3)

Solving (3) in its general form is computationally intractable; thus, the \(\ell_{1}\)-norm is used in practice instead of the \(\ell_{0}\)-norm as follows [21, 26]:

$$ \min \| \boldsymbol{\Theta} \|_{1}\quad s.t. \quad \mathbf{Y}=\mathbf{\Phi} \mathbf{x}_{t}=\mathbf{D}\boldsymbol{\Theta}. $$
(4)

Therefore, instead of taking N samples from a signal which is sparse of order K, only M incoherent samples of the signal are taken. This means that fewer samples are used in the entire face recognition process. Independent identically distributed (i.i.d.) random Gaussian vectors \(\mathcal{N}(0,1)\) and Bernoulli vectors (± 1) are among the popular random vectors that are incoherent with almost any fixed basis, and therefore such vectors can be utilized for sampling the image.

Fig. 1 Compressive Sensing in matrix notation
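To make the sampling and recovery steps above concrete, the following minimal Python sketch recovers a synthetic K-sparse vector from \(M=O(K\log (N))\) Gaussian measurements by recasting the \(\ell_{1}\)-minimization in (4) as a linear program. It assumes Ψ is the identity (the signal is sparse in the canonical basis), and all parameter values are illustrative rather than prescribed by this paper.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, y):
    """Solve min ||theta||_1 s.t. D @ theta = y via theta = u - v, u, v >= 0."""
    m, n = D.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||theta||_1
    A_eq = np.hstack([D, -D])          # equality constraint D @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
N, K = 128, 5                          # signal length and sparsity order
M = int(4 * K * np.log(N))             # M = O(K log N) non-adaptive samples
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # i.i.d. Gaussian sampling matrix
Y = Phi @ theta                        # compressive measurements, as in (2)
theta_hat = basis_pursuit(Phi, Y)
print("recovery error:", np.linalg.norm(theta - theta_hat))
```

For large problems, dedicated sparse solvers would replace the linear-programming form; it is shown here only for transparency.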

3 Proposed feature extraction algorithm

A schematic of the proposed face recognition approach is illustrated in Fig. 2. As shown in this figure, we employ two main steps for the recognition task. In the first step, known as the learning process, we extract CS-2FFT feature vectors from each class symbol, while in the second step, we extract CS-2FFT feature vectors from new test images and calculate the minimum error between the test feature vectors and the class symbol feature vectors. For the above FR task, in this section we introduce the CS-2FFT feature extraction process employed in both steps.

Fig. 2 Block diagram of the proposed face recognition scheme

To extract a sparse feature vector from the t-th face image, we first create an M × 1 vector \(\mathbf{x}_{t}\) by connecting the image columns sequentially together. This vector is then partitioned using 50 percent overlapped Hamming windows, meaning that the first half of each window overlaps with the previous window, while the second half overlaps with the next window. Applying the FFT to the i-th window and normalizing the resulting coefficients by their maximum value, so that the normalized magnitudes lie between 0 and 1, we obtain an M × 1 vector \(\mathbf {x}^{(t)}_{nor_{i}}\). Since the t-th face vector \(\mathbf{x}_{t}\) resembles a summation of sinusoidal waves, we can use the FFT coefficients of \(\mathbf{x}_{t}\), or equivalently its spectrogram, as the starting point of our CS-2FFT feature extraction process to extract \(\mathbf {x}^{(t)}_{F(w_{i})}\). It should be noted from \(\mathbf {x}^{(t)}_{F(w_{i})}\) that the size of the used FFT is equal to the length of \(\mathbf{x}_{t}\). In fact, to obtain better time-invariant properties at the output of the “summation of window coefficients” block in Fig. 2, the local features, i.e., the FFT coefficients of each window, are added together. In this step, summing the transform coefficients of each window assigns one value per window, which represents the short-time feature of that window. We then stack the window values together to form a B × 1 vector \(\mathbf {x}^{(t)}_{sum_{i}}\) and take another FFT of the resulting signal. The output of this step represents the long-time feature of the original image.

Note that Fourier analysis decomposes a face image vector into a summation of sine and cosine waves with different amplitudes, frequencies, and phases. Each Fourier transform coefficient in the frequency domain corresponds to a component of the original signal in the spatial domain. The first FFT operates as a moving window filter that is convolved with the image. The window summation aggregates the filtered window samples as part of this convolution, providing a notion of the local texture variations within each (50% overlapping) window. The second FFT then extracts the longer-term, non-local texture variations and results in a sparse representation. Another widely used method of extracting the texture and orientation of an image through spectral analysis is to use Gabor filters. However, the 2FFT approach results in a sparse representation that fits better into the rest of the proposed solution.

It is also worth noting that, due to the two FFT stages, local and non-local variations are both taken into account. As a result, it should theoretically make no noticeable difference to construct the vector signal from image rows instead of image columns. This was also verified statistically through our experiments. However, it should be disclosed that no other way of windowing was tested (e.g., zigzag scan or 2D block windows). The main motivation behind our choice was to achieve a very fast parallelizable implementation, as we do not expect to see a noticeable difference in the sparse representation due to the 2FFT approach.

Deploying the second FFT on the vector of window features results in the B × 1 vector \(F(\mathbf {x}^{(t)}_{sum_{i}})\), representing the obtained feature vector. Similar to the first FFT, it follows from \(F(\mathbf {x}^{(t)}_{sum_{i}})\) that the size of the second FFT is equal to the length of \(\mathbf {x}^{(t)}_{sum_{i}}\); in this case, the length of the feature vector is equal to the number of windows. It is worth mentioning that the second FFT provides a sparse representation of the feature vector once a predetermined amplitude threshold is applied. Note that while the first FFT represents the time variations inside each frame of the face vector, i.e., the local properties of the face image, the second FFT indicates the variations of these local properties across consecutive frames. To make the feature vector as sparse as possible, we feed \(F(\mathbf {x}^{(t)}_{sum_{i}})\) into an amplitude filter which sets the values below a threshold to zero. This amplitude threshold clips the values of the normalized feature elements. As we will show in Section 5, a proper choice of the threshold value results in a higher degree of sparseness in the feature vector, which yields a shorter CS-2FFT vector.

In the last step, features are extracted by random sampling of the sparse vector \(\mathbf{y}_{t}\). The sampling process is performed by multiplying the sparse B × 1 vector \(\mathbf{y}_{t}\) by a random Gaussian sampling matrix, which yields the M′ × 1 vector \(\mathbf {x}^{\prime }_{t}\). The number of samples depends on the sparsity level of \(\mathbf{y}_{t}\); we choose this parameter empirically in our numerical evaluations in Section 5.
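A hedged Python sketch of the CS-2FFT extraction pipeline described in this section is given below. The window length, the number of random measurements, and the helper name cs2fft_features are illustrative assumptions; only the 0.45 threshold is the value chosen empirically in Section 5. The same sampling matrix (here fixed by the seed) must be reused for every image so that the resulting features are comparable.

```python
import numpy as np

def cs2fft_features(img, win_len=256, threshold=0.45, n_meas=64, seed=0):
    """Sketch of the CS-2FFT feature extraction (parameters are assumptions)."""
    x = img.astype(float).flatten(order="F")       # stack image columns into x_t
    hop = win_len // 2                             # 50% overlapped Hamming windows
    win = np.hamming(win_len)
    sums = []
    for start in range(0, len(x) - win_len + 1, hop):
        spec = np.abs(np.fft.fft(x[start:start + win_len] * win))  # first (local) FFT
        spec /= spec.max() + 1e-12                 # normalize magnitudes to [0, 1]
        sums.append(spec.sum())                    # one value per window
    s = np.asarray(sums)                           # B x 1 vector of window features
    f = np.abs(np.fft.fft(s))                      # second (long-time) FFT
    f /= f.max() + 1e-12
    y = np.where(f >= threshold, f, 0.0)           # amplitude filter -> sparse y_t
    Phi = np.random.default_rng(seed).standard_normal((n_meas, len(y))) / np.sqrt(n_meas)
    return Phi @ y                                 # M' x 1 compressive feature x'_t
```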

4 Compressive sensing based classifier

Compressive Sensing based classifiers utilize the properties of sparse signals to develop classification algorithms. In this section, we provide more insight into the classification method that takes advantage of this sparsity in our face recognition algorithm (see Fig. 2).

Suppose \(\mathbf {v}_{i,j}\in \mathbb {R}^{M}\) is the j-th extracted sample vector from the face images in the i-th class. Also, we assume that there are enough sample vectors in the i-th class, collected in the form \([\mathbf {v}_{i,1},\mathbf {v}_{i,2}, \ldots , \mathbf {v}_{i,n_{i}}]\), where \(n_{i}\) represents the number of samples in the i-th class. If \(\mathbf {x}^{(t)}\in \mathbb {R}^{M}\) is the t-th sample vector extracted from a test image in the same class as \(\mathbf{v}_{i,j}\), then it is approximately represented by the following linear expression:

$$ \mathbf{x}^{(t)}=\sum\limits_{j=1}^{n_{i}} \Theta_{i,j}\mathbf{v}_{i,j}, $$
(5)

where the \(\Theta_{i,j}\)’s denote the weighting parameters. Let us define A as a matrix whose columns are filled with the feature vectors \(\mathbf{v}_{i,j}\) corresponding to each class; then A plays a role similar to Ψ in (1) and (2). This can be written as

$$ \mathbf{x}^{(t)}=\mathbf{A}\boldsymbol{\Theta}~ \in~ \mathbb{R}^{M}, $$
(6)

where Θ is an expanded feature vector defined as \(\boldsymbol {\Theta }=[{\Theta }_{1,1},\ldots,{\Theta }_{1,n_{1}},\ldots,{\Theta }_{K,1},\ldots,{\Theta }_{K,n_{K}}]~\in ~ \mathbb {R}^{n}\) with \(n={\sum }_{i=1}^{K} n_{i}\), where K is the number of classes. More precisely, for a test sample from the i-th class, the expanded feature vector is ideally a coefficient vector whose elements are zero except for those associated with the i-th class [44]. To find the sparse representation vector \(\widehat {\boldsymbol {\Theta }}\) for any input vector \(\mathbf{x}^{(t)}\), we use the optimization problem in (3) or (4). In the ideal case, all non-zero elements of \(\widehat {\boldsymbol {\Theta }}\) correspond to the columns of A that are associated with the i-th class. However, due to noise and the dissimilarity between the input signal and the signals of each class, the ideal case does not occur. To resolve this problem, a characteristic function \(\delta _{i}:~\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) is defined for the i-th class such that \(\delta _{i}(\widehat {\boldsymbol {\Theta }})\in \mathbb {R}^{n}\) is a new vector whose non-zero elements are the elements of \(\widehat {\boldsymbol {\Theta }}\) corresponding to the i-th class. Assuming that the test data belongs to the i-th class, its approximation can then be computed from (6) as follows:

$$ \hat{\mathbf{x}}^{(t)}_{i}=\mathbf{A}\delta_{i} (\widehat{\boldsymbol{\Theta}}). $$
(7)

Once \(\widehat {\boldsymbol {\Theta }}\) and the \(\hat {\mathbf {x}}^{(t)}_{i}\)’s are evaluated for i = 1, ..., K, \(\mathbf{x}^{(t)}\) is assigned to the class with the least estimation error, i.e.,

$$ \min_{i}\quad r_{i}(\mathbf{x}^{(t)})=\| \mathbf{x}^{(t)}-\hat{\mathbf{x}}^{(t)}_{i}\|_{2}, $$
(8)

where \(r_{i}(\mathbf{x}^{(t)})\) is the estimation error for the i-th class.
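The decision rule in (5)-(8) can be sketched in a few lines of Python, reusing the basis_pursuit helper from the Section 2 sketch; labels[j] is assumed to hold the class of the j-th column of A. In the presence of noise, a noise-tolerant \(\ell_{1}\) solver (e.g., one with a residual tolerance) would replace the equality-constrained program.

```python
import numpy as np

def src_classify(A, labels, x):
    """Sketch of the SRC rule: sparse-code x over A, then pick the class
    with the smallest representation residual r_i(x) as in (8)."""
    theta = basis_pursuit(A, x)                    # sparse code via (4)
    residuals = {}
    for c in np.unique(labels):
        delta = np.where(labels == c, theta, 0.0)  # delta_i keeps class-c entries
        residuals[c] = np.linalg.norm(x - A @ delta)
    return min(residuals, key=residuals.get)       # least estimation error
```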

Algorithm 1 summarizes our proposed face recognition scheme. In the next section, we provide experimental results to evaluate the proposed scheme.

Algorithm 1 The proposed CS-2FFT based face recognition scheme
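Chaining the two sketches above gives a hypothetical end-to-end driver for Algorithm 1; the argument names below are assumptions, not identifiers from the paper.

```python
import numpy as np

def recognize(train_imgs, train_labels, test_img):
    # Learning step: one CS-2FFT feature vector per training image forms A.
    A = np.column_stack([cs2fft_features(im) for im in train_imgs])
    # Recognition step: extract the query feature and run the SRC rule.
    x = cs2fft_features(test_img)
    return src_classify(A, np.asarray(train_labels), x)
```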

5 Experimental results and discussion

In this section, extensive comparative experiments are conducted to evaluate the performance of the proposed scheme against state-of-the-art methods such as the robust VSR in gray-scale, single model face databases [11] and the FLDA applicable to face recognition [8]. All face recognition experiments are performed over the ORL [5], LFW [15], Extended Yale B, and Alex Martinez and Robert Benavente (AR) face databases. The algorithms are tested using MATLAB version 8.1 on a Windows 8 platform and a Pentium 4 machine with 4 GB of memory. The following describes the details of the experimental setup and the results.

5.1 Face datasets

The ORL dataset consists of 400 face images of 40 people. The images of each person were taken at different times and under different poses and lighting conditions. In addition, the images of each person include variations such as with/without glasses, open/closed eyes, and with/without a smile. Figure 3 shows snapshots of the ORL dataset, which contains gray level images with 8 bits of brightness intensity depth and a resolution of 92 × 112. For the ORL database, we randomly select five images per person as training samples, and the remaining images are considered as the test set.

Fig. 3 Snapshots of the ORL dataset

The images in the LFW dataset are selected from news articles and include all of the parameter variations mentioned above. The LFW dataset in our experiments contains more than 10,000 face images collected from the web, with 1,500 subjects having two or more different face photos in the dataset. The images have been aligned and resized to a resolution of 64 × 64 with commercial face alignment software [24]; hence, our experiments on this dataset are considered fully automatic face recognition. Figure 4 shows 16 sample images of five subjects from the LFW database. It can be seen from this figure that LFW contains many natural variations in pose, expression, lighting, and alignment. Therefore, face recognition on this dataset is more challenging than on the images of the ORL database, which were collected under controlled conditions. In our experiments, only the first ten images per subject are chosen. Moreover, for this dataset, the first image of each person is selected as the class symbol and the remaining images are considered as the class members. In addition, each class has ten members, where there are 151 members in total. Among the ten images of each person, five are used for training and the rest are used for testing. The training images are chosen such that they carry maximally uncorrelated information. Therefore, 200 images are used for training, from which 200 feature vectors are extracted.

Fig. 4 Snapshots of the LFW dataset

The Extended Yale B database contains 2,414 images of 38 subjects (around 63 images per person). The cropped and normalized 192 × 168 face images have been captured under various laboratory-controlled lighting conditions [23]. Similar to the other utilized databases, for each class we use half of the images for training (i.e., about 30 images per subject), and the other half is used for testing. Randomly choosing the training set ensures that our results will not depend on any special choice of the training data. Figure 5 shows sample images of different subjects in the Extended Yale B database.

Fig. 5 Snapshots of the Extended Yale B dataset

The AR database consists of over 4,000 images of 126 subjects; for each subject, 26 pictures were taken in two separate sessions [27]. Compared to the Extended Yale B database, the images in the AR database contain more parameter variations, e.g., a wider range of illumination changes, expressions, and facial disguises. For each subject, 14 images with illumination and expression changes are selected: seven images for training, and the other seven for testing. The images are cropped to a dimension of 165 × 120 and converted to grayscale. Figure 6 shows sample images of different subjects in the AR database. This database is substantially more challenging than the Extended Yale B database, since the number of subjects is increased to 126 while the number of training images is reduced to only seven per subject.

Fig. 6 Snapshots of the AR dataset

5.2 Experimental setup

The experiments consist of two sets. The first benchmark compares the accuracy and the speed of the algorithms in performing face recognition. For this case, the sparse source vector and the underdetermined linear system are generated based on image columns. The second benchmark measures the face recognition accuracy in noisy situations, where the training images are taken from the AR, Extended Yale B, LFW, and ORL face recognition databases.

5.3 Length of the extracted feature vector

The length of the feature vectors in all classical face recognition algorithms directly affects the computational complexity and the performance of the methods. As explained in Section 3, in our proposed feature extraction process we can control the length of the feature vector by adjusting the amplitude threshold, which clips the values of the normalized feature elements. Figure 7 shows the length of our proposed feature vector as a function of the amplitude threshold for the ORL, LFW, AR, and Extended Yale B databases. As shown in Fig. 7, a proper choice of the threshold value results in a higher degree of sparseness in the feature vector, which yields a shorter CS-2FFT vector.

Fig. 7 Length of the proposed feature vector in terms of the amplitude threshold for various datasets

Moreover, in order to observe the effect of the feature length on the face recognition accuracy, we illustrate the face recognition percentage and the feature dimension in terms of the amplitude threshold in Fig. 8. As explained in Section 5.1, the AR database is more challenging than the other databases used; therefore, we utilize the AR database in this part of the simulations. The results in Figs. 7 and 8 show that higher values of the amplitude threshold yield a shorter feature vector, while the face recognition accuracy is not significantly affected. To achieve the best trade-off between the length of the feature vector and the face recognition accuracy, in our proposed face recognition algorithm we empirically set the amplitude threshold to 0.45.

Fig. 8 Performance of the proposed face recognition algorithm in terms of the amplitude threshold for the AR database and SNR = 5 dB

5.4 Experimental results and interpretations

In our simulation setup, we use the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) metrics. The FAR measures the likelihood that the biometric security system incorrectly accepts an access attempt by an unauthorized user; it is computed as the ratio of the number of false acceptances to the number of identification attempts. Conversely, the FRR measures the likelihood that the system incorrectly rejects an authorized user; it is obtained as the ratio of the number of false rejections to the number of identification attempts. For our databases, the results show FRR = 3.0% and FAR = 1.4%. In addition, the correct classification rate is 95.6%, which gives only 4.4% misclassification. This is an interesting result, since the images of each person in the AR database are captured under a wide range of conditions and are noticeably different.
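For reference, the two rates reduce to simple ratios; a minimal sketch (the function names are ours, and the counts are assumed to be tallied from an identification experiment):

```python
def false_acceptance_rate(false_accepts, attempts):
    return false_accepts / attempts   # unauthorized attempts incorrectly accepted

def false_rejection_rate(false_rejects, attempts):
    return false_rejects / attempts   # authorized attempts incorrectly rejected
```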

Tables 1 and 2 compare the accuracy and the complexity e(t) of the proposed face recognition algorithm with classical recognition schemes such as Powell-Beale CGA [29], scaled CGA [30], Powell-Beale CGA with PCA [31], CA-SRC [25], F-CS [7], PCA [42], LDA [4], and EI-CS [28]. To obtain the accuracy of the aforementioned algorithms in Table 2, we use the face recognition error rate. Note that the only true measure of the complexity of an algorithm is the mathematical count of the operations it needs to complete. However, since almost all face recognition algorithms are complex enough to make such measurements impractical, we assess the complexity of the algorithms in terms of runtime. Table 2 reports the complexity as the average runtime of a recognition process for the different face recognition algorithms on a local PC (specifications given above). It is also worth noting that implementations of some classical algorithms, such as PCA and LDA, are available online [33], while the others were implemented in-house, as the implementations of those methods were not provided by their authors. We tried our best to ensure that our in-house implementations match the original papers. It should be disclosed, however, that some detailed parameters and configurations could have been tuned slightly differently from what the original authors used to generate their plots and performance reports.

Table 1 Face recognition rate of different methods for SNR = 5 dB
Table 2 Complexity e(t) of various face recognition schemes for SNR = 5 dB

As observed from these tables, our proposed face recognition algorithm has better accuracy along with lower complexity than the other schemes for the LFW, ORL, AR, and Extended Yale B databases. These results stem from the CS-2FFT feature extraction algorithm employed in the face recognition process, whereas, e.g., in Powell-Beale CGA, CGA, and PCA, the authors use other classical feature extraction algorithms for identifying faces. The accuracy of the proposed method also shows the robustness of our algorithm against noise and different pose and lighting conditions [2]. In [2], the authors study the robustness of the feature extraction process by extracting the CS-2FFT from a simulated noisy sine wave and comparing it with the feature vector extracted from the clean wave. Since other classical face recognition algorithms process the large amount of information contained in an image database, these algorithms usually suffer from high complexity. In the case of a high resolution image database, this effect becomes even worse, while in our proposed face recognition algorithm, we utilize Compressive Sensing in both the feature extraction and classification processes to significantly decrease the computational cost.

We also examined the effect of the image resolution on the face recognition rate for the aforementioned algorithms. To this end, we down-sampled the 92 × 112 ORL images to 92/L × 112/L images by down-scaling their resolution. Figure 9 illustrates the correct face recognition rate versus the image resolution for SNR = 5 dB for the ORL and LFW datasets. In this figure, the horizontal axis represents different values of 92/L and the vertical axis represents the face recognition rate. The results in Fig. 9 show that an increase in the image width leads to an increase in the face recognition rate for all algorithms, while for the proposed algorithm the growth is smaller than that of the other schemes. This means that the proposed algorithm is robust against variations of the input image size. It is seen from Fig. 9 that our face recognition algorithm can accurately recognize a person from his/her images at different resolutions.

Fig. 9 Correct face recognition rate versus the image resolution for different algorithms with SNR = 5 dB and for (a) the ORL dataset, and (b) the LFW dataset

In the next step, we evaluate the relevance of sparsity for rejecting invalid test images on the AR database. We again simulate a 50% level of occlusion by replacing a randomly chosen block of each test image with an unrelated image. However, in this experiment, we include only half of the subjects in the training set; thus, half of the subjects in the testing set are new to the algorithm. We test the ability of our algorithm to determine whether a given test subject is in the training database by sweeping the amplitude threshold through the range [0, 1]. Figure 10 shows the Receiver Operating Characteristic (ROC) curves of this experiment. For comparison, we also consider outlier rejection by thresholding the Euclidean distance between the CS-2FFT feature vector extracted from the test image and the CS-2FFT feature vector of the nearest training image within the proposed feature space. It is seen from Fig. 10 that, at 50% occlusion, our proposed method still outperforms the other considered face recognition algorithms.

Fig. 10 ROC curves for outlier rejection (recognition rate in terms of false acceptance rate) for (a) the ORL dataset, (b) the Extended Yale B dataset, and (c) the AR dataset
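Such ROC curves can be generated by the following hedged sketch: sweep the rejection threshold over [0, 1] and record, at each setting, the acceptance rate on genuine queries against the acceptance rate on impostor queries. The score arrays are assumed to hold normalized acceptance scores (higher means accept), e.g., one minus a normalized residual or distance.

```python
import numpy as np

def roc_points(genuine_scores, impostor_scores, n_steps=101):
    """Return (false acceptance rate, recognition rate) pairs over a
    threshold sweep on normalized acceptance scores in [0, 1]."""
    points = []
    for t in np.linspace(0.0, 1.0, n_steps):
        recognition_rate = float(np.mean(np.asarray(genuine_scores) >= t))
        false_accept_rate = float(np.mean(np.asarray(impostor_scores) >= t))
        points.append((false_accept_rate, recognition_rate))
    return points
```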

To complete our numerical evaluations, we examine the effect of the noise power on the performance of the face recognition algorithms, as shown in Tables 3 and 4. For this simulation setup, we add Additive White Gaussian Noise (AWGN) to the face images and apply the face recognition algorithms. The results in Tables 3 and 4 show that the proposed algorithm provides superior accuracy and lower complexity than the other algorithms in these tables for different Signal to Noise Ratio (SNR) values (note that our scheme is robust against noise variations).

Table 3 Face recognition percentage in different SNRs and for ORL dataset
Table 4 Face recognition percentage in different SNRs and for LFW dataset
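The noise model used in these experiments can be sketched as follows: zero-mean AWGN is scaled so that the ratio of the mean image power to the noise power matches the prescribed SNR in dB (the function name and defaults below are assumptions).

```python
import numpy as np

def add_awgn(img, snr_db, rng=None):
    """Add zero-mean white Gaussian noise to an image at the given SNR (dB)."""
    rng = rng if rng is not None else np.random.default_rng()
    x = img.astype(float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.standard_normal(x.shape) * np.sqrt(noise_power)
```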

Remark 1

Based on the experimental results on the databases used, it is concluded that: i) for all the aforementioned datasets, the best performance of the proposed CS-2FFT based face recognition algorithm consistently exceeds the best performances of Powell-Beale CGA, PCA, and the other classical algorithms at each individual feature dimension; ii) the performances of the other considered face recognition algorithms depend strongly on the noise, while, as illustrated in Table 3, the performance of our proposed face recognition algorithm remains stable across different SNR regimes.

Remark 2

It is worth mentioning that the basic SRC method in [25] does not incorporate any dimensionality reduction, while in our scheme we use dimensionality reduction in the classification, in accordance with the sparse feature extraction scheme. Since our proposed Compressive Sensing based feature extraction process does not contain redundancies, it is suitable for fast and reliable classification.

Remark 3

It is worth noting that deep learning based approaches (specifically, Convolutional Neural Networks (CNNs)) have been reported to achieve high recognition accuracy [18, 34,35,36,37, 50]. Evaluating the performance of such deep learning based methods (over the different SNR levels and other conditions tested above) is left outside the scope of this paper and will be explored as part of future work.

Remark 4

Please also note that the experimental results for the LFW database shown above are based on training and testing on only a subset of the database (for the sake of faster processing). It is therefore expected to see a lower accuracy compared to the case where the whole database is used (as reported in [18], some deep learning based methods achieve over 96% accuracy when the entire LFW database is used).

Future work

Future work includes the incorporation of deep learning based algorithms in the experimental comparisons, to evaluate the performance of these methods in the presence of different amounts of noise (SNR levels), dataset sizes, numbers of face images per person, lighting conditions, etc. We will also utilize the entire LFW database in the performance evaluations. This will be particularly important when evaluating the deep learning based methods, as they may require large amounts of labeled face images.

6 Conclusion

In this paper, a novel compressive sensing-based method for face recognition has been proposed. We used CS-2FFT feature extraction and SRC classifiers to make our proposed face recognition scheme less computationally complex and more robust than some existing face recognition algorithms. To this end, we first extracted CS-2FFT feature vectors from the images and then fed these feature vectors to the SRC classifier to classify the face images. Experimental results showed that the CS-2FFT feature extraction and the SRC classification are effective and robust in the face recognition task. In particular, our proposed face recognition scheme achieves 96% accuracy on the ORL dataset, 95.9% accuracy on the Extended Yale B dataset, 95.6% accuracy on the AR dataset, and 42% accuracy on the LFW dataset, with lower computational complexity compared to other state-of-the-art face recognition approaches.