1 Introduction

The face is among the most significant biological traits distinguishing one person from another. When we authorize someone manually, we simply recognize the face; when an intelligent system or machine mimics this human behavior, it is known as face recognition. Face recognition is the science of identifying a human face from a set of images or a database. It is a significant area of research with wide applications across domains. Major applications include the authentication of users for various applications and devices, such as unlocking laptops, mobile phones, and other software-based systems. Another application is in forensics and criminal investigations, where faces need to be recognized; more recent applications include intruder detection at institutes, laboratories, and organizations. Face recognition is based on a biological trait of a human. Other identification techniques such as fingerprints, signatures, IDs, and PINs are widely used for personal identification and authorization, but face recognition offers significant advantages. Fingerprint or signature recognition requires the person to provide an impression or a signature, respectively, whereas face recognition demands no such effort; the person under examination may not even realize that he or she is under observation. Moreover, the person need not remember anything, as is required with IDs and PINs. Face recognition is therefore less time-consuming and works exactly the way people recognize and authorize one another in society.

Face recognition is essential wherever personal identification is required; consequently, research in this domain has attracted significant contributions over the last decade. Face recognition is usually performed by searching for and matching a face in a given database, whose size may be small or large depending on the application. It is generally required for the automated identification of authorized users, unauthorized users, suspects, criminals, and others. Its main applications include automated surveillance; identification of missing persons, criminals, and terrorists; and searching a database for a particular face or its sketch. Face recognition has proved very useful for identifying missing persons, criminals, and terrorists, and the task becomes considerably harder when a face must be recognized in a crowd. Recent advances in this domain target face detection in videos obtained from CCTV cameras. Although face recognition algorithms have been quite successful, many open challenges remain: aging, pose and facial expressions, environmental factors such as illumination and background, and the similarity among siblings and blood relations [1]. Low image resolution is another challenge [33]. The most significant parameter of a good face recognition algorithm is its accuracy. Advanced, state-of-the-art algorithms are in practice nowadays; these study the facial features along with the factors affecting them, so achieving high recognition accuracy requires accounting for pose, lighting, and other such factors.

2 Related work

Face recognition may be performed on the basis of eigenfaces obtained through principal component analysis [28]; Laplacian-faces, which preserve local information [11]; Fisher-faces [2]; multi-subregion-based correlation filtered faces [36]; nearest-neighbor or subspace methods; and many more. A significant number of researchers have since utilized key feature-based techniques. Initially, Du et al. [6] proposed a SURF-based feature extractor that is scale and in-plane rotation invariant; the standard SURF descriptor has 64 dimensions, and detailed results were reported for SURF-64, SURF-128, SIFT-128, SURFdbl-128, and SIFTdbl-128. Liong et al. [19] presented an unsupervised feature learning technique to learn feature representations from raw pixels automatically; different feature dictionaries were used to represent features, and feature projection matrices were stored for face regions. Huang et al. [12] utilized the high-frequency sub-bands of the wavelet transform to extract facial features, with extensive experiments conducted on the FERET, ORL, and AR face databases. Vinay et al. [30] proposed an Oriented FAST and Rotated BRIEF (ORB)-based face recognition technique, claimed to be robust and more accurate than the state of the art. Carro et al. [3] analyzed Speeded Up Robust Features (SURF) as the most computationally efficient option; the robust estimation method RANSAC was used, and the proposed approach performed well even with partial occlusions. Klemm et al. [15] proposed a key point-based technique for face detection and compared its performance with techniques based on global features (eigenfaces) as well as local features (LBP, Gabor filters). Hassner et al. [9] proposed a novel template-based face recognition approach that aims at increased accuracy with lower computational and storage costs. Werghi et al. [35] presented an integrated approach using shape and texture local binary patterns (LBPs) for 3D face recognition: first, the LBP was calculated from the face mesh, and then a grid of descriptors was constructed. Naik and Panda [22] proposed an adaptive cuckoo search algorithm that improves the objective function values at a faster rate; the algorithm was validated on twenty-three standard benchmark test functions, and three standard face databases, i.e., Yale2B, ORL, and FERET, were used to demonstrate its recognition rate. Wen et al. [34] proposed a new supervision signal for convolutional neural networks (CNNs), called center loss, for face recognition tasks; center loss simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers. The Labeled Faces in the Wild (LFW), YouTube Faces (YTF), and MegaFace Challenge datasets were used for experimentation.

Guntupalli and Gobbini [8] emphasized the contribution of learning and of interactions with other brain areas to the recognition of familiar faces. Lu et al. [21] proposed feature and dictionary learning techniques for face recognition across different poses, illuminations, expressions, and resolutions. Karczmarek et al. [13] proposed an extension of the chain code-based local descriptor (CCBLD), suggesting a bag-of-visual-words model built from chain codes. Chhabra et al. [5] proposed a content-based image retrieval (CBIR)-based system to extract features from faces; Oriented FAST and Rotated BRIEF (ORB) and the scale-invariant feature transform (SIFT) were used, and the extracted features were classified using a decision tree, a random forest, and an MLP. Li et al. [18] explained the importance of integrating multiple types of features for accurate and robust face recognition and proposed a new approach, named color two-dimensional principal component analysis, based on a convolutional neural network.

Wang [31] studied the impact of age and gender on face identification, using a deep learning method to classify the facial features; an average recognition rate of 83.73% was obtained. Ranjan et al. [25] studied various face recognition covariates such as pose, illumination, expression, and image resolution, and examined the effect of uncontrollable covariates such as race, age, and gender on face recognition.

3 Phases of a face recognition system

There are three basic steps in face recognition: face detection, feature extraction, and face matching, as illustrated in Fig. 1. These steps simulate human behavior while recognizing a face; whenever we, as humans, recognize a face, we follow these steps unconsciously. Face detection is the primary step. In this step, the system must identify the presence of any face(s) in the given image or scene; it aims at extracting a face image from the input through preprocessing. Face detection requires preprocessing of the image, i.e., face edge detection, segmentation, and localization. Edge detection highlights the boundaries and other features of the face in the image; segmentation and localization then locate and separate the highlighted face region from the rest of the image. After the region of interest, i.e., the face patch, has been obtained, the face region is represented through feature extraction. The objective of feature extraction is to transform the face patch into a fixed-dimensional vector, or into a set of feature points with their corresponding locations, thereby capturing the intrinsic attributes of the face image. Extracted features may be visual features, statistical pixel features, transform-coefficient features, or algebraic features. The obtained face-intrinsic features are then matched against an existing database of face-intrinsic features to perform face matching/classification. Feature extraction is the most important phase of the proposed face recognition system, as it supplies the significant features on which identification rests. In the present paper, two feature extraction techniques, namely SURF and SIFT, are considered for face recognition; these are briefly discussed in the following subsections. A minimal sketch of the three-stage pipeline is given after Fig. 1.

Fig. 1 Block diagram of the human face recognition system
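To make the three stages concrete, the following is a minimal sketch assuming OpenCV (opencv-python): a bundled Haar cascade stands in for face detection, SIFT for feature extraction, and a brute-force matcher with Lowe's ratio test for matching. The file names, thresholds, and matching strategy are illustrative choices, not the exact configuration used in this paper.

```python
import cv2

def detect_face(image):
    """Stage 1: locate the largest face region in a grayscale image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    return image[y:y + h, x:x + w]

def extract_features(face_patch):
    """Stage 2: represent the face patch as SIFT key points + descriptors."""
    sift = cv2.SIFT_create()
    return sift.detectAndCompute(face_patch, None)

def match_faces(desc_query, desc_gallery, ratio=0.75):
    """Stage 3: count Lowe-ratio-test matches between two descriptor sets."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_query, desc_gallery, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)

# Illustrative usage: a higher match count suggests the same identity.
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
gallery = cv2.imread("gallery.jpg", cv2.IMREAD_GRAYSCALE)
fq, fg = detect_face(query), detect_face(gallery)
if fq is not None and fg is not None:
    _, dq = extract_features(fq)
    _, dg = extract_features(fg)
    print("good matches:", match_faces(dq, dg))
```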

3.1 Speeded up robust features (SURF)

SURF is a local feature extraction method. It uses a fast, locally invariant key point detector to extract image feature key points and a distinctive descriptor [4] to describe them. Compared with the SIFT feature extraction method, it is a fast and robust computational method. The main steps of the SURF algorithm are as follows (a minimal extraction sketch follows the list):

  • Feature key points are first detected in the image, using the determinant of the Hessian computed efficiently over integral images.

  • Then, an orientation is assigned to each key point, computed from responses within a circular neighborhood around the interest point.

  • Later, a square region is constructed around each key point and aligned with the selected orientation.

  • Lastly, feature descriptors are extracted from this region using Haar wavelet responses. Typically, a 64-dimensional vector is extracted as the descriptor vector.
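The following is a minimal SURF extraction sketch. It assumes opencv-contrib-python built with the non-free modules enabled, since SURF is patent-encumbered and lives in cv2.xfeatures2d; the Hessian threshold value is an illustrative choice.

```python
import cv2

# Requires opencv-contrib-python compiled with OPENCV_ENABLE_NONFREE;
# otherwise SURF_create raises an error.
img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
surf.setExtended(False)  # False -> 64-D descriptors, True -> 128-D

keypoints, descriptors = surf.detectAndCompute(img, None)
print(len(keypoints), "key points,", descriptors.shape[1], "dims each")  # 64
```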

3.2 Scale-invariant feature transform (SIFT)

The SIFT detector extracts a number of meaningful descriptive image features. These features are invariant to scaling, rotation, and illumination. Such points are generally present in high-contrast areas, possibly on object edges. One significant quality of these features is that the relative positions between them should not change from one image to another.

The main stages in SIFT feature extraction are as follows (a brief parameter-level sketch follows the list):

  • The first stage is scale-space extrema detection: interest points that are invariant to scale and rotation are searched for using a difference-of-Gaussians (DoG) function.

  • Then, key point localization and filtering are performed: the location and scale of each candidate interest point are determined, and the key points that are robust to image distortion are retained.

  • The next stage is orientation assignment, in which one or more orientations are assigned to each key point location based on local image-gradient directions.

  • Next, the feature description is performed: the local image gradients are measured at the selected scale in the neighborhood of each key point, and a 128-dimensional feature descriptor is obtained.
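A minimal sketch using OpenCV's SIFT implementation is given below; the constructor parameters shown map onto the stages above (octave layers for the DoG scale space, contrast and edge thresholds for key point filtering), and the values are the library defaults rather than settings tuned for this paper.

```python
import cv2

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

# nOctaveLayers controls the DoG scale space; contrastThreshold and
# edgeThreshold filter out weak and edge-like candidate key points.
sift = cv2.SIFT_create(nOctaveLayers=3,
                       contrastThreshold=0.04,
                       edgeThreshold=10)

keypoints, descriptors = sift.detectAndCompute(img, None)
for kp in keypoints[:3]:
    # Each key point carries location, scale (size), and orientation (angle).
    print(kp.pt, kp.size, kp.angle)
print("descriptor shape:", descriptors.shape)  # (num_keypoints, 128)
```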

3.3 Locality preserving projections (LPP)

He and Niyogi [10] proposed a feature reduction algorithm called locality preserving projections (LPP). In LPP, the neighborhood information of the data is represented as a graph, and a transformation matrix is then computed from the Laplacian of that graph so that the local neighborhood information is preserved. Since LPP is designed to preserve local structure, a nearest-neighbor search in the low-dimensional space gives almost the same results as one in the high-dimensional space. A major advantage of LPP is that it can be applied to any new data point to locate it in the reduced representation space.
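A minimal NumPy/SciPy sketch of the LPP computation follows, assuming a k-nearest-neighbor graph with heat-kernel weights as in He and Niyogi's formulation; the neighborhood size, kernel width, and regularization term are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=32, n_neighbors=5, t=1.0, reg=1e-4):
    """Locality preserving projections on X of shape (n_samples, n_features).

    Returns a projection matrix A of shape (n_features, n_components);
    new points are embedded with X_new @ A.
    """
    # 1. Adjacency graph: connect each point to its k nearest neighbors,
    #    weighted with the heat kernel exp(-||xi - xj||^2 / t).
    W = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    W = np.exp(-W ** 2 / t) * (W > 0)
    W = np.maximum(W, W.T)  # symmetrize

    # 2. Graph Laplacian L = D - W, with D the diagonal degree matrix.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # 3. Generalized eigenproblem  X^T L X a = lambda X^T D X a;
    #    eigenvectors of the smallest eigenvalues preserve locality.
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X + reg * np.eye(X.shape[1])  # regularized for stability
    eigvals, eigvecs = eigh(A_mat, B_mat)           # ascending eigenvalues
    return eigvecs[:, :n_components]

# Illustrative usage on random data standing in for 128-D face features.
X = np.random.rand(200, 128)
A = lpp(X, n_components=32)
X_reduced = X @ A  # shape (200, 32)
```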

4 Proposed system for face recognition

The proposed face recognition system consists of several phases, namely preprocessing, feature extraction, classifier prediction, and, lastly, face recognition. First, the digital image undergoes preprocessing, the preliminary phase of the system. Feature extraction then derives features from the preprocessed image, yielding quantitative information of interest; this phase produces a set of features that can distinctively characterize the shapes present in the image. The extracted features are global and local in nature and aim to describe the objects present in the image; global features include geometrical shape, texture, size, and color. Preliminary features are extracted from the input image to recognize a face using geometrical shape features. The SIFT and SURF methods are used to extract the image feature descriptors. The K-means clustering algorithm is then used to generate 'K' clusters from the descriptor array, and, lastly, LPP is used to reduce the dimensionality of the feature vector. The complete procedure is listed in the following subsection, followed by an illustrative end-to-end sketch.

4.1 Proposed algorithm

Input: Query image

Step 1: Extract feature descriptors from the training images using SURF and SIFT, as discussed above.

Step 2: Use the K-means clustering algorithm to generate 128 clusters; compute the mean of every cluster to obtain a 128-dimensional feature vector for each descriptor.

Step 3: Use the LPP dimensionality reduction algorithm to reduce the 128-component feature vectors to 32 and 64 components.

Step 4: Integrate both feature vectors, i.e., SURF and SIFT, and store the combined feature vector in the database.

Step 5: Train the proposed system using the combined SURF and SIFT feature vectors to obtain the classifier model.

Step 6: Test the trained classifier model by inserting the query image and extracting its SURF and SIFT features.

Step 7: Predict the similarity between the query image features and the trained dataset using the classifier model.
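Steps 1-7 admit more than one reading (in particular, how the 128 cluster means yield a per-image vector); the sketch below assumes a bag-of-visual-words interpretation, in which each image is encoded as a 128-bin histogram of descriptor-to-cluster assignments, reduced with LPP, and classified with a random forest. For brevity only the SIFT branch is shown; the SURF branch would be encoded the same way and concatenated at Step 4. The loader load_training_faces and all parameter values are hypothetical, and lpp refers to the helper sketched in Sect. 3.3.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

sift = cv2.SIFT_create()

def image_histogram(img, kmeans, k=128):
    """Steps 1-2: SIFT descriptors -> normalized 128-bin visual-word histogram."""
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros(k)
    words = kmeans.predict(desc.astype(np.float32))
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / max(hist.sum(), 1)

# Steps 1-2: build the 128-word vocabulary from all training descriptors.
train_imgs, labels = load_training_faces()  # hypothetical dataset loader
desc_list = [sift.detectAndCompute(im, None)[1] for im in train_imgs]
all_desc = np.vstack([d for d in desc_list if d is not None])
kmeans = KMeans(n_clusters=128, n_init=10).fit(all_desc)

X = np.array([image_histogram(im, kmeans) for im in train_imgs])

# Step 3: LPP reduction to 32 components (lpp as sketched in Sect. 3.3).
A = lpp(X, n_components=32)
X_reduced = X @ A

# Steps 4-5: train the classifier on the reduced (combined) features.
clf = RandomForestClassifier(n_estimators=100).fit(X_reduced, labels)

# Steps 6-7: encode and classify a query image.
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
pred = clf.predict((image_histogram(query, kmeans) @ A).reshape(1, -1))
```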

5 Experimental results and discussion

For the proposed system, we conducted extensive experiments on five popular face image datasets, i.e., Yale2B, Face 94, M2VTS, ORL, and FERET. Brief details of these are provided in Table 1, and a few face examples from the datasets are depicted in Fig. 2. In our experiments, SURF-32, SURF-64, SIFT-32, SIFT-64, and SURF-SIFT-128 features are used, where the numeric suffixes specify the dimensions of the feature vectors. The performance of the classifier is determined based on accuracy, true-positive rate (TPR), false-positive rate (FPR), and area under the curve (AUC).
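For reference, the four metrics can be computed as in the following sketch, assuming scikit-learn, macro-averaged one-vs-rest TPR/FPR for the multi-class setting, and predicted class probabilities (e.g., from predict_proba) for the AUC; the function and variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def report(y_true, y_pred, y_score):
    """Accuracy, macro-averaged TPR/FPR, and one-vs-rest AUC.

    y_score: per-class probabilities of shape (n_samples, n_classes),
    e.g., the output of clf.predict_proba(X_test).
    """
    acc = accuracy_score(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    tpr = np.mean(tp / (tp + fn))  # per-class recall, macro-averaged
    fpr = np.mean(fp / (fp + tn))
    auc = roc_auc_score(y_true, y_score, multi_class="ovr")
    return acc, tpr, fpr, auc
```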

Table 1 Details of the datasets used for experimental work
Fig. 2 A few face examples from a ORL, b Face 94, c Yale2B, d FERET, and e M2VTS

Table 2 depicts the results obtained with the SURF and SIFT features for face recognition. Recognition accuracy is presented for the individual SURF and SIFT features and for their integration. All features are classified using the decision tree and random forest on the different face recognition datasets. The accuracy of the random forest is much better than that of the decision tree for the Yale2B, FERET, and Face94 datasets, and vice versa for M2VTS and ORL. Encouraging results were obtained for the Yale2B, ORL, and M2VTS datasets as compared to the Face 94 and FERET datasets; occlusion in the FERET images is the main reason for the lower performance of the SIFT and SURF features there. SIFT features performed better than SURF features when used independently, but the best results were obtained for their integration. The best recognition accuracies of 99.70% and 98.67% were obtained for the Yale2B and M2VTS datasets, respectively, using SIFT (64) + SURF (32) features. Table 3 details the TPR for all datasets using all combinations of features with both classifiers; the results clearly sustain the claim of high TPR. The highest TPR is obtained with the random forest classifier for the Yale2B dataset and with the decision tree classifier for M2VTS, in both cases with the integrated feature set, as expected. Table 4 illustrates the FPR for all datasets. The low FPR obtained for the integrated features confirms the discriminative competence of the features and supports the claim of an efficient classifier. The AUC values are tabulated in Table 5; AUC values above 99% further indicate that the proposed feature set is highly efficient for face recognition in images. Table 6 lists the evaluation time for each set of experiments. Most of the proposed feature-based classifiers took only a fraction of a minute to perform recognition, which demonstrates that the proposed model is computationally efficient.

Table 2 Recognition accuracy using decision tree and random forest classifier
Table 3 True-positive rate using decision tree and random forest classifier
Table 4 False-positive rate using decision tree and random forest classifier
Table 5 Area under curve using decision tree and random forest classifier
Table 6 Evaluation time taken by decision tree and random forest classifier

The proposed algorithm is compared with various state-of-the-art techniques in Table 7. The proposed classifier achieves a higher recognition rate than parallel techniques on the ORL, Yale2B, and M2VTS datasets; performance deteriorates only on the FERET dataset, due to occlusion in its images.

  • Experimentation on the ORL dataset In the ORL database, the images of each subject were taken at different times, under slightly varying lighting conditions, and with various facial expressions; some subjects are captured with glasses. The whole dataset was used for the experiments. We compared the achieved accuracy with other state-of-the-art work. SIFT features alone performed efficiently in comparison with the two-phase test sample sparse representation and the 1-level DWT/two-dimensional locality preserving projection approaches; the SIFT-based model even outperformed a deep CNN model on the ORL dataset. The results indicate that SIFT features can withstand changes in lighting, facial expression, and glasses.

  • Experimentation on the Face 94 dataset This dataset contains male and female images acquired against a plain green background, with minor variations in head turn, tilt, slant, and face position. Various facial expressions are included, but no lighting variation is introduced. Accuracy comparable to Vinay et al. [29, 30] is obtained, noting that they experimented with only a subset of the dataset.

  • Experimentation on the M2VTS dataset The advantage of the M2VTS dataset is that the appearance of subjects varies widely across sessions: there are changes in tan, expression, hairstyle, etc., for the same person, and subjects change their glasses or do not wear them in some sessions. Remarkable accuracy is obtained with the SIFT (64) + SURF (32) features, which confirms the usability of these features against changes in expression, hairstyle, and glasses.

  • Experimentation on the FERET dataset The FERET face database is one of the benchmark datasets for evaluating face recognition algorithms. The subjects are diverse across ethnicity, gender, and age, and the images have varying facial expression and illumination, having been taken in multiple sessions over years, with some occlusion. We observed that comparable accuracy is achieved using SIFT (64) + SURF (64) features of low dimensionality, as compared to Deep PCA (650 features) and a multi-task deep CNN. Only Ke et al. [14] claimed higher accuracy, but they used fewer samples for experimentation.

  • Experimentation on the Yale2B dataset Similar to the other datasets, the images in the Yale2B dataset also have different facial expressions and lighting conditions. Our proposed model using SIFT (64) + SURF (32) features obtained an accuracy of 99.70%, the best reported so far. Comparison with techniques based on 2D-EulerPCA and CSDL-SRC-power norm showed that a significant improvement has been achieved with the proposed features.

Table 7 Comparison with existing techniques

6 Conclusion

In this research paper, the authors have proposed an effective and accurate face recognition algorithm based on the integration of SURF and SIFT feature extraction. The experimental results on the Face 94, Yale2B, ORL, FERET, and M2VTS datasets show that the proposed approach is efficient and robust. The performance of the classifier is determined based on accuracy, true-positive rate, false-positive rate, and area under the curve. The key findings of the proposed research are that the face recognition is highly accurate with a low false-positive rate and that the algorithm is computationally efficient. Two classifiers, i.e., decision tree and random forest, are used in the experiments to validate the performance of the proposed system. Comparable or better results are obtained on the various face datasets with the proposed feature-based model.