
1 Introduction

Facial recognition (FR) has many practical applications due to advantages such as uniqueness, immutability, social acceptance, ease of use and low cost [1]. It is a nonintrusive method for identifying or verifying individuals. FR algorithms involve training classifiers using facial features. Unfortunately, many redundant and irrelevant features negatively affect the performance of FR approaches. Approaches such as local binary patterns (LBP) can be used to extract local spatial patterns as opposed to global features [2, 3]. LBP is a feature descriptor for facial expression representation whose main advantages are its tolerance to illumination changes and its computational simplicity [4]. For further performance improvements, feature selection methods can be employed to reduce feature space dimensionality. Feature selection attempts to solve the problem of redundant, irrelevant, and inaccurate features, and can be performed with the aid of various optimization algorithms, including ant colony optimization [5], particle swarm optimization [6], bacteria foraging optimization [7] and the firefly algorithm [8].

FR can be implemented using classifiers such as artificial neural networks (ANNs), support vector machines (SVM) and the k-nearest neighbor (KNN) algorithm, all of which are machine learning approaches commonly used for pattern recognition. Many researchers have shown that KNN and SVM outperform other classifiers for FR purposes [9,10,11]. KNN is a simple, reliable, and computationally efficient algorithm for FR [12], whereas SVM is a machine learning approach widely used in image processing applications. KNN has a high recognition rate and can quickly identify items from a large dataset [10]. For facial recognition, KNN leverages distance metrics to identify the closest person in the dataset [11, 13]. SVM is an effective discriminative classifier that was applied to FR by Phillips [14]. The input to SVM is a set \((X, Y)\) of labeled training data, where \(X\) is the data and \(Y \in \{-1, 1\}\) is the label. The output of an SVM algorithm is a set of \(N\) support vectors. The main advantage of SVM is stability, whereby small changes in the data do not greatly affect the hyperplane, leading to a stable model [15, 16]. SVM can be used to develop classification or regression models. For FR, SVM attempts to generate a decision surface that separates dissimilarities between images of the same individual from dissimilarities between images of different individuals [11].
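To make the classification setup concrete, the following minimal sketch (not taken from the paper) trains both classifiers on placeholder feature vectors using scikit-learn; the dataset, dimensionality and hyperparameters are illustrative assumptions only.

```python
# Hypothetical data standing in for facial feature vectors; the paper's
# actual features come from ULBP (Sect. 3.2).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 59))        # 200 samples, 59-dimensional features
y = rng.integers(0, 5, size=200)      # 5 hypothetical identities

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# KNN assigns the identity of the nearest training sample (distance-based)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
# SVM learns a separating hyperplane between identities
svm = SVC(kernel="linear").fit(X_tr, y_tr)

print("KNN accuracy:", knn.score(X_te, y_te))
print("SVM accuracy:", svm.score(X_te, y_te))
```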

Various FR algorithms have been proposed in recent literature, all with the goal of maximizing recognition rate through a variety of techniques. Gao and Lee's approach is based on the scale-invariant feature transform (SIFT), a method for extracting local features [17]. Their experimental results showed an average recognition rate of 95% when tested on the FERET dataset. Agarwal and Bhanot's approach involved identifying the centers of the hidden layer neurons of a radial basis function neural network for facial recognition [8]. They used the firefly optimization algorithm as the feature selection method. Experimental results showed decent recognition rates for various databases such as ORL (97.75%), Yale (99.83%), AR (93.15%) and LFW (60.50%). Zhu and Xue presented a novel random subspace method for FR [18]. The tensor subspace approach was used for feature selection to achieve a recognition rate of 98.32%. Lu et al. used a sparse representation method based on rank decomposition to achieve a robust recognition rate of 96% [19].

Other researchers developed FR methods to address the issues of high-dimensional features and the multitude of variations present in face images. One method uses the grasshopper optimization algorithm (GOA) to extract relevant features from high-dimensional feature vectors [20]. Experiments on the ORL dataset led to an accuracy of 91.5%. Sasirekha and Thangavel proposed a novel FR algorithm based on KNN with particle swarm optimization (PSO) [21]. LBP and PSO were used to extract and select features respectively, leading to a best-case accuracy of 97.41%. Maheshwari et al. developed an FR approach based on the local directional pattern, a feature extraction method [14]. Genetic and differential evolution algorithms were then used as feature selection methods to eliminate irrelevant features, and SVM was used to classify the identity of facial images. Their experimental analysis showed that differential evolution outperforms the genetic algorithm. Gupta and Goel developed an FR approach that extracts features using a Gabor filter [22]. Principal component analysis (PCA) was then used for feature selection and dimension reduction. A modified version of the artificial bee colony (ABC) algorithm is then used on the feature vectors to search for the best match for a test image in a given database, achieving an accuracy of 97%. Abd et al. proposed an FR approach also based on the Gabor filter for feature extraction, followed by feature selection using the grey wolf optimization (GWO) algorithm. By training a KNN classifier, a recognition rate of 97% was achieved on the Yale dataset [23]. The FR approach by Kumar, based on PCA and the bat optimization algorithm, achieved a recognition accuracy of 96% when tested on the Yale database [12].

More recently, Aro et al. proposed an FR algorithm based on enhanced Gabor filters and the ant colony optimization algorithm [24]. The proposed method aimed to solve the high dimensionality problem of Gabor filters, which leads to low performance and high time complexity. The ant colony optimization algorithm was used to remove noisy, redundant and irrelevant Gabor features. They achieved accuracies of 97.14% and 95.71% using the Mahalanobis and Chebyshev classifiers, respectively. Benamara et al. proposed a multispectral face recognition method using random feature selection and PSO-SVM [25]. The proposed method addressed the problem of intra-class variation, which negatively affects the performance of FR systems, by using both infrared and visible spectra. A new feature selection algorithm was introduced that reduces the feature space dimensionality to be suitable for real-time applications.

Eleyan proposed the PSO metaheuristic as a feature selection method for face recognition systems to reduce the dimensionality of extracted feature vectors [26]. Experimental analysis was performed using two well-known face databases. The PSO approach showed high performance in terms of accuracy, specificity and sensitivity compared to other techniques such as principal component analysis (PCA). Malhotra and Kumar proposed an optimized facial recognition approach that combines DCT and PCA to extract features, leading to a high recognition accuracy of 96.5% [27]. Cuckoo search was used in the feature selection stage to remove irrelevant features. Král et al. proposed another face recognition system based on an improved local binary patterns (LBP) approach [28]. The enhanced LBP considers more pixels and different neighborhoods while computing the features. The approach was evaluated on the UFI and FERET face datasets, showing improved performance compared to other state-of-the-art approaches. Table 1 summarizes the related work.

Table 1 Summary of related work

In this paper, we investigate the use of two relatively new optimization algorithms in facial recognition. We selected these algorithms after studying 13 different optimization algorithms from the perspectives of accuracy and time complexity when used for feature selection. Based on our experiments, the binary dragonfly algorithm (BDA) and grasshopper optimization algorithm (GOA) outperformed their peers in both aspects. Both optimization algorithms are used for feature selection prior to training KNN and SVM classifiers. We denote the four FR approaches as BDA-KNN, BDA-SVM, GOA-KNN and GOA-SVM. The proposed FR algorithms achieve desirable performance in terms of both recognition rate and time complexity, outperforming other recently proposed FR algorithms.

The remainder of this paper is organized as follows: Sect. 2 investigates 13 optimization algorithms for feature selection, Sect. 3 describes the four proposed FR approaches, and Sect. 4 provides an experimental analysis of those methods. Finally, the paper concludes with some final remarks in Sect. 5.

2 Optimization Algorithms

2.1 Binary Dragonfly Algorithm

The dragonfly algorithm (DA) is a relatively new swarm intelligence optimization algorithm proposed in 2016 [29]. There are several versions of DA, such as BDA, the multi-objective dragonfly algorithm and the single-objective dragonfly algorithm. The relevant parameters for BDA are listed below, where \(N\) is the number of neighboring individuals, \(X_{i}, X_{j}, X^{+}, X^{-}\) denote the positions of the current individual, the \(j\)th individual, the food source and the enemy respectively, \(V_{j}\) denotes the step (velocity) vector of the \(j\)th individual, and \(t\) denotes the iteration number:

$$\text{Separation: } S_{i} = - \sum_{j = 1}^{N} \left( X_{i} - X_{j} \right),$$
(1)
$$\text{Alignment: } A_{i} = \frac{\sum_{j = 1}^{N} V_{j}}{N},$$
(2)
$$\text{Cohesion: } C_{i} = \frac{\sum_{j = 1}^{N} X_{j}}{N} - X_{i},$$
(3)
$$\text{Attraction: } F_{i} = X^{+} - X_{i},$$
(4)
$$\text{Distraction: } E_{i} = X^{-} + X_{i},$$
(5)

To update the position of dragonflies in a search space and formulate their movements, two vectors are considered: the step vector \(\Delta X\) and the position vector \(X\). The step vector denotes the direction of dragonfly movement and is calculated as

$$\Delta X_{t + 1} = \left( {sS_{i} + aA_{i} + cC_{i} + fF_{i} + eE_{i} } \right) + w\Delta X_{t} .$$
(6)

After calculating the step vector, the position vectors are calculated as

$$X_{t + 1} = X_{t} + \Delta X_{t + 1}.$$
(7)

Then, to enhance the randomness of the dragonflies when no neighboring solutions exist, the position is updated using a Lévy flight,

$$X_{t + 1} = X_{t} + \left( {0.01X_{t} \times \frac{{r_{1} \times \alpha }}{{\left| {r_{2} } \right|^{{\frac{1}{\beta }}} }}} \right),$$
(8)

where \(r_{1} ,\) \(r_{2}\) denote two random numbers in [0,1], \(\beta = 1.5\) and α is calculated as

$$\alpha = \left( \frac{\Phi(1 + \beta) \times \sin\left( \frac{\pi \beta}{2} \right)}{\Phi\left( \frac{1 + \beta}{2} \right) \times \beta \times 2^{\left( \frac{\beta - 1}{2} \right)}} \right)^{\frac{1}{\beta}},$$
(9)

where \(\Phi \left( x \right) = \left( {x - 1} \right)!\). Finally, the transfer function is used to calculate the probability of the dragonflies changing positions,

$$T\left( {\Delta x} \right) = \left| {\frac{\Delta x}{{\sqrt {\Delta x^{2} + 1 } }}} \right| .$$
(10)

To update the position of search agents in binary search spaces,

$$X_{t + 1} = \begin{cases} \neg X_{t}, & r < T(\Delta X_{t + 1}) \\ X_{t}, & r \ge T(\Delta X_{t + 1}) \end{cases},$$
(11)

where \(r\) denotes a random number in the interval [0,1]. The BDA algorithm considers all the dragonflies as one swarm and simulates exploration/exploitation by adaptively tuning the swarming factors (\(s\), \(a\), \(c\), \(f\) and \(e\)) as well as the inertia weight (\(w\)). The pseudocode of BDA is shown in Algorithm 1.

Algorithm 1 Pseudocode of BDA
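As a complement to Algorithm 1, the sketch below implements the BDA update rules of Eqs. (1)–(11) in Python for a generic fitness function to be maximized. The parameter schedules for \(s\), \(a\), \(c\), \(f\), \(e\) and \(w\), the step-vector clamp, and the population size are assumptions drawn from common DA implementations, not the authors' exact settings.

```python
import numpy as np

def transfer(dx):
    # Eq. (10): V-shaped transfer function
    return np.abs(dx / np.sqrt(dx ** 2 + 1.0))

def bda(fitness, dim, n_agents=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_agents, dim)).astype(float)  # positions
    dX = np.zeros((n_agents, dim))                              # step vectors
    fit = np.array([fitness(x) for x in X])
    for t in range(iters):
        ratio = t / iters
        w = 0.9 - 0.5 * ratio                 # inertia weight (assumed schedule)
        s = 2 * rng.random() * (1 - ratio)    # separation weight
        a = 2 * rng.random() * (1 - ratio)    # alignment weight
        c = 2 * rng.random() * (1 - ratio)    # cohesion weight
        f = 2 * rng.random()                  # food attraction weight
        e = 1 - ratio                         # enemy distraction weight
        food, enemy = X[np.argmax(fit)], X[np.argmin(fit)]
        for i in range(n_agents):
            S = -np.sum(X[i] - X, axis=0)     # separation  (Eq. 1)
            A = dX.mean(axis=0)               # alignment   (Eq. 2)
            C = X.mean(axis=0) - X[i]         # cohesion    (Eq. 3)
            F = food - X[i]                   # attraction  (Eq. 4)
            E = enemy + X[i]                  # distraction (Eq. 5)
            # Eq. (6), clamped so the transfer probability stays informative
            dX[i] = np.clip(s*S + a*A + c*C + f*F + e*E + w*dX[i], -6, 6)
            # Eq. (11): flip bits with probability given by the transfer fn
            flip = rng.random(dim) < transfer(dX[i])
            X[i] = np.where(flip, 1 - X[i], X[i])
        fit = np.array([fitness(x) for x in X])
    best = np.argmax(fit)
    return X[best], fit[best]
```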

2.2 Grasshopper Optimization Algorithm

The grasshopper optimization algorithm (GOA) is a new optimization algorithm proposed by Saremi et al. in 2017 [30]. A multi-objective version of the algorithm was later proposed by Mirjalili et al. in 2018 [31]. As its name implies, GOA is inspired by the swarming behavior of grasshoppers. It is generally used to search for optimal solutions to constrained and unconstrained problems [32]. The pseudocode of GOA is shown in Algorithm 2. The position of the ith grasshopper, \(X_{i}\), is calculated as

$$X_{i} = S_{i} + G_{i} + A_{i} ,$$
(12)

where \(S_{i}\) is the social interaction, \(G_{i}\) is the gravitational force on the ith grasshopper and \(A_{i}\) is the wind advection. Social interaction is the main parameter that dictates the grasshoppers’ movement which can be calculated as

$$S_{i} = \sum_{j = 1, j \ne i}^{N} s\left( d_{ij} \right) \widehat{d_{ij}},$$
(13)

where \(N\) is the number of grasshoppers, \(d_{ij}\) is the distance between the ith and jth grasshoppers, \(\widehat{d_{ij}}\) is a unit vector from the ith to the jth grasshopper, and \(s\) is a function that represents social attraction. These parameters are defined as

$$d_{ij} = \left| {x_{j} - x_{i} } \right|,$$
(14)
$$\widehat{{d_{ij} }} = \frac{{x_{j} - x_{i} }}{{d_{ij} }},$$
(15)
$$s\left( r \right) = fe^{{ - \frac{r}{l}}} - e^{ - r} ,$$
(16)

respectively, where \(f\) and \(l\) are the attraction intensity and the attractive length scale respectively, and \(x_{i}\) represents the ith grasshopper within the entire population. The final mathematical model of the grasshopper position in the dth dimension is described as

$$X_{i}^{d} = c\left( \sum_{j = 1, j \ne i}^{N} c \frac{ub_{d} - lb_{d}}{2} s\left( d_{ij} \right) \widehat{d_{ij}} \right) + \widehat{T_{d}},$$
(17)

where \(ub_{d}\), \(lb_{d}\) and \(\widehat{{T_{d} }}\) are the upper bound, lower bound and best solution found so far, respectively. \(c\) is a control parameter to modify the behavior of exploitation and exploration and can be calculated as

$$c = c_{max} - l\frac{{\left( {c_{max} - c_{min} } \right)}}{L},$$
(18)

where \(c_{max} = 1\) and \(c_{min} = 0.00001\) are the maximum and minimum values of \(c\), and \(l\) and \(L\) are the current iteration and the maximum number of iterations, respectively.

Algorithm 2 Pseudocode of GOA
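A minimal Python sketch of the GOA update of Eqs. (12)–(18) is given below for a cost function to be minimized. The values \(f = 0.5\) and \(l = 1.5\) are the defaults from the original paper [30], while the remapping of inter-agent distances into [2, 4) follows the reference implementation; both are assumptions rather than the authors' verified settings.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    # Eq. (16): social attraction/repulsion strength
    return f * np.exp(-r / l) - np.exp(-r)

def goa(cost, dim, lb=-10.0, ub=10.0, n_agents=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))
    costs = np.array([cost(x) for x in X])
    target = X[np.argmin(costs)].copy()           # best solution found so far
    c_max, c_min = 1.0, 1e-5
    for it in range(1, iters + 1):
        c = c_max - it * (c_max - c_min) / iters  # Eq. (18)
        X_new = np.empty_like(X)
        for i in range(n_agents):
            total = np.zeros(dim)
            for j in range(n_agents):
                if j == i:
                    continue
                d = np.linalg.norm(X[j] - X[i])          # Eq. (14)
                d_hat = (X[j] - X[i]) / (d + 1e-12)      # Eq. (15)
                r = 2.0 + d % 2.0   # distances remapped into [2, 4) in practice
                total += c * (ub - lb) / 2.0 * s_func(r) * d_hat  # inner Eq. (17)
            X_new[i] = c * total + target                # outer Eq. (17)
        X = np.clip(X_new, lb, ub)
        costs = np.array([cost(x) for x in X])
        if costs.min() < cost(target):
            target = X[np.argmin(costs)].copy()
    return target, cost(target)
```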

2.3 Comparison of Optimization Algorithms

Prior to selecting BDA and GOA for our work, we compared 13 optimization algorithms on 12 test functions to determine their accuracy and efficiency for feature selection purposes. The 12 test functions used for comparison are Rastrigin, Ackley, Camel3, Dejong5, Levy, Sphere, Rosenbrock, Griewank, Zakharov, Schaffer2, Rothyp and Shubert [33]. Experiments were performed using MATLAB 2018 on an Intel Core i5 CPU with 2 GB RAM. Each experiment was executed 1000 times, and the cost function value (accuracy) and time taken (in seconds) for each execution were recorded; for both measures, a lower value is desired. Search space dimensions of 10, 20 and 30 were used, with lower and upper bounds of \([-5, 5]\), \([-10, 10]\) and \([-15, 15]\), respectively. Only the unimodal category (single-optimum problems) was used to determine the best algorithm for feature selection. The results are tabulated in Table 2, where the dragonfly and grasshopper optimization algorithms outperformed their peers on both metrics.
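The benchmarking protocol can be reproduced along the following lines; the Sphere and Ackley definitions below are the standard ones, and the `goa` sketch from Sect. 2.2 stands in for any of the 13 compared algorithms (a single run is shown instead of the 1000 repetitions used in the paper).

```python
import time
import numpy as np

def sphere(x):
    # unimodal test function, global minimum 0 at the origin
    return np.sum(x ** 2)

def ackley(x):
    n = x.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n) + 20 + np.e)

for fn, (lb, ub), dim in [(sphere, (-5, 5), 10), (ackley, (-10, 10), 20)]:
    t0 = time.perf_counter()
    _, best = goa(fn, dim=dim, lb=lb, ub=ub)
    print(f"{fn.__name__}: cost={best:.4g}, time={time.perf_counter() - t0:.3f}s")
```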

Table 2 Comparison between optimization algorithms

3 Proposed Method

In the proposed work, features of the human face are first extracted using uniform LBP (ULBP). Features are the significant characteristics of a face image, such as its shape, texture, or context. Relevant features are then selected using BDA and GOA to train two classifiers, KNN and SVM. Classifiers trained using features selected by BDA are denoted BDA-KNN and BDA-SVM, whereas classifiers trained using features selected by GOA are denoted GOA-KNN and GOA-SVM. The following subsections detail the steps involved in developing these algorithms.

3.1 Preprocessing

Illumination and pose normalization techniques are used in the preprocessing stage to mitigate the negative effects of illumination and pose variations on the overall performance of the algorithm. The normalization technique divides the face image into four sub-segments, each of which is processed independently. The location of the nose is taken as the middle point of the image where this splitting occurs. Illumination normalization is performed for each segment based on the probability density function of its pixels' grey levels. Upon completing the normalization process, the sub-segments are merged and subjected to pixel averaging followed by the application of filters. Details of the entire process are available in [34].
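The exact probability-density-based normalization is specified in [34]; the sketch below only illustrates the four-segment structure, with per-segment histogram equalization as a stand-in for the actual normalization and a nose location assumed to be supplied by a landmark detector.

```python
import numpy as np
from skimage import exposure

def normalize_quadrants(img, nose_row, nose_col):
    # Split the face at the nose point into four sub-segments, normalize
    # each independently, and merge the results back into one image.
    out = np.empty_like(img, dtype=float)
    for rows in (slice(0, nose_row), slice(nose_row, img.shape[0])):
        for cols in (slice(0, nose_col), slice(nose_col, img.shape[1])):
            # stand-in for the PDF-based illumination normalization of [34]
            out[rows, cols] = exposure.equalize_hist(img[rows, cols])
    return out  # the paper then applies pixel averaging and filtering
```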

3.2 Feature Extraction

Conventional LBP is typically computed for each pixel \(\left( x_{c}, y_{c} \right)\) of an image by considering a small circular neighborhood (with radius \(R\) pixels). Let \(g_{c}\) denote the gray level value of that pixel; then \(LBP_{P,R}\left( x_{c}, y_{c} \right)\) is defined as

$$LBP_{P,R}\left( x_{c}, y_{c} \right) = \sum_{p = 0}^{P - 1} S\left( g_{p} - g_{c} \right) 2^{p},$$
(19)
$$S\left( g \right) = \begin{cases} 1, & \text{if } g \ge 0 \\ 0, & \text{otherwise} \end{cases},$$
(20)

where \(P\) corresponds to the number of pixels in the neighborhood of radius \(R\). A subset of these \(2^{P}\) binary patterns, known as uniform patterns, have at most two transitions from 0 to 1 (or vice versa). These uniform patterns play an important role in improving recognition. The total number of output labels generated by mapping patterns of \(P\) bits is \(P\left( P - 1 \right) + 3\). We can mathematically define the uniform LBP (\(LBP_{P,R}^{u2}\)) as

$$LBP_{P,R}^{u2}\left( x_{c}, y_{c} \right) = \begin{cases} I\left( LBP_{P,R}\left( x_{c}, y_{c} \right) \right), & \text{if } U\left( LBP_{P,R} \right) \le 2 \\ P\left( P - 1 \right) + 2, & \text{otherwise} \end{cases},$$
(21)

where \(I\left( z \right) \in [0,P\left( {P - 1} \right) + 1]\) and

$$U\left( LBP_{P,R} \right) = \left| S\left( g_{P - 1} - g_{c} \right) - S\left( g_{0} - g_{c} \right) \right| + \sum_{p = 1}^{P - 1} \left| S\left( g_{p} - g_{c} \right) - S\left( g_{p - 1} - g_{c} \right) \right|$$
(22)

\(U\left( LBP_{P,R} \right)\) denotes the pattern's number of spatial bitwise transitions (1/0 changes). If \(U\left( LBP_{P,R} \right) \le 2\), the corresponding pixel is labeled by the index function \(I\left( z \right)\); otherwise, the pixel is assigned the value \(P\left( P - 1 \right) + 2\). Each uniform pattern is assigned an index by the index function \(I\left( z \right)\), which contains \(P\left( P - 1 \right) + 2\) indices [35]. The global high-dimensional feature descriptor is then generated by concatenating all the features.
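For illustration, the descriptor can be computed with scikit-image, whose `nri_uniform` mode implements exactly this \(P(P-1)+3\)-label mapping; the 8 × 8 cell grid and histogram normalization here are assumptions, not the paper's stated configuration.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def ulbp_features(img, P=8, R=1, grid=(8, 8)):
    # "nri_uniform" assigns one label per uniform pattern and a single
    # shared label to all non-uniform patterns: P(P-1)+3 labels in total.
    lbp = local_binary_pattern(img, P, R, method="nri_uniform")
    n_bins = P * (P - 1) + 3
    h, w = img.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = lbp[i * h // grid[0]:(i + 1) * h // grid[0],
                       j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
            feats.append(hist / max(hist.sum(), 1))  # normalized cell histogram
    return np.concatenate(feats)  # concatenated global descriptor
```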

3.3 Feature Selection and Classification

Extracting features using ULBP is sensitive to noise and can yield irrelevant features. The feature extraction method also results in a high-dimensional feature vector, which affects the accuracy and computational cost of a classifier. An efficient FR method can be built by identifying the most important features of the face image. These problems are solved via feature selection, which we perform using BDA and GOA (presented in Sects. 2.1 and 2.2, respectively). The parameters used for BDA and GOA are summarized below:

  • BDA

    • Test Size = 1

    • Maximum Iterations = 50

    • Number of Particles = 5

  • GOA

    • Maximum Number of Generations = 50

    • Number of Search Agents = 5

    • Lower Bound = −10

    • Upper Bound = 10

The candidate population (number of particles/search agents) for each optimization algorithm is first initialized, then the search for the best features is performed. After each iteration, the features which have been identified are used as inputs to the KNN or SVM classifier. The resulting recognition accuracy is used as the fitness function to compare the new set of features against the previous one. The features that lead to the highest accuracy are selected for facial recognition purposes. We use each optimization algorithm separately alongside each classification algorithm to identify the combination that maximizes recognition accuracy. Feature selection based on the four combinations, BDA-KNN, BDA-SVM, GOA-KNN and GOA-SVM, follows similar steps, as shown in Algorithm 3 and sketched in the example below.

Algorithm 3 Pseudocode of feature selection and classification
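A minimal sketch of the wrapper evaluation at the heart of Algorithm 3 is shown below: a binary mask proposed by BDA or GOA selects a feature subset, a classifier is trained on it, and the resulting accuracy is returned as the fitness. Cross-validation and the 1-NN default are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def make_fitness(features, labels, clf=None):
    # features: (n_samples, n_features) ULBP descriptors; labels: identities
    clf = clf or KNeighborsClassifier(n_neighbors=1)
    def fitness(mask):
        idx = np.flatnonzero(mask > 0.5)      # indices of selected features
        if idx.size == 0:
            return 0.0                         # reject empty subsets
        return cross_val_score(clf, features[:, idx], labels, cv=3).mean()
    return fitness

# e.g., best_mask, best_acc = bda(make_fitness(X, y), dim=X.shape[1])
```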

4 Results and Discussion

All experiments described in this section are performed using Windows 10 on an Intel Core i5 CPU with 2 GB RAM and MATLAB 2018. We use three datasets for comparative purposes, the first of which is the Olivetti-Oracle Research Lab (ORL) face database. The database contains 400 frontal faces, each with a size of 112 × 92 pixels, comprising 10 tightly cropped images of each of 40 individuals with variations in pose, illumination, facial expression (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). The second dataset is the AR face database created by Aleix Martinez and Robert Benavente. The AR database consists of over 4000 color images of 126 different people (56 females and 70 males). The facial images were taken under restricted conditions but with variations in illumination, facial expression and occlusion with sunglasses, scarves, and hair styles. Labeled Faces in the Wild (LFW) is the third and final dataset used in this work [36]. It consists of 5749 different subjects, of which 1680 have two or more images, for a total of 13,233 images. Similar to the previously discussed datasets, the images differ in terms of pose, lighting, expression, background, race, age, gender, clothing, occlusion, camera, focus, and other parameters. This database is considered one of the most important datasets for analyzing the robustness of FR against uncontrolled conditions.
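Of the three datasets, LFW can be fetched directly through scikit-learn, as sketched below; ORL and AR must be obtained from their maintainers, and the `min_faces_per_person` threshold is a placeholder rather than the paper's setting.

```python
from sklearn.datasets import fetch_lfw_people

# Downloads and caches LFW on first use; images are returned as grayscale
lfw = fetch_lfw_people(min_faces_per_person=20, resize=0.5)
print(lfw.images.shape)                    # (n_samples, height, width)
print(len(lfw.target_names), "subjects")   # identities meeting the threshold
# Each image can then be preprocessed (Sect. 3.1) and described with
# ulbp_features (Sect. 3.2) before feature selection and classification.
```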

We evaluate the performance of the four combinations, BDA-KNN, BDA-SVM, GOA-KNN and GOA-SVM, in terms of time complexity and accuracy. We first compare the optimized algorithms with their unoptimized counterparts to show the performance gains on both metrics. As seen in Tables 3 and 4, the optimized algorithms display significant performance improvements. Reducing the number of features through feature selection improves accuracy (by preventing overfitting) as well as time complexity.

Table 3 Performance Improvements of KNN
Table 4 Performance Improvements of SVM

Prior to using the reduced feature set, SVM is generally more accurate than KNN, albeit slower. The performance gap between the two algorithms is reduced by applying the optimization algorithms for feature selection. In addition, BDA-KNN and GOA-KNN now slightly outperform BDA-SVM and GOA-SVM, respectively. The experiments also indicate that the GOA variants of both classifiers slightly outperform their BDA counterparts. One explanation for this phenomenon is that GOA is better suited to identifying global optima, whereas BDA tends to generate locally optimal results. We also compare the proposed work against other recently proposed approaches based on recognition rate, as shown in Table 5. For all datasets, BDA-KNN, BDA-SVM, GOA-KNN and GOA-SVM generally outperform their peers.

Table 5 Accuracy (%) comparison with existing work

The two optimization algorithms were effective in removing irrelevant, noisy, and redundant features extracted by ULBP. This is apparent from the high prediction accuracy of the proposed methods compared to other FR proposals in Table 5. This result also supports our findings in Table 2, which identified that the dragonfly and grasshopper algorithms outperform other optimization algorithms. To the best of our knowledge, the proposed work is among the first to investigate the use of both dragonfly and grasshopper algorithms specifically for facial recognition purposes.

5 Conclusion

In this paper, we investigated the application of two relatively new optimization algorithms to facial recognition: the dragonfly and grasshopper optimization algorithms. We selected these algorithms after a thorough comparison of 13 algorithms in terms of feature selection capability. Both algorithms were then used for feature selection alongside two classifiers, k-nearest neighbor and support vector machine. We denote the combinations of these approaches as BDA-KNN, BDA-SVM, GOA-KNN and GOA-SVM. As expected, significant performance improvements were obtained when the optimized algorithms were compared to their unoptimized counterparts. Interestingly, we also found that the KNN-based approaches outperformed their SVM counterparts after applying the optimization algorithms for feature selection, whereas the inverse held true prior to feature selection. We also found that the GOA-based classifiers outperform their BDA counterparts due to the capability of GOA in identifying globally optimal solutions, as compared to the locally optimal solutions often generated by BDA. Performance comparison against similar approaches in the literature depicts the superiority of the proposed methods in terms of both accuracy and time complexity. Moving forward, our findings imply that future facial recognition algorithms could leverage grasshopper optimization for feature selection to maximize performance.