1 Introduction

Feature extraction is a key component of any recognition system. A compact, high-quality representation is of great importance and is closely tied to the performance of the system. Several feature extraction methods and techniques have been introduced and utilized in the literature [21, 25]. For instance, the Binarized Statistical Image Feature (BSIF) [18] has been used for feature extraction in a variety of systems, such as fingerprint, palmprint and iris recognition, among many others. One of the main reasons is the efficacy of the BSIF texture representation, e.g., Raja et al. [33] proposed a BSIF-feature-based scheme for visible-spectrum iris and periocular verification systems. In another work, Doyle et al. [10] developed an iris recognition system based on BSIF features, while Younesi et al. [42] introduced a palmprint recognition system combining Gabor filters and BSIF features, in which the region of interest (ROI) images are passed through Gabor filters and then to BSIF for feature extraction.

In a recent work, Mishra et al. [28] proposed a palmprint recognition framework based on BSIF features fused with BRISK features. Attia et al. [4] designed a finger knuckle print recognition framework utilizing a bank of multi-scale BSIF features, where different BSIF filters were combined into a separate filter bank. In another work, Attia et al. [5] presented a multimodal biometric system using the major and minor dorsal finger knuckle patterns for recognition; BSIF was employed to extract the features from the different modalities. In a different framework, Attia et al. [6] adapted a finger knuckle print recognition system with a deep rule-based classifier, where BSIF and Gabor filters were employed for feature extraction. BSIF feature extraction has also been employed with the face modality. For example, Ylioinas et al. [41] proposed a face recognition system using BSIF features; however, BSIF was not applied to the whole image, but rather in a block-wise fashion: BSIF was applied to different regions of the face image and the resulting BSIF histogram vectors were collected at the end. Ouamane et al. [31] developed a multimodal face verification algorithm, presenting a local feature fusion method based on four different feature extraction techniques (i.e., LBP, SLF, BSIF and LPQ features). Beyond BSIF, new feature extraction methods are proposed regularly, e.g., Kumar et al. [20] recently employed a novel descriptor named Dense Local Graph Structure (D-LGS) for feature extraction. Cheng et al. [9] proposed another feature descriptor, where a deep convolutional neural network was used to retrieve features via sparse representation. Yee et al. [40] proposed a face recognition system based on Laplacian Completed Local Ternary Pattern (LapCLTP) features. In a different work, Leng et al. [22] proposed an approach that enhances the efficacy of DCT feature extraction and improves the accuracy of face and palmprint recognition.

In the literature, many works have aimed to improve recognition systems by enhancing the different stages of the recognition process, such as feature extraction and feature selection [23, 24]. The monogenic signal was first introduced by Felsberg et al. [12] as a two-dimensional (2-D) generalization of the analytic signal, based on the Riesz transform as an extension of the Hilbert transform. Several works in the literature use the monogenic signal representation. For instance, Yang et al. [39] proposed monogenic binary coding (MBC) for local feature extraction. In another work, a feature extraction and representation model for face recognition (MBP) was proposed by Yang et al. [38], where the Local Binary Pattern (LBP) was used to encode the monogenic magnitude. Huang et al. [16] presented a facial expression recognition system using the monogenic signal representation to obtain the local phase, orientation and magnitude of the image; LBP was then employed to capture texture and motion information. Oh et al. [30] developed a micro-expression recognition system via a multi-scale monogenic signal representation. Motivated by the high efficacy of the monogenic signal, which has proved its potential in different application domains [15, 17], and by the feature extraction and representation capability of BSIF, we present in this paper a novel face feature scheme, named M-BSIF, based on monogenic filtering and the BSIF method.

The BSIF descriptor has been widely utilized in several recognition systems due to its capacity. However, BSIF is not always the best feature descriptor for a given trait [11, 34], i.e., it does not always attain the best recognition rates. To enhance the capability of the BSIF method, our proposed feature description scheme takes BSIF to the next level with the enhancement yielded by monogenic filters. In this work, a new feature descriptor called M-BSIF is proposed. This descriptor builds on the properties offered by the monogenic signal representation, namely its local features (local phase, local amplitude and local orientation). In addition, the monogenic signal representation is cost effective, requiring less time and space, and it retains more image information than techniques such as filters based on Gabor wavelets. Specifically, the proposed M-BSIF first applies a band-pass filter to the image via a log-Gabor filter; a monogenic filter then decomposes the face image into three complementary parts, i.e., local amplitude, local phase and local orientation. Next, BSIF encodes these complementary components in order to extract more efficient features. The resulting method offers an effective template that represents a person in the system; in the matching step, it is used to decide whether the user is genuine or an impostor. The presented framework thus aims to further enhance the effectiveness of BSIF features by exploiting the properties of the monogenic signal representation.

The rest of the paper is organized as follows. Section 2 presents an outline of the proposed system as well as a detailed explanation of the proposed feature descriptor. Section 3 outlines the results and discusses the findings together with a comparative study. Finally, the conclusion is presented in Section 4.

2 Proposed M-BSIF based scheme for face recognition system

In this section, we describe the overall M-BSIF based face recognition system, as also shown in Fig. 1. Figure 2 shows the general process of the designed feature descriptor.

As depicted in Fig. 1, the proposed system is composed of five steps. The first step is feature extraction, in which the proposed M-BSIF descriptor is employed to extract the features. In the second module, dimensionality reduction is performed using PCA + LDA. In the next module, the Mahalanobis distance is utilized for matching the face representations. Then, the final decision is made. Lastly, the evaluation is conducted via four measures, i.e., Rank-1, EER, ROC and CMC. In the following subsections, each step of the presented system is detailed.

Fig. 1
figure 1

The general diagram of the proposed system

2.1 M-BSIF feature extraction

The steps of the proposed M-BSIF feature descriptor are shown in Fig. 2. The descriptor has three main steps: (i) band-pass filtering using a log-Gabor filter, (ii) monogenic signal representation, and (iii) BSIF feature extraction. As shown in Fig. 2, before the monogenic signal representation, band-pass filtering (Fig. 2(1)) is applied via a log-Gabor filter (Fig. 2(2)) in order to obtain the frequency responses. Then, the monogenic signal representation is computed (Fig. 2(3)), resulting (Fig. 2(4)) in three different components, i.e., local amplitude (energy), local phase and local orientation. These three components increase the recognition rate by providing more information. Lastly, the BSIF descriptor is applied (Fig. 2(5)) to obtain the feature vectors from the resulting components. As shown in Fig. 2(6), the obtained features carry much relevant information that helps enhance the system's performance. Finally, the feature vectors of each image are concatenated and the BSIF histogram is obtained.

Fig. 2
figure 2

The procedure of obtaining M-BSIF descriptor

2.1.1 Monogenic signal representation

The monogenic signal, introduced by Felsberg et al. [12], is a two-dimensional (2-D) generalization of the analytic signal, based on the Riesz transform as an extension of the Hilbert transform. One of its vital characteristics is that it preserves the properties of the 1-D analytic signal during feature extraction, while adding more information, namely the local amplitude (energy), local phase and local orientation.

Let \(f\left(x\right)\) be the real-valued signal and \( h(x) \) be the Hilbert transform kernel in the spatial domain, then the analytic signal \({f}_{anal}\left(x\right)\) is defined by:

$${f}_{anal}\left(x\right)=f\left(x\right)+j.h\left(x\right)f\left(x\right),$$
(1)

where \(h\left(x\right)=1/\pi x\). In practice, the Hilbert transform is performed in the frequency domain, where the response of \(h\left(x\right)\) is \(H\left(\omega \right)=-j\,\text{sign}\left(\omega \right)=-j\omega /\left|\omega \right|\).

An analytic signal can be decomposed into two elements, i.e., the local amplitude (energy) \(A\left(x\right)\) and the local phase \(\varphi \left(x\right)\), as follows:

$$A\left(x\right)=\parallel {f}_{anal}\left(x\right)\parallel =\sqrt{{f}^{2}\left(x\right)+{\mathcal{H}}^{2}},$$
(2)
$$\varphi \left(x\right)=\text{atan2}\left(\mathcal{H},f\left(x\right)\right), \quad \varphi \left(x\right) \in \left[0, 2\pi \right),$$
(3)

where \(\mathcal{H}=h\left(x\right)f\left(x\right).\)

To realize the 2-D generalization, the Riesz transform is used [35]. It is the scalar-to-vector signal transformation whose frequency response equals \(-j\boldsymbol{\omega }/\parallel \boldsymbol{\omega }\parallel\) in 2-D. Therefore, the Riesz transform can be expressed as follows:

$${f}_{R}\left(s\right)=\begin{pmatrix}{f}_{x}\left(s\right)\\ {f}_{y}\left(s\right)\end{pmatrix}=\begin{pmatrix}{h}_{x} * f\left(s\right)\\ {h}_{y} * f\left(s\right)\end{pmatrix},$$
(4)

where \(f\left(s\right)\) with \(s=(x,y)\) is the input signal. The filters \({h}_{x}\) and \({h}_{y}\) are characterized by the 2-D frequency responses \({H}_{x}=-j{\omega }_{x}/\parallel \omega \parallel\) and \({ H}_{y}=-j{\omega }_{y}/\parallel \omega \parallel\), where \(\omega =\left({\omega }_{x} ,{\omega }_{y}\right)\). Now, the spatial representation of Riesz kernel can be obtained as follows:

$$\begin{array}{*{20}l} h_x=\frac x{2\pi{\parallel s\parallel}^3}\\ h_y=\frac y{2\pi{\parallel s\parallel}^3}\end{array}$$
(5)

The monogenic signal for an image \(f\left(s\right)\) is defined as the combination of \(f\) and its Riesz transform as follows:

$${f}_{monogenic}\left(s\right)=\left(f\left(s\right),{f}_{x}\left(s\right),{f}_{y}\left(s\right)\right),$$
(6)

where \(f\left(s\right)\) is the real part of the monogenic signal, while \({f}_{x}\left(s\right)\) and \({f}_{y}\left(s\right)\) represent the imaginary parts. Thus, the image signal \(f\left(s\right)\) can be decomposed into three different components, i.e., the local amplitude, local phase and local orientation [12], defined as:

$$\left\{\begin{array}{*{20}c}A=\sqrt{{f}^{2}+{f}_{x}^{2}+{f}_{y}^{2}}\\ \phi =-\text{sign}\left({f}_{x}\right)\,\text{atan2}\left(\sqrt{{f}_{x}^{2}+{f}_{y}^{2}},f\right)\\ \theta =\arctan\left({f}_{y}/{f}_{x}\right)\end{array}\right.$$
(7)

where \(A\) is the local amplitude, or energy, of the monogenic signal, \(\phi\) represents the local phase (structural information), and \(\theta\) the local orientation (geometric information).
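A minimal sketch of Eqs. (4)-(7), applying the Riesz filters in the frequency domain on a normalized `numpy.fft.fftfreq` grid (the function name `monogenic` and the small epsilon guarding the orientation ratio are our own illustrative choices):

```python
import numpy as np

def monogenic(img):
    """Decompose an image into local amplitude, phase and orientation.

    The Riesz filters are applied in the frequency domain, where their
    responses are H_x = -j*w_x/||w|| and H_y = -j*w_y/||w|| (Eq. 4).
    """
    rows, cols = img.shape
    wy = np.fft.fftfreq(rows)[:, None]
    wx = np.fft.fftfreq(cols)[None, :]
    norm = np.sqrt(wx ** 2 + wy ** 2)
    norm[0, 0] = 1.0                                   # avoid 0/0 at the DC term
    F = np.fft.fft2(img)
    fx = np.real(np.fft.ifft2(F * (-1j * wx / norm)))  # first Riesz component
    fy = np.real(np.fft.ifft2(F * (-1j * wy / norm)))  # second Riesz component

    A = np.sqrt(img ** 2 + fx ** 2 + fy ** 2)          # local amplitude (energy)
    phi = -np.sign(fx) * np.arctan2(np.hypot(fx, fy), img)  # local phase (Eq. 7)
    theta = np.arctan(fy / (fx + 1e-12))               # local orientation
    return A, phi, theta
```

For a pure horizontal cosine grating, the Riesz transform of the cosine is a sine along the modulation direction, so the local amplitude comes out constant (equal to 1).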

2.1.2 Log Gabor filter

Before applying the Riesz transform, the image must go through band-pass filtering in order to maintain the invariance/equivariance characteristics of the signal decomposition [14]. Thus, in this work, we employ a log-Gabor filter to perform the band-pass filtering, as in [13, 39]. The frequency response of the log-Gabor filter is given as:

$$G\left(\omega \right)=\text{exp}\left\{-{\left[\text{log}\left(\omega /{\omega }_{0}\right)\right]}^{2}/2{\left[\text{log}\left(\sigma /{\omega }_{0}\right)\right]}^{2}\right\}$$
(8)

In Eq. (8), \({\omega }_{0}\) represents the center frequency and \(\sigma\) is the scaling factor of the bandwidth. In [19], it was suggested that the ratio \(\sigma /{\omega }_{0}\) be kept constant, yielding filters with a constant shape ratio. In this work, however, we use different shape ratios in \(\left[0, 1\right]\) in order to study the effect of the ratio on the final results.
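Eq. (8) can be realized on a discrete frequency grid as follows (a sketch; the normalized `numpy.fft.fftfreq` grid and the zeroed DC term are our implementation choices, the latter reflecting that log-Gabor filters have no DC component):

```python
import numpy as np

def log_gabor(shape, omega0, sigma_ratio):
    """Radial log-Gabor frequency response (Eq. 8).

    sigma_ratio = sigma / omega0 is the shape ratio varied in the paper.
    """
    rows, cols = shape
    wy = np.fft.fftfreq(rows)[:, None]
    wx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(wx ** 2 + wy ** 2)       # radial frequency grid
    radius[0, 0] = 1.0                        # avoid log(0) at the DC term
    G = np.exp(-np.log(radius / omega0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
    G[0, 0] = 0.0                             # log-Gabor has no DC component
    return G
```

The response peaks at 1 exactly where the radial frequency equals the center frequency \(\omega_0\), and decays symmetrically on a log-frequency axis.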

The band-pass monogenic signal representation is given as:

$${f}_{\text{lg}-monogenic}=\left({f}_{lg}\left(s\right),{f}_{lg-x}\left(s\right),{f}_{lg-y}\left(s\right)\right)=\left({f}_{lg}\left(s\right),{h}_{x} * {f}_{lg}\left(s\right),{h}_{y} * {f}_{lg}\left(s\right)\right),$$
(9)

where \({f}_{lg}\left(s\right)=f\left(s\right) * {F}^{-1}\left(G\left(\omega \right)\right)\), and \({F}^{-1}\) is the 2-D inverse Fourier transform. The amplitude, phase and orientation of the band-pass signal can be computed analogously [12], as shown in the following equation:

$$\left\{\begin{array}{*{20}c}{A}_{lg}=\sqrt{{f}_{lg}^{2}+{f}_{lg-x}^{2}+{f}_{lg-y}^{2}}\\ {\phi }_{lg}=-\text{sign}\left({f}_{lg-x}\right)\,\text{atan2}\left(\sqrt{{f}_{lg-x}^{2}+{f}_{lg-y}^{2}},{f}_{lg}\right)\\ {\theta }_{lg}=\arctan\left({f}_{lg-y}/{f}_{lg-x}\right)\end{array}\right.$$
(10)

In this work, we adopt a multiscale monogenic representation in order to fully describe the signal. In a multiscale log-Gabor filter, the parameters \({\omega }_{0}\) and \(\sigma\) are given as:

$$\sigma ={\sigma }_{ratio}{\omega }_{0}, {\omega }_{0}={\left({\lambda }_{min}{\mu }^{s-1}\right)}^{-1},$$
(11)

where \({\lambda }_{min}\) is the minimal wavelength, \(\mu\) the multiplication factor of the wavelength, and \(s\) the scale index. Figure 2 shows a multiscale monogenic signal representation over 3 scales.
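As a worked example of Eq. (11), with illustrative values for \(\lambda_{min}\), \(\mu\) and the shape ratio (the actual per-experiment values are those of Table 1):

```python
# Worked example of Eq. (11); parameter values are illustrative only.
lambda_min, mu, sigma_ratio = 3.0, 2.0, 0.5
scales = (1, 2, 3)
omega0 = [1.0 / (lambda_min * mu ** (s - 1)) for s in scales]  # center frequencies
sigma = [sigma_ratio * w for w in omega0]                      # bandwidth factors
print(omega0, sigma)
```

Each scale halves the center frequency here (since \(\mu = 2\)), so the three filters tile the spectrum on a log-frequency axis with a constant shape ratio.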

2.1.3 BSIF feature extraction

The Binarized Statistical Image Feature (BSIF) descriptor [18] is a local texture descriptor. BSIF applies a set of \(n\) linear filters \({\varphi }_{i}^{k\times k}\) of fixed size \(k\times k\) to an image \(X\) of size \(m\times n\) and binarizes the responses. The filter responses are obtained as in Eq. (12):

$$r_i=\sum_{m,n}\varphi_i^{k\times k}\left(m,n\right)\,X\left(m,n\right),$$
(12)

where \({\varphi }_{i}^{k\times k}\) is a linear filter of size \(k\times k\) and \(i \in \left\{1, 2, \dots, n\right\}\) indexes the \(n\) statistically independent filters, whose responses can be computed together and binarized to obtain the binary string according to Eq. (13):

$$b_i=\left\{\begin{array}{*{20}c}1 & if\;r_i\;>\;0\\ 0 & otherwise\end{array}\right.$$
(13)

Once the binarization is completed, the BSIF features are obtained as the histogram of the pixels' binary codes, which effectively characterizes the texture components in the image. Note that the two main parameters of the BSIF descriptor are the filter size \(\left(k\right)\) and the filter length \(\left(n\right)\).
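The two steps above (Eqs. (12)-(13) plus the histogram) can be sketched as follows. Note that genuine BSIF filters are learned offline with ICA on natural-image patches [18]; the random filters below are only a stand-in so that the sketch is self-contained:

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(img, filters):
    """BSIF sketch: binarize n filter responses into an n-bit code per
    pixel (Eqs. 12-13), then histogram the codes.

    `filters` has shape (n, k, k); real BSIF filters are learned by ICA
    on natural image patches, but any (n, k, k) stack works here.
    """
    n = filters.shape[0]
    codes = np.zeros(img.shape, dtype=np.int64)
    for i, filt in enumerate(filters):
        r = convolve2d(img, filt, mode='same', boundary='symm')  # Eq. (12)
        codes += (r > 0).astype(np.int64) << i                   # Eq. (13), bit i
    hist = np.bincount(codes.ravel(), minlength=2 ** n)
    return hist / hist.sum()                                     # normalized histogram

rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 7, 7))   # stand-in for 8 learned 7x7 filters
img = rng.random((64, 64))
h = bsif_histogram(img, filters)
```

With \(n = 8\) bits, the descriptor is a 256-bin histogram; in M-BSIF one such histogram would be computed per monogenic component and the results concatenated.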

2.2 Dimensionality reduction

Following the feature extraction phase using M-BSIF, the obtained feature vector is of high dimension. In order to reduce the vector's dimension and handle the large amount of data, we employ the dimensionality reduction technique known as PCA + LDA. Due to its speed and simplicity, this technique is considered one of the best ways to deal with large data. First, PCA is used to project the images into a lower-dimensional space [36]. Then, LDA is used to maximize the inter-class distances and minimize the intra-class distances. The transformation matrix W that maximizes the criterion is found as follows [7]:

$$\text{T}\left(\text{W}\right)={\text{W}}_{\text{opt}}=\underset{\text{W}}{\text{arg max}}\frac{\left|{\text{W}}^{\text{T}}{\text{S}}_{\text{B}}\text{W}\right|}{\left|{\text{W}}^{\text{T}}{\text{S}}_{\text{W}}\text{W}\right|}=\left[{\text{W}}_{1}{\text{W}}_{2}{\cdots \text{W}}_{\text{d}}\right]$$
(14)

\(\text{T}\left(\text{W}\right)\) is the Fisher discriminant criterion to be maximized, and W is built by concatenating the d leading eigenvectors. Note that \(\text{W}\) is obtained by solving the following system:

$${\text{S}}_{\text{W}}^{-1}{\text{S}}_{\text{B}}{\text{W}}_{\text{j}}={\text{W}}_{\text{j}}{{\uplambda }}_{\text{j}} ,$$
(15)

where j = 1, 2, …, d.
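A sketch of this reduction step using scikit-learn on synthetic data (the component counts and toy dimensions are illustrative; LDA yields at most C−1 discriminant directions for C classes):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# PCA + LDA sketch: PCA first projects the high-dimensional feature
# vectors to a lower space, then LDA (Eqs. 14-15) finds at most C-1
# discriminant directions for C classes.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 500))      # 200 toy feature vectors, dim 500
y = np.repeat(np.arange(10), 20)         # 10 classes, 20 samples each

X_pca = PCA(n_components=50).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=9).fit_transform(X_pca, y)
```

Running PCA first keeps the within-class scatter matrix \(S_W\) well conditioned, which is the usual motivation for the PCA + LDA cascade.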

2.3 Matching module

For matching two face M-BSIF representations, the nearest neighbour classifier is employed with the Mahalanobis distance. Let \({V}_{i}\) and \({V}_{j}\) denote the feature vectors of the query and of an image stored in the database (i.e., the template), respectively. The distance between \({V}_{i}\) and \({V}_{j}\) is calculated by Eq. (16):

$${d}_{Ma}\left({V}_{i}{,V}_{j}\right)={\left({V}_{i}-{V}_{j}\right)}^{T}{C}^{-1}\left({V}_{i}-{V}_{j}\right)$$
(16)

where \(C\) represents the covariance matrix.
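As written, Eq. (16) is the squared Mahalanobis distance; a minimal sketch (in practice \(C\) would be estimated from the gallery feature vectors, and its inverse precomputed once):

```python
import numpy as np

def mahalanobis_sq(vi, vj, cov):
    """Squared Mahalanobis distance of Eq. (16): (Vi-Vj)^T C^-1 (Vi-Vj)."""
    d = vi - vj
    return float(d @ np.linalg.inv(cov) @ d)

# With an identity covariance it reduces to the squared Euclidean distance.
d = mahalanobis_sq(np.array([3.0, 4.0]), np.zeros(2), np.eye(2))
```

The covariance term whitens the feature space, so correlated or high-variance dimensions do not dominate the nearest-neighbour decision.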

3 Experiment results and discussion

In order to evaluate the proposed system, a series of experiments was conducted on three different public databases, i.e., the ORL, AR and JAFFE databases. Each database was used in three sub-experiments with different shape ratios and different numbers of scales.

3.1 Experiment I: ORL database

In this section, we report the results of the experiments conducted on the ORL database. The Olivetti Research Laboratory (ORL) database [8] is one of the most used databases in the face recognition literature. ORL is publicly available and contains 40 subjects with 10 images per subject, i.e., 400 grayscale images in total, each of size 112 × 92. The images exhibit pose and slight illumination variations with small facial detail changes, and were taken against a uniform background.

The results of this experiment are presented in Tables 2, 3 and 4, obtained by setting the scale index \(s\) to 1, 2 and 3, respectively, and varying the shape ratio in \(\left[0, 1\right]\). The tables report the results in terms of recognition rate (Rank-1) and Equal Error Rate (EER). In addition, for a more insightful presentation, Fig. 3 shows the Receiver Operating Characteristic (ROC), Cumulative Match Characteristic (CMC) and EER curves, where:

  • The CMC curve represents the performance of a face recognition system in identification mode. The Rate Of Recognition (ROR) is the percentage of probes whose closest match in the stored database corresponds to the right identity, i.e., the identity is among the top r ranked matches, with r = 1, 2, …, n, where n is the number of persons in the database. The CMC curve thus plots the ranking ability of an identification system as a recognition rate.

  • The ROC curve is a graphical illustration of the evolution of the False Rejection Rate (FRR) against the False Acceptance Rate (FAR) over all possible operating thresholds.

  • The EER is an important metric for evaluating a biometric system; it is the operating point where the FAR and FRR are equal. The lower the EER, the better the system.
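The EER can be approximated from genuine and impostor score distributions as the point where the FAR and FRR curves cross (a sketch assuming distance scores, i.e., lower means more similar; the function name and toy scores are illustrative):

```python
import numpy as np

def eer(genuine, impostor):
    """Approximate Equal Error Rate from distance scores.

    FAR: fraction of impostor scores accepted at threshold t.
    FRR: fraction of genuine scores rejected at threshold t.
    The EER is read off where the two rates are (nearly) equal.
    """
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor <= t).mean() for t in thresholds])
    frr = np.array([(genuine > t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2
```

On perfectly separated score distributions the two curves cross at zero, i.e., the EER is 0%.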

The parameters (minimal wavelength \({\lambda }_{min}\), multiplication factor \(\mu\) and number of scales) used in all experiments were initially chosen at random, since a fixed set of parameters producing the best performance is yet to be determined. Table 1 lists the parameters used in each experiment.

Table 1 Parameters set

Table 2 reports the performance in terms of Rank-1 recognition rate and EER using 1 scale. In Table 2, we can see that the best results are obtained with a shape ratio of 0.1, where Rank-1 equals 99.5% and EER equals 0.21%. Overall, as the shape ratio gradually decreases, performance improves from 4.5% Rank-1 and 47.92% EER at shape ratio 0.9 to 99.5% Rank-1 and 0.21% EER at shape ratio 0.1. The largest jumps occur between shape ratios 0.9 and 0.8, and between 0.5 and 0.4.

Table 2 Results for number of scales = 1 and frequency center = [1]

The Rank-1 and EER results using 2 scales are reported in Table 3. The best results are obtained with a shape ratio of 0.1, where Rank-1 equals 100% and EER equals 0%, followed by shape ratio 0.2, with Rank-1 of 99% and EER of 0.12%. The results in Table 3 are better than those in Table 2: as the shape ratio gradually decreases, Rank-1 increases and EER decreases. Compared to Table 2, the performance in Table 3 already starts high at shape ratio 0.9, with Rank-1 of 69.5% and EER of 9.66%, and improves from shape ratio 0.8, which yields 81% Rank-1 and 6.30% EER, better than the results reported in Table 2 for the same shape ratio.

Table 3 Results for number of scales = 2 and frequency center = [1.4, 2.8]

Table 4 presents the results of the experiment with 3 scales. The best results belong to the last three shape ratios, 0.3, 0.2 and 0.1, which yield 99.5% and 100% Rank-1, and 0.09%, 0.08% and 0% EER, respectively. As in the previous two tables, the results keep improving as the shape ratio decreases; in Table 4, the results start picking up from shape ratio 0.8. Comparing the three sub-experiments, the reported results improve markedly when using 3 scales.

Table 4 Results for number of scales = 3 and frequency center = [1.4, 1.960, 2.744]

To support these findings, the results are also presented as ROC, CMC and EER curves in Fig. 3. We selected three shape ratios from each table to plot, i.e., 0.1, 0.5 and 0.9, in order to clearly observe and compare the results. Figure 3 shows that the proposed descriptor is highly effective and illustrates the impact of the different shape ratios and scales. It also confirms that the best results are obtained with a shape ratio of 0.1, and that using a higher number of scales improves the results overall.

Fig. 3
figure 3

ORL database - (a) ROC, CMC and EER for Nb scale = 1, (b) ROC, CMC and EER for Nb scale = 2, (c) ROC, CMC and EER for Nb scale = 3. (The ROC curve helps visualize the performance at different thresholds; the CMC curve represents the performance of a face recognition system in identification mode)

3.2 Experiment II: AR face database

In this section, we report the results of the experiments on the AR database. The AR face database [27] is publicly available and contains 4000 color images of 126 individuals. The images are frontal views with different expressions, lighting conditions and occlusions (scarf and sunglasses). The pictures were taken in two sessions two weeks apart, with 26 images per subject. In this study, we considered the first 100 individuals, i.e., 2600 images in total.

The results of this experiment are presented in Tables 5, 6 and 7, obtained by setting the scale index \(s\) to 1, 2 and 3, respectively, while varying the shape ratio in \(\left[0, 1\right]\). The tables report the results in terms of recognition rate (Rank-1) and Equal Error Rate (EER). For a more insightful presentation, Fig. 4 shows the ROC, CMC and EER curves. The parameters \({\lambda }_{min}\) and \(\mu\) are the same as in the previous experiment (Table 1).

Table 5 Results for number of scales = 1 and frequency center = 1
Table 6 Results for number of scales = 2 and frequency center = [1.4, 2.8]
Table 7 Results for number of scales = 3 and frequency center = [1.4, 1.960, 2.744]

Table 5 reports the findings in terms of Rank-1 recognition rate and EER using 1 scale. The best results belong to a shape ratio of 0.1, where Rank-1 equals 97.1% and EER equals 0.93%. Overall, performance starts quite low, at 0.80% Rank-1 and 47.47% EER, but improves gradually as the shape ratio decreases, reaching 97.1% Rank-1 and 0.93% EER at shape ratio 0.1. A dramatic increase occurs between shape ratios 0.4 and 0.3, where Rank-1 jumps from 32.3% to 78.9% and EER drops from 17.96% to 5.20%.

Table 6 presents the Rank-1 and EER results using 2 scales. It is apparent from Table 6 that the best results belong to a shape ratio of 0.2, where Rank-1 equals 97.60% and EER equals 0.81%. As in Table 5, Rank-1 increases and EER decreases as the shape ratio is gradually reduced. Compared to Table 5, the results here start at a higher level: shape ratio 0.9 gives 14.6% Rank-1 and 28.05% EER, a remarkable improvement. Moreover, the results start picking up at shape ratio 0.4. Overall, the results in this table are better than those reported in Table 5, while the best results are similar.

Table 7 provides the results of the experiment using 3 scales. The best results are attained with a shape ratio of 0.2, yielding 97.6% Rank-1 and 0.82% EER. As in the previous two tables, the results keep improving as the shape ratio decreases; in Table 7 they start picking up from shape ratio 0.4. Comparing the three sub-experiments, the best performance is observed in Table 7: regardless of the single best result, the overall improvement is greater when using 3 scales.

To further support our findings, the results are also presented as ROC, CMC and EER curves in Fig. 4. Again, we selected three shape ratios from each table, i.e., 0.1, 0.5 and 0.9, in order to clearly observe and compare the results. Figure 4 shows that the proposed descriptor is highly effective and illustrates the impact of the different shape ratios and scales. It also confirms that the best results are obtained with small shape ratios, and that using a higher number of scales improves the results overall.

Fig. 4
figure 4

AR database- (a) ROC, CMC and EER for Nb scale = 1, (b) ROC, CMC and EER for Nb scale = 2, (c) ROC, CMC and EER for Nb scale = 3

3.3 Experiment III: JAFFE face database

The Japanese Female Facial Expressions (JAFFE) database [26] is another publicly available benchmark that has been used to evaluate different face recognition systems. It contains a total of 213 grayscale images posed by ten subjects (Japanese females), each displaying six different facial expressions plus a neutral one. The size of each image is 256 × 256 pixels.

The results of this experiment on the JAFFE database are presented in Tables 8, 9 and 10, obtained by setting the scale index \(s\) to 1, 2 and 3, respectively, while varying the shape ratio in \(\left[0, 1\right]\). The tables report the results in terms of recognition rate (Rank-1) and Equal Error Rate (EER). For a more insightful presentation, Fig. 5 shows the ROC, CMC and EER curves. The parameters \({\lambda }_{min}\) and \(\mu\) are the same as in the previous two experiments (see Table 1).

Table 8 Results for number of scales = 1 and frequency center = [1]
Table 9 Results for number of scales = 2 and frequency center = [1.4, 2.8]
Table 10 Results for number of scales = 3 and frequency center = [1.4, 1.960, 2.744]

Table 8 presents the Rank-1 and EER results using 1 scale. It is apparent from Table 8 that the best results belong to shape ratios from 0.5 down to 0.1, where Rank-1 equals 100% and EER equals 0%. Overall, the reported results start quite low but improve rapidly within the first three shape ratios. A notable increase occurs between shape ratios 0.7 and 0.6, where Rank-1 jumps from 77% to 94% and EER drops from 7.22% to 2.72%. All in all, the results of this experiment are excellent even when only one scale is used.

Table 9 provides the Rank-1 and EER results using 2 scales. It is apparent from Table 9 that all attained results are good, with Rank-1 of 100% and EER of 0% for shape ratios from 0.7 down to 0.1. Even shape ratios 0.9 and 0.8 give good performances, with Rank-1 of 96% and 99% and EER of 0.05% and 1.27%, respectively, a remarkable improvement compared to Table 8.

Table 10 presents the results of the experiment using 3 scales. All achieved results are excellent, from shape ratio 0.9 down to 0.1, where the results reach 99% Rank-1 and 1% EER. These performances are remarkably good, even better than those in the previous two tables; therefore, the overall results are better using 3 scales.

Lastly, to further support the findings, the results are presented as ROC, CMC and EER curves in Fig. 5. As in the previous experiments, we selected three shape ratios (i.e., 0.1, 0.5 and 0.9) from each table to plot, in order to clearly observe and compare the results. Figure 5 shows, first, that the proposed descriptor is highly effective, and, as in the previous experiments, the impact of the different shape ratios and scales. In this experiment, all results are remarkable regardless of the shape ratio, especially in the last sub-experiment.

Fig. 5
figure 5

JAFFE database- (a) ROC, CMC and EER for Nb scale = 1, (b) ROC, CMC and EER for Nb scale = 2, (c) ROC, CMC and EER for Nb scale = 3

All in all, the results of the different experiments indicate that the proposed feature extraction scheme is highly effective, as shown in the tables and figures above. The effects of the shape ratio, the number of scales and the center frequency have been studied to show how the performance of a face recognition system changes when varying these three parameters. According to our study, the best set of parameters is a shape ratio of 0.1 and 3 scales.

3.4 Comparative study

In order to further demonstrate the effectiveness of the proposed system, a comparative study was conducted against the same system using only the BSIF feature extraction technique, on the three datasets.

As shown in Table 11, the proposed system is highly effective compared to BSIF-only feature description [25]. For example, on the AR dataset there is a clear difference between the BSIF-only system and the proposed M-BSIF system, which attained 49.84% and 0.81% EER, respectively. On the ORL database, the BSIF-only and M-BSIF systems obtained 4.00% and 100% recognition rates, respectively. Similarly, on the JAFFE database, the BSIF-only and M-BSIF frameworks yielded 10.00% and 100% recognition rates, and 49.56% and 0.00% EER, respectively. Compared to [20], the proposed system performed similarly well, both yielding a 100% recognition rate on the ORL database. Furthermore, our system outperformed [9] in recognition rate on the AR database, obtaining 97.60% versus 95.85%, and came close to [37], which obtained 99.78%. In addition, on the JAFFE database, our system matched the good results of [40] and [29], yielding 100%. M-BSIF also outperformed the recent work [32] on the ORL database, where [32] attained only 91.31% recognition rate against 100% for M-BSIF, and performed better than [3] on the same database, even though the latter achieved a good accuracy of 98.92%.

Table 11 Comparative study of proposed method with existing works

4 Conclusions

In this paper, a new descriptor, M-BSIF, was proposed for face feature extraction. The method is based on the monogenic signal representation and the BSIF descriptor. More specifically, the proposed face feature descriptor enhances the ability of traditional BSIF feature extraction in face recognition systems. The presented framework first applies a band-pass scheme using a log-Gabor filter; the monogenic filter is then used to obtain three different components (i.e., local energy, local phase and local orientation) from each face image. Next, the BSIF feature extraction technique is applied to the obtained components, and the histogram codes extracted from the component images are concatenated into a larger feature vector. The PCA + LDA technique is then used to reduce the dimensionality of this vector. Finally, the Mahalanobis distance is utilized to verify whether the user is genuine or an impostor. Experiments were conducted on three different public datasets, i.e., ORL, AR and JAFFE. The experimental analyses show the high efficacy of the proposed M-BSIF feature extraction for face recognition, and M-BSIF attained high accuracy compared to existing state-of-the-art schemes. All in all, we conclude that the proposed M-BSIF descriptor can be an efficient tool for feature extraction in face recognition, and we therefore aspire to use it with different modalities. As future work, we aim to find the best set of parameters to guarantee the quality of the findings on larger datasets, and to study the robustness of the framework against different adversaries [1, 2].