Introduction

A multimodal biometric recognition system fuses information from more than one biometric trait to improve on the performance of the individual unimodal systems and to remain robust under varying imaging conditions, such as poor or overexposed illumination, pose variations, and differences in imaging equipment. Such systems are used in a variety of applications, including surveillance, security, and government services, because of their high discriminative performance [1]. The iris is commonly fused with the face because iris texture is clear at close range, while the face provides supplementary information at a distance [2]. The face and iris traits can therefore compensate for one another when either of them is unavailable or only partially available for the classification task. These techniques can achieve high performance in a controlled environment [3]. However, optical distortion, illumination variation, varying acquisition conditions, occlusion, and similar factors often yield low-quality face/iris images, which dramatically reduces the performance of the system [4, 5]. Unimodal biometric systems, on the other hand, rely on a single biometric modality and suffer from noisy data, lack of uniqueness, non-universality, and related issues [6]. The design and selection of feature extractors and fusion techniques often play an important role in such circumstances [7]. Furthermore, if the features of the multiple biometric modalities are not properly fused, the performance of such systems may degrade. Hence, there is a need for a new feature-level fusion technique that can handle these issues. To address them, this research introduces a new face and iris feature extraction and feature-level fusion technique for the development of a multimodal biometric system.

The proposed study aims to build an efficient and robust biometric recognition system using face and iris traits. Its main goal is to achieve a superior recognition rate with reduced computation time. To this end, this paper introduces multimodal biometric approaches for person identification based on Polar Fast Fourier Transform (PFFT) features. The multimodal databases used for experimentation contain face and iris samples: the face samples are obtained from the ORL, LFW, and new VISA Face databases, and the iris samples are obtained from the CASIA 4.0 and new VISA Iris datasets. Because it fuses the characteristics of the face and iris modalities, the proposed multimodal system overcomes the limitations of single-modality face or iris systems, improves the recognition rate of the overall system, and enhances security against spoof attacks.

1D-PFFT (One-Dimensional Polar Fast Fourier Transform) features are obtained from the face and iris traits individually, and feature-level fusion is accomplished through Canonical Correlation Analysis (CCA). The features obtained from the iris image are referred to as 1D-PFFTC (1D Polar Fast Fourier Transform iris Code) features, and CCA is applied to fuse the 1D-PFFT and 1D-PFFTC features of the face and iris modalities into a single PFFT Feature Vector (PFFT-FV). Based on the PFFT-FV, different classifiers, namely NN, SVM, and PNN, are presented. The proposed multimodal biometric techniques are evaluated on two types of face and iris multimodal datasets, viz. synthetic multimodal datasets and the genuine VISA multimodal dataset. All the proposed multimodal biometric systems are implemented and evaluated using MATLAB 2017b on an Intel Core i5 machine with a processor speed of 1.6 GHz and 8 GB of RAM. The contributions of this research are as follows:

  • A new Polar Fast Fourier Transform iris Code (PFFTC) scheme for iris feature representation is proposed, and three machine learning techniques are presented for person identification based on Polar FFT features.

  • A Canonical Correlation Analysis (CCA) method for feature-level fusion of the face and iris modalities is presented. The proposed feature-level fusion approach has the benefit of reducing the dimension of the resultant feature vector. The results of the proposed CCA-based feature-level fusion approach on both genuine and synthetic databases are found to be encouraging.

  • A challenging new multimodal face and iris dataset, VISA, is developed.

The remainder of this paper is organized as follows. The feature extraction and fusion techniques employed in the proposed face–iris multimodal biometric systems are described in the "Proposed Methodology" section. The "Classification Techniques" section describes the classification techniques proposed in this paper. The experimental results and a comparative analysis of the proposed approaches with PFFT features are presented in the "Experimental Results and Discussion" section. Finally, conclusions are drawn in the "Conclusions" section.

Proposed Methodology

The tasks involved in the proposed multimodal system are: (1) data acquisition, where the face and iris biometric modalities are acquired from different face/iris datasets; (2) feature extraction, where polar FFT features are extracted from the given input image; (3) feature fusion, where CCA is employed to integrate the face and iris features into a single feature vector; and (4) classification, where feature matching is performed and a decision is made to establish the identity of a person. The block diagram of the proposed multimodal system is shown in Fig. 1. The feature extraction and CCA-based fusion techniques used in the proposed model are described in the subsequent sections.

Fig. 1 Block diagram of the proposed multimodal biometric system

In the proposed methodology, the face component is first cropped from the given sample image using the Viola-Jones technique and rescaled to a 120 × 120 pixel grayscale face image [8]. Similarly, the iris portion is extracted from the eye samples by employing a Canny edge detector and the Circular Hough Transform (CHT). The segmented iris portion may include noise, namely eyelids and eyelashes, which degrades the efficiency of the proposed biometric recognition system [9]. Therefore, the Linear Hough Transform (LHT) is employed to separate the eyelids, and a thresholding method is used to remove the eyelashes. To handle dimensional variation of the iris portion, Daugman's rubber sheet model is applied (this process is referred to as normalization) [10]. Finally, to handle non-uniform lighting conditions on the iris region, local histogram equalization is employed to obtain a better-quality iris region [11]. The entire pre-processing of the eye image is illustrated in Fig. 2.

Fig. 2 Illustration of segmentation of iris region

PFFT Feature Extraction

Once the face and iris regions have been pre-processed and segmented, features are extracted from both using the Polar Fast Fourier Transform (PFFT) algorithm. First, the Fast Fourier Transform (FFT) is applied to the face/iris region to compute the 2D-DFT coefficients. In this paper, the low-frequency components of the face/iris are retained and used for person identification. To access the low-frequency coefficients more conveniently, the quadrants of the spectrum are interchanged diagonally so that the low-frequency coefficients appear at the center of the frequency spectrum. To reduce the length of the feature vector, the periodicity and complex conjugate symmetry properties of the DFT are used (the spectrum repeats itself indefinitely in both the vertical and horizontal directions), so that only the non-redundant DFT coefficients (NRDFT) of the face/iris spectrum are retained. The resulting spectrum of the face/iris region is reduced to one fourth of its original length, and the face or iris features are obtained from these non-redundant coefficients. To perform the PFFT, one needs to build a polar grid, i.e., transform the Cartesian grid to a polar grid (illustrated in Fig. 3).
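The shift-and-prune step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: the function name `low_frequency_nrdft` and the block size `keep` are hypothetical, and the sketch keeps roughly half of the central block (the part not implied by conjugate symmetry) rather than reproducing the exact one-fourth reduction reported in the text.

```python
import numpy as np

def low_frequency_nrdft(image, keep=16):
    """Non-redundant low-frequency 2D-DFT coefficients of a real image.

    `keep` (width of the retained central block, in frequency bins) is an
    illustrative parameter, not a value taken from the paper. It must satisfy
    keep // 2 < min(image.shape) // 2.
    """
    spectrum = np.fft.fft2(image)
    # Swap the quadrants diagonally so the zero-frequency (DC) term is centred.
    centred = np.fft.fftshift(spectrum)
    r0, c0 = centred.shape[0] // 2, centred.shape[1] // 2  # DC position
    half = keep // 2
    # Central block covering frequencies -half..half on both axes.
    block = centred[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1]
    # For a real image F(-u, -v) = conj(F(u, v)): rows with u > 0 mirror rows
    # with u < 0, and within row u = 0 the entries with v > 0 mirror v < 0.
    upper_rows = block[:half, :].ravel()        # all coefficients with u < 0
    centre_row = block[half, :half + 1]         # u = 0, v <= 0 (ends at DC)
    return np.concatenate([upper_rows, centre_row])
```

For a 120 × 120 face image and `keep=16`, the 17 × 17 central block (289 coefficients) is reduced to 145 complex values, the last of which is the DC term.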

Fig. 3 Pseudo-polar grid [12]

In general, the pseudo-polar grid can be divided into two subsets, i.e., the vertical subset (VS) points and the horizontal subset (HS) points, as shown in Fig. 3.

In Fig. 3, the index l denotes the vertical axis and the index k denotes the horizontal axis. In the pseudo-polar grid, the sides of each square are of size \(\frac{\pi t}{N}\), where \(t = 0,1, \ldots ,N\). The VS rays have equispaced slopes \(\frac{2p}{N}\), where \(- \frac{N}{2} \le p < \frac{N}{2}\). The horizontal subset (HS) rays are analogous to the vertical subset (VS) rays but rotated clockwise by 90°. The VS and HS points can be expressed as follows:

$$ {\text{VS}} = \left\{ \xi_{l} = \frac{\pi t}{N}\,\,{\text{ for }} - N \le t \le N,\quad \xi_{k} = \xi_{l} \cdot \frac{2p}{N}\,\,{\text{ for }} - \frac{N}{2} \le p < \frac{N}{2} \right\} $$
$$ {\text{HS}} = \left\{ \xi_{k} = \frac{\pi t}{N}\,\,{\text{ for }} - N \le t \le N,\quad \xi_{l} = \xi_{k} \cdot \frac{2p}{N}\,\,{\text{ for }} - \frac{N}{2} < p \le \frac{N}{2} \right\} $$
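To make the two point sets concrete, the following sketch enumerates the VS and HS frequency points directly from the definitions above. The function name `pseudo_polar_grid` is hypothetical, and the sketch assumes an even N; it only generates the grid coordinates and does not compute any transform.

```python
import math

def pseudo_polar_grid(n):
    """Return (VS, HS) as lists of (xi_k, xi_l) frequency points for even n."""
    vs, hs = [], []
    for t in range(-n, n + 1):                       # -N <= t <= N
        base = math.pi * t / n
        for p in range(-n // 2, n // 2):             # -N/2 <= p < N/2
            # VS: xi_l is fixed by t, xi_k = xi_l * 2p/N
            vs.append((base * 2 * p / n, base))
        for p in range(-n // 2 + 1, n // 2 + 1):     # -N/2 < p <= N/2
            # HS: xi_k is fixed by t, xi_l = xi_k * 2p/N
            hs.append((base, base * 2 * p / n))
    return vs, hs
```

Every VS point satisfies |ξ_k| ≤ |ξ_l| and every HS point satisfies |ξ_l| ≤ |ξ_k|, so the two subsets cover the vertical and horizontal cones of the frequency square [−π, π]².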

In this research, the face/iris sample is converted into a uniform polar grid representation, so that rotation and scaling operations are reduced to translations (which are determined by phase correlation). The Polar Fast Fourier Transform (PFFT) is carried out in two steps. In the first step, pseudo-polar sampling is used as the starting point of the Polar FFT algorithm. In the second step, the transformation from the pseudo-polar grid to the polar grid is performed ("rotate the rays" and "circle the squares"). In the proposed multimodal biometric systems, the correlation between the coordinates of the equispaced Cartesian grid and the non-equispaced polar FFT grid is used to derive a statistical inference on the face/iris traits. Finally, the interpolated polar coordinates (in equispaced Cartesian coordinates) are projected into a one-dimensional (1-D) PFFT. The 1D-PFFT (1D Polar Fast Fourier Transform) coefficients obtained from the face image are referred to as the PFFT feature, and those obtained from the iris image are referred to as the PFFTC (Polar Fast Fourier Transform iris Code) feature. Both the PFFT and PFFTC feature values are then fused through CCA for recognition. The entire feature extraction algorithm operates as follows:

figure a
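As a rough reference for what a polar Fourier feature vector contains, the sketch below evaluates the DFT of a small image on a polar frequency grid by direct summation and flattens the magnitudes into a 1-D vector. It is a brute-force illustration, not the fast pseudo-polar algorithm described above and not the authors' procedure: the function name `polar_dft_features`, the grid sizes `n_radii` and `n_angles`, and the choice to keep magnitudes are all assumptions made for this example.

```python
import numpy as np

def polar_dft_features(image, n_radii=8, n_angles=16):
    """1-D vector of DFT magnitudes sampled on a polar frequency grid.

    Direct summation costs O(rows * cols) per grid point, so this is only
    practical for small images; a fast PFFT avoids that cost.
    """
    rows, cols = image.shape
    y = np.arange(rows)[:, None]
    x = np.arange(cols)[None, :]
    radii = np.linspace(0.0, np.pi, n_radii)
    # For a real image the spectrum is conjugate-symmetric, so angles in
    # [0, pi) already cover every distinct magnitude.
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    features = np.empty((n_radii, n_angles), dtype=complex)
    for i, r in enumerate(radii):
        for j, theta in enumerate(angles):
            xi_x, xi_y = r * np.cos(theta), r * np.sin(theta)
            kernel = np.exp(-1j * (xi_x * x + xi_y * y))
            features[i, j] = np.sum(image * kernel)
    # Project the (radius, angle) samples into a single 1-D vector.
    return np.abs(features).ravel()
```

On frequencies that coincide with the Cartesian DFT grid, the values agree with `np.fft.fft2`, which gives a simple way to sanity-check a faster implementation.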

Feature Fusion Through CCA

In the proposed methods, the CCA technique is employed to fuse two feature vectors (the face and iris feature vectors) into a single feature vector, which is expected to be more discriminative than either of the original feature vectors [13, 14].

Suppose A and B are the face and iris feature matrices (Polar FFT feature vectors), respectively, n is the number of sample images, m is the number of classes in a dataset, and p and q are the dimensionalities of the feature vectors in A and B, respectively. In general, the feature space can be expressed as \(\left\{ {A^{\left( i \right)} \in {\mathbb{R}}^{p \times n} } \right\}_{i = 1}^{m}\) and \(\left\{ {B^{\left( i \right)} \in {\mathbb{R}}^{q \times n} } \right\}_{i = 1}^{m}\), where \(A^{(i)} = \{ a_{1}^{(i)}, a_{2}^{(i)}, \ldots, a_{n}^{(i)} \}\) and \(B^{(i)} = \{ b_{1}^{(i)}, b_{2}^{(i)}, \ldots, b_{n}^{(i)} \}\) denote the feature vectors of the ith class of A and B, respectively.

Let \(S_{aa}\) and \(S_{bb}\) denote the within-set covariance matrices of A and B, and let \(S_{ab}\) and \(S_{ba}\) denote the between-set covariance matrices of A and B (where \(S_{ba} = S_{ab}^{T}\), and a and b are column feature vectors). The covariance matrix S in Eq. (7) contains all the information related to the pair of features A and B [13, 14].

$$ S = \left( {\begin{array}{*{20}c} {{\text{cov}} \left( a \right)} & {{\text{cov}} \left( {a,b} \right)} \\ {{\text{cov}} \left( {b,a} \right)} & {{\text{cov}} \left( b \right)} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {S_{aa} } & {S_{ab} } \\ {S_{ba} } & {S_{bb} } \\ \end{array} } \right) $$
(7)

The central principle of the CCA technique is to determine the linear combinations \(A^{*} = W_{a}^{T} A{\text{ and }}B^{*} = W_{b}^{T} B\) that maximize the pairwise correlation between the two data sets A and B, which is expressed as [13, 14]:

$$ {\text{corr}}\left( {A^{*} , B^{*} } \right) = \frac{{{\text{cov}} \left( {A^{*} , B^{*} } \right)}}{{\sqrt {{\text{var}} \left( {A^{*} } \right)} \cdot \sqrt {{\text{var}} \left( {B^{*} } \right)} }} $$
(8)

where \({\text{cov}}\left( {A^{*} , B^{*} } \right) = W_{a}^{T} S_{ab} W_{b}\), \({\text{var}}\left( {A^{*} } \right) = W_{a}^{T} S_{aa} W_{a}\) and \({\text{var}}\left( {B^{*} } \right) = W_{b}^{T} S_{bb} W_{b}\). The transformation matrices are obtained from the eigenvalue equations \(S_{aa}^{ - 1} S_{ab} S_{bb}^{ - 1} S_{ba} \overline{{W_{a} }} = \Lambda^{2} \overline{{W_{a} }}\) and \(S_{bb}^{ - 1} S_{ba} S_{aa}^{ - 1} S_{ab} \overline{{W_{b} }} = \Lambda^{2} \overline{{W_{b} }}\), where \(\overline{{W_{a} }}\) and \(\overline{{W_{b} }}\) are the eigenvectors and \(\Lambda^{2}\) is the diagonal matrix of nonzero eigenvalues.

The matrices \(W_{a}\) and \(W_{b}\) contain the sorted eigenvectors associated with the nonzero eigenvalues. \(A^{*} ,B^{*} \in {\mathbb{R}}^{r \times n}\) are known as the canonical variates, where \(r = {\text{rank}}(S_{ab} ) \le \min (n,p,q)\) is the number of nonzero eigenvalues.

The feature-level fusion of the face and iris features is then performed by concatenating the transformed feature vectors \(A^{*}\) and \(B^{*}\), and is defined as [13, 14]:

$$ {\text{PFFT}} = \left( {\begin{array}{*{20}c} {A^{*} } \\ {B^{*} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {W_{a}^{T} A} \\ {W_{b}^{T} B} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {W_{a} } & 0 \\ 0 & {W_{b} } \\ \end{array} } \right)^{T} \left( {\begin{array}{*{20}c} A \\ B \\ \end{array} } \right) $$
(9)

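In Eq. (9), the stacked input of A and B has dimension (p + q), where p and q are the sizes of the feature vectors in A and B (the dimensionality of the face PFFT and iris PFFTC features, respectively). Since \(A^{*}\) and \(B^{*}\) each have r rows, the new fused feature vector (PFFT) has 2r components per sample, with r ≤ min(p, q).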
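The fusion in Eqs. (7) to (9) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `inv_sqrt` and `cca_fuse`, the small ridge term `reg` (added so the covariance matrices are invertible), and the use of whitening followed by an SVD are all choices made for this example. The whitening-plus-SVD route is a standard, numerically convenient way to solve the same eigenvalue problem; the singular values are the canonical correlations.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_fuse(A, B, r, reg=1e-6):
    """Fuse A (p x n) and B (q x n) into a (2r x n) matrix via CCA.

    `reg` is an illustrative ridge term, not a parameter from the paper.
    Returns the fused features and the first r canonical correlations.
    """
    n = A.shape[1]
    A = A - A.mean(axis=1, keepdims=True)        # centre each feature
    B = B - B.mean(axis=1, keepdims=True)
    S_aa = A @ A.T / (n - 1) + reg * np.eye(A.shape[0])
    S_bb = B @ B.T / (n - 1) + reg * np.eye(B.shape[0])
    S_ab = A @ B.T / (n - 1)
    Sa, Sb = inv_sqrt(S_aa), inv_sqrt(S_bb)
    # SVD of the whitened cross-covariance; singular values are sorted
    # in descending order and equal the canonical correlations.
    U, corr, Vt = np.linalg.svd(Sa @ S_ab @ Sb)
    W_a = Sa @ U[:, :r]                          # p x r
    W_b = Sb @ Vt.T[:, :r]                       # q x r
    fused = np.vstack([W_a.T @ A, W_b.T @ B])    # Eq. (9): stack A* and B*
    return fused, corr[:r]
```

With this orientation each column of `fused` is the fused feature vector of one sample, and rows `i` and `r + i` hold the i-th pair of canonical variates.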

In this research work, the CCA fusion technique is employed to integrate the feature vectors of the face (PFFT, Polar FFT) and iris (PFFTC, Polar FFT iris Code) biometrics. The resultant single Polar FFT Feature Vector (PFFT-FV) is expected to be more discriminative than the original feature vectors. The proposed machine learning techniques based on the Support Vector Machine (SVM), Nearest Neighbor Classifier (NNC), and Probabilistic Neural Network (PNN) classifiers with the Polar FFT Feature Vector are described in the "Classification Techniques" section.

Classification Techniques

The Nearest Neighbor (NN) algorithm classifies test samples based on the closest training samples in the feature space. In the proposed NN classifier with the Polar FFT Feature Vector (PFFT-FV), a test sample is assigned to the trained class that is most common among its nearest neighbors. A multiclass SVM classifier with the PFFT-FV feature is also proposed for person identification. The Gaussian RBF kernel function is used, as it selects smooth solutions. The SVM classifier is trained separately for both face and iris datasets. During the recognition stage, the PFFT-FV feature vector of the probe face and iris is constructed by extracting PFFT features. The feature vector of the test images is then compared with the trained SVM. Finally, the SVM classifier classifies the probe face and iris images based on the matching score and returns the identity (class id) of the probe image. A Probabilistic Neural Network (PNN) with PFFT-FV is also presented in this work. The fused PFFT-FV feature vectors of the entire dataset and the PFFT-FV values of the test image are provided as the input data to the PNN architecture. The PNN makes the final decision in favor of the class with the largest vote.
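The NN and PNN decision rules described above can be sketched as follows. This is an illustrative NumPy version, not the authors' MATLAB code: the function names, the Euclidean distance, the single-neighbor (1-NN) rule, and the Gaussian smoothing parameter `sigma` are assumptions for this example, and the SVM is omitted because it would require an external library. Here each row of `train_X` is one fused feature vector, i.e., the transpose of the column layout used in Eq. (9).

```python
import numpy as np

def nn_classify(train_X, train_y, x):
    """1-nearest-neighbour: label of the closest training vector."""
    distances = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(distances)]

def pnn_classify(train_X, train_y, x, sigma=1.0):
    """Probabilistic neural network with Gaussian pattern units.

    `sigma` is an illustrative smoothing parameter, not a value from the paper.
    """
    sq_dist = np.sum((train_X - x) ** 2, axis=1)
    activations = np.exp(-sq_dist / (2.0 * sigma ** 2))     # pattern layer
    classes = np.unique(train_y)
    # Summation layer: average activation of the pattern units of each class.
    scores = [activations[train_y == c].mean() for c in classes]
    return classes[int(np.argmax(scores))]                   # decision layer
```

Both functions take the same arguments, so they can be swapped freely inside an evaluation loop.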

Experimental Results and Discussion

To show the generalizability and efficiency of the proposed techniques under different imaging conditions and acquisition setups, three face datasets (ORL, LFW, and the newly devised VISA Face) and two iris databases (CASIA 4.0 Interval and the newly devised VISA Iris) are used. The newly developed VISA Face and Iris database consists of the face and iris samples of 100 persons, acquired at different places and under different lighting conditions, sessions and eye movements at VTU, Belagavi and BEC (Basaveshwar Engineering College), Bagalkot, Karnataka, India. The VISA Face database consists of 1805 color images in .JPG format, captured using mobile phones with a 16-megapixel primary camera. The VISA Iris dataset consists of 5301 (1702 left eye and 1788 right eye images) grayscale eye images in .bmp format, captured using the IrisShield camera at a distance of 5 cm. The efficiency of the proposed methods has been extensively evaluated on the ORL and CASIA, ORL and VISA Iris, LFW and CASIA, LFW and VISA Iris, and VISA Face and CASIA synthetic databases. The proposed techniques are also evaluated on the newly developed genuine VISA Face and VISA Iris database. The Leave-One-Out Cross-Validation (LOOCV) protocol is used for assessment [15].
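The LOOCV protocol mentioned above holds out each sample in turn, trains on the rest, and reports the fraction of held-out samples classified correctly. The sketch below illustrates the idea with a simple nearest-neighbor rule; the function names `loocv_accuracy` and `nearest_neighbour` are hypothetical, and the paper's own evaluation code and any per-fold details (for example, whether CCA is re-fitted in each fold) are not specified in the text.

```python
import numpy as np

def loocv_accuracy(X, y, classify):
    """Leave-one-out accuracy.

    X: (n_samples, n_features) array, y: (n_samples,) labels,
    classify(train_X, train_y, x) -> predicted label for one sample x.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i                 # hold out sample i
        prediction = classify(X[keep], y[keep], X[i])
        correct += int(prediction == y[i])
    return correct / n

def nearest_neighbour(train_X, train_y, x):
    """Illustrative classifier: label of the closest training sample."""
    return train_y[np.argmin(np.linalg.norm(train_X - x, axis=1))]
```

Any classifier with the same three-argument signature can be passed as `classify`.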

The results of evaluating the proposed multimodal biometric techniques on synthetic databases are shown in Fig. 4. Figure 5 illustrates the performance of the proposed methods on the LFW + VISA Iris and ORL + VISA Iris synthetic datasets. The knowledge bases constructed from the multimodal features of the training images of different individuals from the face and iris datasets are used by the SVM, NN, and PNN classifiers. The performance of the proposed classifiers with PFFT-FV values is tabulated in Table 1.

Fig. 4 ROC on ORL, LFW, and VISA Face with CASIA: a SVM, b NN, and c PNN

Fig. 5 ROC on ORL, LFW, and VISA Face with VISA Iris: a SVM, b NN, and c PNN

Table 1 Comparison of the proposed multimodal techniques on genuine and synthetic datasets

Further, the performance of the new multimodal biometric techniques with CCA-based feature-level combination of the PFFT features on the real and synthetic multimodal face and iris databases is compared with that of other multimodal and unimodal biometric techniques and reported in Tables 1 and 2. The proposed multimodal biometric techniques perform favorably in comparison with the proposed unimodal biometric techniques.

Table 2 Comparison of the proposed techniques with state-of-the-art methods on unimodal datasets

Table 1 shows that the proposed multimodal biometric techniques with feature-level fusion of face and iris traits achieve considerable accuracy on both the genuine and synthetic datasets. The SVM classifier with PFFT-FV shows improved performance compared to the proposed PNN and NN classifiers with PFFT-FV on the ORL with CASIA and VISA Face with CASIA datasets. The NN classifier with PFFT-FV achieves a higher recognition rate than the SVM and PNN classifiers on the ORL with VISA Iris, LFW with VISA Iris, and VISA Face with VISA Iris datasets. The experimental results reveal that the recognition accuracy of the PNN classifier with PFFT-FV is low compared to the SVM and NN classifiers. From Table 1 and Figs. 4 and 5, it is also observed that the proposed multimodal biometric systems based on the SVM/NN classifiers perform consistently better than the PNN technique on the reported face datasets combined with the CASIA dataset. However, their performance drops when the face datasets are combined with the new VISA Iris database, because of the low-clarity texture information in the eye images of the VISA Iris dataset.

Table 2 demonstrates that the proposed face and iris PFFT feature-based fusion technique achieves higher recognition accuracies than the DCW feature-based fusion technique [13]. The proposed multimodal techniques perform better on the synthetic multimodal datasets than on the real VISA Face + VISA Iris dataset, because the newly formed VISA dataset is more challenging and complex than the other reported synthetic face and iris datasets.

The proposed multimodal biometric techniques are found to be resilient to illumination, pose, expression and occlusion variations on the face, and to illumination changes, dimension inconsistency, eyelids, eyelashes, and reflections from the camera and surrounding objects on the eye.

We also evaluate the performance of the proposed approaches on the LFW face dataset, which is a widely used benchmark for face recognition in unconstrained environments. The experimental results of the proposed machine learning techniques with PFFT features are compared with MLGD (Multi-directional Local Gradient Descriptor) [8], the symbolic approach [16], Deep Convolutional Neural Networks with Gabor [17], and other state-of-the-art techniques [18]. Table 3 compares the proposed techniques with existing methods on the LFW database.

Table 3 Comparison of the proposed techniques with state-of-the-art methods on LFW dataset

Table 3 shows that the proposed techniques work well even in an unconstrained environment. The performance of the proposed multimodal techniques is also compared with that of similar techniques [5, 13, 19, 20] on the synthetic ORL + CASIA Interval dataset. Table 4 compares the proposed feature-level fusion techniques with state-of-the-art techniques on the ORL and CASIA dataset.

Table 4 Comparison of the proposed multimodal techniques with state-of-the-art methods on ORL + CASIA

In Table 4, the SVM, MKL, naïve-LLR-GMM and BSA techniques achieve 92.34%, 93.52%, 93.91%, and 93.62% recognition accuracy, respectively, on the virtual ORL + CASIA dataset. The DCT + PCA + Zernike moment + Gabor filter feature fusion technique achieves a recognition rate of 98.80%. SSA face features with 2D log-Gabor filter features achieve 97.50% using the weighted sum rule [5]. Feature-level fusion of face and iris wavelet features achieves a recognition rate of more than 99% [13]. It can be seen that the proposed multimodal biometric technique performs better than the reported multimodal recognition techniques. The CCA-based feature-level fusion techniques achieve better recognition rates than the other techniques (Sl. No. 1–9). The novelty of the work lies in creating a single unified feature projection from the face and iris biometric modalities for person identification. In comparison to the related works, the performance of the new face and iris multimodal biometric system employing the new CCA feature fusion technique is satisfactory, and the system can be used in building secure real-time applications.

From Tables 2, 3 and 4, it is observed that the proposed face and iris multimodal biometric techniques perform favorably compared to the proposed unimodal biometric techniques. The experimental results and performance analysis show that the proposed NN, SVM, and PNN classifiers with PFFT features are efficient for person identification.

Conclusions

This paper presented new face and iris multimodal biometric techniques using machine learning algorithms based on Polar FFT features. The proposed multimodal biometric systems overcome most of the issues associated with unimodal systems, such as non-uniform illumination, pose variations, presence of noise, intra-class variations, inter-class similarities, and so on. The canonical correlation analysis method is used to combine the face and iris PFFT feature vectors in order to improve the recognition accuracy. The feature-level fusion presented in this research has the benefit of reducing the length of the resultant fused feature vector, which is also more discriminative than the original feature vectors. Such an approach effectively improves recognition accuracy as well as robustness. The extensive experimental results and performance analysis indicate the robustness of the proposed multimodal biometric systems. In the future, the feature-level fusion technique employed here can be extended to fuse other biometrics.