1 Introduction

Biometric recognition is the process of recognizing an individual using his/her physiological and behavioural biometric traits [1, 2]. The physiological characteristics include face, fingerprint, palmprint and retina, and the behavioural characteristics include speech pattern, gait, keystroke dynamics and signature. Biometrics are more reliable than passwords and tokens, since they are permanently associated with the user. Biometrics also offer advantages over these security measures such as non-repudiation, accuracy and security. Numerous biometric recognition systems have been proposed based on different biometrics [3–9]. Recently, several studies have revealed that the walking pattern of a person can also be used for recognition [7]. Mu et al. reported a gait recognition technique in which biologically inspired features (BIF) are used to represent the gait image [8]. BIF is a set of features obtained from a feed-forward model of the primate visual object recognition pathway [9]. To obtain the BIF, the input gait image is decomposed by a set of Gabor filters. This yields the simple-cell receptive fields, which are further used to obtain the complex cells [8]. Biologically inspired features have also been used in face recognition [10]. When a single biometric is used for recognition, the system is called a unimodal system. A unimodal biometric system suffers from several disadvantages such as noisy data, intra-class variation, non-universality and spoof attacks [11]. Some of the limitations of the unimodal system can be eliminated by using multiple biometrics instead of a single biometric in the recognition process [12]. Such systems are known as multimodal biometric systems. Multimodal biometric systems are more reliable, since features from different biometrics of a single user are used and so more information about the user is available for recognition.

In a multimodal biometric system, four levels of information fusion are possible [11]: fusion at the sensor level, the feature extraction level, the matching score level and the decision level. Sensor-level fusion is the combination of raw data from the biometric sensors. Feature-level fusion is the combination of feature vectors obtained either by different sensors or by applying different feature extraction algorithms to the same data. Score-level fusion is the combination of the matching scores obtained by different biometric systems. Decision-level fusion is the combination of the decisions taken by the different biometric systems. A schematic diagram of the different levels of fusion in multimodal biometrics is shown in Fig. 1. Various multimodal systems have been proposed in the literature which differ from one another in terms of their algorithms, choice of biometrics, level of fusion and method of integration of the multiple biometrics [12–30]. In these techniques, different transforms are used to extract features from the biometric images and different projection methods are used to reduce the dimension of the obtained feature vectors.

Fig. 1 Levels of fusion in a multimodal biometrics system: (a) sensor-level fusion, (b) feature-level fusion, (c) matching score-level fusion and (d) decision-level fusion

In this paper, different multimodal biometric systems are investigated based on different levels of fusion of face and palmprint images. Features are extracted from these biometric images using the Gabor–Wigner transform (GWT) [31], which is an operational combination of the Gabor transform and the Wigner distribution function. The resulting feature vectors are used for matching. To select the significant features and to reduce the dimension of the feature vector, a particle swarm optimization (PSO) technique is used. Numerical experiments have been carried out to study the different unimodal and multimodal systems.

2 Related work

Multimodal biometric systems are potentially useful because of their higher recognition rates. Over the last decade, a number of new multimodal biometric systems have been reported to improve the recognition rate [12–30]. Xu et al. proposed a feature-level fusion multimodal system in which two biometrics are used as the real and imaginary parts of a complex matrix [13]. Different types of multimodal systems based on feature- and score-level fusion have been proposed in which particle swarm optimization has been used to reduce the dimension of the feature vector [14]. Jing et al. explored a multimodal biometric system in which different projection methods are used to extract the features from the biometric images [15]. An image-based linear discriminant analysis approach has been used to fuse two biometric traits of the same subject in the form of a matrix at the feature level [16]. Various fusion strategies have been discussed for multimodal systems [17]. Kumar et al. [18] proposed a multimodal biometric system in which palmprint and face images are integrated at the feature level. A score-level fusion of electrocardiogram and unobtrusive biometrics has been proposed [19]. Xu et al. [20] proposed a sparse representation method for bimodal biometrics. Huang et al. [21] reported a face- and ear-based multimodal system using sparse representation. Different hand-based multiple biometric systems have been proposed for better performance of the recognition system [22, 23]. Michael et al. [22] used multiple hand features such as palm veins, palmprint, finger vein, knuckle print and hand geometry to achieve better accuracy. A multimodal hand vein biometric system which comprises dorsal and palmar veins has been implemented [23]. Sedai et al. [24] proposed the fusion of shape and appearance features of a human pose to discriminate between different subjects. Islam et al. [25] reported multi-biometric human recognition using three-dimensional ear and face features. 
Gait features have been fused with cumulative foot pressure images for recognition [26]. Several multimodal systems have been reported in which Gabor filters are extensively used [27–30]. Yao et al. [27] proposed a multimodal biometric recognition system in which Gabor filters and principal component analysis have been used to extract the features from face and palmprint modalities. A multimodal system has been reported in which Gabor filtered images were fused at the pixel level and a kernel discriminative common vectors-radial basis function (KDCV-RBF) network was used to classify the subjects [28]. Yang et al. [29] proposed a feature-level fusion multimodal system in which the fusion of fingerprint and finger vein has been used for personal identification. A face recognition technique has been proposed in which Gabor face images have been fused with face images to improve the performance of the system [30]. The discriminative K-SVD (D-KSVD) algorithm has been applied to Gabor face features for facial expression recognition [32]. Recently, several techniques have been proposed in which multiple features of a scene from different views have been used [33–39]. These features are called multi-view data [33]. Yu et al. [34] proposed a semi-supervised multi-view distance metric learning technique for cartoon synthesis. A multi-view technique has been proposed in which the dimension of the fused features is reduced by learning a unified low-dimensional subspace [35]. A multi-view hypergraph-based learning method has been reported in which click data and varied visual features were adaptively integrated [36]. Yu et al. [37] used multiple features to synthesize new cartoons. A multi-view Hessian regularization (mHR) technique has been presented to address different problems in Laplacian regularization-based image annotation [38]. Liu et al. [39] presented a multi-view Hessian discriminative sparse coding (mHDSC) technique in which Hessian regularization is integrated with discriminative sparse coding for multi-view learning problems. These techniques differ from the proposed technique because most of them use projections of the features to reduce the dimension, whereas in the present technique PSO is used to select the dominant features from the feature vector. This results in a reduction in the intraclass distance and an increase in the interclass distance. The proposed technique is also very different from the techniques in which Gabor filters are used for the feature representation [8–11, 27–30]. As explained earlier, biologically inspired features are obtained by decomposing the image using Gabor filters with varied orientation and scaling [8–11]. Simple cells are obtained, which are further used to obtain complex cells. The Gabor filter has the drawback of low resolution because it uses a Gaussian window with fixed width. In the present technique, features are extracted using the Gabor–Wigner transform, in which the Gabor transformed image is combined with the Wigner transformed image to mitigate the resolution problem. The inclusion of the Wigner transform gives the GWT a better resolution than the Gabor transform alone.

3 Gabor–Wigner transform

Extraction of distinct and informative features from the biometric images is a fundamental requirement of a biometric system. Different transforms provide different representations of biometric images which can be further used for matching purposes. One transform that can represent a biometric image in space and frequency variables simultaneously is the Wigner distribution function (WDF). The WDF has the ability to provide a local frequency spectrum of an image. The WDF \(W\left( {x^{\prime } ,f} \right)\) of a continuous function \(f\left( x \right)\) can be written as [40–42]:

$$W\left( {x^{\prime } ,f} \right) = \int_{{ - \infty }}^{\infty } {f\left( {x^{\prime } + \frac{x}{2}} \right)f^{*} \left( {x^{\prime } - \frac{x}{2}} \right)e^{{ - i2\pi xf}} dx}$$
(1)

where \(f^{*} \left( x \right)\) is the complex conjugate of \(f\left( x \right)\). The discrete version of WDF of a discrete signal \(f\left[ l \right]\) is defined as:

$$W\left( {l,f} \right) = 2\sum\nolimits_{n = - \infty }^{\infty } {f\left( {l + n} \right)f^{*} \left( {l - n} \right)e^{ - i2\pi nf} }$$
(2)

Further modifications result in pseudo-Wigner distribution (PWD) which is defined as [41]:

$$W\left( {l,f} \right) = 2\sum\nolimits_{{n = - \frac{N}{2}}}^{{\frac{N}{2} - 1}} {f\left( {l + n} \right)f^{*} \left( {l - n} \right)w\left( n \right)w\left( { - n} \right)e^{{ - i2\left( {\frac{2\pi n}{N}} \right)f}} }$$
(3)

where \(w\left( n \right)\) is a window function of size N. A detailed explanation is given by Gabarda et al. [41]. By scanning the image using a one-dimensional window of size N, the PWD of each pixel is calculated in its local neighbourhood. By shifting this window over each of the possible positions in the image, a pixel-wise PWD of the whole image is produced. This gives N PWD representations of the input image. As seen from Eqs. 1, 2 and 3, the WDF is the Fourier transform of the local autocorrelation of the given function. A cross term is also produced in the WDF, which limits the use of the WDF in various applications.
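As an illustration of Eq. 3, the following minimal sketch evaluates the PWD at a single position of a one-dimensional signal. It is not the authors' implementation: the function name `pseudo_wigner_1d`, the rectangular window \(w(n) = 1\) and the zero treatment of samples outside the signal are all assumptions made for the example.

```python
import cmath


def pseudo_wigner_1d(signal, center, N=8):
    """Evaluate Eq. 3 at position `center` for frequency indices 0..N-1.

    Assumptions: rectangular window w(n) = 1, and samples that fall
    outside the signal are treated as zero.
    """
    length = len(signal)
    pwd = []
    for k in range(N):  # discrete frequency index (f in Eq. 3)
        acc = 0j
        for n in range(-(N // 2), N // 2):
            a, b = center + n, center - n
            if 0 <= a < length and 0 <= b < length:
                # Local autocorrelation term f(l + n) f*(l - n)
                product = signal[a] * complex(signal[b]).conjugate()
                acc += product * cmath.exp(-2j * (2 * cmath.pi * n / N) * k)
        pwd.append(2 * acc)
    return pwd
```

Sliding `center` over every sample of a row (or column) of the image would give the pixel-wise representation described above, with N values per pixel.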

Another transform that also has an ability to represent an image in the space and the frequency domain simultaneously is the Gabor transform. In the Gabor transform, the input image is space selected using a Gaussian window and the Fourier transform is performed on it for frequency analysis. The Gabor transform (GT) of a one-dimensional function can be defined as [43]:

$$G\left( {x\prime ,f} \right) = \int_{ - \infty }^{\infty } {f\left( x \right)e^{{ - \pi \left( {x - x\prime } \right)^{2} }} e^{ - i2\pi fx} dx}$$
(4)

where \(G(x\prime ,f)\) is the Gabor transform of function \(f(x)\). The discrete version of the Gabor transform for a finite function \(f(l)\) is defined as follows [44]:

$$f\left[ l \right] = \sum_{m = 0}^{M} {\sum_{n = 0}^{N} {C_{m,n} h_{m,n} \left[ l \right]} }$$
(5)

The discrete Gabor coefficients \(C_{m,n}\) are obtained as follows:

$$C_{m,n} = \sum\nolimits_{l = 0}^{L - 1} {f\left[ l \right]\gamma_{m,n}^{*} \left[ l \right]}$$
(6)

where \(\gamma \left[ l \right]\) is a dual basis of \(h\left[ l \right]\) and both of them form a biorthogonal basis. The discrete Gabor transform is defined as:

$$G\left( {m,n} \right) = \sum\nolimits_{l = 0}^{L - 1} {f\left[ l \right]\gamma \left[ {l - m\Delta M} \right]W_{L}^{ - n\Delta Nl} }$$
(7)

where \(\gamma \left[ k \right] = e^{{ - \pi k^{2} }}\), \(W_{L}^{kl} = e^{{i\left( {\frac{2\pi }{L}} \right)kl}}\), and \(\Delta M\) and \(\Delta N\) are the time and frequency sampling intervals, respectively. M and N are the number of samples used in the time and frequency domains, and \(\Delta M \times M = \Delta N \times N = L\). The Gabor transform has the drawback of low resolution, which is caused by the fixed width of the Gaussian window. A wide Gaussian window gives a good frequency resolution but poor spatial resolution. As the width of the Gaussian window decreases, the frequency resolution decreases and the spatial resolution increases.
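The window-width trade-off can be explored numerically with a direct discretization of Eq. 4. The sketch below is only illustrative: the function name `gabor_transform_1d`, the unit sample spacing and the `width` parameter (which rescales the Gaussian window; `width=1.0` corresponds to Eq. 4 as written) are assumptions and are not taken from the source.

```python
import cmath
import math


def gabor_transform_1d(signal, center, freq, width=1.0):
    """Riemann-sum approximation of Eq. 4 with unit sample spacing.

    `width` is a hypothetical parameter that stretches the Gaussian
    window; width=1.0 reproduces the fixed window of Eq. 4.
    """
    total = 0j
    for x, value in enumerate(signal):
        window = math.exp(-math.pi * ((x - center) / width) ** 2)
        total += value * window * cmath.exp(-2j * math.pi * freq * x)
    return total
```

For a sinusoid of frequency 0.25 cycles per sample, a wide window (for example `width=8`) gives a response that is sharply concentrated around `freq=0.25`, at the cost of averaging over more neighbouring samples.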

Both of these transforms have advantages as well as disadvantages. The combination of the Gabor transform and the Wigner distribution function can overcome the disadvantages of both transforms: the cross-term problem of the Wigner transform and the low-resolution problem of the Gabor transform can be reduced using the Gabor–Wigner transform. The GT and WDF can be combined into the GWT in the following ways [31, 43, 45]:

$$GWT_{f} = GT_{f} \times WDF_{f}$$
(8)
$$GWT_{f} = min\left\{ {\left| {GT_{f} } \right|^{2} ,\left| {WDF_{f} } \right|} \right\}$$
(9)
$$GWT_{f} = WDF_{f} \{ \left| {GT_{f} } \right| > 0.25\}$$
(10)
$$GWT_{f} = GT_{f}^{2.6} \times WDF_{f}^{0.6}$$
(11)
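The four combination rules of Eqs. 8–11 act element-wise on the Gabor and Wigner representations. A minimal sketch is given below, assuming that `gt` and `wdf` are equal-length lists of already computed coefficients; the function names are hypothetical, and taking magnitudes in the power form is an assumption made here so that fractional exponents stay real.

```python
def gwt_product(gt, wdf):
    """Eq. 8: element-wise product of the two representations."""
    return [g * w for g, w in zip(gt, wdf)]


def gwt_min(gt, wdf):
    """Eq. 9: element-wise minimum of |GT|^2 and |WDF|."""
    return [min(abs(g) ** 2, abs(w)) for g, w in zip(gt, wdf)]


def gwt_mask(gt, wdf, threshold=0.25):
    """Eq. 10: keep the WDF only where |GT| exceeds the threshold."""
    return [w if abs(g) > threshold else 0.0 for g, w in zip(gt, wdf)]


def gwt_power(gt, wdf, a=2.6, b=0.6):
    """Eq. 11: power-weighted product (magnitudes used, an assumption)."""
    return [abs(g) ** a * abs(w) ** b for g, w in zip(gt, wdf)]
```

The exponents `a` and `b` are exposed as parameters so that other weightings of the two transforms can be tried with the same function.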

4 Particle swarm optimization

Particle swarm optimization (PSO) is a computational technique which is used to find an optimal solution of a problem in a search space [14, 46–49]. PSO maintains a population of particles in the search space with random initial positions and velocities. At every step, each particle updates itself using the best position it has found so far and the best position found by its neighbours, both of which are determined using a fitness function. The position and the velocity of the \(i^{th}\) particle can be represented as \(X_{i} = \left\{ {X_{i1} ,X_{i2} , \ldots X_{id} } \right\}\) and \(V_{i} = \left\{ {V_{i1} ,V_{i2} ,V_{i3} \ldots V_{id} } \right\}\), respectively. At each step, the velocity and the position of each particle are updated according to the following equations [46]:

$$V_{id}^{\text{new}} = w \times V_{id}^{\text{old}} + c_{1} \times rand_{1} () \times \left( {pbest_{id} - X_{id} } \right) + c_{2} \times rand_{2} () \times \left( {gbest_{id} - X_{id} } \right)$$
(12)
$$X_{id} = X_{id} + V_{id}^{\text{new}}$$
(13)

where \(c_{1}\) and \(c_{2}\) are two positive constants, \(rand_{1} ()\) and \(rand_{2} ()\) are two random functions in the range [0,1], and \(w\) is the inertia weight.
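A single update of Eqs. 12 and 13 for one particle might look like the following sketch. The function name `pso_step` is hypothetical; the default values of `w`, `c1`, `c2` and `v_max` are the ones reported later in this section, and clamping the velocity to `[-v_max, v_max]` is an assumption about how the maximum velocity is applied.

```python
import random


def pso_step(position, velocity, pbest, gbest,
             w=1.2, c1=0.9, c2=1.0, v_max=6.0, rng=random):
    """One update of Eqs. 12 and 13 for a single particle."""
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        v_new = (w * v
                 + c1 * rng.random() * (pb - x)
                 + c2 * rng.random() * (gb - x))
        # Assumed use of V_max: clamp each velocity component.
        v_new = max(-v_max, min(v_max, v_new))
        new_velocity.append(v_new)
        new_position.append(x + v_new)
    return new_position, new_velocity
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.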

In the proposed technique, PSO is used to select the dominant features from the face and the palmprint images. To select the features from the feature vector, binary PSO is used [14, 48, 49]. In binary PSO, the particle positions are random binary bit strings of length N, in which ones and zeroes indicate the selection or rejection of the corresponding features. The position of each particle is updated using a sigmoid function of its velocity, as given below:

$$S(V_{id}^{\text{new}} ) = \frac{1}{{1 + e^{{ - V_{id}^{\text{new}} }} }}$$
(14)
$${\text{If}}\left( {rand < S\left( {V_{id}^{\text{new}} } \right)} \right)\,{\text{then}}\,X_{id} = 1;\quad {\text{ else}}\,X_{id} = 0;$$
(15)

where \(V_{id}\) is the particle velocity obtained from Eq. 12 and \(S(V_{\text{id}} )\) is a sigmoid transform and rand is the random number in the range [0,1].
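The binary update of Eqs. 14 and 15 and the resulting feature selection can be sketched as follows. The helper names `sigmoid`, `binary_position` and `select_features` are hypothetical and are introduced only for illustration.

```python
import math
import random


def sigmoid(v):
    """Eq. 14: map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_position(velocity, rng=random):
    """Eq. 15: set each bit to 1 with probability S(v), otherwise 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


def select_features(features, mask):
    """Keep only the features whose mask bit is 1."""
    return [f for f, bit in zip(features, mask) if bit == 1]
```

A large positive velocity makes selection of the corresponding feature almost certain, a large negative velocity makes rejection almost certain, and a velocity of zero gives either outcome with equal probability.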

Feature selection is done by optimizing a fitness function, which is defined so that optimizing it improves the performance of the system. In the proposed technique, the minimum EER is used as the fitness function. Several parameters are important for the optimization of the fitness function. All these parameters are first tuned, and the values which give the best performance are fixed. For the present study, the maximum velocity \(V_{ \hbox{max} } = 6\) is chosen. The inertia weight \(w\) and the constants \(c_{1}\) and \(c_{2}\) are set to 1.2, 0.9 and 1, respectively. The population size of the swarm, which also plays an important role in the optimization, is set to 25.

5 Proposed algorithm

Different types of multimodal biometric systems are investigated in which the face and palmprint images of a person are used in combination for recognition. These systems are categorized according to the level at which the biometrics are fused. The first multimodal system is based on feature-level fusion, the second is based on score-level fusion, and the third is a hybrid system that combines the two unimodal systems with the feature-level fusion system. The algorithms are described in detail below.

5.1 Unimodal biometric system

To understand the multimodal systems, the face and palmprint unimodal systems are first described in detail. Both unimodal systems use the same architecture. A block diagram of the unimodal system is shown in Fig. 2. For the unimodal system, a biometric image of size \(50 \times 50\) is taken. The features of the biometric image are then extracted using the Gabor–Wigner transform. As explained earlier, the GWT is the combination of the Gabor transform and the Wigner distribution function. To perform the Gabor transform, a Gaussian window of size \(10 \times 10\) is used, and a feature vector of dimension \(62,500 = 25 \times 50 \times 50\) is obtained. For the Wigner distribution, a window of size 25 is chosen, and a feature vector of dimension \(62,500 = 25 \times 50 \times 50\) is again obtained. The Gabor–Wigner feature vectors are obtained using Eqs. 8–11. The obtained feature vector is of dimension \(62,500 \times 1\). To reduce the dimension of the GWT feature vector, binary PSO is used to select the dominant features from the feature vector. The resulting reduced feature vectors are then stored. In the verification process, the feature vectors of the test images are matched with the feature vectors of the reference images.

Fig. 2 Block diagram of a unimodal system

5.2 Multimodal system based on feature-level fusion

In the proposed multimodal biometric system, the two biometrics of a single user are integrated at the feature level. The algorithm involves two stages, the enrolment process, in which the system is trained with the training database, and the verification process. Both stages are described in detail below.

5.2.1 Enrolment process

In the enrolment process, 300 (\(2 \times 150\)) face images and 300 (\(2 \times 150\)) palmprint images, that is, two images per biometric for each of the 150 users, are used to train the system. Out of the 300 images of each biometric, 150 are selected as the reference images and the other 150 are used for matching. The feature vectors of both biometrics are extracted using the GWT, as in the unimodal systems. The obtained feature vectors of both biometrics are combined using the following equation:

$$Multimodal \,Feature = \alpha \times Face + i\beta \times Palmprint$$
(16)

where α and β are the weight factors for the face and palmprint feature vectors, respectively. The sum of the weight factors is always unity because they represent the contribution of each biometric to the total score. PSO is then used to select the dominant features from the combined feature vector. The block diagram for this technique is shown in Fig. 3. An extensive study has been carried out to find the values of the weight factors. The weight factors can be calculated by different means [38, 39]. In the present paper, the weight factors are varied in such a way that their sum is unity, and the performance of the system is evaluated for each setting. The values for which the system performs best are then fixed and used for verification. The obtained multimodal feature vector is binarized and stored as the reference feature vector for further verification.
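The complex-valued fusion of Eq. 16 and the sweep over weight factors can be sketched as follows. This is a minimal illustration, not the authors' code: `fuse_features`, `best_alpha` and the callable `evaluate_eer` (which stands for whatever procedure returns the EER of the system for a given α) are hypothetical names, and the step size of the sweep is an assumption. The binarization step is not shown because the source does not specify how it is performed.

```python
def fuse_features(face, palmprint, alpha):
    """Eq. 16 with beta = 1 - alpha, so that the weights sum to unity."""
    beta = 1.0 - alpha
    return [complex(alpha * f, beta * p) for f, p in zip(face, palmprint)]


def best_alpha(evaluate_eer, steps=10):
    """Try alpha = 0, 1/steps, ..., 1 and keep the one with the lowest EER.

    `evaluate_eer` is a hypothetical callable mapping alpha to an EER.
    """
    candidates = [k / steps for k in range(steps + 1)]
    return min(candidates, key=evaluate_eer)
```

The face features occupy the real part and the palmprint features the imaginary part of each fused element, so both vectors must have the same length.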

Fig. 3 Block diagram for the feature-level fusion multimodal system

5.2.2 Verification process

The verification is demonstrated using the entire set of test images. The set of test images contains 600 (\(4 \times 150\)) face images and 600 (\(4 \times 150\)) palmprint images. This set is referred to as the verification database. As in the enrolment process, a set of verification feature vectors is extracted from the verification face images and the verification palmprint images. The set of verification feature vectors is then matched with the stored reference feature vectors using the Hamming distance as the discrimination factor. The Hamming distance between two vectors is the number of elements that have different values in the two vectors.
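The Hamming-distance matching described above can be sketched in a few lines. The function names and the thresholded accept rule in `verify` are assumptions for illustration; the source does not state how the decision threshold is applied.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def verify(test_vector, reference_vector, threshold):
    """Accept the claim when the distance does not exceed the threshold."""
    return hamming_distance(test_vector, reference_vector) <= threshold
```

For example, `hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])` counts two differing positions, so the pair is accepted with a threshold of 2 and rejected with a threshold of 1.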

5.3 Multimodal system based on matching score-level fusion

In this multimodal system, the matching scores of the palmprint image and the face image of a user are integrated. The basic block diagram for the proposed technique is shown in Fig. 4. As in the feature-level multimodal system, the system is first trained with the training images in the enrolment process. As shown in Fig. 4, the feature vectors of both biometrics are extracted using the GWT. Dominant features are selected from the obtained feature vectors using PSO. To integrate the two biometrics, a matching score is calculated for each of them, with the Hamming distance used as the matching score. As explained earlier, two images of each subject per biometric (two face and two palmprint images) are used to train the system, which gives two sets of feature vectors for each biometric. One set of feature vectors is stored as the reference vectors, and the other set is matched with the stored set. A matching score for each biometric is thus obtained for each of the subjects. The matching scores of the two biometrics are integrated using the sum fusion method [14]. First, the matching score obtained for each biometric is normalized, and the normalized scores are then added using weight factors. The weighted sum rule can be written as:

$$S_{\text{fuse}} = w_{1} \times S_{\text{face}} + w_{2} \times S_{\text{palmprint}}$$
(17)

where \(S_{\text{fuse}}\) denotes the fused score, \(S_{\text{face}}\) and \(S_{\text{palmprint}}\) are the matching scores for the face and the palmprint images, respectively, and \(w_{1}\) and \(w_{2}\) are the weight factors associated with the face and the palmprint images, respectively. Similar to feature-level fusion-based multimodal system, numerical computation is carried out to find out the values of the weight factors.
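A minimal sketch of the normalize-then-add procedure behind Eq. 17 is given below. The source does not name the normalization method, so min-max normalization is an assumption here, as are the function names `min_max_normalize` and `weighted_sum`.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1] (assumed normalization method)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum(score_lists, weights):
    """Weighted sum rule applied to per-modality score lists."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to unity")
    normalized = [min_max_normalize(s) for s in score_lists]
    count = len(normalized[0])
    return [sum(w * s[i] for w, s in zip(weights, normalized))
            for i in range(count)]
```

Calling `weighted_sum([face_scores, palmprint_scores], [w1, w2])` gives one fused score per comparison; because the inputs are distances, a lower fused score indicates a closer match.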

Fig. 4 Block diagram for the score-level fusion-based multimodal system

5.4 Hybrid multimodal system

The basic idea of the hybrid multimodal system is the fusion of the scores obtained from the individual unimodal systems with the score obtained from the feature-level fusion multimodal system. The block diagram for the proposed system is shown in Fig. 5. The resulting system is a combination of two unimodal systems and a feature-level multimodal system and is referred to as the hybrid multimodal system. In this system, the matching scores obtained from all three systems are integrated using the weighted sum rule given below:

$$S_{\text{hybrid}} = w_{\text{face}} \times S_{\text{face}} + w_{\text{palmprint}} \times S_{\text{palmprint}} + w_{\text{FF}} \times S_{\text{FF}}$$
(18)

where \(S_{\text{hybrid}}\) is the matching score obtained for the hybrid multimodal system. \(w_{\text{face}}\), \(w_{\text{palmprint}}\), and \(w_{\text{FF}}\) are the weight factors for the face, palmprint unimodal and feature-level fusion multimodal system, respectively. Sum of all the three weight factors is unity. \(S_{\text{face}}\), \(S_{\text{palmprint}}\) and \(S_{\text{FF}}\) are the scores obtained from the face, palmprint unimodal and feature-level fusion multimodal system, respectively. Similar to the feature-level and the score-level fusion multimodal systems, numerical computations are carried out to calculate the values of the weight factors.
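Because the three weight factors of Eq. 18 must sum to unity, candidate weights can be enumerated on a grid before each candidate is scored. The sketch below assumes a grid with step `1/steps`; the names `weight_triples` and `hybrid_score` are hypothetical, and the source does not state the step size used in the numerical computations.

```python
def weight_triples(steps=10):
    """All (w_face, w_palmprint, w_ff) on a grid that sum to unity."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i in range(steps + 1)
            for j in range(steps + 1 - i)]


def hybrid_score(s_face, s_palmprint, s_ff, weights):
    """Eq. 18 for one comparison, given already normalized scores."""
    w_face, w_palmprint, w_ff = weights
    return w_face * s_face + w_palmprint * s_palmprint + w_ff * s_ff
```

Each triple would be evaluated on the training scores, and the one giving the lowest EER retained.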

Fig. 5 Block diagram for the hybrid multimodal system

6 Experimental results and analysis

The numerical experiments are carried out on face and palmprint databases. To investigate the proposed technique, a large face database is composed by combining different standard face databases. The face database consists of 900 images of 150 subjects with six images per subject. These images are taken from the ORL [50], Yale-B [51] and Essex [52] databases. The ORL face database consists of 400 images of 40 subjects, each with 10 images. A subset of 40 subjects, each with six images, is chosen from the ORL database. The extended Yale-B database contains 2,432 images (64 images of each of 38 subjects) which are manually aligned, cropped and resized to \(168 \times 192\). A subset of 38 subjects, each with six images, is chosen from the extended Yale-B database. The remaining 72 subjects, each with six images, are taken from the Essex face database, which consists of 7,900 images of 395 subjects, each with 20 images. For the ORL and Essex databases, only the face regions are cropped from the images. All 900 images in the database are resized to \(50 \times 50\) pixels.

The palmprint database also consists of 150 subjects, each with six images, taken from the IIT Delhi Touchless Palmprint Database version 1.0 [53]. This database was captured using a digital CMOS camera and contains left- and right-hand images from more than 230 subjects. A set of automatically segmented and normalized palmprint regions is also available in this database. We have chosen 900 images of 150 subjects, each with six images, from the automatically segmented right-hand images. The automatically segmented images are of size \(150 \times 150\). All 900 palmprint images are also resized to \(50 \times 50\). The 900 face images and 900 palmprint images are randomly paired to obtain a multimodal biometric set for each of the 150 users.

For the unimodal systems (face and palmprint), 900 feature vectors are obtained from 150 subjects, each with six images. The total number of genuine matches is therefore 900 and the total number of impostor matches is 134,100. For the multimodal system, 300 feature vectors for each biometric (face and palmprint) are obtained from 150 subjects in the enrolment process. In the verification process, 600 feature vectors for each biometric are obtained from 150 subjects. The total number of genuine matches is therefore 600 and the total number of impostor matches is 89,400.
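The reported match counts can be reproduced with a short calculation. The sketch assumes that every probe vector is compared once against each of the 150 subjects, which is one interpretation consistent with the stated totals rather than a protocol confirmed by the source; the function name `match_counts` is hypothetical.

```python
def match_counts(subjects, probes_per_subject):
    """Genuine and impostor comparison counts under the assumed protocol."""
    probes = subjects * probes_per_subject
    total = probes * subjects      # each probe against every subject
    genuine = probes               # one comparison with the true identity
    impostor = total - genuine     # comparisons with all other subjects
    return genuine, impostor
```

With 150 subjects, six probes per subject correspond to the unimodal case and four probes per subject to the multimodal verification case.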

The performance of the biometric systems has been evaluated in terms of the false acceptance rate (FAR), the false rejection rate (FRR) and the equal error rate (EER). The FRR is the ratio of the number of falsely rejected genuine attempts to the total number of genuine attempts, and the FAR is the ratio of the number of falsely accepted impostor attempts to the total number of impostor attempts. The EER is the point where the FRR and the FAR are equal. The ROC curves plot the FRR against the FAR for varying values of the threshold. As described earlier, the GWT can combine the Gabor transform and the Wigner distribution function in different ways [31]. Experiments are carried out on the unimodal systems for the Gabor transform, the Wigner distribution and the GWT separately, using the four equations of the GWT (Eqs. 8–11). The results for all four equations of the GWT for both unimodal systems are shown in terms of the minimum EER in Table 1. As observed from Table 1, the first three equations of the GWT (Eqs. 8–10) give higher EER values than the Gabor transform and the Wigner distribution individually. This means that these forms of the GWT do not work for the proposed system. The fourth equation of the GWT, however, is able to improve the performance when used in the form given below:

$$GWT_{f} = GT_{f}^{0.8} \times WDF_{f}^{0.2}$$
(19)

As seen from Table 1, the unimodal system based on Gabor transform performs better than the system based on Wigner distribution. Thus, the system based on GWT performs better when the contribution of the Gabor transform is higher than the Wigner distribution. In the present paper, for all the studies, GWT in the form of Eq. 19 is used for the feature extraction from the biometrics. ROC curves corresponding to both the unimodal biometric systems are shown in Fig. 6.
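The error rates defined earlier in this section can be computed from lists of genuine and impostor distances as in the sketch below. It assumes that a comparison is accepted when its distance does not exceed the threshold, and it approximates the EER by the threshold at which the FAR and the FRR are closest; the function names are hypothetical and this is not the authors' evaluation code.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR at one threshold (accept when distance <= threshold)."""
    frr = sum(1 for d in genuine if d > threshold) / len(genuine)
    far = sum(1 for d in impostor if d <= threshold) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: the threshold where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

Sweeping the same thresholds and recording each `(far, frr)` pair gives the points of the ROC curve.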

Table 1 EER values obtained from both the unimodal systems for all the transforms

Fig. 6 ROC curves for the face and palmprint unimodal systems for all the three transforms used for feature extraction

As described earlier, different types of multimodal biometric systems are discussed in this paper. An extensive study has been carried out to compare the performance of the techniques with and without PSO-based feature selection. Results in terms of EER for the different multimodal and unimodal systems when PSO is not used are shown in Table 2. For all the multimodal systems, the weight factors are shown along with the EER values. The weight factors are for the face, palmprint and fused feature (in the hybrid multimodal system), respectively. When PSO is not used, the feature vectors of dimension \(62,500 \times 1\) are used for matching. The ROC curves corresponding to the different types of unimodal and multimodal systems based on the Gabor transform, the Wigner distribution and the GWT are shown in Fig. 7. As seen from Table 2, for all types of systems, the GWT gives better results than the Gabor transform and the Wigner distribution separately. The values of EER obtained for the feature-level, score-level and hybrid multimodal systems are 4.28, 3.07 and 2.07, respectively. This shows that for all types of multimodal systems, the EER decreases significantly in comparison to the unimodal biometric systems. The hybrid multimodal system gives the best results for all three transforms, and the hybrid multimodal system based on the GWT performs best overall.

Table 2 EER values obtained from different systems for all the transforms when PSO is not used

Fig. 7 ROC curves for different biometric systems for Gabor, Wigner and Gabor–Wigner transform, respectively, when PSO is not used for feature selection

To reduce the dimension of the feature vectors, PSO is used for all the biometric systems. Results for the different biometric systems when PSO is used are given in Table 3. The ROC curves corresponding to the different unimodal and multimodal biometric systems based on the different transforms, using PSO for feature selection, are shown in Fig. 8. It is observed from Fig. 8 that for all the transforms, all the multimodal systems give better results than the unimodal systems. It can also be observed from Table 3 that the EER values for all the multimodal systems are significantly lower than those of both unimodal systems. From Tables 2 and 3, it is seen that for all types of biometric systems, using PSO not only reduces the dimension of the feature vector but also improves the performance of the system in all cases. The dimension of the feature vectors is reduced to around half of the original dimension. This reduction in dimension reduces the computational time while improving the performance. It is observed from Table 3 that for all three transforms, the score-level fusion multimodal system and the hybrid multimodal system perform better than the feature-level multimodal system. For all three transforms, the best results are obtained from the hybrid multimodal system. The minimum values of EER 1.66, 3.55 and 3.72 are obtained for the GWT, Gabor and Wigner transforms, respectively, for the hybrid multimodal system. Thus, it can be concluded that the hybrid multimodal system performs best when the GWT is used as the feature extraction transform. This shows that the GWT can extract the features of the biometrics more efficiently than the Gabor transform and the Wigner distribution individually. This is because the GWT provides a high-resolution, cross-term-free representation of a biometric image in the space and frequency domains simultaneously. 
As explained earlier, for all the multimodal systems, the weight factors are shown along with the EER values. The weight factors are for face, palmprint and fused feature (in hybrid multimodal system), respectively.

Table 3 EER values obtained from different systems when PSO is used to reduce the dimension of feature vector for all the transforms

Fig. 8 ROC curves for different biometric systems for Gabor, Wigner and Gabor–Wigner transform, respectively, when PSO is used for feature reduction

To compare the proposed multimodal system with existing techniques, a comparison is given in Table 4. For comparison purposes, only those multimodal systems which integrate face and palmprint modalities are considered. As observed from Table 4, the proposed technique performs well in comparison to the techniques existing in the literature [13, 27, 28], except for the technique given in [15]. In that technique, multiple projection methods were used to improve the results at the cost of computational complexity. The result obtained for the proposed technique is comparable with that of the existing technique [14]. As shown in Table 4, the face unimodal system gives a lower EER than the palmprint unimodal system. This suggests that the GWT provides more distinctive features for a gradually varying biometric image such as the face than for a more rapidly varying biometric image such as the palmprint.

Table 4 Comparison of the proposed face and palmprint multimodal system with the existing face and palmprint multimodal techniques

7 Conclusion

In this paper, different multimodal biometric systems are explored in which the Gabor–Wigner transform is used to extract the features from the face and palmprint modalities. The particle swarm optimization technique is then used to select the dominant features from the obtained feature vectors. An extensive study has been carried out to show the performance of these systems when three different transforms (the Gabor, Wigner and Gabor–Wigner transforms) are used to extract the features from the biometrics. For all three transforms, the multimodal systems give better results than the unimodal systems because they use information from multiple biometrics for recognition. The best results are obtained from the hybrid multimodal system. For the hybrid multimodal system based on the GWT, EER values of 2.07 and 1.65 are obtained without and with PSO-based feature selection, respectively. The results show that the Gabor–Wigner transform performs better than the Gabor transform and the Wigner distribution function separately. The results also show that PSO is able to improve the performance of the system while reducing the dimension of the feature vector.