
1 Introduction

Achieving the three primary security constituents, namely authentication, authorization, and accountability, is a major concern in preventing privacy breaches, since digital imagery from different sources is now ubiquitous. Identity verification is a natural process that humans practice in everyday activities in a non-automated manner. We employ several cues to verify identity, e.g., appearance and color, body language, and so on. As we get closer to a person, we can also identify him/her from features such as voice and gait. We apply these techniques without knowing their specific parameters, yet we still identify the person accurately. Automating the same kind of identity verification by machine, however, is not an easy job. Hence, biometric technology, the science of identifying an individual based on physiological and/or behavioral characteristics, has gained considerable significance. A biometric system is basically a pattern recognition system that makes use of an individual’s modalities to identify him/her. Face, fingerprint, iris, palmprint, hand geometry, handvein, and finger knuckle print are physiological modalities, whereas voice, gait, signature, and keystroke are behavioral modalities.

However, most of the biometric systems employed in real-world applications are unimodal, i.e., they rely on a single source of information, because such systems are simple and easy to adopt. Unimodal systems face a number of challenges, such as noise in the sensed data, intra-class variations, inter-class similarities, non-universality, and spoof attacks. These limitations can be overcome by adopting a multibiometric system, which relies on multiple sources of information obtained in multi-sensor, multi-algorithmic, multi-sample, or multi-instance ways. The primary concern in any multibiometric system is to determine the viability of fusing multiple sources of information to achieve the desired level of accuracy. Depending on the type of information available, different types of fusion can be performed: fusion can be carried out at the sensor level, feature level, score level, or decision level.

It is well known that real-time data is prone to noise; hence, building the system with classifiers that can handle imprecision to a certain degree is a prerequisite for any robust biometric model. A claim of robustness needs to be tested by inducing artificial noise and checking the model's threshold of noise tolerance [27]. Attaining performance and robustness simultaneously in a biometric model designed to address real-time noise has so far been disregarded in the literature. Generally, there are two types of noise, class noise (e.g., labelling errors) and attribute noise (e.g., corruption of attribute values), both of which hinder model building and in turn reduce the system’s performance. Because noise affects performance, the accuracy of a model on noisy data generally differs from its accuracy on clean data. Building a generalized model that accounts for real-time noise and proves insensitive to it is challenging work.

In this paper, we design face-centric multimodal biometric systems by fusing face biometric data with other modalities, namely Fingerprint, Iris, Palmprint, and Handvein. The prominent contributions of this paper are:

  1. Different combinations of physiological traits are employed along with the face to form face-centric bimodal systems, and these systems are evaluated at all levels of fusion using all conventional fusion strategies.

  2. We also propose a rotation-invariant segmentation technique to extract the ROI from Palmprint images. For Handvein images, to reduce the storage and computational burden, we propose a method to extract the ROI sufficient for discrimination.

  3. The Log-Gabor filter is a generalized feature extractor for most 2D data modalities, as against 1D data modalities (such as speech and online signature). We exploit Log-Gabor features in this study, which helps us learn the potential benefits and limits of employing this feature set on different modalities.

  4. Extensive experiments are conducted to ascertain the behavior of the various multimodal systems. Results are substantiated with appropriate analysis.

The face modality has several advantages that make it preferable in many biometric applications, such as its non-intrusive nature and the availability of strong feature extraction algorithms (subspace methods, active shape models) and of data sets captured under different levels of controlled environment (Face Recognition Grand Challenge, Face Recognition Vendor Test, Face Recognition Technology (FERET)). Several studies in the literature have shown fusion of face with other modalities such as palmprint and fingerprint. However, these papers address only a particular fusion technique and do not provide a complete picture covering all pre- and post-classification fusion schemes. Hence, there are ample reasons to carry out this type of study. We list below the main reasons that motivated this work:

  • To the best of our knowledge, there is no single paper in the literature that addresses face centric bimodal systems developed at all levels of fusion.

  • In general, the results obtained from this study help us to determine: (a) the optimal modality combination for the face trait, (b) the level of fusion of the face trait with a particular secondary modality that is most robust against Gaussian noise, and (c) the impact of fusion strategies at the various levels of fusion.

  • Facial feature extraction algorithms have been tried in various capacities for an enhanced recognition rate, such as multi-algorithmic approaches, multi-sensor face recognition systems, multi-level fusion algorithms, and multiple classifiers. This paper works along similar lines to find the optimal combination of another biometric modality with face.

  • The type of fusion carried out has a great impact on verification accuracy. This in itself is motivation to evaluate fusion at all four levels.

The paper is organized as follows: Sect. 2 presents a brief review of the literature related to the multimodal approach. Section 3 presents the proposed segmentation methods for Palmprint and Handvein images. Section 4 discusses the tools and techniques employed in this paper. The experimental setup, results, and subsequent analysis are presented in Sect. 5. Section 6 presents concluding remarks followed by future avenues based on this study.

2 Review of Literature

Over the last two decades, numerous multimodal biometric systems have been proposed. Geng et al. [4] proposed context-aware multi-biometric fusion, which can dynamically adapt the fusion rules to the real-time context. As a typical application, the context-aware fusion of gait and face for human identification in video is investigated. Two significant context factors that may affect the relationship between gait and face in the fusion are considered, i.e., view angle and subject-to-camera distance. Fusion methods adaptable to these two factors, based on either prior knowledge or machine learning, are proposed and tested.

Chhabria et al. [21] proposed a multimodal gesture biometric recognition system; the authors discussed pre-classification and post-classification techniques and explored various statistical and normalization rules. They concluded with a few inferences: score level fusion is the most prevalent and the easiest to conduct, and it gives good results when the samples are of good quality, so weights can be assigned to individual scores based on sample quality. Hezil et al. [6] developed a biometric identification system using feature level fusion of two modalities, ear and palmprint, which is a unique combination in the literature. They performed extensive experimentation on benchmark databases, namely IIT Delhi-2 ear and IIT Delhi palmprint, and adopted local texture descriptors to extract discriminating features.

Pyatykh et al. [25] proposed noise level estimation based on principal component analysis of image blocks; the reported contributions are computational efficiency, the ability to process non-homogeneous images, and good performance compared with state-of-the-art methods. Kearns [10] proposed a statistical approach to noise tolerance in which a learning algorithm is restricted from identifying individual samples of the unknown target function. Nadheen and Poornima [18] developed a multimodal biometric system using iris and ear, extracting PCA features and fusing them at the score level using the statistical sum rule.

Youssef et al. [7] proposed a multimodal system employing face and ear modalities; the authors extracted block-based LBP features from these two traits and then fused them using score level fusion. Eskandari et al. [3] proposed a multimodal biometric system that fuses the match scores of face and iris obtained from several standard classifiers, with features extracted using several local and global feature extraction methods. Both transformation-based and classifier-based score level fusion were used to classify the concatenated matching scores.

Zhu et al. [20] proposed a multimodal biometric identification system based on finger geometry, knuckle print, and palmprint features. First, preprocessing is done to obtain the finger and palm ROI (Region of Interest). Finger geometry features and knuckle print features of the index, middle, ring, and little fingers were extracted from the finger ROI, and palmprint features, represented by key points and their local descriptors, were extracted from the palm ROI. A coarse-to-fine hierarchical method was employed to match multiple features for efficient recognition in a large database. At the decision level, AND rule fusion was adopted, which showed an improvement in performance.

Tao et al. [26] proposed an optimal fusion scheme at the decision level using the AND or OR rule, based on optimizing the matching score thresholds. Both theoretical analysis and experimental results were presented. In theory, the proposed decision fusion always brings improvements over the original classifiers that are fused, and in practice it also improves system performance effectively, in a way comparable to or even better than conventional matching score fusion. Marcialis et al. [16] proposed a novel mathematical model for serial fusion, which is simple and able to predict the performance of two serially combined matchers. The proposed model helps the designer find a processing chain that allows a trade-off between performance and matching time. Experiments carried out on well-known benchmark datasets of face and fingerprint images support the usefulness of the proposed methodology when compared with standard parallel fusion.

Separation of the original signal from noise can be done in one of the following ways: homogeneous area pre-classification, which estimates the noise variance [14, 15]; image filtering techniques [1]; and local variance estimation [2]. By adopting the discrete cosine transform, image structure can be preserved by the high-frequency coefficients and the noise variance in the image can be computed from the low-frequency coefficients, which helps in separating the noise from the original image [17]. The authors of [17] proposed noise variance estimation in two steps: in the first step, the noise variance is estimated as a linear combination of normalized moments with learned coefficients, and in the second step a look-up table is generated by analyzing the cumulative distribution function (CDF) values of the training images. For a new image, the noise variance is obtained by looking up its CDF in the table. Pyatykh et al. proposed noise level estimation based on principal component analysis of image blocks; the reported contributions are computational efficiency, the ability to process non-homogeneous images, and good performance compared with state-of-the-art methods [22]. Kearns proposed a statistical approach to noise tolerance in which a learning algorithm is restricted from identifying individual samples of the unknown target function [13].

3 Preprocessing

In this section, we propose Region of Interest (ROI) extraction from Palmprint and Handvein images.

Fig. 1. Region of Interest extraction of Palmprint: (a) and (b) automatic contour detection; (c) and (d) locating the key points, which are used to select the ROI; (e) and (f) the final ROI is the rectangle formed by the 4 green points (Color figure online)

3.1 Rotation Invariant ROI Extraction from Palmprint Image

The algorithm proposed here extracts the essential palm region in a rotation-invariant manner. The main steps for obtaining the rectangular ROI, called the central part sub-image of a Palmprint, are summarized in Fig. 1 and are as follows:

  • Apply an anisotropic filter to improve edge detection and find the contours of the palmprint image.

  • Trace the boundaries of the holes between the fingers. Calculate the center of gravity of each hole to obtain the key points \(k_1\), \(k_2\), \(k_3\).

  • Join \(k_1\) and \(k_3\) to get the Y-axis of the Palmprint coordinate system, and then draw a line through \(k_2\) perpendicular to the Y-axis to determine the origin of the coordinate system.

  • Once the coordinate system is fixed, extract a fixed-size sub-image of the central part of the Palmprint.

  • Calculate the rotation angle from the above key points and crop the exact ROI.
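The geometric part of these steps can be sketched as follows. This is a minimal illustration that assumes the three key points have already been located; the function names, the sign convention of the angle, and the `offset` and `size` parameters are hypothetical and are not specified in the paper.

```python
import numpy as np

def palm_coordinate_system(k1, k2, k3):
    """Build the palm coordinate system from key points given as (x, y) pairs."""
    k1, k2, k3 = (np.asarray(p, dtype=float) for p in (k1, k2, k3))
    # Y-axis: unit vector along the line joining k1 and k3.
    y_axis = (k3 - k1) / np.linalg.norm(k3 - k1)
    # Origin: foot of the perpendicular dropped from k2 onto the Y-axis.
    origin = k1 + np.dot(k2 - k1, y_axis) * y_axis
    # X-axis: perpendicular to the Y-axis (direction choice is an assumption).
    x_axis = np.array([y_axis[1], -y_axis[0]])
    # Rotation of the palm Y-axis relative to the image's vertical axis, in degrees.
    angle_deg = np.degrees(np.arctan2(y_axis[0], y_axis[1]))
    return origin, y_axis, x_axis, angle_deg

def roi_corners(origin, y_axis, x_axis, offset, size):
    """Corners of a square ROI of side `size`, centred `offset` pixels along the X-axis."""
    centre = origin + offset * x_axis
    h = size / 2.0
    return np.array([centre + sx * h * x_axis + sy * h * y_axis
                     for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))])
```

Because the ROI is defined relative to the key points rather than the image axes, rotating the hand rotates the ROI with it, which is what makes the extraction rotation invariant.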

3.2 ROI Extraction from Handvein Images

There are essentially two reasons for proposing ROI extraction from Handvein images:

  1. The discriminative information in a Handvein image is located in the central region. There are very few or almost no useful features along the four sides.

  2. Extracting the ROI reduces the storage and computational burden.

Fig. 2. Region of Interest extraction of Handvein

Since the essential information lies in the central region of the Handvein image, we extract this region of interest to increase verification accuracy. Given a raw handvein image, the process of extracting the ROI is as follows.

  • Convert the gray scale image to binary by selecting a suitable threshold.

  • Scan the image contour from left to right (profiling).

  • Select the knuckle tip (K1) as the control point, then locate the other key points K2 and K3 by scanning.

  • Move an appropriate number of pixels inward from the key points to obtain a reliable ROI.

The steps of our ROI extraction algorithm are shown in Fig. 2(a)–(d).
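A simplified sketch of this threshold-and-crop idea is given below. It is not the key-point procedure itself: as an assumption made for brevity, the knuckle key points K1 to K3 are replaced by the bounding box of the binarised hand region, and `threshold` and `margin` are hypothetical parameters that would have to be tuned for a given database.

```python
import numpy as np

def handvein_roi(gray, threshold, margin):
    """Crop the central region of a hand image given as a 2D gray-scale array."""
    # Binarise; assumes the hand is brighter than the background.
    mask = gray > threshold
    # Row and column profiles of the binary image.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        raise ValueError("no hand region found above the threshold")
    # Move `margin` pixels inward from the extreme points of the hand region.
    top, bottom = rows[0] + margin, rows[-1] - margin
    left, right = cols[0] + margin, cols[-1] - margin
    if top >= bottom or left >= right:
        raise ValueError("margin is too large for the detected hand region")
    return gray[top:bottom + 1, left:right + 1]
```

Discarding the borders in this way keeps the central vein pattern while reducing the number of pixels passed to feature extraction.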

4 Proposed Multimodal System

A multimodal biometric system exhibits a number of advantages over a unimodal biometric system, as listed below [23]:

  • A multimodal biometric system works with more than one modality; hence, it offers a substantial improvement in accuracy compared to other multibiometric approaches.

  • Multimodal biometrics address non-universality by covering a larger population of users. If a user does not possess one valid biometric trait, he/she can still be enrolled into the system using another valid trait. This gives a certain degree of flexibility to the user.

  • Multimodal biometric systems are less sensitive to impostor attacks. It is very difficult to spoof a legitimate user enrolled in a multimodal biometric system.

  • Multimodal biometric systems are robust to noise in the sensed data, i.e., when the information acquired from one biometric trait is corrupted by noise, another trait of the same user can be used.

  • These systems also help in continuous monitoring or tracking of a person in situations where a single biometric trait is not enough, for example tracking a person using face and gait simultaneously.

Multimodal biometric systems are gaining popularity because they provide a very high degree of performance and also high universality [12]. Since a multimodal biometric system combines information from different biometric traits, its core task is the fusion of this information. Fusion can be carried out at four different levels: sensor level, feature level, match score level, and decision level [19].

4.1 Sensor Level Fusion

In sensor level fusion, raw images acquired by different sensors, or multiple snapshots, are combined. The aim is to retain the detailed information from both of the images subjected to fusion [11]. In our experiments, we adopted wavelet-based image fusion. Here, the high-resolution image is decomposed into a set of low-resolution images with wavelet coefficients at each level. A low-resolution image is then replaced with an MS band at the same spatial resolution level, and finally an inverse wavelet transformation converts the decomposed and replaced set back to the original resolution level. Feature extraction and matching are then performed on the fused image.
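As an illustration of wavelet-based fusion, the sketch below uses a single-level 2D Haar transform written directly in NumPy. The choice of the Haar wavelet, the single decomposition level, and the fusion rule (average the approximation bands, keep the larger-magnitude detail coefficients) are assumptions made for the example; the paper does not state which wavelet or rule was used. Both images are assumed to be registered, of equal size, and with even dimensions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(img):
    """Single-level 2D Haar transform; returns (LL, LH, HL, HH) sub-bands."""
    a = np.asarray(img, dtype=float)
    lo = (a[:, 0::2] + a[:, 1::2]) / SQRT2   # low-pass along columns
    hi = (a[:, 0::2] - a[:, 1::2]) / SQRT2   # high-pass along columns
    ll = (lo[0::2] + lo[1::2]) / SQRT2
    lh = (lo[0::2] - lo[1::2]) / SQRT2
    hl = (hi[0::2] + hi[1::2]) / SQRT2
    hh = (hi[0::2] - hi[1::2]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2], hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return out

def wavelet_fuse(img_a, img_b):
    """Fuse two registered images of identical, even-sized shape."""
    ca, cb = haar_dwt2(img_a), haar_dwt2(img_b)
    ll = (ca[0] + cb[0]) / 2.0                               # average approximations
    details = [np.where(np.abs(da) >= np.abs(db), da, db)    # keep stronger detail
               for da, db in zip(ca[1:], cb[1:])]
    return haar_idwt2(ll, *details)
```

The fused image returned by `wavelet_fuse` would then be passed to feature extraction in place of the individual images.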

4.2 Feature Level Fusion

In feature level fusion, the processing of each individual modality outputs a collection of features, and the fusion process combines these collections into a single feature set or vector. Feature level fusion performs well if the features are homogeneous (i.e., of the same nature). If the features are heterogeneous, normalization is required to bring them into a comparable range. We used three well-known normalization methods [24].

  1. Min-Max (MM) method: This method maps the raw scores (s) to the [0, 1] range. The quantities max(s) and min(s) specify the end points of the score range:

    $$\begin{aligned} n= \frac{s_{i} - \min (s)}{\max (s) - \min (s)} \end{aligned}$$
    (1)

  2. Z-score: This method transforms the scores to a distribution with mean of 0 and standard deviation of 1. The operators mean() and std() denote the arithmetic mean and standard deviation operators, respectively:

    $$\begin{aligned} n= \frac{s_{i} - \mathrm{mean}(s)}{\mathrm{std}(s)} \end{aligned}$$
    (2)

  3. Tanh: This method is among the so-called robust statistical techniques. It maps the raw scores to the (0, 1) range:

    $$\begin{aligned} n = \frac{1}{2} \Bigg [ \tanh \left\{ 0.01 \cdot \frac{s_{i} - \mathrm{mean}(s)}{\mathrm{std}(s)} \right\} + 1 \Bigg ] \end{aligned}$$
    (3)
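The three normalization rules translate directly into code. The sketch below follows Eqs. (1)–(3) as written; the `fuse_features` helper, which concatenates the normalized feature vectors of two modalities, is an assumption about how the single fused vector is formed and is not taken from the paper.

```python
import numpy as np

def min_max(s):
    """Eq. (1): map values to the [0, 1] range."""
    s = np.asarray(s, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def z_score(s):
    """Eq. (2): zero mean and unit standard deviation."""
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / s.std()

def tanh_norm(s):
    """Eq. (3): map values to the (0, 1) range."""
    s = np.asarray(s, dtype=float)
    return 0.5 * (np.tanh(0.01 * (s - s.mean()) / s.std()) + 1.0)

def fuse_features(face_vec, other_vec, normalise=min_max):
    """Normalise each modality separately, then concatenate (assumed fusion step)."""
    return np.concatenate([normalise(face_vec), normalise(other_vec)])
```

Normalizing each modality before concatenation prevents the modality with the larger numeric range from dominating the distance computation in the matcher.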

4.3 Score Level Fusion

In score level fusion, the matching scores produced by the individual matchers are fused into a single score using different rules, such as the sum rule, min rule, and max rule; the fused score is then compared with the system's acceptance threshold. Let \(n_{i}^{m}\) denote the normalized score of matcher m (\(m=1,2, \cdots , M\), where M is the number of matchers) applied to user i (\(i=1,2,\cdots ,I\), where I is the number of samples in the database). The fused score for user i is denoted by \(f_{i}\). Popular rules used in score level fusion are [5]:

  • Sum Rule: \(f_{i} = \sum _{m=1}^{M} n_{i}^{m} \quad \forall i\)

  • Min Rule: \(f_{i} = \min (n_{i}^{1}, n_{i}^{2},\cdots ,n_{i}^{M}) \quad \forall i\)

  • Max Rule: \(f_{i} = \max (n_{i}^{1}, n_{i}^{2},\cdots ,n_{i}^{M}) \quad \forall i\)
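A minimal sketch of these three rules is shown below. It assumes the normalized scores are arranged as an array of shape (M, I), one row per matcher and one column per user; the function name `fuse_scores` is illustrative.

```python
import numpy as np

def fuse_scores(scores, rule="sum"):
    """Fuse normalised scores of shape (M matchers, I users) into I fused scores."""
    scores = np.asarray(scores, dtype=float)
    if rule == "sum":
        return scores.sum(axis=0)
    if rule == "min":
        return scores.min(axis=0)
    if rule == "max":
        return scores.max(axis=0)
    raise ValueError(f"unknown rule: {rule}")

# Two matchers (e.g. face and palmprint) scoring two users.
example = [[0.2, 0.9],
           [0.6, 0.5]]
fused = fuse_scores(example, "sum")   # fused[i] is then compared with a threshold
```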

4.4 Decision Level Fusion

Fusion at the decision level is done by applying an appropriate threshold to each matcher, but only a small amount of information is available when taking the decision [5]: each matcher contributes only a Boolean output, accept or reject. Hence, this level of fusion is not very accurate. The outputs of the matchers are merged into one final decision using the logical AND or OR rule.
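The AND and OR rules can be sketched as follows, assuming each matcher accepts a user when its score reaches that matcher's own threshold. The function name and the array layout (M matchers by I users) are illustrative choices.

```python
import numpy as np

def fuse_decisions(scores, thresholds, rule="and"):
    """Combine per-matcher accept/reject decisions into one decision per user."""
    scores = np.asarray(scores, dtype=float)            # shape (M, I)
    thresholds = np.asarray(thresholds, dtype=float)    # shape (M,)
    decisions = scores >= thresholds[:, None]           # Boolean accept per matcher
    if rule == "and":
        return decisions.all(axis=0)   # accept only if every matcher accepts
    if rule == "or":
        return decisions.any(axis=0)   # accept if at least one matcher accepts
    raise ValueError(f"unknown rule: {rule}")
```

The AND rule lowers false acceptances at the cost of more false rejections, while the OR rule does the opposite.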

4.5 Log-Gabor Based Feature Extraction

We adopted Log-Gabor as the feature extraction algorithm for all the selected modalities in order to represent the biometric samples effectively. The Log-Gabor transform has consistently achieved high recognition rates in traditional unimodal biometric systems. Its response is Gaussian when viewed on a logarithmic frequency scale instead of a linear one. Because of this Gaussian profile, Log-Gabor filters provide an optimal joint space-frequency localization, and their shape is smooth, symmetric, and infinitely differentiable [9]. Hence, the Log-Gabor transform captures more information in high-frequency areas and possesses high-pass characteristics, thereby reflecting the frequency response of an image more realistically. On the linear frequency scale, the transfer function of the Log-Gabor transform has the form [8]

$$\begin{aligned} G(\omega )= \exp \left\{ \frac{ -\left[ \log (\omega / \omega _{0})\right] ^{2} }{2 \left[ \log (k/\omega _{0})\right] ^{2}} \right\} \end{aligned}$$
(4)

where \(\omega _{0}\) is the filter’s center frequency. To obtain constant shape ratio filters, the term \(k/\omega _{0}\) must also be held constant for varying \(\omega _{0}\). In our experiments, we selected effective parameters: each filter has a bandwidth of approximately 2 octaves, and the filter bank is constructed with 8 orientations and 4 different scales.
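A sketch of a Log-Gabor filter bank with 4 scales and 8 orientations, built in the frequency domain from Eq. (4), is given below. Several values are assumptions not stated in the paper: the ratio \(k/\omega _{0} = 0.55\) is a commonly quoted setting for roughly 2 octaves of bandwidth, while `min_wavelength`, `mult`, the Gaussian angular spread, and the use of response magnitudes as features are illustrative choices.

```python
import numpy as np

def log_gabor_radial(omega, omega0, ratio=0.55):
    """Eq. (4); `ratio` plays the role of k / omega0."""
    omega = np.asarray(omega, dtype=float)
    g = np.zeros_like(omega)
    pos = omega > 0                    # log is undefined at DC, so G(0) is set to 0
    g[pos] = np.exp(-np.log(omega[pos] / omega0) ** 2 / (2 * np.log(ratio) ** 2))
    return g

def log_gabor_bank(size, n_scales=4, n_orient=8,
                   min_wavelength=3.0, mult=2.0, ratio=0.55):
    """Return filters of shape (n_scales, n_orient, size, size) in FFT layout."""
    f = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(f, f)
    radius = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)
    sigma_theta = np.pi / n_orient     # angular spread (assumption)
    bank = np.empty((n_scales, n_orient, size, size))
    for s in range(n_scales):
        omega0 = 1.0 / (min_wavelength * mult ** s)   # centre frequency of scale s
        radial = log_gabor_radial(radius, omega0, ratio)
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            # Angular distance to the filter orientation, wrapped to [-pi, pi].
            d = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            bank[s, o] = radial * np.exp(-d ** 2 / (2 * sigma_theta ** 2))
    return bank

def log_gabor_features(image, bank):
    """Filter a square image with every filter and return the response magnitudes."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    responses = np.fft.ifft2(spectrum * bank)   # broadcasts over scale and orientation
    return np.abs(responses).reshape(-1)
```

In practice the magnitude maps would usually be downsampled or summarized per block before matching, since the raw vector has one value per pixel per filter.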

5 Experimental Results and Discussion

5.1 Experimental Setup

In this section, we describe the experimental setup used in our study. For face samples, we used the AR database; for Palmprint and high-resolution fingerprint, we used the PolyU databases; and for Iris and Handvein, we used the CASIA and Bosphorus databases, respectively. Since the Handvein database contains at most five samples per user, in all experiments three samples of each user were used for training and two for testing. The performance was studied under both clean and noisy conditions, for the unimodal systems as well as the different face-centric multimodal approaches.

Fig. 3. A few example images of the face, palmprint, fingerprint, handvein, and iris modalities: clean images (top) and images corrupted by noise (bottom)

To validate the robustness of both the unimodal and multimodal systems, we simulated real-time noise in our experiments by adding Gaussian noise. For training we used three clean samples of each modality, and for testing we used the two remaining samples corrupted by noise. The clean and noise-corrupted images are shown in Fig. 3.
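This protocol can be sketched as below. The noise standard deviation `sigma`, the assumption that images are scaled to [0, 1], and the helper names are illustrative; the paper does not report the noise level used.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to an image with intensities in [0, 1]."""
    image = np.asarray(image, dtype=float)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)    # keep intensities in the valid range

def split_and_corrupt(samples, sigma, rng, n_train=3):
    """First `n_train` samples stay clean for training; the rest are corrupted for testing."""
    train = list(samples[:n_train])
    test = [add_gaussian_noise(s, sigma, rng) for s in samples[n_train:]]
    return train, test

rng = np.random.default_rng(0)
user_samples = [np.full((8, 8), 0.5) for _ in range(5)]   # five samples of one user
train, test = split_and_corrupt(user_samples, sigma=0.1, rng=rng)
```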

5.2 Results on Clean vs Noisy Unimodal Biometric Systems

The main objective of our experiments is to show the effect of the level of fusion under the different strategies of a face-centric multimodal system. We chose the Face, Palmprint (Pp), Iris, Handvein (Hv), and Fingerprint (FP) modalities, with Log-Gabor as the feature extraction algorithm, which is expected to perform well on all of these modalities. In all of our experiments, the performance of each level of fusion is measured as the Genuine Acceptance Rate (GAR, in %) at False Acceptance Rate (FAR) values of 0.01%, 0.1%, and 1%, and the results are tabulated.
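For reference, GAR at a fixed FAR can be computed from genuine and impostor score sets as sketched below. The sketch assumes similarity scores (higher means a better match) and an acceptance test of the form score > threshold; both are conventions chosen for the example rather than details given in the paper.

```python
import numpy as np

def gar_at_far(genuine, impostor, far):
    """Genuine acceptance rate at the threshold where at most `far` of impostors pass."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))
    n = impostor.size
    # Number of impostor scores allowed above the threshold.
    k = int(np.floor(far * n + 1e-9))
    # Threshold: the largest impostor score that must still be rejected.
    threshold = -np.inf if k >= n else impostor[n - k - 1]
    return float(np.mean(genuine > threshold))

impostor_scores = np.arange(100) / 100.0            # 0.00, 0.01, ..., 0.99
genuine_scores = [0.5, 0.95, 0.985, 0.999]
gar_1pct = gar_at_far(genuine_scores, impostor_scores, far=0.01)
```

Note that a FAR of 0.01% can only be resolved when at least 10,000 impostor comparisons are available; with fewer, the threshold falls back to rejecting every impostor score.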

Table 1. Performance analysis of unimodal system

Initially, we conducted experiments on each single modality; the performance is measured for each modality and the results are tabulated in the form Clean\(^{(Noise)}\). Table 1 indicates that, for both clean and noisy data, the GAR of the face modality at 0.1% and 1% FAR is higher than that of the other modalities: fingerprint, palmprint, Handvein, and iris. However, at 0.01% FAR the clean palmprint biometric outperforms the other biometrics considered. The lowest performance in the table is observed for the iris unimodal system under both clean and noisy conditions.

5.3 Results on Clean vs Noisy Multimodal Biometric Systems

In this section, we present empirical results comparing the different levels of fusion under various rules in developing the multimodal approach. Different bimodal biometric systems are proposed, keeping the face as the common modality and fusing it with each of the other modalities employed.

Table 2. Face and palmprint fusion at various levels

Table 2 shows the evaluation of the fusion of the face and palmprint modalities, comparing sensor, feature, score, and decision level fusion with their relevant fusion rules. For this bimodal verification system, the GAR at 1% FAR on the clean and noisy databases, respectively, is: sensor level fusion (94.5%, 92.5%), feature level with Min-Max normalization (96%, 94%), score level with the sum rule (97.5%, 96%), and decision level with the OR rule (97.5%, 88.5%). The sum rule of score level fusion gives the highest GAR at the different values of FAR compared to the other fusion levels and rules. These results show that performance is higher than in the unimodal case examined earlier.

Table 3. Face and fingerprint fusion at various levels
Table 4. Face and Handvein fusion at various levels

Results for the fusion of face and fingerprint are tabulated in Table 3. The GAR at 1% FAR for the clean and noisy databases, respectively, is: sensor level fusion (93.5%, 89.5%), feature level with Tanh normalization (76.5%, 72%), score level with the sum rule (96.5%, 92%), and decision level (92%, 73.5%). Here again, the sum rule of score level fusion performs better than the other levels of fusion. Sensor level fusion also performs well. The lowest performance is found for the decision level AND rule.

Table 4 reports the results for the multimodal approach of face and Handvein. Although score level fusion with the sum rule gives the highest performance (95%, 92.5%) on clean and noisy data, respectively, the feature level normalization schemes Min-Max, Z-score, and Tanh perform equally well in terms of GAR at the different values of FAR.

Table 5. Face and Iris fusion at various levels

Finally, we fused Face with Iris at different levels of fusion. The results in Table 5 show that all the normalization schemes of feature level fusion outperform the other levels of fusion and give the highest accuracy. Across all sets of experiments, score level fusion performs consistently.

6 Conclusions

This paper presents an overview of unimodal and multimodal biometric verification systems and a comparative study of the levels of fusion. As seen from our results, multimodal biometric systems address the drawbacks faced by unimodal systems. From the experimental results, we arrive at the following inferences: (a) The performance of every biometric modality depends on feature extraction; in our case we used a generalized feature extractor, Log-Gabor, which gives good performance. (b) A new modality introduced into a biometric system gives complementary information; hence, one can find a significant improvement in accuracy. (c) In score level fusion, the sum rule gives better performance than the other levels of fusion in most of the combinations studied. (d) While improved performance is available with additional modalities, judging the right combination is critical. (e) The proposed model gathers results on both clean and noisy data at all levels of fusion under the adopted rules, which provides the robustness analysis; the results show that the system performs consistently on both kinds of data in most of the evaluation sets presented in the tables. In general, the multimodal approach yields better results than the unimodal one. However, developing a generalized framework that governs the dynamic selection of modality, fusion, and feature extraction algorithms implies higher cost and more processing, and such a framework is difficult to deploy and maintain; our future work is intended to pursue this idea.