1 Introduction

Biometric recognition systems are increasingly used to recognize individuals in offices, government departments and banks, and even for obtaining essential commodities. In a biometric identification system, the biometric traits of an individual (i.e., a probe) are matched against those of every enrolled user (i.e., subject) in the system. In a verification system, by contrast, the biometric traits of an individual are compared only with that individual’s own stored traits. Traditionally, biometric systems have been unimodal, using a single biometric trait to establish one’s identity. Unimodal biometric systems, however, face several challenges: non-universality, noisy data, inter-class similarities, intra-class variation, interoperability issues and susceptibility to circumvention. Multimodal biometric systems have been adopted to overcome these challenges. Apart from filling the gaps of a unimodal biometric system, a multimodal biometric system is more secure and more accurate, as has been demonstrated over time.

Multimodal biometrics combine information from multiple biometric traits (fingerprint and face [1], different finger surfaces [2], face and palm-print [3], etc.) or from multiple representations of the same biometric trait (various feature descriptors of palm-print biometrics [1]). This fusion of information can occur at the sensor level, feature level, score level, rank level or decision level. The scope of the current paper is limited to rank level fusion. Rank level fusion [1,2,3,4,5,6] is useful when little information (similarity or dissimilarity scores between the probe and the enrolled subjects) is available to perform the fusion. Scores from different modalities can also be fused using a score level fusion method [7,8,9,10], but these scores need to be normalized [10] before they can be fused. Alternatively, the scores of each modality can simply be sorted to produce a rank list. Multi-level fusion schemes [11] can also be found in the literature.

A rank level fusion method combines several rank lists obtained from multiple biometric modalities. A rank list, in this context, is a list of the ranks of all subjects as compared with a probe (based on a similarity or a dissimilarity score). Several rank level fusion methods have been proposed in the literature. Borda count, weighted Borda count, highest rank and logistic regression are the basic rank level fusion strategies [12]. For example, a rank level fusion method based on logistic regression and Borda count for combining kinetic gait and face biometrics has been proposed in [4]. In [3], face and palm-print have been fused at rank level using highest rank, Borda count and logistic regression.
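As a brief illustration of the simplest of these strategies, the following Python sketch fuses rank lists with the commonly used forms of Borda count (sum of ranks), weighted Borda count (weighted sum of ranks) and highest rank (minimum rank). It is not taken from the cited works: the function names, the list-of-ranks data layout and the tie-breaking by subject index are assumptions made for illustration only.

```python
def borda_count(rank_lists, weights=None):
    """Fused value per subject: (weighted) sum of its ranks; lower is better.

    rank_lists[i][j] is the rank (1 = best) of subject j in modality i.
    With weights=None this is the plain Borda count.
    """
    weights = weights or [1.0] * len(rank_lists)
    n_subjects = len(rank_lists[0])
    return [sum(w * L[j] for w, L in zip(weights, rank_lists))
            for j in range(n_subjects)]


def highest_rank(rank_lists):
    """Fused value per subject: the best (smallest) rank it received."""
    n_subjects = len(rank_lists[0])
    return [min(L[j] for L in rank_lists) for j in range(n_subjects)]


def fused_order(fused_values):
    """Subject indices sorted from best to worst (ties broken by index)."""
    return sorted(range(len(fused_values)), key=lambda j: fused_values[j])


# Three modalities ranking three subjects (hypothetical toy data).
toy_lists = [[1, 2, 3], [2, 1, 3], [1, 3, 2]]
borda_values = borda_count(toy_lists)
highest_values = highest_rank(toy_lists)
```

In this toy example subject 0 is ranked first by both rules, since it has the smallest rank sum and shares the best minimum rank.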

Apart from the above methods, several non-linear rank level fusion methods also exist. In [2], a few such non-linear weighted rank methods have been used for rank level fusion of finger surface biometrics: hyperbolic tangent, hyperbolic arc sine, hyperbolic arc tangent, division exponential and logarithm. In [1], three rank lists (obtained from three different feature descriptors of palm-print) have been fused using two non-linear methods, namely the exponential and weighted exponential methods. In [5], rank lists generated by multiple rank level fusion methods have been consolidated either serially or in parallel. A serial combination is obtained by composing two functions as \(f_2(f_1(x))\), where \(f_1(x)\) and \(f_2(x)\) are two different rank level fusion methods. A parallel combination is obtained by combining all the rank lists generated by the various rank level fusion methods using a hyperbolic tangent rank level fusion method. In [6], multimodal biometrics involving face, iris and ear are fused at rank level using a Markov chain. In this method, a Markov chain is established on the enrolled subjects, and its transitions represent an order relation among these subjects. A rank list is obtained from the stationary distribution of this Markov chain.

In a completely different approach, the present paper treats the rank level fusion of multimodal biometrics as an optimization problem. In this context, the goal is to minimize the distances between an aggregated rank list and the input rank lists. Two distance measures that are widely used in rank aggregation problems, namely (i) the Spearman footrule distance [13] and (ii) Kendall’s tau distance [13], are considered in the proposed method. An approach based on the cross-entropy Monte Carlo algorithm [14] is proposed to solve this optimization problem in the context of multimodal biometrics. The proposed approach has been experimentally studied on two different datasets: (i) NIST BSSR1 [15] and (ii) OU-ISIR BSS4 [16, 17]. The experimental results support the suitability of the proposed rank level fusion approach for multimodal biometrics.

The rest of this paper is organized as follows: A detailed formulation of rank level fusion of multimodal biometrics as an optimization problem is presented in Sect. 2. Section 3 describes the proposed rank level fusion method using cross-entropy Monte Carlo method. Section 4 reports the results of applying the proposed rank level fusion method as well as several state-of-the-art fusion methods on two multimodal biometric datasets. Finally, Sect. 5 draws the concluding remarks.

2 Formulation of Rank Level Fusion as an Optimization Problem

Let \(P_1\), \(P_2\), ... , \(P_N\) be the biometric traits used to identify a person. For an input probe, let \(Q_i^j\) denote the matching score of the \(j^{th}\) person (subject) under biometric trait \(P_i\). A rank list \(L_i\) of the subjects can be generated by ordering these matching scores. Considering a high value of \(Q_i^j\) as good (i.e., a similarity score), the following holds for the rank list \(L_i\): \(Q_i^j > Q_i^k\) implies \(L_i^j < L_i^k\). Here, \(L_i^j\) indicates the rank of the \(j^{th}\) subject in the list \(L_i\).
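The construction of a rank list from matching scores can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ranks_from_scores` and the `higher_is_better` flag are hypothetical, and ties between equal scores are broken by subject index, which the text above does not specify.

```python
def ranks_from_scores(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best match) for a list of matching scores.

    scores[j] is the score Q_i^j of subject j for one modality.
    Use higher_is_better=False for dissimilarity scores.
    """
    # Subject indices ordered from best to worst score (stable for ties).
    order = sorted(range(len(scores)), key=lambda j: scores[j],
                   reverse=higher_is_better)
    ranks = [0] * len(scores)
    for position, subject in enumerate(order, start=1):
        ranks[subject] = position
    return ranks


similarity_ranks = ranks_from_scores([0.2, 0.9, 0.5])
dissimilarity_ranks = ranks_from_scores([0.2, 0.9, 0.5], higher_is_better=False)
```

For similarity scores the subject with the largest score receives rank 1, which matches the implication \(Q_i^j > Q_i^k \Rightarrow L_i^j < L_i^k\) stated above.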

Fig. 1. Fusion of multimodal biometrics at rank level

Therefore, N rank lists are created as \(L_1\), \(L_2\), ..., \(L_N\) for biometric traits \(P_1\), \(P_2\), ... , \(P_N\), respectively. A combination of these N rank lists generates an aggregated rank list \(\delta ^*\) as shown in Fig. 1.

$$\begin{aligned} \delta ^* = aggregate(L_1, L_2, ... L_N) \end{aligned}$$
(1)

The objective, here, is to generate the aggregated rank list \(\delta ^*\) having minimum distances from the input rank lists \(L_1\), \(L_2\), ..., \(L_N\). Hence, the objective function to generate an aggregated rank list can be defined as:

$$\begin{aligned} minimize \,\,\, \varPhi (\delta ) = \sum _{i=1}^N w_i \times d(\delta ,L_i) \end{aligned}$$
(2)

  • \(\delta \) denotes an aggregated rank list.

  • \(L_i\) is the \(i^{th}\) input rank list (as obtained from biometric modality \(P_i\)).

  • N denotes the number of modalities.

  • \(w_i\) denotes the weight associated with rank list \(L_i\). For the experiments reported in this paper, every input rank list has been assigned the same weight.

  • d denotes a distance metric between two lists.

The goal is to obtain an aggregated rank list \(\delta ^*\), which minimizes the objective function \(\varPhi (\delta )\), among the set of all candidate rank lists.
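To make the optimization view concrete, the sketch below evaluates the objective of Eq. 2 and finds its minimizer by exhaustive search over all permutations, which is feasible only for a handful of subjects. It is an illustration rather than the method proposed in this paper: the function names are hypothetical, rank lists are assumed to be complete tuples of ranks indexed by subject, and a sum of absolute rank differences stands in for the distance d.

```python
from itertools import permutations


def objective(delta, rank_lists, weights, distance):
    """Phi(delta) = sum_i w_i * d(delta, L_i), as in Eq. 2."""
    return sum(w * distance(delta, L) for w, L in zip(weights, rank_lists))


def brute_force_aggregate(rank_lists, weights, distance):
    """Try every permutation of ranks 1..n and keep the one minimizing Phi."""
    n = len(rank_lists[0])
    candidates = permutations(range(1, n + 1))
    return min(candidates,
               key=lambda delta: objective(delta, rank_lists, weights, distance))


def footrule(a, b):
    """Sum of absolute rank differences between two complete rank lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Three input rank lists over four subjects, equally weighted (toy data).
input_lists = [(1, 2, 3, 4), (1, 3, 2, 4), (2, 1, 3, 4)]
equal_weights = [1.0, 1.0, 1.0]
delta_star = brute_force_aggregate(input_lists, equal_weights, footrule)
phi_star = objective(delta_star, input_lists, equal_weights, footrule)
```

The number of candidates grows factorially with the number of subjects, which is why a stochastic search is needed for realistic gallery sizes.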

Spearman footrule [13] and Kendall’s tau [13] distances are applied here to calculate the distance between two rank lists.

To estimate the distance between two rank lists, the Spearman footrule distance [13] sums the absolute differences between the ranks of each subject in the two lists:

$$\begin{aligned} d(\delta , L_i) = S(\delta , L_i)= \sum _{t \in L_i \cup \delta } |r^{\delta }(t)- r^{L_i}(t)| \end{aligned}$$
(3)

Here, \(r^{\delta }(t)\) represents the rank of subject t in the list \(\delta \). \(r^{L_i}(t)\) represents the rank of subject t in the input rank list \(L_i\).
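Eq. 3 can be sketched as follows. The dictionary-based representation and the names `rank_of` and `spearman_footrule` are assumptions for illustration; a subject that is absent from one list is given a rank one larger than the size of that list, following the convention stated at the end of this section.

```python
def rank_of(ranks, subject):
    """Rank of a subject in a list; absent subjects get len(list) + 1."""
    return ranks.get(subject, len(ranks) + 1)


def spearman_footrule(delta, L):
    """Spearman footrule distance (Eq. 3) between two rank lists.

    Each list is a dict mapping a subject identifier to its rank (1 = best).
    """
    subjects = set(delta) | set(L)
    return sum(abs(rank_of(delta, t) - rank_of(L, t)) for t in subjects)


full_distance = spearman_footrule({"A": 1, "B": 2, "C": 3},
                                  {"A": 3, "B": 1, "C": 2})
partial_distance = spearman_footrule({"A": 1, "B": 2}, {"B": 1, "C": 2})
```

In the second call, subject A is missing from the second list and subject C from the first, so each is assigned rank 3 in the list where it does not appear.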

As per Kendall’s tau distance [13], the distance between the aggregated rank list \(\delta \) and the input rank list \(L_i\) is estimated by counting the number of disagreements in the rank ordering by considering every pair of subjects between these two lists.

$$\begin{aligned} d(\delta , L_i) = K(\delta , L_i)= \sum _{t,u \in L_i \cup \delta } K^p_{tu} \end{aligned}$$
(4)
$$\begin{aligned} \text {where } K^p_{tu}= {\left\{ \begin{array}{ll} 0 &{} \text {if }r^{\delta }(t)< r^{\delta }(u), r^{L_i}(t)< r^{L_i}(u) \\ &{} \text {or }r^{\delta }(t)> r^{\delta }(u), r^{L_i}(t)> r^{L_i}(u),\\ 1 &{} \text {if }r^{\delta }(t)> r^{\delta }(u), r^{L_i}(t)< r^{L_i}(u) \\ &{} \text {or }r^{\delta }(t)< r^{\delta }(u), r^{L_i}(t)> r^{L_i}(u),\\ p &{} \text {if }r^{\delta }(t) = r^{\delta }(u) \text { or } r^{L_i}(t)=r^{L_i}(u) \end{array}\right. } \end{aligned}$$
(5)

Here, p is the penalty applied when a pair of subjects is tied in either list; its value is set to 0.5.

If a subject t is not present in one of the two lists (either \(\delta \) or \(L_i\)), its rank in that list (\(r^{\delta }(t)\) or \(r^{L_i}(t)\)) is taken to be one more than the size of the list.
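A possible Python rendering of Eqs. 4 and 5, including the rule for missing subjects, is sketched below. The names are hypothetical, and the sketch assumes that each unordered pair of subjects is counted once, which Eq. 4 leaves implicit.

```python
from itertools import combinations


def rank_of(ranks, subject):
    """Rank of a subject in a list; absent subjects get len(list) + 1."""
    return ranks.get(subject, len(ranks) + 1)


def kendall_tau(delta, L, p=0.5):
    """Kendall's tau distance with tie penalty p (Eqs. 4 and 5).

    Each list is a dict mapping a subject identifier to its rank (1 = best).
    """
    subjects = sorted(set(delta) | set(L))
    total = 0.0
    for t, u in combinations(subjects, 2):
        diff_delta = rank_of(delta, t) - rank_of(delta, u)
        diff_L = rank_of(L, t) - rank_of(L, u)
        if diff_delta == 0 or diff_L == 0:
            total += p          # the pair is tied in at least one list
        elif diff_delta * diff_L < 0:
            total += 1          # the two lists order the pair differently
        # otherwise the lists agree on the pair and nothing is added
    return total


full_tau = kendall_tau({"A": 1, "B": 2, "C": 3}, {"A": 3, "B": 1, "C": 2})
partial_tau = kendall_tau({"A": 1, "B": 2}, {"C": 1})
```

In the second call, subjects A and B are both absent from the second list and therefore share rank 2 there, so the pair (A, B) contributes the penalty p.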

3 Cross-Entropy Monte Carlo Algorithm

In this paper, the cross-entropy Monte Carlo algorithm is used to solve the above optimization problem (Eq. 2). The cross-entropy (CE) Monte Carlo algorithm [14] is an iterative method for solving combinatorial problems, and it is described in this section.

A rank list \(\delta \) can be represented as a matrix X of size \(n \times k\) with entries \(x_{jr} \in \{0,1\}\), where n is the number of subjects in the list and k is the total number of rank positions. Each row of X represents a subject (person) and each column represents a rank. The entries of X must satisfy the constraints that also appear in Eq. 6: the values \(x_{jr}\) in each column sum to 1 (every rank position is occupied by exactly one subject) and the values in each row sum to at most 1 (a subject occupies at most one rank position), so that every row sums to exactly 1 when \(k = n\). The position of the 1 in a subject’s row gives the rank of that subject, and the matrix X thus uniquely defines a rank list of size k. For example, a set of subjects A, B, C and D having ranks 3, 1, 2 and 4, respectively, can be represented using a \(4 \times 4\) matrix X as:

$$ X= \begin{bmatrix} 0 &{} 0 &{} 1 &{} 0\\ 1 &{} 0 &{} 0 &{} 0\\ 0 &{} 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 1 \end{bmatrix} $$
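The mapping between a rank list and its matrix representation can be sketched as below. The helper names are hypothetical, and `None` is used here as an assumed marker for a subject that holds none of the k rank positions.

```python
def ranks_to_matrix(ranks, k):
    """Build the n x k 0/1 matrix X with x[j][r-1] = 1 iff subject j has rank r."""
    X = [[0] * k for _ in range(len(ranks))]
    for j, r in enumerate(ranks):
        if r is not None and 1 <= r <= k:
            X[j][r - 1] = 1
    return X


def matrix_to_ranks(X):
    """Recover each subject's rank from X (None if its row is all zeros)."""
    return [row.index(1) + 1 if 1 in row else None for row in X]


# Subjects A, B, C and D with ranks 3, 1, 2 and 4, as in the example above.
X = ranks_to_matrix([3, 1, 2, 4], k=4)
recovered = matrix_to_ranks(X)
```

With `k` smaller than the number of subjects, rows of subjects ranked beyond position k remain all zeros, which corresponds to a row sum of 0.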

The solution space \(\chi \) can be defined for the proposed optimization problem as a collection of all such feasible matrices. Here, the aim is to find the rank list (i.e., the matrix X) with minimum objective function value over the solution space. This will generate the aggregated rank list.

X follows the probability mass function (pmf) \(P_{v^t}(X)\), which is parameterized by a matrix \(v = [p_{jr}]_{n\times k}\); \(v^t\) denotes the value of this parameter matrix at iteration t.

$$\begin{aligned} P_{v^t}(X) \propto \prod ^n_{j=1} \prod ^k_{r=1}&(p_{jr})^{x_{jr}} \nonumber \\&\,\,\,\,\times I(\varSigma ^k_{r=1} x_{jr}\le 1, 1\le j\le n; \varSigma ^n_{j=1} x_{jr}= 1, 1\le r\le k) \end{aligned}$$
(6)

Here, the element at the \(jr^{th}\) position of the X matrix is denoted by \(x_{jr}\). Each element in the parameter matrix v is denoted as \(p_{jr}\) which indicates the probability of \(j^{th}\) subject having \(r^{th}\) rank.

The steps of the CE algorithm [14] are given below. Steps 2 to 4 are repeated until the algorithm converges.

  1. Initialization: Let t denote the current iteration number, which is initialized to 0. The parameter matrix \(v^0\) of size \(n \times k\) is initialized with the same value in every element. In this matrix, each row corresponds to a subject (person) and each column corresponds to a rank. Initially, the probability \(p^0_{jr}\) of the \(j^{th}\) subject having the \(r^{th}\) rank (i.e., the \(jr^{th}\) element of \(v^0\)) is set to 1/n, where n is the total number of subjects (persons) in the list and k is the total number of rank positions. Hence, every subject initially has an equal chance of appearing at any position of a candidate list. The objective function \(\varPhi (\delta )\) in Eq. 2 is evaluated for such a candidate list.

  2. Sampling: At each iteration t, \(N_S\) samples (candidate lists) are generated from the probability mass function (pmf) \(P_{v^t}(X)\) (Eq. 6). For each of the \(N_S\) candidate lists \(\delta _i\), the objective function value \(\varPhi (\delta _i)\) is computed using Eq. 2. Sorting these values in ascending order yields the \(\rho \)-quantile \(y^t = \varPhi _{[\rho N_S]}\). The value of \(\rho \) is set to 0.1 for the reported experiments.

  3. Update: The probability values \(p^{t+1}_{jr}\) are updated as:

    $$\begin{aligned} p^{t+1}_{jr} = (1-w)p^t_{jr}+ w \frac{\sum ^{N_S}_{i=1} I(\varPhi (\delta _i)\le y^t)x^{(i)}_{jr}}{\sum ^{N_S}_{i=1} I(\varPhi (\delta _i)\le y^t)}, \end{aligned}$$
    (7)

    where \(x^{(i)}_{jr}\) is the entry at the \(jr^{th}\) position of the \(i^{th}\) sample (denoted by matrix \(X_i\)) and w is a weight parameter (distinct from the modality weights \(w_i\) in Eq. 2). For the experiments, \(N_S\) is set to 1000, k is set to 5 and w is set to 0.25.

  4. Convergence: The algorithm stops when the changes in the optimal list remain below a threshold for a specified number of iterations. This number of iterations is set to 5 for the reported experiments.
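The four steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the implementation used for the reported experiments: the function names are hypothetical, only the Spearman footrule distance is used, candidates are drawn by filling rank positions one at a time (a simple way of satisfying the constraints of Eq. 6, not necessarily the sampler of [14]), and convergence is taken to mean that the best list found so far has not changed for `patience` iterations.

```python
import random


def objective(candidate, rank_lists, weights, k):
    """Eq. 2 with the Spearman footrule (Eq. 3) as the distance d.

    candidate is a tuple of k subject indices ordered from rank 1 to rank k;
    subjects absent from the candidate are given rank k + 1.
    """
    cand_rank = {subject: r + 1 for r, subject in enumerate(candidate)}
    total = 0.0
    for w_i, L in zip(weights, rank_lists):
        total += w_i * sum(abs(cand_rank.get(j, k + 1) - L[j])
                           for j in range(len(L)))
    return total


def sample_candidate(P, k, rng):
    """Fill positions 1..k in turn, drawing an unused subject j with
    probability proportional to p_jr (assumed sampling scheme)."""
    n = len(P)
    chosen = []
    for r in range(k):
        probs = [0.0 if j in chosen else P[j][r] for j in range(n)]
        chosen.append(rng.choices(range(n), weights=probs)[0])
    return tuple(chosen)


def ce_aggregate(rank_lists, k, weights=None, n_samples=1000, rho=0.1,
                 w=0.25, patience=5, max_iter=200, seed=0):
    rng = random.Random(seed)
    n = len(rank_lists[0])
    weights = weights or [1.0] * len(rank_lists)
    P = [[1.0 / n] * k for _ in range(n)]            # Step 1: uniform v^0
    best, best_val, unchanged = None, float("inf"), 0
    for _ in range(max_iter):
        # Step 2: sample candidates and find the rho-quantile y^t.
        samples = [sample_candidate(P, k, rng) for _ in range(n_samples)]
        scored = sorted((objective(c, rank_lists, weights, k), c)
                        for c in samples)
        y = scored[max(1, int(rho * n_samples)) - 1][0]
        elite = [c for value, c in scored if value <= y]
        # Step 3: smoothed update of every p_jr (Eq. 7).
        for j in range(n):
            for r in range(k):
                freq = sum(1 for c in elite if c[r] == j) / len(elite)
                P[j][r] = (1 - w) * P[j][r] + w * freq
        # Step 4: stop once the best list has been stable for `patience` iterations.
        if scored[0][0] < best_val:
            best_val, best, unchanged = scored[0][0], scored[0][1], 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return best, best_val


# Toy problem: three rank lists over four subjects (rank_lists[i][j] = rank of j).
toy_lists = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
ce_best, ce_value = ce_aggregate(toy_lists, k=4, n_samples=500)
```

The defaults for `n_samples`, `rho`, `w` and `patience` mirror the values quoted in the steps above, while `max_iter` and `seed` are additions of this sketch to guarantee termination and reproducibility.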

4 Experimental Results

The performance of the proposed cross-entropy Monte Carlo method (using the Spearman footrule and Kendall’s tau distances independently) is experimentally compared against several existing fusion methods (at both rank level and score level). These existing state-of-the-art methods are listed in Sect. 4.1. The comparison has been carried out for two different multimodal biometric tasks, and hence two different datasets have been used in the experiments. These datasets, along with the corresponding performance measures of all the compared methods, are reported in Sects. 4.2 and 4.3.

4.1 State-of-the-Art Methods for Performance Comparison

The performance of the proposed rank level technique has been compared against existing state-of-the-art linear and non-linear rank level fusion methods. The Borda count, weighted Borda count and highest rank methods are linear rank level fusion methods [12]. The non-linear rank level fusion methods are the exponential [1], weighted exponential [1], division exponential [2] and logarithm [2] methods. The proposed method has also been compared against a few state-of-the-art score level fusion methods: sum-rule [9], product-rule [18], max-rule [18] and min-rule [18].

Some of these existing methods (weighted Borda count, exponential, weighted exponential, division exponential and logarithm) require weights to be assigned to the various biometric modalities, and their performance depends on the selected weights. An elitist genetic algorithm has been used to search for the set of weights for the modalities. Recognition accuracy (defined as the percentage of probes for which the correct matching subject has been found) is used as the fitness criterion for this genetic algorithm.

Moreover, the set of selected weights depends on the data on which the weights are trained. Hence, k-fold cross validation (using \(k=5\)) is used to eliminate this dependency. The dataset is split into k parts. \(k-1\) parts are used as the training set to learn the set of weights using the elitist genetic algorithm, and the remaining part is used as the test set to report the recognition accuracy. This is repeated k times to obtain k different sets of weights and the corresponding accuracies on the test sets. Finally, the average accuracy over the k test sets is presented in the results.
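The cross validation protocol described above can be sketched as follows. The way the data are partitioned is not specified in the text, so the interleaved split used here is an assumption, and `fit` and `evaluate` are hypothetical placeholders for the genetic-algorithm weight search and the accuracy computation.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k interleaved folds (assumed split)."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(n_items, fit, evaluate, k=5):
    """Average test accuracy over k folds.

    fit(train) returns a model (e.g. a set of modality weights);
    evaluate(model, test) returns the accuracy in % on the held-out fold.
    """
    accuracies = [evaluate(fit(train), test)
                  for train, test in k_fold_indices(n_items, k)]
    return sum(accuracies) / len(accuracies)


splits = list(k_fold_indices(10, k=5))
```

Each probe index appears in exactly one test fold, so every probe contributes once to the averaged accuracy.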

Table 1. Cumulative recognition accuracies in % for NIST BSSR1 dataset using various comparative methods

4.2 Fusion of Multimodal Biometrics Involving Face and Fingerprint

The first dataset used in this paper, BSSR1, is from the data repository of NIST [15]. This dataset has been widely used to study the fusion of multimodal biometrics [1, 7, 8, 19]. Four biometric modalities are considered in this dataset. Two of these modalities are face biometrics (obtained using two different matchers, termed G and C in the dataset). Fingerprints of the right index finger and the left index finger are the other two modalities. These four biometric modalities were acquired for each of the 517 persons (subjects) during the enrollment phase. The dataset contains the similarity scores of each of these subjects, taken as a probe, against all 517 subjects, as given by the two face matchers (G and C) and the fingerprint matchers for the right and left index fingers. These similarity scores from the various biometric modalities are fused using the existing score level fusion methods (as mentioned in Sect. 4.1).

Moreover, rank lists are generated from the given similarity scores, which provides four rank lists for each probe. These rank lists are combined using the proposed rank level fusion method (Sect. 3) and the other existing methods discussed above (Sect. 4.1). Table 1 presents the recognition accuracies (in %) of the compared methods for the probe subjects within the top 1, top 2 and top 3 ranks (cumulative). It also presents the recognition accuracies for each of the unimodal biometrics in this dataset.
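The cumulative recognition accuracy used in the tables can be computed as sketched below. The function name and the input format (for each probe, the rank at which its true subject appears in the fused list) are assumptions for illustration, and the values in the example are toy data rather than results from the experiments.

```python
def cumulative_accuracy(true_ranks, max_rank=3):
    """Percentage of probes whose correct subject is within the top m ranks,
    for m = 1..max_rank.

    true_ranks[q] is the rank given to the correct subject for probe q.
    """
    total = len(true_ranks)
    return [100.0 * sum(1 for r in true_ranks if r <= m) / total
            for m in range(1, max_rank + 1)]


# Five hypothetical probes whose true subjects were ranked 1, 1, 2, 4 and 3.
toy_accuracy = cumulative_accuracy([1, 1, 2, 4, 3])
```

By construction the values are non-decreasing in m, since every probe counted within the top m ranks is also counted within the top m + 1.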

The results in Table 1 show that the proposed cross-entropy Monte Carlo algorithm based on the Spearman footrule (CES) and Kendall’s tau (CEK) distances performs better than most of the compared methods. This can be attributed to the fact that the method explicitly minimizes the distances between the aggregated and input rank lists. Only the division exponential method matches the performance of the proposed CE method with the Spearman footrule distance. The proposed method also performs better than each unimodal matcher, which justifies the need for a multi-biometric system.

4.3 Fusion of Multimodal Biometrics for Various Gait Feature Representations

The second dataset (BSS4) is from the Institute of Scientific and Industrial Research (ISIR), Osaka University (OU) [16, 17]. This dataset has also been used for the fusion of multimodal biometrics in [20, 21]. In this dataset, the input gait image sequence has been processed using five different feature extraction methods: (i) Gait energy image (GEI), (ii) Frequency-domain feature (FDF), (iii) Gait entropy image (GEnI), (iv) Chrono-gait image (CGI) and (v) Gait flow image (GFI). The dataset is composed of the dissimilarity scores of each of the 3249 subjects, taken as a probe, against all 3249 subjects for each of the above features. These dissimilarity scores from the above gait features are fused using the existing score level fusion methods. Details of these gait feature extraction methods can be found in [17, 20].

Moreover, rank lists are generated based on the given dissimilarity scores. This provides five rank lists for each probe. These rank lists are combined using the proposed rank level fusion method (Sect. 3) and other existing rank level fusion methods as discussed in Sect. 4.1. Table 2 presents the recognition accuracies of various methods (in %) for the probe subjects within top 1, top 2 and top 3 ranks (cumulative). The table also presents the recognition accuracies for each of the unimodal biometrics in the dataset.

Table 2. Cumulative recognition accuracies in % for OU-ISIR BSS4 dataset using various comparative methods

The results in Table 2 show that the proposed cross-entropy Monte Carlo algorithm based on the Spearman footrule (CES) and Kendall’s tau (CEK) distances performs better than the other methods, except for score level fusion with the sum-rule and rank level fusion with the division exponential method. The justification given for the previous dataset (BSSR1) also applies here. The proposed method again performs better than each unimodal matcher, which justifies the need for a multi-biometric system.

5 Conclusion

Rank level fusion for multimodal biometrics has been studied in this paper. The contributions of this paper are as follows. The rank level fusion of multimodal biometrics is formulated as an optimization problem. To solve this optimization problem, a cross-entropy Monte Carlo method (using two distance measures, namely the Kendall’s tau and Spearman footrule distances) is proposed. The proposed method is tested on two different multi-biometric datasets: BSSR1 and BSS4. With both distance measures, the proposed method identifies subjects more accurately than most of the existing fusion methods at rank level (e.g., Borda count, weighted Borda count, highest rank, exponential, weighted exponential, division exponential, logarithm) and score level (product-rule, sum-rule, max-rule and min-rule) for multimodal biometric systems. The experiments also justify the usefulness of a multimodal biometric system over a unimodal biometric system. The proposed model can similarly be applied to other multimodal biometrics. Moreover, the initial success in the reported experiments is encouraging enough to try out other meta-heuristic search and optimization strategies (such as genetic algorithms and particle swarm optimization) for rank level fusion of multimodal biometrics.