1.1 Introduction

In today’s security-conscious society, automatic personal authentication is important in many applications, including government, commerce, educational institutions, industry, and public places. Questions such as “Is this the person who he claims to be?”, “Should this individual be authorized to perform this transaction?”, and “Does this employee have authorization to access this service?” are asked millions of times every day by thousands of organizations in both the private and public sectors [1].

Existing systems use either identity cards or passwords for personal authentication (Fig. 1.1a). These security systems no longer suffice for individual authentication because cards can be stolen or forged and a password can be forgotten or cracked. The following are some interesting statistics:

  1. According to a report by Nilson, “$11.27 billion losses due to credit card and debit card fraud during 2012” [2].

  2. According to American Bankers Association’s Deposit Account Fraud Survey-2011, “Financial institutions incurred $955 million in losses due to debit card fraud in 2010, which is around a 21% increase from the $788 million in losses incurred during 2008” [2].

  3. According to the Gartner Group, “between 20 to 50% of all help desk calls are for password resets and the average help desk labor cost for a single password reset is about $70” [3].

The above statistics show the need for an accurate and efficient approach to personal recognition. Biometric recognition, which uses a person’s fingerprint, palmprint, iris, etc., is a reliable and convenient solution for human recognition (Fig. 1.1b). Because human biometric features are unique and cannot be stolen or forgotten, and because the person must be physically present during authentication [4], biometric recognition systems are gaining popularity and are deployed in many important applications [5,6,7,8,9,10,11,12,13]. This results in large-scale biometric databases, and an identification system needs to search millions of records in real time to identify a query. As biometric data do not have any natural sorting order, such as numeric or alphabetic order [14, 15], recognition in these large biometric systems is a challenging problem. In this book, we explore methods that are capable of searching biometric databases in real time with a high level of confidence.

Fig. 1.1 Personal authentication techniques: a Traditional methods such as identity cards, passwords, etc., b Biometric characteristics [16]

1.2 Biometric Recognition

“A biometric system is a pattern recognition system that recognizes individuals based on the measurement of their physiological and/or behavioral traits: Physiological traits include a person’s fingerprint, facial features, palmprint, vein pattern, or ocular characteristics; Behavioral traits include voice, gait, keystrokes, signature etc.” [17]. The word biometrics is derived from the Greek words bios (meaning life) and metron (meaning measurement), i.e., biometric traits are measurements taken from the living human body. Figure 1.2 shows a few of the biometric traits (both physiological and behavioral) used for personal recognition.

A generic biometric system is shown in Fig. 1.3. It consists of two modules: enrollment and recognition.

Enrollment

This module enrolls individuals into the biometric system (Fig. 1.3a). During enrollment, a sensor captures the biometric characteristic of an individual, from which a set of features (the template) is extracted by a feature extractor. Depending on the application context, the extracted feature template may be stored in a central database along with the individual’s identity (name, ID number, etc.) or be recorded on a smart card issued to the individual.

Recognition

This module recognizes the identity of an individual at the point of service. During this phase, the sensor acquires the biometric characteristic of the individual to be recognized. The captured biometric image is preprocessed by the feature extractor to generate the template. The extracted template is compared to the prestored template(s) using a matcher to establish the identity. The process of user recognition in biometric systems is shown in Fig. 1.3b, c. A biometric recognition system is designed to work in one of the two different modes: (i) verification or (ii) identification.

Fig. 1.2 Different biometric traits for personal recognition

1.2.1 Verification

In verification mode, the user claims an identity by using a user name, a personal identification number, a smart card, etc., along with the biometric data. The system then verifies the user by matching the acquired biometric characteristic with his own biometric sample prestored in the system. In this mode, the system conducts a one-to-one matching to determine whether the identity claimed by the individual is true or not [18]. In this case, the question “Is Mr. X really who he claims to be?” is answered with either acceptance or rejection. An example of the verification scenario occurs when we try to use the ATM at a bank. We have to provide our biometric data along with the ATM card to verify our identity. In this case, the system compares the provided biometric data with our prestored template to ensure that the true owner is the one who is using the card to perform the transaction. The process of recognizing a user in verification mode can be seen in Fig. 1.3b.

1.2.2 Identification

In this mode, the user does not claim any identity. The user provides his biometric data, and the data is compared to the stored template of every individual in the system database. In this mode, the system conducts a one-to-many comparison to find the identity of an individual. In this case, the question “To whom does the submitted biometric data belong?” is answered. For example, if a fingerprint impression is found at a crime scene, it is compared to all the enrolled fingerprints in the database to determine the suspect. If a match is found, the identity of the suspect is determined. The process of recognizing a user in identification mode can be seen in Fig. 1.3c.
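The difference between the two modes can be illustrated with a minimal sketch. It assumes that templates are plain feature vectors and that the matcher is a Euclidean distance with an acceptance threshold; both are simplifications chosen for illustration, and the names `verify`, `identify`, and `threshold` are hypothetical rather than taken from any particular system.

```python
import math

def verify(claimed_id, probe, database, threshold):
    """One-to-one matching: compare the probe only with the claimed identity's template."""
    template = database.get(claimed_id)
    if template is None:
        return False  # unknown identity claim
    return math.dist(probe, template) <= threshold

def identify(probe, database, threshold):
    """One-to-many matching: compare the probe with every enrolled template."""
    best_id, best_dist = None, float("inf")
    for identity, template in database.items():
        d = math.dist(probe, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    # Reject the probe if even the closest template is too far away.
    return best_id if best_dist <= threshold else None

# Toy enrolled templates (illustrative values only).
database = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}
probe = [0.12, 0.88, 0.31]

accepted_as_alice = verify("alice", probe, database, threshold=0.1)
accepted_as_bob = verify("bob", probe, database, threshold=0.1)
identified = identify(probe, database, threshold=0.1)
print(accepted_as_alice, accepted_as_bob, identified)
```

Verification costs one comparison regardless of the database size, whereas this naive identification costs one comparison per enrolled template, which is the cost that indexing aims to reduce.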

Fig. 1.3 Different modes of operation of a generic biometric system [16]

1.3 Indexing

In today’s security-conscious society, biometric recognition systems have become more popular and are deployed in a variety of applications such as surveillance, border control, network access, banking, employee authentication, etc. The market for biometric applications is growing worldwide, and specifically in emerging economies such as India, where scalability is a huge challenge. According to a market research report by Acuity Market Intelligence (AMI) [19], the market for the worldwide biometrics industry is expected to grow steadily from an annual revenue of 3.4 billion USD in 2009 to 11 billion USD in 2017, as shown in Fig. 1.4.

Note that most of these biometric systems deal with large-scale databases whose size is increasing at a rapid pace. For instance, India’s national ID program [5], run by the Unique Identification Authority of India (UIDAI), has registered a database of 700 million people. It is expected to reach 1.25 billion people in a few years, and the number of accesses per day is expected to be 1 to 5 million. In the United States, the Federal Bureau of Investigation (FBI) developed a fingerprint database called the Integrated Automated Fingerprint Identification System (IAFIS) [20]. Currently, it has records of over 51 million criminals and over 1.5 million noncriminals.

Fig. 1.4 Acuity Market Intelligence (AMI) Report

However, an individual is typically identified in such large databases by matching his/her biometric template against each enrolled template in the database. This is computationally expensive, i.e., the response time increases linearly with the size of the database. Hence, there is a need for efficient retrieval methods that restrict the search to a reduced portion of the database and thus reduce the search time without compromising accuracy.

This problem, i.e., search space reduction in biometric databases, may be stated as follows: given a large biometric database D and a query q, the identification system has to satisfy the following requirements:

  • Quickly retrieve a candidate set C from D such that the retrieved images in C are most similar to q,

  • \(\vert C \vert \ll \vert D \vert \), and

  • C must contain q’s identity with high probability.

There are two different approaches to handle this problem. The first one is partitioning the images stored in the database [21]. The entire database is divided into a small number of partitions, i.e., classes. To identify a query, its class is first determined, and the query is then compared only with the candidates of that class. However, this approach uses only predefined classes, and the images are unevenly distributed among them, resulting in variation in system performance [4]. Further, the system must handle rejected templates carefully.

The second approach is indexing, which computes an index for every individual (Fig. 1.5). To identify a query, this technique retrieves from the database a set of candidates whose indexes are most similar to that of the query. The query is then compared only with the retrieved candidates instead of with the complete database, which reduces the search space.
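The retrieve-then-match flow of the indexing approach can be sketched as follows. The index used here is a deliberately simple grid quantization of a feature vector, which is an assumption made for illustration and not one of the indexing techniques surveyed in this chapter; the names `index_key`, `build_index`, and `cell` are hypothetical.

```python
from collections import defaultdict
import math

def index_key(features, cell=0.25):
    """Map a feature vector to a coarse grid cell (illustrative index function)."""
    return tuple(int(f // cell) for f in features)

def build_index(database, cell=0.25):
    """Enrollment: store every identity under the key of its template."""
    index = defaultdict(list)
    for identity, features in database.items():
        index[index_key(features, cell)].append(identity)
    return index

def identify(probe, database, index, cell=0.25):
    """Identification: retrieve the candidate set C, then match only against C."""
    candidates = index.get(index_key(probe, cell), [])
    best = min(candidates, key=lambda i: math.dist(probe, database[i]), default=None)
    return best, candidates

database = {
    "alice": [0.10, 0.90],
    "bob": [0.80, 0.20],
    "carol": [0.15, 0.85],
    "dave": [0.60, 0.40],
}
index = build_index(database)
best, candidates = identify([0.12, 0.88], database, index)
print(best, candidates)
```

Only two of the four enrolled templates are compared with the query in this toy run. A grid index of this kind misses genuine matches that fall just across a cell boundary, which is one reason practical schemes rely on more robust index structures.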

Fig. 1.5 Process of biometric identification using indexing approach [16]

1.3.1 Challenges

The following are a few issues that need to be considered while indexing.

  • Intra-class variations, i.e., two images of the same user obtained at different time instances may not be the same. This is mainly because of:

    • Different sensors at different times.

    • Poor maintenance of sensors.

    • Changes in lighting conditions.

    • Lack of user cooperation. For example, a person may have a beard or glasses at enrollment time but not at identification time, or show different facial expressions at different times.

    This may increase the false rejections of the system, as different indexes are possible for the same user.

  • Inter-class similarity, i.e., overlap of the feature spaces of different users, which leads to an increase in false matches.

  • Further, indexing methods of relational databases are also not suitable for biometric data [14, 15]. In relational databases, records (or data) are arranged in an alphabetical or numerical order with respect to a primary key for efficient retrieval. But biometric templates do not have any sorting order to arrange [14, 15].

  • Finally, the indexing methods for multimedia (i.e., image, video) databases are also not suitable for biometric databases [22,23,24,25,26,27,28,29,30,31]. The following are a few reasons:

    • In multimedia databases, there is large variability among the subjects in terms of appearance, i.e., different types of subjects (like trees, humans, buildings, etc.) are present in the database. Hence, a coarse-level classification is possible. In contrast, there is little appearance variability among the biometric images of different users. For example, in a fingerprint database, the impressions of different users look almost the same, with only a few differences.

    • The multimedia (especially image and video) data are represented with metadata [31] such as annotated text, symbols, tags, etc., which is not possible for biometric data.

    • Finally, the feature representation of biometric data is different from multimedia data [32]. Basically, the multimedia data is represented with texture [24, 27,28,29], color [22, 23, 25] and shape [26] features. However, most of the biometric characteristics do not contain these features.

1.4 Biometric Indexing Techniques

The fast identification in biometric databases can be achieved by two different approaches: classification and indexing. These approaches are used to filter the search space during identification process. In classification approaches, the database images are divided into different groups (classes) such that the images in the same class are similar in terms of some quantitative information. During identification, the class of the query is first identified and then it is matched with only the images present in that class. However, as said earlier, these approaches have a serious limitation that the images are unevenly distributed among the predefined classes which makes the system statistically unreliable for faster identification.

In indexing approaches, each image is assigned an index based on its features. During identification, the query is matched with only the images which have a similar index. The majority of current developments in biometric indexing are based on one of the following features:

  1. Key feature points [33,34,35,36,37,38,39,40,41,42]

  2. Geometric properties of triplets [18, 43,44,45,46,47,48]

  3. Match scores [49,50,51,52,53,54]

  4. Other approaches

    • Ridge orientation based (for fingerprint) [55,56,57,58]

    • Texture based (for palmprint [59,60,61,62], iris [32, 63, 64])

    • Color based (for iris) [65,66,67]

    • Subspace approximations (for face) [68,69,70].

1.4.1 Key Feature Point Based Indexing Approaches

These approaches extract the key feature points from the biometric samples and use them for indexing purpose. Boro et al. [33] developed an indexing technique using fingerprint minutiae points (i.e., bifurcation and end points). The minutiae features are enrolled into a hash table using geometric hashing [71]. Jayaraman et al. [36] proposed a minutiae-based geometric hashing technique for fingerprint indexing. A fixed length feature vector called Minutiae Binary Code (MBC) is computed for each minutia in the fingerprint. The minutiae and its feature vector are stored into the hash table using geometric hashing.
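The idea behind geometric hashing can be sketched in a few lines. This is a simplified illustration that assumes each template is a small set of 2D points and ignores minutia orientation, type, and descriptor information, so it is not the method of any of the cited papers. Every ordered pair of points serves as a basis: the remaining points are expressed in the coordinate frame of that pair, quantized, and stored in a hash table. The quantization step `cell` and the point sets are hypothetical.

```python
from collections import defaultdict, Counter
from itertools import permutations

def basis_coords(points, i, j):
    """Coordinates of the other points in the frame where point i maps to 0 and point j to 1."""
    p, q = complex(*points[i]), complex(*points[j])
    return [(complex(*pt) - p) / (q - p) for k, pt in enumerate(points) if k not in (i, j)]

def quantize(z, cell=0.1):
    """Hash bin of a normalized coordinate (bin size is an illustrative assumption)."""
    return (round(z.real / cell), round(z.imag / cell))

def build_table(models):
    """Enrollment: insert every model under every ordered basis pair."""
    table = defaultdict(set)
    for model_id, points in models.items():
        for i, j in permutations(range(len(points)), 2):
            for z in basis_coords(points, i, j):
                table[quantize(z)].add((model_id, i, j))
    return table

def query(table, points):
    """Vote for (model, basis) entries using one basis pair chosen from the query."""
    votes = Counter()
    for z in basis_coords(points, 0, 1):
        for entry in table.get(quantize(z), ()):
            votes[entry] += 1
    best = Counter()
    for (model_id, _, _), v in votes.items():
        best[model_id] = max(best[model_id], v)
    return best.most_common()

models = {
    "A": [(0, 0), (5, 0), (1, 3), (3, 2), (2, 6)],
    "B": [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)],  # deliberately collinear toy model
}
table = build_table(models)
# Model A rotated by 90 degrees, scaled by 2, and translated by (10, 5).
probe = [(10, 5), (10, 15), (4, 7), (6, 11), (-2, 9)]
ranked = query(table, probe)
print(ranked)
```

Because the normalized coordinates do not change under translation, rotation, and uniform scaling, the transformed copy of model A collects one vote for each of its three non-basis points. A practical system would try several basis pairs from the query and tolerate missing or spurious points.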

Mansukhani et al. proposed an indexing approach based on minutiae tree [34]. They constructed a large index tree where the enrolled templates are represented by the leaves of the tree. The branches in the index tree correspond to different local configurations of minutiae points. Searching the index tree entails extracting local minutiae neighborhoods of the test fingerprint and matching them against tree nodes. Cappelli et al. developed Minutiae Cylinder-Code (MCC) based indexing technique [35]. For each fingerprint, a fixed size binary code is computed. This code is a representation of spatial and directional relationships between a minutia and its neighborhood structure with a minutiae cylinder. To find the best matches, Locality Sensitive Hashing (LSH) technique is used.
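Locality sensitive hashing over fixed-length binary codes can be sketched with bit sampling, one common LSH family for Hamming distance. The sketch below assumes one binary code per enrolled identity and uses hypothetical parameters (`num_tables`, `bits_per_hash`); it illustrates the general technique and is not the specific scheme of [35].

```python
import random
from collections import defaultdict, Counter

def make_hashes(code_length, num_tables, bits_per_hash, seed=0):
    """Each hash function reads a fixed random subset of bit positions."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(code_length), bits_per_hash)) for _ in range(num_tables)]

def build_tables(codes, hashes):
    """Enrollment: insert every code into one table per hash function."""
    tables = [defaultdict(list) for _ in hashes]
    for identity, code in codes.items():
        for table, positions in zip(tables, hashes):
            table[tuple(code[p] for p in positions)].append(identity)
    return tables

def collisions(query_code, tables, hashes):
    """Count, per identity, the number of tables in which it collides with the query."""
    votes = Counter()
    for table, positions in zip(tables, hashes):
        for identity in table.get(tuple(query_code[p] for p in positions), ()):
            votes[identity] += 1
    return votes

code = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
codes = {"x": code, "y": [1 - b for b in code]}  # "y" is the bitwise complement of "x"
hashes = make_hashes(code_length=16, num_tables=8, bits_per_hash=4)
tables = build_tables(codes, hashes)
votes = collisions(code, tables, hashes)
print(votes)
```

Codes that differ in only a few bits collide in many tables, whereas dissimilar codes rarely collide, so the identities with the most collisions form the candidate list. In this toy run the query equals the code of "x" and therefore collides with it in all eight tables, while the complementary code of "y" can never collide.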

Badrinath et al. [37] proposed an efficient indexing scheme that uses geometric hashing of Speeded Up Robust Features (SURF) [72] to index palmprints into a hash table. During querying, a score-level fusion of a voting strategy based on geometric hashing and the SURF score is used to identify the live palmprint. In a recent work, Dewangan et al. [39] proposed a face indexing method based on SURF key features and a k-d tree. The authors created a two-level index space based on the SURF key points and divided the index space into a number of cells. Further, they defined a set of hash functions to store the SURF descriptors of a face image into the cells. The SURF descriptors within an index cell are stored in a k-d tree.

Mehrotra et al. [40] proposed an indexing method based on the Scale Invariant Feature Transform (SIFT) [73]. The SIFT features are extracted for each iris image and mapped into a hash table using geometric hashing. Panda et al. [41] proposed an indexing method for iris databases using parallel geometric hashing. The authors first extract the SIFT features from the iris images. The SIFT features are then indexed into a hash table by geometric hashing running in parallel on multiple processors. The use of parallel processors increases the retrieval performance of the system during identification. In another work, Mehrotra et al. [42] also used the SIFT features for iris indexing. The extracted SIFT features are indexed using a k-d-b tree. During identification, a range search is used to retrieve a set of images similar to the query. A summary of the different key feature point based indexing techniques is given in Table 1.1.

Table 1.1 Key feature point based indexing approaches

1.4.2 Triplet-Based Indexing Approaches

These approaches compute some form of triplets from the feature points of the biometric samples for indexing. Bhanu and Tan [43] proposed a triplet-based fingerprint indexing method. They compute all possible triplets from the extracted minutiae of a fingerprint. Their method uses triangle features such as handedness, type, direction, etc. to compute the index. Instead of all possible triangles, Bebis et al. used the Delaunay triangulation [74] of minutiae points for indexing fingerprints [44]. For each triplet of the fingerprint, their method computes the ratios of the largest side of the triplet to the two smallest sides and the angle between the smallest sides to generate the index.
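Side ratios and an angle form an index because they are unchanged by translation, rotation, and uniform scaling of the triangle. The sketch below assumes a triplet of three distinct 2D points; it stores the two smaller sides divided by the largest side (so that the values lie in (0, 1]) together with the cosine of the angle between the two smaller sides. The quantization into `bins` is a hypothetical choice made for illustration and is not taken from [44].

```python
import math

def triangle_features(p1, p2, p3):
    """Return (a/c, b/c, cos of the angle between the two shorter sides) for sides a <= b <= c."""
    a, b, c = sorted([math.dist(p1, p2), math.dist(p2, p3), math.dist(p1, p3)])
    cos_angle = (a * a + b * b - c * c) / (2 * a * b)  # law of cosines
    return (a / c, b / c, cos_angle)

def index_key(features, bins=10):
    """Quantize the features so that nearly identical triangles share a key."""
    return tuple(round(f * bins) for f in features)

# A 3-4-5 triangle, and the same triangle rotated by 90 degrees,
# scaled by 2, and translated by (1, 1).
original = triangle_features((0, 0), (3, 0), (0, 4))
transformed = triangle_features((1, 1), (1, 7), (-7, 1))
print(index_key(original), index_key(transformed))
```

Both triangles map to the same key, so a fingerprint captured at a different position, orientation, or scale still retrieves the enrolled triplet. Skin distortion changes these values slightly, which is why the quantization step has to be chosen with care.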

Ross and Mukherjee [45] also developed an indexing technique based on Delaunay triangles. However, they added ridge curvature to the minutiae triplets for improved performance. Further, they used k-means clustering for indexing the triplets. Another Delaunay triangulation-based indexing approach was proposed by Liang et al. [18]. However, this approach uses lower order Delaunay triangulation [75]. They proved that the Delaunay triangulation is sensitive to skin distortion and the order-0, order-1 Delaunay triangles are more stable and robust against distortion. Further, Alonso et al. [46] extended the Delaunay triangulation to handle the distortions caused by spurious and missing minutiae. Iloanusi et al. [48] proposed a minutiae quadruplet based approach for fingerprint indexing. The authors used multiple fingers of an individual and extracted the geometric information from the minutiae quadruplets. Four, five, and ten fingerprints from a subject are fused at the rank level using the highest rank rule.

Jayaraman et al. [47] also proposed a method for palmprint indexing using SURF features. They extract the SURF features from each palm image. They then apply a series of preprocessing steps to the SURF features, such as mean centering, principal component analysis, rotation, and normalization, to make them invariant to affine transformations. Finally, a block-based triangulation is applied, and the geometric features of the triangles are indexed using geometric hashing. Table 1.2 shows a summary of various triplet-based indexing approaches.

Table 1.2 Triplet-based indexing approaches

1.4.3 Match Score Based Indexing Approaches

These approaches use the match score between the images for indexing purpose. The first such attempt was made by Maeda et al. [49]. A match score vector was calculated for each image by matching it against all the images in the database and stored. During identification, the match score vector of the query is compared against each image score vector.

Gyaourova and Ross [51] presented an indexing approach based on match scores. This method generates a set of match scores, called an index code, by comparing a biometric image with a small set of reference images. During querying, the index code of the test image is compared with the index codes of the enrolled images to identify the candidate list. This approach was tested individually on face and fingerprint databases. Finally, the candidate identities from both databases are fused to identify the best matches. The authors claim that comparing two score vectors takes less time than matching two templates.
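The index code idea can be sketched as follows. The matcher, the reference samples, and the use of Euclidean distance between index codes are all assumptions made for illustration; the cited work may use a different matcher and a different similarity measure between codes.

```python
import math

def match_score(a, b):
    """Hypothetical matcher: similarity in (0, 1], higher means more similar."""
    return 1.0 / (1.0 + math.dist(a, b))

def index_code(sample, references):
    """Index code: match scores of a sample against a small fixed reference set."""
    return [match_score(sample, r) for r in references]

def retrieve(probe, enrolled_codes, references, top_k):
    """Rank enrolled identities by the distance between index codes, not templates."""
    probe_code = index_code(probe, references)
    ranked = sorted(enrolled_codes, key=lambda i: math.dist(probe_code, enrolled_codes[i]))
    return ranked[:top_k]

references = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
enrolled = {"alice": [0.1, 0.9], "bob": [0.8, 0.2], "carol": [0.5, 0.5]}
# Index codes are computed once at enrollment time.
enrolled_codes = {i: index_code(t, references) for i, t in enrolled.items()}

shortlist = retrieve([0.12, 0.88], enrolled_codes, references, top_k=2)
print(shortlist)
```

At query time the expensive matcher runs only against the reference set, and the enrolled identities are then ranked by a cheap vector comparison.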

Paliwal et al. [52] proposed another approach based on match scores. For each image, a set of match scores is computed as in the method of Gyaourova and Ross [51]. The computed match scores are stored in a Vector Approximation (VA\(+\)) file, which is a space partitioning method. This method uses k-NN search and texture to retrieve the top k similar matches. This approach was tested on a palmprint database. Table 1.3 shows a summary of the various score based indexing approaches.

Table 1.3 Match score based indexing approaches

1.4.4 Other Indexing Approaches

In the literature, there are also some indexing techniques for biometric systems that are based on features other than those discussed above, for example, ridge information for fingerprints, and texture and color information for palmprint, iris, and face biometrics. A summary of these other indexing techniques is given in Table 1.4.

Table 1.4 Other indexing approaches

1.5 Benchmarking in Indexing and Performance Evaluation

Benchmarking is the process of validating results and comparing them with the existing best practices in the literature. Benchmarking improves the quality of the development activity. Some biometric benchmark databases (like PolyU palmprint and FVC fingerprint) are available to the research community for evaluation. The performance of an indexing algorithm can be measured using parameters such as hit rate and penetration rate.

1.5.1 Databases

Experiments are conducted on the following biometric databases:

  1. Fingerprint Verification Competition (FVC) databases

  2. PolyU Palmprint database

These databases exhibit some fundamental differences such as type of biometric, device used to capture the images, resolution, lighting conditions, etc. This forms the basis for the study of the proposed work under different circumstances. Detailed description of these databases is given in the following:

1.5.1.1 Fingerprint Verification Competition (FVC) Databases

The seven FVC databases used in the experiments are: 1. FVC 2002 DB1, 2. FVC 2002 DB2, 3. FVC 2002 DB3, 4. FVC 2002 DB4, 5. FVC 2004 DB1, 6. FVC 2004 DB2, and 7. FVC 2004 DB4. Each of these databases comprises images from 100 different fingers, with 8 impressions per finger, which makes a total of 800 images per database. Further, each database is divided into two mutually exclusive sets: a training (i.e., Gallery) set and a test (i.e., Probe) set. Arbitrarily, four images per finger are chosen for training and the remaining four images are used for testing.
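The split described above can be sketched as follows. The text only says that four impressions per finger are chosen arbitrarily, so taking the first four for the gallery is an assumption made for illustration, as are the function name and the (finger, impression) labels.

```python
def split_gallery_probe(num_fingers=100, impressions_per_finger=8, gallery_per_finger=4):
    """Split (finger, impression) labels into mutually exclusive gallery and probe sets."""
    gallery, probe = [], []
    for finger in range(1, num_fingers + 1):
        for impression in range(1, impressions_per_finger + 1):
            # Assumption: the first impressions of each finger go to the gallery.
            target = gallery if impression <= gallery_per_finger else probe
            target.append((finger, impression))
    return gallery, probe

gallery, probe = split_gallery_probe()
print(len(gallery), len(probe))
```

With 100 fingers and 8 impressions per finger, each set contains 400 images, and every finger is represented in both sets.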

1.5.1.2 PolyU Palmprint Database

The PolyU palmprint database was acquired at the Hong Kong Polytechnic University using a CCD (Charge Coupled Device) camera [76] at a spatial resolution of 75 dpi and 256 gray levels. This benchmark database consists of 7,752 grayscale images of size \(384 \times 284\) pixels corresponding to 386 different palms. Around 20 images per palm have been collected in two sessions. Arbitrarily, 10 images per palm are considered for training and the remaining 10 images are used for testing. Table 1.5 shows a detailed description of each database used in the experiments.

Table 1.5 Characteristics of the databases used in the experiments

1.5.2 Performance Metrics

The performance of the proposed indexing approaches is determined using the following measures:

  1. Hit Rate (HR)

  2. Miss Rate (MR)

  3. Penetration Rate (PR)

  4. Cumulative Match Characteristics (CMC) curve

1.5.2.1 Hit Rate (HR)

Hit Rate (HR) is the percentage of test set images for which the corresponding gallery set image with the correct match is present in the retrieved candidate set.

$$\begin{aligned} HR = \left( \frac{y}{M} \right) \times 100\% \end{aligned}$$
(1.1)

where y is the number of test set images for which the correct match is present in the retrieved candidate set, and M is the total number of test set images.

1.5.2.2 Miss Rate (MR)

Miss Rate (MR) is the percentage of probe set images for which the corresponding gallery set image with the correct match is not present in the candidate set.

$$\begin{aligned} MR=100-HR \end{aligned}$$
(1.2)

1.5.2.3 Penetration Rate (PR)

Penetration Rate (PR) is the average percentage of gallery set images retrieved (i.e., Candidate set) to identify a query image from the test set by the indexing mechanism.

$$\begin{aligned} PR= \left( \frac{1}{M} \sum \limits _{i\,=\,1}^{M} \frac{|C^i|}{N}\right) \times 100\% \end{aligned}$$
(1.3)

where \(C^i\) is the candidate set of the \(i^{th}\) test set image, N is the number of images in the gallery set, and M is the number of images in the test set.

An efficient indexing method will have a high hit rate (low miss rate) and a low penetration rate.
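Equations 1.1 to 1.3 can be evaluated with a short sketch. It assumes that each candidate set is given as a list of the identity labels of the retrieved gallery images and that the true identity of every test image is known; the function name and the toy data are hypothetical.

```python
def evaluate(candidate_sets, true_ids, gallery_size):
    """Return (HR, MR, PR) in percent, following Eqs. 1.1 to 1.3."""
    M = len(candidate_sets)
    # y: number of test images whose genuine identity appears in the candidate set.
    y = sum(1 for c, t in zip(candidate_sets, true_ids) if t in c)
    hr = 100.0 * y / M
    mr = 100.0 - hr
    # Average of |C^i| / N over the M test images.
    pr = 100.0 * sum(len(c) for c in candidate_sets) / (gallery_size * M)
    return hr, mr, pr

candidate_sets = [["a", "b"], ["b"], ["a", "d", "e"], ["d", "c"]]
true_ids = ["a", "b", "c", "d"]
hr, mr, pr = evaluate(candidate_sets, true_ids, gallery_size=10)
print(hr, mr, pr)
```

In this toy run, three of the four test images find their genuine identity among the candidates (HR = 75%, MR = 25%), and on average 2 of the 10 gallery images are retrieved per query (PR = 20%).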

1.5.2.4 Cumulative Match Characteristics (CMC) Curve

CMC curves represent the identification accuracy of the system at various ranks. To determine the accuracy, the images in the retrieved candidate set are sorted in descending order of similarity, so that the image in the first position is the most similar to the query. We assign rank 1 to the image at the first position of the candidate set, rank 2 to the image at the second position, and so on. Accuracy at rank n (denoted by \(I_n\)) indicates the percentage of test set images for which the genuine match is present in the top n images of the sorted candidate set. This is formulated in Eq. 1.4, where z denotes the number of test set images for which the genuine match is in the top n, and M denotes the total number of images in the test set.

$$\begin{aligned} I_n = \left( \frac{z}{M} \right) \times 100\% \end{aligned}$$
(1.4)
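The points of a CMC curve can be computed with the following sketch, which reports the rank-n accuracy as a percentage, in line with the definition of \(I_n\). It assumes that each candidate list is already sorted from most to least similar and contains identity labels; the function name and the toy data are hypothetical.

```python
def cmc(ranked_candidates, true_ids, max_rank):
    """Return [I_1, ..., I_max_rank] in percent."""
    M = len(true_ids)
    curve = []
    for n in range(1, max_rank + 1):
        # z: number of test images whose genuine match is within the top n candidates.
        z = sum(1 for ranked, t in zip(ranked_candidates, true_ids) if t in ranked[:n])
        curve.append(100.0 * z / M)
    return curve

ranked_candidates = [
    ["a", "b", "c"],
    ["c", "b", "a"],
    ["d", "a", "c"],
    ["a", "b", "d"],
]
true_ids = ["a", "b", "c", "d"]
curve = cmc(ranked_candidates, true_ids, max_rank=3)
print(curve)
```

The curve never decreases as n grows, because a genuine match found within the top n candidates is also within the top n + 1.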

1.6 Summary

This chapter gave a brief introduction to biometric recognition and the importance of indexing. It also discussed the issues that an indexing system should address. The current developments in the field of biometric indexing and retrieval were then reviewed. Finally, the benchmarking and performance evaluation procedures for biometric indexing techniques were explained.