1 Introduction

Biometrics refers to automated recognition of individuals based on their biological and behavioural characteristics [23, 25]. Due to its resistance to false matches, the iris represents one of the most powerful biometric characteristics [7]. In order to confirm an individual’s identity accurately and reliably, iris recognition systems analyze the complex random texture that is visible in the iris of the eye. Iris recognition technologies [7, 16] are deployed in numerous large-scale nationwide projects and are currently entering the mobile market [43]. Following Daugman’s approach [16], which is the core of most operational applications, four key modules constitute an iris recognition system: (1) image acquisition, where most current deployments require subjects to fully cooperate with the system to capture images of sufficient quality; (2) pre-processing, which includes the detection of the pupil and the outer iris boundary, a normalization of the iris to a rectangular texture and enhancement of the textured image. Parts of the iris texture which are occluded by eyelids, eyelashes or reflections are recorded in a corresponding noise mask; (3) feature extraction, in which a binary feature vector, i.e. iris-code, is generated by applying appropriate filters to the pre-processed iris texture; (4) comparison, which is based on the estimation of Hamming distance (HD) scores between pairs of iris-codes and corresponding masks, can be performed rapidly, achieving millions of comparisons per second per CPU core [16]. In the comparison stage, circular bit shifts are applied to iris-codes and HD scores are estimated at different shifting positions, i.e. relative tilt angles. The minimal obtained HD, which corresponds to an optimal alignment, represents the final score.

Focusing on the feature extraction stage, Daugman [15] proposed the use of two-dimensional Gabor filters, which are applied to a pre-processed iris image in a dimensionless polar coordinate system at different scales. Each complex Gabor response is encoded into two bits by the signs of its real and imaginary value. Numerous alternative methods have been suggested for the purpose of iris feature extraction, e.g. one-dimensional Gabor filters [36], packets of Gabor wavelets [27], differences of discrete cosine transform (DCT) coefficients [39], custom-built filters [33, 49], including circular symmetric filters and tripole filters, or characterization of key local variations [34]. It has been shown that the vast majority of these methods resemble that of [16], in the sense that they quantize responses of other linear transforms or filters (replacing 2D Gabor filters), which are applied to iris textures in a similar manner [26].

Despite the above de facto standard approaches, some entirely different schemes have been proposed, which aim at extracting different types of iris features, e.g. salient key points [5, 57] or human-interpretable features [10, 50]. More recently, the use of deep neural networks has been introduced to iris recognition, too. With respect to the feature extraction stage, features obtained from convolutional neural networks have also shown promising results, e.g. [30, 42]. In particular, several researchers have investigated the usefulness of SIFT [32] for iris recognition based on near-infrared (NIR) [1, 11, 37, 38, 56, 58] and visible wavelength (VW) iris images [5, 31, 37]. SIFT represents a generic method for image-based extraction and comparison of keypoints and corresponding descriptors, which is invariant to translations, rotations and scaling transformations in the image domain. However, compared to traditional approaches, the vast majority of proposed SIFT-based iris recognition schemes suffer from three major drawbacks, see Sect. 2: (1) degraded biometric performance: significant drops in recognition accuracy are reported, which might lead to unacceptable biometric performance, even in case of good-quality iris images; (2) poor comparison speed: the original comparison algorithm of SIFT requires a brute force comparison of extracted keypoint descriptors, which causes considerably longer response times compared to traditional HD-based iris comparators; (3) extensive storage requirement: SIFT-based reference data consist of a variable-sized set of keypoint descriptors, which requires significantly more storage compared to the compact binary iris-codes of traditional schemes.

In this work, we tackle all of the above-mentioned drawbacks of SIFT-based iris recognition systems. We show that, in case iris biometric input data are prepared properly and an adequate post-processing is applied, a SIFT-based iris recognition scheme is capable of maintaining the biometric performance obtained by different traditional systems. In order to achieve a rapid comparison, we extract SIFT-based binary codes from iris textures, which allow for a simultaneous matching of keypoints and an efficient retrieval of corresponding descriptors via look-up tables. Further, we introduce a binarization of keypoint descriptors, which enables a fast comparison based on bit operations and, at the same time, significantly reduces storage requirements. The proposed system yields a substantial speed-up resulting in low transaction times comparable to those of traditional schemes. Moreover, the proposed system turns out to be a suitable candidate for biometric fusion, obtaining significant performance gains in a challenging multi-algorithm fusion with traditional schemes.

This article is organized as follows: Section 2 briefly summarizes the functionality of SIFT as well as existing applications of SIFT to iris recognition. Section 3 provides a detailed description of the constituting modules of the SIFT-based baseline iris recognition system. The generation of SIFT-based binary codes, the binarization of keypoint descriptors and the corresponding comparison techniques are introduced in Sect. 4. Experimental results are presented in Sect. 5. Finally, conclusions are drawn in Sect. 6.

2 Related work

The following subsections briefly review the fundamentals of SIFT (and its variants) and published approaches to SIFT-based iris recognition.

2.1 Scale-invariant feature transform

Given an image I, points of interest, i.e. keypoints, are extracted by detecting scale space extrema using a difference of Gaussian (DoG) function. The main idea behind scale space extrema detection is to identify stable features which are invariant to changes in scale and viewpoint. In the first step, a Gaussian filter \(G(x,y,\sigma )\) with standard deviation \(\sigma \) is convolved with I(x, y), \(L(x,y,\sigma )=G(x,y,\sigma )*I(x,y)\). Subsequent scales are subtracted by employing a constant multiplicative factor k in order to obtain DoG images, \(D(x,y,\sigma )=L(x,y,k\sigma )-L(x,y,\sigma )\). The set of Gaussian-smoothed images and corresponding DoG images forms an octave, where different octaves are obtained by successively downsampling the original image by a factor of 2. In [32], it is suggested to use 3 octaves and \(s=3\) scales per octave with \(\sigma =1.6\) and \(k=2^{1/s}\). In order to cover a complete octave in the scale space extrema detection, \(s+3=6\) images have to be generated per octave. Keypoints are localized as minima or maxima by comparing each pixel in a DoG image against its eight neighbours in the same scale and the corresponding nine neighbours in each of the scales above and below. Subsequently, the stability of detected feature points is verified by rejecting noise-sensitive points that exhibit low contrast as well as points which lie along an edge. To this end, additional parameters are introduced in [32], i.e. a contrast threshold of 0.04 and an edge threshold of 10. Figure 1 depicts SIFT keypoints extracted from an NIR iris image of size \(320\times 280\) pixels where keypoint detection is applied at different stages of the iris biometric processing chain.

Fig. 1

Keypoint detection: SIFT keypoints extracted from (a) the iris, (b) the normalized iris texture and (c) the enhanced normalized iris texture for a sample image of the CASIAv4 iris database using the default parameters suggested in [32]. Keypoints are depicted using random colours for a better differentiation of nearby keypoints

In the next step, the dominant orientation of gradients within a \(16\times 16\) pixel window around each keypoint is estimated. Subsequently, a keypoint descriptor is extracted, relative to the rotation of the corresponding keypoint. Again, gradient magnitudes and orientations are sampled within a \(16\times 16\) pixel window and accumulated into orientation histograms summarizing the contents over \(4\times 4\) sub-regions where each orientation histogram consists of 8 bins. Hence, a total of \(D=4\times 4\times 8=128\) bins form the descriptor of a keypoint.

Given two images, extracted sets of keypoints can be matched by estimating the Euclidean distance between corresponding keypoint descriptors. That is, a reference descriptor is compared against all probe descriptors to determine the closest neighbour at distance \(d_1\). In order to decide whether the closest neighbour is a match, a ratio test is performed using the distance of the second closest neighbour, \(d_2\). If \(d_1/d_2<t\), the corresponding keypoints are considered to match, where \(t\simeq 0.8\) is suggested in [32]. An alternative to this ratio test is to cross-check matching pairs of keypoints. That is, only those matching pairs are retained where both keypoints are closest neighbours to each other. While the Euclidean distance between two points remains stable, the nearest neighbour of a specific point may change when the direction of the comparison is reversed, which is illustrated in Fig. 2. By cross-checking whether pairs of descriptors are nearest neighbours, the number of outliers (false positives) can be minimized without employing a ratio test, which might require an application-specific threshold. On the other hand, cross-checking of obtained matching pairs of keypoints requires significantly more computational effort.
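Both matching strategies can be illustrated with a brute-force NumPy sketch (descriptor sets given as rows of 2-D arrays; the function names are ours, and the ratio test assumes at least two probe descriptors with a non-zero second-neighbour distance):

```python
import numpy as np

def match_ratio(ref, probe, t=0.8):
    """Lowe's ratio test: accept reference descriptor i if its nearest
    probe neighbour is clearly closer than the second nearest (d1/d2 < t)."""
    matches = []
    for i, r in enumerate(ref):
        dists = np.linalg.norm(probe - r, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] / dists[j2] < t:
            matches.append((i, int(j1)))
    return matches

def match_cross_check(ref, probe):
    """Retain only pairs of descriptors that are mutual nearest neighbours."""
    fwd = {i: int(np.argmin(np.linalg.norm(probe - r, axis=1)))
           for i, r in enumerate(ref)}
    bwd = {j: int(np.argmin(np.linalg.norm(ref - p, axis=1)))
           for j, p in enumerate(probe)}
    return [(i, j) for i, j in fwd.items() if bwd[j] == i]
```

As noted above, the cross-check variant avoids the application-specific threshold t at the cost of computing the reverse nearest-neighbour search as well.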

Fig. 2

Example of cross-checking matching point pairs: Euclidean distances between two point sets, Green and Red, are estimated where (a) Green is matched against Red, (b) Red is matched against Green and (c) cross-checking is performed after matching Green against Red and vice versa

In order to provide a more efficient detection as well as matching of keypoints and corresponding descriptors, different alternative techniques have been proposed, most notably Speeded-Up Robust Features (SURF) [4]. For SURF, keypoint matching is accelerated by employing keypoint descriptors of reduced size, e.g. 64 or 32 bins. Moreover, since binary descriptors allow for a fast HD-based keypoint comparison, various methods to binarize keypoint descriptors [2, 28] or to directly extract binary keypoint descriptors have been suggested, such as Binary Robust Independent Elementary Features (BRIEF) [8].

2.2 SIFT-based iris recognition

Numerous researchers have applied SIFT for the purpose of iris recognition. Table 1 summarizes the most relevant works with respect to the input data from which SIFT keypoints and corresponding descriptors are extracted, major findings, employed databases and obtained biometric performance. It can be observed that, in comparison to conventional iris recognition systems, the vast majority of listed approaches do not reveal competitive biometric performance. Moreover, the complex procedure required to match sets of SIFT keypoints prevents a rapid comparison of SIFT-based templates, which require significantly more storage than binary iris-codes. Early works on SIFT-based iris recognition [1, 5] were motivated by unconstrained acquisition scenarios where accurate segmentation of the iris region turns out to be challenging. By extracting SIFT keypoints directly from the original (coarsely segmented) iris image, potential self-propagating errors in the normalization stage can be avoided. Reported performance rates of early approaches were not competitive compared to traditional iris recognition systems, while more recent proposals reveal acceptable biometric performance [11]. However, it has been shown that SIFT descriptors carry complementary information which can be utilized to improve the performance of a traditional iris recognition system in a score-level fusion scenario [1]. While it has been shown that keypoint matching fails for challenging off-angle iris images [5], biometric performance tends to improve in case keypoints are extracted from (correctly) normalized and/or enhanced iris textures [31, 56]. Compared to original iris images, normalized textures of constant size yield a more stable number of detected keypoints, and contrast enhancement techniques increase the number of detected keypoints, since fewer of them exhibit low contrast.

Table 1 Overview of related work: most relevant approaches of SIFT-based iris recognition, findings, employed databases, reported results and remarks

Another important issue is the detection of false positive keypoint matches. Since relative positions of detected keypoints are expected to change only slightly, geometrical constraints can be applied in order to detect false positive keypoint matches, which is referred to as trimming of false matches [45]. In [5, 11], it is suggested to divide the detected iris into regions and perform keypoint detection and matching per region. Similarly, a region-based matching of keypoints can be performed on normalized iris textures [31]. In [1], it is suggested to retain only matches within a distinct range of rotation and distance, which means that scale invariance and robustness to pupil dilation can no longer be guaranteed. Moreover, biometric performance might also be improved by employing alternative keypoint descriptors [38] or matching strategies [31]. SIFT can also be employed to detect and match macro-features, i.e. structures within the iris texture such as furrows or crypts [50]. It is important to note that the estimation of a final comparison score between two given iris images has received little attention. Most approaches suggest counting the number of matching keypoints [1, 5]. However, if compared sets of keypoints and corresponding descriptors are subject to strong variations in size, score normalization techniques play an important role.

In summary, it can be concluded that existing approaches to SIFT-based iris recognition fail to obtain competitive biometric performance rates. We also observe that diverse processing steps of SIFT-based iris recognition schemes, e.g. keypoint matching or score estimation, leave room for improvements towards a more reliable recognition system. Such a system would be of particular interest, since previous studies have demonstrated the potential of SIFT-based features to complement conventional iris-biometric feature vectors, i.e. iris-codes. In addition, comparison speed turns out to be vital to enable an operation of SIFT-based iris recognition in identification mode. Note that all presented approaches require complex keypoint matching procedures, which do not allow for a rapid comparison. Finally, it is worth noting that, so far, there are no strategies for reducing the storage required within SIFT-based iris recognition schemes.

3 SIFT-based iris recognition system

In the subsequent subsections, we summarize the pre-processing and feature extraction processes that both the baseline and the proposed system build upon. Further, the baseline SIFT-based comparator and the employed score estimation are described.

3.1 Pre-processing and feature extraction

In the first step, the pupil and outer iris boundaries are detected and the iris is transformed to a normalized texture of \(W\times H=512\times 64\) pixels according to the rubbersheet model [16], see Fig. 1a, b. The dimension of the normalized iris texture is derived from ISO/IEC 29794-6:2015 [20], where the iris radius, i.e. the radius of a circle approximating the iris–sclera boundary, is required to be at least 80 pixels. Subsequently, contrast-limited adaptive histogram equalization (CLAHE) [59] is applied to the texture, resulting in an enhanced texture, as shown in Fig. 1c. CLAHE is performed using a block size of \(40\times 64\) pixels, 256 histogram bins and a clipping limit of 1. Similar image enhancement techniques are employed in conventional iris recognition systems relying on global feature extractors [7]. It can be observed that enhancing the contrast of the extracted iris texture significantly increases the number of detected keypoints. By stretching the histogram of pixel intensities, detected keypoints exhibit higher contrast, i.e. numerous keypoints, which might have been discarded in the original image or the normalized texture due to low contrast, are now retained. Since CLAHE is performed locally, a rather uniform distribution of detected keypoints is obtained, which cannot be achieved by applying a global contrast threshold in case images or image regions vary in contrast.
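The local enhancement step can be illustrated with a deliberately simplified, tile-wise variant of CLAHE in pure NumPy (our sketch using the parameters stated above; unlike full CLAHE, it omits the bilinear interpolation between neighbouring tiles, and the clip-limit scaling is an assumption on our part):

```python
import numpy as np

def clahe_tile(tile, n_bins=256, clip=1.0):
    """Equalize one tile's histogram with a clip limit: counts above the
    limit are redistributed uniformly before building the mapping CDF."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, 256))
    limit = max(1, int(clip * tile.size / n_bins))  # assumed clip scaling
    excess = np.maximum(hist - limit, 0).sum()
    hist = np.minimum(hist, limit) + excess // n_bins
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255
    return cdf[np.clip(tile.astype(int), 0, 255)].astype(np.uint8)

def clahe(texture, tile_w=40, tile_h=64, n_bins=256, clip=1.0):
    """Tile-wise equalization over the normalized texture (no smoothing
    across tile borders, unlike full CLAHE)."""
    out = np.empty_like(texture)
    rows, cols = texture.shape
    for y0 in range(0, rows, tile_h):
        for x0 in range(0, cols, tile_w):
            t = texture[y0:y0 + tile_h, x0:x0 + tile_w]
            out[y0:y0 + tile_h, x0:x0 + tile_w] = clahe_tile(t, n_bins, clip)
    return out
```

In practice, a library implementation such as OpenCV's CLAHE would be used; the sketch only conveys why local equalization spreads detected keypoints more uniformly than a global contrast threshold.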

Let \({\mathbf {p}}=(x,y,\theta )\) be a detected keypoint at \((x,y) \in {\mathbb {R}}^2\) and \(\theta \in [0,360)\) its orientation, i.e. angle. The corresponding keypoint descriptor is denoted by \({\mathbf {d}}=(d_1,d_2,\dots ,d_{D})\), with \(D=128\) in case of SIFT. The biometric reference data \({\mathcal {R}}\) for a single enhanced iris texture consist of a set of N detected keypoints and a corresponding set of their keypoint descriptors, \({\mathcal {R}}=\{({\mathbf {p}}_n, {\mathbf {d}}_n)\}, n=1,\dots ,N\). In contrast to a reliable feature selection procedure, which aims at reducing the extracted feature set to a discriminative subset thereof, the above-described pre-processing step increases N. Moreover, proposed feature selection methods mostly rely on global contrast thresholds [11] without analyzing whether feature descriptors of high-contrast keypoints carry the most discriminative information in case of iris biometrics. Note that, for the baseline system, the storage requirement, which increases with N, is considered a less relevant factor.

3.2 Keypoint comparison and score estimation

Given a reference set \({\mathcal {R}}\) and a probe set \({\mathcal {P}}\), keypoint descriptors are compared using the \(L_2\)-norm. In the employed comparison step, each keypoint descriptor of \({\mathcal {R}}\) is compared against all keypoint descriptors of \({\mathcal {P}}\) and vice versa. After performing the previously described cross-checking step, a set of matching keypoints \({\mathcal {M}}= \{({\mathbf {p}}_{rk},{\mathbf {p}}_{pk})\}, k=1,\dots ,K\) is obtained, \({\mathbf {p}}_{rk}\in {\mathcal {R}}, {\mathbf {p}}_{pk}\in {\mathcal {P}}\). Since an increased number of detected keypoints raises the probability of false positive keypoint matches, such matches are detected and excluded based on geometric constraints. Given \({\mathcal {M}}\), only those matches are retained for which the corresponding keypoint coordinates lie within a distance defined by two thresholds, \(\epsilon _x\) and \(\epsilon _y\), yielding a new set of matches \({\mathcal {M}}'\),

$$ {\mathcal {M}}'= \{({\mathbf {p}}_{rl},{\mathbf {p}}_{pl}): |x_{rl}-x_{pl}|<\epsilon _x, |y_{rl}-y_{pl}| <\epsilon _y\}, $$
(1)

with \(l=1,\dots ,L\) and \(L\le K\). These geometrical constraints are adjusted to detect false positive keypoint matches while tolerating natural variance in keypoint locations. In particular, \(\epsilon _x\) compensates for horizontal shifts of iris textures resulting from head tilts, while \(\epsilon _y\) defines a vertical tolerance for displacements which might be caused by (non-accurate) normalization of iris images exhibiting large variations in pupil dilation [51]. Figure 3 illustrates a SIFT-based comparison of a genuine pair of enhanced iris textures without and with employing geometrical constraints. Note that, in contrast to [1], scale invariance is not affected since textures are normalized prior to keypoint detection. The proposed comparison is somewhat similar to a block- or region-based comparison of keypoint descriptors [5, 31], in the sense that keypoints of matching descriptors have to be located within an \(\epsilon _x \times \epsilon _y\) region. However, in the presented approach, these regions are, in a sense, dynamically stretched around each keypoint such that no matches are missed due to keypoints located at region boundaries. Moreover, matching pairs of descriptors are obtained from the entire sets of reference and probe descriptors so that the probability of false matches occurring within the defined region is further reduced.
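Assuming keypoints reduced to their (x, y) coordinates and matches given as pairs of such tuples, the trimming of Eq. (1) amounts to a simple filter (an illustrative sketch; the function name is ours):

```python
def trim_matches(matches, eps_x, eps_y):
    """Keep only matched keypoint pairs whose coordinates agree within
    the horizontal/vertical tolerances eps_x and eps_y (cf. Eq. 1)."""
    return [(pr, pp) for pr, pp in matches
            if abs(pr[0] - pp[0]) < eps_x and abs(pr[1] - pp[1]) < eps_y]
```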

Fig. 3

Trimming of false matches: comparison of obtained keypoint matches from a genuine comparison (a) without and (b) with geometrical constraints. Keypoint matches are depicted using random colours for a better differentiation of nearby matches

Finally, a comparison score s between \({\mathcal {R}}\) and \({\mathcal {P}}\) is estimated. In diverse approaches [1, 5, 58], it is suggested to return the number of obtained matching pairs of descriptors as final score, i.e. \(s=\Vert {\mathcal {M}}'\Vert \). However, the number of detected keypoints differs significantly when parts of the iris are occluded by eyelids, eyelashes or reflections. Moreover, potential attackers might perform presentation attacks [21] using textures which exhibit a large number of keypoints in order to artificially increase the chance of a false accept. Hence, similar to the estimation of the fractional Hamming distance in relation to iris-code bits, comparison scores need to be normalized. Given \({\mathcal {R}}, {\mathcal {P}}\) and obtained matches of keypoints \({\mathcal {M}}'\), a dissimilarity score s is estimated as,

$$ s=1-\frac{\Vert {\mathcal {M}}'\Vert }{\min (\Vert {\mathcal {R}}\Vert ,\Vert {\mathcal {P}}\Vert )}. $$
(2)

We suggest defining a lower limit for \(\Vert {\mathcal {R}}\Vert \) as a quality check. Such a limit prevents presentation attacks in which only a few descriptors are presented to the system and compared against the entire reference set, which might result (by chance) in a match and, hence, in a low dissimilarity score.
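Eq. (2) together with the suggested quality check might be sketched as follows (the lower limit min_ref is a hypothetical value, not one prescribed by the text):

```python
def dissimilarity(n_matches, n_ref, n_probe, min_ref=10):
    """Normalized dissimilarity score of Eq. (2); min_ref is an assumed
    lower limit on the reference set size, acting as a quality check."""
    if n_ref < min_ref:
        raise ValueError("reference set too small")
    return 1.0 - n_matches / min(n_ref, n_probe)
```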

The presented baseline scheme reveals two major disadvantages: firstly, variable-sized sequences of real-valued keypoint triples and corresponding integer-valued descriptors cause an increased storage requirement, in contrast to conventional compact binary iris-codes; secondly, comparisons of keypoint descriptors are performed by estimating the \(L_2\)-norm between corresponding histograms, which requires \({\mathcal {O}}(N^2)\) steps using the described comparator. Consequently, the required computational effort is significantly higher compared to an efficient Hamming distance-based comparator, which might prevent the system from being operated in identification mode (on large databases). Note that the stated drawbacks apply to all proposals of SIFT-based iris recognition. Hence, an acceleration of the presented baseline SIFT-based iris recognition system is of interest.

4 Proposed system

The proposed scheme builds upon the pre-processing and feature extraction of the previously described baseline scheme. Key components of our system, which are described in detail in the following subsections, include the generation of a keypoint-code and a look-up-table, a specific alignment procedure, a binarization of keypoint descriptors as well as a specific comparator.

4.1 Keypoint-code, look-up table and alignment

We assume that iris images are acquired from a direct frontal angle. This assumption is considered reasonable, since it has been shown that SIFT-based iris recognition becomes unreliable in case of off-angle image acquisition [5]. If iris images are taken frontally, the relative locations of keypoints detected in enhanced iris textures are expected to persist. Based on this assumption, a simultaneous pairing of keypoints can be achieved prior to a comparison of keypoint descriptors. For this purpose, we quantize a given keypoint triple \({\mathbf {p}}=(x,y,\theta )\) and map it to a two-dimensional binary map, in order to ease further processing steps. This two-dimensional binary map is referred to as keypoint-code. In the first step, function f maps a given keypoint triple to a quantized two-dimensional point, \(f({\mathbf {p}}, q_x, q_y, q_\theta )={\tilde{\mathbf {p}}}=(\tilde{x},\tilde{y}) \in {\mathbb {N}}^2\), where \(q_x, q_y\) and \(q_\theta \) denote predefined quantization factors and,

$$\begin{aligned}\tilde{x} &= \lfloor x/q_x\rfloor , \nonumber \\ \tilde{y} &= \lfloor y/q_y\rfloor + \lfloor H/q_y\rfloor \cdot \lfloor \theta /q_\theta \rfloor .\end{aligned} $$
(3)

Hence, angle values only influence the resulting y-coordinates. For instance, let \(q_x=4, q_y=2, q_\theta =90\), and according to the previously defined setting \(H=64\), a keypoint triple \({\mathbf {p}}=(15,5,100)\) is mapped to \((\tilde{x},\tilde{y})=\big (\lfloor 3.75\rfloor ,\lfloor 2.5\rfloor +\lfloor 100/90\rfloor \cdot \lfloor 64/2\rfloor \big )=(3,2+1\cdot 32)=(3,34)\). Note that, while this process reduces the dimensionality of the feature space, it does not cause any information loss and, hence, does not reduce storage requirement. However, in the following process of keypoint pairing, i.e. feature alignment, two-dimensional binary maps offer a more efficient internal representation.
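The mapping f of Eq. (3) is directly expressible with integer floor divisions (a sketch; H defaults to the texture height of 64 used above, and the function name is ours):

```python
def quantize_keypoint(p, q_x, q_y, q_theta, H=64):
    """Map a keypoint triple (x, y, theta) to quantized 2-D coordinates
    following Eq. (3); the angle bin only shifts the y-coordinate."""
    x, y, theta = p
    x_t = int(x // q_x)
    y_t = int(y // q_y) + (H // q_y) * int(theta // q_theta)
    return x_t, y_t
```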

Fig. 4

Keypoint-code and look-up table: two examples of extracted binary codes and their corresponding visualized look-up tables extracted from the enhanced normalized texture of Fig. 1c. Chosen parameters of (a) and (b) yield identical dimensions for codes and look-up tables

In the second step, \((\tilde{x},\tilde{y})\) is mapped into the binary keypoint-code \({\mathbf {C}}\) of dimension \(\tilde{W}\times \tilde{H}=\lfloor W/q_x\rfloor \times \lfloor H/q_y\rfloor \cdot \lfloor 360/q_\theta \rfloor \), such that 1s indicate keypoint coordinates, \({\mathbf {C}}[\tilde{x}][\tilde{y}]=1\). In order to achieve a certain robustness against small changes in keypoint coordinates, neighbouring bit positions are set to 1 in case \(x/q_x\) or \(y/q_y\) are close to the corresponding quantization boundaries. In the previous example, \(\tilde{x}=\lfloor x/q_x\rfloor =\lfloor 15/4\rfloor =3\), while \(x/q_x=15/4=3.75\) is close to its right neighbour 4. Therefore, we define a threshold t against which the distances to neighbouring positions are compared,

$$ {\mathbf {C}}[\tilde{x}+i][\tilde{y}]= 1, {\mathbf {C}}[\tilde{x}][\tilde{y}+j]=1, {\mathbf {C}}[\tilde{x}+i][\tilde{y}+j]=1, $$
(4)
$$ i= \left\{ \begin{array}{ll} 1,&\quad \text{ if } x\mod q_x \ge q_x/2+t\\ -1,&\quad \text{ if } x\mod q_x < q_x/2-t\\ 0, &\quad \text{ otherwise. } \end{array}\right. $$
(5)
$$ j=\left\{ \begin{array}{ll} 1,&\quad \text{ if } y\mod q_y \ge q_y/2+t\\ -1,&\quad \text{ if } y\mod q_y < q_y/2-t\\ 0, &\quad \text{ otherwise. } \end{array}\right. $$
(6)

Obviously, \({\mathbf {C}}[\tilde{x}][\tilde{y}]= {\mathbf {C}}[\tilde{x}+i][\tilde{y}]\) or \({\mathbf {C}}[\tilde{x}][\tilde{y}]={\mathbf {C}}[\tilde{x}][\tilde{y}+j]\) applies in case \(i=0\) or \(j=0\), respectively. For instance, let \(t=0.5\), then for \({\mathbf {p}}=(15,5,100)\) we get \(x\mod q_x= 15 \mod 4=3 \ge 4/2+0.5\) and \(y\mod q_y= 5 \mod 2=1\), resulting in \(i=1\) and \(j=0\). This means that, in this example, the right neighbour of \({\mathbf {C}}[\tilde{x}][\tilde{y}]\) will be set to 1 as well, \({\mathbf {C}}[\tilde{x}+i][\tilde{y}]={\mathbf {C}}[4][34]=1\).
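Putting Eqs. (3)-(6) together, a keypoint-code can be sketched as follows (our illustrative implementation; the code is indexed C[x][y], and out-of-range neighbours are simply skipped, a boundary policy the text does not specify):

```python
import numpy as np

def neighbour_offset(v, q, t):
    """Offset of Eqs. (5)/(6): also mark the adjacent cell when the
    unquantized coordinate lies within t of a quantization boundary."""
    r = v % q
    if r >= q / 2 + t:
        return 1
    if r < q / 2 - t:
        return -1
    return 0

def keypoint_code(keypoints, W, H, q_x, q_y, q_theta, t=0.5):
    """Binary keypoint-code C built from (x, y, theta) triples, with
    neighbouring cells set according to Eq. (4)."""
    W_t = W // q_x
    H_t = (H // q_y) * (360 // q_theta)
    C = np.zeros((W_t, H_t), dtype=np.uint8)
    for x, y, theta in keypoints:
        x_t = int(x // q_x)
        y_t = int(y // q_y) + (H // q_y) * int(theta // q_theta)
        i = neighbour_offset(x, q_x, t)
        j = neighbour_offset(y, q_y, t)
        for dx, dy in {(0, 0), (i, 0), (0, j), (i, j)}:
            xi, yj = x_t + dx, y_t + dy
            if 0 <= xi < W_t and 0 <= yj < H_t:
                C[xi, yj] = 1
    return C
```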

The keypoint-code, which is constructed merely from the set of keypoint coordinates, will be used to determine correspondences between keypoints. Comparisons based solely on keypoint-codes, which do not carry information about the descriptors of keypoints, are not expected to achieve reliable biometric recognition, in particular if the number of detected keypoints is deliberately maximized. Hence, in addition to the keypoint-code, we extract a look-up table, denoted as \({\mathbf {L}}\), of the same dimension, which consists of integer values defining the correspondence between 1s in the keypoint-code and keypoint descriptor indices. For the n-th keypoint \({\mathbf {p}}_n, n=1,\dots ,N, {\mathbf {L}}[\tilde{x}_n][\tilde{y}_n]=n\), where neighbouring positions might be set as well according to the above-described process. It is important to note that, prior to generating the keypoint-code and the look-up table, keypoints are arranged in ascending order with respect to their response values. Thereby, keypoints with highest response values overlap those with lower response values in the keypoint-code and the look-up table in case of identical quantized coordinates. Figure 4 illustrates two examples of keypoint-codes and visualizations of their look-up tables. From the look-up tables, we observe that cells originating from keypoints with highest response values (red cells) tend to overlap those originating from low-response keypoints (blue cells). From the look-up table in Fig. 4a, where \(q_\theta =360\) (angles are ignored), it can clearly be observed that keypoints with highest response values mostly occur in the upper regions of the enhanced iris texture, i.e. iris texture parts near the pupil which are hardly occluded by eyelids or eyelashes.

In order to efficiently retrieve correspondence between keypoints (and their descriptors) of a given reference and probe set, \({\mathcal {R}}\) and \({\mathcal {P}}\), we align their keypoint-codes, \({\mathbf {C}}_r\) and \({\mathbf {C}}_p\), respectively. In generic iris recognition schemes, the minimum HD-score obtained from various circular bit shifts of the probe iris-codes, which correspond to different relative tilt angles, represents an optimal alignment [16]. Similarly, we apply circular bit shifts to the probe keypoint-code, but seek the maximum overlap of 1s between the reference and probe keypoint-code. At the optimal alignment, the intersection of the reference and probe keypoint-code is referred to as alignment-code. Let S be the maximum tolerated pixel-shift in both directions for enhanced iris textures, then the set of considered shifting positions is defined as \({\mathbf {S}}=\{-\lceil S/q_x\rceil ,\dots , \lceil S/q_x\rceil \}\). Let \(g({\mathbf {C}},s)\) denote a keypoint-code \({\mathbf {C}}\) shifted by s bits, then the alignment-code \({\mathbf {A}}_{rp}\) between \({\mathbf {C}}_r\) and \({\mathbf {C}}_p\) is estimated as,

$$ {\mathbf {A}}_{rp} = \max _{s\in {\mathbf {S}}}\left( {\mathbf {C}}_r \cap g\left( {\mathbf {C}}_p,s\right) \right) . $$
(7)

That is, the pairing of two sets of keypoints can now be performed based on efficient bit operations, comparable to HD-based comparators, where the number of required bit comparisons depends on the dimension of keypoint-codes and the degree of rotation compensation. Note that, compared to the baseline system, the proposed alignment process implicitly trims false positive pairs of keypoints.
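The alignment of Eq. (7) reduces to circular shifts and bitwise intersections (a sketch assuming keypoint-codes indexed C[x][y], so horizontal shifts act on axis 0; ties are resolved in favour of the first shift considered):

```python
import numpy as np

def align(C_r, C_p, S, q_x):
    """Circularly shift the probe keypoint-code and maximize the overlap
    of 1s with the reference (cf. Eq. 7); returns the alignment-code
    and the corresponding shift."""
    max_shift = -(-S // q_x)  # ceil(S / q_x)
    best, best_o = None, 0
    for o in range(-max_shift, max_shift + 1):
        inter = C_r & np.roll(C_p, o, axis=0)  # shift along the x-axis
        if best is None or inter.sum() > best.sum():
            best, best_o = inter, o
    return best, best_o
```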

4.2 Binarization of descriptors and proposed comparator

In order to efficiently compare and store keypoint descriptors, we propose to binarize them, too. Given \({\mathcal {R}}\), the corresponding set of keypoint descriptors \(\{{\mathbf {d}}_n\}\) is binarized to obtain \(\{{\mathbf {b}}_n\}\) with \({\mathbf {b}}_n = ({b_1}_n,\dots , {b_{D}}_n)\) and \(n=1,\dots ,N\), by defining a binarization threshold \(\tau \), such that,

$$ {b_i}_n= \left\{ \begin{array}{ll} 1,&\quad \text{ if } {d_i}_n > \tau \\ 0, &\quad \text{ otherwise, } \end{array}\right. \quad i=1,\dots ,D $$
(8)

where \(D=128\) in case of SIFT. With the use of this trivial, yet effective, quantization method, the most dominant bins of each descriptor histogram are set to 1. For the first 16 keypoint descriptors of keypoints detected in Fig. 1c, the resulting binarized descriptors are depicted in Fig. 5 for various values of \(\tau \). Obviously, the proposed binarization significantly reduces the required storage for extracted feature descriptors. In particular, the stored reference data consist of \(\tilde{W}\times \tilde{H}\) bits and integers for the keypoint-code and the corresponding look-up table, respectively, and \(N \times D\) bits for the set of binarized keypoint descriptors. In order to avoid arbitrarily large sets of feature descriptors, the extraction of descriptors could be limited to the N keypoints which exhibit the highest response values.

Given a pair of feature descriptors, \({\mathbf {b}}_r\) and \({\mathbf {b}}_p\), the distance function d is used for comparison,

$$ d({\mathbf {b}}_r,{\mathbf {b}}_p)=1-\frac{2 \Vert {\mathbf {b}}_r \cap {\mathbf {b}}_p\Vert }{\Vert {\mathbf {b}}_r\Vert + \Vert {\mathbf {b}}_p\Vert }. $$
(9)

Hence, the number of coinciding dominant descriptor bins, i.e. 1s, is counted and normalized by the total number of 1s occurring in both binarized descriptors. The proposed distance function operates on bit level and thus enables a highly efficient comparison of feature descriptors.
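Equation (9) is a Dice-style dissimilarity on the set bits; a minimal sketch (names ours; the handling of two all-zero descriptors is our assumption, since Eq. (9) is undefined in that case):

```python
import numpy as np

def descriptor_distance(b_r, b_p):
    """Eq. (9): d = 1 - 2*|b_r AND b_p| / (|b_r| + |b_p|), counting
    coinciding 1s and normalizing by the total number of 1s."""
    inter = int((b_r & b_p).sum())
    total = int(b_r.sum()) + int(b_p.sum())
    if total == 0:  # two empty descriptors: defined here as maximally distant
        return 1.0
    return 1.0 - 2.0 * inter / total
```

Identical binarized descriptors obtain a distance of 0, disjoint ones a distance of 1; the intersection and the bit counts map directly to AND and popcount instructions.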

Let o be the shifting position which yields the optimal alignment, i.e. \({\mathbf {A}}_{rp} = {\mathbf {C}}_r \cap g({\mathbf {C}}_p,o)\). The final dissimilarity score between \({\mathcal {R}}\) and \({\mathcal {P}}\) is estimated as,

$$ \frac{1}{\Vert {\mathbf {A}}_{rp}\Vert }\sum _{w=1}^{\tilde{W}} \sum _{h=1}^{\tilde{H}} {\mathbf {A}}_{rp}[w][h] \cdot d({\mathbf {b}}_{r}[{\mathbf {L}}_r[w][h]], {\mathbf {b}}_{p} [{\mathbf {L}}_p [w+o][h]]). $$
(10)

In case \({\mathbf {A}}_{rp}[w][h]=1\), which indicates that keypoints in \({\mathcal {R}}\) and \({\mathcal {P}}\) occur at identical coordinates, the corresponding binarized descriptors are obtained through the look-up tables and compared. The number of detected pairs, \(\Vert {\mathbf {A}}_{rp}\Vert \), is used to normalize the score.
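Putting the pieces together, the score of Eq. (10) can be sketched as follows (NumPy; names and the empty-alignment fallback are ours; `dist` is any descriptor comparator, e.g. Eq. (9)):

```python
import numpy as np

def dissimilarity_score(A, o, L_r, L_p, B_r, B_p, dist):
    """Eq. (10): average descriptor distance over all keypoint pairs
    flagged by the alignment-code A obtained at the optimal shift o.
    L_r/L_p are look-up tables mapping grid cells to descriptor indices,
    B_r/B_p hold the binarized descriptors as rows."""
    W, H = A.shape
    pairs = np.argwhere(A == 1)
    if len(pairs) == 0:
        return 1.0  # no paired keypoints: maximal dissimilarity (assumption)
    total = 0.0
    for w, h in pairs:
        # probe cells are addressed at the shifted position w + o
        total += dist(B_r[L_r[w, h]], B_p[L_p[(w + o) % W, h]])
    return total / len(pairs)
```

Note that the wrap-around `(w + o) % W` models the circular shift of the probe keypoint-code.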

Fig. 5

Binarization of keypoint descriptors: examples for the first 16 binarized keypoint descriptors of the normalized enhanced texture of Fig. 1c according to three different thresholds. Each row represents one binarized keypoint descriptor. a \(\tau =10\). b \(\tau =20\). c \(\tau =30\)

5 Experiments

The subsequent subsections summarize the experimental setup and present evaluation with respect to biometric performance as well as time measurements.

5.1 Experimental setup

Experiments are carried out on the CASIAv1 [12], CASIAv4-Interval [14] and BioSecure [44] iris databases. For the BioSecure database, we use all left-eye images for performance evaluations; the corresponding right-eye images are used for score normalization purposes in fusion scenarios (see Sect. 5.3). The number of subjects, number of images, image resolution, image type and resulting numbers of genuine and impostor authentication attempts are summarized in Table 2. Examples of iris images of each database are depicted in Fig. 6. In accordance with ISO/IEC 19795-1:2006 [22], biometric performance is evaluated in terms of genuine match rate (GMR) and false match rate (FMR) by plotting detection error trade-off (DET) curves and reporting the equal error rate (EER).

Table 2 Overview of employed databases w.r.t. image resolution, image type, corresponding number of subjects, number of iris images and the resulting number of genuine and impostor comparisons
Fig. 6

Sample NIR iris images of the a CASIAv1, b CASIAv4-Interval and c BioSecure iris database

In Fig. 7, the frequency of locations of detected keypoints in enhanced iris textures across all datasets is depicted as a heatmap. It can be observed that fewer keypoints are detected in iris parts which are frequently occluded by eyelids (two-dome pattern), since these yield homogeneous regions in the enhanced texture. This implicit regional limitation of keypoint detection makes additional storage of noise masks redundant, while these could obviously be incorporated easily in order to exclude unreliable keypoints. Figure 8 shows the frequency distribution of the number of detected keypoints in enhanced iris textures on each database. We observe that the number of detected keypoints may vary considerably, e.g. if large parts of iris textures are affected by occlusions. Hence, the number of detected keypoints could serve as an input for the quality check module of an iris recognition system.

Across all used databases, the average number of keypoints extracted from enhanced iris textures is N=645. Hence, the look-up table has to consist of at least 10 bits per entry in order to index all keypoints. Together with the keypoint-code and the binarized descriptors, the stored template consists of \(\tilde{W}\cdot \tilde{H} + 10\cdot \tilde{W}\cdot \tilde{H} + N\cdot D\) bits. In contrast, assuming that a keypoint triple and its response value can be stored in 32 bits and that each feature descriptor bin consists of 8 bits, the average template size of the baseline system is estimated as \(N\cdot (32 + 128\cdot 8)\) bits. Depending on the used quantization factors, \(q_x, q_y\) and \(q_\theta \), a significant reduction in storage requirement can be achieved in the proposed system. For instance, for \(q_x=4, q_y=4, q_\theta =90\), we get \(\tilde{W} = 512/4 = 128\) and \(\tilde{H} = 64/4 \cdot 360/90 = 64\), such that the average size of a stored template results in \(128\cdot 64 + 10\cdot 128 \cdot 64 + 645 \cdot 128 \simeq 170\) kbit. In contrast, the baseline system would on average require \(645\cdot (32 + 128\cdot 8) \simeq 665\) kbit.
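The storage figures above can be reproduced with a few lines; parameter defaults follow the example in the text (function and parameter names are ours):

```python
def template_bits(W=512, H=64, theta=360, qx=4, qy=4, qt=90,
                  N=645, D=128, index_bits=10):
    """Back-of-the-envelope template sizes, in bits.

    Proposed: keypoint-code + look-up table (index_bits per entry)
              + N binarized D-bit descriptors.
    Baseline: N keypoints, each a 32-bit triple/response plus
              D descriptor bins of 8 bits."""
    Wq = W // qx                    # quantized width, e.g. 512/4 = 128
    Hq = (H // qy) * (theta // qt)  # quantized height x angle bins, e.g. 16*4 = 64
    proposed = Wq * Hq + index_bits * Wq * Hq + N * D
    baseline = N * (32 + D * 8)
    return proposed, baseline
```

With the default parameters this yields 172,672 bits (roughly 170 kbit) for the proposed template versus 681,120 bits (roughly 665 kbit) for the baseline.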

Fig. 7

Location of detected keypoints: frequencies of keypoint coordinates of SIFT keypoints detected in normalized enhanced textures extracted from all images of employed databases

Fig. 8

Number of detected keypoints: distributions of the amount of detected keypoints in normalized enhanced textures for the three employed databases. a CASIAv1. b CASIAv4-Interval. c BioSecure

5.2 Performance evaluation

In a first experiment, we measure the biometric performance of the baseline system of Sect. 3. Table 3 compares the obtained performance rates using SIFT, SURF and BRIEF keypoint descriptors across the employed datasets for different reasonable values of \(\epsilon _x\) and \(\epsilon _y\). As can be seen, for the best configurations, \(\epsilon _x\) values are significantly larger than \(\epsilon _y\) values, since rotations of the eye (head tilts) cause significant variations in the x-coordinates of detected keypoints. The DET curves for the best configurations on the CASIAv4-Interval and the BioSecure database are plotted in Fig. 9. While SURF and BRIEF extract more compact descriptors, which enable a faster comparison of keypoints, recognition accuracy drops drastically for these approaches compared to SIFT, which obtains practical performance rates across all datasets. These drastic performance drops, observed across all datasets, stress the importance of discriminative keypoint descriptors. If no cross-checking is applied, the EER of the best configuration on the CASIAv4-Interval database almost doubles, from 0.246 to 0.431%. From Table 1, we observe that the proposed SIFT-based baseline system, which operates on enhanced iris textures and utilizes geometrical constraints as well as cross-checking of detected keypoint matches, significantly outperforms the vast majority of previously proposed approaches.

Fig. 9

DET curves for the best parameter configurations for SIFT, SURF and BRIEF keypoint detectors and corresponding keypoint descriptors using all-against-all comparison of keypoint descriptors, cross-checking of keypoint matches and geometrical constraints. a CASIAv4-Interval. b BioSecure

Table 3 Biometric performance in terms of EER (%) on employed databases for SIFT, SURF and BRIEF keypoint detectors and corresponding descriptors using an all-against-all comparison of descriptors, cross-checking of keypoint matches and different parameters for geometrical constraints

Table 4 summarizes the obtained performance rates for the proposed system for appropriate parameter settings. For all evaluations, we use \(S=18\), which corresponds to a relative rotation of approximately \(\pm 10^{\circ }\), and \(t=1\) in order to achieve rotation invariance and a more robust pairing of keypoint coordinates, respectively. If only the overlap of probe and reference keypoint-codes, \(\Vert {\mathbf {A}}_{rp}\Vert \), were employed as the comparison score, i.e. not taking into account corresponding keypoint descriptors, biometric performance would decrease significantly. For instance, for the best configuration on the CASIAv4-Interval database the obtained EER increases from 0.225 to 5.021%. For the binarization of keypoint descriptors, thresholds of \(\tau =10\), 20 and 30 have been found to yield the best biometric performance. As can be observed, on the more challenging CASIAv4-Interval and BioSecure databases, the presented scheme maintains the recognition accuracy of the SIFT-based baseline iris recognition system. The best performance rates, which indicate a robust pairing of keypoints, are achieved for rather small quantization values of \(q_x\) and \(q_y\). Focusing on keypoint angles, the best recognition accuracies are obtained for \(q_\theta \) values of 45 or less. In order to further assess the effectiveness of the presented binarization method, we evaluate the proposed system employing two alternative descriptor binarization schemes, referred to as Binarized Scale-Invariant Feature Transform (B-SIFT) [29] and Binarization of Gradient Orientation Histograms (BIG-OH) [3]. In the B-SIFT approach, the median bin value of each descriptor is used as the threshold based on which all bins are binarized. In the BIG-OH scheme, consecutive bin values are compared and greater/smaller-or-equal relations are binary encoded. Both approaches employ HD scores to determine the dissimilarity between keypoint descriptors.
Moreover, in case keypoint descriptors are not binarized, the \(L_2\)-norm can be employed in order to compare them. Obtained performance rates for applying the \(L_2\)-norm on original keypoint descriptors and the B-SIFT as well as the BIG-OH scheme are shown in Table 5 for various parameter configurations. The proposed binarization technique outperforms the B-SIFT as well as the BIG-OH approach on all datasets. Performance rates achieved when applying the \(L_2\)-norm on original keypoint descriptors underline that information loss is negligible in the presented binarization method. DET curves which compare the best configurations of the proposed system to the aforementioned schemes are depicted in Fig. 10 for the CASIAv4-Interval and the BioSecure database. The corresponding probability density functions of the proposed system for both databases are depicted in Fig. 11.

Fig. 10

DET curves for the best parameter configurations for \(L_2\)-norm and HD-based comparison of binarized keypoint descriptors BIG-OH, B-SIFT and the proposed method. a CASIAv4-Interval. b BioSecure

Fig. 11

Probability density functions for the best parameter configurations of the proposed method. a CASIAv4-Interval. b BioSecure

Table 4 Biometric performance in terms of EER (%) on employed databases for the proposed system using different quantization parameters and thresholds for the binarization of keypoint descriptors
Table 5 Biometric performance in terms of EER (%) on employed databases using different quantization parameters with \(L_2\)-norm and fractional HD, after binarization with BIG-OH and B-SIFT, for the comparison of feature descriptors

5.3 Comparison and fusion with traditional systems

Further, we compare and fuse the proposed system with two traditional approaches. The first feature extraction method follows the Daugman-like 1D-Log Gabor feature extraction algorithm of Masek [36] (LG), and the second follows the algorithm proposed by Ma et al. [34] (QSW) based on a quadratic spline wavelet transform. Both methods divide the pre-processed iris texture into stripes to obtain 10 one-dimensional signals, each one averaged from the pixels of 5 adjacent rows, and extract iris-codes consisting of \(512\times 20\) bits. In the comparison stage, HD-based scores are estimated performing \(\pm \,16\) bit shifts to compensate for misalignments. Implementations of the employed feature extractors are freely available in the University of Salzburg Iris Toolkit (USIT) [52]. Table 6 compares the biometric performance of LG and QSW to the best configurations of the previously evaluated schemes. We observe that the proposed system reveals competitive biometric performance compared to the traditional systems across all databases, achieving the lowest EERs on the CASIAv4-Interval and the BioSecure database.

Table 6 Summary of biometric performance in terms of EER (%) for all databases obtained by the baseline systems and best parameter configurations all keypoint-based approaches

In order to investigate whether features extracted by the proposed system complement those extracted by traditional schemes, as reported in [1], we perform a biometric fusion on the BioSecure database, which has been found to be the most challenging one. In the context of biometric fusion, score-level fusion using the sum-rule with proper normalization has been observed to result in competitive performance [24]. To fuse scores obtained by the traditional and the proposed system, we perform a z-score normalization prior to performing a score-level fusion based on the sum-rule. Normalization parameters are estimated employing score distributions obtained from right eye comparisons. From the DET curves in Fig. 12, we observe that a fusion of the traditional schemes does not yield a significant gain in biometric performance. In contrast, a fusion of the proposed system with LG or QSW does reveal a clear improvement in recognition accuracy resulting in EERs of 1.58 and 1.74%, respectively.
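The fusion step described above can be sketched as follows (names ours; the normalization parameters would be the means and standard deviations estimated from the independent right-eye score distributions):

```python
import numpy as np

def zscore_sum_fusion(scores_a, scores_b, norm_a, norm_b):
    """Score-level fusion via z-score normalization and the sum rule:
    each comparator's scores are normalized with (mean, std) estimated
    on an independent score set, then added."""
    za = (np.asarray(scores_a, dtype=float) - norm_a[0]) / norm_a[1]
    zb = (np.asarray(scores_b, dtype=float) - norm_b[0]) / norm_b[1]
    return za + zb
```

Normalizing before summation is essential here, since the proposed comparator and the HD-based comparators produce scores on different scales.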

Fig. 12

DET curves on the BioSecure database for the baseline systems, the best parameter configuration of the proposed method and pair-wise score-level fusions thereof

5.4 Time measurements

Experiments were performed on a single core of an Intel Core i7-3610QM CPU at 3.2 GHz on a standard workstation with sufficient RAM. Keypoint-codes, binarized descriptors and iris-codes are internally represented as two-dimensional arrays consisting of bytes (chars). Comparisons are performed using according bit operations, where the number of bits set, i.e. 1s, within resulting bytes is derived from an 8-bit look-up table. The implementation can be considered lightweight and fully portable, as it makes use neither of larger look-up tables, which would require a significant amount of RAM, nor of CPU-specific PopCnt (population count) instructions. That is, our implementation yields an upper bound for the time required for a single comparison. Figure 13 depicts the average execution time and corresponding boxplots for a single iris biometric comparison required by all considered systems. As can be seen, the proposed system clearly outperforms the baseline systems based on SIFT and SURF, revealing an average execution time comparable to that of the BRIEF-based baseline system, which obtained impractical recognition accuracy. Using B-SIFT or BIG-OH in order to binarize descriptors slightly reduces the execution time, since these schemes do not require a normalization of intermediate scores. However, it has been shown that these schemes fail to maintain the recognition accuracy of the SIFT-based baseline system. Hence, focusing on SIFT, the proposed system yields the best trade-off between biometric performance and authentication speed. The traditional systems still outperform the proposed system in terms of execution time, as they require fewer bit comparisons. Nevertheless, as previously mentioned, limiting the template construction to a fixed number of N keypoints which exhibit the highest response values could be used to achieve further speed-ups as well as more constant execution times.
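The portable bit-counting strategy described above can be sketched as follows (Python for brevity; names ours): the number of set bits for every possible byte value is precomputed once, so counting the 1s of an intermediate result requires only table look-ups.

```python
# 8-bit population-count look-up table: number of set bits per byte value.
POPCNT8 = [bin(v).count("1") for v in range(256)]

def popcount(byte_array):
    """Sum the set bits of a byte sequence via table look-ups only,
    avoiding CPU-specific PopCnt intrinsics."""
    return sum(POPCNT8[b] for b in byte_array)
```

A C implementation would apply the same table to the bytes resulting from ANDing two packed keypoint-codes or descriptors.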

Fig. 13

Comparison of the average execution time (in microseconds) required for a single pair-wise biometric comparison, boxplots depict the corresponding median, upper/lower quartile and upper/lower whisker for different methods (note the logarithmic scale on x-axis)

6 Conclusion

In contrast to previous studies, we have shown that the discriminative power of SIFT-based features is comparable to that of traditional NIR iris recognition systems. In order to overcome the additional shortcomings regarding authentication speed and storage requirement, we proposed an improved SIFT-based iris recognition system achieving an approximately 75-fold speed-up (without the use of intrinsic functions) compared to an according baseline system. The authentication speed provided by the proposed approach is similar to that of traditional schemes. Furthermore, the employed binarization methods significantly reduce storage requirements. Hence, the proposed approach represents the first practically applicable SIFT-based iris recognition system, in particular in scenarios where reliable recognition and comparison speed represent crucial factors, such as large-scale identification systems. Finally, we confirmed that SIFT-based features complement those extracted by traditional schemes, such that a multi-algorithm fusion yields a significant gain in recognition accuracy, as reported in [1]. It is important to note that generic improvements which have been proposed for traditional iris recognition systems, e.g. the use of generalized or personalized weight maps [17], can easily be integrated in order to further improve recognition accuracy.

An application of the proposed system to VW iris images is subject to future work. Since VW iris recognition schemes tend to exhibit inferior recognition accuracy compared to systems based on NIR images [7], a computationally efficient multi-algorithm fusion would be especially appealing in order to improve performance rates. Computational efficiency may also play an important role in mobile iris recognition systems [7, 48], where computing and storage capacities are limited. Further, the presented concept could be applied to speed up fingerprint recognition systems which perform a brute-force comparison of vicinity-based descriptors of minutiae triples, e.g. [6, 9]. In contrast to iris recognition, where alignment involves a one-dimensional shifting, in the case of a fingerprint recognition system proper pre-alignment methods would have to be employed. In order to facilitate reproducible research [54], an implementation of the improved SIFT-based baseline iris recognition system is made available together with [52], and the proposed system will be included upon acceptance.