1 Introduction

Biometrics has gained importance over the two classical authentication paradigms, token-based and knowledge-based methods, owing to its lower chance of being spoofed or stolen. The face is the most primitive means of human authentication, and early research established the feasibility of machine-based recognition of the human face. The need for partial face recognition, together with the complexity of iris recognition, subsequently pushed research towards periocular (periphery of the ocular region) biometrics. The periocular region is observed to contain gross features (prominent in the visible spectrum) as well as subtle features (prominent in the near-infrared spectrum); hence the periocular region of a person can be useful for recognition in either visible spectrum (VS) or near-infrared (NIR) imaging. Many researchers have applied global and local feature extraction techniques to evaluate the performance of periocular biometrics. In general, local features perform better than global features for periocular recognition. However, local features, along with their higher recognition accuracy, bring the overhead of a large feature vector and slow feature extraction and matching. Among recent local features, the Phase Intensive Local Pattern (PILP) [4] is observed to produce the highest accuracy, as it considers a combination of coarse-to-fine features for recognition. PILP generates a large feature template, which requires substantial storage for a database with many subjects, and 1:1 matching of two such large templates is also time consuming. Hence it becomes essential to reduce the PILP feature size for authentication on handheld mobile devices. This article discusses a reduction method for PILP; the reduced feature vector is termed R-PILP. The reduced feature is tested on two constrained NIR databases, BATH and CASIAv3, and two partially unconstrained VS databases, UBIRISv2 and FERETv4. Evaluation over these four datasets assesses its efficacy in recognition from both subtle and gross features.

2 Related work

The last decade has seen a paradigm shift of research from highly accurate constrained biometric recognition towards achieving moderate accuracy in unconstrained scenarios, and then improving the latter in terms of accuracy and time. The demand for biometric systems in forensic identification and terrorism investigation is responsible for this shift. In an unconstrained scenario, the difficulty in acquiring a desirable biometric template can be broadly attributed to two classes of reasons:

  1. Intrinsic limitations of the imaging system: distant imaging, limitation of the imaging spectrum, out-of-focus blur, and motion blur

  2. External environmental factors: non-cooperation of subjects, out-of-plane (non-orthogonal) imaging, and illumination variance

Periocular biometrics emerged as a counterpart to the pre-established iris biometric to support the following scenarios: to use images categorized as 'Failure to Acquire' (fully or partially closed eyes occluding the iris) by an iris biometric system, to find a minimal subset of the face biometric so that recognition from partial face images becomes possible, and to achieve recognition in visible spectrum images. It has also been observed that even the most advanced preprocessing techniques are insufficient to bring unconstrained templates up to the standard of ideal constrained templates. Feature extraction therefore plays an important role in extracting features invariant to in-plane orthogonal transformations (viz. scaling and rotation) from these less informative, noisy templates. Over the last two decades, researchers have established iris recognition systems that are highly accurate (of the order of 99% in identification mode) on NIR iris databases such as BATH, CASIAv1, and ICE. The journey was pioneered by experiments on constrained iris databases. In particular, Proenca and Alexandre [24] reported their system to perform with an Equal Error Rate (EER) of 1.01% on CASIA, while the EER increases to 2.83% on the noisy UBIRIS database. Likewise, the work of Vatsa et al. [28] operates with high accuracy on constrained datasets, but the performance degrades comparatively on UBIRIS, which is a partially non-cooperative and noisy dataset. Researchers have attributed this fall in accuracy to: a. visible spectrum imaging and b. unconstrained noise during acquisition in the UBIRIS database. Hence research proceeded to investigate whether recognition from unconstrained VS eye images is achievable, and to answer the following question: can the addition of features from the periphery of the eye further improve the recognition accuracy already achieved using the iris alone?

While researchers were yet to establish the answer to this question, Hollingsworth et al. [8] demonstrated the existence of features in the periocular region of NIR images through experiments with human subjects. In the first step of their approach, human experts are shown a periocular image of a subject; next, the experts are shown a few periocular images and asked to identify which image belongs to the subject shown in the first step. Accuracy is then calculated depending on whether the experts can identify a subject from the periocular image. Moreover, the experts are asked to mention which feature in the NIR image best helped their recognition judgement. Further research by Hollingsworth et al. [9] extends this investigation to the existence of features in both NIR and VS periocular images through human expert analysis and automated algorithms. The work by Miller et al. [15] achieves the highest accuracy in periocular recognition among the landmark works in this domain [1, 8, 11, 14,15,16,17, 19, 20, 26, 27, 29, 30]. However, their tests were conducted on the FRGC face database, which consists of high-resolution VS images of the order of 1704 × 2272 or 1200 × 1600 pixels, with about 250 pixels between the centers of the two eyes. Patel et al. [21] have discussed kinship verification based on periocular biometrics; their algorithm is based on neighborhood repulsed metric learning (NRML).
Zhao and Kumar [31] have investigated feature extraction from the periocular region by employing semantics-assisted convolutional neural networks (SCNN). The proposed method is claimed to achieve superior performance compared to contemporary works with a relatively smaller training set. Ahuja et al. [2] have proposed a convolutional neural network (CNN) based hybrid technique for ocular smartphone-based biometrics. Their method employs supervised and unsupervised CNNs augmented with a Root SIFT model. However, recognition from low-resolution VS periocular images has remained unverified.

Local feature extraction using automatic scale selection has been investigated by Chenhong and Zhaoyang [13] for normalized NIR iris images. The technique first filters the given normalized iris image with a bank of Laplacian of Gaussian (LoG) filters at many different scales and computes the normalized response of every filter. The maxima of the normalized response over scales at each point are selected together as the optimal filter outputs of the given iris image. The iris feature is represented as a vector comprising location and scale information, which is further binary coded for the final representation. In [3], the conventional Local Binary Pattern (LBP) [18] followed by the Scale Invariant Feature Transform (SIFT) has been applied to extract features from the periocular region and tested on the UBIRISv2 and FERETv4 databases. The accuracy of this feature extraction technique is limited to approximately 85%, which motivates the development of novel features that can pull the accuracy of noisy VS periocular recognition to a higher level. Section 3 illustrates the PILP technique along with its reduction to obtain R-PILP, which attempts to achieve periocular recognition from low-resolution VS images. We convert the VS color images to grayscale, and the grayscale images (without separate color information) are given as input to the proposed approach; thus the input is made even less favourable for recognition, and we attempt to attain recognition with the proposed R-PILP in this challenging scenario. As comparative literature, we consider a global feature, PIGP [3], and the widely used local features SIFT, SURF, and PILP. SIFT and SURF are described in Sections 2.1 and 2.2 respectively. A detailed description of PILP is given in this article as the method of reducing PILP is closely interlaced with it. PIGP is not described separately, since PIGP computes its feature descriptor in the same way as PILP.

2.1 Scale invariant feature transform

A local feature descriptor termed the Scale Invariant Feature Transform (SIFT) [12] is used as a comparative feature to the proposed one. SIFT provides a stable set of features while being less sensitive to geometric transformations of the area of interest. The feature is extracted in two steps: a. keypoint detection and b. keypoint descriptor computation. SIFT extracts keypoints using a Difference of Gaussians, and a distribution of gradient orientations from a window around the interest point is used as the descriptor. Finally, the keypoints of one image are matched against those of another to find a distance between the images. If the distance is higher than a threshold, the two images are concluded to be captured from different subjects; otherwise they are concluded to be of the same subject.

2.1.1 Keypoint detection

The keypoints are detected from the periocular image using a cascade filtering approach, in order to search for stable features across all possible scales. To define the scale space, the input periocular image I is convolved with a Gaussian kernel G(x, y, σ) as defined by

$$ L(x,y,\sigma)=G(x,y,\sigma) \ast I(x,y) $$
(1)

where ∗ is the convolution operation and σ defines the width of the Gaussian filter. The Difference of Gaussian (DoG) images are computed from two nearby scales separated by a constant multiplicative factor k as:

$$ D(x,y,\sigma)=L(x,y,k\sigma)-L(x,y,\sigma) $$
(2)

DoG images are used to detect interest points with the help of local maxima and minima across different scales. Each pixel in a DoG image is compared to its 8 neighbors in the same scale and 9 neighbors each in the scales above and below. The pixel is selected as a candidate keypoint if it is a local maximum or minimum in this 3 × 3 × 3 region.
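To make the scale-space extrema search concrete, a minimal sketch is given below; it assumes numpy and scipy are available and uses illustrative values of σ, k and the number of scales rather than those prescribed in [12].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_keypoints(image, sigma=1.6, k=2 ** 0.5, n_scales=4):
    """Candidate keypoints are pixels that are the maximum or minimum of
    their 3 x 3 x 3 neighbourhood in the DoG stack (8 neighbours in the
    same scale, 9 in each of the adjacent scales)."""
    # L(x, y, sigma): Gaussian-blurred images at successive scales.
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(n_scales + 1)]
    # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma), per (2).
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(n_scales)])
    keypoints = []
    for s in range(1, n_scales - 1):              # need a scale above and below
        for y in range(1, image.shape[0] - 1):
            for x in range(1, image.shape[1] - 1):
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                centre = dog[s, y, x]
                if centre == cube.max() or centre == cube.min():
                    keypoints.append((x, y, s))
    return keypoints
```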

2.1.2 Keypoint descriptor computation

Orientation is assigned to each keypoint location to achieve invariance to image rotation, as the descriptor can then be represented relative to this orientation. To determine the keypoint orientation, a gradient orientation histogram is computed in the neighborhood of the keypoint. The scale of the keypoint is used to select the Gaussian-smoothed image L. For each Gaussian-smoothed image L(x, y), the magnitude m(x, y) and orientation θ(x, y) are computed as given in (3) and (4).

$$ m(x,y) = \sqrt{(L(x+1,y)-L(x-1,y))^{2} + (L(x,y+1)-L(x,y-1))^{2}} $$
(3)
$$ \theta(x,y) = \tan^{-1} \left[\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}\right] $$
(4)

An orientation histogram is then formed from the gradient orientations around each keypoint. The histogram has 36 bins for the 360 degree range of orientations, and each sample is weighted by its gradient magnitude and by a Gaussian-weighted circular window with σ equal to 1.5 times the scale of the keypoint before being added to the histogram. The highest peak in the histogram corresponds to the dominant orientation, and any other local peak within 80% of the largest peak is used to create an additional keypoint with that orientation. This is done to increase stability during matching [12].

Once an orientation has been selected, the feature descriptor is computed as a set of orientation histograms over 4 × 4 pixel neighborhoods. These histograms contain 8 bins each, and each descriptor contains an array of 16 histograms around the keypoint. This generates a SIFT feature descriptor of 4 × 4 × 8 = 128 elements. The descriptor vector is invariant to rotation, scaling, and illumination.

2.1.3 Keypoint matching

To match two images, the corresponding feature sets are subjected to nearest-neighbour matching, and the number of matched keypoints is considered as the parameter to interpret the degree of matching.
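A minimal sketch of this nearest-neighbour matching is shown below; it assumes each image is already represented by a numpy array of 128-D descriptors, and the distance-ratio test used to accept a pair is an illustrative choice rather than the exact acceptance rule of [12].

```python
import numpy as np

def count_matches(desc_a, desc_b, ratio=0.8):
    """Count nearest-neighbour matches between two descriptor sets.
    desc_a: (p, 128) array, desc_b: (q, 128) array."""
    matches = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)  # Euclidean distance to every descriptor in B
        order = np.argsort(dists)
        # Accept the pair only if the closest descriptor is clearly
        # better than the second closest.
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches += 1
    return matches      # higher score = stronger evidence of the same subject
```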

2.2 Speeded up robust features

A local feature descriptor termed Speeded Up Robust Features (SURF) [6] is used as a comparative feature to the proposed one. SURF, like SIFT, provides a stable set of features while being less sensitive to geometric transformations of the area of interest; however, the SURF descriptor is 64-dimensional while the SIFT descriptor is 128-dimensional. The feature is extracted in two steps: a. keypoint detection and b. keypoint descriptor computation. SURF extracts keypoints using the Hessian matrix, and a distribution of Haar wavelet responses from a window around the interest point is used as the descriptor. Finally, the keypoints of one image are matched against those of another to find a distance between the images. If the distance is higher than a threshold, the two images are concluded to be captured from different subjects; otherwise they are concluded to be of the same subject.

2.2.1 Keypoint detection

Hessian-matrix-based interest point (keypoint) detection is adopted in SURF. The determinant of the Hessian matrix is used for selecting both location and scale. Given a point P = (x, y) in an image I, the Hessian matrix H(P, σ) at P and scale σ (where σ is the standard deviation of the Gaussian) can be found using (5).

$$ H(P,\sigma) = \left[ \begin{array}{ll} L_{xx}(P,\sigma) & L_{xy}(P,\sigma)\\ L_{xy}(P,\sigma) & L_{yy}(P,\sigma) \end{array}\right] $$
(5)

where \(L_{xx}(P,\sigma)\) is obtained through the convolution of the Gaussian second-order derivative \(\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\) with the image I at point P. Likewise \(L_{xy}(P,\sigma)\) and \(L_{yy}(P,\sigma)\) can be derived. \(D_{xx}\), \(D_{xy}\), and \(D_{yy}\) are discretized versions of \(L_{xx}\), \(L_{xy}\), and \(L_{yy}\) respectively; the discretization allows filters of different sizes at different scales.

The approximation for Hessian determinant can be computed using (6).

$$ Det(H_{approx}) = D_{xx}\ D_{yy}\ -\ (0.9D_{xy})^{2} $$
(6)

The scale space construction starts with a 9 × 9 filter, and then filters of sizes 15 × 15, 21 × 21, and 27 × 27 are applied. The increment in filter size is doubled for every new octave.

Keypoints are localized in scale and image space by applying non-maximum suppression in a 3 × 3 × 3 neighbourhood. The local maxima found on the determinant of the Hessian matrix are then interpolated in image space.
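The sketch below illustrates the determinant response of (6) at a single scale. For simplicity it computes the Gaussian second derivatives directly with scipy rather than using the box-filter approximation on an integral image that gives SURF its speed, so it is an illustration of the determinant test only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_response(image, sigma):
    """Approximate det(H) of (6) at one scale from Gaussian second derivatives."""
    img = image.astype(float)
    Lxx = gaussian_filter(img, sigma, order=(0, 2))   # d^2/dx^2 (x = column axis)
    Lyy = gaussian_filter(img, sigma, order=(2, 0))   # d^2/dy^2 (y = row axis)
    Lxy = gaussian_filter(img, sigma, order=(1, 1))   # mixed derivative
    return Lxx * Lyy - (0.9 * Lxy) ** 2

# Candidate keypoints are local maxima of this response across position and
# scale (e.g. a 3 x 3 x 3 non-maximum suppression over a stack of responses).
```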

2.2.2 Keypoint descriptor computation

An orientation is assigned to each keypoint from a circular window around it. The neighbourhood is then split into 4 × 4 sub-regions, and the Haar wavelet responses of this neighbourhood are calculated for each sub-region. The size of the wavelets is scale dependent. Haar wavelet responses are calculated in the x and y directions separately. The feature vector v of each 4 × 4 sub-region can be obtained using (7).

$$ v=\left( \sum d_{x},\ \sum d_{y},\ \sum |d_{x}|,\ \sum |d_{y}| \right) $$
(7)

where \(d_{x}\) and \(d_{y}\) are the Haar wavelet responses obtained from the sub-region in the x and y directions respectively.

Concatenating the feature vectors of all 4 × 4 sub-regions results in a descriptor of length 4 × 4 × 4 = 64 which represents the keypoint. Such a 64-D feature vector is extracted for every detected keypoint.
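The sketch below assembles such a descriptor from precomputed Haar responses; the 20 × 20 sampling grid (5 × 5 samples per sub-region) and the final unit-length normalisation are illustrative assumptions rather than details fixed by the text above.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """Build the 64-D descriptor of (7) from Haar responses dx, dy sampled on a
    20 x 20 grid around one keypoint (already rotated to the keypoint
    orientation and Gaussian weighted)."""
    desc = []
    for i in range(0, 20, 5):                 # 4 x 4 grid of sub-regions
        for j in range(0, 20, 5):
            sx = dx[i:i + 5, j:j + 5]
            sy = dy[i:i + 5, j:j + 5]
            desc.extend([sx.sum(), sy.sum(), np.abs(sx).sum(), np.abs(sy).sum()])
    desc = np.asarray(desc)                   # 16 sub-regions x 4 values = 64-D
    return desc / (np.linalg.norm(desc) + 1e-12)
```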

2.2.3 Keypoint matching

Keypoint matching follows the same procedure as in SIFT: the corresponding feature sets of two images are subjected to nearest-neighbour matching, and the number of matched keypoints is considered as the parameter to interpret the degree of matching.

3 Phase intensive local pattern

This section illustrates the Phase Intensive Local Pattern (PILP), which is designed to match fine-to-coarse features in periocular VS images. The feature extraction description is followed by the technique used for matching two patterns obtained through feature extraction.

3.1 Extraction of PILP feature

The feature extraction technique that produces the final feature vector representing a periocular image comprises four sequential steps: a. keypoint detection through phase-intensive patterns, b. edge feature removal, c. oriented histogram computation, and d. feature vector formation. These steps are elaborated hereafter.

3.1.1 Keypoint detection through phase intensive patterns

The first step of a local feature extraction technique is to choose from the periocular image a few points which hold the important features and are sufficient to uniquely describe the image and make it identifiable. Such points are termed keypoints. The first part of keypoint generation (pattern generation) is exactly the same as the generation of patterns by PIGP [3]; we nevertheless elaborate the process in this article for completeness.

We vary the scale (Δ) for feature detection from 3 to 9 with an increment of 2; accordingly, the filter used to trace the pattern at each scale varies in size from 3 × 3 to 9 × 9. This range of scales is justified because the variation in the distance of the subject from the camera is approximately three times the minimum camera-subject distance. At a given scale Δ, the phase-intensive global pattern (PIGP) at a pixel (x_c, y_c) with respect to its Δ² − 1 neighbors, considering a phase-tilt ϕ, can be derived using (8). At the pixel (x_c, y_c), this equation applies the s-function and assigns a weight to each neighbouring pixel depending on its spatial location and the phase-tilt ϕ along which we aim to extract the pattern. For example, when Δ = 3, every pixel has 3² − 1 = 8 neighbors. To find the PIGP at a pixel (x_c, y_c) at angle \(\phi = \frac {\pi }{4}\), the 3 × 3 neighborhood around the pixel is operated on with the corresponding filter and the result is obtained. Figure 1 illustrates the working of the s-function for Δ = 3, where the neighbors of the center pixel are labeled 0 if they are less than the center pixel, and 1 otherwise.

$$\begin{array}{@{}rcl@{}} PIGP(x_{c},y_{c},{\Delta},\phi) &=& \frac{\sum\limits_{n=1}^{{\Delta}^{2}-1} s(i_{n},i_{c})\cdot 2^{\sin\left( \tan^{-1}\left( \frac{y_{n}-y_{c}}{x_{n}-x_{c}}\right)-\phi\right)}}{\sum\limits_{n=1}^{{\Delta}^{2}-1}{2^{\sin\left( \tan^{-1}\left( \frac{y_{n}-y_{c}}{x_{n}-x_{c}}\right)-\phi\right)}}}\\ &=& \sum\limits_{n=1}^{{\Delta}^{2}-1} \left( s(i_{n},i_{c})\cdot\left( \frac{2^{\sin\left( \tan^{-1}\left( \frac{y_{n}-y_{c}}{x_{n}-x_{c}}\right)-\phi\right)}}{\sum\limits_{n=1}^{{\Delta}^{2}-1} 2^{\sin\left( \tan^{-1}\left( \frac{y_{n}-y_{c}}{x_{n}-x_{c}}\right)-\phi\right)}}\right)\right) \end{array} $$
(8)

where,

$$ s(i_{n},i_{c}) = \begin{cases} 1, & \text{if}\ \ i_{n} -i_{c} \ge 0\\ 0, & \text{otherwise.} \end{cases} $$
(9)
Fig. 1 An example of how the s-function works for the PILP filter

This process can be re-represented as a convolution with a filter formed by the process shown in Fig. 2. For each Δ the demonstrated process is repeated eight times, varying the value of ϕ from 0 to \(\frac {7\pi }{4}\) with an interval of \(\frac {\pi }{4}\). Figure 3 demonstrates the eight filters formed at scale Δ = 3. We consider four unique filters out of these eight, as every two filters with a phase difference of π have the same structure (as can be observed in Fig. 3). Figure 4 shows the four 3 × 3 filters in the filter bank for Δ = 3. Similar filter banks are formed for the other values of Δ. Finally we obtain four filter banks (corresponding to Δ = 3, 5, 7, 9), each having four filters; the spatial size of all four filters at scale Δ is Δ × Δ.
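A small sketch of how one such filter bank may be constructed is given below; using numpy's arctan2 in place of tan⁻¹ of the coordinate ratio, and leaving the centre weight at zero, are implementation assumptions.

```python
import numpy as np

def pilp_filter(delta, phi):
    """One Delta x Delta phase-intensive filter: each neighbour of the centre
    pixel gets weight 2^sin(angle_to_centre - phi), normalised so that the
    weights sum to 1, following (8)."""
    c = delta // 2
    w = np.zeros((delta, delta))
    for r in range(delta):          # r: row (y), q: column (x)
        for q in range(delta):
            if (r, q) == (c, c):
                continue            # centre pixel carries no weight
            angle = np.arctan2(r - c, q - c)
            w[r, q] = 2.0 ** np.sin(angle - phi)
    return w / w.sum()

# Four unique filters per scale (phases pi apart share the same structure),
# one bank per scale Delta = 3, 5, 7, 9.
banks = {d: [pilp_filter(d, i * np.pi / 4) for i in range(4)] for d in (3, 5, 7, 9)}
```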

Fig. 2 Extrema detection method

Fig. 3 Filter formation for PILP

Fig. 4 PILP filter bank

The overall flow of the keypoint detection methodology is as follows. When the aforementioned four filters of a filter bank are convolved with the original image, four pattern images are obtained. These four images are subjected to extrema detection, where each pixel of a pattern image at phase ϕ is compared with the 27 pixels (including the pixel itself) of its 3 × 3 × 3 neighbourhood across the same phase image and the phase images at [(ϕ + π/4) mod 2π] and [(ϕ − π/4) mod 2π] (refer to Fig. 5). These extrema are taken as potential keypoints containing features.

Fig. 5 Intensity representation of PILP orthogonal filter bank

To understand the whole feature extraction process mathematically, the following symbols are introduced:

I: Original image

f_{Δ,iπ/4}: Filter of size Δ × Δ corresponding to phase ϕ = iπ/4 (i = 0, 1, 2, 3)

F_Δ: Filter bank comprising four Δ × Δ filters, i.e., F_Δ ≡ {f_{Δ,iπ/4} | i = 0, 1, 2, 3}

I_{Δ,iπ/4}: Result of convolving I with f_{Δ,iπ/4} and applying the s-function

k_{Δ,iπ/4}: Keypoints found as local extrema in I_{Δ,[i mod 4]π/4} with respect to itself and the neighbouring pixels in I_{Δ,[(i+1) mod 4]π/4} and I_{Δ,[(i−1) mod 4]π/4}

K_Δ: Set of all keypoints at scale Δ × Δ, i.e., K_Δ ≡ {k_{Δ,iπ/4} | i = 0, 1, 2, 3}

K: Set of all keypoints from all scales, i.e., \(K \equiv \bigcup \limits _{\Delta = 3,5,7,9} K_{\Delta }\)

The whole keypoint extraction procedure is given in Algorithm 1, which is further presented schematically in Fig. 6. The algorithm is computation intensive as it involves convolution operations and pixel-wise extrema detection; however, the computation is worthwhile as it yields a sufficient number of keypoints.

Algorithm 1 Keypoint extraction procedure

Fig. 6 Keypoint extraction method

3.1.2 Edge feature removal

It is necessary to identify keypoints belonging to edges and to remove them before further processing. For a keypoint at (x, y) in the image I_{Δ,iπ/4}, where i = 0, 1, 2, 3, the magnitude m_{Δ,iπ/4}(x, y) is computed as given in (10). The value of m_{Δ,iπ/4}(x, y) is high if there is an edge at (x, y); otherwise it is low.

$$ m_{\Delta,i\pi/4}(x,y) = \sqrt{(I_{\Delta,i\pi/4}(x+1,y)-I_{\Delta,i\pi/4}(x-1,y))^{2} + (I_{\Delta,i\pi/4}(x,y+1)-I_{\Delta,i\pi/4}(x,y-1))^{2}} $$
(10)

A high value of m denotes that the keypoint lies on an edge; such keypoints are discarded through proper thresholding and are not considered in further processing [12].
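A minimal sketch of this pruning step is shown below; the threshold value and the (row, column) indexing convention are illustrative assumptions.

```python
import numpy as np

def remove_edge_keypoints(pattern_img, keypoints, threshold=10.0):
    """Discard keypoints whose gradient magnitude (10) in the pattern image
    exceeds a threshold, i.e. keypoints that lie on edges.
    Keypoints are assumed to lie away from the image border."""
    kept = []
    for (x, y) in keypoints:
        gx = float(pattern_img[x + 1, y]) - float(pattern_img[x - 1, y])
        gy = float(pattern_img[x, y + 1]) - float(pattern_img[x, y - 1])
        if np.hypot(gx, gy) < threshold:      # low magnitude -> not an edge point
            kept.append((x, y))
    return kept
```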

3.1.3 Keypoint descriptor computation

Orientation is assigned to each valid keypoint location to achieve invariance to image rotation, as the descriptor can then be represented relative to this orientation. To determine the keypoint orientation, a gradient orientation histogram is computed in the neighborhood of the keypoint. For a keypoint at (x, y) in the image I_{Δ,iπ/4}, where i = 0, 1, 2, 3, the orientation θ_{Δ,iπ/4}(x, y) is computed as given in (11).

$$ \theta_{\Delta,i\pi/4}(x,y) = \tan^{-1} \left[\frac{I_{\Delta,i\pi/4}(x,y+1)-I_{\Delta,i\pi/4}(x,y-1)}{I_{\Delta, i\pi/4}(x+1,y)-I_{\Delta,i\pi/4}(x-1,y)}\right] $$
(11)

For every detected keypoint (x, y) in I_{Δ,iπ/4}, an orientation histogram is formed from a 16 × 16 pixel block oriented at θ_{Δ,iπ/4}(x, y) around the keypoint. The 16 × 16 pixel block is further divided into 16 sub-blocks, each of 4 × 4 pixels. For each sub-block, a histogram with 36 bins over the 2π range of orientations is formed, and each histogram is represented by its 8 most significant directional values (peaks), taken within 80% of the largest peak. The 16 sub-blocks thus produce 16 × 8 = 128 values, which represent the orientation content of the 16 × 16 window around the keypoint (x, y). This 128-dimensional value is used to represent the detected keypoint in the feature vector.

As described, we obtain 128 values corresponding to each keypoint; hence, we obtain a feature vector of size N × 128, where N is the number of detected keypoints. This PILP feature vector is claimed to represent each identity uniquely and to be capable of matching periocular images.
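The sketch below assembles one 128-D descriptor from the per-pixel magnitudes and orientations of a 16 × 16 block; reading the "8 most significant directional values" as the eight largest weighted histogram bins of each sub-block is an interpretation made for illustration.

```python
import numpy as np

def pilp_descriptor(block_theta, block_mag):
    """128-D descriptor from a 16 x 16 block centred on a keypoint.
    block_theta / block_mag: 16 x 16 arrays of orientation and magnitude,
    the block being already aligned to the keypoint orientation."""
    descriptor = []
    for i in range(0, 16, 4):                 # 4 x 4 grid of 4 x 4-pixel sub-blocks
        for j in range(0, 16, 4):
            theta = block_theta[i:i + 4, j:j + 4].ravel() % (2 * np.pi)
            mag = block_mag[i:i + 4, j:j + 4].ravel()
            hist, _ = np.histogram(theta, bins=36, range=(0, 2 * np.pi), weights=mag)
            descriptor.extend(np.sort(hist)[-8:][::-1])   # 8 most significant peaks
    return np.asarray(descriptor)             # 16 sub-blocks x 8 peaks = 128 values
```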

3.1.4 Why is reduction needed?

It is evident from Section 3.1.3 that if N keypoints are detected through PILP in a periocular image, the corresponding feature vector size becomes N × 128. This feature is large when N is large, feature extraction is time consuming, and feature matching becomes time intensive. To reduce the feature size, three mechanisms can be adopted: a. pruning the number of keypoints (N), b. reducing the number of keypoints by applying a clustering technique, and/or c. reducing the dimension of each feature from 128 to some lower value.

Figure 7 shows the schematic diagram of the proposed mechanism. If an image yields an N × 128 PILP feature vector as shown in Fig. 7a (where N is the number of detected keypoints), some of the N keypoints are pruned and M keypoints (M < N) are retained in the reduced PILP feature set, yielding an M × 128 feature vector as shown in Fig. 7b.

Fig. 7 Feature reduction of PILP to obtain the R-PILP feature set

There is an undoubted need to order the features from high relevance to low relevance in order to ensure that only the least significant features are pruned. Pruning the least significant features does not hamper the recognition rate, whereas removal of any important feature can compromise the final recognition accuracy.

3.1.5 Proposed methodology: reduced phase intensive local pattern (R-PILP)

Each SIFT and PILP descriptor contains an array of 16 histograms (each having 8 values) around the keypoint, so each keypoint descriptor is of size 16 × 8 = 128. Each set of 8 values is thus independent: the 8 values within a set have cohesion among themselves, but two such sets are uncorrelated. The proposed method checks whether each of these 8-value sequences is monotonic (consistently increasing or decreasing). If an 8-value sequence is non-monotonic, its differentiation can never produce zero values; however, if it is monotonic, it will certainly produce 50% zero values after a finite number of differentiations. A 128-D feature with most of its parts monotonic is less significant than another 128-D feature with most of its parts non-monotonic. We thereby rank the features and remove the least significant ones. The overall process is given in Algorithm 2, and a sketch follows below.
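A minimal sketch of this ranking-and-pruning idea is given below; it is one possible reading of Algorithm 2, with the count of monotonic 8-value segments as the significance score and an illustrative keep ratio of 0.8 (the 20% reduction adopted later).

```python
import numpy as np

def monotone_count(descriptor):
    """Count how many of the sixteen 8-value segments of a 128-D descriptor
    are monotonic (consistently non-decreasing or non-increasing)."""
    count = 0
    for segment in descriptor.reshape(16, 8):
        diffs = np.diff(segment)
        if np.all(diffs >= 0) or np.all(diffs <= 0):
            count += 1
    return count

def reduce_pilp(descriptors, keep_ratio=0.8):
    """Rank keypoints from most to least significant (fewest monotonic
    segments first) and keep the top keep_ratio fraction (R-PILP).
    descriptors: (N, 128) array of PILP keypoint descriptors."""
    scores = np.array([monotone_count(d) for d in descriptors])
    order = np.argsort(scores)        # low score = mostly non-monotonic = more significant
    m = int(np.ceil(keep_ratio * len(descriptors)))
    return descriptors[order[:m]]
```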

One important issue in this reduction process is not to lose any significant feature, as that would harm the recognition accuracy. Hence we have experimented to determine how much reduction still achieves almost the same accuracy as PILP. It is empirically found that up to 20% reduction in the PILP feature does not affect the recognition accuracy; however, going beyond this limit and pruning more features degrades the recognition immediately.

An example of the reduction technique is demonstrated on a small region of a UBIRISv2 image in Fig. 8. The proposed reduction technique can also be applied to SIFT and SURF, owing to the similarity of their orientation-histogram-based feature extraction. The technique is scalable in terms of the number of features retained, i.e., the degree of reduction. At the cost of accuracy, the degree of reduction can be tuned to the demands of the system to be developed. If a system is to be developed in which security is the prime issue and no compromise on accuracy can be made, the reduction can be limited to a very small portion of the total number of features. On the other hand, if a system needs minimal template storage and can compromise on recognition accuracy, this reduction technique can be applied to prune a large portion of the input feature set.

Fig. 8 Example of feature reduction on a sample UBIRISv2 periocular image

3.2 Matching of proposed R-PILP feature

The matching algorithm plays a significant role in any biometric system. In local feature matching, the total number of paired keypoints (the similarity score) is used to decide the authenticity of an individual. Let I be the set of all images available in the database, and let I_m be a gallery image and I_n a probe image, where I_m, I_n ∈ I. Let K_m be the set of p keypoints found in I_m and K_n the set of q keypoints found in I_n by applying the R-PILP local feature detector. Let D_m and D_n denote the sets containing the keypoint descriptors for each keypoint in K_m and K_n respectively; |D_m| and |D_n| denote the numbers of keypoints found in I_m and I_n. For each element in D_m, the Euclidean distance to every element in D_n is found. The nearest-neighbour approach pairs the i-th element in D_m with the j-th element in D_n iff the descriptor distance between the two (after multiplying by a threshold) is minimum. The details are given in Algorithm 3.

Algorithm 3 Matching of two R-PILP feature sets

4 Experimental results

The performance of the proposed R-PILP feature is compared with its two successively developed predecessors, PIGP and PILP. As R-PILP is a successor to these features, comparing against them directly reflects the improvement achieved by the reduction. The comparison is made with respect to accuracy, template size (proportional to the number of keypoints), and feature extraction and matching time. Two landmark features, SIFT and SURF, are also used for comparison.

4.1 Performance measures

The following measures are employed to evaluate the performance of the proposed biometric system:

  1. False Acceptance Rate (FAR): FAR is the frequency with which imposters claiming a false identity are fraudulently accepted. This statistic measures biometric performance in the verification mode. A false accept occurs when the query template of an individual is incorrectly matched to the existing biometric template of another individual.

  2. False Rejection Rate (FRR): FRR is the frequency with which individuals who should be correctly verified are rejected. This statistic measures biometric performance in the verification mode. A false reject occurs when an individual is not matched correctly to his/her own existing biometric template.

  3. Accuracy (Acc): The accuracy of a biometric system is defined in terms of its rates of true acceptance and true rejection, as illustrated in (12).

    $$ Acc = \left( 100-\frac{FAR+FRR}{2}\right)\% $$
    (12)
  4. Receiver Operating Characteristic (ROC): The ROC curve depicts the dependence of FRR on GAR [Genuine Acceptance Rate (GAR) = 1 − FRR] as the decision threshold changes. The curve is plotted using linear, logarithmic or semi-logarithmic scales. The ROC can also be represented by plotting FRR against FAR as the threshold changes.

  5. Cumulative Match Characteristic (CMC): The rank-k identification rate indicates the number of correct identifications that occur within the top k matches. Let R_k denote the number of probe-set elements identified within the top k; then the probability of identification is I = R_k/N. The CMC curve plots the probability of identification I at various ranks k.

  6. Decidability index or d′ index: The d′ index [10] measures the separation between the arithmetic means of the genuine and imposter probability distributions in standard-deviation units, as defined in (13). A small computational sketch of (12) and (13) follows this list.

    $$ d^{\prime} = \frac{\sqrt{2}\ |\mu_{genuine}-\mu_{imposter}|}{\sqrt{\sigma^{2}_{genuine}+\sigma^{2}_{imposter}}} $$
    (13)
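For clarity, the two closed-form measures (12) and (13) can be computed from the match-score distributions as in the small sketch below; the helper names are illustrative.

```python
import numpy as np

def accuracy(far, frr):
    """Accuracy as defined in (12), with FAR and FRR given in percent."""
    return 100.0 - (far + frr) / 2.0

def d_prime(genuine_scores, imposter_scores):
    """Decidability index (13): separation of the genuine and imposter score
    distributions in standard-deviation units."""
    g = np.asarray(genuine_scores, dtype=float)
    i = np.asarray(imposter_scores, dtype=float)
    return np.sqrt(2) * abs(g.mean() - i.mean()) / np.sqrt(g.var() + i.var())
```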

4.2 Databases used

The proposed R-PILP is tested on the publicly available BATH and CASIAv3 databases to demonstrate its accuracy on the subtle features available in NIR images. To evaluate the performance of R-PILP on gross features, noisy unconstrained images from UBIRISv2 and periocular regions cropped from FERETv4 are used. Details of these databases can be found in Table 1.

Table 1 Details of the publicly available test databases used for evaluation of the proposed approach

4.3 Experiment 1: evaluation of performance on subtle features

In this experiment, the proposed reduction approach is applied to PILP features obtained from high-resolution NIR images. Testing is performed on the BATH and CASIAv3 databases, which contain high-resolution, intensity-variation-compensated NIR images of the ocular region captured in an orthogonal view. The performance results (accuracy curve, ROC curve, score distribution, and CMC curve) of R-PILP on the BATH and CASIAv3 databases are shown in Figs. 9 and 10 respectively. Further quantitative results can be found in Table 2, which shows that R-PILP performs as well as PILP on the NIR databases.

Fig. 9 Performance of R-PILP with NN matching technique on the BATH database

Fig. 10 Performance of R-PILP with NN matching technique on the CASIAv3 database

Table 2 Comparison of performance of PIGP, PILP, and R-PILP on BATH, CASIAv3, UBIRISv2, and FERETv4 databases

4.4 Experiment 2: evaluation of performance on gross features

In this experiment, the proposed reduction approach is applied to PILP features obtained from low-resolution VS images. Testing is performed on the UBIRISv2 and FERETv4 databases, which contain low-resolution color VS images of the ocular region captured at a distance in an unconstrained view.

The performance results (accuracy curve, ROC curve, score distribution, and CMC curve) of R-PILP on the UBIRISv2 and FERETv4 databases are shown in Figs. 11 and 12 respectively. Further quantitative results can be found in Table 2, which shows that R-PILP performs as well as PILP on the VS databases.

Fig. 11 Performance of R-PILP with NN matching technique on the UBIRISv2 database

Fig. 12 Performance of R-PILP with NN matching technique on the FERETv4 database

4.5 Comparative analysis of R-PILP with PIGP and PILP

The discussed post-reduction technique takes as input a feature vector F of size N × 128 from PILP and reduces its keypoints to 0.8N, so that the reduced feature vector has size 0.8N × 128. In the case of PILP, if two images (one gallery and one probe) with feature vectors of size N_1 × 128 and N_2 × 128 are to be matched, the Nearest Neighbour (NN) matching technique performs N_1 × N_2 distance calculations. If one 128-D distance calculation costs t time, the total NN distance calculation consumes t_PILP = (N_1 × N_2) × t time.

However, if the same process is executed on the same images with the proposed reduction, the gallery and probe images have feature vectors of size 0.8N_1 × 128 and 0.8N_2 × 128 after reduction. Hence NN matching needs only 0.8N_1 × 0.8N_2 distance calculations, and the total NN distance calculation consumes t_R-PILP = (0.8N_1 × 0.8N_2) × t time.

Hence the speed-up of R-PILP with respect to PILP (\(\text {Speed-up}^{\textit { R-PILP}}_{\textit { PILP}}\)) achieved through the reduction can be calculated as shown in (14).

$$\begin{array}{@{}rcl@{}} \text{Speed-up}^{R-PILP}_{PILP} & = &\frac{1/t_{R-PILP}}{1/t_{PILP}}\\ & = &\frac{t_{PILP}}{t_{R-PILP}}\\ & = &\frac{(N_{1} \times N_{2}) \times t}{(0.8N_{1} \times 0.8N_{2}) \times t}\\ & = &\frac{1}{0.8 \times 0.8}\\ & = &1.5625 \end{array} $$
(14)

Though the reduction itself consumes time, it takes place only once per template. The reduced feature R-PILP speeds up the matching process by a factor of 1.56 with respect to PILP, which is significant because matching is executed every time a live query arrives. Moreover, removal of these 20% of keypoints does not affect the performance of R-PILP compared with its predecessor PILP.

To support this theoretical analysis, Table 3 presents the number of keypoints found per image by SIFT, SURF, PILP, and R-PILP. The table presents the number of keypoints as a range [Q1 - Q3], where N/4 images yield fewer than Q1 keypoints and N/4 images yield more than Q3 keypoints when a feature extraction method is applied on a database of size N. However, from Table 4, it is clear that R-PILP takes more time for feature extraction than SIFT, SURF, and PILP.

Table 3 Comparison of number of keypoints extracted by SIFT, SURF, PILP, and proposed R-PILP on BATH, CASIAv3, UBIRISv2, and FERETv4 databases
Table 4 Comparison of time of feature extraction by SIFT, SURF, PILP, and proposed R-PILP on BATH, CASIAv3, UBIRISv2, and FERETv4 databases

As feature matching time is theoretically proportional to the number of keypoints, it can be observed in Table 5 that the matching time of R-PILP is about 63% of that of PILP, which supports our theoretical analysis.

Table 5 Comparison of time of feature matching by SIFT, SURF, PILP, and proposed R-PILP on BATH, CASIAv3, UBIRISv2, and FERETv4 databases

5 Conclusion

R-PILP, obtained through a 20% reduction of PILP, delivers approximately the same performance as PILP, as confirmed by testing on four datasets. However, reduction beyond this 20% margin is found experimentally to degrade the performance; we have hence empirically limited the reduction to 20% of the existing keypoints. The reduction, while maintaining approximately the same accuracy as PILP, offers two additional benefits: a. a smaller template size, reducing database storage and query communication cost for a networked biometric system, and b. reduced matching time, making the response to a user query faster. The matching of R-PILP is theoretically analysed to be 1.56 times faster than that of PILP. The analysis is supported by experimentation, where the matching time of R-PILP is observed to be about 63% of that of PILP. This reduction in matching time, aided by the removal of outlier features, makes R-PILP a strong candidate feature for mobile biometric authentication systems.

Abbreviations used

ACC: Accuracy

CMC: Cumulative Match Characteristic

FAR: False Acceptance Rate

FRR: False Rejection Rate

PIGP: Phase Intensive Global Pattern

PILP: Phase Intensive Local Pattern

R-PILP: Reduced Phase Intensive Local Pattern

ROC: Receiver Operating Characteristic

RR: Recognition Rate

SIFT: Scale Invariant Feature Transform

SURF: Speeded Up Robust Features