
1 Introduction

Biometrics recognize an individual by digitally capturing physical or behavioral human features, and they also control access to services, devices, or records. Biometric signatures such as the iris, fingerprints, speech, or typing cadence are sufficiently distinct to serve as identifiers that ensure reliable identification of a person [1]. Nowadays, unlocking a smartphone screen with facial recognition, getting weather updates with the help of Siri, or logging in to an online bank account by fingerprint are real-time examples of biometrics. With the growing use of such technology, we are increasingly concerned about the authenticity and privacy of our identity. The iris is among the most effective biometrics because of its high performance and accuracy. Even though an individual cannot forget or lose his/her biometric, biometric templates stored in a database are vulnerable to many security and privacy attacks [2]. Cross-spectral iris detection provides rich knowledge about the human iris by using specific spectral bands. Based on the past literature on cross-spectral iris detection, feature-based approaches are not reliable enough to give accurate results under changes in parameters such as spatial conditions and positions during iris extraction and image acquisition, which degrades the performance of iris recognition in the extraction process [3]. The structure of the iris is considered to look different under specific spectral lighting. A generic iris recognition system includes four modules, as given in Fig. 1: (1) image acquisition of the iris, (2) segmentation and normalization, (3) feature extraction, and (4) matching and recognition.

Fig. 1. Iris recognition system

Image acquisition requires high-quality NIR images to provide an accurate identity, and most existing implementations require subjects to comply fully with the acquisition protocol. In pre-processing, the outer iris boundary and the pupil are detected. The portions surrounding the iris pattern, such as eyelids, eyelashes, or specular reflections, are identified as unnecessary sections and recorded in background noise masks. With the aid of a CNN, the iris structure is extracted as the required feature representation. The features extracted from a test image are then matched against the trained model.

The use of cross-spectral iris technologies has increased rapidly in the past few years. Iris recognition is typically performed in a particular wavelength range, such as near-infrared (NIR) or visible (VIS); a cross-spectral system spans different spectra to gain more information about the human iris for better recognition. Cross-spectral iris recognition is used in many biometric systems, such as visas and biometric passports, personal identification card schemes such as the PAN card, Aadhaar card, and driving license, and e-commerce applications. In the VIS spectrum, iris recognition may be affected by corneal reflections: light-colored irises are highlighted, while dark-colored irises are barely noticeable [4]. Matching iris representations within the same spectrum, whether VIS or NIR, is simpler than matching across spectra. Since cross-spectral setups are commonly targeted in commercial implementations [5], NIR-VIS cross-spectral work has gained attention.

The rest of the paper is organized as follows: Sect. 2 surveys recent literature on cross-spectral iris recognition, Sect. 3 gives a step-by-step explanation of the proposed method, and Sect. 4 provides an experimental analysis and a discussion of the results. The conclusion is given in Sect. 5.

2 Related Work

Recent research in the recognition domain has shown that cross-spectral iris matching is much more complex than matching within a single wavelength, i.e., visible-to-visible or near-infrared-to-near-infrared. The previous literature provides many examples of the successful use of CNNs in image classification [6], hand-written character recognition [7], and face recognition [8]. CNN-based implementations of periocular recognition [9], iris recognition [10], iris image segmentation [11], and the detection of fake iris images [12] have also been accomplished by other recent researchers working toward commercial applications. Modern ocular recognition has learned more precise characteristics for user authentication, such as pose or gender. Hence, more complex biometric recognition problems, such as cross-spectral iris matching, are explored to discover the pervasive capabilities of deep learning architectures. For effective iris detection, iris images acquired under both VIS and NIR illumination yield more accurate results than images under a single illumination. This problem is generally regarded as more demanding, and also more appealing to researchers, than cross-sensor iris matching.

Kuo Wang and Ajay Kumar [13] used discrete hashing for cross-spectral iris recognition. Early attempts extensively examined the information available from iris images obtained under specific spectral bands and suggested combining these complementary details with those available from traditional NIR iris detection to boost matching precision. Performance deterioration when matching iris images acquired at different resolutions or from two separate sensors [14] has also been noted in the literature. Bowyer et al. [15] showed that degradation factors, such as pupil dilation, contact lenses, and template aging, affect matching consistency and contribute to less effective match delivery in biometrics. Ramaiah et al. [16] suggested advanced cross-spectral iris detection utilizing bi-spectral processing: estimating NIR images from VIS iris photographs and then comparing the projected NIR images with the NIR images in the gallery database. Markov random fields (MRF) were employed as a more precise and accurate framework to overcome the disadvantages of cross-spectral iris matching.

3 Proposed Method

The flow diagram of our method is given in Fig. 2. As shown in this figure, our model has two phases. The first is the offline training phase, in which we train our model using a CNN. The second is the testing phase, in which a user image is matched against the trained model. In the end, a match score is generated to decide whether the given image is an imposter or genuine. Each block in this diagram represents a step in the CNN-based cross-spectral iris recognition pipeline. The major difference between the proposed work and the work in [13] is the use of a multi-class SVM for matching, as shown in the last block of Fig. 2.

Fig. 2. Flow diagram of proposed method

The first block in the flow diagram indicates the iris database we used, PolyU's bi-spectral iris database. The next block covers pre-processing and normalization, the main phases of iris image processing. Pre-processing first locates the roughly circular outer boundaries of the pupil and the iris in an eye image. Images should be captured under well-defined conditions, with sufficient lighting, a stated distance, and other quality criteria, to obtain photographs of the highest standard. An eye image contains not only the iris but also unwanted portions of the eyelid, eyelashes, pupil, and sclera. In some instances, specular reflections may corrupt the iris pattern within the desired iris region. The angle between the camera and the eye and the external light levels also influence the intensity of the iris. We therefore have to separate and remove the unwanted noise and find the circular iris region. The fundamental operations involved in pre-processing the iris before collecting the characteristics are iris localization, iris normalization, and enhancement [13]. The next step is iris localization, where the region of interest, i.e., the circular iris area, is isolated in the initial stages. Localization involves two simple operations: detecting the eyelids and detecting the iris boundaries. There are some undesired portions around the iris, occluded by eyelashes and eyelids, which must be detected. The next task is to define the internal and external boundaries of the iris image. For segmenting the iris boundaries, Canny edge detection and the Hough transform are used (a code sketch of this step is given below).

The dimensions of eye images derived from various sources differ owing to the dilation of the iris at different light intensity levels. Inconsistency resulting from different viewing distances, camera rotation, eye rotation, etc. changes the resolution of the iris and the actual distance between the limbus and the pupil boundary. The iris images must therefore be produced with constant dimensions. Under different conditions, two images of the same iris may differ, and the size of the iris region is not constant because it is a doughnut-shaped structure. To solve these challenges, we normalize the iris images using the rubber sheet model invented by Daugman [17], as in Fig. 3.
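The following is a minimal sketch of the boundary-detection step using OpenCV, which provides both Canny edge detection and a circular Hough transform. The threshold and radius parameters are illustrative assumptions, not settings taken from the paper, and would need tuning per database.

```python
# Sketch: iris/pupil localization with Canny edges and the circular
# Hough transform (OpenCV). All parameter values are illustrative.
import cv2
import numpy as np

def localize_iris(eye_gray):
    """Return (x, y, r) circles for the pupil and iris boundaries."""
    blurred = cv2.GaussianBlur(eye_gray, (9, 9), 2)  # suppress eyelash noise
    edges = cv2.Canny(blurred, 30, 60)               # Canny edge map; HoughCircles
                                                     # repeats this stage internally,
                                                     # with param1 as the upper threshold
    pupil = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                             param1=60, param2=30, minRadius=20, maxRadius=60)
    iris = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=100,
                            param1=60, param2=30, minRadius=60, maxRadius=140)
    if pupil is None or iris is None:
        raise ValueError("iris/pupil boundary not found")
    # first detected circle for each boundary: [x_center, y_center, radius]
    return (np.round(pupil[0, 0]).astype(int),
            np.round(iris[0, 0]).astype(int))
```

The two separate radius ranges encode the assumption that the pupil circle is strictly smaller than the limbus circle.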

Fig. 3. Rubber sheet model

The rubber sheet model is a linear model. Daugman's model is applied to each pixel of the desired iris image. In this model, \((r, \theta )\) is a pair of normalized polar coordinates, where r lies within [0, 1] and \(\theta \) within [0, 2\(\pi \)]. The transformation of the Cartesian coordinates (x, y) of pixel values to the non-concentric polar coordinates \((r, \theta )\) of pixel values is derived using this model. The mapping of the desired iris image I(x, y) is shown in Eqs. 1 and 2.

$$\begin{aligned} I \Big ( x\left( r,\theta \right) , y \left( r, \theta \right) \Big ) \rightarrow I \left( r, \theta \right) \end{aligned}$$
(1)

where \(x(r, \theta )\) and \(y(r, \theta )\) are linear combinations of the pupil boundary points \((x_p(\theta ), y_p(\theta ))\) and the points \((x_s(\theta ), y_s(\theta ))\) on the outer perimeter (limbus) of the desired iris.

$$\begin{aligned} x\left( r,\theta \right)&= \left( 1 - r \right) * x_p( \theta ) + r * x_s( \theta ) \nonumber \\ y\left( r,\theta \right)&= \left( 1 - r \right) * y_p( \theta ) + r * y_s( \theta ) \end{aligned}$$
(2)

where I(x, y) is the pixel value of the iris image, (x, y) are its original Cartesian coordinates, \((r, \theta )\) are the non-concentric polar coordinates, and \((x_p, y_p)\) and \((x_s, y_s)\) indicate the pupil and iris boundary points at angle \(\theta \). Hence, using the above normalization method, we get a normalized form of the iris image as in Fig. 4 (a code sketch of the unwrapping follows the figure).

Fig. 4. Normalization of an iris image
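A direct NumPy implementation of Eqs. 1 and 2 can serve as a sketch of this unwrapping. It assumes circular pupil and limbus boundaries from the localization step; the output resolution (64 radial by 512 angular samples) is an assumption for illustration, not a value stated in this section.

```python
# Sketch: Daugman's rubber sheet normalization (Eqs. 1 and 2).
import numpy as np

def rubber_sheet(img, pupil, iris, n_r=64, n_theta=512):
    """Unwrap the annular iris region into a fixed-size rectangular strip."""
    xp, yp, rp = pupil                     # pupil centre and radius
    xs, ys, rs = iris                      # limbus centre and radius
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r = np.linspace(0, 1, n_r)[:, None]    # r in [0, 1], as a column

    # boundary points (x_p, y_p) and (x_s, y_s) at each angle theta
    x_p = xp + rp * np.cos(thetas); y_p = yp + rp * np.sin(thetas)
    x_s = xs + rs * np.cos(thetas); y_s = ys + rs * np.sin(thetas)

    # Eq. 2: linear interpolation between the two boundaries
    x = (1 - r) * x_p[None, :] + r * x_s[None, :]
    y = (1 - r) * y_p[None, :] + r * y_s[None, :]

    # Eq. 1: sample I(x(r, theta), y(r, theta)) (nearest-neighbour)
    xi = np.clip(np.rint(x).astype(int), 0, img.shape[1] - 1)
    yi = np.clip(np.rint(y).astype(int), 0, img.shape[0] - 1)
    return img[yi, xi]                     # shape (n_r, n_theta)
```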

Cross-entropy loss is used as the loss function of the CNN; it measures the loss of the trained model. Deep learning architectures such as CNNs have self-learning capabilities. The CNN operating in our system is similar to AlexNet and is shown in Fig. 5. The network has three convolutional layers, two fully connected (FC) layers, and three pooling layers. A nonlinear activation function, the Rectified Linear Unit (ReLU), is applied after each pooling layer and after the first fully connected layer. The classification task is achieved at the last fully connected layer (a sketch of such a network is given after Fig. 5).

Fig. 5. Softmax cross-entropy loss in CNN, taken from [13]
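The following PyTorch sketch mirrors the description above: three convolutional layers, three pooling layers, two FC layers, ReLU after each pooling layer and the first FC layer, and softmax cross-entropy as the loss. The channel counts and kernel sizes are assumptions, since the paper does not specify them.

```python
# Sketch of the described CNN (channel counts/kernel sizes assumed).
import torch
import torch.nn as nn

class IrisCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2),   # conv 1
            nn.MaxPool2d(2), nn.ReLU(inplace=True),       # pool 1 + ReLU
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # conv 2
            nn.MaxPool2d(2), nn.ReLU(inplace=True),       # pool 2 + ReLU
            nn.Conv2d(64, 128, kernel_size=3, padding=1), # conv 3
            nn.MaxPool2d(2), nn.ReLU(inplace=True),       # pool 3 + ReLU
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(inplace=True),    # FC 1 + ReLU
            nn.Linear(512, n_classes),                    # FC 2: class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# nn.CrossEntropyLoss combines softmax and cross-entropy (the role of Eq. 6).
model, loss_fn = IrisCNN(n_classes=418), nn.CrossEntropyLoss()
```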

The \(i^{th}\) channel output \(y^i\) of a convolutional layer is calculated using Eq. 3. In this equation, \(x^j\) is the \(j^{th}\) input channel from the previous layer, \(w^{ij}\) is the convolutional kernel, and \(b^{ij}\) is the neuron bias.

$$\begin{aligned} y^i = \sum _{j} \Big (b^{ij}+ w^{ij}*x^j\Big ) \end{aligned}$$
(3)

We use max pooling to extract the maximum value in each pooling window, which reduces the size of the input for the next stage. ReLU is used as the activation function; it passes only non-negative values and is defined in Eq. 4.

$$\begin{aligned} y^i = max(y^i,0) \end{aligned}$$
(4)

In a fully connected (FC) layer, every output node is connected to all nodes of the previous layer; the layer produces the output vector given by Eq. 5.

$$\begin{aligned} y^i = b^i + \sum _{j} w^{ij}*x^j \end{aligned}$$
(5)

The weights of the network are initialized randomly, which generates variation among the neurons. The error is calculated by the softmax cross-entropy: the normalized error between the actual value and our predicted value, as given in Eq. 6.

$$\begin{aligned} \frac{1}{N} \sum _{n=1}^{N} H(p_n, l_n) = -\frac{1}{N} \sum _{n=1}^{N} \Big [ l_n \log p_n + (1-l_n) \log (1-p_n) \Big ] \end{aligned}$$
(6)

where N indicates the number of classes, \(l_n\) is the ground-truth value, and \(p_n\) is the predicted value. The output vector has size \(1 \times N\); the value of each element represents the predicted probability of the corresponding class label for the input iris image. Backpropagation aims to minimize the loss of the training model so as to maximize the probability of predicting the actual class (a small numerical illustration of Eq. 6 follows).
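As a worked illustration of Eq. 6, the following NumPy snippet averages the per-element cross-entropy over the output vector. The example probabilities are made up for illustration; this is not the paper's training code.

```python
# Sketch: the normalized cross-entropy of Eq. 6 in NumPy.
import numpy as np

def cross_entropy(p, l, eps=1e-12):
    """p: predicted probabilities, l: 0/1 ground truth, both length N."""
    p = np.clip(p, eps, 1 - eps)               # avoid log(0)
    return -np.mean(l * np.log(p) + (1 - l) * np.log(1 - p))

l = np.array([0, 1, 0, 0])                      # one-hot ground truth
p = np.array([0.1, 0.7, 0.1, 0.1])              # softmax output
print(cross_entropy(p, l))                      # ~0.168
```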

3.1 Matching

Matching is conducted during the testing step of the biometric recognition system. After feature extraction, the classifier finds the label for each corresponding test image. Several types of classifiers can be used for this function, e.g., the Support Vector Machine, softmax regression, and neural networks. A multiclass Support Vector Machine classifier is introduced for this matching function, as shown in Fig. 6. The notation of the multiclass Support Vector Machine (SVM) is given as follows: the set of iris training data is denoted as \((x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)\), where \(y_i \in \{-1, 1\}\) and the feature vector is \(x_i \in R^d\); each binary classifier separates two classes.

Fig. 6. Using multi-class SVM for matching

Algorithm 1, given below, is used to classify the test images with the trained model. Classification is done using the SVM.

Algorithm 1. SVM-based classification of test images

For a dataset with M classes, the multi-class SVM trains M binary classifiers, each of which separates one class from all other classes, and then selects the class that gives the greatest margin (one-vs-all). A sketch of this scheme is given below.
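The one-vs-all scheme is available off the shelf in scikit-learn; the sketch below uses it with a linear SVM. The linear kernel and the placeholder random data are assumptions for illustration; in our pipeline the inputs would be the CNN feature vectors described above.

```python
# Sketch: one-vs-all multiclass SVM matching with scikit-learn.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier

X_train = np.random.rand(100, 1000)      # placeholder CNN feature vectors
y_train = np.random.randint(0, 10, 100)  # placeholder class labels (M = 10)

clf = OneVsRestClassifier(LinearSVC())   # trains M binary classifiers
clf.fit(X_train, y_train)

x_test = np.random.rand(1, 1000)
label = clf.predict(x_test)              # class with the greatest margin wins
scores = clf.decision_function(x_test)   # per-class margins (match scores)
```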

4 Experimental Results and Analysis

The experiments use PolyU's publicly available bi-spectral iris database, which contains cross-spectral iris images acquired in two different spectra from 209 subjects. The collection comprises 418 classes of bi-spectral photographs collected from the 209 subjects, with 15 instances per spectrum for each class. Images in the two spectra were acquired simultaneously. There are a total of 12,540 iris images (\(209 \times 2 \times 15 \times 2\)). The dimensions of the original images are \(640 \times 480\) pixels. A publicly available iris segmentation algorithm is used to accurately extract the iris regions in this iris recognition implementation; each segmented and normalized iris image has a size of \(512 \times 64\) pixels. Sample NIR and VIS iris images of the database at different pre-processing steps of the iris recognition experiment are shown in Fig. 7. This database also contains low-quality samples; representative examples are shown in Fig. 8.

Fig. 7. Pre-processing steps of PolyU bi-spectral iris image

Fig. 8. Poor quality images from PolyU iris database

For the experimental analysis, the CNN with softmax cross-entropy loss is employed as a feature extractor. The feature vectors from this iris recognition experiment are then classified; each feature vector is a binary vector of 1000 bits (a sketch of comparing such binary codes is given after Fig. 9). Comparative matching results are obtained against the common IrisCode method for performance evaluation. The convolutional neural network has various layers, and its self-learned features capture corners, textures, and edges. The accuracy and loss of the model during training on the iris dataset are given in Fig. 9; the accuracy of the model increases as the number of epochs increases.

Fig. 9. Training and testing accuracy of the proposed CNN
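Binary feature vectors such as these 1000-bit codes can be compared with a normalized Hamming distance, as in the IrisCode baseline. The snippet below is a minimal sketch of that comparison, with random codes standing in for real features.

```python
# Sketch: normalized Hamming distance between binary feature vectors.
import numpy as np

def hamming_distance(code_a, code_b):
    """Fraction of disagreeing bits between two binary feature vectors."""
    return np.mean(code_a != code_b)

a = np.random.randint(0, 2, 1000)
b = np.random.randint(0, 2, 1000)
print(hamming_distance(a, b))  # ~0.5 for unrelated codes, near 0 for a match
```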

The metrics used to analyze the performance of the proposed method are:

Genuine Acceptance Rate (GAR): the proportion of genuine iris templates correctly accepted out of the total number of iris templates tested.

False Rejection Rate (FRR): the proportion of genuine iris templates falsely rejected out of the total number of iris templates tested, as shown in Eq. (7). FRR can also be expressed using GAR, i.e., GAR = 1 − FRR.

$$\begin{aligned} FRR = \frac{Number \; of \; false \; rejections}{Number \; of \; identification \; attempts} \end{aligned}$$
(7)

False Acceptance Rate (FAR): the proportion of imposter iris templates wrongly accepted out of the total number of iris templates tested, as shown in Eq. (8).

$$\begin{aligned} FAR = \frac{Number \; of \; false \; acceptances}{Number \; of \; identification \; attempts} \end{aligned}$$
(8)

Equal Error Rate (EER): the error value obtained at the threshold where FRR and FAR are equal. The performance measures are calculated using the genuine score distribution and the imposter score distribution (a sketch is given below). The matching results are compared with other approaches, such as the MRF approach [18] and the IrisCode approach [19]. A summary of the matching results in terms of the equal error rate is shown in Table 1.
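A minimal sketch of computing FAR, FRR, and EER from the two score distributions follows, assuming the scores are distances (lower means a better match); the synthetic Gaussian distributions are placeholders, not results from the paper.

```python
# Sketch: FAR/FRR curves (Eqs. 7 and 8) and EER from score distributions.
import numpy as np

def far_frr_eer(genuine, imposter, n_steps=1000):
    lo = min(genuine.min(), imposter.min())
    hi = max(genuine.max(), imposter.max())
    thresholds = np.linspace(lo, hi, n_steps)
    far = np.array([(imposter <= t).mean() for t in thresholds])  # Eq. 8
    frr = np.array([(genuine > t).mean() for t in thresholds])    # Eq. 7
    i = np.argmin(np.abs(far - frr))           # threshold where FAR ~= FRR
    return far, frr, (far[i] + frr[i]) / 2     # EER

genuine = np.random.normal(0.30, 0.05, 1000)   # synthetic genuine distances
imposter = np.random.normal(0.48, 0.04, 1000)  # synthetic imposter distances
_, _, eer = far_frr_eer(genuine, imposter)
print(f"EER = {eer:.2%}")
```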

Table 1. Comparison of EER results
Table 2. Iris recognition accuracy on the iris databases used

Table 2 gives the accuracy of the proposed iris recognition method on the databases listed. From the table, it is observed that the accuracy after normalization is slightly lower than the accuracy after segmentation. This suggests that the proposed CNN captures more discriminative iris texture from segmented images than from normalized ones.

5 Conclusion

This work utilizes deep neural networks for cross-spectral iris detection. We experimented with iris recognition on the PolyU cross-spectral iris database. Our trained model achieves an EER of 9.38%. Further, we used a multi-class SVM classifier to perform the recognition task on iris images. The segmented and normalized iris images are given to the CNN as input. The results show that accuracy deteriorates after normalization compared to segmentation.