1 Introduction

Passwords are becoming obsolete for important tasks in many organizations, which have been moving to biometric authentication. According to a report by Verizon [1], passwords are responsible for 81% of data breaches. Biometric systems can mitigate this risk. Biometric authentication is more secure because biometric traits are harder to recreate or forge owing to their distinctive features, and they are unique to each person [2, 3].

Biometric traits are body measurements and calculations associated with human characteristics. Physical traits such as the iris, fingerprint, and finger vein can be used to identify an individual and distinguish them from others [2, 3]. The iris is a widely accepted biometric trait due to its uniqueness and stability [4, 5]. A biometric system involves two phases: enrollment and verification. In the enrollment phase, the reference biometric is captured and a feature extractor extracts features from it. The template generator produces a reference template from these features and stores it in a template database. During the verification phase, the feature extractor extracts features from the query biometric, and a query template is generated. The comparator module compares the query template with the templates stored in the database and returns the verification decision: accept if the query template and the reference template belong to the same user, and reject otherwise. The feature extractor must extract features from the biometric data well enough to accurately differentiate genuine users from imposters. It therefore plays a crucial role in a biometric system, whose accuracy depends heavily on it.

The use of multiple instances of biometric information to recognize and authenticate an individual is known as multi-instance biometrics. Multi-instance biometric systems are more secure because it is harder for an attacker to manipulate multiple biometric instances [2]. Deep learning-based models are widely explored in many fields of computer science. Their ability to learn patterns and extract features from data is valuable for day-to-day problems. However, training a stable and robust deep learning-based model requires enough training examples, which may be impractical in many real-world scenarios [6]. To avoid this limitation, our system uses a convolutional neural network inspired by AlexNet for feature extraction.

Cloud computing has become very popular in recent years due to its large computing power, data storage, and scalability. Fraudulent activities can be reduced by performing biometric authentication on the cloud, and the cloud has already been used for biometric authentication on mobile phones [7]. However, biometric authentication on the cloud raises problems such as the privacy of the biometric template stored there [8]. Indeed, it has been shown that iris images can be reconstructed from templates [9, 10]. Therefore, biometric template protection schemes such as cancelable biometrics [11], bio-cryptosystems [12], and homomorphic encryption (HE) [13] have been introduced and applied successfully in recent years to achieve the required template protection. However, HE schemes suffer from considerable computational requirements [13]. Several works in the literature use cancelable biometrics to protect the privacy of templates [14,15,16,17,18], but they fail to maintain the trade-off between speed, security, and accuracy.

To address the aforementioned challenges, this paper proposes a cancelable iris authentication system (MICBTDL), which uses a convolutional neural network (CNN) trained using triplet loss for feature extraction. An artificial neural network (ANN) is used as the comparator module to compare the reference template and the query template. MICBTDL uses both random projection and random crossfolding to achieve the irreversibility requirement of a biometric template protection scheme. As a result, our system is more secure than the state-of-the-art approaches. In addition, our proposed system does not use any toolkit, such as the University of Salzburg Tool Kit [19], to extract the features of an iris image, so our method can also be applied to other biometric traits. MICBTDL is evaluated on the MMU and IITD iris databases to check its efficiency. MICBTDL can resist attacks on a biometric authentication system such as template modification and channel interception.

The rest of this article is structured as follows: Sect. 2 presents the state-of-the-art approaches. Section 3 describes our proposed system. Section 4 presents the experimental evaluation and its results, and Sect. 5 offers our conclusion.

2 Related works

In recent years, deep learning models have been showing very promising results in improving the accuracy of many biometric authentication systems. Specifically, convolutional neural networks (CNNs) are the most successful and widely used architecture in the deep learning community.

Sibai et al. [20] designed an iris recognition system using a feed-forward artificial neural network. The authors conducted several experiments, varying the input format, the number of hidden layers, and the number of neurons in the hidden layer to find the optimal parameters. Khedkar and Ladhake [21] proposed an iris recognition system using neural network techniques such as radial basis function (RBF), support vector machines (SVM), and multi-layer perceptron (MLP). Rai et al. [22] proposed a method that uses two feature extraction techniques, namely Haar wavelet decomposition and the 1D Log Gabor wavelet; the iris patterns are identified with Hamming distance and SVM. Srivastava et al. [23] implemented an approach for iris recognition by combining functional modular neural networks and evolutionary fuzzy clustering. Saminathan et al. [24] introduced a method for iris authentication using a kernel-based multi-class SVM. Marsico et al. [25] presented a survey of machine learning techniques for iris recognition, ranging from neural networks to deep learning. Ahmadi et al. [26] suggested a recognition system that uses MLP and particle swarm optimization to increase generalization performance. Later, to reduce the computational complexity, Ahmadi et al. used a genetic algorithm with RBF. Fahim et al. [27] demonstrated the feasibility of machine learning techniques for recognizing a person from the iris modality even when the eye image is captured with a smartphone.

Ahmadi et al. [28] designed a biometric system using an MLP-imperialist competitive algorithm (MLP-ICA) as a classifier. The authors used a gray-level difference matrix to obtain the features from the iris. Waisy et al. [29] used a convolutional neural network (CNN) and a softmax classifier to obtain the features from the iris image and classify the user into one of N classes. This method performs better than the state-of-the-art works it was compared with. Arsalan et al. [30] proposed a deep learning method to determine the true iris region without pre-processing the eye image. Zhao and Ajay [31] suggested a framework to accurately detect and segment iris images using a fully convolutional network. They also introduced an “Extended Triplet Loss (ETL)” function to learn the spatially corresponding features of an iris image. Wang et al. [32] designed a cross-spectral iris recognition system in which the features are extracted using a CNN and supervised discrete hashing (SDH) is used for compression and classification. Admovic et al. [33] proposed an approach for iris recognition using stylometric features and random forest machine learning methods. Gale et al. [34] proposed an iris recognition system that uses hybrid-based particle swarm optimization (PSO) as a classifier. Hybrid-based PSO is a combination of a weighted directed acyclic graph (DAG) SVM and spiking neural networks (SNN): the weighted DAG SVM performs the classification task and the SNN performs the evaluation. Sudhakar et al. [35] proposed a cancelable biometric system that uses a deep learning-based feature extractor to extract iris features and then applies a random projection technique to convert the extracted features into a cancelable template. Later in 2020, they suggested a cloud-based cancelable biometric system [36] in which the server resides on the cloud and a client connects to the server for authentication. El-Hameed et al. presented a scheme to preserve the privacy of fingerprint templates using advanced chaotic maps [14]. Abdellatef et al. [15] proposed a cancelable multi-biometric face recognition method using bio-convolving encryption. Gupta et al. [16] proposed a novel cancelable multi-modal biometric system in which iris and fingerprint are fused using a projection-based approach. The feature points are projected onto a random plane obtained using a user-specific key to generate the cancelable template. Mahesh et al. [37] proposed a novel privacy-preserving iris authentication scheme using fully homomorphic encryption.

The state-of-the-art works fail to maintain the trade-off between accuracy, speed, and security. Therefore, MICBTDL is proposed to achieve high accuracy and provide confidentiality to the iris templates.

Fig. 1 Block diagram of MICBTDL. The steps during the enrollment, authentication, and after-authentication phases are indicated with dashed, solid, and dotted lines, respectively

3 Multi-instance cancelable iris authentication using triplet loss for deep learning models (MICBTDL)

MICBTDL is the first multi-instance cancelable iris authentication system that uses triplet loss as the loss function during the training of deep learning models. Triplet loss [38] is a loss function that trains machine learning models by comparing an anchor image to a positive and a negative image [39, 40]. Models trained using triplet loss are generally good at distinguishing between images of the same class and images of different classes [39]. For tasks such as biometric authentication, triplet loss is better suited to training the network than softmax cross-entropy loss, which trains a classification model over a fixed number of classes. In biometric recognition, we need to compare two unknown biometrics and declare whether or not they belong to the same person. The models are therefore trained so that the outputs for the positive image and the anchor image are close while the outputs for the negative image and the anchor image are far apart. In MICBTDL, a convolutional neural network inspired by AlexNet performs feature extraction, which is the most crucial part of biometric authentication. For user authentication, a multi-layer perceptron, which is a fully connected network of neurons, is used, after which the Euclidean distance between the outputs for the template image and the test image is computed.

Figure 1 depicts the flow diagram of MICBTDL. It consists of three entities, namely the client device, the cloud server, and the trusted authenticator. MICBTDL consists of two phases: enrollment and authentication. The steps of the enrollment phase are as follows:

  1. The client device acquires the reference right and left iris images.
  2. The client device applies random crossfolding and transmits the crossfolded images to the trusted authenticator.
  3. The trusted authenticator extracts the features using the CNN.
  4. The trusted authenticator generates the reference cancelable template and sends it to the cloud server.

The steps of the authentication phase are as follows:

  1. The client device acquires the probe left and right iris images.
  2. The client device applies random crossfolding and transmits the crossfolded images to the trusted authenticator.
  3. The trusted authenticator extracts the features using the CNN.
  4. The trusted authenticator generates the query cancelable template and sends it to the cloud server.
  5. The cloud server performs the verification through the ANN and sends the verification result to the trusted authenticator.
  6. The trusted authenticator compares the verification result with the threshold and transmits the accept/reject decision to the client device.

3.1 Assumptions

MICBTDL assumes the following:

  1. The client device has limited computational and memory resources. It is a trusted entity during the enrollment and authentication phases.
  2. The cloud server is honest but curious: it performs the computations honestly but may try to view the data.
  3. The trusted authenticator is a semi-trusted entity.

3.2 Image capture and random crossfolding

In this phase, the right and left iris images are captured from the user and random crossfolding is applied. First, both images are resized to 192 \(\times \) 192 pixels, and a random matrix of the same size is generated. The random matrix is converted to a binary matrix, which is multiplied element-wise with the left iris image, while its complement is multiplied element-wise with the right iris image. Combining the two products yields a random crossfolded image in which about half of the pixels come from the left iris and the other half from the right iris. Random crossfolding is illustrated in Fig. 2.

Fig. 2 Illustration of random crossfolding

The generated random crossfold protects the original biometric. Even if the random crossfold is compromised, the original biometric is safe because half of the pixels in the crossfold come from the left iris and the other half from the right iris. If the user suspects that the random crossfold has been compromised, the user can simply change the user key, just as one changes a password. A new random matrix is then generated from the new user key, which produces a new crossfolded template that differs from the previous one. From then on, the user is authenticated using the new user key. The crossfolded images are given as input to the CNN for feature extraction.
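The crossfolding step and the key change described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the way the user key seeds the generator (a SHA-256 hash), the 0.5 binarization threshold, and the function names `key_to_seed` and `random_crossfold` are all assumptions made for the example, and random arrays stand in for real iris images.

```python
import hashlib
import numpy as np

def key_to_seed(user_key: str) -> int:
    """Derive a reproducible integer seed from a user key (hypothetical scheme)."""
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def random_crossfold(left: np.ndarray, right: np.ndarray, user_key: str) -> np.ndarray:
    """Mix two equally sized iris images pixel-wise using a key-derived binary mask."""
    if left.shape != right.shape:
        raise ValueError("left and right images must have the same shape")
    rng = np.random.default_rng(key_to_seed(user_key))
    random_matrix = rng.random(left.shape)            # uniform values in [0, 1)
    mask = (random_matrix >= 0.5).astype(left.dtype)  # binary matrix (assumed threshold)
    # The mask keeps left-iris pixels; its complement keeps right-iris pixels.
    return mask * left + (1 - mask) * right

# Synthetic stand-ins for the resized 192 x 192 iris images.
demo_rng = np.random.default_rng(0)
left_iris = demo_rng.random((192, 192))
right_iris = demo_rng.random((192, 192))

folded_old = random_crossfold(left_iris, right_iris, "old-user-key")
folded_new = random_crossfold(left_iris, right_iris, "new-user-key")
```

With the same key the crossfold is reproducible, while a new key yields a different mask and therefore a different crossfolded image from the same pair of irises.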

3.3 Feature extraction through deep learning

The feature extractor is the most crucial component of a biometric system and determines its performance. In this biometric system, a convolutional neural network performs feature extraction from the crossfolded images. First, the crossfolded images (192 \(\times \) 192 pixels) are normalized using greyscale normalization. The CNN architecture is shown in Fig. 3.

Fig. 3 Schematic representation of the convolutional neural network

Fig. 4 Triplet loss architecture

Conv1 is a pair of convolution layers with 16 filters, Conv2 is a pair of convolution layers with 32 filters, and Conv3 is a pair of convolution layers with 64 filters. Convolution is performed by sliding the kernel filter over the entire image. Every convolution layer is followed by a max-pool layer and a dropout layer. The max-pool layer reduces the dimensionality of the convolution output, which lowers the computation cost and also helps prevent over-fitting. The dropout layer is used for regularization, i.e., it randomly ignores a fraction of the units during training so that the CNN model learns in a regularized way. In the proposed model, the ReLU activation function is used to introduce nonlinearity. Finally, a flatten layer is used after the three convolution blocks, followed by a dense layer containing 256 neurons. Triplet loss is a loss function that trains machine learning models by comparing an anchor image to a positive and a negative image. Models trained using triplet loss are generally good at distinguishing between images of the same class and images of different classes.

To train a model using triplet loss, the input data must be in the form of triplets. Each triplet has an anchor image at index 0, a positive image at index 1, and a negative image at index 2. We generate the database of triplets as follows. First, we select a random class, randomly pick an anchor image from it, and then randomly pick a positive image (other than the anchor) from the same class. Then we choose a different class and randomly pick an image from it as the negative image.
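The sampling procedure above can be sketched with Python's standard library. The dictionary layout, the image identifiers, and the name `sample_triplet` are illustrative assumptions; in practice the values would be image arrays or file paths.

```python
import random

def sample_triplet(images_by_class: dict, rng: random.Random) -> tuple:
    """Return (anchor, positive, negative), i.e. indices 0, 1, and 2 of a triplet."""
    # The anchor class needs at least two images so the positive differs from the anchor.
    eligible = [c for c, imgs in images_by_class.items() if len(imgs) >= 2]
    anchor_class = rng.choice(eligible)
    anchor, positive = rng.sample(images_by_class[anchor_class], 2)
    # The negative comes from any other class.
    other_classes = [c for c in images_by_class if c != anchor_class]
    negative = rng.choice(images_by_class[rng.choice(other_classes)])
    return anchor, positive, negative

# Hypothetical image identifiers grouped by subject.
demo_images = {
    "subject_01": ["s01_a", "s01_b", "s01_c"],
    "subject_02": ["s02_a", "s02_b"],
    "subject_03": ["s03_a", "s03_b", "s03_c"],
}
triplet_rng = random.Random(42)
triplets = [sample_triplet(demo_images, triplet_rng) for _ in range(5)]
```

Each sampled triplet pairs two distinct images of one subject with one image of a different subject.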

A triplet loss layer is added at the end of our CNN model. The architecture of triplet loss is shown in Fig. 4. Each triplet is fed into the CNN model, and the weights of the CNN model are updated according to the triplet loss so that the outputs for the positive image and the anchor image are close while the outputs for the negative image and the anchor image are far apart. Once the CNN model is trained in this way, the triplet loss layer is removed from the model, and outputs of size 256 \(\times \) 1 are extracted from the final dense layer.
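A commonly used form of the triplet loss is \(L = \max (d(a, p) - d(a, n) + m, 0)\), where \(a\), \(p\), and \(n\) are the anchor, positive, and negative embeddings, \(d\) is a distance, and \(m\) is the margin. The sketch below evaluates this form in NumPy under two assumptions that the text does not state: \(d\) is the squared Euclidean distance, and the margin default of 0.2 is a placeholder rather than the value used in MICBTDL.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of embeddings with shape (batch, dim)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor-negative distance
    # The loss is zero once the negative is farther than the positive by the margin.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

# Toy 2-D embeddings (the CNN described above produces 256-D outputs).
a = np.array([[0.0, 0.0]])
near = np.array([[0.0, 0.1]])
far = np.array([[1.0, 0.0]])

loss_well_separated = triplet_loss(a, near, far)  # positive near, negative far
loss_violated = triplet_loss(a, far, near)        # roles swapped: large loss
```

The first call gives zero loss because the negative is already far beyond the margin; the second is penalized because the "positive" lies farther from the anchor than the "negative".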

3.4 Cancelable template generation

Random projection is used in this phase to protect the biometric template before it is stored. Cancelable biometrics is achieved in this system through two transformations: random crossfolding and random projection. Random orthogonal matrices are generated from user keys and multiplied by the feature vectors to generate the cancelable templates. Thus, the original biometrics are not stored on the cloud, and even if a cancelable biometric template is compromised, the user can change the user key, just as one changes a password, to generate a new template.
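The projection step can be illustrated as follows. This is a sketch under stated assumptions: the orthogonal matrix is obtained from the QR decomposition of a key-seeded Gaussian matrix, the key is turned into a seed with SHA-256, and the function names are hypothetical; the source does not specify how the orthogonal matrices are constructed.

```python
import hashlib
import numpy as np

def orthogonal_matrix_from_key(user_key: str, dim: int) -> np.ndarray:
    """Build a dim x dim orthogonal matrix from a user key (assumed QR-based construction)."""
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # q has orthonormal columns
    return q

def cancelable_template(features: np.ndarray, user_key: str) -> np.ndarray:
    """Project a feature vector with the key-derived orthogonal matrix."""
    g = orthogonal_matrix_from_key(user_key, features.shape[0])
    return g @ features

# Synthetic stand-in for a 256-dimensional CNN feature vector.
features = np.random.default_rng(1).standard_normal(256)
template_old = cancelable_template(features, "old-user-key")
template_new = cancelable_template(features, "new-user-key")
```

Because an orthogonal matrix preserves Euclidean norms and distances, templates generated with the same key can still be compared by distance, while changing the key produces a different template from the same features.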

Fig. 5 Schematic representation of the artificial neural network

3.5 User verification through artificial neural network

An artificial neural network (ANN) is a fully connected network of neurons in which the input is passed forward layer by layer. The input to this ANN model is the feature vectors extracted by the CNN model. The schematic representation of the ANN model is shown in Fig. 5. The ANN model consists of dense layers with 256, 200, 100, 50, and 10 neurons, respectively. L2 normalization is applied to the output layer. The ANN model is trained using triplet loss in the same way as the CNN model, with triplets created from feature vectors. The Euclidean distance between the reference random projected feature vector and the query random projected feature vector is computed and used for the verification decision. If the distance exceeds the threshold, the user is an imposter; otherwise, the user is genuine.
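The decision rule at the end of this paragraph can be sketched as below. The threshold value of 0.5, the 10-dimensional toy vectors, and the function names are assumptions chosen only for illustration; the actual threshold would be selected from the FAR and FRR distributions.

```python
import numpy as np

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit Euclidean length."""
    return v / max(float(np.linalg.norm(v)), eps)

def verify(reference: np.ndarray, query: np.ndarray, threshold: float):
    """Accept when the distance between normalized outputs does not exceed the threshold."""
    distance = float(np.linalg.norm(l2_normalize(reference) - l2_normalize(query)))
    decision = "accept" if distance <= threshold else "reject"
    return decision, distance

# Toy 10-dimensional outputs standing in for the final ANN layer.
reference_out = np.ones(10)
genuine_out = np.ones(10)
genuine_out[0] = 1.1            # small perturbation of the reference
imposter_out = -np.ones(10)     # points in the opposite direction

genuine_decision, genuine_distance = verify(reference_out, genuine_out, threshold=0.5)
imposter_decision, imposter_distance = verify(reference_out, imposter_out, threshold=0.5)
```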

4 Results and observations

4.1 Experimental design

MICBTDL is evaluated on the MMU and IITD [19] iris databases. The details of the databases are listed in Table 1. The IITD iris database consists of 225 subjects. Of these, 208 subjects have both left and right iris images with a minimum of five samples each. Therefore, we use these 208 subjects in our experiments and exclude the others. The models of MICBTDL are trained on a GPU workstation (SuperMicro 7039, dual Intel Xeon Silver 4110 processors, two CUDA-enabled NVIDIA GeForce GTX 1080 Ti GPUs). TensorFlow and Keras were used for the implementation.

Table 1 Description of the database

4.2 Performance analysis

We carried out experiments on the IITD and MMU databases. The IITD database has 2080 images (208 subjects, with five right and five left iris images each). The MMU database has 450 images (45 subjects, with five left and five right iris images each). The false accept rate (FAR), false reject rate (FRR), genuine accept rate (GAR), and equal error rate (EER) are used to assess the performance of MICBTDL.
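For reference, these metrics can be computed from genuine and imposter distance scores as sketched below. The score values are made up for the example, the function names are hypothetical, and the EER is approximated by the threshold at which FAR and FRR are closest, which is one common convention rather than necessarily the one used in the experiments.

```python
import numpy as np

def far_frr(genuine_d: np.ndarray, imposter_d: np.ndarray, threshold: float):
    """FAR: imposters accepted (distance <= threshold). FRR: genuine users rejected."""
    far = float(np.mean(imposter_d <= threshold))
    frr = float(np.mean(genuine_d > threshold))
    return far, frr

def equal_error_rate(genuine_d: np.ndarray, imposter_d: np.ndarray) -> float:
    """Approximate the EER at the candidate threshold where FAR and FRR are closest."""
    thresholds = np.unique(np.concatenate([genuine_d, imposter_d]))
    rates = [far_frr(genuine_d, imposter_d, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2.0

# Made-up distance scores for illustration only.
genuine_scores = np.array([0.10, 0.20, 0.30, 0.40])
imposter_scores = np.array([0.35, 0.60, 0.70, 0.80])

far_at_03, frr_at_03 = far_frr(genuine_scores, imposter_scores, threshold=0.3)
gar_at_03 = 1.0 - frr_at_03  # GAR is the complement of FRR
eer = equal_error_rate(genuine_scores, imposter_scores)
```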

Table 2 Baseline Comparison of MICBTDL

Fig. 6 CNN training curves on the MMU database: a) loss vs. epochs, b) accuracy vs. epochs

Fig. 7 CNN training curves on the IITD database: a) loss vs. epochs, b) accuracy vs. epochs

Fig. 8 Performance evaluation: ROC curves of a) CNN and b) ANN on the MMU and IITD databases

Fig. 9 FAR and FRR distributions of the proposed model on the a) MMU and b) IITD databases

The baseline comparison of MICBTDL is shown in Table 2. We can infer from Table 2 that the fusion of left and right iris images using random crossfolding increases the performance. Even though there is a slight degradation in performance after applying random crossfolding and random projection, the privacy of the iris template is preserved. Figures 6a and 7a show the loss and Figs. 6b and 7b show the accuracy for each epoch during the training of the CNN on the MMU and IITD iris databases. Figure 8 shows the receiver operating characteristic (ROC) curves of the CNN and ANN on the MMU and IITD iris databases. The FAR and FRR distributions of MICBTDL on the MMU and IITD databases are shown in Fig. 9. We fine-tune the important parameters of the CNN (the margin for triplet loss, the number of iterations, and dropout) and of the ANN (the margin for triplet loss and the number of iterations). As the number of epochs increases, the loss decreases and the accuracy increases. Beyond some point, the accuracy no longer changes even as the number of epochs increases. The margin for triplet loss and the number of iterations play an important role. The parameters of the CNN and ANN considered in MICBTDL are shown in Tables 3 and 4. The optimal EERs are 0.03 and 0.08 for the CNN and 0.03 and 0.06 for the ANN on the MMU and IITD iris databases, respectively.

Table 3 Parameters of CNN considered in MICBTDL
Table 4 Parameters of ANN considered in MICBTDL

4.3 Security analysis

4.3.1 Irreversibility analysis

Cancelable iris templates are generated in MICBTDL by using two one-way transformations: (1) random crossfolding and (2) random projection. In random crossfolding, a random matrix (X) of size 128 \(\times \) 128 is generated with the user key (U). A binary matrix (Y) of size 128 \(\times \) 128 is obtained from X. In random projection, an orthogonal matrix (G) is generated using the user key. The cancelable template (C) is formed by multiplying the crossfolded iris images with G.

The original iris images cannot be reconstructed from the generated cancelable template and user key because:

  1. It is infeasible to reconstruct X, since the number of possible combinations is in the range of \(10^{{10}^{128 \times 128}}\). Constructing Y requires \(2^{128 \times 128}\) combinations.
  2. Furthermore, it is very difficult to obtain G from C. The required number of combinations is of the order \(10^{{10}^{256}}\), making it extremely difficult to recover the same crossfolded feature image.
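To give a sense of scale for the binary matrix Y, the count \(2^{128 \times 128}\) quoted above can be converted to a power of ten. The short calculation below is only a worked arithmetic check; the variable names are illustrative.

```python
import math

side = 128
num_entries = side * side                         # number of binary entries in Y
log10_combinations = num_entries * math.log10(2)  # log10 of 2 ** num_entries
decimal_digits = math.floor(log10_combinations) + 1

print(f"2^{num_entries} is about 10^{log10_combinations:.1f} "
      f"({decimal_digits} decimal digits)")
```

That is, \(2^{128 \times 128} = 2^{16384} \approx 10^{4932}\), far beyond what exhaustive search could cover.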

4.3.2 Revocability

The iris images cannot be generated from the cancelable template because of irreversibility. However, the cancelable template itself may be compromised. In that scenario, the old cancelable iris template is replaced with a new cancelable iris template by changing the user key. A new random matrix is generated that is completely different from the previous one.

4.3.3 Diversity

Different cancelable iris templates are created from the original iris images by changing the user keys. There is no relationship between the generated cancelable templates.

Table 5 Comparison of MICBTDL with other feature extraction techniques (EER in %)
Table 6 Comparison of MICBTDL with existing works (EER in %)

4.4 Comparison analysis

From Table 5, we can see that the CNN in MICBTDL has a better EER on both the IITD and MMU datasets. The difference in EER is especially large for the MMU dataset because it has fewer users. The CNN in [36] needs more data for efficient feature extraction, whereas the CNN in MICBTDL learns well even when the number of users is small. This performance is attributable to the triplet loss function.

In the same way, from Table 6, we can observe that MICBTDL achieves significantly better performance than the MLP in [36] on the MMU database and performs better than the other existing methods on the IITD database. The lower performance in [36] is due to the small number of users. From the results, it is clear that MICBTDL is suitable for small as well as large datasets.

5 Conclusion and future work

This paper proposes a multi-instance cancelable iris authentication system that uses a CNN trained using triplet loss for feature extraction and stores the feature vector as a cancelable template. MICBTDL then uses an ANN as the matching module. MICBTDL uses triplet loss to train the neural networks so that they learn to differentiate a positive image from a negative image by comparing each with the template (anchor) image. Cancelability of iris templates is ensured by two operations performed on the original iris images: (1) random crossfolding and (2) random projection. We evaluated MICBTDL on the IITD and MMU databases. From the experimental results, we can conclude that MICBTDL achieves fair performance when compared to other existing works.

In the future, MICBTDL can be applied to other biometric traits such as finger vein and fingerprint, or extended to a multi-modal system, as it does not require any iris-specific preprocessing of the images. MICBTDL can also be used to safely process 2D images in cloud-based applications, so this idea can also be extended to cloud-based 3D design.

On behalf of all authors, the corresponding author states that there is no conflict of interest.