1 Introduction

Biometric recognition systems are being widely deployed for many critical applications such as law enforcement, cell-phone authentication, citizen identification, healthcare, border control, commercial applications, and public security. Iris recognition systems dominate this space owing to their high reliability and user convenience. A recent report [9] indicates that the iris recognition market is expected to grow from USD 2.3 billion in 2019 to USD 4.3 billion by 2024, at a CAGR of 13.2% over the forecast period. The widespread use of iris recognition technology for identification and verification by government entities is the key driver of this growth. Other substantial factors benefiting the industry's growth include the rising penetration of iris recognition technology in the consumer electronics vertical and the high demand for iris scanners for access control applications. Although these systems offer better security for a variety of computing applications, they endure a variety of attacks that threaten their security. Among the eight vulnerable attack points identified by Ratha et al. [11], the presentation (spoof) attack is regarded as the most widely and easily attempted, since it targets the sensor level. The sensor module of an iris biometric system is mainly compromised by presenting a forged modality using artefacts such as paper printouts of the iris, textured contact lenses, and prosthetic eyes. To alleviate presentation attacks, an IVIDNet [14], also known as a liveness detection module [15], is integrated with the biometric recognition system and serves as a security check. In general, the IVIDNet technique [23] may be regarded as a binary classification problem addressed by computing the difference in micro-textural or image-quality features between live and counterfeit iris attributes.

Contemporarily, the development of vitality detection network (VIDNet) mechanisms is a prominent field of research, with a slew of contributions aimed at providing efficient liveness detection systems. While VIDNet approaches based on single or multiple image characteristics have yielded accurate detection systems, their performance and generalisation capability against unknown attacks have been limited. Additionally, choosing the appropriate number and type of image attributes for identifying a given image as genuine or fake is one of the heftiest tasks in conventional VIDNet algorithms. Deep learning (DL)-based IVIDNet is currently emerging as a potent alternative to traditional methods, for several reasons, including autonomous deep feature extraction and improved accuracy. However, these strategies have pitfalls, such as increased training overhead and the need for a larger training dataset. To address these issues, an emerging paradigm in deep learning is to utilize the knowledge of pre-trained models from a specific domain, which may be efficiently transferred to build an efficient VIDNet. Transfer learning offers numerous advantages, including reduced training time, better performance (in most cases), and elimination of the need for a large training dataset. Although many researchers have utilized the capabilities of pre-trained models such as AlexNet, VGGNet, and ResNet50, there are still open research issues that need to be tackled using the most efficient models and model-utilization techniques.

Therefore, this work broadly aims to develop an efficient vitality detection technique that is not impacted by the type of fabrication material used to spoof the model. To attain this, we train and test our model on different datasets (cross-datasets). The key contributions of this work may be summarised as follows:

  i. We propose a novel technique for iris anti-spoofing using weighted score level fusion of potent deep-level features of iris images.

  ii. The proposed approach overcomes the challenge of scarcity of large amounts of iris samples via transfer learning covering various iris artefacts.

  iii. The IVIDNet approach integrates the merits of two robust pre-trained deep models to yield a generalized iris spoof detector.

  iv. The proposed approach is evaluated on the benchmark iris liveness detection datasets Notre Dame 2017 and NDCLD 2015.

  v. The IVIDNet demonstrates superior performance in both known and unknown attack scenarios and outperforms similar anti-spoofing mechanisms.

The remainder of the study is organized as follows: Section 2 presents a review of recent advancements in IVIDNet techniques. The framework and algorithms of the proposed IVIDNet approach are illustrated in Section 3. The experimental benchmark datasets, the performance protocol, and a detailed experimental analysis are systematically discussed in Section 4. Finally, the conclusions and the future scope of this work are briefed in Section 5.

2 Related work

Due to the significance of countermeasures and current advances in iris-based recognition systems, it is necessary to detect the liveness of the presented characteristic, since intruders may readily impersonate the authentication system using various presentation attack instruments (PAIs). As a result, the most pressing research problem is to distinguish a live iris from a fake one, as mandated by ISO/IEC 30107-3. Several trends in iris anti-spoofing systems have emerged over the years, based on a variety of essential principles. Vitality detection techniques are often classified into two categories based on the type of liveness indicator used, namely (a) hardware-based analysis and (b) data-driven analysis. To identify real and fake iris qualities, the former uses an extra sensing device in addition to the iris recognition system to assess vitality characteristics including temperature, impedance, image quality, blood cells, and so on. Unlike the former, the data-driven module analyses the image features of single (static) or multiple (dynamic) iris images using hand-crafted or automated feature-extraction approaches to detect liveness attributes.

Transfer learning-based techniques are receiving increasing attention these days as a consequence of their impressive successes in spoof attack detection. Our suggested method is based on using pre-trained models to generate a novel and efficient IVIDNet; hence, our study falls into the automated feature extraction-based iris spoof detector category. Accordingly, this section's brief literature assessment is confined to pioneering contributions linked to transfer learning-based methods. Ribeiro et al. [12] investigated texture transfer learning for super-resolution applied to low-resolution images. On a subset of the CASIA iris image dataset, the developed technique achieves its best EER of 6.07% at factor 2 when the describable texture dataset (DTD) is used; however, this work does not investigate the integration of the best datasets with the enrolment outcomes. Chen and Ross [2] suggested a multi-task iris vitality detector (IVD) system based on an object-detection technique. This method is computationally efficient and can be implemented in a real-time setting. However, the approach is not studied in instances where the training and test datasets contain distinct attacks. Gautam and Mukhopadhyay [6] introduced a transfer learning approach that depends on a pre-trained AlexNet model for feature extraction, followed by principal component analysis (PCA) for dimensionality reduction. For classification, a cubic SVM (cSVM) multi-class model based on error-correcting output codes (ECOC) is utilised. This study still needs to address efficient comprehension and exploitation of hybrid classifiers, as well as strong feature extraction algorithms in tandem with deep image representation.

Alaslani and Elrefaei [1] proposed an efficient iris authentication system based on transfer learning with CNNs. Feature extraction and classification are accomplished by fine-tuning a pre-trained VGG-16 model. The performance of the iris recognition system is assessed using four publicly available databases: IIITD, CASIA-Iris-V1, CASIA-Iris-Thousand, and CASIA-Iris-Interval. According to the findings, the proposed technique achieves a 100% accuracy rate on IIITD. Minaee and Abdolrashidi [10] provided a deep learning system based on a CNN model pre-trained on ImageNet. The performance of the mechanism is measured on the IITD dataset, where an accuracy rate of 95.5% is reported. Choudhary et al. [3] presented a novel densely connected contact lens detection network (DCLNet) based on a DCNN with an SVM on top for classification. The DCLNet is a densely connected convolutional network with fewer layers and learning parameters than other networks, and owing to its tight inter-layer connections it learns more discriminative qualities. The experimental findings show that, compared to the state-of-the-art (SOA), the suggested technique improves the CCR by up to 4%; normalization, on the other hand, is found to degrade the model's accuracy in the vast majority of circumstances. Therar et al. [21] used a real-time multimodal biometric approach, IrisConvNet, based on a deep learning architecture applied to a person's left and right irises. CNN and transfer learning techniques are deployed to produce specific features that are fed into a multi-class SVM for classification. IrisConvNet's performance is evaluated using two publicly available datasets: IITD and CASIA-Iris-V3. On IITD it achieves a 99% accuracy rate for both the left and right iris, whereas on CASIA-Iris-V3 it achieves 94% and 93% accuracy rates for the left and right iris, respectively. Sardar et al. [13] proposed a deep Interactive Squeeze Expand UNet (ISqEUNet) model with interactive learning to reduce training time while enhancing storage efficiency by minimising the number of parameters. According to results on three publicly accessible datasets, the model achieves a mean true positive rate (mTPR) of 0.983 and a mean error rate (MER) of 0.261 on NICE.I.

Another IVD solution, based on multi-layer fusion, is propounded by Fang et al. [5]. Two-level fusion, i.e., feature-level and score-level, is performed on the features extracted from the last several convolution layers. Results show that the multi-layer fusion technique performs better than the best single-layer feature extractor using a pre-trained VGG-16; however, when trained from scratch this technique performs well only on larger datasets such as the IIITD-WVU database, in comparison to the Notre Dame database. Recently, Tapia et al. [20] deliberated a two-stage serial framework for PAD focused on detecting bona fide images. For this approach, the largest iris PA database is developed by combining several other databases, and the model is tested both when trained from scratch and when fine-tuned. Although comparable results were obtained in the known environment, the performance of the proposed two-stage network is not measured in unknown attack scenarios.

Based on the comparative analysis of several TL-based IVDs summarized in Table 1, it can be inferred that most techniques deploy a model pre-trained on ImageNet. The reason is that ImageNet consists of over 14 million images across roughly 20,000 categories, and initializing a new model from it may reduce the overall training time. Moreover, the IITD iris anti-spoofing dataset is the most widely used anti-spoofing dataset in these approaches. The accuracy rate on the IITD dataset in transfer learning-based approaches ranges from 81.40% to 100%.

Table 1 A summary of various transfer learning enabled IVD mechanisms

3 The proposed approach

Deep learning-based approaches are advantageous for capturing similarities among adjacent pixel values to safeguard against spoof attacks. To achieve better vitality detection results, it is imperative to train an appropriate IVID model based on significant features extracted from an adequate number of relevant images. In order to address specific concerns of existing SOA approaches, such as increased training overhead and the need for larger datasets, we provide a transfer learning-based IVID technique that significantly improves overall performance. The approach is based on weighted fusion of the predictions of two pre-trained models, namely InceptionNet V3 [18] and MobileNet V2 [8]. In the following subsections, we describe the underlying idea of the IVIDNet framework, its algorithms, and the weighted score level fusion of the outcomes of the models.

3.1 The IVIDNet framework

Extracting deep-level features to design a robust IVD that performs well in unknown attack scenarios is one of the critical issues in CNN-based methods. To address this, a recent deep learning paradigm is to use the knowledge of pre-trained models from a specific domain that may be efficiently transferred to construct an efficient IVD. To this end, we present an efficient framework for IVIDNet to accomplish the task of the anti-spoofing sub-module, as illustrated in Fig. 1. The framework's main premise is to work in two stages, comprising the training and testing procedures.

Fig. 1
figure 1

The framework of our proposed IVIDNet technique

The training stage broadly comprises a series of activities applied to the iris dataset, such as pre-processing, deep feature extraction, building a model by fusing the predictions of two pre-trained models, and tuning the parameters of the target model. The goal of the testing stage is to evaluate the IVIDNet model's correctness by validating it on a randomly selected set of images encompassing a variety of sensors and datasets. These phases are explained in detail in the following subsections.

3.1.1 Pre-processing

The key objective of image pre-processing is to obtain ideal data with a fixed region and high quality, free of undesired distortions. The acquired iris images are usually of low quality, as they are captured under different environmental conditions through a variety of sensing devices. To enhance and prepare these images for DL-based authentication models, the dataset is subjected to a series of pre-processing operations [22]. First, the region of interest is segmented from the iris images to remove extraneous background information. The colour images are then converted to greyscale to reduce computational complexity; indeed, in our approach, the colour feature is not necessary to discriminate between fake and live modalities, and it only adds unnecessary complexity and memory consumption. The next phase resizes the iris images to a dimension of 224 × 224 to achieve uniformity. Thereafter, to overcome the inadequate size of the anti-spoofing dataset, augmentation operations are performed. Finally, we use a feature scaling technique to standardize the independent features present in the data to a fixed range. A minimal sketch of this pipeline is given below.
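The following sketch shows one way this pre-processing chain could be implemented with OpenCV and NumPy, assuming the region of interest has already been segmented upstream. The specific augmentation operations (flip, small rotations) and the scaling to [0, 1] are our assumptions, since the paper does not enumerate them.

```python
import cv2
import numpy as np

def preprocess_iris(image_bgr: np.ndarray) -> np.ndarray:
    """Greyscale conversion, resizing to 224x224, and feature scaling to [0, 1]."""
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)   # drop colour information
    grey = cv2.resize(grey, (224, 224))                  # uniform input dimension
    # Pre-trained ImageNet backbones expect 3 channels, so replicate the grey plane.
    stacked = np.stack([grey] * 3, axis=-1).astype("float32")
    return stacked / 255.0                               # rescale to a fixed range

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Assumed augmentations (horizontal flip, ±10° rotations) to enlarge the dataset."""
    h, w = image.shape[:2]
    rotations = [
        cv2.warpAffine(image, cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0), (w, h))
        for angle in (-10, 10)
    ]
    return [image, cv2.flip(image, 1)] + rotations
```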

3.1.2 Deep feature extraction

Deep feature extraction is the process of extracting image features from the deep layers of a CNN; the extracted features are known as deep features. The procedure entails first feeding the input data to the pre-trained CNN and then obtaining the relevant activation values from the fully connected layer, usually present at the network's end, or from the pooling layers present at different levels. The features extracted from the iris images are then used for authentication. The fundamental approach adopted to extract deep image features, along with its pseudo-code, is described in the following subsections; a brief illustration of the idea is sketched below.
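As an illustration of this idea, the snippet below (a sketch, not the authors' exact code) builds a feature extractor by exposing the activations of a chosen layer of a pre-trained backbone; the layer name `out_relu` (the final activation of Keras' MobileNetV2) is used purely for demonstration.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
# Expose the activations of a deep layer as the feature representation.
feature_extractor = tf.keras.Model(inputs=base.input,
                                   outputs=base.get_layer("out_relu").output)

# batch_of_images: a placeholder float array of shape (N, 224, 224, 3) in [0, 1].
# features = feature_extractor.predict(batch_of_images)   # shape (N, 7, 7, 1280)
```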

  a. Customizing the InceptionNet model

The fundamental way of improving the performance of deep neural networks is by increasing their size. This entails expanding the network's depth (number of levels) as well as its width. This is a simple and safe way to train higher-quality models, especially when a large amount of labelled training data is available. However, this easy method has two main downsides. The bigger the network, the more parameters it has, which makes it more prone to overfitting, especially if the number of labelled samples in the training set is restricted; this can become a substantial bottleneck, since the creation of high-quality training sets can be difficult. Another issue with uniformly increasing network size is the significantly increased use of computational resources. The fundamental way to solve both difficulties would be to move from fully connected to sparsely connected architectures, even inside the convolutions. An inception network is a deep neural network whose architecture consists of repeating components referred to as inception modules, as illustrated in Fig. 2. One of the most appealing features of this architecture is that it allows a large increase in the number of units at each level without an uncontrolled blow-up in computational complexity.

Fig. 2
figure 2

A generic architecture of customized InceptionNet V3 model

Another practical benefit of this design is that it follows the intuition that visual input should be processed at several scales before being aggregated so that the following step may abstract features from multiple scales simultaneously. The improved utilization of computational resources allows for increasing both the width of each step as well as the number of stages without getting into computational difficulties.

Another way to make use of the inception architecture is to create slightly inferior but computationally less expensive variants of it. To utilize the capabilities of the pre-trained InceptionNet in building the IVIDNet model, it is customized as shown in Table 2: a global average pooling 2D layer is added to the functional Inception V3 model, and finally a dense layer is added to classify the images as real or fake.

  b. Customizing the MobileNet model

Table 2 A description of customized InceptionNet model

Building blocks for mobile models are becoming increasingly efficient. As an effective substitute for traditional convolution layers, MobileNet V1 proposed depth-wise separable convolutions. By separating spatial filtering from the feature generation process, depth-wise separable convolutions efficiently factorise conventional convolution. To benefit from the low-rank nature of the problem, the next generation, MobileNet V2 [8], included the linear bottleneck and inverted residual structure to construct even more efficient layers. This structure internally expands to a higher-dimensional feature space to improve the expressiveness of non-linear per-channel transformations, while maintaining a compact representation at the input and output. MnasNet built upon the MobileNet V2 framework by adding lightweight attention modules based on squeeze-and-excitation into the bottleneck structure. MobileNet V3 [7] combines these layers as building blocks to achieve the most effective models.

Additionally, the layers are improved with upgraded swish nonlinearities. The hard sigmoid is employed in place of the sigmoid used in the squeeze-and-excitation and swish nonlinearities, since the sigmoid can be computationally inefficient and difficult to maintain accurately in fixed-point arithmetic. Through this procedure, two new MobileNet models are released: MobileNet V3-Large and MobileNet V3-Small, oriented toward high- and low-resource use cases, respectively. Compared to MobileNet V2, MobileNet V3-Large improves ImageNet classification accuracy by 3.2% while lowering latency by 20%. Likewise, compared to a MobileNet V2 model with comparable latency, MobileNet V3-Small is 6.6% more accurate.

Figure 3 shows the generic architectural view of the MobileNet V2 model. To utilize the capabilities of the pre-trained MobileNet in building the IVIDNet model, it is customized as shown in Table 3: a global average pooling 2D layer is added to the functional MobileNet model, and finally a dense layer is added to classify the images as real or fake. A sketch of both customizations follows.

Fig. 3
figure 3

An illustration of customized MobileNet V2 model

Table 3 A description of customized MobileNet V2 model
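The following is a minimal sketch of how both customized models could be assembled in Keras, assuming the task is framed as binary (live vs. fake) classification with a single sigmoid output. Tables 2 and 3 describe the added layers; freezing the backbone weights is our assumption.

```python
import tensorflow as tf

def customize(backbone: tf.keras.Model) -> tf.keras.Model:
    """Append a GlobalAveragePooling2D layer and a dense classification head."""
    backbone.trainable = False                      # reuse ImageNet features (assumption)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # live vs. fake
    model = tf.keras.Model(backbone.input, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

m1 = customize(tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                                 input_shape=(224, 224, 3)))
m2 = customize(tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                                 input_shape=(224, 224, 3)))
```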

3.1.3 Weighted score level fusion

In general, score level fusion is the process of integrating the prediction probabilities of two or more models. We employ weighted score level fusion, which goes one step beyond plain score level fusion: we assign a weight to each model and then fuse the weighted probabilities to obtain better results.

figure a

First, the image features are extracted using the fine-tuned models; the next task involves fusing the prediction probabilities produced by the models. Figure 4 displays the process of computing the weighted score level fusion for a given set of iris images.

Fig. 4
figure 4

An example of our proposed weighted score level fusion in IVIDNet

This module further enhances the abilities of the model, as the power of the two models is transferred to one model to perform the classification efficiently. The weight pairs are first chosen at random, and the pair that gives the best accuracy is used to build the final module; a sketch of this procedure is given below.
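The sketch below shows how the fused score D_score = w1 * M1_pred + w2 * M2_pred (the formulation given in Section 3.2) could be computed and the best weight pair selected on a validation set. The candidate grid of weights summing to 1 is our assumption; the paper selects weights by trial.

```python
import numpy as np

def fuse(p1: np.ndarray, p2: np.ndarray, w1: float, w2: float) -> np.ndarray:
    """Weighted score level fusion of two models' prediction probabilities."""
    return w1 * p1 + w2 * p2

def best_weights(p1, p2, y_true, threshold=0.5):
    """Try candidate weight pairs and keep the one with the highest accuracy."""
    best = (0.5, 0.5, 0.0)
    for w1 in np.arange(0.0, 1.01, 0.1):
        w2 = 1.0 - w1
        pred = (fuse(p1, p2, w1, w2) >= threshold).astype(int)
        acc = (pred == y_true).mean()
        if acc > best[2]:
            best = (w1, w2, acc)
    return best

# Example: M1_pred, M2_pred hold each model's probability of the 'live' class.
# w1, w2, acc = best_weights(M1_pred, M2_pred, y_val)
```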

3.1.4 Decision module

The last module of the IVIDNet model is the decision module. The output of the fusion module is evaluated against a pre-defined threshold: samples with prediction probabilities greater than the threshold are considered live, and the others are considered fake.
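Concretely, the decision step reduces to a threshold comparison on the fused score; the threshold value of 0.5 below is illustrative, as the paper does not state the value used.

```python
D_score = fuse(M1_pred, M2_pred, w1, w2)           # from the fusion module above
labels = np.where(D_score > 0.5, "live", "fake")   # pre-defined threshold (assumed 0.5)
```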

3.2 Proposed algorithm

The training algorithm used for learning the VID model is depicted in Fig. 5. Initially, a training set of iris images of size 'X' is chosen from the anti-spoofing dataset 'Dt'.

Fig. 5
figure 5

The learning algorithm for our proposed IVIDNet

These acquired images are usually of low quality; to enhance and prepare them for further processing, the dataset is subjected to a series of pre-processing operations. First, the 'r' operator is used to convert RGB to greyscale to reduce computational complexity. Then, to obtain uniformity, the images are resized by 'μ' to a size of 224 × 224. Further, an augmentation operation 'ϑ' is performed to artificially increase the amount of data. Finally, the images are rescaled by '∂' to ensure optimal comparisons across data acquisition methods and texture instances.

After pre-processing, both models are fine-tuned: the dense layer 'd' is dropped and a new output layer 'n' is added to the model 'Q'. Afterwards, the customized models M1 and M2 are trained on all the instances of database Dk. The results of the models are integrated as 'φ'. Finally, IVIDNet is built as 'δ(φ)' by hyper-tuning with optimal parameters. A similar set of steps is followed to pre-process the testing images 't' for validation of the IVIDNet model, as shown in Fig. 6. The models Mj are then tested using 'x_test', and prediction sets are generated as 'M1_pred' and 'M2_pred'. Thereafter, we define the weights w1 and w2 and calculate D_score as w1 * M1_pred + w2 * M2_pred. At last, we compare D_score against the threshold and assign the label of each sample as live or fake.

Fig. 6
figure 6

The validation algorithm for our proposed IVIDNet

4 The IVIDNet learning and validation

The learning of our IVID approach using the training algorithm is depicted in Fig. 5. Initially, a training set of iris images of size 'X' is chosen from the benchmark anti-spoofing dataset (Dt). For the model to perform effectively, the training dataset should be large enough to encompass samples from all of the labelled classes as well as sensors. Following that, the training images are pre-processed using fundamental image processing operations to obtain a standardized set of images. Let Si(x, y) be the ith image of Dt obtained after applying the pre-processing operations to the corresponding input image Xi(x, y). Thereafter, the selected base models are customized, and a new set of layers is added on top of the base models. In the next step, we train the new layers on the dataset. Afterwards, a weighted score level fusion model is built using the predictions of the two pre-trained models. Finally, the trained model is hyper-tuned over various parameters with proper experimentation in an appropriate search space. Figure 6 lists the steps of the testing algorithm of the IVIDNet method. The trained IVIDNet is validated by presenting images from the testing datasets. To build the appropriate feature sets, a similar sequence of steps, such as image pre-processing, is applied to the test images. Finally, IVIDNet classifies the test samples by assigning a class label of either live or fake.

4.1 Experimental analysis

In this section, we assess the effectiveness of our approach as a vitality detection mechanism. We begin with a brief overview of the datasets and evaluation methodologies used as the standard criteria for assessing performance. The IVIDNet is then fine-tuned, and the resulting model is tested on two publicly available datasets, LivDet 2017 Notre Dame and NDCLD 2015. The proposed method is also evaluated in cross-database scenarios to determine its generalization capability. Finally, the performance of the IVIDNet model is compared against related state-of-the-art IVID approaches.

4.1.1 Evaluation datasets

An iris anti-spoofing database is a systematic collection of iris information used mainly for developing and evaluating iris VID algorithms. Databases of adequate size, corresponding to different iris sensing technologies and fabrication materials, are required to assess these algorithms. For evaluation purposes, two benchmark iris anti-spoofing datasets are used: LivDet 2017 Notre Dame [24] and NDCLD 2015 [4]. The details of these datasets are summarized in Table 4.

Table 4 An outline of the LivDet 2017 Notre Dame and NDCLD 2015 datasets

4.1.2 Performance protocols

For performance evaluation, we define the overall protocol in terms of the metrics used and the dataset selection. We utilize the training images from both datasets to learn the IVIDNet, and testing samples are selected across various domains to compute the performance.

The model is evaluated in terms of five standard performance metrics, namely attack presentation classification error rate (APCER), bona-fide presentation classification error rate (BPCER), average classification error rate (ACER), average classification accuracy (ACA) and receiver operating characteristic (ROC). A detailed description of various protocols is listed in Table 5.

Table 5 An analysis of various metrics used for the evaluation of IVIDNet
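For reference, the error-rate metrics follow the standard ISO/IEC 30107-3 formulation, with ACA read here as the overall classification accuracy:

```latex
\mathrm{APCER} = \frac{N_{\text{attack classified as bona fide}}}{N_{\text{attack}}}, \qquad
\mathrm{BPCER} = \frac{N_{\text{bona fide classified as attack}}}{N_{\text{bona fide}}},
\]
\[
\mathrm{ACER} = \frac{\mathrm{APCER} + \mathrm{BPCER}}{2}, \qquad
\mathrm{ACA} = \frac{N_{\text{correctly classified}}}{N_{\text{total}}} \times 100\%.
```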

4.2 Experimental setup

Once the dataset for the IVIDNet model is prepared, we perform the various experiments described below.

4.2.1 Hyper-parameter tuning

The anti-spoofing model's detection accuracy may be significantly affected by the hyper-parameter settings. Using a meta-process, the ideal hyper-parameters are tuned for each dataset. For IVIDNet, hyper-parameters such as the number of epochs, learning rate, activation function, batch size, and optimizer are chosen. Table 6 contains the outcomes of our proposed model's hyper-parameter settings when trained and tested on the Notre Dame 2017 iris dataset.

Table 6 Performance evaluation of IVIDNet at different parameter settings

In each step of the model's tuning, the optimum value of one hyper-parameter is chosen and fixed for the next step. The process recurs until the whole set of optimal values is acquired. The optimal parameters are then used to evaluate the performance of the IVIDNet model in different scenarios; a sketch of this one-at-a-time search is shown below.
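The following sketch illustrates the greedy, one-parameter-at-a-time search described above. The candidate values and the `train_and_validate` helper (which would train IVIDNet under a given configuration and return validation accuracy) are placeholders, not the grid actually used in the paper.

```python
# Greedy, one-parameter-at-a-time hyper-parameter tuning (candidate values illustrative).
search_space = {
    "epochs": [20, 30, 40],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "optimizer": ["adam", "sgd", "rmsprop"],
}
best = {"epochs": 20, "learning_rate": 1e-3, "batch_size": 32, "optimizer": "adam"}

for name, candidates in search_space.items():
    scores = {}
    for value in candidates:
        config = {**best, name: value}               # vary one parameter, fix the rest
        scores[value] = train_and_validate(config)   # user-supplied: returns val. accuracy
    best[name] = max(scores, key=scores.get)         # fix the optimum before the next step
```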

4.2.2 Performance with different pre-trained models

A pre-trained model is a saved network that was previously trained on a large dataset such as ImageNet, typically for a large-scale image classification task. These models can either be used as-is, or their knowledge can be transferred and customized to perform related tasks. To build the IVIDNet, we customize the most efficient models among those trained for image classification problems. To confirm that the best models are selected, we compare five well-known, publicly available pre-trained models: InceptionNet, EfficientNet [19], VGG-16, VGG-19, and MobileNet. The comparative analysis of the various fine-tuned models, when trained and validated on the Notre Dame 2017 iris anti-spoofing dataset, is summarized in Table 7.

Table 7 Comparison of performance among various fine-tuned models

The accuracy rate is measured at the hyper-parameter values selected in the preceding step. From the comparison, it can be clearly inferred that MobileNet V2 and InceptionNet V3 are the two models with the maximum training and validation accuracy rates. The performance of these two models is depicted graphically in Fig. 7.

Fig. 7
figure 7

A graphical representation of InceptionNet and MobileNet performance

4.2.3 Performance of IVIDNet before and after fusion

A solitary fine-tuned model can be seen as a potent and accurate tool for efficient image classification. Fusing more than one such model can further influence the accuracy of a classifier: this step may either increase the accuracy or merely increase the complexity of the resultant model. Table 8 shows the performance of the sole fine-tuned models and of the weighted-fusion IVIDNet.

Table 8 Performance measure of customized models before and after fusion

From the outcomes, we can infer that, in comparison to the sole models, the accuracy of the fused IVIDNet model increases in both the known and unknown testing scenarios of the Notre Dame 2017 dataset. The performance of a classification model at all classification thresholds can be represented graphically with an ROC curve; Fig. 8 illustrates the ROC curve for the IVIDNet model.

Fig. 8
figure 8

The ROC curve of IVIDNet tested on known and unknown set

4.2.4 Performance at varying weights

The proposed model is based on the idea of weighted score level fusion of the predictions of the two customized models, where 'w1' and 'w2' correspond to the weights assigned to InceptionNet V3 and MobileNet V2, respectively.

Table 9 depicts the contrast among the performance measures of the model at different weight values. The performance is measured on the training set of the Notre Dame 2017 iris anti-spoofing dataset. Based on the classification accuracy of the final IVIDNet model, the optimal weights are 0.4 and 0.6 for the InceptionNet V3 and MobileNet V2 models, respectively. Figure 9 shows the ROC curve of IVIDNet with different weight values.

Table 9 Comparative analysis of IVIDNet at varying weights
Fig. 9
figure 9

The ROC curve of IVIDNet by varying weights

4.2.5 Performance with cross dataset scenarios

An IVD's efficiency in terms of generalizability to unknown attacks is a critical aspect. We therefore conduct an experiment to assess how well our technique performs in a cross-database scenario. Cross-dataset testing, where a model is trained on one set and evaluated on distinct datasets containing iris artefacts created with different spoofing materials, is used to extend the IVD approach to unknown threats. In this test, we use images from one set to train our model and samples from another set to test it. Table 10 presents the results of the cross-dataset evaluations.

Table 10 Cross-dataset performance of the IVIDNet

Table 10 summarizes the performance of the model when trained on Notre Dame 2017 and tested on NDCLD 2015 at different numbers of epochs. From it, it can be inferred that IVIDNet achieves an accuracy rate of 89.63% at 40 epochs in the cross-dataset evaluation.

4.3 Comparison with SOA techniques

Several machine and deep learning-based solutions have been presented in the literature to address the issue of iris anti-spoofing. Since our method is transfer learning-based and grounded on the weighted fusion, at score level, of the prediction values of two fine-tuned models, we assess the IVIDNet's effectiveness by contrasting it with comparable SOA techniques based on similar principles. Table 11 depicts the contrast between our proposed approach and a multi-layer fusion technique trained and evaluated on the Notre Dame dataset using a pre-trained model.

Table 11 A comparison of IVIDNet with related approach

The comparison clearly indicates that our proposed model performs more efficiently compared to the multi-layer fusion method.

The comparison of the proposed IVIDNet with SOA iris spoof detection mechanisms is briefly discussed in Table 12. The outcomes indicate that our approach performs well in the known environment, with an ACA of 99.39%, and shows decent results in unknown attack scenarios.

Table 12 IVIDNet’s comparison with SOA TL-based anti-spoofing approaches

5 Conclusions

This research work has presented an efficient and novel iris liveness detection mechanism that fuses the robust features of two pre-trained DCNN models (InceptionNet V3 and MobileNet V2). The IVIDNet has been evaluated through a series of experiments on benchmark anti-spoofing datasets, and the empirical results prove the effectiveness of our approach. Besides known attack scenarios, the IVIDNet shows promising performance in unknown attack environments covering the cross-database scenario. The IVIDNet offers several merits compared to its counterparts, including: (i) usage of robust deep-level features from pre-trained models; (ii) good performance with smaller datasets; and (iii) lower training overhead owing to the use of pre-trained models. However, the proposed IVIDNet approach has not been evaluated in cross-material and cross-sensor environments. The future scope of this work includes the performance evaluation of IVIDNet on more recent iris anti-spoofing datasets such as LivDet 2017, LivDet 2020, and IIITD. An additional direction is to evaluate IVIDNet in new unknown attack environments, particularly with unknown fabrication materials used for creating iris artefacts and unknown sensors. The proposed approach may also be extended to anti-spoofing in other biometrics such as fingerprint, face, and palm print.