1 Introduction

In the last decades, biometric systems are being widely used as a major domain of security applications in the electronically advanced society. It has led to the development of many biometric systems using different biometric traits. Among numerous biometric traits, the ear shape has become a very much popular and well-accepted biometric trait for human recognition owing to its unique shape, stable structure, and salient features. In addition, the ear biometric is a good alternative to other successful physiological modalities including fingerprint, iris, and palmprint because it is non-intrusive and less cooperation is needed from the users for recognition operation. This biometrics requires a low expensive camera to capture 2D ear images like face biometric trait but having promising recognition performance than face. Again, as size of ear shape is smaller, it can easily combine with other biometric trait to overcome the problems of unimodal biometric systems.

In 1890, first-person Alphonse Bertillon suggested that the shape of the outer ear can be used for human identification. After a long duration in 1989, the American police officer Alfred Iannarelli [21] developed a manual measurement based ear identification system. He performed extensive experiments on ten thousand ear images based on twelve unique features or measurements of ear for identification of different shape of the ears. During latter part of 20th century, Burge and Burger [6] were the first to propose a computer based ear biometric system. With the intention of studying the uniqueness of the ear prints, one project the Forensic Ear identification (FearID) was commenced by nine institutes from Italy, the UK and the Netherlands in 2006. The outcome of their study brought to the public with the conclusion that earprints can be used as biometric in a semi-automated identification system. Since then, ear biometrics has been deployed in many areas including identity recognition, serveilance, law enforcement, and forensics applications [16].

However, ear biometrics based on 2D ear images still undergo poor recognition performance due to improper acquisition such as poor contrast, blur, illumination variations, and noise. Various enhancement techniques are generally used to resolve the above problems occurring at the time of capturing the 2D ear images. In the biometric systems, contrast enhancement step is very important towards improvement of recognition performance. This enhancement operation can be broadly classified into two categories such as spatial domain and frequency domain. The most commonly used enhancement techniques of the spatial domain are classified into four types based on the operation performed on the pixels of the image such as pixel-wise operation [42], filter operation [18, 41], transformation [42], and multi-scale enhancement [25]. The linear contrast enhancement, gamma correction are the example of pixel-wise enhancement. The proposed image enhancement approach in this paper is based on filter operation. All histogram based enhancement approaches are based on transformation operation.

In order to reduce the effect of low contrast from 2D images the following enhancement techniques are widely used such as histogram equalization (HE) [42], and contrast-limited adaptive histogram equalization (CLAHE) [36]. In HE, the intensity of each pixel of the image is modified based on relative frequency of the intensity of every pixels. Due to this both object and background are equally enhanced, which tends to introduce noise in the background. It also over-enhances the image contrast in presence of high peaks in the histogram. Unlike evaluating a single histogram for the entire image, the AHE enhancement technique divides the image into a number of tiles and computes histograms of each region to redistribute the intensity values of these regions. Finally, the enhanced intensity values of the neighboring tiles are combined using bilinear interpolation to obtain the entire image. The main benefit of AHE is the enhancement of local detail information contents and noise enhancement is still a problem. To minimize the noise, the CLAHE introduced one extra step clipping the height of the local histograms before the computation of its CDF. In CLAHE, histogram clipping step does not fully assure to eliminate noise. In addition, the selection of the clipping limit is image dependent.

Unlike traditional image enhancement techniques, recently many meta-heuristic optimization algorithms are deployed to search for the optimal parameters of the transfer function which maps the intensity values of the pixels into enhanced intensity values of the input image. Gorai et al. [18] proposed an approach based on particle swarm optimization (PSO) [14, 28] which gave promising results on the standard images. Similarly, Sarangi et al. [41] explored differential evolution (DE) [11, 43] to enhance the standard images which also gave promising results. Recently, many meta-heuristic algorithms [12] have been studied to obtain optimal parameters value of the transformation function, but most of them suffer from large computational times and proper tuning of the algorithmic-specific parameters to achieve better performance.

The main contribution of our paper is to develop an automated image enhancement approach in order to resolve the above mentioned issues in the existing enhancement methods. Initially, enhanced Jaya algorithm (EJA) has been exploited to enhance contrast level of the ear images in the preprocessing stage of the ear biometric system in less computation overhead. The enhanced ear images contain more detail information that allows extracting rich features in order to improve recognition performance. In the next stage, we employed a local feature extractor, the Speeded-up Robust Features (SURF) has been employed to extract the most discriminant local descriptors under pose variations. Finally, the nearest neighbor classifier is trained using minimum training set including two samples per each subject for person identification in the matching stage.

The rest of the paper is organised as follows. Section 2 describes background study of the problem. Detail description of ear biometric system and a review of existing works available for ear localization and recognition in Sect. 3. Section 4 presents the proposed ear biometric system. A comprehensive analysis of the experimental results of proposed approach is compared to the results of two classical approaches and two meta-heuristic approaches of ear images enhancement in Sect. 5. Section 6 emphasizes conclusion and future scope.

2 Problem formulation

Image enhancement is one of the important tasks in the preprocessing module of a biometric system. The main objective of an efficient image enhancement algorithm is to highlight the fine detail contents of the input image which can easily separate the object from the background. Hence, enhanced image can be utilized to extract more discriminating features that improves recognition performance of the biometric system. The proposed image enhancement technique is based on a spatial operation or filter operation which is performed using two functions. First, the transformation function is used to modify the pixel intensity values based on both statistical global and local information and four parameters. Second, the fitness or cost function is required to estimate the information content of an image. The mathematical expression of the transformation function and fitness function are given in the following subsections.

2.1 Transformation function

Transformation function updates the pixels intensity values to new intensity values based on Eq. (1) derived from global and local statistical information of the input image. To extract global statistical information, all the pixels’ intensity values of the image of dimension \((P\times Q)\) are utilized. Similarly, for extracting local statistical information, it depends on the neighborhood pixels intensity values of a predefined \((n\times n)\) window. The mathematical form of the parametric transformation function g(pq) for both global and local enhancement is derived as follows:

$$\begin{aligned} g(p,q)=K(p,q)*[f(p,q)-c.\mu (p,q)]+\mu (p,q)^{a} \end{aligned}$$
(1)

where K(pq) is known as image enhancement function, which is expressed using the global mean (M) and local standard deviation (\(\sigma \)) in Eq. (2). Similarly, \(\mu (p,q)\) and \(\sigma (p,q)\) denote the local mean and local standard deviation of the current pixel, which is calculated using \((n\times n)\) neighborhood pixels intensities centred around current pixel (pq). Image enhancement function, global mean, local mean, and local standard deviation of an image f(pq) are given in Eqs. (2), (3), (4) and (5) respectively.

$$\begin{aligned} K(p,q) &= \frac{k.M}{\sigma (p,q)+b} \end{aligned}$$
(2)
$$\begin{aligned} M(p,q) &= \frac{1}{p \times q}\sum _{p=1}^{P}\sum _{q=1}^{Q} f(p,q) \end{aligned}$$
(3)
$$\begin{aligned} \mu (p,q)& = \frac{1}{n}\sum _{1=1}^{n}\sum _{q=1}^{n} f(p,q) \end{aligned}$$
(4)
$$\begin{aligned} \sigma (p,q) &= \left( \frac{1}{n}\sum _{p=1}^{n}\sum _{q=1}^{n}(f(p,q) -\mu (p,q))^{2}\right) ^{\frac{1}{2}} \end{aligned}$$
(5)

Finally, Eq. (1) can be rewritten by substituting Eqs. (2) in (1) to get the required transformation function as:

$$\begin{aligned} g(p,q)=\frac{k.M}{\sigma (p,q)+b}[f(p,q)-c\mu (p,q)]+\mu (p,q)^{a} \end{aligned}$$
(6)

The transformation function in Eq. (6) enhances the contrast level of the input image centred at the local mean. The four positive real-valued parameters (abck) of the transformation function decides the intensity values of the enhanced image. Since these four parameters play an important role to vary the contrast level of the input image over a broader range. Hence, the proposed enhancement technique optimizes those four parameters value over the predefined ranges of the transformation function to enhance the intensity values of the input image.

2.2 Evaluation of fitness function

In order to measure the amount of enhancement of an enhanced image, an fitness function is used after assigning a specific parameters values of a, b, c, and k on the transformation function. The higher fitness value of an enhanced image indicates more enhancement has acquired. Designing a good fitness function, it depends on measurement parameters used. In general, the enhanced image maximizes a number of edges and entropy in comparison with the input image. An enhanced image expands the gray-levels over a wide range which increases the details edge information. Similarly, the entropy increases due to enhancement of intensity of the pixels in the enhanced image. In this work, the fitness function is used for the proposed approach is sum of edge intensities, number of edge pixels (edgels), and entropy of the enhanced image. These three evaluation measures are combined to define an effective fitness function. An effective fitness function should be chosen to measure the fitness value of the enhanced image. This leads to an increase in entropy value because of more information present in the enhanced image [8]. The fitness function is presented in the form of equation as follows:

$$\begin{aligned} F(I_\textit{enhanced}) &= log(log(E(I_{\textit{edge}})))\nonumber \\&\quad \times \frac{\textit{edgels}(I_{\textit{edge}})}{M\times N} \times H(I_{\textit{enhanced}}) \end{aligned}$$
(7)

where \(I_{enhanced}\) is the enhanced image obtained from the transformation function in Eq. (6) and \(I_{edge}\) is the edge image obtained from Sobel edge detection. The edgels \((I_{edge})\) is the number of edge pixel in the edge image of enhance image. \(E(I_{edge})\) the is sum of all the pixel intensity values of the enhanced image which is corresponding to edge pixels of the edge image and \(H(I_{enhanced})\) is the entropy value of enhanced image. \(E(I_{edge})\) is calculated as follows:

$$\begin{aligned} E(I_{\textit{edge}})=\sum _{p=1}^{P}\sum _{q=1}^{Q} I_{\textit{enhanced}}*I_{\textit{edge}} \end{aligned}$$
(8)

The entropy of the enhanced image \(I_{enhanced}\) determines the information content which is calculated as follows:

$$\begin{aligned} H(I_{\textit{enhanced}})=-\sum _{i=0}^{L_{max}-1} p(i)*log_2(p(i)) \end{aligned}$$
(9)

where p(i) indicates the probability of occurrence of pixels with gray level \(i \in (0, L_{max}-1)\) and \(L_{max}\) indicates the maximum gray level present in the enhanced image \((I_{\textit{enhanced}})\).

2.3 Jaya algorithm and enhanced Jaya algorithm

All meta-heuristic optimization approaches are randomized algorithms which need controlling parameters to control the path of the candidate solutions towards a global optimal solution. Thus, these algorithms require proper tuning of the control parameters to guide candidate solutions to converge at global optimal solution. Among all meta-heuristic algorithms, Jaya algorithm does not need algorithm-specific control parameters. This subsection provides an in-depth description of Jaya algorithm (JA) and the proposed enhanced Jaya algorithm (EJA).

2.3.1 Jaya algorithm

Jaya algorithm (JA) [35] is a recent and simple yet meta-heuristic algorithm introduced by Venkata Rao. The name Jaya algorithm has originated from Sanskrit language, meaning of the word Jaya is victory. The principle behind this algorithm is always getting the optimal solution by following best and avoiding the worst solutions. However, it is parameters free, this key fact makes JA popular among many researchers. In recent years, it has been used in various optimization problems in different domains of engineering applications [29, 44, 47, 48].

JA utilizes two operators such as selection and updating the candidate solutions. After selection of best for each candidate solution from the present solution and the next solution based on their fitness values. JA relies on the present best and worst solution to update each individual solution for the next generation. At each generation, the candidate solutions are updated to move toward a global optimal solution with the principle of following the best and avoiding the worst solutions. Mathematical expression [35] for updating candidate solutions are as follows:

$$\begin{aligned} X_{i,j}(g+1) &= X_{i,j}(g)+r_{1,i,j}*(X_{i,\textit{best}}(g)-|X_{i,j}(g)|)\nonumber \\&\quad -\,r_{2,i,j}*(X_{i,\textit{worst}}(g)-|X_{i,j}(g)|) \end{aligned}$$
(10)

where \(X_{i,j}(g)\) is the value of the jth parameter for ith solution at iteration g, \(X_{i,best}(g)\) and \(X_{i,worst}(g)\) is the value of the ith parameter for the best solution and the worst solution at iteration g respectively, \(r_{1,i,j}\) and \(r_{2,i,j} \) are two random numbers generated in the range of [0, 1] at iteration t, finally \(X_{i,j}(g+1)\) represent the updating value of jth parameter for ith candidate solution in the next iteration \((g+1)\). Continuous upgrading every candidate solution at next iteration using Eq. (10) exploits neighborhood positions that leads to converge at global solution. In addition, two random numbers \(r_{1,i,j}\) and \(r_{2,i,j}\) enhance the exploration capability of Jaya algorithm. After updating the candidate solutions at current iteration n, then the selection process is used to select the best individual solution between the current solution \(X_j(g)\) and the updating next solution \(X_j(g+1)\) based on fitness values for the next iteration \((g+1)\) which is described in [35] as follows:

$$\begin{aligned} X_j(g+1)=\left\{ \begin{array}{ll} X_j(g),&\quad \text {if}\;\; f(X_j(g))>f(X_j(g+1))\\ X_j(g+1), &\quad \text {otherwise} \end{array}\right. \end{aligned}$$
(11)

As a result, all candidate solutions of the next iteration are best than their corresponding solutions in the current iteration. In accordance with the selection of the candidate solutions for next iteration, all their fitness function values are modified and considered as the input to next iteration.

2.3.2 Enhanced Jaya algorithm

One major issue of Jaya algorithm is slow convergence to the global best solution. In order to achieve fast exploration and convergence to optimal solution, we incorporate an efficient mutation strategy in the Jaya algorithm to avoid slow convergence and it is termed as the enhanced Jaya algorithm (EJA). In the experimental evidence, the combination of mutation operator with Jaya algorithm can enhance the candidate solutions diversity and searching capability. As a result, the mutation operator increases the convergence rate that leads to reach at the optimal solution in less iterations.

As mentioned in [41], DE outperforms the PSO based image enhancement in terms of performance and faster convergence to the global optimal solution. In each generation, DE executes mutation operation for the entire population before crossover and selection operations. Hence, DE can explore the search space faster than other existing evolutionary algorithms. However, it takes high computational time for execution and occasionally, its performance degrades in case of stagnation of the population that leads to premature convergence or slow convergence to the optimal solution. For this reason, we randomly incorporated the mutation operator of DE in the Jaya algorithm to enhance diversity in the solutions after performing updating and selection operations on the candidate solutions. In the proposed EJA, we used an adaptive selection of the mutation operator to balance exploitation and explorations in less computation time. This mutation operator mathematically expressed as

$$\begin{aligned} X_j(g+1)=X_{r_{1}}(g)+F*(X_{r_{2}}(g)-X_{r_{3}}(g)) \end{aligned}$$
(12)

where \(j\in \{1,2, \ldots , K\}\), jth candidate solution of the population of size K. Similarly, \(r_1 \), \(r_2 \), and \(r_3 \) are the indices of the candidate solutions from \(\{1,2, \ldots , K\}\). The nth and \((n+1)\)th represents solutions of the current and next iterations. F is the scaling factor lies in the range of [0,1] to control the difference vector in the mutation operation and avoid stagnation in the population. In this work, we have used an adaptive scaling factor which is described as

$$\begin{aligned} F=I+\textit{rand}*((G_{max}-g)/G_{max}) \end{aligned}$$
(13)

where I is the initial value of scaling factor F which is 0.8, g is the current iteration and \({G_{max}}\) is the maximum iteration. In addition, we can also adaptively evaluate the rate of mutation in Eq. (14) and the mutation operation will be performed when uniformly generated a random real number is greater than and equal to the rate of mutation.

$$\begin{aligned} \textit{rand}(0,1)\ge \left( 1-\frac{g}{G_{max}}\right) \end{aligned}$$
(14)

The mutation rate varies from 1 to 0 in the successive iterations. Initially, the rate of mutation operation is less and gradually it increases in the last part of the iterations. In this way, it enhances the exploration capability of the algorithm. Finally, EJA balances both exploitation and exploration capability using upgrading and mutation operators.

2.4 Speeded-up robust feature transform (SURF)

An efficient local feature descriptors, named Speeded-Up Robust Features (SURF)proposed by Herbert Ray and his colleagues [4]. SURF detects scale and in-plane rotation-invariant interest points (key points) and its corresponding descriptors are generated. It provides a better result in computationally quite faster than another proficient local feature descriptor Scale Invariant Feature Transform (SIFT). In order to make all computations faster, integral images and Hessian matrix are used to detect interest points. The interest points are the blob-like structures at regions where the determinant of the Hessian is maximum. In an integral image I for a given point x with coordinate \(x[p,q]^T\) the hessian matrix \(H(x;\sigma )\) at scale \(\sigma \) is defined as:

$$\begin{aligned} H(x;\sigma )=\begin{bmatrix} L_{xx}(p,\sigma )&\quad L_{xy}(p,\sigma ) \\ L_{yx}(p,\sigma )&\quad L_{yy}(p,\sigma ) \end{bmatrix} \end{aligned}$$
(15)

where \(L_{xx}(p,\sigma )\), \(L_{xy}(p,\sigma )\), \(L_{yx}(p,\sigma )\) and \(L_{yy}(p,\sigma )\) are the convolution of the Gaussian second order derivatives \(\frac{\delta ^{2}}{\delta x^{2}}g(\sigma )\), \(\frac{\delta ^{2}}{\delta x \delta y}g(\sigma )\), \(\frac{\delta ^{2}}{\delta y \delta x}g(\sigma )\) and \(\frac{\delta ^{2}}{\delta y^{2}}g(\sigma )\). The computation of the Hessian matrix is expensive and time consuming. Hence, Hessian can be computed approximately using box filters. For scale invariant, SURF uses a pyramidal scale space, like SIFT. But, it is different from SIFT; the SURF can directly change the scale because of the box filters and integral image. The scale space is analysed by up-scaling the filter size of a certain dimensions \((9\times 9,15\times 15, 27\times 27, etc.)\). The key-points are localized in the image and over scales; a non-maximum suppression in a \(3 \times 3 \times 3\) neighborhood is applied. Then, the descriptor is calculated for all interest points by considering a circular region near the interest points and the sum of Haar wavelet responses in horizontal (dx) and vertical (dy) directions are computed. In order to make descriptors rotation invariant, an orientation selection method has introduced in [4]. The square region is divided into \(4\times 4\) grids to preserves spatial information. For each subdivision, authors computed the horizontal and vertical Haar wavelet responses at \(5\times 5\) equally spaced sample points. The sum of the wavelet responses in dx and dy for each subdivision is used as first set of entries in feature descriptor. In addition, the sum of absolute values of the responses, |dx| and |dy| are computed to obtain the information about the polarity of the intensity changes. Hence, for ith subdivision the feature descriptor \(V_i\) is given by

$$\begin{aligned} V_i=\{ \Sigma dx, \Sigma dy, \Sigma |dx|, \Sigma |dy|\} \end{aligned}$$
(16)

These local feature descriptors \((V_i)\) are concatenated to form a 64 dimensional feature descriptor which expresses the interest point and nearest neighbors. Next, feature descriptor of the interest points are matched. In our work, matching of descriptors can be done by three distance measures such as Euclidean distance \((d_1)\), Canberra distance measures \((d_2)\), nearest neighbor distance ratio \((d_3)\). The distance between two descriptors \(D_1\) and \(D_2\) using Euclidean distance is given by

$$\begin{aligned} d_1 &= \sqrt{\Sigma _{i=1}^{64} (D_{1i}-D_{2i})} \end{aligned}$$
(17)
$$\begin{aligned} d2 &= \Sigma _{i=1}^{64} \frac{|D_{1i}-D_{2i}|}{|D_{1i}|+|D_{2i}|} \end{aligned}$$
(18)
$$\begin{aligned} d3 &= \frac{d_1(D_1,D_2)}{d_1(D_1, D_3)}<\theta \end{aligned}$$
(19)

If nearest neighbor ratio in Eq. (19) is less than \(\theta \), then \((D_1, D_2)\) descriptors have better match, otherwise they have no match. In our experiments, in most of the cases Euclidean distance metric in Eq. (17) outperforms than Canberra distance in Eq. (18) and nearest neighbor distance ratio in Eq. (19).

3 Generic ear biometric system

A biometric system works in two phases: one ear enrolment phase and second ear recognition phase. In the enrolment phase, the biometric system is trained using multiple images for each person and all the feature sets of the individuals are stored in the database. Similarly, in the recognition phase, an individual is recognized by the biometric system. A generic ear recognition system is as shown in Fig. 1, which is basically consists of four modules: ear localization, ear image enhancement, feature extraction, and matching/classification. First module is the ear localization module in which the ear shape is localized and cropped from the side face image. The input ear image may be affected by low contrast, illumination variation, and noise which influences the recognition performance through misclassification. Hence, the second module is the image enhancement in which the quality of the ear image is enhanced to extract discriminative features in the feature extraction module. The feature set of the enhance ear image can be generated using global or local feature extraction technique. As the feature set contains redundant features, it is necessary to select unique features or discriminant features using either feature selection or reduction technique. Then, the reduced feature set is used for person recognition in the recognition module.

Fig. 1
figure 1

The generic ear recognition system

Various ear localization and recognition approaches have proposed in the literature, we describe a review of some recent and well-known approaches of ear localization and recognition in 2D images. Yuan et al. [51] have employed skin-colour and elliptical contour information for ear localization. Ibrahim et al. [22] utilized a bank of curved and stretched Gabor wavelets known as banana wavelets, which is similar to ear structure used for ear localization. Islam et al. [24] proposed an Adaboost technique which yields huge time for ear localization. Prakash et al. in [33] presented a template matching based ear localization technique in which template has been resized according to the size of the side face to make it suitable for each test images. Prakash et al. [34] used hierarchical clustering of the edge map to localize ear. Prakash et al. [31] utilized connected components of a graph from the edge map and selected a set of connected component as probable ear candidates. Recently, we have efficiently used modified Hausdorff distance for ear localization. Sarangi et al. [39] presented a new scheme of automatic ear localization based on template matching with the modified Hausdorff distance. Recently, Emeršič et al. [17] proposed a novel ear detection technique based on convolutional encoder-decoder networks (CEDs).

In the literature, ear recognition methods can be categorized into the five categories: holistic approaches, structural approaches, spectral approaches, local descriptor-based approaches, and deep learning approaches. The holistic approaches use statistical measures which are evaluated from the image as a whole. Victor et al. [46] applied principal component analysis (PCA) method on the training ear images to compute eigenvectors and test ear images are projected on the basis eigenvectors for ear recognition. Zhang et al. [52] proposed an ear recognition system using a combination of independent component analysis (ICA) and a radial basis function network (RBFN). Xie and Mu [49] proposed an improved locally linear embedding (IDLLE) algorithm for dimensionality reduction of multi-pose ear recognition using KNN classifier. The performance of improved LLE has improved over LLE, PCA and KPCA. Yuan and Mu [50] proposes ear recognition system based on sparse representation to deal with ear recognition under partial occlusion.

The structural approaches use ear shape and structure to recognize the persons. Burge and Burger [6] represented ear as an adjacency graph from the Voronoi diagram of its edge segments and novel graph matching algorithm has used for ear recognition. Hurley et al. [20] proposed a novel feature extraction technique, force field transformation for ear localization and recognition. Choras [9] has used geometric features to represent the ear shape for ear recognition.

The spectral approaches utilize different frequency spectrums to recognize the persons. Sana and Gupta [37] proposed an efficient ear biometric system in which authors used multi-resolution discrete wavelet transform (DWT). Kumar and Wu [27] utilized a log-Gabor based combination approach for ear identification. Abate et al. [1] proposed a rotation invariant descriptor technique for ear recognition using generic Fourier descriptor to extract features from ear images. Basit and Shoaib [3] proposed a ear recognition method based on discrete curvelet transform (DCT).

In the last decade, many researchers have published local features-based ear recognition techniques. The local features represent an image into multiple image patches or the key-points which can characterize by multiple feature vectors or descriptors. As a result, the descriptors contain detail local information which is more robust to occlusion or clutter. Many researchers have explored well-known single local feature or fusion of multiple local features representation to improve recognition performance in the ear biometric systems under illumination, pose, scale variation, partial occlusion, and clutter. Guo et al. [19] presented a texture based local feature called Local Similarity Binary Pattern (LSBP). Bustard and Nixon [7] considered ear image as a planar surface and created a homography transform using SIFT feature points to register ears accurately. Damer and Führer [10] used multi-scale dense HOG features, Pflug et al. [30] utilized local phase quantization (LPQ) descriptor, Benzaoui et al. [5] employed binarized statistical image features (BSIF), and Sarangi et al. [38] explored pyramid histogram of oriented gradients (PHOG) as a descriptor of 2D ear shapes to extract discriminative features for ear recognition. Some authors tried to use multiple representations to increase the recognition performance namely, Badrinath and Gupta [2] performed feature-level fusion of different poses of ear images to form a fused template for ear verification; Kisku et al. [26] exploited scale invariant feature transform (SIFT) feature descriptors for the ear recognition; Prakash and Gupta [32] fused the speeded-up robust feature (SURF) on enhanced ear images; Sarangi et al. [40] proposed a feature-level fusion of two local features PHOG and LDP representation for each ear shape and reduced combined feature dimension using KDA.

Most recently, Emeršič et al. [15], Dodge et al. [13] and Zhang et al. [53] proposed CNN based ear recognition under uncontrolled conditions with illumination, pose, scale variations, partial occlusions, and clutter.

4 Proposed technique

Like other biometrics, the proposed ear biometric system consists of three main components: preprocessing, feature extraction, and matching. Figure 2 illustrates the major steps of the proposed ear biometric system. The ear images can be easily captured using the low-resolution camera at a distance because of which it suffers from pose variation, illumination changes, partial occlusion, and clutter. After the acquisition of 2D side face image using the camera, the preprocessing module starts to localize ear shape form the side face image and then enhancement technique is used to improve the quality of the ear image. This preprocessing stage plays a vital role in the improvement of recognition performance. The enhancement technique increases the discriminant features contained in the original ear image that leads to improve the recognition performance. We have explored a new and simple yet meta-heuristic optimization algorithm known as Jaya algorithm which does not require algorithmic-specific control parameters and only one updating operation (10) is used for exploitation and exploration operations. In our work, we have used proposed enhanced Jaya algorithm (EJA) to improve the diversity in the search space by introducing the superiority of Differential evolution’s mutation operator in the simple Jaya algorithm. To examine the effectiveness of the enhancement approach, we use SURF feature extraction technique to make our algorithm robust to handle scale, rotation invariant, and occlusion. Then, for simplicity, we use k-NN classifier for identification of subjects.

Fig. 2
figure 2

Block diagram of the proposed ear biometric system

figure a

The main advantage of our approach is that it does not need algorithmic-specific control parameters and we adaptively select mutation operation which makes it fully automatic ear enhancement technique. We used the proposed enhanced Jaya algorithm (EJA) for searching the optimal values of the parameters (a, b, c, and k) of the transformation function described in Eq. 6 in order to improve the quality of the input image. Experimentally, the upper and lower boundary of the four parameters has been fixed, which are given as follows: \(a \in [0,1.5] \), \(b \in [1, \frac{gmean}{2}]\) , \(c \in [0,1]\), and \(d \in [0,1.5]\). \(X_{i,j}\) is the ith candidate solution with jth parameter in the population. \(X_{i,j}=[0,1,0,0]\) and \(X_{i,j}^{th}=[1.5,\frac{gmean}{2},1,1.5]\) are lower and upper bound of the parameters values of the candidate solutions. Algorithm 1 presents the basic steps of the proposed enhanced Jaya algorithm. All the candidate solutions of the population are randomly initialized using given formula as follow:

$$\begin{aligned} X_i=X_{i,j}^{low}+\textit{rand}*(X_{i,j}^{high}-X_{i,j}^{low}) \end{aligned}$$
(20)

Each candidate solution contains four parameters. They are randomly computed within a range of predefined search space based on the lower and upper bounds of each parameters. The population size N is set ten times the number of parameters which is 40 in our all enhancement experiments. After initialization of the population, the fitness function is used to evaluate the best and the worst solution of the population. Next, the fitness value for each solution evaluated in two phases. In the first phase, the transformation function in Eq. 6 used to modify the pixel values of the original image, and then in the second phase, the fitness value evaluated using Eq. 7 for enhanced image. In order to maintain diversity in the search space, the candidate solutions are updated using Eq. 10 to converge in the direction of the best optimal solution with the help of the best solution and the worst solution. Every solutions move towards an optimal solution by following best solution and avoiding the worst solution of each generation. This updating operation of the Jaya algorithm is merely responsible for exploitation and exploration in the search space. For this reason, the Jaya algorithm needs more number of iterations to reach near the best solution. Therefore, the proposed approach intended to introduce the mutation operation for improving diversity in the search space which reduces the number of iterations to converge the optimal solution. The mutation operation of DE algorithm is very much efficient to explore the search space. The same mutation operator in Eq. 12 has been employed in the proposed approach to improve diversity and fast convergence to the optimal solution. In this work, the mutation operator is applied to all candidate solutions of the specific iterations when a uniformly generated random number in the range of [0,1] is greater than \((1-\frac{g}{G_{max}})\). It is observed that the probability of selecting mutation strategy depends on the iteration index i.e. a few number of mutation operation have performed at the initial iterations and the probability of selecting mutation operation increases in the last stage of the iterations when the value of \((1-\frac{g}{G_{max}})\) reduces from 1 to 0. Algorithm 1 describes the pseudo-code of the enhanced Jaya algorithm (EJA).

5 Experimental results

In this section, four different experiments of ear image enhancement are conducted to demonstrate the effectiveness of the proposed algorithm. First, the subjective evaluation of the original and enhanced images; second, SURF feature key-points matching; third, experiment is to examine the quantitative metrics such as detail variance (DV), background variance (BV), and edge counts; finally, evaluating the performance of the ear identification system based on enhanced ear images as inputs. The performance of the proposed image enhancement technique is compared to two classical well-known algorithms which are Histogram Equalisation (HE) and Contrast-Limited Adaptive Histogram Equalisation (CLAHE) and two popular meta-heuristic search algorithms such as Particle swarm optimization (PSO) and Differential evolution (DE) based image enhancement techniques, respectively. The experimental design and results are discussed in the following subsections.

5.1 Databases used

In this paper, all the experiments are carried out on the ear images from four widely used databases such as IIT Delhi (IITD) [23], University of Notre Dame collection E (UND-E) [45], collection J2 (UND-J2) [45], and the Annotated Web Ears (AWE) [16] standard databases. The IITD database contains ear images of the students and staff from the IIT Delhi, India. Recently, an extended version of ear database is available of 221 subjects with 793 images, where each subject age group varies from 14 to 58 years. In this database, most of the images are a very low contrast with large intra-class dissimilarity due to pose variation. Next, authors have selected two more databases from the University of Notre Dame (UND) such as collection E and collection J2 to handle more difficulties. The UND-E database contains 464 side face images from 114 subjects and 3–9 samples per subject. Images of the same subject are captured in different time with a large variation of pose and illumination effect. Similarly, The UND-J2 database consists of 2413 side face images taken from 415 subjects with 2–22 samples per subject. It can be noticed that most of the images have different objects in the background along with side face which is a challenging task to locate the ear region among multiple objects. In addition, female subject’s images are occluded by hair and earrings. Recently, an unconstrained ear images collected ‘in the wild’ of the ears of public figures collected by a web crawler named as AWE database. It consists of ten thousand ear images of 100 subjects with 10 samples per subject. In this database, images of the same subject have a huge variation in terms of illumination, pose, scale, and visual quality distortions. After ear localization all the ear images of four databases were resized for the enhancement and recognition as follows: IITD ear images resized [\(50 \times 150\)], UND-E ear images resized [\(80 \times 180\)], UND-J2 ear images resized [\(80 \times 180\)], and AWE ear images resized [\(100 \times 100\)] respectively.

In order to evaluate the performance of the image enhancement techniques, we perform identification operation comparisons on the enhanced images of each technique. In this experiment to determine the effectiveness of enhanced images of each database, we selected three samples for each subject from the database. Out of two samples are used as training samples and the remaining one is used as a test sample from every subject. In this way, three different combinations of datasets are built with two training samples and one test sample and an average of identification rates are determined for performance evaluation. The distribution of training and test samples of the four datasets are illustrated in Table 1.

Table 1 Allocation of samples for training and test sets on four datasets

5.2 Parameters Tuning

After ear localization operation, all the ear images were converted to the gray-scale images prior to the preprocessing operation. Then, low-pass filter was employed to minimize the illumination effect using the kernel or filter. Here, we experimentally determined different filter size for four databases such as for IITD kernel size is [\(3\times 3\)] for low resolution images, UND-E kernel size is [\(11\times 11\)] because most of the images are high contrast images, UND-J2 kernel size is [\(5\times 5\)], and AWE kernel size is [\(7\times 7\)]. Next, for image enhancement many parameters of the classical PSO, classical DE, and proposed EJA have to tuned to obtain optimal results i.e., to improve the contrast level of the ear images on the four databases. We have set experimentally the parameters of the PSO, DE, and EJA as mentioned in Table 2. As we know, the original Jaya algorithm is a parameter-free, simple, and computationally faster but it takes more generations to converge the optimal solution in comparison with DE and PSO. We have used adaptive mutation rate in the proposed approach when a real random value in the range [0, 1] is greater than the mutation rate, then mutation operation is performed for all the candidate solutions of the population in that iteration. Unlike DE where mutation operation is performed in every generation, in EJA initially, more mutation operations are performed and gradually it reduces to zero in last generations. We introduced mutation operator in the Jaya algorithm to achieve higher exploration and fast convergence to optimal solution.

Table 2 Parameters setting of PSO, DE, and proposed EJA for image enhancement

5.3 Performance measures

This section analyses experimental results of two classical and three meta-heuristic-based enhancement techniques on the four popular standard ear image databases. Two classical approaches are HE and CLAHE. Similarly, three meta-heuristic approaches are PSO, DE, and the proposed EJA. Figure 3 indicates the visual performance analysis of the enhanced sample images of our approach on the mentioned four databases. Figure 4 presents SURF key-points matching between original sample images and between enhanced images using our approach that indicates comparison of the enhanced images have more matching points on samples from four mentioned databases. Then, Figs. 567 and 8 show the visual presentation of sample images and their enhanced images obtained using HE, CLAHE, PSO, DE, and our EJA techniques on mentioned databases. Further, Figs. 91011 and 12 indicate the rate of identification performance of four k-NN classifiers with four set of enhanced images obtained in preprocessing stage by using HE, CLAHE, and proposed EJA approaches. Tables 3, 4, 5, and 6 illustrate the quantitative performance measures in terms of DV, BV, edge counts of four sample images on IITD, UND-E, UND-J2, AWE databases respectively. Similarly, Tables 7, 8, 9 and 10 give the rate of identification performance of the ear biometric systems based on HE, CLAHE, and EJA enhancement techniques used in preprocessing stage and ear image representation using SURF feature descriptors on IITD, UND-E, UND-J2, and AWE databases respectively.

Fig. 3
figure 3

Histogram based visual analysis of original and ours images

Fig. 4
figure 4

SURF keypoints matching for first and third rows with Poor contrast images and second and four rows with enhanced images

Fig. 5
figure 5

Four original images row-wise start from left to right enhanced gray-scale images using HE, CLAHE, PSO, DE, and proposed EJA enhancement techniques on IIT Delhi database

Fig. 6
figure 6

Four original images row-wise from left to right enhanced gray-scale images using HE, CLAHE, PSO, DE, and proposed EJA enhancement techniques on UND-E database

Fig. 7
figure 7

Four original images row-wise from left to right enhanced gray-scale images using HE, CLAHE, PSO, DE, and proposed EJA enhancement techniques on UND-J2 database

Fig. 8
figure 8

Four original images row-wise from left to right enhanced gray-scale images using HE, CLAHE, PSO, DE, and proposed EJA enhancement techniques on AWE database

Fig. 9
figure 9

Identification rate curves of the proposed ear biometrics using HE, CLAHE, PSO, DE, and EJA on IITD database

Fig. 10
figure 10

Identification rate curves of the proposed ear biometrics using HE, CLAHE, PSO, DE, and EJA on UND (Collection E) database

Fig. 11
figure 11

Identification rate curves of the proposed ear biometrics using HE, CLAHE, PSO, DE, and EJA on UND (Collection J2) database

Fig. 12
figure 12

Identification rate curves of the proposed ear biometrics using HE, CLAHE, PSO, DE, and EJA on AWE database

At first, a subjective performance measure of the enhanced images is illustrated by using histograms of input and enhanced images. The histograms of the enhanced images are more flat or spread over the histogram of input images in the interval [0, 255]. Second, the performance measure based on SURF feature key-points matching, it counts a number of key points matched which is more in case of enhanced images. Third performance measure is a quantitative measure in which, we estimate detail variance (DV), background variance (BV), and edge counts of the input and enhanced images for comparison. DV and BV is the average local variance of the foreground details and background respectively. The variance of each pixel is estimated using the gray-levels of the eight neighboring pixels over an entire image. From the variance image if pixels local variance is greater than the threshold then the variance of those pixels is the part of average foreground detail variance (DV), otherwise, the variance of those pixels is the part of average background variance (BV). In the case of an enhanced image the DV value increases, while the BV value is not considerably changed compared to the original image, then contrast enhancement is realized. Also, a number of edges of the enhanced image are a comparison with the original image.

Finally, we examined the quality of the enhanced image based on the rate of identification performance of the proposed ear biometric system. Identification operation of the biometric system is evaluated on the basis of one-to-many comparison against all training samples to validate the identity of a test sample. The performance of the identification is measured using accuracy or correct identification or rate of identification, which is defined as the rate at which a number of samples are correctly classified out of total test samples. The rate of identification is measured in terms of ranks; rank-1 indicates the fraction of samples correctly classified based on first rank comparison results; similarly, rank-2 implies correct identification based on second rank comparison results and so on. The rate of correct identity of the probe samples within the top k ranks matching scores obtained from gallery samples is referred to as a rank-k identification rate. This is graphically represented using Cumulative Match Characteristic (CMC) curve, which evaluates the rank-wise cumulative sum of correct identification. Hence, it illustrates the rank in the x-axis and the cumulative rank accuracy in the y-axis.

5.4 Analysis of numerical results

In this subsection, we discuss the experimental results of four different performance measures of HE, CLAHE, PSO, DE, and EJA based image enhancement approaches applied on the four above-mentioned ear image databases. These databases have low contrast, different lighting conditions, pose, and scale variation images. Figure 3 illustrates the visual representation of the original images and EJA based enhanced images and their corresponding histograms on IITD, UND-E, UND-J2, and AWE databases. The Fig. 3 indicates the original images have narrow histograms and enhanced images have flat histograms. Consequently, the original low contrast image indicates one peak in the histogram and the original high contrast image indicates two peaks in the histogram of UND-E database sample image but their enhanced images have flat histogram. In addition, Fig. 4 presented the SURF feature key-points matching of original and enhanced images from the mentioned databases. It has found that if the enhanced image is not properly enhanced then it has less key-points matching. Tables 334 and 5 show the quantitative metrics such as DV, BV, and edge count of original and enhanced images have been generated using HE, CLAHE, PSO, DE and proposed techniques on the mentioned databases. The quantitative measure of the edge count (EC) of all enhanced images on four mentioned databases, the proposed approach based enhancement produce have more edge counts than other mentioned enhancement approaches. However, in all four samples of IITD database DV values of HE-based enhancement have greater values than other enhancement techniques. Hence, it is clear that the performance HE-based enhancement performs better in IITD database than other mentioned enhancement approaches. In other three databases, the proposed EJA-based enhancement approach provides better DV values in most of the sample images. Similarly, Tables 789 and 10 indicated an average rate of person identification using HE, CLAHE, PSO, DE, and EJA on the four mentioned databases. It is clearly observed that the proposed approach EJA outperforms over other techniques. As DE’s mutation operator has been used in EJA for more exploration and early convergence, which makes EJA rate of identification results slightly better than DE approach but EJA consumes less execution time in comparison to PSO and DE algorithm. In quantitative comparison based on person identification operation, we have taken two samples from each subject for training and one sample for testing in identification operation using different enhancement approaches at preprocessing stage in the ear biometric system. In real world applications, we can increase the number of training samples per subject and employing dense local descriptors to represent ear images then the rate of identification ranks of proposed ear biometrics approach based on EJA image enhancement would be far better than the results have obtained in the experiments.

Table 3 Number of edge counts (EC), detail variance (DV) and background variance (BV) of the IITD database with nomenclature [Method name-Subject number-Image number]
Table 4 Number of edge counts (EC), detail variance (DV) and background variance (BV) of the UND-E database with nomenclature [Method name-Subject number-Image number]
Table 5 Number of edge counts (EC), detail variance (DV) and background variance (BV) of the UND-J2 database with nomenclature [Method name-Subject number-Image number]
Table 6 Number of edge counts (EC), detail variance (DV) and background variance (BV) of the AWE database with nomenclature [Method name-Subject number-Image number]

From the experimental results, it clearly observed, the proposed EJA approach is computationally faster and performance-wise efficient in case of illumination variation, low contrast, extraction of fine detail features, and presence of noise. It is also clear that the proposed approach is fully automatic does not require parameters tuning for improving identification performance.

Table 7 Comparison of average identification rate of the proposed ear biometrics using HE, CLAHE, PSO, and DE on IITD database
Table 8 Comparison of average identification rate of the proposed ear biometrics using HE, CLAHE, PSO, and DE on UND-E database
Table 9 Comparison of average identification rate of the proposed ear biometrics using HE, CLAHE, PSO, and DE on UND-J2 database
Table 10 Comparison of average identification rate of the proposed ear biometrics using HE, CLAHE, PSO, and DE on AWE database

5.5 Computational execution time

In this section, we present the total average execution time of the proposed ear image enhancement using EJA and being compared with PSO and DE based ear image enhancement approaches. According to Algorithm 1, the average execution time (\(T_{exe}\)) is sum of average time of three operations such as updating candidate solutions (\(T_{upd}\)), calculating fitness value for every candidate solutions (\(T_{fit}\)), and evaluating mutation (\(T_{mut}\)). The total time taken (\(T_{exec}\)) by the proposed approach as follows:

$$\begin{aligned} T_{exec}=T_{upd}+T_{fit}+T_{mut} \end{aligned}$$
(21)

In this work, we execute each approach 25 times to compute the average execution time. The experiments are implemented in MATLAB (R2017b) running on an Intel Core i5 PC with 2.6 GHz processor, 4 GB RAM, and Window 10 Operating System. For EJA, the average runtime for three operations (\(T_{upd}\)), (\(T_{fit}\)), and (\(T_{mut}\)) are computed to 0.017, 13.065, and 0.195 seconds respectively with the total execution time (\(T_{exce}\)) is 13.28 second. In EJA, adaptive mutation rate decides mutation operation. In the experiments, it has been observed that the average number of mutation operation is 26 out of 50 generations. However, mutation operator in DE is used for every generation and it takes 0.38 seconds which is approximately double as compared to the proposed EJA. Furthermore, two more operators (crossover and selection) of DE needs additional execution time as compared to the proposed EJA. Similarly, the average execution time of PSO based enhancement approach is very close to the proposed EJA due to slight difference in upgrading candidate solutions. However, in PSO, first velocity of the birds (candidate solutions) is evaluated then the position of individual birds (candidate solution) is updated with respect to the velocity of individual birds. In EJA, positions of the candidate solutions are directly updated with the help of the current best and worst solutions. Table 11 indicates the average execution time in seconds for PSO, DE, and the proposed EJA based ear image enhancement approaches. It clearly indicates that execution time of the proposed EJA is less in comparison to classical PSO and DE based enhancement approaches. In terms of space complexity, EJA requires less memory space than other meta-heuristic optimization algorithms because it does not require any algorithm-specific parameters for tuning and directly upgrading the candidate solutions. Hence, the overall results reveal that the proposed EJA is an efficient image enhancement approach for the practical applications.

Table 11 The average execution time (in seconds) achieved by the three meta-heuristic based ear image enhancement approaches

6 Conclusion

In this paper, we proposed an ear biometric system by incorporating a new ear enhancement approach at the preprocessing stage using enhanced Jaya (EJA) meta-heuristic algorithm. The EJA introduces a mutation operator with the aim of faster convergence and improvement of the quality of ear images to enhance the performance of the ear biometric system. The proposed approach is tested with four benchmark databases from the literature. Computational results of the proposed enhancement technique are compared with two well-accepted conventional enhancement techniques such as histogram equalization, CLAHE, and two well-known meta-heuristic optimization like PSO and DE-based enhancement techniques. The qualitative and quantitative results based on four databases clearly indicate the superiority of the proposed technique. It is also observed that proposed approach of image enhancement can be well-suited in the ear biometric system owing to the improved contrast of the original ear images that leads to extract more discriminative features in the feature extraction stage and subsequently increases the recognition performance.

There are some limitations that require future study. We can try other dense local feature descriptors instead of SURF with the expectation of improvement in the performance. Again, the proposed approach can also be improved by incorporating a multi-objective meta-heuristic optimization technique. Besides, another direction of research can be the use of deep learning based image enhancement approach in ear biometrics.