Introduction

Recently, an emerging biometric technology based on human finger veins has attracted increasing attention. Since the finger veins are located inside the living body, they provide a recognition system with high accuracy, immunity to forgery, and robustness to interference from the outer skin (e.g., skin disease, humidity, or dirt).

Finger vein systems have the advantages of low cost, easy contactless collection, and small devices. Finger vein patterns are viewable under infrared illumination because oxygenated and de-oxygenated hemoglobin in the blood absorbs infrared light at specific frequencies far more strongly than the surrounding flesh [1]. In practice, however, finger vein images suffer from the selectivity of the imaging mode and from changes in physical conditions and blood flow, which can make them unstable and low in contrast, or cause the veins to vary in apparent thickness and brightness, as shown in Fig. 1. This makes it difficult to achieve reliable and accurate finger vein recognition and places high generalization demands on the feature extraction and matching algorithms.

Vein extraction has been widely researched, usually based on the intensity characteristics of cross-sectional profiles, since a vein point is darker than its surroundings. Miura et al. proposed repeated line tracking [1] and the maximum curvature detection method [2, 3] based on the cross-sectional profiles. Hoover et al. [4, 5] proposed an approximated Gaussian-shaped model to simulate the profile curve. Although these methods can extract the veins from a low-contrast image, they are strongly affected by temporal changes in vein width. Intensity thresholding-based methods [6] are easily affected by the image brightness, due to the difficulty of tuning the threshold.

In the matching phase, the similarities between the registered image and the test image are calculated based on a distance measure [7], the chi-square distance [8], or machine learning methods. Over the past few decades, many cognition-inspired computational works have contributed to image processing and pattern recognition [9–12], including neural networks [13–15], genetic algorithms, support vector machines [16], and a new type of feed-forward classifier, the extreme learning machine (ELM) [17–20]. The ELM has recently attracted increasing attention as an emergent technology that overcomes some of the challenges faced by other classifiers. The ELM works well for generalized single-hidden-layer feed-forward networks (SLFNs); its essence is that the hidden layer of an SLFN need not be tuned. Compared with traditional classifiers, the ELM provides better generalization performance at a much faster learning speed with less human intervention [21–24].

In this paper, we propose a novel finger vein recognition system that is more robust to variations in external factors such as lighting and user positioning, and that improves stability, complexity, and recognition accuracy, rendering the system more practical in real-world applications and enabling it to handle datasets of increasing size. For feature extraction, a novel explicit guided directional filter is proposed to obtain high-quality finger vein contours from noisy, non-uniform, low-contrast images without introducing any segmentation process. This filter enhances an input image with the help of a supervisor image that instructs the filter to preserve the vein patterns and reduce the impact of the background, such as haze and illumination. After the guided directional filter, the veins are sufficiently magnified to directly extract the average absolute deviation (AAD) features, the strengths of the directional block information at eight different angles, even from images with thin, vague ridges and non-uniform backgrounds. Finally, a variant of the original ensemble ELM, called feature component-based ELMs (FC-ELMs), is introduced. FC-ELMs are designed to exploit the characteristics of the AAD features, to improve the recognition accuracy, speed, and generalization stability for large datasets, and to substantially reduce the number of hidden units.

Fig. 1

Finger vein images with diverse qualities: a low-quality images affected by illumination and low contrast and b high-quality images

Related Study: Ensemble Extreme Learning Machine

The ELM algorithm was first proposed by Huang et al. [21, 22, 25] based on the single-hidden-layer feed-forward network (SLFN). The main concept of the ELM is that the hidden node parameters are randomly generated without tuning. Consider a set of \(N\) arbitrary distinct samples \((x_i, t_i)\), where \(x_i=[x_{i1}, x_{i2}, \ldots ,x_{in}]^T\in {R^n}\) is an \(n\times {1}\) input vector and \(t_i=[t_{i1},t_{i2}, \ldots , t_{im}]^T\in {R^m}\) is an \(m\times {1}\) target vector. For the given training samples \(\left\{ (x_i,t_i)\right\} _{i=1}^N \in {{R^n}\times {R^m}}\), the output of an SLFN with \(L\) hidden nodes can be represented by

$$\begin{aligned} f_S(x_j)=\sum _{i=1}^L\beta _iK(a_i,b_i,x_j)=t_j,\quad j=1, \ldots , N \end{aligned}$$
(1)

where \(a_i\) and \(b_i\) are the hidden node parameters, which could be randomly generated. \(K(a_i,b_i,x_j)\) is an activation function and \(\beta _i\) is the weight connecting the \(i^{th}\) hidden node to the output nodes, which can be written compactly as:

$$\begin{aligned} H\beta =T \end{aligned}$$
(2)

\(H\) is called the hidden layer output matrix of the network. Given the randomly generated hidden node parameters \((a_i,b_i)\) and the training inputs \(x_i\), the hidden layer output matrix \(H\) can be computed simply.

$$\begin{aligned}&H(a_1,\ldots ,a_L,b_1,\ldots ,b_L,x_1,\ldots ,x_N)=\left[ \begin{array}{ccc} K(a_1,b_1,x_1) &{} \cdots &{} K(a_L,b_L,x_1) \\ \vdots &{} \cdots &{} \vdots \\ K(a_1,b_1,x_N) &{} \cdots &{} K(a_L,b_L,x_N) \end{array}\right] _{N\times {L}} \nonumber \\&\beta =(\beta _1,\ldots ,\beta _L)^T \quad \hbox {and} \quad T =(t_1,\ldots ,t_N)^T\qquad \qquad \end{aligned}$$
(3)

Therefore, training the SLFNs simply amounts to solving the linear system of output weights \(\beta\). With the computed \(H\) and given output \(T\), the output weight \(\beta\) is estimated as:

$$\begin{aligned} \beta =H^{\dagger }T \end{aligned}$$
(4)

where \(H^{\dagger }\) is the Moore–Penrose generalized inverse of the hidden layer output matrix H. There are several methods of calculating the Moore–Penrose generalized inverse of H, such as the SVD-based method. The single ELM network shown in Fig. 2a is widely used for real-time applications due to its simple steps and very high speed.
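As a concrete illustration, the closed-form training of Eqs. (1)–(4) can be sketched in a few lines of NumPy. This is a minimal sketch only: the sigmoid activation, the uniform random initialization, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def elm_train(X, T, L, rng=None):
    """Train a single-hidden-layer ELM (Eqs. 1-4).

    X : (N, n) input samples; T : (N, m) one-hot targets;
    L : number of hidden nodes. The hidden parameters (a_i, b_i)
    are drawn randomly and never tuned.
    """
    rng = np.random.default_rng(rng)
    N, n = X.shape
    A = rng.uniform(-1.0, 1.0, size=(n, L))   # hidden weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)        # hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))    # hidden layer output matrix, Eq. (3)
    beta = np.linalg.pinv(H) @ T              # Moore-Penrose solution, Eq. (4)
    return A, b, beta

def elm_predict(X, A, b, beta):
    # Recompute H for new inputs and apply the learned output weights
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```

Note that the only learned quantity is \(\beta\); this is what makes ELM training a single linear solve rather than an iterative optimization.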

Hansen and Salamon [26] proposed that the single network performance can be improved by using an ensemble of neural networks with a plurality consensus scheme. An integration of several ELMs connected in parallel was first proposed by Lan et al. [27]. It was confirmed that the method worked well for both stationary and non-stationary time series prediction [28, 29] and sales prediction [30] with better generalization performance.

The average of the ELM outputs was used as the final decision. Assume that the output of each ELM network is \(f_s^{(j)}(X),\,j=1, \ldots , M\). The final output of the ensemble ELMs (E-ELMs) shown in Fig. 2b can be represented as:

$$\begin{aligned} f_E(X)= \sum _{j=1}^{M}w_j\cdot f_s^{(j)}(X), \quad w_j=\frac{1}{M} \end{aligned}$$
(5)

where \(f_E(X)\) is the output of the whole system with input \(X\). We expect the ensemble ELM to work better than the single ELM, because the randomly generated parameters make each ELM network in the ensemble distinct. The variance of the ensemble network is lower than the average variance of all of the single networks. Let \(f(x)\) denote the true output of the predicted input and \(\widehat{f_i}(x)\) be the estimated value of network \(i\). Then, the error \(e_i(x)\) between the predicted \(\widehat{f_i}(x)\) and true output \(f(x)\) is expected to be at a minimum:

$$\begin{aligned} e_i(x)= \left| \widehat{f_i}(x) -f(x) \right| \end{aligned}$$
(6)

Then, the expected square error of a single network becomes

$$\begin{aligned} E[e_i(x)^2]=E\left[ \left\{ \widehat{f_i}(x) -f(x) \right\} ^2 \right] \end{aligned}$$
(7)

The average error made by M networks is given by

$$\begin{aligned} E_{\mathrm{avg}}=\frac{1}{M}\sum _{i=1}^{M}E[e_i(x)^2] \end{aligned}$$
(8)

Similarly, the expected error of the ensemble is given by

$$\begin{aligned} E_{\mathrm{ens}}=E\left[ \left\{ \frac{1}{M}\sum _{i=1}^{M}\widehat{f_i}(x)-f(x) \right\} ^2 \right] =E\left[ \left\{ \frac{1}{M}\sum _{i=1}^{M}e_i(x) \right\} ^2 \right] \end{aligned}$$
(9)

If the errors \(e_i(x)\) are zero-mean and mutually uncorrelated, then

$$\begin{aligned} E_{\mathrm{ens}}=\frac{1}{M}E_{\mathrm{avg}} \end{aligned}$$
(10)

Thus, the ensemble ELM network produces a lower expected error than the average of \(M\) single ELM (S-ELM) networks. Since each of the \(M\) S-ELM networks adapts differently to new data, the ensemble can overcome the problem of individual networks that do not adapt well to new data.
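The variance-reduction argument of Eqs. (6)–(10) can be checked numerically with a toy simulation; the Gaussian error model and all numbers below are illustrative assumptions, chosen only to make the 1/M factor visible.

```python
import numpy as np

# Toy illustration of Eqs. (6)-(10): averaging M estimators with
# independent zero-mean errors cuts the expected squared error by ~1/M.
rng = np.random.default_rng(42)
M, trials = 10, 20000
errors = rng.normal(0.0, 0.5, size=(trials, M))   # e_i(x), independent, zero mean
single_mse = (errors ** 2).mean()                 # E_avg, Eq. (8)
ensemble_mse = (errors.mean(axis=1) ** 2).mean()  # E_ens, Eq. (9)
assert ensemble_mse < single_mse / (M - 1)        # close to E_avg / M, Eq. (10)
```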

Fig. 2

Architecture of the extreme learning machine classifier: a single ELM classifier and b ensemble ELM classifier

Proposed Finger Vein Recognition System

As shown in Fig. 3, the proposed finger vein recognition system consists of two modules: a feature extraction module and an ELM-based matching module. The feature extraction module consists of three main steps: pre-processing, vein contour extraction, and AAD feature extraction. An ROI image with a pre-defined smaller size speeds up the overall feature extraction process. Vein contour extraction using a guided directional filter extracts high-quality vein contours in eight directions. Instead of pixel-based features, the AAD strengthens the direction information by extracting features on non-overlapping blocks. The matching module is implemented as an ensemble ELM network, which consists of eight small ELMs, each trained with the AAD sub-features of one pre-defined angle, and an output layer combining the outputs of the eight ELMs.

Fig. 3

Procedures of the proposed finger vein recognition system

Pre-processing

Due to the user's informal placement, distortion, and rotation, it is necessary to determine a reliable region of interest (ROI) of a pre-defined size but adjustable position and rotation in the finger vein image. As the finger target is brighter than the surrounding background pixels, a convex structure is formed at the profiles of the finger, which can be detected by the open top-hat filter defined in Eq. (11). \(F\circ B\) represents the morphological opening operation with the structuring element \(B\), a disk of size 5.

$$\begin{aligned} \hbox {OTH}\left( t \right) =(F-F\circ B)(t) \end{aligned}$$
(11)

Since the finger profile can be approximated as a line, the Hough transform is used to detect the positions and angles of the finger lines, since it is tolerant of gaps in the edge descriptions and is relatively unaffected by image noise [31]. The group of edge points \(\left\{ \left( x_1,y_1 \right) ,\left( x_2,y_2 \right) ,\ldots ,\left( x_k,y_k \right) \right\}\) is transformed into a sinusoidal curve in the plane \(\left( \theta ,\rho \right) ,\left( \rho \geqslant 0,0\leqslant \theta \leqslant \pi \right)\) defined by:

$$\begin{aligned} \rho =x_i\cos \theta +y_i\sin \theta \quad \left( i=1,2,\ldots ,k \right) \end{aligned}$$
(12)

The accumulator cells that lie along the curve are incremented, and the resulting peak in the accumulator array provides strong evidence that a corresponding straight line exists in the image. As shown in Fig. 4c, two peaks \(\left( \rho _1,\theta _1 \right) ,\,\left( \rho _2,\theta _2 \right)\) are detected corresponding to the two horizontal finger contour lines. When considering the finger curvature itself, a simple rotation correction will involve a rotation of \((\theta _1+\theta _2)/2\) degrees when two detected peaks satisfy the condition:

$$\begin{aligned} (\theta _1-\pi )+(\theta _2-\pi )\geqslant \frac{\pi }{18} \end{aligned}$$
(13)

Finally, the ROI is centered at the point \((C_x,C_y)=(width/2,\,(\rho _1+\rho _2)/2)\) and cropped with a size of [256, 96] for the rotation-corrected images, as shown in Fig. 4d.
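The Hough voting step of Eq. (12) can be sketched as below. This is a minimal sketch only: the one-pixel \(\rho\) binning and the function name are assumptions for illustration, not the paper's implementation, which would also apply peak selection for two lines and the rotation correction of Eq. (13).

```python
import numpy as np

def hough_peak(edge_points, img_shape, n_theta=180):
    """Locate the strongest line (rho, theta) via the voting of Eq. (12).

    edge_points : (k, 2) array of (x, y) coordinates from the
    top-hat edge image of Eq. (11).
    """
    h, w = img_shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    max_rho = int(np.hypot(h, w))
    acc = np.zeros((max_rho + 1, n_theta), dtype=int)  # accumulator cells
    for x, y in edge_points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)  # Eq. (12)
        valid = rhos >= 0
        acc[rhos[valid].astype(int), np.flatnonzero(valid)] += 1
    r, t = np.unravel_index(acc.argmax(), acc.shape)    # accumulator peak
    return r, thetas[t]
```

A horizontal finger contour at height \(y_0\) would yield a peak near \((\rho ,\theta )=(y_0,\pi /2)\), matching the two peaks illustrated in Fig. 4c.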

Fig. 4

The procedure of pre-processing: a the captured image, b the edge image by the top-hat filter, c accumulator array obtained by Hough transform, d finger contour line and ROI detection

Guided Directional Filter for Vein Contour Extraction

Since finger vein images are not always of high quality, due to varying tissues and bones or uneven illumination, an efficient enhancement method is necessary to compensate for the factors that make the veins appear different in thickness and brightness at each acquisition. As the finger vein network is composed of a series of ridges in particular orientations, a properly tuned directional filter, such as the even symmetric Gabor filter [32], has been proved to provide excellent performance for ridge extraction. Here, a guided directional filter is constructed from an even symmetric Gabor filter and the guided filter [33]. Using a supervisor image, which can be the input image itself or another image, the guided filter is instructed to preserve the vein pattern and reduce the impact of the background.

The key assumption of the guided filter is the existence of a local linear model between the supervisor image \(S\) and the filtered image. In each window \(w_k\) centered at pixel \(k\), the guided output \(G_u\) is a linear transform of \(S\) with coefficients \((a_k,b_k)\), which can be represented as:

$$\begin{aligned} G_{ui}=a_kS_i+b_k,\quad i\in w_k \end{aligned}$$
(14)

This local linear model ensures that \(G_u\) has an edge only if \(S\) has an edge, because \(\triangledown G_u=a\triangledown S\). This has been proven to be useful in image matting, image super-resolution, and haze removal [33]. The relationship between \(S,\,I\) and \(G_u\) can be described in the form of image filtering as follows:

$$\begin{aligned} G_{ui}(I,S,w,\varepsilon )=\underset{j}{\sum }W_{ij}(S,w,\varepsilon )I_j \end{aligned}$$
(15)

The kernel weight can be explicitly expressed by:

$$\begin{aligned} W_{ij}(S,w,\varepsilon )=\frac{1}{\left| w \right| ^2}\underset{k:(i,j)\in w_k}{\sum }(1+\frac{(S_i-\mu _k)(S_j-\mu _k)}{\sigma _k^2+\varepsilon }) \end{aligned}$$
(16)

where \(\mu _k\) and \(\sigma _k^2\) are the mean and variance of window \(k\), respectively. It can be proven that the kernel weights satisfy \(\underset{j}{\sum }W_{ij}\left( S \right) =1\) without any extra normalization. Then, the guided directional filter can be represented in the following general form:

$$\begin{aligned} G(I,f_0,\theta _k,\delta )=G_a(I,f_0,\theta _k,\delta )*G_u(I,S,w,\varepsilon ) \end{aligned}$$
(17)

where,

$$\begin{aligned}&G_a(I,f_0,\theta _k,\delta )=\exp \left\{ -\frac{1}{2} \left( \frac{I_{x_{\theta _k}}^2+I_{y_{\theta _k}}^2}{\delta ^2} \right) \right\} \cos (2\pi f_0 I_{x_{\theta _k}})\nonumber \\&\quad \quad \hbox {where}, \quad \left[ \begin{array}{c} I_{x_{\theta _k}}\\ I_{y_{\theta _k}} \end{array}\right] =\left[ \begin{array}{cc} \cos \theta _k &{} \sin \theta _k \\ -\sin \theta _k &{} \cos \theta _k \end{array}\right] \left[ \begin{array}{c} I_x \\ I_y \end{array}\right] \qquad \qquad \end{aligned}$$
(18)

where \(*\) denotes a convolution in two dimensions, \(\theta _k=k\pi /8\) with \(k=1,2,\ldots ,8\) denotes the orientation, and \(f_0\) is the center frequency of the Gabor filter. The bank of guided directional filters, as shown in Fig. 7, generates eight filtered components. Since the linear edge-preserving coefficient \(a_k\) decreases with increasing \(\varepsilon\) in Eq. (16), \(\varepsilon\) controls the degree of edge preservation. As shown in Fig. 5, the edge preservation performance is enhanced with increasing \(\varepsilon\) and window size \(w\).
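For illustration, the even symmetric Gabor component \(G_a\) of Eq. (18) can be sketched as a discrete kernel, reading \(I_{x_{\theta _k}},I_{y_{\theta _k}}\) as rotated spatial coordinates; the kernel size, parameter values, and function name below are assumptions, not the paper's settings.

```python
import numpy as np

def gabor_kernel(size, f0, theta_k, delta):
    """Even-symmetric Gabor kernel G_a of Eq. (18) for one of the
    eight orientations theta_k = k*pi/8. In Eq. (17) this response
    is convolved with the guided-filter output G_u.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta_k) + y * np.sin(theta_k)    # rotated coordinates
    y_t = -x * np.sin(theta_k) + y * np.cos(theta_k)
    # Gaussian envelope modulated by an even (cosine) carrier along x_t
    return (np.exp(-0.5 * (x_t**2 + y_t**2) / delta**2)
            * np.cos(2.0 * np.pi * f0 * x_t))
```

Because the carrier is a cosine, the kernel is even symmetric, \(G_a(-x,-y)=G_a(x,y)\), which is what makes it respond to dark ridge-like vein cross-sections regardless of side.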

Meanwhile, the haze removal and vein enhancement performance vary depending on the supervisor image. A proper supervisor image will benefit the vein extraction process, as demonstrated in Fig. 6e–g, where the supervisor image is the input image itself in Fig. 6e, f, while in Fig. 6g it is the image already enhanced by the guided filter. Although Fig. 6g shows a darker and clearer vein contour than Fig. 6e, f because of the iteratively enhanced supervisor image, it may be more effective for other matching methods, such as local binary pattern-based methods. When a directional filter is applied for further vein extraction, Fig. 6g yields much more noise than the image in Fig. 6f, because the additional enhancement amplifies the noise as well. To optimize the settings of the guided filter, the performance of vein contour extraction is quantitatively evaluated by the matching performance in the “Vein Contour Extraction Performance” section. Compared with typical enhancement methods, the guided Gabor filter achieved superior accuracy with \(\varepsilon =1^2,\,w=15\), and \(S=I\).

Fig. 5

Enhancement performance of the guided filter: a input image I, b supervisor image S, c–f enhanced images under various \(w,\,\varepsilon\)

Fig. 6

Performance comparisons of vein contour extraction based on enhanced image obtained using: a original image, b global histogram, c local histogram with a block size of [32, 16], d wavelet normalization, e guided filter when \(S=I,\,\varepsilon =0.05^2\), and \(w=15\), f guided filter when \(S=I,\,\varepsilon =1^2\), and \(w=15\), g guided filter when \(S\) is the enhanced image of Fig. 5f, \(\varepsilon =1^2\) and \(w=15\)

Block-Based Average Absolute Deviation Feature Extraction

The guided directional filter outputs eight vein contour images, as shown in Fig. 7a. The finger vein images can be discriminated by the variation of the finger vein contours in the eight directions. Instead of pixel-based features, the directional filtered image is segmented into non-overlapping blocks of size \([T_1\times {T_2}]\). For instance, \((256 \times {96})/({T_1}\times {T_2})\) features can be extracted from a normalized image with a size of \(256\times {96}\) based on the statistical information. The selection of the splitting block size is analyzed in the experimental section. Assuming that \(F_{mn}\) represents the block matrix of a filtered image (the component of \(F\) in column \(m\) and row \(n\), where \(m=1, 2,\ldots , 256/{T_1}\) and \(n=1, 2,\ldots ,96/{T_2}\)), the statistics can be computed per block. The AAD [34] \(\delta _{mn}^k\) of the magnitudes of \(G(I,f_0,\theta _k,\delta )\) corresponding to \(F_{mn}\) is calculated as:

$$\begin{aligned} \left\{ \begin{array}{l} \delta _{mn}^k=\frac{1}{N}\underset{F_{mn}}{\sum }\left| \left| G(I,f_0,\theta _k,\delta ,w,\varepsilon ) \right| -\mu _{mn}^k \right| \\ \mu _{mn}^k=\frac{1}{N}\underset{F_{mn}}{\sum }\left| G(I,f_0,\theta _k,\delta ,w,\varepsilon ) \right| \end{array}\right. \end{aligned}$$
(19)

where \(N\) is the number of pixels in \(F_{mn}\), and \(\mu _{mn}^k\) is the mean value of the magnitudes of \(G(I,f_0,\theta _k,\delta )\) in \(F_{mn}\). The feature vector for matching can be represented by: \(X=[C_1,C_2,\ldots ,C_8]\), where,

$$\begin{aligned} C_k=\left[ \begin{array}{ccc} \delta _{11}^k &{} \dots &{} \delta _{1n}^k \\ \vdots &{} \delta _{ij}^k &{} \vdots \\ \delta _{m1}^k &{} \dots &{} \delta _{mn}^k \end{array}\right] _{m\times n} \quad k=1,2,\dots ,8 \end{aligned}$$
(20)

Eight-directional AAD features \(X\) corresponding to the eight contour images are obtained in this way, as shown in Fig. 7b, when the non-overlapping block size is \(16 \times {16}\). For each normalized contour image with a size of [256, 96], \(96\,(=16\times 6)\) AAD features can be extracted to match a query image with a template.
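The block-based AAD computation of Eqs. (19)–(20) for one directional output can be sketched with a NumPy reshape; the block dimensions and the function name are illustrative assumptions.

```python
import numpy as np

def aad_features(filtered, t1=16, t2=16):
    """Block-wise average absolute deviation (Eq. 19) of one
    directional output G(I, f0, theta_k, delta). Returns the
    component matrix C_k of Eq. (20)."""
    mag = np.abs(filtered)                       # |G(...)|
    h, w = mag.shape
    # Split the (h, w) image into non-overlapping (t2, t1) blocks F_mn
    blocks = mag.reshape(h // t2, t2, w // t1, t1)
    mu = blocks.mean(axis=(1, 3), keepdims=True)  # mu_mn^k, block means
    return np.abs(blocks - mu).mean(axis=(1, 3))  # delta_mn^k
```

For a \(96\times 256\) contour image with \(16\times 16\) blocks this yields a \(6\times 16\) matrix, i.e., the 96 features per direction stated above.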

Fig. 7

Feature extraction results: a vein contour features on the eight directions, b block (\([16\times {16}]\))-based average absolute deviation features

Proposed Feature Component-Based Extreme Learning Machines

In face recognition systems, facial component features derived from the eyes, nose, and mouth can be extracted separately and then batched as one feature set for recognition [35]. For finger vein recognition, the global feature is highly sensitive to image variations caused by user operation or environmental conditions, such as finger rotation, translation, or illumination. In contrast to general recognition systems based on structural component features, which are extracted from the local positions or properties of objects, the proposed recognition system, called feature component-based ELMs (FC-ELMs), selects the feature components from the global features directionally, since the veins are composed of a series of directional information.

In the finger vein recognition system, eight S-ELM networks are constructed in parallel on the eight extracted component features, as shown in Fig. 8. The parallel recognition systems based on the selected independent feature components are linearly combined for the final recognition decision. The eight directional components, called \(C_1,\,C_2\), ..., \(C_8\), correspond to the directional filter at \(0^{\circ },\,22.5^{\circ }, \ldots , 157.5^{\circ }\), respectively. For each of the eight directional components, 96 AAD features are extracted with the selected block size of \(16\times {16}\). One of the eight components \(C_k\) from the total feature vector set is assigned as the input of each S-ELM network. Thus, the feature size of each S-ELM network decreases to 1/8 of that of the feature vectors in the S-ELM and E-ELM models. The output of the FC-ELM model is defined as follows:

$$\begin{aligned} f_{c}(X)=\sum _{k=1}^{8}w_k\cdot f(C_k),\quad k=1,2,\ldots ,8 \end{aligned}$$
(21)

where \(k\) denotes the \(k\)th of the eight directional components. Although each component network runs independently, the final output \(f_{c}(X)\) depends on the proper assignment of the components \(C_k\), the performance of each \(f(C_k)\), and the cooperation among them through the adaptive weights \(w_k\). To ensure the matching performance of the recognition system, the principle employed for feature component extraction is that each component should have sufficient uniqueness for recognition and robustness to user operation or illumination.

Fig. 8

Overview of the finger vein recognition system based on the proposed FC-ELM network

With the component correlation analysis, a proper weight assignment to the component features will improve their cooperation. An adaptive weighting method is proposed based on the analysis of the independence and correlation of the eight components, considering the following two factors:

  1. Not only the AAD features but also the component distribution of each image contributes to the matching.

  2. Components with high confidence are assigned larger weights to decrease the matching error.

Assuming that both a fingerprint image and a finger vein image are convolved with the proposed guided directional filter, the fingerprint energy will spread almost equally in each direction, since the fingerprint ridges are connected in roughly circular form. The finger vein energy of the eight components, however, behaves more like a Gaussian distribution than an approximately uniform one. The main blood vessels, such as the main branches, flow from one side to the other in the vertical direction and form an energy peak. The minor vessels, which connect to the main vessels randomly, exhibit more energy degeneration than the main vessels, and less of their energy is focused. The energy distribution in the eight directions for the finger vein is shown in Fig. 9. The energy of each component, \(E_k\), is defined in Eq. (22).

$$\begin{aligned} E_{k}=\underset{x,y}{\sum }(255-G(x,y,k))^2 \end{aligned}$$
(22)

where \(G(x,y,k)\) is the intensity value of pixel \((x,y)\) in the \(k\)th filtered image. A dark vein contour with intensity value 0 has the highest energy. The weights \(W_{k}\) for the FC-ELMs are assigned inversely, so that a component with larger energy receives a smaller weight.

$$\begin{aligned} W_{k}=\frac{1}{n-1} \cdot \left( {1-\frac{E_k}{{\sum _{k=1,2,\ldots ,n} E_k }}} \right) \end{aligned}$$
(23)
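The adaptive weighting of Eqs. (22)–(23) can be sketched as below; the code assumes 8-bit images (intensities in [0, 255]) and is an illustration, not the authors' implementation.

```python
import numpy as np

def component_weights(filtered_imgs):
    """Adaptive FC-ELM weights, Eqs. (22)-(23): components with
    larger energy (darker, more dominant veins) get smaller weights.

    filtered_imgs : list of n directional images G(x, y, k),
    assumed to hold 8-bit intensities (0 = darkest vein contour).
    """
    E = np.array([((255.0 - g) ** 2).sum() for g in filtered_imgs])  # Eq. (22)
    n = len(E)
    return (1.0 - E / E.sum()) / (n - 1)                             # Eq. (23)
```

By construction the weights sum to one, since \(\sum _k (1-E_k/\sum E)/(n-1)=(n-1)/(n-1)=1\), so no extra normalization is needed in Eq. (21).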
Fig. 9

Energy distribution of the dataset on the eight directional components

When the input features are the same, Eq. (10) showed that the ensemble ELM networks could decrease the squared error. Although the input features vary from component to component, the single-component test demonstrated that the matching accuracy of each component, more than 94 %, is sufficient. In the single-component test, \(\hbox {Component}_1\) to \(\hbox {Component}_8\) are evaluated with the basic ELM network on the dataset [3, 3] over five trials. The average training and testing results under hidden neuron tuning are shown in Fig. 10. We found that the components at \(0^{\circ },\,22.5^{\circ },\,135^{\circ }\), and \(157.5^{\circ }\) performed better than those at \(45^{\circ },\,67.5^{\circ },\,90^{\circ }\), and \(112.5^{\circ }\). In other words, the major veins contributed less to the matching than the minor veins, since most vein images contain the major veins, which decreases their uniqueness. This is also consistent with Shannon entropy, which can be interpreted as the degree of disorder or uncertainty.

To analyze the correlation of the directional components, leave-one-out tests were also performed, as shown in Fig. 11. The matching results for \(\hbox {Component}_i\) denote the matching based on all of the components except \(\hbox {Component}_i\). The results show that all of the components contribute to the matching, and the matching performance is degraded when any of the components is left out. \(\hbox {Component}_1\) contributed the most, since the matching performance is seriously degraded when it is removed. To evaluate the feature stability, the variances of six images per individual were computed for both the eight single-component features and the global features, as shown in Fig. 12. All of the component features have smaller variances than the global features, which means that the stability and robustness of the feature space are increased. The stabilities of \(\hbox {Component}_1,\,\hbox {Component}_2,\,\hbox {Component}_5\), and \(\hbox {Component}_8\) are improved by 20 % compared with the global features.

The selected component features thus provide sufficient accuracy for matching and higher stability than the global features. In addition, similar to the ensemble ELM networks, with randomly generated nodes for the eight component features, the FC-ELM network can improve its stability to a level comparable to that of the E-ELM (\(M=10\)) network, as shown in the “Performance of S-ELM, TER, E-ELM, FC-ELM, and EC-ELM” section.

Fig. 10

Matching performance of the eight single components

Fig. 11

Matching performance of the leave-one-component-out test

Fig. 12

Feature variance comparisons for the component features and global features

Proposed Ensemble Components-Based ELM

Compared with the ensemble ELM model, the proposed feature component-based ELM model is much smaller, since the size of the input features, the number of hidden neurons, and the number of S-ELM networks are all decreased substantially. To combine the advantages of the ensemble ELM model and the component-based ELM model, we propose the ensemble component-based ELM network (EC-ELM), shown in Fig. 13, in which the average of the FC-ELM outputs is used as the final decision. Assuming that the output of each FC-ELM network is \(f_c^{(j)}(X),\,j=1, \ldots , M\), the final output of the EC-ELMs, \(f_{\mathrm{EC}}(X)\), can be represented as:

$$\begin{aligned} f_{\mathrm{EC}}(X)= \sum _{j=1}^{M}w_j\cdot f_c^{(j)}(X) \quad \mathrm {where},\quad w_j=\frac{1}{M} \end{aligned}$$
(24)

where \(f_{\mathrm{EC}}(X)\) is the output of the whole system with input \(X\). The scale of the proposed EC-ELM model is smaller than that of the ensemble ELM model in Fig. 2b: although each FC-ELM module is larger than each S-ELM module, the number of modules, \(M\), participating in the ensemble operation is much smaller than in the ensemble ELM.

Fig. 13

Ensemble component-based extreme learning machines network

Experimental Results

Dataset

The dataset mainly used in this study is a public finger vein dataset covering 106 individuals. The Group of Machine Learning and Applications at Shandong University (SDUMLA) set up the homologous multi-modal traits dataset [36], which consists of face images, finger vein images, gait videos, iris images, and fingerprint images. Each individual was asked to provide images of the index, middle, and ring fingers of both hands, and the collection for each of the six fingers was repeated six times, yielding thirty-six finger vein images per individual. The finger vein dataset is thus composed of 3,816 images with a size of \(320\times {240}\) pixels.

To evaluate the effect of the proposed FC-ELMs and EC-ELMs, a new finger vein dataset of 1,000 images constructed by the Multi-Media Lab of Chonbuk National University (MMCBNU) is added for the evaluation in the “Performance of S-ELM, TER, E-ELM, FC-ELM, and EC-ELM” section. This dataset is composed of finger images from 100 individuals, with each finger imaged ten times.

Experimental Protocol

Due to the random initialization of the hidden neurons of the ELM classifier, and to obtain statistically reliable results, we use ten runs of tenfold stratified cross-validation for all final accuracy results. Two types of training and testing sets, [3, 3] and [5, 1], are generated randomly; the set [5, 1] means that five images per individual are employed for training and one image per individual for testing. The programs are run on a 3.00 GHz Intel Core 2 Quad processor using Matlab 7.0.1.

To optimize the proposed recognition system, the following measures are used to evaluate the performance:

  1. Vein contour extraction: As accurate contour images can improve the matching performance, the quality of the contour image is quantitatively evaluated using the matching performance.

  2. AAD feature extraction: To optimize the AAD feature, different block sizes are evaluated in terms of the matching accuracy and time consumption using the S-ELM network.

  3. Hidden neurons and the number of ELMs: For the four types of ELM classifier, S-ELM, E-ELMs (\(M=5, 10, 15, 20\)), FC-ELMs, and EC-ELMs (\(M=5, 10, 20\)), the matching performance, stability, and computational complexity of the finger vein recognition systems are evaluated with hidden neuron tuning.

  4. Genuine matching and imposter matching: The false acceptance rate (FAR) and false rejection rate (FRR) defined in Eqs. (25) and (26), respectively, are evaluated with \(634 \times (3804-6)=2{,}407{,}932\) impostor matches versus \(634 \times 5=3{,}170\) genuine matches.

  5. Comparison with existing finger vein recognition methods: The proposed recognition system is compared with the minutiae feature-based method [37], local binary pattern-based methods [38, 39], and the SVM-based method [16].

    $$\begin{aligned} \hbox {FAR}&= \frac{\hbox {Number\ of\ accepted\ imposter\ claims}}{\hbox {Total\ number\ of\ imposter\ accesses}}\end{aligned}$$
    (25)
    $$\begin{aligned} \hbox {FRR}&= \frac{\hbox {Number\ of\ rejected\ genuine\ claims}}{\hbox {Total\ number\ of\ genuine\ accesses}} \end{aligned}$$
    (26)
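Equations (25) and (26) amount to threshold counts over the match scores. A minimal sketch, assuming the convention that scores at or above the decision threshold are accepted:

```python
def far_frr(imposter_scores, genuine_scores, threshold):
    """FAR and FRR per Eqs. (25)-(26): the fraction of imposter claims
    accepted and of genuine claims rejected at a given decision
    threshold (scores >= threshold are accepted; assumed convention)."""
    far = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

For example, `far_frr([0.1, 0.9], [0.8, 0.2], 0.5)` returns `(0.5, 0.5)`: one of two imposter claims is accepted and one of two genuine claims is rejected.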

Evaluation of Performance

Vein Contour Extraction Performance

The vein contour extraction performance is evaluated in terms of the matching accuracy using the S-ELM network. For the typical image enhancement methods mentioned in Fig. 6, a comparison of the matching performance is shown in Table 1. The guided filter achieved the highest accuracy.

Table 1 Matching performance comparison with the typical enhancement methods

AAD Feature Extraction

The AAD feature estimates the similarity between each pair of split blocks. The difference decreases as the local block size increases, so the characteristics of individuals become less distinctive. Conversely, a smaller block size describes the vein contour features in more detail, but incurs a larger computational burden. To choose a proper block size, block sizes of \(16\times {16}\) and \(32\times {32}\) are evaluated in terms of matching accuracy and time consumption using the S-ELM network. According to Table 2, the block size of \(16\times {16}\) performs better than the larger block size of \(32\times {32}\).
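Interpreting AAD as a block-wise average absolute deviation (a plausible reading; the paper's exact definition may differ), the block-size trade-off can be sketched as:

```python
import numpy as np

def aad_features(img, block=16):
    """Split the image into block x block tiles and take, for each tile,
    the mean absolute deviation from the tile's mean intensity. Smaller
    blocks yield more (finer) features at a higher computational cost."""
    h, w = img.shape
    feats = []
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            tile = img[r:r + block, c:c + block].astype(float)
            feats.append(float(np.mean(np.abs(tile - tile.mean()))))
    return np.array(feats)
```

On a \(64\times 64\) contour image this yields 16 features at block size 16 but only 4 at block size 32, illustrating why the smaller block retains more individual detail.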

Table 2 Matching performance comparisons with block sizes of \(16\times {16}\) and \(32\times {32}\)

Performance of S-ELM, TER, E-ELM, FC-ELM, and EC-ELM

The matching performance, stability, and computational complexity of the finger vein recognition systems were evaluated for the four types of ELM classifier: S-ELM, E-ELMs (\(M=5, 10, 15, 20\)), FC-ELMs, and EC-ELMs (\(M=5, 10, 20\)). In Fig. 14, the matching performance under hidden neuron tuning is compared for the S-ELM and the E-ELM models with \(M=5,10,15\), and 20. For \(M=20\), the E-ELM networks achieve the best matching performance, with a score of 97.19 %, which is much higher than that of the S-ELM model. Regarding the training and testing times, it is worth mentioning that although the E-ELM can be constructed in parallel, the time consumption is measured in series, since the simulation is performed on a single computer. As shown in Fig. 15, the matching accuracy of the proposed FC-ELMs is 97.69 %, which is higher than that of the E-ELM networks, because the weights are adaptively assigned, according to the distribution of the vein directional components, to strengthen weak learners. The EC-ELM model with a small number (\(M=5\)) of FC-ELM networks provides slightly improved matching performance, reaching 97.75 %. Based on the results of the tuned hidden neuron tests in Figs. 14 and 15, the structural complexities of the optimal versions of the four types of ELM are compared in Table 3. While the S-ELM has the fewest nodes and the E-ELMs (\(M=20\)) the most, the FC-ELMs need less than 20 % of the nodes required by the E-ELMs (\(M=20\)). This shows that the FC-ELMs (and EC-ELMs) are superior to the E-ELMs in terms of structural complexity, since their basic networks are considerably smaller.
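For reference, the single-ELM building block shared by the S-ELM, E-ELM, FC-ELM, and EC-ELM variants can be sketched as a random, untuned hidden layer followed by a least-squares output layer. This is a minimal illustration of the standard ELM scheme, not the paper's exact implementation:

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM: random input weights and biases,
    sigmoid hidden activations, output weights solved in closed form by
    the Moore-Penrose pseudo-inverse (no iterative tuning)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        n_in = X.shape[1]
        self.W = self.rng.normal(size=(n_in, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T  # least-squares output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because only `beta` is solved (in one linear step), training is fast; ensembles such as E-ELM simply train \(M\) such networks with different random seeds and combine their outputs.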

Fig. 14 Matching performance of the S-ELM and E-ELM when the number of ELM networks is \(M=5,10,15\), and 20: a training time, b testing time, c training accuracy, d testing accuracy

Fig. 15 Matching performance of the FC-ELMs and EC-ELMs when the number of FC-ELM networks is \(M=5,10\), and 20: a training time, b testing time, c training accuracy, d testing accuracy

Table 3 Comparison of the size of the input features and hidden neuron setting of the S-ELM, E-ELMs, FC-ELMs, and EC-ELMs models
Table 4 Matching performance for individual trials with respect to the dataset SDUMLA
Table 5 Matching performance for individual trials with respect to the dataset MMCBNU

To evaluate stability, ten trials of the tenfold stratified cross-validation test are performed for each network, and the standard deviations of the matching accuracy are shown in Tables 4 and 5 for the two testing datasets. Compared with the ELM method and the TER method [40], which minimizes the total error rate by adjusting class-specific normalization, the EC-ELMs show better stability. The E-ELM and FC-ELM networks improve stability by more than 50 % compared with the S-ELM. The FC-ELM network achieves the same stability as the E-ELMs (\(M=10\)), but with far fewer hidden neurons. From the evaluation of matching performance, stability, and complexity for the S-ELM, E-ELMs (\(M=5, 10, 15, 20\)), FC-ELMs, and EC-ELMs (\(M=5, 10, 20\)), the proposed FC-ELMs emerge as the optimal classifier among the four. Moreover, the statistical significance of the improvements was assessed with a paired t test between each pair of compared means at a significance level of 0.05, as shown in Tables 4 and 5.
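The paired t test for Tables 4 and 5 compares the per-trial accuracies of two classifiers over the same cross-validation runs. The statistic can be computed as below (a sketch; the value is then compared against the t distribution with \(n-1\) degrees of freedom at the 0.05 level):

```python
import math

def paired_t_statistic(a, b):
    """Paired t statistic for two equal-length lists of per-trial
    accuracies (e.g. the ten cross-validation runs of two classifiers):
    t = mean(d) / sqrt(var(d) / n), with d the per-trial differences
    and var the sample variance (n - 1 denominator)."""
    n = len(a)
    d = [x - y for x, y in zip(a, b)]
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)
```

A large \(|t|\) (beyond the two-sided 0.05 critical value for \(n-1\) degrees of freedom) indicates that the accuracy difference between the two classifiers is statistically significant.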

Genuine Matching and Imposter Matching

The match score distributions of the two kinds of FC-ELMs are shown in Fig. 16. The \(X\)-axis represents the matching score, which is the final output decision value \(f_C(X)\) obtained from the FC-ELM network in Eq. (21), and the \(Y\)-axis its frequency. Genuine matches can be separated from imposter matches by a clear threshold for both the adaptive and average weighted networks. The adaptive weighted FC-ELMs provide a larger discrimination distance between the genuine and imposter matches than the average weighted FC-ELMs, and would therefore adapt better to a growing dataset.

The receiver operating characteristic (ROC) curve, a plot of the genuine acceptance rate (GAR = 1 \(-\) FRR) versus the FAR, is shown in Fig. 17. The adaptive weighted FC-ELMs are slightly superior to the average weighted FC-ELMs, achieving an FAR of 0.16 % and an FRR of 0.58 %.
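An ROC curve like Fig. 17 is traced by sweeping the decision threshold over the score range and plotting GAR \(=1-\) FRR against FAR at each setting. A minimal sketch, again assuming scores at or above the threshold are accepted:

```python
def roc_points(genuine, imposter, thresholds):
    """Trace ROC points: at each threshold, FAR is the fraction of
    imposter scores accepted and GAR = 1 - FRR is the fraction of
    genuine scores accepted (scores >= threshold are accepted)."""
    pts = []
    for t in thresholds:
        far = sum(s >= t for s in imposter) / len(imposter)
        gar = sum(s >= t for s in genuine) / len(genuine)
        pts.append((far, gar))
    return pts
```

Sweeping from the highest to the lowest observed score moves the operating point from the bottom-left toward the top-right of the ROC plane; the equal error rate is read off where FAR equals FRR.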

Fig. 16 Genuine and imposter match scores based on the adaptive and average weighted FC-ELM networks

Fig. 17 ROC curves of the adaptive and average weighted FC-ELM networks

Table 6 Performance comparison of the proposed and existing methods

Comparison with the Existing Methods

A comparison of the correct classification rate (CCR), training time, and testing time obtained from the minutiae feature-based methods [37], local binary pattern-based methods [38, 39], and the proposed directional feature-based methods is shown in Table 6. Based on the proposed features, several classifiers are compared including the modified Hausdorff distance [37], SVM [16], S-ELM, E-ELMs, FC-ELMs, and EC-ELMs. The results show that the proposed FC-ELMs achieve higher CCRs of 97.69 and 99.53 % for the [3, 3] and [5, 1] training and testing sets, and the EC-ELMs with \(M=5\) afford the highest CCRs of 97.75 and 99.60 % for the [3, 3] and [5, 1] training and testing sets, respectively. Although the testing time of EC-ELMs is higher than that of the S-ELM and FC-ELMs models, it is several hundred times less than that of the E-ELMs and the other matching methods.

Conclusions

This paper presented an efficient finger vein recognition system based on novel feature component-based ELM models. With adaptive weights assigned to the FC-ELMs, a high matching performance of CCR = 99.21 % is achieved with FAR = 0.16 % and FRR = 0.58 %, which is much better than those of the S-ELM, E-ELMs, SVM, and the other distance-based methods. Moreover, owing to the smaller input feature vectors, fewer hidden neurons, and fewer ELM networks, the FC-ELM model provides superior performance in terms of both recognition rate and matching speed, reaching 0.87 ms per image, which is satisfactory for real-time recognition. The FC-ELM and EC-ELM networks thus combine the stability of E-ELM networks with higher CCRs and lower computational complexity.