
1 Introduction

Identification systems have many applications, ranging from governmental projects such as border control and criminal identification to civil purposes such as e-commerce, network access and transport. Among the existing identification systems, biometrics has become very popular. It is the most reliable form of identification since it depends on the physiological characteristics of a person, which are unique to every individual and cannot be stolen or swapped. Different physiological features can be used as biometrics, such as the iris, retina and face.

Face authentication is one of the most popular biometric identification systems. It is more direct, user-friendly and convenient than other methods. Many existing face authentication systems operate in the visible spectrum, but such systems may fail when illumination varies. Furthermore, they can be easily spoofed using disguises or photographs, since they cannot distinguish a live face from a non-live one. A secure system therefore needs liveness detection to guard against such spoofing.

These challenges can be met with thermal imaging. Thermal imaging uses thermal cameras to detect the infrared radiation emitted by objects. Since it depends on emitted heat, illumination variation is not an issue. Only a live face emits thermal radiation, so liveness detection can be carried out easily. Moreover, the radiation emitted by the human face lies in a different range from that of most other objects, which aids in detecting disguises.

Previous work indicates that many face authentication algorithms, such as Principal Component Analysis (PCA) [1], Local Binary Pattern (LBP) and Histogram of Oriented Gradients (HOG), which give high accuracy in the visible spectrum, also work well on thermal images. However, they have all been verified on datasets with non-realistic, plain backgrounds containing no other objects. That may not always be the case in practice: there may be other objects with a similar spectral range in the background. The work presented here analyses the performance of various face authentication algorithms on a newly created database of images with realistic backgrounds.

The paper is organized as follows: Sect. 2 provides an overview of related work. Section 3 explains the method implemented to achieve the objective. Section 4 presents the results obtained. Section 5 gives the conclusions reached and the future scope of the work.

2 Literature Survey

Researchers have long realized the potential of thermal imaging in the area of human identification. One of the earliest works on the topic was conducted by Chen et al., who carried out significant studies on face authentication in the visible and infrared spectra using principal component analysis (PCA) and presented the results in [2]. The algorithm was implemented on their dataset, C-X1, which consists of 10,916 visible and IR images from 488 subjects.

Guzman et al. [3] proposed an algorithm that performs face authentication based on vasculature information, which differs for every person. The algorithm is implemented using localization of active contours and morphological operations, and the extracted features are matched using similarity measurements. It was evaluated on the C-X1 dataset and on a newly created dataset of 13 subjects. Using different similarity measures, it gave an average recognition rate of 89% on the Guzman dataset and 68.5% on the C-X1 dataset. The poor performance on C-X1 was attributed to the lack of non-uniformity correction in that dataset. A major drawback is that the algorithm was validated only on a small dataset.

Carrapico et al. [4] considered several image descriptors and analyzed their performance in face authentication using thermal images. The descriptors considered include Gabor filter banks, Local Binary Patterns (LBP), the Colour and Edge Directivity Descriptor (CEDD) [5] and the Fuzzy Colour and Texture Histogram (FCTH) [6]. The four algorithms were implemented, together with a k-NN classifier, on a dataset provided by the University of Science and Technology of China. Of the four, the LBP feature descriptor gave the highest accuracy, 91%.

Xie and Wang [7] proposed a modified LBP-based feature extraction algorithm for face authentication using thermal imaging. The major drawback of LBP is its large dimensionality; the proposed approach tackles this by selecting personalized features for each subject. The algorithm was implemented on their own dataset of 400 thermal images of 40 subjects, in combination with a KNN classifier, and its performance was compared with other LBP-based techniques. Their algorithm gave a recognition rate of 98.2%.

Espinosa-Duró et al. [8] created a database of thermal face images consisting of 41 subjects (32 males and 9 females). Visible and thermal images of all subjects were captured under three different illumination conditions: IR (infrared illumination), AR (artificial illumination) and NA (natural illumination). A feature extraction method based on the discrete cosine transform (DCT) was implemented.

In all the above-mentioned works, the feature extraction algorithms were verified only on datasets containing images with plain backgrounds. These algorithms need to be verified on images with realistic backgrounds as well.

3 Methodology

Figure 1 depicts the basic flow diagram of the face authentication process.

Fig. 1 Block diagram

(a) Pre-Processing: A median filter with a kernel size of 5 was used to remove high frequencies and impulse noise while preserving edges.
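This step can be sketched as follows. The image here is a random placeholder (in practice a loaded thermal frame), and SciPy's `median_filter` is one possible implementation; the paper does not specify which library was used.

```python
import numpy as np
from scipy.ndimage import median_filter

# Placeholder thermal frame; in practice a loaded 480 x 640 grayscale image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (480, 640)).astype(np.uint8)

# 5 x 5 median filter: removes impulse noise and high-frequency content
# while preserving edges better than a linear smoothing filter would.
denoised = median_filter(img, size=5)
```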

(b) Feature Extraction: Four different feature extraction algorithms were implemented on the two datasets: Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), Pyramid Histogram of Oriented Gradients (PHOG) and Discrete Cosine Transform (DCT).

  • Local Binary Pattern (LBP) [7]: LBP is a powerful feature for texture classification. It has two major parameters: P, the number of neighboring pixels, and R, the radius. Here P is taken as 8 and R as 1. The entire image is divided into non-overlapping blocks of uniform size 16 × 16 pixels. For each pixel in a block, its intensity is compared with its 8 neighboring pixels at a radius of 1 pixel, and a binary code for the centre pixel is generated:

$$ LBP_{P,R} \left( {g_{c} } \right) = \sum\limits_{i = 0}^{P - 1} {2^{i} \cdot B_{i} } $$
(1)

where each bit is obtained from the comparison as per the following equation:

$$ B_{i} = \left\{ {\begin{array}{*{20}c} {1, \quad g_{i} - g_{c} \ge 0} \\ {0, \quad g_{i} - g_{c} < 0} \\ \end{array} } \right. $$
(2)

Here \( g_{c} \) is the gray value of the central pixel and \( g_{i} \) is the gray value of the i-th neighboring pixel. A bit value of 1 is assigned if the gray value of the central pixel is less than or equal to that of the neighboring pixel, and 0 otherwise. The number of bits depends on P; since P is taken as 8 here, an 8-bit binary code is generated for each pixel.

For each non-overlapping block, a histogram is computed based on the following equations:

$$ H\left( r \right) = \sum\limits_{x = 0}^{M - 1} {\sum\limits_{y = 0}^{N - 1} {f\left( {LBP_{P,R} \left( {x,y} \right),r} \right)} } $$
(3)
$$ f\left( {LBP_{P,R} \left( {x,y} \right),r} \right) = \left\{ {\begin{array}{*{20}l} 1 \hfill & { LBP_{P,R} \left( {x,y} \right) = r} \hfill \\ 0 \hfill & { otherwise} \hfill \\ \end{array} } \right. $$
(4)

where M × N is the block size and r ranges over the possible LBP codes.

The final feature matrix is formed by concatenating normalized histogram of the binary patterns of all the blocks in the image.
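As an illustration, this block-wise LBP histogram feature can be sketched with scikit-image's `local_binary_pattern`. The 16 × 16 block size and P = 8, R = 1 follow the text; the input image and the helper name are placeholders.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_features(img, P=8, R=1, block=16):
    """Concatenated, per-block normalized LBP histograms."""
    # Per-pixel LBP codes in [0, 2^P); 'default' keeps the raw codes.
    codes = local_binary_pattern(img, P, R, method='default')
    H, W = codes.shape
    feats = []
    # Non-overlapping blocks, 16 x 16 pixels as in the text.
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            hist, _ = np.histogram(codes[y:y + block, x:x + block],
                                   bins=2 ** P, range=(0, 2 ** P))
            feats.append(hist / hist.sum())  # normalize each block histogram
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(np.uint8)
f = lbp_features(img)  # 4 x 4 blocks -> 16 blocks x 256 bins each
```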

  • Histogram of Oriented Gradients (HOG) [9]: HOG is a scale-invariant feature descriptor. The entire image is divided into uniform blocks with 50% overlap. The gradient of an image is defined by:

$$ \Delta f = \left( {\begin{array}{*{20}c} {g_{x} } \\ {g_{y} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\frac{\partial f}{\partial x}} \\ {\frac{\partial f}{\partial y}} \\ \end{array} } \right) $$
(5)

where \( \frac{\partial f}{\partial x} \) is the gradient in x direction and \( \frac{\partial f}{\partial y} \) is the gradient in y direction.

The gradient direction is calculated by:

$$ \theta = \tan^{ - 1} \left( {\frac{{g_{y} }}{{g_{x} }}} \right) $$
(6)

The gradient magnitude and direction are calculated for each pixel in each block. Then a histogram of gradients is computed for each cell (9 bins, 0°–360°). Concatenating the normalized histograms of all the blocks forms the final feature vector.
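A minimal extraction along these lines can use scikit-image's `hog`. Note that scikit-image bins unsigned gradients over 0°–180° by default, whereas the text states 0°–360°; the image, cell size and block size here are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # placeholder grayscale face crop

# 9 orientation bins; 2 x 2 cells per block with the block sliding one
# cell at a time gives the 50% block overlap mentioned in the text.
features = hog(img,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm='L2-Hys')
```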

  • Pyramid Histogram of Oriented Gradients (PHOG) [10]: PHOG is a variant of HOG. HOG features are calculated for three different block sizes, and the final feature vector is formed by concatenating the normalized histograms of all the levels.
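The pyramid variant can be sketched by computing HOG at three scales and concatenating, per the description above; the specific cell sizes (4, 8, 16) are assumptions, not values taken from the paper.

```python
import numpy as np
from skimage.feature import hog

def phog(img, cell_sizes=(4, 8, 16)):
    # HOG computed at three cell/block scales; the normalized descriptors
    # of all levels are concatenated into one pyramid feature vector.
    parts = [hog(img, orientations=9,
                 pixels_per_cell=(c, c),
                 cells_per_block=(2, 2)) for c in cell_sizes]
    return np.concatenate(parts)

rng = np.random.default_rng(0)
fv = phog(rng.random((64, 64)))
```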

  • Discrete Cosine Transform (DCT) [8]: The DCT is an invertible linear transform. The image is converted into the frequency domain by applying the DCT. The 2-D DCT (DCT-II), defined by the following equation, is used:

$$ X\left[ {k,l} \right] = \frac{2}{N} \cdot c_{k} \cdot c_{l} \cdot \sum\limits_{m = 0}^{a - 1} {\sum\limits_{n = 0}^{b - 1} {x\left[ {m,n} \right]\cos \left( {\frac{{\left( {2m + 1} \right)k\pi }}{2N}} \right)\cos \left( {\frac{{\left( {2n + 1} \right)l\pi }}{2N}} \right)} } $$

where a and b indicate the number of coefficients in x and y direction, N is the total number of significant coefficients and \( c_{k} \) and \( c_{l} \) are defined as:

$$ c_{k} ,c_{l} = \left\{ {\begin{array}{*{20}l} {\frac{1}{\sqrt 2 }} \hfill & {k = 0,\;l = 0} \hfill \\ 1 \hfill & {k = 1,2, \ldots ,a - 1\quad and\quad l = 1,2, \ldots ,b - 1} \hfill \\ \end{array} } \right. $$

The DCT coefficients indicate the importance of the corresponding frequencies in the image. The first coefficient represents the lowest frequency component and carries the most significant information, while the higher frequencies are mostly noise. The first N DCT coefficients are used as the feature vector.
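A sketch of this extraction with SciPy's `dctn` follows. Keeping the top-left n × n square of coefficients is a simplification (zig-zag scanning of the low-frequency corner is more common), and `n_coeffs = 64` is an assumed value for N.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(img, n_coeffs=64):
    # 2-D DCT-II of the image; low frequencies occupy the top-left corner.
    coeffs = dctn(img.astype(float), norm='ortho')
    n = int(np.sqrt(n_coeffs))
    # Keep the top-left n x n low-frequency square and flatten it.
    return coeffs[:n, :n].ravel()

rng = np.random.default_rng(0)
fv = dct_features(rng.random((64, 64)), n_coeffs=64)
```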

(c) Classification: The features were classified using two popular classifiers: K-Nearest Neighbours (KNN) [11] and Support Vector Machine (SVM) [12].

A k-fold cross-validation process was carried out to obtain the classification accuracy, with k chosen as 10. The entire dataset is divided into 10 random groups; in an iterative process, each group in turn is used as the test set and the remaining groups as the training set. The classification accuracy is calculated as follows:

$$ A = \frac{C}{T} $$

where A is the classification accuracy, C is the number of correct classifications and T is the total number of classifications.
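The 10-fold procedure and the accuracy computation can be sketched with scikit-learn. The feature matrix here is synthetic, and the classifier hyper-parameters (1-NN, linear SVM) are assumptions, since the paper does not state them.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in features: 3 subjects, 10 samples each, 50-D vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))
y = np.repeat(np.arange(3), 10)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for clf in (KNeighborsClassifier(n_neighbors=1), SVC(kernel='linear')):
    # Each fold's score is the accuracy A = C / T on the held-out group.
    scores = cross_val_score(clf, X, y, cv=cv)
    print(type(clf).__name__, round(scores.mean(), 3))
```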

(d) Datasets: The algorithms were implemented on two datasets: the publicly available Carl database and a newly created database. The Carl database consists of thermal, visible and infrared images of 41 subjects (32 males and 9 females) under three different illumination conditions: natural, artificial and infrared illumination. The dataset also contains face-localized versions of the same images. Many face localization algorithms exist, such as the level set algorithm [13]; the face-localized images in the Carl database were created using a new algorithm [14] (Figs. 2, 3 and 4).

Fig. 2 a Thermal, b Visible

Fig. 3 a Thermal, b Visible

Fig. 4 a Thermal, b Visible

The newly created database consists of thermal and near-infrared images of 30 subjects (10 males and 20 females) with realistic backgrounds, which vary between subjects. The images were captured at different times of day, from 9 a.m. to 4 p.m., to take into account the temperature variations that occur. The thermal images were acquired using a FLIR One thermal camera with a temperature range of −20 °C to 120 °C and a thermal sensitivity of 0.1 °C; they are in JPG format with dimensions of 480 × 640 pixels. The near-infrared images were captured using a MAPIR NDVI + Red Survey 2 camera with a spectral range of 700 nm–1 mm (Fig. 5).

Fig. 5 a Near infrared, b Thermal

The captured near-infrared images are three-dimensional, with the infrared component in the second channel, so only that channel was extracted from all the images. These images are in raw JPG format with dimensions of 3840 × 2160 pixels and were acquired at an approximate distance of 1 m between the subject and the camera. Unlike the existing thermal face databases, the images were captured with a non-plain, realistic background.

4 Results

The results obtained are presented in Tables 1, 2 and 3.

Table 1 Classification accuracy using Carl and new database
Table 2 Classification accuracy of face localized images
Table 3 Comparison of different illuminations using KNN

Table 1 depicts the classification accuracy of KNN and SVM for the different feature extraction algorithms. The table indicates that accuracy is highest for LBP features. It is also observed that accuracy decreases for images with a non-plain background.

Table 2 gives the accuracies when the algorithms were implemented on the face-localized images of the Carl database. The results indicate the advantage of face localization before feature extraction.

Table 3 compares the accuracy values for different illuminations. It can be observed that the accuracy does not vary much with change in illumination.

5 Conclusion and Future Work

A new dataset of thermal and near-infrared face images with non-plain, realistic backgrounds was created. Different feature extraction algorithms (LBP, HOG, PHOG, DCT) were compared by implementing them on a benchmark dataset and on the newly created dataset. LBP features gave the highest accuracy. Accuracy decreased significantly on the new dataset because of its non-plain backgrounds, and it increased dramatically when face localization was performed before feature extraction. Accuracy varied little across illuminations, confirming that thermal images remain unaffected by illumination variations.

Face localization algorithms for images with plain backgrounds already exist; such algorithms for non-plain backgrounds should be investigated. The algorithms also need to be verified on a larger dataset.