
5.1 Introduction

Face recognition has been an active area of research over the past few decades, and many major advancements have been reported in the literature. New applications have triggered new challenges, and new challenges have called for new research solutions. Surveillance at night or in harsh environments is one of the most recent applications of face recognition. Recent advances in the manufacturing of small and inexpensive imaging devices sensitive in the active infrared range (near- and short-wave infrared) [21, 23], together with the ability of these cameras to see through fog and rain, operate at night, and work at long ranges, have provided researchers with a new type of imagery and posed new research problems [58, 10, 13, 28, 39, 40, 45]. Active-IR energy is less affected by scattering and absorption by smoke or dust than visible light. Also, unlike visible-spectrum imaging, active-IR imaging can extract not only exterior but also useful subcutaneous anatomical information. As a result, face images in the active-IR range look very different from face images in the visible spectrum. Acknowledging these differences, many related questions can be posed. What type of information should be extracted from active-IR images to successfully solve the problem of face recognition? How can a face image in the visible range be matched to a face image in the active-IR range? The latter question falls in the scope of heterogeneous face recognition. Developing local operators for heterogeneous face recognition is the focus of this chapter. We first provide a short overview of two general existing approaches to the problem of face recognition and then narrow the discussion to local operator-based approaches recently proposed and used in the field.

The literature identifies two general categories of approaches to the problem of face recognition: the holistic approach (also known as subspace analysis) and the local feature approach. The former represents the global photometric information of a human face using subspace projections. Examples include principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), multilinear subspace learning (MSL), and their derivatives. Sirovich and Kirby [44] showed that PCA could be applied to a collection of face images to form a set of basis features known as eigenfaces. Later, Turk and Pentland [47, 48] expanded these results and presented the method of eigenfaces as well as a system for automated face recognition using eigenfaces. They showed a way of calculating the eigenvectors of a covariance matrix that made it possible for computers at that time to perform eigendecomposition on a large number of face images. Jutten and Herault [27] introduced the general framework for ICA, which Comon [16] later refined. ICA can be seen as a generalization of PCA: while PCA uses eigenvectors to determine basis vectors that capture maximal image variance, ICA generates a set of basis vectors that possess maximal statistical independence. Motivated by the fact that much of the important information may be contained in high-order relationships rather than second-order ones, Bartlett et al. [3, 4] applied ICA to the problem of face recognition.

Fisher was the first to introduce the idea of LDA [20]. LDA determines a set of optimal discriminant basis vectors such that the ratio of the inter- and intra-class scatter matrices is maximized. It is primarily used to reduce the number of features to a more manageable size before classification; each of the new dimensions is a linear combination of pixel values, which form a template. CCA was first introduced by Hotelling [25]. Given two random vectors \(X = (X_{1} , \ldots ,X_{n} )\) and \(Y = (Y_{1} , \ldots ,Y_{m} )\), and assuming a correlation among the variables, CCA finds the linear combinations of the \(X_{i}\) and \(Y_{j}\) that have the maximum correlation with each other. Melzer et al. [37] applied CCA to face recognition and proposed appearance models based on kernel canonical correlation analysis.

The second category of approaches uses local operators instead and offers advantages such as greater robustness to illumination and occlusion, less strictly controlled acquisition conditions, and very small training sets. Examples of operators used in this category include Gabor filters, local binary patterns (LBPs), histograms of oriented gradients (HOGs), the Weber local descriptor (WLD), and their generalizations and variants. The Gabor filter is known to be a robust directional filter used for edge detection [36]. It has been found that simple cells in the visual cortex of mammalian brains can be modeled by Gabor functions [18, 34]. A set of Gabor filters parameterized by different frequencies and orientations has been shown to perform well as an image feature extraction tool and has therefore been widely used in image processing and pattern analysis applications [19, 26, 31, 33]. LBP is a particular case of the texture spectrum model proposed by Wang et al. [50]. It was first introduced by Ojala and Pietikäinen [41, 42] for texture classification and found to be a powerful tool. LBP was thereafter applied to face recognition as well as object detection [1, 24]. Due to its discriminative power, computational simplicity, and robustness to monotonic changes of image intensity caused by illumination variations, LBP has been expanded into several variant forms (see, e.g., [53, 54]). HOG analysis was introduced by Dalal and Triggs [17] and was initially used for object detection. This operator is similar to operators such as edge orientation histograms and the scale-invariant feature transform, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Chen et al. [12] introduced the WLD operator inspired by Weber's law, an important psychophysical law quantifying the perception of change in a given stimulus [43].

Most of the described methods have been developed for intra-spectral matching, specifically, to match visible light images. Some operators have been tuned to work with heterogeneous face images. For example, Chen et al. [14] conducted a face recognition study in the thermal IR and visible spectral bands using PCA and FaceIt G5. They showed that the performance of PCA in the visible band is higher than in the thermal IR band, and that fusing these data at the matching score level resulted in performance similar to that of the algorithm in the visible band. Li et al. [32] proposed a method to compare face images within the NIR spectral band under different illumination scenarios. Their face matcher involved an LBP operator to achieve illumination invariance and was applied to near-infrared (NIR) images acquired at a short distance. In their recent works, Akhloufi and Bendada [2] experimented with images from a database including visible, shortwave infrared (SWIR), mid-wave infrared (MWIR), and thermal infrared images. They adopted a classic local ternary pattern (LTP) and a new local adaptive ternary pattern (LATP) operator for feature extraction. The work of Klare and Jain [29] employed a method based on LBP and HOG operators, followed by a random sampling LDA algorithm to reduce the dimensionality of feature vectors. This encoding strategy is applied to NIR and color images for cross-spectral matching. The results are shown to outperform Cognitec's FaceVACS [15].

This chapter focuses on a discussion of local operators (algorithms from the second category) for heterogeneous face recognition. The methodology for feature extraction and heterogeneous matching adopted in this chapter does not require training data, which makes it valuable in practice. Once local operators are developed, they can be applied to any heterogeneous data (we particularly focus on matching visible images to active-IR images) and do not require any estimation or learning of parameters or retraining of the overall face recognition system.

We present and compare several feature extraction approaches applied to heterogeneous face images. Face images (in visible spectrum and active IR) may be first processed with a bank of Gabor filters parameterized by orientation and scale parameters followed by an application of a bank of local operators. The operators encode both the magnitude and phase of Gabor filtered (or non-filtered) face images. The application of an operator to a single image results in multiple magnitude and phase outputs. The outputs are mapped into a histogram representation, which constitutes a long feature vector. Feature vectors are cross-matched by applying a symmetric Kullback-Leibler distance. The combination of Gabor filters and local operators offers the advantage of both the selective nature of Gabor filters and the robustness of these operators.

In addition to known local operators such as LBP, generalized LBP (GLBP), WLD, HOG, and ordinal measures [11], we also present a recently developed operator named composite multi-lobe descriptor (CMLD) [9]. Inspired by the design of ordinal measures, this new operator combines Gabor filters, LBP, GLBP, and WLD and modifies them into multi-lobe functions with smoothed neighborhoods.

The performance of Gabor filters, LBP, GLBP, WLD, and HOG, used both individually and in combination, and the performance of CMLD are demonstrated on the Pre-TINDERS and TINDERS datasets [51]. These datasets contain color face images as well as NIR and SWIR face images acquired at distances of 1.5, 50, and 106 m.

5.2 Heterogeneous Face Recognition

A typical system for heterogeneous face recognition can be described by three connected modules: a preprocessing module, a feature extraction module, and a matching module (see the block diagram in Fig. 5.1). In this work, the preprocessing module implements alignment, cropping, and normalization of heterogeneous face images. The feature extraction module performs filtering, applies local operators, and maps the outputs of local operators into a histogram representation. The matching module applies a symmetric Kullback-Leibler distance to histogram representations of heterogeneous face images to generate a matching score. A functional description of each of the three modules is provided in the following subsections.

Fig. 5.1
figure 1

A block diagram of a typical face recognition system

5.2.1 Preprocessing: Alignment, Cropping, and Normalization

In this work, the preprocessing module implements image alignment, cropping, and normalization. For alignment, positions of the eyes are used to transform the face to a canonical representation. Geometric transformations such as rotation, scaling, and translation are applied to each face image with the objective of projecting the eyes to fixed positions. Figure 5.2a, b, d illustrates the processing steps. In our work, the anchor points—the fixed positions of the eyes—are manually selected. However, this process can be automated by means of a Haar-based detector trained on heterogeneous face images [49], as an example.

Fig. 5.2
figure 2

Preprocessing of the face: a original color image, b aligned and cropped color face, c grayscale conversion of (b), d aligned and cropped SWIR face, and e log transformation of (d)

The aligned face images are further cropped to an area of size \(120 \times 112\) (see Fig. 5.2b, d). After being cropped, images undergo an intensity normalization. Color images are converted to grayscale images using a simple linear combination of the original R, G, and B channels (see Fig. 5.2c). Active-IR images—SWIR and NIR images—are preprocessed using a simple nonlinear transformation given by \(\log (1 + X),\) where X is the input image, as shown in Fig. 5.2e. The log transformation redistributes the original darker pixels over a much broader range and compresses the range of the original brighter pixels. The transformed image is brighter and has a better contrast than the original image, while the gray variation (trend) of the pixels is still preserved since the transformation is monotonic.

5.2.2 Feature Extraction

Feature extraction (implemented by the second module in the block diagram) is intended to extract an informative representation of heterogeneous face images with the objective of successful heterogeneous face recognition. In this chapter, we focus only on local operators. Below, we provide a brief mathematical description of Gabor filters, LBP, generalized LBP, WLD, and HOG, as well as some variants or improvements such as Gabor ordinal measures (GOM) and the composite multi-lobe descriptor (CMLD). We defer the description of the ultimate feature vector to Sect. 5.2.3.

5.2.2.1 Gabor Filter

As recently demonstrated by Nicolo et al. [39, 40] and Chai et al. [11], a two-step encoding of face images, where encoding with local operators is preceded by Gabor filtering, leads to considerably improved recognition rates. Therefore, many combinations of operators analyzed in this chapter involve filtering with a bank of Gabor filters as a first step. The filter bank includes 2 different scales and 8 orientations resulting in a total of 16 filter responses. The mathematical description of the filter is given as follows:

$$G(z,\theta ,s) = \frac{{{\parallel }{\fancyscript{K}}(\theta ,s){\parallel }^{2} }}{{\upsigma^{2} }}\, \exp \left[{- \frac{{{\parallel }{\fancyscript{K}}(\theta ,s){\parallel }^{2} {\parallel }z{\parallel }^{2} }}{{2\upsigma^{2} }}} \right]\left[{e^{{i{\fancyscript{K}}(\theta ,s)z}} - e^{{- \frac{{\upsigma^{2} }}{2}}} } \right],$$
(5.1)

where \({\fancyscript{K}}(\theta ,s)\) is the wave vector and \(\upsigma^{2}\) is the variance of the Gaussian kernel. The magnitude and phase of the wave vector determine the scale and orientation of the oscillatory term and \(z = (x,y)\). The wave vector can be expressed as follows:

$${\fancyscript{K}}(\theta ,s) = {\fancyscript{K}}_{s} e^{{i\phi_{\theta } }} ,$$
(5.2)

where \({\fancyscript{K}}_{s}\) is known as a scale parameter and \(\phi_{\theta }\) is an orientation parameter. The adopted parameters for the complex vector in the experiments of this chapter are set to \({\fancyscript{K}}_{s} = \left({\pi /2} \right)^{s/2}\) with \(s \in {\mathbb{N}}\) and \(\phi_{\theta } = \theta \pi /8\) with \(\theta = 1,2, \ldots ,8.\) The Gaussian kernel has the standard deviation \(\upsigma = \pi .\)

A normalized and preprocessed face image \(I(z)\) is convolved with a Gabor filter \(G(z,\theta ,s)\) at orientation \(\phi_{\theta }\) and scale \({\fancyscript{K}}_{s}\) resulting in the filtered image \(Y(z,\theta ,s) = I(z)*G(z,\theta ,s),\) where \(*\) stands for convolution.

5.2.2.2 Weber Local Descriptor

WLD consists of two joint parts: a differential excitation operator and a gradient orientation descriptor. In this chapter, we adopt only the differential excitation operator to encode the magnitude filter response, resulting in a robust representation of face images.

The differences between the neighboring pixels and a central pixel are calculated and normalized by the central pixel value itself. The sum of these normalized differences is then compressed by a monotonic function such as the arctangent. Finally, quantization is performed to output the WLD value.

The mathematical definition of WLD used in this chapter is given as follows:

$${\text{WLD}}_{l,r,N} (x) = {\fancyscript{Q}}_{l} \left\{{\mathop {\tan }\nolimits^{- 1} \left[{\sum\limits_{i = 1}^{N} \left({\frac{{x_{i} - x}}{x}} \right)} \right]} \right\},$$
(5.3)

where \(x_{i}\) are the neighbors of x at radius r and N is the total number of neighbors (see Fig. 5.3). \({\fancyscript{Q}}_{l}\) is a uniform quantizer with l quantization levels.

Fig. 5.3
figure 3

Illustration of the neighboring pixels (N = 12) of a central pixel at different radii: the left corresponds to r = 1; the right to r = 2
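Under these definitions, the differential excitation of Eq. (5.3) can be sketched as below; the 8-neighbor configuration at r = 1, the small epsilon guard against zero-valued pixels, and the binning of the arctangent range are simplifications of ours (the chapter uses N = 12 neighbors over two radii):

```python
import numpy as np

def wld(image, levels=8):
    """Differential excitation of WLD, Eq. (5.3), for r = 1 with N = 8
    neighbors. `levels` is the number l of uniform quantization levels."""
    x = image.astype(np.float64)
    pad = np.pad(x, 1, mode='edge')
    acc = np.zeros_like(x)
    h, w = x.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            xi = pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            acc += (xi - x) / (x + 1e-9)          # guard against division by zero
    excitation = np.arctan(acc)                   # bounded in (-pi/2, pi/2)
    # uniform quantizer Q_l over the arctangent range
    bins = np.floor((excitation + np.pi / 2) / np.pi * levels)
    return np.clip(bins, 0, levels - 1).astype(int)
```

A flat image produces zero excitation everywhere and therefore falls into the middle quantization bin.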

5.2.2.3 Local Binary Pattern

A uniform LBP operator is described as follows:

$${\text{LBP}}_{r,N}^{{\fancyscript{U}}} (x) = {\fancyscript{U}}\left\{{\sum\limits_{i = 1}^{N} {\fancyscript{I}}\{x_{i} - x\} 2^{i} } \right\},$$
(5.4)

where \(x_{i}\) are the neighbors of the pixel x at radius r and N is the total number of neighbors. \({\fancyscript{U}}\) is the uniform pattern mapping and \({\fancyscript{I}}\left(\cdot \right)\) is the unit step function:

$${\fancyscript{I}}(x) = \left\{{\begin{array}{*{20}c} {1,} & {x > 0} \\ {0,} & {x \le 0} \\ \end{array} } \right.$$
(5.5)

Note that within this book chapter, we use only uniform sequences. A binary pattern is uniform if it contains at most two bitwise transitions between 0 and 1 when the bit sequence is traversed circularly. For example, the sequence 011111111000 is a 12-bit uniform pattern, while the sequence 010001011111 is not uniform. The uniform mapping \({\fancyscript{U}}(d)\) is defined as follows:

$${\fancyscript{U}}(d) = \left\{{\begin{array}{*{20}c} {d,} & {{\text{if}}\;\;d_{B} \;\;{\text{is uniform}}} \\ {M,} & {\text{otherwise}} \\ \end{array} } \right.$$
(5.6)

where \(d_{B}\) is the binary form of a number d and M is the total number of uniform patterns formed using N bits. We work with \(N = 12\)-bit sequences, which results in \(M = 134\) uniform patterns.
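The count M = 134 can be checked mechanically: the number of circularly uniform patterns over N bits is N(N − 1) + 2, which for N = 12 gives exactly 134. A small sketch (function names are ours):

```python
def is_uniform(bits):
    """True if the circular bit string has at most two 0/1 transitions."""
    n = len(bits)
    transitions = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return transitions <= 2

def count_uniform(n_bits):
    """Count uniform patterns among all 2^n_bits binary sequences."""
    return sum(is_uniform(format(d, f'0{n_bits}b')) for d in range(2 ** n_bits))
```

The two example sequences from the text behave as stated: 011111111000 has exactly two circular transitions, while 010001011111 has six.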

5.2.2.4 Generalized Local Binary Pattern

A uniform GLBP operator generalizes the encoding method proposed in [22] by introducing a varying threshold t rather than a fixed one. Based on our empirical analysis, the combination of LBP applied to the magnitude response of a Gabor filter and GLBP applied to the phase response of the same filter boosts the cross-matching performance [39]. The uniform generalized binary operator is defined as follows:

$${\text{GLBP}}_{r,N,t}^{{\fancyscript{U}}} (x) = {\fancyscript{U}}\left\{{\sum\limits_{i = 1}^{N} {\fancyscript{T}}_{t} \{x_{i} - x\} 2^{i} } \right\},$$
(5.7)

where \(x_{i}\) is the ith neighbor of x at radius r (we set \(r = 1,2\) in our experiments) and N is the total number of neighbors. \({\fancyscript{U}}(\cdot )\) is the uniform pattern mapping described in the previous subsection (see Sect. 5.2.2.3). \({\fancyscript{T}}_{t}(\cdot )\) is a thresholding operator based on threshold t. It is defined as follows:

$${\fancyscript{T}}_{t} (x) = \left\{{\begin{array}{*{20}l} {1,} \hfill & {\left| x \right| \le t} \hfill \\ {0,} \hfill & {\left| x \right| > t} \hfill \\ \end{array}} \right.$$
(5.8)

The values for the thresholds in this chapter were evaluated experimentally and set to \(t = \pi /2\).
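For a single pixel, the encoding of Eq. (5.7) reduces to the following sketch; the flat neighbor list and the omission of the uniform mapping \({\fancyscript{U}}\) are simplifications of ours:

```python
import numpy as np

def glbp_code(center, neighbors, t=np.pi / 2):
    """GLBP code of Eq. (5.7) for one pixel (uniform mapping omitted).

    The thresholding operator T_t fires when |x_i - x| <= t, i.e. when a
    neighbor's phase agrees with the center's phase to within t."""
    code = 0
    for i, xi in enumerate(neighbors, start=1):   # bit weights 2^i, i = 1..N
        if abs(xi - center) <= t:                 # thresholding operator T_t
            code += 2 ** i
    return code
```

Applied to a Gabor phase response, this rewards local phase agreement rather than the sign of the intensity difference used by plain LBP.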

5.2.2.5 Histogram of Oriented Gradients

Dalal and Triggs [17] were the first to introduce HOG. The essential idea behind the HOG operator is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions.

The input image is optionally smoothed with a Gaussian filter and then convolved with a derivative mask such as the very simple 1D mask \([ - 1,0,1]\). The directional derivatives can be expressed as follows:

$$\begin{aligned} & G_{x} (x,y) = I(x + 1,y) - I(x - 1,y) \\ & G_{y} (x,y) = I(x,y + 1) - I(x,y - 1), \\ \end{aligned}$$
(5.9)

where \(I(x,y)\) is the input image. \(G_{x} (x,y)\,\text{and}\,G_{y} (x,y)\) denote the derivatives along x and y directions, respectively. Then, the magnitude and phase components of the gradient can be calculated as follows:

$$\begin{aligned} M(x,y) & = \sqrt {G_{x} (x,y)^{2} + G_{y} (x,y)^{2} } \\ \alpha (x,y) & = \mathop {\tan }\nolimits^{ - 1} \frac{{G_{y} (x,y)}}{{G_{x} (x,y)}}, \\ \end{aligned}$$
(5.10)

where \(M(x,y)\) and \(\alpha (x,y)\) are the magnitude and phase, respectively.

The next step is spatial and orientation binning. A weighted vote is calculated at each pixel for an edge orientation histogram channel based on the orientation of the gradient at that pixel, and the votes are accumulated into orientation bins over local small regions called cells (cells can be either rectangular or circular). The orientation bins are evenly spaced over \(0^{ \circ } - 180^{ \circ }\) (“unsigned” gradient) or \(0^{ \circ } - 360^{ \circ }\) (“signed” gradient). The vote is a function of the gradient magnitude at the pixel, very often the magnitude itself. The descriptor vector is thereafter normalized over non-overlapping blocks using the \(L_{1}\) or \(L_{2}\) norms, or their variants. An example of using \(L_{2}\) normalization is given as follows:

$${\mathbf{v}}^{*} = {\mathbf{v}}/\sqrt {{\parallel }{\mathbf{v}}{\parallel }_{2}^{2} + \varepsilon^{2} } ,$$
(5.11)

where v is the non-normalized descriptor vector and ε is a small constant.
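The gradient and binning steps of Eqs. (5.9)-(5.10) can be sketched as below; the 8-pixel cells and 9 "unsigned" orientation bins are the common Dalal-Triggs defaults, assumed here rather than taken from the chapter:

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Per-cell orientation histograms with magnitude-weighted votes,
    using the [-1, 0, 1] derivative mask and unsigned (0-180 deg) bins."""
    img = image.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # Eq. (5.9), x direction
    gy[1:-1, :] = img[2:, :] - img[:-2, :]        # Eq. (5.9), y direction
    mag = np.hypot(gx, gy)                        # Eq. (5.10), magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned gradient angle
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, bins))
    bin_idx = np.minimum((ang / 180.0 * bins).astype(int), bins - 1)
    for i in range(h_cells):
        for j in range(w_cells):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            b = bin_idx[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            # magnitude-weighted votes accumulated into orientation bins
            hist[i, j] = np.bincount(b.ravel(), weights=m.ravel(),
                                     minlength=bins)
    return hist
```

Block normalization as in Eq. (5.11) would then be applied to groups of these cell histograms.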

5.2.2.6 Combination of Operators

A fusion of extracted features often leads to improved recognition performance. As shown in [38, 40], LBP and WLD applied to the magnitude of Gabor filtered images combined with GLBP applied to the phase of Gabor filtered images yielded a significant performance boost. Details of this fusion scheme can be found in [38, 40]. A block diagram of the fusion approach is displayed in Fig. 5.4.

Fig. 5.4
figure 4

A block diagram of the fusion scheme in [38]

5.2.2.7 Gabor Ordinal Measures

GOM is a recently developed local operator [11]. This operator combines Gabor filters (see Sect. 5.2.2.1) with ordinal measures, a measurement level that records information about the relative ordering of multiple quantities [46]. Following GOM, Chai et al. extracted a histogram representation and applied dimensionality reduction by means of LDA to the filtered and encoded face data.

The ordinal measure in [11] is modified using a smoothed neighborhood described by a Gaussian smoothing function. Therefore, the ordinal measure filter \(f_{\text{om}} ({\mathbf{z}})\) can be expressed as follows:

$$\begin{aligned} f_{\text{om}} ({\mathbf{z}}) & = C_{p} \sum\limits_{i = 1}^{{N_{p} }} \frac{1}{{\sqrt {2\pi }\upsigma_{p,i} }}\exp \left[ {\frac{{ - ({\mathbf{z}} - \mu_{p,i} )^{T} ({\mathbf{z}} - \mu_{p,i} )}}{{2\upsigma_{p,i}^{2} }}} \right] \\ & \quad - C_{n} \sum\limits_{i = 1}^{{N_{n} }} \frac{1}{{\sqrt {2\pi }\upsigma_{n,i} }}\exp \left[ {\frac{{ - ({\mathbf{z}} - \mu_{n,i} )^{T} ({\mathbf{z}} - \mu_{n,i} )}}{{2\upsigma_{n,i}^{2} }}} \right] \\ \end{aligned}$$
(5.12)

where \({\mathbf{z}} = (x,y)\) is the location of a pixel. \(\mu_{p,i}\) and \(\upsigma_{p,i}\) denote the central position and the scale of the ith positive lobe of a 2D Gaussian function, while \(\mu_{n,i}\) and \(\upsigma_{n,i}\) denote that of the ith negative lobe of the same Gaussian function. \(N_{p}\) and \(N_{n}\) are the numbers of positive and negative lobes, respectively, while constant coefficients \(C_{p}\) and \(C_{n}\) keep the balance between positive and negative lobes, i.e., \(C_{p} N_{p} = C_{n} N_{n} .\)
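As an illustration of Eq. (5.12), a di-lobe kernel (\(N_{p} = N_{n} = 1\), so \(C_{p} = C_{n}\)) with one positive and one negative Gaussian lobe placed symmetrically can be built as follows; all numeric parameters are illustrative assumptions:

```python
import numpy as np

def dilobe_kernel(size=15, offset=3.0, sigma=1.5):
    """Di-lobe ordinal-measure kernel in the spirit of Eq. (5.12):
    positive lobe at (+offset, 0), negative lobe at (-offset, 0).
    Kernel size, lobe offset, and scale are illustrative choices."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)

    def lobe(mu_x):
        d2 = (x - mu_x) ** 2 + y ** 2
        return np.exp(-d2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    # balanced lobes: C_p * N_p = C_n * N_n, here both coefficients are 1
    return lobe(+offset) - lobe(-offset)
```

Because the lobes are balanced, the kernel sums to (numerically) zero, so its response depends only on the ordinal contrast between the two neighborhoods, not on absolute intensity.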

5.2.2.8 Composite Multi-lobe Descriptor

In [9], a new operator named CMLD was proposed. CMLD combines Gabor filter, WLD, LBP, and GLBP and modifies them into multi-lobe functions with smoothed neighborhoods. The new operator encodes both magnitude and phase responses of Gabor filters. The combining of LBP and WLD utilizes both the orientation and intensity information of edges. The introduction of multi-lobe functions with smoothed neighborhoods further makes the proposed operator robust against noise and poor image quality. A block diagram of CMLD is provided in Fig. 5.5.

Fig. 5.5
figure 5

A block diagram of composite multi-lobe descriptor

The multi-lobe version of LBP (referred to as MLLBP) is the same as the ordinal measure described in (5.12) (see Sect. 5.2.2.7). An illustration of such an MLLBP operator is provided in Fig. 5.6. The multi-lobe version of GLBP, called MLGLBP, is constructed in the same way as MLLBP except that the unit step function \({\fancyscript{I}}(\cdot)\) in (5.5) is replaced by the thresholding function \({\fancyscript{T}}_{t}(\cdot)\) in (5.8). The multi-lobe version of WLD (MLWLD) is a modification of the original WLD operator (see Sect. 5.2.2.2 for details) and is given by:

$${\text{MLWLD}}_{N} ({\mathbf{z}}) = {\fancyscript{Q}}_{l} \left\{{\mathop {\tan}\nolimits^{- 1} \left[{\sum\limits_{i = 1}^{N} \frac{{I({\mathbf{z}})*\hat{f}_{\text{MLWLD}}^{(i)} ({\mathbf{z}})}}{{I({\mathbf{z}})}}} \right]} \right\},$$
(5.13)

where \(I({\mathbf{z}})\) is an input and \({\mathbf{z}} = (x,y)\) is the location of a pixel. \(\hat{f}_{\text{MLWLD}}^{(i)} ({\mathbf{z}})\) is the ith element of the set of \(\varTheta \times M\) kernel functions \(\{ f_{\text{MLWLD}} ({\mathbf{z}};\theta ,L):\theta = 1,2, \ldots ,\varTheta ;L = 2,3, \ldots ,M\}\), where \(\varTheta\) is the total number of orientations and M is the maximum value of total lobe number. \(f_{\text{MLWLD}} ({\mathbf{z}};\theta ,L)\) is given by

$$f_{\text{MLWLD}} ({\mathbf{z}};\theta ,L) = \sum\limits_{l = 1}^{L} \frac{{C_{l} }}{{\sqrt {2\pi }\upsigma_{l,\theta ,L} }}\exp \left[ { - \frac{{({\mathbf{z}} - \mu_{l,\theta ,L} )^{T} ({\mathbf{z}} - \mu_{l,\theta ,L} )}}{{2\upsigma_{l,\theta ,L}^{2} }}} \right],$$
(5.14)

where \(\mu_{l,\theta ,L}\) and \(\upsigma_{l,\theta ,L}\) are the center and the scale of the kernel function at orientation θ, and L is the total number of lobes. \(\{ C_{l} \}\) are the coefficients to keep a balance between the positive and negative lobes. A detailed description of MLLBP, MLGLBP, and MLWLD can be found in [9].

Fig. 5.6
figure 6

Examples of kernels at different orientations used in multi-lobe operators: a a di-lobe function, b a tri-lobe function

5.2.3 Histogram (Feature Vector) and Matching Metric

Each encoded response (the output of each local operator) is divided into 210 non-overlapping square blocks of size \(8 \times 8\). Each block is represented by a histogram whose number of bins equals the number of output levels of the encoders described in the previous section (135 in our experiments, i.e., the \(M = 134\) uniform patterns plus one label for all non-uniform patterns). The 135-bin histograms of the blocks are concatenated and normalized so that the result can be treated as a probability mass function, yielding a vector of length \(135 \times 210 = 28{,}350\) for each encoded response. The length of the feature vector was selected empirically to maximize the cross-matching performance. Vectors of all encoded responses are further concatenated; thus, the total size of a feature vector corresponding to an input face image is \(28{,}350 \times P,\) where P is the number of encoded responses. In this book chapter, \(P = 96\) for the case of Gabor filters followed by LBP, GLBP, and WLD as well as for the case of CMLD (see Sects. 5.2.2.6 and 5.2.2.8).
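The construction of a single response's feature vector can be sketched directly; the code assumes integer-coded responses in the range 0-134 (hence 135 bins) on the \(120 \times 112\) crop from Sect. 5.2.1:

```python
import numpy as np

def response_histogram(encoded, block=8, bins=135):
    """Histogram feature vector for one encoded response (Sect. 5.2.3):
    the code image is split into non-overlapping block x block tiles,
    each tile yields a `bins`-bin histogram, and the concatenation is
    normalized into a probability mass function."""
    h, w = encoded.shape
    feats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = encoded[i:i + block, j:j + block]
            feats.append(np.bincount(patch.ravel(), minlength=bins)[:bins])
    v = np.concatenate(feats).astype(np.float64)
    return v / v.sum()
```

For a \(120 \times 112\) input this produces \(15 \times 14 = 210\) blocks and hence a vector of length \(135 \times 210 = 28{,}350\), matching the figures quoted above.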

When the distance between two composite feature vectors is evaluated, it is computed as a sum of distances over all corresponding feature vector pairs. A symmetric sum of two Kullback-Leibler distances [30] is used as the distance metric to compare the feature vectors of heterogeneous images. For two images A and B with feature vectors \(H_{A}\) and \(H_{B} ,\) respectively, the symmetric Kullback-Leibler distance is defined as follows:

$$D_{KL} (A,B) = \sum\limits_{k = 1}^{K} \left( {H_{A} (k) - H_{B} (k)} \right)\log \frac{{H_{A} (k)}}{{H_{B} (k)}},$$
(5.15)

where K is the length of the feature vectors \(H_{A}\) or \(H_{B}\).
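Equation (5.15) translates almost verbatim into code; the small epsilon guard against empty histogram bins is our addition:

```python
import numpy as np

def symmetric_kl(h_a, h_b, eps=1e-12):
    """Symmetric Kullback-Leibler distance of Eq. (5.15).

    Equivalent to KL(A||B) + KL(B||A); `eps` avoids log(0) and division
    by zero for empty histogram bins (our addition)."""
    a = np.asarray(h_a, dtype=np.float64) + eps
    b = np.asarray(h_b, dtype=np.float64) + eps
    return np.sum((a - b) * np.log(a / b))
```

The distance is symmetric in its arguments, non-negative, and zero when the two histograms coincide.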

5.3 Datasets

In our experiments, we use two datasets, Pre-TINDERS (Tactical Imager for Night/Day Extended-Range Surveillance) and TINDERS, collected by the Advanced Technologies Group, West Virginia High Tech Consortium (WVHTC) Foundation [35]. A summary of the datasets can be found in Table 5.1.

Table 5.1 Summary of the datasets

Pre-TINDERS is composed of 48 frontal face classes with a total of 576 images, at three wavelengths: visible, 980 nm NIR, and 1550 nm SWIR. Images are acquired at a short standoff distance of 1.5 m in a single session. Four images per class are available in each spectral band. A 980-nm light source is used to illuminate the face in the NIR spectral band, while a 1550-nm light source is used in the SWIR spectral band. The original resolutions of the acquired images (see Fig. 5.7) are \(640 \times 512\) (png format) for both NIR and SWIR images and \(1600 \times 1200\) (jpg format) for color images.

Fig. 5.7
figure 7

Sample images: a visible, b SWIR at 1.5 m, c SWIR at 50 m, d SWIR at 106 m, e NIR at 1.5 m, f NIR at 50 m, and g NIR at 106 m

TINDERS is composed of 48 frontal face classes, each represented by visible images and by NIR (980 nm) and SWIR images acquired at two standoff distances (50 and 106 m). At each distance and spectrum, four or five images per class are available. A total of 478 images with resolution \(640 \times 512\) (png format) are available in the SWIR band, and a total of 489 images with the same resolution and format are available in the NIR band. The visible (color) images, with resolution \(480 \times 640\) (jpg format), are collected at a short distance in two sessions (3 images per session); all have a neutral expression, resulting in a total of 288 images. Sample images from the Pre-TINDERS and TINDERS datasets are shown in Fig. 5.7.

It is important to note that although the original resolution of images in Pre-TINDERS and TINDERS varies, we crop and normalize all images to the same size for each experiment described below to ensure a fair comparison.

5.4 Experiments and Results

In this section, we analyze the performance of various local operators used for encoding heterogeneous face images. In our experiments, galleries are composed of visible light face images, while NIR and SWIR face images serve as probes. We match NIR and SWIR face images collected at 1.5, 50, and 106 m to visible light face images acquired at a distance of 1.5 m.

For both SWIR and NIR spectra (at both short and long standoff distances), a total of 11 operators (including individual operators and their combinations) are implemented. We order and number them as follows: (1) LBP, (2) WLD, (3) GLBP, (4) HOG, (5) Gabor filter, (6) Gabor filter followed by LBP applied to the magnitude image (Gabor + LBP), (7) Gabor filter followed by WLD applied to the magnitude image (Gabor + WLD), (8) Gabor filter followed by GLBP applied to the phase image (Gabor + GLBP), (9) Gabor filter followed by LBP, GLBP, and WLD (Gabor + LBP + GLBP + WLD), (10) GOM, and (11) CMLD. The parameters in the experiments are chosen as follows. The number of orientations and radii for Gabor filters are set to 8 and 2, respectively. The number of radii for LBP, GLBP, and WLD is chosen as 2, and the number of neighbors around the central pixel is set to 12. The same parameters are used in operators to encode short- and long-range images.

The results of matching are displayed in the form of receiver operating characteristic (ROC) curves. We plot genuine accept rate (GAR) versus false accept rate (FAR). Summaries of equal error rates (EERs), d-prime values, and GARs at the FAR set to 0.1 and 0.001 are provided in tables.
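For reference, the EER can be estimated from genuine and impostor distance scores by sweeping a decision threshold; this is a generic sketch, not the chapter's evaluation code:

```python
import numpy as np

def eer(genuine, impostor):
    """Approximate equal error rate from distance scores (smaller
    distance = better match): sweep a threshold over all observed scores
    and take the operating point where FAR and FRR are closest."""
    best = 1.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor <= t)   # impostor pairs accepted
        frr = np.mean(genuine > t)     # genuine pairs rejected
        best = min(best, max(far, frr))
    return best
```

GAR at a fixed FAR is obtained analogously by choosing the threshold that yields the target FAR and reporting 1 − FRR there.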

5.4.1 Matching SWIR Probes Against Visible Gallery

Our first experiment involves matching SWIR face images to visible face images. The heterogeneous images are encoded using the eleven individual or composite operators as described earlier in this section. The performance of the individual encoders can be treated as benchmarks. The results of matching parameterized by different standoff distances are shown in Figs. 5.8, 5.9, and 5.10. In these experiments, visible light images form the gallery set. All SWIR images are used as probes.

Fig. 5.8
figure 8

ROC curves: matching SWIR probes at 1.5 m to visible gallery

Fig. 5.9
figure 9

ROC curves: matching SWIR probes at 50 m to visible gallery

Fig. 5.10

ROC curves: matching SWIR probes at 106 m to visible gallery

5.4.1.1 Short Standoff Distance

For the case of the short standoff distance (pre-TINDERS dataset), the performance of single operators such as HOG, LBP, WLD, GLBP, and Gabor filters is inferior to that of the composite operators in which Gabor filters are followed by LBP, WLD, and GLBP. It is also inferior to the performance of CMLD and GOM, the composite multi-lobe operators. Within the group of single operators, HOG outperforms the other four, closely followed by LBP and then by Gabor filters. WLD appears to be less suitable for encoding heterogeneous face images in the framework of cross-spectral matching.

Within the group of composite operators, the top five, following closely together, are CMLD, Gabor + LBP + GLBP + WLD, GOM, Gabor + LBP, and Gabor + WLD. Gabor + GLBP performs slightly below these top performers. Table 5.2 presents a summary of EERs, d-prime values, and GAR values at FAR set to 0.1 and 0.001.

Table 5.2 EERs and GAR values: matching SWIR probes at 1.5 m to visible gallery

5.4.1.2 Long Standoff Distance

SWIR images at longer standoff distances (50 and 106 m in the TINDERS dataset) suffer some loss of quality due to air turbulence, insufficient illumination, and optical effects during data acquisition. This is immediately reflected in the matching scores. Figures 5.9 and 5.10 display the results of cross-spectral comparison at 50 and 106 m standoff distances, respectively. Gallery images are retained from the previous session. Note that in both figures, Gabor + LBP, Gabor + WLD, CMLD, and GOM display very similar performance. They are closely followed by Gabor + GLBP. The top performance in both cases is demonstrated by Gabor + LBP + GLBP + WLD. Once again, composite operators outperform single operators, as anticipated. However, at longer standoff distances, the matching performance of all the operators and their combinations except Gabor + LBP + GLBP + WLD drops by nearly a factor of two at 50 m and by a factor of 2.5 at 106 m. EERs, d-prime values, and GARs at FAR set to 0.1 and 0.001 are summarized in Tables 5.3 and 5.4.

Table 5.3 EERs and GAR values: matching SWIR probes at 50 m to visible gallery
Table 5.4 EERs and GAR values: matching SWIR probes at 106 m to visible gallery

5.4.2 Matching NIR Probes Against Visible Gallery

In the second experiment, NIR face images (probes) are matched to short-range visible face images (gallery). The results of matching parameterized by the standoff distances of 1.5, 50, and 106 m are shown in Figs. 5.11, 5.12, and 5.13, respectively.

Fig. 5.11

The results of cross-matching short-range (1.5 m) NIR probes and visible gallery images

Fig. 5.12

The results of cross-matching long-range (50 m) NIR probes and visible gallery images

Fig. 5.13

The results of cross-matching long-range (106 m) NIR probes and visible gallery images

5.4.2.1 Short Standoff Distance

Among the group of single operators, LBP and HOG outperform the others, followed by GLBP and Gabor. As in the case of SWIR probe images, the WLD operator performs poorly. All composite operators demonstrate relatively high performance, with ROC curves closely following one another. CMLD appears to outperform the other composite operators, closely followed by GOM and then by Gabor + LBP + GLBP + WLD. Table 5.5 summarizes the values of EERs, d-primes, and GARs at FAR equal to 0.1 and 0.001.

Table 5.5 EERs and GAR values: matching NIR probes at 1.5 m to visible gallery

5.4.2.2 Long Standoff Distance

Long-range NIR probes display markedly different performance. As can be seen from Fig. 5.7, NIR images at 106 m have much lower contrast and overall quality than NIR images at 50 m. This difference in image quality is immediately reflected in the matching performance of the two sets of probes (50 m and 106 m) and in the relative ranking of the 11 operators. Figures 5.12 and 5.13 display the cross-matching results for the two standoff distances (50 and 106 m, respectively). Comparing the composite operators, at 50 m Gabor + LBP + GLBP + WLD, CMLD, and GOM perform equally well; their performance is very close to, and only slightly degraded from, what they demonstrate at 1.5 m. These three ROCs are closely followed by the ROCs of Gabor + GLBP and Gabor + WLD. At 106 m, NIR probes do not perform as well. In fact, the performance of NIR images encoded with composite operators drops by at least a factor of three compared to the same operators applied to NIR at 50 m. Figure 5.13 indicates that GOM, followed by CMLD, Gabor + LBP + GLBP + WLD, and Gabor + GLBP (where GLBP is applied to phase images), is more robust to degraded image quality in the NIR spectrum than the other composite operators. Among the single operators, Gabor and HOG still outperform the others at both standoff distances. Tables 5.6 and 5.7 present a summary of EERs, d-primes, and GARs at FAR set to 0.1 and 0.001 for the 50 and 106 m standoff distances, respectively.

Table 5.6 EERs and GAR values: matching NIR probes at 50 m to visible gallery
Table 5.7 EERs and GAR values: matching NIR probes at 106 m to visible gallery

5.5 Brief Analysis

  1.

    Combining a Gabor filter bank with other local operators results in considerably improved performance compared to the individual local operators. This holds for both short and long standoff distances and for both types of cross-spectral matching performed in this chapter.

  2.

    As anticipated, the quality of active-IR probes affects matching performance. In this chapter, probe quality is a function of standoff distance. We use an adaptive sharpness measure [51] to calculate the image quality of the probes in both SWIR and NIR spectra at all standoff distances (see Table 5.8 for the values). From the results, the matching performance of SWIR data degrades with standoff distance faster than that of NIR images. However, the overall sharpness measure values (and thus the matching performance) of SWIR images are higher than those of NIR images.

    Table 5.8 Sharpness measure of the probes in SWIR and NIR at different standoff distances
  3.

    Among the five individual operators, HOG outperforms the others, followed by LBP and Gabor, for the case of the 1.5-m standoff distance and SWIR probes. For the case of the 50-m standoff distance and SWIR probes, LBP and HOG perform nearly equally well, followed by Gabor. For the case of 106 m and SWIR probes, LBP and Gabor each outperform HOG. This leads to the conclusion that LBP and Gabor are more robust to data acquisition noise than HOG.

  4.

    For the case of NIR probes and the 1.5-m standoff distance, LBP and HOG perform equally well, their ROC curves very closely following one another. For the case of NIR probes at both 50 and 106 m standoff distances, Gabor filters substantially outperform HOG and LBP, which continue to show very similar performance. Thus, poor-quality NIR images should be encoded with Gabor filters for robust cross-matching performance.

  5.

    All composite operators, in which Gabor filters are followed by the application of other local operators, perform equally well in nearly all cases of standoff distance and for both types of active-IR probes. The performance of the Gabor + GLBP combination is slightly inferior to the other combinations in every case except that of NIR probes at the 106-m standoff distance, where Gabor + GLBP applied to the phase of face images demonstrated superior performance. Thus, Gabor + GLBP appears to be very robust to severe image degradation for the case of NIR probes.

  6.

    The multi-lobe operators, CMLD and GOM, and the composite operator Gabor + LBP + GLBP + WLD display the top performance at all three standoff distances for both types of cross-spectral matching.

  7.

    The improved performance of the composite operators comes at the cost of increased complexity. For a single operator, the feature vector (histogram) is formed from 2 outputs of the operator. For a composite operator, the feature vector is 16 times longer due to the 16 outputs of the Gabor filters, each encoded with a local operator. Each additional local operator (applied to the outputs of the Gabor filters) multiplies the length of the feature vector again. Although the size of the feature vector grows, most of the operations can be implemented in parallel, which makes the approach well suited to parallel computing hardware.
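The growth in descriptor length noted in item 7 can be made concrete with a few lines of arithmetic, using the parameters of this section (8 orientations × 2 scales = 16 Gabor responses, local operators computed at 2 radii). The number of histogram bins and spatial regions below are illustrative placeholders, not values stated in the chapter.

```python
# Back-of-the-envelope descriptor sizes; BINS and REGIONS are assumed.
BINS = 256                       # histogram bins per encoded image (assumed)
REGIONS = 1                      # spatial face regions (assumed)

single = 2 * BINS * REGIONS      # one local operator computed at 2 radii
composite_one = 16 * single      # 16 Gabor outputs re-encoded: 16x longer
composite_three = 3 * composite_one  # LBP + GLBP + WLD on the Gabor outputs

print(single, composite_one, composite_three)
```

Each of the 16 Gabor responses is encoded independently, so the per-response histograms can be computed in parallel before concatenation.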

5.6 Summary

This chapter presented a short overview of recent advances in the field of heterogeneous face recognition, emphasizing local operators developed for matching active-IR face probes to a gallery composed of high-quality visible face images. A brief description of each individual and composite operator (11 in total) was provided. The individual operators included LBP, GLBP, WLD, HOG, and Gabor filters. The composite operators included Gabor + LBP, Gabor + GLBP, Gabor + WLD, Gabor + LBP + GLBP + WLD, GOM, and CMLD. We considered a very specific framework for cross-matching heterogeneous face images: each image is first aligned, cropped, and enhanced, then filtered and encoded using local operators. The outputs of the local operators are converted into a histogram representation and compared against the histogram representations of gallery images by means of a symmetric Kullback-Leibler distance. This cross-matching approach does not require any training or learning, and it is shown to be robust when applied to a variety of heterogeneous datasets.
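The symmetric Kullback-Leibler distance used for histogram comparison can be sketched as follows. The smoothing constant `eps`, which guards against empty histogram bins, is an implementation choice, not a value specified in the chapter.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-10):
    """Symmetric Kullback-Leibler distance between two histograms:
    KL(p||q) + KL(q||p). Inputs are renormalized after adding eps,
    so zero bins do not produce infinities."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```

A probe is assigned the identity of the gallery histogram with the smallest distance; being a distance rather than a similarity, lower values indicate better matches.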

We presented the results of matching SWIR and NIR facial images to visible facial images. Both short (1.5 m) and long (50 and 106 m) standoff distances were considered. The results were documented in figures and tables: ROC curves as well as GARs at two specific levels of FAR, EERs, and d-prime values. The combination of Gabor filters followed by other local operators substantially outperformed the original LBP and the other individual operators. As the standoff distance increased, the matching performance of all the operators dropped. This drop was attributed to the relatively low quality of imagery at long standoff distances (SWIR vs. visible and NIR vs. visible).