1 Introduction

Biometry has attracted a great deal of attention in recent years and has been widely used, owing to its high performance, in many areas such as surveillance, identification, and human-computer interaction [5, 15, 18, 30, 31, 34, 40, 50, 58, 60, 71]. Individuals have biological characteristics, also called metrics, that distinguish them from others [27]. Extracting behavioral and/or physiological characteristics of individuals in order to discriminate among them is called biometric recognition. Face, iris, retina, ear, and palm are prominent discriminative physiological characteristics, whereas voice, typing rhythm, and gait are behavioral characteristics, also known as behaviometrics [48].

The face is one of the leading biometrics preferred for individual discrimination because it distinguishes individuals with high accuracy and little human participation. Face data can be easily collected and processed in real time using remote devices such as cameras, without any human intervention [14, 26].

As with many other images, face data are exposed to disruptive external factors such as noise, illumination, pose variations, and rotation. Variations in pose, illumination, and direction, together with random noise, inhibit a pixel-by-pixel comparison between images. Facial recognition has therefore attracted great interest from researchers aiming to overcome these challenges. Where pixel-to-pixel comparison does not perform well, texture helps in image classification. Texture plays a key role in computer pattern recognition, especially in image-related applications [32, 33]. Although there is no globally accepted definition, texture can be described as the result of recurring local patterns throughout the picture [47]. As with other types of images, face images also exhibit texture. Features are extracted from the texture of face images and then analyzed and classified to distinguish individuals. To qualify as high quality, a feature set must meet two criteria. First, its computational complexity should be low enough for real-time applications. Second, it should express the properties of the texture as well as possible, so that textures can be separated reliably during classification [41].

Numerous studies have been conducted to propose descriptors with low computational complexity and high representational power. These methods can basically be grouped under two headings: holistic and local appearance features [39]. Holistic techniques analyze the entire face image and extract global information to recognize a subject. This global information is obtained by analyzing pixel relationships over the whole image, from which the corresponding features are extracted. These features represent the global characteristic of the image that uniquely discriminates the face from others [64]. The best-known holistic approaches are Principal Component Analysis (PCA) [65], Linear Discriminant Analysis (LDA) [7], and Independent Component Analysis (ICA) [10]. The hallmark of PCA is that it reduces dimensionality by mapping a high-dimensional image space to a low-dimensional orthogonal space; the mapping is a linear transformation chosen to minimize the mean-square reconstruction error. LDA seeks a linear transformation that maximizes inter-class variance and minimizes intra-class variance. ICA, first introduced by Herault and Jutten [69], seeks a linear transformation that minimizes the statistical dependencies between the components of a vector. Many subsequent studies [22, 23, 28, 29, 54, 56, 63, 72] build on these fundamental methods and have sought to improve their performance by introducing new ideas.

Local approaches, unlike holistic ones, reveal local distinctive features that are more resistant to changes in expression and illumination. To this end, several research studies (LBP [2, 59], LPQ [73], LDP [25], LDN [52, 53], HoG [11], LTP [62], Gabor [43, 74]) have been performed to obtain a highly distinctive and proper texture representation [8]. Among these, LBP has been a promising pioneer for follow-up studies due to its high performance and computational efficiency [12]. LBP identifies local textures by comparing each pixel with its 3 × 3 local neighborhood. Each pixel is then replaced by the eight-bit result of the comparison step, where each bit encodes the magnitude comparison of the corresponding neighbor with the reference pixel. If the intensity of the neighboring pixel is greater than or equal to that of the reference pixel, the corresponding bit is set to one; otherwise it is set to zero.
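As an illustration, the following minimal NumPy sketch implements the basic 3 × 3 LBP operator described above; the function names and the bit ordering are illustrative choices rather than part of any specific library.

```python
import numpy as np

def lbp_code(block):
    """Basic 3x3 LBP: compare the eight neighbors with the central pixel.

    A bit is set to 1 when the neighbor's intensity is greater than or equal
    to the center, 0 otherwise; the eight bits are packed into one byte.
    """
    center = block[1, 1]
    # Neighbors read clockwise starting from the top-left corner.
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_image(gray):
    """Replace every interior pixel of a grayscale image by its LBP code."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i - 1, j - 1] = lbp_code(gray[i - 1:i + 2, j - 1:j + 2])
    return out
```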

One of the most obvious concerns in facial recognition is undoubtedly the failure of feature descriptors when the image is rotated. A robust descriptor, whether local or holistic, should work independently of the orientation of the image, i.e., it should reflect the same image characteristics under all conditions. The basic LBP does not consider rotational variations, so a number of follow-up improvements [70, 78] have been proposed to gain resistance to rotation. Furthermore, color channels contain significant information, yet most studies to date have derived features from monochrome images. In this paper, we propose a compound method that blends three main distinguishing ideas. First, a rotation-invariant local descriptor is proposed that is also resistant to variations in illumination and facial expression. Second, the power of the color channels is exploited by investigating the statistical properties of matrices built from the co-occurrences of the local descriptor values across spectral bands. Finally, the information stored in the multi-spectral co-occurrence matrices is represented by orthogonal polynomial coefficients to reinforce the discriminative power of the proposed method.

The rest of this paper is organized as follows. Section II briefly describes the proposed method and gives basic information about the pioneering ideas behind it. Section III presents the simulation results and discussions. Finally, Section IV concludes the paper.

2 RIMFRA

This section describes the proposed method in detail. The main steps of the overall process are as follows. First, the multi-spectral rotation-invariant local descriptor matrices are computed from the RGB bands of the raw image. After the descriptor matrices are formed, the multi-spectral co-occurrence matrices are calculated. In the last step, orthogonal polynomial coefficients are obtained from each co-occurrence matrix. Finally, the coefficients obtained from all co-occurrence matrices are concatenated to form the final feature vector for each facial image. Figure 1 depicts the complete process.

Fig. 1
figure 1

The block diagram of the entire process
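The compact sketch below summarizes this flow in Python. It is illustrative only and assumes the helper functions rinld_descriptor, multispectral_cooccurrence, and tchebichef_signature, which are sketched in Sections 2.1-2.3.

```python
import numpy as np

def rimfra_signature(rgb_image):
    """High-level sketch of the RIMFRA pipeline (helper names are illustrative).

    1. Compute one RinLd descriptor matrix per color band.
    2. Build the six multi-spectral co-occurrence matrices from those bands.
    3. Concatenate the Tchebichef coefficients of every co-occurrence matrix.
    """
    bands = [rgb_image[:, :, c] for c in range(3)]                # R, G, B
    rinld = [rinld_descriptor(band) for band in bands]            # step 1
    pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]      # RR, GG, BB, RG, RB, GB
    cms = [multispectral_cooccurrence(rinld[a], rinld[b]) for a, b in pairs]  # step 2
    return np.concatenate([tchebichef_signature(cm) for cm in cms])           # step 3
```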

2.1 RinLd

The local texture descriptors, the prominent ones of which were mentioned in the previous section, provide promising discriminative performance. However, two of the most critical requirements for such descriptors are that they be rotation invariant and resistant to changes in illumination. As mentioned earlier, LBP is one of the basic and leading local descriptors that captures the local structure of images. LBP defines the relationship between the central pixel and its neighboring pixels in an N × N block (N indicates the width and height of the block). However, the original LBP does not account for rotational variations; that is, the value of the descriptor calculated from a sub-region of the image changes when the image is rotated. Another deficiency of LBP is that it does not consider the intensity value of the central (reference) pixel. Therefore, pixels with different intensity values may be represented by identical values in the new domain. This undesirable identical representation of different pixels can be avoided by taking the intensity of the reference pixel into account.

In this paper, we propose a new local descriptor that is resistant to rotational and illumination variations. Instead of working on monochrome images, the proposed method operates on RGB images. The three color bands of the image are divided into N × N blocks. Subsequently, the neighboring pixels of a reference pixel are sorted into a vector in descending order of their intensity values, as shown below:

$$ SI_{N\times N}={sort}_{dsc}\left({I}_{N\times N}\right) $$
(1)

where SINxN and INxN denote the sorted and unsorted intensity values of the neighboring pixels, respectively. The intensity value of the reference pixel is subtracted from each element of the sorted vector. If the absolute value of the result is greater than the threshold value T, a 1 is assigned to the corresponding position of a new binary vector; otherwise, a 0 is assigned.

$$ {B}_{I_c}(i)=\begin{cases}1 & if\ \left|{SI}_i-{I}_c\right|>T,\ i=1,2,\dots ,N-1\\ 0 & otherwise\end{cases} $$
(2)

The threshold value is not held constant throughout the image; on the contrary, it is dynamic and depends on the mean intensity value in the block. The intensity of the reference pixel is also included when calculating the mean value of the block. T is calculated as follows:

$$ T=\left|{I}_c-\frac{\sum_{i=1}^N{SI}_i}{N}\right| $$
(3)

The resulting binary vector represents the intensity comparison between the reference pixel and its neighbors, yet by itself it does not prevent multiple pixels with different intensity values from being represented by identical values in the new domain. Thus, the resulting value is rescaled by taking into account the intensity value of the reference pixel as follows:

$$ {SB}_{I_c}={B}_{I_c}\times \left({I}_c/255\right) $$
(4)

The basic LBP and some of its derivatives consider only the sign of the relationship between the reference pixel and its neighbors; the information concealed in the magnitude of the difference is discarded. As a result, two pixels with different intensities may receive identical values in the new domain. The most effective way to address this is to take the intensity value of the reference pixel into account. RIMFRA remedies this in two separate steps: it keeps the threshold value dynamic, and it scales the resulting value by the intensity of the reference pixel. Figure 2 illustrates how identical binary patterns can arise for pixels with different intensities and how RIMFRA handles this situation.

Fig. 2
figure 2

a Identical LBP codes assigned to different patterns b RIMFRA overcomes the miss-assignment situation
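A minimal sketch of the RinLd computation for one 3 × 3 block is given below. It assumes that the binary vector of Eq. (2) is read as an unsigned integer before the scaling of Eq. (4), and that the block mean of Eq. (3) includes the reference pixel; both are interpretations of the description above rather than details stated explicitly there.

```python
import numpy as np

def rinld_value(block):
    """Sketch of the RinLd descriptor for one 3x3 block of a single color band.

    Assumed interpretation: the binary vector from Eq. (2) is read as an
    unsigned integer before being scaled by I_c / 255 as in Eq. (4).
    """
    center = float(block[1, 1])
    neighbors = np.delete(block.astype(float).flatten(), 4)   # the 8 neighbors
    si = np.sort(neighbors)[::-1]                              # Eq. (1): descending sort
    t = abs(center - np.mean(np.append(si, center)))           # Eq. (3): dynamic threshold
    bits = (np.abs(si - center) > t).astype(int)               # Eq. (2)
    b_value = int("".join(map(str, bits)), 2)                  # binary vector -> integer
    return b_value * (center / 255.0)                          # Eq. (4)

def rinld_descriptor(band):
    """Apply rinld_value over every interior pixel of one color band."""
    h, w = band.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i - 1, j - 1] = rinld_value(band[i - 1:i + 2, j - 1:j + 2])
    return out
```

Because the neighbors are sorted by intensity before thresholding, the descriptor value does not depend on where a rotation places them around the reference pixel, which is what gives the descriptor its rotation invariance.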

For each image, a total of three local descriptor matrices ((RinLd)R, (RinLd)G, and (RinLd)B) are created, one for each color band of the image. Following this stage, the constructed matrices are fed into the next step in the process described in the next section.

2.2 Multi-spectral co-occurrence matrices

Modern image acquisition and processing systems are capable of expressing and operating on colors in different spaces, namely RGB (Red, Green, Blue), HSV (Hue, Saturation, Value), and CIE Lab. Color images are commonly represented in RGB; the information carried in the image is the brightness level of each band. Although many applications require RGB to be converted to other color domains, such as HSV, RGB-based computer vision systems are simpler and more economical than the alternatives. Although the RGB color space is device-dependent, which seriously disrupts uniformity [49], it still performs very successfully in areas such as calibration and classification [24]. For example, researchers have analyzed images of apples and successfully estimated the amount of fruit they contain [61]. In addition, some researchers have accurately predicted geometric properties of different crop species by applying RGB-based image processing techniques [19, 35, 37, 45, 46, 68]. Moreover, it has been verified that methods based on color analysis are a reliable means of discrimination and retrieval in face detection and tracking. Furthermore, although skin color varies between individuals, the main distinguishing parameter has been shown to be intensity rather than chrominance [77].

The gray-level co-occurrence matrix (GLCM), introduced by Haralick [21] in the early 1970s, has proven to be an efficient way of representing texture [13]. GLCMs are formed by counting the occurrences of intensity-value patterns in the image. Haralick proposed a set of statistical features obtained from GLCMs that achieved a success rate of 84% at a high operational speed [3, 20]. Although it is an old method, it has been a reference and an inspiration in many fields such as iris recognition [75], image segmentation [1], and CBIR in videos [9, 36]. However, the basic GLCM runs on gray-level images and discards the information carried by the color bands. Arvis et al. [6] proposed a method that incorporates the color bands of the pixels during the construction of the co-occurrence matrices. A GLCM contains information on the spatial relationships of intensity values and their frequencies of occurrence. Let f be an image whose intensity values vary in the range [0, L-1]. The value in row i, column j of the GLCM indicates the number of times the pixel pair (zi, zj) occurs in f with orientation Q. The orientation Q corresponds to a displacement vector d = (dx, dy | dx = dy = dg), where dg is the number of gaps between the pixels of interest; for adjacency, dg = 0. The orientation can also be represented by two parameters: the distance d between the intensities zi and zj and the angle α. The orientation of the pixel pattern can take four directions: 0°, 45°, 90°, and 135°. That is, each image has four GLCMs (one per angle) for a given d. The size of a GLCM depends on the number of discrete intensity values in the image: if the intensity values vary in the range [0, L-1], the size of the GLCM is L × L. With the basic GLCM method applied to the color bands of the image, a total of twenty-four GLCMs are generated, six for each direction. In RIMFRA, the local descriptor matrices generated for each band in the first stage are fed into the multi-spectral co-occurrence construction process, rather than the raw image being used directly as in previous studies. The output of this process is the set of multi-spectral co-occurrence matrices CM(RinLd)R(RinLd)R, CM(RinLd)G(RinLd)G, CM(RinLd)B(RinLd)B, CM(RinLd)R(RinLd)G, CM(RinLd)R(RinLd)B, and CM(RinLd)G(RinLd)B.
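The sketch below illustrates how one multi-spectral co-occurrence matrix can be built from a pair of RinLd descriptor matrices. The number of quantization levels and the displacement (direction) are illustrative parameters, not values prescribed by the method.

```python
import numpy as np

def multispectral_cooccurrence(band_a, band_b, levels=16, dx=0, dy=1):
    """Sketch of a multi-spectral co-occurrence matrix (after Arvis et al.).

    Entry (i, j) counts how often a pixel quantized to level i in band_a has a
    neighbor at displacement (dy, dx) quantized to level j in band_b.
    """
    # Quantize both descriptor matrices to a common set of discrete levels.
    lo = min(band_a.min(), band_b.min())
    hi = max(band_a.max(), band_b.max())
    qa = np.floor((band_a - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)
    qb = np.floor((band_b - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)

    cm = np.zeros((levels, levels), dtype=np.int64)
    h, w = qa.shape
    # Count every valid (pixel, displaced-pixel) pair across the two bands.
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            cm[qa[y, x], qb[y + dy, x + dx]] += 1
    return cm
```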

2.3 Orthogonal polynomial decomposition

In the proposed framework, the final stage of feature extraction is the orthogonal polynomial decomposition. Orthogonal polynomials, such as Tchebichef, have been shown to be an efficient means of representing 2D functions [57]. Some orthogonal polynomials, such as Hermite and Zernike, have also been used in previous studies for texture extraction and classification [38, 67]. However, Tchebichef polynomials have been found to offer better performance than the others [4]. Previous studies fed the raw image directly into the orthogonal polynomial decomposition. However, as described in the simulation results section, using the multi-spectral co-occurrence matrices as input instead of the raw image yields higher facial discrimination performance. Since most of the information about the structure of the matrix is concentrated in the first few moments while the finer details are expressed in the higher-order moments, the less important higher-order terms can be eliminated by truncating the expansion. Thus, unnecessary complexity is avoided.

The decomposition of the input matrix into moment orders Mpq is given in the following:

$$ {M}_{pq}=\frac{1}{\rho (p)\rho (q)}{\sum}_{x=0}^{N-1}{\sum}_{y=0}^{N-1}{m}_p(x)w(x){m}_q(y)w(y)f\left(x,y\right) $$
(5)

where 0 ≤ p, q, x, y ≤ N-1; mn(x) represents a set of orthogonal polynomials, and w(x) and ρ(·) denote the weight function and the normalization (squared-norm) term, respectively.

The mathematical representation of the Tchebichef orthogonal polynomials is given in the following equation:

$$ {m}_n(x)=n!{\sum}_{k=0}^n{\left(-1\right)}^{n-k}\left(\begin{array}{c}N-1-k\\ {}n-k\end{array}\right)\left(\begin{array}{c}n+k\\ {}n\end{array}\right)\left(\begin{array}{c}x\\ {}k\end{array}\right) $$
(6)
$$ \rho (n)=(2n)!\left(\begin{array}{c}N+n\\ {}2n+1\end{array}\right) $$
(7)
$$ w(x)=1 $$
(8)

where mn(x), ρ(n), and w(x) denote the nth Tchebichef polynomial, the normalization term, and the weight function, respectively. In this study, the number of coefficients calculated for a single input matrix of size N × N is 2N-2; hence, a signature comprising 6(2N-2) coefficients is ultimately generated. Figure 3 depicts the orthogonal polynomial decomposition of a sample matrix:

Fig. 3
figure 3

Tchebichef signature of a sample matrix
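The following sketch implements Eqs. (5)-(8) for a single co-occurrence matrix. Which particular 2N-2 moments constitute the signature is not fully specified above, so the sketch keeps the row moments M_p0 and the column moments M_0q (p, q = 1, ..., N-1) as one plausible choice.

```python
import math
import numpy as np

def tcheb_poly(n, x, N):
    """Discrete Tchebichef polynomial m_n(x) of Eq. (6)."""
    return math.factorial(n) * sum(
        (-1) ** (n - k)
        * math.comb(N - 1 - k, n - k)
        * math.comb(n + k, n)
        * math.comb(x, k)
        for k in range(n + 1)
    )

def tcheb_rho(n, N):
    """Normalization term rho(n) of Eq. (7)."""
    return math.factorial(2 * n) * math.comb(N + n, 2 * n + 1)

def tcheb_moment(f, p, q):
    """Moment M_pq of an N x N matrix f, following Eq. (5) with w(x) = 1."""
    N = f.shape[0]
    mp = np.array([tcheb_poly(p, x, N) for x in range(N)], dtype=float)
    mq = np.array([tcheb_poly(q, y, N) for y in range(N)], dtype=float)
    return (mp @ f @ mq) / (tcheb_rho(p, N) * tcheb_rho(q, N))

def tchebichef_signature(cm):
    """Illustrative 2N-2 coefficient signature of one co-occurrence matrix."""
    N = cm.shape[0]
    f = cm.astype(float)
    return np.array([tcheb_moment(f, p, 0) for p in range(1, N)]
                    + [tcheb_moment(f, 0, q) for q in range(1, N)])
```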

3 Simulation results and discussions

Various experiments are conducted to measure and analyze the performance of the proposed framework under different circumstances. The evaluation is performed on five benchmark databases, namely Face94 [16], JAFFE [42], YALE (http://vision.ucsd.edu/content/yale-face-database), CAS-PEAL-R1 [17], and ORL [55].

To ensure uniformity, some preprocessing is applied to each image. Each image is first scaled to a size of 64 × 64. Following the scaling stage, face extraction is performed using the Viola-Jones algorithm [66] to eliminate the effect of unnecessary background and foreground factors. To accurately measure and analyze the performance of the proposed framework and compare it with the state-of-the-art methods, hold-out testing is used. That is, if an individual has N images in a data set, 80% of them and their average image are used for training, and the rest are used for testing. When creating the average face image, each individual's images are aligned with respect to the individual's eyes. A sample average face calculation for an individual is given in Fig. 4:

Fig. 4
figure 4

Sample face images of an individual and his average face image
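One possible realization of this preprocessing and hold-out split is sketched below, using OpenCV's stock Haar-cascade detector as the Viola-Jones implementation. Detection is performed before resizing here for practical reasons, and the eye-based alignment step is omitted; both are simplifications of the procedure described above.

```python
import cv2
import numpy as np

# Haar-cascade face detector, one common Viola-Jones implementation
# (file name from the stock OpenCV distribution).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(bgr_image):
    """Crop the detected face and scale the result to 64x64."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray)
    if len(faces):
        x, y, w, h = faces[0]
        bgr_image = bgr_image[y:y + h, x:x + w]
    return cv2.resize(bgr_image, (64, 64))

def hold_out_split(images, train_ratio=0.8):
    """80/20 hold-out split of one individual's images; the average face of the
    training portion is appended to the training set, as described above."""
    n_train = int(round(train_ratio * len(images)))
    train, test = images[:n_train], images[n_train:]
    avg_face = np.mean(np.stack(train).astype(float), axis=0).astype(np.uint8)
    return train + [avg_face], test
```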

The performance analysis of the proposed framework is performed in two stages. First, the stability and resistance of the method to rotation, illumination changes, and noise are demonstrated. Then, the recognition performance of the proposed method is analyzed and compared with state-of-the-art methods such as LBP, LDP, LDNP (Local Directional Number Pattern) [51], Gabor features, HoG (Histogram of Oriented Gradients), LTP, LTeTP (Local Tetra Pattern) [44], and LDrvP (Local Derivative Pattern) [76] through extensive simulations.

3.1 The stability analysis

As mentioned earlier, rotational changes, variations in illumination, and noise significantly affect recognition performance. Hence, the proposed local descriptor and the overall architecture should not fail under these challenging circumstances and should remain stable enough to fulfill the recognition task satisfactorily. First, we show how the proposed local descriptor remains stable under rotational changes. Next, RIMFRA's resistance to lighting changes is verified with extensive simulations. The last part of this section presents the simulation results that show the behavior of RIMFRA under varying noise conditions.

3.1.1 Rotation-variation resistance analysis

A solid texture descriptor should be stable and produce similar features even if the original image is rotated, because the content of the image does not change and still belongs to the same person. It is therefore important to demonstrate and verify the behavior of the proposed method when the image is subject to rotational changes. Figures 5 and 6 show the stability performance of RIMFRA. In Fig. 5, a sample matrix representing one block of an image is shown. As depicted in the figure, the local descriptor extracted from the block does not change even if the matrix is rotated 90° counter-clockwise.

Fig. 5
figure 5

Demonstration of the robustness of the proposed method to the rotational changes in an exemplary matrix

Fig. 6
figure 6

Sample images and their rotated versions of two individuals in the Face94 database

Figure 6 shows the face images of two people in the Face94 database and their 90° rotated versions.

The similarity performance of the proposed method under rotational variation is analyzed and compared with state-of-the-art methods from the literature. First, the textural features of RIMFRA and the other methods are extracted from the face images and their 90° rotated variants. Next, the similarity analysis is carried out by calculating the Mean Square Error (MSE) between these sets of features, as sketched below. Figure 7 shows the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the Face94 database and their 90° rotated versions.
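The rotation-robustness measurement can be summarized by the small helper below, which computes the MSE and the Pearson correlation between the feature sets of an image and its 90° rotated version; feature_fn stands for any descriptor under test and is an illustrative placeholder.

```python
import numpy as np

def rotation_similarity(feature_fn, image):
    """MSE and correlation between the features of an image and its 90-degree rotation.

    A low MSE and a high correlation indicate rotational robustness of the descriptor.
    """
    rotated = np.rot90(image)                        # 90 degrees counter-clockwise
    f_orig = np.asarray(feature_fn(image), dtype=float)
    f_rot = np.asarray(feature_fn(rotated), dtype=float)
    mse = np.mean((f_orig - f_rot) ** 2)
    corr = np.corrcoef(f_orig, f_rot)[0, 1]          # Pearson correlation
    return mse, corr
```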

Fig. 7
figure 7

RIMFRA feature set histograms of the images given in Fig. 6(a-d)

The histograms in the first column belong to the first individual's face images, and those in the right column belong to the second individual. The histograms in the same column are clearly very similar, which is the desired outcome and confirms the robustness of RIMFRA against rotational changes. The similarity analysis is conducted on images from the different databases to verify and compare the results fairly. Tables 1, 2, 3, 4 and 5 and Figs. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 present the similarity and correlation values between the feature sets of the sample images and their 90° rotated versions selected from the different datasets. No image adjustment or enhancement technique is applied, in order to assess the performance of the methods on raw images.

Table 1 The MSE values calculated in the Face94 sample images
Table 2 MSE values calculated in the YALE sample images
Table 3 MSE values calculated in the CAS-PEAL-R1 sample images
Table 4 MSE values calculated in the JAFFE sample images
Table 5 MSE values calculated on the ORL sample images
Fig. 8
figure 8

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in Face94 dataset

Fig. 9
figure 9

Sample images and their rotated versions of two individuals in the YALE database

Fig. 10
figure 10

RIMFRA feature set histograms of the images given in Fig. 9(a-d)

Fig. 11
figure 11

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in YALE dataset

Fig. 12
figure 12

Sample images and their rotated versions of two individuals in the CAS-PEAL-R1 database

Fig. 13
figure 13

RIMFRA feature set histograms of the images given in Fig. 12(a-d)

Fig. 14
figure 14

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in CAS-PEAL-R1 dataset

Fig. 15
figure 15

Sample images and their rotated versions of two individuals in the JAFFE database

Fig. 16
figure 16

RIMFRA feature set histograms of the images given in Fig. 15(a-d)

Fig. 17
figure 17

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in JAFFE dataset

Fig. 18
figure 18

Sample images and their rotated versions of two individuals in the ORL database

Fig. 19
figure 19

RIMFRA feature set histograms of the images given in Fig. 18(a-d)

Fig. 20
figure 20

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in ORL dataset

Figure 9 shows the face images and their 90° rotated versions of two individuals in the YALE database.

Figure 10 shows the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the YALE database and 90° rotated versions thereof.

Figure 12 shows the face images and their 90° rotated versions of two individuals in the CAS-PEAL-R1 database.

Figure 13 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the CAS-PEAL-R1 database and 90° rotated versions thereof.

Figure 15 shows the face images and their 90° rotated versions of two individuals in the JAFFE database.

Figure 16 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the JAFFE database and 90° rotated versions thereof.

Figure 18 shows the face images and their 90° rotated versions of two individuals in the ORL database.

Figure 19 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the ORL database and 90° rotated versions thereof.

In all tables and figures, the column 1 and corr1 values represent the similarity and correlation results between the first image and its 90° rotated version, respectively. The column 2 and corr2 values indicate the similarity and correlation results between the second image and its 90° rotated version. The column 3 and corr3 values refer to the similarity and correlation results between the first image and the 90° rotated version of the second image. The column 4 and corr4 values represent the similarity and correlation results between the second image and the 90° rotated version of the first image. Inherently, a highly representative texture descriptor of an individual's face should remain similar even when the image of that individual is rotated, whereas the dissimilarity between images belonging to two different individuals should be high in order to differentiate them. Likewise, the correlation between images of the same individual should be high but should remain low between images of different individuals. As clearly shown in the tables for the different datasets, RIMFRA achieves consistent and highly accurate performance, producing results that meet these considerations.

3.1.2 Illumination-variation resistance analysis

The second stage of the analysis involves performance testing under illumination variations and comparison of the proposed method with other texture descriptors. Two types of analysis are performed. First, tests are performed on face images in the CAS-PEAL-R1 data set (Fig. 21) that are exposed to natural lighting variations.

Fig. 21
figure 21

Sample images of an individual in the CAS-PEAL-R1 database under different lighting conditions

Table 6 and Fig. 22 show the MSE and correlation values between the feature set of the first image and those of the others. Although CAS-PEAL-R1 is one of the most demanding data sets, since its facial images contain variations that are challenging for texture descriptors, RIMFRA competes with the most modern descriptors proposed in the literature.

Fig. 22
figure 22

Graphical demonstration of the correlation between the feature sets of a sample individual’s self-illuminated images in CAS-PEAL-R1 dataset

In the second stage of the illumination robustness analysis, an artificial, non-linear, non-uniform, third-order polynomial-based illumination effect is created and applied to the images in each dataset. Tables 7, 8, 9, 10 and 11 and Figs. 23, 24, 25, 26 and 27 demonstrate the MSE and correlation values calculated between the feature set of the original image of an individual from each dataset and those of its artificially illuminated versions.
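Since the exact polynomial is not reproduced above, the sketch below shows one way such a non-uniform, third-order polynomial illumination field can be generated; the coefficients and the horizontal-only variation are illustrative assumptions.

```python
import numpy as np

def polynomial_illumination(image, coeffs=(0.2, -0.4, 0.3, 1.0)):
    """Apply a non-uniform, third-order polynomial illumination field (sketch).

    The illustrative gain varies along the horizontal axis as
    g(x) = a*x^3 + b*x^2 + c*x + d, with x normalized to [0, 1].
    """
    h, w = image.shape[:2]
    x = np.linspace(0.0, 1.0, w)
    a, b, c, d = coeffs
    gain = a * x ** 3 + b * x ** 2 + c * x + d               # per-column gain
    gain = gain.reshape(1, w, *([1] * (image.ndim - 2)))     # broadcast over rows/channels
    out = image.astype(float) * gain
    return np.clip(out, 0, 255).astype(np.uint8)
```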

Table 7 MSE values calculated between the feature sets of the image of a sample individual in the Face94 data set and the feature sets of the artificially illuminated versions of this image
Fig. 23
figure 23

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in Face94 dataset

Fig. 24
figure 24

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in YALE dataset

Fig. 25
figure 25

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in CAS-PEAL-R1 dataset

Fig. 26
figure 26

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in JAFFE dataset

Fig. 27
figure 27

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in ORL dataset

Table 6 MSE values calculated on the CAS-PEAL-R1 sample images
Table 8 MSE values calculated between the feature sets of the image of a sample individual in the YALE data set and the feature sets of the artificially illuminated versions of this image
Table 9 MSE values calculated between the feature sets of the image of a sample individual in the CAS-PEAL-R1 data set and the feature sets of the artificially illuminated versions of this image
Table 10 MSE values calculated between the feature sets of the image of a sample individual in the JAFFE data set and the feature sets of the artificially illuminated versions of this image
Table 11 MSE values calculated between the feature sets of the image of a sample individual in the ORL data set and the feature sets of the artificially illuminated versions of this image
Table 12 MSE values calculated between the feature sets of a sample individual’s image in each database and its artificially salt-pepper noisy version
Table 13 MSE values calculated between the feature sets of a sample individual’s image in Face94 dataset and feature sets of its artificially Gaussian noise exposed versions
Table 14 MSE values calculated between the feature sets of a sample individual’s image in YALE dataset and feature sets of its artificially Gaussian noise exposed versions
Table 15 MSE values calculated between the feature sets of a sample individual’s image in JAFFE dataset and feature sets of its artificially Gaussian noise exposed versions
Table 16 MSE values calculated between the feature sets of a sample individual’s image in CAS-PEAL-R1 dataset and feature sets of its artificially Gaussian noise exposed versions
Table 17 MSE values calculated between the feature sets of a sample individual’s image in ORL dataset and feature sets of its artificially Gaussian noise exposed versions
Table 18 The recognition performance results regarding supervised training
Table 19 The recognition performance results regarding similarity analysis
Table 20 The recognition performance results regarding similarity analysis

As presented in the tables above, RIMFRA offers promising performance compared with the state of the art in terms of robustness against illumination variations.

3.1.3 Noise resistance analysis

Another compelling aspect to consider during the performance analysis of a texture descriptor is how resistant it is to noise without any noise filtering. Therefore, no pre-processing to mitigate the effects of noise is applied in the simulations, in order to accurately analyze the noise resistance. Two types of noise, salt-and-pepper and Gaussian, are applied to the images in each database. The salt-and-pepper noise is handled first. Figure 28 shows an exemplary image, its salt-and-pepper-noise-exposed version, and the RIMFRA feature set histograms of both. Clearly, the histogram of the multi-spectral orthogonal signature of the original image and that of the noisy version are very similar.

Fig. 28
figure 28

Demonstration of the robustness of the proposed method against salt-pepper noise

Table 12 and Fig. 29 show the similarity and dissimilarity values between a sample image in each dataset and its salt-and-pepper-noise-affected version. For a method to be considered resistant to salt-and-pepper noise, the MSE between the feature set of the image and that of the noisy version should be low, whereas the correlation between them should be high. In other words, a low MSE and a high correlation indicate that the method is resistant to noise. As can be seen, RIMFRA is among the best methods in terms of low MSE and high correlation.

Fig. 29
figure 29

Graphical demonstration of the correlation between the feature sets of sample images in each database and their salt-pepper noisy versions

The second noise resistance analysis is performed by adding Gaussian noise with different variance (σ2) values to the images in each dataset. As in the salt-and-pepper noise analysis, the similarity and correlation values are measured between the feature sets of the original images and those of their Gaussian-noise-exposed versions. Tables 13, 14, 15, 16 and 17 and Figs. 30, 31, 32, 33 and 34 demonstrate the dissimilarities and correlations between the feature sets of the original images and the noisy ones, respectively. A minimal sketch of how the two noise types can be generated for such tests is given below.
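The following helpers generate the two noise models; the noise density and variance values shown are illustrative defaults rather than the exact settings used in the experiments.

```python
import numpy as np

def add_salt_pepper(image, density=0.05, rng=None):
    """Flip a fraction of pixels to 0 or 255 (salt-and-pepper noise)."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    mask = rng.random(image.shape[:2]) < density   # pixels to corrupt
    salt = rng.random(image.shape[:2]) < 0.5       # half salt, half pepper
    out[mask & salt] = 255
    out[mask & ~salt] = 0
    return out

def add_gaussian(image, variance=0.01, rng=None):
    """Add zero-mean Gaussian noise with the given variance (intensities scaled to [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(float) / 255.0 + rng.normal(0.0, np.sqrt(variance), image.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)
```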

Fig. 30
figure 30

Graphical demonstration of the correlation between the feature sets of a sample image in Face94 dataset and feature sets of its Gaussian noisy versions

Fig. 31
figure 31

Graphical demonstration of the correlation between the feature sets of a sample image in YALE dataset and feature sets of its Gaussian noisy versions

Fig. 32
figure 32

Graphical demonstration of the correlation between the feature sets of a sample image in JAFFE dataset and feature sets of its Gaussian noisy versions

Fig. 33
figure 33

Graphical demonstration of the correlation between the feature sets of a sample image in CAS-PEAL-R1 dataset and feature sets of its Gaussian noisy versions

Fig. 34
figure 34

Graphical demonstration of the correlation between the feature sets of a sample image in ORL dataset and feature sets of its Gaussian noisy versions

3.2 The recognition performance analysis

The recognition performance analysis of RIMFRA is done in two ways: (1) training-based recognition performance analysis and (2) similarity-based recognition performance analysis. Since RIMFRA runs on color images, all images in each non-colored dataset are first converted to the RGB color space. To do this, a conversion map is generated from a reference colored image and its non-colored version. As is known, an exact gray-to-RGB conversion map does not exist, so the best possible approximate map is sought. Because the images of the Face94 data set are originally in color, they give the best results during the simulations. The other data sets consist of non-colored images, so these images are first converted to RGB and then processed. Since this conversion is approximate, it naturally affects the results.

3.2.1 Training-based recognition performance analysis

At this stage, supervised learning is used for classification: 80% of each individual's images in each dataset are used for training and the remaining images are used for testing. Table 18 shows the performance results of RIMFRA and the state-of-the-art methods in terms of recognition accuracy. As can be seen in Table 18, RIMFRA performs promisingly compared with the other methods in the classification accuracy analysis using supervised learning, and it performs remarkably well even on the challenging CAS-PEAL-R1, JAFFE, and ORL datasets.

3.2.2 Similarity-based recognition performance analysis

At this stage, the recognition performance of RIMFRA and the state-of-the-art methods is measured through a similarity analysis between the feature sets of the images. That is, the feature set of the query image is calculated and then compared with the feature sets of each image in the dataset. If the label of the most similar image matches the label of the query image, this counts as a hit (true positive); otherwise it counts as a miss (false positive). A sketch of this nearest-neighbor matching is given below. Table 19 reports the recognition accuracy of each method on each dataset. As the table shows, RIMFRA competes with the other methods even on the challenging datasets without any training, that is, without any prior knowledge.
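The matching rule described above can be summarized as follows; the MSE is used here as an illustrative dissimilarity measure, consistent with the similarity analyses in Section 3.1.

```python
import numpy as np

def similarity_recognition(query_feature, gallery_features, gallery_labels):
    """Nearest-neighbor recognition by MSE between feature sets (sketch).

    Returns the label of the most similar gallery image; a hit occurs when it
    matches the label of the query image.
    """
    mses = [np.mean((np.asarray(query_feature, dtype=float) - np.asarray(g, dtype=float)) ** 2)
            for g in gallery_features]
    return gallery_labels[int(np.argmin(mses))]
```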

In the final step of the simulations, recognition accuracy is measured when images are subject to rotational changes. The query image is rotated by 90° and its feature set is extracted; this feature set is then compared with the feature sets of the non-rotated images. As presented in Table 20, RIMFRA shows remarkable performance under rotational change.

4 Conclusion

This paper proposes a rotation-invariant multi-spectral facial recognition approach that is highly resistant to rotational variations as well as to illumination changes and noise. Nearly all methods proposed so far are based on the gray-level domain, which ignores the information embodied in the color bands. The traditional view during texture extraction considers the relationships of pixels only in the colorless domain. However, significant discriminative features can be obtained by considering the relationships between different color bands of neighboring pixels. With this in mind, RIMFRA explores the multi-spectral relationships of pixels with their neighbors. Orthogonal polynomials are an effective way of representing 2D matrices; thus, the matrices produced in the previous step are fed into the orthogonal polynomial decomposition stage. The first few coefficients of the polynomial carry the most information about the 2D matrix and also help reduce the size of the feature set. The simulation results encourage us to take the idea a step further in future studies by considering not only RGB but also other color spaces and combining the features of different color spaces into a compound feature set.