1 Introduction

Biometry has attracted a great deal of attention in recent years and has been widely used, owing to its high performance, in many areas such as surveillance, identification, and human-computer interaction [5, 15, 18, 30, 31, 34, 40, 50, 58, 60, 71]. Individuals have biological characteristics, also called metrics, that distinguish them from others [27]. Extracting behavioral and/or physiological characteristics of individuals in order to discriminate among them is called biometric recognition. Face, iris, retina, ear, and palm are prominent discriminative physiological characteristics, whereas voice, typing rhythm, and gait are behavioral characteristics, also known as behaviometrics [48].

The face is one of the leading biometrics preferred for individual discrimination because it distinguishes individuals with high accuracy and little human participation. Face data can be easily collected and processed in real time using remote devices such as cameras, without any human intervention [14, 26].

As with many other images, face data are exposed to disruptive external factors such as noise, illumination, pose variations, and rotation. Variations in pose, illumination, and direction, together with random noise, inhibit a pixel-by-pixel comparison between images. Facial recognition has therefore attracted great interest from researchers aiming to overcome these challenges. Where pixel-to-pixel comparison does not perform well, texture helps in image classification. Texture plays a key role in computer pattern recognition, especially in image-related applications [32, 33]. Although there is no globally accepted definition, texture can be described as the result of recurring local patterns throughout the picture [47]. As with other types of images, face images also exhibit texture. Features are extracted from the texture of face images and then analyzed and classified to distinguish individuals. To qualify as high quality, a feature set must meet two criteria. First, its computational complexity should be low enough for real-time applications. Second, it should express the properties of the texture as well as possible, so that textures can be separated reliably during classification [41].

Numerous studies have been conducted to propose descriptors with low computational complexity and high representational power. These methods can basically be grouped under two headings: holistic and local appearance features [39]. Holistic techniques analyze the entire face image and extract global information to recognize a subject. This global information is obtained by analyzing pixel relationships over the whole image, from which the corresponding features are extracted. These features represent the global characteristic of the image that uniquely discriminates the face from others [64]. The best-known holistic approaches are Principal Component Analysis (PCA) [65], Linear Discriminant Analysis (LDA) [7], and Independent Component Analysis (ICA) [10]. The hallmark of PCA is that it reduces dimensionality by mapping a high-dimensional image space to a low-dimensional orthogonal space; the mapping is a linear transformation chosen to minimize the mean-square reconstruction error. LDA seeks a linear transformation that maximizes inter-class variance and minimizes intra-class variance. ICA, first introduced by Herault and Jutten [69], seeks a linear transformation that minimizes the statistical dependencies between the components of a vector. Many subsequent studies [22, 23, 28, 29, 54, 56, 63, 72] build on these fundamental methods and have sought to improve their performance by introducing new ideas.

Local approaches, unlike holistic ones, reveal local distinctive features that are more resistant to changes in expression and illumination. To this end, several research studies (LBP [2, 59], LPQ [73], LDP [25], LDN [52, 53], HoG [11], LTP [62], Gabor [43, 74]) have been performed to obtain a highly distinctive and proper texture representation [8]. Among these, LBP has been a promising pioneer for follow-up studies due to its high performance and computational efficiency [12]. LBP identifies local textures by comparing each pixel with its 3 × 3 local neighborhood. Each pixel is then replaced by the eight-bit result of the comparison step, where each bit encodes the magnitude comparison of the corresponding neighbor with the reference pixel. If the intensity of the neighboring pixel is greater than or equal to that of the reference pixel, the corresponding bit is set to one; otherwise it is set to zero.
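As an illustration, the following minimal NumPy sketch implements the basic 3 × 3 LBP operator described above; the function names and the bit ordering are illustrative choices rather than part of any specific library.

```python
import numpy as np

def lbp_code(block):
    """Basic 3x3 LBP: compare the eight neighbors with the central pixel.

    A bit is set to 1 when the neighbor's intensity is greater than or equal
    to the center, 0 otherwise; the eight bits are packed into one byte.
    """
    center = block[1, 1]
    # Neighbors read clockwise starting from the top-left corner.
    neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                 block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_image(gray):
    """Replace every interior pixel of a grayscale image by its LBP code."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i - 1, j - 1] = lbp_code(gray[i - 1:i + 2, j - 1:j + 2])
    return out
```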

One of the most obvious concerns in facial recognition is undoubtedly the failure of feature descriptors when the image is rotated. A robust descriptor, whether local or holistic, should work independently of the orientation of the image, i.e., it should reflect the same image characteristics under all conditions. The basic LBP does not consider rotational variations, so a number of follow-up improvements [70, 78] have been proposed to gain resistance to rotation. Furthermore, color channels contain significant information, yet most studies to date have derived features from monochrome images. In this paper, we propose a compound method that blends three main distinguishing ideas. First, a rotation-invariant local descriptor is proposed that is also resistant to variations in illumination and facial expression. Second, the power of the color channels is exploited by investigating the statistical properties of matrices built from the co-occurrences of the local descriptor values across spectral bands. Finally, the information stored in the multi-spectral co-occurrence matrices is represented by orthogonal polynomial coefficients to reinforce the discriminative power of the proposed method.

The rest of this paper is organized as follows. Section II briefly describes the proposed method and gives basic information about the pioneering ideas behind it. Section III presents the simulation results and discussions. Finally, Section IV concludes the paper.

2 RIMFRA

This section describes the proposed method in detail. The main steps of the overall process are as follows. First, the multi-spectral rotation-invariant local descriptor matrices are computed from the RGB bands of the raw image. After the descriptor matrices are formed, the multi-spectral co-occurrence matrices are calculated. In the last step, orthogonal polynomial coefficients are obtained from each co-occurrence matrix. Finally, the coefficients obtained from all co-occurrence matrices are concatenated to form the final feature vector for each facial image. Figure 1 depicts the complete process.

Fig. 1
figure 1

The block diagram of the entire process
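The compact sketch below summarizes this flow in Python. It is illustrative only and assumes the helper functions rinld_descriptor, multispectral_cooccurrence, and tchebichef_signature, which are sketched in Sections 2.1-2.3.

```python
import numpy as np

def rimfra_signature(rgb_image):
    """High-level sketch of the RIMFRA pipeline (helper names are illustrative).

    1. Compute one RinLd descriptor matrix per color band.
    2. Build the six multi-spectral co-occurrence matrices from those bands.
    3. Concatenate the Tchebichef coefficients of every co-occurrence matrix.
    """
    bands = [rgb_image[:, :, c] for c in range(3)]                # R, G, B
    rinld = [rinld_descriptor(band) for band in bands]            # step 1
    pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]      # RR, GG, BB, RG, RB, GB
    cms = [multispectral_cooccurrence(rinld[a], rinld[b]) for a, b in pairs]  # step 2
    return np.concatenate([tchebichef_signature(cm) for cm in cms])           # step 3
```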

2.1 RinLd

The local texture descriptors, the prominent ones of which were mentioned in the previous section, provide promising discriminative performance. However, two of the most critical requirements for such descriptors are that they be rotation invariant and resistant to changes in illumination. As mentioned earlier, LBP is one of the basic and leading local descriptors that captures the local structure of images. LBP defines the relationship between the central pixel and its neighboring pixels in an N × N block (N indicates the width and height of the block). However, the original LBP does not account for rotational variations; that is, the value of the descriptor calculated from a sub-region of the image changes when the image is rotated. Another deficiency of LBP is that it does not consider the intensity value of the central (reference) pixel. Therefore, pixels with different intensity values may be represented by identical values in the new domain. This undesirable identical representation of different pixels can be avoided by taking the intensity of the reference pixel into account.

In this paper, we propose a new local descriptor that is resistant to rotational and illumination variations. Instead of working on monochrome images, the proposed method operates on RGB images. The three color bands of the image are divided into N × N blocks. Subsequently, the neighboring pixels of a reference pixel are sorted into a vector in descending order of their intensity values, as shown below:

$$ SI_{N\times N}={sort}_{dsc}\left({I}_{N\times N}\right) $$
(1)

where SINxN and INxN denote the sorted and unsorted intensity values of the neighboring pixels, respectively. The intensity value of the reference pixel is subtracted from each element of the sorted vector. If the absolute value of the result is greater than the threshold value T, a 1 is assigned to the corresponding position of a new binary vector; otherwise, a 0 is assigned.

$$ {B}_{I_c}(i)=\begin{cases}1 & if\ \left|{SI}_i-{I}_c\right|>T,\ i=1,2,\dots ,N-1\\ 0 & otherwise\end{cases} $$
(2)

The threshold value is not held constant throughout the image; on the contrary, it is dynamic and depends on the mean intensity value in the block. The intensity of the reference pixel is also included when calculating the mean value of the block. T is calculated as follows:

$$ T=\left|{I}_c-\frac{\sum_{i=1}^N{SI}_i}{N}\right| $$
(3)

The resulting binary vector represents the intensity comparison between the reference pixel and its neighbors, yet by itself it does not prevent multiple pixels with different intensity values from being represented by identical values in the new domain. Thus, the resulting value is rescaled by taking into account the intensity value of the reference pixel as follows:

$$ {SB}_{I_c}={B}_{I_c}\times \left({I}_c/255\right) $$
(4)

The basic LBP and some of its derivatives consider only the sign of the relationship between the reference pixel and its neighbors; the information concealed in the magnitude of the difference is discarded. As a result, two pixels with different intensities may receive identical values in the new domain. The most effective way to address this is to take the intensity value of the reference pixel into account. RIMFRA remedies this in two separate steps: it keeps the threshold value dynamic, and it scales the resulting value by the intensity of the reference pixel. Figure 2 illustrates how identical binary patterns can arise for pixels with different intensities and how RIMFRA handles this situation.

Fig. 2
figure 2

a Identical LBP codes assigned to different patterns b RIMFRA overcomes the miss-assignment situation
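A minimal sketch of the RinLd computation for one 3 × 3 block is given below. It assumes that the binary vector of Eq. (2) is read as an unsigned integer before the scaling of Eq. (4), and that the block mean of Eq. (3) includes the reference pixel; both are interpretations of the description above rather than details stated explicitly there.

```python
import numpy as np

def rinld_value(block):
    """Sketch of the RinLd descriptor for one 3x3 block of a single color band.

    Assumed interpretation: the binary vector from Eq. (2) is read as an
    unsigned integer before being scaled by I_c / 255 as in Eq. (4).
    """
    center = float(block[1, 1])
    neighbors = np.delete(block.astype(float).flatten(), 4)   # the 8 neighbors
    si = np.sort(neighbors)[::-1]                              # Eq. (1): descending sort
    t = abs(center - np.mean(np.append(si, center)))           # Eq. (3): dynamic threshold
    bits = (np.abs(si - center) > t).astype(int)               # Eq. (2)
    b_value = int("".join(map(str, bits)), 2)                  # binary vector -> integer
    return b_value * (center / 255.0)                          # Eq. (4)

def rinld_descriptor(band):
    """Apply rinld_value over every interior pixel of one color band."""
    h, w = band.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i - 1, j - 1] = rinld_value(band[i - 1:i + 2, j - 1:j + 2])
    return out
```

Because the neighbors are sorted by intensity before thresholding, the descriptor value does not depend on where a rotation places them around the reference pixel, which is what gives the descriptor its rotation invariance.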

For each image, a total of three local descriptor matrices ((RinLd)R, (RinLd)G, and (RinLd)B) are created, one for each color band of the image. Following this stage, the constructed matrices are fed into the next step in the process described in the next section.

2.2 Multi-spectral co-occurrence matrices

Modern image acquisition and processing systems are capable of expressing and operating on colors in different spaces, namely RGB (Red, Green, Blue), HSV (Hue, Saturation, Value), and CIE Lab. Color images are commonly represented in RGB; the information carried in the image is the brightness level of each band. Although many applications require RGB to be converted to other color domains, such as HSV, RGB-based computer vision systems are simpler and more economical than the alternatives. Although the RGB color space is device-dependent, which seriously disrupts uniformity [49], it still performs very successfully in areas such as calibration and classification [24]. For example, researchers have analyzed images of apples and successfully estimated the amount of fruit they contain [61]. In addition, some researchers have accurately predicted geometric properties of different crop species by applying RGB-based image processing techniques [19, 35, 37, 45, 46, 68]. Moreover, it has been verified that methods based on color analysis are a reliable means of discrimination and retrieval in face detection and tracking. Furthermore, although skin color varies between individuals, the main distinguishing parameter has been shown to be intensity rather than chrominance [77].

The gray-level co-occurrence matrix (GLCM), introduced by Haralick [21] in the early 1970s, has proven to be an efficient way of representing texture [13]. GLCMs are formed by counting the occurrences of intensity-value patterns in the image. Haralick proposed a set of statistical features obtained from GLCMs that achieved a success rate of 84% at a high operational speed [3, 20]. Although it is an old method, it has been a reference and an inspiration in many fields such as iris recognition [75], image segmentation [1], and CBIR in videos [9, 36]. However, the basic GLCM runs on gray-level images and discards the information carried by the color bands. Arvis et al. [6] proposed a method that incorporates the color bands of the pixels during the construction of the co-occurrence matrices. A GLCM contains information on the spatial relationships of intensity values and their frequencies of occurrence. Let f be an image whose intensity values vary in the range [0, L-1]. The value in row i, column j of the GLCM indicates the number of times the pixel pair (zi, zj) occurs in f with orientation Q. The orientation Q corresponds to a displacement vector d = (dx, dy | dx = dy = dg), where dg is the number of gaps between the pixels of interest; for adjacency, dg = 0. The orientation can also be represented by two parameters: the distance d between the intensities zi and zj and the angle α. The orientation of the pixel pattern can take four directions: 0°, 45°, 90°, and 135°. That is, each image has four GLCMs (one per angle) for a given d. The size of a GLCM depends on the number of discrete intensity values in the image: if the intensity values vary in the range [0, L-1], the size of the GLCM is L × L. With the basic GLCM method applied to the color bands of the image, a total of twenty-four GLCMs are generated, six for each direction. In RIMFRA, the local descriptor matrices generated for each band in the first stage are fed into the multi-spectral co-occurrence construction process, rather than the raw image being used directly as in previous studies. The output of this process is the set of multi-spectral co-occurrence matrices CM(RinLd)R(RinLd)R, CM(RinLd)G(RinLd)G, CM(RinLd)B(RinLd)B, CM(RinLd)R(RinLd)G, CM(RinLd)R(RinLd)B, and CM(RinLd)G(RinLd)B.
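The sketch below illustrates how one multi-spectral co-occurrence matrix can be built from a pair of RinLd descriptor matrices. The number of quantization levels and the displacement (direction) are illustrative parameters, not values prescribed by the method.

```python
import numpy as np

def multispectral_cooccurrence(band_a, band_b, levels=16, dx=0, dy=1):
    """Sketch of a multi-spectral co-occurrence matrix (after Arvis et al.).

    Entry (i, j) counts how often a pixel quantized to level i in band_a has a
    neighbor at displacement (dy, dx) quantized to level j in band_b.
    """
    # Quantize both descriptor matrices to a common set of discrete levels.
    lo = min(band_a.min(), band_b.min())
    hi = max(band_a.max(), band_b.max())
    qa = np.floor((band_a - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)
    qb = np.floor((band_b - lo) / (hi - lo + 1e-12) * (levels - 1)).astype(int)

    cm = np.zeros((levels, levels), dtype=np.int64)
    h, w = qa.shape
    # Count every valid (pixel, displaced-pixel) pair across the two bands.
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            cm[qa[y, x], qb[y + dy, x + dx]] += 1
    return cm
```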

2.3 Orthogonal polynomial decomposition

In the proposed framework, the final stage of feature extraction is the orthogonal polynomial decomposition. Orthogonal polynomials, such as Tchebichef, have been shown to be an efficient means of representing 2D functions [57]. Some orthogonal polynomials, such as Hermite and Zernike, have also been used in previous studies for texture extraction and classification [38, 67]. However, Tchebichef polynomials have been found to offer better performance than the others [4]. Previous studies fed the raw image directly into the orthogonal polynomial decomposition. However, as described in the simulation results section, using the multi-spectral co-occurrence matrices as input instead of the raw image yields higher facial discrimination performance. Since most of the information about the structure of the matrix is concentrated in the first few moments while the finer details are expressed in the higher-order moments, the less important higher-order terms can be eliminated by truncating the expansion. Thus, unnecessary complexity is avoided.

The decomposition of the input matrix into moment orders Mpq is given in the following:

$$ {M}_{pq}=\frac{1}{\rho (p)\rho (q)}{\sum}_{x=0}^{N-1}{\sum}_{y=0}^{N-1}{m}_p(x)w(x){m}_q(y)w(y)f\left(x,y\right) $$
(5)

where 0 ≤ p, q, x, y ≤ N-1; mn(x) represents a set of orthogonal polynomials, and w(x) and ρ(·) denote the weight function and the normalization (squared-norm) term, respectively.

The mathematical representation of the Tchebichef orthogonal polynomials is given in the following equation:

$$ {m}_n(x)=n!{\sum}_{k=0}^n{\left(-1\right)}^{n-k}\left(\begin{array}{c}N-1-k\\ {}n-k\end{array}\right)\left(\begin{array}{c}n+k\\ {}n\end{array}\right)\left(\begin{array}{c}x\\ {}k\end{array}\right) $$
(6)
$$ \rho (n)=(2n)!\left(\begin{array}{c}N+n\\ {}2n+1\end{array}\right) $$
(7)
$$ w(x)=1 $$
(8)

where mn(x), ρ(n), and w(x) denote the nth Tchebichef polynomial, the normalization term, and the weight function, respectively. In this study, the number of coefficients calculated for a single input matrix of size N × N is 2N-2; hence, a signature comprising 6(2N-2) coefficients is ultimately generated. Figure 3 depicts the orthogonal polynomial decomposition of a sample matrix:

Fig. 3
figure 3

Tchebichef signature of a sample matrix
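The following sketch implements Eqs. (5)-(8) for a single co-occurrence matrix. Which particular 2N-2 moments constitute the signature is not fully specified above, so the sketch keeps the row moments M_p0 and the column moments M_0q (p, q = 1, ..., N-1) as one plausible choice.

```python
import math
import numpy as np

def tcheb_poly(n, x, N):
    """Discrete Tchebichef polynomial m_n(x) of Eq. (6)."""
    return math.factorial(n) * sum(
        (-1) ** (n - k)
        * math.comb(N - 1 - k, n - k)
        * math.comb(n + k, n)
        * math.comb(x, k)
        for k in range(n + 1)
    )

def tcheb_rho(n, N):
    """Normalization term rho(n) of Eq. (7)."""
    return math.factorial(2 * n) * math.comb(N + n, 2 * n + 1)

def tcheb_moment(f, p, q):
    """Moment M_pq of an N x N matrix f, following Eq. (5) with w(x) = 1."""
    N = f.shape[0]
    mp = np.array([tcheb_poly(p, x, N) for x in range(N)], dtype=float)
    mq = np.array([tcheb_poly(q, y, N) for y in range(N)], dtype=float)
    return (mp @ f @ mq) / (tcheb_rho(p, N) * tcheb_rho(q, N))

def tchebichef_signature(cm):
    """Illustrative 2N-2 coefficient signature of one co-occurrence matrix."""
    N = cm.shape[0]
    f = cm.astype(float)
    return np.array([tcheb_moment(f, p, 0) for p in range(1, N)]
                    + [tcheb_moment(f, 0, q) for q in range(1, N)])
```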

3 Simulation results and discussions

Various experiments are conducted to measure and analyze the performance of the proposed framework under different circumstances. The evaluation is performed on five benchmark databases, namely Face94 [16], JAFFE [42], YALE (http://vision.ucsd.edu/content/yale-face-database), CAS-PEAL-R1 [17], and ORL [55].

To ensure uniformity, some preprocessing is applied to each image. Each image is first scaled to a size of 64 × 64. Following the scaling stage, face extraction is performed using the Viola-Jones algorithm [66] to eliminate the effect of unnecessary background and foreground factors. To accurately measure and analyze the performance of the proposed framework and compare it with the state-of-the-art methods, hold-out testing is used. That is, if an individual has N images in a data set, 80% of them and their average image are used for training, and the rest are used for testing. When creating the average face image, each individual's images are aligned with respect to the individual's eyes. A sample average face calculation for an individual is given in Fig. 4:

Fig. 4
figure 4

Sample face images of an individual and his average face image
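One possible realization of this preprocessing and hold-out split is sketched below, using OpenCV's stock Haar-cascade detector as the Viola-Jones implementation. Detection is performed before resizing here for practical reasons, and the eye-based alignment step is omitted; both are simplifications of the procedure described above.

```python
import cv2
import numpy as np

# Haar-cascade face detector, one common Viola-Jones implementation
# (file name from the stock OpenCV distribution).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(bgr_image):
    """Crop the detected face and scale the result to 64x64."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray)
    if len(faces):
        x, y, w, h = faces[0]
        bgr_image = bgr_image[y:y + h, x:x + w]
    return cv2.resize(bgr_image, (64, 64))

def hold_out_split(images, train_ratio=0.8):
    """80/20 hold-out split of one individual's images; the average face of the
    training portion is appended to the training set, as described above."""
    n_train = int(round(train_ratio * len(images)))
    train, test = images[:n_train], images[n_train:]
    avg_face = np.mean(np.stack(train).astype(float), axis=0).astype(np.uint8)
    return train + [avg_face], test
```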

The performance analysis of the proposed framework is performed in two stages. First, the stability and resistance of the method to rotation, illumination changes, and noise are demonstrated. Then, the recognition performance of the proposed method is analyzed and compared with state-of-the-art methods such as LBP, LDP, LDNP (Local Directional Number Pattern) [51], Gabor features, HoG (Histogram of Oriented Gradients), LTP, LTeTP (Local Tetra Pattern) [44], and LDrvP (Local Derivative Pattern) [76] through extensive simulations.

3.1 The stability analysis

As mentioned earlier, rotational changes, variations in illumination, and noise significantly affect recognition performance. Hence, the proposed local descriptor and the overall architecture should not fail under these challenging circumstances and should remain stable enough to fulfill the recognition task satisfactorily. First, we show how the proposed local descriptor remains stable under rotational changes. Next, RIMFRA's resistance to lighting changes is verified with extensive simulations. The last part of this section presents the simulation results that show the behavior of RIMFRA under varying noise conditions.

3.1.1 Rotation-variation resistance analysis

A solid texture descriptor should be stable and produce similar features even if the original image is rotated, because the content of the image does not change and still belongs to the same person. It is therefore important to demonstrate and verify the behavior of the proposed method when the image is subject to rotational changes. Figures 5 and 6 show the stability performance of RIMFRA. In Fig. 5, a sample matrix representing one block of an image is shown. As depicted in the figure, the local descriptor extracted from the block does not change even if the matrix is rotated 90° counter-clockwise.

Fig. 5
figure 5

Demonstration of the robustness of the proposed method to the rotational changes in an exemplary matrix

Fig. 6
figure 6

Sample images and their rotated versions of two individuals in the Face94 database

Figure 6 shows the face images of two people in the Face94 database and their 90° rotated versions.

The similarity performance of the proposed method under rotational variation is analyzed and compared with state-of-the-art methods from the literature. First, the textural features of RIMFRA and the other methods are extracted from the face images and their 90° rotated variants. Next, the similarity analysis is carried out by calculating the Mean Square Error (MSE) between these sets of features, as sketched below. Figure 7 shows the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the Face94 database and their 90° rotated versions.
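The rotation-robustness measurement can be summarized by the small helper below, which computes the MSE and the Pearson correlation between the feature sets of an image and its 90° rotated version; feature_fn stands for any descriptor under test and is an illustrative placeholder.

```python
import numpy as np

def rotation_similarity(feature_fn, image):
    """MSE and correlation between the features of an image and its 90-degree rotation.

    A low MSE and a high correlation indicate rotational robustness of the descriptor.
    """
    rotated = np.rot90(image)                        # 90 degrees counter-clockwise
    f_orig = np.asarray(feature_fn(image), dtype=float)
    f_rot = np.asarray(feature_fn(rotated), dtype=float)
    mse = np.mean((f_orig - f_rot) ** 2)
    corr = np.corrcoef(f_orig, f_rot)[0, 1]          # Pearson correlation
    return mse, corr
```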

Fig. 7
figure 7

RIMFRA feature set histograms of the images given in Fig. 6(a-d)

The histograms in the first column belong to the first individual's face images, and those in the right column belong to the second individual. The histograms in the same column are clearly very similar, which is the desired outcome and confirms the robustness of RIMFRA against rotational changes. The similarity analysis is conducted on images from the different databases to verify and compare the results fairly. Tables 1, 2, 3, 4 and 5 and Figs. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 present the similarity and correlation values between the feature sets of the sample images and their 90° rotated versions selected from the different datasets. No image adjustment or enhancement technique is applied, in order to assess the performance of the methods on raw images.

Table 1 The MSE values calculated in the Face94 sample images
Table 2 MSE values calculated in the YALE sample images
Table 3 MSE values calculated in the CAS-PEAL-R1 sample images
Table 4 MSE values calculated in the JAFFE sample images
Table 5 MSE values calculated on the ORL sample images
Fig. 8
figure 8

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in Face94 dataset

Fig. 9
figure 9

Sample images and their rotated versions of two individuals in the YALE database

Fig. 10
figure 10

RIMFRA feature set histograms of the images given in Fig. 9(a-d)

Fig. 11
figure 11

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in YALE dataset

Fig. 12
figure 12

Sample images and their rotated versions of two individuals in the CAS-PEAL-R1 database

Fig. 13
figure 13

RIMFRA feature set histograms of the images given in Fig. 12(a-d)

Fig. 14
figure 14

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in CAS-PEAL-R1 dataset

Fig. 15
figure 15

Sample images and their rotated versions of two individuals in the JAFFE database

Fig. 16
figure 16

RIMFRA feature set histograms of the images given in Fig. 15(a-d)

Fig. 17
figure 17

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in JAFFE dataset

Fig. 18
figure 18

Sample images and their rotated versions of two individuals in the ORL database

Fig. 19
figure 19

RIMFRA feature set histograms of the images given in Fig. 18(a-d)

Fig. 20
figure 20

Graphical representation of the correlation between feature sets of sample images and their 90° rotated versions in ORL dataset

Figure 9 shows the face images and their 90° rotated versions of two individuals in the YALE database.

Figure 10 shows the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the YALE database and 90° rotated versions thereof.

Figure 12 shows the face images and their 90° rotated versions of two individuals in the CAS-PEAL-R1 database.

Figure 13 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the CAS-PEAL-R1 database and 90° rotated versions thereof.

Figure 15 shows the face images and their 90° rotated versions of two individuals in the JAFFE database.

Figure 16 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the JAFFE database and 90° rotated versions thereof.

Figure 18 shows the face images and their 90° rotated versions of two individuals in the ORL database.

Figure 19 demonstrates the histograms of the feature sets produced by RIMFRA for the face images of two sample individuals in the ORL database and 90° rotated versions thereof.

In all tables and figures, the column 1 and corr1 values represent the similarity and correlation results between the first image and its 90° rotated version, respectively. The column 2 and corr2 values indicate the similarity and correlation results between the second image and its 90° rotated version. The column 3 and corr3 values refer to the similarity and correlation results between the first image and the 90° rotated version of the second image. The column 4 and corr4 values represent the similarity and correlation results between the second image and the 90° rotated version of the first image. Inherently, a highly representative texture descriptor of an individual's face should remain similar even when the image of that individual is rotated, whereas the dissimilarity between images belonging to two different individuals should be high in order to differentiate them. Likewise, the correlation between images of the same individual should be high but should remain low between images of different individuals. As clearly shown in the tables for the different datasets, RIMFRA achieves consistent and highly accurate performance, producing results that meet these considerations.

3.1.2 Illumination-variation resistance analysis

The second stage of the analysis involves performance testing under illumination variations and comparison of the proposed method with other texture descriptors. Two types of analysis are performed. First, tests are performed on face images in the CAS-PEAL-R1 data set (Fig. 21) that are exposed to natural lighting variations.

Fig. 21
figure 21

Sample images of an individual in the CAS-PEAL-R1 database under different lighting conditions

Table 6 and Fig. 22 show the MSE and correlation values between the feature set of the first image and those of the others. Although CAS-PEAL-R1 is one of the most demanding data sets, since its facial images contain variations that are challenging for texture descriptors, RIMFRA competes with the most modern descriptors proposed in the literature.

Fig. 22
figure 22

Graphical demonstration of the correlation between the feature sets of a sample individual’s self-illuminated images in CAS-PEAL-R1 dataset

In the second stage of the illumination robustness analysis, an artificial, non-linear, non-uniform, third-order polynomial-based illumination effect is created and applied to the images in each dataset. Tables 7, 8, 9, 10 and 11 and Figs. 23, 24, 25, 26 and 27 demonstrate the MSE and correlation values calculated between the feature set of the original image of an individual from each dataset and those of its artificially illuminated versions.
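Since the exact polynomial is not reproduced above, the sketch below shows one way such a non-uniform, third-order polynomial illumination field can be generated; the coefficients and the horizontal-only variation are illustrative assumptions.

```python
import numpy as np

def polynomial_illumination(image, coeffs=(0.2, -0.4, 0.3, 1.0)):
    """Apply a non-uniform, third-order polynomial illumination field (sketch).

    The illustrative gain varies along the horizontal axis as
    g(x) = a*x^3 + b*x^2 + c*x + d, with x normalized to [0, 1].
    """
    h, w = image.shape[:2]
    x = np.linspace(0.0, 1.0, w)
    a, b, c, d = coeffs
    gain = a * x ** 3 + b * x ** 2 + c * x + d               # per-column gain
    gain = gain.reshape(1, w, *([1] * (image.ndim - 2)))     # broadcast over rows/channels
    out = image.astype(float) * gain
    return np.clip(out, 0, 255).astype(np.uint8)
```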

Table 7 MSE values calculated between the feature sets of the image of a sample individual in the Face94 data set and the feature sets of the artificially illuminated versions of this image
Fig. 23
figure 23

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in Face94 dataset

Fig. 24
figure 24

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in YALE dataset

Fig. 25
figure 25

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in CAS-PEAL-R1 dataset

Fig. 26
figure 26

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in JAFFE dataset

Fig. 27
figure 27

Graphical demonstration of the correlation between the feature sets of the artificially illuminated versions of an image in ORL dataset

Table 6 MSE values calculated on the CAS-PEAL-R1 sample images
Table 8 MSE values calculated between the feature sets of the image of a sample individual in the YALE data set and the feature sets of the artificially illuminated versions of this image
Table 9 MSE values calculated between the feature sets of the image of a sample individual in the CAS-PEAL-R1 data set and the feature sets of the artificially illuminated versions of this image
Table 10 MSE values calculated between the feature sets of the image of a sample individual in the JAFFE data set and the feature sets of the artificially illuminated versions of this image
Table 11 MSE values calculated between the feature sets of the image of a sample individual in the ORL data set and the feature sets of the artificially illuminated versions of this image
Table 12 MSE values calculated between the feature sets of a sample individual’s image in each database and its artificially salt-pepper noisy version
Table 13 MSE values calculated between the feature sets of a sample individual’s image in Face94 dataset and feature sets of its artificially Gaussian noise exposed versions
Table 14 MSE values calculated between the feature sets of a sample individual’s image in YALE dataset and feature sets of its artificially Gaussian noise exposed versions
Table 15 MSE values calculated between the feature sets of a sample individual’s image in JAFFE dataset and feature sets of its artificially Gaussian noise exposed versions
Table 16 MSE values calculated between the feature sets of a sample individual’s image in CAS-PEAL-R1 dataset and feature sets of its artificially Gaussian noise exposed versions
Table 17 MSE values calculated between the feature sets of a sample individual’s image in ORL dataset and feature sets of its artificially Gaussian noise exposed versions
Table 18 The recognition performance results regarding supervised training
Table 19 The recognition performance results regarding similarity analysis
Table 20 The recognition performance results regarding similarity analysis

As presented in the tables above, RIMFRA offers promising performance compared with the state of the art in terms of robustness against illumination variations.

3.1.3 Noise resistance analysis

Another compelling aspect to consider during the performance analysis of a texture descriptor is how resistant it is to noise without any noise filtering. Therefore, no pre-processing to mitigate the effects of noise is applied in the simulations, in order to accurately analyze the noise resistance. Two types of noise, salt-and-pepper and Gaussian, are applied to the images in each database. The salt-and-pepper noise is handled first. Figure 28 shows an exemplary image, its salt-and-pepper-noise-exposed version, and the RIMFRA feature set histograms of both. Clearly, the histogram of the multi-spectral orthogonal signature of the original image and that of the noisy version are very similar.

Fig. 28
figure 28

Demonstration of the robustness of the proposed method against salt-pepper noise

Table 12 and Fig. 29 show the similarity and dissimilarity values between a sample image in each dataset and its salt-and-pepper-noise-affected version. For a method to be considered resistant to salt-and-pepper noise, the MSE between the feature set of the image and that of the noisy version should be low, whereas the correlation between them should be high. In other words, a low MSE and a high correlation indicate that the method is resistant to noise. As can be seen, RIMFRA is among the best methods in terms of low MSE and high correlation.

Fig. 29
figure 29

Graphical demonstration of the correlation between the feature sets of sample images in each database and their salt-pepper noisy versions

The second noise resistance analysis is performed by adding Gaussian noise with different variance (σ2) values to the images in each dataset. As in the salt-and-pepper noise analysis, the similarity and correlation values are measured between the feature sets of the original images and those of their Gaussian-noise-exposed versions. Tables 13, 14, 15, 16 and 17 and Figs. 30, 31, 32, 33 and 34 demonstrate the dissimilarities and correlations between the feature sets of the original images and the noisy ones, respectively. A minimal sketch of how the two noise types can be generated for such tests is given below.
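The following helpers generate the two noise models; the noise density and variance values shown are illustrative defaults rather than the exact settings used in the experiments.

```python
import numpy as np

def add_salt_pepper(image, density=0.05, rng=None):
    """Flip a fraction of pixels to 0 or 255 (salt-and-pepper noise)."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    mask = rng.random(image.shape[:2]) < density   # pixels to corrupt
    salt = rng.random(image.shape[:2]) < 0.5       # half salt, half pepper
    out[mask & salt] = 255
    out[mask & ~salt] = 0
    return out

def add_gaussian(image, variance=0.01, rng=None):
    """Add zero-mean Gaussian noise with the given variance (intensities scaled to [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(float) / 255.0 + rng.normal(0.0, np.sqrt(variance), image.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)
```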

Fig. 30
figure 30

Graphical demonstration of the correlation between the feature sets of a sample image in Face94 dataset and feature sets of its Gaussian noisy versions

Fig. 31
figure 31

Graphical demonstration of the correlation between the feature sets of a sample image in YALE dataset and feature sets of its Gaussian noisy versions

Fig. 32
figure 32

Graphical demonstration of the correlation between the feature sets of a sample image in JAFFE dataset and feature sets of its Gaussian noisy versions

Fig. 33
figure 33

Graphical demonstration of the correlation between the feature sets of a sample image in CAS-PEAL-R1 dataset and feature sets of its Gaussian noisy versions

Fig. 34
figure 34

Graphical demonstration of the correlation between the feature sets of a sample image in ORL dataset and feature sets of its Gaussian noisy versions

3.2 The recognition performance analysis

The recognition performance analysis of RIMFRA is done in two ways: (1) training-based recognition performance analysis and (2) similarity-based recognition performance analysis. Since RIMFRA runs on color images, all images in each non-colored dataset are first converted to the RGB color space. To do this, a conversion map is generated from a reference colored image and its non-colored version. As is known, an exact gray-to-RGB conversion map does not exist, so the best possible approximate map is sought. Because the images of the Face94 data set are originally in color, they give the best results during the simulations. The other data sets consist of non-colored images, so these images are first converted to RGB and then processed. Since this conversion is approximate, it naturally affects the results.

3.2.1 Training-based recognition performance analysis

At this stage, supervised learning is used for classification: 80% of each individual's images in each dataset are used for training and the remaining images are used for testing. Table 18 shows the performance results of RIMFRA and the state-of-the-art methods in terms of recognition accuracy. As can be seen in Table 18, RIMFRA performs promisingly compared with the other methods in the classification accuracy analysis using supervised learning, and it performs remarkably well even on the challenging CAS-PEAL-R1, JAFFE, and ORL datasets.

3.2.2 Similarity-based recognition performance analysis

At this stage, the recognition performance of RIMFRA and the state-of-the-art methods is measured through a similarity analysis between the feature sets of the images. That is, the feature set of the query image is calculated and then compared with the feature sets of each image in the dataset. If the label of the most similar image matches the label of the query image, this counts as a hit (true positive); otherwise it counts as a miss (false positive). A sketch of this nearest-neighbor matching is given below. Table 19 reports the recognition accuracy of each method on each dataset. As the table shows, RIMFRA competes with the other methods even on the challenging datasets without any training, that is, without any prior knowledge.
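The matching rule described above can be summarized as follows; the MSE is used here as an illustrative dissimilarity measure, consistent with the similarity analyses in Section 3.1.

```python
import numpy as np

def similarity_recognition(query_feature, gallery_features, gallery_labels):
    """Nearest-neighbor recognition by MSE between feature sets (sketch).

    Returns the label of the most similar gallery image; a hit occurs when it
    matches the label of the query image.
    """
    mses = [np.mean((np.asarray(query_feature, dtype=float) - np.asarray(g, dtype=float)) ** 2)
            for g in gallery_features]
    return gallery_labels[int(np.argmin(mses))]
```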

In the final step of the simulations, recognition accuracy is measured when images are subject to rotational changes. The query image is rotated by 90° and its feature set is extracted; this feature set is then compared with the feature sets of the non-rotated images. As presented in Table 20, RIMFRA shows remarkable performance under rotational change.

4 Conclusion

This paper proposes a rotation-invariant multi-spectral facial recognition approach that is highly resistant to rotational variations as well as to illumination changes and noise. Nearly all methods proposed so far are based on the gray-level domain, which ignores the information embodied in the color bands. The traditional view during texture extraction considers the relationships of pixels only in the colorless domain. However, significant discriminative features can be obtained by considering the relationships between different color bands of neighboring pixels. With this in mind, RIMFRA explores the multi-spectral relationships of pixels with their neighbors. Orthogonal polynomials are an effective way of representing 2D matrices; thus, the matrices produced in the previous step are fed into the orthogonal polynomial decomposition stage. The first few coefficients of the polynomial carry the most information about the 2D matrix and also help reduce the size of the feature set. The simulation results encourage us to take the idea a step further in future studies by considering not only RGB but also other color spaces and combining the features of different color spaces into a compound feature set.