1 Introduction

With the development of modern science and technology, the need for authentication and identification processes that can be carried out quickly, effectively and automatically, i.e., without any human intervention, is rapidly increasing [1]. Biometrics have received great interest due to their high performance in differentiating among people and have gained wide application in recent years in many areas such as surveillance, identification and human–computer interaction [2,3,4]. Individuals have biological properties, called metrics, that distinguish them from others [5]. Biometry is concerned with these metrics, which are categorized as physiological and behavioral [6].

Face data are one of the leading biometrics in that they can be collected and processed in real time, without any discomfort or physical contact, through devices such as cameras and without the need for any human intervention, in addition to their high discriminative performance [7, 8]. Therefore, face recognition has been widely preferred in the security domain of commercial and law-enforcement applications [9, 10]. The face recognition process, which is in fact an instance of pattern recognition, consists of two main steps. The first is the extraction of features that can distinguish a face from others, also called face representation. The second is the design and application of classifiers or models that can distinguish faces, also called face matching [11, 12]. Of these steps, face representation is the more crucial because face recognition is a computationally difficult task that requires a fine distinction between images of similar identities as well as generalization across different images of the same identity [13]. In addition, factors such as noise, partial occlusion and differences in illumination, exposure and expression make the automatic recognition process much more difficult [14].

Each face has its own shape and color characteristics. Methods that focus on the color characteristics of the face are concerned with its texture. Each human face has its own structural properties, called texture. Although there is no general consensus on its definition, texture is commonly defined as repeated patterns in an image [15]. To be a good representative of a face, a descriptor should offer high discriminative power, low computational complexity and resistance to deteriorating factors such as noise, occlusion and variations in illumination and expression [16].

1.1 Related work

Plenty of face representations have been presented in the literature, mainly grouped under two categories: local [17, 18] and holistic descriptors [19,20,21]. Holistic approaches examine the entire image and consider holistic features. This holistic set of features also refers to the general characteristics of the face [22]. Principal component analysis (PCA) [19], linear discriminant analysis (LDA) [20], independent component analysis (ICA) [23] and grey-level co-occurrence matrices (GLCM) [24] are the most basic, popular and inspiring works among the holistic approaches. The distinctive feature of PCA is to reduce dimensionality by the transition from a high-dimensional image space to a low-dimensional orthogonal space. The transition is performed by applying a linear transformation that yields the least mean-square reconstruction error. LDA seeks a linear transformation that minimizes the intra-class variation and maximizes the inter-class variation. First adopted by Herault and Jutten [25], ICA searches for a linear transformation that minimizes the statistical dependencies of the components of a vector. GLCM is one of the basic and prominent statistical textural feature extraction methods and has been widely used for texture analysis in various applications [26, 27]. The GLCM is the matrix that holds the distribution of co-occurring intensity patterns at a given offset over a given image. Second-order statistical (Haralick) features are extracted from it to analyze the texture of the image, which can subsequently be used for classification tasks [28]. GLCM, one of the primary statistical textural descriptors, handles the visual texture of the image by evaluating the spatial arrangement statistics of pixels [29].

In contrast to holistic approaches, local descriptors such as local binary pattern (LBP) [30], local Gabor binary pattern (LGBP) [31], center-symmetric local binary pattern (CS-LBP) [32], local directional pattern (LDP) [33], joint local binary patterns with Weber-like responses (LJBPW) [34], pyramid transform domain local binary pattern (PLBP) [35], local directional gradient pattern (LDGP) [36], local phase quantization (LPQ) [37], local directional number pattern (LDNP) [38], histogram of gradients (HoG) [39], local ternary pattern (LTP) [40] and Gabor wavelets [41, 42] benefit from local appearance features. Among the local descriptors, Gabor wavelets, the Radon transform [43], texton learning [44, 45] and LBP are the prominent ones and have inspired further studies [46, 47]. LBP in particular has found a wide application area due to its flexibility of adaptation, high discrimination performance and low complexity [48]. Therefore, a series of follow-up studies [49] have been proposed to develop and extend the idea of LBP.

As mentioned above, the face is one of the most important biometrics because of the amount of information it offers about individuals and the fact that this information can be collected using remote, camera-like devices without any discomfort or human intervention. Much non-verbal and semantic information, such as a person’s identity, intent and emotion, can be obtained by looking at an individual’s face. On the face, there are some important key points, namely landmark-points, on which face shape analysis is based. Some leading computer vision applications, such as head-pose estimation [50, 51] and facial expression recognition [52, 53], exploit the data belonging to the landmark-points. Moreover, the landmark-points around the eye can provide a first estimate of the central position of the pupils, to be used in eye detection and eye tracking [54]. Data retrieved from landmark-points can be an important source of information for human–computer interaction, entertainment, security surveillance and medical applications. For several reasons, the detection of the landmark-points is challenging. First, facial appearance differs from person to person and changes with facial expressions and head poses. Second, environmental conditions such as lighting significantly affect the appearance of faces in images. Finally, self-occlusion due to extreme changes in head pose, or occlusion caused by other objects, leads to missing facial appearance information [55].

1.2 Our contributions

Although facial landmark-points have been heavily exploited for facial expression recognition and head-pose estimation, their discriminative performance for personal identification has been poorly explored. In this study, the discriminative performance of the landmark-points in terms of face recognition is analyzed by proposing a method that exploits both the facial appearance and the face shape features belonging to them. In the first step, all sample images are put into a standard form by pre-processing to provide uniformity. Once this uniformization phase is completed, the landmark-point detection process starts. By the end of this process, sixty-six landmark-points on each face image are defined by their spatial coordinates. Subsequently, by exploiting the spatial coordinates of these landmark-points, shape-based features encompassing the Euclidean distances between them, as well as grey-level-based appearance features, are extracted and included in the feature set. The classification is performed on this compound feature set, which comprises both the shape- and appearance-based relationships between the landmark-points. Extensive face recognition experiments are conducted on four widely used face datasets.

As clearly identified in the simulation results, the strengths of the proposed method can be listed as follows:

  • Remains stable under varying illumination.

  • Maintains its high individual discrimination despite increasing noise.

  • Retains its ability to recognize faces at a satisfactory rate, even when exposed to partial obstacles that make it very difficult to distinguish individuals.

The rest of this paper is organized as follows. Section 2 explains the proposed method. Section 3 reports the experimental results and discusses them. Finally, Sect. 4 concludes the paper.

2 Methodology

The method proposed in this study comprises three major steps: landmark-point detection, shape- and grey-level-appearance-based feature extraction, and classification. Figure 1 illustrates the overall block diagram of the proposed method.

Fig. 1 The block diagram of the entire process

2.1 Landmark detection

The purpose of facial landmark detection algorithms is to automatically detect the positions of the landmark-points of faces in images or videos. These landmark-points define the positions of dominant components of the face, such as the corners of the lips and eyes and the tip of the nose, or interpolated points that connect these fiducial points along a spline or face contour [55]. Methods for facial landmark detection are generally classified under three headings: holistic [56, 57], part-based [58, 59] and regression-based [60] methods [61].

Part-based methods build on the idea of local facial appearance combined with a holistic shape model, which makes them robust to occlusion and illumination. This idea, first suggested by the study on active shape models (ASM) [62] and subsequently improved by constrained local models (CLM) [63], has pioneered many subsequent research activities [64, 65].

In contrast to the part-based ones, holistic methods exploit global facial shape patterns and holistic facial appearance information. The active appearance model (AAM) [56] is a statistical model that uses a few coefficients to fit face images, controlling both the facial appearance and the shape changes. During model construction, AAM relies on principal component analysis (PCA) to build the holistic facial appearance model and, subsequently, the global shape model. The landmark-points are determined by fitting the learned appearance and shape models to the test images. In the conventional AAM, the model coefficients are estimated by repeated calculations based on a model-coefficient update estimate that relies on the current model coefficients and the error image [55].

Regression-based landmark detection has attracted interest in recent research. Rather than building a global face model as holistic and part-based methods do, regression-based methods intend to learn a direct mapping from the appearance of an image to its landmark locations. They are roughly categorized as direct regression methods, cascaded regression methods and deep-learning-based regression methods. Direct regression methods, which are further sub-categorized into local and holistic approaches, attempt to predict the landmark locations at once, without the need for initialization. In contrast, cascaded regression methods require subsequent, cascaded iterations, together with a pre-initialization of the landmark locations, to correctly localize the landmark-points [55].

Indeed, a landmark detection algorithm specifies the locations of \( N \) landmarks, \( Lp = \left\{ {x_{1} , y_{1} , x_{2} , y_{2} , \ldots , x_{N} , y_{N} } \right\} \), on a given image \( f \). In this study, a CLM-based method, the discriminative response map fitting (DRMF) [65], is applied to detect the landmarks of the face images. Sixty-six landmark-points are identified when DRMF is applied to a face image, as illustrated in Fig. 2.
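Since DRMF's reference implementation is MATLAB-based, the sketch below only illustrates this detection step, using dlib's pretrained 68-point landmark detector as a stand-in (an assumption on our part; DRMF itself yields 66 points, and the model file path is hypothetical). It returns the coordinates in the \( Lp \) array form used throughout this section.

```python
import numpy as np
import dlib  # stand-in for DRMF; not the detector used in this study

# dlib's pretrained 68-point model; the file path is a hypothetical local path.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(gray: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array of (x, y) landmark coordinates for the first detected face."""
    faces = detector(gray, 1)  # upsample once to help find smaller faces
    if not faces:
        raise ValueError("no face detected")
    shape = predictor(gray, faces[0])
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float64)
```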

Fig. 2 Sample image and its annotated landmarks

DRMF shows promising performance in landmark-point detection, even on binary images. The locations of the landmark-points of an image and those of its binary version are almost the same, with a mean square error (MSE) of 2.8315, as shown in Fig. 3 and Table 1.
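As a minimal sketch of how such a stability figure can be computed, the MSE between the coordinate sets detected on an image and on its binary version (here assumed to be (N, 2) arrays `lp_img` and `lp_bin` produced as above) is simply:

```python
def landmark_mse(lp_img: np.ndarray, lp_bin: np.ndarray) -> float:
    """Mean square error between two (N, 2) landmark coordinate arrays."""
    return float(np.mean((lp_img - lp_bin) ** 2))
```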

Fig. 3 A sample image and its binary version with their annotated landmarks

Table 1 Image coordinates \( \left( {X,Y} \right) \) of the landmark-points of a sample image and its binary version

2.2 Feature extraction

The descriptor in this study comprises two types of feature sets that are extracted relying on the facial landmark-points: shape-based features and appearance-based features. The following sections describe each feature set and the way it is obtained in detail.

2.2.1 Shape-based feature set

DRMF identifies sixty-six landmarks on each face image. It is examined whether the spatial relationships between these landmark-points are unique to individuals. Let \( f \) be an image with \( N \) landmarks, \( Lp = \left\{ {x_{1} , y_{1} , x_{2} , y_{2} , \ldots , x_{N} , y_{N} } \right\} \), where \( N = 66 \). The Euclidean distances between these points are calculated. In total, \( N \times \left( {N - 1} \right)/2 \) distance values, one per unordered pair of landmarks, are calculated as:

$$ d\left( {Lp_{i} ,Lp_{j} } \right) = \sqrt {\left( {x_{i} - x_{j} } \right)^{2} + \left( {y_{i} - y_{j} } \right)^{2} } $$
(1)
$$ fea_{1} = \left[ {d\left( {Lp_{1} ,Lp_{2} } \right)\; d\left( {Lp_{1} ,Lp_{3} } \right)\; \ldots\; d\left( {Lp_{N - 1} ,Lp_{N} } \right)} \right] $$
(2)

where \( fea_{1} \) denotes the feature set that is comprised of the Euclidean distances between the landmark-points.
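A minimal sketch of Eqs. (1)–(2), assuming `lp` is an (N, 2) array of landmark coordinates as returned above; it enumerates the unordered landmark pairs, yielding the \( N(N-1)/2 \) distances of \( fea_{1} \) (2145 values for \( N = 66 \)).

```python
import numpy as np
from itertools import combinations

def shape_features(lp: np.ndarray) -> np.ndarray:
    """fea_1 (Eqs. 1-2): Euclidean distances over all unordered landmark pairs."""
    return np.array([np.linalg.norm(lp[i] - lp[j])
                     for i, j in combinations(range(len(lp)), 2)])
```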

2.2.2 Appearance-based feature set

The second feature set, \( fea_{2} \), includes the differences between the mean pixel intensity values at the landmark-points. To mitigate the effect of single-pixel intensity changes due to noise or illumination changes, the mean value over the \( \left( {2k + 1} \right) \times \left( {2k + 1} \right) \) pixels in the k-hop neighborhood of each landmark, including the landmark pixel itself, is considered.

$$ mp_{{Lp_{i} }} = \left( {\mathop \sum \limits_{n = 1}^{{\left( {2k + 1} \right)^{2} - 1}} p_{n} + p_{{Lp_{i} }} } \right)/\left( {2k + 1} \right)^{2} $$
(3)
$$ fea_{2} = \left[ {d\left( {mp_{{Lp_{1} }} , mp_{{Lp_{2} }} } \right) \ldots d\left( {mp_{{Lp_{N - 1} }} , mp_{{Lp_{N} }} } \right)} \right] $$
(4)

where \( p_{{Lp_{i} }} \) and \( mp_{{Lp_{i} }} \) refer to the pixel intensity value at landmark-point \( Lp_{i} \) and the mean intensity value over its k-hop neighborhood (whose remaining pixels are denoted \( p_{n} \)), respectively, and \( fea_{2} \) is the feature set containing the pairwise differences of these mean intensity values.

After the two feature sets are extracted, they are concatenated to form the overall feature set, \( fea_{ToT} \), as:

$$ fea_{ToT} = fea_{1} | fea_{2 } $$
(5)
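The sketch below follows Eqs. (3)–(5) under the reading that the k-hop neighborhood of a landmark is the \( \left( {2k + 1} \right) \times \left( {2k + 1} \right) \) window centered on it; the border clipping is an implementation choice not specified in the text. Here `image` is a grey-level array indexed as [row, column], `lp` holds (x, y) coordinates, and `shape_features` is the function from the earlier sketch.

```python
import numpy as np
from itertools import combinations

def mean_patch_intensity(image: np.ndarray, lp: np.ndarray, k: int = 1) -> np.ndarray:
    """mp values of Eq. (3): mean grey level over the (2k+1)x(2k+1) window at each landmark."""
    h, w = image.shape
    means = []
    for x, y in lp.astype(int):
        # Clip the window at the image border so edge landmarks remain valid.
        patch = image[max(y - k, 0):min(y + k + 1, h), max(x - k, 0):min(x + k + 1, w)]
        means.append(patch.mean())
    return np.array(means)

def appearance_features(image: np.ndarray, lp: np.ndarray, k: int = 1) -> np.ndarray:
    """fea_2 (Eq. 4): absolute differences of mean intensities over landmark pairs."""
    mp = mean_patch_intensity(image, lp, k)
    return np.array([abs(mp[i] - mp[j]) for i, j in combinations(range(len(mp)), 2)])

def total_features(image: np.ndarray, lp: np.ndarray, k: int = 1) -> np.ndarray:
    """fea_ToT (Eq. 5): concatenation of the shape- and appearance-based sets."""
    return np.concatenate([shape_features(lp), appearance_features(image, lp, k)])
```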

3 Simulation results and discussions

Various experiments are conducted to measure and analyze the performance of the proposed framework under different circumstances. The evaluation is performed mainly on the CAS-PEAL-R1 dataset [67], a subset of the CAS-PEAL dataset, which contains several tens of thousands of images of 1040 subjects. The CAS-PEAL-R1 dataset is preferred because it contains a large number of images taken under varying conditions of lighting, expression and accessories, factors that make the recognition process much more difficult.

Sample images retrieved from different categories in the CAS-PEAL-R1 dataset are presented in Fig. 4.

Fig. 4 Exemplary face images retrieved from different categories in the CAS-PEAL-R1 dataset

The performance analysis of the proposed method is carried out in two steps. In the first step, the resistance of the method to noise, partial occlusion and illumination changes is explored. Then, through comprehensive simulations, the face recognition accuracy of the proposed method is analyzed and discussed. The performance of the proposed method is compared to a number of state-of-the-art methods proposed in the literature: LBP, Gabor, local tetra patterns (LTetP) [68], local monotonic pattern (LMP) [69], local phase quantization (LPQ), Weber local descriptor (WLD) [70], local gradient pattern (LGP) [71], median binary pattern (MBP) [72], local arc pattern (LAP) [73] and monogenic binary coding (MBC) [74].

Following the feature extraction stage, the classification task is carried out by means of supervised training, using the k-nearest neighbor (k-NN) method with k = 1. The reason for this choice is that individuals show very close facial characteristics; enlarging the neighborhood considered during classification therefore increases the risk of assigning an incorrect label to a subject. At the outset, the data are split randomly into two parts to train the model: the training part comprises 80% of the dataset, while the test part includes the remaining 20%. The training part is further partitioned into training (80%) and validation (20%) subsets.
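This evaluation protocol can be sketched as follows with scikit-learn (an assumption on our part; the experiments reported here actually run in MATLAB). `X` is assumed to be the matrix of \( fea_{ToT} \) vectors and `y` the corresponding subject labels.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# X: (n_samples, n_features) matrix of fea_ToT vectors; y: subject labels (assumed given).
# 80/20 train/test split, then a further 80/20 train/validation split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

clf = KNeighborsClassifier(n_neighbors=1)  # k = 1, for the reason given above
clf.fit(X_tr, y_tr)
print("validation accuracy:", clf.score(X_val, y_val))
print("test accuracy:", clf.score(X_test, y_test))
```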

Experiments are performed on MATLAB 2017b running on the Intel CORE i7-5500U 2.4 GHz processor and 16 GB RAM computer system.

3.1 Analysis of stability

Noise, occlusion and changes in illumination can significantly affect recognition performance. Therefore, the proposed method should not fail under such demanding conditions and should remain stable to achieve satisfactory recognition performance. First, the robustness of the proposed method against noise is analyzed. Second, the behavior of the proposed method when exposed to variations in illumination is presented. Last, the resistance of the proposed descriptor under variable partial occlusions is explored.

3.1.1 Noise resistance analysis

An important consideration during the performance analysis of a face identifier is how it resists a challenging factor such as noise without any enhancement, i.e., filtering. Therefore, the reaction of the proposed method is examined by applying artificially produced noise. Two types of noise, salt-pepper and Gaussian, are applied to each image in the dataset.

First, salt-pepper noise is considered. Images may suffer from impulse noise during acquisition, transmission or recording operations. Impulse noise is generally classified as random-valued impulse noise (RVIN) and fixed-valued impulse noise (FVIN). These two noise models differ in the intensity change occurring at the noisy pixels. In the FVIN model, each noisy pixel takes the value 0 or 255; that is, the pixel turns either black or white. FVIN is ordinarily modeled as follows:

$$ x_{ij}^{{\prime }} = \left\{ {\begin{array}{*{20}c} {\left\{ {0,255} \right\}\; {\text{with}}\;{\text{probability}}\;p} \\ {x_{ij} \;{\text{with}}\;{\text{probability}}\; 1 - p } \\ \end{array} } \right\} $$
(6)

where \( x_{ij} \), \( x_{ij}^{{\prime }} \) and \( p \) refer to the original pixel intensity value at image coordinate (i, j), the noisy value and the noise density, respectively [75].

For RVIN, two models have been proposed. In the first of these models [76], a noisy pixel can take a value in a fixed interval of length m, rather than one of the two fixed values as in FVIN. This model is called fixed-range impulse noise (FRIN) and is formulated as:

$$ x_{ij}^{{\prime }} = \left\{ {\begin{array}{*{20}c} {\left[ {0,m} \right) \;{\text{with}}\;{\text{probability}}\; p_{1} } \\ {x_{ij } \;{\text{with }}\;{\text{probability}}\; 1 - p_{1} - p_{2} } \\ {\left( {255 - m,255} \right] \;{\text{with}}\;{\text{probability}}\; p_{2} } \\ \end{array} } \right\} $$
(7)

The second proposition [75] for RVIN is called general fixed-valued impulse noise (GFN) or multi-valued impulse noise (MVIN) and is formulated as follows:

$$ x_{ij}^{{\prime }} = \left\{ {\begin{array}{*{20}c} {s \in S\;{\text{with}}\;{\text{probability}}\; p} \\ {x_{ij} \;{\text{with}}\;{\text{probability}}\; 1 - p } \\ \end{array} } \right\} $$
(8)

where \( S \) is the set of impulse-noise values, consisting of k elements selected from the range \( \left[ {0,255} \right] \).
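A sketch of the FVIN corruption of Eq. (6), as it might be used to produce the noisy test images (a numpy-based illustration assuming 8-bit grey-level images; `d` is the noise density and the seed is arbitrary):

```python
import numpy as np

def add_salt_pepper(image: np.ndarray, d: float, seed: int = 0) -> np.ndarray:
    """FVIN (Eq. 6): each pixel turns 0 or 255 with probability d, else keeps its value."""
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    mask = rng.random(image.shape) < d                        # pixels hit by noise
    noisy[mask] = rng.choice([0, 255], size=int(mask.sum()))  # salt or pepper, equiprobable
    return noisy
```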

Every image in the dataset is artificially exposed to salt-pepper noise, and the recognition accuracy analysis is conducted on the noisy images. Figure 5 demonstrates the recognition accuracy values of the proposed and competing methods. Here, d denotes the noise density, while \( \left( {\varvec{fea}_{{\varvec{ToT}}} , \varvec{k} = 1} \right) \) and \( \left( {\varvec{fea}_{{\varvec{ToT}}} , \varvec{k} = 2} \right) \) denote the proposed method considering 1-hop and 2-hop neighboring pixels, respectively, while calculating the mean pixel intensity value. Clearly, the proposed method remains stable even under increasing salt-pepper noise. Since the proposed method relies fully on the facial landmark-points, whose extraction salt-pepper noise does not disturb, the same or a nearly identical feature set continues to be captured. Thus, while the recognition accuracy of most of the other methods seriously degrades, our method can still distinguish individuals despite the increasing salt-pepper noise rate.

Fig. 5 The effect of salt-pepper noise on the recognition accuracy

In the next stage of the noise durability analysis, Gaussian noise is considered. The two dominant noise sources in digital image acquisition are the stochastic nature of photon counting in detectors and the internal electronic fluctuations of the acquisition devices [77]. This most common noise, arising from the image acquisition system, can generally be modeled as Gaussian random noise [78]. Gaussian noise is statistical noise with a probability density function (PDF) equal to that of the normal distribution, also known as the Gaussian distribution, named after Carl Friedrich Gauss. In other words, the noise values added to the pixels follow a Gaussian distribution. The PDF of a Gaussian random variable is formulated as follows:

$$ p_{G} \left( z \right) = \frac{1}{{\sigma \sqrt {2\pi } }}e^{{ - \frac{{\left( {z - \mu } \right)^{2} }}{{2\sigma^{2} }}}} $$
(9)

where z, \( \mu \) and \( \sigma \) denote the grey level, the mean and the standard deviation, respectively.
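Correspondingly, additive Gaussian noise with the stated parameters can be sketched as below (assuming intensities normalized to [0, 1], as in MATLAB's imnoise convention, which the parameter values used here suggest):

```python
import numpy as np

def add_gaussian(image: np.ndarray, mu: float = 0.001, var: float = 0.01,
                 seed: int = 0) -> np.ndarray:
    """Additive Gaussian noise with PDF as in Eq. (9); pixel values clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(mu, np.sqrt(var), image.shape)
    return np.clip(noisy, 0.0, 1.0)
```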

Figure 6 presents the recognition accuracy values of the proposed method and its competitors in the case of Gaussian noise exposure. The figure shows the recognition accuracy values for increasing Gaussian variance (\( \sigma^{2} \)) with the Gaussian mean held constant at \( \mu = 0.001 \). Inherently, the recognition accuracy of any method degrades as the noise variance increases. However, a robust descriptor should resist increasing and changing noise parameters as far as possible and withstand their disturbing effects. Evidently, our method keeps its robustness even under varying and increasing noise levels, while the recognition accuracy of the other methods degrades seriously.

Fig. 6 The effect of Gaussian noise on the recognition accuracy

3.1.2 Varying-illumination-resistance analysis

Face recognition becomes a strongly challenging problem, especially in unconstrained environments [79]. Varying illumination is one of the most decisive factors that make face recognition difficult. Image variations caused by varying illumination, such as cast or attached shadows, can be larger than those due to the innate differences between individuals [80]. These challenges have attracted considerable attention from researchers, and many studies have been conducted to overcome them. These studies are broadly classified into three categories: normalization and pre-processing, illumination-invariant feature extraction, and modeling [81].

The method proposed in this study falls into the category of illumination-invariant feature extraction. If a method exploits a feature set that relies heavily on pixel intensity values, it is inevitably affected by changes in illumination. Therefore, when designing our face descriptor, we intended not to produce a feature set based purely on pixel intensity values, which would reduce immunity to the damaging effects of varying illumination.

The recognition accuracy of our method and other state-of-the-art methods is explored by conducting extensive simulations on the CAS-PEAL-R1 dataset. The reason for selecting CAS-PEAL-R1 is that, instead of artificially produced variations, it contains a subset of images exposed to varying illumination occurring naturally in an indoor environment. Figure 4c presents sample face images exposed to varying natural illumination. Obviously, without any pre-processing or normalization, distinguishing the individuals is highly challenging for any descriptor whose feature set relies heavily on pixel intensity values. As illustrated in Fig. 7, our method performs best in terms of recognition accuracy even under the diminishing effects of shadows resulting from changing illumination. The performance of the other descriptors falls sharply because their feature sets rely purely on pixel intensity values.

Fig. 7 The effect of illumination variation on the recognition accuracy

3.1.3 Recognition performance analysis under facial-accessory-caused partial occlusion

Although there is much research on mitigating the distorting effects of pose and illumination changes in face recognition, problems caused by occlusions are often overlooked. However, face occlusion is quite common and may occur intentionally or unintentionally. For example, football hooligans and ATM criminals may wear scarves and/or sunglasses to prevent their faces from being recognized. Other people wear veils because of religious beliefs or cultural habits. Further sources of facial occlusion include medical masks, beards, hats, facial hair, mustaches, make-up and so on. Naturally, facial occlusion can significantly affect the performance of even the most sophisticated face recognition systems, unless occlusion is specifically considered. The robustness of face recognition systems against partial occlusion is therefore very important nowadays [82].

Local feature-based methods have been recognized to be robust to partial occlusions and less susceptible to such problems, unlike traditional holistic approaches such as PCA, LDA and ICA. Some of the local feature-based methods focus solely on illumination and/or expression changes, while others [83, 84] aim to overcome the problems caused by partial occlusions.

As mentioned above, the method in this study does not rely solely on pixel intensity values; rather, it is based on facial landmark features that remain stable even under partial occlusions. Therefore, as the results in Fig. 8 show, our method maintains a high performance even on the dataset of partially occluded images, e.g., with glasses and hats, which seriously impair face recognition (Fig. 4a). Compared to the other methods, our method provides a clear advantage.

Fig. 8 The effect of partial occlusions on the recognition accuracy

3.2 Recognition accuracy performance analysis

In the previous section, the stability and robustness of the proposed method are analyzed on the challenging CAS-PEAL-R1 dataset, which comprises occluded and variably illuminated images that are difficult to recognize. In addition, the resistance of the method is explored by exposing these images to artificially generated salt-pepper and Gaussian noise. This section clarifies the recognition accuracy of the proposed method on different datasets, namely ExtendedYaleB [85], Face94 [86] and JAFFE [87,88,89]. The Extended Yale B dataset contains 16352 images of 28 individuals at 640 × 480 pixels, captured under 9 different poses and 64 different illuminations. Although the images in the Extended Yale B dataset do not include any expression variation, they are exposed to significant pose and illumination changes. Figure 9 presents sample images from the ExtendedYaleB dataset.

Fig. 9 Exemplary face images retrieved from the ExtendedYaleB dataset

Figure 10 gives the recognition accuracy of the proposed and other state-of-the-art methods on the ExtendedYaleB dataset. A result consistent with the previous findings is obtained: in terms of recognition performance, our method gives the second-best performance, following Gabor.

Fig. 10 Recognition accuracy performance analysis on the ExtendedYaleB dataset

Simulations are then conducted on another dataset, Face94, which is composed of a total of 1860 images, 20 per individual. Although it is not as challenging as the CAS-PEAL-R1 and ExtendedYaleB datasets, the individuals pose with varying expressions (Fig. 11). Not surprisingly, as presented in Fig. 12, the proposed method performs best among the evaluated methods.

Fig. 11 Exemplary face images retrieved from the Face94 dataset

Fig. 12 Recognition accuracy performance analysis on the Face94 dataset

Next, face recognition performance is analyzed on the JAFFE dataset. The database contains 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects (Fig. 13). As clarified in Fig. 14, despite the highly varied expressions of the models, the proposed method achieves one of the best discrimination performances.

Fig. 13 Exemplary face images retrieved from the JAFFE dataset

Fig. 14 Recognition accuracy performance analysis on the JAFFE dataset

4 Conclusion

In this paper, a highly discriminative face recognition method that fuses shape and grey-level features of faces is proposed. Facial landmarks are used as shape properties because they present characteristics of individuals that remain largely unchanged even when exposed to destructive factors such as occlusion, varying illumination and noise. A number of features are produced from these facial landmark-points. These features include both spatial values and pixel intensity values that are calculated considering only the landmark-points and their vicinities. As clearly presented in the simulation results, the proposed method remains stable even under challenging factors such as varying illumination, noise and partial occlusion, which significantly diminish the performance of a face recognition method. The recognition and robustness performance of a number of state-of-the-art methods is also analyzed. The proposed method competes promisingly with the others in terms of recognition accuracy and, at the same time, makes a clear difference when exposed to challenging factors.