1 Introduction

The most universal statement in recognition is that the fingerprints of no two persons in the world are the same. The same holds for the face: even the faces of identical twins differ. Face recognition is the form of recognition that humans perform most naturally. Faces also carry physiological information in the form of expressions and emotions, and they attract human attention very quickly. Face recognition is used to acquire information such as the age, sex, emotions and expression of a human being [1]. The basic model of face recognition, in its traditional and advanced forms, is shown in Fig. 1 [2].

Fig. 1
figure 1

Face Recognition Model

The model shown in Fig. 1 can be applied to any standard or advanced system to recognize an individual, his presence or his emotion. The system can be applied to diverse face types, including 2D greyscale, color, 3D, thermal and near-infrared images acquired in different environments and under different constraints. The normalized-face-DB is the trained, normalized, labeled dataset available in actual or feature form. This dataset must remain robust and effective against real-time situations, challenges and constraints, and it should answer any valid face query generated in the actual environment. The query face or face-set is acquired in the real environment under application, availability and situational constraints. The input face can be a full facial image, side view, partial, distorted, morphed or sketch image acquired through varied technologies and available in different formats, resolutions and qualities. Some of the challenges that exist in real-time capturing are shown in Fig. 2 on sample images taken from multiple databases. The FRS must be capable of tackling these challenges at its integrated process stages.

Fig. 2
figure 2

Various Real-Time Issues in Captured Face Images

The pre-processing [3,4,5] stage is the earliest phase of a Face Recognition System (FRS) and handles the deficiencies and challenges present in the input image. The acquired face can be affected by distortion, quality, size, pose [6] or illumination [7] specific imbalance. Image rectification or restoration methods are included in this pre-processing stage to handle the different kinds of variances and disturbances. These rectification methods map the input face to the normalized, trained database images in terms of quality, size, resolution, color and textural constraints. The better this mapping is achieved, the more significant, relevant, robust and accurate the decisions that can be expected from the designed FRS. Noise removal, size adjustment and color adjustment methods are defined at this stage to achieve effective facial recognition. Faces acquired in online applications, outdoor scenarios and unconstrained environments require more targeted and specific rectification to deal with unexpected deficiencies and disturbances. In advanced FRS, image-specific, issue-specific, environment-specific and application-specific methods are integrated into this pre-processing stage to handle the different kinds of issues. Analytical-measure-assisted pre-processing methods have also been defined by researchers to improve the performance of leading and challenging applications. The research methods proposed to deal with these challenges are provided in Sect. 3 with their relative performance impact.

The pre-processing stage is followed by the segmentation [8, 9] stage, which extracts the face or a facial component. First-level segmentation extracts the facial skin region from the face image. Segmentation methods are based on the color, positional or geometric information of the face image. Color-based segmentation is used to identify skin [9, 10] regions; a color model is used with specific rules and constraints to acquire the skin-based facial region. One such color model is HSV (Hue-Saturation-Value), where Hue represents the color angle, Saturation represents the color purity between 0 and 1, and Value represents the component darkness between 0 and 1. An adaptive range of Hue and Saturation represents the skin color. One such range is used by [2] in his segmentation-based model: a Hue range between 0 and 60 degrees together with Saturation values between 0.23 and 0.68. The effect of color-based segmentation is shown in Fig. 3. Different algorithms, constraints and rules proposed by earlier researchers for face segmentation are provided in Sect. 4.

Fig. 3
figure 3

Color-based Segmentation
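To make the above rule concrete, the following minimal Python sketch applies the quoted hue range (0 to 60 degrees) and saturation range (0.23 to 0.68) to an RGB image to obtain a skin mask. The function name and the use of matplotlib's color converter are illustrative choices, not part of the model in [2].

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def hsv_skin_mask(rgb, hue_max_deg=60.0, sat_range=(0.23, 0.68)):
    """Boolean skin mask from the hue/saturation rule described above.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    The threshold values follow the ranges quoted from [2]; they are
    illustrative and may need tuning for a given dataset.
    """
    hsv = rgb_to_hsv(rgb)                 # H, S, V each scaled to [0, 1]
    hue_deg = hsv[..., 0] * 360.0         # hue converted to degrees
    sat = hsv[..., 1]
    mask = (hue_deg >= 0.0) & (hue_deg <= hue_max_deg)
    mask &= (sat >= sat_range[0]) & (sat <= sat_range[1])
    return mask

# Usage sketch: mask = hsv_skin_mask(img); skin_region = img * mask[..., None]
```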

Face component [9] identification and extraction form the second level of face segmentation, in which specific or all facial components [11] are extracted separately. Complex applications such as expression or micro-expression identification, partial face recognition [12] and group [12] face recognition require component-based facial details. These expressions, components and geographical facial features have functional as well as technical importance in face, expression and emotion recognition systems. The appearance-based features include external and internal features; some descriptive features are listed in Table 1 [13, 14]. These features are acquired for each visible facial component with its relative characterization, and each component is quantified by a specific feature and value that can distinctively identify its significance. These characteristics are evaluated separately to generate a more relevant, wide and efficacious feature set. Deep-level observations can be taken from these features for advanced applications of individual processing, such as pain, intention, emotion or micro-expression recognition, activity-attention identification, and partial face recognition from a group of individuals. Each application can process one or more of these components, individually or collectively, for effective identification of the individual, his expression or his activity.

Table 1 Descriptive Facial Features [13]

The extracted face or facial-component ROI (Region of Interest) is processed through various feature descriptors [10, 15,16,17] to acquire application-selective, relevant and decision-driven information. These features are categorized as appearance-specific, geometric, statistical and discriminating features. Researchers have applied various algorithms and rules under different constraints to extract these features, and have used them individually or in combination to improve the performance and accuracy of FRS. Supervised and unsupervised learning methods have also been integrated to reduce the dimension of the processing feature set and to identify the most relevant and decision-driven features. The contribution of researchers to feature generation and optimization is described in Sect. 5.

Once the database images and input facial images are transformed into feature form, a classifier is applied to identify the individual or his class. The class can be application-based, such as age-group, gender, emotion or pain. Various distance-based, rule-based, probabilistic and criteria-specific supervised and unsupervised learning methods have been investigated to improve the effectiveness and performance of facial recognition systems. In recent years, composite classifiers, deep learning methods and optimization approaches have been used to strengthen the existing classifiers. Feature- or component-selective classifiers are combined within a common framework to gain the benefits of multiple classifiers.

In the last few years, FRS has faced challenges that were unpredictable in systems designed a decade ago. These challenges include the existence of morphed faces and plastic-surgery faces [18], which modify the geometry, skin, scars and other visible features of the actual face. Hand-drawn sketch [19] based facial recognition, partial faces and group [12] faces are also challenges in digital forensics and critical identification. In complex unconstrained [20] environments, low-resolution images captured by low-quality mobile or surveillance camera devices are not effectively recognized by traditional FRS. Recent FRS can use more effective feature descriptors and classifiers to improve face recognition in such critical scenarios.

Other than these traditional applications, face-recognition systems have advanced in terms of image types, applications, challenging environments and unconstrained situational features. 3D [21] and infrared [22] faces are more descriptive image forms captured by specialized cameras. 3D [19, 21, 23] faces are captured through specialized devices, techniques and environments that acquire the depth dimension of the face more accurately. Structured light, laser scanners and other stereo-based systems are the most common techniques adopted to acquire high-resolution 3D faces. Reference points and viewpoints also have a high significance in generating the adaptive polygonal mesh for extracting 3D faces. In 3D faces, polygonal-structure change, curve maps and point-specific evaluations have been used to recognize the individual, his movement and his expression. Geometrically inspired features were used to take advantage of the additional feature dimension and to map to real faces. The transition of a 3D face to a 2D face [19, 23] is also accomplished as an intermediate process to include the depth-oriented features in facial recognition; such systems are robust enough to recognize 2D and 3D face images using the same feature descriptors. Infrared and near-infrared (NIR) [19, 22] images are advanced, informative and illumination-robust face images that ensure higher accuracy than visible-imagery-based systems. Frequency-based, appearance-based and moment-based methods have been defined to deal with NIR images. These images are acquired using active radiation sources to achieve effective face and expression recognition in unconstrained environmental conditions, and different levels of illumination and contrast imbalance can be actively handled by them.

This paper provides a detailed study of the contributions of researchers at each stage of facial recognition systems. The capabilities of traditional and recent FRS are explored together with their technological improvements, and the various methods used to achieve accuracy gains in FRS are discussed in detail. The objective of this study is to identify the transformation of FRS in terms of image types, applications and the methods adopted to improve each stage, including composition-adaptive and optimization-integrated methods. This section has provided an overview of FRS with current research directions, the components involved in facial recognition and the associated challenges. Various benchmark databases have been used by researchers to validate their work; the description of these databases and their features is provided in Sect. 2. After a facial image is obtained, the first requirement is to normalize it against various real-time challenges, including resolution, illumination and noise; in Sect. 3, the methods adopted to repair and normalize real-time facial images are discussed. These normalized faces are then processed under segmentation to extract the facial region; in Sect. 4, the research conducted on facial segmentation is presented. Features are extracted from the acquired facial regions using different algorithms and measures; in Sect. 5, the research contributions on generating the most indicative and contributing features for recognizing a face are provided, covering geometric, structure-adaptive, appearance-based and other significant feature generation and selection methods. In Sect. 6, the various classification and recognition models adopted by researchers are provided, along with the challenges handled by these models. In Sect. 7, the recent and current open research areas in facial recognition systems are described. In Sect. 8, the conclusion and future scope of this study are provided.

2 Face Databases

When a new algorithm or model is investigated, it is recommended to test it on benchmark databases before implementing it in the real environment. The validation of a novel FRS is also conducted using such benchmark databases. Several facial databases are available, differing in resolution, size and the challenges they address. When a researcher investigates a new algorithm, he can select the most suitable benchmark database based on the capabilities of his model. Based on a deep study of earlier work, the most-used facial databases are listed in Table 2, together with the features and challenges addressed by each database. The results obtained by a researcher for his FRS can be compared directly with the work conducted by earlier researchers on the same dataset; this kind of analysis identifies the strengths and the performance improvement gained. In later sections of this paper, the research conducted on these databases is described with its performance accuracy.

Table 2 Features of Various Face Images

3 Face Rectification and Normalization

The face is a biometric feature that does not require any specialized capture device; any digital, analog or mobile camera is capable of capturing the image. This easy availability of devices and advancement of technology have reduced the cost, effort and availability issues in facial capturing. However, the same technology has also increased the number of associated challenges, which are specific to the environment, device type, device quality, etc. Some common challenges associated with face capturing are listed in Table 3.

Table 3 Issues in Facial Capturing

The issues listed above affect face capturing and cause photos taken at two different instances to vary. Because of this, it becomes difficult to match the face image exactly to the database image, which increases the complexity of face recognition compared with other object, pattern and biometric recognition systems. The changes identified in real-time face capturing include color imbalance, illumination imbalance [24], contrast imbalance, alignment, geometric [6] mismatch and focusing. Various normalization [3] methods have been applied at the pre-processing stage [5] to rectify these issues and to standardize the face image. In this normalization stage, the physical and textural qualities of the input face image are mapped to the database images. The effectiveness of this pre-processing stage directly reflects on the performance of the face recognition system.

Lighting or environmental effects can degrade the image in terms of unbalanced illumination and contrast variation; partial illumination and regional lighting effects are more severe. In real-time applications such as face recognition, illumination-based degradation can hide the actual facial features and increases the error rate. Nikam et al. [25] combined a multi-scale Weber filter and an enhanced complex wavelet transform to handle varying illumination; the method improved the accuracy significantly under uncontrolled illumination conditions. A multi-scale [26] method using a maximum-response filter bank was applied to handle varying illumination; the method exposed the edges sharply and partially reduced the illumination effect. A lighting-aware frontalization [20] method was proposed for rendering the frontal pose with an accurate face region; landmark exploration with symmetric processing was proposed for lighting-normalized face recognition.

Ding et al. [27] provided a method to deal with pose variation within ±90°. The authors used the available partial information for pose-robust face recognition, applied patch-based dictionary learning to generate a discriminative subspace, correlated the unoccluded textural information with different poses and improved on earlier FRS results. Duan et al. [28] presented a spatial-information-based method that extracts face information while excluding the pose details; a local feature descriptor with a linear transformation was used to extract selective information. Zhou et al. [29] used a divide-and-rule method to handle pose variation in FRS; Huffman codes were used with region-selection factors for an effective representation of faces and for recognizing them irrespective of pose, and the patch-based classification technique increased the tolerance against rotation. Duan et al. [28] removed the pose component from face images to generate an effective local feature descriptor; a transformation-matrix and self-similarity feature-based learning method was applied to the selected regions for pose-invariant face recognition. A landmark-based [30] depth-warping method was used for pose-based face reconstruction, and the probe face image was processed using a sparse regression method for pose-robust identification of an individual.

Facial expression [31] is an essential and challenging face feature. Various dedicated applications use expression significantly to take relative decisions; the intention and emotion of a known, unknown or disabled person can be identified from facial expressions. In a face-recognition system, expression variation is considered a varying and disturbing feature, and it sometimes becomes difficult to match an individual across two different expressions. Expression-robust face recognition is therefore a requirement for a typical FRS, and methods are needed to heal these expression variations to improve face recognition. Textural and structural features have been processed for different face components to compute the integrated facial expression. Martins et al. [32] used the Disparity Energy Model (DEM) to suppress the expression features and provided an effective expression-invariant face recognition system; the disparities present in the face region were encoded and used as a correlated, weighted feature set for taking an effective recognition decision.

Occlusion is a major issue in real-time face capturing, in which face information is overlapped by some other object, scene or individual. Occlusion in face images is critical because of its unpredictable behaviour: the region, structure and type of occlusion differ between images, scenes and situations, and the region and ratio of occlusion directly affect the accuracy of FRS. Long et al. [33] used a three-layer pooling method to preserve high-level local and spatial information; the method suppresses occlusion and noise and acquires the eye, nose and mouth regions more accurately, and the correlated information was processed by fuzzy rules to generate a more effective feature space for the face recognition decision. Patch-based occlusion methods also use the occluded region, either partially or in weighted form; however, this kind of inclusion can affect the accuracy of face recognition. A pixel-level [34] occlusion detection method was proposed using sparse representation, in which the decision for each pixel is its participation in the occluded region; this contiguous-occlusion analysis improved the accuracy and robustness against the occlusion problem. A low-rank [35] and sparse feature representation based framework was provided for occluded face recognition; the proposed low-rank optimization method can reference the error map and handled synthesized contiguous occlusion in face images, and this reweighted sparse coding method improved the accuracy of occluded face recognition. Fu et al. [36] proposed locality-constrained linear coding (LLC) based on occluded-region exclusion and detection for face recognition; Markov random field-based error correction analysis was also included to simplify the regularization of occluded and normal regions.

Noise [37] is an undesirable and uncontrolled component that affects the quality of images and image-processing systems. Images acquired in a controlled environment using specific devices are less affected by noise, but the input to an FRS is captured using a variety of devices in diverse environments, so such images are often heavily affected by it. Different noise infections corrupt the image information in distinct forms and reduce the accuracy of face recognition in the real environment. In FRS, pre-processing methods are applied as an essential sub-stage to remove these impurities and to improve reliability as a real-time system. Common noise forms include Gaussian noise, salt-and-pepper noise and Poisson noise, and the type and density of the noise affect the reliability of the face-recognition system. Because of this, each face-recognition system handles noise at the pre-processing stage by applying a filter, either generic or noise-specific. The median filter, mean filter and Gaussian filter are the common traditional methods for removing noise from face images. Budiman et al. [38] used a smoothing filter to handle noise and later applied Gabor and non-negative factorization methods for noise-robust feature generation. A dual-model [39] vector-directional and directional-distance filter was used to suppress the noise element; the method proved effective against salt-and-pepper, Poisson and Gaussian noise. Banerjee et al. [40] used an enhanced high-pass filter in combination with a continuous wavelet filter for noise correction in face images and improved the reliability of the recognition process.

Low-resolution [41] images are small, low-quality images. Such images lack essential information and suffer further data loss on scaling. In authentication-based systems, low-resolution images cannot ensure good results. Face images can be acquired through low-quality cameras, CCTVs and hidden devices, which often use low resolution to save memory; recognizing an individual in such images is a bigger challenge. The influence of low resolution on the reliability of face recognition was studied in [42]; various techniques were discussed for different face positions, and the degree of correct face-region detection and recognition was observed. The observations identified a slight degradation in the recognition rate for low-resolution face images.

The recent methods proposed for facial-image rectification and normalization are provided in this section and in Table 4. The table lists the methods or frameworks for handling the various issues that exist in face images, along with the accuracy observations obtained for different datasets.

Table 4 Facial Rectification and Pre-processing Methods

3.1 Histogram Equalization

Histogram equalization [7] is a powerful method for handling the bad-contrast problem. An unequal distribution of contrast can be balanced by mapping the pixel values so that they are uniformly distributed over the image. Histogram equalization uses various integrated and localized methods for improving and balancing the image contrast; region sectioning and local, region-based accumulative processing can be defined to handle more complex contrast issues. Extended histogram methods include histogram expansion, Cumulative Histogram Equalization, Local Area Histogram Equalization (LAHE), par sectioning and odd sectioning. The histogram (h) of an image (Img) is given by Eq. (1)

$$h_{n} = \frac{PixelsCount\left( Img, n \right)}{PixelCount\left( Img \right)}$$
(1)

where n is the pixel intensity of the image, n = 0, 1, 2, …, L−1.

L is the number of intensity levels of the image (for grayscale images, L is 256).

Img is the processed image and hn is the histogram value for the pixels of intensity n.

Now, the histogram equalization is applied to the unbalanced image and the equalized image (hImg) is obtained. The equation for histogram equalization is provided in Eq. (2)

$${\text{hImg}}\left( {{\text{i}},{\text{j}}} \right) = {\text{floor}}\left( {\left( {L - 1} \right)\mathop \sum \limits_{n = 0}^{{Img\left( {i,j} \right)}} h_{n} } \right)$$
(2)

This transformation changes the image by distributing the pixel intensities over a wider range, producing a contrast-balanced image with better visibility and feature exploration. Wang et al. [55] proposed a block-adaptive local histogram equalization method for contrast adjustment; the input image is divided into overlapping sub-blocks, gradients are evaluated for these blocks, and adjacent-block-based histogram equalization is applied to adjust the contrast. A recursive weighted [56] multi-plateau histogram equalization method was proposed for balancing the contrast over a segmented image; the weight-normalization-based method equalized the separate segments and was applied recursively over the image. Santhi et al. [57] applied a clipping-process-based partitioned histogram to enhance the contrast over the image; a partition-based gray-point array distribution was applied to balance the contrast over the region.
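As a minimal illustration of Eqs. (1) and (2), the following NumPy sketch equalizes a grayscale image; the function name is illustrative and the routine assumes an 8-bit input.

```python
import numpy as np

def equalize_histogram(img, L=256):
    """Global histogram equalization following Eqs. (1) and (2).

    img: 2-D uint8 grayscale array; L: number of intensity levels.
    """
    # Eq. (1): normalized histogram h_n = PixelsCount(Img, n) / PixelCount(Img)
    hist = np.bincount(img.ravel(), minlength=L) / img.size
    # Running sum of h_n up to each intensity value
    cdf = np.cumsum(hist)
    # Eq. (2): hImg(i, j) = floor((L - 1) * sum_{n=0}^{Img(i,j)} h_n)
    lut = np.floor((L - 1) * cdf).astype(np.uint8)
    return lut[img]

# Usage sketch: balanced = equalize_histogram(face_gray)
```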

3.2 Gray-Level Transformation

Gray-level transformation [5] is an effective illumination rectification technique that performs pixel-wise intensity mapping through a specified transformation function. Unequal illumination is rectified by this method and redistributed to some extent. Depending on the chosen function, the transformation can be linear or non-linear; cumulative distribution, normalized, logarithmic and exponential functions are commonly used within gray-level transformation. The gray-level transformation identifies the maximum and minimum gray levels, represented as Graymax and Graymin. Based on this information, the contrast modulation (cm) and mean brightness (mb) are computed by Eqs. (3) and (4).

$${\text{cm}} = \left[ {\frac{{Gray_{max} - Gray_{min} }}{{Gray_{max} + Gray_{min} }}} \right]$$
(3)

and

$${\text{mb}} = \left[ {\frac{{Gray_{max} + Gray_{min} }}{2}} \right]$$
(4)

The intensity distribution is then performed, with sinusoidal expansion as a basic enhancement; the gray-level distribution can spread the brightness that is concentrated in one region over the whole image.
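The following short sketch computes the contrast modulation and mean brightness of Eqs. (3) and (4) and applies a simple linear stretch as one possible transformation function; the linear form is used here for brevity and stands in for the sinusoidal expansion mentioned above.

```python
import numpy as np

def gray_level_stats(img):
    """Contrast modulation (Eq. 3) and mean brightness (Eq. 4) of a grayscale image."""
    gray_max, gray_min = float(img.max()), float(img.min())
    cm = (gray_max - gray_min) / (gray_max + gray_min)
    mb = (gray_max + gray_min) / 2.0
    return cm, mb

def linear_gray_transform(img, new_min=0, new_max=255):
    """One possible gray-level transformation: a linear stretch of the intensity range."""
    gray_min, gray_max = float(img.min()), float(img.max())
    scaled = (img.astype(float) - gray_min) / (gray_max - gray_min)
    return (new_min + scaled * (new_max - new_min)).astype(np.uint8)
```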

3.3 Median Filter

The median filter [58] is a widely used nonlinear filter capable of healing impulse noise. It is a rank-order filter that preserves the structural features of an image while changing its textural features. The median filter defines a rectangular mask and processes the neighboring pixels within that mask: each image pixel is replaced by the median value of the mask, and the mask window slides over the image, healing noisy values in the neighborhood. The quality of the median filter depends on the size and shape of the mask. Zhang et al. [59] observed the noise-level uncertainty over digital images and applied an adaptive median filter for removing impulse noise; the size of the adaptive window was also estimated using the global noise density and the corruption present in the region. An adaptive weighted [60] median filter based framework was presented for suppressing salt-and-pepper noise; the median filter was applied repeatedly within the filter window and the noise effect was reduced based on the noise density. A recursive [61] adaptive median filter was proposed for suppressing high-density noise; the pixels were processed recursively in combined form and the window region was filtered effectively.
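A straightforward NumPy sketch of the sliding-mask median filter described above is given below; the loop-based form is written for clarity rather than speed, and the function name is illustrative.

```python
import numpy as np

def median_filter(img, size=3):
    """Square median filter of the given window size for a 2-D grayscale image.

    Each pixel is replaced by the median of its size x size neighborhood,
    which suppresses impulse (salt-and-pepper) noise while preserving edges.
    """
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# Usage sketch: denoised = median_filter(noisy_face, size=3)
```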

4 Face Segmentation

Real-time face capturing includes the face together with background, objects, other individuals and other body parts, so extraction of the face region is required for accurate recognition of an individual. Segmentation [8, 9, 62, 63] is an integrated sub-stage of FRS in which the processing ROI (Region of Interest) is extracted from the face image. In real-time images, face segmentation is a key issue and is itself challenging when faces must be located in an external environment or in group images; the mixing of objects and the face region increases the error rate in face detection. Selective segmentation [9] is required to locate face components such as the eyes, lips and nose, and component-based segmentation plays an important role in expression classification and partial face recognition. Various skin-color and face-geometry-based methods have been proposed to improve face recognition. Color-model [9, 64] based evaluation is recommended for the extraction of unstructured faces from balanced-color images; if the face images are aligned and single-pose, geometric and shape-adaptive methods can be applied for face-region extraction.

Das et al. [65] applied a modified watershed algorithm on the chrominance component of the YCbCr model to extract the skin region; the method achieved effective segmentation results on the FRI CVL Face Database, but was applied only to single-face images and was not verified for multiple-face segmentation. A new color balloon snake [66] model was presented for identification of the skin region; combined with a skin-tone distribution model and boundary diffusion to acquire the facial boundary and features, the method was robust against lighting variation and complex backgrounds. Log-normal and Log-Gabor [67] filters were applied with dynamic feature analysis to extract expression features from the face image; the spatial-filter-based method processed the transient facial features for precise estimation of facial expressions, was tested on five benchmark databases and achieved an accuracy of over 80% on each dataset, with background, pose, illumination and expression variations analyzed by the authors. Chakraborty et al. [68] combined local and global skin-pixel analysis with region growing for accurate and effective detection of the face region; the color-space-model-based local skin distribution was observed for face-region detection with an error rate lower than 12.87%. Chen et al. [69] provided a study on skin-color modeling and detection of the facial region within face images, discussing pixel- and region-based skin segmentation methods, their characterization and the related challenges. In Table 5, various face segmentation methods are listed with their methodology and accuracy.

Table 5 Face Segmentation Methods

4.1 Morphological Operators

Mathematical morphology [63] applies mathematical filters to handle the key problem of face segmentation. Morphological operators are shape-specific non-linear operators suited to binary features: a structuring element is applied over the image, as a matrix, to analyze and compare the neighborhood pixels. Various researchers have used morphological operators for the extraction of an object or the face region. The morphological operators comprise five main operations: erosion, dilation, opening, closing and thinning. Dilation is the primary operation; it accepts two sets of elements composed using the structuring element. The functioning of these operators is provided in the following equations, applied to the image Img with St as the structuring element. The dilation of Img by St is given in Eq. (5)

$$Img \oplus St = \left\{ pxD \in E^{N} \mid pxD = px + e,\; px \in Img,\; e \in St \right\}$$
(5)

where px is a pixel from the image block, e is an element from the structuring element, and pxD is the dilated pixel obtained from the + operation on the pixel and the element. The dilation operation is performed in an increasing, order-preserving manner. The morphological erosion operation is provided in Eq. (6)

$$Img \ominus St = \left\{ px \in E^{N} \mid px + e \in Img,\; e \in St \right\}$$
(6)

Erosion and dilation are the primary morphological operators; opening and closing are generated as compositions of them. The opening operator, provided in Eq. (7), highlights smooth contours and eliminates small islands and peaks over the image.

$$Img \circ St = (Img \ominus St) \oplus St$$
(7)

The closing operator processes the structural component to generate effective contours and fuses small holes, gaps and breaks. Equation (8) shows the closing operator as a dilation followed by an erosion.

$$Img \bullet St = (Img \oplus St) \ominus St$$
(8)

These operators can be applied repeatedly in different combinations to extract the effective face region and to separate the background from the face region. Al-Otum et al. [71] used color morphological operators with distance-based analysis for color-pixel classification; the operators were applied for noise suppression, shape analysis, edge detection and skeletonization, and once the features were acquired, region-based segmentation was applied. Boucheta et al. [72] applied fuzzy rules to the morphological operators to take shape-, edge- and texture-based decisions on region segmentation; fuzzy-order-based RGB color-space processing was applied to extract the effective region, multi-valued functions were applied for texture classification, and the method achieved an accuracy of over 77%. Pujol et al. [73] used a threshold method with mathematical morphology for optimal segmentation; the gradient threshold value was identified through adaptive evaluation, and the method achieved effective segmentation results in optimal time.
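The following NumPy sketch implements binary dilation, erosion, opening and closing in the spirit of Eqs. (5)-(8) for a binary image and a binary structuring element; the naive loops are written to mirror the set definitions rather than for efficiency, and in practice library routines (e.g. from scikit-image or OpenCV) would normally be used instead.

```python
import numpy as np

def dilate(img, st):
    """Binary dilation (Eq. 5): a pixel is set if the structuring element
    placed over it covers any foreground pixel."""
    ph, pw = st.shape[0] // 2, st.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + st.shape[0], j:j + st.shape[1]]
            out[i, j] = np.any(window[st.astype(bool)])
    return out

def erode(img, st):
    """Binary erosion (Eq. 6): a pixel survives only if the structuring
    element fits entirely inside the foreground around it."""
    ph, pw = st.shape[0] // 2, st.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + st.shape[0], j:j + st.shape[1]]
            out[i, j] = np.all(window[st.astype(bool)])
    return out

def opening(img, st):   # Eq. (7): erosion followed by dilation
    return dilate(erode(img, st), st)

def closing(img, st):   # Eq. (8): dilation followed by erosion
    return erode(dilate(img, st), st)
```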

4.2 Skin based Segmentation

Skin-based segmentation [64, 74] performs color-value analysis to locate the face-skin area. Skin-color modeling has been performed on different color models such as RGB (Red-Green-Blue), YCbCr, CMYK and HSV. Depending on the dataset images and scenes, different threshold limits and color ranges have been applied to locate the face and other skin areas; adaptive thresholding methods observe the image against the varying background and illumination and dynamically identify the color range used to extract the skin-pixel range. Shaik et al. [75] provided a comparative study on skin segmentation methods in the HSV and YCbCr color spaces and identified that YCbCr achieved more accurate results for skin segmentation. The YCbCr [10] model is the most popular color model for skin-color segmentation, as it provides color-based uniformity over the image; the model perceives content-specific error and assigns weightage to the background and foreground regions. This color model is formed from the components luminance (Y), chrominance-blue (Cb) and chrominance-red (Cr). The transition from an RGB color image to YCbCr can be obtained using Eq. (9).

$$\left[ {\begin{array}{*{20}c} Y & {Cb} & {Cr} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} R & G & B \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {.299} & { - 0.168935} & {0.499813} \\ {.587} & { - 0.331665} & { - 0.418531} \\ {.114} & {0.50059} & { - 0.081282} \\ \end{array} } \right]$$
(9)

The transition from RGB to YCbCr is also shown in Fig. 4. In this color model, the luminance layer provides the illumination-free image; chrominance-blue and chrominance-red are obtained by removing the luminance vector from the blue and red layers of the RGB model, respectively, and the chrominance vectors express the skin area of the image. The figure shows the transition of the RGB image to the YCbCr image and the extraction of the chrominance-red (Cr) image. Finally, thresholding is applied to identify the skin-segmented region shown in Fig. 4(iv).

Fig. 4
figure 4

Face-Region Localization using Skin based Modelling [10] (i) Source RGB Image (ii) Converted YCbCr Image (iii) Extracted Chrominance (Cr) Image (iv) Skin Segmented Image
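A minimal sketch of the conversion in Eq. (9) followed by a threshold on the chrominance-red plane (as in Fig. 4(iv)) is given below; the Cr bounds used here are illustrative assumptions, not values taken from [10], and in practice they are tuned to the dataset and lighting conditions.

```python
import numpy as np

# Coefficient matrix of Eq. (9), row-vector form: [Y Cb Cr] = [R G B] * M
RGB_TO_YCBCR = np.array([
    [0.299, -0.168935,  0.499813],
    [0.587, -0.331665, -0.418531],
    [0.114,  0.500590, -0.081282],
])

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB image in [0, 1] to Y, Cb, Cr planes via Eq. (9)."""
    ycbcr = rgb @ RGB_TO_YCBCR
    return ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]

def cr_skin_mask(rgb, cr_range=(0.02, 0.18)):
    """Threshold the chrominance-red plane to localize skin (cf. Fig. 4(iv)).

    The cr_range bounds are illustrative assumptions, not values from the source.
    """
    _, _, cr = rgb_to_ycbcr(rgb)
    return (cr >= cr_range[0]) & (cr <= cr_range[1])
```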

Kang et al. [76] provided methods for improving skin-region detection by performing a sliding-window analysis; appearance-based analysis was conducted for face images with complex backgrounds, and a Haar wavelet was applied to reduce the computational cost of effective region detection, decreasing the false detection rate by up to 47%. A comparative study on the use of color models for face segmentation was provided by [77]; the performance of the RGB, HSI, CIELab and YCbCr color models was compared, and the best segmentation results were achieved with the HSI model, with a lower error rate and processing time.

5 Feature Extraction

Feature extraction [1, 15, 78] is extremely important for a face recognition system. This stage extracts the most relevant information and reduces the dataset size; decision-oriented information is obtained using a suitable feature extractor for classifying the dataset. If the facial features are not completely available, the specific-part information becomes even more valuable for taking important decisions. Feature extraction over face images is done in two steps: first locating the feature region, and second separating it from the rest of the face area. Shape, points, color information and textural aspects are the different kinds of information that can be acquired to describe a face effectively. Various parameters, methods and constraints have been used individually or in different combinations to improve the accuracy of facial recognition systems.

In recent years, the responsibility of the feature extraction [15] method has increased to handle various real-time challenges, including facial distortion, partial capturing, image quality and illumination issues. Parametric and non-parametric models have been investigated and included as a primary layer of the face-recognition system. The available feature generation methods are categorized in this section as feature descriptors; the recent work on feature extraction methods and measures is provided in Table 6, and the generalized, commonly used feature extraction methods for different aspects are provided in this section.

Table 6 Fuzzy Rule-based Computation on Inclination Angle

5.1 Geometric features

In face recognition, the foremost task is to identify whether the input image is a face image or not; mosaic images are useful to answer this question. Mosaic images are a series of images with different resolutions. To build this series, the original image is subdivided into square cells of n x n pixels, and each pixel of a cell is assigned the average gray value of all the pixels in that cell. The smaller the cell size, the clearer the picture will be. For example, for a face image of size 512 × 512, a cell size of 16 × 16 is enough to reveal the facial shape so that a decision can be taken on whether the image is a face image or not. Mosaic images with different cell sizes are shown in Fig. 5(b) and 5(c) [79]. This mosaic-image concept is important when the face has to be extracted from a scene or a video [80].

Fig. 5
figure 5

a Original Image, b Mosaic Image Cell Size 16, c Mosaic Image Cell Size 2
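A simple block-averaging sketch of the mosaic representation described above is shown below; the function name and defaults are illustrative.

```python
import numpy as np

def mosaic(img, cell=16):
    """Replace each cell x cell block of a grayscale image with its average
    gray value, producing the mosaic representation described above."""
    h, w = img.shape
    out = img.astype(float).copy()
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            block = img[i:i + cell, j:j + cell]
            out[i:i + cell, j:j + cell] = block.mean()
    return out.astype(img.dtype)

# Usage sketch: coarse = mosaic(face_512, cell=16)   # cf. Fig. 5(b)
```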

After recognizing the image as a face image, the facial features are acquired. These features are additive and primitive features that can be separated from the image. Edge and object features are the most significant basic kinds of structural and geometric features [1]. Figure 6 shows the curvic model of the face [81].

Fig. 6
figure 6

Curvic Model of Face [81]

These features include the distance between the eyes and the width and height of different facial components, including the eyes, lips and nose. The structural and shape features of the chin and head are also defined as geometrical and appearance-specific features.

Facial boundaries need to be identified to acquire the facial region. The geometric structure of the face and its inclination are identified to estimate the face position and pose; the center and side edges are estimated to localize the facial region, and the eye coordinates are also used as geometric features for accurate localization. The geometrical evaluation of the facial region and its inclination is provided through Eqs. (10), (11) and (12).

$$\text{Inclination of Face} = \frac{Left\_Eye.Y - Right\_Eye.Y}{Left\_Eye.X - Right\_Eye.X}$$
(10)

Similar to the inclination equation, the left and right boundaries of face are estimated through Eqs. (11) and (12)

$$\text{FLeftBoundary} = \frac{LeftEdge.Y - Nose.Y}{LeftEdge.X - Nose.X}$$
(11)

And for the right edge

$$\text{FRightBoundary} = \frac{RightEdge.Y - Nose.Y}{RightEdge.X - Nose.X}$$
(12)

Some of other structural features derived based on the facial description are provided in Eqs. (13)–(17) [1]

$${\text{Eye2EyeDist}} = \sqrt[2]{{\left( {Left\_Eye.X - Right\_Eye.X} \right)^{2} + \left( {Left\_Eye.Y - Right\_Eye.Y} \right)^{2} }}$$
(13)
$${\text{Left}}\_{\text{Eye2NoseDist}} = \sqrt[2]{{\left( {Left\_Eye.X - Nose.X} \right)^{2} + \left( {Left\_Eye.Y - Nose.Y} \right)^{2} }}$$
(14)
$${\text{Right}}\_{\text{Eye2NoseDist}} = \sqrt[2]{{\left( {Right\_Eye.X - Nose.X} \right)^{2} + \left( {Right\_Eye.Y - Nose.Y} \right)^{2} }}$$
(15)
$${\text{Nose}}\_{\text{Mid2Left}}\_{\text{Edge}} = \sqrt[2]{{\left( {Left\_Edge.X - Nose.X} \right)^{2} + \left( {Left\_Edge.Y - Nose.Y} \right)^{2} }}$$
(16)
$${\text{Nose}}\_{\text{Mid2Right}}\_{\text{Edge}} = \sqrt[2]{{\left( {Right\_Edge.X - Nose.X} \right)^{2} + \left( {Right\_Edge.Y - Nose.Y} \right)^{2} }}$$
(17)
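Assuming the landmark coordinates (eye centers, nose tip and face edges) are already available, the following sketch evaluates the inclination of Eq. (10) and the distances of Eqs. (13)-(17); the dictionary keys are illustrative names.

```python
import math

def geometric_features(lm):
    """Inclination (Eq. 10) and pairwise distances (Eqs. 13-17) from a
    dictionary of (x, y) landmark coordinates.

    Expected (illustrative) keys: 'left_eye', 'right_eye', 'nose',
    'left_edge', 'right_edge'.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    le, re, nose = lm['left_eye'], lm['right_eye'], lm['nose']
    return {
        'inclination': (le[1] - re[1]) / (le[0] - re[0]),   # Eq. (10)
        'eye2eye': dist(le, re),                            # Eq. (13)
        'left_eye2nose': dist(le, nose),                    # Eq. (14)
        'right_eye2nose': dist(re, nose),                   # Eq. (15)
        'nose2left_edge': dist(lm['left_edge'], nose),      # Eq. (16)
        'nose2right_edge': dist(lm['right_edge'], nose),    # Eq. (17)
    }
```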

When performing face recognition, all of these features can be used collectively with different weightage. Once these features are extracted, the localization cost of the face can be computed. As shown in Fig. 6, the face localization cost can be estimated under three main vectors: the Hair Line Curve, the Face Left Side Curve and the Face Right Side Curve [81, 82]. The cost estimation under any feature analysis is given by [81, 82].

$$\text{Cost of Localization}(f) = \text{Costdissimilarity}(S(f), Timg(f)) + \text{Costmiss}(f) + \text{Costspring}(Timg(f1), Timg(f2))$$
(18)

Here f is the specific feature for which the analysis is performed, S is the source input image and Timg is the training image against which the comparison is performed. Costdissimilarity is the dissimilarity between the source and the training image, Costmiss is the penalty applied when feature f does not exist in the source image, and Costspring is the difference between two features in the training image.

This formula estimates the similarity measure between the source and training images under a particular feature vector. When all N features are analyzed, the overall measure is given in Eq. (19)

$$\text{Featured Similarity Analysis} = \frac{\sum\nolimits_{f = 1}^{N} \text{Cost of Localization}(f)}{N}$$
(19)

When performing similarity analysis for the pixels of a specific area under a precise feature constraint, there are two main approaches: Nearest Neighbour analysis (NN) and the Normalized Cross-Correlation Coefficient (NCC). Nearest-neighbour analysis compares the pixel intensity with its four or eight neighbours; however, this approach is not robust against lighting effects. NCC, on the other hand, relies on gray-level difference analysis computed relative to the average intensity and is hardly affected by lighting conditions [83]. Another improved form of similarity analysis is the NPC (Normalized Principal Component) feature; Eq. (20) evaluates the NPC for two images.

$$NPC(Source) = \frac{(\sigma_{1}^{2} - \sigma_{2}^{2})^{2} + 4\sigma_{n}^{2}}{(\sigma_{1}^{2} - \sigma_{2}^{2})^{2}} \times \left( 0.5 + \frac{\sigma_{n}}{\sqrt{(\sigma_{1}^{2} - \sigma_{2}^{2})^{2} + 4\sigma_{n}^{2}}} \right)$$
(20)

Here, σ1 is the standard deviation of the input source image, σ2 is the standard deviation of the template image against which the similarity analysis is performed, and σn is the normalized correlation. To optimize the similarity analysis, these three measures are used collectively; the derived similarity measure, given by [83], is represented in Eq. (21).

$${\text{Similarity }}\left( {{\text{Source}}} \right) = \frac{1}{{NN\left( {Source} \right)}} + { }NCC\left( {Source} \right) + { }NPC\left( {Source} \right)$$
(21)

In 1992, Toshio [84] defined a fuzzy-based estimation to identify the inclination of the face. The fuzzy rules defined by the author depend on several parameters: the Length of the Skin Region (SL), the Width of the Skin Region (SW), the Length of the Hair Region touching the left side of the Skin Region (HL), and the Width of the Hair Region touching the right side of the Skin Region (HR). Based on these basic parameters, some decision parameters are derived [85]; these parameters A, B and C are shown in Eq. (22).

$$\left. {\begin{array}{*{20}c} {A{ } = { }\frac{SL}{{SW}}} \\ {B{ } = { }\frac{HL}{{SL}}} \\ {C{ } = { }\frac{HR}{{SW}}} \\ \end{array} } \right\}$$
(22)

Once these attributes are defined, the fuzzy membership functions are defined over them. The decision-based fuzzy memberships are shown in Fig. 7 for each parameter (A, B, C).

Fig. 7
figure 7

Fuzzy Generated Rules for Facial Decision Parameters

The values derived from the estimation of physical characteristics are trained under fuzzy logic, which divides these values into the grades High, Medium and Low. Figure 7a, b and c represent these values under the fuzzy ruleset, and the decision values derived from these rules are shown in Table 6 [84].

From this ruleset, the gradings for all three parameters A, B and C are derived; these gradings are represented by new parameters GAi, GBi and GCi for the ith membership function. Based on these gradings, the facial inclination can be identified [84]. Equation (23) represents the inclination of the face as an angular estimation.

$$\text{Face Inclination} = \frac{\sum\nolimits_{i = 1}^{3} \sum\nolimits_{j = 1}^{3} \sum\nolimits_{k = 1}^{3} (GA_{i} \cdot GB_{j} \cdot GC_{k}) \cdot angle}{\sum\nolimits_{i = 1}^{3} \sum\nolimits_{j = 1}^{3} \sum\nolimits_{k = 1}^{3} (GA_{i} \cdot GB_{j} \cdot GC_{k})}$$
(23)

Here, angle is the actual inclination obtained from Table 6 by performing the rule mapping.
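Reading Eq. (23) as a rule-weighted average over all combinations of the three membership grades, a minimal sketch is given below; the triple-index layout of the rule table and the variable names are assumptions made for illustration.

```python
def fuzzy_face_inclination(GA, GB, GC, rule_angle):
    """Rule-weighted average of inclination angles in the spirit of Eq. (23).

    GA, GB, GC: length-3 lists of membership grades for parameters A, B, C
    (e.g. Low / Medium / High).
    rule_angle[i][j][k]: angle assigned by the rule table (Table 6) to the
    membership combination (i, j, k) -- an assumed layout.
    """
    num, den = 0.0, 0.0
    for i in range(3):
        for j in range(3):
            for k in range(3):
                weight = GA[i] * GB[j] * GC[k]      # rule firing strength
                num += weight * rule_angle[i][j][k]
                den += weight
    return num / den if den else 0.0
```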

Once the face inclination is obtained, the next step is to align the face by rotating the face area so that it can be presented as a normalized face area. A fuzzy-based estimation can also be performed to represent the face model effectively [86]. According to this model, the face image is described using three primitive cells: the face cell (F), the background cell (B) and the hair cell (H). Two other, overlapping cells are described: the Hair/Face cell (H/F) and the Hair/Background cell (H/B). The complete image is divided into smaller rectangles or cells, and a fuzzy-based estimation is performed on each rectangle to identify its characteristics with respect to the defined cells. According to fuzzy set rules, the derivation of H/F and H/B is given by Eq. (24) [86].

$$\left. {\begin{array}{*{20}c} {H/F{ } = { }H{ } + { }F{ }{-}{ }HF} \\ {H/B{ } = { }H{ } + { }B{ }{-}{ }HB} \\ \end{array} } \right\}$$
(24)

These parameters are further used to acquire the hair, background and face region. The computation of these parameters is given through Eqs. (25) to (28).

$$HairArea = { }\frac{{\mathop \sum \nolimits_{i = 1}^{N} HairSimilarity\left( i \right)}}{N}$$
(25)

Equation (25) describes the hair-area identification within a particular block, where N is the number of pixels in that block; an intensity-based analysis is performed to identify the hair region. Similarly, Eq. (26) shows the identification of the facial region.

$$FaceArea = { }\frac{{\mathop \sum \nolimits_{i = 1}^{N} FaceSimilarity\left( i \right)}}{N}$$
(26)

After identification of Face and Hair regions, the background region is located using Eqs. (27) and (28) [86]

$$BackArea = { }\frac{{N - { }\mathop \sum \nolimits_{i = 1}^{N} HairSimilarity\left( i \right) - \mathop \sum \nolimits_{i = 1}^{N} FaceSimilarity\left( i \right)}}{N}$$
(27)
$${\text{BackArea }} = {\text{ N }}{-}{\text{ HairArea }}{-}{\text{ FaceArea}}$$
(28)

The textural feature is also extracted in the form of a rough contour. A template-based analysis is performed over the database images to analyze the contour feature; this template contour is generic for all the database images and defines a generalized face area, from which the important facial features can be extracted. As the face area is already aligned, only three vectors are needed for the feature extraction: the width, height and center point of a feature. Figure 8 shows the templates for head-area [17], face-area [17], eye and mouth extraction [87, 88].

Fig. 8
figure 8

Extraction of Component Specific Face Features [87]

5.2 Statistical Features

Statistical features are linear combinations over the image obtained through multivariate analysis. They define an effective categorization of features for face detection and quantify the facial features through quantitative values. These statistical measures are not only based on localization but also include intensity measures in different frequency domains; by defining diverse frequency spectra or distribution analyses, different facial features can be extracted from the face. One such analysis is the Eye Moment Analysis described by Xiaoguang in 1992 [89], shown in Eq. (29).

$$EyeMomentAnalysis = \mathop \sum \limits_{i = 1}^{M} { }\mathop \sum \limits_{j = 1}^{N} \left( {\frac{i - centerx}{{dist}}} \right)^{p} \left( {\frac{j - centery}{{dist}}} \right)^{q} f\left( {i,j} \right){ }$$
(29)

Here, (centerx, centery) are the center coordinates of the left or right eye.

dist is the distance between the eyes.

M, N are the width and height of the eye region.

f(i, j) is the intensity value at that particular eye point.
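A direct NumPy transcription of Eq. (29) is sketched below, assuming the eye patch, its center coordinates and the eye-to-eye distance are already known; the names and index conventions are illustrative.

```python
import numpy as np

def eye_moment(region, center, eye_dist, p=2, q=2):
    """Normalized moment of an eye region following Eq. (29).

    region: 2-D intensity array of the eye patch (M x N).
    center: (center_x, center_y) of the eye within the patch.
    eye_dist: distance between the two eyes, used as a scale normalizer.
    p, q: moment orders.
    """
    M, N = region.shape
    i = np.arange(1, M + 1)[:, None]          # row index of the patch
    j = np.arange(1, N + 1)[None, :]          # column index of the patch
    xi = (i - center[0]) / eye_dist
    yj = (j - center[1]) / eye_dist
    return float(np.sum((xi ** p) * (yj ** q) * region))
```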

Another intensity-based statistical measure used to identify the profile projection is the Walsh power spectrum analysis. This statistical analysis has the ability to perform effective feature extraction even when discrimination exists between faces. The spectrum analysis is the combination of the autocorrelation function and the dyadic power spectrum analysis [89]. Equation (30) represents the spectrum-based analysis.

$${\text{DA}}\left( {\text{i}} \right) = \frac{1}{N}\mathop \sum \limits_{k = 0}^{N - 1} p\left( k \right){*}p\left( {k \otimes i} \right)$$
(30)

Here, p(k) is the projection function and N is the length of the series; the projection, Eq. (31), is given as

$$P(z) = \int_{z} f(x, y)\, dw$$
(31)

Here w is the direction of the face. When this frequency spectrum is defined under the Walsh Transform function then it is given by Eq. (32)

$${\text{W}}\left[ {{\text{DA}}\left( {\text{i}} \right)} \right] = {\text{W}}\left[ {{\text{p}}\left( {\text{i}} \right)} \right]{\text{ W}}\left[ {{\text{p}}\left( {\text{i}} \right)} \right]$$
(32)

Texture is an important characteristic of an image used to identify objects and regions of interest. To perform intensity-based analysis, the textural features are extracted and computed as the main processing stage or for post-analysis. Texture is defined as an observed pattern on the surface of the image or the object; the pattern can be regular or repetitive and is described by common property types such as smoothness, graininess and coarseness. In facial feature identification, texture analysis is helpful to identify the skin area, hair area, background area, dress area, etc. [16]. Texture analysis is not a single term or analysis; instead, it is defined by different statistical measures given in [90], where about 14 different statistical measures are defined to identify and classify image textures. In the case of face images, five important measures [16] are commonly used: the Angular Second Moment, Contrast-based analysis, the Correlation Measure, the Inverse Difference Moment and the entropy measure. These measures are given in Eqs. (33)-(37), and a small computational sketch follows the list.

  • Angular Second Moment is the statistical measure to identify the homogeneity of the texture in terms of intensity analysis.

    $$AngularSecondMoment = \mathop \sum \limits_{i = 1}^{M} { }\mathop \sum \limits_{j = 1}^{N} Img\left( {i,j} \right)^{2} { }$$
    (33)
  • Contrast Based Analysis is defined as the local variation in the texture.

    $$ContrastBasedAnalysis = \mathop \sum \limits_{i = 0}^{255} { }i^{2} \left( {\mathop \sum \limits_{j = 1}^{M} { }\mathop \sum \limits_{k = 1}^{N} Img\left( {j,k} \right)} \right)$$
    (34)

    Here, i represents the gray level over the image with minimum value 0 to maximum value of 255.

  • Correlation Measure performs a reference-based analysis with respect to the mean and standard deviation values obtained from the image. A higher correlation value indicates a larger deviation of the image pixels from the average intensity and greater variability over the image.

    $$CorrelationMeasure = \frac{{\mathop \sum \nolimits_{i = 1}^{M} \mathop \sum \nolimits_{j = 1}^{N} Img\left( {i,j} \right) - { }Mean\left( x \right){*}Mean\left( y \right)}}{{StdDev\left( x \right){*}StdDev\left( y \right)}}$$
    (35)
  • Inverse Difference Moment is another contrast-based statistical analysis used to obtain the local variation over the face image.

    $${\text{InverseDiffMoment}} = \mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} \frac{1}{{1 + \left( {i - j} \right)}}{\text{Img }}\left( {{\text{x}},{\text{y}}} \right)$$
    (36)
  • Entropy is the analysis of randomness over the image. The lower the entropy value, the smoother the image will be.

    $${\text{Entropy}} = \mathop \sum \limits_{i = 1}^{M} \mathop \sum \limits_{j = 1}^{N} Img\left( {i,j} \right){*}Log\left( {Img\left( {i,j} \right)} \right)$$
    (37)
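The sketch below transcribes several of the measures above for a grayscale patch. Eq. (35) additionally needs the x/y means and standard deviations and is omitted here; the inverse difference moment uses the squared difference (the standard form) to avoid a zero denominator, and the entropy includes the usual minus sign. Practical systems typically compute the Haralick versions of these measures from a gray-level co-occurrence matrix rather than from raw intensities.

```python
import numpy as np

def texture_measures(img):
    """Texture measures for a grayscale patch, following Eqs. (33), (34), (36), (37)."""
    f = img.astype(float)
    M, N = f.shape
    # Eq. (33): angular second moment (homogeneity of intensities)
    asm = np.sum(f ** 2)
    # Eq. (34): contrast, weighting each gray level i (0..255) by i^2
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    contrast = np.sum((np.arange(256) ** 2) * hist)
    # Eq. (36): inverse difference moment; squared difference used here
    # (the standard form) to avoid a zero denominator
    i = np.arange(M)[:, None]
    j = np.arange(N)[None, :]
    idm = np.sum(f / (1.0 + (i - j) ** 2))
    # Eq. (37): entropy of the normalized intensities (with the usual minus sign)
    p = f / (f.sum() + 1e-12)
    entropy = -np.sum(p * np.log(p + 1e-12))
    return {'asm': asm, 'contrast': contrast, 'idm': idm, 'entropy': entropy}
```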

Now the main question arises: is a single feature capable of taking a decision about recognition? The answer is no, since there is variation even between different images of the same person's face; with a single feature, the similarity scores will be high and will overlap across different human beings. Because of this, different features must be integrated while performing the recognition process, which requires a suitable and appropriate approach. Some of these approaches are given by [91]:

  1. Take the most similar feature among the faces and use its score for the analysis.

  2. Take the average of these feature scores.

  3. Assign the weightage to these features under the primitive and non-primitive criteria.

  4. Assign the weightage to different geographical features under the statistical measures.

  5. Classify the images under different features and then recompute them under other features.

5.3 Holistic Features

Holistic [92, 93] features acquire global information from the pixels of face images. The most popular holistic feature-based methods are PCA (Principal Component Analysis) [94,95,96], LDA (Linear Discriminant Analysis) [97,98,99] and ICA (Independent Component Analysis). These methods perform data reduction and acquire the most relevant information in different respects to improve the performance of the facial recognition method. Eigenface generation and processing are the basic phenomena associated with these methods: PCA generates the eigenfaces from the training set and maps the input facial image onto them, while ICA derives a set of statistically independent features. The Fisherface is generated as the subspace obtained through linear discriminant analysis and is a more promising method than eigenface-based methods; LDA is particularly significant for high-dimensional datasets, using scatter-matrix-based feature vectors to map the faces between and within classes.
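As a minimal illustration of the eigenface idea described above, the following NumPy sketch computes the principal components of a set of flattened training faces and matches a probe face by nearest neighbour in the eigenspace; the function names and the simple distance-based matcher are illustrative.

```python
import numpy as np

def train_eigenfaces(faces, n_components=20):
    """Mean face and top principal components (eigenfaces) of the training set.

    faces: array of shape (n_samples, n_pixels), each row a flattened face.
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data; rows of vt are the principal axes (eigenfaces)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:n_components]
    weights = centered @ eigenfaces.T            # training faces in eigenspace
    return mean, eigenfaces, weights

def recognize(probe, mean, eigenfaces, weights, labels):
    """Project a probe face into the eigenspace and return the label of the
    nearest training face (a simple distance-based matcher)."""
    w = (probe - mean) @ eigenfaces.T
    idx = int(np.argmin(np.linalg.norm(weights - w, axis=1)))
    return labels[idx]
```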

Senthikumar et al. [100] provided a comparative study of PCA, KPCA (Kernel PCA), 2D PCA, ICA (Independent Component Analysis) and FDA (Fisher Discriminant Analysis). 2D PCA processes the 2D image matrices directly instead of taking a 1D vector as input; it generates the covariance matrix in 2D, which reduces the processing effort and optimizes the task of traditional PCA. In KPCA, a kernel function is applied to the input space to replace the expensive nonlinear mapping. ICA performs the analytical measure using two architectures, and the computation of these measures is applied to generate the ICA feature face. The FDA method comparatively obtains a higher scatter between classes while utilizing the desired Eigen features; the fisherface generated by this method gives an effective face representation. The comparative results show that KPCA and 2D PCA achieved a better recognition rate of over 90% for the Yale and Senthil databases. A fusion [101] feature framework using PCA and FLD (Fisher Linear Discriminant) was provided for generating a distance-based weighted feature face. A distance-in-feature-space (DIFS) measure was created by computing the confidence and weights. This fusion-score-based method achieved higher accuracy for the IPCV and FERET databases.

5.4 Appearance-based Features

The appearance-based features extract visible characteristics such as expression, marks and the structure of the face for recognizing an individual. These methods reduce the dimension of the face and acquire the most significant and contributing features. Decomposition methods, point-based methods and symmetric feature-based methods all fall under the appearance-based approaches. A decomposition-adaptive DA-DWT (Direction-Adaptive Discrete Wavelet Transformation) [102] method was defined for extracting expression, pose and illumination invariant features. These discriminative features were extracted by generating blocks and observing the low-frequency components with a directional evaluation. This local appearance descriptor achieved higher accuracy than the PCA, KPCA, LDA and LBP methods. A neighborhood preserving projection [103] based appearance-feature analysis method was provided for an effective representation of a face; a nearest neighbor based weighted procedure was applied for generating the connected features, and the method was robust under varying lighting conditions and occlusion. [104] extracted local features using the Steerable Pyramid (S-P) wavelet transform for accurate face recognition. The method divided the facial image into smaller blocks and performed appearance feature extraction using S-P-DWT decomposition, achieving better results than PCA, LDA, Gabor and curvelet based methods.

Facial marks [105] such as wrinkles, pimples, moles and scars can be used to recognize an individual. Textural and structural features can be used to identify these points or regions. These marks are particularly important when only a partial or occluded face is available. Researchers have used mark counts, positions and region-cover mapping to improve the accuracy of facial recognition. Ohzeki et al. [106] acquired rotation- and size-normalization adaptive feature points for the representation of facial characteristics and obtained stable face recognition using these point-specific holistic features. [107] provided a Hexagon Scale-Invariant Feature Transform (H-SIFT) for the generation of symmetric feature points. The method highlighted the edge response and low-contrast regions of the face, identified various marks and facial details, and achieved a higher recognition rate even for occluded faces.

5.5 Composite Feature Processors

A feature processor can perform significantly well on a particular aspect, objective or challenge of the face. However, as the complexity of the capturing scenario or environment increases, multiple feature forms are required to deal with the resulting challenges. Researchers have used different composite descriptors to handle distinct aspects of facial features; fusion, weighted or aggregative methods are available to combine them. A feature fusion [108] scheme was defined for generating the facial landmarks. The quadratic distance-based similarity matching was performed using fusion measures, and a similarity-adaptive fusion decision was taken for exploring the feature descriptor; the method reduced the error rate for both frontal and pose-variant faces. [109] combined the DWT and semi-decimated DWT (SDWT) methods for expression-invariant face recognition. The author used coefficient enhancement functions to strengthen the approach, and feature filtration was done using the Weber Local Descriptor (WLD) to highlight probabilistic regions over the face. This feature fusion method acquired the most significant features using a dimension reduction method and improved the accuracy for pose, illumination and age variant faces. Ngu et al. [110] applied PCA and a non-linear neural network for reducing the dimension of facial features and extracting the most effective visual features from face images. The method acquired correlation-adaptive features with similarity evaluation; it was tested on various image classes and achieved significant accuracy for each image set. Lu et al. [111] proposed covariance matrix regularization (CMR) for generating weights for the available features and combining them in matrix form. The method fused four features taken from color channels, LBP features and CNN features. The significance of the work was tested on the LFW, MultiPIE, AR and Georgia Tech databases (Table 7), and the method achieved higher accuracy for variant faces.

Table 7 Feature Extraction Methods

6 Recognition Models/Methods

Face recognition is accomplished by passing through a series of stages. Once the normalized facial region is obtained, the main responsibility for achieving high reliability, robustness and performance lies with the feature extractors and the classification algorithms. Researchers have provided various models and frameworks that integrate feature extraction, representation and face recognition. Each model makes a significant contribution either in terms of feature processors or the recognition method, or both. Some of these methods are traditional and have been improved by researchers to handle various real-time challenges. In this section, the most significant, popular and contributing face recognition frameworks, methods and measures are provided, together with the advancements and research contributions made to them. Table 8 lists the various face recognition models and methods along with the challenges addressed in the research. The section also provides a functional description of the traditional and popular methods adopted by researchers for feature generation and face recognition.

Table 8 Face Recognition/Classification Methods

6.1 Eigen Face Based Analysis

Eigenface [157,158,159] based facial feature analysis and recognition is one of the most used and important approaches; it is built on PCA (Principal Component Analysis) and was introduced in 1991. It is a distance-analysis-based measure that can be applied to the whole image or to a geometric feature part such as the face area or the eyes. In its simplest form, it does not use any individual facial feature and takes the entire face as the workspace. To perform this analysis, an initial analysis is carried out on the complete set of database images DImg1, DImg2, …, DImgN. Once the face set is defined, the next step is to compute the average face [157, 158], as shown in Eq. (38).

$$AvgImg = \frac{{\mathop \sum \nolimits_{i = 1}^{N} DImg\left( i \right)}}{N}$$
(38)

Now a new statistical dataset is generated based on the difference of each face from the average face AvgImg. This difference analysis is shown in Eq. (39).

$${\text{DiffDatasetImg}}\left( {\text{i}} \right) = {\text{DImg}}\left( {\text{i}} \right){-}{\text{AvgImg}}$$
(39)

Now a matrix is computed from this difference dataset, where each image contributes an outer-product term, as represented in Eq. (40).

$${\text{AtA}} = \left( {\mathop \sum \limits_{i = 1}^{M} {\text{DiffDatasetImg}}\left( i \right) \cdot {\text{DiffDatasetImg}}\left( i \right)^{T} } \right)/M$$
(40)

As the matrix is generated, the next work is to obtain the Eigenvalue and Eigenvector for the matrix. The Eigenvector is given by Eq. (41)

$$EigenVector\left( k \right) = { }\left(\mathop \sum \limits_{i = 1}^{M} DImg\left( i \right){*}X\right)/\sqrt {\lambda \left( k \right){*}M}$$
(41)

Here λ(k) is the Eigenvalue for the obtained matrix.

M is the number of database images.

Once the Eigenvectors are obtained, the Eigen distance between the input face and each database face is computed, as given in Eq. (42).

$${\text{EigenDistance }} = {\text{ EigenVector}}\left( {\text{k}} \right) - {\text{ InputFaceEigenVector}}$$
(42)

The face with the minimum Eigen distance will be accepted as the recognized face. The accuracy of the recognition process depends on the allowable error, which is used as the threshold while performing the distance-based match [157]. An example of eigenfaces is shown in Fig. 9, and a minimal computational sketch of Eqs. (38)–(42) is given after Fig. 9.

Fig. 9
figure 9

Eigen Face Values [158]
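
The NumPy sketch below follows Eqs. (38)–(42) loosely: it computes the average face, the difference images, the small M × M matrix, the eigenfaces and the eigenspace distances. The variable names, the number of retained eigenfaces k and the length normalization (instead of the √(λM) factor of Eq. (41)) are illustrative assumptions.

```python
import numpy as np

def eigenface_match(db_imgs, query, k=20):
    """Minimal eigenface pipeline: db_imgs is an (M, H*W) array of flattened
    training faces, query a flattened probe face of length H*W."""
    M = db_imgs.shape[0]
    avg_img = db_imgs.mean(axis=0)                  # Eq. (38): average face
    diff = db_imgs - avg_img                        # Eq. (39): difference images
    ata = diff @ diff.T / M                         # Eq. (40): small M x M matrix
    eigvals, eigvecs_small = np.linalg.eigh(ata)
    order = np.argsort(eigvals)[::-1][:k]           # keep the k largest eigenvalues
    # Eq. (41), loosely: map the small eigenvectors back to image space and
    # normalise them by their length rather than by sqrt(lambda * M)
    eigenfaces = diff.T @ eigvecs_small[:, order]
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    # project every training face and the query onto the eigenfaces
    train_w = diff @ eigenfaces
    query_w = (query - avg_img) @ eigenfaces
    # Eq. (42): the face with the smallest distance in eigenspace is accepted
    dists = np.linalg.norm(train_w - query_w, axis=1)
    return int(np.argmin(dists)), float(dists.min())
```
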

The main drawback of this approach is its sensitivity to image size, brightness, background, head orientation, etc. The eigenface analysis is effective for known faces, but identifying an unknown face requires a large dataset, and even then the obtained results are not very reliable. The eigenface-based approach uses only a few selective features from the face and obtains results based on approximation. The effectiveness of the results depends on the pre-processing stage: only if the input face and the dataset are normalized will the results be effective [157, 158].

An improvement over the standard eigenface recognition is the modular eigenface recognition system. This approach divides the whole dataset into smaller datasets with respect to certain features, and a layered system of eigenfaces and eigenfeatures is applied on each layer to obtain a better classification of a large facial dataset. The work was implemented on a large dataset of 3000 images and achieved a recognition rate of up to 98%.

Another improvement to the eigenface approach is provided by [159]. The author introduced the Dual Eigenspace Method (DEM), based on the K-L transformation. It had already been observed that the traditional eigenface approach is not effective when the training images vary in head position, lighting conditions, facial expression, etc. DEM addresses this problem by introducing distributed feature analysis; the distributed features are collectively called the "Face Space". Instead of comparing the complete facial image, the analysis is performed on the feature class. The representation of this feature matching is given in Eq. (43).

$${\text{Si}} = \frac{1}{P}\mathop \sum \limits_{i = 1}^{P} \left( {m_{i} - m} \right)\left( {m_{i} - m} \right)^{T}$$
(43)

Here, P is the number of people in the training set and (mi − m) is the difference of each individual's mean feature from the overall mean. The author observed that DEM provided a more accurate approximation for recognition and a reduced error rate. This approach improved the recognition rate to about 97%, compared with about 87% for the traditional approach.

6.2 Local Correlation and Multiscale Integration Approach

It is a feature-correlation analysis approach for recognizing an object over the dataset. The features are represented in the form of local autocorrelation coefficients; the author computed about 25 autocorrelation coefficients under different kernels, shown in Fig. 10(a). The kernels are represented as n × n matrices where n is either 3 or 5. A kernel matrix is scanned over the image and a value is computed for each marked pixel position, as shown in Fig. 10(c). The kernel values are multiplied with the respective image pixel values and an aggregate value is obtained for the marked position; this aggregate value is called the correlation coefficient. Based on the marked positions, a number of different kernels are possible, and the author used about 25 kernels of varied levels to perform the feature computation [160]. To handle low-resolution images, kernel scaling can also be applied; Fig. 10(b) shows the scaling of a kernel from 3 × 3 to 5 × 5 with the same order value 3.

Fig. 10
figure 10

Kernel Matrix Representation for Local Correlation Method

After the extraction of the feature vector, the next stage is classification. The author adopts LDA (Linear Discriminant Analysis) on the feature vectors \({x}_{m}^{k}\) obtained using correlation coefficient analysis. Here k is the class index and m is the index of the image within each class. LDA uses a distance-vector analysis of the feature vectors for both inter-class and intra-class similarity [160]. The intra-class analysis is given in Eq. (44)

$${\text{Distintra}} = \sum\limits_{k = 1}^{K} {\sum\limits_{m = 1}^{M} {(x_{m}^{k} - Mean({\text{X}}^{{\text{k}}} ))\;(x_{m}^{k} - Mean({\text{X}}^{{\text{k}}} ))^{T} } }$$
(44)

whereas the inter-class analysis is presented by Eq. (45)

$${\text{Distinter}} = \sum\limits_{k = 1}^{K} {(x^{k} - Mean({\text{X}}^{{\text{k}}} ))\;(x^{k} - Mean({\text{X}}^{{\text{k}}} ))^{T} }$$
(45)

While performing recognition, it is required that Distinter be maximized and Distintra be minimized. With about 25 kernels, the success ratio obtained by this approach is 95%, and there is scope for more accurate recognition if the number of kernel instances is increased. A sketch of the scatter computation is given below.
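
The sketch below computes the within-class and between-class scatter matrices in the conventional LDA form. It assumes the correlation-coefficient feature vectors are stacked row-wise in an array X with one class label per row, which is a simplification of the notation of Eqs. (44)–(45).

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (cf. Eq. 44) and between-class (cf. Eq. 45) scatter matrices;
    X has shape (num_samples, num_features) and y holds the class label of each row."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_within = np.zeros((d, d))
    S_between = np.zeros((d, d))
    for k in classes:
        Xk = X[y == k]
        mean_k = Xk.mean(axis=0)
        centred = Xk - mean_k
        S_within += centred.T @ centred                 # intra-class spread to minimise
        diff = (mean_k - overall_mean).reshape(-1, 1)
        S_between += Xk.shape[0] * (diff @ diff.T)      # inter-class spread to maximise
    return S_within, S_between
```
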

6.3 N Tuple Classifier

N-Tuple Classifier [161, 162] is a single-layer classification technique based on integer-valued vectors obtained from the binarization of the image. It is one of the most straightforward and fast classification approaches. This method performs single-pass training by extracting n-tuples from the face image. In the traditional form of n-tuple classification, a lookup table is maintained to track the address of each training image. To perform recognition, a distance-based analysis is conducted on the coded values; in the traditional approach, arithmetic distance and Hamming distance were used along with a Gray encoding scheme. The drawback of the traditional approach was its lower recognition rate. The Continuous N-Tuple Classifier improved this architecture by working in a d-dimensional input space.

This modified approach defines a vector instead of a single address to represent an image. This vector space defines a correlated subset of the original image; the vector is defined over real values so that a more accurate derivation can be obtained from the image set. The projection vector obtained from an image is represented by Eq. (46)

$${\text{Yi }} = {\text{ x}}\left( {{\text{a1}}} \right),{\text{ x}}\left( {{\text{a2}}} \right) \ldots{\text{x}}\left( {{\text{an}}} \right)$$
(46)

To perform recognition over this vector set, a distance-based analysis is applied. The Manhattan distance was adopted in this work to perform accurate recognition [161]. The equation for this distance-based mapping is provided in Eq. (47).

$${\text{D}}\left( {{\text{x}},{\text{y}}} \right) = \mathop \sum \limits_{i = 1}^{n} \left| {xi - yi} \right|$$
(47)

Here, xi is the vector of a training-set image and yi represents the vector of the input face image. The training image with the minimum distance is taken as the final recognized face. Later, this method was improved with other distance measures such as the Euclidean distance and the weighted Euclidean distance. The method provides a multi-level recognition process so that more accurate recognition can be obtained [161]. A sketch of the distance-based decision rule is given below.
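
A minimal sketch of the continuous n-tuple decision rule of Eqs. (46)–(47) is shown below; the sampling addresses, array shapes and function names are illustrative assumptions.

```python
import numpy as np

def project(image, addresses):
    """Eq. (46): Y_i = (x(a_1), ..., x(a_n)), i.e. sample the flattened image
    at a fixed set of pixel addresses."""
    return np.asarray(image, dtype=float).ravel()[addresses]

def ntuple_classify(train_vectors, train_labels, probe_vector):
    """Return the label of the training image whose projection vector has the
    smallest Manhattan distance (Eq. 47) to the probe's projection vector."""
    train = np.asarray(train_vectors, dtype=float)       # shape (num_train, n)
    probe = np.asarray(probe_vector, dtype=float)        # shape (n,)
    dists = np.abs(train - probe).sum(axis=1)            # D(x, y) = sum |x_i - y_i|
    best = int(np.argmin(dists))
    return train_labels[best], float(dists[best])
```
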

6.4 Probabilistic Decision Based Neural Network

PDBNN [163] is an intelligent classifier that combines statistical analysis with a neural network. The probabilistic part represents the statistical analysis, while the DBNN represents an intelligent multi-sub-network scheme in which each face class is represented by a sub-network. The number of classes over the facial database depends on the structural analysis. In its simplest form, a DBNN is defined with two sub-networks, one representing face images and the other representing non-face images. The DBNN is defined with a learning rule that provides the confidence value at which an image is accepted as a face image. The learning process of DBNN is divided into two sub-stages. In the first stage, unsupervised learning is performed locally to train each sub-network separately; this stage uses positive patterns so that facial images are detected. Later, supervised learning is applied globally and the decision boundaries are obtained; this stage uses negative patterns so that non-face images are correctly rejected. To perform the classification based on likelihood functions, weights can be assigned to different regions over the face. For this weight assignment, each sub-network is divided into a number of clusters and a Gaussian-mixture-based analysis is applied to each cluster [163]. The probabilistic likelihood function is defined in Eq. (48).

$${\text{P}}\left( {{\text{x}}\left( {\text{t}} \right)|{\text{w}}} \right) = \mathop \sum \limits_{r = 1}^{R} P\left( {\uptheta _{r} {|}w} \right)p(x\left( t \right)|w,\uptheta _{r} )$$
(48)

where \({\uptheta }_{r}\) represents the rth cluster of the sub-network and \(P({\uptheta }_{r} |w)\) represents the prior probability of cluster r.

In the basic model, this scheme is divided into three main modules. In the first module, the probabilistic neural network is applied to identify the face area; a threshold specification is defined as the decision criterion and the pre-processed face is obtained. The neural network provides effective detection of the face area under different face positions. A minimal sketch of the per-class likelihood of Eq. (48) is given below.
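
The sketch below evaluates the per-class likelihood of Eq. (48) as a mixture of Gaussian clusters. Modelling each cluster with a diagonal covariance, and the names of the parameters, are assumed simplifications of the original sub-network formulation.

```python
import numpy as np

def class_likelihood(x, weights, means, variances):
    """Likelihood of feature vector x under one sub-network, modelled here as a
    mixture of R diagonal Gaussian clusters: sum_r P(theta_r|w) * p(x|w, theta_r)."""
    x = np.asarray(x, dtype=float)
    likelihood = 0.0
    for w_r, mu_r, var_r in zip(weights, means, variances):
        # Gaussian density of cluster r with a diagonal covariance
        norm = np.prod(1.0 / np.sqrt(2.0 * np.pi * var_r))
        dens = norm * np.exp(-0.5 * np.sum((x - mu_r) ** 2 / var_r))
        likelihood += w_r * dens
    return likelihood

# decision rule: the face sub-network with the highest likelihood (above a
# confidence threshold) accepts the probe, otherwise it is rejected as non-face.
```
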

6.5 Elastic Bunch Graph Matching [164]

It is another data-structure-oriented feature analysis approach that can recognize faces from a single training instance per person. This prediction system is based on a graph representation of the face image in which the feature nodes are obtained from Gabor wavelet coefficients. The accuracy of this method depends on the derivation of these Gabor coefficient sets, called jets, and on the localization of the feature nodes. The feature nodes are localized at distinctive facial landmarks, called fiducial points, which can be identified under changes in viewpoint. Once the feature nodes are localized, they are represented in a new data structure called a bunch graph. A bunch graph is created from multiple jets, and recognition is performed by comparing the magnitude values of these jets under a similarity analysis mechanism. The foremost stage after pre-processing is the extraction of the jets from the face image; the jet extraction based on the convolution of the image is given in Eq. (49) [164]

$$\varphi_{j} \left( {\vec{x}} \right) = \frac{{k_{j}^{2} }}{{\sigma^{2} }}\exp \left( { - \frac{{k_{j}^{2} x^{2} }}{{2\sigma^{2} }}} \right)\left[ {\exp \left( {i\,\vec{k}_{j} \cdot \vec{x}} \right) - \exp \left( { - \frac{{\sigma^{2} }}{2}} \right)} \right]$$
(49)

Here \(\sigma = 2\pi\) and \(k_{j}\) is the wave vector.

In this work, a jet is defined as a set of 40 Gabor wavelet coefficients that capture different characteristics such as orientation, size, frequency and phase. After the extraction of the Gabor jets, a similarity analysis based on these features is performed. The similarity function used in this work is given in Eq. (50).

$$S\left( {J,J^{\prime}} \right) = \frac{{\mathop \sum \nolimits_{j} img_{j} \,img^{\prime}_{j} }}{{\sqrt {\mathop \sum \nolimits_{j} img_{j}^{2} \;\mathop \sum \nolimits_{j} img^{\prime 2}_{j} } }}$$
(50)

where J and J’ are two Jets.

The similarity is estimated with respect to the same position vector and for different displacements. The method identifies the number of jets that satisfy the similarity level, and the image with the maximum jet mapping is selected as the resulting image. Another similarity function is also used to analyze the jets under phase variation, as shown in Eq. (51).

$$S\left( {J,J^{\prime}} \right) = \frac{{\mathop \sum \nolimits_{j} img_{j} \,img^{\prime}_{j} \cos \left( {\emptyset_{j} - \emptyset^{\prime}_{j} - \bar{d} \cdot \overline{{k_{j} }} } \right)}}{{\sqrt {\mathop \sum \nolimits_{j} img_{j}^{2} \;\mathop \sum \nolimits_{j} img^{\prime 2}_{j} } }}$$
(51)

Here, \(\bar{d}\) is the displacement vector and \((\emptyset_{j} - \emptyset^{\prime}_{j})\) represents the change in phase. Based on these two functions, the similarity of two sets of Gabor coefficients is estimated. The approach is effective and provides a recognition rate of more than 98%. A small sketch of these similarity functions follows.
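
The sketch below implements the two jet-similarity functions of Eqs. (50) and (51) on arrays of coefficient magnitudes and phases. The array-based representation of jets, wave vectors and the displacement is an assumption made for illustration.

```python
import numpy as np

def jet_similarity(mag_a, mag_b):
    """Magnitude-only jet similarity (Eq. 50): normalised dot product of the
    40 Gabor coefficient magnitudes of two jets."""
    a, b = np.asarray(mag_a, float), np.asarray(mag_b, float)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def jet_similarity_phase(mag_a, phase_a, mag_b, phase_b, disp, k_vectors):
    """Phase-sensitive similarity (Eq. 51): disp is the estimated displacement
    vector and k_vectors holds the wave vector of each coefficient (shape (40, 2))."""
    a, b = np.asarray(mag_a, float), np.asarray(mag_b, float)
    phase_term = np.cos(np.asarray(phase_a, float) - np.asarray(phase_b, float)
                        - np.asarray(k_vectors, float) @ np.asarray(disp, float))
    return float(np.sum(a * b * phase_term) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
```
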

6.6 Matching Pursuit Filters

The main idea of the matching pursuit filter [165] is the selection of the best decomposition function over the feature set. To obtain the best function f, a greedy adaptive approach is used: on each iteration a wavelet decomposition function g is applied, and the function providing the maximum projection is selected, as in Eq. (52).

$$F = \mathop {\max }\limits_{{g_{i} \in D}} \left| {\left\langle {R_{i} f,\;g_{i} } \right\rangle } \right|$$
(52)

Here Ri represents the residual of the featured function set at iteration i, f is the function being decomposed, gi is the wavelet applied at iteration i, D is the dictionary of candidate functions, and F is the optimal adaptive decomposition function taken into account for the analysis.

The matching pursuit filter plays an adaptive role in the recognition process. The overall functioning is divided into two main stages: the first defines a feature function that extracts and represents the facial features for the dataset, and the second selects the adaptive feature components from the generated set. A feature component of the facial image is represented as a coefficient vector, obtained from the facial image by using a projection vector. Based on the projection type, the residual image is derived from the actual image t; the residual at iteration i is written Rit, and if a component gi is also defined for the image, the component of the residual at the ith iteration is given by (Rit, gi). With each iteration this residual image is decomposed and updated, and after this iterative process the wavelet-based feature extraction stage is complete [165].

After the selection algorithm, the recognition algorithm identifies the similarity between the input object and the training images. To do so, an angle-based analysis is performed on the similarity coefficients: the centroid of each training image and of the input image is taken, and the mean distance is estimated as the similarity measure. Analyzing angles rather than raw distances provides robustness to linear changes in the images. The presented work provided an accuracy of up to 95.4% when the analysis was performed on five geometric features. As the complete work is statistical, high runtime performance can be achieved, and a further advantage is that the approach can be used effectively for visible as well as infrared images [165]. A greedy sketch of the atom selection of Eq. (52) is given below.
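
The sketch below performs the greedy selection of Eq. (52): at each iteration the dictionary atom with the largest inner product against the current residual is chosen and subtracted. The unit-norm dictionary, the number of iterations and the residual update rule are stated assumptions of a standard matching pursuit, not a reproduction of [165].

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=5):
    """Greedy matching pursuit: dictionary is an (n_atoms, d) array of unit-norm
    atoms, signal is a length-d vector (e.g. a flattened face region)."""
    residual = np.asarray(signal, dtype=float).copy()
    chosen, coeffs = [], []
    for _ in range(n_iter):
        projections = dictionary @ residual            # <R_i f, g> for every atom
        best = int(np.argmax(np.abs(projections)))     # Eq. (52): maximum projection
        coeff = float(projections[best])
        residual = residual - coeff * dictionary[best] # update the residual
        chosen.append(best)
        coeffs.append(coeff)
    return chosen, coeffs, residual
```
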

6.7 Analytic-to-Holistic Approach

This is a geometric feature-point analysis approach that performs point-based matching rather than matching the whole facial image. The work is divided into two stages, an analytical stage and a holistic stage. The analytical stage identifies the facial feature points based on positional analysis. In this work, five facial regions are considered to identify 15 feature points: the facial boundary, eyes, mouth, eyebrows and nose. Six feature points are extracted for the facial boundary, four for the eyes, two for the mouth, two for the eyebrows and one for the nose. The model performs the analysis based on internal constraints, external constraints and image constraints. The boundary is extracted as a parametric curve, represented by Eq. (53) [92]

$$\begin{aligned} {\text{BoundaryCurve}}\left( {{\text{Img}}\left( {\text{t}} \right)} \right){ } & = { }\mathop \smallint \limits_{0}^{1} InternalConstraint\left( {Img\left( t \right)} \right) \\ & + { }ExternalConstraint\left( {Img\left( t \right)} \right) + ImageConstraint\left( {Img\left( t \right)} \right) \\ \end{aligned}$$
(53)

where Img(t) is the facial image vector and t is the parameter with respect to which the extraction is performed. With each iteration, forces are applied to the curve and the vector constraints become more specific; magnitude and distance analyses are performed to extract a more precise boundary over the face. Once the facial boundary is identified, an intelligent measure is applied to identify the mouth and eye regions on the facial image, and in the same way all 15 feature points over the facial image are extracted. After extraction of these feature points, point-based matching against all database images is performed for all points. The recognition score is computed using a weighted Euclidean distance, given in Eq. (54).

$${\text{MatchingRatio}} = \sum\limits_{i = 1}^{15} {x(i)\left\| {P(i) - Q(i)} \right\|^{2} }$$
(54)

Here i is the specific feature point, x(i) is the weight assigned to each feature, P is the feature set of the input image and Q is the feature set of the database image. The recognition rate achieved by the work is about 94%. A sketch of this weighted point matching is given below.
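
A small sketch of the weighted point matching of Eq. (54); the 15 × 2 coordinate arrays and the per-point weights are illustrative assumptions.

```python
import numpy as np

def weighted_point_distance(probe_points, gallery_points, weights):
    """Eq. (54): sum over the 15 feature points of x(i) * ||P(i) - Q(i)||^2,
    where probe_points and gallery_points are (15, 2) coordinate arrays."""
    P = np.asarray(probe_points, dtype=float)
    Q = np.asarray(gallery_points, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * np.sum((P - Q) ** 2, axis=1)))

# the gallery face with the smallest weighted distance is reported as the match
```
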

6.8 Nearest Feature Line Method

It is a feature-point-based analysis approach called the Nearest Feature Line (NFL) method [166, 167]. At the initial stage, two distinct feature points are collected from the same feature class. Once the points are obtained, the next step is to generate a feature line between them and use it as the feature set for the analysis. To identify the nearest straight-line features, interpolation and extrapolation are performed along the feature line. The advantage of this feature-line analysis is robustness in terms of illumination, pose and expression. After obtaining the feature-line dataset, a distance-based analysis is performed between the input feature set and the database feature sets, and the image with the minimum distance is selected as the matched image. The feature class of this method is derived using the eigenface approach, and the approach provides an accuracy of up to 97% [166]. Another improvement to line-based recognition was proposed by [167]: instead of a single line segment, multiple randomly selected line segments are used. During the pre-processing stage, the oval shape of the facial boundary is extracted and geometric operations such as rotation and scaling are performed over the face to achieve robustness in terms of geometric features. The endpoints of the line segments lie on the boundary of the facial oval area, and the accuracy and performance of the approach depend on how well these random line segments cover the facial area. To perform classification and recognition, a nearest neighbor analysis is performed over the dataset images. A small sketch of the feature-line distance is given below.
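
The sketch below computes the distance from a query feature vector to the line through two prototype feature vectors, allowing interpolation and extrapolation along the line as described above; the names and shapes are assumptions.

```python
import numpy as np

def feature_line_distance(query, p1, p2):
    """Distance from the query feature vector to the feature line through
    prototypes p1 and p2; the projection may fall outside the segment [p1, p2]."""
    q, a, b = (np.asarray(v, dtype=float) for v in (query, p1, p2))
    direction = b - a
    t = np.dot(q - a, direction) / np.dot(direction, direction)  # position on the line
    projection = a + t * direction                               # interpolation/extrapolation
    return float(np.linalg.norm(q - projection))

# classification: evaluate this distance for every pair of prototypes in every
# class and assign the query to the class holding the nearest feature line.
```
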

6.9 Elastic Graph Matching

This approach defines a graph over the facial area with a set of feature nodes. Each node of this graph extracts positional features based on neighborhood analysis. To provide robustness, Gabor-based feature decomposition is performed; the Gabor filter covers orientation and rotational aspects so that more accurate decisions can be made. This Gabor-based approach provides analytical matching under m orientations and n resolutions [168].

6.10 Local Binary Pattern

The textural features of the facial image are unified in structural and textural form using LBP (Local Binary Pattern) [147, 148]. This method performs a textural neighborhood analysis and quantifies it as 0 and 1 values. A center-specific evaluation is conducted and a threshold rule is applied to generate the binary pattern. Researchers have used various rules based on the mean, median, maximum, etc. for computing the adaptive binary values. The size of the window also affects the generation of the LBP pattern; the standard sizes adopted by most researchers are 3 × 3 or 5 × 5. After generating the rectangular blocks, each processing block is evaluated with respect to its eight surrounding neighbor blocks, and a distance vector is computed for each surrounding block with respect to the center block. The 3 × 3 neighborhood blocks are represented by {b0, b1, b2, …, b8}, where b0 is the center block. An intensity-based comparison is performed for the neighbor blocks: each neighbor is compared with the center block using a threshold value. The rule for generating the binary values of the neighbor blocks is given in Eq. (55)

$$b^{\prime}_{j} = \left\{ {\begin{array}{*{20}c} {0,} & {if\;b_{j} < b_{0} } \\ {1,} & {otherwise} \\ \end{array} } \right.$$
(55)

This equation generates the binary pattern over the facial image. The resulting structural and textural pattern can be used as an adaptive feature to recognize the facial image; a small computational sketch is given below.
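
The sketch below applies the thresholding rule of Eq. (55) over a greyscale image with a 3 × 3 window; the clockwise ordering of the neighbor bits is an assumed convention.

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP: each of the eight neighbours is coded 1 when it is not
    smaller than the centre pixel (Eq. 55), and the bits form the local pattern."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # offsets of the eight neighbours b1..b8, walked clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes
```
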

7 Future Directions/Challenges

Face recognition has already proven to be the foremost and most visible biometric feature for recognizing an individual. Identifying a stranger, or recognizing an individual without their cooperation, is possible only through facial biometrics. This biometric can be acquired through mobile cameras, CCTVs, webcams and other capturing devices. With the advancement of capturing devices and the associated technology, the applications of facial recognition are also spreading: forensic science, patient pain recognition, expression-based customer or person assessment, etc. are becoming popular research topics. Some of the new and emerging research areas in face recognition are described below.

7.1 Group Face Recognition

The beauty of the face as a biometric is its availability and the ease of capturing it in an open environment. Individuals can be captured through CCTV or hidden cameras in open environments and compared over a wide dataset for real-time authentication and authorization. Crime scene investigations, classroom surveillance, and college or campus entries can be controlled through such capturing in real public environments. These applications require recognizing and isolating individuals within groups [12, 169, 170]. In group images there can be many individuals with different environmental and situational complexities; pose variation, varying distance from the camera and partial occlusion are the common challenges in group photo processing. Liao et al. [171] provided a method for groupwise image registration, in which a query-image-based reconstruction method was used to transform the group-matching problem into multiple individual matches, and salient-scale-based component matching was applied. However, as the number of individuals in the environment increases and body occlusion occurs, it becomes more difficult to detect individuals in the group. Comparatively little work has been done on identifying an individual from a group face, and it remains an interesting and challenging research area.

7.2 Morphed Faces

Today, many applications provide easy morphing in terms of changing the complexion, structure and facial elements; even a non-expert can generate morphed [172] images and faces. Traditional FRS is not robust against morphed faces, so morphed face recognition and identification of the real face are recent challenges in FRS. The challenge becomes more critical when the morphed image is constructed by taking the facial components of more than one individual; in such a case a component-specific feature map is required for recognizing the individual. Image morphing is also done by individuals to appear more attractive while posting images in public domains such as avatars [173, 174]. Face recognition for these morphed and retouched images is a challenging and innovative research area.

Facial marks, structure and textural features are collectively used for facial recognition. Heavy makeup or plastic surgery [18] can remove or alter these features, and the accuracy of the face recognition system degrades in their absence. Plastic-surgery- or makeup-robust facial recognition is therefore one of the recent research directions. The facial traits are changed by applying these skin treatments; deep feature extractors and facial point [175] analysis have been used for the identification of scale-variant plastic surgery faces. Research can focus on identifying feature descriptors that are not affected by plastic surgery; the unaffected regions, structures and points can then be used collectively or individually to improve the recognition of such altered faces.

7.3 Sketches to Face Recognition

The recognition of a suspect or criminal over a facial database is one of the advanced research areas. The challenge in such systems is that only a large database of history-sheeters and a hand-drawn sketch of the suspect are available, and even the images in the face database may be old and unclear. In such cases a sketch-to-face [176] recognition system is required to recognize the suspect. The quality of the system also depends on the briefing of the eyewitness and the expertise of the forensic artist. This kind of recognition system must be adaptive to facial components, marks, structure and aging. Composite component processing can be applied for the identification of near-matched suspects from criminal databases. Achieving better accuracy and robustness against similar sketch queries is still a research problem; selective and composite component-based hybrid feature matching can be applied to improve the accuracy of sketch-to-face mapping.

7.4 Infrared Face Recognition

In recent years, thermal images have been adopted as a solution to handle environmental complexities and challenges; night-vision cameras use the same technique to identify individuals in the dark. Thermal [177] images capture the temperature of the facial skin in the form of heat radiation and can handle noise, opacity and low resolution effectively. Once the thermal images are collected, feature acquisition and recognition methods are applied for effective face recognition. Researchers have provided appearance-based, local-matching and global feature-matching methods for accurate recognition of such faces. Even so, it is one of the recent areas with scope for recognizing faces under other real-time complexities. The fusion [178] of visible and thermal-IR images has been used to improve facial recognition against lighting variations; different combinations of similar and distinct features from both kinds of images can be used collectively to improve the robustness and accuracy of degraded face recognition. Farokhi et al. [22] provided a study of various methods and measures for the recognition of NIR (Near Infrared) faces, covering the frequency, moment, appearance and orientation based methods provided by researchers for effective infrared image recognition.

7.5 3D Face Recognition

3D images [21] provide a realistic representation of face images with depth information, and 3D faces offer extensive features in terms of pose. 3D cameras also allow the face to be captured from multiple angles. A lot of work has been provided by researchers for 3D face recognition, but it still lacks implementation in real environments. 3D face processing is an open research area that is also extending to various real-time challenges: illumination, environment and expression variation degrade the recognition performance for 3D faces. The transformation and deformation [179] between 2D and 3D is also a research problem when mapping faces across images of different dimensions. Patil et al. [180] identified various challenges and the scope of 3D face recognition in current scenarios; their application-based study, with the associated challenges and the available 3D databases it describes, can be used for initiating work in this area. Geometric and sparse [181] features have been used by researchers to improve the performance of 3D face recognition, and feature ranking and composite feature-based 3D face processing can be applied to improve the accuracy of 3D FRS.

7.6 MicroExpression Recognition

The scope of face expression recognition is integrated into many applications such as intention identification and pain identification. Most face-expression recognition systems identify six or seven facial expressions; however, each expression has internal sub-classes called micro-expressions [182]. Micro-expression recognition is a challenging and recent research area; its scope includes identifying the degree of an expression, such as the severity of pain in patients. Identification of depression [183] in autistic patients by observing facial micro-expressions is also a current research area. Discriminant, spatial and local information can be acquired and processed individually or collectively for the identification of micro-expressions.

8 Conclusion

Face recognition is the traditional and now most popular biometric feature. FRS is not limited to online and offline authentication; it is gaining popularity in various advanced applications, including intention identification, pain identification in hospitals, attention recognition, etc. Technological developments have put a camera in the hands of nearly every individual, and these cameras vary in quality, type and application. Surveillance systems, hidden cameras and mobile cameras are part of daily life, and infrared cameras, 3D cameras and night-vision cameras are available for advanced applications. Because of this, the challenges for face recognition systems keep increasing with the specifics of each application and environment. Illumination, noise, pose and expression variations affect the performance and quality of the face recognition system.

In this survey, a detailed study of the algorithmic transformations is provided with respect to the different challenges that exist in real-time environments. We explored each stage of the face recognition system with respect to the issues handled at that particular stage. The database processing and the analytical results obtained by different researchers for each specific stage are provided, and the most popular methods of each stage are described with extensive formulation. The paper has presented the approaches and their significance for improving the results of the pre-processing, segmentation, feature generation and recognition stages. The analytical results can identify the most popular databases as well as the algorithms/models for optimizing each stage of FRS. The paper has also identified the open research directions and applications in FRS.