1 Introduction

Face recognition (FR) is among the most well-studied problems in computer vision. Through the use of deep learning algorithms and larger datasets, researchers have seen substantial progress in FR, notably for constrained social media web images, such as high-resolution photos of famous faces taken by professional photographers [1]. However, the far more difficult problem of FR in unconstrained, low-resolution surveillance imagery remains unsolved and largely unexplored. Within image analysis and computer vision, face recognition is a challenging task. It is a biometric technology that uses a digital image to identify or authenticate a person, and it is mostly used in security and surveillance. Deep neural networks have lately made great progress in general object recognition [2]. Automatic face recognition and surveillance aid in the development of secure technology for the upcoming era of computing [3]. The literature reviewed here starts from the following observation: as stated in [4], variations in time, age, and circumstances affect each person’s face, skeletal structure, muscle development, and body composition. Face recognition systems for both images and videos are becoming increasingly popular. Such systems must cope with a variety of poses, expressions, and illuminations.

Compression also affects face recognition, because images are usually stored and transmitted in compressed form, whereas recognition methods have been tested extensively but mostly on uncompressed image files. Working with compressed video raises further challenges, such as tracking, image classification, and changes in object appearance, particularly in still-to-video settings where each detected blob in a compressed video segment must be examined.

Face recognition was demonstrated in real time using a camera, an image, or a set of faces tracked in a video by the researchers in [2]. They measured the distances between facial landmarks, compared the test image against the landmarks of previously encoded images, extracted HOG features, and then classified the faces despite variations in lighting, expression, aging, geometric transformations (translation, rotation, and scaling), or pose during the recognition phase. The researchers were able to build an automatic face recognition system using a picture or video of a person’s face acquired via a mobile device or a webcam.

Face recognition was achieved by integrating two methods: the Histogram of Oriented Gradients (HOG) and the Convolutional Neural Network (CNN). HOG excels at identifying image edges and corners. In contrast to the local binary pattern (LBP) [5], which uses all eight directions for each pixel, HOG uses a single direction for each pixel. However, the coarseness of the binning used by LBP causes it to lose information. Under complicated changes in lighting and over time, HOG features with fewer dimensions perform better than LBP features. With reduced computing time for feature extraction and smaller feature vectors, HOG features outperform VGG16, VGG19, and others.

The same image may yield different pixel values because of variations in the illumination and intensity of the acquired images, which is a significant impediment to identifying a person’s face. In the HOG approach, the acquired image was first converted to grayscale, and then the gradient at each pixel was computed from the change between lighter and darker pixel values. All of the faces in an image frame were detected based on this gradient analysis. Facial landmarks were then used to estimate the pose and project the image to a frontal view of the face.
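The gradient analysis behind HOG can be illustrated with a minimal sketch. It computes a magnitude-weighted histogram of gradient orientations for a single grayscale cell, which is the basic building block of a HOG descriptor. The function name `cell_orientation_histogram`, the nine-bin layout, and the 8x8 test cell are illustrative assumptions rather than details of the reviewed work, and a full HOG descriptor would also interpolate votes between bins and normalize over blocks of cells.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one grayscale cell."""
    cell = np.asarray(cell, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Guard against an angle that rounds to exactly 180 degrees.
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    # Every pixel votes for its orientation bin, weighted by gradient magnitude.
    np.add.at(hist, bins, magnitude)
    return hist

# Hypothetical 8x8 cell: a horizontal ramp from dark (left) to bright (right).
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_orientation_histogram(ramp)
```

Because the ramp only changes from left to right, all of the gradient energy falls into the first orientation bin, which is the kind of edge-direction signal a HOG-based detector relies on.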

A CNN was used to recognize faces by comparing the encoded face in the current frame with previously cached encoded faces. Face recognition has long been regarded as a watershed in image processing. Although cameras are now found in almost every home, on the streets, and in businesses, identifying a person from footage is a time-consuming operation that limits the effectiveness of security; this is one of the reasons face recognition has to be improved for effective use of webcams in surveillance [6]. The authors overcame the limitations of using webcams for surveillance by improving the face recognition algorithm. The algorithm continuously compares a live video stream with an uploaded image from the database, so that when the person of interest is detected, an alert is sent immediately. Surveillance cameras, particularly those installed at airports and other public locations, may be an extremely effective tool for locating missing people and other wanted individuals. Their work also handled the detection of more than one face.

Surveillance is essential nowadays, as societies depend on it to improve safety and security, especially where crime is likely to occur, such as car parks, supermarkets, office environments, banks, construction sites, and motorways. Currently, video data is mainly used for forensic purposes, which forfeits the benefit of a proactive real-time alerting system, since most crimes are discovered only after the harm has been done. This leaves room for further research on surveillance that monitors continuously and sends alerts in real time. Smart cameras are now being incorporated into intelligent surveillance systems to recognize faces in a crowd in real time.

2 Biometrics

Biometrics refers to body measurements and calculations associated with human characteristics. Physiological characteristics relate to the shape of the body. Examples under active research include face recognition, iris recognition, palm veins, deoxyribonucleic acid (DNA), fingerprint, palm, retina, and odor/scent [7,8,9,10].

2.1 Face Biometrics

Face recognition has useful applications in several areas, including eliminating duplicate entries in a country’s voter registration system, which prevents a person from registering twice. It is used in access control, such as computer logon or office access, and in airport security for passengers and airline staff all around the world. It is also used in driver’s licensing offices, for next-of-kin benefit recipients, police bookings, banking, electoral registration, employee IDs, identification of newborns, national identity cards, surveillance operations, passport verification, verification against criminal lists by the police, visa processing, and card security control at ATMs.

While facial recognition can be done reliably, quickly, and continuously in controlled environments, the technology is currently too rigid and general to cope with real-world situations. Aging, changes in facial hair, viewpoint differences, and cluttered backgrounds are some of the significant problems that an automatic facial recognition system faces. Face identification is difficult to automate because faces are a type of natural object that does not lend itself to simple geometric interpretations. Computer-assisted face recognition has the potential to manage a huge number of faces, whereas the human brain has limited memory [11].

2.2 Face Detection Methods

Face detection techniques are classified as feature-based techniques, in which the characteristics described by [12] express an individual’s identity, and image-based techniques. Using statistical and structural classifiers, the feature-based algorithm in [13] describes how local features and their positions are obtained. Similarly, the image-based approaches in [14] use algebraic operations to describe color changes: quaternions are used to build generalized linear filtering methods and a new color edge detector.

Researchers in [4] classified facial recognition algorithms into two invariant techniques: discriminative and generative approaches. Discriminative approaches rely on basic data such as age, weight, skeletal structure, and body mass, whereas generative approaches describe the procedure for feeding the data into the model. The Who Is It database, built for this purpose, comprises age and weight information as well as facial imagery. The database includes only public figures, in order to show changes in age and weight over time. The program tries to recognize faces in images that reflect changes in age and weight over time. The effect of weight is evaluated, and neural networks are subsequently trained. As in any learning-based system, training comes first, followed by testing. Compared with other methodologies, the researchers obtained a 28.53% Rank-1 identification accuracy with a 3.4% minimal error rate. Over the decades, systems have attempted to handle covariates of face images such as facial expression, pose, aging (actual or artificial), disguise, and plastic surgery, as described by certain researchers.

Face detection in color photos is challenging when the background is complex and the luminance varies, making skin detection problematic and resulting in false positives. A parallel-structure skin color detection algorithm was used to improve detection reliability, and a classifier was created using a Gaussian mixture model and the AdaBoost training algorithm to eliminate false positives. A face-candidate algorithm is used to test the face detection algorithm for the skin color model, and then the AdaBoost-trained algorithm is used to test the classifier’s verification algorithm on several images as training examples. Color palettes have also been employed in face recognition, with applications such as image retrieval and color transfer.

The expression and impression of color combinations are conveyed by mixing colors that are classified into abstract categories through a distinctive set of colors. The pattern matching method describes the entire set of facial features in order to associate input and reference patterns for face detection. The most difficult challenge is to apply a facial recognition retrieval model that finds a correct match in the shortest amount of time, especially in non-static or dynamic environments such as live streaming, webcam recording, or real-time video, where facial features are not distinct enough to use as an input image. The research presented in [15] created a model that addresses both steps, facial detection and facial recognition. Pattern recognition in video files is used in the facial detection stage, which is executed using a single-image matching algorithm. The second phase considers the image input from the camera. First, a GUI crops a square frame to isolate the key region, separating facial features from a complicated background. Second, the output image obtained from the data source is processed, and the mean is calculated using the Successive Mean Quantization Transform (SMQT) and Eigen techniques applied to the images. After that, the image is split up using the Sparse Network of Winnows (SNoW) classifier for high-speed facial detection that is unaffected by the background context. The method was tested on 150 image snapshots collected from a webcam and was reported to be 100% accurate.

The authors of [16] developed a new Surveillance Face Recognition Challenge, dubbed QMUL-SurvFace, to encourage the development of innovative FR algorithms that are effective and robust for low-resolution surveillance face images. The low-resolution facial images were captured from real surveillance videos, not produced by artificially downsampling high-resolution footage. The benchmark contains 463,507 facial images of 15,573 distinct identities captured in uncooperative surveillance scenarios over a significant period. As a result, QMUL-SurvFace poses a realistic surveillance FR problem with low resolution, motion blur, uncontrolled poses, varying occlusion, poor illumination, and background clutter. The authors evaluated the FR performance of five representative deep learning face recognition models (DeepID2, CentreFace, Vgg-Face, FaceNet, and SphereFace) against current standards on the QMUL-SurvFace task [16].

Appearance-based Face Detection. To identify the relevant features of face and non-face images, these methods use statistical analysis and machine learning techniques. The learned characteristics are expressed as distribution models or discriminant functions, which are subsequently used to detect faces. Dimensionality reduction is commonly used to improve computational efficiency and detection effectiveness.

Feature-based Face Detection. These methods, also referred to as constituent face recognition, rely on the relationships between the components of the face and employ invariant features of faces for detection. The idea is that humans can detect faces and objects in a variety of positions and lighting environments, so there must be attributes or features (such as brows, nose, eyes, mouth, and skin color) that are invariant across these variations. A statistical model is built from the extracted features to describe their relationships and verify the presence of a face. The reliability of visual feature detection is crucial in this approach.

2.3 Face Recognition Methods

Face recognition methods can be grouped broadly into two: Learning-based Methods and Hand-crafted Methods.

Learning-based Methods. The learning-based methods usually employ convolutional neural networks (CNNs) of varying configurations and depths (layers). A baseline CNN is made up of several layers, each of which uses a differentiable function to transform one volume of activations into another. Its architecture consists of at least a Convolutional Layer, a Pooling Layer, and a Fully-Connected Layer. Due to their excellent learning ability, especially for large-scale input data, CNNs are being employed by more and more researchers [17]. The performance of deep learning in face recognition has recently surpassed that of handcrafted and classical machine learning methods [18]. CNN architectures tend to become deeper and more complex to achieve better recognition performance, which consumes resources, time, and space. Nevertheless, CNNs learn and extract useful features from an image, and they have the further advantage that configurations already trained for specific tasks exist and can be adapted. Certain layers of a trained model (typically the last output layer) can be removed, and the activations of the lower layers can then be used as fixed feature extractors. Several studies have achieved promising results using these deep features [19, 20, 21].

Hand-crafted Methods. The hand-crafted methods are further divided into four broad categories: a global approach, local approach, appearance or holistic approach, and other methods (which do not fall under the first three).

Global Approach. These are features based on the general texture or appearance of the image. There are many global feature extraction approaches in the literature, but the most widely used are Gabor filters [22]; Histogram of Oriented Gradients (HOG) [23]; Local Phase Quantization (LPQ) [24]; Discrete Cosine Transform (DCT) [25]; Local Binary Patterns (LBP) [24, 26]; Weber Local Descriptor (WLD) [23, 27]; and Local Oriented Statistics Information Booster (LOSIB) [23].
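As an illustration of one of these texture descriptors, the sketch below computes the basic 8-bit LBP code for the centre pixel of a 3x3 patch: each of the eight neighbours contributes one bit, set when the neighbour is at least as bright as the centre. The function name `lbp_code`, the clockwise bit order starting at the top-left corner, and the sample patch are assumptions made for illustration; implementations differ in their bit ordering, and practical descriptors build histograms of these codes over image regions.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if patch[row][col] >= center:
            code |= 1 << bit
    return code

# Hypothetical patch: only the top-left and bottom-right neighbours are brighter.
patch = [[9, 1, 1],
         [1, 5, 1],
         [1, 1, 9]]
code = lbp_code(patch)
```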

Local Approach. This approach focuses on local facial features such as the eyes, mouth, and nose, computes their locations, and uses statistical properties, geometry, or appearance as the determining factors for classification. These features concentrate on the most important details of the image and their spatial relationships with each other. The most commonly used features are Scale Invariant Feature Transform (SIFT) [28]; Speeded Up Robust Features (SURF) [29]; Symmetry Assessment by Feature Expansion (SAFE) [28]; Binary Robust Invariant Scalable Keypoints (BRISK) [30]; Oriented FAST and Rotated BRIEF (ORB) [31]; and Phase Intensive Local Pattern (PILP) [29].

Holistic Approach. In this approach, the entire facial region is treated as the input to the recognition system. The holistic technique tries to distinguish a face by employing global representations, that is, the image as a whole. To obtain the feature vectors, methods such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA) are used. Examples of these features include Eigenfaces (implemented using PCA) and Fisherfaces (implemented using LDA).
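A minimal sketch of the Eigenface idea follows, assuming each face has already been cropped, converted to grayscale, and flattened into a vector. PCA is computed here through a singular value decomposition of the mean-centred data, and a face is then represented by its projection onto the leading principal directions. The function names, the random 8x8 stand-in images, and the choice of four components are hypothetical.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_samples, n_pixels) array of flattened grayscale faces."""
    data = np.asarray(faces, dtype=float)
    mean_face = data.mean(axis=0)
    centered = data - mean_face
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(face, mean_face, eigenfaces):
    """Project one flattened face onto the eigenface basis."""
    return eigenfaces @ (np.asarray(face, dtype=float) - mean_face)

# Synthetic stand-in for ten flattened 8x8 face images (not real data).
rng = np.random.default_rng(0)
faces = rng.random((10, 64))
mean_face, eigenfaces = fit_eigenfaces(faces, n_components=4)
features = project(faces[0], mean_face, eigenfaces)
```

The resulting low-dimensional `features` vector is what a holistic system would compare between faces, instead of comparing raw pixels.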

Biologically Inspired Features (BIF). These features imitate the feed-forward model of the primate visual object recognition pipeline, which is known to recognize visual patterns with high accuracy. Gabor functions are used to model simple cells in the visual cortex of mammalian brains. Gabor filters have frequency and orientation representations similar to those of the human visual system. As a result, image processing with Gabor filters is regarded as similar to perception in the human visual system. These features were applied by [13].
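To make the frequency and orientation selectivity concrete, here is a small sketch of the real part of a Gabor kernel: a cosine carrier at a given wavelength and orientation, multiplied by a Gaussian envelope. The parameter names and values (`wavelength`, `theta`, `sigma`, `gamma`, `psi`, a 9x9 window, four orientations) are illustrative assumptions and are not taken from [13].

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter on a size x size grid (size assumed odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate system by the filter orientation theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope; gamma controls its aspect ratio.
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    # Sinusoidal carrier along the rotated x axis.
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength + psi)
    return envelope * carrier

# A small bank of four orientations at a single scale.
bank = [gabor_kernel(size=9, wavelength=4.0, theta=t, sigma=2.0)
        for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
```

Filtering a face image with every kernel in such a bank yields one response map per orientation and scale, and these maps form the raw material for Gabor-based features.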

Elastic Bunch Graph Matching (EBGM). EBGM is a feature-based face recognition method. Certain facial features are selected by manual interaction and used to create a bunch graph, whose nodes represent facial landmarks. The distance between a feature of a given test image and the closest available training image feature is found by searching for the smallest distance when comparing the test image with all of the training images. A hybrid feature extraction method incorporates both a holistic and a local approach. 3D imagery is used in the majority of hybrid techniques. Because the image of a person’s face is acquired in 3D, the technology can identify the curves of the eye sockets as well as the shapes of the chin and forehead. Since the technique uses depth and an axis of measurement, a profile face may be sufficient, as it carries enough data to reconstruct a whole face.

Others. Some researchers [32] have exploited facial marks such as moles and freckles to recognize faces, in combination with LBP and Fisher vectors. Other approaches include Active Shape Models (ASM), which use the shape of an object (the face) as its features, represented by a collection of landmark points at distinct corners of the face and along facial landmark boundaries [33]. Active Appearance Models (AAM) build a shape model and an intensity model from a collection of training samples using principal component analysis (PCA) [34].

3 Generic Modes of Face Recognition System

A face recognition system, like other biometric recognition systems, operates in two modes [35]:

  a. The training mode: the face image of an individual is captured using an acquisition device such as a camera or scanner. The acquired face image is processed and stored in the database with a label (name or unique number) for easy identification or verification.

  b. The testing mode: the face image of the individual is acquired again and processed to obtain the features required to either verify or identify the individual.
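The two modes can be sketched as a small gallery that stores labelled feature vectors at training (enrolment) time and compares a new feature vector against them at testing time, either one-to-one (verification) or one-to-many (identification). The class name `FaceGallery`, the Euclidean distance measure, the threshold of 0.6, and the three-dimensional feature vectors are assumptions for illustration only; a real system would obtain the features from a feature extraction module.

```python
import math

class FaceGallery:
    """Toy store of labelled face feature vectors (hypothetical helper)."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold   # assumed maximum distance for a match
        self.templates = {}          # label -> stored feature vector

    def enroll(self, label, features):
        """Training mode: store the processed features under a label."""
        self.templates[label] = list(features)

    def verify(self, label, features):
        """Testing mode, 1:1: does the probe match the claimed identity?"""
        return math.dist(self.templates[label], features) <= self.threshold

    def identify(self, features):
        """Testing mode, 1:N: return the closest enrolled label, or None."""
        if not self.templates:
            return None
        label, template = min(self.templates.items(),
                              key=lambda item: math.dist(item[1], features))
        return label if math.dist(template, features) <= self.threshold else None

gallery = FaceGallery(threshold=0.6)
gallery.enroll("alice", [0.1, 0.9, 0.3])
gallery.enroll("bob", [0.8, 0.2, 0.5])
probe = [0.12, 0.88, 0.31]
match = gallery.identify(probe)
```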

3.1 Generic Modules of Face Recognition Systems

A face recognition system, as shown in Fig. 1, is designed using the following basic modules. The feature extraction and matching/classification modules are carried out with a CNN, i.e., the CNN architecture is used for both the feature extraction and the classification stages.

Fig. 1 Block diagram of a generic face recognition system with four modules, showing how the features of an image are stored in the database via different stages.

  a. Image acquisition: an acquisition device such as a camera is used to capture faces from images or videos. The images must contain a considerable amount of spatial information about the face before they can be useful.

  b. Image pre-processing: this entails cropping the faces out of the acquired images and enhancing them, to make subsequent processing easier and to improve the overall performance of the system.

  c. Feature extraction: this involves extracting low-level features such as edges, lines, and dots, medium-level features such as texture and color, and high-level features such as shape from the face images. The features are used for the recognition process.

  d. Matching/classification: this compares the features obtained during recognition against those of the stored images to produce a matching score.

3.2 Overview of Convolutional Neural Network

Convolutional Neural Networks (CNNs) achieve excellent learning ability for the classification of both large-scale and small-scale input data [17]. Because of their flexibility and adaptability, different configurations can be derived from already trained models. Convolution simply means applying filters, also known as kernels or windows, across the image, so that each filter is tried at every possible position. Convolution is performed at each convolutional layer. A layer is a stacking operation, and a convolutional layer consists of the stack of images produced by the filters. CNNs have existed since the 1990s but have gained popularity due to their ability to solve recognition problems, thereby improving computer vision. A CNN differs from other neural networks in its assumption that all inputs are images, which allows its architecture to be designed to recognize basic image-specific features that help in pattern recognition, face recognition, digit recognition, and many more.
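The filtering and stacking steps can be sketched in a few lines. The code below slides each kernel over every valid position of a grayscale image and stacks the filtered results, one per kernel. As in most CNN libraries, the kernel is applied without flipping (strictly a cross-correlation). The helper names, the 5x5 test image, and the two hand-written edge kernels are assumptions for illustration; in a trained CNN the kernel values are learned.

```python
import numpy as np

def filter2d(image, kernel):
    """Apply the kernel at every valid position (no padding, stride 1)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv_layer(image, kernels):
    """Stack the filtered images, one per kernel."""
    return np.stack([filter2d(image, k) for k in kernels])

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = np.zeros((5, 5))
image[:, 2:] = 1.0
vertical_edge = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
horizontal_edge = vertical_edge.T
stack = conv_layer(image, [vertical_edge, horizontal_edge])
```

The vertical-edge kernel responds strongly where the image changes from dark to bright, while the horizontal-edge kernel produces no response, which shows how different filters pick out different image features.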

3.3 Overview of Deep Neural Network

Deep learning is a type of machine learning that enables computers to learn by example, much as humans do. Deep learning has progressed to the point where it can beat humans at certain tasks, including object classification in images.

The intrusion detection problem is amenable to machine learning methods because of the large volume of network telemetry and other kinds of security data. Numerous modern commercial intrusion detection systems, or security platforms, employ machine learning-based algorithms as part of their detection technique. These methods are often classified as part of the anomaly detection class of intrusion detection approaches.

There are two types of machine learning models: shallow learning (or classical) models and deep learning models. Deep learning models are neural network models with a large number of hidden layers. These models can learn extremely complex nonlinear functions, and hierarchical layering allows them to learn relevant feature representations from incoming data. Deep learning algorithms have recently achieved success in a variety of domains, including image categorization. There are two key reasons deep learning has lately become useful:

  a. Deep learning requires a great deal of computational power. High-performance GPUs have a parallel architecture that is well suited to deep learning. When used in conjunction with clusters or cloud computing, this helps developers reduce deep learning network training time from weeks to hours.

  b. Deep learning requires substantial amounts of labeled data. For example, driverless car development requires millions of images and thousands of hours of video. Apart from scalability, another often-mentioned benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Deep learning architectures, for example deep neural networks, deep belief networks, convolutional neural networks, and recurrent neural networks, have been applied to fields including natural language processing, computer vision, speech recognition, audio recognition, medical image analysis, machine translation, material inspection, and bioinformatics, to mention a few, where the results are on par with, if not better than, the performance of a human expert. Generally, these architectures can be put into three categories:

Feed-Forward Neural Networks. This is the most commonly used type of neural network in practical applications. The first layer is the input, and the last layer is the output. Neural networks with more than one hidden layer are referred to as “deep” neural networks. They compute a series of transformations that change the similarities between cases. The activity of the neurons in each layer is a nonlinear function of the activities of the neurons in the previous layer.
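A minimal sketch of the forward pass makes the layer-by-layer computation explicit: each layer applies a weight matrix and a bias to the previous layer's activities and passes the result through a nonlinearity. The layer sizes, the random weights, and the choice of `tanh` are assumptions for illustration; no training is shown.

```python
import numpy as np

def forward(x, layers):
    """Propagate an input vector through a list of (weights, bias) layers."""
    activation = np.asarray(x, dtype=float)
    for weights, bias in layers:
        # Each layer's activity is a nonlinear function of the previous layer's.
        activation = np.tanh(weights @ activation + bias)
    return activation

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]   # input, two hidden layers (hence "deep"), output
layers = [(rng.standard_normal((n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
output = forward(rng.standard_normal(4), layers)
```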

Recurrent Networks. These have directed cycles in their connection graph, so following the arrows can sometimes lead back to the starting point. They can exhibit complicated dynamics, which makes them difficult to train. They are more biologically realistic. There is currently a great deal of interest in finding efficient ways to train recurrent networks. Recurrent neural networks are a very natural way to model sequential data. They are equivalent to very deep nets with one hidden layer per time slice, except that they use the same weights at every time slice and receive input at every time slice. They have the ability to remember information in their hidden state for a long time, but it is very hard to train them to use this ability.
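The weight sharing across time slices can be seen in a short sketch of a simple recurrent layer: the same input and recurrent weight matrices are reused at every step, and the hidden state carries information forward. The function name, the dimensions, the random weights, and the `tanh` nonlinearity are assumptions for illustration, and training (for example, backpropagation through time) is not shown.

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec, bias, h0=None):
    """Run a simple recurrent layer over a sequence; return all hidden states."""
    hidden = np.zeros(w_rec.shape[0]) if h0 is None else np.asarray(h0, dtype=float)
    states = []
    for x_t in inputs:
        # The same weights are applied at every time slice.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
w_in = rng.standard_normal((5, 3)) * 0.5    # input-to-hidden weights
w_rec = rng.standard_normal((5, 5)) * 0.5   # hidden-to-hidden weights
bias = np.zeros(5)
sequence = rng.standard_normal((7, 3))      # seven time slices, three inputs each
states = rnn_forward(sequence, w_in, w_rec, bias)
```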

Symmetrically Connected Networks. These are similar to recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions). Symmetric networks are substantially easier to analyze than recurrent networks. Because they obey an energy function, they are also more restricted in what they can do. Symmetrically connected nets with no hidden units are called “Hopfield nets”. Symmetrically connected networks with hidden units are called “Boltzmann machines”.
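The role of the energy function can be illustrated with a tiny Hopfield net. In this sketch the weights are set with a Hebbian rule, which makes them symmetric, and units are updated one at a time, which never increases the energy, so a corrupted pattern settles back into the stored one. The function names, the single eight-unit pattern, and the number of sweeps are assumptions for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian rule: symmetric weights with a zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    weights = patterns.T @ patterns / patterns.shape[1]
    np.fill_diagonal(weights, 0.0)
    return weights

def energy(weights, state):
    """Energy of a state of +1/-1 units under symmetric weights."""
    return -0.5 * state @ weights @ state

def recall(weights, state, sweeps=5):
    """Asynchronous updates: each unit aligns with its weighted input."""
    state = np.array(state, dtype=float)
    for _ in range(sweeps):
        for i in range(len(state)):
            state[i] = 1.0 if weights[i] @ state >= 0 else -1.0
    return state

stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
weights = train_hopfield(stored)
noisy = stored[0].astype(float)
noisy[0] = -noisy[0]               # corrupt one unit
restored = recall(weights, noisy)
```

Comparing `energy(weights, noisy)` with `energy(weights, restored)` shows the energy dropping as the net settles into the stored pattern.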

4 Related DNN-Based Face Recognition Work

This section reviews existing work related to the development of face recognition systems for enhanced security surveillance. Several research works have been carried out in the field of face recognition from images captured by webcam with impressive results; however, there is still a lot of room for contribution. A summary is presented in Table 1.

Table 1 Previous work on the development of a face recognition system for enhanced security surveillance

5 Summary

A review of the existing literature shows that the Deep Convolutional Neural Network (DCNN) attains state-of-the-art results for face recognition as a security measure to prevent intrusion, especially for large datasets. DCNN also has an advantage over other neural networks for image classification because it automatically detects the important features without any human supervision. This chapter also explained the importance of surveillance in relation to face recognition.