1 Introduction

Face recognition has been one of the most widely researched topics in computer vision for decades. It has now reached mobile devices for unlocking phones and is used for surveillance with drones [1]. Common face recognition challenges include occlusion, make-up, illumination, and image processing attacks [2]. Face recognition has been studied under different attacks, viz. stealth attacks [3], spoof attacks [4], presentation attacks [5], and backdoor attacks [6].

This work is an extension of Sharma and Kumar [7]. The earlier work did not include the mathematical modelling and pseudo-codes of the image processing attacks presented in this paper; its focus was on the existing literature in addition to the empirical evaluation of the attacks. Zangeneh and Moradi [8] proposed a method to recognize facial expressions using differential geometric features, extracted by identifying the changes in facial landmark values after a change in expression. Ahmad et al. [9] presented a pre-processing technique using independent component analysis to separate the illumination and reflectance components of a single image for a face recognition system. Hsia et al. [10] proposed a backlight compensation technique to improve face recognition accuracy; the brightness and contrast of a face image favourably impact the quality of the face recognition system. Parubochyi and Shuwar [11] presented a self-quotient image method based on a globally modified Gaussian filter kernel for light normalization; the most significant advantage of the self-quotient image technique is that it uses a single shot of an image. Sharma and Patterh [12] presented a review of feature extraction and recognition techniques for faces, highlighting Support Vector Machine based machine learning for face recognition and the Latent Dirichlet Allocation and Discrete Cosine Transform feature engineering techniques. Other research in face recognition addresses adversarial attacks [55, 57], make-up [56, 58], expressions [60], ageing [63], bias handling [62], presentation attacks [59], and explainable artificial intelligence [61].

Image processing attacks are classified into three broad classes, namely image enhancement attacks, geometric image attacks, and image noise attacks [13]. Enhancement and noise attacks do not affect the number of pixels in an image but modify their values, whereas geometric attacks change the number or arrangement of pixels. Machine learning plays a vital role in pattern recognition and image classification. After the features have been extracted from a face image, they are quantized using a rounding technique and then given as input to the machine learning algorithm for training. Quantization is a signal processing technique that maps the input values to a smaller set, most commonly by rounding or by a modulus operation; it can also be viewed as a compression technique, since the original features are reduced. Four classes of machine learning models, viz. support vector machines, k-nearest neighbours, decision trees and discriminant analysis, along with ensemble modelling, have been explored for training and testing an image-attack-invariant face recognition system [14,15,16,17,18,19].
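As a minimal illustration of the rounding-based quantization described above (a sketch only, not the exact pipeline used in the experiments; the step size is an assumed parameter), the features can be mapped to a coarser set of values before training:

import numpy as np

def quantize_features(features, step=0.1):
    # Rounding-based quantization: map each feature value to the nearest
    # multiple of `step`, shrinking the set of distinct values and hence
    # compressing the feature representation.
    features = np.asarray(features, dtype=float)
    return np.round(features / step) * step

# Example: quantize a small feature vector before handing it to a classifier.
raw = np.array([0.1234, 0.5678, 0.9012])
print(quantize_features(raw, step=0.05))   # -> [0.1  0.55 0.9 ]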

There are many facial datasets available publicly. In the presented work, two datasets, namely the Bosphorus face dataset [20] and the University of Milano Bicocca (UMB) face dataset [21], have been used. We investigate image processing attacks from a new perspective: how they affect machine-learning-based face recognition techniques. To the best of our knowledge, this is the first attempt to study the impact of different machine learning algorithms on a face recognition system under image processing attacks. Ten well-known image processing attacks are discussed with their time complexities: blurring, sharpening, median filtering, histogram equalization, resizing, rotation, cropping, Gaussian noise, Poisson noise and speckle noise. They are evaluated in conjunction with ten machine-learning-variant-based face recognition techniques over two face databases. The rest of the paper is structured as follows: Sect. 2 presents the preliminary concepts of image processing and face recognition systems. Sect. 3 introduces the machine-learning-model-based face recognition system. The experimental results and discussions are presented in Sect. 4. The visual verification of the system is shown in Sect. 5. The concluding remarks are drawn in Sect. 6.

2 Preliminaries

This section discusses the theory and mathematics related to the subject of image processing attacks and face recognition.

2.1 Basic Concepts of Face Recognition System

Training and testing are the two significant phases in face recognition. While training the face recognition system, a certain portion of the full dataset is used. Face registration, pre-processing, feature extraction and machine learning are performed in sequence until the classification model is trained for face recognition. Testing is done on a probe image by performing registration, pre-processing and feature extraction and then validating against the trained model. The phases responsible for face recognition under different challenges can be seen in Fig. 1.

Fig. 1 Phases of face recognition [7]

2.1.1 Dataset Collection

Face images can be collected with two methods, namely the primary and the secondary approach. A dataset is primary when the researcher collects the data for a novel use; otherwise, it is secondary [22]. The collected face images are correctly labelled for the right usage. Two-dimensional (2D), two-and-a-half-dimensional (2.5D, or depth) and three-dimensional (3D) [52,53,54] images are the three types of face images that can form a dataset in single or multiple repositories.

2.1.2 Training Images

In the training phase, multiple images are read into the face recognition system being built. While the face recognition system is under development, a training-testing ratio is set for the two phases. Once the best approach for building the face recognition system is found, the full dataset is used to train it. When a probe image arrives for face identification or verification, it is processed and matched against the images trained in the system, returning the identified or verified person of interest.

2.1.3 Face Registration

After reading the dataset, the next task is to segment the face from the image. The main reason for segmenting the face is to focus only on the face pixels and discard the rest of the image for better training. This process of focusing on the face is known as face registration. It can be enhanced using multiple techniques, viz. the iterative closest point (ICP) algorithm, spin images, simulated annealing and the intrinsic coordinate system, for the three-dimensional face registration process [23].

2.1.4 Image Pre-processing

This phase improves the quality of the image being processed; the processing may be of the enhancement, geometric or noise type. Pre-processing an image is necessary to make it suitable for feature extraction. This phase takes place in both the training and the testing of the face recognition system.

2.1.5 Feature Extraction

There is a plethora of feature extraction techniques in the image processing literature, viz. histogram of oriented gradients (HOG), speeded up robust features (SURF), local binary pattern (LBP) features, Haar-like features, Haralick features, etc. [24,25,26,27,28]. All feature extraction techniques depend on the pixels of an image. Different distance metrics, viz. Euclidean distance, city block distance, Minkowski distance, Mahalanobis distance, etc., are available for comparing features during different machine learning classifications [29,30,31].

Feature extraction is done in both the training and the testing phases. Based on training images, features are extracted for the machine learning phase. Based on testing images, features are extracted for the model validation phase.
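For concreteness, the following sketch extracts HOG features from a face image for the training and testing phases. It assumes scikit-image is available, and the parameter values are illustrative rather than the exact settings used in this work:

import numpy as np
from skimage import io, color
from skimage.feature import hog
from skimage.transform import resize

def extract_hog_features(image_path, size=(100, 100)):
    # Read a face image, normalise its size and extract HOG features.
    image = io.imread(image_path)
    if image.ndim == 3:                      # colour image -> grayscale
        image = color.rgb2gray(image)
    image = resize(image, size)              # common size for all faces
    return hog(image,
               orientations=9,
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm='L2-Hys',
               feature_vector=True)

# The same routine is applied to gallery (training) and probe (testing) images
# so that both phases operate on feature vectors of identical length.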

2.1.6 Machine Learning

During face recognition system training, the machine learning phase is implemented after feature extraction. This phase feeds the features into algorithms that use different parameters to build the mathematical equations and correlations for predicting a discrete class in the case of classification or a real number in the case of regression.
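The sketch below illustrates this phase with two of the model families used in this work; it assumes scikit-learn (the experiments in this paper were run in MATLAB, so this is an analogy rather than the authors' code), and the feature file names are hypothetical:

import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# X_train holds one quantized HOG feature vector per gallery image and
# y_train holds the subject label of each image (hypothetical file names).
X_train = np.load('hog_features_train.npy')
y_train = np.load('labels_train.npy')

svm_model = SVC(kernel='linear').fit(X_train, y_train)                   # SVM variant
knn_model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)    # KNN variant

# A probe feature vector can later be classified with, e.g., svm_model.predict([probe]).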

2.1.7 Model Validation

When the testing phase is under way, the probe image is read, pre-processed and feature-extracted, and a prediction is made by the trained machine learning model. The output of this phase gives the probable class of the person to whom the photo belongs.

2.1.8 Subject Identification

The result of the model validation phase is compared to the expected output for subject identification. If the model validation output exactly matches the expected output, it is a true positive; if it does not match, it is counted as a misclassification (a false positive for the predicted class and a false negative for the true class) [32].

2.2 Image Processing Attacks

There are three classes of image processing attacks viz. enhancement attacks, geometric attacks and noise attacks.

2.2.1 Enhancement Attacks

Image enhancement attacks are attacks that do not affect an image's size but modify the existing pixel values. Four types of enhancement attacks are discussed, viz. blurring, sharpening, median filtering and histogram equalization [13]. These can be seen in Fig. 2. The face used in Fig. 2 has been taken from the Bosphorus dataset [20].

Fig. 2 Image enhancement attacks [20]

Pseudo codes and time complexities of each enhancement attack are as follows:

figure c

The time complexity of the blurring pseudo code is \(O(m*n*w*h)\), where \(m\) is the width of the original image, \(n\) is the height of the original image, \(w\) is the width of the blurring kernel and \(h\) is the height of the blurring kernel.

Blurring is an image processing technique in which each pixel is influenced by its surrounding pixels [33]. It is used for smoothing and edge detection. As blurring is increased, the recognition rate of a face recognition system drops drastically.
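A minimal sketch of the blurring attack, written with explicit loops over an averaging kernel so that the O(m*n*w*h) cost is visible (a library convolution would normally be used instead):

import numpy as np

def blur_attack(image, w=5, h=5):
    # Blur a grayscale image with a w x h averaging kernel (zero padding).
    m, n = image.shape
    padded = np.pad(image.astype(float), ((h // 2, h // 2), (w // 2, w // 2)))
    out = np.zeros((m, n))
    for i in range(m):                 # every output pixel ...
        for j in range(n):
            window = padded[i:i + h, j:j + w]
            out[i, j] = window.mean()  # ... averages a w x h neighbourhood
    return out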

figure d

The time complexity of the histogram equalization attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. Histogram equalization improves the overall contrast of the image by redistributing pixel intensities so that the intensity histogram becomes approximately uniform.
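A minimal histogram equalization sketch for an 8-bit grayscale image, following the standard CDF-mapping formulation rather than any particular library routine:

import numpy as np

def histogram_equalization_attack(image):
    # Equalize an 8-bit grayscale image by remapping intensities via the CDF.
    hist = np.bincount(image.ravel(), minlength=256)    # intensity histogram
    cdf = hist.cumsum()                                  # cumulative distribution
    cdf_min = cdf[cdf > 0].min()
    # Map each grey level so the output histogram becomes roughly uniform.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[image]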

figure e

The time complexity of the median filter attack is \(O(m*n*w*h)\), where \(m,n\) are the dimensions of the original image and \(w,h\) are the dimensions of the kernel filter. Median filtering is an enhancement attack used for reducing noise in an image. A window is slid over the full image and each pixel is replaced by the median of its neighbourhood, attenuating the noise signal.
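A minimal median filtering sketch with an explicit sliding window over the full image, again costing O(m*n*w*h) window visits:

import numpy as np

def median_filter_attack(image, w=3, h=3):
    # Replace each pixel by the median of its w x h neighbourhood.
    m, n = image.shape
    padded = np.pad(image, ((h // 2, h // 2), (w // 2, w // 2)), mode='edge')
    out = np.empty_like(image)
    for i in range(m):
        for j in range(n):
            out[i, j] = np.median(padded[i:i + h, j:j + w])
    return out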

figure f

The time complexity of the sharpening attack is \(O(m*n*w*h)\), where \(m,n\) are the dimensions of the original image and \(w,h\) are the dimensions of the kernel filter. Sharpening adds to the original image a signal proportional to a high-pass-filtered version of that image. It increases local pixel contrast to enhance fine details and edges of the image [34].
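A minimal unsharp-masking sketch of the sharpening attack: the high-pass component is taken as the difference between the image and a blurred (averaged) copy, and the strength factor is an assumed parameter:

import numpy as np
from scipy.ndimage import uniform_filter

def sharpen_attack(image, strength=1.0, size=5):
    # Sharpening: add a signal proportional to a high-pass filtered image.
    image = image.astype(float)
    blurred = uniform_filter(image, size=size)   # low-pass (averaging) version
    high_pass = image - blurred                  # edges and fine detail
    return np.clip(image + strength * high_pass, 0, 255)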

2.2.2 Geometric Attack

Image geometric attacks are those attacks which affect the number of pixels in an image. Experimentation has been done on three geometric attacks, viz. resize, rotation and cropping [35]. These can be seen in Fig. 3.

Fig. 3 Image Geometric Attacks [20]

Pseudo code and time complexities of each geometric attack are as follows:

figure g

The time complexity of the rotation attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. Rotation is a geometric transformation that turns the whole image by a given angle, clockwise or anticlockwise, about its base [36]. Image padding is applied to the image before it is rotated.
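A minimal sketch of the rotation attack for the multiples of 90° studied here; np.rot90 covers these cases, while arbitrary angles would require interpolation and padding:

import numpy as np

def rotate_attack(image, angle=90):
    # Rotate an image anticlockwise by a multiple of 90 degrees.
    if angle % 90 != 0:
        raise ValueError("this sketch only covers multiples of 90 degrees")
    return np.rot90(image, k=angle // 90)

# rotate_attack(face, 90), rotate_attack(face, 180), rotate_attack(face, 270)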

figure h

The time complexity of the cropping attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. Cropping is a geometric attack similar to image segmentation. In cropping, part of the image is filled with zeros and the remaining part is left visible after the attack.
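A minimal cropping sketch, assuming the attack keeps the rightmost fraction of the columns visible and zero-fills the rest (the exact crop geometry used in the experiments may differ):

import numpy as np

def crop_attack(image, keep_right=0.5):
    # Zero-fill the left part of the image, keeping the right fraction visible.
    out = np.zeros_like(image)
    n_cols = image.shape[1]
    start = int(round(n_cols * (1.0 - keep_right)))   # first visible column
    out[:, start:] = image[:, start:]
    return out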

figure i

The time complexity of the resize attack in down-sampling is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. Resizing or scaling an image up-samples or down-samples the number of pixels in the image [37]. Interpolation techniques are used in both cases. When an image is up-scaled, its quality decreases unless super-resolution techniques are used, and face recognition accuracy drops drastically.
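A minimal resize sketch using nearest-neighbour interpolation; libraries typically default to bilinear or bicubic interpolation, but nearest neighbour keeps the index mapping explicit:

import numpy as np

def resize_attack(image, new_m, new_n):
    # Resize to new_m x new_n pixels with nearest-neighbour interpolation.
    m, n = image.shape
    rows = np.arange(new_m) * m // new_m   # source row for each output row
    cols = np.arange(new_n) * n // new_n   # source column for each output column
    return image[rows[:, None], cols]

# resize_attack(face, 50, 50) and resize_attack(face, 100, 100) correspond to the
# two image sizes evaluated in Table 10.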

2.2.3 Noise Attacks

Image noise attacks act directly on the pixels of an image. They are typically parameterized by a noise density or variance, depending on their type. Direct changes are made to the image by manipulating pixel values; the image size is not affected. In this work, experimentation has been done with three types of noise attacks: the Gaussian noise attack, the speckle noise attack, and the Poisson noise attack [36]. These can be seen in Fig. 4.

Fig. 4 Image Noise Attacks [20]

Pseudo code and time complexities of each noise attack are as follows:

figure j

The time complexity of the Gaussian noise attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. The Gaussian noise attack is one of the best-known noise attacks. In this attack, additive white Gaussian noise is applied to every pixel, changing the original pixel values throughout and corrupting the image.
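A minimal sketch of the Gaussian noise attack on a grayscale image normalised to [0, 1]; the mean and variance parameters mirror the settings used in Figs. 5 and 6:

import numpy as np

def gaussian_noise_attack(image, mean=0.0, var=0.05):
    # Add zero-mean Gaussian noise of the given variance to a [0, 1] image.
    noise = np.random.normal(mean, np.sqrt(var), image.shape)
    return np.clip(image + noise, 0.0, 1.0)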

figure k

\(Y = I + \beta*I\), where \(\beta\) is uniformly distributed random noise with specified mean and variance.

The time complexity of the speckle noise attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image. Speckle noise is multiplicative in nature, and the noise and signal are statistically independent [38]. This noise is very prominent in ultrasound images; it deteriorates edges and other fine details, affecting the contrast of the image, which in turn makes the detection of lesions difficult [39].
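A minimal speckle noise sketch following the multiplicative model Y = I + β·I above, with β drawn here as zero-mean uniform noise of the requested variance (an assumed choice of distribution):

import numpy as np

def speckle_noise_attack(image, var=0.04):
    # Multiplicative speckle noise: Y = I + beta * I, beta ~ uniform noise.
    half_width = np.sqrt(3.0 * var)   # uniform on [-a, a] has variance a^2 / 3
    beta = np.random.uniform(-half_width, half_width, image.shape)
    return np.clip(image + beta * image, 0.0, 1.0)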

figure l

The time complexity of the Poisson noise attack is \(O(m*n)\), where \(m,n\) are the dimensions of the original image.

Poisson noise is generated from the image data itself rather than being added externally like Gaussian noise. Poisson noise, or shot noise, occurs when the finite number of energy-carrying particles in an electrical circuit produces measurable statistical fluctuations [40]. The relative strength of Poisson noise is higher at darker pixels than at lighter pixels.
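A minimal Poisson noise sketch for a [0, 1] grayscale image: the image is scaled to photon-like counts, sampled, and rescaled, so the noise arises from the data itself rather than being added (the scale factor is an assumed parameter):

import numpy as np

def poisson_noise_attack(image, scale=255.0):
    # Poisson (shot) noise: sample counts whose mean is the pixel intensity.
    counts = np.random.poisson(image * scale)     # darker pixels -> fewer counts,
    return np.clip(counts / scale, 0.0, 1.0)      # hence relatively noisier output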

2.3 Need of Attacks Invariant Face Recognition System

Face recognition systems are prone to different challenges, including illumination, occlusion, make-up, age, and enhancement, geometric and noise attacks. In the presented work, three forms of attacks have been considered, viz. enhancement, geometric and noise attacks.

3 Models used for Face Recognition

3.1 Motivation

The work presented in this paper applies machine learning models to face recognition invariant to image processing attacks on a wide scale. To the best of our knowledge, this is the first such work, including the quantization of histogram of oriented gradients features.

3.2 Mathematics of Models

The face recognition algorithms that have been used are the Support Vector Machine, K-Nearest Neighbours, discriminant analysis and bagged tree ensemble models. Table 1 presents the mathematics of the machine learning models.

Table 1 Mathematics of Models

4 Experimentation and Result Discussions

This section presents the details of the experimentation in this research. Sub-sections cover the dataset details, experimental setup and empirical evaluation.

4.1 Datasets Used

The Bosphorus face database [20] and the University of Milano Bicocca (UMB) face database [21] are the two state-of-the-art face databases used for all the experiments presented in this paper. The Bosphorus database has a total of 4666 face images of 105 subjects. These images have good illumination and require little pre-processing in the training and testing phases. UMBDB has a total of 1473 face images of 143 subjects captured against multiple backgrounds and illumination conditions.

4.2 Experimental Setup

The experimental platform has been developed on a Dell Inspiron computer with an Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz and 16 GB RAM. The testing software is MATLAB 2017a, licensed to Thapar Institute of Engineering and Technology and running on Windows 10.

Table 2 presents the machine learning models' initialization parameters, i.e., the basic parameters and their initial values used when the models were trained.

Table 2 Algorithms and their Parameters Initialization

4.3 Empirical Evaluation

This sub-section presents an extensive analysis of image processing attacks by comparing ten variants of machine learning models. All attacks have been implemented after quantization of the HOG features. The variations of the attacks used in this research can be seen in Table 3.

Table 3 Image processing attack class, name and variation handled in current research

4.3.1 Experimentation 1: Effect of Enhancement Attacks on Machine Learning based FR Systems

Multiple machine learning models have been tested for accuracy by varying the parameters of the enhancement attacks. The effects of the four enhancement attacks are shown as follows:

4.3.2 Effect of Blurring on Models

Table 4 shows the effect of blurring on face recognition accuracy for both datasets, namely Bosphorus and UMBDB, with model ranking. Variants of three classification models, namely the support vector machine, k-nearest neighbour, and discriminant analysis, have been used to train and test blurring attacks in the face recognition system.

Table 4 Effect of blurring on models with Bosphorus and UMBDB datasets

The subspace discriminant ensemble model achieves the best accuracy of 80.1% and 78.1% for the 5×5 and 9×9 blurring filters, respectively, on the Bosphorus dataset. For the UMBDB dataset as well, the subspace discriminant ensemble outperforms the other models, with 77.2% and 76.5% accuracy for the 5×5 and 9×9 blurring filters.

4.3.3 Effect of Sharpening on Models

Table 5 shows the sharpening attack on face images of the Bosphorus and UMBDB datasets in two parts. Model accuracy comparisons across ten variants of SVM, KNN and discriminant analysis are presented for both datasets.

Table 5 Effect of sharpening on models with Bosphorus and UMBDB datasets

The subspace discriminant ensemble model outperforms the others with 85.5% and 84.8% accuracy for the 50×50 and 100×100 image sizes in Bosphorus dataset face recognition. Similarly, for the UMBDB dataset, the subspace discriminant ensemble model again outperforms the other models with 86.7% accuracy for the 50×50 image size and 86.3% accuracy for the 100×100 image size.

4.3.4 Effect of Median Filtering on Models

Table 6 presents the median filtering enhancement attack for the Bosphorus and UMBDB datasets. Ten model variants of SVM, KNN and discriminant analysis have been used for accuracy comparison and ranking of the models for both datasets.

Table 6 Effect of median filtering on models with Bosphorus and UMBDB datasets

The subspace discriminant ensemble model performs best with 85.7% accuracy for the 50×50 image size and 84.9% accuracy for the 100×100 image size on the Bosphorus dataset. For the UMBDB dataset, subspace discriminant is the best performing with 90.3% accuracy for the 50×50 image size and 83.6% accuracy for the 100×100 image size.

4.3.5 Effect of Histogram Equalization on Models

Table 7 shows the histogram equalization image enhancement attack results for the Bosphorus and UMBDB face datasets. Ten variants of machine learning models from the SVM, KNN and discriminant analysis classes are compared.

Table 7 Effect of histogram equalization on models with Bosphorus and UMBDB datasets

For the Bosphorus face dataset, the subspace discriminant ensemble model outperforms the other models with 85.3% and 84.3% face recognition accuracy for the 50×50 and 100×100 image sizes. For the UMBDB face dataset, the subspace discriminant ensemble model outperforms the other model variants with 87.1% accuracy for the 50×50 image size and 82.3% accuracy for the 100×100 image size.

4.3.6 Experimentation 2: Effect of Geometric Attacks on Machine Learning Based FR Systems

Rotation, cropping and resizing attacks have been performed in this section. The results are as follows:

4.3.7 Effect of Rotation on Models

Table 8 presents the rotation attacks on the Bosphorus and UMBDB face datasets. For the Bosphorus face dataset, the subspace discriminant ensemble model holds rank 1 with 85.6% accuracy for 90° rotations, 84.8% for 180° rotations and 84.2% for 270° rotations. For the UMBDB face dataset, the subspace discriminant ensemble model holds rank 1 with 83.5% accuracy for 90° rotations, 83.0% for 180° rotations and 83.9% for 270° rotations.

Table 8 Effect of rotation on models with Bosphorus and UMBDB datasets

In this paper, only multiples of 90° have been studied for image rotation. The accuracies of the machine learning models do not vary much with the angle of rotation. It is believed that if the angle of rotation is acute, the accuracy on rotated faces will drop compared to the 90° variations. Acute-angle rotation based face recognition will be included in future work.

4.3.8 Effect of Cropping on Models

Table 9 shows the cropping attack on Bosphorus dataset faces as well as UMBDB dataset faces. Three variants of cropping have been tested with the variants of machine learning models. For Bosphorus faces, the recognition accuracy of the subspace discriminant ensemble model is 78.3% with the right 25% of the image cropped, 82.5% with the right 50% cropped, and 88.1% with the right 75% cropped.

Table 9 Effect of cropping on models with Bosphorus and UMBDB datasets

For the UMBDB face dataset, the best accuracy has been achieved by the subspace discriminant ensemble model, with 70.1% recognition accuracy for the right 25% cropped image, 84.2% for the right 50% cropped image and 83.1% for the right 75% cropped image.

It is noteworthy from Table 9 that face recognition accuracy increases as the cropped image covers a larger percentage of the face.

4.3.9 Effect of Resize on Models

Table 10 presents the image resize attack on the Bosphorus and UMBDB face datasets. For the Bosphorus dataset, the best performing model is the subspace discriminant ensemble model with 85.5% accuracy for the 50×50 image size and 85.3% accuracy for the 100×100 image size. For the UMBDB dataset, the subspace discriminant ensemble model outperformed all other models, achieving 88% accuracy for 50×50 face images and 87.8% accuracy for 100×100 face images.

Table 10 Effect of resizing on models with Bosphorus and UMBDB datasets

Generally, face recognition accuracy drops when an image is resized from a smaller size to a bigger one because of interpolation. In Table 10, the accuracies for the 50×50 and 100×100 image sizes are on par rather than showing the expected difference. The reason for this small accuracy difference is that in both cases the resizing was done from the larger original image, rather than resizing a 50×50 image up to 100×100.

4.3.10 Experimentation 3: Effect of Noise attacks on Machine Learning Based FR Systems

Gaussian, speckle and Poisson noise attacks have been applied to the models in this sub-section. The results are as follows:

4.3.11 Effect of Gaussian Attack on Models

Figures 5 and 6 show graphical representations of the accuracy of ten different KNN, SVM, and discriminant analysis variants. Both figures show five Gaussian noise variations, each with mean 0 and variances of 0.05, 0.15, 0.25, 0.35 and 0.45, respectively.

Fig. 5 Effect of Gaussian noise on models with Bosphorus dataset

Fig. 6 Effect of Gaussian noise on models with UMBDB dataset

In Fig. 5, for the Bosphorus face dataset under the Gaussian noise attack, the subspace discriminant ensemble model outperformed the other models with the highest accuracy of 84.8% for v=0.05. In Fig. 6, for the UMBDB face dataset, the coarse KNN model outperforms the other models with an accuracy of 80.4% for v=0.05. Accuracy decreases gradually as the variance of the Gaussian noise is increased.

4.3.12 Effect of Speckle Attack on Models

Figures 7 and 8 show graphical representations of the accuracy of ten different KNN, SVM, and discriminant analysis variants. Both figures show five speckle noise variations, each with mean 0 and variances of 0.01, 0.04, 0.10, 0.20 and 0.40, respectively.

Fig. 7 Effect of speckle noise on models with Bosphorus dataset

Fig. 8 Effect of speckle noise on models with UMBDB dataset

In Fig. 7, for the Bosphorus face dataset under the speckle noise attack, the subspace discriminant ensemble model outperforms the other models with the highest accuracy of 84.8% for v=0.04. In Fig. 8, for the UMBDB face dataset, the linear discriminant model outperforms the other models with an accuracy of 64% for v=0.01. Accuracy decreases gradually as the variance of the speckle noise is increased.

4.3.13 Effect of Poisson Attack on Models

Table 11 presents the Poisson noise attack on the Bosphorus and UMBDB face databases. The Poisson noise attack has been performed with two different image sizes.

Table 11 Effect of Poisson noise on models with Bosphorus and UMBDB datasets

For the Bosphorus dataset, the best performing model is the subspace discriminant ensemble model with 78.6% accuracy for the 50×50 image size and 73.8% accuracy for the 100×100 image size. For the UMBDB dataset, the subspace discriminant ensemble model outperformed the other models, achieving 71.9% accuracy for 50×50 face images and 63.3% accuracy for 100×100 face images.

It can be concluded that the subspace discriminant ensemble model performed best in about 95% of the image processing attack cases trained and tested on the face recognition system.

5 Visual Verification of Image Attacks Invariant Face Recognition System

This section visually shows the input and output of all the image processing attacks on the face recognition system. Three sub-sections present the image attacks belonging to the enhancement, geometric and noise classes, respectively.

5.1 Visual Verification of Enhancement Attacks on Face Recognition System

Figure 9 shows the visual input and output for the different enhancement attacks, viz. blurring, histogram equalization, median filtering and sharpening. Blurring is shown with 5×5 and 9×9 blur filters applied to the input. The histogram equalization, median filtering and sharpening attacks have been visually verified with inputs of 50×50 and 100×100 image sizes. All the inputs have been selected randomly from occluded faces.

Fig. 9 Visual Verification of Enhancement Attacks on Face Recognition System

5.2 Visual Verification of Geometric Attacks on Face Recognition System

Figure 10 shows the visual input and output for the different geometric attacks, viz. resize, cropping and rotation. The resize attack is demonstrated with 50×50 and 100×100 sizes. Cropping is shown with the right 25%, right 50% and right 75% of the area cropped in the input. Rotation is demonstrated with 90°, 180° and 270° anticlockwise rotations. All the inputs have been taken randomly from occluded faces.

Fig. 10 Visual Verification of Geometric Attacks on Face Recognition System

5.3 Visual Verification of Noise Attacks on Face Recognition System

Figure 11 shows the visual input and output for the different noise attacks, viz. Gaussian, speckle and Poisson.

Fig. 11 Visual Verification of Noise Attacks on Face Recognition System

The Gaussian noise attack is shown with five variations of mean and variance, viz. (0, 0.05), (0, 0.15), (0, 0.25), (0, 0.35) and (0, 0.45). The speckle noise attack is shown with five variations of density, viz. d = 0.01, d = 0.04, d = 0.10, d = 0.20 and d = 0.40. The Poisson noise attack is shown with image sizes 50×50 and 100×100. All the occluded inputs have been chosen randomly.

It can be cross-validated from Figs. 9, 10, and 11 that the face recognition system built by training various machine learning models is invariant to image processing attacks. It can also be verified that all the test cases in the visual verification have an occlusion in the image.

6 Conclusion

This paper presents face recognition under different image processing attacks in great detail. Pseudo codes of all attacks have been given along with their time complexities. The mathematics of the machine learning algorithms, the experimental setup with parameter initialization, and extensive empirical results have been provided. Visual verification of the image attacks is presented as an attempt to demonstrate the attack-invariant face recognition system. Ten image processing attacks, viz. blurring, histogram equalization, sharpening, median filtering, resize, cropping, rotation, Gaussian noise, speckle noise and Poisson noise, have been discussed in this paper. All the attacks implemented have used quantized HOG features, hence compressing the original features.

This research is limited to two-dimensional face recognition systems. The work can be extended to three-dimensional face recognition; how image processing attacks affect voxel information and meshes would be an interesting research direction. An effort was made to extend this work to depth (2.5D) face images, but the results were poor and were not included in this research. This work has an application in captcha-based recognition, where such attacks are commonly used for object identification.

In the last section, visual verification has been presented showcasing the robustness of the image processing attacks invariant face recognition system. In future, we intend to extend the current work to expression and occlusion identification, invariant of image processing attacks using deep learning techniques.