1 Introduction

Facial recognition has been an active research topic for a long time. Many approaches to face recognition have been proposed over the last two decades, and these methods are briefly reviewed below. Detailed reviews can be found in Refs. [1–4].

One of the commonly used face recognition methods is the eigen-face method. Kirby and Sirovich [5] used the Karhunen–Loève procedure to exploit the natural symmetries (mirror images) in a well-defined family of patterns (human faces). This results in an extension of the data and imposes even and odd symmetry on the eigen-functions of the covariance matrix, without increasing the complexity of the calculation. The resulting approximation of faces projected from outside of the dataset onto this optimal basis is improved on average.

Turk and Pentland [6] developed a face recognition system that tracked a subject’s head and then recognized the person by comparing characteristics of the face to those of known individuals. Face images were projected onto a face space defined by the ‘eigen-faces’, the eigenvectors of the set of faces, which do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provided the ability to learn to recognize new faces automatically.

Moon and Phillips [7] evaluated various PCA-based face recognition algorithms by changing the illumination normalization procedure, compressing images with JPEG and wavelet compression algorithms to study the effect on algorithm performance, varying the number of eigenvectors in the representation, and changing the similarity measure in the classification process.

Lovell et al. [8] developed adaptive principal component analysis (APCA) to improve the robustness of PCA to nuisance factors such as lighting and expression: PCA is applied first, and the face space is then rotated and warped by whitening and filtering the eigen-faces according to the overall, between-class, and within-class covariances to find an improved set of eigen-features.

Zhang et al. [9] proposed a subspace method called diagonal principal component analysis (DiaPCA), which directly seeks the optimal projective vectors from diagonal face images without an image-to-vector transformation, thus preserving the correlations between variations of the rows and columns of images. Experiments showed it to be much more accurate than both PCA and 2DPCA.

Bartlett et al. [10] used a version of independent component analysis (ICA) derived from the principle of optimal information transfer through sigmoidal neurons, and showed that the ICA representation gave the best performance on frontal face images.

Lin et al. [11] used a probabilistic decision-based neural network (PDBNN) for face detection and recognition. It adopts a hierarchical network structure with nonlinear basis functions and a competitive credit-assignment scheme. Experiments showed satisfactory results in terms of recognition accuracy and processing speed.

Huang et al. [12] used component-based recognition and 3D morphable models to build a pose- and illumination-invariant face recognition system.

Shan et al. [13] presented an individual appearance model based method, called face-specific subspace (FSS), for recognizing human faces under variations in lighting, expression, and viewpoint. This method derives from the traditional eigen-face method but differs from it in essence.

Karande and Talbar [14] used PCA for preprocessing before applying ICA for training, and obtained encouraging results under variations of pose and illumination.

Zhao et al. [15] proposed a face recognition method based on PCA and linear discriminant analysis (LDA) in order to improve the generalization capability of LDA. The results showed a significant improvement in face recognition performance.

Su et al. [16] adopted a multi-feature extraction technique that includes PCA and LDA, and applied a radial basis function network (RBFN) for classification. They also acquired features in both the spatial and frequency domains. This method was able to achieve a higher rate of accuracy.

Delac et al. [17] performed a comparative study of the effects of image compression techniques on PCA, LDA, and ICA.

It is very common to apply PCA-based feature reduction and extraction while the preliminary features are extracted using different wavelets. Liu [18] combined Gabor wavelet features with enhanced Fisher linear discriminant methods, and also applied kernel PCA with fractional power polynomial models to Gabor features [19]; Liu and Wechsler [20] applied a Gabor + PCA + ICA method for face recognition. Gabor + PCA representations of faces [18–20] and facial expressions [21, 22] have shown good performance for classification and identification. Amin and Yan [23] have recently studied the characteristics of 40 basic Gabor feature vectors for face recognition. Liu et al. [24] have developed local discriminant wavelet packet coordinates for face recognition.

In actual face recognition practice, the image size can be quite small. Zhao et al. [15] demonstrated this by using 12 × 11 images in an LDA system for face recognition, and Lin et al. [11] used 14 × 11 images in their PDBNN system. Neuropsychological research has shown that a minimum image size of 18 × 24 is acceptable for human perception [25]. Furthermore, the existence of a universal face subspace of fixed dimension was investigated by Zhao et al. [15]; this means that as long as the image is larger than the subspace dimension, its size does not matter. It was also shown by Zhao et al. that the smaller images produced slightly better performance, possibly due to the improvement of the signal-to-noise ratio as the image size decreases. In a recent paper by Zhang et al. [26], further studies on the sizes of images and filters were discussed.

This paper is an extension of our earlier work [27]. Here, we present the results of a systematic study of the effects of noise, blurriness, image size, subject size, image collection method, expression, pose, and illumination on face recognition.

2 PCA-based face recognition system

PCA is a statistical method for reducing the large dimensionality of a data space (observed variables) to the smaller intrinsic dimensionality of a feature space (independent variables) [28]. It applies when there is a strong correlation between the observed variables.

The main idea of using PCA for face recognition is to express the large 1-dimensional vector of pixels constructed from a 2-dimensional facial image in terms of the compact principal components of the feature space; this is called eigenspace projection. The eigenspace is calculated by identifying the eigenvectors (eigen-faces) of the covariance matrix derived from a set of facial image vectors.

To identify an image from a known face database, we divide the database into training face images and test face images, with no overlap between the two sets.

From the training face images, we compute the eigen-faces and prepare a set of facial signatures. From the test face images, we select an image, project it onto the eigen-faces, and compare the resulting signature with the facial signatures of the training images to find the closest match.
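
As a concrete illustration, such a split might look like the following minimal NumPy sketch. This is not the authors' code: the function name and the policy of holding out one image per subject (the policy used later for the ORL experiments in Sect. 6) are our assumptions, and the images are assumed to be stacked in an array with one integer subject label per image.

```python
import numpy as np

def split_per_subject(images, labels, rng=None):
    """Hold out one image per subject for testing; the rest form the training set."""
    rng = rng or np.random.default_rng(0)
    labels = np.asarray(labels)
    # Pick one random image index per distinct subject for the test set.
    held_out = [rng.choice(np.flatnonzero(labels == s)) for s in np.unique(labels)]
    test_mask = np.zeros(len(labels), dtype=bool)
    test_mask[held_out] = True
    return (images[~test_mask], labels[~test_mask],   # training images and labels
            images[test_mask], labels[test_mask])     # test images and labels
```

For the ORL database (40 subjects, ten images each) this yields the 360/40 training/test split described in Sect. 6.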

3 System structure

A typical facial recognition system has four major generic components, as shown in Fig. 1. These steps are briefly discussed in the subsequent sections.

Fig. 1 A generic facial recognition system

4 Facial image acquisition

For any face recognition system, the initial task is to collect facial image data for training and testing the system. The data for this experiment are collected from many different sources in order to understand the effects of ethnicity, illumination, pose, and expression. We use a set of in-house data together with the AT&T (ORL) database [29], the CMU facial expression database [30], the Nott Face Database [31], the JAFFE Face Database [32], the Caucasian Face Database [33], the Indian Face Database [34], the Asian Face Database [35], and the Faces94 Face Database [36].

In the ORL database there are ten different images of each of 40 distinct persons. For some persons, the images were taken at different times, varying the lighting, facial expressions (open or closed eyes, smiling or not smiling) and facial details (glasses or without glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The size of each image is 92 × 112 pixels, with 256 gray levels per pixel. An example is provided in Fig. 2.

Fig. 2 Sample images for a subject of the ORL Database

The CMU facial expression database contains many facial expression images; for our face recognition purpose, we used ten different images of each of 20 distinct persons. Each image has the centers of the eyes and the center of the top lip aligned, and the intensity across each image is normalized to zero mean and unit variance. The size of each image is 116 × 96 pixels, with 256 gray levels per pixel. Figure 3 shows some images from the CMU database.

Fig. 3 Sample images for a subject of the CMU Database

In the Nott Face Database, there are ten different images of each of 40 distinct persons. All images were taken under similar (bright) illumination but with various facial expressions, and the images are not aligned. The size of each image is 92 × 123 pixels, with 256 gray levels per pixel. An example is provided in Fig. 4.

Fig. 4 Sample images for a subject of the Nott Face Database with various expressions and the same illumination (bright)

For the JAFFE Face Database, there are ten different images of each of 10 distinct persons. All images were taken under similar (bright) illumination but with various facial expressions, and the images are not aligned. The size of each image is 110 × 110 pixels, with 256 gray levels per pixel. An example is provided in Fig. 5.

Fig. 5 Sample images for a subject of the JAFFE Face Database with various expressions and the same illumination (bright)

From the Caucasian Face Database, we have selected the following two groups: (a) faces with various expressions but the same illumination (bright); (b) faces with various poses but the same illumination (bright). For each group, there are ten different images of each of 40 distinct persons. The size of each image is 110 × 110 pixels, with 256 gray levels per pixel. Examples are provided in Figs. 6 and 7.

In the Indian Face Database, there are ten different images of each of 40 distinct persons. All images were taken under similar illumination but with different face orientations, and the images are not aligned. The size of each image is 112 × 84 pixels, with 256 gray levels per pixel. An example is provided in Fig. 8.

Fig. 6 Sample images for a subject of the Caucasian Face Database with various expressions and the same illumination (bright)

Fig. 7 Sample images for a subject of the Caucasian Face Database with various poses and the same illumination (bright)

Fig. 8 Sample images for a subject of the Indian Face Database with various poses but similar (slightly darker) illumination

For the Asian Face Database, we have selected the following three groups: (a) faces with various expressions and slightly different illumination; (b) faces with various poses and slightly different illumination; and (c) frontal faces under various illumination conditions. For each group, there are ten different aligned images of each of 40 distinct persons. The size of each image is 40 × 50 pixels, with 256 gray levels per pixel. Examples are provided in Figs. 9, 10 and 11.

Fig. 9 Sample images for a subject of the Asian Face Database with various facial expressions and slightly different illumination

Fig. 10 Sample images for a subject of the Asian Face Database with various poses and slightly different illumination

Fig. 11 Sample images for a subject of the Asian Face Database with frontal images but various illumination conditions (from bright to dark)

For the Faces94 Face Database, there are ten different images of each of 40 distinct persons. All images were taken under the same illumination (darker, with a green background) but with slightly varying facial expressions, and the images are not aligned. The size of each image is 100 × 110 pixels, with 256 gray levels per pixel. An example is provided in Fig. 12.

Fig. 12 Sample images for a subject of the Faces94 Face Database: same illumination (slightly darker, with a green background) and slightly varying facial expressions

5 Facial image pre-processing

As mentioned in the previous section, the images used for this experiment are 256-gray-level images of various sizes: the ORL Face Database (92 × 112), the CMU Face Database (116 × 96), the Nott Face Database (92 × 123), the JAFFE Face Database (110 × 110), the Caucasian Face Database (110 × 110), the Indian Face Database (112 × 84), the Asian Face Database (40 × 50), and the Faces94 Face Database (100 × 110). For the CMU database images, we have normalized the facial geometry and intensity in order to make the experiments comparable. Note that the Asian database has facial images that are already geometrically normalized but not intensity normalized.

The CMU faces are normalized as follows: first, landmark points (the eyes and the top of the mouth) are manually detected on the facial image to give a reference for cropping the facial region for further processing (Fig. 13). The normalization can then be performed using the following affine transformation:

$$ \begin{bmatrix} x^{\prime} \\ y^{\prime} \end{bmatrix} = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} S_{x} & 0 \\ 0 & S_{y} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} D_{x} \\ D_{y} \end{bmatrix} $$
(1)

where \( S = \left[ S_{x} \quad S_{y} \right] = \left[ \frac{w^{\prime}}{w} \quad \frac{h^{\prime}}{h} \right] \) contains the scaling parameters and \( D = \left[ D_{x} \quad D_{y} \right] = \left[ d^{\prime}_{x} - d_{x} \quad d^{\prime}_{y} - d_{y} \right] \) contains the translation parameters.

Fig. 13 Aligning images using an affine transformation

Here, x′ and y′ are the horizontal and vertical positions in the 2D face model coordinates, and x and y are the horizontal and vertical positions in the original image coordinates. The upper-left corner of each frame, including the face model image, is denoted (0, 0). In the standard face model, the top point of the philtrum (the bottom of the nose) is the rotation center, whose position is \( (d_{x}^{\prime}, d_{y}^{\prime}) \) = (192, 230); the position of the center of the right eye is (167, 155) and that of the left eye is (217, 155); the width w′ between the centers of the two eyes is 92 pixels; and the height h′ from the level of the eye centers to the top point of the philtrum is 135 pixels.

The horizontal scaling parameter \(S_{x}\) is computed as the ratio of the distance w′ in the face model to the corresponding distance w in the original face image. The vertical scaling parameter \(S_{y}\) is computed as the ratio of the distance h′ in the face model to the corresponding distance h in the original face image. The horizontal and vertical displacements (translations) are represented by \(D_{x}\) and \(D_{y}\), respectively, and are measured from the top point of the philtrum in the original face image \( (d_{x}, d_{y}) \) to that in the face model \( (d_{x}^{\prime}, d_{y}^{\prime}) \). The angle θ is the rotation of the line connecting the eye centers in the original face image relative to the corresponding horizontal line in the face model, where clockwise rotation is negative and counterclockwise rotation is positive.

The pixel positions of each image are integer-valued, but the warped positions after the affine transformation are, in general, not. The gray value at each integer-valued pixel of the warped image is therefore estimated by bilinear interpolation from the gray values of its four nearest neighbor pixels in the original image. Finally, this affine transformation gives a geometry- and intensity-normalized facial image, from which we crop the face region and pass it to the next stage for facial feature extraction.
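
To make the normalization concrete, the following NumPy sketch implements Eq. (1) with the face-model constants quoted above. It is an illustration under our own assumptions rather than the authors' implementation: the image is a 2-D float array of gray values, the three landmarks are supplied manually as (x, y) tuples, the rotation and scaling are anchored at the philtrum top (the stated rotation center), h is approximated by the distance from the philtrum to the midpoint between the eyes, and the output size is arbitrary.

```python
import numpy as np

# Standard face-model constants from the text ((x, y), origin at upper left).
MODEL_PHILTRUM = np.array([192.0, 230.0])  # rotation center (d'_x, d'_y)
MODEL_EYE_DIST = 92.0                      # w': distance between eye centers
MODEL_EYE_HEIGHT = 135.0                   # h': eye level to philtrum top

def align_face(img, right_eye, left_eye, philtrum, out_shape=(384, 384)):
    """Warp `img` onto the standard face model of Eq. (1)."""
    re_, le_, d = (np.asarray(p, dtype=float) for p in (right_eye, left_eye, philtrum))

    # Rotation angle of the inter-eye line relative to horizontal.
    ex, ey = le_ - re_
    theta = np.arctan2(ey, ex)

    # Scaling parameters S = [w'/w, h'/h]; h approximated by the
    # philtrum-to-eye-midpoint distance.
    w = np.hypot(ex, ey)
    h = np.linalg.norm(d - (re_ + le_) / 2.0)
    sx, sy = MODEL_EYE_DIST / w, MODEL_EYE_HEIGHT / h

    # Inverse mapping: for each model-frame pixel p', find the source pixel
    # p = S^-1 R(theta) (p' - d') + d, then sample the gray value bilinearly.
    H, W = out_shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    u, v = xs - MODEL_PHILTRUM[0], ys - MODEL_PHILTRUM[1]
    c, s = np.cos(theta), np.sin(theta)
    x_src = (c * u - s * v) / sx + d[0]
    y_src = (s * u + c * v) / sy + d[1]

    # Bilinear interpolation from the four nearest neighbours.
    x0, y0 = np.floor(x_src).astype(int), np.floor(y_src).astype(int)
    fx, fy = x_src - x0, y_src - y0
    hh, ww = img.shape
    x0c, x1c = np.clip(x0, 0, ww - 1), np.clip(x0 + 1, 0, ww - 1)
    y0c, y1c = np.clip(y0, 0, hh - 1), np.clip(y0 + 1, 0, hh - 1)
    top = (1 - fx) * img[y0c, x0c] + fx * img[y0c, x1c]
    bot = (1 - fx) * img[y1c, x0c] + fx * img[y1c, x1c]
    return (1 - fy) * top + fy * bot
```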

6 Facial feature extraction using principal component analysis (PCA)

The gray intensity of each pixel is taken as the initial feature. Each facial image is converted into a row vector by appending its rows one after another, so a normalized and cropped image of size 112 × 92 becomes a 10,304-dimensional feature vector. A single facial image is thus represented by a very high-dimensional feature vector, and classification techniques can hardly be applied directly to learn the underlying classification rules. Therefore, PCA is applied to extract more relevant features/signatures [6]. PCA is a simple statistical method that reduces dimensionality while minimizing the mean squared reconstruction error [6].

Let us assume that the M facial images, denoted \(I_{1}, I_{2}, \ldots, I_{M}\), have size a × b pixels. Using the conventional row-appending method, we convert each image into an N = a × b dimensional column vector. First, the mean image, a column vector Ξ of length N, is calculated from all the image vectors as shown in Eq. (2). An example of the average face is provided in Fig. 14.

Fig. 14 Average face acquired using Eq. 2 for the CMU database

$$ \Xi = \frac{1}{M}\sum\limits_{i = 1}^{M} I_{i} $$
(2)

Each face’s difference from the average is then calculated using Eq. (3).

$$ a_{i} = I_{i} - \Xi $$
(3)

Then we construct the matrix \(A = [a_{1}, a_{2}, \ldots, a_{M}]\) containing all the mean-normalized face vectors as columns. Using these normalized face vectors, we can calculate the N × N covariance matrix ℑ along the feature dimension with the conventional formula:

$$ \Im = \frac{1}{M} A A^{T} $$
(4)

Notice that the matrix \(AA^{T}\), of size 10,304 × 10,304, would have to be constructed to calculate ℑ directly; memory constraints make it virtually impossible to perform matrix operations on \(AA^{T}\). Instead, the method described in [1] is employed: the matrix \(A^{T}A\), of size 360 × 360 (of the 400 ORL images, ten per subject, one image per subject is kept apart for testing, i.e., 40 test images, leaving M = 360), is constructed as the M × M matrix ℵ using:

$$ \aleph = \frac{1}{M} A^{T} A $$
(5)

Then we calculate the eigenvalues and eigenvectors of this matrix using Eq. (6).

$$ [V, D] = \mathrm{eigs}(\aleph) $$
(6)

Here, \(D = [d_{1}, d_{2}, \ldots, d_{M}]\), of size M, contains the sorted eigenvalues, such that \(d_{1} > d_{2} > \cdots > d_{M}\), and the corresponding eigenvectors of ℵ are contained in the M × M matrix \(V = [v_{1}, v_{2}, \ldots, v_{M}]\). According to the method proposed in [1], we can acquire the corresponding eigenvectors of ℑ from V as:

$$ U = A V $$
(7)

Notice that, even though each vector \(v_{i}\) is of size M, the vectors \(u_{i}\) of \(U = [u_{1}, u_{2}, \ldots, u_{M}]\) are of size N. We can use the matrix U to project our N-dimensional data onto a lower M-dimensional space. Example eigen-faces are provided in Fig. 15.

Fig. 15 Eigen-faces (\(u_{1}, u_{2}, u_{3}\) and \(u_{4}\)) acquired using Eq. 7 for the CMU database
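
Steps (2) to (7) condense into a short NumPy sketch. This is our illustration, not the authors' code; it assumes the training images arrive as an (M, a, b) array, and the final unit-normalization of each eigen-face, which the text does not mention, is added only for numerical convenience.

```python
import numpy as np

def eigenfaces(train_images):
    """Eigen-faces via the small M x M surrogate matrix (Eqs. 2-7).

    train_images : array of shape (M, a, b) of gray values.
    Returns the mean face (length N = a*b), the eigen-face matrix U
    (N x M, columns sorted by decreasing eigenvalue), and the eigenvalues.
    """
    M = len(train_images)
    X = np.reshape(train_images, (M, -1)).astype(float)  # row-appended vectors
    mean = X.mean(axis=0)                  # Eq. (2): average face
    A = (X - mean).T                       # Eq. (3): N x M matrix of the a_i
    aleph = A.T @ A / M                    # Eq. (5): M x M instead of N x N
    d, V = np.linalg.eigh(aleph)           # Eq. (6): eigen-decomposition
    idx = np.argsort(d)[::-1]              # reorder eigenvalues, largest first
    d, V = d[idx], V[:, idx]
    U = A @ V                              # Eq. (7): lift eigenvectors to N dims
    U /= np.linalg.norm(U, axis=0)         # unit-norm eigen-faces (for stability)
    return mean, U, d
```

Here `np.linalg.eigh` is used because ℵ is symmetric; its ascending eigenvalues are reordered to match the descending convention of Eq. (6).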

The question now is how many features to keep in the lower-dimensional representation. A standard procedure is to plot the ratio of the sum of the top eigenvalues to the sum of all nonnegative eigenvalues using Eq. (8). This ratio for the ORL database is plotted in Fig. 16.

Fig. 16 The ratio of “the sum of the top eigenvalues” to “the sum of all the eigenvalues”

$$ \delta_{\Delta} = \frac{\sum\nolimits_{j = 1}^{\Delta} d_{j}}{\sum\nolimits_{i = 1}^{M} d_{i}}, \quad \Delta = 1, 2, \ldots, M $$
(8)

This ratio intuitively measures how much of the variance of the original N-dimensional data is retained when an r-dimensional representation is acquired from the principal component projection. Suppose the top r eigenvalues retain 90% of the variance of the original data. We then construct a projection matrix from the top r eigenvectors of U (the order being maintained by the corresponding eigenvalues); call it \(\Omega_{r}\), of size r × N. The data projected from the original N-dimensional space onto the subspace spanned by the r principal eigenvectors (those of the top r eigenvalues) contained in \(\Omega_{r}\) is expressed as:

$$ Y_{r} = \Omega_{r} A $$
(9)

Note from Fig. 16 that the top 50 principal components are chosen for this experiment, because the sum of the top 50 eigenvalues of the covariance matrix is more than 90% of the sum of all the eigenvalues.
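
Continuing the `eigenfaces()` sketch above, Eq. (8) is a cumulative sum over the sorted eigenvalues and Eq. (9) is a single matrix product; the 90% threshold matches the criterion stated in the text, while the function names remain our own.

```python
import numpy as np

def choose_rank(d, target=0.90):
    """Smallest r whose top-r eigenvalues carry `target` of the total
    eigenvalue mass (Eq. 8); d must be sorted in decreasing order."""
    ratios = np.cumsum(d) / d.sum()
    return int(np.searchsorted(ratios, target)) + 1

# Usage with eigenfaces() above, where A = (X - mean).T as in Eq. (3):
# mean, U, d = eigenfaces(train_images)
# r = choose_rank(d)                 # about 50 for the ORL data, per Fig. 16
# Omega_r = U[:, :r].T               # r x N projection matrix
# Y_r = Omega_r @ A                  # Eq. (9): r-dimensional training signatures
```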

7 Face recognition or classification

After all the facial images are represented by relevant features through projection onto the lower-dimensional PCA space, we can use similarity measures between faces of the same individual and of different individuals. Assume the vectorized test face images are kept as the columns of the matrix T (note that these are the 40 images, one per subject, that were not used in the PCA stage). For classification, we first normalize the test image vectors by subtracting the previously calculated mean (Eq. 2):

$$ B = T - \Xi $$
(10)

Then, as in Eq. (9), we project the normalized test data set:

$$ Z_{r} = \Omega_{r} B $$
(11)

For each column of \(Z_{r}\), we calculate the Euclidean norm of its difference from each of the projected training vectors in \(Y_{r}\). The test image is then identified as the person whose training vector yields the smallest Euclidean norm.
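
A sketch of this nearest-neighbour step, continuing the notation of the previous sketches (Omega_r is r × N, Y_r is r × M, and train_labels, a name of our choosing, holds the subject identity of each training column); again, this is our illustration rather than the authors' code.

```python
import numpy as np

def identify(test_images, mean, Omega_r, Y_r, train_labels):
    """Identify each test face by its nearest training signature (Eqs. 10-11)."""
    T = np.reshape(test_images, (len(test_images), -1)).T.astype(float)
    B = T - mean[:, None]             # Eq. (10): subtract the training mean
    Z_r = Omega_r @ B                 # Eq. (11): project into the eigenspace
    # Euclidean norm between every (training, test) pair of signatures
    dists = np.linalg.norm(Y_r[:, :, None] - Z_r[:, None, :], axis=0)
    return np.asarray(train_labels)[np.argmin(dists, axis=0)]
```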

8 Experimental results and discussion

First, we observed the influence of the number of PCA signatures (features) on person identification. In this experiment we vary the number of signatures in the lower-dimensional representation from one to fifty. In Fig. 17, notice that as the number of signatures increases, the recognition accuracy increases as well; however, the recognition rate saturates beyond a certain number of signatures.

Fig. 17 Ten-fold cross-validation test accuracy for the ORL database

Second, we determine the influence of the number of images used to create the covariance matrix. Although no training in the conventional sense is involved in this experiment, we call the images used to create the covariance matrix the training data and the remaining images the test data. Note in particular that the test data are not part of the training data, and all testing is performed using ten-fold cross validation. We observe that as the training size increases, so does the recognition accuracy. Figure 18 shows the recognition accuracy for different training sizes; using 90% of the total images for the PCA calculation performs better than using fewer.

Fig. 18 Recognition accuracy for different training sizes

Third, an experiment is performed by increasing the amount of noise in the test images to determine the tolerance level of a PCA-based face recognition system. Figure 19 shows noisy images of a subject from the ORL database, and Fig. 20 gives the results for the noisy images.

Fig. 19 Images of the same individual with different noise levels

Fig. 20 Recognition rates for different noise levels

Fourth, an experiment is performed by increasing the amount of blurriness in the test images to determine the tolerance level of a PCA-based face recognition system. Figure 21 shows blurred images of a subject from the ORL database, and Fig. 22 gives the results for the blurred images.

Fig. 21 Images of the same individual with different blurriness levels

Fig. 22 Recognition rates for different blurriness levels
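
The text does not state which noise and blur models were applied in the third and fourth experiments, so the following sketch should be read only as one plausible way to generate such degraded test images; additive Gaussian noise and Gaussian blur are our stand-in choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(img, noise_sigma=0.0, blur_sigma=0.0, rng=None):
    """Add Gaussian noise and/or Gaussian blur to a gray image in [0, 255]."""
    rng = rng or np.random.default_rng(0)
    out = img.astype(float)
    if blur_sigma > 0:
        out = gaussian_filter(out, sigma=blur_sigma)    # blurriness experiment
    if noise_sigma > 0:
        out += rng.normal(0.0, noise_sigma, out.shape)  # noise experiment
    return np.clip(out, 0, 255)
```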

Fifth, we determine the smallest image size that still retains a recognition rate of 90% or above. For the ORL database this is 10% of the original size; that is, if the images are collected as 11 × 9 pixel images, the recognition rate is still satisfactory. Results are provided in Fig. 23.

Fig. 23 Recognition accuracy for different image sizes

Sixth, the results observed in the previous experiments need to be verified on another database whose data were collected under different conditions. The CMU database is mainly used for facial expression research; we performed similar experiments on it and observed similar results. One sample result is provided in Fig. 24, in which both experiments identify 20 individuals. Logically, if fewer individuals are to be identified, the chance of correct recognition is higher: Fig. 17 shows the curve for identifying 40 individuals of the ORL database, and in comparison the recognition accuracy for 20 individuals in Fig. 24 is higher. Another important observation is that if the data are carefully pre-processed (geometric alignment and intensity normalization), the number of signatures required for recognition is small and the recognition rate is high; as seen in Fig. 24, the CMU curve reaches a 100% recognition rate very quickly.

Fig. 24 Recognition curves for the ORL and CMU databases

Seventh, to determine the effect of expression, we use the Nott Face Database, the JAFFE Face Database, and the Caucasian Face Database, selecting from each a group of images taken with various facial expressions but the same illumination (bright). Results are provided in Fig. 25. We achieve very high recognition rates regardless of whether the images are black and white (Nott and JAFFE) or color (Caucasian), and expression has minimal impact on recognition accuracy.

Fig. 25 Recognition accuracy for various expressions but the same illumination (bright)

Eighth, to determine the effect of pose, we use the Caucasian Face Database and the Indian Face Database, selecting from each a group of images taken at various face orientations but constant illumination (brighter for the Caucasian Face Database, darker for the Indian Face Database). Results are provided in Fig. 26. We achieve a very high recognition rate on the Caucasian Face Database, and pose has minimal impact on recognition accuracy. With the slightly reduced illumination of the Indian Face Database, we can still achieve an accuracy of 70%.

Fig. 26 Recognition accuracy for various poses but the same illumination (brighter for the Caucasian Face Database, darker for the Indian Face Database)

Ninth, to determine the effect of illumination, we use the Asian Face Database and the Faces94 Face Database. We select the three groups with various expressions, poses, and illumination conditions from the Asian Face Database, and one group from Faces94 with slightly varying expressions, slightly darker illumination, and a green background. Results are provided in Figs. 27 and 28.

Fig. 27 Recognition accuracy for various illumination conditions

Fig. 28 Recognition accuracy for the same illumination conditions, but darker with a green background

For images with different expressions and poses but slightly different illumination (Asian face), and for images with the same but darker illumination, a green background, and slightly varying expressions (Faces94), the face recognition rates decrease to below 55%. However, for full frontal images under varying illumination (from bright to dark; Asian face), the recognition accuracy falls below 10%.

9 Conclusions

In general, for PCA-based face recognition, increasing the number of signatures increases the recognition rate; however, the recognition rate saturates beyond a certain point. In our observation it is therefore better to use robust image pre-processing, such as geometric alignment of important facial feature points (eyes, mouth, and nose) and intensity normalization, which increases the recognition rate and simultaneously decreases the number of signatures needed to represent images in the PCA space. Increasing the number and variety of samples used to build the covariance matrix increases the recognition rate, whereas increases in noise and blurriness decrease the recognition accuracy. In general, the image size is not important for a PCA-based face recognition system as long as the number of pixel features before the PCA projection exceeds the total number of sample images. Expression and pose have minimal effect on the recognition rate, while illumination has a great impact on recognition accuracy. As such, continued research is being carried out on handling illumination variation in order to achieve satisfactory face recognition accuracy. In summary, these findings can provide useful performance evaluation criteria for the optimal design and testing of human face recognition systems.