1 Introduction

In general, AFR can be accomplished through class discrimination, which can be realized and implemented through numerous methods and techniques. While some of these techniques are based on deterministic principles, such as the discrete cosine transform (DCT) and Gabor filters [1,2,3], other AFR techniques have a statistical basis, mainly principal component analysis (PCA) [4,5,6], linear discriminant analysis (LDA) [7, 8], support vector machines (SVM) [9], and artificial neural networks (ANN) [10].

In this paper we are interested in applying two different techniques to AFR, namely PCA and LDA, operating on still face images subjected to gradient-shadowing, a specific form of image occlusion. The motivation is that side-shadowing is the occlusion most often imposed on face images. The paper contains a basic theoretical analysis of both techniques, illustrating the advantage of LDA over PCA in terms of its enhanced capability for class separation. Computer simulation evaluates the performance of each of the above techniques in terms of recognition rate against increasing levels of gradient-shadowing occlusion.

2 Literature Review

AFR based on PCA principles was first suggested by M. Kirby and L. Sirovich [6] in 1990 and was later utilized by M. Turk and A.P. Pentland [7]. Aleix M. Martinez and Avinash C. Kak [8] tested the performance of both PCA and LDA against subspace dimensionality. Their simulation results showed an improvement of about 20% on average in terms of FRR in favor of LDA over PCA. Similar results were obtained by Peng P. et al. [11]. The work of Chandolu P. and Jayesh G. [12] illustrates an FRR improvement of almost 16% on average for LDA over PCA when the techniques are run against an increased number of face samples. Similar results have been obtained by Önsen T. and Adnan A. [13]. Tomesh Verma and Raj Kumar Sahu [14] tested a combined LDA-PCA technique, which incorporates PCA feature vectors, against the PCA technique alone. Their simulation results exhibit, once again, a 16–20% improvement in FRR gained by LDA-PCA over PCA alone. However, that work does not include the effect of face image occlusion on the FRR for either technique, and the combined LDA-PCA approach can prove too costly in terms of the processing speed required for real-time application. The work of Zhao W. et al. [15] reports similar results but with different types of face image occlusion.

3 PCA Technique

3.1 Theoretical Background

The PCA technique is based on eigenvector decomposition of the data covariance matrix. The information in the data, say a face image, is represented by a set of the most significant eigenvectors of the covariance matrix. This basic concept of PCA is explained in the following example. Let X and Y be data vectors such that

$$ X=\left[2.5\ 0.5\ 2.2\ 1.9\ 3.1\ 2.3\ 2\ 1\ 1.5\ 1.1\right] $$

and

$$ Y=\left[2.4\ 0.7\ 2.9\ 2.2\ 3\ 2.7\ 1.6\ 1.1\ 1.6\ 0.9\right] $$

The two data vectors are plotted in Fig. 1. The sample means are

$$ \overline{X}=\frac{\sum_{i=1}^{10}{x}_i}{10}=1.81 $$
(1)

and

$$ \overline{Y}=\frac{\sum_{i=1}^{10}{y}_i}{10}=1.91 $$
(2)
Fig. 1

Ten data points representing X and Y vectors

The data vectors, normalized to zero mean by subtracting the sample means, are

$$ A=X-\overline{X} $$
$$ =\left[0.69\ -1.31\ 0.39\ 0.09\ 1.29\ 0.49\ 0.19\ -0.81\ -0.31\ -0.71\right] $$

and

$$ B=Y-\overline{Y} $$
$$ =\left[0.49\ -1.21\ 0.99\ 0.29\ 1.09\ 0.79\ -0.31\ -0.81\ -0.31\ -1.01\right] $$

Then the covariance matrix is

$$ C=\left[\begin{array}{cc}0.61655& 0.61544\\ {}0.61544& 0.71655\end{array}\right] $$

The corresponding eigenvectors (one per column, the second column corresponding to the larger eigenvalue) are

$$ e=\left[\begin{array}{cc}-0.73518& -0.67787\\ {}0.67787& -0.73518\end{array}\right] $$

The two eigenvectors represent two PCA components, PCA1 and PCA2, as shown in Fig. 1. In general, we choose as PCAs the eigenvectors along which the data have maximum variance and neglect the rest. Data can be projected on these major PCAs, thus reducing the data dimensionality. In our example, we choose the eigenvector with the largest eigenvalue, eT = [−0.67787 −0.73518], as the single PCA and neglect the other.

When the data are projected on this PCA (black dots), their dimensionality (size) is reduced (compressed) from two dimensions to one dimension, as shown in Fig. 2. This is the first function of PCA analysis.

Fig. 2

Projection of data on main PCA vector

However, and most importantly, these eigenvectors represent what are called the feature vectors of the data, and hence they can be used for recognition in general and for face recognition in particular. This is considered the second benefit of the PCA principle.
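
For concreteness, the numerical example above can be reproduced with a few lines of MATLAB/Octave. The snippet below is only an illustrative sketch; the variable names are ours and not part of any reference implementation.

```matlab
% Reproduce the 2-D PCA example: covariance, eigenvectors, and projection.
X = [2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1];   % data vector X
Y = [2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9];   % data vector Y

A = X - mean(X);                  % zero-mean data, Eq. (1)
B = Y - mean(Y);                  % zero-mean data, Eq. (2)

C = cov([A' B']);                 % 2 x 2 covariance matrix
[V, D] = eig(C);                  % eigenvectors (columns of V) and eigenvalues (diag of D)

[~, idx] = max(diag(D));          % index of the largest eigenvalue
pc1 = V(:, idx);                  % main PCA vector, approx. [0.678; 0.735] up to sign

P = pc1' * [A; B];                % 1-D projection of the data onto the main PCA (Fig. 2)
```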

3.2 PCA Applied to Face Recognition

Regardless of the technique being used, AFR is usually accomplished by executing two phases: a training phase and a recognition phase.

3.2.1 Training Phase

  • Stage 1: The system starts by digitizing the camera snapshot into a 2D face image of black/white intensity of size n × m pixels. Also in this stage, the pixels of the 2D image are concatenated to form a 1D image vector X of size (L × 1), where L = m × n.

  • Stage 2: Calculating the “average face”, which is the average of the M given face vectors, such that

$$ \varphi =\frac{1}{M}{\sum}_{i=1}^M{X}_i $$
(3)
  • Stage 3: Normalizing all available M face images to zero average value by subtracting

$$ {\psi}_i={X}_i-\varphi $$
(4)
  • Stage 4: Constructing face covariance matrix C and calculating PCAs through the following:

  • Construct face image matrix

$$ A=\left[{\psi}_1\kern0.5em {\psi}_2\dots \kern0.5em {\psi}_M\right] $$
(5)
  • Calculate covariance matrix

$$ C=\frac{1}{M}A{A}^T $$
(6)
  • where \( {A}^T \) is the transpose of matrix A.

  • Calculate the M eigenvectors of matrix C as

$$ e= Av $$
(7)
  • where v are the eigenvectors corresponding to the largest eigenvalues of the smaller M × M matrix

$$ \tilde{C}={A}^TA $$
(8)
  • The vectors e corresponding to the largest eigenvalues λ are the required PCAs.

  • Stage 5:

  • Project the face images on the PCAs (eigenvectors) as

$$ {P}_i={e}_i^T{\psi}_i $$
(9)
  • and form the person's image feature vector (a MATLAB sketch of Stages 1–5 is given after this list)

$$ \Omega ={\left[{P}_1{P}_2\dots .{P}_M\right]}^T $$
(10)
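
The training stages above can be summarized in a short MATLAB sketch. This is a minimal illustration assuming the training faces have already been vectorized into the columns of a matrix; the function and variable names (pcaTrain, faces, eigFaces, and so on) are ours and not taken from the authors' program.

```matlab
% Minimal sketch of the PCA (eigenfaces) training phase, Stages 1-5.
% faces : L x M matrix, one vectorized n*m face image per column (Stage 1).
% K     : number of principal components (eigenfaces) to keep.
function [eigFaces, features, meanFace] = pcaTrain(faces, K)
    [~, M] = size(faces);
    meanFace = mean(faces, 2);                  % average face, Eq. (3)
    Psi = faces - meanFace;                     % zero-mean faces, Eq. (4) (implicit expansion, R2016b+)
    Csmall = (Psi' * Psi) / M;                  % small M x M matrix of Eq. (8)
    [V, D] = eig(Csmall);                       % its eigenvectors v and eigenvalues
    [~, order] = sort(diag(D), 'descend');      % keep the K largest eigenvalues
    V = V(:, order(1:K));
    eigFaces = Psi * V;                         % e = A*v, Eq. (7)
    eigFaces = eigFaces ./ vecnorm(eigFaces);   % normalize each eigenface to unit length (R2017b+)
    features = eigFaces' * Psi;                 % K x M feature vectors, Eqs. (9) and (10)
end
```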

3.2.2 Recognition Phase

  • Stage 1: Enter the query image, repeat Stages 1–5, and obtain the feature vector Ωx of the query image.

  • Stage 2: Calculate the Euclidean distance between the query image feature vector and that of each one of the stored images:

$$ {d}_j=\left\Vert {\Omega}_{\textrm{x}}-{\Omega}_{\textrm{j}}\right\Vert $$
(11)
  • where j = 1, 2, …, K and K is the number of persons in the database; the query image is then identified with the index (see the sketch after this list)

$$ j=\textrm{argmin}\left({d}_j\right) $$
(12)
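
A matching sketch of the recognition phase, again only illustrative and assuming the outputs of pcaTrain above, might look as follows.

```matlab
% Minimal sketch of the recognition phase (Stages 1 and 2).
% queryImage : n x m grayscale query face; features : K x M stored feature vectors.
function j = pcaRecognize(queryImage, eigFaces, features, meanFace)
    x = double(queryImage(:));               % Stage 1: concatenate into an L x 1 vector
    omegaX = eigFaces' * (x - meanFace);     % feature vector of the query image, Eqs. (9) and (10)
    d = vecnorm(features - omegaX);          % Euclidean distances d_j of Eq. (11)
    [~, j] = min(d);                         % index of the closest stored face, Eq. (12)
end
```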

4 LDA Technique

Principal component analysis (PCA) is an effective method mainly for reducing data dimensionality, and thus for data compression, and it has been used effectively in face recognition systems. However, the PCA technique does not take class separation into account. In other words, there could be two well-separated classes of data having the same spreading direction, as shown in Fig. 3, and the PCA technique would consider them a single class with one eigenvector.

Fig. 3

(a) Projection of data on main PCA. (b) Projection of data on main LDA showing effective class (category) separation

Another statistical technique, called Fisher's linear discriminant analysis (LDA for short), which was first introduced by Sir Ronald Fisher in 1936 and reformulated later by S. Balakrishnama and A. Ganapathiraju [16] and Alaa Tharwat et al. [17], provides both eigenvectors and class separation at the same time; hence, there is a good chance for LDA to outperform the PCA technique in terms of ROC for a wide variety of ill-illuminated images.

The main principle of the LDA technique is to find the eigenvectors that maximize the separation between classes in the data, so that a powerful class recognition system may be obtained.

With the LDA technique it may well be possible to obtain both a PCA-like projection and clear class separation at the same time (Fig. 3a, b).

In order to derive a mathematical formula for obtaining these projection vectors, let us assume two classes X1 and X2. Each class contains M image vectors of the same person, each of size N × 1, such that

\( {X}_1=\left({X}_1^1,{X}_1^2,\dots, {X}_1^M\right) \). The mean value for class X1 is

$$ {\mu}_1=\frac{1}{M}{\sum}_{i=1}^M{X}_{1,i} $$
(13)

And the mean value for class X2 is

$$ {\mu}_2=\frac{1}{M}{\sum}_{i=1}^M{X}_{2,i} $$
(15)

Note that both μ1 and μ2 are vectors of size N × 1.

Let w be the LDA projection vector that will guarantee the separation of the face image classes; then the mean of the first face image class projected on this vector is

$$ \underset{1}{\overset{\sim }{\mu }}={w}^T{\mu}_1 $$
(16)

And the mean of the second face image class projected on this vector is

$$ \underset{2}{\overset{\sim }{\mu }}={w}^T{\mu}_2 $$

The projection of class X1 on vector w is the vector Y1 such that

$$ {Y}_1={w}^T{X}_1 $$
(17)

And for class X2, the projection vector is Y2 such that

$$ {Y}_2={w}^T{X}_2 $$
(18)

The distance between these classes’ means is

$$ d(w)=\left|\underset{1}{\overset{\sim }{\mu }}-\underset{2}{\overset{\sim }{\mu }}\right|=\left|{w}^T{\mu}_1-{w}^T{\mu}_2\right|=\left|{w}^T\left({\mu}_1-{\mu}_2\right)\right| $$
(19)

The covariance matrix, otherwise known as the scatter matrix, of the projection Y1 is

$$ \underset{1}{\overset{\sim }{S}}={\sum}_N\left({Y}_1-\underset{1}{\overset{\sim }{\mu }}\right){\left({Y}_1-\underset{1}{\overset{\sim }{\mu }}\right)}^T $$
(20)

And for class Y2

$$ \underset{2}{\overset{\sim }{S}}={\sum}_N\left({Y}_2-\underset{2}{\overset{\sim }{\mu }}\right){\left({Y}_2-\underset{2}{\overset{\sim }{\mu }}\right)}^T $$
(21)

Rewriting Eq. (20),

$$ \underset{1}{\overset{\sim }{S}}={\sum}_N{w}^T\left({X}_1-{\mu}_1\right){\left({X}_1-{\mu}_1\right)}^Tw={w}^T\left[{\sum}_N\left({X}_1-{\mu}_1\right){\left({X}_1-{\mu}_1\right)}^T\right]w $$
(22)

Or

$$ \underset{1}{\overset{\sim }{S}}={w}^T{S}_{w1}w $$
(23)

where

$$ {S}_{w1}={\sum}_N\left({X}_1-{\mu}_1\right){\left({X}_1-{\mu}_1\right)}^T $$
(24)

is the within-class scatter matrix of class X1.

Then

$$ \underset{1}{\overset{\sim }{S}}+\underset{2}{\overset{\sim }{S}}={w}^T{S}_ww $$
(25)

where

$$ {S}_w={S}_{w1}+{S}_{w2} $$
(26)

is called the (total) within-class scatter matrix.

Now, in order to obtain the required LDA projection vector w that will separate the two classes, we seek the vector w that maximizes the ratio of the squared distance between the projected means to the within-class scatter:

$$ \rho (w)=\frac{{\left|\underset{1}{\overset{\sim }{\mu }}-\underset{2}{\overset{\sim }{\mu }}\right|}^2}{\underset{1}{\overset{\sim }{S}}+\underset{2}{\overset{\sim }{S}}} $$
(27)

On the other hand, rewriting

$$ {\left|\underset{1}{\overset{\sim }{\mu }}-\underset{2}{\overset{\sim }{\mu }}\right|}^2={\left({w}^T{\mu}_1-{w}^T{\mu}_2\right)}^2={w}^T\left({\mu}_1-{\mu}_2\right){\left({\mu}_1-{\mu}_2\right)}^Tw $$
(28)
$$ {\left|\underset{1}{\overset{\sim }{\mu }}-\underset{2}{\overset{\sim }{\mu }}\right|}^2={w}^T{S}_Bw $$
(29)

where

$$ {S}_B=\left({\mu}_1-{\mu}_2\right){\left({\mu}_1-{\mu}_2\right)}^T $$
(30)

is called the between-class scatter matrix.

In general, for N classes of face images with overall mean μm, the between-class scatter matrix is given by

$$ {S}_B={\sum}_{i=1}^N\left({\mu}_i-{\mu}_m\right){\left({\mu}_i-{\mu}_m\right)}^T $$
(31)

Then the ratio to be maximized in order to get class separation becomes

$$ \rho (w)=\frac{w^T{S}_Bw}{w^T{S}_ww} $$
(32)

The required LDA projection vector is obtained from

$$ w=\textrm{argmax}\left(\frac{w^T{S}_Bw}{w^T{S}_ww}\right) $$
(33)

And this can be carried out by setting

$$ \frac{d\rho (w)}{dw}=0 $$
(34)

And this leads to

$$ {S}_w^{-1}{S}_Bw=\rho w $$
(35)

The interpretation of Eq. (35) is that the required LDA projection vector w is an eigenvector of the matrix

$$ A={S}_w^{-1}{S}_B $$
(36)

which can be obtained in a similar manner to the PCA technique.

Example:

Let two image classes be represented by the two sets of sample vectors:

$$ {X}_1=\left\{\left(4,1\right),\left(2,4\right),\left(2,3\right),\left(3,6\right),\left(4,4\right)\right\} $$
$$ {X}_2=\left\{\left(9,10\right),\left(6,8\right),\left(9,5\right),\left(8,7\right),\left(10,8\right)\right\} $$

Then we get

$$ {S}_1=\left[\begin{array}{cc}0.8& -0.4\\ {}-0.4& 2.64\end{array}\right],{S}_2=\left[\begin{array}{cc}1.84& -0.04\\ {}-0.04& 2.64\end{array}\right] $$

and

$$ {\mu}_1=\left[3.00\ 3.60\right], $$
$$ {\mu}_2=\left[8.40\ 7.60\right] $$

Substituting into Eqs. (26), (30), and (36), we get

$$ {S}_w=\left[\begin{array}{cc}2.64& -0.44\\ {}-0.44& 5.28\end{array}\right], $$
$$ {S}_B=\left[\begin{array}{cc}29.16& 21.60\\ {}21.60& 16.00\end{array}\right] $$
$$ {S}_w^{-1}{S}_B=\left[\begin{array}{cc}11.89& 8.81\\ {}5.08& 3.76\end{array}\right] $$

Solving Eq. (35) we get

$$ {S}_w^{-1}{S}_Bw=\lambda w\longrightarrow \left|{S}_w^{-1}{S}_B-\lambda I\right|= $$
$$ \left|\begin{array}{cc}11.89-\lambda & 8.81\\ {}5.08& 3.76-\lambda \end{array}\right|=0\longrightarrow \lambda =15.65 $$

Then

$$ \left[\begin{array}{cc}11.89& 8.81\\ {}5.08& 3.76\end{array}\right]\left[\begin{array}{c}{w}_1\\ {}{w}_2\end{array}\right]=15.65\left[\begin{array}{c}{w}_1\\ {}{w}_2\end{array}\right] $$

Solving for w1, w2

$$ \left[\begin{array}{c}{w}_1\\ {}{w}_2\end{array}\right]=\left[\begin{array}{c}0.91\\ {}0.39\end{array}\right] $$
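
The numbers in this example can be checked with a few MATLAB lines; the sketch below simply evaluates Eqs. (26), (30), and (35) for the two sample sets (the variable names are ours).

```matlab
% Verify the two-class LDA example numerically.
X1 = [4 1; 2 4; 2 3; 3 6; 4 4];            % class 1 samples (one sample per row)
X2 = [9 10; 6 8; 9 5; 8 7; 10 8];          % class 2 samples

mu1 = mean(X1)';  mu2 = mean(X2)';          % class means, Eqs. (13) and (15)

S1 = cov(X1, 1);  S2 = cov(X2, 1);          % per-class scatter matrices (normalized by M)
Sw = S1 + S2;                               % within-class scatter, Eq. (26)
SB = (mu1 - mu2) * (mu1 - mu2)';            % between-class scatter, Eq. (30)

[V, D] = eig(Sw \ SB);                      % eigenproblem of Eq. (35)
[~, idx] = max(diag(D));                    % largest eigenvalue, about 15.65
w = V(:, idx);                              % LDA vector, approx. [0.91; 0.39] up to sign
```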

Figure 4 illustrates the two image classes and the LDA projection vector, on which the projections of the two classes are easily separated.

Fig. 4

Projection of two simple images on the main LDA vector showing clear class separation

5 Computer Simulation

A computer simulation evaluating the performance of both the PCA and LDA techniques on the same images has been implemented as a MATLAB program, run on a Core i5 laptop computer.

The simulation consists of processing face images of 100 different persons. Each face image is represented by 200 × 200 pixels of monochromatic gray intensity, a sample of which is shown in Fig. 5.

Fig. 5

Face samples taken from the University of Sheffield

These face images have been borrowed, with courtesy, from the free image library of the University of Sheffield [18].

The simulation consists of both training and recognition phases for each of the PCA and LDA techniques. In the training phase, the database containing the feature vectors is obtained using images that are ideal regarding lighting conditions. Only in the recognition phase is occlusion introduced over the images for testing. Since this work is mainly concerned with a relative comparison of PCA and LDA, a qualitative image occlusion is a sensible choice. Accordingly, within the recognition phase, the nonideal condition (shadowing level), which acts as image occlusion, was introduced by adding a black hue with a linear gradient to each image. Figure 6a, b illustrates the increased black hue in two separate experiments, with multiplying factors multi = 0.3 and multi = 0.6, respectively.
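
As an illustration, one plausible way to impose such a linear gradient shadow is sketched below; it assumes the black hue grows linearly across the image and that the factor multi scales its maximum strength (the helper name addGradientShadow is illustrative, not taken from the simulation program).

```matlab
% Hypothetical implementation of the linear-gradient ("side-shadowing") occlusion.
% img : 200 x 200 grayscale face image (uint8 assumed); multi : 0.3 or 0.6 as in Fig. 6.
function shaded = addGradientShadow(img, multi)
    img = double(img) / 255;                       % scale 8-bit intensities to [0, 1]
    [rows, cols] = size(img);
    ramp = repmat(linspace(0, 1, cols), rows, 1);  % 0 at the lit edge, 1 at the shadowed edge
    shaded = img .* (1 - multi * ramp);            % darken pixels progressively across the face
end
```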

Fig. 6

Side-shadowing occlusion of different qualitative grades: (a) multi = 0.3 and (b) multi = 0.6

Each of the PCA and LDA algorithms is run on the same corrupted image set after adding “salt-and-pepper” noise, so that we obtain the random ensemble necessary to estimate the FRR, which is in turn a probability. The performance of both PCA and LDA was evaluated in terms of FRR against increased hue intensity (multiplying factor), denoted the shadowing level, as depicted in Fig. 7a, b. Figure 7a depicts the simulation outcome for an average black hue of multi = 0.3, while Fig. 7b depicts the outcome for multi = 0.6. By studying these figures, we notice the following. First, for ideal illumination, i.e., shadowing level = 0,

Fig. 7

Face recognition rate of PCA versus LDA. (a) multi = 0.3 and (b) multi = 0.6

the improvement in FRR gained by LDA over PCA ranges between 10% and 25%, which is in complete agreement with the works of [19,20,21]. Second, as the severity of occlusion increases, the FRR of PCA deteriorates rapidly, while that of LDA degrades gracefully; indeed, a fairly constant and good FRR of about 80% is maintained for mild shadowing (Fig. 7b). Third, the LDA technique maintains a higher FRR than PCA for all levels of shadowing occlusion. From the FRR curves in Fig. 7a, b, we notice that the LDA technique outperforms the PCA technique in terms of higher FRR for the same degree of nonideal image lighting.
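
For completeness, the Monte Carlo estimation of the FRR can be sketched as follows; the loop structure, trial count, and noise density shown here are illustrative assumptions, and recognize stands for either the PCA- or the LDA-based classifier (e.g., a handle such as @(im) pcaRecognize(im, eigFaces, features, meanFace)).

```matlab
% Illustrative sketch of a Monte Carlo FRR estimate producing curves like those of Fig. 7.
% queryImgs : cell array of test face images; labels : their true identity indices.
% recognize : function handle returning the identified index (PCA- or LDA-based).
function frr = estimateFRR(queryImgs, labels, recognize, multi, nTrials)
    nCorrect = 0;  nTotal = 0;
    for t = 1:nTrials
        for k = 1:numel(queryImgs)
            img = addGradientShadow(queryImgs{k}, multi);   % gradient-shadow occlusion
            img = imnoise(img, 'salt & pepper', 0.02);      % random corruption (Image Processing Toolbox)
            if recognize(img) == labels(k)
                nCorrect = nCorrect + 1;                    % correct identification
            end
            nTotal = nTotal + 1;
        end
    end
    frr = nCorrect / nTotal;                                % face recognition rate
end
```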

6 Conclusion

In principle, a theoretical formula that would enable us to quantify the difference in performance between PCA and LDA in terms of FRR is needed. However, deriving such a formula can prove to be highly difficult. Monte Carlo simulation, as used in this work, may provide a valid alternative, and as such we succeeded in obtaining a tangible indication that LDA can outperform the PCA technique in terms of FRR under ill-lighting conditions. Both figures show a graceful degradation in performance for the LDA technique as lighting conditions worsen, while PCA fails fast and dramatically. However, the price is a much increased computational burden in the case of LDA; this might not be a prohibiting factor, given that the computational power of modern computers can fulfill the requirements for implementing an LDA-based face recognition system for real-time applications.