Introduction

Principal component analysis (PCA), also known as the Karhunen–Loève expansion, is a classical feature extraction and data representation technique widely used in the areas of pattern recognition and computer vision. Turk and Pentland [1, 2] presented the well-known Eigenfaces method for face recognition. Recently, two PCA-related methods, independent component analysis (ICA) and kernel principal component analysis (Kernel PCA), have been widely used. Bartlett et al. proposed ICA for face representation and found that it is better than PCA when cosines are used as the similarity measure. Yang et al. [3] used Kernel PCA for face feature extraction and recognition and showed that the Kernel Eigenfaces method outperforms the classical Eigenfaces method. However, ICA and Kernel PCA are both computationally more expensive than PCA.

As opposed to conventional PCA, 2D PCA is based on 2D matrices rather than 1D vectors: the image covariance matrix is constructed directly from the original image matrices. As a result, 2D PCA has two important advantages over PCA. First, it is easier to evaluate the covariance matrix accurately. Second, less time is required to determine the corresponding eigenvectors.

Conventional Face Recognition Models

This section details the different face recognition models. The first model, one-dimensional PCA (1D PCA), derives desirable features characterized by eigenvectors. The second model, Fisher discriminant analysis (FDA) [4, 5], achieves greater between-class scatter. The third model, independent component analysis (ICA) [6], is performed on face images under two different architectures: one treats the images as random variables and the pixels as outcomes, and the other treats the pixels as random variables and the images as outcomes.

The fourth model, Kernel PCA (KPCA) [6], applies kernel functions in the input space to achieve the same effect as an expensive nonlinear mapping. As opposed to these, 2D PCA [7] is based on 2D image matrices rather than 1D vectors, so the image matrix does not need to be transformed into a vector prior to feature extraction.

2D PCA Algorithm

Two-dimensional principal component analysis (2D PCA) is based on 2D eigenvectors. In this method the image covariance matrix is a 2D matrix calculated directly from the original 2D image matrices. Therefore, the method has the advantages of easier evaluation of the covariance matrix and less time required to compute the eigenvectors and eigenvalues.

Steps Involved in Training Phase

  1. The average image of all training image samples is denoted by \(\overline{\text{A}}\).

  2. Find Gt, the image covariance matrix of size n × n, where ‘n’ is the number of columns in the 2D face image matrix:

    $$G_{t} = \frac{1}{M}\sum\limits_{j = 1}^{M} {\left( {A_{j} - \overline{A} } \right)^{T} \left( {A_{j} - \overline{A} } \right)}$$
    (1)

    where ‘M’ is the total number of training facial images; Gt is the average of the covariance matrices obtained for all the training images.

  3. The feature vectors X1…Xd are the eigenvectors of the covariance matrix Gt corresponding to its d largest eigenvalues, calculated by singular value decomposition of Gt.

  4. Projecting the given image ‘A’ onto the feature vectors X1…Xd yields the projected vectors Y1…Yd, called the principal components of the sample image ‘A’:

    $$Y_{k} = AX_{k}$$
    (2)
  5. The principal component vectors obtained are used to form an m × d matrix B = [Y1…Yd], where ‘m’ is the number of rows in the 2D face image matrix; B is called the feature matrix or feature image of the image sample A.
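As a rough illustration only (not the authors' code), the training steps above can be sketched in NumPy; the function and variable names here are my own:

```python
# Sketch of the 2D PCA training phase, assuming M face images of shape (m, n).
import numpy as np

def train_2dpca(images, d):
    """Return the average image and the top-d eigenvectors of Gt (Eq. 1)."""
    A = np.asarray(images, dtype=float)        # shape (M, m, n)
    A_mean = A.mean(axis=0)                    # average image, step 1
    centered = A - A_mean
    # Image covariance matrix Gt (Eq. 1): average over j of (Aj - mean)^T (Aj - mean)
    Gt = np.einsum('kij,kil->jl', centered, centered) / len(A)   # (n, n)
    # Gt is symmetric, so an eigendecomposition suffices (eigh sorts ascending)
    eigvals, eigvecs = np.linalg.eigh(Gt)
    X = eigvecs[:, ::-1][:, :d]                # top-d eigenvectors X1..Xd, (n, d)
    return A_mean, X

def project_2dpca(A, X):
    # Eq. (2): Yk = A Xk; the columns of B are the principal component vectors
    return A @ X                               # feature matrix B, shape (m, d)
```

Note that Gt is only n × n (n = number of image columns), which is why the eigendecomposition is so much cheaper than in 1D PCA, where the covariance matrix is (m·n) × (m·n).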

Steps Involved in Recognition Phase

  1. Assign a class Wk to each of the training samples [B1…BM]. A nearest neighbour classifier is used for classification. The distance between two arbitrary feature matrices \(B_{i} = [Y_{1}^{(i)} ,Y_{2}^{(i)} , \ldots ,Y_{d}^{(i)} ]\) and \(B_{j} = [Y_{1}^{(j)} ,Y_{2}^{(j)} , \ldots ,Y_{d}^{(j)} ]\) is defined by

    $$d(B_{i} ,B_{j} ) = \sum\limits_{k = 1}^{d} {\left\| {Y_{k}^{(i)} - Y_{k}^{(j)} } \right\|_{2} }$$
    (3)

    where \(\left\| {Y_{k}^{(i)} - Y_{k}^{(j)} } \right\|_{2}\) denotes the Euclidean distance (2-norm) between the two principal component vectors \(Y_{k}^{(i)}\) and \(Y_{k}^{(j)}\), and the superscripts ‘i’ and ‘j’ denote the test and training projection vectors, respectively. The test image Bi is assigned to a given class Wk if and only if d(Bi, Bj) = min{d(Bi, B1), d(Bi, B2),…, d(Bi, BM)}; that is, the test facial image Bi belongs to a given class only if it has the minimum Euclidean distance to a trained facial image Bj of that particular class.

  2. Reconstruct an image using the following equation:

    $$\tilde{A} = VU^{T}$$
    (4)

    where V = [Y1…Yd] is the matrix of feature vectors and U = [X1…Xd] is the matrix of eigenvectors. The reconstruction of the facial image makes it possible to compare the mean square error between the original test facial image and the reconstructed facial image; based on this mean square error, the performances of face recognition algorithms are compared.
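A minimal sketch of the recognition phase, under the same assumptions as before (feature matrices of shape m × d; names are hypothetical, not from the original):

```python
# Nearest-neighbour classification with the distance of Eq. (3),
# plus reconstruction (Eq. 4) and its mean square error.
import numpy as np

def feature_distance(Bi, Bj):
    # Eq. (3): sum over k of the Euclidean norms of the column differences
    return np.linalg.norm(Bi - Bj, axis=0).sum()

def classify(B_test, train_features, train_labels):
    # Assign the class of the closest training feature matrix
    dists = [feature_distance(B_test, B) for B in train_features]
    return train_labels[int(np.argmin(dists))]

def reconstruct_image(B, X):
    # Eq. (4): reconstructed image = V U^T, with V = B and U = X
    return B @ X.T

def mse(A, A_rec):
    # Mean square error between original and reconstructed image
    return np.mean((A - A_rec) ** 2)
```

The classifier simply scans all M training feature matrices; for the small databases used here, no index structure is needed.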

Experiments and Analysis

The 2D PCA method is used for face recognition and tested on two well-known face databases (ORL, Yale) and our own face database (the Senthil face database). The ORL database [8] is used to evaluate the performance of 2D PCA under conditions where the pose and sample size are varied. The Senthil database [9] is employed to test the performance of the system under variations in facial expression and brightness conditions. The Yale database [10] is used to examine the system performance when both facial expressions and illumination are varied.

Experiments on the ORL Database

The ORL database contains images from 40 individuals, each providing ten different images. For some subjects the images are taken at different times. First, an experiment is performed using the first five image samples per class for training and the remaining images for testing. Thus, the numbers of training and testing samples are both 200.

The 2D PCA algorithm is first used for feature extraction. Here, the size of the covariance matrix is 92 × 92. Some of the subimages (test images projected onto only the selected top eigenvectors) are shown in Fig. 1. As observed in Fig. 1, the first subimage contains most of the energy of the original image. The other subimages show detailed local information at different levels.

Fig. 1

Some reconstructed subimages, shown in inverse order

As the value of k increases, the information contained in Ak becomes gradually weaker. Figure 2 shows that the magnitude of the eigenvalues quickly converges to zero, which is exactly consistent with the results of Fig. 1.

Fig. 2

The plot of the magnitude of the eigenvalues in decreasing order

An approximate reconstruction of the original image is obtained by adding up the first d subimages. Figure 3a shows five reconstructed images of the facial image of subject 40 in the ORL face database, obtained by adding the first d subimages (d = 2, 4, 6, 8, 10), i.e. Eigenfaces reconstructed using the top d eigenvectors. The reconstructed images become clearer as the number of subimages is increased.
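The subimage decomposition described above can be sketched as follows (an illustrative NumPy snippet with hypothetical names, assuming X holds the top eigenvectors as columns, ordered first):

```python
# Each subimage Ak = Yk Xk^T is a rank-one matrix; summing the first d of them
# gives the approximate reconstruction shown in Fig. 3a.
import numpy as np

def subimages(A, X):
    """A: (m, n) image; X: (n, d) eigenvector matrix. Returns d subimages."""
    Y = A @ X                                   # principal components, Eq. (2)
    return [np.outer(Y[:, k], X[:, k]) for k in range(X.shape[1])]

def reconstruct_from_subimages(A, X, d):
    # Approximate reconstruction: sum of the first d subimages
    return sum(subimages(A, X[:, :d]))
```

When all n orthonormal eigenvectors are kept (d = n), the sum reproduces A exactly, since A X Xᵀ = A; truncating to small d discards the weaker subimages, which is why the images in Fig. 3a sharpen as d grows.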

Fig. 3

a Some reconstructed images based on 2D PCA. b Comparison of reconstructed face images obtained in 2D PCA and 1D PCA of subject 01 for d = 10

For comparison, PCA (Eigenfaces) is also used to represent and reconstruct the same face image. Figure 3b compares the reconstructed face images of subject 01 for 2D PCA and 1D PCA (d = 10). PCA does not perform well in the reconstruction of this image; furthermore, the error between the reconstructed facial image and the original image is high.

Table 1 presents the top recognition accuracy of PCA and 2D PCA, with some random rearrangement of the original ORL face database, for different numbers of training samples. The performance of 2D PCA is better than that of PCA. The 2D PCA method is also superior to PCA in terms of computational efficiency for feature extraction: Table 2 indicates that feature extraction by 2D PCA takes much less time. As the number of training samples per class is increased, the relative gain of 2D PCA over PCA becomes more apparent.

Table 1 Comparison of the top recognition accuracy (%) of PCA against 2D PCA
Table 2 Comparison of CPU time (s) for feature extraction using the ORL database (CPU: Intel Core i3 2.53 GHz, RAM: 4 GB)

The performance of 2D PCA is also compared with other methods, including Fisherfaces [4, 5], ICA [6], and Kernel Eigenfaces [7] without any rearrangement or random shuffle in original ORL face database. In these comparisons, two experimental strategies are adopted.

One strategy uses the first five images per class for training; the other is the leave-one-out strategy, in which one image of a person is removed from the data set and all of the remaining images are used for training. The experimental results under both strategies are listed in Table 3; the recognition rate of 2D PCA is better than that of the other methods. The dimensionality of 2D PCA appears higher, but the different algorithms are in fact compared after the feature vectors are projected into 2D space according to Eq. (2).

Table 3 Comparison of 2D PCA with other methods using the ORL database

Experiment on the Senthil Database

The Senthil face database contains 80 colour face images of five people (all men), including frontal views of faces with different facial expressions, occlusions and brightness conditions. Each person has 16 different images. The face portion of each image is manually cropped to 140 × 188 pixels and then normalized. The normalized images of one person are shown in Fig. 4.

Fig. 4

Sample images for one subject of the Senthil database

Figure 4c, d, g, k, l, o involve variations in facial expressions. Figure 4a, b, e, f, h–j, m–p involve variations in pose. The top recognition accuracy and the time consumed for feature extraction are listed in Table 4. Again 2D PCA is more efficient and effective than PCA.

Table 4 Comparison of the PCA with 2D PCA for Senthil face database

PCA and 2D PCA are compared under varying facial expressions, pose and brightness conditions. The mean square error (MSE) between the original test facial images and the reconstructed test facial images is plotted for PCA and 2D PCA and compared in Fig. 5. Figure 6 shows the MSE for 2D PCA for different numbers of selected feature vectors.

Fig. 5

Comparison of MSE in dB between 1D PCA and 2D PCA

Fig. 6

MSE between original test images and reconstructed test images in Senthil database for different feature vectors for 2D PCA

Experiment on the Yale Database

The last experiment is performed using the Yale face database, which contains 165 images of 15 individuals (each person has 11 different images) under various facial expressions and lighting conditions.

Each image, of original size 243 × 320 pixels, is used without cropping. In this experiment both strategies are again adopted: using the first five images per class for training, and the leave-one-out strategy. The experimental results using 2D PCA, PCA (Eigenfaces), ICA, and Kernel Eigenfaces are listed in Table 5. The recognition rate of 2D PCA is superior to those of PCA, ICA, and Kernel Eigenfaces.

Table 5 Comparison of the performance of 2D PCA, Eigenfaces, ICA, and Kernel Eigenfaces using the Yale database

Conclusion

In this paper, the 2D PCA model is compared with the other conventional face recognition models. It has many advantages over conventional PCA (Eigenfaces). In the first place, since 2D PCA is based on the image matrix, it is simpler and more straightforward to use for image feature extraction.

Second, 2D PCA outperforms PCA, FDA, ICA and KPCA in terms of recognition accuracy in all experiments. Third, 2D PCA is computationally more efficient than PCA and can improve the speed of image feature extraction significantly.

Image representation and recognition based on PCA or 2D PCA is statistically dependent on the evaluation of the covariance matrix. The advantage of 2D PCA over PCA is that the former evaluates the covariance matrix more accurately.

Finally, the 2D PCA model has two disadvantages. First, when a small number of principal components of PCA are used to represent an image, the mean square error between the approximation and the original pattern is minimal, whereas 2D PCA needs more coefficients for image representation than PCA. Second, 2D PCA takes more recognition time than the other conventional recognition models for small face databases such as the Senthil face database (which has fewer than 100 facial images), as shown in Table 6.

Table 6 Comparison of recognition time in seconds for 1D PCA and 2D PCA for Senthil face database