
1 Introduction

There are many types of personal authentication systems, and face recognition has been one of the active research areas for the last several decades. Several methods have been proposed to recognize faces [1–3]. There are two main categories of face recognition methods: feature-based and appearance-based [1]. In appearance-based methods, a face image of size N × N pixels is represented by a vector in an \( N^{2} \)-dimensional space. In practice, these spaces are too large to perform robust and fast recognition of faces. To solve this problem, dimensionality reduction is performed using the Principal Component Analysis (PCA) technique. PCA was first used to represent face images by Sirovich and Kirby in 1987 [4]. Turk and Pentland applied PCA to face recognition and presented the eigenfaces method in 1991 [5]. We study the effect of 15 similarity measures on the performance of face recognition using PCA. The following characteristics are used to measure system performance: the area above the cumulative match characteristic (CMC) curve, the recognition rate, and the percentage of images that must be extracted to achieve 100 % cumulative recognition.

The organization of the paper is as follows: In Sect. 2, we present face recognition using the PCA technique in detail. Various similarity measures are described in Sect. 3. In Sect. 4, the experimental work and the results obtained are presented. Section 5 offers the conclusion.

2 Principal Component Analysis

We implemented face recognition using PCA as proposed by Turk and Pentland [5]. Let the gallery set of M face images be \( \Gamma_{1}, \Gamma_{2}, \ldots, \Gamma_{M} \). The average face image of the whole set is defined by

$$ \Psi = \frac{1}{M} \sum\nolimits_{i = 1}^{M} {\Gamma _{i} } $$
(1)

Each face image differs from the average face \( \Psi \) by the vector \( \phi_{i} = \Gamma_{i} - \Psi \), where i = 1 to M. The covariance matrix C is then found as

$$ C = A A^{T} ,\;{\text{where}}\;{\text{matrix}}\;A = [\phi_{1} \,\phi_{2} \ldots \phi_{M} ]. $$
(2)

Matrix C is of size \( N^{2} \times N^{2} \), so finding its \( N^{2} \) eigenvectors is computationally expensive. Therefore, we form the M × M matrix \( L = A^{T} A \) and obtain its M eigenvectors \( v_{i} \). The most significant M eigenvectors of C are then found as:

$$ u_{l} = \sum\nolimits_{k = 1}^{M} {v_{lk} \phi_{k} }, \quad l = 1, \ldots, M $$
(3)
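
This shortcut works because eigenvectors of L carry over to C. A one-line check, using only the definitions above:

$$ A^{T}A\,v_{l} = \mu_{l} v_{l} \quad \Rightarrow \quad AA^{T}\left( {A v_{l} } \right) = A\left( {\mu_{l} v_{l} } \right) = \mu_{l} \left( {A v_{l} } \right), $$

so \( u_{l} = A v_{l} \), which Eq. (3) expands column by column, is an eigenvector of \( C = AA^{T} \) with the same eigenvalue, at the cost of an M × M rather than an \( N^{2} \times N^{2} \) eigendecomposition.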

From these M eigenvectors, we keep the M′ (< M) eigenvectors with the M′ largest eigenvalues. A new probe image Γ is projected into the face space using the operation:

$$ w_{k} = u_{k}^{T} (\Gamma - \Psi ) $$
(4)

for k = 1, … , M′.

These values of w form the projection vector \( \Omega = [w_{1}, w_{2}, \ldots, w_{M'}] \). The probe image is then classified as belonging to the closest face class using a similarity measure.
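
The training and projection steps above fit in a short sketch. The following Python/NumPy code is only illustrative (the paper's experiments were implemented in MATLAB, see Sect. 4); the function names and array layout are our own assumptions.

```python
import numpy as np

def train_pca(gallery, num_components):
    """Eigenfaces training following Eqs. (1)-(3).

    gallery: (M, N*N) array, one flattened face image per row.
    num_components should be at most M - 1 (see Sect. 4).
    Returns the mean face, the top eigenfaces, and their eigenvalues.
    """
    mean_face = gallery.mean(axis=0)           # average face, Eq. (1)
    A = (gallery - mean_face).T                # columns are phi_i, shape (N*N, M)
    L = A.T @ A                                # M x M surrogate of C = A A^T, Eq. (2)
    eigvals, V = np.linalg.eigh(L)             # eigh: L is symmetric
    order = np.argsort(eigvals)[::-1]          # decreasing eigenvalue order
    eigvals, V = eigvals[order], V[:, order]
    U = A @ V[:, :num_components]              # u_l = A v_l, Eq. (3)
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenface
    return mean_face, U, eigvals[:num_components]

def project(face, mean_face, U):
    """Project a flattened face image into the face space, Eq. (4)."""
    return U.T @ (face - mean_face)            # Omega = [w_1, ..., w_M']
```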

3 Similarity Measures

Consider two feature vectors x and y of dimension n each. The distances between these feature vectors can be calculated as follows [6–11] (an illustrative code sketch of several of these measures is given after the list):

  1. City block distance (or Manhattan distance):

    $$ d\left( {x,y} \right) = \sum\nolimits_{i = 1}^{n} {\left| {x_{i} - y_{i} } \right|} $$
    (5)
  2. Euclidean distance:

    $$ d\left( {x,y} \right) = \sqrt {\sum\nolimits_{i = 1}^{n} {(x_{i} - y_{i} )^{2} } } $$
    (6)
  3. Squared Euclidean distance (sum square error, SSE):

    $$ d\left( {x,y} \right) = \sum\nolimits_{i = 1}^{n} {(x_{i} - y_{i} )^{2} }. $$
    (7)
  4. Mean square error (MSE):

    $$ d\left( {x,y} \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {(x_{i} - y_{i} )^{2} } $$
    (8)
  5. Cosine distance:

    $$ d\left( {x,y} \right) = - \frac{{\sum\nolimits_{i = 1}^{n} {x_{i} y_{i} } }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} \mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} } }} $$
    (9)
  6. Mahalanobis distance:

    $$ d\left( {x,y} \right) = \sqrt {(x - y)S^{ - 1} (x - y)^{t} } $$
    (10)

    where S is the covariance matrix of the distribution.

  7. Standardized Euclidean distance:

    $$ d\left( {x,y} \right) = \sqrt {(x - y)V^{ - 1} (x - y)^{t} } $$
    (11)

    where V is the n × n diagonal matrix whose jth diagonal element is \( S(j)^{2} \), S being the vector of standard deviations.

  8. Minkowski distance:

    $$ d\left( {x,y} \right) = (\sum\nolimits_{i = 1}^{n} {\left| {x_{i} - y_{i} } \right|^{p} } )^{1/p} .$$
    (12)

    where p is a scalar exponent, p > 0.

  9. Chebychev distance:

    $$ d\left( {x,y} \right) = \max_{i} \{ |x_{i} - y_{i} |\} $$
    (13)
  10. Correlation distance:

    $$ d\left( {x,y} \right) = - \frac{{n\sum\nolimits_{i = 1}^{n} {x_{i} y_{i} } - \sum\nolimits_{i = 1}^{n} {x_{i} } \sum\nolimits_{i = 1}^{n} {y_{i} } }}{{\sqrt {\left( {n\sum\nolimits_{i = 1}^{n} {x_{i}^{2} } - \left( {\sum\nolimits_{i = 1}^{n} {x_{i} } } \right)^{2} } \right)\left( {n\sum\nolimits_{i = 1}^{n} {y_{i}^{2} } - \left( {\sum\nolimits_{i = 1}^{n} {y_{i} } } \right)^{2} } \right)} }} $$
    (14)
  11. Canberra distance:

    $$ d\left( {x,y} \right) = \mathop \sum \nolimits_{i = 1}^{n} \frac{{|x_{i} - y_{i} |}}{{\left| {x_{i} } \right| + |y_{i} |}} $$
    (15)
  12. Modified SSE distance:

    $$ d\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n}(x_{i} - y_{i} )^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2}\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} }} $$
    (16)
  13. Modified Manhattan distance:

    $$ d\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} |x_{i} - y_{i} |}}{{\mathop \sum \nolimits_{i = 1}^{n} |x_{i} |\mathop \sum \nolimits_{i = 1}^{n} |y_{i} |}} $$
    (17)
  14. Weighted Modified SSE distance:

    $$ d\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} z_{i} \left( {x_{i} - y_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2}\mathop \sum \nolimits_{i = 1}^{n} y_{i}^{2} }},\quad z_{i} = \sqrt {1/\lambda_{i} } $$
    (18)

    where \( \lambda_{i} \) are the eigenvalues.

  15. Weighted Modified Manhattan distance:

    $$ d\left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} z_{i} \left| {x_{i} - y_{i} } \right|}}{{\mathop \sum \nolimits_{i = 1}^{n}|x_{i} | \mathop \sum \nolimits_{i = 1}^{n}|y_{i} |}},\quad z_{i} = \sqrt {1/\lambda_{i} } $$
    (19)
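
As a rough illustration of how several of these measures translate into code, the following Python/NumPy sketch implements four of them; the function names are our own, and the weighted variant takes the eigenvalues \( \lambda_{i} \) as an extra argument.

```python
import numpy as np

def city_block(x, y):                  # Eq. (5)
    return np.sum(np.abs(x - y))

def cosine_dist(x, y):                 # Eq. (9): negated, so smaller = more similar
    return -np.dot(x, y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

def canberra(x, y):                    # Eq. (15); assumes x_i, y_i never both zero
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))

def weighted_modified_sse(x, y, lam):  # Eq. (18), with z_i = sqrt(1 / lambda_i)
    z = np.sqrt(1.0 / lam)
    return np.sum(z * (x - y)**2) / (np.sum(x**2) * np.sum(y**2))
```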

In this paper we perform the identification task. The CMC curve graphically represents the performance of an identification system: it plots rank values on the X axis against the probability of correct identification at or below that rank on the Y axis [10, 12].
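
As an illustration, the CMC curve can be computed from the rank at which each probe's true identity is retrieved. This is a minimal Python/NumPy sketch under that assumption (1-based ranks); it is not taken from the paper.

```python
import numpy as np

def cmc_curve(correct_ranks, num_identities):
    """correct_ranks[i] = 1-based rank at which probe i's true identity
    is retrieved. Returns P(correct identification at rank <= r)
    for r = 1, ..., num_identities."""
    correct_ranks = np.asarray(correct_ranks)
    ranks = np.arange(1, num_identities + 1)
    return np.array([(correct_ranks <= r).mean() for r in ranks])
```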

4 Experiments and Results

We tested the performance of the system using the ORL face image database [13]. The face images were taken at the AT&T Laboratories between April 1992 and April 1994. The ORL database contains images of 40 different persons, with 10 images per person. The face images were taken at different times, with varying lighting, facial details, and facial expressions. All face images are frontal with some pose variation. Each image is of size 112 × 92 pixels with 256 gray levels.

For the experimental work, each face image is resized to 50 × 40 pixels. Training is done on the first five images per person and testing on the remaining five, giving a gallery set of 200 images and a probe set of 200 images. Experiments are implemented using MATLAB® R2013a. We used the nearest mean rule, in which a template is computed for each identity in the database and the identity whose template is closest to the probe is chosen as the match.
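
A minimal sketch of the nearest mean rule, again in illustrative Python/NumPy rather than the paper's MATLAB: we assume each identity's template is the mean of its gallery projection vectors, and `distance` is any measure from Sect. 3.

```python
import numpy as np

def build_templates(gallery_projs, labels):
    """One template per identity: the mean of its gallery projections."""
    return {ident: gallery_projs[labels == ident].mean(axis=0)
            for ident in np.unique(labels)}

def nearest_mean_identify(probe_proj, templates, distance):
    """Return the identity whose template is closest to the probe."""
    return min(templates, key=lambda ident: distance(probe_proj, templates[ident]))
```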

The experimental results with the 15 similarity measures are listed in Tables 1, 2 and 3. Face recognition performance is measured by the area above the cumulative match characteristic curve (CMCA); a smaller CMCA indicates better recognition performance. We also show how many images (in percent) must be extracted to reach cumulative recognition rates between 80 % and 100 %; a smaller value implies that fewer images need to be extracted to reach the required rate. Finally, we report the recognition rate achieved when only the closest match is extracted from the system; a higher value of this rank-1 rate indicates better results. In Tables 1, 2 and 3, subscripts mark the best results.
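
Continuing the `cmc_curve` sketch from Sect. 3, the three reported characteristics could be derived as below; the exact scaling of the CMCA values in the tables is our assumption.

```python
def cmca(cmc):
    """Area above the CMC curve: smaller = better."""
    return np.sum(1.0 - cmc)

def percent_extracted_for(cmc, target=1.0):
    """Smallest rank reaching the target cumulative rate, as a percent
    of identities. Assumes the target is actually reached."""
    rank = np.argmax(cmc >= target) + 1   # first index where cmc >= target
    return 100.0 * rank / len(cmc)

def rank1_rate(cmc):
    """Recognition rate when only the closest match is extracted."""
    return 100.0 * cmc[0]
```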

Table 1. Performance using 20 % of features (40).
Table 2. Performance using 60 % of features (120).
Table 3. Performance using 90 % of features (180).

Table 4 reports the similarity measures sorted with respect to system performance using the following characteristics: (i) recognition rate, (ii) overall recognition accuracy (i.e., CMCA), and (iii) the percentage of images extracted to reach 100 % cumulative identification.

Table 4. Sorted similarity measures with respect to the performance of the system.

The training set has 200 images (5 images per person). This produces 199 eigenvectors: since the mean-subtracted vectors \( \phi_{i} \) sum to zero, there are only M − 1 meaningful eigenvectors, the remaining ones having associated eigenvalues of zero [5]. Figure 1 shows the variation in system performance with the number of eigenvectors for the top six performers. Fewer images (30 %) need to be extracted to achieve 100 % cumulative recognition using the Cosine distance measure when 10–90 % of the eigenvectors are used, and using Correlation when 30–90 % are used. Performance with respect to rank increases up to 10 % and 12 % of the eigenvectors for the Cosine and Correlation similarity measures, respectively, and then stabilizes.

Fig. 1. Performance of the system with respect to the number of eigenvectors: (a) cumulative 100 % recognition, (b) recognition rate, (c) CMCA (Color figure online).

The system achieved the best CMCA using the Cosine similarity measure (336.25–364.38) when 10–20 % of the eigenvectors are used and 323.13 when 90 % are used, and using Correlation (322.5–323.75) when 30–60 % are used. Thus, for CMCA, the top performance is shown by the Correlation and Cosine similarity measures. Performance with these measures improves until approximately 30 % of the eigenvectors are used, and then almost stabilizes.

We achieved the largest recognition rates using Correlation (82.5–86 %) when 10–30 % of the eigenvectors are used, Cosine distance (85–86 %) when 20–30 % are used, and City block distance (86–87.5 %) when 30–90 % are used. The best recognition rate is achieved using the City block measure, whose recognition rate increases with the number of eigenvectors. The variation in performance indicates that selecting the similarity measure is a critical decision in designing a PCA-based face recognition system.

5 Conclusions

This paper investigates 15 different similarity measures for face recognition using PCA. We examined the performance of the system by varying the number of eigenvectors. The experiments were conducted on the ORL face database, which contains 400 face images. The best identification performance is obtained with the following similarity measures: Cosine, Correlation, and City block. Using the Cosine distance, fewer images need to be extracted to achieve 100 % cumulative recognition than with any other similarity measure. This research shows the effect of the similarity measure on the performance of the system. It is observed that, as the number of eigenvectors increased, recognition rates also increased; this observation is consistent with prior studies [10]. Performance of the system increased until roughly 30 % of the eigenvectors were used, after which it almost stabilized. Standardized Euclidean, Weighted Modified SSE, and Weighted Modified Manhattan were among the worst performers, with CMCA values of 510.7–1538, 687.5–1323.8, and 734.4–1516.9, respectively, for 10–90 % of the eigenvectors.