1 Introduction

The face clustering problem has found many applications in emerging areas such as forensics and surveillance, smartphones, and social media. It can be described as follows: given a set of face images, a clustering technique separates them into groups, where each group shares common characteristics (faces of the same person). The greater the similarity within a group (same person) and the greater the difference between groups (different persons), the better the clustering. The problem can be tackled in different ways: different face representations can be considered, and different clustering algorithms can be applied. A good model should be used to represent a face, and a good clustering technique with a powerful similarity measure should be applied to this model in order to obtain the best possible clustering of a set of faces.

1.1 Related work

In this paper, we tackle the face clustering problem by using a set of Gabor filters as the face representation and an effective clustering algorithm, affinity propagation, which groups the faces automatically. We replace the popular Euclidean similarity measure with a more powerful one, the structural similarity (SSIM) index.

Different clustering techniques use different similarity measures. K-means [19] is the simplest clustering algorithm; it minimizes the within-cluster sum of squares (Euclidean distance), and the number of clusters must be specified. Hierarchical clustering algorithms fall into two categories: top-down and bottom-up. Bottom-up algorithms treat each data point as a single cluster at the outset and then successively merge (agglomerate) pairs of clusters until all data points belong to a single cluster; bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering (HAC). The resulting hierarchy of clusters is represented as a tree (dendrogram). In spectral clustering [26], the clustering problem is reformulated on a similarity graph: we seek a partition of the graph such that the edges between different clusters have very low weight (points in different clusters are dissimilar from each other) and the edges within a group have high weight (points within the same cluster are similar to each other). All of these clustering techniques have been applied to face clustering.
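All three families described above are available in common libraries. The following sketch contrasts them on toy 2-D data using scikit-learn; the dataset, cluster counts, and parameters are illustrative assumptions, not the settings used in our experiments:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering

# Two well-separated toy blobs in 2-D (20 points each)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (0.0, 3.0)])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)       # needs K up front
hac = AgglomerativeClustering(n_clusters=2).fit_predict(X)                # bottom-up HAC
spec = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)    # similarity-graph cut
```

On data this well separated, all three assignments agree up to a relabeling of the two groups.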

In [6], spectral clustering with local gradients was developed and evaluated on the Yale-B and PIE-66 datasets. A 2D-HMM with hierarchical clustering was used in [23] on a database of 1500 face images of 8 persons. LBP face features together with color and texture body features were used with spectral clustering in [3] on a database of 400 face images of 5 persons. In [15], the authors used probabilistic clustering. Hierarchical clustering with a rank-order distance function was applied in [27]. Learning-based descriptors were developed in [2]. Subspace clustering was applied in [16] on the extended Yale-B database. Wang et al. [18] proposed a K-NN graph construction method. [1] and [11] used hierarchical clustering to organize face databases, and [14] applied face clustering to video frames. Deep learning was used in [12], and [9] clustered billions of images with large-scale nearest-neighbor search.

1.2 Contributions

Our contributions to the face clustering problem are the following:

  • Selection of a fast and accurate clustering algorithm: affinity propagation, which does not require the number of clusters to be specified a priori.

  • Selection of an accurate image similarity measure: the structural similarity index (SSIM), a perception-based model that considers image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including luminance masking and contrast masking.

  • Selection of accurate face features: Gabor filters, which model the responses of simple cells in the primary visual cortex by extracting important image features at different orientations and scales.

2 Background

2.1 Affinity propagation

Affinity propagation (AP) is a clustering algorithm based on the concept of “message passing” between data points [4]. It supports similarities that are not symmetric or that do not satisfy the triangle inequality, it is deterministic, and it does not require the number of clusters to be determined before running the algorithm. A collection of real-valued similarities between data points is the input to affinity propagation. The most common similarity measure is the negative squared Euclidean distance: for points xi and xk, s(i,k) = −||xi − xk||² is taken as input. The preferences s(k,k) are usually set to the median of the input similarities.
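As a concrete illustration, the similarity input described above (negative squared Euclidean distances off the diagonal, with the preferences s(k,k) set to the median of the input similarities) can be built as follows; the helper name is our own:

```python
import numpy as np

def similarity_matrix(X):
    """Build the AP input: s(i,k) = -||xi - xk||^2, s(k,k) = median similarity."""
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=2)            # negative squared Euclidean distances
    off_diag = S[~np.eye(len(X), dtype=bool)] # all s(i,k) with i != k
    np.fill_diagonal(S, np.median(off_diag))  # preferences = median of similarities
    return S
```

A larger (less negative) preference favors more exemplars, and hence more clusters.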

There are two kinds of message exchanged between data points:

  • The “responsibility” matrix R has values r(i, k) that quantify how well-suited xk is to serve as the exemplar for xi, relative to other candidate exemplars for xi.

  • The “availability” matrix A contains values a(i, k) that represent how “appropriate” it would be for xi to pick xk as its exemplar, taking into account other points’ preference for xk as an exemplar.

Both matrices are initialized to all zeroes.

The algorithm then performs the following updates recursively (Eqs. 1 and 2).

$$ r\left(i,k\right)\leftarrow s\left(i,k\right)-\max_{k^{\prime}\ne k}\left\{a\left(i,{k}^{\prime}\right)+s\left(i,{k}^{\prime}\right)\right\} $$
(1)
$$ a\left(i,k\right)\leftarrow \min \left(0,\;r\left(k,k\right)+\sum_{i^{\prime}\notin \left\{i,k\right\}}\max \left(0,r\left({i}^{\prime },k\right)\right)\right)\quad \text{for } i\ne k $$
(2)

To limit the influence of strong incoming positive responsibilities, the total sum is thresholded so that it cannot exceed zero. The “self-availability” a(k,k) is updated differently (Eq. 3):

$$ a\left(k,k\right)\leftarrow \sum \limits_{i^{\prime}\ne k}\max \left(0,r\left({i}^{\prime },k\right)\right) $$
(3)
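A minimal sketch of the update rules of Eqs. (1)–(3), assuming a precomputed similarity matrix S whose diagonal already holds the preferences; the damping factor and iteration count are conventional illustrative choices rather than values prescribed here:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iterations=100):
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(iterations):
        # Eq. (1): r(i,k) <- s(i,k) - max_{k' != k} {a(i,k') + s(i,k')}
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx].copy()
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new  # damped update
        # Eqs. (2)-(3): availabilities from positive responsibilities
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())       # keep r(k,k) unclamped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new.diagonal().copy()             # Eq. (3): self-availability
        A_new = np.minimum(A_new, 0)             # threshold at zero (Eq. 2)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    # Exemplars: points k where r(k,k) + a(k,k) > 0
    exemplars = np.where((R + A).diagonal() > 0)[0]
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels
```

Each point is finally assigned to the exemplar to which it is most similar.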

2.2 Structural similarity index

SSIM [17] is primarily used to measure the similarity between two images. The method allows the metric to adapt to the local statistical characteristics at different image locations, and it combines very good accuracy with low complexity.

The structural similarity index is the product of three terms: the luminance term, the contrast term, and the structure term. It is given by Eq. (4).

$$ \mathrm{SSIM}\left(x,y\right)=l{\left(x,y\right)}^{\alpha}\cdot c{\left(x,y\right)}^{\beta}\cdot s{\left(x,y\right)}^{\gamma } $$
(4)
$$ \text{where}\quad l\left(x,y\right)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1} $$
(5)
$$ c\left(x,y\right)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2} $$
(6)
$$ s\left(x,y\right)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3} $$
(7)

If α = β = γ = 1 and C3 = C2/2, the index simplifies to Eq. (8):

$$ \mathrm{SSIM}\left(x,y\right)=\frac{\left(2\mu_x\mu_y+C_1\right)\left(2\sigma_{xy}+C_2\right)}{\left(\mu_x^2+\mu_y^2+C_1\right)\left(\sigma_x^2+\sigma_y^2+C_2\right)} $$
(8)
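For illustration, the simplified index of Eq. (8) can be computed globally over two images as below. Note that this is a deliberate simplification: the reference SSIM is computed over local windows and averaged, whereas this sketch uses global image statistics; the constants K1 and K2 follow common convention.

```python
import numpy as np

def ssim(x, y, data_range=255.0, K1=0.01, K2=0.03):
    """Global (single-window) version of Eq. (8); the reference metric is windowed."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1 = (K1 * data_range) ** 2
    C2 = (K2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                 # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()       # sigma_xy
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

Identical images score exactly 1; any degradation pushes the score below 1.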

2.3 Gabor filtering

Gabor filters [5] are a set of band-pass filters used to extract important features from an image. The idea is to convolve the image with a set of filters to extract structures of different sizes and orientations. A Gabor filter is a sinusoidal plane wave of a particular frequency and orientation, modulated by a Gaussian envelope. It can be represented by Eq. (9):

$$ h\left(x,y\right)=g\left({x}^{\prime },{y}^{\prime}\right)\exp \left(2\pi if{x}^{\prime}\right) $$
(9)

where g(x′, y′) is the 2D Gaussian given by Eq. (10):

$$ g\left({x}^{\prime },{y}^{\prime}\right)=\exp \left[-\frac{1}{2}\,\frac{{x}^{\prime 2}+{y}^{\prime 2}}{\sigma^2}\right] $$
(10)

(x′, y′) = (x cos θ + y sin θ, −x sin θ + y cos θ) are the coordinates (x, y) rotated by θ.

The parameters f and θ represent the frequency and orientation of the signal. In the spatial domain, the (real) Gabor function is represented by Eq. (11):

$$ h\left(x,y\right)=\frac{1}{2\pi\sigma^2}\exp \left[-\frac{1}{2}\left(\frac{x^2+y^2}{\sigma^2}\right)\right]\cos \left(2\pi fx\right) $$
(11)

The Fourier transform of the Gabor filter is given by Eq. (12)

$$ H\left(u,v\right)=\exp \left[-2\pi^2\sigma^2\left({\left(u-f\right)}^2+v^2\right)\right]+\exp \left[-2\pi^2\sigma^2\left({\left(u+f\right)}^2+v^2\right)\right] $$
(12)

When a Gabor filter is applied to an image, it gives the highest response at the edges and at points where the texture changes.

The θ parameter determines what kind of features the filter responds to. For example, setting θ to zero makes the filter responsive only to horizontal features. So, in order to obtain features at various angles in an image, we divide the interval between 0 and 180 degrees into 16 equal parts and compute a Gabor kernel for each resulting value of θ.

The wavelength λ governs the width of the stripes of the Gabor function: increasing the wavelength produces thicker stripes and decreasing it produces thinner ones. For example, keeping the other parameters unchanged and increasing λ to 60 and then 100 makes the stripes progressively thicker.
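The Gabor bank of Eqs. (9)–(10) can be sketched directly in code; here we take the real part of the complex carrier, and the kernel size, σ, and wavelength values are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(frequency, theta, sigma=4.0, size=31):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier (Eqs. 9-10)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinates by theta
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + y_r ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * frequency * x_r)  # real part of exp(2*pi*i*f*x')
    return envelope * carrier

thetas = [k * np.pi / 8 for k in range(8)]   # 8 orientations over [0, pi)
wavelengths = [4, 6, 8, 12, 16]              # 5 illustrative wavelengths (f = 1/lambda)
bank = [gabor_kernel(1.0 / lam, th) for th in thetas for lam in wavelengths]
```

Convolving an image with each of the 40 kernels yields one response map per orientation/wavelength pair.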

Figure (1) shows the Gabor filters for 8 orientations and 5 wavelengths.

Fig. 1

Gabor filters for 8 orientations and 5 wavelengths

2.4 F-measure

To assess the performance of the clustering algorithms, we have used the F-measure [22]. It can be described as follows:

  • Positive (P): same cluster.

  • Negative (N): different clusters.

  • True (T): same class.

  • False (F): different classes.

  • TP: Number of item pairs that are in the same cluster and belong to the same class.

  • FP: Number of item pairs that are in the same cluster but belong to different classes.

  • TN: Number of item pairs that are in different clusters and belong to different classes.

  • FN: Number of item pairs that are in different clusters, but belong to the same class.

Precision (Eq. (13)) is calculated as the fraction of pairs correctly put in the same cluster, recall (Eq. (14)) is the fraction of actual pairs that were identified, and F-measure (Eq. (15)) is the harmonic mean of precision and recall.

$$ precision=\frac{TP}{TP+FP} $$
(13)
$$ recall=\frac{TP}{TP+ FN} $$
(14)
$$ F=\frac{2\ast precision\ast recall}{precision+ recall} $$
(15)
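The pairwise counting behind Eqs. (13)–(15) can be sketched as follows (the function name is our own):

```python
from itertools import combinations

def pairwise_f_measure(clusters, classes):
    """Pairwise precision, recall, and F-measure of a clustering vs. true classes."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            tp += 1        # pair correctly placed together
        elif same_cluster:
            fp += 1        # pair wrongly placed together
        elif same_class:
            fn += 1        # pair wrongly separated
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

A perfect clustering scores 1.0 on all three quantities.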

3 Algorithm

Our algorithm can be summarized as follows:

  1. For each face image face(x,y), compute the Gabor outputs for 8 orientations and 5 wavelengths. Every M×N face image is thus represented by an augmented image facegabor(x,y) of size M1×N1, where M1 = 8M and N1 = 5N, as shown in Fig. 2.

  2. Compute the structural similarity index (SSIM) between all pairs of the resulting images: s(i,k) = SSIM(facegabor(i), facegabor(k)).

  3. Apply the affinity propagation algorithm to the augmented images.

  4. Obtain the face clusters and evaluate them with the F-measure.
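Under the assumption that scikit-image and scikit-learn are available, the steps above can be sketched end to end as below; the Gabor frequencies and other parameters are illustrative choices, not the exact configuration used in our experiments:

```python
import numpy as np
from skimage.filters import gabor
from skimage.metrics import structural_similarity
from sklearn.cluster import AffinityPropagation

def gabor_augment(face, n_orient=8, frequencies=(0.15, 0.2, 0.25, 0.3, 0.35)):
    """Step 1: tile the 8 x 5 Gabor responses into one augmented image."""
    rows = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        rows.append(np.hstack([gabor(face, frequency=f, theta=theta)[0]
                               for f in frequencies]))
    return np.vstack(rows)

def cluster_faces(faces):
    augmented = [gabor_augment(f) for f in faces]
    drange = float(np.ptp(np.stack(augmented)))  # shared dynamic range for SSIM
    n = len(augmented)
    S = np.ones((n, n))
    for i in range(n):          # Step 2: pairwise SSIM similarity matrix
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = structural_similarity(
                augmented[i], augmented[j], data_range=drange)
    # Step 3: affinity propagation on the precomputed similarities
    return AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(S)
```

The returned labels can then be scored against the ground-truth identities with the pairwise F-measure of Section 2.4.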

Fig. 2

Gabor outputs for 8 orientations and 5 wavelengths

4 Experiments

To show the effectiveness of our approach, we compared it with:

  • K-means, which requires the number of clusters to be specified.

  • Spectral clustering, which also requires the number of clusters to be specified a priori.

  • Agglomerative clustering, which does not require the number of clusters to be specified.

  • Affinity propagation using the Euclidean distance as the similarity measure.

  • Conditional Pairwise Clustering (ConPac).

  • Our approach, which uses SSIM as the similarity measure.

We have used the following databases:

  • Labeled Faces in the Wild (LFW) [7]. This database contains 13,233 images of 5749 subjects. We have used the entire database. The results are shown in Table 1.

  • IARPA JANUS B (IJB-B) [20]: the IJB-B dataset contains 11,754 images of 1845 subjects. We have used two subsets (IJBB128 and IJBB1024) with 128 and 1024 subjects, respectively. The results are shown in Tables 2 and 3.

  • The Extended Yale-B face database [10] contains frontal face images of 38 persons captured under 9 different illuminations. The results are shown in Table 4.

Table 1 Comparison of our algorithm and other clustering algorithms on the LFW dataset
Table 2 Comparison of our algorithm and other clustering algorithms on the IJBB128
Table 3 Comparison of our algorithm and other clustering algorithms on the IJBB1024
Table 4 Comparison of our algorithm and other clustering algorithms on the extended Yale database

For K-means and spectral clustering, the number of clusters must be specified a priori. For example, in Table 1, corresponding to the LFW database, we chose two values of K: the first, K = 5749, is the total number of subjects (F-measure = 0.098), and the second, K = 500, gave an F-measure of 0.346. We followed the same approach for the spectral clustering algorithm.

The other four techniques (agglomerative clustering, affinity propagation, ConPac, and our algorithm) determine the number of clusters automatically. We also compared the usual affinity propagation, applied directly to the images, with our approach. Our approach achieved the highest F-measure.

All of our simulations were run in Matlab R2017a on an Intel(R) Core™ i7-4510 CPU at 2 GHz. The additional time complexity of our algorithm compared to the regular affinity propagation algorithm is the computation of the Gabor filter outputs, which takes 0.101 s per image.

5 Conclusions and future directions

In this paper, we proposed a novel clustering algorithm, Gabor face clustering using affinity propagation and the structural similarity index. It clusters face images without knowing the number of persons a priori. It is based on a face representation using a set of Gabor filters, affinity propagation clustering, and the structural similarity index, a very powerful method for measuring the similarity between two images. To show the effectiveness of our approach, we compared it with other clustering algorithms on challenging face databases. In future work, we will extend our ideas to other multimedia applications, such as feature selection [8], image segmentation [21], and haze removal [24, 25].