Abstract
During the last decade, sparse representations have been successfully applied to design high-performing classification algorithms such as the classical sparse representation based classification (SRC) algorithm. More recently, collaborative representation based classification (CRC) has emerged as a very powerful approach, especially for face recognition. CRC builds on SRC through the notion of collaborative representation, relying on the observation that the collaborative property is more crucial for classification than the \({{l}_{1}}\)-norm sparsity constraint on coding coefficients used in SRC. This paper follows the same general philosophy as CRC, and its main novelty is a virtual collaborative projection (VCP) routine that represents the training images of every class against the other classes to improve fidelity before the projection of the query image. We combine this routine with a local feature extraction method based on high-order statistical moments to further improve the representation. We demonstrate through extensive face recognition and classification experiments that our approach performs very competitively with respect to state-of-the-art classification methods. For instance, on the AR face dataset, our method reaches 100% accuracy at dimensionality 300.
1 Introduction
One of the main challenges of current research in pattern recognition (PR) is to improve the robustness of existing algorithms with respect to confounding factors including noise, rigid transformations, changes in viewpoint, illumination, etc. Recent advances in statistical learning [1] have brought attention to the notion of sparsity as a way to extract the salient image features and obtain more accurate and robust classification. Wright et al. [2], in particular, introduced a very influential framework called sparse representation based classification (SRC) for face recognition (FR) and successfully applied this method to identify human faces under varying illumination, occlusion and real disguise. In their method, a test sample image is coded as a sparse linear combination of the training images, and classification is achieved by identifying which class yields the least residual. Several other methods were inspired by SRC, including: the FR method based on sparse representation of facial image patches by Theodorakopoulos et al. [3]; kernel sparse representation for image classification and FR, which applies a sparse coding technique in a high dimensional feature space via some implicit feature mapping [4]; the Gabor occlusion dictionary for SRC by Yang and Zhang, which reduces the computational cost by using Gabor features [5]; a robust regularized coding model to enhance the robustness of face recognition to confounding factors [6, 7]; and the method based on the maximum correntropy criterion for robust face recognition by He et al. [8]. An alternative point of view was proposed by Zhang et al. [9], who argued that rather than sparsity, "the collaborative representation mechanism used in SRC is much more crucial to its success of face classification".
Based on this observation, they introduced a method called collaborative representation based classification with regularized least square (CRC) [9], which was shown to perform very competitively against SRC at a lower computational cost. As a further refinement of CRC, some of the authors proposed a method called relaxed collaborative representation (RCR), designed to better capture the similarity and distinctiveness of different features for classification [10]. An alternative approach is the two-phase test sample representation method [11], which relies on first discarding the training samples located far from the test sample (assuming they have a negligible effect on classification); the test sample is then represented as a linear combination of the M nearest neighbors, and the representation result is used for classification. Another method, proposed in [12], consists in partitioning face images into blocks and then creating an indicator to remove the contaminated blocks and choose the nearest subspaces; SRC is finally used to classify the occluded test sample in the new feature space.
We also recall the Fisher discrimination dictionary learning (FDDL) algorithm by Yang et al. [13], which embeds the Fisher criterion in the objective function design. The FDDL scheme has two remarkable properties. First, dictionary atoms are learnt to associate the class labels so that the reconstruction residual from each class can be used in classification; second, the Fisher criterion is imposed on the coding coefficients so that they carry discriminative information for classification. To improve this method, Feng et al. [14] proposed to jointly learn the projection matrix for dimensionality reduction and the discriminative dictionary for face representation (JDDLDR). The joint learning combines the learned projection and the dictionary more effectively, with the result of improving FR performance. Within the general framework of discriminative dictionary learning (DDL), the projective dictionary pair learning (DPL) algorithm [15] jointly learns a synthesis dictionary and an analysis dictionary to achieve the goals of signal representation and discrimination. The support vector guided dictionary learning (SVGDL) method proposed in [16] generalizes the Fisher discrimination dictionary learning (FDDL) method, which corresponds to the special case where the weights are determined by the numbers of samples of each class; SVGDL instead uses a parameterization method to adaptively determine the weight of each pair of coding vectors. Compared with FDDL, SVGDL can therefore adaptively assign different weights to different pairs of coding vectors.
Yet another recently proposed DDL approach is the locality constrained and label embedding dictionary learning (LCLE-DL) algorithm [17], where locality information is preserved using the graph Laplacian matrix of the learned dictionary rather than the conventional one derived from the training samples; the label embedding term is then constructed using the label information of atoms instead of the classification error term; the coding coefficients derived by combining locality-based and label-based reconstruction are shown to be very effective for image classification. Very recently, a probabilistic interpretation of the collaborative representation mechanism was proposed to explain the classification mechanism of CRC; following this analysis, a method called probabilistic collaborative representation based classifier (ProCRC) was introduced, which jointly maximizes the likelihood that a test sample belongs to each of the multiple classes [18].
On the other hand, a class of algorithms described as local feature based methods [19–28] has also demonstrated very promising results in object recognition and texture classification. For instance, some of these methods use Gabor filters to extract local directional features on multiple scales and have been successfully applied in FR [20, 21]. Compared to more conventional methods such as Eigenface [29] and Fisherface [30], Gabor filtering is less sensitive to image variations. Another type of local feature widely used in FR is the statistical local feature (SLF), such as the histogram of local binary patterns (LBP) [22], whose main principle is to model a face image as a composition of micro-patterns [28]. By partitioning the face image into several blocks, the statistical feature (e.g., histogram of LBP) of these blocks is extracted, and the description of the image is finally formed by concatenating the extracted features of all blocks. For example, Zhang et al. [24, 25] proposed to use the Gabor magnitude or phase map instead of the intensity map to generate LBP features. New coding techniques on Gabor features have also been proposed, e.g., Zhang et al. [26] extracted and encoded the global and local variations of the real and imaginary parts of the data using a multi-scale Gabor representation. Borgi et al. [31–35] proposed two algorithms that apply a sparse multiscale representation based on shearlets to extract the essential geometric content of facial features, one called regularized shearlet network (RSN) and the other sparse multi-regularized shearlet network (SMRSN). Finally, we recall that Meng et al. [36] proposed a kernel based representation model to fully exploit the discrimination information embedded in the statistical local features (SLF_RKR) and applied a robust regression method to handle occlusions in face images.
In this paper, we adopt the same general philosophy as CRC, and our main novel contribution is to integrate this method with a virtual collaborative projection (VCP) routine designed to represent the training images of every class against the other classes, with the goal of improving fidelity before projecting the query image. Additionally, inspired by the remarkable results reported in the recent literature on local feature based methods, our algorithm includes a routine to compute high-order statistical moments (SM) in order to extract highly discriminative local features and improve the data representation. To validate our algorithm, which is called statistical binary pattern with virtual competitive representation (SBP_VCP), we have tested it on multiple datasets for problems of face recognition, gender classification, handwritten digit recognition, object categorization and action recognition. Experimental results show that our method consistently achieves very competitive results compared to classical and state-of-the-art algorithms.
The rest of this paper is organized as follows. Section 2 introduces the main idea of statistical binary pattern and high order moments for feature extraction. Section 3 describes the proposed virtual collaborative projection applied to trained faces. Section 4 reports extensive numerical experiments to validate the proposed method and compare it against state-of-the-art methods on problems of face recognition under different confounding factors as well as image categorization, handwritten digit and action recognition. Finally, Sect. 5 concludes this paper.
2 Statistical binary pattern and high order moments
The statistical binary patterns (SBP) representation is an extension of local binary patterns (LBP) and it aims at enhancing the expressiveness and discrimination power of LBP for image modelling (especially texture) and recognition, while reducing sensitivity to small perturbations, e.g., noise. The main idea of this method, which was introduced by one of the authors and their collaborator in [37], consists in applying a rotation invariant uniform LBP to a set of images corresponding to the local statistical moments associated to a given spatial support. The resulting code forms the SBP and an image is then represented by joint or marginal distributions of SBPs.
2.1 Moment images
A real valued 2D discrete image f is modelled as a mapping from \({{\mathbb{Z}}^{2}}\) to \(\mathbb{R}\). The spatial support used to calculate the local statistics is modelled as \(B\subset {{\mathbb{Z}}^{2}}\), such that \(O\in B\), where O is the origin of \({{\mathbb{Z}}^{2}}\). The r-order moment image associated to f and B is also a mapping from \({{\mathbb{Z}}^{2}}\) to \(\mathbb{R}\), defined as:

\[m_{(f,B)}^{r}(z)=\frac{1}{\left| B \right|}\sum\limits_{b\in B}{{{\left( f(z+b) \right)}^{r}}}\quad (1)\]

where z is a pixel from \({{\mathbb{Z}}^{2}}\), and \(\left| B \right|\) is the cardinality of the structuring element B. Accordingly, the r-order centered moment image (r > 1) is defined as:

\[\mu _{(f,B)}^{r}(z)=\frac{1}{\left| B \right|}\sum\limits_{b\in B}{{{\left( f(z+b)-m_{(f,B)}^{1}(z) \right)}^{r}}}\quad (2)\]

where \(m_{(f,B)}^{1}(z)\) is the average value (1-order moment) calculated around z. Finally, the r-order normalized centered moment image (r > 2) is defined as:

\[\beta _{(f,B)}^{r}(z)=\frac{\mu _{(f,B)}^{r}(z)}{{{\left( \mu _{(f,B)}^{2}(z) \right)}^{r/2}}}\quad (3)\]

where \(\mu _{(f,B)}^{2}(z)\) is the variance (2-order centered moment) calculated around z.
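As an illustration, the first two moment images can be computed with plain numpy. The sketch below (our own code, not the authors' implementation) uses a square support B of side 2·radius + 1 and edge padding at the borders, both of which are implementation choices not prescribed by the paper:

```python
import numpy as np

def moment_images(f, radius=1):
    """Local 1-order moment m1 (mean) and 2-order centered moment mu2
    (variance) over a square support B of side 2*radius + 1.

    Boundary handling uses edge padding (an assumption, not from the paper).
    """
    f = f.astype(np.float64)
    pad = np.pad(f, radius, mode="edge")
    h, w = f.shape
    acc = np.zeros_like(f)
    acc2 = np.zeros_like(f)
    n = (2 * radius + 1) ** 2                # |B|
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            win = pad[dy:dy + h, dx:dx + w]  # shifted copy of f over B
            acc += win
            acc2 += win * win
    m1 = acc / n                             # formula (1), r = 1
    mu2 = acc2 / n - m1 * m1                 # formula (2), r = 2
    return m1, np.maximum(mu2, 0.0)          # clamp tiny negative round-off
```

Higher order (centered, normalized) moments follow the same shifting scheme, accumulating higher powers of the window values.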
2.2 Statistical binary patterns
Let R and P denote the radius of the neighborhood circle and the number of values sampled on the circle, respectively. For each moment image M, one statistical binary pattern is formed as follows:

- one (P + 2)-valued pattern corresponding to the rotation invariant uniform LBP coding of M:

\[SB{{P}_{P,R}}(M)=\left\{ \begin{array}{ll} \sum\nolimits_{p=0}^{P-1}{s({{M}_{p}}-{{M}_{c}})} & \text{if }U(M)\le 2 \\ P+1 & \text{otherwise} \end{array} \right.\quad (5)\]

where \({{M}_{c}}\) is the value at the centre pixel, \({{M}_{p}}\) \((0\le p<P)\) are the values sampled on the circle of radius R, and U(M) is the number of bitwise 0/1 transitions along the circular pattern;

- one binary value corresponding to the comparison of the centre value with the mean value of M:

\[SB{{P}_{C}}(M)=s({{M}_{c}}-\tilde{M})\quad (6)\]

where s denotes the pre-defined sign function, and \(\tilde{M}\) the mean value of the moment M on the whole image. Hence \(SB{{P}_{P,R}}(M)\) represents the structure of the moment M with respect to a local reference (the centre pixel), and \(SB{{P}_{C}}(M)\) complements this information with the relative value of the centre pixel with respect to a global reference (\(\tilde{M}\)). As a result of this first step, a \(2(P+2)\)-valued scalar descriptor is computed for every pixel of each moment image.
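The per-pixel coding can be sketched as follows for the setting {P;R} = {8;1}, using the 8 integer-offset neighbours as a simplifying approximation of circle sampling (the general case interpolates values on the circle); the code and variable names are ours:

```python
import numpy as np

def sbp_code(M, P=8, R=1):
    """Per-pixel SBP code of a moment image M for {P;R} = {8;1}.

    Combines the rotation invariant uniform LBP value (formula (5)) with
    the centre-vs-global-mean bit (formula (6)); output values lie in
    [0, 2*(P+2) - 1].
    """
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]   # circular neighbour order
    pad = np.pad(M.astype(np.float64), R, mode="edge")
    h, w = M.shape
    bits = np.stack([(pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] >= M)
                     for dy, dx in offs]).astype(np.int8)
    ones = bits.sum(axis=0)
    # U(M): number of 0/1 transitions along the circular pattern
    trans = np.abs(np.diff(bits, axis=0, append=bits[:1])).sum(axis=0)
    riu2 = np.where(trans <= 2, ones, P + 1)    # (P+2)-valued LBP^riu2
    center_bit = (M >= M.mean()).astype(np.int64)   # SBP_C vs global mean
    return riu2 + (P + 2) * center_bit
```

On a flat (constant) moment image every pattern is uniform, so all pixels receive the same code.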
2.3 Image descriptors
Let \({{\left\{ {{M}_{i}} \right\}}_{1\le i\le {{n}_{M}}}}\) be the set of \({{n}_{M}}\) computed moment images. \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}\) is defined as a vector valued image with \({{n}_{M}}\) components, such that for every \(z\in {{\mathbb{Z}}^{2}}\) and for every i, \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}{{(z)}_{i}}\) is a value between 0 and \(2(P+2)-1\). If the image f contains texture, the descriptor associated to f is formed by the histogram of the values of \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}\). We consider two kinds of histograms.
First, we consider the joint histogram H, defined as:

\[H({{k}_{1}},\ldots ,{{k}_{{{n}_{M}}}})=\left| \left\{ z\in {{\mathbb{Z}}^{2}}:\forall i,\ SB{{P}^{\left\{ {{M}_{i}} \right\}}}{{(z)}_{i}}={{k}_{i}} \right\} \right|\]

Depending on the size of the texture images, the joint distribution may become too sparse when the dimension (i.e., the number of moments) increases.
Next, we consider the marginal histograms \({{\{{{h}_{i}}\}}_{1\le i\le {{n}_{M}}}}\), defined as:

\[{{h}_{i}}(k)=\left| \left\{ z\in {{\mathbb{Z}}^{2}}:SB{{P}^{\left\{ {{M}_{i}} \right\}}}{{(z)}_{i}}=k \right\} \right|\]
An image descriptor can then be defined using the joint histogram H or the concatenation of the \({{n}_{M}}\) marginal histograms \(\{{{h}_{i}}\}\). The length of the descriptor vector is \({{[2(P+2)]}^{{{n}_{M}}}}\) in the first case and \(2{{n}_{M}}(P+2)\) in the second case.
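The two descriptor variants and their lengths can be illustrated with a short numpy sketch (our own code, not the authors' implementation), taking a list of per-moment SBP code maps as input:

```python
import numpy as np

def sbp_descriptor(codes, P=8, joint=True):
    """Image descriptor from a list of per-moment SBP code maps.

    joint=True  -> joint histogram, length (2*(P+2))**n_M
    joint=False -> concatenated marginal histograms, length 2*n_M*(P+2)
    """
    K = 2 * (P + 2)                              # number of SBP values per moment
    if joint:
        flat = np.zeros(codes[0].size, dtype=np.int64)
        for c in codes:                          # mixed-radix index into joint bins
            flat = flat * K + c.ravel().astype(np.int64)
        hist = np.bincount(flat, minlength=K ** len(codes))
    else:
        hist = np.concatenate(
            [np.bincount(c.ravel(), minlength=K) for c in codes])
    return hist / hist.sum()                     # normalized histogram
```

For P = 8 and two moment images, the joint descriptor has 20² = 400 bins while the marginal one has only 2 × 20 = 40, which makes the trade-off discussed in Sect. 2.4 concrete.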
2.4 Higher order moments
The SBP model on higher order moments is evaluated next. The objective of the SBP framework is to extend the LBP texture image descriptors from the local level, represented by the pixel z, to the regional distribution level of \(z+B\), by approximating the distribution by a set of statistical moments. It is known that the mean and variance describe a statistical distribution faithfully only in special cases, e.g., when it is a normal distribution. This assumption may fail for natural texture images. Therefore, higher order moments are needed to obtain an accurate description of a general distribution and capture the relevant information.
Regarding the size of the image descriptor, it clearly increases as the number of moments increases. When we use joint histograms, the descriptor size is \({{(2(P+2))}^{n}}\), where P is the number of neighbours used in LBP and n is the number of moment images. When we use marginal histograms, the size is only \(2n(P+2)\), but this comes at the price of a significant loss of information. Hence we propose a trade-off between descriptor size and information loss based on the concatenation of joint histograms corresponding to pairs of moment images.
Formally, we can recursively define the higher order SBP hybrid image descriptor as follows.
Let \({{M}_{1}}\) and \({{M}_{2}}\) be moments, or combinations of moments represented by their joint or concatenated histograms. We denote by \(SB{{P}^{{{M}_{1}}{{M}_{2}}}}\) (resp. \(SB{{P}^{{{M}_{1}}\_{{M}_{2}}}}\)) the image descriptor made of the joint (resp. concatenated) histograms constructed from \(SB{{P}^{{{M}_{1}}}}\) and \(SB{{P}^{{{M}_{2}}}}\). In the experiments on higher order moments below, we have only considered pairs of moments for joint histograms. The algorithm below summarizes the high order statistical binary pattern (SBP) computation:
The SBP Algorithm

Input: f—a 2D image; \(B\subset {{\mathbb{Z}}^{2}}\)—the spatial support used to calculate the local moments; P—the number of neighbours; R—the radius of the neighbouring circle.
Output: \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\)—texture descriptor of f.

Calculate moment images:
1. Calculate the first order moment image \({{m}_{1}}\) (or \(m_{(f,B)}^{1}\)) associated to f and B using the formula (1).
2. Calculate the second order centered moment image \({{\mu }_{2}}\) (or \(\mu _{(f,B)}^{2}\)) associated to f and B using the formula (2).

Statistical binary patterns:
1. Calculate the statistical binary patterns \(SB{{P}_{P,R}}\left( {{m}_{1}} \right)\) and \(SB{{P}_{C}}\left( {{m}_{1}} \right)\) from the first order moment image \({{m}_{1}}\), using the formulas (5) and (6).
2. Calculate the statistical binary patterns \(SB{{P}_{P,R}}\left( {{\mu }_{2}} \right)\) and \(SB{{P}_{C}}\left( {{\mu }_{2}} \right)\) from the second order centered moment image \({{\mu }_{2}}\), using the formulas (5) and (6).
3. Calculate \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) as the joint histogram of \(SB{{P}_{P,R}}\left( {{m}_{1}} \right)\), \(SB{{P}_{C}}\left( {{m}_{1}} \right)\), \(SB{{P}_{P,R}}\left( {{\mu }_{2}} \right)\) and \(SB{{P}_{C}}\left( {{\mu }_{2}} \right)\).
Figures 1 and 2 compare the recognition rates of the algorithms LBP, CLBP [38] and SBP. For this comparison, we used the Outex database [39], a large and comprehensive texture database which includes 24 classes of textures collected under three illuminations and at nine angles. To measure the dissimilarity between two histograms, we used the nearest neighbour classifier with the Chi square distance. We considered different configurations of SBP: in Fig. 1 we set the (P,R) value to (24,3); in Fig. 2 we used the values (8,1), (16,2) and (24,3).
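The Chi square dissimilarity and the nearest neighbour decision used in this comparison can be sketched as follows (our own code; the `eps` smoothing term is an implementation choice to avoid division by zero on empty bins):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi square dissimilarity between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(query, gallery, labels):
    """1-NN classification under the Chi square distance.

    gallery: list of descriptor histograms, labels: their class labels.
    """
    d = [chi2_distance(query, g) for g in gallery]
    return labels[int(np.argmin(d))]
```

The query is assigned the label of the gallery descriptor with the smallest Chi square distance.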
3 Virtual collaborative projection
Zhang et al. [9] investigated the role of collaboration between classes in representing the query sample. In order to collaboratively represent the query sample \(y\in {{\mathbb{R}}^{m}}\) over X (the matrix of all gallery images, where each column is a training sample) with low computational cost, they introduced a method called collaborative representation based classification with the regularized least square method (CRC_RLS). The underlying model of collaborative representation with \({{l}_{2}}\)-norm regularization is:

\[\hat{\alpha }=\arg {{\min }_{\alpha }}\left\{ \left\| y-X\alpha  \right\|_{2}^{2}+\lambda \left\| \alpha  \right\|_{2}^{2} \right\}\]

where \(\alpha\) is the coding vector \((\alpha =[{{\alpha }_{1}},\ldots ,{{\alpha }_{i}},\ldots ]\) and \(y\approx X\alpha )\) and \(\lambda\) is the regularization parameter.
The algorithm is described below:
The CRC-RLS Algorithm

1. Normalize the columns of X to have unit \({{l}_{2}}\)-norm.
2. Code y over X by \(\tilde{\alpha }=Py\), where \(P={{\left( {{X}^{T}}X+\lambda I \right)}^{-1}}{{X}^{T}}\).
3. Compute the regularized residuals \({{r}_{i}}={{\left\| y-{{X}_{i}}{{{\tilde{\alpha }}}_{i}} \right\|}_{2}}/{{\left\| {{{\tilde{\alpha }}}_{i}} \right\|}_{2}}\).
4. Output the identity of y as \(\text{identity}(y)=\arg {{\min }_{i}}\left\{ {{r}_{i}} \right\}\).
where \({{\tilde{\alpha }}_{i}}\) is the coding vector associated with class i.
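The four steps of CRC-RLS can be sketched compactly in numpy (variable names are ours; `class_of[j]` gives the class label of column j of X):

```python
import numpy as np

def crc_rls(X, y, class_of, lam=0.001):
    """CRC_RLS sketch: X is d x n with columns as training samples,
    y the query, class_of the per-column class labels.
    """
    X = X / np.linalg.norm(X, axis=0)            # step 1: unit l2-norm columns
    n = X.shape[1]
    P = np.linalg.inv(X.T @ X + lam * np.eye(n)) @ X.T
    alpha = P @ y                                 # step 2: code y over X
    classes = np.unique(class_of)
    residuals = []
    for c in classes:                             # step 3: regularized residuals
        idx = class_of == c
        r = (np.linalg.norm(y - X[:, idx] @ alpha[idx])
             / np.linalg.norm(alpha[idx]))
        residuals.append(r)
    return classes[int(np.argmin(residuals))]     # step 4: identity of y
```

Note that P depends only on X, so it can be precomputed once and reused for all queries, which is the source of the method's low computational cost.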
The method proposed in this paper improves this algorithm by increasing the fidelity of the training images and enhancing the collaboration between classes: not only the query sample y, but also the gallery images \({{x}_{i}}\) of every class i, are represented collaboratively, based on the idea of virtual collaborative projection (VCP).
Using this idea, we compute the average image \({{C}_{i}}\) of every class i over X, defined as:

\[{{C}_{i}}=\frac{1}{{{N}_{tr}}}\sum\limits_{j=1}^{{{N}_{tr}}}{{{x}_{i,j}}}\quad (9)\]

where \({{N}_{tr}}\) represents the number of training images of class i and \({{x}_{i,j}}\) denotes the j-th training image of class i.
Next, P is computed as:

The resulting virtual coefficient \({{\tilde{\alpha }}_{virtual}}\) is then calculated as follows:

This virtual coefficient is used as a weight for every class i to reconstruct new gallery images \({{d}_{{{c}_{i}}}}\):
A new dictionary D (the update of X) is then obtained by combining all images \({{d}_{{{c}_{i}}}}\left( D=\left[ {{d}_{{{c}_{1}}}},\ldots ,{{d}_{{{c}_{i}}}},\ldots \right] \right)\).
Next, when a query sample y is presented for classification, we follow the same procedure as CRC_RLS by computing the regularized residuals \({{r}_{i}}\), but we utilize the new dictionary D:

\[{{r}_{i}}={{\left\| y-{{D}_{i}}{{{\tilde{\alpha }}}_{i}} \right\|}_{2}}/{{\left\| {{{\tilde{\alpha }}}_{i}} \right\|}_{2}}\quad (13)\]

where \({{D}_{i}}\) represents the images of class i. The identity of a query sample y is computed by:

\[\text{identity}(y)=\arg {{\min }_{i}}\left\{ {{r}_{i}} \right\}\quad (14)\]
Below we present our virtual collaborative projection (VCP) algorithm, applied when a query image \(y\) is presented for classification:
The VCP Algorithm

1. Normalize the columns of X to have unit \({{l}_{2}}\)-norm.
2. Compute the average images \({{C}_{i}}\) of every class i using the formula (9).
3. Compute the virtual coefficient \({{\tilde{\alpha }}_{virtual}}\) using the formulas (10) and (11).
4. Compute \({{d}_{{{c}_{i}}}}\) using the formula (12).
5. Combine all the \({{d}_{{{c}_{i}}}}\) into a dictionary D.
6. Compute the regularized residuals \({{r}_{i}}\) using the formula (13).
7. Return the identity of y using the formula (14).
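The steps above can be sketched as follows. The details fixed by formulas (10)-(12) leave room for interpretation, so this is only one plausible reading, not the authors' implementation: it assumes the CRC projection matrix for P, projects each class average \(C_i\) over X to obtain the virtual coefficients, and rebuilds each class block by weighting its columns with those coefficients; all names are ours:

```python
import numpy as np

def vcp_classify(X, y, class_of, lam=0.001):
    """Sketch of VCP classification (one reading of formulas (10)-(12))."""
    X = X / np.linalg.norm(X, axis=0)               # step 1
    n = X.shape[1]
    P = np.linalg.inv(X.T @ X + lam * np.eye(n)) @ X.T  # assumed CRC projection
    classes = np.unique(class_of)
    D = X.copy()
    for c in classes:                               # steps 2-5
        idx = class_of == c
        C = X[:, idx].mean(axis=1)                  # class average (formula (9))
        alpha_virtual = P @ C                       # virtual coefficients
        D[:, idx] = X[:, idx] * alpha_virtual[idx]  # weighted gallery block
    PD = np.linalg.inv(D.T @ D + lam * np.eye(n)) @ D.T
    alpha = PD @ y                                  # code y over the new dictionary D
    res = []
    for c in classes:                               # steps 6-7
        idx = class_of == c
        res.append(np.linalg.norm(y - D[:, idx] @ alpha[idx])
                   / np.linalg.norm(alpha[idx]))
    return classes[int(np.argmin(res))]
```

Like CRC-RLS, the dictionary construction and both projection matrices depend only on the training data, so they can be computed once before any query arrives.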
In order to investigate the efficiency of VCP versus CRC, we conducted experiments on the AR face dataset [40] with different dimensionalities. Note that PCA is used to reduce the dimensionality of the original face images; the Eigenface features are used for this first experiment with the three dimensions 54, 120 and 300.
For this comparison, we selected a subset from AR dataset that contains 50 male subjects and 50 female subjects with only illumination and expression changes. For each subject, the seven images from Session 1 were used for training and the other seven images from Session 2 were used for testing. The images were cropped and resized to 60 × 43. Table 1 shows that VCP performs slightly better than CRC_RLS [9]:
Additional experiments are conducted in Sect. 4 on object categorization and action recognition, where we use features provided by state-of-the-art methods rather than the high order statistical moments.
We conclude this section by presenting our high order statistical binary pattern with virtual collaborative projection (SBP_VCP) algorithm, obtained by adding the high order statistical moment feature extraction step (cf. Sect. 2) to the VCP algorithm. This additional step is performed on the training images X, resulting in a new training set, and on every query sample y.
The SBP_VCP Algorithm

1. Extract the statistical binary patterns \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) of X using the SBP Algorithm.
2. Extract the statistical binary patterns \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) of y using the SBP Algorithm.
3. Call the VCP Algorithm.
In the next section we illustrate the performance of the SBP_VCP approach.
4 Experiments
To demonstrate the performance of our SBP_VCP algorithm, we conducted extensive experiments on multiple benchmark databases for face recognition, handwritten digit recognition, gender classification, image categorization and action recognition.
4.1 Parameter settings
We first describe how we set the parameters of the SBP_VCP algorithm. Apart from the choice of moments and their combinations, two additional parameters need to be set in the calculation of the SBP:

- the spatial support B for calculating the local moments;
- the spatial support {P;R} for calculating the LBP.
Although those two parameters are relatively independent, it must be noted that B has to be sufficiently large to be statistically relevant. Regarding {P;R}, this quantity is supposed to be relatively small in order to represent local micro-structures of the (moment) images.
In the following, due to space constraints, we only show experiments using the structuring element B = {(1;5); (2;8)}, which provides very satisfactory results on the different datasets.
Regarding {P;R}, the spatial support of the LBP, we have considered the three settings commonly found in the literature: {8;1}, {16;2}, and {24;3}.
Regarding the parameters associated with the virtual collaborative projection and the collaborative classification, we used a regularization parameter \(\lambda\) initialized as follows:

- Face recognition (FR) without occlusion: \(\lambda =0.001\)
- Face recognition (FR) with occlusion: \(\lambda =0.1\)
- Gender classification (GC): \(\lambda =0.001\)
- Handwritten digit recognition: \(\lambda =0.1\)
- Image categorization: \(\lambda =0.001\)
- Action recognition: \(\lambda =0.1\)
In all tables reported, the value in bold indicates the best performance. Namely, in Table 1 through Table 18, the values in bold indicate the best recognition rates; in Tables 19 and 20 the values in bold indicate the least computation time.
4.2 Face recognition (FR)
4.2.1 Extended Yale B database
The Extended Yale B [41, 42] database contains 2414 frontal face images of 38 individuals; some samples are presented in Fig. 3. We used the cropped and normalized face images of size 54 × 48, which were taken under varying illumination conditions. Three tests are considered for this dataset.
Test 1. We randomly split the database into two halves. One half, which contains 32 images for each person, was used as the dictionary, and the other half was used for testing. Table 2 shows the recognition rates versus feature dimension for the nearest neighbour (NN), nearest feature line (NFL) [43], support vector machine (SVM), sparse representation based classification (SRC) [2], linear regression based classification (LRC) [44], locality-constrained linear coding (LLC) [45], and regularized robust coding (RRC) [7] methods. SBP_VCP achieves the best recognition rate for all dimensions except dimension 300, where it performs slightly worse than RRC_l1 [7] but is still superior to all other methods considered.
Test 2. For each subject, N_tr samples are randomly chosen as training samples and 32 of the remaining images are randomly chosen as the testing data. Here the images are resized to 96 × 84 and the experiment for each N_tr runs ten times. For comparison, we used robust kernel representation with statistical local features (SLF-RKR) [36], as well as NN, LRC, SVM, CRC and SRC based methods with the same statistical local feature (SLF) extraction.
We list in Table 3 the FR performance results, measured as mean recognition accuracy. The proposed SBP_VCP algorithm achieves the best performance when N_tr = 5 or 20, and it is the second best method, slightly behind SLF-RKR_l2, when N_tr = 10. It can also be noticed that methods based on collaborative representation (e.g., SLF-RKR [36], SLF + CRC, SLF + SRC and the original SRC) perform better than other kinds of linear representation methods (e.g., SLF + LRC, SLF + NN).
Test 3. In the third test, we randomly selected between 2 and 7 images from each person as the training set and used the remaining images as the testing set. All the samples were projected into a subspace of 550 dimensions (samples in the LDA + SRC and LDA + CRC schemes are projected into a subspace of 37 dimensions). In addition to SRC and CRC, we compare our method with the JDDLDR [14], FDDL [13] and DPL [15] based approaches. The FR results are shown in Table 4.
Table 4 shows that SBP_VCP gives the best results for all values of N_tr. We remark that the improvement in performance is significant compared to all other methods, demonstrating the advantage of combining the statistical features with this twin competitive (collaborative) classification.
4.2.2 AR database
Test 1. As in [2], we selected a subset (with only illumination and expression changes) containing 50 male and 50 female subjects from the AR database [40]; some samples are shown in Fig. 4. For each subject, the seven images from Session 1 were used for training and the other seven images from Session 2 were used for testing. The images were cropped to 60 × 43. The FR rates with baseline comparison reported in Table 5 show that the proposed approach yields the best performance among all methods considered for all dimensions, even when the dimension is 30 and competing methods perform rather poorly. As expected, all methods achieve their maximal recognition rates at dimension 300.
Test 2. For each subject, the seven images with illumination change and expressions from Session 1 were used for training, and the other seven images with only illumination change and expression from Session 2 were used for testing. The size of the original face image is 83 × 60. The recognition rates versus the number of training samples N_tr are reported in Table 6, showing that SBP_VCP achieves the highest recognition rates, followed in order by SLF-RKR [36] and SLF + SRC.
4.2.3 MPIE database
The CMU Multi-PIE database [46] contains images of 337 subjects captured in four sessions with simultaneous variations in pose, expression, and illumination. Among these 337 subjects, all the 249 subjects in Session 1 were used for training. To make the FR more challenging, four subsets with both illumination and expression variations in Sessions 1, 2 and 3, were used for testing. We conducted two tests with this experimental protocol.
Test 1. In the first test, for the training set, as in [2], we used the seven frontal images with extreme illuminations {0, 1, 7, 13, 14, 16, and 18} and neutral expression (refer to Fig. 5a for examples). For the testing set, four typical frontal images with illuminations {0, 2, 7, 13} and different expressions (smile in Sessions 1 and 3, squint and surprise in Session 2) were used (refer to Fig. 5b for examples with surprise in Session 2, Fig. 5c for examples with smile in Session 1, and Fig. 5d for examples with smile in Session 3). Here we used Eigenface with dimensionality 300 as the face feature for sparse coding. Table 7 reports the recognition rates found in four testing sets.
Table 7 shows that SBP_VCP gives the best results on the sets smile-S1 and squint-S2 and the second best results on the sets surprise-S2 and smile-S3. Since smile-S1 comes from the same session as the training set, a high recognition rate is expected; on the smile-S3 and surprise-S2 sets we obtain the second best accuracy, with 72.7 and 62.5% respectively.
Test 2. In the second test, we analyzed the impact of statistical binary pattern (SBP) on different state-of-the-art methods with the same experimental protocol as Test 1. We considered nearest neighbours NN, linear regression LRC [44], sparse representation SRC [2], collaborative representation CRC [9] and relaxed collaborative representation RCR [10] based classification. Table 8 reports the recognition rates found on the different methods with and without SBP.
Results in Table 8 show that SBP consistently increases the performance of the different approaches, especially when the testing classes differ from Session 1. The improvement in performance is significant for the collaborative classification based methods CRC and RCR; for example, the recognition rate of RCR on the set squint-S2 increases from 40 to 74.6%, and on the set surprise-S2 from 38.1 to 64.5%.
4.2.4 AR database, disguise
In this experiment, we considered a subset from the AR database consisting of 2599 images from 100 subjects (26 samples per class except for a corrupted image w-027-14.bmp), 50 males and 50 females. We performed three tests: the first one follows the experimental settings in [2]; the other two, described below, are more challenging. The images were resized to 83 × 60 in the first and third test and to 42 × 30 in the second test; four representative samples of two persons are shown in Fig. 6.
Test 1. In the first test, 799 images (about 8 samples per subject) of non-occluded frontal views with various facial expressions in Sessions 1 and 2 were used for training, while two separate subsets (with sunglasses and scarf) of 200 images (1 sample per subject per session, with neutral expression) were used for testing. The FR results are listed in Table 9 and show that the SBP_VCP method achieves much higher recognition rates than CRC_RLS [9], RRC [7] (with scarf), SRC [2], the Gabor feature based sparse representation with Gabor occlusion dictionary (GSRC) [5] and the maximum correntropy criterion (CESR) [8].
Test 2. In the second test, we considered FR with a more complex disguise, including variations of illumination and a longer data acquisition interval. 400 images (4 neutral images with different illuminations per subject) of non-occluded frontal views in Session 1 were used for training, while the disguised images (3 images with various illuminations and sunglasses or scarves per subject per session) in Sessions 1 and 2 were used for testing. The results, reported in Table 10, show that the SBP_VCP method achieves better performance than CRC_RLS [9], SRC [2], GSRC [5] and CESR [8], except for sunglass-S1, where it achieves the second best result after RRC [7].
Test 3. In this test, a subset of 50 males and 50 females was selected from the AR database. For each subject, 7 samples without occlusion from Session 1 were used for training, with all the remaining disguised samples used for testing. These testing samples (3 samples with sunglasses in Session 1, 3 with sunglasses in Session 2, 3 with a scarf in Session 1 and 3 with a scarf in Session 2 per subject) not only contain disguises, but also variations of time and illumination. Table 11 reports the FR results on the four test sets with disguise.
Table 11 shows that the proposed method achieves the best recognition rate with sunglasses in Session 2, reaches 100% accuracy in Session 1 (as do some other methods) and achieves the second best accuracy on the scarf sets (SLF_RKR is ranked first). We remark that all methods perform better in Session 1 (sunglasses and scarf) than in Session 2, as Session 2 is more challenging due to variations in illumination.
4.2.5 Georgia Tech database with block occlusion
The Georgia Tech (GT) Face Database [47] contains 750 color images of 50 subjects (15 images per subject), as shown in Fig. 7a. These images have large variations in pose and expression and some illumination changes. Images were converted to gray scale, cropped and resized to 90 × 68. The first eight images of each subject were used for training (400 images), the remaining seven for testing (350 images). For block occlusion, we replaced a randomly located rectangular block of each testing image with an unrelated image, as illustrated in Fig. 7c.
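This block-occlusion protocol can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name, the use of a near-square block, and the fixed random seed are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (our choice)

def occlude(img, occluder, ratio):
    """Replace a randomly located block covering approximately `ratio`
    of the image area with pixels taken from an unrelated occluder image."""
    h, w = img.shape
    # Scale both sides by sqrt(ratio) so the block area is ~ratio of the image
    bh = max(1, int(round(h * np.sqrt(ratio))))
    bw = max(1, int(round(w * np.sqrt(ratio))))
    y0 = rng.integers(0, h - bh + 1)
    x0 = rng.integers(0, w - bw + 1)
    out = img.copy()
    out[y0:y0 + bh, x0:x0 + bw] = occluder[:bh, :bw]
    return out
```

Applying this with `ratio` from 0.1 to 0.5 on each 90 × 68 test image reproduces the kind of occlusion levels reported in Table 12.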
Performance results reported in Table 12 compare the algorithms SBP_VCP, SBP-CRC, SBP-SRC, SBP-LRC, and SBP-NN in the presence of block occlusion ranging from 0 to 50% of the image. Table 12 shows that SBP_VCP achieves the best accuracy. Our interpretation is that this remarkable performance is due mostly to the VCP approach which efficiently takes advantage of the twin collaborative representation in the training and testing steps.
4.2.6 FRGC database with block occlusion and single sample per person (SSPP)
The FRGC database [48] contains faces acquired under uncontrolled conditions, as shown in Fig. 8a. To address the challenging single sample per person (SSPP) protocol, we randomly selected 152 images for training and 152 images for testing, and replaced a randomly located block of each test image with an unrelated image, as illustrated in Fig. 8c. The images were cropped and resized to 90 × 68 pixels. The recognition accuracy on this dataset is reported in Table 13.
Table 13 shows that also in this test, with block occlusion ranging from 10 to 50% of the image, our algorithm SBP_VCP achieves the best performance, exhibiting a slightly better accuracy than all the other methods considered. Note that all methods, except SBP-NN and SBP-LRC, achieve the same recognition rates without occlusion, while their performance differs in the presence of occlusion. This shows that SBP_VCP performs remarkably well on the challenging SSPP problem.
4.3 Gender classification (GC)
4.3.1 AR database
We selected a non-occluded subset (14 images per subject) of the AR database [40] consisting of 50 male and 50 female subjects. Images of the first 25 males and 25 females were used for training and the remaining images for testing. The images were cropped to 60 × 43. PCA was used to reduce the dimension of each image to 300. Table 14 compares SBP_VCP with the methods: regularized nearest subspace (RNS) [49], multi-regularized features learning (MRL) [50], CRC_RLS [9], SRC [2], SVM, LRC [44] and NN. It shows that SBP_VCP outperforms the other methods considered and illustrates that the proposed method based on statistical local features is very effective for gender classification.
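The PCA dimensionality-reduction step used here can be sketched as below. This is a minimal illustration under our own naming conventions; in the experiment above the target dimension k is 300, and the images are flattened to rows before projection.

```python
import numpy as np

def pca_project(X_train, X_test, k):
    """Reduce each sample (one row per flattened image) to k dimensions
    by projecting onto the top-k principal components of the training set."""
    mu = X_train.mean(axis=0)
    # SVD of the centered training matrix; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:k].T  # d x k projection basis
    # Center the test data with the *training* mean before projecting
    return (X_train - mu) @ W, (X_test - mu) @ W
```

The same projection basis, learned on the training images only, is applied to the test images so that no test information leaks into the representation.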
4.3.2 FEI database
The FEI database [51] contains 14 images for each of 200 individuals, 2800 images in total. The numbers of male and female subjects are exactly equal, 100 each. The first nine images of each subject were used for training (1800 images, 900 per gender) and the remaining five images for testing (1000 images, 500 per gender). Figure 9 shows all samples from one person. The images were cropped to 60 × 43.
Here we compare SBP_VCP to the MRL [50] and CRC_RLS [9] algorithms at different dimensionalities. Table 15 shows that SBP_VCP outperforms MRL and CRC_RLS at all dimensionalities except 30.
4.4 Handwritten digit recognition
We next considered the problem of handwritten digit recognition on the widely used USPS database (Hull, J.J. 1994), which has 7291 training and 2007 test images. We used two different values of the number of training images \(N_{tr}\): 100 and 300. Results in Table 16 show that SBP_VCP outperforms all competing methods considered when \(N_{tr}\) = 300. When \(N_{tr}\) = 100, Fisher discrimination dictionary learning (FDDL) [13] is the best performing algorithm, but our approach achieves the second best performance.
4.5 Image categorization
We tested the proposed method on the problem of multi-class object categorization using the Oxford 17-category flower dataset [53], some samples of which are shown in Fig. 10. We adopted the default experimental settings provided at http://www.robots.ox.ac.uk/~vgg/data/flowers, including the training, validation and test splits and the multiple features. It should be noted that, in this setting, features are only extracted from flower regions that are well cropped by segmentation. This set contains 17 species of flowers with 80 images per class. As in [54], we directly used the χ² distance matrices of seven features (HSV, HOG, SIFTint, SIFTbdy, and color, shape and texture vocabularies) as inputs, and performed the experiments on the three predefined training, validation and test splits. Performance results (in terms of accuracy) comparing VCP with other state-of-the-art methods are presented in Table 17 and show that VCP slightly outperforms all other methods. Note that, since we follow [54], we did not use SBP for the representation in this test.
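The χ² distances between feature histograms used as inputs here follow the standard definition, which can be sketched as follows. This is an illustrative implementation (the function name and the `eps` guard against empty histogram bins are our choices, not part of the original pipeline).

```python
import numpy as np

def chi2_distance(H1, H2, eps=1e-10):
    """Chi-square distance matrix between the rows of H1 (m x d) and
    H2 (k x d); rows are (typically L1-normalized) feature histograms."""
    D = np.empty((H1.shape[0], H2.shape[0]))
    for i, h in enumerate(H1):
        # 0.5 * sum_j (h_j - g_j)^2 / (h_j + g_j), computed against all rows of H2
        D[i] = 0.5 * np.sum((h - H2) ** 2 / (h + H2 + eps), axis=1)
    return D
```

One such matrix per feature channel (HSV, HOG, SIFTint, ...) is then fed to the classifier, as in [54].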
4.6 Action recognition
Finally, we conducted an experiment on action recognition using the UCF sport action dataset (Rodriguez et al. [57]) and the large-scale UCF50 dataset. The video clips in the UCF sport action dataset were collected from various broadcast sports channels (e.g., BBC and ESPN). There are 140 videos in total, and their action bank features can be found in Sadanand et al. [58]. The videos cover ten sport action classes: diving, golfing, kicking, lifting, horse riding, running, skateboarding, swinging (pommel horse and floor), swinging (high bar) and walking. The UCF50 dataset has 50 action categories, such as baseball pitch, biking, diving and skiing (Fig. 11), with 6680 realistic videos collected from YouTube.
On the UCF sport action dataset, we followed the experimental settings in Rodriguez et al. [57] and evaluated VCP via five-fold cross validation, where one fold is used for testing and the remaining four folds for training. Since we use the action bank features of [58], we do not use SBP as a local feature in this test.
We compared VCP against state-of-the-art methods and reported the recognition rate in Table 18. Again, results show that VCP performs very competitively, illustrating the impact of the collaborative method.
4.7 Running time
In practical applications, training is usually an offline stage, while recognition (classification) is an online step. Since we adopt the same classification procedure as collaborative representation based classification (CRC), we achieve a remarkable speed-up compared to many other methods, due to the significant reduction in computational complexity. In fact, after coding a query sample y via the precomputed projection \(P={{\left( {{X}^{T}}X+\lambda I \right)}^{-1}}{{X}^{T}}\), y is assigned to the class which gives the minimal regularized residual \({{r}_{i}}=\left\| y-{{X}_{i}}{{\alpha }_{i}} \right\|_{2}^{2}+\lambda {{\left\| {{\alpha }_{i}} \right\|}_{n}}\), where n = 1 or 2 and \({{\alpha }_{i}}\) is the coding sub-vector associated with class i (\(\alpha =Py=[{{\alpha }_{1}},\ldots ,{{\alpha }_{i}},\ldots ]\) and \(y\approx X\alpha\)).
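The classification step just described can be sketched in a few lines of numpy, following the formulas above with n = 2. This is a minimal illustration under our own variable names, not the authors' MATLAB implementation; in practice the projection matrix P is computed once offline.

```python
import numpy as np

def crc_classify(X, labels, y, lam=1e-3):
    """Collaborative-representation classification: code the query y over
    all training samples at once, then assign the class whose sub-dictionary
    yields the smallest regularized residual (using the n = 2 penalty)."""
    # Offline stage: P = (X^T X + lam*I)^(-1) X^T, via a linear solve
    P = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    alpha = P @ y  # coding vector, one coefficient per training column
    classes = np.unique(labels)
    # Per-class regularized residual r_i = ||y - X_i alpha_i||^2 + lam*||alpha_i||^2
    res = [np.sum((y - X[:, labels == c] @ alpha[labels == c]) ** 2)
           + lam * np.sum(alpha[labels == c] ** 2)
           for c in classes]
    return classes[int(np.argmin(res))]
```

Because the solve over \(X^{T}X+\lambda I\) is shared by all queries, the online cost per query reduces to one matrix-vector product plus the per-class residual evaluations.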
All experiments were carried out in MATLAB on a machine with a 2.20 GHz dual-core CPU and 3.00 GB of RAM. Table 19 lists the average computational cost of the training step on Test 1 and Test 2 from the AR dataset with real face disguise. The comparison of the LBP [22] and SBP algorithms shows that LBP has the lowest computation time, but SBP is close.
Table 20 lists the average computational cost of the classification step for the different methods on Test 1 and Test 2 from the AR dataset with real face disguise. SBP_VCP has the lowest computation time, followed by RRC, while GSRC has the highest.
5 Conclusion
In this paper, we have introduced a novel approach for pattern recognition combining high-order statistical binary patterns and collaborative projection for robust local representation and classification. We have demonstrated that the extraction of statistical features based on the high-order moments of the images is particularly effective against image outliers. When this property is combined with our strategy for competitive or collaborative representation based on a trained virtual projection, we obtain a method, called SBP_VCP, which is a powerful refinement of the collaborative representation based classification recently proposed in the literature. We have validated SBP_VCP on a wide range of pattern recognition and classification problems, including face recognition, gender classification, object categorization and action recognition. Extensive numerical tests and detailed comparisons with standard and state-of-the-art methods demonstrate that the proposed SBP_VCP approach performs very competitively even on challenging classification tests. Additionally, our method can be implemented at a relatively small computational cost, as it relies on the same efficient framework used in CRC for the classification step.
References
Borgi MA, Labate D, El’arbi M, Amar CB (2015) Sparse multi-stage regularized feature learning for robust face recognition. Expert Syst Appl 42(1):269–279
Wright J, Yang AY, Ganesh A et al (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
Theodorakopoulos I, Rigas I, Economou G, Fotopoulos S (2011) Face recognition via local sparse coding. In: Proceedings of the ICCV, pp 1647–1652
Gao S, Tsang I, Chia L (2010) Kernel sparse representation for image classification and face recognition. In: Proceedings of the ECCV, pp 1–14
Yang M, Zhang L (2010) Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary. In: Proceedings of the ECCV, pp 448–461
Yang M, Zhang L, Yang J, Zhang D (2011) Robust sparse coding for face recognition. In: Proceedings of the ICCV, pp 625–632
Yang M, Zhang L, Yang J, Zhang D (2013) Regularized robust coding for face recognition. IEEE Trans Image Process 22(5):1753–1766
He R, Zheng WS, Hu BG (2011) Maximum correntropy criterion for robust face recognition. IEEE Trans Pattern Anal Mach Intell 33(8):1561–1576
Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: which helps face recognition? In: Proceedings of the ICCV, pp 471–478
Yang M, Zhang L, Zhang D, Wang S (2012) Relaxed collaborative representation for pattern classification. In: Proceedings of the ICCV, pp 2224–2231
Xu Y, Zhang D, Yang J, Yang J-Y (2011) A two-phase test sample sparse representation method for use with face recognition. IEEE Trans Circuits Syst Video Technol 21(9):1255–1262
Mi J-X, Liu J-X (2013) Face recognition using sparse representation-based classification on K-nearest subspace. PLoS ONE 8(3):e59430. doi:10.1371/journal.pone.0059430
Yang M, Zhang L, Feng X, Zhang D (2011) Fisher discrimination dictionary learning for sparse representation. In: Proceedings of the ICCV, pp 543–550
Feng Z, Yang M, Zhang L, Liu Y, Zhang D (2013) Joint discriminative dimensionality reduction and dictionary learning for face recognition. Pattern Recogn 46(8):2134–2143
Gu S, Zhang L, Zuo W et al (2014) Projective dictionary pair learning for pattern classification. In: Proceeding of advances in neural information processing systems, pp 793–801
Cai S, Zuo W, Zhang L (2014) Support vector guided dictionary learning. In: Proceedings of the European conference on computer vision, pp 624–639
Li Z, Lai Z, Xu Y et al (2015) A locality-constrained and label embedding dictionary learning algorithm for image classification. IEEE Trans Neural Netw Learn Syst. doi:10.1109/TNNLS.2015.2508025
Cai S, Zhang L, Zuo W, Feng X (2016) A probabilistic collaborative representation based approach for pattern classification. In: Proceedings of the CVPR, pp 2950–2959
Lades M, Vorbrüggen JC, Buhmann J et al (1993) Distortion invariant object recognition in the dynamic link architecture. IEEE Trans Comput 42(3):300–311
Liu C, Wechsler H (2002) Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. IEEE Trans Image Process 11(4):467–476
Shen L, Bai L (2006) A review on Gabor wavelets for face recognition. Pattern Anal Appl 9(10):273–292
Ahonen T, Hadid A, Pietikäinen M (2004) Face recognition with local binary patterns. In: Proceedings of the ECCV, pp 469–481
Ojala T, Pietikäinen M, Mäenpää T (2002) Multi-resolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Zhang W, Shan S, Gao W et al (2005) Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition. In: Proceedings of the ICCV, pp 786–791
Zhang W, Shan S, Chen X, Gao W (2009) Are Gabor phases really useless for face recognition?. Pattern Anal Appl 12(3):301–307
Zhang B, Shan S, Chen X, Gao W (2007) Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition. IEEE Trans Image Process 16(1):57–68
Xie SF, Shan SG, Chen XL, Chen J (2010) Fusing local patterns of Gabor magnitude and phase for face recognition. IEEE Trans Image Process 19(5):1349–1361
Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns: application to face recognition. IEEE Trans Pattern Anal Mach Intell 28(12):2037–2041
Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
Belhumeur PN, Hespanha JP, Kriengman DJ (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Borgi MA, El’arbi M, Labate D, Amar CB (2015) Regularized directional feature learning for face recognition. Multimedia Tools Appl 74(24):11281–11295
Borgi MA, Labate D, El’Arbi M, Amar CB (2014) Regularized Shearlet network for face recognition using single sample per person. In: Proceedings of the ICASSP, pp 514–518
Borgi MA, Labate D, El'arbi M, Amar CB (2014) ShearFace: efficient extraction of anisotropic features for face recognition. In: Proceedings of the ICPR, pp 1806–1811
Borgi MA, Labate D, El'arbi M, Amar CB (2014) Sparse multi-regularized shearlet-network using convex relaxation for face recognition. In: Proceedings of the ICPR, pp 4636–4641
Borgi MA, Labate D, El'arbi M, Amar CB (2013) Shearlet network-based sparse coding augmented by facial texture features for face recognition. In: Proceedings of the ICIAP, pp 611–620
Yang M, Zhang L, Shiu SC, Zhang D (2013) Robust kernel representation with statistical local features for face recognition. IEEE Trans Neural Netw Learn Syst 24(6):900–912
Nguyen TP, Vu NS, Manzanera A (2016) Statistical binary patterns for rotational invariant texture classification. Neurocomputing 173:1565–1577
Guo ZH, Zhang L, Zhang D (2010) A completed modeling of local binary pattern operator for texture classification. IEEE Trans Image Process 19(6):1657–1663
Ojala T, Maenpaa T, Pietikainen M et al (2002) Outex—new framework for empirical evaluation of texture analysis algorithms. In: Proceedings of the ICPR, pp 701–706
Martinez A, Benavente R (1998) The AR face database. CVC technical report 24
Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE PAMI 23(6):643–660
Lee K, Ho J, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE PAMI 27(5):684–698
Li SZ, Lu J (1999) Face recognition using nearest feature line method. IEEE Trans Neural Netw 10(2):439–443
Naseem I, Togneri R, Bennamoun M (2010) Linear regression for face recognition. IEEE Trans Pattern Anal Mach Intell 32(11):2106–2112
Wang JJ, Yang JC et al (2010) Locality-constrained linear coding for image classification. In: Proceedings of the CVPR, pp 3360–3371
Gross R, Matthews I et al (2010) Multi-PIE. Image Vis Comput 28(5):807–813
Georgia Tech Face Database (2007). http://www.anefian.com/face_reco.htm
Phillips PJ, Flynn PJ et al (2005) Overview of the face recognition grand challenge. In: Proceedings of the CVPR, pp 947–954
Zhang L, Yang M et al (2011) Collaborative representation based classification for face recognition. Technical report. arXiv: 1204.2358
Borgi MA, El’arbi M, Labate D, Amar CB (2014) Face, gender and race classification using multi-regularized features learning. In: Proceedings of the ICIP, pp 5277–5281
Thomaz E, Giraldi GA (2010) A new ranking method for principal components analysis and its application to face image analysis. Image Vis Comput 28(6):902–913
Yang M, Zhang L, Feng X, Zhang D (2014) Sparse representation based fisher discrimination dictionary learning for image classification. Int J Comput Vis 109(3):209–232
Nilsback M, Zisserman A (2006) A visual vocabulary for flower classification. In: Proceedings of the CVPR, pp 1447–1454
Yuan XT, Yan SC (2010) Visual classification with multitask joint sparse representation. In: Proceedings of the CVPR, pp 3493–3500
Nilsback M, Zisserman A (2008) Automated flower classification over a large number of classes. In: Proceedings of the ICCVGIP, pp 722–729
Gehler P, Nowozin S (2009) On feature combination for multiclass object classification. In: Proceedings of the ICCV, pp 221–228
Rodriguez MD, Ahmed J, Shah M (2008) Action MACH a spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings of the CVPR
Sadanand S, Corso JJ (2012) Action bank: a high-level representation of activity in video. In: Proceedings of the CVPR, pp 1234–1241
Yao A, Gall J, Van Gool LJ (2010) A hough transform-based voting framework for action recognition. In: Proceedings of the CVPR, pp 2061–2068
Yeffet L, Wolf L (2009) Local trinary patterns for human action recognition. In: Proceedings of the ICCV, pp 492–497
Wang H, Ullah MM et al (2009) Evaluation of local spatio-temporal features for action recognition. In: Proceedings of the BMVC, pp 1–11
Borgi, M.A., Nguyen, T.P., Labate, D. et al. Statistical binary patterns and post-competitive representation for pattern recognition. Int. J. Mach. Learn. & Cyber. 9, 1023–1038 (2018). https://doi.org/10.1007/s13042-016-0625-9