1 Introduction

One of the main challenges of current research in pattern recognition (PR) is to improve the robustness of existing algorithms with respect to confounding factors including noise, rigid transformations, changes in viewpoint, illumination, etc. Recent advances in statistical learning [1] have brought attention to the notion of sparsity as a way to extract salient image features and obtain more accurate and robust classification. Wright et al. [2], in particular, introduced a very influential framework called sparse representation based classification (SRC) for face recognition (FR) and successfully applied this method to identify human faces under varying illumination, occlusion and real disguise. In their method, a test sample image is coded as a sparse linear combination of the training images and classification is achieved by identifying which class yields the least residual. Several other methods were inspired by SRC, including: the FR method based on sparse representation of facial image patches by Theodorakopoulos et al. [3]; kernel sparse representation for image classification and FR, which applies a sparse coding technique in a high dimensional feature space via some implicit feature mapping [4]; the Gabor occlusion dictionary for SRC by Yang and Zhang [5], which reduces the computational cost by using Gabor features; a robust regularized coding model to enhance the robustness of face recognition to confounding factors [6, 7]; and the method based on the maximum correntropy criterion for robust face recognition by He et al. [8]. An alternative point of view was proposed by Zhang et al. [9], who argued that rather than sparsity “the collaborative representation mechanism used in SRC is much more crucial to its success of face classification”. Based on this observation, they introduced a method called collaborative representation based classification with regularized least square (CRC) [9], which was shown to perform very competitively against SRC at a lower computational cost. As a further refinement of CRC, some of the same authors proposed a method called relaxed collaborative representation (RCR), which is designed to better capture the similarity and distinctiveness of different features for classification [10]. An alternative approach is the two-phase test sample representation method [11], which first detects the training samples located far away from the test sample (assuming they have a negligible effect on classification); the test sample is then represented as a linear combination of its M nearest neighbors and the representation result is used for classification. Another method, proposed in [12], consists in partitioning face images into blocks and then creating an indicator to remove the contaminated blocks and choose the nearest subspaces; SRC is finally used to classify the occluded test sample in the new feature space.

We also recall the Fisher discrimination dictionary learning (FDDL) algorithm by Yang et al. [13], which embeds the Fisher criterion in the design of the objective function. The FDDL scheme has two remarkable properties. First, dictionary atoms are learnt in association with the class labels, so that the reconstruction residual from each class can be used in classification; second, the Fisher criterion is imposed on the coding coefficients so that they carry discriminative information for classification. To improve this method, Feng et al. [14] proposed to jointly learn the projection matrix for dimensionality reduction and the discriminative dictionary for face representation (JDDLDR). The joint learning combines the learned projection and dictionary more effectively, with the result of improving FR performance. Within the general framework of discriminative dictionary learning (DDL), the projective dictionary pair learning (DPL) algorithm [15] learns a synthesis dictionary and an analysis dictionary jointly to achieve the goals of signal representation and discrimination. The support vector guided dictionary learning (SVGDL) method is proposed in [16]; in this model FDDL can be seen as the special case in which the weights are determined by the numbers of samples of each class, whereas SVGDL uses a parameterization method to adaptively determine the weight of each coding vector pair. Compared with FDDL, SVGDL can therefore adaptively assign different weights to different pairs of coding vectors. Yet another DDL approach recently proposed is the locality constrained and label embedding dictionary learning (LCLE-DL) algorithm [17], where locality information is preserved using the graph Laplacian matrix of the learned dictionary rather than the conventional one derived from the training samples; the label embedding term is then constructed using the label information of the atoms instead of the classification error term; the coding coefficients derived by combining the locality-based and label-based reconstructions are shown to be very effective for image classification. Very recently, a probabilistic interpretation of the collaborative mechanism was proposed to explain the classification mechanism of CRC; following this analysis, a method called probabilistic collaborative representation based classifier (ProCRC) was introduced, which jointly maximizes the likelihood that a test sample belongs to each of the multiple classes [18].

On the other hand, a class of algorithms described as local feature based methods [19–28] has also demonstrated very promising results in problems of object recognition and texture classification. For instance, some of these methods use Gabor filters to extract local directional features at multiple scales and have been successfully applied to FR [20, 21]. Compared to more conventional methods such as Eigenface [29] and Fisherface [30], Gabor filtering is less sensitive to image variations. Another type of local feature widely used in FR is the statistical local feature (SLF), such as the histogram of local binary patterns (LBP) [22], whose main principle is to model a face image as a composition of micro-patterns [28]. By partitioning the face image into several blocks, the statistical feature (e.g., the histogram of LBP) of each block is extracted, and the description of the image is finally formed by concatenating the features extracted from all blocks. For example, Zhang et al. [24, 25] proposed to use the Gabor magnitude or phase map instead of the intensity map to generate LBP features. New coding techniques on Gabor features have also been proposed; e.g., Zhang et al. [26] extracted and encoded the global and local variations of the real and imaginary parts of the data using a multi-scale Gabor representation. Borgi et al. [31–35] proposed two algorithms that apply a sparse multiscale representation based on shearlets to extract the essential geometric content of facial features, one called regularized shearlet network (RSN) and the other sparse multi-regularized shearlet network (SMRSN). Finally, we recall that Meng et al. [36] proposed a kernel based representation model to fully exploit the discrimination information embedded in the statistical local features (SLF_RKR) and applied a robust regression method to handle occlusions in face images.

In this paper, we adopt the same general philosophy as CRC and our main novel contribution is to integrate this method with a virtual collaborative projection (VCP) routine, designed to train the images of every class against the other classes with the goal of improving fidelity before projecting the query image. Additionally, inspired by the remarkable results reported in the recent literature on local feature based methods, our algorithm includes a routine that computes high-order statistical moments (SM) in order to extract highly discriminative local features and improve data representation. To validate our algorithm, which is called statistical binary pattern with virtual collaborative projection (SBP_VCP), we have tested it on multiple datasets for problems of face recognition, gender classification, handwritten digit recognition, object categorization and action recognition. Experimental results show that our method consistently achieves very competitive results as compared to classical and state-of-the-art algorithms.

The rest of this paper is organized as follows. Section 2 introduces the main idea of statistical binary pattern and high order moments for feature extraction. Section 3 describes the proposed virtual collaborative projection applied to trained faces. Section 4 reports extensive numerical experiments to validate the proposed method and compare it against state-of-the-art methods on problems of face recognition under different confounding factors as well as image categorization, handwritten digit and action recognition. Finally, Sect. 5 concludes this paper.

2 Statistical binary pattern and high order moments

The statistical binary patterns (SBP) representation is an extension of local binary patterns (LBP) and it aims at enhancing the expressiveness and discrimination power of LBP for image modelling (especially texture) and recognition, while reducing sensitivity to small perturbations, e.g., noise. The main idea of this method, which was introduced by one of the authors and their collaborator in [37], consists in applying a rotation invariant uniform LBP to a set of images corresponding to the local statistical moments associated to a given spatial support. The resulting code forms the SBP and an image is then represented by joint or marginal distributions of SBPs.

2.1 Moment images

A real valued 2d discrete image f is modelled as a mapping from \({{\mathbb{Z}}^{2}}\) to \(\mathbb{R}\). The spatial support used to calculate the local statistics is modelled as \(B\subset {{\mathbb{Z}}^{2}}\), such that \(O\in B\), where O is the origin of \({{\mathbb{Z}}^{2}}\). The r-order moment image associated to f and B is also a mapping from \({{\mathbb{Z}}^{2}}\) to \(\mathbb{R}\), defined as:

$$m_{(f,B)}^{r}(z)=\frac{1}{\left| B \right|}\sum\limits_{b\in B}{{{\left( f(z+b) \right)}^{r}}}$$
(1)

where z is a pixel from \({{\mathbb{Z}}^{2}}\), and \(\left| B \right|\) is the cardinality of the structuring element B. Accordingly, the r-order centered moment image (r > 1) is defined as:

$$\mu _{(f,B)}^{r}(z)=\frac{1}{\left| B \right|}\sum\limits_{b\in B}{{{\left( f(z+b)-m_{(f,B)}^{1}(z) \right)}^{r}}}$$
(2)

where \(m_{(f,B)}^{1}(z)\) is the average value (1-order moment) calculated around z. Finally the r-order normalized centered moment image (r > 2) is defined as:

$$\beta _{(f,B)}^{r}(z)=\frac{1}{\left| B \right|}\sum\limits_{b\in B}{{{\left( \frac{f(z+b)-m_{(f,B)}^{1}(z)}{\sqrt{\mu _{(f,B)}^{2}(z)}} \right)}^{r}}}$$
(3)

where \(\mu _{(f,B)}^{2}(z)\) is the variance (2-order centered moment) calculated around z.
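As an illustrative sketch only (ours, not the authors' implementation), the moment images of Eqs. (1)–(3) can be computed with NumPy as follows; the reflected-border handling and the small constant added under the square root are our own assumptions:

```python
import numpy as np

def moment_images(f, B, r_max=2, eps=1e-8):
    """Local moment images of Eqs. (1)-(3); B is a list of (dy, dx) offsets containing (0, 0)."""
    f = f.astype(np.float64)
    pad = max(max(abs(dy), abs(dx)) for dy, dx in B)
    fp = np.pad(f, pad, mode='reflect')          # border handling (our assumption)
    H, W = f.shape
    # Stack the |B| translated images f(z + b): shape (|B|, H, W)
    shifted = np.stack([fp[pad + dy:pad + dy + H, pad + dx:pad + dx + W] for dy, dx in B])
    moments = {'m1': shifted.mean(axis=0)}       # Eq. (1) with r = 1
    centered = shifted - moments['m1']           # f(z + b) - m^1(z)
    if r_max >= 2:
        moments['mu2'] = (centered ** 2).mean(axis=0)                       # Eq. (2) with r = 2
    for r in range(3, r_max + 1):                # Eq. (3): normalized centered moments
        moments[f'beta{r}'] = ((centered / np.sqrt(moments['mu2'] + eps)) ** r).mean(axis=0)
    return moments

# Example: a 3 x 3 square support around the origin.
B = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```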

2.2 Statistical binary patterns

Let R and P denote the radius of the neighborhood circle and the number of values sampled on the circle, respectively. For each moment image M, one statistical binary pattern is formed as follows:

  • one (P + 2)-valued pattern corresponding to the rotation invariant uniform LBP coding of M:

$$SB{{P}_{P,R}}(M)(z)=LBP_{P,R}^{riu2}(M)(z)$$
(4)
  • one binary value corresponding to the comparison of the centre value with the mean value of M:

$$SB{{P}_{C}}(M)(z)=s\left( M(z)-\tilde{M} \right)$$
(5)

where s denotes the binary thresholding function used in LBP (\(s(x)=1\) if \(x\ge 0\), and 0 otherwise), and \(\tilde{M}\) is the mean value of the moment image M over the whole image. Hence \(SB{{P}_{P,R}}(M)\) represents the structure of the moment M with respect to a local reference (the center pixel), while \(SB{{P}_{C}}(M)\) complements this information with the relative value of the center pixel with respect to a global reference (\(\tilde{M}\)). As a result of this first step, a \(2(P+2)\)-valued scalar descriptor is computed for every pixel of each moment image.
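A minimal NumPy sketch of Eqs. (4) and (5) for a single moment image M is given below; it is our own illustration (bilinear interpolation of the circular neighbours, interior pixels only), not the authors' code:

```python
import numpy as np

def lbp_riu2(M, P=8, R=1):
    """Per-pixel LBP^{riu2}_{P,R} code (Eq. 4) and global-mean bit (Eq. 5) for a moment image M."""
    H, W = M.shape
    Rc = int(np.ceil(R))
    ys, xs = np.mgrid[Rc:H - Rc, Rc:W - Rc]      # interior pixels only
    center = M[Rc:H - Rc, Rc:W - Rc]
    bits = []
    for p in range(P):
        ang = 2.0 * np.pi * p / P
        y, x = ys + R * np.sin(ang), xs + R * np.cos(ang)
        y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
        y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
        fy, fx = y - y0, x - x0
        nb = (M[y0, x0] * (1 - fy) * (1 - fx) + M[y0, x1] * (1 - fy) * fx +
              M[y1, x0] * fy * (1 - fx) + M[y1, x1] * fy * fx)   # bilinear interpolation
        bits.append((nb >= center).astype(int))
    bits = np.stack(bits)                                        # shape (P, h, w)
    circ = np.concatenate([bits, bits[:1]], axis=0)
    transitions = np.abs(np.diff(circ, axis=0)).sum(axis=0)      # 0/1 transitions around the circle
    riu2 = np.where(transitions <= 2, bits.sum(axis=0), P + 1)   # uniform -> number of ones, else P + 1
    sbp_c = (center >= M.mean()).astype(int)                     # Eq. (5): compare to the global mean
    return riu2, sbp_c
```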

2.3 Image descriptors

Let \({{\left\{ {{M}_{i}} \right\}}_{1\le i\le {{n}_{M}}}}\) be the set of \({{n}_{M}}\) computed moment images. \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}\) is defined as a vector valued image with \({{n}_{M}}\) components, such that for every \(z\in {{\mathbb{Z}}^{2}}\) and for every i, \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}{{(z)}_{i}}\) is a value in \([0,2(P+2))\). If the image f contains texture, the descriptor associated with f is given by the histogram of the values of \(SB{{P}^{\left\{ {{M}_{i}} \right\}}}\). We consider two kinds of histograms.

First we consider the joint histogram H defined as follows:

$$\begin{aligned} & H:{{[ 0,2(P+2) [}^{{{n}_{M}}}}\to {\mathbb{N}} \\ & H(v)=\left| \left\{ z;SB{{P}^{\left\{ {{M}_{i}} \right\}}}(z)=v \right\} \right| \\ \end{aligned}$$
(6)

Depending on the size of the texture images, the joint distribution may become too sparse when the dimension (i.e., the number of moments) increases.

Next, we consider the marginal histograms \({{\{{{h}_{i}}\}}_{i\le {{n}_{M}}}}\) defined as:

$$\begin{aligned} & {{h}_{i}}:[ 0,2(P+2) [\to \mathbb{N} \\ & {{h}_{i}}(n)=\left| \left\{ z;SB{{P}^{\left\{ {{M}_{i}} \right\}}}{{(z)}_{i}}=n \right\} \right| \\ \end{aligned}$$
(7)

An image descriptor can then be defined using the joint histogram H or the concatenation of the \({{n}_{M}}\) marginal histograms \(\{{{h}_{i}}\}\). The length of the descriptor vector is \({{[2(P+2)]}^{{{n}_{M}}}}\) in the first case and \(2{{n}_{M}}(P+2)\) in the second case.
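Assuming each SBP component image stores the fused code \(SB{{P}_{P,R}}({{M}_{i}})+(P+2)\,SB{{P}_{C}}({{M}_{i}})\), i.e. an integer in \([0,2(P+2))\), the two descriptors can be sketched as follows (the helper name is ours):

```python
import numpy as np

def sbp_descriptors(sbp_components, P):
    """sbp_components: list of n_M integer code images with values in [0, 2(P+2))."""
    K = 2 * (P + 2)
    codes = np.stack([c.ravel() for c in sbp_components])        # shape (n_M, n_pixels)
    # Joint histogram H (Eq. 6): one bin per n_M-tuple of codes, i.e. K**n_M bins in total
    joint = np.bincount(np.ravel_multi_index(codes, dims=(K,) * codes.shape[0]),
                        minlength=K ** codes.shape[0])
    # Marginal histograms h_i (Eq. 7): K bins each, concatenated -> 2 n_M (P + 2) values
    marginal = np.concatenate([np.bincount(c, minlength=K) for c in codes])
    return joint, marginal
```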

2.4 Higher order moments

The SBP model on higher order moments is evaluated next. The objective of the SBP framework is to extend the LBP texture image descriptors from the local level, represented by the pixel z, to the regional distribution level of \(z+B\) by approximating the distribution with a set of statistical moments. It is known that the mean and variance faithfully describe a statistical distribution only in special cases, e.g., when it is a normal distribution. This assumption may fail for natural texture images. Therefore, higher order moments are needed to obtain an accurate description of a general distribution and capture the relevant information.

Regarding the size of the image descriptor, it clearly increases as the number of moments increases. When we use joint histograms, the descriptor size is \({{(2(P+2))}^{n}}\), where P is the number of neighbours used in LBP and n is the number of moment images. When we use marginal histograms, the size is only \(2n(P+2)\), but this comes at the price of a significant loss of information. Hence we propose a trade-off between descriptor size and information loss based on the concatenation of joint histograms corresponding to pairs of moment images.

Formally, we can recursively define the higher order SBP hybrid image descriptor as follows.

Let \({{M}_{1}}\) and \({{M}_{2}}\) be moments, or combinations of moments represented by their joint or concatenated histograms. We shall denote by \(SB{{P}^{{{M}_{1}}{{M}_{2}}}}\) (resp. \(SB{{P}^{{{M}_{1}}\_{{M}_{2}}}}\)) the image descriptor made of the joint (resp. concatenated) histograms constructed from \(SB{{P}^{{{M}_{1}}}}\) and \(SB{{P}^{{{M}_{2}}}}\). In our experiments with higher order moments below, we have only considered pairs of moments for joint histograms. The algorithm below summarizes the computation of the high order statistical binary pattern (SBP):

The SBP Algorithm

Input: f—a 2D image, \(B\subset {{\mathbb{Z}}^{2}}\)—the spatial support used to calculate the local moments, P—the number of neighbours, R—the radius of the neighbouring circle.

Output: \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\)—texture descriptor of f.

Calculate moment images:

1. Calculate the first order moment image \({{m}_{1}}\) (or \(m_{(f,B)}^{1}\)) associated to f and B using the formula (1).

2. Calculate the second order centered moment image \({{\mu }_{2}}\) (or \(\mu _{(f,B)}^{2}\)) associated to f and B using the formula (2).

Statistical binary patterns:

1. Calculate the statistical binary patterns \(SB{{P}_{P,R}}\left( {{m}_{1}} \right)\) and \(SB{{P}_{C}}\left( {{m}_{1}} \right)\) from the first order moment image \({{m}_{1}}\), using the formulas (4) and (5).

2. Calculate the statistical binary patterns \(SB{{P}_{P,R}}\left( {{\mu }_{2}} \right)\) and \(SB{{P}_{C}}\left( {{\mu }_{2}} \right)\) from the second order centered moment image \({{\mu }_{2}}\), using the formulas (4) and (5).

3. Calculate \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) as joint histogram of \(SB{{P}_{P,R}}\left( {{m}_{1}} \right)\), \(SB{{P}_{C}}\left( {{m}_{1}} \right)\), \(SB{{P}_{P,R}}\left( {{\mu }_{2}} \right)\) and \(SB{{P}_{C}}\left( {{\mu }_{2}} \right)\).
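For concreteness, the whole descriptor computation can be sketched by chaining the illustrative helpers introduced above (moment_images, lbp_riu2 and sbp_descriptors are our own hypothetical names, not the authors' code):

```python
import numpy as np

def sbp_feature(image, B, P=8, R=1):
    """Normalized SBP_{P,R}^{m1 mu2} descriptor: joint histogram of the fused SBP codes."""
    mom = moment_images(image, B, r_max=2)            # steps 1-2: m1 and mu2 moment images
    codes = []
    for M in (mom['m1'], mom['mu2']):
        riu2, c = lbp_riu2(M, P=P, R=R)               # Eqs. (4) and (5)
        codes.append(riu2 + (P + 2) * c)              # fuse both codes into one value in [0, 2(P+2))
    joint, _ = sbp_descriptors(codes, P=P)            # step 3: joint histogram, (2(P+2))^2 bins
    return joint / joint.sum()                        # normalized texture descriptor
```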

Figures 1 and 2 compare the recognition rates of the LBP, CLBP [38] and SBP algorithms. For this comparison, we used the Outex database [39], a large and comprehensive texture database which includes 24 classes of textures collected under three illuminations and at nine angles. To measure the dissimilarity between two histograms, we used the nearest neighbour classifier with the Chi square distance. We considered different configurations of SBP: in Fig. 1 we set (P,R) equal to (24,3); in Fig. 2 we used the values (8,1), (16,2) and (24,3).
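A minimal sketch of this evaluation protocol (nearest neighbour classification of histograms under the Chi square dissimilarity) is given below; it is an illustration only, not the evaluation code used for the figures:

```python
import numpy as np

def chi2(h, g, eps=1e-10):
    """Chi square dissimilarity between two (normalized) histograms."""
    return 0.5 * np.sum((h - g) ** 2 / (h + g + eps))

def nn_classify(test_hists, train_hists, train_labels):
    """Assign each test histogram the label of its nearest training histogram."""
    return np.array([train_labels[int(np.argmin([chi2(h, g) for g in train_hists]))]
                     for h in test_hists])
```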

Fig. 1 Classification rate (%) of LBP, CLBP and SBP with (P,R) = (24,3) using the Outex texture database

Fig. 2 Classification rate (%) of LBP, CLBP and SBP with (P,R) = (8,1), (16,2) and (24,3) using the Outex texture database

3 Virtual collaborative projection

Zhang et al. [9] investigated the role of collaboration between classes in representing the query sample. In order to collaboratively represent the query sample \(y\in {{\mathbb{R}}^{m}}\) using X (all the gallery images where each column is a training sample) with low computational cost, they introduced a method called collaborative representation based classification with regularized least square method (CRC_RLS). A general model of collaborative representation is:

$$\tilde{\alpha }=\arg {{\min }_{\alpha }}\left\{ \left\| y-X\alpha \right\|_{2}^{2}+\lambda \left\| \alpha \right\|_{2}^{2} \right\}$$
(8)

where \(\alpha\) is the coding vector \((\alpha =[{{\alpha }_{1}},\ldots ,{{\alpha }_{i}},\ldots ]\) and \(y\approx X\alpha)\) and \(\lambda\) is the regularization parameter.

The algorithm is described below:

The CRC-RLS Algorithm

1. Normalize the columns of X to have unit \({{l}_{2}}\)-norm.

2. Code y over X by

\(\tilde{\alpha }=Py\)

where \(P={{\left( {{X}^{T}}X+\lambda I \right)}^{-1}}{{X}^{T}}\).

3. Compute the regularized residuals

\({{r}_{i}}={{\left\| y-{{X}_{i}}{{{\tilde{\alpha }}}_{i}} \right\|}_{2}}/{{\left\| {{{\tilde{\alpha }}}_{i}} \right\|}_{2}}\)

4. Output the identity of y as

\(\text{identity}(y)=\arg {{\min }_{i}}\left\{ {{r}_{i}} \right\}\)

where \({{\tilde{\alpha }}_{i}}\) is the coding vector associated with class i.
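A compact NumPy sketch of the four steps above (ours, not the reference implementation of [9]) is:

```python
import numpy as np

def crc_rls(X, y, labels, lam=0.001):
    """X: (m, n) matrix of training samples (columns); labels: (n,) class ids; y: (m,) query."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)                 # step 1: unit l2 columns
    P = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)     # P = (X^T X + lam I)^{-1} X^T
    alpha = P @ y                                                    # step 2: code y over X
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c]) /
                 np.linalg.norm(alpha[labels == c]) for c in classes]   # step 3
    return classes[int(np.argmin(residuals))]                        # step 4
```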

The method proposed in this paper improves this algorithm by increasing the fidelity of the training images and by enhancing the collaboration between classes: we represent not only the query sample y but also all the gallery images \({{x}_{i}}\) of every class i, based on the idea of virtual collaborative projection (VCP).

Using this idea, we can compute the average images \({{C}_{i}}\) from every class i over X, defined as:

$${{C}_{i}}=\frac{1}{{{N}_{tr}}}\sum\limits_{j=1}^{{{N}_{tr}}}{{{x}_{i,j}}}$$
(9)

where \({{x}_{i,j}}\) denotes the j-th training image of class i and \({{N}_{tr}}\) the number of training images of class i.

Next by computing P as:

$$P={{\left( {{X}^{T}}X+\lambda I \right)}^{-1}}{{X}^{T}}$$
(10)

then the resulting virtual coefficient \({{\tilde{\alpha }}_{virtual}}\) is calculated as follows:

$${{\tilde{\alpha }}_{virtual}}=P{{C}_{i}}$$
(11)

This virtual coefficient is used as a weight for every class i to reconstruct new gallery images \({{d}_{{{c}_{i}}}}\):

$${{d}_{{{c}_{i}}}}={{\left\| {{{\tilde{\alpha }}}_{virtual}} \right\|}_{2}}{{C}_{i}}$$
(12)

A new dictionary D (the update of X) is then obtained by combining all images \({{d}_{{{c}_{i}}}}\left( D=\left[ {{d}_{{{c}_{1}}}},\ldots ,{{d}_{{{c}_{i}}}},\ldots \right] \right)\).

Next, when a query sample y is presented to be classified, we follow the same procedure as CRC_RLS by computing the regularized residuals \({{r}_{i}}\) but we utilize the new dictionary D:

$${{r}_{i}}={{\left\| y-{{D}_{i}}{{{\tilde{\alpha }}}_{virtual}} \right\|}_{2}}/{{\left\| {{{\tilde{\alpha }}}_{virtual}} \right\|}_{2}}$$
(13)

where \({{D}_{i}}\) represents the images of class i. The identity of a query sample y is computed by:

$$\text{identity}(y)\text{ }=\text{ }\arg {{\min }_{i}}\left\{ {{r}_{i}} \right\}$$
(14)

Below we present our virtual collaborative projection (VCP) algorithm when a query image \(y\) is presented to be classified:

The VCP Algorithm

1. Normalize the columns of X to have unit \({{l}_{2}}\)-norm.

2. Compute the average images \({{C}_{i}}\) of every class i using the formula (9).

3. Compute the virtual coefficient \({{\tilde{\alpha }}_{virtual}}\) using the formulas (10) and (11).

4. Compute \({{d}_{{{c}_{i}}}}\) using the formula (12).

5. Combine all the \({{d}_{{{c}_{i}}}}\) into a dictionary D.

6. Compute the regularized residuals \({{r}_{i}}\) using the formula (13).

7. Return the identity of y using the formula (14).
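The sketch below illustrates one possible reading of these steps; in particular, we read step 6 as coding the query over the new dictionary D with the same regularized projection used in CRC_RLS, and we keep one averaged, reweighted atom per class. The code and variable names are ours, not the authors' implementation:

```python
import numpy as np

def vcp_fit(X, labels, lam=0.001):
    """Build the VCP dictionary D (one reweighted class-average atom per class)."""
    X = X / np.linalg.norm(X, axis=0, keepdims=True)                 # step 1: unit l2 columns
    P = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)     # Eq. (10)
    classes = np.unique(labels)
    atoms = []
    for c in classes:
        C_i = X[:, labels == c].mean(axis=1)                          # Eq. (9): class average image
        a_virtual = P @ C_i                                           # Eq. (11): virtual coefficient
        atoms.append(np.linalg.norm(a_virtual) * C_i)                 # Eq. (12): reweighted atom
    return classes, np.stack(atoms, axis=1)                           # D, one column per class

def vcp_classify(y, classes, D, lam=0.001):
    """Code y over D as in CRC_RLS, then pick the class with the smallest regularized residual."""
    PD = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T)
    beta = PD @ y
    residuals = [np.linalg.norm(y - D[:, i] * beta[i]) / (abs(beta[i]) + 1e-12)
                 for i in range(len(classes))]                        # Eq. (13) per class
    return classes[int(np.argmin(residuals))]                         # Eq. (14)
```

In this reading the larger projection matrix associated with X is computed once offline, while the online classification only involves the small dictionary D, which is consistent with the running time discussion in Sect. 4.7.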

In order to investigate the efficiency of VCP versus CRC, we conducted some experiments using the AR face dataset [40] with different feature dimensions. Note that PCA is used to reduce the dimensionality of the original face images, and the Eigenface features are used for this first experiment with dimensions 54, 120 and 300.

For this comparison, we selected a subset from AR dataset that contains 50 male subjects and 50 female subjects with only illumination and expression changes. For each subject, the seven images from Session 1 were used for training and the other seven images from Session 2 were used for testing. The images were cropped and resized to 60 × 43. Table 1 shows that VCP performs slightly better than CRC_RLS [9]:

Table 1 Comparison of VCP (virtual collaborative projection) versus CRC (collaborative representation based classification) using AR data set with different dimensionality

Additional experiments are conducted in Sect. 4 on object categorization and action recognition, where we use features provided by state-of-the-art methods rather than the high order statistical moments.

We conclude this section by presenting our high order statistical binary pattern with virtual collaborative projection (SBP_VCP) algorithm, obtained by adding the step of high order statistical moment feature extraction (cf. Sect. 2) to the VCP algorithm. This additional step is performed on the training images X, resulting in a new training set, and on every query sample y.

The SBP_VCP Algorithm

1. Extract the statistical binary patterns \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) of X using the SBP Algorithm.

2. Extract the statistical binary patterns \(SBP_{P,R}^{{{m}_{1}}{{\mu }_{2}}}\) of y using the SBP Algorithm.

3. Call VCP algorithm.
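As an illustrative usage sketch on dummy data, reusing the hypothetical helpers from the previous sections:

```python
import numpy as np

rng = np.random.default_rng(0)
B = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]        # local support for the moments
gallery = [rng.random((64, 64)) for _ in range(20)]              # 4 dummy classes x 5 images
labels = np.repeat(np.arange(4), 5)
query = rng.random((64, 64))

X = np.stack([sbp_feature(im, B) for im in gallery], axis=1)     # steps 1-2: SBP features
classes, D = vcp_fit(X, labels, lam=0.001)                       # step 3: VCP training
pred = vcp_classify(sbp_feature(query, B), classes, D, lam=0.001)  # step 3: VCP classification
```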

In the next section we illustrate the performance of the SBP_VCP approach.

4 Experiments

To demonstrate the performance of our SBP_VCP algorithm, we conducted extensive experiments on multiple benchmark databases for face recognition, handwritten digit recognition, gender classification, image categorization and action recognition.

4.1 Parameter settings

We first describe how we set the parameters in the SBP_VCP algorithm. Apart from the choice of moments and their combinations, two additional parameters need to be set in the calculation of the SBP:

  • The spatial support B for calculating local moments.

  • The spatial support {P;R} for calculating the LBP.

Although those two parameters are relatively independent, it must be noticed that B has to be sufficiently large to be statistically relevant. Regarding {P;R}, this quantity is supposed to be relatively small in order to represent local micro-structures of the (moment) images.

In the following, due to space constraints, we only show experiments using structuring element B = {(1;5); (2;8)} which provides very satisfactory results on the different datasets.

Regarding {P;R}, the spatial support of the LBP, we have considered the three settings commonly found in the literature: {8;1}, {16;2}, and {24;3}.

Regarding the parameters associated with the virtual collaborative projection and the collaborative classification, we used a regularization parameter \(\lambda\) which is initialized as follows, for:

  • Face recognition (FR) without occlusion: \(\lambda =0.001\)

  • Face recognition (FR) with occlusion: \(\lambda =0.1\)

  • Gender classification (GC): \(\lambda =0.001\)

  • Handwritten digit recognition: \(\lambda =0.1\)

  • Image categorization: \(\lambda =0.001\)

  • Action recognition: \(\lambda =0.1\)

In all tables reported, the value in bold indicates the best performance. Namely, in Table 1 through Table 18, the values in bold indicate the best recognition rates; in Tables 19 and 20 the values in bold indicate the least computation time.

4.2 Face recognition (FR)

4.2.1 Extended Yale B database

The Extended Yale B [41, 42] database contains 2414 frontal face images of 38 individuals; some samples are presented in Fig. 3. We used the cropped and normalized face images of size 54 × 48, which were taken under varying illumination conditions. Three tests are considered for this dataset.

Fig. 3 Selected samples from the Extended Yale B database

Test 1. We randomly split the database into two halves. One half, which contains 32 images for each person, was used as the dictionary, and the other half was used for testing. Table 2 shows the recognition rates versus feature dimension for the nearest neighbour NN, nearest feature line NFL [43], support vector machine SVM, sparse representation based classification SRC [2], linear regression based classification LRC [44], locality-constrained linear coding LLC [45] and regularized robust coding RRC [7] methods. SBP_VCP achieves the best recognition rate for all dimensions except dimension 300, where it performs slightly worse than RRC_l1 [7] but is still superior to all the other methods considered.

Table 2 Face recognition results (test 1) of different methods on the Extended Yale B database

Test 2. For each subject, \({{N}_{tr}}\) samples are randomly chosen as training samples and 32 of the remaining images are randomly chosen as the testing data. Here the images are resized to 96 × 84 and the experiment for each \({{N}_{tr}}\) is run ten times. For comparison, we used the robust kernel representation with statistical local features SLF-RKR [36] and, with the same feature extraction, the statistical local features SLF combined with the NN, LRC, SVM, CRC and SRC based methods.

We list in Table 3 the FR performance results, measured as mean recognition accuracy. The proposed algorithm SBP_VCP achieves the best performance when \({{N}_{tr}}\) = 5 or 20 and is the second best method, slightly behind SLF-RKR_l2, when \({{N}_{tr}}\) = 10. It can also be noticed that methods based on collaborative representation (e.g., SLF-RKR [36], SLF + CRC, SLF + SRC and the original SRC) perform better than other kinds of linear representation methods (e.g., SLF + LRC, SLF + NN).

Table 3 Face recognition results (test 2) of different methods on the Extended Yale B database

Test 3. In the third test, we randomly selected between 2 and 7 images from each person as the training set and used the remaining images as the testing set. All the samples were projected into a subspace of 550 dimensions (samples in the LDA + SRC and LDA + CRC schemes are projected into a subspace of 37 dimensions); in addition to SRC and CRC, we compare our method with the JDDLDR [14], FDDL [13] and DPL [15] based approaches. The FR results are shown in Table 4.

Table 4 Face recognition results (test 3) of different methods on the Extended Yale B database

Table 4 shows that SBP_VCP gives the best results for all values of \({{N}_{tr}}\). We remark that the improvement in performance is significant as compared to all the other methods, demonstrating the advantages of combining the statistical features with this twin competitive (collaborative) classification.

4.2.2 AR database

Test 1. As in [2], we selected a subset (with only illumination and expression changes) containing 50 male and 50 female subjects from the AR database [40]; some samples are shown in Fig. 4. For each subject, the seven images from Session 1 were used for training and the other seven images from Session 2 were used for testing. The images were cropped to 60 × 43. The FR rates with baseline comparison reported in Table 5 show that the proposed approach yields the best performance among all methods considered for all dimensions, even when the dimension is 30 and competing methods perform rather poorly. As expected, all methods achieve their maximal recognition rates at dimension 300.

Fig. 4 Selected samples from the AR database

Table 5 Face recognition results (test 1) of different methods on the AR database

Test 2. For each subject, the seven images with illumination change and expressions from Session 1 were used for training, and the other seven images with only illumination change and expression from Session 2 were used for testing. The size of the original face images is 83 × 60. The recognition rates versus the number of training samples \({{N}_{tr}}\) are reported in Table 6, showing that SBP_VCP achieves the highest recognition rates, followed in order by SLF-RKR [36] and SLF + SRC.

Table 6 Face recognition results (test 2) of different methods on the AR database

4.2.3 MPIE database

The CMU Multi-PIE database [46] contains images of 337 subjects captured in four sessions with simultaneous variations in pose, expression, and illumination. Among these 337 subjects, all the 249 subjects in Session 1 were used for training. To make the FR more challenging, four subsets with both illumination and expression variations in Sessions 1, 2 and 3, were used for testing. We conducted two tests with this experimental protocol.

Test 1. In the first test, for the training set, as in [2], we used the seven frontal images with extreme illuminations {0, 1, 7, 13, 14, 16, and 18} and neutral expression (refer to Fig. 5a for examples). For the testing set, four typical frontal images with illuminations {0, 2, 7, 13} and different expressions (smile in Sessions 1 and 3, squint and surprise in Session 2) were used (refer to Fig. 5b for examples with surprise in Session 2, Fig. 5c for examples with smile in Session 1, and Fig. 5d for examples with smile in Session 3). Here we used Eigenface with dimensionality 300 as the face feature for sparse coding. Table 7 reports the recognition rates found in four testing sets.

Fig. 5 A subject in the Multi-PIE database. a Training samples with only illumination variations. b Testing samples with surprise expression and illumination variations. c, d Testing samples with smile expression and illumination variations in Session 1 and Session 3, respectively

Table 7 Face recognition results of different methods on the MPIE database

Table 7 shows that SBP_VCP gives the best results on the smile-S1 and squint-S2 sets and the second best results on the surprise-S2 and smile-S3 sets. The smile-S1 set comes from the same session (Session 1) as the training set, which explains the high accuracy; on the smile-S3 and surprise-S2 sets our method achieves the second best accuracy, 72.7 and 62.5% respectively.

Test 2. In the second test, we analyzed the impact of statistical binary pattern (SBP) on different state-of-the-art methods with the same experimental protocol as Test 1. We considered nearest neighbours NN, linear regression LRC [44], sparse representation SRC [2], collaborative representation CRC [9] and relaxed collaborative representation RCR [10] based classification. Table 8 reports the recognition rates found on the different methods with and without SBP.

Table 8 Face recognition results of different methods with SBP on the MPIE database

Results in Table 8 show that SBP consistently increases the performance of the different approaches, especially for the testing sets that are not from Session 1. The improvement in performance is significant for the collaborative classification based methods CRC and RCR; for example, the recognition rate of RCR on the squint-S2 set increases from 40 to 74.6%, and on the surprise-S2 set from 38.1 to 64.5%.

4.2.4 AR database, disguise

In this experiment, we considered a subset from the AR database consisting of 2599 images from 100 subjects (26 samples per class except for a corrupted image w-027-14.bmp), 50 males and 50 females. We performed three tests: the first one follows the experimental settings in [2]; the other two, described below, are more challenging. The images were resized to 83 × 60 in the first and third test and to 42 × 30 in the second test; four representative samples of two persons are shown in Fig. 6.

Fig. 6 Testing samples with sunglasses and scarves from the AR database

Test 1. In the first test, 799 images (about 8 samples per subject) of non-occluded frontal views with various facial expressions in Sessions 1 and 2 were used for training, while two separate subsets (with sunglasses and scarf) of 200 images (1 sample per subject per session, with neutral expression) were used for testing. The FR results are listed in Table 9 and show that the SBP_VCP method achieves a much higher recognition rate than CRC_RLS [9], RRC [7] (with scarf), SRC [2], the Gabor feature based sparse representation with Gabor occlusion dictionary GSRC [5] and the maximum correntropy criterion CESR [8].

Table 9 Test 1: Face recognition results using images with real disguise from the AR database

Test 2. In the second test, we considered FR with a more complex disguise, including variations of illumination and a longer data acquisition interval. 400 images (4 neutral images with different illuminations per subject) of non-occluded frontal views in Session 1 were used for training, while the disguised images (3 images with various illuminations and sunglasses or scarves per subject per session) in Sessions 1 and 2 were used for testing. The results, reported in Table 10, show that the SBP_VCP method achieves better performance than CRC_RLS [9], SRC [2], GSRC [5] and CESR [8], except for sunglass-S1, where it achieves the second best result after RRC [7].

Table 10 Test 2: Face recognition results using images with real disguise from the AR database

Test 3. In this test, a subset of 50 males and 50 females was selected from the AR database. For each subject, 7 samples without occlusion from Session 1 were used for training, with all the remaining samples with disguises used for testing. These testing samples (including, per subject, 3 samples with sunglasses in Session 1, 3 samples with sunglasses in Session 2, 3 samples with a scarf in Session 1 and 3 samples with a scarf in Session 2) not only have disguises, but also variations of time and illumination. Table 11 reports the FR results on the four test sets with disguise.

Table 11 Test 3: Face recognition results using images with real disguise from the AR database

Table 11 shows that the proposed method achieves the best recognition rate with sunglasses in Session 2, achieves 100% accuracy in Session 1 (as do some other methods), and achieves the second best accuracy in the sessions with scarf (SLF_RKR is ranked first). We remark that all methods perform better on Session 1 (sunglasses and scarf) than on Session 2, as Session 2 is more challenging due to the variations in illumination and acquisition time.

4.2.5 Georgia Tech database with block occlusion

The Georgia Tech (GT) Face Database [47] contains 750 color images of 50 subjects (15 images per subject), as shown in Fig. 7a. These images have large variations in pose and expression and some illumination changes. Images were converted to gray scale, cropped and resized to 90 × 68. The first eight images of all subjects were used for training (400 images) and the remaining seven images for testing (350 images). For block occlusion, a randomly located rectangle of every testing image was replaced with an unrelated image, as illustrated in Fig. 7c.

Fig. 7 a Original images of the same subject from Georgia Tech. b Original test image. c Test image with random block occlusion (30%)

Performance results reported in Table 12 compare the algorithms SBP_VCP, SBP-CRC, SBP-SRC, SBP-LRC, and SBP-NN in the presence of block occlusion ranging from 0 to 50% of the image. Table 12 shows that SBP_VCP achieves the best accuracy. Our interpretation is that this remarkable performance is due mostly to the VCP approach which efficiently takes advantage of the twin collaborative representation in the training and testing steps.

Table 12 Face recognition results using the GT database with block occlusion

4.2.6 FRGC database with block occlusion and single sample per person (SSPP)

The FRGC database [48] contains faces acquired under uncontrolled conditions as shown in Fig. 8a. Using single sample per person (SSPP) protocol as another challenging problem in FR, we randomly selected 152 images for training, 152 images for testing and replaced a randomly located block of the test image with an unrelated image, as illustrated in Fig. 8c. The images were cropped and resized to 90 × 68 pixels. The recognition accuracy on this dataset is reported in Table 13.

Fig. 8 a Original images of four different subjects from FRGC. b Original test image. c Test image with random block occlusion (30%)

Table 13 Face recognition results of different methods with block occlusion and SSPP using the FRGC database

Table 13 shows that also in this test, with block occlusion ranging from 10 to 50% of the image, our algorithm SBP_VCP achieves the best performance, exhibiting a slightly better accuracy than all the other methods considered. Note that all methods, except SBP-NN and SBP-LRC, achieve the same recognition rates without occlusion, while their performance differs in the presence of occlusion. This shows that SBP_VCP performs remarkably well in the challenging SSPP problem.

4.3 Gender classification (GC)

4.3.1 AR database

We selected a non-occluded subset (14 images per subject) of the AR database [40] consisting of 50 male and 50 female subjects. Images of the first 25 males and 25 females were used for training and the remaining images were used for testing. The images were cropped to 60 × 43. PCA was used to reduce the dimension of each image to 300. Table 14 compares SBP_VCP with the following methods: regularized nearest subspace (RNS) [49], multi-regularized features learning (MRL) [50], CRC_RLS [9], SRC [2], SVM, LRC [44] and NN. Table 14 shows that SBP_VCP outperforms the other methods considered and illustrates that the proposed method based on statistical local features is very effective for gender classification.

Table 14 Performance results on GC using the AR database

4.3.2 FEI database

There are 14 images for each of 200 individuals with a total of 2800 images [51]. The number of male and female subjects is exactly the same and equal to 100. The first nine images of all subjects are used in the training (1800 images, 900 per gender) and the remaining five images serve as testing images (1000 images, 500 per gender). Figure 9 shows all samples from one person. The images were cropped to 60 × 43.

Fig. 9 All samples from the same person in the FEI database

Here we compare SBP_VCP to the MRL [50] and CRC_RLS [9] algorithms for different feature dimensions. Table 15 shows that SBP_VCP outperforms MRL and CRC_RLS for all dimensions except dimension 30.

Table 15 Performance results on GC using the FEI database

4.4 Handwritten digit recognition

We next considered the problem of handwritten digit recognition on the widely used USPS database (Hull, J.J. 1994), which has 7291 training and 2007 test images. We used two different values of \({{N}_{tr}}\): 100 and 300 images. Results in Table 16 below show that SBP_VCP outperforms all the competing methods considered when \({{N}_{tr}}\) is 300 images. When \({{N}_{tr}}\) = 100, the Fisher discrimination dictionary learning FDDL [13] is the best performing algorithm, but our approach has the second best performance.

Table 16 Handwritten digit recognition results of different methods on the USPS database

4.5 Image categorization

We tested the proposed method on the problem of multi-class object categorization. We used one of the two Oxford flower datasets, the 17 category dataset [53], some samples of which are shown in Fig. 10. We adopted the default experimental settings provided at the website http://www.robots.ox.ac.uk/~vgg/data/flowers, including the training, validation and test splits and the multiple features. It should be noted that, in this setting, features are only extracted from flower regions which are well cropped by segmentation. This set contains 17 species of flowers with 80 images per class. As in [54], we directly use the \({{\chi }^{2}}\) distance matrices of seven features (i.e., HSV, HOG, SIFTint, SIFTbdy, color, shape and texture vocabularies) as inputs, and perform the experiments based on the three predefined training, validation and test splits. Performance results (in terms of accuracy) comparing VCP with other state-of-the-art methods are presented in Table 17 and show that VCP slightly outperforms all the other methods. Note that, since we follow [54], we did not use the SBP representation in this test.

Fig. 10 Samples from the Oxford flower dataset with 17 categories

Table 17 Categorization accuracy on the 17 category Oxford flowers data set

4.6 Action recognition

Finally, we conducted an experiment of action recognition on the UCF sport action dataset (Rodriguez et al. [57]) and the large scale UCF50 dataset. The video clips in the UCF sport action dataset were collected from various broadcast sports channels (e.g., BBC and ESPN). There are 140 videos in total and their action bank features can be found in Sadanand et al. [58]. The videos cover ten sport action classes: driving, golfing, kicking, lifting, horse riding, running, skateboarding, swinging-(pommel horse and floor), swinging-(high bar) and walking. The UCF50 dataset has 50 action categories such as baseball pitch, biking, driving, skiing (Fig. 11), and there are 6680 realistic videos collected from YouTube.

Fig. 11 UCF sports dataset: sample frames of 10 action classes along with the bounding box annotations of the humans shown in yellow

On the UCF sport action dataset, we followed the experimental settings in Rodriguez et al. [57] and evaluated VCP via five-fold cross validation, where one fold is used for testing and the remaining four folds for training. Since we use the action bank features of [58], we do not use SBP as a local feature in this test.

We compared VCP against state-of-the-art methods and reported the recognition rate in Table 18. Again, results show that VCP performs very competitively, illustrating the impact of the collaborative method.

Table 18 Recognition accuracy on the UCF Sports data set

4.7 Running time

In practical applications, training is usually an offline stage while recognition (classification) is usually an online step. Since we adopted the same classification procedure as collaborative representation based classification (CRC), we achieve a remarkable speed-up compared to many other methods due to the significant reduction in computational complexity. In fact, after projecting a query sample y via \(P={{\left( {{X}^{T}}X+\lambda I \right)}^{-1}}{{X}^{T}}\), y is classified to the class which gives the minimal \({{r}_{i}}({{\alpha }_{i}})=\left\| y-{{X}_{i}}{{\alpha }_{i}} \right\|_{2}^{2}+\lambda {{\left\| {{\alpha }_{i}} \right\|}_{n}}\), where n = 1 or 2 and \({{\alpha }_{i}}\) is the coding vector associated with class i (\(\alpha =[{{\alpha }_{1}},\ldots ,{{\alpha }_{i}},\ldots ]\) and \(y\approx X\alpha\)).

All experiments were carried out in MATLAB on a machine with a 2.20 GHz dual-core CPU and 3.00 GB of RAM. Table 19 lists the average computational cost of the training step on Test 1 and Test 2 of the AR dataset with real face disguise. The comparison of the LBP [22] and SBP algorithms shows that LBP has the lowest computation time, but SBP is close.

Table 19 Average running time (seconds) of training step using AR dataset with real face disguise

Table 20 lists the average classification cost of the different methods on Test 1 and Test 2 of the AR dataset with real face disguise. SBP_VCP has the lowest computation time, followed by RRC, while GSRC has the highest computation time.

Table 20 Average running time (seconds) of competing methods using AR dataset with real face disguise

5 Conclusion

In this paper, we have introduced a novel approach for pattern recognition combining high order statistical binary patterns and collaborative projection for robust local representation and classification. We have demonstrated that the extraction of statistical features based on the high-order moments of the images is particularly effective against image outliers. When this property is combined with our strategy for competitive, or collaborative, representation based on a trained virtual projection, we obtain a method we call SBP_VCP, which is a powerful refinement of the collaborative representation based classification recently proposed in the literature. We have validated SBP_VCP on a wide range of problems from pattern recognition and classification, including face recognition, gender classification, handwritten digit recognition, object categorization and action recognition. Extensive numerical tests and detailed comparisons with standard and state-of-the-art methods demonstrate that the proposed SBP_VCP approach performs very competitively even on challenging classification tests. Additionally, our method can be implemented at a relatively small computational cost as it relies on the same efficient framework used in CRC for the classification step.