
1 Introduction

Texture classification remains a difficult problem in computer vision and pattern recognition. The task is to assign each texture image the class label of the category it belongs to. Many texture feature descriptors have been proposed in the literature [1,2,3,4,5,6]. One of the most famous is Local Binary Patterns (LBP) [1], which describes the neighborhood of an image pixel by comparing its gray value with those of the surrounding pixels and encoding the comparisons as a binary code. Beyond texture classification, texture descriptors have also been employed in other vision tasks, such as object detection, face recognition, and defect detection.
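To make the coding step concrete, the sketch below computes the basic 8-neighbor LBP code of a pixel and the resulting histogram descriptor. It is a minimal illustration only, not the full multiresolution, rotation-invariant operator of [1].

```python
import numpy as np

def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c) from its 3x3 neighborhood."""
    center = img[r, c]
    # Fixed clockwise ordering of the 8 neighbors.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Each neighbor contributes one bit: 1 if >= center, else 0.
        if img[r + dr, c + dc] >= center:
            code |= 1 << bit
    return code

# The texture descriptor is the histogram of codes over the image.
img = np.random.randint(0, 256, (64, 64))
codes = [lbp_code(img, r, c) for r in range(1, 63) for c in range(1, 63)]
hist, _ = np.histogram(codes, bins=256, range=(0, 256))
```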

Texture description suffers from multiple factors, such as variations in scale, illumination, and perspective, so no single texture descriptor fits all situations. A common remedy is to combine multiple features [7]. In this paper, we propose to use a set of complementary feature descriptors to extract features.

Combining multiple texture descriptors, however, can make the feature dimension too high. To solve this problem, we propose a novel feature selection method. Inspired by the recent development of low-rank constraint representation [8], we design low-rank constraints to learn the global structure of the feature space and remove noise. Additionally, local structure plays an important role in feature selection [9]; we design local structure learning by combining the Euclidean distance with a KNN graph. As our experiments show, the proposed method selects more informative features.

In this paper, we propose to combine multiple texture descriptors to improve performance over single texture descriptors. Moreover, we design a novel feature selection method that reduces the dimension without loss of accuracy. Together, these improve accuracy in texture classification.

The rest of the paper is organized as follows. In Sect. 2, we present related work focusing on texture descriptors and feature selection. Section 3 describes the multiple texture descriptors. In Sect. 4, we introduce the feature selection method we proposed. The experimental results are shown in Sect. 5. Finally, we provide the conclusion in Sect. 6.

2 Related Work

Our texture classification framework involves texture feature description and feature selection. In this section, we will discuss them separately.

2.1 Texture Descriptors

Texture descriptors capture the spatial distribution of image pixels, and a variety of them have been proposed in recent years. In [1], a multiresolution LBP approach was proposed for rotation-invariant texture classification. Because of LBP’s simplicity and efficiency, many variants have followed, such as CLBP [2]. In [3], a deep convolution network computing successive wavelet transforms and modulus nonlinearities was proposed to obtain invariance to scaling. A method using lookup-table-based vector quantization for texture description was also introduced in [10].

Fusing multiple texture descriptors for robust classification is an interesting problem. For example, Li [11] combines HOG, LBP, and Gabor features for gender classification. We likewise use a combination of descriptors for feature extraction.

2.2 Feature Selection

Feature selection reduces dimensionality by selecting a subset of the most informative features, which can improve both the efficiency and the accuracy of classification.

In terms of label availability, feature selection methods can be classified into supervised and unsupervised methods. Supervised methods can effectively select discriminative features that distinguish samples from different classes. In the absence of class labels, however, it is difficult for unsupervised methods to define feature relevance. One common criterion is to select features that preserve the data similarity or manifold structure of the original feature space [12]. Such methods often generate cluster labels via clustering algorithms to guide the feature selection, e.g., MCFS [13] and UFSwithOL [14].

However, these methods share a common drawback: they ignore the effect of noise on the estimation of the data’s underlying structure. To address this, we propose a novel unsupervised method that learns the global and local structures simultaneously while removing noise from the data.

3 Multiple Texture Descriptors

In this section, we describe the multiple texture descriptors and the representation constructed from them. Similarly to [15], we combine five texture descriptors: completed local binary patterns (CLBP) [2], wavelet scattering coefficients (SCAT) [3], binary Gabor patterns (BGP) [4], local phase quantization (LPQ) [5], and binarized statistical image features (BSIF) [6]. They are briefly described below.

  • CLBP: The completed local binary pattern (CLBP) extends the conventional LBP operator by incorporating the local difference sign-magnitude transform (LDSMT). The LDSMT decomposes each local difference into two components, the difference sign and the difference magnitude, each of which is encoded as a binary code. As in conventional LBP, the center pixel is also encoded by a binary code obtained after global thresholding. Finally, the image is represented by combining the three binary codes into a single histogram.

  • SCAT: The wavelet scattering coefficients provide a translation-invariant representation of image patches. They are computed with a deep convolution network that cascades wavelet transforms and modulus nonlinearities. Invariance to scaling, shearing, and small deformations is obtained with linear operators in the scattering domain. SCAT achieves excellent results on texture databases with uncontrolled viewing conditions.

  • BGP: The binary Gabor pattern is an efficient and effective multi-resolution approach to gray-scale and rotation-invariant texture classification. Unlike the MR8 filters [16], BGP uses predefined rotation-invariant binary patterns and requires no pre-training phase. To reduce sensitivity to noise, BGP uses differences between regions instead of differences between pairs of single pixels.

  • LPQ: Local phase quantization quantizes the phase information of the local Fourier transform. It is a powerful image descriptor that is robust against the most common image blurs, and it has been shown to provide excellent results in texture and face recognition tasks.

  • BSIF: The binarized statistical image features descriptor computes a binary code for each pixel by linearly projecting local image patches onto a subspace whose basis vectors are learned from natural images via independent component analysis, and then binarizing the coordinates in this basis via thresholding. The number of basis vectors determines the length of the pixel binary codes, which are used to construct the final histogram of an image.

In our paper, we slightly modify the scope of the CLBP descriptor by computing the coded values over both an inner and an outer circle, which gives it a multi-scale character; we call this variant CLBP_ext, and the other descriptors remain unchanged. Each image is then represented by the five texture descriptors, and the final representation is obtained by concatenating the five histograms into a single one, \( \mathbf{H} = \left[ h_1, h_2, h_3, h_4, h_5 \right] \). This histogram is the original multi-texture description, denoted \( x_i \), which we subsequently optimize.
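A minimal sketch of this fusion step is given below, assuming each descriptor is available as a function returning a histogram; the function names and the per-histogram normalization are our assumptions, not specified in the paper.

```python
import numpy as np

def fuse_descriptors(image, descriptor_fns):
    """Concatenate per-descriptor histograms into one vector H = [h1,...,h5]."""
    hists = []
    for fn in descriptor_fns:  # e.g., [clbp_ext, scat, bgp, lpq, bsif] (hypothetical)
        h = np.asarray(fn(image), dtype=float)
        h /= max(h.sum(), 1e-12)  # per-histogram normalization (our assumption)
        hists.append(h)
    return np.concatenate(hists)  # this is the original representation x_i
```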

4 Unsupervised Feature Selection of Local Structures and Low-Rank Constraints

In this section, we use an unsupervised feature selection method to optimize the original texture features. Compared with supervised methods, unsupervised feature selection must work harder to find the most informative features: in the absence of class labels, the selected features should preserve the intrinsic structure of the data as presented by the full feature set before selection.

To this end, we propose to use low-rank constraints to preserve the global structure, and to adjust the local structure with heat kernel weights computed on a k-NN graph.

Let \( \mathbf{X} = \left[ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \right] \in \mathbb{R}^{d \times n} \) be the data matrix, with each column corresponding to a data instance \( \mathbf{x}_i \) and each row to a feature. We now summarize the notation and norms used in the following sections. Bold uppercase characters denote matrices and bold lowercase characters denote vectors. For an arbitrary matrix \( \mathbf{M} \in \mathbb{R}^{m \times n} \), \( \mathbf{M}_{ij} \) denotes its (i, j)-th entry, \( \mathbf{m}_i \) its i-th column vector, and \( \mathbf{m}_j^T \) its j-th row vector. The \( l_{2,1} \)-norm of \( \mathbf{M} \) is defined as \( \left\| \mathbf{M} \right\|_{2,1} = \sum\nolimits_{i=1}^{m} \sqrt{ \sum\nolimits_{j=1}^{n} \mathbf{M}_{ij}^2 } \).
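As a quick check of this definition, the \( l_{2,1} \)-norm sums the \( l_2 \) norms of the rows:

```python
import numpy as np

def l21_norm(M):
    # Sum over rows i of sqrt(sum over columns j of M[i, j]^2).
    return np.sum(np.sqrt(np.sum(M ** 2, axis=1)))

M = np.array([[3.0, 4.0],
              [0.0, 5.0]])
print(l21_norm(M))  # 5.0 + 5.0 = 10.0
```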

4.1 Global Low-Rank Constraints

In the last few decades, many algorithms have been proposed to analyze the global structure of data, such as PCA. Recently, the similarity-preserving feature selection framework has demonstrated promising performance; it selects a feature subset that preserves the pairwise similarity between high-dimensional samples. However, the original high-dimensional space contains much redundant information and noise.

Inspired by the recent development of low-rank constraint representation [8], we use low-rank reconstruction to extract the global structure of the data and remove noise. Following the theory of latent low-rank representation [17], we obtain the following problem:

$$ \min\limits_{\mathbf{Z},\mathbf{L},\mathbf{E}} \left\| \mathbf{Z} \right\|_* + \left\| \mathbf{L} \right\|_* + \lambda \left\| \mathbf{E} \right\|_{2,1} $$
(1)
$$ \text{s.t. } \mathbf{X} = \mathbf{XZ} + \mathbf{LX} + \mathbf{E} $$

where \( \mathbf{Z} \in \mathbb{R}^{n \times n} \) is the low-rank coefficient matrix, L extracts salient features, E is the noise component, and \( \lambda \) balances the noise term. Compared with pairwise similarity, the low-rank representation removes noise from the samples and captures the principal structure of the data. The optimal solution can be obtained by an iterative method.
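Solvers for problems of this form (e.g., the inexact ALM scheme used for latent LRR [17]) are typically built from two proximal steps. The sketch below shows only those two building blocks, not the full solver:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*,
    used for the nuclear-norm updates of Z and L."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def row_shrink(M, tau):
    """Row-wise shrinkage: the proximal operator of tau * ||.||_{2,1}
    (over rows, matching the norm defined above), used for the noise term E."""
    out = np.zeros_like(M)
    for i, row in enumerate(M):
        norm = np.linalg.norm(row)
        if norm > tau:
            out[i] = (1.0 - tau / norm) * row
    return out
```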

To preserve the global, low-rank reconstruction structure, we introduce a row-sparse feature selection and transformation matrix \( \mathbf{W} \in \mathbb{R}^{d \times c} \) and obtain

$$ \min\limits_{\mathbf{W}} \left\| \mathbf{W}^T\mathbf{X} - \mathbf{W}^T\mathbf{XZ} \right\|_F^2 + \beta \left\| \mathbf{W} \right\|_{2,1} $$
(2)
$$ \text{s.t. } \mathbf{W}^T\mathbf{XX}^T\mathbf{W} = \mathbf{I} $$

where \( \beta \) is a regularization parameter that encourages W to be row sparse, and \( \mathbf{W}^T\mathbf{X} \) is the low-dimensional representation after dimension reduction. In Eq. (2), the global structure captured by Z guides the selection toward the principal features; with the noise removed, the estimate of the global structure is more accurate.

4.2 Local Structure Learning

Recently, the importance of preserving local geometric structure during feature dimensionality reduction has been well recognized [9, 18], especially when transforming high-dimensional data to a low-dimensional space for analysis. Moreover, the local geometric structure of the data can serve as a data-dependent regularization of the transformation matrix, which helps maintain the local manifold structure.

In this paper, we first build a k-NN graph with heat kernel weights, which yields the weight matrix \( \mathbf{P} \in \mathbb{R}^{n \times n} \): for each data sample \( \mathbf{x}_i \), only its k nearest points \( \left\{ \mathbf{x}_j \right\}_{j=1}^{k} \) are considered neighbors, with weights \( \mathbf{P}_{ij} \). The local structure of the original feature space is then captured by the quantity

$$ \sum\nolimits_{i,j} \left\| \mathbf{x}_i - \mathbf{x}_j \right\|_2^2 \mathbf{P}_{ij} $$
(3)

With the weight matrix P, the induced Laplacian \( \mathbf{L}_P = \mathbf{D}_P - \left( \mathbf{P} + \mathbf{P}^T \right)/2 \) can be used to characterize the local manifold, where \( \mathbf{D}_P \) is a diagonal matrix whose i-th diagonal element is \( \sum\nolimits_j \left( \mathbf{P}_{ij} + \mathbf{P}_{ji} \right)/2 \).
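A sketch of this construction follows, assuming the columns of X are samples; the heat kernel bandwidth sigma is a free parameter not specified in the text.

```python
import numpy as np

def knn_heat_graph(X, k=5, sigma=1.0):
    """Build the k-NN heat kernel weight matrix P and the Laplacian L_P."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between columns of X.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    P = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D2[i])[1:k + 1]                     # k nearest, excluding self
        P[i, idx] = np.exp(-D2[i, idx] / (2 * sigma ** 2))   # heat kernel weights
    S = (P + P.T) / 2.0                                      # symmetrized weights
    L_P = np.diag(S.sum(axis=1)) - S                         # L_P = D_P - (P + P^T)/2
    return P, L_P
```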

To maintain the local structure after dimension reduction, we treat Eq. (3) as a regularizer on the transformation matrix W and obtain

$$ \min\limits_{\mathbf{W}} \sum\nolimits_{i,j=1}^{n} \left\| \mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j \right\|_2^2 \mathbf{P}_{ij} $$
(4)

Thus, the optimization problem of Eq. (4) can be viewed as local structure learning.
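This objective reduces to a trace form via the standard graph Laplacian identity (with the symmetrization of P absorbed into \( \mathbf{L}_P \) as defined above):

$$ \sum\nolimits_{i,j=1}^{n} \left\| \mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j \right\|_2^2 \mathbf{P}_{ij} = 2\,Tr\left( \mathbf{W}^T\mathbf{X}\mathbf{L}_P\mathbf{X}^T\mathbf{W} \right) $$

where the factor of 2 can be absorbed into the regularization parameter; this is what later allows Eq. (5) to be rewritten in the compact trace form of Eq. (6).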

Based on the low-rank constraints and local structure learning presented in Eq. (2) and Eq. (4), we propose a novel unsupervised feature selection method by solving the following optimization problem:

$$ \min\limits_{\mathbf{W}} \left\| \mathbf{W}^T\mathbf{X} - \mathbf{W}^T\mathbf{XZ} \right\|_F^2 + \alpha \sum\nolimits_{i,j=1}^{n} \left\| \mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j \right\|_2^2 \mathbf{P}_{ij} + \beta \left\| \mathbf{W} \right\|_{2,1} $$
(5)
$$ \text{s.t. } \mathbf{W}^T\mathbf{XX}^T\mathbf{W} = \mathbf{I} $$

where \( \alpha \) and \( \beta \) are regularization parameters balancing the fitting error of local structure learning and the sparsity of the transformation matrix W. Our method thus removes noise via the low-rank constraints while learning the global and local structures simultaneously.

4.3 Optimization Algorithm

With W the only remaining variable, an approximate optimum can be derived iteratively. Let \( \mathbf{L}_Z = \left( \mathbf{I} - \mathbf{Z} \right)\left( \mathbf{I} - \mathbf{Z} \right)^T \), \( \mathbf{L}_P = \mathbf{D}_P - \left( \mathbf{P} + \mathbf{P}^T \right)/2 \), and \( \mathbf{L} = \mathbf{L}_Z + \alpha \mathbf{L}_P \); then Eq. (5) can be rewritten as:

$$ \min\limits_{\mathbf{W}} Tr\left( \mathbf{W}^T\mathbf{XLX}^T\mathbf{W} \right) + \beta \left\| \mathbf{W} \right\|_{2,1} $$
(6)
$$ \text{s.t. } \mathbf{W}^T\mathbf{XX}^T\mathbf{W} = \mathbf{I} $$

Given the t-th estimate \( \mathbf{W}^t \), let \( \mathbf{D}_{\mathbf{W}^t} \) be a diagonal matrix whose i-th diagonal element is \( \frac{1}{2\left\| \mathbf{w}_i^t \right\|_2} \), where \( \mathbf{w}_i^t \) denotes the i-th row of \( \mathbf{W}^t \); then Eq. (6) can be rewritten as:

$$ \min\limits_{\mathbf{W}} Tr\left( \mathbf{W}^T\mathbf{X}\left( \mathbf{L} + \beta \mathbf{D}_{\mathbf{W}^t} \right)\mathbf{X}^T\mathbf{W} \right) $$
(7)
$$ \text{s.t. } \mathbf{W}^T\mathbf{XX}^T\mathbf{W} = \mathbf{I} $$

The optimal W consists of the eigenvectors corresponding to the c smallest eigenvalues of the generalized eigenproblem:

$$ \mathbf{X}\left( \mathbf{L} + \beta \mathbf{D}_{\mathbf{W}^t} \right)\mathbf{X}^T\mathbf{W} = \mathbf{XX}^T\mathbf{W}\boldsymbol{\Lambda} $$
(8)

where \( \boldsymbol{\Lambda} \) is the diagonal matrix of the corresponding eigenvalues.

The complete feature selection method is summarized in Algorithm 1.

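A minimal numpy/scipy sketch of this iterative scheme is given below; it reflects our reading of Algorithm 1, with Z from Eq. (1) and \( \mathbf{L}_P \) from Sect. 4.2 assumed precomputed, and a small ridge on \( \mathbf{XX}^T \) added by us to keep the generalized eigenproblem well posed.

```python
import numpy as np
from scipy.linalg import eigh

def select_features(X, Z, L_P, alpha, beta, c, n_iter=20, eps=1e-8):
    """Iteratively solve Eq. (7); returns the transformation matrix W (d x c)."""
    d, n = X.shape
    I_n = np.eye(n)
    L = (I_n - Z) @ (I_n - Z).T + alpha * L_P     # L = L_Z + alpha * L_P
    B = X @ X.T + eps * np.eye(d)                 # constraint: W^T X X^T W = I
    W = np.linalg.qr(np.random.randn(d, c))[0]    # random initial estimate
    for _ in range(n_iter):
        # D_W from the current W: diagonal entries 1 / (2 ||w_i||_2) over rows.
        row_norms = np.maximum(np.linalg.norm(W, axis=1), eps)
        A = X @ L @ X.T + beta * np.diag(1.0 / (2.0 * row_norms))
        # Eq. (8): eigenvectors of A w = lambda (X X^T) w, c smallest eigenvalues.
        _, W = eigh(A, B, subset_by_index=[0, c - 1])
    return W

# Feature selection: rank the d features by the row norms ||w_i||_2 of W
# and keep the top ones (300 in the experiments of Sect. 5).
```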

5 Experiments

In this section, we conduct extensive experiments to evaluate the performance of the proposed multiple descriptors combination and feature selection method in texture classification.

5.1 Data Sets

To validate the proposed method, we use two texture datasets, KTH-TIPS-2a [19] and CURET [20]. The KTH-TIPS-2a dataset consists of 11 texture categories imaged at 9 scales, 3 poses, and 4 illumination conditions. Following the standard protocol [21], we randomly select 1 sample per category for testing and the remaining 3 samples for training. The CURET dataset consists of 61 texture categories with 92 images per class; we randomly select 46 images per class for training and use the remaining ones for testing. Example images are shown in Fig. 1.

Fig. 1. Example images from KTH-TIPS-2a (top) and CURET (bottom)

Throughout our experiments, we use a one-versus-all SVM with an RBF kernel [22].
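For reference, a minimal scikit-learn stand-in for this classifier setup is shown below; the C and gamma values and the dummy data are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Dummy data standing in for the selected 300-dimensional texture features.
X_train, y_train = np.random.randn(40, 300), np.random.randint(0, 4, 40)
X_test, y_test = np.random.randn(10, 300), np.random.randint(0, 4, 10)

clf = OneVsRestClassifier(SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```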

5.2 Combining Multiple Texture Features

We start with the results for the multi-texture representations, presented in Table 1. Among the single texture features, BGP provides the best performance, and the combination of all five texture features significantly improves the classification accuracy. Although BSIF alone gives the worst performance, it still improves the accuracy of the combination. These results suggest that different texture representations carry complementary information that should be exploited.

Table 1. Classification accuracy (%) of different texture representations and their combinations

5.3 The Effect of Feature Selection Method

As shown above, combining features improves accuracy at the price of high dimensionality. We use the proposed unsupervised feature selection method to remove redundant features and reduce the dimension. How many dimensions are appropriate remains an open question; in this paper, we reduce the final dimension to 300. We set the parameters k = 5, \( \lambda = 0.1 \), \( \alpha = 0.1 \), \( \beta = 0.5 \), and c is set to the number of classes. The results are shown in Table 2.

Table 2. Classification accuracy (%) obtained with and without feature selection method

The results show that our selection method reduces the dimension without any significant loss in accuracy. Notably, on KTH-TIPS-2a, our feature selection method improves performance by 2.97% over the original representation.

To evaluate the effect of local structure learning, we run experiments with and without it. The results, shown in Table 3, indicate that local structure learning improves performance, especially on KTH-TIPS-2a.

Table 3. Classification accuracy (%) obtained with and without local structure learning

Additionally, we compare our feature selection method with classical feature selection methods, namely CFS [23], MCFS [13], UDFS [24], and UFSwithOL [14], briefly introduced below:

  • CFS: a correlation-based feature selection method.

  • MCFS: it selects the features by adopting spectral regression with \( l_{1} \)-norm regularization.

  • UDFS: it exploits local discriminative information and feature correlations simultaneously.

  • UFSwithOL: it uses a triplet-based loss function to enforce the selected feature groups to preserve the ordinal locality of original data.

Table 4 shows that all four compared feature selection methods reduce the texture feature dimension to 300, but they cause varying degrees of decline in accuracy, especially MCFS. Our feature selection method significantly outperforms the others.

Table 4. Classification accuracy (%) of different feature selection methods

5.4 Computation Cost and Parameter Sensitivity

The experiments were run on a machine with Windows 10, Matlab R2018a, an NVIDIA GeForce GTX 1080, an Intel(R) Core(TM) i5-9400 CPU @ 2.90 GHz, and 8 GB RAM. We recorded the time to solve Eq. (5) on the two datasets: 16.26 s on KTH-TIPS-2a and 14.57 s on CURET. Moreover, the classification accuracy is not very sensitive to \( \lambda \), \( \alpha \), and \( \beta \) over wide ranges.

5.5 Comparison with State-of-the-Art

Table 5 shows the classification accuracy of various methods on the two databases, taken from the original or related publications. Our texture classification method outperforms both typical and state-of-the-art methods.

Table 5. Classification accuracy (%) of various methods; “-” indicates that no result is reported in the original or related publications

6 Conclusion

In this paper, we introduced the idea of fusing complementary texture features, which significantly improves the accuracy of texture classification. To reduce the dimension of the fused texture features without loss of accuracy, we proposed a novel unsupervised feature selection method that uses low-rank constraints to learn the global structure and a regularization term to learn the local structure simultaneously. Our experimental results demonstrate that the framework combining multiple texture features with feature selection outperforms the state-of-the-art in texture classification.

In the future, we plan to design a feature complementarity evaluation method to help us find more complementary features and further improve classification accuracy. We also plan to validate our feature selection method on a wider range of datasets.