1 Introduction

Texture classification is an active research topic in the fields of pattern recognition and computer vision. It has been applied to medical image analysis [1], fingerprint recognition [2], image retrieval [3], object tracking [4], material classification [5], etc. However, texture classification in real-world images is a challenging task because the texture of objects varies significantly with viewing and illumination changes, scale variations, etc. Extracting discriminative features is therefore a crucial task for texture classification.

Many texture classification methods have been proposed in the literature. The local binary pattern (LBP) algorithm [6] has gained huge success in texture classification due to its tolerance against illumination changes, computational simplicity, and good performance. Many LBP variants were later proposed. The completed LBP (CLBP) [7] extended the conventional LBP operator by combining the information of difference sign, difference magnitude and the center pixel to capture more discriminative local features. The completed local binary count (CLBC) [8] achieved good classification accuracy with lower computational complexity. Recently, Guo et al. [9] proposed the scale selective LBP (SSLBP) to address the scale variation for texture classification. Besides the LBP-based methods, the texton learning-based method is also an important research direction. The VZ-MR8 method [10] first learned a set of textons via the MR8 filter bank and K-means clustering algorithm, and then used the learned textons to describe the texture features for texture classification. Later on, Varma and Zisserman [11] proposed the VZ-Joint method where the textons were learned directly from the patches of original images instead of the MR8 filter responses, which could lead to slightly better results. Recently, Xie et al. [12] proposed an effective texton learning and encoding scheme, where the l2-norm regularization was used to learn the texton dictionary, and the texton encoding-induced statistical features (TEISF) were adopted for texture classification.

Apart from LBP-based methods and texton learning-based methods, another promising and popular approach to texture classification is Gabor filtering (GF)-based methods. Gabor wavelets were introduced to image analysis due to their biological relevance and computational properties. The Gabor wavelets, whose kernels are similar to the two-dimensional (2-D) receptive field profiles of the mammalian cortical simple cells, exhibit desirable characteristics of spatial locality and orientation selectivity, and are optimally localized in the space and frequency domains. Therefore, the Gabor wavelet representation can capture the features corresponding to different spatial frequencies (scales) and orientations. Inspired by these characteristics, several GF-based methods have been proposed [13,14,15], but these traditional GF-based methods used only the mean and variance of the magnitude of Gabor-filtered images to describe the texture feature. Recently, Hadizadeh [16] proposed the local Gabor wavelets binary patterns (LGWBP) descriptor by combining GF and LBP for texture classification, where the LBP encoding was applied to the Gabor-filtered images to capture discriminative features.

As mentioned above, the Gabor wavelets have desirable characteristics that should yield good texture classification results. However, most traditional GF-based methods use only the mean and variance of the magnitude of filtered images, which is too coarse to achieve satisfactory classification performance. Therefore, in this study, we aim to further explore the potential of the GF-based method in texture classification, so that it may achieve comparable or even better performance than the LBP-based methods and texton learning-based methods.

The main contributions of this paper could be summarized as follows:

  (1) The joint coding of both magnitude and phase components of Gabor-filtered images is proposed to represent the local Gabor feature of a texture image, and the global and local Gabor features are then fused in the framework of the nearest subspace classifier (NSC) to improve the performance of the GF-based method.

  (2) A pyramid space is constructed for each image and the proposed joint coding scheme is implemented on each level; the local Gabor feature is then obtained by taking the maximum values across all the levels, which is a simple and effective way to address the scale and resolution variation issue.

  (3) The proposed GF-based method achieves better texture classification performance than traditional GF-based methods, some state-of-the-art LBP-based methods, and some state-of-the-art texton learning-based methods.

The rest of this paper is organized as follows: Sect. 2 briefly reviews the related Gabor filter bank design and GF. Section 3 presents the proposed method in detail. Section 4 describes the databases used to evaluate the proposed method and gives the experimental results. Section 5 draws conclusions.

2 Gabor filter bank design and GF

A 2-D Gabor function \( {\mathbf{g}}\left( {x,y} \right) \) is a complex exponential modulated by a 2-D Gaussian function, which can be defined as [13, 16]:

$$ {\mathbf{g}}\left( {x,y} \right) = \frac{1}{{2\pi \sigma_{x} \sigma_{y} }}\exp \left[ { - \frac{1}{2}\left( {\frac{{x^{2} }}{{\sigma_{x}^{2} }} + \frac{{y^{2} }}{{\sigma_{y}^{2} }}} \right)} \right]\exp \left( {2\pi jFx} \right), $$
(1)

where \( \sigma_{x} \) and \( \sigma_{y} \) are the standard deviations of the 2-D Gaussian function in the horizontal and vertical directions, respectively, and \( F \) is the spatial frequency of the complex exponential. In the spatial frequency domain, the Gabor function is simply a Gaussian function centered on the frequency of interest (i.e., \( F \)) as follows:

$$ {\mathbf{G}}\left( {u,v} \right) = \exp \left[ { - \frac{1}{2}\left( {\frac{{\left( {u - F} \right)^{2} }}{{\sigma_{u}^{2} }} + \frac{{v^{2} }}{{\sigma_{v}^{2} }}} \right)} \right], $$
(2)

where \( {\mathbf{G}}\left( {u,v} \right) \) is the Fourier transform of \( {\mathbf{g}}\left( {x,y} \right) \), \( \sigma_{u} = 1/\left( {2\pi \sigma_{x} } \right) \) and \( \sigma_{v} = 1/\left( {2\pi \sigma_{y} } \right) \). Then, a set of self-similar Gabor wavelets (kernels, filters) can be obtained by appropriate scaling and rotation of \( {\mathbf{g}}\left( {x,y} \right) \), which form a complete but non-orthogonal basis set for signal analysis.
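As an illustrative sketch, Eq. (1) can be sampled directly on a discrete grid. The filter support size and the sigma values below are placeholder choices for demonstration, not the values derived later by the filter bank design:

```python
import numpy as np

def gabor_kernel(sigma_x, sigma_y, F, size=15):
    """Sample the 2-D Gabor function of Eq. (1): a complex exponential
    (frequency F along x) modulated by a 2-D Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    gauss = np.exp(-0.5 * (x**2 / sigma_x**2 + y**2 / sigma_y**2))
    gauss /= 2.0 * np.pi * sigma_x * sigma_y
    return gauss * np.exp(2j * np.pi * F * x)

# Example kernel with illustrative parameters
g = gabor_kernel(sigma_x=2.0, sigma_y=2.0, F=0.25)
```

Rotated and scaled versions of this mother kernel then form the self-similar filter bank described below.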

As the Gabor wavelets are non-orthogonal, to reduce the redundant information in the Gabor-filtered images, Manjunath and Ma [13] proposed a strategy to design a bank of Gabor filters. This design strategy ensures that the half-peak magnitude supports of the filter responses in the frequency domain touch each other with no overlap. This yields the following formulas for computing \( \sigma_{u} \) and \( \sigma_{v} \) (and thus \( \sigma_{x} \) and \( \sigma_{y} \)):

$$ \begin{aligned} a = & \left( {\frac{{F_{h} }}{{F_{l} }}} \right)^{{\frac{1}{S - 1}}} ,\quad \sigma_{u} = \frac{{\left( {a - 1} \right)F_{h} }}{{\left( {a + 1} \right)\sqrt {2\ln 2} }}, \\ \sigma_{v} = & \tan \left( {\frac{\pi }{2K}} \right)\left[ {F_{h} - 2\ln \left( {\frac{{2\sigma_{u}^{2} }}{{F_{h} }}} \right)} \right]\left[ {2\ln 2 - \left( {\frac{{2\ln 2 \cdot \sigma_{u} }}{{F_{h} }}} \right)^{2} } \right]^{{ - \frac{1}{2}}} , \\ \end{aligned} $$
(3)

where \( F_{h} \) and \( F_{l} \) denote the maximum and minimum center frequencies of interest, S is the total number of scales, and K is the total number of orientations. Figure 1 shows the real part of the Gabor filters at four scales and six orientations. The Gabor filter bank exhibits desirable characteristics of spatial frequency and orientation selectivity.
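A direct transcription of Eq. (3) is shown below, using the parameter values given later in Sect. 4.2 as defaults:

```python
import numpy as np

def bank_params(F_l=0.03, F_h=0.4, S=4, K=6):
    """Compute the scale factor a and the sigma_u, sigma_v of Eq. (3)
    for the half-peak-magnitude touching design of Manjunath and Ma [13]."""
    ln2 = np.log(2.0)
    a = (F_h / F_l) ** (1.0 / (S - 1))
    sigma_u = (a - 1.0) * F_h / ((a + 1.0) * np.sqrt(2.0 * ln2))
    sigma_v = (np.tan(np.pi / (2.0 * K))
               * (F_h - 2.0 * np.log(2.0 * sigma_u**2 / F_h))
               / np.sqrt(2.0 * ln2 - (2.0 * ln2 * sigma_u / F_h) ** 2))
    return a, sigma_u, sigma_v
```

With the default parameters this gives \( a \approx 2.37 \), i.e., adjacent scales are spaced by roughly an octave and a quarter in frequency.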

Fig. 1

The real part of the Gabor filter bank at four scales and six orientations. Each row shows one scale with six orientations

The GF of an image can be implemented by the convolution of the image with the Gabor filter bank designed above as follows:

$$ \begin{aligned} &{\mathbf{W}}^{s,k} \left( {x,y} \right) = {\mathbf{I}}\left( {x,y} \right) * {\mathbf{g}}^{s,k} \left( {x,y} \right),\\ & \quad s = 1,2, \ldots ,S,\;k = 1,2, \ldots ,K, \end{aligned} $$
(4)

where \( * \) denotes the convolution operator, \( {\mathbf{I}}\left( {x,y} \right) \) is a grayscale image, \( {\mathbf{W}}^{s,k} \left( {x,y} \right) \) is the Gabor-filtered image corresponding to the Gabor filter \( {\mathbf{g}}^{s,k} \left( {x,y} \right) \) at scale s and orientation k. In traditional GF-based methods, the mean and variance of \( {\mathbf{W}}^{s,k} \left( {x,y} \right) \) at different scales and orientations are used to represent the texture feature.
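The filtering step of Eq. (4) can be sketched as follows; FFT-based convolution is an implementation choice of ours, and the mean/variance summary reproduces the traditional GF feature mentioned above:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_filter(image, kernels):
    """Eq. (4): convolve a grayscale image with each complex Gabor kernel
    in the bank (one kernel per scale-orientation pair)."""
    return [fftconvolve(image, g, mode='same') for g in kernels]

def traditional_feature(filtered):
    """Traditional GF feature: mean and variance of the magnitude of
    each filtered image, concatenated over scales and orientations."""
    feat = []
    for W in filtered:
        mag = np.abs(W)
        feat.extend([mag.mean(), mag.var()])
    return np.asarray(feat)
```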

3 Proposed method

3.1 Framework of the proposed method

To cope with the resolution variation of texture images, a pyramid space for each original image is first constructed via downsampling and upsampling to imitate the images with different resolutions. Second, the GF is implemented by the convolution of the Gabor filter bank with each image in the pyramid space, and then the magnitude and phase components of filtered images are calculated. Third, the global and local Gabor features are extracted, respectively, where the global Gabor feature is represented by the mean and variance of the magnitude component, and the local Gabor feature is represented by the joint coding of both magnitude and phase components in a histogram form. Finally, the NSC is used for the global and local Gabor feature fusion and texture classification. Figure 2 shows the framework of the proposed method.

Fig. 2

The framework of the proposed method

3.2 Image pyramid space construction

Resolution variation exists in many texture images, which makes the texture classification more challenging. To address this problem, we construct a pyramid space for each original image to imitate the resolution variation of the original image.

In the pyramid space construction, each lower-resolution image is generated by downsampling its adjacent higher-resolution image, and the higher-resolution image is produced by upsampling its adjacent lower-resolution image. The downsampling and upsampling ratios are both set to 2. To determine the number of pyramid levels, we consider two factors: (1) the size of a texture image is usually relatively small, and too many rounds of downsampling produce images too small to retain texture information; therefore, downsampling is performed twice, i.e., only two levels are produced by downsampling from the original image. (2) The upsampling operation is time-consuming; to maintain high efficiency, only one level is produced by upsampling. Moreover, nearest neighbor interpolation is used, and only the central one-fourth of the original image is cropped and used for upsampling, which further reduces the time consumption. Therefore, the proposed pyramid space has four levels: the original image, two lower-resolution images produced by downsampling, and one higher-resolution image produced by upsampling. The left part of Fig. 2 shows the proposed sampling scheme and the resulting image pyramid space for an image.
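A minimal sketch of this four-level pyramid is given below. The paper does not specify the downsampling filter, so plain decimation is assumed here; "central one-fourth" is interpreted as the central quarter by area (half of each dimension):

```python
import numpy as np

def pyramid_space(img):
    """Four-level pyramid of Sect. 3.2: one 2x nearest-neighbour upsampling
    of the central quarter, the original, and two 2x downsamplings."""
    down1 = img[::2, ::2]            # first 2x downsampling (decimation, assumed)
    down2 = down1[::2, ::2]          # second 2x downsampling
    h, w = img.shape
    crop = img[h // 4: h // 4 + h // 2, w // 4: w // 4 + w // 2]  # central quarter
    up = np.repeat(np.repeat(crop, 2, axis=0), 2, axis=1)  # nearest-neighbour 2x
    return [up, img, down1, down2]
```

For a 64 x 64 input this yields levels of size 64 x 64 (upsampled crop), 64 x 64, 32 x 32, and 16 x 16.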

3.3 Global and local Gabor feature extraction

After the Gabor filter bank design and image pyramid space construction, we apply the GF to each image in the pyramid space. For each original image, this yields \( 4 \times S \times K \) Gabor-filtered images.

In our method, both global and local Gabor features are extracted from the Gabor-filtered images. For the global Gabor feature, we use the mean and variance of the magnitude of Gabor-filtered images at different scales and orientations, following the traditional GF method. However, our proposed global Gabor feature differs from the traditional Gabor feature in that our method introduces the image pyramid space, which addresses the resolution variation issue. Let \( \mu_{i}^{s,k} \) and \( \sigma_{i}^{s,k} \) be the mean and standard deviation (which carries the same information as the variance) of the filtered image at scale s, orientation k, and level i of the pyramid space; then the global Gabor feature can be represented as follows:

$$ \begin{gathered} {\mathbf{h}}_{{global}} = \left[ {\mu _{1}^{{1,1}} ,\sigma _{1}^{{1,1}} ,\mu _{1}^{{1,2}} ,\sigma _{1}^{{1,2}} , \ldots ,\mu _{1}^{{S,K}} ,\sigma _{1}^{{S,K}} ,} \right. \hfill \\ \left. {\quad \ldots ,\mu _{4}^{{1,1}} ,\sigma _{4}^{{1,1}} ,\mu _{4}^{{1,2}} ,\sigma _{4}^{{1,2}} , \ldots ,\mu _{4}^{{S,K}} ,\sigma _{4}^{{S,K}} } \right]. \hfill \\ \end{gathered} $$
(5)
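Assembling the global feature vector of Eq. (5) is straightforward; the sketch below assumes the filtered images are grouped per pyramid level, ordered by scale and then orientation:

```python
import numpy as np

def global_gabor_feature(pyramid_filtered):
    """Eq. (5): mean and standard deviation of the magnitude of every
    filtered image, ordered by pyramid level, then scale, then orientation."""
    feat = []
    for level in pyramid_filtered:   # 4 pyramid levels
        for W in level:              # S*K filtered images per level
            mag = np.abs(W)
            feat.extend([mag.mean(), mag.std()])
    return np.asarray(feat)
```

The resulting vector has \( 4 \times S \times K \times 2 \) entries.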

The global Gabor feature mainly describes the holistic appearance, which is too coarse to characterize fine-grained feature differences. In our study, we found that both the magnitude and the phase of Gabor-filtered images carry discriminative information; in other words, they characterize texture images in complementary ways. Therefore, we propose to use the joint coding of the magnitude and phase of Gabor-filtered images to describe the local feature of texture images. Our experimental results (refer to Sect. 4.3.1) show that the joint coding of magnitude and phase components provides better texture classification performance than using only the magnitude or phase component.

To extract the local Gabor feature of an original image, we implement the following procedures:

Step 1 Calculate the magnitude and phase components of each Gabor-filtered image as follows:

$$ \begin{aligned}& {\mathbf{M}}^{s,k} = \sqrt {{\mathbf{X}}^{2} + {\mathbf{Y}}^{2} } , \\ & {\varvec{\Phi}}^{s,k} = \left\{ {\begin{array}{*{20}l} {\arctan \left( {{\mathbf{Y}}/{\mathbf{X}}} \right),} \hfill & \quad {{\mathbf{X}} > 0} \hfill \\ {\arctan \left( {{\mathbf{Y}}/{\mathbf{X}}} \right) + \pi ,} \hfill & \quad {{\mathbf{X}} < 0,{\kern 1pt} {\mathbf{Y}} > 0} \hfill \\ {\arctan \left( {{\mathbf{Y}}/{\mathbf{X}}} \right) - \pi ,} \hfill & \quad {{\mathbf{X}} < 0,{\mathbf{Y}} < 0} \hfill \\ {\pi /2,} \hfill & \quad {{\mathbf{X}} = 0,{\mathbf{Y}} > 0} \hfill \\ { - \pi /2,} \hfill & \quad {{\mathbf{X}} = 0,{\mathbf{Y}} < 0} \hfill \\ \end{array} ,} \right.\\ &\quad s = 1,2, \ldots ,S,\;k = 1,2, \ldots ,K, \\ \end{aligned} $$
(6)

where \( {\mathbf{W}}^{s,k} \) is the filtered image at scale s and orientation k, \( {\mathbf{X}} = \text{Re} \left\{ {{\mathbf{W}}^{s,k} } \right\} \) is the real part of \( {\mathbf{W}}^{s,k} \), \( {\mathbf{Y}} = \text{Im} \left\{ {{\mathbf{W}}^{s,k} } \right\} \) is the imaginary part of \( {\mathbf{W}}^{s,k} \), \( {\mathbf{M}}^{s,k} \) is the magnitude component, \( {\varvec{\Phi}}^{s,k} \) is the phase component, and all the values are calculated by point-wise operation. The phase component is in the range of \( \left[ { - \pi ,\;\pi } \right] \). Then, the magnitude and phase are normalized as follows:

$$ \begin{aligned} &{\mathbf{M}}^{s,k} = \frac{{{\mathbf{M}}^{s,k} - \hbox{min} \left( {{\mathbf{M}}^{s,k} } \right)}}{{\hbox{max} \left( {{\mathbf{M}}^{s,k} } \right) - \hbox{min} \left( {{\mathbf{M}}^{s,k} } \right)}},\quad {\varvec{\Phi}}^{s,k} = \frac{{{\varvec{\Phi}}^{s,k} - \left( { - \pi } \right)}}{{\pi - \left( { - \pi } \right)}},\\ & \quad s = 1,2, \ldots ,S,\;k = 1,2, \ldots ,K. \end{aligned} $$
(7)
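Step 1 can be sketched compactly, since the piecewise four-quadrant definition of the phase in Eq. (6) is exactly what `np.arctan2` computes:

```python
import numpy as np

def mag_phase_normalized(W):
    """Eqs. (6)-(7): magnitude and phase of a complex filtered image,
    each min-max scaled to [0, 1]."""
    M = np.abs(W)
    P = np.arctan2(W.imag, W.real)            # four-quadrant phase in [-pi, pi]
    M = (M - M.min()) / (M.max() - M.min())   # Eq. (7), magnitude
    P = (P + np.pi) / (2.0 * np.pi)           # Eq. (7), phase
    return M, P
```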

Step 2 Jointly encode the magnitude and phase components of each filtered image by 8 bits, where the magnitude is encoded by the high 4 bits and the phase is encoded by the low 4 bits. Specifically, the normalized magnitude and phase components are first quantized into 16 levels with the range of 0–15, respectively, and then, the joint coding image can be obtained as follows:

$$ {\mathbf{J}}_{{}}^{s,k} = 16 \cdot {\mathbf{M}}_{q}^{s,k} + {\mathbf{P}}_{q}^{s,k} ,\quad s = 1,2, \ldots ,S,\;k = 1,2, \ldots ,K, $$
(8)

where \( {\mathbf{M}}_{q}^{s,k} \) and \( {\mathbf{P}}_{q}^{s,k} \) are quantized magnitude and phase components, respectively, and \( {\mathbf{J}}_{{}}^{s,k} \) is the joint coding image at scale s and orientation k. The joint coding value is in the range of 0–255.
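Step 2 amounts to packing two 4-bit codes into one byte. The quantization rule below (floor of 16 times the normalized value, with the value 1.0 clipped into the top level) is our assumption, as the paper does not state it explicitly:

```python
import numpy as np

def joint_code(M, P):
    """Eq. (8): quantize normalized magnitude and phase to 16 levels each
    and pack them into one byte, magnitude in the high 4 bits."""
    Mq = np.minimum((M * 16).astype(np.int32), 15)  # levels 0-15
    Pq = np.minimum((P * 16).astype(np.int32), 15)
    return 16 * Mq + Pq                             # joint code in 0-255
```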

Step 3 Calculate the histogram of each joint coding image \( {\mathbf{J}}_{{}}^{s,k} \), which is denoted as \( {\mathbf{h}}_{{}}^{s,k} \). The obtained histogram \( {\mathbf{h}}_{{}}^{s,k} \) can be considered as the local feature of a filtered image at scale s and orientation k.

Step 4 Take the maximum value at each bin among all the K histograms at K orientations of one scale, i.e., \( {\mathbf{h}}_{{}}^{s} = \max_{k} \left\{ {{\mathbf{h}}_{{}}^{s,k} } \right\}_{bin} \), where the subscript bin denotes the bin-wise operation. The obtained maximum histogram \( {\mathbf{h}}_{{}}^{s} \) can be viewed as the local feature of a filtered image at scale s.

Step 5 Concatenate all the maximum histograms at S scales to represent the local feature of a filtered image, i.e., \( {\mathbf{h}}_{i} = \left[ {{\mathbf{h}}_{i}^{1} ,{\mathbf{h}}_{i}^{2} , \ldots ,{\mathbf{h}}_{i}^{S} } \right] \), where the subscript i denotes the ith image corresponding to the original image in the pyramid space. For an original image, there are four corresponding images in the pyramid space, and these four images correspond to four histograms \( {\mathbf{h}}_{i} \), \( i = 1,2,3,4 \).

Step 6 Take the maximum value at each bin of four histograms \( {\mathbf{h}}_{i} \), \( i = 1,2,3,4 \) as follows:

$$ {\mathbf{h}}_{local} = \mathop {\hbox{max} }\limits_{i} \left\{ {{\mathbf{h}}_{i} } \right\}_{bin} ,\quad i = 1,2,3,4. $$
(9)

The obtained \( {\mathbf{h}}_{local} \) is used to represent the local Gabor feature of the original image.
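Steps 3 to 6 can be sketched as one function. It assumes the joint coding images (values 0-255) are grouped per pyramid level and ordered by scale, then orientation:

```python
import numpy as np

def local_gabor_feature(joint_levels, S, K):
    """Steps 3-6: 256-bin histograms of the joint coding images, max-pooled
    bin-wise over the K orientations of each scale (Step 4), concatenated
    over the S scales (Step 5), then max-pooled over the four pyramid
    levels (Step 6 / Eq. (9))."""
    level_feats = []
    for level in joint_levels:                # 4 pyramid levels, each S*K images
        per_scale = []
        for s in range(S):
            hists = []
            for k in range(K):
                J = level[s * K + k]          # joint coding image at (s, k)
                h, _ = np.histogram(J, bins=256, range=(0, 256))  # Step 3
                hists.append(h)
            per_scale.append(np.max(hists, axis=0))   # Step 4: max over orientations
        level_feats.append(np.concatenate(per_scale)) # Step 5: concat over scales
    return np.max(level_feats, axis=0)                # Step 6: max over levels
```

The resulting \( {\mathbf{h}}_{local} \) has \( 256 \times S \) bins regardless of image size.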

3.4 Feature fusion and classifier

For a test image \( \varvec{y} \), we extract its global Gabor feature \( {\mathbf{h}}_{global}^{y} \) and local Gabor feature \( {\mathbf{h}}_{local}^{y} \), respectively. To exploit the discriminative information from both \( {\mathbf{h}}_{global}^{y} \) and \( {\mathbf{h}}_{local}^{y} \), we fuse them in the framework of NSC [12]. Suppose there are C classes of textures and n training samples per class. \( {\mathbf{H}}_{global} = [{\mathbf{h}}_{global,1} ,{\mathbf{h}}_{global,2} , \ldots ,{\mathbf{h}}_{global,n} ] \) and \( {\mathbf{H}}_{local} = [{\mathbf{h}}_{local,1} ,{\mathbf{h}}_{local,2} , \ldots ,{\mathbf{h}}_{local,n} ] \) denote the sets of global and local Gabor feature histograms for one class, respectively. We project \( {\mathbf{h}}_{global}^{y} \) and \( {\mathbf{h}}_{local}^{y} \) into the subspaces spanned by \( {\mathbf{H}}_{global} \) and \( {\mathbf{H}}_{local} \), respectively, as follows:

$$ \begin{aligned} {\varvec{\uprho}}_{global} = & \left( {{\mathbf{H}}_{global}^{\text{T}} {\mathbf{H}}_{global} } \right)^{ - 1} {\mathbf{H}}_{global}^{\text{T}} {\mathbf{h}}_{global}^{y} , \\ {\varvec{\uprho}}_{local} = & \left( {{\mathbf{H}}_{local}^{\text{T}} {\mathbf{H}}_{local} } \right)^{ - 1} {\mathbf{H}}_{local}^{\text{T}} {\mathbf{h}}_{local}^{y} , \\ \end{aligned} $$
(10)

where the superscript T denotes the transpose operation.

The projection residuals can be computed as

$$ \begin{aligned} {\mathbf{err}}_{global} = & \left\| {{\mathbf{H}}_{global} {\varvec{\uprho}}_{global} - {\mathbf{h}}_{global}^{y} } \right\|_{2} , \\ {\mathbf{err}}_{local} = & \left\| {{\mathbf{H}}_{local} {\varvec{\uprho}}_{local} - {\mathbf{h}}_{local}^{y} } \right\|_{2} . \\ \end{aligned} $$
(11)

Then, we fuse the two residuals using a simple and efficient weighted average:

$$ {\mathbf{err}}_{f} = w \cdot {\mathbf{err}}_{global} + (1 - w) \cdot {\mathbf{err}}_{local} ,\quad 0 \le w \le 1, $$
(12)

where \( w \) is the weight parameter that can be determined empirically.

Finally, we classify the test texture image \( \varvec{y} \) to the class with the minimal residual as follows:

$$ \varvec{y}_{Label} = \mathop {\text{arg min} }\limits_{k} \left\{ {{\mathbf{err}}_{f} (k)} \right\},\quad k = 1,2, \ldots ,C. $$
(13)
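The NSC fusion of Eqs. (10)-(13) can be sketched as below. Instead of forming the normal equations of Eq. (10) explicitly, a least-squares solver is used, which computes the same projection with better numerical stability:

```python
import numpy as np

def nsc_classify(h_g, h_l, Hg_by_class, Hl_by_class, w=0.8):
    """Eqs. (10)-(13): per-class least-squares projection residuals for the
    global and local features, fused by a weighted average; returns the
    index of the class with the minimal fused residual."""
    errs = []
    for Hg, Hl in zip(Hg_by_class, Hl_by_class):   # one (d x n) matrix per class
        rho_g, *_ = np.linalg.lstsq(Hg, h_g, rcond=None)  # Eq. (10), global
        rho_l, *_ = np.linalg.lstsq(Hl, h_l, rcond=None)  # Eq. (10), local
        e_g = np.linalg.norm(Hg @ rho_g - h_g)            # Eq. (11)
        e_l = np.linalg.norm(Hl @ rho_l - h_l)
        errs.append(w * e_g + (1 - w) * e_l)              # Eq. (12)
    return int(np.argmin(errs))                           # Eq. (13)
```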

4 Experiments and results

4.1 Texture databases

To evaluate the performance of the proposed method, experiments are conducted on two challenging benchmark databases, namely CUReT and KTH-TIPS. The CUReT database contains 61 texture classes and 92 images per class, captured under unknown viewpoint and illumination. The CUReT database is challenging because it exhibits both large inter-class similarity and low intra-class similarity (i.e., large intra-class variation). The KTH-TIPS database contains images at 9 different scales, under 3 different poses and 3 different illumination conditions. This database contains 10 texture classes, and each class includes 81 images. Compared with the CUReT database, the scale variations make KTH-TIPS a more challenging database.

4.2 Parameter setting

In our experiments, we used the scheme proposed in [13] to design the Gabor filter bank; therefore, we set \( F_{l} = 0.03 \), \( F_{h} = 0.4 \), the size of the Gabor filter support \( N_{g} = 15 \), \( S = 4 \), and \( K = 6 \). For the global and local Gabor feature fusion, we set the weight \( w = 0.8 \) empirically.

The proposed approach was implemented with MATLAB R2016b, using a PC (Intel Core i3-6100 CPU @ 3.70 GHz, 4 GB RAM) on a Microsoft Windows 10 environment.

4.3 Experimental results

4.3.1 Classification performance using different components of Gabor-filtered images

Different components of Gabor-filtered images can be used as texture features for classification. To determine the most effective feature description, we conducted an experiment on the KTH-TIPS database to compare the classification performance using different components of Gabor-filtered images, where the proposed image pyramid space was always used. The results are listed in Table 1. In Table 1, M8 and P8 denote that only the magnitude or only the phase component, respectively, was encoded by 8 bits and then used to construct the local Gabor feature; M8P8 denotes that the magnitude (8 bits) and phase (8 bits) were encoded separately, and their encoding histograms were concatenated as the local Gabor feature.

Table 1 Classification results using different components of Gabor-filtered images

From Table 1, we could observe that: (1) our proposed method achieved the best performance of 99.36%, which demonstrated the superiority of our method in texture classification. (2) In the proposed framework of image pyramid space, no matter which component was adopted, all the methods consistently outperformed the traditional GF method, which demonstrated the effectiveness of the proposed image pyramid space in coping with the resolution variation. (3) Our proposed method significantly outperformed the traditional GF-based method: the classification accuracy of the traditional GF-based method was 91.48%, while our proposed method achieved 99.36%, a substantial improvement. (4) For the local Gabor feature description, our proposed joint coding of magnitude and phase components performed better than M8, P8, and M8P8. M8 employed only the magnitude information and P8 employed only the phase information, which also revealed that both the magnitude and phase components carry useful information for texture classification. Although M8P8 used both the magnitude and phase components, their features were extracted separately, which lost the corresponding location information between the magnitude and phase of each point. Our proposed joint coding integrated the magnitude, phase, and their corresponding location information to acquire a more discriminative feature, resulting in better classification performance.

4.3.2 Our method versus some other state-of-the-art methods

To further evaluate the classification performance of our proposed method, we compared our method with some other state-of-the-art methods based on LBP and texton learning on CUReT and KTH-TIPS databases. For the CUReT database, N = 46 images per class were randomly selected for training data, while the remaining 92 − N = 46 images per class were used as testing data. For the KTH-TIPS database, N = 40 images per class were randomly selected for training data, while the remaining 81 − N = 41 images per class were used as testing data. We repeated this random partition 100 times and calculated the average precision as the final classification accuracy. The results are listed in Table 2.

Table 2 Classification results (%) of different methods

From Table 2, we could observe that: (1) our method consistently outperformed all the other compared methods on CUReT database. Our method provided the best performance of 99.60%, which was even better than that of some recently proposed methods, such as TEISF_f, LGWBP, and SSLBP. (2) The SSLBP method gave the best performance of 99.39% on KTH-TIPS database, but our method achieved the performance of 99.36%, which was very close to that of SSLBP method. Except for SSLBP method, our method outperformed all the other compared methods on KTH-TIPS database. Therefore, the results clearly suggested that our proposed method could achieve comparable or even better texture classification performance compared with the LBP-based methods and texton learning-based methods.

4.3.3 Robustness to the number of training samples

We also evaluated the performance of our method with different numbers of training samples. On the KTH-TIPS database, N = (40, 30, 20, 10) samples per class were randomly chosen to form the training set. The results are listed in Table 3.

Table 3 Classification results (%) on KTH-TIPS with different number of training samples

From Table 3, we could observe that our method was robust to the number of training samples: when the number of training samples decreased, the classification accuracy of our method did not drop significantly. This was probably because: (1) the extracted global and local Gabor features were highly discriminative and (2) we constructed an image pyramid space for each original image, so four images with different resolutions were used to characterize a single original image, which compensated for the reduced number of training samples. Therefore, our method is very promising, especially when the number of available training samples is limited.

4.3.4 Time cost

The efficiency of a texture classification method is an important issue. We listed the average running time of our method and some other methods on the CUReT and KTH-TIPS databases in Table 4.

Table 4 Average running time (s)

From Table 4, we could observe that the proposed method had a moderate computational cost. In comparison with the traditional GF method and LBP-based methods, our method was somewhat slower. However, the computational complexity of our method was much lower than that of texton learning-based methods. Therefore, the proposed method is an efficient texture classification method that can be used for many practical applications.

5 Conclusion

In this study, we proposed an effective texture classification method by combining multi-resolution global and local Gabor features in a pyramid space. In this method, a pyramid space for each original image is constructed to address the resolution variation issue. The global and local Gabor features are extracted and fused to perform texture classification in the framework of NSC. Experimental results on the CUReT and KTH-TIPS databases demonstrate that the proposed method significantly improves the performance of GF-based texture classification methods with moderate computational complexity. In future work, we will focus on deep learning-based approaches [20, 21], a very promising direction for texture classification.