1 Introduction

Machine learning has been extensively applied to image classification and recognition. In some methods, feature extraction and selection play an important role before neural networks perform classification, and whether the features can represent the characteristics of the images largely determines the classification accuracy. Local binary patterns (LBPs) (Pietikäinen 2010) and the gray level co-occurrence matrix (GLCM) (Haralick et al. 1973) are features widely used for texture classification. On the other hand, researchers have utilized neural networks with local receptive fields, such as the convolutional neural network (CNN) (LeCun et al. 1995), to process images directly without an extra feature extraction step. In CNN, the structure formed by the convolutional layer and the pooling layer acts as a feature extraction procedure.

Extreme learning machine (ELM) was proposed by Huang et al. (2006, 2012) and performs well in both regression and classification. ELM is derived from single-hidden layer feed-forward neural networks (SLFNs), which consist of one hidden layer and one output layer. Its advantages include that the hidden layer parameters can be generated randomly and that it learns faster while keeping superior performance. ELM has been used in image classification and recognition combined with other algorithms or various kinds of image features. Li et al. (2015) employed ELM as the classifier with image features extracted by LBPs, which presented better performance than the state-of-the-art methods. Meanwhile, ELM and graph-based optimization methods were fused to boost remote sensing image classification (Bencherif et al. 2015). Zeng et al. (2017) presented a traffic sign recognition method in which CNN was used to extract features from images and ELM was again applied as the classifier.

To utilize ELM to process images directly, Huang et al. (2015) proposed the local receptive field based extreme learning machine (ELM-LRF). ELM-LRF outperforms CNN on the NORB dataset (LeCun et al. 2004a) in both classification accuracy and time consumption. Huang et al. (2017) proposed a modified ELM-LRF for texture image classification that employs multi-scale convolution kernels (ELM-MSLRF) and can therefore learn texture information at different scales. According to the experimental results, ELM-MSLRF is superior to ELM-LRF.

Gabor filters (Fogel and Sagi 1989) have been successfully employed in object detection (Jain et al. 1997), image segmentation (Jain and Farrokhnia 1991), classification (Rajadell et al. 2013) and edge detection (Mehrotra et al. 1992) for more than two decades. By providing information at different scales and orientations, Gabor filters mimic aspects of human visual perception and are often used for texture representation and description. In practical applications, Gabor filters can extract the relevant features at different scales and orientations in the frequency domain. Recently, Gabor filters have also been used as the convolution kernels of CNN (GCNN) to improve speech recognition (Chang and Morgan 2014).

On the other hand, data augmentation has been widely used in neural networks to prevent overfitting on small datasets (Cui et al. 2015; Krizhevsky et al. 2012). In these studies, label-preserving transformations were used to augment the training data. Cui et al. (2015) proposed two data augmentation methods to deal with data sparsity for both deep neural networks (DNNs) and CNN; the proposed methods increase the variations of the training data. Krizhevsky et al. (2012) introduced two forms of data augmentation to reduce overfitting: the first applied image translations and horizontal reflections, and the second altered the intensities of the RGB channels of the training images.

Motivated by the aforementioned research, two main improvements are introduced in this paper. First, to improve the performance of ELM-LRF in image classification, we use Gabor functions as one kind of convolution kernel. The Gabor functions provide more image information through filters of different scales and orientations. The proposed method, extreme learning machine with hybrid local receptive fields (ELM-HLRF), produces feature maps in the convolutional layer using both Gabor filters and randomly generated convolution filters. Second, we propose a data augmentation method using label-preserving transformations to improve classification performance. This method preprocesses the training images with Gaussian blur; the blurred images and the original training images are then combined as augmented data to train the classifiers. We evaluate the proposed methods on the following datasets: the Outex dataset (Ojala et al. 2002), the Yale face database (Georghiades et al. 1997), the ORL face database (Samaria and Harter 1994) and the NORB dataset (LeCun et al. 2004b). The experimental results demonstrate that: first, ELM-HLRF performs better than ELM-LRF, ELM and the support vector machine (SVM) (Cortes and Vapnik 1995) in classification accuracy; second, the proposed data augmentation method improves classification performance.

The rest of this paper is organized as follows. Section 2 reviews related works, Sect. 3 gives a detailed description of the proposed methods, and Sect. 4 reports and discusses the experimental results. Finally, Sect. 5 concludes the paper.

2 Related works

2.1 Gabor filters in image processing

The two-dimensional discrete Gabor function can be written as follows

$$\begin{aligned} \phi (x,y) = \frac{1}{{2\pi {\delta _x}{\delta _y}}}\exp \left[ { -\, \frac{1}{2}\left( {\frac{{{x^2}}}{{\delta _x^2}} + \frac{{{y^2}}}{{\delta _y^2}}} \right) } \right] \exp (2\pi jWx), \end{aligned}$$
(1)

where W denotes the radial frequency of the Gabor wavelet, and \({\delta _x}\) and \({\delta _y}\) are the parameters of the Gaussian envelope along the x-axis and y-axis respectively.

The Gabor filter with frequency W and orientation \(\theta \) by coordinate rotation can be given by

$$\begin{aligned} \phi '(x,y) = \frac{1}{{2\pi {\delta _x}{\delta _y}}}\exp \left[ { -\, \frac{1}{2}\left( {\frac{{x{'^2}}}{{\delta _x^2}} + \frac{{y{'^2}}}{{\delta _y^2}}} \right) } \right] \exp (2\pi jWx'), \end{aligned}$$
(2)

where \(\phi '(x,y)\) is the two-dimensional discrete Gabor function with \(x' = {\alpha ^{ - s}}(x \cdot \cos {\theta _l} + y \cdot \sin {\theta _l})\) and \(y' = {\alpha ^{ - s}}( -\, x \cdot \sin {\theta _l} + y \cdot \cos {\theta _l})\), where s is the scale index (\(s = 1,2, \ldots ,p\)), l is the orientation index (\(l = 1,2, \ldots ,q\)), the superscript \(*\) denotes the complex conjugate, and \(\alpha > 1\) is the scale factor. The pair (x, y) denotes the initial coordinate, while (\(x', y'\)) denotes the transformed coordinate. The symbols p and q are the numbers of scales and orientations of the Gabor filters respectively. The functions \(\phi (x,y)\) and \(\phi '(x,y)\) have the following relationship:

$$\begin{aligned} \phi (x,y) = {\alpha ^{ - s}}\phi '(x,y). \end{aligned}$$
(3)
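As an illustrative sketch (not the authors' implementation), the rotated and scaled Gabor function of Eqs. (2)–(3) can be sampled on a discrete grid in NumPy; the kernel size and parameter values below are assumptions chosen for illustration only:

```python
import numpy as np

def gabor_kernel(size, W, theta, s, delta_x=1.0, delta_y=1.0, alpha=2.0):
    """Sample the rotated, scaled 2-D Gabor function of Eq. (2) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    scale = alpha ** (-s)                       # the alpha^{-s} factor
    xr = scale * (x * np.cos(theta) + y * np.sin(theta))
    yr = scale * (-x * np.sin(theta) + y * np.cos(theta))
    envelope = np.exp(-0.5 * (xr**2 / delta_x**2 + yr**2 / delta_y**2))
    carrier = np.exp(2j * np.pi * W * xr)       # complex sinusoid along x'
    return envelope * carrier / (2 * np.pi * delta_x * delta_y)

# A bank with p scales and q orientations, as used later for feature extraction
p, q = 3, 4
bank = [gabor_kernel(9, W=0.4, theta=l * np.pi / q, s=s)
        for s in range(1, p + 1) for l in range(q)]
```

Each element of `bank` is a complex \(9 \times 9\) kernel; a bank of \(p \cdot q\) such kernels covers all scale/orientation combinations.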

The Fourier transform of \(\phi (x,y)\) is

$$\begin{aligned} \varPhi (u,v) = \exp \left\{ { -\, \frac{1}{2}\left[ {\frac{{{{(u - W)}^2}}}{{\delta _u^2}} + \frac{{{v^2}}}{{\delta _v^2}}} \right] } \right\} , \end{aligned}$$
(4)

where \({\delta _u} = 1/(2\pi {\delta _x})\) and \({\delta _v} = 1/(2\pi {\delta _y})\). Let I(x, y) be the input image; I(x, y) filtered by \(\phi '(x,y)\) can be written as

$$\begin{aligned} F(x,y) = \sum \limits _{{x_1}} {\sum \limits _{{y_1}} {I({x_1},{y_1})\phi '^*_{}(x - {x_1},y - {y_1})} }, \end{aligned}$$
(5)

where F(x, y) is the filter response. The mean and standard deviation of the magnitude of F(x, y) can be used as features (Manjunath and Ma 1996) of image tiles for texture classification; they are calculated as

$$\begin{aligned} {\mu _{s,l}}= & {} \sum \limits _{{x}} \sum \limits _{{y}} {{|{F}(x,y)} } |, \end{aligned}$$
(6)
$$\begin{aligned} {\delta _{s,l}}= & {} \sqrt{\sum \limits _{{x}} \sum \limits _{{y}} {{(|{F}(x,y)} } | - {\mu _{s,l}}{)^2}} . \end{aligned}$$
(7)

The feature vector for I(xy) is represented as \([{\mu _{1,1}},{\delta _{1,1}},{\mu _{1,2}},{\delta _{1,2}}, \ldots ,{\mu _{p,q}},{\delta _{p,q}}]\).
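A minimal sketch of this feature extraction, following Eqs. (5)–(7) literally (note that the paper's \(\mu \) and \(\delta \) are unnormalised sums; practical implementations often divide by the number of pixels); `scipy.signal.convolve2d` performs the convolution, and the function name is ours:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_features(image, kernels):
    """Mean and standard-deviation features of the Gabor responses, Eqs. (5)-(7)."""
    feats = []
    for k in kernels:
        # Eq. (5): convolution of the image with the conjugated kernel
        response = convolve2d(image, np.conj(k), mode='same')
        mag = np.abs(response)
        mu = mag.sum()                            # Eq. (6), as written in the paper
        delta = np.sqrt(((mag - mu) ** 2).sum())  # Eq. (7)
        feats.extend([mu, delta])
    # Feature vector [mu_11, delta_11, ..., mu_pq, delta_pq]
    return np.array(feats)
```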

Figure 1 presents an instance of Gabor filtered face image. In this instance, we set \(p = 5\) and \(q = 8\), so there are 40 Gabor filtered results.

Fig. 1
figure 1

b Gabor filtered results of the image in a. In b, each row shares the same scale and each column shares the same orientation

2.2 Brief review of ELM

ELM was proposed by Huang et al. (2006, 2012). Compared with traditional neural networks, ELM has a faster learning speed and higher accuracy, and the weights and biases of its hidden layer can be assigned randomly. The flowchart of ELM is shown in Fig. 2. Let \(({X_j},{t_j})\) \((j = 1,2, \ldots ,N)\) be the N input samples of the SLFNs, where \({X_j} = {[{x_{j1}},{x_{j2}}, \ldots ,{x_{jn}}]^T} \in {R^n}\) denotes the input feature vector and \({t_j} \in \{1,2, \ldots ,m\}\) denotes the target value of \({X_j}\). An SLFN with L hidden nodes can be written as follows

$$\begin{aligned} \sum \limits _{i = 1}^L {{\beta _i}g({W_i} \cdot {X_j} + {b_i})} = {o_j},\quad j = 1, \ldots ,N, \end{aligned}$$
(8)

where \(g( \cdot )\) denotes the activation function, \({W_i} = [{w_{i,1}},{w_{i,2}}, \ldots ,{w_{i,n}}]\) denotes the input weights, \({\beta _i}\) denotes the output weight and \({b_i}\) denotes the bias of the ith hidden layer unit. Function \({W_i} \cdot {X_j}\) denotes the inner product of \({W_i}\) and \({X_j}\).

The goal of the SLFNs is to minimize the output error; ideally, the outputs \({o_j}\) and the target outputs \({t_j}\) satisfy

$$\begin{aligned} \sum \limits _{j = 1}^N {\left\| {{o_j} - {t_j}} \right\| } = 0. \end{aligned}$$
(9)

For the training dataset, we have the following assumption:

$$\begin{aligned}&\sum \limits _{i = 1}^L {{\beta _i}g({W_i} \cdot {X_j} + {b_i})} = {t_j},\quad j = 1, \ldots ,N, \end{aligned}$$
(10)
$$\begin{aligned}&H\beta = T, \end{aligned}$$
(11)

where H is the hidden layer output matrix, \(\beta \) is the matrix of output weights and \(T\) collects the target outputs of all inputs. Explicitly,

$$\begin{aligned} \begin{array}{l} H({W_1}, \ldots ,{W_L},{b_1}, \ldots ,{b_L},{X_1}, \ldots ,{X_N})\\ \quad = {\left[ {\begin{array}{ccc} {g({W_1} \cdot {X_1} + {b_1})}&{} \cdots &{}{g({W_L} \cdot {X_1} + {b_L})}\\ \vdots &{} \ddots &{} \vdots \\ {g({W_1} \cdot {X_N} + {b_1})}&{} \cdots &{}{g({W_L} \cdot {X_N} + {b_L})} \end{array}} \right] _{N \times L}}, \end{array} \end{aligned}$$
(12)

where

$$\begin{aligned} \beta = {\left[ {\begin{array}{c} {{\beta _1}}\\ \vdots \\ {{\beta _L}} \end{array}} \right] _{L \times 1}},T = {\left[ {\begin{array}{c} {t_1^{}}\\ \vdots \\ {t_N^{}} \end{array}} \right] _{N \times 1}}. \end{aligned}$$
(13)

After training, we obtain \({\hat{W}_i}\), \({\hat{b}_i}\) and \({\hat{\beta }_i}\) which satisfy the following equation

$$\begin{aligned} \left\| {H({{\hat{W}}_i},{{\hat{b}}_i}){{\hat{\beta }}_i} - T} \right\| = \mathop {\min }\limits _{W,b,\beta } \left\| {H({W_i},{b_i}){\beta _i} - T} \right\| , \end{aligned}$$
(14)

where \(i = 1, \ldots ,L\). Minimizing (14) is equivalent to minimizing the loss function

$$\begin{aligned} E = {\sum \limits _{j = 1}^N {\left( {\sum \limits _{i = 1}^L {{\beta _i}g({W_i} \cdot {X_j} + {b_i}) - {t_j}} } \right) } ^2}. \end{aligned}$$
(15)

\(\beta \) can be calculated as follows

$$\begin{aligned} \hat{\beta }= {H^\dag }T, \end{aligned}$$
(16)

where \({H^\dag }\) is the Moore-Penrose generalized inverse of H.
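The training procedure above amounts to a few lines of linear algebra. A minimal NumPy sketch of ELM (the sigmoid activation, hidden size and function names are our illustrative choices, not the authors' code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T, L, seed=0):
    """Train a basic ELM: random hidden parameters, closed-form output weights (Eq. 16)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((L, X.shape[1]))   # random input weights W_i
    b = rng.standard_normal(L)                 # random biases b_i
    H = sigmoid(X @ W.T + b)                   # hidden layer output matrix, Eq. (12)
    beta = np.linalg.pinv(H) @ T               # Moore-Penrose solution, Eq. (16)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta
```

Note that only `beta` is learned; the random hidden layer is never updated, which is what makes ELM training fast.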

Fig. 2
figure 2

The flowchart of ELM

2.3 Brief review of ELM-LRF

In ELM-LRF, the connections between the input layer and each node of the hidden layer are generated according to a continuous probability distribution. These random connections constitute local receptive fields. ELM-LRF consists of four layers: the convolutional (hidden) layer, the pooling layer, the fully connected layer and the output layer.

In the hidden layer, the convolution kernels \({a_{{i}}}\) (\({i} = 1,2, \ldots ,{k'}\)) are randomly generated. Assume that the initial input weights are \({\hat{A}^{init}}\), the size of each input weight is \(r \times r\) and the size of each input image is \(d \times d\); the size of each feature map is then \((d - r + 1) \times (d - r + 1)\). Then

$$\begin{aligned} \begin{array}{l} {{\hat{A}}^{init}} \in {R^{{r^2} \times {k'}}},{{\hat{A}}^{init}} = [\hat{a}_1^{init},\hat{a}_2^{init}, \cdots ,\hat{a}_{{k'}}^{init}],\\ \mathrm{{ }}\hat{a}_{{i}}^{init} \in {R^{{r^2}}},{i} = 1, \ldots ,{k'}, \end{array} \end{aligned}$$
(17)

where \({\hat{A}^{init}}\) is orthogonalised using singular value decomposition (SVD). The orthogonalised input weights are \(\hat{A} = [{\hat{a}_1},{\hat{a}_2}, \ldots ,{\hat{a}_{{k'}}}]\), and each column of \(\hat{A}\) is an orthogonal basis vector of \({\hat{A}^{\mathrm{{init}}}}\). If \(r^{2}<{k'}\), \({\hat{A}^{init}}\) is transposed first, then orthogonalised, and transposed back. The convolution weight of the ith feature map, \({a_{{i}}} \in {\mathrm{{R}}^{r \times r}}\), is formed by reshaping \({\hat{a}_{{i}}}\) column by column. The convolution result of node \(({x_1},{x_2})\) at the ith feature map is \({c_{{x_1},{x_2},{i}}}\):

$$\begin{aligned} \begin{array}{l} {c_{{x_1},{x_2},{i}}} = \sum \limits _{{m_1} = 1}^r {\sum \limits _{{m_2} = 1}^r {({I_{{x_1} + {m_1} - 1,{x_2} + {m_2} - 1}} \cdot {a_{{m_1},{m_2},{i}}})} } \\ \mathop {}\nolimits _{} \mathop {}\nolimits _{} {x_1},{x_2} = 1, \ldots ,(d - r + 1), \end{array} \end{aligned}$$
(18)

where \({I_{{x_1} + {m_1} - 1,{x_2} + {m_2} - 1}}\) is the pixel value of input image I at location \(({x_1} + {m_1} - 1,{x_2} + {m_2} - 1)\).
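A sketch of how such orthogonalised random kernels can be generated; the SVD-based orthogonalisation follows the description above, while the function name and reshape convention (row-major here, column-wise in the paper) are our assumptions:

```python
import numpy as np

def random_orthogonal_kernels(r, k, seed=0):
    """Random convolution kernels orthogonalised with SVD, as in Eq. (17)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((r * r, k))          # \hat{A}^{init}, one kernel per column
    if r * r >= k:
        U, _, Vt = np.linalg.svd(A, full_matrices=False)
        A_orth = U @ Vt                          # columns are orthonormal
    else:
        # r^2 < k': transpose first, orthogonalise, transpose back
        U, _, Vt = np.linalg.svd(A.T, full_matrices=False)
        A_orth = (U @ Vt).T
    return A_orth.T.reshape(k, r, r)             # each row becomes an r x r kernel
```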

In the pooling layer, the pooling size e denotes the distance between the center and the edge of the pooling area, and the pooled maps have the same size as the feature maps, \((d - r + 1) \times (d - r + 1)\). The symbol \({c_{{x_1},{x_2},i}}\) is node (\({x_1},{x_2}\)) of the \(i\mathrm{{th}}\) feature map and \({h_{{p_1},{p_2},i}}\) is node \(({p_1},{p_2})\) of the \(i\mathrm{{th}}\) pooled map, \(i = 1,2, \ldots , k'\). The value \({h_{{p_1},{p_2},i}}\) is obtained using

$$\begin{aligned} \begin{array}{l} {h_{{p_1},{p_2},i}} = \sqrt{\sum \limits _{{x_1} = {p_1} - e}^{{p_1} + e} {\sum \limits _{{x_2} = {p_2} - e}^{{p_2} + e} {c_{{x_1},{x_2},i}^2} } } \\ {p_1},{p_2} = 1, \ldots ,(d - r + 1). \end{array} \end{aligned}$$
(19)

If \(({x_1},{x_2})\) is out of bound, \({c_{{x_1},{x_2},i}} = 0\).
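The square-root pooling of Eq. (19) can be sketched as follows; zero padding implements the out-of-bound rule, and the loop-based form is for clarity rather than speed:

```python
import numpy as np

def sqrt_pool(c, e):
    """Square-root pooling of Eq. (19): for each node, the root of the sum of
    squared convolution values in a (2e+1) x (2e+1) window, zero-padded at borders."""
    n = c.shape[0]                       # feature map is n x n, n = d - r + 1
    padded = np.zeros((n + 2 * e, n + 2 * e))
    padded[e:e + n, e:e + n] = c         # out-of-bound nodes count as 0
    h = np.empty_like(c, dtype=float)
    for p1 in range(n):
        for p2 in range(n):
            window = padded[p1:p1 + 2 * e + 1, p2:p2 + 2 * e + 1]
            h[p1, p2] = np.sqrt(np.sum(window ** 2))
    return h
```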

In the fully connected layer, each pooled map is reshaped into a row vector. The size of each pooled map is \((d - r + 1) \times (d - r + 1)\). If there are N input images, the matrix \(H' \in {\mathrm{{R}}^{N \times [k' \cdot {{(d - r + 1)}^2}]}}\) is calculated as

$$\begin{aligned} H' = {\left[ {\begin{array}{cccc} {{{\hat{h}}_{1,1}}}&{}{{{\hat{h}}_{1,2}}}&{} \cdots &{}{{{\hat{h}}_{1,k' \cdot {{(d - r + 1)}^2}}}}\\ {{{\hat{h}}_{2,1}}}&{}{{{\hat{h}}_{2,2}}}&{} \cdots &{}{{{\hat{h}}_{2,k' \cdot {{(d - r + 1)}^2}}}}\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ {{{\hat{h}}_{N,1}}}&{}{{{\hat{h}}_{N,2}}}&{} \cdots &{}{{{\hat{h}}_{N,(k') \cdot {{(d - r + 1)}^2}}}} \end{array}} \right] _{N \times [k' \cdot {{(d - r + 1)}^2}]}}, \end{aligned}$$
(20)

where \({\hat{h}_{i,j}} = g({W_j} \cdot {I_i} + {b_j})\), \({I_i}\) is the ith (\(1 \le i \le N\)) input image and \(1 \le j \le k' \cdot {(d - r + 1)^2}\). The output weight \(\beta \) of ELM-LRF is calculated as (Huang et al. 2012, 2015)

$$\begin{aligned}&\begin{array}{l} \beta = {H'}_{}^{^T}{\left( \frac{I}{C} + {H'}{H'}_{}^T\right) ^{ - 1}}T\\ \mathrm{{if}}\mathop {}\nolimits _{} N \le k' \cdot {(d - r + 1)^2}, \end{array} \end{aligned}$$
(21)
$$\begin{aligned}&\begin{array}{l} \beta = {\left( \frac{I}{C} + H'^{T}H'\right) ^{ - 1}}H'^{T}T\\ \mathrm{{if}}\mathop {}\nolimits _{} N > k' \cdot {(d - r + 1)^2}. \end{array} \end{aligned}$$
(22)

3 Methods

Section 3.1 introduces the proposed method ELM-HLRF, and Sect. 3.2 presents the data augmentation method.

3.1 Local receptive field based extreme learning machine with hybrid filter kernels

In this paper, we propose a neural network that uses both Gabor filters and randomly generated convolution kernels in the convolutional layer. The proposed method is called hybrid local receptive field based extreme learning machine (ELM-HLRF). In this modified topology, the randomly generated convolution kernels used in ELM-LRF and Gabor filters of different scales and orientations are combined to process the input images.

In image processing, with different combinations of the scale and orientation parameters, Gabor filters are employed to detect contours of various scales and orientations. Therefore the scales and orientations of Gabor filters are important parameters, and these parameters need to be chosen to get optimal training results in ELM-HLRF.

Fig. 3
figure 3

The flowchart of ELM-HLRF

The flowchart of ELM-HLRF is shown in Fig. 3. In Fig. 3, \({G_{{i_1}}} \in {\mathrm{{R}}^{r \times r}}\)\(({i_1} = 1, \ldots ,{k_1})\) is the convolution kernel provided by Gabor filters, \({k_1}\mathrm{{ = }}p \cdot q\) and

$$\begin{aligned} {G_{{i_1}}} =\,\phi _{{i_1}}'(x,y),\quad \mathop {}\limits _{} {i_1} = 1, \ldots ,{k_1}. \end{aligned}$$
(23)

The convolution result when \(\phi _{{i_1}}'(x,y)\) is applied to input image I is

$$\begin{aligned} {F_{{i_1}}}(x,y) = \sum \limits _{{x_1}} {\sum \limits _{{y_1}} {I({x_1},{y_1})\phi '^*_{{i_1}}(x - {x_1},y - {y_1})} }. \end{aligned}$$
(24)

The other kind of convolution kernel \({{a'}_{{i_2}}}\) (\({i_2} = 1,2, \ldots ,{k_2}\)) is randomly generated. According to (17), the initial input weights are \({\hat{A'}^{init}}\), and

$$\begin{aligned} \begin{array}{l} {{\hat{A'}}^{init}} \in {R^{{r^2} \times {k_2}}},{{\hat{A'}}^{init}} = \left[ \hat{a'}_1^{init},\hat{a'}_2^{init}, \ldots ,\hat{a'}_{{k_2}}^{init}\right] ,\\ \hat{a'}_{{i_2}}^{init} \in {R^{{r^2}}},\quad {i_2} = 1, \ldots ,{k_2}. \end{array} \end{aligned}$$
(25)

The orthogonalised result of \({\hat{A'}^{init}}\) is \(\hat{A'} = [{\hat{a'}_1},{\hat{a'}_2}, \ldots ,{\hat{a'}_{{k_2}}}]\). The convolution result of node \(({x_1},{x_2})\) at the \({i_2}\mathrm{{th}}\) feature map is \({{c'}_{{x_1},{x_2},{i_2}}}\), which is obtained using (18).

The convolution weights of the convolutional layer of ELM-HLRF consist of \({G_{{i_1}}} \in {\mathrm{{R}}^{r \times r}}\)\(({i_1} = 1, \ldots ,{k_1})\) and \({\mathrm{{{a'}}}_{{i_2}}} \in {\mathrm{{R}}^{r \times r}}\)\(({i_2} = 1, \ldots ,{k_2})\). The pooling map \({{h'}_{{p_1},{p_2},i}}\) (\(i = 1,2, \ldots , ({k_1} + {k_2})\)) can be calculated using (19).

As presented in Sect. 2.3, in the fully connected layer, the matrix \({\hat{H}} \in {\mathrm{{R}}^{N \times [({k_1} + {k_2}) \cdot {{(d - r + 1)}^2}]}}\) can be calculated as

$$\begin{aligned} {\hat{H}} = {\left[ {\begin{array}{cccc} {{{\hat{h}}_{1,1}}}&{}{{{\hat{h}}_{1,2}}}&{} \cdots &{}{{{\hat{h}}_{1,({k_1} + {k_2}) \cdot {{(d - r + 1)}^2}}}}\\ {{{\hat{h}}_{2,1}}}&{}{{{\hat{h}}_{2,2}}}&{} \cdots &{}{{{\hat{h}}_{2,({k_1} + {k_2}) \cdot {{(d - r + 1)}^2}}}}\\ \vdots &{} \vdots &{} \cdots &{} \vdots \\ {{{\hat{h}}_{N,1}}}&{}{{{\hat{h}}_{N,2}}}&{} \cdots &{}{{{\hat{h}}_{N,({k_1} + {k_2}) \cdot {{(d - r + 1)}^2}}}} \end{array}} \right] _{N \times [({k_1} + {k_2}) \cdot {{(d - r + 1)}^2}]}}. \end{aligned}$$
(26)

The output weight \(\xi \) of ELM-HLRF is calculated as

$$\begin{aligned}&\begin{array}{l} \xi = {\hat{H}}_{}^{^T}{\left( \frac{I}{C} + {\hat{H}}{\hat{H}}_{}^T\right) ^{ - 1}}T\\ \mathrm{{if}}\mathop {}\nolimits _{} N \le ({k_1} + {k_2}) \cdot {(d - r + 1)^2}, \end{array} \end{aligned}$$
(27)
$$\begin{aligned}&\begin{array}{l} \xi = {\left( \frac{I}{C} + {\hat{H}}^{T}{\hat{H}}\right) ^{ - 1}}{\hat{H}}^{T}T\\ \mathrm{{if}}\mathop {}\nolimits _{} N > ({k_1} + {k_2}) \cdot {(d - r + 1)^2}. \end{array} \end{aligned}$$
(28)

3.2 Data augmentation

In supervised machine learning, sufficient data are needed to train neural networks so that the model is robust and overfitting is avoided. Data augmentation is a commonly used method that enlarges datasets by label-preserving transformations. Related augmentation methods include color jittering, PCA jittering, random scale transformation, random cropping, horizontal and vertical flipping, translation, rotation, reflection, affine transformation, Gaussian noise and blurring.

In this paper, we carry out data augmentation by altering the pixel intensities of the training images. A \(5 \times 5\) Gaussian blur kernel with standard deviation 1 is used to process each training image, and each blurred image shares the label of its source image. We then train the network on the blurred images together with the original training images. Figure 4 gives an example of a Gaussian blurred face image. Trained with the augmented datasets, the learning machine becomes more robust to blurred details.
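A sketch of this augmentation step; `scipy.ndimage.convolve` applies the blur, and the border mode is our assumption since the paper does not specify how image borders are handled:

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel_5x5(sigma=1.0):
    """5 x 5 Gaussian kernel with standard deviation sigma, normalised to sum to 1."""
    ax = np.arange(-2, 3, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def augment_with_blur(images, labels, sigma=1.0):
    """Label-preserving augmentation: append a Gaussian-blurred copy of each image."""
    k = gaussian_kernel_5x5(sigma)
    blurred = [convolve(img, k, mode='nearest') for img in images]
    return list(images) + blurred, list(labels) * 2
```

The augmented list is twice the size of the original training set, with each blurred copy carrying its source image's label.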

Fig. 4
figure 4

b Gaussian blurred result of the original face image in a

4 Experimental results and analysis

In this paper, we use five datasets to evaluate the performance of ELM-HLRF. Details about the datasets, the parameters setup, the experimental results and the discussions are introduced in this section.

4.1 Datasets for performance evaluation

In this paper, we use five datasets to evaluate the performance of ELM-HLRF: the Outex\(\_\)TC\(\_\)00000, the Outex\(\_\)TC\(\_\)00012, the Yale face database, the ORL face database and the NORB dataset.

We use two different Outex test suites: Outex\(\_\)TC\(\_\)00000 and Outex\(\_\)TC\(\_\)00012. Outex\(\_\)TC\(\_\)00000 consists of 8832 images of 24 different textures; each texture has 368 images, 184 for training and 184 for testing. We use the 099 folder of Outex\(\_\)TC\(\_\)00000 for training and testing in our experiment. Outex\(\_\)TC\(\_\)00012 contains 1440 images of 24 different textures; each texture has 60 samples, 20 for training and 40 for testing, and the images are recorded under different illuminations. Images of Outex\(\_\)TC\(\_\)00012 have a resolution of 128 by 128 pixels, and we resize them to 32 by 32. Figures 5 and 6 show texture image samples of Outex\(\_\)TC\(\_\)00000 and Outex\(\_\)TC\(\_\)00012 respectively.

Fig. 5
figure 5

24 different textures of the Outex\(\_\)TC\(\_\)00000 dataset

Fig. 6
figure 6

24 different textures of the Outex\(\_\)TC\(\_\)00012 dataset

The Yale face database contains 165 grayscale images of 15 individuals. There are 11 images per subject with different expressions or configurations: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised and winking. Each image is 100 by 100 pixels; images of the Yale face database are also resized to 32 by 32 pixels to speed up the experiments in this paper. Figure 7 shows image samples from the Yale face database.

Fig. 7
figure 7

Images of the Yale face database

The ORL face database (Samaria and Harter 1994) consists of 400 images, 10 for each of 40 distinct subjects. For some subjects, the images were taken at different times, with varying lighting, facial expressions (open or closed eyes, smiling or not smiling) and facial details (glasses or no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position. The size of each image is 92 by 112 pixels; images of the ORL face database are also resized to 32 by 32 in this paper. Figure 8 shows image samples from the ORL face database.

Fig. 8
figure 8

Image samples of the ORL face database

The NORB dataset is a benchmark for object recognition (LeCun et al. 2004c). It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30\(^{\circ }\)–70\(^{\circ }\) every 5\(^{\circ }\)), and 18 azimuths (0\(^{\circ }\)–340\(^{\circ }\) every 20\(^{\circ }\)). There are 48,600 images (\(50 \times 6 \times 9 \times 18\)) in the NORB dataset, half of which (i.e., 24,300 images) are used for training and the other half for testing. We downsize them to \(32\times 32\) in the experiments.

4.2 Parameters setup

The experiments are carried out on the five datasets using Matlab on a Windows 7 64-bit system with an Intel(R) Core(TM) i5-4210U CPU and 64 GB RAM.

In this paper, the number of hidden nodes of ELM ranges from 100 to 2000. For each experiment, we search for the parameters that produce the optimal classification results on each dataset. Seven parameters have a direct effect on classification accuracy: the number of Gabor filter scales p, the number of Gabor filter orientations q, the number of convolution filters of ELM-HLRF \({k_2}\), the number of convolution filters of ELM-LRF \(k'\), the convolution size r, the pooling size e and the regularization parameter C. The value of C is chosen from \(\{0.01, 0.1, 1, 10, 100\}\); p ranges from 1 to 5 with stride 1; q is 4 or 8; \(k'\) and \({k_2}\) range from 4 to 80 with stride 4. The convolution size r ranges from 4 to 9 and the pooling size e ranges from 3 to 8.

4.3 Results and analysis

4.3.1 Performance evaluation on dataset Outex\(\_\)TC\(\_\)00012

Figure 9 shows the relationship among the classification accuracy, the pooling size e and the convolution size r of ELM-LRF when the number of convolution filters is set to 48. We can see from Fig. 9 that the highest classification accuracy is obtained when \(e=8\) and \(r=9\). For the Outex datasets Outex\(\_\)TC\(\_\)00000 and Outex\(\_\)TC\(\_\)00012, the parameters of ELM-HLRF are listed in Table 1.

Fig. 9
figure 9

Classification accuracy with different pooling size e, convolution size r of ELM-LRF when the number of the convolution filters \({k_2}\) is set to 48

Table 1 The parameters of ELM-HLRF on dataset Outex\(\_\)TC\(\_\)00000 and Outex\(\_\)TC\(\_\)00012
Table 2 The classification accuracy (\(\%\)) when ELM-HLRF and ELM-LRF are applied to dataset Outex\(\_\)TC\(\_\)00012 and \(k'\) is a constant

Table 2 presents the classification accuracy when ELM-HLRF and ELM-LRF are applied to dataset Outex\(\_\)TC\(\_\)00012. It can be seen from Table 2 that the proposed ELM-HLRF improves classification accuracy by providing Gabor filtered maps in the convolutional layer. To evaluate the quality of Gabor filters as convolution kernels, we compare the classification accuracy of ELM-HLRF with that of ELM-LRF, ELM and SVM. Table 3 shows that ELM-LRF achieves its best result (82.50\(\%\)) when \(k'= 60\), while ELM-HLRF achieves the highest accuracy (96.77\(\%\)) when \({k_2}=48\), \(p = 3\) and \(q = 4\); ELM-HLRF thus outperforms the other methods in classification accuracy.

Table 3 The classification accuracy (\(\%\)) when ELM-HLRF (\({k_2}=48\)) and ELM-LRF are applied to dataset Outex\(\_\)TC\(\_\)00012 and \(k' = ({k_2}+p \cdot q)\)

Figure 10 shows the classification results of ELM-LRF (\(k'= 60\)) and ELM-HLRF (\({k_2}=48\), \(p = 5\) and \(q = 4\)) with different numbers of training samples; the classification accuracy of ELM-HLRF is always higher than that of ELM-LRF. Figure 11 gives the time consumption of the two methods with varying numbers of training samples. It should be noted that the time here includes the time to produce the convolution weights and, for ELM-HLRF, the time of the Gabor filtering procedure in the training step. ELM-HLRF consumes more time than ELM-LRF, but it has more convolution nodes and higher classification accuracy.

ELM-HLRF is also compared with the method presented by Yang et al. (2018). The average classification accuracy of that method on Outex\(\_\)TC\(\_\)00012 is 96.54\(\%\), while ELM-HLRF achieves a higher accuracy (96.88\(\%\)).

Fig. 10
figure 10

Classification accuracy (\(\%\)) with varying number of training samples when ELM-HLRF and ELM-LRF are applied to Outex\(\_\)TC\(\_\)00012

Fig. 11
figure 11

Time-consumption (s) with varying number of training samples when ELM-HLRF and ELM-LRF are applied to Outex\(\_\)TC\(\_\)00012

4.3.2 Performance evaluation on dataset Outex\(\_\)TC\(\_\)00000

The classification results when ELM-LRF and ELM-HLRF are applied to dataset Outex\(\_\)TC\(\_\)00000 are shown in Tables 4 and 5. Several conclusions can be drawn from the two tables: first, Gabor filters are efficient convolution kernels for image feature extraction; second, Table 5 shows that when \(k' = ({k_2}+p \cdot q)\) the classification accuracy of ELM-HLRF is higher than that of ELM-LRF, so the hybrid filter kernels are superior to purely random convolution kernels. ELM-HLRF achieves its optimal performance at \({k_2}=48\), \(p=3\), \(q = 8\), and the accuracy does not increase further as p and q grow.

Table 4 The classification accuracy (\(\%\)) when ELM-HLRF and ELM-LRF are applied to dataset Outex\(\_\)TC\(\_\)00000 and \(k'=48\)
Table 5 The classification accuracy (\(\%\)) when ELM-HLRF (\({k_2} =48\)) and ELM-LRF are applied to dataset Outex\(\_\)TC\(\_\)00000 and \(k' = ({k_2}+p \cdot q)\)

Figure 12 shows the classification results of ELM-LRF (\(k'= 88\)) and ELM-HLRF (\({k_2}=48\), \(p = 5\) and \(q = 8\)) with different numbers of training samples; the classification accuracy of ELM-HLRF is always higher than that of ELM-LRF. Figure 13 gives the time consumption of the two methods with varying numbers of training samples. The numbers of convolution nodes of ELM-LRF and ELM-HLRF are the same in this experiment, and Fig. 13 shows that ELM-LRF consumes more time than ELM-HLRF.

The results of ELM-HLRF on Outex\(\_\)TC\(\_\)00000 are also compared with those of other methods. With the input images of the same size (\(32\times 32\)), ELM-HLRF has higher classification accuracy (83.54\(\%\)) than the method ((\(76.7\pm 1.8)\%\)) presented by Reininghaus et al. (2015).

Fig. 12
figure 12

Classification accuracy (\(\%\)) with varying number of training samples when the proposed method and ELM-LRF are applied to Outex\(\_\)TC\(\_\)00000

Fig. 13
figure 13

Time-consumption (seconds) with varying number of training samples when ELM-HLRF and ELM-LRF are applied to Outex\(\_\)TC\(\_\)00000

4.3.3 Performance evaluation on the Yale face database

Figure 14 shows the classification results when ELM-LRF is applied to the Yale face database, with 5 training samples per class. We can see from Fig. 14 that increasing the number of convolution maps contributes little to the classification accuracy, so we set \(k'=4\) and \(k'=8\) in this paper to limit time consumption.

Table 6 shows the classification accuracy of ELM, SVM, ELM-LRF, the method presented by Zhang et al. (2014) and ELM-HLRF. When the number of convolution filters is set to 8 (\({k_2}=4\), \(p=1\) and \(q = 4\)), the classification accuracy of ELM-HLRF is higher than that of other methods. Specifically, when the number of convolution nodes of ELM-LRF and ELM-HLRF is the same, ELM-HLRF outperforms ELM-LRF.

Fig. 14
figure 14

The classification accuracy when ELM-LRF is applied to the Yale face database and the number of training maps for each class is 5

Table 6 The comparison of classification accuracy (\(\%\)) on the Yale face database

4.3.4 Performance evaluation on the ORL face database

The classification results of ELM-HLRF on the ORL face database are shown in Table 7. In this experiment, 5 images per class are used for training and the remaining 5 for testing. We compare ELM-HLRF with ELM-LRF and the image classification method presented by Xu et al. (2015); the optimal parameters of ELM-LRF and ELM-HLRF are shown in Table 7. ELM-HLRF and ELM-LRF achieve the same classification accuracy (98.50\(\%\)), but the training time of ELM-HLRF is longer because it has more convolution kernels.

Table 7 The comparison of classification accuracy (\(\%\)) and training time (s) on the ORL face database

4.3.5 Performance evaluation on the NORB dataset

Table 9 shows the comparison of classification accuracy on the NORB dataset. The parameters of ELM-LRF and ELM-HLRF are shown in Table 8, and the optimal parameters of ELM-HLRF are shown in Table 9. We can see from Table 9 that ELM-HLRF achieves good performance (97.45\(\%\)), which is comparable to ELM-MSLRF (97.50\(\%\)) and better than the other methods. Besides, the training time of ELM-HLRF is longer than that of ELM-LRF and ELM-MSLRF because the former has more convolution kernels.

Table 8 The parameters of ELM-LRF and ELM-HLRF on NORB
Table 9 The comparison of classification accuracy (\(\%\)) and training time (s) on NORB

4.3.6 Data augmentation

In this paper, data augmentation is considered to improve the overall classification accuracy. We first use Gaussian blur to preprocess the training images. The Gaussian-blurred images, combined with the original training images, are then provided as the new training inputs of ELM-HLRF or ELM-LRF. Specifically, we use a \(5 \times 5\) Gaussian blur function with a standard deviation of 1 to preprocess the original training images. Table 10 shows the classification results when data augmentation is applied. It can be concluded that the results obtained with the proposed data augmentation method are better than those without data augmentation.
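The augmentation step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it builds a normalized \(5 \times 5\) Gaussian kernel with \(\sigma = 1\), blurs each training image, and appends the blurred copies to the original training set. The function names and the edge-padding choice are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # normalize so the kernel sums to 1

def blur(image, kernel):
    """2-D convolution with edge padding, preserving image size."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = np.sum(window * kernel)
    return out

def augment(train_images):
    """Append a Gaussian-blurred copy of every training image."""
    k = gaussian_kernel(5, 1.0)
    blurred = [blur(img, k) for img in train_images]
    return list(train_images) + blurred  # doubled training set
```

The doubled training set is then fed to the classifier exactly as the original one would be; labels for the blurred copies simply repeat the original labels.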

Table 10 Classification accuracy (\(\%\)) of the three datasets before and after data augmentation

4.4 Discussions

Fig. 15 The relationship between the number of Gabor filters and the number of random convolution kernels in ELM-HLRF on Outex\(\_\)TC\(\_\)00012

Fig. 16 The relationship between the number of Gabor filters and the number of random convolution kernels in ELM-HLRF on the ORL face database

Figures 15 and 16 show the relationship between the number of Gabor filters and the number of random convolution kernels in ELM-HLRF on the Outex\(\_\)TC\(\_\)00012 dataset and the ORL face database, respectively. We can see from Figs. 15 and 16 that the accuracy increases with the number of random convolution kernels at first, and then levels off: further increases in the number of random convolution kernels have little effect on the classification accuracy when \({k_2}\ge 24\) for Outex\(\_\)TC\(\_\)00012 and \({k_2}\ge 20\) for the ORL face database.

Furthermore, the results on Outex\(\_\)TC\(\_\)00012, the ORL face database and the NORB dataset show that ELM-HLRF needs more convolution kernels than ELM-LRF to achieve its highest classification accuracy. Therefore, ELM-HLRF spends more training time to reach its optimal performance. On the other hand, ELM-HLRF has higher accuracy than ELM-LRF, which indicates that the Gabor filters provide features that the random convolution kernels cannot extract from images.
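To make the hybrid kernel bank concrete, the following sketch assembles Gabor filter kernels alongside ELM-LRF-style random kernels. This is an illustration under assumptions, not the paper's exact construction: it uses the real part of a standard Gabor filter with evenly spaced orientations, illustrative wavelength and bandwidth parameters, and zero-meaned Gaussian random kernels (the orthogonalization step of ELM-LRF is omitted for brevity).

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter at orientation theta (radians).
    sigma: Gaussian envelope width; lam: sinusoid wavelength;
    gamma: spatial aspect ratio; psi: phase offset."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / lam + psi)

def hybrid_kernel_bank(size, k1, k2, seed=None):
    """Stack k1 Gabor kernels at evenly spaced orientations
    with k2 random convolution kernels (zero-meaned Gaussian)."""
    rng = np.random.default_rng(seed)
    gabors = [gabor_kernel(size, np.pi * i / k1) for i in range(k1)]
    randoms = []
    for _ in range(k2):
        w = rng.standard_normal((size, size))
        randoms.append(w - w.mean())  # remove the DC component
    return np.stack(gabors + randoms)  # shape: (k1 + k2, size, size)
```

Each kernel in the resulting bank is convolved with the input image to produce one feature map, so the total number of convolution maps is \(k_1 + k_2\), matching the two-part kernel structure described above.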

5 Conclusions

In this paper, we propose an innovative method, local receptive field based extreme learning machine with hybrid filter kernels (ELM-HLRF), for image classification. Two kinds of convolution kernels are included in ELM-HLRF: the random kernels of ELM-LRF and Gabor filter kernels. In addition, a data augmentation method based on Gaussian blur is used to improve classification performance. We evaluate the performance of ELM-HLRF and the data augmentation method on five datasets: Outex\(\_\)TC\(\_\)00000, Outex\(\_\)TC\(\_\)00012, the Yale face database, the ORL face database and the NORB dataset. It can be concluded that ELM-HLRF achieves higher classification accuracy than ELM-LRF, SVM and ELM, which shows that Gabor filter kernels are effective convolution kernels. The experimental results also indicate that training data augmented with the proposed method are more effective than the original training data.