1 Introduction

Extreme learning machine (ELM), proposed by Huang et al. [1, 2], is an efficient learning algorithm for training single-hidden-layer feed-forward neural networks (SLFNs). Many studies regard ELM as a learning method for regression and multiclass classification [3–6]. Regularized ELM (RELM) has been developed for classification and regression [7]. Weighted ELM (WELM) has been used for data with imbalanced class distributions [8]. Sparse ELM achieves similar generalization performance in binary classification applications [9]. Chen et al. [10] and Lin et al. [11] present theoretical analyses of ELM. Recently, ELM has been applied to clustering, for example the ELM k-means algorithm [12] and the unsupervised extreme learning machine (US-ELM) k-means algorithm [13]. Clustering is an important part of machine learning, and many clustering methods have been proposed, such as partitioning methods [14], hierarchical methods [15, 16], density-based methods [17, 18], and graph theory methods [19, 20].

As is well known, data that are not linearly separable in the original space can become linearly separable through a nonlinear transformation. The ELM k-means algorithm performs k-means clustering in the ELM feature space. US-ELM obtains an embedded matrix by constructing a new cost function and then applies k-means clustering to this embedded matrix. However, the input weights and biases of US-ELM are randomly generated, so the outputs of the US-ELM hidden layer are not necessarily reasonable features of the original data. The extreme learning machine as an auto encoder (ELM-AE) [21] can obtain the main features of the original data, and it is not an iterative algorithm. The proposed unsupervised extreme learning machine based on embedded features of ELM-AE (US-EF-ELM) uses the features learned by ELM-AE in place of the random features of US-ELM and combines them with the US-ELM cost function to obtain the output weights.

In order to assess the performance of the proposed algorithm, we compare it with other algorithms on eight UCI data sets; the US-EF-ELM k-means algorithm achieves satisfying results on most of them. This paper is organized as follows. In Sect. 2, we review ELM, introduce the model of ELM-AE, and describe the principle of US-ELM. In Sect. 3, we give a detailed description of US-EF-ELM. In Sect. 4, experimental results on the UCI data sets and a performance analysis of the US-EF-ELM k-means algorithm are presented. Finally, conclusions and future work are given in the last section.

2 Related works

The proposed US-EF-ELM is based on ELM-AE and US-ELM. This section provides brief reviews of ELM, ELM-AE, and US-ELM.

2.1 Extreme learning machine

ELM, proposed by Huang et al., is an efficient learning algorithm for SLFNs. For N distinct samples (\(x_{i} ,y_{i}\)), i = 1, \(\ldots\), N, with \(x_{i} \in R^{j}\) and \(y_{i} \in R^{m}\), the ELM model has j input layer nodes, n hidden layer nodes, m output layer nodes, and a hidden layer activation function \(g(x)\). The outputs of the hidden layer can be expressed as Eq. 1, and the relationship between the hidden layer outputs and the output layer outputs can be expressed as Eq. 2.

$$h = g(ax + b)$$
(1)
$$h(x_{i} )\beta = y_{i} ,\;\;\;{\kern 1pt} {\text{where}}\,\,i = 1,2, \ldots ,N$$
(2)

The above equation can be rewritten as Eq. 3:

$$H\beta = Y$$
(3)

where

$$H = \left[ {\begin{array}{*{20}c} {g(\vec{a}_{1} ,b_{1} ,\vec{x}_{1} )} & {g(\vec{a}_{1} ,b_{1} ,\vec{x}_{2} )} & \cdots & {g(\vec{a}_{1} ,b_{1} ,\vec{x}_{N} )} \\ {g(\vec{a}_{2} ,b_{2} ,\vec{x}_{1} )} & {g(\vec{a}_{2} ,b_{2} ,\vec{x}_{2} )} & \cdots & {g(\vec{a}_{2} ,b_{2} ,\vec{x}_{N} )} \\ \vdots & \vdots & \ddots & \vdots \\ {g(\vec{a}_{n} ,b_{n} ,\vec{x}_{1} )} & {g(\vec{a}_{n} ,b_{n} ,\vec{x}_{2} )} & \cdots & {g(\vec{a}_{n} ,b_{n} ,\vec{x}_{N} )} \\ \end{array} } \right]^{T}\!,\quad \beta = \left[ {\begin{array}{*{20}c} {\beta_{1}^{T} } \\ {\beta_{2}^{T} } \\ \vdots \\ {\beta_{n}^{T} } \\ \end{array} } \right]_{n \times m},\quad Y = \left[ {\begin{array}{*{20}c} {y_{1}^{T} } \\ {y_{2}^{T} } \\ \vdots \\ {y_{N}^{T} } \\ \end{array} } \right]_{N \times m}.$$

Using ELM to obtain the output weights \(\beta\) can be divided into three steps:

Step 1: Randomly generate the input weights and the biases of the hidden layer.

Step 2: Calculate the hidden layer output matrix H.

Step 3: Calculate the output weights \(\beta\):

$$\beta = H^{\dag } Y$$
(4)

where \(H^{\dag }\) denotes the Moore–Penrose generalized inverse of the hidden layer output matrix H.
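
To make Steps 1-3 concrete, the following NumPy sketch trains a basic ELM with a sigmoid activation and the pseudo-inverse solution of Eq. 4. It is a minimal illustration: the function names and the uniform initialization range are our own assumptions, not part of the original description.

```python
import numpy as np

def elm_train(X, Y, n_hidden, rng=None):
    """Basic ELM (Eqs. 1-4): random hidden layer, pseudo-inverse output weights."""
    rng = np.random.default_rng(rng)
    j = X.shape[1]                                   # number of input features
    A = rng.uniform(-1.0, 1.0, size=(j, n_hidden))   # Step 1: random input weights a
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # Step 1: random hidden biases b
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))           # Step 2: hidden layer outputs, sigmoid g
    beta = np.linalg.pinv(H) @ Y                     # Step 3: beta = H^+ Y (Eq. 4)
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```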

ELM has the advantages of high training speed and good generalization. However, its robustness can be poor. To address this issue, Deng et al. [7] proposed RELM, which combines empirical risk and structural risk. RELM obtains the output weights by minimizing the regularized least squares cost function, which leads to the following formulation:

$$\hbox{min} \,\,L_{RELM} = \frac{1}{2}\left\| \beta \right\|^{2} + \frac{C}{2}\left\| {Y - H\beta } \right\|^{2}$$
(5)

where C is a parameter that balances empirical risk and structural risk.

By setting the gradient of \(L_{RELM}\) with respect to \(\beta\) to zero, we have

$$\beta - CH^{T} (Y - H\beta ) = 0$$
(6)

When the number of training samples is larger than the number of hidden layer nodes, the output weight matrix \(\beta\) in RELM can be expressed as Eq. 7:

$$\beta = \left( {\frac{I}{C} + H^{T} H} \right)^{ - 1} H^{T} Y$$
(7)

When the number of training samples is less than the number of hidden layer nodes, the output weight matrix \(\beta\) in RELM can be expressed as Eq. 8:

$$\beta = H^{T} \left( {\frac{I}{C} + HH^{T} } \right)^{ - 1} Y$$
(8)
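
For reference, here is a minimal sketch of the RELM solution, assuming the hidden layer output matrix H has already been computed; it selects whichever of Eq. 7 or Eq. 8 yields the smaller linear system. The function name is an illustrative choice.

```python
import numpy as np

def relm_output_weights(H, Y, C):
    """Regularized ELM output weights, using Eq. 7 or Eq. 8 depending on the shapes."""
    N, n = H.shape
    if N >= n:   # more samples than hidden nodes: n x n system (Eq. 7)
        return np.linalg.solve(np.eye(n) / C + H.T @ H, H.T @ Y)
    # fewer samples than hidden nodes: N x N system (Eq. 8)
    return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, Y)
```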

2.2 Extreme learning machine as an auto encoder

The auto encoder is an unsupervised neural network model that is commonly used in deep learning; its outputs are equal to its inputs. ELM-AE, proposed by Kasun et al. [21], is a novel neural network method that, like the auto encoder, can reproduce the input signal.

The ELM-AE model consists of an input layer, a single hidden layer, and an output layer. The model structure of ELM-AE is shown in Fig. 1; it has j input layer nodes, n hidden layer nodes, j output layer nodes, and a hidden layer activation function \(g(x)\). Depending on the relationship between j and n, ELM-AE has three different representations:

Fig. 1 The model structure of ELM-AE

j > n (compressed representation): features are mapped from a higher-dimensional input signal space to a lower-dimensional feature space.

j = n (equal dimension representation): the feature space has the same dimension as the input signal space.

j < n (sparse representation): features are mapped from a lower-dimensional input signal space to a higher-dimensional feature space.

There are two differences between ELM-AE and the traditional ELM. First, ELM is a supervised neural network whose outputs are labels, whereas ELM-AE is an unsupervised neural network whose outputs are equal to its inputs. Second, while the hidden layer parameters of ELM are simply randomly generated, the random input weights of ELM-AE are chosen to be orthogonal and its hidden layer biases are likewise orthogonal (normalized). For N distinct samples, the outputs of the ELM-AE hidden layer can be expressed as Eq. 9, and the relationship between the hidden layer outputs and the output layer outputs can be expressed as Eq. 10.

$$h = g(ax + b),\;\;\;{\text{where}}\;\;a^{T} a = I,\;b^{T} b = 1$$
(9)
$$h(x_{i} )\beta = x_{i}^{T} ,\;\;\;{\kern 1pt} {\text{where}}\,\,i = 1,2, \ldots ,N$$
(10)

Using ELM-AE to obtain the output weights \(\beta\) also involves three steps, but the way the output weights \(\beta\) are calculated in ELM-AE differs from that in ELM.

For sparse and compressed ELM-AE representations, output weights \(\beta\) are calculated by Eqs. 11 and 12:

When the number of training samples is more than the number of hidden layer nodes,

$$\beta = \left( {\frac{I}{C} + H^{T} H} \right)^{ - 1} H^{T} X$$
(11)

When the number of training samples is less than the number of hidden layer nodes,

$$\beta = H^{T} \left( {\frac{I}{C} + HH^{T} } \right)^{ - 1} X$$
(12)

For equal dimension ELM-AE representation, output weights \(\beta\) are calculated by Eq. 13:

$$\beta = H^{ - 1} X$$
(13)
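
The sketch below puts Eqs. 9-13 together under the assumptions stated above (orthogonalized random input weights, unit-norm bias, sigmoid activation). It is illustrative only; in particular, the equal-dimension case uses a pseudo-inverse, since H is N x n and generally not square in practice.

```python
import numpy as np

def elm_ae_output_weights(X, n_hidden, C, rng=None):
    """ELM-AE output weights beta (Eqs. 9-13): orthogonal random hidden layer
    parameters, regularized least-squares reconstruction of the input X."""
    rng = np.random.default_rng(rng)
    N, j = X.shape
    W = rng.standard_normal((j, n_hidden))
    if j >= n_hidden:
        A, _ = np.linalg.qr(W)               # orthonormal columns: a^T a = I
    else:
        Q, _ = np.linalg.qr(W.T)             # sparse case: orthonormal rows instead
        A = Q.T
    b = rng.standard_normal(n_hidden)
    b = b / np.linalg.norm(b)                # b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))   # hidden layer outputs (Eq. 9)
    if n_hidden == j:
        # Equal-dimension case (Eq. 13); pseudo-inverse because H is generally not square.
        return np.linalg.pinv(H) @ X
    if N >= n_hidden:                        # Eq. 11
        return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, X)   # Eq. 12
```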

2.3 Unsupervised extreme learning machine

US-ELM proposed by Huang et al. [13] uses the least squares method to obtain an embedded matrix \(E \in R^{N \times m}\) (where m is chosen by the user) that can be used for clustering, and the clustering validity of E is better than that of the original samples \(X \in R^{N \times j}\).

If two samples \(x_{i}\) and \(x_{j}\) are close to each other, then the corresponding outputs \(y_{i}\) and \(y_{j}\) of US-ELM should also be close to each other. US-ELM therefore minimizes the following cost function,

$$\begin{aligned} L_{m} & = \frac{{\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {w_{ij} (y_{i} - y_{j} )^{2} } } }}{{2(c_{1} - c_{2} )^{2} }} = \frac{{\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {w_{ij} (y_{i}^{2} - 2y_{i} y_{j} + y_{j}^{2} )} } }}{{2(c_{1} - c_{2} )^{2} }} \\ & = \frac{{\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} { - 2w_{ij} y_{i} y_{j} } } + \sum\limits_{i = 1}^{n} {y_{i}^{2} \left( {\sum\limits_{j = 1}^{n} {w_{ij} } } \right)} }}{{2(c_{1} - c_{2} )^{2} }} = \frac{{2\hat{Y}^{T} (D - W)\hat{Y}}}{{2(c_{1} - c_{2} )^{2} }} = \frac{{\hat{Y}^{T} L\hat{Y}}}{{(c_{1} - c_{2} )^{2} }} \\ \end{aligned}$$
(14)

where W is the similarity (adjacency) matrix with entries \(w_{ij}\), D is the diagonal degree matrix with \(D_{ii} = \sum\nolimits_{j} {w_{ij} }\), \(L = D - W\) is the graph Laplacian, and \(\hat{Y} = [\vec{y}_{1} , \ldots ,\vec{y}_{n} ]^{T}\) stacks the outputs.
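
The construction of W, D, and L is not detailed in this section; the sketch below assumes one common choice, a symmetrized k-nearest-neighbour graph with Gaussian (heat kernel) weights. The neighbourhood size and kernel width are illustrative parameters.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(X, n_neighbors=10, sigma=1.0):
    """Build an affinity matrix W, degree matrix D, and Laplacian L = D - W (Eq. 14)."""
    dist = kneighbors_graph(X, n_neighbors, mode="distance", include_self=False)
    W = dist.copy()
    W.data = np.exp(-W.data ** 2 / (2.0 * sigma ** 2))   # heat kernel weights
    W = 0.5 * (W + W.T)                                   # symmetrize the k-NN graph
    W = W.toarray()
    D = np.diag(W.sum(axis=1))                            # degree matrix
    L = D - W                                             # graph Laplacian
    return W, D, L
```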

US-ELM aims to obtain the output weights by minimizing the least squares regularization cost function, which leads to the following formulation:

$$\begin{aligned} & \hbox{min} \, L_{{US{ - }ELM}} = \left\| \beta \right\|^{2} + C\,{\text{Tr}}(\beta^{T} H^{T} LH\beta ) = {\text{Tr}}(\beta^{T} (I + CH^{T} LH)\beta ) \\ & \,\,\,\,{\text{subject}}\;{\text{to}}\;(H\beta )^{T} H\beta = I \\ \end{aligned}$$
(15)

where \({\text{Tr(}} \cdot )\) denotes the trace of a matrix.

If the number of US-ELM output layer nodes is m, then an optimal solution to Eq. 15 is to choose output weights \(\beta\) whose columns are the generalized eigenvectors corresponding to the m smallest eigenvalues. However, the Laplacian always has a constant eigenvector 1, and this eigenvector corresponds to the smallest eigenvalue 0. We therefore choose the eigenvectors corresponding to the first m smallest eigenvalues, excluding the one for eigenvalue 0, to compose the output weights \(\beta\).

When the number of training samples is more than the number of hidden layer nodes,

$$(I + CH^{T} LH)v = \gamma H^{T} Hv$$
(16)
$$\beta = [\tilde{v}_{2} ,\tilde{v}_{3} , \ldots ,\tilde{v}_{m + 1} ],\;\;\;{\text{where}}\;\tilde{v}_{i} = v_{i} /\left\| {Hv_{i} } \right\|$$
(17)

When the number of training samples is less than the number of hidden layer nodes,

$$(I + CLHH^{T} )\,u = \gamma HH^{T} u$$
(18)
$$\beta = H^{T} [\tilde{u}_{2} ,\tilde{u}_{3} , \ldots ,\tilde{u}_{m + 1} ],\;\;\;{\text{where}}\;\tilde{u}_{i} = u_{i} /\left\| {HH^{T} u_{i} } \right\|$$
(19)
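
As an illustration, the embedding for the case N > n (Eqs. 16 and 17) can be computed with SciPy's generalized symmetric eigensolver. The small diagonal jitter added to \(H^{T}H\) for numerical stability and the function signature are our own choices; the N < n case (Eqs. 18 and 19) is handled analogously.

```python
import numpy as np
from scipy.linalg import eigh

def us_elm_embedding(H, L, C, m):
    """US-ELM embedding for N > n (Eqs. 16-17): generalized eigenvectors of
    (I + C H^T L H) v = gamma H^T H v, skipping the first (constant) eigenvector."""
    N, n = H.shape
    A = np.eye(n) + C * (H.T @ L @ H)
    B = H.T @ H + 1e-10 * np.eye(n)                  # jitter so B is positive definite
    # eigh returns eigenvalues in ascending order; take indices 1..m (skip index 0).
    vals, vecs = eigh(A, B, subset_by_index=[1, m])
    V = vecs / np.linalg.norm(H @ vecs, axis=0)      # normalize: v_i / ||H v_i|| (Eq. 17)
    return H @ V                                     # embedded matrix E = H beta
```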

3 Unsupervised extreme learning machine based on embedded features of ELM-AE

US-ELM obtains the output weights by constructing a cost function different from the ELM cost function and requires no supervised information. However, its randomly generated input weights and biases mean that the outputs of the US-ELM hidden layer may not represent the features of the original samples well. ELM-AE is an artificial neural network algorithm that is able to reconstruct the input signal; to achieve this reconstruction, ELM-AE obtains the main features of the original samples, as an auto encoder does, and it finds principal components that represent the original samples without any iterative learning process. We can therefore use the features learned by ELM-AE instead of the random hidden layer outputs of US-ELM, and then use the US-ELM cost function to obtain the output weights. We call this method US-EF-ELM.

For N distinct samples (\(x_{i} ,y_{i}\)), i = 1, \(\ldots\), N, with \(x_{i} \in R^{j}\) and \(y_{i} \in R^{m}\), the model structure of US-EF-ELM is shown in Fig. 2, with j input layer nodes, n hidden layer nodes, m output layer nodes, and a hidden layer activation function g(x). First, ELM-AE is used to calculate the output weights \(\beta_{1}\), and the transpose of \(\beta_{1}\) is used as the input weights of US-EF-ELM. The hidden layer output matrix H of US-EF-ELM is then computed as follows:

$${\text{When}}\,\,n \ne j\;\;\;{\text{and}}\;\;\;N > n,\,\,\,\beta_{1} = \left( {\frac{{I_{n} }}{{C_{1} }} + H_{{ELM{ - }AE}}^{T} H_{{ELM{ - }AE}} } \right)^{ - 1} H_{{ELM{ - }AE}}^{T} X$$
(20)
$${\text{When}}\,\,n\, \ne \,j\;\;\;{\kern 1pt} {\text{and}}\;\;\;{\kern 1pt} N < n,\,\,\,\beta_{1} = H_{{ELM{ - }AE}}^{T} \left( {\frac{{I_{N} }}{{C_{1} }} + H_{{ELM{ - }AE}} H_{{ELM{ - }AE}}^{T} } \right)^{ - 1} X$$
(21)
$${\text{When}}\,\,\,n = j,\,\,\,\beta_{1} = H_{{ELM{ - }AE}}^{ - 1} X$$
(22)
$$H = g(X \cdot \beta_{1}^{T} )$$
(23)

Fig. 2 The model structure of US-EF-ELM

US-EF-ELM aims to obtain the output weights by minimizing the US-ELM cost function, which leads to the following formulation:

$$\mathop {\hbox{min} }\limits_{{\beta_{2}^{T} H^{T} H\beta_{2} = I}} \, L_{{US{ - }EF{ - }ELM}} = \left\| {\beta_{2} } \right\|^{2} + C_{2} {\text{Tr}}(\beta_{2}^{T} H^{T} LH\beta_{2} ) = {\text{Tr}}(\beta_{2}^{T} (I + C_{2} H^{T} LH)\beta_{2} )$$
(24)

As in US-ELM, we choose the eigenvectors corresponding to the first m smallest eigenvalues, excluding the one corresponding to eigenvalue 0, to compose the output weights \(\beta_{2}\).

$$\begin{aligned} & {\text{When}}\;N > n,\;\;(I_{n} + C_{2} H^{T} LH)v = \gamma H^{T} Hv, \\ & \beta_{2} = [\tilde{v}_{2} ,\tilde{v}_{3} , \ldots ,\tilde{v}_{m + 1} ],\;\;\;{\text{where}}\;\tilde{v}_{i} = v_{i} /\left\| {Hv_{i} } \right\| \\ \end{aligned}$$
(25)
$$\begin{aligned} & {\text{When}}\;N < n,\;\;(I_{N} + C_{2} LHH^{T} )u = \gamma HH^{T} u, \\ & \beta_{2} = H^{T} [\tilde{u}_{2} ,\tilde{u}_{3} , \ldots ,\tilde{u}_{m + 1} ],\;\;\;{\text{where}}\;\tilde{u}_{i} = u_{i} /\left\| {HH^{T} u_{i} } \right\| \\ \end{aligned}$$
(26)

The outputs E of US-EF-ELM can be calculated by the following formulation:

$$E = H\beta_{2}$$
(27)

Then the k-means algorithm can be adopted to perform clustering in the embedded space E of US-EF-ELM. The complete US-EF-ELM procedure can be summarized as follows.
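
The sketch below composes the helper functions outlined in the previous sections (ELM-AE output weights, graph Laplacian, US-ELM-style embedding) with k-means from scikit-learn. It illustrates the pipeline under those earlier assumptions and is not the authors' original MATLAB implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def us_ef_elm_kmeans(X, n_clusters, n_hidden, C1, C2, m, rng=None):
    """US-EF-ELM k-means: (1) ELM-AE output weights beta_1 (Eqs. 20-22);
    (2) hidden outputs H = g(X beta_1^T) (Eq. 23);
    (3) US-ELM-style embedding E = H beta_2 (Eqs. 24-27); (4) k-means on E."""
    beta1 = elm_ae_output_weights(X, n_hidden, C1, rng)       # step 1 (sketched in Sect. 2.2)
    H = 1.0 / (1.0 + np.exp(-(X @ beta1.T)))                  # step 2, sigmoid g
    _, _, L = graph_laplacian(X)                              # Laplacian of the input data
    E = us_elm_embedding(H, L, C2, m)                         # step 3 (sketched in Sect. 2.3)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(E)  # step 4
    return labels, E
```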

4 Experiments and results

In this section, the performance of the US-EF-ELM k-means clustering algorithm is compared with classical methods (k-means algorithm and spectral clustering algorithm) and the US-ELM k-means clustering algorithm.

4.1 Data sets

The data sets used in the experiments are all from the UCI Machine Learning Repository, which include Ionosphere, Breast Cancer Wisconsin (Diagnostic), Musk (Version 1), Musk (Version 2), Abalone, ISOLET, EEG Eye State, and PEMS-SF. The details of those data sets are given in Table 1.

Table 1 The details of UCI data sets

4.2 Experimental setup

The experiments are run on a workstation with an Intel Core i7 (DMI2) 3.6 GHz processor and 18 GB RAM, using MATLAB 2012b. The k-means algorithm, spectral clustering (SC) algorithm, US-ELM k-means algorithm, and US-EF-ELM k-means algorithm are each run 100 times, except that the SC algorithm is run only 10 times on the last five (larger) data sets.

The number of hidden nodes of US-ELM and US-EF-ELM is set to 2000 for all data sets, and the activation functions of US-ELM and US-EF-ELM are the sigmoid function. The kernel of the SC algorithm is the Gaussian kernel. The parameter C in US-ELM and the parameters \(C_{1}\) and \(C_{2}\) in US-EF-ELM are chosen from \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}\}\).

4.3 Quality of the clustering results

This paper uses the clustering accuracy (ACC) [22] to measure the quality of the clustering results. For N distinct samples \(x_{i} \in R^{j}\), let \(y_{i}\) and \(c_{i}\) be the inherent category label and the predicted cluster label of \(x_{i}\), respectively. The ACC is calculated as

$$ACC = \frac{{\sum\nolimits_{i = 1}^{N} {\delta (y_{i} ,map(c_{i} ))} }}{N}$$
(28)

where \(map( \cdot )\) maps each cluster label to a category label using the Hungarian algorithm [23], and this mapping is optimal; \(\delta (y_{i} ,c_{i} )\) equals 1 if \(y_{i} = c_{i}\) and 0 otherwise. The higher the ACC value, the better the clustering performance.
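
For illustration, ACC can be computed as follows; the implementation and its function name are our own, with the optimal matching provided by SciPy's linear_sum_assignment (the Hungarian algorithm).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC (Eq. 28): optimally map cluster labels to category labels, then
    count the fraction of correctly matched samples."""
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # count[i, j] = number of samples in class i assigned to cluster j
    count = np.array([[np.sum((y_true == k) & (y_pred == c)) for c in clusters]
                      for k in classes])
    row, col = linear_sum_assignment(-count)   # negate to maximize total agreement
    return count[row, col].sum() / len(y_true)
```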

The comparison of these algorithms is shown in Table 2, where the dimensions of the embedded space of US-ELM and US-EF-ELM are selected based on Fig. 3. The US-EF-ELM k-means algorithm achieves satisfying results on most of the data sets. The first three data sets are small and the remaining five are larger: the SC algorithm performs better on the small data sets, whereas the US-ELM k-means and US-EF-ELM k-means algorithms perform better on the larger ones.

Table 2 The performance comparison of the proposed US-EF-ELM
Fig. 3 Clustering performance on UCI data sets as a function of the dimension of the embedded space: a Ionosphere; b Breast Cancer Wisconsin (Diagnostic); c Musk (Version 1); d Musk (Version 2); e Abalone; f ISOLET; g EEG Eye State; h PEMS-SF

The dimension of the embedded space of US-ELM and US-EF-ELM has an obvious effect on clustering performance. Figure 3 shows the clustering performance of US-ELM and US-EF-ELM for different dimensions of the embedded space. The following conclusions can be drawn from Fig. 3: (1) the best clustering performance of US-EF-ELM is better than that of US-ELM; (2) US-EF-ELM is better than US-ELM over a wide range of embedding dimensions; (3) Huang et al. [13] report that US-ELM attains its best performance with a very low-dimensional embedding, but on the ISOLET and Musk (Version 2) data sets both US-ELM and US-EF-ELM attain their best performance with a high-dimensional embedding, so the recommendation to perform k-means in a relatively low-dimensional space should be taken only as a guideline.

4.4 Computational efficiency

The training time comparison of these algorithms is shown in Table 3. The k-means algorithm clearly has the best computational efficiency. Because US-EF-ELM additionally runs ELM-AE, its training time per run is longer than that of US-ELM; however, since the selected embedded space dimensions of US-EF-ELM and US-ELM differ, the US-ELM k-means algorithm takes longer on some data sets. The results on the PEMS-SF data set show that the computational efficiency of US-EF-ELM is significantly lower than that of US-ELM when the samples are high-dimensional, because the ELM-AE step is expensive in that case. The SC algorithm has the longest training time on the large data sets. Although the training time of the US-EF-ELM k-means algorithm is longer than that of US-ELM on most of the data sets, US-EF-ELM is still efficient.

Table 3 The training time comparison of the proposed US-EF-ELM

5 Conclusions

In this paper, US-EF-ELM is presented, and the US-EF-ELM k-means algorithm achieves better clustering performance than the US-ELM k-means algorithm on the UCI data sets. Besides its good clustering performance, the proposed algorithm is also computationally efficient. US-EF-ELM is a learning algorithm for SLFNs and can be extended to a learning algorithm for multilayer feed-forward neural networks, which would use ELM-AE to train the weights of each layer and then use the US-ELM cost function to obtain the output weights. However, it is difficult to determine the model structure, so future research will focus on finding a suitable model structure for the algorithm.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61379101), the National Key Basic Research Program of China (No. 2013CB329502), and the Natural Science Foundation of Jiangsu Province (No. BK20130209).