1 Introduction

The ELMVIS method [1] is an interesting Machine Learning method that optimizes a cost function by changing the assignment between two sets of samples, or, equivalently, by changing the order of samples in one of the sets. The cost function is learned by an Extreme Learning Machine (ELM) [2,3,4], a fast method for training feed-forward neural networks with convenient mathematical properties [5, 6]. Such an optimization problem is found in various applications like the open-loop Traveling Salesman Problem [7] or clustering [8] (mapping between samples and clusters), but not in Neural Networks. ELMVIS is unique in the sense that it combines the optimal assignment task with a neural network optimization problem; the latter is optimized at each step of ELMVIS.

A recent advance in the ELMVIS+ method [9] made its runtime comparable to or faster than other state-of-the-art visualization methods. However, unresolved problems remain, such as the greedy optimization converging to a local optimum. In addition, ELMVIS+ can be applied to a much wider range of problems than simple visualization or visualization accounting for class information [10], which has not been tested or reported yet. This paper addresses the aforementioned drawbacks, and presents the most recent research advances in the family of ELMVIS methods.

The proposed incremental ELMVIS allows for iterative growth of the dataset size and model complexity. Incremental ELMVIS learns an approximate global data structure with a few data samples and a simple ELM model, because at a very small scale the global and local optima are similar or the same. Then more data samples are added to the model, choosing the ones that best fit the existing ELM. After adding a batch of new samples, the current model is refined by the ELMVIS+ method. This refinement preserves the global optimum because the greedy optimization makes only small changes. More neurons are added to the ELM as the dataset size grows, to better separate the data.

Iterative ELMVIS is useful for semi-supervised learning, starting from the data with known outputs and adding more data with unknown outputs, simultaneously updating the model. It can even be applied to completely unsupervised datasets, where it finds an input-output dependency, learns it with an ELM model, and then simultaneously expands the supervised part of the dataset and updates the ELM model that encodes the input-output dependency.

The experiments have shown the ability of iterative ELMVIS to improve the global optimum and to successfully perform semi-supervised and unsupervised learning on complex tasks. The current version of the method is limited to a good separation between only two classes in the data (which it learns first): it ignores samples of additional classes until the first two are exhausted, and fits the additional classes poorly into the learned two-class model. A solution to this problem will be considered in further work on the topic.

2 Methodology

An iterative extension of the ELMVIS+ methodology is rather straightforward, as explained below. ELMVIS methods start with a list of (visualization) samples and an unordered set of data samples, and find an optimal order of data samples in the set by a greedy search that changes the positions of many samples at once (ELMVIS) or of only two samples at a time (ELMVIS+).

Iterative ELMVIS splits data samples into fixed, candidate and available ones. Fixed samples have their order fixed and cannot be moved by iterative ELMVIS. Candidate samples are a small number of samples chosen from the candidate+available ones so as to maximize the cost function. This cost function takes into account fixed and current candidate samples, but ignores the available samples. Once the current candidate samples are chosen optimally, they are added to the fixed ones, and the method is repeated with a few more candidate samples, until the available data samples are exhausted.

2.1 Extreme Learning Machine

Extreme Learning Machine is a way of training feedforward neural networks [11] with a single hidden layer that features randomly assigned input weights [12], an explicit non-iterative solution for the output weights, and extreme computation speed and scalability [13]. This model is used as a non-linear cost function in all ELMVIS methods. This short summary introduces the notation to the reader.

The goal of the ELM model is to approximate the projection function \(\hat{\mathbf {y}}_i \approx f(\mathbf {x}_i)\) using a representative training dataset. As ELM is a deterministic model, the function f() is assumed to be deterministic, and a noise term \(\epsilon \) is added to cover the deviation of the true outputs \(\mathbf {y}\) from the predictions of the deterministic function f()

$$\begin{aligned} \mathbf {y}= f(\mathbf {x}) + \epsilon \end{aligned}$$
(1)

The Extreme Learning Machine [2] (ELM) is a neural network with d input, L hidden and c output neurons. The hidden layer weights \(\mathbf {W}_{d \times L}\) and biases \(\text {bias}_{1 \times L}\) are initialized randomly and are fixed. The hidden layer neurons apply a transformation function \(\phi \), usually a non-linear function with bounded output such as the sigmoid or hyperbolic tangent.

The output of the hidden layer is denoted by \(\mathbf {h}\) with an expression

$$\begin{aligned} \mathbf {h}_i = \phi (\mathbf {x}_i\mathbf {W} + \text {bias}) \end{aligned}$$
(2)

where the function \(\phi ()\) is applied element-wise; the hidden layer outputs can also be gathered in a matrix \(\mathbf {H}_{N \times L}\) for convenience.

The output layer of ELM poses a linear problem \(\mathbf {H}\varvec{\beta }= \mathbf {Y}\) with unknown output weights \(\varvec{\beta }_{L \times c}\). The solution is derived from the minimization of the sum of squared residuals \(\mathbf {r}^2 = (\mathbf {y}- \hat{\mathbf {y}})^2\), which gives the ordinary least squares solution \(\varvec{\beta }= \mathbf {H}^\dagger \mathbf {Y}\), where \(\mathbf {H}^\dagger \) is a Moore-Penrose pseudoinverse [14] of the matrix \(\mathbf {H}\).
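For illustration, the following is a minimal NumPy sketch of this training procedure (assuming a sigmoid activation; the function names are illustrative and not taken from any reference implementation):

```python
import numpy as np

def train_elm(X, Y, L, seed=0):
    """Basic ELM training: random fixed hidden layer, least-squares output layer.

    X is the N x d input matrix, Y the N x c target matrix,
    L the number of hidden neurons.
    """
    rng = np.random.RandomState(seed)
    d = X.shape[1]
    W = rng.randn(d, L)                           # random input weights, fixed
    bias = rng.randn(L)                           # random biases, fixed
    H = 1.0 / (1.0 + np.exp(-(X @ W + bias)))     # sigmoid hidden layer, Eq. (2)
    beta = np.linalg.pinv(H) @ Y                  # beta = H^+ Y (Moore-Penrose)
    return W, bias, beta

def predict_elm(X, W, bias, beta):
    """Predictions Y_hat = H beta for new inputs X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + bias)))
    return H @ beta
```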

2.2 ELMVIS+ Method

The ELMVIS+ method [15] approximates the relation between a visualization (or input) space \(\mathcal {V}\) and a data space \(\mathcal {X}\) by an ELM model. The task has N representative samples \(\mathbf {v}\in \mathcal {V}\) and \(\mathbf {x}\in \mathcal {X}\), but their order is unknown. The method assumes a fixed order of samples \(\mathbf {v}\) joined in matrix \(\mathbf {V}\), and finds a suitable order of samples \(\mathbf {x}\) joined in matrix \(\mathbf {X}\) by exchanging pairs of rows in \(\mathbf {X}\). Contrary to the common use of ELM, data samples \(\mathbf {x}\) are the outputs of the ELM and visualization coordinates \(\mathbf {v}\) are the inputs (thus the ELM predicts the original data \(\hat{\mathbf {x}}\)). Visualization coordinates \(\mathbf {V}\) are chosen arbitrarily and fixed: they can be distributed randomly with a normal or uniform distribution, or initialized on a regular grid.

The optimization criterion is the cosine similarity between \(\mathbf {X}\) and the \(\hat{\mathbf {X}}\) predicted by the ELM. A low error means that it is possible to reconstruct the data from the given visualization points, thus the visualization points keep information about the data. The reconstruction is approximated by the ELM model in ELMVIS.
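As a plain illustration of this criterion (a direct, unoptimized computation; ELMVIS+ itself relies on the incremental formulas below), the error can be evaluated as:

```python
import numpy as np

def elmvis_error(X, X_hat):
    """Negative mean cosine similarity between data rows and their ELM
    reconstructions; lower values mean a better visualization."""
    num = np.sum(X * X_hat, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(X_hat, axis=1)
    return -np.mean(num / den)
```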

There is an explicit formula for the change of error (negative cosine similarity) when swapping two rows in \(\mathbf {X}\) and re-training the ELM with this new dataset. The readers can refer to the original paper [15] for the full formula. It is based on the expression for the change of error \(\varDelta _E\) in case a row \(\mathbf {x}_a\) in \(\mathbf {X}\) is changed by an amount \(\delta \in \mathcal {X}\):

$$\begin{aligned} \varDelta _E = \sum _{j=1}^{d} \Big ( \mathbf {A}_{a,a} \delta _j^2 + 2 \hat{\mathbf {x}}_{a,j} \delta _j \Big ) \end{aligned}$$
(3)
$$\begin{aligned} \hat{\mathbf {X}} \leftarrow \hat{\mathbf {X}} - \mathbf {A}_{:,a} \times \delta \end{aligned}$$
(4)
$$\begin{aligned} \mathbf {X}_{:,a} \leftarrow \mathbf {X}_{:,a} + \delta \end{aligned}$$
(5)
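A minimal sketch of how Eqs. (3)-(5) can be applied in code is given below (samples are treated as rows of X as in the text; the matrix A and the helper names are assumptions of this sketch, and the full two-row swap test of [15] applies the same formula to both exchanged rows):

```python
import numpy as np

def delta_error(A, X_hat, a, delta):
    """Change of error when row a of X is shifted by delta, Eq. (3)."""
    return np.sum(A[a, a] * delta**2 + 2.0 * X_hat[a] * delta)

def apply_update(A, X, X_hat, a, delta):
    """Accept the change: update the ELM predictions, Eq. (4),
    and shift the data row of the affected sample, Eq. (5)."""
    X_hat -= np.outer(A[:, a], delta)
    X[a] += delta
```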

2.3 Incremental ELMVIS

Incremental ELMVIS splits all data samples into three groups: fixed, candidate and available samples. The separation is maintained with two indices: \(i_A\) is the number of fixed samples, and \(i_B\) is the number of fixed+candidate ones.

Incremental ELMVIS works similarly to ELMVIS+. First, the initial numbers of fixed and candidate samples are given by \(i_A\) and \(i_B\). Then swap indices \(a \in [ i_{A}, i_{B} ]\) and \(b \in [ i_{B}+1, N]\) are selected randomly to replace one candidate sample with an available one. The change of error \(\varDelta _E\) is computed by formula (3) for the change in the candidate row of the data matrix \(\mathbf {X}\). Compared to ELMVIS+, the change in the available sample is ignored. Then, if \(\varDelta _E\) is negative, an update step is performed for sample \(\mathbf {x}_a\) as in Eq. (4) and for both \(\mathbf {x}_a\) and \(\mathbf {x}_b\) as in Eq. (5).

If there is no improvement during a large number of swaps, the current candidate samples are added to the fixed ones (\(i_A \leftarrow i_B\)), and k more samples are added as candidates (\(i_B \leftarrow i_B + k\)). The candidate samples are already initialized with the data samples at indices \(\mathbf {x}_{i_A}, \ldots , \mathbf {x}_{i_B}\). The method then repeats for another iteration. Iterations stop when no available samples are left.
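The whole incremental procedure can be summarized in the following sketch (0-based indexing; delta_error and apply_update are the helpers sketched in Sect. 2.2, max_stall stands for the "large number of swaps" threshold, and all names are illustrative rather than taken from a reference implementation):

```python
import numpy as np

def incremental_elmvis(X, A, X_hat, i_A, i_B, k, max_stall=10000, rng=None):
    """Grow the fixed set by testing swaps between candidate rows [i_A, i_B)
    and available rows [i_B, N) until all samples are placed."""
    rng = rng or np.random.RandomState(0)
    N = X.shape[0]
    while True:
        stall = 0
        while stall < max_stall and i_B < N:
            a = rng.randint(i_A, i_B)              # pick a candidate sample
            b = rng.randint(i_B, N)                # pick an available sample
            delta = X[b] - X[a]
            dE = delta_error(A, X_hat, a, delta)   # Eq. (3), candidate row only
            if dE < 0:                             # improvement: accept the swap
                apply_update(A, X, X_hat, a, delta)
                X[b] -= delta                      # rows a and b are exchanged
                stall = 0
            else:
                stall += 1
        if i_B >= N:                               # no available samples left
            return X
        i_A = i_B                                  # freeze the current candidates
        i_B = min(i_B + k, N)                      # take k more candidate samples
```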

In the original ELMVIS+, the matrix \(\mathbf {A}\) took the most space and limited the maximum number of processed samples (its memory size is \(\mathcal {O}(N^2)\)). In incremental ELMVIS, only an \(\mathbf {A}_{i_B \times i_B}\) part of the whole matrix \(\mathbf {A}\) is needed. This relaxes the memory requirements of the method, and allows the use of a very large pool of available samples. The memory constraints of incremental ELMVIS apply only to the number of optimized data samples.

3 Experimental Results

The incremental ELMVIS method is developed for two main applications. The first one is achieving a better global optimum in ELMVIS+. The original ELMVIS+ is a good visualization method, however it has an unwanted feature: with a large number of neurons in the ELM it fragments clusters in the visualized data. This happens with a large amount of data and a complex ELM model. An iterative ELMVIS that starts with a small amount of data and a simple model keeps all similar data together; more data samples are then gradually added while the overall picture changes little, because the ELMVIS+ optimization stays in a local optimum.

The second application is finding unknown relations in datasets. This is an unsupervised learning field relevant to the current Big Data trends, where a large amount of interesting data is available but without pre-processing such as manual labeling or classification. It is possible to extract relations inside the data automatically by iteratively growing an ELMVIS+ model between two sets of data samples (they do not have to be related to visualization). Results for both applications are presented below.

3.1 Better Optimum with ELMVIS

The ELMVIS+ method is fast and works with large datasets, but its greedy optimization approach leads to only local optimality of the solution. Such a local optimum is close to the global one for small datasets and simple ELM models, but with a large dataset and many neurons in the ELM model the visualized data is split into multiple small, locally optimal clusters that are not representative of the global picture.

A better global optimality is achievable with incremental ELMVIS. This experiment uses MNIST digits [16] with their original \(28 \times 28\) features (grayscale pixels), with 500 digits for each of the classes 0–4. It starts by seeding several clusters, shown in Fig. 1 by bold samples, and a simple ELM model. The gradually added data then fits into the existing model (Fig. 1, left). The ELM learns an easy separation between two clusters, and incremental ELMVIS prefers to add samples of these clusters for as long as they are available (Fig. 1, right).

Fig. 1. Visualization of 300 (left) and 900 (right) MNIST digits in a two-dimensional space with incremental ELMVIS. The visualization is initialized with 5 digits shown in a larger font. Digit symbols and colors are for presentation purposes only; the ELMVIS method does not have access to them and works with raw pixel data.

The incremental ELMVIS learns a good model that separates two different kinds of data. When there are no data samples of these two types left, it begins adding more types, starting at the boundary (Fig. 2, left). These additional classes are mapped to a single area, although they spill over the two previously learned clusters as there is no space left on the visualization (Fig. 2, right). The ELMVIS still ignores the last available class (digits 4) because it is not represented in the visualization space.

Fig. 2. Visualization of 1500 (left) and 2500 (right) MNIST digits in a two-dimensional space with incremental ELMVIS. There are only 500 samples of each class; ELMVIS has to add more classes once it has exhausted zeros and ones in the available set. Digit symbols and colors are for presentation purposes only; the ELMVIS method does not have access to them and works with raw pixel data.

3.2 Better Optimum with Semi-supervised ELMVIS

In the previous experiment there were no sharp borders between clusters, because ELMVIS used all the visualization space to show only two clusters and then had to map additional data clusters over them. Sharper borders can be obtained by running the original ELMVIS+ on the fixed set of data points after each iteration of the incremental ELMVIS. This makes space for more clusters by compacting the existing ones, while still preserving the global structure because the ELMVIS+ optimization only reaches a local optimum.

In addition, a semi-supervised approach is tested where the clusters are initialized with a larger number of samples. This experiment uses 5 classes of digits with 1000 samples per class, initialized with 20 samples per class as shown in Fig. 3 (left). A large initialization set forces the ELM to learn all the classes, and to add samples from all of them instead of only two (Fig. 3, right).

Fig. 3. Semi-supervised incremental ELMVIS initialized with 20 samples per class (left). The ELM learns all the clusters from the larger initialization set, and adds new samples for all of them instead of just two (right, 1000 samples mapped). Classes of samples are used for initialization, but the ELMVIS method does not have access to them and works with raw pixel data.

When the method needs space to map more samples from a particular class, an additional ELMVIS+ step moves the existing clusters to provide that space while keeping sharp boundaries between the classes. The effect is shown in Fig. 4, where all clusters are moved to give more space for digits 2. The global structure is well preserved, with large clusters keeping their place.

Fig. 4. An additional ELMVIS+ step moves existing clusters if more space is needed for the remaining data, without disturbing the global data structure. Here 2000 samples are visualized on the left with an under-represented class 2; 5000 samples are visualized on the right with the other clusters moved to fit class 2 without overlap.

The added ELMVIS+ step refines the placement of candidate samples within the fixed ones, which is necessary towards the end of the visualization when there may be no samples of the desired class left, or no space left within the desired class area. The improvement from the ELMVIS+ step in fitting the candidate samples is presented in Fig. 5.

Fig. 5. The last step of an incremental ELMVIS (left), further refined by ELMVIS+ (right). Candidate samples of class 0 that are misplaced due to the lack of space inside class 0 are correctly fitted by ELMVIS+.

3.3 Data Structure Discovery with Unsupervised ELMVIS

The inputs to ELMVIS are not limited to visualization coordinates; they can be arbitrary data. Thus ELMVIS is a feasible method for finding structure in data in an unsupervised manner. This experiment takes a number of MNIST digits in random (undefined) order, and maps them to the same number of classes (in zero/one encoding), or to the same number of different MNIST digits of the same classes.

The first dataset has zero-one classes as inputs, and MNIST digits as outputs. However, it cannot be used to train a supervised model because it is unknown which input corresponds to which output, a common situation in the analysis of large automatically acquired data corpora that have not been manually labeled. The goal of incremental ELMVIS is to reconstruct the correct input-output pairing. The experiment uses 100 samples per class with 2 or 5 classes of digits. Note that the best pairing across all permutations of classes is reported, as in the unsupervised setup with an equal number of samples per class there is no way to tell ELMVIS which class should go to which digit.

The resulting confusion matrices are shown in Fig. 6. Incremental ELMVIS successfully paired classes with pictures of MNIST digits. The method mapped classes arbitrarily (i.e. class [1, 0, 0, 0, 0] is mapped to digit 4), but this is to be expected from a purely unsupervised method.

Fig. 6. Binary class to MNIST digit mapping with two or five classes; the confusion matrices present the percentages of correct assignments. The best mapping across all permutations of classes is shown, because it is not possible to specify the matching of particular classes in an unsupervised method.

Another experiment is performed in a similar setup, but instead of binary class representations the incremental ELMVIS tries to map MNIST digits to other MNIST digits (of the same classes, Fig. 7). The mapping is successful with two classes. With more than two classes the same feature always appears: two random classes are separated well while the other classes are randomly mixed with them. This outcome is in line with the results observed in Sect. 3.1, where two classes are clearly separated at the beginning, followed by other classes mapped over them.

Fig. 7. MNIST digits to MNIST digits mapping, using different digits from the same 2 or 5 classes. Two classes are separated clearly, while any additional classes are mixed with them.

4 Conclusions

An iterative extension to the original ELMVIS+ method is proposed in this paper. It iteratively selects a small number of best-fitting samples from all the available ones, and adds them to the model. It allows for a much larger set of potential samples than ELMVIS+ by limiting the memory requirements to the already fitted samples rather than to all available ones, while keeping the high speed of ELMVIS+.

The method improves the global structure of the ELMVIS+ visualization by starting with a small dataset and gradually adding more data or increasing the complexity of the model. It preserves the global structure and sharp boundaries between classes, and has a possible semi-supervised extension where samples are mapped to specified places in the visualization space.

The proposed method is capable of unsupervised data structure detection. It excels at reconstructing a randomly shuffled dataset with unknown pairing between inputs and outputs. It is also capable of finding a mapping between two complex data spaces, as shown in the MNIST digits example.

The methodology needs further investigation and improvement to counter the observed drawbacks, specifically the tendency to learn an easy model first, which leads to problems in incorporating more complex parts into the global picture.