
1 Introduction

The size and complexity of data collections have been growing since the beginning of the computer era, and in recent years the dimensionality of data sets has been increasing rapidly as well. Contemporary and, especially, future technologies allow us to acquire, store and process large high-dimensional data collections, which are commonly available in areas such as medicine, biology, information retrieval, web analysis, social network analysis, image processing, financial transaction analysis and many others.

To have any chance of processing such amounts of data, we have to reduce the volume of raw data by categorizing it into smaller sets of similar items, identify the groups that occur in the data, and reveal the structures hidden in it. These tasks are precisely the purpose of the methods known as clustering. There are many clustering methods; in this paper we focus on clustering methods based on unsupervised learning.

Unfortunately, clustering algorithms based on unsupervised learning, such as Growing Neural Gas (GNG) [10], face two major issues that prevent them from being effective on vast high-dimensional data collections in many real applications:

  1. The fast growth of computational complexity with respect to the growing data dimensionality, and

  2. The specifics of similarity measurement in a high-dimensional space, where the difference between the expected Euclidean distances from any given point to its closest and to its farthest point shrinks with growing dimensionality [1].

The growth of computational complexity can be partially addressed by using parallel computation facilities, such as a High Performance Computing (HPC) cluster with MPI technology. Obviously, it is necessary to resolve the technical and implementation issues specific to this computing platform, such as minimizing interprocess communication, to obtain an effective parallel implementation of GNG.

The amount of interprocess communication increases with the growing number of neurons and of the edges connecting them. The addition of a new neuron and its edges to GNG is driven by a condition given at startup: a neuron is added after a predefined amount of time, regardless of the properties of the data collection. Taking these properties into account, it is easy to see that some additions of new neurons and edges are unnecessary, for example the addition of a new neuron in a dense data area triggered only by the predefined condition. A similar reasoning applies to outlying data. When a new neuron is created to cover such a part of the data collection, there is no need to attach it to more neurons than the nearest one; in this way a future edge disposal is avoided. A new neuron is attached to the two nearest neurons only if it is to cover a part of the data collection located close to both of them. Hence, the sum of the distances from the new neuron to these two neurons should be proportional to the distance between the two neurons themselves.

The proposed approach is based solely on the standard GNG learning algorithm; there is no need to apply a space-partitioning method to improve the nearest-neuron search. The performed experiments show that our approach clearly adheres to the structure of the data collection, while the quality of the GNG is preserved.

We first introduce the terminology and notation used in the article and the works related to GNG. In the next section we describe a new approach to adding a new neuron to GNG. The section Experiments contains statistics and visualizations of the results. In the conclusion, we discuss the advantages of our approach.

2 Growing Neural Gas

The principle of this neural network is an undirected graph, which need not be connected. Generally, there are no restrictions on the topology. The graph is generated and continuously updated by competitive Hebbian learning [9, 13]. According to pre-set conditions, new neurons are automatically added, and connections between neurons age over time and can be removed. GNG can be used for vector quantization by finding the code-vectors of clusters [8], for clustering data streams [7], in biologically influenced variants [14] and for 3D model reconstruction [12]. GNG works by modifying the graph, where the operations are the addition and removal of neurons and of edges between neurons.

To understand the functioning of GNG, it is necessary to define the algorithm. The algorithm described in our previous article [16] is based on the original algorithm [6, 8], but it is modified for better continuity with the SOM algorithm. Algorithm 1 describes one iteration.
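
For orientation, a minimal Python sketch of one GNG iteration in the spirit of the original algorithm [6, 8] is given below; it is not a reproduction of Algorithm 1 or of the authors' implementation. The parameter names follow Table 1 (\(e_w\), \(e_n\), \(\alpha \), \(\beta \), \(a_{max}\), M, \(\delta \)); the class layout, the random initialization and the omission of the removal of isolated neurons are simplifying assumptions.

import numpy as np

class GNG:
    """Illustrative sketch of standard GNG (one iteration per input vector)."""

    def __init__(self, dim, e_w=0.05, e_n=0.006, alpha=0.5, beta=0.0005,
                 a_max=100, M=1000, delta=200):
        self.e_w, self.e_n, self.alpha, self.beta = e_w, e_n, alpha, beta
        self.a_max, self.M, self.delta = a_max, M, delta
        self.W = [np.random.rand(dim), np.random.rand(dim)]  # neuron weights
        self.E = [0.0, 0.0]                                   # error counters
        self.edges = {}                                       # (i, j) -> age, i < j
        self.t = 0                                            # iteration counter

    def _key(self, i, j):
        return (i, j) if i < j else (j, i)

    def iterate(self, x):
        self.t += 1
        # find the BMU c1 and the second BMU c2
        d = [np.linalg.norm(w - x) for w in self.W]
        c1, c2 = np.argsort(d)[:2]
        # update the BMU error and age the edges emanating from the BMU
        self.E[c1] += d[c1] ** 2
        for key in list(self.edges):
            if c1 in key:
                self.edges[key] += 1
        # adapt the BMU and its topological neighbours towards x
        self.W[c1] += self.e_w * (x - self.W[c1])
        for (i, j) in self.edges:
            if c1 in (i, j):
                n = j if i == c1 else i
                self.W[n] += self.e_n * (x - self.W[n])
        # connect the two winners with a fresh edge and drop edges older than a_max
        self.edges[self._key(c1, c2)] = 0
        self.edges = {k: a for k, a in self.edges.items() if a <= self.a_max}
        # every delta steps insert a neuron between the neuron with the largest
        # error and its worst neighbour (this is the condition replaced in Sect. 3)
        if self.t % self.delta == 0 and len(self.W) < self.M:
            q = int(np.argmax(self.E))
            nbrs = [j if i == q else i for (i, j) in self.edges if q in (i, j)]
            if nbrs:
                f = max(nbrs, key=lambda n: self.E[n])
                self.W.append(0.5 * (self.W[q] + self.W[f]))
                self.edges.pop(self._key(q, f), None)
                r = len(self.W) - 1
                self.edges[self._key(q, r)] = 0
                self.edges[self._key(r, f)] = 0
                self.E[q] *= self.alpha
                self.E[f] *= self.alpha
                self.E.append(self.E[q])
        # decay all error counters
        self.E = [e * (1.0 - self.beta) for e in self.E]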

Remark

The notation used in the paper is briefly listed in Table 1.

Table 1. Notation used in the paper

2.1 Related Works

Methods based on Artificial Neural Networks (ANN) are computationally expensive. There are different approaches to improving the efficiency of these methods – improving the computation of the nearest neurons, reducing the number of computations (batch variants), parallel implementations and others.

The authors of the paper [4] propose two optimization techniques aimed at an efficient implementation of the internal structure of the GNG algorithm. Their optimizations preserve all properties of the GNG algorithm. The first technique enhances the nearest-neighbor search using space partitioning by a grid of rectangular cells, and the second technique speeds up the handling of node errors using lazy evaluation. The authors in [13] propose an algorithm for a GNG which can learn new input data (plasticity) without degrading the previously trained network and forgetting the old input data (stability). Online Incremental Supervised Growing Neural Gas [3] is an algorithm whose features are zero-node initialization, the original batch Supervised Growing Neural Gas node insertion mechanism and a network size constraint. In [2], a batch variant of Neural Gas (NG) is proposed which allows fast training for an a priori given data set and a transfer to proximity data. The authors' algorithm optimizes the same cost function as NG and converges faster than the original algorithm. The paper [11] proposes a density-based Growing Neural Gas, which is useful for clustering. The algorithm creates new units based on the density of the data, producing a better representation of the data space with lower computational cost for comparable accuracy. The authors use access methods to considerably reduce the number of distance calculations during the training process.

In the paper [5] the authors combine the batch variant of the GNG algorithm with the MapReduce paradigm, resulting in a GNG variant suitable for processing large data sets in scalable, general cluster environments. The paper [15] focuses on the updates of neuron weights in the learning phase of a parallel implementation of SOM. The authors study update strategies ranging between Batch SOM – where updates are processed at the end of a whole epoch – and updating immediately after processing each input vector.

3 Optimization of Learning Phase

In our paper [16] we dealt with the parallelization of GNG. The main obstacle to parallelization is the communication between processes and threads, which increases with the number of neurons in the neural network. The goal of our optimization is to reduce the number of neurons and to control the addition of new neurons.

By default, new neurons are added according to a condition determined at the start of the computation – see Algorithm 1, step 8 (a neuron is added after a certain period of time). After analysing the GNG learning algorithm, we identified that this condition does not take the input data into account; the only limitation is the maximum number of neurons in the neural network. Our proposed solution consists of two parts: the first part is a change of the condition for inserting a new neuron, and the second part is a change of the weight of this new neuron.

3.1 A New Approach How to Add a New Neuron

The goal is to replace the current condition for adding a neuron (after a certain time) with a new condition that takes into account the data being processed. As a result, we do not have to specify the number of steps after which a new neuron is added. The basic principle of our approach is to evaluate the distance of the input vector from the neurons \(\mathsf {N}_{c_1}\) and \(\mathsf {N}_{c_2}\). If the input vector is close to \(\mathsf {N}_{c_1}\), our algorithm does not add a new neuron, whereas the standard approach always adds the new neuron in this situation.

The following inequalities determine when we add a new neuron. A new neuron is added to the neural network if one of the inequalities (6), (7), (8) is true.

$$\begin{aligned}&||\mathsf {N}_{c_2}-\mathbf {x}(t)||< ||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}|| \quad \wedge \quad ||\mathsf {N}_{c_1}-\mathbf {x}(t)||\ge k||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}|| \end{aligned}$$
(6)
$$\begin{aligned}&||\mathsf {N}_{c_2}-\mathbf {x}(t)||\ge ||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}|| \quad \wedge \quad ||\mathsf {N}_{c_1}-\mathbf {x}(t)||\ge k||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}|| \end{aligned}$$
(7)
$$\begin{aligned}&||\mathsf {N}_{c_2}-\mathbf {x}(t)|| > ||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}||, \end{aligned}$$
(8)

where \(\mathbf {x}(t)\) is the input vector and the parameter k specifies the size of the area around \(\mathsf {N}_{c_1}\), with \(0<k\le 1/2\).
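
As an illustration, the insertion test given by inequalities (6)–(8) can be sketched in Python as follows; the function name, the use of NumPy vectors for the neuron weights and the default k = 0.3 (one of the values used in Sect. 4.2) are our assumptions.

import numpy as np

def insertion_case(w_c1, w_c2, x, k=0.3):
    # Returns 6, 7 or 8 when the corresponding inequality holds for input x,
    # or None when no inequality holds and no new neuron is added.
    d12 = np.linalg.norm(w_c1 - w_c2)   # ||N_c1 - N_c2||
    d1x = np.linalg.norm(w_c1 - x)      # ||N_c1 - x(t)||
    d2x = np.linalg.norm(w_c2 - x)      # ||N_c2 - x(t)||
    if d2x < d12 and d1x >= k * d12:    # inequality (6)
        return 6
    if d2x >= d12 and d1x >= k * d12:   # inequality (7)
        return 7
    if d2x > d12:                       # inequality (8)
        return 8
    return None                         # x(t) lies too close to the BMU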

Fig. 1. Addition of a new neuron based on given input vectors \(\mathbf {x_2}\), \(\mathbf {x_3}\) and \(\mathbf {x_4}\)

In Fig. 1, two neurons (\(\mathsf {N}_1\) and \(\mathsf {N}_2\)) and four input vectors (\(\mathbf {x}_{1}\), \(\mathbf {x}_{2}\), \(\mathbf {x}_{3}\) and \(\mathbf {x}_{4}\)) can be seen. Neurons \(\mathsf {N}_1\) and \(\mathsf {N}_2\) represent the BMU \(\mathsf {N}_{c_1}\) and the second BMU \(\mathsf {N}_{c_2}\) for all input vectors in the example. The input vector \(\mathbf {x}_{1}\) is too close to \(\mathsf {N}_1\) (\(||\mathsf {N}_{1}-\mathbf {x}_{1}||<k||\mathsf {N}_{c_1}-\mathsf {N}_{c_2}||\)), so no new neuron is added (Fig. 1(b)). If one of the input vectors \(\mathbf {x}_{2}\), \(\mathbf {x}_{3}\) or \(\mathbf {x}_{4}\) is selected, a new neuron is added. The input vector \(\mathbf {x}_{2}\) satisfies inequality (6); this is the standard situation (Fig. 1(a)). The newly added neuron \(\mathsf {N}_3\) obtains the weight \(1/2(w_{c_1}(t)+w_{c_2}(t))\). The edge between \(\mathsf {N}_{1}\) and \(\mathsf {N}_{2}\) is deleted and new edges are created which connect the new neuron \(\mathsf {N}_3\) with the neurons \(\mathsf {N}_1\) and \(\mathsf {N}_2\). The age of the new edges is set to zero.

Inequality (8) is true for the input vector \(\mathbf {x}_{3}\) (see Fig. 1(c)) and inequality (7) is true for the input vector \(\mathbf {x}_{4}\) (see Fig. 1(d)). In both situations, only one edge connects the newly added neuron \(\mathsf {N}_4\) (\(\mathsf {N}_5\)) with the BMU \(\mathsf {N}_1\). The weight of the new neuron and the age of the new edge are set in the standard way.
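
A rough sketch of the corresponding insertion handling is given below, reusing the GNG class and the insertion_case function from the earlier sketches. For cases (7) and (8) the weight of the new neuron is only said to be set "in the standard way"; placing it halfway between the BMU and the input vector, and initializing its error counter to zero, are purely illustrative assumptions.

def insert_new_neuron(gng, c1, c2, x, case):
    if case == 6:
        # standard situation of Fig. 1(a)
        gng.W.append(0.5 * (gng.W[c1] + gng.W[c2]))   # midpoint weight
        r = len(gng.W) - 1
        gng.edges.pop(gng._key(c1, c2), None)         # delete edge N_c1-N_c2
        gng.edges[gng._key(c1, r)] = 0                # new edges with age 0
        gng.edges[gng._key(c2, r)] = 0
    else:
        # cases (7) and (8) of Fig. 1(c), (d): connect to the BMU only,
        # so no superfluous edge has to be removed later
        gng.W.append(0.5 * (gng.W[c1] + x))           # assumed weight rule
        r = len(gng.W) - 1
        gng.edges[gng._key(c1, r)] = 0
    gng.E.append(0.0)                                 # assumed error counter init

# usage inside the learning loop, after determining the BMU c1 and second BMU c2:
# case = insertion_case(gng.W[c1], gng.W[c2], x, k=0.3)
# if case is not None and len(gng.W) < gng.M:
#     insert_new_neuron(gng, c1, c2, x, case)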

4 Experiments

4.1 Experimental Datasets and Hardware

One type of dataset was used in the experiments: a commonly used clustering benchmark.

Clustering Dataset. Three training data collections, called TwoDiamonds, Lsun and Target, from the Fundamental Clustering Problems Suite (FCPS) were used. A short description of the selected datasets used in our experiments is given in Table 2.

Table 2. Fundamental Clustering Problems Suite – selected datasets

Experimental Hardware. The experiments were performed on a Linux HPC cluster named Anselm, with 209 computing nodes, where each node had 16 processors and 64 GB of memory. The processors in the nodes were Intel Sandy Bridge E5-2665, and the compute network was InfiniBand QDR, fully non-blocking, fat-tree. Detailed information about the hardware can be found on the web site of the Anselm HPC cluster.

4.2 The Experiments

The first part of the experiments was oriented towards comparing the results obtained by the density versions (k = 0.3 and k = 0.5) and by the standard GNG algorithm. The clustering datasets were used for this experiment. The parallel version of the learning algorithm was run using 16 MPI processes. The GNG parameters are the same for all experiments and are as follows: \(e_w\) = 0.05, \(e_n\) = 0.006, \(\alpha \) = 0.5, \(\beta \) = 0.0005, \(a_{max}\) = 100, M = 1000, \(\delta \) = 200.
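
For reference, these settings correspond to the following constructor call of the illustrative GNG sketch from Sect. 2 (the constructor signature is the one assumed there; dim = 2 reflects the two-dimensional FCPS data):

# experiment parameters applied to the illustrative GNG sketch
gng = GNG(dim=2, e_w=0.05, e_n=0.006, alpha=0.5, beta=0.0005,
          a_max=100, M=1000, delta=200)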

The first row of Table 3 shows a layout view of the input data used for training the GNG. The outputs of the standard GNG algorithm are in the second row. The third and fourth rows show the results of our proposed method, each for a different size of the area around the BMU in which no new neuron is added.

Table 3. Graphical representations of data set layout and corresponding GNGs

Table 4 shows the number of neurons that were used in each case.

For the data collection TwoDiamonds, the computation time of the standard GNG is 6.05 s, that of the parallel GNG with k = 0.3 is 3.9 s, and that of the parallel GNG with k = 0.5 is 2.2 s.

Table 4. Number of used neurons

5 Conclusion

In this paper, a parallel implementation of the GNG neural network algorithm based on data density was presented. The achieved speed-up was better than that of our previous approach because the network contains fewer neurons, so it takes less time to locate the BMU. For parallelization we generally try to keep communication as small as possible, and the smaller number of neurons reduces this communication; the effect is most evident when adding and removing neurons.

In future work we intend to focus on sparse data and on using combinations of neural networks to further improve the results and the acceleration.