
1 Introduction

Due to the expansion of network communications, data are generated continuously. Such data are called big data, and there have been many attempts to analyze and apply them to various research fields [1, 2, 11, 16, 24].

Laney identified three characteristics of big data, which are as follows [18]:

  1. Data Volume: Massive amounts of data that continue to grow after being generated.
  2. Data Velocity: Increasing numbers of networks generate data continuously, which means that the data generation velocity is very high.
  3. Data Variety: Data in a pool can be of different types, such as time-series, real environment, artificial environment, textual, and image data.

These three characteristics [18] are known as the 3Vs and must be taken into account when dealing with big data.

In machine learning and data analysis research, it is necessary to estimate the probability density. However, it is difficult to estimate the probability density of big data for the following three reasons [22].

First, a density estimator for big data must be nonparametric because of the data volume. Parametric methods are effective for fixed data sets, since their parameters can be tuned in advance to obtain optimal performance. The volume of big data, however, is not constant, so the data cannot be analyzed in advance to obtain optimal parameters for the density estimator. A nonparametric density estimator does not have this problem, because it does not require a model of the data to be analyzed and constructed beforehand.

Second, a density estimator for big data must use an online learning method because of the data velocity. In big data, massive amounts of data arrive quickly, and the total size of the data soon becomes gigantic. Online learning methods can be updated sequentially as the data grow.

Third, the density estimator for big data must be robust. Data that are collected from real environments often contain noise, which could cause overfitting and decrease performance. Thus, robust methods are required to deal with data that contain noise.

Robustness is defined differently across various fields [9, 13]. In this study, we define robustness as ‘a function that provides almost the same results when learning from noisy data as when learning from data without noise’ [22]. There are two types of noise. The first type is generated by the environment and is not related to the objective distribution; this type of noise needs to be eliminated. The second type reflects the variance and fluctuation of the data; this type of noise must be preserved.

The kernel density estimation self-organizing incremental neural network (KDESOINN) method [22] satisfies all three conditions for dealing with big data and is robust to noise. However, it cannot adapt to a changing environment. Because of the variety of big data, the structure of the data may change at any time, so the ability to adapt to such variation is required. In this study, we propose a revised KDESOINN that solves this problem, which we call adaptive KDESOINN (AKDESOINN).

2 Related Works

2.1 Kernel Density Estimation

Kernel Density Estimation (KDE) is a typical nonparametric density estimation approach [23]. The KDE procedure is presented in Algorithm 1.

Algorithm 1.

Kernel Density Estimation

  1. Require: training samples \( \left\{ x_{i} \mid x_{i} \in \mathbb{R}^{d}, i = 1, 2, \ldots, N \right\} \), \( K \): kernel function, \( H \): bandwidth matrix
  2. \( \hat{p}\left( x \right) = \frac{1}{N}\sum\nolimits_{i = 1}^{N} K_{H}\left( x - x_{i} \right) \)

For the kernel function \( K \), the Gaussian kernel shown in (1) is often used:

$$ K_{H}\left( x - \mu \right) = \frac{1}{\sqrt{\left( 2\pi \right)^{d} \left| H \right|}} \exp\left( -\frac{1}{2}\left( x - \mu \right)^{T} H^{-1} \left( x - \mu \right) \right) $$
(1)

The bandwidth matrix \( H \) in Algorithm 1 is a parameter that strongly influences the performance of the estimator, and several attempts have been made to optimize it [8, 10]. KDE has also been extended in several ways, such as methods that set the number of kernels [3], gradient descent methods [19], and online clustering methods [17].
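As a concrete illustration of Algorithm 1 with the Gaussian kernel (1), the following minimal Python sketch may be helpful; the sample data, the isotropic bandwidth value, and the function names are illustrative assumptions rather than settings used in this paper.

```python
import numpy as np

def gaussian_kernel(diff, H_inv, H_det, d):
    """Gaussian kernel K_H(x - mu) from Eq. (1)."""
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * H_det)
    return norm * np.exp(-0.5 * diff @ H_inv @ diff)

def kde(x, samples, H):
    """Algorithm 1: average the kernel K_H(x - x_i) over all training samples."""
    d = samples.shape[1]
    H_inv, H_det = np.linalg.inv(H), np.linalg.det(H)
    return np.mean([gaussian_kernel(x - xi, H_inv, H_det, d) for xi in samples])

# Usage (illustrative): 500 samples from a 2-D standard normal,
# isotropic bandwidth H = 0.2 * I.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
H = 0.2 * np.eye(2)
print(kde(np.zeros(2), X, H))  # estimated density at the origin
```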

2.2 Self-organizing Incremental Neural Network

In the field of artificial intelligence, many artificial neural network methods have been proposed. They are usually classified into two groups, namely, supervised and unsupervised learning [25].

SOINN is an unsupervised learning method based on the growing neural gas [4]. There are several kinds of SOINN, including the two-layer [5], enhanced [6], and adjusted SOINN [7]. Since the adjusted SOINN has fewer parameters than the other SOINNs, it is generally used in applied research [12, 14, 15].

While SOINN learns from the training data, it constructs a network of the data through competitive learning. Nodes are added to or deleted from the network, and their locations are updated; edges are likewise added and deleted. In this way, the SOINN network is updated to approximate the distribution of the input data as new data arrive.

The flowchart of the adjusted SOINN is depicted in Fig. 1, and its procedure is presented in Algorithm 2.

Fig. 1. Flowchart of SOINN

Algorithm 2.

Adjusted SOINN process

  1. Require: \( A \): set of all neurons. \( C \subset A \times A \): set of all edges. \( N_{i} \): set of all neighbors of neuron \( i \). \( W_{i} \): weight of neuron \( i \). \( \lambda \): time period to delete redundant neurons. \( age_{max} \): parameter to delete edges.
  2. if first time of input then
  3. \( A \leftarrow \{c_{1}, c_{2}\} \): randomly pick two vectors from the training data to initialize the neuron set.
  4. \( C \leftarrow \emptyset \)
  5. end if
  6. while input data \( \xi \) exist do
  7. \( s_{1} \leftarrow \arg\min_{c \in A} \left\| \xi - W_{c} \right\| \): find the winner.
  8. \( s_{2} \leftarrow \arg\min_{c \in A \backslash \{s_{1}\}} \left\| \xi - W_{c} \right\| \): find the second winner.
  9. calculate the similarity thresholds \( T_{s_{1}}, T_{s_{2}} \). If neuron \( i \) has neighbors, \( T_{i} \) is the distance to its farthest neighbor; otherwise it is the distance to its nearest neuron.
  10. if \( \left\| \xi - W_{s_{1}} \right\| > T_{s_{1}} \) or \( \left\| \xi - W_{s_{2}} \right\| > T_{s_{2}} \) then
  11. \( A \leftarrow A \cup \{\xi\} \): insert \( \xi \) as a new neuron.
  12. else
  13. if \( \left( s_{1}, s_{2} \right) \notin C \) (there is no edge between the winner and the second winner) then
  14. \( C \leftarrow C \cup \{\left( s_{1}, s_{2} \right)\} \): add a new edge to the network.
  15. end if
  16. \( age_{\left( s_{1}, s_{2} \right)} \leftarrow 0 \): reset the age of \( \left( s_{1}, s_{2} \right) \).
  17. \( age_{\left( s_{1}, i \right)} \leftarrow age_{\left( s_{1}, i \right)} + 1 \; \left( \forall i \in N_{s_{1}} \right) \): increase the age of all edges connected to the winner by 1.
  18. \( \Delta W_{s_{1}} = \epsilon\left( t_{s_{1}} \right)\left( \xi - W_{s_{1}} \right),\; \Delta W_{i} = \epsilon\left( 100 t_{i} \right)\left( \xi - W_{i} \right) \; \left( \forall i \in N_{s_{1}} \right),\; \epsilon\left( t \right) = \frac{1}{t} \)
  19. use \( \Delta W_{s_{1}}, \Delta W_{i} \) to adjust the winner and its neighbors.
  20. delete edges whose age is larger than \( age_{max} \).
  21. among the neurons connected to the edges deleted in the last step, delete those having no neighbors.
  22. end if
  23. if the number of input data becomes \( n \times \lambda \; \left( n \in \mathbb{N}^{+} \right) \) then
  24. delete neurons having no neighbors.
  25. end if
  26. end while
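As an illustration of the competitive-learning core of Algorithm 2, the following Python sketch shows steps 7–9 (finding the two winners and computing a similarity threshold); the list-based representation of nodes and neighbors is an assumption made for readability, not the data structure of the original implementation.

```python
import numpy as np

def find_winners(xi, weights):
    """Steps 7-8 of Algorithm 2: indices of the nearest and second-nearest nodes."""
    dists = [np.linalg.norm(xi - w) for w in weights]
    order = np.argsort(dists)
    return int(order[0]), int(order[1])

def similarity_threshold(i, weights, neighbors):
    """Step 9 of Algorithm 2: T_i is the distance to the farthest neighbor of
    node i if it has neighbors, otherwise the distance to the nearest other node."""
    w_i = weights[i]
    if neighbors[i]:
        return max(np.linalg.norm(weights[j] - w_i) for j in neighbors[i])
    others = (j for j in range(len(weights)) if j != i)
    return min(np.linalg.norm(weights[j] - w_i) for j in others)

# Usage (illustrative): three nodes, node 0 connected to node 1.
weights = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
neighbors = {0: [1], 1: [0], 2: []}
s1, s2 = find_winners(np.array([0.2, 0.1]), weights)
print(s1, s2, similarity_threshold(s1, weights, neighbors))
```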

2.3 KDESOINN

KDESOINN is an extended version of the adjusted SOINN [22]. It places a kernel at each node and determines the shape of that kernel from the structure of the local network around the node, and it estimates the probability density function as the sum of these kernels. In the adjusted SOINN, only the Euclidean distance is used to calculate the similarity thresholds; in contrast, KDESOINN calculates the threshold using Algorithm 3.

Algorithm 3.

KDESOINN threshold calculation

  1. Require: \( A \): set of all neurons. \( \xi \): new sample data. \( P_{i} \): set of nodes connected to node \( i \). \( \rho \): parameter for the threshold. \( w_{i} \in \mathbb{R}^{d} \): positional vector of node \( i \). \( t_{i} \): number of wins of node \( i \) in competitive learning. \( I \): identity matrix. \( \Theta_{i} \): threshold region of node \( i \).
  2. calculate \( \gamma_{i} = \begin{cases} \min_{p \in P_{i}} \left\| w_{p} - w_{i} \right\| & \left( P_{i} \ne \emptyset \right) \\ \min_{p \in A \backslash \{i\}} \left\| w_{p} - w_{i} \right\| & \left( \text{otherwise} \right) \end{cases} \)
  3. \( T_{P_{i}} \leftarrow \sum\nolimits_{p \in P_{i}} t_{p} \)
  4. \( C_{i} \leftarrow \frac{1}{T_{P_{i}}}\sum\nolimits_{p \in P_{i}} t_{p}\left( w_{p} - w_{i} \right)\left( w_{p} - w_{i} \right)^{T} \)
  5. \( M_{i} \leftarrow C_{i} + \rho \gamma_{i} I \)
  6. threshold region \( \Theta_{i} \): \( \left( \xi - w_{i} \right)^{T} M_{i}^{-1} \left( \xi - w_{i} \right) \le 1 \)
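The following Python sketch illustrates the threshold computation of Algorithm 3; the representation of nodes and the handling of a node without neighbors (where \( C_{i} \) is taken to be the zero matrix) are assumptions made for illustration.

```python
import numpy as np

def threshold_matrix(i, weights, wins, neighbors, rho):
    """Algorithm 3: local metric M_i = C_i + rho * gamma_i * I for node i.
    When node i has no neighbors, C_i is taken as the zero matrix (assumption)."""
    w_i = weights[i]
    d = len(w_i)
    peers = neighbors[i] if neighbors[i] else [j for j in range(len(weights)) if j != i]
    gamma = min(np.linalg.norm(weights[j] - w_i) for j in peers)
    if neighbors[i]:
        T = sum(wins[j] for j in neighbors[i])
        C = sum(wins[j] * np.outer(weights[j] - w_i, weights[j] - w_i)
                for j in neighbors[i]) / T
    else:
        C = np.zeros((d, d))
    return C + rho * gamma * np.eye(d)

def in_threshold_region(xi, i, weights, M_i):
    """Step 6 of Algorithm 3: xi lies in Theta_i if the quadratic form is <= 1."""
    diff = xi - weights[i]
    return float(diff @ np.linalg.inv(M_i) @ diff) <= 1.0

# Usage (illustrative): node 0 with neighbors 1 and 2.
weights = [np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
wins = {0: 3, 1: 2, 2: 1}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
M0 = threshold_matrix(0, weights, wins, neighbors, rho=0.5)
print(in_threshold_region(np.array([0.3, 0.2]), 0, weights, M0))
```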

KDESOINN can divide clusters more effectively than the adjusted SOINN. The entire process of KDESOINN is presented in Algorithm 4.

Algorithm 4.

KDESOINN process

  1. Require: \( A \): set of all neurons. \( C \subset A \times A \): set of all edges. \( N_{i} \): set of all neighbors of neuron \( i \). \( W_{i} \): weight of neuron \( i \). \( \lambda \): time period to delete redundant neurons. \( age_{max} \): parameter to delete edges. \( P_{i} \): set of nodes connected to node \( i \). \( \rho \): parameter for the threshold. \( t_{i} \): number of wins of node \( i \) in competitive learning. \( I \): identity matrix. \( E(G) \): set of edges in graph \( G \).
  2. if first time of input then
  3. \( A \leftarrow \{c_{1}, c_{2}\} \): randomly pick two vectors from the training data to initialize the neuron set.
  4. \( C \leftarrow \emptyset \)
  5. end if
  6. while input data \( \xi \) exist do
  7. \( s_{1} \leftarrow \arg\min_{c \in A} \left\| \xi - W_{c} \right\| \): find the winner.
  8. \( s_{2} \leftarrow \arg\min_{c \in A \backslash \{s_{1}\}} \left\| \xi - W_{c} \right\| \): find the second winner.
  9. calculate the similarity thresholds \( \Theta_{s_{1}}, \Theta_{s_{2}} \) by Algorithm 3.
  10. if \( \left( \xi - W_{s_{1}} \right)^{T} M_{s_{1}}^{-1} \left( \xi - W_{s_{1}} \right) > 1 \) or \( \left( \xi - W_{s_{2}} \right)^{T} M_{s_{2}}^{-1} \left( \xi - W_{s_{2}} \right) > 1 \) then
  11. \( A \leftarrow A \cup \{\xi\} \): insert \( \xi \) as a new neuron.
  12. else
  13. if \( \left( s_{1}, s_{2} \right) \notin C \) (there is no edge between the winner and the second winner) then
  14. \( C \leftarrow C \cup \{\left( s_{1}, s_{2} \right)\} \): add a new edge to the network.
  15. end if
  16. \( age_{\left( s_{1}, s_{2} \right)} \leftarrow 0 \): reset the age of \( \left( s_{1}, s_{2} \right) \).
  17. \( age_{\left( s_{1}, i \right)} \leftarrow age_{\left( s_{1}, i \right)} + 1 \; \left( \forall i \in N_{s_{1}} \right) \): increase the age of all edges connected to the winner by 1.
  18. \( \Delta W_{s_{1}} = \epsilon\left( t_{s_{1}} \right)\left( \xi - W_{s_{1}} \right),\; \Delta W_{i} = \epsilon\left( 100 t_{i} \right)\left( \xi - W_{i} \right) \; \left( \forall i \in N_{s_{1}} \right),\; \epsilon\left( t \right) = \frac{1}{t} \)
  19. use \( \Delta W_{s_{1}}, \Delta W_{i} \) to adjust the winner and its neighbors.
  20. delete edges whose age is larger than \( age_{max} \).
  21. among the neurons connected to the edges deleted in the last step, delete those having no neighbors.
  22. end if
  23. if the number of input data becomes \( n \times \lambda \; \left( n \in \mathbb{N}^{+} \right) \) then
  24. delete neurons having no neighbors.
  25. create a k-NN graph \( G \) whose set of nodes is \( A \).
  26. \( C \leftarrow C \cup \left\{ \left( i, j \right) \mid \left( i, j \right) \in E\left( G \right), \left( j, i \right) \in E\left( G \right) \right\} \)
  27. end if
  28. end while
  29. create a k-NN graph \( G \) whose set of nodes is \( A \).
  30. \( C \leftarrow C \cup \left\{ \left( i, j \right) \mid \left( i, j \right) \in E\left( G \right), \left( j, i \right) \in E\left( G \right) \right\} \)
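Steps 25–26 (and 29–30) of Algorithm 4 keep only mutual k-NN pairs as edges. A minimal Python sketch of this edge-reconstruction step is given below; the brute-force distance computation and the choice of k are illustrative assumptions.

```python
import numpy as np

def mutual_knn_edges(weights, k):
    """Steps 25-26 of Algorithm 4: build a k-NN graph over the nodes and keep
    only mutual pairs (i, j), i.e. each node is among the other's k nearest."""
    W = np.asarray(weights, dtype=float)
    dist = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a node is not its own neighbor
    knn = [set(int(j) for j in np.argsort(row)[:k]) for row in dist]
    edges = set()
    for i in range(len(W)):
        for j in knn[i]:
            if i in knn[j]:                   # keep the edge only if mutual
                edges.add((min(i, j), max(i, j)))
    return edges

# Usage (illustrative): four nodes in the plane, k = 2.
nodes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(mutual_knn_edges(nodes, k=2))
```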

3 Proposed Method

To improve the ability of KDESOINN to adapt to changing data, Algorithm 5 is inserted after line 10 of Algorithm 4.

Algorithm 5.

Adaptive step

  1. Require: \( s_{1} \): first winner. \( s_{2} \): second winner. \( \eta \): parameter for adapting. \( \xi \): new sample data.
  2. \( D_{s_{1}} \leftarrow \left\| W_{s_{1}} - \xi \right\|,\; D_{s_{2}} \leftarrow \left\| W_{s_{2}} - \xi \right\| \)
  3. update the location of the winner: \( W_{s_{1}} \leftarrow W_{s_{1}} + \frac{\eta D_{s_{2}}}{D_{s_{1}} + D_{s_{2}}}\left( \xi - W_{s_{1}} \right) \)
  4. update the location of the second winner: \( W_{s_{2}} \leftarrow W_{s_{2}} + \frac{\eta D_{s_{1}}}{D_{s_{1}} + D_{s_{2}}}\left( \xi - W_{s_{2}} \right) \)
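A minimal Python sketch of the adaptive step in Algorithm 5 is given below; treating the winners' positional vectors as NumPy arrays is an assumption made for illustration.

```python
import numpy as np

def adaptive_step(w1, w2, xi, eta):
    """Algorithm 5: pull the two winners toward the new sample xi.
    The closer winner receives the larger relative move (weighted by the
    other winner's distance), scaled by the adaptation parameter eta."""
    d1 = np.linalg.norm(w1 - xi)
    d2 = np.linalg.norm(w2 - xi)
    total = d1 + d2
    w1_new = w1 + eta * d2 / total * (xi - w1)
    w2_new = w2 + eta * d1 / total * (xi - w2)
    return w1_new, w2_new

# Usage (illustrative): eta = 0 would leave the winners unchanged (plain KDESOINN).
w1, w2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
print(adaptive_step(w1, w2, np.array([0.5, 0.5]), eta=0.5))
```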

By applying Algorithm 5, SOINN can adapt to data that change over time. \( \eta \) is the adaptation parameter: if \( \eta = 0 \), AKDESOINN behaves exactly like KDESOINN, and if \( \eta > 1 \), the winners can overshoot \( \xi \). To avoid overfitting and poor performance, \( \eta \) should be set within the range of 0 to 1. The entire process of AKDESOINN is presented in Algorithm 6.

Algorithm 6.

AKDESOINN process

  1. Require: \( A \): set of all neurons. \( C \subset A \times A \): set of all edges. \( N_{i} \): set of all neighbors of neuron \( i \). \( W_{i} \): weight of neuron \( i \). \( \lambda \): time period to delete redundant neurons. \( age_{max} \): parameter to delete edges. \( P_{i} \): set of nodes connected to node \( i \). \( \rho \): parameter for the threshold. \( \eta \): parameter for adapting. \( t_{i} \): number of wins of node \( i \) in competitive learning. \( I \): identity matrix. \( E(G) \): set of edges in graph \( G \).
  2. if first time of input then
  3. \( A \leftarrow \{c_{1}, c_{2}\} \): randomly pick two vectors from the training data to initialize the neuron set.
  4. \( C \leftarrow \emptyset \)
  5. end if
  6. while input data \( \xi \) exist do
  7. \( s_{1} \leftarrow \arg\min_{c \in A} \left\| \xi - W_{c} \right\| \): find the winner.
  8. \( s_{2} \leftarrow \arg\min_{c \in A \backslash \{s_{1}\}} \left\| \xi - W_{c} \right\| \): find the second winner.
  9. calculate the similarity thresholds \( \Theta_{s_{1}}, \Theta_{s_{2}} \) by Algorithm 3.
  10. if \( \left( \xi - W_{s_{1}} \right)^{T} M_{s_{1}}^{-1} \left( \xi - W_{s_{1}} \right) > 1 \) or \( \left( \xi - W_{s_{2}} \right)^{T} M_{s_{2}}^{-1} \left( \xi - W_{s_{2}} \right) > 1 \) then
  11. \( D_{s_{1}} \leftarrow \left\| W_{s_{1}} - \xi \right\|,\; D_{s_{2}} \leftarrow \left\| W_{s_{2}} - \xi \right\| \)
  12. update the location of the winner: \( W_{s_{1}} \leftarrow W_{s_{1}} + \frac{\eta D_{s_{2}}}{D_{s_{1}} + D_{s_{2}}}\left( \xi - W_{s_{1}} \right) \)
  13. update the location of the second winner: \( W_{s_{2}} \leftarrow W_{s_{2}} + \frac{\eta D_{s_{1}}}{D_{s_{1}} + D_{s_{2}}}\left( \xi - W_{s_{2}} \right) \)
  14. \( A \leftarrow A \cup \{\xi\} \): insert \( \xi \) as a new neuron.
  15. else
  16. if \( \left( s_{1}, s_{2} \right) \notin C \) (there is no edge between the winner and the second winner) then
  17. \( C \leftarrow C \cup \{\left( s_{1}, s_{2} \right)\} \): add a new edge to the network.
  18. end if
  19. \( age_{\left( s_{1}, s_{2} \right)} \leftarrow 0 \): reset the age of \( \left( s_{1}, s_{2} \right) \).
  20. \( age_{\left( s_{1}, i \right)} \leftarrow age_{\left( s_{1}, i \right)} + 1 \; \left( \forall i \in N_{s_{1}} \right) \): increase the age of all edges connected to the winner by 1.
  21. \( \Delta W_{s_{1}} = \epsilon\left( t_{s_{1}} \right)\left( \xi - W_{s_{1}} \right),\; \Delta W_{i} = \epsilon\left( 100 t_{i} \right)\left( \xi - W_{i} \right) \; \left( \forall i \in N_{s_{1}} \right),\; \epsilon\left( t \right) = \frac{1}{t} \)
  22. use \( \Delta W_{s_{1}}, \Delta W_{i} \) to adjust the winner and its neighbors.
  23. delete edges whose age is larger than \( age_{max} \).
  24. among the neurons connected to the edges deleted in the last step, delete those having no neighbors.
  25. end if
  26. if the number of input data becomes \( n \times \lambda \; \left( n \in \mathbb{N}^{+} \right) \) then
  27. delete neurons having no neighbors.
  28. create a k-NN graph \( G \) whose set of nodes is \( A \).
  29. \( C \leftarrow C \cup \left\{ \left( i, j \right) \mid \left( i, j \right) \in E\left( G \right), \left( j, i \right) \in E\left( G \right) \right\} \)
  30. end if
  31. end while
  32. create a k-NN graph \( G \) whose set of nodes is \( A \).
  33. \( C \leftarrow C \cup \left\{ \left( i, j \right) \mid \left( i, j \right) \in E\left( G \right), \left( j, i \right) \in E\left( G \right) \right\} \)

4 Experimental Study

To compare AKDESOINN with other methods, we experimentally evaluated the robustness, calculation time, accuracy, and adaptation ability of SOINN, KDESOINN, and AKDESOINN. The experiments were run in MATLAB R2017b on a personal computer with an eight-core 3.40 GHz CPU and 16.0 GB of RAM.

4.1 Fixed Gaussian Distribution

Initially, we evaluated the performance of the proposed method using a fixed Gaussian distribution. Specific details regarding the experiment are described in Table 1.

Table 1. Information of experiment 1

The experiment was repeated 100 times. Further, the Jensen-Shannon divergence was used to compare the accuracy [20], and the results are presented in Table 2.

Table 2. Result of the 100 trials of experiment 1

According to Table 2, KDESOINN performed best in this experiment, and AKDESOINN performed better than SOINN.
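For reference, the Jensen-Shannon divergence used for these comparisons can be computed as in the following Python sketch; evaluating the true and estimated densities on a common grid and normalizing them is an illustrative assumption, since the discretization is not specified in this paper.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two densities evaluated on a common
    grid; p and q are non-negative arrays that are normalized internally."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))   # Kullback-Leibler divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Usage (illustrative): compare two 1-D Gaussians evaluated on a grid.
x = np.linspace(-5.0, 5.0, 200)
true_pdf = np.exp(-0.5 * x**2)
est_pdf = np.exp(-0.5 * (x - 0.3)**2)
print(js_divergence(true_pdf, est_pdf))
```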

4.2 Changing Gaussian Distribution

To evaluate the ability to adapt to changing data, a Gaussian distribution whose parameters change over time was used in this experiment. Specific details of the experiment are provided in Table 3.

Table 3. Information of experiment 2

The experiment was repeated 100 times, and the results were compared using the Jensen-Shannon divergence, as in experiment 1. The results of the comparison are presented in Table 4.

Table 4. Result of the 100 trials of experiment 2

According to Table 4, AKDESOINN was the most effective method in experiment 2 in terms of the mean value.

5 Conclusion

In this study, we proposed AKDESOINN, which is not only a robust, fast, online, nonparametric density estimator but also a method that adapts to changing data. KDESOINN combines KDE and SOINN and is known to outperform existing nonparametric density estimators in terms of robustness, calculation cost, and accuracy. The revised algorithm succeeded in adapting to changing data without losing this performance.

For future studies, we could analyze the application of AKDESOINN to meteorological data. Because of extraordinary climatic conditions, analysis models need to be updated constantly [21]. Since AKDESOINN can adapt its model online while analyzing the data, it may be effective in addressing the constant change of large amounts of climatic data. Beyond meteorological data, AKDESOINN could be applied to other fields that require nonparametric density estimation with the ability to adapt to changing data.