
1 Introduction

The classification of data streams with concept drift is one of the main challenges of data mining [1,2,3]. In various real-world applications, including network intrusion detection, spam filtering, and credit card fraud detection [4], the cost and time of labeling make it unrealistic for experts to label all instances. Therefore, a semi-supervised classification algorithm that can handle concept drift plays a critical role in data stream mining.

To overcome these challenges, many methods have been reported in recent years, but they still have some limitations. Firstly, many existing methods [5, 6] use clustering to label the unlabeled instances, which requires the number of clusters to be assigned in advance and kept constant while the data stream is processed. Secondly, many methods ignore the impact of high-density clusters on concept drift detection [5, 7].

Specifically, in SUN [5], K-modes is first used to form clusters and label the unlabeled data. However, K-modes requires the number of clusters to be set in advance and kept unchanged during the processing of the data stream, which is unreasonable in many real-world applications: owing to the dynamic nature of data streams, new concept clusters may appear while old ones disappear. Secondly, the average deviation between the historical and new concept clusters is adopted to detect concept drifts, which ignores changes in high-density clusters, even though such changes are more likely to result in concept drifts.

In light of these limitations, we propose an approach for semi-supervised classification of data streams based on adaptive density peak clustering (SSCADP). The framework of SSCADP is the same as that of SUN, with two main differences.

Firstly, to generate concept clusters at the leaves of a Hoeffding tree, a density peak clustering method [8] and a change detection technique [9] are combined to adaptively locate the cluster centers, instead of using K-modes. Secondly, we consider that changes in clusters with higher density are more likely to reflect changes in the data distribution; hence, an improved detection method based on SUN is proposed, in which an adaptive weighted average strategy assigns higher weights to higher-density clusters.

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 describes the proposed algorithm in detail, including adaptively locating the cluster centers, labeling the unlabeled samples, and the concept drift detection method. Experiments and results are presented in Sect. 4. Finally, Sect. 5 summarizes this paper and outlines future work.

2 Related Work

The proposed approaches for semi-supervised classification of data streams with concept drift can be broadly divided into decision tree-based and non-decision tree-based methods. Decision tree-based methods such as SUN [5] and REDLLA [7] adopt a Hoeffding tree as the base classifier. During the construction of the base classifier, unlabeled data are labeled by clustering at the leaves and then used for concept drift detection and for updating the base classifier; concept drifts are detected based on the deviation between historical concept clusters and new ones. Others, like Sco-forest [10], extend the Co-forest algorithm to handle evolving data streams: the concomitant ensemble selects samples with high classification confidence and labels them to update the corresponding base classifier. If a concept drift is detected by Adwin2 [11], the base classifier with the worst accuracy is discarded.

The non-decision tree-based methods usually use cluster models for data stream classification. Reasc [12] maintains an ensemble of cluster-based classifiers; when the ensemble is updated, the cluster-based classifier with the worst accuracy is removed. SPASC [6] maintains a classifier pool composed of weighted cluster-based models, whose weights are adjusted adaptively according to the correctness of classification. SCBELS [13] maintains an ensemble of cluster-based classifiers constructed by BIRCH [14] and incrementally updated during the classification of the data stream; local structural information of the data is taken into account to deal with concept drifts. [15] proposed to dynamically maintain a set of micro-clusters: each instance is used to update the model, and outdated or low-reliability micro-clusters are removed to adapt to the evolving concepts of data streams.

Among other models, ECU [16] constructs an ensemble model that combines both classifiers and clusters for classification. [17] proposed a neural network framework for streaming data classification in which each layer consists of a generative network, a discriminant structure, and a bridge.

3 Proposed Algorithm

3.1 The Framework of the Proposed Algorithm

In this paper, a data stream is represented as \({{\textit{\textbf{D}}}}=\{{{\textit{\textbf{D}}}}^{0},{{\textit{\textbf{D}}}}^{1},{{\textit{\textbf{D}}}}^{2},\ldots ,{{\textit{\textbf{D}}}}^{t},\ldots \}\), in which \({{\textit{\textbf{D}}}}^{t}=\{{{\textit{\textbf{x}}}}_{1}^{t},{{\textit{\textbf{x}}}}_{2}^{t},\ldots ,{{\textit{\textbf{x}}}}_{m}^{t}\}\) denotes the data batch collected at time t. SSCADP is described in Algorithm 1. It employs a Hoeffding tree as its base classifier. After \({{\textit{\textbf{D}}}}^{t}\) is classified by the Hoeffding tree, each instance in it is sorted into a leaf and the corresponding statistics of the leaf are updated. When the number of instances that have arrived at the tree reaches dp, the labeled instances in each leaf l are utilized to label the unlabeled instances in it, and concept drift detection is then performed at the leaf. Next, a pruning strategy is applied to the Hoeffding tree, and if the number of instances in a new leaf reaches \(n_{min}\), the leaf attempts to split. After splitting, the updated Hoeffding tree is ready for the new concept.

Algorithm 1 (pseudocode not reproduced).
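To make the workflow above concrete, the following is a minimal Python sketch of the processing loop, assuming the batch/period settings described above. The HoeffdingTree class and all helper names are hypothetical placeholders, not the authors' implementation; the labeling and drift detection steps are detailed in Sects. 3.2 and 3.3.

```python
# Hypothetical sketch of the SSCADP processing loop (Algorithm 1).
# HoeffdingTree and every helper name are assumed, not the authors' code.
def sscadp_stream(batches, dp=200, n_min=200):
    tree = HoeffdingTree()                       # base classifier
    arrived = 0
    for batch in batches:                        # batch D^t at time t
        preds = [tree.classify(x) for x in batch]
        for x in batch:
            leaf = tree.sort_to_leaf(x)          # route x to a leaf
            leaf.update_statistics(x)
        arrived += len(batch)
        if arrived >= dp:                        # detection period reached
            arrived = 0
            for leaf in tree.leaves():
                label_unlabeled(leaf)            # Sect. 3.2
                leaf.drift_detected = detect_drift(leaf)  # Sect. 3.3
            tree.prune_drifted_subtrees()        # bottom-up pruning
            for leaf in tree.leaves():
                if leaf.n_instances >= n_min:
                    leaf.try_split()             # Hoeffding split test
        yield preds
```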

3.2 Adaptively Locate Cluster Centers and Label Unlabeled Instances

When a detection period is reached, a clustering method named clustering by fast search and find of density peaks (CFSDP) [8] and a change detection technique [9] are combined to adaptively locate the cluster centers. After the concept clusters at each leaf are created, graph-based label propagation [18] is applied to label the unlabeled data in each cluster. If a cluster contains no labeled instance, all unlabeled instances in it are assigned the majority label of the closest cluster.
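As an illustration of the per-leaf labeling step, the sketch below substitutes scikit-learn's LabelPropagation for the graph-based method of [18]; it is a stand-in under that assumption, not the exact algorithm used in the paper.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

def label_leaf(X, y):
    """Label the unlabeled instances at a leaf.
    X: instance matrix; y: labels, with -1 marking unlabeled instances."""
    model = LabelPropagation(kernel='rbf')  # stand-in for the method of [18]
    model.fit(X, y)                # propagate labels over a similarity graph
    return model.transduction_     # inferred labels for every instance
```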

The basic idea of CFSDP is that cluster centers should lie in regions of high data density and be far away from each other. CFSDP is based on two quantities: (1) \(\rho _{i}\), the local density of the i-th instance; and (2) \(\delta _{i}\), the minimum distance between the i-th instance and any instance with higher density. \(\rho _{i}\) and \(\delta _{i}\) are defined as

$$\begin{aligned} \rho _{i}=\sum \limits _{j\in I_{s}\setminus \{i\}}\exp \{-(d_{ij}/d_{c})^{2}\}, \quad \delta _i = \mathop {\min }\limits _{j:\rho _j > \rho _i} (d_{ij}). \end{aligned}$$
(1)

where \(d_{ij}\) is the Euclidean distance between the i-th and j-th instances, \(d_{c}\) is the cutoff distance, set empirically in the same way as in CFSDP, and \(I_{s}\) is the set of all instance indexes. A cluster center is thus characterized by both \(\rho _{i}\) and \(\delta _{i}\) being as large as possible. The \( \rho _{i}-\delta _{i}\) plot (decision graph) provides a visual way to determine the number of clusters.
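A direct implementation of Eq. (1) might look as follows; this is a sketch, with the convention of assigning the highest-density point the maximum distance as its \(\delta \), as in CFSDP [8].

```python
import numpy as np

def density_and_delta(X, d_c):
    """Compute rho_i and delta_i of Eq. (1) for instance matrix X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise d_ij
    rho = np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0  # subtract exp(0) for j = i
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # highest-density point: delta set to the maximum distance, as in CFSDP
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta
```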

In Function 1, \({{\gamma }_{i}}={{\rho }_{i}}{{\delta }_{i}}\) is computed for each instance and all \({\gamma }_{i}\) are sorted in ascending order. CFSDP assumes that the sorted \({\gamma }_{i}\) values follow a power-law distribution, and hence a jump point in the sorted sequence can be found.

Function 1 (pseudocode not reproduced).

After the cluster centers are determined, clustering is performed and label propagation is then conducted.

To find the jump point of the sorted \({\gamma }_{i}\), we adopt the idea of change detection in SAND [9], which detects the location of the most significant change in a series of values along a direction. We further assume that the sorted \(\gamma _{i}\) follow a Pareto distribution. Once the jump point is found, the number of clusters is determined. As shown in Fig. 1, the point in the red circle is the jump point, and the points above it are the center points.

The probability density function of the Pareto distribution can be expressed as \(f(x,a,k)=a{{k}^{a}}{{x}^{-(a+1)}}\), where a is the shape parameter and k is the scale parameter. The corresponding log probability density function is \(\log f(x,a,k)=a\log (k)+\log (a)-(a+1)\log (x)\). Define a random variable \(\gamma \sim \mathrm{Pareto}(a,k)\), with \(\{\gamma _{i}\}\) the observed values of \(\gamma \). The maximum likelihood estimates of the parameters are calculated as follows, where N is the total number of observed values.

$$\begin{aligned} {\hat{k}_{MLE}} = \mathop {\min }\limits _{1 \le i \le N} \{ {\gamma _i}\}, \ {\hat{a}_{MLE}} = N/\sum \nolimits _{i = 1}^N {(\ln {\gamma _i} - \ln {{\hat{k}}_{MLE}})} \end{aligned}$$
(2)
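In code, the estimates of Eq. (2) and the log-likelihood under the fitted model can be computed as follows; this sketch is reused by the jump-point search below.

```python
import numpy as np

def pareto_mle(gamma):
    """Maximum likelihood estimates of Eq. (2) for Pareto(a, k)."""
    gamma = np.asarray(gamma, dtype=float)
    k_hat = gamma.min()
    # epsilon guards a degenerate window of identical values
    a_hat = len(gamma) / (np.sum(np.log(gamma) - np.log(k_hat)) + 1e-12)
    return a_hat, k_hat

def pareto_loglik(gamma, a, k):
    """Sum over observations of log f(x,a,k) = a*log(k) + log(a) - (a+1)*log(x)."""
    gamma = np.asarray(gamma, dtype=float)
    return len(gamma) * (a * np.log(k) + np.log(a)) - (a + 1) * np.sum(np.log(gamma))
```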
Fig. 1. Jump point detection.

To detect the jump point, Function 2 divides the sorted \(\gamma \) sequence into two sub-windows at index k, for k from \(N/2\) to \(N-3\). The k value that maximizes the statistical difference between the two sub-windows is taken as the index of the jump point.
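The text does not spell out the difference statistic, so the sketch below assumes it is the gain in Pareto log-likelihood when the two sub-windows are fitted separately rather than jointly, in the spirit of the SAND change detector [9]; that choice is an assumption. It reuses the pareto_mle and pareto_loglik helpers sketched after Eq. (2).

```python
import numpy as np

def find_jump_point(gamma_sorted):
    """Return the assumed jump-point index in the ascending-sorted gamma values."""
    N = len(gamma_sorted)
    a0, k0 = pareto_mle(gamma_sorted)
    base = pareto_loglik(gamma_sorted, a0, k0)       # single-model fit
    best_k, best_gain = None, -np.inf
    for k in range(N // 2, N - 3):                   # split range from the text
        left, right = gamma_sorted[:k], gamma_sorted[k:]
        a_l, k_l = pareto_mle(left)
        a_r, k_r = pareto_mle(right)
        gain = (pareto_loglik(left, a_l, k_l)
                + pareto_loglik(right, a_r, k_r) - base)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k    # instances above this index become cluster centers
```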

Function 2 (pseudocode not reproduced).

3.3 Concept Drift Detection

A concept drift detection method is installed at each leaf. Before introducing the details, several variables need to be defined. \({r_{hist}}\) and \({r_{new}}\) denote the radii of the set of historical concept clusters \(C_{hist}\) and of the new ones \(C_{new}\), and \(n_{hist}\) and \(n_{new}\) denote the numbers of clusters in \(C_{hist}\) and \(C_{new}\). \(r_k\) denotes the radius of a cluster and is computed as the average Euclidean distance from all instances in the cluster to its center: \({r_k} = \sum \nolimits _{i = 1}^{\left| {{C_k}} \right| } {\sqrt{\sum \nolimits _{j = 1}^{D} {({c_{kj}} - {x_{ij}}} {)^2}} } /\left| {{C_k}} \right| \), where \( {x_i} = \{ {x_{i1}}, {x_{i2}}, \ldots , {x_{iD}}\} \in {C_k}\) is the i-th instance in cluster \(C_{k}\), D is the attribute dimension, \(c_{k}\) is the cluster center of \(C_{k}\), and \(|C_{k}|\) is the total number of instances in \(C_{k}\). \({r_{hist}}\mathrm{{ = }}\sum \nolimits _{i = 1}^{n_{hist}} {{r_i}} / n_{hist}\), and \(r_{new}\) is calculated in the same way. dist measures the average distance between the two concept cluster sets: \(dist = (\sum \nolimits _{i = 1}^{{n_{new}}} {\min [\sqrt{\sum \nolimits _{j = 1}^D {{{({c_{ij}} - {c_{kj}})}^2}} } ,{c_k} \in {C_{hist}},1 \le k \le {n_{hist}}])/{n_{new}}}\), where \({c_k}\) and \({c_i}\) denote cluster centers in \({C_{hist}}\) and \({C_{new}}\), respectively. In SUN, a value of dist greater than \(\mathrm{max}(r_{hist},r_{new})\) indicates concept drift.
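For concreteness, the cluster radius \(r_k\) and SUN's average distance between cluster sets can be computed as below; a sketch in which clusters are represented by their instance matrices and center vectors.

```python
import numpy as np

def cluster_radius(X, center):
    """r_k: mean Euclidean distance from a cluster's instances to its center."""
    return np.linalg.norm(X - center, axis=1).mean()

def avg_set_distance(centers_new, centers_hist):
    """SUN's dist: each new center's distance to its closest historical
    center, averaged over the n_new new clusters."""
    d = np.linalg.norm(centers_new[:, None, :] - centers_hist[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```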

In our algorithm, however, it is assumed that changes of points with higher density are more likely to reflect changes in the data distribution. Therefore, dist is redefined as \(dist = \sum \nolimits _{i = 1}^{{n_{new}}} {{w_i}dis{t_i}}\) to capture the change in the data distribution more accurately, where \(dist_i\) and \(w_i\) are calculated by (3) and (4), respectively. \(dist_i\) is the distance of the cluster center \(c_{i}\) to \(C_{hist}\). \({\rho _{{c_i}}}\) is the average density of cluster \(C_{i}\) and of \(C_{k}\), the closest cluster to \(C_{i}\) in \(C_{hist}\); \({\rho _{{c_{ij}}}}\) is the density of the j-th instance belonging to \(C_{i}\); and \(n_{i}\) and \(n_{k}\) are the total numbers of instances in \(C_{i}\) and \(C_{k}\), respectively. Hence, larger \(w_i\) and \(dist_i\) indicate concept drift. As in SUN, if the value of dist exceeds \(\mathrm{max}(r_{hist},r_{new})\), a real concept drift is considered to have occurred. The process of drift detection is described in Function 3.

(3)
(4)
Function 3 (pseudocode not reproduced).

dist can be utilized to detect concept drifts caused by a change of \(\mathrm {P}(x)\). In addition, since concept drift can also be caused by a change in the distribution of class labels, the class labels of the historical concept clusters and the new ones are compared when no drift is detected by dist. If the class labels of \(C_{hist}\) and \(C_{new}\) are completely opposite, this is also defined as a real concept drift.
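Equations (3) and (4) are not reproduced above, so the following sketch only mirrors the surrounding text: \(dist_i\) is taken as the distance from \(c_i\) to its closest historical center, and \(w_i\) is assumed to be the density \(\rho _{c_i}\) normalized to sum to one; both forms are assumptions, as is the pairing of each new cluster with its closest historical cluster in the label-flip check.

```python
import numpy as np

def drift_detected(centers_new, centers_hist, rho_new, r_hist, r_new,
                   labels_new, labels_hist):
    d = np.linalg.norm(centers_new[:, None, :] - centers_hist[None, :, :], axis=-1)
    dist_i = d.min(axis=1)                 # distance of each c_i to C_hist
    w = rho_new / rho_new.sum()            # ASSUMED form of Eq. (4)
    dist = float(np.sum(w * dist_i))       # weighted average of Sect. 3.3
    if dist > max(r_hist, r_new):
        return True                        # drift caused by a change of P(x)
    # drift caused by a label flip: compare each new cluster's label with
    # that of its closest historical cluster (this pairing is an assumption)
    nearest = d.argmin(axis=1)
    return all(labels_new[i] != labels_hist[j] for i, j in enumerate(nearest))
```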

After a bottom-up search finds all drift leaves, a pruning method adjusts the tree to cope with concept drifts. Each level of the tree is traversed once, checking each leaf for concept drift from bottom to top until the root is reached. If concept drift has been detected at all child nodes of a node, these child nodes are pruned, and the resulting new leaf node may split again later.
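A recursive sketch of this bottom-up pruning might look as follows; the node fields are hypothetical names.

```python
def prune_drifted(node):
    """Bottom-up pruning: prune a node's children when drift was detected at
    all of them; returns whether this node now counts as drifted."""
    if node.is_leaf():
        return node.drift_detected
    flags = [prune_drifted(child) for child in node.children]  # recurse first
    if all(flags):
        node.children = []            # prune: the node becomes a new leaf
        node.drift_detected = True    # so pruning can continue one level up
        return True
    return False
```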

4 Experiments

In this paper, all synthetic datasets are generated by MOA [19]. The suffixes xxx-abr, xxx-gra and xxx-inc denote the abrupt, gradual and incremental concept drift types, respectively. In the datasets with gradual drifts, it takes 5000 instances to change from one concept to another. In addition, to verify the performance of SSCADP on datasets where the number of clusters varies markedly, we generate a Gaussian dataset with dynamically changing clusters and concept changes. Figure 2(a), (b) and (c) show three different distributions that evolve along the time sequence, with positive instances as red '+' and negative instances as green '\(\times \)'; each subfigure contains 600 instances.

Table 1 shows the properties of all datasets. For Sea, four concepts are generated by setting \(\theta \) = 8, 9, 7, and 9.5. For Sine, the label is 0 if \(a\sin (b x_1+\theta )+c>x_2\) and 1 otherwise; four concepts are generated by setting \(a=b=1\), \(c=\theta =0\); the same setting with the class labels reversed; \(a=0.3\), \(b=3\pi \), \(c=0.5\), \(\theta =0\); and the same setting with the class labels reversed. For HyperPlane-abr and HyperPlane-gra, four concepts are generated by setting \(w^1= (0, 0.5, 0.5)\), \(w^2= (0, 1, 0)\), \(w^3= (1, 0, 0)\) and \( w^4= (0.5, 0, 0.5)\), while for HyperPlane-inc, \(d=10\). For Agrawal, functions 1, 2, 5 and 6 are selected as concepts 1, 2, 3 and 4. The real-world datasets Weather and Electricity are also used. For each synthetic dataset, 10 copies are randomly generated, while for each real dataset the labeled instances are randomly selected over 10 runs.
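As an illustration, one Sine concept could be generated as follows; a sketch in which the uniform [0, 1] attribute range follows MOA's Sine generator, and the flip flag produces the "class labels reversed" concepts.

```python
import numpy as np

def sine_concept(n, a=1.0, b=1.0, c=0.0, theta=0.0, flip=False, seed=None):
    """Generate n instances of one Sine concept: label 0 if
    a*sin(b*x1 + theta) + c > x2, otherwise 1; flip reverses the labels."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, n)
    x2 = rng.uniform(0.0, 1.0, n)
    y = np.where(a * np.sin(b * x1 + theta) + c > x2, 0, 1)
    if flip:
        y = 1 - y
    return np.column_stack([x1, x2]), y
```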

In this paper, \(n_{min}\) is the minimum number of instances required before a leaf attempts a split test, and it is set to 200. dp is the detection period, with \(dp=200\) set empirically. \(\alpha =0.95\) is the confidence used in the clustering algorithm.

Fig. 2. Changes in clusters.

4.1 Experimental Results

SSCADP is compared with three baseline methods: SUN, SPASC, and Reasc. Three groups of experiments are conducted to evaluate the accuracy, the impact of the label ratio, and concept drift tracking. In the first and last groups, to simulate the limited labeled data available in real applications, the label ratio of all datasets is set to 0.1.

Accuracy. The accumulative accuracy is used to evaluate the performance of the baseline methods and SSCADP. Table 2 shows the detailed results. For all datasets, each result is obtained by averaging over 10 runs.

It can be observed that SSCADP performs better than the baseline algorithms on almost all datasets. The Friedman test is conducted on the results in Table 2, with the average ranks also shown in Table 2; SSCADP achieves the best rank. The test statistic is \(F_F=10.654\) and the critical value for \(\alpha =0.05\) is 2.892, so we can reject the null hypothesis that there is no difference among the performances of the algorithms. Furthermore, the Nemenyi test gives \(CD=1.35\), which means SSCADP performs significantly better than SUN and Reasc. There is no significant difference in performance between SSCADP and SPASC.
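The \(F_F\) statistic reported here is the Iman-Davenport correction of the Friedman statistic; a sketch of the computation follows, assuming an accuracy matrix with one row per dataset and one column per algorithm.

```python
from scipy.stats import friedmanchisquare

def iman_davenport_ff(acc):
    """acc: list of rows, one per dataset, each holding one accuracy per
    algorithm; returns the Iman-Davenport F_F statistic."""
    n_datasets, n_algos = len(acc), len(acc[0])
    chi2, _ = friedmanchisquare(*zip(*acc))   # one sample per algorithm
    return (n_datasets - 1) * chi2 / (n_datasets * (n_algos - 1) - chi2)
```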

Table 1. Properties of the datasets
Table 2. Accumulative accuracy (%) on all datasets.

Impact of Label Ratio. Considering the influence of the labeling ratio on classification accuracy, we compare the algorithms with the label ratio set to 0.05 and 0.2. Detailed results are shown in Table 3. In the 0.05 case, the Friedman test gives \(F_F=3.175\), with a critical value of 2.892 for \(\alpha =0.05\), indicating that the performances of the algorithms differ significantly. Furthermore, the Nemenyi test gives \(CD=1.35\), so SSCADP performs significantly better than SUN. These results indicate that even when very few labels are available, SSCADP achieves better performance, making it suitable for semi-supervised classification of data streams.

In the 0.2 case, the Friedman test gives \(F_F=6.567\), with a critical value of 2.892 for \(\alpha =0.05\), again indicating that the performances of the algorithms differ significantly. Furthermore, the Nemenyi test gives \(CD=1.35\), so SSCADP performs significantly better than SUN and Reasc. There is no significant difference in performance between SSCADP and SPASC in either the 0.05 or the 0.2 case.

Table 3. Impact of the label ratio on accumulative accuracy (%).

Concept Drift Tracking. The drift tracking performance of all algorithms on all datasets with abrupt drifts is shown in Fig. 3. The numbers at the bottom identify the concepts, and the vertical lines indicate the locations of concept drifts. On Sea, when new concepts arrive, the accuracy of SSCADP declines less in most cases, especially between concepts 3 and 4, which are quite different. On HyperPlane, SSCADP performs well in most cases except for concept 2. On Agrawal, the accuracy of SSCADP also declines less in most cases. On Sine, SSCADP does not perform well, since it is a single-model method and there are large differences between the concepts; SPASC, which adopts an ensemble model and can deal with recurring drifts, achieves better performance here.

Fig. 3. Drift tracking graph.

5 Conclusions

In this paper, we propose SSCADP to handle the semi-supervised classification of data streams. SSCADP adaptively locates cluster centers by considering both the local density of a point and its distance to points of higher density, and detects concept drifts by combining the density of clusters with the distance between the historical clusters and the new ones. Experimental results show that SSCADP achieves better performance than the baseline algorithms on most datasets. In the future, we will focus on how to effectively combine labeled and unlabeled data to detect concept drift, and explore how to deal with recurring concept drifts more effectively.