Abstract
Anomaly detectors distinguish normal from abnormal data, usually by computing an anomaly score for each instance and ranking instances by that score. A static unsupervised streaming anomaly detector cannot easily adjust its anomaly score calculation dynamically. In real scenarios, anomaly detection often needs to be guided by human feedback, which helps adjust the detector. In this paper, we propose a human-machine interactive streaming anomaly detection method, named ISPForest, which can be adaptively updated online under the guidance of human feedback. In particular, the feedback is used to adjust both the anomaly score calculation and the structure of the detector, with the aim of producing more accurate anomaly scores in the future. Our main contribution is a tree-based streaming anomaly detection model that can be updated online in terms of both anomaly score calculation and model structure. Our approach is instantiated for the powerful class of tree-based streaming anomaly detectors, and we conduct experiments on a range of benchmark datasets. The results demonstrate that incorporating feedback can improve the performance of anomaly detectors with little human effort.
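The general idea of letting feedback adjust the score calculation can be illustrated with a small sketch. This is not the ISPForest update rule, which the abstract does not specify; it is a generic multiplicative-weights scheme over an ensemble of component detectors, and the class name `FeedbackWeightedEnsemble`, the `learning_rate` parameter, and the example scores are all assumptions made for illustration.

```python
import math


class FeedbackWeightedEnsemble:
    """Hypothetical helper: combine per-detector anomaly scores in [0, 1]
    with weights that are tuned by human feedback."""

    def __init__(self, n_detectors, learning_rate=0.5):
        # Start with uniform weights over the component detectors.
        self.weights = [1.0 / n_detectors] * n_detectors
        self.learning_rate = learning_rate

    def score(self, detector_scores):
        # Final anomaly score is the weighted average of component scores.
        return sum(w * s for w, s in zip(self.weights, detector_scores))

    def feedback(self, detector_scores, is_anomaly):
        # Multiplicative-weights update: each detector is penalised in
        # proportion to how far its score was from the human label.
        target = 1.0 if is_anomaly else 0.0
        updated = [
            w * math.exp(-self.learning_rate * abs(s - target))
            for w, s in zip(self.weights, detector_scores)
        ]
        total = sum(updated)
        self.weights = [u / total for u in updated]


# Two detectors disagree about one instance; the analyst confirms an anomaly.
ensemble = FeedbackWeightedEnsemble(n_detectors=2)
instance_scores = [0.9, 0.2]
before = ensemble.score(instance_scores)
ensemble.feedback(instance_scores, is_anomaly=True)
after = ensemble.score(instance_scores)
```

After the confirmed label, the detector that scored the instance higher carries more weight, so the combined score for similar instances rises. A tree-based detector could apply the same principle to trees or nodes rather than whole detectors, but that mapping is an assumption here.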
Acknowledgements
This work was supported in part by the National Science Fund for Distinguished Young Scholars (Grant No. 61725205) and the National Natural Science Foundation of China (Grant Nos. 61960206008, 61772428, 61972319, and 61902320).
Author information
Qingyang Li received the bachelor’s degree from Northwestern Polytechnical University, China in 2016. She is currently a PhD student with the School of Computer Science, Northwestern Polytechnical University, China. Her research interests include ubiquitous computing, machine learning, and human-computer interaction.
Zhiwen Yu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2005. He is currently a Professor and the Dean of the School of Computer Science, Northwestern Polytechnical University, China. He was an Alexander von Humboldt Fellow with Mannheim University, Germany and a Research Fellow with Kyoto University, Japan. His research interests include ubiquitous computing, HCI, and mobile sensing and computing.
Huang Xu received the PhD degree in computer science from Northwestern Polytechnical University, China in 2019. His primary research interests include data mining and ubiquitous computing. He has published in refereed conference proceedings, including ACM SIGKDD, IJCAI, and IEEE ICDM.
Bin Guo received the PhD degree in computer science from Keio University, Japan in 2009. He was a Postdoctoral Researcher with the Institut TELECOM SudParis, France. He is currently a Professor with Northwestern Polytechnical University, China. His research interests include ubiquitous computing, mobile crowd sensing and computing, and HCI.
Cite this article
Li, Q., Yu, Z., Xu, H. et al. Human-machine interactive streaming anomaly detection by online self-adaptive forest. Front. Comput. Sci. 17, 172317 (2023). https://doi.org/10.1007/s11704-022-1270-y
DOI: https://doi.org/10.1007/s11704-022-1270-y