Abstract
Clustering unbounded data streams is difficult because observed instances cannot be stored for future clustering decisions. Moreover, the probability distribution of a stream tends to change over time, making it challenging to differentiate between concept drift and an anomaly. Although many excellent data-stream clustering algorithms have been proposed, they are not suited to capturing the temporal contexts of an entity.
In this paper, we propose pcStream, a novel data-stream clustering algorithm for dynamically detecting and managing sequential temporal contexts. pcStream takes into account the properties of sensor-fused data streams in order to accurately infer the present concept and to dynamically detect new contexts as they occur. Moreover, the algorithm is capable of detecting point anomalies and can operate on high-velocity data streams. Lastly, our evaluation shows that pcStream outperforms state-of-the-art stream clustering algorithms in detecting real-world contexts from sensor-fused datasets. We also show how pcStream can be used as an analysis tool for contextual sensor streams.
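To make the distinction between a point anomaly and a newly emerging context concrete, the following is a minimal illustrative sketch. It is not the pcStream procedure, which the abstract does not specify. It assumes a simple scheme in which an observation that fits no known context is held in a small buffer: if a fitting observation arrives before the buffer fills, the buffered points are treated as anomalies, and if the buffer fills with consecutive non-fitting points, a new context is created. The class names, the `threshold` and `buffer_size` parameters, and the use of a per-feature standardized distance (a diagonal stand-in for the Mahalanobis distance) are all assumptions made for illustration.

```python
from math import sqrt


class Context:
    """Running per-feature mean and variance (Welford's method) for one context."""

    def __init__(self, points):
        dim = len(points[0])
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        for p in points:
            self.update(p)

    def update(self, x):
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def distance(self, x, eps=1e-6):
        # Standardized Euclidean distance (assumed stand-in for Mahalanobis).
        total = 0.0
        for i, v in enumerate(x):
            var = self.m2[i] / (self.n - 1) if self.n > 1 else 0.0
            total += (v - self.mean[i]) ** 2 / (var + eps)
        return sqrt(total)


class ContextTracker:
    """Hypothetical tracker: buffers non-fitting points to tell anomalies from new contexts."""

    def __init__(self, seed_points, threshold=3.0, buffer_size=5):
        self.contexts = [Context(seed_points)]
        self.current = 0
        self.threshold = threshold      # assumed fit threshold
        self.buffer_size = buffer_size  # assumed number of consecutive misfits
        self.buffer = []
        self.anomalies = []

    def observe(self, x):
        dists = [c.distance(x) for c in self.contexts]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] <= self.threshold:
            # A fitting point interrupts the run of misfits: treat them as anomalies.
            self.anomalies.extend(self.buffer)
            self.buffer = []
            self.contexts[best].update(x)
            self.current = best
        else:
            self.buffer.append(x)
            if len(self.buffer) >= self.buffer_size:
                # A sustained run of misfits is treated as a new context.
                self.contexts.append(Context(self.buffer))
                self.buffer = []
                self.current = len(self.contexts) - 1
        return self.current


seed = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.05), (-0.05, -0.05)]
tracker = ContextTracker(seed)
tracker.observe((5.0, 5.0))   # isolated misfit, held in the buffer
tracker.observe((0.0, 0.02))  # fits context 0, so the misfit becomes an anomaly
for p in [(10.0, 10.0), (10.1, 9.9), (9.9, 10.1), (10.05, 10.05), (9.95, 9.95)]:
    tracker.observe(p)        # five consecutive misfits form a second context
```

In this sketch the buffer size controls the trade-off: a small buffer reacts quickly to a new context but is more likely to promote a burst of noise to a context, while a large buffer delays detection.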
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mirsky, Y., Shapira, B., Rokach, L., Elovici, Y. (2015). pcStream: A Stream Clustering Algorithm for Dynamically Detecting and Managing Temporal Contexts. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_10
DOI: https://doi.org/10.1007/978-3-319-18032-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)