Abstract
Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data and cannot reflect clustering quality under the evolving nature of data streams. On the other hand, available stream clustering evaluation measures consider only the errors of the full-space clustering, not the quality of the subspace clustering.
In this paper we propose what is, to the best of our knowledge, the first subspace clustering measure designed for streaming data, called SubCMM: Subspace Cluster Mapping Measure. SubCMM is an effective evaluation measure for stream subspace clustering that is able to handle errors caused by emerging, moving, or splitting subspace clusters. Additionally, we propose a novel method for using available offline subspace clustering measures for data streams within the Subspace MOA framework.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hassani, M., Kim, Y., Choi, S., Seidl, T. (2013). Effective Evaluation Measures for Subspace Clustering of Data Streams. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_30
DOI: https://doi.org/10.1007/978-3-642-40319-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science (R0)