Abstract
Extracting useful knowledge from numerous distributed data repositories can be very hard when the data cannot be centralized or unified into a single file or database. This paper proposes practical distributed clustering algorithms that do not access the raw data, in order to overcome the inefficiency of centralized clustering methods. The aim of this research is to generate a unit-volume-based probabilistic mixture model from local clustering results without moving the original data. We show that our method is appropriate for distributed clustering when the real data cannot be accessed or centralized.
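The abstract does not spell out the unit-volume construction, so the following is only a minimal sketch of the general idea it describes: each site reduces its local clusters to summary statistics, and a coordinator builds a global probabilistic mixture from those summaries without ever seeing the raw points. The one-dimensional Gaussian components, the size-proportional weights, and the names `Component`, `summarize`, `global_mixture`, and `density` are all assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """Summary of one local cluster (assumed 1-D Gaussian for brevity)."""
    n: int        # number of points in the cluster
    mean: float
    var: float


def summarize(points: Sequence[float]) -> Component:
    """Runs at a local site: reduce one cluster's raw points to statistics."""
    n = len(points)
    mean = sum(points) / n
    var = sum((x - mean) ** 2 for x in points) / n
    return Component(n, mean, max(var, 1e-9))  # floor avoids zero variance


def global_mixture(site_summaries: List[List[Component]]) -> List[tuple]:
    """Runs at the coordinator: pool summaries into (weight, mean, var) triples.

    Only the summaries cross the network; weights are proportional to
    cluster sizes (an assumption of this sketch).
    """
    comps = [c for site in site_summaries for c in site]
    total = sum(c.n for c in comps)
    return [(c.n / total, c.mean, c.var) for c in comps]


def density(x: float, mixture: List[tuple]) -> float:
    """Evaluate the pooled mixture density at x."""
    return sum(
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in mixture
    )
```

For example, a site holding two clusters would send `[summarize(cluster_a), summarize(cluster_b)]` to the coordinator, which calls `global_mixture` on the list of all sites' summaries. A real system would also merge overlapping components from different sites, which this sketch leaves out.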
This work is supported by grant No. R01-2004-000-10689-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, K., Joo, J., Yang, J., Park, S. (2005). Unit Volume Based Distributed Clustering Using Probabilistic Mixture Model. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_29
DOI: https://doi.org/10.1007/11563983_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science, Computer Science (R0)