Summary
Clustering is commonly regarded as a synonym for unsupervised learning aimed at discovering structure in high-dimensional data. The large number of existing algorithms gives the area a remarkable diversity of approaches, each with its own underlying features and potential applications. With the inclusion of fuzzy sets, fuzzy clustering became an integral component of Computational Intelligence (CI) and is now broadly exploited in fuzzy modeling, fuzzy control, pattern recognition, and exploratory data analysis. Many pursuits of CI are human-centric in the sense that they are either initiated or driven by some domain knowledge, or the results generated by the CI constructs are made easily interpretable. To follow this tendency toward human-centricity, so clearly visible in the CI domain, the very concept of fuzzy clustering needs to be carefully revisited. We propose a paradigm shift that leads to the idea of knowledge-based clustering, in which the development of information granules (fuzzy sets) is governed by the use of data as well as domain knowledge supplied through interaction with developers, users, and experts. In this study, we elaborate on the concepts and algorithms of knowledge-based clustering by considering the well-known scheme of Fuzzy C-Means (FCM) and viewing it as an operational model with which a number of essential developments can be easily explained. The fundamental concepts discussed here involve clustering with domain knowledge articulated through partial supervision and proximity-based knowledge hints.
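For readers unfamiliar with the FCM scheme that the summary takes as its operational model, the following is a minimal sketch of the standard, purely data-driven algorithm in plain Python. It is not the chapter's knowledge-based variant: partial supervision and proximity hints are not modeled here. The function names (`fcm`, `sq_dist`), the parameters (`m`, `n_iter`, `tol`, `seed`), and the random initialization of the partition matrix are illustrative assumptions, not taken from the chapter.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fcm(data, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means (illustrative sketch, assumed interface).

    data : list of points (sequences of floats)
    c    : number of clusters
    m    : fuzzification coefficient, m > 1
    Returns (u, protos), where u[k][i] is the membership of point k in
    cluster i and protos[i] is the prototype of cluster i as computed in
    the last iteration.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial partition matrix; each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    protos = []
    for _ in range(n_iter):
        # Prototype update: mean of the data weighted by u^m.
        protos = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            protos.append(
                [sum(w[k] * data[k][d] for k in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from squared distances to the prototypes.
        new_u = []
        for k in range(n):
            d2 = [sq_dist(data[k], v) for v in protos]
            if min(d2) == 0.0:
                # Point coincides with one or more prototypes.
                zeros = [i for i, d in enumerate(d2) if d == 0.0]
                row = [1.0 / len(zeros) if i in zeros else 0.0
                       for i in range(c)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                row = [x / total for x in inv]
            new_u.append(row)

        # Stop when the partition matrix no longer changes noticeably.
        change = max(abs(new_u[k][i] - u[k][i])
                     for k in range(n) for i in range(c))
        u = new_u
        if change < tol:
            break

    return u, protos
```

A call such as `u, protos = fcm(points, c=2)` returns graded memberships rather than hard labels. The knowledge-based variants named in the summary would add terms for labeled patterns or proximity constraints on top of this data-driven core.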
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pedrycz, W. (2007). Knowledge-Based Clustering in Computational Intelligence. In: Duch, W., Mańdziuk, J. (eds) Challenges for Computational Intelligence. Studies in Computational Intelligence, vol 63. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71984-7_12
DOI: https://doi.org/10.1007/978-3-540-71984-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71983-0
Online ISBN: 978-3-540-71984-7
eBook Packages: Engineering, Engineering (R0)