Abstract
Similarity is a key concept in any attempt to build human-like automated systems or assistants for human task solving, since similarity judgments are natural in the human process of categorization, which underlies many capabilities such as language understanding, pattern recognition, and decision-making. In this paper, we study the use of similarities in data mining, grounding our discussion in cognitive approaches to similarity stemming, for instance, from the seminal works of Tversky and Rosch. We outline a general framework for measures of comparison compatible with these cognitive foundations, and we show that measures of similarity can be involved in all steps of the data mining process. We then focus on fuzzy logic, which provides useful tools for data mining mainly because of its ability to represent imperfect information; this is of crucial importance when databases are complex, large, and contain heterogeneous, imprecise, vague, uncertain, or incomplete data. Finally, we illustrate our discussion with examples of similarities used in real-world data mining problems.
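As a rough illustration of what a measure of comparison on fuzzy data can look like, the sketch below computes a Tversky-style ratio similarity between two fuzzy sets. It is not the framework of the chapter itself. The names `FuzzySet`, `sigma_count`, and `ratio_similarity`, the use of `min` for intersection, the bounded difference `max(0, a - b)`, and the sum of memberships as the set measure are all assumptions chosen for this example.

```python
from typing import Dict

# Assumption: a fuzzy set is a mapping element -> membership degree in [0, 1].
FuzzySet = Dict[str, float]


def sigma_count(f: FuzzySet) -> float:
    """Fuzzy set measure used here: the sum of membership degrees."""
    return sum(f.values())


def intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Pointwise minimum (one common choice of fuzzy intersection)."""
    return {k: min(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b)}


def difference(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Bounded difference max(0, a - b); an assumed choice for this sketch."""
    return {k: max(0.0, a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b)}


def ratio_similarity(a: FuzzySet, b: FuzzySet,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    """Ratio of common features to common plus weighted distinctive features.

    With crisp sets, alpha = beta = 1 reduces to the Jaccard index and
    alpha = beta = 0.5 to the Dice index; alpha != beta makes it asymmetric.
    """
    common = sigma_count(intersection(a, b))
    a_only = sigma_count(difference(a, b))
    b_only = sigma_count(difference(b, a))
    denom = common + alpha * a_only + beta * b_only
    # Convention assumed here: two empty descriptions are fully similar.
    return common / denom if denom > 0 else 1.0


if __name__ == "__main__":
    young = {"age20": 1.0, "age30": 0.6, "age40": 0.1}
    adult = {"age20": 0.4, "age30": 1.0, "age40": 1.0}
    print(ratio_similarity(young, adult))            # symmetric weighting
    print(ratio_similarity(young, adult, 1.0, 0.0))  # asymmetric weighting
```

Setting `alpha` and `beta` to different values gives an asymmetric comparison, in the spirit of Tversky's observation that human similarity judgments need not be symmetric.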
References
Rosch, E.: Principles of categorization. In: Rosch, E., Lloyd, B. (eds.) Cognition and Categorization. Lawrence Erlbaum, Mahwah (1978)
Rosch, E., Mervis, C.: Family resemblance: studies of the internal structure of categories. Cognitive Psychology 7, 573–605 (1975)
Tijus, C.: Introduction à la psychologie cognitive. Nathan Université (2001)
Hampton, J.A.: The role of similarity in natural categorization. In: Hahn, U., Ramscar, M. (eds.) Similarity and Categorization, pp. 13–28. Oxford University Press, Oxford (2001)
Wittgenstein, L.: Philosophical Investigations. Blackwell Publishing, Malden (1953/2001)
Kleiber, G.: Prototype et prototypes. In: Sémantique et cognition. CNRS, Paris (1991)
Barsalou, L.W.: Ideals, Central Tendency, and Frequency of Instantiation as Determinants of Graded Structure. Journal of Experimental Psychology: Learning, Memory and Cognition 11, 629–654 (1985)
Nosofsky, R.M.: Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition 14(1), 54–65 (1988)
Poitrenaud, S., Richard, J.-F., Tijus, C.: Properties, categories and categorization. Thinking and Reasoning 11(2), 151–208 (2005)
Posner, M.I., Keele, S.W.: On the genesis of abstract ideas. Journal of Experimental Psychology 77, 353–363 (1968)
Barsalou, L.W.: The instability of graded structure: Implications for the nature of concepts. In: Neisser, U. (ed.) Concepts and conceptual development: Ecological and intellectual factors in categorization, pp. 101–140. Cambridge University Press, Cambridge (1987)
Tversky, A.: Features of similarity. Psychological Review 84(4), 327–352 (1977)
Hahn, U., Ramscar, M.: Introduction: similarity and categorization. In: Hahn, U., Ramscar, M. (eds.) Similarity and categorization, pp. 1–11. Oxford University Press, Oxford (2001)
Medin, D.L., Goldstone, R.L., Gentner, D.: Respects for similarity. Psychological Review 100(2), 254–278 (1993)
Keane, M.T., Smyth, B., O'Sullivan, F.: Dynamic similarity: A processing perspective on similarity (2001)
Markman, A.B., Gentner, D.: Structural alignment during similarity comparisons. Cognitive Psychology 25, 431–467 (1993)
Rissland, E.: AI and similarity. IEEE Intelligent Systems 21, 39–49 (2006)
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. AI Magazine 17(3), 37–54 (1996)
Delavallade, T., Dang, T.H.: Using Entropy to Impute Missing Data in a Classification Task. In: IEEE International Conference on Fuzzy Systems, London, pp. 1–6 (2007)
Timm, H., Döring, C., Kruse, R.: Differentiated treatment of missing values in fuzzy clustering. In: De Baets, B., Kaynak, O., Bilgiç, T. (eds.) IFSA 2003. LNCS, vol. 2715, pp. 354–361. Springer, Heidelberg (2003)
Song, Q., Shepperd, M.: A new imputation method for small software project data sets. Journal of Systems and Software 80(1), 51–62 (2007)
Setnes, M., Babuska, R., Kaymak, U., van Nauta Lemke, H.R.: Similarity measures in fuzzy rule base simplification. IEEE Transactions on Systems, Man, and Cybernetics, Part B 28(3), 376–386 (1998)
Pichlova, M., Bouchon-Meunier, B.: Using fuzzy association rules for defect forecasting in pipelines. Rencontres Francophones sur la Logique Floue et ses Applications, Nantes, Cépaduès-Editions, 305–312 (2004)
Marsala, C.: Fuzzy partitioning methods. In: Pedrycz, W. (ed.) Granular Computing: An Emerging Paradigm, pp. 163–186. Physica-Verlag GmbH, Heidelberg (2001)
Yuan, Y., Shaw, M.J.: Induction of Fuzzy Decision Trees. Fuzzy Sets and Systems 69, 125–139 (1995)
Marsala, C., Bouchon-Meunier, B.: An adaptable system to construct fuzzy decision trees. In: Proceedings of the 18th International Conference of the North American Society, pp. 223–227 (1999)
Bouchon-Meunier, B., Marsala, C.: Linguistic modifiers and measures of similarity or resemblance. In: 9th IFSA World Congress, Vancouver, pp. 2195–2199 (2001)
Laurent, A., Marsala, C., Bouchon-Meunier, B.: Improvement of the Interpretability of Fuzzy Rule Based Systems: Quantifiers, Similarities and Aggregators. In: Davenport, J.H. (ed.) On the Integration of Algebraic Functions. LNCS (LNAI), pp. 102–123. Springer, Heidelberg (1981)
Hüllermeier, E.: Fuzzy-Methods in Machine Learning and Data Mining: Status and Prospects. Fuzzy Sets and Systems 156(3), 387–407 (2005)
Guillaume, S.: Designing Fuzzy Inference Systems from Data: An Interpretability-Oriented Review. IEEE Transactions on Fuzzy Systems 9(3), 426–444 (2001)
Zadeh, L.A.: Similarity relations and fuzzy orderings. Information Sciences 3, 177–200 (1971)
Valverde, L.: On the structure of F-indistinguishability operators. Fuzzy Sets and Systems 17, 313–328 (1985)
Jaccard, P.: Nouvelles recherches sur la distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles 44, 223–270 (1908)
Dice, L.R.: Measures of the amount of ecological association between species. Ecology 26, 297–302 (1945)
Ochiai, A.: Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions. Bulletin of the Japanese Society for Science and Fisheries 22, 526–530 (1957)
Zwick, R., Carlstein, E., Budescu, D.V.: Measures of similarity among fuzzy concepts: A comparative analysis. International Journal of Approximate Reasoning 1, 221–242 (1987)
Chen, S., Yeh, M., Hsiao, P.: A comparison of similarity measures of fuzzy values. Fuzzy Sets and Systems 72(1), 79–89 (1995)
Xuzhu, W., De Baets, B., Kerre, E.: A comparative study of similarity measures. Fuzzy Sets and Systems 73(2), 28, 259–268 (1995)
Jain, R., Murthy, S.N.J., Chen, P.L.-J., Chatterjee, S.: Similarity measures for image databases. In: IEEE International Conference on Fuzzy Systems, pp. 1247–1254 (1995)
Li, Y., Liu, J.-M., Li, J., Deng, W., Ye, C.-X., Wu, Z.-F.: The fuzzy similarity measures for content-based image retrieval. In: Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 5, pp. 3224–3228 (2003)
Cross, V.V., Sudkamp, T.A.: Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. Physica-Verlag (2002)
Dubois, D., Prade, H.: A unifying view of comparison indices in a fuzzy set-theoretic framework. In: Yager, R.R. (ed.) Fuzzy and possibility theory, pp. 3–13. Pergamon Press, Oxford (1982)
Shiina, K.: A fuzzy-set-theoretic feature model and its application to asymmetric data analysis. Japanese Psychological Research 30(3), 95–104 (1988)
Santini, S., Jain, R.: Similarity measures. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9), 871–883 (1999)
Tolias, Y.A., Panas, S.M., Tsoukalas, L.H.: Generalized fuzzy indices for similarity matching. Fuzzy Sets and Systems 120(2), 255–270 (2001)
Rifqi, M.: Mesures de comparaison, typicalité et classification d’objets flous: théorie et pratique. PhD thesis, Université Paris VI (1996)
Bouchon-Meunier, B., Rifqi, M., Bothorel, S.: Towards general measures of comparison of objects. Fuzzy Sets and Systems 84(2), 143–153 (1996)
Bouchon-Meunier, B., Rifqi, M.: OWA operators and an extension of the contrast model. In: Yager, R.R., Kacprzyk, J. (eds.) The Ordered Weighted Averaging Operators: Theory, Methodology, and Applications, pp. 29–35. Kluwer Academic Publishers, Dordrecht (1997)
Rifqi, M., Berger, V., Bouchon-Meunier, B.: Discrimination power of measures of comparison. Fuzzy Sets and Systems 110(2), 189–196 (2000)
Rifqi, M., Detyniecki, M., Bouchon-Meunier, B.: Discrimination power of measures of resemblance. In: De Baets, B., Kaynak, O., Bilgiç, T. (eds.) IFSA 2003. LNCS, vol. 2715, Springer, Heidelberg (2003)
Omhover, J.-F., Detyniecki, M., Rifqi, M., Bouchon-Meunier, B.: Image Retrieval using Fuzzy Similarity: measure equivalence based on invariance in ranking. In: Proceedings of the IEEE International Conference on Fuzzy Systems, Budapest, Hungary, pp. 1367–1372 (2004)
Omhover, J.-F., Rifqi, M., Detyniecki, M.: Ranking Invariance based on Similarity Measures in Document Retrieval. In: Detyniecki, M., Jose, J.M., Nürnberger, A., van Rijsbergen, C.J. (eds.) AMR 2005. LNCS, vol. 3877, pp. 55–64. Springer, Heidelberg (2006)
Zadeh, L.A.: A note on prototype theory and fuzzy sets. Cognition 12, 291–297 (1982)
Friedman, M., Ming, M., Kandel, A.: On the theory of typicality. Int. Journ. of Uncertainty, Fuzziness and Knowledge-based Systems 3(2), 127–142 (1995)
Kacprzyk, J., Yager, R.: Linguistic summaries of data using fuzzy logic. Int. Journ. of General Systems 30, 133–154 (2001)
Lesot, M.-J., Rifqi, M., Bouchon-Meunier, B.: Fuzzy prototypes: from a cognitive view to a machine learning principle. In: Bustince, H., Herrera, F., Montero, J. (eds.) Fuzzy Sets and Their Extensions: Representation, Aggregation and Models Studies. Springer, Heidelberg (2007)
Rifqi, M.: Constructing prototypes from large databases. In: Proc. International Conference IPMU 1996, Granada, pp. 301–306 (1996)
Lesot, M.-J., Mouillet, L., Bouchon-Meunier, B.: Fuzzy prototypes based on typicality degrees. In: Proc. of Fuzzy Days 04, Springer, Advances on Soft Computing, pp. 125–138. Dortmund, Germany (2006)
Lesot, M.-J.: Similarity, typicality and fuzzy prototypes for numerical data. In: Res-Systemica, 5 (Special issue on the 6th European Congress on Systems Science, Paris 2005) (2005)
Lesot, M.-J.: Typicality-based clustering. Int. Journal of Information Technology and Intelligent Computing 1(2), 279–292 (2006)
Bouchon-Meunier, B., Detyniecki, M., Lesot, M.-J., Marsala, C., Rifqi, M.: Real world fuzzy logic applications in data mining and information retrieval. In: Wang, P.P., Ruan, D., Kerre, E.E. (eds.) Fuzzy Logic - A Spectrum of Theoretical and Practical Issues, Studies in Fuzziness, pp. 219–247. Springer, Heidelberg (2007)
Rifqi, M., Bothorel, S., Bouchon-Meunier, B., Muller, S.: Similarity and prototype-based approach for classification of microcalcifications. Int. J. General Systems 29(4), 623–636 (2000)
Delavallade, T., Mouillet, L., Bouchon-Meunier, B., Collain, E.: Monitoring event flows and modelling scenarios for crisis prediction, application to ethnic conflicts forecasting. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems 15, 83–110 (2007)
Mouillet, L., Bouchon-Meunier, B., Collain, E.: Automated identification of political conflicts with a scenario recognition technique. In: 10th International Conference IPMU, Perugia, Italy, vol. 3, pp. 1609–1616 (2004)
Labroche, N., Lesot, M.-J., Yaffi, L.: A new web usage mining and visualization tool. In: IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Patras, Greece, pp. 321–328 (2007)
Omhover, J.-F., Detyniecki, M.: STRICT: an Image Retrieval Platform for Queries Based on Regional Content. In: Enser, P.G.B., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds.) CIVR 2004. LNCS, vol. 3115, pp. 473–482. Springer, Heidelberg (2004)
Kobayashi, I., Sugeno, M.: An approach to a dynamic system simulation based on human information processing. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 10(6), 611–633 (2002)
Detyniecki, M., Nürnberger, A.: Adaptive multimedia retrieval: from data to user interaction. In: Gabrys, B., Leiviska, K., Strackeljan, J. (eds.) Do Smart adaptive systems exist – Best practice for selection and combination of intelligent methods. Series on Studies on Fuzziness and Soft Computing, pp. 341–370. Springer, Heidelberg (2004)
Utgoff, P.E.: Incremental Induction of Decision Trees. In: Machine Learning, vol. 4, pp. 161–185 (1989)
Wang, T., Li, Z., Yan, Y., Chen, H.: An Incremental Fuzzy Decision Tree Classification Method for Mining Data Streams. In: Perner, P. (ed.) MLDM 2007. LNCS (LNAI), vol. 4571, pp. 91–103. Springer, Heidelberg (2007)
Prehn, H., Sommer, G.: An Adaptive Classification Algorithm Using Robust Incremental Clustering. In: 18th International Conference on Pattern Recognition, vol. 1, pp. 896–899 (2006)
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bouchon-Meunier, B., Rifqi, M., Lesot, MJ. (2008). Similarities in Fuzzy Data Mining: From a Cognitive View to Real-World Applications. In: Zurada, J.M., Yen, G.G., Wang, J. (eds) Computational Intelligence: Research Frontiers. WCCI 2008. Lecture Notes in Computer Science, vol 5050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68860-0_17
DOI: https://doi.org/10.1007/978-3-540-68860-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68858-7
Online ISBN: 978-3-540-68860-0
eBook Packages: Computer Science (R0)