Abstract
The logical analysis of data (LAD) is a combinatorics-, optimization-, and logic-based methodology for the analysis of datasets with binary or numerical input variables and binary outcomes. Previous studies have established that LAD is a competitive classification tool, comparable in efficiency with the top classification techniques available. The goal of this paper is to show that the methodology of LAD can also be useful in the discovery of new classes of observations and in the analysis of attributes. After a brief description of the main concepts of LAD, two efficient combinatorial algorithms are described: one for the generation of all prime patterns (rules) and one for the generation of all spanned patterns satisfying certain conditions. It is shown that applying classic clustering techniques to the set of observations represented in prime pattern space leads to the identification of a subclass of (say, positive) observations that is accurately recognizable and sharply distinct from the observations in the opposite (say, negative) class. It is also shown that the set of all spanned patterns allows the introduction of a measure of significance and of a concept of monotonicity in the set of attributes.
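To make the notion of "observations represented in prime pattern space" concrete, the following minimal sketch may help. It assumes the usual LAD definitions, which the abstract does not spell out: a positive pattern is a conjunction of literals that covers at least one positive and no negative observation, and it is prime if dropping any single literal makes it cover a negative observation. The brute-force enumeration is only an illustration and is not the efficient algorithm described in the paper; all function names and the toy data are hypothetical.

```python
from itertools import combinations, product


def covers(pattern, obs):
    """True if the observation satisfies every literal of the pattern.

    A pattern is a dict {attribute index: required binary value}.
    """
    return all(obs[i] == v for i, v in pattern.items())


def is_positive_pattern(pattern, positives, negatives):
    """Assumed definition: covers some positive and no negative observation."""
    return (any(covers(pattern, p) for p in positives)
            and not any(covers(pattern, n) for n in negatives))


def is_prime(pattern, positives, negatives):
    """Assumed definition: removing any one literal lets a negative in."""
    if not is_positive_pattern(pattern, positives, negatives):
        return False
    for i in pattern:
        reduced = {j: v for j, v in pattern.items() if j != i}
        if not any(covers(reduced, n) for n in negatives):
            return False  # the shorter term is still a pattern
    return True


def prime_patterns(positives, negatives, max_degree):
    """Brute-force enumeration of prime positive patterns up to max_degree."""
    n = len(positives[0])
    found = []
    for d in range(1, max_degree + 1):
        for idxs in combinations(range(n), d):
            for vals in product((0, 1), repeat=d):
                pat = dict(zip(idxs, vals))
                if is_prime(pat, positives, negatives):
                    found.append(pat)
    return found


def pattern_space(observations, patterns):
    """Map each observation to a 0/1 vector: which patterns cover it."""
    return [[int(covers(p, o)) for p in patterns] for o in observations]


# Toy binary dataset (hypothetical, for illustration only).
positives = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
negatives = [(0, 1, 1), (0, 0, 0)]

patterns = prime_patterns(positives, negatives, max_degree=2)
positive_vectors = pattern_space(positives, patterns)
```

The rows of `positive_vectors` are the coordinates on which a classic clustering method (for example, one based on Hamming distance) could then be run; the choice of clustering method is left open here.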
Cite this article
Alexe, G., Alexe, S. & Hammer, P. Pattern-based clustering and attribute analysis. Soft Comput 10, 442–452 (2006). https://doi.org/10.1007/s00500-005-0505-9
DOI: https://doi.org/10.1007/s00500-005-0505-9