Abstract
The paper describes a context-sensitive discretization algorithm that can be used to completely discretize a numeric or mixed numeric-categorical dataset. The algorithm combines aspects of unsupervised (class-blind) and supervised methods. It was designed for the problem of finding association rules or functional dependencies in complex, partly numerical data. The paper describes the algorithm and presents systematic experiments with a synthetic dataset that contains a number of rather complex associations. Experiments with varying degrees of noise and “fuzziness” demonstrate the robustness of the method. An application to a large real-world dataset produced interesting preliminary results, which are currently the topic of specialized investigations.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ludl, MC., Widmer, G. (2000). Relative Unsupervised Discretization for Association Rule Mining. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_15
DOI: https://doi.org/10.1007/3-540-45372-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive