Abstract
While real-world data often comes in a mixed format of discrete and continuous attributes, many supervised induction algorithms require discrete data. Efficient discretization of continuous attributes is an important problem, since it affects the speed, accuracy, and understandability of the induced models. In this paper, we propose a new discretization method, MODL, founded on a Bayesian approach. We introduce a space of discretization models and a prior distribution defined on this model space, which yields a Bayes optimal evaluation criterion for discretizations. We then propose a new super-linear optimization algorithm that finds near-optimal discretizations. Extensive comparative experiments on both real and synthetic data demonstrate the high inductive performance of the new discretization method.
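To give a concrete feel for a criterion-driven search over discretization models, here is a minimal sketch. It is not the paper's algorithm (the paper's optimization is super-linear, while this sketch is a naive greedy bottom-up merge), and the cost function follows the three-level prior commonly associated with MODL-style criteria; the exact prior is defined in the paper, and all function names here are hypothetical.

```python
import math

def log_binom(n, k):
    # Log of the binomial coefficient C(n, k), via log-gamma for numerical stability.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def modl_style_cost(intervals, n_classes):
    # intervals: list of per-interval class-count vectors, e.g. [[3, 1], [0, 4]].
    # Lower cost = better model (prior cost of the discretization + likelihood cost of the data).
    n = sum(sum(iv) for iv in intervals)
    n_intervals = len(intervals)
    cost = math.log(n)                                   # choice of the number of intervals
    cost += log_binom(n + n_intervals - 1, n_intervals - 1)  # choice of the interval bounds
    for iv in intervals:
        ni = sum(iv)
        # choice of the class distribution inside the interval
        cost += log_binom(ni + n_classes - 1, n_classes - 1)
        # likelihood: log of the multinomial coefficient ni! / (ni1! ... niJ!)
        cost += math.lgamma(ni + 1) - sum(math.lgamma(c + 1) for c in iv)
    return cost

def greedy_merge(intervals, n_classes):
    # Bottom-up heuristic: repeatedly merge the adjacent pair that lowers the cost most.
    intervals = [list(iv) for iv in intervals]
    best = modl_style_cost(intervals, n_classes)
    improved = True
    while improved and len(intervals) > 1:
        improved = False
        best_i, best_cost = None, best
        for i in range(len(intervals) - 1):
            merged = (intervals[:i]
                      + [[a + b for a, b in zip(intervals[i], intervals[i + 1])]]
                      + intervals[i + 2:])
            c = modl_style_cost(merged, n_classes)
            if c < best_cost:
                best_i, best_cost = i, c
        if best_i is not None:
            intervals[best_i] = [a + b for a, b in zip(intervals[best_i], intervals[best_i + 1])]
            del intervals[best_i + 1]
            best = best_cost
            improved = True
    return intervals, best
```

On class-pure adjacent intervals such as [[5, 0], [0, 5]] the criterion keeps the split, while uninformative intervals such as [[1, 1], [1, 1], [1, 1], [1, 1]] collapse into a single interval, illustrating the built-in resistance to noise that a Bayes-style prior provides.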
Editor: Tom Fawcett
French patent No. 04 00179.
Cite this article
Boullé, M. MODL: A Bayes optimal discretization method for continuous attributes. Mach Learn 65, 131–165 (2006). https://doi.org/10.1007/s10994-006-8364-x