Many classification studies conclude with a summary table that presents the performance of various data-mining approaches on different datasets. No single method outperforms all others all the time. Furthermore, the performance of a classification method in terms of its false-positive and false-negative rates may be highly unpredictable: attempts to minimize one of these two rates may increase the other. If the model allows new data to be deemed unclassifiable when there is not adequate information to classify them, then these two error rates can be very low while, at the same time, the rate of unclassifiable new examples is very high. The root of this problem lies in the overfitting and overgeneralization behaviors of a given classification approach when it processes a particular dataset. Although this situation is of fundamental importance to data mining, it has not been studied from a comprehensive point of view. Thus, this chapter analyzes these issues in depth. It also proposes a new approach, called the Homogeneity-Based Algorithm (HBA), for optimally controlling the three error rates by formulating an optimization problem. The key development in this chapter is a special way of analyzing the space of the training data and partitioning it according to the data density of its different regions; the classification task is then pursued based on this partitioning of the training space. In this way, the three error rates can be controlled in a comprehensive manner. Preliminary computational results suggest that the proposed approach has significant potential to fill a critical gap in current data-mining methodologies.
Key words: classification, prediction, overfitting, overgeneralization, false-positive, false-negative, homogeneous set, homogeneity degree, optimization
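To make the trade-off described in the abstract concrete, the following minimal Python sketch (an illustration only, not the chapter's HBA implementation; the function, labels, and data are assumptions made for this example) computes the three rates for a binary classifier that is allowed to abstain on points it cannot classify:

# Minimal sketch (not from the chapter): the three error rates for a
# binary classifier that may abstain. All names here are illustrative.

def error_rates(y_true, y_pred):
    """y_true: 0/1 labels; y_pred: 0, 1, or None (None = unclassifiable)."""
    n = len(y_true)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    unclassified = sum(1 for p in y_pred if p is None)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
        "unclassifiable_rate": unclassified / n if n else 0.0,
    }

# Example: two abstentions keep FP/FN low at the cost of a 40%
# unclassifiable rate.
print(error_rates([0, 1, 1, 0, 1], [0, 1, None, 1, None]))

Widening the abstention region drives the false-positive and false-negative rates down but pushes the unclassifiable rate up; the optimization problem formulated in the chapter is aimed at balancing exactly this three-way trade-off.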
© 2008 Springer Science+Business Media, LLC
Pham, H.N.A., Triantaphyllou, E. (2008). The Impact of Overfitting and Overgeneralization on the Classification Accuracy in Data Mining. In: Maimon, O., Rokach, L. (eds) Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69935-6_16