Abstract
Machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models. Therefore, effectively dealing with noise is a key aspect of supervised learning if reliable models are to be obtained from data. Although several authors have studied the effect of noise on particular learners, comparisons of its effect across different learners are lacking. In this paper, we address this issue by systematically comparing how different degrees of noise affect four supervised learners that belong to different paradigms. Specifically, we consider the Naïve Bayes probabilistic classifier, the C4.5 decision tree, the IBk instance-based learner and the SMO support vector machine. We have selected these four methods because they allow us to contrast different learning paradigms, and because they are considered to be four of the top ten algorithms in data mining (Yu et al. 2007). We test them on a collection of data sets that are perturbed with noise in the input attributes and noise in the output class. As an initial hypothesis, we assign the techniques to two groups, NB with C4.5 and IBk with SMO, based on their expected sensitivity to noise, the first group being the least sensitive. The analysis enables us to extract key observations about the effect of different types and degrees of noise on these learning techniques. In general, we find that Naïve Bayes appears to be the most robust algorithm and SMO the least robust, relative to the other two techniques. However, the underlying empirical behavior of the techniques is more complex, and varies depending on the noise type and the specific data set being processed. In general, noise in the training data set is found to give the learners the most difficulty.
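The experimental setup the abstract describes — perturbing data sets with class noise or attribute noise at controlled rates and comparing learner accuracy — can be sketched as follows. This is a minimal illustration, not the paper's method: the paper used the WEKA implementations of NB, C4.5, IBk and SMO, whereas this sketch assumes scikit-learn analogues (GaussianNB, DecisionTreeClassifier, KNeighborsClassifier, SVC), an illustrative data set (iris), and a simple uniform noise-injection scheme.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def add_class_noise(y, rate, rng):
    """Flip each label, with probability `rate`, to a different random class."""
    y = y.copy()
    classes = np.unique(y)
    for i in np.where(rng.random(len(y)) < rate)[0]:
        y[i] = rng.choice(classes[classes != y[i]])
    return y

def add_attribute_noise(X, rate, rng):
    """Replace each attribute value, with probability `rate`, by a uniform
    random value drawn from that attribute's observed range."""
    X = X.copy()
    lo, hi = X.min(axis=0), X.max(axis=0)
    mask = rng.random(X.shape) < rate
    X[mask] = (lo + rng.random(X.shape) * (hi - lo))[mask]
    return X

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
learners = {            # scikit-learn stand-ins for the four WEKA learners
    "NB": GaussianNB(),
    "C4.5-like": DecisionTreeClassifier(random_state=0),
    "IBk (k=3)": KNeighborsClassifier(n_neighbors=3),
    "SMO-like": SVC(kernel="linear"),
}
for rate in (0.0, 0.1, 0.3):
    y_noisy = add_class_noise(y, rate, rng)
    for name, clf in learners.items():
        acc = cross_val_score(clf, X, y_noisy, cv=10).mean()
        print(f"class noise {rate:.0%}  {name:10s} accuracy {acc:.3f}")
```

Swapping `add_class_noise` for `add_attribute_noise` (applied to `X`) gives the attribute-noise condition; in either case the table of accuracies per learner and noise level is the raw material for the kind of comparison the paper performs.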
References
Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6: 37–66
Aha DW (1992) Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. Int J Man–Mach Stud 36: 267–287
Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2(4): 343–370
Asuncion A, Newman DJ (2007) UCI repository of machine learning databases. Available by anonymous ftp to ics.uci.edu in the pub/machine-learning-databases directory. University of California
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32: 675–701
Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11: 86–92
Fürnkranz J (1997) Noise-tolerant windowing. In: Proceedings of the 15th international joint conference on artificial intelligence (IJCAI-97), Nagoya, Japan. Morgan Kaufmann, pp 852–857
Goldman SA, Sloan RH (1995) Can PAC learning algorithms tolerate random attribute noise? Algorithmica 14(1): 70–84 (Springer, New York)
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1): 10–18
Hunt EB, Martin J, Stone P (1966) Experiments in induction. Academic Press, New York
John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Eleventh conference on uncertainty in artificial intelligence, San Mateo, pp 338–345
Kearns M (1998) Efficient noise-tolerant learning from statistical queries. J ACM 45(6): 983–1006
Meeson S, Blott BH, Killingback ALT (1996) EIT data noise evaluation in the clinical environment. Physiol Meas 17: A33–A38
Nelson R (2005) Overcoming noise in data-acquisition systems (WEBCAST). Test Meas World. http://www.tmworld.com/article/319648-Overcomming_noise_in_data_acquisition_systems.php
Nettleton D, Torra V (2001) A comparison of active set method and genetic algorithm approaches for learning weighting vectors in some aggregation operators. Int J Intell Syst 16(9): 1069–1083
Nettleton D, Muñiz J (2001) Processing and representation of meta-data for sleep apnea diagnosis with an artificial intelligence approach. Int J Med Inform 63(1–2): 77–89
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods—support vector learning, Chap 12. MIT Press, pp 169–185
Quinlan JR (1986) Induction of decision trees. Mach Learn 1: 81–106 (Kluwer Academic Publishers)
Quinlan JR (1993) C4.5 programs for machine learning. Morgan Kaufmann, San Mateo
Sloan R (1988) Types of noise in data for concept learning. In: Proceedings of the first annual workshop on computational learning theory (COLT 1988). ACM SIGART, pp 91–96
Sloan RH (1995) Four types of noise in data for PAC learning. Inform Process Lett 54(3): 157–162
Torra V (1997) The weighted OWA operator. Int J Intell Syst 12(2): 153–166
Vapnik VN (1995) The nature of statistical learning theory. Springer Verlag, New York
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Yu S, Zhou ZH, Steinbach M, Hand DJ, Steinberg D (2007) Top 10 algorithms in data mining. Knowl Inform Syst 14(1): 1–37
Zhu X, Wu X, Chen Q (2003) Eliminating class noise in large datasets. In: Proceedings of the 20th ICML international conference on machine learning, Washington, DC, pp 920–927
Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study of their impacts. Artif Intell Rev 22: 177–210 (Kluwer Academic Publishers)
Cite this article
Nettleton, D.F., Orriols-Puig, A. & Fornells, A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33, 275–306 (2010). https://doi.org/10.1007/s10462-010-9156-z