Abstract
Although machine learning algorithms have been successfully applied to many problems, their practical adoption is hindered by the fact that they often cannot produce reliable and unbiased assessments of the quality of their predictions. In the last few years, several approaches for estimating the reliability or confidence of individual classifications have emerged, many of them building upon the algorithmic theory of randomness: in historical order, transduction-based confidence estimation, typicalness-based confidence estimation, and transductive reliability estimation. Unfortunately, each has weaknesses: either it is tightly bound to a particular learning algorithm, or the interpretation of its reliability estimates is not always consistent with statistical confidence levels. In this paper we describe the typicalness and transductive reliability estimation frameworks and propose a joint approach that compensates for the above-mentioned weaknesses by integrating typicalness-based confidence estimation and transductive reliability estimation into a joint confidence machine. The resulting confidence machine produces confidence values in the statistical sense. We perform a series of tests with several different machine learning algorithms in several problem domains. We compare our results with those of a proprietary method as well as with kernel density estimation. We show that the proposed method performs as well as proprietary methods and significantly outperforms density estimation methods.
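The typicalness framework mentioned above assigns each candidate label a p-value: the new example is tentatively labelled, added to the training set, and the fraction of examples that are at least as "strange" (nonconforming) as the new one becomes its typicalness. The following is a minimal illustrative sketch of that idea, not the paper's implementation: it uses a simple 1-nearest-neighbour distance ratio as the nonconformity measure, and all function names and the toy data are assumptions made for this example.

```python
import math

def dist(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def alpha(i, examples):
    """Nonconformity of example i within `examples`: distance to its
    nearest same-labelled example divided by distance to its nearest
    differently-labelled example (larger = stranger)."""
    xi, yi = examples[i]
    same = min((dist(xi, x) for j, (x, y) in enumerate(examples)
                if j != i and y == yi), default=math.inf)
    other = min((dist(xi, x) for j, (x, y) in enumerate(examples)
                 if j != i and y != yi), default=math.inf)
    return same / other if other > 0 else math.inf

def p_value(train, x_new, y_candidate):
    """Typicalness (conformal p-value) of labelling x_new as y_candidate:
    the fraction of examples in the augmented set whose nonconformity is
    at least that of the new example."""
    aug = train + [(x_new, y_candidate)]
    scores = [alpha(i, aug) for i in range(len(aug))]
    a_new = scores[-1]
    return sum(1 for a in scores if a >= a_new) / len(aug)

# Toy two-class data (illustrative only)
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]

print(p_value(train, (0.05, 0.1), "a"))  # high typicalness: plausible label
print(p_value(train, (0.05, 0.1), "b"))  # low typicalness: implausible label
```

Under exchangeability, such p-values are valid in the statistical sense (the probability that the true label receives a p-value below ε is at most ε), which is the property the proposed joint confidence machine preserves while adding algorithm-independent reliability estimation.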
Additional information
Matjaž Kukar is currently Assistant Professor at the Faculty of Computer and Information Science, University of Ljubljana. His research interests include machine learning, data mining and intelligent data analysis, ROC analysis, cost-sensitive learning, reliability estimation, and latent structure analysis, as well as applications of data mining to medical and business problems.
Kukar, M. Quality assessment of individual classifications in machine learning and data mining. Knowl Inf Syst 9, 364–384 (2006). https://doi.org/10.1007/s10115-005-0203-z