Abstract
In this chapter, we consider instance selection as a focusing task in the data preparation phase of knowledge discovery and data mining. Focusing covers all issues related to data reduction. First, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of evaluation criteria to measure the success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches to instance selection as instantiations. We describe a specific example instantiation of this framework and discuss its strengths and weaknesses. Then, we propose an enhanced framework for instance selection, called generic sampling, and summarize evaluation results for several instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as for focusing in general.
Copyright information
© 2001 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Reinartz, T. (2001). A Unifying View on Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_3
DOI: https://doi.org/10.1007/978-1-4757-3359-4_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive