A Unifying View on Instance Selection

Reinartz, Thomas

doi:10.1007/978-1-4757-3359-4_3

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)

Abstract

In this chapter, we consider instance selection as a focusing task in the data preparation phase of knowledge discovery and data mining. Focusing covers all issues related to data reduction. First, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of evaluation criteria to measure the success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches for instance selection as instantiations. We describe a specific example instantiation of this framework and discuss its strengths and weaknesses. Then, we propose an enhanced framework for instance selection, called generic sampling, and summarize evaluation results for several instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as for focusing in general.

Author information

Authors and Affiliations

Research & Technology, FT3/AD, DaimlerChrysler AG, P.O. Box 2360, 89013, Ulm, Germany
Thomas Reinartz

Editor information

Editors and Affiliations

Arizona State University, USA
Huan Liu
Osaka University, Japan
Hiroshi Motoda

About this chapter

Cite this chapter

Reinartz, T. (2001). A Unifying View on Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_3

DOI: https://doi.org/10.1007/978-1-4757-3359-4_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4861-8
Online ISBN: 978-1-4757-3359-4
eBook Packages: Springer Book Archive
