
Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 608)


Abstract

In this chapter, we consider instance selection as a focusing task in the data preparation phase of knowledge discovery and data mining. Focusing covers all issues related to data reduction. First, we define a broader perspective on focusing tasks, choose instance selection as one particular focusing task, and outline the specification of evaluation criteria to measure the success of instance selection approaches. Thereafter, we present a unifying framework that covers existing approaches to instance selection as instantiations. We describe a specific example instantiation of this framework and discuss its strengths and weaknesses. Then, we propose generic sampling, an enhanced framework for instance selection, and summarize evaluation results for several instantiations of its implementation. Finally, we conclude with open issues and research challenges for instance selection as well as for focusing in general.



Copyright information

© 2001 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Reinartz, T. (2001). A Unifying View on Instance Selection. In: Liu, H., Motoda, H. (eds) Instance Selection and Construction for Data Mining. The Springer International Series in Engineering and Computer Science, vol 608. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3359-4_3

  • DOI: https://doi.org/10.1007/978-1-4757-3359-4_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4861-8

  • Online ISBN: 978-1-4757-3359-4

  • eBook Packages: Springer Book Archive
