Abstract
When selecting the most appropriate algorithm to apply to a new data set, data analysts often follow an approach akin to test-driving cars before deciding which one to buy: they apply the algorithms to a sample of the data to quickly obtain rough estimates of their performance. These estimates are then used to select one or a few of the algorithms to be tried on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach that builds on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data, used as predictors of their performance on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms: RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking algorithms than traditional data characterization measures.
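The core idea of the abstract — estimate each candidate algorithm's performance on a small subsample, then rank the candidates by those estimates — can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' implementation: the function names, the holdout split, and the toy stand-in learners below are assumptions made for the example.

```python
import random

def sampling_based_landmarks(data, algorithms, sample_frac=0.1, seed=0):
    """Estimate each algorithm's accuracy on a small random subsample.

    data:       list of (features, label) pairs
    algorithms: dict mapping a name to a callable(train, test) -> accuracy
    Returns a dict of landmark estimates, one per algorithm.
    """
    rng = random.Random(seed)
    n = max(4, int(len(data) * sample_frac))
    sample = rng.sample(data, n)
    # Split the subsample into a tiny train/test holdout.
    train, test = sample[: n // 2], sample[n // 2:]
    return {name: algo(train, test) for name, algo in algorithms.items()}

def relative_landmarks(landmarks):
    """Turn raw landmark estimates into a predicted ranking (best first)."""
    return sorted(landmarks, key=landmarks.get, reverse=True)

# Two toy "algorithms" standing in for real learners: one predicts the
# majority class of the training split, the other the minority class.
def majority_class(train, test):
    labels = [y for _, y in train]
    pred = max(set(labels), key=labels.count)
    return sum(y == pred for _, y in test) / len(test)

def minority_class(train, test):
    labels = [y for _, y in train]
    pred = min(set(labels), key=labels.count)
    return sum(y == pred for _, y in test) / len(test)

data = [((i,), int(i % 3 == 0)) for i in range(200)]
marks = sampling_based_landmarks(
    data, {"majority": majority_class, "minority": minority_class})
ranking = relative_landmarks(marks)
```

The ranking, rather than the absolute accuracy estimates, is what the paper's relative landmarks exploit: on a small sample the absolute numbers are noisy, but the ordering of algorithms is often preserved on the full data set.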
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Soares, C., Petrak, J., Brazdil, P. (2001). Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms before Choosing. In: Brazdil, P., Jorge, A. (eds) Progress in Artificial Intelligence. EPIA 2001. Lecture Notes in Computer Science, vol 2258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45329-6_12
Print ISBN: 978-3-540-43030-8
Online ISBN: 978-3-540-45329-1