Abstract
Distributed Data Mining (DDM) is the process of mining distributed and heterogeneous datasets. DDM is widely seen as a means of addressing the scalability issue of mining large datasets. Consequently, there is an emerging focus on optimising the DDM process. In this paper, we present cost formulae for estimating the communication and computation time for different distributed data mining scenarios.
The work reported in this paper has been funded in part by the Cooperative Research Centre Program through the Department of Industry, Science and Tourism of the Commonwealth Government of Australia.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krishnaswamy, S., Zaslavsky, A., Loke, S.W. (2002). Techniques for Estimating the Computation and Communication Costs of Distributed Data Mining. In: Sloot, P.M.A., Hoekstra, A.G., Tan, C.J.K., Dongarra, J.J. (eds) Computational Science — ICCS 2002. ICCS 2002. Lecture Notes in Computer Science, vol 2329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46043-8_61
DOI: https://doi.org/10.1007/3-540-46043-8_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43591-4
Online ISBN: 978-3-540-46043-5
eBook Packages: Springer Book Archive