Techniques for Estimating the Computation and Communication Costs of Distributed Data Mining

Krishnaswamy, Shonali; Zaslavsky, Arkady; Loke, Seng Wai

doi:10.1007/3-540-46043-8_61

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2329)

Included in the following conference series:

International Conference on Computational Science

Abstract

Distributed Data Mining (DDM) is the process of mining distributed and heterogeneous data sets. DDM is widely seen as a means of addressing the scalability issue of mining large data sets. Consequently, there is an emerging focus on optimisation of the DDM process. In this paper we present cost formulae for estimating the communication and computation time for different distributed data mining scenarios.

The work reported in this paper has been funded in part by the Cooperative Research Centre Program through the Department of Industry, Science and Tourism of the Commonwealth Government of Australia.

Author information

Authors and Affiliations

School of Network Computing, Monash University (Peninsula Campus), McMahons Rd, Frankston, Victoria, 3199, Australia
Shonali Krishnaswamy
School of Computer Science and Software Engineering, Monash University, 900 Dandenong Road, Caulfield East, Victoria, 3145, Australia
Arkady Zaslavsky
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, Victoria, 3001, Australia
Seng Wai Loke

Editor information

Editors and Affiliations

Faculty of Science, Section Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
Peter M. A. Sloot & Alfons G. Hoekstra
Western Science Center, SHARCNET, University of Western Ontario, London, Ontario, Canada, N6A 5B7
C. J. Kenneth Tan
Computer Science Department Innovative Computing Laboratory, University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN, 37996-3450, USA
Jack J. Dongarra

About this paper

Cite this paper

Krishnaswamy, S., Zaslavsky, A., Loke, S.W. (2002). Techniques for Estimating the Computation and Communication Costs of Distributed Data Mining. In: Sloot, P.M.A., Hoekstra, A.G., Tan, C.J.K., Dongarra, J.J. (eds) Computational Science — ICCS 2002. ICCS 2002. Lecture Notes in Computer Science, vol 2329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46043-8_61

DOI: https://doi.org/10.1007/3-540-46043-8_61
Published: 10 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43591-4
Online ISBN: 978-3-540-46043-5
eBook Packages: Springer Book Archive
