Abstract
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, aparallel version ofthis algorithm. We use the ‘shared-nothing’ architecture withmultiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
This work was performed while the author was still working at the Institute for Computer Science, University of Munich.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Agrawal, R. and Shafer, J.C. 1996. Parallel mining of association rules: design, implementation, and experience. IBM Research Report.
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. 1990. The R*-tree: an efficient and robust access method for points and rectangles. Proc. ACM SIGMOD Int. Conf. on Management of Data. Atlantic City, NJ, pp. 322–331.
Berchtold S., Keim D.A., and Kriegel, H.-P. 1996. The X-tree: an index structure for high-dimensional data. Proc. 22nd Int. Conf. on Very Large Data Bases, Bombay, India, Morgan Kaufmann, pp. 28–39.
Bially, T. 1969. Space-filling curves: their generation and their application to bandwidth reduction. IEEE Trans. on Information Theory, IT-15(6):658–664.
Cheung, D.W., Han, J., Ng, V.T., Fu, A.W., and Fu, Y. 1996. A fast distributed algorithm for mining association rules. Proc. Int. Conf. on Parallel and Distributed Information System (PDIS’96). Miami Beach, FL, USA.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. Portland, OR, pp. 226–231.
Ester, M., Kriegel, H.-P., and Xu, X. 1995. A database interface for clustering in large spatial databases. Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining. Montreal, Canada, 1995, pp. 94–99.
Faloutsos, C. and Roseman, S. 1989. Fractals for secondary key retrieval. Proc. 8th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), pp. 247–252.
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. 1996. Knowledge discovery anddatamining: towards a unifying framework. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. Portland, OR, pp. 82–88.
Geist, A., Beguelin, A., Dongama, J., Jiang, W., Manchek, R., and Sunderam, V. 1996. PVM: Parallel Virtual Machine — A User’s Guide and Tutorial for Networked Parallel Computing. Cambridge, MA, London, England: The MIT Press, 3rd printing.
Gueting, R.H. 1994. An introduction to spatial database systems. The VLDB Journal, 3(4):357–399.
Jaja, J. 1992. An Introduction to Parallel Algorithms. Addison-Wesley Publishing Company, pp. 61–65.
Kamel, I. and Faloutsos, C. 1993. On packing R-trees. Proc. 2nd Int. Conf. on Information and Knowledge Management (CIKM).
Li, X. and Fang, Z. 1989. Parallel clustering algorithms. Parallel Computing, 11:275–290.
Matheus, C.J., Chan, P.K., and Piatetsky-Shapiro, G. 1993. Systems for knowledge discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6):903–913.
Mehta, M. and DeWitt, D.J. 1997. Data placement in shared-nothing parallel database systems. VLDB Journal, 6:53–72.
Olson, C.F. 1995. Parallel algorithms for hierarchical clustering. Parallel Computing, 21(8):1313–1325.
Park, J.-S., Chen, M.-S., and Yu, P.S. 1995. An effective hash based algorithm formining association rules. Proc. ACM SIGMOD Int. Conf. on Management of Data. San Jose, CA, pp. 175–186.
Pfitzner, D.W., Salmon, J.K., and Sterling, T. 1998. Halo World: Tools forParallel Cluster Finding in Astrophysical N-body Simulations. Data Mining and Knowledge Discovery. Kluwer Academic Publishers, Vol. 2,No. 2, pp. 419–438.
Rasmussen, E.M. and Willett, P. 1989. Efficiency ofhierarchical agglomerative clustering using the ICL distributed array processor. Journal of Documentation, 45(1): 1–24.
Richards, A.J. 1983. Remote Sensing Digital Image Analysis. An Introduction, Berlin: Springer.
Sander, J., Ester, M., Kriegel, H.-P., and Xu, X. 1998. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Mining and Knowledge Discovery, Kluwer Academic Publishers, vol. 2, pp. 1–27.
Stonebraker, M. 1986. The case for shared nothing. Database Eng., 9(1).
Stonebraker, M., Frew, J., Gardels, K., and Meredith, J. 1993. The SEQUOIA 2000 storage benchmark. Proc. ACM SIGMOD Int. Conf. on Management of Data. Washington, DC, pp. 2–11.
Xu, X. 1999. Efficient Clustering for Knowledge Discovery in Spatial Databases. Shaker, Germany: Aachen.
Xu, X., Ester, M., Kriegel, H.-P., and Sander, J. 1998. A distribution-based clustering algorithm for mining in large spatial databases. 14th Int. Conf. on Data Engineering (ICDE’98). Orlando, FL.
Zhang, T., Ramakrishnan, R., and Livny, M. 1998. BIRCH: A New Data Clustering Algorithmand Its Applications, Kluwer Academic Publishers, pp. 1–40
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Kluwer Academic Publishers
About this chapter
Cite this chapter
Xu, X., Jäger, J., Kriegel, HP. (1999). A Fast Parallel Clustering Algorithm for Large Spatial Databases. In: Guo, Y., Grossman, R. (eds) High Performance Data Mining. Springer, Boston, MA. https://doi.org/10.1007/0-306-47011-X_3
Download citation
DOI: https://doi.org/10.1007/0-306-47011-X_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7745-0
Online ISBN: 978-0-306-47011-0
eBook Packages: Springer Book Archive