Abstract
Clustering is a data mining technique for discovering interesting distributions and patterns in the underlying data, with applications in fields such as statistical data analysis, pattern recognition, and image processing. We combine sampling with the DBSCAN algorithm to cluster large spatial databases and develop two sampling-based DBSCAN (SDBSCAN) algorithms: one introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
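As a rough illustration of the "sampling outside DBSCAN" idea, the sketch below draws a random sample, runs a plain DBSCAN on the sample only, and then labels every point of the full data set by its nearest clustered sample point. This is not the paper's SDBSCAN procedure, which the abstract does not spell out. The function names, the nearest-neighbour assignment rule, and the use of uniform random sampling are all assumptions made for illustration, and in practice `min_pts` would likely need to be scaled down to match the reduced sample density.

```python
import math
import random

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Hypothetical 'sampling outside DBSCAN' sketch (assumed, not the paper's).

    1. Draw a uniform random sample of the data.
    2. Cluster only the sample with DBSCAN.
    3. Give each original point the label of its nearest clustered sample
       point within eps, or NOISE if there is none.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(points))
    sample = [points[i] for i in rng.sample(range(len(points)), k)]
    sample_labels = dbscan(sample, eps, min_pts)

    labels = []
    for p in points:
        best, best_d = NOISE, eps
        for q, lab in zip(sample, sample_labels):
            d = math.dist(p, q)
            if lab != NOISE and d <= best_d:
                best, best_d = lab, d
        labels.append(best)
    return labels
```

The expensive neighbourhood queries inside DBSCAN then run over the sample rather than the full database, which is where a sampling-based variant would be expected to save time; the price is that sparse regions of a true cluster may be missed if the sample is too small.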
Additional information
Foundation item: Supported by the Open Researches Fund Program of LIESMARS (WKL (00)0302)
Biography: Guan Ji-hong (1969-), female, Associate professor, research direction: distributed GIS, spatial database. E-mail: jhguan@wtusm.edu.cn
About this article
Cite this article
Ji-hong, G., Shui-geng, Z., Fu-ling, B. et al. Scaling up the DBSCAN algorithm for clustering large spatial databases based on sampling technique. Wuhan Univ. J. of Nat. Sci. 6, 467–473 (2001). https://doi.org/10.1007/BF03160286
DOI: https://doi.org/10.1007/BF03160286