Scaling up the DBSCAN algorithm for clustering large spatial databases based on sampling technique

Ji-hong, Guan; Shui-geng, Zhou; Fu-ling, Bian; Yan-xiang, He

doi:10.1007/BF03160286


Geographic Information System
Published: March 2001

Volume 6, pages 467–473, (2001)

Wuhan University Journal of Natural Sciences


Guan Ji-hong¹, Zhou Shui-geng², Bian Fu-ling³ & He Yan-xiang¹


Abstract

Clustering is a useful data mining technique for discovering interesting data distributions and patterns in the underlying data, with applications in fields such as statistical data analysis, pattern recognition, and image processing. We combine sampling with the DBSCAN algorithm to cluster large spatial databases and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm applies sampling inside DBSCAN; the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
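The abstract does not spell out either algorithm, so the following is only a minimal sketch of one plausible reading of "sampling outside DBSCAN": draw a random sample, run plain DBSCAN on the sample, then give each unsampled point the label of its nearest sampled core point if that point lies within `eps`. The function names, the nearest-core labeling rule, and the brute-force neighbor search are all assumptions made for illustration, not the authors' SDBSCAN procedure.

```python
import math
import random

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags) for the given points."""
    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        core[i] = True
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core[j] = True
                seeds.extend(j_neighbors)
        cluster += 1
    return labels, core


def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Assumed 'sampling outside DBSCAN' scheme: cluster a sample, label the rest."""
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    sample = [points[i] for i in sample_idx]
    s_labels, s_core = dbscan(sample, eps, min_pts)

    labels = [NOISE] * len(points)
    for k, i in enumerate(sample_idx):
        labels[i] = s_labels[k]

    # Sampled core points act as representatives of their clusters.
    cores = [(sample[k], s_labels[k]) for k in range(len(sample)) if s_core[k]]
    sampled = set(sample_idx)
    for i, p in enumerate(points):
        if i in sampled:
            continue
        best = min(cores, key=lambda c: math.dist(p, c[0]), default=None)
        if best is not None and math.dist(p, best[0]) <= eps:
            labels[i] = best[1]
    return labels
```

Because DBSCAN runs on the sample only, `min_pts` here is a density threshold for the sample; in practice it would probably need to be scaled down roughly in proportion to the sampling rate.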



Author information

Authors and Affiliations

School of Computer, Wuhan University, 430072, Wuhan, China
Guan Ji-hong & He Yan-xiang
State Key Laboratory of Software Engineering, Wuhan University, 430072, Wuhan, China
Zhou Shui-geng
College of Remote Sensing and Information Engineering, Wuhan University, 430072, Wuhan, China
Bian Fu-ling


Additional information

Foundation item: Supported by the Open Researches Fund Program of LIESMARS (WKL (00)0302)

Biography: Guan Ji-hong (1969-), female, Associate Professor; research interests: distributed GIS, spatial databases. E-mail: jhguan@wtusm.edu.cn


About this article

Cite this article

Ji-hong, G., Shui-geng, Z., Fu-ling, B. et al. Scaling up the DBSCAN algorithm for clustering large spatial databases based on sampling technique. Wuhan Univ. J. of Nat. Sci. 6, 467–473 (2001). https://doi.org/10.1007/BF03160286


Received: 20 December 2000
Issue Date: March 2001
DOI: https://doi.org/10.1007/BF03160286

Key words

CLC number

TP 311.13
