Abstract
Although Hartigan (1975) had already put forward the idea of linking the identification of subpopulations to regions of high density of the underlying probability distribution, the actual development of cluster analysis methods has largely moved in other directions, for reasons of computational convenience. Current computational resources allow us to reconsider this formulation and to develop clustering techniques aimed directly at identifying the local modes of the density. Given a set of observations, a nonparametric estimate of the underlying density function is constructed, and subsets of high-density points are formed through suitable manipulation of the associated Delaunay triangulation. The method is illustrated with some numerical examples.
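The abstract only outlines the procedure. The Python sketch below illustrates the general idea under simplifying assumptions of our own: two-dimensional data, a Gaussian kernel with a single user-chosen bandwidth `h`, and one density threshold set as a fraction `frac` of the largest estimated density. The paper's own choices of kernel, bandwidth and threshold, its treatment of several thresholds, and its allocation of the remaining low-density points are not reproduced. The triangulation is computed with a plain Bowyer-Watson algorithm that does not handle degenerate (collinear or cocircular) inputs. Points whose estimated density falls below the threshold are removed from the Delaunay graph, and the connected components of what remains are reported as groups. As an extra safeguard that the abstract does not mention, an edge is also dropped when the density at its midpoint is below the threshold. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def kde_2d(points, h):
    """Return a Gaussian kernel density estimate f(x) with a common bandwidth h (assumed)."""
    n = len(points)
    norm = n * 2.0 * math.pi * h * h  # normalising constant of a 2-D Gaussian kernel

    def f(x):
        return sum(
            math.exp(-((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) / (2.0 * h * h))
            for p in points
        ) / norm

    return f


def circumcircle(a, b, c):
    """Centre and squared radius of the circle through a, b, c (assumed non-collinear)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Edges of the Delaunay triangulation via Bowyer-Watson (no degeneracy handling)."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dmax = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    mx, my = (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0
    # Append the vertices of a large "super-triangle" that encloses every point.
    pts = list(points) + [(mx - 20 * dmax, my - dmax), (mx, my + 20 * dmax),
                          (mx + 20 * dmax, my - dmax)]
    tris = [(n, n + 1, n + 2)]
    for i in range(n):
        px, py = pts[i]
        # Triangles whose circumcircle strictly contains the new point are invalid.
        bad = []
        for t in tris:
            ux, uy, r2 = circumcircle(pts[t[0]], pts[t[1]], pts[t[2]])
            if (px - ux) ** 2 + (py - uy) ** 2 < r2:
                bad.append(t)
        # Edges used by exactly one bad triangle bound the cavity left after removal.
        count = {}
        for t in bad:
            for e in combinations(sorted(t), 2):
                count[e] = count.get(e, 0) + 1
        bad_set = set(bad)
        tris = [t for t in tris if t not in bad_set]
        # Re-triangulate the cavity by joining each boundary edge to the new point.
        tris += [(a, b, i) for (a, b), k in count.items() if k == 1]
    # Keep only edges joining two original sample points.
    return {e for t in tris for e in combinations(sorted(t), 2) if e[1] < n}


def density_clusters(points, h, frac):
    """Connected groups of high-density points in the Delaunay graph (hypothetical helper)."""
    f = kde_2d(points, h)
    dens = [f(p) for p in points]
    level = frac * max(dens)  # threshold as a fraction of the largest estimated density
    # Union-find structure over the high-density points only.
    parent = {i: i for i, d in enumerate(dens) if d >= level}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in delaunay_edges(points):
        if a in parent and b in parent:
            mid = ((points[a][0] + points[b][0]) / 2.0, (points[a][1] + points[b][1]) / 2.0)
            if f(mid) >= level:  # extra safeguard: do not bridge a low-density gap
                parent[find(a)] = find(b)

    groups = {}
    for i in sorted(parent):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    rng = random.Random(1)
    blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(40)]
    blob_b = [(rng.gauss(10.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(40)]
    clusters = density_clusters(blob_a + blob_b, h=1.0, frac=0.1)
    print([len(c) for c in clusters])
```

With two tight, well-separated groups of simulated points, the sketch should return two groups, one per generating blob. Using the Delaunay graph rather than, say, a fixed-radius neighbourhood graph means that adjacency adapts to the local spacing of the data without an extra distance parameter.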
References
Aitchison J. 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London.
Ankerst M., Breunig M.M., Kriegel H.P., and Sander J. 1999. OPTICS: ordering points to identify the clustering structure. In: International Conference on Management of Data (SIGMOD’99), ACM, pp. 49–60.
Barber C.B., Dobkin D.P., and Huhdanpaa H. 1996. The Quickhull algorithm for convex hulls. ACM Trans. Math. Software 22: 469–483.
Bowman A. and Foster P. 1993. Density based exploration of bivariate data. Statistics and Computing 3: 171–177.
Bowman A.W. and Azzalini A. 1997. Applied Smoothing Techniques for Data Analysis. Clarendon Press, Oxford.
Cuevas A., Febrero M., and Fraiman R. 2000. Estimating the number of clusters. Canad. J. Stat. 28: 367–382.
Cuevas A., Febrero M., and Fraiman R. 2001. Cluster analysis: a further approach based on density estimation. Computational Statistics & Data Analysis 36: 441–459.
Devroye L.P. and Wagner T.J. 1980. The strong uniform consistency of kernel density estimates. In: Multivariate Analysis, North-Holland, Vol. 5, pp. 59–77.
Ester M., Kriegel H.P., Sander J., and Xu X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA. AAAI Press, pp. 226–231.
Forina M., Armanino C., Lanteri S., and Tiscornia E. 1983. Classification of olive oils from their fatty acid composition. In: H. Martens and H. J. Russwurm (Eds.), Food Research and Data Analysis, Applied Science Publishers: London, pp. 189–214.
Hartigan J.A. 1975. Clustering Algorithms. J. Wiley & Sons, New York.
Hubert L. and Arabie P. 1985. Comparing partitions. Journal of Classification 2: 193–218.
Nadaraya É.A. 1965. On non-parametric estimates of density functions and regression curves. Theory of Probability and Its Applications (transl. of Teorija Verojatnostei i ee Primenenija) 10: 186–190.
Okabe A., Boots B.N., and Sugihara K. 1992. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. J. Wiley & Sons, New York.
R Development Core Team 2004. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Rosolin T., Azzalini A., and Torelli N. 2003. Detecting clusters via nonparametric density estimation. In: Convegno SIS analisi statistica multivariata per le scienze economico-sociali, le scienze naturali e la tecnologia, Napoli, Italy. Società Italiana di Statistica, RCE edizioni.
Stuetzle W. 2003. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification 20: 25–47.
Wong M.A. and Lane T. 1983. The kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society, Series B 45: 362–368.
Cite this article
Azzalini, A., Torelli, N. Clustering via nonparametric density estimation. Stat Comput 17, 71–80 (2007). https://doi.org/10.1007/s11222-006-9010-y