Abstract
In this paper, we address the problem of curve clustering. We propose a nonparametric method that partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data grid, which is obtained using a Bayesian model selection approach that makes no assumptions about the curves. Finally, we propose a post-processing technique that reduces the number of clusters in order to improve the interpretability of the clustering. It consists of optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly, this measure is none other than the sum of the Kullback-Leibler divergences between cluster distributions before and after the merges. The practical interest of the approach for exploratory analysis of functional data is presented and compared with an alternative approach on an artificial and a real-world data set.
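The merging step described in the abstract can be illustrated with a minimal sketch. This is not the chapter's actual criterion, which is not reproduced on this page. It assumes that each cluster is summarized by a histogram of point counts over the grid cells, and that the cost of a merge is the sum of the Kullback-Leibler divergences from each cluster's distribution to the merged distribution, weighted by cluster size. The weighting and the names `kl_divergence`, `merge_cost`, and `agglomerate` are hypothetical choices made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts):
    """Turn a non-empty count histogram into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def merge_cost(a, b):
    """Assumed dissimilarity: size-weighted sum of the KL divergences
    between each cluster's distribution and the merged distribution."""
    merged = [x + y for x, y in zip(a, b)]
    p_merged = normalize(merged)
    return (sum(a) * kl_divergence(normalize(a), p_merged)
            + sum(b) * kl_divergence(normalize(b), p_merged))


def agglomerate(clusters, target):
    """Greedily merge the cheapest pair until `target` clusters remain.

    `clusters` is a list of count histograms over the same grid cells.
    Returns the remaining histograms and the cost of each merge, in order.
    """
    clusters = [list(c) for c in clusters]
    history = []
    while len(clusters) > target:
        # Exhaustive search over all pairs for the cheapest merge.
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = [x + y for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(cost)
    return clusters, history


if __name__ == "__main__":
    # Three toy clusters over three grid cells; the first two are similar.
    toy = [[10, 0, 0], [9, 1, 0], [0, 0, 10]]
    remaining, costs = agglomerate(toy, target=2)
    print(remaining, costs)
```

In this toy example the two clusters concentrated on the first cell are merged first, because merging them changes their distributions least. The sequence of merge costs plays the role of the heights in a dendrogram, so a user could cut the hierarchy at whatever number of clusters is easiest to interpret.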
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Boullé, M., Guigourès, R., Rossi, F. (2014). Nonparametric Hierarchical Clustering of Functional Data. In: Guillet, F., Pinaud, B., Venturini, G., Zighed, D. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 527. Springer, Cham. https://doi.org/10.1007/978-3-319-02999-3_2
DOI: https://doi.org/10.1007/978-3-319-02999-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02998-6
Online ISBN: 978-3-319-02999-3
eBook Packages: Engineering, Engineering (R0)