Fitting semiparametric clustering models to dissimilarity data

Vichi, Maurizio

doi:10.1007/s11634-008-0025-4

Fitting semiparametric clustering models to dissimilarity data

Regular Article
Published: 18 September 2008

Volume 2, pages 121–161, (2008)
Cite this article

Download PDF

Access provided by CONRICYT-eBooks

Advances in Data Analysis and Classification Aims and scope Submit manuscript

Fitting semiparametric clustering models to dissimilarity data

Download PDF

Maurizio Vichi¹

141 Accesses
9 Citations
Explore all metrics

Abstract

The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.

Avoid common mistakes on your manuscript.

References

Asuncion A, Newman DJ (2007) UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html
Ball GH, Hall DJ (1967) A clustering technique for summarizing multivariate data. Behav Sci 12: 153–5
Article Google Scholar
Bandelt HJ (1990) Recognition of the tree metrics. SIAM J Discrete Math 3(1): 1–6
Article MATH MathSciNet Google Scholar
Bock HH (1974) Automatische klassifikation. studia mathematica. Vandenhoek und Ruprecht, Göttingen
Google Scholar
Bock HH et al (1998) Probabilistic aspects in Classification. In: Hayashi (eds) Data science, classification and related methods. Springer, Heidelberg, pp 3–21
Google Scholar
Carroll JD, Pruzansky S (1975) Fitting of hierarchical tree structure (HTS) models, mixtures of HTS models, and hybrid models, via mathematical programming and alternating least squares. Paper presented at U. S.-Japan Seminar on Theory, Methods and Applications of Multidimensional Scaling and Related Techniques, San Diego, August 20–24
Carroll JD, Pruzansky S (1980) Discrete and hybrid scaling models. In: Lantermann ED, Feger (eds) Similarity and choice. Huber, Bern, pp 108–139
Cattell RB (1966) The scree test for the number of factors. Multivar Behav Res 1: 245–276
Article Google Scholar
Chandon JL, Lemaire J, Pouget J (1980) Construction de 1’ultramétrique la plus proche d’une dissimilarité an sens des moindres carrés. Recherche Operationelle Operations Research 14: 157–70
MATH MathSciNet Google Scholar
Cormack RM (1971) A review of classification (with discussion). J R Stat Soc A 134: 321–67
Article MathSciNet Google Scholar
Critchley F, Fichet B (1994) The partial order by inclusion of the principal classes of dissimilarity on finite set, and some of the basic properties. In: Cutsem B (eds) Classification and dissimilarity analysis, Lecture Notes in Statistics. Springer, Berlin, pp 5–65
Google Scholar
Dahlhaus E (1993) Fast parallel recognition of ultrametrics and tree metrics. SIAM J Discrete Math 6(4): 523–532
Article MATH MathSciNet Google Scholar
De Soete G (1984) A least squares algorithm for fitting an ultrametric tree to a dissimilarity matrix. Pattern Recognit Lett 2: 133–7
Article Google Scholar
Fisher L, Van Ness JW (1971) Admissible clustering procedures. Biometrika 58: 91–104
Article MATH MathSciNet Google Scholar
Fraley C, Raftery AE (2002) Model based clustering, discriminant analysis and density estimation. J Am Stat Assoc 97: 611–631
Article MATH MathSciNet Google Scholar
Gordon AD (1987) Parsimonious trees. J Classif 4: 85–101
Article MATH Google Scholar
Gordon AD (1999) Classification: methods for the exploratory analysis of multivariate data. Chapman and Hall, London
Google Scholar
Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325–38
MATH MathSciNet Google Scholar
Grötschel M, Wakabayashi Y (1989) A cutting plane algorithm for a clustering problem. Math Program 45: 59–96
Article MATH Google Scholar
Hansen P, Jaumard B, Sanlaville E (1994) Partitioning problems in cluster analysis: a review of mathematical programming approaches. In: Diday E, Lechevallier Y, Schader M, BertrandP Burtschy B (eds) New approaches in classification and data analysis. Springer, Berlin, pp 228–40
Google Scholar
Hartigan JA (1967) Representation of similarity matrices by trees. J Am Stat Assoc 62: 1140–58
Article MathSciNet Google Scholar
Hartigan JA (1975) Clustering algorithms. Wiley, New York
MATH Google Scholar
Hennig C (2007) Cluster-wise assessment of cluster stability. Comput Stat Data Anal 52: 258–271
Article MATH MathSciNet Google Scholar
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2: 193–218
Article Google Scholar
Hubert L, Arabie P, Meulman J (1997) Hierarchical clustering and the construction of (optimal) ultrametrics using in lecture Notes, Monograph Series, vol 31. Institute of Mathematical Statistics, Hayward, CA, pp 457–72
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs, NJ
MATH Google Scholar
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Google Scholar
Krivánek M, Morávek J (1986) NP-hard problems in hierarchical-tree clustering. Acta Inform 23: 311–323
Article MATH MathSciNet Google Scholar
Keller JB (1962) Factorization of matrices by least squares. Biometrika 49: 239–242
MATH MathSciNet Google Scholar
Marcotorchino F, Michaud P (1982) Agregation de similarites en classification automatique. Revue de Statistique Appliquée 30(2): 21–44
MATH MathSciNet Google Scholar
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Le Cam LM, Neyman J (eds) Proceedings of the fifth berkeley symposium on mathematical statistics and probability, vol 1. Statistics. University of California Press, Berkeley, pp 281–297
Google Scholar
Mathar R (1985) The best Euclidean fit to a given distance matrix in prescribed dimensions. Linear Algebra Appl 67: 1–6
Article MATH MathSciNet Google Scholar
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
MATH Google Scholar
Oh M-S, Raftery AE (2007) Model-based clustering with dissimilarities: a bayesian approach. J Comput Graph Stat. A Technical Report is available on http://www.stat.washington.edu/raftery/Research/bayes.html (to appear)
Powell MJD et al (1983) Variable metric methods for constrained optimization. In: Bachem A (eds) Mathematical programming: the state of the art. Springer, Berlin, pp 288–311
Google Scholar
Régnier S (1965) Sur quelques aspects mathématiques des problemes de classification automatique. International Computation Centre Bulletin 4, 175–91. Reprinted in Mathématiques et Sciences Humaines, 82, 13–29 (1982)
Rubin J (1967) Optimal classification into groups:an approach to solve the taxonomy problem. J Theor Biol 15: 103–144
Article Google Scholar
Sriram N (1990) Clique optimization: a method to construct parsimonious ultrametric trees from similarity data. J Classif 7: 33–52
Article MATH MathSciNet Google Scholar
Torgerson WS (1958) Theory and methods of scaling. Wiley, New York
Google Scholar
Vicari D, Vichi M (2000) Non-hierarchical classification structures. In: Gaul W, Opitz O, Schader M (eds) Data analysis, studies in classification data analysis and knowledge organization. Springer, Berlin, pp 51–66
Google Scholar
Vichi M (1993) Un algoritmo dei minimi quadrati per interpolare un insieme di classificazioni gerarchiche con una classificazione consenso, Metron, vol 51, 3–4, 139–163
Vichi M (1994) Un algoritmo per il consenso tra classificazioni gerarchiche con l’ausilio di tecniche multiway. Proc Italian Stat Soc 37: 261–268
Google Scholar
Vichi M (1996) Computational complexity of one mode classification, In: Prat A (ed) Proceedings of COMPSTAT vol 96. Physica-Verlag
Zangwill WI (1969) Nonlinear programming: a unified approach. Prentice-Hall, Englewood Cliffs, NJ
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Statistics, Probability and Applied Statistics, “Sapienza”, University of Rome, Rome, Italy
Maurizio Vichi

Authors

Maurizio Vichi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Maurizio Vichi.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Vichi, M. Fitting semiparametric clustering models to dissimilarity data. Adv Data Anal Classif 2, 121–161 (2008). https://doi.org/10.1007/s11634-008-0025-4

Download citation

Received: 28 November 2007
Revised: 07 June 2008
Accepted: 31 July 2008
Published: 18 September 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s11634-008-0025-4

Keywords

Mathematics Subject Classification (2000)

62h30

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Fitting semiparametric clustering models to dissimilarity data

Abstract

Article PDF

Similar content being viewed by others

On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling

Unsupervised Parameter Estimation of Non Linear Scaling for Improved Classification in the Dissimilarity Space

Hierarchical Means Clustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification (2000)

Navigation

Fitting semiparametric clustering models to dissimilarity data

Abstract

Article PDF

Similar content being viewed by others

On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling

Unsupervised Parameter Estimation of Non Linear Scaling for Improved Classification in the Dissimilarity Space

Hierarchical Means Clustering

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation