Abstract
This paper presents a methodology for classifying three-way dissimilarity data. The data are reconstructed by a small number of consensus classifications of the objects, each defined by a sum of two order-constrained distance matrices, so as to identify both a partition and an indexed hierarchy.
Specifically, the dissimilarity matrices are partitioned into homogeneous classes and, within each class, a partition and an indexed hierarchy are fitted simultaneously.
The proposed model is formalized as a constrained mixed-integer quadratic problem, fitted in the least-squares sense, and a computationally efficient alternating least-squares algorithm is proposed to fit it.
Two applications of the methodology are described, together with an extensive simulation study that investigates the performance of the algorithm.
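The alternating least-squares idea behind partitioning the dissimilarity matrices into homogeneous classes can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: here each class consensus is an unconstrained mean matrix, whereas the paper defines each consensus as a sum of two order-constrained distance matrices encoding a partition and an indexed hierarchy. The function name `als_partition_matrices`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def als_partition_matrices(D, n_classes, n_iter=50, seed=0):
    """Simplified ALS sketch (hypothetical helper, not the paper's method).

    D has shape (H, n, n): H dissimilarity matrices on the same n objects.
    Returns (labels, consensus, loss).
    """
    rng = np.random.default_rng(seed)
    H = D.shape[0]
    X = D.reshape(H, -1)  # one row per vectorized matrix
    # Random start in which every class receives at least one matrix.
    labels = rng.permutation(np.arange(H) % n_classes)
    for _ in range(n_iter):
        # Step 1: with memberships fixed, the unconstrained least-squares
        # consensus of a class is the mean of its matrices.
        # An empty class is reseeded with a randomly chosen matrix.
        consensus = np.stack([
            X[labels == g].mean(axis=0) if np.any(labels == g)
            else X[rng.integers(H)]
            for g in range(n_classes)
        ])
        # Step 2: with consensus matrices fixed, assign each matrix to the
        # class whose consensus is closest in squared Euclidean distance.
        dist = ((X[:, None, :] - consensus[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships are stable, so the loss cannot decrease
        labels = new_labels
    loss = float(((X - consensus[labels]) ** 2).sum())
    return labels, consensus.reshape(n_classes, *D.shape[1:]), loss

# Toy data (assumed): three noisy copies of each of two base matrices.
n = 4
base_a = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
base_b = np.array([[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]], float)
noise_rng = np.random.default_rng(1)
demo = []
for base in (base_a, base_a, base_a, base_b, base_b, base_b):
    e = noise_rng.normal(scale=0.05, size=(n, n))
    e = (e + e.T) / 2          # keep the matrix symmetric
    np.fill_diagonal(e, 0.0)   # keep a zero diagonal
    demo.append(base + e)
demo = np.stack(demo)
demo_labels, demo_consensus, demo_loss = als_partition_matrices(demo, n_classes=2)
```

Each step minimizes the same least-squares loss over one block of unknowns while the other is held fixed, so the loss never increases from one iteration to the next. A version closer to the paper would replace the class mean in Step 1 with a constrained fit of the two distance matrices.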
Cite this article
Vicari, D., Vichi, M. Structural Classification Analysis of Three-Way Dissimilarity Data. J Classif 26, 121–154 (2009). https://doi.org/10.1007/s00357-009-9033-0