Abstract
The additive biclustering model for two-way two-mode object by variable data implies overlapping clusterings of both the objects and the variables, together with a weight for each bicluster (i.e., each pair of an object cluster and a variable cluster). In data analysis, an additive biclustering model is fitted to given data by minimizing a least squares loss function. To this end, two alternating least squares (ALS) algorithms may be used: (1) PENCLUS, and (2) Baier’s ALS approach. However, both algorithms suffer from inherent limitations that may hamper their performance. As a way out, this paper presents a new ALS algorithm based on theoretical results regarding the optimal design of ALS algorithms. In a simulation study, this algorithm is shown to outperform the existing ALS approaches.
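To make the least squares criterion concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the model can be written as X ≈ A W Bᵀ, with a binary object membership matrix A, a binary variable membership matrix B, and a bicluster weight matrix W. The names `biclustering_loss` and `update_weights`, and the toy matrices, are hypothetical and chosen only for illustration. The sketch shows the loss and one conditional update of the kind an ALS scheme alternates over: the least squares optimal W for fixed A and B.

```python
import numpy as np

def biclustering_loss(X, A, W, B):
    """Least squares loss ||X - A W B^T||^2 (assumed matrix form of the model)."""
    residual = X - A @ W @ B.T
    return float(np.sum(residual ** 2))

def update_weights(X, A, B):
    """Least squares optimal W for fixed memberships A and B.

    Uses the Moore-Penrose pseudoinverse: W = A^+ X (B^+)^T.
    """
    return np.linalg.pinv(A) @ X @ np.linalg.pinv(B).T

# Toy data (hypothetical): 4 objects, 3 variables, 2 clusters per mode.
# Rows may belong to several clusters (overlap) or to none.
A = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]], dtype=float)
B = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
W_true = np.array([[2.0, -1.0],
                   [0.5, 3.0]])

# Noise-free data generated from the assumed model.
X = A @ W_true @ B.T

W_hat = update_weights(X, A, B)
loss = biclustering_loss(X, A, W_hat, B)
```

A full ALS procedure would alternate this weight update with updates of A and B. Those membership updates are combinatorial, and they are where the algorithms compared in the paper differ, so they are deliberately left out here.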
References
BAIER, D., GAUL, W., and SCHADER, M. (1997), “Two-Mode Overlapping Clustering with Applications to Simultaneous Benefit Segmentation and Market Structuring,” in Classification and Knowledge Organization, eds. R. Klar, and K. Opitz, Berlin, Germany: Springer, pp. 557–566.
BOTH, M., and GAUL, W. (1987), “Ein Vergleich Zweimodaler Clusteranalyseverfahren,” Methods of Operations Research, 57, 593–605.
BOTH, M., and GAUL, W. (1985), “PENCLUS: Penalty Clustering for Marketing Applications,” Discussion Paper No. 82, Institution of Decision Theory and Operations Research, University of Karlsruhe.
CEULEMANS, E., and KIERS, H.A.L. (2006), “Selecting Among Three-Mode Principal Component Models of Different Types and Complexities: A Numerical Convex Hull Based Method,” British Journal of Mathematical and Statistical Psychology, 59, 133–150.
CHATURVEDI, A., and CARROLL, J.D. (1994), “An Alternating Combinatorial Optimization Approach to Fitting the INDCLUS and Generalized INDCLUS Models,” Journal of Classification, 11, 155–170.
COLLINS, L.M., and DENT, C.W. (1988), “Omega: A General Formulation of the Rand Index of Cluster Recovery Suitable for Non-Disjoint Solutions,” Multivariate Behavioral Research, 23, 231–242.
DE LEEUW, J. (1994), “Block-Relaxation Algorithms in Statistics,” in Information Systems and Data Analysis, eds. H.-H. Bock, W. Lenski, and M.M. Richter, Berlin: Springer-Verlag, pp. 308–325.
DE SARBO, W.S. (1982), “Gennclus: New Models for General Nonhierarchical Clustering Analysis,” Psychometrika, 47, 449–475.
ECKES, T., and ORLIK, P. (1993), “An Error Variance Approach to Two-Mode Hierarchical Clustering,” Journal of Classification, 10, 51–74.
FAITH, J.J., HAYETE, B., JOSHUA, T., THADEN, J.T., MONGO, I.M., WIERZBOWSKI, J., COTTAREL, G., KASIF, S., COLLINS, J.J., and GARDNER, T.S. (2007), “Large-Scale Mapping and Validation of Escherichia Coli Transcriptional Regulation from a Compendium of Expression Profiles,” PLoS Biology, 5(1), 54–66.
GARA, M., ROSENBERG, S., and GOLDBERG, L. (1992), “DSM-IIIR as a Taxonomy: A Cluster Analysis of Diagnoses and Symptoms,” Journal of Nervous and Mental Disease, 180, 11–19.
GASCH, A.P., SPELLMAN, P.T., KAO, C.M., CARMEL-HAREL, O., EISEN, M.B., STORZ, G., BOTSTEIN, D., and BROWN, P.O. (2000), “Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes,” Molecular Biology of the Cell, 11, 4241–4257.
GAUL, W., and SCHADER, M. (1996), “A New Algorithm for Two-Mode Clustering,” in Data Analysis and Information Systems: Statistical and Computational Approaches, eds. H.-H. Bock, and W. Polasek, Berlin, Germany: Springer, pp. 15–23.
GREENACRE, M.J. (1988), “Clustering the Rows and Columns of a Contingency Table,” Journal of Classification, 5, 39–51.
HAND, D., and KRZANOWSKI, W. (2005), “Optimizing K-means Clustering Results with Standard Software Packages,” Computational Statistics and Data Analysis, 49, 969–973.
HARTIGAN, J.A. (1976), “Modal Blocks in Dentition of West Coast Mammals,” Systematic Zoology, 25, 149–160.
LAZZERONI, L., and OWEN, A. (2002), “Plaid Models for Gene Expression Data,” Statistica Sinica, 12, 61–86.
MADEIRA, S.C., and OLIVEIRA, A.L. (2004), “Biclustering Algorithms for Biological Data Analysis: A Survey,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1, 24–45.
MEZZICH, J.E., and SOLOMON, H. (1980), Taxonomy and Behavioral Science: Comparative Performance of Grouping Methods, London: Academic Press.
MIRKIN, B., ARABIE, P., and HUBERT, L.J. (1995), “Additive Two-Mode Clustering: The Error-Variance Approach Revisited?,” Journal of Classification, 12, 243–263.
SCHEPERS, J., CEULEMANS, E., and VAN MECHELEN, I. (2008), “Selecting Among Multi-Mode Partitioning Models of Different Complexities: A Comparison of Four Model Selection Criteria,” Journal of Classification, 25, 67–85.
SCHEPERS, J., and VAN MECHELEN, I. (2011), “A Two-Mode Clustering Method to Capture the Nature of the Dominant Interaction Pattern in Large Profile Data Matrices,” Psychological Methods, 16, 361–371.
SEGAL, E., SHAPIRA, M., REGEV, A., PE’ER, D., BOTSTEIN, D., KOLLER, D., and FRIEDMAN, N. (2003), “Module Networks: Identifying Regulatory Modules and Their Condition-Specific Regulators from Gene Expression Data,” Nature Genetics, 34, 166–176.
SPELLMAN, P.T., SHERLOCK, G., ZHANG, M.Q., IYER, V.R., ANDERS, K., EISEN, M.B., BROWN, P.O., BOTSTEIN, D., and FUTCHER, B. (1998), “Comprehensive Identification of Cell Cycle-Regulated Genes of the Yeast Saccharomyces Cerevisiae by Microarray Hybridization,” Molecular Biology of the Cell, 9, 3273–3297.
STEINLEY, D., and BRUSCO, M.J. (2007), “Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques,” Journal of Classification, 24, 99–121.
TURNER, H., BAILEY, T., and KRZANOWSKI, W. (2005), “Improved Biclustering of Microarray Data Demonstrated Through Systematic Performance Tests,” Computational Statistics and Data Analysis, 48, 235–254.
VAN MECHELEN, I., BOCK, H.-H., and DE BOECK, P. (2004), “Two-Mode Clustering Methods: A Structured Overview,” Statistical Methods in Medical Research, 13, 363–394.
VAN MECHELEN, I., and DE BOECK, P. (1989), “Implicit Taxonomy in Psychiatric Diagnosis: A Case Study,” Journal of Social and Clinical Psychology, 8, 276–287.
VAN MECHELEN, I., and DE BOECK, P. (1990), “Projection of a Binary Criterion into a Model of Hierarchical Classes,” Psychometrika, 55, 677–694.
WILDERJANS, T. F., CEULEMANS, E., and VAN MECHELEN, I. (2008), “The CHIC Model: A Global Model for Coupled Binary Data,” Psychometrika, 73, 729–751.
WILDERJANS, T. F., CEULEMANS, E., and VAN MECHELEN, I. (2012), “The SIMCLAS Model: Simultaneous Analysis of Coupled Binary Data Matrices with Noise Heterogeneity Between and Within Data Blocks,” Psychometrika, 77, 724–740.
WILDERJANS, T. F., CEULEMANS, E., VAN MECHELEN, I., and DEPRIL, D. (2011), “ADPROCLUS: A Graphical User Interface for Fitting Additive Profile Clustering Models to Object by Variable Data Matrices,” Behavior Research Methods, 43, 56–65.
WILDERJANS, T. F., DEPRIL, D., and VAN MECHELEN, I. (2012), “Block-Relaxation Approaches for Fitting the INDCLUS Model,” Journal of Classification, 29, 277–296.
Cite this article
Wilderjans, T.F., Depril, D. & Van Mechelen, I. Additive Biclustering: A Comparison of One New and Two Existing ALS Algorithms. J Classif 30, 56–74 (2013). https://doi.org/10.1007/s00357-013-9120-0
DOI: https://doi.org/10.1007/s00357-013-9120-0