1 Introduction

Co-clustering, also known as biclustering or block clustering, involves the simultaneous clustering of the set of observations and the set of features of a data matrix. By permuting rows and columns, co-clustering algorithms aim to reorganize the initial data matrix into homogeneous blocks. These blocks, also called co-clusters, can therefore be defined as submatrices characterized by a subset of observations and a subset of features whose elements are similar. Other types of co-clustering approaches can be found in [11] and [12]. Co-clustering algorithms present several advantages: they reduce the initial matrix to a simpler form with the same basic structure and require far less computation than separate processing of the two sets; see for instance [8]. As a result, these methods are of great interest in data mining.

In this work, we focus on co-clustering methods that seek a block diagonal structure, i.e. methods in which the number of row clusters is equal to the number of column clusters. An illustration is given in Fig. 1, where (a) represents an original binary matrix, (b) represents the same matrix after a proper permutation of rows, and (c) adds a permutation of columns, resulting in a clear block diagonal structure. These methods have proven efficient for the document-word co-clustering problem, whose objective is to group documents based on the words they contain and to group words based on the documents in which they appear. The dataset is typically represented by a \(document\times word\) matrix. In [10], the author proposed a block diagonal algorithm for binary data, which alternates the clustering of observations and features while minimizing the error between the original data matrix and the matrix reconstructed from the cluster structure. In [3], the author proposed a spectral solution: a bipartite graph is built from the \(document\times word\) matrix and partitioned so as to minimize the cut objective function.

In this paper we propose a new diagonal co-clustering algorithm based on the minimization of a heterogeneity measure of blocks. This measure takes into account both the within-block variance and a quantity named the centroid effect [5], defined as the squared deviation between the mean entry of each block and the maximum entry of the input matrix. The proposed algorithm, in addition to being very efficient for co-clustering sparse data, is also faster than most state-of-the-art algorithms and can therefore handle high-dimensional data.

Fig. 1. Original binary data (a), the same data reorganized according to rows (b), and reorganized according to rows and columns (c).

The remainder of this paper is organized as follows. Section 2 provides the necessary background on the Double Kmeans (DKM) algorithm and presents the challenge of diagonal co-clustering. Section 3 presents the Diagonal Double Kmeans (DDKM) algorithm that we propose. Section 4 is devoted to numerical experiments on real datasets showing the appropriateness of our contribution for binary and continuous data. The final section sums up the study and gives recommendations for further research.

Notation. Let \(\mathbf {X}:=\{x_{ij};i \in I, j \in J\}\) be a data matrix of size \(n \times p\) where \(I=\{1,\ldots ,n\}\) and \(J=\{1,\ldots ,p\}\). The set I corresponds to the set of n objects and the set J to the set of p attributes. In the sequel, our aim consists in obtaining a co-clustering of \(\mathbf {X}\). Let \(\mathbf {Z}=\{z_{1},\ldots ,z_{n}\}\), where \(z_{i} \in \{1,\ldots ,K\}\), be a label vector denoting the partition of I into K clusters, and \(\mathbf {W}=\{w_{1},\ldots ,w_{p}\}\), where \(w_{j} \in \{1,\ldots ,H\}\), a label vector denoting the partition of J into H clusters. The partition of I (respectively J) can also be represented by a binary matrix whose rows belong to \(\{0,1\}^K\) (respectively \(\{0,1\}^H\)) and satisfy \(\sum _{k=1}^K z_{ik}=1\) (respectively \(\sum _{h=1}^H w_{jh}=1\)). Finally, to simplify the notation, the sums relating to rows, columns, row clusters and column clusters will be subscripted respectively by the letters \(i=1,\ldots ,n\), \(j=1,\ldots ,p\), \(k=1,\ldots ,K\) and \(h=1,\ldots ,H\), without indicating the implicit limits of variation. For example, the sum \(\sum _{i,k}\) stands for \(\sum _{i=1}^n\sum _{k=1}^K\).
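To make the two representations of a partition concrete, the following short sketch (Python with NumPy, purely illustrative; the label values are hypothetical) converts label vectors into the equivalent binary indicator matrices.

```python
import numpy as np

def labels_to_indicator(labels, n_clusters):
    """Turn a label vector with values in 1..n_clusters into a binary indicator
    matrix whose rows sum to 1, matching the constraint sum_k z_ik = 1."""
    labels = np.asarray(labels)
    indicator = np.zeros((len(labels), n_clusters), dtype=int)
    indicator[np.arange(len(labels)), labels - 1] = 1
    return indicator

# Hypothetical labels: n = 5 objects in K = 2 clusters, p = 4 features in H = 3 clusters.
Z = labels_to_indicator([1, 2, 1, 1, 2], n_clusters=2)   # shape (5, 2)
W = labels_to_indicator([3, 1, 2, 3], n_clusters=3)      # shape (4, 3)
print(Z.sum(axis=1), W.sum(axis=1))                      # every row sums to 1
```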

2 Co-clustering and Diagonal Block Structure

Co-clustering can be formulated as the search for a good approximation of the original data matrix \(\mathbf {X}\). The quality of the approximation is determined by the approximation error, which can be measured by a large class of loss functions such as the squared Euclidean distance. The approximation is generally obtained through an alternating least squares minimization process (see for instance [1, 7]). The Double Kmeans algorithm [16] is based on this principle.

2.1 Double Kmeans Algorithm

Formally, the aim is to minimize an objective function \(J(\mathbf {Z},\mathbf {W},\mathbf {G})\) where \(\mathbf {Z}\) and \(\mathbf {W}\) are the partitions and \(\mathbf {G}:=\{g_{kh}; k \in \{1,\ldots ,K\}, h \in \{1,\ldots ,H\}\}\) is a \(K \times H\) matrix which can be viewed as a summary of the data matrix \(\mathbf {X}\) (see Fig. 2).

Fig. 2. Original data matrix \(\mathbf {X}\) and its summary after co-clustering into 6 co-clusters.

Each element \(g_{kh}\) of \(\mathbf {G}\) is called the prototype of co-cluster \(\mathbf {X}_{kh}:=\{x_{ij}; z_{ik}w_{jh}=1\}\). Double Kmeans (DKM) adopts the squared Euclidean distance to measure the dissimilarity between the matrix \(\mathbf {X}\) and the structure described by \(\mathbf {Z}\), \(\mathbf {W}\) and \(\mathbf {G}\). Therefore, \(J(\mathbf {Z},\mathbf {W},\mathbf {G})\) is given by

$$\begin{aligned} J(\mathbf {Z},\mathbf {W},\mathbf {G})= \sum _{i,k,j,h}z_{ik}\, w_{jh}(x_{ij} - g_{kh})^{2}=||\mathbf {X}-\mathbf {Z}\mathbf {G}\mathbf {W}^T||^2, \end{aligned}$$
(1)

where ||.|| denotes the Frobenius norm. It is easy to see that for fixed \((\mathbf {Z},\mathbf {W})\) the optimal values of \(\mathbf {G}\) are the means of the \(\mathbf {X}_{kh}\)'s. The optimal partitions \(\mathbf {Z}\) and \(\mathbf {W}\) are found using an iterative algorithm. A version of Double Kmeans is presented in Algorithm 1, where \(z_{.k}\) (resp. \(w_{.h}\)) denotes the cardinality of the k\(^{th}\) row cluster (resp. h\(^{th}\) column cluster).

Algorithm 1. Double Kmeans (DKM).
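As a complement to Algorithm 1, the following NumPy sketch illustrates the alternating DKM updates. It is illustrative only: it assumes dense data and random initialisation, and it omits the safeguards against empty clusters that a full implementation would need.

```python
import numpy as np

def double_kmeans(X, K, H, n_iter=20, seed=0):
    """Minimal Double Kmeans sketch: alternately update G, Z and W
    to decrease ||X - Z G W^T||^2 (Eq. 1)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = rng.integers(K, size=n)            # row labels in 0..K-1
    w = rng.integers(H, size=p)            # column labels in 0..H-1
    for _ in range(n_iter):
        Z, W = np.eye(K)[z], np.eye(H)[w]  # indicator matrices (n, K) and (p, H)
        counts = np.outer(Z.sum(0), W.sum(0))
        G = (Z.T @ X @ W) / np.maximum(counts, 1)   # block means: optimum of Eq. 1 for fixed Z, W
        # Assign each row to the closest row of the reconstruction G W^T.
        z = np.argmin(((X[:, None, :] - (G @ W.T)[None, :, :]) ** 2).sum(2), axis=1)
        Z = np.eye(K)[z]
        counts = np.outer(Z.sum(0), W.sum(0))
        G = (Z.T @ X @ W) / np.maximum(counts, 1)
        # Assign each column to the closest column of Z G.
        w = np.argmin(((X.T[:, None, :] - (Z @ G).T[None, :, :]) ** 2).sum(2), axis=1)
    return z, w, G
```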

2.2 Block Diagonal Structure

The DKM algorithm is not well suited when one looks for a one-to-one correspondence between the two partitions \(\mathbf {Z}\) and \(\mathbf {W}\). In order to deal with this specific case, we first consider \(w_{jk}\) instead of \(w_{jh}\); in other words, we assume that \(H=K\). Secondly, the diagonal structure involves imposing constraints on \(\mathbf {G}\), for instance by taking \(g_{kk}=\delta \) \(\forall k\). This leads us to consider the following criterion:

$$\begin{aligned} J(\mathbf {X}, \mathbf {Z},\mathbf {W})=\sum _{i,j,k}z_{ik}w_{jk}(x_{ij}-\delta )^2, \end{aligned}$$
(2)

where \(\delta \) is assumed to be known. The choice of this parameter will be discussed in the next section. The couple of partitions \((\mathbf {Z},\mathbf {W})\) optimizing the criterion given in Eq. 2 is found using the following iterative algorithm:

  1.

    Update \(\mathbf {Z}\) the partition of objects, with \(\mathbf {W}\) fixed. This leads to the following formula

    $$\begin{aligned} z_{i}=\mathop {{{\mathrm{arg\,min}}}}\limits _{k}\sum _{j}w_{jk}(x_{ij}-\delta )^2, \end{aligned}$$
  2.

    Update \(\mathbf {W}\), the partition of features, with \(\mathbf {Z}\) fixed. This leads to the following formula

    $$\begin{aligned} w_{j}=\mathop {{{\mathrm{arg\,min}}}}\limits _{k}\sum _{i}z_{ik}(x_{ij}-\delta )^2. \end{aligned}$$

From these formulae, one observes that seeking a diagonal structure of blocks indirectly introduces a strong dependency between object assignments (respectively feature assignments) to a block and the number of features (respectively objects) belonging to this block. If we consider object assignments, we have \((x_{ij}-\delta )^2\ge 0, \forall i,j\); therefore, the larger the number of features in a block, the lower the chance that an object is assigned to this particular block. The same phenomenon occurs for feature assignments. This leads us to take the size of each co-cluster into account in order to avoid empty blocks, as illustrated by the sketch below.
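The following toy computation (hypothetical values, for illustration only) shows this bias: with the unnormalized rules derived from Eq. 2, a large block whose entries all lie close to \(\delta \) can still cost more than a small, poorly fitting block, so assignments drift towards small blocks.

```python
import numpy as np

delta = 1.0
small_block = np.full(5, 0.2)     # 5 features, entries far from delta
large_block = np.full(50, 0.7)    # 50 features, entries close to delta

# Unnormalized assignment costs, as in the update rules above:
cost_small = np.sum((small_block - delta) ** 2)   # 5  * 0.64 = 3.2
cost_large = np.sum((large_block - delta) ** 2)   # 50 * 0.09 = 4.5
print(cost_small, cost_large)     # the object is (wrongly) drawn to the small block

# Dividing by the block size, as done in the next section, removes the bias:
print(cost_small / 5, cost_large / 50)            # 0.64 vs 0.09
```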

3 Diagonal Double Kmeans

3.1 Criterion and Proposed Algorithm

In order to correct the bias introduced by the diagonal structure and to prevent some blocks from vanishing, we propose a modified criterion that takes the number of elements in each block into account. This criterion takes the following form:

$$\begin{aligned} J(\mathbf {X}, \mathbf {Z},\mathbf {W})=\sum _{k}\frac{1}{z_{.k}w_{.k}}\sum _{i,j}z_{ik}w_{jk}(x_{ij}-\delta )^2. \end{aligned}$$
(3)

where \(z_{.k}\) and \(w_{.k}\) denote respectively the number of objects and the number of features in the k-th block. Furthermore, it is interesting to note that the criterion given in Eq. 3 can be expressed in terms of the variance of a given block \((\mathbf {Z}_k,\mathbf {W}_k)\) and the squared deviation of its mean from \(\delta \) (the centroid effect [5] when \(\delta \) is the maximum entry of the data):

$$\begin{aligned} J(\mathbf {X}, \mathbf {Z},\mathbf {W})&= \sum _{i,j,k}\frac{z_{ik}w_{jk}}{z_{.k}w_{.k}}(x_{ij}-\overline{x}_{k})^2 + \sum _{i,j,k}\frac{z_{ik}w_{jk}}{z_{.k}w_{.k}}(\overline{x}_{k}-\delta )^2 \nonumber \\&= \sum _{k} \overline{s}_{k}^2+ \sum _{k}(\overline{x}_{k}-\delta )^2. \end{aligned}$$
(4)

where

$$\overline{x}_{k}=\frac{1}{z_{.k}w_{.k}}\sum _{i,j}z_{ik}w_{jk}x_{ij} \text{ and } \overline{s}_{k}^2=\frac{1}{z_{.k}w_{.k}}\sum _{i,j}z_{ik}w_{jk}(x_{ij}-\overline{x}_{k})^2$$

denote respectively the mean and the variance of the k-th block. The first term of Eq. 4 ensures the homogeneity of each block, while the second one draws the block centers towards \(\delta \). The objective function (Eq. 3) is optimized by alternately minimizing two conditional criteria, given \(\mathbf {W}\) and given \(\mathbf {Z}\) respectively:

$$\begin{aligned} \tilde{J}_1(\mathbf {X}, \mathbf {Z}|\mathbf {W})=\sum _{k} \frac{1}{z_{.k}}\sum _{i}z_{ik}\frac{1}{w_{.k}}\sum _{j}w_{jk}(x_{ij}-\delta )^2 \end{aligned}$$

and

$$\begin{aligned} \tilde{J}_2(\mathbf {X}, \mathbf {W}|\mathbf {Z})=\sum _{k}\frac{1}{w_{.k}}\sum _{j}w_{jk}\frac{1}{z_{.k}}\sum _{i}z_{ik}(x_{ij}-\delta )^2. \end{aligned}$$

The optimization of \(\tilde{J}_1\) and \(\tilde{J}_2\) leads to the following update rules:

$$\begin{aligned} z_{i}=\mathop {{{\mathrm{arg\,min}}}}\limits _{k}\frac{1}{w_{.k}}\sum _{j}w_{jk}(x_{ij}-\delta )^2, \end{aligned}$$
(5)
$$\begin{aligned} w_{j}=\mathop {{{\mathrm{arg\,min}}}}\limits _{k}\frac{1}{z_{.k}}\sum _{i}z_{ik}(x_{ij}-\delta )^2. \end{aligned}$$
(6)
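As a sanity check, the short sketch below (random data and arbitrary partitions, illustrative only) verifies numerically that the criterion of Eq. 3 coincides with its decomposition into within-block variance and centroid effect given in Eq. 4.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, K = 30, 20, 3
X = rng.random((n, p))
z = rng.integers(K, size=n)                  # arbitrary row partition
w = rng.integers(K, size=p)                  # arbitrary column partition
delta = X.max()

J_eq3, J_eq4 = 0.0, 0.0
for k in range(K):
    block = X[np.ix_(z == k, w == k)]
    if block.size == 0:
        continue
    J_eq3 += np.mean((block - delta) ** 2)               # Eq. (3), contribution of block k
    J_eq4 += block.var() + (block.mean() - delta) ** 2   # Eq. (4), same block
print(np.isclose(J_eq3, J_eq4))              # True
```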

The proposed algorithm, Diagonal Double Kmeans (DDKM) (Algorithm 2), is computationally efficient and its complexity can be shown to be \(O(\tau \times npK)\), where \(\tau \) denotes the number of iterations required to reach convergence, and n, p and K are respectively the number of objects (i.e. rows), features (i.e. columns) and clusters.

Algorithm 2. Diagonal Double Kmeans (DDKM).
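To complement the pseudo-code of Algorithm 2, here is a compact NumPy sketch of DDKM under simplifying assumptions (dense data, \(\delta \) given, random initialisation, no special handling of empty clusters); it implements the update rules of Eqs. 5 and 6.

```python
import numpy as np

def diagonal_double_kmeans(X, K, delta, n_iter=30, seed=0):
    """Minimal DDKM sketch: alternately apply the size-normalized
    update rules (5) and (6) to decrease the criterion of Eq. (3)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = rng.integers(K, size=n)                     # row labels in 0..K-1
    w = rng.integers(K, size=p)                     # column labels in 0..K-1
    D = (X - delta) ** 2                            # (x_ij - delta)^2, computed once
    for _ in range(n_iter):
        W = np.eye(K)[w]                            # (p, K) column-cluster indicator
        w_sizes = np.maximum(W.sum(axis=0), 1)
        z = np.argmin((D @ W) / w_sizes, axis=1)    # rule (5): mean cost per column cluster
        Z = np.eye(K)[z]                            # (n, K) row-cluster indicator
        z_sizes = np.maximum(Z.sum(axis=0), 1)
        w = np.argmin((D.T @ Z) / z_sizes, axis=1)  # rule (6): mean cost per row cluster
    return z, w
```

Each iteration costs \(O(npK)\), in line with the complexity stated above; for a binary document-word matrix, \(\delta \) would typically be set to 1, the maximum entry, as discussed in the next subsection.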

3.2 Choice of \(\delta \)

Herein, we discuss the choice of \(\delta \). Specifically, we compare the optimal value of \(\delta \) with the maximum entry of the matrix.

  1.

    If we consider \(\delta \) as an unknown parameter, its optimal value for the minimized criterion is the average of the block means. Indeed, with \(\mathbf {Z}\) and \(\mathbf {W}\) fixed, setting the derivative of J (Eq. 3) to zero yields \( \delta =\frac{1}{K}\sum _{k}\overline{x}_{k} \), where \(\overline{x}_{k}\) denotes the mean of the k-th block. Although this value of \(\delta \) is optimal, we observe that in the context of sparse data, i.e. when the data matrix contains a high percentage of zeros, it tends to 0, leading to a diagonal structure of blocks of zeros. An illustration of the co-clustering obtained with this value on the CSTR dataset (described in the numerical experiments section) is given in Fig. 3(c).

  2.

    Another way to proceed is to set the value of \(\delta \) at the initialisation step. DDKM aims at grouping objects and features with the strongest possible association. For instance, in the case of a binary data matrix \(\mathbf {X}\), the strongest association between an object i and a feature j is expressed by \(x_{ij}=1\), which is the maximum entry of \(\mathbf {X}\). Choosing the maximum therefore guarantees the homogeneity of the diagonal blocks while favouring blocks of 0 outside the diagonal. In [5, 13], the authors proposed hierarchical algorithms based on this idea. We use the same example as for the optimal value to show the result in Fig. 3(b). It is important to stress that this approach requires the values of the data matrix to be comparable. This is the case for binary data or normalized data, as we will see in the next section devoted to document-word partitioning. A small sketch contrasting the two choices of \(\delta \) is given after this list.
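The sketch below is illustrative only: the sparse binary matrix and the partitions are randomly generated stand-ins, used solely to contrast the two choices of \(\delta \).

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
X = (rng.random((30, 20)) < 0.1).astype(float)   # hypothetical sparse binary matrix
z = rng.integers(K, size=X.shape[0])             # some row partition
w = rng.integers(K, size=X.shape[1])             # some column partition

# Choice 1: optimal value for Eq. 3, the average of the block means;
# on sparse data it is close to 0, yielding diagonal blocks of zeros (Fig. 3(c)).
blocks = [X[np.ix_(z == k, w == k)] for k in range(K)]
delta_opt = np.mean([b.mean() for b in blocks if b.size > 0])

# Choice 2: the maximum entry of X (here 1), which favours diagonal blocks
# of strong associations (Fig. 3(b)).
delta_max = X.max()
print(delta_opt, delta_max)
```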

Fig. 3. (a) The original CSTR dataset, (b) CSTR reorganised according to the partitions when \(\delta =\max _{i,j} x_{ij}\), (c) CSTR reorganised according to the partitions when \(\delta \) is estimated by \(\frac{1}{K}\sum _{k}\overline{x}_{k}\).

4 Numerical Experiments

4.1 Performance Evaluation

In order to assess and compare the performance of the proposed algorithm, we use three commonly adopted metrics: the accuracy, the Normalized Mutual Information [15] and the Adjusted Rand Index [9]. We focus only on the quality of the row clustering. The clustering accuracy, noted Acc, is one of the most widely used evaluation criteria and is defined as:

$$\begin{aligned} \text {Acc}=\frac{1}{n}max\;\left[ \sum _{\mathcal {C}_{k},\mathcal {L}_{\ell }}T(\mathcal {C}_{k},\mathcal {L}_{\ell })\right] \end{aligned}$$

where \(\mathcal {C}_{k}\) is the k\(^{th}\) cluster in the final result and \(\mathcal {L}_{\ell }\) is the true \(\ell ^{th}\) class. \(T(\mathcal {C}_{k},\mathcal {L}_{\ell })\) is the number of objects common to \(\mathcal {C}_{k}\) and \(\mathcal {L}_{\ell }\), i.e. \(T(\mathcal {C}_{k},\mathcal {L}_{\ell })=|\mathcal {C}_{k}\cap \mathcal {L}_{\ell }|\). The accuracy is the maximum, over all one-to-one matchings between clusters and classes, of the sum of \(T(\mathcal {C}_{k},\mathcal {L}_{\ell })\) over the matched pairs. The second measure employed is the Normalized Mutual Information (NMI), calculated as follows:

$$\begin{aligned} \text {NMI}=\frac{\sum _{k,\ell }\frac{n_{k\ell }}{n}\log \frac{n\,n_{k\ell }}{n_{k}\hat{n}_{\ell }}}{\sqrt{\left( \sum _{k}\frac{n_{k}}{n}\log \frac{n_{k}}{n}\right) \left( \sum _{\ell }\frac{\hat{n}_{\ell }}{n}\log \frac{\hat{n}_{\ell }}{n}\right) }} \end{aligned}$$

where \(n_{k}\) denotes the number of objects contained in the cluster \(\mathcal {C}_{k}\) \((1 \le k \le K)\), \(\hat{n}_{\ell }\) the number of objects belonging to the class \(\mathcal {L}_{\ell }\) \((1 \le \ell \le K)\), and \(n_{k\ell }\) the number of objects in the intersection of the cluster \(\mathcal {C}_{k}\) and the class \(\mathcal {L}_{\ell }\). The last measure, the Adjusted Rand Index, noted ARI, measures the similarity between two partitions. From a mathematical standpoint, the Rand index is related to the accuracy. The adjusted form of the Rand index is defined as:

$$\begin{aligned} \text {ARI}=\frac{\sum _{k,\ell }\left( {\begin{array}{c}n_{k\ell }\\ 2\end{array}}\right) -\left[ \sum _{k}\left( {\begin{array}{c}n_{k}\\ 2\end{array}}\right) \sum _{\ell }\left( {\begin{array}{c}\hat{n}_{\ell }\\ 2\end{array}}\right) \right] /\left( {\begin{array}{c}n\\ 2\end{array}}\right) }{\frac{1}{2}\left[ \sum _{k}\left( {\begin{array}{c}n_{k}\\ 2\end{array}}\right) +\sum _{\ell }\left( {\begin{array}{c}\hat{n}_{\ell }\\ 2\end{array}}\right) \right] -\left[ \sum _{k}\left( {\begin{array}{c}n_{k}\\ 2\end{array}}\right) \sum _{\ell }\left( {\begin{array}{c}\hat{n}_{\ell }\\ 2\end{array}}\right) \right] /\left( {\begin{array}{c}n\\ 2\end{array}}\right) }. \end{aligned}$$

The values of these three metrics range between 0 and 1; a value close to 1 indicates a good clustering result.
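The following sketch (assuming NumPy, SciPy and scikit-learn are available) shows one way to compute the three metrics from a true and a predicted label vector; the labels are hypothetical, the accuracy uses the Hungarian algorithm to find the best one-to-one cluster-to-class matching, and the geometric normalization is selected so that NMI matches the definition above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one matching between clusters and classes."""
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # Contingency table T(C_k, L_l): number of objects in cluster k and class l.
    T = np.array([[np.sum((y_pred == c) & (y_true == lab)) for lab in classes]
                  for c in clusters])
    row, col = linear_sum_assignment(-T)          # maximise the matched counts
    return T[row, col].sum() / len(y_true)

# Hypothetical label vectors, for illustration only.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([1, 1, 0, 0, 0, 2, 2, 2])

print("Acc :", clustering_accuracy(y_true, y_pred))                                  # 0.875
print("NMI :", normalized_mutual_info_score(y_true, y_pred, average_method="geometric"))
print("ARI :", adjusted_rand_score(y_true, y_pred))
```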

4.2 Compared Algorithms

We compare against state-of-the-art (co-)clustering methods including Spherical Kmeans (SpKM) [4], Double Kmeans (DKM) and Spectral Co-Clustering (SpCo) [3]. We also report the clustering results of Kmeans and Nonnegative Matrix Factorization (NMF) [2] as baselines. The Spherical Kmeans algorithm is basically a Kmeans algorithm that uses the cosine dissimilarity instead of the Euclidean distance; it is known to be very efficient on sparse datasets and to converge quickly. We use the Matlab implementations of Kmeans and NMF. For the SpCo algorithm we use the implementation proposed by Assaf Gottlieb (Footnote 1), and we use the SpKM implementation given in [14]. Finally, we implement CROEUC [6], a fast version of Double Kmeans (DKM) whose advantage lies in the use of intermediate matrices of reduced size rather than the original data.

4.3 Datasets and Results

We study the effectiveness of our algorithm on some well-known text datasets with different sizes and balances (Footnote 2): CSTR, Classic3, WebKB4 and two subsets of the 20 Newsgroups dataset. The 20 Newsgroups dataset is organized into 20 topics, some closely related and others highly unrelated. The topics included in the two subsets are described in Table 1. The NG2 dataset includes topics that are not related (rec.motorcycles and sci.crypt, sci.space), while NG5 includes closely related topics, leading to overlapping clusters (rec.sport.baseball, sci.crypt, sci.med, talk.religion.misc, comp.windows.x, soc.religion.christian, talk.politics.mideast). A detailed description of all datasets can be found in Table 1.

Table 1. Description of the datasets in terms of size (\(n\times p\)), number of clusters (K), sparsity (percentage of zeros) and cluster balance. A partition is considered balanced if the balance coefficient is close to 1 and unbalanced otherwise.

Originally, each cell of these datasets contains the number of occurrences of a word in a document. As we are interested in evaluating our algorithm on both binary and continuous data, we use one version of the datasets in which the data were binarized, i.e. each cell with a value higher than 0 is set to 1 and to 0 otherwise, and a second version to which a TF-IDF (Term Frequency - Inverse Document Frequency) transformation is applied. The TF-IDF normalization is one of the most widely used in text mining and is defined as \(x^{\prime }_{ij}=tf_{ij}\times \log \frac{n}{df_{j}}\), where \(tf_{ij}\) denotes the number of occurrences of the j-th word in the i-th document and \(df_{j}\) denotes the number of documents containing the j-th word.
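A minimal sketch of these two transformations is given below. It follows the exact formula above (which differs, for instance, from scikit-learn's default smoothed TF-IDF) and uses a hypothetical toy count matrix.

```python
import numpy as np

def binarize(X):
    """Binary version: 1 where a word occurs in a document, 0 otherwise."""
    return (X > 0).astype(float)

def tfidf(X):
    """TF-IDF as defined above: x'_ij = tf_ij * log(n / df_j),
    where X is the raw (documents x words) count matrix."""
    n = X.shape[0]
    df = np.count_nonzero(X, axis=0)       # df_j: number of documents containing word j
    df = np.maximum(df, 1)                 # guard against words that never occur
    return X * np.log(n / df)

# Hypothetical toy matrix: 3 documents, 4 words.
X = np.array([[2, 0, 1, 0],
              [0, 3, 1, 0],
              [1, 0, 4, 1]], dtype=float)
print(binarize(X))
print(tfidf(X))
```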

We set the number of clusters to the true number of classes for all datasets. For each method, given the number of clusters, no further parameter selection is needed. We run each algorithm 100 times and report in Tables 2 and 3 the best result (in percentage), i.e. the one corresponding to the best local minimum of the criterion over all trials. Several observations can be made from these results: the proposed algorithm outperforms the other methods on every dataset, whether in its TF-IDF or binary version, except on the TF-IDF version of WebKB4. On NG2 and NG5, whose classes are not well separated, the performance gap is all the more important. We can also note that on the TF-IDF version of NG2, Kmeans is unable to find a partition into two clusters as required.

Table 2. Accuracy, Normalized Mutual Information and Adjusted Rand Index obtained on the TF-IDF datasets.
Table 3. Accuracy, Normalized Mutual Information and Adjusted Rand Index obtained on the binary datasets.
Fig. 4. Running time in seconds of the compared algorithms on the binary and TF-IDF versions of the CSTR (a) and Classic3 (b) datasets.

4.4 Computational Complexity

We study the computational cost of the compared clustering and co-clustering algorithms. We repeat the clustering 100 times for each algorithm on each dataset and report the average convergence time in Fig. 4 for the CSTR (a) and Classic3 (b) datasets. The results show that the proposed DDKM algorithm is only very slightly slower than NMF, while it requires far less time to converge than all the other state-of-the-art algorithms. The same observations were made on the other datasets presented in this article.

5 Conclusion

In this paper we presented DDKM, a fast co-clustering algorithm that looks for homogeneous diagonal blocks. Compared with other methods, our algorithm proves more effective for document-word partitioning, especially in the presence of classes with a high degree of overlap. In addition, DDKM requires less time to converge: up to 20 times less than DKM and 40 times less than SpCo, which is commonly used in the domain of document clustering. In real-world applications, knowledge of the number of co-clusters is generally required. For further research, it will be worthwhile to investigate an efficient way to assess this parameter.