
1 Introduction

Clustering, as a core paradigm of unsupervised learning, has been widely used in various domains owing to its powerful capacity for data preprocessing [13]. Many representative clustering methods, such as k-means and fuzzy c-means (FCM), have been devised for single-view data [10]. With the rapid development of information technology, however, single-view data can no longer meet practical needs, and we are often faced with multi-view data represented by different sources or feature sets. For example, an image can be described by multiple feature descriptors, such as LBP, SIFT and HOG. In such situations, the aforementioned classical clustering methods are no longer applicable.

To overcome this limitation and exploit multi-view data, a variety of multi-view clustering methods have emerged. One of the most representative families is based on hard partitions [2, 3, 6, 15]. For example, Cai et al. [2] proposed an \(\ell _{2, 1}\)-norm-based multi-view k-means clustering method, called RMVKM, which learns the weight of each view. Later, Chen et al. [3] developed a two-level variable-weighted k-means (TW-k-means) clustering method for multi-view data, introducing two entropy regularizations to adjust the importance of the weights. Based on collaborative learning, Zhang et al. [15] presented a two-level weighted collaborative k-means (TW-Co-k-means) clustering method that considers the weights of both views and features. However, these multi-view clustering methods all assume that the relationship between objects and clusters is crisp, i.e., an object either belongs to a cluster completely or not at all.

Unlike hard partition-based clustering methods, fuzzy clustering describes uncertainty by assigning each object to the singleton clusters with different membership degrees [11]. Many multi-view methods based on the idea of fuzzy clustering have been developed [4, 5, 7, 14]. Cleuziou et al. [4] proposed a collaborative fuzzy clustering method for multi-view data, named Co-FKM, although it treats the contribution of each view equally. Jiang et al. [7] proposed a weighted view collaborative fuzzy c-means clustering method to identify the contributions of different views. Recently, Yang and Sinaga [14] suggested a feature-weighted multi-view FCM based on collaborative learning (Co-FW-MVFCM), which can automatically identify the contribution of each view and feature. Although these methods achieve reasonable results to some extent, they cannot represent imprecision in the result. In fact, uncertain and imprecise cluster structures are ubiquitous in applications; that is, the cluster membership of some objects is hard to determine.

To address this issue, the concept of credal partition, grounded in the theory of belief functions, has been introduced [8, 9]. A credal partition extends the hard, fuzzy and possibilistic partitions by assigning objects not only to singleton clusters but also to unions of several singleton clusters (called meta-clusters) with different masses of belief. To date, several evidential clustering methods have been developed in the framework of belief functions [9, 12]. Evidential c-means (ECM) clustering [9], the evidential counterpart of FCM, can well characterize uncertain and imprecise cluster structures and derive effective clustering results. Unfortunately, ECM is built on the assumption of access to single-view data. Hence, how to characterize the uncertainty and imprecision of multi-view data in cluster assignment and improve the clustering performance remains an open issue.

In this paper, we propose a new adaptive weighted multi-view evidential clustering (WMVEC) method. Specifically, we introduce view weights to capture the different contributions of each view in clustering and employ entropy regularization to regulate the distribution of view weights.

The main contributions are summarized as follows:

1. Inspired by the theory of belief functions, we propose a novel WMVEC method, which can characterize the uncertainty and imprecision in clustering results.

2. We design the objective function of WMVEC and derive the view-weight vector, cluster-prototype matrix and credal partition matrix via alternating optimization.

3. We conduct extensive experiments on several real-world datasets to verify the superiority of WMVEC over other state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 proposes the WMVEC method and illustrates its optimization process. In Sect. 3, the effectiveness and practicability of the proposed WMVEC method are analyzed using real-world datasets. Finally, we make a brief conclusion in Sect. 4.

2 Multi-view Evidential Clustering

As mentioned earlier, multi-view data are now pervasive in real-world applications. However, characterizing the uncertainty and imprecision in clustering results on such data remains a challenging task, and conventional evidential clustering is no longer adequate for the situations we encounter today.

In this section, we propose a novel adaptive weighted multi-view evidential clustering (WMVEC) method to simultaneously learn the consensus credal partition and the importance of each view.

2.1 Formulation

Given a dataset \(\mathcal {X} = \{\textbf{X}_1, \textbf{X}_2, \cdots , \textbf{X}_{\mathcal {H}}\}\) with \(\mathcal {H}\) views, where the h-th view is expressed as \(\textbf{X}_h = \{\textbf{x}_{1,h}, \textbf{x}_{2, h}, \cdots , \textbf{x}_{n,h}\} \in \mathbb {R}^{n \times \mathcal {P}_h}\), the objective function of WMVEC can be represented as follows:

$$\begin{aligned} \mathcal {J}_{WMVEC}(\textbf{M}, \textbf{V}, \textbf{r}) = \sum \limits _{h = 1}^\mathcal {H} r_h \left( \sum \limits _{i = 1}^n \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} |A_j|^\alpha m_{ij}^\beta \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 + \sum \limits _{i = 1}^n \delta _h^2 m_{i\emptyset }^\beta \right) + \lambda \sum \limits _{h=1}^{\mathcal {H}} r_h \ln {r_h} \end{aligned}$$
(1)
$$\begin{aligned} \textit{s.t.}\;\; \sum \limits _{A_j\subseteq \varOmega , A_j\ne \emptyset } m_{ij} + m_{i\emptyset } = 1,\;\;i = 1, 2, \ldots , n;\;\;\sum \limits _{h=1}^{\mathcal {H}} r_h = 1 \end{aligned}$$
(2)

with

$$\begin{aligned} \bar{\textbf{v}}_{j, h} = \frac{1}{\left| A_j \right| } \sum \limits _{\omega _k \in A_j} \textbf{v}_{k, h} \end{aligned}$$
(3)

where \(r_h\) is the weight of the h-th view, and \(m_{ij}\) is the (i, j)-th element of the credal partition matrix \(\textbf{M} \in \mathbb {R}^{n \times 2^c}\), denoting the mass of belief of the i-th object assigned to cluster \(A_j\). \(\bar{\textbf{v}}_{j,h}\) is the j-th cluster prototype of the h-th view. The parameter \(\lambda \) adjusts the distribution of view weights, and the parameters \(\alpha \), \(\beta \) and \(\delta _h\) play the same roles as in ECM. The objective function \(\mathcal {J}_{WMVEC}\) consists of two parts. The first part is similar to ECM and computes the sum of weighted intra-cluster distances over the views. The second part is a negative weight entropy that regulates the distribution of the view weights.
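To make the notation concrete, here is a minimal sketch (in Python/numpy; the helper names focal_sets and objective are illustrative, not from the paper) that enumerates the non-empty focal sets \(A_j \subseteq \varOmega \) and evaluates (1) for given \(\textbf{M}\), \(\textbf{V}\) and \(\textbf{r}\):

```python
import itertools
import numpy as np

def focal_sets(c):
    """Non-empty subsets A_j of Omega = {0, ..., c-1}, by increasing cardinality."""
    sets = []
    for size in range(1, c + 1):
        sets.extend(itertools.combinations(range(c), size))
    return sets  # 2^c - 1 focal sets; the empty set is handled separately

def objective(X_views, M, M_empty, V_views, r, alpha, beta, delta, lam):
    """Evaluate J_WMVEC of Eq. (1).

    X_views : list of (n, P_h) arrays, one per view
    M       : (n, 2^c - 1) masses on the non-empty focal sets
    M_empty : (n,) masses m_{i,emptyset}
    V_views : list of (c, P_h) singleton-prototype matrices
    r, delta: (H,) view weights and per-view outlier distances
    """
    c = V_views[0].shape[0]
    A = focal_sets(c)
    J = 0.0
    for h, (X, V) in enumerate(zip(X_views, V_views)):
        for j, Aj in enumerate(A):
            v_bar = V[list(Aj)].mean(axis=0)                  # meta-cluster center, Eq. (3)
            d2 = ((X - v_bar) ** 2).sum(axis=1)               # squared Euclidean distances
            J += r[h] * (len(Aj) ** alpha * M[:, j] ** beta * d2).sum()
        J += r[h] * (delta[h] ** 2 * M_empty ** beta).sum()   # empty-set (outlier) term
    return J + lam * (r * np.log(r)).sum()                    # entropy regularizer
```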

2.2 Optimization

In this subsection, we provide an efficient iterative method to minimize (1) under the constraints (2). More specifically, we employ the Lagrange-multiplier method to alternately update one variable while keeping the others fixed.

Updating \({\textbf{M}}\). With \(\textbf{V}\) and \(\textbf{r}\) fixed, \(\textbf{M}\) is computed by Theorem 1.

Theorem 1

Suppose \(\textbf{V} = \textbf{V}^*\) and \(\textbf{r} = \textbf{r}^*\) are fixed, \(\mathcal {J}_{{WMVEC}}(\textbf{M}, \textbf{V}^*, \textbf{r}^*)\) is minimized iff:

$$\begin{aligned} m_{ij} = \frac{ \left( \sum \limits _{h=1}^{\mathcal {H}} |A_j|^\alpha r_{h} \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 \right) ^{ -\frac{1}{\beta -1}}}{\sum \limits _{A_z \ne \emptyset } \left( \sum \limits _{h=1}^{\mathcal {H}} |A_z|^\alpha r_{h} \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{z, h}\Vert ^2 \right) ^{ -\frac{1}{\beta -1}}+\left( \sum \limits _{h=1}^{\mathcal {H}} r_{h} \delta _h^2\right) ^{ -\frac{1}{\beta -1}}} \end{aligned}$$
(4)
$$\begin{aligned} \begin{aligned} m_{i\emptyset } = 1 - \sum \limits _{A_j\ne \emptyset } m_{ij} \end{aligned} \end{aligned}$$
(5)

Proof

We employ Lagrange multipliers \(\tau _{i}\) to handle the constrained minimization problem with respect to \(\textbf{M}\). The objective function (1), together with the first constraint in (2), can be rewritten as:

$$\begin{aligned} \begin{aligned}&\mathcal {J}(\textbf{M}, \tau _{i}) = \mathcal {J}_{{WMVEC}}(\textbf{M}, \textbf{V}, \textbf{r})- \sum \limits _{i = 1}^n {\tau _{i}} \left( \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} {m}_{ij} + m_{i\emptyset } -1 \right) \quad \\ \end{aligned} \end{aligned}$$
(6)

By differentiating the Lagrangian with respect to the \(m_{ij}\), \(m_{i\emptyset }\), and \(\tau _i\) and setting the derivatives to zero, we obtain:

$$\begin{aligned} {\frac{\partial \mathcal {J}}{\partial m_{ij}}} = \beta m_{ij}^{(\beta -1)} \sum \limits _{h=1}^{\mathcal {H}} |A_j|^\alpha r_{h} \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 - \tau _{i} = 0 \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned} {\frac{\partial \mathcal {J}}{\partial m_{i\emptyset }}} = \sum \limits _{h = 1}^\mathcal {H} \beta r_{ h} \delta _h^2 m_{i\emptyset }^{(\beta -1)} - \tau _{i} = 0 \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned} {\frac{\partial \mathcal {J}}{\partial \tau _{i}}} = \sum \limits _{A_j\subseteq \varOmega , A_j \ne \emptyset } {m}_{ij} + m_{i\emptyset } -1 = 0 \end{aligned}$$
(9)

We thus have from (7):

$$\begin{aligned} m_{ij}=\left( \frac{\tau _{i}}{\beta }\right) ^{ \frac{1}{\beta -1}} \left( \frac{1}{\sum \limits _{h=1}^{\mathcal {H}} |A_j|^\alpha r_{h} \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 } \right) ^{ \frac{1}{\beta -1}} \end{aligned}$$
(10)
$$\begin{aligned} m_{i\emptyset }=\left( \frac{\tau _{i}}{\beta }\right) ^{ \frac{1}{\beta -1}} \left( \frac{1}{\sum \limits _{h=1}^{\mathcal {H}} r_{h} \delta _h^2 } \right) ^{ \frac{1}{\beta -1}} \end{aligned}$$
(11)

Combining (9)–(11) yields the update rules (4) and (5).
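As a sketch of how (4)–(5) translate into code (reusing the hypothetical focal-set layout from the earlier snippet; a small constant could be added to the distances to guard against division by zero):

```python
import numpy as np

def update_M(X_views, V_views, r, A, alpha, beta, delta):
    """Update rules (4)-(5) for the credal partition, with V and r fixed."""
    n = X_views[0].shape[0]
    # D[i, j] = sum_h |A_j|^alpha r_h ||x_{i,h} - vbar_{j,h}||^2
    D = np.zeros((n, len(A)))
    for h, (X, V) in enumerate(zip(X_views, V_views)):
        for j, Aj in enumerate(A):
            v_bar = V[list(Aj)].mean(axis=0)
            D[:, j] += len(Aj) ** alpha * r[h] * ((X - v_bar) ** 2).sum(axis=1)
    expo = -1.0 / (beta - 1.0)
    num = D ** expo
    denom = num.sum(axis=1, keepdims=True) + (r @ delta ** 2) ** expo
    M = num / denom                                   # Eq. (4)
    M_empty = 1.0 - M.sum(axis=1)                     # Eq. (5)
    return M, M_empty
```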

Updating \({\textbf{r}}\). With \(\textbf{M}\) and \(\textbf{V}\) fixed, \(\textbf{r}\) is computed by Theorem 2.

Theorem 2

Suppose \(\textbf{M} = \textbf{M}^*\) and \(\textbf{V} = \textbf{V}^*\) are fixed, \(\mathcal {J}_{{WMVEC}}(\textbf{M}^*, \textbf{V}^*, \textbf{r})\) is minimized iff:

$$\begin{aligned} r_{h} = \frac{\exp \left( -\frac{ \sum \limits _{i = 1}^n \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} |A_j|^\alpha m_{ij}^\beta \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 + \sum \limits _{i = 1}^n \delta ^2_h m_{i\emptyset }^\beta }{\lambda }\right) }{\sum \limits _{g = 1}^\mathcal {H} \exp \left( -\frac{ \sum \limits _{i = 1}^n \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} |A_j|^\alpha m_{ij}^\beta \Vert \textbf{x}_{i, g} - \bar{\textbf{v}}_{j, g}\Vert ^2 + \sum \limits _{i = 1}^n \delta ^2_g m_{i\emptyset }^\beta }{\lambda }\right) } \end{aligned}$$
(12)

Proof

A Lagrange multiplier \(\sigma \) is used to handle the constrained minimization problem with respect to \(\textbf{r}\). The objective function (1), together with the second constraint in (2), becomes:

$$\begin{aligned} \mathcal {J}(\textbf{r}, \sigma ) = \mathcal {J}_{{WMVEC}}(\textbf{M}, \textbf{V}, \textbf{r}) - \sigma \left( \sum \limits _{h = 1}^\mathcal {H} {r_{h}} -1\right) \end{aligned}$$
(13)

By differentiating the Lagrangian with respect to \(r_{h}\) and \(\sigma \) and setting the derivatives to zero, we obtain:

$$\begin{aligned} \begin{aligned} {\frac{\partial \mathcal {J}}{\partial r_{h}}} =\sum \limits _{i = 1}^n \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} |A_j|^\alpha m_{ij}^\beta \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 + \sum \limits _{i = 1}^n \delta ^2_h m_{i\emptyset }^\beta + \lambda \left( 1 + \ln {r_h}\right) - \sigma = 0 \end{aligned} \end{aligned}$$
(14)
$$\begin{aligned} {\frac{\partial \mathcal {J}}{\partial \sigma }} = \sum \limits _{h = 1}^\mathcal {H} {r_{h}} -1 = 0 \end{aligned}$$
(15)

We thus have from (14):

$$\begin{aligned} r_{h} = \exp \left( \frac{ \sigma - \lambda }{\lambda }\right) \exp \left( -\frac{ \sum \limits _{i = 1}^n \sum \limits _{\begin{array}{c} A_j\subseteq \varOmega \\ A_j\ne \emptyset \end{array}} |A_j|^\alpha m_{ij}^\beta \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 + \sum \limits _{i = 1}^n \delta ^2_h m_{i\emptyset }^\beta }{\lambda }\right) \end{aligned}$$
(16)

Combining (15) and (16) yields the update rule (12).
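A corresponding sketch of (12): since the view weights form a softmax over the negative per-view costs, subtracting the maximum exponent before exponentiating (a standard numerical-stability trick that cancels in the ratio) avoids underflow for small \(\lambda \):

```python
import numpy as np

def update_r(X_views, M, M_empty, V_views, A, alpha, beta, delta, lam):
    """Update rule (12) for the view weights, with M and V fixed."""
    cost = np.zeros(len(X_views))        # per-view weighted within-cluster cost
    for h, (X, V) in enumerate(zip(X_views, V_views)):
        for j, Aj in enumerate(A):
            v_bar = V[list(Aj)].mean(axis=0)
            cost[h] += (len(Aj) ** alpha * M[:, j] ** beta
                        * ((X - v_bar) ** 2).sum(axis=1)).sum()
        cost[h] += delta[h] ** 2 * (M_empty ** beta).sum()
    z = -cost / lam
    z -= z.max()                         # stability shift; cancels in the ratio
    r = np.exp(z)
    return r / r.sum()                   # Eq. (12)
```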

Updating \({\textbf{V}}\). With \(\textbf{M}\) and \(\textbf{r}\) fixed, \(\textbf{V}\) is computed by Theorem 3.

Theorem 3

Suppose \(\textbf{M} = \textbf{M}^*\) and \(\textbf{r} = \textbf{r}^*\) are fixed, \(\mathcal {J}_{{WMVEC}}(\textbf{M}^*, \textbf{V}, \textbf{r}^*)\) is minimized iff, for each view h, \(\textbf{V}_h\) solves the linear system (19), with:

$$\begin{aligned} B_{lq, h} \triangleq \sum \limits _{i = 1}^n {x}_{iq, h}\sum \limits _{A_j\ni \omega _l} |A_j|^{\alpha -1} m_{ij}^\beta ,\;\;\forall \;l = 1, \ldots , c,\;\;\forall \;q = 1, \ldots , \mathcal {P}_h \end{aligned}$$
(17)
$$\begin{aligned} H_{lk, h} \triangleq \sum \limits _{i = 1}^n \sum \limits _{A_j\supseteq \{\omega _k,\omega _l\} } |A_j|^{\alpha -2} m_{ij}^\beta ,\;\;\forall \;k, l = 1, \ldots , c \end{aligned}$$
(18)
$$\begin{aligned} \begin{aligned} \textbf{H}_{h} \textbf{V}_{h} = \textbf{B}_{h} \end{aligned} \end{aligned}$$
(19)

where \(B_{lq,h}\) (resp. \(H_{lk,h}\)) is the (l, q)-th (resp. (l, k)-th) element of the matrix \(\textbf{B}_h \in \mathbb {R}^{c \times \mathcal {P}_h}\) (resp. \(\textbf{H}_h \in \mathbb {R}^{c \times c}\)).

Proof

The partial derivatives of \(\mathcal {J}_{{WMVEC}}\) with respect to the cluster centers are given by:

$$\begin{aligned} \begin{aligned} {\frac{\partial \mathcal {J}}{\partial \textbf{v}_{l, h}}} = \sum \limits _{i = 1}^n \sum \limits _{A_j\ne \emptyset } |A_j|^\alpha r_{h} m_{ij}^\beta {\frac{\partial D^2_{ij, h}}{\partial \textbf{v}_{l, h}}} \end{aligned} \end{aligned}$$
(20)
$$\begin{aligned} {\frac{\partial D^2_{ij, h}}{\partial \textbf{v}_{l, h}}} = -\frac{2}{|A_j|} \left( \textbf{x}_{i, h} - \frac{1}{|A_j|} \sum \limits _{\omega _k \in A_j} \textbf{v}_{k, h}\right) \;\;\text {if } \omega _l \in A_j,\;\;\text {and } \textbf{0} \text { otherwise} \end{aligned}$$
(21)

where

$$\begin{aligned} D^2_{ij, h} = \Vert \textbf{x}_{i, h} - \bar{\textbf{v}}_{j, h}\Vert ^2 \end{aligned}$$
(22)

Setting these derivatives to zero gives c linear equations that can be written as:

$$\begin{aligned} \sum \limits _{i = 1}^n \textbf{x}_{i, h}\sum \limits _{A_j\ni \omega _l} |A_j|^{\alpha -1} r_{h} m_{ij}^\beta = \sum \limits _{k = 1}^c \textbf{v}_{k, h}\sum \limits _{i = 1}^n \sum \limits _{A_j\supseteq \{\omega _k,\omega _l\}} |A_j|^{\alpha -2} r_{h} m_{ij}^\beta \end{aligned}$$
(23)

Since \(r_h > 0\), this system of linear equations is exactly (19), with \(\textbf{B}_h\) and \(\textbf{H}_h\) defined by (17) and (18).
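A sketch of the prototype update, building \(\textbf{B}_h\) and \(\textbf{H}_h\) from (17)–(18) and solving (19) per view (assuming \(\textbf{H}_h\) is non-singular, which may fail if some singleton cluster receives no mass):

```python
import numpy as np

def update_V(X_views, M, A, alpha, beta, c):
    """Solve the linear system (19) for each view's singleton prototypes."""
    V_views = []
    for X in X_views:
        B = np.zeros((c, X.shape[1]))
        Hmat = np.zeros((c, c))
        for j, Aj in enumerate(A):
            w = len(Aj) ** (alpha - 1) * M[:, j] ** beta     # |A_j|^{alpha-1} m_ij^beta
            for l in Aj:
                B[l] += w @ X                                # Eq. (17)
                for k in Aj:
                    Hmat[l, k] += w.sum() / len(Aj)          # |A_j|^{alpha-2} sum_i m^beta, Eq. (18)
        V_views.append(np.linalg.solve(Hmat, B))             # Eq. (19)
    return V_views
```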

For ease of implementation, the proposed WMVEC method is outlined in Algorithm 1.

Algorithm 1. The WMVEC procedure: alternately update \(\textbf{M}\) by (4)–(5), \(\textbf{V}\) by (19) and \(\textbf{r}\) by (12) until convergence.
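Since the original algorithm listing did not survive extraction, the following minimal sketch reconstructs the alternating scheme from the update rules above. It assumes the focal_sets, objective, update_M, update_V and update_r helpers from the earlier snippets; the default \(\delta \) and the random prototype initialization are illustrative assumptions, not the paper's prescription:

```python
import numpy as np

def wmvec(X_views, c, alpha=2.0, beta=2.0, delta=None, lam=1.0,
          max_iter=100, tol=1e-6, seed=0):
    """Alternating minimization of J_WMVEC (sketch of Algorithm 1)."""
    rng = np.random.default_rng(seed)
    H = len(X_views)
    A = focal_sets(c)                              # non-empty focal sets
    delta = np.full(H, 10.0) if delta is None else np.asarray(delta)
    r = np.full(H, 1.0 / H)                        # uniform initial view weights
    # initialize singleton prototypes with randomly chosen objects
    V_views = [X[rng.choice(len(X), c, replace=False)] for X in X_views]
    prev = np.inf
    for _ in range(max_iter):
        M, M_empty = update_M(X_views, V_views, r, A, alpha, beta, delta)
        V_views = update_V(X_views, M, A, alpha, beta, c)
        r = update_r(X_views, M, M_empty, V_views, A, alpha, beta, delta, lam)
        J = objective(X_views, M, M_empty, V_views, r, alpha, beta, delta, lam)
        if abs(prev - J) < tol:                    # stop when the objective stalls
            break
        prev = J
    return M, M_empty, V_views, r
```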

2.3 Computational Complexity

The computational complexity of ECM is \(O(tn2^c)\), where t is the number of iterations, and n and c denote the numbers of objects and clusters, respectively. WMVEC extends ECM to multi-view scenarios by introducing view weights, so its computational complexity is \(O(t\mathcal {H}n2^c)\), where \(\mathcal {H}\) is the number of views. It is worth noting that the computational complexity grows exponentially with the number of clusters, and assignments of objects to high-cardinality meta-clusters are hard to interpret in practice. We can therefore impose constraints that limit the cardinality of the focal elements, for example, as in ECM, keeping only the meta-clusters composed of two clusters. For datasets with many clusters, we can further restrict the frame of discernment to the singleton clusters, the empty set and \(\varOmega \). These restrictions reduce the computational complexity of WMVEC to \(O(t\mathcal {H}nc^2)\) and \(O(t\mathcal {H}n(c+2))\), respectively.
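As an illustration, the restriction can be implemented by swapping the focal-set generator of the earlier sketches for a constrained one (a hypothetical helper, not from the paper):

```python
import itertools

def restricted_focal_sets(c, mode="pairs"):
    """Constrained non-empty focal sets, as discussed above.

    mode="pairs":      singletons, two-element meta-clusters and Omega
                       -> O(c^2) focal sets, complexity O(t*H*n*c^2)
    mode="singletons": singletons and Omega only -> c + 1 focal sets,
                       complexity O(t*H*n*(c+2)), the empty set kept apart
    """
    singletons = [(k,) for k in range(c)]
    omega = tuple(range(c))
    if mode == "singletons":
        return singletons + [omega]
    pairs = list(itertools.combinations(range(c), 2))
    return singletons + pairs + ([omega] if c > 2 else [])
```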

3 Experiments

3.1 Experiment on Iris Dataset

In this experiment, we demonstrate the ability of the credal partition in WMVEC on the Iris dataset. The Iris dataset contains 150 objects in 3 clusters, where each object is described by four features (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width). We split the Iris dataset into three views according to different features, as shown in Fig. 1(a), (e), (i). We can see that \(\{\omega _1\}\) is clearly separated from \(\{\omega _2\}\) and \(\{\omega _3\}\), whereas \(\{\omega _2\}\) and \(\{\omega _3\}\) partially overlap, and view 3 distinguishes \(\{\omega _2\}\) from \(\{\omega _3\}\) much more easily than the other two views.
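For reproducibility, the views can be built directly from the standard Iris data; the exact feature-to-view assignment below is an assumption for illustration, since the split used in Fig. 1 is shown only graphically:

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)      # (150, 4) features, 3 classes
X_views = [
    X[:, [0, 1]],                      # view 1: Sepal.Length, Sepal.Width
    X[:, [2, 3]],                      # view 2: Petal.Length, Petal.Width
    X[:, [0, 2]],                      # view 3: Sepal.Length, Petal.Length
]
# M, M_empty, V_views, r = wmvec(X_views, c=3)   # using the earlier sketch
```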

From Fig. 1(b)–(d), (f)–(h), (j)–(l), we can observe that WMVEC clearly characterizes the imprecision and uncertainty of cluster assignment: it cautiously assigns indistinguishable objects to the relevant meta-cluster (i.e., \(\{\omega _2, \omega _3\}\)). Moreover, the parameter \(\alpha \) controls the size of the meta-clusters: a large \(\alpha \) tends to yield lower imprecision but a higher error rate, and vice versa. In applications, \(\alpha \) should be tuned according to the specific situation or user preference. In the following experiments, we set \(\alpha = 2\), as suggested in [9].

Fig. 1. Clustering results on the Iris dataset.

3.2 Experiments on Real-World Multi-view Datasets

To verify the validity of the proposed method with respect to other methods, we conducted experiments on eight public multi-view datasets, including WebKB, Multiple Features, Prokaryotic Phyla [1], SensIT Vehicle, HumanEva 3D Motion [1] and MSRCv1 [14]. Table 1 summarizes the basic information of these datasets, including the numbers of objects, clusters, views and features.

Table 1. Details of the real-world multi-view datasets used

We compare the performance of the proposed WMVEC method with the following related methods: FCM [11], ECM [9], Co-FKM [4], RMVKM [2], TW-k-means [3], SMVF [6], TW-Co-k-means [15], Co-FW-MVFCM [14] and MVASM [5]. Among them, FCM and ECM are two classical single-view clustering methods; RMVKM, TW-k-means, SMVF and TW-Co-k-means are representative multi-view clustering methods based on hard partitions; and Co-FKM, Co-FW-MVFCM and MVASM are state-of-the-art methods based on fuzzy partitions. To fully evaluate the performance of the above methods, we applied two popular performance metrics, namely Accuracy (ACC) and Rand Index (RI); larger values of ACC and RI indicate better clustering performance.
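Both metrics are straightforward to compute; a sketch using scipy and scikit-learn (rand_score is available in scikit-learn from version 0.24; ACC uses Hungarian matching between predicted and true labels):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score           # plain Rand Index

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction correct under the best one-to-one label matching."""
    labels_true = np.unique(y_true)
    labels_pred = np.unique(y_pred)
    k = max(len(labels_true), len(labels_pred))
    count = np.zeros((k, k), dtype=int)          # contingency table
    for i, lp in enumerate(labels_pred):
        for j, lt in enumerate(labels_true):
            count[i, j] = np.sum((y_pred == lp) & (y_true == lt))
    row, col = linear_sum_assignment(-count)     # maximize matched objects
    return count[row, col].sum() / len(y_true)

# usage: acc = clustering_accuracy(y, y_hat); ri = rand_score(y, y_hat)
```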

Table 2 and Table 3 show the clustering results of WMVEC and the other methods on the real-world datasets. Several observations can be made.

1. Compared with the single-view clustering methods (i.e., FCM and ECM), WMVEC generally has obvious advantages, which indicates that single-view clustering methods are ill-suited to multi-view data. Besides, although FCM and ECM both achieve relatively good results on the second view of the Prop dataset, such single-view results carry little physical meaning.

2. Compared with the other multi-view clustering methods, the proposed WMVEC achieves better performance. In particular, the clustering methods that consider view weights often obtain better results, which further demonstrates the rationality of introducing view weights.

3. More importantly, the advantage of WMVEC is not only that it can be converted into a hard or fuzzy partition with outstanding performance; the credal partition also offers deeper insight into the data. It effectively characterizes the uncertainty and imprecision in cluster assignment and avoids the risk of misassignment, which is valuable in cautious decision-making applications. The conversion to a hard partition is sketched below.
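One standard way to perform this conversion is the pignistic transformation commonly used with ECM-style credal partitions; a sketch (function names are illustrative):

```python
import numpy as np

def pignistic(M, M_empty, A, c):
    """Pignistic probabilities BetP from a credal partition.

    BetP_i(omega_k) = sum over A_j containing omega_k of
                      m_ij / (|A_j| * (1 - m_{i,emptyset}))
    """
    bet = np.zeros((M.shape[0], c))
    for j, Aj in enumerate(A):
        for k in Aj:
            bet[:, k] += M[:, j] / len(Aj)       # share meta-cluster mass equally
    return bet / (1.0 - M_empty)[:, None]        # renormalize away empty-set mass

# hard partition: each object goes to its most probable singleton cluster
# y_hat = pignistic(M, M_empty, A, c).argmax(axis=1)
```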

Table 2. Clustering results with various multi-view datasets on ACC
Table 3. Clustering results with various multi-view datasets on RI

3.3 Parameter Study

Finally, we also tested the parameter sensitivity of WMVEC. The ACC and RI results with different parameters \(\beta \) and \(\lambda \) are shown in Fig. 2 and Fig. 3, where \(\beta \) varies in \(\{1.1, 1.2, \cdots , 2.0\}\) and \(\lambda \) varies in \(\{e^0, e^1, \cdots , e^{10}\}\). The values of \(\beta \) and \(\lambda \) have a significant impact on the performance of WMVEC: for each dataset, WMVEC achieves better results if appropriate \(\beta \) and \(\lambda \) are selected. In addition, WMVEC often obtains better results with smaller \(\beta \) and larger \(\lambda \).
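The grid itself is small enough for exhaustive search; a sketch chaining the earlier (hypothetical) helpers, assuming X_views, true labels y and the cluster number c are in scope:

```python
import numpy as np

betas = np.arange(1.1, 2.01, 0.1)      # {1.1, 1.2, ..., 2.0}
lams = np.exp(np.arange(0, 11))        # {e^0, e^1, ..., e^10}

best_acc, best_params = -np.inf, None
for beta in betas:
    for lam in lams:
        M, M_empty, V_views, r = wmvec(X_views, c, beta=beta, lam=lam)
        y_hat = pignistic(M, M_empty, focal_sets(c), c).argmax(axis=1)
        acc = clustering_accuracy(y, y_hat)
        if acc > best_acc:
            best_acc, best_params = acc, (beta, lam)
print("best ACC %.3f at beta=%.1f, lambda=%.1f" % (best_acc, *best_params))
```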

Fig. 2. Clustering results on the multi-view datasets in terms of ACC.

Fig. 3. Clustering results on the multi-view datasets in terms of RI.

4 Conclusion

In this paper, we proposed an adaptive weighted multi-view evidential clustering (WMVEC) method based on the framework of belief functions. WMVEC handles the uncertainty and imprecision of multi-view data clustering and provides a credal partition, which extends the fuzzy, possibilistic and rough ones and allows the imprecision and uncertainty in the assignment of objects to clusters to be represented. Moreover, WMVEC automatically learns the importance of each view. Experimental results on several real-world datasets illustrate the potential and superiority of the proposed method. In future work, we intend to extend the proposed method to mixed-type and incomplete data.