1 Introduction

With the rapid development of data collection techniques, multi-view data have become ubiquitous in the real world. For example, an online product can be described by a paragraph of text, several images, or even a short video. In web page classification tasks, the words on the web page itself and the words in the links pointing to the web page from other pages constitute two views of the web page [4, 9, 11]. How to mine the rich information underlying multi-view data has become a challenging problem. As an important tool in the machine learning and data mining fields, multi-view learning has attracted more and more attention during the past two decades [7, 10, 14, 16,17,18,19, 33, 40]. Depending on whether labels are used, multi-view learning consists of multi-view classification and multi-view clustering. This work focuses on multi-view clustering, which aims to group subjects into several clusters by integrating their multiple views of information [24, 39].

Besides multi-view clustering, there is a similar technique named ensemble clustering (EC) that mines multi-view information by clustering. Ensemble clustering boosts clustering performance by combining many clustering results [1, 13, 15, 27, 34]. When each base clustering algorithm works on one view, EC becomes multi-view clustering; thus, EC is, to some degree, a way to implement multi-view clustering. EC mainly consists of two steps: a generation step and a consensus step [35]. The generation step produces many clustering results, which the consensus step then combines into a single consistent result. When EC is used for multi-view clustering, the consensus step should be emphasized to achieve a better clustering result. In addition, the ensemble strategy is highly popular in many Kaggle competitions. In this paper, we adopt EC to implement multi-view clustering in order to take advantage of the merits of ensemble learning. To date, there exist a large number of multi-view clustering works. Most of them assume that all the multi-view information is available and rarely deal with the situation where missing values occur in the data. Unfortunately, missing values are pretty common in the real world.

Due to errors occurring during data collection, data entry, data storage and so on, the missing data problem is pretty common in multi-view data, and clustering multi-view data with missing values is a significant challenge. Missing value handling techniques can be categorized into three groups: deleting the subjects with missing values, imputation, and no handling. For the first group, after deleting the subjects with missing values, all the remaining subjects have complete information, but when most subjects have missing values, the information loss is too heavy for real-world applications. The second group can be further split into single imputation and multiple imputation (MI). Single imputation methods include mean imputation, random imputation, regression imputation, EM (expectation maximization) imputation and so on. Compared with single imputation, MI imputes the missing values multiple times to take the uncertainty of the missing values into consideration, and it performs better in most cases. The third group, no handling, is also a way to deal with missing values; for instance, an indicator matrix is adopted to deal with missing views or values in some multi-view clustering works [8, 37]. Among all these techniques, MI performs best in most cases and has become the most popular strategy for dealing with missing values.

To cluster subjects with incomplete multi-view information, several incomplete multi-view clustering methods have been proposed. They can be mainly classified into two groups. The first group uses an indicator matrix M, where M_{ij} = 1 indicates that the i-th subject is missing in the j-th view and M_{ij} = 0 indicates that the i-th subject is present in the j-th view [26, 36, 37]. The second group imputes the missing values first and then runs general multi-view clustering algorithms. Note that the first group of methods is intended to deal with the view missing case, while the second group aims to deal with the any value missing case. Examples illustrating the difference between the two cases are displayed in Fig. 1. The view missing case refers to the situation where a whole view is missing for a subject, while the any value missing case refers to the situation where any feature of any view can be missing for a subject. The view missing case can be considered a special form of the any value missing case, so the any value missing case is more difficult to tackle. To date, most existing incomplete multi-view clustering algorithms deal with the view missing case. In this paper, we aim to deal with clustering problems in the general any value missing case. We explore a two-stage method: the first stage deals with missing values, while the second stage ensembles the clustering results from multiple views. For the first stage, we adopt the popular MI to handle missing values and explore different MI methods to choose the best and most stable one; for the second stage, considering that the contributions of different views differ, we adopt a view weighting strategy [3, 20] for the ensemble. We name our incomplete multi-view clustering with Multiple Imputation and Ensemble Clustering as MIEC. We compared it with existing incomplete multi-view clustering algorithms, including state-of-the-art ones, and the results verify the effectiveness of the proposed method.

Fig. 1

Differences between the view missing case and the any value missing case. Both pictures show only one view. O_n indicates the n-th sample and F_m indicates the m-th feature

The contributions of this paper are listed as follows:

  • This paper combines multiple imputation and ensemble clustering to implement incomplete multi-view clustering for the first time.

  • Compared with existing incomplete multi-view clustering methods that handle the view missing case, the proposed MIEC can deal with a more general data missing problem: the any value missing case.

  • To further boost the clustering performance, view weights are taken into account in the proposed method.

  • The experimental comparison with seven other algorithms on five datasets verifies the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 introduces the related works, including MI and EC. Section 3 details the proposed MIEC. Section 4 presents the experimental settings and results. Finally, conclusions are given in Section 5.

2 Related works

Two techniques are closely related to our proposed incomplete multi-view clustering algorithm: MI and EC. We introduce them briefly.

2.1 Multiple imputation

MI produces m complete datasets from the incomplete data by imputing the missing data m times with some reasonable method. In statistics, there are two techniques to implement MI:

  1. Joint Multivariate Normal Distribution MI: This technique assumes that the incomplete data follow a multivariate normal distribution. However, it becomes inappropriate when the incomplete dataset does not follow a multivariate normal distribution.

  2. Conditional MI: Conditional MI provides many methods (different distribution models) to users and follows an iterative procedure to model the conditional distribution of each variable given the other variables. It allows users to flexibly select appropriate distribution models for the incomplete dataset. The popular package Multivariate Imputation by Chained Equations (MICE) [5] belongs to conditional MI.

The MI procedure of the MICE package is as follows:

In the first step, the MICE package creates several complete datasets. It assumes that the dataset follows a specific distribution selected by the user and fills in the blanks with reasonable values drawn from that distribution. To account for the uncertainty of the predicted values, MICE creates multiple complete datasets.

In the second step, Ordinary Least Squares (OLS) regression is run on each imputed dataset, and a different set of regression coefficients is obtained for each dataset to reflect the effect of imputation on basic quantities such as the mean and variance. Due to the uncertainty of the imputed data, these analysis results are considered as the basis for selecting the most reasonable values. The analysis results are stored in a mira (multiply imputed repeated analysis) object.

Finally, the analysis results of all imputed datasets are pooled together. With the pool() function and the parameters selected by the user, means, variances, and other statistics are combined to obtain the final imputed complete dataset.
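To make the MI stage concrete, the following Python sketch mimics the chained-equations workflow described above using scikit-learn's IterativeImputer, a rough analogue of the R MICE package rather than the implementation used in this paper; the function name multiple_impute and the number of imputations m are illustrative assumptions.

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def multiple_impute(X_incomplete, m=5):
    """Return m completed copies of X_incomplete (np.nan marks missing entries)."""
    completed = []
    for seed in range(m):
        # sample_posterior=True draws imputed values from the predictive
        # distribution, so each completed copy reflects imputation uncertainty.
        imputer = IterativeImputer(sample_posterior=True, max_iter=10,
                                   random_state=seed)
        completed.append(imputer.fit_transform(X_incomplete))
    return completed
```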

2.2 Ensemble clustering

The ensemble technique was originally developed to boost classification performance. Inspired by this idea, the ensemble technique has also been investigated for clustering problems. Generally, EC aims to produce a better clustering result by combining multiple clustering models than any individual clustering model could produce. It can be divided into two steps: the generation step and the consensus step [35]. The generation step generates many clustering models to feed the consensus step for combination. There are many ways to realize the generation step: different clustering algorithms, different parameter initializations, different object representations, and so on. The consensus step is the main step of an EC algorithm, and designing an appropriate consensus function is the key to its success; thus researchers mainly work on the consensus step. There are two ways to implement the consensus step: object co-occurrence and median partition.

For the object co-occurrence way, Fred and Jain [21, 22] encode the information of the base clusterings as a co-association matrix, in which each entry indicates the frequency with which two objects are clustered together, and then treat the co-association matrix as a similarity (or distance) matrix to compute the final clustering. Based on the co-association matrix, one graph-based EC method is proposed in [32], and two other graph-based EC ideas are presented in the same paper. In addition, relabeling and voting are also popular ways to implement the consensus function.

The median partition is defined as the partition that maximizes the similarity with all partitions in the cluster ensemble; thus it is generally formulated as an optimization problem. The difficulty of the median partition lies in defining the similarity measure, so many similarity or distance functions have been proposed, such as the Mirkin distance [28], the Rand index [29], the Jaccard coefficient [2] and so on.

Among these successful EC algorithms, the works [3, 20] take into account the different contributions of the base clustering models to the final result and demonstrate superior performance. Although they cannot deal with missing values directly, they are appropriate candidates for implementing multi-view clustering in the complete data setting.

2.3 Combination of MI and EC

One of the most significant advantages of multiple imputation is that it can recover incomplete datasets regardless of the form of incompleteness. Given that most existing incomplete multi-view clustering algorithms are confined to view missing datasets, introducing multiple imputation to clustering can bring a great improvement in performance. Furthermore, multiple imputation yields a multitude of datasets, and ensemble clustering is exactly good at processing many datasets simultaneously. Therefore, the combination of multiple imputation and ensemble clustering is a brand-new and effective approach to clustering incomplete multi-view datasets.

3 Incomplete multi-view clustering with multiple imputation and ensemble clustering

In this section, we present our MIEC method in detail. It consists of two stages: MI and EC. We utilize MI to fill in the incomplete dataset and obtain a set of complete datasets, and then perform the EC algorithm on the complete datasets to accomplish the clustering.

3.1 Multiple imputation

Compared with single imputation, MI takes the uncertainty of the missing values into consideration and fills in missing values more robustly. Especially in statistics, MI has become the most popular and effective technique for dealing with missing values. As introduced in Section 2.1, there are many algorithms to implement MI. The popular R package MICE [5] provides several common methods: Sample, RF (Random Forest), Cart, midastouch, etc.

These algorithms' performance varies across datasets, so it is necessary to roughly select the most effective one. We do not need to identify the most effective one precisely, because its performance will also depend on the EC strategy. The selection of the MI method is discussed in Section 4.6.2.

3.2 Ensemble clustering

As mentioned above, EC consists of two steps: generation and consensus. In the generation step, after obtaining the set of complete datasets produced by performing MI on the incomplete dataset, we run a clustering algorithm such as k-means on each view of each complete dataset. Thus each view of each complete dataset yields one clustering result, which is summarized as a co-association matrix [23].

Let \(X^{v}=\{X_{1}^{v}, X_{2}^{v},\cdots, X_{n_{v}}^{v}\} \in \mathbb{R}^{d_{v}\times n_{v}}\) denote the data matrix of the v-th view, where \(d_{v}\) is the number of features and \(n_{v}\) is the number of samples. The ground truth \(Y(i) \in \{1,2,\cdots,k\}\) denotes the cluster to which the i-th sample belongs. After the generation step, we obtain many clustering results, one from each imputed copy of each view. Let \(V_{1},V_{2},\ldots,V_{M}\) be the M clustering models for the M views, and let \(Y_{m}\) be the cluster label assigned to sample a by the m-th model: \(V_{m}(a) = Y_{m}\), where \(Y_{m} \in \{1,2,\ldots,k\}\) and k denotes the number of clusters. Each entry of the co-association matrix is defined as:

$$ h_{m}(i, j) = \delta(V_{m}(i), V_{m}(j)) $$

where δ(a,b) = 1 if a = b and 0 otherwise.
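As an illustration of the generation step, the sketch below runs k-means on each imputed copy of one view and averages the resulting co-association matrices; it assumes the samples are stored in rows, and the function and variable names are hypothetical rather than taken from the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def coassociation_for_view(imputed_copies, k):
    """Average co-association matrix over the L_m imputed copies of one view.

    imputed_copies: list of (n_samples, d_v) arrays; k: number of clusters.
    """
    n = imputed_copies[0].shape[0]
    h_bar = np.zeros((n, n))
    for X in imputed_copies:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        # h(i, j) = delta(V(i), V(j)): 1 if i and j share a cluster, else 0
        h_bar += (labels[:, None] == labels[None, :]).astype(float)
    return h_bar / len(imputed_copies)
```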

For any pair of objects (a, b), the ground-truth value of the matrix is:

$$ Z = \delta(Y(a), Y(b)) $$

To gather the information of all the results, we introduce the weighted scheme proposed in [3]:

$$ H={\sum}_{m=1}^{M} \alpha_{m} \frac{1}{L_{m}} {\sum}_{l=1}^{L_{m}}h_{l,m} $$
(1)

where α_m ≥ 0 is the weight of the m-th view, M denotes the number of views, L_m denotes the number of base clusterings (one per imputed copy) of the m-th view, and h_{l,m} is the co-association matrix of the l-th base clustering of the m-th view.

We now introduce a probability model to optimize the weights in H.

The clustering accuracy can be represented as a probability:

$$ P[h_{m} = 1 | Z = 1] = q_{m} $$
(2)

The Bayesian estimates of the partitioning probability are given by

$$ \tilde{P}_{m}^{B} = \frac{{\sum}_{l} \Bar{h}_{l,m} + 1}{L_{m} + 2} $$
(3)
$$ \tilde{q}_{m}^{B} = max(\tilde{P}_{m}^{B}, 1-\tilde{P}_{m}^{B}), m = 1, 2,...,M. $$
(4)

For a pair of samples (a, b), consider the margin of the clustering ensemble:

$$ mg = (\text{weighted number of votes for}\ Z) - (\text{weighted number of votes against}\ Z) $$

which equals

$$ mg = {\sum}_{m} \alpha_{m} \frac{1}{L_{m}} {\sum}_{l}\left(I[h_{l,m} = Z] - I[h_{l,m} \neq Z]\right). $$
(5)

where Z ∈ {0,1}. It is proven in [3] that the probability of wrongly predicting Z satisfies:

$$ P_{err} < \frac{Var[mg(H, Z)]}{(E[mg(H,Z)])^{2}} $$
(6)

where E[mg(H,Z)] is the expectation and Var[mg(H,Z)] is the variance of the margin:

$$ E[mg(H,Z)] = 2{\sum}_{m}\alpha_{m} q_{m} - 1 $$
$$ Var[mg(H, Z)] = 4 {\sum}_{m} \frac{{\alpha_{m}^{2}}}{L_{m}} q_{m}(1-q_{m}) $$

The optimization thus reduces to finding the α_m that minimizes P_err. The final solution is

$$ \alpha_{m} = \frac{\frac{L_{m}}{q_{m}(1-q_{m})}}{{\sum}_{m} \frac{L_{m}}{q_{m}(1-q_{m})}}, m=1,2,...,M $$
(7)
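The following sketch shows how the view weights of Eq. (7) could be computed from the Bayesian estimates of Eqs. (3) and (4); it assumes that \bar{h}_{l,m} is interpreted as the mean of h_{l,m} over all sample pairs, which is an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np

def view_weights(h_lists):
    """Estimate q_m (Eqs. (3)-(4)) and the weights alpha_m (Eq. (7)).

    h_lists: one list per view, each holding that view's L_m
    co-association matrices h_{l,m}.
    """
    q, L = [], []
    for h_list in h_lists:
        L_m = len(h_list)
        # Eq. (3): assume h_bar_{l,m} is the mean of h_{l,m} over sample pairs.
        p = (sum(h.mean() for h in h_list) + 1.0) / (L_m + 2.0)
        q.append(max(p, 1.0 - p))          # Eq. (4)
        L.append(L_m)
    q, L = np.asarray(q), np.asarray(L)
    w = L / (q * (1.0 - q))                # numerator of Eq. (7)
    return q, w / w.sum()                  # normalized view weights alpha_m
```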

Given the optimal weighted co-association matrix H, we perform hierarchical agglomerative clustering to obtain the final clustering result. Hierarchical agglomerative clustering is a bottom-up algorithm, and we select the "distance" criterion in our experiments: each sample starts as a singleton cluster, and pairs of clusters are merged to form flat clusters such that the original observations in each flat cluster have a cophenetic distance no greater than a threshold, which is a parameter that needs to be tuned.
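A minimal sketch of the consensus step is given below: the per-view averaged co-association matrices are combined with the weights α_m as in Eq. (1), 1 − H is treated as a dissimilarity matrix, and SciPy's hierarchical agglomerative clustering with the "distance" criterion produces the flat clusters. The choice of average linkage is an assumption, since the linkage method is not specified above, and the function name is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_clustering(h_bar_per_view, alpha, threshold=0.5):
    """Weighted consensus (Eq. (1)) followed by agglomerative clustering.

    h_bar_per_view: per-view averaged co-association matrices.
    alpha: view weights from Eq. (7); threshold: cophenetic-distance cut-off.
    """
    H = sum(a * h for a, h in zip(alpha, h_bar_per_view))
    D = 1.0 - (H + H.T) / 2.0          # turn similarity into dissimilarity
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='average')  # assumed linkage
    return fcluster(Z, t=threshold, criterion='distance')
```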


4 Experiment

4.1 Datasets

We choose five complete datasets on which to run our algorithm and compare it with seven other algorithms. Brief descriptions of the datasets are as follows:

  • LEAVES consists of 1600 samples with 100 features in each of three views. This dataset is derived from the UCI Machine Learning Repository. Sixteen leaf samples are collected for each of 100 plant species, and for each sample a shape descriptor, a fine-scale margin histogram and a texture histogram are given as three different views.

  • ORL is a dataset of face images of 40 distinct subjects from the AT&T Laboratory. There are ten different images for each of the 40 subjects. Each image is 92×112 pixels, with 256 grey levels per pixel. We used a subset with 10 subjects.

  • COIL10 is subsampled from the Columbia Object Image Library (COIL100) [6, 30], which contains 32×32 grey-scale images of 100 objects viewed from varying angles. The objects are placed on a motorized turntable against a black background, and the turntable is rotated through 360 degrees to vary the object pose with respect to a fixed color camera. Images of the objects are taken at pose intervals of 5 degrees, corresponding to 72 poses per object.

  • YALE [25] consists of 165 samples, each containing 4096, 3304 and 6075 features in three views. This dataset includes 165 raw-pixel face images taken under different conditions, such as lighting, facial expression and illumination.

  • UCIDIGIT consists of 2000 samples with 64, 76 and 216 features in its three views. This dataset is derived from the UCI Machine Learning Repository and represents features of handwritten numerals extracted from a collection of Dutch utility maps. There are 200 samples for each of the 10 classes. In our experiments, we only choose three feature sets as the three views: 76 Fourier coefficients of the character shapes, 216 profile correlations, and 64 Karhunen-Loève coefficients.

Considering that some of these datasets contain too many features per view, we perform PCA (principal component analysis) on YALE, UCIDIGIT and ORL to select a group of the most important features while keeping as much information as possible; the processed datasets retain over 95% of the information after PCA. The details of the datasets are listed in Table 1.
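For reference, a minimal sketch of this preprocessing step with scikit-learn is shown below; X_view is a hypothetical view matrix with samples in rows.

```python
from sklearn.decomposition import PCA

def reduce_view(X_view):
    """Keep the fewest principal components explaining >= 95% of the variance.

    X_view: one view's data matrix with samples in rows (an assumption).
    """
    return PCA(n_components=0.95).fit_transform(X_view)
```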

Table 1 Summary of the complete datasets

4.2 Dataset processing

Since we aim to deal with incomplete multi-view clustering, we remove some values from the complete datasets to obtain the incomplete datasets to work with. As mentioned in the Introduction, there are two missing value cases: any value missing and view missing. Accordingly, two missing mechanisms are designed to produce them. In the first mechanism, datasets are processed so that some entries are missing completely at random. In the second mechanism, datasets are processed according to the rule that a specific ratio of samples lose all the features in some views while the other views remain complete; this procedure is conducted in each view separately. The detailed procedures of the two mechanisms are as follows:

For the first mechanism, we generate an indicator matrix of the same size as the dataset matrix X^v (v indicates the v-th view); each entry is drawn from a uniform distribution on (0,1), and the elements below the threshold 0.05 (0.1 or 0.2) are replaced with NaN while the other elements are replaced with 1. In this way, an indicator matrix A^v is obtained for each view, and the element-wise product of the original dataset matrix X^v and the indicator matrix A^v yields the incomplete multi-view dataset. Note that the threshold represents the ratio of missing elements.

For the second mechanism, we first generate a random vector of length N (the number of samples) from a uniform distribution on (0,1), and replace the elements below the threshold 0.05 (0.1 or 0.2) with NaN and the other elements with 1. In this way, a random vector b^v is obtained for each view, and a diagonal matrix B^v with diagonal vector b^v is formed. If the original dataset is X^v, then B^v × X^v generates the view missing dataset.
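The two missing mechanisms can be sketched as follows; this is a NumPy illustration in which the function names and the assumption that samples are stored in rows are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def miss_any_value(X, ratio=0.1):
    """First mechanism: each entry is dropped independently with prob. ratio."""
    X_miss = X.astype(float)
    X_miss[rng.uniform(size=X.shape) < ratio] = np.nan
    return X_miss

def miss_view(X, ratio=0.1):
    """Second mechanism: a sample loses this whole view with prob. ratio
    (X holds one view with samples in rows -- an assumption of this sketch)."""
    X_miss = X.astype(float)
    X_miss[rng.uniform(size=X.shape[0]) < ratio, :] = np.nan
    return X_miss
```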

4.3 Compared methods

Corresponding to the two missing value cases, view missing and any value missing, we denote our method as MIEC_VM and MIEC_AM, respectively. To demonstrate their effectiveness, we compare them with the following algorithms.

  • Incomplete Multiview Spectral Clustering with Adaptive Graph Learning (IMSC) [13]: IMSC combines graph learning with spectral clustering into a framework for incomplete multi-view clustering. It uses a low-rank representation to construct the graph for each view and then conducts multi-view clustering in a co-regularized way. It can only deal with view missing datasets.

  • View Variation and View Heredity for Incomplete Multi-view Clustering (V3H) [15]: V3H introduces the genetic concepts of variation and heredity to exploit the multi-view data information. It decomposes the data subspace into a variation matrix and a heredity matrix and then integrates the unique information of different views to recover the incomplete dataset.

  • Perturbation-oriented Incomplete multi-view Clustering (PIC) [36]: This method aims to tackle the problem of incomplete multi-view clustering. It establishes a link between perturbation risk bounds and incomplete multi-view clustering, transforms the missing data problem into an incomplete similarity matrix computation, and minimizes the perturbation risk bound. However, it can only deal with the view missing case.

  • Doubly Aligned Incomplete Multi-view Clustering (DAIMC) [26]: DAIMC solves the incomplete multi-view clustering problem by introducing a weighted matrix to indicate whether the value in the corresponding position is missing in each view, and by learning a consensus matrix to combine the information from multiple views.

  • Multiple Incomplete Views Clustering (MIC) [37]: MIC performs weighted nonnegative matrix factorization on each incomplete view and then utilizes co-regularization to obtain the consensus matrix learned from the latent feature matrices of all views, which minimizes the influence of missing data.

  • Partial Multi-View Clustering (PVC) [31]: PVC can handle missing data in a dataset, but only for two-view datasets. Following [26], PVC is performed on all two-view combinations of the multi-view dataset and the best result is selected as the final result.

  • Best Single View (BSV): It performs non-negative matrix factorization on each single view respectively and then picks the best clustering result as the final result.

4.4 Evaluation metrics

To evaluate the clustering performance, we use four evaluation metrics: accuracy (ACC), Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and F-score [12, 38]. Their definitions are given as follows:

  • Accuracy (ACC). Accuracy discovers the one-to-one relationship between clusters and classes and measures the extent to which each cluster contains data points from the corresponding class. It sums up the matching degree over all matched class-cluster pairs.

    $$ \text{ACC}=\frac{1}{N}\max({\sum}_{X_{i},Y_{j}}N(X_{i},Y_{j})) $$
    (8)

    where X_i denotes the i-th cluster in the final result and Y_j is the true j-th class. N(X_i,Y_j) is the number of samples that belong to class j and are assigned to cluster i. Accuracy computes the maximum sum of N(X_i,Y_j) over all cluster-class pairs, where these pairs have no overlaps. A greater clustering accuracy means better clustering performance.

  • Normalized Mutual Information (NMI). For two random variables X and Y, the NMI is defined as:

    $$ \text{NMI}(\boldsymbol {X},\boldsymbol {Y})=\frac{I(\boldsymbol {X},\boldsymbol {Y})}{\sqrt{H(\boldsymbol {X}) H(\boldsymbol {Y})}} $$
    (9)

    where I(X,Y) is the mutual information between X and Y, while H(X) and H(Y) are the entropies of X and Y, respectively. Clearly, NMI takes values in [0,1], and a higher NMI means better clustering performance.

  • Adjusted Rand Index. The Adjusted Rand Index (ARI) is a corrected-for-chance version of the Rand index. ARI assumes the generalized hypergeometric distribution as the model of randomness to address the weakness of the Rand index, namely the effect of random labels on the evaluation result. Both ARI and the Rand index are calculated from the following four values.

    A true positive (TP) decision assigns two similar samples to the same cluster, and a true negative (TN) decision assigns two dissimilar samples to different clusters. There are two types of errors we can commit: a false positive (FP) decision assigns two dissimilar samples to the same cluster, and a false negative (FN) decision assigns two similar samples to different clusters. The Rand index measures the percentage of decisions that are correct.

    $$ \text{ARI}=\frac{2\times(TP \cdot TN - FN \cdot FP)}{(TP+FN)(TN+FN)+(TP+FP)(FP+TN)} $$
    (10)
  • F-score. The F-score combines the precision and recall of the model; it is defined as the harmonic mean of precision and recall.

    $$ \text{Precision}=\frac{TP}{TP+FP} $$
    (11)
    $$ \text{Recall}=\frac{TP}{TP+FN} $$
    (12)
    $$ \text{F-score}=2\frac{Precision \cdot Recall}{ Precision+Recall} $$
    (13)

For all these four evaluation metrics, the higher value means better performance.
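For reference, ACC can be computed via the Hungarian algorithm over the cluster-class contingency counts, while NMI and ARI are available in standard libraries; the sketch below is an illustration with a hypothetical function name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: optimal one-to-one matching of clusters to classes (Eq. (8))."""
    clusters, classes = np.unique(y_pred), np.unique(y_true)
    counts = np.zeros((clusters.size, classes.size))
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == k))
    row, col = linear_sum_assignment(-counts)   # maximize total matched count
    return counts[row, col].sum() / len(y_true)

# NMI and ARI come directly from scikit-learn:
# normalized_mutual_info_score(y_true, y_pred), adjusted_rand_score(y_true, y_pred)
```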

4.5 Experimental results

We run our MIEC_VM, MIEC_AM and the seven baselines on the five datasets with a 10% missing ratio to compare their performance. Table 2 and Fig. 2 report the performance in terms of the four evaluation metrics ACC, NMI, ARI and F-score.

Table 2 Clustering results of nine methods on five datasets with 10% missing ratio. The best one is emphasized with bold font and the second best one is shown underlined
Fig. 2

Clustering performance of nine algorithms on five datasets with 10% missing ratio

According to Table 2 and Fig. 2, we make the following observations:

  • Outstanding performance on datasets with view missing: Since the other compared algorithms can only deal with the view missing case, we compare our MIEC_VM with them on the five datasets with a 10% missing ratio. According to Fig. 2, MIEC_VM outperforms the baselines on all the datasets except for IMSC on the UCIDIGIT dataset. One reason for this is that our method adopts multiple imputation to deal with missing values, which takes the uncertainty of the missing values into consideration. The baselines DAIMC, PVC and BSV complete the incomplete datasets with the average features, which introduces large deviations. PIC uses a subtle method that transforms the incomplete features into incomplete similarity entries and makes use of the values in all views; as a result, it performs better than the other baselines. Our algorithm uses multiple imputation to obtain many complete datasets, performs k-means clustering on the imputed datasets, and then forms the weighted co-association matrix, which makes full use of all the observed data as well as the many imputed values from multiple imputation.

  • Handling datasets with any value missing: The most significant advantage of our algorithm is that it is capable of handling datasets with any missing values (MIEC_AM), whereas the baselines can only deal with datasets with view missing. Meanwhile, according to Table 2, the performance of MIEC_AM is always the best among all the algorithms, which indicates that MIEC_AM can successfully tackle datasets with any value missing.

4.6 Parameter analysis

4.6.1 The effect of missing ratio

In the above experiments, the missing ratio is fixed to 10%. Here, we explore the effect of the missing ratio. Tables 3 and 4 present the four metrics of MIEC_VM and MIEC_AM on all the datasets when the missing ratio is set to 5%, 10% and 20%. We can see that the performance decreases as the missing ratio increases, but the extent of the decline becomes smaller, which means that our algorithm is not sensitive to the missing ratio when it is below 20%.

Table 3 The performance of MIEC_VM with different missing ratio. The best one is emphasized with bold font
Table 4 The performance of MIEC_AM with different missing ratio. The best one is emphasized with bold font

4.6.2 The effect of methods of multiple imputation

To explore the effect of the MI method, we select some popular methods, including PMM, Sample, RF and midastouch, and run MIEC_VM and MIEC_AM on the LEAVES dataset. Tables 5 and 6 present the results in terms of the four clustering metrics ACC, NMI, F-score and ARI.

Table 5 The performance of MIEC_VM obtained with different methods of MI. The best one is emphasized with bold font
Table 6 The performance of MIEC_AM obtained with different methods of MI. The best one is emphasized with bold font and the second best one is shown underlined

It turns out that MIEC_VM performs best with the "PMM" method, while MIEC_AM performs best with the "Cart" method. Considering both cases, we select "PMM" as our imputation method in all the experiments.

4.6.3 The effect of weighted process

In the ensemble clustering stage, MIEC_AM and MIEC_VM assign a weight to each view's co-association matrix according to the correctness of the clustering result obtained from that single view's co-association matrix, and then use the weighted average co-association matrix of all views in the consensus function. To check the effectiveness of this weighting process, we conduct comparison experiments on two datasets, LEAVES and YALE, with and without the weighting process. The results are shown in Table 7.

Table 7 The Performance of MIEC with or without Weighting Process. The best one is emphasized with bold font

Table 7 compares the weighting process with simple averaging. We can see that both MIEC_VM and MIEC_AM with the weighting process perform better than without it, which indicates that the weight assignment improves the performance of our algorithm on all the evaluation metrics.

5 Conclusion

In this paper, we proposed an incomplete multi-view clustering method named MIEC. It is a two-stage algorithm: the first stage adopts multiple imputation, the most popular missing value handling technique, to complete the multi-view data, and the second stage designs a weighted ensemble clustering algorithm to implement multi-view clustering while considering the different contributions of the multiple views. To the best of our knowledge, this is the first time that multiple imputation and ensemble clustering have been combined to conduct incomplete multi-view clustering. Compared with many existing incomplete multi-view clustering methods that only tackle the view missing case, the proposed MIEC can deal with the more general any value missing case. Additionally, the experiments on several datasets demonstrate that the proposed method obtains better performance than existing incomplete multi-view clustering algorithms in terms of the four evaluation metrics ACC, NMI, F-score and ARI. Considering that splitting the method into two stages may leave a gap between them, in future work we are interested in proposing an incomplete multi-view clustering method that conducts multiple imputation and ensemble clustering simultaneously.