Introduction

With the development of information collection technology, data are often described by multiple views [1, 2]. A typical example is webpage classification, where a webpage can be described by two views: its text and its link information [3]. Generally speaking, different views provide complementary descriptions of the data, which enables multi-view learning to achieve better performance than single-view learning [4]. As one of the most representative multi-view learning methods, multi-view clustering has been widely applied in many fields, such as data analysis, information retrieval and image classification [5,6,7,8], and many advanced multi-view clustering methods have been proposed. However, due to noise, failure of data-collecting equipment and many other unforeseen factors, data can be lost randomly in a single view or in multiple views, so incomplete multi-view data are widespread [9].

Figure 1 shows two cases of incomplete multi-view data. In the first case, some samples contain the features of all views. In the second case, no sample contains the features of all views; that is, there are no complete common samples among the views. For the first case, many methods have been proposed, such as PVC (Partial multi-view clustering) [9], PMH (Learning to hash on partial multi-modal data) [10] and IMG (Incomplete multi-modal visual data grouping) [11], which are based on matrix factorization, and APMC (Anchors Bring Ease: An Embarrassingly Simple Approach to Partial Multi-view Clustering) [12]. However, these methods cannot handle the second case because they rely on the common part of multiple views. Furthermore, neither PVC nor IMG can handle multi-view data with more than two views. To our knowledge, only the recently proposed IMSC_AGL (Incomplete Multiview Spectral Clustering with Adaptive Graph Learning) [13] and AGC_IMC (Incomplete Multiview Clustering with Adaptive Graph) [14] can handle the second case of incomplete multi-view data. However, both methods are computationally complex and require many iterations to converge. In addition, each of them contains three parameters that must be tuned to achieve the optimal clustering effect. Therefore, they are difficult to use in practice.

Fig. 1 Two types of incomplete multi-view data

To address the above problems, we propose a novel method called Fast and General Incomplete Multi-View Adaptive Clustering (FGPMAC). FGPMAC adopts an adaptive neighbor assignment strategy to calculate the similarity matrix of each view independently and non-iteratively, without relying on the common part of multiple views, so it can handle the second case of incomplete multi-view data. Experimental results on many real datasets demonstrate the effectiveness and efficiency of FGPMAC. In summary, FGPMAC makes the following contributions.

  1. FGPMAC can independently calculate the similarity matrix of each view without requiring complete common samples among multiple views. It can handle both cases of incomplete multi-view data, so FGPMAC is a general incomplete multi-view clustering method.

  2. FGPMAC adopts an adaptive neighbor assignment strategy to calculate similarity, so tedious parameter tuning is avoided.

  3. FGPMAC has a non-iterative structure with low computational complexity and is suitable for large-scale datasets.

The rest of this paper is organized as follows. "Related Works" reviews related work. "The Proposed Method" describes FGPMAC in detail. "Experiments" presents the experimental results and analysis. "Conclusion" concludes this paper.

Related Works

For the first case of incomplete multi-view data, many methods have been proposed. BSV (Best Single View) is the simplest and most direct way to deal with incomplete multi-view clustering: it first fills in the missing values with the average value of each view and then clusters each view separately to select the best result. However, BSV does not make full use of the information of multiple views. SC[C] is a simple splicing method: it concatenates the features of multiple views into a single long vector, obtains a unified similarity matrix, and then performs spectral clustering. SC[A] is a simple fusion method: it generates a similarity matrix for each view, fuses these similarity matrices with equal weights, and finally performs spectral clustering. MultiNMF (Multi-view clustering via joint nonnegative matrix factorization) [16] is a clustering method for complete multi-view data; for incomplete data, all missing values are first filled in as in BSV before MultiNMF is applied. PVC [9] is a pioneering approach that uses non-negative matrix factorization with \({l}_{1}\) sparse regularization to determine the optimal low-dimensional subspace. Meanwhile, MIC (Multiple incomplete views clustering via weighted NMF with \({l}_{2,1}\) regularization) [15] extends MultiNMF with weighted non-negative matrix factorization and \({l}_{2,1}\) regularization to obtain highly reliable results. IMG [11] integrates PVC and manifold learning to adaptively capture the global structure of all instances, but this integration requires additional parameters. In addition, neither PVC nor IMG can process more than two views. Trivedi et al. [17] and Gao et al. [18] each proposed an incomplete multi-view clustering method based on kernel canonical correlation analysis; however, both methods require at least one complete view as a reference. APMC [12] achieves significantly improved computational efficiency, but it requires that some samples contain all the view features.

PVC, MIC, IMG, MultiNMF and APMC mentioned above cannot handle the second incomplete case. The newly proposed IMSC_AGL [13] and AGC_IMC [14] can handle it; they can perform clustering even when no sample contains all the view features. However, they have high computational complexities of \(O\left(\tau \left(k{n}^{3}+{n}^{3}+{\sum }_{v}{n}_{v}^{3}\right)\right)\) and \(O\left(\tau c{n}^{2}\right)\), respectively, where \(\tau\) is the number of iterations, k is the number of views, n is the number of samples, and c is the number of clusters. Because of this high computational complexity, IMSC_AGL and AGC_IMC are unsuitable for large-scale datasets. In addition, the three parameters of IMSC_AGL and those of AGC_IMC are not easy to tune in practice: the parameter values are chosen based on experience and adjusted according to the results, inappropriate values directly degrade the clustering accuracy, and different datasets require different optimal values. This greatly limits the practicality of both algorithms. Given these limitations of previous methods, further research on incomplete multi-view clustering is necessary.

The Proposed Method

Notations

Consider a dataset \(X=\{{X}^{(1)},{X}^{(2)},\dots ,{X}^{(v)}\}\), where \({X}^{\left(v\right)}=\left\{{x}_{1}^{(v)},{x}_{2}^{(v)},\dots ,{x}_{n}^{(v)}\right\}\in {R}^{{d}_{v}\times n}\) is the data matrix of the v-th view, v is the number of views, n is the total number of samples, and \({d}_{v}\) is the feature dimension of the v-th view. Incomplete multi-view clustering divides all of the above samples into c clusters, where c is predefined by the user. The notations used in this paper are summarized in Table 1.

Table 1 Summary of the notations

Method Framework

The proposed FGPMAC involves two stages, as shown in Fig. 2. In the first stage, FGPMAC adopts an adaptive neighbor assignment strategy to construct the similarity matrix of each view independently, thereby eliminating the need to tune parameters. Then, FGPMAC quantifies the contribution of each view and generates a consistent similarity matrix by fusing the similarity matrices of multiple views. In the second stage, FGPMAC performs spectral clustering [19] on the consistent similarity matrix to obtain the clustering results.

Fig. 2 The framework of the proposed incomplete multi-view clustering method

Consistent Representation Learning

Let \({X}^{(v)}=\left\{{x}_{1}^{(v)},{x}_{2}^{(v)},\dots ,{x}_{n}^{(v)}\right\}\in {R}^{{d}_{v}\times n}\) denote the samples of the v-th view (including the missing samples). We use \({Y}^{(v)}=\left[{y}_{1}^{(v)},{y}_{2}^{(v)},\dots ,{y}_{{n}_{v}}^{(v)}\right]\in {R}^{{d}_{v}\times {n}_{v}}\;({n}_{v}<n)\) to represent the samples that are not missing in the v-th view, where \({d}_{v}\) and \({n}_{v}\) are the feature dimension and the number of non-missing samples in the v-th view, respectively. We use \({Y}^{(v)}\) to construct the similarity matrix \({Z}^{(v)}\) of the non-missing samples.

We learn the similarity matrix by adaptively assigning the optimal neighbors to each sample. Nearby points have similar properties [20], so samples separated by a shorter distance should have a higher probability of being neighbors. The neighbors of \({x}_{i}\in {R}^{d\times 1}\) can be defined as the \(k\) samples nearest to \({x}_{i}\), where the distance between two samples is measured by the Euclidean distance. For the \(i\)-th sample \({x}_{i}\), every sample in \(\left\{{x}_{1},{x}_{2},\dots ,{x}_{n}\right\}\) can be a neighbor of \({x}_{i}\) with probability \({z}_{ij}\), and a shorter distance \({d}_{ij}={\Vert {x}_{i}-{x}_{j}\Vert }_{2}^{2}\) should be assigned a higher probability \({z}_{ij}\).

The similarity \({z}_{ij}^{(v)}\) stands for the probability that \({y}_{j}^{(v)}\) is a neighbor of \({y}_{i}^{(v)}\). Therefore, a good way to obtain the neighbor probabilities of the i-th sample is to solve the following problem:

$$\underset{{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\,{z}_{ij}^{(v)}\ge 0}{\mathrm{min}}\;{\sum }_{j=1}^{{n}_{v}}{\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\,{z}_{ij}^{(v)}$$
(1)

However, (1) has a trivial solution: only the nearest sample is the neighbor of \({x}_{i}\) with probability 1, and the probability of every other sample being a neighbor of \({x}_{i}\) is 0. This solution is obviously meaningless. To avoid it, a regularization term is added to (1), yielding

$$\underset{{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\,{z}_{ij}^{(v)}\ge 0}{\mathrm{min}}\;{\sum }_{j=1}^{{n}_{v}}{\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\,{z}_{ij}^{(v)}+\gamma {\sum }_{j=1}^{{n}_{v}}{\left({z}_{ij}^{(v)}\right)}^{2}$$
(2)

The second term in Eq. (2) is a regularization term and γ is the regularization parameter. \({z}_{ij}^{(v)}\) is the j-th entry of \({z}_{i}^{(v)}\), and \({n}_{v}\) is the number of samples present in view v. Let \({d}_{ij}^{(v)}={\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\) and let \({d}_{i}^{(v)}\) denote the vector whose j-th entry is \({d}_{ij}^{(v)}\); then (2) can be rewritten as

$$\underset{{z}_{i}^{(v)}}{\mathrm{min}}\;{\Vert {z}_{i}^{(v)}+\frac{{d}_{i}^{(v)}}{2\gamma }\Vert }_{2}^{2}\quad \text{s.t. }\;{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\;{z}_{ij}^{(v)}\ge 0$$
(3)

Considering the equality and inequality constraints in (3), we solve it using the Lagrangian function together with the Karush-Kuhn-Tucker (KKT) conditions [21]. The Lagrangian function of (3) is

$$L\left({z}_{i}^{(v)},\eta ,{\beta }_{i}\right)=\frac{1}{2}{\Vert {z}_{i}^{(v)}+\frac{{d}_{i}^{(v)}}{2\gamma }\Vert }_{2}^{2}-\eta \left({{z}_{i}^{(v)}}^{T}\mathbf{1}-1\right)-{\beta }_{i}^{T}{z}_{i}^{(v)}$$
(4)

where \(\eta\) and \({\beta }_{i}\) are the Lagrangian multipliers for the equality constraint and the inequality constraint, respectively. According to the KKT conditions, the following conditions (5) must be met at the optimal solution.

$$\left\{\begin{array}{l}{\left.\frac{\partial L}{\partial {z}_{i}^{(v)}}\right|}_{{z}_{i}^{(v)}={z}_{i}^{\prime}}=0\\ \eta \ne 0\\ {\beta }_{i}\ge 0\\ {\beta }_{i}^{T}{z}_{i}^{\prime}=0\\ {{z}_{i}^{\prime}}^{T}\mathbf{1}-1=0\\ {z}_{i}^{\prime}\ge 0\end{array}\right.$$
(5)

where \({z}_{i}^{\prime}\) is the optimal solution. Assuming the distances \({d}_{i1}^{(v)},{d}_{i2}^{(v)},\dots ,{d}_{i{n}_{v}}^{(v)}\) are sorted in ascending order, γ can be set as \(\gamma =\frac{k}{2}{d}_{i,k+1}^{(v)}-\frac{1}{2}{\sum }_{j=1}^{k}{d}_{ij}^{\left(v\right)}\) [22], which yields the optimal solution of (4):

$${z}_{ij}^{(v)}=\frac{{d}_{i,k+1}^{(v)}-{d}_{ij}^{(v)}}{k{d}_{i,k+1}^{(v)}-{\sum }_{{j}^{\prime}=1}^{k}{d}_{i{j}^{\prime}}^{(v)}}$$
(6)

It is preferable to learn a sparse \({z}_{i}^{(v)}\) that has exactly k nonzero values. The study in [23] revealed that sparse representations are robust to noise and outliers, and because the learned \({Z}^{(v)}\) is sparse, the computational burden of the subsequent spectral clustering is largely reduced. In addition, the number of neighbors k is much easier to tune than the regularization parameter γ, since k is an integer with an explicit meaning.
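For concreteness, the following is a minimal Python sketch (not the authors' released code) of the adaptive neighbor assignment in Eqs. (1)-(6); the function name, the use of NumPy, and the tie-handling fallback are our own illustrative choices.

```python
import numpy as np

def adaptive_neighbor_similarity(Y, k=10):
    """Y: (d_v, n_v) features of the non-missing samples in one view.
    Returns Z: (n_v, n_v) row-stochastic similarity with k nonzeros per row."""
    n = Y.shape[1]
    # pairwise squared Euclidean distances d_ij = ||y_i - y_j||_2^2
    sq = np.sum(Y ** 2, axis=0)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)

    Z = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                        # a sample is not its own neighbor
        idx = np.argsort(d)[:k + 1]          # k nearest neighbors plus the (k+1)-th
        d_k1 = d[idx[k]]                     # d_{i,k+1}
        denom = k * d_k1 - d[idx[:k]].sum()  # k*d_{i,k+1} - sum of the k smallest d_ij
        if denom <= 1e-12:                   # degenerate ties: fall back to uniform weights
            Z[i, idx[:k]] = 1.0 / k
        else:
            Z[i, idx[:k]] = (d_k1 - d[idx[:k]]) / denom   # closed form of Eq. (6)
    return Z
```

Each row of the returned matrix sums to one and has exactly k nonzero entries, matching the sparsity discussed above.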

After obtaining the similarity matrix \({Z}^{(v)}\) among the non-missing instances in the v-th view, the similarity matrix \({\overline{Z} }^{(v)}\) of all samples (including the missing and non-missing samples) in the v-th view can be obtained by the following formula:

$${\overline{Z} }^{(v)}= {{G}^{(v)}}^{T}{Z}^{(v)}{G}^{(v)}$$
(7)

where \({G}^{(v)}\in {R}^{{n}_{v}\times n}\) is the index matrix, whose entries associated with missing samples are forced to 0. The matrix \({G}^{(v)}\) is defined as follows:

$${G}_{ij}^{\left(v\right)}=\left\{\begin{array}{ll}1, & \text{if}\ {y}_{i}^{\left(v\right)}\ \text{is the original instance}\ {x}_{j}^{\left(v\right)}\\ 0, & \text{otherwise}\end{array}\right.$$
(8)

As in the previous analysis, deleting the samples that suffer from missing information or filling in the missing views with average values is unreasonable, whereas setting the corresponding entries of the view's similarity matrix to 0 is reasonable. In this way, the uncertain information in an incomplete view plays no negative role in learning the cluster representation of the data; only the available information is used to guide representation learning, which helps to obtain a highly reliable cluster representation and reduces the negative impact of missing information.

After obtaining the similarity matrix \({\overline{Z} }^{(v)}\) of each view, we fuse all the similarity matrices into a unified matrix, which is the key matrix of our method. To make full use of the information, the unified matrix is computed as \(S={\sum }_{v}{W}^{(v)}{\overline{Z} }^{(v)}\).

The non-missing samples carry the information that we can actually use: the more information a view provides, the larger its weight should be, and a view with more missing samples should be assigned a smaller weight so as to obtain a highly reliable consistent representation and reduce the negative influence of incomplete views. If the missing rates of the views differ considerably but the weights are assigned equally, the views with high missing rates may contribute too much inaccurate information and degrade the final clustering results. Therefore, \({W}^{(v)}\) is defined as

$${W}^{(v)}=\frac{{n}_{v}}{{\sum }_{v}{n}_{v}},\quad \text{s.t. }\;{\sum }_{v}{W}^{(v)}=1$$
(9)
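The expansion and fusion steps of Eqs. (7)-(9) can be sketched as follows; the helper name fuse_similarities and the representation of missing patterns by per-view index arrays (present_idx) are assumptions made only for illustration.

```python
import numpy as np

def fuse_similarities(Z_list, present_idx, n):
    """Z_list: per-view (n_v, n_v) similarities of non-missing samples.
    present_idx: per-view integer arrays mapping rows of Z^(v) to sample ids 0..n-1.
    Returns S: the (n, n) fused similarity matrix."""
    n_v = np.array([len(idx) for idx in present_idx], dtype=float)
    W = n_v / n_v.sum()                      # Eq. (9): W^(v) = n_v / sum_v n_v
    S = np.zeros((n, n))
    for Z, idx, w in zip(Z_list, present_idx, W):
        G = np.zeros((len(idx), n))          # Eq. (8): index matrix G^(v)
        G[np.arange(len(idx)), idx] = 1.0
        S += w * (G.T @ Z @ G)               # Eq. (7), accumulated with the view weights
    return S
```

In this sketch the rows and columns of the fused matrix that correspond to samples missing in a view simply receive zero contributions from that view, exactly as discussed above.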

Spectral Clustering On Fused Similarities

After obtaining the fused similarity matrix S, spectral clustering is performed to obtain the final clustering result. Compared with the traditional k-means algorithm, spectral clustering adapts better to different data distributions. Spectral clustering learns a low-dimensional representation \(F\in {R}^{n\times c}\) from the fused matrix S by solving problem (10): the eigenvectors corresponding to the c smallest eigenvalues of \(L\) are obtained by eigendecomposition, and k-means is then performed on them to get the clustering results.

$$\underset{{F}^{T}F=I}{\mathrm{min}}\;\mathrm{Tr}\left({F}^{T}LF\right)$$
(10)

where \(\mathrm{Tr}(\cdot)\) is the trace of a matrix, \(L=D-S\) is the Laplacian matrix [24], \(D\in {R}^{n\times n}\) is the diagonal matrix with \({D}_{ii}={\sum }_{j=1}^{n}{S}_{ij}\), and I is the identity matrix.
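A minimal sketch of this second stage is given below, assuming the unnormalized Laplacian of Eq. (10) and off-the-shelf SciPy/scikit-learn routines; a normalized Laplacian or the fast SVD mentioned in "Computational Complexity Analysis" could be substituted.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering_on_S(S, c, random_state=0):
    """Cluster the fused similarity matrix S into c groups via Eq. (10)."""
    S = (S + S.T) / 2.0                  # symmetrize the fused similarity
    L = np.diag(S.sum(axis=1)) - S       # unnormalized Laplacian L = D - S
    # F: eigenvectors of the c smallest eigenvalues, i.e. argmin Tr(F^T L F) s.t. F^T F = I
    _, F = eigh(L, subset_by_index=[0, c - 1])
    return KMeans(n_clusters=c, n_init=10, random_state=random_state).fit_predict(F)
```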

Algorithm 1 summarizes the calculation process described above.

Computational Complexity Analysis

In the first stage, consistent similarity learning, the time complexity of generating the similarity matrix S is \(O\left({\sum }_{v}{n}_{v}^{2}{d}_{v}\right)\), where \({n}_{v}\) is the number of instances present in view v and \({d}_{v}\) is the feature dimension of view v. As the number of missing samples increases, \({n}_{v}\) decreases and the complexity drops considerably. The complexity grows approximately linearly with the number of views, because each additional view requires computing one additional similarity matrix.

In the second stage, spectral clustering is performed on the fused similarity matrix S, and only the c largest singular values and their singular vectors are needed. In addition, given the properties of the similarity matrix S, performing a fast SVD for large-scale matrices [25] reduces the time complexity to \(O\left(n{c}^{2}\right)\).

Algorithm 1 The proposed FGPMAC
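As a usage illustration only, the sketches from the previous subsections can be chained in the spirit of Algorithm 1; the function names below refer to those hypothetical helpers and are not part of the released implementation.

```python
def fgpmac(X_views, present_idx, n, c, k=10):
    """X_views: list of (d_v, n_v) matrices holding only the non-missing samples.
    present_idx: list of index arrays mapping those columns to sample ids 0..n-1.
    Relies on adaptive_neighbor_similarity, fuse_similarities and
    spectral_clustering_on_S sketched in the previous subsections."""
    Z_list = [adaptive_neighbor_similarity(Y, k=k) for Y in X_views]  # stage 1: per-view graphs
    S = fuse_similarities(Z_list, present_idx, n)                     # stage 1: weighted fusion
    return spectral_clustering_on_S(S, c)                             # stage 2: spectral clustering
```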

Experiments

In this section, we conduct extensive experiments to demonstrate the effectiveness and efficiency of FGPMAC.

Datasets

Table 2 briefly describes the datasets used in the experiment. The datasets Flowers17 and USPS-MNIST with two views are used in the first incomplete case. The other datasets are used for the second incomplete case.

Table 2 Description of the used datasets

The Flowers17 Dataset

[26] is composed of 17 flower classes described by color, shape, and texture features. Following [12], we take the \(\chi^{2}\) distance matrices of the color and shape features as the two views.

The USPS-MNIST Dataset

is the combination of two famous handwritten digit datasets, USPS [27] and MNIST [28]. Following [12], we randomly select 50 images from each digit category of each dataset.

The 3Sources Dataset

[29] contains 948 news articles collected from 3 online news sources, namely, BBC, Reuters, and The Guardian. In our experiments, we select a subset containing 169 stories reported in all 3 sources.

The 100leaves Dataset

[30] contains three views with a total of 1600 samples divided into 100 categories.

The ORL Dataset

[31] contains 40 categories and a total of 400 images. For each image, we generate 4 feature vectors, including GIST (512), LBP (59), HOG (864), and CENT (254).

The NUS Dataset

[32] is a subset of NUS-WIDE, which contains a total of 1200 images, divided into 12 categories.

The Caltech101 Dataset

[33] contains 101 objects and a background category, and each object provides 40 to 800 images.

NUS-WIDE-Object

[32] is a dataset for object recognition consisting of 30,000 images in 31 classes. We use the 5 features provided by the website, i.e., a 65-dimension color histogram (CH), 226-dimension color moments (CM), a 145-dimension color correlation (CORR), a 74-dimension edge distribution, and a 129-dimension wavelet texture.

Comparison methods

  1. BSV (Best Single View) [11] first fills in the missing values with the average value of each view and then clusters each view separately to select the best result.

  2. SC[C] connects the features of multiple views into a single long vector, then obtains a unified similarity matrix and performs spectral clustering.

  3. SC[A] generates a similarity matrix for each view, fuses these similarity matrices with equal weights, and finally performs spectral clustering.

  4. MultiNMF [16] first fills in all missing values as in BSV and then performs MultiNMF.

  5. PVC [9] establishes a latent subspace in which the representations of the same sample from different views are close to each other.

  6. MIC [15] extends MultiNMF with weighted NMF to obtain more reliable results.

  7. IMG [11] integrates the global structure of the data into subspace learning.

  8. APMC [12] utilizes anchors to reconstruct instance-to-instance relationships for clustering.

  9. IMSC_AGL [13] exploits graph learning and spectral clustering techniques to learn a common representation for incomplete multi-view clustering.

  10. AGC_IMC [14] develops a joint framework for graph completion and consensus representation learning.

Experimental Settings

We construct two types of incomplete multi-view data in our experiments.

There are paired samples among the views, i.e., some samples contain the features of all views.

We set the partial data ratio (PDR) from 10 to 90% with a 20% interval, where 0% means that all views are complete. The missing samples are evenly distributed across all views, and each sample is available in at least one view.

There are no paired samples among the views, i.e., no sample contains the features of all views.

We randomly delete approximately 30%, 50%, and 70% of the samples from the multi-view dataset and then evaluate clustering performance under different missing rates.
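As an illustration of how such missing patterns can be simulated (the paper does not prescribe a specific procedure), the following sketch hides a given fraction of samples per view while guaranteeing that every sample remains observed in at least one view; all names are hypothetical.

```python
import numpy as np

def make_incomplete_masks(n, num_views, missing_rate, seed=0):
    """Return a boolean (num_views, n) mask; True means the sample is observed in that view."""
    rng = np.random.default_rng(seed)
    mask = rng.random((num_views, n)) >= missing_rate
    # repair samples that became unobserved in every view
    lost = np.where(~mask.any(axis=0))[0]
    mask[rng.integers(0, num_views, size=lost.size), lost] = True
    return mask
```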

To evaluate the clustering performance, two classic clustering evaluation indicators are adopted, namely, clustering accuracy (ACC) and normalized mutual information (NMI). Both indicators range from 0 to 1, and larger values indicate better clustering performance. For fairness, we run every comparison method 10 times on each dataset and report the average clustering results to eliminate the uncertainty caused by randomness. The parameters of the comparison methods are set according to the authors' suggestions or default values. For reproducibility, the code and datasets are released at https://github.com/leiyang617/code_for_FGIMAC.
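For reference, the two metrics can be computed as sketched below, assuming the usual Hungarian-matching definition of clustering accuracy and scikit-learn's NMI implementation; whether this matches the exact evaluation code used in the paper is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score  # NMI(y_true, y_pred)

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    overlap = np.zeros((len(clusters), len(classes)))
    for i, cl in enumerate(clusters):
        for j, cs in enumerate(classes):
            overlap[i, j] = np.sum((y_pred == cl) & (y_true == cs))
    row, col = linear_sum_assignment(-overlap)   # maximize the matched counts
    return overlap[row, col].sum() / len(y_true)
```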

Results and Analysis

Figures 3 and 4 report the ACC and NMI at different missing rates in the first incomplete case, whereas Figs. 5, 6 and 7 report these indicators in the second incomplete case. Table 3 shows the run time of the different methods on various datasets with a missing rate of approximately 50%. Due to space limitations, we only report the results obtained when the missing rate is 50%, given that similar trends are observed at the other missing rates.

Table 3 Running time (seconds) on different datasets with a missing rate of approximately 50%

The clustering performance of all methods declines as the missing rate increases. Figures 3 and 4 show that BSV, SC[C], SC[A], MultiNMF and MIC are unsatisfactory in most cases, indicating that filling in the missing samples with average values is not good enough to solve the incomplete multi-view clustering problem. BSV uses only a single view to obtain the clustering results and cannot exploit the complementary information among multiple views. SC[C] concatenates all views into a single long view and ignores the differences in the distributions of the views. SC[A] treats all views equally without considering their integrity or credibility, which is unreasonable for incomplete multi-view clustering. PVC, IMG, APMC, IMSC_AGL and AGC_IMC demonstrate acceptable performance, suggesting that using the complementary information among views is an effective approach.

Fig. 3 Experiment results on the Flowers17 dataset in the first incomplete case

In the first incomplete case, Fig. 3 shows that FGPMAC obtains better results than the other methods on the Flowers17 dataset. Figure 4 shows that on the USPS-MNIST dataset, FGPMAC obtains better results than the other methods except when the PDR is 90%.

In the second incomplete case, Fig. 5 shows that FGPMAC outperforms the other methods on all datasets when the missing rate is 30%. Figure 6 shows that when the missing rate is 50%, FGPMAC achieves the best results on all datasets except the ORL and NUS datasets. Figure 7 shows that when the missing rate is 70%, FGPMAC obtains the best results on all datasets except the 3Sources dataset. Even when FGPMAC fails to obtain the optimal result, the gap between its results and the optimal ones is very small.

Fig. 4 Experiment results on the USPS-MNIST dataset in the first incomplete case

Fig. 5 Experiment results on different datasets with 30% missing rate in the second incomplete case

Fig. 6 Experiment results on different datasets with 50% missing rate in the second incomplete case

Although IMSC_AGL and AGC_IMC achieve the second-best clustering performance, their computational complexity is unsatisfactory.

Table 3 shows that the run times of IMSC_AGL and AGC_IMC are much higher than those of the other methods, and our method has an absolute advantage in run time. For instance, on the 3Sources dataset, IMSC_AGL takes 10.86 s and AGC_IMC takes 5.09 s, while our method takes only 0.18 s. On the Caltech101 dataset, the run time of IMSC_AGL is 3041.25 s and that of AGC_IMC is 1278.09 s, while ours is only 173.07 s; on the 100leaves dataset in particular, the run times of AGC_IMC and IMSC_AGL are about hundreds of times longer than ours. In addition, the experimental results on the Caltech101 and NUS-WIDEOBJ datasets show that the run times of IMSC_AGL and AGC_IMC increase rapidly with the dataset scale, which is clearly unacceptable for large-scale datasets.

To further explore how the running time of the proposed algorithm changes with the number of views and the sample missing rate, we conduct experiments on the Caltech101 dataset. The results are shown in Figs. 8 and 9. As the sample missing rate increases, the clustering accuracy decreases and the running time drops significantly. ACC and NMI decrease dramatically when the missing rate reaches 70%, because at such a high missing rate little information remains to be used; data with a missing rate of more than 70% are rare in practical applications. As the number of views increases, the running time increases accordingly, which is consistent with the analysis in "Computational Complexity Analysis", and ACC and NMI also increase, indicating that combining features from multiple views helps to improve the clustering performance.

Fig. 7 Experiment results on different datasets with 70% missing rate in the second incomplete case

Fig. 8 Running time on the Caltech101 dataset with different missing rates

We test the parameter sensitivity to further evaluate the performance of FGPMAC, which only requires the number of neighbors \(k\) to be tuned. We evaluate the clustering performance for \(k\) in the range {4, 6, ..., 14}. Due to limited space, we only report the results on the ORL dataset, given that similar trends are observed on the other datasets. As shown in Fig. 10, under the same PDR the fluctuations of ACC and NMI across different values of \(k\) are very small, so \(k\) has little influence on the final clustering result. In other words, FGPMAC has low sensitivity to \(k\) over a relatively wide range.

Fig. 9 Running time on the Caltech101 dataset with different numbers of views

Fig. 10 Influence of the number of nearest neighbors k on the ORL dataset with different PDR settings

Conclusion

We propose FGPMAC, a simple and effective method for incomplete multi-view clustering that overcomes many shortcomings of existing methods. Compared with traditional methods, FGPMAC is more flexible because it can handle both incomplete cases of multi-view data. We perform experiments on multi-view datasets with different missing rates, and the experimental results show that FGPMAC achieves higher clustering performance with less run time.

However, most existing incomplete multi-view clustering methods (including ours) require the number of clusters to be specified in advance. As future work, we plan to draw on COMIC (COMIC: Multi-view Clustering Without Parameter Selection) [34] and adapt our method so that clustering can be performed without knowing the number of clusters.