Introduction

With the development of information collection technology, data are often described by multiple views [1, 2]. A typical example is webpage classification, where a webpage can be described by two views: its text and its link information [3]. Generally speaking, different views provide complementary descriptions of the data, which enables multi-view learning to achieve better performance than single-view learning [4]. As one of the most representative multi-view learning methods, multi-view clustering has been widely applied in many fields, such as data analysis, information retrieval and image classification [5,6,7,8], and many advanced multi-view clustering methods have been proposed. However, due to noise, failure of data-collecting equipment and many other unforeseen factors, data can be lost randomly in a single view or in multiple views, so incomplete multi-view data are widespread [9].

Figure 1 shows two cases of incomplete multi-view data. In the first case, some samples contain the features of all views. In the second case, no sample contains the features of all views; that is, there are no complete common samples among the views. For the first case, many methods have been proposed, such as PVC (Partial multi-view clustering) [9], PMH (Learning to hash on partial multi-modal data) [10] and IMG (Incomplete multi-modal visual data grouping) [11], which are based on matrix factorization, and APMC (Anchors Bring Ease: An Embarrassingly Simple Approach to Partial Multi-view Clustering) [12]. However, these methods cannot handle the second case because they rely on the common part of multiple views. Furthermore, neither PVC nor IMG can handle multi-view data with more than two views. To our knowledge, only the recently proposed IMSC_AGL (Incomplete Multiview Spectral Clustering with Adaptive Graph Learning) [13] and AGC_IMC (Incomplete Multiview Clustering with Adaptive Graph) [14] can handle the second case of incomplete multi-view data. However, both methods are computationally complex and require many iterations to converge. In addition, each of them contains three parameters that must be tuned to achieve the optimal clustering effect. Therefore, they are difficult to use in practice.

Fig. 1 Two types of incomplete multi-view data

To address the above problems, we propose a novel method called Fast and General Incomplete Multi-View Adaptive Clustering (FGPMAC). FGPMAC adopts an adaptive neighbor assignment strategy to calculate the similarity matrix of each view independently and non-iteratively, without relying on the common part of multiple views, so it can handle the second case of incomplete multi-view data. Experimental results on many real datasets demonstrate the effectiveness and efficiency of FGPMAC. In summary, FGPMAC makes the following contributions.

  1. FGPMAC can independently calculate the similarity matrix of each view without requiring complete common samples among multiple views. It can handle both cases of incomplete multi-view data, so FGPMAC is a general incomplete multi-view clustering method.

  2. FGPMAC adopts an adaptive neighbor assignment strategy to calculate similarity, so tedious parameter tuning is avoided.

  3. FGPMAC has a non-iterative structure with low computational complexity and is suitable for large-scale datasets.

The rest of this paper is organized as follows. "Related Works" reviews related work. "The Proposed Method" describes FGPMAC in detail. "Experiments" presents the experimental results and analysis. "Conclusion" concludes this paper.

Related Works

For the first case of incomplete multi-view data, many methods have been proposed. BSV (Best Single View) is the simplest and most direct way to deal with incomplete multi-view clustering: it first fills in the missing values with the average value of each view and then clusters each view separately to select the best result. However, BSV does not make full use of the information of multiple views. SC[C] is a simple splicing method: it concatenates the features of multiple views into a single long vector, obtains a unified similarity matrix, and then performs spectral clustering. SC[A] is a simple fusion method: it generates a similarity matrix for each view, fuses these similarity matrices with equal weights, and finally performs spectral clustering. MultiNMF (Multi-view clustering via joint nonnegative matrix factorization) [16] is a clustering method for complete multi-view data; for incomplete data, all missing values are first filled in as in BSV before MultiNMF is applied. PVC [9] is a pioneering approach that uses non-negative matrix factorization with \({l}_{1}\) sparse regularization to determine the optimal low-dimensional subspace. Meanwhile, MIC (Multiple incomplete views clustering via weighted NMF with \({l}_{2,1}\) regularization) [15] extends MultiNMF with weighted non-negative matrix factorization and \({l}_{2,1}\) regularization to obtain highly reliable results. IMG [11] integrates PVC and manifold learning to adaptively capture the global structure of all instances, but this integration requires additional parameters. In addition, neither PVC nor IMG can process more than two views. Trivedi et al. [17] and Gao et al. [18] each proposed an incomplete multi-view clustering method based on kernel canonical correlation analysis; however, both methods require at least one complete view as a reference. APMC [12] achieves significantly improved computational efficiency, but it requires that some samples contain all the view features.

PVC, MIC, IMG, MultiNMF and APMC mentioned above cannot handle the second incomplete case. The newly proposed IMSC_AGL [13] and AGC_IMC [14] can handle it; they can perform clustering even when no sample contains all the view features. However, they have high computational complexities of \(O\left(\tau \left(k{n}^{3}+{n}^{3}+{\sum }_{v}{n}_{v}^{3}\right)\right)\) and \(O\left(\tau c{n}^{2}\right)\), respectively, where \(\tau\) is the number of iterations, k is the number of views, n is the number of samples, and c is the number of clusters. Because of this high computational complexity, IMSC_AGL and AGC_IMC are unsuitable for large-scale datasets. In addition, the three parameters of IMSC_AGL and those of AGC_IMC are not easy to tune in practice: the parameter values are chosen based on experience and adjusted according to the results, inappropriate values directly degrade the clustering accuracy, and different datasets require different optimal values. This greatly limits the practicality of both algorithms. Given these limitations of previous methods, further research on incomplete multi-view clustering is necessary.

The Proposed Method

Notations

Consider a dataset \(X=\{{X}^{(1)},{X}^{(2)},\dots ,{X}^{(v)}\}\), where \({X}^{\left(v\right)}=\left\{{x}_{1}^{(v)},{x}_{2}^{(v)},\dots ,{x}_{n}^{(v)}\right\}\in {R}^{{d}_{v}\times n}\) is the data matrix of the v-th view, v is the number of views, n is the total number of samples, and \({d}_{v}\) is the feature dimension of the v-th view. Incomplete multi-view clustering divides all of the above samples into c clusters, where c is predefined by the user. The notations used in this paper are summarized in Table 1.

Table 1 Summary of the notations

Method Framework

The proposed FGPMAC involves two stages, as shown in Fig. 2. In the first stage, FGPMAC adopts an adaptive neighbor assignment strategy to construct the similarity matrix of each view independently, thereby eliminating the need to tune parameters. Then, FGPMAC quantifies the contribution of each view and generates a consistent similarity matrix by fusing the similarity matrices of multiple views. In the second stage, FGPMAC performs spectral clustering [19] on the consistent similarity matrix to obtain the clustering results.

Fig. 2 The framework of the proposed incomplete multi-view clustering method

Consistent Representation Learning

Let \({X}^{(v)}=\left\{{x}_{1}^{(v)},{x}_{2}^{(v)},\dots ,{x}_{n}^{(v)}\right\}\in {R}^{{d}_{v}\times n}\) denote the samples of the v-th view (including the missing samples). We use \({Y}^{(v)}=\left[{y}_{1}^{(v)},{y}_{2}^{(v)},\dots ,{y}_{{n}_{v}}^{(v)}\right]\in {R}^{{d}_{v}\times {n}_{v}}\;({n}_{v}<n)\) to represent the samples that are not missing in the v-th view, where \({d}_{v}\) and \({n}_{v}\) are the feature dimension and the number of non-missing samples in the v-th view, respectively. We use \({Y}^{(v)}\) to construct the similarity matrix \({Z}^{(v)}\) of the non-missing samples.

We learn the similarity matrix by adaptively assigning the optimal neighbors to each sample. Nearby points have similar properties [20], so samples separated by a shorter distance should have a higher probability of being neighbors. The neighbors of \({x}_{i}\in {R}^{d\times 1}\) can be defined as the \(k\) samples nearest to \({x}_{i}\), where the distance between two samples is measured by the Euclidean distance. For the \(i\)-th sample \({x}_{i}\), every sample in \(\left\{{x}_{1},{x}_{2},\dots ,{x}_{n}\right\}\) can be a neighbor of \({x}_{i}\) with probability \({z}_{ij}\), and a shorter distance \({d}_{ij}={\Vert {x}_{i}-{x}_{j}\Vert }_{2}^{2}\) should be assigned a higher probability \({z}_{ij}\).

The similarity \({z}_{ij}^{(v)}\) stands for the probability that \({y}_{j}^{(v)}\) is a neighbor of \({y}_{i}^{(v)}\). Therefore, a good way to obtain the neighbor probabilities of the i-th sample is to solve the following problem:

$$\underset{{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\,{z}_{ij}^{(v)}\ge 0}{\mathrm{min}}\;{\sum }_{j=1}^{{n}_{v}}{\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\,{z}_{ij}^{(v)}$$
(1)

However, (1) has a trivial solution: only the nearest sample is the neighbor of \({x}_{i}\) with probability 1, and the probability of every other sample being a neighbor of \({x}_{i}\) is 0. This solution is obviously meaningless. To avoid it, a regularization term is added to (1), yielding

$$\underset{{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\,{z}_{ij}^{(v)}\ge 0}{\mathrm{min}}\;{\sum }_{j=1}^{{n}_{v}}{\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\,{z}_{ij}^{(v)}+\gamma {\sum }_{j=1}^{{n}_{v}}{\left({z}_{ij}^{(v)}\right)}^{2}$$
(2)

The second term in Eq. (2) is a regularization term and γ is the regularization parameter. \({z}_{ij}^{(v)}\) is the j-th entry of \({z}_{i}^{(v)}\), and \({n}_{v}\) is the number of samples present in view v. Let \({d}_{ij}^{(v)}={\Vert {y}_{i}^{(v)}-{y}_{j}^{(v)}\Vert }_{2}^{2}\) and let \({d}_{i}^{(v)}\) denote the vector whose j-th entry is \({d}_{ij}^{(v)}\); then (2) can be rewritten as

$$\underset{{z}_{i}^{(v)}}{\mathrm{min}}\;{\Vert {z}_{i}^{(v)}+\frac{{d}_{i}^{(v)}}{2\gamma }\Vert }_{2}^{2}\quad \text{s.t. }\;{{z}_{i}^{(v)}}^{T}\mathbf{1}=1,\;{z}_{ij}^{(v)}\ge 0$$
(3)

Considering the equality and inequality constraints in (3), we solve it using the Lagrangian function together with the Karush-Kuhn-Tucker (KKT) conditions [21]. The Lagrangian function of (3) is

$$L\left({z}_{i}^{(v)},\eta ,{\beta }_{i}\right)=\frac{1}{2}{\Vert {z}_{i}^{(v)}+\frac{{d}_{i}^{(v)}}{2\gamma }\Vert }_{2}^{2}-\eta \left({{z}_{i}^{(v)}}^{T}\mathbf{1}-1\right)-{\beta }_{i}^{T}{z}_{i}^{(v)}$$
(4)

where \(\eta\) and \({\beta }_{i}\) are the Lagrangian multipliers for the equality constraint and the inequality constraint, respectively. According to the KKT conditions, the following conditions (5) must be met at the optimal solution.

$$\left\{\begin{array}{l}{\left.\frac{\partial L}{\partial {z}_{i}^{(v)}}\right|}_{{z}_{i}^{(v)}={z}_{i}^{\prime}}=0\\ \eta \ne 0\\ {\beta }_{i}\ge 0\\ {\beta }_{i}^{T}{z}_{i}^{\prime}=0\\ {{z}_{i}^{\prime}}^{T}\mathbf{1}-1=0\\ {z}_{i}^{\prime}\ge 0\end{array}\right.$$
(5)

where \({z}_{i}^{\prime}\) is the optimal solution. Assuming the distances \({d}_{i1}^{(v)},{d}_{i2}^{(v)},\dots ,{d}_{i{n}_{v}}^{(v)}\) are sorted in ascending order, γ can be set as \(\gamma =\frac{k}{2}{d}_{i,k+1}^{(v)}-\frac{1}{2}{\sum }_{j=1}^{k}{d}_{ij}^{\left(v\right)}\) [22], which yields the optimal solution of (4):

$${z}_{ij}^{(v)}=\frac{{d}_{i,k+1}^{(v)}-{d}_{ij}^{(v)}}{k{d}_{i,k+1}^{(v)}-{\sum }_{{j}^{\prime}=1}^{k}{d}_{i{j}^{\prime}}^{(v)}}$$
(6)

It is preferable to learn a sparse \({z}_{i}^{(v)}\) that has exactly k nonzero values. The study in [23] revealed that sparse representations are robust to noise and outliers, and because the learned \({Z}^{(v)}\) is sparse, the computational burden of the subsequent spectral clustering is largely reduced. In addition, the number of neighbors k is much easier to tune than the regularization parameter γ, since k is an integer with an explicit meaning.
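For concreteness, the following is a minimal Python sketch (not the authors' released code) of the adaptive neighbor assignment in Eqs. (1)-(6); the function name, the use of NumPy, and the tie-handling fallback are our own illustrative choices.

```python
import numpy as np

def adaptive_neighbor_similarity(Y, k=10):
    """Y: (d_v, n_v) features of the non-missing samples in one view.
    Returns Z: (n_v, n_v) row-stochastic similarity with k nonzeros per row."""
    n = Y.shape[1]
    # pairwise squared Euclidean distances d_ij = ||y_i - y_j||_2^2
    sq = np.sum(Y ** 2, axis=0)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)

    Z = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                        # a sample is not its own neighbor
        idx = np.argsort(d)[:k + 1]          # k nearest neighbors plus the (k+1)-th
        d_k1 = d[idx[k]]                     # d_{i,k+1}
        denom = k * d_k1 - d[idx[:k]].sum()  # k*d_{i,k+1} - sum of the k smallest d_ij
        if denom <= 1e-12:                   # degenerate ties: fall back to uniform weights
            Z[i, idx[:k]] = 1.0 / k
        else:
            Z[i, idx[:k]] = (d_k1 - d[idx[:k]]) / denom   # closed form of Eq. (6)
    return Z
```

Each row of the returned matrix sums to one and has exactly k nonzero entries, matching the sparsity discussed above.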

After obtaining the similarity matrix \({Z}^{(v)}\) among the non-missing instances in the v-th view, the similarity matrix \({\overline{Z} }^{(v)}\) of all samples (including the missing and non-missing samples) in the v-th view can be obtained by the following formula:

$${\overline{Z} }^{(v)}= {{G}^{(v)}}^{T}{Z}^{(v)}{G}^{(v)}$$
(7)

where \({G}^{(v)}\in {R}^{{n}_{v}\times n}\) is the index matrix, whose entries associated with missing samples are forced to 0. The matrix \({G}^{(v)}\) is defined as follows:

$${G}_{ij}^{\left(v\right)}=\left\{\begin{array}{ll}1, & \text{if}\ {y}_{i}^{\left(v\right)}\ \text{is the original instance}\ {x}_{j}^{\left(v\right)}\\ 0, & \text{otherwise}\end{array}\right.$$
(8)

As in the previous analysis, deleting the samples that suffer from missing information or filling in the missing views with average values is unreasonable, whereas setting the corresponding entries of the view's similarity matrix to 0 is reasonable. In this way, the uncertain information in an incomplete view plays no negative role in learning the cluster representation of the data; only the available information is used to guide representation learning, which helps to obtain a highly reliable cluster representation and reduces the negative impact of missing information.

After obtaining the similarity matrix \({\overline{Z} }^{(v)}\) of each view, we fuse all the similarity matrices into a unified matrix, which is the key matrix of our method. To make full use of the information, the unified matrix is computed as \(S={\sum }_{v}{W}^{(v)}{\overline{Z} }^{(v)}\).

The non-missing samples carry the information that we can actually use: the more information a view provides, the larger its weight should be, and a view with more missing samples should be assigned a smaller weight so as to obtain a highly reliable consistent representation and reduce the negative influence of incomplete views. If the missing rates of the views differ considerably but the weights are assigned equally, the views with high missing rates may contribute too much inaccurate information and degrade the final clustering results. Therefore, \({W}^{(v)}\) is defined as

$${W}^{(v)}=\frac{{n}_{v}}{{\sum }_{v}{n}_{v}},\quad \text{s.t. }\;{\sum }_{v}{W}^{(v)}=1$$
(9)
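The expansion and fusion steps of Eqs. (7)-(9) can be sketched as follows; the helper name fuse_similarities and the representation of missing patterns by per-view index arrays (present_idx) are assumptions made only for illustration.

```python
import numpy as np

def fuse_similarities(Z_list, present_idx, n):
    """Z_list: per-view (n_v, n_v) similarities of non-missing samples.
    present_idx: per-view integer arrays mapping rows of Z^(v) to sample ids 0..n-1.
    Returns S: the (n, n) fused similarity matrix."""
    n_v = np.array([len(idx) for idx in present_idx], dtype=float)
    W = n_v / n_v.sum()                      # Eq. (9): W^(v) = n_v / sum_v n_v
    S = np.zeros((n, n))
    for Z, idx, w in zip(Z_list, present_idx, W):
        G = np.zeros((len(idx), n))          # Eq. (8): index matrix G^(v)
        G[np.arange(len(idx)), idx] = 1.0
        S += w * (G.T @ Z @ G)               # Eq. (7), accumulated with the view weights
    return S
```

In this sketch the rows and columns of the fused matrix that correspond to samples missing in a view simply receive zero contributions from that view, exactly as discussed above.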

Spectral Clustering On Fused Similarities

After obtaining the fused similarity matrix S, spectral clustering is performed to obtain the final clustering result. Compared with the traditional k-means algorithm, spectral clustering adapts better to different data distributions. Spectral clustering learns a low-dimensional representation \(F\in {R}^{n\times c}\) from the fused matrix S by solving problem (10): the eigenvectors corresponding to the c smallest eigenvalues of \(L\) are obtained by eigendecomposition, and k-means is then performed on them to get the clustering results.

$$\underset{{F}^{T}F=I}{\mathrm{min}}\;\mathrm{Tr}\left({F}^{T}LF\right)$$
(10)

where \(\mathrm{Tr}(\cdot)\) is the trace of a matrix, \(L=D-S\) is the Laplacian matrix [24], \(D\in {R}^{n\times n}\) is the diagonal matrix with \({D}_{ii}={\sum }_{j=1}^{n}{S}_{ij}\), and I is the identity matrix.
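A minimal sketch of this second stage is given below, assuming the unnormalized Laplacian of Eq. (10) and off-the-shelf SciPy/scikit-learn routines; a normalized Laplacian or the fast SVD mentioned in "Computational Complexity Analysis" could be substituted.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering_on_S(S, c, random_state=0):
    """Cluster the fused similarity matrix S into c groups via Eq. (10)."""
    S = (S + S.T) / 2.0                  # symmetrize the fused similarity
    L = np.diag(S.sum(axis=1)) - S       # unnormalized Laplacian L = D - S
    # F: eigenvectors of the c smallest eigenvalues, i.e. argmin Tr(F^T L F) s.t. F^T F = I
    _, F = eigh(L, subset_by_index=[0, c - 1])
    return KMeans(n_clusters=c, n_init=10, random_state=random_state).fit_predict(F)
```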

Algorithm 1 summarizes the calculation process described above.

Computational Complexity Analysis

In the first stage, consistent similarity learning, the time complexity of generating the similarity matrix S is \(O\left({\sum }_{v}{n}_{v}^{2}{d}_{v}\right)\), where \({n}_{v}\) is the number of instances present in view v and \({d}_{v}\) is the feature dimension of view v. As the number of missing samples increases, \({n}_{v}\) decreases and the complexity drops considerably. The complexity grows approximately linearly with the number of views, because each additional view requires computing one additional similarity matrix.

In the second stage, spectral clustering is performed on the fused similarity matrix S, and only the c largest singular values and their singular vectors are needed. In addition, given the properties of the similarity matrix S, performing a fast SVD for large-scale matrices [25] reduces the time complexity to \(O\left(n{c}^{2}\right)\).

Algorithm 1 The proposed FGPMAC
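As a usage illustration only, the sketches from the previous subsections can be chained in the spirit of Algorithm 1; the function names below refer to those hypothetical helpers and are not part of the released implementation.

```python
def fgpmac(X_views, present_idx, n, c, k=10):
    """X_views: list of (d_v, n_v) matrices holding only the non-missing samples.
    present_idx: list of index arrays mapping those columns to sample ids 0..n-1.
    Relies on adaptive_neighbor_similarity, fuse_similarities and
    spectral_clustering_on_S sketched in the previous subsections."""
    Z_list = [adaptive_neighbor_similarity(Y, k=k) for Y in X_views]  # stage 1: per-view graphs
    S = fuse_similarities(Z_list, present_idx, n)                     # stage 1: weighted fusion
    return spectral_clustering_on_S(S, c)                             # stage 2: spectral clustering
```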

Experiments

In this section, we conduct extensive experiments to demonstrate the effectiveness and efficiency of FGPMAC.

Datasets

Table 2 briefly describes the datasets used in the experiment. The datasets Flowers17 and USPS-MNIST with two views are used in the first incomplete case. The other datasets are used for the second incomplete case.

Table 2 Description of the used datasets

The Flowers17 Dataset

[26] is composed of 17 flower classes described by color, shape, and texture features. Following [12], we take the \(\chi^{2}\) distance matrices of the color and shape features as the two views.

The USPS-MNIST Dataset

is the combination of two famous handwritten digit datasets, USPS [27] and MNIST [28]. Following [12], we randomly select 50 images from each digit category of each dataset.

The 3Sources Dataset

[29] contains 948 news articles collected from 3 online news sources, namely, BBC, Reuters, and The Guardian. In our experiments, we select a subset containing 169 stories reported in all 3 sources.

The 100leaves Dataset

[30] contains three views with a total of 1600 samples divided into 100 categories.

The ORL Dataset

[31] contains 40 categories and a total of 400 images. For each image, we generate 4 feature vectors, including GIST (512), LBP (59), HOG (864), and CENT (254).

The NUS Dataset

[32] is a subset of NUS-WIDE, which contains a total of 1200 images, divided into 12 categories.

The Caltech101 Dataset

[33] contains 101 objects and a background category, and each object provides 40 to 800 images.

NUS-WIDE-Object

[32] is a dataset for object recognition consisting of 30,000 images in 31 classes. We use the 5 features provided by the website, i.e., a 65-dimension color histogram (CH), 226-dimension color moments (CM), a 145-dimension color correlation (CORR), a 74-dimension edge distribution, and a 129-dimension wavelet texture.

Comparison methods

  1. BSV (Best Single View) [11] first fills in the missing values with the average value of each view and then clusters each view separately to select the best result.

  2. SC[C] connects the features of multiple views into a single long vector, then obtains a unified similarity matrix and performs spectral clustering.

  3. SC[A] generates a similarity matrix for each view, fuses these similarity matrices with equal weights, and finally performs spectral clustering.

  4. MultiNMF [16] first fills in all missing values as in BSV and then performs MultiNMF.

  5. PVC [9] establishes a latent subspace in which the representations of the same sample from different views are close to each other.

  6. MIC [15] extends MultiNMF with weighted NMF to obtain more reliable results.

  7. IMG [11] integrates the global structure of the data into subspace learning.

  8. APMC [12] utilizes anchors to reconstruct instance-to-instance relationships for clustering.

  9. IMSC_AGL [13] exploits graph learning and spectral clustering techniques to learn a common representation for incomplete multi-view clustering.

  10. AGC_IMC [14] develops a joint framework for graph completion and consensus representation learning.

Experimental Settings

We construct two types of incomplete multi-view data in our experiments.

There are paired samples among the views, i.e., some samples contain the features of all views.

We set the partial data ratio (PDR) from 10 to 90% with a 20% interval, where 0% means that all views are complete. The missing samples are evenly distributed across all views, and each sample is available in at least one view.

There are no paired samples among the views, i.e., no sample contains the features of all views.

We randomly delete approximately 30%, 50%, and 70% of the samples from the multi-view dataset and then evaluate clustering performance under different missing rates.
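As an illustration of how such missing patterns can be simulated (the paper does not prescribe a specific procedure), the following sketch hides a given fraction of samples per view while guaranteeing that every sample remains observed in at least one view; all names are hypothetical.

```python
import numpy as np

def make_incomplete_masks(n, num_views, missing_rate, seed=0):
    """Return a boolean (num_views, n) mask; True means the sample is observed in that view."""
    rng = np.random.default_rng(seed)
    mask = rng.random((num_views, n)) >= missing_rate
    # repair samples that became unobserved in every view
    lost = np.where(~mask.any(axis=0))[0]
    mask[rng.integers(0, num_views, size=lost.size), lost] = True
    return mask
```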

To evaluate the clustering performance, two classic clustering evaluation indicators are adopted, namely, clustering accuracy (ACC) and normalized mutual information (NMI). Both indicators range from 0 to 1, and larger values indicate better clustering performance. For fairness, we run every comparison method 10 times on each dataset and report the average clustering results to eliminate the uncertainty caused by randomness. The parameters of the comparison methods are set according to the authors' suggestions or default values. For reproducibility, the code and datasets are released at https://github.com/leiyang617/code_for_FGIMAC.
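For reference, the two metrics can be computed as sketched below, assuming the usual Hungarian-matching definition of clustering accuracy and scikit-learn's NMI implementation; whether this matches the exact evaluation code used in the paper is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score  # NMI(y_true, y_pred)

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    overlap = np.zeros((len(clusters), len(classes)))
    for i, cl in enumerate(clusters):
        for j, cs in enumerate(classes):
            overlap[i, j] = np.sum((y_pred == cl) & (y_true == cs))
    row, col = linear_sum_assignment(-overlap)   # maximize the matched counts
    return overlap[row, col].sum() / len(y_true)
```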

Results and Analysis

Figures 3 and 4 report the ACC and NMI at different missing rates in the first incomplete case, whereas Figs. 5, 6 and 7 report these indicators in the second incomplete case. Table 3 shows the run time of the different methods on various datasets with a missing rate of approximately 50%. Due to space limitations, we only report the results obtained when the missing rate is 50%, given that similar trends are observed at the other missing rates.

Table 3 Running time (seconds) on different datasets with a missing rate of approximately 50%

The clustering performance of all methods declines as the missing rate increases. Figures 3 and 4 show that BSV, SC[C], SC[A], MultiNMF and MIC are unsatisfactory in most cases, indicating that filling in the missing samples with average values is not good enough to solve the incomplete multi-view clustering problem. BSV uses only a single view to obtain the clustering results and cannot exploit the complementary information among multiple views. SC[C] concatenates all views into a single long view and ignores the differences in the distributions of the views. SC[A] treats all views equally without considering their integrity or credibility, which is unreasonable for incomplete multi-view clustering. PVC, IMG, APMC, IMSC_AGL and AGC_IMC demonstrate acceptable performance, suggesting that using the complementary information among views is an effective approach.

Fig. 3 Experiment results on the Flowers17 dataset in the first incomplete case

In the first incomplete case, Fig. 3 shows that FGPMAC obtains better results than the other methods on the Flowers17 dataset. Figure 4 shows that on the USPS-MNIST dataset, FGPMAC obtains better results than the other methods except when the PDR is 90%.

In the second incomplete case, Fig. 5 shows that FGPMAC outperforms the other methods on all datasets when the missing rate is 30%. Figure 6 shows that when the missing rate is 50%, FGPMAC achieves the best results on all datasets except the ORL and NUS datasets. Figure 7 shows that when the missing rate is 70%, FGPMAC obtains the best results on all datasets except the 3Sources dataset. Even when FGPMAC fails to obtain the optimal result, the gap between its results and the optimal ones is very small.

Fig. 4 Experiment results on the USPS-MNIST dataset in the first incomplete case

Fig. 5 Experiment results on different datasets with 30% missing rate in the second incomplete case

Fig. 6 Experiment results on different datasets with 50% missing rate in the second incomplete case

Although IMSC_AGL and AGC_IMC achieve the second-best clustering performance, their computational complexity is unsatisfactory.

Table 3 shows that the run times of IMSC_AGL and AGC_IMC are much higher than those of the other methods, and our method has an absolute advantage in run time. For instance, on the 3Sources dataset, IMSC_AGL takes 10.86 s and AGC_IMC takes 5.09 s, while our method takes only 0.18 s. On the Caltech101 dataset, the run time of IMSC_AGL is 3041.25 s and that of AGC_IMC is 1278.09 s, while ours is only 173.07 s; on the 100leaves dataset in particular, the run times of AGC_IMC and IMSC_AGL are about hundreds of times longer than ours. In addition, the experimental results on the Caltech101 and NUS-WIDEOBJ datasets show that the run times of IMSC_AGL and AGC_IMC increase rapidly with the dataset scale, which is clearly unacceptable for large-scale datasets.

To further explore how the running time of the proposed algorithm changes with the number of views and the sample missing rate, we conduct experiments on the Caltech101 dataset. The results are shown in Figs. 8 and 9. As the sample missing rate increases, the clustering accuracy decreases and the running time drops significantly. ACC and NMI decrease dramatically when the missing rate reaches 70%, because at such a high missing rate little information remains to be used; data with a missing rate of more than 70% are rare in practical applications. As the number of views increases, the running time increases accordingly, which is consistent with the analysis in "Computational Complexity Analysis", and ACC and NMI also increase, indicating that combining features from multiple views helps to improve the clustering performance.

Fig. 7 Experiment results on different datasets with 70% missing rate in the second incomplete case

Fig. 8 Running time on the Caltech101 dataset with different missing rates

We test the parameter sensitivity to further evaluate the performance of FGPMAC, which only requires the number of neighbors \(k\) to be tuned. We evaluate the clustering performance for \(k\) in the range {4, 6, ..., 14}. Due to limited space, we only report the results on the ORL dataset, given that similar trends are observed on the other datasets. As shown in Fig. 10, under the same PDR the fluctuations of ACC and NMI across different values of \(k\) are very small, so \(k\) has little influence on the final clustering result. In other words, FGPMAC has low sensitivity to \(k\) over a relatively wide range.

Fig. 9 Running time on the Caltech101 dataset with different numbers of views

Fig. 10 Influence of the number of nearest neighbors k on the ORL dataset with different PDR settings

Conclusion

We propose FGPMAC, a simple and effective method for incomplete multi-view clustering that overcomes many shortcomings of existing methods. Compared with traditional methods, FGPMAC is more flexible because it can handle both incomplete cases of multi-view data. We perform experiments on multi-view datasets with different missing rates, and the experimental results show that FGPMAC achieves higher clustering performance with less run time.

However, most existing incomplete multi-view clustering methods (including ours) require the number of clusters to be specified in advance. As future work, we plan to draw on COMIC (COMIC: Multi-view Clustering Without Parameter Selection) [34] and adapt our method so that clustering can be performed without knowing the number of clusters.