1 Introduction

Multi-view learning is becoming increasingly popular as multi-view data finds applications across various real-world scenarios [1, 2]. The goal of this strategy is to exploit the consistent and complementary information offered by the different views. Among the tasks in multi-view learning, multi-view clustering (MVC) is one of the most notable. MVC groups unlabeled data from several perspectives into clusters, utilizing the variety of perspectives to produce clustering results that are reliable across all views. Many approaches have been developed in the field of multi-view clustering over the last decade, as the literature [3,4,5] documents. Multi-view non-negative matrix factorization (MultiNMF) is one such method [6]. It incorporates a consensus constraint, which is essential for preserving consistent clustering results across viewpoints, into a non-negative matrix factorization (NMF) process. Another class of methods, such as centroid-based co-regularization and pairwise co-regularization [7], uses spectral clustering to produce clustering outcomes for every view. These techniques employ a variety of procedures to align the clustering results from the different viewpoints, guaranteeing consistency across views.

The MVC approaches discussed above presuppose that every view of each example is complete. However, circumstances where some views are absent are common in real-world scenarios. When examining a web page, for instance, text and photographs might be treated as two distinct views, yet some web pages may lack text or image data altogether. Similarly, a news story can be viewed from a variety of angles through reports from numerous media outlets, even though some outlets might not have covered the particular subject. When multi-view data are incomplete, traditional multi-view clustering methods fall short. In response to this challenge, a number of strategies have surfaced recently [8,9,10]. Matrix factorization, a crucial element of many clustering strategies, has been shown to be effective in a wide range of applications [11, 12]. These strategies aim to uncover latent representations of incomplete multi-view data by combining matrix factorization with regularization algorithms. Zhao et al. [13] improved the preservation of these representations by adding a graph Laplacian term to their optimization procedure.

Furthermore, the absence of information in missing views is the primary cause of the shortcomings of incomplete multi-view clustering (IMC). To address this problem, IMC techniques can be split into non-inference and inference approaches, each of which uses a different strategy. Non-inference IMC [18,19,20,21] aims to achieve clustering with incomplete multi-view data while mitigating the effects of information loss. Wen et al. [22], for instance, used the samples accessible in each view to create an incomplete similarity graph, filling the missing elements with zeros; k-means clustering is then applied to a shared spectral embedding obtained from the completed similarity graphs. Wen et al. [23] also used a local graph preservation approach to obtain a common representation from incomplete views. To fuse partial similarity graphs efficiently, Liang et al. [24] applied sample-level adaptive weights to the similarity graphs of all available views. Hu et al. [25] presented a matrix factorization-based method that aligns view-specific basis matrices to learn a shared representation from incomplete data; k-means is then used to cluster this shared representation. Inference IMC techniques [26,27,28,29,30,31,32,33,34] focus on recovering the missing views and organizing the reconstructed views in order to prevent information loss; achieving high-quality recovery of missing data is their primary goal. A straightforward method is to create unavailable samples by averaging features. Zhou et al. [35], for instance, reduced the impact of missing instances by introducing a weighting method and filling in average features for each view. However, because all samples recovered in this way share the same features, they cannot provide enough useful information and may cause alignment issues between views. A more rational approach is to integrate the inference and clustering processes so that they mutually reinforce each other [36,37,38,39,40]. Wen et al. [41] used pairwise dimension graph preservation to recover the missing instances, and reverse graph regularization to guide the completed views.

Concurrently, the various views of the data exhibit complementary and consensus behavior. Each view is important for clustering performance because it allows explicit information to be extracted from incomplete multi-view data. The primary motivation of the proposed study is the vital problem of fully using the information contained in individual incomplete views to analyze the consensus structure of the heterogeneous views in the kernel space [7, 47, 48]. To achieve this objective, a new method called weighted concept factorization is introduced for clustering incomplete multi-view data. The proposed method seeks to reveal hidden structures, i.e., clusters. First, each view of the data is normalized and missing data are imputed by the algorithm. Subsequently, three matrices are iteratively refined: an association matrix that captures the relationships between data and clusters, a projection matrix that assigns a value to each feature within each view, and a consensus matrix that represents a single view shared by all data viewpoints. The disagreement factor between each view and the consensus is extracted from their correlation. To further prevent over-fitting of the views, the Frobenius norm is applied to regularize the projection matrix. This procedure is repeated until convergence is achieved or the maximum number of iterations is reached. The key aspects of the proposed algorithm are summarized as follows:

  1.

    The proposed approach effectively handles missing instances in multi-view data. We employ the weighted concept factorization approach in our objective function. For each incomplete view, a weight matrix is built so that the missing instances receive a lower weight than the available instances.

  2.

    To drive the latent feature matrix toward a consensus, we use a co-regularization technique. To avoid over-fitting of the views and maintain consistent information, the projection matrix and the association matrix are regularized with the Frobenius norm. During the optimization process, each view's weight is determined automatically. To handle the related optimization problem efficiently and effectively, a new updating rule is derived.

  3.

    The outcomes of experiments carried out on real-world datasets are reported in terms of F-score, ACC, and NMI. According to the experimental study, the proposed approach outperforms other existing techniques in clustering.

We give a summary of existing incomplete multi-view clustering methods in Sect. 2. Section 3 provides a detailed explanation of the proposed methodology. Section 4 covers the experiments carried out on benchmark datasets. Our concluding remarks are presented in Sect. 5.

2 Related works

An overview of related work in incomplete multi-view clustering is given in this section.

2.1 Multi-incomplete view clustering (MIC)

The MIC method [14] is an IMC approach that utilizes weighted NMF. Its objective function can be formulated in the following manner:

$$\begin{aligned} \mathop {\min }\limits _{{U_f},{V_f},{V^ * } \ge 0} \sum \limits _{f = 1}^F {\left\| {\left( {{X_f} - {U_f}{V_f}} \right) {W_f}} \right\| _F^2} + \sum \limits _{f = 1}^F {\left\{ {\alpha \left\| {\left( {{V_f} - {V^ * }} \right) {W_f}} \right\| _F^2 + \beta {{\left\| {{V_f}} \right\| }_{2,1}}} \right\} } \end{aligned}$$
(1)

where \(\alpha \) and \(\beta \) serve as the trade-off parameters for the respective terms. F denotes the total number of views, and \({\left\| {\, \bullet \,} \right\| _{2,1}}\) denotes the \(L_{2,1}\)-norm. The matrix \({X_f} \in {R^{m \times n}}\) includes both present and missing values of the \(f^{th}\) view, with the absent instances filled by the average feature values of that view. \({U_f} \in {R^{m \times c}}\) and \({V_f} \in {R^{c \times n}}\) are designated as the basis and coefficient matrices of the \(f^{th}\) view. The dimensions are outlined as follows: m signifies the original space dimension of the \(f^{th}\) view, c denotes the latent space dimension, and n represents the overall dataset size. \(V^*\) is the common representation matrix, while \(W_f\) serves as the diagonal weighting matrix of the \(f^{th}\) view. If the \(i^{th}\) instance of the \(f^{th}\) view is available, \(W_f^{ii} = {z_v}/n\), where \(z_v\) denotes the number of available instances in the \(f^{th}\) view. This technique undergoes iterative optimization.

2.2 Doubly aligned incomplete multi-view clustering (DAIMC)

DAIMC is an incomplete multi-view clustering method based on weighted semi-NMF [25]. Its cost function is expressed as:

$$\begin{aligned} \mathop {\min }\limits _{V \ge 0} \,\sum \limits _{f = 1}^F {\left\{ {\left\| {\left( {{X_f} - {U_f}V} \right) {W_f}} \right\| _F^2 + \alpha \left\| {B_f^T{U_f} - I} \right\| _F^2 + \beta {{\left\| {{B_f}} \right\| }_{2,1}}} \right\} } \end{aligned}$$
(2)

where \(\alpha \) and \(\beta \) are defined as the trade-off parameters for the respective terms. The input matrix is represented by \({X_f} \in {R^{m \times n}}\), the common coefficient matrix is V, and the diagonal weighting matrix for the \(f^{th}\) view is \(W_f\). \(B_f\) is a regression coefficient matrix for the \(f^{th}\) view, used to align the basis matrices. \(W_f^{ii}\) equals 1 if the \(i^{th}\) instance of the \(f^{th}\) view is available, and 0 otherwise. From Eq. (2), we can see that DAIMC aims to align the several partial views through V and \(U_f\). This approach undergoes iterative optimization.

2.3 Incomplete multi-view clustering methods

This work addresses the incomplete multi-view clustering problem. Over the last decade, a variety of incomplete multi-view clustering approaches have been proposed.

In this regard, Yin et al. [42] presented incomplete multi-view clustering with cosine similarity. This method computes cosine similarity directly in the original multi-view space, enhancing the preservation of the data's manifold structure and eliminating the need for additional variables. A coherent method is achieved by merging the cosine-similarity manifold preservation term with the matrix factorization component in the objective function. Chao et al. [43] offered a two-stage method involving multiple imputation and ensemble clustering to handle multi-view clustering when values are missing: multiple imputation addresses the missing values, and weighted ensemble clustering performs the multi-view clustering. Zhang et al. [44] introduced a novel approach that fills the gaps in incomplete graphs based on agreement between different views and then merges the completed graphs into a single graph. Further, the innovative method proposed by Xia et al. [45] fuses information in the partition space to counteract consistency degradation and adaptively weights all views to represent their different contributions to the clustering task; cluster structure information is incorporated into the similarity learning process to create the desired similarity graph. To capture both the global and local structure of the data, Zhang et al. [46] suggested a novel incomplete multi-view clustering algorithm that creates compact and discriminative representations from partial data by adding a distance regularization term to the model and applying a weighted fusion process.

3 Proposed method

This section provides a thorough explanation of a novel incomplete multi-view clustering technique based on the CF approach, together with its optimization steps. We also present the convergence proof and establish the method's time complexity.

3.1 Concept factorization

CF serves as a potent method for matrix decomposition, particularly adept at handling datasets with negative values. Moreover, it adapts to transformed data through kernel methodologies. Considering a data matrix \(X = \left[ {{x_1},{x_2},...,{x_n}} \right] \in {\Re ^{m \times n}}\), where each column \(x_f\) is an m-dimensional feature vector, CF views every data point as an approximate linear combination of all underlying concepts. This approach provides a succinct representation of the data in the following manner:

$$\begin{aligned} {x_f} \approx \sum \nolimits _{g } {{w_g}{v_{fg}}} \end{aligned}$$
(3)

In this context, \(v_{fg}\) represents the projection of \(x_f\) onto the basis vector \(w_g\). Conversely, each basis vector \(w_g\) is established through a linear combination of all the data points, summarized as follows:

$$\begin{aligned} {w_g} \approx \sum \nolimits _{f} {{x_f}{u_{fg}}} \end{aligned}$$
(4)

In the representations provided by Eqs. (3) and (4), \(u_{fg}\) and \(v_{fg}\) are non-negative association weights. In matrix form, this leads to the subsequent formulation:

$$\begin{aligned} {X_f} \approx {X_f}{U_f}{\left( {{V_f}} \right) ^T} \end{aligned}$$
(5)

where \(U_f\) and \(V_f\) are non-negative matrices of size \(n \times c\). CF employs the Frobenius norm to measure the approximation quality, minimizing the cost through the subsequent objective function:

$$\begin{aligned} \mathop {\min }\limits _{{U_f},{V_f}} {O_{CF}} = \left\| {{X_f} - {X_f}{U_f}{{({V_f})}^T}} \right\| _F^2 \quad s.t.\ {U_f},{V_f} \ge 0. \end{aligned}$$
(6)

Following optimization, the variables adhere to the multiplicative update rules outlined below:

$$\begin{aligned} \left. {\begin{array}{*{20}{l}} {{U_f} \leftarrow {U_f}\frac{{\left( {{K_f}{V_f}} \right) }}{{\left( {{K_f}{U_f}V_f^T{V_f}} \right) }}}\\ {{V_f} \leftarrow {V_f}\frac{{\left( {{K_f}{U_f}} \right) }}{{\left( {{V_f}U_f^T{K_f}{U_f}} \right) }}} \end{array}} \right\} \end{aligned}$$
(7)

where \({K_f} = X_f^T{X_f}\) is the kernel matrix that calculates the inner products within the initial data space; the multiplications and divisions in Eq. (7) are applied element-wise.
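As an illustration, a minimal numpy sketch of plain CF with the multiplicative updates of Eqs. (6)-(7) might look as follows; the random initialization, iteration count, and eps guard are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def concept_factorization(X, c, n_iter=200, eps=1e-10, seed=0):
    """Plain CF, Eqs. (6)-(7): X ~ X U V^T with U, V >= 0.

    X: (m, n) data matrix whose columns are samples; c: number of concepts.
    """
    n = X.shape[1]
    K = X.T @ X                                   # kernel matrix K_f = X_f^T X_f
    rng = np.random.default_rng(seed)
    U = rng.random((n, c))
    V = rng.random((n, c))
    for _ in range(n_iter):
        U *= (K @ V) / (K @ U @ (V.T @ V) + eps)  # update rule for U, Eq. (7)
        V *= (K @ U) / (V @ (U.T @ K @ U) + eps)  # update rule for V, Eq. (7)
    return U, V
```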

3.2 Missing data completion

Given a data matrix X with F views, each view \(X_f\) may suffer from incompleteness. Since the missing instances would render the information of each view incorrect, we cannot apply a clustering algorithm directly to the partial data. Therefore, we introduce a weighted diagonal matrix for each incomplete view, filled according to the following rule:

$$\begin{aligned} W_f^s = \left\{ \begin{array}{l} 1,\quad \text {if the } s^{th}\ \text {instance is available in the } f^{th}\ \text {view}\\ 0,\quad \text {otherwise} \end{array} \right. \end{aligned}$$
(8)
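In code, the diagonal weight matrix of Eq. (8) follows directly from a per-view presence mask; a minimal sketch (the boolean mask encoding is an assumption of this example):

```python
import numpy as np

def view_weight_matrix(present):
    """Diagonal W_f of Eq. (8): 1 where the instance is available, 0 otherwise.

    present: boolean array of length n, True if the instance exists in view f.
    """
    return np.diag(np.asarray(present, dtype=float))
```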

3.3 Proposed objective function

In the input matrix \(X = \mathrm{{ }}\left\{ {{X_1},{X_2},...,{X_F}} \right\} \in {\Re ^{m \times n}}\), each row denotes a unique feature dimension and each column represents an individual data instance, thereby defining a dataset with F views. CF achieves the approximation through three matrices, denoted \(X \approx XU{V^T}\), where \(V \in {\Re ^{n \times c}}\) acts as the projection matrix, holding the projected values that correspond to the concepts, and \(U \in {\Re ^{n \times c}}\) acts as the association matrix, capturing the relationship of data points to concepts. The objective function is formulated as follows:

$$\begin{aligned} {O_{WCFIMC}} = \sum \limits _{f = 1}^F {\left\{ {\left\| {\left( {{X_f} - {X_f}{U_f}V_f^T} \right) {W_f}} \right\| _F^2 + \alpha {\omega _f}\left\| {{W_f}\left( {{V_f} - {V^*}} \right) } \right\| _F^2 + \beta \left\| {{U_f}} \right\| _F^2 + \gamma \left\| {{V_f}} \right\| _F^2 + \eta \omega _f^2} \right\} } \end{aligned}$$
(9)

\(s.t.\,\,\,\,{U_{f}} \ge 0,\,{V_{f}} \ge 0,\,{\omega _{f}} \ge 0,\,\sum \limits _{f = 1}^F {{\omega _{f}} = 1.} \,\)

  • \({\left\| {\left( {{X_f} - {X_f}{U_f}V_f^T} \right) {W_f}} \right\| _F^2}\) is the concept factorization term with the weighted diagonal matrix.

  • \({\left\| {{W_f}\left( {{V_f} - {V^*}} \right) } \right\| _F^2}\) represents the correlation between each view's projection matrix and the consensus matrix.

  • \({\left\| {{U_f}} \right\| _F^2}\) is used to maintain consistent information across the multiple views.

  • \({\left\| {{V_f}} \right\| _F^2}\) is used to avoid the over-fitting issue among the views.

where \(\alpha \), \(\beta \), \(\gamma \), and \(\eta \) are the trade-off parameters. Denoting \( {{Q_f}} = W_f^T{W_f}\), Eq. (9) is rewritten as:

$$\begin{aligned} {O_2} = \sum \limits _{f = 1}^F {\left\{ {tr\left( {{{\left( {{X_f} - {X_f}{U_f}V_f^T} \right) }^T}\left( {{X_f} - {X_f}{U_f}V_f^T} \right) {Q_f}} \right) + \alpha {\omega _f}tr\left( {{{\left( {{V_f} - {V^*}} \right) }^T}{Q_f}\left( {{V_f} - {V^*}} \right) } \right) + \beta tr\left( {{U_f}U_f^T} \right) + \gamma tr\left( {{V_f}V_f^T} \right) + \eta \omega _f^2} \right\} } \end{aligned}$$
(10)

We define the standard kernel matrix \(K_f=X_f^TX_f\), which calculates the inner products of the data space. Dropping the view index f for brevity (each view contributes an identical term to the sum), Eq. (10) is rewritten as:

$$\begin{aligned} {O_3} = tr\left( {KQ} \right) - 2tr\left( {V{U^T}KQ} \right) + tr\left( {V{U^T}KU{V^T}Q} \right) + \alpha \omega \,tr\left( {{{\left( {V - {V^*}} \right) }^T}Q\left( {V - {V^*}} \right) } \right) + \beta tr\left( {U{U^T}} \right) + \gamma tr\left( {V{V^T}} \right) + \eta {\omega ^2} \end{aligned}$$
(11)
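For monitoring the optimization (cf. the convergence study in Sect. 4.5), the objective of Eq. (9) can be evaluated directly. A minimal numpy sketch, with the sample weighting applied on the right as in the trace forms above:

```python
import numpy as np

def objective_value(X_list, U_list, V_list, W_list, V_star, w, alpha, beta, gamma, eta):
    """Evaluate the objective of Eq. (9), summed over all F views."""
    total = 0.0
    for Xf, Uf, Vf, Wf, wf in zip(X_list, U_list, V_list, W_list, w):
        rec = (Xf - Xf @ Uf @ Vf.T) @ Wf                  # weighted CF residual
        total += np.linalg.norm(rec, 'fro') ** 2
        total += alpha * wf * np.linalg.norm(Wf @ (Vf - V_star), 'fro') ** 2
        total += beta * np.linalg.norm(Uf, 'fro') ** 2    # consistency term on U_f
        total += gamma * np.linalg.norm(Vf, 'fro') ** 2   # anti-over-fitting term on V_f
        total += eta * wf ** 2                            # weight regularizer
    return total
```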

In brief, CIWCFMvC adopts an alternating optimization strategy to reach a local solution. Each view weight \(\omega _f\) is initialized as 1/F, and the initial values of U, V, and \(V^{*}\) are obtained with the k-means algorithm.

3.4 Optimization of the proposed function

Lagrange multipliers (LM) are introduced during the optimization process to find a locally optimal solution, which is obtained through an iterative updating technique. The Karush–Kuhn–Tucker (KKT) conditions are then used to derive the final update rules.

3.4.1 Optimization of U

For the constraint \( {U_{a,b}}\ge 0 \), let \( {\phi _{a,b}} \) be the LM. The LM is applied to evaluate the optimum of the function under the constraints. The Lagrange function is \( L_1 = \,O\, - \,tr(\phi U) \). We retain only the terms involving U:

$$\begin{aligned} {L_1} = - 2tr\left( {V{U^T}KQ} \right) + tr\left( {V{U^T}KU{V^T}Q} \right) + \beta tr\left( {U{U^T}} \right) - tr\left( {\phi U} \right) \end{aligned}$$
(12)

By applying the partial derivative of \( L_{1} \) w.r.t U:

$$\begin{aligned} \frac{{\partial {L_1}}}{{\partial U}} = - 2KQV + 2KU{V^T}QV + 2\beta U - \phi \end{aligned}$$
(13)

Using the KKT condition \( {\phi _{ik}}{U_{ik}}\, = \,0 \), the following update rule for U is obtained:

$$\begin{aligned} {U_{ik}} = {U_{ik}}\frac{{{{\left( {KQV} \right) }_{ik}}}}{{{{\left( {KU{V^T}QV + \beta U} \right) }_{ik}}}} \end{aligned}$$
(14)
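In code, Eq. (14) is a single element-wise step; a sketch (the small eps guarding against division by zero is an implementation detail, not part of the paper):

```python
def update_U(U, V, K, Q, beta, eps=1e-10):
    """Multiplicative update for U, Eq. (14)."""
    num = K @ Q @ V                          # (K Q V)_{ik}
    den = K @ U @ (V.T @ Q @ V) + beta * U   # (K U V^T Q V + beta U)_{ik}
    return U * num / (den + eps)
```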

3.4.2 Optimization of V

For the constraint \( {V_{a,b}}\ge 0 \), consider the LM \( {\psi _{a,b}} \). Then \(L_2 = \,O\, - \,tr(\psi V) \) is the corresponding Lagrange function. We retain only the terms involving V:

$$\begin{aligned} {L_2} = - 2tr\left( {V{U^T}KQ} \right) + tr\left( {V{U^T}KU{V^T}Q} \right) + \alpha \omega \,tr\left( {{{\left( {V - {V^*}} \right) }^T}Q\left( {V - {V^*}} \right) } \right) + \gamma tr\left( {V{V^T}} \right) - tr(\psi V) \end{aligned}$$
(15)

By applying the partial derivative of \( L_{2} \) w.r.t V:

$$\begin{aligned} \frac{{\partial {L_2}}}{{\partial V}} = - 2QKU + 2QV{U^T}KU + 2\alpha \omega Q\left( {V - {V^*}} \right) + 2\gamma V - \psi \end{aligned}$$
(16)

Using the KKT condition \( {\psi _{i,k}}{V_{i,k}}\, = \,0 \), the following update rule for V is obtained:

$$\begin{aligned} {V_{ik}} = {V_{ik}}\frac{{{{\left( {QKU + \alpha \omega Q{V^*}} \right) }_{ik}}}}{{{{\left( {QV{U^T}KU + \alpha \omega QV + \gamma V} \right) }_{ik}}}} \end{aligned}$$
(17)
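The corresponding sketch for Eq. (17), with w the scalar weight of the current view:

```python
def update_V(V, U, K, Q, V_star, alpha, w, gamma, eps=1e-10):
    """Multiplicative update for V, Eq. (17)."""
    num = Q @ K @ U + alpha * w * (Q @ V_star)
    den = Q @ V @ (U.T @ K @ U) + alpha * w * (Q @ V) + gamma * V
    return V * num / (den + eps)
```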

It is important to highlight that, to prevent \(U_{f}\) from reaching excessively high values (which could result in extremely low values of \(V_{f}\)), it is typical to impose a constraint on each association matrix \(U_{f}\). However, the updated \(U_{f}\) might not satisfy this constraint. Therefore, normalization is applied to the matrices U and V to maintain the consistency constraint as follows:

$$\begin{aligned} {V} \leftarrow {V}{({N})^{\frac{{ - 1}}{2}}},{U} \leftarrow {U}{({N})^{\frac{1}{2}}} \end{aligned}$$
(18)

where N is a diagonal matrix expressed as:

$$\begin{aligned} N = diag\left( {\sum \nolimits _z {{{\left( V \right) }_{z,1}},} \sum \nolimits _z {{{\left( V \right) }_{z,2}},}...,\sum \nolimits _z {{{\left( V \right) }_{z,c}}} } \right) \end{aligned}$$
(19)
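A sketch of the normalization of Eqs. (18)-(19):

```python
import numpy as np

def normalize_UV(U, V, eps=1e-12):
    """Eqs. (18)-(19): rescale with N = diag of the column sums of V."""
    n_diag = V.sum(axis=0) + eps       # diagonal entries of N
    scale = np.sqrt(n_diag)
    return U * scale, V / scale        # U <- U N^{1/2}, V <- V N^{-1/2}
```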

3.4.3 Optimization of \(V^{*}\)

Assuming \(\zeta _{a,b}\) is the LM for the constraint \( {V^*_{a,b}}\ge 0 \), the transformed Lagrange function is \( L_3 = \,O\, - \,tr(\zeta V^*) \). We focus on the terms that contain \(V^*\) and take the partial derivative of \(L_3\) with respect to \(V^*\):

$$\begin{aligned} {L_3} = \alpha \sum \limits _{f = 1}^F {{\omega _f}tr\left( {{{\left( {{V_f} - {V^*}} \right) }^T}{Q_f}\left( {{V_f} - {V^*}} \right) } \right) } \end{aligned}$$
(20)

Setting the derivative of the above equation with respect to \( V^{*} \) to zero yields the update rule for \( V^{*} \):

$$\begin{aligned} {V^ * } = {\left( {\sum \limits _{f = 1}^F {{\omega _f}{Q_f}} } \right) ^{ - 1}}\sum \limits _{f = 1}^F {{\omega _f}{Q_f}{V_f}} \end{aligned}$$
(21)
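A sketch of the consensus update of Eq. (21); since each \(Q_f\) is diagonal, the matrix being inverted is diagonal, and its invertibility assumes every sample is present in at least one view:

```python
import numpy as np

def update_consensus(V_list, Q_list, w):
    """Eq. (21): V* = (sum_f w_f Q_f)^{-1} sum_f w_f Q_f V_f."""
    num = sum(wf * (Qf @ Vf) for wf, Qf, Vf in zip(w, Q_list, V_list))
    den = sum(wf * Qf for wf, Qf in zip(w, Q_list))
    return np.linalg.solve(den, num)
```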

3.4.4 Optimization of \(\omega \)

The weights of the distinct views are computed automatically based on the disagreement factor between each \(V_f\) and \(V^*\). The objective function with respect to \(\omega \) is reformulated in the following manner:

$$\begin{aligned} O(\omega ) = \sum \limits _{f = 1}^F {{\omega _f}\left\| {{V_f} - {V^ * }} \right\| _F^2} + \eta \left\| \omega \right\| _2^2 \end{aligned}$$
(22)

where \(\eta \left\| \omega \right\| _2^2\) is used to smooth the weight distribution among the multiple views and avoid a trivial solution. Equation (22) is a quadratic program and is solved effectively with the MATLAB function quadprog.
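An equivalent sketch in Python, with scipy's SLSQP solver substituted for MATLAB's quadprog:

```python
import numpy as np
from scipy.optimize import minimize

def update_view_weights(V_list, V_star, eta):
    """Solve Eq. (22): min_w  sum_f w_f d_f + eta ||w||_2^2
    s.t. sum(w) = 1, w >= 0, with d_f = ||V_f - V*||_F^2."""
    d = np.array([np.linalg.norm(Vf - V_star, 'fro') ** 2 for Vf in V_list])
    F = d.size
    res = minimize(lambda w: d @ w + eta * (w @ w),
                   np.full(F, 1.0 / F),                        # uniform start
                   bounds=[(0.0, None)] * F,
                   constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}],
                   method='SLSQP')
    return res.x
```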

Algorithm 1 The CIWCFMvC Algorithm
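The overall procedure alternates the update steps of Sect. 3.4; a sketch assembling the helper functions defined earlier (random initialization stands in here for the k-means initialization described in Sect. 3.3):

```python
import numpy as np

def ciwcfmvc(X_list, W_list, c, alpha, beta, gamma, eta, n_iter=100, seed=0):
    """Alternating optimization of Eq. (9) over U_f, V_f, V*, and the weights."""
    F = len(X_list)
    n = X_list[0].shape[1]
    K_list = [Xf.T @ Xf for Xf in X_list]          # per-view kernels K_f
    Q_list = [Wf.T @ Wf for Wf in W_list]          # Q_f = W_f^T W_f
    rng = np.random.default_rng(seed)
    U_list = [rng.random((n, c)) for _ in range(F)]
    V_list = [rng.random((n, c)) for _ in range(F)]
    V_star = sum(V_list) / F
    w = np.full(F, 1.0 / F)                        # uniform view weights
    for _ in range(n_iter):
        for f in range(F):
            U_list[f] = update_U(U_list[f], V_list[f], K_list[f], Q_list[f], beta)
            V_list[f] = update_V(V_list[f], U_list[f], K_list[f], Q_list[f],
                                 V_star, alpha, w[f], gamma)
            U_list[f], V_list[f] = normalize_UV(U_list[f], V_list[f])
        V_star = update_consensus(V_list, Q_list, w)
        w = update_view_weights(V_list, V_star, eta)
    return V_star, w   # cluster the rows of V* with k-means for final labels
```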

3.5 Computational complexity

We examine the complexity of the proposed method in this section. The computational complexity of the kernel is \(O(mn^{2})\). If the multiplicative updates end after t iterations, the related cost is O(tmn). Thus, the overall computational complexity of the suggested approach is \(O(mn^{2}+tmn)\).

4 Experiments and analysis

This section presents a comparison of the proposed CIWCFMvC method with eleven state-of-the-art approaches on eight benchmark datasets. Normalized mutual information (NMI), F-score, and accuracy (ACC) are used to evaluate the clustering performance. All instances in these datasets are initially complete; to render the data incomplete, we randomly remove some representations from each view. Specifically, the ratio of incomplete instances ranges from 10 to 50% in intervals of 20%.
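For illustration, one plausible masking protocol is sketched below; the exact removal scheme is not specified beyond the ratios, so the uniform per-view removal and the at-least-one-view guarantee are assumptions of this example:

```python
import numpy as np

def incompleteness_masks(n_samples, n_views, missing_ratio, seed=0):
    """Randomly mark instances as missing per view at the given ratio
    (10%, 30%, and 50% in the experiments)."""
    rng = np.random.default_rng(seed)
    present = np.ones((n_views, n_samples), dtype=bool)
    n_missing = int(missing_ratio * n_samples)
    for v in range(n_views):
        drop = rng.choice(n_samples, size=n_missing, replace=False)
        present[v, drop] = False
    # keep every sample observed in at least one view
    lost = np.where(~present.any(axis=0))[0]
    present[rng.integers(0, n_views, size=lost.size), lost] = True
    return present
```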

4.1 Dataset

To assess the efficacy of the proposed method, we conduct the analysis on widely employed benchmark datasets, namely 3Sources, NGs, Wikipedia Articles, BBCSport, BBC, WebKB, Citeseer, and Reuters. The description of the datasets is given in Table 1.

Table 1 Important statistics of the benchmark datasets
  • 3Sources dataset is compiled from three reliable online news sources, each offering a distinct viewpoint. A selection of 169 distinct news stories has been made from these sources.

  • NGs dataset is a subset of the 20 Newsgroups dataset, which contains archives from a variety of newsgroups. Extracts of reports from various newsgroups have been used, with each group represented as an independent view.

  • Wikipedia Articles dataset is made up of carefully picked sections of Wikipedia's featured articles. Compiled in October 2009, it includes 2,669 articles from 29 categories. The ten most popular categories are highlighted, covering articles that have several sections and photographs.

  • BBCSport is extracted from the BBCSport website and contains 544 records. Each record has been divided into two sections and manually classified into one of five subject groups.

  • BBC website maintains a collection of articles organized into five primary categories: business, entertainment, politics, sports, and technology, covering the years 2004 to 2005. Six hundred and eighty-five stories were selected from four different sources.

  • WebKB dataset comprises 203 web pages organized into four divisions. Each web page is defined by its content, title text, and hyperlinks.

  • Citeseer contains 3,312 papers linked by 4,732 citations. Each publication is annotated with one of six labels: DB, IR, ML, Agents, AI, and HC.

  • Reuters is a compilation of English documents translated into four additional languages: Italian, French, Spanish, and German.

4.2 Evaluation indices

  1.

    ACC: It measures the rate at which data points are accurately assigned to the correct cluster. Given that \(f_{i}\) represents the actual label of the data point \(x_{i}\) and \(g_{i}\) represents the label produced by the algorithm, the ACC is computed as follows:

    $$\begin{aligned} \text {ACC} = \frac{{\sum \limits _{i = 1}^{{n_d}} {\rho ({f_i},\text {map}({g_i}))} }}{{{n_d}}} \end{aligned}$$
    (23)

    where \(\rho (x,y)\) is the indicator function, \(n_{d}\) is the total number of points, and the mapping function map(\(g_{i}\)) maps each clustering label to the best-matching true label (a sketch of this mapping is given after this list).

  2.

    NMI: It measures the mutual dependence between the true labels of the dataset and the labels generated by the proposed method.

    Given the actual label set \(\Omega \, = \,\{ {S_1},{S_2},...,{S_c}\} \) and the clustering's label set \(\Omega '\, = \,\{ S{'_1},S{'_2},...,S{'_k}\} \), let \(m_{x}\) and \(m'_{y}\) represent the numbers of data points in clusters \(S_{x}\) and \(S{'_y}\), respectively, \(m_{xy}\) the number of data points in the intersection of clusters \(S_{x}\) and \(S{'_y}\), and m the total number of data points; the NMI between \(\Omega \) and \(\Omega '\) is computed as follows:

    $$\begin{aligned} \text {NMI}\, = \,\frac{{\sum \limits _{x = 1}^c {\sum \limits _{y = 1}^k {{m_{xy}}\log \left( \frac{{m\,{m_{xy}}}}{{{m_x}m{'_y}}}\right) } } }}{{\sqrt{\left( {\sum \limits _{x = 1}^c {{m_x}\log \frac{{{m_x}}}{m}} } \right) \left( {\sum \limits _{y = 1}^k {m{'_y}\log \frac{{m{'_y}}}{m}} } \right) } }} \end{aligned}$$
    (24)
  3.

    F-score: The F-score is the harmonic mean of precision and recall, computed as:

    $$\begin{aligned} F{-}\text {score} = \frac{{2 \times P_n \times R_l}}{{P_n + R_l}} \end{aligned}$$
    (25)

    where \(P_n\) denotes precision and \(R_l\) denotes recall.
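As referenced in the ACC item above, a common way to realize the mapping function map(·) is the Hungarian algorithm; a minimal sketch (the Hungarian choice is an assumption of this example, and scikit-learn's normalized_mutual_info_score covers the NMI of Eq. (24)):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score  # NMI, Eq. (24)

def clustering_accuracy(y_true, y_pred):
    """ACC, Eq. (23): best one-to-one label mapping, then fraction correct."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                        # co-occurrence counts
    rows, cols = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[rows, cols].sum() / y_true.size
```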

4.3 Baseline methods

We evaluate CIWCFMvC against existing techniques, which are summarized in detail below.

  • MIC [14]: For each incomplete view, the MIC approach fills in average feature values to address the missing instances. It then applies weighted NMF with \(L_{2,1}\)-norm regularization.

  • DAIMC [25]: Considering both basis matrix alignment and instance alignment, DAIMC aims to obtain a common latent feature matrix for all views. It introduces a weight matrix for every incomplete view, giving each view's available instances a weight of one and its missing instances a weight of zero.

  • OMVC [15]: OMVC enforces sparsity in the acquired latent feature matrices through lasso regularization, thereby enhancing resilience to noise and outliers. Noteworthy is OMVC’s memory efficiency, as it circumvents the need to store the entire data matrix, resulting in reduced space complexity. The method processes data incrementally, simultaneously learning latent features and updating the basis matrix.

  • OPIMC [16]: OPIMC tackles the challenge of large-scale incomplete multi-view clustering by incorporating information about missing instances through weighted matrix factorization. It introduces two global statistics that facilitate direct clustering outcomes and effectively determine the conclusion of the iteration process.

  • CFTIMC [17]: The common local graph is learned from the completed multiple views by CFTIMC, which models the inter-view alignment relation to infer the missing samples. Lastly, CFTIMC generates the spectral embedding for k-means clustering using the common local graph.

  • GIMC-FLSD [24]: With the help of local graph regularization, GIMC-FLSD determines the common representation from imperfect data and gives each view a learnable weight.

  • UEAF [41]: By using dimension graph regularization, UEAF ensures that missing data are recovered, treating them as errors. Through the use of reverse graph constraints, it additionally guarantees the consensus structure of the completed views. The multi-view data that UEAF generates are then used to extract a common representation.

  • IMC-LRAGR [46]: To build graphs that capture both global and local data structures, the suggested approach combines non-negative restrictions with distance regularization terms inside low-rank representations. The low-dimensional representation of the graph is then obtained by using spectral clustering.

  • EEOMVC [47]: This method creates low-dimensional latent features, makes a single partition representation, and breaks down larger similarity graphs from anchor graphs for every view. The binary indicator matrix is directly generated via a label discretization process. Clustering results are improved by the method by combining latent information fusion and clustering into a unified framework.

  • EERIMVC [49]: This technique presents a regularization technique to enhance the effectiveness of clustering in spite of missing data. The technique generates a single clustering result by combining data from all accessible views, even if some views are lacking.

  • UOMvSC [50]: In this method, the unified graph is produced by utilizing the relationship between the graph and the inner product of the embedding matrix. Information from every view is combined into one unified graph. It is a one-step technique where the clustering labels are obtained directly from this unified graph.

4.4 Parameter study

This section analyses the sensitivity of the manually adjusted parameters \(\alpha \), \(\beta \) (=\(\gamma \)), and \(\eta \) under average clustering performance. The parameter \(\alpha \) is chosen from \(\big \{ 1e-1, 1e-2, 1e-3, 1e-4, 1e-5 \big \}\), \(\beta \) from \(\left\{ {10, 20, 30, 40, 50} \right\} \), and \(\eta \) from \(\left\{ {0.01,0.1,1,10,100} \right\} \). The performance for varying values of \(\eta \) and \(\alpha \) is shown in Fig. 1, and that for \(\beta \) in Fig. 2. These figures clearly show that the proposed method maintains consistent performance across a wide range of parameter values, providing strong evidence for its robustness against parameter variations.

Fig. 1 Parameter sensitivity on the compared datasets

Fig. 2 Parameter sensitivity on the compared datasets

4.5 Convergence study

Figure 3 shows the convergence of the objective function at missing rates of 0.1, 0.3, and 0.5. It is noteworthy that the method optimizes the given function while continuously meeting the convergence requirements, finding optimized values for the variables through iterative updates. As the analysis of Fig. 3 shows, the function value steadily declines as the number of iterations rises, attaining convergence within 30 iterations.

Fig. 3 Convergence rate on the benchmark datasets at 0.1, 0.3, and 0.5 missing rates

4.6 Experiment results

The clustering performance of the proposed approach is assessed using the commonly employed metrics F-score, ACC, and NMI. The corresponding results are presented in Tables 2, 3, and 4, with bold numbers highlighting the top performances. Drawing conclusions from the evaluated performance, we arrive at the following findings:

MIC used a weighted NMF method to replace the missing values in each data view with the corresponding average values. It did not, however, outperform the proposed technique, demonstrating the superiority and efficacy of our approach in improving performance. DAIMC outperformed the other compared techniques in clustering performance on the Wikipedia dataset; on the other datasets, our approach produced better evaluation outcomes, verifying the efficacy of our technique.

Even though OMVC handles missing multi-view data, on average it performed the worst of all the compared algorithms. In contrast, our approach handled incompleteness in multi-view data and demonstrated better clustering performance, achieving over 70% performance on datasets such as BBCSport, NGs, and BBC.

Using NMF and the Frobenius norm, OPIMC obtained the second-best results on all metrics. In contrast, our approach demonstrated the best clustering performance among the compared methods by utilizing weighted concept factorization and a co-regularization term to build the common consensus matrix.

CFTIMC achieved average clustering performance by combining the NMF approach with a common latent subspace and manifold learning, whereas our weighted concept factorization-based approach demonstrated better clustering performance on all evaluated datasets. While GIMC_FLSD improved over IMG in addressing missing instances, it was not able to outperform our approach, which exceeded other state-of-the-art methods with over 45% average performance across all criteria.

Similar to our approach, UEAF seeks to recover missing instances, but it did not outperform our method, which showed an average performance of over 50% across all metrics compared with other state-of-the-art techniques, demonstrating its efficacy in filling missing instances. IMC_LRAGR produced average clustering performance by combining the NMF approach with a common latent subspace and manifold learning; our weighted concept factorization approach again showed better clustering performance on all compared datasets.

The EEOMVC, EERIMVC, and UOMvSC algorithms cluster multi-view data well on the majority of datasets. However, across all datasets, our algorithm outperforms the competition and shows the best results.

In summary, our approach performs better than the current methods on real-world datasets, as shown by Tables 2, 3, and 4. This highlights the superior performance of our method, which leverages a smooth regularization term to reduce over-fitting between views and a co-regularization term to reveal the shared consensus structure of the data.

Table 2 Average and standard deviation of ACC (%) of different approaches
Table 3 Average and standard deviation of NMI (%) of different approaches
Table 4 Average and standard deviation of F-score (%) of different approaches

5 Conclusion

In this study, we explore the challenge of dealing with incomplete views in multi-view clustering, where each view is affected by the absence of certain instances. We propose the CIWCFMvC model, which builds on matrix factorization and uses weighted concept factorization to reduce the disagreement between the multiple views and a common consensus matrix. Moreover, the weight of each view is adjusted automatically throughout the optimization process. Finally, a novel iterative technique is used to optimize the proposed objective function of CIWCFMvC. Comprehensive tests on benchmark datasets confirm that CIWCFMvC outperforms existing methods.