1 Introduction

Non-negative Matrix Factorization (NMF) is one of the most powerful tools in numerical computing, machine learning and data mining. It decomposes an original high-dimensional non-negative data matrix X into two low-dimensional non-negative factors U and V, namely the basis matrix and the coefficient matrix, such that \(X \approx UV^{T}\). Moreover, the additive reconstruction under non-negativity constraints can lead to a parts-based representation for images [1], texts [2], microarray data [3], and so on.

To decompose the given data into a component part that encodes low-rank sparse principal features and a noise-fitting error part, Zhang et al. proposed a transductive low-rank and sparse principal feature coding (LSPFC) model to recover the low-rank and sparse subspaces jointly for robust data representation [4]. By imposing orthogonality constraints on NMF, Ding et al. proposed the orthogonal non-negative matrix factorization (ONMF) method [5], which has been shown to work remarkably well for clustering tasks. Pompili et al. showed that ONMF is mathematically equivalent to a weighted variant of spherical k-means [6]. Motivated by recent progress in manifold learning and orthogonal projection, in this paper we propose a novel algorithm, called Orthogonal Dual-graph regularized Non-negative Matrix Factorization (ODNMF), which combines orthogonal constraints and dual-graph manifold regularization in a parts-based low-dimensional space.

From a geometric perspective, data samples are usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. Based on this assumption, Cai et al. proposed the graph regularized NMF (GNMF) method [7], which incorporates manifold learning into NMF to find a compact low-dimensional representation that discovers the latent semantics and the intrinsic geometric structure of the dataset simultaneously. Since the sparse hypergraph inherits the merits of both the hypergraph model and sparse representation, S. Huang et al. proposed Sparse Hypergraph regularized NMF (SHNMF) [8] to exploit high-order discriminant manifold information for data representation. To depict complex relations among samples, J. Wang et al. encoded different sample relations into multiple graphs and proposed Multiple Graph regularized NMF (MGNMF) [9]; however, the construction of multiple graphs is very time consuming. Motivated by recent studies on sample diversity learning [10], C. Wang et al. proposed Graph regularized NMF with Sample Diversity (GNMFSD) [11], which incorporates label information into the graph to encode the intrinsic geometrical structure of the data space, thereby enhancing the discriminative power of the basis vectors. By introducing an additional regression level, Mei et al. proposed a general method for including side information on the columns and rows in NMF, which can be used for time series recovery and prediction [12].

Recent works have shown that not only do the original data samples lie on a nonlinear low-dimensional manifold, called the data manifold, but the feature vectors also lie on a manifold, called the feature manifold [13]. Considering the duality between data samples and feature vectors, several dual-graph based data representation algorithms have been proposed and shown to be superior to traditional data-manifold regularization methods. Shang et al. [14] proposed graph Dual regularization Non-negative Matrix Factorization (DNMF). To improve the overall performance of recommender systems, Yao et al. [15] proposed a dual-regularized model for one-class collaborative filtering, which encodes the side information from both users and items as two graph regularization terms. To estimate the fractions of different land cover types from remote sensing imagery, Tong et al. [16] applied DNMF to hyperspectral unmixing and showed that DNMF performs better than GNMF. Yang et al. [17] applied DNMF to discover real-world events from Flickr data. By imposing the L2,1-norm constraint on the self-representation coefficient matrix in the data space, Shang et al. proposed self-representation based dual-graph regularized feature selection clustering (DFSC) [18] and non-negative spectral learning and sparse regression-based dual-graph regularized feature selection (NSSRD) [19]. To incorporate the graph topology, Yankelevsky et al. [20] proposed a dual regularized dictionary learning method, which imposes a quadratic smoothness constraint on the dictionary bases and a manifold smoothness regularization on the sparse representations. Luo et al. proposed Dual-regularized Multi-view Non-negative Matrix Factorization (DMvNMF) [21] to deal with multi-view data, which can extract the compatible and complementary information contained in multiple modalities while preserving the geometric structures in both the data space and the feature space. To reduce the redundancy between bases and representations in the GNMF model, He et al. proposed Orthogonal GNMF (OGNMF) [22], which incorporates three kinds of orthogonal constraints into the GNMF model.

In summary, our main contributions in this paper are listed below.

  1. Orthogonal dual-graph regularized NMF models are proposed, in which three types of orthogonal constraints are added, including U orthogonal, V orthogonal and bi-orthogonal. In this way, the potential geometrical structural information can be preserved during the process of data representation, which can effectively enhance the discriminative ability of clustering.

  2. The multiplicative iterative updating rules derived from the original NMF are used to optimize the proposed ODNMF models.

  3. Comprehensive experiments on ten real-world datasets are conducted to show the effectiveness of the proposed three algorithms and to demonstrate their advantages over other state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly reviews related works, including NMF, ONMF, GNMF and DNMF. In Sect. 3, we propose three types of ODNMF models and their optimization algorithms. Extensive experiments are presented in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related Works

2.1 NMF

NMF aims to express each sample \(x_{i}\) as a linear combination of the basis vectors, which are the columns of U, namely \(x_{i} \approx Uv_{i}\), where \(v_{i}^{T}\) is the ith row of V. Such a factorization is generally obtained by minimizing the following cost function:

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0 \end{aligned} $$
(1)

where \(X = \left[ {x_{1} ,x_{2} , \cdots ,x_{n} } \right] \in R^{d \times n}\) denotes the original data matrix, which contains n data samples of dimensionality d, \(U \in R^{d \times r}\) denotes the basis matrix, and \(V \in R^{n \times r}\) denotes the coefficient matrix, where r is the reduced target dimensionality of the projected low-dimensional data. The NMF model (1) can be optimized by using the following multiplicative updating rules [1]:

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(2)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} }} $$
(3)

Since the objective function in (1) is not jointly convex in U and V, the above iterative updating rules can only find a local minimum of the objective function. The convergence of the optimization algorithm is proved in [1].
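To make the procedure concrete, the following NumPy sketch implements the multiplicative updates (2)-(3). It is an illustrative re-implementation, not the authors' code: the function and variable names are our own, and a small constant eps is added to the denominators to avoid division by zero, a common implementation detail not spelled out above.

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-10, seed=0):
    """Basic NMF: X (d x n) is approximated by U (d x r) @ V.T (r x n)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)    # update rule (2)
        V *= (X.T @ U) / (V @ U.T @ U + eps)  # update rule (3)
    return U, V
```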

2.2 ONMF

ONMF is a variant of NMF that approximates the data matrix by the product of two low-rank nonnegative matrices, one of which has orthonormal columns. For example, when the coefficient matrix V has orthonormal columns, the corresponding optimization model is

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0,\;V^{T} V = I \end{aligned} $$
(4)

Since the column vectors of the coefficient matrix V are orthonormal, the power of ONMF lies in its potential to generate sparser parts-based decompositions of the data with smaller basis vectors, which are easier to interpret.

The updating rules to optimize the above objective function (4) are given as follows

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(5)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} }} $$
(6)
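As a hedged illustration of how the V-orthogonal variant differs from plain NMF, the sketch below follows updates (5)-(6); only the denominator of the V update changes. The names and the eps safeguard are our own assumptions, as in the NMF sketch above.

```python
import numpy as np

def onmf_v(X, r, n_iter=200, eps=1e-10, seed=0):
    """ONMF with orthogonality on V, following updates (5)-(6)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)          # rule (5), same as plain NMF
        V *= (X.T @ U) / (V @ (U.T @ X) @ V + eps)  # rule (6): denominator V U^T X V
    return U, V
```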

2.3 GNMF

GNMF incorporates manifold learning into classical NMF to find a compact representation, in which the local geometric structure of the data manifold is modeled by a neighborhood graph.

Let \(G(X,W)\) denote a graph with vertex set \(X = \left\{ {x_{1} ,x_{2} , \cdots ,x_{n} } \right\}\) and weight matrix W, whose entry \(W_{ij}\) denotes the similarity between samples \(x_{i}\) and \(x_{j}\). There are several ways to compute the similarity weight matrix W; we use the Gaussian (heat) kernel function as follows:

$$ W_{ij} = \left\{ \begin{array}{ll} e^{ - \frac{{\left\| {x_{i} - x_{j} } \right\|_{2}^{2} }}{t}} & \quad x_{i} \in N_{k} (x_{j} )\; \vee \;x_{j} \in N_{k} (x_{i} ) \\ 0 & \quad otherwise \end{array} \right. $$
(7)

where \(N_{k} (x_{i} )\) is the set of the k nearest neighbors of \(x_{i}\), and t is the heat kernel parameter. Since G is an undirected graph, the weight matrix W is symmetric and the diagonal entries of W are all 0.
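A possible NumPy construction of the heat-kernel weight matrix in Eq. (7) is sketched below; the paper does not specify implementation details, so the helper name and the symmetrization step are our own choices.

```python
import numpy as np

def heat_kernel_graph(X, k=5, t=1.0):
    """Build the n x n weight matrix of Eq. (7); columns of X (d x n) are samples."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)  # squared distances
    np.fill_diagonal(D2, np.inf)              # exclude self-neighbors
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(D2[i])[:k]            # k nearest neighbors of x_i
        W[i, nn] = np.exp(-D2[i, nn] / t)
    return np.maximum(W, W.T)                 # keep an edge if either sample is a neighbor of the other
```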

GNMF aims to learn a collection of nonnegative bases that not only minimize the reconstruction error but also preserve the pairwise sample similarities encoded in the neighborhood graph. Therefore, the optimal nonnegative basis U can be obtained by minimizing the following objective:

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} LV} \right) \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0 \end{aligned} $$
(8)

where \(L = D - W\) is the graph Laplacian matrix, D is a diagonal matrix whose entries are \(D_{ii} = \sum\limits_{j = 1}^{n} {W_{ij} }\), and W is the weight matrix measuring the similarity between nearby data samples. \(\lambda\) is a regularization parameter which balances the reconstruction error and the graph embedding term. Since the graph embedding regularizer smooths the variation between two samples with large similarity in the latent low-dimensional space, the clustering performance of GNMF is effectively improved.

Similar to Eq. (2) and (3), the iterative updating rules to solve (8) are presented as follows:

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(9)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U + \lambda WV} \right)_{ij} }}{{\left( {VU^{T} U + \lambda DV} \right)_{ij} }} $$
(10)
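For completeness, a minimal sketch of the GNMF iterations (9)-(10) is given below, assuming the weight matrix W has been built as in Eq. (7). Again, this is an illustrative NumPy version under our own naming conventions rather than the original implementation.

```python
import numpy as np

def gnmf(X, W, r, lam=100.0, n_iter=200, eps=1e-10, seed=0):
    """GNMF via rules (9)-(10); W is the n x n sample graph of Eq. (7)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = np.diag(W.sum(axis=1))                # degree matrix
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)                                     # rule (9)
        V *= (X.T @ U + lam * (W @ V)) / (V @ U.T @ U + lam * (D @ V) + eps)   # rule (10)
    return U, V
```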

2.4 DNMF

DNMF is an extension of GNMF that imposes graph regularization on both the basis matrix and the coefficient matrix to discover the geometrical structure of the data. Similar to GNMF, DNMF constructs k-nearest-neighbor graphs on both the data samples and the feature vectors. Suppose the columns of the data matrix X are \(\{ X_{:1} ,X_{:2} ,...,X_{:N} \}\); the weight between two neighboring samples can be defined as:

$$ [W^{V} ]_{ij} = \left\{ \begin{array}{ll} 1 & \quad {\rm{if}}\;X_{:,j} \in N(X_{:,i} ) \\ 0 & \quad otherwise \end{array} \right.\quad i,j = 1, \ldots ,N $$
(11)

where \(N(X_{:,i} )\) is the set of the k nearest neighbor samples of \(X_{:,i}\). The graph Laplacian matrix is \(L_{V} = D^{V} - W^{V}\), where \(D^{V}\) is a diagonal matrix whose diagonal elements are the row sums of \(W^{V}\), i.e., \([D^{V} ]_{ii} = \sum\nolimits_{j} {[W^{V} ]_{ij} }\).

Similarly, the rows of the data matrix X correspond to the feature vectors, i.e., \(\{ X_{1,:}^{T} ,X_{2,:}^{T} ,...,X_{M,:}^{T} \}\), and the weight between feature vectors can be defined as:

$$ [W^{U} ]_{ij} = \left\{ \begin{array}{ll} 1 & \quad {\rm{if}}\;X_{j,:}^{T} \in N(X_{i,:}^{T} ) \\ 0 & \quad otherwise \end{array} \right.\quad i,j = 1, \ldots ,M $$
(12)

where \(N(X_{i,:}^{T} )\) is the set of the k nearest neighboring feature vectors of \(X_{i,:}^{T}\). The corresponding graph Laplacian matrix is \(L_{U} = D^{U} - W^{U}\), where \(D^{U}\) is a diagonal matrix whose diagonal elements are the row sums of \(W^{U}\), i.e., \([D^{U} ]_{ii} = \sum\nolimits_{j} {[W^{U} ]_{ij} }\).

Similar to GNMF, the DNMF model can be formulated as

$$ \begin{aligned} & J_{DNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0 \end{aligned} $$
(13)

where λ and μ are regularization parameters. When μ tends to 0, DNMF reduces to GNMF; when both λ and μ tend to 0, DNMF reduces to NMF. The objective function in Eq. (13) can be minimized through the following update rules:

$$ U_{ij} = U_{ij} \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(14)
$$ V_{ij} = V_{ij} \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(15)
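The DNMF iterations (14)-(15) can be sketched in the same style. The sample graph W^V and feature graph W^U are assumed to have been built as in Eqs. (11)-(12) (or with heat-kernel weights as in Eq. (7)); the names below are our own.

```python
import numpy as np

def dnmf(X, WV, WU, r, lam=100.0, mu=100.0, n_iter=200, eps=1e-10, seed=0):
    """DNMF via rules (14)-(15); WV is the n x n sample graph, WU the d x d feature graph."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    DV = np.diag(WV.sum(axis=1))
    DU = np.diag(WU.sum(axis=1))
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V + mu * (WU @ U)) / (U @ V.T @ V + mu * (DU @ U) + eps)       # rule (14)
        V *= (X.T @ U + lam * (WV @ V)) / (V @ U.T @ U + lam * (DV @ V) + eps)   # rule (15)
    return U, V
```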

3 Proposed Methods

Orthogonal projections have been shown to be more discriminative in most cases, since they can also reduce the correlation among the projection directions or low-dimensional representations. With orthogonal constraints on the basis matrix and the coefficient matrix, the clustering performance of the DNMF model is improved. ODNMF inherits the advantages of ONMF and DNMF: it captures the local similarity structures of the data space and the feature space, which are helpful for extracting discriminative features, while preserving the global geometrical structure of the data space so as to retain the intrinsic information as much as possible.

3.1 ODNMF-U

Adding the orthogonal constraint on basis matrix U, we have the following orthogonal dual-graph regularized NMF with U orthogonality (ODNMF-U):

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;U^{T} U = I \end{aligned} $$
(16)

The objective function in model (16) can be rewritten as:

$$ \begin{aligned} J_{ODNMF} & = \,tr((X - UV^{T} )(X - UV^{T} )^{T} ) \\ & + \,\lambda tr(V^{T} L_{V} V) + \mu tr(U^{T} L_{U} U) \\ & = \,tr(XX^{T} ) - 2tr(XVU^{T} ) + tr(UV^{T} VU^{T} ) \\ & + \,\lambda tr(V^{T} L_{V} V) + \mu tr(U^{T} L_{U} U) \\ \end{aligned} $$
(17)

The partial derivatives of objective function (17) with respect to U and V are:

$$ \begin{aligned} \nabla_{U} J & = [\nabla_{U} J]^{ + } - [\nabla_{U} J]^{ - } \\ & = UV^{T} V - XV + \mu L_{U} U \\ & = \nabla_{U} \varepsilon + \mu L_{U} U \end{aligned} $$
(18)
$$ \begin{aligned} \nabla_{V} J & = [\nabla_{V} J]^{ + } - [\nabla_{V} J]^{ - } \\ & = VU^{T} U - X^{T} U + \lambda L_{V} V \\ & = \nabla_{V} \varepsilon + \lambda L_{V} V \end{aligned} $$
(19)

where \(L_{U} = D^{U} - W^{U}\) and \(L_{V} = D^{V} - W^{V}\) are the graph Laplacians built from the weight matrices defined in Eq. (11) and (12). Let \(\nabla_{U} \varepsilon = UV^{T} V - XV\) and \(\nabla_{V} \varepsilon = VU^{T} U - X^{T} U\). When the orthogonality constraint is imposed on U, the unconstrained gradient term becomes

$$ \nabla_{U} \varepsilon = UV^{T} X^{T} U - XV $$
(20)
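One way to see where Eq. (20) comes from (a sketch of a standard argument for orthogonality-constrained multiplicative updates; the authors' own derivation may differ in detail) is to correct the unconstrained gradient by the term \(U(\nabla_{U} \varepsilon )^{T} U\) and then apply \(U^{T} U = I\):

$$ \nabla_{U} \varepsilon - U(\nabla_{U} \varepsilon )^{T} U = UV^{T} V - XV - U\left( {V^{T} VU^{T} - V^{T} X^{T} } \right)U = UV^{T} X^{T} U - XV $$

since \(UV^{T} VU^{T} U = UV^{T} V\) cancels the first term once \(U^{T} U = I\) is used.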

Substituting Eq. (20) into Eq. (18), we have the following updating rule on U,

$$ U_{ij} = U_{ij} \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} X^{T} U} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(21)

Similarly, the updating rule on V is,

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(22)

3.2 ODNMF-V

By incorporating an orthogonality constraint on the coefficient matrix V into the DNMF model, we obtain the following ODNMF-V model:

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;V^{T} V = I \end{aligned} $$
(23)

The partial derivatives of the objective function in (23) with respect to U and V are the same as in Eq. (18) and (19). Similarly, the orthogonality constraint on V leads to the following formula:

$$ \begin{aligned} \nabla_{V} \varepsilon & = [\nabla_{V} \varepsilon ]^{ + } - [\nabla_{V} \varepsilon ]^{ - } \\ & = VU^{T} XV - X^{T} U \\ \end{aligned} $$
(24)

Substituting Eq. (24) into Eq. (19), we have,

$$ \begin{aligned} \nabla_{V} J & = [\nabla_{V} J]^{ + } - [\nabla_{V} J]^{ - } \\ & = \nabla_{V} \varepsilon + \lambda L_{V} V \\ & = VU^{T} XV - X^{T} U + \lambda L_{V} V \end{aligned} $$
(25)

Then the updating rules on V and U can be formulated as follows,

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(26)
$$ U_{ij} = U_{ij} \, \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(27)

3.3 ODNMF

ODNMF imposes the orthogonal constraints on both the basis matrix U and coefficient matrix V, which involves the following minimization problem,

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;V^{T} V = I,\;U^{T} U = I \end{aligned} $$
(28)

The orthogonal constraints on the basis matrix U and the coefficient matrix V provide a strong capability of simultaneously clustering rows and columns, which we call bi-orthogonal constraints. Combining the update rule (21) of ODNMF-U and (26) of ODNMF-V, the update rules of ODNMF are given as follows:

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(29)
$$ U_{ij} = U_{ij} \, \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} X^{T} U} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(30)
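A compact NumPy sketch of the bi-orthogonal ODNMF iterations (29)-(30) is given below (illustrative only; names, initialization and the eps safeguard are our own assumptions). The ODNMF-U and ODNMF-V variants are obtained by replacing only one of the two denominators with its orthogonalized form, following Eqs. (21)-(22) and (26)-(27) respectively.

```python
import numpy as np

def odnmf(X, WV, WU, r, lam=100.0, mu=100.0, n_iter=200, eps=1e-10, seed=0):
    """Bi-orthogonal ODNMF via rules (29)-(30); WV: n x n sample graph, WU: d x d feature graph."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    DV = np.diag(WV.sum(axis=1))
    DU = np.diag(WU.sum(axis=1))
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        V *= (X.T @ U + lam * (WV @ V)) / (V @ (U.T @ X) @ V + lam * (DV @ V) + eps)   # rule (29)
        U *= (X @ V + mu * (WU @ U)) / (U @ (V.T @ X.T) @ U + mu * (DU @ U) + eps)     # rule (30)
    return U, V
```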

4 Experimental Results

In this section, we evaluate the effectiveness of the proposed ODNMF framework by comparing it with related matrix factorization methods, including NMF, ONMF, GNMF and DNMF.

4.1 Experimental Setting

To investigate the clustering performance, two popular evaluation metrics are used in the experiments, i.e., the clustering accuracy (AC) and the normalized mutual information (NMI) [10], which are defined as follows:

$$ AC = \frac{{\sum\limits_{i = 1}^{n} {\delta \left( {l_{i} ,\tau_{i} } \right)} }}{n} $$

where \(\delta\) is a cluster label indicator function, which equals 1 if the two entries are the same and 0 otherwise; \(l_{i}\) and \(\tau_{i}\) denote the true label and the predicted label of the ith sample, respectively.

$$ NMI = \frac{{MI\left( {C,C^{\prime}} \right)}}{{\max \left( {H(C),H(C^{\prime})} \right)}} $$

where C is the set of clusters from the ground truth and \(C^{\prime}\) is the set of clusters predicted by the clustering method. \(H( \cdot )\) is the entropy of a set, and the mutual information \(MI\left( {C,C^{\prime}} \right)\) between the two sets of clusters C and \(C^{\prime}\) is defined as

$$ MI\left( {C,C^{\prime}} \right) = \sum\limits_{{c_{i} \in C,c^{\prime}_{j} \in C^{\prime}}}^{{}} {p\left( {c_{i} ,c^{\prime}_{j} } \right)\log \frac{{p\left( {c_{i} ,c^{\prime}_{j} } \right)}}{{p\left( {c_{i} } \right)p\left( {c^{\prime}_{j} } \right)}}} $$

where \(p\left( {c_{i} } \right)\) and \(p\left( {c^{\prime}_{j} } \right)\) are the probabilities that a sample belongs to the clusters \(c_{i}\) and \(c^{\prime}_{j}\), respectively, and \(p\left( {c_{i} ,c^{\prime}_{j} } \right)\) denotes the joint probability that the selected sample belongs to both \(c_{i}\) and \(c^{\prime}_{j}\) at the same time. Larger AC and NMI values indicate better clustering performance [6].
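The AC formula above implicitly assumes that the predicted cluster labels have been matched to the ground-truth labels; a common choice for this matching is the Hungarian (Kuhn-Munkres) algorithm. The sketch below computes both metrics (helper names are our own; NMI relies on scikit-learn with the 'max' normalization, which matches the definition above).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: fraction of correct assignments under the best one-to-one label matching."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels_true, labels_pred = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((labels_pred.size, labels_true.size))
    for i, lp in enumerate(labels_pred):
        for j, lt in enumerate(labels_true):
            cost[i, j] = -np.sum((y_pred == lp) & (y_true == lt))
    row, col = linear_sum_assignment(cost)    # Hungarian matching
    return -cost[row, col].sum() / y_true.size

def nmi(y_true, y_pred):
    """NMI normalized by max(H(C), H(C')), as in the definition above."""
    return normalized_mutual_info_score(y_true, y_pred, average_method="max")
```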

For the ODNMF methods, the neighborhood graphs on data samples are constructed in the same way as for GNMF, where the number of nearest neighbors of each sample is fixed as 5, and the regularization parameters in models (8), (13), (16), (23) and (28) are empirically set to 100. After the low-dimensional data representations were produced by the matrix factorization methods, k-means clustering was performed in the low-dimensional data space. All the experiments in this paper were run in MATLAB R2015b on Windows 7 with 8 GB RAM and a 3.4 GHz CPU.

4.2 Datasets Description

In our experiments, we use three types of datasets: UCI datasets, text datasets, and face image datasets. For the UCI datasets, MADELON is an artificial dataset that was part of the NIPS 2003 feature selection challenge; it contains data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or −1. CNAE9 is a highly sparse text dataset, which contains 1080 documents in 9 groups. Semeion is a handwritten digit dataset, which contains 1593 binary images created by 80 persons, each image of size 16 × 16. Hillvalley is a graphical dataset in which each sample contains 100 points on a 2D plane; there are 606 samples in two groups, hill or valley. For the text datasets, Tr31 is a subset of the TREC collection and Re1 is a subset of the Reuters21578 dataset. For the face image datasets, the UMIST dataset consists of 575 images of 20 different persons, with 19 to 36 images per person. The XM2VTS dataset contains 2950 samples of 295 persons, and the YaleB dataset used in our experiments is a subset of the original YaleB dataset, containing 2414 images of 38 persons under 64 lighting conditions. The ORL dataset contains 400 images of 40 individuals, each of whom has 10 images captured under different conditions, including shooting time, illumination, expression and other facial details such as with/without glasses. Some statistics of these datasets are summarized in Table 1.

Table 1. The description of datasets used in the experiments

4.3 Results Discussion

The optimal clustering results on the different datasets are reported in this section. The clustering accuracies are summarized in Table 2 and the NMI values in Table 3. The value in parentheses is the optimal target dimensionality at which the method achieved its maximum evaluation metric. As can be seen, the proposed orthogonal DNMF methods achieve better clustering performance than the other methods in most cases. It is worth noting that the original DNMF and the orthogonal DNMFs perform perfectly on the Hillvalley dataset. This is because the feature (attribute) neighborhood information plays a larger role in identifying a hill or a valley on a 2D graph. However, on the CNAE9, Tr31 and Re1 datasets, the traditional NMF method performs better than the other methods, which indicates that dual-graph regularized NMF is less suitable for text clustering tasks.

Table 2. The optimal clustering results (AC%)
Table 3. The optimal clustering results (NMI%)

5 Conclusions

Most existing data representation methods are carried out only in the data space, so the information of the feature space cannot be fully exploited. To compensate for this drawback, orthogonal dual-graph regularized NMF (ODNMF) is proposed by combining DNMF and ONMF. Orthogonal constraints on the basis matrix U and the coefficient matrix V are incorporated as additional conditions, which can not only make full use of the geometrical structural information underlying the data and feature manifolds, but also enhance the generalization performance and robustness of the proposed methods. Since ONMF is more robust to noise and DNMF performs well on data clustering, ODNMF combines the advantages of both, which has been confirmed by the data clustering experiments. A deeper theoretical analysis of the models will be investigated in future work.