1 Introduction

Non-negative Matrix Factorization (NMF) is one of the most powerful tools in numerical computing, machine learning and data mining. It decomposes an original high-dimensional non-negative data matrix X into two low-dimensional non-negative factors U and V, namely the basis matrix and the coefficient matrix, such that \(X \approx UV^{T}\). Moreover, the additive reconstruction under non-negativity constraints can lead to a parts-based representation for images [1], texts [2], microarray data [3], and so on.

To decompose the given data into a component part that encodes low-rank sparse principal features and a noise-fitting error part, Zhang et al. proposed a transductive low-rank and sparse principal feature coding (LSPFC) model to recover the low-rank and sparse subspaces jointly for robust data representation [4]. By imposing orthogonality constraints on NMF, Ding et al. proposed the orthogonal non-negative matrix factorization (ONMF) method [5], which has been shown to work remarkably well for clustering tasks. Pompili et al. showed that ONMF is mathematically equivalent to a weighted variant of spherical k-means [6]. Motivated by recent progress in manifold learning and orthogonal projection, in this paper we propose a novel algorithm, called Orthogonal Dual-graph regularized Non-negative Matrix Factorization (ODNMF), which combines orthogonal constraints and dual-graph manifold regularization in a parts-based low-dimensional space.

From a geometric perspective, data samples are usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. Based on this assumption, Cai et al. proposed the graph regularized NMF (GNMF) method [7], which incorporates manifold learning into NMF to find a compact low-dimensional representation that discovers the latent semantics and the intrinsic geometric structure of the dataset simultaneously. Since the sparse hypergraph inherits the merits of both the hypergraph model and sparse representation, S. Huang et al. proposed Sparse Hypergraph regularized NMF (SHNMF) [8] to exploit high-order discriminant manifold information for data representation. To depict complex relations among samples, J. Wang et al. encoded different sample relations into multiple graphs and proposed Multiple Graph regularized NMF (MGNMF) [9]; however, the construction of multiple graphs is very time consuming. Motivated by recent studies on sample diversity learning [10], C. Wang et al. proposed Graph regularized NMF with Sample Diversity (GNMFSD) [11], which incorporates label information into the graph to encode the intrinsic geometrical structure of the data space, thereby enhancing the discriminative power of the basis vectors. By introducing an additional regression level, Mei et al. proposed a general method for including side information on the columns and rows in NMF, which can be used for time series recovery and prediction [12].

Recent works have shown that not only do the original data samples lie on a nonlinear low-dimensional manifold, called the data manifold, but the feature vectors also lie on a manifold, called the feature manifold [13]. Considering the duality between data samples and feature vectors, several dual-graph based data representation algorithms have been proposed and shown to be superior to traditional data-manifold regularization methods. Shang et al. [14] proposed graph Dual regularization Non-negative Matrix Factorization (DNMF). To improve the overall performance of recommender systems, Yao et al. [15] proposed a dual-regularized model for one-class collaborative filtering, which encodes the side information from both users and items as two graph regularization terms. To estimate the fractions of different land cover types from remote sensing imagery, Tong et al. [16] applied DNMF to hyperspectral unmixing and showed that DNMF performs better than GNMF. Yang et al. [17] applied DNMF to discover real-world events from Flickr data. By imposing the L2,1-norm constraint on the self-representation coefficient matrix in the data space, Shang et al. proposed self-representation based dual-graph regularized feature selection clustering (DFSC) [18] and non-negative spectral learning and sparse regression-based dual-graph regularized feature selection (NSSRD) [19]. To incorporate the graph topology, Yankelevsky et al. [20] proposed a dual regularized dictionary learning method, which imposes a quadratic smoothness constraint on the dictionary bases and a manifold smoothness regularization on the sparse representations. Luo et al. proposed Dual-regularized Multi-view Non-negative Matrix Factorization (DMvNMF) [21] to deal with multi-view data, which can extract the compatible and complementary information contained in multiple modalities while preserving the geometric structures in both the data space and the feature space. To reduce the redundancy between bases and representations in the GNMF model, He et al. proposed Orthogonal GNMF (OGNMF) [22], which incorporates three kinds of orthogonal constraints into the GNMF model.

In summary, our main contributions in this paper are listed below.

  1. Orthogonal dual-graph regularized NMF models are proposed, in which three types of orthogonal constraints are added, including U orthogonal, V orthogonal and bi-orthogonal. In this way, the potential geometrical structural information can be preserved during the process of data representation, which can effectively enhance the discriminative ability of clustering.

  2. The multiplicative iterative updating rules derived from the original NMF are used to optimize the proposed ODNMF models.

  3. Comprehensive experiments on ten real-world datasets are conducted to show the effectiveness of the proposed three algorithms and to demonstrate their advantages over other state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly reviews related works, including NMF, ONMF, GNMF and DNMF. In Sect. 3, we propose three types of ODNMF models and their optimization algorithms. Extensive experiments are presented in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related Works

2.1 NMF

NMF aims to express each sample \(x_{i}\) as a linear combination of the basis vectors, which are the columns of U, namely \(x_{i} \approx Uv_{i}\), where \(v_{i}^{T}\) is the ith row of V. Such a factorization is generally obtained by minimizing the following cost function:

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0 \end{aligned} $$
(1)

where \(X = \left[ {x_{1} ,x_{2} , \cdots ,x_{n} } \right] \in R^{d \times n}\) denotes the original data matrix, which contains n data samples of dimensionality d, \(U \in R^{d \times r}\) denotes the basis matrix, and \(V \in R^{n \times r}\) denotes the coefficient matrix, where r is the reduced target dimensionality of the projected low-dimensional data. The NMF model (1) can be optimized by using the following multiplicative updating rules [1]:

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(2)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} }} $$
(3)

Since the objective function in (1) is not jointly convex in U and V, the above iterative updating rules can only find a local minimum of the objective function. The convergence of the optimization algorithm is proved in [1].
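To make the procedure concrete, the following NumPy sketch implements the multiplicative updates (2)-(3). It is an illustrative re-implementation, not the authors' code: the function and variable names are our own, and a small constant eps is added to the denominators to avoid division by zero, a common implementation detail not spelled out above.

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-10, seed=0):
    """Basic NMF: X (d x n) is approximated by U (d x r) @ V.T (r x n)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)    # update rule (2)
        V *= (X.T @ U) / (V @ U.T @ U + eps)  # update rule (3)
    return U, V
```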

2.2 ONMF

ONMF is a variant of NMF that approximates the data matrix by the product of two low-rank nonnegative matrices, one of which has orthonormal columns. For example, when the coefficient matrix V has orthonormal columns, the corresponding optimization model is

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0,\;V^{T} V = I \end{aligned} $$
(4)

Since the column vectors of the coefficient matrix V are orthonormal, the power of ONMF lies in its potential to generate sparser parts-based decompositions of the data with smaller basis vectors, which are easier to interpret.

The updating rules to optimize the above objective function (4) are given as follows

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(5)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} }} $$
(6)
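As a hedged illustration of how the V-orthogonal variant differs from plain NMF, the sketch below follows updates (5)-(6); only the denominator of the V update changes. The names and the eps safeguard are our own assumptions, as in the NMF sketch above.

```python
import numpy as np

def onmf_v(X, r, n_iter=200, eps=1e-10, seed=0):
    """ONMF with orthogonality on V, following updates (5)-(6)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)          # rule (5), same as plain NMF
        V *= (X.T @ U) / (V @ (U.T @ X) @ V + eps)  # rule (6): denominator V U^T X V
    return U, V
```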

2.3 GNMF

GNMF incorporates manifold learning into classical NMF to find a compact representation, in which the local geometric structure of the data manifold is modeled by a neighborhood graph.

Let \(G(X,W)\) denote a graph with vertex set \(X = \left\{ {x_{1} ,x_{2} , \cdots ,x_{n} } \right\}\) and weight matrix W, whose entry \(W_{ij}\) denotes the similarity between samples \(x_{i}\) and \(x_{j}\). There are several ways to compute the similarity weight matrix W; we use the Gaussian (heat) kernel function as follows:

$$ W_{ij} = \left\{ \begin{array}{ll} e^{ - \frac{{\left\| {x_{i} - x_{j} } \right\|_{2}^{2} }}{t}} & \quad x_{i} \in N_{k} (x_{j} )\; \vee \;x_{j} \in N_{k} (x_{i} ) \\ 0 & \quad otherwise \end{array} \right. $$
(7)

where \(N_{k} (x_{i} )\) is the set of the k nearest neighbors of \(x_{i}\), and t is the heat kernel parameter. Since G is an undirected graph, the weight matrix W is symmetric and the diagonal entries of W are all 0.
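A possible NumPy construction of the heat-kernel weight matrix in Eq. (7) is sketched below; the paper does not specify implementation details, so the helper name and the symmetrization step are our own choices.

```python
import numpy as np

def heat_kernel_graph(X, k=5, t=1.0):
    """Build the n x n weight matrix of Eq. (7); columns of X (d x n) are samples."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)  # squared distances
    np.fill_diagonal(D2, np.inf)              # exclude self-neighbors
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(D2[i])[:k]            # k nearest neighbors of x_i
        W[i, nn] = np.exp(-D2[i, nn] / t)
    return np.maximum(W, W.T)                 # keep an edge if either sample is a neighbor of the other
```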

GNMF aims to learn a collection of nonnegative bases that not only minimize the reconstruction error but also preserve the pairwise sample similarities encoded in the neighborhood graph. Therefore, the optimal nonnegative basis U can be obtained by minimizing the following objective:

$$ \begin{aligned} & \mathop {\min }\limits_{U,V} \;\left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} LV} \right) \\ & s.t.\quad u_{ij} \ge 0,\;v_{ij} \ge 0 \end{aligned} $$
(8)

where \(L = D - W\) is the graph Laplacian matrix, D is a diagonal matrix whose entries are \(D_{ii} = \sum\limits_{j = 1}^{n} {W_{ij} }\), and W is the weight matrix measuring the similarity between nearby data samples. \(\lambda\) is a regularization parameter which balances the reconstruction error and the graph embedding term. Since the graph embedding regularizer smooths the variation between two samples with large similarity in the latent low-dimensional space, the clustering performance of GNMF is effectively improved.

Similar to Eq. (2) and (3), the iterative updating rules to solve (8) are presented as follows:

$$ u_{ij} = u_{ij} \frac{{\left( {XV} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} }} $$
(9)
$$ v_{ij} = v_{ij} \frac{{\left( {X^{T} U + \lambda WV} \right)_{ij} }}{{\left( {VU^{T} U + \lambda DV} \right)_{ij} }} $$
(10)
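For completeness, a minimal sketch of the GNMF iterations (9)-(10) is given below, assuming the weight matrix W has been built as in Eq. (7). Again, this is an illustrative NumPy version under our own naming conventions rather than the original implementation.

```python
import numpy as np

def gnmf(X, W, r, lam=100.0, n_iter=200, eps=1e-10, seed=0):
    """GNMF via rules (9)-(10); W is the n x n sample graph of Eq. (7)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    D = np.diag(W.sum(axis=1))                # degree matrix
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ V.T @ V + eps)                                     # rule (9)
        V *= (X.T @ U + lam * (W @ V)) / (V @ U.T @ U + lam * (D @ V) + eps)   # rule (10)
    return U, V
```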

2.4 DNMF

DNMF is an extension of GNMF that imposes graph regularization on both the basis matrix and the coefficient matrix to discover the geometrical structure of the data. Similar to GNMF, DNMF constructs k-nearest-neighbor graphs on both the data samples and the feature vectors. Suppose the columns of the data matrix X are \(\{ X_{:1} ,X_{:2} ,...,X_{:N} \}\); the weight between two neighboring samples can be defined as:

$$ [W^{V} ]_{ij} = \left\{ \begin{array}{ll} 1 & \quad {\rm{if}}\;X_{:,j} \in N(X_{:,i} ) \\ 0 & \quad otherwise \end{array} \right.\quad i,j = 1, \ldots ,N $$
(11)

where \(N(X_{:,i} )\) is the set of the k nearest neighbor samples of \(X_{:,i}\). The graph Laplacian matrix is \(L_{V} = D^{V} - W^{V}\), where \(D^{V}\) is a diagonal matrix whose diagonal elements are the row sums of \(W^{V}\), i.e., \([D^{V} ]_{ii} = \sum\nolimits_{j} {[W^{V} ]_{ij} }\).

Similarly, the rows of the data matrix X correspond to the feature vectors, i.e., \(\{ X_{1,:}^{T} ,X_{2,:}^{T} ,...,X_{M,:}^{T} \}\), and the weight between feature vectors can be defined as:

$$ [W^{U} ]_{ij} = \left\{ \begin{array}{ll} 1 & \quad {\rm{if}}\;X_{j,:}^{T} \in N(X_{i,:}^{T} ) \\ 0 & \quad otherwise \end{array} \right.\quad i,j = 1, \ldots ,M $$
(12)

where \(N(X_{i,:}^{T} )\) is the set of the k nearest neighboring feature vectors of \(X_{i,:}^{T}\). The corresponding graph Laplacian matrix is \(L_{U} = D^{U} - W^{U}\), where \(D^{U}\) is a diagonal matrix whose diagonal elements are the row sums of \(W^{U}\), i.e., \([D^{U} ]_{ii} = \sum\nolimits_{j} {[W^{U} ]_{ij} }\).

Similar to GNMF, the DNMF model can be formulated as

$$ \begin{aligned} & J_{DNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0 \end{aligned} $$
(13)

where λ and μ are regularization parameters. When μ tends to 0, DNMF reduces to GNMF; when both λ and μ tend to 0, DNMF reduces to NMF. The objective function in Eq. (13) can be minimized through the following update rules:

$$ U_{ij} = U_{ij} \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(14)
$$ V_{ij} = V_{ij} \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(15)
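The DNMF iterations (14)-(15) can be sketched in the same style. The sample graph W^V and feature graph W^U are assumed to have been built as in Eqs. (11)-(12) (or with heat-kernel weights as in Eq. (7)); the names below are our own.

```python
import numpy as np

def dnmf(X, WV, WU, r, lam=100.0, mu=100.0, n_iter=200, eps=1e-10, seed=0):
    """DNMF via rules (14)-(15); WV is the n x n sample graph, WU the d x d feature graph."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    DV = np.diag(WV.sum(axis=1))
    DU = np.diag(WU.sum(axis=1))
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        U *= (X @ V + mu * (WU @ U)) / (U @ V.T @ V + mu * (DU @ U) + eps)       # rule (14)
        V *= (X.T @ U + lam * (WV @ V)) / (V @ U.T @ U + lam * (DV @ V) + eps)   # rule (15)
    return U, V
```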

3 Proposed Methods

Orthogonal projections have been shown to be more discriminative in most cases, since they can also reduce the correlation among the projection directions or low-dimensional representations. With orthogonal constraints on the basis matrix and the coefficient matrix, the clustering performance of the DNMF model is improved. ODNMF inherits the advantages of ONMF and DNMF: it captures the local similarity structures of the data space and the feature space, which are helpful for extracting discriminative features, while preserving the global geometrical structure of the data space so as to retain the intrinsic information as much as possible.

3.1 ODNMF-U

Adding the orthogonal constraint on basis matrix U, we have the following orthogonal dual-graph regularized NMF with U orthogonality (ODNMF-U):

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;U^{T} U = I \end{aligned} $$
(16)

The objective function in model (16) can be rewritten as:

$$ \begin{aligned} J_{ODNMF} & = \,tr((X - UV^{T} )(X - UV^{T} )^{T} ) \\ & + \,\lambda tr(V^{T} L_{V} V) + \mu tr(U^{T} L_{U} U) \\ & = \,tr(XX^{T} ) - 2tr(XVU^{T} ) + tr(UV^{T} VU^{T} ) \\ & + \,\lambda tr(V^{T} L_{V} V) + \mu tr(U^{T} L_{U} U) \\ \end{aligned} $$
(17)

The partial derivatives of objective function (17) with respect to U and V are:

$$ \begin{aligned} \nabla_{U} J & = [\nabla_{U} J]^{ + } - [\nabla_{U} J]^{ - } \\ & = UV^{T} V - XV + \mu L_{U} U \\ & = \nabla_{U} \varepsilon + \mu L_{U} U \end{aligned} $$
(18)
$$ \begin{aligned} \nabla_{V} J & = [\nabla_{V} J]^{ + } - [\nabla_{V} J]^{ - } \\ & = VU^{T} U - X^{T} U + \lambda L_{V} V \\ & = \nabla_{V} \varepsilon + \lambda L_{V} V \end{aligned} $$
(19)

where \(L_{U} = D^{U} - W^{U}\) and \(L_{V} = D^{V} - W^{V}\) are the graph Laplacians built from the weight matrices defined in Eq. (11) and (12). Let \(\nabla_{U} \varepsilon = UV^{T} V - XV\) and \(\nabla_{V} \varepsilon = VU^{T} U - X^{T} U\). When the orthogonality constraint is imposed on U, the unconstrained gradient term becomes

$$ \nabla_{U} \varepsilon = UV^{T} X^{T} U - XV $$
(20)
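One way to see where Eq. (20) comes from (a sketch of a standard argument for orthogonality-constrained multiplicative updates; the authors' own derivation may differ in detail) is to correct the unconstrained gradient by the term \(U(\nabla_{U} \varepsilon )^{T} U\) and then apply \(U^{T} U = I\):

$$ \nabla_{U} \varepsilon - U(\nabla_{U} \varepsilon )^{T} U = UV^{T} V - XV - U\left( {V^{T} VU^{T} - V^{T} X^{T} } \right)U = UV^{T} X^{T} U - XV $$

since \(UV^{T} VU^{T} U = UV^{T} V\) cancels the first term once \(U^{T} U = I\) is used.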

Substituting Eq. (20) into Eq. (18), we have the following updating rule on U,

$$ U_{ij} = U_{ij} \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} X^{T} U} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(21)

Similarly, the updating rule on V is,

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} U} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(22)

3.2 ODNMF-V

By incorporating an orthogonality constraint on the coefficient matrix V into the DNMF model, we obtain the following ODNMF-V model:

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;V^{T} V = I \end{aligned} $$
(23)

The partial derivatives of the objective function in (23) with respect to U and V are the same as in Eq. (18) and (19). Similarly, the orthogonality constraint on V leads to the following formula:

$$ \begin{aligned} \nabla_{V} \varepsilon & = [\nabla_{V} \varepsilon ]^{ + } - [\nabla_{V} \varepsilon ]^{ - } \\ & = VU^{T} XV - X^{T} U \\ \end{aligned} $$
(24)

Substituting Eq. (24) into Eq. (19), we have,

$$ \begin{aligned} \nabla_{V} J & = [\nabla_{V} J]^{ + } - [\nabla_{V} J]^{ - } \\ & = \nabla_{V} \varepsilon + \lambda L_{V} V \\ & = VU^{T} XV - X^{T} U + \lambda L_{V} V \end{aligned} $$
(25)

Then the updating rules on V and U can be formulated as follows,

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(26)
$$ U_{ij} = U_{ij} \, \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} V} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(27)

3.3 ODNMF

ODNMF imposes the orthogonal constraints on both the basis matrix U and coefficient matrix V, which involves the following minimization problem,

$$ \begin{aligned} & J_{ODNMF} = \left\| {X - UV^{T} } \right\|_{F}^{2} + \lambda \, tr\left( {V^{T} L_{V} V} \right) + \mu \, tr\left( {U^{T} L_{U} U} \right) \\ & s.t.\quad U \ge 0,\;V \ge 0,\;V^{T} V = I,\;U^{T} U = I \end{aligned} $$
(28)

The orthogonal constraints on the basis matrix U and the coefficient matrix V provide a strong capability of simultaneously clustering rows and columns, which we call bi-orthogonal constraints. Combining the update rule (21) of ODNMF-U and (26) of ODNMF-V, the update rules of ODNMF are given as follows:

$$ V_{ij} = V_{ij} \, \frac{{\left( {X^{T} U} \right)_{ij} + \lambda \left( {W^{V} V} \right)_{ij} }}{{\left( {VU^{T} XV} \right)_{ij} + \lambda \left( {D^{V} V} \right)_{ij} }} $$
(29)
$$ U_{ij} = U_{ij} \, \frac{{\left( {XV} \right)_{ij} + \mu \left( {W^{U} U} \right)_{ij} }}{{\left( {UV^{T} X^{T} U} \right)_{ij} + \mu \left( {D^{U} U} \right)_{ij} }} $$
(30)
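A compact NumPy sketch of the bi-orthogonal ODNMF iterations (29)-(30) is given below (illustrative only; names, initialization and the eps safeguard are our own assumptions). The ODNMF-U and ODNMF-V variants are obtained by replacing only one of the two denominators with its orthogonalized form, following Eqs. (21)-(22) and (26)-(27) respectively.

```python
import numpy as np

def odnmf(X, WV, WU, r, lam=100.0, mu=100.0, n_iter=200, eps=1e-10, seed=0):
    """Bi-orthogonal ODNMF via rules (29)-(30); WV: n x n sample graph, WU: d x d feature graph."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    DV = np.diag(WV.sum(axis=1))
    DU = np.diag(WU.sum(axis=1))
    U = rng.random((d, r))
    V = rng.random((n, r))
    for _ in range(n_iter):
        V *= (X.T @ U + lam * (WV @ V)) / (V @ (U.T @ X) @ V + lam * (DV @ V) + eps)   # rule (29)
        U *= (X @ V + mu * (WU @ U)) / (U @ (V.T @ X.T) @ U + mu * (DU @ U) + eps)     # rule (30)
    return U, V
```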

4 Experimental Results

In this section, we evaluate the effectiveness of the proposed ODNMF framework by comparing it with related matrix factorization methods, including NMF, ONMF, GNMF and DNMF.

4.1 Experimental Setting

To investigate the clustering performance, two popular evaluation metrics are used in the experiments, i.e., the clustering accuracy (AC) and the normalized mutual information (NMI) [10], which are defined as follows:

$$ AC = \frac{{\sum\limits_{i = 1}^{n} {\delta \left( {l_{i} ,\tau_{i} } \right)} }}{n} $$

where \(\delta\) is a cluster label indicator function, which equals 1 if the two entries are the same and 0 otherwise; \(l_{i}\) and \(\tau_{i}\) denote the true label and the predicted label of the ith sample, respectively.

$$ NMI = \frac{{MI\left( {C,C^{\prime}} \right)}}{{\max \left( {H(C),H(C^{\prime})} \right)}} $$

where C is the set of clusters from the ground truth and \(C^{\prime}\) is the set of clusters predicted by the clustering method. \(H( \cdot )\) is the entropy of a set, and the mutual information \(MI\left( {C,C^{\prime}} \right)\) between the two sets of clusters C and \(C^{\prime}\) is defined as

$$ MI\left( {C,C^{\prime}} \right) = \sum\limits_{{c_{i} \in C,c^{\prime}_{j} \in C^{\prime}}}^{{}} {p\left( {c_{i} ,c^{\prime}_{j} } \right)\log \frac{{p\left( {c_{i} ,c^{\prime}_{j} } \right)}}{{p\left( {c_{i} } \right)p\left( {c^{\prime}_{j} } \right)}}} $$

where \(p\left( {c_{i} } \right)\) and \(p\left( {c^{\prime}_{j} } \right)\) are the probabilities that a sample belongs to the clusters \(c_{i}\) and \(c^{\prime}_{j}\), respectively, and \(p\left( {c_{i} ,c^{\prime}_{j} } \right)\) denotes the joint probability that the selected sample belongs to both \(c_{i}\) and \(c^{\prime}_{j}\) at the same time. Larger AC and NMI values indicate better clustering performance [6].
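The AC formula above implicitly assumes that the predicted cluster labels have been matched to the ground-truth labels; a common choice for this matching is the Hungarian (Kuhn-Munkres) algorithm. The sketch below computes both metrics (helper names are our own; NMI relies on scikit-learn with the 'max' normalization, which matches the definition above).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: fraction of correct assignments under the best one-to-one label matching."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels_true, labels_pred = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((labels_pred.size, labels_true.size))
    for i, lp in enumerate(labels_pred):
        for j, lt in enumerate(labels_true):
            cost[i, j] = -np.sum((y_pred == lp) & (y_true == lt))
    row, col = linear_sum_assignment(cost)    # Hungarian matching
    return -cost[row, col].sum() / y_true.size

def nmi(y_true, y_pred):
    """NMI normalized by max(H(C), H(C')), as in the definition above."""
    return normalized_mutual_info_score(y_true, y_pred, average_method="max")
```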

For the ODNMF methods, the neighborhood graphs on data samples are constructed in the same way as for GNMF, where the number of nearest neighbors of each sample is fixed as 5, and the regularization parameters in models (8), (13), (16), (23) and (28) are empirically set to 100. After the low-dimensional data representations were produced by the matrix factorization methods, k-means clustering was performed in the low-dimensional data space. All the experiments in this paper were run in MATLAB R2015b on Windows 7 with 8 GB RAM and a 3.4 GHz CPU.

4.2 Datasets Description

In our experiments, we use three types of datasets: UCI datasets, text datasets, and face image datasets. For the UCI datasets, MADELON is an artificial dataset that was part of the NIPS 2003 feature selection challenge; it contains data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or −1. CNAE9 is a highly sparse text dataset, which contains 1080 documents in 9 groups. Semeion is a handwritten digit dataset, which contains 1593 binary images created by 80 persons, each image of size 16 × 16. Hillvalley is a graphical dataset in which each sample contains 100 points on a 2D plane; there are 606 samples in two groups, hill or valley. For the text datasets, Tr31 is a subset of the TREC collection and Re1 is a subset of the Reuters21578 dataset. For the face image datasets, the UMIST dataset consists of 575 images of 20 different persons, with 19 to 36 images per person. The XM2VTS dataset contains 2950 samples of 295 persons, and the YaleB dataset used in our experiments is a subset of the original YaleB dataset, containing 2414 images of 38 persons under 64 lighting conditions. The ORL dataset contains 400 images of 40 individuals, each of whom has 10 images captured under different conditions, including shooting time, illumination, expression and other facial details such as with/without glasses. Some statistics of these datasets are summarized in Table 1.

Table 1. The description of datasets used in the experiments

4.3 Results Discussion

The optimal clustering results on the different datasets are reported in this section. The clustering accuracies are summarized in Table 2 and the NMI values in Table 3. The value in parentheses is the optimal target dimensionality at which the method achieved its maximum evaluation metric. As can be seen, the proposed orthogonal DNMF methods achieve better clustering performance than the other methods in most cases. It is worth noting that the original DNMF and the orthogonal DNMFs perform perfectly on the Hillvalley dataset. This is because the feature (attribute) neighborhood information plays a larger role in identifying a hill or a valley on a 2D graph. However, on the CNAE9, Tr31 and Re1 datasets, the traditional NMF method performs better than the other methods, which indicates that dual-graph regularized NMF is less suitable for text clustering tasks.

Table 2. The optimal clustering results (AC%)
Table 3. The optimal clustering results (NMI%)

5 Conclusions

Most existing data representation methods are carried out only in the data space, so the information of the feature space cannot be fully exploited. To compensate for this drawback, orthogonal dual-graph regularized NMF (ODNMF) is proposed by combining DNMF and ONMF. Orthogonal constraints on the basis matrix U and the coefficient matrix V are incorporated as additional conditions, which can not only make full use of the geometrical structural information underlying the data and feature manifolds, but also enhance the generalization performance and robustness of the proposed methods. Since ONMF is more robust to noise and DNMF performs well on data clustering, ODNMF combines the advantages of both, which has been confirmed by the data clustering experiments. A deeper theoretical analysis of the models will be investigated in future work.