1 Introduction

Nonnegative matrix factorization (NMF) (Paatero and Tapper 1994; Lee and Seung 1999) is a popular and powerful technique for approximating a nonnegative matrix by the product of two low-rank nonnegative matrices. In particular, given a matrix \(V\in \mathbb {R}^{m\times n}\), in which each element is nonnegative, and an integer \(r<\min \{m,n\}\), NMF seeks two matrices \(W\in \mathbb {R}^{m \times r}\) and \(H\in \mathbb {R}^{r \times n}\) with nonnegative elements such that

$$\begin{aligned} V\approx WH. \end{aligned}$$
(1)

NMF has been studied for more than a decade and successfully applied to signal processing (Cichocki et al. 2006), cancer class discovery (Berry et al. 2007), text mining (Pauca 2004), and computer vision (Li et al. 2001).

A common way to find \(W\) and \(H\) is by formulating (1) as the following minimization problem:

$$\begin{aligned} \min _{W\ge 0,~H\ge 0}F(W,H):=\frac{1}{2}\Vert V-W\!H\Vert _{F}^{2}, \end{aligned}$$
(2)

where \(\Vert \cdot \Vert _{F}\) is the Frobenius norm, and \(W\ge 0\) and \(H\ge 0\) mean that all elements of \(W\) and \(H\) are nonnegative.
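
In code, the objective (2) is straightforward to evaluate; the following minimal NumPy sketch, with names of our own choosing, fixes the notation used in the sketches throughout this paper.

```python
import numpy as np

def frobenius_objective(V, W, H):
    """Evaluate F(W, H) = 0.5 * ||V - W H||_F^2, the NMF objective in (2)."""
    R = V - W @ H
    return 0.5 * np.sum(R * R)
```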

Lee and Seung developed a multiplicative update (MU) algorithm for solving (2) in their seminal work (Lee and Seung 2001). The MU algorithm updates the matrix factors by multiplying each entry by a positive factor at each iteration. More precisely,

$$\begin{aligned} W^{k+1}_{ia}&= W^{k}_{ia}\frac{\left( V(H^{k})^{T}\right) _{ia}}{\left( W^{k}H^{k}(H^{k})^{T}\right) _{ia}}, \quad \forall ~ i,a,\\ H^{k+1}_{bj}&= H^{k}_{bj}\frac{\left( (W^{k+1})^{T}V\right) _{bj}}{\left( (W^{k+1})^{T}W^{k+1}H^{k}\right) _{bj}}, \quad \forall ~ b,j. \end{aligned}$$

MU has been one of the most commonly used approaches for NMF due to its simplicity. However, it has been observed to converge relatively slowly, especially when dealing with dense matrices (Berry et al. 2007; Kim et al. 2007; Lin 2007).
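
As an illustration, one round of these updates can be written in a few lines of NumPy. The small constant eps is our own safeguard against division by zero; it is not part of the original updates.

```python
import numpy as np

def mu_step(V, W, H, eps=1e-12):
    """One round of the Lee-Seung multiplicative updates above."""
    W = W * (V @ H.T) / (W @ H @ H.T + eps)     # update W with H fixed
    H = H * (W.T @ V) / (W.T @ W @ H + eps)     # update H with the new W
    return W, H
```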

Another popular approach is the alternating nonnegative least squares (ANLS) framework:

Algorithm 1

ANLS Framework for NMF

  • Step 1. Initialize \(W^{1}\in \mathbb {R}^{m\times r}_{+}\) and \(H^{1}\in \mathbb {R}^{r\times n}_{+}\). Set \(k=1\).

  • Step 2. Update the matrices in the following way until a convergence criterion is satisfied:

    $$\begin{aligned} W^{k+1}&= \arg \min _{W\ge 0}F\left( W,H^{k}\right) \!,\end{aligned}$$
    (3)
    $$\begin{aligned} H^{k+1}&= \arg \min _{H\ge 0}F\left( W^{k+1},H\right) \!. \end{aligned}$$
    (4)

At each iteration, one of the two factors is fixed and the other is updated so that the objective function is reduced; a minimal sketch of the framework in code is given below. This approach is the “block coordinate descent” method in bound-constrained optimization (Bertsekas 1999). Although the original problem (2) is non-convex and NP-hard (Vavasis 2009) with respect to the variables \(W\) and \(H\), the subproblems (3) and (4) are convex problems for which optimal solutions can be found. However, these subproblems may have multiple optimal solutions because they are not strictly convex. Past convergence results for block coordinate descent methods are based on the assumption that the subproblems have unique solutions (Bertsekas 1999). Fortunately, when there are only two blocks, Grippo and Sciandrone (2000) proved convergence of Algorithm 1 to a stationary point of (2) without requiring uniqueness.
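
The following sketch illustrates Algorithm 1, solving each subproblem column by column with SciPy's nonnegative least squares routine. The fixed iteration budget and the random initialization are our own simplifications; in practice a convergence criterion such as (17) below is used.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_columns(A, B):
    """Solve min_{X >= 0} ||A X - B||_F column by column."""
    X = np.zeros((A.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        X[:, j], _ = nnls(A, B[:, j])
    return X

def anls(V, r, iters=50, seed=0):
    """Plain ANLS (Algorithm 1), alternating the subproblems (3)-(4)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))             # Step 1: random nonnegative init
    H = rng.random((r, n))
    for _ in range(iters):             # Step 2, with a fixed budget
        W = nnls_columns(H.T, V.T).T   # (3): fix H, solve for W
        H = nnls_columns(W, V)         # (4): fix the new W, solve for H
    return W, H
```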

Based on the ANLS framework, many algorithms have been developed for solving (2), such as the active set method (Kim and Park 2008), the projected gradient method (PG) (Lin 2007), the projected Newton method (Gong and Zhang 2012), and the projected quasi-Newton method (Kim et al. 2007; Zdunek and Cichocki 2006). However, most of these methods spend considerable effort finding a step length by line search. Recently, Guan et al. (2012) proposed the NeNMF algorithm for NMF by applying Nesterov’s optimal gradient method (OGM) (Nesterov 1983), which takes a projected gradient step at a point constructed as a linear combination of the two most recent iterates, within the ANLS framework. OGM does not require a line search. However, its convergence rate depends not only on the distance between the initial guess and a stationary point but also on the step length of the projected gradient step, which is determined by the Lipschitz constant of the gradient of the objective function; see Guan et al. (2012) and Nesterov (1983) for details. If the initial iterate is far away from a stationary point, or the Lipschitz constant of the gradient is very large, then OGM may take thousands of iterations to reach a given tolerance, which degrades the efficiency of NeNMF.

As is well known, the projected gradient method suffers from slow convergence, like steepest descent (Bertsekas 1999). Barzilai and Borwein (1988) introduced the following two stepsizes, which significantly improve the performance of gradient methods:

$$\begin{aligned} a_{k}^{BB1}=\frac{\langle s_{k-1},y_{k-1}\rangle }{\langle s_{k-1},s_{k-1}\rangle }, \end{aligned}$$

and

$$\begin{aligned} a_{k}^{BB2}=\frac{\langle y_{k-1},y_{k-1}\rangle }{\langle s_{k-1},y_{k-1}\rangle }, \end{aligned}$$

where \(s_{k-1}=x^{k}-x^{k-1}\) and \(y_{k-1}=\nabla f(x^{k})-\nabla f(x^{k-1})\). The BB method can be viewed as a quasi-Newton method in which the inverse Hessian in the Newton iteration is replaced by the diagonal matrix \(a_{k}I\), where \(a_{k}\) is the BB stepsize and \(I\) is the identity matrix. Due to its simplicity and numerical efficiency, the BB approach has attracted much attention in the recent literature (Dai and Liao 2002; Dai et al. 2006; Dai and Zhang 2001; Fletcher 2005; Grippo and Sciandrone 2002; Huang et al. 2013; Raydan 1997). By combining the globalized nonmonotone BB scheme (Raydan 1997) with classical projected gradient strategies, Birgin et al. (2000) proposed the so-called spectral projected gradient method for minimizing differentiable functions on closed convex sets. Dai and Fletcher (2005) developed a projected BB method that alternately uses the two BB stepsizes to solve box-constrained quadratic programming. In general, the projected BB method updates the iterate by

$$\begin{aligned} x^{k+1}=P\left( x^{k}-a_{k}\nabla f\left( x^{k}\right) \right) , \end{aligned}$$
(5)

where \(a_{k}\) is either of the two BB stepsizes and the operator \(P(x)\) projects \(x\) onto the feasible set. The projected BB approach has been useful in real applications in different areas, including compressive sensing (Figueiredo et al. 2007), image restoration (Bonettini et al. 2009), and support vector machines (Cores et al. 2009). For more detail about the BB method and its variants, see Fletcher (2005), Birgin et al. (2012), and the references therein. A sketch of a single projected BB iteration is given below.
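
The following is a generic sketch of one projected BB iteration (5) on the nonnegative orthant, using the first BB stepsize with the standard safeguards; the names and the safeguard bounds are our own.

```python
import numpy as np

def projected_bb_step(x, x_prev, grad, grad_prev, a_min=1e-20, a_max=1e20):
    """One projected BB iteration (5) for min f(x) s.t. x >= 0.

    P is the projection onto the nonnegative orthant.
    """
    s = x - x_prev
    y = grad - grad_prev
    sy = np.vdot(s, y)
    if sy <= 0:                                  # safeguard: no positive curvature
        a = a_max
    else:
        a = min(a_max, max(a_min, np.vdot(s, s) / sy))
    return np.maximum(x - a * grad, 0.0)         # x_{k+1} = P(x_k - a_k grad f(x_k))
```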

As mentioned above, the subproblems (3) and (4) minimize a quadratic function subject to nonnegativity constraints. Thus the methods in Birgin et al. (2000) and Dai and Fletcher (2005) can be applied to solve the NMF problem (2) within the ANLS framework. Recently, Han et al. (2009) proposed four projected BB algorithms for nonsmooth NMF. Among the four, APBB2, which applies the SPG2 method of Birgin et al. (2000) to NMF, performs best. However, the projected BB strategy may not generate a satisfactory decrease due to the inexact approximation of the Hessian. In addition, the nonmonotone line search scheme may increase the objective significantly. As a result, APBB2 may require many iterations to solve the subproblems, which can be a serious disadvantage in the large-scale case. These issues motivate us to develop a new algorithm for solving NMF.

In this paper, we present a new efficient algorithm for NMF based on the ANLS framework. Our algorithm adopts a quadratic regularization projected Barzilai–Borwein (QRPBB) method as an essential subroutine. At each iteration, the QRPBB method first solves a strongly convex quadratic minimization problem, which admits a simple closed-form solution that is inexpensive to compute, and then applies the SPG2 method of Birgin et al. (2000) to update the solution of NMF. A global convergence result is established under mild conditions. Numerical experiments on synthetic, image, and text datasets show that the proposed algorithm is efficient.

The rest of this paper is organized as follows. In Sect. 2, our new method for solving the subproblems (3) and (4) is described in detail. In Sect. 3, we prove the global convergence of our method. In Sect. 4, we present experimental comparisons among several NMF methods. We conclude the paper in Sect. 5.

2 Quadratic regularization projected Barzilai–Borwein method

In this section, we present a quadratic regularization projected Barzilai–Borwein (QRPBB) method for solving the subproblems (3) and (4). Notice that the roles of the matrices \(W\) and \(H\) are perfectly symmetric: any formula designed to update \(W\) translates directly into an update for \(H\) in the original problem. So we focus only on the update of the matrix \(W\).

We need to solve the following problem for updating the matrix \(W\),

$$\begin{aligned} \min _{W\ge 0}f^{k}(W):=F\left( W,H^{k}\right) =\frac{1}{2}\left\| V-W\!H^{k}\right\| _{F}^{2}, \end{aligned}$$
(6)

which is the minimization problem in (3). To simplify the notation, we use \(f(W)\) rather than \(f^{k}(W)\) in the following part of this paper.

Clearly, the objective function \(f(W)\) is convex and its gradient is Lipschitz continuous. We summarize these properties in the following lemma, which was shown by Guan et al. (2012) for the problem in (4).

Lemma 1

The following two statements are valid.

  1. (i)

    The objective function \(f(W)\) of (6) is convex.

  2. (ii)

    The gradient

    $$\begin{aligned} \nabla f(W)=\left( W\!H^{k}-V\right) \left( H^{k}\right) ^{T} \end{aligned}$$

    is Lipschitz continuous with constant \(L=\Vert H^{k}(H^{k})^{T}\Vert _{2}\).

For any fixed \(U\ge 0\), define the quadratic regularization function by

$$\begin{aligned} \phi (U,W)=f(U)+\langle \nabla f(U),W-U\rangle +\frac{L}{2}\Vert W-U\Vert _{F}^{2}. \end{aligned}$$

From (ii) of Lemma 1, it follows that \(\phi (U,W)\) is strongly convex in \(W\) for any fixed \(U\ge 0\). At iteration \(t\), our QRPBB method first generates a point by solving the following strongly convex quadratic minimization problem

$$\begin{aligned} \min _{W\ge 0}\phi (W_{t},W). \end{aligned}$$
(7)

Clearly, problem (7) has a unique closed-form solution

$$\begin{aligned} Z_{t}=P\left[ W_{t}-\frac{1}{L}\nabla f\left( W_{t}\right) \right] , \end{aligned}$$
(8)

where the operator \(P(X)\) projects all the negative entries of \(X\) to zero.
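
For concreteness, this step can be written in a few lines of NumPy; the Lipschitz constant and the gradient follow Lemma 1, and the function name is ours.

```python
import numpy as np

def regularization_step(V, H, W):
    """The closed-form minimizer (8) of the subproblem (7)."""
    L = np.linalg.norm(H @ H.T, 2)        # spectral norm = Lipschitz constant
    grad = (W @ H - V) @ H.T              # nabla f(W), cf. Lemma 1
    return np.maximum(W - grad / L, 0.0)  # Z_t = P[W_t - (1/L) nabla f(W_t)]
```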

After calculating the point \(Z_{t}\), QRPBB applies the SPG2 method by Birgin et al. (2000) to update the matrix:

$$\begin{aligned} W_{t+1}=Z_{t}+\lambda _{t}D_{t}, \end{aligned}$$
(9)

where \(\lambda _{t}\in (0,1]\) is the step length obtained by a nonmonotone line search (Grippo and Sciandrone 2002) and

$$\begin{aligned} D_{t}=P\left[ Z_{t}-\alpha _{t}\nabla f(Z_{t})\right] -Z_{t} \end{aligned}$$
(10)

is the search direction, with \(\alpha _{t}\) being the BB stepsize with safeguards

$$\begin{aligned} \alpha _{t}=\min \left\{ \alpha _{\max }, \max \left\{ \alpha _{\min }, \frac{\langle S_{t-1},S_{t-1}\rangle }{\langle S_{t-1},Y_{t-1}\rangle }\right\} \right\} . \end{aligned}$$
(11)

Here, \(\alpha _{\max }>\alpha _{\min }>0, S_{t-1}=W_{t}-W_{t-1}\) and \(Y_{t-1}=\nabla f(W_{t})-\nabla f(W_{t-1})\).

The QRPBB method for (6) is summarized in Algorithm 2; a sketch in code is given after the listing. Both \(W^{k}\) and \(H^{k}\) are obtained from the previous iteration of the ANLS framework. We apply QRPBB to solve the minimization problem in (4) as well.

Algorithm 2

Quadratic regularization projected BB method

  • Step 1. Initialize \(\alpha _{0}=1, M>0, \alpha _{\max }>\alpha _{\min }>0, \rho ,~\gamma \in (0,1), L=\Vert H^{k}(H^{k})^{T}\Vert _{2}\), and \(W_{0}=W^{k}\). Set \(t=0.\)

  • Step 2. If \(\Vert P\left[ W_{t}-\nabla f(W_{t})\right] -W_{t}\Vert =0\), stop.

  • Step 3. Compute \(Z_{t}=P[W_{t}-\displaystyle \frac{1}{L}\nabla f(W_{t})]\).

  • Step 4. Nonmonotone line search. Let \(m_{t}\) be the smallest nonnegative integer \(m\) satisfying

    $$\begin{aligned} f\left( Z_{t}+\rho ^{m} D_{t}\right) \le \max _{0\le j\le \min \{t,M-1\}}f(Z_{t-j})+ \gamma \rho ^{m}\left\langle \nabla f(Z_{t}),D_{t}\right\rangle , \end{aligned}$$
    (12)

    where \(D_{t}=P[Z_{t}-\alpha _{t}\nabla f(Z_{t})]-Z_{t}\). Set \(\lambda _{t}:=\rho ^{m_{t}}, W_{t+1}=Z_{t}+\lambda _{t}D_{t}\).

  • Step 5. Compute

$$\begin{aligned} S_{t}=W_{t+1}-Z_{t} \quad \text {and}\quad Y_{t}=\nabla f(W_{t+1})-\nabla f(Z_{t}). \end{aligned}$$

    If \(\langle S_{t},S_{t}\rangle /\langle S_{t},Y_{t}\rangle \le 0\), set \(\alpha _{t+1}=\alpha _{\max }\); otherwise, set

    $$\begin{aligned} \alpha _{t+1}=\min \left\{ \alpha _{\max },\max \left\{ \alpha _{\min }, \langle S_{t},S_{t}\rangle /\langle S_{t},Y_{t}\rangle \right\} \right\} . \end{aligned}$$
  • Step 6. Set \(t=t+1\) and go to Step 2.
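
The following NumPy sketch mirrors Algorithm 2 for the subproblem (6). The exact zero test of Step 2 is replaced by a small tolerance, and a floor on the backtracking step guards against floating-point stagnation; both are practical safeguards of our own, not part of the algorithm as stated. Parameter values follow Sect. 4.2.

```python
import numpy as np

def qrpbb(V, H, W0, M=5, rho=0.25, gamma=1e-4,
          a_min=1e-20, a_max=1e20, tol=1e-4, max_iter=1000):
    """Sketch of Algorithm 2 (QRPBB) for (6): min_{W >= 0} f(W)."""
    HHt, VHt = H @ H.T, V @ H.T
    L = np.linalg.norm(HHt, 2)                    # Lipschitz constant (Lemma 1)
    f = lambda W: 0.5 * np.linalg.norm(V - W @ H, 'fro') ** 2
    grad = lambda W: W @ HHt - VHt                # nabla f(W) = (W H - V) H^T

    W, a = W0, 1.0                                # Step 1: alpha_0 = 1
    f_hist = []                                   # recent f(Z_j) values for (12)
    for _ in range(max_iter):
        g = grad(W)
        if np.linalg.norm(np.maximum(W - g, 0.0) - W) <= tol:
            break                                 # Step 2 (inexact version)
        Z = np.maximum(W - g / L, 0.0)            # Step 3: closed form (8)
        gZ = grad(Z)
        D = np.maximum(Z - a * gZ, 0.0) - Z       # direction (10)
        f_hist = (f_hist + [f(Z)])[-M:]
        f_ref, gD = max(f_hist), np.vdot(gZ, D)
        lam = 1.0                                 # Step 4: backtrack on (12)
        while f(Z + lam * D) > f_ref + gamma * lam * gD and lam > 1e-15:
            lam *= rho
        W_new = Z + lam * D
        S, Y = W_new - Z, grad(W_new) - gZ        # Step 5
        sy = np.vdot(S, Y)
        a = a_max if sy <= 0 else min(a_max, max(a_min, np.vdot(S, S) / sy))
        W = W_new
    return W
```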

Note that the idea of applying projected BB methods to solve the subproblems is not new. QRPBB is closely related to APBB2 of Han et al. (2009) but differs in the following three ways:

  1. (i)

In the calculation of the new iterate: our QRPBB method computes the point \(Z_{t}\) and updates \(W_{t}\) using \(Z_{t}\), generating two sequences \(\{Z_{t}\}\) and \(\{W_{t}\}\) during the iterative process. The APBB2 method produces only one sequence \(\{W_{t}\}\).

  2. (ii)

In the use of line search: our QRPBB method performs the nonmonotone line search at the new point \(Z_{t}\) rather than at \(W_{t}\). In contrast, APBB2 applies the nonmonotone line search directly to \(W_{t}\).

  3. (iii)

In the choice of initial BB stepsize: we initialize the BB stepsize to 1 at the beginning of the QRPBB method. The APBB2 method uses the maximum absolute value of the elements of the gradient to compute the initial BB stepsize, that is, \(1/\max \limits _{i,j}|\nabla f(W_{0})_{ij}|\).

Now we explain the reason for using these schemes. Our aim is to minimize \(f(W)\) over the nonnegativity constraints. As described in the previous section, the projected BB strategy with a nonmonotone line search may lead to a significant increase in the objective. In contrast, the quadratic approximation of \(f(W)\) at \(W_{t}\) given by \(\phi (W_{t},W)\) produces the point \(Z_{t}\) cheaply by (8). In the next section, we will show that \(f(Z_{t})\le f(W_{t})-\frac{L}{2}\Vert Z_{t}-W_{t}\Vert ^{2}\), which implies that \(Z_{t}\) provides a reduction in the objective. Therefore, even if the nonmonotone line search (12) increases the objective at the \(t\)th iteration, \(f(W_{t+1})\) will be no larger than the maximum of the recent values \(f(Z_{t-j})\), each of which is smaller than the corresponding \(f(W_{t-j})\). As a result, the objective value cannot increase very much, which may reduce the number of iterations needed to solve the subproblems.

It is not difficult to see that, at each iteration, the major operations of QRPBB are checking the condition (12) and computing the gradient. Thus the time complexity of one iteration of QRPBB is the same as that of PG by Lin (2007), i.e., \(O(mnr)+\#\text {sub-iterations}\times O(tmr^{2}+tnr^{2})\), where \(t\) is the number of trials of the nonmonotone line search procedure. See Lin (2007) for details of how to calculate the complexity and Guan et al. (2012) for the complexity of popular methods for NMF.

3 Convergence

In this section, we show that any accumulation point of the sequence \(\{W_{t}\}\) generated by the QRPBB method is a stationary point of (6).

From the Karush–Kuhn–Tucker (KKT) optimality conditions, \(W\) is a stationary point of (6) if and only if

$$\begin{aligned} W\ge 0,~\nabla f(W)\ge 0,~ \nabla f(W)\otimes W=0, \end{aligned}$$
(13)

where \(\otimes \) is the Hadamard product.

Define the scaled projected gradient direction by

$$\begin{aligned} D_{\alpha }(W)=P\left[ W-\alpha \nabla f(W)\right] -W, \end{aligned}$$
(14)

for all \(\alpha >0\) and \(W\ge 0\). The following lemma, which will be used in our proofs, is easy to prove; see Birgin et al. (2000), for example.

Lemma 2

For all \(\alpha \in (0,\alpha _{\max }], W\ge 0\),

  1. (i)

    \(\langle \nabla f(W),D_{\alpha }(W)\rangle \le -\displaystyle \frac{1}{\alpha }\Vert D_{\alpha }(W)\Vert ^{2}\le -\displaystyle \frac{1}{\alpha _{\max }}\Vert D_{\alpha }(W)\Vert ^{2}\),

  2. (ii)

    \(W\) is a stationary point of (6) if and only if \(D_{\alpha }(W)=0\).

By Theorem 2.2 of Birgin et al. (2000), the nonmonotone line search in Step 4 of Algorithm 2 is well defined. Therefore, Algorithm 2 is well defined.

Let \(l(t)\) be an integer such that \(t-\min \{t,M-1\}\le l(t)\le t\) and

$$\begin{aligned} f(Z_{l(t)})=\max \limits _{0\le j\le \min \{t,M-1\}}f(Z_{t-j}). \end{aligned}$$

By Lemma 2, we have the following lemma.

Lemma 3

Assume that the sequences \(\{W_{t}\}\) and \(\{Z_{t}\}\) are generated by Algorithm 2. Then the following statements are valid.

  1. (i)

    \(f(Z_{t})\le f(W_{t})-\displaystyle \frac{L}{2}\Vert Z_{t}-W_{t}\Vert ^{2}.\)

  2. (ii)

    The sequence \(\{f(Z_{l(t)})\}\) is nonincreasing.

Proof

By (8) and the left inequality of (i) in Lemma 2, one has

$$\begin{aligned} \left\langle \nabla f(W_{t}),Z_{t}-W_{t}\right\rangle \le -L\left\| Z_{t}-W_{t}\right\| ^{2}. \end{aligned}$$

It then follows from the Lipschitz continuity of \(\nabla f\) (Lemma 1) that

$$\begin{aligned} f(Z_{t})&\le f(W_{t})+\left\langle \nabla f(W_{t}),Z_{t}-W_{t}\right\rangle +\frac{L}{2}\left\| Z_{t}-W_{t}\right\| ^{2}\\&\le f(W_{t})-\frac{L}{2}\left\| Z_{t}-W_{t}\right\| ^{2}, \end{aligned}$$

which proves the first part.

For the second part, observe that

$$\begin{aligned} f(Z_{l(t+1)})&=\max \limits _{0\le j\le \min \{t+1,M-1\}}f(Z_{t+1-j}) \\&\le \max \left\{ f(Z_{l(t)}),f(Z_{t+1})\right\} \\&\le \max \left\{ f(Z_{l(t)}),f(W_{t+1})\right\} \\&=f(Z_{l(t)}), \end{aligned}$$

where the second inequality follows from part (i) and the last equality follows from (12). This completes the proof. \(\square \)

It follows from Lemma 3 that

$$\begin{aligned} f(W_{t+1})\le f(Z_{l(t)})\le f(Z_{l(t-1)})\le \cdots \le f(Z_{0}). \end{aligned}$$

Moreover, for any initial iterate \(W_{0}\ge 0\) the sequences \(\{W_{t}\}\) and \(\{Z_{t}\}\) generated by Algorithm 2 are contained in the level set

$$\begin{aligned} \mathcal {L}(W_{0})=\left\{ W|f(W)\le f(W_{0}), \quad W\ge 0\right\} . \end{aligned}$$

Now we use an argument similar to the one in Lemma 2.1 of Zhang and Hager (2004) to prove a lower bound for the step length \(\lambda _{t}\).

Lemma 4

If \(Z_{t}\) is not a stationary point of (6), then there exists a constant \(\tilde{\lambda }>0\) such that \(\lambda _{t}\ge \tilde{\lambda }\).

Proof

Since, by Lemma 1, \(\nabla f(W)\) is Lipschitz continuous with constant \(L\), we have

$$\begin{aligned} f(Z_{t}+\lambda D_{t})-f(Z_{t})&=\lambda \left\langle \nabla f(Z_{t}),D_{t}\right\rangle +\int _{0}^{\lambda }\left\langle \nabla f(Z_{t}+tD_{t})- \nabla f(Z_{t}), D_{t}\right\rangle dt \nonumber \\&\le \lambda \left\langle \nabla f(Z_{t}),D_{t}\right\rangle +\int _{0}^{\lambda }Lt\Vert D_{t}\Vert ^{2}dt \nonumber \\&=\lambda \left\langle \nabla f(Z_{t}),D_{t}\right\rangle +\frac{L}{2}\lambda ^{2}\Vert D_{t}\Vert ^{2}. \end{aligned}$$
(15)

Either \(\lambda _{t}=1\) or the inequality (12) fails at least once during the line search. In the latter case,

$$\begin{aligned} f\left( Z_{t}+\frac{\lambda _{t}}{\rho }D_{t}\right)&> f(Z_{l(t)})+ \gamma \frac{\lambda _{t}}{\rho }\left\langle \nabla f(Z_{t}),D_{t}\right\rangle \\&\ge f(Z_{t})+ \gamma \frac{\lambda _{t}}{\rho }\left\langle \nabla f(Z_{t}),D_{t}\right\rangle . \end{aligned}$$

Combining this with (15) and (i) of Lemma 2, we obtain

$$\begin{aligned} \lambda _{t}\ge \min \left\{ 1,\frac{2\rho (1-\gamma )}{L}\frac{|\langle \nabla f(Z_{t}),D_{t}\rangle |}{\Vert D_{t}\Vert ^{2}}\right\} . \end{aligned}$$

From the right inequality of (i) of Lemma 2, one has

$$\begin{aligned} \langle \nabla f(Z_{t}),D_{t}\rangle \le - \frac{1}{\alpha _{\max }}\Vert D_{t}\Vert ^{2}. \end{aligned}$$

Thus,

$$\begin{aligned} \lambda _{t}\ge \min \left\{ 1,\frac{2\rho (1-\gamma )}{L\alpha _{\max }}\right\} :=\tilde{\lambda }. \end{aligned}$$

\(\square \)

The following theorem establishes the global convergence of the QRPBB method.

Theorem 1

Suppose that the level set \(\mathcal {L}(W_{0})\) is bounded. Then any accumulation point of the sequence \(\{Z_{t}\}\) generated by Algorithm 2 is a stationary point of (6).

Proof

Since \(\mathcal {L}(W_{0})\) is bounded, \(\{Z_{t}\}\) admits an accumulation point \(Z_{*}\); passing to a subsequence if necessary, we may assume that \(\{Z_{t}\}\) converges to \(Z_{*}\).

Let us suppose by way of contradiction that \(Z_{*}\) is not a stationary point of (6). Then it follows from (ii) of Lemma 2 that \(\Vert D_{\alpha }(Z_{*})\Vert >0\) for all \(\alpha \in (0,\alpha _{\max }]\). By continuity and compactness, there exists \(\eta >0\) such that \(\Vert D_{\alpha }(Z_{*})\Vert >\eta \) for all \(\alpha \in [\alpha _{\min },\alpha _{\max }]\). Thus, for sufficiently large \(t\), \(\Vert D_{\alpha _{t}}(Z_{t})\Vert >\displaystyle \frac{\eta }{2}\). Hence, using Lemmas 2, 3, and 4, we obtain

$$\begin{aligned} f(Z_{l(t)})&\le f(W_{l(t)})\\&= f\left( Z_{l(t)-1}+\lambda _{l(t)-1}D_{l(t)-1}\right) \\&\le f\left( Z_{l(l(t)-1)}\right) + \gamma \lambda _{l(t)-1}\left\langle \nabla f(Z_{l(t)-1}),D_{l(t)-1}\right\rangle \\&\le f\left( Z_{l(l(t)-1)}\right) -\frac{\gamma \tilde{\lambda }}{\alpha _{\max }}\Vert D_{l(t)-1}\Vert ^{2}\\&\le f\left( Z_{l(l(t)-1)}\right) -\frac{\gamma \tilde{\lambda }\eta ^{2}}{4\alpha _{\max }}, \end{aligned}$$

which implies that

$$\begin{aligned} f(Z_{l(t)})\rightarrow -\infty ,~t\rightarrow \infty . \end{aligned}$$

This yields the desired contradiction, since \(f\ge 0\). Therefore, \(Z_{*}\) is a stationary point of (6). \(\square \)

By the proof of Theorem 1 and Lemma 3, we have the following result.

Corollary 1

Suppose that the level set \(\mathcal {L}(W_{0})\) is bounded. Then any accumulation point of the sequence \(\{W_{t}\}\) generated by Algorithm 2 is a stationary point of (6).

4 Experiments

We have tested the ANLS framework using our QRPBB method and compared it with three other ANLS-based methods: NeNMF (Guan et al. 2012), the projected gradient method (PG) (Lin 2007), and the projected BB method (APBB2) (Han et al. 2009). We compared these methods on both synthetic datasets and real-world problems using the ORL image database, the CBCL face image database, and the Reuters-21578 (Lewis et al. 2004) and TDT-2 (Cieri et al. 1999) text corpora. All the methods were implemented in MATLAB. (Codes of our QRPBB method are available upon request.) The numerical tests were carried out on a 3.10 GHz Core i5 PC with 4 GB of RAM under Windows 7.

4.1 Stopping criterion

The KKT conditions of (2) can be written as

$$\begin{aligned} \nabla _{H}^{P}F(W,H)=0,~\nabla _{W}^{P}F(W,H)=0, \end{aligned}$$
(16)

where \(\nabla _{W}^{P}F(W,H)\) is defined by

$$\begin{aligned} \left( \nabla _{W}^{P}F(W,H)\right) _{ij}=\left\{ \begin{array}{ll} \left( \nabla _{W}F(W,H)\right) _{ij}, &{} (W)_{ij}>0, \\ \min \left\{ 0,\left( \nabla _{W}F(W,H)\right) _{ij}\right\} , &{} (W)_{ij}=0, \end{array}\right. \end{aligned}$$

and \(\nabla _{H}^{P}F(W,H)\) is defined in the same way. We use the same stopping criterion as Lin (2007),

$$\begin{aligned}&\Big \Vert \big [\nabla _{H}^{P}F(W^{k},H^{k}),\nabla _{W}^{P}F(W^{k},H^{k})^{T}\big ]\Big \Vert _{F} \nonumber \\&\quad \le \epsilon \Big \Vert \left[ \nabla _{H}F(W^{1},H^{1}),\nabla _{W}F(W^{1},H^{1})^{T}\right] \Big \Vert _{F}. \end{aligned}$$
(17)

Here, \(\epsilon >0\) is a tolerance.
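
For reference, the quantity on the left-hand side of (17) can be computed as follows; this is a sketch with our own names, where proj_grad implements the definition of the projected gradient given after (16).

```python
import numpy as np

def projected_gradient_norm(V, W, H):
    """The projected-gradient norm appearing on the left of (17)."""
    def proj_grad(G, X):
        # keep the full gradient where X > 0; only negative parts where X = 0
        return np.where(X > 0, G, np.minimum(G, 0.0))

    R = W @ H - V
    pW = proj_grad(R @ H.T, W)       # projected nabla_W F
    pH = proj_grad(W.T @ R, H)       # projected nabla_H F
    return np.sqrt(np.linalg.norm(pW, 'fro') ** 2 +
                   np.linalg.norm(pH, 'fro') ** 2)
```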

For the subproblems (3) and (4), we stop the iterative procedures if

$$\begin{aligned} \left\| \nabla _{W}^{P}F\left( W^{k+1},H^{k}\right) \right\| _{F}\le \epsilon _{W} \quad \text {and}\quad \left\| \nabla _{H}^{P}F\left( W^{k+1},H^{k+1}\right) \right\| _{F}\le \epsilon _{H}, \end{aligned}$$
(18)

where

$$\begin{aligned} \epsilon _{W}&=\epsilon _{H}=\max (10^{-3},\epsilon ) \Big \Vert \left[ \nabla _{H}F\left( W^{1},H^{1}\right) ,\nabla _{W}F\left( W^{1},H^{1}\right) ^{T}\right] \Big \Vert _{F}. \end{aligned}$$

If QRPBB terminates on (3) without performing any iterations, we tighten the stopping tolerance by setting \(\epsilon _{W}=0.1\epsilon _{W}\). The same strategy is adopted for (4). Note that the other three algorithms also use the same stopping criterion (17).

We also used 10,000 as the maximal number of iterations and 1,000 seconds as the maximum CPU time allowed for each method. In the implementation of QRPBB, we allowed at most 1,000 iterations for solving each of the subproblems (3) and (4).

4.2 Synthetic data

We first tested QRPBB and the other three methods on synthetic datasets. The matrix \(V\) was randomly generated by the rand function in MATLAB. All initial points \((W^{1},H^{1})\) were randomly generated in the same way as \(V\).

We use the following parameters for our QRPBB method in all the experiments of the remaining subsections:

$$\begin{aligned} \alpha _{\max }=10^{20},~\alpha _{\min }=10^{-20},~\rho =0.25,~\gamma =10^{-4}. \end{aligned}$$

This setting is popular for most BB-type methods and is the same as that of APBB2. Note that \(M\) stands for the number of recent objective values among which the maximum is chosen for the nonmonotone line search. Since our tests do not show meaningful differences between \(M=5\) and \(M=10\), and our QRPBB method requires few iterations to solve the subproblems, we use \(M=5\). All parameters of the other methods were set to their default values.

For each problem, 10 different initial points were randomly generated, and the average results over these initial values are reported in Table 1. The number iter denotes the number of iterations needed until the termination criterion (17) was met. The numbers niter, pgn, residual, and time denote the total number of sub-iterations for solving (3) and (4), the final value of the projected gradient norm \(\Vert \big [\nabla _{H}^{P}F(W^{k},H^{k}),\nabla _{W}^{P}F(W^{k},H^{k})^{T}\big ]\Vert _{F}\), the final value of \(\Vert V-W^{k}H^{k}\Vert _{F}/\Vert V\Vert _{F}\), and the CPU time used at termination of each algorithm, respectively.

Table 1 Experimental results on synthetic datasets (dense matrices) with \(\epsilon =10^{-7}\)

As shown in Table 1, all of the compared methods satisfied the convergence criterion within a reasonable number of iterations. For most of the test problems, QRPBB showed the shortest execution time and the smallest number of sub-iterations, especially for large problems. The final values of pgn generated by QRPBB are smaller than those of NeNMF and PG, and close to those of APBB2. On the other hand, the final values of residual obtained by QRPBB and APBB2 are slightly larger than those of NeNMF and PG. Two factors can account for this phenomenon. First, since the NMF problem (2) is not convex, these methods may converge to different points. Second, these methods may reach different final \((W^{k},H^{k})\) due to their different update strategies.

Since QRPBB is closely related to APBB2, we further examined the behavior of the two methods. Figure 1 compares QRPBB and APBB2 on two randomly generated problems of the form (3). We terminate the methods if the stopping criterion described by the first inequality in (18) is satisfied with \(\epsilon _{W}=10^{-4}\) or the iteration number exceeds 1,000. Clearly, the objective curve generated by QRPBB oscillates less sharply than that of APBB2. Moreover, QRPBB requires fewer than half the iterations of APBB2 to reach the given tolerance. These results are consistent with our analysis in Sect. 2. The improvement becomes more obvious when solving NMF problems. Taking the last problem in Table 1 as an example, QRPBB needs only 16.4 iterations and 292.5 sub-iterations on average, while APBB2 requires around 6 times as many iterations and 7 times as many sub-iterations. In addition, most of the residuals obtained by QRPBB are the same as those of APBB2. However, QRPBB can sometimes give slightly larger final values of pgn and residual than APBB2. This phenomenon can be explained by the same factors discussed above.

Fig. 1
figure 1

Objective value versus iteration on randomly generated problems \(\min \limits _{W\ge 0}~\frac{1}{2}\Vert V-W\!H\Vert _{F}^{2}\) with \((m,n,r)=(100,50,10)\) (left) and \((m,n,r)=(2{,}000,1{,}000,50)\) (right)

4.3 Image data

The ORL image database contains 400 facial images of 40 different people, with 10 images per person. The size of each image is \(92\times 112\) pixels, with 256 grey levels per pixel. The matrix \(V\) whose columns represent these images has \(92\times 112=10{,}304\) rows and 400 columns.

The CBCL image database consists of 2,429 facial images, each with \(19\times 19\) pixels, yielding a \(361\times 2{,}429\) matrix.

For each database, we tried 10 different randomly generated initial iterates with \(\epsilon =10^{-7}\) in (17) and present the average results in Table 2. For both databases, our QRPBB method converges in fewer iterations and less CPU time. Although the residual of QRPBB is not the smallest for each database, the values of pgn imply that the solutions of QRPBB are closer to stationary points.

Table 2 Experimental results on the CBCL and ORL databases with \(\epsilon =10^{-7}\)

We set \(\epsilon \) in (17) to different values in order to investigate the speed of convergence to a stationary point; the results are shown in Fig. 2. The four methods perform similarly when a loose tolerance is given. However, when a tighter tolerance is used, QRPBB is clearly faster than the other three methods.

Fig. 2
figure 2

CPU time (s) versus tolerance values on the CBCL (left) and ORL (right) datasets. All methods were executed with the same initial values, and the average results using 10 different initial values are presented

4.4 Text data

The Reuters-21578 corpus contains 21,578 documents in 135 categories. Documents with multiple category labels were discarded, leaving 8,293 documents in 65 categories. After preprocessing, this corpus contains 18,933 distinct terms, represented by an \(18{,}933\times 8{,}293\) matrix.

The TDT-2 corpus contains news articles from various sources, such as ABC, CNN, VOA, NYT, PRI, and APW, in 1998. It consists of 11,201 on-topic documents classified into 96 semantic categories. Documents appearing in two or more categories were removed, and only the largest 30 categories were kept, leaving 9,394 documents represented by a \(36{,}771\times 9{,}394\) matrix.

As in the previous subsection, we first set \(\epsilon =10^{-7}\) and then tried different tolerance values. The average results over 10 different initial iterates are presented in Table 3 and Fig. 3. We observe from Table 3 that NeNMF requires fewer iterations than the other methods when the tolerance is set to \(10^{-7}\), especially for the TDT2 dataset, but its residual and pgn values are large. Moreover, Fig. 3 shows that NeNMF is slower than QRPBB, which has the fastest final convergence under a tight tolerance.

Fig. 3
figure 3

CPU time (s) versus tolerance values on the Reuters-21578 (left) and TDT2 (right) datasets. All methods were executed with the same initial values, and the average results using 10 different initial values are presented

Table 3 Experimental results on the Reuters-21578 and TDT2 datasets with \(\epsilon =10^{-7}\)

4.5 Document clustering

Recently, the use of NMF for document clustering has attracted much interest (Cai et al. 2005, 2011; Cichocki et al. 2009; Pauca 2004; Shahnaz et al. 2006). We conducted clustering experiments on the Reuters-21578 and TDT2 datasets. Each document was represented by a normalized term frequency vector. There are 50 randomly selected cases for each cluster number from 2 to 10. These randomly selected data in MATLAB format were downloaded from http://www.cad.zju.edu.cn/home/dengcai/Data/TextData.html. We use the randn function in MATLAB to generate initial iterates for each experiment. We normalize the matrix \(W\) during the iterative process in a similar way to Xu et al. (2003). We report the average accuracy and the average normalized mutual information (NMI) (Xu et al. 2003) over the 50 trials for each cluster number, together with the corresponding average CPU time.

Figure 4 shows that QRPBB, PG, and APBB2 obtain fairly good accuracy, better than that of NeNMF. For most cluster numbers, the NMI obtained by PG is slightly larger than those of QRPBB and APBB2, while NeNMF gives the smallest NMI. QRPBB converges faster than PG and APBB2 and slightly slower than NeNMF, especially for a large cluster number. From Fig. 5, conclusions similar to those from Fig. 4 can be drawn. QRPBB, PG, and APBB2 outperform NeNMF in both accuracy and NMI. For some cluster numbers, QRPBB yields lower accuracy and smaller NMI than either PG or APBB2. One possible reason for this observation is that document clusters in TDT2 are generally more compact and focused than those in Reuters. As a result, the accuracy and NMI on TDT2 may be more sensitive to the values of the residual than those on Reuters. Moreover, Table 3 shows that PG and APBB2 can produce smaller residuals than QRPBB and NeNMF with tolerance \(10^{-7}\). Overall, our QRPBB method is competitive with the other three methods.

Fig. 4
figure 4

Accuracy, normalized mutual information, and CPU time (s) versus cluster number obtained by PG, NeNMF, APBB2, and QRPBB on the Reuter-21578 dataset. For each cluster number, average results by 50 trials with tolerance \(10^{-7}\) are presented

Fig. 5
figure 5

Accuracy, normalized mutual information, and CPU time (s) versus cluster number obtained by PG, NeNMF, APBB2, and QRPBB on the TDT2 dataset. For each cluster number, average results by 50 trials with tolerance \(10^{-7}\) are presented

As we know, the clustering performance can degrade if the objective is optimized too much. We set different tolerance values to investigate the performance of QRPBB on clustering the Reuter-21578 and TDT-2 datasets and present the results in Fig. 6. For the Reuter-21578 dataset, we find that when the tolerance is set to \(10^{-9}\), 5 of the 9 clustering accuracy results are higher than those obtained with a looser tolerance. Similar results are obtained for the TDT2 dataset, where the improvement in clustering accuracy is more obvious. However, we cannot expect better performance in all cases with a tight tolerance.

Fig. 6
figure 6

Accuracy versus tolerance values on clustering the Reuter-21578 (left) and TDT-2 (right) datasets by QRPBB. For each cluster number, average results by 50 trials are presented

4.6 Nonnegative sparse coding

Sparse coding aims to find a small set of basis vectors that capture high-level patterns in the input data. By adding a penalty term to the NMF objective, Hoyer (2004) proposed nonnegative sparse coding:

$$\begin{aligned} \min _{W\ge 0,~ H\ge 0}\frac{1}{2}\Vert V-WH\Vert ^{2}_{F}+\beta \Vert H\Vert _{1}, \end{aligned}$$
(19)

where \(\Vert \cdot \Vert _{1}\) is the entrywise \(L_{1}\)-norm and \(\beta \) is a constant controlling the sparsity of \(H\). It was shown by Guan et al. (2012) that the gradient of the objective in (19) is Lipschitz continuous with respect to each of \(W\) and \(H\). Since \(H\ge 0\), (19) is equivalent to

$$\begin{aligned} \min _{W\ge 0,~ H\ge 0}\frac{1}{2}\Vert V-WH\Vert ^{2}_{F}+\beta \sum _{i,j}H_{ij}. \end{aligned}$$
(20)

Thus we can apply QRPBB within the ANLS framework to solve (20); only the gradient of the \(H\)-subproblem changes, as sketched below.
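
On the nonnegative orthant the \(L_{1}\) penalty in (20) is linear, so it only adds the constant \(\beta \) to every entry of the gradient of the \(H\)-subproblem. A sketch with our own names:

```python
import numpy as np

def grad_H_sparse(V, W, H, beta):
    """Gradient of the H-subproblem of (20)."""
    return W.T @ (W @ H - V) + beta   # nabla_H = W^T (W H - V) + beta * ones
```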

We tested the four methods on a set of natural images used in Hoyer (2004), which provides a \(288\times 10{,}000\) matrix. We also compared the results with those obtained by the nonnegative sparse coding algorithm (NNSC) proposed by Hoyer (2004).

Since NNSC uses the MU scheme to update the matrices, the stopping criterion defined in (17) is not suitable for it. We instead use the following stopping criterion suggested by Guan et al. (2012):

$$\begin{aligned} \frac{F(W^{k},H^{k})-F(W^{k+1},H^{k+1})}{F(W^{1},H^{1})-F(W^{k+1},H^{k+1})}\le \epsilon . \end{aligned}$$
(21)

We set \(\beta =10\) and \(\epsilon =10^{-4}\) for all methods and ran them from the same initial point. The stepsize for NNSC was set to 1. We report the results in Table 4, where spH is the sparseness of the final matrix \(H^{k}\) measured by

$$\begin{aligned} \mathrm {spH} =\frac{\text {number of zero entries in }H}{\text {total number of entries in }H}, \end{aligned}$$

where \(H_{ij}\) is considered a zero entry if \(|H_{ij}|<10^{-6}\); a small snippet computing this measure is given below. We can see from Table 4 that all five methods give similar residuals and yield a sparse nonnegative factorization in this example. In particular, NNSC generates the smallest pgn, while NeNMF gives the sparsest solution and a smaller pgn than the remaining three methods. However, QRPBB produces a solution as sparse as that of NeNMF while taking around one quarter of the CPU time of NeNMF.
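
In code, the spH measure amounts to a one-liner; the function name and threshold parameter are our own.

```python
import numpy as np

def sparseness(H, zero_tol=1e-6):
    """The spH measure above: the fraction of near-zero entries of H."""
    return float(np.mean(np.abs(H) < zero_tol))
```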

Table 4 Experimental results on the natural image dataset with \(\epsilon =10^{-4}\), where \(m=288,~n=10{,}000,~r=72\). The final objective value is approximately \(1.34\times 10^{6}\) for all methods

5 Conclusions

We have proposed an efficient method for nonnegative matrix factorization (NMF) based on the alternating nonnegative least squares (ANLS) framework. Our approach adopts a quadratic regularization projected Barzilai–Borwein (QRPBB) method to solve the subproblems, which leads to fast convergence. Experimental comparisons with three other ANLS-based NMF algorithms on both synthetic and real-world datasets show that the QRPBB method is promising.

Our QRPBB method can be applied to other types of NMF, such as nonsmooth NMF (Pascual-Montano et al. 2006; Han et al. 2009) and graph-regularized NMF (Cai et al. 2011). It would be interesting to see whether the QRPBB method performs well on these problems. We have tested the methods on some nonsmooth NMF problems, and the results showed that the QRPBB method converges faster in terms of CPU time. More thorough experiments involving different types of data are needed for a comprehensive comparison.