1 Introduction

Nonnegative Matrix Factorization (NMF) [1,2,3] is a mathematical operation that decomposes a given nonnegative matrix \(\varvec{X}\) into two nonnegative matrices \(\varvec{W}\) and \(\varvec{H}\) such that \(\varvec{X} \approx \varvec{W}\varvec{H}\). NMF has found many applications in various fields such as image processing, acoustic signal processing, data analysis and text mining because it can find a nonnegative basis for a given set of nonnegative data.

NMF is formulated as a constrained optimization problem in which the error between \(\varvec{X}\) and \(\varvec{W}\varvec{H}\) is minimized under nonnegativity constraints on \(\varvec{W}\) and \(\varvec{H}\). Multiplicative update rules [3] are widely used as simple and easy-to-implement methods for solving the NMF optimization problems. This approach is easily applied to a wide class of error measures [4,5,6], and the obtained update rules have the global convergence property [7, 8]. However, their convergence is often slow. Hence, many studies have been devoted to developing faster algorithms for solving the NMF optimization problems (see, for example, [9, 10] and references therein).

In this paper, we focus our attention on NMF with the alpha-divergence [11]. The alpha-divergence includes Pearson divergence, Hellinger divergence, and chi-square divergence as its special cases [12], and has been frequently used for NMF (see [13] and references therein). As a simple and fast method for solving the optimization problem for NMF with the alpha-divergence, we propose a novel iterative algorithm based on the coordinate descent and the Newton method. We show that the proposed algorithm has the global convergence property [7, 14,15,16] in the sense that the sequence of solutions has at least one convergent subsequence and the limit of any convergent subsequence is a stationary point of the corresponding optimization problem. We also show through numerical experiments that the proposed algorithm is much faster than the multiplicative update rule.

Notation: \(\mathbb {R}\) denotes the set of real numbers. \(\mathbb {R}_{+}\) and \(\mathbb {R}_{++}\) denote the sets of nonnegative and positive real numbers, respectively. For any subset S of \(\mathbb {R}\), \(S^{I \times J}\) denotes the set of all \(I \times J\) matrices whose entries all belong to S. For example, \(\mathbb {R}_{+}^{I \times J}\) is the set of all \(I \times J\) real nonnegative matrices. \(\mathbb {N}\) denotes the set of natural numbers, that is, the set of positive integers. \(\varvec{0}_{I \times J}\) and \(\varvec{1}_{I \times J}\) denote the \(I \times J\) matrices of all zeros and all ones, respectively. For two matrices \(\varvec{A}=(A_{ij})\) and \(\varvec{B}=(B_{ij})\) of the same size, the inequality \(\varvec{A} \ge \varvec{B}\) means that \(A_{ij} \ge B_{ij}\) for all i and j, and \((\varvec{A}\varvec{B})_{ij}\) denotes the (i, j)-th entry of the matrix \(\varvec{A}\varvec{B}\), that is, \((\varvec{A}\varvec{B})_{ij}=\sum _{k}A_{ik}B_{kj}\).

2 Alpha-Divergence Based Nonnegative Matrix Factorization

2.1 Optimization Problem

Suppose we are given an \(M \times N\) nonnegative matrix \(\varvec{X}=(X_{ij})\). The alpha-divergence based NMF is formulated as the constrained optimization problem:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} D_\alpha (\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T}) \\ \text {subject to} &{} \varvec{W} \in \mathbb {R}_{+}^{M \times K}, \;\; \varvec{H} \in \mathbb {R}_{+}^{N \times K} \end{array} \end{aligned}$$
(1)

where

$$\begin{aligned} D_\alpha (\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^{\mathrm {T}})&= \frac{1}{\alpha (1-\alpha )} \sum _{i=1}^M \sum _{j=1}^N \biggl [ \alpha X_{ij} + (1-\alpha )(\varvec{W} \varvec{H}^\mathrm {T})_{ij} \nonumber \\&\qquad -X_{ij}^\alpha (\varvec{W}\varvec{H}^\mathrm {T})^{1-\alpha }_{ij} \biggr ] \quad (\alpha \ne 0,1). \end{aligned}$$
(2)

When \(\alpha >1\), the right-hand side of (2) is not defined for every pair of nonnegative matrices \(\varvec{W}\) and \(\varvec{H}\), because \((\varvec{W}\varvec{H}^\mathrm {T})_{ij}^{1-\alpha }\) is undefined when \((\varvec{W}\varvec{H}^\mathrm {T})_{ij}=0\). A simple way to make the problem well-defined is to modify (1) as follows:

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} D_\alpha (\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T}) \\ \text {subject to} &{} \varvec{W} \in [\epsilon ,\infty )^{M \times K}, \;\; \varvec{H} \in [\epsilon ,\infty )^{N \times K} \end{array} \end{aligned}$$
(3)

where \(\epsilon \) is a positive constant, which is usually set to a small number so that (3) is close to (1). In what follows, we consider (3) instead of (1).
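For concreteness, the following is a minimal NumPy sketch (not the authors' C implementation) of how the objective in (2) can be evaluated at a feasible point of (3); all names (alpha_divergence, eps, and so on) are ours and are chosen only for illustration.

```python
import numpy as np

def alpha_divergence(X, W, H, alpha):
    """D_alpha(X || W H^T) as in (2); alpha must not be 0 or 1."""
    Y = W @ H.T                                          # (W H^T)_{ij}
    terms = alpha * X + (1.0 - alpha) * Y - X**alpha * Y**(1.0 - alpha)
    return terms.sum() / (alpha * (1.0 - alpha))

# A feasible point of (3): both factors are clipped from below by epsilon.
rng = np.random.default_rng(0)
M, N, K, eps, alpha = 40, 20, 5, 1e-6, 0.5
X = rng.uniform(size=(M, N))                             # entries positive with probability one (cf. Assumption 1)
W = np.maximum(eps, rng.uniform(size=(M, K)))
H = np.maximum(eps, rng.uniform(size=(N, K)))
print(alpha_divergence(X, W, H, alpha))
```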

Note that sparse factor matrices can never be obtained from the modified optimization problem (3). However, if we replace all entries equal to \(\epsilon \) in the obtained solution with zero, the resulting factor matrices are expected to be sparse because local optimal solutions of (3) are often located at the boundary of the feasible region. In addition, if \(\epsilon \) is sufficiently small, the pair of resulting factor matrices is expected to be close to the original local optimal solution.

When \(\alpha <0\) and \(\varvec{X}\) has a zero entry, the right-hand side of (2) is not defined because \(X_{ij}^{\alpha }\) is undefined for \(X_{ij}=0\). We therefore impose the following assumption on \(\varvec{X}\) throughout this paper.

Assumption 1

All entries of \(\varvec{X}\) are positive.

2.2 Properties of the Objective Function

The partial derivatives of \(D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T})\) with respect to \(W_{ik}\) and \(H_{jk}\) are given by

$$\begin{aligned} \frac{\partial {D_\alpha }}{\partial {W_{ik}}}&=\frac{1}{\alpha }\left( \sum _j H_{jk} - \sum _j X_{ij}^\alpha H_{jk} (\varvec{W} \varvec{H}^{\mathrm {T}})_{ij}^{-\alpha } \right) , \\ \frac{\partial {D_\alpha }}{\partial {H_{jk}}}&=\frac{1}{\alpha }\left( \sum _i W_{ik} - \sum _i X_{ij}^\alpha W_{ik} (\varvec{W} \varvec{H}^{\mathrm {T}})_{ij}^{-\alpha } \right) , \end{aligned}$$

and the second and third partial derivatives are given by

$$\begin{aligned} \frac{\partial ^2{D_\alpha }}{\partial W_{ik}^2}&=\sum _j X_{ij}^\alpha H_{jk}^2 (\varvec{W}\varvec{H}^\mathrm {T})_{ij}^{-\alpha -1}, \\ \frac{\partial ^2{D_\alpha }}{\partial H_{jk}^2}&=\sum _i X_{ij}^\alpha W_{ik}^2 (\varvec{W}\varvec{H}^\mathrm {T})_{ij}^{-\alpha -1}, \\ \frac{\partial ^3{D_\alpha }}{\partial W_{ik}^3}&=-(\alpha +1) \sum _j X_{ij}^\alpha H_{jk}^3 (\varvec{W}\varvec{H}^\mathrm {T})_{ij}^{-\alpha -2}, \\ \frac{\partial ^3{D_\alpha }}{\partial H_{jk}^3}&=-(\alpha +1)\sum _i X_{ij}^\alpha W_{ik}^3 (\varvec{W}\varvec{H}^\mathrm {T})_{ij}^{-\alpha -2}. \end{aligned}$$

Under Assumption 1, the second partial derivatives are positive for all \(\alpha \,(\ne 0,1)\) and all pairs of positive matrices \(\varvec{W}\) and \(\varvec{H}\). Therefore, if we fix all entries of \(\varvec{W}\) and \(\varvec{H}\) except \(W_{ik}\) (\(H_{jk}\), resp.) to constants not less than \(\epsilon \), we obtain a function of \(W_{ik}\) (\(H_{jk}\), resp.) that is strictly convex on \([\epsilon , \infty )\). In what follows, we denote these functions by \(f_{ik}(W_{ik})\) and \(g_{jk}(H_{jk})\). Then \(f'_{ik}(W_{ik})\) and \(g'_{jk}(H_{jk})\) are monotonically increasing functions, which are convex if \(\alpha \le -1\) and concave otherwise.
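As a companion to these formulas, here is a NumPy sketch of the single-coordinate derivatives \(f'_{ik}\) and \(f''_{ik}\) used by the algorithm in Sect. 3; the function names f_prime and f_double_prime are ours.

```python
import numpy as np

def f_prime(X, W, H, i, k, alpha):
    """f'_{ik}(W_{ik}): first derivative of D_alpha w.r.t. W_{ik}, all other variables fixed."""
    Yi = W[i] @ H.T                                      # i-th row of W H^T
    return (H[:, k].sum() - np.sum(X[i]**alpha * H[:, k] * Yi**(-alpha))) / alpha

def f_double_prime(X, W, H, i, k, alpha):
    """f''_{ik}(W_{ik}): second derivative, positive under Assumption 1."""
    Yi = W[i] @ H.T
    return np.sum(X[i]**alpha * H[:, k]**2 * Yi**(-alpha - 1.0))
```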

It is easy to see that the objective function of (3) has the following property.

Lemma 1

For any \(\alpha \in \mathbb {R} \setminus \{0,1\}\), \(\epsilon \in \mathbb {R}_{++}\), \(\varvec{X} \in \mathbb {R}_{++}^{M \times N}\), \(\varvec{W}^{*} \in [\epsilon ,\infty )^{M \times K}\) and \(\varvec{H}^{*} \in [\epsilon , \infty )^{N \times K}\), the level set

$$\begin{aligned} \{(\varvec{W},\varvec{H}) \in [\epsilon ,\infty )^{M \times K} \times [\epsilon ,\infty )^{N \times K}\,| D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T}) \le D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}^{*}(\varvec{H}^{*})^\mathrm {T})\} \end{aligned}$$

is bounded.

2.3 Optimality Conditions

Let \(\mathcal {F}_{\epsilon }=[\epsilon ,\infty )^{M \times K} \times [\epsilon ,\infty )^{N \times K}\) be the feasible region of the problem (3). If \((\varvec{W},\varvec{H}) \in \mathcal {F}_{\epsilon }\) is a local optimal solution of (3), it must satisfy the following conditions:

$$\begin{aligned} \forall i,k, \quad \frac{\partial D_{\alpha }}{\partial W_{ik}}&{\left\{ \begin{array}{ll} \ge 0, &{} \mathrm{if} \;\; W_{ik}=\epsilon , \\ =0, &{} \mathrm{if} \;\; W_{ik} > \epsilon , \end{array}\right. } \end{aligned}$$
(4)
$$\begin{aligned} \forall j,k, \quad \frac{\partial D_{\alpha }}{\partial H_{jk}}&{\left\{ \begin{array}{ll} \ge 0, &{} \mathrm{if} \;\; H_{jk}=\epsilon , \\ =0, &{} \mathrm{if} \;\; H_{jk} > \epsilon . \end{array}\right. } \end{aligned}$$
(5)

A point \((\varvec{W},\varvec{H}) \in \mathcal {F}_{\epsilon }\) satisfying (4) and (5) is called a stationary point of (3).
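As an illustration, conditions (4) and (5) can be checked numerically as follows; this NumPy sketch uses a single tolerance tol of our own choosing (it is not one of the parameters of Algorithm 3 in Sect. 3).

```python
import numpy as np

def is_stationary(X, W, H, alpha, eps, tol=1e-8):
    """Check conditions (4) and (5) up to the tolerance tol."""
    Y = W @ H.T
    R = X**alpha * Y**(-alpha)                           # X_{ij}^alpha (W H^T)_{ij}^{-alpha}
    dW = (H.sum(axis=0) - R @ H) / alpha                 # partial D_alpha / partial W_{ik}
    dH = (W.sum(axis=0) - R.T @ W) / alpha               # partial D_alpha / partial H_{jk}
    okW = np.where(W > eps, np.abs(dW) <= tol, dW >= -tol)
    okH = np.where(H > eps, np.abs(dH) <= tol, dH >= -tol)
    return bool(okW.all() and okH.all())
```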

3 Proposed Algorithm

The algorithm proposed here for solving the problem (3) is based on the coordinate descent and the Newton method. Let the current values of \(\varvec{W}\) and \(\varvec{H}\) be \(\varvec{W}^\mathrm {c} \in [\epsilon ,\infty )^{M \times K}\) and \(\varvec{H}^\mathrm {c} \in [\epsilon ,\infty )^{N \times K}\), respectively. We want to minimize the value of the objective function by updating only one variable, say \(W_{ik}\). This problem is formulated as

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} f_{ik}(W_{ik}) \\ \text {subject to} &{} W_{ik} \ge \epsilon \end{array} \end{aligned}$$
(6)

where \(f_{ik}(W_{ik})\) is the function obtained from \(D_\alpha (\varvec{X}\,\Vert \,\varvec{W} \varvec{H}^\mathrm {T})\) by fixing all variables except \(W_{ik}\) to the current values. Because \(f_{ik}(W_{ik})\) is strictly convex as stated in the previous section, (6) is a convex optimization problem. Therefore, if \(f'_{ik}(W_{ik}^\mathrm {c})=0\) then \(W_{ik}^\mathrm {c}\) is the optimal solution of (6). However, we cannot obtain the optimal solution in a closed form in general. So we apply the Newton method to obtain an approximate solution of \(f'_{ik}(W_{ik})=0\), which is given by

$$\begin{aligned} W_{ik}^\mathrm {n}=W_{ik}^\mathrm {c}-\frac{f'_{ik}(W_{ik}^\mathrm {c})}{f''_{ik}(W_{ik}^\mathrm {c})}. \end{aligned}$$

If \(f'_{ik}(W_{ik}^\mathrm {n})=0\) then \(W_{ik}^\mathrm {n}\) is the minimum point of \(f_{ik}(W_{ik})\), and hence \(W_{ik}^\mathrm {new}=\max \{\epsilon , W_{ik}^\mathrm {n}\}\) is the optimal solution of (6). If \(f'_{ik}(W_{ik}^\mathrm {n}) f'_{ik}(W_{ik}^\mathrm {c})>0\) then \(f_{ik}(W_{ik})\) decreases monotonically as the value of \(W_{ik}\) varies from \(W_{ik}^\mathrm {c}\) to \(W_{ik}^\mathrm {n}\). Hence, letting \(W_{ik}^\mathrm {new}=\max \{\epsilon ,W_{ik}^\mathrm {n}\}\), we have \(f_{ik}(W_{ik}^\mathrm {new})<f_{ik}(W_{ik}^\mathrm {c})\). On the other hand, in the case where \(f'_{ik}(W_{ik}^\mathrm {n}) f'_{ik}(W_{ik}^\mathrm {c})<0\), it can occur that \(f_{ik}(W_{ik}^\mathrm {n})>f_{ik}(W_{ik}^\mathrm {c})\) (see Fig. 1). In order to avoid this situation, we use a linear interpolation of the curve \(Y=f'_{ik}(W_{ik})\). We first draw a line

$$\begin{aligned} Y-f'_{ik}(W_{ik}^\mathrm {c})=\frac{f'_{ik}(W_{ik}^\mathrm {c})-f'_{ik}(W_{ik}^\mathrm {n})}{W_{ik}^\mathrm {c}-W_{ik}^\mathrm {n}} (W_{ik}-W_{ik}^\mathrm {c}) \end{aligned}$$

on the \(W_{ik}\)-\(Y\) plane, which passes through the points \((W_{ik}^\mathrm {c},f'_{ik}(W_{ik}^\mathrm {c}))\) and \((W_{ik}^\mathrm {n},f'_{ik}(W_{ik}^\mathrm {n}))\) (see the red line in Fig. 1). We then find the point at which the line intersects the \(W_{ik}\)-axis, and let the \(W_{ik}\)-coordinate of that point be a new approximate solution \(W_{ik}^\mathrm {i}\) of \(f'_{ik}(W_{ik})=0\), that is,

$$\begin{aligned} W_{ik}^\mathrm {i}=W_{ik}^\mathrm {c}-\frac{W_{ik}^\mathrm {n}-W_{ik}^\mathrm {c}}{f'_{ik}(W_{ik}^\mathrm {n})-f'_{ik}(W_{ik}^\mathrm {c})} f'_{ik}(W_{ik}^\mathrm {c}). \end{aligned}$$

Furthermore, if \(W_{ik}^\mathrm {i}\) is less than \(\epsilon \), we replace it with \(\epsilon \). Then we have \(f'_{ik}(W_{ik}^\mathrm {i})f'_{ik}(W_{ik}^\mathrm {c})>0\) and \(f_{ik}(W_{ik}^\mathrm {i})<f_{ik}(W_{ik}^\mathrm {c})\).

Fig. 1. Update rule for \(W_{ik}\) when \(f'_{ik}(W_{ik}^\mathrm {c}) f'_{ik}(W_{ik}^\mathrm {n})<0\). (Color figure online)

Fig. 2. Update rule for \(W_{ik}\).

Figure 2 shows the proposed update rule for \(W_{ik}\), which is based on the idea described above but slightly modified so that a better solution can be obtained.
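To make the idea concrete, the following NumPy sketch implements the basic Newton-plus-interpolation coordinate update described above, reusing f_prime and f_double_prime from the sketch in Sect. 2.2. It is only a simplified illustration of Algorithm 1: in particular, it clips the Newton point at \(\epsilon \) before evaluating \(f'_{ik}\) there, and it omits the additional modifications mentioned above.

```python
def update_W_ik(X, W, H, i, k, alpha, eps):
    """One coordinate update of W_{ik}; W and H are assumed to satisfy W, H >= eps."""
    w_c = W[i, k]
    d1 = f_prime(X, W, H, i, k, alpha)
    if d1 == 0.0:
        return w_c                                       # w_c already solves subproblem (6)
    d2 = f_double_prime(X, W, H, i, k, alpha)
    w_n = max(eps, w_c - d1 / d2)                        # Newton step, clipped at eps
    W_try = W.copy()
    W_try[i, k] = w_n
    d1_n = f_prime(X, W_try, H, i, k, alpha)             # f'_{ik} at the Newton point
    if d1 * d1_n >= 0.0:                                 # same sign: f_{ik} decreases from w_c to w_n
        return w_n
    # Opposite signs: secant (linear interpolation) step through
    # (w_c, f'(w_c)) and (w_n, f'(w_n)), clipped at eps.
    return max(eps, w_c - (w_n - w_c) / (d1_n - d1) * d1)
```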

The problem of minimizing the value of the objective function by updating only \(H_{jk}\) is formulated as

$$\begin{aligned} \begin{array}{ll} \text {minimize} &{} g_{jk}(H_{jk}) \\ \text {subject to} &{} H_{jk} \ge \epsilon \end{array} \end{aligned}$$
(7)

where \(g_{jk}(H_{jk})\) is the function obtained from \(D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T})\) by fixing all variables except \(H_{jk}\) to the current values. Using the same idea as above, we can derive an update rule for \(H_{jk}\) as shown in Fig. 3.
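Since \(D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T})=D_{\alpha }(\varvec{X}^\mathrm {T}\,\Vert \,\varvec{H}\varvec{W}^\mathrm {T})\), subproblem (7) has exactly the same form as (6), and in code the \(H_{jk}\) update can simply reuse the \(W_{ik}\) routine on transposed data. This shortcut is a convenience of our sketch, not a statement about how Algorithm 2 itself is organized.

```python
def update_H_jk(X, W, H, j, k, alpha, eps):
    """One coordinate update of H_{jk}, via the identity D_alpha(X||WH^T) = D_alpha(X^T||HW^T)."""
    return update_W_ik(X.T, H, W, j, k, alpha, eps)
```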

Fig. 3. Update rule for \(H_{jk}\).

It is clear from the derivation of Algorithms 1 and 2 that the following two lemmas hold true.

Lemma 2

If \(W_{ik}\) is not the optimal solution of the subproblem (6) then \(W_{ik}^\mathrm {new}\) obtained by Algorithm 1 satisfies \(f_{ik}(W_{ik}^\mathrm {new}) < f_{ik}(W_{ik})\). Similarly, if \(H_{jk}\) is not the optimal solution of the subproblem (7) then \(H_{jk}^\mathrm {new}\) obtained by Algorithm 2 satisfies \(g_{jk}(H_{jk}^\mathrm {new}) < g_{jk}(H_{jk})\).

Lemma 3

Suppose that \(\varvec{X} \in \mathbb {R}_{++}^{M \times N}\), \(\alpha \in \mathbb {R}\setminus \{0,1\}\) and \(\epsilon \in \mathbb {R}_{++}\) are fixed. Then \(W_{ik}^\mathrm {new}\), the output of Algorithm 1, depends continuously on \(\varvec{W}\) and \(\varvec{H}\) for any i and k. Similarly, \(H_{jk}^\mathrm {new}\), the output of Algorithm 2, depends continuously on \(\varvec{W}\) and \(\varvec{H}\) for any j and k.

Furthermore, using Zangwill’s global convergence theorem [16], we obtain the following theorem.

Theorem 1

Given \(\varvec{X} \in \mathbb {R}_{++}^{M \times N}\), \(K \in \mathbb {N}\), \(\alpha \in \mathbb {R}\setminus \{0,1\}\), \(\epsilon \in \mathbb {R}_{++}\), and an initial solution \((\varvec{W}^{(0)},\varvec{H}^{(0)}) \in \mathcal {F}_{\epsilon }\), we apply the update rules described by Algorithms 1 and 2 to the \(MK+NK\) variables in a fixed cyclic order. Let \((\varvec{W}^{(l)},\varvec{H}^{(l)}) \in \mathcal {F}_{\epsilon }\) be the solution after l rounds of updates. Then the sequence \(\{(\varvec{W}^{(l)},\varvec{H}^{(l)})\}_{l=0}^{\infty }\) has at least one convergent subsequence, and the limit of any convergent subsequence is a stationary point of the problem (3).

Proof

Let us express the relation between \((\varvec{W}^{(l)},\varvec{H}^{(l)})\) and \((\varvec{W}^{(l+1)},\varvec{H}^{(l+1)})\) by using a mapping \(A: \mathcal {F}_{\epsilon } \rightarrow \mathcal {F}_{\epsilon }\) as follows:

$$\begin{aligned} (\varvec{W}^{(l+1)},\varvec{H}^{(l+1)}) = A(\varvec{W}^{(l)},\varvec{H}^{(l)}). \end{aligned}$$

In view of Zangwill’s global convergence theorem [16], it suffices to show that the following statements hold true.

  1. (Boundedness) For any initial solution \((\varvec{W}^{(0)},\varvec{H}^{(0)}) \in \mathcal {F}_{\epsilon }\), the sequence \(\{(\varvec{W}^{(l)},\varvec{H}^{(l)})\}_{l=0}^{\infty }\) belongs to a compact subset of \(\mathcal {F}_{\epsilon }\).

  2. (Monotoneness) The objective function \(D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T})\) satisfies

     $$\begin{aligned} (\varvec{W},\varvec{H}) \not \in \mathcal {S}_{\epsilon }&\Rightarrow D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}'(\varvec{H}')^\mathrm {T}) < D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T}) \\ (\varvec{W},\varvec{H}) \in \mathcal {S}_{\epsilon }&\Rightarrow D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}'(\varvec{H}')^\mathrm {T}) \le D_{\alpha }(\varvec{X}\,\Vert \,\varvec{W}\varvec{H}^\mathrm {T}) \end{aligned}$$

     where \(\mathcal {S}_{\epsilon }\) is the set of stationary points of (3) and \((\varvec{W}',\varvec{H}')=A(\varvec{W},\varvec{H})\).

  3. (Continuity) The mapping A is continuous in \(\mathcal {F}_{\epsilon } \setminus \mathcal {S}_{\epsilon }\).

The monotoneness follows from Lemma 2. The boundedness follows from Lemmas 1 and 2. The continuity follows from Lemma 3.    \(\square \)

By Theorem 1, we can immediately obtain an algorithm that stops within a finite number of rounds by relaxing the optimality conditions given by (4) and (5), as shown in Reference [7]. The resulting algorithm is shown in Fig. 4.
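The following driver sketch combines the earlier routines in the spirit of Algorithm 3: cyclic coordinate updates with a relaxed stationarity test as the stopping condition. The actual parameters of Algorithm 3 (\(\delta _1\), \(\delta _2\), \(\delta _3\) and u) are defined in Fig. 4 and are not reproduced here; the sketch uses the single tolerance tol instead and fixes one particular cyclic order (all of \(\varvec{W}\), then all of \(\varvec{H}\)).

```python
def nmf_alpha_newton(X, W, H, alpha, eps, tol=1e-4, max_rounds=40000):
    """Cyclic coordinate descent for (3); W and H are modified in place."""
    M, K = W.shape
    N = H.shape[0]
    for _ in range(max_rounds):
        for i in range(M):
            for k in range(K):
                W[i, k] = update_W_ik(X, W, H, i, k, alpha, eps)
        for j in range(N):
            for k in range(K):
                H[j, k] = update_H_jk(X, W, H, j, k, alpha, eps)
        if is_stationary(X, W, H, alpha, eps, tol):      # relaxed version of (4)-(5)
            break
    return W, H
```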

Fig. 4. Newton-type algorithm for solving (3).

Theorem 2

For any input, Algorithm 3 stops within a finite number of rounds.

4 Numerical Experiments

In order to evaluate the efficiency of the proposed algorithm, we applied it to a randomly generated matrix \(\varvec{X}\) and compared the results with those obtained using the multiplicative update rule [6, 8] described by

$$\begin{aligned} W_{ik}^\mathrm {new} \leftarrow \max \left( \epsilon , W_{ik}\left( \frac{\sum _{j=1}^N X_{ij}^{\alpha } H_{jk} (\varvec{W}\varvec{H}^{\mathrm {T}})_{ij}^{-\alpha }}{\sum _{j=1}^N H_{jk}}\right) ^{\frac{1}{\alpha }}\right) \end{aligned}$$

and the same stopping condition. Although some other methods have been proposed (see [13] for example), we do not consider them here because their global convergence is not guaranteed.
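For reference, a NumPy sketch of this baseline update for \(\varvec{W}\) is given below; the \(\varvec{H}\) update is the symmetric counterpart obtained by exchanging the roles of \(\varvec{W}\) and \(\varvec{H}\). The vectorized form and the function name are ours, not the authors' C implementation.

```python
import numpy as np

def mu_update_W(X, W, H, alpha, eps):
    """One multiplicative update of W for the alpha-divergence, clipped at eps."""
    Y = W @ H.T
    ratio = ((X / Y)**alpha @ H) / H.sum(axis=0)         # shape (M, K)
    return np.maximum(eps, W * ratio**(1.0 / alpha))
```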

In all experiments, \(\varvec{X}\) was set to the same \(40 \times 20\) matrix, each entry of which was drawn independently from the uniform distribution on the interval [0, 1]. The value of K was set to 5. The values of the parameters in Algorithm 3 were set to \(\epsilon =10^{-6}\), \(\delta _1=10^{-4}\), \(\delta _2=10^{-6}\), \(u=1.0\) and \(\delta _3=0.5\times \delta _1\). The value of the parameter \(\alpha \) in the alpha-divergence was set to \(-1.5\), 0.5 and 2.5. For each value of \(\alpha \), the multiplicative update rule and the proposed algorithm were run 10 times with 10 different initial solutions, which were generated in the same way as \(\varvec{X}\) except that all entries less than \(\epsilon \) were replaced with \(\epsilon \) so that each initial solution belongs to the feasible region of the optimization problem. The maximum number of rounds was set to 40,000; that is, if the solution did not satisfy the stopping condition within 40,000 rounds, the algorithm was forcibly stopped. All algorithms were implemented in C, compiled with gcc 5.3.0, and tested on a PC with an Intel Core i5-4590 CPU and 8 GB of RAM.

The results are shown in Tables 1 and 2. It is easily seen from those tables that the proposed algorithm is much faster than the multiplicative update rule.

Table 1. The number of rounds of the multiplicative update (MU) rule and Algorithm 3 for solving (3).
Table 2. Computation time (in seconds) of the multiplicative update (MU) rule and Algorithm 3 for solving (3).

5 Conclusion

We have proposed a novel iterative algorithm, based on the coordinate descent and the Newton method, for NMF with the alpha-divergence. The proposed algorithm not only has the global convergence property, like the multiplicative update rule, but is also much faster, as the experimental results in the previous section show. Further experiments with various real datasets should be performed in the near future to evaluate the efficiency of the proposed algorithm.