Abstract
Nonnegative matrix factorization (NMF) is a popular dimensionality reduction technique. NMF is typically cast as a non-convex optimization problem solved via standard iterative schemes, such as coordinate descent methods. Hence the choice of the initialization for the variables is crucial, as it influences both the factorization quality and the convergence speed. Different strategies have been proposed in the literature; the most popular ones rely on the singular value decomposition (SVD). In particular, Atif et al. (Pattern Recognit Lett 122:53–59, 2019) have introduced a very efficient SVD-based initialization, namely NNSVD-LRC, that overcomes the drawbacks of previous methods, namely, it guarantees that (i) the error decreases as the factorization rank increases, (ii) the initial factors are sparse, and (iii) the computational cost is low. In this paper, we improve upon NNSVD-LRC by using the low-rank structure of the residual matrix; this allows us to obtain NMF initializations with similar quality to NNSVD-LRC (in terms of error and sparsity) while reducing the computational load. We evaluate our proposed solution against other NMF initializations on several real dense and sparse datasets.
1 Introduction
Nonnegative matrix factorization (NMF) decomposes an input nonnegative data matrix, \(X\in \mathbb {R}_+^{m\times n}\), into the product of two nonnegative matrices \(W\in \mathbb R_{+}^{m\times k}\) and \(H\in \mathbb R_{+}^{k\times n}\) such that \(X\approx WH\), where k is the rank of the factorization. The nonnegativity constraints of NMF allow one to reconstruct data using a purely additive model, with the advantage of a natural interpretability. This makes NMF useful in many applications (Cichocki et al. 2009; Gillis 2020) in various research fields like image analysis (Ensari 2016; Prajapati and Jadhav 2015; Luce et al. 2016; Shiga et al. 2016), text mining (Du et al. 2019), signal processing (Yoshii et al. 2016), and computational biology (Maruyama et al. 2014).
NMFs are in most cases computed by solving the following optimization problem:

$$ \min _{W \in \mathbb {R}_+^{m\times k},\; H \in \mathbb {R}_+^{k\times n}} \; ||X - WH||_F^2, \qquad (1) $$
where \(||\cdot ||_F\) denotes the Frobenius norm. Most algorithms for solving (1) use standard non-linear optimization schemes, such as block coordinate descent methods, that update the factors iteratively; see, e.g., Cichocki et al. (2009), Gillis (2020) and the references therein. Hence, they require an initialization of the variables W and H, which can greatly influence the factorization results. The initialization can affect both (i) the number of iterations needed for an algorithm to converge (indeed, if the initial point is close to a local minimum, fewer iterations are needed to converge to it), and (ii) the final solution the algorithm converges to. Different strategies have been proposed for NMF initialization (Esposito 2021), such as simple schemes (e.g., random initializations) (Langville et al. 2006), structured schemes (e.g., SVD- or clustering-based (Wild et al. 2004; Rezaei et al. 2011)), evolutionary approaches (e.g., genetic algorithms, particle swarm optimization) (Janecek and Tan 2011), and geometric methods. The latter class of methods tries to locate the vertices of the convex polytope formed by the columns of X; see, e.g., Zdunek (2012), Araújo et al. (2001), Nascimento and Dias (2005), Liu and Tan (2018) and [Gillis (2020), Chapter 7]. To the best of our knowledge, the most popular initialization methods for NMF are based on deterministic low-rank decompositions, such as rank-one decompositions (Liu and Tan 2018) and the SVD (Esposito 2021), and on geometric methods such as vertex component analysis (Nascimento and Dias 2005) and the successive projection algorithm (Araújo et al. 2001; Sauwen et al. 2017). In this paper, we focus on SVD-based initializations. In this class, two of the most widely used methods, introduced in Boutsidis and Gallopoulos (2008) and Qiao (2015), have the drawback of increasing the approximation error, \(||X-WH||_F^2\), as the rank increases; see Sect. 2 for more details.
More recently, a variant of these approaches, referred to as Nonnegative SVD with Low-Rank Correction (NNSVD-LRC), makes use of a low-rank correction to address this issue (Atif et al. 2019), while reducing the computational load by using a truncated SVD of smaller rank (roughly half). In this paper, we improve upon NNSVD-LRC by proposing a new initialization scheme, referred to as accelerated Nonnegative SVD with Progressive Residual Projection (accNNSVD-PRP), that further reduces the computational load of NNSVD-LRC by keeping track of the residual using its low-rank structure. The paper is organized as follows. Section 2 briefly reviews the background of SVD-based initialization for NMF, with an emphasis on NNSVD-LRC. Section 3 details our proposed solution, and highlights the differences with existing SVD-based initializations. In Sect. 4, we show that the new proposed initialization, accNNSVD-PRP, compares favorably against standard structured and random initializations on several real dense and sparse data sets. Section 5 concludes the paper and discusses some future directions of research.
2 SVD-based NMF initializations: previous works
As mentioned in the introduction, SVD-based initializations are among the most popular and effective for NMF. Recall that the rank-p truncated SVD approximates a data matrix \(X\in \mathbb {R}_+^{m\times n}\) with a lower-rank matrix composed of \(p\) rank-one terms as follows:

$$ X \approx X_{p} = U_{p} \Sigma _{p} V_{p}^\top = \sum _{i=1}^{p} \sigma _i \, U_{p}(:,i) \, V_{p}(:,i)^\top , \qquad (2) $$
where \(X_{p}\) is the best rank-p approximation of X in the Frobenius norm, \(U_{p} \in \mathbb {R}^{m\times p}\) and \(V_{p} \in \mathbb {R}^{n\times p}\) contain left and right singular vectors, and \(\Sigma _{p} \in \mathbb {R}^{p\times p}\) contains the singular values on its diagonal in nonincreasing order. Let us write the truncated SVD as the product of only two matrices, as follows:

$$ X_p = YZ = \sum _{i=1}^{p} y_i z_i, \qquad (3) $$
where \(Y = U_p \Sigma _p^{1/2} \in \mathbb {R}^{m \times p}\) and \(Z = \Sigma _p^{1/2} V_p^\top \in \mathbb {R}^{p \times n}\), so that \(y_i = \sqrt{\sigma _i}\, U_p(:,i)\) and \(z_i = \sqrt{\sigma _i}\, V_p(:,i)^\top \) for \(1 \le i \le p\). (Note that the \(z_i\)'s are row vectors, which simplifies the notation by avoiding the transpose sign.) Due to their negative entries,Footnote 1 Y and Z cannot be used directly for NMF initialization.
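In NumPy, the split of the truncated SVD into the factors Y and Z can be sketched as follows (variable names are ours, not from the paper):

```python
import numpy as np

def svd_factors(X, p):
    """Return Y (m x p) and Z (p x n) with Y @ Z = X_p, the best rank-p
    approximation of X, splitting sqrt(sigma_i) between both factors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    sq = np.sqrt(s[:p])
    Y = U[:, :p] * sq          # y_i = sqrt(sigma_i) * U_p(:, i)
    Z = sq[:, None] * Vt[:p]   # z_i = sqrt(sigma_i) * V_p(:, i)^T
    return Y, Z

# Y @ Z matches the truncated SVD reconstruction X_p
rng = np.random.default_rng(0)
X = rng.random((8, 6))
Y, Z = svd_factors(X, 3)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Xp = U[:, :3] @ np.diag(s[:3]) @ Vt[:3]
assert np.allclose(Y @ Z, Xp)
```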
Let us define, for a generic vector \(b\in \mathbb {R}^{q}\), its nonnegative part, \(b^{(\ge 0)} = \max (0,b)\), and its nonpositive part, \(b^{(\le 0)} = \max (0,-b)\), so that \(b = b^{(\ge 0)} - b^{(\le 0)}\). In this way, (3) can be rewritten as:

$$ X_p = \sum _{i=1}^{p} \left( y_i^{(\ge 0)} z_i^{(\ge 0)} + y_i^{(\le 0)} z_i^{(\le 0)} \right) - \sum _{i=1}^{p} \left( y_i^{(\ge 0)} z_i^{(\le 0)} + y_i^{(\le 0)} z_i^{(\ge 0)} \right) . \qquad (4) $$
The negative sign of the second term in (4) poses a challenge for NMF initialization, since it can lead to negative initialization values, which are not suitable for NMF purposes. In the literature, there are different methods designed to handle these negative values.
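The identity (4) is simply the expansion of \(y_i z_i = (y_i^{(\ge 0)} - y_i^{(\le 0)})(z_i^{(\ge 0)} - z_i^{(\le 0)})\); a quick numerical check (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(5)   # a generic column y_i (with negative entries)
z = rng.standard_normal(4)   # a generic row z_i

yp, ym = np.maximum(0, y), np.maximum(0, -y)   # y^{(>=0)}, y^{(<=0)}
zp, zm = np.maximum(0, z), np.maximum(0, -z)   # z^{(>=0)}, z^{(<=0)}

outer = np.outer(y, z)
# (4): y z = (y+ z+ + y- z-) - (y+ z- + y- z+), all four terms nonnegative
recon = (np.outer(yp, zp) + np.outer(ym, zm)) - (np.outer(yp, zm) + np.outer(ym, zp))
assert np.allclose(outer, recon)
```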
The two most popular and widely used approaches are the following:
(1) Nonnegative Double SVD (NNDSVD) (Boutsidis and Gallopoulos 2008) discards the second term (with the minus sign) in (4), and selects \(p\) product terms (among the 2p) from the first term according to the following criterion: for each \(i\), it chooses \(y_{i}^{(\ge 0)}z_{i}^{(\ge 0)}\) if \(||y_{i}^{(\ge 0)}z_{i}^{(\ge 0)}||_F > ||y_i^{(\le 0)}z_{i}^{(\le 0)}||_F\), otherwise it opts for \(y_i^{(\le 0)}z_{i}^{(\le 0)}\). This takes advantage of the sign ambiguity of the SVD (Boutsidis and Gallopoulos 2008; Bro et al. 2008).
(2) SVD-NMF (Qiao 2015) adopts as an initialization the component-wise absolute values of Y and Z, that is, \(W = |Y|\) and \(H = |Z|\).
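The NNDSVD selection rule can be sketched as follows (a NumPy illustration of our reading of the criterion; it exploits that, for a rank-one matrix, \(||y z||_F = ||y||_2 \, ||z||_2\)):

```python
import numpy as np

def nndsvd_like_pick(Y, Z):
    """For each i, keep the dominant nonnegative product term of (4):
    y_i^{(>=0)} z_i^{(>=0)} if its Frobenius norm exceeds that of
    y_i^{(<=0)} z_i^{(<=0)}, else the latter. A sketch of the NNDSVD
    selection rule (Boutsidis & Gallopoulos 2008)."""
    m, p = Y.shape
    n = Z.shape[1]
    W, H = np.zeros((m, p)), np.zeros((p, n))
    for i in range(p):
        yp, ym = np.maximum(0, Y[:, i]), np.maximum(0, -Y[:, i])
        zp, zm = np.maximum(0, Z[i]), np.maximum(0, -Z[i])
        # ||y z||_F = ||y||_2 * ||z||_2 for a rank-one matrix
        if np.linalg.norm(yp) * np.linalg.norm(zp) > np.linalg.norm(ym) * np.linalg.norm(zm):
            W[:, i], H[i] = yp, zp
        else:
            W[:, i], H[i] = ym, zm
    return W, H

# SVD-NMF (Qiao 2015) instead simply takes W, H = np.abs(Y), np.abs(Z)
rng = np.random.default_rng(2)
Y = rng.standard_normal((6, 4))
Z = rng.standard_normal((4, 5))
W, H = nndsvd_like_pick(Y, Z)
```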
Unfortunately, these two methods have several drawbacks. First, the reconstruction error of the initialization, \(||X-WH||_F^2\), increases as the rank of the factorization increases, because all negative terms are discarded. Second, much of the information embedded in the first term of (4) is lost. To alleviate these drawbacks, another method, called Nonnegative SVD with Low-Rank Correction (NNSVD-LRC), was proposed more recently in Atif et al. (2019): it splits (4) into its different parts and considers only the nonnegative side of the rank-p approximation.
2.1 Nonnegative SVD with low-rank correction
NNSVD-LRC is composed of two phases: a nonnegative SVD (NNSVD) initialization, followed by a Low-Rank Correction (LRC) that allows NNSVD-LRC to decrease the error as the factorization rank increases. The pseudo-code of NNSVD-LRC is detailed in Algorithm 1.
The first phase, NNSVD, computes the nonnegative factors W and H from the rank-p truncated SVD of X, using \(p=\lfloor k/2+1 \rfloor \) in (2), and then constructs Y and Z as in (3). By the Perron-Frobenius theorem, \(|y_1|\,|z_1|\) is an optimal rank-one approximation of X in the Frobenius norm, hence we set \(W(:,1) = |y_1|\) and \(H(1,:) = |z_1|\).
The remaining \(k-1\) columns of W and rows of H are completed with the next \(p-1\) factors of the truncated SVD as follows:

$$ \bigl( W(:,j),\; H(j,:) \bigr) = \begin{cases} \bigl( y_i^{(\ge 0)},\; z_i^{(\ge 0)} \bigr) & \text{if } j \text{ is even}, \\ \bigl( y_i^{(\le 0)},\; z_i^{(\le 0)} \bigr) & \text{if } j \text{ is odd}, \end{cases} \qquad (5) $$
where \(j = 2, 3, \dots , k\) and \(i=\lfloor \frac{j}{2} + 1 \rfloor \).
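A sketch of this NNSVD phase in NumPy (our own reading of Algorithm 1, not the authors' code):

```python
import numpy as np

def nnsvd_init(X, k):
    """First phase of NNSVD-LRC, as we read it from (5): a rank-p truncated
    SVD with p = floor(k/2) + 1, first column from the Perron pair |y_1|, |z_1|,
    remaining columns from the split positive/negative parts of y_i, z_i."""
    p = k // 2 + 1                                    # p = floor(k/2 + 1)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    sq = np.sqrt(s[:p])
    Y = U[:, :p] * sq
    Z = sq[:, None] * Vt[:p]
    m, n = X.shape
    W, H = np.zeros((m, k)), np.zeros((k, n))
    W[:, 0], H[0] = np.abs(Y[:, 0]), np.abs(Z[0])     # Perron-Frobenius pair
    for j in range(2, k + 1):                         # j = 2, ..., k (1-based)
        i = j // 2 + 1                                # i = floor(j/2) + 1
        y, z = Y[:, i - 1], Z[i - 1]
        if j % 2 == 0:                                # j even: (+, +) pair
            W[:, j - 1], H[j - 1] = np.maximum(0, y), np.maximum(0, z)
        else:                                         # j odd: (-, -) pair
            W[:, j - 1], H[j - 1] = np.maximum(0, -y), np.maximum(0, -z)
    return W, H

rng = np.random.default_rng(3)
X = rng.random((20, 15)) + 0.1
W, H = nnsvd_init(X, 7)
```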
The second phase feeds the preliminary factors W and H into LRC to reduce the error induced during the NNSVD phase by the loss of the second term of (4). This is done by using a state-of-the-art NMF algorithm, namely the Accelerated Hierarchical Alternating Least Squares (A-HALS)Footnote 2 algorithm (Gillis and Glineur 2012), to solve

$$ \min _{W \ge 0,\; H \ge 0} ||X_p - WH||_F^2. \qquad (6) $$
The reason for using \(X_p\) instead of X is that \(X_p=YZ\) has a low-rank representation, hence the cost of each iteration of the subsequent NMF algorithm is reduced from O(mnk) operations to \(O\left( (m+n)k^2 \right) \) operations. Although NNSVD-LRC performs only a few relatively cheap A-HALS iterations, it is found empirically that the time taken by these iterations is not negligible compared to the truncated SVD computation.
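The saving comes from never forming \(X_p\) explicitly: products with \(X_p = YZ\) are computed factor by factor. A small NumPy sketch (names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, p = 500, 400, 10, 6
Y = rng.standard_normal((m, p))
Z = rng.standard_normal((p, n))
W = rng.random((m, k))

# Naive: form X_p explicitly, then W^T X_p costs O(mnk) on top of O(mnp).
WtXp_naive = W.T @ (Y @ Z)

# Low-rank aware: (W^T Y) Z costs O(mkp) + O(kpn) = O((m + n) k p) operations.
WtXp_fast = (W.T @ Y) @ Z

assert np.allclose(WtXp_naive, WtXp_fast)
```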
This motivates us to introduce a new SVD-based initialization for NMF, namely accNNSVD-PRP described in the next section.
3 Accelerated nonnegative SVD with progressive residual projection
In this section, we detail the new proposed SVD-based initialization for NMF, namely Accelerated NNSVD with Progressive Residual Projection (accNNSVD-PRP); see Algorithm 2.
It differs from NNSVD-LRC in the following ways:
(1) It modifies the first phase, NNSVD, to keep track of the second term of (4), which is discarded by NNSVD-LRC; see Algorithm 3.

(2) It replaces the second phase, LRC, by a cheaper one, which we refer to as Progressive Residual Projection (PRP). In contrast to NNSVD-LRC, it only updates the factor H to decrease the error.
Let us describe these two phases in more detail.
Phase 1: modified NNSVD (mNNSVD) This phase is very similar to NNSVD, the first phase of NNSVD-LRC (see Algorithm 1). The generated factors (W, H) are the same, since mNNSVD performs the same splitting (5), but it additionally keeps track of the discarded term in (4), denoted \(\overline{H}\), such that

$$ X_p = WH - W\overline{H}, $$

that is,

$$ \overline{H}(1,:) = \varvec{0}, \qquad \overline{H}(j,:) = \begin{cases} z_i^{(\le 0)} & \text{if } j \text{ is even}, \\ z_i^{(\ge 0)} & \text{if } j \text{ is odd}, \end{cases} $$

where \(j = 2, 3, \dots , k\), \(i=\lfloor \frac{j}{2} + 1 \rfloor \), and \(\varvec{0}\) denotes the vector of zeros of appropriate dimension. Algorithm 3 describes this procedure.
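The mNNSVD construction can be sketched in NumPy as follows (our own illustration; in this sketch, for odd k and an entrywise-positive X, the identity \(X_p = WH - W\overline{H}\) is exact, which the final check verifies):

```python
import numpy as np

def mnnsvd(X, k):
    """Sketch of mNNSVD: same W, H as NNSVD, plus the discarded term Hbar,
    chosen so that X_p = W H - W Hbar (our reading of Algorithm 3)."""
    p = k // 2 + 1
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    sq = np.sqrt(s[:p])
    Y = U[:, :p] * sq
    Z = sq[:, None] * Vt[:p]
    m, n = X.shape
    W, H, Hbar = np.zeros((m, k)), np.zeros((k, n)), np.zeros((k, n))
    W[:, 0], H[0] = np.abs(Y[:, 0]), np.abs(Z[0])   # Hbar(1,:) stays zero
    for j in range(2, k + 1):
        i = j // 2 + 1
        y, z = Y[:, i - 1], Z[i - 1]
        if j % 2 == 0:   # keep the (+, +) pair, track the discarded z^{(<=0)}
            W[:, j - 1], H[j - 1] = np.maximum(0, y), np.maximum(0, z)
            Hbar[j - 1] = np.maximum(0, -z)
        else:            # keep the (-, -) pair, track the discarded z^{(>=0)}
            W[:, j - 1], H[j - 1] = np.maximum(0, -y), np.maximum(0, -z)
            Hbar[j - 1] = np.maximum(0, z)
    return W, H, Hbar

rng = np.random.default_rng(5)
X = rng.random((30, 20)) + 0.1
k = 5
W, H, Hbar = mnnsvd(X, k)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
p = k // 2 + 1
Xp = U[:, :p] @ np.diag(s[:p]) @ Vt[:p]
assert np.allclose(W @ H - W @ Hbar, Xp)
```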
Phase 2: Progressive Residual Projection (PRP)
From the output of Algorithm 3, we directly use the basis matrix W as initialization. We will only update H to reduce the approximation error.
To this aim, similarly as in LRC, we want to solve

$$ \min _{H \ge 0} ||X_p - WH||_F^2. \qquad (7) $$
Let us denote by \(H^{(0)}\) and \(\overline{H}^{(0)}\) the outputs of mNNSVD. We have \(X_p = WH^{(0)} - W\overline{H}^{(0)}\). To solve (7), let us resort to Nesterov's accelerated gradient descent (Nesterov 1983, 2018), and let us denote the iterates \(H^{(t)}\) for \(t=0,1,\dots \). At every iteration, we will have

$$ H^{(t+1)} = \max \left( 0,\; S^{(t)} - \frac{1}{L}\, \nabla f \bigl( S^{(t)} \bigr) \right) , $$

where \(S^{(t)}\) is the extrapolated sequence (see Algorithm 4), \(f(H) = \frac{1}{2} ||X_p - WH||_F^2\), and L is the Lipschitz constant of \(\nabla f\).
To progressively keep track of the residual, let us define

$$ \overline{H}^{(t+1)} = \overline{H}^{(t)} + H^{(t+1)} - H^{(t)}, \quad t = 0, 1, \dots $$
This implies that, for all t, the following equality holds:

$$ X_p = WH^{(t)} - W\overline{H}^{(t)}. \qquad (8) $$
This can be used to compute the gradient efficiently, as follows:

$$ \nabla f \bigl( H^{(t)} \bigr) = W^\top \bigl( WH^{(t)} - X_p \bigr) = \bigl( W^\top W \bigr) \, \overline{H}^{(t)}. \qquad (9) $$
It is worth noting that the formulation of the gradient in (9) implicitly depends on \(H^{(t)}\). Since Nesterov's accelerated gradient method does not guarantee that the objective function decreases at every iteration, we equip the algorithm with a restarting scheme: if the objective function increases, the algorithm abandons the extrapolated sequence and takes a standard gradient step (O'Donoghue and Candès 2015). Each iteration of this phase costs \(O\left( k^{2} (m+n) \right) \) operations. Although LRC has the same computational cost per iteration, it alternates between optimizing W and H, requiring a few inner iterations to update each factor, which makes it comparatively slower. Since PRP optimizes H exclusively and typically requires fewer iterations (as it updates only H), it runs significantly faster than LRC.
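The PRP phase can be sketched as follows (a simplified NumPy illustration of Algorithm 4, not the authors' Matlab code; the step size \(1/L\) with \(L = ||W^\top W||_2\) and the restart test are our choices):

```python
import numpy as np

def prp(W, H0, Hbar0, n_iter=50):
    """Sketch of the PRP phase: Nesterov projected gradient on
    min_{H >= 0} (1/2) ||X_p - W H||_F^2 with X_p = W (H0 - Hbar0),
    using only k x k and k x n products (X_p is never formed)."""
    G = W.T @ W                  # k x k Gram matrix, computed once
    C = G @ (Hbar0 - H0)         # so that grad f(H) = G @ H + C
    L = np.linalg.norm(G, 2)     # Lipschitz constant of the gradient

    def f(H):                    # objective, up to an additive constant
        return 0.5 * np.sum((G @ H + 2 * C) * H)

    H, S, theta = H0.copy(), H0.copy(), 1.0
    f_old = f(H)
    for _ in range(n_iter):
        H_new = np.maximum(0, S - (G @ S + C) / L)   # projected gradient step
        if f(H_new) > f_old:     # restart: drop the extrapolated sequence
            S, theta = H, 1.0
            H_new = np.maximum(0, H - (G @ H + C) / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        S = H_new + ((theta - 1) / theta_new) * (H_new - H)   # extrapolation
        H, theta, f_old = H_new, theta_new, f(H_new)
    return H

rng = np.random.default_rng(6)
m, n, k = 40, 30, 5
W = rng.random((m, k))
H0, Hbar0 = rng.random((k, n)), rng.random((k, n))
H = prp(W, H0, Hbar0)
Xp = W @ (H0 - Hbar0)
```

The restart rule guarantees a monotone decrease of the objective, so the final error \(||X_p - WH||_F\) is never worse than the initial one.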
Algorithm 4 provides the pseudo-code of PRP, where \(S^{(t)}\) denotes the extrapolated sequence of Nesterov's gradient method (Nesterov 1983). Recall that Nesterov's accelerated gradient method performs a projected gradient step (step 3 in Algorithm 4) from an extrapolated sequence \(S^{(t)}\), computed via an extrapolation step (step 5 in Algorithm 4), whose extrapolation parameters (step 4 in Algorithm 4) are chosen to guarantee an optimal convergence rate.
Sparsity of factors in accNNSVD-PRP Since W is initialized via the SVD in accNNSVD-PRP, its columns will be roughly 50% sparse (except for the first one, by the Perron-Frobenius theorem). On the other hand, since H is optimized, it is hard to predict the sparsity it will achieve, and it will depend on the data set. On the dense data sets presented in Sect. 4.1, the average sparsity of H was about 50% (between 45% and 56% for all data sets and tested ranks). For the sparse data sets (see Sect. 4.1), the sparsity was between 64% and 74%.
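Here, sparsity is the fraction of zero entries of a factor; a small helper (our convention, with a tolerance for numerically tiny entries):

```python
import numpy as np

def sparsity(M, tol=1e-12):
    """Fraction of (near-)zero entries of M, in [0, 1]."""
    return np.mean(np.abs(M) <= tol)

# A 2 x 2 matrix with two zeros is 50% sparse
assert sparsity(np.array([[0.0, 1.0], [2.0, 0.0]])) == 0.5
```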
4 Numerical experiments
In this section, we report numerical results to demonstrate the effectiveness of our proposed method as an initialization for NMF. The code of accNNSVD-PRP is available on GitHub at https://github.com/5y3datif/accNNSVD-PRP. We compare the proposed method with the scaled random initialization (SRI) from Gillis and Glineur (2008, 2012), and with five structured NMF initializations: three are SVD-based, namely NNDSVD, SVD-NMF and NNSVD-LRC (Atif et al. 2019), and two are clustering-based, namely CR1-NMF, a recent hybrid method combining clustering and the computation of rank-one SVDs (Liu and Tan 2017), and SPKM, based on spherical k-means (Wild et al. 2004). The code for CR1-NMF and SPKM is available from https://github.com/zhaoqiangliu/cr1-nmf. Because of the non-deterministic nature of CR1-NMF, SPKM and SRI, we run these methods 100 times and report mean results along with the corresponding standard deviations. All tests are performed using Matlab R2021b on a laptop with an Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz and an NVIDIA GeForce GTX 1650 GPU, with CUDA toolkit 10.0 and cuDNN 7.4. We take advantage of the GPU whenever possible. We compare the various methods using the following two quantities:
1. the relative error, \(\text {relative error}(X,WH) \; = \; \frac{\Vert X - WH\Vert _{F}}{\Vert X\Vert _{F}}\);

2. the computational time, computed as the CPU time in Matlab.
4.1 Datasets description
We conduct experiments on the following dense and sparse datasets.
4.1.1 Dense datasets
We use the following three widely used real dense datasets:
\(\bullet \) AT&T Faces is a dataset of face images taken between 1992 and 1994, and one of the most widely used face image datasets in the research community. There are ten different images of each of 40 distinct subjects. The size of each image is 92\(\times \)112 pixels, with 256 grey levels per pixel. The dataset is freely available from http://cam-orl.co.uk/facedatabase.html.
\(\bullet \) Faces95 is a facial image dataset consisting of 72 subjects, with a sequence of 20 images per subject at a resolution of 180\(\times \)200 pixels. The dataset is freely accessible from https://www.essex.ac.uk/mv/allfaces/faces95.zip.
\(\bullet \) PaviaU is a hyperspectral image dataset acquired by the ROSIS sensor during a flight campaign over Pavia, Italy. Each image has a resolution of \(610\times 610\) pixels with a spatial resolution of 1.3 m, and 103 spectral bands. The dataset is freely available from http://www.ehu.eus/ccwintco/uploads/e/ee/PaviaU.mat and http://www.ehu.eus/ccwintco/uploads/5/50/PaviaU_gt.mat.
4.1.2 Sparse datasets
We also tested our approach on sparse document datasets. The datasets were derived from the San Jose Mercury newspaper articles that are distributed as part of the TREC collection (TIPSTER Vol. 3) and are accessible from https://catalog.ldc.upenn.edu/LDC93T3D (Zhong and Ghosh 2005):
\(\bullet \) Sports: it consists of documents about seven different sports (baseball, basketball, football, hockey, boxing, bicycling, and golf). It contains 8580 documents with 14870 words. Its sparsity is \(99.14\%\).
\(\bullet \) Reviews: it consists of documents about five topics (food, movies, music, radio, and restaurants). It contains 4069 documents and 18483 words. Its sparsity is \(98.99\%\).
\(\bullet \) Hitech: it consists of documents about six technological areas (computers, electronics, health, medical, research, and technology). It contains 2301 documents with 10080 words. Its sparsity is \(99.14\%\).
4.2 Results
First, as for NNSVD-LRC, and as opposed to NNDSVD and SVD-NMF, the relative error of our proposed method, accNNSVD-PRP, decreases as the rank increases; see Fig. 1 for examples on the AT&T Faces and Reviews data sets, which are representative of the two classes of data sets studied (dense and sparse). Moreover, for the Reviews data set, the relative error keeps decreasing for all the rank values tested (\(k=1,\dots ,20\)), even though these could represent an over-estimation of the \(k=5\) topics in the data set, as explained in Zhong and Ghosh (2005).
We also observe in Fig. 1 that accNNSVD-PRP generates solutions with a larger initial error than NNSVD-LRC; this is expected, since accNNSVD-PRP only optimizes the factor H in its second phase. Table 1 reports the relative errors after a few iterations (namely 5, 25, and 125) of the HALS algorithm for NMF (Cichocki et al. 2007). We observe that accNNSVD-PRP provides relative errors similar to those of NNSVD-LRC, especially after enough iterations of HALS. Compared to the other initialization strategies, accNNSVD-PRP and NNSVD-LRC perform better on average, as already reported in Atif et al. (2019).
However, although accNNSVD-PRP does not outperform NNSVD-LRC in terms of relative error, it runs significantly faster since its second phase is much cheaper. Table 2 reports the computational times of the tested NMF initializations for different over-estimated rank values, namely \(k=15, 20, 25, 35, 50\).
We observe that, except for random initializations which are very fast to generate, accNNSVD-PRP runs among the fastest in all cases (always at least second best, and very close to the fastest when not the best). In particular, accNNSVD-PRP runs significantly faster than NNSVD-LRC: on average 4.5 times faster on dense data sets, and 1.7 times faster on sparse data sets.
5 Conclusion
In this paper, we have proposed a new SVD-based initialization for NMF, namely accNNSVD-PRP. It is inspired by NNSVD-LRC, a state-of-the-art algorithm, but adapts its two steps: it reformulates the NNSVD phase to introduce a factor \(\overline{H}\) that keeps track of some discarded terms, which are then exploited in the low-rank correction phase. This allows the proposed algorithm to run significantly faster, while keeping the nice properties of NNSVD-LRC, in particular that the approximation error decreases as the factorization rank increases. We note that this approach could be adapted to switch the roles of H and W, and to include W in the correction phase as well, to further improve the performance of accNNSVD-PRP; this is the object of future work. To conclude, we have shown on various sparse and dense datasets that accNNSVD-PRP runs significantly faster than NNSVD-LRC while performing similarly as an NMF initialization scheme in terms of relative error. Moreover, it outperforms other state-of-the-art methods such as NNDSVD and SVD-NMF.
Availability of data and materials
Data are publicly available at the links in the text.
Notes
Roughly half of them, except for the first rank-one factor, by the Perron-Frobenius theorem (Berman and Plemmons 1994).
A-HALS is an accelerated variant of the HALS algorithm (Cichocki et al. 2007) that uses a block coordinate descent method to solve the nonnegative least squares subproblems in order to update the factor matrices W and H.
References
Araújo MCU, Saldanha TCB, Galvao RKH, Yoneyama T, Chame HC, Visani V (2001) The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometr Intell Lab Syst 57(2):65–73
Atif SM, Qazi S, Gillis N (2019) Improved SVD-based initialization for nonnegative matrix factorization using low-rank correction. Pattern Recognit Lett 122:53–59
Berman A, Plemmons RJ (1994) Nonnegative matrices in the mathematical sciences. SIAM
Boutsidis C, Gallopoulos E (2008) SVD based initialization: a head start for nonnegative matrix factorization. Pattern Recognit 41(4):1350–1362
Bro R, Acar E, Kolda TG (2008) Resolving the sign ambiguity in the singular value decomposition. J Chemometr 22(2):135–140
Cichocki A, Zdunek R, Phan AH, Amari SI (2009) Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Wiley, New York
Cichocki A, Zdunek R, Amari SI (2007) Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In: International conference on independent component analysis and signal separation. Springer, Berlin, pp 169–176
Du R, Drake B, Park H (2019) Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization. J Glob Optim 74(4):861–877
Ensari T (2016) Character recognition analysis with nonnegative matrix factorization. Int J Comput 1
Esposito F (2021) A review on initialization methods for nonnegative matrix factorization: towards omics data experiments. Mathematics 9(9):1006
Geng X, Ji L, Sun K (2016) Non-negative matrix factorization based unmixing for principal component transformed hyperspectral data. Front Inf Technol Electron Eng 403–412
Gillis N (2020) Nonnegative matrix factorization. SIAM, Philadelphia
Gillis N, Glineur F (2012) Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Comput 24(4):1085–1105
Gillis N, Glineur F (2008) Nonnegative factorization and the maximum edge biclique problem. arXiv:0810.4225
Janecek A, Tan Y (2011) Using population based algorithms for initializing nonnegative matrix factorization. In: International conference in swarm intelligence. Springer, Berlin, pp 307–316
Langville AN, Meyer CD, Albright R, Cox J, Duling D (2006) Initializations for the nonnegative matrix factorization. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, Citeseer, pp 23–26
Liu Z, Tan VY (2017) Rank-one NMF-based initialization for NMF and relative error bounds under a geometric assumption. IEEE Trans Signal Process 65(18):4717–4731
Liu Z, Tan V (2018) Rank-one NMF-based initialization for NMF and relative error bounds under a geometric assumption. In: Information Theory and Applications Workshop (ITA), pp 1–15
Luce R, Hildebrandt P, Kuhlmann U, Liesen J (2016) Using separable nonnegative matrix factorization techniques for the analysis of time-resolved Raman spectra. Appl Spectrosc 70(9):1464–1475
Maruyama R, Maeda K, Moroda H, Kato I, Inoue M, Miyakawa H, Aonishi T (2014) Detecting cells using non-negative matrix factorization on calcium imaging data. Neural Netw 55:11–19
Nascimento J, Dias J (2005) Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans Geosci Remote Sens 898–910
Nesterov Y (1983) A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). In: Sov. Math. Dokl, vol 27, pp 372–376
Nesterov Y et al (2018) Lectures on convex optimization, vol 137. Springer, Berlin
O'Donoghue B, Candès E (2015) Adaptive restart for accelerated gradient schemes. Found Comput Math 15(3):715–732
Prajapati SJ, Jadhav KR (2015) Brain tumor detection by various image segmentation techniques with introduction to non negative matrix factorization. Brain 4(3):600–3
Qiao H (2015) New SVD based initialization strategy for non-negative matrix factorization. Pattern Recognit Lett 63:71–77
Rezaei M, Boostani R, Rezaei M (2011) An efficient initialization method for nonnegative matrix factorization. J Appl Sci 11(2):354–359
Sauwen N, Acou M, Bharath HN, Sima DM, Veraart J, Maes F, Himmelreich U, Achten E, Van Huffel S (2017) The successive projection algorithm as an initialization method for brain tumor segmentation using non-negative matrix factorization. Plos One 12(8):e0180268
Shiga M, Tatsumi K, Muto S, Tsuda K, Yamamoto Y, Mori T, Tanji T (2016) Sparse modeling of EELS and EDX spectral imaging data by nonnegative matrix factorization. Ultramicroscopy 170:43–59
Wang X, Xie X, Lu L (2012) An effective initialization for orthogonal nonnegative matrix factorization. J Comput Math 34–46
Wild S, Curry J, Dougherty A (2004) Improving non-negative matrix factorizations through structured initialization. Pattern Recognit 37(11):2217–2232
Yoshii K, Itoyama K, Goto M (2016) Student’s t nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 51–55
Zdunek R (2012) Initialization of nonnegative matrix factorization with vertices of convex polytope. In: International conference on artificial intelligence and soft computing, pp 448–455
Zhong S, Ghosh J (2005) Generative model-based document clustering: a comparative study. Knowl Inf Syst 8(3):374–384
Acknowledgements
The author F.E. is a member of the Gruppo Nazionale Calcolo Scientifico - Istituto Nazionale di Alta Matematica (GNCS-INdAM) and she would like to thank Prof. Nicoletta Del Buono from the University of Bari Aldo Moro for scientific discussion during this project.
Funding
Open access funding provided by Università degli Studi di Bari Aldo Moro within the CRUI-CARE Agreement. The author F.E. is supported by ERC Seeds Uniba project “Biomes Data Integration with Low-Rank Models” (CUP H93C23000720001), Piano Nazionale di Ripresa e Resilienza (PNRR), Missione 4 “Istruzione e Ricerca”-Componente C2 Investimento 1.1, “Fondo per il Programma Nazionale di Ricerca e Progetti di Rilevante Interesse Nazionale”, Progetto PRIN-2022 PNRR, P2022BLN38, Computational approaches for the integration of multi-omics data. CUP: H53D23008870001.
Author information
Authors and Affiliations
Contributions
All authors contributed equally to the work.
Corresponding author
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors consent to the publication of the work.
Code availability
The code of accNNSVD-PRP is available on GitHub at https://github.com/5y3datif/accNNSVD-PRP.
About this article
Cite this article
Esposito, F., Atif, S.M. & Gillis, N. Accelerated SVD-based initialization for nonnegative matrix factorization. Comp. Appl. Math. 43, 392 (2024). https://doi.org/10.1007/s40314-024-02905-1
Keywords
- Nonnegative matrix factorization
- Singular value decomposition
- Nesterov accelerated gradient
- Initialization