1 Introduction

Low rank matrix approximation (LRMA), which aims to recover the underlying low rank matrix from its degraded observation, has a wide range of applications in computer vision and machine learning. For instance, human facial images can be modeled as reflections of a Lambertian object and approximated by a low dimensional linear subspace; this low rank nature leads to a proper reconstruction of a face model from occluded/corrupted face images (De La Torre and Black 2003; Liu et al. 2010; Zheng et al. 2012). In recommendation systems, the LRMA approach achieves outstanding performance on the celebrated Netflix competition; the low-rank insight here is that the customers' choices are mostly affected by only a few common factors (Ruslan and Srebro 2010). In background subtraction, the video clip captured by a surveillance camera is generally taken under a static background scenario with relatively small moving objects in the foreground over a period, naturally resulting in its low-rank property; this has inspired various effective techniques for background modeling and foreground object detection in recent years (Wright et al. 2009; Mu et al. 2011). In image processing, it has also been shown that the matrix formed by nonlocal similar patches in a natural image is of low rank; such prior knowledge benefits image restoration tasks (Wang et al. 2012). Owing to the rapid development of convex and non-convex optimization techniques in past decades, there has been a flurry of studies on LRMA, and many important models and algorithms have been reported (Srebro et al. 2003; Buchanan and Fitzgibbon 2005; Ke and Kanade 2005; Eriksson and Van Den Hengel 2010; Fazel et al. 2001; Wright et al. 2009; Candès and Recht 2009; Cai et al. 2010; Candès et al. 2011; Lin et al. 2011).

Current developments in LRMA fall into two categories: the low rank matrix factorization (LRMF) approaches and the rank minimization approaches. Given a matrix \({{\varvec{Y}}}\in \mathfrak {R}^{m\times {n}}\), LRMF aims to factorize it into two smaller matrices, \({{\varvec{A}}}\in \mathfrak {R}^{m\times k}\) and \({{\varvec{B}}}\in \mathfrak {R}^{n\times k}\), such that their product \({{\varvec{A}}}{{\varvec{B}}}^T\) can reconstruct \({{\varvec{Y}}}\) under certain fidelity loss functions. Here \(k<min(m,n)\) ensures the low-rank property of the reconstructed matrix \({{\varvec{A}}}{{\varvec{B}}}^T\). A variety of LRMF methods have been proposed, including the classical singular value decomposition (SVD) under \(\ell _2\)-norm loss, robust LRMF methods under \(\ell _1\)-norm loss, and many probabilistic methods (Srebro et al. 2003; Buchanan and Fitzgibbon 2005; Ke and Kanade 2005; Mnih and Salakhutdinov 2007; Kwak 2008; Eriksson and Van Den Hengel 2010).

Rank minimization methods represent another main branch along this line of research. These methods reconstruct the data matrix through imposing an additional rank constraint upon the estimated matrix. Since direct rank minimization is NP-hard, the problem is generally relaxed by substitutively minimizing the nuclear norm of the estimated matrix, which is a convex relaxation of minimizing the matrix rank (Fazel 2002). This methodology is called nuclear norm minimization (NNM). The nuclear norm of a matrix \({{\varvec{X}}}\), denoted by \(\Vert {{\varvec{X}}}\Vert _*\), is defined as the sum of its singular values, i.e., \(\Vert {{\varvec{X}}}\Vert _*=\sum _i\sigma _i({{\varvec{X}}})\), where \(\sigma _i({{\varvec{X}}})\) denotes the i-th singular value of \({{\varvec{X}}}\). The NNM approach has been attracting significant attention due to its rapid development in both theory and implementation. On one hand, (Candès et al. 2011) proved that the intrinsic low-rank matrix can be exactly recovered from its corrupted observation with high probability by solving an NNM problem. On the other hand, (Cai et al. 2010) proved that the nuclear norm proximal (NNP) problem

$$\begin{aligned} \hat{{{\varvec{X}}}}=\arg \min \nolimits _{{{\varvec{X}}}}\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\lambda \Vert {{\varvec{X}}}\Vert _* \end{aligned}$$
(1)

can be easily solved in closed-form by imposing a soft-thresholding operation on the singular values of the observation matrix:

$$\begin{aligned} \hat{{{\varvec{X}}}}={{\varvec{U}}}{\mathcal {S}}_\frac{\lambda }{2}(\varvec{\varSigma }){{\varvec{V}}}^T, \end{aligned}$$
(2)

where \({{\varvec{Y}}}={{\varvec{U}}}\varvec{\varSigma }{{\varvec{V}}}^{T}\) is the SVD of \({{\varvec{Y}}}\) and \({\mathcal {S}}_\frac{\lambda }{2}(\varvec{\varSigma })\) is the soft-thresholding function on diagonal matrix \(\varvec{\varSigma }\) with parameter \(\frac{\lambda }{2}\). For each diagonal element \(\varvec{\varSigma }_{ii}\) in \(\varvec{\varSigma }\), there is

$$\begin{aligned} {\mathcal {S}}_\frac{\lambda }{2}(\varvec{\varSigma })_{ii}=\max \left( \varvec{\varSigma }_{ii}-\frac{\lambda }{2},0\right) . \end{aligned}$$
(3)
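To make this concrete, a minimal numpy sketch of the NNP solver in (2)-(3) could look as follows (the function and variable names are ours, not from the cited works):

```python
import numpy as np

def nuclear_norm_proximal(Y, lam):
    """Solve min_X ||Y - X||_F^2 + lam * ||X||_*  (Eq. (1)).

    The closed-form solution soft-thresholds the singular values of Y
    with threshold lam / 2 (Cai et al. 2010)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam / 2.0, 0.0)   # Eq. (3)
    return U @ np.diag(s_shrunk) @ Vt           # Eq. (2)
```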

By utilizing NNP as the key proximal technique (Moreau 1965), many NNM-based models have been proposed in recent years (Lin et al. 2009; Ji and Ye 2009; Cai et al. 2010; Lin et al. 2011).

Despite the aforementioned successes, NNM still has certain limitations. In traditional NNM, all singular values are treated equally and shrunk with the same threshold \(\frac{\lambda }{2}\) as defined in (3). This, however, ignores the prior knowledge we often have on the singular values of a practical data matrix. More specifically, the larger singular values of an input data matrix quantify the information of its underlying principal directions. For example, the large singular values of a matrix of similar image patches deliver the major edge and texture information. This implies that to recover an image from its corrupted version, we should shrink the larger singular values less and the smaller ones more. Clearly, the traditional NNM model, as well as its corresponding soft-thresholding solver, is not flexible enough to handle such issues.

To improve the flexibility of NNM, we propose the weighted nuclear norm and study its minimization strategy in this work. The weighted nuclear norm of a matrix \({{\varvec{X}}}\) is defined as

$$\begin{aligned} \Vert {{\varvec{X}}}\Vert _{{{\varvec{w}}},*}=\textstyle \sum _iw_i\sigma _i({{\varvec{X}}}), \end{aligned}$$
(4)

where \({{\varvec{w}}}=[w_1,\ldots ,w_n]^T\) and \(w_i\ge 0\) is a non-negative weight assigned to \(\sigma _i({{\varvec{X}}})\). The weight vector enhances the representation capability of the original nuclear norm. Rational weights, specified based on prior knowledge and understanding of the problem, will help the corresponding weighted nuclear norm minimization (WNNM) model achieve a better estimation of the latent data from the corrupted input. The difficulty of solving a WNNM model, however, lies in that it is non-convex in general cases, and the sub-gradient method (Cai et al. 2010) used for achieving the closed-form solution of an NNP problem is no longer applicable. In this paper, we investigate in detail how to properly and efficiently solve such a non-convex WNNM problem.

Just as the NNP operator does for the NNM problem, the following weighted nuclear norm proximal (WNNP)Footnote 1 operator determines the general solving regime of the WNNM problem:

$$\begin{aligned} \hat{{{\varvec{X}}}}=\arg \min \nolimits _{{{\varvec{X}}}}\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _F^2+\Vert {{\varvec{X}}}\Vert _{{{\varvec{w}}},*} \end{aligned}$$
(5)

We prove that the WNNP problem can be equivalently transformed into a quadratic programming (QP) problem with linear constraints. This allows us to easily reach the global optimum of the original problem by using off-the-shelf convex optimization solvers. We further show that when the weights are non-descending, the global optimum of WNNP can be easily achieved in closed-form, i.e., by a so-called weighted soft-thresholding operator. Such an efficient solver makes it possible to utilize the weighted nuclear norm in more complex applications. In particular, we propose the WNNM-based robust principal component analysis (WNNM-RPCA) model and the WNNM-based matrix completion (WNNM-MC) model, and solve these models by virtue of the WNNP solver. Furthermore, inspired by previous developments of reweighted sparse coding, we present a rational scheme to automatically set the weights for the given data.

To validate the effectiveness of the proposed WNNM models, we test them on several typical low level vision tasks. Specifically, we first test the performance of the proposed WNNP model on image denoising. By utilizing the nonlocal self-similarity prior of images (Buades et al. 2008), the WNNP model achieves superior performance to state-of-the-art denoising algorithms. We then apply the WNNM-RPCA model to the background subtraction application. Both the quantitative results and visual examples demonstrate the superiority of the proposed model over previous low-rank learning methods. We further apply WNNM-MC to the image inpainting task, and it also shows competitive results with state-of-the-art methods.

The contribution of this paper is three-fold. First, we prove that the non-convex WNNP problem can be equivalently transformed into a QP problem with linear constraints. As a result, the global optimum of the non-convex WNNP problem can be readily achieved by off-the-shelf convex optimization solvers. Furthermore, for non-descendingly ordered weights, we can get the optimal solution in closed-form by the weighted soft-thresholding operator, which greatly eases the computation for the problem. Second, we extend WNNP to the WNNM-RPCA and WNNM-MC models to handle sparse noise/outliers and missing data, respectively. Thanks to the proposed WNNP solver, both WNNM-RPCA and WNNM-MC can be solved without introducing extra computational burden compared with traditional RPCA/MC algorithms. Third, we adapt the proposed WNNM models to different computer vision tasks. In image denoising, the WNNP model outperforms state-of-the-art methods in both PSNR index and visual perception. In background subtraction and image inpainting, the proposed WNNM-RPCA and WNNM-MC models also achieve better performance than traditional algorithms, especially the low-rank learning methods designed for these tasks.

The rest of this paper is organized as follows. Section 2 briefly introduces some related work. Section 3 analyzes the optimization of the WNNM models, including the basic WNNP operator and more complex WNNM-RPCA and WNNM-MC ones. Section 4 applies the WNNM model to image denoising and compares the proposed algorithm with state-of-the-art denoising algorithms. Sections 5 and 6 apply the WNNM-RPCA and WNNM-MC to background subtraction and image inpainting, respectively. Section 7 discusses the proposed WNNM models and other singular value regularization methods from the viewpoint of weight setting. Section 8 concludes the paper.

2 Related Work

The main goal of LRMA is to recover the underlying low rank matrix from its degraded/corrupted observation. As the data from many practical problems possess intrinsic low-rank structure, the research on LRMA has achieved great success in various applications of computer vision. Here we provide a brief review of the current advances on this topic. A more comprehensive review can be found in a recent survey paper (Zhou et al. 2014).

The LRMA approach mainly includes two categories: the LRMF methods and the rank minimization methods. Given an input data matrix \({{\varvec{Y}}}\in \mathfrak {R}^{m\times {n}}\), LRMF intends to find the output \({{\varvec{X}}}\) as a product of two smaller matrices, \({{\varvec{X}}} = {{\varvec{A}}}{{\varvec{B}}}^T\), where \({{\varvec{A}}}\in \mathfrak {R}^{m\times k}\), \({{\varvec{B}}}\in \mathfrak {R}^{n\times k}, k<m, n\). The most classical LRMF method is the well known SVD technique, which attains the optimal rank-k approximation of \({{\varvec{Y}}}\) under F-norm fidelity loss by truncating its singular value matrix. To handle outliers/heavy noise mixed in data, multiple advanced robust LRMF methods have been proposed. (De La Torre and Black 2003) initiated a robust principal component analysis (RPCA) framework by utilizing the Geman-McClure function instead of the conventional \(\ell _2\)-norm loss function. Beyond this non-convex loss function, a more commonly used one for robust LRMF is the \(\ell _1\)-norm. In the seminal work of (Ke and Kanade 2005), Ke and Kanade designed an alternative convex programming algorithm to solve this \(\ell _1\)-norm factorization model. More strategies have also been attempted for solving this model, e.g., PCAL1 (Kwak 2008) and L\(_1\)-Wiberg (Eriksson and Van Den Hengel 2010). Furthermore, to deal with the LRMF problem in the presence of missing entries, a series of matrix completion techniques have been proposed to estimate the low-rank matrix from only partial observations. The related investigations range from studies on optimization algorithms (Jain et al. 2013) to new regularization strategies for highly ill-posed matrix completion tasks (Srebro et al. 2004). Besides these deterministic LRMF methods, there have been several probabilistic extensions of matrix factorization. These probabilistic methods are capable of adapting to different kinds of noise, such as dense Gaussian noise (Tipping and Bishop 1999; Mnih and Salakhutdinov 2007), sparse noise (Wang and Yeung 2013; Babacan et al. 2012; Ding et al. 2011) and more complex mixture-of-Gaussian noise (Meng and Torre 2013).

The main idea of the rank minimization methods is to obtain the low-rank reconstruction by minimizing the rank of the estimated matrix or a relaxation of it. Since directly minimizing the matrix rank corresponds to an NP-hard problem (Fazel 2002), relaxation methods are more commonly utilized. The most popular choice is to replace the rank function with its tightest convex relaxation, the nuclear norm, leading to NNM-based methods. The superiority of these methods is two-fold. Firstly, the exact recovery property of some low rank recovery models has been theoretically proved (Candès and Recht 2009; Candès et al. 2011). Secondly, in implementation, the typical NNP problem along this line can be easily solved in closed-form by soft-thresholding the singular values of the input matrix (Cai et al. 2010). The recent advancement of NNM in applications such as RPCA and MC (Lin et al. 2009; Ji and Ye 2009; Cai et al. 2010; Lin et al. 2011) has been successfully utilized in many computer vision tasks (Ji et al. 2010; Liu et al. 2010; Zhang et al. 2012b; Liu et al. 2012; Peng et al. 2012).

As a convex surrogate of the rank function, the nuclear norm corresponds to the \(\ell _1\)-norm on the singular values. While convexity simplifies the computation, some previous works have shown that non-convex models may perform better in applications such as compressive sensing (Chartrand 2007; Candès et al. 2008). Recently, researchers have attempted to substitute the nuclear norm with certain non-convex surrogates to make it more adaptable to real scenarios. For instance, (Nie et al. 2012) and (Mohan and Fazel 2012) used the Schatten-p norm, which is defined as the \(\ell _p\) norm of the singular values \((\sum _{i}\sigma _i^p)^{1/p}\), to regularize the rank of the reconstructed matrix. When \(p=0\), the Schatten-0 norm degenerates to the matrix rank, and when \(p=1\), the Schatten-1 norm complies with the standard nuclear norm. (Chartrand 2012) defined a generalized Huber penalty function to regularize the singular values of a matrix. (Zhang et al. 2012a) proposed a truncated nuclear norm regularization method which only regularizes the last k singular values of a matrix. A similar strategy was utilized in (Oh et al. 2013) to deal with different low level vision problems. However, these low-rank regularizers are all designed with a specific category of penalty functions for LRMA, and they are not flexible enough to adapt to various real cases. Recently, (Dong et al. 2014) utilized the nonconvex \(logdet({{\varvec{X}}})\) as a surrogate of the rank function and achieved state-of-the-art performance on compressive sensing. (Lu et al. 2014a, b) studied the generalized nonconvex nonsmooth rank minimization problem and validated that if the penalty function is monotonic and lower bounded, a generalized singular value soft-thresholding operation is able to attain the optimal solution. Their optimization framework can be applied to a wide range of regularization penalty functions.

The WNNM problem has been formally proposed and analyzed in (Gu et al. 2014), where Gu et al. presented some preliminary theoretical results on this problem and evaluated its effectiveness in image denoising. In this paper, we thoroughly polish the theory and provide the general solving scheme for this problem. The automatic weight setting scheme is also rationally designed by referring to the reweighting technique in sparse coding (Candès et al. 2008). We also substantiate our theoretical analysis by conducting extensive experiments in image denoising, background subtraction and image inpainting.

3 Weighted Nuclear Norm Minimization for Low Rank Modeling

In this section, we first introduce the general solving scheme for the WNNP problem (5), and then introduce its WNNM-RPCA and WNNM-MC extensions. Note that these WNNM models are more difficult to optimize than conventional NNM ones due to the non-convexity of the involved weighted nuclear norm. Furthermore, the sub-gradient method proposed in (Cai et al. 2010) to solve NNP is not applicable to WNNP. We thus construct a new solving regime for this problem. Obviously, NNM is a special case of WNNM when all the weights \(w_i\) are set the same. Our solution thus covers that of the traditional NNP.

3.1 Weighted Nuclear Norm Proximal for WNNM

In (Gu et al. 2014), a lemma is presented to analyze the WNNP problem. This lemma is actually a special case of the following Lemma 1, i.e., the classical von Neumann trace inequality (Mirsky 1975; Rhea 2011):

Lemma 1

(von Neumann trace inequality (Mirsky 1975; Rhea 2011)) For any \(m\times n\) matrices \(\mathbf A \) and \(\mathbf B \), \(tr\left( \mathbf A ^T\mathbf B \right) \) \(\le \sum _i\sigma _i(\mathbf A )\sigma _i(\mathbf B )\), where \(\sigma _1(\mathbf A )\ge \sigma _2(\mathbf A )\ge \cdots \ge 0\) and \(\sigma _1(\mathbf B )\ge \sigma _2(\mathbf B )\ge \cdots \ge 0\) are the descending singular values of \(\mathbf A \) and \(\mathbf B \), respectively. The case of equality occurs if and only if it is possible to find unitaries \(\mathbf U \) and \(\mathbf V \) that simultaneously singular value decompose \(\mathbf A \) and \(\mathbf B \) in the sense that

$$\begin{aligned} \mathbf A = \mathbf U \varvec{\varSigma }_A\mathbf V ^{T} \quad \text {and} \quad \mathbf B = \mathbf U \varvec{\varSigma }_B\mathbf V ^{T}, \end{aligned}$$

where \(\varvec{\varSigma }_A\) and \(\varvec{\varSigma }_B\) denote the ordered singular value matrices, with the singular values \(\sigma (\mathbf A )\) and \(\sigma (\mathbf B )\) along their diagonals in the same descending order, respectively.

Based on the result of Lemma 1, we can deduce the following important theorem.

Theorem 1

Given \(\mathbf Y \in \mathfrak {R}^{m\times {n}}\), without loss of generality, we assume that \(m\ge n\), and let \(\mathbf Y =\mathbf U \varvec{\varSigma }{} \mathbf V ^{T}\) be the SVD of \(\mathbf Y \), where \(\varvec{\varSigma }=\left( \begin{array}{cc} diag(\sigma _1,\sigma _2,...,\sigma _n)\\ \mathbf 0 \end{array} \right) \in \mathfrak {R}^{m\times {n}}\). The global optimum of the WNNP problem in (5) can be expressed as \(\hat{\mathbf{X }}=\mathbf U \hat{\mathbf{D }}\mathbf V ^T\), where \(\hat{\mathbf{D }}=\left( \begin{array}{cc} diag(d_1,d_2,...,d_n)\\ \mathbf 0 \end{array} \right) \) is a diagonal non-negative matrix and \((d_1,d_2,...,d_n)\) is the solution to the following convex optimization problem:

$$\begin{aligned} \begin{aligned} \min _{d_1,d_2,...,d_n}\sum \nolimits _{i=1}^n\left[ (\sigma _i-d_i)^2+w_id_i\right] ,\\ \qquad s.t.\quad d_1\ge d_2\ge ...\ge d_n\ge 0. \end{aligned} \end{aligned}$$
(6)

The proof of Theorem 1 can be found in Appendix 9.1. Theorem 1 shows that the WNNP problem can be transformed into a new optimization problem (6). It is interesting that (6) is a quadratic optimization problem with linear constraints, and its global optimum can be easily calculated by off-the-shelf convex optimization solvers. This means that for the non-convex WNNP problem, we can always get its global solution through (6). In the following corollary, we further show that the global solution of (6) can be achieved in closed-form when the weights are sorted in a non-descending order.
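As an illustration of how (6) can be handed to an off-the-shelf solver, the following sketch uses the cvxpy package; the function name is ours, and the choice of cvxpy is only one possible option:

```python
import numpy as np
import cvxpy as cp

def solve_wnnp_qp(sigma, w):
    """Solve problem (6): min sum_i [(sigma_i - d_i)^2 + w_i * d_i]
    subject to d_1 >= d_2 >= ... >= d_n >= 0."""
    n = len(sigma)
    d = cp.Variable(n)
    objective = cp.Minimize(cp.sum_squares(sigma - d) + cp.sum(cp.multiply(w, d)))
    constraints = [d[:-1] >= d[1:], d >= 0]   # ordering and non-negativity
    cp.Problem(objective, constraints).solve()
    return np.asarray(d.value)
```

The WNNP solution is then assembled as \(\hat{\mathbf{X }}=\mathbf U \,diag(d_1,...,d_n)\,\mathbf V ^T\) with \(\mathbf U \) and \(\mathbf V \) taken from the SVD of \(\mathbf Y \), as stated in Theorem 1.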

Corollary 1

If \(\sigma _1\ge \cdots \ge \sigma _n\ge 0\) and the weights satisfy \(0\le w_1\le \cdots \le w_n\), then the global optimum of (6) is \(\hat{d}_i=max\left( \sigma _i-\frac{w_i}{2},0\right) \).

The proof of Corollary 1 is given in Appendix 9.2. The conclusion in Corollary 1 is very useful. The singular values of a matrix are sorted in a non-ascending order, and the larger singular values usually correspond to the subspaces of more important components of the data matrix. We thus always expect to shrink the larger singular values less to keep the major and faithful information of the underlying data. In this sense, Corollary 1 guarantees that we have a closed-form optimal solution for the WNNP problem by the weighted singular value soft-thresholding operation:

$$\begin{aligned} \hat{{{\varvec{X}}}}={{\varvec{U}}}{\mathcal {S}}_{\frac{{{\varvec{w}}}}{2}}(\varvec{\varSigma }){{\varvec{V}}}^T, \end{aligned}$$

where \({{\varvec{Y}}}={{\varvec{U}}}\varvec{\varSigma }{{\varvec{V}}}^T\) is the SVD of \({{\varvec{Y}}}\), and \({\mathcal {S}}_{\frac{{{\varvec{w}}}}{2}}(\varvec{\varSigma })\) is the generalized soft-thresholding operator with weight vector \({{\varvec{w}}}\)

$$\begin{aligned} {\mathcal {S}}_\frac{{{\varvec{w}}}}{2}(\varvec{\varSigma })_{ii}=\max \left( \varvec{\varSigma }_{ii}-\frac{w_i}{2},0\right) . \end{aligned}$$

Note that when all the weights \(w_i\) are set the same, the above WNNP solver exactly degenerates to the NNP solver for the traditional NNM problem.
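A corresponding numpy sketch of this weighted soft-thresholding solver (function name ours) is given below; it assumes the weight vector is non-descending, as required by Corollary 1:

```python
import numpy as np

def weighted_nuclear_norm_proximal(Y, w):
    """WNNP solver for non-descending weights w (Corollary 1):
    weighted soft-thresholding of the singular values of Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # s is non-ascending; w is assumed non-descending
    d = np.maximum(s - np.asarray(w) / 2.0, 0.0)
    return U @ np.diag(d) @ Vt
```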

A recent work by (Lu et al. 2014b) proved a similar conclusion to our Corollary 1. Since Lu et al. analyzed the generalized singular value regularization model with different penalty functions on the singular values, the condition in their paper is the monotonicity of the proximal operator, which is determined by the penalty function. In contrast, our work attains the WNNP solver in general weighting cases rather than only in the case of non-descendingly ordered weights. Interested readers may refer to the proofs in our paper and in (Lu et al. 2014b) for details.

3.2 WNNM for Robust PCA

[Algorithm 1: ADMM algorithm for solving the WNNM-RPCA model]

In the last section, we analyzed the optimal solution of the WNNP operator for the WNNM problem. Based on our definition of WNNP in (5), its solution \(\hat{{{\varvec{X}}}}\) is the low rank approximation to the observation matrix \({{\varvec{Y}}}\) under the F-norm data fidelity term. However, in real applications, the observation data may be corrupted by outliers or sparse noise with large magnitude. In such cases, the large magnitude noise, even in small amounts, tends to greatly affect the F-norm data fidelity and lead to a biased low rank estimation. The recently proposed NNM-based RPCA (NNM-RPCA) method (Candès et al. 2011) alleviates this problem by optimizing the following problem:

$$\begin{aligned} \min \nolimits _{{{\varvec{E}}},{{\varvec{X}}}}\Vert {{\varvec{E}}}\Vert _1+\Vert {{\varvec{X}}}\Vert _*, \quad s.t. \quad {{\varvec{Y}}} = {{\varvec{X}}}+{{\varvec{E}}}. \end{aligned}$$
(7)

Using the \(\ell _1\)-norm to model the error \({{\varvec{E}}}\), model (7) guarantees a more robust matrix approximation in the presence of outliers/sparse noise. In particular, (Candès et al. 2011) proved that if the low rank matrix \({{\varvec{X}}}\) and the sparse component \({{\varvec{E}}}\) satisfy certain conditions, the NNM-RPCA model (7) can exactly recover \({{\varvec{X}}}\) with a probability close to 1.

In this section, we propose to use the weighted nuclear norm to reformulate (7), leading to the following WNNM-based RPCA (WNNM-RPCA) model:

$$\begin{aligned} \min _{{{\varvec{E}}},{{\varvec{X}}}}\Vert {{\varvec{E}}}\Vert _1+\Vert {{\varvec{X}}}\Vert _{w,*}, \quad s.t. \quad {{\varvec{Y}}} = {{\varvec{X}}}+{{\varvec{E}}}. \end{aligned}$$
(8)

As in NNM-RPCA, we also employ the alternating direction method of multipliers (ADMM) to solve the WNNM-RPCA problem. Its augmented Lagrange function is

$$\begin{aligned}&\varGamma ({{\varvec{X}}},{{\varvec{E}}},{{\varvec{L}}},\mu ) \nonumber \\&= \Vert {{\varvec{E}}}\Vert _1+\Vert {{\varvec{X}}}\Vert _{w,*}+\langle {{\varvec{L}}},{{\varvec{Y}}}-{{\varvec{X}}}-{{\varvec{E}}}\rangle +\frac{\mu }{2}\Vert {{\varvec{Y}}}{-}{{\varvec{X}}}{-}{{\varvec{E}}}\Vert _F^2,\nonumber \\ \end{aligned}$$
(9)

where \({{\varvec{L}}}\) is the Lagrange multiplier and \(\mu \) is a positive scalar. The optimization procedure is described in Algorithm 1. Note that the convergence of this ADMM algorithm is more difficult to analyze than that of the convex NNM-RPCA model due to the non-convexity of the WNNM-RPCA model. We give the following weak convergence result to facilitate the construction of a rational termination condition for Algorithm 1.
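For illustration, the following numpy sketch follows the standard ADMM updates derived from (9); the initialization, parameter values and stopping rule are simplified placeholders rather than the exact settings of Algorithm 1, and the weight vector is assumed fixed and non-descending so that the X-subproblem reduces to weighted singular value soft-thresholding:

```python
import numpy as np

def soft(x, tau):
    """Element-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def wnnm_rpca(Y, w, mu=1e-2, rho=1.05, n_iter=500, tol=1e-7):
    """ADMM sketch for (8)-(9). w: 1-D non-descending weight vector."""
    X, E, L = np.zeros_like(Y), np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(n_iter):
        # X-step: weighted singular value thresholding of Y - E + L/mu
        U, s, Vt = np.linalg.svd(Y - E + L / mu, full_matrices=False)
        X = U @ np.diag(np.maximum(s - w / mu, 0.0)) @ Vt
        # E-step: soft-thresholding with threshold 1/mu
        E = soft(Y - X + L / mu, 1.0 / mu)
        # multiplier and penalty updates
        L = L + mu * (Y - X - E)
        mu *= rho
        if np.linalg.norm(Y - X - E) <= tol * np.linalg.norm(Y):
            break
    return X, E
```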

Theorem 2

If the weights are sorted in a non-descending order, the sequences \(\{\mathbf{E }_{k}\}\) and \(\{\mathbf{X }_{k}\}\) generated by Algorithm 1 satisfy:

$$\begin{aligned} (1)&\lim _{k\rightarrow \infty }\Vert \mathbf X _{k+1}-\mathbf X _k\Vert _F=0,\\ (2)&\lim _{k\rightarrow \infty }\Vert \mathbf E _{k+1}-\mathbf E _k\Vert _F=0.\\ (3)&\lim _{k\rightarrow \infty }\Vert \mathbf Y -\mathbf E _{k+1}-\mathbf X _{k+1}\Vert _F=0. \end{aligned}$$

The proof of Theorem 2 can be found in Appendix 9.3. Please note that the proof of Theorem 2 relies on the unboundedness of the parameter \(\mu _k\). In most previous ADMM-based methods, an upper bound on \(\mu _k\) is needed to ensure the optimal solution for convex objective functions (Lin et al. 2015). However, since our WNNM-RPCA model is non-convex for general weight conditions, we use an unbounded \(\mu _k\) to guarantee the convergence of Algorithm 1. If \(\mu _k\) increases too fast, the iteration may stop quickly and we might not get a good solution. Thus in both Algorithm 1 and the following Algorithm 2, a small \(\rho \) is adopted to prevent \(\mu _k\) from increasing too fast. Please refer to the experiments in Sects. 5 and 6 for more details.

3.3 WNNM for Matrix Completion

In Sect. 3.2, we introduced the WNNM-RPCA model and provided the ADMM algorithm to solve it. In this section, we further use WNNM to deal with the matrix completion problem, and propose the following WNNM-based matrix completion (WNNM-MC) model:

$$\begin{aligned} \min \nolimits _{{{\varvec{X}}}}\Vert {{\varvec{X}}}\Vert _{{{\varvec{w}}},*}\quad s.t. \quad {\mathcal {P}}_\varOmega ({{\varvec{X}}})={\mathcal {P}}_\varOmega ({{\varvec{Y}}}), \end{aligned}$$
(10)

where \(\varOmega \) is a binary support indicator matrix of the same size as \({{\varvec{Y}}}\), and zeros in \(\varOmega \) indicate the missing entries in the observation matrix. \({\mathcal {P}}_\varOmega ({{\varvec{Y}}})=\varOmega \odot {{\varvec{Y}}}\) is the element-wise matrix multiplication (Hadamard product) between the support matrix \(\varOmega \) and the variable \({{\varvec{Y}}}\). The constraint implies that the estimated matrix \({{\varvec{X}}}\) agrees with \({{\varvec{Y}}}\) in the observed entries.

By introducing a variable \({{\varvec{E}}}\), we reformulate (10) as

$$\begin{aligned} \min \nolimits _{{{\varvec{X}}}}\Vert {{\varvec{X}}}\Vert _{w,*},~~~ s.t. \quad {{\varvec{X}}}+{{\varvec{E}}} = {{\varvec{Y}}}, \quad {\mathcal {P}}_\varOmega ({{\varvec{E}}})=0. \end{aligned}$$
(11)

The ADMM algorithm for solving WNNM-MC can then be constructed as Algorithm 2. For non-descending weights, both the subproblems in steps 3 and 4 of Algorithm 2 have closed-form optimal solutions. However, as the weighted nuclear norm is not convex, it is difficult to accurately analyze the convergence of the algorithm. As in Theorem 2, we present a weak convergence result in the following Theorem 3.

[Algorithm 2: ADMM algorithm for solving the WNNM-MC model]
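For illustration, a corresponding numpy sketch of the ADMM updates derived from (11) is given below; as in the RPCA sketch, the parameter values and stopping rule are simplified placeholders, and the weights are assumed fixed and non-descending:

```python
import numpy as np

def wnnm_mc(Y, Omega, w, mu=1e-2, rho=1.05, n_iter=500):
    """ADMM sketch for (11). Omega: boolean mask of observed entries;
    w: 1-D non-descending weight vector."""
    X, E, L = np.zeros_like(Y), np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(n_iter):
        # X-step: weighted singular value thresholding of Y - E + L/mu
        U, s, Vt = np.linalg.svd(Y - E + L / mu, full_matrices=False)
        X = U @ np.diag(np.maximum(s - w / mu, 0.0)) @ Vt
        # E-step: closed form under the constraint P_Omega(E) = 0,
        # i.e. E is only free on the missing entries
        E = np.where(Omega, 0.0, Y - X + L / mu)
        L = L + mu * (Y - X - E)
        mu *= rho
    return X
```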

Theorem 3

If the weights are sorted in a non-descending order, the sequence \(\{\mathbf{X }_{k}\}\) generated by Algorithm 2 satisfies

$$\begin{aligned} (1)&\lim _{k\rightarrow \infty }\Vert \mathbf X _{k+1}-\mathbf X _k\Vert _F=0,\\ (2)&\lim _{k\rightarrow \infty }\Vert \mathbf Y -\mathbf E _{k+1}-\mathbf X _{k+1}\Vert _F=0. \end{aligned}$$

The proof of Theorem 3 is similar to the proof of Theorem 2, and thus we omit it here.

3.4 The Setting of Weighting Vector

In previous sections, we proposed to utilize the WNNM model to solve different problems. By introducing the weight vector, the WNNM model improves the flexibility of the original NNM model. However, the weight vector itself also brings more parameters into the model. An appropriate setting of the weights plays a crucial role in the success of the proposed WNNM model.

In (Candès et al. 2008), an effective reweighted mechanism is proposed to enhance the sparsity of sparse coding solutions by adaptively tuning weights through the following formula:

$$\begin{aligned} w_i^{\ell +1}=\frac{C}{|x_i^\ell |+\varepsilon }, \end{aligned}$$
(12)

where \(x^\ell _i\) is the i-th sparse coding coefficient in the \(\ell \)-th iteration and \( w_i^{\ell +1}\) is its corresponding regularization parameter in the \((\ell +1)\)-th iteration, \(\varepsilon \) is a small positive number to avoid dividing by zero and C is a compromising constant. Such a reweighting procedure has been proved capable of providing a good approximation of the \(\ell _0\) norm, and the model achieves superior performance in compressive sensing.

Inspired by the success of reweighted sparse coding, we can adopt a similar reweighted strategy in WNNM by replacing \(x^\ell _i\) in (12) with the singular value \(\sigma _{i}({{\varvec{X}}}_{\ell })\) in the \(\ell ^{th}\) iteration. Because (12) is monotonically decreasing with the singular values, the non-descending order of weights with respect to singular values will be kept throughout the reweighting process. Very interestingly, the following remark indicates that we can directly get the closed-form solution of WNNP operator with such a reweighting strategy.

Remark 1

Let \({{\varvec{Y}}}={{\varvec{U}}}\varvec{\varSigma }{{\varvec{V}}}^{T}\) be the SVD of \({{\varvec{Y}}}\), where \(\varvec{\varSigma }=\left( \begin{array}{cc} diag(\sigma _1({{\varvec{Y}}}),\sigma _2({{\varvec{Y}}}),...,\sigma _n({{\varvec{Y}}}))\\ \mathbf 0 , \end{array} \right) \), and \(\sigma _i({{\varvec{Y}}})\) denotes the i-th singular value of \({{\varvec{Y}}}\). If the regularization parameter C is positive and the positive value \(\varepsilon \) is small enough to make the inequality \(\varepsilon <min\left( \sqrt{C},\frac{C}{\sigma _1({{\varvec{Y}}})}\right) \) hold, by using the reweighting formula \( w_i^{\ell }=\frac{C}{\sigma _i(\mathbf {{X}}_{\ell })+\varepsilon }\) with initial estimation \({{\varvec{X}}}_0={{\varvec{Y}}}\), the reweighted WNNP problem \(\{\min _{{{\varvec{X}}}}\Vert {{\varvec{Y}}}-{{\varvec{X}}}\Vert _{F}^2+\Vert {{\varvec{X}}}\Vert _{{{\varvec{w}}},*}\}\) has the closed-form solution: \({{\varvec{X}}}^*={{\varvec{U}}}\widetilde{\varvec{\varSigma }}{{\varvec{V}}}^{T}\), where

$$\begin{aligned} \widetilde{\varvec{\varSigma }}=\left( \begin{array}{cc} diag(\sigma _1({{\varvec{X}}}^*),\sigma _2({{\varvec{X}}}^*),...,\sigma _n({{\varvec{X}}}^*))\\ \mathbf 0 , \end{array} \right) , \end{aligned}$$

and

$$\begin{aligned} \sigma _{i}({{\varvec{X}}}^*)=\left\{ \begin{array}{cl} 0 &{} \text {if }c_{2}<0\\ \frac{c_{1}+\sqrt{c_{2}}}{2} &{} \text {if }c_{2}\ge 0 \end{array}\right. \end{aligned}$$
(13)

where

$$\begin{aligned} c_{1}=\sigma _{i}({{\varvec{Y}}})-\varepsilon ,\ \ c_{2}=\left( \sigma _{i}({{\varvec{Y}}})+\varepsilon \right) ^2-4C. \end{aligned}$$

The proof of Remark 1 can be found in Appendix 9.4. Remark 1 shows that although a reweighting strategy \(w_i^{\ell }=\frac{C}{\sigma _i(\mathbf {{X}}_{\ell })+\varepsilon }\) is used, we do not need to iteratively perform the thresholding and weight calculation operations. Based on the relationship between the singular values of the observation matrix \(\mathbf {{Y}}\) and the regularization parameter C, the converged singular values of the estimated matrix after the reweighting process can be directly obtained. In each iteration of both the WNNM-RPCA and WNNM-MC algorithms, such a reweighting strategy is performed on the WNNP subproblem (step 4 in Algorithms 1 and 2) to adjust the weights based on the current \({{\varvec{X}}}_k\). Thanks to Remark 1, utilizing the reweighting strategy in step 4 of Algorithms 1 and 2 adds little computational burden. We can directly shrink each original singular value \(\sigma _i(\mathbf {{Y}})\) to 0 or \(\frac{c_{1}+\sqrt{c_{2}}}{2}\) in a single operation, just like the soft-thresholding operation in the NNM method.
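A minimal numpy sketch of this one-shot reweighted WNNP solver (Eq. (13); function name ours) is:

```python
import numpy as np

def reweighted_wnnp(Y, C, eps=1e-8):
    """Closed-form solution of the reweighted WNNP problem in Remark 1,
    with weights w_i = C / (sigma_i(X) + eps) and initialization X_0 = Y."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    c1 = s - eps
    c2 = (s + eps) ** 2 - 4.0 * C
    # Eq. (13): shrink each singular value to 0 or (c1 + sqrt(c2)) / 2
    d = np.where(c2 < 0.0, 0.0, (c1 + np.sqrt(np.maximum(c2, 0.0))) / 2.0)
    return U @ np.diag(d) @ Vt
```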

In implementation, we initialize \({{\varvec{X}}}_0\) as the observation \({{\varvec{Y}}}\). The above weight setting strategy greatly facilitates the WNNM calculations. Note that there remains one parameter C in the WNNM implementation. In all of our experiments, we set it by experience for certain tasks. Please see the following sections for details.

4 Image Denoising by WNNM

In this section, we validate the proposed WNNM model in application of image denoising. Image denoising is one of the fundamental problems in low level vision, and is an ideal test bed to investigate and evaluate the statistical image modeling techniques and optimization methods. Image denoising aims to reconstruct the original image \({{\varvec{x}}}\) from its noisy observation \({{\varvec{y}}}={{\varvec{x}}}+{{\varvec{n}}}\), where \({{\varvec{n}}}\) is generally assumed to be additive white Gaussian noise (AWGN) with zero mean and variance \(\sigma _n^2\).

The seminal work of nonlocal means (Buades et al. 2005; Buades et al. 2008) triggered the study of nonlocal self-similarity (NSS) based methods for image denoising. NSS refers to the fact that there are many repeated local patterns across a natural image, and the nonlocal patches similar to a given patch offer helpful cues for its better reconstruction. The NSS-based image denoising algorithms such as BM3D (Dabov et al. 2007), LSSC (Mairal et al. 2009), NCSR (Dong et al. 2011) and SAIST (Dong et al. 2013) have achieved state-of-the-art denoising results. In this section, we utilize the NSS prior to develop the following WNNM-based denoising algorithm.

4.1 Denoising Algorithm

For a local patch \({{\varvec{y}}}_j\) in image \({{\varvec{y}}}\), we can search for its nonlocal similar patches across a relatively large area around it by methods such as block matching (Dabov et al. 2007). By stacking those nonlocal similar patches into a matrix, denoted by \({{\varvec{Y}}}_j\), we have \({{\varvec{Y}}}_j={{\varvec{X}}}_j+{{\varvec{N}}}_j\), where \({{\varvec{X}}}_j\) and \({{\varvec{N}}}_j\) are the original clean patch matrix and the corresponding corruption matrix, respectively. Intuitively, \({{\varvec{X}}}_j\) should be a low rank matrix, and the LRMA methods can be used to estimate \({{\varvec{X}}}_j\) from \({{\varvec{Y}}}_j\). By aggregating all the estimated patch matrices \(\hat{{{\varvec{X}}}}_j\), the whole image can be reconstructed. Indeed, the NNM method has been adopted in (Ji et al. 2010) for video denoising, and we apply the proposed WNNP operator to each \({{\varvec{Y}}}_j\) to estimate \({{\varvec{X}}}_j\) for image denoising. By using the noise variance \(\sigma _n^2\) to normalize the F-norm data fidelity term \(\Vert {{\varvec{Y}}}_j-{{\varvec{X}}}_j\Vert _F^2\), we have the following energy function:

$$\begin{aligned} \hat{{{\varvec{X}}}}_j=\textstyle \arg \min _{{{\varvec{X}}}_j}\frac{1}{\sigma _n^2}\Vert {{\varvec{Y}}}_j-{{\varvec{X}}}_j\Vert _F^2+\Vert {{\varvec{X}}}_j\Vert _{{{\varvec{w}}},*}. \end{aligned}$$
(14)

Throughout our experiments, we set the parameter C as \(\sqrt{2n}\) by experience, where n is the number of similar patches.
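For illustration, the following numpy sketch (function name ours) applies the reweighted closed-form solution of Remark 1 to the per-group problem (14); folding the normalized fidelity weight \(1/\sigma _n^2\) into an effective constant \(\sigma _n^2C\) is our simplification, and the block matching and aggregation steps of Algorithm 3 are omitted:

```python
import numpy as np

def denoise_patch_group(Yj, sigma_n, eps=1e-8):
    """One WNNP step on a patch group Yj (columns are vectorized similar
    patches), solving (14) via the reweighted closed form of Remark 1.
    C = sqrt(2 n) as in Sect. 4.1; multiplying by sigma_n**2 folds the
    normalized fidelity of (14) into the WNNP form of (5) (our reading)."""
    n = Yj.shape[1]                          # number of similar patches
    C_eff = np.sqrt(2.0 * n) * sigma_n ** 2  # effective regularization constant
    U, s, Vt = np.linalg.svd(Yj, full_matrices=False)
    c1 = s - eps
    c2 = (s + eps) ** 2 - 4.0 * C_eff
    d = np.where(c2 < 0.0, 0.0, (c1 + np.sqrt(np.maximum(c2, 0.0))) / 2.0)
    return U @ np.diag(d) @ Vt
```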

By applying the above procedures to each patch and aggregating all patches together, the image \({{\varvec{x}}}\) can be reconstructed. In practice, we can run several iterations of this reconstruction process across all image patches to enhance the denoising outputs. The whole denoising algorithm is summarized in Algorithm 3.

[Algorithm 3: image denoising by WNNM]

4.2 Experimental Setting

We compare the proposed WNNM-based image denoising algorithm with several state-of-the-art denoising methods, including BM3DFootnote 2 (Dabov et al. 2007), EPLLFootnote 3 (Zoran and Weiss 2011), LSSCFootnote 4 (Mairal et al. 2009), NCSRFootnote 5 (Dong et al. 2011) and SAISTFootnote 6 (Dong et al. 2013). The baseline NNM algorithm is also compared. All the competing methods exploit the image nonlocal redundancies.

There are several other parameters (\(\delta \), the iteration number K and the patch size) in the proposed algorithm. For all noise levels, the iterative regularization parameter \(\delta \) is fixed to 0.1. The iteration number K and the patch size are set based on the noise level. For higher noise levels, we choose bigger patches and run more iterations. By experience, we set the patch sizes to \(6\times 6\), \(7\,\times \,7\), \(8\,\times \,8\) and \(9\,\times \,9\) for \(\sigma _n\le 20\), \(20<\sigma _n\le 40\), \(40<\sigma _n\le 60\) and \(60<\sigma _n\), respectively. The iteration number K is set to 8, 12, 14, and 14, and the number of selected non-local similar patches is set to 70, 90, 120 and 140, respectively, for these noise levels.

For NNM, we use the same parameters as WNNM except for the uniform weight \(\sqrt{n}\sigma _n\). The source codes of the comparison methods are obtained directly from the original authors, and we use the default parameters. Our code can be downloaded from http://www4.comp.polyu.edu.hk/cslzhang/code/WNNM_code.zip.

Fig. 1 The 20 test images used in image denoising experiments

4.3 Experimental Results on 20 Test Images

We evaluate the competing methods on 20 widely used test images, whose thumbnails are shown in Fig. 1. The first 12 images are of size \(256\times 256\), and the other 8 images are of size \(512\times 512\). AWGN with zero mean and variance \(\sigma _n^2\) is added to these test images to generate the noisy observations. We present the denoising results on four noise levels, ranging from low noise level \(\sigma _n=10\), to medium noise levels \(\sigma _n=30\) and 50, and to strong noise level \(\sigma _n=100\). The PSNR results under these noise levels for all competing denoising methods are shown in Table 1. For a more comprehensive comparison, we also list the average denoising results on more noise levels by all competing methods in Table 2.

Table 1 Denoising results (PSNR, dB) by competing methods on the 20 test images
Table 2 The average PSNR (dB) values by competing methods on the 20 test images
Fig. 2 Denoising results on image Boats by competing methods (noise level \(\sigma _n=50\)). The demarcated area is enlarged in the right bottom corner for better visualization. The figure is better seen by zooming on a computer screen

From Tables 1 and 2, we can see that the proposed WNNM method achieves the highest PSNR in almost all cases. It achieves 1.3dB–2dB improvement over the NNM method on average and outperforms the benchmark BM3D method by 0.3dB–0.45dB consistently on all the four noise levels. Such an improvement is notable since few methods can surpass BM3D by more than 0.3dB on average (Levin et al. 2012; Levin and Nadler 2011).

Fig. 3 Denoising results on image Monarch by competing methods (noise level \(\sigma _n=100\)). The demarcated area is enlarged in the left bottom corner for better visualization. The figure is better seen by zooming on a computer screen

Fig. 4 Denoising results on a real SAR image by all competing methods

Fig. 5 Denoising results on a real color image by all competing methods

In Figs. 2 and 3, we compare the visual quality of the denoised images by the competing algorithms. More visual comparison results can be found in the supplementary file. Fig. 2 demonstrates that the proposed WNNM reconstructs more image details from the noisy observation. Compared with WNNM, the LSSC, NCSR and SAIST methods over-smooth more textures in the sands area of image Boats, and the BM3D and EPLL methods generate more artifacts. More interestingly, as can be seen in the demarcated window, the proposed WNNM is capable of well reconstructing the tiny masts of the boat, while the masts are almost unrecognizable in the reconstructed images by other methods. Fig. 3 shows an example with strong noise. It is easy to see that WNNM generates fewer artifacts and better preserves the image edge structures compared with other competing methods. In summary, WNNM shows strong denoising capability, producing more pleasant denoising outputs in visualization and higher PSNR indices in quantity.

4.4 Experimental Results on Real Noisy Images

We also evaluate different denoising algorithms on two real images, including a gray level SAR imageFootnote 7 and a color imageFootnote 8, as shown in Figs. 4 and 5, respectively. Since the noise level is unknown for real noisy images, some noise estimation method is needed to estimate the noise level in the image. We adopted the classical method in (Donoho 1995) to estimate the noise level, and the estimated noise level was used for all the competing methods. Figs. 4 and 5 compare the denoising results by the competing methods on the two images. One can see that the proposed WNNM method keeps well the local structures in the images and generates the least visual artifacts among the competing methods.

Table 3 Relative error of low rank matrix recovery results by NNM-RPCA and WNNM-RPCA, with \(p_e\) fixed as 0.05, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05
Table 4 Relative error of low rank matrix recovery results by NNM-RPCA and WNNM-RPCA, with \(p_e\) fixed as 0.1, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05
Table 5 Relative error of low rank matrix recovery results by NNM-RPCA and WNNM-RPCA, with \(p_e\) fixed as 0.2, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05
Fig. 6 The log-scale relative error \(log\frac{\Vert \hat{{{\varvec{X}}}}-{{\varvec{X}}}\Vert _F^2}{\Vert {{\varvec{X}}}\Vert _F^2}\) of NNM-RPCA and WNNM-RPCA with different rank and outlier rate settings \(\{p_r,p_e\}\)

5 WNNM-RPCA for Background Subtraction

SVD/PCA aims to capture the principal (affine) subspace along which the data variance can be maximally captured. It has been widely used in the areas of data modeling, compression, and visualization. In the conventional PCA model, the error is measured under the \(\ell _2\)-norm fidelity, which is optimal for suppressing additive Gaussian noise. However, there are occasions where the data are corrupted by outliers or sparse noise, which may prevent SVD/PCA from estimating the ground truth subspace. To address this problem, multiple RPCA models have been proposed to robustify PCA, and have been attempted in different applications such as structure from motion, ranking and collaborative filtering, face reconstruction and background subtraction (Zhou et al. 2014).

Recently, (Candès et al. 2011) proposed the NNM-RPCA model which not only can be efficiently solved by ADMM, but also can guarantee the exact reconstruction of the original data under certain conditions. In this paper, we propose WNNM-RPCA to further enhance the flexibility of NNM-RPCA. In the following, we first design synthetic simulations to comprehensively compare the performance between WNNM-RPCA and NNM-RPCA, and then show the superiority of the proposed method in background subtraction by comparing with more typical low-rank learning methods designed for this task.

5.1 Experimental Results on Synthetic Data

To quantitatively evaluate the performance of the proposed WNNM-RPCA model, we generate synthetic low rank matrix recovery simulations for testing. The ground truth low rank data matrix \({{\varvec{X}}}\in \mathfrak {R}^{m\times m}\) is obtained by the multiplication of two low rank matrices: \({{\varvec{X}}}={{\varvec{A}}}{{\varvec{B}}}^T\), where \({{\varvec{A}}}\) and \({{\varvec{B}}}\) are both of size \(m\times r\). Here \(r=p_r\times m\) constrains the upper bound of \(Rank({{\varvec{X}}})\). In all experiments, each element of \({{\varvec{A}}}\) and \({{\varvec{B}}}\) is generated from a Gaussian distribution \(\mathcal {N}(0,1)\). The ground truth matrix \({{\varvec{X}}}\) is corrupted by sparse noise \({{\varvec{E}}}\) which has \(p_e\times m^2\) non-zero entries. The non-zero entries in \({{\varvec{E}}}\) are located in random positions and the value of each non-zero element is generated from a uniform distribution between \([-5, 5]\). We set \(m=400\), and let both \(p_r\) and \(p_e\) vary from 0.01 to 0.5 with step length 0.01. For each parameter setting \(\{p_r, p_e\}\), we generate the synthetic low-rank matrix 10 times and the final results are measured by the average of these 10 runs.
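For reference, a numpy sketch of this data generation and of the relative error measure used in Tables 3, 4, 5 and Fig. 6 (function names ours) could look like:

```python
import numpy as np

def make_rpca_test_case(m=400, p_r=0.1, p_e=0.1, rng=None):
    """Synthetic RPCA test case as in Sect. 5.1: a rank-(p_r*m) ground truth
    corrupted at a fraction p_e of its entries by uniform noise in [-5, 5]."""
    rng = np.random.default_rng() if rng is None else rng
    r = int(p_r * m)
    X = rng.standard_normal((m, r)) @ rng.standard_normal((m, r)).T
    E = np.zeros((m, m))
    idx = rng.choice(m * m, size=int(p_e * m * m), replace=False)
    E.flat[idx] = rng.uniform(-5.0, 5.0, size=idx.size)  # sparse corruption
    return X + E, X, E

def relative_error(X_hat, X):
    """||X_hat - X||_F^2 / ||X||_F^2, as reported in Tables 3-5 and Fig. 6."""
    return np.linalg.norm(X_hat - X) ** 2 / np.linalg.norm(X) ** 2
```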

For the NNM-RPCA model, there is an important parameter \(\lambda \). We set it as \(1/\sqrt{m}\) following the suggested setting of (Candès et al. 2011). For our WNNM-RPCA model, the parameter C is empirically set as the square root of matrix size, i.e., \(C=\sqrt{m\times m}=m\), in all experiments. The parameters in ADMM for both methods are set as \(\rho =1.05\). Typical experimental results are listed in Tables 3, 4, and 5 for easy comparison.

It is easy to see that when the rank of the matrix is low or the number of corrupted entries is small, both the NNM-RPCA and WNNM-RPCA models are able to deliver accurate estimation of the ground truth matrix. However, as the rank of the matrix or the number of corrupted entries gets larger, NNM-RPCA fails to deliver an accurate estimation of the ground truth matrix, while the error of the results by WNNM-RPCA is much smaller in these cases. In Fig. 6, we show the log-scale relative error map of the recovered matrices by NNM-RPCA and WNNM-RPCA with different settings of \(\{p_r,p_e\}\). It is clear that the success area of WNNM-RPCA is much larger than that of NNM-RPCA, which means that WNNM-RPCA has much better low-rank matrix reconstruction capability in the presence of outliers/sparse noise.

5.2 Experimental Results on Background Subtraction

As an important application in video surveillance, background subtraction refers to the problem of separating the moving objects in the foreground from the stable scene in the background. The matrix \({{\varvec{Y}}}\) obtained by stacking the video frames as columns corresponds to a low-rank matrix with stationary background corrupted by the sparse moving objects in the foreground. Thus, the RPCA model is appropriate for this problem. We compare WNNM-RPCA with NNM-RPCA and several representative low-rank learning models, including the classic iteratively reweighted least squares (IRLS) based RPCA modelFootnote 9 (De La Torre and Black 2003), and the \(\ell _2\)-norm and \(\ell _1\)-norm based matrix factorization models: singular value decomposition (SVD), Bayesian robust matrix factorization (BRMF)Footnote 10 (Wang and Yeung 2013) and RegL1ALMFootnote 11 (Zheng et al. 2012). The results of a recently proposed Mixture of Gaussian (MoG) modelFootnote 12 (Meng and Torre 2013; Zhao et al. 2014) are also included. These comparison methods range over state-of-the-art \(\ell _2\) norm, \(\ell _1\) norm and probabilistic subspace learning methods, including both categories of rank minimization and LRMF based approaches. We downloaded the codes of these algorithms from the corresponding authors’ websites and keep their initialization and stopping criteria unchanged. The code of the proposed WNNM-RPCA model can be downloaded at http://www4.comp.polyu.edu.hk/cslzhang/code/WNNM_RPCA_code.zip.

Four benchmark video sequences provided by Li et al. (2004) were adopted in our experiments, including two outdoor scenes (Fountain and Watersurface) and two indoor scenes (Curtain and Airport). In each sequence, 20 frames of ground truth foreground regions were provided by Li et al. for quantitative comparison. For all the comparison methods, parameters are fixed on the four sequences. We follow the experimental setting in Zhao et al. (2014) and constrain the maximum rank to 6 for all the factorization based methods. The regularization parameter \(\lambda \) for the \(\ell _1\)-norm sparse term in the NNM-RPCA model is set to \(\frac{1}{2\sqrt{max(m,n)}}\), since we empirically found that it can perform better than the recommended parameter \(\frac{1}{\sqrt{max(m,n)}}\) in the original paper (Candès et al. 2011) in this series of experiments. For the proposed WNNM-RPCA model, we set \(C=\sqrt{2max(m^3,n^3)}\) in all experiments.

Table 6 Quantitative performance (S) comparison of background subtraction results obtained by different methods

To quantitatively compare the performance of competing methods, we use \(S(A,B)=\frac{A\cap B}{A\cup B}\) to measure the similarity between the estimated foreground regions and the ground truth ones. To generate the binary foreground map, we applied the Markov random field (MRF) model to label the absolute value of the estimated sparse error. The MRF labeling problem was solved by the multi-label optimization tool box (Boykov et al. 2001). The quantitative results of S by different methods are shown in Table 6. One can see that on all the four utilized sequences, the proposed WNNM-RPCA model outperforms all other competing methods.
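For reference, the similarity measure S between two binary foreground masks can be computed as follows (function name ours):

```python
import numpy as np

def similarity_S(fg_est, fg_gt):
    """S(A, B) = |A intersect B| / |A union B| between binary foreground masks."""
    a, b = fg_est.astype(bool), fg_gt.astype(bool)
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()
```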

The visual results of representative frames in the Watersurface and Curtain sequences are shown in Figs. 7 and 8. The visual result of the other two sequences can be found in the supplementary file. From these figures, we can see that WNNM-RPCA method is able to deliver clear background estimation even under prominently embedded foreground moving objects. This on the other hand facilitates a more accurate foreground estimation. Comparatively, in the results estimated by the other methods, there are some ghost shadows in the background, leading to relatively less complete foreground detection results.

Fig. 7 Performance comparison in visualization of competing methods on the Watersurface sequence. First row: the original frames and annotated ground truth foregrounds. Second row to the last row: estimated backgrounds and foregrounds by SVD, IRLS, BRMF, RegL1ALM, MoGRPCA, NNM-RPCA and WNNM-RPCA, respectively

Fig. 8 Performance comparison in visualization of competing methods on the Curtain sequence. First row: the original frames and annotated ground truth foregrounds. Second row to the last row: estimated backgrounds and foregrounds by SVD, IRLS, BRMF, RegL1ALM, MoGRPCA, NNM-RPCA and WNNM-RPCA, respectively

6 WNNM-MC for Image Inpainting

Matrix completion refers to the problem of recovering a matrix from only partial observation of its entries. It is a well known ill-posed problem which needs prior knowledge of the ground truth matrix as supplementary information for reconstruction. Fortunately, in many practical instances, the matrix to be recovered has a low-rank structure. Such prior knowledge has been utilized in many low-rank matrix completion (LRMC) applications, such as ranking and collaborative filtering (Ruslan and Srebro 2010) and image inpainting (Zhang et al. 2012a). Matrix completion can be solved by both the matrix factorization and the rank minimization approaches. As the exact recovery property of the NNM-based methods has been proved by (Candès and Recht 2009), this methodology has received great research interest, and many algorithms have been proposed to solve the NNM-MC problem (Cai et al. 2010; Lin et al. 2011). In the following, we provide experimental results on synthetic data and image inpainting to show the superiority of the proposed WNNM-MC model over the traditional NNM-MC technology.

6.1 Experimental Results on Synthetic Data

We first compare NNM-MC with WNNM-MC using synthetic low-rank matrices. Similar to our experimental setting in the RPCA problem, we generate the ground truth low-rank matrix by a multiplication between two matrices \({{\varvec{A}}}\) and \({{\varvec{B}}}\) of size \(m\times r\). Here \(r=p_r\times m\) constrains the upper bound of \(Rank({{\varvec{X}}})\). All of their elements are generated from the Gaussian distribution \(\mathcal {N}(0,1)\). In the observed matrix \({{\varvec{Y}}}\), \(p_e\times m^2\) entries in the ground truth matrix \({{\varvec{X}}}\) are missing. We set \(m=400\), and let \(p_r\) and \(p_e\) vary from 0.01 to 0.5 with step length 0.01. For each parameter setting of \(\{p_r, p_e\}\), 10 groups of synthetic data are generated for testing and the performance of each method is assessed by the average of the 10 runs on these groups.
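A numpy sketch of this test case generation (function name ours) is:

```python
import numpy as np

def make_mc_test_case(m=400, p_r=0.1, p_e=0.1, rng=None):
    """Synthetic MC test case as in Sect. 6.1: a rank-(p_r*m) ground truth
    with a fraction p_e of its entries missing."""
    rng = np.random.default_rng() if rng is None else rng
    r = int(p_r * m)
    X = rng.standard_normal((m, r)) @ rng.standard_normal((m, r)).T
    Omega = np.ones((m, m), dtype=bool)
    idx = rng.choice(m * m, size=int(p_e * m * m), replace=False)
    Omega.flat[idx] = False           # missing entries
    Y = np.where(Omega, X, 0.0)       # observed matrix, zeros at missing positions
    return Y, Omega, X
```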

Table 7 Relative error of low rank matrix recovery results by NNM-MC and WNNM-MC, with \(p_e\) fixed as 0.1, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05
Table 8 Relative error of low rank matrix recovery results by NNM-MC and WNNM-MC, with \(p_e\) fixed as 0.2, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05
Table 9 Relative error of low rank matrix recovery results by NNM-MC and WNNM-MC, with \(p_e\) fixed as 0.3, and \(p_r\) varying from 0.05 to 0.45 with step length 0.05

In all the experiments we fix the parameters \(\lambda =1/\sqrt{m}\) and \(C=m\) in NNM-MC and WNNM-MC, respectively. The parameter \(\rho \) in the ADMM algorithm is set to 1.05 for both methods. Typical experimental results are listed in Tables 7, 8, and 9.

Fig. 9 The log-scale relative error \(log\frac{\Vert \hat{{{\varvec{X}}}}-{{\varvec{X}}}\Vert _F^2}{\Vert {{\varvec{X}}}\Vert _F^2}\) of NNM-MC and WNNM-MC with different rank and missing rate settings \(\{p_r,p_e\}\)

It can be easily observed that when the rank of the latent ground truth matrix \({{\varvec{X}}}\) is relatively low, both NNM-MC and WNNM-MC can successfully recover it with high accuracy. The advantage of WNNM-MC over NNM-MC is reflected when dealing with more challenging cases. Table 8 shows that when \(20~\%\) of the entries in the matrix are missing, NNM-MC no longer achieves good recovery accuracy once the rank is higher than 120, while WNNM-MC can still achieve very high accuracy. Similar observations can be made in Table 9.

The log-scale relative error map with different settings of \(\{p_r,p_e\}\) are shown in Fig. 9. From this figure, it is clear to see that WNNM-MC has a much larger success area than NNM-MC.

6.2 Experimental Results on Image Inpainting

We then test the proposed WNNM-MC model on image inpainting. In some previous works, the whole image is assumed to be a low rank matrix and matrix completion is directly performed on the image to get the inpainting result. However, a natural image is only approximately low rank, and the small singular values in its long-tailed spectrum encode many details. Simply using the low rank prior on the whole image may fail to recover the missing pixels or lose too much detailed information in the image. As in the image denoising experiments, we utilize the NSS prior and perform WNNM-MC on each group of non-local similar patches for this task. We initialize the inpainting with the field of experts (FOE) method (Roth and Black 2009) so that non-local similar patches can be searched, and then for each patch group we perform WNNM-MC to get an updated low-rank reconstruction. After the first round estimation of the missing values in all patch groups, all reconstructed patches are aggregated to get the recovered image. We then perform a new stage of similar patch searching based on the first round estimation, and iteratively repeat this process until it converges to a final inpainting output.

Table 10 Inpainting results (PSNR, dB) by different methods
Fig. 10 Inpainting results on image Starfish by different methods (random mask with 75 % missing values)

The first 12 test imagesFootnote 13 with size 256 \(\times \) 256 in Fig. 1 are used to evaluate WNNM-MC. Random masks with 25, 50 and 75 % missing pixels and a text mask are used to test the inpainting performance, respectively. We compare WNNM-MC with NNM-MC and several representative and state-of-the-art inpainting methods, including the TV method (Chan and Shen 2005), FOE methodFootnote 14 (Roth and Black 2009), variational nonlocal example-based (VNL) methodFootnote 15 (Arias et al. 2011) and the beta process dictionary learning (BPDL) methodFootnote 16 (Zhou et al. 2009). The setting of the TV inpainting method follows the implementation of (Dahl et al. 2010)Footnote 17 and the codes for other comparison methods are provided by the original authors. The source code of the proposed WNNM-MC model can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/code/WNNM_MC_code.zip.

The PSNR results by different methods are shown in Table 10. It is easy to see that WNNM-MC achieves much better results than the other methods. Visual examples on a random mask and a text mask are shown in Figs. 10 and 11, respectively. More visual examples can be found in the supplementary file. From the enlarged demarcated windows, we can see that inpainting methods based on image local priors (e.g., TV, FOE and BPDL) are able to recover smooth image areas, while they have difficulty recovering the details in edge and texture areas. The VNL, NNM and WNNM methods utilize the NSS prior, and thus their results are more visually plausible. However, in some challenging cases where the percentage of missing entries is high, it can be observed that VNL and NNM more or less generate artifacts across the recovered image. In comparison, the proposed WNNM-MC model produces inpainting results with much better visual quality.

7 Discussions

To improve the flexibility of the original nuclear norm, we proposed the weighted nuclear norm and studied its minimization strategy in this work. Based on the observed data and the specific application, different weight setting strategies can be utilized to achieve better performance. Inspired by (Candès et al. 2008), in this work we utilized the reweighting strategy \(w_i^{\ell }=\frac{C}{|\sigma _i(\mathbf {{X}}_{\ell })|+\varepsilon }\) to approximate the \(\ell _0\) norm on the singular values of the data matrix. Besides this setting, there also exist other weight setting strategies for certain types of data and applications. For instance, the truncated nuclear norm regularization (TNNR) (Zhang et al. 2012a) and the partial sum minimization (PSM) (Oh et al. 2013) were proposed to regularize only the smallest \(N-r\) singular values. Actually, they can be viewed as a special case of WNNM, where the weight vector \([w_{1\cdots r}=0,w_{r+1\cdots N}=\lambda ]\) is used to approximate the rank function of a matrix. The Schatten-p norm minimization (SPNM) methods (Nie et al. 2012; Mohan and Fazel 2012) can also be understood as special cases of WNNM, since the \(\ell _p\) norm proximal problem can be solved by the iterative thresholding method (She 2012). The same strategy as in our work can be used to set the weight for each singular value.

Fig. 11 Inpainting results on image Monarch by different methods (text mask)

Table 11 The average PSNR (dB) values of denoising results by competing methods on the 20 test images
Table 12 The average PSNR (dB) values of inpainting results by competing methods on the 12 test images
Table 13 Quantitative performance (S) comparison of background subtraction results obtained by different methods

In Tables 11, 12 and 13, we compare the proposed WNNM with the TNNR and SPNM methods on image denoising, background subtraction and image inpainting, respectively. The results obtained by the NNM method are also shown in the tables as a baseline comparison. The experimental settings on the three applications are exactly the same as in the experiments in Sects. 4, 5 and 6. For the TNNR method, there are two parameters r and \(\lambda \), which represent the truncation position and the regularization parameter for the remaining \(N-r\) singular values. For the SPNM model, we need to set the p value of the \(\ell _p\) norm and the regularization parameter \(\gamma \). We carefully tuned these parameters of the two models for the different applications. The remaining parameters are the same as for the WNNM model on each task.

From the tables, we can see that regularizing the singular values more flexibly is beneficial for low rank models. Besides, WNNM outperforms TNNR and SPNM in the tested applications, which substantiates the superiority of the proposed method along this line of research. Nonetheless, it is worth noting that there may exist other weighting mechanisms which could achieve better performance on certain computer vision tasks. Actually, since we have proved in Theorem 1 that the WNNP problem is equivalent to an element-wise proximal problem with order constraints, a wide range of weighting and reweighting strategies can be used to regularize the singular values of a data matrix. One important direction of our future work is thus to investigate more sophisticated weight setting mechanisms for different types of data and applications.

8 Conclusion

We studied the weighted nuclear norm minimization (WNNM) problem in this paper. We first presented the solving strategy for the weighted nuclear norm proximal (WNNP) operator under \(\ell _2\)-norm fidelity loss to facilitate the solving of different WNNM paradigms. We then extended the WNNM model to robust PCA (RPCA) and matrix completion (MC) tasks and constructed efficient algorithms to solve them based on the derived WNNP operator. Inspired by previous results on reweighted sparse coding, we further designed a rational scheme for automatic weight setting, which offers a closed-form solution of the WNNP operator and eases the utilization of WNNM in different applications.

We validated the effectiveness of the proposed WNNM models on multiple low level vision tasks. For the baseline WNNM, we applied it to image denoising, and achieved state-of-the-art performance in both quantitative PSNR measures and visual quality. For WNNM-RPCA, we applied it to background subtraction, and validated its superiority over up-to-date low-rank learning methods. For WNNM-MC, we applied it to image inpainting, and demonstrated its superior performance among state-of-the-art inpainting technologies.