1 Introduction

The problem of recovering a matrix of minimum rank subject to linear constraints has attracted considerable attention in recent years. This problem arises in various real-world applications, such as recommender systems [1, 2], image processing [3,4,5], quality-of-service (QoS) prediction [6], and deep learning [7, 8]. In general, such a task can be formulated as the following low-rank minimization problem [9, 10]:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} rank(X) \quad s.t. \quad {\mathcal {A}}(X) = b, \end{aligned}$$
(1)

where \(X \in {\mathbb {R}}^{m \times n}\) is the low-rank matrix to be recovered, \(b \in {\mathbb {R}}^{d}\) is a given measurement vector, and \({\mathcal {A}}: {\mathbb {R}}^{m \times n} \rightarrow {\mathbb {R}}^{d}\) denotes a linear transformation. By adopting the regularization method, the constrained problem (1) can be converted into the following unconstrained minimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda rank(X), \end{aligned}$$
(2)

where \(\lambda > 0\) is a regularization parameter.

Unfortunately, the optimization problems (1) and (2) are computationally intractable due to the nonconvexity and discontinuity of the rank function. To overcome this difficulty, many researchers have suggested using the nuclear norm instead, which is known as the tightest convex surrogate of the rank function [11, 12]. Theoretical analysis shows that, under some mild conditions, the low-rank matrix can be exactly recovered with high probability by this scheme. Thus, a large number of methods have been proposed for the resulting nuclear norm optimization problem, such as singular value thresholding (SVT) [13], the accelerated proximal gradient with linesearch algorithm (APGL) [14], and accelerated inexact soft-impute (AIS-Impute) [15].

Since the methods mentioned above are simple, easy to use, and come with theoretical guarantees, nuclear norm based models have recently attracted significant attention in the field of low-rank matrix recovery. However, the performance of such a convex relaxation is not always satisfactory; that is, the solutions of the nuclear norm optimization problem may deviate from those of the original problem. The main reason is that the nuclear norm over-penalizes large singular values. To alleviate this limitation, a commonly used strategy is to replace the rank function with nonconvex surrogates, which approximate it more closely than the nuclear norm. Examples of such nonconvex surrogate functions include the \(l_{q}\)-norm \((0< q < 1)\) [16,17,18], the weighted nuclear norm (WNN) [19], the smoothly clipped absolute deviation (SCAD) [20], the mini-max concave penalty (MCP) [21], the log-sum penalty (LSP) [22], and so on. Although the resulting problem is nonconvex, non-smooth, and even non-Lipschitz, numerous methods have been proposed to handle it. In [16] and [23], the authors proposed fixed-point iterative schemes based on singular value thresholding operators. Their convergence analysis and empirical results show that these methods are fast and efficient. In [24], the iteratively reweighted nuclear norm (IRNN) method was proposed by exploiting the concavity and the decreasing supergradients of existing nonconvex regularizers. Since a computationally expensive singular value decomposition (SVD) is required in each iteration, the IRNN method converges slowly. To improve its speed and performance, the fast nonconvex low-rank (FaNCL) method [25] was proposed. The empirical results of all the methods mentioned above illustrate that nonconvex models outperform their convex counterparts.

Recently, the transformed Schatten-1 (TS1) penalty function [26], a matrix quasi-norm defined on the singular values, has been successfully applied to low-rank matrix recovery. The TS1 penalty is an extension of the transformed \(\ell _{1}\) (TL1) function, a class of \(\ell _{1}\)-based nonconvex penalty functions generalized by Lv and Fan in [27]. Kang et al. [28] demonstrated the high efficiency of the TL1 function when applied to robust principal component analysis. However, the TL1 penalty leads to a nonconvex optimization problem that is difficult to solve quickly and efficiently. Zhang et al. continued this line of work [29, 30] and pointed out that the TL1 proximal operator has a closed-form analytical solution for all parameter values. Based on this finding, in this paper, we consider the following TS1 penalty function:

$$\begin{aligned} T(X) = \sum _{i = 1}^{rank(X)}\rho _{a}(\sigma _{i}(X)) = \sum _{i = 1}^{rank(X)}\frac{(a + 1)\sigma _{i}(X)}{a + \sigma _{i}(X)}, \end{aligned}$$
(3)

where

$$\begin{aligned} \rho _{a}(|x|) = \frac{(a + 1)|x|}{a + |x|}, \end{aligned}$$
(4)

is a nonconvex function with parameter \(a \in (0, +\infty )\), and \(\sigma _{i}(X)\) denotes the ith singular value of matrix X. Therefore, the original problem (1) can be naturally converted into the following optimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} T(X) \quad s.t. \quad {\mathcal {A}}(X) = b. \end{aligned}$$
(5)

In practice, we often focus on its regularized version, which can be formulated as:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda T(X). \end{aligned}$$
(6)

It should be noted that the TS1 penalty function is more general than the nonconvex penalty function in [31]. Moreover, \(\rho _{a}\) with \(a \in (0, +\infty )\) yields solutions that are both unbiased and low-rank.
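
To make the penalty concrete, the following minimal Python sketch (our own illustration, not the authors' code; the function name is ours) evaluates \(T(X)\) in (3) by summing \(\rho _{a}\) over the singular values of X.

```python
import numpy as np

def ts1_penalty(X, a):
    """Evaluate the TS1 penalty T(X) in (3): the sum of rho_a over the singular values of X."""
    sigma = np.linalg.svd(X, compute_uv=False)   # singular values of X
    sigma = sigma[sigma > 0]                     # rho_a(0) = 0, so zero singular values add nothing
    return np.sum((a + 1.0) * sigma / (a + sigma))
```

Note that \(\rho _{a}\) interpolates between the rank and the nuclear norm: as \(a \rightarrow 0^{+}\) each nonzero singular value contributes approximately 1, so \(T(X) \rightarrow rank(X)\), whereas as \(a \rightarrow +\infty \), \(T(X)\) approaches the nuclear norm \(\Arrowvert X \Arrowvert _{*}\).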

In this paper, we further investigate the basic properties of the TS1 penalty function and propose a fast and efficient algorithm to solve problem (6). Specifically, we analyze the solution of rank minimization and prove that, under certain rank-RIP conditions, rank minimization can be equivalently transformed into unconstrained TS1 regularization, whose global minimizer can be obtained by TS1 thresholding functions. These thresholding functions are given in closed analytical form for all parameter values. Thus, an exact mathematical analysis provides a theoretical guarantee for applying TS1 regularization to low-rank matrix recovery problems. Our findings and contributions are listed as follows:

  1. We first show that the minimizer of problem (5) is actually the optimal solution of problem (1).

  2. We further establish the relationship between problems (5) and (6), and prove that the solution of problem (5) can be obtained by solving problem (6).

  3. Nesterov's rule and inexact proximal strategies are adopted to obtain a novel, highly efficient algorithm that solves the TS1 regularization problem at a convergence rate of O(1/N).

  4. Extensive empirical studies on image inpainting tasks validate the advantages of our proposed method.

The remainder of this paper is organized as follows. Section 2 introduces the related work. In Sect. 3, the equivalence between the minimizers of the relaxed problem and those of the original problem is established. Section 4 proposes an efficient optimization method with a rigorous convergence guarantee. The experimental results are reported and analyzed in Sect. 5. Finally, we conclude this paper in Sect. 6.

2 Background

2.1 Proximal algorithms

In this paper, we consider the following low-rank optimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}}F(X) = f(X) + \lambda h(X), \end{aligned}$$
(7)

where f is bounded below and differentiable with \(L_{f}\)-Lipschitz continuous gradient, i.e., \(\Arrowvert \nabla f(X_{1}) - \nabla f(X_{2}) \Arrowvert _{F} \le L_{f}\Arrowvert X_{1} - X_{2} \Arrowvert _{F}\), h is a low-rank regularizer, and \(\lambda > 0\) is a given parameter. Over the past decades, the proximal algorithm [32] has received considerable attention and has been successfully applied to problem (7). Specifically, the proximal algorithm generates \(X_{k + 1}\) as

$$\begin{aligned} X_{k + 1} = prox_{\frac{\lambda }{\tau }h}\left( X_{k} - \frac{1}{\tau }\nabla f(X_{k})\right) , \end{aligned}$$
(8)

where \(\tau > 0\) is a parameter, and

$$\begin{aligned} prox_{\frac{\lambda }{\tau }h}(M) = \mathop {\arg \min }_{X \in {\mathbb {R}}^{m \times n}}\frac{1}{2}\Arrowvert X - M \Arrowvert ^{2}_{F} + \frac{\lambda }{\tau }h(X) \end{aligned}$$
(9)

denotes the proximal operator. Assuming that f and h are convex, Toh and Yun [14] proposed an accelerated proximal gradient (APG) algorithm with a convergence rate of \(O(1/N^{2})\). However, exactly solving the proximal operator may be expensive. To alleviate this difficulty, inexact proximal gradient methods have been proposed, and the theoretical analysis in [33] reveals that they share the same convergence rate as APG. Most recently, the inexact proximal gradient method has been extended to nonconvex and nonsmooth problems [34]. It should be pointed out that the nmAIPG algorithm in [34] is nearly the same as the nmAPG algorithm in [35], but it is much faster. Unfortunately, both the nmAIPG and nmAPG algorithms may involve two proximal steps in each iteration.
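
For illustration, a bare-bones sketch of the proximal iteration (8) is given below; `grad_f` and `prox_h` are assumed to be supplied by the caller, and all names and the fixed iteration count are ours.

```python
def proximal_gradient(grad_f, prox_h, X0, tau, lam, n_iter=100):
    """Generic proximal gradient iteration (8):
    X_{k+1} = prox_{(lam/tau) h}(X_k - grad_f(X_k) / tau).

    grad_f : callable X -> gradient of f at X
    prox_h : callable (M, t) -> argmin_X 0.5 * ||X - M||_F^2 + t * h(X)
    """
    X = X0.copy()
    for _ in range(n_iter):
        X = prox_h(X - grad_f(X) / tau, lam / tau)
    return X
```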

2.2 TS1 thresholding algorithm

We first define the proximal operator associated with the TS1 regularizer as follows:

$$\begin{aligned} prox_{\lambda }^{\rho _{a}}(y) = \mathop {\arg \min }_{x \in {\mathbb {R}}}\frac{1}{2}(y - x)^{2} + \lambda \rho _{a}(|x|), \end{aligned}$$
(10)

where \(\rho _{a}(\cdot )\) is defined in (4). As shown in [30], this nonconvex problem admits a closed-form optimal solution, as stated in the following lemma.

Lemma 1

(see [30]) For any \(\lambda > 0\) and \(y \in {\mathbb {R}}\), the solutions to the nonconvex problem (10) are

$$\begin{aligned} prox_{\lambda }^{\rho _{a}}(y) = \left\{ \begin{array}{lr} 0, &{} \mathrm{if}~|y| \le t\\ g_{\lambda }(y), &{} \mathrm{if}~|y| > t \end{array} \right. \end{aligned}$$
(11)

where

$$\begin{aligned} g_{\lambda }(y) = sgn(y)\left\{ \frac{2}{3}(a + |y|)cos\left( \frac{\phi (y)}{3}\right) - \frac{2a}{3} + \frac{|y|}{3}\right\} \end{aligned}$$

with \(\phi (y) = arccos(1 - \frac{27\lambda a(a + 1)}{2(a + |y|)^{3}})\), and

$$\begin{aligned} t = \left\{ \begin{array}{lr} \frac{\lambda (a + 1)}{a}, &{} \mathrm{if}~\lambda \le \frac{a^{2}}{2(a + 1)}\\ \sqrt{2\lambda (a + 1)} - \frac{a}{2}, &{} \mathrm{if}~\lambda > \frac{a^{2}}{2(a + 1)}. \end{array} \right. \end{aligned}$$
(12)
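
Lemma 1 translates directly into code. The following sketch (our own naming, for scalar inputs) implements (11)–(12); the `np.clip` call merely guards the arccos argument against floating-point round-off.

```python
import numpy as np

def ts1_threshold(lam, a):
    """Threshold t in (12)."""
    if lam <= a ** 2 / (2.0 * (a + 1.0)):
        return lam * (a + 1.0) / a
    return np.sqrt(2.0 * lam * (a + 1.0)) - a / 2.0

def prox_ts1_scalar(y, lam, a):
    """Scalar TS1 proximal operator (11): argmin_x 0.5*(y - x)^2 + lam*rho_a(|x|)."""
    if abs(y) <= ts1_threshold(lam, a):
        return 0.0
    # closed-form solution g_lambda(y) for |y| > t
    phi = np.arccos(np.clip(1.0 - 27.0 * lam * a * (a + 1.0) / (2.0 * (a + abs(y)) ** 3), -1.0, 1.0))
    return np.sign(y) * ((2.0 / 3.0) * (a + abs(y)) * np.cos(phi / 3.0) - 2.0 * a / 3.0 + abs(y) / 3.0)
```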

Based on this finding, the optimal solutions to problem (6) can be obtained by the proximal operator.

Lemma 2

(see [26]) Assume that \(\tau > \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\). Then the optimal solutions to problem (6) satisfy

$$\begin{aligned} X^{*} = prox_{\frac{\lambda }{\tau }T(\cdot )}(B_{\tau }(X^{*})) = UDiag\left( prox_{\frac{\lambda }{\tau }}^{\rho _{a}}(\sigma _{i}(B_{\tau }(X^{*})))\right) V^{T}, \end{aligned}$$
(13)

where \(B_{\tau }(X^{*}) = X^{*} - \frac{1}{\tau }{\mathcal {A}}^{*}({\mathcal {A}}(X^{*}) - b)\) and it admits SVD as \(UDiag(\sigma (B_{\tau }(X^{*})))V^{T}\).

Therefore, at the kth iteration,

$$\begin{aligned} X_{k + 1} = prox_{\frac{\lambda }{\tau }T(\cdot )}\left( X_{k} - \frac{1}{\tau }{\mathcal {A}}^{*}({\mathcal {A}}(X_{k}) - b)\right) . \end{aligned}$$
(14)

The TS1 algorithm [26] is obtained by iterating (14).
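
A minimal sketch of this iteration is shown below: the matrix proximal operator applies the scalar operator of Lemma 1 to each singular value, as in (13), and the loop implements (14). It reuses `prox_ts1_scalar` from the sketch above; the linear map \({\mathcal {A}}\) and its adjoint are passed as callables, and the fixed iteration count is an illustrative choice.

```python
import numpy as np

def prox_ts1_matrix(M, lam, a):
    """Matrix TS1 proximal operator (cf. (13)): apply the scalar prox to each singular value of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_new = np.array([prox_ts1_scalar(si, lam, a) for si in s])
    return (U * s_new) @ Vt                          # U Diag(s_new) V^T

def ts1_fixed_point(A, A_adj, b, X0, lam, tau, a, n_iter=200):
    """TS1 iteration (14): X_{k+1} = prox_{(lam/tau) T}(X_k - A*(A(X_k) - b) / tau)."""
    X = X0.copy()
    for _ in range(n_iter):
        X = prox_ts1_matrix(X - A_adj(A(X) - b) / tau, lam / tau, a)
    return X
```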

3 Equivalence of the minimizers of problems (1) and (5)

In this section, we further investigate the basic properties of the TS1 penalty function. Let \(X_{0}\) and \(X_{1}\) be minimizers of problems (1) and (5), respectively, and let \(R = X_{1} - X_{0}\). We partition R into two matrices \(R_{0}\) and \(R_{c}\), defined as follows.

Definition 1

Let R admit the SVD \(UDiag(\sigma (R))V^{T}\). The matrices \(R_{0}\) and \(R_{c}\) are defined as:

$$\begin{aligned} R_{0} = [U_{2K} \; 0]_{m \times m}\left[ \begin{matrix} \Sigma _{1} &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \end{matrix} \right] _{m \times n} [V_{2K} \; 0]_{n \times n}^{T} \end{aligned}$$
(15)

and

$$\begin{aligned} R_{c} = [0 \; U_{m - 2K}]_{m \times m}\left[ \begin{matrix} 0 &{} 0 &{} 0 \\ 0 &{} \Sigma _{2} &{} 0 \end{matrix} \right] _{m \times n} [0 \; V_{m - 2K}]_{n \times n}^{T}. \end{aligned}$$
(16)

Definition 2

Define the index set \(I_{j} = \{P(j - 1) + 2K + 1, \ldots , Pj + 2K\}\) and partition \(R_{c}\) into a sum of matrices \(R_{1}, R_{2}, \ldots ,\) i.e.,

$$\begin{aligned} R_{c} = \sum _{j}R_{j}, \end{aligned}$$

where

$$\begin{aligned} R_{j} = [0 \; U_{I_{j}} \; 0]_{m \times m}\left[ \begin{matrix} 0 &{} 0 &{} 0 \\ 0 &{} \Sigma _{I_{j}} &{} 0\\ 0 &{} 0 &{} 0 \end{matrix} \right] _{m \times n} [0 \; V_{I_{j}} \; 0]_{n \times n}^{T}. \end{aligned}$$
(17)

First, we introduce the following useful results.

Lemma 3

Let \(K = rank(X_{0})\). Then

$$\begin{aligned} T(R_{c}) \le T(R_{0}). \end{aligned}$$
(18)

Proof

Since \(X_{1}\) is the minimizer of (5), we have

$$\begin{aligned} T(X_{0}) \ge T(X_{1})= & {} T(X_{0} + R) \ge T(X_{0} + R_{c}) - T(R_{0})\\= & {} T(X_{0}) + T(R_{c}) - T(R_{0}), \end{aligned}$$

the second inequality follows from Lemma 2.1 in [30] and the last equality follows from Lemma 3.4 in [12]. \(\square \)

Theorem 1

For any \(a > 0\), we have

$$\begin{aligned} \Arrowvert R_{0} + R_{1} \Arrowvert _{F} \ge \frac{aT(R_{0})}{(a + 1)\sqrt{2K}}. \end{aligned}$$
(19)

Proof

According to the definition of \(\rho _{a}(\cdot )\), we have

$$\begin{aligned} \rho _{a}(|x|) = \frac{(a + 1)|x|}{a + |x|} = \frac{(a + 1)|x|}{a} - \frac{(a + 1)x^{2}}{a(a + |x|)} \le \frac{(a + 1)|x|}{a}. \end{aligned}$$

Hence, we get

$$\begin{aligned} T(R_{0})= & {} \sum _{i}\frac{(a + 1)\sigma _{i}(R_{0})}{a + \sigma _{i}(R_{0})} \le \frac{a + 1}{a}\sum _{i}\sigma _{i}(R_{0}) = \frac{a + 1}{a}\Arrowvert \sigma (R_{0}) \Arrowvert _{1}\\&\le \frac{a + 1}{a}\sqrt{2K}\Arrowvert R_{0} \Arrowvert _{F} \le \frac{a + 1}{a}\sqrt{2K}\Arrowvert R_{0} + R_{1} \Arrowvert _{F}. \end{aligned}$$

Thus, we get the desired result

$$\begin{aligned} \Arrowvert R_{0} + R_{1} \Arrowvert _{F} \ge \frac{aT(R_{0})}{(a + 1)\sqrt{2K}}. \end{aligned}$$

\(\square \)

Lemma 4

Let \(X \in {\mathbb {R}}^{m \times n}\) admit the SVD \(UDiag(\sigma (X))V^{T}\). For any \(a > 0\) and \(\alpha > \alpha _{1}\), where

$$\begin{aligned} \alpha _{1} = \frac{ar + r - 1}{a}\sigma _{1}(X) \end{aligned}$$
(20)

with \(r = rank(X)\), we have

$$\begin{aligned} T(\alpha ^{-1}X) \le 1. \end{aligned}$$
(21)

Proof

Using the fact that \(\rho _{a}(x)\) is increasing in the range \([0, +\infty )\), we have

$$\begin{aligned} T(\alpha ^{-1}X)= & {} \sum _{i = 1}^{r} \rho _{a}(\sigma _{i}(\alpha ^{-1}X)) \le r\rho _{a}(\sigma _{1}(\alpha ^{-1}X))\\= & {} \frac{\frac{1}{\alpha }r(a + 1)\sigma _{1}(X)}{a + \frac{1}{\alpha }\sigma _{1}(X)} = \frac{r(a + 1)\sigma _{1}(X)}{\sigma _{1}(X) + a\alpha }. \end{aligned}$$

To get \(T(\alpha ^{-1}X) \le 1\), it suffices to impose

$$\begin{aligned} \frac{r(a + 1)\sigma _{1}(X)}{\sigma _{1}(X) + a\alpha } \le 1, \end{aligned}$$

equivalently,

$$\begin{aligned} \alpha \ge \frac{ar + r - 1}{a}\sigma _{1}(X). \end{aligned}$$

We complete the proof. \(\square \)

Theorem 2

For any \(a > 0\) and \(\alpha > \frac{ar_{c} + r_{c} - 1}{a}\sigma _{1}(R_{c})\), we have

$$\begin{aligned} \sum _{i \ge 2}\Arrowvert \alpha ^{-1}R_{i} \Arrowvert _{F} \le \sum _{i \ge 2}\frac{T(\alpha ^{-1}R_{i -1})}{\sqrt{P}} \le \frac{T(\alpha ^{-1}R_{0})}{\sqrt{P}}, \end{aligned}$$
(22)

where \(r_{c} = rank(R_{c})\).

Proof

According to the definition of \(T(\cdot )\) and Lemma 4, we have

$$\begin{aligned} \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) \le T(\alpha ^{-1}R_{j}) \le 1, \; \forall i \in I_{j}. \end{aligned}$$

Hence, we have

$$\begin{aligned} \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) = \frac{(a + 1)\sigma _{i}(\alpha ^{-1}R_{j})}{a + \sigma _{i}(\alpha ^{-1}R_{j})} \le 1 \Leftrightarrow \sigma _{i}(\alpha ^{-1}R_{j}) \le 1, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \sigma _{i}(\alpha ^{-1}R_{j}) \le \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})). \end{aligned}$$

By Definition 2, we have

$$\begin{aligned} \sigma _{i}(R_{j}) \le \sigma _{i'}(R_{j - 1}), \; \forall i \in I_{j}, i' \in I_{j - 1}, \end{aligned}$$

and the rank of each \(R_{j}\) is not greater than P.

Since \(\rho _{a}(t)\) is increasing on \((0, \infty )\), averaging \(\rho _{a}(\sigma _{i'}(\alpha ^{-1}R_{j - 1}))\) over the P indices \(i' \in I_{j - 1}\) gives

$$\begin{aligned} \sigma _{i}(\alpha ^{-1}R_{j}) \le \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) \le \frac{T(\alpha ^{-1}R_{j - 1})}{P}. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} \Arrowvert \alpha ^{-1}R_{j} \Arrowvert _{F} \le \frac{T(\alpha ^{-1}R_{j - 1})}{\sqrt{P}}, \end{aligned}$$

and

$$\begin{aligned} \sum _{j \ge 2}\Arrowvert \alpha ^{-1}R_{j} \Arrowvert _{F} \le \sum _{j \ge 2}\frac{T(\alpha ^{-1}R_{j - 1})}{\sqrt{P}}. \end{aligned}$$

This establishes the first part of (22). The second part of (22) follows immediately from Lemma 3.

We complete the proof. \(\square \)

In order to show that the minimizers of problem (5) are the optimal solutions of problem (1), we introduce the following rank restricted isometry property (rank-RIP) condition.

Definition 3

(see [36]) For every integer r with \(1 \le r \le m\), let the restricted isometry constant \(\delta _{r}({\mathcal {A}})\) be the minimum number such that

$$\begin{aligned} (1 - \delta _{r}({\mathcal {A}}))\Arrowvert X \Arrowvert _{F}^{2} \le \Arrowvert {\mathcal {A}}(X) \Arrowvert _{2}^{2} \le (1 + \delta _{r}({\mathcal {A}}))\Arrowvert X \Arrowvert _{F}^{2}, \end{aligned}$$
(23)

holds for all \(X \in {\mathbb {R}}^{m \times n}\) of rank at most r.

Armed with the rank-RIP condition, the following theorem reveals that the original problem (1) and its nonconvex relaxation (5) share the same solutions when \(\delta _{3K} < \frac{a^{2} - 4a - 2}{5a^{2} + 4a + 2}\).

Theorem 3

Assume that there is a number \(P > 2K\), where \(K = rank(X_{0})\), such that

$$\begin{aligned} \delta _{P}({\mathcal {A}}) + \frac{P}{2K}\delta _{2K + P}({\mathcal {A}}) < \frac{P}{2K} - 1. \end{aligned}$$
(24)

Then, there exists \(a^{*} > 0\), such that for any \(a^{*}< a < +\infty \), \(X_{1} = X_{0}\).

Proof

Let

$$\begin{aligned} f(a) = \frac{a^{2}}{(a + 1)^{2}}\frac{P}{2K}\left( 1 - \delta _{2K + P}({\mathcal {A}})\right) - \left( 1 + \delta _{P}({\mathcal {A}})\right) . \end{aligned}$$
(25)

It is easy to verify that the function f is continuous and increasing in the range \([0, +\infty )\). Note that at \(a = 0\),

$$\begin{aligned} f(0) = -1 - \delta _{P}({\mathcal {A}}) < 0, \end{aligned}$$

and

$$\begin{aligned} f(a) = \frac{P}{2K}\left( 1 - \delta _{2K + P}({\mathcal {A}})\right) - \left( 1 + \delta _{P}({\mathcal {A}})\right) > 0, \quad as \quad a \rightarrow +\infty . \end{aligned}$$

Thus, there exists a constant \(a^{*} > 0\) such that \(f(a^{*}) = 0\). Then, for any \(a^{*}< a < +\infty \), we get

$$\begin{aligned} \frac{a}{a + 1}\sqrt{\frac{1 - \delta _{2K + P}({\mathcal {A}})}{2K}} - \sqrt{\frac{1 + \delta _{P}({\mathcal {A}})}{P}} > 0. \end{aligned}$$
(26)

Since \({\mathcal {A}}(R) = {\mathcal {A}}(X_{1} - X_{0}) = b - b = 0\), we obtain

$$\begin{aligned} 0= & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R) \Arrowvert _{2} = \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{c}) \Arrowvert _{2}\\= & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1}) + \sum _{i \ge 2}{\mathcal {A}}(\alpha ^{-1}R_{i}) \Arrowvert _{2}\\\ge & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1})\Arrowvert _{2} - \left\Arrowvert \sum _{i \ge 2}{\mathcal {A}}(\alpha ^{-1}R_{i}) \right\Arrowvert _{2}\\\ge & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1})\Arrowvert _{2} - \sum _{i \ge 2}\Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{i}) \Arrowvert _{2}\\\ge & {} \sqrt{1 - \delta _{2K + P}({\mathcal {A}})}\Arrowvert \alpha ^{-1}R_{0} + \alpha ^{-1}R_{1} \Arrowvert _{F} - \sqrt{1 + \delta _{P}({\mathcal {A}})}\sum _{i \ge 2}\Arrowvert \alpha ^{-1}R_{i} \Arrowvert _{F}. \end{aligned}$$

Using Theorems 1 and 2, we get

$$\begin{aligned} 0\ge & {} \sqrt{1 - \delta _{2K + P}({\mathcal {A}})}\frac{aT(\alpha ^{-1}R_{0})}{(a + 1)\sqrt{2K}} - \sqrt{1 + \delta _{P}({\mathcal {A}})}\frac{1}{\sqrt{P}}T(\alpha ^{-1}R_{0})\\= & {} \left( \frac{a}{a + 1}\sqrt{\frac{1 - \delta _{2K + P}({\mathcal {A}})}{2K}} - \sqrt{\frac{1 + \delta _{P}({\mathcal {A}})}{P}}\right) T(\alpha ^{-1}R_{0}). \end{aligned}$$

According to inequality (26) and the definition of \(T(\cdot )\), we have \(T(\alpha ^{-1}R_{0}) = 0\), which implies that \(R_{0} = 0\). This, together with Lemma 3, yields \(R_{c} = 0\). Therefore, \(R = 0\), i.e., \(X_{1} = X_{0}\).

We complete the proof. \(\square \)

Remark 1

In Theorem 3, if we let \(P = 3K\), inequality (26) becomes

$$\begin{aligned} \frac{3a^{2}}{2(a + 1)^{2}}(1 - \delta _{5K}({\mathcal {A}})) > 1 + \delta _{3K}({\mathcal {A}}). \end{aligned}$$
(27)

This inequality yields the condition \(\delta _{3K} < \frac{a^{2} - 4a - 2}{5a^{2} + 4a + 2}\) (identifying \(\delta _{5K}\) with \(\delta _{3K}\)); in particular, the bound approaches \(\delta _{3K} < 0.2\) as \(a \rightarrow \infty \).

Although the minimizers of problem (1) can be exactly obtained by solving the nonconvex relaxation problem (5), the resulting optimization problem is still NP-hard. To overcome this difficulty, we focus on its regularized version (6). The following theorem addresses this issue.

Theorem 4

Let \(\{\lambda _{p}\}\) be a decreasing sequence of positive numbers with \(\lambda _{p} \rightarrow 0\), and let \(X_{\lambda _{p}}\) be the optimal minimizer of problem (6) with \(\lambda = \lambda _{p}\). Assume that problem (5) is feasible. Then the sequence \(\{X_{\lambda _{p}}\}\) is bounded, and any of its accumulation points is the optimal minimizer of problem (5).

Proof

Assume that problem (5) is feasible and let \({\tilde{X}}\) be any feasible point; then \({\mathcal {A}}({\tilde{X}}) = b\). Since \(X_{\lambda _{p}}\) is the optimal minimizer of problem (6) with \(\lambda = \lambda _{p}\), we get

$$\begin{aligned}&\max \left\{ \lambda _{p}T(X_{\lambda _{p}}), \Arrowvert {\mathcal {A}}(X_{\lambda _{p}}) - b \Arrowvert _{2}^{2}\right\} \le \Arrowvert {\mathcal {A}}(X_{\lambda _{p}}) - b \Arrowvert _{2}^{2} + \lambda _{p}T(X_{\lambda _{p}})\nonumber \\&\quad \le \Arrowvert {\mathcal {A}}({\tilde{X}}) - b \Arrowvert _{2}^{2} + \lambda _{p}T({\tilde{X}})\nonumber \\&\quad = \lambda _{p}T({\tilde{X}}). \end{aligned}$$
(28)

Thus, the sequence \(\{X_{\lambda _{p}}\}\) is bounded and has at least one accumulation point. Let \(X^{*}\) be any accumulation point of \(\{X_{\lambda _{p}}\}\). From (28), we get \({\mathcal {A}}(X^{*}) = b\), which means that \(X^{*}\) is a feasible point of problem (5). Together with \(T(X^{*}) \le T({\tilde{X}})\) and the arbitrariness of \({\tilde{X}}\), this shows that \(X^{*}\) is the optimal minimizer of problem (5).

We complete the proof. \(\square \)

4 The proposed algorithm and its convergence analysis

In this section, we propose a novel and highly efficient algorithm that solves the nonconvex relaxation problem (6) at a convergence rate of O(1/N). The proposed algorithm improves on state-of-the-art algorithms in both convergence speed and low-rank recovery capability.

First, the power method [37] is adopted to obtain an approximate SVD. As shown in [33] and [34], in real applications it is often too expensive to compute the proximal operator exactly. To speed up the convergence of the proposed algorithm, it is desirable to evaluate the proximal operator by computing the SVD of a smaller matrix. The following lemma addresses this issue.

Lemma 5

For any fixed \(\lambda > 0\), assume that \(X = QQ^{T}X \in {\mathbb {R}}^{m \times n}\), where \(Q \in {\mathbb {R}}^{m \times t} (t \ll n)\) has orthonormal columns. Then,

$$\begin{aligned} prox_{\frac{\lambda }{\tau }T(\cdot )}(X) = Qprox_{\frac{\lambda }{\tau }T(\cdot )}(Q^{T}X). \end{aligned}$$
(29)

Proof

The proof follows the same lines as Proposition 1 in [38] and is omitted here. \(\square \)
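
The way Lemma 5 is typically exploited can be sketched as follows: a few power (subspace) iterations yield an approximate orthonormal range basis Q, and the proximal operator is then applied to the much smaller matrix \(Q^{T}X\). The sketch reuses `prox_ts1_matrix` from Sect. 2.2's sketch; the random start, rank guess, and number of power iterations are illustrative assumptions rather than the exact settings of Algorithm 1.

```python
import numpy as np

def power_method(X, R, n_power=3):
    """Approximate an orthonormal basis Q of the range of X by block power iterations,
    starting from a block R with t << n columns (random or warm-started)."""
    Q, _ = np.linalg.qr(X @ R)
    for _ in range(n_power):
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    return Q

def approx_prox_ts1(X, lam, a, rank_guess=10):
    """Inexact proximal step in the spirit of Lemma 5: prox(X) ~ Q prox(Q^T X)."""
    Q = power_method(X, np.random.randn(X.shape[1], rank_guess))
    return Q @ prox_ts1_matrix(Q.T @ X, lam, a)   # SVD is taken only of the small t x n matrix Q^T X
```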

Since the power method is employed in our algorithm to obtain an approximate SVD, the proximal step may only be solved inexactly, meaning that

$$\begin{aligned} X_{k + 1}= & {} prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k})\nonumber \\= & {} \left\{ U \,\Big |\, \frac{\lambda }{\tau }T(U) + \frac{1}{2}\Arrowvert U - Y_{k} \Arrowvert _{F}^{2} \le \xi _{k} + \frac{\lambda }{\tau }T(V) + \frac{1}{2}\Arrowvert V - Y_{k} \Arrowvert _{F}^{2}, \; \forall V \in {\mathbb {R}}^{m \times n}\right\} . \end{aligned}$$
(30)

As indicated in [34], although the inexact proximal steps are employed in our algorithm, the basic properties stay the same.

Second, the convergence rate of our proposed algorithm is further accelerated by Nesterov's rule [39], which is a widely used technique for speeding up first-order algorithms. In this paper, we integrate \(\mathrm {APGnc}^{+}\) [40] with our proposed algorithm; the details can be found in Algorithm 1. It should be pointed out that our proposed ATS1PGA algorithm differs substantially from \(\mathrm {APGnc}^{+}\): we focus on the transformed Schatten-1 regularizer, which induces lower rank and achieves better recovery performance than the nonconvex regularizer used in \(\mathrm {APGnc}^{+}\).

Algorithm 1 ATS1PGA

The following theorem shows that the objective function always decreases and that any accumulation point of the sequence generated by Algorithm 1 is a stationary point.

Theorem 5

Let \(F(X) = \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda T(X)\) and \(\{X_{k}\}\) be the sequence generated via Algorithm 1 with \(\tau \ge \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\). Assume that \(\xi _{k} \le \delta \Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2}\) and \(\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta > 0\), then

  1. the sequence \(\{X_{k}\}\) is bounded, and has at least one accumulation point;

  2. the sequence \(\{F(X_{k})\}\) is monotonically decreasing and converges to \(F(X^{*})\), where \(X^{*}\) is any accumulation point of \(\{X_{k}\}\);

  3. \(\lim _{k \rightarrow \infty }\Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} = 0\);

  4. \(X^{*}\) is a stationary point of (6).

To prove Theorem 5, we first introduce the following lemma.

Lemma 6

If \(\xi _{k} \le \delta \Arrowvert A - Y_{k} \Arrowvert _{F}^{2}\), then

$$\begin{aligned} F(A) \le F(Y_{k}) - \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2}, \end{aligned}$$
(31)

where \(A = prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k} - \frac{1}{\tau }\nabla f(Y_{k}))\).

Proof

Let \(f(A) = \frac{1}{2}\Arrowvert {\mathcal {A}}(A) - b \Arrowvert _{2}^{2}\). Since the gradient of f is \(\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\)-Lipschitz continuous, we have

$$\begin{aligned} f(A) \le f(Y_{k}) + <A - Y_{k}, \nabla f(Y_{k})> + \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\Arrowvert A - Y_{k} \Arrowvert _{F}^{2}. \end{aligned}$$

According to the definition of the inexact proximal operator for each step, we have that

$$\begin{aligned}&\frac{\lambda }{\tau } T(A) + \frac{1}{2}\Arrowvert A - Y_{k} + \frac{1}{\tau }\nabla f(Y_{k}) \Arrowvert _{F}^{2}\\&\quad \le \xi _{k} + \frac{\lambda }{\tau } T(Y_{k}) + \frac{1}{2}\Arrowvert \frac{1}{\tau }\nabla f(Y_{k}) \Arrowvert _{F}^{2}, \end{aligned}$$

and thus we can simplify it as

$$\begin{aligned} \lambda T(A) \le \tau \xi _{k} + \lambda T(Y_{k}) - \frac{\tau }{2}\Arrowvert A - Y_{k} \Arrowvert _{F}^{2} - <A - Y_{k}, \nabla f(Y_{k})>. \end{aligned}$$

Therefore, we have that

$$\begin{aligned} F(A)= & {} f(A) + \lambda T(A)\\\le & {} f(Y_{k}) +<\nabla f(Y_{k}), A - Y_{k}> + \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\Arrowvert A - Y_{k}\Arrowvert _{F}^{2}\\&+ \lambda T(Y_{k}) - <\nabla f(Y_{k}), A - Y_{k}> - \frac{\tau }{2}\Arrowvert A - Y_{k}\Arrowvert _{F}^{2} + \tau \xi _{k}\\= & {} F(Y_{k}) - \left( \frac{\tau - \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2} + \tau \xi _{k}\\\le & {} F(Y_{k}) -\left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2}. \end{aligned}$$

We complete the proof. \(\square \)

Now, we are ready to prove Theorem 5.

Proof

(1) and (2) By applying Lemma 6, we obtain that

$$\begin{aligned} F(X_{k + 1}) \le F(Y_{k}) - \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2}. \end{aligned}$$
(32)

Since \(\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta > 0\), it follows that \(F(X_{k + 1}) \le F(Y_{k})\). Besides, according to the update rule of Algorithm 1, we have \(F(Y_{k + 1}) \le F(X_{k + 1})\). In summary, for all k the following inequality holds:

$$\begin{aligned} F(Y_{k + 1}) \le F(X_{k + 1}) \le F(Y_{k}) \le F(X_{k}), \end{aligned}$$
(33)

which shows that \(\{F(X_{k})\}\) is monotonically decreasing and converges to some constant \(F^{*}\). Since \(\{X_{k}\} \subset \{X: F(X) \le F(X_{0})\}\), which is bounded, it follows that \(\{X_{k}\}\) is bounded and, therefore, has at least one accumulation point \(X^{*}\). Moreover, using the continuity of F and the monotonicity of \(\{F(X_{k})\}\), we obtain \(F(X_{k}) \rightarrow F^{*} = F(X^{*})\) as \(k \rightarrow +\infty \).

(3) From (32) and (33), we have

$$\begin{aligned} \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} \le F(X_{k}) - F(X_{k + 1}). \end{aligned}$$
(34)

Summing (34) from \(k = 1\) to m, we have

$$\begin{aligned} \sum _{k = 1}^{m}\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} \le \frac{1}{\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta }(F(X_{1}) - F(X_{m + 1})). \end{aligned}$$
(35)

Letting \(m \rightarrow \infty \), (35) becomes

$$\begin{aligned} \sum _{k = 1}^{\infty }\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} \le \frac{1}{\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta }(F(X_{1}) - F(X^{*})). \end{aligned}$$
(36)

Thus,

$$\begin{aligned} \sum _{k = 1}^{\infty }\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} < \infty , \end{aligned}$$
(37)

Finally, we have that

$$\begin{aligned} \lim _{k \rightarrow \infty }\Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} = 0, \end{aligned}$$
(38)

This also implies that \(\{X_{k}\}\) and \(\{Y_{k}\}\) share the same set of limit points.

(4) Let \(\{X_{k_{j}}\}\) and \(\{Y_{k_{j}}\}\) be convergent subsequences of \(\{X_{k}\}\) and \(\{Y_{k}\}\), respectively, and let \(X^{*}\) be their common limit point, i.e.,

$$\begin{aligned} X_{k_{j}} \rightarrow X^{*}, \quad Y_{k_{j}} \rightarrow X^{*} \quad as \quad k_{j} \rightarrow +\infty . \end{aligned}$$
(39)

From

$$\begin{aligned}&\Arrowvert X_{k_{j + 1}} - X^{*} \Arrowvert _{F} \le \Arrowvert X_{k_{j + 1}} - X_{k_{j}} \Arrowvert _{F}\\&\quad + \Arrowvert X_{k_{j}} - X^{*} \Arrowvert _{F} \rightarrow 0, \quad as \quad k_{j} \rightarrow +\infty , \end{aligned}$$

we get

$$\begin{aligned} X_{k_{j + 1}} \rightarrow X^{*} \quad as \quad k_{j} \rightarrow +\infty . \end{aligned}$$
(40)

Since \(X_{k_{j + 1}} = prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k_{j}} - \frac{1}{\tau }\nabla f(Y_{k_{j}}))\), we have, for any \(X \in {\mathbb {R}}^{m \times n}\),

$$\begin{aligned} \frac{1}{2} \left \Arrowvert X_{k_{j + 1}} - \left[ Y_{k_{j}} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(Y_{k_{j}}))\right] \right\Arrowvert _{F}^{2} +\frac{\lambda }{\tau }T(X_{k_{j + 1}})\\ \le \frac{1}{2} \left \Arrowvert X - \left[ Y_{k_{j}} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(Y_{k_{j}}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X). \end{aligned}$$

From (39) and (40), we immediately get for any \(X \in {\mathbb {R}}^{m \times n}\) that

$$\begin{aligned} \frac{1}{2}\left\Arrowvert X^{*} -\left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X^{*})\\ \le \frac{1}{2}\left\Arrowvert X - \left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X), \end{aligned}$$

Therefore, \(X^{*}\) is a minimizer of the following function:

$$\begin{aligned} X \rightarrow \frac{1}{2} \left \Arrowvert X - \left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X). \end{aligned}$$

That is, \(X^{*}\) is a fixed point of iteration (14), and hence a stationary point of problem (6). We complete the proof. \(\square \)

The proposed ATS1PGA algorithm can be directly applied to matrix completion problems by replacing its step 4 with

$$\begin{aligned} B = Y - \frac{1}{\tau }{\mathcal {P}}_{\varOmega }(W - Y), \end{aligned}$$
(41)

where \(\varOmega \) denotes the indices of the observed entries, and \({\mathcal {P}}_{\varOmega }\) is defined as

$$\begin{aligned}{}[{\mathcal {P}}_{\varOmega }(M)]_{ij} = {\left\{ \begin{array}{ll} M_{ij}, &{} \text { if }\varOmega _{ij} = 1\\ 0, &{} \text { if }\varOmega _{ij} = 0. \end{array}\right. } \end{aligned}$$

The main computational cost in forming B is the evaluation of \({\mathcal {P}}_{\varOmega }(W - Y)\), which takes \(O(\Arrowvert \varOmega \Arrowvert _{1}{\bar{r}})\) time, where \({\bar{r}}\) is the rank of \(W - Y\). The power method is performed in step 5 and takes \(O(mnt_{k})\) time, where \(t_{k}\) is the number of columns of R. In step 6, an SVD of the smaller matrix \(Q^{T}B\) is performed, which takes only \(O(mt_{k}^{2})\) time. Moreover, by exploiting the “sparse plus low rank” structure of (41), the complexity of the proximal step is \(O((m + n)t_{k}^{2} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\) when \(Y_{k + 1} = V_{k + 1}\) in step 13, or \(O((m + n){\bar{r}} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\) when \(Y_{k + 1} = X_{k + 1}\). In summary, the per-iteration time complexity of ATS1PGA is \(O((m + n)t_{k}^{2} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\), where \(t_{k} \ll n\) and \(\Arrowvert \varOmega \Arrowvert _{1} \ll mn\).
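
As an illustration of the “sparse plus low rank” trick behind the \(O(\Arrowvert \varOmega \Arrowvert _{1}{\bar{r}})\) cost, the sketch below evaluates \({\mathcal {P}}_{\varOmega }(W - Y)\) when the iterate is kept in factored form \(Y = UV^{T}\), touching only the observed entries; the coordinate-list data layout and function name are our own assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def masked_residual(U, V, rows, cols, vals, shape):
    """Sparse residual P_Omega(W - Y) for a low-rank iterate Y = U @ V.T.
    rows, cols, vals store the observed entries of W; only those positions are evaluated."""
    y_obs = np.einsum('ij,ij->i', U[rows], V[cols])    # Y_{ij} = <U_i, V_j> at each observed (i, j)
    return csr_matrix((vals - y_obs, (rows, cols)), shape=shape)
```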

5 Numerical experiments

Fig. 1 Example of low-rank image. a One 512 \(\times \) 512 image. b First 300 singular values of Barbara

In this section, extensive experimental results on real-world data are presented to demonstrate that our proposed algorithm has a faster convergence rate and better low-rank restoration capability than state-of-the-art algorithms. We compare the ATS1PGA algorithm with the following representative algorithms:

  • SVT [13]: a widely used nuclear-norm-based algorithm which is inspired by the linearized Bregman iterations for compressed sensing.

  • APG [14]: a nuclear-norm-based algorithm which extends a fast iterative shrinkage thresholding algorithm from the vector case to the matrix case.

  • FPCA [41]: an algorithm that deals with the same problem as APG while employing a fast Monte Carlo algorithm for approximate SVD.

  • R1MP [36]: an algorithm which extends the orthogonal matching pursuit algorithm from the vector case to the matrix case.

  • IRNN [24]: a nonconvex algorithm which replaces the nuclear norm by a reweighted nuclear norm. We choose SCAD in this work, as it consistently generates the best results in our tests.

  • FaNCL [25]: a nonconvex algorithm that deals with the same problem as IRNN while achieving an improved convergence rate. We also select SCAD in this work.

  • TS1 [26]: a nonconvex algorithm based on the transformed Schatten-1 penalty function.

Fig. 2 The 20 test grayscale images for image recovery

As shown in Fig. 1, an image usually exhibits a low-rank or approximately low-rank structure. Thus, the problem of recovering an incomplete image can be treated as a low-rank matrix recovery problem. To test the effectiveness of our algorithm on real data, we compare it with the state-of-the-art algorithms on the 20 widely used images shown in Fig. 2. The size of the first 7 images is \(256 \times 256\), the size of the following 10 images is \(512 \times 512\), and the size of the last 3 images is \(1024 \times 1024\). In our tests, two different types of mask are considered.

  • Random mask: given an image, we randomly exclude \(\alpha \%\) of the pixels, and the remaining ones serve as the observations.

  • Text mask: the text may cover some important texture information of a given image.

In the following experiments, the parameters of each algorithm are set to their recommended values. For our proposed algorithm, we use the same parameter values as the TS1 algorithm [26]. All the algorithms are stopped when the difference in objective values between consecutive iterations becomes smaller than \(10^{-4}\). All the algorithms are implemented in MATLAB R2014a on a Windows Server 2008 system with an Intel Xeon E5-2680-v4 CPU (3 cores, 2.4 GHz) and 256 GB memory.

5.1 Image inpainting with random mask

Table 1 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 10\). CPU time is in seconds
Table 2 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 15\). CPU time is in seconds

Image inpainting is one of the most basic problems in the field of image processing; it aims to recover the missing pixels of an incomplete image from very limited information. In the following tests, we first consider a relatively easy low-rank matrix recovery problem, in which the incomplete image data are corrupted with noise. Specifically, let the matrix X represent the image data. Before sampling the missing pixels, we first generate a noise matrix N with i.i.d. elements drawn from the Gaussian distribution \({\mathcal {N}}(0, s)\), and then set \(X = X + N\) as the observed matrix. Using the same setup as in [16], we randomly exclude \(50 \%\) of the pixels in each image, and the remaining ones are used as the observations; a sketch of this observation model is given below. We use sr to denote the sample ratio.
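
For concreteness, the observation model can be sketched as follows, treating s as the standard deviation of the Gaussian noise and marking observed pixels with a Boolean mask; the seed and function name are illustrative.

```python
import numpy as np

def make_observation(X, s, sample_ratio=0.5, seed=0):
    """Add i.i.d. Gaussian noise to the image and keep a random fraction of the pixels."""
    rng = np.random.default_rng(seed)
    noisy = X + rng.normal(0.0, s, size=X.shape)   # noise matrix N
    mask = rng.random(X.shape) < sample_ratio      # True marks an observed pixel
    return noisy * mask, mask
```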

The performance of all algorithms is evaluated in terms of: (1) the peak signal-to-noise ratio (PSNR) [42]; (2) the running time. We vary s in the range \(\{10, 15, 20, 30\}\). The results for the different noise levels, averaged over 10 independent runs, are reported.
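
PSNR is computed in the standard way; the sketch below assumes 8-bit grayscale images with peak value 255.

```python
import numpy as np

def psnr(X_rec, X_true, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between the recovered and the ground-truth image."""
    mse = np.mean((np.asarray(X_rec, dtype=float) - np.asarray(X_true, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```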

Table 3 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 20\). CPU time is in seconds
Table 4 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 30\). CPU time is in seconds
Fig. 3 Recovered images by using different algorithms on image 20 with \(sr = 0.5\) and \(s = 20\)

Fig. 4 Low-rank matrix recovery results on image data. We depict the PSNR versus the running time

It is seen from Table 1 that the ATS1PGA algorithm achieves higher PSNR values than the other algorithms, with the exception of TS1. We also find that TS1 achieves the most accurate solution on 9 of the images. However, ATS1PGA runs much faster than TS1; in fact, our proposed algorithm is the fastest. From Tables 2 and 3, we observe that the noise degrades the performance of TS1, whereas ATS1PGA exhibits a trend similar to that in Table 1. From Table 4, we find that our proposed algorithm outperforms all competing algorithms on nearly all images. We also present the images recovered by the different algorithms in Fig. 3, which shows that the images obtained by ATS1PGA contain more details than those obtained by the other state-of-the-art algorithms. The convergence efficiency of ATS1PGA is investigated as well, and the empirical results in Fig. 4 show that ATS1PGA performs best among all competing algorithms. Therefore, taking both accuracy and efficiency into consideration, our ATS1PGA algorithm has the best recovery performance among all state-of-the-art algorithms.

5.2 Image inpainting with text mask

Fig. 5 Recovery results of the eight methods on Lenna with text noise. CPU time is in seconds. a Lenna. b Lenna with text. c SVT, Time = 14.08. d APG, Time = 9.37. e FPCA, Time = 18.36. f R1MP, Time = 6.96. g IRNN, Time = 9.38. h FaNCL, Time = 4.42. i TS1, Time = 55.98. j ATS1PGA, Time = 2.79. These results show that our proposed algorithm outperforms the competing methods

Fig. 6 Recovery results of the eight methods on Boat with text noise. CPU time is in seconds. a Boat. b Boat with text. c SVT, Time = 14.78. d APG, Time = 8.51. e FPCA, Time = 18.34. f R1MP, Time = 8.93. g IRNN, Time = 10.76. h FaNCL, Time = 5.40. i TS1, Time = 54.54. j ATS1PGA, Time = 3.15. These results show that our proposed algorithm outperforms the competing methods

In this section, we consider the text removal problem, where some of the pixels of an image are masked in a non-random fashion, such as by text overlaid on the image. Text removal is a challenging task in the field of image processing. To deal with it, the position of the text should be detected first, and the task then turns into a low-rank matrix recovery problem. Figures 5 and 6 present the empirical results of the eight low-rank matrix recovery methods. Specifically, for the example Lenna in Fig. 5, the PSNR values for SVT, APG, FPCA, R1MP, IRNN, FaNCL, TS1, and ATS1PGA are 5.53, 27.04, 13.54, 27.08, 26.63, 26.63, 29.79, and 27.76, respectively. For the example Boat in Fig. 6, the PSNR values for SVT, APG, FPCA, R1MP, IRNN, FaNCL, TS1, and ATS1PGA are 5.4, 26.04, 14.27, 26.57, 25.65, 25.65, 28.55, and 26.22, respectively. These results show that our ATS1PGA algorithm is better than SVT, APG, FPCA, R1MP, IRNN, and FaNCL for both Lenna and Boat, and only slightly worse than TS1. In terms of speed, ATS1PGA is the fastest of the eight methods; in particular, it is at least 15 times faster than TS1. Thus, we conclude that our ATS1PGA algorithm is competitive in handling the text removal task.

6 Conclusion and future work

This paper further investigated the basic properties of the TS1 penalty function and utilized it to deal with the problem of low-rank matrix recovery. Specifically, we theoretically proved that the original low-rank problem (1) can be equivalently transformed into problem (5) under certain conditions. Since the resulting optimization problem (5) is still NP-hard, we proved that its solutions can be obtained by solving the corresponding regularization problem. To provide an efficient and fast low-rank matrix recovery method, an algorithm combining inexact proximal steps with Nesterov's rule was proposed. A convergence analysis demonstrated that any accumulation point of the sequence generated by our algorithm is a stationary point. Finally, numerical experiments on real-world data sets demonstrated that our proposed algorithm is much faster than the SVT, APG, FPCA, R1MP, IRNN, FaNCL, and TS1 algorithms, while achieving comparable recovery performance. We conclude that our algorithm leads to impressive improvements over state-of-the-art methods in the field of low-rank matrix recovery.

In many real-world applications, it is not appropriate to assume that the observed entries are corrupted only by white Gaussian noise. This means that there is still much follow-up work to be done, such as developing more flexible models with the TS1 regularizer for low-rank matrix recovery when the observed entries are corrupted by non-Gaussian noise. Furthermore, developing fast, robust, scalable, and reliable algorithms remains a central issue for future work.