1 Introduction

The problem of recovering a matrix of minimum rank subject to linear constraints has attracted considerable attention in recent years. This problem arises in various real-world applications, such as recommender systems [1, 2], image processing [3,4,5], quality-of-service (QoS) prediction [6], and deep learning [7, 8]. In general, such a task can be formulated as the following low-rank minimization problem [9, 10]:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} rank(X) \quad s.t. \quad {\mathcal {A}}(X) = b, \end{aligned}$$
(1)

where \(X \in {\mathbb {R}}^{m \times n}\) is the low-rank matrix to be recovered, \(b \in {\mathbb {R}}^{d}\) is a given measurement vector, and \({\mathcal {A}}: {\mathbb {R}}^{m \times n} \rightarrow {\mathbb {R}}^{d}\) denotes a linear transformation. By adopting the regularization method, the constrained problem (1) can be converted into the following unconstrained minimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda rank(X), \end{aligned}$$
(2)

where \(\lambda > 0\) is a regularization parameter.

Unfortunately, the optimization problems (1) and (2) are computationally intractable due to the nonconvexity and discontinuity of the rank function. To overcome this difficulty, many researchers have suggested using the nuclear norm instead, which is known as the tightest convex surrogate of the rank function [11, 12]. Theoretical analysis shows that, under some mild conditions, the low-rank matrix can be exactly recovered with high probability by this scheme. Thus, a large number of methods have been proposed for the resulting nuclear norm optimization problem, such as singular value thresholding (SVT) [13], the accelerated proximal gradient with linesearch algorithm (APGL) [14], and accelerated inexact soft-impute (AIS-Impute) [15].

Since the methods mentioned above are simple, easy to use, and come with theoretical guarantees, nuclear norm based models have recently attracted significant attention in the field of low-rank matrix recovery. However, the performance of such a convex relaxation is not always satisfactory; that is, the solutions of the nuclear norm optimization problem may deviate from those of the original problem. The main reason is that the nuclear norm over-penalizes large singular values. To alleviate this limitation, a commonly used strategy is to replace the rank function with nonconvex surrogates, which approximate it more closely than the nuclear norm. Examples of such nonconvex surrogate functions include the \(l_{q}\)-norm \((0< q < 1)\) [16,17,18], the weighted nuclear norm (WNN) [19], the smoothly clipped absolute deviation (SCAD) [20], the mini-max concave penalty (MCP) [21], the log-sum penalty (LSP) [22], and so on. Although the resulting problem is nonconvex, non-smooth, and even non-Lipschitz, numerous methods have been proposed to handle it. In [16] and [23], the authors proposed fixed-point iterative schemes based on singular value thresholding operators. Their convergence analysis and empirical results show that these methods are fast and efficient. In [24], the iteratively reweighted nuclear norm (IRNN) method was proposed by exploiting the concavity and the decreasing supergradients of existing nonconvex regularizers. Since a computationally expensive singular value decomposition (SVD) is required in each iteration, the IRNN method converges slowly. To improve its speed and performance, the fast nonconvex low-rank (FaNCL) method [25] was proposed. The empirical results of all the methods mentioned above illustrate that nonconvex models outperform their convex counterparts.

Recently, the transformed Schatten-1 (TS1) penalty function [26], a matrix quasi-norm defined on the singular values, has been successfully applied to low-rank matrix recovery. The TS1 penalty is an extension of the transformed \(\ell _{1}\) (TL1) function, a class of \(\ell _{1}\)-based nonconvex penalty functions generalized by Lv and Fan in [27]. Kang et al. [28] demonstrated the high efficiency of the TL1 function when applied to robust principal component analysis. However, the TL1 penalty leads to a nonconvex optimization problem that is difficult to solve quickly and efficiently. Zhang et al. continued this line of work [29, 30] and pointed out that the TL1 proximal operator has a closed-form analytical solution for all parameter values. Based on this finding, in this paper, we consider the following TS1 penalty function:

$$\begin{aligned} T(X) = \sum _{i = 1}^{rank(X)}\rho _{a}(\sigma _{i}(X)) = \sum _{i = 1}^{rank(X)}\frac{(a + 1)\sigma _{i}(X)}{a + \sigma _{i}(X)}, \end{aligned}$$
(3)

where

$$\begin{aligned} \rho _{a}(|x|) = \frac{(a + 1)|x|}{a + |x|}, \end{aligned}$$
(4)

is a nonconvex function with parameter \(a \in (0, +\infty )\), and \(\sigma _{i}(X)\) denotes the ith singular value of matrix X. Therefore, the original problem (1) can be naturally converted into the following optimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} T(X) \quad s.t. \quad {\mathcal {A}}(X) = b. \end{aligned}$$
(5)

In practice, we often focus on its regularized version, which can be formulated as:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}} \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda T(X). \end{aligned}$$
(6)

It should be noted that the TS1 penalty function is more general than the nonconvex penalty function in [31]. Moreover, \(\rho _{a}\) with \(a \in (0, +\infty )\) yields solutions that are both unbiased and low-rank.
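
To make the penalty concrete, the following minimal Python sketch (our own illustration, not the authors' code; the function name is ours) evaluates \(T(X)\) in (3) by summing \(\rho _{a}\) over the singular values of X.

```python
import numpy as np

def ts1_penalty(X, a):
    """Evaluate the TS1 penalty T(X) in (3): the sum of rho_a over the singular values of X."""
    sigma = np.linalg.svd(X, compute_uv=False)   # singular values of X
    sigma = sigma[sigma > 0]                     # rho_a(0) = 0, so zero singular values add nothing
    return np.sum((a + 1.0) * sigma / (a + sigma))
```

Note that \(\rho _{a}\) interpolates between the rank and the nuclear norm: as \(a \rightarrow 0^{+}\) each nonzero singular value contributes approximately 1, so \(T(X) \rightarrow rank(X)\), whereas as \(a \rightarrow +\infty \), \(T(X)\) approaches the nuclear norm \(\Arrowvert X \Arrowvert _{*}\).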

In this paper, we further investigate the basic properties of the TS1 penalty function and propose a fast and efficient algorithm to solve problem (6). Specifically, we analyze the solution of rank minimization and prove that, under certain rank-RIP conditions, rank minimization can be equivalently transformed into unconstrained TS1 regularization, whose global minimizer can be obtained by TS1 thresholding functions. These thresholding functions are given in closed analytical form for all parameter values. Thus, an exact mathematical analysis provides a theoretical guarantee for applying TS1 regularization to low-rank matrix recovery problems. Our findings and contributions are listed as follows:

  1. We first show that the minimizer of problem (5) is actually the optimal solution of problem (1).

  2. We further establish the relationship between problems (5) and (6), and prove that the solution of problem (5) can be obtained by solving problem (6).

  3. Nesterov's rule and inexact proximal strategies are adopted to obtain a novel, highly efficient algorithm that solves the TS1 regularization problem at a convergence rate of O(1/N).

  4. Extensive empirical studies on image inpainting tasks validate the advantages of our proposed method.

The remainder of this paper is organized as follows. Section 2 introduces the related work. In Sect. 3, the equivalence between the minimizers of the relaxed problem and those of the original problem is established. Section 4 proposes an efficient optimization method with a rigorous convergence guarantee. The experimental results are reported and analyzed in Sect. 5. Finally, we conclude this paper in Sect. 6.

2 Background

2.1 Proximal algorithms

In this paper, we consider the following low-rank optimization problem:

$$\begin{aligned} \min _{X \in {\mathbb {R}}^{m \times n}}F(X) = f(X) + \lambda h(X), \end{aligned}$$
(7)

where f is bounded below and differentiable with \(L_{f}\)-Lipschitz continuous gradient, i.e., \(\Arrowvert \nabla f(X_{1}) - \nabla f(X_{2}) \Arrowvert _{F} \le L_{f}\Arrowvert X_{1} - X_{2} \Arrowvert _{F}\), h is a low-rank regularizer, and \(\lambda > 0\) is a given parameter. Over the past decades, the proximal algorithm [32] has received considerable attention and has been successfully applied to problem (7). Specifically, the proximal algorithm generates \(X_{k + 1}\) as

$$\begin{aligned} X_{k + 1} = prox_{\frac{\lambda }{\tau }h}\left( X_{k} - \frac{1}{\tau }\nabla f(X_{k})\right) , \end{aligned}$$
(8)

where \(\tau > 0\) is a parameter, and

$$\begin{aligned} prox_{\frac{\lambda }{\tau }h}(M) = \mathop {\arg \min }_{X \in {\mathbb {R}}^{m \times n}}\frac{1}{2}\Arrowvert X - M \Arrowvert ^{2}_{F} + \frac{\lambda }{\tau }h(X) \end{aligned}$$
(9)

denotes the proximal operator. Assuming that f and h are convex, Toh and Yun [14] proposed an accelerated proximal gradient (APG) algorithm with a convergence rate of \(O(1/N^{2})\). However, exactly solving the proximal operator may be expensive. To alleviate this difficulty, inexact proximal gradient methods have been proposed, and the theoretical analysis in [33] reveals that they share the same convergence rate as APG. Most recently, the inexact proximal gradient method has been extended to nonconvex and nonsmooth problems [34]. It should be pointed out that the nmAIPG algorithm in [34] is nearly the same as the nmAPG algorithm in [35], but it is much faster. Unfortunately, both the nmAIPG and nmAPG algorithms may involve two proximal steps in each iteration.
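
For illustration, a bare-bones sketch of the proximal iteration (8) is given below; `grad_f` and `prox_h` are assumed to be supplied by the caller, and all names and the fixed iteration count are ours.

```python
def proximal_gradient(grad_f, prox_h, X0, tau, lam, n_iter=100):
    """Generic proximal gradient iteration (8):
    X_{k+1} = prox_{(lam/tau) h}(X_k - grad_f(X_k) / tau).

    grad_f : callable X -> gradient of f at X
    prox_h : callable (M, t) -> argmin_X 0.5 * ||X - M||_F^2 + t * h(X)
    """
    X = X0.copy()
    for _ in range(n_iter):
        X = prox_h(X - grad_f(X) / tau, lam / tau)
    return X
```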

2.2 TS1 thresholding algorithm

We first define the proximal operator associated with the TS1 regularizer as follows:

$$\begin{aligned} prox_{\lambda }^{\rho _{a}}(y) = \mathop {\arg \min }_{x \in {\mathbb {R}}}\frac{1}{2}(y - x)^{2} + \lambda \rho _{a}(|x|), \end{aligned}$$
(10)

where \(\rho _{a}(\cdot )\) is defined in (4). As shown in [30], this nonconvex problem admits a closed-form optimal solution, as stated in the following lemma.

Lemma 1

(see [30]) For any \(\lambda > 0\) and \(y \in {\mathbb {R}}\), the solutions to the nonconvex problem (10) are

$$\begin{aligned} prox_{\lambda }^{\rho _{a}}(y) = \left\{ \begin{array}{lr} 0, &{} \mathrm{if}~|y| \le t\\ g_{\lambda }(y), &{} \mathrm{if}~|y| > t \end{array} \right. \end{aligned}$$
(11)

where

$$\begin{aligned} g_{\lambda }(y) = sgn(y)\left\{ \frac{2}{3}(a + |y|)cos\left( \frac{\phi (y)}{3}\right) - \frac{2a}{3} + \frac{|y|}{3}\right\} \end{aligned}$$

with \(\phi (y) = arccos(1 - \frac{27\lambda a(a + 1)}{2(a + |y|)^{3}})\), and

$$\begin{aligned} t = \left\{ \begin{array}{lr} \frac{\lambda (a + 1)}{a}, &{} \mathrm{if}~\lambda \le \frac{a^{2}}{2(a + 1)}\\ \sqrt{2\lambda (a + 1)} - \frac{a}{2}, &{} \mathrm{if}~\lambda > \frac{a^{2}}{2(a + 1)}. \end{array} \right. \end{aligned}$$
(12)
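
Lemma 1 translates directly into code. The following sketch (our own naming, for scalar inputs) implements (11)–(12); the `np.clip` call merely guards the arccos argument against floating-point round-off.

```python
import numpy as np

def ts1_threshold(lam, a):
    """Threshold t in (12)."""
    if lam <= a ** 2 / (2.0 * (a + 1.0)):
        return lam * (a + 1.0) / a
    return np.sqrt(2.0 * lam * (a + 1.0)) - a / 2.0

def prox_ts1_scalar(y, lam, a):
    """Scalar TS1 proximal operator (11): argmin_x 0.5*(y - x)^2 + lam*rho_a(|x|)."""
    if abs(y) <= ts1_threshold(lam, a):
        return 0.0
    # closed-form solution g_lambda(y) for |y| > t
    phi = np.arccos(np.clip(1.0 - 27.0 * lam * a * (a + 1.0) / (2.0 * (a + abs(y)) ** 3), -1.0, 1.0))
    return np.sign(y) * ((2.0 / 3.0) * (a + abs(y)) * np.cos(phi / 3.0) - 2.0 * a / 3.0 + abs(y) / 3.0)
```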

Based on this finding, the optimal solutions to problem (6) can be obtained by the proximal operator.

Lemma 2

(see [26]) Assume that \(\tau > \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\). Then the optimal solutions to problem (6) satisfy

$$\begin{aligned} X^{*} = prox_{\frac{\lambda }{\tau }T(\cdot )}(B_{\tau }(X^{*})) = UDiag\left( prox_{\frac{\lambda }{\tau }}^{\rho _{a}}(\sigma _{i}(B_{\tau }(X^{*})))\right) V^{T}, \end{aligned}$$
(13)

where \(B_{\tau }(X^{*}) = X^{*} - \frac{1}{\tau }{\mathcal {A}}^{*}({\mathcal {A}}(X^{*}) - b)\) and it admits SVD as \(UDiag(\sigma (B_{\tau }(X^{*})))V^{T}\).

Therefore, at the kth iteration,

$$\begin{aligned} X_{k + 1} = prox_{\frac{\lambda }{\tau }T(\cdot )}\left( X_{k} - \frac{1}{\tau }{\mathcal {A}}^{*}({\mathcal {A}}(X_{k}) - b)\right) . \end{aligned}$$
(14)

The TS1 algorithm [26] is obtained by iterating (14).
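
A minimal sketch of this iteration is shown below: the matrix proximal operator applies the scalar operator of Lemma 1 to each singular value, as in (13), and the loop implements (14). It reuses `prox_ts1_scalar` from the sketch above; the linear map \({\mathcal {A}}\) and its adjoint are passed as callables, and the fixed iteration count is an illustrative choice.

```python
import numpy as np

def prox_ts1_matrix(M, lam, a):
    """Matrix TS1 proximal operator (cf. (13)): apply the scalar prox to each singular value of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_new = np.array([prox_ts1_scalar(si, lam, a) for si in s])
    return (U * s_new) @ Vt                          # U Diag(s_new) V^T

def ts1_fixed_point(A, A_adj, b, X0, lam, tau, a, n_iter=200):
    """TS1 iteration (14): X_{k+1} = prox_{(lam/tau) T}(X_k - A*(A(X_k) - b) / tau)."""
    X = X0.copy()
    for _ in range(n_iter):
        X = prox_ts1_matrix(X - A_adj(A(X) - b) / tau, lam / tau, a)
    return X
```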

3 Equivalence of the minimizers of problems (1) and (5)

In this section, we further investigate the basic properties of the TS1 penalty function. Let \(X_{0}\) and \(X_{1}\) be minimizers of problems (1) and (5), respectively, and let \(R = X_{1} - X_{0}\). We partition R into two matrices \(R_{0}\) and \(R_{c}\), defined as follows.

Definition 1

Let R admit the SVD \(UDiag(\sigma (R))V^{T}\). The matrices \(R_{0}\) and \(R_{c}\) are defined as:

$$\begin{aligned} R_{0} = [U_{2K} \; 0]_{m \times m}\left[ \begin{matrix} \Sigma _{1} &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 \end{matrix} \right] _{m \times n} [V_{2K} \; 0]_{n \times n}^{T} \end{aligned}$$
(15)

and

$$\begin{aligned} R_{c} = [0 \; U_{m - 2K}]_{m \times m}\left[ \begin{matrix} 0 &{} 0 &{} 0 \\ 0 &{} \Sigma _{2} &{} 0 \end{matrix} \right] _{m \times n} [0 \; V_{m - 2K}]_{n \times n}^{T}. \end{aligned}$$
(16)

Definition 2

Define the index set \(I_{j} = \{P(j - 1) + 2K + 1, \ldots , Pj + 2K\}\) and partition \(R_{c}\) into a sum of matrices \(R_{1}, R_{2}, \ldots ,\) i.e.,

$$\begin{aligned} R_{c} = \sum _{j}R_{j}, \end{aligned}$$

where

$$\begin{aligned} R_{j} = [0 \; U_{I_{j}} \; 0]_{m \times m}\left[ \begin{matrix} 0 &{} 0 &{} 0 \\ 0 &{} \Sigma _{I_{j}} &{} 0\\ 0 &{} 0 &{} 0 \end{matrix} \right] _{m \times n} [0 \; V_{I_{j}} \; 0]_{n \times n}^{T}. \end{aligned}$$
(17)

First, we introduce the following useful results.

Lemma 3

Let \(K = rank(X_{0})\). Then

$$\begin{aligned} T(R_{c}) \le T(R_{0}). \end{aligned}$$
(18)

Proof

Since \(X_{1}\) is the minimizer of (5), we have

$$\begin{aligned} T(X_{0}) \ge T(X_{1})= & {} T(X_{0} + R) \ge T(X_{0} + R_{c}) - T(R_{0})\\= & {} T(X_{0}) + T(R_{c}) - T(R_{0}), \end{aligned}$$

the second inequality follows from Lemma 2.1 in [30] and the last equality follows from Lemma 3.4 in [12]. \(\square \)

Theorem 1

For any \(a > 0\), we have

$$\begin{aligned} \Arrowvert R_{0} + R_{1} \Arrowvert _{F} \ge \frac{aT(R_{0})}{(a + 1)\sqrt{2K}}. \end{aligned}$$
(19)

Proof

According to the definition of \(\rho _{a}(\cdot )\), we have

$$\begin{aligned} \rho _{a}(|x|) = \frac{(a + 1)|x|}{a + |x|} = \frac{(a + 1)|x|}{a} - \frac{(a + 1)x^{2}}{a(a + |x|)} \le \frac{(a + 1)|x|}{a}. \end{aligned}$$

Hence, we get

$$\begin{aligned} T(R_{0})= & {} \sum _{i}\frac{(a + 1)\sigma _{i}(R_{0})}{a + \sigma _{i}(R_{0})} \le \frac{a + 1}{a}\sum _{i}\sigma _{i}(R_{0}) = \frac{a + 1}{a}\Arrowvert \sigma (R_{0}) \Arrowvert _{1}\\&\le \frac{a + 1}{a}\sqrt{2K}\Arrowvert R_{0} \Arrowvert _{F} \le \frac{a + 1}{a}\sqrt{2K}\Arrowvert R_{0} + R_{1} \Arrowvert _{F}. \end{aligned}$$

Thus, we get the desired result

$$\begin{aligned} \Arrowvert R_{0} + R_{1} \Arrowvert _{F} \ge \frac{aT(R_{0})}{(a + 1)\sqrt{2K}}. \end{aligned}$$

\(\square \)

Lemma 4

Let \(X \in {\mathbb {R}}^{m \times n}\) admit the SVD \(UDiag(\sigma (X))V^{T}\). For any \(a > 0\) and \(\alpha > \alpha _{1}\), where

$$\begin{aligned} \alpha _{1} = \frac{ar + r - 1}{a}\sigma _{1}(X) \end{aligned}$$
(20)

with \(r = rank(X)\), we have

$$\begin{aligned} T(\alpha ^{-1}X) \le 1. \end{aligned}$$
(21)

Proof

Using the fact that \(\rho _{a}(x)\) is increasing in the range \([0, +\infty )\), we have

$$\begin{aligned} T(\alpha ^{-1}X)= & {} \sum _{i = 1}^{r} \rho _{a}(\sigma _{i}(\alpha ^{-1}X)) \le r\rho _{a}(\sigma _{1}(\alpha ^{-1}X))\\= & {} \frac{\frac{1}{\alpha }r(a + 1)\sigma _{1}(X)}{a + \frac{1}{\alpha }\sigma _{1}(X)} = \frac{r(a + 1)\sigma _{1}(X)}{\sigma _{1}(X) + a\alpha }. \end{aligned}$$

To get \(T(\alpha ^{-1}X) \le 1\), it suffices to impose

$$\begin{aligned} \frac{r(a + 1)\sigma _{1}(X)}{\sigma _{1}(X) + a\alpha } \le 1, \end{aligned}$$

equivalently,

$$\begin{aligned} \alpha \ge \frac{ar + r - 1}{a}\sigma _{1}(X). \end{aligned}$$

We complete the proof. \(\square \)

Theorem 2

For any \(a > 0\) and \(\alpha > \frac{ar_{c} + r_{c} - 1}{a}\sigma _{1}(R_{c})\), we have

$$\begin{aligned} \sum _{i \ge 2}\Arrowvert \alpha ^{-1}R_{i} \Arrowvert _{F} \le \sum _{i \ge 2}\frac{T(\alpha ^{-1}R_{i -1})}{\sqrt{P}} \le \frac{T(\alpha ^{-1}R_{0})}{\sqrt{P}}, \end{aligned}$$
(22)

where \(r_{c} = rank(R_{c})\).

Proof

According to the definition of \(T(\cdot )\) and Lemma 4, we have

$$\begin{aligned} \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) \le T(\alpha ^{-1}R_{j}) \le 1, \; \forall i \in I_{j}. \end{aligned}$$

Hence, we have

$$\begin{aligned} \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) = \frac{(a + 1)\sigma _{i}(\alpha ^{-1}R_{j})}{a + \sigma _{i}(\alpha ^{-1}R_{j})} \le 1 \Leftrightarrow \sigma _{i}(\alpha ^{-1}R_{j}) \le 1, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \sigma _{i}(\alpha ^{-1}R_{j}) \le \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})). \end{aligned}$$

By Definition 2, we have

$$\begin{aligned} \sigma _{i}(R_{j}) \le \sigma _{i'}(R_{j - 1}), \; \forall i \in I_{j}, i' \in I_{j - 1}, \end{aligned}$$

and the rank of each \(R_{j}\) is not greater than P.

Since \(\rho _{a}(t)\) is increasing on \((0, \infty )\), averaging \(\rho _{a}(\sigma _{i'}(\alpha ^{-1}R_{j - 1}))\) over the P indices \(i' \in I_{j - 1}\) gives

$$\begin{aligned} \sigma _{i}(\alpha ^{-1}R_{j}) \le \rho _{a}(\sigma _{i}(\alpha ^{-1}R_{j})) \le \frac{T(\alpha ^{-1}R_{j - 1})}{P}. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} \Arrowvert \alpha ^{-1}R_{j} \Arrowvert _{F} \le \frac{T(\alpha ^{-1}R_{j - 1})}{\sqrt{P}}, \end{aligned}$$

and

$$\begin{aligned} \sum _{j \ge 2}\Arrowvert \alpha ^{-1}R_{j} \Arrowvert _{F} \le \sum _{j \ge 2}\frac{T(\alpha ^{-1}R_{j - 1})}{\sqrt{P}}. \end{aligned}$$

This establishes the first part of (22). The second part of (22) follows immediately from Lemma 3.

We complete the proof. \(\square \)

In order to show that the minimizers of problem (5) are the optimal solutions of problem (1), we introduce the following rank restricted isometry property (rank-RIP) condition.

Definition 3

(see [36]) For every integer r with \(1 \le r \le m\), let the restricted isometry constant \(\delta _{r}({\mathcal {A}})\) be the minimum number such that

$$\begin{aligned} (1 - \delta _{r}({\mathcal {A}}))\Arrowvert X \Arrowvert _{F}^{2} \le \Arrowvert {\mathcal {A}}(X) \Arrowvert _{2}^{2} \le (1 + \delta _{r}({\mathcal {A}}))\Arrowvert X \Arrowvert _{F}^{2}, \end{aligned}$$
(23)

holds for all \(X \in {\mathbb {R}}^{m \times n}\) of rank at most r.

Armed with the rank-RIP condition, the following theorem reveals that the original problem (1) and its nonconvex relaxation (5) share the same solutions when \(\delta _{3K} < \frac{a^{2} - 4a - 2}{5a^{2} + 4a + 2}\).

Theorem 3

Assume that there is a number \(P > 2K\), where \(K = rank(X_{0})\), such that

$$\begin{aligned} \delta _{P}({\mathcal {A}}) + \frac{P}{2K}\delta _{2K + P}({\mathcal {A}}) < \frac{P}{2K} - 1. \end{aligned}$$
(24)

Then, there exists \(a^{*} > 0\), such that for any \(a^{*}< a < +\infty \), \(X_{1} = X_{0}\).

Proof

Let

$$\begin{aligned} f(a) = \frac{a^{2}}{(a + 1)^{2}}\frac{P}{2K}\left( 1 - \delta _{2K + P}({\mathcal {A}})\right) - \left( 1 + \delta _{P}({\mathcal {A}})\right) . \end{aligned}$$
(25)

It is easy to verify that the function f is continuous and increasing in the range \([0, +\infty )\). Note that at \(a = 0\),

$$\begin{aligned} f(0) = -1 - \delta _{P}({\mathcal {A}}) < 0, \end{aligned}$$

and

$$\begin{aligned} f(a) = \frac{P}{2K}\left( 1 - \delta _{2K + P}({\mathcal {A}})\right) - \left( 1 + \delta _{P}({\mathcal {A}})\right) > 0, \quad as \quad a \rightarrow +\infty . \end{aligned}$$

Thus, there exists a constant \(a^{*} > 0\) such that \(f(a^{*}) = 0\). Then, for any \(a^{*}< a < +\infty \), we get

$$\begin{aligned} \frac{a}{a + 1}\sqrt{\frac{1 - \delta _{2K + P}({\mathcal {A}})}{2K}} - \sqrt{\frac{1 + \delta _{P}({\mathcal {A}})}{P}} > 0. \end{aligned}$$
(26)

Since \({\mathcal {A}}(R) = {\mathcal {A}}(X_{1} - X_{0}) = b - b = 0\), we obtain

$$\begin{aligned} 0= & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R) \Arrowvert _{2} = \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{c}) \Arrowvert _{2}\\= & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1}) + \sum _{i \ge 2}{\mathcal {A}}(\alpha ^{-1}R_{i}) \Arrowvert _{2}\\\ge & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1})\Arrowvert _{2} - \left\Arrowvert \sum _{i \ge 2}{\mathcal {A}}(\alpha ^{-1}R_{i}) \right\Arrowvert _{2}\\\ge & {} \Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{0} + \alpha ^{-1}R_{1})\Arrowvert _{2} - \sum _{i \ge 2}\Arrowvert {\mathcal {A}}(\alpha ^{-1}R_{i}) \Arrowvert _{2}\\\ge & {} \sqrt{1 - \delta _{2K + P}({\mathcal {A}})}\Arrowvert \alpha ^{-1}R_{0} + \alpha ^{-1}R_{1} \Arrowvert _{F} - \sqrt{1 + \delta _{P}({\mathcal {A}})}\sum _{i \ge 2}\Arrowvert \alpha ^{-1}R_{i} \Arrowvert _{F}. \end{aligned}$$

Using Theorems 1 and 2, we get

$$\begin{aligned} 0\ge & {} \sqrt{1 - \delta _{2K + P}({\mathcal {A}})}\frac{aT(\alpha ^{-1}R_{0})}{(a + 1)\sqrt{2K}} - \sqrt{1 + \delta _{P}({\mathcal {A}})}\frac{1}{\sqrt{P}}T(\alpha ^{-1}R_{0})\\= & {} \left( \frac{a}{a + 1}\sqrt{\frac{1 - \delta _{2K + P}({\mathcal {A}})}{2K}} - \sqrt{\frac{1 + \delta _{P}({\mathcal {A}})}{P}}\right) T(\alpha ^{-1}R_{0}). \end{aligned}$$

According to inequality (26) and the definition of \(T(\cdot )\), we have \(T(\alpha ^{-1}R_{0}) = 0\), which implies that \(R_{0} = 0\). This, together with Lemma 3, yields \(R_{c} = 0\). Therefore, \(R = 0\), i.e., \(X_{1} = X_{0}\).

We complete the proof. \(\square \)

Remark 1

In Theorem 3, if we let \(P = 3K\), inequality (26) becomes

$$\begin{aligned} \frac{3a^{2}}{2(a + 1)^{2}}(1 - \delta _{5K}({\mathcal {A}})) > 1 + \delta _{3K}({\mathcal {A}}). \end{aligned}$$
(27)

This inequality yields the condition \(\delta _{3K} < \frac{a^{2} - 4a - 2}{5a^{2} + 4a + 2}\) (identifying \(\delta _{5K}\) with \(\delta _{3K}\)); in particular, the bound approaches \(\delta _{3K} < 0.2\) as \(a \rightarrow \infty \).

Although the minimizers of problem (1) can be exactly obtained by solving the nonconvex relaxation problem (5), the resulting optimization problem is still NP-hard. To overcome this difficulty, we focus on its regularized version (6). The following theorem addresses this issue.

Theorem 4

Let \(\{\lambda _{p}\}\) be a decreasing sequence of positive numbers with \(\lambda _{p} \rightarrow 0\), and let \(X_{\lambda _{p}}\) be the optimal minimizer of problem (6) with \(\lambda = \lambda _{p}\). Assume that problem (5) is feasible. Then the sequence \(\{X_{\lambda _{p}}\}\) is bounded, and any of its accumulation points is the optimal minimizer of problem (5).

Proof

Assume that problem (5) is feasible and let \({\tilde{X}}\) be any feasible point; then \({\mathcal {A}}({\tilde{X}}) = b\). Since \(X_{\lambda _{p}}\) is the optimal minimizer of problem (6) with \(\lambda = \lambda _{p}\), we get

$$\begin{aligned}&\max \left\{ \lambda _{p}T(X_{\lambda _{p}}), \Arrowvert {\mathcal {A}}(X_{\lambda _{p}}) - b \Arrowvert _{2}^{2}\right\} \le \Arrowvert {\mathcal {A}}(X_{\lambda _{p}}) - b \Arrowvert _{2}^{2} + \lambda _{p}T(X_{\lambda _{p}})\nonumber \\&\quad \le \Arrowvert {\mathcal {A}}({\tilde{X}}) - b \Arrowvert _{2}^{2} + \lambda _{p}T({\tilde{X}})\nonumber \\&\quad = \lambda _{p}T({\tilde{X}}). \end{aligned}$$
(28)

Thus, the sequence \(\{X_{\lambda _{p}}\}\) is bounded and has at least one accumulation point. Let \(X^{*}\) be any accumulation point of \(\{X_{\lambda _{p}}\}\). From (28), we get \({\mathcal {A}}(X^{*}) = b\), which means that \(X^{*}\) is a feasible point of problem (5). Together with \(T(X^{*}) \le T({\tilde{X}})\) and the arbitrariness of \({\tilde{X}}\), this shows that \(X^{*}\) is the optimal minimizer of problem (5).

We complete the proof. \(\square \)

4 The proposed algorithm and its convergence analysis

In this section, we propose a novel and highly efficient algorithm that solves the nonconvex relaxation problem (6) at a convergence rate of O(1/N). The proposed algorithm improves on state-of-the-art algorithms in both convergence speed and low-rank recovery capability.

First, the power method [37] is adopted to obtain an approximate SVD. As shown in [33] and [34], in real applications it is often too expensive to compute the proximal operator exactly. To speed up the convergence of the proposed algorithm, it is desirable to evaluate the proximal operator by computing the SVD of a smaller matrix. The following lemma addresses this issue.

Lemma 5

For any fixed \(\lambda > 0\), assume that \(X = QQ^{T}X \in {\mathbb {R}}^{m \times n}\), where \(Q \in {\mathbb {R}}^{m \times t} (t \ll n)\) has orthonormal columns. Then,

$$\begin{aligned} prox_{\frac{\lambda }{\tau }T(\cdot )}(X) = Qprox_{\frac{\lambda }{\tau }T(\cdot )}(Q^{T}X). \end{aligned}$$
(29)

Proof

The proof follows the same lines as Proposition 1 in [38] and is omitted here. \(\square \)
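
The way Lemma 5 is typically exploited can be sketched as follows: a few power (subspace) iterations yield an approximate orthonormal range basis Q, and the proximal operator is then applied to the much smaller matrix \(Q^{T}X\). The sketch reuses `prox_ts1_matrix` from Sect. 2.2's sketch; the random start, rank guess, and number of power iterations are illustrative assumptions rather than the exact settings of Algorithm 1.

```python
import numpy as np

def power_method(X, R, n_power=3):
    """Approximate an orthonormal basis Q of the range of X by block power iterations,
    starting from a block R with t << n columns (random or warm-started)."""
    Q, _ = np.linalg.qr(X @ R)
    for _ in range(n_power):
        Q, _ = np.linalg.qr(X @ (X.T @ Q))
    return Q

def approx_prox_ts1(X, lam, a, rank_guess=10):
    """Inexact proximal step in the spirit of Lemma 5: prox(X) ~ Q prox(Q^T X)."""
    Q = power_method(X, np.random.randn(X.shape[1], rank_guess))
    return Q @ prox_ts1_matrix(Q.T @ X, lam, a)   # SVD is taken only of the small t x n matrix Q^T X
```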

Since the power method is employed in our algorithm to obtain an approximate SVD, the proximal step may only be solved inexactly, meaning that

$$\begin{aligned} X_{k + 1}= & {} prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k})\nonumber \\= & {} \left\{ U \,\Big |\, \frac{\lambda }{\tau }T(U) + \frac{1}{2}\Arrowvert U - Y_{k} \Arrowvert _{F}^{2} \le \xi _{k} + \frac{\lambda }{\tau }T(V) + \frac{1}{2}\Arrowvert V - Y_{k} \Arrowvert _{F}^{2}, \; \forall V \in {\mathbb {R}}^{m \times n}\right\} . \end{aligned}$$
(30)

As indicated in [34], although the inexact proximal steps are employed in our algorithm, the basic properties stay the same.

Second, the convergence rate of our proposed algorithm is further accelerated by Nesterov's rule [39], which is a widely used technique for speeding up first-order algorithms. In this paper, we integrate \(\mathrm {APGnc}^{+}\) [40] with our proposed algorithm; the details can be found in Algorithm 1. It should be pointed out that our proposed ATS1PGA algorithm differs substantially from \(\mathrm {APGnc}^{+}\): we focus on the transformed Schatten-1 regularizer, which induces lower rank and achieves better recovery performance than the nonconvex regularizer used in \(\mathrm {APGnc}^{+}\).

Algorithm 1 ATS1PGA

The following theorem shows that the objective function always decreases and that any accumulation point of the sequence generated by Algorithm 1 is a stationary point.

Theorem 5

Let \(F(X) = \frac{1}{2}\Arrowvert {\mathcal {A}}(X) - b\Arrowvert _{2}^{2} + \lambda T(X)\) and \(\{X_{k}\}\) be the sequence generated via Algorithm 1 with \(\tau \ge \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\). Assume that \(\xi _{k} \le \delta \Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2}\) and \(\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta > 0\), then

  1. the sequence \(\{X_{k}\}\) is bounded, and has at least one accumulation point;

  2. the sequence \(\{F(X_{k})\}\) is monotonically decreasing and converges to \(F(X^{*})\), where \(X^{*}\) is any accumulation point of \(\{X_{k}\}\);

  3. \(\lim _{k \rightarrow \infty }\Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} = 0\);

  4. \(X^{*}\) is a stationary point of (6).

To prove Theorem 5, we first introduce the following lemma.

Lemma 6

If \(\xi _{k} \le \delta \Arrowvert A - Y_{k} \Arrowvert _{F}^{2}\), then

$$\begin{aligned} F(A) \le F(Y_{k}) - \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2}, \end{aligned}$$
(31)

where \(A = prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k} - \frac{1}{\tau }\nabla f(Y_{k}))\).

Proof

Let \(f(A) = \frac{1}{2}\Arrowvert {\mathcal {A}}(A) - b \Arrowvert _{2}^{2}\). Since the gradient of f is \(\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}\)-Lipschitz continuous, we have

$$\begin{aligned} f(A) \le f(Y_{k}) + <A - Y_{k}, \nabla f(Y_{k})> + \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\Arrowvert A - Y_{k} \Arrowvert _{F}^{2}. \end{aligned}$$

According to the definition of the inexact proximal operator for each step, we have that

$$\begin{aligned}&\frac{\lambda }{\tau } T(A) + \frac{1}{2}\Arrowvert A - Y_{k} + \frac{1}{\tau }\nabla f(Y_{k}) \Arrowvert _{F}^{2}\\&\quad \le \xi _{k} + \frac{\lambda }{\tau } T(Y_{k}) + \frac{1}{2}\Arrowvert \frac{1}{\tau }\nabla f(Y_{k}) \Arrowvert _{F}^{2}, \end{aligned}$$

and thus we can simplify it as

$$\begin{aligned} \lambda T(A) \le \tau \xi _{k} + \lambda T(Y_{k}) - \frac{\tau }{2}\Arrowvert A - Y_{k} \Arrowvert _{F}^{2} - <A - Y_{k}, \nabla f(Y_{k})>. \end{aligned}$$

Therefore, we have that

$$\begin{aligned} F(A)= & {} f(A) + \lambda T(A)\\\le & {} f(Y_{k}) +<\nabla f(Y_{k}), A - Y_{k}> + \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\Arrowvert A - Y_{k}\Arrowvert _{F}^{2}\\&+ \lambda T(Y_{k}) - <\nabla f(Y_{k}), A - Y_{k}> - \frac{\tau }{2}\Arrowvert A - Y_{k}\Arrowvert _{F}^{2} + \tau \xi _{k}\\= & {} F(Y_{k}) - \left( \frac{\tau - \Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2}\right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2} + \tau \xi _{k}\\\le & {} F(Y_{k}) -\left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert A - Y_{k}\Arrowvert _{F}^{2}. \end{aligned}$$

We complete the proof. \(\square \)

Now, we are ready to prove Theorem 5.

Proof

(1) and (2) By applying Lemma 6, we obtain that

$$\begin{aligned} F(X_{k + 1}) \le F(Y_{k}) - \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2}. \end{aligned}$$
(32)

Since \(\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta > 0\), it follows that \(F(X_{k + 1}) \le F(Y_{k})\). Besides, according to the update rule of Algorithm 1, we have \(F(Y_{k + 1}) \le F(X_{k + 1})\). In summary, for all k the following inequality holds:

$$\begin{aligned} F(Y_{k + 1}) \le F(X_{k + 1}) \le F(Y_{k}) \le F(X_{k}), \end{aligned}$$
(33)

which shows that \(\{F(X_{k})\}\) is monotonically decreasing and converges to some constant \(F^{*}\). Since \(\{X_{k}\} \subset \{X: F(X) \le F(X_{0})\}\), which is bounded, it follows that \(\{X_{k}\}\) is bounded and, therefore, has at least one accumulation point \(X^{*}\). Moreover, using the continuity of F and the monotonicity of \(\{F(X_{k})\}\), we obtain \(F(X_{k}) \rightarrow F^{*} = F(X^{*})\) as \(k \rightarrow +\infty \).

(3) From (32) and (33), we have

$$\begin{aligned} \left( \frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta \right) \Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} \le F(X_{k}) - F(X_{k + 1}). \end{aligned}$$
(34)

Summing (34) from \(k = 1\) to m, we have

$$\begin{aligned} \sum _{k = 1}^{m}\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} \le \frac{1}{\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta }(F(X_{1}) - F(X_{m + 1})). \end{aligned}$$
(35)

Letting \(m \rightarrow \infty \), (35) becomes

$$\begin{aligned} \sum _{k = 1}^{\infty }\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} \le \frac{1}{\frac{\tau }{2} - \frac{\Arrowvert {\mathcal {A}} \Arrowvert _{2}^{2}}{2} - \tau \delta }(F(X_{1}) - F(X^{*})). \end{aligned}$$
(36)

Thus,

$$\begin{aligned} \sum _{k = 1}^{\infty }\Arrowvert X_{k + 1} - Y_{k} \Arrowvert _{F}^{2} < \infty , \end{aligned}$$
(37)

Finally, we have that

$$\begin{aligned} \lim _{k \rightarrow \infty }\Arrowvert X_{k + 1} - Y_{k}\Arrowvert _{F}^{2} = 0, \end{aligned}$$
(38)

This also implies that \(\{X_{k}\}\) and \(\{Y_{k}\}\) share the same set of limit points.

(4) Let \(\{X_{k_{j}}\}\) and \(\{Y_{k_{j}}\}\) be convergent subsequences of \(\{X_{k}\}\) and \(\{Y_{k}\}\), respectively, and let \(X^{*}\) be their common limit point, i.e.,

$$\begin{aligned} X_{k_{j}} \rightarrow X^{*}, \quad Y_{k_{j}} \rightarrow X^{*} \quad as \quad k_{j} \rightarrow +\infty . \end{aligned}$$
(39)

From

$$\begin{aligned}&\Arrowvert X_{k_{j + 1}} - X^{*} \Arrowvert _{F} \le \Arrowvert X_{k_{j + 1}} - X_{k_{j}} \Arrowvert _{F}\\&\quad + \Arrowvert X_{k_{j}} - X^{*} \Arrowvert _{F} \rightarrow 0, \quad as \quad k_{j} \rightarrow +\infty , \end{aligned}$$

we get

$$\begin{aligned} X_{k_{j + 1}} \rightarrow X^{*} \quad as \quad k_{j} \rightarrow +\infty . \end{aligned}$$
(40)

Since \(X_{k_{j + 1}} = prox_{\frac{\lambda }{\tau }T(\cdot )}(Y_{k_{j}} - \frac{1}{\tau }\nabla f(Y_{k_{j}}))\), we have, for any \(X \in {\mathbb {R}}^{m \times n}\),

$$\begin{aligned} \frac{1}{2} \left \Arrowvert X_{k_{j + 1}} - \left[ Y_{k_{j}} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(Y_{k_{j}}))\right] \right\Arrowvert _{F}^{2} +\frac{\lambda }{\tau }T(X_{k_{j + 1}})\\ \le \frac{1}{2} \left \Arrowvert X - \left[ Y_{k_{j}} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(Y_{k_{j}}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X). \end{aligned}$$

From (39) and (40), we immediately get for any \(X \in {\mathbb {R}}^{m \times n}\) that

$$\begin{aligned} \frac{1}{2}\left\Arrowvert X^{*} -\left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X^{*})\\ \le \frac{1}{2}\left\Arrowvert X - \left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X), \end{aligned}$$

Therefore, \(X^{*}\) is a minimizer of the following function:

$$\begin{aligned} X \rightarrow \frac{1}{2} \left \Arrowvert X - \left[ X^{*} + \frac{1}{\tau }{\mathcal {A}}^{*}(b - {\mathcal {A}}(X^{*}))\right] \right\Arrowvert _{F}^{2} + \frac{\lambda }{\tau }T(X). \end{aligned}$$

That is, \(X^{*}\) is a fixed point of iteration (14), and hence a stationary point of problem (6). We complete the proof. \(\square \)

The proposed ATS1PGA algorithm can be directly applied to matrix completion problems by replacing its step 4 with

$$\begin{aligned} B = Y - \frac{1}{\tau }{\mathcal {P}}_{\varOmega }(W - Y), \end{aligned}$$
(41)

where \(\varOmega \) denotes the indices of the observed entries, and \({\mathcal {P}}_{\varOmega }\) is defined as

$$\begin{aligned}{}[{\mathcal {P}}_{\varOmega }(M)]_{ij} = {\left\{ \begin{array}{ll} M_{ij}, &{} \text { if }\varOmega _{ij} = 1\\ 0, &{} \text { if }\varOmega _{ij} = 0. \end{array}\right. } \end{aligned}$$

The main computational cost in forming B is the evaluation of \({\mathcal {P}}_{\varOmega }(W - Y)\), which takes \(O(\Arrowvert \varOmega \Arrowvert _{1}{\bar{r}})\) time, where \({\bar{r}}\) is the rank of \(W - Y\). The power method is performed in step 5 and takes \(O(mnt_{k})\) time, where \(t_{k}\) is the number of columns of R. In step 6, an SVD of the smaller matrix \(Q^{T}B\) is performed, which takes only \(O(mt_{k}^{2})\) time. Moreover, by exploiting the “sparse plus low rank” structure of (41), the complexity of the proximal step is \(O((m + n)t_{k}^{2} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\) when \(Y_{k + 1} = V_{k + 1}\) in step 13, or \(O((m + n){\bar{r}} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\) when \(Y_{k + 1} = X_{k + 1}\). In summary, the per-iteration time complexity of ATS1PGA is \(O((m + n)t_{k}^{2} + \Arrowvert \varOmega \Arrowvert _{1}t_{k})\), where \(t_{k} \ll n\) and \(\Arrowvert \varOmega \Arrowvert _{1} \ll mn\).
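
As an illustration of the “sparse plus low rank” trick behind the \(O(\Arrowvert \varOmega \Arrowvert _{1}{\bar{r}})\) cost, the sketch below evaluates \({\mathcal {P}}_{\varOmega }(W - Y)\) when the iterate is kept in factored form \(Y = UV^{T}\), touching only the observed entries; the coordinate-list data layout and function name are our own assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def masked_residual(U, V, rows, cols, vals, shape):
    """Sparse residual P_Omega(W - Y) for a low-rank iterate Y = U @ V.T.
    rows, cols, vals store the observed entries of W; only those positions are evaluated."""
    y_obs = np.einsum('ij,ij->i', U[rows], V[cols])    # Y_{ij} = <U_i, V_j> at each observed (i, j)
    return csr_matrix((vals - y_obs, (rows, cols)), shape=shape)
```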

5 Numerical experiments

Fig. 1 Example of low-rank image. a One 512 \(\times \) 512 image. b First 300 singular values of Barbara

In this section, extensive experimental results on real-world data are presented to demonstrate that our proposed algorithm has a faster convergence rate and better low-rank restoration capability than state-of-the-art algorithms. We compare the ATS1PGA algorithm with the following representative algorithms:

  • SVT [13]: a widely used nuclear-norm-based algorithm which is inspired by the linearized Bregman iterations for compressed sensing.

  • APG [14]: a nuclear-norm-based algorithm which extends a fast iterative shrinkage thresholding algorithm from the vector case to the matrix case.

  • FPCA [41]: an algorithm that deals with the same problem as APG while employing a fast Monte Carlo algorithm for approximate SVD.

  • R1MP [36]: an algorithm which extends the orthogonal matching pursuit algorithm from the vector case to the matrix case.

  • IRNN [24]: a nonconvex algorithm which replaces the nuclear norm by a reweighted nuclear norm. We choose SCAD in this work, as it consistently generates the best results in our tests.

  • FaNCL [25]: a nonconvex algorithm that deals with the same problem as IRNN while achieving an improved convergence rate. We also select SCAD in this work.

  • TS1 [26]: a nonconvex algorithm based on the transformed Schatten-1 penalty function.

Fig. 2 The 20 test grayscale images for image recovery

As shown in Fig. 1, an image usually exhibits a low-rank or approximately low-rank structure. Thus, the problem of recovering an incomplete image can be treated as a low-rank matrix recovery problem. To test the effectiveness of our algorithm on real data, we compare it with the state-of-the-art algorithms on the 20 widely used images shown in Fig. 2. The size of the first 7 images is \(256 \times 256\), the size of the following 10 images is \(512 \times 512\), and the size of the last 3 images is \(1024 \times 1024\). In our tests, two different types of mask are considered.

  • Random mask: given an image, we randomly exclude \(\alpha \%\) of the pixels, and the remaining ones serve as the observations.

  • Text mask: the text may cover some important texture information of a given image.

In the following experiments, the parameters of each algorithm are set to their recommended values. For our proposed algorithm, we use the same parameter values as the TS1 algorithm [26]. All the algorithms are stopped when the difference in objective values between consecutive iterations becomes smaller than \(10^{-4}\). All the algorithms are implemented in MATLAB R2014a on a Windows Server 2008 system with an Intel Xeon E5-2680-v4 CPU (3 cores, 2.4 GHz) and 256 GB memory.

5.1 Image inpainting with random mask

Table 1 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 10\). CPU time is in seconds
Table 2 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 15\). CPU time is in seconds

Image inpainting is one of the most basic problems in the field of image processing; it aims to recover the missing pixels of an incomplete image from very limited information. In the following tests, we first consider a relatively easy low-rank matrix recovery problem, in which the incomplete image data are corrupted with noise. Specifically, let the matrix X represent the image data. Before sampling the missing pixels, we first generate a noise matrix N with i.i.d. elements drawn from the Gaussian distribution \({\mathcal {N}}(0, s)\), and then set \(X = X + N\) as the observed matrix. Using the same setup as in [16], we randomly exclude \(50 \%\) of the pixels in each image, and the remaining ones are used as the observations; a sketch of this observation model is given below. We use sr to denote the sample ratio.
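
For concreteness, the observation model can be sketched as follows, treating s as the standard deviation of the Gaussian noise and marking observed pixels with a Boolean mask; the seed and function name are illustrative.

```python
import numpy as np

def make_observation(X, s, sample_ratio=0.5, seed=0):
    """Add i.i.d. Gaussian noise to the image and keep a random fraction of the pixels."""
    rng = np.random.default_rng(seed)
    noisy = X + rng.normal(0.0, s, size=X.shape)   # noise matrix N
    mask = rng.random(X.shape) < sample_ratio      # True marks an observed pixel
    return noisy * mask, mask
```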

The performance of all algorithms is evaluated in terms of: (1) the peak signal-to-noise ratio (PSNR) [42]; (2) the running time. We vary s in the range \(\{10, 15, 20, 30\}\). The results for the different noise levels, averaged over 10 independent runs, are reported.
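
PSNR is computed in the standard way; the sketch below assumes 8-bit grayscale images with peak value 255.

```python
import numpy as np

def psnr(X_rec, X_true, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between the recovered and the ground-truth image."""
    mse = np.mean((np.asarray(X_rec, dtype=float) - np.asarray(X_true, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```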

Table 3 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 20\). CPU time is in seconds
Table 4 Comparison of different algorithms on image datasets with \(sr = 0.5\) and \(s = 30\). CPU time is in seconds
Fig. 3 Recovered images by using different algorithms on image 20 with \(sr = 0.5\) and \(s = 20\)

Fig. 4 Low-rank matrix recovery results on image data. We depict the PSNR versus the running time

It is seen from Table 1 that the ATS1PGA algorithm achieves higher PSNR values than the other algorithms, with the exception of TS1. We also find that TS1 achieves the most accurate solution on 9 of the images. However, ATS1PGA runs much faster than TS1; in fact, our proposed algorithm is the fastest. From Tables 2 and 3, we observe that the noise degrades the performance of TS1, whereas ATS1PGA exhibits a trend similar to that in Table 1. From Table 4, we find that our proposed algorithm outperforms all competing algorithms on nearly all images. We also present the images recovered by the different algorithms in Fig. 3, which shows that the images obtained by ATS1PGA contain more details than those obtained by the other state-of-the-art algorithms. The convergence efficiency of ATS1PGA is investigated as well, and the empirical results in Fig. 4 show that ATS1PGA performs best among all competing algorithms. Therefore, taking both accuracy and efficiency into consideration, our ATS1PGA algorithm has the best recovery performance among all state-of-the-art algorithms.

5.2 Image inpainting with text mask

Fig. 5 Recovery results of the eight methods on Lenna with text noise. CPU time is in seconds. a Lenna. b Lenna with text. c SVT, Time = 14.08. d APG, Time = 9.37. e FPCA, Time = 18.36. f R1MP, Time = 6.96. g IRNN, Time = 9.38. h FaNCL, Time = 4.42. i TS1, Time = 55.98. j ATS1PGA, Time = 2.79. These results show that our proposed algorithm outperforms the competing methods

Fig. 6 Recovery results of the eight methods on Boat with text noise. CPU time is in seconds. a Boat. b Boat with text. c SVT, Time = 14.78. d APG, Time = 8.51. e FPCA, Time = 18.34. f R1MP, Time = 8.93. g IRNN, Time = 10.76. h FaNCL, Time = 5.40. i TS1, Time = 54.54. j ATS1PGA, Time = 3.15. These results show that our proposed algorithm outperforms the competing methods

In this section, we consider the text removal problem, where some of the pixels of an image are masked in a non-random fashion, such as by text overlaid on the image. Text removal is a challenging task in the field of image processing. To deal with it, the position of the text should be detected first, and the task then turns into a low-rank matrix recovery problem. Figures 5 and 6 present the empirical results of the eight low-rank matrix recovery methods. Specifically, for the example Lenna in Fig. 5, the PSNR values for SVT, APG, FPCA, R1MP, IRNN, FaNCL, TS1, and ATS1PGA are 5.53, 27.04, 13.54, 27.08, 26.63, 26.63, 29.79, and 27.76, respectively. For the example Boat in Fig. 6, the PSNR values for SVT, APG, FPCA, R1MP, IRNN, FaNCL, TS1, and ATS1PGA are 5.4, 26.04, 14.27, 26.57, 25.65, 25.65, 28.55, and 26.22, respectively. These results show that our ATS1PGA algorithm is better than SVT, APG, FPCA, R1MP, IRNN, and FaNCL for both Lenna and Boat, and only slightly worse than TS1. In terms of speed, ATS1PGA is the fastest of the eight methods; in particular, it is at least 15 times faster than TS1. Thus, we conclude that our ATS1PGA algorithm is competitive in handling the text removal task.

6 Conclusion and future work

This paper further investigated the basic properties of the TS1 penalty function and utilized it to deal with the problem of low-rank matrix recovery. Specifically, we theoretically proved that the original low-rank problem (1) can be equivalently transformed into problem (5) under certain conditions. Since the resulting optimization problem (5) is still NP-hard, we proved that its solutions can be obtained by solving the corresponding regularization problem. To provide an efficient and fast low-rank matrix recovery method, an algorithm combining inexact proximal steps with Nesterov's rule was proposed. A convergence analysis demonstrated that any accumulation point of the sequence generated by our algorithm is a stationary point. Finally, numerical experiments on real-world data sets demonstrated that our proposed algorithm is much faster than the SVT, APG, FPCA, R1MP, IRNN, FaNCL, and TS1 algorithms, while achieving comparable recovery performance. We conclude that our algorithm leads to impressive improvements over state-of-the-art methods in the field of low-rank matrix recovery.

In many real-world applications, it is not appropriate to assume that the observed entries are corrupted only by white Gaussian noise. This means that there is still much follow-up work to be done, such as developing more flexible models with the TS1 regularizer for low-rank matrix recovery when the observed entries are corrupted by non-Gaussian noise. Furthermore, developing fast, robust, scalable, and reliable algorithms remains a central issue for future work.