1 Introduction

With the development of mathematical theory, linear representation methods have formed a solid theoretical framework and attracted widespread attention [1, 2]. Sparse representation is one of the most representative of these methods and has been widely employed in various fields, such as signal processing, image processing, machine learning, and computer vision [3,4,5]. For many problems, such as image restoration, super-resolution, visual tracking, image classification, and image segmentation, sparse representation methods have also shown practical potential [6,7,8,9]. Low-rank matrix completion plays an important role in sparse and low-rank representation.

In recent years, estimating the missing values of an unknown matrix from very limited information has attracted notable interest [10, 11]. Clearly, completing an arbitrary matrix is an ill-posed problem: since the number of observed samples is limited, there are infinitely many matrices consistent with the observations. Therefore, additional prior information is needed to determine which of these candidates is most appropriate. In many cases, we expect the restored matrix to be low-rank or approximately low-rank, which means the problem may be solved by a low-rank matrix completion method.

A typical low-rank matrix completion problem is the Netflix problem. From 2006 to 2009, Netflix held a data science competition to improve its proprietary movie recommendation system. The training set is a rating matrix containing approximately 480,000 users (i.e., rows) and 18,000 movies (i.e., columns), with a total of about 8.6 billion potential ratings. Only about 1.2% of the ratings are observed, and the goal was to predict the remaining 98.8% of the blank entries based on the existing data. Using a customer's ratings of a limited number of movies to predict their preferences, and then recommending movies that match those preferences, is equivalent to inferring the unknown entries of the matrix from the known ones. Since the factors that affect customers' preferences are limited, the rating matrix is essentially low-rank. Nowadays, matrix completion plays an important role in collaborative filtering, recommendation systems, data clustering, video denoising, network coding, medical imaging, and other fields, and has become one of the most significant and challenging tasks in computer vision and machine learning.

Candès and Recht pointed out that matrix recovery is not as difficult as one might think [12]. They proved that even if the cardinality of the sample set is surprisingly small, most low-rank matrices can be accurately recovered from a small number of sampled entries. More importantly, they showed that the low-rank matrix completion problem can be formulated as a rank minimization problem.

However, due to the non-convexity and discontinuity of the rank function, the above rank minimization problem is NP-hard, and existing algorithms cannot solve it directly and effectively. As is well known, the nuclear norm is the most common surrogate for the rank function. Meanwhile, the observed entries may not be strictly accurate due to noise, so the matrix can be restored by minimizing the squared prediction error. Unfortunately, this kind of method may be very sensitive to outliers.

In this article, we propose a low-rank matrix completion model based on the truncated nuclear norm and the Lp-norm. Unlike the nuclear-norm approach, which minimizes the sum of all singular values, the truncated nuclear norm only minimizes the \(\min (m,n)-r\) smallest singular values. In this way, a more accurate and robust approximation of the rank function is obtained. In addition, the Lp-norm is employed as the error function to improve robustness. Our new goal is therefore to minimize a joint truncated nuclear norm and Lp-norm objective. When p approaches 0, the proposed method is more robust and effective than the standard matrix completion method. Although the objective function is non-convex, an efficient solution based on the alternating direction method of multipliers has been derived.

In the numerical simulations, experiments are carried out on both synthetic and real image data, and performance is evaluated using the relative error (RE) and the peak signal-to-noise ratio (PSNR). All empirical results show that the presented model outperforms several state-of-the-art matrix completion methods, especially at low sampling rates. The method's efficiency is further illustrated on the MovieLens-1M dataset.

All in all, the main contributions of this article are as follows:

  1.

    A new objective function based on the truncated nuclear norm and Lp-norm is proposed for the low-rank matrix completion problem;

  2.

    Though the resulting objective function is non-convex and non-trivial to optimize, an efficient optimization scheme has been derived to solve it.

1.1 Outlines

The remainder of this paper is organized as follows: In the next section, the background of matrix completion and some related definitions are provided. In Sect. 3, a robust matrix completion model is proposed based on the truncated nuclear norm and Lp-norm. An efficient optimization scheme is designed in Sect. 4. Section 5 presents numerical experiments that compare the proposed method with existing methods. Finally, Sect. 6 concludes this paper.

1.2 Notations

The sets \(R^n\) and \(R^{m\times n}\) denote the space of n-dimensional column vectors and the space of \(m\times n\) matrices, respectively. Lowercase bold letters indicate vectors, while uppercase bold letters indicate matrices, with the (i, j)-th element of a matrix \(\varvec{X}\) denoted \(X_{ij}\). For a matrix \(\varvec{X}\), \(\text {rank}(\varvec{X})\) and \(\text {Tr}(\varvec{X})\) denote the rank and the trace of \(\varvec{X}\). The Frobenius norm of \(\varvec{X}\) is defined as \(\left\| \varvec{X}\right\| _F=(\sum _{i}\sum _{j} X_{ij}^2)^{\frac{1}{2}}\). The nuclear norm of \(\varvec{X}\) is denoted as \(\left\| \varvec{X}\right\| _*=\sum _{i=1}^{\min (m,n)}\sigma _i\), with \(\sigma _i\) denoting the i-th singular value.

2 Preliminary

2.1 Background

We first describe the low-rank matrix completion problem formally. Let

$$\begin{aligned} \varvec{X}_{\varOmega }=\{ X_{ij}|(i,j)\in \varOmega \}, \end{aligned}$$

where \(\varvec{X}\) denotes a matrix, and \(\varOmega\) denotes the index set of the observed entries, i.e., \(\varOmega \subset \{1,...,m\} \times \{1,...,n\}\). Assuming that we have obtained some observations, \(\varvec{D}_{\varOmega }\), of the low-rank matrix \(\varvec{X}\), the task of matrix completion is to restore the unobserved entries of \(\varvec{X}\). Specifically, the completion of \(\varvec{X}\) may be described by the following rank minimization problem

$$\begin{aligned} \begin{aligned}&\min \limits _{\varvec{X}} \text {rank}(\varvec{X}),\\&\text {s.t.} \varvec{X}_{\varOmega }=\varvec{D}_{\varOmega }. \end{aligned} \end{aligned}$$
(1)

However, in most cases, due to the non-convexity and discontinuity of the rank function, problem (1) is NP-hard [13], which means it is difficult to solve directly and effectively. Theoretical studies show that the nuclear norm, \(\left\| \varvec{X}\right\| _*\), i.e., the sum of the singular values, is the tightest convex lower bound of the rank function over the unit ball of the spectral norm. Therefore, the nuclear norm is commonly used as the convex replacement of the matrix rank function [14]. Thus, problem (1) may be approximated via the nuclear norm as

$$\begin{aligned} \begin{aligned}&\min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _*,\\&\text {s.t.} \varvec{X}_{\varOmega }=\varvec{D}_{\varOmega }. \end{aligned} \end{aligned}$$
(2)

Estimating the low-rank matrix will inevitably be disturbed by noise, so the recovered matrix does not have to match the observed entries exactly. Thus, relaxing the equality constraint, problem (2) may be described as [15]

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* + \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _2^{2}, \end{aligned}$$
(3)

where \(\gamma\) denotes the penalty parameter. Many researchers have strived to find the optimal solution to this problem, deriving a series of related approaches, such as the singular value threshold [16] and singular value projection [17] algorithms. In addition, a variety of objective functions have been studied, such as the truncated nuclear norm [18], the Schatten p-norm [19], and other variants [20,21,22], which may lead to a more accurate estimation.

2.2 Definition of Lp-Norm

Definition 1

Given a matrix \(\varvec{X} \in R^{m \times n}\), the Lp-norm of \(\varvec{X}\) is defined as [19]

$$\begin{aligned} \left\| \varvec{X}\right\| _p = (\sum _{i=1}^m \sum _{j=1}^n |X_{ij}|^p)^{\frac{1}{p}}. \end{aligned}$$
(4)

2.3 Definition of truncated nuclear norm

Definition 2

Given a matrix \(\varvec{X} \in R^{m \times n}\), its truncated nuclear norm \(\left\| \varvec{X}\right\| _r\) is defined as the sum of the \(\min (m,n)-r\) smallest singular values, which may be expressed as [18]

$$\begin{aligned} \left\| \varvec{X}\right\| _r = \sum _{i=r+1}^{\min (m,n)} \sigma _{i}, \end{aligned}$$
(5)

where \(\sigma _{i}\) denotes the i-th singular value of \(\varvec{X}\).
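To make the two definitions concrete, the following NumPy sketch (illustrative only, not the authors' MATLAB implementation; the function names are ours) evaluates the Lp-norm of Eq. (4) and the truncated nuclear norm of Eq. (5) directly.

```python
import numpy as np

def lp_norm(X, p):
    """Entry-wise Lp-norm ||X||_p = (sum_ij |X_ij|^p)^(1/p), 0 < p <= 1."""
    return np.sum(np.abs(X) ** p) ** (1.0 / p)

def truncated_nuclear_norm(X, r):
    """Sum of the min(m, n) - r smallest singular values of X (Eq. (5))."""
    sigma = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return sigma[r:].sum()

X = np.random.randn(6, 5) @ np.random.randn(5, 4)   # a small test matrix
print(lp_norm(X, 0.5), truncated_nuclear_norm(X, 2))
```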

3 The proposed method

Note that the Lp-norm can be regarded as an extension of the L2-norm. We therefore use the Lp-norm (\(0<p\le 1\)) to replace the L2-norm in problem (3). Employing the Lp-norm as the error function can effectively enhance robustness to outliers [23]. Problem (3) then takes the following form

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* + \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}. \end{aligned}$$
(6)

Simultaneously, noting that the values of the r largest singular values do not affect the rank of the matrix, we do not constrain them in the truncated nuclear norm and focus on minimizing the remaining \(\min (m,n)-r\) smallest singular values, i.e., we replace \(\left\| \varvec{X}\right\| _*\) with \(\left\| \varvec{X}\right\| _r\), such that problem (6) may be transformed into

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _r + \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}. \end{aligned}$$
(7)

Since \(\left\| \varvec{X}\right\| _r\) is non-convex, it is obviously difficult to solve problem (7) directly. To handle this issue, the following lemma from [18] is given without proof:

Lemma 1

Given a matrix \(\varvec{X} \in R^{m \times n}\) and any non-negative integer r satisfying \(r \le \min (m,n)\), for any matrices \(\varvec{A} \in R^{r \times m}\) and \(\varvec{B} \in R^{r \times n}\) satisfying \(\varvec{A}\varvec{A}^T=\varvec{I}_{r\times r}\) and \(\varvec{B}\varvec{B}^T=\varvec{I}_{r\times r}\), where \(\varvec{I}_{r\times r}\) denotes the r-order identity matrix, it holds that [18]

$$\begin{aligned} \left\| \varvec{X}\right\| _r = \left\| \varvec{X}\right\| _* - \max \limits _{\varvec{A}\varvec{A}^T=\varvec{I}_{r\times r},\varvec{B}\varvec{B}^T=I_{r\times r}} \text {Tr}(\varvec{A}\varvec{X}\varvec{B}^T). \end{aligned}$$
(8)
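A small numerical check of Lemma 1 may help build intuition (this is an illustrative sketch of ours, not part of the original derivation): with A and B taken as the transposed top-r left and right singular vectors, \(\text {Tr}(\varvec{A}\varvec{X}\varvec{B}^T)\) attains the sum of the r largest singular values, so the right-hand side of Eq. (8) reduces to Eq. (5).

```python
import numpy as np

m, n, r = 8, 6, 3
X = np.random.randn(m, n)
U, sigma, Vt = np.linalg.svd(X)            # X = U diag(sigma) Vt
A = U[:, :r].T                             # A in R^{r x m}, A A^T = I_r
B = Vt[:r, :]                              # B in R^{r x n}, B B^T = I_r
lhs = sigma[r:].sum()                      # truncated nuclear norm ||X||_r
rhs = sigma.sum() - np.trace(A @ X @ B.T)  # ||X||_* - Tr(A X B^T) at the maximizer
print(np.isclose(lhs, rhs))                # True up to floating-point error
```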

In summary, combining the truncated nuclear norm and Lp-norm, we may rewrite the optimization problem (7) as

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* - \max \limits _{\varvec{A}\varvec{A}^T=\varvec{I}_{r\times r},\varvec{B}\varvec{B}^T=\varvec{I}_{r\times r}} \text {Tr}(\varvec{A}\varvec{X}\varvec{B}^T) + \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}, \end{aligned}$$
(9)

with \(\varvec{A} \in R^{r \times m}\) and \(\varvec{B} \in R^{r \times n}\). This is our proposed robust model combining Lp-norm and the truncated nuclear norm (Lp-TNN). The solution scheme of model (9) will be detailed in Sect. 4.

4 Efficient implementation

4.1 Three-step method to solve problem (9)

For problem (9), we have to solve the inner problem \(\max \limits _{\varvec{A}\varvec{A}^T=\varvec{I}_{r\times r},\varvec{B}\varvec{B}^T=\varvec{I}_{r\times r}} \text {Tr}(\varvec{A}\varvec{X}\varvec{B}^T)\) first. Inspired by [18], a simple and effective three-step iterative method is established here. Specifically, we first set \(\varvec{X}^{(0)}=\varvec{D}_{\varOmega }\) as the initial value. Then, in the k-th iteration, once \(\varvec{X}^{(k)}\) is obtained, \(\varvec{A}^{(k)}\) and \(\varvec{B}^{(k)}\) may be obtained by the singular value decomposition (SVD), i.e., \(\varvec{X}^{(k)}=\varvec{U}^{(k)} \varvec{\varSigma }^{(k)}\varvec{V}^{(k)T}\), where

$$\begin{aligned} \varvec{A}^{(k)}=(\varvec{u}_1,...\varvec{u}_r)^T,\varvec{B}^{(k)}=(\varvec{v}_1,...\varvec{v}_r)^T, \end{aligned}$$
(10)

with \(\varvec{U}^{(k)}=(\varvec{u}_1,...\varvec{u}_m)\in R^{m\times m}\) and \(\varvec{V}^{(k)}=(\varvec{v}_1,...\varvec{v}_n)\in R^{n\times n}\). Then, \(\varvec{X}^{(k+1)}\) may be updated by solving the following problem

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* - \text {Tr}(\varvec{A}^{(k)}\varvec{X}\varvec{B}^{(k)^{T}})+ \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}. \end{aligned}$$
(11)

In this way, the inner maximization within the minimization can be handled by a three-step iterative process. This process is summarized in Algorithm 1, and the stopping criterion is given in Sect. 4.7.

Algorithm 1 The three-step iterative scheme for solving problem (9)
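The outer loop of Algorithm 1 can be outlined as follows (a minimal sketch under our own assumptions, not the published code): step 1 computes the SVD of the current iterate, step 2 forms \(\varvec{A}^{(k)}\) and \(\varvec{B}^{(k)}\) as in Eq. (10), and step 3 solves problem (11); here step 3 is delegated to a hypothetical solver `solve_inner`, whose construction is the subject of Sects. 4.2-4.6.

```python
import numpy as np

def three_step_tnn(D, Omega, r, solve_inner, max_iter=50, tol=1e-4):
    """D: observed matrix (unobserved entries arbitrary); Omega: boolean mask."""
    X = np.where(Omega, D, 0.0)                      # X^(0) = D_Omega
    for _ in range(max_iter):
        U, _, Vt = np.linalg.svd(X, full_matrices=False)
        A, B = U[:, :r].T, Vt[:r, :]                 # step 2: Eq. (10)
        X_new = solve_inner(X, D, Omega, A, B)       # step 3: solve problem (11)
        # stopping rule of Eq. (43), anticipated from Sect. 4.7
        if np.linalg.norm(X_new - X, 'fro') <= tol * max(np.linalg.norm(X_new, 'fro'), 1.0):
            return X_new
        X = X_new
    return X
```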

As can be seen in Algorithm 1, how to solve step 3 effectively is extremely significant. Noting that it is still a complex optimization problem, we restate and simplify it as follows

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* - \text {Tr}(\varvec{A}\varvec{X}\varvec{B}^{T}) + \gamma \left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}, \end{aligned}$$
(12)

where \(\varvec{A}\) and \(\varvec{B}\) are known matrices. It should be noted that \(\left\| \varvec{X}\right\| _*\) is convex and \(\text {Tr}(\varvec{A}\varvec{X}\varvec{B}^{T})\) is linear, while \(\left\| {\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }} \right\| _p^{p}\) is non-convex, so the augmented Lagrangian method (ALM) and the alternating direction method of multipliers (ADMM) can be used for the optimization. We describe the solution to problem (12) in Sect. 4.2.

4.2 Augmented lagrangian method to solve problem (12)

Here, we introduce auxiliary variables and equivalent constraints into (12), such that it may be expressed as

$$\begin{aligned} \min \limits _{\varvec{X},\varvec{E}_\varOmega =\varvec{X}_\varOmega -\varvec{D}_\varOmega ,\varvec{X}=\varvec{W}}\left\| \varvec{X}\right\| _* - \text {Tr}(\varvec{A}\varvec{W}\varvec{B}^{T}) + \gamma \left\| {\varvec{E}_\varOmega } \right\| _p^{p}. \end{aligned}$$
(13)

The augmented Lagrangian function corresponding to problem (13) can be expressed as

$$\begin{aligned} \begin{aligned} L(\varvec{X},\mu ,\varvec{\varLambda },\varvec{\varSigma },\varvec{W},\varvec{E}_{\varOmega })&= \left\| \varvec{X}\right\| _* - \text {Tr}(\varvec{A}\varvec{W}\varvec{B}^{T}) + \gamma \left\| {\varvec{E}_\varOmega } \right\| _p^{p}\\&\quad + \frac{\mu }{2} \left\| \varvec{E}_\varOmega -(\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega })+\frac{1}{\mu } \varvec{\varLambda } \right\| _F^{2}\\&\quad + \frac{\mu }{2} \left\| \varvec{X} - \varvec{W} + \frac{1}{\mu }\varvec{\varSigma } \right\| _F^{2},\\ \end{aligned} \end{aligned}$$
(14)

where \(\mu > 0\) denotes the penalty parameter, with \(\varvec{\varLambda }\) and \(\varvec{\varSigma }\) denoting the scaled dual variables.

In the interest of brevity, we denote the objective function as \(f(\varvec{X})\), and the two constraints as \(h(\varvec{X})=0\) (i.e., \(h(\varvec{X}) = \varvec{E}_\varOmega -(\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega })\)) and \(g(\varvec{X})=0\) (i.e., \(g(\varvec{X}) = \varvec{X} - \varvec{W}\)). Then, the ALM can be used to solve problem (13), and the main procedure is summarized in Algorithm 2.

Algorithm 2 The augmented Lagrangian method for solving problem (13)
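The outer loop of Algorithm 2 may be read as the standard scaled-dual ALM updates implied by Eq. (14); the following sketch is our hedged rendering (the exact published steps may differ in details), where the hypothetical `argmin_L` stands for step 1, i.e., minimizing Eq. (15) over \(\varvec{X}\), \(\varvec{E}_\varOmega\), and \(\varvec{W}\) by ADMM, and the default values of \(\gamma\), \(\mu\), and \(\rho\) anticipate Sect. 5.1.

```python
import numpy as np

def alm_outer(D, Omega, A, B, argmin_L, gamma=1.1, mu=0.1, rho=1.3, n_iter=100):
    m, n = D.shape
    X = np.where(Omega, D, 0.0)
    W = X.copy()
    E = np.zeros((m, n))                       # carries E_Omega on the mask
    Lam = np.zeros((m, n))                     # dual variable for the E_Omega constraint
    Sig = np.zeros((m, n))                     # dual variable for the X = W constraint
    for _ in range(n_iter):
        X, E, W = argmin_L(X, E, W, D, Omega, A, B, Lam, Sig, gamma, mu)  # step 1
        h = np.where(Omega, E - (X - D), 0.0)  # h(X) = E_Omega - (X_Omega - D_Omega)
        Lam = Lam + mu * h                     # step 2: dual ascent on Lambda
        Sig = Sig + mu * (X - W)               # step 3: dual ascent on Sigma
        mu = rho * mu                          # step 4: increase the penalty
    return X
```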

As may be seen in Algorithm 2, steps 2, 3, and 4 are relatively easy to solve. The main difficulty lies in step 1, whose optimization problem can be expressed as

$$\begin{aligned} \begin{aligned}&\min \limits _{\varvec{X},\varvec{E}_\varOmega ,\varvec{W}}\left\| \varvec{X}\right\| _* - \text {Tr}(\varvec{A}\varvec{W}\varvec{B}^{T}) + \gamma \left\| {\varvec{E}_\varOmega } \right\| _p^{p}\\&\quad + \frac{\mu }{2} \left\| \varvec{E}_\varOmega -(\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega })+\frac{1}{\mu } \varvec{\varLambda } \right\| _F^{2}\\&\quad + \frac{\mu }{2} \left\| \varvec{X} - \varvec{W} + \frac{1}{\mu }\varvec{\varSigma } \right\| _F^{2}.\\ \end{aligned} \end{aligned}$$
(15)

It is extremely difficult to minimize problem (15) over \(\varvec{X}, \varvec{E}_\varOmega\), and \(\varvec{W}\) simultaneously, and the complexity of the required algorithm may be very high. Thus, the alternating direction method of multipliers (ADMM) [24] may be employed to simplify problem (15), and we explain it in detail in Sect. 4.3.

4.3 Alternating direction method of multipliers to solve problem (15)

As mentioned above, it is very difficult to optimize the three variables of problem (15) at the same time, so we employ ADMM to convert this complex problem into several sub-problems. Specifically, we optimize the three variables separately: when optimizing one variable, the other two are fixed, leading to the following three sub-problems.

  • Sub-problem 1: When optimizing \(\varvec{X}\), we fix \(\varvec{E}_\varOmega\) and \(\varvec{W}\). Then, problem (15) may be simplified as

    $$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* + \frac{\mu }{2} \left\| \varvec{E}_\varOmega -(\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega })+\frac{1}{\mu } \varvec{\varLambda } \right\| _F^{2} + \frac{\mu }{2} \left\| \varvec{X} - \varvec{W} + \frac{1}{\mu }\varvec{\varSigma } \right\| _F^{2}. \end{aligned}$$
    (16)
  • Sub-problem 2: When optimizing \(\varvec{E}_\varOmega\), we fix \(\varvec{X}\) and \(\varvec{W}\). Then, problem (15) may be simplified as

    $$\begin{aligned} \min \limits _{\varvec{E}_\varOmega } \gamma \left\| {\varvec{E}_\varOmega } \right\| _p^{p} + \frac{\mu }{2} \left\| \varvec{E}_\varOmega -(\varvec{X}_{\varOmega }-\varvec{D}_{\varOmega })+\frac{1}{\mu } \varLambda \right\| _F^{2}. \end{aligned}$$
    (17)
  • Sub-problem 3: When optimizing \(\varvec{W}\), we fix \(\varvec{E}_\varOmega\) and \(\varvec{X}\). Then, problem (15) may be simplified as

    $$\begin{aligned} \min \limits _{\varvec{W}} -\text {Tr}(\varvec{A}\varvec{W}\varvec{B}^{T}) + \frac{\mu }{2} \left\| \varvec{X} - \varvec{W} + \frac{1}{\mu }\varvec{\varSigma } \right\| _F^{2}. \end{aligned}$$
    (18)

In this way, we transform the complex problem (15) into three simpler sub-problems. Specific solutions to these sub-problems are given in Sects. 4.4, 4.5, and 4.6, respectively.

4.4 Solution of sub-problem (16)

We first introduce the definition of singular value threshold operator.

Definition 3

Consider the SVD of a matrix \(\varvec{X}\in R^{m\times n}\) of rank r

$$\begin{aligned} \varvec{X} = \varvec{U}\varvec{\varSigma } \varvec{V}^{T}, \varvec{\varSigma }=\text {diag}(\{\sigma _i\}_{1\le i\le r}), \end{aligned}$$
(19)

where \(\varvec{U}\in R^{m\times r}\) and \(\varvec{V}\in R^{n\times r}\) denote column-orthonormal matrices, and the singular values \(\sigma _i\) are all positive. For a given \(\tau \ge 0\), we define the singular value threshold operator \({\mathcal {D}}_\tau\) as [16]

$$\begin{aligned} {\mathcal {D}}_\tau (\varvec{X}) = \varvec{U}{\mathcal {D}}_\tau (\varvec{\varSigma })\varvec{V}^T,{\mathcal {D}}_\tau (\varvec{\varSigma })=\text {diag}(\max \{\sigma _i -\tau , 0\}). \end{aligned}$$

Immediately, the following lemma from [16] is given without proof:

Lemma 2

For every \(\tau \ge 0\) and a given matrix \(\varvec{Y} \in R^{m\times n}\), the singular value threshold operator defined by Definition 3 always satisfies [16]

$$\begin{aligned} {\mathcal {D}}_\tau (\varvec{Y}) = \mathop {\arg \min }_{\varvec{X}} \{\frac{1}{2} \left\| \varvec{X} - \varvec{Y} \right\| _F^{2} + \tau \left\| \varvec{X}\right\| _* \}. \end{aligned}$$
(20)
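A direct NumPy transcription of Definition 3 is given below (an illustrative sketch of ours); by Lemma 2 it returns the minimizer of the proximal problem in Eq. (20).

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: argmin_X 0.5*||X - Y||_F^2 + tau*||X||_*."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(sigma - tau, 0.0)) @ Vt
```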

Then, problem (16) may be simplified as

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}\right\| _* + \frac{\mu }{2}\left\| \varvec{X}_\varOmega -\varvec{M}_\varOmega \right\| _F^{2} + \frac{\mu }{2}\left\| \varvec{X}-\varvec{N}\right\| _F^{2}, \end{aligned}$$
(21)

where \(\varvec{M}_\varOmega = \varvec{E}_\varOmega + \varvec{D}_\varOmega + \frac{1}{\mu }\varvec{\varLambda }\), \(\varvec{N} = \varvec{W} + \frac{1}{\mu }\varvec{\varSigma }\).

Here, we consider another optimization problem first

$$\begin{aligned} \min \limits _{\varvec{X}} \left\| \varvec{X}_\varOmega -\varvec{M}_\varOmega \right\| _F^{2} + \left\| \varvec{X}-\varvec{N}\right\| _F^{2}. \end{aligned}$$
(22)

Let \(\varvec{X}_{{\overline{\varOmega }}} = \{X_{ij}|(i,j)\notin \varOmega \}\). Then, the optimal solution to problem (22), i.e., \(\widehat{\varvec{X}}\), can be expressed as

$$\begin{aligned} \widehat{\varvec{X}}_{\varOmega } = \frac{\varvec{M}_{\varOmega }+\varvec{N}_{\varOmega }}{2},\widehat{\varvec{X}}_{\overline{\varvec{\varOmega }}} = \varvec{N}_{{\overline{\varOmega }}}. \end{aligned}$$
(23)

Exploiting the relationship between problems (21) and (22), the optimal solution of problem (21) may be obtained as \({\mathcal {D}}_{\frac{1}{\mu }}(\widehat{\varvec{X}})\) by using Lemma 2. In this way, sub-problem (16) may be solved iteratively. The k-th iteration of the \(\varvec{X}\)-update can be expressed as

$$\begin{aligned} \varvec{X}^{(k+1)} = {\mathcal {D}}_{\frac{1}{\mu }}(\widehat{\varvec{X}}^{(k)}), \end{aligned}$$
(24)

where \(\widehat{\varvec{X}}^{(k)}\) is obtained from \(\varvec{E}_{\varOmega }^{(k)}, \varvec{D}_{\varOmega }, \varvec{\varLambda }^{(k)}, \varvec{W}^{(k)}\), and \(\varvec{\varSigma }^{(k)}\) via equation (23).
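An illustrative sketch of this X-update (our own rendering, not the published code): form the surrogate matrix of Eq. (23) from \(\varvec{M}_\varOmega\) and \(\varvec{N}\), then apply singular value thresholding with \(\tau = 1/\mu\) as in Eq. (24).

```python
import numpy as np

def update_X(E, D, Lam, W, Sig, Omega, mu):
    M = E + D + Lam / mu                        # M_Omega = E_Omega + D_Omega + Lambda/mu
    N = W + Sig / mu                            # N = W + Sigma/mu
    X_hat = np.where(Omega, (M + N) / 2.0, N)   # Eq. (23)
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    return U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt   # Eq. (24)
```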

4.5 Solution to sub-problem (17)

Problem (17) can be simplified as

$$\begin{aligned} \min \limits _{\varvec{E}_\varOmega } \frac{\gamma }{\mu } \left\| {\varvec{E}_\varOmega } \right\| _p^{p} + \frac{1}{2} \left\| \varvec{E}_\varOmega -\varvec{C}_{\varOmega } \right\| _F^{2}, \end{aligned}$$
(25)

where \(\varvec{C}_{\varOmega } = \varvec{X}_{\varOmega }-\varvec{D}_{\varOmega }-\frac{1}{\mu } \varvec{\varLambda }\). Since the elements \(\{E_{ij}|(i,j)\in \varOmega \}\) of the matrix \(\varvec{E}_\varOmega\) in the above problem are separable, one may, following [19], solve the optimization problem for each element

$$\begin{aligned} \min \limits _{E_{ij}} \lambda |{E_{ij}}|^{p} + \frac{1}{2} |E_{ij}-C_{ij}|^2, \end{aligned}$$
(26)

where \(\{C_{ij}|(i,j)\in \varOmega \}\) denotes the elements of the matrix \(\varvec{C}_{\varOmega }\) and \(\lambda =\frac{\gamma }{\mu }\) (\(\lambda > 0\)).

In the interest of brevity, the problem may be rewritten as the following equivalent form

$$\begin{aligned} \min \limits _{x} \psi (x) = \lambda |{x}|^{p} + \frac{1}{2} |x-a|^2. \end{aligned}$$
(27)

The objective function \(\psi (x)\) is differentiable for \(x\ne 0\), while at \(x=0\) the derivative does not exist. The sign function \(\text {sgn}(x)\) is introduced to express the first derivative of \(\psi (x)\) for \(x\ne 0\), being defined as

$$\begin{aligned} \text {sgn}(x) = {\left\{ \begin{array}{ll} -1, &{} x<0,\\ 1, &{} x>0. \end{array}\right. } \end{aligned}$$
(28)

Then, when \(x\not =0\), the first order derivative of \(\psi (x)\) is

$$\begin{aligned} \psi '(x)=x-a+\lambda p|x|^{p-1}\text {sgn}(x). \end{aligned}$$
(29)

Similarly, we can obtain the second- and third-order derivatives of \(\psi (x)\) for \(x\ne 0\) as

$$\begin{aligned} \psi ''(x)&=1-\lambda p(1-p)|x|^{p-2}, \end{aligned}$$
(30)
$$\begin{aligned} \psi ^{(3)}(x)&=\lambda p(1-p)(2-p)|x|^{p-3}\text {sgn}(x). \end{aligned}$$
(31)

From the expression of \(\psi ^{(3)}(x)\), we know that when \(x>0\), \(\psi ^{(3)}(x)>0\), which means \(\psi ''(x)\) is strictly increasing, and when \(x<0\), \(\psi ^{(3)}(x)<0\), which means \(\psi ''(x)\) is strictly decreasing. Setting \(\psi ''(x) = 0\), we can find a constant \(x_0\) as follows

$$\begin{aligned} x_0 = (\lambda p(1-p))^{\frac{1}{2-p}}. \end{aligned}$$
(32)

We summarize the behavior of this function in Table 1. Noting that \(\psi (x)\) is not differentiable at \(x=0\), this point must always be treated as a candidate minimizer, and we further discuss the objective function \(\psi (x)\) in the following situations.

Table 1 Analysis of \(\psi (x)\)
  • Case 1: \(\psi '(-x_0)\le 0,\psi '(x_0)\ge 0\).

    In this case, when \(x<0\), \(\psi '(x)\le \psi '(-x_0)\le 0\), which means \(\psi (x)\) is monotonically decreasing, and when \(x>0\), \(\psi '(x)\ge \psi '(x_0)\ge 0\), meaning \(\psi (x)\) is monotonically increasing. Thus, the optimal solution of problem (27) in Case 1 is \(x^*=0\).

  • Case 2: \(\psi '(-x_0)\le 0,\psi '(x_0)<0\).

    In this case, when \(x<0\), \(\psi '(x)\le \psi '(-x_0)\le 0\), which means \(\psi (x)\) is monotonically decreasing. Since \(\psi '(x_0)<0\), the equation \(\psi '(x)=0\) has two roots \(x_1, x_2\) (\(x_1<x_2\)) for \(x>0\). Table 2 shows the behavior of \(\psi (x)\) for \(x>0\) more intuitively.

Table 2 Analysis of \(\psi (x)\) in Case 2

As can be seen, \(x_2\) is the only local minimizer of \(\psi (x)\) for \(x>0\); it is the root of \(\psi '(x)=0\) greater than \(x_0\), and it becomes a candidate optimal solution of problem (27). Newton's method [25] can be used to iteratively solve the equation \(\psi '(x)=0\) to obtain \(x_2\). The initial value can be set to \(2x_0\) so that the iteration converges to \(x_2\).

Considering all situations comprehensively, the optimal solution to problem (27) in Case 2 may be expressed as

$$\begin{aligned} x^*=\mathop {\arg \min }_{x\in \{0,x_2\}} \psi (x). \end{aligned}$$
(33)
  • Case 3: \(\psi '(-x_0)>0,\psi '(x_0)\ge 0\).

    We follow the idea of Case 2. In this case, when \(x>0\), \(\psi '(x)\ge \psi '(x_0)\ge 0\), so \(\psi (x)\) is monotonically increasing. Since \(\psi '(-x_0)>0\), the equation \(\psi '(x)=0\) has two roots \(x_3, x_4\) (\(x_3<x_4\)) for \(x<0\). Table 3 shows the behavior of \(\psi (x)\) for \(x<0\) more intuitively.

Table 3 Analysis of \(\psi (x)\) in Case 3

As can be seen, \(x_3\) is the only local minimizer of \(\psi (x)\) for \(x<0\); it is the root of \(\psi '(x)=0\) less than \(-x_0\), and it is a candidate optimal solution of problem (27). We can also use Newton's method to solve the equation \(\psi '(x)=0\) iteratively to obtain \(x_3\), with the initial value set to \(-2x_0\) so that the iteration converges to \(x_3\).

Considering all situations comprehensively, the optimal solution of problem (27) under this condition may be expressed as

$$\begin{aligned} x^*=\mathop {\arg \min }_{x\in \{0,x_3\}} \psi (x). \end{aligned}$$
(34)
  • Case 4: \(\psi '(-x_0)>0,\psi '(x_0)<0\).

    This situation is the combination of Cases 2 and 3. Since \(\psi '(x_0)<0\), by the analysis of Case 2 we obtain the local minimum \(\psi (x_2)\) for \(x>0\); since \(\psi '(-x_0)>0\), by the analysis of Case 3 we obtain the local minimum \(\psi (x_3)\) for \(x<0\). The roots \(x_2\) and \(x_3\) are found as in the two cases above. Then, considering all candidate solutions, the optimal solution to problem (27) under this condition can be expressed as

    $$\begin{aligned} x^*=\mathop {\arg \min }_{x\in \{0,x_2,x_3\}} \psi (x). \end{aligned}$$
    (35)

In summary, combining all the cases of the optimization problem (27), its solution may be expressed as

$$\begin{aligned} {\left\{ \begin{array}{ll} \psi '(-x_0)\le 0,\ \psi '(x_0)\ge 0: &{} x^*=0,\\ \psi '(-x_0)\le 0,\ \psi '(x_0)<0: &{} x^*=\mathop {\arg \min }_{x\in \{0,x_2\}} \psi (x),\\ \psi '(-x_0)>0,\ \psi '(x_0)\ge 0: &{} x^*=\mathop {\arg \min }_{x\in \{0,x_3\}} \psi (x),\\ \psi '(-x_0)>0,\ \psi '(x_0)<0: &{} x^*=\mathop {\arg \min }_{x\in \{0,x_2,x_3\}} \psi (x). \end{array}\right. } \end{aligned}$$
(36)

In this way, sub-problem (17) may also be solved successfully.
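For concreteness, the following scalar solver (an illustrative sketch assuming \(0<p<1\), not the authors' implementation) follows the case analysis of Eq. (36): it computes \(x_0\) from Eq. (32), runs Newton's method on \(\psi '(x)=0\) from \(\pm 2x_0\) when the corresponding roots exist, and selects the best candidate among \(\{0, x_2, x_3\}\). Applying this rule entry-wise to \(\varvec{C}_\varOmega\) with \(\lambda = \gamma/\mu\) gives the \(\varvec{E}_\varOmega\) update.

```python
import numpy as np

def lp_prox_scalar(a, lam, p, newton_iter=30):
    """Minimize psi(x) = lam*|x|^p + 0.5*(x - a)^2 for 0 < p < 1 (Eq. (27))."""
    psi   = lambda x: lam * abs(x) ** p + 0.5 * (x - a) ** 2
    dpsi  = lambda x: x - a + lam * p * abs(x) ** (p - 1) * np.sign(x)   # Eq. (29)
    d2psi = lambda x: 1.0 - lam * p * (1 - p) * abs(x) ** (p - 2)        # Eq. (30)
    x0 = (lam * p * (1 - p)) ** (1.0 / (2 - p))                          # Eq. (32)

    def newton(x):                      # solve dpsi(x) = 0 from a given start point
        for _ in range(newton_iter):
            x = x - dpsi(x) / d2psi(x)
        return x

    candidates = [0.0]                  # x = 0 is always a candidate
    if dpsi(x0) < 0:                    # a root x2 > x0 exists (Cases 2 and 4)
        candidates.append(newton(2 * x0))
    if dpsi(-x0) > 0:                   # a root x3 < -x0 exists (Cases 3 and 4)
        candidates.append(newton(-2 * x0))
    return min(candidates, key=psi)     # Eq. (36)
```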

4.6 Solution to sub-problem (18)

Exploiting the properties of trace, i.e., \(\text {Tr}(\varvec{A})=\text {Tr}(\varvec{A}^T)\) and \(\text {Tr}(\varvec{A}\varvec{B}\varvec{C}) = \text {Tr}(\varvec{B}\varvec{C}\varvec{A})\), problem (18) may be equivalently converted to the following problem as

$$\begin{aligned} \min \limits _{\varvec{W}} -\mu \text {Tr}(\frac{1}{\mu }\varvec{A}^{T}\varvec{B} \varvec{W}^T) + \frac{\mu }{2} \left\| \varvec{W} - (\varvec{X} + \frac{1}{\mu }\varvec{\varSigma }) \right\| _F^{2}. \end{aligned}$$
(37)

Adding a constant term, i.e., \(\frac{\mu }{2} \left\| \frac{1}{\mu }\varvec{A}^T\varvec{B}\right\| _F^{2}\), the problem (37) may be rewritten as

$$\begin{aligned} \min \limits _{\varvec{W}} -\mu \text {Tr}(\frac{1}{\mu }\varvec{A}^{T}\varvec{B} \varvec{W}^T) + \frac{\mu }{2} \left\| \varvec{W} - (\varvec{X} + \frac{1}{\mu }\varvec{\varSigma }) \right\| _F^{2} + \frac{\mu }{2} \left\| \frac{1}{\mu }\varvec{A}^T\varvec{B}\right\| _F^{2}. \end{aligned}$$
(38)

Then, using the relationship between the Frobenius norm and the trace, the above formula may be simplified as

$$\begin{aligned} \min \limits _{\varvec{W}} \frac{\mu }{2} \left\| \varvec{W}- (\varvec{X} + \frac{1}{\mu }\varvec{\varSigma } + \frac{1}{\mu }\varvec{A}^T\varvec{B})\right\| _F^{2}, \end{aligned}$$
(39)

leading to a very straightforward optimal solution, i.e.,

$$\begin{aligned} \varvec{W}^* = \varvec{X} + \frac{1}{\mu }\varvec{\varSigma } + \frac{1}{\mu }\varvec{A}^T\varvec{B}. \end{aligned}$$
(40)

Then, we can obtain the iterative solution to problem (18). In the k-th iteration, the entries of \(\varvec{W}^{(k+1)}\) on the set \(\varOmega\) are kept consistent with \(\varvec{W}^{(k)}\), yielding

$$\begin{aligned} \widehat{\varvec{W}}^{(k+1)}&= \varvec{X}^{(k+1)} + \frac{1}{\mu }\varvec{\varSigma }^{(k)} + \frac{1}{\mu }\varvec{A}^{(k+1)^T}\varvec{B}^{(k+1)}, \end{aligned}$$
(41)
$$\begin{aligned} \varvec{W}^{(k+1)}&= (\widehat{\varvec{W}}^{(k+1)})_{{{\overline{\varOmega }}}} + (\varvec{W}^{(k)})_{\varOmega }. \end{aligned}$$
(42)
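A short sketch of this W-update (illustrative only): the closed-form solution of Eq. (40), followed by the masking step of Eq. (42) that keeps the entries on \(\varOmega\) from the previous iterate.

```python
import numpy as np

def update_W(X, Sig, A, B, W_prev, Omega, mu):
    W_hat = X + Sig / mu + (A.T @ B) / mu        # Eqs. (40)-(41)
    return np.where(Omega, W_prev, W_hat)        # Eq. (42)
```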

4.7 Algorithm summary

In our proposed algorithm, the stopping criterion is set to

$$\begin{aligned} \frac{\left\| \varvec{X}^{(k+1)}-\varvec{X}^{(k)}\right\| _F}{\max (\left\| \varvec{X}^{(k+1)}\right\| _F,1)} \le tol, \end{aligned}$$
(43)

where tol denotes the stopping threshold, which is suggested to be selected from \(10^{-4}\) to \(10^{-7}\). Based on the analysis in Sect. 4, we summarize the proposed Lp-TNN algorithm in Algorithm 3. The MATLAB code for this algorithm is available online at https://github.com/HauLiang/Lp-TNN.

Algorithm 3 The proposed Lp-TNN algorithm

4.8 Convergence analysis and computational cost

In this section, we first discuss the convergence property of the proposed Lp-TNN. For the three-step procedure, similar to [18], the iterative scheme can be proved to converge to a local minimum. Then, the ADMM is employed to convert a complex problem into several sub-problems, which ensures the convergence of the algorithm (see, e.g., [26]). For the three sub-problems, the convergence of sub-problem 1 is ensured by the singular value threshold operator [16], the convergence of sub-problem 2 is ensured by Newton's method (see, e.g., [27, 28]), while sub-problem 3 has a closed-form solution.

Then, we analyze the complexity of the proposed algorithm. From Algorithm 3, the main per-iteration computational cost lies in the SVD operation. For large-scale problems, we may employ existing techniques, such as [29], to accelerate the SVD and make the algorithm more efficient.

Fig. 1 The REs versus the SNRs using the synthetic data under different sampling rates: a 0.05, b 0.10, c 0.15, and d 0.20

Fig. 2 The running time versus the SNRs using the synthetic data under different sampling rates: a 0.05, b 0.10, c 0.15, and d 0.20

Fig. 3 The REs versus the ranks using the synthetic data under different sampling rates: a 0.05, b 0.10, c 0.15, and d 0.20

Fig. 4 The running time versus the ranks using the synthetic dataset under different sampling rates: a 0.05, b 0.10, c 0.15, and d 0.20

5 Numerical results

In this section, we compare the proposed method with classic matrix completion methods on synthetic data and real image data. The parameters of the comparison approaches are selected according to the suggestions in their papers or the default settings of the published code to obtain the best performance. The proposed Lp-TNN is compared with the following approaches:

  • SVT algorithm [16], the singular value threshold algorithm.

  • SVP algorithm [17], the singular value projection algorithm.

  • TNNR algorithm [18], the truncated nuclear norm regularization algorithm.

  • Sp-lp algorithm [19], the Schatten p-norm and \(\ell _p\)-norm algorithm.

  • FGSR algorithm [22], the factor group-sparse regularization algorithm.

5.1 Parameter setting

First, we set the parameter p. We find that the matrix recovery performance may degrade slightly as p increases; this may be because, as p tends to 1, the gain in robustness to outliers becomes smaller [30]. Meanwhile, a smaller p may lead to a longer running time. For simplicity, p is set to 0.2. For the other parameters, the regularization parameter, \(\gamma\), is set to 1.1, the step size constant, \(\rho\), to 1.3, the update proportional constant, \(\mu\), to 0.1, the maximum number of iterations, n, to 1000, and the algorithm threshold, tol, to \(10^{-4}\), so that the algorithm achieves good performance. As for the setting of r, we test values from 0 to 20 and choose the best result as the final recovered matrix. Unless otherwise stated, the same parameter settings are used in the subsequent experiments.

5.2 Synthetic data

Consider the randomly generated \(n\times n\) matrix completion problem with rank r. We first randomly generate two matrices, \(\varvec{X}_1\in R^{n\times r}\) and \(\varvec{X}_2\in R^{n\times r}\), so that the original matrix is generated as \(\varvec{X}=\varvec{X}_1\varvec{X}_2^T\). For the original matrix \(\varvec{X}\), we set different sampling rates and ranks, and add noise with different signal-to-noise ratios (SNRs) to evaluate the performance of the algorithms in various situations. The relative error (RE) is used to evaluate performance, defined as [31]

$$\begin{aligned} RE = \frac{\left\| \varvec{X}^*-\varvec{X}\right\| _F}{\left\| \varvec{X}\right\| _F}, \end{aligned}$$
(44)

where \(\varvec{X}\) and \(\varvec{X}^*\) denote the original matrix and recovered matrix, respectively. The smaller the RE, the better the performance.
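A minimal sketch of this synthetic-data protocol is given below (our own illustration; in particular, the exact noise scaling used to realize a given SNR is an assumption): generate a rank-r matrix \(\varvec{X}=\varvec{X}_1\varvec{X}_2^T\), sample a fraction of its noisy entries, and evaluate any recovered matrix by the relative error of Eq. (44).

```python
import numpy as np

def make_problem(n=300, r=10, sampling_rate=0.1, snr_db=20.0, seed=0):
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal((n, r))
    X2 = rng.standard_normal((n, r))
    X = X1 @ X2.T                                       # rank-r ground truth
    noise = rng.standard_normal((n, n))
    # scale the noise so that ||X||_F / ||noise||_F matches the target SNR (assumed convention)
    noise *= np.linalg.norm(X, 'fro') / (np.linalg.norm(noise, 'fro') * 10 ** (snr_db / 20))
    Omega = rng.random((n, n)) < sampling_rate          # observation mask
    D = np.where(Omega, X + noise, 0.0)                 # observed entries D_Omega
    return X, D, Omega

def relative_error(X_rec, X):
    return np.linalg.norm(X_rec - X, 'fro') / np.linalg.norm(X, 'fro')   # Eq. (44)
```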

First, we discuss the impact of different SNRs on model performance. The size of the matrix is set to 300 × 300, and the rank is set to 10. Gaussian white noise with different SNRs (dB) is added under different sampling rates. Each experiment is repeated 50 times and the average value is taken. The REs and running times of the algorithms are recorded, and the results are shown in Figs. 1 and 2, respectively.

Then, we discuss the impact of different matrix ranks on model performance. The matrix size is again set to 300 × 300 and 0.1 dB Gaussian white noise is added. We set different ranks of the original matrix and take the average over 50 repetitions under different sampling rates. The REs and running times of the algorithms are shown in Figs. 3 and 4, respectively.

As can be seen in Figs. 1 and 2, the proposed model maintains a small RE at low sampling rates and restores the original matrix with less running time and high recovery efficiency. Figures 3 and 4 also verify this. Moreover, under various SNRs, the proposed method still restores the matrix stably and efficiently, which shows that it provides a robust estimation. In addition, when the rank of the matrix varies, the model can recover the matrix efficiently, showing its strong stability and adaptability.

The above synthetic-data experiments show that, for completely randomly generated matrices, our model performs the matrix completion task well without failing to converge, which also verifies its good convergence behavior.

Fig. 5 The recovered images when the sampling rate is 0.4 by different algorithms: a original image, b sampled image, c SVT, d SVP, e TNNR, f Sp-lp, g FGSR, and h Lp-TNN

5.3 Real image data

In most cases, an image can be viewed as an approximately low-rank matrix. Sometimes, images may be partially damaged due to encoding and transmission problems. Thus, a matrix completion algorithm may be applied to such images to recover the lost information. Here, we choose an RGB image and sample it at different sampling rates to obtain images with missing information. The peak signal-to-noise ratio (PSNR) of the restored image is employed to evaluate the performance of the different methods, which can be expressed as

$$\begin{aligned} PSNR = 10 \log _{10}(\frac{255^2}{\frac{1}{3m_1 m_2} \sum _{i=1}^3\left\| \varvec{X}_i^*-\varvec{X}_i\right\| _F^2}), \end{aligned}$$
(45)

where \(\varvec{X}_i\) and \(\varvec{X}_i^*\) denote the original and recovered image matrices of the i-th RGB channel, each of size \(m_1 \times m_2\). The larger the PSNR, the better the performance.
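For reference, an illustrative computation of the PSNR of Eq. (45) for an 8-bit RGB image is sketched below, where both arrays are assumed to have shape (m1, m2, 3) with values in [0, 255].

```python
import numpy as np

def psnr_rgb(X_rec, X):
    m1, m2, _ = X.shape
    mse = np.sum((X_rec.astype(float) - X.astype(float)) ** 2) / (3 * m1 * m2)
    return 10 * np.log10(255.0 ** 2 / mse)    # Eq. (45)
```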

First, we show the images restored by the different methods at a sampling rate of 0.4 in Fig. 5. As may be seen, our model recovers the image more clearly.

Then, we take the sampling rate as an independent variable and calculate the PSNRs and running times under different sampling rates. The results are shown in Fig. 6.

Fig. 6 The reconstruction performance of the different algorithms, showing the analysis of the a PSNR and b running time

Figure 6 shows that the proposed method achieves the highest PSNR under various sampling rates, and it has a higher operating efficiency when the sampling rate is low. This may be attributed to the introduction of the Lp-norm, since it is robust to outliers, and outliers are more likely to arise when the sampling rate is low.

Simultaneously, it can be seen that as the sampling rate increases, the PSNR of each algorithm increases, which shows the rationality of the experiment: as the sampling rate increases, more information about the image becomes available, which reduces the difficulty of image restoration and produces a higher PSNR.

Table 4 The NMAEs and RMSEs of the MovieLens-1M dataset by different algorithms

5.4 MovieLens-1M dataset

In this subsection, the proposed method is evaluated on the real-world MovieLens-1M dataset, which is available in [32]. This dataset consists of 1 million ratings (1 to 5) of 3900 movies by 6040 users. Here, the stopping threshold, tol, is set to \(10^{-2}\), the update proportional constant, \(\mu\), to 0.001, and the maximum number of iterations, n, to 100.

We randomly mask 50% of the known ratings and perform the recovery. The performance is measured using the normalized mean absolute error (NMAE) and the root-mean-square error (RMSE) [33]. The measured NMAEs and RMSEs are shown in Table 4. It may be seen that Lp-TNN obtains smaller errors, again outperforming the alternative methods.
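A hedged sketch of these rating-prediction metrics is given below: the RMSE over the held-out entries and an NMAE normalized by the rating range (here 1 to 5); the exact normalization convention used in [33] may differ.

```python
import numpy as np

def rating_errors(pred, truth, mask, r_min=1.0, r_max=5.0):
    """pred, truth: rating matrices; mask: boolean array marking held-out entries."""
    diff = (pred - truth)[mask]
    rmse = np.sqrt(np.mean(diff ** 2))
    nmae = np.mean(np.abs(diff)) / (r_max - r_min)   # assumed normalization by rating range
    return nmae, rmse
```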

6 Conclusion

The matrix completion problem has wide applications in various fields. However, many matrix completion methods fail to provide a reliable estimation when the sample size is small. In this article, we present a matrix completion approach combining the truncated nuclear norm and the Lp-norm, which avoids a sharp performance drop when encountering outliers. The truncated nuclear norm is a more accurate and robust approximation of the rank function, while the Lp-norm as an error function improves the model's robustness, both of which provide a foundation for better performance.

Although the objective function of the proposed model is not convex, a simple and efficient iterative scheme has been derived. We first decompose the complex original problem into several relatively easy-to-handle sub-problems via the augmented Lagrangian method and the alternating direction method of multipliers. Then, for the sub-problems, the singular value threshold operator, the scalar function analysis with Newton's method, and a closed-form update are employed, respectively, to form an efficient implementation.

Finally, we conduct numerical experiments, performing matrix completion on both synthetic and real data. The results on synthetic data show that our algorithm consistently obtains better performance under random initialization and Gaussian noise interference. Additionally, the proposed method does not introduce too much computational cost, which confirms its efficiency. The results on real images show that the proposed method can efficiently restore realistic images and provides a robust estimation under Gaussian noise. Especially at low sampling rates, the image can be restored with a very high peak signal-to-noise ratio and at a faster rate. The method's practical potential is further illustrated on the MovieLens-1M dataset.