1 Introduction

We consider solving a class of tensor convex optimization problems of the form:

$$\begin{aligned} \displaystyle \min _{{\mathcal {X}} \in \Omega } \left( \dfrac{1}{2}\Vert {\mathscr {H}}( {\mathcal {X}}) - {\mathcal {B}} \Vert _F^2 + \mu \Vert | \nabla {\mathcal {X}}\Vert |_{1} \right) , \end{aligned}$$
(1)

where the solution \( {\mathcal {X}} \) and the observation \( {\mathcal {B}} \) are Nth-order tensors, \( {\mathscr {H}}\) is a given linear tensor operator, and the set \( \Omega \) is assumed to be a convex constraint over \( {\mathcal {X}} \). The regularization term consists of the tensorial total variation regularization operator \( \Vert | \nabla {\mathcal {X}}\Vert |_{1} \) and the positive regularization parameter \( \mu \). The norms \( \Vert \;.\;\Vert _F \) and \( \Vert |\;.\;\Vert |_{1} \) will be defined in the next section.

The convex optimization problem (1) has the form of the well-known total variation regularization method. This regularization technique was first introduced in [42] with the explicit goal of preserving sharp discontinuities (edges) in a two-dimensional image while removing noise and other unwanted fine-scale detail. Over the years, this model has been extended to many other image processing tasks [14, 18], including inpainting [31], blind deconvolution and data completion [46]. It has also been modified in a variety of ways to improve its performance [4, 30, 35, 39].

The proposed problem (1) is a constrained multidimensional total variation regularization problem that covers a wide range of application fields, such as color image and video processing. Our contribution in this work is threefold. Firstly, a gradient descent-like algorithm is developed to minimize the non-differentiable and nonlinear total variation problem over a convex set, by first computing the proximal mapping of the total variation term and then projecting onto the constraint set using Tseng’s splitting algorithm [2]. Secondly, since gradient algorithms have a slow convergence rate, we accelerate the proposed algorithm using extrapolation techniques. Finally, the methods are formulated directly in tensor form, which may extend the range of application of our model and the developed algorithm. To the best of our knowledge, the proposed techniques are new.

The paper is organized as follows. In Sect. 2, we review some standard definitions. In Sect. 3, we establish a multidimensional double proximal gradient algorithm applied to the tensorial form of the objective function in the minimization problem (1). In Sect. 4, we propose to accelerate the algorithm using efficient extrapolation techniques. Section 5 is devoted to applying the proposed algorithm to low-rank tensor completion. The performance of the proposed algorithm is tested on color images, video completion and inpainting in Sect. 6. Finally, we state the conclusions in Sect. 7.

2 Notation and background

Tensor algebra has been developed in direct response to the need for higher-dimensional representation and analysis. Keeping the data and the operations in their natural multidimensional form increases the ability of systems to collect and store vast volumes of multifaceted data and also preserves the modeling accuracy.

First, let us recall some preliminaries and notation on tensor algebra. (More details about tensor algebra can be found in [11, 26, 28].) In the remainder of our paper, we adopt the notation used in [26]. We use lowercase letters for vectors, e.g., a , uppercase letters for matrices, e.g., A , and calligraphic letters for tensors, e.g., \({\mathcal {A}}\). Let us denote by \( {\mathbb {X}} \) the space \( \mathrm{I\!R}^{I_{1}\times I_{2} \times \cdots \times I_{N}} \) and by \( {\mathbb {Y}} \) the space \( {\mathbb {X}}^{N}:= {\mathbb {X}}\times {\mathbb {X}}\times \cdots \times {\mathbb {X}}\). Let \( {\mathcal {X}}\in {\mathbb {X}} \) be an Nth-order tensor of size \( I_{1}\times \cdots \times I_{N} \). Its entries are denoted by \( {\mathcal {X}}_{i_1,i_2, \ldots ,i_N} \) or \( {\mathcal {X}}(i_1,i_2, \ldots ,i_N) \). Let us denote by \( {\mathcal {O}} \) the tensor having all its entries equal to zero. The mode-n matricization (also known as mode-n unfolding) of a tensor \( {\mathcal {X}} \) is denoted by the matrix \( \textbf{X }_{(n)} \), and the frontal slices are denoted by the matrices \( X_n \). Let \( k \ge 1 \) be an integer. The norm of the tensor \( {\mathcal {X}} \) is defined by

$$\begin{aligned} \Vert |{\mathcal {X}}\Vert |_{k}= \left( \sum _{ i_1=1}^{I_1}\sum _{ i_2=1}^{I_2} \ldots \sum _{ i_N=1}^{I_N} |{\mathcal {X}}(i_1,i_2, \ldots ,i_N)|^k \right) ^{1/k}. \end{aligned}$$

We will also need the norm \( \Vert |\;.\;\Vert |_{\infty } \) given by

$$\begin{aligned} \Vert |{\mathcal {X}}\Vert |_{\infty }= \max _{\begin{array}{c} { 1 \leqslant i_j \leqslant I_{j}} \\ {1 \leqslant j \leqslant N} \end{array}} |{\mathcal {X}}(i_1,i_2, \ldots ,i_N)|. \end{aligned}$$

The inner product of two same sized tensors \({\mathcal {X}}, {\mathcal {Y}}\in \mathrm{I\!R}^{I_1\times I_2\times \cdots \times I_N}\) is defined by

$$\begin{aligned} \langle {\mathcal {X}} \vert {\mathcal {Y}}\rangle = \sum _{ i_1=1}^{I_1}\sum _{ i_2=1}^{I_2} \ldots \sum _{ i_N=1}^{I_N} {\mathcal {X}}(i_1,i_2, \ldots ,i_N) {\mathcal {Y}}(i_1,i_2, \ldots ,i_N). \end{aligned}$$

It follows immediately that \( \Vert |{\mathcal {X}}\Vert |_{2} =\sqrt{ \langle {\mathcal {X}}\vert {\mathcal {X}}\rangle }\). For the case \( k = 2 \), we use the standard notation of the Frobenius norm \( \Vert \;.\; \Vert _{F} \) to denote the tensor norm \( \Vert |\;.\; \Vert |_{2} \).
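To make the notation concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper) that evaluates these quantities for a small third-order tensor.

```python
import numpy as np

X = np.random.rand(4, 5, 3)      # a small example third-order tensor
Y = np.random.rand(4, 5, 3)

def tensor_k_norm(T, k):
    """|||T|||_k : k-th root of the sum of the |entries|^k."""
    return (np.abs(T) ** k).sum() ** (1.0 / k)

inner = np.sum(X * Y)            # <X | Y>: entrywise product, then sum
fro = np.sqrt(np.sum(X * X))     # ||X||_F = |||X|||_2
sup = np.abs(X).max()            # |||X|||_inf

assert np.isclose(fro, tensor_k_norm(X, 2))
```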

Let \( f: {\mathbb {X}} \longrightarrow \mathrm{I\!R}\cup \{\infty \} \) be a closed proper convex function [40]. We recall the proximal operator of f in the following definition.

Definition 2.1

[40] The proximal operator (also called the proximal mapping) of f is the operator given by

$$\begin{aligned} {prox}_{f}({\mathcal {U}}) = \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {X}}} \left( f({\mathcal {X}} ) + \dfrac{1}{2} \Vert {\mathcal {X}} - {\mathcal {U}}\Vert ^{2} \right) \; \text { for any } {\mathcal {U}} \text { in } {\mathbb {X}}. \end{aligned}$$
(2)

Since the cost function of the minimization problem defined above is strongly convex and not everywhere infinite, there exists a unique minimizer for every \( {\mathcal {U}} \in {\mathbb {X}} \); see [2, 40] for more details. We will often encounter the proximal operator of the scaled function \( \lambda f \) with \( \lambda > 0 \), which can be expressed as

$$\begin{aligned} {prox}_{ \lambda f}({\mathcal {U}}) = \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {X}}} \left( f({\mathcal {X}} ) + \dfrac{1}{2 \lambda } \Vert {\mathcal {X}} - {\mathcal {U}}\Vert ^{2} \right) . \end{aligned}$$
(3)

The operator \( {prox}_{ \lambda f} \) is also called the proximal operator of f with the parameter \( \lambda \).
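As a simple illustration of (3) (a standard example of our own, not taken from [40]), consider \( f = \frac{1}{2}\Vert \;.\;\Vert _F^{2} \). Setting the gradient of the objective in (3) to zero, i.e., \( {\mathcal {X}} + ({\mathcal {X}} - {\mathcal {U}})/\lambda = {\mathcal {O}} \), gives a closed-form proximal operator:

$$\begin{aligned} {prox}_{\lambda f}({\mathcal {U}}) = \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {X}}} \left( \dfrac{1}{2}\Vert {\mathcal {X}}\Vert _F^{2} + \dfrac{1}{2 \lambda } \Vert {\mathcal {X}} - {\mathcal {U}}\Vert _F^{2} \right) = \dfrac{1}{1+\lambda }\, {\mathcal {U}}. \end{aligned}$$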

Definition 2.2

The convex conjugate of f, denoted \( f^{*} \), is given by

$$\begin{aligned} f^{*} ({\mathcal {Y}})= \sup _{ {\mathcal {X}}} \left( \langle {\mathcal {X}} \vert {\mathcal {Y}} \rangle - f({\mathcal {X}} ) \right) ,\; \forall {\mathcal {Y}} \in {\mathbb {X}}. \end{aligned}$$
(4)

3 Tensorial double proximal gradient method for total variation problem

The main goal of this work is the resolution of the constrained tensorial total variation minimization problem (1). It will be useful to consider the following closed proper convex functions

$$\begin{aligned}\begin{array}{l} \begin{array}{r c l} {\mathscr {F}}: {\mathbb {X}} &{}\longrightarrow &{}\mathrm{I\!R}_+ \\ {\mathcal {X}} &{} \longrightarrow &{} {\mathscr {F}}({\mathcal {X}})= \dfrac{1}{2} \Vert {\mathscr {H}} ( {\mathcal {X}})- {\mathcal {B}} \Vert _F^2, \end{array} \\ \begin{array}{r c l} {\mathscr {G}}: {\mathbb {X}} &{}\longrightarrow &{}\mathrm{I\!R}_+ \\ {\mathcal {X}} &{} \longrightarrow &{} {\mathscr {G}}({\mathcal {X}})= \mu \Vert |\nabla {\mathcal {X}} \Vert |_1. \end{array} \end{array} \end{aligned}$$

Note that the gradient operator of an Nth-order tensor \( {\mathcal {X}} \in {\mathbb {X}} \) is defined as a column block tensor \( \nabla {\mathcal {X}} \) in \( {\mathbb {Y}} \) consisting of the partial derivatives \( (\nabla _{(n)} {\mathcal {X}})_{n} \), i.e., \(\nabla {\mathcal {X}} = \begin{pmatrix} \nabla _{(1)} {\mathcal {X}}, \ldots , \nabla _{(N)} {\mathcal {X}} \end{pmatrix}\). For \( n = 1, \ldots , N \), the block tensor \( \nabla _{(n)}{\mathcal {X}} \) is given by

$$\begin{aligned} (\nabla _{(n)}{\mathcal {X}})_{i_1,\ldots , i_N}=\left\{ \begin{array}{ll} {\mathcal {X}}_{i_1,\ldots ,i_n+1,\ldots , i_N}-{\mathcal {X}}_{i_1,\ldots , i_n,\ldots , i_N} &{} \quad \hbox {if} \, i_n<I_n\\ 0 &{} \quad \hbox {if} \, i_n=I_n. \end{array} \right. \end{aligned}$$
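A minimal NumPy sketch of these forward differences and of the adjoint \( \nabla ^T \) used below (the function names are ours; the boundary handling follows the definition above):

```python
import numpy as np

def grad_mode(X, n):
    """(nabla_(n) X): forward difference along mode n, zero on the last slice."""
    D = np.zeros_like(X)
    fwd = [slice(None)] * X.ndim; fwd[n] = slice(1, None)    # indices i_n = 2..I_n
    bck = [slice(None)] * X.ndim; bck[n] = slice(None, -1)   # indices i_n = 1..I_n-1
    D[tuple(bck)] = X[tuple(fwd)] - X[tuple(bck)]
    return D

def grad(X):
    """nabla X = (nabla_(1) X, ..., nabla_(N) X), stacked along a new leading axis."""
    return np.stack([grad_mode(X, n) for n in range(X.ndim)])

def grad_mode_T(P, n):
    """Adjoint of grad_mode along mode n: <grad_mode(X, n) | P> = <X | grad_mode_T(P, n)>."""
    G = np.zeros_like(P)
    fwd = [slice(None)] * P.ndim; fwd[n] = slice(1, None)
    bck = [slice(None)] * P.ndim; bck[n] = slice(None, -1)
    G[tuple(bck)] = -P[tuple(bck)]          # -p_i for i_n = 1..I_n-1
    G[tuple(fwd)] += P[tuple(bck)]          # +p_{i-1} for i_n = 2..I_n
    return G

def grad_T(P):
    """Adjoint of grad: sum of the mode-wise adjoints over the leading axis of P."""
    return sum(grad_mode_T(P[n], n) for n in range(P.shape[0]))
```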

The convex constrained minimization problem (1) can be written as:

$$\begin{aligned} \min _{{\mathcal {X}} \in \Omega } \left( {\mathscr {F}}({\mathcal {X}}) + {\mathscr {G}}({\mathcal {X}})\right) , \end{aligned}$$
(5)

where \( \Omega \) is a convex nonempty bounded set. Since \( {\mathscr {F}} \) and \( {\mathscr {G}} \) are proper lower semicontinuous convex functions, \( {\mathscr {F}} \) is Gâteaux differentiable and uniformly convex on \( {\mathbb {X}} \), and if it is further assumed that \( \Omega \) is closed, then there exists a unique solution of the minimization problem (5); see [2, 13, 15, 21, 41] for a deeper discussion.

As in the classical vectorial case, the function \( {\mathscr {F}} \) is differentiable, and its gradient at \( {\mathcal {X}} \) is given by

$$\begin{aligned} \nabla {\mathscr {F}}({\mathcal {X}}) = {\mathscr {H}}^* ({\mathscr {H}} ( {\mathcal {X}}) - {\mathcal {B}}), \end{aligned}$$
(6)

and the function \( {\mathscr {G}} \) is not differentiable due to the non-differentiability of the \( l^1 \) norm, which makes the resolution of this minimization problem more complex.

3.1 Tensorial double proximal gradient algorithm

In this section, we introduce an extension of the gradient descent method to handle this tensorial convex minimization problem. In the literature, the gradient descent technique has been adapted in a variety of ways to handle different minimization problems, such as nonlinear minimization problems [13], fractional optimization problems [5] and others. The proximal gradient method is a generalization of the gradient descent method to cost functions containing a non-differentiable term [1, 2, 40, 41]. First, we consider the unconstrained minimization problem

$$\begin{aligned} \min _{{\mathcal {X}} \in {\mathbb {X}} } \left( {\mathscr {F}}({\mathcal {X}}) + {\mathscr {G}}({\mathcal {X}})\right) . \end{aligned}$$
(7)

Suppose that, at the step k, we have constructed an iterate tensor \( {\mathcal {X}}_{k} \) that approximates the solution of the constrained minimization problem (5). The quadratic approximation of \( {\mathscr {F}} \) based at the iterate tensor \( {\mathcal {X}}_{k} \), for \( \lambda _{k} > 0 \), is given by

$$\begin{aligned} \Phi _{k}({\mathcal {X}}) = {\mathscr {F}}({\mathcal {X}}_{k}) + \langle {\mathcal {X}} - {\mathcal {X}}_{k} \vert \nabla {\mathscr {F}} ({\mathcal {X}}_k) \rangle + \dfrac{ 1 }{2\lambda _{k}} \Vert {\mathcal {X}} - {\mathcal {X}}_{k} \Vert _{F}^{2}. \end{aligned}$$
(8)

It is then immediate to see that problem (7) is approximated, at each step k, by the following minimization problem

$$\begin{aligned} \min _{{\mathcal {X}}} \left( {\mathscr {G}}({\mathcal {X}}) + \dfrac{1}{2\lambda _{k}} \left\| {\mathcal {X}} - {\mathcal {X}}_{k} + \lambda _{k} \nabla {\mathscr {F}} ({\mathcal {X}}_k) \right\| _{F}^{2} \right) , \end{aligned}$$
(9)

which admits a unique minimizer \( {\mathcal {Z}}_{k} \) given by

$$\begin{aligned} {\mathcal {Z}}_{k}= {prox}_{\lambda _{k} {\mathscr {G}}}\left( {\mathcal {X}}_k - \lambda _{k} \nabla {\mathscr {F}} ({\mathcal {X}}_k)\right) , \;\forall k \in \mathrm{I\!N}, \end{aligned}$$
(10)

where the operator \( {prox}_{\lambda _{k} {\mathscr {G}}} \) denotes the proximal mapping of \( {\mathscr {G}} \) with the parameter \( \lambda _{k} \).

In general, computing \( {\mathcal {Z}}_{k} \) by (10) requires two essential ingredients. The first one is a suitable selection of the step size sequence \( (\lambda _k)_k \), which depends on the Lipschitz constant of \( \nabla {\mathscr {F}} \) (to be discussed later), and the second one is the computation of the proximal operator of \( \lambda _{k} {\mathscr {G}} \), which is given in the following proposition.

Proposition 3.1

For all \( {\mathcal {Y}} \in {\mathbb {X}} \) and \( \lambda >0 \), the proximal operator of \( \lambda {\mathscr {G}} \) is given by

$$\begin{aligned} {\mathcal {Z}} = {prox}_{\lambda {\mathscr {G}} }({\mathcal {Y}})= {\mathcal {Y}}+ \lambda \nabla ^T({\mathcal {P}}^*), \end{aligned}$$
(11)

where \( {\mathcal {P}}^* \) is an optimal solution of

$$\begin{aligned} \displaystyle \min _{{\mathcal {P}}} \left( {\mathscr {G}}_1^*(-{\mathcal {P}})+ \langle \frac{\lambda }{2} \nabla \left( \nabla ^T {\mathcal {P}}\right) +\nabla {\mathcal {Y}} \vert {\mathcal {P}} \rangle \right) , \end{aligned}$$
(12)

with the function \( {\mathscr {G}}_1^* \) being the conjugate function of \( {\mathscr {G}}_1:= \mu \Vert |.\;\Vert |_1 \).

Proof

By Definition 2.1, for all \( {\mathcal {Y}} \) and \( \lambda >0 \), \( prox_{\lambda {\mathscr {G}} }({\mathcal {Y}}) \) is the unique optimal solution of the following unconstrained minimization problem

$$\begin{aligned} \min _{{\mathcal {U}}} \left( {\mathscr {G}}({\mathcal {U}}) + \dfrac{1}{2\lambda } \Vert {\mathcal {U}}- {\mathcal {Y}}\Vert _F^2 \right) . \end{aligned}$$
(13)

Writing \( {\mathscr {G}}({\mathcal {U}})= {\mathscr {G}}_1 (\nabla {\mathcal {U}}) \), with \( {\mathscr {G}}_1({\mathcal {Y}})= \mu \Vert | {\mathcal {Y}} \Vert |_1 \), the minimization problem can be transformed into a constrained minimization problem as follows

$$\begin{aligned} \left\{ \begin{array}{c} \displaystyle \min _{{\mathcal {U}}, {\mathcal {V}}} \left( {\mathscr {G}}_1({\mathcal {V}}) + \dfrac{1}{2\lambda } \Vert {\mathcal {U}}- {\mathcal {Y}}\Vert _F^2 \right) \\ s.t \qquad {\mathcal {V}}= \nabla {\mathcal {U}}. \end{array} \right. \end{aligned}$$
(14)

The Lagrangian function associated with this problem is defined as

$$\begin{aligned} {\mathscr {L}}({\mathcal {U}}, {\mathcal {V}}, {\mathcal {P}}) = {\mathscr {G}}_1({\mathcal {V}}) + \dfrac{1}{2\lambda } \Vert {\mathcal {U}}- {\mathcal {Y}}\Vert _F^2 + \langle {\mathcal {P}} | {\mathcal {V}}-\nabla {\mathcal {U}} \rangle . \end{aligned}$$
(15)

As a consequence, the solution of (14) is exactly the saddle point of \( {\mathscr {L}} \) (see [20]), which is the solution of the Lagrangian primal problem

$$\begin{aligned} \min _{{\mathcal {U}}, {\mathcal {V}}} \max _{{\mathcal {P}}} {\mathscr {L}}({\mathcal {U}}, {\mathcal {V}}, {\mathcal {P}}). \end{aligned}$$
(16)

Since the Lagrangian is separable with respect to \( {\mathcal {U}} \) and \( {\mathcal {V}} \), we may switch the min-max order based on the min-max theorem [15, 41]. As a consequence, the Lagrangian dual problem can be written as

$$\begin{aligned} \displaystyle \max _{{\mathcal {P}}} \left[ \min _{{\mathcal {U}}}\left( \dfrac{1}{2\lambda } \Vert {\mathcal {U}}- {\mathcal {Y}}\Vert _F^2 - \langle {\mathcal {P}} |\nabla {\mathcal {U}} \rangle \right) + \min _{{\mathcal {V}}} \left( {\mathscr {G}}_1({\mathcal {V}})+ \langle {\mathcal {P}} | {\mathcal {V}}\rangle \right) \right] . \end{aligned}$$
(17)

On the one hand, it is clear that the minimizer of the problem in \( {\mathcal {U}} \) is given by \( {\mathcal {U}}^*= {\mathcal {Y}} + \lambda \nabla ^T{\mathcal {P}} \) with a corresponding optimal value equal to:

$$\begin{aligned} \min _{{\mathcal {U}}} \left( \dfrac{1}{2\lambda } \Vert {\mathcal {U}}- {\mathcal {Y}}\Vert _F^2 - \langle {\mathcal {P}} |\nabla {\mathcal {U}} \rangle \right)&= \dfrac{1}{2\lambda } \Vert {\mathcal {U}}^*- {\mathcal {Y}}\Vert _F^2 - \left\langle {\mathcal {P}}|\nabla {\mathcal {U}}^* \right\rangle \nonumber \\ {}&= - \langle \frac{ \lambda }{2 } \nabla (\nabla ^T {\mathcal {P}}) +\nabla {\mathcal {Y}} | {\mathcal {P}}\rangle . \end{aligned}$$
(18)

On the other hand, the second minimization problem verifies

$$\begin{aligned} \min _{{\mathcal {V}}} \left( {\mathscr {G}}_1({\mathcal {V}})+ \langle {\mathcal {P}} | {\mathcal {V}}\rangle \right) = - \max _{{\mathcal {V}}} \left( \langle -{\mathcal {P}} | {\mathcal {V}}\rangle - {\mathscr {G}}_1({\mathcal {V}}) \right) = - {\mathscr {G}}_1^*(-{\mathcal {P}}), \end{aligned}$$
(19)

where we recall that \( {\mathscr {G}}_1^* \) is the convex conjugate function of \( {\mathscr {G}}_1 \). As a result, we obtain the following dual problem

$$\begin{aligned} \displaystyle \max _{{\mathcal {P}}} \left[ - {\mathscr {G}}_1^* \left( -{\mathcal {P}} \right) - \langle \frac{ \lambda }{2 } \nabla (\nabla ^T {\mathcal {P}}) +\nabla {\mathcal {Y}} | {\mathcal {P}} \rangle \right] \end{aligned}$$
(20)

that can be rewritten as the minimization problem (12). \(\square \)

So far, we have shown that \( {prox}_{\lambda {\mathscr {G}} }({\mathcal {Y}})= {\mathcal {Y}}+ \lambda \nabla ^T{\mathcal {P}}^{*} \), where \( {\mathcal {P}}^{*} \) is an optimal solution of the minimization problem (12). In other words, the calculation of the proximal mapping of the function \( {\mathscr {G}} \) requires, at each iteration k, the resolution of the minimization problem (12). For \( {\mathcal {Y}} = {\mathcal {Y}}_{k}:= {\mathcal {X}}_{k} - \lambda _{k} \nabla {\mathscr {F}} ({\mathcal {X}}_k) \) and \( \lambda = \lambda _{k} \), at step k, we have to solve the problem

$$\begin{aligned} \displaystyle \min _{{\mathcal {P}}} \left( {\mathscr {G}}_1^*(-{\mathcal {P}})+ \langle \frac{ \lambda _{k} }{2 } \nabla (\nabla ^T {\mathcal {P}}) +\nabla {\mathcal {Y}}_k | {\mathcal {P}} \rangle \right) . \end{aligned}$$
(21)

For all \( k \in \mathrm{I\!N}\), let us consider the operators \( {\mathscr {K}} \) and \( ({\mathscr {D}}_k)_k \) defined as

$$\begin{aligned} \begin{array}{l} \begin{array}{r c l} {\mathscr {K}}: {\mathbb {Y}} &{}\longrightarrow &{}\mathrm{I\!R}_+ \\ {\mathcal {P}} &{} \longrightarrow &{} {\mathscr {K}} ({\mathcal {P}})= {\mathscr {G}}_1^*(-{\mathcal {P}}), \end{array} \\ \begin{array}{r c l} {\mathscr {D}}_k: {\mathbb {Y}} &{}\longrightarrow &{}\mathrm{I\!R}\\ {\mathcal {P}} &{} \longrightarrow &{} {\mathscr {D}}_k({\mathcal {P}})=\langle \dfrac{ \lambda _{k} }{2 } \nabla (\nabla ^T {\mathcal {P}}) +\nabla {\mathcal {Y}}_k | {\mathcal {P}} \rangle . \end{array} \end{array} \end{aligned}$$

Since the objective functional of the minimization problem (21) is the sum of the closed proper convex function \( {\mathscr {K}} \) and the closed proper convex differentiable function \( {\mathscr {D}}_k \), we can again use the proximal gradient approach to solve (21). Hence, the solution can be approximated by the sequence \( ({\mathcal {P}}_l)_l \) defined as

$$\begin{aligned} \forall l \in \mathrm{I\!N}, \quad {\mathcal {P}}_{l+1}= {prox}_{\alpha _l {\mathscr {K}} }({\mathcal {P}}_l-\alpha _l \nabla {\mathscr {D}}_k({\mathcal {P}}_l)), \end{aligned}$$
(22)

with \( \alpha _l >0 \) being a step size parameter. Notice that the proximal gradient iteration (22) requires two important ingredients: the gradient of the differentiable function \( {\mathscr {D}}_k \) and the proximal mapping of the non-differentiable function \( {\mathscr {K}} \). It is immediate to see that the gradient of \( {\mathscr {D}}_k \) is given by:

$$\begin{aligned} \forall k, \quad \nabla {\mathscr {D}}_k({\mathcal {P}}) = \lambda _{k} \nabla (\nabla ^T {\mathcal {P}}) +\nabla {\mathcal {Y}}_k. \end{aligned}$$
(23)

On the other hand, the proximal mapping \( {prox}_{\alpha _l {\mathscr {K}} } \) is discussed in the following proposition.

Proposition 3.2

For all \( {\mathcal {P}} \in {\mathbb {Y}} \), the proximal mapping of \( \alpha _l {\mathscr {K}} \) is given by

$$\begin{aligned} {prox}_{\alpha _l {\mathscr {K}} }({\mathcal {P}})= {\mathcal {P}}+ {prox}_{\alpha _l {\mathscr {G}}_1}(-{\mathcal {P}}), \; \forall l \in \mathrm{I\!N}, \end{aligned}$$
(24)

where \( {\mathscr {G}}_1:= \mu \Vert |.\;\Vert |_1 \).

Proof

For any \( {\mathcal {P}} \in {\mathbb {Y}} \), we have

$$\begin{aligned} {prox}_{\alpha _l {\mathscr {K}} }({\mathcal {P}})&= \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {W}}} \left( {\mathscr {K}} ({\mathcal {W}}) + \frac{1}{2 \alpha _l} \Vert {\mathcal {W}}- {\mathcal {P}}\Vert _F^2 \right) , \\&= \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {W}}} \left( {\mathscr {G}}_1^*(-{\mathcal {W}}) + \frac{1}{2 \alpha _l} \Vert {\mathcal {W}}- {\mathcal {P}}\Vert _F^2 \right) , \\&= - \mathop {{{\,\textrm{argmin}\,}}}\limits _{{\mathcal {V}}} \left( {\mathscr {G}}_1^*({\mathcal {V}}) + \frac{1}{2 \alpha _l} \Vert {\mathcal {V}}+ {\mathcal {P}}\Vert _F^2 \right) , \\&=-{prox}_{\alpha _l{\mathscr {G}}_1^* } (-{\mathcal {P}}). \end{aligned}$$

According to the Moreau decomposition [37], we have the following property, which relates the proximal operator of any proper closed convex function f to that of its conjugate

$$\begin{aligned} {prox}_f(x) + {prox}_{f^*}(x)=x,\quad \forall x. \end{aligned}$$
(25)

Using the relation (25), we obtain the desired conclusion

$$\begin{aligned} {prox}_{\alpha _l {\mathscr {K}} }({\mathcal {P}})= {\mathcal {P}}+ {prox}_{\alpha _l {\mathscr {G}}_1}(-{\mathcal {P}}). \end{aligned}$$
(26)

Furthermore, the tensorial proximal mapping \(prox_{\alpha _l {\mathscr {G}}_1}:= {prox}_{\alpha _l \mu \Vert |\;.\;\Vert |_1 }\) of the function \( {\mathscr {G}}_1 \) follows directly from the proximal operator of the \( \Vert |\;.\;\Vert |_{1} \) norm, known as the soft thresholding operator in the vector case [40]. Using Moreau’s formula (25) and the fact that the \( \Vert |\;.\; \Vert |_{\infty } \) norm is the dual norm of the \( \Vert |\;.\; \Vert |_{1} \) norm, the proximal operator \( {prox}_{ \eta \Vert |\;.\;\Vert |_1} \), for any \( \eta > 0 \), can be computed from the orthogonal projection onto the \( \Vert |\;.\; \Vert |_{\infty } \)-unit ball [40], which is the unit box. This leads to

$$\begin{aligned} \left( {prox}_{\eta \Vert |.\;\Vert |_1}({\mathcal {P}})\right) _{i_{1}, \ldots , i_{N}} =\left\{ \begin{array}{ll} {\mathcal {P}}_{i_{1}, \ldots , i_{N}}-\eta , &{} \quad {\mathcal {P}}_{i_{1}, \ldots , i_{N}} \geqslant \eta ,\\ 0, &{} \quad |{\mathcal {P}}_{i_{1}, \ldots , i_{N}}|< \eta ,\\ {\mathcal {P}}_{i_{1}, \ldots , i_{N}} + \eta , &{} \quad {\mathcal {P}}_{i_{1}, \ldots , i_{N}} \leqslant -\eta , \end{array} \right. \end{aligned}$$
(27)

which establishes the formula and ends the proof. \(\square \)
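In NumPy, (27) is the usual entrywise soft-thresholding, and (24) then follows in one line; here is a minimal sketch with our own function names:

```python
import numpy as np

def soft_threshold(P, eta):
    """Entrywise prox of eta * |||.|||_1, cf. (27)."""
    return np.sign(P) * np.maximum(np.abs(P) - eta, 0.0)

def prox_K(P, eta):
    """Prox of alpha_l * K via (24), with eta = alpha_l * mu."""
    return P + soft_threshold(-P, eta)
```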

As a result, the scheme (10) computing the iterate \( {\mathcal {Z}}_{k} \) can be summarized as the following double iterative algorithm

$$\begin{aligned} \forall k \in \mathrm{I\!N}, \; \left\{ \begin{array}{lll} \forall l = 1, \ldots , l_{k}, \; \left\{ \begin{array}{lll} {\mathcal {Q}}_l= {\mathcal {P}}_{l} - \alpha _l \nabla {\mathscr {D}}_k({\mathcal {P}}_{l}), \\ {\mathcal {P}}_{l+1} = {\mathcal {Q}}_l + {prox}_{\alpha _l {\mathscr {G}}_1}(-{\mathcal {Q}}_l), \end{array} \right. \\ {\mathcal {Y}}_k={\mathcal {X}}_k - \lambda _k \nabla {\mathscr {F}} ({\mathcal {X}}_k),\\ {\mathcal {Z}}_{k}={\mathcal {Y}}_k + \lambda _k \nabla ^T ({\mathcal {P}}_{l_{k}+1}), \end{array} \right. \end{aligned}$$
(28)

where the step size scalar sequences \( (\lambda _k)_{k} \) and \( (\alpha _l)_{l}\) depend on the Lipschitz constants \( L(\nabla {\mathscr {F}} ) \) and \( L( \nabla {\mathscr {D}}_k) \), respectively, when they exist.
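A minimal NumPy sketch of the inner loop of (28), which approximates \( {prox}_{\lambda {\mathscr {G}}} \); it assumes the helper functions grad, grad_T and soft_threshold from the earlier sketches and, for simplicity, a fixed inner step size \( \alpha \) and a fixed number of inner iterations:

```python
import numpy as np

def prox_tv(Y, lam, mu, alpha, n_inner):
    """Approximate prox_{lam*G}(Y), G = mu*|||grad(.)|||_1, via the dual
    proximal gradient iteration of (28) with a fixed inner step size."""
    gY = grad(Y)                                        # nabla Y, computed once
    P = np.zeros_like(gY)                               # dual (block) variable
    for _ in range(n_inner):
        Q = P - alpha * (lam * grad(grad_T(P)) + gY)    # gradient step on D_k, cf. (23)
        P = Q + soft_threshold(-Q, alpha * mu)          # prox of alpha_l*K, cf. (24)
    return Y + lam * grad_T(P)                          # Z = Y + lam * grad^T(P), cf. (11)
```

Given the current iterate \( {\mathcal {X}}_k \), one would call prox_tv on \( {\mathcal {Y}}_k = {\mathcal {X}}_k - \lambda _k \nabla {\mathscr {F}}({\mathcal {X}}_k) \) to obtain \( {\mathcal {Z}}_k \).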

In the following subsection, we compute the approximate solution \( {\mathcal {X}}_{k+1} \) of the constrained minimization problem (5) at step \( k+1 \) by projecting the iterate \( {\mathcal {Z}}_{k} \) onto the convex set \( \Omega \) using Tseng’s splitting approach.

3.2 Tseng’s splitting algorithm

Tseng’s splitting algorithm, proposed in [2], can be considered as a modified version of the proximal gradient scheme (10) designed to handle the constrained convex non-smooth optimization problem. For a nonempty closed convex constraint set \( \Omega \), the algorithm may be written as follows:

$$\begin{aligned} \forall k \in \mathrm{I\!N}, \quad \left\{ \begin{array}{l} {\mathcal {Y}}_k={\mathcal {X}}_k-\lambda _{k} \nabla {\mathscr {F}}({\mathcal {X}}_k), \\ {\mathcal {Z}}_k={prox}_{\lambda _{k} {\mathscr {G}}}({\mathcal {Y}}_k),\\ {\mathcal {R}}_k={\mathcal {Z}}_k-\lambda _{k} \nabla {\mathscr {F}}({\mathcal {Z}}_k), \\ {\mathcal {X}}_{k+1}=\Pi _{\Omega }({\mathcal {X}}_k-{\mathcal {Y}}_k+{\mathcal {R}}_k), \end{array} \right. \end{aligned}$$
(29)

where \( \Pi _{\Omega } \) denotes the orthogonal projection on the convex set \( \Omega \).
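A minimal sketch of one pass of (29); here grad_F implements (6), prox_G stands for any approximation of \( {prox}_{\lambda {\mathscr {G}}} \) (for instance, a wrapper around the prox_tv sketch above with \( \mu \), \( \alpha \) and the inner iteration count fixed), and proj_Omega is the projection onto \( \Omega \):

```python
def tseng_step(X, lam, grad_F, prox_G, proj_Omega):
    """One iteration of Tseng's splitting scheme (29)."""
    Y = X - lam * grad_F(X)
    Z = prox_G(Y, lam)
    R = Z - lam * grad_F(Z)
    return proj_Omega(X - Y + R)
```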

Theorem 3.1

Let \( {\mathcal {X}}^{*} \) denote the unique solution of the problem (5). Suppose, furthermore, that \( \Omega \) is a closed set in \( {\mathbb {X}} \). Then, the sequence \( ({\mathcal {X}}_{k})_{k} \) generated by the scheme (29) satisfies \( \; \Vert {\mathcal {X}}_{k} - {\mathcal {X}}^{*}\Vert _{F} \longrightarrow 0 \) as \( k \longrightarrow \infty . \)

Proof

As the functionals \( {\mathscr {F}} \) and \( {\mathscr {G}} \) are proper lower semicontinuous convex functions, \( {\mathscr {F}} \) is Gâteaux differentiable and uniformly convex on \( {\mathbb {X}} \), and the set \( \Omega \) is a closed convex nonempty subset of \( {\mathbb {X}} \), the strong convergence of the sequence \( ({\mathcal {X}}_{k})_{k} \) is an immediate consequence of the general result in [2] (see Proposition 27.13, p. 407). \(\square \)

3.3 The step size parameters selection

The choice of the step size parameters is a key condition ensuring the convergence of the sequence \( ({\mathcal {X}}_k)_{k} \) to the minimizer of the problem (5). The step size parameters \( \lambda _{k} \) and \( \alpha _{l} \) are required to lie in the intervals \( (0,\; \frac{1}{L(\nabla {\mathscr {F}}) }) \) and \( (0, \; \frac{1}{L(\nabla {\mathscr {D}}_{k})}) \), respectively, where \( L(\nabla {\mathscr {F}}) \) and \( L(\nabla {\mathscr {D}}_{k}) \) denote the Lipschitz constants of the operators \( \nabla {\mathscr {F}} \) and \( \nabla {\mathscr {D}}_k \), respectively (see [2, 13] for further details).

For every pair \( ({\mathcal {X}}, {\mathcal {Y}}) \) in \( {\mathbb {X}}\times {\mathbb {X}}\), we have

$$\begin{aligned} \Vert \nabla {\mathscr {F}}({\mathcal {X}})- \nabla {\mathscr {F}}({\mathcal {Y}})\Vert _F&= 2 \left\| {\mathscr {H}}^T( {\mathscr {H}} ( {\mathcal {X}}) )- {\mathscr {H}}^T ({\mathscr {H}} ( {\mathcal {Y}})) \right\| _F, \\&= 2 \left\| {\mathscr {H}}^T ({\mathscr {H}} ({\mathcal {X}} -{\mathcal {Y}})) \right\| _F, \\&\leqslant 2 \left\| \left| {\mathscr {H}}^T \circ {\mathscr {H}} \right\| \right| . \Vert {\mathcal {X}} -{\mathcal {Y}}\Vert _F , \end{aligned}$$

where \( \circ \) stands for the composition operation. Then, we may choose as a Lipschitz constant of the operator \( \nabla {\mathscr {F}} \) the constant

$$\begin{aligned} L(\nabla {\mathscr {F}})= 2 \big \Vert \big | {\mathscr {H}}^T \circ {\mathscr {H}} \big \Vert \big |. \end{aligned}$$
(30)

As a consequence, the step sizes \( (\lambda _k)_k \) can be chosen as a fixed value \( \lambda _k:= \lambda \in (0, \frac{1}{L(\nabla {\mathscr {F}})} )\).

In the case of the operator \( \nabla {\mathscr {D}}_k \), for a fixed step k, a Lipschitz constant \( L(\nabla {\mathscr {D}}_{k}) \) is not known; the step sizes \( (\alpha _l ) \) can then be found by a line search method [40], which means that we apply the proximal gradient method with a simple backtracking step size rule as follows

$$\begin{aligned} \forall l \in \mathrm{I\!N}, \quad \alpha _l = \tau \alpha _{l-1}, \end{aligned}$$
(31)

where the scalar \( \tau > 0 \) is a line search parameter.

4 Accelerated tensorial double proximal gradient algorithms for total variation problem

It is well known that the proximal gradient algorithm suffers from a slow convergence rate. In this section, we present an accelerated version of the proximal gradient algorithm, which consists in adding an extrapolation step to the algorithm in order to compute the solution in fewer steps than the basic proximal gradient method. A large amount of research has been devoted to extrapolation algorithms applied to a variety of general problems [7, 9, 24, 33, 43], as well as to accelerated variants of the proximal gradient method [3, 38].

Definition 4.1

Let \( ({\mathcal {X}}_k )_{k}\) and \( ({\mathcal {T}}_k)_{k} \) be two sequences converging to the same limit \( {\mathcal {X}}^{*} \). We say that \( ({\mathcal {T}}_k )\) converges faster than the sequence \( ({\mathcal {X}}_k )\) if

$$\begin{aligned} \lim _{ k \longrightarrow \infty } \dfrac{\Vert {\mathcal {T}}_k - {\mathcal {X}}^*\Vert }{\Vert {\mathcal {X}}_k - {\mathcal {X}}^* \Vert } = 0. \end{aligned}$$
(32)

The goal of extrapolation is to build, from the sequence \( ({\mathcal {X}}_k)_{k} \), a sequence \( ({\mathcal {T}}_k)_{k} \) that converges faster to the same limit as \( ({\mathcal {X}}_k)_{k} \). There are many extrapolation methods in the literature, but we will only be interested in applying Nesterov's approach and the polynomial extrapolation methods to our tensorial nonlinear minimization problem.

4.1 The tensorial Nesterov acceleration techniques

One simple and widely studied strategy is to perform extrapolation in the spirit of Nesterov’s extrapolation techniques [29, 38]. The basic idea of this technique is to make use of historical information at each iteration in order to improve the convergence rate from O(1/k) to \( O(1/k^2) \). Thus, using the current and previous iteration tensors, the tensorial double proximal gradient method is accelerated by adding an extrapolation step given by

$$\begin{aligned} {\mathcal {T}}_{k+1} = {\mathcal {X}}_k + \left( \dfrac{t_{k}-1}{t_{k+1}} \right) ({\mathcal {X}}_k-{\mathcal {X}}_{k-1}), \end{aligned}$$
(33)

where the scalar sequence \( (t_{k}) \) is given, at each step k, by \( t_{k+1}=\dfrac{\sqrt{4t_k^2 + 1}+1}{2}\).

The convergence of the sequence \( ({\mathcal {T}}_k)_{k\in \mathrm{I\!N}} \) may be investigated by following the approaches given in several papers, e.g., [12, 13]. The complete accelerated tensorial double proximal gradient method with Nesterov’s extrapolation is summarized in Algorithm 1.

Algorithm 1 Accelerated tensorial double proximal gradient (TDPG) method with Nesterov’s extrapolation
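To fix ideas, the following sketch shows how the extrapolation step (33) can be wrapped around the Tseng iteration of Sect. 3.2. This is a FISTA-style interleaving written with our own function names (tseng_step is the earlier sketch), and the exact ordering of the steps in Algorithm 1 may differ in minor details.

```python
import numpy as np

def tdpg_nesterov(X0, lam, grad_F, prox_G, proj_Omega, n_iter):
    """Accelerated TDPG sketch: extrapolate with (33), then apply one Tseng step (29)."""
    X_prev, X, t = X0.copy(), X0.copy(), 1.0
    for _ in range(n_iter):
        t_next = (np.sqrt(4.0 * t * t + 1.0) + 1.0) / 2.0
        T = X + ((t - 1.0) / t_next) * (X - X_prev)      # extrapolation step (33)
        X_prev, X = X, tseng_step(T, lam, grad_F, prox_G, proj_Omega)
        t = t_next
    return X
```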

4.2 The global tensorial polynomial extrapolation methods

The polynomial extrapolation methods are among the best-known extrapolation methods thanks to their theoretical clarity and numerical efficiency, especially when applied to nonlinear problems such as our main problem (1). They were introduced in [43] for the vector case and further developed in [6, 23, 24] using efficient implementation techniques. These methods were also extended to a global matrix form in [22] and were recently generalized to tensor sequences in [16] using the tensor product.

In the spirit of [24], we define the transformation in the following form

$$\begin{aligned} {\mathcal {T}}_{k,q} = \sum _{j = 0}^{q} \gamma _{j}{\mathcal {X}}_{k+j}, \end{aligned}$$
(34)

where k indicates the first term of the sequence, the integer q determines the number of terms used, and the scalars \( (\gamma _{j}) \) satisfy the following two conditions

$$\begin{aligned} \left\{ \begin{array}{lrc} \displaystyle \sum _{j = 0}^{q} \gamma _{j} = 1\;, \\ \displaystyle \sum _{j = 0}^{q} \gamma _{j} \langle {\mathcal {V}}_{i} \vert \Delta {\mathcal {X}}_{k + j}\rangle = 0,\; i = 0,1, \ldots , q-1, \; \end{array}\right. \end{aligned}$$
(35)

where \( \Delta {\mathcal {X}}_{j} = {\mathcal {X}}_{j+1} - {\mathcal {X}}_{j} \) and \( {\mathcal {V}}_{i} \) are given tensors.

As a consequence, the conditions (35) lead to the following linear system

$$\begin{aligned} \left\{ \begin{array}{llll} \gamma _{0} &{} \quad + \gamma _{1} &{} \quad + \cdots + \gamma _{q} &{} \quad = 1 \\ \gamma _{0} \langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k}\rangle &{} \quad + \gamma _{1} \langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k+1}\rangle &{} \quad + \cdots + \gamma _{q}\langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k+q}\rangle &{} \quad = 0 \\ \gamma _{0} \langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k}\rangle &{} \quad + \gamma _{1} \langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k+1}\rangle &{} \quad + \cdots + \gamma _{q}\langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k+q}\rangle &{} \quad =0 \\ \qquad \vdots &{} \quad \qquad \vdots &{} \quad \qquad \vdots &{} \quad \quad \vdots \\ \gamma _{0} \langle {\mathcal {V}}_{q-1} \vert \Delta {\mathcal {X}}_{k}\rangle &{} \quad + \gamma _{1} \langle {\mathcal {V}}_{q-1}, \Delta {\mathcal {X}}_{k+1}\rangle &{} \quad + \cdots + \gamma _{q}\langle {\mathcal {V}}_{q-1} \vert \Delta {\mathcal {X}}_{k+q}\rangle &{} \quad = 0 \end{array} \right. \end{aligned}$$
(36)

that is, the vector \( \varvec{\gamma }= [\gamma _{0}, \gamma _{1}, \ldots , \gamma _{q}]^{T}\) is the solution of the matrix equation:

$$\begin{aligned} \underbrace{ \begin{pmatrix} 1 &{} \quad 1 &{} \quad \cdots &{}\quad 1 \\ \langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k}\rangle &{}\quad \langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k+1}\rangle &{}\quad \cdots &{}\quad \langle {\mathcal {V}}_{0} \vert \Delta {\mathcal {X}}_{k+q}\rangle \\ \langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k}\rangle &{}\quad \langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k+1}\rangle &{}\quad \cdots &{} \quad \langle {\mathcal {V}}_{1} \vert \Delta {\mathcal {X}}_{k+q}\rangle \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \vdots \\ \langle {\mathcal {V}}_{q-1} \vert \Delta {\mathcal {X}}_{k}\rangle &{}\quad \langle {\mathcal {V}}_{q-1} \vert \Delta {\mathcal {X}}_{k+1}\rangle &{}\quad \cdots &{}\quad \langle {\mathcal {V}}_{q-1} \vert \Delta {\mathcal {X}}_{k+q}\rangle \end{pmatrix} }_{M} \begin{pmatrix} \gamma _{0}\\ \gamma _{1}\\ \gamma _{2}\\ \vdots \\ \gamma _{q} \end{pmatrix}&= \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} . \end{aligned}$$
(37)

It is clear that \( {\mathcal {T}}_{k,q} \) exists and is unique if and only if the square matrix M is nonsingular. In this work, we consider two polynomial extrapolation methods, the global tensor minimal polynomial extrapolation (GT-MPE), where the sequence \( {\mathcal {V}}_{i} \) is defined as

$$\begin{aligned} {\mathcal {V}}_{i} = \Delta {\mathcal {X}}_{i + k}, \end{aligned}$$
(38)

and the global tensor reduced rank extrapolation method (GT-RRE) with

$$\begin{aligned} {\mathcal {V}}_{i} = \Delta ^{2} {\mathcal {X}}_{i + k}= \Delta {\mathcal {X}}_{i + k + 1} - \Delta {\mathcal {X}}_{i + k}. \end{aligned}$$
(39)

Note that there is an integer \( q_0 \), such that \( \left\{ \Delta {\mathcal {X}}_{0}, \Delta {\mathcal {X}}_{1}, \ldots , \Delta {\mathcal {X}}_{q_0-1} \right\} \) is a linearly independent set, but \( \left\{ \Delta {\mathcal {X}}_{0}, \Delta {\mathcal {X}}_{1}, \ldots , \Delta {\mathcal {X}}_{q_0-1}, \Delta {\mathcal {X}}_{q_0} \right\} \) is not. As a consequence, under the condition \( q < q_0 \), both GT-MPE and GT-RRE produce approximations \( {\mathcal {T}}_{k,q} \) of the solution \( {\mathcal {X}}^{*} \) in the form (34). For more details, we refer the reader to [24, 43].
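A minimal NumPy sketch of the coefficient computation: it builds the matrix M of (37) from the inner products \( \langle {\mathcal {V}}_i \vert \Delta {\mathcal {X}}_{k+j}\rangle \), solves the system by a direct method (as in Remark 4.1 below), and returns \( {\mathcal {T}}_{k,q} \); the function name is ours.

```python
import numpy as np

def gt_extrapolate(X_seq, method="RRE"):
    """Polynomial extrapolation (34) from the tensors X_k, ..., X_{k+q+1}
    (a list of q+2 same-sized arrays); returns T_{k,q}."""
    dX = [X_seq[j + 1] - X_seq[j] for j in range(len(X_seq) - 1)]  # Delta X_{k+j}, j = 0..q
    q = len(dX) - 1
    if method == "MPE":
        V = dX[:q]                                   # V_i = Delta X_{i+k}, cf. (38)
    else:
        V = [dX[i + 1] - dX[i] for i in range(q)]    # V_i = Delta^2 X_{i+k}, cf. (39)
    M = np.ones((q + 1, q + 1))                      # first row of ones, cf. (37)
    for i in range(q):
        for j in range(q + 1):
            M[i + 1, j] = np.sum(V[i] * dX[j])       # <V_i | Delta X_{k+j}>
    rhs = np.zeros(q + 1); rhs[0] = 1.0
    gamma = np.linalg.solve(M, rhs)
    return sum(g * X for g, X in zip(gamma, X_seq))  # T_{k,q} = sum_j gamma_j X_{k+j}
```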

The process of polynomial extrapolation using GT-MPE or GT-RRE is summarized in Algorithm 2.

Algorithm 2 Global tensor polynomial extrapolation (GT-MPE / GT-RRE)

For the GT-MPE and GT-RRE methods, the number of calculations required increases quadratically with the number of iterations q, and the storage cost increases linearly. A good way to keep both the storage and the computational costs as low as possible is to restart these algorithms periodically. The following algorithm describes the restarted version of these methods, also called cycling mode.

Algorithm 3 Restarted (cycling-mode) GT-MPE / GT-RRE

Remark 4.1

In the polynomial vector extrapolation case, a numerically stable and computationally economical algorithm for computing the \( (\gamma _{i})_{i} \) via QR factorization was proposed in [24, 43]. The same concept was developed for the tensor case in [16] by defining a QR factorization of a tensor \( {\mathcal {U}}_{k} \) containing the sequence \( (\Delta {\mathcal {X}}_{i}) \) as frontal slices. In our situation, however, we simply use a direct method for solving the matrix Eq. (37).

The accelerated version of the tensorial double proximal gradient algorithm using the polynomial method is summarized in Algorithm 4.

Algorithm 4 Accelerated tensorial double proximal gradient method with polynomial extrapolation
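As a rough illustration of the cycling mode combining the pieces above (our own assembly; the parameter handling in Algorithms 3 and 4 may differ), each cycle runs a few Tseng iterations, extrapolates them with gt_extrapolate and restarts from the extrapolated tensor:

```python
def tdpg_poly(X0, lam, grad_F, prox_G, proj_Omega, q, n_cycles, method="RRE"):
    """Restarted (cycling-mode) polynomial acceleration of the Tseng iteration."""
    X = X0
    for _ in range(n_cycles):
        seq = [X]
        for _ in range(q + 1):                       # produce X_k, ..., X_{k+q+1}
            seq.append(tseng_step(seq[-1], lam, grad_F, prox_G, proj_Omega))
        X = gt_extrapolate(seq, method)              # form T_{k,q}, then restart from it
    return X
```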

5 Application in low-rank tensor completion

In this section, we apply the proposed algorithm to tensor completion problems. Completion is a technique for filling in the missing elements of incomplete data using the values of the available elements and structural assumptions on the data. Let us consider an Nth-order tensor \( {\mathcal {G}} \) with observed entries indexed by the set C, i.e., \( C = \{ (i_{1}, i_{2}, \ldots ,i_{N}): {\mathcal {G}}_{i_{1}, i_{2}, \ldots ,i_{N}} \) is observed \( \} \). Following the definition of Candes and Tao in [8], we define, in tensor form, the projection \( P_C({\mathcal {X}}) \) to be the Nth-order tensor with the observed elements of \( {\mathcal {X}} \) preserved and the missing entries replaced with 0, namely

$$\begin{aligned} P_C({\mathcal {X}})=\left\{ \begin{array}{l l l} {\mathcal {X}}_{i_{1}, i_{2}, \ldots , i_{N}} &{} \qquad \hbox {if } (i_{1}, i_{2}, \ldots , i_{N}) \in C, \\ 0 &{} \qquad \hbox {otherwise}. \end{array} \right. \end{aligned}$$
(40)
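With the index set C stored as a boolean mask of the same size as the tensor, \( P_C \) is a simple entrywise operation; a minimal sketch:

```python
import numpy as np

def P_C(X, mask):
    """Keep the observed entries (mask == True) and zero out the rest, cf. (40)."""
    return np.where(mask, X, 0.0)

# Example: the observation of the completion problem is B = P_C(G, mask).
```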

One variant of the data completion problem is to find the lowest-rank tensor that matches the observed data. This leads to an NP-hard rank minimization problem due to the non-convexity of the rank function [45]. For that reason, the rank function needs to be replaced by a convex surrogate that promotes the same behavior. The nuclear norm minimization method [17, 32, 47] is widely used in this case to replace the rank minimization problem by the following one

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \min _{{\mathcal {X}}} \; \Vert {\mathcal {X}}\Vert _* \\ {\text {subject to}} \; P_C({\mathcal {X}})=P_C({\mathcal {G}}). \end{array} \right. \end{aligned}$$
(41)

where \( \Vert .\;\Vert _* \) stands for the tensor nuclear norm, defined as the sum of the nuclear norms of the mode-n matricizations \( \textbf{X}_{(n)} \) of the tensor \( {\mathcal {X}} \), i.e.\(, \; \Vert {\mathcal {X}}\Vert _*=\displaystyle \sum _{n}\Vert \textbf{X}_{(n)}\Vert _*.\) Tensor completion via total variation minimization was proposed in [31, 46] as an efficient technique to regularize the minimization problem (41). The tensor total variation completion problem is given in the following form:

$$\begin{aligned} \displaystyle \min _{{\mathcal {X}}} \Vert P_C({\mathcal {X}})-P_C({\mathcal {G}})\Vert _F^2+ \Vert {\mathcal {X}}\Vert _* +\mu \Vert |\nabla {\mathcal {X}} \Vert |_1. \end{aligned}$$
(42)

The problem (42) can be reformulated as a constrained minimization problem:

$$\begin{aligned} \displaystyle \min _{{\mathcal {X}} \in \Omega } \Vert P_C({\mathcal {X}})- P_C({\mathcal {G}})\Vert _F^2 +\mu \Vert |\nabla {\mathcal {X}} \Vert |_1, \end{aligned}$$
(43)

where \( \Omega =\{ {\mathcal {X}},\; \Vert {\mathcal {X}}\Vert _* \leqslant \epsilon \} \).

By setting \( {\mathscr {H}} = P_C \) and \( {\mathcal {B}} = P_C({\mathcal {G}}) \), the problem (43) takes the form of the main problem (1). Then, to solve the minimization problem (43), we can use our proposed accelerated tensorial double proximal gradient algorithm. To apply the algorithm, we need to introduce some notation, definitions and properties. The first notion we introduce is the t-product. We give the definitions only for third-order tensors; the other cases are defined recursively [25].

Definition 5.1

[25] The t-product of two tensors \( {\mathcal {A}} \in \mathrm{I\!R}^{I_1 \times I_2 \times I_3 } \) and \( {\mathcal {B}} \in \mathrm{I\!R}^{ I_2 \times J \times I_3 }\) is a tensor \( {\mathcal {A}}*_t {\mathcal {B}} \in \mathrm{I\!R}^{ I_1 \times J \times I_3 }\) defined as

$$\begin{aligned} {\mathcal {A}}*_t {\mathcal {B}}={{\varvec{ibvec}}}\left( {{\varvec{bcirc}}}({\mathcal {A}})\cdot {{\varvec{bvec}}}({\mathcal {B}})\right) \end{aligned}$$
(44)

where the dot \( \cdot \) stands for the usual matrix product, the matrix \( {{\varvec{bcirc}}}({\mathcal {A}}) \) is the block circulant matrix built from the frontal slices of \({\mathcal {A}}\), \({{\varvec{bvec}}}({\mathcal {B}})\) is the block vector containing the frontal slices of \({\mathcal {B}} \), and \( {{\varvec{ibvec}}} \) stands for the inverse operation of \( {{\varvec{bvec}}} \), namely

$$\begin{aligned} {{\varvec{bcirc}}}({\mathcal {A}})=\left( \begin{matrix} A_1 &{}\quad A_{I_3} &{} \quad \cdots &{} \quad A_2\\ A_2 &{} \quad A_1 &{} \quad \cdots &{} \quad A_3\\ \vdots &{} \quad \ddots &{} \quad &{} \quad \vdots \\ A_{I_3}&{} \quad A_{I_3-1} &{} \quad \cdots &{} \quad A_1 \end{matrix} \right) \quad \hbox {and} \quad {{\varvec{bvec}}}({\mathcal {B}}) = \left( \begin{matrix} B_1\\ B_2\\ \vdots \\ B_{I_3} \end{matrix} \right) . \end{aligned}$$
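Since \( {{\varvec{bcirc}}}({\mathcal {A}}) \) is block-diagonalized by the discrete Fourier transform along the third mode, the t-product can be evaluated slice by slice in the Fourier domain. A minimal NumPy sketch of this standard implementation (our own function name):

```python
import numpy as np

def t_product(A, B):
    """t-product A *_t B of A (I1 x I2 x I3) and B (I2 x J x I3), computed as
    frontal-slice matrix products of the DFTs along the third mode."""
    I3 = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.empty((A.shape[0], B.shape[1], I3), dtype=complex)
    for k in range(I3):
        C_hat[:, :, k] = A_hat[:, :, k] @ B_hat[:, :, k]
    return np.real(np.fft.ifft(C_hat, axis=2))
```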

Definition 5.2

[25] The conjugate transpose of a tensor \( {\mathcal {A}} \in \mathrm{I\!R}^{I_1 \times I_2 \times I_3 } \) is the tensor \( {\mathcal {A}}^* \in \mathrm{I\!R}^{I_2 \times I_1 \times I_3 } \) obtained by conjugate transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through \( I_3 \).

Definition 5.3

[26] For \( {\mathcal {A}} \in \mathrm{I\!R}^{I_1 \times I_2 \times I_3 } \), the t-SVD of the third-order tensor \( {\mathcal {A}} \) is given by

$$\begin{aligned} {\mathcal {A}}= {\mathcal {U}}*_t {\mathcal {S}} *_t{\mathcal {V}}^*, \end{aligned}$$
(45)

where \( {\mathcal {U}} \) and \( {\mathcal {V}} \) are orthogonal tensors of size \( I_1 \times I_1 \times I_3 \) and \( I_2 \times I_2 \times I_3 \), respectively. The tensor \( {\mathcal {S}} \) is a rectangular f-diagonal tensor of size \( I_1 \times I_2 \times I_3 \), and the entries in \( {\mathcal {S}} \) are called the singular values of \( {\mathcal {A}} \).

Definition 5.4

[26] The nuclear norm of \( {\mathcal {X}} \) is defined using t-SVD decomposition \( {\mathcal {X}}= {\mathcal {U}}*_t {\mathcal {S}} *_t{\mathcal {V}}^* \) as follows:

$$\begin{aligned} \Vert {\mathcal {X}}\Vert _*= \sum _{i=1}^{r} {\mathcal {S}}(i,i,1), \end{aligned}$$
(46)

where r denotes the tubal rank of \( {\mathcal {X}} \).

Let \( \Pi _{\Omega } \) denote the projection onto the convex set \( \Omega \). It is immediate to prove that the projection \( \Pi _{\Omega } \) onto \( \Omega \) reduces to the proximal mapping of the nuclear norm. Namely, for all \( {\mathcal {Z}} \) in the tensor space \( {\mathbb {X}} \),

$$\begin{aligned} \Pi _{\Omega }({\mathcal {Z}}) = {prox}_{ \sigma \Vert .\;\Vert _{*}}({\mathcal {Z}}), \end{aligned}$$
(47)

where the proximal mapping of the nuclear norm is given in Proposition 5.1.

Proposition 5.1

[34] Let \( {\mathcal {U}}*_t {\mathcal {S}} *_t{\mathcal {V}}^* \) be the t-SVD decomposition of the tensor \( {\mathcal {X}} \). The proximal mapping of the nuclear norm is given by

$$\begin{aligned} {prox}_{\sigma \Vert .\;\Vert _*}({\mathcal {X}})= {\mathcal {U}}*_t {\mathcal {S}}_\sigma *_t{\mathcal {V}}^*, \; \text { for any } \sigma > 0, \end{aligned}$$
(48)

where \( {\mathcal {S}}_\sigma \) is the result of the inverse discrete Fourier transform (IDFT) of \( \max (\bar{{\mathcal {S}}}- \sigma , 0) \) along the third dimension, which means performing the IDFT on all the tubes, and \( \bar{{\mathcal {S}}} \) is the result of the DFT of \( {\mathcal {S}} \) along the third dimension.
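Proposition 5.1 translates into the usual Fourier-domain implementation: transform along the third mode, soft-threshold the singular values of every frontal slice, and transform back. A minimal sketch (our own function name), which can also serve as the projection \( \Pi _{\Omega } \) in (47):

```python
import numpy as np

def t_svt(X, sigma):
    """Proximal mapping of sigma*||.||_* via the t-SVD, cf. Proposition 5.1."""
    X_hat = np.fft.fft(X, axis=2)
    Z_hat = np.empty_like(X_hat)
    for k in range(X.shape[2]):
        U, s, Vh = np.linalg.svd(X_hat[:, :, k], full_matrices=False)
        Z_hat[:, :, k] = (U * np.maximum(s - sigma, 0.0)) @ Vh   # shrink singular values
    return np.real(np.fft.ifft(Z_hat, axis=2))

# Projection onto Omega used in (29): proj_Omega = lambda Z: t_svt(Z, sigma)
```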

6 Numerical results

This section presents some results that illustrate the performance of the developed algorithm. The described model is successfully applied to multidimensional data. Let \( {\mathcal {X}}_\textrm{true} \) denote the original image or video, and let the tensor \( {\mathcal {B}}\) represent the incomplete data. To evaluate the effectiveness of our algorithm, we use the peak signal-to-noise ratio (PSNR) and the relative error (RE) defined as

$$\begin{aligned} {PSNR} =10 \log _{10} \left( \frac{d^2I_1 \cdots I_N}{\Vert {\mathcal {X}}_\textrm{true} -\widetilde{{\mathcal {X}}}\Vert ^2_F}\right) \;\text { and } RE=\frac{\Vert {\mathcal {X}}_\textrm{true} -\widetilde{{\mathcal {X}}}\Vert _F}{\Vert {\mathcal {X}}_\textrm{true}\Vert _F}, \end{aligned}$$
(49)

where \(I_1\times \cdots \times I_N\) is the size of the approximate solution \( \widetilde{{\mathcal {X}}} \) and d is the maximal variation in the input data. As a stopping criterion for the algorithm, we adopt

$$\begin{aligned} e_k=\frac{\Vert {\mathcal {X}}_{k+1}-{\mathcal {X}}_k\Vert _F}{\Vert {\mathcal {X}}_k\Vert _F} \leqslant tol, \end{aligned}$$

where the tolerance tol is chosen in the interval \( [10^{-4}, 10^{-1}] \). All computations were carried out in the MATLAB R2018a environment on an Intel(R) Core(TM) i7-8550 CPU @ 1.80 GHz (1.99 GHz) computer with 16 GB of RAM.
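The quality measures (49) and the stopping rule above amount to a few lines of NumPy (a minimal sketch with our own function names):

```python
import numpy as np

def psnr(X_true, X_hat, d=None):
    """PSNR of (49); d is the maximal variation of the input data."""
    if d is None:
        d = X_true.max() - X_true.min()
    mse = np.sum((X_true - X_hat) ** 2)
    return 10.0 * np.log10(d ** 2 * X_true.size / mse)

def rel_error(X_true, X_hat):
    """Relative error RE of (49)."""
    return np.linalg.norm(X_true - X_hat) / np.linalg.norm(X_true)

def stop(X_new, X_old, tol=1e-2):
    """Relative change between successive iterates, cf. the stopping criterion."""
    return np.linalg.norm(X_new - X_old) / np.linalg.norm(X_old) <= tol
```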

In the following subsections, we focus on two essential tensor completion tasks: color image and video inpainting (text removal) and grayscale video completion. We first illustrate the performance of the proposed Algorithms 1 and 4 by comparing three acceleration methods for inpainting different color images and grayscale videos. Grayscale video completion is reported afterward to show the efficiency of our algorithm in the case of uniformly random missing pixels. Finally, we end with some comparisons with state-of-the-art algorithms.

6.1 Text removal

As an interesting application of completion problems, text removal is a data inpainting process that relies on completion techniques to recover missing regions in the tensor data or to remove objects added to it. The inpainting operation depends on the type of damage in the image and on the application that caused this distortion. For example, in the text removal process, the goal is to remove text present in an image or a video [44]. In the literature, many techniques have been developed to solve this problem [19, 27, 36]. Total variation has been among the most efficient methods for such problems.

In this example, we took the original Barbara color image of size \( 256 \times 256 \times 3 \) and added text to it, as shown in Fig. 1. The criterion for stopping the proposed algorithms consists of the tolerance \( tol = 10^{-2} \) and the maximum number of iterations \( k_\textrm{max} = 200 \). We hand-tuned all the parameters \( \lambda ,\; \mu ,\; \alpha \) and \( \sigma \), choosing each one in its appropriate interval. We hand-tuned the value of \( \lambda \) in the interval \( (0,\; 1/L(\nabla P_{\Omega })) = (0,\; 1/2)\) by choosing \( \lambda = 2.5 \times 10^{-1} \). On the other hand, the step size sequence \( (\alpha _{k}) \) was computed using the iterative line search method starting from \( \alpha _{0} = 1.1 \) with a line search parameter \( \tau = 0.5 \). We set \( \sigma = 6.5 \times 10^{-2} \), and finally, the regularization parameter was chosen to be \( \mu = 1.2 \times 10^{-2} \). The corresponding results are shown in Fig. 2. Clearly, the accelerated versions of the tensorial double proximal gradient method provide clearer images by removing all the added text, whether using Algorithm 1, based on the Nesterov acceleration approach, or Algorithm 4, based on the polynomial extrapolation techniques. Moreover, the relative error and PSNR curves in Fig. 3 show that the results produced by the TDPG algorithm accelerated by the polynomial extrapolation techniques RRE or MPE converge faster than those produced by the TDPG algorithm accelerated by Nesterov’s technique. The speed of convergence of the polynomial extrapolation methods compared with Nesterov’s approach is clearly illustrated by the acceleration ratio in Fig. 4, which shows that TDPG-MPE and TDPG-RRE converge in fewer iterations than TDPG-Nesterov.

Fig. 1

Original image (left), the observed image \( (PSNR = 15.5) \) (right)

Fig. 2

Inpainted image without acceleration \( (PSNR = 21.71) \), inpainted image with TDPG-Nesterov \( (PSNR = 32.45 \)), inpainted image with TDPG-MPE \( (PSNR = 32.5) \), inpainted image with TDPG-RRE \( (PSNR = 32.47) \)

Fig. 3

The PSNR and relative error curves

Fig. 4

The acceleration ratio \( \dfrac{\Vert {\mathcal {T}}_k - {\mathcal {X}}_*\Vert }{\Vert {\mathcal {X}}_k - {\mathcal {X}}_* \Vert } \)

In Table 1, we report the PSNR of the completed tensor, the relative error, the number of iterations and the CPU time for the “barbara.bmp” color image and the “xylophone.mpg” grayscale video of size \( 120 \times 160 \times 30 \). Based on the tests reported in Table 1 and many more unreported tests, we remark that our proposed algorithm works very effectively for image and video inpainting problems, in terms of the PSNR as well as the relative error.

Table 1 Comparison between the proposed acceleration techniques for tensorial total variation proximal gradient method
Fig. 5

Original frames (top), incomplete frames with PSNR \( = 7.94 \) (center), recovered frames with PSNR \( = 33.21 \) (bottom)

6.2 Grayscale video completion

In order to provide a more quantitative evaluation of the proposed approach, we used the “news.mpg” grayscale video of size \( 144 \times 176 \times 10 \) as original data and randomly masked off about 80% of the entries, which we regard as missing values, as shown in the second row of Fig. 5. The completed frames in Fig. 5, with PSNR = 33.21, are obtained using Algorithm 4 with the GT-RRE polynomial extrapolation technique. The criterion for stopping the algorithm consists of the tolerance \( tol = 10^{-2} \) and the maximum number of iterations \( k_\textrm{max} = 200 \). We set the step size parameters \( \lambda = 2 \times 10^{-1},\; \alpha _0 = 1.1,\; \sigma = 6 \times 10^{-1} \) and the regularization parameter \( \mu = 2 \times 10^{-2} \). We can see that the results obtained with the accelerated version of the tensorial double proximal gradient method are visually pleasant, achieving a PSNR value of 33.21.

6.3 Comparison with state-of-the-art algorithms

In this subsection, we compare the performance of our proposed method with the following state-of-the-art tensor completion algorithms: LRTC, TNN, FBCP and RTC. The LRTC algorithms are based on minimizing the sum of nuclear norms of the unfolded matrices of a given tensor. LRTC has two versions, HaLRTC and FaLRTC [32], which stand for the high accuracy and the fast low-rank tensor completion algorithms, respectively. The TNN method [48] is a tensor nuclear norm-based method developed using the tensor singular value decomposition (t-SVD) [25]. The concept of automatic tensor rank determination, based on a Bayesian CP factorization (FBCP), was introduced in [49] in order to recover incomplete tensors. With the same goal of completing tensors, the RTC algorithm [10] was recently developed as an auto-weighted approach, this time using the well-known tensor train decomposition [26]. Four benchmark color images of size \( 256 \times 256 \times 3\) are used in the comparisons: Baboon, Lena, Flower and Airplane (see Fig. 6). To show the efficiency of the proposed algorithm for different types of tensor completion, we generate incomplete images using either uniformly random missing pixels or non-random missing pixels. In the first case, 80% and 60% of missing pixels are uniformly distributed in the Flower and Airplane color images, respectively. Non-random missing pixels, such as text and scribbles, are used to corrupt the color images of Lena and Baboon. The corrupted images are shown in the first column of Fig. 7. In order to provide a fair and unified framework for comparison, all six algorithms are endowed with the same convergence criterion, i.e., the iterations of all algorithms were terminated when the relative error between two successive iterates of the approximated primal variable fell below the tolerance \( tol = 10^{-4} \) or when a maximum of 200 iterations had been performed. In addition, the parameters of the six algorithms are tuned with respect to the best PSNR, RE and CPU time scores on the images. Table 2 reports the PSNR, the relative error RE, as well as the CPU time in seconds for all six algorithms, while the recovered images are shown in Fig. 7.

Fig. 6

Four benchmark color images: Baboon, Lena, Flower and Airplane, respectively

Fig. 7

Image completion comparisons of Airplane, Flower, Lena and Baboon by six algorithms. a the observed (incomplete) images, b the completed images with the FaLRTC algorithm, c the completed images with the HaLRTC algorithm, d the completed images with the TNN algorithm, e the completed images with the FBCP algorithm, f the completed images with the RTC algorithm and g the completed images with the proposed TDPG algorithm

Table 2 Comparison of the results of six methods applied to four different images

For the uniformly random missing pixels examples, the proposed TDPG algorithm is comparable with the TNN algorithm. Both approaches reach the best results in terms of PSNR and RE. However, for the higher proportion of missing pixels in the Flower color image, the proposed algorithm converges faster than TNN. On the other hand, in the non-random missing pixels examples, the proposed algorithm achieves the best PSNR and RE among all the methods. Meanwhile, in terms of computation time, the RTC and FBCP algorithms are faster than ours.

7 Conclusion

The accelerated tensorial double proximal gradient algorithms described in this paper can be applied to other fields such as image and video restoration. The applications illustrated in this paper are only a small portion of what these algorithms can tackle. The tensorial structure of the main problem offers the possibility of treating multidimensional data; in addition, the use of extrapolation techniques has significantly improved the convergence speed of the proposed algorithm.