1 Introduction

PCA is arguably the most broadly applied statistical approach for high-dimensional data analysis and dimension reduction. However, it regards each data instance as a vector, ignoring the rich intra-mode and inter-mode information in the emerging multi-way data (tensor data). On the other hand, it is sensitive to outliers, which are ubiquitous in real applications. By manipulating each tensor instance in its original multi-way form and being designed to withstand outliers, tensor robust PCA (TRPCA) [7, 23] is a powerful extension of PCA which overcomes the above issues. TRPCA finds many real-life applications like image/video restoration, video surveillance and face recognition, to name a few [7, 23].

Fig. 1 The observation model of TRPCA

As shown in Fig. 1, an idealized version of TRPCA aims to recover an underlying tensor \({\mathcal {L}}^*\) from measurements \({\mathcal {M}}\) corrupted by outliers represented by tensor \({\mathcal {S}}^*\), that is,

$$\begin{aligned} \begin{aligned} {\mathcal {M}}={\mathcal {L}}^*+{\mathcal {S}}^*. \end{aligned} \end{aligned}$$
(1)

Obviously, the decomposition in Model (1) is impossible without additional assumptions on the underlying tensor \({\mathcal {L}}^*\) and the outlier tensor \({\mathcal {S}}^*\). Thus, TRPCA further assumes that \({\mathcal {L}}^*\) is “low-rank” and \({\mathcal {S}}^*\) is sparse. Mathematically, TRPCA solves the following minimization problem

$$\begin{aligned} \begin{aligned}&\min _{{\mathcal {L}},{\mathcal {S}}}\quad {{\,\mathrm{\text {rank}}\,}}({\mathcal {L}})+\lambda \Vert {{\mathcal {S}}} \Vert _{0}\\&~\text {s.t.}\,\,\quad {\mathcal {L}}+{\mathcal {S}}={\mathcal {M}}, \end{aligned} \end{aligned}$$
(2)

where \({{\,\mathrm{\text {rank}}\,}}(\cdot )\) denotes the “rank function” of a tensor, \( \Vert {\cdot } \Vert _{0}\) is the tensor \(l_0\)-norm (used as a sparsity measure), and \(\lambda >0\) is a regularization parameter. Problem (2) is numerically very challenging, since neither the tensor rank function nor the tensor \(l_0\)-norm is continuous or convex, even in their simplest matrix versions.

A mainstream approach for tackling the numerical hardness of Problem (2) is to replace the rank function and the \(l_0\)-norm with their convex surrogates \(\text {conv-rank}(\cdot )\) and the \(l_1\)-norm, respectively, leading to the following convex version of Problem (2)

$$\begin{aligned} \begin{aligned}&\min _{{\mathcal {L}},{\mathcal {S}}}\quad \text {conv-rank}({\mathcal {L}})+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\&~\text {s.t.}\,\,\quad {\mathcal {L}}+{\mathcal {S}}={\mathcal {M}}. \end{aligned} \end{aligned}$$
(3)

The \(l_1\)-norm \( \Vert {\cdot } \Vert _{\text {1}}\) in Problem (3) is widely used as a convex envelope of the \(l_0\)-norm in compressive sensing and sparse representation to impose sparsity [5, 14, 27, 41].

In the 2-way version of Problem (3) where \({\mathcal {L}},{\mathcal {S}}\) and \({\mathcal {M}}\) are matrices, tensor robust PCA degenerates to Robust PCA (RPCA) [2]. In RPCA, the matrix nuclear norm \(\Vert {\cdot }\Vert _{*}\) [4] is often chosen as the convex surrogate of the matrix rank. However, for general \(K\)-way \((K\ge 3)\) tensors, one may have multiple choices of \(\text {conv-rank}(\cdot )\), since a tensor admits many definitions of rank arising from different extensions of the matrix singular value decomposition. The most direct tensor extension of the matrix rank is the tensor CP rank [8, 13], which is the smallest number of rank-one tensors that a tensor can be decomposed into. Nevertheless, both the CP rank and its corresponding version of the nuclear norm are NP-hard to compute [6, 9]. Thanks to its computational tractability, the Tucker rank [29], defined as the vector of ranks of the unfolding matrices along each mode, is the most widely used tensor rank. Its corresponding nuclear norm (denoted by SNN in this paper) [18] is defined as the (weighted) sum of nuclear norms of the unfolding matrices along each mode, and has been used in TRPCA [7]. However, SNN is not a tight convex relaxation of the sum of the Tucker rank [28], and it models the underlying tensor as having low Tucker rank, which may be too strong an assumption for some real data tensors.

Recently, low tubal rank models have achieved better performance than low Tucker rank models in many low-rank tensor recovery tasks, such as tensor completion [30, 33, 44], tensor RPCA [23, 44], sample-outlier robust tensor PCA [45] and tensor low-rank representation [36, 37]. At the core of these models is the tubal nuclear norm (TNN), a version of the tensor nuclear norm defined within the framework of t-SVD [12]. Using TNN as the low-rank regularization in Problem (3), the recently proposed TNN-based TRPCA model has shown better performance than traditional models [23, 24, 44]. The rationale behind the superior performance of TNN-based models is that TNN is the tight convex relaxation of the tensor average rank, and the low average rank assumption is weaker than the low Tucker rank and low CP rank assumptions [24].

Despite its broad use, TNN is computationally expensive since it requires full matrix singular value decompositions. This high computational complexity prevents TNN-based models from scaling to emerging high-dimensional tensor data. By exploiting the orthogonal invariance of TNN, we come up with a factorization-based model for TRPCA which substantially accelerates the original TNN-based TRPCA model. Extensive experiments show the superiority and efficiency of the proposed algorithm.

The main contributions of this paper are as follows:

  1. A new model for TRPCA named TriFac is proposed in Model (9) with an ALM algorithm (Algorithm 1) designed to efficiently solve it.

  2. Convergence of the proposed algorithm is shown in Theorem 1.

  3. Sub-optimality of the proposed algorithm is established in Theorems 2 and 3.

  4. We also extend the proposed model to the robust tensor completion problem.

The remainder of this paper is organized as follows. In Sect. 2, the notations, some preliminaries of t-SVD, and some related works are introduced. The problem formulation and the algorithm are presented in Sect. 3. Theoretical analysis of the proposed algorithm is given in Sect. 4. Experimental results are shown in Sect. 6. Finally, we conclude this work in Sect. 7.

2 Tensor SVD and related works

2.1 Notations and tensor SVD

The main notations and abbreviations are listed in Table 1 for convenience. For a 3-way tensor, a tube is a vector defined by fixing the indices of the first two modes and varying the third one; a slice is a matrix defined by fixing all but two indices. \(\mathrm{fft}_3({\cdot })\) denotes the fast discrete Fourier transformation (FFT) along the third mode of a 3rd-order tensor, i.e., the command \(\text {fft}(\cdot ,[],3)\) in Matlab; \(\mathrm{ifft}_3({\cdot })\) is defined similarly. Let \(\lceil a \rceil \) denote the smallest integer not smaller than \(a \in \mathbb {R}\), and \(\lfloor a \rfloor \) the largest integer not larger than a. Let \(\varvec{1}(\cdot )\) denote the indicator function which equals 1 if the condition is true and 0 otherwise. The spectral norm \(\Vert {\cdot }\Vert \) and nuclear norm \(\Vert {\cdot }\Vert _{*}\) of a matrix are the maximum and the sum of its singular values, respectively.

Table 1 List of notations and abbreviations

Then, we introduce the tensor SVD by defining some notions first.

Definition 1

(T-product [43]) Let \({\mathcal {T}}_1 \in {\mathbb {R}}^{d_1 \times d_2 \times d_3}\) and \({\mathcal {T}}_2 \in {\mathbb {R}}^{d_2\times d_4 \times d_3}\). The t-product of \({\mathcal {T}}_1\) and \({\mathcal {T}}_2\) is a tensor \({\mathcal {T}}\) of size \(d_1 \times d_4 \times d_3\):

$$\begin{aligned} {\mathcal {T}}:={\mathcal {T}}_1*{\mathcal {T}}_2, \end{aligned}$$

whose \((i,j)_{th}\) tube is given by \({\mathcal {T}}(i,j,:)=\sum _{k=1}^{d_2} {\mathcal {T}}_1(i,k,:) \bullet {\mathcal {T}}_2(k,j,:)\), where \(\bullet \) denotes the circular convolution between two fibers [12].

T-Product and FFT The t-product of \({\mathcal {T}}_1\) and \({\mathcal {T}}_2\) can be computed efficiently by performing the fast Fourier transformation (FFT) along the tube fibers of \({\mathcal {T}}_1\) and \({\mathcal {T}}_2\) to get \(\widetilde{{\mathcal {T}}_1}\) and \(\widetilde{{\mathcal {T}}_2}\), multiplying each corresponding pair of frontal slices of \(\widetilde{{\mathcal {T}}_1}\) and \(\widetilde{{\mathcal {T}}_2}\) to obtain \(\widetilde{{\mathcal {T}}}\), and then taking the inverse FFT along the third mode to get the result. For details, please refer to [12].
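To make this procedure concrete, the following minimal NumPy sketch computes the t-product through the FFT along the third mode. It is illustrative only (the name t_product is ours, not from [12]).

```python
# A minimal NumPy sketch of the FFT-based t-product described above.
import numpy as np

def t_product(T1, T2):
    """t-product of T1 (d1 x d2 x d3) and T2 (d2 x d4 x d3)."""
    F1 = np.fft.fft(T1, axis=2)              # FFT along the tube fibers
    F2 = np.fft.fft(T2, axis=2)
    F = np.einsum('ikz,kjz->ijz', F1, F2)    # slice-wise matrix products
    return np.real(np.fft.ifft(F, axis=2))   # inverse FFT along the third mode

# Tiny usage example
T1, T2 = np.random.randn(4, 3, 5), np.random.randn(3, 2, 5)
T = t_product(T1, T2)                        # T has shape (4, 2, 5)
```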

Definition 2

(Tensor transpose [43]) Let \({\mathcal {T}}\) be a tensor of size \(d_1 \times d_2 \times d_3\), then \({\mathcal {T}}{}^\top \) is the \(d_2 \times d_1 \times d_3\) tensor obtained by transposing each of the frontal slices and then reversing the order of transposed frontal slices 2 through \(d_3\).

Definition 3

(Identity tensor [43]) The identity tensor \({\mathcal {I}} \in {\mathbb {R}}^{d_1 \times d_1 \times d_3}\) is a tensor whose first frontal slice is the \(d_1 \times d_1\) identity matrix and all other frontal slices are zero.

Definition 4

(F-diagonal tensor [43]) A tensor is called f-diagonal if each frontal slice of the tensor is a diagonal matrix.

Definition 5

(Orthogonal tensor [43]) A tensor \({\mathcal {Q}}\in {\mathbb {R}}^{d_1\times d_1\times d_3}\) is orthogonal if \({\mathcal {Q}} {}^\top * {\mathcal {Q}} = {\mathcal {Q}} * {\mathcal {Q}} {}^\top = {\mathcal {I}}\).

Based on the concepts of tensor transpose, f-diagonal tensor and orthogonal tensor, the tensor singular value decomposition (t-SVD) can be defined as follows. It is illustrated in Fig. 2.

Definition 6

(T-SVD, Tensor tubal-rank [43]) For any \({\mathcal {T}}\in {\mathbb {R}}^{d_1 \times d_2 \times d_3}\), the tensor singular value decomposition (t-SVD) of \({\mathcal {T}}\) is given as follows

$$\begin{aligned} {\mathcal {T}}= {\mathcal {U}}*{\underline{\varSigma }}*{\mathcal {V}}{}^\top , \end{aligned}$$

where \({\mathcal {U}}\in {\mathbb {R}}^{d_1 \times d_1 \times d_3}\), \({\underline{\varSigma }}\in {\mathbb {R}}^{d_1 \times d_2 \times d_3}\), \({\mathcal {V}}\in {\mathbb {R}}^{d_2 \times d_2 \times d_3}\), \({\mathcal {U}}\) and \({\mathcal {V}}\) are orthogonal tensors, \({\underline{\varSigma }}\) is a rectangular f-diagonal tensor.

The tensor tubal rank of \({\mathcal {T}}\) is defined to be the number of non-zero tubes of \({\underline{\varSigma }}\) in the t-SVD factorization, i.e.,

$$\begin{aligned} r_{\text{ t }}({{\mathcal {T}}}):=\sum _{i}\varvec{1}({\underline{\varSigma }}(i,i,:)\ne {\mathbf{0}}). \end{aligned}$$
Fig. 2 Illustration of t-SVD
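As a quick illustration of Definition 6, the tubal rank can be computed from slice-wise SVDs in the Fourier domain; the sketch below (our own helper, assuming NumPy) uses the fact that the i-th singular tube of \({\underline{\varSigma }}\) is nonzero exactly when some Fourier-domain slice has a nonzero i-th singular value.

```python
# A NumPy sketch (not the authors' code) of the tubal rank in Definition 6.
import numpy as np

def tubal_rank(T, tol=1e-10):
    d1, d2, d3 = T.shape
    Tf = np.fft.fft(T, axis=2)
    svals = np.stack([np.linalg.svd(Tf[:, :, k], compute_uv=False)
                      for k in range(d3)], axis=1)    # shape (min(d1,d2), d3)
    # the i-th singular tube is nonzero iff some slice has sigma_i > tol
    return int(np.sum(svals.max(axis=1) > tol))

# Example: a t-product of two factors with inner dimension r has tubal rank <= r.
Af = np.fft.fft(np.random.randn(10, 3, 6), axis=2)
Bf = np.fft.fft(np.random.randn(3, 8, 6), axis=2)
T = np.real(np.fft.ifft(np.einsum('ikz,kjz->ijz', Af, Bf), axis=2))
print(tubal_rank(T))   # prints 3 almost surely
```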

Definition 7

(Block diagonal matrix [43]) For any \({\mathcal {T}}\in {\mathbb {R}}^{d_1 \times d_2 \times d_3}\), let \({\overline{{\mathcal {T}}}}\) (or \({\overline{{\mathbf{T}}}}\)) denote the block-diagonal matrix of the tensor \(\widetilde{{\mathcal {T}}}:=\mathrm{fft}_3({{\mathcal {T}}})\), i.e.,

$$\begin{aligned} {\overline{{\mathcal {T}}}} :=&\left[ \begin{array}{lll}\widetilde{{{\mathcal {T}}}}(:,:,1)&{} &{} \\ &{}\ddots &{} \\ &{} &{} \widetilde{{{\mathcal {T}}}}(:,:,d_3)\end{array} \right] \in {\mathbb {C}}^{d_1d_3 \times d_2d_3}. \end{aligned}$$

According to the relationships between the tensor t-product and FFT, the inner product of two 3-D tensors \({\mathcal {T}}_1,{\mathcal {T}}_2 \in \mathbb {R}^{d_1\times d_2 \times d_3}\) and the inner product of their corresponding block diagonal matrices \({\overline{{\mathcal {T}}_1}},{\overline{{\mathcal {T}}_2}} \in \mathbb {C}^{d_1d_3\times d_2d_3}\) satisfy

$$\begin{aligned} \left\langle {\mathcal {T}}_1,{\mathcal {T}}_2\right\rangle =\frac{1}{d_3}\left\langle \widetilde{{\mathcal {T}}_1},\widetilde{{\mathcal {T}}_2}\right\rangle =\frac{1}{d_3}\left\langle {\overline{{\mathcal {T}}_1}},{\overline{{\mathcal {T}}_2}}\right\rangle . \end{aligned}$$

When \({\mathcal {T}}_1={\mathcal {T}}_2={\mathcal {T}}\), one has

$$\begin{aligned} \Vert {{\mathcal {T}}} \Vert _{\text {F}}=\frac{1}{\sqrt{d_3}}\Vert {\overline{{\mathcal {T}}}}\Vert _F. \end{aligned}$$
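Both identities are Parseval-type relations and are easy to verify numerically; the following snippet is a quick sanity check of ours (assuming NumPy), not part of the paper.

```python
# A quick numerical check of the two identities above.
import numpy as np

d1, d2, d3 = 5, 4, 7
T1, T2 = np.random.randn(d1, d2, d3), np.random.randn(d1, d2, d3)
F1, F2 = np.fft.fft(T1, axis=2), np.fft.fft(T2, axis=2)

inner_spatial = np.sum(T1 * T2)                       # <T1, T2>
inner_fourier = np.real(np.vdot(F1, F2)) / d3         # (1/d3) <fft(T1), fft(T2)>
print(np.isclose(inner_spatial, inner_fourier))       # True

print(np.isclose(np.linalg.norm(T1),                  # ||T||_F
                 np.linalg.norm(F1) / np.sqrt(d3)))   # (1/sqrt(d3)) ||T_bar||_F, True
```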

Definition 8

(Tensor average rank [43]) The tensor average rank \(r_{\mathrm{{a}}}({\mathcal {T}})\) of \({\mathcal {T}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\) is the rescaled matrix rank of \({\overline{{\mathcal {T}}}}\), i.e.,

$$\begin{aligned} r_{\mathrm{{a}}}({\mathcal {T}}):=\frac{\mathrm{rank}({\overline{{\mathcal {T}}}})}{d_3}. \end{aligned}$$

Definition 9

(Tensor operator norm [24, 43]) The tensor operator norm \( \Vert {{\mathcal {T}}} \Vert \) of a 3-D tensor \({\mathcal {T}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\) is defined as follows

$$\begin{aligned} \Vert {{\mathcal {T}}} \Vert :=&\sup _{ \Vert {{\mathcal {V}}} \Vert _{\text {F}}\le 1} \Vert {{\mathcal {T}}*{\mathcal {V}}} \Vert _{\text {F}}. \end{aligned}$$

The relationship between t-product and FFT indicates that

$$\begin{aligned} \Vert {{\mathcal {T}}} \Vert :=&\sup _{ \Vert {{\mathcal {V}}} \Vert _{\text {F}}\le 1} \Vert {{\mathcal {T}}*{\mathcal {V}}} \Vert _{\text {F}}\\ =&\sup _{ \Vert {{\overline{{\mathcal {V}}}}} \Vert _{\text {F}}\le \sqrt{d_3}} \frac{1}{\sqrt{d_3}} \Vert {{\overline{{\mathcal {T}}}}\cdot {\overline{{\mathcal {V}}}}} \Vert _{\text {F}}\\ =&\Vert {{\overline{{\mathcal {T}}}}}\Vert . \end{aligned}$$

Definition 10

(Tensor spectral norm [43]) The tensor spectral norm \( \Vert {{\mathcal {T}}} \Vert \) of a 3-D tensor \({\mathcal {T}}\) is defined as the matrix spectral norm (i.e. the largest singular value) of \({\overline{{\mathbf{T}}}}\), i.e.,

$$\begin{aligned} \Vert {{\mathcal {T}}} \Vert := \Vert {{\overline{{\mathbf{T}}}}}\Vert . \end{aligned}$$

The tubal nuclear norm is then defined as follows.

Definition 11

(Tubal nuclear norm [24]) For any \({\mathcal {T}}\in {\mathbb {R}}^{d_1 \times d_2 \times d_3}\) having t-SVD \({\mathcal {T}}= {\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top \), the tubal nuclear norm of \({\mathcal {T}}\) is defined as

$$\begin{aligned} \Vert {{\mathcal {T}}} \Vert _{\star }:=\left\langle {\underline{\varLambda }},{\mathcal {I}}\right\rangle =\sum _{i=1}^{r}{\underline{\varLambda }}(i,i,1), \end{aligned}$$

where \(r=r_{\text{ t }}({{\mathcal {T}}})\).

The property of FFT also indicates the relationship as follows

$$\begin{aligned} \begin{aligned} \Vert {{\mathcal {T}}} \Vert _{\star }&=\sum _{i=1}^{r}{\underline{\varLambda }}(i,i,1)=\frac{1}{d_3}\sum _{k=1}^{d_3}\sum _{i=1}^{r}\widetilde{{\underline{\varLambda }}}(i,i,k)\\&=\frac{1}{d_3}\sum _{k=1}^{d_3}\Vert {\widetilde{{\mathcal {T}}}(:,:,k)}\Vert _{*}=\frac{1}{d_3}\Vert {{\overline{{\mathcal {T}}}}}\Vert _{*}. \end{aligned} \end{aligned}$$
(4)

As can be seen from Eq. (4), TNN indeed models low-rankness in the spectral domain.
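Eq. (4) translates directly into a few lines of code: TNN is the averaged sum of nuclear norms of the Fourier-domain frontal slices. The sketch below is ours (assuming NumPy) and simply follows the formula.

```python
# A NumPy sketch of TNN via Eq. (4): (1/d3) * sum of slice-wise nuclear norms.
import numpy as np

def tubal_nuclear_norm(T):
    Tf = np.fft.fft(T, axis=2)
    return sum(np.linalg.svd(Tf[:, :, k], compute_uv=False).sum()
               for k in range(T.shape[2])) / T.shape[2]
```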

As shown in [24], tensor spectral norm is the dual norm of TNN. Besides, TNN is the tightest convex relaxation of the average rank in the unit ball of tensor spectral norm \(\{{\mathcal {T}}\in \mathbb {R}^{d_1\times d_2 \times d_3}| \Vert {{\mathcal {T}}} \Vert \le 1\}\) [24].

2.2 Related works

In this subsection, only the most relevant works are introduced; restricted to the focus of this paper, we discuss only TRPCA models based on different variants of tensor nuclear norms.

RPCA can be seen as TRPCA in the 2-way case. It tries to separate a low-rank matrix \({\mathbf{L}}^*\in \mathbb {R}^{d_1 \times d_2}\) and a sparse matrix \({\mathbf{S}}^*\in \mathbb {R}^{d_1 \times d_2}\) from their sum \({\mathbf{M}}={\mathbf{L}}^*+{\mathbf{S}}^*\). As shown in [2], when the underlying matrix \({\mathbf{L}}^*\) satisfies the matrix incoherence conditions, the solution of the following problem

$$\begin{aligned} \begin{aligned}&\min _{{\mathbf{L}},{\mathbf{S}}}~\Vert {{\mathbf{L}}}\Vert _{*} + \lambda \Vert {\mathbf{S}}\Vert _1\\&~\text {s.t.}\,\,~{\mathbf{L}}+{\mathbf{S}}={\mathbf{M}}, \end{aligned} \end{aligned}$$
(5)

can exactly recover \({\mathbf{L}}^*\) and \({\mathbf{S}}^*\) with high probability with parameter \(\lambda =1/\sqrt{\max \{d_1,d_2\}}\). Problem (5) uses the matrix nuclear norm \(\Vert {\cdot }\Vert _{*}\) as the low rank regularization term \(\text {conv-rank}(\cdot )\) in Problem (3).

For general K-way tensors, many variants of tensor nuclear norms have been proposed as the low-rank term \(\text {conv-rank}(\cdot )\) in Problem (3). The most typical is the Tucker rank based tensor nuclear norm (SNN), defined as \( \Vert {{\mathcal {T}}}\Vert _{\text {SNN}}:=\sum \nolimits _{k=1}^{K}\lambda _k\Vert {{\mathbf{T}}_{(k)}}\Vert _{*},\) where \(\lambda _k>0\) and \({\mathbf{T}}_{(k)}\) is the mode-k unfolding of \({\mathcal {T}}\) [13]. The SNN-based TRPCA is given as follows

$$\begin{aligned} \begin{aligned}&\min _{{\mathcal {L}},{\mathcal {S}}} ~\Vert {{\mathcal {L}}}\Vert _{\text {SNN}} + {\lambda } \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\&~\text {s.t.}\,\,~{\mathcal {L}}+{\mathcal {S}}={\mathcal {M}}. \end{aligned} \end{aligned}$$
(6)

SNN inherently assumes the underlying tensor to have low Tucker rank, i.e., to be simultaneously low-rank along each mode. However, this assumption may be too strong for some real tensor data. At the same time, the low average rank assumption has been shown to be weaker than the low Tucker rank assumption [24]. As shown in Theorem 3.1 in [24], TNN is a convex envelope of the tensor average rank. The TNN-based TRPCA model adopts TNN as the low-rank term in Problem (3), formulated as follows

$$\begin{aligned} \begin{aligned}&\min _{{\mathcal {L}},{\mathcal {S}}}~ \Vert {{\mathcal {L}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\&~\text {s.t.}\,\,~{\mathcal {L}}+{\mathcal {S}}={\mathcal {M}}. \end{aligned} \end{aligned}$$
(7)

In [23, 24], it is proved that when the underlying tensor \({\mathcal {L}}^*\) satisfies the tensor incoherence conditions, by solving Problem (7) one can exactly recover the underlying tensors \({\mathcal {L}}^*\) and \({\mathcal {S}}^*\) with high probability with parameter \(\lambda =1/\sqrt{\max \{d_1,d_2\}d_3}\).

3 TriFac for tensor robust PCA

3.1 Model formulation

To solve the TNN-based TRPCA in Eq. (7), an algorithm based on the alternating direction method of multipliers (ADMM) was proposed [23]. In each iteration, it computes a proximity operator of TNN, which requires FFT/IFFT and \(d_3\) full SVDs of \(d_1\)-by-\(d_2\) matrices when the observed tensor \({\mathcal {M}}\) is in \(\mathbb {R}^{d_1\times d_2 \times d_3}\). The one-iteration computational complexity of the ADMM-based algorithm is

$$\begin{aligned} \begin{aligned} O\big (d_1d_2d_3(\log d_3+\min \{d_1,d_2\})\big ), \end{aligned} \end{aligned}$$
(8)

which is very expensive for large tensors.

To reduce the cost of computing TNN in Problem (7), we present the following lemma, which indicates that TNN is an orthogonally invariant norm.

Lemma 1

(Orthogonal invariance of TNN) Given a tensor \({\mathcal {X}}\in \mathbb {R}^{r\times r \times d_3}\), let \({\mathcal {P}}\in \mathbb {R}^{d_1\times r \times d_3}\) and \({\mathcal {Q}}\in \mathbb {R}^{d_2\times r \times d_3}\) be two semi-orthogonal tensors, i.e., \({\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}\in \mathbb {R}^{r\times r\times d_3}\) and \({\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}\in \mathbb {R}^{r\times r\times d_3}\), and \(r\le \min \{d_1,d_2\}\). Then, we have the following relationship:

$$\begin{aligned} \Vert {{\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top } \Vert _{\star }= \Vert {{\mathcal {X}}} \Vert _{\star }. \end{aligned}$$

Proof

Let the full t-SVD of \({\mathcal {X}}\) be \({\mathcal {X}}={\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top \), where \({\mathcal {U}},{\mathcal {V}}\in \mathbb {R}^{r \times r \times d_3}\) are orthogonal tensors and \({\underline{\varLambda }}\in \mathbb {R}^{r\times r\times d_3}\) is f-diagonal. Then

$$\begin{aligned} \Vert {{\mathcal {X}}} \Vert _{\star }&=\frac{1}{d_3}\big \Vert {\overline{{\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top }}\big \Vert _*=\frac{1}{d_3}\big \Vert {\overline{{\mathcal {U}}}}\cdot {\overline{{\underline{\varLambda }}}}\cdot {\overline{{\mathcal {V}}{}^\top }}\big \Vert _*=\frac{1}{d_3}\big \Vert {\overline{{\underline{\varLambda }}}}\big \Vert _*. \end{aligned}$$

Then \({\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top =({\mathcal {P}}*{\mathcal {U}})*{\underline{\varLambda }}*({\mathcal {Q}}*{\mathcal {V}}){}^\top \). Since

$$\begin{aligned} ({\mathcal {P}}*{\mathcal {U}}){}^\top *({\mathcal {P}}*{\mathcal {U}})&={\mathcal {U}}{}^\top *{\mathcal {P}}{}^\top *{\mathcal {P}}*{\mathcal {U}}={\mathcal {I}},\\ ({\mathcal {Q}}*{\mathcal {V}}){}^\top *({\mathcal {Q}}*{\mathcal {V}})&={\mathcal {V}}{}^\top *{\mathcal {Q}}{}^\top *{\mathcal {Q}}*{\mathcal {V}}={\mathcal {I}}, \end{aligned}$$

we obtain that

$$\begin{aligned} \Vert {{\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top } \Vert _{\star }&=\frac{1}{d_3}\big \Vert {\overline{{\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top }}\big \Vert _*\\&=\frac{1}{d_3}\big \Vert {\overline{({\mathcal {P}}*{\mathcal {U}})*{\underline{\varLambda }}*({\mathcal {Q}}*{\mathcal {V}}){}^\top }}\big \Vert _*\\&=\frac{1}{d_3}\big \Vert {\overline{({\mathcal {P}}*{\mathcal {U}})}}\cdot {\overline{{\underline{\varLambda }}}}\cdot {\overline{({\mathcal {Q}}*{\mathcal {V}}){}^\top }}\big \Vert _*\\&=\frac{1}{d_3}\big \Vert {\overline{{\underline{\varLambda }}}}\big \Vert _*.\\ \end{aligned}$$

Thus, \( \Vert {{\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top } \Vert _{\star }= \Vert {{\mathcal {X}}} \Vert _{\star }\). \(\square \)
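Lemma 1 can also be checked numerically. The sketch below is our own construction (assuming NumPy): it builds semi-orthogonal tensors slice-wise in the Fourier domain, mirroring conjugate frequencies as in the proof of Lemma 2 below so that the factors are real, and then compares the two TNN values.

```python
# A numerical check of Lemma 1: TNN(P * X * Q^T) = TNN(X) for semi-orthogonal P, Q.
# All names here are our own illustrative choices, not the authors' code.
import numpy as np

def tubal_nuclear_norm(T):
    Tf = np.fft.fft(T, axis=2)
    return sum(np.linalg.svd(Tf[:, :, k], compute_uv=False).sum()
               for k in range(T.shape[2])) / T.shape[2]

def random_semi_orthogonal(d, r, d3, rng):
    """Real tensor P with P^T * P = I_r, built slice-wise in the Fourier domain."""
    Pf = np.zeros((d, r, d3), dtype=complex)
    for k in range(d3 // 2 + 1):                     # first ceil((d3+1)/2) slices
        real_slice = (k == 0) or (d3 % 2 == 0 and k == d3 // 2)
        M = rng.standard_normal((d, r)) + (0 if real_slice
                                           else 1j * rng.standard_normal((d, r)))
        Pf[:, :, k] = np.linalg.qr(M)[0]             # column-orthonormal slice
    for k in range(d3 // 2 + 1, d3):                 # conjugate-symmetric fill
        Pf[:, :, k] = np.conj(Pf[:, :, d3 - k])
    return np.real(np.fft.ifft(Pf, axis=2))

rng = np.random.default_rng(0)
d1, d2, r, d3 = 8, 7, 3, 5
X = rng.standard_normal((r, r, d3))
P = random_semi_orthogonal(d1, r, d3, rng)
Q = random_semi_orthogonal(d2, r, d3, rng)

Pf, Xf, Qf = (np.fft.fft(T, axis=2) for T in (P, X, Q))
L = np.real(np.fft.ifft(np.einsum('irz,rsz,jsz->ijz', Pf, Xf, np.conj(Qf)), axis=2))
print(np.isclose(tubal_nuclear_norm(L), tubal_nuclear_norm(X)))   # True
```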

Equipped with Lemma 1, we decompose the low-rank component in Problem (7) as follows:

$$\begin{aligned} {\mathcal {L}}={\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top ,\quad \text {s.t.}\,\,{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},~{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}, \end{aligned}$$

where \({\mathcal {I}}_{r}\in \mathbb {R}^{r\times r\times d_3}\) is an identity tensor. Inspired by [17, 20], the tensors \({\mathcal {P}}\) and \({\mathcal {Q}}\) can be seen as “active tensor spaces”. Further, we propose the following model based on triple factorization (TriFac) for tensor robust PCA

$$\begin{aligned} \begin{aligned} \min _{{\mathcal {P}},{\mathcal {Q}},{\mathcal {C}},{\mathcal {S}}}&~ \Vert {{\mathcal {C}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\ \text {s.t.}\,\,&~{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}={\mathcal {M}},\\&~{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},~{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}, \end{aligned} \end{aligned}$$
(9)

where \({\mathcal {I}}_{r}:={\mathcal {I}}\in \mathbb {R}^{r\times r \times d_3}\), r is an upper estimate of the true tubal rank \(r^*=r_{\text{ t }}({{\mathcal {L}}^*})\) of the underlying tensor, and we set the parameter \(\lambda =1/\sqrt{\max \{d_1,d_2\}d_3}\) as suggested by [24]. Note that, different from Problem (7), the proposed TriFac is a non-convex model which may have many local minima.

3.2 Optimization algorithm

The partial augmented Lagrangian of Problem (9) is as follows:

$$\begin{aligned} \begin{aligned}&L_{\mu }({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}},{\mathcal {Y}})\\&\quad = \Vert {{\mathcal {C}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}+\left\langle {\mathcal {Y}},{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {M}}\right\rangle \\&\qquad +\frac{\mu }{2} \Vert {{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {M}}} \Vert _{\text {F}}^2,\\&\text {s.t.}\,\,~{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},~{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}, \end{aligned} \end{aligned}$$
(10)

where \(\mu >0\) is a penalty parameter, and \({\mathcal {Y}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\) is the Lagrangian multiplier.

Based on the Lagrangian in Eq. (10), we update each variable by fixing others as follows:

$$\begin{aligned} \begin{aligned} \left\{ \begin{aligned}&{\mathcal {P}}=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {P}}}L_{\mu }({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}},{\mathcal {Y}}),~\text {s.t.}\,\,~{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},\\&{\mathcal {Q}}=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {Q}}}L_{\mu }({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}},{\mathcal {Y}}),~\text {s.t.}\,\,~{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r},\\&{\mathcal {C}}=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {C}}}L_{\mu }({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}},{\mathcal {Y}}),\\&{\mathcal {S}}=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {S}}}L_{\mu }({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}},{\mathcal {Y}}). \end{aligned} \right. \end{aligned} \end{aligned}$$
(11)

Solutions to each sub-problem in Problem (11) will be given next. The algorithm is summarized in Algorithm 1.

The \({\mathcal {P}}\)-subproblem We update \({\mathcal {P}}\) by fixing the other variables and minimizing \(L_{\mu }(\cdot )\) as follows

$$\begin{aligned} \begin{aligned} {\mathcal {P}}_{t+1} &=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r}}L_{\mu _t}({\mathcal {P}},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\ &=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r}}\frac{\mu _t}{2} \Vert {{\mathcal {P}}*{\mathcal {A}}-{\mathcal {B}}} \Vert _{\text {F}}^2\\ \end{aligned} \end{aligned}$$
(12)

where \({\mathcal {A}}={\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t}\) and \({\mathcal {B}}={\mathcal {M}}-{\mathcal {S}}_{t}-{\mathcal {Y}}_{t}/\mu _t\). We need the following lemma to solve Problem (12).

Lemma 2

Given any tensors \({\mathcal {A}}\in \mathbb {R}^{r\times d_2 \times d_3},{\mathcal {B}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\), suppose tensor \({\mathcal {B}}*{\mathcal {A}}{}^\top \) has t-SVD \({\mathcal {B}}*{\mathcal {A}}{}^\top ={\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top \), where \({\mathcal {U}}\in \mathbb {R}^{d_1\times r \times d_3}\) and \({\mathcal {V}}\in \mathbb {R}^{r\times r\times d_3}\). Then, the problem

$$\begin{aligned} \begin{aligned} \min _{{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r}} \Vert {{\mathcal {P}}*{\mathcal {A}}-{\mathcal {B}}} \Vert _{\text {F}}^2 \end{aligned} \end{aligned}$$
(13)

has a closed-form solution as

$$\begin{aligned} \begin{aligned} {\mathcal {P}}={\mathfrak {P}}({\mathcal {B}}*{\mathcal {A}}{}^\top ):={\mathcal {U}}*{\mathcal {V}}{}^\top . \end{aligned} \end{aligned}$$
(14)

Proof

$$\begin{aligned} \begin{aligned} \Vert {{\mathcal {P}}*{\mathcal {A}}-{\mathcal {B}}} \Vert _{\text {F}}^2&= \Vert {{{\mathcal {P}}*{\mathcal {A}}}-{{\mathcal {B}}}} \Vert _{\text {F}}^2\\&= \Vert {{{\mathcal {P}}*{\mathcal {A}}}} \Vert _{\text {F}}^2+ \Vert {{{\mathcal {B}}}} \Vert _{\text {F}}^2-2\left\langle {{\mathcal {P}}*{\mathcal {A}}},{{\mathcal {B}}}\right\rangle \end{aligned} \end{aligned}$$
(15)

Since \({\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r}\), we have that

$$\begin{aligned} \Vert {{{\mathcal {P}}*{\mathcal {A}}}} \Vert _{\text {F}}^2&=\frac{1}{d_3} \Vert {{\overline{{\mathcal {P}}*{\mathcal {A}}}}} \Vert _{\text {F}}^2=\frac{1}{d_3} \Vert {{\overline{{\mathcal {P}}}}\cdot {\overline{{\mathcal {A}}}}} \Vert _{\text {F}}^2\\&=\frac{1}{d_3}\text {Tr}\Big ( ({\overline{{\mathcal {P}}}}\cdot {\overline{{\mathcal {A}}}}){}^{\mathsf {H}}({\overline{{\mathcal {P}}}}\cdot {\overline{{\mathcal {A}}}}) \Big )=\frac{1}{d_3}\text {Tr}\Big ( {\overline{{\mathcal {A}}}}{}^{\mathsf {H}}{\overline{{\mathcal {A}}}}\Big )\\&=\frac{1}{d_3} \Vert {{\overline{{\mathcal {A}}}}} \Vert _{\text {F}}^2= \Vert {{\mathcal {A}}} \Vert _{\text {F}}^2 \end{aligned}$$

Also, we have

$$\begin{aligned} \begin{aligned} \left\langle {{\mathcal {P}}*{\mathcal {A}}},{{\mathcal {B}}}\right\rangle&=\frac{1}{d_3}\left\langle {\overline{{\mathcal {P}}*{\mathcal {A}}}},{\overline{{\mathcal {B}}}}\right\rangle =\frac{1}{d_3}\left\langle {\overline{{\mathcal {P}}}}\cdot {\overline{{\mathcal {A}}}},{\overline{{\mathcal {B}}}}\right\rangle \\&=\frac{1}{d_3}\left\langle {\overline{{\mathcal {P}}}},{\overline{{\mathcal {B}}}}\cdot {\overline{{\mathcal {A}}}}{}^{\mathsf {H}}\right\rangle =\frac{1}{d_3}\sum _{k=1}^{d_3}\left\langle \widetilde{{\mathcal {P}}}^{(k)},\widetilde{{\mathcal {X}}}^{(k)}\right\rangle , \end{aligned} \end{aligned}$$
(16)

where \({\mathcal {X}}={\mathcal {B}}*{\mathcal {A}}{}^\top \) and \(\widetilde{{\mathcal {X}}}=\mathrm{fft}_3({{\mathcal {X}}})\).

According to von Neumann's trace inequality, the inner product reaches its maximum when the matrices \(\widetilde{{\mathcal {P}}}^{(k)}\in \mathbb {C}^{d_1\times r}\) and \(\widetilde{{\mathcal {X}}}^{(k)}\in \mathbb {C}^{d_1\times r}\) have the same left and right singular vectors.

We perform the SVD on the first \(\lceil \frac{d_3+1}{2}\rceil \) frontal slices \(\widetilde{{\mathcal {X}}}^{(k)}\in \mathbb {C}^{d_1\times r}\) as follows

$$\begin{aligned} \widetilde{{\mathcal {X}}}^{(k)}={{\mathbf{U}}}^{(k)}{{\mathbf{S}}}^{(k)}({{\mathbf{V}}}^{(k)}){}^{\mathsf {H}}, \quad \forall k=1,\ldots ,\left\lceil \frac{d_3+1}{2}\right\rceil , \end{aligned}$$

where \({{\mathbf{U}}}^{(k)}\in \mathbb {C}^{d_1\times r}\) is a column-orthogonal matrix, \({\mathbf{V}}^{(k)} \in \mathbb {C}^{r\times r}\) is a unitary matrix, \({\mathbf{S}}^{(k)}=\text {diag}(\sigma ^{(k)}_1,\ldots ,\sigma ^{(k)}_r)\), and \(\sigma ^{(k)}_1 \ge \sigma ^{(k)}_2\ge \cdots \ge \sigma ^{(k)}_r\ge 0\) are the singular values of \(\widetilde{{\mathcal {X}}}^{(k)}\). Using the relationships between the FFT and t-SVD [24], we have that for all \(k>\lceil \frac{d_3+1}{2}\rceil \), the frontal slice \(\widetilde{{\mathcal {X}}}^{(k)}\) also has an SVD as

$$\begin{aligned} \widetilde{{\mathcal {X}}}^{(k)}&=\text {conj}\big (\widetilde{{\mathcal {X}}}^{(d_3-k+2)}\big ) \\&=\text {conj}\big ({{\mathbf{U}}}^{(d_3-k+2)}\big ){{\mathbf{S}}}^{(d_3-k+2)}\text {conj}\big ({{\mathbf{V}}}^{(d_3-k+2)}\big ){}^{\mathsf {H}}. \end{aligned}$$

Then, we construct a semi-orthogonal tensor \({\mathcal {U}}\in \mathbb {R}^{d_1\times r\times d_3}\) and orthogonal tensor \({\mathcal {V}}\in \mathbb {R}^{r\times r\times d_3}\) as a pair of “singular vector tensors” of \({\mathcal {X}}\):

$$\begin{aligned} \widetilde{{\mathcal {U}}}^{(k)}=\left\{ \begin{aligned}&{\mathbf{U}}^{(k)}, \quad \quad k\le \left\lceil \frac{d_3+1}{2}\right\rceil \\&\text {conj}({\mathbf{U}}^{(d_3-k+2)}),~k>\left\lceil \frac{d_3+1}{2}\right\rceil \end{aligned} \right. \end{aligned}$$

and

$$\begin{aligned} \widetilde{{\mathcal {V}}}^{(k)}=\left\{ \begin{aligned}&{\mathbf{V}}^{(k)}, \quad \quad k\le \left\lceil \frac{d_3+1}{2}\right\rceil \\&\text {conj}({\mathbf{V}}^{(d_3-k+2)}),~k> \left\lceil \frac{d_3+1}{2}\right\rceil \end{aligned} \right. . \end{aligned}$$

Further, we construct \({\mathcal {P}}\in \mathbb {R}^{d_1\times r\times d_3}\) by

$$\begin{aligned} \widetilde{{\mathcal {P}}}^{(k)}=\widetilde{{\mathcal {U}}}^{(k)}(\widetilde{{\mathcal {V}}}^{(k)}){}^{\mathsf {H}},\quad \forall k\le d_3. \end{aligned}$$

Thus we have \({\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}\). Also, according to von Neumann's trace inequality, the left-hand side of Eq. (16) attains its maximum and thus Problem (13) attains its minimum. \(\square \)
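In code, the operator \({\mathfrak {P}}(\cdot )\) of Lemma 2 amounts to replacing every Fourier-domain frontal slice of \({\mathcal {B}}*{\mathcal {A}}{}^\top \) by its polar (orthogonal Procrustes) factor \(\mathbf{U}\mathbf{V}^{\mathsf H}\). The NumPy sketch below is ours, not the authors' implementation.

```python
# A NumPy sketch of the operator P(.) in Eq. (14): slice-wise in the Fourier
# domain, X = U S V^H is replaced by its polar factor U V^H.  Names are ours.
import numpy as np

def t_product(T1, T2):
    F = np.einsum('ikz,kjz->ijz', np.fft.fft(T1, axis=2), np.fft.fft(T2, axis=2))
    return np.real(np.fft.ifft(F, axis=2))

def t_transpose(T):
    # transpose each frontal slice, then reverse slices 2..d3 (Definition 2)
    return np.concatenate([T[:, :, :1], T[:, :, :0:-1]], axis=2).transpose(1, 0, 2)

def frak_P(X):
    """Return U * V^T, where X = U * Lambda * V^T is the skinny t-SVD of X."""
    Xf = np.fft.fft(X, axis=2)
    Pf = np.zeros_like(Xf)
    for k in range(X.shape[2]):
        U, _, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        Pf[:, :, k] = U @ Vh
    return np.real(np.fft.ifft(Pf, axis=2))

# Usage in the P-subproblem (12):  P_next = frak_P(t_product(B, t_transpose(A)))
# with A = C_t * Q_t^T and B = M - S_t - Y_t / mu_t.
```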

The \({\mathcal {Q}}\)-subproblem By fixing other variables, we update \({\mathcal {Q}}\) as follows

$$\begin{aligned} \begin{aligned} {\mathcal {Q}}_{t+1} &=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}}L_{\mu _t}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t},{\mathcal {Q}},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\ &=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}}\frac{\mu _t}{2} \Vert {{\mathcal {A}}'*{\mathcal {Q}}{}^\top -{\mathcal {B}}} \Vert _{\text {F}}^2\\ &=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}}\frac{\mu _t}{2} \Vert {({\mathcal {Q}}*{\mathcal {A}}'{}^\top )-{\mathcal {B}}{}^\top } \Vert _{\text {F}}^2\\ &={\mathfrak {P}}({\mathcal {B}}{}^\top *{\mathcal {A}}'), \end{aligned} \end{aligned}$$
(17)

where \({\mathcal {A}}'={\mathcal {P}}_{t+1}*{\mathcal {C}}_{t}\) and \({\mathcal {B}}={\mathcal {M}}-{\mathcal {S}}_{t}-{\mathcal {Y}}_{t}/\mu _t\), and \({\mathfrak {P}}(\cdot )\) is defined in Lemma 2. The last equality holds because of Eq. (14) in Lemma 2.

The \({\mathcal {C}}\)-subproblem We update \({\mathcal {C}}\) as follows

$$\begin{aligned} \begin{aligned}&{\mathcal {C}}_{t+1}\\&=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {C}}}L_{\mu _t}({\mathcal {P}}_{t+1},{\mathcal {C}},{\mathcal {Q}}_{t+1},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\&=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {C}}} \Vert {{\mathcal {C}}} \Vert _{\star }+\frac{\mu _t}{2} \Vert {{\mathcal {P}}_{t+1}*{\mathcal {C}}*{\mathcal {Q}}{}^\top _{t+1}+{\mathcal {S}}_{t}-{\mathcal {M}}+\frac{{\mathcal {Y}}_{t}}{\mu _t}} \Vert _{\text {F}}^2\\&=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {C}}} \Vert {{\mathcal {C}}} \Vert _{\star }+\frac{\mu _t}{2} \Vert {{\mathcal {C}}-{\mathcal {P}}_{t+1}^\top *({\mathcal {M}}-{\mathcal {S}}_{t}-\frac{{\mathcal {Y}}_{t}}{\mu _t})*{\mathcal {Q}}_{t+1}} \Vert _{\text {F}}^2\\&=\mathfrak {S}_{1/\mu _t}({\mathcal {P}}_{t+1}{}^\top *\left( {\mathcal {M}}-{\mathcal {S}}_{t}-{\mathcal {Y}}_{t}/\mu _t\right) *{\mathcal {Q}}_{t+1}), \end{aligned} \end{aligned}$$
(18)

where \(\mathfrak {S}_{\tau }(\cdot )\) is the proximity operator of TNN. In [30], a closed-form expression of \(\mathfrak {S}_{\tau }(\cdot )\) is given as follows:

Lemma 3

(Proximity operator of TNN [30]) For any 3-D tensor \({\mathcal {A}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\) with reduced t-SVD \({\mathcal {A}}={\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top \), where \({\mathcal {U}}\in \mathbb {R}^{d_1 \times r \times d_3}\) and \({\mathcal {V}}\in \mathbb {R}^{d_2 \times r \times d_3}\) are orthogonal tensors and \({\underline{\varLambda }}\in \mathbb {R}^{r \times r \times d_3}\) is the f-diagonal tensor of singular tubes, the proximity operator \(\mathfrak {S}_{\tau }({\mathcal {A}})\) at \({\mathcal {A}}\) can be computed by:

$$\begin{aligned} \mathfrak {S}_{\tau }({\mathcal {A}}):&=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {X}}}\tau \Vert {{\mathcal {X}}} \Vert _{\star }+\frac{1}{2} \Vert {{\mathcal {X}}-{\mathcal {A}}} \Vert _{\text {F}}^2\\ &={\mathcal {U}}*\mathrm{ifft}_3({\max (\mathrm{fft}_3({{\underline{\varLambda }}})-\tau ,0)})*{\mathcal {V}}{}^\top . \end{aligned}$$
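Lemma 3 says that the proximity operator of TNN soft-thresholds the Fourier-domain singular values by \(\tau \); the sketch below (ours, assuming NumPy) implements exactly that, slice by slice.

```python
# A NumPy sketch of the proximity operator of TNN in Lemma 3.
import numpy as np

def prox_tnn(A, tau):
    Af = np.fft.fft(A, axis=2)
    Xf = np.zeros_like(Af)
    for k in range(A.shape[2]):
        U, s, Vh = np.linalg.svd(Af[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh   # shrink and rebuild
    return np.real(np.fft.ifft(Xf, axis=2))
```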

The \({\mathcal {S}}\)-subproblem We update \({\mathcal {S}}\) as follows

$$\begin{aligned} \begin{aligned}&{\mathcal {S}}_{t+1}\\&\quad =\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {S}}}L_{\mu _{t}}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t+1},{\mathcal {Q}}_{t+1},{\mathcal {S}},{\mathcal {Y}}_{t})\\&\quad =\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {S}}}{\small \Vert {{\mathcal {S}}} \Vert _{\text {1}}+\frac{\mu _t}{2\lambda } \Vert {{\mathcal {B}}'+{\mathcal {S}}-{\mathcal {M}}+\frac{{\mathcal {Y}}_{t}}{\mu _t}} \Vert _{\text {F}}^2 }\\&\quad =\mathfrak {T}_{\lambda /\mu _t}({\mathcal {M}}-{\mathcal {B}}'-\frac{{\mathcal {Y}}_{t}}{\mu _t}), \end{aligned} \end{aligned}$$
(19)

where \({\mathcal {B}}'={\mathcal {P}}_{t+1}*{\mathcal {C}}_{t+1}*{\mathcal {Q}}{}^\top _{t+1}\), and \(\mathfrak {T}_{\tau }(\cdot )\) is the proximity operator of the tensor \(l_1\)-norm, whose closed-form solution is given as follows:

$$\begin{aligned} \mathfrak {T}_{\tau }({\mathcal {A}}):&=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {X}}}\tau \Vert {{\mathcal {X}}} \Vert _{\text {1}}+\frac{1}{2} \Vert {{\mathcal {X}}-{\mathcal {A}}} \Vert _{\text {F}}^2\\ &=\text {sign}({\mathcal {A}})\circledast \max \{|{\mathcal {A}}|-\tau ,0\}, \end{aligned}$$

where \(\circledast \) denotes the element-wise tensor product.

Algorithm 1 ALM algorithm for solving TriFac in Problem (9)
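To summarize the whole update cycle (11)-(19), the following self-contained NumPy sketch implements a simplified version of Algorithm 1. It keeps only the multiplier \({\mathcal {Y}}\) of the coupling constraint (the auxiliary multipliers \({\mathcal {Y}}^1,{\mathcal {Y}}^2,{\mathcal {Y}}^3\) used in the analysis are omitted), and all function names, the initialization and the stopping rule are our own illustrative choices rather than the authors' implementation.

```python
# A simplified, self-contained sketch (assuming NumPy) of the ALM updates
# (12), (17), (18), (19) for TriFac.  Illustrative only, not the paper's code.
import numpy as np

def _fft(T):  return np.fft.fft(T, axis=2)
def _ifft(F): return np.real(np.fft.ifft(F, axis=2))

def tprod(A, B):                                    # t-product via FFT
    return _ifft(np.einsum('ikz,kjz->ijz', _fft(A), _fft(B)))

def tT(T):                                          # tensor transpose (Definition 2)
    return np.concatenate([T[:, :, :1], T[:, :, :0:-1]], axis=2).transpose(1, 0, 2)

def frak_P(X):                                      # Lemma 2: slice-wise polar factor
    Xf = _fft(X); Pf = np.zeros_like(Xf)
    for k in range(X.shape[2]):
        U, _, Vh = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        Pf[:, :, k] = U @ Vh
    return _ifft(Pf)

def prox_tnn(A, tau):                               # Lemma 3
    Af = _fft(A); Xf = np.zeros_like(Af)
    for k in range(A.shape[2]):
        U, s, Vh = np.linalg.svd(Af[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return _ifft(Xf)

def soft(A, tau):                                   # proximity operator of the l1 norm
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def trifac(M, r, mu=1e-3, rho=1.1, n_iter=500, eps=1e-7):
    d1, d2, d3 = M.shape
    lam = 1.0 / np.sqrt(max(d1, d2) * d3)           # lambda suggested by [24]
    C = np.zeros((r, r, d3)); S = np.zeros_like(M); Y = np.zeros_like(M)
    Q = np.zeros((d2, r, d3)); Q[:r, :, 0] = np.eye(r)  # a feasible initial Q (r <= d2)
    for _ in range(n_iter):
        B = M - S - Y / mu
        P = frak_P(tprod(B, tprod(Q, tT(C))))       # Eq. (12): P = frak_P(B * (C*Q^T)^T)
        Q = frak_P(tprod(tT(B), tprod(P, C)))       # Eq. (17): Q = frak_P(B^T * (P*C))
        C = prox_tnn(tprod(tprod(tT(P), B), Q), 1.0 / mu)   # Eq. (18)
        L = tprod(tprod(P, C), tT(Q))
        S = soft(M - L - Y / mu, lam / mu)          # Eq. (19)
        Y = Y + mu * (L + S - M)                    # dual ascent on the constraint
        if np.abs(L + S - M).max() < eps:
            break
        mu *= rho                                   # mu_{t+1} = rho * mu_t
    return tprod(tprod(P, C), tT(Q)), S             # low-rank and sparse estimates
```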

Complexity analysis In a single iteration, the main cost of Algorithm 1 lies in updating the primal variables \({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}}\) (using operators \({\mathfrak {P}}(\cdot )\), \(\mathfrak {S}_{\tau }(\cdot )\), \({\mathfrak {P}}(\cdot )\), \(\mathfrak {T}_{\tau }(\cdot )\) respectively) and the dual variables \({\mathcal {Y}},{\mathcal {Y}}^{i}, i\le 3\).

Note that, we have to compute many intermediate t-products like \({\mathcal {B}}*{\mathcal {A}}{}^\top \), \({\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top \) before directly updating the primal and dual variables. These t-products can be efficiently computed using FFT/IFFT with a total cost at most \(O(d_1d_2d_3\log d_3)\), since the intermediate tensors to conduct FFT/IFFT have sizes at most \(d_1\times d_2\times d_3\).

Then, the direct update of \({\mathcal {P}}\) using operator \({\mathfrak {P}}(\cdot )\) involves computing FFT, IFFT and \(d_3\) SVDs of \(d_1\times r\) matrices, having complexity \(O\big (rd_1d_3\log d_3 + r^2d_1d_3\big )\). Similarly, the direct update of \({\mathcal {Q}}\) using operator \({\mathfrak {P}}(\cdot )\) has complexity \(O\big ( rd_2d_3\log d_3 + r^2d_2d_3\big )\). Directly updating \({\mathcal {C}}\) using operator \(\mathfrak {S}_{\tau }(\cdot )\) involves complexity \(O\big ( r^2d_3(r+\log d_3)\big )\). Directly updating \({\mathcal {S}}\) using operator \(\mathfrak {T}_{\tau }(\cdot )\) costs \(O\big (d_1d_2d_3\big )\). Directly updating \({\mathcal {Y}},{\mathcal {Y}}^{i}, i\le 3\) costs \(O(d_1d_2d_3)\). Thus, the one-iteration cost of Algorithm 1 is as follows

$$\begin{aligned} O\Big (d_3\big (d_1d_2\log d_3 + r^2(r+{\tilde{d}}+\log d_3)+r{\tilde{d}}\log d_3\big )\Big ), \end{aligned}$$

where \({\tilde{d}}=d_1+d_2.\) When \(r\ll \min \{d_1,d_2\}\), the above cost is significantly lower than the one-iteration cost of ADMM-based TRPCA [24] in Eq. (8).

Consider an extreme case in high-dimensional settings where \(r_{\text{ t }}({{\mathcal {L}}^*})=O(1)\), that is, the tubal rank of the underlying tensor \({\mathcal {L}}^*\) scales like a small constant. By choosing the initialized rank \(r=2r_{\text{ t }}({{\mathcal {L}}^*})=O(1)\), the one-iteration cost of Algorithm 1 scales like

$$\begin{aligned} \begin{aligned} O(d_1d_2d_3\log d_3), \end{aligned} \end{aligned}$$
(20)

which is much cheaper than the \(O(d_1d_2d_3\min \{d_1,d_2\})\) cost of the ADMM-based algorithm in high-dimensional settings.

4 Theoretical analysis

In this section, some theoretical properties of Algorithm 1 are discussed. Specifically, we first analyze the convergence behavior of Algorithm 1, and then establish its sub-optimality.

4.1 Convergence analysis of Algorithm 1

We will show in the following theorem that Algorithm 1 is convergent.

Theorem 1

Let \(({\mathcal {P}}_t,{\mathcal {C}}_t,{\mathcal {Q}}_t,{\mathcal {S}}_t)\) be any sequence generated by Algorithm 1. Then the following statements hold:

  (I) The sequences \(\{{\mathcal {C}}_t\}\), \(\{{\mathcal {P}}_t*{\mathcal {C}}_t*{\mathcal {Q}}_t{}^\top \}\) and \(\{{\mathcal {S}}_{t}\}\) are Cauchy sequences.

  (II) \(({\mathcal {P}}_t,{\mathcal {C}}_t,{\mathcal {Q}}_t,{\mathcal {S}}_t)\) approaches a feasible solution to Problem (9) in the sense that

    $$\begin{aligned} \lim _{t\rightarrow \infty } \Vert {{\mathcal {P}}_t*{\mathcal {C}}_t*{\mathcal {Q}}{}^\top _t+{\mathcal {S}}_{t}-{\mathcal {M}}} \Vert _{\infty } \le \varepsilon . \end{aligned}$$

Before proving Theorem 1, we need the following lemmas.

Lemma 4

[16] Let \(\Vert {\cdot }\Vert \) denote any norm with dual norm \(\Vert {\cdot }\Vert ^*\). If \({\mathbf{y}}\in \partial \Vert {{\mathbf{x}}}\Vert \), then it holds that \(\Vert {{\mathbf{y}}}\Vert ^*\le 1\).

Lemma 5

The sequences \(\{{\mathcal {Y}}_{t}\},\{{\mathcal {Y}}^1_{t}\},\{{\mathcal {Y}}^2_{t+1}\},\{{\mathcal {Y}}^3_{t+1}\}\) in Algorithm 1 are bounded.

Proof

First, according to the optimality of \({\mathcal {S}}_{t+1}\) in Problem (19), we have that

$$\begin{aligned}&{\mathbf{0}}\in \lambda \partial \Vert {{\mathcal {S}}_{t+1}} \Vert _{\text {1}}\\&\quad +\mu _{t}({\mathcal {P}}_{t+1}*{\mathcal {C}}_{t+1}*{\mathcal {Q}}_{t+1}{}^\top +{\mathcal {S}}_{t+1}-{\mathcal {M}}+{\mathcal {Y}}_{t}/\mu _{t}), \end{aligned}$$

which means

$$\begin{aligned} -{\mathcal {Y}}_{t+1}\in \lambda \partial \Vert {{\mathcal {S}}_{t+1}} \Vert _{\text {1}}\Rightarrow \Vert {{\mathcal {Y}}_{t+1}} \Vert _{\infty }\le \lambda . \end{aligned}$$

Thus, \(\{{\mathcal {Y}}_{t}\}\) is a bounded sequence.

Then, according to the optimality of \({\mathcal {Q}}_{t+1}\) in Problem (17), we obtain \( \Vert {{\mathcal {Y}}^2_{t+1}} \Vert _{\text {F}}\le \Vert {{\mathcal {Y}}^3_{t+1}} \Vert _{\text {F}}. \)

Next, the optimality of \({\mathcal {P}}_{t+1}\) to Problem (12) yields

$$\begin{aligned} \Vert {{\mathcal {Y}}^3_{t+1}} \Vert _{\text {F}} &\le\Vert {{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}_{t}{}^\top +{\mathcal {S}}_{t}-{\mathcal {M}}+{\mathcal {Y}}_{t}/\mu _{t}} \Vert _{\text {F}}\\ &=\Vert {{\mathcal {Y}}_{t}/\mu _{t-1}-{\mathcal {Y}}_{t-1}/\mu _{t-1}+{\mathcal {Y}}_{t}/\mu _{t}} \Vert _{\text {F}}. \end{aligned}$$

The boundedness of \(\{{\mathcal {Y}}_{t}\}\) thus leads to the boundedness of \(\{{\mathcal {Y}}_{t}^3\}\), and then \(\{{\mathcal {Y}}_{t}^2\}\) is also bounded.

The optimality of \({\mathcal {C}}_{t+1}\) to Problem (18) yields

$$\begin{aligned}&{\mathbf{0}}\in \partial \Vert {{\mathcal {C}}_{t+1}} \Vert _{\star }\\&\quad +\mu _t\big ({\mathcal {C}}_{t+1}+{\mathcal {P}}_{t+1}^{\top }*({\mathcal {S}}_{t}-{\mathcal {M}}+{\mathcal {Y}}_{t}/\mu _{t})*{\mathcal {Q}}_{t+1}\big ), \end{aligned}$$

which means

$$\begin{aligned}&-{\mathcal {P}}_{t+1}^{\top }{\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}\in \partial \Vert {{\mathcal {C}}_{t+1}} \Vert _{\star }\Rightarrow \Vert {{\mathcal {P}}_{t+1}^{\top }{\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}} \Vert \le 1. \end{aligned}$$

Let \({\mathcal {P}}_{t+1}^{\perp }={\mathcal {I}}-{\mathcal {P}}_{t+1}*{\mathcal {P}}_{t+1}{}^\top \) and \({\mathcal {Q}}_{t+1}^{\perp }={\mathcal {I}}-{\mathcal {Q}}_{t+1}*{\mathcal {Q}}_{t+1}{}^\top \). Then,

$$\begin{aligned} \Vert {({\mathcal {P}}_{t+1}^{\perp }){}^\top {\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}} \Vert = \Vert {({\mathcal {P}}_{t+1}^{\perp }){}^\top {\mathcal {Y}}^3_{t+1}{\mathcal {Q}}_{t+1}} \Vert \le \Vert {{\mathcal {Y}}^3_{t+1}} \Vert . \end{aligned}$$

Thus, \(\{({\mathcal {P}}_{t+1}^{\perp }){}^\top {\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}\}\) is bounded. Similarly, sequences \(\{({\mathcal {P}}_{t+1}^{\perp }){}^\top {\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}^{\perp }\}\) and \(\{{\mathcal {P}}_{t+1}{}^\top {\mathcal {Y}}^1_{t+1}{\mathcal {Q}}_{t+1}^{\perp }\}\) are also bounded.

In this way, \(\{{\mathcal {Y}}_{t}^1\}\) is also bounded. \(\square \)

Equipped with the above two lemmas, we are able to prove Theorem 1.

Proof of Theorem 1

First, according to the process of Algorithm 1, we have the following chain of inequalities of the Lagrangian:

$$\begin{aligned}&L_{\mu _{t}}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t+1},{\mathcal {Q}}_{t+1},{\mathcal {S}}_{t+1},{\mathcal {Y}}_{t})\\&\quad \le L_{\mu _{t}}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t+1},{\mathcal {Q}}_{t+1},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\&\quad \le L_{\mu _{t}}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t},{\mathcal {Q}}_{t+1},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\&\quad \le L_{\mu _{t}}({\mathcal {P}}_{t+1},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\&\quad \le L_{\mu _{t}}({\mathcal {P}}_{t},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t})\\&\quad \le L_{\mu _{t-1}}({\mathcal {P}}_{t},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t-1})+\frac{\mu _{t}+\mu _{t-1}}{2\mu _{t-1}^2} \Vert {{\mathcal {Y}}_{t}-{\mathcal {Y}}_{t-1}} \Vert _{\text {F}}^2\\&\quad \le L_{\mu _{0}}({\mathcal {P}}_{1},{\mathcal {C}}_{1},{\mathcal {Q}}_{1},{\mathcal {S}}_{1},{\mathcal {Y}}_{0})+\sum _{s=1}^{t}\frac{\mu _{s}+\mu _{s-1}}{2\mu _{s-1}^2} \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2\\&\quad \le L_{\mu _{0}}({\mathcal {P}}_{1},{\mathcal {C}}_{1},{\mathcal {Q}}_{1},{\mathcal {S}}_{1},{\mathcal {Y}}_{0})\\&\qquad +(\max _{s} \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2)\sum _{s=1}^{t}\frac{\mu _{s}+\mu _{s-1}}{2\mu _{s-1}^2} \end{aligned}$$

Note that the quantity \(\max _{s} \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2\) in the above inequality is bounded, since \(\{{\mathcal {Y}}_t\}\) is bounded. Recalling the update \(\mu _{t}=\rho \mu _{t-1}=\rho ^t\mu _{0}\) in Algorithm 1, the quantity \(\sum _{t=1}^{\infty }\frac{\mu _{t}+\mu _{t-1}}{2\mu _{t-1}^2}\) is also bounded, since

$$\begin{aligned} \sum _{t=1}^{\infty }\frac{\mu _{t}+\mu _{t-1}}{2\mu _{t-1}^2}=\frac{\rho +1}{2\mu _{0}}\sum _{t=1}^{\infty }\frac{1}{\rho ^{t-1}}=\frac{\rho (\rho +1)}{2\mu _{0}(\rho -1)}. \end{aligned}$$

Thus, \(L_{\mu _{t-1}}({\mathcal {P}}_{t},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t-1})\) is bounded. Note that

$$\begin{aligned}&L_{\mu _{t-1}}({\mathcal {P}}_{t},{\mathcal {C}}_{t},{\mathcal {Q}}_{t},{\mathcal {S}}_{t},{\mathcal {Y}}_{t-1}) \\&\quad = \Vert {{\mathcal {C}}_{t}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t}} \Vert _{\text {1}}+\left\langle {\mathcal {Y}}_{t-1},{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t}+{\mathcal {S}}_{t}-{\mathcal {M}}\right\rangle \\&\qquad + \frac{\mu _{t-1}}{2} \Vert {{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t}+{\mathcal {S}}_{t}-{\mathcal {M}}} \Vert _{\text {F}}^2\\&\quad = \Vert {{\mathcal {C}}_{t}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t}} \Vert _{\text {1}}+\left\langle {\mathcal {Y}}_{t-1},\frac{{\mathcal {Y}}_{t}-{\mathcal {Y}}_{t-1}}{\mu _{t-1}}\right\rangle +\frac{\mu _{t-1}}{2} \Vert {\frac{{\mathcal {Y}}_{t}-{\mathcal {Y}}_{t-1}}{\mu _{t-1}}} \Vert _{\text {F}}^2\\&\quad = \Vert {{\mathcal {C}}_{t}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t}} \Vert _{\text {1}}+\frac{1}{2\mu _{t-1}}\big ( \Vert {{\mathcal {Y}}_{t}} \Vert _{\text {F}}^2- \Vert {{\mathcal {Y}}_{t-1}} \Vert _{\text {F}}^2\big ). \end{aligned}$$

Then, the sequence \(\big \{ \Vert {{\mathcal {C}}_{t}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t}} \Vert _{\text {1}}\big \}\) is bounded.

According to the orthogonal invariance of TNN given in Lemma 1, we have

$$\begin{aligned} \Vert {{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t}} \Vert _{\star }= \Vert {{\mathcal {C}}_{t}} \Vert _{\star }. \end{aligned}$$

Then, we obtain that \(({\mathcal {C}}_{t},{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t},{\mathcal {S}}_{t})\) is bounded.

According to the process of Algorithm 1, we have

$$\begin{aligned} {\mathcal {S}}_{t+1}-{\mathcal {S}}_{t}&=\mu _{t}^{-1}{({\mathcal {Y}}_{t+1}-{\mathcal {Y}}^1_{t+1})}{}\\ {\mathcal {C}}_{t+1}-{\mathcal {C}}_t&= \mu _{t}^{-1}{({\mathcal {P}}_{t+1}*({\mathcal {Y}}^1_{t+1}-{\mathcal {Y}}^2_{t+1})*{\mathcal {Q}}{}^\top _{t+1})} \end{aligned}$$

and the following relationships

$$\begin{aligned}&{\mathcal {P}}_{t+1}*{\mathcal {C}}_{t+1}*{\mathcal {Q}}{}^\top _{t+1}-{\mathcal {P}}_{t}*{\mathcal {C}}_{t}*{\mathcal {Q}}{}^\top _{t}\\&\quad =\mu _{t}^{-1}\big ({\mathcal {Y}}^1_{t+1}+{\mathcal {Y}}_{t-1}-(1+\rho ){\mathcal {Y}}_t\big ) \end{aligned}$$

and

$$\begin{aligned} {\mathcal {P}}_{t+1}*{\mathcal {C}}_{t+1}*{\mathcal {Q}}{}^\top _{t+1}+{\mathcal {S}}_{t+1}-{\mathcal {M}}&=\mu _{t}^{-1}{({\mathcal {Y}}_{t+1}-{\mathcal {Y}}_{t})}. \end{aligned}$$

By the update rule \(\mu _t=\rho \mu _{t-1}\) with \(\rho =1.1\) in Algorithm 1, we have \(\lim _{t\rightarrow \infty }\mu _{t}=+\infty \). Combining the above with the boundedness of \({\mathcal {Y}}_{t}\) and \({\mathcal {Y}}^i_{t},~i=1,2,3\), we have

$$\begin{aligned} \mu _{t}^{-1}{({\mathcal {Y}}_{t+1}-{\mathcal {Y}}^1_{t+1})}&\rightarrow {\mathbf{0}}\\ \mu _{t}^{-1}{({\mathcal {Y}}^1_{t+1}-{\mathcal {Y}}^2_{t+1})}&\rightarrow {\mathbf{0}}\\ {\mu _{t}}^{-1}\big ({\mathcal {Y}}^1_{t+1}+{\mathcal {Y}}_{t-1}-(1+\rho ){\mathcal {Y}}_t\big )&\rightarrow {\mathbf{0}}\\ \mu _{t}^{-1}({\mathcal {Y}}_{t+1}-{\mathcal {Y}}_{t})&\rightarrow {\mathbf{0}}. \end{aligned}$$

Then \(\{{\mathcal {S}}_{t}\},\{{\mathcal {C}}_{t}\},\{{\mathcal {P}}_t*{\mathcal {C}}_t*{\mathcal {Q}}{}^\top _t\}\) are Cauchy sequences, and

$$\begin{aligned} \Vert {{\mathcal {P}}_{t+1}*{\mathcal {C}}_{t+1}*{\mathcal {Q}}{}^\top _{t+1}+{\mathcal {S}}_{t+1}-{\mathcal {M}}} \Vert _{\infty }\le \varepsilon . \end{aligned}$$

\(\square \)

4.2 Sub-optimality of Algorithm 1

In this subsection, we will establish the sub-optimality of Algorithm 1 by upper bounding the gap between the objective value produced by Algorithm 1 and the optimal value of the TNN-based TRPCA [24] in Problem (7).

4.2.1 Connection between Algorithm 1 and TRPCA

We first build a connection between the proposed TriFac model in Problem (9) and the TNN-based TRPCA model [24] in Problem (7) in the following theorem.

Theorem 2

(Connection between TriFac and TRPCA) Let \(({\mathcal {P}}_{*},{\mathcal {C}}_{*},{\mathcal {Q}}_{*},{\mathcal {S}}_{*})\) be a global optimal solution to TriFac in Problem (9), and let \(({\mathcal {L}}^\star ,{\mathcal {S}}^\star ) \) be the solution to TRPCA in Problem (7) with \(r_{\text{ t }}({{\mathcal {L}}^\star })\le r\), where r is the initialized tubal rank. Then \(({\mathcal {P}}_{*}*{\mathcal {C}}_{*}*{\mathcal {Q}}{}^\top _{*},{\mathcal {S}}_{*})\) is also an optimal solution to Problem (7).

Theorem 2 asserts that the global optimal point of TriFac coincides with the solution of TNN-based TRPCA, which is guaranteed to exactly recover the underlying tensor \({\mathcal {L}}^*\) under certain conditions. This means that the accuracy of the proposed model cannot exceed that of TRPCA, which will be shown numerically in the experiment section. The proof of Theorem 2 is given as follows.

Proof of Theorem 2

Note that \(({\mathcal {P}}_{*}*{\mathcal {C}}_{*}*{\mathcal {Q}}{}^\top _{*},{\mathcal {S}}_{*})\) is a feasible point of Problem (7), then we have

$$\begin{aligned} \begin{aligned} \Vert {{\mathcal {L}}^\star } \Vert _{\star }+\lambda \Vert {{\mathcal {S}}^\star } \Vert _{\text {1}}&\le \Vert {{\mathcal {P}}_{*}*{\mathcal {C}}_{*}*{\mathcal {Q}}{}^\top _{*}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{*}} \Vert _{\text {1}}\\&= \Vert {{\mathcal {C}}_{*}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{*}} \Vert _{\text {1}}\\ \end{aligned} \end{aligned}$$
(21)

By the assumption that \(r_{\text{ t }}({{\mathcal {L}}^\star })\le r\), there exists a decomposition \({\mathcal {L}}^\star ={\mathcal {P}}^\star *{\mathcal {C}}^\star *({\mathcal {Q}}^\star ){}^\top \), such that \(({\mathcal {P}}^\star ,{\mathcal {C}}^\star ,{\mathcal {Q}}^\star ,{\mathcal {S}}^\star )\) is also a feasible point of Problem (9).

Moreover, since \(({\mathcal {C}}_*,{\mathcal {S}}_*)\) is a global optimal solution to Problem (9), we have that

$$\begin{aligned} \Vert {{\mathcal {C}}_{*}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{*}} \Vert _{\text {1}}\le \Vert {{\mathcal {C}}^\star } \Vert _{\star }+\lambda \Vert {{\mathcal {S}}^\star } \Vert _{\text {1}}. \end{aligned}$$

By \({\mathcal {L}}^\star ={\mathcal {P}}^\star *{\mathcal {C}}^\star *({\mathcal {Q}}^\star ){}^\top \), we have

$$\begin{aligned} \Vert {{\mathcal {L}}^\star } \Vert _{\star }= \Vert {{\mathcal {P}}^\star *{\mathcal {C}}^\star *({\mathcal {Q}}^\star ){}^\top } \Vert _{\star }= \Vert {{\mathcal {C}}^\star } \Vert _{\star }. \end{aligned}$$

Thus, we deduce

$$\begin{aligned} \begin{aligned} \Vert {{\mathcal {C}}_{*}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{*}} \Vert _{\text {1}}\le \Vert {{\mathcal {L}}^\star } \Vert _{\star }+\lambda \Vert {{\mathcal {S}}^\star } \Vert _{\text {1}}. \end{aligned} \end{aligned}$$
(22)

According to Eqs. (21) and (22), we further have

$$\begin{aligned} \Vert {{\mathcal {C}}_{*}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{*}} \Vert _{\text {1}}= \Vert {{\mathcal {L}}^\star } \Vert _{\star }+\lambda \Vert {{\mathcal {S}}^\star } \Vert _{\text {1}}. \end{aligned}$$

In this way, \(({\mathcal {P}}_{*}*{\mathcal {C}}_{*}*{\mathcal {Q}}{}^\top _{*},{\mathcal {S}}_{*})\) is also the optimal solution to the TRPCA Problem (7). \(\square \)

Next, we further establish the sub-optimality of Algorithm 1. Let \(t^*\) be the iteration number at which Algorithm 1 stops; then we have the following result.

Theorem 3

(The sub-optimality of Algorithm 1 with respect to TriFac) Let \(({\mathcal {P}}_g,{\mathcal {C}}_g,{\mathcal {Q}}_g,{\mathcal {S}}_g)\) be a global optimal solution to Problem (9). Denote \(f_g= \Vert {{\mathcal {C}}_g} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_g} \Vert _{\text {1}}\) and \(f_{t^*+1}= \Vert {{\mathcal {C}}_{t^*+1}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\). Then, we have

$$\begin{aligned} f_{t^*+1}-f_g \le (\lambda +1)c_1+d_1d_2d_3\varepsilon , \end{aligned}$$

where

$$\begin{aligned} c_1=d_1d_2d_3\lambda ^2 \Vert {{\mathcal {M}}} \Vert \bigg (\frac{\rho (\rho +1)}{\mu _{0}(\rho -1)}+\frac{1}{2\rho ^{t^*}}\bigg )+ \Vert {{\mathcal {M}}} \Vert _{\text {1}}. \end{aligned}$$

Theorem 3 gives an upper bound on the gap between the global optimal value of TriFac and the objective value of Algorithm 1. Since TriFac shares the global optima with TRPCA, the gap between objective values of Algorithm 1 and TRPCA is upper bounded by

$$\begin{aligned} (\lambda +1)c_1+d_1d_2d_3\varepsilon . \end{aligned}$$

Now, we are in a position to give the proof of Theorem 3 in the following subsection.

4.2.2 Proof of Theorem 3

To prove Theorem 3, we need the following lemmas.

Lemma 6

(Subgradient of TNN [23]) Let \({\mathcal {X}}\in \mathbb {R}^{m\times n \times k}\) have tubal rank r and skinny t-SVD \({\mathcal {X}}={\mathcal {U}}*{\mathcal {S}}*{\mathcal {V}}{}^\top \). The subdifferential (the set of subgradients) \(\partial \Vert {{\mathcal {X}}} \Vert _{\star }\) is

$$\begin{aligned}&\partial \Vert {{\mathcal {X}}} \Vert _{\star }\\&\quad =\{{\mathcal {U}}*{\mathcal {V}}{}^\top +{\mathcal {W}}|{\mathcal {U}}{}^\top *{\mathcal {W}}={\mathbf{0}},{\mathcal {W}}*{\mathcal {V}}={\mathbf{0}}, \Vert {{\mathcal {W}}} \Vert \le 1\}. \end{aligned}$$

Lemma 7

Let \({\mathcal {X}},{\mathcal {Y}},{\mathcal {P}}\) and \({\mathcal {Q}}\) be tensors of compatible dimensions. If \({\mathcal {P}},{\mathcal {Q}}\) obey \({\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}\) and \({\mathcal {Y}}\in \partial \Vert {{\mathcal {X}}} \Vert _{\star }\), then

$$\begin{aligned} {\mathcal {P}}*{\mathcal {Y}}\in&\partial \Vert {{\mathcal {P}}*{\mathcal {X}}} \Vert _{\star }\\ {\mathcal {Y}}*{\mathcal {Q}}{}^\top \in&\partial \Vert {{\mathcal {X}}*{\mathcal {Q}}{}^\top } \Vert _{\star }\\ {\mathcal {P}}*{\mathcal {Y}}*{\mathcal {Q}}{}^\top \in&\partial \Vert {{\mathcal {P}}*{\mathcal {X}}*{\mathcal {Q}}{}^\top } \Vert _{\star }. \end{aligned}$$

Proof

Let the skinny t-SVD of \({\mathcal {X}}\) be \({\mathcal {X}}={\mathcal {U}}*{\mathcal {S}}*{\mathcal {V}}{}^\top \). By \({\mathcal {Y}}\in \partial \Vert {{\mathcal {X}}} \Vert _{\star }\) and Lemma 6, we have

$$\begin{aligned} {\mathcal {Y}}={\mathcal {U}}*{\mathcal {V}}{}^\top +{\mathcal {W}}, \end{aligned}$$

where \({\mathcal {U}}{}^\top *{\mathcal {W}}={\mathbf{0}},{\mathcal {W}}*{\mathcal {V}}={\mathbf{0}}, \Vert {{\mathcal {W}}} \Vert \le 1\). Since \({\mathcal {P}}\) is column-orthogonal, then \({\mathcal {P}}*{\mathcal {X}}\) has a skinny t-SVD as

$$\begin{aligned} {\mathcal {P}}*{\mathcal {X}}=({\mathcal {P}}*{\mathcal {U}})*{\mathcal {S}}*{\mathcal {V}}{}^\top , \end{aligned}$$

where \(({\mathcal {P}}*{\mathcal {U}}){}^\top *({\mathcal {P}}*{\mathcal {U}})={\mathcal {U}}{}^\top *{\mathcal {P}}{}^\top *{\mathcal {P}}*{\mathcal {U}}={\mathcal {I}}_{r}\). This yields

$$\begin{aligned}&\partial \Vert {{\mathcal {P}}*{\mathcal {X}}} \Vert _{\star }=\{{\mathcal {P}}*{\mathcal {U}}*{\mathcal {V}}{}^\top +{\mathcal {W}}_1\}, \end{aligned}$$

where \({\mathcal {U}}{}^\top *{\mathcal {P}}{}^\top *{\mathcal {W}}_1={\mathbf{0}},{\mathcal {W}}_1*{\mathcal {V}}={\mathbf{0}}, \Vert {{\mathcal {W}}_1} \Vert \le 1\). With the above notations, it can be verified that \({\mathcal {P}}*{\mathcal {Y}}\in \partial \Vert {{\mathcal {P}}*{\mathcal {X}}} \Vert _{\star }\). Similarly, the remaining two relationships can be proved. \(\square \)

Lemma 8

For any tensor \({\mathcal {X}}\), we have \( \Vert {{\mathcal {X}}} \Vert _{\infty }\le \Vert {{\mathcal {X}}} \Vert \).

Proof

It is obvious that \( \Vert {{\mathcal {X}}} \Vert _{\infty }=\Vert \text {bcirc}({\mathcal {X}})\Vert _{\infty }\), where \(\text {bcirc}(\cdot )\) is the block-circulant operator [24]. It is also known that \( \Vert {{\mathcal {X}}} \Vert =\Vert {{\overline{{\mathcal {X}}}}}\Vert =\Vert {\text {bcirc}({\mathcal {X}})}\Vert \). Note that for any matrix \({\mathbf{X}}\in \mathbb {R}^{m\times n}\), we have

$$\begin{aligned} \Vert {\mathbf{X}}\Vert _{\infty }&=\max _{(i,j)\in [m]\times [n]}\big |{\mathbf{e}}_i{}^\top {\mathbf{X}}{\mathbf{e}}_j\big |\\&\le \sup _{{\mathbf{u}}\in {\mathbb {S}}^{m-1},{\mathbf{v}}\in {\mathbb {S}}^{n-1}}\big |{\mathbf{u}}{}^\top {\mathbf{X}}{\mathbf{v}}\big |\\&=\Vert {\mathbf{X}}\Vert . \end{aligned}$$

Thus, we have the following relationship

$$\begin{aligned} \Vert {{\mathcal {X}}} \Vert _{\infty }=\Vert \text {bcirc}({\mathcal {X}})\Vert _{\infty }\le \Vert {\text {bcirc}({\mathcal {X}})}\Vert = \Vert {{\mathcal {X}}} \Vert . \end{aligned}$$

\(\square \)
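To make Lemma 8 concrete, the following minimal NumPy sketch checks the inequality \( \Vert {{\mathcal {X}}} \Vert _{\infty }\le \Vert {{\mathcal {X}}} \Vert \) numerically. It assumes the standard FFT-based evaluation of the tensor spectral norm (the largest matrix spectral norm among the Fourier-domain frontal slices, which equals \(\Vert \text {bcirc}({\mathcal {X}})\Vert \)); the function name is ours and only for illustration.

```python
import numpy as np

def tensor_spectral_norm(X):
    """Tensor spectral norm ||X|| under the t-product: the largest matrix
    spectral norm over the frontal slices of the DFT (along the third mode)
    of X, which equals ||bcirc(X)||."""
    Xf = np.fft.fft(X, axis=2)
    return max(np.linalg.norm(Xf[:, :, k], 2) for k in range(X.shape[2]))

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6, 5))

linf_norm = np.abs(X).max()          # ||X||_inf: largest entry magnitude
spec_norm = tensor_spectral_norm(X)  # ||X||
print(linf_norm, spec_norm)          # Lemma 8: the first never exceeds the second
assert linf_norm <= spec_norm + 1e-10
```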

Based on the above three lemmas, we give the following proposition which is key to the proof of Theorem 3.

Proposition 1

Let \(({\mathcal {P}}_{t^*+1},{\mathcal {C}}_{t^*+1},{\mathcal {Q}}_{t^*+1},{\mathcal {S}}_{t^*+1})\) denote the solution generated by Algorithm 1. Then, we have

$$\begin{aligned}&\Vert {{\mathcal {C}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}-( \Vert {{\mathcal {C}}_{{t^*+1}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{{t^*+1}}} \Vert _{\text {1}})\\&\quad \ge \left\langle {\mathcal {Y}}_{{t^*+1}}-{\mathcal {Y}}^1_{t^*+1},{\mathcal {S}}-{\mathcal {S}}_{{t^*+1}}\right\rangle -d_1d_2d_3\varepsilon , \end{aligned}$$

for any feasible point \(({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}})\) of Problem (9).

Proof

Recall the update of \({\mathcal {P}}_{t^*+1}\) that

$$\begin{aligned} {\mathcal {P}}_{t^*+1}=\mathop {{{\,\mathrm{argmin}\,}}}_{{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r}} \Vert {{\mathcal {P}}*({\mathcal {C}}_{t^*}*{\mathcal {Q}}{}^\top _{t^*})-({\mathcal {M}}-{\mathcal {S}}_{t^*}-{\mathcal {Y}}_{t^*}/\mu _{t^*})} \Vert _{\text {F}}^2. \end{aligned}$$

Let the skinny t-SVD of \(({\mathcal {M}}-{\mathcal {S}}_{t^*}-{\mathcal {Y}}_{t^*}/\mu _{t^*})\) be \({\mathcal {U}}_t*{\underline{\varLambda }}_t*{\mathcal {V}}{}^\top _t\), and let the full t-SVD of \({\underline{\varLambda }}_t*{\mathcal {V}}{}^\top _t*({\mathcal {C}}_{t^*}*{\mathcal {Q}}{}^\top _{t^*}){}^\top \) be \({\mathcal {U}}*{\underline{\varLambda }}*{\mathcal {V}}{}^\top \). Then we have

$$\begin{aligned} {\mathcal {P}}_{t^*+1}={\mathcal {U}}_t*{\mathcal {U}}*{\mathcal {V}}{}^\top , \end{aligned}$$

which further leads to

$$\begin{aligned} {\mathcal {P}}_{t^*+1}*{\mathcal {P}}{}^\top _{t^*+1}={\mathcal {U}}_t*{\mathcal {U}}{}^\top _t. \end{aligned}$$

Similarly, one has

$$\begin{aligned} {\mathcal {Q}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}={\mathcal {V}}_t*{\mathcal {V}}{}^\top _t. \end{aligned}$$

Recall the definition of \({\mathcal {Y}}^1_{t^*+1}\) in Algorithm 1 that

$$\begin{aligned} {\mathcal {Y}}^1_{t^*+1} &={\mathcal {Y}}_{t^*}+\mu _{t^*}({\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}+{\mathcal {S}}_{t^*}-{\mathcal {M}})\\ &=\mu _{t^*}\big ( {\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1} -({\mathcal {M}}-{\mathcal {S}}_{t^*}-{\mathcal {Y}}_{t^*}/\mu _{t^*}) \big ) \end{aligned}$$

Left t-multiplying both sides of the above equation by \({\mathcal {P}}_{t^*+1}*{\mathcal {P}}{}^\top _{t^*+1}\) and right t-multiplying by \({\mathcal {Q}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}\), we get

$$\begin{aligned}&{\mathcal {P}}_{t^*+1}*{\mathcal {P}}{}^\top _{t^*+1}*{\mathcal {Y}}^1_{t^*+1}*{\mathcal {Q}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}\\&\quad = \mu _{t^*}\big ( {\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1} -({\mathcal {M}}-{\mathcal {S}}_{t^*}-{\mathcal {Y}}_{t^*}/\mu _{t^*})\big ), \end{aligned}$$

Thus, we have

$$\begin{aligned} {\mathcal {Y}}^1_{t^*+1}={\mathcal {P}}_{t^*+1}*{\mathcal {P}}{}^\top _{t^*+1}*{\mathcal {Y}}^1_{t^*+1}*{\mathcal {Q}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}. \end{aligned}$$

The update of \({\mathcal {C}}_{t^*+1}\) indicates that

$$\begin{aligned}&{\mathbf{0}}\in \partial \Vert {{\mathcal {C}}_{t^*+1}} \Vert _{\star }+{\mathcal {P}}{}^\top _{t^*+1}*{\mathcal {Y}}^1_{t^*+1}*{\mathcal {Q}}_{t^*+1}\\&\quad \Rightarrow -{\mathcal {Y}}^1_{t^*+1}\in \partial \Vert {{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}} \Vert _{\star }\\&\quad \Rightarrow \Vert {{\mathcal {Y}}^1_{t^*+1}} \Vert \le 1. \end{aligned}$$

Then, we have the relationship \( {\mathcal {P}}_{t^*+1}*{\mathcal {P}}{}^\top _{t^*+1}*{\mathcal {Y}}^1_{t^*+1}*{\mathcal {Q}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1} \in \partial \Vert {{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}} \Vert _{\star }. \)

According to the update of \({\mathcal {S}}_{t^*+1}\), we get \( \Vert {{\mathcal {Y}}_{t^*+1}} \Vert _{\infty }\le \lambda \). Then, using the convexity of the objective, we have for any feasible solution \(({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}})\)

$$\begin{aligned}&\Vert {{\mathcal {C}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\&\quad = \Vert {{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top } \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\&\quad \ge \Vert {{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}+\left\langle {\mathcal {Y}}_{t^*+1},{\mathcal {S}}-{\mathcal {S}}_{t^*+1}\right\rangle \\&\qquad +\left\langle {\mathcal {Y}}^1_{t^*+1},{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top -{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}\right\rangle \\&\quad = \Vert {{\mathcal {C}}_{t^*+1}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}+\left\langle {\mathcal {Y}}_{t^*+1}-{\mathcal {Y}}^1_{t^*+1},{\mathcal {S}}-{\mathcal {S}}_{t^*+1}\right\rangle \\&\qquad +\left\langle {\mathcal {Y}}^1_{t^*+1},{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}\right\rangle . \end{aligned}$$

Note that the feasibility of \(({\mathcal {P}},{\mathcal {C}},{\mathcal {Q}},{\mathcal {S}})\) indicates \( {\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}={\mathcal {M}}. \) Then, we have

$$\begin{aligned}&\Vert {{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}} \Vert _{\infty }\\&\quad = \Vert {{\mathcal {M}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}} \Vert _{\infty }\\&\quad = \Vert {\mu _{t^*}^{-1}({\mathcal {Y}}_{t^*+1}-{\mathcal {Y}}_{t^*})} \Vert _{\infty }\le \varepsilon . \end{aligned}$$

Note that for any tensor \({\mathcal {X}}\), we have \( \Vert {{\mathcal {X}}} \Vert _{\infty }\le \Vert {{\mathcal {X}}} \Vert \) according to Lemma 8. Then, it holds that

$$\begin{aligned}&\Big |\left\langle {\mathcal {Y}}^1_{t^*+1},{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}\right\rangle \Big |\\&\quad \le \Vert {{\mathcal {Y}}^1_{t^*+1}} \Vert _{\infty } \Vert {{\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\\&\quad \le \Vert {{\mathcal {Y}}^1_{t^*+1}} \Vert \cdot d_1d_2d_3 \Vert {{\mathcal {M}}-{\mathcal {P}}_{t^*+1}*{\mathcal {C}}_{t^*+1}*{\mathcal {Q}}{}^\top _{t^*+1}-{\mathcal {S}}_{t^*+1}} \Vert _{\infty }\\&\quad \le d_1d_2d_3\varepsilon . \end{aligned}$$

Thus, the statement of this proposition is proved. \(\square \)

Then, we are able to prove Theorem 3 based on Proposition 1.

Proof of Theorem 3

Let \(({\mathcal {P}}_g,{\mathcal {C}}_g,{\mathcal {Q}}_g,{\mathcal {S}}_g)\) be a global solution to Problem (9). Since \(({\mathcal {P}}_{t^*+1},{\mathbf{0}},{\mathcal {Q}}_{t^*+1},{\mathcal {M}})\) is feasible to Problem (9), we have

$$\begin{aligned} \lambda \Vert {{\mathcal {S}}_g} \Vert _{\text {1}}\le \Vert {{\mathcal {C}}_g} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_g} \Vert _{\text {1}}\le \lambda \Vert {{\mathcal {M}}} \Vert _{\text {1}}. \end{aligned}$$

Also, we have

$$\begin{aligned}&\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\\&\quad \le \Vert {{\mathcal {C}}_{t^*+1}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\\&\quad =L_{\mu _{t^*}}({\mathcal {P}}_{t^*+1},{\mathcal {C}}_{t^*+1},{\mathcal {Q}}_{t^*+1},{\mathcal {S}}_{t^*+1},{\mathcal {Y}}_{t^*})\\&\qquad -\frac{1}{2\mu _{t^*}}\bigg ( \Vert {{\mathcal {Y}}_{t^*+1}} \Vert _{\text {F}}^2- \Vert {{\mathcal {Y}}_{t^*}} \Vert _{\text {F}}^2\bigg )\\&\quad \le L_{\mu _{t^*}}({\mathcal {P}}_{t^*+1},{\mathcal {C}}_{t^*+1},{\mathcal {Q}}_{t^*+1},{\mathcal {S}}_{t^*+1},{\mathcal {Y}}_{t^*})+\frac{1}{2\mu _{t^*}} \Vert {{\mathcal {Y}}_{t^*}} \Vert _{\text {F}}^2\\&\quad \le L_{\mu _{0}}({\mathcal {P}}_{1},{\mathcal {C}}_{1},{\mathcal {Q}}_{1},{\mathcal {S}}_{1},{\mathcal {Y}}_{0}) \\&\qquad +\sum _{s=1}^{t^*+1}\frac{\mu _{s}+\mu _{s-1}}{2\mu _{s-1}^2} \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2 +\frac{1}{2\mu _{t^*}} \Vert {{\mathcal {Y}}_{t^*}} \Vert _{\text {F}}^2\\&\quad \le \frac{\rho (\rho +1)}{2\mu _{0}(\rho -1)}\max _s \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2 +\frac{1}{2\mu _{t^*}} \Vert {{\mathcal {Y}}_{t^*}} \Vert _{\text {F}}^2. \end{aligned}$$

Since \( \Vert {{\mathcal {Y}}_t} \Vert _{\infty }\le \lambda ,\forall t\in {\mathbb {N}}_+\), we have

$$\begin{aligned} \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\infty }\le \Vert {{\mathcal {Y}}_{s}} \Vert _{\infty }+ \Vert {{\mathcal {Y}}_{s-1}} \Vert _{\infty } \le 2\lambda . \end{aligned}$$

Thus we have \(\max _s \Vert {{\mathcal {Y}}_{s}-{\mathcal {Y}}_{s-1}} \Vert _{\text {F}}^2\le 4d_1d_2d_3\lambda ^2\) and \( \Vert {{\mathcal {Y}}_{t^*}} \Vert _{\text {F}}^2\le \lambda ^2\). Then, it yields

$$\begin{aligned} \lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}&\le\frac{d_1d_2d_3\lambda ^2}{2\mu _0}\bigg (\frac{2\rho (\rho +1)}{\mu _{0}(\rho -1)}+\frac{1}{\rho ^{t^*}}\bigg )\\ &=d_1d_2d_3\lambda ^2 \Vert {{\mathcal {M}}} \Vert \bigg (\frac{\rho (\rho +1)}{\mu _{0}(\rho -1)}+\frac{1}{2\rho ^{t^*}}\bigg ). \end{aligned}$$

Hence,

$$\begin{aligned} \Vert {{\mathcal {S}}_g-{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\le \Vert {{\mathcal {S}}_g} \Vert _{\text {1}}+ \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}\le c_1, \end{aligned}$$

where

$$\begin{aligned} c_1=d_1d_2d_3\lambda ^2 \Vert {{\mathcal {M}}} \Vert \bigg (\frac{\rho (\rho +1)}{\mu _{0}(\rho -1)}+\frac{1}{2\rho ^{t^*}}\bigg )+ \Vert {{\mathcal {M}}} \Vert _{\text {1}}. \end{aligned}$$

Also note that \( \Vert {{\mathcal {Y}}_{t^*+1}-{\mathcal {Y}}^1_{t^*+1}} \Vert _{\infty }\le \Vert {{\mathcal {Y}}_{t^*+1}} \Vert _{\infty }+ \Vert {{\mathcal {Y}}^1_{t^*+1}} \Vert _{\infty }\le \Vert {{\mathcal {Y}}_{t^*+1}} \Vert _{\infty }+ \Vert {{\mathcal {Y}}^1_{t^*+1}} \Vert \le 1+\lambda \). Thus, we have

$$\begin{aligned} f_g&= \Vert {{\mathcal {C}}_g} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_g} \Vert _{\text {1}}\\&\ge \Vert {{\mathcal {C}}_{t^*+1}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}} \\&\quad +\left\langle {\mathcal {Y}}_{t^*+1}-{\mathcal {Y}}^1_{t^*+1},{\mathcal {S}}_g-{\mathcal {S}}_{t^*+1}\right\rangle -d_1d_2d_3\varepsilon \\& \ge f_{t^*+1}- \Vert {{\mathcal {Y}}_{t^*+1}-{\mathcal {Y}}^1_{t^*+1}} \Vert _{\infty } \Vert {{\mathcal {S}}_g-{\mathcal {S}}_{t^*+1}} \Vert _{\text {1}}-d_1d_2d_3\varepsilon \\& \ge f_{t^*+1}-(\lambda +1)c_1-d_1d_2d_3\varepsilon , \end{aligned}$$

which completes the proof. \(\square \)
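To illustrate how the sub-optimality gap can be evaluated in practice, the following sketch plugs concrete values into the constant \(c_1\) and the bound \((\lambda +1)c_1+d_1d_2d_3\varepsilon \) derived above. The spectral-norm helper and all parameter values (\(\mu _0,\rho ,t^*,\varepsilon \)) are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np

def tensor_spectral_norm(X):
    # ||X||: largest spectral norm of the Fourier-domain frontal slices
    Xf = np.fft.fft(X, axis=2)
    return max(np.linalg.norm(Xf[:, :, k], 2) for k in range(X.shape[2]))

d1, d2, d3 = 50, 50, 10
rng = np.random.default_rng(1)
M = rng.standard_normal((d1, d2, d3))

lam = 1.0 / np.sqrt(d3 * max(d1, d2))         # lambda used throughout the paper
mu0, rho, t_star, eps = 1e-2, 1.1, 100, 1e-7  # illustrative ALM parameters

c1 = (d1 * d2 * d3 * lam**2 * tensor_spectral_norm(M)
      * (rho * (rho + 1) / (mu0 * (rho - 1)) + 1.0 / (2 * rho**t_star))
      + np.abs(M).sum())                      # + ||M||_1
gap_bound = (lam + 1) * c1 + d1 * d2 * d3 * eps
print(f"c1 = {c1:.3e}, gap bound = {gap_bound:.3e}")
```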

5 Extensions and differences with prior works

In this section, we first consider an extension where the observation also suffers from missing values, known as robust tensor completion [11], and then explain the differences between this work and prior works.

5.1 Extensions to robust tensor completion

Besides corruptions, the observation \({\mathcal {M}}\) may also suffer from missing entries in many scenarios [11]. Thus, it is also of practical interest to consider robust tensor completion, which aims to recover \({\mathcal {L}}^*\) in the presence of both outliers \({\mathcal {S}}^*\) and missing entries, as described by the following observation model

$$\begin{aligned} \begin{aligned} {\mathcal {M}}= {\mathcal {B}}\circledast ({\mathcal {L}}^*+{\mathcal {S}}^*), \end{aligned} \end{aligned}$$
(23)

where the tensor \({\mathcal {B}}\in \mathbb {R}^{d_1\times d_2 \times d_3}\) denotes the observation mask: \({\mathcal {B}}_{ijk}=1\) if the \((i,j,k)\)th entry of \({\mathcal {L}}^*\) is observed, and \({\mathcal {B}}_{ijk}=0\) otherwise.

In [11], a TNN-based robust tensor completion model is proposed as follows

$$\begin{aligned} \begin{aligned} \min _{{\mathcal {L}},{\mathcal {S}}}~&~ \Vert {{\mathcal {L}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\ \text {s.t.}\,\,&~{\mathcal {B}}\circledast ({\mathcal {L}}+{\mathcal {S}})={\mathcal {M}}. \end{aligned} \end{aligned}$$
(24)

For Model (24), the ADMM algorithm proposed in [11] also requires full SVDs in each iteration, and thus can be accelerated by the factorization strategy in this paper. Taking missing entries into consideration, we simply modify Model (9) as:

$$\begin{aligned} \begin{aligned} \min _{{\mathcal {P}},{\mathcal {Q}},{\mathcal {C}},{\mathcal {S}}}&~ \Vert {{\mathcal {C}}} \Vert _{\star }+\lambda \Vert {{\mathcal {S}}} \Vert _{\text {1}}\\ \text {s.t.}\,\,&~{\mathcal {B}}\circledast ({\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top +{\mathcal {S}})={\mathcal {M}},\\&~{\mathcal {P}}{}^\top *{\mathcal {P}}={\mathcal {I}}_{r},~{\mathcal {Q}}{}^\top *{\mathcal {Q}}={\mathcal {I}}_{r}. \end{aligned} \end{aligned}$$
(25)

Model (25) can also be optimized by ADMM [1]; we omit the detailed derivation since it is beyond the focus of this paper.
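Although we omit the full ADMM derivation, its per-iteration subproblems reuse the same building blocks as Algorithm 1. As an illustration only, the following NumPy sketch (hypothetical helper names, not the Matlab code used in our experiments, and assuming the standard FFT-based implementation of the t-product) shows the two recurring solvers: the orthogonal Procrustes-type update for the factor \({\mathcal {P}}\) (and, symmetrically, \({\mathcal {Q}}\)), and the entrywise soft-thresholding used for \({\mathcal {S}}\). In the masked setting of Model (25), the same solvers are applied after restricting the residual to the observed entries indicated by \({\mathcal {B}}\).

```python
import numpy as np

def t_product(A, B):
    # t-product via FFT along the third mode: slice-wise matrix products
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)
    return np.real(np.fft.ifft(Cf, axis=2))

def procrustes_update(A, T):
    """argmin_{P^T * P = I} ||P * A - T||_F^2, solved slice-wise in the
    Fourier domain: for each slice, P_k = U_k V_k^H with T_k A_k^H = U_k S_k V_k^H."""
    Af, Tf = np.fft.fft(A, axis=2), np.fft.fft(T, axis=2)
    Pf = np.empty((T.shape[0], A.shape[0], A.shape[2]), dtype=complex)
    for k in range(A.shape[2]):
        U, _, Vh = np.linalg.svd(Tf[:, :, k] @ Af[:, :, k].conj().T,
                                 full_matrices=False)
        Pf[:, :, k] = U @ Vh
    return np.real(np.fft.ifft(Pf, axis=2))

def soft_threshold(T, tau):
    # entrywise shrinkage used for the sparse component S
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)
```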

A connection between the proposed Model (25) and Model (24) is established in the following theorem.

Theorem 4

(Connection between Model (25) and Model (24)) Let \(({\mathcal {P}}_{*},{\mathcal {C}}_{*},{\mathcal {Q}}_{*},{\mathcal {S}}_{*})\) be a global optimal solution to Problem (25), and let \(({\mathcal {L}}^\star ,{\mathcal {S}}^\star ) \) be a solution to Problem (24) with \(r_{\text{ t }}({{\mathcal {L}}^\star })\le r\), where r is the initialized tubal rank. Then \(({\mathcal {P}}_{*}*{\mathcal {C}}_{*}*{\mathcal {Q}}{}^\top _{*},{\mathcal {S}}_{*})\) is also an optimal solution to Problem (24).

5.2 Differences with prior works

First, we explain the differences between this paper and the TNN-based TRPCA [24].

  1.

    [24] recovers the underlying tensor using a convex model whereas our factorization-based model is non-convex.

  2.

    The theoretical results in [24] guarantee exact recovery of the underlying tensor, whereas our theoretical analysis focuses on the sub-optimality of the proposed algorithm.

Then, since our work is motivated by the “active space” model [17], we compare the two as follows.

  1.

    The objectives are different: Our method is for tensor recovery, while [17] is for matrix RPCA. Our model is based on the t-SVD algebraic framework and can be viewed as a non-trivial multi-linear extension of [17].

  2.

    In our model, we adopt a triple tensor factorization \({\mathcal {L}}={\mathcal {P}}*{\mathcal {C}}*{\mathcal {Q}}{}^\top \), whereas [17] considers a two-factor matrix factorization \({\mathbf{L}}={\mathbf{P}}{\mathbf{C}}\). Thus, our results cannot be trivially obtained from the analysis of [17].

Finally, we extend the conference version of this work [32] in the following aspects.

  1.

    We analyze the sub-optimality of the proposed algorithm, which is a main technical contribution of this paper. Besides, the proofs of all lemmas and theorems stated in the conference version are given directly in this paper.

  2.

    We also extend the proposed model to the robust tensor completion problem [11] in this paper.

  3.

    A large number of additional experiments are performed in this paper.

6 Experiments

In this section, we experiment on both synthetic and real datasets to verify the effectiveness and efficiency of the proposed algorithm. All codes are written in Matlab, and all experiments are performed on a Windows 10 machine with an Intel(R) Core(TM) i7-8565U CPU (1.80–1.99 GHz) and 8 GB RAM.

6.1 Synthetic data experiments

In this subsection, we compare Algorithm 1 (TriFac) with the TNN-based TRPCA [23] in accuracy and speed on synthetic datasets. Given tensor size \(d_1\times d_2 \times d_3\) and tubal rank \(r^*\ll \min \{d_1,d_2\}\), we first generate a tensor \({\mathcal {L}}_0\in \mathbb {R}^{d_1\times d_2 \times d_3}\) by \( {\mathcal {L}}_0={\mathcal {A}}*{\mathcal {B}}, \) where the entries of the tensors \({\mathcal {A}}\in \mathbb {R}^{d_1\times r^*\times d_3}\) and \({\mathcal {B}}\in \mathbb {R}^{r^*\times d_2\times d_3}\) are sampled from the independent and identically distributed (i.i.d.) standard Gaussian distribution. We then form \({\mathcal {L}}^*\) by \({\mathcal {L}}^*=\sqrt{d_1d_2d_3}{\mathcal {L}}_0/ \Vert {{\mathcal {L}}_0} \Vert _{\text {F}}\). Next, the support of \({\mathcal {S}}^*\) is sampled uniformly at random. For any \((i,j,k)\in \mathrm {supp}\left( {\mathcal {S}}^*\right) \), the entry \({\mathcal {S}}^*_{ijk}\) is set to \(+1\) or \(-1\) with equal probability (i.e., independent symmetric Bernoulli entries). Finally, we form the observation tensor \({\mathcal {M}}={\mathcal {L}}^*+{\mathcal {S}}^*\). In the experiments, we set the parameter \(\lambda =1/\sqrt{d_3\max \{d_1,d_2\}}\). For an estimate \(\hat{{\mathcal {L}}}\) of the underlying tensor \({\mathcal {L}}^*\), the relative squared error (RSE) is used to evaluate the quality of \(\hat{{\mathcal {L}}}\), that is,

$$\begin{aligned} \text {RSE}(\hat{{\mathcal {L}}},{\mathcal {L}}^*):=\frac{ \Vert {\hat{{\mathcal {L}}}-{\mathcal {L}}^*} \Vert _{\text {F}}}{ \Vert {{\mathcal {L}}^*} \Vert _{\text {F}}}. \end{aligned}$$
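For reproducibility, a minimal NumPy sketch of the synthetic-data generation and the RSE metric described above is given below. The experiments in this paper use Matlab; this Python version is only illustrative, and the t-product helper is assumed to follow the standard FFT-based definition.

```python
import numpy as np

def t_product(A, B):
    # t-product: slice-wise matrix products in the Fourier domain
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', Af, Bf), axis=2))

def make_trpca_instance(d1, d2, d3, r_star, s_star, seed=0):
    rng = np.random.default_rng(seed)
    # low-tubal-rank part: L0 = A * B, rescaled so that ||L*||_F = sqrt(d1*d2*d3)
    A = rng.standard_normal((d1, r_star, d3))
    B = rng.standard_normal((r_star, d2, d3))
    L0 = t_product(A, B)
    L_star = np.sqrt(d1 * d2 * d3) * L0 / np.linalg.norm(L0)
    # sparse part: s_star uniformly located entries with symmetric Bernoulli values
    S_star = np.zeros(d1 * d2 * d3)
    idx = rng.choice(d1 * d2 * d3, size=s_star, replace=False)
    S_star[idx] = rng.choice([-1.0, 1.0], size=s_star)
    S_star = S_star.reshape(d1, d2, d3)
    return L_star, S_star, L_star + S_star   # M = L* + S*

def rse(L_hat, L_star):
    # relative squared error used to evaluate the recovery quality
    return np.linalg.norm(L_hat - L_star) / np.linalg.norm(L_star)

L_star, S_star, M = make_trpca_instance(100, 100, 30, r_star=5, s_star=15000)
lam = 1.0 / np.sqrt(30 * 100)   # lambda = 1/sqrt(d3 * max(d1, d2))
print(rse(M, L_star))           # RSE of the raw observation, before recovery
```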

6.1.1 Effectiveness and efficiency of TriFac

Table 2 Comparison with TRPCA [24] in both accuracy and speed for different tensor sizes when the outliers follow i.i.d. Bernoulli distribution

We first show that TriFac can exactly recover the underlying tensor \({\mathcal {L}}^*\) from corruptions at a faster speed than TRPCA. We test the recovery performance for different tensor sizes by setting \(d_1=d_2\in \{100,160,200\}\) and \(d_3=30\), with \((r_{\text{ t }}({{\mathcal {L}}^*}), \Vert {{\mathcal {S}}^*} \Vert _{0})=(0.05d,0.05d^2d_3)\). Then, a more difficult setting \((r_{\text{ t }}({{\mathcal {L}}^*}), \Vert {{\mathcal {S}}^*} \Vert _{0})=(0.15d,0.1d^2d_3)\) is tested. The results are shown in Table 2. It can be seen that TriFac performs as well as TRPCA in the sense that both of them can exactly recover the underlying tensor. However, TriFac is much faster than TRPCA.

To test whether the distribution of outliers affects the performance of TriFac, we also consider the cases where the elements of the outlier tensor \({\mathcal {S}}^*\) are drawn from the i.i.d. Gaussian distribution \({\mathcal {N}}(0,1)\) or the uniform distribution \({\mathcal {U}}[0,1]\). The corresponding results are shown in Tables 3 and 4, respectively. As can be seen, the performance and speed of TriFac and TRPCA are not significantly affected by the distribution of outliers. This can be explained by Theorem 4.1 in [24], which states that the underlying tensor \({\mathcal {L}}^*\) can be exactly recovered by TRPCA under some mild conditions as long as the support of the outlier tensor \({\mathcal {S}}^*\) is uniformly distributed. For the outlier tensor \({\mathcal {S}}^*\), Theorem 4.1 in [24] only makes an assumption on the random location of the support, but no assumption about the magnitudes or signs of the nonzero entries. Since the proposed TriFac can well mimic TRPCA for low-tubal-rank tensors, it is not surprising that TriFac is robust to different types of outliers.

Table 3 Comparison with TRPCA [24] in both accuracy and speed with the outliers from \({\mathcal {N}}(0,1)\), observation tensor \( {\mathcal {M}}\in \mathbb {R}^{d\times d\times d_3}\), \(d_3=30\), \(r_{\text{ t }}({{\mathcal {L}}^*})=0.15d\), \( \Vert {{\mathcal {S}}^*} \Vert _{0}=0.1d^2d_3\), \(r=\lfloor 1.5r_{\text{ t }}({{\mathcal {L}}^*})\rfloor \)
Table 4 Comparison with TRPCA [24] in both accuracy and speed with the outliers from \({\mathcal {U}}[0,1]\), observation tensor \({\mathcal {M}}\in \mathbb {R}^{d\times d\times d_3}\), \(d_3=30\), \(r_{\text{ t }}({{\mathcal {L}}^*})=0.15d\), \( \Vert {{\mathcal {S}}^*} \Vert _{0}=0.1d^2d_3\), \(r=\lfloor 1.5r_{\text{ t }}({{\mathcal {L}}^*})\rfloor \)

To further show the efficiency of the proposed TriFac, we consider a special case where the size of the underlying tensor \({\mathcal {L}}^*\) increases while the tubal rank is fixed as a constant. Specifically, we fix \(r_{\text{ t }}({{\mathcal {L}}^*})=5\), and vary \(d\in \{100,150,200,\ldots ,500\}\) with \(d_3=20\). We set the initialized rank r of TriFac in Algorithm 1 to \(r=30\). We test each setting 10 times and compute the averaged time. In all the runs, both TRPCA and TriFac recover the underlying tensor with RSE smaller than \(1{\text {e}}{-6}\). The plot of averaged time versus the tensor size (indexed by d) is given in Fig. 3. We can see that the time cost of TRPCA scales super-linearly with respect to d, whereas the proposed TriFac has approximately linear scaling.

Fig. 3
figure 3

Computation time of TRPCA [24] and the proposed TriFac versus \(d\in \{100,150,200,\ldots ,500\}\) with \(d_3=20\), when the tubal rank of the underlying tensor is 5. The RSEs of TRPCA and TriFac in all the settings are smaller than \(1{\text {e}}{-6}\)

6.1.2 Effects of the initialized tubal rank r

The performance of TriFac relies heavily on the choice of the initialized tubal rank r in Model (9). Here, we explore the effects of the initialized tubal rank on the accuracy and speed of TriFac. Specifically, we consider tensors of size \(100\times 100\times 30\) with four different settings of tubal rank \(r^*=r_{\text{ t }}({{\mathcal {L}}^*})\) and sparsity \(s^*= \Vert {{\mathcal {S}}^*} \Vert _{0}\), namely \((r^*,s^*)\in \{(10,1.5{\text {e}}{4}),(10,3{\text {e}}{4}),(15,1.5{\text {e}}{4}),(15,3{\text {e}}{4})\}\), where the outlier elements follow the i.i.d. \({\mathcal {N}}(0,1)\) distribution. By varying the initialized rank \(r\in \{5,10,15,\ldots ,50\}\), we show its effects on the accuracy and speed of TriFac.

Fig. 4
figure 4

Effects of initialized tubal rank r in Algorithm 1 on the recovery performance of the underlying tensor \({\mathcal {L}}^*\in \mathbb {R}^{100\times 100\times 30}\). a RSE of \(\hat{{\mathcal {L}}}\) versus r in log scale; b tubal rank of \(\hat{{\mathcal {L}}}\) versus r

We first report the effects of the initialized tubal rank r on the recovery accuracy of the underlying tensor \({\mathcal {L}}^*\), in terms of the RSE and tubal rank of the final solution \(\hat{{\mathcal {L}}}\). The results are shown in Fig. 4. As can be seen, there exists a phase transition point \(r_{\mathrm{{pt}}}\) such that once the initialized rank r exceeds it, the RSE of \(\hat{{\mathcal {L}}}\) decreases rapidly.

Fig. 5
figure 5

Effects of initialized tubal rank r in Algorithm 1 on the estimation performance of the outlier tensor \({\mathcal {S}}^*\in \mathbb {R}^{100\times 100\times 30}\). a RSE of \(\hat{{\mathcal {S}}}\) in log scale versus r; b \(l_0\)-norm of \(\hat{{\mathcal {S}}}\) versus r

Then, the effects of the initialized tubal rank r on the estimation performance of the outlier tensor \({\mathcal {S}}^*\), in terms of the RSE and \(l_0\)-norm of the final solution \(\hat{{\mathcal {S}}}\), are shown in Fig. 5. We can also see that once the initialized rank r exceeds the same phase transition point \(r_{\mathrm{{pt}}}\), the RSE of \(\hat{{\mathcal {S}}}\) soon vanishes.

Fig. 6
figure 6

Effects of initialized tubal rank r on the running time of TriFac for problem size \(100\times 100\times 30\)

Finally, we show the effects of initialized tubal rank r on the running time of TriFac in Fig. 6. We can see that the running time increases as the initialized rank r gets larger, the underlying tensor gets more complex (i.e., \(r^*\) gets higher), and the corruption gets heavier (i.e., \(s^*\) gets larger). That is consistent with our intuition.

6.2 Real data experiments

In this subsection, the efficiency of the proposed TriFac is evaluated against TRPCA [24] on real-world datasets. Specifically, we carry out tensor restoration experiments on point cloud data, color images, and brain MRI data. For an estimate \(\hat{{\mathcal {L}}}\) of the underlying tensor \({\mathcal {L}}^*\), the peak signal-to-noise ratio (PSNR) is used to evaluate the quality of \(\hat{{\mathcal {L}}}\), that is,

$$\begin{aligned} \text {PSNR}:=10\log _{10}\bigg (\frac{d_1d_2d_3 \Vert {{\mathcal {L}}^*} \Vert _{\infty }^2}{ \Vert {\hat{{\mathcal {L}}}-{\mathcal {L}}^*} \Vert _{\text {F}}^2}\bigg ). \end{aligned}$$
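A one-function NumPy sketch of the PSNR criterion above is given below for concreteness (illustrative code only; the peak value is taken as \( \Vert {{\mathcal {L}}^*} \Vert _{\infty }\), exactly as in the formula).

```python
import numpy as np

def psnr(L_hat, L_star):
    # PSNR = 10 * log10( d1*d2*d3 * ||L*||_inf^2 / ||L_hat - L*||_F^2 )
    num = L_star.size * np.abs(L_star).max() ** 2
    den = np.linalg.norm(L_hat - L_star) ** 2
    return 10.0 * np.log10(num / den)
```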

6.2.1 Robust recovery of point cloud data

We conduct experiments on a point cloud dataset acquired by a vehicle-mounted Velodyne HDL-64E LiDAR, which is used for moving object tracking [25]. We extract the first 32 frames, then transform and upsample the data to form two tensors in \(\mathbb {R}^{512\times 800 \times 32}\) representing the distance data and the intensity data, respectively. Given a data tensor, we uniformly choose its indices with probability \(\rho _s\in \{0.2,0.4\}\). We then corrupt the chosen positions with element-wise outliers drawn i.i.d. from the symmetric Bernoulli distribution \(\text {Bin}(-1,+1)\) or \({\mathcal {N}}(0,1)\). The proposed algorithm is also compared with SNN [10] and RPCA [2]. RPCA works on each frontal slice individually. The parameter of RPCA is set to \(\lambda =1/\sqrt{\max \{d_1,d_2\}}\) [2]. The weight parameters \(\varvec{\lambda }\) of SNN are chosen by \(\lambda _k=\sqrt{\max \{d_k,\prod _{k'\ne k}d_{k'}\}}/3\). We set the parameter \(\lambda =1/\sqrt{d_3\max \{d_1,d_2\}}\) [23] for TRPCA and the proposed TriFac. We choose the initialized tubal rank \(r=196\).

We report the PSNR value and running time of each algorithm on the distance and intensity data in Fig. 7. It can be seen that the TNN-based TRPCA has the highest PSNR values in all the settings, which is consistent with the tensor completion results on this data, where TNN outperforms SNN [33]. The proposed TriFac algorithm performs slightly worse than TRPCA, but it has the lowest running time. As shown in Theorem 2, the proposed model in Eq. (9) cannot outperform the TNN-based TRPCA model in Eq. (7) in accuracy, since they share the same global optimal solutions while the proposed model is non-convex. This explains why TriFac cannot achieve better performance than TRPCA.

Fig. 7
figure 7

Quantitative comparison of algorithms in PSNR and running time on point cloud data. a PSNR values of algorithms on the distance data; b running time of algorithms on the distance data; c PSNR values of algorithms on the intensity data; d running time of algorithms on the intensity data. (‘0.2,B’ means 20% of the positions are corrupted by Ber(\(-1,+1\)) outliers, and ‘0.4,G’ means 40% of the positions are corrupted by \({\mathcal {N}}(0,1)\) outliers)

6.2.2 Robust recovery of color images

Fig. 8
figure 8

Tested color images

We conduct image restoration experiments on color images, which have a natural 3-way tensor format. The image restoration task aims at recovering an image from its corrupted observation. In this experiment, we test on 20 images of size \(512\times 512\times 3\) shown in Fig. 8. Given a color image \({\mathcal {M}}\in \mathbb {R}^{m \times m \times 3}\), we first choose \(10\%\) of its entries uniformly at random, and then corrupt them by i.i.d. outliers from \({\mathcal {U}}[0,1]\).

We also compare the proposed TriFac with SNN [10] and RPCA [2]. The weight parameters \(\varvec{\lambda }\) of SNN are set by two different choices, \([\lambda _1,\lambda _2,\lambda _3]=[15,15,1.5]\) as suggested by [24] or \([\lambda _1,\lambda _2,\lambda _3]=[\sqrt{3m},\sqrt{3m},m]/3\), and we report the result with the higher PSNR. The parameter \(\lambda \) of RPCA is set to \(\lambda =1/\sqrt{m}\) [2]. We set the parameter \(\lambda =1/\sqrt{3m}\) [23] for TRPCA and the proposed TriFac. We choose the initialized tubal rank \(r=226\). Given a color image and a corruption level, we test 10 times and report the averaged PSNR and time.

Fig. 9
figure 9

Qualitative comparison of algorithms for robust image recovery with \(10\%\) entries corrupted by i.i.d. \({\mathcal {U}}[0,1]\) outliers. a The corrupted image; b–e the images recovered by RPCA [2], SNN [10], TRPCA [24], and the proposed TriFac, together with the PSNR value of each algorithm. Best viewed in color pdf with 400% zooming-in

Examples of the recovered images are shown in Fig. 9 for qualitative comparison. For quantitative comparison, the PSNR values and running time are reported in Fig. 10. We can see that TRPCA also performs best in most cases, consistent with the results in [24]. The proposed TriFac algorithm has the shortest running time and only slightly lower PSNRs than TRPCA.

Fig. 10
figure 10

Quantitative comparison of algorithms in PSNR and time on color images with 10% entries corrupted by i.i.d. \({\mathcal {U}}[0,1]\) outliers. a PSNR values; b running time

6.2.3 Robust recovery of brain MRI data

To show the efficiency of the proposed TriFac, we also use the 3-way MRI dataset analyzed in [38], which has a good low-rank property. We extract the first 15 slices, each of size \(181 \times 217\). To further show the efficiency of TriFac, we resize the data with scale parameter \(\kappa \in \{1,1.5,2,2.5,3\}\) to form tensors in \(\mathbb {R}^{\lceil 181\kappa \rceil \times \lceil 217\kappa \rceil \times 15}\). Then, we randomly choose 20% of the elements in the rescaled tensor and corrupt them by elements from the i.i.d. symmetric Bernoulli distribution \(\text {Bin}(-1,+1)\). We compare TriFac with TRPCA in both running time and recovery performance for different sizes. The results are shown in Table 5. It can be seen that the proposed TriFac works almost as well as TRPCA but runs faster.

Table 5 Comparison of TriFac with TRPCA in both PSNR values and running time on rescaled MRI data in \(\mathbb {R}^{\lceil 181\kappa \rceil \times \lceil 217\kappa \rceil \times 15}\) with \(\kappa \in \{1,1.5,2,2.5,3\}\)

6.2.4 Robust completion of color images

To evaluate the factorization strategy in its extension to robust tensor completion, we also conduct robust image completion on the test images in Fig. 8. In this experiment, we aim to restore a color image when 60% of its entries are missing uniformly at random and 10% of the observed entries are randomly corrupted by \(\text {Bin}(-1,+1)\) outliers. The SSIM (Structural Similarity Index) [35] is also used to measure the quality of a restored image; a higher SSIM value indicates higher restoration quality.
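SSIM values of this kind can be computed with standard library routines. The small sketch below uses scikit-image; this is an assumption about tooling for illustration only, not the code used to produce the reported numbers.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim_color(img_hat, img_ref):
    """Mean SSIM over the three color channels of a restored image.
    Both inputs are float arrays of shape (H, W, 3) with values in [0, 1]."""
    return structural_similarity(img_ref, img_hat,
                                 channel_axis=2, data_range=1.0)
```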

The proposed model (RTC-Fac) is compared with robust image completion models based on the matrix nuclear norm (NN), SNN, and TNN, respectively. We also compare with a double \(l_{1-2}\) metric based image completion model (DL1-2) inspired by [42]. The PSNR values, SSIM values, and running time are shown in Fig. 11. According to the results in Fig. 11, the proposed model has the lowest running time and achieves the second-highest PSNR and SSIM values in most cases.

Fig. 11
figure 11

Quantitative comparison of models for robust image completion in PSNR, SSIM and running time, where \(60\%\) of the entries are missing uniformly at random and 10% of the observed entries are corrupted by i.i.d. \(\text {Bin}(-1,+1)\) outliers. a PSNR values; b SSIM values; c running time

7 Conclusion

In this paper, a factorization-based TRPCA model (TriFac) is first proposed to recover a 3-way data tensor from observations corrupted by sparse outliers. Then, a non-convex ALM algorithm (Algorithm 1) is developed to solve it efficiently. Further, the convergence and sub-optimality of the proposed algorithm are analyzed in Theorems 1 and 3, respectively. The effectiveness and efficiency of the proposed algorithm are demonstrated by experiments on both synthetic and real datasets.

As future work, it would be interesting to study the statistical performance of the proposed model as in [19, 31, 34], to consider sparse tensor discriminant analysis as in [15, 21, 22, 26], and to employ non-convex tensor rank minimization [39, 40].