1 Introduction

Principal component analysis (PCA) [1] is widely used in data analysis and dimension reduction, especially in the contemporary era of big data. Standard PCA extracts principal components (PCs), which are orthogonal to each other, from the sample data matrix by computing the leading eigenvectors of the sample covariance matrix. The first few PCs capture most of the sample information and guarantee the least information loss in the sense of sample variance. However, each PC is a linear combination of all the original sample variables, which are often numerous in modern applications, and this makes it difficult to interpret which variables drive the sample information. To address this problem, several sparse alternatives have been proposed over the past dozen years: modified PCs that are linear combinations of only a small subset of the sample variables, yet still explain most of the information in the original variables. This is sparse principal component analysis (SPCA). SPCA is widely used in machine learning, modern data analysis, finance, statistics, process fault detection [2] and many other fields [3, 4].

One main approach to SPCA, first proposed by Jolliffe et al. [5], is \(l_{1}\)-regularized PCA, which improves the sparsity of the PCA loadings by imposing an \(l_{1}\)-regularization penalty on them. The interpretability gained by this early work led to rapidly growing research on sparsity-promoting methods. D’Aspremont et al. [6] translated the SPCA problem into a semi-definite programming-based relaxation, and Shen et al. [7] proposed a method that obtains SPCA via regularized singular value decomposition. Subsequently, Journée et al. [8] developed two single-unit and two block optimization formulations of the SPCA problem, and D’Aspremont et al. [3] formulated SPCA as a new semi-definite relaxation and derived a greedy algorithm that computes a full set of good solutions for all target numbers of nonzero coefficients. Other related works include the problem of finding the dominant eigenvector of the sample covariance matrix under additional constraints on the vector, proposed by Sigg et al. [9], and the generation of modified PCA with sparse loadings by means of the lasso and a new regularization and variable selection method, the elastic net, proposed by Zou et al. [10, 11].

One fundamental method to obtain SPCA is to reformulate PCA as a regression-type optimization problem and then impose an \(l_{1}\) constraint on the regression coefficients [11]. This problem can be transformed into a lasso-type optimization problem of the form

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{N}}\Vert \mathbf {y}-\mathbf {A}\mathbf {z}\Vert ^{2}_{2}+\lambda \Vert \mathbf {z}\Vert _{1}, \end{aligned}$$
(1.1)

where \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is a matrix and \(\lambda \) is a nonnegative parameter [10]. According to [12], the solution of problem (1.1) is closely related to the solution of problem (1.2)

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{N}}\Vert \mathbf {z}\Vert _{1} \quad \text {subject to} \quad \Vert \mathbf {A}\mathbf {z}-\mathbf {y}\Vert _{2}\le \eta . \end{aligned}$$
(1.2)

In practice, however, it is impossible to measure a target vector \(\mathbf {z}\in \mathbb {R}^{N}\) exactly; a practicable approach is to search for a \(\mathbf {z}\) consistent with the measured vector \(\mathbf {y}\). Since measurements are often perturbed by noise, the measurement vector \(\mathbf {y}\in \mathbb {R}^{m}\) is only an approximation of the vector \(\mathbf {A}\mathbf {z}\in \mathbb {R}^{m}\), that is, \(\Vert \mathbf {A}\mathbf {z}-\mathbf {y}\Vert \le \eta \) for some \(\eta \ge 0\) and some norm \(\Vert \cdot \Vert \) on \(\mathbb {R}^{m}\), usually \(\Vert \cdot \Vert _{2}\).
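
For illustration, the following minimal sketch solves the lasso-type problem (1.1) by iterative soft-thresholding (ISTA); the solver choice, problem sizes and noise level are our own illustrative assumptions, not part of the cited works.

```python
import numpy as np

def soft_threshold(x, t):
    """Entrywise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(A, y, lam, n_iter=500):
    """Minimize ||y - A z||_2^2 + lam * ||z||_1 by proximal gradient (ISTA)."""
    # Step size 1/L, where L = 2 * sigma_max(A)^2 bounds the gradient's Lipschitz constant.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ z - y)      # gradient of ||y - A z||_2^2
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Illustrative random instance: Gaussian A, sparse ground truth, small noise as in (1.2).
rng = np.random.default_rng(0)
m, N, s = 40, 100, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)
z_true = np.zeros(N)
z_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ z_true + 0.01 * rng.standard_normal(m)
z_hat = ista_lasso(A, y, lam=0.01)
print("recovery error:", np.linalg.norm(z_hat - z_true))
```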

In particular, when \(\eta =0\), problem (1.2) takes the form

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{N}}\Vert \mathbf {z}\Vert _{1} \quad \text {subject to} \quad \mathbf {A}\mathbf {z}=\mathbf {y}. \end{aligned}$$
(1.3)

Problem (1.3) corresponds to recovering a sparse data vector \(\mathbf {z}\in \mathbb {R}^{N}\) from its linear measurement vector \(\mathbf {y}=\mathbf {A}\mathbf {z}\in \mathbb {R}^{m}\), where \(\mathbf {A}\in \mathbb {R}^{m\times N} \) is a known matrix. Problem (1.3), known as basis pursuit (BP) [13], is usually regarded as the convex relaxation of the original problem

$$\begin{aligned} \min _{\mathbf {z}\in \mathbb {R}^{N}}\Vert \mathbf {z}\Vert _{0} \quad \text {subject to} \quad \mathbf {A}\mathbf {z}=\mathbf {y}, \end{aligned}$$
(1.4)

which reconstructs the sparse data vector \(\mathbf {z}\) by solving the \(l_{0}\)-minimization problem, where \(\Vert \mathbf {z}\Vert _{0}\) denotes the number of nonzero entries of the vector \(\mathbf {z}\). Problem (1.4) is non-convex and in general NP-hard, whereas problem (1.3) is a tractable convex relaxation of it. This relaxation is meaningful because, under certain conditions on the matrix \(\mathbf {A}\), model (1.3) exactly or approximately reconstructs the original sparse data vector \(\mathbf {z}\) of problem (1.4). One such condition involves the concept of the null space property: given a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\), every vector \(\mathbf {x}\in \mathbb {R}^{N}\) supported on an index set \(S\subset [N]\) is the unique solution of (1.3) with \(\mathbf {y}=\mathbf {A}\mathbf {x}\) if and only if \(\mathbf {A}\) satisfies the null space property relative to S [12].
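
Problem (1.3) can be solved as a linear program via the standard splitting \(\mathbf {z}=\mathbf {p}-\mathbf {q}\) with \(\mathbf {p},\mathbf {q}\ge \mathbf {0}\). A minimal sketch follows (assuming SciPy is available; the random instance at the end is purely illustrative).

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to A z = y as a linear program.

    Split z = p - q with p, q >= 0; at optimality ||z||_1 = sum(p + q).
    """
    m, N = A.shape
    c = np.ones(2 * N)             # objective: sum of all entries of p and q
    A_eq = np.hstack([A, -A])      # equality constraint A p - A q = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    p, q = res.x[:N], res.x[N:]
    return p - q

# Illustrative check of exact recovery on a random Gaussian instance.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x = np.zeros(100)
x[:5] = [1.0, -2.0, 0.5, 3.0, -1.0]
print("exact recovery:", np.allclose(basis_pursuit(A, A @ x), x, atol=1e-6))
```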

Due to the key role of the \(l_{1}\)-regularization in (1.1) as a sparse reconstruction technique in the compressive sensing literature, this paper extends the results on the (stable, robust) null space property, which can be used to recover data vectors exactly or approximately, to the exact or approximate reconstruction of sparse information matrices, and gives the corresponding results on the (stable, robust) general null space property.

This work develops theoretical conditions for the exact or approximate reconstruction of sparse principal components based on the definition of the (stable, robust) general null space property, and thereby gives a theoretical analysis approach for sparse principal component analysis (SPCA).

The rest of this paper is organized as follows. The definitions of the (stable, robust) general null space property are given, and the results with respect to these concepts are proved, in Sects. 2, 3 and 4, respectively. Section 5 presents the conclusion.

Notation In this paper, card(S) denotes the cardinality of a set S, and the symbol \(\Vert \mathbf {X}\Vert _{[0]}\), where \(\mathbf {X}\) is a matrix, denotes the number of nonzero entries of \(\mathbf {X}\). The sum of the absolute values of the entries of \(\mathbf {X}\) is denoted by \(\Vert \mathbf {X}\Vert _{[1]}\), and \(\sigma _{s}(\mathbf {X})_{[1]}=\inf _{\Vert \mathbf {Z}\Vert _{[0]}\le s}\Vert \mathbf {X}-\mathbf {Z}\Vert _{[1]}\) denotes the error of the best s-term approximation to \(\mathbf {X}\). For an \(m\times n\) matrix \(\mathbf {M}\), the set \(\Omega =\{(1,1), (1,2), \ldots , (1,n), \ldots , (m,n)\}\) denotes all the subscripts of the entries of \(\mathbf {M}\), and for an index set \(S\subset \Omega \), \(\mathbf {M}_{S}\) denotes the \(m\times n\) matrix that coincides with \(\mathbf {M}\) on the indices in S and is zero outside S. Moreover, \(\overline{S}\) denotes the complement of a set S with respect to \(\Omega \), that is, \(\overline{S}=\Omega \setminus S\). \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm of a matrix, and \(\mathbf {O}\) denotes a null matrix.
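
For concreteness, the quantities above can be computed as follows (a minimal NumPy sketch; the helper names are our own).

```python
import numpy as np

def norm_1_bracket(X):
    """||X||_[1]: the sum of the absolute values of all entries of X."""
    return np.abs(X).sum()

def norm_0_bracket(X):
    """||X||_[0]: the number of nonzero entries of X."""
    return np.count_nonzero(X)

def best_s_term_error(X, s):
    """sigma_s(X)_[1]: the [1]-norm of all but the s largest entries in absolute value.

    The infimum over ||Z||_[0] <= s is attained by keeping the s largest entries.
    """
    flat = np.sort(np.abs(X).ravel())[::-1]
    return flat[s:].sum()

def restrict(M, S):
    """M_S: keep the entries of M indexed by the set S of (i, j) pairs, zero elsewhere."""
    out = np.zeros_like(M)
    for (i, j) in S:
        out[i, j] = M[i, j]
    return out
```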

2 General Null Space Property

In this paper, we focus on the problem

$$\begin{aligned} \min _{\mathbf {Z}\in \mathbb {R}^{N\times l}}\Vert \mathbf {Z}\Vert _{[1]} \quad \text {subject to} \quad \mathbf {A}\mathbf {Z}=\mathbf {Y}, \end{aligned}$$
(\(\mathbf {P}_{1}\))

where \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is a known matrix and \(\mathbf {Y}\in \mathbb {R}^{m\times l}\) is the measurement matrix.
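
Since \(\Vert \mathbf {Z}\Vert _{[1]}\) sums the absolute values of all entries and the constraint \(\mathbf {A}\mathbf {Z}=\mathbf {Y}\) acts column by column, (\(\mathbf {P}_{1}\)) decouples into l vector basis pursuit problems, one per column of \(\mathbf {Y}\). A minimal sketch, reusing the basis_pursuit function from the sketch in Sect. 1:

```python
import numpy as np

def solve_P1(A, Y):
    """Solve (P1): min ||Z||_[1] subject to A Z = Y.

    ||Z||_[1] is the sum of the l1 norms of the columns of Z, and A Z = Y
    holds column by column, so (P1) splits into independent basis pursuit
    problems, one per column of Y.
    """
    return np.column_stack([basis_pursuit(A, Y[:, j]) for j in range(Y.shape[1])])
```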

First, we give three definitions with respect to the general null space property, which are needed in the following presentation.

Definition 1

Given a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\), the sum of the absolute values of the entries of \(\mathbf {A}\) is denoted by \(\Vert \mathbf {A}\Vert _{[1]}\), and \(\Vert \mathbf {A}\Vert _{[0]}\) denotes the number of nonzero entries of \(\mathbf {A}\).

Definition 2

Given a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\), the set \(\{\mathbf {V}\in \mathbb {R}^{N\times l}\mid \mathbf {A}\mathbf {V}=\mathbf {O}\}\) is called the general null space of \(\mathbf {A}\) of order \({N\times l}\), denoted by \(Gker(\mathbf {A})\).

By Definition 2, a matrix \(\mathbf {V}\) belongs to \(Gker(\mathbf {A})\) exactly when every column of \(\mathbf {V}\) lies in the ordinary null space \(\ker (\mathbf {A})\).
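
Numerically, an element of \(Gker(\mathbf {A})\) can therefore be generated from any basis of \(\ker (\mathbf {A})\); a minimal sketch, assuming SciPy:

```python
import numpy as np
from scipy.linalg import null_space

def gker_sample(A, l, rng):
    """Draw a random element of Gker(A): each column must lie in ker(A)."""
    K = null_space(A)   # orthonormal basis of ker(A), shape (N, dim ker(A))
    return K @ rng.standard_normal((K.shape[1], l))
```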

Definition 3

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is said to satisfy the general null space property relative to an index set \(S\subset \Omega \) if

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}<\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]} \quad \text {for all} \quad \mathbf {V}\in Gker(\mathbf {A}) \backslash \left\{ \mathbf {O}\right\} . \end{aligned}$$
(2.1)

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is said to satisfy the general null space property of order s if it satisfies the general null space property relative to any index set \({S}\subset \Omega \) with card(S)\(\le s\).

Remark 1

In Definition 3, \(\Omega =\{(1,1), (1,2), \ldots , (1,l), \ldots , (N,l)\}\) and \(\mathbf {O}\) is the null matrix in \(\mathbb {R}^{N\times l}\).

Remark 2

According to Definition 3, for a given \(\mathbf {V}\in Gker(\mathbf {A}) \backslash \left\{ \mathbf {O}\right\} \), the condition \(\Vert \mathbf {V}_{S}\Vert _{[1]}<\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\) holds for every set \(S\subset \Omega \) with card(\(S)\le s\) as soon as it holds for an index set of the s largest entries of \(\mathbf {V}\) in absolute value.

Remark 3

Definition 3 admits other reformulations. One is obtained by adding \(\Vert \mathbf {V}_{S}\Vert _{[1]}\) to both sides of the inequality \(\Vert \mathbf {V}_{S}\Vert _{[1]}<\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\); the general null space property relative to an index set \(S\subset \Omega \) then reads

$$\begin{aligned} 2\Vert \mathbf {V}_{S}\Vert _{[1]}<\Vert \mathbf {V}\Vert _{[1]} \quad \text {for all} \quad \mathbf {V}\in Gker(\mathbf {A}) \backslash \left\{ \mathbf {O}\right\} . \end{aligned}$$

The second is obtained by choosing the index set \(S\subset \Omega \) as a set of the s largest entries of \(\mathbf {V}\) in absolute value and adding \(\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\) to both sides of the inequality \(\Vert \mathbf {V}_{S}\Vert _{[1]}<\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\). Then, the general null space property of order s reads

$$\begin{aligned} \Vert \mathbf {V}\Vert _{[1]}<2\sigma _{s}(\mathbf {V})_{[1]} \quad \text {for all} \quad \mathbf {V}\in Gker(\mathbf {A}) \backslash \left\{ \mathbf {O}\right\} , \end{aligned}$$

where

$$\begin{aligned} \sigma _{s}(\mathbf {X})_{[1]}=\inf _{\Vert \mathbf {Z}\Vert _{[0]}\le s}\Vert \mathbf {X}-\mathbf {Z}\Vert _{[1]}. \end{aligned}$$
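
Remark 2 suggests a randomized necessary test of the general null space property: sample \(\mathbf {V}\) from \(Gker(\mathbf {A})\) and check the inequality on the s largest entries. The sketch below is our own heuristic; it can certify a violation of the property, but passing every trial does not prove that the property holds.

```python
import numpy as np
from scipy.linalg import null_space

def gnsp_violation_found(A, s, l, n_trials=1000, seed=0):
    """Randomized *necessary* test of the general null space property of order s.

    By Remark 2, it suffices to test an index set S of the s largest entries
    of V in absolute value.
    """
    rng = np.random.default_rng(seed)
    K = null_space(A)                      # orthonormal basis of ker(A)
    for _ in range(n_trials):
        V = K @ rng.standard_normal((K.shape[1], l))   # random element of Gker(A)
        a = np.sort(np.abs(V).ravel())[::-1]
        if 2.0 * a[:s].sum() >= np.abs(V).sum():       # violates 2||V_S||_[1] < ||V||_[1]
            return True
    return False
```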

Based on the definitions above, we have the following Theorem 1, which establishes the link between the general null space property and the recovery of sparse matrices via (\(\mathbf {P}_{1}\)).

Theorem 1

Given a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\), every matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) supported on an index set S is the unique solution of (\(\mathbf {P}_{1}\)) with \(\mathbf {Y}=\mathbf {A}\mathbf {X}\) if and only if \(\mathbf {A}\) satisfies the general null space property relative to S.

Proof

Given a fixed index set S, let us first assume that every matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) supported on S is the unique minimizer of \(\Vert \mathbf {Z}\Vert _{[1]}\) subject to \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\). Thus, for any \(\mathbf {V}\in Gker(\mathbf {A})\backslash \left\{ \mathbf {O} \right\} \), the matrix \(\mathbf {V}_{S}\) is the unique minimizer of \(\Vert \mathbf {Z}\Vert _{[1]}\) subject to \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {V}_{S}\). At the same time, we have \(\mathbf {A}(-\mathbf {V}_{\overline{S}})=\mathbf {A}\mathbf {V}_{S}\) and \(-\mathbf {V}_{\overline{S}}\ne \mathbf {V}_{S}\), because \(\mathbf {A}(\mathbf {V}_{\overline{S}}+\mathbf {V}_{S})=\mathbf {A}\mathbf {V}=\mathbf {O}\) and \(\mathbf {V}\ne \mathbf {O}\). Hence \(-\mathbf {V}_{\overline{S}}\) is feasible but distinct from the unique minimizer \(\mathbf {V}_{S}\), and we conclude that \(\Vert \mathbf {V}_{S}\Vert _{[1]}< \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\). This establishes the general null space property relative to S.

On the other hand, suppose the general null space property relative to the index set S holds. Given a matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) supported on S and a matrix \(\mathbf {Z}\in \mathbb {R}^{N\times l}\), \(\mathbf {Z}\ne \mathbf {X}\), satisfying \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\), consider the matrix \(\mathbf {V}:=\mathbf {X}-\mathbf {Z}\in Gker(\mathbf {A})\backslash \left\{ \mathbf {O}\right\} \). According to the general null space property, we get

$$\begin{aligned} \Vert \mathbf {X}\Vert _{[1]}\le \,&\Vert \mathbf {X}-\mathbf {Z}_{S}\Vert _{[1]}+\Vert \mathbf {Z}_{S}\Vert _{[1]} \\ =\,&\Vert \mathbf {V}_{S}\Vert _{[1]}+\Vert \mathbf {Z}_{S}\Vert _{[1]} \\ <\,&\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {Z}_{S}\Vert _{[1]} \\ =\,&\Vert -\mathbf {Z}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {Z}_{S}\Vert _{[1]} \\ =\,&\Vert \mathbf {Z}\Vert _{[1]}. \end{aligned}$$

The required minimality of \(\Vert \mathbf {X}\Vert _{[1]}\) is obtained. \({\Box }\)

We immediately get the following Theorem 2 as a consequence of Theorem 1 by varying the index set S.

Theorem 2

Given a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\), every s-sparse matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) is the unique solution of (\(\mathbf {P}_{1}\)) with \(\mathbf {Y}=\mathbf {A}\mathbf {X}\) if and only if \(\mathbf {A}\) satisfies the general null space property of order s.
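
Theorem 2 can be checked empirically. For a random Gaussian matrix \(\mathbf {A}\), assumed here (for illustration only) to satisfy the property for small s with high probability, and a random s-sparse \(\mathbf {X}\), the minimizer of (\(\mathbf {P}_{1}\)) should coincide with \(\mathbf {X}\). A sketch reusing solve_P1 above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, N, l, s = 50, 100, 4, 6
A = rng.standard_normal((m, N)) / np.sqrt(m)

# An s-sparse matrix X: s nonzero entries placed uniformly at random.
X = np.zeros((N, l))
idx = rng.choice(N * l, size=s, replace=False)
X.ravel()[idx] = rng.standard_normal(s)

X_sharp = solve_P1(A, A @ X)     # recover from Y = A X
print("exact recovery:", np.allclose(X_sharp, X, atol=1e-6))
```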

3 Stable General Null Space Property

The matrices we intend to recover via (\(\mathbf {P}_{1}\)) are sparse only in idealized situations. In more realistic circumstances, we can only claim that they are close to sparse matrices. In such cases, we would like to recover a matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) with an error controlled by its distance to the sparse matrices. This property is often referred to as the stability of the reconstruction scheme with respect to the sparsity defect. First, we give the following definition.

Definition 4

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is said to satisfy the stable general null space property with constant \(0<\rho <1\) relative to an index set \(S\subset \Omega \) if

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]} \quad \text {for all} \quad \mathbf {V}\in Gker(\mathbf {A}). \end{aligned}$$

It is said to satisfy the stable general null space property of order s with constant \(0<\rho <1\) if it satisfies the stable general null space property with constant \(0<\rho <1\) relative to any index set \(S\subset \Omega \) with \(card(S)\le s\), where \(\Omega \) is as in Definition 3.

The stability result with respect to the stable general null space property is the following theorem.

Theorem 3

Suppose that a matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) satisfies the stable general null space property of order s with constant \(0<\rho <1\). Then, for any \(\mathbf {X}\in \mathbb {R}^{N\times l}\), a solution \(\mathbf {X}^{\sharp }\) of \((\mathbf {P}_{1})\) with \(\mathbf {Y}=\mathbf {A}\mathbf {X}\) approximates the matrix \(\mathbf {X}\) with \(\Vert \cdot \Vert _{[1]}\) error

$$\begin{aligned} \Vert \mathbf {X}-\mathbf {X}^{\sharp }\Vert _{[1]}\le 2\frac{1+\rho }{1-\rho }\sigma _{s}(\mathbf {X})_{[1]}. \end{aligned}$$
(3.1)

We prove a stronger theorem in the following. The result is a statement valid for any index set S in which the matrix \(\mathbf {X}^{\sharp }\in \mathbb {R}^{N\times l}\) is replaced by any matrix \(\mathbf {Z}\in \mathbb {R}^{N\times l}\) satisfying \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\). Apart from implying Theorem 3 above, the result shows that, under the stable general null space property relative to an index set S, the distance between a matrix \(\mathbf {X}\in \mathbb {R}^{N\times l}\) supported on S and a matrix \(\mathbf {Z}\in \mathbb {R}^{N\times l}\) satisfying \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\) is controlled by the difference between their \(\Vert \cdot \Vert _{[1]}\) norms.

Theorem 4

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) satisfies the stable general null space property with constant \(0<\rho <1\) relative to an index set S if and only if

$$\begin{aligned} \Vert \mathbf {Z}-\mathbf {X}\Vert _{[1]}\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}) \end{aligned}$$
(3.2)

for all matrices \(\mathbf {X},\mathbf {Z}\in \mathbb {R}^{N\times l}\) with \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\).

The error bound in (3.1) can be derived from Theorem 4 as follows. Take S to be an index set of the s largest entries of \(\mathbf {X}\) in absolute value, so that \(\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}=\sigma _{s}(\mathbf {X})_{[1]}\). If \(\mathbf {X}^{\sharp }\) is a minimizer of (\(\mathbf {P}_{1}\)), then \(\Vert \mathbf {X}^{\sharp }\Vert _{[1]}\le \Vert \mathbf {X}\Vert _{[1]}\) and \(\mathbf {A}\mathbf {X}^{\sharp }=\mathbf {A}\mathbf {X}\). The right-hand side of inequality (3.2) with \(\mathbf {Z}=\mathbf {X}^{\sharp }\) can therefore be estimated by the right-hand side of (3.1).
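
As an illustration of how the two sides of (3.1) are evaluated, the following sketch recovers a compressible (not exactly sparse) matrix and compares the \(\Vert \cdot \Vert _{[1]}\) error with the bound. The value \(\rho =0.5\) is a purely hypothetical stable general null space property constant, since \(\rho \) is not easily computable for a given matrix; the sketch reuses solve_P1, norm_1_bracket and best_s_term_error from the earlier sketches.

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, l, s = 50, 100, 4, 6
A = rng.standard_normal((m, N)) / np.sqrt(m)

# A compressible X: s large entries plus a small dense tail.
X = 0.01 * rng.standard_normal((N, l))
idx = rng.choice(N * l, size=s, replace=False)
X.ravel()[idx] += rng.standard_normal(s)

X_sharp = solve_P1(A, A @ X)
rho = 0.5   # hypothetical constant, for illustration only
lhs = norm_1_bracket(X - X_sharp)
rhs = 2 * (1 + rho) / (1 - rho) * best_s_term_error(X, s)
print(f"||X - X#||_[1] = {lhs:.4f}  vs  bound (3.1) = {rhs:.4f}")
```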

Before proving Theorem 4, we first establish the following lemma, which is needed in the proof.

Lemma 1

Given an index set \(S\subset \Omega \) and matrices \(\mathbf {X},\mathbf {Z}\in \mathbb {R}^{N\times l}\), it follows that

$$\begin{aligned} \Vert (\mathbf {X}-\mathbf {Z})_{\overline{S}}\Vert _{[1]}\le \Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+\Vert (\mathbf {X}-\mathbf {Z})_{S}\Vert _{[1]} +2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$

Proof

The result can be obtained from

$$\begin{aligned} \Vert \mathbf {X}\Vert _{[1]}&=\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {X}_{S}\Vert _{[1]} \\&\le \Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}+\Vert (\mathbf {X}-\mathbf {Z})_{S}\Vert _{[1]}+\Vert \mathbf {Z}_{S}\Vert _{[1]}, \\ \Vert (\mathbf {X}-\mathbf {Z})_{\overline{S}}\Vert _{[1]}&\le \Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {Z}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$

Summing these two inequalities, we get

$$\begin{aligned} \Vert \mathbf {X}\Vert _{[1]}+\Vert (\mathbf {X}-\mathbf {Z})_{\overline{S}}\Vert _{[1]}\le 2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]} +\Vert (\mathbf {X}-\mathbf {Z})_{S}\Vert _{[1]}+\Vert \mathbf {Z}\Vert _{[1]}. \end{aligned}$$

The proof is completed. \({\Box }\)

With Lemma 1 in hand, we can prove Theorem 4.

Proof of Theorem 4. First, we assume the matrix \(\mathbf {A}\) satisfies (3.2) for all matrices \(\mathbf {X}, \mathbf {Z}\in \mathbb {R}^{N\times l}\) with \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\). Given a matrix \(\mathbf {V}\in Gker(\mathbf {A})\), since \(\mathbf {A}\mathbf {V}_{\overline{S}}=\mathbf {A}(-\mathbf {V}_{S})\), we can apply (3.2) with \(\mathbf {X}=-\mathbf {V}_{S}\) and \(\mathbf {Z}=\mathbf {V}_{\overline{S}}\). Then, we have

$$\begin{aligned} \Vert \mathbf {V}\Vert _{[1]}\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}-\Vert \mathbf {V}_{S}\Vert _{[1]}). \end{aligned}$$

That is

$$\begin{aligned} (1-\rho )(\Vert \mathbf {V}_{S}\Vert _{[1]}+\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]})\le (1+\rho )(\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]} -\Vert \mathbf {V}_{S}\Vert _{[1]}). \end{aligned}$$

By rearranging the terms of the above inequality, we get

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}, \end{aligned}$$

and the stable general null space property with constant \(0<\rho <1\) relative to the index set S is obtained.

On the other hand, suppose that the matrix \(\mathbf {A}\) satisfies the stable general null space property with constant \(0<\rho <1\) relative to the index set S. For \(\mathbf {X},\mathbf {Z}\in \mathbb {R}^{N\times l}\) with \(\mathbf {A}\mathbf {Z}=\mathbf {A}\mathbf {X}\), let \(\mathbf {V}:=\mathbf {Z}-\mathbf {X}\); then \(\mathbf {V}\in Gker(\mathbf {A})\), and the stable general null space property gives

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$
(3.3)

At the same time, Lemma 1 gives

$$\begin{aligned} \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\le \Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+\Vert \mathbf {V}_{S}\Vert _{[1]}+2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$
(3.4)

Substituting (3.3) into (3.4), we have

$$\begin{aligned} \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\le \Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+\rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]} +2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$

Since \(\rho <1\), the above inequality can be rearranged into

$$\begin{aligned} \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\le \frac{1}{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}). \end{aligned}$$

By reusing (3.3), we get

$$\begin{aligned} \Vert \mathbf {V}\Vert _{[1]}&=\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {V}_{S}\Vert _{[1]} \\&\le (1+\rho )\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]} \\&\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}), \end{aligned}$$

which is the desired result. \({\Box }\)

4 Robust General Null Space Property

In practice, it is impossible to measure a data matrix \(\mathbf {Z}\in \mathbb {R}^{N\times l}\) precisely. This means that the measured matrix \(\mathbf {Y}\in \mathbb {R}^{m\times l}\) is only an approximation of the matrix \(\mathbf {A}\mathbf {Z}\in \mathbb {R}^{m\times l}\), with

$$\begin{aligned} \Vert \mathbf {A}\mathbf {Z}-\mathbf {Y}\Vert _{F}\le \eta \end{aligned}$$

for some \(\eta \ge 0\). In this case, the recovery strategy should produce a data matrix \(\mathbf {Z}^*\in \mathbb {R}^{N\times l}\) whose distance to the original data matrix \(\mathbf {Z}\in \mathbb {R}^{N\times l}\) is controlled by the measurement error \(\eta \ge 0\). This property is called the robustness of the recovery strategy with respect to the measurement error. In this section, we investigate the problem

$$\begin{aligned} \min _{\mathbf {Z}\in \mathbb {R}^{N\times l}}\Vert \mathbf {Z}\Vert _{[1]} \quad \text {subject to} \quad \Vert \mathbf {A}\mathbf {Z}-\mathbf {Y}\Vert _{F}\le \eta . \end{aligned}$$
(4.1)
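
Problem (4.1) is a convex program with a second-order cone constraint. A minimal sketch follows, assuming the cvxpy modeling package is available; any conic solver would serve.

```python
import cvxpy as cp
import numpy as np

def solve_robust(A, Y, eta):
    """Solve (4.1): min ||Z||_[1] subject to ||A Z - Y||_F <= eta."""
    Z = cp.Variable((A.shape[1], Y.shape[1]))
    objective = cp.Minimize(cp.sum(cp.abs(Z)))        # entrywise l1, i.e. ||Z||_[1]
    constraints = [cp.norm(A @ Z - Y, "fro") <= eta]  # Frobenius-norm data fidelity
    cp.Problem(objective, constraints).solve()
    return Z.value
```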

Before investigating problem (4.1), we first give a more demanding version of the general null space property, which guarantees the robustness of the reconstruction scheme.

Definition 5

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) is said to satisfy the robust general null space property with constants \(0<\rho <1\) and \(\tau >0\) relative to an index set S if

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F} \quad \text {for all} \quad \mathbf {V}\in \mathbb {R}^{N\times l}. \end{aligned}$$
(4.2)

It is said to satisfy the robust general null space property of order s with constants \(0<\rho <1\) and \(\tau >0\) if it satisfies the robust general null space property with constants \(\rho ,\tau \) relative to any index set S with \(card(S)\le s\).

Notice that the definition above does not require \(\mathbf {V}\) to be contained in \(Gker(\mathbf {A})\). In particular, when \(\mathbf {V}\in Gker(\mathbf {A})\), the term \(\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F}\) in (4.2) vanishes, and we recover the stable general null space property of Definition 4.

Based on Definition 5, we have the following Theorem 5.

Theorem 5

A matrix \(\mathbf {A}\in \mathbb {R}^{m\times N}\) satisfies the robust general null space property with constants \(0<\rho <1\) and \(\tau >0\) relative to an index set S if and only if

$$\begin{aligned} \Vert \mathbf {Z}-\mathbf {X}\Vert _{[1]}\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]} +2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]})+\frac{2\tau }{1-\rho }\Vert \mathbf {A(\mathbf {Z}-\mathbf {X})}\Vert _{F} \end{aligned}$$
(4.3)

for all matrices \(\mathbf {X}, \mathbf {Z}\in \mathbb {R}^{N\times l}\).

Proof

First, let the matrix \(\mathbf {A}\) satisfy (4.3) for all matrices \(\mathbf {X}, \mathbf {Z}\in \mathbb {R}^{N\times l}\). Then, for \(\mathbf {V}\in \mathbb {R}^{N\times l}\), taking \(\mathbf {X}=-\mathbf {V}_{S}\) and \(\mathbf {Z}=\mathbf {V}_{\overline{S}}\), we have

$$\begin{aligned} \Vert \mathbf {V}\Vert _{[1]}\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}-\Vert \mathbf {V}_{S}\Vert _{[1]}) +\frac{2\tau }{1-\rho }\Vert \mathbf {A}\mathbf {V}\Vert _{F}. \end{aligned}$$

After rearranging the terms, we have

$$\begin{aligned} (1-\rho )(\Vert \mathbf {V}_{S}\Vert _{[1]}+\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}) \le (1+\rho )(\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}-\Vert \mathbf {V}_{S}\Vert _{[1]})+2\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F}, \end{aligned}$$

that is

$$\begin{aligned} \Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F}. \end{aligned}$$

This is the robust general null space property with constants \(0<\rho <1\) and \(\tau >0\) relative to index set S.

Conversely, suppose that the matrix \(\mathbf {A}\) satisfies the robust general null space property with constants \(0<\rho <1\) and \(\tau >0\) relative to the index set S. For \(\mathbf {X}, \mathbf {Z}\in \mathbb {R}^{N\times l}\), let \(\mathbf {V}:=\mathbf {Z}-\mathbf {X}\). According to the robust general null space property and Lemma 1, we have

$$\begin{aligned}&\Vert \mathbf {V}_{S}\Vert _{[1]}\le \rho \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F}, \\&\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\le \Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+\Vert \mathbf {V}_{S}\Vert _{[1]} +2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}. \end{aligned}$$

From these two inequalities, we can get

$$\begin{aligned} \Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}\le \frac{1}{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]} +2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]}+\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F}). \end{aligned}$$

Applying the robust general null space property once again, we obtain

$$\begin{aligned} \Vert \mathbf {V}\Vert _{[1]}&=\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\Vert \mathbf {V}_{S}\Vert _{[1]} \\&\le (1+\rho )\Vert \mathbf {V}_{\overline{S}}\Vert _{[1]}+\tau \Vert \mathbf {A}\mathbf {V}\Vert _{F} \\&\le \frac{1+\rho }{1-\rho }(\Vert \mathbf {Z}\Vert _{[1]}-\Vert \mathbf {X}\Vert _{[1]}+2\Vert \mathbf {X}_{\overline{S}}\Vert _{[1]})+ \frac{2\tau }{1-\rho }\Vert \mathbf {A}\mathbf {V}\Vert _{F}, \end{aligned}$$

which is the desired result. \({\Box }\)
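
For an s-sparse \(\mathbf {X}\) supported on S and a minimizer \(\mathbf {Z}\) of (4.1) with \(\Vert \mathbf {Y}-\mathbf {A}\mathbf {X}\Vert _{F}\le \eta \), Theorem 5 yields \(\Vert \mathbf {Z}-\mathbf {X}\Vert _{[1]}\le 4\tau \eta /(1-\rho )\), since \(\Vert \mathbf {Z}\Vert _{[1]}\le \Vert \mathbf {X}\Vert _{[1]}\), \(\mathbf {X}_{\overline{S}}=\mathbf {O}\) and \(\Vert \mathbf {A}(\mathbf {Z}-\mathbf {X})\Vert _{F}\le \Vert \mathbf {A}\mathbf {Z}-\mathbf {Y}\Vert _{F}+\Vert \mathbf {Y}-\mathbf {A}\mathbf {X}\Vert _{F}\le 2\eta \). The linear dependence of the recovery error on \(\eta \) can be observed empirically; a sketch reusing solve_robust above, with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
m, N, l, s = 50, 100, 4, 6
A = rng.standard_normal((m, N)) / np.sqrt(m)
X = np.zeros((N, l))
X.ravel()[rng.choice(N * l, size=s, replace=False)] = rng.standard_normal(s)

for eta in [1e-3, 1e-2, 1e-1]:
    E = rng.standard_normal((m, l))
    Y = A @ X + eta * E / np.linalg.norm(E, "fro")   # measurement error of Frobenius size eta
    Z = solve_robust(A, Y, eta)
    print(f"eta = {eta:.0e}:  ||Z - X||_[1] = {np.abs(Z - X).sum():.4f}")
```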

5 Conclusions

In this paper, the null space property has been extended to the general null space property, and the corresponding results, including the stable and robust cases, have been proved. The extended concepts can be used to analyze sparse principal components theoretically. Several avenues for further research remain, such as how to combine this theoretical analysis with practice effectively.