
1 Introduction

High prediction accuracy and the discovery of relevant predictive variables are two fundamental problems in statistical learning. Variable selection is particularly important when the underlying model has a sparse representation, especially in high-dimensional and massive data analysis. It has been argued by [1] that a good estimator should have the oracle property, namely, the estimator

  • correctly selects covariates with nonzero coefficients with probability converging to one, as the sample size goes to infinity, and

  • has the same asymptotic distribution as if the zero coefficients were known in advance.

Consider the linear regression model \(\varvec{y} = \varvec{X\beta } + \varvec{\epsilon }\), where \(\mathbf X \in R^{n\times l_n}\) is a design matrix, \(\varvec{\beta }\in R^{l_n}\) is the vector of unknown coefficients, and \(\varvec{\epsilon }\in R^n\) is a vector of i.i.d. random variables with mean zero and finite variance \(\sigma ^2\). Note that \(l_n\), the length of \(\varvec{\beta }\), depends on the sample size n and may go to infinity as \(n\rightarrow \infty \). Without loss of generality, we assume that the response vector \(\varvec{y}\in R^n\) and the covariates are centered so that the intercept term can be excluded. In many situations we wish to recover from the observation \(\varvec{y}\) the sparsest \(\varvec{\beta }\), that is, the \(\varvec{\beta }\) with the fewest nonzero components. A direct approach is to formulate this problem as \(\min _{\varvec{\beta }\in R^{l_n}}||\varvec{\beta }||_0\text { such that }\varvec{y} = \varvec{X\beta } + \varvec{\epsilon }\), which can be transformed into \(\min _{\varvec{\beta }\in R^{l_n}}||\varvec{y}- \varvec{X\beta }||^2+\lambda ||\varvec{\beta }||_0\), the so-called \(L_0\) regularization problem, where \(||\varvec{\beta }||_0\) is the number of nonzero components of \(\varvec{\beta }\) and \(\lambda \) is the regularization parameter. Indeed, this method can recover sparse solutions even when \(l_n\gg n\); in fact, it can perfectly recover every sparse \(\varvec{\beta }\) obeying \(||\varvec{\beta }||_0\le n/2\). However, it is of little practical use, since solving an \(L_0\) regularization problem generally requires an intractable number of combinatorial searches. To overcome this difficulty, several approximations to the \(L_0\) problem have been proposed, such as \(L_1\) regularization [2,3,4,5], the adaptive Lasso [6], \(L_p\) \((0<p<1)\) regularization [7, 8] and adaptive \(L_p\) \((0<p<1)\) regularization [9].

Among the techniques above, \(L_1\) regularization (the Lasso) avoids the huge computational cost of the \(L_0\) problem on large instances, but it may produce inconsistent estimates [6] and extra bias [10]. The adaptive Lasso and \(L_p\) regularization resolve these problems, and their oracle properties have been established in both low- and high-dimensional scenarios [6,7,8, 11]. It has also been claimed that \(L_p\) \((0<p<1)\) regularization yields sparser solutions than both the Lasso and the adaptive Lasso [12, 13], although this sparsity can sometimes lead to unstable estimation; [9] therefore proposed the adaptive \(L_p\) (\(0<p<1\)) regularization, \(\min _{\varvec{\beta }\in R^{l_n}}||\varvec{y}-\varvec{X\beta }||^2 + \lambda \sum _{j=1}^{l_{n}}\omega _j|\beta _j|^p\), and proved its oracle property when the number of covariates is fixed.
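For concreteness, the following is a minimal NumPy sketch of the adaptive \(L_p\) objective. The function names are ours, and the weight rule \(\omega _j=|\tilde{\beta }_j|^{-\gamma }\) built from an initial estimate is one common choice of adaptive weights rather than necessarily the exact rule of [9].

```python
import numpy as np

def adaptive_lp_objective(beta, X, y, lam, weights, p=0.5):
    """Adaptive L_p objective: ||y - X beta||^2 + lam * sum_j w_j * |beta_j|^p."""
    residual = y - X @ beta
    return residual @ residual + lam * np.sum(weights * np.abs(beta) ** p)

def adaptive_weights(beta_init, gamma=1.0, eps=1e-6):
    """Adaptive weights from an initial estimate, w_j = |beta_init_j|^{-gamma};
    eps guards against division by zero for components estimated as exactly 0."""
    return (np.abs(beta_init) + eps) ** (-gamma)
```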

In this paper, we continue to investigate adaptive \(L_{p}\) (\(0<p<1\)) regularization when the number of covariates depends on the sample size and may go to infinity as the sample size does. We prove that, under a series of mild conditions, the adaptive \(L_p\) (\(0<p<1\)) estimator enjoys the oracle property in high-dimensional settings even when \(l_n\gg n\), and we propose an algorithm that solves the adaptive \(L_p\) problem efficiently. Finally, we demonstrate the superior performance of the adaptive \(L_p\) in variable selection, signal recovery and image shape reconstruction through a series of numerical experiments, in comparison with the \(L_{1}\) estimator, the adaptive \(L_{1}\) estimator and the \(L_{1/2}\) estimator.

2 Preliminaries

The symbol \(\rightarrow \) stands for convergence in the ordinary sense, \(\rightarrow _p\) for convergence in probability, and \(\rightarrow _d\) for convergence in distribution. \(\mathbb P(\cdot )\) denotes probability. \(X_n=O_p(1)\) means that the sequence \(X_n\) is stochastically bounded, and \(X_n=o_p(1)\) means \(X_n\rightarrow _p 0 \text { as }n\rightarrow \infty \). We write \(\varvec{\beta }_0=[\varvec{\beta }_{01}^{\top }, \varvec{\beta }_{00}^{\top }]^{\top }\in R^{l_n}\), where \(\varvec{\beta }_{01}\in R^{k_n}\) consists of the nonzero components of \(\varvec{\beta }_0\) and \(\varvec{\beta }_{00}\in R^{m_n}\) of the zero ones, so that \(k_n+m_n=l_n\). We center the response vector \(\varvec{y}=[y_1, \cdots , y_n]^{\top }\) and standardize the design matrix \(X=(x_{ij})_{n\times l_n}\) so that \( \sum _{i=1}^n y_i=0,\text { } \sum _{i=1}^n x_{ij}=0,\text { } \frac{1}{n}\sum _{i=1}^nx_{ij}^2=1, \text { }j=1,\cdots ,l_n\).
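As a concrete illustration, here is a minimal NumPy sketch of this centering and standardization step; the function and variable names are ours.

```python
import numpy as np

def center_and_standardize(X, y):
    """Center y and the columns of X, and rescale each column of X so that
    (1/n) * sum_i x_ij^2 = 1, matching the normalization above."""
    n = X.shape[0]
    y_centered = y - y.mean()
    X_centered = X - X.mean(axis=0)
    scale = np.sqrt((X_centered ** 2).sum(axis=0) / n)
    return X_centered / scale, y_centered
```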

Let \(J_{1n}=\{j\;|\;\beta _{0j}\ne 0\}\) and \(J_{0n}=\{j\;|\;\beta _{0j}= 0\}\), and set \(\varvec{X}_n^{\top }=[\varvec{x}_1, \varvec{x}_2, \cdots , \varvec{x}_n]\in R^{l_n\times n}\), \(\varvec{X}_{n1}=[\varvec{x}_{11}, \varvec{x}_{12}, \cdots , \varvec{x}_{1n}]\in R^{k_n\times n}\); accordingly we define \(\varSigma _n=\frac{1}{n}\varvec{X}_n^{\top }\varvec{X}_n,\text { }\varSigma _{n1}=\frac{1}{n}\varvec{X}_{n1} \varvec{X}_{n1}^{\top }\). Let \(\rho _{1n}\) and \(\tau _{1n}\) denote the smallest eigenvalues of \(\varSigma _n\) and \(\varSigma _{n1}\), respectively, and \(\rho _{2n}\) and \(\tau _{2n}\) the largest. We consider the oracle property proposed by [1].

Definition 2.1

(Oracle Property). Let \(\hat{\varvec{\beta }}_n=[\hat{\varvec{\beta }}_{n1}^{\top }, \hat{\varvec{\beta }}_{n0}^{\top }]^{\top }\) be an estimator of the true parameter \(\varvec{\beta }_0=[\varvec{\beta }_{01}^{\top }, \varvec{\beta }_{00}^{\top }]^{\top }\). Then \(\hat{\varvec{\beta }}_n\) is said to possess the oracle property if the two conditions below are satisfied: (1) (Consistency) \(\lim _{n\rightarrow \infty }\mathbb {P}(\hat{\varvec{\beta }}_{n0}=\varvec{0})=1\); (2) (Asymptotic Normality) let \(s_n^2=\sigma ^2\varvec{\alpha }_n^{\top }\varSigma _{n1}^{-1}\varvec{\alpha }_n\), where \(\varvec{\alpha }_n\) is any \(k_n\times 1\) vector satisfying \(\Vert \varvec{\alpha }_n\Vert _2\le 1\); then

$$\begin{aligned} n^{\frac{1}{2}}s_n^{-1}\varvec{\alpha }_n^{\top }(\hat{\varvec{\beta }}_{n1}-\varvec{\beta }_{01} )=n^{-\frac{1}{2}}s_n^{-1}\sum _{i=1}^n\epsilon _i\varvec{\alpha }_n^{\top } \varSigma _{n1}^{-1}\varvec{x}_{i1}+o_p(1)\rightarrow _dN(0,1). \end{aligned}$$
(2.1)

Let \(b_{1n}=\min _{j\in {J_{1n}}}\{|\beta _{0j}|\},\text { } b_{2n}=\max _{j\in {J_{1n}}}\{|\beta _{0j}|\}\).

Definition 2.2

(Zero Consistency). The estimator \(\tilde{\varvec{\beta }}_n\) is said to be zero-consistent if it satisfies the two conditions: (1) \(\max _{j\in J_{0n}}|\tilde{\beta }_{nj}|=o_p(1)\); (2) there exists a constant \(c>0\) such that, for any \(\epsilon >0\) and sufficiently large n, the following inequality holds:

$$\begin{aligned} \mathbb {P}(\min _{j\in J_{1n}}|\tilde{\beta }_{nj}|\ge cb_{1n})>1-\epsilon . \end{aligned}$$
(2.2)

where \(\tilde{\beta }_{nj}\) is the marginal regression coefficient [11]. Furthermore, if for some constant \(C>0\),

$$\begin{aligned} \mathbb {P}(R_n\max _{j\in J_{0n}}|\tilde{\beta }_{nj}|>C)\rightarrow 0,\quad \text {as }n\rightarrow \infty \end{aligned}$$
(2.3)

where \(\lim _{n\rightarrow \infty }R_n=\infty \), then \(\tilde{\varvec{\beta }}_n\) is said to be zero-consistent with rate \(R_n\).

3 Methods

We now present the conditions for the oracle property of the adaptive \(L_p\) regularization. Due to space limitations, we omit the proofs of all theorems; they will appear in a forthcoming paper. We consider the estimator [9]

$$\begin{aligned} U_n(\varvec{\beta })=\sum _{i=1}^n\Big (Y_i-\sum _{j=1}^{l_n}x_{ij}\beta _j\Big )^2+\lambda _n\sum _{j=1}^{l_n}\omega _{nj}|\beta _j|^p, \end{aligned}$$
(3.1)

where \(0<p<1\). Let \(\bar{\varvec{\beta }}_n=\mathrm {arg}\min _{\varvec{\beta }}U_n({\varvec{\beta }})= [\bar{\varvec{\beta }}_{n1}^{\top }, \bar{\varvec{\beta }}_{n0}^{\top }]^{\top }\), where \(\bar{\varvec{\beta }}_{n1}\) and \(\bar{\varvec{\beta }}_{n0}\) correspond to the estimates of the nonzero and zero coefficients of the true parameter \(\varvec{\beta }_0\), respectively. We make the following assumptions.

  1. (B1)

(i) \(\{\epsilon _i\}_{i=1}^n\) is a sequence of i.i.d. random variables with mean 0 and finite variance \(\sigma ^2\); (ii) \(\epsilon _i\) is sub-Gaussian, that is, \( \mathbb {P}(|\epsilon _i|>x)\le K\exp (-Cx^2)\) for all \(i\in \mathbb {N}\) and some \(K>0\) and \(C>0\).

  2. (B2)

The initial estimator \(\tilde{\varvec{\beta }}_n\) of Definition 2.2 is zero-consistent with rate \(R_n\).

  3. (B3)

(i) There exists a constant \(c_9>0\) such that \( \left| n^{-\frac{1}{2}}\sum _{i=1}^nx_{ij}x_{ik}\right| \le c_9\) for all \(j\in J_{0n}\), all \(k\in J_{1n}\) and all sufficiently large n; (ii) let \(\xi _{nj}=n^{-1}\mathrm {E}(\sum _{i=1}^nY_ix_{ij})=n^{-1}\sum _{i=1}^n(\varvec{x}_{i1}^{\top }\varvec{\beta }_{01}x_{ij})\); there exists \(\xi _0>0\) such that \( \min _{j\in J_{1n}}|\xi _{nj}|>2\xi _0b_{1n}>0\).

  4. (B4)

    (i) \(\frac{\lambda _n}{n}\rightarrow 0\), \(\lambda _nn^{-\frac{p}{2}}R_n^{\alpha }k_n^{p-2}\rightarrow \infty \); (ii) \(\log (m_n)=o(1)(\lambda _nn^{-\frac{p}{2}}R_n^{-\alpha })^{\frac{2}{2-p}}.\)

  5. (B5)

(i) There exist constants \(0<b_1<b_2<\infty \) such that \(b_1<b_{1n}<b_{2n}<b_2\); (ii) \(k_n\exp (-Cn)\rightarrow 0\) as \(n\rightarrow \infty \).

Theorem 3.3

Suppose the conditions (B1)–(B5) hold. Then \(\bar{\varvec{\beta }}_n\) is consistent in variable selection, namely

$$\begin{aligned} \mathbb {P}(\bar{\varvec{\beta }}_{n0}=\varvec{0})\rightarrow 1,\quad \mathbb {P}(\bar{\varvec{\beta }}_{n1j}\ne 0,\,j\in J_{1n})\rightarrow 1, \quad \text {as }n\rightarrow \infty . \end{aligned}$$
(3.2)

It can be seen from Theorem 3.3 that, under appropriate conditions, the adaptive \(L_p\) (\(0<p<1\)) correctly selects the nonzero covariates with probability converging to one. To establish the oracle property, we restrict attention to the covariates selected as nonzero by \(\bar{\varvec{\beta }}_n\) and consider optimizing the following objective function

$$\begin{aligned} \tilde{U}_n(\varvec{\beta }_1)=\sum _{i=1}^n(Y_i-\varvec{x}_{i1}^{\top }\varvec{\beta }_1)^2+\lambda _n^{*}\sum _{j=1}^{k_n}\omega _{nj}|\beta _{1j}|^p, \end{aligned}$$
(3.3)

where \(k_n\) is the number of nonzero components of \(\bar{\varvec{\beta }}_n\).

Let \(\hat{\varvec{\beta }}_{n1}\) denote the nonzero components of \(\hat{\varvec{\beta }}_n=\mathrm {arg}\min _{\varvec{\beta }}\tilde{U}_n (\varvec{\beta })\). We further impose the following conditions.

  1. (B6)

(i) There exist constants \(0<\tau _1<\tau _2<\infty \) such that \(0<\tau _1<\tau _{1n}<\tau _{2n}<\tau _2\); (ii) \(n^{-\frac{1}{2}}\max _{1\le i\le n}\varvec{x}_{i1}^{\top }\varvec{x}_{i1}\rightarrow 0\).

  2. (B7)

    (i) \(k_n(1+\lambda _n^{*})/n\rightarrow 0,\quad \)(ii) \(\lambda _n^{*}(k_n/l_n\sqrt{n})^{\frac{1}{2}}\rightarrow 0.\)

We then have the following result.

Theorem 3.4

Let \(\hat{\varvec{\beta }}_{n1}\) be the estimate of the true nonzero parameter \(\varvec{\beta }_{01}\). Suppose conditions (B1)–(B7) hold. Then

$$\begin{aligned} n^{\frac{1}{2}}s_n^{-1}\varvec{\alpha }_n^{\top }(\hat{\varvec{\beta }}_{n1}- \varvec{\beta }_{01})=n^{-\frac{1}{2}}s_n^{-1}\sum _{i=1}^n\epsilon _i\varvec{\alpha }_n^{\top }\varSigma _{n1}^{-1} \varvec{x}_{i1}+o_p(1)\rightarrow _d\mathrm {N}(0,1), \end{aligned}$$

where \(s_n^2=\sigma ^2\varvec{\alpha }_n^{\top }\varSigma _{n1}^{-1}\varvec{\alpha }_n\) and \(\varvec{\alpha }_n\) is an arbitrary \(k_n\times 1\) vector with \(||\varvec{\alpha }_n||_2\le 1\).

The assumption that \(\tilde{\varvec{\beta }}_n\) is zero-consistent with rate \(R_n\) is critical in establishing the oracle property of the adaptive \(L_p\) \((0<p<1)\) regularizer. [11] points out that when \(l_n\) is fixed or of order \(o(\sqrt{n})\), the OLS estimator \(\tilde{\varvec{\beta }}_{ols} = (\varvec{X}^{\top }\varvec{X})^{-1}\varvec{X}^{\top }\varvec{y}\) is feasible as \(\tilde{\varvec{\beta }}_n\), but when \(l_n\) grows faster than \(\sqrt{n}\), the OLS estimator is no longer zero-consistent. Here we follow the construction of [11], with the additional requirement that there exists \(b_1>0\) such that \(b_{1n}>b_1>0\), to obtain the initial estimator. We refer to Sect. 3 of [11] for the remaining details of this discussion, and to [9] for the case \(l_{n}<n\).
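As a rough illustration only (not the authors' exact construction), the sketch below computes an initial estimator: ordinary least squares when it is well defined, and otherwise marginal regression coefficients, which under the standardization of Sect. 2 reduce to \(\tilde{\beta }_{nj}=n^{-1}\varvec{x}_j^{\top }\varvec{y}\). The branching rule and function name are ours.

```python
import numpy as np

def initial_estimator(X, y):
    """Illustrative initial estimator beta_tilde for the adaptive weights.
    - If l_n < n, use the OLS estimator (X'X)^{-1} X'y, as discussed in [11].
    - Otherwise fall back to marginal regression coefficients, which for
      standardized columns reduce to beta_tilde_j = (1/n) * x_j' y."""
    n, l_n = X.shape
    if l_n < n:
        beta_tilde, *_ = np.linalg.lstsq(X, y, rcond=None)
    else:
        beta_tilde = X.T @ y / n
    return beta_tilde
```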

However, for the case \(l_n>n\), OLS is no longer feasible as an initial estimator. Following [11] and Theorem 3.3, we first perform variable selection to identify the nonzero \(\beta _j\), which leads to Algorithm 1 below.

Algorithm 1. The adaptive \(L_p\) \((0<p<1)\) regularization procedure (pseudocode figure).
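The pseudocode of Algorithm 1 is not reproduced here. As a minimal sketch of the idea stated in Sect. 4, that solving the adaptive \(L_p\) \((0<p<1)\) problem amounts to solving a sequence of adaptive \(L_1\) problems, the code below majorizes the \(L_p\) penalty by a weighted \(L_1\) penalty at the current iterate and solves the resulting adaptive Lasso via column rescaling. It uses scikit-learn's Lasso purely for illustration (the paper's experiments use SPAMS), and the initialization, surrogate weights and iteration count are our choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lp_via_reweighted_l1(X, y, lam, weights, p=0.5,
                                  n_iter=10, eps=1e-6):
    """Sketch: minimize ||y - X b||^2 + lam * sum_j w_j |b_j|^p by solving
    a sequence of weighted-L1 (adaptive Lasso) subproblems."""
    n, l_n = X.shape
    # Start from a least-squares (minimum-norm if l_n > n) estimate.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Majorize |b_j|^p at the current iterate (|b|^p is concave in |b|):
        # |b_j|^p <= |beta_j|^p + p * |beta_j|^(p-1) * (|b_j| - |beta_j|),
        # which leaves a weighted-L1 penalty with weights c_j.
        c = weights * p * (np.abs(beta) + eps) ** (p - 1)
        X_scaled = X / c                       # substitute b_j = z_j / c_j
        # sklearn's Lasso minimizes (1/(2n)) * ||y - Xz||^2 + alpha * ||z||_1,
        # so alpha = lam / (2n) reproduces the surrogate's penalty level.
        lasso = Lasso(alpha=lam / (2.0 * n), fit_intercept=False,
                      max_iter=10000)
        lasso.fit(X_scaled, y)
        beta = lasso.coef_ / c
    return beta
```

In practice the starting point would be the zero-consistent initial estimator \(\tilde{\varvec{\beta }}_n\) discussed above, and the iterations would be stopped once successive iterates are sufficiently close.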

4 Results

In this section, we present three application examples: variable selection, signal recovery and image shape reconstruction. We note that, by Algorithm 1, solving the adaptive \(L_p\) \((0<p<1)\) problem is equivalent to solving a series of adaptive \(L_1\) problems, which is very fast on a modern computer. We take \(p=1/2\) in the experiments, as recommended by [12]. To assess the performance of the present algorithm, we compare it with the \(L_{1}\) (Lasso), adaptive \(L_{1}\) and (nonadaptive) \(L_{1/2}\) regularized estimators. In practice, we use the algorithm proposed in [6] to compute the adaptive Lasso and apply the iterative \(L_1\) algorithm of [12] to the \(L_{1/2}\) regularization. The parameter \(\lambda _n\) in the \(L_{1/2}\) regularization is selected by the generalized cross-validation (GCV) method described in [1, 16].

For the adaptive \(L_{1}\) and adaptive \(L_{1/2}\) regularizers, we use two-dimensional cross-validation and select \(\gamma \) from \(\{0.5, 1.0, 1.5\}\). For fixed \(\gamma \) and \(\lambda \), we apply Algorithm 1 to obtain a numerical solution \(\hat{\beta }=\hat{\beta }_{\gamma , \lambda }\). Note that \(\hat{\beta }_{\gamma , \lambda }\) is the minimizer of the adaptive \(L_{1/2}\) regularization problem [12] and satisfies \(\hat{\beta }_{\gamma , \lambda } = (X^{\top }X+\lambda _n WD^{*})^{-1}X^{\top }y\), where W is the diagonal matrix with elements \(\omega _{nj}\), D = diag\(\{|\hat{\beta }_j|^{3/2}\}\), and \(D^{*}\) is the generalized inverse of D. The number of nonzero components of \(\hat{\beta }_{\gamma , \lambda }\) can be approximated by \(P_{\gamma , \lambda } = \text {tr}(X(X^{\top }X+\lambda WD^{*})^{-1}X^{\top })\) (see [16, 17]). Thus the generalized cross-validation statistic is given by \(\text {GCV}_{\gamma , \lambda } = \frac{1}{l_n}\frac{\text {RSS}_{\gamma , \lambda }}{(1-\text {P}_{\gamma , \lambda }/l_n)^2}\), where RSS denotes the residual sum of squares \(\sum _{i=1}^n\big (y_i-\varvec{x}_i^{\top }\hat{\beta }_{\gamma ,\lambda }\big )^2\). We therefore obtain the adaptive \(L_{1/2}\) solution from the minimization problem \(\{\hat{\gamma }, \hat{\lambda }\} = \arg \min _{\gamma , \lambda }\text {GCV}_{\gamma , \lambda }\), that is, \(\hat{\beta }=\hat{\beta }_{\hat{\gamma }, \hat{\lambda }}\). All simulation code is written in Python using the package SPAMS [15].
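A minimal sketch of this GCV computation is given below, taking \(D^{*}\) as the generalized inverse of D as above and following the formula in the text; the fitting routine that produces \(\hat{\beta }_{\gamma ,\lambda }\) on the \((\gamma ,\lambda )\) grid is assumed to exist, and all names are ours.

```python
import numpy as np

def gcv_statistic(X, y, beta_hat, lam, weights):
    """GCV_{gamma,lambda} following the formula in the text, with D* the
    generalized inverse of D = diag{|beta_hat_j|^{3/2}}."""
    n, l_n = X.shape
    d = np.abs(beta_hat) ** 1.5
    d_star = np.zeros_like(d)
    d_star[d > 1e-12] = 1.0 / d[d > 1e-12]
    A = X.T @ X + lam * np.diag(weights * d_star)
    hat_matrix = X @ np.linalg.solve(A, X.T)   # X (X'X + lam W D*)^{-1} X'
    p_eff = np.trace(hat_matrix)               # approximate number of nonzeros
    rss = np.sum((y - X @ beta_hat) ** 2)
    return (rss / l_n) / (1.0 - p_eff / l_n) ** 2

# (gamma, lambda) would then be chosen by evaluating gcv_statistic over a
# two-dimensional grid of candidate values and keeping the minimizer.
```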

4.1 Variable Selection

Consider the following linear regression model from [1, 12, 16]: \(y = X\beta ^{*}+\sigma \epsilon \), where the true coefficient vector is \(\beta ^{*}=[3,1.5,0,0,2,0,0,0]^{\top }\), \(\epsilon \) is i.i.d. noise following a specified distribution, and \(\sigma \) is the noise strength. We first take \(\sigma =1\) and then \(\sigma =3\) with \(\epsilon \) following the standard normal distribution; finally, we take \(\sigma =1\) with \(\epsilon \) following a mixture of 30% standard Cauchy and 70% standard normal distributions (denoted MIXTURE in the result tables). The correlation between \(x_i\) and \(x_j\) is \((1/2)^{|i-j|}\).
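A minimal sketch of how one such dataset can be generated is given below; sampling \(X\) from a multivariate normal with covariance \((1/2)^{|i-j|}\) and the function names are our illustrative choices.

```python
import numpy as np

def simulate_dataset(n=100, sigma=1.0, noise="normal", rng=None):
    """One dataset from y = X beta* + sigma * eps with Corr(x_i, x_j) = 0.5^{|i-j|}."""
    rng = np.random.default_rng() if rng is None else rng
    beta_star = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    p = beta_star.size
    cov = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    if noise == "normal":
        eps = rng.standard_normal(n)
    else:  # MIXTURE: 30% standard Cauchy, 70% standard normal
        is_cauchy = rng.random(n) < 0.3
        eps = np.where(is_cauchy, rng.standard_cauchy(n), rng.standard_normal(n))
    return X, X @ beta_star + sigma * eps, beta_star
```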

For each type of noise, we simulate 100 datasets of \(n=100\) observations, and the relative model error on each dataset is defined as \(||\hat{y}-X\beta ^{*}||_2/||y-X\beta ^{*}||_2\), where \(\hat{y} = X\hat{\beta }\); here \(||\hat{y} - X\beta ^{*}||_2\) is the model error and \(||y-X\beta ^{*}||_2\) is the inherent prediction error due to the noise. The results in Tables 1 and 2 show that the adaptive \(L_{1/2}\) is more accurate and sparser than the other three regularizers.
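For completeness, a one-function sketch of this error measure (names are ours):

```python
import numpy as np

def relative_model_error(X, y, beta_star, beta_hat):
    """||X beta_hat - X beta*||_2 / ||y - X beta*||_2, as defined above."""
    return (np.linalg.norm(X @ beta_hat - X @ beta_star)
            / np.linalg.norm(y - X @ beta_star))
```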

Table 1. Median value of relative model error (\(n=100\)) (with min/max)
Table 2. Average number of zero coefficients (\(n=100\)) (with standard deviation)

4.2 Signal Recovery

In this and the next experiment, we show the application of adaptive \(L_p\) regularization to compressed sensing [2, 19, 20]. Consider a real-valued, finite-length signal \(\text {x}\in R^N\), represented in an orthonormal basis \(\{\psi _i\}_{i=1}^N\) of \(R^N\). Let \(\varPsi = [\psi _1, \cdots , \psi _N]\). Then there exists \(s\in R^{N}\) such that \(\text {x} = \varPsi s = \sum _{i=1}^N\psi _i s_i\).

Consider \(y = \varPhi x + \epsilon \), where \(\epsilon \) is a noise term, either stochastic or deterministic, and \(\varPhi \) is the sensing matrix. It was shown in [19, 21] that the reconstruction of \(\text {x}\) can be formulated as the \(L_0\) problem \(\min _{\text {x}\in R^N}\sum _{i=1}^N I_{x_i\ne 0}\text { such that }||y - \varPhi x||_2\le \delta \), where the parameter \(\delta \) is chosen so that the true signal \(\text {x}\) remains feasible. According to [2], if \(\text {x}\) is sufficiently sparse and \(\varPhi \) satisfies the Restricted Isometry Property [23], this \(L_0\) problem is equivalent to the \(L_1\) problem \(\min _{\text {x}\in R^N}\sum _{i=1}^N |x_i|\text { such that }||y - \varPhi x||_2\le \delta \). We instead apply the adaptive \(L_{1/2}\) regularizer to the original problem, that is, we reconstruct the signal by solving \(\min _{\text {x}\in R^N}\sum _{i=1}^N \omega _i|x_i|^{1/2}\text { such that }||y - \varPhi x||_2\le \delta \).

As a numerical experiment, we take \(\text {x} = \sin (2\pi f_1t) + \cos (2\pi f_2t) + \epsilon \) with a fixed signal length \(N=512\), equally spaced samples \(t\in [0, 0.1]\), \(f_1=16\) and \(f_2=384\). The noise \(\epsilon \) is Gaussian with mean \(\mu =0\) and standard deviation \(\sigma =0.01\). The discrete cosine transform (DCT) is used to obtain the sparse representation of \(\text {x}\) (denoted \(\hat{\text {x}}\)). We then set \(M = 128\) and sample a random \(M\times N\) matrix \(\varPhi \) with i.i.d. Gaussian entries. We apply the \(L_{1}\), adaptive \(L_1\), \(L_{1/2}\) and our adaptive \(L_{1/2}\) regularizations to recover \(\hat{\text {x}}\), and then employ the inverse discrete cosine transform (IDCT) to obtain the reconstructed \(\text {x}\). We run each estimator for 100 trials.
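A minimal sketch of this experimental setup is given below; the scaling of the sensing matrix and the variable names are our illustrative choices.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N, M, f1, f2 = 512, 128, 16, 384
t = np.linspace(0.0, 0.1, N)
x = (np.sin(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
     + 0.01 * rng.standard_normal(N))            # Gaussian noise, sigma = 0.01
s = dct(x, norm="ortho")                         # sparse DCT representation x_hat
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian sensing matrix
y = Phi @ x                                      # M compressed measurements
Psi = idct(np.eye(N), axis=0, norm="ortho")      # inverse-DCT basis, x = Psi @ s
A = Phi @ Psi                                    # recover s from y ~ A s, then x_re = Psi @ s_re
```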

We compare the recovery performance for both \(\hat{\text {x}}\) and \(\text {x}\) (the recovered versions are denoted \(\hat{\text {x}}_{\text {re}}\) and \(\text {x}_{\text {re}}\), respectively). For \(\hat{\text {x}}_{\text {re}}\), we measure performance in terms of sparseness, defined as the ratio of the number of nonzero coefficients in \(\hat{\text {x}}_{\text {re}}\) to the signal length, and for \(\text {x}_{\text {re}}\) we consider the relative error \(||\text {x}_{\text {re}} - \text {x}||_2 / ||\text {x}||_2\). Tables 3 and 4 show that, in terms of accuracy (relative error), our adaptive \(L_{1/2}\) performs slightly worse than the adaptive \(L_1\) but better than the others, while yielding the sparsest solution.
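These two performance measures can be computed as in the brief sketch below (the zero-threshold tol is our choice):

```python
import numpy as np

def sparseness(s_re, tol=1e-8):
    """Fraction of (numerically) nonzero DCT coefficients in the recovery."""
    return np.mean(np.abs(s_re) > tol)

def relative_error(x_re, x):
    """Relative 2-norm error of the reconstructed signal."""
    return np.linalg.norm(x_re - x) / np.linalg.norm(x)
```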

Table 3. Averaged sparseness of recovered \(\hat{\text {x}}\) (with standard deviation(SD))
Table 4. Averaged relative error under 2-norm of recovered \(\text {x}\) (with SD)

4.3 Shape Reconstruction

We use the example proposed in [24] to reconstruct an image from a set of parallel projections acquired along different angles. Similar patterns are commonly seen in computed tomography (CT) data. Without prior knowledge of the sample, the number of projections required to reconstruct the image is of order O(N) (in pixels). Here we consider a sparse image containing basic shapes, where only the boundaries of the objects have nonzero values. Such images are artificially generated but still correspond to real-life applications, including the monitoring of cellular material.

The sparse image we use here is of size \(128\times 128\), and Gaussian noise with standard deviation \(\sigma =0.2\) is added (Fig. 1(a)). For reconstruction, we flatten it into a \(128\times 128=16384\)-dimensional vector. The reconstruction results using the Lasso and our adaptive \(L_{1/2}\), with N / 7 and N / 10 of the pixels sampled, are shown in Fig. 1.

Both estimators recover the original image with visually high accuracy when N / 7 of the pixels are sampled, while the adaptive \(L_{1/2}\) regularizer performs better numerically, as shown in Table 5. When the sampling ratio drops to N / 10, the Lasso starts to fail (the shapes break down), whereas our adaptive \(L_{1/2}\) still gives a highly accurate reconstruction (Fig. 1(b)).

Fig. 1. The noisy sparse test image and the reconstruction results of the Lasso and the adaptive \(L_{1/2}\) with N / 7 and N / 10 of the pixels sampled.

We use the Structural Similarity Index Measure (SSIM) [25, 26] to compare the reconstruction performance of the four estimators. The SSIM index can be viewed as a quality measure of one of the images being compared, provided the other image is regarded as being of perfect quality; it is more consistent with human visual perception than traditional indices such as the peak signal-to-noise ratio (PSNR) and the mean squared error (MSE) [27].
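A minimal sketch of this comparison using scikit-image's SSIM implementation is shown below; the exact toolchain used to produce Table 5 is not specified in the text, so this is illustrative only.

```python
from skimage.metrics import structural_similarity

def ssim_score(reconstruction, reference):
    """SSIM of a reconstruction against the noise-free reference image."""
    data_range = float(reference.max() - reference.min())
    return structural_similarity(reconstruction, reference,
                                 data_range=data_range)
```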

Table 5 shows that the adaptive \(L_{1/2}\) regularizer achieves the best reconstruction performance at both sampling ratios, N / 7 and N / 10.

Table 5. Comparison of reconstruction performance measured by SSIM among the Lasso, adaptive \(L_1\), \(L_{1/2}\) and adaptive \(L_{1/2}\).

5 Concluding Remarks

We have studied a specific framework of adaptive \(L_p\text { }(0<p<1)\) regularization, aiming at better performance in the estimation of sparse problems. We have shown that the adaptive \(L_p\) regularized estimators possess the oracle property even when \(l_n\gg n\). We have also proposed a fast and efficient algorithm to solve the adaptive \(L_p\) regularization problem. Our results offer new insights into \(L_p\) (\(0<p<1\))-type methods and reveal their potential in diverse applications of compressed sensing.