1 Introduction

Compressive Sensing (CS) has attracted extensive attention for more than a decade (see Candès et al. 2006; Candès and Tao 2005; Foucart and Rauhut 2013); its goal is to reconstruct a sparse signal from an underdetermined system of linear equations. Mathematically, CS seeks a sparsest solution to a system of linear equations, which is NP-hard (Natarajan 1995) because of the combinatorial nature of the problem. Many approaches have been developed for solving this class of problems, one of the most popular being the relaxation method, i.e., finding a solution to the original problem via a relaxation of it.

So far, the convex relaxation approach to CS has been well developed, in both theory and algorithms. However, it has been found, from both theoretical and numerical perspectives, that nonconvex relaxations (such as \(\ell _p\)-minimization) need fewer measurements for exact recovery than convex relaxations (such as \(\ell _1\)-minimization) (Chartrand 2007; Lyu et al. 2013; Zhang et al. 2013). Thus, nonconvex relaxation methods, including \(\ell _p\)-minimization, have attracted considerable attention in recent years (see, for example, Chartrand 2007; Davies and Gribonval 2009; Foucart and Lai 2009; Gribonval and Nielsen 2003; Lai and Wang 2011; Peng et al. 2015; Saab et al. 2008; Sun 2012; Wang et al. 2011; Xu et al. 2012; Zhang et al. 2013).

Recently, Fung and Mangasarian (2011) discussed the relationship between the \(\ell _0\)- and \(\ell _p\)-minimal solutions to systems of linear equalities and inequalities with box constraints; they showed that there is a (problem-dependent) constant less than or equal to one, denoted by \(p^*\), such that any optimal solution of \(\ell _p\)-minimization with \(p\in [0,p^*)\) also solves the concerned \(\ell _0\)-minimization. Such an important property has also been investigated for systems of linear equations (Peng et al. 2015), the linear complementarity problem (Chen and Xiang 2016) and the phaseless compressive sensing problem (You et al. 2017).

Inspired by the studies mentioned above, we investigate the partially sparsest solution over a union of finitely many polytopes. This problem covers a wide range of problems, containing those discussed in Fung and Mangasarian (2011), Peng et al. (2015) and You et al. (2017) as special cases. We approach this problem via its nonconvex relaxation, using a class of nonconvex approximation functions which contains the Schatten-p quasi-norm as a special case. We show that there exists a constant \(p^*\in (0,1]\) such that for any parameter p in the nonconvex approximation function satisfying \(0<p<p^*\), every optimal solution of the relaxation problem is also an optimal solution of the original problem; this generalizes the results obtained in Fung and Mangasarian (2011), Peng et al. (2015) and You et al. (2017). We give an example to illustrate our theoretical result.

The paper is organized as follows. In Sect. 2, we state the problem concerned and provide some basic concepts and results which are useful in later sections. In Sect. 3, we equivalently reformulate the original sparse optimization problem and its nonconvex relaxation into new forms by means of some results from polytope theory, and then study the relationship between the solution sets of these two problems. In Sect. 4, we show that several classes of sparse optimization problems are subclasses of the problem concerned; hence, the main result is valid for these problems. Some conclusions and comments are given in Sect. 5.

The conventions of notation in this paper are as follows. Uppercase (Greek) letters denote matrices, e.g., \(A, \Phi \); boldface lowercase letters denote column vectors, e.g., \(\mathbf {x}, \mathbf {b}\); \(\Vert \mathbf {x}\Vert _{0}\) denotes the number of nonzero components of the vector \(\mathbf {x}\); \(\Vert \mathbf {x}\Vert _{\infty }\) is the infinity norm defined by \(\max _{i}\{|x_i|\}\); \( \Vert \mathbf {x}\Vert _{p}\) is the Schatten-p quasi-norm of \(\mathbf {x}\) defined by \((\sum \nolimits _{i=1}^{n}| x_{i}|^{p})^\frac{1}{p}\), called the ‘\(\ell _p\)-norm of \(\mathbf {x}\)’, where \(0<p<1\); \(\mathbf {1}\) is a vector with each component being 1; \(\mathbf {0}\) is the zero vector; \(|\mathbf {u}|\) is the vector with i-th component \(|u_i|\) for all i; \((\mathbf {x},\mathbf {y})\) stands for the vector \((\mathbf {x}^{\top },\mathbf {y}^{\top })^{\top }\), which is a column vector; \(\mathbf {u}\leqslant \mathbf {v}\) signifies the componentwise vector inequality \(u_i\leqslant v_i\) for all i; \(A^\top \) represents the transpose of the matrix A and \(\mathbf {u}^\top \) the transpose of the vector \(\mathbf {u}\); lowercase letters denote real scalars, e.g., \(u_i, a, b,\ldots \); in particular, \(u_i\) stands for the i-th component of \(\mathbf {u}\); [n] is the set \(\{1,2,\ldots , n\}\) with n being a positive integer; \(\mathbb {R}_{+}\) is the set of non-negative real numbers; \(\mathbb {R}^{n}_{+}\) is the non-negative orthant \(\{(x_1,\ldots ,x_n)^{\top }\mid x_i\geqslant 0, i\in [n]\}\); \(\mathbb {R}^{m\times n}\) is the set of all real matrices with m rows and n columns; \(\epsilon =(\epsilon _1,\ldots ,\epsilon _m)^{\top }\in \{-1,1\}^m\) is an m-dimensional vector with each \(\epsilon _i\) being \(-1\) or 1; \(\mathbf {u}\circ \mathbf {v}\) is the Hadamard product of the vectors \(\mathbf {u}\) and \(\mathbf {v}\); \(\mathop {sign}(a)\) is the sign function: \(\mathop {sign}(a)=1\) if \(a>0\), \(\mathop {sign}(a)=0\) if \(a=0\), and \(\mathop {sign}(a)=-1\) if \(a<0\); \(\mathop {\mathbf {sign}}(\mathbf {u})\) is the sign vector with i-th component \(\mathop {sign}(u_{i})\); \(\mathbb {I}, \mathbb {I}^{c}\) are two index sets contained in [n], with \(\mathbb {I}^{c}=[n]\backslash \mathbb {I}\); \(\#(\mathbb {I})\) is the cardinality of the index set \(\mathbb {I}\), i.e., the number of elements of \(\mathbb {I}\); \(\Phi _{\mathbb {I}}\) denotes the sub-matrix of \(\Phi \) formed by the columns indexed by \(\mathbb {I}\), and \(\mathbf {x}_{\mathbb {I}}\) the sub-vector of \(\mathbf {x}\) formed by the components indexed by \(\mathbb {I}\); \(\mathrm {Conv}(\mathbb {T})\) is the convex hull of the set \(\mathbb {T}\); \(\mathrm {Diag}(\mathbf {v})\) denotes the diagonal matrix generated by the vector \(\mathbf {v}\).

2 Preliminaries

In this paper, we study the following partially sparse optimization problem:

$$\begin{aligned} \quad \min _{(\mathbf {x}, \mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\quad (\mathbf {x}, \mathbf {y})\in \bigcup _{i\in \mathcal {I}}\mathbb {P}_{i},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(1)

where \(\mathbf {x}\in \mathbb {R}^{n_{1}}, \mathbf {y}\in \mathbb {R}^{n_{2}}\), \((\mathbf {x},\mathbf {y})\in \mathbb {R}^N\) with \(N=n_1+n_2\), each \(\mathbb {P}_{i}\) is a polyhedron in \(\mathbb {R}^{N}\), \(\mathcal {I}\) is a finite index set, and \(\mathbf {l},\mathbf {u}\in \mathbb {R}^{n_{1}}\) are two known vectors satisfying \(\mathbf {l}\leqslant \mathbf {u}\).

To make problem (1) meaningful, we assume throughout this paper that its feasible set is nonempty.

We denote the feasible set of (1) by

$$\begin{aligned} \mathbb {T}:=\bigcup _{i\in \mathcal {I}}\mathbb {T}_{i}\subseteq \mathbb {R}^{N} \end{aligned}$$
(2)

where, for any \(i\in \mathcal {I}\),

$$\begin{aligned} \mathbb {T}_{i}:= \left\{ (\mathbf {x}, \mathbf {y})\in \mathbb {R}^{N}\Big | (\mathbf {x}, \mathbf {y})\in \mathbb {P}_{i},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1} \right\} . \end{aligned}$$
(3)

By the assumption that the feasible set of problem (1) is nonempty and the fact that a set is a polytope if and only if it is a bounded polyhedral set (see Grünbaum 2003, p. 32), we have the following result.

Proposition 2.1

The set \(\mathbb {T}\) defined by (2) is nonempty and it is a union of finitely many polytopes in \(\mathbb {R}^{N}.\)

By Proposition 2.1, problem (1) is a sparse optimization problem over a union of finitely many polytopes. It should be noted that \(\mathbb {T}\) is convex when \(\#(\mathcal {I})=1\) and not necessarily convex when \(\#(\mathcal {I})>1\).

Remark 2.1

Consider the problem

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {z})} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\quad (\mathbf {x}, \mathbf {z})\in \bigcup _{i\in \mathcal {I}}\widetilde{\mathbb {P}}_{i},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\mathbf {0}\leqslant \mathbf {z}\leqslant \mathbf {v}, \end{aligned}$$
(4)

where \(\mathbf {z}\in \mathbb {R}^{n_2}\), \(\mathbf {v}\in \mathbb {R}^{n_2}\) with \(\mathbf {v}>\mathbf {0}\), and every \(\widetilde{\mathbb {P}}_{i}\) is a polyhedron in \(\mathbb {R}^{N}\). Then, it is easy to see that problem (4) can be reformulated as a problem in the form of problem (1).
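
To make this reduction explicit, here is a short worked substitution (a sketch of our own; the rescaled polyhedra \(\mathbb {P}_{i}\) below are notation introduced for illustration). Setting

$$\begin{aligned} \mathbf {y}:=\mathrm {Diag}(\mathbf {v})^{-1}\mathbf {z},\qquad \mathbb {P}_{i}:=\left\{ (\mathbf {x},\mathbf {y})\in \mathbb {R}^{N} : \left( \mathbf {x},\mathrm {Diag}(\mathbf {v})\mathbf {y}\right) \in \widetilde{\mathbb {P}}_{i}\right\} , \end{aligned}$$

we see that each \(\mathbb {P}_{i}\) is again a polyhedron (it is the preimage of \(\widetilde{\mathbb {P}}_{i}\) under an invertible linear map), the box constraint \(\mathbf {0}\leqslant \mathbf {z}\leqslant \mathbf {v}\) becomes \(\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}\), and \(\Vert \mathbf {z}\Vert _0=\Vert \mathbf {y}\Vert _0\) since \(\mathbf {v}>\mathbf {0}\); hence (4) takes the form of (1).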

Problem (1) represents a broad class of nonconvex optimization problems and contains many popular problems as special cases (see the discussions in Sect. 4). In this paper, we investigate this class of problems via its nonconvex relaxation. For this purpose, we define the following set:

$$\begin{aligned} \mathscr {F}(t;p):=\{f(t;p)\mid t\in [0,\infty ),\ f(t;p) \text { satisfies properties (C1)--(C3)}\}, \end{aligned}$$
(5)

where properties (C1)–(C3) are as follows: for every given number \(p\in (0,1]\),

(C1) \(f(t;p)\) is a strictly concave and increasing function of \(t\in [0,\infty )\);

(C2) \(f(0;p)=0\) and \(0< f(t;p)\leqslant 1 \) whenever \(t\in (0,1]\);

(C3) \(\lim \nolimits _{p\downarrow 0} f(t;p)=1\) for \(t>0\).

Several functions \(f(t;p)\in \mathscr {F}(t;p)\) have been proposed in Gribonval and Nielsen (2007), Bradley and Mangasarian (1998) and Haddou and Migot (2015). For example,

$$\begin{aligned} (a)\;f_1(t;p)=\frac{t}{t+p};\quad (b)\;f_2(t;p)=1-e^{-\frac{t}{p}};\quad (c)\;f_3(t;p)=t^p. \end{aligned}$$
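
As a quick sanity check, the following minimal Python sketch (ours, not from the paper) verifies properties (C2) and (C3) numerically for the three example functions:

```python
# Minimal numerical check of (C2)-(C3) for f1(t;p) = t/(t+p),
# f2(t;p) = 1 - exp(-t/p) and f3(t;p) = t^p; an illustrative sketch only.
import numpy as np

def f1(t, p):
    return t / (t + p)

def f2(t, p):
    return 1.0 - np.exp(-t / p)

def f3(t, p):
    return t ** p

for f in (f1, f2, f3):
    assert f(0.0, 0.5) == 0.0                  # (C2): f(0;p) = 0
    t = np.linspace(1e-6, 1.0, 1000)
    v = f(t, 0.5)
    assert np.all((0 < v) & (v <= 1))          # (C2): 0 < f(t;p) <= 1 on (0,1]
    assert abs(f(0.3, 1e-8) - 1.0) < 1e-3      # (C3): f(t;p) -> 1 as p -> 0 for t > 0
print("(C2) and (C3) hold numerically for f1, f2, f3")
```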

Remark 2.2

Let \(f(t;p)\in \mathscr {F}(t;p)\). By properties (C2) and (C3), it is easy to see that \(\mathop {sign}(t)=\lim \nolimits _{p\downarrow 0}f(t;p)\) for any real number \(t\geqslant 0\).

For any function \(f(t;p)\in \mathscr {F}(t;p)\), we will consider the following relaxation problem of problem (1):

$$\begin{aligned} \quad \min _{(\mathbf {x}, \mathbf {y})} \sum \limits _{i=1}^{n_2}f(y_i;p)\quad \text {s.t.}\quad (\mathbf {x}, \mathbf {y})\in \bigcup _{i\in \mathcal {I}}\mathbb {P}_{i},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(6)

To further discuss the relationship between problems (1) and (6), we need the following definitions of monotonically nondecreasing vector-valued and real-valued functions.

Definition 2.1

(You et al. 2017, Definition 2.1) (i) Define a function \(\mathbf {f}: \mathbb {R}^n_{+}\rightarrow \mathbb {R}^n_+\) by

$$\begin{aligned} \mathbf {f}(\mathbf {x})=(f(x_1),f(x_2),\ldots , f(x_n))^{\top }, \end{aligned}$$
(7)

where \(f:\mathbb {R}_+\rightarrow \mathbb {R}_+\). The (vector-valued) function \(\mathbf {f}\) is said to be monotonically nondecreasing if for any two nonnegative vectors \(\mathbf {u}\) and \(\mathbf {v}\), \(\mathbf {u}\leqslant \mathbf {v}\) implies \(\mathbf {f}(\mathbf {u})\leqslant \mathbf {f}(\mathbf {v})\).

(ii) Let \(F: \mathbb {R}^n_{+}\rightarrow \mathbb {R}_+\). The function F is said to be monotonically nondecreasing if for any two nonnegative vectors \(\mathbf {u}\) and \(\mathbf {v}\), \(\mathbf {u}\leqslant \mathbf {v}\) implies \(F(\mathbf {u})\leqslant F(\mathbf {v})\).

Two monotonically nondecreasing functions are given in the following example, which will be frequently used in our subsequent analysis.

Example 2.1

(i) Suppose that \(\mathbf {f}: \mathbb {R}^n_{+}\rightarrow \mathbb {R}^n_+\) is defined by (7) with \(f(x_i)=\text {sign}(x_i)\) for any \(i\in [n]\). We denote this function by \(\mathop {\mathbf {sign}}(\cdot )\). It is easy to see that \(\mathop {\mathbf {sign}}(\cdot )\) is a monotonically nondecreasing function.

(ii) Suppose that \(\mathbf {f}: \mathbb {R}^n_{+}\rightarrow \mathbb {R}^n_+\) is defined by (7) with the strictly increasing concave function \(f(x_i)=f(x_i;p)\) for any \(i\in [n]\) and \(p\in (0,1]\), where \(f\in \mathscr {F}(\cdot ;p)\) is defined in (5). It is easy to see that this function is monotonically nondecreasing.

For any given vectors \(\mathbf {u}\) and \(\mathbf {v}\) with \(\mathbf {0}\leqslant \mathbf {u}\leqslant \mathbf {v}\) and a monotonically nondecreasing function \(\mathbf {f}\), we have

$$\begin{aligned} \mathbf {1}^{\top }\mathbf {f}(\mathbf {u}) =\sum \limits _{i=1}^{n}{f(u_i)}\leqslant \sum \limits _{i=1}^{n}{f(v_i)}=\mathbf {1}^{\top }\mathbf {f}(\mathbf {v}). \end{aligned}$$

Therefore, \(\mathbf {1}^{\top }\mathbf {f}\) is monotonically nondecreasing.

3 Relationship between optimal solutions of problem (1) and problem (6)

In this section, we study the relationship between optimal solutions of problem (1) and problem (6).

Using the set \(\mathbb {T}\) defined by (2), problem (1) can be rewritten simply as

$$\begin{aligned} \quad \min \limits _{(\mathbf {x},\mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\quad (\mathbf {x},\mathbf {y})\in \mathbb {T}. \end{aligned}$$
(8)

Since the objective function of (8) is nonconvex and the feasible set is usually nonconvex, problem (8) is NP-hard (Natarajan 1995). Similarly, problem (6) can be rewritten as

$$\begin{aligned} \quad \min \limits _{(\mathbf {x},\mathbf {y})}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p)\quad \text {s.t.}\quad (\mathbf {x},\mathbf {y})\in \mathbb {T}, \end{aligned}$$
(9)

where \(\mathbf {f}(\mathbf {t};p)=(f(t_{1};p),f(t_{2};p),\ldots ,f(t_{n_2};p))^\top \in \mathbb {R}^{n_2}\) is a vector-valued function with \(f(t;p)\in \mathscr {F}(t;p)\).

Denote the convex hull of the set \(\mathbb {T} \) defined in (2) by

$$\begin{aligned} \mathbb {S}:=\text {Conv}(\mathbb {T}) = \text {Conv}\left( \bigcup \limits _{i\in \mathcal {I}}\mathbb {T}_{i}\right) . \end{aligned}$$
(10)

Then, we have the following result.

Proposition 3.1

The set \(\mathbb {S}\) defined by (10) is a nonempty polytope contained in the box \(\prod \nolimits _{i=1}^{n_1}[l_i,u_i]\times [0,1]^{n_2}\subset \mathbb {R}^{N}\) with \(N=n_{1}+n_{2}\).

Proof

By Proposition 2.1, the set \(\mathbb {T}\) is nonempty and is the union of finitely many polytopes in \(\mathbb {R}^{N}\). Thus, the set \(\mathbb {S}\) is nonempty and is the convex hull of a union of finitely many polytopes. By the fact that the convex hull of finitely many polytopes is a polytope (see Grünbaum 2003, p. 32), it follows that \(\mathbb {S}\) is a polytope. Furthermore, since \(\mathbb {T}\subset \prod \nolimits _{i=1}^{n_1}[l_i,u_i]\times [0,1]^{n_2}\), the latter set is a (convex) box, and \(\mathbb {S}\) is the smallest convex set containing \(\mathbb {T}\), it follows that \(\mathbb {S}\) is a subset of this box. Thus, the result of the proposition holds. \(\square \)

For the set \(\mathbb {T}\) defined in (2), we call \(\mathbf {x}\in \mathbb {T}\) a pseudo-extreme point of \(\mathbb {T}\) if it is a vertex of some polytope \(\mathbb {T}_{i}\), i.e., there do not exist two distinct points \(\mathbf {y}, \mathbf {z}\in \mathbb {T}_{i}\) and a constant \(\lambda \in (0,1)\) such that \(\mathbf {x}=\lambda \mathbf {y}+(1-\lambda )\mathbf {z}\). In addition, an extreme point of \(\mathbb {S}\) is called a vertex of \(\mathbb {S}\), as usual. In the following, we use \(\mathrm{vert }(\mathbb {S})\) to denote the set of all vertices of \(\mathbb {S}\) and \({\mathrm{pext}}(\mathbb {T})\) to denote the set of all pseudo-extreme points of \(\mathbb {T}\).
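
As a tiny illustrative example of our own (not from the paper): in \(\mathbb {R}^2\), take \(\mathbb {T}_1=\mathrm {Conv}\{(0,0),(2,0)\}\) and \(\mathbb {T}_2=\mathrm {Conv}\{(1,0),(1,2)\}\). Then \(\mathbb {S}=\mathrm {Conv}(\mathbb {T}_1\cup \mathbb {T}_2)\) is the triangle with vertices \((0,0),(2,0),(1,2)\), so that

$$\begin{aligned} \mathrm{pext}(\mathbb {T})=\{(0,0),(2,0),(1,0),(1,2)\},\qquad \mathrm{vert }(\mathbb {S})=\{(0,0),(2,0),(1,2)\}. \end{aligned}$$

The point \((1,0)\) is a pseudo-extreme point of \(\mathbb {T}\) (a vertex of \(\mathbb {T}_2\)) but not a vertex of \(\mathbb {S}\), which shows that the inclusion \(\mathrm{pext}(\mathbb {T})\supseteq \mathrm{vert }(\mathbb {S})\) appearing in the proof of Theorem 3.1 below can be strict.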

Using the convex hull \(\mathbb {S}\), problem (9) is relaxed as

$$\begin{aligned} \min \limits _{(\mathbf {x},\mathbf {y})}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p)\quad \text {s.t.}\quad (\mathbf {x},\mathbf {y})\in \mathbb {S}. \end{aligned}$$
(11)

In what follows, we discuss the relationship between problem (9) and problem (11). We have the following result.

Theorem 3.1

Suppose that \(\mathbf {x}^*\in \mathrm{vert }(\mathbb {S})\) is an optimal solution to problem (11). Then \(\mathbf {x}^*\in \mathrm{pext}(\mathbb {T})\) and it is an optimal solution to problem (9).

Proof

Notice that problem (11) and problem (9) differ mainly in their feasible sets: one is \(\mathbb {T}\) and the other is its convex hull \(\mathbb {S}\). By polytope geometry, \(\mathbb {S}=\text {Conv}(\mathrm{pext}(\mathbb {T}))\) (see Ziegler 1995, Theorem 2.15(3)). Furthermore, by the fact that if a polytope is written as the convex hull of a finite point set, then that set contains all the vertices of the polytope (see Ziegler 1995, Proposition 2.2), we have \(\mathrm{pext}(\mathbb {T})\supseteq \mathrm{vert }(\mathbb {S})\). Therefore, by You et al. (2017, Lemma 2.2), each vertex optimal solution to problem (11) is a pseudo-extreme point of \(\mathbb {T}\) which is an optimal solution to problem (9). \(\square \)

In the following, if a pseudo-extreme point of \(\mathbb {T}\) is an optimal solution to problem (9), we call it a pseudo-extreme-point optimal solution to problem (9). The following theorem addresses the existence of an optimal solution to problem (9).

Theorem 3.2

For any given \(p\in (0,1)\), there exists at least one pseudo-extreme-point optimal solution to problem (9).

Proof

Denote \(f(\mathbf {x},\mathbf {y}):=\mathbf {1}^\top \mathbf {f}(\mathbf {y};p)=\sum \nolimits _{i=1}^{n_2}f(y_i;p)\) with \(p\in (0,1)\); then \(-f\) is a convex function on \(\text {dom}f=\mathbb {R}^{n_1}\times \mathbb {R}^{n_2}_+\). By Proposition 3.1, the set \(\mathbb {S}\) is nonempty and \(\mathbb {S}\subseteq \prod \nolimits _{i=1}^{n_1}[l_i,u_i]\times [0,1]^{n_2}\subset \text {dom}f\). Moreover, \(\mathbb {S}\) contains no half-lines and no lines, and \(-f\) is bounded above on \(\mathbb {S}\). By Rockafellar (1970, Corollary 32.3.4), there exists a vertex optimal solution to problem (11), which is also a pseudo-extreme-point optimal solution to problem (9) by Theorem 3.1. In other words, for each real number \(p\in (0, 1)\), there exists at least one pseudo-extreme-point optimal solution to problem (9). \(\square \)

Theorem 3.3

Suppose \(\{p_k\}\) is an arbitrarily given sequence with \(0<p_{k+1}<p_k< 1\) for any \(k=1,2,\ldots \) and \(\lim \nolimits _{k\rightarrow \infty }p_{k}=0\). Then,

(i) there is a subsequence \(\{p_{i_k}\}\subseteq \{p_k\}\) such that some fixed pseudo-extreme point \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) of \(\mathbb {T}\) is an optimal solution to (9) for every \(p_{i_j}\in \{p_{i_k}\}\); and

(ii) \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) is an optimal solution to (8).

Proof

(i) For each \( p_i\in \{p_k\}\), where \(0<p_k< 1\) for all \(k=1,2,\ldots \) and \(\lim \nolimits _{k\rightarrow \infty }p_{k}=0\), it follows from Theorem 3.2 that there exists a pseudo-extreme-point optimal solution \((\hat{\mathbf {x}}(p_i), \hat{\mathbf {y}}(p_i))\) to problem (9) with \(p=p_i\).

Since the number of pseudo-extreme points of the set \(\mathbb {T}\) is finite, the number of pseudo-extreme-point optimal solutions is finite; since the set \(\{p_k\}\) is infinite, we conclude that there exists an infinite subsequence \(\{p_{i_k}\}\subset \{p_k\}\) such that for any \(p_{i_j}\in \{p_{i_k}\}\), problem (9) with \(p=p_{i_j}\) has the same pseudo-extreme-point optimal solution \((\hat{\mathbf {x}},\hat{\mathbf {y}})\). The first part of the theorem is proved.

(ii) We show that \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) is a partially sparsest solution of the original problem (1). Noting that \(\mathbf {y}\in \mathbb {R}^{n_2}\) and \(\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}\), we have

$$\begin{aligned} (\mathbf {sign}(\mathbf {y}))_i=sign(y_i)=\left\{ \begin{array}{ll} 1, &\quad \text {if}\ y_i >0,\\ 0, &\quad \text {if}\ y_i =0 \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \Vert \mathbf {y}\Vert _0=\sum \limits _{i=1}^{n_2}sign(y_i). \end{aligned}$$
(12)

Since \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) is an optimal solution to problem (9) with \(p=p_{i_j}\) for any \(p_{i_j}\in \{p_{i_k}\}\), we have \((\hat{\mathbf {x}},\hat{\mathbf {y}})\in \mathbb {T}\); and hence, \(\mathbf {1}\geqslant \hat{\mathbf {y}}\geqslant \mathbf {0}\). Furthermore, \(\Vert \hat{\mathbf {y}} \Vert _0=\sum \nolimits _{i=1}^{n_2}sign(\hat{y}_{i})\) and

$$\begin{aligned} \mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}};p_{i_j})=\min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p_{i_j}). \end{aligned}$$
(13)

For simplicity, we use \(p_j\) to replace \(p_{i_j}\). Therefore,

$$\begin{aligned} \Vert \hat{\mathbf {y}}\Vert _0&=\sum \limits _{i=1}^{n_2}sign(\hat{y}_{i})=\sum \limits _{i=1}^{n_2}\lim \limits _{j\rightarrow \infty }f(\hat{y}_i;p_j)=\lim \limits _{j\rightarrow \infty } \sum \limits _{i=1}^{n_2}f(\hat{y}_i;p_j)=\lim \limits _{j\rightarrow \infty } \mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}};p_j)\nonumber \\ &=\lim \limits _{j\rightarrow \infty }\min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p_j)=\lim \limits _{j\rightarrow \infty }\min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}} \sum \limits _{i=1}^{n_2}f(y_i;p_j)\nonumber \\ &\leqslant \lim \limits _{j\rightarrow \infty }\sum \limits _{i=1}^{n_2}f(y_i;p_j)\quad \text{ for } \text{ all }\;(\mathbf {x},\mathbf {y})\in \mathbb {T} \nonumber \\ &\leqslant \sum \limits _{i=1}^{n_2}sign({y_i})=\Vert \mathbf {y}\Vert _0\quad \text{ for } \text{ all }\;(\mathbf {x},\mathbf {y})\in \mathbb {T}, \end{aligned}$$
(14)

where the first and last equalities follow from (12), the second equality from Remark 2.2, the fifth equality from (13), the first inequality from the definition of the minimum, and the second inequality from property (C2).

From (14) and the arbitrariness of \((\mathbf {x},\mathbf {y})\in \mathbb {T}\), we have

$$\begin{aligned} \Vert \hat{\mathbf {y}} \Vert _0 \leqslant \min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\Vert \mathbf {y} \Vert _0, \end{aligned}$$

which implies that \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) is an optimal solution to problem (8), i.e., (1). The second part is proved. \(\square \)

Theorem 3.4

There exists a constant \(p^\sharp \in (0,1]\) such that for any fixed \(p\in (0,p^\sharp )\), there exists a pseudo-extreme-point optimal solution to problem (9), which is also an optimal solution to problem (8).

Proof

Suppose the statement is not true. Then for any fixed \(\tilde{p}\in (0, 1]\), there exists a \(\bar{p} \in (0,\tilde{p})\) such that no pseudo-extreme-point optimal solution to (9) with \(p=\bar{p}\) is an optimal solution to problem (8). In what follows, we derive a contradiction.

Define a sequence \(\mathbb {P}:=\{\tilde{p}_{1},\tilde{p}_{2},\ldots \}\) with \(1\geqslant \tilde{p}_{i}>\tilde{p}_{i+1}, i=1,2,\ldots \), and \(\lim \nolimits _{i\rightarrow +\infty }\tilde{p}_i=0\). By the above statement, for any \(\tilde{p}_{i}\in \mathbb {P}\), there exists a \(\bar{p}_i>0\) with \(\bar{p}_i<\tilde{p}_i\) such that no pseudo-extreme-point optimal solution to problem (9) with \(p=\bar{p}_i\) is an optimal solution to problem (8). Let \(\hat{p}_i=\min \{\tilde{p}_{i+1},\bar{p}_i\}\). Again by the same statement, there exists a \(\bar{p}_{i+1}>0\) with \(\bar{p}_{i+1}<\hat{p}_i\) such that no pseudo-extreme-point optimal solution to problem (9) with \(p=\bar{p}_{i+1}\) is an optimal solution to problem (8).

Noticing that \(\tilde{p}_i>\bar{p}_{i}\geqslant \hat{p}_i>\bar{p}_{i+1}>0\) for \(i=1,2,\ldots \), we obtain a sequence \(\{\bar{p}_i\}\) satisfying \(0<\bar{p}_{i+1}<\bar{p}_i< 1\) and \(\lim \nolimits _{i\rightarrow +\infty }\bar{p}_i=0\), such that no pseudo-extreme-point optimal solution to problem (9) with any \(p\in \{\bar{p}_i\}\) is an optimal solution to problem (8). This contradicts Theorem 3.3. The proof is complete. \(\square \)

In the following, we will give two main results in this section.

Theorem 3.5

Suppose that \(p^\sharp \leqslant 1\) is given by Theorem 3.4. Then there exists a constant \(p^*\in (0,p^\sharp ]\) such that for any given real number p with \(0<p<p^*\), every pseudo-extreme-point optimal solution to problem (9) is an optimal solution to problem (8).

Proof

Assume that the statement is not true. Then for an arbitrary real number \(\tilde{p}\in (0, p^\sharp ]\subseteq (0,1]\), there is a real number \(\bar{p} \in (0,\tilde{p})\) such that some pseudo-extreme-point optimal solution \((\bar{\mathbf {x}}(\bar{p}),\bar{\mathbf {y}}(\bar{p}))\) to problem (9) with \(p=\bar{p}\) is not an optimal solution to problem (8); meanwhile, by Theorem 3.4, there is another pseudo-extreme-point optimal solution \((\hat{\mathbf {x}}(\bar{p}),\hat{\mathbf {y}}(\bar{p}))\) to problem (9) with \(p=\bar{p}\) that is also an optimal solution to problem (8). In what follows, we deduce a contradiction.

Define a sequence \(\mathbb {P}:=\{\tilde{p}_{1},\tilde{p}_{2},\ldots \}\subseteq (0, p^\sharp ]\) with \(1\geqslant p^\sharp \geqslant \tilde{p}_{i}>\tilde{p}_{i+1}>0, i=1,2,\ldots \), and \(\lim \nolimits _{i\rightarrow +\infty }\tilde{p}_i=0\). By the statement above, for any \(\tilde{p}_{i}\in \mathbb {P}\), there exists \(\bar{p}_i>0\) with \(\bar{p}_i<\tilde{p}_i\) such that some pseudo-extreme-point optimal solution \((\bar{\mathbf {x}}(\bar{p}_i),\bar{\mathbf {y}}(\bar{p}_i))\) to problem (9) with \(p=\bar{p}_i\) is not an optimal solution to problem (8); meanwhile, by Theorem 3.4, there is another pseudo-extreme-point optimal solution \((\hat{\mathbf {x}}(\bar{p}_i),\hat{\mathbf {y}}(\bar{p}_i))\) to problem (9) with \(p=\bar{p}_i\) that is also an optimal solution to problem (8). Let \(\hat{p}_i=\min \{\tilde{p}_{i+1},\bar{p}_i\}\). Again by the same statement, there exists \(\bar{p}_{i+1}>0\) with \(\bar{p}_{i+1}<\hat{p}_i\) such that some pseudo-extreme-point optimal solution \((\bar{\mathbf {x}}(\bar{p}_{i+1}),\bar{\mathbf {y}}(\bar{p}_{i+1}))\) to problem (9) with \(p=\bar{p}_{i+1}\) is not an optimal solution to problem (8); meanwhile, there is another pseudo-extreme-point optimal solution \((\hat{\mathbf {x}}(\bar{p}_{i+1}),\hat{\mathbf {y}}(\bar{p}_{i+1}))\) to problem (9) with \(p=\bar{p}_{i+1}\) that is also an optimal solution to problem (8).

Noticing that \(\tilde{p}_i>\bar{p}_{i}\geqslant \hat{p}_i>\bar{p}_{i+1}>0\) for \(i=1,2,\ldots \), we obtain a sequence \(\{\bar{p}_i\}\) satisfying \(0<\bar{p}_{i+1}<\bar{p}_i\leqslant 1\) and \(\lim \nolimits _{i\rightarrow +\infty }\bar{p}_i=0\), such that for every \(p\in \{\bar{p}_i\}\), some pseudo-extreme-point optimal solution \((\bar{\mathbf {x}}(\bar{p}_i),\bar{\mathbf {y}}(\bar{p}_i))\) to problem (9) is not an optimal solution to problem (8), while another pseudo-extreme-point optimal solution \((\hat{\mathbf {x}}(\bar{p}_i),\hat{\mathbf {y}}(\bar{p}_i))\) to problem (9) is also an optimal solution to problem (8).

Since the number of pseudo-extreme points of \(\mathbb {T}\) is finite, the set of pseudo-extreme-point optimal solutions \(\left\{ (\bar{\mathbf {x}}(\bar{p}_i),\bar{\mathbf {y}}(\bar{p}_i)),(\hat{\mathbf {x}}(\bar{p}_i),\hat{\mathbf {y}}(\bar{p}_i))\right\} \) is finite; since the sequence \(\{\bar{p}_i\}\) is infinite, we conclude that there must be a subsequence \(\{\bar{p}_{i_k}\}\subseteq \{\bar{p}_i\}\) and a fixed pseudo-extreme point \((\bar{\mathbf {x}},\bar{\mathbf {y}})\) that is an optimal solution to problem (9) for every \(p=\bar{p}_{i_k}\) but is not an optimal solution to problem (8); meanwhile, there is some fixed pseudo-extreme point \((\hat{\mathbf {x}},\hat{\mathbf {y}})\) that is an optimal solution to problem (9) with \(p=\bar{p}_{i_k}\) and is also an optimal solution to problem (8). For simplicity, we write \(\bar{p}_i\) instead of \(\bar{p}_{i_k}\). Then,

$$\begin{aligned} \mathbf {1}^\top \mathbf {f}(\bar{\mathbf {y}};\bar{p}_{i})= \mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}};\bar{p}_{i})=\min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\mathbf {1}^\top \mathbf {f}(\mathbf {y};\bar{p}_{i})\; \text{ with }\; 0<\bar{p}_i<p^\sharp . \end{aligned}$$

Let \(\Vert \hat{\mathbf {y}}\Vert _{0}=s\). From above we have

$$\begin{aligned} \Vert \bar{\mathbf {y}}\Vert _{0}=\lim \limits _{\bar{p}_i\downarrow 0}\mathbf {1}^\top \mathbf {f}(\bar{\mathbf {y}};\bar{p}_{i})=\lim \limits _{\bar{p}_i\downarrow 0}\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}};\bar{p}_{i})=\Vert \hat{\mathbf {y}}\Vert _{0}=s, \end{aligned}$$

which shows that \((\bar{\mathbf {x}},\bar{\mathbf {y}})\) is also an optimal solution to problem (8). This contradicts the aforementioned statement that \((\bar{\mathbf {x}},\bar{\mathbf {y}})\) does not solve problem (8), and hence the assumption made at the beginning of the proof is false. The proof is complete. \(\square \)

Theorem 3.6

Consider sparse optimization problem (1), whose feasible set is a union of finitely many polyhedra intersected with the box \(\prod \nolimits _{i=1}^{n_1}[l_i,u_i]\times \prod \nolimits _{j=1}^{n_2}[0,1]\). There exists a constant \(p^*\in (0,1]\) such that for any given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (6) is an optimal solution to problem (1).

Proof

Notice that problems (1) and (8) are equivalent, and that problems (6) and (9) are equivalent. By virtue of Theorem 3.5, there exists a constant \(p^*\in (0,1]\) such that for any given real number p with \(0<p<p^*\), every pseudo-extreme-point optimal solution to problem (6) is an optimal solution to problem (1).

To complete the proof, we divide the feasible set \(\mathbb {T}\) [defined by (2)] of problem (6) into several subsets. First, observe that \(\mathbb {T}\) is composed of two subsets: one consists of all the (finitely many) pseudo-extreme points, and the other consists of all the (feasible) non-pseudo-extreme points; these two subsets are denoted by \(\mathbb {T}_{pseudo}\) and \(\mathbb {T}_{non\text {-}pseudo}\), respectively. Then,

$$\begin{aligned} \mathbb {T}=\mathbb {T}_{pseudo}\bigcup \mathbb {T}_{non\text {-}pseudo}. \end{aligned}$$
(15)

Furthermore, we divide the two subsets in (15) into several smaller subsets. As to \(\mathbb {T}_{pseudo}\), each pseudo-extreme point is either an optimal solution to problem (6) or a non-optimal feasible point; we denote the corresponding subsets by \(\mathbb {T}_{pseudo}^{opt}\) and \(\mathbb {T}_{pseudo}^{non\text {-}opt}\), respectively, that is,

$$\begin{aligned} \mathbb {T}_{pseudo}^{opt}:=\{(\hat{\mathbf {x}}_{opt}^{i},\hat{\mathbf {y}}_{opt}^{i})\}_{i\in I_{1}} \quad \text {and} \quad \mathbb {T}_{pseudo}^{non\text {-}opt}:=\{(\hat{\mathbf {x}}_{non\text {-}opt}^{j},\hat{\mathbf {y}}_{non\text {-}opt}^{j})\}_{j\in I_{2}}, \end{aligned}$$
(16)

where \(I_1\) and \(I_2\) are two finite index sets. Hence,

$$\begin{aligned} \mathbb {T}_{pseudo}=\mathbb {T}_{pseudo}^{opt}\bigcup \mathbb {T}_{pseudo}^{non\text {-}opt}. \end{aligned}$$
(17)

As to \(\mathbb {T}_{non\text {-}pseudo}\), which is composed of all the (feasible) non-pseudo-extreme points, we divide it into three subsets:

$$\begin{aligned} \mathbb {T}_{non\text {-}pseudo}=\mathbb {T}_{non\text {-}pseudo}^{opt}\bigcup \mathbb {T}_{non\text {-}pseudo}^{non\text {-}opt}\bigcup \mathbb {T}_{non\text {-}pseudo}^{else}, \end{aligned}$$
(18)

where \(\mathbb {T}_{non\text {-}pseudo}^{opt}\) is the feasible subset constructed from the \(\mathbf {y}\)-blocks of all the pseudo-extreme-point optimal solutions to problem (6):

$$\begin{aligned} \mathbb {T}_{non\text {-}pseudo}^{opt}:=\bigcup \limits _{i\in I_1}\{(\mathbf {x},\hat{\mathbf {y}}_{opt}^{i}):(\mathbf {x},\hat{\mathbf {y}}_{opt}^{i})\in \mathbb {T},\mathbf {x}\ne \hat{\mathbf {x}}_{opt}^{i}\}; \end{aligned}$$

\(\mathbb {T}_{non\text {-}pseudo}^{non\text {-}opt}\) is the feasible subset constructed from the \(\mathbf {y}\)-blocks of all the non-optimal pseudo-extreme points of \(\mathbb {T}\):

$$\begin{aligned} \mathbb {T}_{non\text {-}pseudo}^{non\text {-}opt}:=\bigcup \limits _{j\in I_2}\{(\mathbf {x},\hat{\mathbf {y}}_{non\text {-}opt}^{j}):(\mathbf {x},\hat{\mathbf {y}}_{non\text {-}opt}^{j})\in \mathbb {T},\mathbf {x}\ne \hat{\mathbf {x}}_{non\text {-}opt}^{j}\}; \end{aligned}$$

and \(\mathbb {T}_{non\text {-}pseudo}^{else}\) is the feasible subset whose points have \(\mathbf {y}\)-blocks differing from those of all the pseudo-extreme points of \(\mathbb {T}\), that is,

$$\begin{aligned} \mathbb {T}_{non\text {-}pseudo}^{else}:=\{(\mathbf {x},\mathbf {y}):(\mathbf {x},\mathbf {y})\in \mathbb {T},\mathbf {y}\ne \hat{\mathbf {y}}_{opt}^{i},\mathbf {y}\ne \hat{\mathbf {y}}_{non\text {-}opt}^{j},i\in I_1,j\in I_2\}. \end{aligned}$$

From (17) and (18), we observe that

$$\begin{aligned} \mathbb {T}=\mathbb {T}_{pseudo}^{opt}\bigcup \mathbb {T}_{pseudo}^{non\text {-}opt}\bigcup \mathbb {T}_{non\text {-}pseudo}^{opt} \bigcup \mathbb {T}_{non\text {-}pseudo}^{non\text {-}opt}\bigcup \mathbb {T}_{non\text {-}pseudo}^{else}. \end{aligned}$$
(19)

Notice that if \(\mathbb {T}_{non\text {-}pseudo}^{opt}\) is nonempty, then all the points in \(\mathbb {T}_{non\text {-}pseudo}^{opt}\) are optimal solutions to problem (6); furthermore, they are also optimal solutions to problem (1), since the optimal value of the objective function of problem (6) is determined by the \(\mathbf {y}\)-blocks of the pseudo-extreme-point optimal solutions. In addition, it is obvious that none of the points in \(\mathbb {T}_{pseudo}^{non\text {-}opt}\) and \(\mathbb {T}_{non\text {-}pseudo}^{non\text {-}opt}\) is an optimal solution to problem (6).

From Theorem 3.5, (19) and the discussions above, only the feasible subset \(\mathbb {T}_{non\text {-}pseudo}^{else}\) remains to be discussed. Suppose \((\tilde{\mathbf {x}},\tilde{\mathbf {y}})\) is any given (feasible) point in \(\mathbb {T}_{non\text {-}pseudo}^{else}\). Since \(\tilde{\mathbf {y}}\ne \hat{\mathbf {y}}_{opt}^{i}\) and \(\tilde{\mathbf {y}}\ne \hat{\mathbf {y}}_{non\text {-}opt}^{j}\) for all \(i\in I_1\) and \(j\in I_2\), it is not a pseudo-extreme point. Therefore, \((\tilde{\mathbf {x}},\tilde{\mathbf {y}})\) can be represented as a convex combination of some (at least two) pseudo-extreme points, among which at least two have distinct \(\mathbf {y}\)-blocks. This can be written as

$$\begin{aligned} (\tilde{\mathbf {x}},\tilde{\mathbf {y}})=\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}(\hat{\mathbf {x}}^{i},\hat{\mathbf {y}}^{i}) =\left( \sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\hat{\mathbf {x}}^{i},\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\hat{\mathbf {y}}^{i}\right) , \end{aligned}$$
(20)

where \(\tilde{I}_1\subseteq I_1,\tilde{I}_2\subseteq I_2\), and we omit all the terms that have zero coefficients; hence, each \(\lambda _{i}> 0\) and \(\sum \nolimits _{i\in \tilde{ I}_1\cup \tilde{I}_2}\lambda _{i}=1\). Let

$$\begin{aligned} \mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i_{0}};p)=\min \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i};p). \end{aligned}$$
(21)

Noticing that every \((\hat{\mathbf {x}}^{i},\hat{\mathbf {y}}^{i})\) is a pseudo-extreme point of \(\mathbb {T}\) and that at least two of these pseudo-extreme points have distinct \(\mathbf {y}\)-blocks, we have

$$\begin{aligned} \mathbf {1}^\top \mathbf {f}(\tilde{\mathbf {y}};p)&=\mathbf {1}^\top \mathbf {f}\left( \sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\hat{\mathbf {y}}^{i};p\right) =\sum \limits _{j=1}^{n_2}f\left( \sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\hat{y}^{i}_{j};p\right) \nonumber \\ &>\sum \limits _{j=1}^{n_2}\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}f(\hat{y}^{i}_{j};p)=\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\sum \limits _{j=1}^{n_2}f(\hat{y}^{i}_{j};p)\nonumber \\ &=\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i};p) \geqslant \sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i_{0}};p)\nonumber \\ &=\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i_{0}};p)\sum \limits _{i\in \tilde{I}_1\cup \tilde{I}_2}\lambda _{i}=\mathbf {1}^\top \mathbf {f}(\hat{\mathbf {y}}^{i_{0}};p)\nonumber \\ &\geqslant \min \limits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p), \end{aligned}$$
(22)

where the strict inequality follows from the strict concavity of \(f(\cdot ;p)\) together with the fact that at least two of the \(\hat{\mathbf {y}}^{i}\) are distinct; the second inequality follows from (21); and the last inequality is obvious.

From (22), we have \(\mathbf {1}^\top \mathbf {f}(\tilde{\mathbf {y}};p)>\min \nolimits _{(\mathbf {x},\mathbf {y})\in \mathbb {T}}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p)\), which implies that the point \((\tilde{\mathbf {x}},\tilde{\mathbf {y}}) \) is not an optimal solution to problem (6). Since \((\tilde{\mathbf {x}},\tilde{\mathbf {y}}) \) is arbitrary in \(\mathbb {T}_{non\text {-}pseudo}^{else}\), none of the points in \(\mathbb {T}_{non\text {-}pseudo}^{else}\) is an optimal solution to problem (6).

From all the discussions above, we conclude that each optimal solution to problem (6) with \(0<p<p^*\) must have a \(\mathbf {y}\)-block equal to that of some pseudo-extreme-point optimal solution to problem (6). This demonstrates that each optimal solution to problem (6) with \(0<p<p^*\) must be an optimal solution to problem (1), since the optimal values of the objective functions of both problems are determined by the \(\mathbf {y}\)-blocks of the pseudo-extreme-point optimal solutions of problem (6).

The proof is complete. \(\square \)

4 Several important subclasses of problem (1)

Problem (1) covers a wide range of problems. In this section, we give several important special cases of it.

4.1 Partially sparse solutions of equalities and inequalities

Consider the following partially sparse optimization problem

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; A\mathbf {x} +B \mathbf {y}= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(23)

where \(A\in \mathbb {R}^{m_1\times n_1},B\in \mathbb {R}^{m_1\times n_2},C\in \mathbb {R}^{m_2\times n_1},D\in \mathbb {R}^{m_2\times n_2}, \mathbf {b}\in \mathbb {R}^{m_1},\mathbf {c}\in \mathbb {R}^{m_2}\), \(\mathbf {l},\mathbf {u}\in \mathbb {R}^{n_1}\) satisfying \(\mathbf {l}\leqslant \mathbf {u}\), and \(\mathbf {x}\in \mathbb {R}^{n_{1}}, \mathbf {y}\in \mathbb {R}^{n_{2}}\), \((\mathbf {x},\mathbf {y})\in \mathbb {R}^N\) with \(N=n_1+n_2\).

We have the following result.

Theorem 4.1

Suppose that \(f(t;p)\in \mathscr {F}(t;p)\). We consider sparse optimization problem (23) and its nonconvex relaxation problem

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})} \sum \limits _{i=1}^{n_2}f(y_i;p)\quad \text {s.t.}\;\; A\mathbf {x} +B \mathbf {y}= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(24)

Then there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (24) is an optimal solution to (23).

Proof

Let \(\mathbb {P}=\{(\mathbf {x}, \mathbf {y})\in \mathbb {R}^{N}: A \mathbf {x} +B \mathbf {y}= \mathbf {b},C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c}\}\). Then, problem (23) can be written as

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; (\mathbf {x},\mathbf {y})\in \mathbb {P},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(25)

It is obvious that \(\mathbb {P}\) is a polyhedron in \(\mathbb {R}^{N}\). Thus, problem (25) is a special case of problem (1), and the theorem follows immediately from Theorem 3.6. \(\square \)

In the following, we further consider two special cases of problem (23).

Problem 1

We consider the problem of Compressive Sensing (see, e.g., Candès et al. 2006; Candès and Tao 2005; Foucart and Rauhut 2013):

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; \Phi \mathbf {z} = \mathbf {b}, \end{aligned}$$
(26)

where \(\Phi \in \mathbb {R}^{m\times n}\) with \(m\ll n\) is of full row rank, i.e., \(\mathop {rank}(\Phi )=m\), \(\mathbf {b}\in \mathbb {R}^m\) and \(\mathbf {z}\in \mathbb {R}^{n}\). By You et al. (2017, Theorem 3.1), there is a number \(r_0>0\) such that problem (26) can be reformulated as

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; \Phi \mathbf {z} = \mathbf {b}, \Vert \mathbf {z}\Vert _\infty \leqslant r_0. \end{aligned}$$
(27)

Let \(\mathbf {x}=\frac{1}{r_0}\mathbf {z}\) and \(A=r_0\Phi \); then problem (27) can be equivalently transformed into

$$\begin{aligned} \min _{\mathbf {x}} \Vert \mathbf {x}\Vert _0\quad \text {s.t.}\;\; A\mathbf {x} = \mathbf {b}, \Vert \mathbf {x}\Vert _\infty \leqslant 1. \end{aligned}$$
(28)

Furthermore, we introduce a variable \(\mathbf {y}\in \mathbb {R}^{n}\) such that \(|\mathbf {x}|\leqslant \mathbf {y}\), and define the block matrices \(C:=(-\mathrm{E} ~,~\mathrm{E})^{\top }\) and \(D:=(-\mathrm{E} , -\mathrm{E})^{\top }\), where \(\mathrm{E}\in \mathbb {R}^{n\times n}\) is the \(n\times n\) identity matrix; then it is easy to verify that \(|\mathbf {x}|\leqslant \mathbf {y}\) if and only if \(C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0}\). Take \(B=\mathbf {O}\), i.e., B is the \(m\times n\) zero matrix. Then, by You et al. (2017, Corollary 2.1), problem (28) can be written as

$$\begin{aligned} \min _{(\mathbf {x},\mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; A\mathbf {x}+B\mathbf {y} = \mathbf {b},C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0}, -\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(29)
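
As a quick numerical sanity check (our own sketch, not part of the paper's development), the following Python snippet builds the block matrices C and D above and confirms on random data that \(|\mathbf {x}|\leqslant \mathbf {y}\) holds exactly when \(C\mathbf {x}+D\mathbf {y}\leqslant \mathbf {0}\):

```python
# Sketch: |x| <= y  <=>  C x + D y <= 0 with C = (-E, E)^T, D = (-E, -E)^T,
# i.e., the stacked constraints -x - y <= 0 and x - y <= 0, as used in (29).
import numpy as np

n = 6
E = np.eye(n)
C = np.vstack([-E, E])
D = np.vstack([-E, -E])

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(0, 1, n)
    assert (np.abs(x) <= y).all() == (C @ x + D @ y <= 0).all()
print("|x| <= y  <=>  C x + D y <= 0 verified on random samples")
```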

It is easy to see that this problem is a special case of problem (23). To relax problem (29), we assume \(f(t;p)\in \mathscr {F}(t;p)\) and consider the following problem:

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p) \quad \text {s.t.}\;\; A\mathbf {x} +B \mathbf {y}= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0},-\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(30)

where \(\mathbf {f}(\mathbf {y};p)=(f(y_1;p),\ldots ,f(y_n;p))^\top \). Therefore, from Theorem 4.1 it follows that there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (30) is an optimal solution to problem (29).

Note that we have shown that problem (29) is equivalent to problem (26). In a similar way, we can show that problem (30) is equivalent to

$$\begin{aligned} \min _{\mathbf {z}} \mathbf {1}^\top \mathbf {f}\left( \frac{1}{r_0}|\mathbf {z}|;p\right) \quad \text {s.t.}\;\; \Phi \mathbf {z} = \mathbf {b}, \Vert \mathbf {z}\Vert _\infty \leqslant r_0, \end{aligned}$$
(31)

where \(r_0\) is given by (27). Thus, we can further obtain the following result.

Theorem 4.2

There exist a positive number \(r_0\) and a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (31) is an optimal solution to problem (26).

In particular, if we take \(f(t;p)=t^p\in \mathscr {F}(t;p)\), we can further obtain that problem (31) is equivalent to

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _p^p\quad \text {s.t.}\;\; \Phi \mathbf {z} = \mathbf {b}. \end{aligned}$$
(32)

Thus, from Theorem 4.2 it follows that there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (32) is an optimal solution to problem (26). Such a result has been obtained in Peng et al. (2015).
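
To illustrate how the \(\ell _p\) relaxation (32) can be attacked in practice, here is a minimal numerical sketch (our own; iteratively reweighted least squares with \(\varepsilon \)-smoothing is a standard heuristic for this problem, in the spirit of Chartrand (2007), and is not an algorithm proposed in this paper):

```python
# IRLS sketch for  min ||z||_p^p  s.t.  Phi z = b  with 0 < p < 1:
# each iteration solves a weighted least-norm problem in closed form.
import numpy as np

rng = np.random.default_rng(0)
m, n, k, p = 20, 50, 4, 0.5
Phi = rng.standard_normal((m, n))
z_true = np.zeros(n)
z_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = Phi @ z_true

z = np.linalg.lstsq(Phi, b, rcond=None)[0]     # least-squares initialization
eps = 1.0
for _ in range(100):
    w = (z**2 + eps) ** (p / 2 - 1)            # weights for smoothed |z_i|^p
    W_inv = np.diag(1.0 / w)
    # minimizer of sum_i w_i z_i^2 subject to Phi z = b:
    z = W_inv @ Phi.T @ np.linalg.solve(Phi @ W_inv @ Phi.T, b)
    eps = max(eps / 10.0, 1e-10)

print("recovery error:", np.linalg.norm(z - z_true))
```

For these (assumed) dimensions the sketch typically recovers the planted sparse vector, consistent with the exact-recovery behavior of \(\ell _p\)-minimization discussed above.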

Problem 2

We consider sparsest solutions of linear equalities and inequalities (see Fung and Mangasarian 2011):

$$\begin{aligned} \min _{\mathbf {x}}\Vert \mathbf {x}\Vert _0 \quad \text {s.t.}\;\; A\mathbf {x}= \mathbf {b}, F\mathbf {x}\leqslant \mathbf {d}, \Vert \mathbf {x}\Vert _\infty \leqslant 1, \end{aligned}$$
(33)

where \(A\in \mathbb {R}^{m\times n}\), \(\mathbf {b}\in \mathbb {R}^{m}\), \(F\in \mathbb {R}^{k\times n}\), \(\mathbf {d}\in \mathbb {R}^k\), and \(\mathbf {x}\in \mathbb {R}^{n}\). We introduce a variable \(\mathbf {y}\in \mathbb {R}^{n}\) such that \(|\mathbf {x}|\leqslant \mathbf {y}\) and define \(\mathbf {c}:=(\mathbf {d}^\top ,\mathbf {0}^{\top },\mathbf {0}^{\top })^\top \), \(C:=(F^\top ,-\mathrm{E} ~,~\mathrm{E})^{\top }\) and \(D:=(\mathrm{O}^\top ,-\mathrm{E} , -\mathrm{E})^{\top }\), where \(\mathrm{O}\) is a \(k\times n\) zero matrix and \(\mathrm{E}\in \mathbb {R}^{n\times n}\) is the \(n\times n\) identity matrix; then it is easy to verify that both \(F\mathbf {x}\leqslant \mathbf {d}\) and \(|\mathbf {x}|\leqslant \mathbf {y}\) hold if and only if \(C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c}\). Let \(B=\mathbf {O}\), i.e., B is the \(m\times n\) zero matrix. Then, by You et al. (2017, Corollary 2.1), problem (33) can be written as

$$\begin{aligned} \min _{(\mathbf {x},\mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; A\mathbf {x}+B\mathbf {y} = \mathbf {b},C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c}, -\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(34)

It is easy to observe that this problem is a special case of problem (23).

To relax problem (34), we assume \(f(t;p)\in \mathscr {F}(t;p)\) and consider the nonconvex relaxation of problem (34) as follows

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p) \quad \text {s.t.}\;A\mathbf {x} +B \mathbf {y}= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c},-\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(35)

where \(\mathbf {f}(\mathbf {y};p)=(f(y_1;p),\ldots ,f(y_n;p))^\top \). Therefore, from Theorem 3.6 it follows that there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (35) is an optimal solution to problem (34).

By You et al. (2017, Corollary 2.1), we observe that problem (35) is equivalent to

$$\begin{aligned} \min _{\mathbf {x}}\mathbf {1}^\top \mathbf {f}(|\mathbf {x}|;p) \quad \text {s.t.}\;A\mathbf {x} = \mathbf {b}, F\mathbf {x}\leqslant \mathbf {d},\Vert \mathbf {x}\Vert _\infty \leqslant 1, \end{aligned}$$
(36)

and that (34) is equivalent to (33). Therefore, we obtain the following result.

Theorem 4.3

There exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex relaxation problem (36) is an optimal solution to (33).

In particular, if we take \(f(t;p)=t^p\in \mathscr {F}(t;p)\), then the corresponding result was obtained in Fung and Mangasarian (2011, Proposition 1).

4.2 Partially sparsest solutions of absolute equalities and inequalities

Consider the following partially sparse optimization problem

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;|A\mathbf {x} +B \mathbf {y}|= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(37)

where \(A\in \mathbb {R}^{m_1\times n_1},B\in \mathbb {R}^{m_1\times n_2},C\in \mathbb {R}^{m_2\times n_1},D\in \mathbb {R}^{m_2\times n_2}, \mathbf {0}\leqslant \mathbf {b}\in \mathbb {R}^{m_1},\mathbf {c}\in \mathbb {R}^{m_2}\), \(\mathbf {l},\mathbf {u}\in \mathbb {R}^{n_1}\) satisfying \(\mathbf {l}\leqslant \mathbf {u}\), and \((\mathbf {x},\mathbf {y})\in \mathbb {R}^{N}=\mathbb {R}^{n_1}\times \mathbb {R}^{n_2}.\)

We have the following result.

Theorem 4.4

Assume \(f(t;p)\in \mathscr {F}(t;p)\). We consider sparse optimization problem (37) and its nonconvex relaxation problem

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})} \sum \limits _{i=1}^{n_2}f(y_i;p)\quad \text {s.t.}\;|A\mathbf {x} +B \mathbf {y}|= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(38)

Then there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex relaxation problem (38) is an optimal solution to (37).

Proof

Let

$$\begin{aligned} \mathbb {T}:=\left( \bigcup _{\varvec{\epsilon }\in \{-1,1\}^{m_1}}\mathbb {P}_{\epsilon }\right) \bigcap \{(\mathbf {x},\mathbf {y})\in \mathbb {R}^N:\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\;\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}\}, \end{aligned}$$
(39)

where each \( \mathbb {P}_{\epsilon }:= \{(\mathbf {x}, \mathbf {y})\in \mathbb {R}^{N} : A \mathbf {x} +B \mathbf {y}=\varvec{\epsilon }\circ \mathbf {b},C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {c}\} \) is obviously a polyhedron in \(\mathbb {R}^{N}\). Hence, (37) can be rewritten as

$$\begin{aligned} \quad \min _{(\mathbf {x}, \mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;(\mathbf {x}, \mathbf {y})\in \bigcup _{\varvec{\epsilon }\in \{-1,1\}^{m_1}}\mathbb {P}_{\epsilon },\;\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u},\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(40)

Thus, problem (37) is a special case of problem (1). By Theorem 3.6, the theorem holds immediately. \(\square \)
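
To make the union structure in (39)-(40) concrete, the following sketch (ours; the membership test is only for illustration) enumerates the sign vectors \(\varvec{\epsilon }\in \{-1,1\}^{m_1}\) and checks whether a given point lies in some \(\mathbb {P}_{\epsilon }\):

```python
# Since b >= 0, a point (x, y) satisfies |A x + B y| = b iff
# A x + B y = eps * b (componentwise) for some eps in {-1, 1}^{m1};
# so the feasible set splits into 2^{m1} polyhedra P_eps.
import itertools
import numpy as np

def in_union(x, y, A, B, b, C, D, c, tol=1e-9):
    r = A @ x + B @ y
    for eps in itertools.product((-1.0, 1.0), repeat=len(b)):
        if (np.abs(r - np.asarray(eps) * b) <= tol).all() \
           and (C @ x + D @ y <= c + tol).all():
            return True   # (x, y) lies in P_eps for this sign pattern
    return False
```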

In the following, we further consider a special case of problem (37).

Problem 3

We consider the problem of Phaseless Compressive Sensing (see, e.g., Voroninski and Xu 2016):

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {z}| = \mathbf {b}, \end{aligned}$$
(41)

where \(\Phi \in \mathbb {R}^{m\times n}\) with \(m\ll n\) is of full row rank, i.e., \(\mathop {rank}(\Phi )=m\), \(\mathbf {b}\in \mathbb {R}^m\) and \(\mathbf {z}\in \mathbb {R}^{n}\). By You et al. (2017, Theorem 3.2), there is a number \(r_0>0\) such that problem (41) can be reformulated as

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {z}| = \mathbf {b}, \Vert \mathbf {z}\Vert _\infty \leqslant r_0. \end{aligned}$$
(42)

Let \(\mathbf {x}=\frac{1}{r_0}\mathbf {z}\) and \(A=r_0\Phi \); then problem (42) can be equivalently transformed into

$$\begin{aligned} \min _{\mathbf {x}} \Vert \mathbf {x}\Vert _0\quad \text {s.t.}\;\; |A\mathbf {x}| = \mathbf {b}, \Vert \mathbf {x}\Vert _\infty \leqslant 1. \end{aligned}$$
(43)

Furthermore, we introduce a variable \(\mathbf {y}\in \mathbb {R}^{n}\) such that \(|\mathbf {x}|\leqslant \mathbf {y}\), and define the block matrices \(C:=(-\mathrm{E} ~,~\mathrm{E})^{\top }\) and \(D:= (-\mathrm{E} , -\mathrm{E})^{\top }\), where \(\mathrm{E}\in \mathbb {R}^{n\times n}\) is the \(n\times n\) identity matrix; then it is easy to verify that \(|\mathbf {x}|\leqslant \mathbf {y}\) if and only if \(C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0}\). Take \(B=\mathbf {O}\), i.e., B is the \(m\times n\) zero matrix. Then, by You et al. (2017, Corollary 2.1), problem (43) can be written as

$$\begin{aligned} \min _{(\mathbf {x},\mathbf {y})} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; |A\mathbf {x}+B\mathbf {y}| = \mathbf {b},C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0}, -\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}. \end{aligned}$$
(44)

It is easy to see that this problem is a special case of problem (37). To relax problem (44), we assume \(f(t;p)\in \mathscr {F}(t;p)\) and consider the following problem:

$$\begin{aligned} \min _{(\mathbf {x}, \mathbf {y})}\mathbf {1}^\top \mathbf {f}(\mathbf {y};p) \quad \text {s.t.}\;\; |A\mathbf {x} +B \mathbf {y}|= \mathbf {b}, C \mathbf {x}+D \mathbf {y}\leqslant \mathbf {0},-\mathbf {1}\leqslant \mathbf {x}\leqslant \mathbf {1}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$
(45)

where \(\mathbf {f}(\mathbf {y};p)=(f(y_1;p),\ldots ,f(y_n;p))^\top \). Therefore, from Theorem 4.4 it follows that there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (45) is an optimal solution to problem (44).

Note that we have shown that problem (44) is equivalent to problem (41). In a similar way, we can show that problem (45) is equivalent to

$$\begin{aligned} \min _{\mathbf {z}} \mathbf {1}^\top \mathbf {f}\left( \frac{1}{r_0}|\mathbf {z}|;p\right) \quad \text {s.t.}\;\; |\Phi \mathbf {z}| = \mathbf {b}, \Vert \mathbf {z}\Vert _\infty \leqslant r_0, \end{aligned}$$
(46)

where \(r_0\) is given by (42).

Thus, we can further obtain the following result.

Theorem 4.5

There exist a positive number \(r_0\) and a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (46) is an optimal solution to problem (41).

In particular, if we take \(f(t;p)=t^p\in \mathscr {F}(t;p)\), we can further obtain that problem (46) is equivalent to

$$\begin{aligned} \min _{\mathbf {z}} \Vert \mathbf {z}\Vert _p^p\quad \text {s.t.}\;\; |\Phi \mathbf {z}| = \mathbf {b}. \end{aligned}$$
(47)

Thus, from Theorem 4.5 it follows that there exists a constant \(p^*\in (0,1]\) such that for an arbitrarily given real number p with \(0<p<p^*\), every optimal solution to nonconvex minimization problem (47) is an optimal solution to problem (41). Such a result has been obtained in You et al. (2017).

4.3 Sparsest solutions of absolute equalities and inequalities

In this subsection, we consider a class of problems of finding the sparsest solutions of absolute equalities and inequalities, defined as follows. Let \(\Phi \in \mathbb {R}^{m\times n}, \mathbf {b}\in \mathbb {R}^{m} \), \(\mathbf {d}\in \mathbb {R}^{s}\), \(P\in \mathbb {R}^{s\times n}\), \(\mathbf {l}:=(l_1,\ldots ,l_n)^\top \) and \(\mathbf {u}:=(u_1,\ldots ,u_n)^\top \). We consider

$$\begin{aligned} \min \limits _{\mathbf {x}} \Vert \mathbf {x}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {x}|=\mathbf {b}, P\mathbf {x}\leqslant \mathbf {d}, \mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}. \end{aligned}$$
(48)

We do not consider the two cases \(\mathbf {0}<\mathbf {l}<\mathbf {u}\) and \(\mathbf {l}<\mathbf {u}<\mathbf {0}\): in either case every component of a feasible point is bounded away from zero, so all feasible points are solutions with objective value \(\Vert \mathbf {x}\Vert _0=n\). For convenience, we only consider the above problem when \(\mathbf {l}<\mathbf {u}\) in any one of the following four special cases: (i) \(\mathbf {l}=\mathbf {0}\) and \(\mathbf {u}>\mathbf {0}\), (ii) \(\mathbf {l}\geqslant \mathbf {0}\) and \(\mathbf {u}>\mathbf {0}\), (iii) \(\mathbf {l}<\mathbf {0}\) and \(\mathbf {u}\leqslant \mathbf {0}\), and (iv) \(\mathbf {l}<\mathbf {0}<\mathbf {u}\).

Proposition 4.1

For any given case defined above, problem (48) can be equivalently transformed to a special case of problem (37).

Proof

We divide the proof into four parts.

  (i) \(\mathbf {l}=\mathbf {0}\) and \(\mathbf {u}>\mathbf {0}\). In this case, problem (48) is

    $$\begin{aligned} \min \limits _{\mathbf {x}} \Vert \mathbf {x}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {x}|=\mathbf {b}, P\mathbf {x}\leqslant \mathbf {d}, \mathbf {0}\leqslant \mathbf {x}\leqslant \mathbf {u}. \end{aligned}$$
    (49)

    Let \(\Lambda =\mathrm{Diag}(\mathbf {u}),\mathbf {y}=\Lambda ^{{-1}}\mathbf {x},B=\Phi \Lambda \) and \(D=P\Lambda \) (a numerical sketch of this rescaling is given after the proof). Then, (49) can be written as

    $$\begin{aligned} \min \limits _{\mathbf {y}} \Vert \mathbf {y}\Vert _0\quad \text {s.t.}\;\; |B \mathbf {y}|= \mathbf {b}, D \mathbf {y}\leqslant \mathbf {d}, \mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}, \end{aligned}$$

    which can be regarded as a special case of problem (37).

  2. (ii)

    \(\mathbf {l}\geqslant \mathbf {0}\) and \(\mathbf {u}>\mathbf {0}\). Define an index set \(\mathbb {I}:=\{i\mid l_i=0,i\in [n]\}\). Then, we have \(\Phi \mathbf {x}= \Phi _{\mathbb {I}}\mathbf {x}_{\mathbb {I}}+{\Phi }_{\mathbb {I}^c}\mathbf {x}_{\mathbb {I}^c}\) and \(P \mathbf {x}=P_{\mathbb {I}}\mathbf {x}_{\mathbb {I}}+P_{\mathbb {I}^c}\mathbf {x}_{\mathbb {I}^c}\); moreover, since \(x_i\geqslant l_i>0\) for every \(i\in \mathbb {I}^c\), we have \(\Vert \mathbf {x}\Vert _0=\Vert \mathbf {x}_{\mathbb {I}}\Vert _0+|\mathbb {I}^c|\). Hence, (48) can be rewritten as

    $$\begin{aligned} \begin{array}{cl} \min \limits _{\mathbf {x}} &{} \Vert \mathbf {x}_{\mathbb {I}}\Vert _0\\ \text {s.t.} &{} |{\Phi }_{\mathbb {I}}\mathbf {x}_{\mathbb {I}}+{\Phi }_{\mathbb {I}^c}\mathbf {x}_{\mathbb {I}^c}|=\mathbf {b}, P_{\mathbb {I}}\mathbf {x}_{\mathbb {I}}+P_{\mathbb {I}^c}\mathbf {x}_{\mathbb {I}^c}\leqslant \mathbf {d}, \mathbf {0}\leqslant \mathbf {x}_{\mathbb {I}}\leqslant {\mathbf {u}}_{\mathbb {I}} ,\mathbf {l}_{\mathbb {I}^c}\leqslant \mathbf {x}_{\mathbb {I}^c}\leqslant \mathbf {u}_{\mathbb {I}^c}, \end{array} \end{aligned}$$
    (50)

    which is a special case of problem (4). By Remark 2.1, (50) can be equivalently transformed to a special case of (1), more specifically, to (37).

  3. (iii)

    \(\mathbf {l}<\mathbf {0}\) and \(\mathbf {u}\leqslant \mathbf {0}\). In this case, let \(\bar{\mathbf {x}}=-\mathbf {x}\), \(\bar{\Phi }=-\Phi \), \(\bar{P}=-P\), \(\bar{\mathbf {l}}=-\mathbf {u}\) and \(\bar{\mathbf {u}}=-\mathbf {l}\); then \(\bar{\mathbf {l}}\geqslant \mathbf {0}\) and \(\bar{\mathbf {u}}>\mathbf {0}\), and hence, (48) can be rewritten as

    $$\begin{aligned} \min _{\bar{\mathbf {x}}}\Vert \bar{\mathbf {x}}\Vert _0\quad \text {s.t.}\;\; |\bar{\Phi }\bar{\mathbf {x}}|=\mathbf {b}, \bar{P}\bar{\mathbf {x}}\leqslant \mathbf {d}, \bar{\mathbf {l}}\leqslant \bar{\mathbf {x}}\leqslant \bar{\mathbf {u}}, \end{aligned}$$

    which reduces to case (ii).

  4. (iv)

    \(\mathbf {l}<\mathbf {0}<\mathbf {u}\). In this case, we introduce a variable \(\mathbf {z}\in \mathbb {R}^n\) such that \(|\mathbf {x}|\leqslant \mathbf {z}\), and let \(\mathbf {v}=\max \{-\mathbf {l}, \mathbf {u}\}\). Since \(\mathbf {sign}(\mathbf {z})\) is monotonically nondecreasing and the objective function satisfies \(\Vert \mathbf {z}\Vert _0=\mathbf {1}^\top \mathbf {sign}(\mathbf {z})\), by You et al. (2017, Corollary 2.1), we have the following equivalent form:

    $$\begin{aligned} \min _{(\mathbf {x}, \mathbf {z})} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {x}|= \mathbf {b}, P\mathbf {x}\leqslant \mathbf {d}, |\mathbf {x}|\leqslant \mathbf {z}, \mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u} ~\text {and}~ \mathbf {0}\leqslant \mathbf {z}\leqslant \mathbf {v}. \end{aligned}$$
    (51)

    Let \(\Psi \) be the m by n zero matrix, \(\Theta =(-{E},~~{E},~~{P}^\top )^{\top }\) and \(\Gamma = (-{E},~-{E},~{O})^{\top }\), where \({E}\) is the n by n identity matrix and \({O}\) is the n by s zero matrix, and let \(\mathbf {c}=(\mathbf {0}^\top ,~~\mathbf {0}^\top ,~~\mathbf {d}^\top )^\top \). (Note that the second block of \(\Gamma \) must be \(-E\) so that the rows \(\mathbf {x}-\mathbf {z}\leqslant \mathbf {0}\) and \(-\mathbf {x}-\mathbf {z}\leqslant \mathbf {0}\) together encode \(|\mathbf {x}|\leqslant \mathbf {z}\).) Then, problem (51) can be rewritten as

    $$\begin{aligned} \min _{(\mathbf {x}, \mathbf {z})} \Vert \mathbf {z}\Vert _0\quad \text {s.t.}\;\; |\Phi \mathbf {x} +\Psi \mathbf {z}|= \mathbf {b}, \Theta \mathbf {x}+\Gamma \mathbf {z}\leqslant \mathbf {c},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \mathbf {0}\leqslant \mathbf {z}\leqslant \mathbf {v}. \end{aligned}$$
    (52)

    By Remark 2.1, (52) can be equivalently transformed to a special case of (1), more specifically, to (37).

Combining parts (i)–(iv), we complete the proof. \(\square \)
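To make the rescaling in case (i) concrete, the following minimal sketch (with randomly generated illustrative data; all names are ours, not the paper's) checks that \(\mathbf {y}=\Lambda ^{-1}\mathbf {x}\) maps the box \(\mathbf {0}\leqslant \mathbf {y}\leqslant \mathbf {1}\) onto \(\mathbf {0}\leqslant \mathbf {x}\leqslant \mathbf {u}\) while preserving the constraint values.

```python
import numpy as np

# Sketch of the case (i) reduction: rescale by Lambda = Diag(u).
rng = np.random.default_rng(0)
m, s, n = 3, 2, 5
Phi = rng.standard_normal((m, n))
P = rng.standard_normal((s, n))
u = rng.uniform(0.5, 2.0, n)            # u > 0, l = 0 (illustrative data)

Lam = np.diag(u)                        # Lambda = Diag(u)
B, D = Phi @ Lam, P @ Lam               # data of the reduced problem

# Any y with 0 <= y <= 1 corresponds to x = Lam y with 0 <= x <= u,
# and the constraint values coincide: |B y| = |Phi x|, D y = P x.
y = rng.uniform(0.0, 1.0, n)
x = Lam @ y
assert np.allclose(np.abs(B @ y), np.abs(Phi @ x))
assert np.allclose(D @ y, P @ x)
assert np.all((0 <= x) & (x <= u + 1e-12))
print("case (i) rescaling verified")
```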

By Proposition 4.1, Theorem 4.4 and You et al. (2017, Corollary 2.1), with notation replaced where necessary, we obtain the following result.

Theorem 4.6

Consider problem (48) with bounds \(\mathbf {l}<\mathbf {u}\) satisfying one of the four special cases defined above. Let \(\mathbb {I}:=\{i:l_i u_i\leqslant 0,i\in [n]\}\), \(\mathbf {v}:=\max \{|\mathbf {l}_{\mathbb {I}}|,|\mathbf {u}_{\mathbb {I}}|\}\) and \(\Lambda :=\mathrm {Diag}(\mathbf {v})\). Then, for the following nonconvex minimization problem

$$\begin{aligned} \min _{\mathbf {x}} \mathbf {1}^{\top }\mathbf {f}(\Lambda ^{-1}|\mathbf {x}_{\mathbb {I}}|;p)\quad \text {s.t.}\;\; |\Phi \mathbf {x}|= \mathbf {b}, P \mathbf {x}\leqslant \mathbf {d},\mathbf {l}\leqslant \mathbf {x}\leqslant \mathbf {u}, \end{aligned}$$
(53)

where \(\mathbf {f}(\mathbf {y};p)=(f(y_i;p))_{i\in \mathbb {I}}^\top \) with any given \(f(t;p)\in \mathscr {F}(t;p)\), there exists a constant \(p^*\in (0,1]\) such that for any given real number p with \(0<p<p^*\), every optimal solution to the nonconvex minimization problem (53) is an optimal solution to problem (48).
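As a small illustration of how the objective of (53) is assembled, the following sketch (the data and helper names are illustrative choices of ours; the bounds realize case (ii)) computes \(\mathbf {1}^\top \mathbf {f}(\Lambda ^{-1}|\mathbf {x}_{\mathbb {I}}|;p)\) for two members of \(\mathscr {F}(t;p)\) used later in Example 4.1.

```python
import numpy as np

# Sketch of the objective of (53): only indices whose sign is not fixed
# by the box (l_i u_i <= 0) enter the sparsity-promoting term.
def objective_53(x, l, u, f, p):
    I = np.where(l * u <= 0)[0]                  # I = {i : l_i u_i <= 0}
    v = np.maximum(np.abs(l[I]), np.abs(u[I]))   # v = max{|l_I|, |u_I|}
    return np.sum(f(np.abs(x[I]) / v, p))        # 1^T f(Lambda^{-1}|x_I|; p)

f_lp = lambda t, p: t ** p                       # f(t;p) = t^p in F(t;p)
f_frac = lambda t, p: t / (t + p)                # f(t;p) = t/(t+p) in F(t;p)

l = np.array([0.0, 0.0, 0.5])                    # x_3 >= 0.5 never vanishes,
u = np.array([1.0, 1.0, 2.0])                    # so index 3 drops out of I
x = np.array([1.0, 0.0, 1.0])
print(objective_53(x, l, u, f_lp, 0.5))          # 1.0
print(objective_53(x, l, u, f_frac, 0.5))        # 2/3
```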

Next, we give an example to illustrate Theorem 4.6.

Example 4.1

We consider the 2-dimensional problem:

$$\begin{aligned} \min \limits _{\mathbf {x}\in \mathbb {R}^2}\Vert \mathbf {x} \Vert _0\quad \text {s.t.}\;\; |x_1-2x_2|=1, 2x_1-x_2\leqslant -1, \Vert \mathbf {x} \Vert _\infty \leqslant 1, \end{aligned}$$
(54)

where \( \mathbf {x}=(x_1,x_2)^{\top } \).

We apply two specific nonconvex functions and consider the corresponding nonconvex relaxation problems to confirm the result obtained in Theorem 4.6.

Case 1. We take \(f(t;p)=t^p\in \mathscr {F}(t;p)\), and consider nonconvex relaxation problem

$$\begin{aligned} \min \limits _{\mathbf {x}\in \mathbb {R}^2}\Vert \mathbf {x} \Vert _{p}^{p}\quad \text {s.t.}\; |x_1-2x_2|=1, 2x_1-x_2\leqslant -1, \Vert \mathbf {x} \Vert _\infty \leqslant 1. \end{aligned}$$
(55)
Fig. 1 All feasible points of problem (54): the point \((-1,-1)^\top \) and the points on the bold line segment, with the point \((-1,0)^\top \) being the (sparsest) optimal solution to the original problem (54)

The equality constraint implies \(x_1-2x_2=-1\) or \(x_1-2x_2=1\). Accordingly, the feasible set (see Fig. 1), denoted by \(\mathbb {D}\), can be written as \(\mathbb {D}=\mathbb {D}_1\bigcup \mathbb {D}_2\), where

$$\begin{aligned} \mathbb {D}_1=\{(x_1,x_2)^{\top }\mid x_1-2x_2=-1,2x_1-x_2\leqslant -1,-1\leqslant x_i\leqslant 1,i=1,2 \} \end{aligned}$$

and

$$\begin{aligned} \mathbb {D}_2=\{(x_1,x_2)^{\top }\mid x_1-2x_2=1,2x_1-x_2\leqslant -1,-1\leqslant x_i\leqslant 1,i=1,2\}. \end{aligned}$$

Note that the objective function \(\Vert \mathbf {x} \Vert _{p}^{p}=|x_1|^p+|x_2|^p\) with \(0<p<1\). It follows that problem (55) is equivalent to

$$\begin{aligned} \min \left\{ \min \limits _{\mathbf {x}\in \mathbb {D}_1}|x_1|^p+|x_2|^p,\min \limits _{\mathbf {x}\in \mathbb {D}_2}|x_1|^p+|x_2|^p\right\} . \end{aligned}$$
(56)

Simple computations imply that

$$\begin{aligned} \mathbb {D}_1=\left\{ (2x_2-1,x_2)^{\top }\mid 0\leqslant x_2\leqslant \frac{1}{3} \right\} \quad \text{ and }\quad \mathbb {D}_2=\left\{ (-1,-1)^{\top }\right\} . \end{aligned}$$

Therefore, (56) can be rewritten as

$$\begin{aligned}&\min \left\{ \min \limits _{ 0\leqslant x_2\leqslant \frac{1}{3}}|2x_2-1|^p+|x_2|^p,2\right\} \nonumber \\&\quad =\min \left\{ \min \limits _{ 0\leqslant x_2\leqslant \frac{1}{3}}(1-2x_2)^p+x_2^p\;,2\right\} . \end{aligned}$$
(57)

Consider the subproblem in (57):

$$\begin{aligned} \min \,(1-2x_2)^p+x_{2}^{p}\quad \text {s.t.}\;\; 0\leqslant x_2\leqslant \frac{1}{3}. \end{aligned}$$
(58)
Fig. 2 Objective functions of (58) for three specific values of p, all concave over the closed interval \([0,\frac{1}{3}]\); \(p^*\) is the constant claimed in our main results in Sect. 3

When \(0<p<1\), problem (58) is a one-dimensional concave minimization problem over a closed interval (see Fig. 2), so its minimum is attained at an endpoint; the objective values at the endpoints 0 and \(\frac{1}{3}\) are 1 and \(2\left( \frac{1}{3}\right) ^p\), respectively. Hence, (57) is equivalent to

$$\begin{aligned} \min \left\{ 1,2\left( \frac{1}{3}\right) ^p,2\right\} =\min \left\{ 1,2\left( \frac{1}{3}\right) ^p\right\} . \end{aligned}$$
(59)

Note that when \(x_2=0\), we have \(x_1=2x_2-1=-1\); and when \(x_2=\frac{1}{3}\), we have \(x_1=2x_2-1=-\frac{1}{3}\); that is, the candidate optimal points are \((-1,0)^\top \) and \((-\frac{1}{3},\frac{1}{3})^\top \). For \((-1,0)^\top \) to be an optimal solution to (55), and thus a (sparsest) optimal solution to the original problem (54), the objective value of problem (55) at \((-1,0)^\top \) must be strictly smaller than that at \((-\frac{1}{3},\frac{1}{3})^\top \). That is, we require \(1<2(\frac{1}{3})^p\), which yields \(0<p<\frac{\ln 2}{\ln 3}\). Thus, we find \(p^*=\frac{\ln 2}{\ln 3}\).

In sum, there exists \(p^*=\frac{\ln 2}{\ln 3}\) such that for any p with \(0<p<p^*\), an optimal solution to the nonconvex relaxation problem (55) is \((-1,0)^\top \), which is also an optimal solution to the original problem (54). This confirms our results obtained in Sect. 3.
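As a quick numerical sanity check of Case 1 (a sketch; the grid search and all helper names are ours), one can minimize the objective of (55) over the parametrized feasible set \(\mathbb {D}_1\cup \mathbb {D}_2\) and observe the threshold behaviour at \(p^*=\frac{\ln 2}{\ln 3}\).

```python
import numpy as np

# Grid-search check of Case 1. The feasible set of (55) is
# D1 = {(2t-1, t) : 0 <= t <= 1/3} together with D2 = {(-1, -1)}.
def lp_min_point(p, grid=100001):
    t = np.linspace(0.0, 1.0 / 3.0, grid)
    vals = np.abs(2 * t - 1) ** p + np.abs(t) ** p   # objective on D1
    k = np.argmin(vals)
    if vals[k] <= 2.0:                               # objective at (-1,-1) is 2
        return np.array([2 * t[k] - 1, t[k]])
    return np.array([-1.0, -1.0])

p_star = np.log(2) / np.log(3)
print(lp_min_point(0.9 * p_star))   # approx (-1, 0): the sparsest solution
print(lp_min_point(1.1 * p_star))   # approx (-1/3, 1/3): sparsity is lost
```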

Case 2. We take \(f(t;p)=\frac{t}{t+p}\in \mathscr {F}(t;p)\), and consider nonconvex relaxation problem

$$\begin{aligned} \min \limits _{\mathbf {x}\in \mathbb {R}^2}\left( \frac{|x_1|}{|x_1|+p}+\frac{|x_2|}{|x_2|+p}\right) \quad \text {s.t.}\;\; |x_1-2x_2|=1, 2x_1-x_2\leqslant -1, \Vert \mathbf {x} \Vert _\infty \leqslant 1. \end{aligned}$$
(60)

The feasible set is the same as in Case 1 (see Fig. 1). Hence, problem (60) is equivalent to

$$\begin{aligned} \min \left\{ \min \limits _{\mathbf {x}\in \mathbb {D}_1}\left( \frac{|x_1|}{|x_1|+p}+\frac{|x_2|}{|x_2|+p}\right) , \min \limits _{\mathbf {x}\in \mathbb {D}_2}\left( \frac{|x_1|}{|x_1|+p}+\frac{|x_2|}{|x_2|+p}\right) \right\} . \end{aligned}$$
(61)

Using the parametrizations of \(\mathbb {D}_1\) and \(\mathbb {D}_2\) obtained in Case 1, (61) can be rewritten as

$$\begin{aligned}&\min \left\{ \min \limits _{ 0\leqslant x_2\leqslant \frac{1}{3}}\frac{|2x_2-1|}{|2x_2-1|+p}+\frac{|x_2|}{|x_2|+p}\;,\;\frac{2}{1+p}\right\} \nonumber \\&\quad =\min \left\{ \min \limits _{ 0\leqslant x_2\leqslant \frac{1}{3}}\frac{1-2x_2}{1-2x_2+p}+\frac{x_2}{x_2+p}\;,\;\frac{2}{1+p}\right\} . \end{aligned}$$
(62)

Consider the subproblem in (62):

$$\begin{aligned} \min \,f(x_2)=\frac{1-2x_2}{1-2x_2+p}+\frac{x_2}{x_2+p}\quad \text {s.t.}\; 0\leqslant x_2\leqslant \frac{1}{3}. \end{aligned}$$
(63)
Fig. 3 Objective functions of (63) for three specific values of p, all concave over the closed interval \([0,\frac{1}{3}]\); \(p^*\) is the constant claimed in our main results in Sect. 3

When \(0<p<1\), problem (63) is a one-dimensional concave minimization problem over a closed interval (see Fig. 3), so its minimum is attained at an endpoint; the objective values at the endpoints 0 and \(\frac{1}{3}\) are \(\frac{1}{1+p}\) and \(\frac{2}{1+3p}\), respectively. Hence, (62) is equivalent to

$$\begin{aligned} \min \left\{ f(0),\,f\left( \frac{1}{3}\right) ,\,\frac{2}{1+p}\right\} =\min \left\{ \frac{1}{1+p},\,\frac{2}{1+3p}\right\} . \end{aligned}$$
(64)

Note that when \(x_2=0\), we have \(x_1=2x_2-1=-1\); and when \(x_2=\frac{1}{3}\), we have \(x_1=2x_2-1=-\frac{1}{3}\); that is, the candidate optimal points are \((-1,0)^\top \) and \((-\frac{1}{3},\frac{1}{3})^\top \). For \((-1,0)^\top \) to be an optimal solution to (60), and thus a (sparsest) optimal solution to the original problem (54), the objective value of problem (60) at \((-1,0)^\top \) must be strictly smaller than that at \((-\frac{1}{3},\frac{1}{3})^\top \). That is, we require \(\frac{1}{1+p}<\frac{2}{1+3p}\), which yields \(0<p<1\). Thus, we find \(p^*=1\).

In sum, there exists \(p^*=1\) such that for any p with \(0<p<p^*\), an optimal solution to the nonconvex relaxation problem (60) is \((-1,0)^\top \), which is also an optimal solution to the original problem (54). This confirms our results obtained in Sect. 3.
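The analogous check for Case 2 (a sketch under the same assumptions as the Case 1 sketch above) recovers the sparsest point for every p below \(p^*=1\).

```python
import numpy as np

# Grid-search check of Case 2 with f(t;p) = t / (t + p).
def frac_min_point(p, grid=100001):
    t = np.linspace(0.0, 1.0 / 3.0, grid)
    g = lambda s: np.abs(s) / (np.abs(s) + p)
    vals = g(2 * t - 1) + g(t)                       # objective on D1
    k = np.argmin(vals)
    if vals[k] <= 2.0 / (1.0 + p):                   # objective at (-1,-1)
        return np.array([2 * t[k] - 1, t[k]])
    return np.array([-1.0, -1.0])

for p in (0.3, 0.7, 0.99):                           # any 0 < p < p* = 1
    print(p, frac_min_point(p))                      # approx (-1, 0) each time
```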

Remark 4.1

The two cases discussed above not only confirm the validity of one of our results, but also show that different types of nonconvex relaxation lead to different values of the constant \(p^*\), and these values are not so small as to be useless in applications.

Additionally, if we solve the following \( \ell _1 \)-norm minimization problem

$$\begin{aligned} \min \limits _{\mathbf {x}\in \mathbb {R}^2}\Vert \mathbf {x} \Vert _1\quad \text {s.t.}\;\; |x_1-2x_2|=1, 2x_1-x_2\leqslant -1, \Vert \mathbf {x} \Vert _\infty \leqslant 1, \end{aligned}$$
(65)

then we obtain the unique solution \((-\frac{1}{3},\frac{1}{3})^\top \), which is not a solution of the original problem (54). Indeed, on \(\mathbb {D}_1\) we have \(\Vert \mathbf {x}\Vert _1=(1-2x_2)+x_2=1-x_2\), minimized at \(x_2=\frac{1}{3}\) with value \(\frac{2}{3}\), while \(\Vert \mathbf {x}\Vert _1=2\) at the single point of \(\mathbb {D}_2\). This implies that a solution of the \(\ell _0\)-norm minimization problem (54) cannot be obtained by solving the corresponding \(\ell _1\)-minimization problem (65).

5 Conclusions

In this paper, we studied a partially sparsest optimization problem, whose feasible set is a union of finitely many polytopes, via a nonconvex relaxation method based on a class of concave functions with a parameter p. We discussed the relationship between its optimal solution set and that of its nonconvex relaxation problem. We showed that there is a positive constant \(p^*\leqslant 1\) such that, for any given parameter \(0<p< p^*\), every optimal solution to the nonconvex relaxation is also an optimal solution of the original sparse optimization problem. Whether an analytic expression of such a constant \(p^*\) can be derived remains open. In addition, it is important to design effective algorithms for solving the nonconvex relaxation. These are important issues to be studied in the future.