A Continuous Relaxation of the Constrained $\ell _2-\ell _0$ Problem

Bechensteen, Arne Henrik; Blanc-Féraud, Laure; Aubert, Gilles

doi:10.1007/s10851-020-01014-y


Published: 09 January 2021

Volume 63, pages 472–491, (2021)

Journal of Mathematical Imaging and Vision


Arne Henrik Bechensteen ORCID: orcid.org/0000-0002-5744-6244¹,
Laure Blanc-Féraud¹ &
Gilles Aubert²


Abstract

We focus on the minimization of the least square loss function under a k-sparse constraint encoded by a $\ell _0$ pseudo-norm. This is a non-convex, non-continuous and NP-hard problem. Recently, for the penalized form (sum of the least square loss function and a $\ell _0$ penalty term), a relaxation has been introduced which has strong results in terms of minimizers. This relaxation is continuous and does not change the global minimizers, among other favorable properties. The question that has driven this paper is the following: can a continuous relaxation of the k-sparse constraint problem be developed following the same idea and same steps as for the penalized $\ell _2-\ell _0$ problem? We calculate the convex envelope of the constrained problem when the observation matrix is orthogonal and propose a continuous non-smooth, non-convex relaxation of the k-sparse constraint functional. We give some equivalence of minimizers between the original and the relaxed problems. The subgradient is calculated as well as the proximal operator of the new regularization term, and we propose an algorithm that ensures convergence to a critical point of the k-sparse constraint problem. We apply the algorithm to the problem of single-molecule localization microscopy and compare the results with well-known sparse minimization schemes. The results of the proposed algorithm are as good as the state-of-the-art results for the penalized form, while fixing the constraint constant is usually more intuitive than fixing the penalty parameter.


1 Introduction

In this paper, we consider the constrained $\ell _2-\ell _0$ problem:

$$\begin{aligned} \min _{x\in {\mathbb {R}}^N} \frac{1}{2}\Vert Ax-d\Vert ^2 \text { such that }\Vert x\Vert _0\le k \end{aligned}$$

(1)

where $A\in {\mathbb {R}}^{M\times N}$ is an observation matrix, $d\in {\mathbb {R}}^M$ is the data, and $\Vert \cdot \Vert _0 $ is, by abuse of terminology, referred to as the $\ell _0$-norm:

$$\begin{aligned} \Vert x\Vert _0=\#\{x_i , i=1,\cdots , N : x_i\ne 0\} \end{aligned}$$

with $\#S$ defined as the number of elements in S. This formulation ensures that the solution ${\hat{x}}$ has at most k nonzero entries. This type of problem appears in many applications, such as source separation, machine learning, and single-molecule localization microscopy. These problems are often underdetermined, i.e., $M\ll N$. A more widely studied sparse problem is the penalized $\ell _2-\ell _0$ problem:

$$\begin{aligned} \min _{x\in {\mathbb {R}}^N} \frac{1}{2}\Vert Ax-d\Vert ^2 +\lambda \Vert x\Vert _0 \end{aligned}$$

(2)

where $\lambda \in {\mathbb {R}}_{\ge 0}$ is a trade-off parameter. Even though the formulations (1) and (2) are similar, they are not equivalent (see, for example, [25] for a theoretical comparison). These problems also differ in their sparsity parameter. With the $\lambda $ parameter, it is not possible to know the sparsity of the solution without testing it. The constrained problem does not have this drawback, as k bounds the number of nonzero components. However, both problems are non-convex, non-smooth and NP-hard. In the following paragraphs, we outline the different methods used to solve (1) and (2).
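To illustrate the combinatorial nature of problem (1), the following minimal sketch solves it by exhaustive search over all supports of size at most k, which is only feasible for very small N. The function name `solve_constrained_l0_bruteforce` is hypothetical and not taken from the paper.

```python
from itertools import combinations

import numpy as np


def solve_constrained_l0_bruteforce(A, d, k):
    """Exhaustively solve min 0.5*||Ax - d||^2 s.t. ||x||_0 <= k (tiny N only)."""
    n = A.shape[1]
    best_x = np.zeros(n)
    best_val = 0.5 * np.dot(d, d)  # objective value of the all-zero candidate
    for size in range(1, k + 1):
        for support in combinations(range(n), size):
            cols = list(support)
            # Least-squares fit restricted to the chosen support.
            coef, *_ = np.linalg.lstsq(A[:, cols], d, rcond=None)
            x = np.zeros(n)
            x[cols] = coef
            val = 0.5 * np.sum((A @ x - d) ** 2)
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val
```

The number of supports visited grows like the binomial coefficient of N and k, which is why the methods outlined next avoid enumeration.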

Greedy algorithms Greedy algorithms are designed to solve problems of the form (1). These algorithms start with a zero initialization and add one component to the signal x at each iteration until the desired sparsity is obtained. Among them, we find the matching pursuit (MP) algorithm [23] and the orthogonal matching pursuit (OMP) [26]. Newer algorithms add and subtract components at each iteration, among them the greedy sparse simplex algorithm [3] and single best replacement (SBR) [37].
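As an illustration of the greedy idea, here is a minimal sketch of orthogonal matching pursuit, assuming A has normalized columns. It follows the textbook description of OMP (select the column most correlated with the residual, then refit by least squares) and is not the implementation of [26]; the function name `omp` is an assumption.

```python
import numpy as np


def omp(A, d, k):
    """OMP sketch for min 0.5*||Ax - d||^2 s.t. ||x||_0 <= k."""
    n = A.shape[1]
    support = []
    x = np.zeros(n)
    residual = np.asarray(d, dtype=float).copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = -np.inf  # never pick a column twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], d, rcond=None)
        x = np.zeros(n)
        x[support] = coef
        residual = d - A @ x
    return x
```

Each iteration adds exactly one component, so the output has at most k nonzero entries by construction.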

Mathematical program with equilibrium constraint Another way to solve a sparse optimization problem is to introduce auxiliary variables that simulate the behavior of the $\ell _0$-norm and to add a constraint linking the primary and auxiliary variables. The resulting problem is called a mathematical program with equilibrium constraint. Mixed integer reformulations [8] and Boolean relaxation [28] are two among the many algorithms based on this method. Algorithms belonging to these families have been proposed to solve sparse problems (see [6, 22], for example). A recent paper [2] proved the exactness of a reformulation of the constrained $\ell _2-\ell _0$ problem and demonstrated its performance on single-molecule localization microscopy.

Relaxations An alternative to working with the non-convex $\ell _0$-norm is to replace it with the convex $\ell _1$-norm. This is called convex relaxation, but the original and the convex relaxed problems are equivalent in terms of minimizers only under strict assumptions such as the RIP conditions [11]. Furthermore, $\Vert x\Vert _1$ penalizes not only the number of components in x but also their magnitude. Thus, the $\ell _0$-norm and $\ell _1$-norm are very different when x contains large values. Non-smooth, non-convex but continuous relaxations were primarily introduced to avoid this difference. These relaxations are still non-convex, and the convergence of the algorithms to a global minimum is not assured. Some of the non-convex continuous relaxations are the nonnegative garrote [9], the log-sum penalty [12] and capped-$\ell _1$ [27], to mention a few. The continuous exact $\ell _0$ penalty (CEL0) introduced in [35] is an exact relaxation for problem (2), and a unified view of these functions is given in [36]. A recent convex relaxation has been proposed in [33], which replaces the $\ell _0$-norm with a non-convex term, but where the sum of the data-fitting term and the relaxation is convex. Relaxation of the constrained $\ell _2-\ell _0$ problem is less studied. However, the fixed rank problem and its convex envelope have been presented in [1], and the problem has certain similarities with the constrained $\ell _2-\ell _0$ problem.

Contributions and outline The paper presents and studies a non-smooth and non-convex relaxation of the constrained problem (1). Following the procedure used to design the CEL0 relaxation of problem (2) [35], we explore whether an equivalent continuous relaxation can be found for (1). The next section shows the computation of the convex envelope of the constrained $\ell _2-\ell _0$ formulation in the case of orthogonal matrices. The convex envelope yields the squared norm plus a penalty term that we name Q(x). Note that the expression of Q(x) could be obtained by applying the quadratic envelope presented in [14], choosing the right parameters. In other words, the present paper provides exact relaxation properties of the quadratic envelope [14] in a new regime that goes beyond those previously identified in [14]. In particular, our results are independent of A. This will be discussed later in Sect. 3. In the same section, the relaxed formulation is investigated as a continuous relaxation of the initial problem for any matrix A. We prove some basic properties of Q(x) to show that the relaxation favors k-sparse vectors. The relaxation does not always ensure a k-sparse solution, but it promotes sparsity. We show that if a minimizer of the relaxed problem is k-sparse, then it is also a minimizer of the initial problem. We propose an algorithm to minimize the relaxed formulation using an accelerated FBS method, and we add a “fail-safe” strategy which ensures convergence to a critical point of the initial problem. The relaxation and its associated algorithm are applied to the problem of single-molecule localization microscopy and compared to other state-of-the-art algorithms in $\ell _2-\ell _0$ minimization.

Notations and Assumption

$A\in {\mathbb {R}}^{M \times N}$ is an $M \times N$ matrix.
The vector $x^{\downarrow }\in {\mathbb {R}}^N$ is the vector x where its components are sorted by their magnitude, i.e., $|x^{\downarrow }_1|\ge |x^{\downarrow }_2|\ge \dots \ge |x^{\downarrow }_N|$.
Let $P^{(y)}\in {\mathbb {R}}^{N\times N}$ be a permutation matrix such that $P^{(y)}y=y^{\downarrow }$; we denote the vector $x^{\downarrow y}= P^{(y)}x$.
$a_i$ is the ith column of A. We suppose $\Vert a_i\Vert \ne 0\, \forall i$.
The indicator function $\chi _{_X}$ is defined for $X\subset {\mathbb {R}}^N$ as
$$\begin{aligned} \chi _{_X}(x)={\left\{ \begin{array}{ll} +\infty \text { if } x\notin X\\ 0 \text { if } x\in X. \end{array}\right. } \end{aligned}$$
${{\,\mathrm{sign}\,}}^*(x)$ is the function ${{\,\mathrm{sign}\,}}$ for $x\ne 0$ and ${{\,\mathrm{sign}\,}}^*(0)=\{-1,1\}$.
${\mathbb {R}}_{\ge 0}^N$ denotes the space $\{x\in {\mathbb {R}}^N | x_i\ge 0, \forall i\}$.

Proposition 1

We can suppose that $\Vert a_i\Vert _2=1, \,\, \forall \,\, i$, without loss of generality.

Proof

The proof is based on the fact that the $\ell _0$-norm is invariant under multiplication of each component by a nonzero factor. Let $\Lambda _{\Vert a_i\Vert }$ and $\Lambda _\frac{1}{\Vert a_i\Vert }$ be the diagonal matrices with $\Vert a_i\Vert $ (respectively, $1/\Vert a_i\Vert $) on their diagonals, and let $z=\Lambda _{\Vert a_i\Vert } x$. Then $\Vert \Lambda _\frac{1}{\Vert a_i\Vert }z\Vert _0=\Vert z\Vert _0=\Vert x\Vert _0$, and thus,

$$\begin{aligned}&\mathop {{\mathrm{arg\, min}}}\limits _x \frac{1}{2}\Vert Ax-d\Vert _2^2+\chi _{\Vert \cdot \Vert _0\le k}(x)\\&\quad =\Lambda _{\frac{1}{\Vert a_i\Vert }}\mathop {{\mathrm{arg\, min}}}\limits _z\frac{1}{2}\Vert A_n z-d\Vert ^2_2+\chi _{\Vert \cdot \Vert _0\le k}(z) \end{aligned}$$

where $A_n$ is a matrix deduced from A where the norm of each column is 1. $\square $

We assume therefore that A has normalized columns throughout this paper.
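The change of variables in Proposition 1 can be checked numerically. The sketch below is illustrative only: `normalize_columns` is a hypothetical helper, and the matrix and vector are arbitrary sample values.

```python
import numpy as np


def normalize_columns(A):
    """Return A_n = A @ diag(1/||a_i||) together with the column norms ||a_i||."""
    norms = np.linalg.norm(A, axis=0)
    return A / norms, norms


A = np.array([[3.0, 2.0], [1.0, 3.0]])
A_n, norms = normalize_columns(A)

x = np.array([0.0, 1.5])   # a 1-sparse vector
z = norms * x              # z = Lambda_{||a_i||} x

# A x and A_n z coincide, and the rescaling does not change the support.
same_data_fit = np.allclose(A @ x, A_n @ z)
same_support = np.count_nonzero(z) == np.count_nonzero(x)
x_back = z / norms         # x = Lambda_{1/||a_i||} z
```

A minimizer z of the normalized problem is mapped back to a minimizer of the original problem by dividing each component by the corresponding column norm.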

2 The Convex Envelope of the Constrained $\ell _2-\ell _0$ Problem when A is Orthogonal

In this section, we are interested in the case where A is an orthogonal matrix, i.e., $<a_j,a_i>=0, \forall \, i\ne j$. In contrast to the penalized form (2), the functional with A orthogonal is not separable so the computation of the convex envelope in the N dimensional case cannot be reduced to the sum of N one-dimensional cases (as in [35]). The problem (1) can be written as the minimization of

$$\begin{aligned} G_k(x)= \frac{1}{2}\Vert Ax-d\Vert ^2+\chi _{_{\Vert \cdot \Vert _0\le k}}(x) \end{aligned}$$

(3)

where $\chi $ is the indicator function defined in notations. Before calculating the convex envelope, we need some preliminary results.

Proposition 2

Let $x\in {\mathbb {R}}^N.$ There exists $j\in {\mathbb {N}}$ such that $0<j\le k$ and

$$\begin{aligned} |x_{k-j+1}^\downarrow |\le \frac{1}{j}\sum _{i=k-j+1}^N |x_i^\downarrow |\le |x_{k-j}^\downarrow | \end{aligned}$$

(4)

where the left inequality is strict if $j\ne 1$, and where, by convention, $|x_0^\downarrow |=+\infty $. Furthermore, $T_k(x)$ is defined as the smallest integer j that satisfies the double inequality.

The proof of existence is given in “Appendix A.1.” We will also use the Legendre–Fenchel transformation which is essential in the calculation of the convex envelope.
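The definition of $T_k(x)$ translates directly into a short search over j. The following is a minimal sketch, assuming $1\le k\le N$; the function name `T_k` and the fallback on the last line are implementation choices, not part of the paper. The strictness of the left inequality for $j\ne 1$ is not tested separately, on the assumption that taking the smallest admissible j makes it hold automatically.

```python
import numpy as np


def T_k(x, k):
    """Smallest j in {1, ..., k} satisfying the double inequality (4)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # |x^down|, 0-indexed
    for j in range(1, k + 1):
        left = a[k - j]                       # |x^down_{k-j+1}|
        mean = a[k - j:].sum() / j            # (1/j) * sum_{i >= k-j+1} |x^down_i|
        right = a[k - j - 1] if k - j >= 1 else np.inf  # |x^down_{k-j}|, +inf if k = j
        if left <= mean <= right:
            return j
    return k  # only reached through floating-point ties; existence is Proposition 2
```

For instance, with $x=(1,1,1)^T$ and $k=2$, the choice $j=1$ fails because $2>1$, and the search stops at $j=2$.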

Definition 1

The Legendre–Fenchel transformation of a function $f:{\mathbb {R}}^N \rightarrow {\mathbb {R}}\cup \{+\infty \}$ is defined as:

$$\begin{aligned} f^*(u^*)= \sup _{u\in {\mathbb {R}}^N} <u,u^*> - f(u). \end{aligned}$$

The biconjugate of a function, that is applying the Legendre–Fenchel transformation twice, is the convex envelope of the function.

Following [35], we present the convex envelope of $G_k$ (3) when A is orthogonal.

Theorem 1

Let $A\in {\mathbb {R}}^{M \times N}$ be such that $A^TA=I$. The convex envelope of $G_k(x)$ is

$$\begin{aligned} G_k^{**}(x)=\frac{1}{2}\Vert Ax-d\Vert _2^2+Q(x) \end{aligned}$$

(5)

where

$$\begin{aligned} Q(x)=-\frac{1}{2} \sum _{i=k-T_k(x)+1}^{N} x_i^{\downarrow 2} + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^ N |x_i^\downarrow | \right) ^2 \end{aligned}$$

(6)

and where $T_k(x)$ is defined as in Proposition 2.

Proof

Since $A^TA=I$, the function $G_k$ (3) can be rewritten as:

$$\begin{aligned} G_k(x)=\chi _{_{\Vert \cdot \Vert _0\le k}}(x)+\frac{1}{2}\left\| d-b\right\| _2^2+\frac{1}{2}\left\| x-z\right\| _2^2 \end{aligned}$$

(7)

where $b=AA^Td$ and $z=A^T d$. This reformulation allows us to decompose the data-fitting term into a sum of one-dimensional functions. We apply the Legendre transformation on the functional (7):

$$\begin{aligned} G_k^*(y)= & {} \sup _{x\in {\mathbb {R}}^N} <x,y>-\chi _{_{\Vert \cdot \Vert _0\le k}}(x)-\frac{1}{2}\left\| d-b\right\| _2^2\\&-\frac{1}{2}\left\| x-z\right\| _2^2. \end{aligned}$$

We move the terms that do not depend on x outside the supremum.

$$\begin{aligned} G_k^*(y)= & {} -\frac{1}{2}\left\| d-b\right\| _2^2\\&+\sup _{x\in {\mathbb {R}}^N} \left( <x,y>-\chi _{_{\Vert \cdot \Vert _0\le k}}(x)-\frac{1}{2}\left\| x-z\right\| _2^2\right) . \end{aligned}$$

Rewriting the expression inside the supremum by completing the square, we get

$$\begin{aligned} G_k^*(y)&=-\frac{1}{2}\left\| d-b\right\| _2^2\\&\quad +\sup _{x\in {\mathbb {R}}^N}\left( -\chi _{_{\Vert \cdot \Vert _0\le k}}(x)-\frac{1}{2}\left\| x-\left( z+y\right) \right\| _2^2 \right. \\&\quad \left. +\frac{1}{2}\Vert z+y\Vert _2^2 -\frac{1}{2}\Vert z\Vert ^2_2\right) . \end{aligned}$$

We develop further

$$\begin{aligned} G_k^*(y)&=-\frac{1}{2}\left\| d-b\right\| _2^2-\frac{1}{2}\Vert z\Vert ^2_2+ \frac{1}{2}\Vert z+y\Vert _2^2\\&\quad +\sup _{x\in {\mathbb {R}}^N}\left( -\chi _{_{\Vert \cdot \Vert _0\le k}}(x)-\frac{1}{2}\left\| x-(z+y)\right\| _2^2 \right) . \end{aligned}$$

The supremum is reached when $x_i=(z+y)^{\downarrow }_i$, $i \le k$, and $x_i=0$, $\forall i>k$. The Legendre transformation of $G_k$ is therefore

$$\begin{aligned} G_k^*(y)=-\frac{1}{2}\left\| d-b\right\| _2^2-\frac{1}{2}\Vert z\Vert ^2_2+\frac{1}{2}\sum _{i=1}^{k} (z+y)_i^{\downarrow 2}. \end{aligned}$$

To obtain the convex envelope of the function $G_k$, we compute the Legendre transformation of $G_k^*$.

$$\begin{aligned} G_k^{**}(x)= & {} \sup _y<x,y>+\frac{1}{2}\left\| d-b\right\| _2^2\\&+\frac{1}{2}\Vert z\Vert ^2_2-\frac{1}{2}\sum _{i=1}^{k} (z+y)_i^{\downarrow 2}. \end{aligned}$$

We add and subtract $\frac{1}{2}\Vert x\Vert ^2$ and $<x,z>$ in order to obtain an expression that is easier to work with.

$$\begin{aligned} G_k^{**}(x)&=\sup _y<x,y>+\frac{1}{2}\left\| d-b\right\| _2^2+\frac{1}{2}\Vert z\Vert ^2_2\\&\quad +\frac{1}{2}\Vert x\Vert ^2-\frac{1}{2}\Vert x\Vert ^2\\&\quad +<x,z> -<x,z>-\frac{1}{2}\sum _{i=1}^{k} (z+y)_i^{\downarrow 2}\\ G_k^{**}(x)&=\sup _y<x,z+y>+\frac{1}{2}\left\| d-b\right\| _2^2\\&\quad +\frac{1}{2}\Vert x-z\Vert ^2_2-\frac{1}{2}\Vert x\Vert ^2-\frac{1}{2}\sum _{i=1}^{k} (z+y)_i^{\downarrow 2}. \end{aligned}$$

Noticing that $\frac{1}{2}\left\| d-b\right\| _2^2+\frac{1}{2}\Vert x-z\Vert ^2_2=\frac{1}{2}\Vert Ax-d\Vert _2^2$, using the notation $w=z+y$, and given the definition of $w^\downarrow $, this is equivalent to

$$\begin{aligned} G_k^{**}(x)= & {} \frac{1}{2}\Vert Ax-d\Vert _2^2-\frac{1}{2}\Vert x\Vert ^2\nonumber \\&+\sup _{w\in {\mathbb {R}}^N} <x,w> - \frac{1}{2} \sum _{i=1}^{k} w^{\downarrow 2}_i. \end{aligned}$$

(8)

The above supremum problem can be solved by using Lemma 1, which is presented after this proof. This yields

$$\begin{aligned} G_k^{**}(x)= & {} \frac{1}{2}\Vert Ax-d\Vert _2^2-\frac{1}{2} \sum _{i=k-T_k(x)+1}^{N} x_i^{\downarrow 2} \nonumber \\&+\frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^ N |x_i^\downarrow | \right) ^2 \end{aligned}$$

(9)

$\square $
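To make the penalty (6) concrete, the following sketch evaluates Q(x) numerically. It assumes $1\le k\le N$; the helper names `T_k` and `Q` are illustrative, and `T_k` is a direct transcription of the search described in Proposition 2.

```python
import numpy as np


def T_k(x, k):
    """Smallest j in {1, ..., k} satisfying the double inequality (4)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    for j in range(1, k + 1):
        mean = a[k - j:].sum() / j
        right = a[k - j - 1] if k - j >= 1 else np.inf
        if a[k - j] <= mean <= right:
            return j
    return k  # fallback for floating-point ties


def Q(x, k):
    """Penalty term (6): -0.5 * sum(tail^2) + (sum |tail|)^2 / (2 T_k(x))."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    T = T_k(x, k)
    tail = a[k - T:]  # entries |x^down_i| for i = k - T + 1, ..., N
    return -0.5 * np.sum(tail ** 2) + tail.sum() ** 2 / (2 * T)
```

On a k-sparse vector such as $(5,0,0,-2)^T$ with $k=2$ the penalty vanishes, while on $(1,1,1)^T$ with $k=2$ it equals $0.75$.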

The following lemma is necessary in the proof of the convex envelope.

Lemma 1

Let $x\in {\mathbb {R}}^N$. Consider the following supremum problem

$$\begin{aligned} \sup _{y\in {\mathbb {R}}^N} -\frac{1}{2}\sum _{i=1}^k y^{\downarrow 2}_i+<y,x> . \end{aligned}$$

(10)

This problem is concave, and the value of the supremum problem (10) is

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^{k-T_k(x)} x^{\downarrow 2}_i + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^N|x^\downarrow _i| \right) ^2. \end{aligned}$$

$T_k(x)$ is defined in Proposition 2. The supremum argument is given by

$$\begin{aligned} y=P^{(x)^{-1}} {\hat{y}} \end{aligned}$$

where ${{\hat{y}}}$ is

$$\begin{aligned} {\hat{y}}_j(x)={\left\{ \begin{array}{ll}{{\,\mathrm{sign}\,}}(x^{\downarrow }_j)\frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| &{}\text { if } k\ge j\ge k-T_k(x)+1 \\ &{}\text { or if } j>k \text { and } x^{\downarrow }_j\ne 0\\ \left[ -1,1\right] \frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| &{}\text { if } j>k \text { and } x^{\downarrow }_j= 0\\ x^{\downarrow }_j &{}\text { if } j< k-T_k(x)+1. \end{array}\right. } \end{aligned}$$

(11)

The proof can be found in “Appendix A.2,” and it depends on multiple preliminary results in “Appendix A.1.”
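Lemma 1 can be sanity-checked numerically by evaluating the objective of (10) at the candidate ${\hat{y}}$ from (11) and comparing with the stated value. The sketch below works in sorted coordinates, picks 0 from the interval in the set-valued case, and uses hypothetical helper names; it is an illustration, not a proof.

```python
import numpy as np


def T_k(x, k):
    """Smallest j in {1, ..., k} satisfying the double inequality (4)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    for j in range(1, k + 1):
        mean = a[k - j:].sum() / j
        right = a[k - j - 1] if k - j >= 1 else np.inf
        if a[k - j] <= mean <= right:
            return j
    return k  # fallback for floating-point ties


def sup_value(x, k):
    """Closed-form value of the supremum stated in Lemma 1."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    T = T_k(x, k)
    return 0.5 * np.sum(a[:k - T] ** 2) + a[k - T:].sum() ** 2 / (2 * T)


def y_hat_sorted(x, k):
    """Candidate maximizer (11) in sorted coordinates, plus the sorted x."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x), kind="stable")
    xs = x[order]                       # x^down (signs kept)
    T = T_k(x, k)
    c = np.abs(xs[k - T:]).sum() / T
    y = xs.copy()                       # entries j < k - T + 1 equal x^down_j
    y[k - T:] = np.sign(xs[k - T:]) * c  # zero entries get 0, one element of [-1, 1] * c
    return y, xs


def objective(y, x, k):
    """Objective of (10): -0.5 * (sum of the k largest y_i^2) + <y, x>."""
    top = np.sort(np.asarray(y) ** 2)[::-1][:k]
    return -0.5 * top.sum() + np.dot(y, x)
```

For $x=(6,3,2,1)^T$ and $k=2$, the candidate is $(6,6,6,6)^T$ and both the objective and the closed form evaluate to 36.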

Remark 1

${\hat{y}}$ is such that ${\hat{y}}={\hat{y}}^\downarrow $.

This expression of the convex envelope may be hard to grasp since it is not in closed form. To better understand Q(x), we have the following properties:

Property 1

$Q:{\mathbb {R}}^N\rightarrow [0,\infty [$.

Proof

Let us show that $Q(x)\ge 0, \, \forall x$. We use Eq. (6) as starting point.

$$\begin{aligned} Q(x)&=-\frac{1}{2} \sum _{i=k-T_k(x)+1}^{N} x^{\downarrow 2}_i \\&\quad + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^ N |x_i^\downarrow | \right) ^2 \\&\ge -\frac{1}{2} |x_{k-T_k(x)+1}^\downarrow |\sum _{i=k-T_k(x)+1}^{N} |x^{\downarrow }_i| \\&\quad + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^ N |x_i^\downarrow | \right) ^2 \\&\ge -\frac{1}{2} |x^{\downarrow }_{k-T_k(x)+1}|\sum _{i=k-T_k(x)+1}^{N} |x^{\downarrow }_i| \\&\quad + \frac{1}{2}|x^{\downarrow }_{k-T_k(x)+1}|\sum _{i=k-T_k(x)+1}^ N |x^\downarrow _i| = 0. \end{aligned}$$

We used the fact that $|x_{k-T_k(x)+1}^\downarrow |\ge |x^\downarrow _{i}|,\,\forall i \ge k-T_k(x)+1$ for the first inequality. The second inequality, which takes us from the second to the third line, follows from the left inequality in the definition of $T_k(x)$ (see Proposition 2). Note that for $T_k(x)>1$ the last inequality is strict. $\square $

Property 2

The function Q(x) is continuous on ${\mathbb {R}}^N$.

Proof

By definition, we have $G_k^{**}(x)=\frac{1}{2}\Vert Ax-d\Vert ^2+Q(x)$ when A is orthogonal, and $G_k^{**}$ is lower semi-continuous, and continuous in the interior of its domain. From [29, Corollary 3.47], for coercive functions, $dom(co (f))=co (dom (f))$, where co denotes the convex envelope of a function (respectively, the convex hull of a set) and dom is the domain of the function. First, $G_k$ is coercive when A is orthogonal since we have $\Vert Ax\Vert ^2=(Ax)^TAx=x^TA^TAx=\Vert x\Vert ^2$. Second, since $dom (G_k)$ is the union of all subspaces spanned by supports with $\Vert x\Vert _0\le k$, its convex hull is ${\mathbb {R}}^N$. Thus, $dom(G_k^{**})={\mathbb {R}}^N$, and $G_k^{**}$ is continuous on ${\mathbb {R}}^N$. Moreover, $Q(x) = G_k^{**}(x)-\frac{1}{2}\Vert Ax-d\Vert ^2$, so Q(x) is the difference of two continuous functions and is thus continuous. Since the expression of Q(x) does not depend on A, this holds independently of A. $\square $

Property 3

Let $\Vert x\Vert _0\le k$. Then, $T_k(x)$ as defined in Proposition 2 is such that $T_k(x)= 1$. The converse is not necessarily true.

Proof

From Proposition 2, we know that $T_k(x)$ satisfies

$$\begin{aligned} |x^\downarrow _{k-T_k(x)+1}|\le \frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N| x^\downarrow _i|\le |x^\downarrow _{k-T_k(x)}|. \end{aligned}$$

First, note that for all x such that $\Vert x\Vert _0\le k$, we have $\forall j>k, \,\,x^\downarrow _j=0$, and in this case the inequalities are clearly satisfied for $T_k(x)=1$. Furthermore, $T_k(x)$ is defined as the smallest possible integer, and thus $T_k(x)=1$.

An example showing that the converse is not true: Let $x=(6,3,2,1)^T$. Let $k=2$, then

$$\begin{aligned}\sum _{i=k}^N| x^\downarrow _i|=6\le |x^\downarrow _{k-1}|=6.\end{aligned}$$

$T_k(x)=1$, but the constraint $\Vert x\Vert _0\le 2$ is clearly not satisfied. $\square $

Property 4

$Q(x)=0$ if and only if $\Vert x\Vert _0\le k$.

Proof

From Property 1, $Q(x)\ge 0$ and the inequality is strict if $T_k(x)>1$. Thus, it suffices to investigate $T_k(x)=1$. The expression is thus reduced to:

$$\begin{aligned} Q(x) = \sum _{j=k+1}^N\sum _{i=k}^{j-1} |x^\downarrow _i||x^\downarrow _j| \end{aligned}$$

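which is equal to 0 if and only if $x^\downarrow _j=0$ for all $j>k$: every term is nonnegative, and if $x^\downarrow _{k+1}\ne 0$, then the term $|x^\downarrow _k||x^\downarrow _{k+1}|$ is strictly positive because of the ordering. $\square $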
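The reduction used in this proof can be verified numerically: when $T_k(x)=1$, expression (6) and the double sum of pairwise products agree. The sketch below assumes $T_k(x)=1$ for the inputs it is given (it does not compute $T_k$), and the function names are illustrative.

```python
import numpy as np


def q_when_T_is_one(x, k):
    """Expression (6) specialized to T_k(x) = 1 (the tail starts at index k)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    tail = a[k - 1:]
    return -0.5 * np.sum(tail ** 2) + 0.5 * tail.sum() ** 2


def q_pairwise(x, k):
    """Double sum over k <= i < j <= N of |x^down_i| * |x^down_j| (1-indexed)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    n = a.size
    return sum(a[i] * a[j] for j in range(k, n) for i in range(k - 1, j))
```

With $x=(6,3,2,1)^T$ and $k=2$ (the example of Property 3, for which $T_k(x)=1$), both expressions give $3\cdot 2+3\cdot 1+2\cdot 1=11$, whereas a 2-sparse vector gives 0.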

In the next section, we will investigate the use of Q(x) when A is not orthogonal.

3 A New Relaxation

From now on, we suppose $A\in {\mathbb {R}}^{M\times N}$ with A not necessarily orthogonal.

We are interested in a continuous relaxation of $G_k$ defined as

$$\begin{aligned} G_k(x)=\frac{1}{2}\Vert Ax-d\Vert ^2+\chi _{_{\Vert \cdot \Vert _0\le k}}(x). \end{aligned}$$

Following the CEL0 approach, we propose the following relaxation of $G_k$:

$$\begin{aligned} G_{Q}(x)=\frac{1}{2}\Vert Ax-d\Vert ^2+Q(x) \end{aligned}$$

(12)

with

$$\begin{aligned} Q(x)=-\frac{1}{2} \sum _{i=k-T_k(x)+1}^{N} x_i^{\downarrow 2} + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^ N |x_i^\downarrow | \right) ^2 \end{aligned}$$

(13)

where $T_k(x)$ is the function defined in Proposition 2 as the smallest integer that verifies the inequality:

$$\begin{aligned} |x^\downarrow _{k-T_k(x)+1}|\le \frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N| x^\downarrow _i|\le |x^\downarrow _{k-T_k(x)}| \end{aligned}$$

(14)

where, by definition, the inequality is strict if $T_k(x)>1$.

Note that, from its definition [see Eq. (8)], Q(x) can be written as:

$$\begin{aligned} Q(x)=-\frac{1}{2}\sum _{i=1}^N x_i^2+ \sup _{w\in {\mathbb {R}}^N}-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>. \end{aligned}$$

(15)

Note that the properties of Q(x) proved in Sect. 2 are valid for any A.

The exactness of a relaxation means that the relaxation has the same global minimizers as the initial function. Furthermore, it does not add any minimizers that are not minimizers of the initial function. The CEL0 relaxation [35] is an exact relaxation of the penalized functional (2). The proposed relaxation $G_Q$ of the constraint functional $G_k$ (3) is not exact as a counterexample later in the paper shows. We can prove, however, some partial results.

Remark 2

From Property 4, we have $Q(x)=0$ $\forall \, x$ such that $\Vert x\Vert _0\le k$. Thus, $G_Q(x)=G_k(x)$ $\forall \, x$ such that $\Vert x\Vert _0\le k$.

Theorem 2

Let ${\hat{x}}$ be a local (respectively global) minimizer of $G_Q$. If $\Vert {\hat{x}}\Vert _0\le k$, then ${\hat{x}}$ is a local (respectively, global) minimizer of $G_k$.

Proof

Let ${\mathscr {S}} {:}{=} \{x:\Vert x\Vert _0\le k\}$. Let ${\hat{x}}$ be a local minimizer of $G_Q$ such that $\Vert {\hat{x}}\Vert _0\le k$, and let $ {\mathscr {N}}({\hat{x}},\gamma )$ denote a $\gamma $-neighborhood of ${\hat{x}}$ on which ${\hat{x}}$ minimizes $G_Q$. By contradiction, assume that $\exists {\bar{x}} \in {\mathscr {N}} ({\hat{x}},\gamma )\cap {\mathscr {S}}$ s.t. $G_k({\bar{x}})< G_k({\hat{x}})$. From Remark 2, $G_Q({\bar{x}})=G_k({\bar{x}})$ and $G_Q({\hat{x}})=G_k({\hat{x}})$, which means $\exists {\bar{x}} \in {\mathscr {N}}({\hat{x}},\gamma )\cap {\mathscr {S}}$ s.t. $G_Q({\bar{x}})< G_Q({\hat{x}})$, which contradicts the fact that ${\hat{x}}$ is a local minimizer of $G_Q$. The same reasoning can be applied in the case of global minimizers. $\square $

Thus, if a minimizer of the relaxed functional satisfies the sparsity constraint, then it is a minimizer of the initial problem. Furthermore, the relaxation is a mix of absolute values and squares and promotes therefore sparsity. The subgradient, as can be seen in the next section, promotes a k-sparse solution.

Further note that we could have applied the quadratic envelope [14] to obtain the relaxation Q. The quadratic envelope can be defined as applying twice the $S_\gamma $ transformation on a function f. The $S_\gamma $ transformation is defined as:

$$\begin{aligned} S_\gamma (f)(y){:}{=} \sup _x - f(x)-\frac{\gamma }{2}\Vert x-y\Vert ^2. \end{aligned}$$

If we apply the quadratic envelope to the constrained $\ell _0$ indicator function, we obtain $\gamma Q$. Further, the author of [14] proposes to choose either $\gamma I \prec A^T A$, where I is the identity matrix, or $\gamma I \succ A^TA$. It is important to note that $\gamma I \not \succ A^TA $ does not imply $\gamma I \prec A^TA$. When $\gamma $ is such that $\gamma I \succ A^TA$, the relaxation is exact. However, numerically, we found this condition far too strong, and it did not perform better than minimizing the initial hard constraint function $G_k$ (3). For a normalized matrix A, Q can be found by taking $\gamma =1$ in $S_\gamma (S_\gamma (\chi _{_{\Vert \cdot \Vert _0\le k}}))$. However, we do not necessarily have $I \succ A^TA$. Nevertheless, we show in this paper some exact relaxation properties for $G_Q$.

Furthermore, what is hidden in our proposed method is the fact that each column of A is normalized. Without this assumption, each element $x_i$ would be weighted by $\Vert a_i\Vert ^2$, which is finer than multiplying a constant to the whole regularization term. Again, we can compare with the CEL0 relaxation. When applying the quadratic envelope to the $\ell _0$ penalization term, we obtain CEL0, but instead of $\Vert a_i\Vert ^2$ in the expression, there is a $\gamma $.

However, we are obliged to normalize A to calculate the proximal operator of the regularization term.

3.1 The Subgradient

In this section, we calculate the subgradient of $G_Q$. Since $G_Q$ is neither smooth nor convex, we can calculate neither the gradient nor the subgradient in the sense of convex analysis. We therefore calculate the generalized subgradient (or Clarke subgradient). The obtained expression shows how difficult it is to give necessary optimality conditions for the relaxation.

To calculate the generalized subgradient, we must first prove that Q(x) is locally Lipschitz.

Definition 2

A function $f:{\mathbb {R}}^N\rightarrow {\mathbb {R}}$ is locally Lipschitz at point x if

$$\begin{aligned} \exists (L, \epsilon ), \forall (y,y')\in {\mathscr {N}}(x,\epsilon )^2, |f(y)-f(y')|\le L \Vert y-y'\Vert \end{aligned}$$

where $L\in {\mathbb {R}}_{\ge 0}$, and ${\mathscr {N}}(x,\epsilon )$ is an $\epsilon $-neighborhood of x.

Lemma 2

Q(x) is locally Lipschitz, $\forall x \in {\mathbb {R}}^N$.

Proof

First, it is well known that the supremum of locally Lipschitz functions is locally Lipschitz. Let us use the definition of Q(x) from (15). The function defined as $x\rightarrow \sup _w-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>$ is locally Lipschitz since, for every w, the function $x\rightarrow -\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>$ is affine in x and therefore locally Lipschitz. Furthermore, the sum of two locally Lipschitz functions is locally Lipschitz. $\square $

Since Q(x) is locally Lipschitz, we can search for the generalized subgradient, denoted $\partial $.

Definition 3

The generalized subgradient [16] of a function $f:{\mathbb {R}}^N\rightarrow {\mathbb {R}}$ (which is locally Lipschitz) is defined by

$$\begin{aligned} \partial f(x) {:}{=}\{\xi \in {\mathbb {R}}^N: f^0(x,v)\ge <v,\xi >, \forall v \in {\mathbb {R}}^N\} \end{aligned}$$

where $f^0(x,v)$ is the generalized directional derivative in the direction v,

$$\begin{aligned} f^0(x,v)= \limsup _{\begin{array}{c} y\rightarrow x\\ \eta \downarrow 0 \end{array}} \frac{f(y+\eta v)-f(y)}{\eta }. \end{aligned}$$

Theorem 3

Let $x\in {\mathbb {R}}^N$, and let $T_k(x)$ be as defined in Proposition 2. The subgradient of $G_Q(x)$ is

$$\begin{aligned} \partial G_Q (x)= A^*(Ax-d)-x+y(x) \end{aligned}$$

(16)

where y(x) is the argument where the supremum is reached in Lemma 1.

Proof

$G_Q$ is the sum of three functions, $\sup _w-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>$, $\frac{1}{2}\Vert Ax-d\Vert ^2$ and $-\frac{1}{2}\Vert x\Vert ^2$. From [16, Proposition 2.3.3 and Corollary 1] and since the last two functions are differentiable, we can write the generalized subgradient of $G_Q$ as the sum of the gradients of the last two functions and the generalized subgradient of the first, i.e.,

$$\begin{aligned} \partial G_Q= & {} \nabla \left[ \frac{1}{2}\Vert A\cdot -d\Vert ^2\right] (x) - \nabla \left[ \frac{1}{2}\Vert \cdot \Vert ^2\right] (x)\nonumber \\&+ \partial [\sup _{w\in {\mathbb {R}}^N}-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,\cdot >](x). \end{aligned}$$

(17)

Thus, the difficulty is to calculate $\partial [\sup _w-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,\cdot >](x)$.

From [24, Theorem 2.93], the subgradient of the supremum is the convex hull of the subgradients at the points where the supremum is reached. We define $g(w,x) =-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>$. The subgradient of g with respect to x is $\partial (g(w,\cdot ))(x) = w$. Now, we need to find where the supremum in $ \sup _w-\frac{1}{2}\sum _{i=1}^k{w_i^{\downarrow }}^2+<w,x>$ is reached. From Lemma 1, we know that the supremum is reached at y(x), given in (11). We insert y(x) into (17), and this concludes the proof. $\square $
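As an illustration of (16), the sketch below computes one element of $\partial G_Q(x)$ by building y(x) from (11) and mapping it back to the original ordering. It assumes real-valued A (so $A^*=A^T$), picks 0 in the set-valued case of (11), and uses hypothetical function names.

```python
import numpy as np


def T_k(x, k):
    """Smallest j in {1, ..., k} satisfying the double inequality (14)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    for j in range(1, k + 1):
        mean = a[k - j:].sum() / j
        right = a[k - j - 1] if k - j >= 1 else np.inf
        if a[k - j] <= mean <= right:
            return j
    return k  # fallback for floating-point ties


def y_of_x(x, k):
    """One supremum argument y(x) = P^{-1} y_hat from Lemma 1."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(x), kind="stable")  # permutation giving x^down
    xs = x[order]
    T = T_k(x, k)
    c = np.abs(xs[k - T:]).sum() / T
    y_hat = xs.copy()
    y_hat[k - T:] = np.sign(xs[k - T:]) * c  # zero entries get 0, an element of [-1, 1] * c
    y = np.empty_like(y_hat)
    y[order] = y_hat                          # undo the sorting
    return y


def subgradient_element(A, d, x, k):
    """One element of (16): A^T (A x - d) - x + y(x)."""
    x = np.asarray(x, dtype=float)
    return A.T @ (A @ x - d) - x + y_of_x(x, k)
```

In two dimensions with $k=1$ and both components nonzero, the term $-x+y(x)$ reduces to $({{\,\mathrm{sign}\,}}(x_1)|x_2|,\,{{\,\mathrm{sign}\,}}(x_2)|x_1|)^T$, which is the gradient of $|x_1||x_2|$.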

3.2 A Numerical Example of the Relaxation in Two Dimensions

In order to obtain a clearer view of what is gained with the proposed relaxation, we study two numerical examples in two dimensions. We set $k=1$ and the initial problem is

$$\begin{aligned} G_k(x)=\frac{1}{2}\Vert Ax-d\Vert ^2+\chi _{_{\Vert \cdot \Vert _0\le 1}}(x). \end{aligned}$$

In two dimensions, the problem $G_{k=1}$ is a simple problem to minimize. The solution is either when the first component, ${\hat{x}}_1$ is 0, or when the second component ${\hat{x}}_2=0$, or both. For $k=1$ we have that $T_k(x)=1$, and the relaxed formulation is then

$$\begin{aligned} G_Q(x)=\frac{1}{2}\Vert Ax-d\Vert ^2+ |x_1||x_2|. \end{aligned}$$
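The product term can be obtained by expanding (13) with $N=2$, $k=1$ and $T_k(x)=1$, so that both sums run over $i=1,2$:

$$\begin{aligned} Q(x)&=-\frac{1}{2}\left( x_1^2+x_2^2\right) +\frac{1}{2}\left( |x_1|+|x_2|\right) ^2\\&=-\frac{1}{2}\left( x_1^2+x_2^2\right) +\frac{1}{2}\left( x_1^2+2|x_1||x_2|+x_2^2\right) =|x_1||x_2|. \end{aligned}$$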

We consider the case where $A \in {\mathbb {R}}^{2\times 2}$, and the two following examples:

$$\begin{aligned} A&=\left( \begin{matrix} 3 &{} 2\\ 1&{} 3 \end{matrix}\right) \Lambda _{1/\Vert a_i\Vert }&\text { and } d=\left( \begin{matrix} 1 \\ 2\end{matrix}\right) \end{aligned}$$

(18)

$$\begin{aligned} A&=\left( \begin{matrix} -3 &{} -2\\ 1 &{} 3 \end{matrix}\right) \Lambda _{1/\Vert a_i\Vert }&\text { and } d= \left( \begin{matrix} 1 \\ 2 \end{matrix}\right) \end{aligned}$$

(19)

where $\Lambda _{1/\Vert a_i\Vert }$ is a diagonal matrix with $\frac{1}{\Vert a_i\Vert }$ on its diagonal, and $\Vert a_i\Vert $ is the norm of the ith column of the unnormalized matrix. Figure 1 presents the contour lines of $G_k$ and $G_Q$. The semi-transparent red layer over the contour lines of $G_k$ represents the infinite value, and the semi-transparent blue layer over the relaxation marks the axes. The figures show the advantages of using $G_Q$ as a relaxation. The relaxation is continuous, and in Example (18), the relaxation is exact. This can be observed in the upper row of Fig. 1. Example (19) is a case where the relaxation is not exact. In the lower row of Fig. 1, we observe the effect of the relaxation, as it is a product of the absolute values of $x_1$ and $x_2$. The global minimum of the relaxation in this case is located at $(-0.086,1.0912)$, and the two minima of $G_k$ are $(-0.3162, 0)$ and $(0, 1.1094)$.
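The points reported for Example (19) can be recomputed with a few lines. The sketch below is an illustration under two assumptions: the axis candidates use $x_i=<a_i,d>$ (valid for unit-norm columns), and the stationary point of $G_Q$ is sought in the quadrant $x_1<0<x_2$, where $|x_1||x_2|=-x_1x_2$ and the optimality condition becomes a linear system. All function names are hypothetical.

```python
import numpy as np

# Example (19): columns normalized as in the text.
B = np.array([[-3.0, -2.0], [1.0, 3.0]])
A = B / np.linalg.norm(B, axis=0)
d = np.array([1.0, 2.0])


def G_k_axis_minimizers(A, d):
    """Candidates for k = 1: one point on each axis, x_i = <a_i, d>."""
    coefs = A.T @ d
    return [np.array([coefs[0], 0.0]), np.array([0.0, coefs[1]])]


def G_Q(A, d, x):
    """Relaxed functional for N = 2, k = 1."""
    return 0.5 * np.sum((A @ x - d) ** 2) + abs(x[0]) * abs(x[1])


def stationary_point_second_quadrant(A, d):
    """Solve grad G_Q = 0 assuming x1 < 0 < x2, i.e. (A^T A - J) x = A^T d."""
    J = np.array([[0.0, 1.0], [1.0, 0.0]])  # gradient of -x1*x2 is -J x
    return np.linalg.solve(A.T @ A - J, A.T @ d)


axis_minima = G_k_axis_minimizers(A, d)
x_relaxed = stationary_point_second_quadrant(A, d)
```

The computed stationary point has two nonzero components and a lower value of $G_Q$ than either axis candidate, which is consistent with the relaxation not being exact in this example.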

4 Algorithms to Deal with $G_Q$

The analysis of the relaxation shows that it promotes sparsity. The function $G_Q$ is non-convex and non-smooth, but $G_Q$ is continuous, which is not the case for $G_k$. One could implement a subgradient method, either by using gradient bundle methods (see [10] for an overview) or classical subgradient methods. However, there are no convergence guarantees for the latter. Both methods are also known to be slow compared to the classical forward–backward splitting algorithm (FBS). The FBS algorithm is proven to converge when the objective function has the Kurdyka-Łojasiewicz (K-Ł) property. More recent algorithms propose accelerations of the FBS, such as the non-monotone accelerated proximal gradient algorithm (nmAPG) [21], which is used in the numerical experiments of this paper. The algorithm is presented in “Appendix A.4.” It is designed to work on problems of the form:

$$\begin{aligned} {\hat{x}}\in \mathop {{\mathrm{arg\, min}}}\limits _x F(x){:}{=}f(x)+g(x) \end{aligned}$$

(20)

where f is a differentiable function, $\nabla f$ is L-Lipschitz, and the proximal operator of g can be calculated. It is possible to add a fail-safe to be sure that the algorithm always converges to a solution that satisfies the sparsity constraint. A simple projection to the constraint $\Vert x\Vert _0\le k$ using the proximal of the constraint and then the calculation of the optimal intensity for the given support would suffice. To use the FBS and its variants, we need to calculate the proximal operator of Q(x). To do so, we present some preliminary results before presenting the proximal operator.
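A minimal sketch of the two ingredients described above, under stated assumptions: `forward_backward` is the plain (non-accelerated) FBS iteration for $f(x)=\frac{1}{2}\Vert Ax-d\Vert ^2$ with step $1/L$, not the nmAPG of “Appendix A.4”, and `fail_safe` is one possible reading of the projection followed by a least-squares fit on the selected support. All function names are hypothetical. The usage example plugs in the projection onto $\Vert x\Vert _0\le k$ as the proximal step, which gives constrained IHT rather than the relaxation.

```python
import numpy as np

def project_k_sparse(v, k):
    # Proximal operator of the constraint ||x||_0 <= k:
    # keep the k entries of largest magnitude, zero the rest.
    out = np.zeros_like(v)
    keep = np.argsort(-np.abs(v))[:k]
    out[keep] = v[keep]
    return out

def forward_backward(A, d, prox, x0, n_iter=200):
    # Plain forward-backward splitting for 0.5*||Ax - d||^2 + g(x).
    # `prox(v, L)` must return prox_{g/L}(v); L is the Lipschitz
    # constant of the gradient of the data term.
    L = np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - d)      # forward (gradient) step
        x = prox(x - grad / L, L)     # backward (proximal) step
    return x

def fail_safe(A, d, x, k):
    # Project onto the k-sparse set, then recompute the optimal
    # intensities on that support by least squares.
    support = np.argsort(-np.abs(x))[:k]
    coef, *_ = np.linalg.lstsq(A[:, support], d, rcond=None)
    out = np.zeros_like(x)
    out[support] = coef
    return out

# Usage with the hard projection as proximal step (constrained IHT).
A = np.eye(3)
d = np.array([3.0, -1.0, 2.0])
x_iht = forward_backward(A, d, lambda v, L: project_k_sparse(v, 2), np.zeros(3))
x_safe = fail_safe(A, d, np.array([2.5, 0.1, 1.0]), 2)
```

To minimize $G_Q$ instead, the `prox` argument would be replaced by the proximal operator of Q derived below.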

Lemma 3

$G_Q$ satisfies the K-Ł property.

Proof

$\frac{1}{2}\Vert Ax-d\Vert ^2$ is semi-algebraic. Using the definition of Q(x) in (15), we can prove that Q(x) is semi-algebraic. First, note that $\Vert x\Vert _2^2$ is semi-algebraic. Furthermore,

$$\begin{aligned} \sum _{i=1}^kx_i^{\downarrow 2}= \sup _y g(x,y){:}{=}-\chi _{_{\Vert \cdot \Vert _0\le k}}(y)-\frac{1}{2}\Vert x-y\Vert ^2 \end{aligned}$$

and g(x, y) is semi-algebraic [7]; then, $\sum _{i=1}^kx_i^{\downarrow 2}$ is semi-algebraic. Thus, $f(x,y){:}{=}-\sum _{i=1}^k y_i^{\downarrow 2}+ \langle x,y\rangle $ is semi-algebraic, and so is its supremum. We can conclude that Q(x) is semi-algebraic, and thus, $G_Q$ satisfies the K-Ł property. $\square $

The expression of Q(x) in (6) is not in closed form because of the function $T_k(x)$, and calculating the proximal operator directly from this expression is difficult. The following proposition facilitates the calculation of $\text {prox}_Q$. The proposition is inspired by [13, Proposition 3.3], and the proof is omitted in this article as it follows the same steps and arguments as in the referenced article.

Proposition 3

Let $\rho >1$ and $z=\text {prox}_{-(\frac{\rho -1}{\rho })\sum _{i=k+1}^N(\cdot )^{\downarrow 2}}(y)$. We have

$$\begin{aligned} \text {prox}_{\frac{Q}{\rho }}(y)=\frac{\rho y - z}{\rho -1}. \end{aligned}$$

(21)

Thus, it suffices to calculate the proximal operator of $\zeta (x){:}{=}-(\frac{\rho -1}{\rho })\sum _{i=k+1}^Nx_i^{\downarrow 2}$. This is done in Lemma 8 in “Appendix A.3.” The following theorem presents the proximal operator of Q.

Theorem 4

The proximal operator of Q for $\rho >1$ is such that

$$\begin{aligned} \text {prox}_{\frac{Q}{\rho }}(y)^{\downarrow y}_i = {\left\{ \begin{array}{ll} \frac{\rho y^{\downarrow }_i -{{\,\mathrm{sign}\,}}(y^{\downarrow }_i)\max (|y^{\downarrow }_i|,\tau ) }{{\rho -1}}&{}\text { if } i\le k \\ \frac{\rho y^{\downarrow }_i -{{\,\mathrm{sign}\,}}(y^{\downarrow }_i)\min (\tau ,\rho |y^{\downarrow }_i|)}{\rho -1} &{}\text { if } i>k \end{array}\right. } \end{aligned}$$

or, equivalently

$$\begin{aligned} \text {prox}_{\frac{Q}{\rho }}(y)^{\downarrow y}_i = {\left\{ \begin{array}{ll} y^{\downarrow }_i &{}\text { if } i\le k^* \\ \frac{\rho y^{\downarrow }_i -{{\,\mathrm{sign}\,}}(y^{\downarrow }_i)\tau }{\rho -1} &{}\text { if } k^*<i<k^{**} \\ 0 &{}\text { if } k^{**}\le i. \end{array}\right. } \end{aligned}$$

where $k^*$ is the first index such that $\tau >|y^{\downarrow }_i|$ and $k^{**}$ is the first index such that $\rho |y^{\downarrow }_i|<\tau $. The threshold $\tau $ is a value in the interval $[|y^{\downarrow }_k|,\rho |y^{\downarrow }_{k+1}|]$ and is defined as

$$\begin{aligned} \tau =\frac{\rho \sum _{i\in n_1}|y^{\downarrow }_i|+\rho \sum _{i\in n_2}|y^{\downarrow }_i|}{\rho \#n_1+ \#n_2} \end{aligned}$$

(22)

where $n_1$ and $n_2$ are two groups of indices such that $\forall \, i \in n_1,\, |y^{\downarrow }_i|<\tau $ and $\forall \, i \in n_2,\, \tau \le \rho |y^{\downarrow }_i|$, and $\#n_1$ and $\#n_2$ are the sizes of $n_1$ and $n_2$. To go from $\text {prox}_{\frac{Q}{\rho }}(y)^{\downarrow y}$ to $\text {prox}_{\frac{Q}{\rho }}(y)$, we apply the inverse of the permutation that sorts y into $y^{\downarrow }$.

Proof

The result is direct by applying Proposition 3 and Lemma 8 which present the proximal operator of $\text {prox}_{-(\frac{\rho -1}{\rho })\sum _{i=k+1}^N(\cdot )^{\downarrow 2}}(y)$; the latter is presented in “Appendix A.3.” $\square $

Note that the proximal operator of Q is only a relaxation of the proximal operator of the constraint $\Vert x\Vert _0\le k$, which keeps the k largest values of x. Further note that the search for $\tau $ can be done iteratively by sorting in descending order all values of $y^{\downarrow }_i$ for $i\le k$ and $\rho y^{\downarrow }_i$ for $i>k$ that lie (with respect to their absolute value) in the interval $[|y^{\downarrow }_k|,\rho |y^{\downarrow }_{k+1}|]$. The sorted elements are denoted $p_i$. The sets $n_1$ and $n_2$ must be calculated for each interval $[p_{i+1},p_i]$, and the search ends once $\tau \in [p_{i+1},p_i]$.
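For intuition, the smallest case $k=1$, $N=2$ (where $Q(x)=|x_1||x_2|$) can be worked out by hand. Writing $a\ge b\ge 0$ for the sorted absolute values of y, a direct computation from Theorem 4 gives $\text {prox}_{\frac{Q}{\rho }}(y)^{\downarrow y}=(a,0)$ when $a\ge \rho b$, and otherwise $\tau =\frac{\rho (a+b)}{\rho +1}$ with both entries equal to $\frac{\rho |y^{\downarrow }_i|-\tau }{\rho -1}$. The Python sketch below implements only this special case; it is an illustration written for this text (the function name `prox_q_2d` is hypothetical) and is not the code from the repository cited below.

```python
import numpy as np

def prox_q_2d(y, rho):
    # prox of Q/rho for Q(x) = |x1|*|x2| (k = 1, N = 2), rho > 1.
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.abs(y))      # permutation sorting |y| descending
    a, b = np.abs(y[order])
    if a >= rho * b:
        # Interval [a, rho*b] is empty: hard thresholding, keep the largest.
        sorted_out = np.array([a, 0.0])
    else:
        # n1 = {1}, n2 = {2} in (22): tau = rho*(a + b) / (rho + 1).
        tau = rho * (a + b) / (rho + 1.0)
        sorted_out = np.array([rho * a - tau, rho * b - tau]) / (rho - 1.0)
    out = np.empty(2)
    out[order] = sorted_out             # undo the sorting permutation
    return np.sign(y) * out             # restore the signs of y

print(prox_q_2d([1.0, 0.8], rho=2.0))   # both entries shrunk
print(prox_q_2d([-0.5, 2.0], rho=2.0))  # smaller entry set to zero
```

In the second call the smaller entry is set exactly to zero while the larger one is left untouched, which is the behavior of the projection onto $\Vert x\Vert _0\le 1$; in the first call both entries are shrunk, showing how the relaxed operator differs from the hard projection.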

The codes to compute the proximal operator and the cost function are available online: https://github.com/abechens/SMLM-Constraint-Relaxation.

5 Application to 2D Single-Molecule Localization Microscopy

In this section, we compare the minimization of the relaxation with other 2D grid-based sparse algorithms. The algorithms are applied to the problem of 2D single-molecule localization microscopy (SMLM).

SMLM is a microscopy method that is used to obtain images with a higher resolution than what is possible with traditional optical microscopes. The method was first introduced in [5, 19, 30]. Fluorescence microscopy uses photoactivatable fluorophores that can emit light when they are excited with a laser. The fluorophores are observed with an optical microscope, and, since the fluorophores are smaller than the diffraction limit, what is observed is not each fluorophore, but rather a diffraction pattern (or equivalently the point spread function (PSF)) larger than the fluorophores. This limits the resolution of the image. SMLM exploits photoactivatable fluorophores, and, instead of activating all the fluorophores at once as done by other fluorescence microscopy methods, one activates a sparse set of fluorophores. The probability that two fluorophores are in the same PSF is low when only a few fluorophores are activated (low-density images), and precise localization of each is therefore possible. The localization becomes harder if the density of emitting fluorophores is higher because of the possibility of overlapping PSFs. Once each molecule has been precisely localized, the fluorophores are switched off and the process is repeated until all the fluorophores have been activated. The total acquisition time may be long when activating few fluorophores at a time, which is unfortunate as SMLM may be used on living samples that can move during this time. In this paper, we are interested in high-density acquisitions.

The localization problem of SMLM can be described as an $\ell _2-\ell _0$ minimization problem such as (1) and (2) with an added positivity constraint since we reconstruct the intensity of the fluorophores. For $G_Q$, this is done by using the distance function to the nonnegative space since the proximal operator of the sum of Q(x) and the positivity constraint is not known. A is the matrix operator that performs a convolution with the point spread function and a reduction of dimensions. The fluorophores are reconstructed on a finer grid $\in {\mathbb {R}}^{ML\times ML}$ than the observed image $\in {\mathbb {R}}^{M \times M}$, with $L>1$. A detailed description of the mathematical model can be found in [2]. Note that the number of excited fluorophores can be estimated beforehand, as it depends on the intensity of the excitation laser. Thus, the constrained sparse formulation (1) may be more suitable than the penalized sparse formulation (2), as the sparsity parameter k is the maximum number of nonzero pixels to reconstruct, and one pixel can be roughly equivalent to one observed excited fluorophore.

We first compare $G_Q$ with iterative hard thresholding (IHT) [17], which minimizes the constrained initial function (1). This gives a clear comparison between the initial function and the proposed relaxation. We artificially construct an image with 213 fluorophores randomly scattered on a $256\times 256$ grid, where each square measures $25\times 25$ nm. The observed image is a $64 \times 64$-pixel image, where each pixel measures $100\times 100$ nm, with a simulated Gaussian PSF with an FWHM of 258.21 nm. Note that we use these parameters as they are representative of the simulated 2D-ISBI data presented in the next section. We then construct 100 observations by applying different realizations of Poisson noise to the same image. The signal-to-noise ratio is around 20 dB for each observation (Fig. 2).

We compare the ability of $G_Q$ and constrained IHT to minimize the $\ell _2$ data fidelity term under the constraint that only 213 pixels are nonzero.

In Fig. 3, we compare the results of $G_Q$ and constrained IHT using the data fidelity term. The results of the 100 image reconstructions are presented with boxplots. The red mark in the box is the median of the reconstruction result of the 100 noisy, blurred, and downsampled images. The upper (respectively, lower) edge of the box indicates the 75th (respectively, 25th) percentile. We can observe that $G_Q$ always minimizes better than constrained IHT in terms of the data fidelity term. Thus, it solves the initial problem more efficiently.

$G_Q$ reconstructs the 100 images with a median data fidelity value of 1.55. To compare, constrained IHT has 2.74 as a median data fidelity value.

This small example shows clearly the advantage of using $G_Q$ compared to constrained IHT. In the next section, we compare $G_Q$ and constrained IHT with other $\ell _2-\ell _0$-based algorithms.

5.1 Comparison on 2013 ISBI Data

We compare $G_Q$ and constrained IHT with CoBic [2], which is designed to minimize the constrained $\ell _2-\ell _0$ problem. We further compare the algorithms with two algorithms: CEL0 [18] and the $\ell _1$ relaxation, both relaxations of the penalized formulation (2). The $\ell _1$ relaxation is minimized using FISTA [4], and $G_Q$ is minimized with the non-monotone accelerated proximal gradient algorithm (nmAPG) [21]. The algorithms are applied to the problem of 2D single-molecule localization microscopy (SMLM).

The algorithms are tested on two datasets with high-density acquisitions, accessible from the ISBI 2013 challenge [31]. For a review of the SMLM and the different localization algorithms, see the ISBI-SMLM challenge [31]. A more recent challenge was launched in 2016 [32]. We decided to use the 2013 challenge as the data are denser in the 2013 challenge. Furthermore, the 2D data in the 2016 challenge contain observations where some elements are not in the focal plane. Thus, our image formation model is not optimized for this image acquisition method.

Figure 4 shows two of the 361 acquisitions of the simulated dataset as well as the sum of all the acquisitions. We apply the localization algorithm to each acquisition, and the sum of the results of the localization of the 361 acquisitions yields one super-resolution image.

Table 1 The Jaccard index obtained for a reconstruction of around 90, 100 and 142 nonzero pixels on average. In bold: best reconstruction for the tolerance and the number of pixels reconstructed


We use the Jaccard index to do a numerical evaluation of the reconstructions. The Jaccard index is known from probability and is used to evaluate similarities between sets. In this case, it evaluates the localization of the reconstructed fluorophores (see [31]), and is defined as the ratio between the correctly reconstructed (CR) fluorophores and the sum of CR, false negatives (FN), and false positives (FP) fluorophores. The index is 1 for a perfect reconstruction, and the lower the index, the poorer the reconstruction. The Jaccard index includes a tolerance of error in its calculations when identifying the CR, FN and FP.

$$\begin{aligned} Jac=\frac{CR}{CR+FP+FN}\times 100\%. \end{aligned}$$
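The formula above can be sketched in a few lines of Python. The function `jaccard_index` follows the definition directly; `count_matches` is a hypothetical greedy one-to-one matching within a tolerance radius, included only to make the example self-contained. The matching rule actually used in the ISBI evaluation [31] may differ.

```python
import numpy as np

def jaccard_index(cr, fp, fn):
    # Jac = CR / (CR + FP + FN) * 100 %
    return 100.0 * cr / (cr + fp + fn)

def count_matches(truth, estimate, tol):
    # Hypothetical greedy matching: each estimated position is paired
    # with the nearest still-unmatched true position if it lies
    # within `tol` (same length unit as the coordinates).
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    unmatched = list(range(len(truth)))
    cr = 0
    for p in estimate:
        if not unmatched:
            break
        dists = np.linalg.norm(truth[unmatched] - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= tol:
            cr += 1
            unmatched.pop(j)
    fp = len(estimate) - cr   # estimates without a true partner
    fn = len(truth) - cr      # true fluorophores that were missed
    return cr, fp, fn

# Positions in nm; one correct detection, one false positive, two misses.
truth = [[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]]
estimate = [[10.0, 0.0], [300.0, 300.0]]
cr, fp, fn = count_matches(truth, estimate, tol=50.0)
jac = jaccard_index(cr, fp, fn)
```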

5.2 Results of the ISBI Simulated Dataset

The simulated dataset represents 8 tubes of 30 nm diameter. The acquisition is captured on a $64\times 64$ pixel grid with a pixel size of $100 \times 100\,\text {nm}^2$. The acquisition used a simulated point spread function (PSF) modeled by a Gaussian function with a full width at half maximum (FWHM) of 258.21 nm. Among the 361 images, there are 81 049 fluorophores.

The algorithms localize the fluorophores with higher precision on a $256\times 256$ grid, where each pixel measures $25\times 25\,\text {nm}^2$. This can be written as a reconstruction of $x\in {\mathbb {R}}^{ML\times ML}$ with an acquisition $d\in {\mathbb {R}}^{M\times M}$, where $L=4$ and $M=64$. The position of the fluorophore is estimated using the center of the pixel.
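To make the dimensions concrete, the following Python sketch builds one plausible discretization of the operator A for these parameters: a separable Gaussian blur on the fine grid followed by summation over $L\times L$ blocks. The block-sum downsampling, the kernel normalization, and the function names are assumptions made for illustration; the model actually used is the one described in [2].

```python
import numpy as np

def forward_operator_1d(M, L, fwhm_nm, fine_pixel_nm):
    # One-dimensional factor H (M x ML) of a separable forward model.
    # Gaussian standard deviation in fine-grid pixels, from the FWHM.
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / fine_pixel_nm
    idx = np.arange(M * L)
    blur = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    blur /= sigma * np.sqrt(2.0 * np.pi)          # unit-mass kernel
    # Assumed downsampling: sum L consecutive fine pixels per coarse pixel.
    down = np.kron(np.eye(M), np.ones((1, L)))
    return down @ blur

def apply_forward(H, X):
    # Two-dimensional acquisition d = H X H^T for a fine-grid image X.
    return H @ X @ H.T

H = forward_operator_1d(M=64, L=4, fwhm_nm=258.21, fine_pixel_nm=25.0)
X = np.zeros((256, 256))
X[128, 128] = 1.0                                  # single fluorophore
d_obs = apply_forward(H, X)                        # 64 x 64 observation
```

Keeping the operator in this factored form avoids storing A explicitly, which would have $64^2\times 256^2$ entries.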

We test the reconstruction ability of $G_Q$ with the sparsity constraint k set to three different values, and the Jaccard index is presented in Table 1. The $\lambda $ parameters for the penalized functional (2) are set such that the same number of nonzero pixels is reconstructed as for the constrained problem. The reconstructions for 99 nonzero pixels from the different algorithms are presented in Fig. 5. The proposed relaxation performs slightly better than CEL0. The relaxation performs better than any of the constrained formulation algorithms (CoBic and constrained IHT); moreover, CoBic does not reconstruct more than 99 nonzero pixels on average. The average reconstruction time for one acquisition is found in Table 2.

5.3 Results of the Real Dataset

The algorithms are applied to the real high-density dataset provided by the 2013 ISBI SMLM challenge [31]. In total, there are 500 acquisitions; each acquisition is of size $128\times 128$ pixels, and each pixel measures $100 \times 100\,\hbox {nm}^2$. The FWHM is evaluated to be 351.8 nm [15]. The localization is done on a fine $512\times 512$ pixel grid, where each pixel measures $25 \times 25\,\text {nm}^2$. As we have no prior knowledge of the solution, extensive testing of the sparsity parameters has been done to obtain the results presented in Fig. 6. The parameters were chosen such that the regions marked in red and green show distinct tubes and the overall tubulin structure is reconstructed. The results of the real dataset confirm the results of the simulated data: the constrained IHT performs poorly, and the $\ell _1$ relaxation seems to tighten the holes which are observed in red.

Table 2 Average reconstruction time for one image acquisition for the different methods


An important note: In the numerical experiments, the proposed relaxed formulation always converges to a critical point that satisfies the sparsity constraint, and thus, the “fail-safe” strategy is never activated.

6 Conclusion

We have investigated in this paper a continuous relaxation of the constrained $\ell _2-\ell _0$ problem. We compute the convex hull of $G_k$ when A is orthogonal. We further propose to use the same relaxation for any A and name this relaxation $G_Q$. This is the same procedure as the authors used to obtain CEL0 [35]. The question that has driven us has been answered; the proposed relaxation, $G_Q$, is not exact for every observation matrix A. However, it promotes sparsity and is continuous. We propose an algorithm to minimize the relaxed function. We further add a “fail-safe” strategy which ensures convergence to a critical point of the initial functional. In the case of SMLM, the relaxation performs as well as the other grid-based methods, and it converges toward a critical point of the initial problem each time without the “fail-safe” strategy being activated. Furthermore, in many sparse optimization problems, the constraint parameter of $G_Q$ is usually easier to fix than the regularizing parameter $\lambda $ in CEL0.

References

1. Andersson, F., Carlsson, M., Olsson, C.: Convex envelopes for fixed rank approximation. Optim. Lett. 11(8), 1783–1795 (2017)
2. Bechensteen, A., Blanc-Féraud, L., Aubert, G.: New $l_2- l_0$ algorithm for single-molecule localization microscopy. Biomed. Opt. Express 11(2), 1153–1174 (2020)
3. Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23(3), 1480–1509 (2013)
4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
5. Betzig, E., Patterson, G.H., Sougrat, R., Lindwasser, O.W., Olenych, S., Bonifacino, J.S., Davidson, M.W., Lippincott-Schwartz, J., Hess, H.F.: Imaging intracellular fluorescent proteins at nanometer resolution. Science 313(5793), 1642–1645 (2006). https://doi.org/10.1126/science.1127344
6. Bi, S., Liu, X., Pan, S.: Exact penalty decomposition method for zero-norm minimization based on mpec formulation. SIAM J. Sci. Comput. 36(4), A1451–A1477 (2014)
7. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9
8. Bourguignon, S., Ninin, J., Carfantan, H., Mongeau, M.: Exact sparse approximation problems via mixed-integer programming: formulations and computational performance. IEEE Trans. Signal Process. 64(6), 1405–1419 (2016)
9. Breiman, L.: Better subset regression using the nonnegative garrote. Technometrics 37(4), 373–384 (1995). https://doi.org/10.2307/1269730
10. Burke, J.V., Curtis, F.E., Lewis, A.S., Overton, M.L., Simões, L.E.: Gradient sampling methods for nonsmooth optimization. arXiv preprint arXiv:1804.11003 (2018)
11. Candes, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006). https://doi.org/10.1109/TIT.2005.862083
12. Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted $\ell _1$ minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)
13. Carlsson, M.: On convexification/optimization of functionals including an l2-misfit term. arXiv:1609.09378 [math] (2016)
14. Carlsson, M.: On convex envelopes and regularization of non-convex functionals without moving global minima. J. Optim. Theory Appl. 183(1), 66–84 (2019)
15. Chahid, M.: Echantillonnage compressif appliqué à la microscopie de fluorescence et à la microscopie de super résolution. Ph.D. thesis, Bordeaux (2014)
16. Clarke, F.H.: Optimization and Nonsmooth Analysis, vol. 5. SIAM, Philadelphia (1990)
17. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
18. Gazagnes, S., Soubies, E., Blanc-Féraud, L.: High density molecule localization for super-resolution microscopy using CEL0 based sparse approximation. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 28–31. IEEE (2017)
19. Hess, S.T., Girirajan, T.P.K., Mason, M.D.: Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91(11), 4258–4272 (2006). https://doi.org/10.1529/biophysj.106.091116
20. Larsson, V., Olsson, C.: Convex low rank approximation. Int. J. Comput. Vis. 120(2), 194–214 (2016). https://doi.org/10.1007/s11263-016-0904-7
21. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
22. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448–2478 (2013)
23. Mallat, S.G., Zhang, Z.: Matching pursuits with time–frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993). https://doi.org/10.1109/78.258082
24. Mordukhovich, B.S., Nam, N.M.: An easy path to convex analysis and applications. Synth. Lect. Math. Stat. 6(2), 1–218 (2013)
25. Nikolova, M.: Relationship between the optimal solutions of least squares regularized with $\ell _0$-norm and constrained by k-sparsity. Appl. Comput. Harmonic Anal. 41(1), 237–265 (2016). https://doi.org/10.1016/j.acha.2015.10.010
26. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 40–44 (1993). https://doi.org/10.1109/ACSSC.1993.342465
27. Peleg, D., Meir, R.: A bilinear formulation for vector sparsity optimization. Signal Process. 88(2), 375–389 (2008). https://doi.org/10.1016/j.sigpro.2007.08.015
28. Pilanci, M., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. 151(1), 63–87 (2015)
29. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
30. Rust, M.J., Bates, M., Zhuang, X.: Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3(10), 793–796 (2006). https://doi.org/10.1038/nmeth929
31. Sage, D., Kirshner, H., Pengo, T., Stuurman, N., Min, J., Manley, S., Unser, M.: Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods 12(8), 717 (2015)
32. Sage, D., Pham, T.A., Babcock, H., Lukes, T., Pengo, T., Chao, J., Velmurugan, R., Herbert, A., Agrawal, A., Colabrese, S., et al.: Super-resolution fight club: assessment of 2d and 3d single-molecule localization microscopy software. Nat. Methods 16(5), 387–395 (2019)
33. Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)
34. Simon, B.: Trace Ideals and Their Applications, Vol. 120. American Mathematical Society, Philadelphia (2005)
35. Soubies, E., Blanc-Féraud, L., Aubert, G.: A continuous exact $\ell _0$ penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 8(3), 1607–1639 (2015)
36. Soubies, E., Blanc-Féraud, L., Aubert, G.: A unified view of exact continuous penalties for $\ell _2$-$\ell _0$ minimization. SIAM J. Optim. 27(3), 2034–2060 (2017)
37. Soussen, C., Idier, J., Brie, D., Duan, J.: From Bernoulli–Gaussian deconvolution to sparse signal restoration. IEEE Trans. Signal Process. 59(10), 4572–4584 (2011)
38. Tono, K., Takeda, A., Gotoh, J.: Efficient dc algorithm for constrained sparse optimization. arXiv preprint arXiv:1701.08498 (2017)

Author information

Authors and Affiliations

Université Côte d’Azur, CNRS, Inria, Laboratoire I3S UMR 7271, 06903, Sophia Antipolis, France
Arne Henrik Bechensteen & Laure Blanc-Féraud
Université Côte d’Azur, UNS, Laboratoire J. A. Dieudonné UMR 7351, 06100, Nice, France
Gilles Aubert

Corresponding author

Correspondence to Arne Henrik Bechensteen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to thank the anonymous reviewers for their detailed comments and suggestions. This work has been supported by the French government, through a financial Ph.D. allocation from MESRI and through the 3IA Côte d’Azur Investments in the Future project managed by the National Research Agency (ANR) with the Reference Number ANR-19-P3IA-0002.

A Appendix

A.1 Preliminary Results for Lemma 1

Proposition 2 (Reminder) Let $x\in {\mathbb {R}}^N.$ There exists $j\in {\mathbb {N}}$ such that $0<j\le k$ and

$$\begin{aligned} |x_{k-j+1}^\downarrow |\le \frac{1}{j}\sum _{i=k-j+1}^N |x_i^\downarrow |\le |x_{k-j}^\downarrow | \end{aligned}$$

(23)

where the left inequality is strict if $j\ne 1$, and where $x^\downarrow _0=+\infty $. Furthermore, $T_k(x)$ is defined as the smallest integer j that satisfies the double inequality.
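The definition of $T_k(x)$ translates directly into a short search over j. The Python sketch below is an illustration written for this text (the function name `T_k` and the zero-based indexing are implementation choices, not taken from the authors' code); it tests the double inequality (23) for $j=1,\dots ,k$ and returns the first j that satisfies it.

```python
import numpy as np

def T_k(x, k):
    # Smallest j in {1, ..., k} satisfying the double inequality (23).
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # |x| sorted descending
    for j in range(1, k + 1):
        # (1/j) * sum_{i=k-j+1}^{N} |x_i|  (1-based indices in the text)
        tail_mean = a[k - j:].sum() / j
        left = a[k - j]                                  # |x_{k-j+1}|
        right = np.inf if j == k else a[k - j - 1]       # |x_{k-j}|, x_0 = +inf
        if left <= tail_mean <= right:
            return j
    # Proposition 2 guarantees that this line is never reached.
    raise RuntimeError("no index satisfies the double inequality")

print(T_k([3.0, 1.0, 0.0], k=1))
print(T_k([5.0, 1.0, 1.0, 1.0], k=2))
print(T_k([2.0, 1.0, 1.0, 1.0], k=2))
```

In the last call, $j=1$ fails because the tail sum $1+1+1=3$ exceeds $|x^\downarrow _1|=2$, so the search moves on to $j=2$, where the right bound is $+\infty $.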

Proof

First, we suppose that (23) is not true for $j\in \{1,2,\dots , k-1\}$, i.e., either

$$\begin{aligned} |x_{k-j+1}^\downarrow |>\frac{1}{j}\sum _{i=k-j+1}^N |x_i^\downarrow |, \end{aligned}$$

(24)

or

$$\begin{aligned} \frac{1}{j}\sum _{i=k-j+1}^N |x_i^\downarrow |> |x_{k-j}^\downarrow |, \end{aligned}$$

(25)

or both. We prove by induction that if (23) is not true $\forall j \in \{1,2,\dots ,k-1\}$, then (24) is false, and (25) is true. We investigate the case $j=1$:

$$\begin{aligned} \sum _{i=k}^N|x_i^\downarrow |=|x_k^\downarrow | +\sum _{i=k+1}^N|x_i^\downarrow |\ge |x_k^\downarrow |. \end{aligned}$$

(26)

The above inequality is obvious, and we can conclude that for $j=1$, (24) is false, and thus, (25) must be true, i.e.,

$$\begin{aligned} \sum _{i=k}^N |x_i^\downarrow |> |x_{k-1}^\downarrow |. \end{aligned}$$

(27)

We suppose that for some $j\in \{1,2,\dots ,k-1\}$, (24) is false and (25) is true, and we investigate $j+1$.

$$\begin{aligned} \frac{1}{j+1}\sum _{i=k-j}^N|x_i^\downarrow |&=\frac{1}{j+1}\left( |x_{k-j}^\downarrow | +\frac{j}{j}\sum _{i=k-j+1}^N|x_i^\downarrow |\right) \nonumber \\&>\frac{1}{j+1}\left( |x_{k-j}^\downarrow | +j |x_{k-j}^\downarrow |\right) =|x_{k-j}^\downarrow |. \end{aligned}$$

(28)

We get (28) since we have supposed (25) is true for j. Thus, by induction, we can conclude that (24) is false, and (25) is true $\forall j \in \{1,2,\dots ,k-1\}$.

Now, we investigate $j=k$:

$$\begin{aligned} \frac{1}{k}\sum _{i=1}^N|x_i^\downarrow |&=\frac{1}{k}\left( |x_{1}^\downarrow | +\frac{k-1}{k-1}\sum _{i=2}^N|x_i^\downarrow |\right) \nonumber \\&>\frac{1}{k}\left( |x_{1}^\downarrow | +(k-1) |x_{1}^\downarrow |\right) =|x_{1}^\downarrow |. \end{aligned}$$

(29)

We use the fact that (25) is true for $j=k-1$ to obtain the above inequality. Thus, (24) is false. By definition $x^\downarrow _0=+\infty $, and thus, (25) is also false. Thus, $T_k(x)=k$ verifies the double inequality in (23).

To conclude, either $T_k(x)=k$, or there exists $j\in \{1,2,\dots ,k-1\}$ such that $T_k(x)=j$. $\square $

Definition 4

Let $P^{(x)}\in {\mathbb {R}}^{N\times N}$ be a permutation matrix such that $P^{(x)}x=x^{\downarrow }$. The space ${\mathcal {D}}(x)$ is defined as:

$$\begin{aligned} {\mathcal {D}}(x)=\{b; \exists P^{(x)} \text { s.t. } P^{(x)}b=b^{\downarrow } \}. \end{aligned}$$

$z \in {\mathcal {D}}(x)$ means $\langle z,x\rangle =\langle z^{\downarrow },x^{\downarrow }\rangle $.

Remark 3

${\mathcal {D}}(x)={\mathcal {D}}(|x|)$, since we have $|x^{\downarrow }|=|x|^\downarrow $.

Proposition 4

Let $(a,b)\in {\mathbb {R}}_{\ge 0}^N\times {\mathbb {R}}_{\ge 0}^N$. Then,

$$\begin{aligned} \sum _i a_ib_i\le \sum _i a^{\downarrow }_i b^{\downarrow }_i \end{aligned}$$

and the inequality is strict if $b\notin {\mathcal {D}}(a)$.
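A small numerical illustration of the statement (not a proof): the Python sketch below compares the sorted pairing with every other pairing of two short nonnegative vectors. The function names and the test vectors are chosen for this example only.

```python
from itertools import permutations

def pairing(a, b):
    # sum_i a_i * b_i for a given ordering of b
    return sum(x * y for x, y in zip(a, b))

def sorted_pairing(a, b):
    # sum_i a_i^down * b_i^down (both sorted in descending order)
    return pairing(sorted(a, reverse=True), sorted(b, reverse=True))

def best_pairing(a, b):
    # Largest value of sum_i a_i * b_sigma(i) over all permutations sigma.
    return max(pairing(a, p) for p in permutations(b))

a = [3.0, 1.0, 2.0]
b = [0.5, 4.0, 1.0]   # not ordered like a, so b is not in D(a)

given = pairing(a, b)          # 3*0.5 + 1*4 + 2*1
upper = sorted_pairing(a, b)   # 3*4 + 2*1 + 1*0.5
best = best_pairing(a, b)
```

Here `given` is strictly smaller than `upper`, and no permutation of b does better than the sorted pairing.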

Proof

[34, Lemma 1.8] proves it without proving the strict inequality.

We assume that a is not of the form $a=t(1,1\dots ,1)^T$, i.e., there exist $i\ne j$ such that $a_i\ne a_j$. Indeed, if $a=t(1,1\dots ,1)^T$, then $b\in {\mathcal {D}}(a)$, and $\sum _i a_ib_i =\sum _i a^{\downarrow }_i b^{\downarrow }_i$. Moreover, for simplicity and without loss of generality, we suppose $a=a^{\downarrow }$. We write

$$\begin{aligned} \sum _{i=1}^Na_ib_i= a_N\sum _{i=1}^Nb_i +(a_{N-1}-a_N)\sum _{i=1}^{N-1}b_i+\dots +(a_1-a_2)b_1 . \end{aligned}$$

(30)

As it is obvious that $\forall \, j=1,\dots N$

$$\begin{aligned} \sum _{i=1}^j b_i\le \sum _{i=1}^j b^{\downarrow }_i, \end{aligned}$$

(31)

and since $a_{j-1}-a_j\ge 0\, \forall \, j$, we get

$$\begin{aligned} \sum _{i=1}^N a_ib_i\le \sum _{i=1}^N a_i b^{\downarrow }_i= \sum _{i=1}^N a^{\downarrow }_i b^{\downarrow }_i \end{aligned}$$

(32)

The goal of Proposition 4 is to show that the inequality in (32) is strict if $b\notin {\mathcal {D}}(a)$.

First, we remark that if $b\notin {\mathcal {D}}(a)$, then there exists $j_0\in \{2,3,\dots ,N\}$ such that

$$\begin{aligned} \sum _{i=1}^{j_0-1} b_i < \sum _{i=1}^{j_0-1} b^{\downarrow }_i. \end{aligned}$$

(33)

By contradiction, if (33) is not true, we have $\forall \, j\in \{2,3,\dots ,N\}$

$$\begin{aligned} \sum _{i=1}^{j-1}b^{\downarrow }_i \le \sum _{i=1}^{j-1}b_i, \end{aligned}$$

and with (31), we get

$$\begin{aligned} \sum _{i=1}^{j-1}b^{\downarrow }_i =\sum _{i=1}^{j-1}b_i. \end{aligned}$$

(34)

From (34), we easily obtain $\forall \, j,$

$$\begin{aligned} b_j=b^{\downarrow }_j, \end{aligned}$$

which means $b^{\downarrow }=b$, i.e., $b\in {\mathcal {D}}(a)$, which contradicts the hypothesis $b\notin {\mathcal {D}}(a)$. So there exists $j_0$ such that (33) is true, and if $a_{j_0-1}\ne a_{j_0}$

$$\begin{aligned} (a_{j_0-1}-a_{j_0})\sum _{i=1}^{j_0-1}b_i < (a_{j_0-1}-a_{j_0})\sum _{i=1}^{j_0-1}b^{\downarrow }_i, \end{aligned}$$

which, with (30), implies

$$\begin{aligned} \sum _{i=1}^N a_ib_i < \sum _{i=1}^N a_i b^{\downarrow }_i. \end{aligned}$$

It remains to examine the case where $a_{j_0-1}=a_{j_0}$. In this case, we claim there exists $j_1\in \{1,\dots ,j_0-2\}$ such that

$$\begin{aligned} \sum _{i=1}^{j_1}b_i <\sum _{i=1}^{j_1}b^{\downarrow }_i , \end{aligned}$$

(35)

or $j_1\in \{j_0,\dots ,N\}$ such that

$$\begin{aligned} \sum _{i=j_0}^{j_1}b_i< \sum _{i=j_0}^{j_1}b^{\downarrow }_i. \end{aligned}$$

(36)

If not, with the same proof as before we get

$$\begin{aligned} b^{\downarrow }_i=b_i \,\,\,\, i\in \{1,\dots ,j_{0}-2\} \cup \{j_0+1,\dots ,N\}, \end{aligned}$$

i.e., we have

$$\begin{aligned} \left( \begin{matrix} b^{\downarrow }_1 \\ b^{\downarrow }_2 \\ \vdots \\ b^{\downarrow }_{j_0-2} \\ x^{\downarrow }_1 \\ x^{\downarrow }_2\\ b^{\downarrow }_{j_0+1} \\ \vdots \\ b^{\downarrow }_N \end{matrix}\right) = \left( \begin{matrix} b_1\\ b_2\\ \vdots \\ b_{j_0-2} \\ x_1\\ x_2\\ b_{j_0+1}\\ \vdots \\ b_N \end{matrix}\right) \end{aligned}$$

where $(x_1,x_2)=(b_{j_0-1},b_{j_0})$ or $(b_{j_0},b_{j_0-1})$. The order does not matter since $a_{j_0-1}=a_{j_0}$. This implies that $b\in {\mathcal {D}}(a)$, which contradicts the hypothesis. So (35) or (36) is true and we get, for example,

$$\begin{aligned} (a_{j_1-1}-a_{j_1})\sum _{i=1}^{j_1-1} b_i < (a_{j_1-1}-a_{j_1})\sum _{i=1}^{j_1-1} b^{\downarrow }_i, \end{aligned}$$

and if $a_{j_1-1}-a_{j_1}\ne 0$ we deduce

$$\begin{aligned} \sum _i a_ib_i< \sum _i a_ib^{\downarrow }_i. \end{aligned}$$

(37)

If $a_{j_1-1}=a_{j_1}$, we repeat the same argument and proof as above, and we are sure to find an index $j_w$ such that $a_{j_w-1}-a_{j_w}\ne 0$ since we have supposed that $a\ne t(1,1,\dots ,1)^T$. Therefore, (37) is always true which concludes the proof. $\square $
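The inequality of Proposition 4 can be checked by brute force on a small example. The sketch below is only an illustration, not part of the proof: it assumes nonnegative vectors, takes $a$ already sorted in decreasing order with one tie, and enumerates every ordering of $b$. The helper names and the numerical values are hypothetical.

```python
from itertools import permutations


def inner(u, v):
    """Euclidean inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [5.0, 3.0, 3.0, 1.0]   # decreasing, with a tie a_2 = a_3
b = [2.0, 7.0, 4.0, 0.5]

# <a^down, b^down>: both vectors sorted in decreasing order.
best = inner(sorted(a, reverse=True), sorted(b, reverse=True))

# Inner product of a with every reordering of b.
values = [inner(a, p) for p in permutations(b)]

# Orderings of b that attain the bound; the tie in a allows two of them.
maximizers = sorted(p for p in permutations(b) if inner(a, p) == best)
```

No ordering of $b$ exceeds $<a^{\downarrow },b^{\downarrow }>$, and the bound is attained only by orderings that differ on the tied entries of $a$, which mirrors the remark in the proof that the order does not matter when $a_{j_0-1}=a_{j_0}$.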

Proposition 5

[38] The function $g:{\mathbb {R}}^N\rightarrow {\mathbb {R}}$ defined as $ g(x)=\frac{1}{2}\sum _{i=1}^k x_i^{\downarrow 2}$ is convex. Furthermore, note that $g(|x|)=g(x)$.

Lemma 4

Let $f_1:{\mathbb {R}}^N\times {\mathbb {R}}^N \rightarrow {\mathbb {R}}$ be defined as

$$\begin{aligned} f_1(z,x){:}{=} -\frac{1}{2}\sum _{i=1}^k z^{\downarrow 2}_i+<z^{\downarrow },x^{\downarrow }>. \end{aligned}$$

Let us consider the concave problem

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|). \end{aligned}$$

(38)

Problem (38) has the following optimal arguments

$$\begin{aligned}&\mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|) = \{z; \exists \, P\in {\mathbb {R}}^{N\times N}\nonumber \\&\quad \text { a permutation matrix s.t. } Pz={\hat{z}}\}, \end{aligned}$$

(39)

where ${\hat{z}}$ is defined as

$$\begin{aligned} {\hat{z}}_j={\left\{ \begin{array}{ll}\frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| &{}\text { if } k\ge j\ge k-T_k(x)+1 \\ &{}\text { or if } j>k \text { and } x^{\downarrow }_j\ne 0\\ \left[ 0,\frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i|\right] &{}\text { if } j>k \text { and } x^{\downarrow }_j= 0\\ |x^{\downarrow }_j| &{}\text { if } j< k-T_k(x)+1. \end{array}\right. } \end{aligned}$$

(40)

We can remark that ${\hat{z}}={\hat{z}}^\downarrow $, and $T_k(x)$ is defined in Proposition 2. The value of the supremum problem is

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^{k-T_k(x)} x^{\downarrow 2}_i + \frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| \right) ^2. \end{aligned}$$

(41)

Proof

Problem (38) can be written as:

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} \sum _{i=1}^k|x^{\downarrow }_i|z^{\downarrow }_i -\frac{1}{2}\sum _{i=1}^k z^{\downarrow 2}_i+\sum _{i=k+1}^N |x^{\downarrow }_i|z^{\downarrow }_i . \end{aligned}$$

(42)

We remark that finding the supremum for $z^{\downarrow }_i \, , i>k$ reduces to finding the supremum of the following term, knowing that $z^{\downarrow }_i $ is upper bounded by $z^{\downarrow }_{i-1}$:

$$\begin{aligned} \sum _{i=k+1}^N |x^{\downarrow }_i| z^{\downarrow }_i . \end{aligned}$$

(43)

Let $z^{\downarrow }_k$ be a constant. The sum in (43) is nonnegative and increasing with respect to $z^{\downarrow }_j$, so the supremum is attained when $z^{\downarrow }_j$ reaches its upper bound, i.e., $z^{\downarrow }_j=z^{\downarrow }_{j-1}$ for all $j>k$ with $|x^{\downarrow }_j|\ne 0$. By recursion, $z^{\downarrow }_j=z^{\downarrow }_{k}$ for all $j>k$ with $|x^{\downarrow }_j|\ne 0$. When $|x^{\downarrow }_j|=0$ for some $j>k$, $z^{\downarrow }_j$ is multiplied by zero and can take any value between its lower and upper bounds, that is, between 0 and $z^{\downarrow }_k$. The supremum argument for (43) is therefore

$$\begin{aligned} z^{\downarrow }_i {\left\{ \begin{array}{ll} =z^{\downarrow }_k \text { if } |x^{\downarrow }_i| \ne 0\\ \in [0,z^{\downarrow }_k] \text { if } |x^{\downarrow }_i|=0 \end{array}\right. } \end{aligned}$$

(44)

Further, from (42), we observe that for $i<k$, the optimal argument is

$$\begin{aligned} z^{\downarrow }_i =\max (|x^{\downarrow }_i|,z^{\downarrow }_{i+1}). \end{aligned}$$

(45)

By recursion, we can write this as

$$\begin{aligned} z^{\downarrow }_i =\max (|x^{\downarrow }_i|,z^{\downarrow }_k). \end{aligned}$$

(46)

It remains to find the value of $z^{\downarrow }_k$.

Inserting (44) and (46) into (42), we obtain:

$$\begin{aligned}&\sup _{z^{\downarrow }_k} \sum _{i=1}^k|x^{\downarrow }_i|\max (|x^{\downarrow }_i|,z^{\downarrow }_k)-\frac{1}{2}\sum _{i=1}^k \max (|x^{\downarrow }_i|,z^{\downarrow }_k)^2\nonumber \\&\quad +\sum _{i=k+1}^N |x^{\downarrow }_i|z^{\downarrow }_k. \end{aligned}$$

(47)

To treat the term $\max (|x^{\downarrow }_i|,z^{\downarrow }_k)$, we introduce $j^*(k)= \sup _j \{j: z^{\downarrow }_k\le |x^{\downarrow }_j|\}$ , i.e., $j^*(k)$ is the largest index such that $|x^{\downarrow }_{j^*(k)}|\ge z^{\downarrow }_k$, and we define $x^{\downarrow }_0=+\infty $. Therefore, (47) is rewritten as:

$$\begin{aligned}&\sup _{z^{\downarrow }_k} \sum _{i=1}^{j^*(k)}|x^{\downarrow }_i|^2-\frac{1}{2}\sum _{i=1}^{j^*(k)} |x^{\downarrow }_i|^2 + \sum _{i=j^*(k)+1}^k |x^{\downarrow }_i| z^{\downarrow }_k\nonumber \\&\quad -\frac{1}{2}\sum _{i=j^*(k)+1}^k z^{\downarrow 2}_k+\sum _{i=k+1}^N |x^{\downarrow }_i|z^{\downarrow }_k. \end{aligned}$$

(48)

(48) is a concave problem, and the optimality condition yields

$$\begin{aligned} -\sum _{i=j^*(k)+1}^k z^{\downarrow }_k+\sum _{i=j^*(k)+1}^N |x^{\downarrow }_i| = 0. \end{aligned}$$

(49)

We define $\sum _{i=j^*(k)+1}^k 1 =S$. Then, $j^*(k)=k-S$ and

$$\begin{aligned} z^{\downarrow }_k=\frac{1}{S}\sum _{i=k-S+1}^N |x^{\downarrow }_i|. \end{aligned}$$

(50)

Furthermore, $j^*(k)=k-S$ is the largest index such that $|x^{\downarrow }_{j^*(k)}|\ge z^{\downarrow }_k$, so $|x^{\downarrow }_{k-S}|\ge z^{\downarrow }_k> |x^{\downarrow }_{k-S+1}|$. With (50), this translates to

$$\begin{aligned} |x^{\downarrow }_{k-S}|\ge \frac{1}{S}\sum _{i=k-S+1}^N |x^{\downarrow }_i| > |x^{\downarrow }_{k-S+1}|, \end{aligned}$$

which implies $S=T_k(x)$ (see Proposition 2). Note that if $j^*(k)=k$ (which amounts to saying $T_k(x)=1$), then the right-hand inequality above is not strict.

Now, assume $|x^{\downarrow }_{j^*(k)}|= z^{\downarrow }_k$. Then, the max function can return either $z^{\downarrow }_k$ or $|x^{\downarrow }_{j^*(k)}|$. If it is the latter, then the expression above is correct. In the former case, $\max (|x^{\downarrow }_{j^*(k)}|,z^{\downarrow }_k)=z^{\downarrow }_k$. We obtain

$$\begin{aligned} z^{\downarrow }_k=\frac{1}{T_k(x)+1}\sum _{i=k-T_k(x)}^N |x^{\downarrow }_i|. \end{aligned}$$

(51)

Furthermore, we use the fact that $|x^{\downarrow }_{j^*(k)}|= z^{\downarrow }_k$ and $j^*(k)=k-T_k(x)$, and develop (51) as:

$$\begin{aligned} z^{\downarrow }_k&=\frac{1}{T_k(x)+1}\left( |x^{\downarrow }_{k-T_k(x)}|+\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i|\right) \end{aligned}$$

(52)

$$\begin{aligned} (T_k(x)+1)z^{\downarrow }_k&= z^{\downarrow }_k+\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| \end{aligned}$$

(53)

$$\begin{aligned} T_k(x)z^{\downarrow }_k&=\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| \end{aligned}$$

(54)

$$\begin{aligned} z^{\downarrow }_k&=(40) \end{aligned}$$

(55)

The unique value of $z^{\downarrow }_k$ is given by (50). $\square $
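The quantities in Lemma 4 are easy to evaluate numerically. The following sketch computes $T_k(x)$, one optimal argument ${\hat{z}}$ from (40), and the supremum value (41), then compares the latter with $f_1({\hat{z}},|x|)$. It rests on two assumptions that are not stated in this appendix: $T_k(x)$ is taken to be the smallest $S\in \{1,\dots ,k\}$ satisfying the double inequality displayed in the proof (with $x^{\downarrow }_0=+\infty $), and the free entries of ${\hat{z}}$ (those with $j>k$ and $x^{\downarrow }_j=0$) are set to the upper end of their interval. All function names are hypothetical.

```python
import math


def abs_sorted(x):
    """Magnitudes of x in decreasing order, i.e. |x^down|."""
    return sorted((abs(v) for v in x), reverse=True)


def t_k(x, k):
    """Smallest S in 1..k with |x_{k-S}| >= (1/S) sum_{i>k-S} |x_i| >= |x_{k-S+1}|."""
    xs = abs_sorted(x)
    for s in range(1, k + 1):
        level = sum(xs[k - s:]) / s                      # (1/S) * tail sum
        upper = math.inf if s == k else xs[k - s - 1]    # |x_{k-S}|, x_0 = +inf
        if upper >= level >= xs[k - s]:
            return s
    raise ValueError("no admissible S found")


def z_hat(x, k):
    """One optimal argument from (40), already sorted in decreasing order."""
    xs = abs_sorted(x)
    t = t_k(x, k)
    level = sum(xs[k - t:]) / t
    return [xs[j] if j < k - t else level for j in range(len(xs))]


def sup_value(x, k):
    """Closed-form supremum (41)."""
    xs = abs_sorted(x)
    t = t_k(x, k)
    return 0.5 * sum(v * v for v in xs[:k - t]) + sum(xs[k - t:]) ** 2 / (2 * t)


def f1(z, x, k):
    """f_1(z, |x|) for nonnegative z: -1/2 sum_{i<=k} z_i^2 + <z^down, |x|^down>."""
    zs, xs = abs_sorted(z), abs_sorted(x)
    return -0.5 * sum(v * v for v in zs[:k]) + sum(a * b for a, b in zip(zs, xs))


x_example = [1.0, -3.0, 1.0]
print(t_k(x_example, 2), z_hat(x_example, 2), sup_value(x_example, 2))
```

For $x=(1,-3,1)^T$ and $k=2$ this gives $T_k(x)=1$ and ${\hat{z}}=(3,2,2)^T$, and the closed form (41) agrees with $f_1({\hat{z}},|x|)$. For $x=(1,1,1)^T$ and $k=2$ the first candidate $S=1$ fails and $T_k(x)=2$.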

Lemma 5

Let $x\in {\mathbb {R}}^N$ and $f_2:{\mathbb {R}}^N\times {\mathbb {R}}^N \rightarrow {\mathbb {R}}$ be defined as

$$\begin{aligned} f_2(y,x) =-\frac{1}{2}\sum _{i=1}^k y^{\downarrow 2}_i+<y,x>. \end{aligned}$$

The following concave supremum problem

$$\begin{aligned} \sup _{y\in {\mathbb {R}}^N } f_2(y,x) \end{aligned}$$

(56)

is equivalent to

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|). \end{aligned}$$

(57)

The arguments are such that ${\hat{y}}_i^\downarrow ={{\,\mathrm{sign}\,}}^*(x_i^{\downarrow {\hat{z}}}){\hat{z}}_i^\downarrow $.

Proof

Let ${\hat{z}}\in {\mathbb {R}}^N_{\ge 0}$ be the argument of the supremum in (57), ${\hat{y}}$ be such that ${\hat{y}}_i={{\,\mathrm{sign}\,}}(x_i){\hat{z}}_i$, and note that $f_2(y,x)=-g(y)+<y,x>$ with g defined as in Proposition 5 in “Appendix A.1.” First, $f_2(y,x)$ is a concave function in y (see Proposition 5). Furthermore, $f_2(y,x)$ is such that $-f_2(y,x)$ is coercive in y. Thus, a supremum exists. Further note that $g({\hat{y}})=g(|{\hat{y}}|)=g({\hat{z}})$. Then, the following sequence of equalities/inequalities completes the proof:

$$\begin{aligned} (57)&= \sup _{z\in {\mathbb {R}}^N_{\ge 0}}f_2(z,|x|)=-g({\hat{z}})\\&\quad +\sum _{i=1}^N {\hat{z}}_i|x_i| = -g({\hat{z}})+ \sum _{i=1}^N {{\,\mathrm{sign}\,}}(x_i){\hat{z}}_ix_i\\&= -g({\hat{y}})+\sum _{i=1}^N {\hat{y}}_ix_i \le (56)\\&= \sup _{y\in {\mathbb {R}}^N} f_2(y,x) \underset{<y,x>\le <|y|,|x|>}{\le } \sup _{y\in {\mathbb {R}}^N} f_2(|y|, |x|)\\&=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)= (57) \end{aligned}$$

$\square $

A.2 Proof of Lemma 1

Proof

Note that a similar problem has been studied in [1]. That work, however, deals with low-rank approximation and therefore does not face the problem of how to permute x, since it works with matrices. First, let ${\mathcal {D}}(x)$ be as defined in Definition 4.

We are interested in

$$\begin{aligned} \sup _{y\in {\mathbb {R}}^N } f_2(y,x), \end{aligned}$$

and its arguments, with $f_2$ defined in Lemma 5. From this lemma, we know that we can rather study

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|). \end{aligned}$$

Furthermore, from Lemma 4, we know the expression of $\sup _{z\in {\mathbb {R}}^N_{\ge 0}}f_1(z,|x|)$ and its arguments. We want to show that $\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)$, and to find a connection between the arguments of $f_2$ and $f_1$.

First, note that

$$\begin{aligned} \sup _{z \in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)\ge \sup _{z \in {\mathbb {R}}^N_{\ge 0} \cap {\mathcal {D}}(x)} f_2(z,|x|). \end{aligned}$$

(58)

From [34, Lemma 1.8] and Proposition 4, we have that $\forall (y,x) \in {\mathbb {R}}_{\ge 0}^N\times {\mathbb {R}}_{\ge 0}^N$:

$$\begin{aligned}<y,x>\le <y^{\downarrow },x^{\downarrow }>, \end{aligned}$$

and the inequality is strict if $y\notin {\mathcal {D}}(x)$, and thus

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)\le \sup _{z \in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|). \end{aligned}$$

(59)

Note that we have ${\mathcal {D}}(|x|)={\mathcal {D}}(x)$, then $\forall z \in {\mathcal {D}}(x)$, $f_2(z,|x|)=f_1(z,|x|)$ and:

$$\begin{aligned}&\sup _{z \in {\mathbb {R}}^N_{\ge 0} \cap {\mathcal {D}}(x)} f_2(z,|x|) = \sup _{z\in {\mathbb {R}}^N_{\ge 0}} \sum _{i=1}^N z^{\downarrow }_i|x^{\downarrow }_i|\nonumber \\&-\frac{1}{2}\sum _{i=1}^k z^{\downarrow 2}_i= \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|). \end{aligned}$$

(60)

Using inequalities (58) and (59) and connecting them to (60), we obtain

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)= & {} \sup _{z \in {\mathbb {R}}^N_{\ge 0} \cap {\mathcal {D}}(x)} f_2(z,|x|)\\\le & {} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|) \le \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|). \end{aligned}$$

The supremum of $f_2(z,|x|)$ is bounded above and below by the same value; thus, we have

$$\begin{aligned} \sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|) \end{aligned}$$

(61)

The $\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)$ is known from Lemma 4:

$$\begin{aligned}&\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|) = \frac{1}{2}\sum _{i=1}^{k-T_k(x)}x^{\downarrow 2}_i\nonumber \\&\quad +\frac{1}{2T_k(x)}\left( \sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i|\right) ^2 \end{aligned}$$

(62)

with the optimal arguments:

$$\begin{aligned}&\mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|) = \{z; \exists \, P\in {\mathbb {R}}^{N\times N}\nonumber \\&\text { a permutation matrix s.t. } Pz={\hat{z}}\}, \end{aligned}$$

(63)

where ${\hat{z}}$ is such that:

$$\begin{aligned} {\hat{z}}_j={\left\{ \begin{array}{ll}\frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i| &{}\text { if } k\ge j\ge k-T_k(x)+1 \\ &{}\text { or if } j>k \text { and } x^{\downarrow }_j\ne 0\\ \left[ 0,\frac{1}{T_k(x)}\sum _{i=k-T_k(x)+1}^N |x^{\downarrow }_i|\right] &{}\text { if } j>k \text { and } |x^{\downarrow }_j|= 0\\ |x^{\downarrow }_j| &{}\text { if } j< k-T_k(x)+1. \end{array}\right. } \end{aligned}$$

(64)

Now we are interested in the optimal arguments of $f_2$. Let $P^{(x)}$ be such that $P^{(x)}x=x^{\downarrow }$. We define $z^*=P^{(x)^{-1}} {\hat{z}}$. Evidently, $P^{(x)}z^*={\hat{z}}$, and since ${\hat{z}}$ is sorted by its absolute value, $P^{(x)}z^*=z^{* \downarrow }$, and thus, $z^*\in {\mathcal {D}}(x)$. Furthermore, from Lemma 4, $z^*$ is an optimal argument of $f_1$.

We have then $f_2(z^*,|x|)=f_1(z^*,|x|)=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)$. $z^*$ is therefore an optimal argument of $f_2$ since (61) shows the equality between the supremum value of $f_1$ and $f_2$.

We have shown that there exists ${\hat{z}}\in \mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)$, from which we can construct $z^*\in {\mathcal {D}}(x)$, an optimal argument of $f_2$. Now, by contradiction, we show that all optimal arguments of $f_2$ are in ${\mathcal {D}}(x)$. Assume ${\hat{z}} = \mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)$ and that ${\hat{z}}\notin {\mathcal {D}}(x)$. We can construct $z^*$, such that $z^{* \downarrow }={\hat{z}}^{\downarrow }$, and $z^*\in {\mathcal {D}}(x)$. We have then

$$\begin{aligned}&f_2(z^*,|x|)-f_2({\hat{z}},|x|)\\&\quad =-\frac{1}{2}\sum _{i=1}^k z^{*\downarrow 2}_i +<z^*,|x|> +\frac{1}{2}\sum _{i=1}^k {\hat{z}}_i^{\downarrow 2} -<{\hat{z}},|x|>\\&\quad =<z^*,|x|>-<{\hat{z}},|x|> =<z^{* \downarrow },|x^{\downarrow }|>\\&\qquad -<{\hat{z}},|x|> > 0. \end{aligned}$$

The last equality is due to $z^*\in {\mathcal {D}}(x)$, and the last inequality is from Proposition 4. Thus, ${\hat{z}}$ is not an optimal argument for $f_2$, and all optimal arguments of $f_2$ must be in ${\mathcal {D}}(x)$.

Thus, it suffices to study $\sup _{z\in {\mathbb {R}}^N_{\ge 0}\cap {\mathcal {D}}(x)}f_2(z,|x|)$, and from (60), we can rather study $f_1$ and construct all supremum arguments of $f_2$ from those of $f_1$:

$$\begin{aligned} \mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|) = P^{(x)^{-1}}{\hat{z}} \end{aligned}$$

(65)

where ${\hat{z}}$ is defined in (64). $\square $
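The role of the permutation $P^{(x)}$ and of the signs can be illustrated on a three-dimensional example. In the sketch below, the vector `z_hat` is the argument (64) for $x=(1,-3,1)^T$ and $k=2$, worked out by hand under the assumption $T_k(x)=1$. The helper names are hypothetical, and the sign convention at zero (taken as $+1$) is an assumption that plays no role in this example.

```python
def f2(y, x, k):
    """f_2(y, x) = -1/2 * (sum of the k largest y_i^2) + <y, x>."""
    top = sorted((v * v for v in y), reverse=True)[:k]
    return -0.5 * sum(top) + sum(a * b for a, b in zip(y, x))


def unsort(z_sorted, x):
    """Apply the inverse of a permutation sorting |x| decreasingly: z* = P^{-1} z_hat."""
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    z = [0.0] * len(x)
    for pos, i in enumerate(order):
        z[i] = z_sorted[pos]
    return z


x = [1.0, -3.0, 1.0]
k = 2
z_hat = [3.0, 2.0, 2.0]          # (64) for this x, computed by hand
abs_x = [abs(v) for v in x]

z_star = unsort(z_hat, x)        # entries of z_hat placed where |x| is largest
y_hat = [(-1.0 if xi < 0 else 1.0) * zi for xi, zi in zip(x, z_star)]

aligned = f2(z_star, abs_x, k)   # z* ordered like |x|
misaligned = f2(z_hat, abs_x, k) # same entries, not ordered like |x|
signed = f2(y_hat, x, k)         # signs of x restored, as in Lemma 5
```

Here `aligned` and `signed` both equal the supremum value (62), whereas `misaligned` is strictly smaller, in line with the strict inequality of Proposition 4.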

A.3 Calculation of Proximal Operator of $\zeta (x)$

As preliminary results, we state and prove the following two lemmas, Lemmas 6 and 7.

Lemma 6

Let $j:{\mathbb {R}}\rightarrow {\mathbb {R}}$ be a strictly convex and coercive function, let $w=\mathop {{\mathrm{arg\, min}}}\limits _t j(t)$, and let us suppose that j is symmetric with respect to its minimum, i.e., $j(w-t)=j(w+t)\, \forall t \in {\mathbb {R}}$. The problem

$$\begin{aligned} z=\mathop {{\mathrm{arg\, min}}}\limits _{b\le |t|\le a} j(t) \end{aligned}$$

with a and b positive, has the following solution:

$$\begin{aligned} z= {\left\{ \begin{array}{ll} w &{}\text { if } b\le |w|\le a\\ {{\,\mathrm{sign}\,}}^*(w)a &{}\text { if } |w|\ge a\\ {{\,\mathrm{sign}\,}}^*(w)b &{}\text { if } |w|\le b. \end{array}\right. } \end{aligned}$$

Proof

Since j is strictly convex and symmetric with respect to its minimum, $j(w+t_1)\le j(w+t_2)\,\forall |t_1|\le |t_2|$. Assume that $0<w\le b$. We can write $j(b)=j(w+\alpha )$, $\alpha > 0$ and $j(-b)=j(w+\beta ), \beta <0$. Since $w>0$, then $|\alpha |<|\beta |$, and thus, the minimum is reached with $z=b$ on the interval [b, a]. Similar reasoning can be used to prove the other cases. $\square $
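The three cases of Lemma 6 amount to clamping the magnitude of $w$ to $[b,a]$ while keeping its sign. A minimal sketch follows, assuming $0<b\le a$ and the convention $\mathrm{sign}^*(0)=+1$ (the paper defines $\mathrm{sign}^*$ elsewhere, so this convention is an assumption). The quadratic $j(t)=(t-w)^2$ used in the brute-force comparison is only an illustrative choice, and the function names are hypothetical.

```python
def sign_star(w):
    """Assumed convention: +1 for w >= 0 and -1 otherwise."""
    return 1.0 if w >= 0 else -1.0


def constrained_argmin(w, b, a):
    """Minimizer over b <= |t| <= a of a strictly convex j symmetric about w."""
    return sign_star(w) * min(max(abs(w), b), a)


def grid_argmin(w, b, a, steps=100):
    """Brute-force minimizer of j(t) = (t - w)^2 on a grid of the feasible set."""
    grid = [b + (a - b) * i / steps for i in range(steps + 1)]
    feasible = grid + [-t for t in grid]
    return min(feasible, key=lambda t: (t - w) ** 2)


for w in (0.3, -3.0, 1.5):
    print(w, constrained_argmin(w, 1.0, 2.0), grid_argmin(w, 1.0, 2.0))
```

With $b=1$ and $a=2$, the values $w=0.3$, $w=-3$ and $w=1.5$ cover the three cases of the lemma, and the closed form agrees with the grid search in each.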

Lemma 7

Let $g_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\, , i\in [1..N]$ be strictly convex and coercive. Let $w=(w_1,w_2,\dots w_N)^T=\mathop {{\mathrm{arg\, min}}}\limits _{t_i} \sum g_i(t_i)$, i.e., $w_i=\mathop {{\mathrm{arg\, min}}}\limits _{t_i}g_i(t_i)$. Assume that $|w_1|\ge |w_2|\ge \dots \ge |w_k|$ and $|w_{k+1}|\ge |w_{k+2}|\ge \dots \ge |w_N|$. Let $g_i$ be symmetric with respect to its minimum. Consider the following problem:

$$\begin{aligned} \mathop {{\mathrm{arg\, min}}}\limits _{|t_1|\ge \cdots \ge |t_N|} \sum _{i=1}^N g_i(t_i). \end{aligned}$$

(66)

The optimal solution is

$$\begin{aligned} t_i(\tau )= {\left\{ \begin{array}{ll} {{\,\mathrm{sign}\,}}^*(w_i) \max (|w_i|,\tau ) &{}\text { if } 1\le i\le k \\ {{\,\mathrm{sign}\,}}^*(w_i)\min (|w_i|,\tau ) &{}\text { if } i> k \end{array}\right. } \end{aligned}$$

(67)

where $\tau \in {\mathbb {R}}$ is in $[\min (|w_k|,|w_{k+1}|),\max (|w_k|,|w_{k+1}|)]$ and is the value that minimizes $\sum g_i(t_i(\tau ))$.

Proof

Note that this proof is inspired by [20, Theorem 2], with some modifications. First, if $|w_k|\ge |w_{k+1}|$, then w satisfies the constraints in Problem (66), and thus, w is the optimal solution. If $|w_k|<|w_{k+1}|$, further analysis is needed. In both cases, since each $g_i$ is convex and symmetric with respect to its minimum, we can apply Lemma 6 to $t_i$, and the candidates reduce to the following:

$$\begin{aligned} t_i= {\left\{ \begin{array}{ll} w_i &{}\text { if } |t_{i-1}|\ge |w_i|\ge |t_{i+1}| \\ {{\,\mathrm{sign}\,}}^*(w_i)|t_{i+1}| &{}\text { if } |w_i|< |t_{i+1}| \\ {{\,\mathrm{sign}\,}}^*(w_i)|t_{i-1}| &{}\text { if } |w_i|> |t_{i-1}| \end{array}\right. } \end{aligned}$$

(68)

This can be rewritten in a shorter form, at first in the case where $i\le k$.

$$\begin{aligned} t_i={{\,\mathrm{sign}\,}}^*(w_i)\max {(|w_i|,|t_{i+1}|)}. \end{aligned}$$

(69)

This can be proved by recursion. In the case of $i=1$, $w_1$ is the optimal argument if $|w_1|\ge |t_2|$; otherwise, ${{\,\mathrm{sign}\,}}^*(w_1)|t_2|$ is optimal. Therefore, $t_1={{\,\mathrm{sign}\,}}^*(w_1)\max (|w_1|,|t_{2}|)$. Assume that this is true for the ith index.

$$\begin{aligned} t_{i+1}= {\left\{ \begin{array}{ll} w_{i+1} &{}\text { if } |t_{i}|\ge |w_{i+1}|\ge |t_{i+2}| \text { and } i+1\le k \\ {{\,\mathrm{sign}\,}}^*(w_{i+1})|t_{i+2}| &{}\text { if }| w_{i+1}|< |t_{i+2}|\text { and } i+1\le k \\ {{\,\mathrm{sign}\,}}^*(w_{i+1})|t_{i}| &{}\text { if } |w_{i+1}|> |t_{i}|\text { and } i+1\le k. \end{array}\right. } \end{aligned}$$

(70)

But $t_{i}={{\,\mathrm{sign}\,}}^*(w_i)\max (|w_{i}|,|t_{i+1}|)$, which yields $|t_{i}|\ge |w_{i}|\ge |w_{i+1}|$ and thus, the third case of (70) can be ignored.

Now assume for an $i\le k$ that $t_i\ne w_i$. This implies that

$$\begin{aligned} |t_i|=|t_{i+1}|>|w_i|. \end{aligned}$$

Since $|w_i|$ is non-increasing for $i\le k$, the inequality $|t_{i+1}|>|w_{i+1}|$ holds. Furthermore, $|t_{i+1}|= \max (|w_{i+1}|,|t_{i+2}|) = |t_{i+2}|$. By recursion, we have

$$\begin{aligned} |t_i|=|t_{i+1}|=|t_{i+2}|=\cdots =|t_k|. \end{aligned}$$

To simplify the notation, let $\tau =|t_k|$. The lemma is proved by inserting $\tau $ instead of $|t_{i+1}|$ and $|t_k|$ into Eq. (69).

When $i> k$, a similar proof of recursion gives:

$$\begin{aligned} t_i={{\,\mathrm{sign}\,}}^*(w_i)\min (|t_k|,|w_i|). \end{aligned}$$

(71)

and by adopting the notation $\tau $, we finish the proof. $\square $
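As a small worked instance of (67), consider the hypothetical choice $N=2$, $k=1$, $g_1(t)=(t-1)^2$ and $g_2(t)=(t-2)^2$; these functions are chosen only for illustration. Then $w=(1,2)^T$ violates the constraint $|t_1|\ge |t_2|$, and for $\tau \in [1,2]$:

$$\begin{aligned} t_1(\tau )&=\max (1,\tau )=\tau ,\qquad t_2(\tau )=\min (2,\tau )=\tau ,\\ \sum _{i=1}^2 g_i(t_i(\tau ))&=(\tau -1)^2+(\tau -2)^2,\\ \tau&=\tfrac{3}{2},\qquad t=\left( \tfrac{3}{2},\tfrac{3}{2}\right) ^T. \end{aligned}$$

The minimizing $\tau $ lies in $[\min (|w_1|,|w_2|),\max (|w_1|,|w_2|)]=[1,2]$, as the lemma states.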

Remark 4

Note that if w, defined in Lemma 7, is such that $|w_k|\ge |w_{k+1}|$, then w is a solution of (66).

Lemma 8

Let $y\in {\mathbb {R}}^N$. Define $\zeta : {\mathbb {R}}^N\rightarrow {\mathbb {R}}$ as $\zeta (x){:}{=}-(\frac{\rho -1}{\rho })\sum _{i=k+1}^N(x_i)^{\downarrow 2}$. The proximal operator of $\zeta $ is such that

$$\begin{aligned} \text {prox}_{\zeta (\cdot )}(y)^{\downarrow y} = {\left\{ \begin{array}{ll}{{\,\mathrm{sign}\,}}(y^{\downarrow }_i) \max {(|y^{\downarrow }_i|,\tau )} &{}\text { if } i\le k \\ {{\,\mathrm{sign}\,}}(y^{\downarrow }_i)\min (\tau ,|\rho y^{\downarrow }_i|) &{}\text { if } i>k.\\ \end{array}\right. } \end{aligned}$$

(72)

If $|y^{\downarrow }_k|<\rho |y^{\downarrow }_{k+1}|$, then $\tau $ is a value in the interval $[|y^{\downarrow }_k|,\rho |y^{\downarrow }_{k+1}|]$, and is defined as

$$\begin{aligned} \tau =\frac{\rho \sum _{i\in n_1}|y^{\downarrow }_i|+\rho \sum _{i\in n_2}|y^{\downarrow }_i|}{\rho \#n_1+ \#n_2} \end{aligned}$$

(73)

where $n_1$ and $n_2$ are two groups of indices such that $\forall \, i \in n_1, |y^{\downarrow }_i|<\tau $ and $\forall \, i \in n_2,\, \tau \le \rho |y^{\downarrow }_i|$, and $\#n_1$ and $\#n_2$ are the sizes of $n_1$ and $n_2$. To go from ${\text {prox}_{\zeta (\cdot )}}(y)^{\downarrow y}$ to $\text {prox}_{\zeta (\cdot )}(y)$, we apply the inverse of the permutation that sorts y to $y^{\downarrow }$.
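Formulas (72) and (73) can be turned into a short routine. The sketch below is one possible reading, not the authors' implementation: it assumes $\rho >1$ and $1\le k<N$, takes $n_1\subseteq \{1,\dots ,k\}$ and $n_2\subseteq \{k+1,\dots ,N\}$, and, instead of evaluating the closed form (73) directly, locates $\tau $ by bisection on a stationarity function `phi` whose root satisfies (73). All names are hypothetical.

```python
import math


def prox_sorted(y, k, rho, iters=200):
    """Evaluate (72) for y sorted so that |y[0]| >= |y[1]| >= ...

    Assumes rho > 1 and 1 <= k < len(y). Returns (prox, tau).
    """
    mags = [abs(v) for v in y]

    def phi(tau):
        # Stationarity function in tau; zero exactly when tau satisfies (73).
        grow = rho * sum(max(tau - m, 0.0) for m in mags[:k])
        shrink = sum(max(rho * m - tau, 0.0) for m in mags[k:])
        return grow - shrink

    # tau lies between |y_k| and rho * |y_{k+1}| (1-based indices).
    lo, hi = sorted((mags[k - 1], rho * mags[k]))
    for _ in range(iters):           # bisection: phi is nondecreasing in tau
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)

    out = []
    for i, v in enumerate(y):
        size = max(mags[i], tau) if i < k else min(tau, rho * mags[i])
        out.append(math.copysign(size, v))
    return out, tau


def prox(y, k, rho):
    """Sort y by magnitude, apply prox_sorted, then undo the sorting."""
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    sorted_out, tau = prox_sorted([y[i] for i in order], k, rho)
    out = [0.0] * len(y)
    for pos, i in enumerate(order):
        out[i] = sorted_out[pos]
    return out, tau


print(prox([0.8, -1.0, 0.9], 1, 2.0))
```

For $y=(0.8,-1,0.9)^T$, $k=1$ and $\rho =2$, formula (73) gives $\tau =2(1+0.9+0.8)/(2\cdot 1+2)=1.35$, and every entry of the result has magnitude $1.35$. When $|y^{\downarrow }_k|\ge \rho |y^{\downarrow }_{k+1}|$, the routine keeps the $k$ largest entries and multiplies the others by $\rho $.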

Note that we search

$$\begin{aligned}&\text {prox}_{-\left( \frac{\rho -1}{\rho }\right) \sum _{i=k+1}^N(\cdot )^{\downarrow 2}}(y)=\text {arg}\min _x -\frac{1}{2}\sum _{i=k+1}^N x^{\downarrow 2}_i \\&\quad + \frac{\rho }{2(\rho -1)}\left\| x-y\right\| _2^2 \end{aligned}$$

We define two functions, $l_1: {\mathbb {R}}^{N}\times {\mathbb {R}}^{N}\rightarrow {\mathbb {R}} $ and $l_2: {\mathbb {R}}^{N}\times {\mathbb {R}}^{N}\rightarrow {\mathbb {R}}$.

$$\begin{aligned} l_1(z,a)&=\frac{\rho }{2(\rho -1)}\sum _{i=1}^N (z_i-|a_i| )^2-\frac{1}{2}\sum _{i=k+1}^N z^{\downarrow 2}_i \end{aligned}$$

(74)

$$\begin{aligned} l_2(z,|a|)&=\frac{\rho }{2(\rho -1)}\sum _{i=1}^N (z^{\downarrow }_i-|a^{\downarrow }_i|)^2-\frac{1}{2}\sum _{i=k+1}^N z^{\downarrow 2}_i . \end{aligned}$$

(75)

As in Lemma 1, we can create relations between $l_1$ and $l_2$, where $l_2$ can be solved using Lemma 7.

We omit the proof as it is similar to the one of Lemma 1.
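As a consistency check on (73), and not as a substitute for the omitted proof, the value of $\tau $ can be recovered from a first-order condition. The sketch assumes $\rho >1$, takes $n_1\subseteq \{1,\dots ,k\}$ and $n_2\subseteq \{k+1,\dots ,N\}$ (one reading of the index sets in Lemma 8), and applies Lemma 7 to the separable terms of $l_2$ with $a=y$, written $g_i$ here:

$$\begin{aligned} g_i(t)&=\frac{\rho }{2(\rho -1)}(t-|y^{\downarrow }_i|)^2 \quad (i\le k),\qquad g_i(t)=\frac{\rho }{2(\rho -1)}(t-|y^{\downarrow }_i|)^2-\frac{1}{2}t^2 \quad (i>k),\\ g_i'(t)&=\frac{\rho (t-|y^{\downarrow }_i|)}{\rho -1} \quad (i\le k),\qquad g_i'(t)=\frac{t-\rho |y^{\downarrow }_i|}{\rho -1} \quad (i>k),\\ 0&=\sum _{i\in n_1}\frac{\rho (\tau -|y^{\downarrow }_i|)}{\rho -1}+\sum _{i\in n_2}\frac{\tau -\rho |y^{\downarrow }_i|}{\rho -1} \;\Longrightarrow \; \tau =\frac{\rho \sum _{i\in n_1}|y^{\downarrow }_i|+\rho \sum _{i\in n_2}|y^{\downarrow }_i|}{\rho \#n_1+\#n_2}. \end{aligned}$$

The unconstrained minimizers are thus $w_i=|y^{\downarrow }_i|$ for $i\le k$ and $w_i=\rho |y^{\downarrow }_i|$ for $i>k$, which accounts for the factor $\rho $ in the second case of (72).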

A.4 The Algorithm

About this article

Cite this article

Bechensteen, A.H., Blanc-Féraud, L. & Aubert, G. A Continuous Relaxation of the Constrained $\ell _2-\ell _0$ Problem. J Math Imaging Vis 63, 472–491 (2021). https://doi.org/10.1007/s10851-020-01014-y

Received: 22 April 2020
Accepted: 19 December 2020
Published: 09 January 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10851-020-01014-y
