1 Introduction

The projection of a vector onto the simplex or the \(l_1\) ball appears in imaging problems, such as segmentation [15] and multispectral unmixing [2], in portfolio optimization [5], and in many applications of statistics, operations research and machine learning [3, 7, 18–20]. Given an integer \(N\ge 1\), a sequence (vector) \({\mathbf {y}}=(y_n)_{n=1}^N\in \mathbb {R}^N\) and a real \(a>0\), we aim at computing

$$\begin{aligned} \mathscr {P}_{\varDelta }({\mathbf {y}}):=\mathop {\text {arg min}}\limits _{{\mathbf {x}}\in \varDelta } \Vert {\mathbf {x}}-{\mathbf {y}}\Vert \end{aligned}$$
(1)

or

$$\begin{aligned} \mathscr {P}_{\mathscr {B}}({\mathbf {y}}):=\mathop {\text {arg min}}\limits _{{\mathbf {x}}\in \mathscr {B}} \Vert {\mathbf {x}}-{\mathbf {y}}\Vert , \end{aligned}$$
(2)

where the norm is the Euclidean norm, the simplex \(\varDelta \subset \mathbb {R}^N\) is defined as the set of sequences whose elements are nonnegative and sum up to \(a\) (for \(a=1\), \(\varDelta \) is called the unit, or canonical, or standard, or probability simplex):

$$\begin{aligned} \varDelta :=\left\{ \textstyle (x_1,\ldots ,x_N)\in \mathbb {R}^N\ \big |\ \sum _{n=1}^N x_n=a \quad \text{ and }\quad x_n\ge 0,\quad \forall n=1,\ldots ,N\right\} \end{aligned}$$
(3)

and the \(l_1\) ball, a.k.a. cross-polytope, \(\mathscr {B} \subset \mathbb {R}^N\) is defined as

$$\begin{aligned} \mathscr {B}:=\textstyle \left\{ (x_1,\ldots ,x_N)\in \mathbb {R}^N\ \big |\ \sum _{n=1}^N |x_n|\le a\right\} . \end{aligned}$$

These two projections are well defined and unique, since \(\varDelta \) and \(\mathscr {B}\) are closed and convex sets. In this paper, we focus on algorithms to perform these projections exactly and in finite time. In Sect. 2, we review the methods of the literature. In Sect. 3, we propose a new algorithm and we show in Sect. 4 that it is faster than the existing methods.

2 Review of prior work

We first recall a well-known property, which makes it possible to project onto the \(l_1\) ball as soon as one can project onto the simplex:

Proposition 2.1

(see, e.g., [7, Lemma 3])

$$\begin{aligned} \mathscr {P}_{\mathscr {B}}({\mathbf {y}})=\left\{ \begin{array}{l} {\mathbf {y}},\ \text{ if }\ \sum \nolimits _{n=1}^N |y_n| \le a,\\ \big ({\mathrm {sgn}}(y_1)x_1,\ldots ,{\mathrm {sgn}}(y_N)x_N\big ),\ \text{ else }, \end{array}\right. \end{aligned}$$
(4)

where \({\mathbf {x}}=\mathscr {P}_\varDelta (|{\mathbf {y}}|)\), \(|{\mathbf {y}}|=(|y_1|,\ldots ,|y_N|)\), and \({\mathrm {sgn}}\) is the signum function: if \(t>0\), \({\mathrm {sgn}}(t)=1\), if \(t<0\), \({{\mathrm {sgn}}}(t)=-1\), and \({{\mathrm {sgn}}}(0)=0\).
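For concreteness, here is a minimal C sketch of this reduction. The routine project_simplex is a hypothetical name standing for any of the simplex-projection algorithms reviewed below; it is assumed to write \(\mathscr {P}_\varDelta ({\mathbf {y}})\) into its first argument.

```c
#include <math.h>
#include <stdlib.h>

/* Assumed available (hypothetical name): writes the projection of y onto the
   simplex {x : sum_n x_n = a, x_n >= 0} into x. */
void project_simplex(double *x, const double *y, int N, double a);

/* Projection onto the l1 ball of radius a, following Proposition 2.1. */
void project_l1ball(double *x, const double *y, int N, double a)
{
    double s = 0.0;
    int n;
    for (n = 0; n < N; n++)
        s += fabs(y[n]);
    if (s <= a) {                        /* y is already inside the ball */
        for (n = 0; n < N; n++)
            x[n] = y[n];
        return;
    }
    double *absy = (double *)malloc(N * sizeof(double));
    for (n = 0; n < N; n++)
        absy[n] = fabs(y[n]);
    project_simplex(x, absy, N, a);      /* x = P_Delta(|y|) */
    for (n = 0; n < N; n++)              /* restore the signs, as in (4) */
        x[n] = (y[n] > 0.0) ? x[n] : (y[n] < 0.0 ? -x[n] : 0.0);
    free(absy);
}
```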

Remark 2.1

Conversely, one can project onto the simplex using a projection onto the \(l_1\) ball. Indeed, \(\mathscr {P}_\varDelta ({\mathbf {y}})=\mathscr {P}_\varDelta ({\mathbf {y}}+c)\), for every constant \(c\in \mathbb {R}\) added to all the elements of \({\mathbf {y}}\), and \(\mathscr {P}_\varDelta ({\mathbf {y}})=\mathscr {P}_{\mathscr {B}}({\mathbf {y}})\) if the elements of \({\mathbf {y}}\) are nonnegative and \(\Vert {\mathbf {y}}\Vert _1\ge a\). Thus, for any \({\mathbf {y}}\), we have \(\mathscr {P}_\varDelta ({\mathbf {y}})=\mathscr {P}_{\mathscr {B}}({\mathbf {y}}-y_{\min }+a/N)\), where \(y_{\min }\) is the smallest element of \({\mathbf {y}}\).

So, by virtue of Proposition 2.1, we focus in the following on projecting onto the simplex only, and we denote by

$$\begin{aligned} {\mathbf {x}}:=\mathscr {P}_\varDelta ({\mathbf {y}}) \end{aligned}$$
(5)

the projected sequence. An important property of the projection \(\mathscr {P}_\varDelta \), which can be derived from the corresponding Karush–Kuhn–Tucker optimality conditions, is the following:

Proposition 2.2

(see, e.g., [10]) There exists a unique \(\tau \in \mathbb {R}\) such that

$$\begin{aligned} x_n=\max \{y_n-\tau ,0\},\quad \forall n=1,\ldots ,N. \end{aligned}$$
(6)

Therefore, the whole difficulty of the operation \(\mathscr {P}_\varDelta \) is to find the value of \(\tau \). Then, the projection itself simply amounts to applying the thresholding operation (6). So, we must find \(\tau \) such that \(\sum _{n=1}^N \max \{y_n-\tau ,0\}=a\). Only the largest elements of \({\mathbf {y}}\), which are larger than \(\tau \) and are not set to zero during the projection, contribute to this sum. So, if we knew the index set \(\mathscr {I}=\{n \ |\ x_n>0\}\), since \(\sum _{n=1}^N x_n=\sum _{n\in \mathscr {I}} x_n=\sum _{n\in \mathscr {I}} (y_n-\tau )=a\), we would have \(\tau =(\sum _{n\in \mathscr {I}} y_n-a)/|\mathscr {I}|\). Algorithm 1, given in Fig. 1, which is based on sorting the elements of \({\mathbf {y}}\) in decreasing order, naturally follows from these considerations. This algorithm is explicitly given in [10] and was rediscovered many times later. Depending on the choice of the sorting algorithm, the worst case complexity of Algorithm 1 is \(O(N^2)\) or \(O(N\log N)\) [1].
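As an illustration, here is a minimal C sketch of this sort-based approach, in the spirit of Algorithm 1 but not the exact pseudocode of Fig. 1 (the function name is ours), using the standard library qsort:

```c
#include <stdlib.h>
#include <math.h>

static int cmp_desc(const void *p, const void *q)
{
    double u = *(const double *)p, v = *(const double *)q;
    return (u < v) - (u > v);               /* sort in decreasing order */
}

/* Sort-based projection onto the simplex, in the spirit of Algorithm 1. */
void project_simplex_sort(double *x, const double *y, int N, double a)
{
    double *u = (double *)malloc(N * sizeof(double));
    double tau = 0.0, cumsum = 0.0;
    int n;
    for (n = 0; n < N; n++)
        u[n] = y[n];
    qsort(u, N, sizeof(double), cmp_desc);  /* u = y sorted decreasingly */
    for (n = 0; n < N; n++) {
        cumsum += u[n];
        double t = (cumsum - a) / (n + 1);  /* candidate value of tau */
        if (t < u[n])                       /* u[n] survives the thresholding */
            tau = t;
        else
            break;                          /* the remaining elements are <= tau */
    }
    for (n = 0; n < N; n++)
        x[n] = fmax(y[n] - tau, 0.0);       /* thresholding operation (6) */
    free(u);
}
```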

Fig. 1

Several algorithms to project onto the simplex \(\varDelta \). The input data consists of \(N\ge 1\), \({\mathbf {y}}\in \mathbb {R}^N\) and \(a>0\), and the output is the sequence \({\mathbf {x}}=(x_n)_{n=1}^N=\mathscr {P}_\varDelta ({\mathbf {y}})\). Incidentally, the number K at the end of the algorithms is the number of nonzero elements in \({\mathbf {x}}\)

An improvement of Algorithm 1 was proposed in [20], by noticing that it is not useful to sort \({\mathbf {y}}\) completely, since only its largest elements are involved in the determination of \(\tau \). Thus, a heap structure can be used: a heap \({\mathbf {v}}=(v_1,\ldots ,v_N)\) is a partially sorted sequence, such that its first element \(v_1\) is the largest and it is fast to re-arrange \((v_2,\ldots ,v_N)\) into a heap, with complexity \(O(\log N)\). The complexity of arranging the elements of \({\mathbf {y}}\) into a heap is O(N). This yields Algorithm 2, given in Fig. 1, whose complexity is \(O(N+K \log N)\), where \(K=|\mathscr {I}|\) is the number of nonzero elements in the solution \({\mathbf {x}}\).
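A minimal C sketch in the spirit of Algorithm 2 follows (again a reconstruction, not the exact pseudocode of Fig. 1): the elements are arranged into a max-heap in O(N), and only the largest ones, at most \(K+1\) of them, are extracted, each extraction costing \(O(\log N)\).

```c
#include <stdlib.h>
#include <math.h>

/* Restore the max-heap property of v[0..m-1], starting from position i. */
static void sift_down(double *v, int m, int i)
{
    for (;;) {
        int l = 2 * i + 1, r = l + 1, big = i;
        if (l < m && v[l] > v[big]) big = l;
        if (r < m && v[r] > v[big]) big = r;
        if (big == i) break;
        double t = v[i]; v[i] = v[big]; v[big] = t;
        i = big;
    }
}

/* Heap-based projection onto the simplex, in the spirit of Algorithm 2. */
void project_simplex_heap(double *x, const double *y, int N, double a)
{
    double *v = (double *)malloc(N * sizeof(double));
    double tau = 0.0, cumsum = 0.0;
    int n, m = N, k = 0;
    for (n = 0; n < N; n++)
        v[n] = y[n];
    for (n = N / 2 - 1; n >= 0; n--)       /* build the max-heap in O(N) */
        sift_down(v, N, n);
    while (m > 0) {
        double u = v[0];                   /* current largest element */
        cumsum += u;
        k++;
        double t = (cumsum - a) / k;       /* candidate value of tau */
        if (t < u)
            tau = t;
        else
            break;                         /* the remaining elements are <= tau */
        v[0] = v[--m];                     /* pop the root, O(log N) */
        sift_down(v, m, 0);
    }
    for (n = 0; n < N; n++)
        x[n] = fmax(y[n] - tau, 0.0);
    free(v);
}
```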

Another way of improving Algorithm 1 is based on the following observation: it is generally considered that the fastest sorting algorithm is quicksort, which uses partitioning with respect to an element called pivot [1]; the pivot is chosen, and the sequence to sort is split into the two subsequences of elements smaller and larger than the pivot, which are then sorted recursively. But we can notice that \(\tau \) does not depend on the ordering of the elements \(y_n\); it depends only on the sum of the largest elements. Thus, many operations in quicksort are not useful for our aim and can be omitted. Indeed, let us consider, at the first iteration of the algorithm, partitioning \({\mathbf {y}}\) with respect to some pivot value \(\rho \in [y_{\min },y_{\max }]\), where \(y_{\min }\) and \(y_{\max }\) are the minimum and maximum elements of \({\mathbf {y}}\): we define the subsequences \({\mathbf {y}}_{{\mathrm {low}}}\) and \({\mathbf {y}}_{{\mathrm {high}}}\) of elements of \({\mathbf {y}}\) smaller and larger than \(\rho \), respectively, and we set \(S:=\sum _{y\in {\mathbf {y}}_{{\mathrm {high}}}} y-a\). Then, if \(S/|{\mathbf {y}}_{{\mathrm {high}}}|\ge \rho \), we have \(\tau \ge \rho \), so that we can discard the elements of \({\mathbf {y}}_{{\mathrm {low}}}\) and continue with \({\mathbf {y}}_{{\mathrm {high}}}\) to determine \(\tau \). On the other hand, if \(S/|{\mathbf {y}}_{{\mathrm {high}}}|\le \rho \), we have \(\tau \le \rho \). Thus, we can discard the elements of \({\mathbf {y}}_{{\mathrm {high}}}\), keeping only the values S and \(|{\mathbf {y}}_{{\mathrm {high}}}|\) in memory, and continue with \({\mathbf {y}}_{{\mathrm {low}}}\) to determine \(\tau \) such that \(\sum _{y\in {\mathbf {y}}_{{\mathrm {low}}}} \max \{y-\tau ,0\} + S - |{\mathbf {y}}_{{\mathrm {high}}}|\tau =0\). This yields Algorithm 3, given in Fig. 1. We refer to the review paper [12] for references and discussions about this class of algorithms, which was popularized recently by the highly cited paper of Duchi et al. [7]. Before discussing the choice of the pivot, we highlight a major drawback of the algorithm given in [7], whose only difference with Algorithm 3 is that, at step 2.4., the elements of \({\mathbf {v}}\) equal to the pivot, except one, are left in \({\mathbf {v}}\) instead of being discarded. The pivot is chosen at random in \({\mathbf {v}}\). The worst case expected complexity (averaged over all choices of the pivot) of this algorithm is not O(N) as claimed in [7], but \(O(N^2)\). Indeed, expected linear time is guaranteed only if the elements of \({\mathbf {y}}\) are distinct. Since projection onto a simplex is often one operation among others in an iterative algorithm converging to a fixed point, and since sparsity of the solution is often a desirable property, it is likely that, in practice, the projection algorithm is fed with sequences in the simplex or close to it, thus containing many elements at zero. For instance, when applying the algorithm of [7] to \({\mathbf {y}}=(0,\ldots ,0,1)\), the complexity is \(O(N^2)\): the algorithm iterates over sequences of size N, \(N-1\), and so on until 1 is picked as pivot. This issue is corrected with our Algorithm 3, which has O(N) expected complexity when the pivot is chosen at random in \({\mathbf {v}}\), thanks to our careful handling of the elements equal to the pivot.
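A minimal C sketch of this partitioning strategy, reconstructed from the description above (the exact pseudocode of Algorithm 3 is in Fig. 1, and the bookkeeping of the elements equal to the pivot may differ in its details): the pair (s_acc, k_acc) stores the sum and the number of the elements already identified as greater than or equal to \(\tau \), whose individual values can then be forgotten.

```c
#include <stdlib.h>
#include <math.h>

/* Partition-based projection onto the simplex, in the spirit of Algorithm 3,
   with a random pivot taken in v. The elements discarded as "high" are kept
   only through their sum s_acc and their number k_acc. */
void project_simplex_pivot(double *x, const double *y, int N, double a)
{
    double *v = (double *)malloc(N * sizeof(double));
    double s_acc = 0.0;                     /* sum of elements known to be >= tau */
    int    k_acc = 0;                       /* number of such elements            */
    int    m = N, n;
    for (n = 0; n < N; n++)
        v[n] = y[n];
    while (m > 0) {
        double p = v[rand() % m];           /* random pivot */
        double sum_ge = 0.0;
        int    cnt_ge = 0, i, j = 0;
        for (i = 0; i < m; i++)
            if (v[i] >= p) { sum_ge += v[i]; cnt_ge++; }
        if ((s_acc + sum_ge - a) / (k_acc + cnt_ge) >= p) {
            /* tau >= pivot: keep only the elements strictly above the pivot */
            for (i = 0; i < m; i++)
                if (v[i] > p) v[j++] = v[i];
        } else {
            /* tau <= pivot: discard the elements >= pivot, remember their sum */
            s_acc += sum_ge;  k_acc += cnt_ge;
            for (i = 0; i < m; i++)
                if (v[i] < p) v[j++] = v[i];
        }
        m = j;                              /* m strictly decreases */
    }
    double tau = (s_acc - a) / k_acc;       /* k_acc >= 1 once v is empty */
    for (n = 0; n < N; n++)
        x[n] = fmax(y[n] - tau, 0.0);
    free(v);
}
```

Since the pivot itself and its duplicates are always removed from the active list, every iteration strictly shrinks the list, even when many elements are equal.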

Now, concerning the choice of the pivot, this is the same much-discussed problem as for the sorting algorithm quicksort and the selection algorithm quickselect, which are based on partitioning as well [1]. The choice depends on whether one is ready to make use of a random number generator and accept a fluctuating running time. It also depends on whether one is ready to accept the worst case \(O(N^2)\), which may come randomly with low probability or may be deliberately triggered by someone having knowledge of the implementation and feeding the algorithm with a contrived sequence \({\mathbf {y}}\), creating a security risk. Choosing the pivot at random in the list \({\mathbf {v}}\) gives expected complexity O(N), with some variance, but worst case complexity \(O(N^2)\). If the pivot is the median of \({\mathbf {v}}\), the complexity becomes O(N), which is optimal, but a linear time median finding routine, typically based on the median of medians [4], is cumbersome to implement and slow. According to [12], a good compromise, which we adopt in Sect. 4, is to take the median of \({\mathbf {v}}\) as pivot, but to find it using the efficient algorithm of [11], whose expected complexity (not worst case) in terms of comparisons is \(3N/2+o(N)\).

A different algorithm, of the variable-fixing type [13, 18, 19], has been proposed by Michelot [17]. It is reproduced as Algorithm 4 in Fig. 1. It can be viewed as a version of Algorithm 3, where the pivot \(\rho \) would always be known to be a lower bound of \(\tau \), so that the step 2.4. is always executed. Indeed, for every subsequence \({\mathbf {v}}\) of \({\mathbf {y}}\), by setting \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\), we have \(a=\sum _{y\in {\mathbf {v}}}(y-\rho )\le \sum _{y\in {\mathbf {v}}}\max \{y-\rho ,0\}\le \sum _{n=1}^N\max \{y_n-\rho ,0\}\). Therefore, \(\rho \le \tau \). Consequently, if \(y\le \rho \), we know that \(\max \{y-\tau ,0\}=0\) and we can discard the element y, which does not contribute to the determination of \(\tau \). By alternating between the calculation of \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\) and the update of the sequence \({\mathbf {v}}\), obtained by discarding its elements smaller than or equal to \(\rho \), the algorithm enters a steady state after a finite number of iterations (consisting of steps 2.1. and 2.2.), with \(\rho =\tau \). Algorithm 4 has several advantages: it is deterministic, very simple to implement, and independent of the ordering of the elements in \({\mathbf {y}}\). Its complexity is observed to be linear in practice [13]. Yet, its worst case complexity is \(O(N^2)\); this corresponds to the case where only one element is discarded from \({\mathbf {v}}\) at step 2.1. of every iteration. Examples of such pathological sequences can be easily constructed; one example is given in [6, end of Section 3].
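A minimal C sketch in the spirit of Michelot's Algorithm 4, reconstructed from this description (not the exact pseudocode of Fig. 1):

```c
#include <stdlib.h>
#include <math.h>

/* Variable-fixing projection onto the simplex, in the spirit of Michelot's
   Algorithm 4: alternate between computing rho and discarding the elements
   of v that are <= rho, until no element is discarded. */
void project_simplex_michelot(double *x, const double *y, int N, double a)
{
    double *v = (double *)malloc(N * sizeof(double));
    double rho = 0.0, s;
    int m = N, m_old, n, i, j;
    for (n = 0; n < N; n++)
        v[n] = y[n];
    do {
        m_old = m;
        s = 0.0;
        for (i = 0; i < m; i++)
            s += v[i];
        rho = (s - a) / m;                  /* lower bound of tau */
        for (i = 0, j = 0; i < m; i++)      /* discard the elements <= rho */
            if (v[i] > rho) v[j++] = v[i];
        m = j;
    } while (m < m_old);
    for (n = 0; n < N; n++)                 /* at the steady state, rho = tau */
        x[n] = fmax(y[n] - rho, 0.0);
    free(v);
}
```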

Yet another way to look at the problem is to view the search of \(\tau \) as a root finding problem [9, 16]. Let us define the function \(f:\rho \mapsto \sum _{n=1}^N \max \{y_n-\rho ,0\}-a\). We look for \(\tau \) such that \(f(\tau )=0\), so \(\tau \) is a root of f. f has the following properties: it is convex; it is piecewise linear with breakpoints at the \(y_n\); it is strictly decreasing on \((-\infty ,y_{\max }]\) and \(f(\rho )=-a\) for every \(\rho \in [y_{\max },+\infty )\). So, for any \(\rho \in \mathbb {R}\), if \(f(\rho )<0\), then \(\rho >\tau \) and if \(f(\rho )>0\), then \(\rho <\tau \). Moreover, \(f(y_{\min }-a/N)\ge 0\), \(f(y_{\max }-a/N)\le 0\) and \(f(y_{\max }-a)\ge 0\), so that \(\tau \in [\max \{y_{\max }-a,y_{\min }-a/N\},y_{\max }-a/N]\). Thus, Algorithm 3 may be interpreted as a bracketing method and Algorithm 4 as a Newton method [6, Proposition 1] to find the root \(\tau \). The method of [16] combines features from the bisection method and the secant method. However, the proof of [16] that the bisection method has complexity O(N) is not valid: for a fixed, arbitrarily small, value \(\delta >0\), the number of elements \(y_n\) in an interval of size \(\delta \) bracketing \(\tau \) may be as large as N, so that the number of bisection steps, each of complexity O(N), may be arbitrarily large. Finally, we note that projection onto the simplex is a particular case of the more general continuous quadratic knapsack problem; most methods proposed to solve this problem are based on the principles discussed in this section [12, 13] and we refer to the survey papers [18, 19] for a complete annotated list of references.
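For reference, here is a small C function evaluating f, which is the basic building block of any bracketing or root-finding approach; the initial bracket \([\max \{y_{\max }-a,\,y_{\min }-a/N\},\ y_{\max }-a/N]\) follows from the inequalities above. The function name is ours.

```c
#include <math.h>

/* f(rho) = sum_n max{y_n - rho, 0} - a. It is convex, piecewise linear with
   breakpoints at the y_n, strictly decreasing on (-inf, max(y)], and its
   unique root is tau. Each evaluation costs O(N). */
double simplex_residual(double rho, const double *y, int N, double a)
{
    double s = -a;
    int n;
    for (n = 0; n < N; n++)
        if (y[n] > rho)
            s += y[n] - rho;
    return s;
}
```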

3 Proposed algorithm

Using the principles seen in the previous section, we are in a position to explain the proposed algorithm, given in Fig. 2, which can be viewed as a Gauss–Seidel-like variation of Algorithm 4. Indeed, the lower bound \(\rho \) of \(\tau \) can be updated not only after a complete scan of the sequence \({\mathbf {v}}\), but after every element of \({\mathbf {v}}\) is read. Let us first describe a simplified version of the algorithm, without the step 3. and with the steps 2.1., 2.2. and 2.3. replaced by “2.1. Add \(y_n\) to \({\mathbf {v}}\) and set \(\rho :=\rho +(y_n-\rho )/|{\mathbf {v}}|\).”

Fig. 2

Proposed algorithm to project onto the simplex \(\varDelta \). The input data consists of \(N\ge 1\), \({\mathbf {y}}\in \mathbb {R}^N\) and \(a>0\), and the output is the sequence \({\mathbf {x}}=(x_n)_{n=1}^N=\mathscr {P}_\varDelta ({\mathbf {y}})\). Incidentally, the number K at the end of the algorithm is the number of nonzero elements in \({\mathbf {x}}\)

The algorithm starts with the first pass (steps 1. and 2.), which does not assume any knowledge about \({\mathbf {y}}\). Let us consider that we are currently reading the element \(y_n\), for some \(2\le n\le N\), and that we have already read the previous elements \(y_r\), \(r=1,\ldots ,n-1\). We have a subsequence \({\mathbf {v}}\) of \((y_r)_{r=1}^{n-1}\), consisting of all the elements potentially larger than \(\tau \), and we maintain the variable \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\). We know that \(\rho \le \tau \). Hence, if \(y_n\le \rho \), then \(y_n\le \tau \), so we can ignore this element and do nothing. In the other case \(y_n>\rho \), we add \(y_n\) to \({\mathbf {v}}\), since \(y_n\) is potentially larger than \(\tau \), and \(\rho \) is assigned the new value of \((\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\), which is strictly larger than its previous value. Then we continue the pass with the next element \(y_{n+1}\). The pass is initialized with \({\mathbf {v}}=(y_1)\) and \(\rho =y_1-a\).

At the beginning of all the subsequent passes (step 4.), we have the subsequence \({\mathbf {v}}\) of all the elements of \({\mathbf {y}}\) potentially larger than \(\tau \). The difference with the beginning of the first pass is that we have calculated the value \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\); we will use it to remove elements from \({\mathbf {v}}\) sequentially. Let us consider that we are reading the element \(y\in {\mathbf {v}}\). If \(y>\rho \), we do nothing. Else, \(y\le \rho \le \tau \), so we remove this element from \({\mathbf {v}}\). Consequently, \(\rho \) is assigned the new value of \((\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\), which is larger than or equal to its previous value. Then we continue the pass with the next element of \({\mathbf {v}}\).

The proof of correctness of this algorithm is straightforward. At the end of every pass, either at least one element has been removed from \({\mathbf {v}}\), or \({\mathbf {v}}\) and \(\rho \) remain the same as after the previous pass. In the latter case, the elements of \({\mathbf {y}}\) which are not present in \({\mathbf {v}}\) are smaller than or equal to \(\rho \) and the elements in \({\mathbf {v}}\) (which are the \(|{\mathbf {v}}|\) largest elements of \({\mathbf {y}}\)) are larger than \(\rho \), so that \(a=\sum _{y\in {\mathbf {v}}} (y-\rho )=\sum _{n=1}^N \max \{y_n-\rho ,0\}\). Thus, from Proposition 2.2, \(\rho =\tau \).
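A minimal C sketch of the simplified version just described (without the step 3. and the \(y_n-a\) refinement discussed next; the full algorithm is given in Fig. 2):

```c
#include <stdlib.h>
#include <math.h>

/* Simplified version of the proposed algorithm: rho is updated after every
   element read; the y_n - a refinement and cleanup pass are omitted. */
void project_simplex_proposed_simple(double *x, const double *y, int N, double a)
{
    double *v = (double *)malloc(N * sizeof(double));
    double rho = y[0] - a;
    int m = 1, m_old, n, i, j;
    v[0] = y[0];
    for (n = 1; n < N; n++)                /* first pass, over y */
        if (y[n] > rho) {
            v[m++] = y[n];
            rho += (y[n] - rho) / m;       /* rho = (sum_v - a)/|v|, updated */
        }
    do {                                   /* subsequent passes, over v */
        m_old = m;
        for (i = 0, j = 0; i < m_old; i++)
            if (v[i] > rho)
                v[j++] = v[i];             /* keep the element */
            else {
                m--;                       /* remove the element, update rho */
                rho += (rho - v[i]) / m;
            }
    } while (m < m_old);
    for (n = 0; n < N; n++)                /* at the steady state, rho = tau */
        x[n] = fmax(y[n] - rho, 0.0);
    free(v);
}
```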

The first pass of the proposed algorithm, as given in Fig. 2, contains a refinement with respect to the algorithm just described. Every pass aims at calculating the best possible lower bound \(\rho \) of \(\tau \). Moreover, for every n, \(y_n-a\le \tau \) (this is the inequality \(\rho \le \tau \) of Sect. 2 applied to the singleton subsequence \({\mathbf {v}}=(y_n)\)). So, when reading \(y_n\), if \(y_n-a\) is larger than the current value of \(\rho \), we set \(\rho :=y_n-a\). But then a cleanup pass (step 3.) is necessary after the first pass, to restore the invariant properties that \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\) and that \({\mathbf {v}}\) contains all the elements of \({\mathbf {y}}\) larger than \(\rho \).

We end this section with a few comments on the complexity of the proposed algorithm. It is always faster than Algorithm 4, since after every pass, more elements of \({\mathbf {v}}\) are removed. Contrary to Algorithm 4, its complexity depends on the ordering of the elements in \({\mathbf {y}}\). In the most favorable case where \({\mathbf {y}}\) is sorted in decreasing order, the complexity is O(N), since at the end of the first pass, which behaves like step 2. of Algorithm 1, we have \(\rho =\tau \). The complexity is also O(N) if \({\mathbf {y}}\) is sorted in increasing order, since \(\rho =\tau \) after the second pass. However, the worst case complexity is \(O(N^2)\); like for Algorithm 4, an adversary sequence can be easily constructed.

4 Comparison of the algorithms

The complexity of the algorithms is summarized in Table 1. In Table 2, for one example, the number of elements not yet characterized as active or inactive is shown, after every pass of the iterative algorithms. This demonstrates the efficiency of the selection process of the proposed algorithm.

Table 1 Complexity of the algorithms to project onto the simplex, with respect to the length N of the data and number K of nonzero elements in the solution
Table 2 Number \(|{\mathbf {v}}|\) of elements in the sequence \({\mathbf {v}}\) after every pass of the algorithms, for one example with \(N=10^6\)

All the algorithms were implemented in C and carefully optimized. Attention was paid to numerical robustness as well, with all the averaged quantities like \((\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\) computed using the Welford–Knuth running mean algorithm [14]. The code is freely available on the website of the author. Note that we implemented the algorithms to maximize the execution speed, without taking care of the memory usage. For instance, we implemented Algorithm 3 with two auxiliary buffers of size N, to store the elements smaller and greater than the pivot. Using only one buffer of size N would be possible, by performing the partitioning with swaps that put the elements smaller and greater than the pivot in the first and second parts of the buffer, respectively (see the description of Program 6 in [1] for details). This would slow down the algorithm, however. The proposed algorithm only requires one memory buffer of size N to store the sequence \({\mathbf {v}}\), which is updated in place. As an aside, we remark that one could easily modify Algorithm 4 and the proposed algorithm so that they do not require any auxiliary memory buffer, except the one to store \({\mathbf {y}}\), which is replaced by \({\mathbf {x}}\) in place at the end of the algorithm; this would require reading the entire sequence \({\mathbf {y}}\) at every pass of the algorithm.
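For instance, the update of the running value \(\rho =(\sum _{y\in {\mathbf {v}}}y-a)/|{\mathbf {v}}|\) when an element is appended to \({\mathbf {v}}\) can be written as follows (a sketch of the idea, not the exact code used in the experiments):

```c
/* Welford-Knuth-style running-mean update: given rho = (s - a)/m over the
   current v, return the value (s + y - a)/(m + 1) obtained when y is
   appended to v, without ever forming the possibly large sum s. */
static double rho_append(double rho, double y, int m_new)   /* m_new = |v| + 1 */
{
    return rho + (y - rho) / m_new;
}
```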

When implementing projection onto the \(l_1\) ball, according to Proposition 2.1, the naive approach consists in doing a first pass to compute \(|{\mathbf {y}}|\) and to decide whether \({\mathbf {y}}\) is inside the \(l_1\) ball. With the proposed algorithm, this pass can be fused with its first pass (steps 1. and 2.), replacing \(y_n\) by \(|y_n|\) everywhere. Then, after step 2., a simple test is performed: if \(\rho \le 0\), \({\mathbf {y}}\) is inside the \(l_1\) ball, so the algorithm sets \({\mathbf {x}}={\mathbf {y}}\) and terminates. Else, \({\mathbf {y}}\) is outside the \(l_1\) ball and the algorithm continues with step 3.; the signs of the \(y_n\) are applied to the \(x_n\) at step 6.

The code was run on an Apple MacBook Pro laptop under OS X 10.9.4, with a 2.3 GHz Intel Core i7 CPU and 8 GB of RAM. The computation times for projecting onto the simplex, reported in Table 3, show that the proposed algorithm performs best overall, except in the particular case, of limited practical interest, where \({\mathbf {y}}\) is maximally sparse and exactly on the simplex. In Table 4, for projection onto the \(l_1\) ball, the proposed algorithm is compared to the improved bisection algorithm (IBIS) of Liu and Ye [16]; we used the C code of the authors (function eplb of their package SLEP), available online at http://www.public.asu.edu/~jye02/Software/SLEP/. As a result, the proposed algorithm is more than twice as fast as IBIS for large N.

Table 3 Computation times in seconds for projecting \({\mathbf {y}}\) onto the unit simplex (\(a=1\)), averaged over \(10^2\) (for \(N=10^6\)) and \(10^4\) (for \(N=10^3\) and \(N=20\)) realizations (the number in parentheses is the SD)
Table 4 Computation times in seconds for projecting \({\mathbf {y}}\) onto the unit \(l_1\) ball (\(a=1\)), averaged over \(10^2\) (for \(N=10^6\)) and \(10^4\) (for \(N=10^3\) and \(N=20\)) realizations (the number in parentheses is the SD)

5 Concluding remarks

We have provided a synthetic overview of the available algorithms to project onto the simplex or the \(l_1\) ball and we have proposed a new and faster algorithm. In some practical applications, the vector to project is of small length N and the projection is one operation among others, some of which have complexity \(O(N^2)\), like dense matrix-vector products; in such a case, the cost of the projection is negligible and all the projection algorithms are equally valid choices. By contrast, for large-scale problems like learning or classification over large dictionaries and in imaging, N is of the order of \(10^6\) or more; in such a case, all the operations have complexity O(N) or \(O(N\log N)\), and a 50x speedup of the proposed algorithm with respect to the naive sort-based algorithm does make a big difference.

We also note that the proposed algorithm can be used for matrix approximation problems, by applying it to eigenvalues or singular values, e.g., to project a symmetric matrix onto the spectahedron, which is the set of positive semidefinite matrices of unit trace and can be seen as a natural generalization of the unit simplex to symmetric matrices.

Future work will be focused on fast algorithms for optimization problems involving the simplex or the \(l_1\) ball. This includes inverse imaging problems regularized by a constraint on the total variation seminorm [8] and the computation of abundance maps in multispectral unmixing [2].