Abstract
Counterexamples to some old-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate, steepest descent with exact search or Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy’s gradient curves, convergence of Newton’s flow, finite length of Tikhonov path, convergence of central paths, or smooth Kurdyka–Łojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results. Given a decreasing sequence of positively curved \(C^k\) convex compact sets in the plane, we provide a level set interpolation of a \(C^k\) smooth convex function where \(k\ge 2\) is arbitrary. If the intersection is reduced to one point our interpolant has positive definite Hessian, otherwise it is positive definite out of the solution set. Furthermore, given a sequence of decreasing polygons we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
1.1 Questions and method
One of the goals of convex optimization is to provide a solution to a problem with stable and fast algorithms. The quality of a method is generally assessed by the convergence of sequences, rate estimates, complexity bounds, finite length of relevant quantities and other quantitative or qualitative ways.
Positive results in this direction are numerous and have been the object of intense research since decades. To name but a few: gradient methods e.g., [16, 31, 32], proximal methods e.g., [7], alternating methods e.g., [8, 39], path following methods e.g., [4, 33], Tikhonov regularization e.g. [24], semi-algebraic optimization e.g., [13, 15], decomposition methods e.g., [7, 8], augmented Lagrangian methods e.g., [11] and many others.
Despite this vast literature, some simple questions remain unanswered or just partly tackled, even for smooth convex coercive functions. Does the alternating minimization method, aka Gauss–Seidel method, converge? Does the steepest descent method with exact line search converge? Do mirror descent or Bregman methods converge? Does Newton’s flow converge? Do central paths converge? Is the gradient flow directionally stable? Do smooth convex functions have the Kurdyka–Łojasiewicz property?
In this article we provide a negative answer to all these questions.
Our work draws inspiration from early works of de Finetti [22], Fenchel [21], on convex interpolation, but also from Torralba’s PhD thesis [36] and the more recent [12], where some counterexamples on the Tikhonov path and Łojasiewicz inequalities are provided. The convex interpolation problem, see [22], is as follows: given a monotone sequence of convex setsFootnote 1 may we find a convex function interpolating each of these sets, i.e., having these sets as sublevel sets? Answers to these questions for continuous convex functions were provided by de Finetti, and improved by Fenchel [21], Kannai [25], and then used in [12, 36] for building counterexamples.
We improve this work by providing, for \(k\ge 2\) arbitrary, a general \(C^k\) interpolation theorem for positively curved convex sets, imposing at the same time the positive definiteness of its Hessian out of the solution set. An abridged version could be as follows.
Theorem
(Smooth convex interpolationFootnote 2) Let \(\left( T_i \right) _{i\in {\mathbb {Z}}}\) be a sequence of compact convex subsets of \({\mathbb {R}}^2\), with positively curved \(C^k\) boundary, such that \(T_i\subset \mathrm {int}\,T_{i+1}\) for all i in \({\mathbb {Z}}\). Then there exists a \(C^k\) convex function f having the \(T_i\) as sublevel sets with positive definite Hessian outside of the set:
We provide several additional tools (derivatives computations) and variants (status of the solution set, Legendre functions, Lipschitz continuity). Whether our result is generalizable to general smooth convex sequences, i.e., with vanishing curvature, seems to be a very delicate question whose answer might well be negative.
Our central theoretical result is complemented by a discrete approximate interpolation result “of order one" which is particularly well adapted for building counterexamples. Given a nested collection of polygons, one can indeed build a smooth convex function having level sets interpolating its vertices and whose gradients agree with prescribed normals.
Our results are obtained by blending parametrization techniques, Minkovski summation, Bernstein approximations and convex analysis.
As sketched below, our results offer the possibility of building counterexamples in convex optimization by restricting oneself to the construction of countable collections of nested convex sets satisfying some desirable properties. In all cases failures of good properties are caused by some curvature oscillations.
1.2 A digest of counterexamples
Counterexamples provided in this article can be classified along three axes: structural counterexamples,Footnote 3 counterexamples for convex optimization algorithms and ordinary differential equations.
In the following, the term “nonconverging” sequence or trajectory means, a sequence or a trajectory with at least two distinct accumulation points. Unless otherwise stated, convex functions have full domain.
The following results are proved for \(C^k\) convex functions on the plane with \(k\ge 2\).
Structural counterexamples
-
(i)
Kurdyka–Łojasiewicz: There exists a \(C^k\) convex function whose Hessian is positive definite outside its solution set and which does not satisfy the Kurdyka–Łojasiewicz inequality. This is an improvement on [12].
-
(ii)
Tikhonov regularization path: There exists a \(C^k\) convex function f such that the regularization path
$$\begin{aligned} x(r)= {{\,\mathrm{argmin}\,}}\left\{ f(y) + r \Vert y\Vert ^2:y\in {\mathbb {R}}^2\right\} , \,\,r\in (0,1) \end{aligned}$$has infinite length. This strengthens a theorem by Torralba [36].
-
(iii)
Central path: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, and c in \({\mathbb {R}}^2\) such that the central path
$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ \left\langle c, y \right\rangle + r h(y):y\in D\right\} \end{aligned}$$does not have a limit as \(r \rightarrow 0\).
Algorithmic counterexamples:
-
(iv)
Gauss–Seidel method (block coordinate descent): There exists a \(C^k\) convex function with positive definite Hessian outside its solution set and an initialization \( (u_0,v_0)\) in \({\mathbb {R}}^2\), such that the alternating minimization algorithm
$$\begin{aligned} u_{i+1}&= \mathop {\hbox {argmin}}\limits _{u \in {\mathbb {R}}} f(u, v_i) \\ v_{i+1}&= \mathop {\hbox {argmin}}\limits _{v \in {\mathbb {R}}}f(u_{i+1}, v) \end{aligned}$$produces a bounded nonconverging sequence \(((u_i,v_i))_{i\in {\mathbb {N}}}\).
-
(v)
Gradient descent with exact line search: There exists a \(C^k\) convex function f with positive definite Hessian outside its solution set and an initialization \(x_0\) in \({\mathbb {R}}^2\), such that the gradient descent algorithm with exact line search
$$\begin{aligned} x_{i+1}&= \mathop {\hbox {argmin}}\limits _{t \in {\mathbb {R}}} f(x_i + t \nabla f(x_i)) \end{aligned}$$produces a bounded nonconvergent sequence.
-
(vi)
Bregman or mirror descent method: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, a vector c in \({\mathbb {R}}^2\) and an initialization \(x_0\) in \((-1,1)^2\) such that the Bregman recursion
$$\begin{aligned} x_{i+1} = \nabla h^*(\nabla h(x_i) - c) \end{aligned}$$produces a nonconverging sequence. The couple \((h,\langle c,\cdot \rangle \)) satisfies the smoothness properties introduced in [6].
Continuous time ODE counterexamples:
-
(vii)
Continuous time Newton method: There exists a \(C^k\) convex function with positive definite Hessian outside its solution set, and an initialization \(x_0\) in \({\mathbb {R}}^2\) such that the continuous time Newton’s system
$$\begin{aligned} {\dot{x}}(t)&= - \left[ \nabla ^2 f(x(t))\right] ^{-1} \nabla f(x(t)), \,\,t\ge 0,\\ x(0)&= x_0 \end{aligned}$$has a solution approaching \({{\,\mathrm{argmin}\,}}f\) which does not converge.
-
(viii)
Directional convergence for gradient curves: There exists a \(C^k\) convex function with 0 as a unique minimizer and a positive definite Hessian on \({\mathbb {R}}^2{\setminus }\{0\}\), such that for any non stationary solution to the gradient system
$$\begin{aligned} {\dot{x}}(t)&= - \nabla f(x(t)) \end{aligned}$$the direction \(x(t) / \Vert x(t)\Vert \) does not converge.
-
(ix)
Hessian Riemannian gradient dynamics: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, a linear function f and a nonconvergent solution to the following system
$$\begin{aligned} {\dot{x}}(t) = - \nabla _H f(x(t)), \end{aligned}$$where \(H = \nabla ^2 h\) is the Hessian of h and \(\nabla _H f = H^{-1} \nabla f\) is the gradient of f in the Riemannian metric induced by H on \((-1,1)^2\).
Pathological sequences and curves Our counterexamples lead to sequences or paths in \({\mathbb {R}}^2\) which are related to a function f by a certain property (see examples above) and have a certain type of pathology. For illustration purposes, we provide sketches of the pathological behaviors we met in Fig. 1.
2 Preliminaries
Let us set \({\mathbb {N}}^*={\mathbb {N}}{\setminus }\{0\}\). For p in \({\mathbb {N}}^*\), the Euclidean scalar product in \({\mathbb {R}}^p\) is denoted by \(\langle \cdot ,\cdot \rangle \), otherwise stated the norm is denoted by \(\Vert \cdot \Vert \). Given subsets S, T in \({\mathbb {R}}^p\), and x in \({\mathbb {R}}^p\), we define
and the Hausdorff distance between S and T,
Throughout this note, the assertion “g is \(C^k\) on D” where D is not an open set is to be understood as “g is \(C^k\) on an open neighborhood of D”. Given a map \(G :X \mapsto A \times B\) for some space X, \([G]_1 :X \mapsto A\) denotes the first component of G.
2.1 Continuous convex interpolations
We consider a sequence of compact convex subsets of \({\mathbb {R}}^p\), \(\left( T_i \right) _{i \in {\mathbb {Z}}}\) such that \(T_{i+1} \subset \mathrm {int}\ T_i\). Finding a continuous convex interpolation of \(\left( T_i \right) _{i \in {\mathbb {Z}}}\) is finding a convex continuous function which makes the \(T_i\) a sequence of level sets. We call this process continuous convex interpolation. This questioning was present in Fenchel [21] and dates back to de Finetti [22], let us also mention the work of Crouzeix [18] revolving around this issue.
Such constructions have been shown to be realizable by Torralba [36], Kannai [25], using ideas based on Minkowski sum. The validity of this construction can be proved easily using the result of Crouzeix [18] which was already present under different and weaker forms in the works of de Finetti and Fenchel.
Theorem 1
(de Finetti–Fenchel–Crouzeix) Let \(f :{\mathbb {R}}^p \rightarrow {\mathbb {R}}\) be a quasiconvex function. The functions
are concave for all x in \({\mathbb {R}}^p\), if and only f is convex.
Our goal is to build smooth convex interpolation for sequences of smooth convex sets. To make such a construction we shall use nonlinear Minkowski interpolation between level sets.
We shall also rely on Bernstein approximation which we now describe.
2.2 Bernstein approximation
We refer to the monograph [28] by Lorentz.
The main properties of Bernstein polynomials to be used in this paper are the following:
-
Bernstein approximation is linear in its functional argument f and “monotone” which allows to construct an approximation using only positive combination of a finite number of function values.
-
There are precise formulas for derivatives of Bernstein approximation. They involve repeated finite differences. So approximating piecewise affine function with high enough degree leads to an approximation for which corner values of derivatives are controlled while the remaining derivatives are vanishing (up to a given order).
-
Bernstein approximation is shape preserving, which means in particular that approximating a concave function preserves concavity.
The main idea to produce a smooth interpolation which preserves level sets is depicted in Fig. 2 where we use Bernstein approximation to interpolate smoothly between three points and controlling the successive derivatives at the end points of the interpolation.
Let us now be specific. Given f defined on the interval [0, 1], the Bernstein polynomial of order \(d \in {\mathbb {N}}^*\) associated to f is given by
Derivatives and shape preservation: For any h in (0, 1) and x in \( [0,1-h]\), we set \(\Delta _h^1 f(x) = f(x+h) - f(x)\) and recursively for all k in \( {\mathbb {N}}^*\), \(\Delta _h^k f(x) = \Delta \left( \Delta _h^{k-1} f(x) \right) \). We fix \(d \ne 0\) in \({\mathbb {N}}\) and for \(h = \frac{1}{d}\) write \(\Delta _h^k = \Delta ^k\). Then for any \(m \le d\), we have
for any x in [0, 1]. If f is increasing (resp. strictly increasing), then \(\Delta ^1f(x) \ge 0\) (resp. \(\Delta ^1f(x) > 0\)) for all x and \(B'_d\) is positive (resp. strictly positive) and \(B_d\) is increasing (resp. strictly increasing). Similarly, if f is concave, then \(\Delta ^2 f(x) \le 0\) for all x so that \(B^{(2)}_d \le 0\) and \(B_d\) is concave. From (2), we infer
for x in [0, 1].
Approximation of piecewise affine functions: The following lemma will be extensively used throughout the proofs.
Lemma 1
(Smoothing of piecewise lines in \({\mathbb {R}}^p\)) Let \(q_0,q_1 \in {\mathbb {R}}^p\), \(\lambda _-<\lambda _0< \lambda _1 < \lambda _+\) and \(0<e_1,e_0<1\). Set \(\Theta = \left( q_0,q_1,,\lambda _-,\lambda _0,\lambda _1,\lambda _+, e_0,e_1 \right) \) and define \(\gamma _\Theta :[0,1] \mapsto {\mathbb {R}}^p\) through
The curve \(\gamma _\Theta \) in \({\mathbb {R}}^{p+1}\) is the affine interpolant between the points \(q_0\), \( (1 + e_0) q_0\), \((1 - e_1) q_1\) and \( q_1\). For any m in \( {\mathbb {N}}\), we choose d in \({\mathbb {N}}^*\) such that
We consider a Bernstein-like reparametrization of \({\tilde{\gamma }}_\Theta \) given by
Then the following holds, for any \(2 \le l \le m\), \({\tilde{\gamma }}_\Theta \) is \(C^m\) and
Furthermore, if \(\gamma _\Theta \) has monotone coordinates (resp. strictly monotone, resp. concave, resp. convex), then so has \({\tilde{\gamma }}_\Theta \).
Proof
Note that the dependence of \({\tilde{\gamma }}_\Theta \) in \((q_0,q_1)\) is linear so that the dependence of \({\tilde{\gamma }}_\Theta \) in \((q_0,q_1)\) is also linear. Hence \({\tilde{\gamma }}_\Theta \) is of the form \(\lambda \mapsto a(\lambda ) q_0 + b(\lambda )q_1\). We can restrict ourselves to \(p = 1\) since the general case follows from the univariate case applied coordinatewise.
If \(p=1\), then \({\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ A\), where \(A :\lambda \mapsto \frac{\lambda - \lambda _-}{\lambda _+-\lambda _-}\). We have \({\tilde{\gamma }}_\Theta (0) = q_0\), \({\tilde{\gamma }}_\Theta (1) = q_1\) and \(\Delta ^1 f_{\Theta }(0) = \frac{\lambda _+ - \lambda _-}{\lambda _0 - \lambda _-}\frac{e_0}{d} q_0\), \(\Delta ^1 f_{\Theta }\left( 1 - \frac{1}{d} \right) = \frac{\lambda _+ - \lambda _-}{\lambda _+ - \lambda _1}\frac{e_1}{d} q_1\) and \(\Delta ^{(l)} f_{\Theta }(0) = \Delta ^{(l)} f_{\Theta }\left( 1 - \frac{l}{d} \right) =0\). The results follow from the expressions in (1) and (2) and the chain rule for \({\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ A\).
The last property of \({\tilde{\gamma }}_\Theta \) is due to the shape preserving property of Bernstein approximation and the fact that \({\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ L\). \(\square \)
Remark 1
(a) [Affine image] Using the notation of Lemma 1, if \((\lambda _-,q_0)\), \((\lambda _0, (1 + e_0) q_0)\), \((\lambda _1, (1 - e_1) q_1)\) and \((\lambda _+, q_1)\) are aligned, then the interpolation is actually an affine function.
(b) [Degree of the interpolants] Observe that the degree of the Bernstein interpolant is connected to the slopes of the piecewise path \(\lambda \) by (4).
3 Smooth convex interpolation
Being given a subset S of \({\mathbb {R}}^p\), we denote by \(\mathrm {int}(S)\) its interior, \({\bar{S}}\) its closure and \(\mathrm {bd}\,S={\bar{S}}{\setminus } \mathrm {int}(S)\) its boundary. Let us recall that the support function of S is defined through
3.1 Smooth parametrization of convex rings
A convex ring is a set of the form \(C_1{\setminus } C_2\) where \(C_1\subset C_2\) are convex sets. Providing adequate parameterizations for such objects is key for interpolating \(C_1\) and \(C_2\) by some (regular) convex function.
The following assertion plays a fundamental role.
Assumption 1
Let \(T_-,T_+ \subset {\mathbb {R}}^2\) be convex, compact with \(C^k\) boundary (\(k \ge 2\)) and positive curvature. Assume that, \(T_- \subset \mathrm {int} (T_+)\) and \(0 \in \mathrm {int}(T_-)\).
The positive curvature assumption ensures that the boundaries can be parametrized by their normal, that is, for \(i=-,+\), there exists
such that the normal to \(T_i\) at \(c_i(\theta )\) is the vector \(n(\theta ) = (\cos (\theta ),\sin (\theta ))^T\) and \({\dot{c}}_i(\theta ) = \rho _{i}(\theta ) \tau (\theta )\) where \(\rho _i > 0\) and \(\tau (\theta ) = (-\sin (\theta ),\cos (\theta ))\). In this setting, it holds that \(c_i(\theta ) = \text{ argmax}_{y \in T_i} \left\langle n(\theta ), y\right\rangle \). The map \(c_i\) is the inverse of the Gauss map and is \(C^{k-1}\) (see [35] Section 2.5).
Lemma 2
(Minkowski sum of convex sets with positive curvature) Let \(T_-,T_+\) be as in Assumption 1 with normal parametrizations as above. For \(a,b\ge 0\) with \(a+b>0\) set \(T=aT_-+bT_+\).
Then T has positive curvature and its boundary is given by
with the natural parametrization \({\mathbb {R}}/2\pi {\mathbb {Z}}\,\ni \theta \rightarrow a\,c_-(\theta )+b\,c_+(\theta ).\)
Proof
We may assume \(ab>0\) otherwise the result is obvious. Let x be in \(\mathrm {bd}\,T\) and denote by \(n(\theta )\) the normal vector at x for a well chosen \(\theta \), so that \(x=\mathrm {argmax}\,\left\{ \langle y,n(\theta )\rangle \right\} : y \in T\}\). Observe that the definition of the Minkowski sum yields
so that
which implies by extremality of x that \(x= a c_-(\theta )+b c_+(\theta )\). Conversely, for any such x, \(n(\theta )\) defines a supporting hyperplane to T and x must be on the boundary of T. The other results follow immediately. \(\square \)
In the following fundamental proposition, we provide a smooth parametrization of the convex ring \(T_+{\setminus } \mathrm {int}\,T_-\). The major difficulty is to control tightly the derivatives at the boundary so that the parametrizations can be glued afterward to build smooth interpolants.
Proposition 1
(\(C^k\) parametrization of convex rings) Let \(T_-,T_+\) be as in Assumption 1 with their normal parametrization as above. Fix \(k\ge 2\), \(\lambda _-< \lambda _0< \lambda _1 < \lambda _+\) and \(0< e_0,e_1<1\). Choose d in \( {\mathbb {N}}^*\), such that
Consider the map
with \(\Theta = (c_-(\theta ), c_+(\theta ), \lambda _-, \lambda _0, \lambda _1, \lambda _+, e_0,e_1)\) and \({\tilde{\gamma }}\) as given by Lemma 1. Assume further that:
Then the image of G is \({\mathcal {R}}:=T_+ {\setminus } \mathrm {int}(T_-)\), G is \(C^k\) and satisfies, for any \(2 \le l \le k\) and any m in \( {\mathbb {N}}^*\),
Besides G is a diffeomorphism from its domain on to its image. Set \({\mathcal {R}}\ni x \mapsto (f(x), \theta (x))\) to be the inverse of G. Then f is \(C^k\) and in addition, for all x in \( {\mathbb {R}}\),
where all denominators are positive.
Remark 2
Note that G is actually well defined and smooth on an open set containing its domain. As we shall see it is also a diffeomorphism from an open set containing its domain onto its image.
Proof
Note that by construction, we have \(G(\lambda ,\theta ) = a(\lambda ) c_-(\theta ) + b(\lambda )c_+(\theta )\) for some polynomials a and b which are nonnegative on \([\lambda _-,\lambda _+]\). The formulas for the derivatives follow easily from this remark, the form of a and b and Lemma 1.
Set, for any \(\lambda \) in \( [\lambda _-,\lambda _+]\), \(T_\lambda = a(\lambda ) T_- + b(\lambda ) T_+\). The resulting set \(T_\lambda \) is convex and has a positive curvature by Lemma 2, and for \(\lambda \) fixed \(G(\lambda ,\cdot )\) is the inverse of the Gauss map of \(T_\lambda \), which constitutes a parametrization by normals of the boundary.
Assume that \(\lambda < \lambda '\), using the monotonicity assumption \(({\mathcal {M}})\), we have for any \(\theta ,\theta '\),
so that \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\). Furthermore, we have by definition of \(G(\lambda ',\theta ')\)
where the equality follows from the convexity of \(T_{\lambda '}\). By convexity and compactness, this entails that \(T_\lambda = \mathrm {conv}(\mathrm {bd}\,(T_\lambda )) \subset \mathrm {int}\,T_{\lambda '}\).
Let us show that the map G is bijective, first consider proving surjectivity. Let f be defined on \(T_+ {\setminus } \mathrm {int}(T_-)\) through
Since \(a(\lambda _+)=0, b(\lambda _+)=1\) this function is well defined and by compactness and continuity the infimum is achieved. It must hold that x belongs to \( \mathrm {bd}\,(T_{f(x)})\), indeed, if \(f(x) = \lambda _-\), then x belongs to \( \mathrm {bd}\,(T_-)\) and otherwise, if x is in \( \mathrm {int}(T_{\lambda '})\) for \(\lambda ' > \lambda _-\), then \(f(x) < \lambda '\). We deduce that x is of the form \(G(f(x),\theta )\) for a certain value of \(\theta \), so that G is surjective.
As for injectivity, we have already seen a first case, the monotonicity assumption \(({\mathcal {M}})\) ensures that \(\lambda \ne \lambda '\) implies \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\) for any \(\theta ,\theta '\). Furthermore, we have the second case, for any \(\lambda \) in \( [\lambda _-,\lambda _+]\) and any \(\theta \), \(G(\lambda ,\theta ) = \arg \max _{y\in T_\lambda } \left\langle y,n(\theta ) \right\rangle \) so that \(\theta \ne \theta '\) implies \(G(\lambda ,\theta ) \ne G(\lambda ,\theta ')\). So in all cases, \((\lambda ,\theta ) \ne (\lambda ',\theta ')\) implies that \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\) and G is injective.
Let us now show that the map G is a local diffeomorphism by estimating its Jacobian map.
Since \(0 \in \mathrm {int}(T_-)\), we have for any \(\lambda ,\theta \),
and both scalar products are positive so that \(a(\lambda ) + b(\lambda ) > 0\). Hence, for any \(\theta \) in \( {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),
with \(a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta ) > 0\).
Furthermore by assumption \(\lambda \rightarrow \max _{y \in T_\lambda } \left\langle y, n(\theta ) \right\rangle =\left\langle G(\lambda ,\theta ),n(\theta ) \right\rangle \) has strictly positive derivative in \(\lambda \), whence
We deduce that for any fixed \(\theta \), in the basis \((n(\theta ), \tau (\theta ))\), the Jacobian of G, denoted \(J_G\), is triangular with positive diagonal entries. More precisely fix \(\lambda , \theta \) and set \(x = G(\lambda , \theta )\) such that \(\lambda = f(x)\), \(\theta = \theta (x)\). In the basis \((n(\theta ), \tau (\theta ))\), we deduce from (8) that the Jacobian of G is of the form
where
It is thus invertible and we have a local diffeomorphism. We deduce that
We have \(J_G(\lambda ,\theta )^{-1} = J_{G^{-1}}(x)\) so that the first line is \(\nabla f(x)\) and second line is \(\nabla \theta (x)\), which proves the claimed expressions for gradients.
We also have \(d n(\theta ) / d\theta = \tau (\theta )\) so that
Differentiating the gradient expression, we obtain (\(\nabla \) denotes gradient with respect to x):
We have
where, for the last identity, we have used the fact that
We deduce that
We have
So that we actually get
This concludes the proof. \(\square \)
3.2 Smooth convex interpolation of smooth positively curved convex sequences
In this section, we consider an indexing set I with either \(I = {\mathbb {N}}\) or \(I = {\mathbb {Z}}\), and an increasing sequence of compact convex sets \(\left( T_i \right) _{i\in I}\) such that for any i in I, the couple \(T_+ := T_{i+1}\), \(T_-: = T_i\) satisfies Assumption 1. In particular, for each i in I, \(T_i\) is compact, convex with \(C^k\) boundary and positive curvature. We denote by \(c_i\) the corresponding parametrization by the normal, \(T_i \subset \mathrm {int}\, T_{i+1}\). With no loss of generality we assume \(\displaystyle 0 \in \cap _{i \in I} T_i\).
This is our main theoretical result.
Theorem 2
(Smooth convex interpolation) Let \(I = {\mathbb {N}}\) or \(I = {\mathbb {Z}}\) and \(\left( T_i \right) _{i\in I}\) such that for any \(i \in I\), \(T_i \subset {\mathbb {R}}^2\) and the couple \(T_+ := T_{i+1}\), \(T_-: = T_i\) satisfies Assumption 1. Then there exists a \(C^k\) convex function
such that
(i) \(T_i\) is a sublevel set of f for all i in I.
(ii) We have
(iii) \(\nabla ^2 f \) is positive definite on \({\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}f\), and if \(I = {\mathbb {N}}\), it is positive definite throughout \({\mathcal {T}}\).
Proof
Preconditionning. We have that \(0 \in \cap _{i \in I } \mathrm {int}(T_i)\). Hence for any i in I and \(0 \le \alpha <1\), \(\alpha T_i \in \mathrm {int}(T_i)\). Furthermore, for \(\alpha >0\), small enough, \((1 + \alpha )T_{i} \subset (1-\alpha ) \mathrm {int}(T_{i+1})\). Set \(\alpha _0\) such that \((1 + \alpha _0)T_0 \subset (1-\alpha _0) \mathrm {int} (T_1)\). By forward (and backward if \(I = {\mathbb {Z}}\)) induction, for all i in I, we obtain \(\alpha _i>0\) such that
Setting for all i in I, \(\epsilon _{i+1} = \min \{\alpha _i,\alpha _{i+1}\}\) (\(\epsilon _0 = \alpha _0\) if \(I = {\mathbb {N}}\)), we have for all i in I
For all i in I, we introduce
We have a new sequence of strictly increasing compact convex sets \(\left( S_i\right) _{i \in I }\).
Value assignation. For each i in I, we set
Note that for all i in I, \(K_{3i} = 1\). We choose \(\lambda _{1} = 2\), \(\lambda _0 = 1\) and for all i in I,
By construction, we have for all i in I and all \(\theta \in {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\),
If \(I = {\mathbb {Z}}\), this entails
and the right-hand side is summable over negative indices \(i\le 0\), so that \(\lambda _{i} \rightarrow {\underline{\lambda }} \in {\mathbb {R}}\) as \(i \rightarrow -\infty \). In all cases \((\lambda _i)_{i\in I}\) is an increasing sequence bounded from below.
Local interpolation. We fix i in I and consider the function \(G_i\) described in Proposition 1 with \(T_+ = S_{3i + 3} = T_{i+1}\), \(T_- = S_{3i} = T_i\), \(\lambda _+ = \lambda _{3i+3}\), \(\lambda _{1} = \lambda _{3i+2}\), \(\lambda _0 = \lambda _{3i+1}\), \(\lambda _- = \lambda _{3i}\), \(e_0 = \epsilon _i\), \(e_1 = \epsilon _{i+1}\). By linearity, we have for any \((\lambda ,\theta ) \in [\lambda _-,\lambda _+] \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),
where \( {\tilde{\gamma }}_{\Theta }\) is as in Lemma 1 with input data \(q_0 = \left\langle c_{3i}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i}}(n(\theta ))\), \(q_1 = \left\langle c_{3i+3}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i+3}}(n(\theta ))\), and \(\lambda _-,\lambda _0,\lambda _1,\lambda _+,e_1,e_0\) as already described. This corresponds to the Bernstein approximation of the piecewise affine interpolation between the points
By construction of \(\left( K_i \right) _{i \in I }\), we have for all \(\theta \),
Whence the affine interpolant between points in (10) is strictly increasing and concave, and by using the shape preserving properties of Bernstein polynomials, \(\left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle \) has strictly positive derivative. As a consequence \(G_i\) is a diffeomorphism and its derivatives are as in Proposition 1. Furthermore
is a \(C^k\) concave function of \(\lambda \).
Global interpolation. Recall that \({\underline{\lambda }} = \inf _{i \in I} \lambda _i>-\infty \) and set \({\bar{\lambda }} = \sup _{i \in I} \lambda _i\in (-\infty ,+\infty ]\). For any \(\lambda \in ({\underline{\lambda }},{\bar{\lambda }})\), there exists a unique \(i_\lambda \in I\) such that \(\lambda \in [\lambda _{3i_\lambda }, \lambda _{3i_\lambda +3})\). Define
Fix i in I. The boundary of \(T_{i+1}\) is given by \(G_{i+1}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}}) = G_{i}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}})\) with actually
Since \(K_{3i} = 1\), we have
The expressions of the derivatives in Proposition 1 and (11) ensure that the derivatives of \(G_{i+1}\) and \(G_{i}\) agree on \({\lambda _{3i+3}} \times {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\) up to order k. Hence G is a local diffeomorphism. Bijectivity of each \(G_i\) ensure that G is also bijective and thus G is a diffeomorphism. Furthermore
is \(C^k\) piecewise concave and thus concave.
Extending G. If \(I = {\mathbb {N}}\), we may assume without loss of generality that \(S_0 = B\) the Euclidean ball and \(S_1 = 5/3 S_0\), which corresponds to \(\epsilon _0 = 2/3\), eventually after adding a set in the list and rescaling. Let \(\phi \) denote the function described in Lemma 10 and \(G_{-1}\) be described as in Lemma 11. This allows to extend G for \(\lambda \in [0,1]\), G is then \(C^k\) on \((0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\) by using Lemma 11 and Proposition 1. This does not affect the differentiability, monotonicity and concavity properties of G.
Defining the interpolant f. We assume without loss of generality that \({\underline{\lambda }} = 0\). We set f to be the first component of the inverse of G so that it is defined on \(G^{-1}\left( (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\right) \). We extend f as follows:
-
\(f(0) = 0\) if \(I = {\mathbb {N}}\),
-
\(f = 0\) on \(\cap _{i \in I } T_i\) if \(I = {\mathbb {Z}}\).
Since G is \(C^k\) and non-singular on \( (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\), the inverse mapping theorem ensures that f is \(C^k\) on \(\mathrm {int}({\mathcal {T}}) {\setminus } \arg \min _{{\mathcal {T}}} f\).
Convexity of f. For any \(\theta \) in \( {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),
is equal to \(\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \) which is concave. It can be extended at \(\lambda = 0\) by continuity. This preserves concavity hence, using Theorem 1, we have proved that f is convex and \(C^k\) on \({\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}_{{\mathcal {T}}}f\).
Smoothness around the argmin and Hessian positivity. If \(I = {\mathbb {N}}\), then the interpolant defined in Lemma 11 ensures that f is proportional to the norm squared around 0. Hence it is \(C^k\) around 0 with positive definite Hessian. We may compose f with the function \(g :t \mapsto \sqrt{t^2 + 1} + t\) which is increasing and has positive second derivative. This ensures that the resulting Hessian is positive definite outside \({{\,\mathrm{argmin}\,}}f\) and thus everywhere since
is positive definite thanks to the expressions for the Hessian of f in Proposition 1
If \(I = {\mathbb {Z}}\), we let all the derivatives of f vanish around the solution set. The smoothing Lemma 14 applies and provides a function \(\phi \) with positive derivative on \((0,+\infty )\), such that \(\phi \circ f\) is convex, \(C^k\) with prescribed sublevel sets. Furthermore, we remark that
We may compose \(\phi \circ f\) with the function \(g :t \mapsto \sqrt{t^2 + 1} + t\) which is increasing with positive second derivative, the expressions for the Hessian of f in Proposition 1 ensure that the resulting Hessian is positive definite out of \({{\,\mathrm{argmin}\,}}f\).\(\square \)
Remark 3
(Interpolation of symmetric rings) In view of Remark 1, if we have \(T_{i+1} = \alpha T_i\) for some \(0<\alpha < 1\) and i in \( {\mathbb {Z}}\), then the interpolated level sets between \(T_i\) and \(T_{i+1}\) are all of the form \(s T_i\) for \(\alpha \le s \le 1\).
Remark 4
(Strict convexity) Recall that strict convexity of a differentiable function amounts to the injectivity of its gradient. In Theorem 2 if there is a unique minimizer, then the invertibility of the Hessian outside argmin f ensures that our interpolant is strictly convex (note that this is automatically the case if \(I = {\mathbb {N}}\)).
3.3 Considerations on Legendre functions
The following proposition provides some interpolant with additional properties as global Lipschitz continuity and finiteness properties for the dual function. At this stage these results appear as merely technical but they happen to be decisive in the construction of counterexamples involving Legendre functions. The properties of Legendre functions can be found in [37, Chapter 6]. We simply recall here that, given a convex body C of \({\mathbb {R}}^p\), a convex function \(h:C\rightarrow {\mathbb {R}}\) is Legendre if it is differentiable on \(\mathrm {int}\,C\) and if \(\nabla h\) defines a bijection from \(\mathrm {int}\,C\) to \(\nabla h(\mathrm {int}\,C)\) with in addition
for all z in \(\mathrm {bd}\,C\). We also assume that \(\text{ epi }\,f:=\{(x,\lambda ):f(x)\le \lambda \}\) is closed in \({\mathbb {R}}^{p+1}\). The Legendre conjugate or dual function of h is defined through
for z in \({\mathbb {R}}^p\), and its domain is \(D:=\left\{ z\in {\mathbb {R}}^p:h(z)<+\infty \right\} .\) The function \(h^*\) is differentiable on the interior of D, and the inverse of \(\nabla h:\mathrm {int}\,C \rightarrow \mathrm {int}\,D\) is \(\nabla h^*:\mathrm {int}\,D \rightarrow \mathrm {int}\,C\).
We start with a simple technical lemma on the compactness of the domain of a Legendre function.
Lemma 3
Let \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) be a globally Lipschitz continuous Legendre function, and set \(D = \mathrm {int}(\mathrm {dom}(h^*))\) where \(h^* :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) is the conjugate of h. For each \(\lambda \ge \min _{{\mathbb {R}}^2}h\) let \(\sigma _\lambda \) be the support function associated to the set \(\left\{ z \in {\mathbb {R}}^2, \, h(z) \le \lambda \right\} \). The following are equivalent
-
(i)
\(h^*(x)\le 0\) for all \(x \in D\).
-
(ii)
For all \(y \in {\mathbb {R}}^2\), \(\sigma _{h(y)}(\nabla h(y)) \le h(y)\).
In both cases \(h^*\) has compact domain.
Proof
Let us establish beforehand the following formula
with \(y = \nabla h^* (z)\) and \(y\in {\mathbb {R}}^2\). Since \(h^*\) is Legendre, we have for all y in D,
We have, setting \(y = \nabla h^*(z)\)
because \(\nabla h(y)\) is normal to the sublevel set of h which contains y in its boundary. Hence we have \(h^*(z) = \sigma _{h(y)}(\nabla h(y))- h(y)\) with \(y = \nabla h^* (z)\), that is (12) holds. Since \(\nabla h^* :D \mapsto {\mathbb {R}}^2\) is a bijection, the equivalence follows. In this case the domain of \(h^*\) is closed because \(h^*\) is bounded and lower semicontinuous. The domain is also bounded by the Lipschitz continuity of h, whence compact. \(\square \)
Proposition 2
(On Legendre interpolation) Let \(\left( S_i \right) _{i\in {\mathbb {N}}}\) be such that for any i in I, \(T_- = S_{i}\), \(T_+ = S_{i+1}\) satisfy Assumption 1 and there exists a sequence \(\left( \epsilon _i \right) _{i \in {\mathbb {N}}}\) in (0, 1) such that for all \(i\ge 1\), \((1 - \epsilon _i)^{-1} S_{3i-1} = (1 +\epsilon _i)^{-1} S_{3i+1}=S_{3i}\).
Assume in addition that,
Then there exists a convex \(C^k\) function \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\), such that
-
For all i in \( {\mathbb {N}}\), \(S_{3i}\) is a sublevel set of h,
-
h has positive definite Hessian,
-
h is globally Lipschitz continuous,
-
\(h^*\) has a compact domain D and is \(C^k\) and strictly convex on \(\mathrm {int}(D)\).
Proof
The construction of h follows the exact same principle as that of Theorem 2. This ensures that the first two points are valid. Note that Eq. (13) implies that the sets sequence grows by at least a fixed amount in each direction as i grows. Hence we have \({\mathcal {T}}= {\mathbb {R}}^2\).
Global Lipschitz continuity of h: The values of h are defined through
so that \(K_i = 1 + O(1/i^3)\) thanks to Eq. (14). Note that the moderate growth assumption entails
For \(i\ge 1\), one has
On the other hand using the bounds (14), (13) and the identity (16), there exists a constant \(\kappa >0\) such that for all \(i \ge 1\), all \(\theta \) in \( {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\),
By the interpolation properties described in Lemma 1 the function \(\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \) constructed in Theorem 2 has derivative with respect to \(\lambda \) greater than \(\kappa \). Recalling the expression of the gradient as given in Proposition 1 (and the concavity of G with respect to \(\lambda \)), this shows that
for all x in \( {\mathbb {R}}^2\), and by the mean value theorem, h is globally Lipschitz continuous on \({\mathbb {R}}^2\).
Properties of the dual function: h is Legendre, its conjugate \(h^*\) is therefore Legendre. From the definiteness of \(\nabla ^2 h\) and the fact that \(\nabla h :{\mathbb {R}}^2 \mapsto \mathrm {int}(D)\) is a bijection, we deduce that \(h^*\) is \(C^k\) by the inverse mapping theorem. So the only property which we need to establish is that \(h^*\) has a compact domain, in other words, using Lemma 3, it is sufficient to show that \(\sup _{x \in \mathrm {int} D} h^*(x) \le 0\).
Using the notation of the proof of Theorem 2, we will show that it is possible to verify that, for all \(\lambda ,\theta \) in the domain of G
Equation (18) is indeed the coordinate form of the characterization given in Lemma 3. Let us observe that
and since G is concave, the right hand side is positive.
Assume that we have proved that,
has finite integral as \(\lambda \rightarrow \infty \). Since the function
is continuous on \({\mathbb {R}}/2\pi {\mathbb {Z}}\) for any \(\lambda \), Lebesgue dominated convergence theorem would ensure that
is continuous in \(\theta \), so that:
Shifting values if necessary, we could assume that this upper bound is equal to zero to obtain Eq. (18). The latter being the condition required in Lemma 3, we would have reached a conclusion.
Let us therefore establish that (19) is integrable over \({\mathbb {R}}_+\). Recall that G is constructed using the Bernstein interpolation given in Lemma 1 between successive values of \(\lambda \). As a result, for a fixed \(\theta \), the function \(\left\langle n(\theta ), G(\lambda ,\theta )\right\rangle \) is the interpolation of the piecewise affine function interpolating
as in Eq. (10). This interpolation is concave and increasing.
Assumption (14) ensures that \(K_j = 1 + O(1/j^3)\). Then
where \(\bar{K}\) is the finite, positive limit of the product (we can for example perform integral series comparison after taking the logarithm).
The recursion on the values writes for all \(i\ge 1\)
so that
This means that the gap between consecutive values tends to be constant. Thus by (4) in Lemma 1, see also Remark 1, the degree of the Bernstein interpolants is bounded. Using this bound together with inequality (3), providing bounds for the derivatives of Bernstein’s polynomial, ensure that, for all \(\lambda \) in \( [\lambda _{3i}, \lambda _{3i+3})\):
Now for any \(j = 3i+2, 3i+1\),
where the last identity follows from the triangle inequality because using \(\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) = O(1)\) in (15). Hence
where the last inequality follows from (15) and (14). Now as \(i \rightarrow \infty \), \(\lambda _{3i} \sim \lambda _{3i + 3} \sim i c\) for some constant \(c>0\) and
and
has finite integral as \(\lambda \rightarrow \infty \). This implies (20) and it concludes the proofs. \(\square \)
4 Smooth convex interpolation for sequences of polygons
Given a sequence of points \(A_1,\ldots ,A_n\), we denote by \(A_1\ldots A_n\) the polygon obtained by joining successive points ending the loop with the segment \([A_n,A_1]\). In the sequel we consider mainly convex polygons, so that the vertices \(A_1,\ldots ,A_n\) are also the extreme points.
The purpose of this section is first to show that polygons can be approximated by smooth convex sets with prescribed normals under weak assumptions. Figure 3 illustrates the result we would like to establish: given a target polygon with prescribed normals at its vertices, we wish to construct a smooth convex set interpolating the vertices with the desired normals and whose distance to the polygon is small.
Then given a sequence of nested polygons, we provide a smooth convex function which interpolates the polygons in the sense described just above.
Given a closed nonempty convex subset S of \({\mathbb {R}}^p\) and x in S, we recall that the normal cone to S at x is
Such vectors will often simply called normals (to S) at x.
4.1 Smooth approximations of polygons
Lemma 4
For any \(r_-,r_+ > 0\), \(t_- > 0\), \(t_+<0\) and \(\epsilon > 0\), \(m \in {\mathbb {N}}\), \(m \ge 3\), there exists a strictly concave polynomial function \(p :[0,1] \mapsto [0,\epsilon ]\) such that
Proof
Let us begin with preliminary remarks. Consider for any a, b in \( {\mathbb {R}}\), the function
For any t in \( {\mathbb {R}}\), q in \( {\mathbb {N}}\), \(q > 2\), and any \(c > 0\), we have
Choosing the degree d and constructing the polynomial. For any d in \( {\mathbb {N}}\), \(d \ge 2m + 1\), we set
and define the functions
Furthermore, we set
Note that \(b_-(d) \rightarrow t_- / r_-\), \(b_+(d) \rightarrow 1 + t_+ / r_+\), \(a_-(d) \rightarrow - r_-/2\) and \(a_+(d) \rightarrow - r_+/2\) as \(d \rightarrow \infty \) so that \(f_d \rightarrow f\) and \(g_d \rightarrow g\) uniformly on [0, 1]. For any d, \(f_d\) is concave increasing and \(g_d\) is concave decreasing and all of them are Lipschitz continuous on [0, 1] with constants that do not depend on d. Note also that \(f(0) = 0 < g(0)\) and \(g(1) = 0 < f(1)\). We choose \(d \ge 2m + 1\) such that
Such a d always exists because in both cases, the left hand side converges to 0 and the right hand side converges to a strictly positive term as d tends to \(\infty \). For such a d, we set \(h :s \mapsto \min \left\{ f_d(s), g_d(s),\epsilon \right\} \). By construction, h is concave, agrees with \(f_d\) on \(\left[ 0,\frac{m}{d} \right] \subset [0,1/2]\) and with \(g_d\) on \(\left[ 1- \frac{m}{d},1 \right] \subset [1/2,1]\). Using Eq. (23) with \(c = 1/d\), we deduce that
From the concavity of h and the derivative formula (2), we deduce that the polynomial \(B_{h,d}\) satisfies the desired properties (Fig. 4). \(\square \)
We deduce the following result
Lemma 5
Let \(a > 0\), \(r > 0\), \(\epsilon > 0\), and an integer \(m \ge 3\). Consider two unit vectors: \(v_-\) with strictly positive entries and \(v_+\) with first entry strictly positive and second entry strictly negative. Then there exists a \(C^m\) curve \(\gamma :[0,M] \mapsto {\mathbb {R}}^2\), such that
-
1.
\(\Vert \gamma '\Vert = 1\).
-
2.
\(\gamma (0) = (-a, 0) := A\) and \(\gamma (1) = (0,a) : = B\).
-
3.
\(\gamma '(0) = v_-\) and \(\gamma '(1) = v_+\).
-
4.
\(\Vert \gamma ''(0)\Vert = \Vert \gamma ''(-1)\Vert = r\).
-
5.
\(\mathrm {det}(\gamma ',\gamma '') < 0\) along the curve.
-
6.
\(\gamma ^{(q)}(0) = \gamma ^{(q)}(1) = 0\) for any \(3 \le q \le m\).
-
7.
\(\mathrm {dist}(\gamma ([0,M]), [A,B]) \le \epsilon \).
Proof
Consider the graph of a polynomial as given in Lemma 4 with \(t_- = v_-[2] / v_-[1]> 0\), \(t_+ = v_+[2] / v_+[1]< 0\) and \(r_- = \frac{r}{2a} (1 + t_-^2)^{\frac{3}{2}}\), \(r_+ = \frac{r}{2a} (1 + t_+^2)^{\frac{3}{2}}\) and \(\epsilon /2a\) as an approximation parameter. This graph is parametrized by t. It is possible to reparametrize it by arclength to obtain a \(C^m\) curve \(\gamma _0\) whose tangents at 0 is \(T_-\), at 1 is \(T_+\), and whose curvature at 0 and 1 is \(- \frac{r}{2a}\). Furthermore, \(\gamma _0\) has strictly negative curvature whence item 5. Consider the affine transform: \(x \mapsto 2a(x - 1/2)\), \(y \mapsto 2a y\). This results in a \(C^m\) curve \(\gamma \), parametrized by arclength which satisfies the desired assumptions. \(\square \)
Lemma 6
(Normal approximations of polygons by smooth convex sets) Let \(S=A_1...A_n\) be a convex polygon. For each i, let \(V_i\) be in \( N_S(A_i)\) such that the angle between \(V_i\) and each of the two neighboring faces is within \(\left( \frac{\pi }{2},\pi \right) \). Then for any \(\epsilon > 0\) and any \(m \ge 2\), there exists a compact convex set \(C \subset {\mathbb {R}}^2\) such that
-
(i)
the boundary of C is \(C^m\) with non vanishing curvature,
-
(ii)
\(S \subset C\),
-
(iii)
for any \(i = 1,\ldots , n\), \(A_i \) in \( \mathrm {bd}\,(C)\) and the normal cone to C at \(A_i\) is given by \(V_i\),
-
(iv)
\(\max _{y \in C} \mathrm {dist}(y,S) \le \epsilon \).
Proof
We assume without loss of generality that \(A_1,\ldots ,A_n\) are ordered clockwise. For \(i = 1, \ldots , n-1\), and each segment \([A_i,A_{i+1}]\), we may perform a rotation and a translation to obtain \(A_i = -(a,0)\) and \(A_{i+1} = (a,0)\). Working in this coordinate system, using the angle condition on \(V_i\), we may choose \(v^-_i\), \(v^+_{i+1}\) satisfying the hypotheses of Lemma 5 respectively orthogonal to \(V_i\) and \(V_{i+1}\). Choosing \(r = 1\), we obtain \(\gamma _i :[0,M_i] \mapsto {\mathbb {R}}^2\) as given by Lemma 5. Rotation and translations affect only the direction of the derivatives of curves, not their length. Hence, it is possible to concatenate curves \(\left( \gamma _i \right) _{i=1}^{n-1}\) and to preserve the \(C^m\) properties of the resulting curve. At end-points, tangents and second order derivatives coincide while higher derivatives vanish. Furthermore the curvature has constant sign and does not vanish. We obtain a closed \(C^m\) curve which defines a convex set which satisfies all the requirements of the lemma. \(\square \)
Remark 5
(Bissector) Given any polygon, choosing normal vectors as given by the direction of the bissector of each angles ensure that the above assumptions are satisfied. Hence all our approximation results hold given polygon without specifying the choice of outer normals.
4.2 Smooth convex interpolants of polygonal sequences
Definition 1
(Interpolability) For \(n\ge 3\), let \(A_1\ldots A_n \) be a convex polygon S and \(V_i \) be in \( N_S(V_i)\) for \(i=1,\ldots ,n\). We say that \(\left( A_i,V_i \right) _{i=1}^n \) is interpolable if for each \(i = 1,\ldots ,n\), the angle between \(V_i\) and each of the two neighboring faces of the polygon is in \(\left( \frac{\pi }{2},\pi \right) \). The collection \(\left( A_i,V_i \right) _{i=1}^n \) is called a polygon-normal pair.
Let \(I = {\mathbb {Z}}\) or \(I = {\mathbb {N}}\). Let \(\left( PN_i \right) _{i \in I}\) be a sequence of interpolable polygon-normal pairs. Setting for i in I, \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_j \) is in \( {\mathbb {N}}\) and denoting by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), we say that the sequence \(\left( PN_i \right) _{i \in I}\) is strictly increasing if for all i in I, \(T_i \subset \mathrm {int}(T_{i+1})\).
Let \(\left( PN_i \right) _{i \in I}\) be a strictly increasing sequence of interpolable polygon-normal pairs. A sequence \((\epsilon _i)_{i \in I}\) in (0, 1) is said to be admissible if \(0 \in \mathrm {int}(T_i)\) for each i in I and
for all \(\gamma \in [1-\epsilon _i,1+\epsilon _i].\) We have the following corollary of Theorem 2.
Corollary 1
(Smooth convex interpolation of polygon sequences) Let \(I = {\mathbb {Z}}\) or \(I = {\mathbb {N}}\). Let \(\left( PN_i \right) _{i \in I}\) be a strictly increasing sequence of interpolable polygon-normal pairs and \((\epsilon _i)_{i \in I}\) be admissible. Set \(\displaystyle {\mathcal {T}}:=\mathrm {int}\left( \cup _{i\in I}T_i \right) .\)
Then for any k in \( {\mathbb {N}}\), \(k \ge 2\) there exists a \(C^k\) convex function \(f :{\mathcal {T}} \mapsto {\mathbb {R}}\), and an increasing sequence \((\lambda _i)_{i \in I}\), with \(\inf _{i \in I} \lambda _i > -\infty \), such that for each i in I
-
(i)
\(T_i \subset \left\{ x,\,f(x) \le \lambda _i \right\} \).
-
(ii)
\(\mathrm {dist}(T_i, \left\{ x,\,f(x) \le \lambda _i \right\} ) \le \epsilon _i\).
-
(iii)
For each i in I, j in \(\{1,\ldots ,n_i\}\), we have \(f(A_{i,j}) = \lambda _i\) and \(\nabla f(x)\) is colinear to \(V_{i,j}\).
-
(iv)
\(\nabla ^2 f\) is positive definite outside \({{\,\mathrm{argmin}\,}}f\). When there is a unique minimizer then \(\nabla ^2f\) is positive definite throughout \({\mathcal {T}}\) (this is the case when \(I = {\mathbb {N}}\) or when \(I={\mathbb {Z}}\) and \(\cap _{i \in I} T_i\) is a singleton).
We add two remarks which will be useful for directional convergence issues and the construction of Legendre functions:
-
(a)
If two consecutive elements of the sequence of interpolable polygon-normal pairs are homothetic with center 0 in the interior of both polytopes, then the restriction of the resulting convex function to this convex ring can be constructed such that all the sublevel sets within this ring are homothetic with the same center.
-
(b)
If further conditions are imposed on the elements of a strictly increasing interpolable polygon-normal pair, then the resulting function can be constructed to be Legendre and globally Lipschitz continuous (that is, its Legendre conjugate has bounded support). This is a consequence of Proposition 2 and will be detailled in the next section.
4.3 More on Legendre functions and a pathological function with polyhedral domain
Using intensively polygonal interpolation, we build below a finite continuous Legendre function h on an \(\ell ^\infty \) square with oscillating “mirror lines": \(t\rightarrow \nabla h^*(\nabla h(x_0)+tc)\).
We start with the following preparation proposition related to the Legendre interpolation of Proposition 2.
Lemma 7
Let \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) be a strictly increasing sequence of interpolable polygon-normal pairs. Setting for i in \({\mathbb {N}}^*\), \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_j\) is in \({\mathbb {N}}^*\) and denoting by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), we assume that
where P is a fixed polygon which contains the unit Euclidean disk.
Then for any l in \({\mathbb {N}}\), \(l \ge 2\), there exists a strictly increasing sequence of sets \(\left( S_i \right) _{i \in {\mathbb {N}}, \,i\ge 2}\), such that for \(j \ge 1\),
-
\(S_{3j}\) interpolates the normals of \(PN_j\) in the sense of Lemma 6 with \(\mathrm {dist}(S_{3j}, T_j) \le 1 / (4(3j+2)^l)\)
-
\(S_{3j - 1} = \frac{3j - 1}{3_j} S_{3j}\)
-
\(S_{3j + 1} = \frac{3j + 1}{3_j} S_{3j}\)
This sequence has the following properties
-
there exists \(c>0\), such that for all j in \({\mathbb {N}}\), \(j \ge 3\) and for all unit vector x,
$$\begin{aligned} c\ge \sigma _{S_{j+1}}(x) - \sigma _{S_j}(x) \ge 1 - \frac{1}{(j+1)^l}. \end{aligned}$$(26) -
for all unit vector x,
$$\begin{aligned} \left| \frac{\sigma _{S_{j+1}}(x) - \sigma _{S_j}(x)}{ \sigma _{S_{j}}(x) - \sigma _{S_{j-1}}(x)} -1\right| \le \frac{1}{j^l},\,\forall j\ge 3. \end{aligned}$$(27)
Proof
Set for all j in \({\mathbb {N}}^*\), \(\delta _j = \frac{1}{4(3j+2)^l}\) and let \(S_{3j}\) be the \(\delta _j\) interpolant of \(T_j=3jP\) as given by Lemma 6 so that \(\mathrm {dist}(S_{3j}, 3jP) \le \delta _j\). Since P contains the unit ball,
Now set
For any j in \({\mathbb {N}}^*\), it is clear that \(S_{3j-1} \subset \mathrm {int}(S_{3j})\) and \(S_{3j} \subset \mathrm {int}(S_{3j+1})\). Furthermore, by (28), we have
so that we indeed have a strictly increasing sequence of sets. We obtain from the construction, for any j in \({\mathbb {N}}^*\), and any unit vector x,
This proves (26). We deduce that for all j in \({\mathbb {N}}^*,\)
Furthermore, using the fact that \(t \mapsto \frac{1+t}{1-t}\) is increasing on \((-\infty ,1)\) and the fact that \(\delta _{j+1} \le \delta _j\),
Setting \(s(t)= (1+t)/(1-t)\), we have, for all \(t \le 1/2\)
Thus \(s(t) \le 1 + 2 t + 12 t^2\) on \((-\infty ,1/2]\). Since \(\frac{4}{3} \delta _j \le \frac{4}{75} \le \frac{1}{2}\), we deduce from the previous remark and (30) above:
Finally using (29) again,
This proves the desired result. \(\square \)
Combining Lemma 7 and Proposition 2, we obtain the following result.
Theorem 3
Let \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) be a strictly increasing sequence of interpolable polygon-normal pairs. Set for i in \({\mathbb {N}}^*\), \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_i\) is in \({\mathbb {N}}^*\), denote by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), and assume that for all i in \( {\mathbb {N}}^*\), \(T_i=3i P\) where P is a fixed polygon which contains the unit disk in its interior.
Then for any k in \( {\mathbb {N}}\), \(k \ge 2\) and all \(l \ge 3\), there exists a \(C^k\) globally Lipschitz continuous Legendre function, \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\), and an increasing sequence \((\lambda _i)_{i \in {\mathbb {N}}}\), such that for each i in \( {\mathbb {N}}\):
-
\(T_i \subset \left\{ x,\,h(x) \le \lambda _i \right\} \),
-
\(\mathrm {dist}(T_i, \left\{ x,\,h(x) \le \lambda _i \right\} ) \le \frac{1}{4(3i+2)^l}\),
-
For any x with \(h(x) = \lambda _i\) and \(\nabla h(x)\) is colinear to \(V_i\) for each vertex x of \(T_i\),
-
h has positive definite Hessian and is globally Lipschitz continuous,
-
\(h^*\) has compact domain and is \(C^k\) on the interior of its domain.
Corollary 2
(Continuity on the domain) The function \(h^*\) constructed in Theorem 3 has compact polygonal domain and is continuous on this domain.
Proof
Since P is a polygon and contains the unit Euclidean disk, the gauge function of 3P is polyhedral with full domain \({\mathbb {R}}^2\), call it \(\omega \). Denote by \(P^{\circ }\) the polar of P. This is a polytope and since \(\omega \) is the gauge of P, we actually have \(\omega = \sigma _{P^{\circ }}\), the support function of the polar of P [35, Theorem 1.7.6]. Hence the the convex conjugate of \(\omega \) is the indicator of the polytope \(P^\circ \) [37, Theorem 13.2].
It can be easily seen from the proof of Proposition 2 that \(\lambda _i = \alpha i + r_i\) with \(r(i) = O(1)\) as \(i \rightarrow \infty \). Without loss of generality, we may suppose that \(\alpha = 1\) (this is a simple rescaling) so that there is a positive constant c such that \(|\lambda _i - i| \le c\) for all i.
Let h be given as in Theorem 3, fix \(i \ge 1\) and \(x \in {\mathbb {R}}^2\) such that \(\lambda _{i-1} \le h(x) \le \lambda _i\). We have in \({\mathbb {R}}^2\)
and hence
and we deduce that
Since i was arbitrary, this shows that there exists a constant \(C>0\) such that \(|h(x) - \omega (x)|\le C\) for all \(x \in {\mathbb {R}}^2\). Recall that \(z \mapsto \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y)\) is the indicator function of \(P^\circ \), hence,
which shows that the domain of \(h^*\) is actually \(P^{\circ }\) which is a polytope. Now, \(h^*\) is convex and lower semicontinuous on \(P^{\circ }\), invoking the results of [23], it is also upper semicontinuous on \(B^*\) and finally it is continuous on \(B^*\). \(\square \)
Corollary 3
(A pathological Legendre function) For any \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \) there exists a Legendre function \(h:{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) whose domain is a closed square, continuous on this domain and \(C^k\) on its interior, such that for all \(i \in {\mathbb {N}}^*\), \(\nabla h^*(i, 0)\) is proportional to \((\cos (\theta ), (-1)^i \sin (\theta ))\).
Proof
For \(x=(u,v)\), set \(\Vert x\Vert _1=|u|+|v|\), and let \(P = \left\{ x \in {\mathbb {R}}^2,\, \Vert x\Vert _1 \le 2 \right\} \). Let us construct a strictly increasing sequence of interpolable polygon-normal pairs \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) as follows, we fix \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \) and set for all \(i\in {\mathbb {N}}^*\) :
-
\(T_i = 3i P\), the polygon associated to the i-th term \(PN_i\) of the sequence,
-
except at the rightmost corner, consider the normals given by the canonical basis vectors and their opposite,
-
at the rightmost corner, (6i, 0), one chooses the normal given by the vector
$$\begin{aligned} (\cos (\theta ), (-1)^i \sin (\theta )). \end{aligned}$$
We now invoke Theorem 3 to obtain a Lipschitz continuous Legendre function, denoted \(h^*\), with full domain having all the \(T_i\) as sublevel sets and satisfying the hypotheses of the corollary. Rescaling by a factor 6 and setting \(h = h^{**}\) gives the result. \(\square \)
5 Counterexamples in continuous optimization
We are now in position to apply our interpolation results to build counterexamples to classical problems in convex optimization. We worked on situations ranging from structural questions to qualitative behavior of algorithms and ODEs. Through 9 counterexamples we tried to cover a large spectrum but there are many more possibilities that are left for future research. Some example are constructed from decreasing sequences of convex sets, they can be interpolated using Theorem 2 with \(I = {\mathbb {Z}}\), indexing the sequence with negative indices and adding artificially additional sets for positive indices. Nonetheless we sometimes depart from the notations of the first sections and index these sequences by \({\mathbb {N}}\) even though they are decreasing for simplification purposes.
5.1 Kurdyka-Łojasiewicz inequality may not hold
The following result is proved in [12], it was crucial to construct a \(C^2\) convex function which does not satisfy Kurdyka–Łojasiewicz (KL) inequality.
Lemma 8
[12, Lemma 35] There exists a decreasing sequence of compact convex sets \(\left( T_i \right) _{i\in {\mathbb {N}}}\) such that for any i in \( {\mathbb {N}}\), \(T_- = T_{i+1}\) and \(T_+ = T_i\) satisfy Assumption 1 and
As a corollary, we improve the counterexample in [12] and provide a \(C^k\) convex counterexamples for any \(k\ge 2 \) in \( {\mathbb {N}}\).
Corollary 4
(Smooth convex functions are not KL in general) There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) which does not satisfy KL inequality. More precisely, for any \(r > \inf f\) and \(\varphi :[\inf f, r] \mapsto {\mathbb {R}}\) continuous and differentiable on \((\inf f, r)\) with \(\varphi ' > 0\) and \(\varphi (\inf f) = 0\), we have
Proof
Using [35, Theorem 1.8.13], each \(T_i\) can be approximated up to arbitrary precision by a polygon. Hence we may assume that all \(T_i\) are polygonal while preserving the property of Lemma 8 as well as Assumption 1. Furthermore, using Lemma 6 and Remark 5 each \(T_i\) can in turn be approximated with arbitrary precision by a convex set with \(C^k\) boundary and positive curvature. Hence we may also assume that all \(T_i\) satisfy both the result of Lemma 8 and have \(C^k\) boundary with nonvanishing curvature. Reversing the order of the sets and adding additional sets artificially, we are in the conditions of application of Theorem 2 with \(I = {\mathbb {Z}}\) and the resulting f follows from the same argument as in [12, Theorem 36]. \(\square \)
5.2 Block coordinate descent may not converge
The following polygonal construction is illustrated in Fig. 5. For any \(n\ge 2\) in \({\mathbb {N}}\), we set
This defines a convex polygon. We may choose the normals at \(A_n, C_n, D_n, E_n\) to be bisectors of the corresponding corners and the normal at \(B_n\) to be horizontal (see Fig. 5). Rotating by an angle of \(-\frac{n\pi }{2}\) and repeating the process indefinitely, we obtain the sequence of polygons depicted in Fig. 5. It can be checked that the polygons form a strictly decreasing sequence of sets, as for \(n>1\), the polygon \(A_nB_nC_nD_nE_n\) is contained in the interior of the square \(A_{n-1}C_{n-1}D_{n-1}E_{n-1}\). This fulfills the requirement of Corollary 1.
Corollary 5
There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initialization \(x_0=(u_0,v_0)\) such that the recursion, for \(i \ge 1\)
produces a non converging sequence \(\displaystyle (x_i)_{i\in {\mathbb {N}}}=\left( (u_i,v_i)\right) _{i\in {\mathbb {N}}}\).
Proof
We apply Corollary 1 to the proposed decreasing sequence and by choosing \((u_0,v_0) = B_2\) for example. This requires to shift indices (start with \(i = 2\)) and use Theorem 2 with \(I = {\mathbb {Z}}\). Note that the optimality condition for partial minimization and the fact that level sets have nonvanishing curvature ensure that the partial minima are unique. \(\square \)
In the nonsmooth convex case cyclic minimization is known to fail to provide the infimum value, see e.g., [3, p. 94]. Smoothness is sufficient for establishing value convergence (see e.g. [10, 39] and references therein), whether it is enough or not for obtaining convergence was an open question. Our counterexample closes this question and shows that cyclic minimization does not yield converging sequences even for \(C^k\) convex functions. This result also closes the question for the more general nonconvex case for which we are not aware of a nontrivial counterexample for convergence of alternating minimization. Let us mention however Powell’s example [34] which shows that cyclic minimization with three blocks does not converge for smooth functions.
It would also be interesting to understand how our result may impact dual methods and counterexamples in that field, as for instance the recent three blocks counterexample in [17].
5.3 Gradient descent with exact line search may not converge
Gradient descent with exact line search is governed by the recursion:
where x is a point in the plane.
Observe that the step coincides with partial minimization when the gradient \(\nabla f(x)\) is colinear to one of the axis of the canonical basis. From the previous section, we thus deduce the following.
Corollary 6
(Failure of gradient descent with exact line search) There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initialization \(z_0\) in the plane such that the recursion, for \(i \ge 1\)
produces a well defined non converging sequence \(\left( x_i \right) _{i \in {\mathbb {N}}}\).
Convergence failure for gradient descent with exact line search is new up to our knowledge. Let us mention that despite non convergence, the constructed sequence satisfy sublinear convergence rates in function values [10].
5.4 Tikhonov regularization path may have infinite length
Following [36], we consider for any \(r > 0\)
where f is \(C^k\) convex and where \({\bar{x}}\) is any anchor point. We would like to show that the curve \(r \mapsto x(r)\) may have infinite length. Torralba provided a counterexample in his PhD Thesis for continuous convex functions, see [36]. This work extends his result to smooth \(C^k\) convex functions in \({\mathbb {R}}^2\).
For any n in \( {\mathbb {N}}^*\), we set
This is depicted in Fig. 6. For all \(n \ge 1\), denote by \(M_n\) the point on the x axis above \(B_n\) and \(N_n\), the intersection of the normal cone at \(B_n\) and the x axis. We have
so that for all \(n \ge 1\), \(M_nN_n = 3\) and \(N_n = (3 + 2/n + 1 / n^2, 0)\). Choosing \(P = (7,0)\), since for \(n \ge 1\), \(3 + 2/n + 1 / n^2 \le 6 < 7\) , this shows that P constantly belongs to the interior of the normal cone at \(B_n\) for all \(n \ge 1\). The sequence of level sets is constructed as in Fig. 6 by considering alternating symmetries with respect to the x-axis of the sequence of polygons above. It can be checked that the polygons form a strictly decreasing sequence of sets, as for \(n>1\), the polygon \(A_nB_nC_nD_nE_n\) is contained in the interior of the square \(A_{n-1}C_{n-1}D_{n-1}E_{n-1}\). We choose the normal at \(A_n, C_n, D_n, E_n\) to belong to the bisector at the corner and the normal at \(B_n\) to be proportional to the vector \(B_n P\). Applying Corollary 1, we construct f and choose \(\bar{x} = P\) in (31) to obtain the following:
Corollary 7
(A bounded infinite length Tikhonov path) There exists a \(C^k\) strictly convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and \(\bar{x} \in {\mathbb {R}}^2\) such that the curve x((0, 1)) given by (31) has infinite length.
Proof
We apply Corollary 1 with \(I = {\mathbb {Z}}\) and revert the indices set to match the sequence that we have described. For any \(n \ge 1\) there exists a value of \(\lambda _n\) such that \(f(B_n)= \lambda _n\) and \(\nabla f(B_n)\) is colinear to the vector \(B_n P\). Set
we have \(\nabla f(B_n) + 2r(B_n - P) = 0\) which is the optimality condition in (31) with \(\bar{x} = P\). Hence we have shown that there exists a value of r such that \(B_n\) is the solution to (31). Since n was arbitrary this is true for all n and the curve \(r \mapsto x(r)\) has to go through a sequence of points whose second coordinate is of the form \((-1)^n/n\) for all \(n \ge 1\). Since this sequence is not absolutely summable, the curve has infinite length. \(\square \)
This result is in contrast with the definable case for which we have finite length by the monotonicity lemma, since the whole trajectory is definable and bounded.
5.5 Secants of gradient curves at infinity may not converge
Thom’s gradient conjecture and Kurdyka–Mostowski–Parusinski’s theorem A theorem of Łojasiewicz [27] asserts that bounded solutions to the gradient system
converge when f is a real analytic potential. Thom conjectured in [38] that this convergence should occur in a stronger form: trajectories converging to a given \(\bar{x}\) should admit a tangent at infinity, that is
should have a limit as \(t \rightarrow \infty \). Lines passing through \({\bar{x}}\) and having (32) as a slope are called secants of x at \({\bar{x}}\). This conjecture was proved to be true in [26]. In the convex world, it is well known that solutions to the gradient system converge for general potentials (this is a Féjer monotonicity argument due to Bruck); see also the original approaches by Manselli and Pucci [30] and Daniilidis et al. [19]. It is then natural to wonder whether this convergence satisfies higher order rigidity properties as in the analytic case. The answer turns out to be negative in general yielding a quite mysterious phase portrait.
Absence of tangential convergence for convex potentials
The construction given in this paragraph is more complex than the previous ones, we start with a technical lemma which will be the basic building block for our counterexample.
Lemma 9
Let S be a convex set with \(C^k\) boundary interpolating ABCDEF in Fig. 7 and let g be the gauge function associated to S. The function g is differentiable outside the origin. Consider any initialization \(x_0 \) in [BC] with corresponding trajectory solution to the equation
Set \(\bar{t} = \sup _{x(t) \in OBC} t\), we have \(\bar{t} < + \infty \) and \(x(\bar{t}) \) in \( [CC']\).
Proof
The fact that g is differentiable comes from the fact that its subgradient is uniquely determined by the normal cone to S (which has dimension one because of the smoothness of the boundary of S). Since S is interpolating the polygon, we have \(g(B) = g(C) = 1\). Furthermore, we have for all \(t\ge 0\), \(\frac{d}{dt} g(x(t)) = -\Vert \nabla g(x(t))\Vert ^2= -1\), thence \(\bar{t} \le 1 - g(C')\). By homogeneity, for any \(x \ne 0\) and \(s > 0\), \(\nabla g (sx) = \nabla g(x)\). For any x in [BC], by convexity
and therefore
By homogeneity of g, (33) is true for any x in the triangle OCB (different from 0) and thus in the triangle \(C'CB\) . Denote by y the solution to the equation
which integrates to \(y(t) = B - t w\) for all t. Equation (33) ensures that for any \(0\le t \le \bar{t}\)
Hence, we have for any \(0\le t \le \bar{t}\), integrating on [0, t]
Furthermore, for all x in [BC], we have
because v is orthogonal to \(C-B\). The first term is maximal for \(x = C\) and thus the second term is minimal for \(x = C\), we have thus for all x in [BC]
Equation (35) holds for all x in OCB different from O by homogeneity. We deduce that for all \(0 \le t \le \bar{t}\), we have
and by integration
Hence, in the coordinate system \((C - B, -v)\), which is orthogonal, for all t in \( [0,\bar{t}]\), x(t) has larger coordinates than y(t).
The trajectory y(t), of equation \(t \mapsto B - t w\) is the line going from B to \(C'\). From Eqs. (34) and (36), we may write for all t in \( [0,\bar{t}]\), \(x(t) = y(t) + \alpha (t) (C-B) + \beta (t) (-v)\) where \(\alpha \) and \(\beta \) are positive functions. Since y(t) belongs to the line \((BC')\), this shows that x(t) has to be above this line for all \(t \ge 0\), \(t\le \bar{t}\) and actually, \(x(\bar{t}) \) in \( BCC'\). Hence at time \(\bar{t}\), we have \(x(\bar{t}) \) in \( [CC']\). This holds true because \(x(\bar{t})\) is on the boundary of OCB and on the boundary of \(BCC'\). Hence either \(x(\bar{t}) \) in \( [CC']\), either \(x(\bar{t}) \) in [BC]. Equation (35) ensures that if \(x(\bar{t}) \) in [BC] then \(x(\bar{t}) = C\) which concludes the proof. \(\square \)
Corollary 8
(Secants of gradient curves may all fail to converge) There exists a \(C^k\) strictly convex function on \({\mathbb {R}}^2\) with a unique minimizer \(\bar{x}\), such that any nonconstant solution to the gradient flow equation
is such that
does not have a limit as \(t \rightarrow \infty \).
The function f has a positive definite Hessian everywhere except at 0.
Proof
We assume without loss of generality that \(\bar{x} = O\) is the origin. Writting \(x(t) = (r(t),\theta (t))\) in polar coordinate, we will construct a function f such that each solution to the ODE produces nonconverging trajectories \(\theta (t)\).
We start with an interpolating set \(S_0 = ABCDE\) as in Lemma 9 and let \(S_1 = A'B'C'D'E'\) be its scaled version as described in Fig. 7.
Let \(\alpha \) be the value of the angle \({\widehat{BOC}}\) and \(m = \left\lceil \frac{2 \pi }{\alpha } \right\rceil + 1\). We have
To obtain \(S_{2}\), we rotate \(S_0\) by an angle \(2\pi / m\), we denote \(S'_0\) the resulting set. We rescale \(S'_0\) by a factor \(\beta \) in (0, 1) so that \(\beta S'_0\) lies in the interior of \(S_1\). Call the resulting set \(S_2\) and \(S_3\) is obtained from \(S_2\) exactly the same way as \(S_1\) is obtained from \(S_0\). We repeat the same process indefinitely to obtain a strictly decreasing sequence of \(C^k\) sets. Note that for any k in \( {\mathbb {N}}\), \(S_{2km}\) and \(S_{2km + 1}\) are homothetic to \(S_0\).
We now invoke Corollary 1 (with \(I = {\mathbb {Z}}\) and revert the indices) to obtain a \(C^k\) function f with those prescribed level sets. Using Remark 3 it turns out that the level sets of f between \(S_0\) and \(S_1\) are simple scalings of \(S_0\). Hence the gradient curves of f and those of the gauge function of S are the same between \(S_0\) and \(S_1\), up to time reparametrization.
Using Lemma 9 any trajectory crossing [BC] in Fig. 7, must also be crossing \([CC']\) and leave the triangle BOC in finite time. The same statement holds after scaling the level sets and since for all k in \( {\mathbb {N}}\), \(S_{2km}\) and \(S_{2km+1}\) are homothetic to \(S_0\), this shows that no solution stays indefinitely in the triangle BOC.
Lemma 9 still holds after rotations and by our construction, for any triangle T obtained by rotating BOC by a multiple of \(2\pi /m\), no trajectory stays indefinitely within T. Since \(2\pi / m < \alpha \), the union of these triangles U contains O in its interior.
Note first that any gradient curve converges to \({\bar{x}}\). Let us argue by contradiction and assume that there exists a continuous gradient curve \(t \mapsto z(t)\) distinct from the stationary solution \(\bar{x}\), such that
converges. This exactly means that the angle \(\theta (t)\) of the curve has a limit in \([0,2\pi )\) as t goes to infinity. There is a rotation of BOC by a multiple of \(2\pi /m\) whose interior intersects the half line given by the direction \(\theta \), call this triangle T. The directional convergence entails that there exists \(t_0 \ge 0\) such that z(t) belongs to T for all \(t \ge t_0\). Hence z can not be a gradient curve. To complete the proof, we may add disks of increasing size to the list of sets to obtain a full domain function and invoke Theorem 2 with \(I = {\mathbb {Z}}\). \(\square \)
5.6 Newton’s flow may not converge
Given a twice differentiable convex function f, we define the open set \(\Omega :=\{x\in {\mathbb {R}}^2:\nabla ^2f \text{ is } \text{ invertible }\}\) and we consider maximal solutions to the differential equation
on \(\Omega \). This is the continuous counterpart of Newton’s method, it has been studied in [5] and [2]. Let \(x_0\) be in \(\Omega \), there exists a unique maximal nontrivial interval I containing 0 and a unique solution x to (37) on I with \(x(0) = x_0\). Equation (37) may be rewritten as
and thus for all t in I, we have
If we could ensure that \(I = {\mathbb {R}}\) and f has oscillating gradients close to its minimum, then (38) entails that the direction of the gradient is constant along the solution, which requires oscillations in space to compensate for gradient oscillations.
Corollary 9
(A bounded Newton’s curve that oscillates at infinity) For any \(k\ge 2\), there exists a \(C^k\) convex coercive function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initial condition \(x_0 \) in \( {\mathbb {R}}^2\) such that the solution to (37) is bounded, defined on \({\mathbb {R}}\) and has at least two distinct accumulation points.
The counterexample is sketched in Fig. 8, the construction is the same as for Corollary 5 but instead of doing quarter rotations, we use symmetry with respect to the first axis. We can then call for Corollary 1 to construct the function f and equation (38) ensures that the solution interval is unbounded.
5.7 Bregman descent (mirror descent) may not converge
The mirror descent algorithm was introduced in [31] as an efficient method to solve constrained convex problems. In [9], this method is shown to be equivalent to a projected subgradient method, using non-Euclidean projections. It plays an important role for some categories of constrained optimization problem; see e.g., [6] for recent developments and [20] for a surprising example.
Let us recall beforehand some definitions. Given a Legendre function h with domain \(\mathrm {dom}\,h\), define the Bregman distanceFootnote 4 associated to h as \(D_h(u,v)=h(u)-h(v)-\langle \nabla h(v),u-v\rangle \) where u is in \( \mathrm {dom}\,h\) and v is in the interior of \(\mathrm {dom}\,h\).
Given a smooth convex function f that we wish to minimize on \(\overline{ \mathrm {dom}\,h}\), we consider the Bregman method
where \(x_0\) is in \(\mathrm {int}\,\mathrm {dom}\,h\) and \(\lambda >0\) is a step size. When the above iteration is well defined, e.g. when \(\mathrm {dom}\,h\) is bounded, it writes:
In [6] the authors identified a generalized smoothness condition which confers good minimizing properties to the above method:
The corollary below shows that such an algorithm may not converge, even though we assume the cost to satisfy (39), the step to satisfy (40), and the Legendre function to have a compact domain.
Corollary 10
(Bregman or mirror descent may not converge) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), and \(x_0\) in \({\mathbb {R}}^2\) such that the Bregman descent recursion
produces a bounded sequence \((x_i)_{i \in {\mathbb {N}}}\) which has at least two distinct accumulation points.
Proof
We fix \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \), \(\theta \ne 0\), and consider h constructed in Corollary 3 and choose \(c = (-1,0)\). In this case the Bregman descent recursion writes for all i in \({\mathbb {N}}\),
so that we actually have \(\nabla h (x_{i}) - \nabla h(x_{0}) = \nabla h (x_{i}) = - i c\) and thus
By Corollary 3, we have for all \(i \in {\mathbb {N}}\) that \(\nabla h^*(i,0)\) proportional to \((\cos (\theta ), (-1)^i \sin (\theta ))\). Since the norm of the gradient of \(h^*\) cannot vanish at infinitiy (no flat direction) and is bounded, this proves that the sequence \((x_i)_{i \in {\mathbb {N}}}\) has at least two accumulation points which is the desired result. \(\square \)
5.8 Central paths of Legendre barriers may not converge
Consider the problem
where D is a subset of \({\mathbb {R}}^2\) and c. Given a Legendre function h on D, we introduce the h central path through
where \(r>0\) is meant to tend to 0. Central paths are one of the essential tools behind interior point methods, see e.g., [4, 33] and references therein.
Note that the accumulation points of x(r) as \(r \rightarrow 0\), have to be in the the solution set of (41). It is even tempting to think that the convergence of the path to some specific minimizer could occur, as it is the case for many barriers, see e.g. [4]. We have however:
Corollary 11
(A nonconverging central path) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), such that the h central path \(r \mapsto x(r)\) has two distinct accumulation points.
Proof
The optimality condition which characterizes x(r) for any \(r > 0\) writes,
and the construction is the same as in Corollary 10. \(\square \)
5.9 Hessian Riemannian gradient dynamics may not converge
The construction of this paragraph is similar to the two previous paragraphs. Consider a \(C^k\) (\(k\ge 2\)) Legendre function \(h :D \mapsto {\mathbb {R}}\) and the continuous time dynamics
where \(H = \nabla ^2 h\) is the Hessian of h and \(\nabla _H f = H^{-1} \nabla f\) is the gradient of some differentiable function f in the Riemannian metric induced by H on \(\mathrm {int}\,D\). Such dynamics were considered in [1, 14].
We have the following result:
Corollary 12
(Nonconverging Hessian Riemannian gradient dynamics) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), and \(x_0\) in \({\mathbb {R}}^2\) such that the solution to (43) with \(f=\langle c,\cdot \rangle \) has two distinct accumulation points.
Proof
Equation (43) may be rewritten
so choosing \(c = (-1,0)\), we have for all \(t \in {\mathbb {R}}\), \(\nabla h(x(t)) = \nabla h(x(0)) + (t,0) = (t,0)\) and the construction is the same as in Corollary 10. \(\square \)
Notes
In the sense of sets inclusion the sequence being indexed on \({\mathbb {N}}\) or \({\mathbb {Z}}\).
See Theorem 2 for the full version.
By structural, we include homotopic deformations by mere summation.
It is actually not a proper distance.
References
Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control. Optim. 43(2), 477–501 (2004)
Alvarez, D.F., Pérez, C.J.M.: A dynamical system associated with Newton’s method for parametric approximations of convex minimization problems. Appl. Math. Optim. 38, 193–217 (1998)
Auslender, A.: Optimisation Méthodes Numériques. Masson, Paris (1976)
Auslender, A.: Penalty and barrier methods: a unified framework. SIAM J. Optim. 10(1), 211–230 (1999)
Aubin, J.-P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, Berlin (1984)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408. Springer, New York (2011)
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Bertsekas, D.P., Scientific, A.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Bolte, J., Teboulle, M.: Barrier operators and associated gradient-like dynamical systems for constrained minimization problems. SIAM J. Control Optim. 42(4), 1266–1292 (2003)
Borwein, J.M., Li, G., Yao, L.: Analysis of the convergence rate for the cyclic projection algorithm applied to basic semialgebraic convex sets. SIAM J. Optim. 24(1), 498–527 (2014)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Crouzeix, J.-P.: Conditions for convexity of quasiconvex functions. Math. Oper. Res. 5(1), 120–125 (1980)
Daniilidis, A., Ley, O., Sabourau, S.: Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. Journal de matéhmatiques pures et appliquées 94(2), 183–199 (2010)
Dragomir, R.A., Taylor, A., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods (2019). arXiv preprint arXiv:1911.08510
Fenchel, W.: Convex Cones, Sets and Functions, Mimeographed Lecture Note. Princeton University, Princeton (1951)
de Finetti, B.: Sulle stratificazioni convesse. Ann. Mat. 30(1), 173–183 (1949)
Gale, D., Klee, V., Rockafellar, R.T.: Convex functions on convex polytopes. Proc. Am. Math. Soc. 19(4), 867–873 (1968)
Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)
Kannai, Y.: Concavifiability and constructions of concave utility functions. J. Math. Econ. 4(1), 1–56 (1977)
Kurdyka, K., Mostowski, T., Parusinski, A.: Proof of the gradient conjecture of R. Thom. Ann. Math. 152(3), 763–792 (2000)
Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria, Bologna (1982/83). Universita’ degli Studi di Bologna, Bologna 1984, 115–117 (1984)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society, Providence (1954)
Ma, T.W.: Higher chain formula proved by combinatorics. Electron. J. Comb. 16(1), N21 (2009)
Manselli, P., Pucci, C.: Maximum length of steepest descent curves for quasi-convex functions. Geom. Dedicata 38(2), 211–227 (1991)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, New York (1983)
Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2003)
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming, vol. 13. SIAM, Philadelphia (1994)
Powell, M.J.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)
Schneider, R.: Convex Bodies: The Brunn–Minkowski Theory, vol. 151. Cambridge University Press, Cambridge (1993)
Torralba, D.: Convergence épigraphique et changements d’échelle en analyse variationnelle et optimisation. Ph.D. Thesis, Université Montpellier 2 (1996)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Thom, R.: Problèmes rencontrés dans mon parcours mathématique : un bilan. Publications mathématiques de l’IHES 70, 199–214 (1989)
Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)
Acknowledgements
The authors acknowledge the support of AI Interdisciplinary Institute ANITI funding, through the French “Investing for the Future – PIA3” program under the Grant agreement \(\mathrm{n}^{\circ }\)ANR-19-PI3A-0004, Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant numbers FA9550-19-1-7026, FA9550-18-1-0226, and ANR MasDol. J. Bolte acknowledges the support of ANR Chess, grant ANR-17-EURE-0010, TSE-P and ANR OMS.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Lemma 10
(Smooth concave interpolation: in between square root and affine) There exists a \(C^\infty \) strictly increasing concave function \(\phi :[0,1] \mapsto [0,1]\) such that
Proof
Consider a \(C^\infty \) function \(g_0 :{\mathbb {R}}\mapsto [0,1]\) such that \(g_0 = 1\) on \((-\infty ,-1)\), \(g_0 = 0\) on \((1, +\infty )\) (for example convoluting the step function with a smooth bump function). Set \(g(t) = \frac{1}{2}\left( g_0(t) + 1 - g_0(-t) \right) \) we have that g is \(C^\infty \), \(g = 1\) on \((-\infty ,-1)\), \(g = 0\) on \((1, +\infty )\) and \(g(t) + g(-t) = 1\) for all t. We have
Set \(\phi _0 :[-3,3] \mapsto {\mathbb {R}}\), such that
For all r in \( [-3,3]\), we have
and thus
and in particular \(\phi _0(3) = 12\) and \(\phi _0'(3) = 3\). Set \(\phi _1(s) = \phi _0(6 s -3)/12\).
\(\phi _1\) is stricly increasing, let \(\phi :[0,1] \mapsto [0,1]\) denote the inverse of \(\phi _1\), we have
\(\square \)
Lemma 11
(Interpolation inside a sublevel set) Consider any strictly increasing \(C^k\) function \(\phi :(0,2) \mapsto {\mathbb {R}}\) such that \(\phi (1) = 1\) and \(\phi ^{(m)}(1) = 0\), \(m = 2,\ldots k\). Then the function
is diffeomorphism which satisfies for any \(m=1 \ldots ,k\) and \(l =2,\ldots , k\),
Lemma 12
Combinatorial Arbogast-Faà di Bruno Formula (from [29]) Let \(g :{\mathbb {R}}\mapsto {\mathbb {R}}\) and \(f :{\mathbb {R}}^p \mapsto [0, +\infty )\) be \(C^k\) functions. Then we have for any \(m \le k\) and any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \).
where \({\mathcal {P}}\) denotes all partitions of \(\left\{ 1,\ldots , m \right\} \), the product is over subsets of \(\left\{ 1,\ldots ,m \right\} \) given by the partition \(\pi \) and \(|\cdot |\) denotes the number of elements of a set. We rewrite this as follows
where \({\mathcal {P}}_k\) denotes all partitions of size k of \(\left\{ 1,\ldots , m \right\} \).
Lemma 13
From [12, Lemma 45] Let h in \( C^0\left( (0,r_0],{\mathbb {R}}_+^* \right) \) be an increasing function. Then there exists a function \(\psi \) in \( C^\infty ({\mathbb {R}},{\mathbb {R}}_+)\) such that \(\psi = 0\) on, \({\mathbb {R}}_-\) and \(0 < \psi (s) \le h(s)\) for any s in \((0,r_0]\) and \(\psi \) is increasing on \({\mathbb {R}}\)
Lemma 14
(High-order smoothing near the solution set) Let \(D \subset {\mathbb {R}}^p\) be a nonempty compact convex set and \(f :D \mapsto {\mathbb {R}}\) convex, continuous on D and \(C^k\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_{D} f\). Assume further that \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\), \(k \ge 1\), with \(\min _D f = 0\). Then there exists \(\phi :{\mathbb {R}}\mapsto {\mathbb {R}}_+\), \(C^k\), convex and increasing with positive derivative on \((0,+\infty )\), such that \(\phi \circ f\) is convex and \(C^k\) on D.
Proof
By a simple translation, we may assume that \(\min _D f = 0\) and \(\max _D f = 1\). Any convex function is locally Lipschitz continuous on the interior of its domain so that f is globally Lipschitz continuous on D and its gradient is bounded. Hence, \(f^2\) is \(C^1\) and convex on D. We now proceed by recursion. For any \(m =1,\ldots , k\), we let \(Q_m\) denote the m-order tensor of partial derivatives of order m. Fix m in \(\{1,\ldots ,k\}\). Assume that f is \(C^m\) throughout D while it is \(C^{m+1}\) on \(D {\setminus } \arg \min _D f\). Note that all the derivatives up to order m are bounded. We wish to prove that f is globally \(C^{m+1}\).
Consider the increasing function
and set \(\psi \) as in Lemma 13. Recall that \(\psi \) is \(C^\infty \) and all its derivative vanish at 0 and \(\psi \le h\) on (0, 1]. Let \(\phi \) denote the anti-derivative of \(\psi \) such that \(\phi (0) = 0\). \(\phi \) is \(C^\infty \) and convex increasing on \({\mathbb {R}}\) and, since its derivatives at 0 vanish as well, one has, for any q in \( {\mathbb {N}}\), \(\phi ^{(q)}(z) = o(z)\). Consider the function \(\phi \circ f\). It is \(C^m\) on D and it has bounded derivatives up to order m. Furthermore, it is \(C^{m+1}\) on \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\). Let \(\bar{y} \) in \( {{\,\mathrm{argmin}\,}}_D f\). If \(\bar{y} \) in \( \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\), then f and \(\phi \,\circ f\) have derivatives of all order vanishing at \(\bar{y}\). Assuming that \(\bar{y} \) in \( {{\,\mathrm{argmin}\,}}_D f{\setminus } \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)\). By the induction assumption and Lemma 12, we have for any indices \(i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} \) and any h in \( {\mathbb {R}}^p\):
All the derivatives of f are of order less or equal to m and thus remain bounded as \(z \rightarrow 0\). Further more f is Lipschitz continuous on D so that \(f(\bar{y} + z) = O(\Vert z\Vert )\) near 0, and, for any q in \( {\mathbb {N}}\), \(\phi ^{(q)}(f(\bar{y} + z)) = o(\Vert z\Vert )\). Hence \(\phi \circ f\) has derivative of order \(m+1\) at \(\bar{y}\) and it is 0.
Since \({{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)\), we may consider any sequence of point \((y_{j})_{j \in {\mathbb {N}}}\) in \(D {\setminus } {{\,\mathrm{argmin}\,}}_D f\) converging to \(\bar{y}\). By Lemma 12, we have for any indices \(i_1,\ldots ,i_{m+1} \in \left\{ 1,\ldots , p \right\} \), and any j in \( {\mathbb {N}}\),
where the inequality follows from the construction of \(\phi \). The third step follows using the definition of h and the fact that, for any \(q \ge 2\),
-
1.
Each partition of \(\left\{ 1,\ldots ,m+1 \right\} \) of size q contains subsets of size at most m. Thus in the product, the terms \(\partial ^{|B|} f\) correspond to bounded derivatives of f by the induction hypothesis.
-
2.
\(\phi ^{(q)}(a) = o(a)\) as \(a \rightarrow 0\).
The last step stems from the fact that the ratio has asbolute value less than 1. This shows that the derivatives of order \(m+1\) of \(\phi \circ f\) are decreasing to 0 as \(j \rightarrow \infty \) and \(\phi \circ f\) is actually \(C^{m+1}\) and convex on D. The result follows by induction up to \(m = k\) and by the fact that a composition of increasing convex functions is increasing and convex. \(\square \)
Lemma 15
Let \(p :{\mathbb {R}}_+ \mapsto {\mathbb {R}}_+\) be concave increasing and \(C^1\) with \(p' \ge c\) for some \(c > 0\). Assume that there exists \( A > 0\) such that for all x in \( {\mathbb {R}}_+\)
Then setting \(a= A/c\), we have for all \(x \ge a\),
Proof
For all \(x \ge a\), we have
hence
\(\square \)
Rights and permissions
About this article
Cite this article
Bolte, J., Pauwels, E. Curiosities and counterexamples in smooth convex optimization. Math. Program. 195, 553–603 (2022). https://doi.org/10.1007/s10107-021-01707-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10107-021-01707-1
Keywords
- Convex programming
- Smooth convex counterexamples
- Interpolation of decreasing convex sequences
- Bregman methods
- Block-coordinate methods
- Exact line search