Curiosities and counterexamples in smooth convex optimization

Bolte, Jérôme; Pauwels, Edouard

doi:10.1007/s10107-021-01707-1

Curiosities and counterexamples in smooth convex optimization

Full Length Paper
Series A
Published: 09 October 2021

Volume 195, pages 553–603, (2022)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Mathematical Programming Submit manuscript

Curiosities and counterexamples in smooth convex optimization

Download PDF

925 Accesses
4 Citations
Explore all metrics

Abstract

Counterexamples to some old-standing optimization problems in the smooth convex coercive setting are provided. We show that block-coordinate, steepest descent with exact search or Bregman descent methods do not generally converge. Other failures of various desirable features are established: directional convergence of Cauchy’s gradient curves, convergence of Newton’s flow, finite length of Tikhonov path, convergence of central paths, or smooth Kurdyka–Łojasiewicz inequality. All examples are planar. These examples are based on general smooth convex interpolation results. Given a decreasing sequence of positively curved $C^k$ convex compact sets in the plane, we provide a level set interpolation of a $C^k$ smooth convex function where $k\ge 2$ is arbitrary. If the intersection is reduced to one point our interpolant has positive definite Hessian, otherwise it is positive definite out of the solution set. Furthermore, given a sequence of decreasing polygons we provide an interpolant agreeing with the vertices and whose gradients coincide with prescribed normals.

The Equivalence of Three Types of Error Bounds for Weakly and Approximately Convex Functions

Article 26 March 2022

Faster subgradient methods for functions with Hölderian growth

Article 07 January 2019

Interpolatory pointwise estimates for convex polynomial approximation

Article 30 June 2020

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

1.1 Questions and method

One of the goals of convex optimization is to provide a solution to a problem with stable and fast algorithms. The quality of a method is generally assessed by the convergence of sequences, rate estimates, complexity bounds, finite length of relevant quantities and other quantitative or qualitative ways.

Positive results in this direction are numerous and have been the object of intense research since decades. To name but a few: gradient methods e.g., [16, 31, 32], proximal methods e.g., [7], alternating methods e.g., [8, 39], path following methods e.g., [4, 33], Tikhonov regularization e.g. [24], semi-algebraic optimization e.g., [13, 15], decomposition methods e.g., [7, 8], augmented Lagrangian methods e.g., [11] and many others.

Despite this vast literature, some simple questions remain unanswered or just partly tackled, even for smooth convex coercive functions. Does the alternating minimization method, aka Gauss–Seidel method, converge? Does the steepest descent method with exact line search converge? Do mirror descent or Bregman methods converge? Does Newton’s flow converge? Do central paths converge? Is the gradient flow directionally stable? Do smooth convex functions have the Kurdyka–Łojasiewicz property?

In this article we provide a negative answer to all these questions.

Our work draws inspiration from early works of de Finetti [22], Fenchel [21], on convex interpolation, but also from Torralba’s PhD thesis [36] and the more recent [12], where some counterexamples on the Tikhonov path and Łojasiewicz inequalities are provided. The convex interpolation problem, see [22], is as follows: given a monotone sequence of convex sets^{Footnote 1} may we find a convex function interpolating each of these sets, i.e., having these sets as sublevel sets? Answers to these questions for continuous convex functions were provided by de Finetti, and improved by Fenchel [21], Kannai [25], and then used in [12, 36] for building counterexamples.

We improve this work by providing, for $k\ge 2$ arbitrary, a general $C^k$ interpolation theorem for positively curved convex sets, imposing at the same time the positive definiteness of its Hessian out of the solution set. An abridged version could be as follows.

Theorem

(Smooth convex interpolation^{Footnote 2}) Let $\left( T_i \right) _{i\in {\mathbb {Z}}}$ be a sequence of compact convex subsets of ${\mathbb {R}}^2$, with positively curved $C^k$ boundary, such that $T_i\subset \mathrm {int}\,T_{i+1}$ for all i in ${\mathbb {Z}}$. Then there exists a $C^k$ convex function f having the $T_i$ as sublevel sets with positive definite Hessian outside of the set:

$$\begin{aligned} {{\,\mathrm{argmin}\,}}f=\bigcap _{i\in {\mathbb {Z}}} T_i. \end{aligned}$$

We provide several additional tools (derivatives computations) and variants (status of the solution set, Legendre functions, Lipschitz continuity). Whether our result is generalizable to general smooth convex sequences, i.e., with vanishing curvature, seems to be a very delicate question whose answer might well be negative.

Our central theoretical result is complemented by a discrete approximate interpolation result “of order one" which is particularly well adapted for building counterexamples. Given a nested collection of polygons, one can indeed build a smooth convex function having level sets interpolating its vertices and whose gradients agree with prescribed normals.

Our results are obtained by blending parametrization techniques, Minkovski summation, Bernstein approximations and convex analysis.

As sketched below, our results offer the possibility of building counterexamples in convex optimization by restricting oneself to the construction of countable collections of nested convex sets satisfying some desirable properties. In all cases failures of good properties are caused by some curvature oscillations.

1.2 A digest of counterexamples

Counterexamples provided in this article can be classified along three axes: structural counterexamples,^{Footnote 3} counterexamples for convex optimization algorithms and ordinary differential equations.

In the following, the term “nonconverging” sequence or trajectory means, a sequence or a trajectory with at least two distinct accumulation points. Unless otherwise stated, convex functions have full domain.

The following results are proved for $C^k$ convex functions on the plane with $k\ge 2$.

Structural counterexamples

(i)
Kurdyka–Łojasiewicz: There exists a $C^k$ convex function whose Hessian is positive definite outside its solution set and which does not satisfy the Kurdyka–Łojasiewicz inequality. This is an improvement on [12].
(ii)
Tikhonov regularization path: There exists a $C^k$ convex function f such that the regularization path
$$\begin{aligned} x(r)= {{\,\mathrm{argmin}\,}}\left\{ f(y) + r \Vert y\Vert ^2:y\in {\mathbb {R}}^2\right\} , \,\,r\in (0,1) \end{aligned}$$
has infinite length. This strengthens a theorem by Torralba [36].
(iii)
Central path: There exists a continuous Legendre function $h :[-1,1]^2 \mapsto {\mathbb {R}}$, $C^k$ on the interior, and c in ${\mathbb {R}}^2$ such that the central path
$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ \left\langle c, y \right\rangle + r h(y):y\in D\right\} \end{aligned}$$
does not have a limit as $r \rightarrow 0$.

Algorithmic counterexamples:

(iv)
Gauss–Seidel method (block coordinate descent): There exists a $C^k$ convex function with positive definite Hessian outside its solution set and an initialization $ (u_0,v_0)$ in ${\mathbb {R}}^2$, such that the alternating minimization algorithm
$$\begin{aligned} u_{i+1}&= \mathop {\hbox {argmin}}\limits _{u \in {\mathbb {R}}} f(u, v_i) \\ v_{i+1}&= \mathop {\hbox {argmin}}\limits _{v \in {\mathbb {R}}}f(u_{i+1}, v) \end{aligned}$$
produces a bounded nonconverging sequence $((u_i,v_i))_{i\in {\mathbb {N}}}$.
(v)
Gradient descent with exact line search: There exists a $C^k$ convex function f with positive definite Hessian outside its solution set and an initialization $x_0$ in ${\mathbb {R}}^2$, such that the gradient descent algorithm with exact line search
$$\begin{aligned} x_{i+1}&= \mathop {\hbox {argmin}}\limits _{t \in {\mathbb {R}}} f(x_i + t \nabla f(x_i)) \end{aligned}$$
produces a bounded nonconvergent sequence.
(vi)
Bregman or mirror descent method: There exists a continuous Legendre function $h :[-1,1]^2 \mapsto {\mathbb {R}}$, $C^k$ on the interior, a vector c in ${\mathbb {R}}^2$ and an initialization $x_0$ in $(-1,1)^2$ such that the Bregman recursion
$$\begin{aligned} x_{i+1} = \nabla h^*(\nabla h(x_i) - c) \end{aligned}$$
produces a nonconverging sequence. The couple $(h,\langle c,\cdot \rangle $) satisfies the smoothness properties introduced in [6].

Continuous time ODE counterexamples:

(vii)
Continuous time Newton method: There exists a $C^k$ convex function with positive definite Hessian outside its solution set, and an initialization $x_0$ in ${\mathbb {R}}^2$ such that the continuous time Newton’s system
$$\begin{aligned} {\dot{x}}(t)&= - \left[ \nabla ^2 f(x(t))\right] ^{-1} \nabla f(x(t)), \,\,t\ge 0,\\ x(0)&= x_0 \end{aligned}$$
has a solution approaching ${{\,\mathrm{argmin}\,}}f$ which does not converge.
(viii)
Directional convergence for gradient curves: There exists a $C^k$ convex function with 0 as a unique minimizer and a positive definite Hessian on ${\mathbb {R}}^2{\setminus }\{0\}$, such that for any non stationary solution to the gradient system
$$\begin{aligned} {\dot{x}}(t)&= - \nabla f(x(t)) \end{aligned}$$
the direction $x(t) / \Vert x(t)\Vert $ does not converge.
(ix)
Hessian Riemannian gradient dynamics: There exists a continuous Legendre function $h :[-1,1]^2 \mapsto {\mathbb {R}}$, $C^k$ on the interior, a linear function f and a nonconvergent solution to the following system
$$\begin{aligned} {\dot{x}}(t) = - \nabla _H f(x(t)), \end{aligned}$$
where $H = \nabla ^2 h$ is the Hessian of h and $\nabla _H f = H^{-1} \nabla f$ is the gradient of f in the Riemannian metric induced by H on $(-1,1)^2$.

Pathological sequences and curves Our counterexamples lead to sequences or paths in ${\mathbb {R}}^2$ which are related to a function f by a certain property (see examples above) and have a certain type of pathology. For illustration purposes, we provide sketches of the pathological behaviors we met in Fig. 1.

2 Preliminaries

Let us set ${\mathbb {N}}^*={\mathbb {N}}{\setminus }\{0\}$. For p in ${\mathbb {N}}^*$, the Euclidean scalar product in ${\mathbb {R}}^p$ is denoted by $\langle \cdot ,\cdot \rangle $, otherwise stated the norm is denoted by $\Vert \cdot \Vert $. Given subsets S, T in ${\mathbb {R}}^p$, and x in ${\mathbb {R}}^p$, we define

$$\begin{aligned} \mathrm {dist}(x,S):=\inf \left\{ \Vert x-y\Vert :y\in S\right\} , \end{aligned}$$

and the Hausdorff distance between S and T,

$$\begin{aligned} \mathrm {dist}(S,T)=\max \left( \sup _{x\in S}\mathrm {dist}(x,T),\sup _{x\in T}\mathrm {dist}(x,S)\right) . \end{aligned}$$

Throughout this note, the assertion “g is $C^k$ on D” where D is not an open set is to be understood as “g is $C^k$ on an open neighborhood of D”. Given a map $G :X \mapsto A \times B$ for some space X, $[G]_1 :X \mapsto A$ denotes the first component of G.

2.1 Continuous convex interpolations

We consider a sequence of compact convex subsets of ${\mathbb {R}}^p$, $\left( T_i \right) _{i \in {\mathbb {Z}}}$ such that $T_{i+1} \subset \mathrm {int}\ T_i$. Finding a continuous convex interpolation of $\left( T_i \right) _{i \in {\mathbb {Z}}}$ is finding a convex continuous function which makes the $T_i$ a sequence of level sets. We call this process continuous convex interpolation. This questioning was present in Fenchel [21] and dates back to de Finetti [22], let us also mention the work of Crouzeix [18] revolving around this issue.

Such constructions have been shown to be realizable by Torralba [36], Kannai [25], using ideas based on Minkowski sum. The validity of this construction can be proved easily using the result of Crouzeix [18] which was already present under different and weaker forms in the works of de Finetti and Fenchel.

Theorem 1

(de Finetti–Fenchel–Crouzeix) Let $f :{\mathbb {R}}^p \rightarrow {\mathbb {R}}$ be a quasiconvex function. The functions

$$\begin{aligned} F_x :\lambda \mapsto \sup \left\{ \langle z,x\rangle : f(z)\le \lambda \right\} \end{aligned}$$

are concave for all x in ${\mathbb {R}}^p$, if and only f is convex.

Our goal is to build smooth convex interpolation for sequences of smooth convex sets. To make such a construction we shall use nonlinear Minkowski interpolation between level sets.

We shall also rely on Bernstein approximation which we now describe.

2.2 Bernstein approximation

We refer to the monograph [28] by Lorentz.

The main properties of Bernstein polynomials to be used in this paper are the following:

Bernstein approximation is linear in its functional argument f and “monotone” which allows to construct an approximation using only positive combination of a finite number of function values.
There are precise formulas for derivatives of Bernstein approximation. They involve repeated finite differences. So approximating piecewise affine function with high enough degree leads to an approximation for which corner values of derivatives are controlled while the remaining derivatives are vanishing (up to a given order).
Bernstein approximation is shape preserving, which means in particular that approximating a concave function preserves concavity.

The main idea to produce a smooth interpolation which preserves level sets is depicted in Fig. 2 where we use Bernstein approximation to interpolate smoothly between three points and controlling the successive derivatives at the end points of the interpolation.

Let us now be specific. Given f defined on the interval [0, 1], the Bernstein polynomial of order $d \in {\mathbb {N}}^*$ associated to f is given by

$$\begin{aligned} B_d(x) = B_{d,f}(x) = \sum _{k=0}^d f\left( \frac{k}{d} \right) {d \atopwithdelims ()k} x^k(1-x)^{d-k}, \text{ for } x\in [0,1]. \end{aligned}$$

(1)

Derivatives and shape preservation: For any h in (0, 1) and x in $ [0,1-h]$, we set $\Delta _h^1 f(x) = f(x+h) - f(x)$ and recursively for all k in $ {\mathbb {N}}^*$, $\Delta _h^k f(x) = \Delta \left( \Delta _h^{k-1} f(x) \right) $. We fix $d \ne 0$ in ${\mathbb {N}}$ and for $h = \frac{1}{d}$ write $\Delta _h^k = \Delta ^k$. Then for any $m \le d$, we have

$$\begin{aligned} B_d(x)^{(m)}&= d(d-1)\ldots (d-m+1) \sum _{k=0}^{d - m} \Delta ^m f\left( \frac{k}{d} \right) {{d-m} \atopwithdelims ()k} x^k(1-x)^{d-k-m}, \end{aligned}$$

(2)

for any x in [0, 1]. If f is increasing (resp. strictly increasing), then $\Delta ^1f(x) \ge 0$ (resp. $\Delta ^1f(x) > 0$) for all x and $B'_d$ is positive (resp. strictly positive) and $B_d$ is increasing (resp. strictly increasing). Similarly, if f is concave, then $\Delta ^2 f(x) \le 0$ for all x so that $B^{(2)}_d \le 0$ and $B_d$ is concave. From (2), we infer

$$\begin{aligned} \left| B_d(x)^{(m)}\right| \le d(d-1)\ldots (d-m+1)\sup _{k\in \{0,\ldots ,d-m\}} \left| \Delta ^m f\left( \frac{k}{d} \right) \right| \end{aligned}$$

(3)

for x in [0, 1].

Approximation of piecewise affine functions: The following lemma will be extensively used throughout the proofs.

Lemma 1

(Smoothing of piecewise lines in ${\mathbb {R}}^p$) Let $q_0,q_1 \in {\mathbb {R}}^p$, $\lambda _-<\lambda _0< \lambda _1 < \lambda _+$ and $0<e_1,e_0<1$. Set $\Theta = \left( q_0,q_1,,\lambda _-,\lambda _0,\lambda _1,\lambda _+, e_0,e_1 \right) $ and define $\gamma _\Theta :[0,1] \mapsto {\mathbb {R}}^p$ through

$$\begin{aligned}&\gamma _\Theta (t)\\&= {\left\{ \begin{array}{ll} q_0 \left( 1 + \frac{e_0}{\lambda _0 - \lambda _-} \left( t (\lambda _+- \lambda _-) \right) \right) &{} \text { if } 0\le t \le \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-} \\ q_0(1 + e_0) \left( \frac{\lambda _1 - \lambda _- - t (\lambda _+ - \lambda _-)}{\lambda _1 - \lambda _0} \right) + q_1(1 - e_1) \left( \frac{ \lambda _- - \lambda _0 + t (\lambda _+ - \lambda _-)}{\lambda _1 - \lambda _0} \right) &{} \text { if } \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-} \le t \le \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \\ q_1\left( 1 + \frac{(t-1)(\lambda _+-\lambda _-)e_1}{\lambda _+ - \lambda _1} \right) &{} \text { if } \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \le t \le 1. \end{array}\right. } \end{aligned}$$

The curve $\gamma _\Theta $ in ${\mathbb {R}}^{p+1}$ is the affine interpolant between the points $q_0$, $ (1 + e_0) q_0$, $(1 - e_1) q_1$ and $ q_1$. For any m in $ {\mathbb {N}}$, we choose d in ${\mathbb {N}}^*$ such that

$$\begin{aligned} \frac{m}{d} \le \min \left\{ \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-}, 1 - \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \right\} . \end{aligned}$$

(4)

We consider a Bernstein-like reparametrization of ${\tilde{\gamma }}_\Theta $ given by

$$\begin{aligned} {\tilde{\gamma }}_\Theta :[\lambda _-,\lambda _+]&\mapsto {\mathbb {R}}^p\\ \lambda&\mapsto \sum _{k=0}^d {\tilde{\gamma }}_\Theta \left( \frac{k}{d} \right) {d \atopwithdelims ()k} \left( \frac{\lambda - \lambda _-}{\lambda _+ - \lambda _-} \right) ^k\left( 1 - \frac{\lambda - \lambda _-}{\lambda _+ - \lambda _-} \right) ^{d-k}. \end{aligned}$$

Then the following holds, for any $2 \le l \le m$, ${\tilde{\gamma }}_\Theta $ is $C^m$ and

$$\begin{aligned} {\tilde{\gamma }}_\Theta (\lambda _-)&=q_0&{\tilde{\gamma }}_\Theta (\lambda _+)&=q_1\\ {\tilde{\gamma }}_\Theta '(\lambda _-)&=\frac{e_0}{\lambda _0- \lambda -} q_0&{\tilde{\gamma }}_\Theta '(\lambda _+)&=\frac{e_1}{\lambda _+ - \lambda _1} q_1\\ {\tilde{\gamma }}_\Theta ^{(l)}(\lambda _-)&=0&{\tilde{\gamma }}_\Theta ^{(l)}(\lambda _+)&= 0. \end{aligned}$$

Furthermore, if $\gamma _\Theta $ has monotone coordinates (resp. strictly monotone, resp. concave, resp. convex), then so has ${\tilde{\gamma }}_\Theta $.

Proof

Note that the dependence of ${\tilde{\gamma }}_\Theta $ in $(q_0,q_1)$ is linear so that the dependence of ${\tilde{\gamma }}_\Theta $ in $(q_0,q_1)$ is also linear. Hence ${\tilde{\gamma }}_\Theta $ is of the form $\lambda \mapsto a(\lambda ) q_0 + b(\lambda )q_1$. We can restrict ourselves to $p = 1$ since the general case follows from the univariate case applied coordinatewise.

If $p=1$, then ${\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ A$, where $A :\lambda \mapsto \frac{\lambda - \lambda _-}{\lambda _+-\lambda _-}$. We have ${\tilde{\gamma }}_\Theta (0) = q_0$, ${\tilde{\gamma }}_\Theta (1) = q_1$ and $\Delta ^1 f_{\Theta }(0) = \frac{\lambda _+ - \lambda _-}{\lambda _0 - \lambda _-}\frac{e_0}{d} q_0$, $\Delta ^1 f_{\Theta }\left( 1 - \frac{1}{d} \right) = \frac{\lambda _+ - \lambda _-}{\lambda _+ - \lambda _1}\frac{e_1}{d} q_1$ and $\Delta ^{(l)} f_{\Theta }(0) = \Delta ^{(l)} f_{\Theta }\left( 1 - \frac{l}{d} \right) =0$. The results follow from the expressions in (1) and (2) and the chain rule for ${\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ A$.

The last property of ${\tilde{\gamma }}_\Theta $ is due to the shape preserving property of Bernstein approximation and the fact that ${\tilde{\gamma }}_\Theta = B_{f_{\Theta ,d}} \circ L$. $\square $

Remark 1

(a) [Affine image] Using the notation of Lemma 1, if $(\lambda _-,q_0)$, $(\lambda _0, (1 + e_0) q_0)$, $(\lambda _1, (1 - e_1) q_1)$ and $(\lambda _+, q_1)$ are aligned, then the interpolation is actually an affine function.

(b) [Degree of the interpolants] Observe that the degree of the Bernstein interpolant is connected to the slopes of the piecewise path $\lambda $ by (4).

3 Smooth convex interpolation

Being given a subset S of ${\mathbb {R}}^p$, we denote by $\mathrm {int}(S)$ its interior, ${\bar{S}}$ its closure and $\mathrm {bd}\,S={\bar{S}}{\setminus } \mathrm {int}(S)$ its boundary. Let us recall that the support function of S is defined through

$$\begin{aligned} \sigma _S(x)=\sup \left\{ \langle y,x\rangle :y\in S\right\} \in {\mathbb {R}}\cup \{+\infty \}. \end{aligned}$$

3.1 Smooth parametrization of convex rings

A convex ring is a set of the form $C_1{\setminus } C_2$ where $C_1\subset C_2$ are convex sets. Providing adequate parameterizations for such objects is key for interpolating $C_1$ and $C_2$ by some (regular) convex function.

The following assertion plays a fundamental role.

Assumption 1

Let $T_-,T_+ \subset {\mathbb {R}}^2$ be convex, compact with $C^k$ boundary ($k \ge 2$) and positive curvature. Assume that, $T_- \subset \mathrm {int} (T_+)$ and $0 \in \mathrm {int}(T_-)$.

The positive curvature assumption ensures that the boundaries can be parametrized by their normal, that is, for $i=-,+$, there exists

$$\begin{aligned} c_i :{\mathbb {R}}/ 2 \pi {\mathbb {Z}}\mapsto \mathrm {bd}\,(T_i) \end{aligned}$$

such that the normal to $T_i$ at $c_i(\theta )$ is the vector $n(\theta ) = (\cos (\theta ),\sin (\theta ))^T$ and ${\dot{c}}_i(\theta ) = \rho _{i}(\theta ) \tau (\theta )$ where $\rho _i > 0$ and $\tau (\theta ) = (-\sin (\theta ),\cos (\theta ))$. In this setting, it holds that $c_i(\theta ) = \text{ argmax}_{y \in T_i} \left\langle n(\theta ), y\right\rangle $. The map $c_i$ is the inverse of the Gauss map and is $C^{k-1}$ (see [35] Section 2.5).

Lemma 2

(Minkowski sum of convex sets with positive curvature) Let $T_-,T_+$ be as in Assumption 1 with normal parametrizations as above. For $a,b\ge 0$ with $a+b>0$ set $T=aT_-+bT_+$.

Then T has positive curvature and its boundary is given by

$$\begin{aligned} \mathrm {bd}\,T=\{a\,c_-(\theta )+b\,c_+(\theta ): \theta \in {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\}, \end{aligned}$$

with the natural parametrization ${\mathbb {R}}/2\pi {\mathbb {Z}}\,\ni \theta \rightarrow a\,c_-(\theta )+b\,c_+(\theta ).$

Proof

We may assume $ab>0$ otherwise the result is obvious. Let x be in $\mathrm {bd}\,T$ and denote by $n(\theta )$ the normal vector at x for a well chosen $\theta $, so that $x=\mathrm {argmax}\,\left\{ \langle y,n(\theta )\rangle \right\} : y \in T\}$. Observe that the definition of the Minkowski sum yields

$$\begin{aligned} \max _{y\in T}\langle y,n(\theta )\rangle= & {} \max _{(v,w)\in T_-\times T_+}\langle a v+bw,n(\theta )\rangle \\= & {} a \max _{v\in T_-} \langle v,n(\theta )\rangle +b\max _{w\in T_+} \langle w,n(\theta )\rangle \end{aligned}$$

so that

$$\begin{aligned} \langle x,n(\theta )\rangle= & {} a \langle c_-(\theta ),n(\theta )\rangle +b\langle c_+(\theta ),n(\theta )\rangle \\= & {} \langle a c_-(\theta )+b c_+(\theta ),n(\theta )\rangle \end{aligned}$$

which implies by extremality of x that $x= a c_-(\theta )+b c_+(\theta )$. Conversely, for any such x, $n(\theta )$ defines a supporting hyperplane to T and x must be on the boundary of T. The other results follow immediately. $\square $

In the following fundamental proposition, we provide a smooth parametrization of the convex ring $T_+{\setminus } \mathrm {int}\,T_-$. The major difficulty is to control tightly the derivatives at the boundary so that the parametrizations can be glued afterward to build smooth interpolants.

Proposition 1

($C^k$ parametrization of convex rings) Let $T_-,T_+$ be as in Assumption 1 with their normal parametrization as above. Fix $k\ge 2$, $\lambda _-< \lambda _0< \lambda _1 < \lambda _+$ and $0< e_0,e_1<1$. Choose d in $ {\mathbb {N}}^*$, such that

$$\begin{aligned} \frac{k}{d} \le \min \left\{ \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-}, 1 - \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \right\} . \end{aligned}$$

Consider the map

$$\begin{aligned} G :[\lambda _-,\lambda _+] \times {\mathbb {R}}/2\pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \nonumber \\ (\lambda , \theta ) \mapsto {\tilde{\gamma }}_\Theta (\lambda ) \end{aligned}$$

(5)

with $\Theta = (c_-(\theta ), c_+(\theta ), \lambda _-, \lambda _0, \lambda _1, \lambda _+, e_0,e_1)$ and ${\tilde{\gamma }}$ as given by Lemma 1. Assume further that:

$$\begin{aligned}&({\mathcal {M}})\, \textit{For any }\theta \in {\mathbb {R}}/ 2\pi {\mathbb {Z}}, \lambda \mapsto \left\langle G(\lambda , \theta ), n(\theta )\right\rangle \textit{ has strictly positive derivative on }\\&[\lambda _-,\lambda _+]. \end{aligned}$$

Then the image of G is ${\mathcal {R}}:=T_+ {\setminus } \mathrm {int}(T_-)$, G is $C^k$ and satisfies, for any $2 \le l \le k$ and any m in $ {\mathbb {N}}^*$,

$$\begin{aligned} \frac{\partial ^m G}{\partial \theta ^m}(\lambda _-,\theta )&= c_-^{(m)}(\theta )&\frac{\partial ^m G}{\partial \theta ^m}(\lambda _+,\theta )&= c_+^{(m)}(\theta )\\ \frac{\partial ^{m+1} G}{\partial \lambda \partial \theta ^m} (\lambda _-,\theta )&= c_-^{(m)}(\theta ) \frac{e_0}{\lambda _0 - \lambda _-}&\frac{\partial ^{m+1} G}{\partial \lambda \partial \theta ^m} (\lambda _+,\theta )&= c_+^{(m)}(\theta ) \frac{e_1}{\lambda _+-\lambda _1}\\ \frac{\partial ^{l+m} G}{\partial \lambda ^l \partial \theta ^m }(\lambda _-,\theta )&= 0&\frac{\partial ^{l+m} G}{\partial \lambda ^l \partial \theta ^m }(\lambda _+,\theta )&= 0. \end{aligned}$$

Besides G is a diffeomorphism from its domain on to its image. Set ${\mathcal {R}}\ni x \mapsto (f(x), \theta (x))$ to be the inverse of G. Then f is $C^k$ and in addition, for all x in $ {\mathbb {R}}$,

$$\begin{aligned} \nabla f(x) =\quad&\frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, n(\theta (x))\nonumber \\ \nabla \theta (x) =\quad&\frac{1}{\left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle } \tau (\theta (x)) \nonumber \\&- \frac{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), \tau (\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), n(\theta (x))\right\rangle \left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle } n(\theta (x)) \nonumber \\ \nabla ^2 f(x)=\quad&\frac{\left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \nabla \theta (x) \nabla \theta (x)^T \nonumber \\&- \frac{\left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(f(x),\theta (x)), n(\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \nabla f(x) \nabla f(x)^T, \end{aligned}$$

(6)

where all denominators are positive.

Remark 2

Note that G is actually well defined and smooth on an open set containing its domain. As we shall see it is also a diffeomorphism from an open set containing its domain onto its image.

Proof

Note that by construction, we have $G(\lambda ,\theta ) = a(\lambda ) c_-(\theta ) + b(\lambda )c_+(\theta )$ for some polynomials a and b which are nonnegative on $[\lambda _-,\lambda _+]$. The formulas for the derivatives follow easily from this remark, the form of a and b and Lemma 1.

Set, for any $\lambda $ in $ [\lambda _-,\lambda _+]$, $T_\lambda = a(\lambda ) T_- + b(\lambda ) T_+$. The resulting set $T_\lambda $ is convex and has a positive curvature by Lemma 2, and for $\lambda $ fixed $G(\lambda ,\cdot )$ is the inverse of the Gauss map of $T_\lambda $, which constitutes a parametrization by normals of the boundary.

Assume that $\lambda < \lambda '$, using the monotonicity assumption $({\mathcal {M}})$, we have for any $\theta ,\theta '$,

$$\begin{aligned} \left\langle n(\theta '), G(\lambda ,\theta ) \right\rangle&\le \sup _{y \in T_{\lambda }} \left\langle n(\theta '),y \right\rangle \\&= \left\langle n(\theta '), G(\lambda ,\theta ') \right\rangle \\&< \left\langle n(\theta '), G(\lambda ',\theta ') \right\rangle \end{aligned}$$

so that $G(\lambda ,\theta ) \ne G(\lambda ',\theta ')$. Furthermore, we have by definition of $G(\lambda ',\theta ')$

$$\begin{aligned} G(\lambda ,\theta ) \in \bigcap _{\theta ' \in {\mathbb {R}}/2\pi {\mathbb {Z}}} \left\{ y,\; \left\langle y, n(\theta ')\right\rangle \le \left\langle n(\theta '), G(\lambda ',\theta ') \right\rangle \right\} = T_{\lambda '}, \end{aligned}$$

where the equality follows from the convexity of $T_{\lambda '}$. By convexity and compactness, this entails that $T_\lambda = \mathrm {conv}(\mathrm {bd}\,(T_\lambda )) \subset \mathrm {int}\,T_{\lambda '}$.

Let us show that the map G is bijective, first consider proving surjectivity. Let f be defined on $T_+ {\setminus } \mathrm {int}(T_-)$ through

$$\begin{aligned} f :x \mapsto \inf _{}\left\{ \lambda : \lambda \ge \lambda _-, \; x \in T_\lambda = a(\lambda ) T_- + b(\lambda ) T_+\right\} . \end{aligned}$$

(7)

Since $a(\lambda _+)=0, b(\lambda _+)=1$ this function is well defined and by compactness and continuity the infimum is achieved. It must hold that x belongs to $ \mathrm {bd}\,(T_{f(x)})$, indeed, if $f(x) = \lambda _-$, then x belongs to $ \mathrm {bd}\,(T_-)$ and otherwise, if x is in $ \mathrm {int}(T_{\lambda '})$ for $\lambda ' > \lambda _-$, then $f(x) < \lambda '$. We deduce that x is of the form $G(f(x),\theta )$ for a certain value of $\theta $, so that G is surjective.

As for injectivity, we have already seen a first case, the monotonicity assumption $({\mathcal {M}})$ ensures that $\lambda \ne \lambda '$ implies $G(\lambda ,\theta ) \ne G(\lambda ',\theta ')$ for any $\theta ,\theta '$. Furthermore, we have the second case, for any $\lambda $ in $ [\lambda _-,\lambda _+]$ and any $\theta $, $G(\lambda ,\theta ) = \arg \max _{y\in T_\lambda } \left\langle y,n(\theta ) \right\rangle $ so that $\theta \ne \theta '$ implies $G(\lambda ,\theta ) \ne G(\lambda ,\theta ')$. So in all cases, $(\lambda ,\theta ) \ne (\lambda ',\theta ')$ implies that $G(\lambda ,\theta ) \ne G(\lambda ',\theta ')$ and G is injective.

Let us now show that the map G is a local diffeomorphism by estimating its Jacobian map.

Since $0 \in \mathrm {int}(T_-)$, we have for any $\lambda ,\theta $,

$$\begin{aligned} 0&< \sup _{y \in T_-} \left\langle y, n(\theta ) \right\rangle \\&= \left\langle G(\lambda _-,\theta ),n(\theta ) \right\rangle \\&\le a(\lambda ) \left\langle c_-(\theta ), n(\theta )\right\rangle + b(\lambda ) \left\langle c_+(\theta ), n(\theta ) \right\rangle , \end{aligned}$$

and both scalar products are positive so that $a(\lambda ) + b(\lambda ) > 0$. Hence, for any $\theta $ in $ {\mathbb {R}}/ 2\pi {\mathbb {Z}}$,

$$\begin{aligned} \frac{\partial G}{\partial \theta }(\lambda ,\theta ) = (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \tau (\theta ), \end{aligned}$$

(8)

with $a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta ) > 0$.

Furthermore by assumption $\lambda \rightarrow \max _{y \in T_\lambda } \left\langle y, n(\theta ) \right\rangle =\left\langle G(\lambda ,\theta ),n(\theta ) \right\rangle $ has strictly positive derivative in $\lambda $, whence

$$\begin{aligned} \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), n(\theta )\right\rangle > 0. \end{aligned}$$

We deduce that for any fixed $\theta $, in the basis $(n(\theta ), \tau (\theta ))$, the Jacobian of G, denoted $J_G$, is triangular with positive diagonal entries. More precisely fix $\lambda , \theta $ and set $x = G(\lambda , \theta )$ such that $\lambda = f(x)$, $\theta = \theta (x)$. In the basis $(n(\theta ), \tau (\theta ))$, we deduce from (8) that the Jacobian of G is of the form

$$\begin{aligned} J_G(\lambda , \theta ) = \begin{pmatrix} \alpha &{}\quad 0\\ \gamma &{}\quad \beta \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} \alpha&= \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), n(\theta )\right\rangle>0 \\ \beta&= \left\langle \frac{\partial G}{\partial \theta } (\lambda ,\theta ), \tau (\theta )\right\rangle = a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta ) >0\\ \gamma&= \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), \tau (\theta )\right\rangle . \end{aligned}$$

It is thus invertible and we have a local diffeomorphism. We deduce that

$$\begin{aligned} J_G(\lambda , \theta )^{-1} = \begin{pmatrix} \alpha ^{-1} &{}\quad 0\\ \frac{-\gamma }{\alpha \beta }&{}\quad \beta ^{-1} \end{pmatrix}. \end{aligned}$$

We have $J_G(\lambda ,\theta )^{-1} = J_{G^{-1}}(x)$ so that the first line is $\nabla f(x)$ and second line is $\nabla \theta (x)$, which proves the claimed expressions for gradients.

We also have $d n(\theta ) / d\theta = \tau (\theta )$ so that

$$\begin{aligned} J_{n \circ \theta }(x) = \tau (\theta (x)) \nabla \theta (x)^T. \end{aligned}$$

Differentiating the gradient expression, we obtain ($\nabla $ denotes gradient with respect to x):

$$\begin{aligned}&\nabla ^2 f(x)\\&\quad =\quad \frac{\partial }{\partial x} \left( \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, n(\theta (x))\right) \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, J_{n\circ \theta }(x) \\&\qquad + n(\theta (x)) \nabla \left( \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle }\right) ^T \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, J_{n\circ \theta }(x) \\&\qquad - n(\theta (x)) \nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle ^2} \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \left( \tau (\theta ) \nabla \theta (x)^T \right. \\&\qquad \left. - \nabla f(x) \nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T\right) . \end{aligned}$$

We have

$$\begin{aligned} =\quad&\nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T \\ =\quad&\frac{\partial G}{\partial \lambda } (f(x), \theta (x))^T J_{n\circ \theta }(x) + n(\theta (x))^T J_{\frac{\partial G}{\partial \lambda }}(f(x),\theta (x)) J_{G^{-1}} (x) \\ =\quad&\left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla \theta (x)^T + \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x)^T, \end{aligned}$$

where, for the last identity, we have used the fact that

$$\begin{aligned} n(\theta (x))^T J_{\frac{\partial G}{\partial \lambda }}(f(x),\theta (x))&= \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle n(\theta (x))^T \\&\quad + \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda \partial \theta }(\lambda ,\theta )\right\rangle \tau (\theta (x))^T\\&= \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle n(\theta (x))^T\\ n(\theta (x))^T J_{G^{-1}}(x)&= n(\theta (x))^T J_G(f(x),\theta (x))^{-1}\\&= \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), n(\theta (x))\right\rangle } n(\theta (x))^T = \nabla f(x)^T. \end{aligned}$$

We deduce that

$$\begin{aligned}&\nabla ^2 f(x)\\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle }\\&\qquad \Bigg ( \tau (\theta (x)) \nabla \theta (x)^T - \left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla f(x) \nabla \theta (x)^T \\&\qquad - \nabla f(x) \nabla f(x)^T \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \Bigg ). \end{aligned}$$

We have

$$\begin{aligned}&\tau (\theta (x)) \nabla \theta (x)^T - \left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla f(x) \nabla \theta (x)^T \\&\quad =\quad \left( \tau (\theta (x)) - \frac{\left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle }{\left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), n(\theta (x)) \right\rangle } n(\theta (x)) \right) \nabla \theta (x)^T\\&\quad =\quad (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \nabla \theta (x) \nabla \theta (x)^T \end{aligned}$$

So that we actually get

$$\begin{aligned}&\nabla ^2 f(x) \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \\&= (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \nabla \theta (x) \nabla \theta (x)^T - \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x) \nabla f(x)^T\\&= \left\langle \frac{\partial G}{\partial \theta }(\lambda ,\theta ),\tau (\theta (x))\right\rangle \nabla \theta (x) \nabla \theta (x)^T - \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x) \nabla f(x)^T. \end{aligned}$$

This concludes the proof. $\square $

3.2 Smooth convex interpolation of smooth positively curved convex sequences

In this section, we consider an indexing set I with either $I = {\mathbb {N}}$ or $I = {\mathbb {Z}}$, and an increasing sequence of compact convex sets $\left( T_i \right) _{i\in I}$ such that for any i in I, the couple $T_+ := T_{i+1}$, $T_-: = T_i$ satisfies Assumption 1. In particular, for each i in I, $T_i$ is compact, convex with $C^k$ boundary and positive curvature. We denote by $c_i$ the corresponding parametrization by the normal, $T_i \subset \mathrm {int}\, T_{i+1}$. With no loss of generality we assume $\displaystyle 0 \in \cap _{i \in I} T_i$.

This is our main theoretical result.

Theorem 2

(Smooth convex interpolation) Let $I = {\mathbb {N}}$ or $I = {\mathbb {Z}}$ and $\left( T_i \right) _{i\in I}$ such that for any $i \in I$, $T_i \subset {\mathbb {R}}^2$ and the couple $T_+ := T_{i+1}$, $T_-: = T_i$ satisfies Assumption 1. Then there exists a $C^k$ convex function

$$\begin{aligned} f :{\mathcal {T}}:= \mathrm {int}\left( \bigcup _{i\in I} T_i \right) \mapsto {\mathbb {R}}\end{aligned}$$

such that

(i) $T_i$ is a sublevel set of f for all i in I.

(ii) We have

$$\begin{aligned} {{\,\mathrm{argmin}\,}}f= {\left\{ \begin{array}{ll} \bigcap _I T_i &{}\quad \text { if } I = {\mathbb {Z}}\\ \{0\}&{}\quad \text { if } I = {\mathbb {N}}. \end{array}\right. } \end{aligned}$$

(iii) $\nabla ^2 f $ is positive definite on ${\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}f$, and if $I = {\mathbb {N}}$, it is positive definite throughout ${\mathcal {T}}$.

Proof

Preconditionning. We have that $0 \in \cap _{i \in I } \mathrm {int}(T_i)$. Hence for any i in I and $0 \le \alpha <1$, $\alpha T_i \in \mathrm {int}(T_i)$. Furthermore, for $\alpha >0$, small enough, $(1 + \alpha )T_{i} \subset (1-\alpha ) \mathrm {int}(T_{i+1})$. Set $\alpha _0$ such that $(1 + \alpha _0)T_0 \subset (1-\alpha _0) \mathrm {int} (T_1)$. By forward (and backward if $I = {\mathbb {Z}}$) induction, for all i in I, we obtain $\alpha _i>0$ such that

$$\begin{aligned} (1 + \alpha _i) T_i&\subset (1-\alpha _{i}) \mathrm {int} (T_{i+1}). \end{aligned}$$

Setting for all i in I, $\epsilon _{i+1} = \min \{\alpha _i,\alpha _{i+1}\}$ ($\epsilon _0 = \alpha _0$ if $I = {\mathbb {N}}$), we have for all i in I

$$\begin{aligned} (1 + \epsilon _i) T_i \subset (1 + \alpha _i) T_i&\subset (1-\alpha _{i}) \mathrm {int} (T_{i+1}) \subset (1-\epsilon _{i+1}) \mathrm {int} (T_{i+1}). \end{aligned}$$

For all i in I, we introduce

$$\begin{aligned} S_{3i}&= T_i,\\ S_{3i + 1}&= (1 + \epsilon _i) T_i\\ S_{3i + 2}&= (1 - \epsilon _{i+1}) T_{i+1}. \end{aligned}$$

We have a new sequence of strictly increasing compact convex sets $\left( S_i\right) _{i \in I }$.

Value assignation. For each i in I, we set

$$\begin{aligned} K_i = \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{ \sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} \in \left( 0, + \infty \right) . \end{aligned}$$

Note that for all i in I, $K_{3i} = 1$. We choose $\lambda _{1} = 2$, $\lambda _0 = 1$ and for all i in I,

$$\begin{aligned} \lambda _{i+1} = \lambda _i + K_i(\lambda _i - \lambda _{i-1}). \end{aligned}$$

(9)

By construction, we have for all i in I and all $\theta \in {\mathbb {R}}/ 2 \pi {\mathbb {Z}}$,

$$\begin{aligned} \frac{\sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta ))}{\lambda _{i+1} - \lambda _{i}} \le \frac{\sigma _{S_{i}}(n(\theta )) - \sigma _{S_{i-1}}(n(\theta )) }{\lambda _{i} - \lambda _{i-1}}. \end{aligned}$$

If $I = {\mathbb {Z}}$, this entails

$$\begin{aligned} 0 < \lambda _i - \lambda _{i - 1} \le \frac{\lambda _{1}- \lambda _0}{\sigma _{S_{1}}(n(\theta )) - \sigma _{S_{0}}(n(\theta ))} (\sigma _{S_{i}}(n(\theta )) - \sigma _{S_{i-1}}(n(\theta ))), \end{aligned}$$

and the right-hand side is summable over negative indices $i\le 0$, so that $\lambda _{i} \rightarrow {\underline{\lambda }} \in {\mathbb {R}}$ as $i \rightarrow -\infty $. In all cases $(\lambda _i)_{i\in I}$ is an increasing sequence bounded from below.

Local interpolation. We fix i in I and consider the function $G_i$ described in Proposition 1 with $T_+ = S_{3i + 3} = T_{i+1}$, $T_- = S_{3i} = T_i$, $\lambda _+ = \lambda _{3i+3}$, $\lambda _{1} = \lambda _{3i+2}$, $\lambda _0 = \lambda _{3i+1}$, $\lambda _- = \lambda _{3i}$, $e_0 = \epsilon _i$, $e_1 = \epsilon _{i+1}$. By linearity, we have for any $(\lambda ,\theta ) \in [\lambda _-,\lambda _+] \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}$,

$$\begin{aligned} \left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle = {\tilde{\gamma }}_{\Theta }(\lambda ) \end{aligned}$$

where $ {\tilde{\gamma }}_{\Theta }$ is as in Lemma 1 with input data $q_0 = \left\langle c_{3i}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i}}(n(\theta ))$, $q_1 = \left\langle c_{3i+3}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i+3}}(n(\theta ))$, and $\lambda _-,\lambda _0,\lambda _1,\lambda _+,e_1,e_0$ as already described. This corresponds to the Bernstein approximation of the piecewise affine interpolation between the points

$$\begin{aligned}&\left( \lambda _{3i}, \sigma _{S_{3i}}(n(\theta ))\right) \nonumber \\&\left( \lambda _{3i+1}, \sigma _{S_{3i+1}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+2}, \sigma _{S_{3i+2}}(n(\theta ))\right) ,\nonumber \\&(\lambda _{3i+3}, \sigma _{3i+3}(n(\theta ))), \end{aligned}$$

(10)

By construction of $\left( K_i \right) _{i \in I }$, we have for all $\theta $,

$$\begin{aligned}&0 < \frac{\sigma _{S_{3i+3}}(n(\theta )) - \sigma _{S_{3i+2}}(n(\theta )) }{\lambda _{3i+3} - \lambda _{3i + 2}}\\&\quad \le \frac{\sigma _{S_{3i+2}}(n(\theta )) - \sigma _{S_{3i+1}}(n(\theta )) }{\lambda _{3i+2} - \lambda _{3i + 1}} \le \frac{\sigma _{S_{3i+1}}(n(\theta )) - \sigma _{S_{3i}}(n(\theta )) }{\lambda _{3i+1} - \lambda _{3i}}. \end{aligned}$$

Whence the affine interpolant between points in (10) is strictly increasing and concave, and by using the shape preserving properties of Bernstein polynomials, $\left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle $ has strictly positive derivative. As a consequence $G_i$ is a diffeomorphism and its derivatives are as in Proposition 1. Furthermore

$$\begin{aligned} \lambda \mapsto \left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle \end{aligned}$$

is a $C^k$ concave function of $\lambda $.

Global interpolation. Recall that ${\underline{\lambda }} = \inf _{i \in I} \lambda _i>-\infty $ and set ${\bar{\lambda }} = \sup _{i \in I} \lambda _i\in (-\infty ,+\infty ]$. For any $\lambda \in ({\underline{\lambda }},{\bar{\lambda }})$, there exists a unique $i_\lambda \in I$ such that $\lambda \in [\lambda _{3i_\lambda }, \lambda _{3i_\lambda +3})$. Define

$$\begin{aligned} G :({\underline{\lambda }}, {\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \\ (\lambda ,\theta )&\mapsto G_{i_\lambda } (\lambda , \theta ). \end{aligned}$$

Fix i in I. The boundary of $T_{i+1}$ is given by $G_{i+1}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}}) = G_{i}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}})$ with actually

$$\begin{aligned} G_{i+1}(\lambda _{3i+3}, \theta ) = G_{i}(\lambda _{3i+3}, \theta ), \text{ for } \text{ all } \theta \text{ in } {\mathbb {R}}/ 2 \pi {\mathbb {Z}}. \end{aligned}$$

(11)

Since $K_{3i} = 1$, we have

$$\begin{aligned} \lambda _{3i+1} - \lambda _{3i} = \lambda _{3i} - \lambda _{3i-1}. \end{aligned}$$

The expressions of the derivatives in Proposition 1 and (11) ensure that the derivatives of $G_{i+1}$ and $G_{i}$ agree on ${\lambda _{3i+3}} \times {\mathbb {R}}/ 2 \pi {\mathbb {Z}}$ up to order k. Hence G is a local diffeomorphism. Bijectivity of each $G_i$ ensure that G is also bijective and thus G is a diffeomorphism. Furthermore

$$\begin{aligned} \lambda \mapsto \left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \end{aligned}$$

is $C^k$ piecewise concave and thus concave.

Extending G. If $I = {\mathbb {N}}$, we may assume without loss of generality that $S_0 = B$ the Euclidean ball and $S_1 = 5/3 S_0$, which corresponds to $\epsilon _0 = 2/3$, eventually after adding a set in the list and rescaling. Let $\phi $ denote the function described in Lemma 10 and $G_{-1}$ be described as in Lemma 11. This allows to extend G for $\lambda \in [0,1]$, G is then $C^k$ on $(0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}$ by using Lemma 11 and Proposition 1. This does not affect the differentiability, monotonicity and concavity properties of G.

Defining the interpolant f. We assume without loss of generality that ${\underline{\lambda }} = 0$. We set f to be the first component of the inverse of G so that it is defined on $G^{-1}\left( (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\right) $. We extend f as follows:

$f(0) = 0$ if $I = {\mathbb {N}}$,
$f = 0$ on $\cap _{i \in I } T_i$ if $I = {\mathbb {Z}}$.

Since G is $C^k$ and non-singular on $ (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}$, the inverse mapping theorem ensures that f is $C^k$ on $\mathrm {int}({\mathcal {T}}) {\setminus } \arg \min _{{\mathcal {T}}} f$.

Convexity of f. For any $\theta $ in $ {\mathbb {R}}/ 2\pi {\mathbb {Z}}$,

$$\begin{aligned} (0, {\bar{\lambda }})&\mapsto {\mathbb {R}}_+\\ \lambda&\mapsto \sup _{z\in [f \le \lambda ]} n(\theta )^Tz \end{aligned}$$

is equal to $\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle $ which is concave. It can be extended at $\lambda = 0$ by continuity. This preserves concavity hence, using Theorem 1, we have proved that f is convex and $C^k$ on ${\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}_{{\mathcal {T}}}f$.

Smoothness around the argmin and Hessian positivity. If $I = {\mathbb {N}}$, then the interpolant defined in Lemma 11 ensures that f is proportional to the norm squared around 0. Hence it is $C^k$ around 0 with positive definite Hessian. We may compose f with the function $g :t \mapsto \sqrt{t^2 + 1} + t$ which is increasing and has positive second derivative. This ensures that the resulting Hessian is positive definite outside ${{\,\mathrm{argmin}\,}}f$ and thus everywhere since

$$\begin{aligned} \nabla ^2 g \circ f = g' \nabla ^2 f + g'' \nabla f \nabla f^T \end{aligned}$$

is positive definite thanks to the expressions for the Hessian of f in Proposition 1

If $I = {\mathbb {Z}}$, we let all the derivatives of f vanish around the solution set. The smoothing Lemma 14 applies and provides a function $\phi $ with positive derivative on $(0,+\infty )$, such that $\phi \circ f$ is convex, $C^k$ with prescribed sublevel sets. Furthermore, we remark that

$$\begin{aligned} \nabla ^2 \phi \circ f = \phi ' \nabla ^2 f + \phi '' \nabla f \nabla f^T. \end{aligned}$$

We may compose $\phi \circ f$ with the function $g :t \mapsto \sqrt{t^2 + 1} + t$ which is increasing with positive second derivative, the expressions for the Hessian of f in Proposition 1 ensure that the resulting Hessian is positive definite out of ${{\,\mathrm{argmin}\,}}f$.$\square $

Remark 3

(Interpolation of symmetric rings) In view of Remark 1, if we have $T_{i+1} = \alpha T_i$ for some $0<\alpha < 1$ and i in $ {\mathbb {Z}}$, then the interpolated level sets between $T_i$ and $T_{i+1}$ are all of the form $s T_i$ for $\alpha \le s \le 1$.

Remark 4

(Strict convexity) Recall that strict convexity of a differentiable function amounts to the injectivity of its gradient. In Theorem 2 if there is a unique minimizer, then the invertibility of the Hessian outside argmin f ensures that our interpolant is strictly convex (note that this is automatically the case if $I = {\mathbb {N}}$).

3.3 Considerations on Legendre functions

The following proposition provides some interpolant with additional properties as global Lipschitz continuity and finiteness properties for the dual function. At this stage these results appear as merely technical but they happen to be decisive in the construction of counterexamples involving Legendre functions. The properties of Legendre functions can be found in [37, Chapter 6]. We simply recall here that, given a convex body C of ${\mathbb {R}}^p$, a convex function $h:C\rightarrow {\mathbb {R}}$ is Legendre if it is differentiable on $\mathrm {int}\,C$ and if $\nabla h$ defines a bijection from $\mathrm {int}\,C$ to $\nabla h(\mathrm {int}\,C)$ with in addition

$$\begin{aligned} \lim _{\begin{array}{l} x\in \mathrm {int}\,C\\ x\rightarrow z\end{array}} \Vert \nabla h(x)\Vert =+\infty , \end{aligned}$$

for all z in $\mathrm {bd}\,C$. We also assume that $\text{ epi }\,f:=\{(x,\lambda ):f(x)\le \lambda \}$ is closed in ${\mathbb {R}}^{p+1}$. The Legendre conjugate or dual function of h is defined through

$$\begin{aligned} h^*(z)=\sup \left\{ \langle z,x\rangle -h(x):x \in C\right\} , \end{aligned}$$

for z in ${\mathbb {R}}^p$, and its domain is $D:=\left\{ z\in {\mathbb {R}}^p:h(z)<+\infty \right\} .$ The function $h^*$ is differentiable on the interior of D, and the inverse of $\nabla h:\mathrm {int}\,C \rightarrow \mathrm {int}\,D$ is $\nabla h^*:\mathrm {int}\,D \rightarrow \mathrm {int}\,C$.

We start with a simple technical lemma on the compactness of the domain of a Legendre function.

Lemma 3

Let $h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ be a globally Lipschitz continuous Legendre function, and set $D = \mathrm {int}(\mathrm {dom}(h^*))$ where $h^* :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ is the conjugate of h. For each $\lambda \ge \min _{{\mathbb {R}}^2}h$ let $\sigma _\lambda $ be the support function associated to the set $\left\{ z \in {\mathbb {R}}^2, \, h(z) \le \lambda \right\} $. The following are equivalent

(i)
$h^*(x)\le 0$ for all $x \in D$.
(ii)
For all $y \in {\mathbb {R}}^2$, $\sigma _{h(y)}(\nabla h(y)) \le h(y)$.

In both cases $h^*$ has compact domain.

Proof

Let us establish beforehand the following formula

$$\begin{aligned} h^*(z)&= \sigma _{h(y)}(\nabla h(y))- h(y), \end{aligned}$$

(12)

with $y = \nabla h^* (z)$ and $y\in {\mathbb {R}}^2$. Since $h^*$ is Legendre, we have for all y in D,

$$\begin{aligned} h^*(z)&= \sup _{y \in {\mathbb {R}}^2} \left\langle z, y \right\rangle - h(y) = \left\langle z, \nabla h^*(z) \right\rangle - h(\nabla h^*(z)). \end{aligned}$$

We have, setting $y = \nabla h^*(z)$

$$\begin{aligned} \left\langle z, \nabla h^*(z) \right\rangle = \left\langle \nabla h(y),y \right\rangle = \sigma _{h(y)}(\nabla h(y)) \end{aligned}$$

because $\nabla h(y)$ is normal to the sublevel set of h which contains y in its boundary. Hence we have $h^*(z) = \sigma _{h(y)}(\nabla h(y))- h(y)$ with $y = \nabla h^* (z)$, that is (12) holds. Since $\nabla h^* :D \mapsto {\mathbb {R}}^2$ is a bijection, the equivalence follows. In this case the domain of $h^*$ is closed because $h^*$ is bounded and lower semicontinuous. The domain is also bounded by the Lipschitz continuity of h, whence compact. $\square $

Proposition 2

(On Legendre interpolation) Let $\left( S_i \right) _{i\in {\mathbb {N}}}$ be such that for any i in I, $T_- = S_{i}$, $T_+ = S_{i+1}$ satisfy Assumption 1 and there exists a sequence $\left( \epsilon _i \right) _{i \in {\mathbb {N}}}$ in (0, 1) such that for all $i\ge 1$, $(1 - \epsilon _i)^{-1} S_{3i-1} = (1 +\epsilon _i)^{-1} S_{3i+1}=S_{3i}$.

Assume in addition that,

$$\begin{aligned} \inf _{\Vert x\Vert = 1}&\sigma _{S_i}(x) - \sigma _{S_{i-1}}(x) =1 + O\left( \frac{1}{i^3}\right)&\text{(non } \text{ degeneracy) }, \end{aligned}$$

(13)

$$\begin{aligned} \sup _{\Vert x\Vert =1}&\left| \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{\sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} - 1\right| = O\left( \frac{1}{i^3} \right)&\text{(moderate } \text{ growth) }. \end{aligned}$$

(14)

Then there exists a convex $C^k$ function $h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$, such that

For all i in $ {\mathbb {N}}$, $S_{3i}$ is a sublevel set of h,
h has positive definite Hessian,
h is globally Lipschitz continuous,
$h^*$ has a compact domain D and is $C^k$ and strictly convex on $\mathrm {int}(D)$.

Proof

The construction of h follows the exact same principle as that of Theorem 2. This ensures that the first two points are valid. Note that Eq. (13) implies that the sets sequence grows by at least a fixed amount in each direction as i grows. Hence we have ${\mathcal {T}}= {\mathbb {R}}^2$.

Global Lipschitz continuity of h: The values of h are defined through

$$\begin{aligned} \lambda _{i+1}-\lambda _{i}&=K_i(\lambda _{i}-\lambda _{i-1}),\;\forall i\in {\mathbb {N}}^*\\ K_i&= \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{ \sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} \in \left( 0, + \infty \right) , \end{aligned}$$

so that $K_i = 1 + O(1/i^3)$ thanks to Eq. (14). Note that the moderate growth assumption entails

$$\begin{aligned} \sup _{i \in {\mathbb {N}}} \sigma _{S_{i+1}}(x) - \sigma _{S_{i}}(x) = O\left( \prod _{i \in {\mathbb {N}}^*} K_i\right) = O(1). \end{aligned}$$

(15)

For $i\ge 1$, one has

$$\begin{aligned} \lambda _{i+1}-\lambda _{i}=\prod _{1 \le j \le i} K_j(\lambda _1-\lambda _0). \end{aligned}$$

(16)

On the other hand using the bounds (14), (13) and the identity (16), there exists a constant $\kappa >0$ such that for all $i \ge 1$, all $\theta $ in $ {\mathbb {R}}/ 2 \pi {\mathbb {Z}}$,

$$\begin{aligned} \frac{\sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta )) }{\lambda _{i+1} - \lambda _{i}} = \frac{\left( \sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta ))\right) }{\prod _{j=1}^i K_j (\lambda _{1} - \lambda _{0})} \ge \kappa >0. \end{aligned}$$

(17)

By the interpolation properties described in Lemma 1 the function $\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle $ constructed in Theorem 2 has derivative with respect to $\lambda $ greater than $\kappa $. Recalling the expression of the gradient as given in Proposition 1 (and the concavity of G with respect to $\lambda $), this shows that

$$\begin{aligned} \Vert \nabla h(x)\Vert \le \frac{1}{\kappa } \end{aligned}$$

for all x in $ {\mathbb {R}}^2$, and by the mean value theorem, h is globally Lipschitz continuous on ${\mathbb {R}}^2$.

Properties of the dual function: h is Legendre, its conjugate $h^*$ is therefore Legendre. From the definiteness of $\nabla ^2 h$ and the fact that $\nabla h :{\mathbb {R}}^2 \mapsto \mathrm {int}(D)$ is a bijection, we deduce that $h^*$ is $C^k$ by the inverse mapping theorem. So the only property which we need to establish is that $h^*$ has a compact domain, in other words, using Lemma 3, it is sufficient to show that $\sup _{x \in \mathrm {int} D} h^*(x) \le 0$.

Using the notation of the proof of Theorem 2, we will show that it is possible to verify that, for all $\lambda ,\theta $ in the domain of G

$$\begin{aligned} \frac{\left\langle n(\theta ), G(\lambda ,\theta )\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle } \le \lambda . \end{aligned}$$

(18)

Equation (18) is indeed the coordinate form of the characterization given in Lemma 3. Let us observe that

$$\begin{aligned} \frac{\partial }{\partial \lambda } \left( \left\langle n(\theta ), G(\lambda ,\theta )\right\rangle - \lambda \left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle \right) = -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle , \end{aligned}$$

(19)

and since G is concave, the right hand side is positive.

Assume that we have proved that,

$$\begin{aligned} \lambda \mapsto \sup _\theta -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$

(20)

has finite integral as $\lambda \rightarrow \infty $. Since the function

$$\begin{aligned} \theta \mapsto \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$

(21)

is continuous on ${\mathbb {R}}/2\pi {\mathbb {Z}}$ for any $\lambda $, Lebesgue dominated convergence theorem would ensure that

$$\begin{aligned} \theta \mapsto \int _{\lambda \ge \lambda _0} -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle d \lambda \end{aligned}$$

is continuous in $\theta $, so that:

$$\begin{aligned} \sup _{\theta } \left[ \lim _{\lambda \rightarrow \infty } \left\langle n(\theta ), G(\lambda ,\theta )\right\rangle - \lambda \left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle \right] <+\infty . \end{aligned}$$

Shifting values if necessary, we could assume that this upper bound is equal to zero to obtain Eq. (18). The latter being the condition required in Lemma 3, we would have reached a conclusion.

Let us therefore establish that (19) is integrable over ${\mathbb {R}}_+$. Recall that G is constructed using the Bernstein interpolation given in Lemma 1 between successive values of $\lambda $. As a result, for a fixed $\theta $, the function $\left\langle n(\theta ), G(\lambda ,\theta )\right\rangle $ is the interpolation of the piecewise affine function interpolating

$$\begin{aligned}&\left( \lambda _{3i}, \sigma _{S_{3i}}(n(\theta ))\right) \nonumber \\&\left( \lambda _{3i+1}, \sigma _{S_{3i+1}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+2}, \sigma _{S_{3i+2}}(n(\theta ))\right) ,\nonumber \\&(\lambda _{3i+3}, \sigma _{3i+3}(n(\theta ))), \end{aligned}$$

(22)

as in Eq. (10). This interpolation is concave and increasing.

Assumption (14) ensures that $K_j = 1 + O(1/j^3)$. Then

$$\begin{aligned} \prod _{j=1}^m K_j = \bar{K} + O(1 / j^{2}) \end{aligned}$$

where $\bar{K}$ is the finite, positive limit of the product (we can for example perform integral series comparison after taking the logarithm).

The recursion on the values writes for all $i\ge 1$

$$\begin{aligned} \lambda _{i+1} = \lambda _i + K_i(\lambda _i - \lambda _{i-1}), \end{aligned}$$

so that

$$\begin{aligned} \lambda _{i+1} - \lambda _i= (\lambda _1 - \lambda _0) \prod _{j=1}^i K_i = (\lambda _1 - \lambda _0) \bar{K} + O(1/i^{2}). \end{aligned}$$

This means that the gap between consecutive values tends to be constant. Thus by (4) in Lemma 1, see also Remark 1, the degree of the Bernstein interpolants is bounded. Using this bound together with inequality (3), providing bounds for the derivatives of Bernstein’s polynomial, ensure that, for all $\lambda $ in $ [\lambda _{3i}, \lambda _{3i+3})$:

$$\begin{aligned}&\left| \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \right| \\&\quad =\, O\left( \max _{j = 3i+2, 3i+1}\left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\lambda _{j+1} - \lambda _{j}} - \frac{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) }{\lambda _{j} - \lambda _{j-1}} \right| \right) . \end{aligned}$$

Now for any $j = 3i+2, 3i+1$,

$$\begin{aligned}&\left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\lambda _{j+1} - \lambda _{j}} - \frac{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) }{\lambda _{j} - \lambda _{j-1}} \right| \\&\quad =\, \frac{1}{\lambda _{j+1} - \lambda _{j}} \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - \frac{\lambda _{j+1} - \lambda _{j}}{\lambda _{j} - \lambda _{j-1}} (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}(n(\theta )) }) \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - (1 + O(1/i^2)) (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}(n(\theta )) }) \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}(n(\theta )) }) \right| \end{aligned}$$

where the last identity follows from the triangle inequality because using $\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) = O(1)$ in (15). Hence

$$\begin{aligned}&\left| \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times O\left( \max _{j = 3i+2, 3i+1}\left| \sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta )) - (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) ) \right| \right) \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times O\left( \max _{j = 3i+2, 3i+1}\left| \sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) \right| \times \left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) } - 1 \right| \right) \\&\quad =\, O(1/i^3), \end{aligned}$$

where the last inequality follows from (15) and (14). Now as $i \rightarrow \infty $, $\lambda _{3i} \sim \lambda _{3i + 3} \sim i c$ for some constant $c>0$ and

$$\begin{aligned} \sup _{\lambda \in [\lambda _{3i}, \lambda _{3i+3}], \theta \in [0, 2 \pi ]} - \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle = O(1 / i^2) \end{aligned}$$

and

$$\begin{aligned} \sup _{ \theta \in [0, 2 \pi ]} - \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$

has finite integral as $\lambda \rightarrow \infty $. This implies (20) and it concludes the proofs. $\square $

4 Smooth convex interpolation for sequences of polygons

Given a sequence of points $A_1,\ldots ,A_n$, we denote by $A_1\ldots A_n$ the polygon obtained by joining successive points ending the loop with the segment $[A_n,A_1]$. In the sequel we consider mainly convex polygons, so that the vertices $A_1,\ldots ,A_n$ are also the extreme points.

The purpose of this section is first to show that polygons can be approximated by smooth convex sets with prescribed normals under weak assumptions. Figure 3 illustrates the result we would like to establish: given a target polygon with prescribed normals at its vertices, we wish to construct a smooth convex set interpolating the vertices with the desired normals and whose distance to the polygon is small.

Then given a sequence of nested polygons, we provide a smooth convex function which interpolates the polygons in the sense described just above.

Given a closed nonempty convex subset S of ${\mathbb {R}}^p$ and x in S, we recall that the normal cone to S at x is

$$\begin{aligned} N_S(x)=\left\{ z\in {\mathbb {R}}^p:\langle z,y-x\rangle \le 0,\forall y\in S\right\} . \end{aligned}$$

Such vectors will often simply called normals (to S) at x.

4.1 Smooth approximations of polygons

Lemma 4

For any $r_-,r_+ > 0$, $t_- > 0$, $t_+<0$ and $\epsilon > 0$, $m \in {\mathbb {N}}$, $m \ge 3$, there exists a strictly concave polynomial function $p :[0,1] \mapsto [0,\epsilon ]$ such that

$$\begin{aligned} p(0)&= 0&p(1)&= 0\\ p'(0)&= t_-&p'(1)&= t_+\\ p''(0)&= -r_-&p''(1)&= -r_+.\\ p^{(q)}(0)&= 0 \quad q \in \{3, \ldots , m\}. \end{aligned}$$

Proof

Let us begin with preliminary remarks. Consider for any a, b in $ {\mathbb {R}}$, the function

$$\begin{aligned} f :t \mapsto a(t-b)^2. \end{aligned}$$

For any t in $ {\mathbb {R}}$, q in $ {\mathbb {N}}$, $q > 2$, and any $c > 0$, we have

$$\begin{aligned} f(t + c) - f(t)&= \Delta ^1_c f(t) = a c ( 2(t-b) + c)\nonumber \\ f(t + 2 c) - 2 f(t+c) + f(t)&= \Delta ^2_c f(t) = a c ( 2 (t + c - b) + c - 2(t-b) - c) \nonumber \\ = 2ac^2 \Delta _c^q f(t)&= 0 \end{aligned}$$

(23)

Choosing the degree d and constructing the polynomial. For any d in $ {\mathbb {N}}$, $d \ge 2m + 1$, we set

$$\begin{aligned} a_-(d)&= \frac{-dr_-}{2(d - 1)}<0&b_-(d)&= \frac{1}{2d} \left( 1 + 2 t_-\frac{d-1}{r_-} \right) > 0\nonumber \\ a_+(d)&= \frac{-dr_+}{2(d - 1)}<0&b_+(d)&= 1 + \frac{1}{2d} \left( -1 + 2t_+\frac{d-1}{r_+} \right) < 1, \end{aligned}$$

(24)

and define the functions

$$\begin{aligned} f_d :s&\mapsto {\left\{ \begin{array}{ll} a_-(d) (( s - b_-(d))^2 - b_-(d)^2) &{} \text { if } s\le b_-(d)\\ - a_-(d) b_-(d)^2&{} \text { if } s\ge b_-(d) \end{array}\right. }\\ g_d :t&\mapsto {\left\{ \begin{array}{ll} a_+(d) (( s - b_+(d))^2 - (1-b_+(d))^2) &{} \text { if } s\ge b_+(d)\\ - a_+(d) (1-b_+(d))^2&{} \text { if } s\le b_+(d). \end{array}\right. } \end{aligned}$$

Furthermore, we set

$$\begin{aligned} f :t&\mapsto {\left\{ \begin{array}{ll} \frac{r_-}{2} \left( \left( \frac{t_-}{r_-} \right) ^2- \left( s - \frac{t_-}{r_-} \right) ^2 \right) &{} \text { if } s\le \frac{t_-}{r_-}\\ \frac{r_-}{2} \left( \frac{t_-}{r_-} \right) ^2 &{} \text { if } s\ge \frac{t_-}{r_-} \end{array}\right. }\\ g :t&\mapsto {\left\{ \begin{array}{ll} \frac{r_+}{2} \left( \left( \frac{t_+}{r_+} \right) ^2 - \left( s -1 - \frac{t_+}{r_+} \right) ^2 \right) &{} \text { if } s\ge 1 + \frac{t_+}{r_+}\\ \frac{r_+}{2} \left( \frac{t_+}{r_+} \right) ^2 &{} \text { if } s\le 1 + \frac{t_+}{r_+}. \end{array}\right. } \end{aligned}$$

Note that $b_-(d) \rightarrow t_- / r_-$, $b_+(d) \rightarrow 1 + t_+ / r_+$, $a_-(d) \rightarrow - r_-/2$ and $a_+(d) \rightarrow - r_+/2$ as $d \rightarrow \infty $ so that $f_d \rightarrow f$ and $g_d \rightarrow g$ uniformly on [0, 1]. For any d, $f_d$ is concave increasing and $g_d$ is concave decreasing and all of them are Lipschitz continuous on [0, 1] with constants that do not depend on d. Note also that $f(0) = 0 < g(0)$ and $g(1) = 0 < f(1)$. We choose $d \ge 2m + 1$ such that

$$\begin{aligned} f_d\left( \frac{m}{d} \right)&\le \min \left( \epsilon , g_d\left( \frac{m}{d} \right) \right) \nonumber \\ g_d\left( 1 - \frac{m}{d} \right)&\le \min \left( \epsilon , f_d\left( 1 - \frac{m}{d} \right) \right) . \end{aligned}$$

(25)

Such a d always exists because in both cases, the left hand side converges to 0 and the right hand side converges to a strictly positive term as d tends to $\infty $. For such a d, we set $h :s \mapsto \min \left\{ f_d(s), g_d(s),\epsilon \right\} $. By construction, h is concave, agrees with $f_d$ on $\left[ 0,\frac{m}{d} \right] \subset [0,1/2]$ and with $g_d$ on $\left[ 1- \frac{m}{d},1 \right] \subset [1/2,1]$. Using Eq. (23) with $c = 1/d$, we deduce that

$$\begin{aligned} d (d-1) \Delta ^2 h(0)&= d (d-1) \Delta ^2 f_d(0) = d(d-1)\frac{2 a_-(d)}{d^2} = -r_-\\ d \Delta h(0)&= d \Delta f_d(0) = a_-(d)\left( \frac{1}{d} -2b_-(d)\right) = t_-\\ \Delta ^{q}h(0)&= 0 = \Delta ^{q}f_d(0) \quad \forall m \ge q \ge 3\\ d (d-1) \Delta ^2 h\left( 1 - \frac{2}{d}\right)&= d (d-1) \Delta ^2 g_d\left( 1 - \frac{2}{d}\right) = d(d-1)\frac{2 a_+(d)}{d^2} = -r_+\\ d \Delta h\left( 1 - \frac{1}{d} \right)&= d \Delta g_d\left( 1 - \frac{1}{d} \right) = a_+(d) \left( \frac{-1}{d} + 2 (1 - b_+(d))\right) = t_+\\ \Delta ^{q}h\left( 1 - \frac{m}{d} \right)&= \Delta ^{q}g_d\left( 1 - \frac{m}{d} \right) = 0 \quad \forall m \ge q \ge 3. \end{aligned}$$

From the concavity of h and the derivative formula (2), we deduce that the polynomial $B_{h,d}$ satisfies the desired properties (Fig. 4). $\square $

We deduce the following result

Lemma 5

Let $a > 0$, $r > 0$, $\epsilon > 0$, and an integer $m \ge 3$. Consider two unit vectors: $v_-$ with strictly positive entries and $v_+$ with first entry strictly positive and second entry strictly negative. Then there exists a $C^m$ curve $\gamma :[0,M] \mapsto {\mathbb {R}}^2$, such that

1.
$\Vert \gamma '\Vert = 1$.
2.
$\gamma (0) = (-a, 0) := A$ and $\gamma (1) = (0,a) : = B$.
3.
$\gamma '(0) = v_-$ and $\gamma '(1) = v_+$.
4.
$\Vert \gamma ''(0)\Vert = \Vert \gamma ''(-1)\Vert = r$.
5.
$\mathrm {det}(\gamma ',\gamma '') < 0$ along the curve.
6.
$\gamma ^{(q)}(0) = \gamma ^{(q)}(1) = 0$ for any $3 \le q \le m$.
7.
$\mathrm {dist}(\gamma ([0,M]), [A,B]) \le \epsilon $.

Proof

Consider the graph of a polynomial as given in Lemma 4 with $t_- = v_-[2] / v_-[1]> 0$, $t_+ = v_+[2] / v_+[1]< 0$ and $r_- = \frac{r}{2a} (1 + t_-^2)^{\frac{3}{2}}$, $r_+ = \frac{r}{2a} (1 + t_+^2)^{\frac{3}{2}}$ and $\epsilon /2a$ as an approximation parameter. This graph is parametrized by t. It is possible to reparametrize it by arclength to obtain a $C^m$ curve $\gamma _0$ whose tangents at 0 is $T_-$, at 1 is $T_+$, and whose curvature at 0 and 1 is $- \frac{r}{2a}$. Furthermore, $\gamma _0$ has strictly negative curvature whence item 5. Consider the affine transform: $x \mapsto 2a(x - 1/2)$, $y \mapsto 2a y$. This results in a $C^m$ curve $\gamma $, parametrized by arclength which satisfies the desired assumptions. $\square $

Lemma 6

(Normal approximations of polygons by smooth convex sets) Let $S=A_1...A_n$ be a convex polygon. For each i, let $V_i$ be in $ N_S(A_i)$ such that the angle between $V_i$ and each of the two neighboring faces is within $\left( \frac{\pi }{2},\pi \right) $. Then for any $\epsilon > 0$ and any $m \ge 2$, there exists a compact convex set $C \subset {\mathbb {R}}^2$ such that

(i)
the boundary of C is $C^m$ with non vanishing curvature,
(ii)
$S \subset C$,
(iii)
for any $i = 1,\ldots , n$, $A_i $ in $ \mathrm {bd}\,(C)$ and the normal cone to C at $A_i$ is given by $V_i$,
(iv)
$\max _{y \in C} \mathrm {dist}(y,S) \le \epsilon $.

Proof

We assume without loss of generality that $A_1,\ldots ,A_n$ are ordered clockwise. For $i = 1, \ldots , n-1$, and each segment $[A_i,A_{i+1}]$, we may perform a rotation and a translation to obtain $A_i = -(a,0)$ and $A_{i+1} = (a,0)$. Working in this coordinate system, using the angle condition on $V_i$, we may choose $v^-_i$, $v^+_{i+1}$ satisfying the hypotheses of Lemma 5 respectively orthogonal to $V_i$ and $V_{i+1}$. Choosing $r = 1$, we obtain $\gamma _i :[0,M_i] \mapsto {\mathbb {R}}^2$ as given by Lemma 5. Rotation and translations affect only the direction of the derivatives of curves, not their length. Hence, it is possible to concatenate curves $\left( \gamma _i \right) _{i=1}^{n-1}$ and to preserve the $C^m$ properties of the resulting curve. At end-points, tangents and second order derivatives coincide while higher derivatives vanish. Furthermore the curvature has constant sign and does not vanish. We obtain a closed $C^m$ curve which defines a convex set which satisfies all the requirements of the lemma. $\square $

Remark 5

(Bissector) Given any polygon, choosing normal vectors as given by the direction of the bissector of each angles ensure that the above assumptions are satisfied. Hence all our approximation results hold given polygon without specifying the choice of outer normals.

4.2 Smooth convex interpolants of polygonal sequences

Definition 1

(Interpolability) For $n\ge 3$, let $A_1\ldots A_n $ be a convex polygon S and $V_i $ be in $ N_S(V_i)$ for $i=1,\ldots ,n$. We say that $\left( A_i,V_i \right) _{i=1}^n $ is interpolable if for each $i = 1,\ldots ,n$, the angle between $V_i$ and each of the two neighboring faces of the polygon is in $\left( \frac{\pi }{2},\pi \right) $. The collection $\left( A_i,V_i \right) _{i=1}^n $ is called a polygon-normal pair.

Let $I = {\mathbb {Z}}$ or $I = {\mathbb {N}}$. Let $\left( PN_i \right) _{i \in I}$ be a sequence of interpolable polygon-normal pairs. Setting for i in I, $PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} $ where $n_j $ is in $ {\mathbb {N}}$ and denoting by $T_i$ the polygon $A_{1,i}\ldots A_{n_i,i}$, we say that the sequence $\left( PN_i \right) _{i \in I}$ is strictly increasing if for all i in I, $T_i \subset \mathrm {int}(T_{i+1})$.

Let $\left( PN_i \right) _{i \in I}$ be a strictly increasing sequence of interpolable polygon-normal pairs. A sequence $(\epsilon _i)_{i \in I}$ in (0, 1) is said to be admissible if $0 \in \mathrm {int}(T_i)$ for each i in I and

$$\begin{aligned} \gamma T_i\subset \mathrm {int}\,T_{i+1} \end{aligned}$$

for all $\gamma \in [1-\epsilon _i,1+\epsilon _i].$ We have the following corollary of Theorem 2.

Corollary 1

(Smooth convex interpolation of polygon sequences) Let $I = {\mathbb {Z}}$ or $I = {\mathbb {N}}$. Let $\left( PN_i \right) _{i \in I}$ be a strictly increasing sequence of interpolable polygon-normal pairs and $(\epsilon _i)_{i \in I}$ be admissible. Set $\displaystyle {\mathcal {T}}:=\mathrm {int}\left( \cup _{i\in I}T_i \right) .$

Then for any k in $ {\mathbb {N}}$, $k \ge 2$ there exists a $C^k$ convex function $f :{\mathcal {T}} \mapsto {\mathbb {R}}$, and an increasing sequence $(\lambda _i)_{i \in I}$, with $\inf _{i \in I} \lambda _i > -\infty $, such that for each i in I

(i)
$T_i \subset \left\{ x,\,f(x) \le \lambda _i \right\} $.
(ii)
$\mathrm {dist}(T_i, \left\{ x,\,f(x) \le \lambda _i \right\} ) \le \epsilon _i$.
(iii)
For each i in I, j in $\{1,\ldots ,n_i\}$, we have $f(A_{i,j}) = \lambda _i$ and $\nabla f(x)$ is colinear to $V_{i,j}$.
(iv)
$\nabla ^2 f$ is positive definite outside ${{\,\mathrm{argmin}\,}}f$. When there is a unique minimizer then $\nabla ^2f$ is positive definite throughout ${\mathcal {T}}$ (this is the case when $I = {\mathbb {N}}$ or when $I={\mathbb {Z}}$ and $\cap _{i \in I} T_i$ is a singleton).

We add two remarks which will be useful for directional convergence issues and the construction of Legendre functions:

(a)
If two consecutive elements of the sequence of interpolable polygon-normal pairs are homothetic with center 0 in the interior of both polytopes, then the restriction of the resulting convex function to this convex ring can be constructed such that all the sublevel sets within this ring are homothetic with the same center.
(b)
If further conditions are imposed on the elements of a strictly increasing interpolable polygon-normal pair, then the resulting function can be constructed to be Legendre and globally Lipschitz continuous (that is, its Legendre conjugate has bounded support). This is a consequence of Proposition 2 and will be detailled in the next section.

4.3 More on Legendre functions and a pathological function with polyhedral domain

Using intensively polygonal interpolation, we build below a finite continuous Legendre function h on an $\ell ^\infty $ square with oscillating “mirror lines": $t\rightarrow \nabla h^*(\nabla h(x_0)+tc)$.

We start with the following preparation proposition related to the Legendre interpolation of Proposition 2.

Lemma 7

Let $\left( PN_i \right) _{i \in {\mathbb {N}}^*}$ be a strictly increasing sequence of interpolable polygon-normal pairs. Setting for i in ${\mathbb {N}}^*$, $PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} $ where $n_j$ is in ${\mathbb {N}}^*$ and denoting by $T_i$ the polygon $A_{1,i}\ldots A_{n_i,i}$, we assume that

$$\begin{aligned} T_i=3i P,\,\forall i \in {\mathbb {N}}^*, \end{aligned}$$

where P is a fixed polygon which contains the unit Euclidean disk.

Then for any l in ${\mathbb {N}}$, $l \ge 2$, there exists a strictly increasing sequence of sets $\left( S_i \right) _{i \in {\mathbb {N}}, \,i\ge 2}$, such that for $j \ge 1$,

$S_{3j}$ interpolates the normals of $PN_j$ in the sense of Lemma 6 with $\mathrm {dist}(S_{3j}, T_j) \le 1 / (4(3j+2)^l)$
$S_{3j - 1} = \frac{3j - 1}{3_j} S_{3j}$
$S_{3j + 1} = \frac{3j + 1}{3_j} S_{3j}$

This sequence has the following properties

there exists $c>0$, such that for all j in ${\mathbb {N}}$, $j \ge 3$ and for all unit vector x,
$$\begin{aligned} c\ge \sigma _{S_{j+1}}(x) - \sigma _{S_j}(x) \ge 1 - \frac{1}{(j+1)^l}. \end{aligned}$$
(26)
for all unit vector x,
$$\begin{aligned} \left| \frac{\sigma _{S_{j+1}}(x) - \sigma _{S_j}(x)}{ \sigma _{S_{j}}(x) - \sigma _{S_{j-1}}(x)} -1\right| \le \frac{1}{j^l},\,\forall j\ge 3. \end{aligned}$$
(27)

Proof

Set for all j in ${\mathbb {N}}^*$, $\delta _j = \frac{1}{4(3j+2)^l}$ and let $S_{3j}$ be the $\delta _j$ interpolant of $T_j=3jP$ as given by Lemma 6 so that $\mathrm {dist}(S_{3j}, 3jP) \le \delta _j$. Since P contains the unit ball,

$$\begin{aligned} 3j P \subset S_{3j} \subset (3j + \delta _j) P. \end{aligned}$$

(28)

Now set

$$\begin{aligned} S_{3j - 1}&= \frac{3j - 1}{3_j} S_{3j}\\ S_{3j + 1}&= \frac{3j + 1}{3_j} S_{3j}. \end{aligned}$$

For any j in ${\mathbb {N}}^*$, it is clear that $S_{3j-1} \subset \mathrm {int}(S_{3j})$ and $S_{3j} \subset \mathrm {int}(S_{3j+1})$. Furthermore, by (28), we have

$$\begin{aligned} S_{3j+1} \subset \frac{3j+1}{3j} ( 3j + \delta _j) P \subset ((3j+1) + 2\delta _j) P \subset \mathrm {int}((3j+2) P) \subset \mathrm {int} (S_{3j+2}) \end{aligned}$$

so that we indeed have a strictly increasing sequence of sets. We obtain from the construction, for any j in ${\mathbb {N}}^*$, and any unit vector x,

$$\begin{aligned}&\sigma _{S_{3j}}(x) - \sigma _{S_{3j-1}}(x) =\sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)\nonumber \\&\quad = \frac{1}{3j} \sigma _{S_{3j}}(x) \in \left[ \sigma _P(x), \left( 1 + \frac{\delta _j}{3j} \right) \sigma _P(x) \right] \subset \left[ 1, \left( 1+\frac{\delta _j}{3j} \right) \sigma _P(x) \right] \nonumber \\&\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x) \le \sigma _P(x) (3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - \sigma _P(x)(3j+1)\nonumber \\&\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x) \ge \sigma _P(x) (3j +2) - \sigma _P(x)(3j+1)\left( 1 + \frac{\delta _j}{3j} \right) \nonumber \\&\quad = \sigma _P(x) \left( 1 - \delta _j \frac{3j+1}{3j} \right) \nonumber \\&\quad \ge 1 - \delta _j \frac{4}{3} \ge 1 - \frac{1}{(3j+2)^l} \end{aligned}$$

(29)

This proves (26). We deduce that for all j in ${\mathbb {N}}^*,$

$$\begin{aligned}&\frac{\sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)}{ \sigma _{S_{3j}}(x) - \sigma _{S_{3j-1}}(x)} = 1, \text{ for } \text{ all } \text{ nonzero } \text{ vector } x, \\&\max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}{ \sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)} \le (3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - (3j+1)\\&\quad = 1 + \frac{3j+2}{3j+3} \delta _{j+1} \le 1 + \frac{1}{(3j +1)^l},\\&\min _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}{ \sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)} \underset{(29)}{\ge } \frac{1 - \delta _j \frac{3j+1}{3j} }{\left( 1 + \frac{\delta _j}{3j} \right) }\\&\quad \ge \left( 1 - \delta _j \frac{3j+1}{3j} \right) \left( 1 - \frac{\delta _j}{3j} \right) \\&\quad \ge 1 - \delta _j\left( \frac{3j+1}{3j} + \frac{1}{3j} \right) \ge 1 - \delta _j \frac{5}{3} \ge 1 - \frac{1}{(3j+1)^l}. \end{aligned}$$

Furthermore, using the fact that $t \mapsto \frac{1+t}{1-t}$ is increasing on $(-\infty ,1)$ and the fact that $\delta _{j+1} \le \delta _j$,

$$\begin{aligned} \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\underset{(29)}{\le } \frac{1 + \frac{\delta _{j+1}}{3j + 3}}{1 - (3j+1) \frac{\delta _j}{3j}} \nonumber \\&\le \frac{1 + (3j+1)\frac{\delta _{j}}{3j}}{1 - (3j+1) \frac{\delta _j}{3j}} \nonumber \\&\le \frac{1 + \delta _j\frac{4}{3}}{1 - \delta _j \frac{4}{3}}. \end{aligned}$$

(30)

Setting $s(t)= (1+t)/(1-t)$, we have, for all $t \le 1/2$

$$\begin{aligned} s'(t)&= \frac{2}{(1-t)^2}, \, s(0)=1\\ s''(t)&= \frac{4}{(1-t)^3} \le 24,\, s'(0)=2. \end{aligned}$$

Thus $s(t) \le 1 + 2 t + 12 t^2$ on $(-\infty ,1/2]$. Since $\frac{4}{3} \delta _j \le \frac{4}{75} \le \frac{1}{2}$, we deduce from the previous remark and (30) above:

$$\begin{aligned} \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\le 1 + \frac{8}{3} \delta _j + \frac{64}{3} \delta _j^2 = 1 + \delta _j\left( \frac{8}{3} + \frac{64}{3} \delta _j \right) \\&\le 1 + \delta _j\left( 3 + \frac{64}{3 \times 25}\right) \le 1 + 4 \delta _j = 1 + \frac{1}{(3j+2)^l}. \end{aligned}$$

Finally using (29) again,

$$\begin{aligned} \min _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\ge \frac{1 }{(3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - (3j+1) } \\&= \frac{1 }{1 + \delta _{j+1}\frac{3j+2}{3j+3} } \\&\ge 1 - \delta _{j+1}\frac{3j+2}{3j+3} \ge 1 - \delta _{j+1} \ge 1 - \frac{1}{(3j+2)^l}. \end{aligned}$$

This proves the desired result. $\square $

Combining Lemma 7 and Proposition 2, we obtain the following result.

Theorem 3

Let $\left( PN_i \right) _{i \in {\mathbb {N}}^*}$ be a strictly increasing sequence of interpolable polygon-normal pairs. Set for i in ${\mathbb {N}}^*$, $PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} $ where $n_i$ is in ${\mathbb {N}}^*$, denote by $T_i$ the polygon $A_{1,i}\ldots A_{n_i,i}$, and assume that for all i in $ {\mathbb {N}}^*$, $T_i=3i P$ where P is a fixed polygon which contains the unit disk in its interior.

Then for any k in $ {\mathbb {N}}$, $k \ge 2$ and all $l \ge 3$, there exists a $C^k$ globally Lipschitz continuous Legendre function, $h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$, and an increasing sequence $(\lambda _i)_{i \in {\mathbb {N}}}$, such that for each i in $ {\mathbb {N}}$:

$T_i \subset \left\{ x,\,h(x) \le \lambda _i \right\} $,
$\mathrm {dist}(T_i, \left\{ x,\,h(x) \le \lambda _i \right\} ) \le \frac{1}{4(3i+2)^l}$,
For any x with $h(x) = \lambda _i$ and $\nabla h(x)$ is colinear to $V_i$ for each vertex x of $T_i$,
h has positive definite Hessian and is globally Lipschitz continuous,
$h^*$ has compact domain and is $C^k$ on the interior of its domain.

Corollary 2

(Continuity on the domain) The function $h^*$ constructed in Theorem 3 has compact polygonal domain and is continuous on this domain.

Proof

Since P is a polygon and contains the unit Euclidean disk, the gauge function of 3P is polyhedral with full domain ${\mathbb {R}}^2$, call it $\omega $. Denote by $P^{\circ }$ the polar of P. This is a polytope and since $\omega $ is the gauge of P, we actually have $\omega = \sigma _{P^{\circ }}$, the support function of the polar of P [35, Theorem 1.7.6]. Hence the the convex conjugate of $\omega $ is the indicator of the polytope $P^\circ $ [37, Theorem 13.2].

It can be easily seen from the proof of Proposition 2 that $\lambda _i = \alpha i + r_i$ with $r(i) = O(1)$ as $i \rightarrow \infty $. Without loss of generality, we may suppose that $\alpha = 1$ (this is a simple rescaling) so that there is a positive constant c such that $|\lambda _i - i| \le c$ for all i.

Let h be given as in Theorem 3, fix $i \ge 1$ and $x \in {\mathbb {R}}^2$ such that $\lambda _{i-1} \le h(x) \le \lambda _i$. We have in ${\mathbb {R}}^2$

$$\begin{aligned} \{y:\, \omega (y) \le i-1 \} \subset \{y:\, h(y) \le \lambda _{i-1} \} \subset \{y:\, h(y) \le \lambda _{i} \} \subset \{y:\, \omega (y) \le i+1 \} \end{aligned}$$

and hence

$$\begin{aligned} i - 1 \le \omega (x) \le i+1 \end{aligned}$$

and we deduce that

$$\begin{aligned} \omega (x) - 2 - c \le i - 1 -c \le \lambda _{i-1} \le h(x) \le \lambda _i \le i+c \le \omega (x) + c + 1. \end{aligned}$$

Since i was arbitrary, this shows that there exists a constant $C>0$ such that $|h(x) - \omega (x)|\le C$ for all $x \in {\mathbb {R}}^2$. Recall that $z \mapsto \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y)$ is the indicator function of $P^\circ $, hence,

$$\begin{aligned} z \in P^{\circ }&\Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y) = 0 \Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - h(y) \le C < +\infty \\ z \not \in P^{\circ }&\Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y) = +\infty \Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - h(y) = +\infty \end{aligned}$$

which shows that the domain of $h^*$ is actually $P^{\circ }$ which is a polytope. Now, $h^*$ is convex and lower semicontinuous on $P^{\circ }$, invoking the results of [23], it is also upper semicontinuous on $B^*$ and finally it is continuous on $B^*$. $\square $

Corollary 3

(A pathological Legendre function) For any $\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) $ there exists a Legendre function $h:{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ whose domain is a closed square, continuous on this domain and $C^k$ on its interior, such that for all $i \in {\mathbb {N}}^*$, $\nabla h^*(i, 0)$ is proportional to $(\cos (\theta ), (-1)^i \sin (\theta ))$.

Proof

For $x=(u,v)$, set $\Vert x\Vert _1=|u|+|v|$, and let $P = \left\{ x \in {\mathbb {R}}^2,\, \Vert x\Vert _1 \le 2 \right\} $. Let us construct a strictly increasing sequence of interpolable polygon-normal pairs $\left( PN_i \right) _{i \in {\mathbb {N}}^*}$ as follows, we fix $\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) $ and set for all $i\in {\mathbb {N}}^*$ :

$T_i = 3i P$, the polygon associated to the i-th term $PN_i$ of the sequence,
except at the rightmost corner, consider the normals given by the canonical basis vectors and their opposite,
at the rightmost corner, (6i, 0), one chooses the normal given by the vector
$$\begin{aligned} (\cos (\theta ), (-1)^i \sin (\theta )). \end{aligned}$$

We now invoke Theorem 3 to obtain a Lipschitz continuous Legendre function, denoted $h^*$, with full domain having all the $T_i$ as sublevel sets and satisfying the hypotheses of the corollary. Rescaling by a factor 6 and setting $h = h^{**}$ gives the result. $\square $

5 Counterexamples in continuous optimization

We are now in position to apply our interpolation results to build counterexamples to classical problems in convex optimization. We worked on situations ranging from structural questions to qualitative behavior of algorithms and ODEs. Through 9 counterexamples we tried to cover a large spectrum but there are many more possibilities that are left for future research. Some example are constructed from decreasing sequences of convex sets, they can be interpolated using Theorem 2 with $I = {\mathbb {Z}}$, indexing the sequence with negative indices and adding artificially additional sets for positive indices. Nonetheless we sometimes depart from the notations of the first sections and index these sequences by ${\mathbb {N}}$ even though they are decreasing for simplification purposes.

5.1 Kurdyka-Łojasiewicz inequality may not hold

The following result is proved in [12], it was crucial to construct a $C^2$ convex function which does not satisfy Kurdyka–Łojasiewicz (KL) inequality.

Lemma 8

[12, Lemma 35] There exists a decreasing sequence of compact convex sets $\left( T_i \right) _{i\in {\mathbb {N}}}$ such that for any i in $ {\mathbb {N}}$, $T_- = T_{i+1}$ and $T_+ = T_i$ satisfy Assumption 1 and

$$\begin{aligned} \sum _{i=0}^{+\infty } \mathrm {dist}(T_i, T_{i+1}) = +\infty \end{aligned}$$

As a corollary, we improve the counterexample in [12] and provide a $C^k$ convex counterexamples for any $k\ge 2 $ in $ {\mathbb {N}}$.

Corollary 4

(Smooth convex functions are not KL in general) There exists a $C^k$ convex function $f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ which does not satisfy KL inequality. More precisely, for any $r > \inf f$ and $\varphi :[\inf f, r] \mapsto {\mathbb {R}}$ continuous and differentiable on $(\inf f, r)$ with $\varphi ' > 0$ and $\varphi (\inf f) = 0$, we have

$$\begin{aligned} \inf \{\Vert \nabla (\varphi \circ f)(x) \Vert : \, x\in {\mathbb {R}}^2, \,\inf f< f(x) < r \} = 0. \end{aligned}$$

Proof

Using [35, Theorem 1.8.13], each $T_i$ can be approximated up to arbitrary precision by a polygon. Hence we may assume that all $T_i$ are polygonal while preserving the property of Lemma 8 as well as Assumption 1. Furthermore, using Lemma 6 and Remark 5 each $T_i$ can in turn be approximated with arbitrary precision by a convex set with $C^k$ boundary and positive curvature. Hence we may also assume that all $T_i$ satisfy both the result of Lemma 8 and have $C^k$ boundary with nonvanishing curvature. Reversing the order of the sets and adding additional sets artificially, we are in the conditions of application of Theorem 2 with $I = {\mathbb {Z}}$ and the resulting f follows from the same argument as in [12, Theorem 36]. $\square $

5.2 Block coordinate descent may not converge

The following polygonal construction is illustrated in Fig. 5. For any $n\ge 2$ in ${\mathbb {N}}$, we set

$$\begin{aligned} A_n&= \left( \frac{1}{4} + \frac{1}{n}, \frac{1}{4} + \frac{1}{n}\right) \\ B_n&= \left( \frac{1}{4} + \frac{1}{2( n-1)} + \frac{1}{2n}, \frac{1}{4} + \frac{1}{2n} + \frac{1}{2(n+1)} \right) \\ C_n&= \left( \frac{1}{4} + \frac{1}{n}, - \frac{1}{4} - \frac{1}{n}\right) \\ D_n&= \left( -\frac{1}{4} - \frac{1}{n}, - \frac{1}{4} - \frac{1}{n}\right) \\ E_n&= \left( -\frac{1}{4} - \frac{1}{n}, + \frac{1}{4} + \frac{1}{n}\right) . \end{aligned}$$

This defines a convex polygon. We may choose the normals at $A_n, C_n, D_n, E_n$ to be bisectors of the corresponding corners and the normal at $B_n$ to be horizontal (see Fig. 5). Rotating by an angle of $-\frac{n\pi }{2}$ and repeating the process indefinitely, we obtain the sequence of polygons depicted in Fig. 5. It can be checked that the polygons form a strictly decreasing sequence of sets, as for $n>1$, the polygon $A_nB_nC_nD_nE_n$ is contained in the interior of the square $A_{n-1}C_{n-1}D_{n-1}E_{n-1}$. This fulfills the requirement of Corollary 1.

Corollary 5

There exists a $C^k$ convex function $f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ and an initialization $x_0=(u_0,v_0)$ such that the recursion, for $i \ge 1$

$$\begin{aligned} u_{i+1}&\in \mathop {\hbox {argmin}}\limits _{u} f(u,v_i) \\ v_{i+1}&\in \mathop {\hbox {argmin}}\limits _{v} f(u_{i+1},v) \end{aligned}$$

produces a non converging sequence $\displaystyle (x_i)_{i\in {\mathbb {N}}}=\left( (u_i,v_i)\right) _{i\in {\mathbb {N}}}$.

Proof

We apply Corollary 1 to the proposed decreasing sequence and by choosing $(u_0,v_0) = B_2$ for example. This requires to shift indices (start with $i = 2$) and use Theorem 2 with $I = {\mathbb {Z}}$. Note that the optimality condition for partial minimization and the fact that level sets have nonvanishing curvature ensure that the partial minima are unique. $\square $

In the nonsmooth convex case cyclic minimization is known to fail to provide the infimum value, see e.g., [3, p. 94]. Smoothness is sufficient for establishing value convergence (see e.g. [10, 39] and references therein), whether it is enough or not for obtaining convergence was an open question. Our counterexample closes this question and shows that cyclic minimization does not yield converging sequences even for $C^k$ convex functions. This result also closes the question for the more general nonconvex case for which we are not aware of a nontrivial counterexample for convergence of alternating minimization. Let us mention however Powell’s example [34] which shows that cyclic minimization with three blocks does not converge for smooth functions.

It would also be interesting to understand how our result may impact dual methods and counterexamples in that field, as for instance the recent three blocks counterexample in [17].

5.3 Gradient descent with exact line search may not converge

Gradient descent with exact line search is governed by the recursion:

$$\begin{aligned} x^+\in {{\,\mathrm{argmin}\,}}\left\{ f(y):y=x-t\nabla f(x),\,t\in {\mathbb {R}}\right\} , \end{aligned}$$

where x is a point in the plane.

Observe that the step coincides with partial minimization when the gradient $\nabla f(x)$ is colinear to one of the axis of the canonical basis. From the previous section, we thus deduce the following.

Corollary 6

(Failure of gradient descent with exact line search) There exists a $C^k$ convex function $f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ and an initialization $z_0$ in the plane such that the recursion, for $i \ge 1$

$$\begin{aligned} x_{i+1} \in {{\,\mathrm{argmin}\,}}\left\{ f(y): y=x_i-t\nabla f(x_i),\, t\in {\mathbb {R}}\right\} \end{aligned}$$

produces a well defined non converging sequence $\left( x_i \right) _{i \in {\mathbb {N}}}$.

Convergence failure for gradient descent with exact line search is new up to our knowledge. Let us mention that despite non convergence, the constructed sequence satisfy sublinear convergence rates in function values [10].

5.4 Tikhonov regularization path may have infinite length

Following [36], we consider for any $r > 0$

$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ f(x) + r \Vert x - \bar{x}\Vert _2^2:x\in {\mathbb {R}}^2\right\} \end{aligned}$$

(31)

where f is $C^k$ convex and where ${\bar{x}}$ is any anchor point. We would like to show that the curve $r \mapsto x(r)$ may have infinite length. Torralba provided a counterexample in his PhD Thesis for continuous convex functions, see [36]. This work extends his result to smooth $C^k$ convex functions in ${\mathbb {R}}^2$.

For any n in $ {\mathbb {N}}^*$, we set

$$\begin{aligned} A_n&= \left( \frac{2}{n}, \frac{2}{n}\right) \\ B_n&= \left( \frac{2}{n} + \frac{1}{n^2}, -\frac{1}{n} \right) \\ C_n&= \left( \frac{2}{n}, - \frac{2}{n}\right) \\ D_n&= \left( - \frac{2}{n}, - \frac{2}{n}\right) \\ E_n&= \left( - \frac{2}{n}, \frac{2}{n}\right) . \end{aligned}$$

This is depicted in Fig. 6. For all $n \ge 1$, denote by $M_n$ the point on the x axis above $B_n$ and $N_n$, the intersection of the normal cone at $B_n$ and the x axis. We have

$$\begin{aligned} \frac{M_nN_n}{M_nB_n} = n \times M_nN_n= \frac{A'_nB_n}{A_nA'_n} = \frac{3/n}{1/n^2} = 3 n, \end{aligned}$$

so that for all $n \ge 1$, $M_nN_n = 3$ and $N_n = (3 + 2/n + 1 / n^2, 0)$. Choosing $P = (7,0)$, since for $n \ge 1$, $3 + 2/n + 1 / n^2 \le 6 < 7$ , this shows that P constantly belongs to the interior of the normal cone at $B_n$ for all $n \ge 1$. The sequence of level sets is constructed as in Fig. 6 by considering alternating symmetries with respect to the x-axis of the sequence of polygons above. It can be checked that the polygons form a strictly decreasing sequence of sets, as for $n>1$, the polygon $A_nB_nC_nD_nE_n$ is contained in the interior of the square $A_{n-1}C_{n-1}D_{n-1}E_{n-1}$. We choose the normal at $A_n, C_n, D_n, E_n$ to belong to the bisector at the corner and the normal at $B_n$ to be proportional to the vector $B_n P$. Applying Corollary 1, we construct f and choose $\bar{x} = P$ in (31) to obtain the following:

Corollary 7

(A bounded infinite length Tikhonov path) There exists a $C^k$ strictly convex function $f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ and $\bar{x} \in {\mathbb {R}}^2$ such that the curve x((0, 1)) given by (31) has infinite length.

Proof

We apply Corollary 1 with $I = {\mathbb {Z}}$ and revert the indices set to match the sequence that we have described. For any $n \ge 1$ there exists a value of $\lambda _n$ such that $f(B_n)= \lambda _n$ and $\nabla f(B_n)$ is colinear to the vector $B_n P$. Set

$$\begin{aligned} r = \frac{\Vert \nabla f(B_n)\Vert }{2 B_nP} \end{aligned}$$

we have $\nabla f(B_n) + 2r(B_n - P) = 0$ which is the optimality condition in (31) with $\bar{x} = P$. Hence we have shown that there exists a value of r such that $B_n$ is the solution to (31). Since n was arbitrary this is true for all n and the curve $r \mapsto x(r)$ has to go through a sequence of points whose second coordinate is of the form $(-1)^n/n$ for all $n \ge 1$. Since this sequence is not absolutely summable, the curve has infinite length. $\square $

This result is in contrast with the definable case for which we have finite length by the monotonicity lemma, since the whole trajectory is definable and bounded.

5.5 Secants of gradient curves at infinity may not converge

Thom’s gradient conjecture and Kurdyka–Mostowski–Parusinski’s theorem A theorem of Łojasiewicz [27] asserts that bounded solutions to the gradient system

$$\begin{aligned} {\dot{x}}(t) = - \nabla f(x(t)) \end{aligned}$$

converge when f is a real analytic potential. Thom conjectured in [38] that this convergence should occur in a stronger form: trajectories converging to a given $\bar{x}$ should admit a tangent at infinity, that is

$$\begin{aligned} \frac{x(t) - \bar{x}}{\Vert x(t) - \bar{x}\Vert } \end{aligned}$$

(32)

should have a limit as $t \rightarrow \infty $. Lines passing through ${\bar{x}}$ and having (32) as a slope are called secants of x at ${\bar{x}}$. This conjecture was proved to be true in [26]. In the convex world, it is well known that solutions to the gradient system converge for general potentials (this is a Féjer monotonicity argument due to Bruck); see also the original approaches by Manselli and Pucci [30] and Daniilidis et al. [19]. It is then natural to wonder whether this convergence satisfies higher order rigidity properties as in the analytic case. The answer turns out to be negative in general yielding a quite mysterious phase portrait.

Absence of tangential convergence for convex potentials

The construction given in this paragraph is more complex than the previous ones, we start with a technical lemma which will be the basic building block for our counterexample.

Lemma 9

Let S be a convex set with $C^k$ boundary interpolating ABCDEF in Fig. 7 and let g be the gauge function associated to S. The function g is differentiable outside the origin. Consider any initialization $x_0 $ in [BC] with corresponding trajectory solution to the equation

$$\begin{aligned} {\dot{x}}(t)&= - \nabla g(x(t)), \, t\ge 0,\\ x(0)&= x_0. \end{aligned}$$

Set $\bar{t} = \sup _{x(t) \in OBC} t$, we have $\bar{t} < + \infty $ and $x(\bar{t}) $ in $ [CC']$.

Proof

The fact that g is differentiable comes from the fact that its subgradient is uniquely determined by the normal cone to S (which has dimension one because of the smoothness of the boundary of S). Since S is interpolating the polygon, we have $g(B) = g(C) = 1$. Furthermore, we have for all $t\ge 0$, $\frac{d}{dt} g(x(t)) = -\Vert \nabla g(x(t))\Vert ^2= -1$, thence $\bar{t} \le 1 - g(C')$. By homogeneity, for any $x \ne 0$ and $s > 0$, $\nabla g (sx) = \nabla g(x)$. For any x in [BC], by convexity

$$\begin{aligned} 0 \le \left\langle C-x,\nabla g(C) - \nabla g(x) \right\rangle = \left\langle C-B,\nabla g(C) - \nabla g(x) \right\rangle \frac{\Vert C-x\Vert }{\Vert C-B\Vert }, \end{aligned}$$

and therefore

$$\begin{aligned} -\left\langle C - B, \nabla g(C) \le -\left\langle C - B, \nabla g(x) \right\rangle \right\rangle \end{aligned}$$

(33)

By homogeneity of g, (33) is true for any x in the triangle OCB (different from 0) and thus in the triangle $C'CB$ . Denote by y the solution to the equation

$$\begin{aligned} {\dot{y}}&= - \nabla g(C)\\ y(0)&= B, \end{aligned}$$

which integrates to $y(t) = B - t w$ for all t. Equation (33) ensures that for any $0\le t \le \bar{t}$

$$\begin{aligned} \frac{d}{dt} (\left\langle C - B, x(t) \right\rangle )&\ge \frac{d}{dt} (\left\langle C - B, y(t) \right\rangle ) \end{aligned}$$

Hence, we have for any $0\le t \le \bar{t}$, integrating on [0, t]

$$\begin{aligned} \left\langle C - B, x(t) \right\rangle&\ge \left\langle C - B, y(t) \right\rangle + \left\langle C - B, x_0 - B \right\rangle \nonumber \\&\ge \left\langle C - B, y(t) \right\rangle . \end{aligned}$$

(34)

Furthermore, for all x in [BC], we have

$$\begin{aligned} 1 = \Vert \nabla g(x)\Vert ^2 = \frac{1}{\Vert C - B\Vert ^2}\left\langle C - B, \nabla g(x) \right\rangle ^2 + \left\langle v, \nabla g(x) \right\rangle ^2, \end{aligned}$$

because v is orthogonal to $C-B$. The first term is maximal for $x = C$ and thus the second term is minimal for $x = C$, we have thus for all x in [BC]

$$\begin{aligned} 0 < \left\langle \nabla g(C), v \right\rangle = \left\langle - \nabla g(C), -v \right\rangle \le \left\langle \nabla g(x), v \right\rangle = \left\langle -\nabla g(x), -v \right\rangle \le 1. \end{aligned}$$

(35)

Equation (35) holds for all x in OCB different from O by homogeneity. We deduce that for all $0 \le t \le \bar{t}$, we have

$$\begin{aligned} \frac{d}{dt} (\left\langle -v, x(t) \right\rangle )&\ge \frac{d}{dt} (\left\langle -v, y(t) \right\rangle ) \end{aligned}$$

and by integration

$$\begin{aligned} \left\langle -v, x(t) \right\rangle&\ge \left\langle -v, y(t) \right\rangle + \left\langle -v, x_0 - B \right\rangle \nonumber \\&= \left\langle -v, y(t) \right\rangle . \end{aligned}$$

(36)

Hence, in the coordinate system $(C - B, -v)$, which is orthogonal, for all t in $ [0,\bar{t}]$, x(t) has larger coordinates than y(t).

The trajectory y(t), of equation $t \mapsto B - t w$ is the line going from B to $C'$. From Eqs. (34) and (36), we may write for all t in $ [0,\bar{t}]$, $x(t) = y(t) + \alpha (t) (C-B) + \beta (t) (-v)$ where $\alpha $ and $\beta $ are positive functions. Since y(t) belongs to the line $(BC')$, this shows that x(t) has to be above this line for all $t \ge 0$, $t\le \bar{t}$ and actually, $x(\bar{t}) $ in $ BCC'$. Hence at time $\bar{t}$, we have $x(\bar{t}) $ in $ [CC']$. This holds true because $x(\bar{t})$ is on the boundary of OCB and on the boundary of $BCC'$. Hence either $x(\bar{t}) $ in $ [CC']$, either $x(\bar{t}) $ in [BC]. Equation (35) ensures that if $x(\bar{t}) $ in [BC] then $x(\bar{t}) = C$ which concludes the proof. $\square $

Corollary 8

(Secants of gradient curves may all fail to converge) There exists a $C^k$ strictly convex function on ${\mathbb {R}}^2$ with a unique minimizer $\bar{x}$, such that any nonconstant solution to the gradient flow equation

$$\begin{aligned} {\dot{x}}(t) = -\nabla f(x(t)) \end{aligned}$$

is such that

$$\begin{aligned} \frac{x(t) - \bar{x}}{\Vert x(t) - \bar{x}\Vert } \end{aligned}$$

does not have a limit as $t \rightarrow \infty $.

The function f has a positive definite Hessian everywhere except at 0.

Proof

We assume without loss of generality that $\bar{x} = O$ is the origin. Writting $x(t) = (r(t),\theta (t))$ in polar coordinate, we will construct a function f such that each solution to the ODE produces nonconverging trajectories $\theta (t)$.

We start with an interpolating set $S_0 = ABCDE$ as in Lemma 9 and let $S_1 = A'B'C'D'E'$ be its scaled version as described in Fig. 7.

Let $\alpha $ be the value of the angle ${\widehat{BOC}}$ and $m = \left\lceil \frac{2 \pi }{\alpha } \right\rceil + 1$. We have

$$\begin{aligned} \frac{2\pi }{m} < \alpha . \end{aligned}$$

To obtain $S_{2}$, we rotate $S_0$ by an angle $2\pi / m$, we denote $S'_0$ the resulting set. We rescale $S'_0$ by a factor $\beta $ in (0, 1) so that $\beta S'_0$ lies in the interior of $S_1$. Call the resulting set $S_2$ and $S_3$ is obtained from $S_2$ exactly the same way as $S_1$ is obtained from $S_0$. We repeat the same process indefinitely to obtain a strictly decreasing sequence of $C^k$ sets. Note that for any k in $ {\mathbb {N}}$, $S_{2km}$ and $S_{2km + 1}$ are homothetic to $S_0$.

We now invoke Corollary 1 (with $I = {\mathbb {Z}}$ and revert the indices) to obtain a $C^k$ function f with those prescribed level sets. Using Remark 3 it turns out that the level sets of f between $S_0$ and $S_1$ are simple scalings of $S_0$. Hence the gradient curves of f and those of the gauge function of S are the same between $S_0$ and $S_1$, up to time reparametrization.

Using Lemma 9 any trajectory crossing [BC] in Fig. 7, must also be crossing $[CC']$ and leave the triangle BOC in finite time. The same statement holds after scaling the level sets and since for all k in $ {\mathbb {N}}$, $S_{2km}$ and $S_{2km+1}$ are homothetic to $S_0$, this shows that no solution stays indefinitely in the triangle BOC.

Lemma 9 still holds after rotations and by our construction, for any triangle T obtained by rotating BOC by a multiple of $2\pi /m$, no trajectory stays indefinitely within T. Since $2\pi / m < \alpha $, the union of these triangles U contains O in its interior.

Note first that any gradient curve converges to ${\bar{x}}$. Let us argue by contradiction and assume that there exists a continuous gradient curve $t \mapsto z(t)$ distinct from the stationary solution $\bar{x}$, such that

$$\begin{aligned} \frac{z(t) - \bar{x}}{\Vert z(t) - \bar{x}\Vert } \end{aligned}$$

converges. This exactly means that the angle $\theta (t)$ of the curve has a limit in $[0,2\pi )$ as t goes to infinity. There is a rotation of BOC by a multiple of $2\pi /m$ whose interior intersects the half line given by the direction $\theta $, call this triangle T. The directional convergence entails that there exists $t_0 \ge 0$ such that z(t) belongs to T for all $t \ge t_0$. Hence z can not be a gradient curve. To complete the proof, we may add disks of increasing size to the list of sets to obtain a full domain function and invoke Theorem 2 with $I = {\mathbb {Z}}$. $\square $

5.6 Newton’s flow may not converge

Given a twice differentiable convex function f, we define the open set $\Omega :=\{x\in {\mathbb {R}}^2:\nabla ^2f \text{ is } \text{ invertible }\}$ and we consider maximal solutions to the differential equation

$$\begin{aligned} {\dot{x}}(t) = - \nabla ^2 f(x(t))^{-1} \nabla f(x(t)), \end{aligned}$$

(37)

on $\Omega $. This is the continuous counterpart of Newton’s method, it has been studied in [5] and [2]. Let $x_0$ be in $\Omega $, there exists a unique maximal nontrivial interval I containing 0 and a unique solution x to (37) on I with $x(0) = x_0$. Equation (37) may be rewritten as

$$\begin{aligned} \frac{d}{dt} \nabla f(x(t)) = - \nabla f(x(t)) \end{aligned}$$

and thus for all t in I, we have

$$\begin{aligned} \nabla f(x(t)) = e^{-t} \nabla f(x_0). \end{aligned}$$

(38)

If we could ensure that $I = {\mathbb {R}}$ and f has oscillating gradients close to its minimum, then (38) entails that the direction of the gradient is constant along the solution, which requires oscillations in space to compensate for gradient oscillations.

Corollary 9

(A bounded Newton’s curve that oscillates at infinity) For any $k\ge 2$, there exists a $C^k$ convex coercive function $f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}$ and an initial condition $x_0 $ in $ {\mathbb {R}}^2$ such that the solution to (37) is bounded, defined on ${\mathbb {R}}$ and has at least two distinct accumulation points.

The counterexample is sketched in Fig. 8, the construction is the same as for Corollary 5 but instead of doing quarter rotations, we use symmetry with respect to the first axis. We can then call for Corollary 1 to construct the function f and equation (38) ensures that the solution interval is unbounded.

5.7 Bregman descent (mirror descent) may not converge

The mirror descent algorithm was introduced in [31] as an efficient method to solve constrained convex problems. In [9], this method is shown to be equivalent to a projected subgradient method, using non-Euclidean projections. It plays an important role for some categories of constrained optimization problem; see e.g., [6] for recent developments and [20] for a surprising example.

Let us recall beforehand some definitions. Given a Legendre function h with domain $\mathrm {dom}\,h$, define the Bregman distance^{Footnote 4} associated to h as $D_h(u,v)=h(u)-h(v)-\langle \nabla h(v),u-v\rangle $ where u is in $ \mathrm {dom}\,h$ and v is in the interior of $\mathrm {dom}\,h$.

Given a smooth convex function f that we wish to minimize on $\overline{ \mathrm {dom}\,h}$, we consider the Bregman method

$$\begin{aligned} x_{i+1}={{\,\mathrm{argmin}\,}}\left\{ \langle \nabla f(x_i),u-x_i\rangle +\lambda D_h(u,x_i):u\in {\mathbb {R}}^2\right\} , \end{aligned}$$

where $x_0$ is in $\mathrm {int}\,\mathrm {dom}\,h$ and $\lambda >0$ is a step size. When the above iteration is well defined, e.g. when $\mathrm {dom}\,h$ is bounded, it writes:

$$\begin{aligned} x_{i+1}=\nabla h^*\left( \nabla h(x_i)-\lambda \nabla f(x_i)\right) . \end{aligned}$$

In [6] the authors identified a generalized smoothness condition which confers good minimizing properties to the above method:

$$\begin{aligned}&Lh-f \text{ convex }, \end{aligned}$$

(39)

$$\begin{aligned}&\lambda \in (0,L). \end{aligned}$$

(40)

The corollary below shows that such an algorithm may not converge, even though we assume the cost to satisfy (39), the step to satisfy (40), and the Legendre function to have a compact domain.

Corollary 10

(Bregman or mirror descent may not converge) There exists a Legendre function $h:D \mapsto {\mathbb {R}}$, defined on a closed square D, continuous on D, a vector c in $ {\mathbb {R}}^2$, and $x_0$ in ${\mathbb {R}}^2$ such that the Bregman descent recursion

$$\begin{aligned} x_{i+1} = \nabla h^*\left( \nabla h(x_i) - c \right) , \end{aligned}$$

produces a bounded sequence $(x_i)_{i \in {\mathbb {N}}}$ which has at least two distinct accumulation points.

Proof

We fix $\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) $, $\theta \ne 0$, and consider h constructed in Corollary 3 and choose $c = (-1,0)$. In this case the Bregman descent recursion writes for all i in ${\mathbb {N}}$,

$$\begin{aligned} \nabla h (x_{i+1}) - \nabla h(x_i) = -c \end{aligned}$$

so that we actually have $\nabla h (x_{i}) - \nabla h(x_{0}) = \nabla h (x_{i}) = - i c$ and thus

$$\begin{aligned} x_i = \nabla h^*(-ic) = \nabla h^*(i,0). \end{aligned}$$

By Corollary 3, we have for all $i \in {\mathbb {N}}$ that $\nabla h^*(i,0)$ proportional to $(\cos (\theta ), (-1)^i \sin (\theta ))$. Since the norm of the gradient of $h^*$ cannot vanish at infinitiy (no flat direction) and is bounded, this proves that the sequence $(x_i)_{i \in {\mathbb {N}}}$ has at least two accumulation points which is the desired result. $\square $

5.8 Central paths of Legendre barriers may not converge

Consider the problem

$$\begin{aligned} \min _{x \in D} \left\langle c, x\right\rangle \end{aligned}$$

(41)

where D is a subset of ${\mathbb {R}}^2$ and c. Given a Legendre function h on D, we introduce the h central path through

$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ \left\langle c, x\right\rangle + r h(x):x\in {\mathbb {R}}^2\right\} \end{aligned}$$

(42)

where $r>0$ is meant to tend to 0. Central paths are one of the essential tools behind interior point methods, see e.g., [4, 33] and references therein.

Note that the accumulation points of x(r) as $r \rightarrow 0$, have to be in the the solution set of (41). It is even tempting to think that the convergence of the path to some specific minimizer could occur, as it is the case for many barriers, see e.g. [4]. We have however:

Corollary 11

(A nonconverging central path) There exists a Legendre function $h:D \mapsto {\mathbb {R}}$, defined on a closed square D, continuous on D, a vector c in $ {\mathbb {R}}^2$, such that the h central path $r \mapsto x(r)$ has two distinct accumulation points.

Proof

The optimality condition which characterizes x(r) for any $r > 0$ writes,

$$\begin{aligned} x(r)= \nabla h^*\left( \frac{c}{r} \right) , \end{aligned}$$

and the construction is the same as in Corollary 10. $\square $

5.9 Hessian Riemannian gradient dynamics may not converge

The construction of this paragraph is similar to the two previous paragraphs. Consider a $C^k$ ($k\ge 2$) Legendre function $h :D \mapsto {\mathbb {R}}$ and the continuous time dynamics

$$\begin{aligned} {\dot{x}}(t) = - \nabla _H f(x(t)), \, t\ge 0, \end{aligned}$$

(43)

where $H = \nabla ^2 h$ is the Hessian of h and $\nabla _H f = H^{-1} \nabla f$ is the gradient of some differentiable function f in the Riemannian metric induced by H on $\mathrm {int}\,D$. Such dynamics were considered in [1, 14].

We have the following result:

Corollary 12

(Nonconverging Hessian Riemannian gradient dynamics) There exists a Legendre function $h:D \mapsto {\mathbb {R}}$, defined on a closed square D, continuous on D, a vector c in $ {\mathbb {R}}^2$, and $x_0$ in ${\mathbb {R}}^2$ such that the solution to (43) with $f=\langle c,\cdot \rangle $ has two distinct accumulation points.

Proof

Equation (43) may be rewritten

$$\begin{aligned} \frac{d}{dt} \nabla h(x(t))= - \nabla f(x(t)), \end{aligned}$$

so choosing $c = (-1,0)$, we have for all $t \in {\mathbb {R}}$, $\nabla h(x(t)) = \nabla h(x(0)) + (t,0) = (t,0)$ and the construction is the same as in Corollary 10. $\square $

Notes

In the sense of sets inclusion the sequence being indexed on ${\mathbb {N}}$ or ${\mathbb {Z}}$.
See Theorem 2 for the full version.
By structural, we include homotopic deformations by mere summation.
It is actually not a proper distance.

References

Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control. Optim. 43(2), 477–501 (2004)
Article MathSciNet Google Scholar
Alvarez, D.F., Pérez, C.J.M.: A dynamical system associated with Newton’s method for parametric approximations of convex minimization problems. Appl. Math. Optim. 38, 193–217 (1998)
Auslender, A.: Optimisation Méthodes Numériques. Masson, Paris (1976)
MATH Google Scholar
Auslender, A.: Penalty and barrier methods: a unified framework. SIAM J. Optim. 10(1), 211–230 (1999)
Article MathSciNet Google Scholar
Aubin, J.-P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, Berlin (1984)
Book Google Scholar
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)
Article MathSciNet Google Scholar
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408. Springer, New York (2011)
Book Google Scholar
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Book Google Scholar
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3), 167–175 (2003)
Article MathSciNet Google Scholar
Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
Article MathSciNet Google Scholar
Bertsekas, D.P., Scientific, A.: Convex Optimization Algorithms. Athena Scientific, Belmont (2015)
Google Scholar
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Article Google Scholar
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Article MathSciNet Google Scholar
Bolte, J., Teboulle, M.: Barrier operators and associated gradient-like dynamical systems for constrained minimization problems. SIAM J. Control Optim. 42(4), 1266–1292 (2003)
Article MathSciNet Google Scholar
Borwein, J.M., Li, G., Yao, L.: Analysis of the convergence rate for the cyclic projection algorithm applied to basic semialgebraic convex sets. SIAM J. Optim. 24(1), 498–527 (2014)
Article MathSciNet Google Scholar
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Book Google Scholar
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Article MathSciNet Google Scholar
Crouzeix, J.-P.: Conditions for convexity of quasiconvex functions. Math. Oper. Res. 5(1), 120–125 (1980)
Article MathSciNet Google Scholar
Daniilidis, A., Ley, O., Sabourau, S.: Asymptotic behaviour of self-contracted planar curves and gradient orbits of convex functions. Journal de matéhmatiques pures et appliquées 94(2), 183–199 (2010)
Article MathSciNet Google Scholar
Dragomir, R.A., Taylor, A., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods (2019). arXiv preprint arXiv:1911.08510
Fenchel, W.: Convex Cones, Sets and Functions, Mimeographed Lecture Note. Princeton University, Princeton (1951)
Google Scholar
de Finetti, B.: Sulle stratificazioni convesse. Ann. Mat. 30(1), 173–183 (1949)
Article MathSciNet Google Scholar
Gale, D., Klee, V., Rockafellar, R.T.: Convex functions on convex polytopes. Proc. Am. Math. Soc. 19(4), 867–873 (1968)
Article MathSciNet Google Scholar
Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)
Article MathSciNet Google Scholar
Kannai, Y.: Concavifiability and constructions of concave utility functions. J. Math. Econ. 4(1), 1–56 (1977)
Article MathSciNet Google Scholar
Kurdyka, K., Mostowski, T., Parusinski, A.: Proof of the gradient conjecture of R. Thom. Ann. Math. 152(3), 763–792 (2000)
Article MathSciNet Google Scholar
Łojasiewicz, S.: Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria, Bologna (1982/83). Universita’ degli Studi di Bologna, Bologna 1984, 115–117 (1984)
Lorentz, G.G.: Bernstein Polynomials. American Mathematical Society, Providence (1954)
MATH Google Scholar
Ma, T.W.: Higher chain formula proved by combinatorics. Electron. J. Comb. 16(1), N21 (2009)
Article MathSciNet Google Scholar
Manselli, P., Pucci, C.: Maximum length of steepest descent curves for quasi-convex functions. Geom. Dedicata 38(2), 211–227 (1991)
Article MathSciNet Google Scholar
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, New York (1983)
Google Scholar
Nesterov, Y.: Lectures on Convex Optimization, vol. 137. Springer, Berlin (2003)
MATH Google Scholar
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming, vol. 13. SIAM, Philadelphia (1994)
Book Google Scholar
Powell, M.J.: On search directions for minimization algorithms. Math. Program. 4(1), 193–201 (1973)
Article MathSciNet Google Scholar
Schneider, R.: Convex Bodies: The Brunn–Minkowski Theory, vol. 151. Cambridge University Press, Cambridge (1993)
Book Google Scholar
Torralba, D.: Convergence épigraphique et changements d’échelle en analyse variationnelle et optimisation. Ph.D. Thesis, Université Montpellier 2 (1996)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Book Google Scholar
Thom, R.: Problèmes rencontrés dans mon parcours mathématique : un bilan. Publications mathématiques de l’IHES 70, 199–214 (1989)
Article Google Scholar
Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)
Article MathSciNet Google Scholar

Download references

Acknowledgements

The authors acknowledge the support of AI Interdisciplinary Institute ANITI funding, through the French “Investing for the Future – PIA3” program under the Grant agreement $\mathrm{n}^{\circ }$ANR-19-PI3A-0004, Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant numbers FA9550-19-1-7026, FA9550-18-1-0226, and ANR MasDol. J. Bolte acknowledges the support of ANR Chess, grant ANR-17-EURE-0010, TSE-P and ANR OMS.

Author information

Authors and Affiliations

Toulouse School of Economics, Université Toulouse 1 Capitole, Toulouse, France
Jérôme Bolte
IRIT, ANITI, CNRS. DEEL, IRT Saint Exupery, Université de Toulouse, Toulouse, France
Edouard Pauwels

Authors

Jérôme Bolte
View author publications
You can also search for this author in PubMed Google Scholar
Edouard Pauwels
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jérôme Bolte.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 10

(Smooth concave interpolation: in between square root and affine) There exists a $C^\infty $ strictly increasing concave function $\phi :[0,1] \mapsto [0,1]$ such that

$$\begin{aligned} \phi (t)&= \sqrt{2t/3} \quad \forall t \le 1/6\\ \phi (1)&= 1 \\ \phi '(1)&= 2/3\\ \phi ^{(m)}(1)&= 0, \quad \forall m \ge 2 \end{aligned}$$

Proof

Consider a $C^\infty $ function $g_0 :{\mathbb {R}}\mapsto [0,1]$ such that $g_0 = 1$ on $(-\infty ,-1)$, $g_0 = 0$ on $(1, +\infty )$ (for example convoluting the step function with a smooth bump function). Set $g(t) = \frac{1}{2}\left( g_0(t) + 1 - g_0(-t) \right) $ we have that g is $C^\infty $, $g = 1$ on $(-\infty ,-1)$, $g = 0$ on $(1, +\infty )$ and $g(t) + g(-t) = 1$ for all t. We have

$$\begin{aligned} \int _{-1}^1 g(s) ds = 1\\ \int _{-1}^1 \left( \int _{-1}^t g(s)ds \right) dt = 1 \end{aligned}$$

Set $\phi _0 :[-3,3] \mapsto {\mathbb {R}}$, such that

$$\begin{aligned} \phi _0(t) = \int _{-3}^t \left( \int _{-3}^r g(s) ds\right) dr. \end{aligned}$$

For all r in $ [-3,3]$, we have

$$\begin{aligned} \int _{-3}^r g(s) ds = {\left\{ \begin{array}{ll} r+3 &{} \text { if } r \le -1\\ 2 + \int _{-1}^r g(s)ds &{} \text { if } -1 \le r \le 1\\ 3 &{} \text { if } r \ge 1 \end{array}\right. } \end{aligned}$$

and thus

$$\begin{aligned} \phi _0(t) = {\left\{ \begin{array}{ll} \frac{t^2}{2} - 9/2 + 3(t+3) &{} \text { if } t \le -1\\ 2 + 2(t + 1) + \int _{-1}^t \left( \int _{-1}^r g(s)ds \right) dr&{} \text { if } -1 \le t \le 1\\ 6 + 3(t-1) &{} \text { if } 1 \ge t \end{array}\right. } \end{aligned}$$

and in particular $\phi _0(3) = 12$ and $\phi _0'(3) = 3$. Set $\phi _1(s) = \phi _0(6 s -3)/12$.

$$\begin{aligned} \phi _1(0)&= 0\\ \phi _1(t)&= \left( \frac{(6t-3)^2}{2} - 9/2 + 2(3t )\right) /12 = 3 t^2 / 2 = \text { if } t \le 1/3\\ \phi _1(1)&= 1\\ \phi _1'(1)&= 3/2. \end{aligned}$$

$\phi _1$ is stricly increasing, let $\phi :[0,1] \mapsto [0,1]$ denote the inverse of $\phi _1$, we have

$$\begin{aligned} \phi (1)&= 1\\ \phi '(1)&= 2 / 3\\ \phi (t)&= \sqrt{2t/3} \text { if } t\le 1/6. \end{aligned}$$

$\square $

Lemma 11

(Interpolation inside a sublevel set) Consider any strictly increasing $C^k$ function $\phi :(0,2) \mapsto {\mathbb {R}}$ such that $\phi (1) = 1$ and $\phi ^{(m)}(1) = 0$, $m = 2,\ldots k$. Then the function

$$\begin{aligned} G :(0,2) \times {\mathbb {R}}/ 2 \pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \\ (s,\theta )&\mapsto \phi (s) n(\theta ) \end{aligned}$$

is diffeomorphism which satisfies for any $m=1 \ldots ,k$ and $l =2,\ldots , k$,

$$\begin{aligned}&\frac{\partial ^m G}{\partial \theta ^m}(1,\theta ) = n^{(m)}(\theta )\\&\frac{\partial ^{m+1} G}{\partial \lambda \partial \theta ^m} (1,\theta ) = \phi '(1)n^{(m)}(\theta ) \\&\frac{\partial ^{l+m} G}{\partial \lambda ^l \partial \theta ^m }(\lambda _-,\theta ) = 0. \end{aligned}$$

Lemma 12

Combinatorial Arbogast-Faà di Bruno Formula (from [29]) Let $g :{\mathbb {R}}\mapsto {\mathbb {R}}$ and $f :{\mathbb {R}}^p \mapsto [0, +\infty )$ be $C^k$ functions. Then we have for any $m \le k$ and any indices $i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} $.

$$\begin{aligned} \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} g \circ f(x) = \sum _{\pi \in {\mathcal {P}}} g^{(|\pi |)}(f(x)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l \in B}\partial x_{i_l}}(x), \end{aligned}$$

where ${\mathcal {P}}$ denotes all partitions of $\left\{ 1,\ldots , m \right\} $, the product is over subsets of $\left\{ 1,\ldots ,m \right\} $ given by the partition $\pi $ and $|\cdot |$ denotes the number of elements of a set. We rewrite this as follows

$$\begin{aligned} \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} g \circ f(x) = \sum _{k = 1}^m\sum _{\pi \in {\mathcal {P}}_k} g^{(k)}(f(x)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l=1}^{m}\partial x_{i_l}}(x), \end{aligned}$$

where ${\mathcal {P}}_k$ denotes all partitions of size k of $\left\{ 1,\ldots , m \right\} $.

Lemma 13

From [12, Lemma 45] Let h in $ C^0\left( (0,r_0],{\mathbb {R}}_+^* \right) $ be an increasing function. Then there exists a function $\psi $ in $ C^\infty ({\mathbb {R}},{\mathbb {R}}_+)$ such that $\psi = 0$ on, ${\mathbb {R}}_-$ and $0 < \psi (s) \le h(s)$ for any s in $(0,r_0]$ and $\psi $ is increasing on ${\mathbb {R}}$

Lemma 14

(High-order smoothing near the solution set) Let $D \subset {\mathbb {R}}^p$ be a nonempty compact convex set and $f :D \mapsto {\mathbb {R}}$ convex, continuous on D and $C^k$ on $D {\setminus } {{\,\mathrm{argmin}\,}}_{D} f$. Assume further that ${{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)$, $k \ge 1$, with $\min _D f = 0$. Then there exists $\phi :{\mathbb {R}}\mapsto {\mathbb {R}}_+$, $C^k$, convex and increasing with positive derivative on $(0,+\infty )$, such that $\phi \circ f$ is convex and $C^k$ on D.

Proof

By a simple translation, we may assume that $\min _D f = 0$ and $\max _D f = 1$. Any convex function is locally Lipschitz continuous on the interior of its domain so that f is globally Lipschitz continuous on D and its gradient is bounded. Hence, $f^2$ is $C^1$ and convex on D. We now proceed by recursion. For any $m =1,\ldots , k$, we let $Q_m$ denote the m-order tensor of partial derivatives of order m. Fix m in $\{1,\ldots ,k\}$. Assume that f is $C^m$ throughout D while it is $C^{m+1}$ on $D {\setminus } \arg \min _D f$. Note that all the derivatives up to order m are bounded. We wish to prove that f is globally $C^{m+1}$.

Consider the increasing function

$$\begin{aligned} h :(0,1]&\mapsto {\mathbb {R}}_+^*\\ s&\mapsto \frac{s}{1 + \sup _{s \le f(x) \le 1}\Vert Q_{m+1}(x)\Vert _{\infty }} \end{aligned}$$

and set $\psi $ as in Lemma 13. Recall that $\psi $ is $C^\infty $ and all its derivative vanish at 0 and $\psi \le h$ on (0, 1]. Let $\phi $ denote the anti-derivative of $\psi $ such that $\phi (0) = 0$. $\phi $ is $C^\infty $ and convex increasing on ${\mathbb {R}}$ and, since its derivatives at 0 vanish as well, one has, for any q in $ {\mathbb {N}}$, $\phi ^{(q)}(z) = o(z)$. Consider the function $\phi \circ f$. It is $C^m$ on D and it has bounded derivatives up to order m. Furthermore, it is $C^{m+1}$ on $D {\setminus } {{\,\mathrm{argmin}\,}}_D f$. Let $\bar{y} $ in $ {{\,\mathrm{argmin}\,}}_D f$. If $\bar{y} $ in $ \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)$, then f and $\phi \,\circ f$ have derivatives of all order vanishing at $\bar{y}$. Assuming that $\bar{y} $ in $ {{\,\mathrm{argmin}\,}}_D f{\setminus } \mathrm {int}({{\,\mathrm{argmin}\,}}_D f)$. By the induction assumption and Lemma 12, we have for any indices $i_1,\ldots ,i_m \in \left\{ 1,\ldots , p \right\} $ and any h in $ {\mathbb {R}}^p$:

$$\begin{aligned}&\frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} (\phi \circ f)(\bar{y} + z) - \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}} (\phi \circ f)(\bar{y}) \\&\quad =\; \frac{\partial ^m}{\prod _{l=1}^{m}\partial x_{i_l}}( \phi \circ f)(\bar{y} + z) \\&\quad =\;\sum _{q = 1}^{m}\sum _{\pi \in {\mathcal {P}}_q} \phi ^{(q)}(f(\bar{y} + z)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l=1}^{m}\partial x_{i_l}}(\bar{y} + z). \end{aligned}$$

All the derivatives of f are of order less or equal to m and thus remain bounded as $z \rightarrow 0$. Further more f is Lipschitz continuous on D so that $f(\bar{y} + z) = O(\Vert z\Vert )$ near 0, and, for any q in $ {\mathbb {N}}$, $\phi ^{(q)}(f(\bar{y} + z)) = o(\Vert z\Vert )$. Hence $\phi \circ f$ has derivative of order $m+1$ at $\bar{y}$ and it is 0.

Since ${{\,\mathrm{argmin}\,}}_D f \subset \mathrm {int}(D)$, we may consider any sequence of point $(y_{j})_{j \in {\mathbb {N}}}$ in $D {\setminus } {{\,\mathrm{argmin}\,}}_D f$ converging to $\bar{y}$. By Lemma 12, we have for any indices $i_1,\ldots ,i_{m+1} \in \left\{ 1,\ldots , p \right\} $, and any j in $ {\mathbb {N}}$,

$$\begin{aligned} \frac{\partial ^{(m+1)}}{\prod _{l=1}^{m+1}\partial x_{i_l}} (\phi \circ f)(y_j)&= \phi '(f(y_j)) \frac{\partial ^{(m+1)} f}{\prod _{l=1}^{m}\partial x_{i_l}}(y_j) \\&\quad + \sum _{q = 2}^{m+1}\sum _{\pi \in \Pi _q} \phi ^{(q)}(f(y_j)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l=1}^{m}\partial x_{i_l}}(x)\\&\le h(f(y_j))\frac{\partial ^{(m+1)} f}{\prod _{l=1}^{m}\partial x_{i_l}}(y_j) \\&\quad + \sum _{q = 2}^{m+1}\sum _{\pi \in \Pi _q} \phi ^{(q)}(f(y_j)) \prod _{B \in \pi } \frac{\partial ^{|B|} f}{\prod _{l=1}^{m}\partial x_{i_l}}(x)\\&= f(y_j) \frac{\frac{\partial ^{(m+1)} f}{\prod _{l=1}^{m}\partial x_{i_l}}(y_j)}{1 + \sup _{f(y_j) \le f(x) \le 1}\Vert Q_{m+1}(x)\Vert _{\infty }} + O(f(y_j))\\&= O(f(y_j)), \end{aligned}$$

where the inequality follows from the construction of $\phi $. The third step follows using the definition of h and the fact that, for any $q \ge 2$,

1.
Each partition of $\left\{ 1,\ldots ,m+1 \right\} $ of size q contains subsets of size at most m. Thus in the product, the terms $\partial ^{|B|} f$ correspond to bounded derivatives of f by the induction hypothesis.
2.
$\phi ^{(q)}(a) = o(a)$ as $a \rightarrow 0$.

The last step stems from the fact that the ratio has asbolute value less than 1. This shows that the derivatives of order $m+1$ of $\phi \circ f$ are decreasing to 0 as $j \rightarrow \infty $ and $\phi \circ f$ is actually $C^{m+1}$ and convex on D. The result follows by induction up to $m = k$ and by the fact that a composition of increasing convex functions is increasing and convex. $\square $

Lemma 15

Let $p :{\mathbb {R}}_+ \mapsto {\mathbb {R}}_+$ be concave increasing and $C^1$ with $p' \ge c$ for some $c > 0$. Assume that there exists $ A > 0$ such that for all x in $ {\mathbb {R}}_+$

$$\begin{aligned} p(x) - x p'(x) \le A. \end{aligned}$$

Then setting $a= A/c$, we have for all $x \ge a$,

$$\begin{aligned} p(x-a) - x p'(x-a) \le 0 \end{aligned}$$

Proof

For all $x \ge a$, we have

$$\begin{aligned} f(x-a) - (x-a)f'(x-a) \le A, \end{aligned}$$

hence

$$\begin{aligned} f(x-a) - xf'(x-a) \le A - af'(x-a) \le A - ac = 0. \end{aligned}$$

$\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bolte, J., Pauwels, E. Curiosities and counterexamples in smooth convex optimization. Math. Program. 195, 553–603 (2022). https://doi.org/10.1007/s10107-021-01707-1

Download citation

Received: 21 January 2020
Accepted: 10 August 2021
Published: 09 October 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10107-021-01707-1

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Curiosities and counterexamples in smooth convex optimization

Abstract

Similar content being viewed by others

The Equivalence of Three Types of Error Bounds for Weakly and Approximately Convex Functions

Faster subgradient methods for functions with Hölderian growth

Interpolatory pointwise estimates for convex polynomial approximation

1 Introduction

1.1 Questions and method

Theorem

1.2 A digest of counterexamples

2 Preliminaries

2.1 Continuous convex interpolations

Theorem 1

2.2 Bernstein approximation

Lemma 1

Proof

Remark 1

3 Smooth convex interpolation

3.1 Smooth parametrization of convex rings

Assumption 1

Lemma 2

Proof

Proposition 1

Remark 2

Proof

3.2 Smooth convex interpolation of smooth positively curved convex sequences

Theorem 2

Proof

Remark 3

Remark 4

3.3 Considerations on Legendre functions

Lemma 3

Proof

Proposition 2

Proof

4 Smooth convex interpolation for sequences of polygons

4.1 Smooth approximations of polygons

Lemma 4

Proof

Lemma 5

Proof

Lemma 6

Proof

Remark 5

4.2 Smooth convex interpolants of polygonal sequences

Definition 1

Corollary 1

4.3 More on Legendre functions and a pathological function with polyhedral domain

Lemma 7

Proof

Theorem 3

Corollary 2

Proof

Corollary 3

Proof

5 Counterexamples in continuous optimization

5.1 Kurdyka-Łojasiewicz inequality may not hold

Lemma 8

Corollary 4

Proof

5.2 Block coordinate descent may not converge

Corollary 5

Proof

5.3 Gradient descent with exact line search may not converge

Corollary 6

5.4 Tikhonov regularization path may have infinite length

Corollary 7

Proof

5.5 Secants of gradient curves at infinity may not converge

Lemma 9

Proof

Corollary 8

Proof

5.6 Newton’s flow may not converge

Corollary 9

5.7 Bregman descent (mirror descent) may not converge

Corollary 10

Proof

5.8 Central paths of Legendre barriers may not converge

Corollary 11