1 Introduction

1.1 Questions and method

One of the goals of convex optimization is to provide solutions to problems through stable and fast algorithms. The quality of a method is generally assessed by the convergence of its iterates, rate estimates, complexity bounds, finite length of relevant quantities, and other quantitative or qualitative means.

Positive results in this direction are numerous and have been the object of intense research for decades. To name but a few: gradient methods e.g., [16, 31, 32], proximal methods e.g., [7], alternating methods e.g., [8, 39], path following methods e.g., [4, 33], Tikhonov regularization e.g. [24], semi-algebraic optimization e.g., [13, 15], decomposition methods e.g., [7, 8], augmented Lagrangian methods e.g., [11] and many others.

Despite this vast literature, some simple questions remain unanswered or just partly tackled, even for smooth convex coercive functions. Does the alternating minimization method, aka Gauss–Seidel method, converge? Does the steepest descent method with exact line search converge? Do mirror descent or Bregman methods converge? Does Newton’s flow converge? Do central paths converge? Is the gradient flow directionally stable? Do smooth convex functions have the Kurdyka–Łojasiewicz property?

In this article we provide a negative answer to all these questions.

Our work draws inspiration from the early works of de Finetti [22] and Fenchel [21] on convex interpolation, but also from Torralba’s PhD thesis [36] and the more recent [12], where some counterexamples on the Tikhonov path and Łojasiewicz inequalities are provided. The convex interpolation problem, see [22], is as follows: given a monotone sequence of convex sets, may we find a convex function interpolating these sets, i.e., having them as sublevel sets? Answers to this question for continuous convex functions were provided by de Finetti, improved by Fenchel [21] and Kannai [25], and then used in [12, 36] for building counterexamples.

We improve on this work by providing, for arbitrary \(k\ge 2\), a general \(C^k\) interpolation theorem for positively curved convex sets, imposing at the same time positive definiteness of the Hessian outside the solution set. An abridged version is as follows.

Theorem

(Smooth convex interpolation) Let \(\left( T_i \right) _{i\in {\mathbb {Z}}}\) be a sequence of compact convex subsets of \({\mathbb {R}}^2\), with positively curved \(C^k\) boundary, such that \(T_i\subset \mathrm {int}\,T_{i+1}\) for all i in \({\mathbb {Z}}\). Then there exists a \(C^k\) convex function f having the \(T_i\) as sublevel sets, with positive definite Hessian outside of the set:

$$\begin{aligned} {{\,\mathrm{argmin}\,}}f=\bigcap _{i\in {\mathbb {Z}}} T_i. \end{aligned}$$

We provide several additional tools (derivative computations) and variants (status of the solution set, Legendre functions, Lipschitz continuity). Whether our result extends to general smooth convex sequences, i.e., with possibly vanishing curvature, seems to be a very delicate question whose answer might well be negative.

Our central theoretical result is complemented by a discrete approximate interpolation result “of order one”, which is particularly well adapted to building counterexamples. Given a nested collection of polygons, one can indeed build a smooth convex function whose level sets interpolate their vertices and whose gradients agree with prescribed normals.

Our results are obtained by blending parametrization techniques, Minkowski summation, Bernstein approximation and convex analysis.

As sketched below, our results make it possible to build counterexamples in convex optimization by restricting oneself to the construction of countable collections of nested convex sets with some desirable properties. In all cases, failures of good properties are caused by curvature oscillations.

1.2 A digest of counterexamples

Counterexamples provided in this article can be classified along three axes: structural counterexamples, counterexamples for convex optimization algorithms, and counterexamples for ordinary differential equations.

In the following, a “nonconverging” sequence or trajectory means a sequence or trajectory with at least two distinct accumulation points. Unless otherwise stated, convex functions have full domain.

The following results are proved for \(C^k\) convex functions on the plane with \(k\ge 2\).

Structural counterexamples:

(i)

    Kurdyka–Łojasiewicz: There exists a \(C^k\) convex function whose Hessian is positive definite outside its solution set and which does not satisfy the Kurdyka–Łojasiewicz inequality. This is an improvement on [12].

(ii)

    Tikhonov regularization path: There exists a \(C^k\) convex function f such that the regularization path

    $$\begin{aligned} x(r)= {{\,\mathrm{argmin}\,}}\left\{ f(y) + r \Vert y\Vert ^2:y\in {\mathbb {R}}^2\right\} , \,\,r\in (0,1) \end{aligned}$$

    has infinite length. This strengthens a theorem by Torralba [36].

(iii)

    Central path: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, and c in \({\mathbb {R}}^2\) such that the central path

$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ \left\langle c, y \right\rangle + r h(y):y\in [-1,1]^2\right\} \end{aligned}$$

    does not have a limit as \(r \rightarrow 0\).

Algorithmic counterexamples:

(iv)

    Gauss–Seidel method (block coordinate descent): There exists a \(C^k\) convex function with positive definite Hessian outside its solution set and an initialization \( (u_0,v_0)\) in \({\mathbb {R}}^2\), such that the alternating minimization algorithm

    $$\begin{aligned} u_{i+1}&= \mathop {\hbox {argmin}}\limits _{u \in {\mathbb {R}}} f(u, v_i) \\ v_{i+1}&= \mathop {\hbox {argmin}}\limits _{v \in {\mathbb {R}}}f(u_{i+1}, v) \end{aligned}$$

    produces a bounded nonconverging sequence \(((u_i,v_i))_{i\in {\mathbb {N}}}\).

(v)

    Gradient descent with exact line search: There exists a \(C^k\) convex function f with positive definite Hessian outside its solution set and an initialization \(x_0\) in \({\mathbb {R}}^2\), such that the gradient descent algorithm with exact line search

$$\begin{aligned} x_{i+1}&= \mathop {\hbox {argmin}}\limits _{y \in x_i + {\mathbb {R}}\, \nabla f(x_i)} f(y) \end{aligned}$$

    produces a bounded nonconvergent sequence.

(vi)

    Bregman or mirror descent method: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, a vector c in \({\mathbb {R}}^2\) and an initialization \(x_0\) in \((-1,1)^2\) such that the Bregman recursion

    $$\begin{aligned} x_{i+1} = \nabla h^*(\nabla h(x_i) - c) \end{aligned}$$

produces a nonconverging sequence. The couple \((h,\langle c,\cdot \rangle )\) satisfies the smoothness properties introduced in [6]. A toy numerical sketch of the three recursions (iv)–(vi) follows this list.
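For illustration, here is a minimal numerical sketch of the three recursions in (iv)–(vi). The quadratic objective, the Legendre function \(h = \frac{1}{2}\Vert \cdot \Vert ^2\), the vector c and the initializations are all illustrative assumptions, not the counterexamples constructed in this paper: on such benign data, (iv) and (v) converge, and (vi) reduces to a plain translation.

```python
import numpy as np

# (iv) Alternating minimization on the toy quadratic
# f(u, v) = u^2 - u*v + v^2 (illustrative; NOT the paper's counterexample).
# Both partial argmins are available in closed form here.
u, v = 1.0, -1.0
for _ in range(50):
    u = v / 2.0                      # argmin_u f(u, v)
    v = u / 2.0                      # argmin_v f(u, v)

# (v) Gradient descent with exact line search on f(x) = 0.5 x^T A x:
# minimizing t -> f(x - t g) gives t = (g^T g)/(g^T A g) for g = grad f(x).
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
x = np.array([1.0, -1.0])
for _ in range(50):
    g = A @ x
    t = (g @ g) / (g @ (A @ g))
    x = x - t * g

# (vi) Bregman step x_{i+1} = grad h*(grad h(x_i) - c) with the illustrative
# Legendre function h = 0.5 ||.||^2, for which grad h = grad h* = identity.
c = np.array([0.05, 0.02])
y = np.array([0.3, 0.4])
for _ in range(10):
    y = y - c

print(u, v, x, y)   # here (iv) and (v) converge to the minimizer 0
```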

Continuous time ODE counterexamples:

(vii)

    Continuous time Newton method: There exists a \(C^k\) convex function with positive definite Hessian outside its solution set, and an initialization \(x_0\) in \({\mathbb {R}}^2\) such that the continuous time Newton’s system

    $$\begin{aligned} {\dot{x}}(t)&= - \left[ \nabla ^2 f(x(t))\right] ^{-1} \nabla f(x(t)), \,\,t\ge 0,\\ x(0)&= x_0 \end{aligned}$$

has a solution approaching \({{\,\mathrm{argmin}\,}}f\) which does not converge (a forward-Euler sketch of this system, on a toy function, follows this list).

(viii)

Directional convergence for gradient curves: There exists a \(C^k\) convex function with 0 as a unique minimizer and a positive definite Hessian on \({\mathbb {R}}^2{\setminus }\{0\}\), such that for any nonstationary solution to the gradient system

    $$\begin{aligned} {\dot{x}}(t)&= - \nabla f(x(t)) \end{aligned}$$

    the direction \(x(t) / \Vert x(t)\Vert \) does not converge.

(ix)

    Hessian Riemannian gradient dynamics: There exists a continuous Legendre function \(h :[-1,1]^2 \mapsto {\mathbb {R}}\), \(C^k\) on the interior, a linear function f and a nonconvergent solution to the following system

    $$\begin{aligned} {\dot{x}}(t) = - \nabla _H f(x(t)), \end{aligned}$$

    where \(H = \nabla ^2 h\) is the Hessian of h and \(\nabla _H f = H^{-1} \nabla f\) is the gradient of f in the Riemannian metric induced by H on \((-1,1)^2\).
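As announced in (vii), here is a forward-Euler discretization of the continuous-time Newton system; the quadratic f, the step size and the horizon are illustrative assumptions, and on such a function the flow does converge, in contrast with the counterexample built in this paper.

```python
import numpy as np

# Forward-Euler discretization of the continuous-time Newton system (vii)
# on the toy quadratic f(x) = 0.5 x^T A x (an illustrative choice; the
# paper constructs f for which the trajectory approaches the solution set
# without converging).
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite Hessian

def grad(x):
    return A @ x

def hess(x):
    return A

x = np.array([1.0, -0.5])
dt = 1e-2                                # assumed step size
for _ in range(2000):
    x = x - dt * np.linalg.solve(hess(x), grad(x))

print(x)   # close to x0 * exp(-20): here the flow converges to argmin f = {0}
```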

Pathological sequences and curves. Our counterexamples lead to sequences or paths in \({\mathbb {R}}^2\) which are related to a function f by a certain property (see the examples above) and display a certain type of pathology. For illustration purposes, we provide sketches of the pathological behaviors we met in Fig. 1.

Fig. 1 Rough sketches of pathological behavior for curves in the plane; for sequences, similar figures would be obtained. Red colors indicate proximity to the solution set. Convergence with infinite length corresponds to counterexample (ii), finite length jiggling corresponds to counterexample (viii) (recall that gradient curves also have to be self-contracted, see [19]), nonconvergent jiggling corresponds to counterexamples (iii), (vi), (vii), (ix), and nonconvergent spiraling corresponds to counterexamples (iv), (v) and, in some sense, (i) (color figure online)

2 Preliminaries

Let us set \({\mathbb {N}}^*={\mathbb {N}}{\setminus }\{0\}\). For p in \({\mathbb {N}}^*\), the Euclidean scalar product in \({\mathbb {R}}^p\) is denoted by \(\langle \cdot ,\cdot \rangle \) and, unless otherwise stated, the associated norm is denoted by \(\Vert \cdot \Vert \). Given subsets S, T of \({\mathbb {R}}^p\) and x in \({\mathbb {R}}^p\), we define

$$\begin{aligned} \mathrm {dist}(x,S):=\inf \left\{ \Vert x-y\Vert :y\in S\right\} , \end{aligned}$$

and the Hausdorff distance between S and T,

$$\begin{aligned} \mathrm {dist}(S,T)=\max \left( \sup _{x\in S}\mathrm {dist}(x,T),\sup _{x\in T}\mathrm {dist}(x,S)\right) . \end{aligned}$$
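For illustration, the two displayed quantities can be sketched on finite samples of S and T; the point sets below are arbitrary illustrative choices.

```python
import numpy as np

# Discrete sketch of dist(x, S) and of the Hausdorff distance, evaluated
# on finite samples of two compact sets (the point sets are illustrative).
def dist_point_set(x, S):
    return min(np.linalg.norm(x - y) for y in S)

def hausdorff(S, T):
    return max(max(dist_point_set(x, T) for x in S),
               max(dist_point_set(y, S) for y in T))

S = np.array([[0.0, 0.0], [1.0, 0.0]])
T = np.array([[0.0, 0.5], [1.0, 0.5], [2.0, 0.5]])
print(hausdorff(S, T))   # 1.118..., attained at the point (2, 0.5)
```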

Throughout this note, the assertion “g is \(C^k\) on D” where D is not an open set is to be understood as “g is \(C^k\) on an open neighborhood of D”. Given a map \(G :X \mapsto A \times B\) for some space X, \([G]_1 :X \mapsto A\) denotes the first component of G.

2.1 Continuous convex interpolations

We consider a sequence \(\left( T_i \right) _{i \in {\mathbb {Z}}}\) of compact convex subsets of \({\mathbb {R}}^p\) such that \(T_{i} \subset \mathrm {int}\ T_{i+1}\). Finding a continuous convex interpolation of \(\left( T_i \right) _{i \in {\mathbb {Z}}}\) means finding a continuous convex function having the \(T_i\) as sublevel sets. We call this process continuous convex interpolation. This question was already present in Fenchel [21] and dates back to de Finetti [22]; let us also mention the work of Crouzeix [18] revolving around this issue.

Such constructions have been shown to be realizable by Torralba [36] and Kannai [25], using ideas based on Minkowski sums. The validity of this construction can easily be proved using the result of Crouzeix [18], which was already present in different and weaker forms in the works of de Finetti and Fenchel.

Theorem 1

(de Finetti–Fenchel–Crouzeix) Let \(f :{\mathbb {R}}^p \rightarrow {\mathbb {R}}\) be a quasiconvex function. The functions

$$\begin{aligned} F_x :\lambda \mapsto \sup \left\{ \langle z,x\rangle : f(z)\le \lambda \right\} \end{aligned}$$

are concave for all x in \({\mathbb {R}}^p\) if and only if f is convex.

Our goal is to build smooth convex interpolants for sequences of smooth convex sets. To make such a construction we shall use nonlinear Minkowski interpolation between level sets.

We shall also rely on Bernstein approximation which we now describe.

2.2 Bernstein approximation

We refer to the monograph [28] by Lorentz.

The main properties of Bernstein polynomials to be used in this paper are the following:

  • Bernstein approximation is linear in its functional argument f and “monotone”, which allows one to construct an approximation using only positive combinations of finitely many function values.

  • There are precise formulas for the derivatives of Bernstein approximations. They involve repeated finite differences, so approximating a piecewise affine function with high enough degree leads to an approximation whose corner values of derivatives are controlled while the remaining derivatives vanish (up to a given order).

  • Bernstein approximation is shape preserving, which means in particular that approximating a concave function preserves concavity.

The main idea for producing a smooth interpolation which preserves level sets is depicted in Fig. 2, where we use Bernstein approximation to interpolate smoothly between three points while controlling the successive derivatives at the end points of the interpolation.

Fig. 2 Illustration of Bernstein’s smooth interpolation. We consider at first three points, to which we add four extra points to control derivatives at the junctions. We then build a concave, piecewise linear function and make a Bernstein approximation between the first two original points and the last two original points

Let us now be specific. Given f defined on the interval [0, 1], the Bernstein polynomial of order \(d \in {\mathbb {N}}^*\) associated to f is given by

$$\begin{aligned} B_d(x) = B_{d,f}(x) = \sum _{k=0}^d f\left( \frac{k}{d} \right) {d \atopwithdelims ()k} x^k(1-x)^{d-k}, \text{ for } x\in [0,1]. \end{aligned}$$
(1)
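For concreteness, (1) can be evaluated directly as follows; the function \(f=\cos \) and the degrees are illustrative choices.

```python
import numpy as np
from math import comb

# Direct evaluation of the Bernstein polynomial (1) on [0, 1].
def bernstein(f, d, x):
    return sum(f(k / d) * comb(d, k) * x**k * (1 - x)**(d - k)
               for k in range(d + 1))

# Illustration: B_{d,f} converges uniformly to f as d grows.
xs = np.linspace(0.0, 1.0, 201)
for d in (5, 50, 500):
    err = max(abs(bernstein(np.cos, d, x) - np.cos(x)) for x in xs)
    print(d, err)   # the error decreases as d increases
```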

Derivatives and shape preservation: For any h in (0, 1) and x in \( [0,1-h]\), we set \(\Delta _h^1 f(x) = f(x+h) - f(x)\) and, recursively for all k in \( {\mathbb {N}}^*\), \(\Delta _h^k f(x) = \Delta _h^1\left( \Delta _h^{k-1} f \right) (x)\). We fix \(d \ne 0\) in \({\mathbb {N}}\) and, for \(h = \frac{1}{d}\), write \(\Delta _h^k = \Delta ^k\). Then for any \(m \le d\), we have

$$\begin{aligned} B_d^{(m)}(x)&= d(d-1)\ldots (d-m+1) \sum _{k=0}^{d - m} \Delta ^m f\left( \frac{k}{d} \right) {{d-m} \atopwithdelims ()k} x^k(1-x)^{d-k-m}, \end{aligned}$$
(2)

for any x in [0, 1]. If f is increasing (resp. strictly increasing), then \(\Delta ^1f(x) \ge 0\) (resp. \(\Delta ^1f(x) > 0\)) for all x, so that \(B'_d\) is nonnegative (resp. positive) and \(B_d\) is increasing (resp. strictly increasing). Similarly, if f is concave, then \(\Delta ^2 f(x) \le 0\) for all x, so that \(B^{(2)}_d \le 0\) and \(B_d\) is concave. From (2), we infer

$$\begin{aligned} \left| B_d^{(m)}(x)\right| \le d(d-1)\ldots (d-m+1)\sup _{k\in \{0,\ldots ,d-m\}} \left| \Delta ^m f\left( \frac{k}{d} \right) \right| \end{aligned}$$
(3)

for x in [0, 1].
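These monotonicity and concavity preservation properties can be checked numerically, as in the following sketch; the piecewise affine f below is an illustrative choice.

```python
import numpy as np
from math import comb

# Numerical check of shape preservation on an illustrative concave,
# strictly increasing, piecewise affine f (kink at x = 1/2).
def f(x):
    return min(2.0 * x, 0.5 * x + 0.75)

def bernstein(d, x):
    return sum(f(k / d) * comb(d, k) * x**k * (1 - x)**(d - k)
               for k in range(d + 1))

d = 40
vals = np.array([bernstein(d, x) for x in np.linspace(0.0, 1.0, 400)])
assert np.all(np.diff(vals) > 0)           # B_d strictly increasing
assert np.all(np.diff(vals, 2) <= 1e-12)   # B_d concave (second differences)
```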

Approximation of piecewise affine functions: The following lemma will be extensively used throughout the proofs.

Lemma 1

(Smoothing of piecewise lines in \({\mathbb {R}}^p\)) Let \(q_0,q_1 \in {\mathbb {R}}^p\), \(\lambda _-<\lambda _0< \lambda _1 < \lambda _+\) and \(0<e_0,e_1<1\). Set \(\Theta = \left( q_0,q_1,\lambda _-,\lambda _0,\lambda _1,\lambda _+, e_0,e_1 \right) \) and define \(\gamma _\Theta :[0,1] \mapsto {\mathbb {R}}^p\) through

$$\begin{aligned}&\gamma _\Theta (t)\\&= {\left\{ \begin{array}{ll} q_0 \left( 1 + \frac{e_0}{\lambda _0 - \lambda _-} \left( t (\lambda _+- \lambda _-) \right) \right) &{} \text { if } 0\le t \le \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-} \\ q_0(1 + e_0) \left( \frac{\lambda _1 - \lambda _- - t (\lambda _+ - \lambda _-)}{\lambda _1 - \lambda _0} \right) + q_1(1 - e_1) \left( \frac{ \lambda _- - \lambda _0 + t (\lambda _+ - \lambda _-)}{\lambda _1 - \lambda _0} \right) &{} \text { if } \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-} \le t \le \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \\ q_1\left( 1 + \frac{(t-1)(\lambda _+-\lambda _-)e_1}{\lambda _+ - \lambda _1} \right) &{} \text { if } \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \le t \le 1. \end{array}\right. } \end{aligned}$$

The curve \(\gamma _\Theta \) is the piecewise affine interpolant between the points \(q_0\), \( (1 + e_0) q_0\), \((1 - e_1) q_1\) and \( q_1\). For any m in \( {\mathbb {N}}\), we choose d in \({\mathbb {N}}^*\) such that

$$\begin{aligned} \frac{m}{d} \le \min \left\{ \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-}, 1 - \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \right\} . \end{aligned}$$
(4)

We consider a Bernstein-like reparametrization \({\tilde{\gamma }}_\Theta \) of \(\gamma _\Theta \), given by

$$\begin{aligned} {\tilde{\gamma }}_\Theta :[\lambda _-,\lambda _+]&\mapsto {\mathbb {R}}^p\\ \lambda&\mapsto \sum _{k=0}^d \gamma _\Theta \left( \frac{k}{d} \right) {d \atopwithdelims ()k} \left( \frac{\lambda - \lambda _-}{\lambda _+ - \lambda _-} \right) ^k\left( 1 - \frac{\lambda - \lambda _-}{\lambda _+ - \lambda _-} \right) ^{d-k}. \end{aligned}$$

Then \({\tilde{\gamma }}_\Theta \) is \(C^m\) and the following holds for any \(2 \le l \le m\):

$$\begin{aligned} {\tilde{\gamma }}_\Theta (\lambda _-)&=q_0&{\tilde{\gamma }}_\Theta (\lambda _+)&=q_1\\ {\tilde{\gamma }}_\Theta '(\lambda _-)&=\frac{e_0}{\lambda _0- \lambda _-} q_0&{\tilde{\gamma }}_\Theta '(\lambda _+)&=\frac{e_1}{\lambda _+ - \lambda _1} q_1\\ {\tilde{\gamma }}_\Theta ^{(l)}(\lambda _-)&=0&{\tilde{\gamma }}_\Theta ^{(l)}(\lambda _+)&= 0. \end{aligned}$$

Furthermore, if \(\gamma _\Theta \) has monotone coordinates (resp. strictly monotone, resp. concave, resp. convex), then so does \({\tilde{\gamma }}_\Theta \).

Proof

Note that the dependence of \(\gamma _\Theta \) on \((q_0,q_1)\) is linear, so that the dependence of \({\tilde{\gamma }}_\Theta \) on \((q_0,q_1)\) is also linear. Hence \({\tilde{\gamma }}_\Theta \) is of the form \(\lambda \mapsto a(\lambda ) q_0 + b(\lambda )q_1\). We can restrict ourselves to \(p = 1\) since the general case follows from the univariate case applied coordinatewise.

If \(p=1\), then \({\tilde{\gamma }}_\Theta = B_{d,f_\Theta } \circ A\), where \(f_\Theta = \gamma _\Theta \) and \(A :\lambda \mapsto \frac{\lambda - \lambda _-}{\lambda _+-\lambda _-}\). We have \(f_{\Theta }(0) = q_0\), \(f_{\Theta }(1) = q_1\), \(\Delta ^1 f_{\Theta }(0) = \frac{\lambda _+ - \lambda _-}{\lambda _0 - \lambda _-}\frac{e_0}{d} q_0\), \(\Delta ^1 f_{\Theta }\left( 1 - \frac{1}{d} \right) = \frac{\lambda _+ - \lambda _-}{\lambda _+ - \lambda _1}\frac{e_1}{d} q_1\) and \(\Delta ^{l} f_{\Theta }(0) = \Delta ^{l} f_{\Theta }\left( 1 - \frac{l}{d} \right) =0\) for \(2\le l\le m\). The results follow from the expressions in (1) and (2) and the chain rule applied to \({\tilde{\gamma }}_\Theta = B_{d,f_\Theta } \circ A\).

The last property of \({\tilde{\gamma }}_\Theta \) is due to the shape preserving property of Bernstein approximation and the fact that \({\tilde{\gamma }}_\Theta = B_{d,f_\Theta } \circ A\). \(\square \)

Remark 1

(a) [Affine image] Using the notation of Lemma 1, if \((\lambda _-,q_0)\), \((\lambda _0, (1 + e_0) q_0)\), \((\lambda _1, (1 - e_1) q_1)\) and \((\lambda _+, q_1)\) are aligned, then the interpolation is actually an affine function.

(b) [Degree of the interpolants] Observe that the degree d of the Bernstein interpolant is connected to the relative position of the breakpoints \(\lambda _0,\lambda _1\) of the piecewise affine path through (4).
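A one-dimensional numerical sketch of Lemma 1, under illustrative data \((q_0,q_1,\lambda _-,\lambda _0,\lambda _1,\lambda _+,e_0,e_1)\), reads as follows; it checks the endpoint value and first derivative of \({\tilde{\gamma }}_\Theta \) against the formulas above.

```python
from math import comb

# One-dimensional sketch of Lemma 1 with illustrative data: the Bernstein
# reparametrization of the piecewise affine interpolant of
# q0, (1+e0)q0, (1-e1)q1, q1, and a check of the endpoint formulas.
q0, q1 = 1.0, 3.0
lm, l0, l1, lp = 0.0, 1.0, 2.0, 3.0   # lambda_-, lambda_0, lambda_1, lambda_+
e0, e1 = 0.5, 0.5
d = 12                                # (4) holds for m = 2: 2/12 <= 1/3

def gamma(t):                         # gamma_Theta on [0, 1]
    s = lm + t * (lp - lm)
    if s <= l0:
        return q0 * (1 + e0 * (s - lm) / (l0 - lm))
    if s <= l1:
        w = (s - l0) / (l1 - l0)
        return (1 - w) * q0 * (1 + e0) + w * q1 * (1 - e1)
    return q1 * (1 - e1 * (lp - s) / (lp - l1))

def gamma_tilde(lam):                 # Bernstein reparametrization
    t = (lam - lm) / (lp - lm)
    return sum(gamma(k / d) * comb(d, k) * t**k * (1 - t)**(d - k)
               for k in range(d + 1))

h = 1e-6
print(gamma_tilde(lm), q0)                          # endpoint value
print((gamma_tilde(lm + h) - gamma_tilde(lm)) / h,
      e0 / (l0 - lm) * q0)                          # endpoint derivative
```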

3 Smooth convex interpolation

Given a subset S of \({\mathbb {R}}^p\), we denote by \(\mathrm {int}(S)\) its interior, by \({\bar{S}}\) its closure and by \(\mathrm {bd}\,S={\bar{S}}{\setminus } \mathrm {int}(S)\) its boundary. Let us recall that the support function of S is defined through

$$\begin{aligned} \sigma _S(x)=\sup \left\{ \langle y,x\rangle :y\in S\right\} \in {\mathbb {R}}\cup \{+\infty \}. \end{aligned}$$

3.1 Smooth parametrization of convex rings

A convex ring is a set of the form \(C_1{\setminus } C_2\) where \(C_2\subset C_1\) are convex sets. Providing adequate parametrizations of such objects is key to interpolating between \(C_1\) and \(C_2\) by some (regular) convex function.

The following assertion plays a fundamental role.

Assumption 1

Let \(T_-,T_+ \subset {\mathbb {R}}^2\) be convex and compact with \(C^k\) boundary (\(k \ge 2\)) and positive curvature. Assume that \(T_- \subset \mathrm {int} (T_+)\) and \(0 \in \mathrm {int}(T_-)\).

The positive curvature assumption ensures that the boundaries can be parametrized by their normal, that is, for \(i=-,+\), there exists

$$\begin{aligned} c_i :{\mathbb {R}}/ 2 \pi {\mathbb {Z}}\mapsto \mathrm {bd}\,(T_i) \end{aligned}$$

such that the normal to \(T_i\) at \(c_i(\theta )\) is the vector \(n(\theta ) = (\cos (\theta ),\sin (\theta ))^T\) and \({\dot{c}}_i(\theta ) = \rho _{i}(\theta ) \tau (\theta )\), where \(\rho _i > 0\) and \(\tau (\theta ) = (-\sin (\theta ),\cos (\theta ))^T\). In this setting, it holds that \(c_i(\theta ) = \text{ argmax}_{y \in T_i} \left\langle n(\theta ), y\right\rangle \). The map \(c_i\) is the inverse of the Gauss map and is \(C^{k-1}\) (see [35, Section 2.5]).
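For an ellipse, the inverse of the Gauss map is explicit, which provides a convenient test case; the semi-axes below are illustrative, and the sketch checks that \(n(\theta )\) is indeed the outer normal at \(c(\theta )\).

```python
import numpy as np

# Inverse Gauss map of the ellipse x^2/a^2 + y^2/b^2 <= 1 (illustrative
# semi-axes): c(theta) is the boundary point whose outer normal is
# n(theta) = (cos theta, sin theta).
a, b = 2.0, 1.0

def c(theta):
    n = np.array([np.cos(theta), np.sin(theta)])
    w = np.array([a**2 * n[0], b**2 * n[1]])
    return w / np.sqrt(n @ w)        # argmax over the ellipse of <n, y>

# Check: the gradient of x^2/a^2 + y^2/b^2 at c(theta) is parallel to
# n(theta), so n(theta) is the outer normal there.
theta = 0.7
x = c(theta)
g = np.array([2 * x[0] / a**2, 2 * x[1] / b**2])
print(g[0] * np.sin(theta) - g[1] * np.cos(theta))   # ~ 0
```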

Lemma 2

(Minkowski sum of convex sets with positive curvature) Let \(T_-,T_+\) be as in Assumption 1 with normal parametrizations as above. For \(a,b\ge 0\) with \(a+b>0\) set \(T=aT_-+bT_+\).

Then T has positive curvature and its boundary is given by

$$\begin{aligned} \mathrm {bd}\,T=\{a\,c_-(\theta )+b\,c_+(\theta ): \theta \in {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\}, \end{aligned}$$

with the natural parametrization \({\mathbb {R}}/2\pi {\mathbb {Z}}\,\ni \theta \rightarrow a\,c_-(\theta )+b\,c_+(\theta ).\)

Proof

We may assume \(ab>0\), otherwise the result is obvious. Let x be in \(\mathrm {bd}\,T\) and denote by \(n(\theta )\) the normal vector at x for a well chosen \(\theta \), so that \(x=\mathrm {argmax}\,\left\{ \langle y,n(\theta )\rangle : y \in T\right\} \). Observe that the definition of the Minkowski sum yields

$$\begin{aligned} \max _{y\in T}\langle y,n(\theta )\rangle= & {} \max _{(v,w)\in T_-\times T_+}\langle a v+bw,n(\theta )\rangle \\= & {} a \max _{v\in T_-} \langle v,n(\theta )\rangle +b\max _{w\in T_+} \langle w,n(\theta )\rangle \end{aligned}$$

so that

$$\begin{aligned} \langle x,n(\theta )\rangle= & {} a \langle c_-(\theta ),n(\theta )\rangle +b\langle c_+(\theta ),n(\theta )\rangle \\= & {} \langle a c_-(\theta )+b c_+(\theta ),n(\theta )\rangle \end{aligned}$$

which implies by extremality of x that \(x= a c_-(\theta )+b c_+(\theta )\). Conversely, for any such x, \(n(\theta )\) defines a supporting hyperplane to T and x must be on the boundary of T. The other results follow immediately. \(\square \)
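The boundary formula of Lemma 2 can be probed numerically on two illustrative ellipses: the support of the Minkowski sum, sampled over pairs of boundary points, matches \(\left\langle n(\theta ), a\,c_-(\theta )+b\,c_+(\theta )\right\rangle \).

```python
import numpy as np

# Numerical probe of Lemma 2 on two illustrative ellipses: the point
# a*c_-(theta) + b*c_+(theta) supports T = a*T_- + b*T_+ in direction
# n(theta), i.e. <n, x> equals the support of the sampled Minkowski sum.
def c(axes, theta):                  # inverse Gauss map of an ellipse
    A, B = axes
    n = np.array([np.cos(theta), np.sin(theta)])
    w = np.array([A**2 * n[0], B**2 * n[1]])
    return w / np.sqrt(n @ w)

a, b = 0.3, 0.7
ax_m, ax_p = (1.0, 0.5), (3.0, 2.0)  # T_- strictly inside T_+
theta = 1.1
n = np.array([np.cos(theta), np.sin(theta)])
x = a * c(ax_m, theta) + b * c(ax_p, theta)

ts = np.linspace(0.0, 2 * np.pi, 600)
P = np.stack([ax_m[0] * np.cos(ts), ax_m[1] * np.sin(ts)], axis=1)
Q = np.stack([ax_p[0] * np.cos(ts), ax_p[1] * np.sin(ts)], axis=1)
vals = a * (P @ n)[:, None] + b * (Q @ n)[None, :]   # <n, a*p + b*q>
print(n @ x, vals.max())             # agree up to sampling error
```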

In the following fundamental proposition, we provide a smooth parametrization of the convex ring \(T_+{\setminus } \mathrm {int}\,T_-\). The major difficulty is to tightly control the derivatives at the boundary so that the parametrizations can be glued afterward to build smooth interpolants.

Proposition 1

(\(C^k\) parametrization of convex rings) Let \(T_-,T_+\) be as in Assumption 1 with their normal parametrization as above. Fix \(k\ge 2\), \(\lambda _-< \lambda _0< \lambda _1 < \lambda _+\) and \(0< e_0,e_1<1\). Choose d in \( {\mathbb {N}}^*\), such that

$$\begin{aligned} \frac{k}{d} \le \min \left\{ \frac{\lambda _0 - \lambda _-}{\lambda _+ - \lambda _-}, 1 - \frac{\lambda _1 - \lambda _-}{\lambda _+ - \lambda _-} \right\} . \end{aligned}$$

Consider the map

$$\begin{aligned} G :[\lambda _-,\lambda _+] \times {\mathbb {R}}/2\pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \nonumber \\ (\lambda , \theta ) \mapsto {\tilde{\gamma }}_\Theta (\lambda ) \end{aligned}$$
(5)

with \(\Theta = (c_-(\theta ), c_+(\theta ), \lambda _-, \lambda _0, \lambda _1, \lambda _+, e_0,e_1)\) and \({\tilde{\gamma }}\) as given by Lemma 1. Assume further that:

$$\begin{aligned}&({\mathcal {M}})\quad \textit{For any } \theta \in {\mathbb {R}}/ 2\pi {\mathbb {Z}}, \textit{ the map } \lambda \mapsto \left\langle G(\lambda , \theta ), n(\theta )\right\rangle \textit{ has strictly positive derivative on } [\lambda _-,\lambda _+]. \end{aligned}$$

Then the image of G is \({\mathcal {R}}:=T_+ {\setminus } \mathrm {int}(T_-)\), G is \(C^k\) and satisfies, for any \(2 \le l \le k\) and any m in \( {\mathbb {N}}^*\),

$$\begin{aligned} \frac{\partial ^m G}{\partial \theta ^m}(\lambda _-,\theta )&= c_-^{(m)}(\theta )&\frac{\partial ^m G}{\partial \theta ^m}(\lambda _+,\theta )&= c_+^{(m)}(\theta )\\ \frac{\partial ^{m+1} G}{\partial \lambda \partial \theta ^m} (\lambda _-,\theta )&= c_-^{(m)}(\theta ) \frac{e_0}{\lambda _0 - \lambda _-}&\frac{\partial ^{m+1} G}{\partial \lambda \partial \theta ^m} (\lambda _+,\theta )&= c_+^{(m)}(\theta ) \frac{e_1}{\lambda _+-\lambda _1}\\ \frac{\partial ^{l+m} G}{\partial \lambda ^l \partial \theta ^m }(\lambda _-,\theta )&= 0&\frac{\partial ^{l+m} G}{\partial \lambda ^l \partial \theta ^m }(\lambda _+,\theta )&= 0. \end{aligned}$$

Besides, G is a diffeomorphism from its domain onto its image. Set \({\mathcal {R}}\ni x \mapsto (f(x), \theta (x))\) to be the inverse of G. Then f is \(C^k\) and, in addition, for all x in \({\mathcal {R}}\),

$$\begin{aligned} \nabla f(x) =\quad&\frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, n(\theta (x))\nonumber \\ \nabla \theta (x) =\quad&\frac{1}{\left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle } \tau (\theta (x)) \nonumber \\&- \frac{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), \tau (\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), n(\theta (x))\right\rangle \left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle } n(\theta (x)) \nonumber \\ \nabla ^2 f(x)=\quad&\frac{\left\langle \frac{\partial G}{\partial \theta }(f(x),\theta (x)),\tau (\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \nabla \theta (x) \nabla \theta (x)^T \nonumber \\&- \frac{\left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(f(x),\theta (x)), n(\theta (x))\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \nabla f(x) \nabla f(x)^T, \end{aligned}$$
(6)

where all denominators are positive.

Remark 2

Note that G is actually well defined and smooth on an open set containing its domain. As we shall see it is also a diffeomorphism from an open set containing its domain onto its image.
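Before turning to the proof, here is a self-contained sketch of the map G of (5) for two illustrative concentric ellipses; the polynomials a, b below are recovered from the Bernstein curve of Lemma 1 applied to the scalar data \((q_0,q_1)=(1,0)\) and \((0,1)\), and all numerical parameters are assumptions chosen so that \(({\mathcal {M}})\) holds.

```python
import numpy as np
from math import comb

# Sketch of the map G in (5) for two illustrative concentric ellipses
# T_- and T_+; by linearity, G(lambda, theta) = a(lambda) c_-(theta)
# + b(lambda) c_+(theta), where a, b come from the Bernstein curve of
# Lemma 1 applied to (q0, q1) = (1, 0) and (0, 1).
lm, l0, l1, lp = 0.0, 1.0, 2.0, 3.0
e0, e1, d = 0.1, 0.1, 12          # small e's keep (M) satisfied here

def gamma(q0, q1, t):
    s = lm + t * (lp - lm)
    if s <= l0:
        return q0 * (1 + e0 * (s - lm) / (l0 - lm))
    if s <= l1:
        w = (s - l0) / (l1 - l0)
        return (1 - w) * q0 * (1 + e0) + w * q1 * (1 - e1)
    return q1 * (1 - e1 * (lp - s) / (lp - l1))

def coeff(q0, q1, lam):
    t = (lam - lm) / (lp - lm)
    return sum(gamma(q0, q1, k / d) * comb(d, k) * t**k * (1 - t)**(d - k)
               for k in range(d + 1))

def c(axes, theta):               # inverse Gauss map of an ellipse
    A, B = axes
    n = np.array([np.cos(theta), np.sin(theta)])
    w = np.array([A**2 * n[0], B**2 * n[1]])
    return w / np.sqrt(n @ w)

def G(lam, theta):
    return (coeff(1.0, 0.0, lam) * c((1.0, 0.5), theta) +
            coeff(0.0, 1.0, lam) * c((3.0, 2.0), theta))

print(G(lm, 0.3), c((1.0, 0.5), 0.3))   # G(lambda_-, .) traces bd T_-
print(G(lp, 0.3), c((3.0, 2.0), 0.3))   # G(lambda_+, .) traces bd T_+
```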

Proof

Note that by construction, we have \(G(\lambda ,\theta ) = a(\lambda ) c_-(\theta ) + b(\lambda )c_+(\theta )\) for some polynomials a and b which are nonnegative on \([\lambda _-,\lambda _+]\). The formulas for the derivatives follow easily from this remark, the form of a and b and Lemma 1.

Set, for any \(\lambda \) in \( [\lambda _-,\lambda _+]\), \(T_\lambda = a(\lambda ) T_- + b(\lambda ) T_+\). The resulting set \(T_\lambda \) is convex and has positive curvature by Lemma 2, and, for fixed \(\lambda \), \(G(\lambda ,\cdot )\) is the inverse of the Gauss map of \(T_\lambda \), which constitutes a parametrization of the boundary by normals.

Assume that \(\lambda < \lambda '\). Using the monotonicity assumption \(({\mathcal {M}})\), we have for any \(\theta ,\theta '\),

$$\begin{aligned} \left\langle n(\theta '), G(\lambda ,\theta ) \right\rangle&\le \sup _{y \in T_{\lambda }} \left\langle n(\theta '),y \right\rangle \\&= \left\langle n(\theta '), G(\lambda ,\theta ') \right\rangle \\&< \left\langle n(\theta '), G(\lambda ',\theta ') \right\rangle \end{aligned}$$

so that \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\). Furthermore, we have by definition of \(G(\lambda ',\theta ')\)

$$\begin{aligned} G(\lambda ,\theta ) \in \bigcap _{\theta ' \in {\mathbb {R}}/2\pi {\mathbb {Z}}} \left\{ y,\; \left\langle y, n(\theta ')\right\rangle \le \left\langle n(\theta '), G(\lambda ',\theta ') \right\rangle \right\} = T_{\lambda '}, \end{aligned}$$

where the equality follows from the convexity of \(T_{\lambda '}\). By convexity and compactness, this entails that \(T_\lambda = \mathrm {conv}(\mathrm {bd}\,(T_\lambda )) \subset \mathrm {int}\,T_{\lambda '}\).

Let us show that the map G is bijective, starting with surjectivity. Let f be defined on \(T_+ {\setminus } \mathrm {int}(T_-)\) through

$$\begin{aligned} f :x \mapsto \inf _{}\left\{ \lambda : \lambda \ge \lambda _-, \; x \in T_\lambda = a(\lambda ) T_- + b(\lambda ) T_+\right\} . \end{aligned}$$
(7)

Since \(a(\lambda _+)=0\) and \(b(\lambda _+)=1\), this function is well defined and, by compactness and continuity, the infimum is achieved. It must hold that x belongs to \( \mathrm {bd}\,(T_{f(x)})\): indeed, if \(f(x) = \lambda _-\), then x belongs to \( \mathrm {bd}\,(T_-)\), and otherwise, if x is in \( \mathrm {int}(T_{\lambda '})\) for some \(\lambda ' > \lambda _-\), then \(f(x) < \lambda '\). We deduce that x is of the form \(G(f(x),\theta )\) for a certain value of \(\theta \), so that G is surjective.

As for injectivity, we have already seen a first case: the monotonicity assumption \(({\mathcal {M}})\) ensures that \(\lambda \ne \lambda '\) implies \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\) for any \(\theta ,\theta '\). For the second case, note that for any \(\lambda \) in \( [\lambda _-,\lambda _+]\) and any \(\theta \), \(G(\lambda ,\theta ) = \arg \max _{y\in T_\lambda } \left\langle y,n(\theta ) \right\rangle \), so that \(\theta \ne \theta '\) implies \(G(\lambda ,\theta ) \ne G(\lambda ,\theta ')\). So in all cases, \((\lambda ,\theta ) \ne (\lambda ',\theta ')\) implies \(G(\lambda ,\theta ) \ne G(\lambda ',\theta ')\) and G is injective.

Let us now show that the map G is a local diffeomorphism by estimating its Jacobian map.

Since \(0 \in \mathrm {int}(T_-)\), we have for any \(\lambda ,\theta \),

$$\begin{aligned} 0&< \sup _{y \in T_-} \left\langle y, n(\theta ) \right\rangle \\&= \left\langle G(\lambda _-,\theta ),n(\theta ) \right\rangle \\&\le a(\lambda ) \left\langle c_-(\theta ), n(\theta )\right\rangle + b(\lambda ) \left\langle c_+(\theta ), n(\theta ) \right\rangle , \end{aligned}$$

and both scalar products are positive so that \(a(\lambda ) + b(\lambda ) > 0\). Hence, for any \(\theta \) in \( {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),

$$\begin{aligned} \frac{\partial G}{\partial \theta }(\lambda ,\theta ) = (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \tau (\theta ), \end{aligned}$$
(8)

with \(a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta ) > 0\).

Furthermore, by assumption, \(\lambda \mapsto \max _{y \in T_\lambda } \left\langle y, n(\theta ) \right\rangle =\left\langle G(\lambda ,\theta ),n(\theta ) \right\rangle \) has strictly positive derivative in \(\lambda \), whence

$$\begin{aligned} \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), n(\theta )\right\rangle > 0. \end{aligned}$$

We deduce that for any fixed \(\theta \), in the basis \((n(\theta ), \tau (\theta ))\), the Jacobian of G, denoted \(J_G\), is triangular with positive diagonal entries. More precisely fix \(\lambda , \theta \) and set \(x = G(\lambda , \theta )\) such that \(\lambda = f(x)\), \(\theta = \theta (x)\). In the basis \((n(\theta ), \tau (\theta ))\), we deduce from (8) that the Jacobian of G is of the form

$$\begin{aligned} J_G(\lambda , \theta ) = \begin{pmatrix} \alpha &{}\quad 0\\ \gamma &{}\quad \beta \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} \alpha&= \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), n(\theta )\right\rangle>0 \\ \beta&= \left\langle \frac{\partial G}{\partial \theta } (\lambda ,\theta ), \tau (\theta )\right\rangle = a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta ) >0\\ \gamma&= \left\langle \frac{\partial G}{\partial \lambda } (\lambda ,\theta ), \tau (\theta )\right\rangle . \end{aligned}$$

It is thus invertible and we have a local diffeomorphism. We deduce that

$$\begin{aligned} J_G(\lambda , \theta )^{-1} = \begin{pmatrix} \alpha ^{-1} &{}\quad 0\\ \frac{-\gamma }{\alpha \beta }&{}\quad \beta ^{-1} \end{pmatrix}. \end{aligned}$$

We have \(J_G(\lambda ,\theta )^{-1} = J_{G^{-1}}(x)\), so that the first row is \(\nabla f(x)^T\) and the second row is \(\nabla \theta (x)^T\), which proves the claimed expressions for the gradients.

We also have \(d n(\theta ) / d\theta = \tau (\theta )\) so that

$$\begin{aligned} J_{n \circ \theta }(x) = \tau (\theta (x)) \nabla \theta (x)^T. \end{aligned}$$

Differentiating the gradient expression, we obtain (\(\nabla \) denotes gradient with respect to x):

$$\begin{aligned}&\nabla ^2 f(x)\\&\quad =\quad \frac{\partial }{\partial x} \left( \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, n(\theta (x))\right) \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, J_{n\circ \theta }(x) \\&\qquad + n(\theta (x)) \nabla \left( \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle }\right) ^T \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \, J_{n\circ \theta }(x) \\&\qquad - n(\theta (x)) \nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle ^2} \\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle } \left( \tau (\theta ) \nabla \theta (x)^T \right. \\&\qquad \left. - \nabla f(x) \nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T\right) . \end{aligned}$$

We have

$$\begin{aligned}&\nabla \left( \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \right) ^T \\&\quad = \frac{\partial G}{\partial \lambda } (f(x), \theta (x))^T J_{n\circ \theta }(x) + n(\theta (x))^T J_{\frac{\partial G}{\partial \lambda }}(f(x),\theta (x)) J_{G^{-1}} (x) \\&\quad = \left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla \theta (x)^T + \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x)^T, \end{aligned}$$

where, for the last identity, we have used the fact that

$$\begin{aligned} n(\theta (x))^T J_{\frac{\partial G}{\partial \lambda }}(f(x),\theta (x))&= \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle n(\theta (x))^T \\&\quad + \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda \partial \theta }(\lambda ,\theta )\right\rangle \tau (\theta (x))^T\\&= \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle n(\theta (x))^T\\ n(\theta (x))^T J_{G^{-1}}(x)&= n(\theta (x))^T J_G(f(x),\theta (x))^{-1}\\&= \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x),\theta (x)), n(\theta (x))\right\rangle } n(\theta (x))^T = \nabla f(x)^T. \end{aligned}$$

We deduce that

$$\begin{aligned}&\nabla ^2 f(x)\\&\quad =\quad \frac{1}{\left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle }\\&\qquad \Bigg ( \tau (\theta (x)) \nabla \theta (x)^T - \left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla f(x) \nabla \theta (x)^T \\&\qquad - \nabla f(x) \nabla f(x)^T \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \Bigg ). \end{aligned}$$

We have

$$\begin{aligned}&\tau (\theta (x)) \nabla \theta (x)^T - \left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle \nabla f(x) \nabla \theta (x)^T \\&\quad =\quad \left( \tau (\theta (x)) - \frac{\left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), \tau (\theta (x)) \right\rangle }{\left\langle \frac{\partial G}{\partial \lambda }(f(x), \theta (x)), n(\theta (x)) \right\rangle } n(\theta (x)) \right) \nabla \theta (x)^T\\&\quad =\quad (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \nabla \theta (x) \nabla \theta (x)^T \end{aligned}$$

so that we actually get

$$\begin{aligned}&\nabla ^2 f(x) \left\langle \frac{\partial G}{\partial \lambda } (f(x), \theta (x)), n(\theta (x)) \right\rangle \\&= (a(\lambda ) \rho _-(\theta ) + b(\lambda ) \rho _+ (\theta )) \nabla \theta (x) \nabla \theta (x)^T - \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x) \nabla f(x)^T\\&= \left\langle \frac{\partial G}{\partial \theta }(\lambda ,\theta ),\tau (\theta (x))\right\rangle \nabla \theta (x) \nabla \theta (x)^T - \left\langle n (\theta (x)), \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta )\right\rangle \nabla f(x) \nabla f(x)^T. \end{aligned}$$

This concludes the proof. \(\square \)

3.2 Smooth convex interpolation of smooth positively curved convex sequences

In this section, we consider an indexing set I, with either \(I = {\mathbb {N}}\) or \(I = {\mathbb {Z}}\), and an increasing sequence of compact convex sets \(\left( T_i \right) _{i\in I}\) such that for any i in I the couple \(T_+ := T_{i+1}\), \(T_-: = T_i\) satisfies Assumption 1. In particular, for each i in I, \(T_i\) is compact and convex with \(C^k\) boundary and positive curvature, and \(T_i \subset \mathrm {int}\, T_{i+1}\); we denote by \(c_i\) the corresponding parametrization by the normal. With no loss of generality we assume \(\displaystyle 0 \in \cap _{i \in I} T_i\).

This is our main theoretical result.

Theorem 2

(Smooth convex interpolation) Let \(I = {\mathbb {N}}\) or \(I = {\mathbb {Z}}\) and \(\left( T_i \right) _{i\in I}\) such that for any \(i \in I\), \(T_i \subset {\mathbb {R}}^2\) and the couple \(T_+ := T_{i+1}\), \(T_-: = T_i\) satisfies Assumption 1. Then there exists a \(C^k\) convex function

$$\begin{aligned} f :{\mathcal {T}}:= \mathrm {int}\left( \bigcup _{i\in I} T_i \right) \mapsto {\mathbb {R}}\end{aligned}$$

such that

(i) \(T_i\) is a sublevel set of f for all i in I.

(ii) We have

$$\begin{aligned} {{\,\mathrm{argmin}\,}}f= {\left\{ \begin{array}{ll} \bigcap _I T_i &{}\quad \text { if } I = {\mathbb {Z}}\\ \{0\}&{}\quad \text { if } I = {\mathbb {N}}. \end{array}\right. } \end{aligned}$$

(iii) \(\nabla ^2 f \) is positive definite on \({\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}f\), and if \(I = {\mathbb {N}}\), it is positive definite throughout \({\mathcal {T}}\).

Proof

Preconditioning. We have \(0 \in \cap _{i \in I } \mathrm {int}(T_i)\). Hence for any i in I and \(0 \le \alpha <1\), \(\alpha T_i \subset \mathrm {int}(T_i)\). Furthermore, for \(\alpha >0\) small enough, \((1 + \alpha )T_{i} \subset (1-\alpha ) \mathrm {int}(T_{i+1})\). Set \(\alpha _0\) such that \((1 + \alpha _0)T_0 \subset (1-\alpha _0) \mathrm {int} (T_1)\). By forward (and backward if \(I = {\mathbb {Z}}\)) induction, we obtain, for all i in I, \(\alpha _i>0\) such that

$$\begin{aligned} (1 + \alpha _i) T_i&\subset (1-\alpha _{i}) \mathrm {int} (T_{i+1}). \end{aligned}$$

Setting for all i in I, \(\epsilon _{i+1} = \min \{\alpha _i,\alpha _{i+1}\}\) (\(\epsilon _0 = \alpha _0\) if \(I = {\mathbb {N}}\)), we have for all i in I

$$\begin{aligned} (1 + \epsilon _i) T_i \subset (1 + \alpha _i) T_i&\subset (1-\alpha _{i}) \mathrm {int} (T_{i+1}) \subset (1-\epsilon _{i+1}) \mathrm {int} (T_{i+1}). \end{aligned}$$

For all i in I, we introduce

$$\begin{aligned} S_{3i}&= T_i,\\ S_{3i + 1}&= (1 + \epsilon _i) T_i\\ S_{3i + 2}&= (1 - \epsilon _{i+1}) T_{i+1}. \end{aligned}$$

We thus obtain a new strictly increasing sequence \(\left( S_i\right) _{i \in I }\) of compact convex sets.

Value assignation. For each i in I, we set

$$\begin{aligned} K_i = \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{ \sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} \in \left( 0, + \infty \right) . \end{aligned}$$

Note that for all i in I, \(K_{3i} = 1\). We choose \(\lambda _{1} = 2\), \(\lambda _0 = 1\) and for all i in I,

$$\begin{aligned} \lambda _{i+1} = \lambda _i + K_i(\lambda _i - \lambda _{i-1}). \end{aligned}$$
(9)

By construction, we have for all i in I and all \(\theta \in {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\),

$$\begin{aligned} \frac{\sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta ))}{\lambda _{i+1} - \lambda _{i}} \le \frac{\sigma _{S_{i}}(n(\theta )) - \sigma _{S_{i-1}}(n(\theta )) }{\lambda _{i} - \lambda _{i-1}}. \end{aligned}$$

If \(I = {\mathbb {Z}}\), this entails

$$\begin{aligned} 0 < \lambda _i - \lambda _{i - 1} \le \frac{\lambda _{1}- \lambda _0}{\sigma _{S_{1}}(n(\theta )) - \sigma _{S_{0}}(n(\theta ))} (\sigma _{S_{i}}(n(\theta )) - \sigma _{S_{i-1}}(n(\theta ))), \end{aligned}$$

and the right-hand side is summable over negative indices \(i\le 0\), so that \(\lambda _{i} \rightarrow {\underline{\lambda }} \in {\mathbb {R}}\) as \(i \rightarrow -\infty \). In all cases \((\lambda _i)_{i\in I}\) is an increasing sequence bounded from below.

Local interpolation. We fix i in I and consider the function \(G_i\) described in Proposition 1 with \(T_+ = S_{3i + 3} = T_{i+1}\), \(T_- = S_{3i} = T_i\), \(\lambda _+ = \lambda _{3i+3}\), \(\lambda _{1} = \lambda _{3i+2}\), \(\lambda _0 = \lambda _{3i+1}\), \(\lambda _- = \lambda _{3i}\), \(e_0 = \epsilon _i\), \(e_1 = \epsilon _{i+1}\). By linearity, we have for any \((\lambda ,\theta ) \in [\lambda _-,\lambda _+] \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),

$$\begin{aligned} \left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle = {\tilde{\gamma }}_{\Theta }(\lambda ) \end{aligned}$$

where \( {\tilde{\gamma }}_{\Theta }\) is as in Lemma 1 with input data \(q_0 = \left\langle c_{3i}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i}}(n(\theta ))\), \(q_1 = \left\langle c_{3i+3}(\theta ), n(\theta )\right\rangle = \sigma _{S_{3i+3}}(n(\theta ))\), and \(\lambda _-,\lambda _0,\lambda _1,\lambda _+,e_1,e_0\) as already described. This corresponds to the Bernstein approximation of the piecewise affine interpolation between the points

$$\begin{aligned}&\left( \lambda _{3i}, \sigma _{S_{3i}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+1}, \sigma _{S_{3i+1}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+2}, \sigma _{S_{3i+2}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+3}, \sigma _{S_{3i+3}}(n(\theta ))\right) . \end{aligned}$$
(10)

By construction of \(\left( K_i \right) _{i \in I }\), we have for all \(\theta \),

$$\begin{aligned}&0 < \frac{\sigma _{S_{3i+3}}(n(\theta )) - \sigma _{S_{3i+2}}(n(\theta )) }{\lambda _{3i+3} - \lambda _{3i + 2}}\\&\quad \le \frac{\sigma _{S_{3i+2}}(n(\theta )) - \sigma _{S_{3i+1}}(n(\theta )) }{\lambda _{3i+2} - \lambda _{3i + 1}} \le \frac{\sigma _{S_{3i+1}}(n(\theta )) - \sigma _{S_{3i}}(n(\theta )) }{\lambda _{3i+1} - \lambda _{3i}}. \end{aligned}$$

Hence the affine interpolant between the points in (10) is strictly increasing and concave and, by the shape preserving properties of Bernstein polynomials, \(\left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle \) has strictly positive derivative. As a consequence \(G_i\) is a diffeomorphism and its derivatives are as in Proposition 1. Furthermore

$$\begin{aligned} \lambda \mapsto \left\langle G_i(\lambda ,\theta ),n(\theta )\right\rangle \end{aligned}$$

is a \(C^k\) concave function of \(\lambda \).

Global interpolation. Recall that \({\underline{\lambda }} = \inf _{i \in I} \lambda _i>-\infty \) and set \({\bar{\lambda }} = \sup _{i \in I} \lambda _i\in (-\infty ,+\infty ]\). For any \(\lambda \in ({\underline{\lambda }},{\bar{\lambda }})\), there exists a unique \(i_\lambda \in I\) such that \(\lambda \in [\lambda _{3i_\lambda }, \lambda _{3i_\lambda +3})\). Define

$$\begin{aligned} G :({\underline{\lambda }}, {\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}&\mapsto {\mathbb {R}}^2 \\ (\lambda ,\theta )&\mapsto G_{i_\lambda } (\lambda , \theta ). \end{aligned}$$

Fix i in I. The boundary of \(T_{i+1}\) is given by \(G_{i+1}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}}) = G_{i}(\lambda _{3i+3}, {\mathbb {R}}/ 2 \pi {\mathbb {Z}})\) with actually

$$\begin{aligned} G_{i+1}(\lambda _{3i+3}, \theta ) = G_{i}(\lambda _{3i+3}, \theta ), \text{ for } \text{ all } \theta \text{ in } {\mathbb {R}}/ 2 \pi {\mathbb {Z}}. \end{aligned}$$
(11)

Since \(K_{3i} = 1\), we have

$$\begin{aligned} \lambda _{3i+1} - \lambda _{3i} = \lambda _{3i} - \lambda _{3i-1}. \end{aligned}$$

The expressions of the derivatives in Proposition 1 and (11) ensure that the derivatives of \(G_{i+1}\) and \(G_{i}\) agree on \(\{\lambda _{3i+3}\} \times {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\) up to order k. Hence G is a local diffeomorphism. Bijectivity of each \(G_i\) ensures that G is also bijective, and thus G is a diffeomorphism. Furthermore

$$\begin{aligned} \lambda \mapsto \left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \end{aligned}$$

is \(C^k\) and piecewise concave, hence concave.

Extending G. If \(I = {\mathbb {N}}\), we may assume without loss of generality that \(S_0 = B\), the Euclidean unit ball, and \(S_1 = \frac{5}{3} S_0\), which corresponds to \(\epsilon _0 = 2/3\), possibly after adding a set to the list and rescaling. Let \(\phi \) denote the function described in Lemma 10 and let \(G_{-1}\) be as in Lemma 11. This allows us to extend G to \(\lambda \in [0,1]\); G is then \(C^k\) on \((0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\) by Lemma 11 and Proposition 1. This does not affect the differentiability, monotonicity and concavity properties of G.

Defining the interpolant f. We assume without loss of generality that \({\underline{\lambda }} = 0\). We set f to be the first component of the inverse of G, so that it is defined on \(G\left( (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\right) \). We extend f as follows:

  • \(f(0) = 0\) if \(I = {\mathbb {N}}\),

  • \(f = 0\) on \(\cap _{i \in I } T_i\) if \(I = {\mathbb {Z}}\).

Since G is \(C^k\) and non-singular on \( (0,{\bar{\lambda }}) \times {\mathbb {R}}/ 2\pi {\mathbb {Z}}\), the inverse mapping theorem ensures that f is \(C^k\) on \({\mathcal {T}} {\setminus } {{\,\mathrm{argmin}\,}}_{{\mathcal {T}}} f\).

Convexity of f. For any \(\theta \) in \( {\mathbb {R}}/ 2\pi {\mathbb {Z}}\),

$$\begin{aligned} (0, {\bar{\lambda }})&\mapsto {\mathbb {R}}_+\\ \lambda&\mapsto \sup _{z\in [f \le \lambda ]} n(\theta )^Tz \end{aligned}$$

is equal to \(\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \), which is concave. It can be extended at \(\lambda = 0\) by continuity, which preserves concavity. Hence, using Theorem 1, we have proved that f is convex and \(C^k\) on \({\mathcal {T}}{\setminus } {{\,\mathrm{argmin}\,}}_{{\mathcal {T}}}f\).

Smoothness around the argmin and Hessian positivity. If \(I = {\mathbb {N}}\), then the interpolant defined in Lemma 11 ensures that f is proportional to the squared norm around 0. Hence it is \(C^k\) around 0 with positive definite Hessian. We may compose f with the function \(g :t \mapsto \sqrt{t^2 + 1} + t\), which is increasing and has positive second derivative. This ensures that the resulting Hessian is positive definite outside \({{\,\mathrm{argmin}\,}}f\), and thus everywhere, since

$$\begin{aligned} \nabla ^2 (g \circ f) = (g' \circ f)\, \nabla ^2 f + (g'' \circ f)\, \nabla f \nabla f^T \end{aligned}$$

is positive definite, thanks to the expressions for the Hessian of f in Proposition 1.
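The composition formula above can be verified symbolically; the convex f below is an illustrative assumption, not the interpolant constructed in this proof.

```python
import sympy as sp

# Symbolic check of the composition formula
# Hessian(g o f) = g'(f) Hessian(f) + g''(f) grad f grad f^T
# on an illustrative convex f.
x, y, t = sp.symbols('x y t')
f = x**2 + 2 * y**2
g = sp.sqrt(t**2 + 1) + t                 # increasing, g'' > 0

lhs = sp.hessian(g.subs(t, f), (x, y))
grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
rhs = (sp.diff(g, t).subs(t, f) * sp.hessian(f, (x, y)) +
       sp.diff(g, t, 2).subs(t, f) * grad_f * grad_f.T)
assert all(sp.simplify(e) == 0 for e in lhs - rhs)
```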

If \(I = {\mathbb {Z}}\), we let all the derivatives of f vanish around the solution set. The smoothing Lemma 14 applies and provides a function \(\phi \) with positive derivative on \((0,+\infty )\), such that \(\phi \circ f\) is convex, \(C^k\) with prescribed sublevel sets. Furthermore, we remark that

$$\begin{aligned} \nabla ^2 (\phi \circ f) = (\phi ' \circ f)\, \nabla ^2 f + (\phi '' \circ f)\, \nabla f \nabla f^T. \end{aligned}$$

We may compose \(\phi \circ f\) with the function \(g :t \mapsto \sqrt{t^2 + 1} + t\), which is increasing with positive second derivative; the expressions for the Hessian of f in Proposition 1 ensure that the resulting Hessian is positive definite outside of \({{\,\mathrm{argmin}\,}}f\).\(\square \)

Remark 3

(Interpolation of symmetric rings) In view of Remark 1, if we have \(T_{i} = \alpha T_{i+1}\) for some \(0<\alpha < 1\) and i in \( {\mathbb {Z}}\), then the interpolated level sets between \(T_i\) and \(T_{i+1}\) are all of the form \(s T_{i+1}\) for \(\alpha \le s \le 1\).

Remark 4

(Strict convexity) Recall that strict convexity of a differentiable function amounts to the injectivity of its gradient. In Theorem 2 if there is a unique minimizer, then the invertibility of the Hessian outside argmin f ensures that our interpolant is strictly convex (note that this is automatically the case if \(I = {\mathbb {N}}\)).

3.3 Considerations on Legendre functions

The following proposition provides an interpolant with additional properties, such as global Lipschitz continuity and finiteness properties of the dual function. At this stage these results appear merely technical, but they happen to be decisive in the construction of counterexamples involving Legendre functions. The properties of Legendre functions can be found in [37, Chapter 6]. We simply recall here that, given a convex body C of \({\mathbb {R}}^p\), a convex function \(h:C\rightarrow {\mathbb {R}}\) is Legendre if it is differentiable on \(\mathrm {int}\,C\) and if \(\nabla h\) defines a bijection from \(\mathrm {int}\,C\) to \(\nabla h(\mathrm {int}\,C)\) with in addition

$$\begin{aligned} \lim _{\begin{array}{l} x\in \mathrm {int}\,C\\ x\rightarrow z\end{array}} \Vert \nabla h(x)\Vert =+\infty , \end{aligned}$$

for all z in \(\mathrm {bd}\,C\). We also assume that \(\text{ epi }\,h:=\{(x,\lambda ):h(x)\le \lambda \}\) is closed in \({\mathbb {R}}^{p+1}\). The Legendre conjugate or dual function of h is defined through

$$\begin{aligned} h^*(z)=\sup \left\{ \langle z,x\rangle -h(x):x \in C\right\} , \end{aligned}$$

for z in \({\mathbb {R}}^p\), and its domain is \(D:=\left\{ z\in {\mathbb {R}}^p:h^*(z)<+\infty \right\} .\) The function \(h^*\) is differentiable on the interior of D, and the inverse of \(\nabla h:\mathrm {int}\,C \rightarrow \mathrm {int}\,D\) is \(\nabla h^*:\mathrm {int}\,D \rightarrow \mathrm {int}\,C\).

We start with a simple technical lemma on the compactness of the domain of the conjugate of a Legendre function.

Lemma 3

Let \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) be a globally Lipschitz continuous Legendre function, and set \(D = \mathrm {int}(\mathrm {dom}(h^*))\), where \(h^*\) is the conjugate of h. For each \(\lambda \ge \min _{{\mathbb {R}}^2}h\), let \(\sigma _\lambda \) be the support function of the set \(\left\{ z \in {\mathbb {R}}^2 : h(z) \le \lambda \right\} \). The following are equivalent:

(i)

    \(h^*(x)\le 0\) for all \(x \in D\).

(ii)

    For all \(y \in {\mathbb {R}}^2\), \(\sigma _{h(y)}(\nabla h(y)) \le h(y)\).

In both cases \(h^*\) has compact domain.

Proof

Let us establish beforehand the following formula

$$\begin{aligned} h^*(z)&= \sigma _{h(y)}(\nabla h(y))- h(y), \end{aligned}$$
(12)

with \(y = \nabla h^* (z)\), for z in D. Since \(h^*\) is Legendre, we have, for all z in D,

$$\begin{aligned} h^*(z)&= \sup _{y \in {\mathbb {R}}^2} \left\langle z, y \right\rangle - h(y) = \left\langle z, \nabla h^*(z) \right\rangle - h(\nabla h^*(z)). \end{aligned}$$

We have, setting \(y = \nabla h^*(z)\)

$$\begin{aligned} \left\langle z, \nabla h^*(z) \right\rangle = \left\langle \nabla h(y),y \right\rangle = \sigma _{h(y)}(\nabla h(y)) \end{aligned}$$

because \(\nabla h(y)\) is normal to the sublevel set of h which contains y in its boundary. Hence we have \(h^*(z) = \sigma _{h(y)}(\nabla h(y))- h(y)\) with \(y = \nabla h^* (z)\), that is, (12) holds. Since \(\nabla h^* :D \mapsto {\mathbb {R}}^2\) is a bijection, the equivalence follows. In this case the domain of \(h^*\) is closed because \(h^*\) is bounded above and lower semicontinuous. The domain is also bounded by the Lipschitz continuity of h, whence compact. \(\square \)
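A worked instance of Lemma 3 is \(h(x)=\sqrt{1+\Vert x\Vert ^2}\) (an illustrative choice, unrelated to the interpolants below): it is globally Lipschitz and Legendre on \({\mathbb {R}}^2\), with \(h^*(z)=-\sqrt{1-\Vert z\Vert ^2}\le 0\) on the closed unit ball, a compact domain. The sketch checks conditions (i) and (ii) numerically.

```python
import numpy as np

# Worked instance of Lemma 3: h(x) = sqrt(1 + ||x||^2) is globally
# Lipschitz and Legendre on R^2, with h*(z) = -sqrt(1 - ||z||^2) <= 0 on
# the closed unit ball (compact domain).
rng = np.random.default_rng(0)
for _ in range(5):
    y = rng.normal(size=2)
    lam = np.sqrt(1 + y @ y)       # lam = h(y)
    g = y / lam                    # g = grad h(y)
    # {h <= lam} is the ball of radius sqrt(lam^2 - 1), whose support
    # function at g is sqrt(lam^2 - 1) * ||g||:
    sigma = np.sqrt(lam**2 - 1) * np.linalg.norm(g)
    assert sigma <= lam            # condition (ii)
    h_star = y @ g - lam           # h*(g) = <g, y> - h(y) = -1/lam
    assert h_star <= 0             # condition (i)
```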

Proposition 2

(On Legendre interpolation) Let \(\left( S_i \right) _{i\in {\mathbb {N}}}\) be such that for any i in \({\mathbb {N}}\), the couple \(T_- = S_{i}\), \(T_+ = S_{i+1}\) satisfies Assumption 1, and assume that there exists a sequence \(\left( \epsilon _i \right) _{i \in {\mathbb {N}}}\) in (0, 1) such that for all \(i\ge 1\), \((1 - \epsilon _i)^{-1} S_{3i-1} = (1 +\epsilon _i)^{-1} S_{3i+1}=S_{3i}\).

Assume in addition that

$$\begin{aligned} \inf _{\Vert x\Vert = 1}&\sigma _{S_i}(x) - \sigma _{S_{i-1}}(x) =1 + O\left( \frac{1}{i^3}\right)&\text{(non } \text{ degeneracy) }, \end{aligned}$$
(13)
$$\begin{aligned} \sup _{\Vert x\Vert =1}&\left| \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{\sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} - 1\right| = O\left( \frac{1}{i^3} \right)&\text{(moderate } \text{ growth) }. \end{aligned}$$
(14)

Then there exists a convex \(C^k\) function \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\), such that

  • For all i in \( {\mathbb {N}}\), \(S_{3i}\) is a sublevel set of h,

  • h has positive definite Hessian,

  • h is globally Lipschitz continuous,

  • \(h^*\) has a compact domain D and is \(C^k\) and strictly convex on \(\mathrm {int}(D)\).

Proof

The construction of h follows the exact same principle as that of Theorem 2. This ensures that the first two points are valid. Note that Eq. (13) implies that the sequence of sets grows by at least a fixed amount in each direction as i grows. Hence we have \({\mathcal {T}}= {\mathbb {R}}^2\).

Global Lipschitz continuity of h: The values of h are defined through

$$\begin{aligned} \lambda _{i+1}-\lambda _{i}&=K_i(\lambda _{i}-\lambda _{i-1}),\;\forall i\in {\mathbb {N}}^*\\ K_i&= \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{i+1}}(x) - \sigma _{S_i}(x)}{ \sigma _{S_{i}}(x) - \sigma _{S_{i-1}}(x)} \in \left( 0, + \infty \right) , \end{aligned}$$

so that \(K_i = 1 + O(1/i^3)\) thanks to Eq. (14). Note that the moderate growth assumption entails

$$\begin{aligned} \sup _{i \in {\mathbb {N}}} \left( \sigma _{S_{i+1}}(x) - \sigma _{S_{i}}(x)\right) = O\left( \prod _{i \in {\mathbb {N}}^*} K_i\right) = O(1). \end{aligned}$$
(15)

For \(i\ge 1\), one has

$$\begin{aligned} \lambda _{i+1}-\lambda _{i}=\left( \prod _{1 \le j \le i} K_j\right) (\lambda _1-\lambda _0). \end{aligned}$$
(16)

On the other hand, using the bounds (13), (14) and the identity (16), there exists a constant \(\kappa >0\) such that for all \(i \ge 1\) and all \(\theta \) in \( {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\),

$$\begin{aligned} \frac{\sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta )) }{\lambda _{i+1} - \lambda _{i}} = \frac{\left( \sigma _{S_{i+1}}(n(\theta )) - \sigma _{S_{i}}(n(\theta ))\right) }{\prod _{j=1}^i K_j (\lambda _{1} - \lambda _{0})} \ge \kappa >0. \end{aligned}$$
(17)

By the interpolation properties described in Lemma 1, the function \(\left\langle G(\lambda ,\theta ),n(\theta )\right\rangle \) constructed in Theorem 2 has derivative with respect to \(\lambda \) greater than \(\kappa \). Recalling the expression of the gradient given in Proposition 1 (and the concavity of G with respect to \(\lambda \)), this shows that

$$\begin{aligned} \Vert \nabla h(x)\Vert \le \frac{1}{\kappa } \end{aligned}$$

for all x in \( {\mathbb {R}}^2\), and by the mean value theorem, h is globally Lipschitz continuous on \({\mathbb {R}}^2\).

Properties of the dual function: h is Legendre, and its conjugate \(h^*\) is therefore Legendre. From the positive definiteness of \(\nabla ^2 h\) and the fact that \(\nabla h :{\mathbb {R}}^2 \mapsto \mathrm {int}(D)\) is a bijection, we deduce that \(h^*\) is \(C^k\) by the inverse mapping theorem. So the only property left to establish is that \(h^*\) has a compact domain; in other words, using Lemma 3, it is sufficient to show that \(\sup _{x \in \mathrm {int} D} h^*(x) \le 0\).

Using the notation of the proof of Theorem 2, we will show that one can ensure that, for all \((\lambda ,\theta )\) in the domain of G,

$$\begin{aligned} \frac{\left\langle n(\theta ), G(\lambda ,\theta )\right\rangle }{\left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle } \le \lambda . \end{aligned}$$
(18)

Equation (18) is indeed the coordinate form of the characterization given in Lemma 3. Let us observe that

$$\begin{aligned} \frac{\partial }{\partial \lambda } \left( \left\langle n(\theta ), G(\lambda ,\theta )\right\rangle - \lambda \left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle \right) = -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle , \end{aligned}$$
(19)

and since \(\left\langle G(\cdot ,\theta ),n(\theta )\right\rangle \) is concave and \(\lambda \ge 0\), the right hand side is nonnegative.

Assume that we have proved that,

$$\begin{aligned} \lambda \mapsto \sup _\theta -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$
(20)

has finite integral as \(\lambda \rightarrow \infty \). Since the function

$$\begin{aligned} \theta \mapsto \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$
(21)

is continuous on \({\mathbb {R}}/2\pi {\mathbb {Z}}\) for any \(\lambda \), the Lebesgue dominated convergence theorem would ensure that

$$\begin{aligned} \theta \mapsto \int _{\lambda \ge \lambda _0} -\lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle d \lambda \end{aligned}$$

is continuous in \(\theta \), so that:

$$\begin{aligned} \sup _{\theta } \left[ \lim _{\lambda \rightarrow \infty } \left\langle n(\theta ), G(\lambda ,\theta )\right\rangle - \lambda \left\langle \frac{\partial G}{\partial \lambda }(\lambda ,\theta ), n(\theta ) \right\rangle \right] <+\infty . \end{aligned}$$

Shifting the values of h if necessary, we may assume that this upper bound equals zero, which yields Eq. (18). The latter being the condition required in Lemma 3, this would complete the proof.

Let us therefore establish that the function in (20) has finite integral over \({\mathbb {R}}_+\). Recall that G is constructed using the Bernstein interpolation given in Lemma 1 between successive values of \(\lambda \). As a result, for a fixed \(\theta \), the function \(\left\langle n(\theta ), G(\lambda ,\theta )\right\rangle \) is the Bernstein interpolation of the piecewise affine function interpolating

$$\begin{aligned}&\left( \lambda _{3i}, \sigma _{S_{3i}}(n(\theta ))\right) \nonumber \\&\left( \lambda _{3i+1}, \sigma _{S_{3i+1}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+2}, \sigma _{S_{3i+2}}(n(\theta ))\right) ,\nonumber \\&\left( \lambda _{3i+3}, \sigma _{S_{3i+3}}(n(\theta ))\right) , \end{aligned}$$
(22)

as in Eq. (10). This interpolation is concave and increasing.

Assumption (14) ensures that \(K_j = 1 + O(1/j^3)\). Then

$$\begin{aligned} \prod _{j=1}^m K_j = \bar{K} + O(1 / m^{2}) \end{aligned}$$

where \(\bar{K}\) is the finite, positive limit of the product (convergence follows, for example, by comparison with an integral after taking logarithms).
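For readers who wish to check this numerically, here is a short sanity check (with the illustrative model \(K_j = 1 + 1/j^3\); any sequence \(K_j = 1 + O(1/j^3)\) behaves the same way):

```python
# Sanity check (illustrative model K_j = 1 + 1/j**3): the partial products
# converge to a finite positive limit Kbar, with a tail of order O(1/m**2).
import numpy as np

m_max = 10_000
K = 1.0 + 1.0 / np.arange(1, m_max + 1) ** 3
partial = np.cumprod(K)
Kbar = partial[-1]                     # accurate proxy for the limit here

for m in (10, 100, 1000):
    tail = Kbar - partial[m - 1]
    print(f"m={m:5d}  prod={partial[m - 1]:.8f}  tail*m^2={tail * m ** 2:.4f}")
# tail*m^2 approaches a constant, consistent with the O(1/m**2) behavior.
```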

The recursion on the values reads, for all \(i\ge 1\),

$$\begin{aligned} \lambda _{i+1} = \lambda _i + K_i(\lambda _i - \lambda _{i-1}), \end{aligned}$$

so that

$$\begin{aligned} \lambda _{i+1} - \lambda _i= (\lambda _1 - \lambda _0) \prod _{j=1}^i K_j = (\lambda _1 - \lambda _0) \bar{K} + O(1/i^{2}). \end{aligned}$$

This means that the gap between consecutive values tends to a constant. Thus by (4) in Lemma 1, see also Remark 1, the degree of the Bernstein interpolants is bounded. Using this bound together with inequality (3), which provides bounds for the derivatives of Bernstein polynomials, ensures that, for all \(\lambda \) in \( [\lambda _{3i}, \lambda _{3i+3})\):

$$\begin{aligned}&\left| \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \right| \\&\quad =\, O\left( \max _{j = 3i+2, 3i+1}\left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\lambda _{j+1} - \lambda _{j}} - \frac{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) }{\lambda _{j} - \lambda _{j-1}} \right| \right) . \end{aligned}$$

Now for any \(j = 3i+2, 3i+1\),

$$\begin{aligned}&\left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\lambda _{j+1} - \lambda _{j}} - \frac{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) }{\lambda _{j} - \lambda _{j-1}} \right| \\&\quad =\, \frac{1}{\lambda _{j+1} - \lambda _{j}} \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - K_j (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta ))) \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - (1 + O(1/i^3)) (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta ))) \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times \left( \left| (\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))) - (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta ))) \right| + O(1/i^3)\right) \end{aligned}$$

where we used \((\lambda _{j+1} - \lambda _{j})/(\lambda _{j} - \lambda _{j-1}) = K_j = 1 + O(1/i^3)\), and where the last identity follows from the triangle inequality together with the bound \(\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) = O(1)\) from (15). Hence

$$\begin{aligned}&\left| \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \right| \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times O\left( \max _{j = 3i+2, 3i+1}\left| \sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta )) - (\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) ) \right| \right) \\&\quad =\, \left( 1 / ((\lambda _1 - \lambda _0) \bar{K}) + O(1/i^2)\right) \\&\qquad \times O\left( \max _{j = 3i+2, 3i+1}\left| \sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) \right| \times \left| \frac{\sigma _{S_{j+1}}(n(\theta )) - \sigma _{S_{j}}(n(\theta ))}{\sigma _{S_{j}}(n(\theta )) - \sigma _{S_{j-1}}(n(\theta )) } - 1 \right| \right) \\&\quad =\, O(1/i^3), \end{aligned}$$

where the last estimate follows from (15) and (14). Now, as \(i \rightarrow \infty \), \(\lambda _{3i} \sim \lambda _{3i + 3} \sim i c\) for some constant \(c>0\), and

$$\begin{aligned} \sup _{\lambda \in [\lambda _{3i}, \lambda _{3i+3}], \theta \in [0, 2 \pi ]} - \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle = O(1 / i^2) \end{aligned}$$

and

$$\begin{aligned} \sup _{ \theta \in [0, 2 \pi ]} - \lambda \left\langle \frac{\partial ^2 G}{\partial \lambda ^2}(\lambda ,\theta ), n(\theta ) \right\rangle \end{aligned}$$

has finite integral as \(\lambda \rightarrow \infty \). This establishes the integrability claim for (20) and concludes the proof. \(\square \)

4 Smooth convex interpolation for sequences of polygons

Given a sequence of points \(A_1,\ldots ,A_n\), we denote by \(A_1\ldots A_n\) the polygon obtained by joining successive points and closing the loop with the segment \([A_n,A_1]\). In the sequel we consider mainly convex polygons, so that the vertices \(A_1,\ldots ,A_n\) are also the extreme points.

The purpose of this section is first to show that polygons can be approximated by smooth convex sets with prescribed normals under weak assumptions. Figure 3 illustrates the result we would like to establish: given a target polygon with prescribed normals at its vertices, we wish to construct a smooth convex set interpolating the vertices with the desired normals and whose distance to the polygon is small.

Then given a sequence of nested polygons, we provide a smooth convex function which interpolates the polygons in the sense described just above.

Given a closed nonempty convex subset S of \({\mathbb {R}}^p\) and x in S, we recall that the normal cone to S at x is

$$\begin{aligned} N_S(x)=\left\{ z\in {\mathbb {R}}^p:\langle z,y-x\rangle \le 0,\forall y\in S\right\} . \end{aligned}$$

Such vectors will often simply be called normals (to S) at x.

4.1 Smooth approximations of polygons

Lemma 4

For any \(r_-,r_+ > 0\), \(t_- > 0\), \(t_+<0\) and \(\epsilon > 0\), \(m \in {\mathbb {N}}\), \(m \ge 3\), there exists a strictly concave polynomial function \(p :[0,1] \mapsto [0,\epsilon ]\) such that

$$\begin{aligned} p(0)&= 0&p(1)&= 0\\ p'(0)&= t_-&p'(1)&= t_+\\ p''(0)&= -r_-&p''(1)&= -r_+\\ p^{(q)}(0)&= p^{(q)}(1) = 0 \quad q \in \{3, \ldots , m\}. \end{aligned}$$
Fig. 3

Arrows designate the prescribed normals. We construct a strictly convex set with smooth boundary entirely contained in the auxiliary (blue) polygon. This set interpolates the normals and the distance to the original (red) polygon can be chosen arbitrarily small. The degree of smoothness of the boundary can be chosen arbitrarily high (color figure online)

Proof

Let us begin with preliminary remarks. Consider, for any a, b in \( {\mathbb {R}}\), the function

$$\begin{aligned} f :t \mapsto a(t-b)^2. \end{aligned}$$

For any t in \( {\mathbb {R}}\), q in \( {\mathbb {N}}\), \(q > 2\), and any \(c > 0\), we have

$$\begin{aligned} f(t + c) - f(t)&= \Delta ^1_c f(t) = a c ( 2(t-b) + c)\nonumber \\ f(t + 2 c) - 2 f(t+c) + f(t)&= \Delta ^2_c f(t) = a c \left( 2 (t + c - b) + c - 2(t-b) - c\right) = 2ac^2\nonumber \\ \Delta _c^q f(t)&= 0. \end{aligned}$$
(23)
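These identities are elementary; the following throwaway check (arbitrary test values, not part of the construction) verifies them for a few orders:

```python
# Check of Eq. (23): for the quadratic f(t) = a*(t-b)**2, the first forward
# difference is affine in t, the second is the constant 2*a*c**2, and
# differences of order q > 2 vanish.
import math

a, b, c, t = 1.7, 0.3, 0.01, 0.42
f = lambda s: a * (s - b) ** 2

def forward_difference(g, c, q):
    # q-th forward difference with step c: sum_k (-1)^(q-k) C(q,k) g(t + k*c)
    return lambda t: sum((-1) ** (q - k) * math.comb(q, k) * g(t + k * c)
                         for k in range(q + 1))

print(forward_difference(f, c, 1)(t), a * c * (2 * (t - b) + c))       # equal
print(forward_difference(f, c, 2)(t), 2 * a * c ** 2)                  # equal
print(forward_difference(f, c, 3)(t), forward_difference(f, c, 4)(t))  # ~0, ~0
```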

Choosing the degree d and constructing the polynomial. For any d in \( {\mathbb {N}}\), \(d \ge 2m + 1\), we set

$$\begin{aligned} a_-(d)&= \frac{-dr_-}{2(d - 1)}<0&b_-(d)&= \frac{1}{2d} \left( 1 + 2 t_-\frac{d-1}{r_-} \right) > 0\nonumber \\ a_+(d)&= \frac{-dr_+}{2(d - 1)}<0&b_+(d)&= 1 + \frac{1}{2d} \left( -1 + 2t_+\frac{d-1}{r_+} \right) < 1, \end{aligned}$$
(24)

and define the functions

$$\begin{aligned} f_d :s&\mapsto {\left\{ \begin{array}{ll} a_-(d) (( s - b_-(d))^2 - b_-(d)^2) &{} \text { if } s\le b_-(d)\\ - a_-(d) b_-(d)^2&{} \text { if } s\ge b_-(d) \end{array}\right. }\\ g_d :s&\mapsto {\left\{ \begin{array}{ll} a_+(d) (( s - b_+(d))^2 - (1-b_+(d))^2) &{} \text { if } s\ge b_+(d)\\ - a_+(d) (1-b_+(d))^2&{} \text { if } s\le b_+(d). \end{array}\right. } \end{aligned}$$

Furthermore, we set

$$\begin{aligned} f :s&\mapsto {\left\{ \begin{array}{ll} \frac{r_-}{2} \left( \left( \frac{t_-}{r_-} \right) ^2- \left( s - \frac{t_-}{r_-} \right) ^2 \right) &{} \text { if } s\le \frac{t_-}{r_-}\\ \frac{r_-}{2} \left( \frac{t_-}{r_-} \right) ^2 &{} \text { if } s\ge \frac{t_-}{r_-} \end{array}\right. }\\ g :s&\mapsto {\left\{ \begin{array}{ll} \frac{r_+}{2} \left( \left( \frac{t_+}{r_+} \right) ^2 - \left( s -1 - \frac{t_+}{r_+} \right) ^2 \right) &{} \text { if } s\ge 1 + \frac{t_+}{r_+}\\ \frac{r_+}{2} \left( \frac{t_+}{r_+} \right) ^2 &{} \text { if } s\le 1 + \frac{t_+}{r_+}. \end{array}\right. } \end{aligned}$$

Note that \(b_-(d) \rightarrow t_- / r_-\), \(b_+(d) \rightarrow 1 + t_+ / r_+\), \(a_-(d) \rightarrow - r_-/2\) and \(a_+(d) \rightarrow - r_+/2\) as \(d \rightarrow \infty \) so that \(f_d \rightarrow f\) and \(g_d \rightarrow g\) uniformly on [0, 1]. For any d, \(f_d\) is concave increasing and \(g_d\) is concave decreasing and all of them are Lipschitz continuous on [0, 1] with constants that do not depend on d. Note also that \(f(0) = 0 < g(0)\) and \(g(1) = 0 < f(1)\). We choose \(d \ge 2m + 1\) such that

$$\begin{aligned} f_d\left( \frac{m}{d} \right)&\le \min \left( \epsilon , g_d\left( \frac{m}{d} \right) \right) \nonumber \\ g_d\left( 1 - \frac{m}{d} \right)&\le \min \left( \epsilon , f_d\left( 1 - \frac{m}{d} \right) \right) . \end{aligned}$$
(25)

Such a d always exists because in both cases, the left hand side converges to 0 and the right hand side converges to a strictly positive term as d tends to \(\infty \). For such a d, we set \(h :s \mapsto \min \left\{ f_d(s), g_d(s),\epsilon \right\} \). By construction, h is concave, agrees with \(f_d\) on \(\left[ 0,\frac{m}{d} \right] \subset [0,1/2]\) and with \(g_d\) on \(\left[ 1- \frac{m}{d},1 \right] \subset [1/2,1]\). Using Eq. (23) with \(c = 1/d\), we deduce that

$$\begin{aligned} d (d-1) \Delta ^2 h(0)&= d (d-1) \Delta ^2 f_d(0) = d(d-1)\frac{2 a_-(d)}{d^2} = -r_-\\ d \Delta h(0)&= d \Delta f_d(0) = a_-(d)\left( \frac{1}{d} -2b_-(d)\right) = t_-\\ \Delta ^{q}h(0)&= 0 = \Delta ^{q}f_d(0) \quad \forall m \ge q \ge 3\\ d (d-1) \Delta ^2 h\left( 1 - \frac{2}{d}\right)&= d (d-1) \Delta ^2 g_d\left( 1 - \frac{2}{d}\right) = d(d-1)\frac{2 a_+(d)}{d^2} = -r_+\\ d \Delta h\left( 1 - \frac{1}{d} \right)&= d \Delta g_d\left( 1 - \frac{1}{d} \right) = a_+(d) \left( \frac{-1}{d} + 2 (1 - b_+(d))\right) = t_+\\ \Delta ^{q}h\left( 1 - \frac{m}{d} \right)&= \Delta ^{q}g_d\left( 1 - \frac{m}{d} \right) = 0 \quad \forall m \ge q \ge 3. \end{aligned}$$

From the concavity of h and the derivative formula (2), we deduce that the polynomial \(B_{h,d}\) satisfies the desired properties (Fig. 4). \(\square \)

Fig. 4

Illustration of the approximation result of Lemma 4 with \(\epsilon = 0.1\), \(m = 3\), \(t_- =0.7\), \(t_+ = -2.2\), \(r_- = 2\) and \(r_+ = 0.2\). The resulting polynomial is of degree 66. Numerical estimates of the first and second order derivatives at 0 and 1 match the required values to 3 digits of precision. The polynomial is strongly concave; however, this is barely visible because the strong concavity constant is extremely small
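The experiment of Fig. 4 is easy to reproduce. The sketch below is our own minimal reimplementation of the proof's construction (not the authors' code): it selects the degree by brute force from condition (25), builds \(h = \min (f_d, g_d, \epsilon )\), and checks the endpoint derivatives of the Bernstein polynomial through the finite-difference identities used in the proof.

```python
# Minimal sketch of Lemma 4 with the parameter values of the Fig. 4 caption.
import math

t_m, t_p, r_m, r_p, eps, m = 0.7, -2.2, 2.0, 0.2, 0.1, 3

def make_fg(d):
    a_m = -d * r_m / (2 * (d - 1)); b_m = (1 + 2 * t_m * (d - 1) / r_m) / (2 * d)
    a_p = -d * r_p / (2 * (d - 1)); b_p = 1 + (-1 + 2 * t_p * (d - 1) / r_p) / (2 * d)
    f_d = lambda s: a_m * ((min(s, b_m) - b_m) ** 2 - b_m ** 2)
    g_d = lambda s: a_p * ((max(s, b_p) - b_p) ** 2 - (1 - b_p) ** 2)
    return f_d, g_d

d = 2 * m + 1
while True:                                    # degree selection, condition (25)
    f_d, g_d = make_fg(d)
    if (f_d(m / d) <= min(eps, g_d(m / d))
            and g_d(1 - m / d) <= min(eps, f_d(1 - m / d))):
        break
    d += 1
print("degree d =", d)                         # 66 with these values, matching the caption

h = lambda s: min(f_d(s), g_d(s), eps)
B = lambda x: sum(h(k / d) * math.comb(d, k) * x ** k * (1 - x) ** (d - k)
                  for k in range(d + 1))
print(B(0.0), B(0.5), B(1.0))                          # 0, <= eps, 0
print(d * (h(1 / d) - h(0)))                           # = B'(0)  -> t_-  (0.7)
print(d * (d - 1) * (h(2 / d) - 2 * h(1 / d) + h(0)))  # = B''(0) -> -r_- (-2.0)
```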

We deduce the following result.

Lemma 5

Let \(a > 0\), \(r > 0\), \(\epsilon > 0\), and let \(m \ge 3\) be an integer. Consider two unit vectors: \(v_-\) with strictly positive entries and \(v_+\) with first entry strictly positive and second entry strictly negative. Then there exists a \(C^m\) curve \(\gamma :[0,M] \mapsto {\mathbb {R}}^2\) such that

  1. \(\Vert \gamma '\Vert = 1\).

  2. \(\gamma (0) = (-a, 0) := A\) and \(\gamma (M) = (a,0) := B\).

  3. \(\gamma '(0) = v_-\) and \(\gamma '(M) = v_+\).

  4. \(\Vert \gamma ''(0)\Vert = \Vert \gamma ''(M)\Vert = r\).

  5. \(\mathrm {det}(\gamma ',\gamma '') < 0\) along the curve.

  6. \(\gamma ^{(q)}(0) = \gamma ^{(q)}(M) = 0\) for any \(3 \le q \le m\).

  7. \(\mathrm {dist}(\gamma ([0,M]), [A,B]) \le \epsilon \).

Proof

Consider the graph of a polynomial as given in Lemma 4 with \(t_- = v_-[2] / v_-[1]> 0\), \(t_+ = v_+[2] / v_+[1]< 0\) and \(r_- = \frac{r}{2a} (1 + t_-^2)^{\frac{3}{2}}\), \(r_+ = \frac{r}{2a} (1 + t_+^2)^{\frac{3}{2}}\) and \(\epsilon /2a\) as an approximation parameter. This graph is parametrized by t. It is possible to reparametrize it by arclength to obtain a \(C^m\) curve \(\gamma _0\) whose unit tangent at the first endpoint is \(v_-\), at the last endpoint is \(v_+\), and whose curvature at these endpoints is \(- \frac{r}{2a}\). Furthermore, \(\gamma _0\) has strictly negative curvature, whence item 5. Consider the affine transform: \(x \mapsto 2a(x - 1/2)\), \(y \mapsto 2a y\). This results in a \(C^m\) curve \(\gamma \), parametrized by arclength, which satisfies the desired assumptions. \(\square \)

Lemma 6

(Normal approximations of polygons by smooth convex sets) Let \(S=A_1...A_n\) be a convex polygon. For each i, let \(V_i\) be in \( N_S(A_i)\) such that the angle between \(V_i\) and each of the two neighboring faces is within \(\left( \frac{\pi }{2},\pi \right) \). Then for any \(\epsilon > 0\) and any \(m \ge 2\), there exists a compact convex set \(C \subset {\mathbb {R}}^2\) such that

  (i) the boundary of C is \(C^m\) with nonvanishing curvature,

  (ii) \(S \subset C\),

  (iii) for any \(i = 1,\ldots , n\), \(A_i \) in \( \mathrm {bd}\,(C)\) and the normal cone to C at \(A_i\) is given by \(V_i\),

  (iv) \(\max _{y \in C} \mathrm {dist}(y,S) \le \epsilon \).

Proof

We assume without loss of generality that \(A_1,\ldots ,A_n\) are ordered clockwise. For \(i = 1, \ldots , n\) and each segment \([A_i,A_{i+1}]\) (with the convention \(A_{n+1} = A_1\)), we may perform a rotation and a translation to obtain \(A_i = -(a,0)\) and \(A_{i+1} = (a,0)\). Working in this coordinate system, using the angle condition on \(V_i\), we may choose \(v^-_i\), \(v^+_{i+1}\) satisfying the hypotheses of Lemma 5 respectively orthogonal to \(V_i\) and \(V_{i+1}\). Choosing \(r = 1\), we obtain \(\gamma _i :[0,M_i] \mapsto {\mathbb {R}}^2\) as given by Lemma 5. Rotations and translations affect only the direction of the derivatives of curves, not their length. Hence, it is possible to concatenate the curves \(\left( \gamma _i \right) _{i=1}^{n}\) and to preserve the \(C^m\) properties of the resulting curve. At end-points, tangents and second order derivatives coincide while higher derivatives vanish. Furthermore the curvature has constant sign and does not vanish. We obtain a closed \(C^m\) curve which defines a convex set satisfying all the requirements of the lemma. \(\square \)

Remark 5

(Bisector) Given any polygon, choosing the normal vectors along the bisector of each angle ensures that the above assumptions are satisfied. Hence all our approximation results hold for any given polygon without specifying the choice of outer normals.
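As a concrete illustration (a hypothetical helper of our own, not taken from the paper), bisector normals are straightforward to compute from clockwise-ordered vertices:

```python
# Outer normals along the angle bisectors of a convex polygon. Each V_i
# makes an angle in (pi/2, pi) with the two faces meeting at A_i, so the
# resulting polygon-normal pair is interpolable in the sense of
# Definition 1 below.
import numpy as np

def bisector_normals(vertices):
    V = []
    n = len(vertices)
    for i in range(n):
        prev_, cur, nxt = vertices[i - 1], vertices[i], vertices[(i + 1) % n]
        e1 = (prev_ - cur) / np.linalg.norm(prev_ - cur)  # toward previous vertex
        e2 = (nxt - cur) / np.linalg.norm(nxt - cur)      # toward next vertex
        inward = (e1 + e2) / np.linalg.norm(e1 + e2)      # bisector, points inward
        V.append(-inward)                                 # flip to get the outer normal
    return V

square = [np.array(p, float) for p in [(-1, 1), (1, 1), (1, -1), (-1, -1)]]
for A, Vn in zip(square, bisector_normals(square)):
    print(A, "->", np.round(Vn, 3))   # diagonal outer normals at the corners
```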

4.2 Smooth convex interpolants of polygonal sequences

Definition 1

(Interpolability) For \(n\ge 3\), let \(A_1\ldots A_n \) be a convex polygon S and \(V_i \) be in \( N_S(A_i)\) for \(i=1,\ldots ,n\). We say that \(\left( A_i,V_i \right) _{i=1}^n \) is interpolable if for each \(i = 1,\ldots ,n\), the angle between \(V_i\) and each of the two neighboring faces of the polygon is in \(\left( \frac{\pi }{2},\pi \right) \). The collection \(\left( A_i,V_i \right) _{i=1}^n \) is called a polygon-normal pair.

Let \(I = {\mathbb {Z}}\) or \(I = {\mathbb {N}}\). Let \(\left( PN_i \right) _{i \in I}\) be a sequence of interpolable polygon-normal pairs. Setting for i in I, \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_i \) is in \( {\mathbb {N}}\) and denoting by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), we say that the sequence \(\left( PN_i \right) _{i \in I}\) is strictly increasing if for all i in I, \(T_i \subset \mathrm {int}(T_{i+1})\).

Let \(\left( PN_i \right) _{i \in I}\) be a strictly increasing sequence of interpolable polygon-normal pairs. A sequence \((\epsilon _i)_{i \in I}\) in (0, 1) is said to be admissible if \(0 \in \mathrm {int}(T_i)\) for each i in I and

$$\begin{aligned} \gamma T_i\subset \mathrm {int}\,T_{i+1} \end{aligned}$$

for all \(\gamma \in [1-\epsilon _i,1+\epsilon _i].\) We have the following corollary of Theorem 2.

Corollary 1

(Smooth convex interpolation of polygon sequences) Let \(I = {\mathbb {Z}}\) or \(I = {\mathbb {N}}\). Let \(\left( PN_i \right) _{i \in I}\) be a strictly increasing sequence of interpolable polygon-normal pairs and \((\epsilon _i)_{i \in I}\) be admissible. Set \(\displaystyle {\mathcal {T}}:=\mathrm {int}\left( \cup _{i\in I}T_i \right) .\)

Then for any k in \( {\mathbb {N}}\), \(k \ge 2\) there exists a \(C^k\) convex function \(f :{\mathcal {T}} \mapsto {\mathbb {R}}\), and an increasing sequence \((\lambda _i)_{i \in I}\), with \(\inf _{i \in I} \lambda _i > -\infty \), such that for each i in I

  (i) \(T_i \subset \left\{ x,\,f(x) \le \lambda _i \right\} \).

  (ii) \(\mathrm {dist}(T_i, \left\{ x,\,f(x) \le \lambda _i \right\} ) \le \epsilon _i\).

  (iii) For each i in I and j in \(\{1,\ldots ,n_i\}\), we have \(f(A_{j,i}) = \lambda _i\) and \(\nabla f(A_{j,i})\) is colinear to \(V_{j,i}\).

  (iv) \(\nabla ^2 f\) is positive definite outside \({{\,\mathrm{argmin}\,}}f\). When there is a unique minimizer then \(\nabla ^2f\) is positive definite throughout \({\mathcal {T}}\) (this is the case when \(I = {\mathbb {N}}\) or when \(I={\mathbb {Z}}\) and \(\cap _{i \in I} T_i\) is a singleton).

We add two remarks which will be useful for directional convergence issues and the construction of Legendre functions:

  (a) If two consecutive elements of the sequence of interpolable polygon-normal pairs are homothetic with center 0 in the interior of both polytopes, then the restriction of the resulting convex function to this convex ring can be constructed such that all the sublevel sets within this ring are homothetic with the same center.

  (b) If further conditions are imposed on the elements of a strictly increasing interpolable polygon-normal pair, then the resulting function can be constructed to be Legendre and globally Lipschitz continuous (that is, its Legendre conjugate has bounded domain). This is a consequence of Proposition 2 and will be detailed in the next section.

4.3 More on Legendre functions and a pathological function with polyhedral domain

Using intensively polygonal interpolation, we build below a finite continuous Legendre function h on an \(\ell ^\infty \) square with oscillating “mirror lines”: \(t\rightarrow \nabla h^*(\nabla h(x_0)+tc)\).

We start with the following preparation proposition related to the Legendre interpolation of Proposition 2.

Lemma 7

Let \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) be a strictly increasing sequence of interpolable polygon-normal pairs. Setting for i in \({\mathbb {N}}^*\), \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_i\) is in \({\mathbb {N}}^*\) and denoting by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), we assume that

$$\begin{aligned} T_i=3i P,\,\forall i \in {\mathbb {N}}^*, \end{aligned}$$

where P is a fixed polygon which contains the unit Euclidean disk.

Then for any l in \({\mathbb {N}}\), \(l \ge 2\), there exists a strictly increasing sequence of sets \(\left( S_i \right) _{i \in {\mathbb {N}}, \,i\ge 2}\), such that for \(j \ge 1\),

  • \(S_{3j}\) interpolates the normals of \(PN_j\) in the sense of Lemma 6 with \(\mathrm {dist}(S_{3j}, T_j) \le 1 / (4(3j+2)^l)\)

  • \(S_{3j - 1} = \frac{3j - 1}{3j} S_{3j}\)

  • \(S_{3j + 1} = \frac{3j + 1}{3j} S_{3j}\)

This sequence has the following properties

  • there exists \(c>0\), such that for all j in \({\mathbb {N}}\), \(j \ge 3\) and for all unit vector x,

    $$\begin{aligned} c\ge \sigma _{S_{j+1}}(x) - \sigma _{S_j}(x) \ge 1 - \frac{1}{(j+1)^l}. \end{aligned}$$
    (26)
  • for all unit vector x,

    $$\begin{aligned} \left| \frac{\sigma _{S_{j+1}}(x) - \sigma _{S_j}(x)}{ \sigma _{S_{j}}(x) - \sigma _{S_{j-1}}(x)} -1\right| \le \frac{1}{j^l},\,\forall j\ge 3. \end{aligned}$$
    (27)

Proof

Set for all j in \({\mathbb {N}}^*\), \(\delta _j = \frac{1}{4(3j+2)^l}\) and let \(S_{3j}\) be the \(\delta _j\) interpolant of \(T_j=3jP\) as given by Lemma 6 so that \(\mathrm {dist}(S_{3j}, 3jP) \le \delta _j\). Since P contains the unit ball,

$$\begin{aligned} 3j P \subset S_{3j} \subset (3j + \delta _j) P. \end{aligned}$$
(28)

Now set

$$\begin{aligned} S_{3j - 1}&= \frac{3j - 1}{3j} S_{3j}\\ S_{3j + 1}&= \frac{3j + 1}{3j} S_{3j}. \end{aligned}$$

For any j in \({\mathbb {N}}^*\), it is clear that \(S_{3j-1} \subset \mathrm {int}(S_{3j})\) and \(S_{3j} \subset \mathrm {int}(S_{3j+1})\). Furthermore, by (28), we have

$$\begin{aligned} S_{3j+1} \subset \frac{3j+1}{3j} ( 3j + \delta _j) P \subset ((3j+1) + 2\delta _j) P \subset \mathrm {int}((3j+2) P) \subset \mathrm {int} (S_{3j+2}) \end{aligned}$$

so that we indeed have a strictly increasing sequence of sets. We obtain from the construction, for any j in \({\mathbb {N}}^*\), and any unit vector x,

$$\begin{aligned}&\sigma _{S_{3j}}(x) - \sigma _{S_{3j-1}}(x) =\sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)\nonumber \\&\quad = \frac{1}{3j} \sigma _{S_{3j}}(x) \in \left[ \sigma _P(x), \left( 1 + \frac{\delta _j}{3j} \right) \sigma _P(x) \right] \subset \left[ 1, \left( 1+\frac{\delta _j}{3j} \right) \sigma _P(x) \right] \nonumber \\&\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x) \le \sigma _P(x) (3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - \sigma _P(x)(3j+1)\nonumber \\&\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x) \ge \sigma _P(x) (3j +2) - \sigma _P(x)(3j+1)\left( 1 + \frac{\delta _j}{3j} \right) \nonumber \\&\quad = \sigma _P(x) \left( 1 - \delta _j \frac{3j+1}{3j} \right) \nonumber \\&\quad \ge 1 - \delta _j \frac{4}{3} \ge 1 - \frac{1}{(3j+2)^l} \end{aligned}$$
(29)

This proves (26). We deduce that for all j in \({\mathbb {N}}^*,\)

$$\begin{aligned}&\frac{\sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)}{ \sigma _{S_{3j}}(x) - \sigma _{S_{3j-1}}(x)} = 1, \text{ for } \text{ all } \text{ nonzero } \text{ vector } x, \\&\max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}{ \sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)} \le (3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - (3j+1)\\&\quad = 1 + \frac{3j+2}{3j+3} \delta _{j+1} \le 1 + \frac{1}{(3j +1)^l},\\&\min _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}{ \sigma _{S_{3j+1}}(x) - \sigma _{S_{3j}}(x)} \underset{(29)}{\ge } \frac{1 - \delta _j \frac{3j+1}{3j} }{\left( 1 + \frac{\delta _j}{3j} \right) }\\&\quad \ge \left( 1 - \delta _j \frac{3j+1}{3j} \right) \left( 1 - \frac{\delta _j}{3j} \right) \\&\quad \ge 1 - \delta _j\left( \frac{3j+1}{3j} + \frac{1}{3j} \right) \ge 1 - \delta _j \frac{5}{3} \ge 1 - \frac{1}{(3j+1)^l}. \end{aligned}$$

Furthermore, using the fact that \(t \mapsto \frac{1+t}{1-t}\) is increasing on \((-\infty ,1)\) and the fact that \(\delta _{j+1} \le \delta _j\),

$$\begin{aligned} \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\underset{(29)}{\le } \frac{1 + \frac{\delta _{j+1}}{3j + 3}}{1 - (3j+1) \frac{\delta _j}{3j}} \nonumber \\&\le \frac{1 + (3j+1)\frac{\delta _{j}}{3j}}{1 - (3j+1) \frac{\delta _j}{3j}} \nonumber \\&\le \frac{1 + \delta _j\frac{4}{3}}{1 - \delta _j \frac{4}{3}}. \end{aligned}$$
(30)

Setting \(s(t)= (1+t)/(1-t)\), we have, for all \(t \le 1/2\)

$$\begin{aligned} s'(t)&= \frac{2}{(1-t)^2}, \, s(0)=1\\ s''(t)&= \frac{4}{(1-t)^3} \le 24,\, s'(0)=2. \end{aligned}$$

Thus \(s(t) \le 1 + 2 t + 12 t^2\) on \((-\infty ,1/2]\). Since \(\frac{4}{3} \delta _j \le \frac{4}{75} \le \frac{1}{2}\), we deduce from the previous remark and (30) above:

$$\begin{aligned} \max _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\le 1 + \frac{8}{3} \delta _j + \frac{64}{3} \delta _j^2 = 1 + \delta _j\left( \frac{8}{3} + \frac{64}{3} \delta _j \right) \\&\le 1 + \delta _j\left( 3 + \frac{64}{3 \times 25}\right) \le 1 + 4 \delta _j = 1 + \frac{1}{(3j+2)^l}. \end{aligned}$$

Finally using (29) again,

$$\begin{aligned} \min _{\Vert x\Vert = 1} \frac{\sigma _{S_{3j+3}}(x) - \sigma _{S_{3j+2}}(x)}{ \sigma _{S_{3j+2}}(x) - \sigma _{S_{3j+1}}(x)}&\ge \frac{1 }{(3j +2) \left( 1 + \frac{\delta _{j+1}}{3j+3} \right) - (3j+1) } \\&= \frac{1 }{1 + \delta _{j+1}\frac{3j+2}{3j+3} } \\&\ge 1 - \delta _{j+1}\frac{3j+2}{3j+3} \ge 1 - \delta _{j+1} \ge 1 - \frac{1}{(3j+2)^l}. \end{aligned}$$

This proves the desired result. \(\square \)
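The scaling mechanics of this proof can also be checked numerically. The sketch below models the smooth interpolants \(S_{3j}\) of Lemma 6 by an arbitrary direction-dependent perturbation of the support function of \(3jP\) of size at most \(\delta _j\), which is all that (28) uses, and verifies (26) (lower bound) and (27) on a grid of directions.

```python
# Numerical sanity check of (26)-(27) under the model "support function of
# 3jP plus a perturbation of size at most delta_j" allowed by (28).
import numpy as np

rng = np.random.default_rng(0)
l = 3
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # unit vectors
P = np.array([(2, 0), (0, 2), (-2, 0), (0, -2)], float)   # polygon containing the unit disk
sigma_P = (X @ P.T).max(axis=1)                           # support function of P

sig = {}
for j in range(1, 40):
    delta_j = 1 / (4 * (3 * j + 2) ** l)
    s3j = 3 * j * sigma_P + delta_j * rng.uniform(0, 1, size=theta.size)
    sig[3 * j - 1] = (3 * j - 1) / (3 * j) * s3j
    sig[3 * j] = s3j
    sig[3 * j + 1] = (3 * j + 1) / (3 * j) * s3j

worst26, worst27 = np.inf, -np.inf
for j in sorted(sig)[1:-1]:
    gap_hi, gap_lo = sig[j + 1] - sig[j], sig[j] - sig[j - 1]
    worst26 = min(worst26, (gap_hi - (1 - 1 / (j + 1) ** l)).min())
    worst27 = max(worst27, (np.abs(gap_hi / gap_lo - 1) - 1 / j ** l).max())
print(worst26 >= 0, worst27 <= 0)   # both True: (26) and (27) hold on the grid
```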

Combining Lemma 7 and Proposition 2, we obtain the following result.

Theorem 3

Let \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) be a strictly increasing sequence of interpolable polygon-normal pairs. Set for i in \({\mathbb {N}}^*\), \(PN_i = \left\{ \left( A_{j,i} \right) _{j=1}^{n_i}, \left( V_{j,i} \right) _{j=1}^{n_i} \right\} \) where \(n_i\) is in \({\mathbb {N}}^*\), denote by \(T_i\) the polygon \(A_{1,i}\ldots A_{n_i,i}\), and assume that for all i in \( {\mathbb {N}}^*\), \(T_i=3i P\) where P is a fixed polygon which contains the unit disk in its interior.

Then for any k in \( {\mathbb {N}}\), \(k \ge 2\) and all \(l \ge 3\), there exists a \(C^k\) globally Lipschitz continuous Legendre function, \(h :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\), and an increasing sequence \((\lambda _i)_{i \in {\mathbb {N}}}\), such that for each i in \( {\mathbb {N}}\):

  • \(T_i \subset \left\{ x,\,h(x) \le \lambda _i \right\} \),

  • \(\mathrm {dist}(T_i, \left\{ x,\,h(x) \le \lambda _i \right\} ) \le \frac{1}{4(3i+2)^l}\),

  • for each vertex \(A_{j,i}\) of \(T_i\), we have \(h(A_{j,i}) = \lambda _i\) and \(\nabla h(A_{j,i})\) is colinear to \(V_{j,i}\),

  • h has positive definite Hessian and is globally Lipschitz continuous,

  • \(h^*\) has compact domain and is \(C^k\) on the interior of its domain.

Corollary 2

(Continuity on the domain) The function \(h^*\) constructed in Theorem 3 has compact polygonal domain and is continuous on this domain.

Proof

Since P is a polygon and contains the unit Euclidean disk, the gauge function of 3P is polyhedral with full domain \({\mathbb {R}}^2\), call it \(\omega \). Denote by \(P^{\circ }\) the polar of 3P. This is a polytope and, since \(\omega \) is the gauge of 3P, we actually have \(\omega = \sigma _{P^{\circ }}\), the support function of the polar of 3P [35, Theorem 1.7.6]. Hence the convex conjugate of \(\omega \) is the indicator of the polytope \(P^\circ \) [37, Theorem 13.2].

It can be easily seen from the proof of Proposition 2 that \(\lambda _i = \alpha i + r_i\) with \(r_i = O(1)\) as \(i \rightarrow \infty \). Without loss of generality, we may suppose that \(\alpha = 1\) (this is a simple rescaling) so that there is a positive constant c such that \(|\lambda _i - i| \le c\) for all i.

Let h be given as in Theorem 3, fix \(i \ge 1\) and \(x \in {\mathbb {R}}^2\) such that \(\lambda _{i-1} \le h(x) \le \lambda _i\). We have in \({\mathbb {R}}^2\)

$$\begin{aligned} \{y:\, \omega (y) \le i-1 \} \subset \{y:\, h(y) \le \lambda _{i-1} \} \subset \{y:\, h(y) \le \lambda _{i} \} \subset \{y:\, \omega (y) \le i+1 \} \end{aligned}$$

and hence

$$\begin{aligned} i - 1 \le \omega (x) \le i+1 \end{aligned}$$

and we deduce that

$$\begin{aligned} \omega (x) - 2 - c \le i - 1 -c \le \lambda _{i-1} \le h(x) \le \lambda _i \le i+c \le \omega (x) + c + 1. \end{aligned}$$

Since i was arbitrary, this shows that there exists a constant \(C>0\) such that \(|h(x) - \omega (x)|\le C\) for all \(x \in {\mathbb {R}}^2\). Recall that \(z \mapsto \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y)\) is the indicator function of \(P^\circ \), hence,

$$\begin{aligned} z \in P^{\circ }&\Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y) = 0 \Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - h(y) \le C < +\infty \\ z \not \in P^{\circ }&\Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - \omega (y) = +\infty \Rightarrow \sup _{y \in {\mathbb {R}}^2} \left\langle y,z\right\rangle - h(y) = +\infty \end{aligned}$$

which shows that the domain of \(h^*\) is actually \(P^{\circ }\), which is a polytope. Now, \(h^*\) is convex and lower semicontinuous on \(P^{\circ }\); invoking the results of [23], it is also upper semicontinuous on \(P^{\circ }\) and finally it is continuous on \(P^{\circ }\). \(\square \)
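A quick grid-based illustration of this conjugacy argument, with the concrete choice of P used in Corollary 3 below (the \(\ell ^1\) ball of radius 2, so that the gauge of 3P is \(\Vert \cdot \Vert _1/6\) and the polar of 3P is the \(\ell ^\infty \) ball of radius 1/6):

```python
# sup_y <y,z> - omega(y) is 0 inside the polar of 3P and grows with the
# grid radius outside, so the conjugate has exactly that polar as domain.
import numpy as np

grid = np.linspace(-50, 50, 201)
ys = np.array([(u, v) for u in grid for v in grid])
omega_vals = np.abs(ys).sum(axis=1) / 6.0          # omega = gauge of 3P

for z in [(0.1, 0.1), (1 / 6, -1 / 6), (0.2, 0.0), (0.3, 0.3)]:
    conj = (ys @ np.array(z) - omega_vals).max()
    inside = max(abs(z[0]), abs(z[1])) <= 1 / 6 + 1e-12
    print(z, "inside polar" if inside else "outside polar", round(conj, 3))
```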

Corollary 3

(A pathological Legendre function) For any \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \) there exists a Legendre function \(h:{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) whose domain is a closed square, continuous on this domain and \(C^k\) on its interior, such that for all \(i \in {\mathbb {N}}^*\), \(\nabla h^*(i, 0)\) is proportional to \((\cos (\theta ), (-1)^i \sin (\theta ))\).

Proof

For \(x=(u,v)\), set \(\Vert x\Vert _1=|u|+|v|\), and let \(P = \left\{ x \in {\mathbb {R}}^2,\, \Vert x\Vert _1 \le 2 \right\} \). Let us construct a strictly increasing sequence of interpolable polygon-normal pairs \(\left( PN_i \right) _{i \in {\mathbb {N}}^*}\) as follows: we fix \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \) and set for all \(i\in {\mathbb {N}}^*\):

  • \(T_i = 3i P\), the polygon associated to the i-th term \(PN_i\) of the sequence,

  • except at the rightmost corner, consider the normals given by the canonical basis vectors and their opposites,

  • at the rightmost corner, (6i, 0), one chooses the normal given by the vector

    $$\begin{aligned} (\cos (\theta ), (-1)^i \sin (\theta )). \end{aligned}$$

We now invoke Theorem 3 to obtain a Lipschitz continuous Legendre function, denoted \(h^*\), with full domain having all the \(T_i\) as sublevel sets and satisfying the hypotheses of the corollary. Rescaling by a factor 6 and setting \(h = h^{**}\) gives the result. \(\square \)

5 Counterexamples in continuous optimization

We are now in position to apply our interpolation results to build counterexamples to classical problems in convex optimization, in situations ranging from structural questions to the qualitative behavior of algorithms and ODEs. Through 9 counterexamples we try to cover a large spectrum, but there are many more possibilities that are left for future research. Some examples are constructed from decreasing sequences of convex sets; these can be interpolated using Theorem 2 with \(I = {\mathbb {Z}}\), indexing the sequence with negative indices and artificially adding sets for positive indices. Nonetheless, for simplicity, we sometimes depart from the notations of the first sections and index these sequences by \({\mathbb {N}}\) even though they are decreasing.

5.1 Kurdyka-Łojasiewicz inequality may not hold

The following result is proved in [12], where it was crucial for constructing a \(C^2\) convex function which does not satisfy the Kurdyka–Łojasiewicz (KL) inequality.

Lemma 8

[12, Lemma 35] There exists a decreasing sequence of compact convex sets \(\left( T_i \right) _{i\in {\mathbb {N}}}\) such that for any i in \( {\mathbb {N}}\), \(T_- = T_{i+1}\) and \(T_+ = T_i\) satisfy Assumption 1 and

$$\begin{aligned} \sum _{i=0}^{+\infty } \mathrm {dist}(T_i, T_{i+1}) = +\infty \end{aligned}$$

As a corollary, we improve the counterexample in [12] and provide a \(C^k\) convex counterexample for any \(k\ge 2 \) in \( {\mathbb {N}}\).

Corollary 4

(Smooth convex functions are not KL in general) There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) which does not satisfy KL inequality. More precisely, for any \(r > \inf f\) and \(\varphi :[\inf f, r] \mapsto {\mathbb {R}}\) continuous and differentiable on \((\inf f, r)\) with \(\varphi ' > 0\) and \(\varphi (\inf f) = 0\), we have

$$\begin{aligned} \inf \{\Vert \nabla (\varphi \circ f)(x) \Vert : \, x\in {\mathbb {R}}^2, \,\inf f< f(x) < r \} = 0. \end{aligned}$$

Proof

Using [35, Theorem 1.8.13], each \(T_i\) can be approximated up to arbitrary precision by a polygon. Hence we may assume that all \(T_i\) are polygonal while preserving the property of Lemma 8 as well as Assumption 1. Furthermore, using Lemma 6 and Remark 5, each \(T_i\) can in turn be approximated with arbitrary precision by a convex set with \(C^k\) boundary and positive curvature. Hence we may also assume that all \(T_i\) satisfy the conclusion of Lemma 8 and have \(C^k\) boundary with nonvanishing curvature. Reversing the order of the sets and artificially adding sets, we are in the conditions of application of Theorem 2 with \(I = {\mathbb {Z}}\), and the resulting f follows from the same argument as in [12, Theorem 36]. \(\square \)

5.2 Block coordinate descent may not converge

Fig. 5

Illustration of the alternating minimization (resp. exact line search) example: on the left, the sublevel sets in gray and the corresponding alternating minimization (resp. exact line search) sequence in dashed lines. On the right the interpolating polygons together with their normal vectors as in Lemma 6

The following polygonal construction is illustrated in Fig. 5. For any \(n\ge 2\) in \({\mathbb {N}}\), we set

$$\begin{aligned} A_n&= \left( \frac{1}{4} + \frac{1}{n}, \frac{1}{4} + \frac{1}{n}\right) \\ B_n&= \left( \frac{1}{4} + \frac{1}{2( n-1)} + \frac{1}{2n}, \frac{1}{4} + \frac{1}{2n} + \frac{1}{2(n+1)} \right) \\ C_n&= \left( \frac{1}{4} + \frac{1}{n}, - \frac{1}{4} - \frac{1}{n}\right) \\ D_n&= \left( -\frac{1}{4} - \frac{1}{n}, - \frac{1}{4} - \frac{1}{n}\right) \\ E_n&= \left( -\frac{1}{4} - \frac{1}{n}, + \frac{1}{4} + \frac{1}{n}\right) . \end{aligned}$$

This defines a convex polygon. We may choose the normals at \(A_n, C_n, D_n, E_n\) to be bisectors of the corresponding corners and the normal at \(B_n\) to be horizontal (see Fig. 5). Rotating by an angle of \(-\frac{n\pi }{2}\) and repeating the process indefinitely, we obtain the sequence of polygons depicted in Fig. 5. It can be checked that the polygons form a strictly decreasing sequence of sets since, for \(n>1\), the polygon \(A_nB_nC_nD_nE_n\) is contained in the interior of the square \(A_{n-1}C_{n-1}D_{n-1}E_{n-1}\). This fulfills the requirements of Corollary 1.
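This construction is elementary to generate programmatically; the sketch below (our own illustration of the vertex formulas above) also verifies the strict nesting needed to invoke Corollary 1:

```python
# Generating the rotated pentagons of Fig. 5 and checking that each one
# lies strictly inside the square A C D E of its predecessor (the square
# is invariant under quarter turns, so the sup-norm test is valid).
import numpy as np

def pentagon(n):
    s = 0.25 + 1.0 / n
    B = (0.25 + 0.5 / (n - 1) + 0.5 / n, 0.25 + 0.5 / n + 0.5 / (n + 1))
    return np.array([(s, s), B, (s, -s), (-s, -s), (-s, s)])

def rotate(V, n):
    a = -n * np.pi / 2
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return V @ R.T

for n in range(2, 8):
    Vn = rotate(pentagon(n), n)
    side_prev = 0.25 + 1.0 / (n - 1)          # half side of the previous square
    print(n, bool(np.abs(Vn).max() < side_prev))   # True: strict nesting
```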

Corollary 5

There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initialization \(x_0=(u_0,v_0)\) such that the recursion, for \(i \ge 0\),

$$\begin{aligned} u_{i+1}&\in \mathop {\hbox {argmin}}\limits _{u} f(u,v_i) \\ v_{i+1}&\in \mathop {\hbox {argmin}}\limits _{v} f(u_{i+1},v) \end{aligned}$$

produces a non converging sequence \(\displaystyle (x_i)_{i\in {\mathbb {N}}}=\left( (u_i,v_i)\right) _{i\in {\mathbb {N}}}\).

Proof

We apply Corollary 1 to the proposed decreasing sequence, choosing \((u_0,v_0) = B_2\) for example. This requires shifting indices (starting with \(i = 2\)) and using Theorem 2 with \(I = {\mathbb {Z}}\). Note that the optimality condition for partial minimization and the fact that the level sets have nonvanishing curvature ensure that the partial minima are unique. \(\square \)

In the nonsmooth convex case, cyclic minimization is known to fail to provide the infimum value, see e.g., [3, p. 94]. Smoothness is sufficient for establishing value convergence (see e.g. [10, 39] and references therein); whether it is enough for obtaining convergence of the iterates was an open question. Our counterexample closes this question and shows that cyclic minimization does not yield converging sequences even for \(C^k\) convex functions. This result also closes the question for the more general nonconvex case, for which we are not aware of a nontrivial counterexample to convergence of alternating minimization. Let us mention however Powell's example [34], which shows that cyclic minimization with three blocks does not converge for smooth functions.
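For concreteness, here is what the recursion of Corollary 5 looks like in code, run on a plain strongly convex quadratic where the partial minimizations are explicit and the method does converge; the corollary exhibits a \(C^k\) convex f on which the very same recursion cycles without converging.

```python
# Alternating minimization on f(u,v) = 0.5 * [u v] Q [u v]^T.
import numpy as np

Q = np.array([[2.0, 1.0], [1.0, 3.0]])         # positive definite
f = lambda u, v: 0.5 * np.array([u, v]) @ Q @ np.array([u, v])

u, v = 5.0, -3.0
for _ in range(20):
    u = -Q[0, 1] * v / Q[0, 0]                 # argmin_u f(u, v): solve df/du = 0
    v = -Q[1, 0] * u / Q[1, 1]                 # argmin_v f(u, v): solve df/dv = 0
print(u, v, f(u, v))                           # -> the minimizer (0, 0) here
```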

It would also be interesting to understand how our result may impact dual methods and counterexamples in that field, as for instance the recent three blocks counterexample in [17].

5.3 Gradient descent with exact line search may not converge

Gradient descent with exact line search is governed by the recursion:

$$\begin{aligned} x^+\in {{\,\mathrm{argmin}\,}}\left\{ f(y):y=x-t\nabla f(x),\,t\in {\mathbb {R}}\right\} , \end{aligned}$$

where x is a point in the plane.

Observe that the step coincides with partial minimization when the gradient \(\nabla f(x)\) is colinear to one of the axes of the canonical basis. From the previous section, we thus deduce the following.

Corollary 6

(Failure of gradient descent with exact line search) There exists a \(C^k\) convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initialization \(x_0\) in the plane such that the recursion, for \(i \ge 0\),

$$\begin{aligned} x_{i+1} \in {{\,\mathrm{argmin}\,}}\left\{ f(y): y=x_i-t\nabla f(x_i),\, t\in {\mathbb {R}}\right\} \end{aligned}$$

produces a well defined non converging sequence \(\left( x_i \right) _{i \in {\mathbb {N}}}\).

Convergence failure for gradient descent with exact line search is, to our knowledge, new. Let us mention that despite nonconvergence, the constructed sequence satisfies sublinear convergence rates in function values [10].

5.4 Tikhonov regularization path may have infinite length

Fig. 6

Illustration of the Tikhonov regularization example. On the left, in gray, the polygons used to build the sublevel sets of the constructed f and the corresponding solutions to (31) for some values of r (solutions are joined by dotted lines). On the right, the normals to be chosen to apply Lemma 6 (for \(n = 1\), see main text for details). The point P represents \({\bar{x}}\); it sits on the x-axis and is contained in the normal cone at \(B_n\) for every \(n \ge 1\)

Following [36], we consider for any \(r > 0\)

$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ f(x) + r \Vert x - \bar{x}\Vert _2^2:x\in {\mathbb {R}}^2\right\} \end{aligned}$$
(31)

where f is \(C^k\) convex and where \({\bar{x}}\) is any anchor point. We would like to show that the curve \(r \mapsto x(r)\) may have infinite length. Torralba provided a counterexample for continuous convex functions in his PhD thesis [36]. We extend his result to smooth \(C^k\) convex functions on \({\mathbb {R}}^2\).

For any n in \( {\mathbb {N}}^*\), we set

$$\begin{aligned} A_n&= \left( \frac{2}{n}, \frac{2}{n}\right) \\ B_n&= \left( \frac{2}{n} + \frac{1}{n^2}, -\frac{1}{n} \right) \\ C_n&= \left( \frac{2}{n}, - \frac{2}{n}\right) \\ D_n&= \left( - \frac{2}{n}, - \frac{2}{n}\right) \\ E_n&= \left( - \frac{2}{n}, \frac{2}{n}\right) . \end{aligned}$$

This is depicted in Fig. 6. For all \(n \ge 1\), denote by \(M_n\) the point on the x axis above \(B_n\), by \(N_n\) the intersection of the normal cone at \(B_n\) with the x axis, and by \(A'_n = (2/n + 1/n^2, 2/n)\) the projection of \(A_n\) onto the vertical line through \(B_n\). We have

$$\begin{aligned} \frac{M_nN_n}{M_nB_n} = n \times M_nN_n= \frac{A'_nB_n}{A_nA'_n} = \frac{3/n}{1/n^2} = 3 n, \end{aligned}$$

so that for all \(n \ge 1\), \(M_nN_n = 3\) and \(N_n = (3 + 2/n + 1 / n^2, 0)\). Choosing \(P = (7,0)\), since for \(n \ge 1\), \(3 + 2/n + 1 / n^2 \le 6 < 7\), this shows that P belongs to the interior of the normal cone at \(B_n\) for all \(n \ge 1\). The sequence of level sets is constructed as in Fig. 6 by considering alternating symmetries with respect to the x-axis of the sequence of polygons above. It can be checked that the polygons form a strictly decreasing sequence of sets since, for \(n>1\), the polygon \(A_nB_nC_nD_nE_n\) is contained in the interior of the square \(A_{n-1}C_{n-1}D_{n-1}E_{n-1}\). We choose the normal at \(A_n, C_n, D_n, E_n\) to belong to the bisector at the corner and the normal at \(B_n\) to be proportional to the vector \(B_n P\). Applying Corollary 1, we construct f and choose \(\bar{x} = P\) in (31) to obtain the following:
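These geometric claims are easy to verify numerically (a sanity check with the vertex formulas; the outward normal to the edge \(A_nB_n\) is a quarter turn of the edge, the vertices being ordered clockwise):

```python
# The normal cone direction at B_n meets the x axis at
# N_n = (3 + 2/n + 1/n^2, 0), so M_nN_n = 3 and P = (7,0) lies beyond N_n.
import numpy as np

for n in range(1, 8):
    A = np.array([2 / n, 2 / n])
    B = np.array([2 / n + 1 / n ** 2, -1 / n])
    edge = B - A
    normal = np.array([-edge[1], edge[0]])     # outward normal to edge A_n B_n
    t = -B[1] / normal[1]                      # reach the x axis from B_n
    N = B + t * normal
    print(n, np.round(N, 6), "M_nN_n =", round(N[0] - B[0], 6))   # always 3
```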

Corollary 7

(A bounded infinite length Tikhonov path) There exists a \(C^k\) strictly convex function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and \(\bar{x} \in {\mathbb {R}}^2\) such that the curve x((0, 1)) given by (31) has infinite length.

Proof

We apply Corollary 1 with \(I = {\mathbb {Z}}\) and reverse the index set to match the sequence that we have described. For any \(n \ge 1\) there exists a value \(\lambda _n\) such that \(f(B_n)= \lambda _n\) and \(\nabla f(B_n)\) is colinear to the vector \(B_n P\). Setting

$$\begin{aligned} r = \frac{\Vert \nabla f(B_n)\Vert }{2 B_nP} \end{aligned}$$

we have \(\nabla f(B_n) + 2r(B_n - P) = 0\), which is the optimality condition in (31) with \(\bar{x} = P\). Hence there exists a value of r such that \(B_n\) is the solution to (31). Since n was arbitrary, this is true for all n, and the curve \(r \mapsto x(r)\) has to go through a sequence of points whose second coordinates are of the form \((-1)^n/n\), \(n \ge 1\). Since this sequence is not absolutely summable, the curve has infinite length. \(\square \)

This result is in contrast with the definable case for which we have finite length by the monotonicity lemma, since the whole trajectory is definable and bounded.

5.5 Secants of gradient curves at infinity may not converge

Thom’s gradient conjecture and Kurdyka–Mostowski–Parusinski’s theorem

A theorem of Łojasiewicz [27] asserts that bounded solutions to the gradient system

$$\begin{aligned} {\dot{x}}(t) = - \nabla f(x(t)) \end{aligned}$$

converge when f is a real analytic potential. Thom conjectured in [38] that this convergence should occur in a stronger form: trajectories converging to a given \(\bar{x}\) should admit a tangent at infinity, that is

$$\begin{aligned} \frac{x(t) - \bar{x}}{\Vert x(t) - \bar{x}\Vert } \end{aligned}$$
(32)

should have a limit as \(t \rightarrow \infty \). Lines passing through \({\bar{x}}\) and having (32) as a slope are called secants of x at \({\bar{x}}\). This conjecture was proved to be true in [26]. In the convex world, it is well known that solutions to the gradient system converge for general potentials (this is a Fejér monotonicity argument due to Bruck); see also the original approaches by Manselli and Pucci [30] and Daniilidis et al. [19]. It is then natural to wonder whether this convergence satisfies higher order rigidity properties as in the analytic case. The answer turns out to be negative in general, yielding a quite mysterious phase portrait.

Fig. 7

\(A = (-5,0)\), \(B=\left( - \frac{5}{3} - \frac{25}{16}, \frac{10}{3} + \frac{5}{4} \right) \), \(C=\left( \frac{-5}{2},5 \right) \), \(D=\left( 0,5 \right) \), \(E = \left( 5,0 \right) \), \(F = \left( 0, -5 \right) \). All normals are chosen to be bisectors except w which is parallel to the line (DE). The vector v is orthogonal to the segment [BC]. The point \(C'\) is obtained by considering the intersection between the line (Bw) (starting from B with direction w), and the segment [OC]. The points \(A'\), \(B'\), \(D'\), \(E'\), \(F'\) are obtained by performing a scaling of ABDEF of a factor \(\frac{OC'}{OC}\). The polygon \(A''B''C''D''E''F''\) is ABCDEF scaled by a factor \(\frac{OC' + OC}{2 OC}\)

Absence of tangential convergence for convex potentials

The construction given in this paragraph is more complex than the previous ones; we start with a technical lemma which will be the basic building block for our counterexample.

Lemma 9

Let S be a convex set with \(C^k\) boundary interpolating ABCDEF in Fig. 7 and let g be the gauge function associated to S. The function g is differentiable outside the origin. Consider any initialization \(x_0 \) in [BC] with corresponding trajectory solution to the equation

$$\begin{aligned} {\dot{x}}(t)&= - \nabla g(x(t)), \, t\ge 0,\\ x(0)&= x_0. \end{aligned}$$

Set \(\bar{t} = \sup \left\{ t \ge 0 : x(t) \in OBC\right\} \); then \(\bar{t} < + \infty \) and \(x(\bar{t}) \) in \( [CC']\).

Proof

The fact that g is differentiable outside the origin comes from the fact that its subgradient there is uniquely determined by the normal cone to S (which has dimension one because of the smoothness of the boundary of S). Since S interpolates the polygon, we have \(g(B) = g(C) = 1\). Furthermore, we have for all \(t\ge 0\), \(\frac{d}{dt} g(x(t)) = -\Vert \nabla g(x(t))\Vert ^2= -1\), hence \(\bar{t} \le 1 - g(C')\). By homogeneity, for any \(x \ne 0\) and \(s > 0\), \(\nabla g (sx) = \nabla g(x)\). For any x in [BC], by convexity

$$\begin{aligned} 0 \le \left\langle C-x,\nabla g(C) - \nabla g(x) \right\rangle = \left\langle C-B,\nabla g(C) - \nabla g(x) \right\rangle \frac{\Vert C-x\Vert }{\Vert C-B\Vert }, \end{aligned}$$

and therefore

$$\begin{aligned} -\left\langle C - B, \nabla g(C) \right\rangle \le -\left\langle C - B, \nabla g(x) \right\rangle \end{aligned}$$
(33)

By homogeneity of g, (33) is true for any x in the triangle OCB (different from 0) and thus in the triangle \(C'CB\). Denote by y the solution to the equation

$$\begin{aligned} {\dot{y}}&= - \nabla g(C)\\ y(0)&= B, \end{aligned}$$

which integrates to \(y(t) = B - t w\) for all t. Equation (33) ensures that for any \(0\le t \le \bar{t}\)

$$\begin{aligned} \frac{d}{dt} (\left\langle C - B, x(t) \right\rangle )&\ge \frac{d}{dt} (\left\langle C - B, y(t) \right\rangle ) \end{aligned}$$

Hence, we have for any \(0\le t \le \bar{t}\), integrating on [0, t]

$$\begin{aligned} \left\langle C - B, x(t) \right\rangle&\ge \left\langle C - B, y(t) \right\rangle + \left\langle C - B, x_0 - B \right\rangle \nonumber \\&\ge \left\langle C - B, y(t) \right\rangle . \end{aligned}$$
(34)

Furthermore, for all x in [BC], we have

$$\begin{aligned} 1 = \Vert \nabla g(x)\Vert ^2 = \frac{1}{\Vert C - B\Vert ^2}\left\langle C - B, \nabla g(x) \right\rangle ^2 + \left\langle v, \nabla g(x) \right\rangle ^2, \end{aligned}$$

because v is orthogonal to \(C-B\). The first term is maximal for \(x = C\), and thus the second term is minimal for \(x = C\); we thus have, for all x in [BC],

$$\begin{aligned} 0 < \left\langle \nabla g(C), v \right\rangle = \left\langle - \nabla g(C), -v \right\rangle \le \left\langle \nabla g(x), v \right\rangle = \left\langle -\nabla g(x), -v \right\rangle \le 1. \end{aligned}$$
(35)

Equation (35) holds for all x in OCB different from O by homogeneity. We deduce that for all \(0 \le t \le \bar{t}\), we have

$$\begin{aligned} \frac{d}{dt} (\left\langle -v, x(t) \right\rangle )&\ge \frac{d}{dt} (\left\langle -v, y(t) \right\rangle ) \end{aligned}$$

and by integration

$$\begin{aligned} \left\langle -v, x(t) \right\rangle&\ge \left\langle -v, y(t) \right\rangle + \left\langle -v, x_0 - B \right\rangle \nonumber \\&= \left\langle -v, y(t) \right\rangle . \end{aligned}$$
(36)

Hence, in the coordinate system \((C - B, -v)\), which is orthogonal, for all t in \( [0,\bar{t}]\), x(t) has larger coordinates than y(t).

The trajectory y(t), given by \(t \mapsto B - t w\), moves along the line from B to \(C'\). From Eqs. (34) and (36), we may write, for all t in \( [0,\bar{t}]\), \(x(t) = y(t) + \alpha (t) (C-B) + \beta (t) (-v)\) where \(\alpha \) and \(\beta \) are nonnegative functions. Since y(t) belongs to the line \((BC')\), this shows that x(t) stays above this line for all \(0 \le t\le \bar{t}\), and actually \(x(\bar{t}) \) in \( BCC'\). Indeed, \(x(\bar{t})\) is on the boundary of OCB and on the boundary of \(BCC'\), hence either \(x(\bar{t}) \) in \( [CC']\) or \(x(\bar{t}) \) in [BC]. Equation (35) ensures that if \(x(\bar{t}) \) in [BC] then \(x(\bar{t}) = C\), which concludes the proof. \(\square \)

Corollary 8

(Secants of gradient curves may all fail to converge) There exists a \(C^k\) strictly convex function on \({\mathbb {R}}^2\) with a unique minimizer \(\bar{x}\), such that any nonconstant solution to the gradient flow equation

$$\begin{aligned} {\dot{x}}(t) = -\nabla f(x(t)) \end{aligned}$$

is such that

$$\begin{aligned} \frac{x(t) - \bar{x}}{\Vert x(t) - \bar{x}\Vert } \end{aligned}$$

does not have a limit as \(t \rightarrow \infty \).

The function f has a positive definite Hessian everywhere except at 0.

Proof

We assume without loss of generality that \(\bar{x} = O\) is the origin. Writing \(x(t) = (r(t),\theta (t))\) in polar coordinates, we will construct a function f such that each solution of the ODE has a nonconverging angle \(\theta (t)\).

We start with an interpolating set \(S_0 = ABCDEF\) as in Lemma 9 and let \(S_1 = A'B'C'D'E'F'\) be its scaled version as described in Fig. 7.

Let \(\alpha \) be the value of the angle \({\widehat{BOC}}\) and \(m = \left\lceil \frac{2 \pi }{\alpha } \right\rceil + 1\). We have

$$\begin{aligned} \frac{2\pi }{m} < \alpha . \end{aligned}$$

To obtain \(S_{2}\), we rotate \(S_0\) by an angle \(2\pi / m\) and denote by \(S'_0\) the resulting set. We rescale \(S'_0\) by a factor \(\beta \) in (0, 1) so that \(\beta S'_0\) lies in the interior of \(S_1\); call the resulting set \(S_2\). Then \(S_3\) is obtained from \(S_2\) exactly as \(S_1\) was obtained from \(S_0\). We repeat the same process indefinitely to obtain a strictly decreasing sequence of \(C^k\) sets. Note that for any j in \( {\mathbb {N}}\), \(S_{2jm}\) and \(S_{2jm + 1}\) are homothetic to \(S_0\).

We now invoke Corollary 1 (with \(I = {\mathbb {Z}}\), reversing the indices) to obtain a \(C^k\) function f with those prescribed level sets. Using Remark 3, it turns out that the level sets of f between \(S_0\) and \(S_1\) are simple scalings of \(S_0\). Hence the gradient curves of f and those of the gauge function of \(S_0\) are the same between \(S_0\) and \(S_1\), up to time reparametrization.

Using Lemma 9, any trajectory crossing [BC] in Fig. 7 must also cross \([CC']\) and leave the triangle BOC in finite time. The same statement holds after scaling the level sets, and since for all j in \( {\mathbb {N}}\), \(S_{2jm}\) and \(S_{2jm+1}\) are homothetic to \(S_0\), this shows that no solution stays indefinitely in the triangle BOC.

Lemma 9 still holds after rotations, and by our construction, for any triangle T obtained by rotating BOC by a multiple of \(2\pi /m\), no trajectory stays indefinitely within T. Since \(2\pi / m < \alpha \), the union U of these triangles contains O in its interior.

Note first that any gradient curve converges to \({\bar{x}}\). Let us argue by contradiction and assume that there exists a continuous gradient curve \(t \mapsto z(t)\) distinct from the stationary solution \(\bar{x}\), such that

$$\begin{aligned} \frac{z(t) - \bar{x}}{\Vert z(t) - \bar{x}\Vert } \end{aligned}$$

converges. This exactly means that the angle \(\theta (t)\) of the curve has a limit \(\theta \) in \([0,2\pi )\) as t goes to infinity. There is a rotation of BOC by a multiple of \(2\pi /m\) whose interior intersects the half line given by the direction \(\theta \); call this triangle T. The directional convergence entails that there exists \(t_0 \ge 0\) such that z(t) belongs to T for all \(t \ge t_0\), contradicting the fact that no trajectory stays indefinitely within T. Hence z cannot be a gradient curve. To complete the proof, we may add disks of increasing size to the list of sets to obtain a full domain function and invoke Theorem 2 with \(I = {\mathbb {Z}}\). \(\square \)
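As a point of comparison (a toy sanity check, not the counterexample), for a well-behaved convex potential the secant (32) converges trivially; the snippet below tracks it along a forward-Euler discretization of the gradient flow of \(\Vert x\Vert ^2\):

```python
# Gradient flow of f(x) = ||x||^2: trajectories are straight rays to 0,
# so the secant direction is constant, in contrast with Corollary 8.
import numpy as np

x = np.array([1.0, 2.0])
dt = 1e-3
for _ in range(5000):
    x = x - dt * 2 * x                 # Euler step for xdot = -grad f(x)
    secant = x / np.linalg.norm(x)
print(secant)                          # stays (1, 2)/sqrt(5) for all times
```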

5.6 Newton’s flow may not converge

Given a twice differentiable convex function f, we define the open set \(\Omega :=\{x\in {\mathbb {R}}^2:\nabla ^2f(x) \text{ is } \text{ invertible }\}\) and we consider maximal solutions to the differential equation

$$\begin{aligned} {\dot{x}}(t) = - \nabla ^2 f(x(t))^{-1} \nabla f(x(t)), \end{aligned}$$
(37)

on \(\Omega \). This is the continuous counterpart of Newton's method; it has been studied in [5] and [2]. Let \(x_0\) be in \(\Omega \); there exists a unique maximal nontrivial interval I containing 0 and a unique solution x to (37) on I with \(x(0) = x_0\). Equation (37) may be rewritten as

$$\begin{aligned} \frac{d}{dt} \nabla f(x(t)) = - \nabla f(x(t)) \end{aligned}$$

and thus for all t in I, we have

$$\begin{aligned} \nabla f(x(t)) = e^{-t} \nabla f(x_0). \end{aligned}$$
(38)

If we can ensure that \(I = {\mathbb {R}}\), then (38) entails that the direction of the gradient is constant along the solution; if moreover f has oscillating gradients close to its minimum, this forces oscillations in space to compensate for the gradient oscillations.
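Identity (38) is easy to observe numerically. The sketch below (a toy quadratic example, where the flow does converge, unlike in the corollary that follows) integrates (37) by forward Euler and compares \(\nabla f(x(T))\) with \(e^{-T}\nabla f(x_0)\):

```python
# Newton flow on f(x) = 0.5 x^T Q x: since the Hessian is the constant Q,
# xdot = -x, and the gradient decays along a fixed direction as e^{-t}.
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: Q @ x
hess_inv = np.linalg.inv(Q)

x = np.array([2.0, -1.0])
g0 = grad(x)
dt, T = 1e-3, 2.0
for _ in range(int(T / dt)):
    x = x - dt * hess_inv @ grad(x)    # Euler step for (37)
print(grad(x))                         # ~ e^{-T} * g0, confirming (38)
print(np.exp(-T) * g0)
```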

Corollary 9

(A bounded Newton’s curve that oscillates at infinity) For any \(k\ge 2\), there exists a \(C^k\) convex coercive function \(f :{\mathbb {R}}^2 \mapsto {\mathbb {R}}\) and an initial condition \(x_0 \) in \( {\mathbb {R}}^2\) such that the solution to (37) is bounded, defined on \({\mathbb {R}}\) and has at least two distinct accumulation points.

The counterexample is sketched in Fig. 8; the construction is the same as for Corollary 5, but instead of quarter rotations we use a symmetry with respect to the first axis. We can then invoke Corollary 1 to construct the function f, and Eq. (38) ensures that the solution interval is unbounded.

Fig. 8

Illustration of the continuous time Newton's dynamics. On the left, the “skeletons” of the sublevel sets in gray and a sketch of the corresponding curve. On the right, the normals to be chosen in order to apply Lemma 6

5.7 Bregman descent (mirror descent) may not converge

The mirror descent algorithm was introduced in [31] as an efficient method to solve constrained convex problems. In [9], this method is shown to be equivalent to a projected subgradient method, using non-Euclidean projections. It plays an important role for some categories of constrained optimization problems; see e.g., [6] for recent developments and [20] for a surprising example.

Let us recall beforehand some definitions. Given a Legendre function h with domain \(\mathrm {dom}\,h\), define the Bregman distance associated to h as \(D_h(u,v)=h(u)-h(v)-\langle \nabla h(v),u-v\rangle \) where u is in \( \mathrm {dom}\,h\) and v is in the interior of \(\mathrm {dom}\,h\).

Given a smooth convex function f that we wish to minimize on \(\overline{ \mathrm {dom}\,h}\), we consider the Bregman method

$$\begin{aligned} x_{i+1}={{\,\mathrm{argmin}\,}}\left\{ \langle \nabla f(x_i),u-x_i\rangle +\lambda D_h(u,x_i):u\in {\mathbb {R}}^2\right\} , \end{aligned}$$

where \(x_0\) is in \(\mathrm {int}\,\mathrm {dom}\,h\) and \(\lambda >0\) is a step size. When the above iteration is well defined, e.g. when \(\mathrm {dom}\,h\) is bounded, it reads:

$$\begin{aligned} x_{i+1}=\nabla h^*\left( \nabla h(x_i)-\lambda \nabla f(x_i)\right) . \end{aligned}$$

In [6] the authors identified a generalized smoothness condition which confers good minimizing properties to the above method:

$$\begin{aligned}&Lh-f \text{ convex }, \end{aligned}$$
(39)
$$\begin{aligned}&\lambda \in (0,L). \end{aligned}$$
(40)

The corollary below shows that such an algorithm may not converge, even though we assume the cost to satisfy (39), the step to satisfy (40), and the Legendre function to have a compact domain.
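To fix ideas, here is a minimal runnable sketch of the Bregman recursion with the classical Boltzmann–Shannon entropy (a well-behaved Legendre function on the positive orthant, not the pathological square-domain h of Corollary 10 below); in this benign setting the iterates converge:

```python
# Bregman (mirror) descent x_{i+1} = nabla h*(nabla h(x_i) - lam * grad f(x_i))
# with h(x) = sum_j x_j log x_j, so that nabla h*(y) = exp(y - 1) inverts
# nabla h(x) = 1 + log x. Here f(x) = 0.5 * ||x - (0.2, 0.7)||^2.
import numpy as np

f_grad = lambda x: x - np.array([0.2, 0.7])
nabla_h = lambda x: 1.0 + np.log(x)
nabla_h_star = lambda y: np.exp(y - 1.0)       # inverse of nabla_h

x = np.array([1.0, 1.0])
lam = 0.5
for _ in range(200):
    x = nabla_h_star(nabla_h(x) - lam * f_grad(x))
print(x)   # ~ (0.2, 0.7): with this well-behaved h the iterates converge
```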

Corollary 10

(Bregman or mirror descent may not converge) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), and \(x_0\) in \({\mathbb {R}}^2\) such that the Bregman descent recursion

$$\begin{aligned} x_{i+1} = \nabla h^*\left( \nabla h(x_i) - c \right) , \end{aligned}$$

produces a bounded sequence \((x_i)_{i \in {\mathbb {N}}}\) which has at least two distinct accumulation points.

Proof

We fix \(\theta \in \left( \frac{-\pi }{4},\frac{\pi }{4} \right) \), \(\theta \ne 0\), consider the function h constructed in Corollary 3, and choose \(c = (-1,0)\). In this case the Bregman descent recursion reads, for all i in \({\mathbb {N}}\),

$$\begin{aligned} \nabla h (x_{i+1}) - \nabla h(x_i) = -c \end{aligned}$$

so that, choosing \(x_0\) at the minimizer of h (hence \(\nabla h(x_0) = 0\)), we actually have \(\nabla h (x_{i}) = \nabla h (x_{i}) - \nabla h(x_{0}) = - i c\) and thus

$$\begin{aligned} x_i = \nabla h^*(-ic) = \nabla h^*(i,0). \end{aligned}$$

By Corollary 3, we have for all \(i \in {\mathbb {N}}\) that \(\nabla h^*(i,0)\) is proportional to \((\cos (\theta ), (-1)^i \sin (\theta ))\). Since the norm of the gradient of \(h^*\) cannot vanish at infinity (no flat direction) and is bounded, this proves that the sequence \((x_i)_{i \in {\mathbb {N}}}\) has at least two accumulation points, which is the desired result. \(\square \)

5.8 Central paths of Legendre barriers may not converge

Consider the problem

$$\begin{aligned} \min _{x \in D} \left\langle c, x\right\rangle \end{aligned}$$
(41)

where D is a subset of \({\mathbb {R}}^2\) and c. Given a Legendre function h on D, we introduce the h central path through

$$\begin{aligned} x(r) = {{\,\mathrm{argmin}\,}}\left\{ \left\langle c, x\right\rangle + r h(x):x\in {\mathbb {R}}^2\right\} \end{aligned}$$
(42)

where \(r>0\) is meant to tend to 0. Central paths are one of the essential tools behind interior point methods, see e.g., [4, 33] and references therein.

Note that the accumulation points of x(r), as \(r \rightarrow 0\), have to be in the solution set of (41). It is even tempting to think that convergence of the path to some specific minimizer could occur, as is the case for many barriers, see e.g. [4]. We have however:
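For contrast, with a classical barrier the path does converge. The toy example below (the log-barrier of an interval, a standard illustration and not the construction of the corollary) computes \(x(r) = \nabla h^*(-c/r)\) by solving \(\nabla h(x) = -c/r\) with bisection:

```python
# Central path for f(x) = c*x on D = [0,1] with h(x) = -log(x) - log(1-x):
# nabla h(x) = -1/x + 1/(1-x) is increasing on (0,1), so bisection applies.
import numpy as np

c = 1.0
def x_of_r(r):
    target, lo, hi = -c / r, 1e-12, 1 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if -1 / mid + 1 / (1 - mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for r in [1.0, 0.1, 0.01, 0.001]:
    print(r, x_of_r(r))   # decreases to 0, the minimizer of x -> c*x on [0,1]
```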

Corollary 11

(A nonconverging central path) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), such that the h central path \(r \mapsto x(r)\) has two distinct accumulation points.

Proof

The optimality condition which characterizes x(r) for any \(r > 0\) reads

$$\begin{aligned} x(r)= \nabla h^*\left( -\frac{c}{r} \right) , \end{aligned}$$

and the construction is the same as in Corollary 10. \(\square \)

5.9 Hessian Riemannian gradient dynamics may not converge

The construction of this paragraph is similar to the two previous paragraphs. Consider a \(C^k\) (\(k\ge 2\)) Legendre function \(h :D \mapsto {\mathbb {R}}\) and the continuous time dynamics

$$\begin{aligned} {\dot{x}}(t) = - \nabla _H f(x(t)), \, t\ge 0, \end{aligned}$$
(43)

where \(H = \nabla ^2 h\) is the Hessian of h and \(\nabla _H f = H^{-1} \nabla f\) is the gradient of some differentiable function f in the Riemannian metric induced by H on \(\mathrm {int}\,D\). Such dynamics were considered in [1, 14].

We have the following result:

Corollary 12

(Nonconverging Hessian Riemannian gradient dynamics) There exists a Legendre function \(h:D \mapsto {\mathbb {R}}\), defined on a closed square D, continuous on D, a vector c in \( {\mathbb {R}}^2\), and \(x_0\) in \({\mathbb {R}}^2\) such that the solution to (43) with \(f=\langle c,\cdot \rangle \) has two distinct accumulation points.

Proof

Equation (43) may be rewritten

$$\begin{aligned} \frac{d}{dt} \nabla h(x(t))= - \nabla f(x(t)), \end{aligned}$$

so choosing \(c = (-1,0)\) and \(x_0\) at the minimizer of h (hence \(\nabla h(x_0) = 0\)), we have for all \(t \ge 0\), \(\nabla h(x(t)) = \nabla h(x_0) + (t,0) = (t,0)\), and the construction is the same as in Corollary 10. \(\square \)