The unconstrained minimization problem deals with functions \(f(\cdot)\) of \(n\) real variables that are defined and sufficiently smooth in a neighborhood \(U(x^*)\) of the minimizer. Throughout this paper, the set of minimizers of \(f\) is denoted by \(X^* = \operatorname{Arg\,min} f\) and is assumed to be nonempty. The necessary condition for \(x^*\) to be a minimizer of \(f\) is the equality \(f'(x^*) = 0\) together with the positive semidefiniteness of the Hessian matrix of \(f\) at the minimizer. Newton's method has been studied in numerous works, for example, in [26]. An overview of relevant studies can be found in [1]. In this paper, we show that, even if the Hessian matrix is singular at the point \(x^*\), the gradient of the objective function in a neighborhood of \(x^*\) belongs to the image of its second derivative and, hence, the Newtonian system for the descent direction is solvable at points of this neighborhood. This topological property of convexity and optimality provides new insight into Newton-type methods and makes it possible to prove their rate of convergence without the restrictive assumption that the Hessian matrix is nonsingular at \(x^*\). For unconstrained optimization problems, we establish properties that refine Lojasiewicz's inequality [7, 8]; namely, the inequality is generalized to the entire spectrum of derivatives up to a certain order. We consider functions whose derivatives are bounded uniformly in the argument and jointly in the order in a neighborhood \(U(x^*)\) of \(x^*\), i.e., for this neighborhood, there exists a positive constant \(M\) such that the values of the derivatives of any order do not exceed \(M\) in absolute value. Additionally, we consider functions that are sufficiently smooth in a neighborhood of the minimizer, i.e., infinitely differentiable functions. Throughout this paper, without loss of generality, we assume that \(f(x^*) = 0\) and use the notation \(f^{(0)}(x) = f(x)\). The following result holds for functions of one variable.

Lemma 1. If \(x^*\) is an isolated local minimizer of a function \(f: \mathbb{R} \to \mathbb{R}\) that is sufficiently smooth in a neighborhood \(U(x^*)\) and has derivatives that are bounded in \(U(x^*)\) uniformly in the argument and jointly in the order, then there exists an even exponent \(2p\), \(p = p_f \in \mathbb{N}\), such that

$$\begin{gathered} f^{(2p)}(x^*) > 0,\quad f^{(k)}(x^*) = 0, \\ \frac{f^{(k)}(x)}{(x - x^*)^{2p - k}} \geqslant C_k,\quad k = 0, 1, 2, \ldots, 2p - 1, \\ \end{gathered}$$

for all \(x \in \overset{\circ}{U}(x^*)\), where \(C_k = C_{k,f}\), \(k = 0, 1, 2, \ldots, 2p - 1\), are positive constants independent of \(x\).

Proof. Since \(x^*\) is a local minimizer of \(f\), we have \(f'(x^*) = 0\). First, we show that, under the conditions of Lemma 1, it is impossible that \(f^{(k)}(x^*) = 0\) for all \(k = 1, 2, \ldots\). Indeed, in this case, given a fixed point \(x = x^* + t \in U(x^*)\), since the derivatives are bounded in a neighborhood of \(x^*\), the Taylor formula with zero derivatives up to any fixed order \(k\) yields the inequality \(|f'(x^* + \theta t)| \leqslant \frac{M(|\theta t|)^k}{k!}\), where \(M\) is the above-introduced upper bound for the values of the derivatives of \(f\) in the indicated neighborhood and \(\theta \in (0, 1)\). Combining this with \(f(x^* + t) = t\int_0^1 f'(x^* + \theta t)\,d\theta\) yields \(|f(x^* + t)| \leqslant \frac{M|t|^{k+1}}{(k + 1)!}\). Letting \(k \to \infty\) gives \(f(x^* + t) = 0\) for every \(x^* + t \in U(x^*)\), so every point of the neighborhood is a minimizer, which contradicts the fact that the local minimizer \(x^*\) is isolated. Therefore, there exists a finite \(k > 1\) such that \(f^{(k)}(x^*) \neq 0\) and \(f^{(l)}(x^*) = 0\), \(l = 1, 2, \ldots, k - 1\). Moreover, \(k\) can only be even, \(k = 2p\), \(p \in \mathbb{N}\), and \(f^{(2p)}(x^*) > 0\), since \(x^*\) is a local minimizer of \(f\). Now, to complete the proof, we denote \(\frac{1}{(2p + 1)!} f^{(2p)}(x^*)\) by \(C_0\). Then the first inequality of the lemma follows from the Taylor formula and the boundedness of the \((2p + 1)\)th derivative in the neighborhood \(U(x^*)\). The Taylor expansion of \(f^{(k)}(x)\) in a neighborhood of \(x^*\) yields the other inequalities of the lemma for \(k = 1, 2, \ldots, 2p - 1\) with \(C_k = \frac{1}{(2p + 1 - k)!} f^{(2p)}(x^*)\).
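The following minimal check, assuming SymPy is available and using the model function \(f(x) = x^6\) (an assumed example, so \(x^* = 0\) and \(2p = 6\)), illustrates the conclusions of Lemma 1; the printed ratios are the quantities bounded from below by the constants \(C_k\).

```python
# Illustrative check of Lemma 1 for f(x) = x^6 with x* = 0 and 2p = 6
# (an assumed model example, not taken from the paper).
import sympy as sp

x = sp.symbols('x')
f = x**6          # isolated minimizer x* = 0, f^(6)(0) = 720 > 0
p = 3

for k in range(2 * p):
    fk = sp.diff(f, x, k)                     # k-th derivative f^(k)(x)
    ratio = sp.simplify(fk / x**(2 * p - k))  # f^(k)(x) / (x - x*)^(2p - k)
    print(k, fk, ratio)                       # each ratio is a positive constant
```

For this model function the ratios are exactly the constants \(720/(6 - k)!\), in agreement with the lower bounds of the lemma.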

Corollary 1. Under the conditions of Lemma 1, the function f is locally convex.

Indeed, if \(f^{(2)}(x^*) > 0\), then the second derivative remains positive in a small neighborhood of \(x^*\), which means that \(f\) is locally convex. If \(f^{(2)}(x^*) = 0\), then, as is shown in Lemma 1, \(f^{(2p)}(x^*) > 0\) for some \(p = p_f \in \mathbb{N}\), \(p > 1\), and, additionally, \(f^{(k)}(x^*) = 0\) for \(k = 1, 2, \ldots, 2p - 1\). Then \(f^{(2)}(x) \geqslant C_2 (x - x^*)^{2p - 2}\) for any \(x \in U(x^*)\), where \(C_2\) is the positive constant determined in Lemma 1. This inequality also means that \(f\) is locally convex.

For a function of \(n\) variables, the \(k\)th derivative in the direction \(h\) at a point \(x\) is denoted by \(f_h^{(k)}(x)\).

Corollary 2. If \(f: \mathbb{R}^n \to \mathbb{R}\) is a sufficiently smooth function whose derivatives are bounded uniformly in the argument and jointly in the order in a neighborhood \(U(x^*)\) of an isolated minimizer \(x^*\), then, for any \(h \in \mathbb{R}^n\) with \(\|h\| = 1\), there exists an even exponent \(2p_{h,f}\), \(p_{h,f} \in \mathbb{N}\), such that \(f_h^{(2p_{h,f})}(x^*) > 0\), \(f_h^{(k)}(x^*) = 0\), and \(\frac{f_h^{(k)}(x^* + th)}{t^{2p_{h,f} - k}} \geqslant C_k\), \(k = 0, 1, 2, \ldots, 2p_{h,f} - 1\), for all \(t \in (0, \delta_h]\), where \(x^* + \delta_h h \in U(x^*)\) and \(C_k = C_{k,h,f}\), \(k = 0, 1, 2, \ldots, 2p_{h,f} - 1\), are positive constants independent of \(t\); moreover, the function \(f\) is locally convex in the direction \(h\).

Here, the exponent \(p = p_{h,f} \in \mathbb{N}\) is determined by the direction \(h\), \(\|h\| = 1\), and does not depend on small \(t\) for which \(x^* + th \in U(x^*)\), but the value \(\delta_h > 0\) depends on \(h\) and, in the general case, can be arbitrarily small as \(h\) varies. It will be shown in Lemma 2 below that, for a convex function \(f\), we can guarantee the existence of an upper bound for \(p_{h,f} \in \mathbb{N}\) and a positive lower bound for \(\delta_h\) on the set of vectors \(h\), \(\|h\| = 1\).
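A small sketch, assuming SymPy and the illustrative function \(f(x_1, x_2) = x_1^2 + x_2^4\) (an example of ours, not from the text), shows how the directional order \(2p_{h,f}\) of Corollary 2 genuinely depends on the direction \(h\).

```python
# Directional orders 2p_{h,f} for the assumed example f(x1, x2) = x1^2 + x2^4:
# the first nonvanishing directional derivative at x* = 0 has order 2 along
# h = (1, 0) but order 4 along h = (0, 1). The order is invariant under
# scaling of h, so the directions need not be normalized here.
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2')
f = x1**2 + x2**4

for h in [(1, 0), (0, 1), (1, 1)]:
    phi = f.subs({x1: h[0] * t, x2: h[1] * t})   # f(x* + t h) as a function of t
    order = next(k for k in range(1, 9)
                 if sp.diff(phi, t, k).subs(t, 0) != 0)
    print(h, order)                              # prints 2, 4, and 2
```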

Lemma 1 implies that the Newton operator \(\psi(x) = x - f^{(2)}(x)^{-1} f'(x)\) is well defined in a sufficiently small deleted neighborhood of the solution. In the case of \(\mathbb{R}^n\) of arbitrary dimension, Lemma 1 means that \(f\) is convex along any straight line passing through the point \(x^*\), provided that the function is sufficiently smooth on the intersection of this line with the neighborhood \(U(x^*)\). Additionally, for any straight line passing through \(x^*\) along a vector \(h\), we have \(f_h^{(2p)}(x^*) \geqslant C_2\) for some exponent \(2p\), \(p = p_h \in \mathbb{N}\), and some constant \(C_2 = C_{2,h} > 0\). Lojasiewicz's inequality guarantees that this estimate holds with some \(p\) and \(C_2\) independent of \(h\) if the function is analytic in \(U(x^*)\). Due to the convexity of the function in \(U(x^*)\), we assume only that the function is sufficiently smooth and obtain the "stability" of Lojasiewicz's inequality, which expands the applicability of Newton's method.
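A minimal one-dimensional sketch (the quartic \(f(x) = x^4\) is an assumed example) shows the operator \(\psi\) at work near a minimizer with vanishing second derivative: \(\psi(x) = x - 4x^3/(12x^2) = \frac{2}{3}x\), so the iteration is well defined for \(x \neq 0\) and contracts linearly rather than quadratically.

```python
# Newton's operator psi(x) = x - f'(x)/f''(x) for the assumed example
# f(x) = x^4, whose second derivative vanishes at the minimizer x* = 0.
def psi(x):
    return x - 4 * x**3 / (12 * x**2)   # reduces to (2/3) * x for x != 0

x = 1.0
for k in range(5):
    x_next = psi(x)
    print(k, x_next, x_next / x)        # the ratio stays at 2/3: linear rate
    x = x_next
```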

Lemma 2. If \(f: \mathbb{R}^n \to \mathbb{R}\) is a convex, sufficiently smooth function whose derivatives are bounded uniformly in the argument and jointly in the order in a neighborhood \(U(x^*)\) of an isolated minimizer \(x^*\), then there exist an even exponent \(2p\), \(p = p_f \in \mathbb{N}\), a constant \(C = C_f > 0\), and a number \(\delta = \delta_f > 0\) such that

$$f(x^* + th) \geqslant C t^{2p}$$

for all \(h\), \(\|h\| = 1\), and \(t \in (0, \delta]\).

Proof. Let us prove the existence of \(p = p_f \in \mathbb{N}\) such that, if \(f_h^{(i)}(x^*) = 0\) for \(i = 0, 1, 2, \ldots, 2p'(h) - 1\) and \(f_h^{(2p'(h))}(x^*) > 0\), then \(p'(h) \leqslant p\). Assume the opposite. Then, for some sequences \(h_k\), \(\|h_k\| = 1\), and \(t_k \to +0\), it is true that \(f(x^* + t_k h_k) < C_k t_k^{m_k}\), where \(C_k > 0\) is a bounded sequence and \(m_k \in \mathbb{N}\) is an increasing sequence. Without loss of generality, we assume that \(h_k \to h\) and \(C_k \to C\) as \(k \to \infty\). Let \(\operatorname{conv}\{h_k, h_{k+1}, h_{k+2}, \ldots, h_{k+n}\}\) be denoted by \(V_k\). If \(\dim V_k < n\), this set of vectors is modified by adding linearly independent increments of length on the order of \(C_k t_k^{2p}\) (\(C_k \to 0\), \(k \to \infty\)) to \(n - \dim V_k\) of the vectors, after which we can assume that \(\dim V_k = n\). Next, suppose that \(h'_k \in \operatorname{int} V_k\), \(h''_k = h + \alpha_k (h'_k - h)\), \(\alpha_k \in (0, 1)\), \(\alpha_k = \inf\{\alpha > 0: h + \alpha(h'_k - h) \in V_k\}\). The convexity and continuity of \(f\) imply that \(f(x^* + t_k h''_k) \leqslant 2C t_k^{m_k}\). On the other hand, Lemma 1 implies that, for some \(p = p_h \in \mathbb{N}\), there exists \(l \in \mathbb{N}\), \(l \leqslant p\), for which \(f_h^{(2l)}(x^*) \geqslant C_2 > 0\) for some \(C_2 > 0\). Since \(h''_k \to h\) as \(k \to \infty\), for sufficiently large indices \(k\), we have \(f(x^* + t_k h''_k) \geqslant C_2 t_k^{2l}\), which contradicts the assumption.
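The uniform bound of Lemma 2 can be checked numerically on a convex model function; for the assumed example \(f(x) = \|x\|^4\) one has \(f(x^* + th) = t^4\) for every unit vector \(h\), so \(2p = 4\) and \(C = 1\) serve uniformly over the whole unit sphere.

```python
# Sampling check of the uniform lower bound f(x* + t h) >= C t^(2p) from
# Lemma 2 for the assumed convex example f(x) = ||x||^4 (x* = 0, 2p = 4, C = 1).
import numpy as np

rng = np.random.default_rng(0)
p, C = 2, 1.0
for _ in range(100):
    h = rng.normal(size=3)
    h /= np.linalg.norm(h)                    # random unit direction
    for t in (0.5, 0.1, 0.01):
        f_val = np.linalg.norm(t * h)**4
        assert f_val >= C * t**(2 * p) - 1e-12
print("f(x* + t h) >= C t^(2p) holds on all sampled directions")
```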

Corollary 3. The proof of Lemma 2 implies the existence of an exponent \(2p\), \(p = p_f \in \mathbb{N}\), and constants \(C_k = C_{k,f}\), \(k = 1, 2, \ldots, 2p' - 1\), for which

$$\begin{gathered} f_h^{(2p')}(x^*) > 0,\quad f_h^{(k)}(x^*) = 0,\quad k = 0, 1, \ldots, 2p' - 1, \\ f_h^{(k)}(x^* + th) \geqslant C_k t^{2p - k},\quad 0 < t \leqslant \delta, \\ \end{gathered}$$

moreover, \(p' = p'(h) \leqslant p\) for all \(h\), \(\|h\| = 1\).

For a point \(x = x^* + th \in U(x^*)\), where \(\|h\| = 1\), the basis \(G = G(h) = \{g_1, g_2, \ldots, g_n\}\) in \(\mathbb{R}^n\) and the corresponding index \(q = q(h)\) are determined as follows. Consider the matrices

$$a_k = \frac{1}{(k - 1)!} f^{(k)}(x^*) [h]^{k - 2},\quad k = 2, 3, \ldots.$$

Then, for a sufficiently smooth function \(f\) at the point \(x = x^* + th\), we have the relations

$$\begin{gathered} f^{(2)}(x) th = \sum\limits_{k \geqslant 2} (k - 1) a_k h t^{k - 1}, \\ f'(x) = \sum\limits_{k \geqslant 2} a_k h t^{k - 1},\quad f(x) = \sum\limits_{k \geqslant 2} \frac{a_k [h]^2 t^k}{k}. \\ \end{gathered}$$
(1)

Next, the index \(k_1 \geqslant 2\) is defined as the minimum among those indices \(k\) for which the one-dimensional space \(L_1 = L_{k_1} = \operatorname{Lin}\{a_{k_1} h\} \neq \{0\}\). The straight line \(L_{k_1}\) is redenoted by \(L^1\), the vector \(g_1\) is defined as \(\frac{a_{k_1} h}{\|a_{k_1} h\|}\), and the index \(k_2 > k_1\) is defined as the minimum \(k\) for which \(\operatorname{Lin}\{a_k h\}\) is not contained in \(L^1\). Then \(L_{k_2} = \operatorname{Lin}\{a_{k_2} h\}\), \(\dim(L_{k_2} \oplus L^1) = 2\), the one-dimensional subspace \(L_2\) is defined as \(Pr_{(L^1)^\perp} L_{k_2}\), and the vector \(g_2\) is defined as the normalized vector \(Pr_{L_2} a_{k_2} h\). Here, \(Pr_{L_2}(a_{k_2} h) \subseteq Pr_{L_2} \operatorname{Im} a_{k_2}\). Next, the space \(L^2\) is defined as \(L^1 \oplus L_2\). For each \(j = 3, 4, \ldots, q = q(h)\), in a similar fashion, we determine the straight lines \(L_j = Pr_{(L^{j-1})^\perp} L_{k_j}\), \(j = 3, 4, \ldots, q\), and the corresponding unit vectors \(g_j\), \(j = 1, 2, \ldots, q\). Here, \(q = q(h) = \dim \operatorname{Lin}\{a_2 h, a_3 h, \ldots\} \leqslant n\), and the direct sum \(L^{j-1} \oplus L_j\) is denoted by \(L^j\) and is a \(j\)-dimensional subspace of \(\mathbb{R}^n\). At the final stage, we construct a subspace \(L^q\) of dimension \(q = q(h) \leqslant n\). The orthogonal complement of \(L^q\) is \(H\): \(H = (L^q)^\perp\) for \(q < n\) and \(H = \{0\}\) for \(q = n\). The basis \(G(h)\) in \(\mathbb{R}^n\) is defined as the set consisting of the constructed mutually orthogonal unit vectors \(g_1, g_2, \ldots, g_q\) directed along the constructed orthogonal straight lines \(L_l\), \(l = 1, 2, \ldots, q\), and an arbitrary orthonormal basis \(g_{q+1}, g_{q+2}, \ldots, g_n\) in \(H\).
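In effect, the construction of \(G(h)\) is a Gram–Schmidt pass over the vectors \(a_k h\) that records the indices \(k_j\) at which a new direction first appears. A minimal sketch, with placeholder input vectors rather than derivatives of a specific \(f\), could look as follows.

```python
# Sketch of the basis construction G(h): scan the vectors v_k = a_k h in order
# of k, record each index k_j at which a new direction appears, and
# orthonormalize by projecting onto the orthogonal complement of L^{j-1}.
# The input vectors below are placeholders, not derived from a specific f.
import numpy as np

def build_basis(indexed_vectors, tol=1e-12):
    G, k_indices = [], []
    for k, v in indexed_vectors:               # pairs (k, a_k h), k = 2, 3, ...
        w = v.astype(float).copy()
        for g in G:                            # project out the span L^{j-1}
            w -= (w @ g) * g
        if np.linalg.norm(w) > tol:            # a_k h leaves L^{j-1}: new k_j
            G.append(w / np.linalg.norm(w))
            k_indices.append(k)
    return np.array(G), k_indices

vecs = [(2, np.array([0., 0., 0.])),           # a_2 h = 0, skipped
        (3, np.array([1., 1., 0.])),           # first new direction: k_1 = 3
        (4, np.array([1., 2., 0.])),           # second new direction: k_2 = 4
        (5, np.array([2., 3., 0.]))]           # already lies in L^2, skipped
G, ks = build_basis(vecs)
print(ks)         # [3, 4], i.e. q = 2 and H is the remaining coordinate axis
print(G @ G.T)    # 2 x 2 identity: the vectors g_j are orthonormal
```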

Theorem 1. If \(f: \mathbb{R}^n \to \mathbb{R}\) is a sufficiently smooth function whose derivatives are bounded uniformly in the argument and jointly in the order in a neighborhood \(U(x^*)\) of an isolated minimizer \(x^*\), then, for any fixed \(h\), \(\|h\| = 1\), the system

$$f^{(2)}(x^* + th) s = f'(x^* + th)$$
(2)

is solvable with respect to s for sufficiently small t.

Proof. For any fixed vector \(h\), a solution \(s\) of the system is determined coordinatewise in the basis \(G\) as follows. Let \(Pr_H s = 0\). Then, according to (1), the coordinate \(s_q\) is determined by the condition \(Pr_{L_q}((k_q - 1) a_{k_q} t^{k_q - 2} s) = Pr_{L_q} f'(x) = Pr_{L_q} \sum\limits_{k \geqslant k_q} a_k h t^{k - 1}\). Specifically, \(s_q = \frac{Pr_{L_q} h\, t}{k_q - 1} + o(t)\). Substituting the coordinate \(s_q\) into system (2) yields the coordinate \(s_{q-1} = \frac{Pr_{L_{q-1}} h\, t}{k_{q-1} - 1} + o(t)\). Next, the coordinates \(s_l\), \(l = q - 2, q - 3, \ldots, 1\), are determined sequentially. Generally speaking, the solution of system (2) is not unique, but it follows from (1) that the coordinates of any solution \(s\) in the basis \(G = G(h)\) are given by

$$s_j = \frac{Pr_{L_j} h\, t}{k_j - 1} + o(t),\quad j = 1, 2, \ldots, q.$$
(3)

Remark 1. To obtain a unique solution of system (2), we consider the problem

$$\|s\|^2 \to \min_s,\quad f^{(2)}(x) s = f'(x),$$
(4)

whose solution exists for each fixed \(h\) for all sufficiently small \(t\) under the assumptions made in Theorem 1. The length of the interval in \(t\) within which the solution exists depends on \(h\).
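Numerically, the minimal-norm solution of problem (4) is exactly what the Moore–Penrose pseudoinverse delivers. A short sketch, on the assumed convex example \(f(x) = x_1^4 + x_2^4\) at a point where the Hessian is singular, checks that the system is nevertheless solvable:

```python
# Minimal-norm solution of the possibly singular Newton system (4) via the
# Moore-Penrose pseudoinverse, on the assumed example f(x) = x1^4 + x2^4 at a
# point with singular Hessian (x1 = 0) where the gradient still lies in Im f''.
import numpy as np

def grad(x):
    return np.array([4 * x[0]**3, 4 * x[1]**3])

def hess(x):
    return np.diag([12 * x[0]**2, 12 * x[1]**2])

x = np.array([0.0, 0.5])                   # the Hessian has rank 1 here
s = np.linalg.pinv(hess(x)) @ grad(x)
print(s)                                   # [0, 1/6]: the least-norm solution
print(np.allclose(hess(x) @ s, grad(x)))   # True: system (4) is solvable
```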

Example 1. For the nonconvex function \(f(x) = x_1^4 + (x_2 - x_1^2)^2\), the conclusion of Theorem 1 fails at points of the parabola \(x_2 = 4x_1^2\). Thus, the length of the interval on which Theorem 1 holds tends to zero as the vector \(h\) approaches \((1, 0)\).
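The failure can be verified directly: on this parabola the Hessian has rank 1 while the gradient acquires a component outside its image, so system (2) has no solution there. A sketch of the check:

```python
# Numeric check of Example 1: for f(x) = x1^4 + (x2 - x1^2)^2 the Hessian is
# singular on the parabola x2 = 4 x1^2, and there the gradient leaves the image
# of the Hessian, so the Newton system (2) is unsolvable.
import numpy as np

def grad(x):
    return np.array([8 * x[0]**3 - 4 * x[0] * x[1],
                     2 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[24 * x[0]**2 - 4 * x[1], -4 * x[0]],
                     [-4 * x[0], 2.0]])

x1 = 0.3
x = np.array([x1, 4 * x1**2])              # a point on the parabola
H, g = hess(x), grad(x)
print(np.linalg.matrix_rank(H))            # 1: the Hessian is singular
residual = H @ np.linalg.pinv(H) @ g - g   # component of g outside Im H
print(np.linalg.norm(residual) > 1e-10)    # True: g is not in Im f''(x)
```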

Below, we show that, in the case of a convex function, there is a neighborhood in \(t\) with a common radius for all vectors of the unit sphere in which problem (4) is solvable. The form of the objective function of problem (4) implies that its solution is given by \(s = f^{(2)}(x)^+ f'(x)\), where \((\cdot)^+\) denotes the pseudoinverse of an operator on its image. The coordinates of the solution of this problem in the basis \(G\) satisfy the condition \(Pr_H s = 0\).

Theorem 2. Under the conditions of Lemma 2, for all \(x\) sufficiently close to \(x^*\), problem (4) is solvable and its solution satisfies the conditions

$$\begin{gathered} (f'(x), s) \geqslant M_1 \|x - x^*\|^{2p}, \\ (f^{(2)}(x) s, s) \geqslant M_2 \|x - x^*\|^{2p}, \\ \end{gathered}$$
(5)

where the exponent \(2p\) is defined in Lemma 2 and the constants \(M_1 = M_{1,f} > 0\) and \(M_2 = M_{2,f} > 0\) do not depend on \(x\).

Proof. Lemma 2 and (1) imply that, for every \(h\), \(\|h\| = 1\), the numbers \(k_j\) determining the basis \(G\) satisfy the following condition: there is an index \(l_1 \in \{k_1, k_1 + 1, \ldots, 2p\}\) for which \(a_{l_1}[h]^2 \geqslant C > 0\); here, the exponent \(2p\) is defined in Lemma 2 and is independent of \(h\), and the constant \(C > 0\) is also independent of \(h\) and is defined in Corollary 3. The last inequality is equivalent to the condition \(\frac{a_{l_1}[h]^2}{l_1 - 1} \geqslant C_1 t^{2p}\) for small \(t\). Denoting \(\frac{C}{2p - 1}\) by \(M_1\), we derive the first inequality stated in the theorem. The second inequality is derived from Corollary 3 in a similar manner.

Corollary 4. Under the conditions of Lemma 2, for a convex function f, the system

$$f^{(2)}(x) s = f'(x)$$
(6)

is solvable with respect to \(s\) for all \(x\) sufficiently close to \(x^*\), which is equivalent to the inclusion

$$f'(x) \in \operatorname{Im} f^{(2)}(x)$$
(7)

irrespective of the rank of the matrix \(f^{(2)}(x)\).

In what follows, the set \(\{x \in \mathbb{R}^n: f(x) \leqslant f(x^*) + \varepsilon\}\) is denoted by \(X_\varepsilon^*\) and the diameter of a set \(A\) is denoted by \(\operatorname{diam} A\).

Theorem 3. Under the conditions of Lemma 2, there exist a positive integer \(p\) and a constant \(\bar{C} < \infty\) for which

$$\operatorname{diam} X_\varepsilon^* < \bar{C} \varepsilon^{\frac{1}{2p}}.$$
(8)
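For illustration (an example of ours, not from the original text): for \(f(x) = x^{2p}\) on \(\mathbb{R}\) with \(x^* = 0\), we have \(X_\varepsilon^* = [-\varepsilon^{1/(2p)}, \varepsilon^{1/(2p)}]\), so \(\operatorname{diam} X_\varepsilon^* = 2\varepsilon^{1/(2p)}\) and estimate (8) holds with any \(\bar{C} > 2\); in particular, the exponent \(\frac{1}{2p}\) in (8) cannot be improved.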

Remark 2. The above-mentioned properties hold under the assumption that the set of minimizers of \(f\) consists of an isolated point. If this assumption is dropped, while the other assumptions of Theorem 1 are satisfied, then we have the representation

$$\operatorname{Arg}\min f = \{x^*\} + L,$$
(9)

where \(L\) is the subspace of \(\mathbb{R}^n\) determined by the condition \(f^{(k)}(x^*)[h]^k = 0\), \(h \in L\), for any positive integer \(k\).

Theorem 1 does not imply that \(f^{(2)}(x)\) is positive definite in a neighborhood of the minimizer \(x^*\). Nevertheless, it guarantees that Newton's method with a suitable initial point is applicable for finding a minimizer of a smooth function, since problem (4) makes it possible to obtain a transition vector for the iterative scheme without assuming that the objective function is convex. Moreover, in this case, we can obtain the monotonicity of Newton's method in the argument and, under stronger assumptions, a linear rate of convergence. If the objective function is convex, then this property holds without additional assumptions.

First, we use Theorem 1 to prove the monotonicity of Newton's scheme in the argument. Specifically, the Newton operator is defined as \(\psi(x, s) = x - f^{(2)}(x)^+ f'(x)\), where \(s = f^{(2)}(x)^+ f'(x)\) is the solution of problem (4). The process of constructing the solution \(s\) in Theorem 1 and the remark to the theorem imply that, for any fixed \(h\), \(\|h\| = 1\), this operator is well defined for sufficiently small \(t\) without assuming that the function \(f\) is convex. To obtain a convergence estimate, the following property is naturally introduced in addition to the assumptions of Theorem 1.

Definition 1. The function \(f\) is said to be weakly regular at the point \(x^*\) if there exists a constant \(d = d_f > 0\) such that \(\max\limits_{j \leqslant q} |h_j| \geqslant d\) for any \(h\), \(\|h\| = 1\). Here, as before, \(q = q(h) = \dim \operatorname{Lin}\{a_2 h, a_3 h, \ldots\}\) and \(h_j\) is the \(j\)th coordinate of the vector \(h\) in the basis \(G\).

Theorem 4. If the function \(f\) is weakly regular at the point \(x^*\), then, under the conditions of Theorem 1, there exists \(\lambda \in (0, 1)\) such that

$$\|\psi(x^* + th, s) - x^*\| \leqslant \lambda t$$

for all sufficiently small t.

Proof. For any solution of system (2), under the weak regularity assumption at the point \(x^*\), there exists an index \(j\) for which \(\|Pr_{L_{k_j}} h\| > d\). It follows that

$$\begin{gathered} \|th - s\| \leqslant \left\| t \sum\limits_{l = 1, l \neq k_j}^q Pr_{L_l} h - Pr_{L_l} s \right\| + t \|Pr_H h\| \\ +\, |t Pr_{L_{k_j}} h - Pr_{L_{k_j}} s| \leqslant t(1 - \alpha) + t\alpha \left(1 - \frac{1}{k_j - 1}\right) \leqslant t \left(1 - \frac{d}{k_j - 1}\right) = \lambda t, \\ \end{gathered}$$
$$\lambda = \lambda(h) = 1 - \frac{d}{k_j - 1} \in (0, 1),\quad \alpha = \alpha(h) \geqslant d.$$

Remark 3. The resulting inequality does not mean the linear convergence of Newton's method, since the \(\lambda\) involved in the estimate depends, generally speaking, on \(h\). In the case of a convex function, the result holds in a stronger form, namely, \(\|\psi(x, s) - x^*\| \leqslant \lambda \|x - x^*\|\) for all \(x\) from a sufficiently small neighborhood of \(x^*\), since the following property is valid for convex functions.

Definition 2. The function \(f\) is said to be uniformly regular at the point \(x^*\) if there exist an index \(m = m_f \in \mathbb{N}\) and a constant \(d = d_f > 0\) such that, for any \(h\), \(\|h\| = 1\), there is \(j \leqslant q\) with \(k_j \leqslant m\) for which \(\|Pr_{L_j} h\| \geqslant d\).

Theorem 5. If the function \(f\) is uniformly regular at the point \(x^*\), then, under the conditions of Theorem 1, there exists \(\lambda \in (0, 1)\) such that \(\|\psi(x, s) - x^*\| \leqslant \lambda \|x - x^*\|\) for all \(x\) from a sufficiently small neighborhood of \(x^*\).

Proof. Theorem 4 implies that the factor \(\lambda = \lambda(h)\) is bounded from above uniformly in \(h\), \(\|h\| = 1\): \(\lambda(h) = 1 - \frac{d}{k_j - 1} \leqslant \lambda' < 1\), where \(j\) is an index with \(k_j \leqslant m\). Such an index \(j\) exists by the uniform regularity assumption. Additionally, the uniform regularity assumption implies that \(\|h_{H^\perp}\| > d\) for some \(d > 0\) independent of \(h\), where \(h_A\) denotes the projection of the vector \(h\) onto the subspace \(A\). This yields an \(h\)-independent estimate for \(\lambda\).

Remark 4. Lemma 2 implies that a convex function is uniformly regular at the point \(x^*\): \(k_j = k_j(h) \leqslant m = 2p\) for all \(h\), \(\|h\| = 1\). Uniform regularity is a necessary and sufficient condition for Newton's method to converge linearly under the hypotheses of Theorem 1 without assuming that the function \(f\) is convex. Newton's iteration has the form

$$x_{k+1} = x_k - f^{(2)}(x_k)^+ f'(x_k),$$

and the convergence rate estimate of the method is linear:

$$\|x_{k+1} - x^*\| \leqslant \lambda \|x_k - x^*\|,\quad \lambda \in (0, 1),\quad k = 0, 1, 2, \ldots,$$

for an initial approximation that is sufficiently close to the solution x* in the case when f is uniformly regular at x*.
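A compact sketch of this pseudoinverse iteration, on the assumed convex test function \(f(x) = x_1^4 + x_2^2\), exhibits the predicted linear rate: the quadratic coordinate is annihilated in one step, while the degenerate quartic coordinate contracts with the factor \(\lambda = 2/3\).

```python
# Sketch of the iteration x_{k+1} = x_k - f''(x_k)^+ f'(x_k) on the assumed
# convex test function f(x) = x1^4 + x2^2; the observed rate is linear with
# lambda = 2/3, governed by the degenerate quartic coordinate.
import numpy as np

def grad(x):
    return np.array([4 * x[0]**3, 2 * x[1]])

def hess(x):
    return np.diag([12 * x[0]**2, 2.0])

x = np.array([1.0, 1.0])
for k in range(8):
    x = x - np.linalg.pinv(hess(x)) @ grad(x)
    print(k, np.linalg.norm(x))   # after the first step the norm decays as (2/3)^k
```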

Corollary 5. Under the conditions of Theorem 2, for a convex function, the rate of convergence of Newton's iteration is linear for any choice of the initial point from a sufficiently small neighborhood of the solution. Indeed, Theorem 2 implies that a convex function is uniformly regular at the solution \(x^*\), which yields the linear convergence estimate.

To ensure that Newton's method converges faster than linearly, we consider a modified operator \(\psi_1(x, s)\) written coordinatewise in the basis \(G\) as \(\psi_1(x, s)_j = x_j - (k_j - 1) s_j\), \(j = 1, 2, \ldots, q\), and \(\psi_1(x, s)_H = x_H - g(x, h)\), where \(g(x, h): H \to H\) and \(s = f^{(2)}(x)^+ f'(x)\). The operator \(\psi_1(x, s)\) determines the modified Newton scheme \(x_{k+1} = \psi_1(x_k, s_k)\). The following result is a consequence of Theorem 1.

Theorem 6. Given a good initial approximation, the modified Newton scheme guarantees a superlinear convergence estimate if and only if the function \(g(x, h)\) satisfies the condition \(\|g(x, h) - h_H\| = o(t)\).
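A one-dimensional sketch, on the assumed example \(f(x) = x^4\) (where \(k_1 = 4\) and \(s = f'(x)/f^{(2)}(x) = x/3\)), contrasts the plain Newton step with the modified one: the correction factor \(k_1 - 1 = 3\) turns the factor-\(2/3\) contraction into an exact hit of the minimizer, illustrating the superlinear effect.

```python
# Modified Newton step psi_1 versus the plain step psi for the assumed example
# f(x) = x^4 in one variable: k_1 = 4 and s = f'(x)/f''(x) = x/3, so
# psi_1(x, s) = x - (k_1 - 1) s reaches the minimizer x* = 0 in a single step.
def newton_step(x):
    return x - 4 * x**3 / (12 * x**2)    # psi:   x -> (2/3) x, linear progress

def modified_step(x, k1=4):
    s = 4 * x**3 / (12 * x**2)           # s = f'(x)/f''(x) = x/3
    return x - (k1 - 1) * s              # psi_1: x -> 0

x0 = 0.7
print(newton_step(x0))                   # 0.466...: contraction by 2/3
print(modified_step(x0))                 # 0.0: superlinear (here exact)
```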