The contribution of Kantorovich had a very great influence on the mathematics of the twentieth century. His pioneering work [1] laid the foundation of the theory of convex extremal problems, and his applications of this theory to the optimal allocation of resources were subsequently recognized by the Nobel Prize in Economics. Here we present a possible rethinking of a few fragments of classical approximation theory on the basis of some ideas, methods, and results of Kantorovich.

Kantorovich belonged to the St. Petersburg mathematical school founded by Chebyshev. Approximation theory was one of Chebyshev's main areas of research.

Chebyshev was the creator of the first Russian mathematical school of approximation theory. Among his followers in this field, we mention Markov and Zolotarev. It is believed that this branch of mathematical analysis began its development with Chebyshev's article [2].

In classical approximation theory, we distinguish three stages of development, from its origins to the 1970s.

The first, Chebyshev, stage of the newly emerged theory was devoted to approximation by polynomials and rational fractions.

The second stage of approximation theory is associated with the name of one of the most prominent Russian mathematicians of the last century, Bernstein, who was one of Kantorovich's teachers. Bernstein's first fundamental work [3] on approximation theory appeared in 1912. The initial impulse to investigate the relationship between smoothness and the rate of approximation Bernstein received from de la Vallée-Poussin, while the means of approximation (polynomials) and the measure of approximation (the uniform metric) he took from Chebyshev.

The third stage is connected with the contribution of Kolmogorov, whom Kantorovich once called one of his teachers in a conversation with me.

Let us first single out a topic to which Chebyshev, Bernstein, and their followers paid great attention: the extremal problems of approximation theory (more precisely, problems of best approximation by polynomials, inequalities for the derivatives of polynomials, etc.). We recall some classical results on this topic:

Theorem (of alternance, Chebyshev, 1859)

For a polynomial \( \widehat{x}(\cdot) \) of degree \( n \) to be the polynomial of best approximation, in the space \( C([a,b]) \) of continuous functions on \( [a,b] \), to a function \( \xi(\cdot)\in C([a,b]) \), it is necessary and sufficient that the difference \( \xi(\cdot)-\widehat{x}(\cdot) \) have an \( (n+2) \)-point alternance, i.e., attain its maximal and minimal values, with alternating signs, at \( n+2 \) consecutive points of \( [a,b] \).


The polynomial of degree \( n \) least deviating from zero in the metric of \( C([-1,1]) \) is the polynomial with leading coefficient \( 1 \) that has the least uniform norm on \( [-1,1] \).

Theorem (of least deviation, Chebyshev, 1854)

The polynomial least deviating from zero is unique and has the form \( 2^{1-n}T_{n}(t)=2^{1-n}\cos(n\arccos t) \), where \( T_{n}(\cdot) \) is the Chebyshev polynomial of degree \( n \).

Theorem (of polynomial extrapolation, Chebyshev, 1881)

Among all algebraic polynomials of degree at most \( n \) whose norm in \( C([-1,1]) \) is at most \( \delta>0 \), the greatest value at a point \( \tau>1 \) is taken by the polynomial \( \delta T_{n}(\cdot) \), where \( T_{n}(\cdot) \) is the Chebyshev polynomial of degree \( n \).

Theorem (Bernstein, 1912)

The following sharp inequality holds:

$$ \|\dot{y}(\cdot)\|_{C(𝕋)}\leq n\|y(\cdot)\|_{C(𝕋)}\quad\forall\,y(\cdot)\in{\mathcal{T}}_{n}, $$

where \( \mathcal{T}_{n} \) is the set of trigonometric polynomials of degree \( n \). Equality is attained at the function \( t\mapsto\sin nt \).


The classical results of Chebyshev's students, the Markov brothers and Zolotarev, also belong to this topic. These are fundamental achievements of the first stage of the theory. Each of these results was obtained by special techniques and often with great effort. Now let us try to look at all these problems using the machinery whose foundations were laid by Minkowski and which was developed in the first half of the last century (especially in the 1930s) by von Neumann, Fenchel, and Kantorovich. This area of mathematics acquired the name of convex analysis.

Convex analysis studies convex sets, convex functions, and convex extremal problems. Kantorovich was not only one of the founders of convex analysis: throughout his creative life, he was closely associated with extremal problems, as witnessed by his survey [4]. All four results listed above belong to convex extremal problems: the first two are in essence convex problems without constraints, and the other two are minimization problems for convex functions under convex constraints. Classical analysis provides necessary conditions for an extremum in smooth problems: Fermat's Theorem for problems without constraints and the Lagrange multiplier rule for problems with constraints. In convex analysis, the notion of differential involved in Fermat's Theorem is replaced by the notion of subdifferential, and, instead of the necessary condition \( f^{\prime}(\widehat{x})=0 \) for an extremum at a point \( \widehat{x} \), the criterion \( 0\in\partial f(\widehat{x}) \) for a minimum arises. For problems with constraints, the Lagrange multiplier rule, with a nonzero Lagrange multiplier at the functional being minimized, is likewise a criterion for a minimum.
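To fix the notation used below: for a convex function \( f \) on a normed space \( X \), the subdifferential at a point \( \widehat{x} \) and the minimum criterion read

$$ \partial f(\widehat{x})=\{x^{*}\in X^{*}:f(x)-f(\widehat{x})\geq\langle x^{*},x-\widehat{x}\rangle\ \forall\,x\in X\},\qquad 0\in\partial f(\widehat{x})\iff f(x)\geq f(\widehat{x})\quad\forall\,x\in X; $$

the second relation follows from the first by taking \( x^{*}=0 \) in the definition.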

We made this brief reminder to emphasize that solving a convex problem is equivalent to solving the relations that follow from the minimum criteria. In order to demonstrate a unified way of proving the results listed above, it remains only to add three general statements of convex analysis. Without formulating the exact hypotheses of the theorems (the reader is referred to [5]), we state the essence of these general results.

Under sufficiently broad conditions, the subdifferential of a sum of convex functions is equal to the sum of their subdifferentials, and the subdifferential of the maximum of convex functions is equal to the convex hull of the union of their subdifferentials (the Moreau–Rockafellar and Dubovitskii–Milyutin Theorems, respectively).

The subdifferential of the norm at zero is the unit ball of the dual space, while at a point \( x_{0}\neq 0 \) it is equal to the set of unit vectors of the dual space whose value at \( x_{0} \) equals the norm of \( x_{0} \).
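In formulas: if \( X \) is a normed space with dual \( X^{*} \) then

$$ \partial\|\cdot\|(0)=\{x^{*}\in X^{*}:\|x^{*}\|_{*}\leq 1\},\qquad\partial\|\cdot\|(x_{0})=\{x^{*}\in X^{*}:\|x^{*}\|_{*}=1,\ \langle x^{*},x_{0}\rangle=\|x_{0}\|\}\quad(x_{0}\neq 0). $$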

Theorem (of clean-up, Levin)

The subdifferential of the maximum of a family of convex functions on \( ℝ^{n} \), taken over a parameter with respect to which the family is upper semicontinuous, is equal to the subdifferential of the maximum of at most \( n+1 \) functions of the family.
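In other words, if \( f(x)=\max_{t\in T}g(t,x) \) with \( g(t,\cdot) \) convex on \( ℝ^{n} \) then, under the hypotheses of the theorem,

$$ \partial f(\widehat{x})=\operatorname{conv}\bigcup\limits_{t\in T(\widehat{x})}\partial_{x}g(t,\widehat{x}),\qquad T(\widehat{x})=\{t\in T:g(t,\widehat{x})=f(\widehat{x})\}, $$

and, by Carathéodory's Theorem, every point of this convex hull in \( ℝ^{n} \) is a convex combination of at most \( n+1 \) of the subgradients; this is what allows one to retain at most \( n+1 \) active points \( t \). In problem (1) below the dimension equals \( n+1 \), which yields the bound of \( n+2 \) points.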


Let us give a scheme of the proofs of the above results. We begin with the theorem of alternance.

Here we have to study the problem

$$ \max\limits_{t\in[a,b]}\biggl|\xi(t)-\sum\limits_{k=0}^{n}x_{k}t^{k}\biggr|\to\min,\quad x\in{ℝ}^{n+1}. $$
(1)

By the Clean-Up Theorem, it suffices to confine consideration to \( m\leq n+2 \) points \( a\leq t_{1}<\dots<t_{m}\leq b \) for which the solution to the problem

$$ f(x)=\max\limits_{1\leq i\leq m}\biggl|\xi(t_{i})-\sum\limits_{k=0}^{n}x_{k}t_{i}^{k}\biggr|\to\min,\quad x\in{ℝ}^{n+1}, $$
(1’)

is the same vector \( \widehat{x} \) as the solution to problem (1), i.e.,

$$ |\xi(t_{i})-\widehat{x}(t_{i})|=\|\xi(\cdot)-\widehat{x}(\cdot)\|_{C([a,b])}. $$

Applying the minimum criterion \( 0\in\partial f(\widehat{x}) \) and the Dubovitskii–Milyutin formula for the subdifferential of the maximum of convex functions, we arrive at the system \( \bigl\{\sum\nolimits_{i=1}^{m}\mu_{i}t_{i}^{j}=0\bigr\}_{j=0}^{n} \) of \( n+1 \) homogeneous equations with \( m \) unknowns, where \( \mu_{i}=\lambda_{i}\operatorname{sgn}(\xi(t_{i})-\widehat{x}(t_{i})) \), \( \lambda_{i}>0 \), \( \sum\lambda_{i}=1 \). If \( m\leq n+1 \) then the matrix of the system contains a nonzero Vandermonde minor of order \( m \), so that the system admits only the trivial solution; a contradiction. Hence \( m=n+2 \), the solution space of the system is one-dimensional, and the signs of the \( \mu_{i} \) alternate, which proves the criterion.  ☐
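In more detail, the final step is the following piece of linear algebra: for \( m=n+2 \) the solution space of the system \( \sum_{i=1}^{m}\mu_{i}t_{i}^{j}=0 \), \( j=0,\dots,n \), is spanned, up to a common nonzero factor, by the vector with the components

$$ \mu_{i}=(-1)^{i}\prod\limits_{\substack{j<k\\ j,k\neq i}}(t_{k}-t_{j}),\qquad i=1,\dots,n+2 $$

(the cofactor formula for the kernel of an \( (n+1)\times(n+2) \) matrix of full rank); since \( t_{1}<\dots<t_{n+2} \), all the products are positive, and the signs of the \( \mu_{i} \) indeed alternate.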

To study the least deviation polynomial, we must consider a particular case of the problem

$$ \max\limits_{t\in[a,b]}\biggl|t^{n}-\sum\limits_{k=0}^{n-1}x_{k}t^{k}\biggr|\to\min,\quad x\in{ℝ}^{n}. $$
(2)

Applying the Alternance Theorem to this problem (with \( [a,b]=[-1,1] \)), we see that, for the polynomial least deviating from zero, which we denote by \( y(\cdot) \), there exist \( n+1 \) points \( -1\leq t_{1}<\dots<t_{n+1}\leq 1 \) at which the polynomial \( y(\cdot) \) takes, with alternating signs, its minimal and maximal values, whose modulus equals \( \gamma=\|y(\cdot)\|_{C([-1,1])} \). To finish the proof, we use a remarkable trick of Chebyshev. Rolle's Theorem and the Fundamental Theorem of Algebra imply that the roots of the derivative \( y^{\prime}(\cdot) \) lie strictly inside the interval \( [-1,1] \), while the remaining extremal points are located at its endpoints. This enables us to write down a differential equation for \( y \), namely, \( (1-t^{2})y^{\prime\,2}=n^{2}(\gamma^{2}-y^{2}) \): on the left- and right-hand sides there are polynomials of degree \( 2n \) with the same leading coefficients and the same roots, counted with multiplicities. Integrating the resulting equation, we conclude that \( y=2^{1-n}\cos(n\arccos t) \).  ☐
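As a check, substituting \( t=\cos\theta \), we obtain for \( y(t)=\gamma\cos(n\arccos t) \)

$$ y^{\prime}(t)=\frac{\gamma n\sin(n\arccos t)}{\sin(\arccos t)},\qquad(1-t^{2})y^{\prime\,2}=\gamma^{2}n^{2}\sin^{2}(n\arccos t)=n^{2}(\gamma^{2}-y^{2}), $$

while the normalization of the leading coefficient to \( 1 \) gives \( \gamma=2^{1-n} \), since \( \cos(n\arccos t)=T_{n}(t) \) has leading coefficient \( 2^{n-1} \).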

Remark 1

Zolotarev generalized Chebyshev’s result about polynomials least deviating from zero by considering the problem

$$ \max\limits_{t\in[-1,1]}\biggl|t^{n}+\sigma t^{n-1}-\sum\limits_{k=0}^{n-2}x_{k}t^{k}\biggr|\to\min,\quad x\in{ℝ}^{n-1}. $$
(2’)

To Problem \( (2^{\prime}) \) one must apply the same scheme: the Alternance Theorem and then the trick reducing the problem to a differential equation. But while in Chebyshev's case the differential equation is integrated in trigonometric functions, in Zolotarev's case the equations (which depend on the parameter \( \sigma\in{ℝ} \)) are integrated either in trigonometric or in elliptic functions. The explicit expression for Zolotarev's polynomial is rather complicated, and we will not give it. The application of complex analysis significantly facilitates the solution of the problem.

Let us return to Chebyshev's theorems. In Chebyshev's extrapolation problem, we face the extremal problem with constraints

$$ f_{0}(x)=p(\tau,x)\to\max,\quad f_{1}(x)=\|p(\cdot,x)\|_{C([-1,1])}\leq\delta,\quad\tau>1. $$
(3)

Here \( x\in{ℝ}^{n+1} \), \( p(t,x)=\sum\nolimits_{k=0}^{n}x_{k}t^{k} \). The Lagrange function of the problem is as follows:

$$ \mathcal{L}(x,\overline{\lambda})=-f_{0}(x)+\lambda f_{1}(x)=-p(\tau,x)+\lambda\|p(\cdot,x)\|_{C([-1,1])}. $$

Let \( p(\cdot,\widehat{x}) \) be the solution to the problem. The Lagrange principle yields the criterion \( 0\in\partial\mathcal{L}(\widehat{x},\overline{\lambda}) \). Let \( \{t_{k}\}_{k=1}^{r} \), \( r\leq n+1 \), be the points at which \( |p(t_{k},\widehat{x})|=\|p(\cdot,\widehat{x})\|_{C([-1,1])}=\delta \). We have

$$ 0\in\partial\mathcal{L}(\widehat{x},\overline{\lambda})\Rightarrow p(\tau)=\sum\limits_{k=1}^{r}\mu_{k}p(t_{k})\quad\forall\,p(\cdot)\in\mathcal{P}_{n} $$

(the main identity), where the coefficients \( \mu_{k} \) are nonzero, \( r\leq n+1 \), and \( \mathcal{P}_{n} \) is the set of algebraic polynomials of degree at most \( n \).

If we suppose that \( r\leq n \) then, inserting the polynomial \( p_{0}(t)=\prod\nolimits_{k=1}^{r}(t-t_{k}) \) into the main identity, we obtain \( p_{0}(\tau)=\sum\nolimits_{k=1}^{r}\mu_{k}p_{0}(t_{k})=0 \), whereas \( p_{0}(\tau)\neq 0 \) since \( \tau>1 \); a contradiction. Hence \( r=n+1 \) and, as was proved in Chebyshev's Least Deviation Theorem, the solution to the problem is proportional to the Chebyshev polynomial \( T_{n}(\cdot) \). As a result, we have proved that the value of the problem is equal to \( \delta T_{n}(\tau) \).  ☐
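For \( \tau>1 \), the value of the problem admits the well-known closed form

$$ \delta T_{n}(\tau)=\delta\cosh(n\operatorname{arccosh}\tau)=\frac{\delta}{2}\Bigl((\tau+\sqrt{\tau^{2}-1})^{n}+(\tau-\sqrt{\tau^{2}-1})^{n}\Bigr). $$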

And, finally, let us infer Bernstein’s inequality.

The proof is analogous to that of the previous result if we recall that, due to the invariance of the problem under shifts, it suffices to solve the following “pointwise” problem:

$$ f_{0}(x)=\dot{y}(0,x)\to\max,\quad f_{1}(x)=\|y(\cdot,x)\|_{C({𝕋})}\leq 1. $$
(4)

Here

$$ x\in{ℝ}^{2n+1},\quad y(t,x)=\frac{x_{0}}{2}+\sum\limits_{k=1}^{n}(x_{k}\cos kt+x_{n+k}\sin kt). $$

The Lagrange function of the problem is as follows: \( \mathcal{L}(x,\overline{\lambda})=-f_{0}(x)+\lambda f_{1}(x) \).

Let \( y(\cdot,\widehat{x}) \) be the solution to the problem. The Lagrange principle yields the criterion \( 0\in\partial{\mathcal{L}}(\widehat{x},\overline{\lambda}) \). Let \( \{t_{k}\}_{k=1}^{r} \), \( r\leq 2n+2 \), be the points at which \( |y(t_{k},\widehat{x})|=\|y(\cdot,\widehat{x})\|_{C({𝕋})}=1 \). We have

$$ 0\in\partial{\mathcal{L}}(\widehat{x},\overline{\lambda})\Rightarrow\dot{y}(0)=\sum\limits_{k=1}^{r}\mu_{k}y(t_{k}) $$

(the main identity) for all \( y(\cdot)\in{\mathcal{T}}_{n} \), where the coefficients \( \mu_{k} \) are nonzero and \( r\leq 2n+2 \). The case \( r>2n \) is impossible by Rolle's Theorem (otherwise the derivative of \( y(\cdot,\widehat{x}) \), a trigonometric polynomial of degree \( n \), would have more than \( 2n \) zeros). The assumption that \( r<2n \) also leads to a contradiction: consider a trigonometric polynomial with simple zeros at the points \( \{t_{k}\}_{k=1}^{r} \) (none of which coincides with zero), at zero, and, possibly, at some further points, so that the total number of its simple zeros equals \( 2n \); inserting such a polynomial into the main identity, we obtain \( 0\neq 0 \), since the derivative of this polynomial cannot vanish at zero (a trigonometric polynomial of degree \( n \) with \( 2n \) simple zeros has no other zeros). Hence, the polynomial \( y(\cdot,\widehat{x}) \) attains its maxima and minima at exactly \( 2n \) points, i.e., satisfies the equation \( \dot{y}^{2}=n^{2}(1-y^{2}) \), because the trigonometric polynomials on the left- and right-hand sides have the same degree \( 2n \), the same zeros (double zeros at the points \( \{t_{k}\}_{k=1}^{2n} \), i.e., \( 4n \) zeros counting multiplicities), and the same leading coefficients. The resulting equation is satisfied by the function \( \sin nt \); this, together with the sufficiency of the necessary condition in this convex problem, implies that \( \sin nt \) is a solution to the problem and, in particular, that the value of the problem equals \( n \).  ☐
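Indeed, for \( y(t)=\sin nt \) we have

$$ \dot{y}(t)=n\cos nt,\qquad\dot{y}^{2}=n^{2}\cos^{2}nt=n^{2}(1-\sin^{2}nt)=n^{2}(1-y^{2}),\qquad\dot{y}(0)=n. $$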

Remark 2

The general “pointwise” problem on the \( k \)th (\( 0\leq k\leq n-1 \)) derivative of an algebraic polynomial is posed as follows:

$$ f_{0}(x)=p^{(k)}(\tau,x)\to\max,\quad f_{1}(x)=\|p(\cdot,x)\|_{C([-1,1])}\leq\delta,\quad\tau\in{ℝ}. $$
(5)

As one can see, this is a generalization of Chebyshev's extrapolation problem, and Problem (5) is dealt with in the same way. It turns out that the answer is either the Chebyshev polynomial or one of the Zolotarev polynomials. An additional small variation leads to the well-known Markov brothers' inequalities for polynomials.
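For reference, the first of these inequalities states that every algebraic polynomial \( p(\cdot) \) of degree at most \( n \) satisfies

$$ \|p^{\prime}(\cdot)\|_{C([-1,1])}\leq n^{2}\|p(\cdot)\|_{C([-1,1])}, $$

with equality for the Chebyshev polynomial \( T_{n}(\cdot) \), since \( T_{n}^{\prime}(1)=n^{2} \).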

Another topic, which we only briefly cover, is related to the Landau–Kolmogorov inequalities and their generalizations. This is a huge family of problems depending on many parameters and various initial data. Problems on inequalities for derivatives are posed both in the one-dimensional and in the multidimensional case, but exact solutions have been obtained mainly for functions of one variable. Let us touch on this in a bit more detail.

The exact solution of one-dimensional inequalities in the simplest case of two constraints is equivalent to the exact solution of extremal problems of the following form (where \( T={ℝ} \), \( {ℝ}_{+} \), or \( {𝕋} \)):

$$ \|x^{(k_{0})}(\cdot)\|_{L_{p_{0}}(T)}\to\max,\quad\|x^{(k_{1})}(\cdot)\|_{L_{p_{1}}(T)}\leq\gamma_{1},\quad\|x^{(k_{2})}(\cdot)\|_{L_{p_{2}}(T)}\leq\gamma_{2}. $$
(6)

The series of problems (6) has many parameters. An individual problem from this family will be denoted by \( P(p,k,\gamma,T) \), where \( p=(p_{0},p_{1},p_{2}) \), \( k=(k_{0},k_{1},k_{2}) \), \( \gamma=(\gamma_{1},\gamma_{2}) \), and \( T={ℝ} \), \( {ℝ}_{+} \), or \( {𝕋} \).

It is acknowledged that the first problem in this series, namely, Problem \( P(\infty,\infty,\infty,1,0,2,\gamma_{1},\gamma_{2},{ℝ}_{+}) \), was solved by Landau in 1912 (see [6]). The most brilliant among the solutions of the problems of this series is Kolmogorov's 1938 solution (see [7]) of the problem

$$ P(\infty,\infty,\infty,k_{0},k_{1},k_{2},\gamma_{1},\gamma_{2},{ℝ}),\quad k_{1}=0,\ k_{2}\geq 2,\ 1\leq k_{0}\leq k_{2}-1. $$

(This gives the whole topic the title Landau–Kolmogorov inequalities.)

In [7], a more general problem was posed in which, instead of three inequalities for derivatives in the \( L_{\infty} \)-metric, a larger number of inequalities is allowed. It is hard to hope that this general Kolmogorov problem will ever be settled completely, but there is a metric in which complete solutions can be obtained, and not only in the one-dimensional case: the \( L_{2} \)-metric on spaces with an invariant structure, in particular, on \( {𝕋}^{n} \) and \( {ℝ}^{n} \).

The crux of the matter is that, for inequalities for derivatives in the \( L_{2} \)-metric, after expansion in Fourier series or Fourier integrals, the extremal problems are reduced to problems of linear programming, whose theory was constructed by Kantorovich. Linear programming makes it possible to study the problems on inequalities for derivatives in the \( L_{2} \)-metric thoroughly: to prove existence and duality theorems, to construct effective numerical methods, and often to obtain the exact solutions in full. Let us illustrate this in the one-dimensional case, on the torus, with several inequalities.

Consider one of the simplest cases of (6), which nevertheless accounts for a significant share of the solutions in this series: Problem \( P(2,2,2,k_{0},0,k_{2},\gamma_{1},\gamma_{2},{𝕋}) \); i.e., in more detail, the problem

$$ \|x^{(k_{0})}(\cdot)\|_{L_{2}({𝕋})}\to\max,\quad\|x(\cdot)\|_{L_{2}({𝕋})}\leq\gamma_{1},\quad\|x^{(k_{2})}(\cdot)\|_{L_{2}({𝕋})}\leq\gamma_{2}. $$
(7)

Using Parseval's formula, let us rewrite the data of the problem in terms of the Fourier coefficients of \( x(\cdot) \), assuming for simplicity that the mean value of \( x(\cdot) \) equals zero. Expand this function in the Fourier series

$$ x(t)=\sum\limits_{j\in{ℕ}}(a_{j}\cos jt+b_{j}\sin jt),\quad a_{j}=\frac{1}{\pi}\int\limits_{𝕋}x(t)\cos jt\,dt,\quad b_{j}=\frac{1}{\pi}\int\limits_{𝕋}x(t)\sin jt\,dt. $$
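Recall that, for a zero-mean function with these Fourier coefficients, Parseval's formula gives

$$ \|x^{(k)}(\cdot)\|_{L_{2}({𝕋})}^{2}=\pi\sum\limits_{j\in{ℕ}}j^{2k}(a_{j}^{2}+b_{j}^{2}),\qquad k=0,1,\dots, $$

and the common factor \( \pi \) may be absorbed into the normalization of the data; this is how problem (7) takes the form (8) below.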

If we denote \( a_{j}^{2}+b_{j}^{2} \) by \( x_{j} \), \( j\in{ℕ} \), then, by Parseval's formula, problem (7) acquires the form of a linear programming problem:

$$ \sum\limits_{j\in{ℕ}}j^{2k_{0}}x_{j}\to\max,\quad\sum\limits_{j\in{ℕ}}x_{j}\leq\gamma_{1}^{2},\quad\sum\limits_{j\in{ℕ}}j^{2k_{2}}x_{j}\leq\gamma_{2}^{2}. $$
(8)

The Lagrange function of this problem leads us to consider the function \( f(z)=-z^{2k_{0}}+\lambda_{1}+\lambda_{2}z^{2k_{2}} \), \( \lambda_{1},\lambda_{2}\geq 0 \). By the minimum condition in the Lagrange multiplier rule, if this function is positive at an integer point \( j \), then the corresponding coefficient \( \widehat{x}_{j} \) of the solution to problem (8) must be zero. The function \( f \) cannot be negative at integer points (otherwise the minimum of the Lagrange function would be equal to minus infinity). Consequently, the coefficients of the solution can be nonzero only at the integer points where \( f \) vanishes. There are either one or two consecutive points of this kind, and the solution of (8) thus reduces to a very easy linear programming problem with one or two unknowns.
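To see how such problems can be treated numerically, here is a minimal sketch in which the series in (8) is truncated at \( J \) harmonics and the resulting finite-dimensional linear program is handed to a standard solver; the truncation level and the sample values of \( k_{0} \), \( k_{2} \), \( \gamma_{1} \), \( \gamma_{2} \) below are illustrative assumptions, not data from the text.

```python
# Minimal sketch of problem (8): maximize sum_j j^(2k0) x_j subject to
# sum_j x_j <= gamma1^2 and sum_j j^(2k2) x_j <= gamma2^2 with x_j >= 0,
# after truncating the series at J harmonics (an assumption for illustration).
import numpy as np
from scipy.optimize import linprog

k0, k2 = 1, 2              # orders of the derivatives (sample values)
gamma1, gamma2 = 1.0, 1.0  # right-hand sides gamma_1, gamma_2 (sample values)
J = 50                     # truncation level of the Fourier series

j = np.arange(1, J + 1)
c = -(j ** (2 * k0)).astype(float)  # linprog minimizes, so negate the objective
A_ub = np.vstack([np.ones(J),       # sum_j x_j <= gamma1^2
                  (j ** (2 * k2)).astype(float)])  # sum_j j^(2k2) x_j <= gamma2^2
b_ub = np.array([gamma1 ** 2, gamma2 ** 2])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
support = j[res.x > 1e-9]           # indices j with nonzero x_j
print("value of (8):", -res.fun)
print("support of the solution:", support)
```

In agreement with the discussion above, the computed support consists of one or two consecutive integers, namely, the integer points at which the function \( f \) vanishes.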

Solving the general version of problem (7), with several constraints, reduces to the consideration of several problems with two constraints. The majority of the Landau–Kolmogorov problems can be solved by the methods we briefly described above.

In this short note, we have tried to demonstrate that the most substantial part of classical approximation theory, which was developed by dozens of researchers from the schools of Chebyshev, Bernstein, Kolmogorov, Stechkin, et al., can be rethought on the basis of the ideas and methods of Leonid Kantorovich.