
1 Introduction

In recent years, higher-order (tensor) methods for convex optimization problems have been actively developed. The primary impulse was the work of Yu. Nesterov [23] on implementable tensor methods. He proposed a clever regularization of the Taylor approximation that makes the subproblem convex and hence implementable. Yu. Nesterov also proposed accelerated tensor methods [22, 23]; later A. Gasnikov et al. [4, 11, 12, 18] proposed a near-optimal tensor method via the Monteiro–Svaiter envelope [21] with line-search and obtained a near-optimal convergence rate up to a logarithmic factor. Since 2018–2019 interest in this topic has been rising. There are many developments in tensor methods, such as tensor methods for Hölder-continuous higher-order derivatives [15, 28], proximal methods [6], tensor methods for minimizing the gradient norm of a convex function [9, 15], inexact tensor methods [14, 19, 24], and a near-optimal composition of tensor methods for the sum of two functions [19]. There are also results on local convergence and on convergence for strongly convex functions [7, 10, 11]. See [10] for more references on applications of tensor methods.

At the very beginning of 2020, Yurii Nesterov proposed a Superfast Second-Order Method [25] that converges with the rate \(O(N^{-4})\) for convex functions with Lipschitz third-order derivative. This method uses only second-order information at each iteration, but assumes additional smoothness via the Lipschitz third-order derivative. Here we should note that for first-order methods the worst-case example cannot be improved by additional smoothness, because it is a specific quadratic function that has all higher-order derivatives bounded [24]. But for second-order methods one can see that the worst-case example does not have a Lipschitz third-order derivative. This means that under the additional assumption the classical rate \(O(N^{-7/2})\), which matches the lower bound for second-order methods, can be beaten, and Nesterov proposed such a method that converges with the rate \(O(N^{-4})\) up to a logarithmic factor. The main idea of this method is to run a third-order method in which the Taylor approximation subproblem is solved inexactly by a linearly convergent method with inexact gradients. Using inexact gradients makes it possible to replace the direct computation of the third derivative with an inexact model that uses only first-order information. Note that for non-convex problems it was previously proved that additional smoothness can speed up algorithms [1, 3, 14, 26, 29].

In this paper, we propose a Hyperfast Second-Order Method for convex functions with Lipschitz third-order derivative with the convergence rate \(O(N^{-5})\) up to a logarithmic factor. To this end, we first introduce an Inexact Near-Optimal Accelerated Tensor Method, based on the methods from [4, 19], and prove its convergence. Next, we apply the Bregman-Distance Gradient Method from [14, 25] to solve the Taylor approximation subproblem up to the desired accuracy. This leads us to the Hyperfast Second-Order Method, and we prove its convergence rate. This method has a near-optimal convergence rate for convex functions with Lipschitz third-order derivative, the best known at the moment.

The paper is organized as follows. In Sect. 2 we formulate the problem and introduce basic facts and notation. In Sect. 3 we propose the Inexact Near-Optimal Accelerated Tensor Method and prove its convergence rate. In Sect. 4 we propose the Hyperfast Second-Order Method and derive its convergence rate.

2 Problem Statement and Preliminaries

In what follows, we work in a finite-dimensional linear vector space \(E=\mathbb {R}^n\), equipped with the Euclidean norm \(\Vert \,\cdot \,\Vert =\Vert \,\cdot \,\Vert _2\).

We consider the following convex optimization problem:

$$\begin{aligned} \min \limits _{x} f(x), \end{aligned}$$
(1)

where f(x) is a convex function with Lipschitz p-th derivative, i.e.

$$\begin{aligned} \Vert D^p f(x)- D^p f(y)\Vert \le L_{p}\Vert x-y\Vert . \end{aligned}$$
(2)

Then the Taylor approximation of the function f(x) can be written as follows:

$$\begin{aligned} \varOmega _{p}(f,x;y)=f(x)+\sum _{k=1}^{p}\frac{1}{k!}D^{k}f(x)\left[ y-x \right] ^k, \, y\in \mathbb {R}^n. \end{aligned}$$
(3)

By (2) and standard integration we get the following two inequalities:

$$\begin{aligned} |f(y)-\varOmega _{p}(f,x;y)|\le \frac{L_{p}}{(p+1)!}\Vert y-x\Vert ^{p+1}, \end{aligned}$$
(4)
$$\begin{aligned} \Vert \nabla f(y)- \nabla \varOmega _{p}(f,x;y)\Vert \le \frac{L_{p}}{p!}\Vert y-x\Vert ^{p}. \end{aligned}$$
(5)
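
As a quick numerical illustration (a toy sketch, not part of the analysis), the following snippet builds the Taylor model (3) for \(p=3\) on the separable quartic \(f(x)=\tfrac{1}{4}\sum _i x_i^4\), whose third derivative is Lipschitz with \(L_3=6\), and checks the bound (4).

```python
# Numerical sanity check of the Taylor model (3) and bound (4) for p = 3
# on the toy function f(x) = 0.25 * sum(x_i^4), for which L_3 = 6.
import numpy as np

rng = np.random.default_rng(0)

def f(x):         return 0.25 * np.sum(x ** 4)
def grad(x):      return x ** 3                    # D^1 f(x)
def hess(x):      return np.diag(3.0 * x ** 2)     # D^2 f(x)
def d3_hhh(x, h): return 6.0 * np.sum(x * h ** 3)  # D^3 f(x)[h]^3

L3 = 6.0

def taylor_model(x, y):
    """Omega_3(f, x; y) from (3), written out for p = 3."""
    h = y - x
    return f(x) + grad(x) @ h + 0.5 * h @ hess(x) @ h + d3_hhh(x, h) / 6.0

x = rng.standard_normal(5)
y = x + 0.1 * rng.standard_normal(5)

lhs = abs(f(y) - taylor_model(x, y))                  # |f(y) - Omega_3(f,x;y)|
rhs = L3 / 24.0 * np.linalg.norm(y - x) ** 4          # L_3/(p+1)! * ||y-x||^{p+1}
print(lhs, "<=", rhs, lhs <= rhs + 1e-12)
```

For this particular f the bound (4) is tight in the one-dimensional case, since \(f(y)-\varOmega _3(f,x;y)=\tfrac{1}{4}(y-x)^4\).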

3 Inexact Near-Optimal Accelerated Tensor Method

Problem (1) can be solved by the tensor method [23] or its accelerated versions [4, 12, 18, 22]. These methods are based on the following basic step:

$$\begin{aligned} T_{H_p}(x) = \mathop {\mathrm {argmin}}\limits _{y} \left\{ \tilde{\varOmega }_{p,H_p}(f,x;y) \right\} , \end{aligned}$$

where

$$\begin{aligned} \tilde{\varOmega }_{p,H_p}(f,x;y) = \varOmega _{p}(f,x;y) + \frac{H_p}{p!}\Vert y - x \Vert ^{p+1}. \end{aligned}$$
(6)

For \(H_p\ge L_p\) this subproblem is convex and hence implementable.
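
For illustration, here is a minimal sketch of the basic step for \(p=3\) on the same toy quartic as above, where the regularized model (6) is minimized by a generic off-the-shelf solver; this is an assumption made for the sake of a runnable example, while in practice specialized subsolvers such as the BDGM of Sect. 4 are used.

```python
# A minimal sketch of the basic tensor step (6) for p = 3, solved numerically
# with a generic solver; the toy objective f(x) = 0.25 * sum(x_i^4) has L_3 = 6.
import numpy as np
from scipy.optimize import minimize

def f(x):         return 0.25 * np.sum(x ** 4)
def grad(x):      return x ** 3
def hess(x):      return np.diag(3.0 * x ** 2)
def d3_hhh(x, h): return 6.0 * np.sum(x * h ** 3)

L3, H3 = 6.0, 6.0                      # H_p >= L_p, so the model is convex

def regularized_model(y, x):
    """tilde{Omega}_{3,H_3}(f, x; y) = Omega_3(f, x; y) + H_3/3! * ||y - x||^4."""
    h = y - x
    omega3 = f(x) + grad(x) @ h + 0.5 * h @ hess(x) @ h + d3_hhh(x, h) / 6.0
    return omega3 + (H3 / 6.0) * np.linalg.norm(h) ** 4

def tensor_step(x):
    """T_{H_3}(x): (approximate) minimizer of the regularized Taylor model."""
    res = minimize(regularized_model, x, args=(x,), method="BFGS")
    return res.x

x0 = np.array([1.0, -2.0, 0.5])
print("model minimizer:", tensor_step(x0))
```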

But what if we cannot solve this subproblem exactly? In [25], the Inexact pth-Order Basic Tensor Method (BTMI\(_p\)) and the Inexact pth-Order Accelerated Tensor Method (ATMI\(_p\)) were introduced, with convergence rates \(O(k^{-p})\) and \(O(k^{-(p+1)})\), respectively. In this section, we introduce the Inexact pth-Order Near-Optimal Accelerated Tensor Method (NATMI\({_p}\)) with the improved convergence rate \(\tilde{O}(k^{-\frac{3p+1}{2}})\), where \(\tilde{O}(\cdot )\) hides a logarithmic factor. It is an improvement of the Accelerated Taylor Descent from [4] and a generalization of the Inexact Accelerated Taylor Descent from [19].

First, we introduce the definition of an inexact subproblem solution. Any point from the set

$$\begin{aligned} \mathcal {N}_{p,H_p}^{\gamma }(x) = \left\{ T \in \mathbb {R}^n \, : \, \Vert \nabla \tilde{\varOmega }_{p,H_p}(f,x;T) \Vert \le \gamma \Vert \nabla f(T)\Vert \right\} \end{aligned}$$
(7)

is called an inexact subproblem solution, where \(\gamma \in [0, 1]\) is an accuracy parameter; \(\mathcal {N}_{p,H_p}^{0}(x)\) corresponds to the exact solution of the subproblem.
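
In code, the acceptance test (7) can be checked as follows for \(p=3\) (a sketch on the toy quartic again; the candidate point T is arbitrary here).

```python
# A sketch of the inexactness test (7): accept T if the gradient of the
# regularized model at T is at most gamma * ||grad f(T)||.
import numpy as np

def grad(x):     return x ** 3                 # f(x) = 0.25 * sum(x_i^4)
def hess(x):     return np.diag(3.0 * x ** 2)
def d3_hh(x, h): return 6.0 * x * h ** 2       # vector D^3 f(x)[h]^2

H3 = 6.0

def grad_model(x, T):
    """Gradient of tilde{Omega}_{3,H_3}(f, x; .) at T, cf. (13)."""
    h = T - x
    grad_omega3 = grad(x) + hess(x) @ h + 0.5 * d3_hh(x, h)
    return grad_omega3 + (2.0 / 3.0) * H3 * np.linalg.norm(h) ** 2 * h

def is_inexact_solution(x, T, gamma):
    """Check T in N^gamma_{3,H_3}(x) as defined in (7)."""
    return np.linalg.norm(grad_model(x, T)) <= gamma * np.linalg.norm(grad(T))

x = np.array([1.0, -2.0, 0.5])
T = 0.8 * x                                    # some candidate point
print(is_inexact_solution(x, T, gamma=1.0 / 6.0))
```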

Next we propose Algorithm 1.

[Algorithm 1: Inexact \(p\)th-Order Near-Optimal Accelerated Tensor Method (NATMI\(_p\))]
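
For orientation, the following schematic sketch shows a Monteiro–Svaiter-type outer loop in the spirit of Accelerated Taylor Descent [4], using the acceptance condition (10); it is an illustrative assumption rather than a verbatim reproduction of Algorithm 1. The callables grad_f and inexact_tensor_step (which must return a point of \(\mathcal {N}_{p,H_p}^{\gamma }(\tilde{x}_k)\)) are assumed to be provided by the user.

```python
# Schematic sketch of a Monteiro-Svaiter-type accelerated tensor loop (an
# assumption for illustration, not the exact Algorithm 1): lambda_{k+1} is found
# by bisection so that condition (10) holds, and y_{k+1} is any inexact tensor
# step, i.e. a point of N^gamma_{p,H_p}(x_tilde).
import math
import numpy as np

def natmi_sketch(x0, grad_f, inexact_tensor_step, Hp, p, n_iters):
    A, x, y = 0.0, x0.copy(), x0.copy()
    for _ in range(n_iters):
        lo, hi = 1e-12, 1e12                       # bracket for the bisection in lambda
        for _ in range(100):
            lam = math.sqrt(lo * hi)
            a = (lam + math.sqrt(lam ** 2 + 4.0 * lam * A)) / 2.0  # a^2 = lam * (A + a)
            A_next = A + a
            x_tilde = (A / A_next) * y + (a / A_next) * x
            y_next = inexact_tensor_step(x_tilde)  # any point of N^gamma(x_tilde), cf. (7)
            r = np.linalg.norm(y_next - x_tilde)
            val = lam * Hp * r ** (p - 1) / math.factorial(p - 1)
            if val < 0.5:                          # condition (10) violated from below
                lo = lam
            elif val > p / (p + 1.0):              # condition (10) violated from above
                hi = lam
            else:
                break
        x = x - a * grad_f(y_next)                 # update of the "dual" sequence, as in [4]
        y, A = y_next, A_next
    return y
```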

To derive the convergence rate of Algorithm 1, we prove additional lemmas. The first lemma provides an intermediate inequality connecting the inexactness of the subproblem solution with the analysis of the method.

Lemma 1

If \(y_{k+1} \in \mathcal {N}_{p,H_p}^{\gamma }(\tilde{x}_k) \), then

$$\begin{aligned} \Vert \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}) \Vert \le \frac{\gamma }{1-\gamma } \cdot \frac{(p+1)H_p+L_p}{p!}\Vert y_{k+1}-\tilde{x}_k\Vert ^p. \end{aligned}$$
(9)

Proof

From the triangle inequality we get

$$\begin{aligned} \Vert \nabla f(y_{k+1}) \Vert&\le \Vert \nabla f(y_{k+1}) - \nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1}) \Vert \\&+ \Vert \nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1})-\nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}) \Vert + \Vert \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}) \Vert \\&\overset{(5),(6),(7)}{\le } \frac{L_p}{p!}\Vert y_{k+1}-\tilde{x}_k \Vert ^{p}+ \frac{(p+1)H_p}{p!}\Vert y_{k+1} - \tilde{x}_k \Vert ^{p} + \gamma \Vert \nabla f(y_{k+1}) \Vert . \end{aligned}$$

Hence,

$$\begin{aligned} (1-\gamma )\Vert \nabla f(y_{k+1}) \Vert \le \frac{(p+1)H_p+L_p}{p!}\Vert y_{k+1} - \tilde{x}_k \Vert ^{p}. \end{aligned}$$

Finally, from (7) we get

$$\begin{aligned} \Vert \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}) \Vert \le \frac{\gamma }{1-\gamma } \cdot \frac{(p+1)H_p+L_p}{p!}\Vert y_{k+1} - \tilde{x}_k \Vert ^{p}. \end{aligned}$$

The next lemma plays a crucial role in the proof of convergence of Algorithm 1. It is a generalization of Lemma 3.1 from [4] to the inexact subproblem.

Lemma 2

If \(y_{k+1} \in \mathcal {N}_{p,H_p}^{\gamma }(\tilde{x}_k) \) and \(H_p=\xi L_p\), where \(\xi \) and \(\gamma \) are such that \(1 \ge 2\gamma +\frac{1}{\xi (p+1)}\), and

$$\begin{aligned} \frac{1}{2} \le \lambda _{k+1} \frac{H_p \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}}{(p-1)!} \le \frac{p}{p+1} \, , \end{aligned}$$
(10)

then

$$\begin{aligned} \Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} \nabla f(y_{k+1})) \Vert \le \sigma \cdot \Vert y_{k+1}-\tilde{x}_k\Vert \, , \end{aligned}$$
(11)

with

$$\begin{aligned} \sigma \ge \frac{p \xi + 1 -\xi +2\gamma \xi }{(1-\gamma )2p \xi }, \end{aligned}$$
(12)

where \(\sigma \le 1\).

Proof

Note that, by definition,

$$\begin{aligned} \begin{aligned} \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1})&= \nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1})\\&+ \frac{H_p(p+1)}{p!}\Vert y_{k+1} - \tilde{x}_k \Vert ^{p-1} (y_{k+1}-\tilde{x}_k). \end{aligned} \end{aligned}$$
(13)

Hence,

$$\begin{aligned} \begin{aligned} y_{k+1}-\tilde{x}_k&= \frac{p!}{H_p(p+1)\Vert y_{k+1} - \tilde{x}_k \Vert ^{p-1}} \\&\cdot \left( \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}) - \nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1})\right) . \end{aligned} \end{aligned}$$
(14)

Then, by the triangle inequality, we get

$$\begin{aligned}&\Vert y_{k+1} - (\tilde{x}_k - \lambda _{k+1} \nabla f(y_{k+1})) \Vert = \Vert \lambda _{k+1} (\nabla f(y_{k+1})- \nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1}))\\&+\lambda _{k+1}\nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1})\\&+ \left. \left( y_{k+1} - \tilde{x}_k + \lambda _{k+1}(\nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1})-\nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1}))\right) \right\| \\&\overset{(5),(14)}{\le } \lambda _{k+1} \frac{L_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^p + \lambda _{k+1}\Vert \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1})\Vert \\&+ \left| \lambda _{k+1} - \frac{p!}{H_p \cdot (p+1) \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}} \right| \\&\cdot \Vert \nabla \tilde{\varOmega }_{p,H_p}(f,\tilde{x}_k;y_{k+1})-\nabla \varOmega _{p}(f,\tilde{x}_k;y_{k+1})\Vert \end{aligned}$$
$$\begin{aligned}&\overset{(9),(13)}{\le } \Vert y_{k+1} - \tilde{x}_k\Vert \left( \lambda _{k+1} \frac{L_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} \right. \\&+\left. \lambda _{k+1}\frac{\gamma }{1-\gamma } \cdot \frac{(p+1)H_p+L_p}{p!}\Vert y_{k+1}-\tilde{x}_k\Vert ^{p-1}\right) \\&+ \left| \lambda _{k+1} - \frac{p!}{H_p \cdot (p+1) \cdot \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1}} \right| \cdot \frac{(p+1)H_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p} \end{aligned}$$
$$\begin{aligned}&=\Vert y_{k+1} - \tilde{x}_k\Vert \left( \frac{\lambda _{k+1}}{p!} \left( L_p + \frac{\gamma }{1-\gamma } ((p+1)H_p+L_p) \right) \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} \right) \\&+ \Vert y_{k+1} - \tilde{x}_k\Vert \left| \frac{\lambda _{k+1}(p+1)H_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} - 1\right| \end{aligned}$$
$$\begin{aligned}&\overset{(10)}{\le }\Vert y_{k+1} - \tilde{x}_k\Vert \left( \frac{\lambda _{k+1}}{p!} \left( L_p + \frac{\gamma }{1-\gamma } ((p+1)H_p+L_p) \right) \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} \right) \\&+ \Vert y_{k+1} - \tilde{x}_k\Vert \left( 1-\frac{\lambda _{k+1}(p+1)H_p}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} \right) \\&=\Vert y_{k+1} - \tilde{x}_k\Vert \left( 1 + \frac{\lambda _{k+1}}{p!} \Vert y_{k+1} - \tilde{x}_k\Vert ^{p-1} \right. \\&\cdot \left. \left( L_p - (p+1)H_p + \frac{\gamma }{1-\gamma } ((p+1)H_p+L_p) \right) \right) . \end{aligned}$$

Hence, by (10) and simple calculations we get

$$\begin{aligned} \sigma&\ge 1 + \frac{1}{2p H_p} \left( L_p - (p+1)H_p + \frac{\gamma }{1-\gamma } ((p+1)H_p+L_p) \right) \\&= 1 + \frac{1}{2p \xi } \left( 1 - (p+1)\xi + \frac{\gamma }{1-\gamma } ((p+1)\xi +1) \right) \\&= 1 + \frac{1}{2p \xi } \left( 1 - p\xi -\xi + \frac{\gamma p\xi +\gamma \xi +\gamma }{1-\gamma } \right) \\&= 1 + \frac{1}{2p \xi } \left( \frac{1 - p\xi -\xi - \gamma + \gamma p\xi +\gamma \xi +\gamma p\xi +\gamma \xi +\gamma }{1-\gamma } \right) \\&= 1 + \left( \frac{1 - p\xi -\xi + 2\gamma p\xi +2\gamma \xi }{(1-\gamma )2p \xi } \right) \\&= \frac{p \xi + 1 -\xi +2\gamma \xi }{(1-\gamma )2p \xi }. \end{aligned}$$

Lastly, we prove that \(\sigma \le 1\). For that we need

$$\begin{aligned} (1-\gamma )2p \xi&\ge p \xi + 1 -\xi +2\gamma \xi \\ (p+1) \xi&\ge 1 +2\gamma \xi (1+p)\\ \frac{1}{2}-\frac{1}{2\xi (p+1)}&\ge \gamma . \end{aligned}$$
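For example, for the parameters used in Sect. 4, \(p=3\) and \(\xi =\tfrac{3}{2}\), this condition reads

$$\begin{aligned} \gamma \le \frac{1}{2}-\frac{1}{2\cdot \frac{3}{2}\cdot 4}=\frac{1}{2}-\frac{1}{12}=\frac{5}{12}, \end{aligned}$$

so the choice \(\gamma =1/6\) made there is admissible.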

We have proved the main lemma needed for the convergence rate theorem; the other parts of the proof are the same as in [4]. As a result, we get the following theorem.

Theorem 1

Let f be a convex function whose \(p\)-th derivative is \(L_p\)-Lipschitz, and let \(x_{*}\) denote a minimizer of f. Then Algorithm 1 converges with the rate

$$\begin{aligned} f(y_k) - f(x_{*}) \le \tilde{O}\left( \frac{H_p R^{p+1}}{k^{\frac{ 3p +1}{2}}}\right) \,, \end{aligned}$$
(15)

where

$$\begin{aligned} R=\Vert x_0 - x_{*}\Vert \end{aligned}$$
(16)

is the distance from the starting point to the solution.

4 Hyperfast Second-Order Method

In the recent work [25] it was mentioned that for the convex optimization problem (1) with a first-order oracle (returning the gradient) the well-known complexity bound \(\left( L_{1}R^2/\varepsilon \right) ^{1/2}\) cannot be beaten even if we assume that all \(L_{p} < \infty \). This is because of the structure of the worst-case function

$$f_p(x) = |x_1|^{p+1} + |x_2 - x_1|^{p+1} + ... + |x_n - x_{n-1}|^{p+1},$$

where \(p = 1\) for first-order methods. It is obvious that \(f_p(x)\) satisfies the condition \(L_{p} < \infty \) for all natural p, so additional smoothness assumptions do not allow any further acceleration. The same holds, for example, for \(p=3\): in this case we also have \(L_{p} < \infty \) for all natural p. But what about \(p=2\)? In this case \(L_3 = \infty \). It means that \(f_2(x)\) cannot be the proper worst-case function for second-order methods under additional smoothness assumptions. So the following question arises: Is it possible to improve the bound \(\left( L_{2}R^3/\varepsilon \right) ^{2/7}\)? At the very beginning of 2020 Yu. Nesterov gave a positive answer. For this purpose, he proposed to implement the accelerated third-order method [23], which requires \(\tilde{O}\left( (L_{3}R^4/\varepsilon )^{1/4}\right) \) iterations, by using only a second-order oracle [25]. All this means that if \(L_3 < \infty \), then there are methods that can be much faster than \(\tilde{O}\left( \left( L_{2}R^3/\varepsilon \right) ^{2/7}\right) \).
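
The following toy snippet (an illustration only, using a finite-difference probe of the third derivative) makes this concrete: for \(p=2\) the one-dimensional term \(|t|^{3}\) has third derivative \(6\,\mathrm {sign}(t)\), which jumps at zero, so \(L_3=\infty \), whereas for \(p=3\) the term \(t^4\) has third derivative \(24t\), which is 24-Lipschitz.

```python
# The worst-case family f_p(x) = |x_1|^{p+1} + sum_i |x_{i+1} - x_i|^{p+1} and a
# finite-difference probe of the third derivative of a single term |t|^{p+1}.
import numpy as np

def f_p(x, p):
    diffs = np.diff(x, prepend=0.0)              # x_1, x_2 - x_1, ..., x_n - x_{n-1}
    return np.sum(np.abs(diffs) ** (p + 1))

def third_derivative_1d(t, p, h=1e-3):
    """Central finite difference of d^3/dt^3 |t|^{p+1}."""
    g = lambda s: np.abs(s) ** (p + 1)
    return (g(t + 2*h) - 2*g(t + h) + 2*g(t - h) - g(t - 2*h)) / (2 * h ** 3)

print("f_2 at the first basis vector:", f_p(np.array([1.0, 0.0, 0.0]), p=2))
for p in (2, 3):
    left, right = third_derivative_1d(-0.05, p), third_derivative_1d(0.05, p)
    print(f"p={p}: D^3|t|^{p+1} near 0-: {left:.2f}, near 0+: {right:.2f}")
# For p=2 the probe prints values near -6 and +6 (a jump of size 12 across an
# arbitrarily small interval), while for p=3 it prints about -1.2 and +1.2,
# consistent with the 24-Lipschitz third derivative 24*t.
```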

In this section, we improve the convergence rate and reach a near-optimal rate up to a logarithmic factor. We consider problem (1) with \(p=3\), hence \(L_3<\infty \). In the previous section, we proved that Algorithm 1 converges. Now we fix the parameters of this method:

$$\begin{aligned} p=3,\quad \gamma =\frac{1}{2p}=\frac{1}{6}, \quad \xi = \frac{2p}{p+1}=\frac{3}{2}. \end{aligned}$$
(17)

By (12) we get \(\sigma = 0.6\), which is rather close to the exact-solution value \(\sigma _{0}=0.5\). For these parameters, the number of iterations of Algorithm 1 needed to reach accuracy \(\varepsilon \) is

$$\begin{aligned} N_{out}= \tilde{O}\left( \left( \frac{L_3 R^{4}}{\varepsilon }\right) ^{\frac{1}{5}}\right) . \end{aligned}$$
(18)
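
The value \(\sigma =0.6\) above follows by substituting (17) into the right-hand side of (12):

$$\begin{aligned} \sigma = \frac{p \xi + 1 -\xi +2\gamma \xi }{(1-\gamma )2p \xi } = \frac{3\cdot \frac{3}{2} + 1 - \frac{3}{2} + 2\cdot \frac{1}{6}\cdot \frac{3}{2}}{\left( 1-\frac{1}{6}\right) \cdot 2\cdot 3\cdot \frac{3}{2}} = \frac{4.5}{7.5}=0.6 . \end{aligned}$$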

Note that at every step of Algorithm 1 we need to solve the following subproblem with accuracy \(\gamma =1/6\):

$$\begin{aligned} \begin{aligned} \mathop {\mathrm {argmin}}\limits _{y}&\left\{ \left\langle \nabla f(x_i),y-x_i\right\rangle +\frac{1}{2}\nabla ^2 f(x_i)[y-x_i]^2\right. \\&+ \left. \frac{1}{6}D^3f(x_i)[y-x_i]^3 + \frac{L_3}{4}\Vert y - x_i \Vert ^{4} \right\} . \end{aligned} \end{aligned}$$
(19)
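
Note that the regularization coefficient in (19) agrees with (6): for \(p=3\) and \(H_3=\xi L_3=\tfrac{3}{2}L_3\) from (17),

$$\begin{aligned} \frac{H_3}{3!}\Vert y - x_i \Vert ^{4} = \frac{3L_3}{2\cdot 6}\Vert y - x_i \Vert ^{4} = \frac{L_3}{4}\Vert y - x_i \Vert ^{4}. \end{aligned}$$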

In [14] it was proved that problem (19) can be solved by the Bregman-Distance Gradient Method (BDGM) with a linear convergence rate. According to [25], BDGM can be adapted to work with inexact gradients of the objective. This makes it possible to approximate \(D^3 f(x)\) by gradients and to avoid computing \(D^3 f(x)\) at each step. As a result, in [25] it was proved that subproblem (19) can be solved up to accuracy \(\gamma = 1/6\) with one computation of the Hessian and \(O\left( \log \left( \frac{\Vert \nabla f(x_i)\Vert +\Vert \nabla ^2 f(x_i)\Vert }{\varepsilon }\right) \right) \) computations of the gradient.

We use BDGM to solve the subproblem from Algorithm 1 and, as a result, we get the following Hyperfast Second-Order Method as a merging of NATMI and BDGM.

[Algorithm 2: Hyperfast Second-Order Method]

In Algorithm 3, \(\beta _{\rho _k}(z_i,z)\) is the Bregman distance generated by \(\rho _k(z)\):

$$\begin{aligned} \beta _{\rho _k}(z_i,z)=\rho _k(z) - \rho _k(z_i) -\left\langle \nabla \rho _k(z_i), z-z_i \right\rangle . \end{aligned}$$

By \(g_{\varphi _k,\tau }(z)\) we denote an inexact gradient of subproblem (19):

$$\begin{aligned} g_{\varphi _k,\tau }(z)= \nabla f(\tilde{x}_k) +\nabla ^2 f(\tilde{x}_k)[z-\tilde{x}_k]+ \frac{1}{2} g^{\tau }_{\tilde{x}_k}(z) + L_3\Vert z - \tilde{x}_k \Vert ^{2} (z - \tilde{x}_k) \end{aligned}$$
(22)

and \(g^{\tau }_{\tilde{x}_k}(z)\) is an inexact (finite-difference) approximation of \(D^3f(\tilde{x}_k)[z-\tilde{x}_k]^2\):

$$\begin{aligned} g^{\tau }_{\tilde{x}_k}(z)= \frac{1}{\tau ^2}\left( \nabla f(\tilde{x}_k+\tau (z-\tilde{x}_k))+ \nabla f(\tilde{x}_k-\tau (z-\tilde{x}_k))-2\nabla f(\tilde{x}_k)\right) . \end{aligned}$$
(23)
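
As a toy sanity check (an illustration on \(f(x)=\tfrac{1}{4}\sum _i x_i^4\), for which the i-th component of \(D^3f(x)[h]^2\) equals \(6x_ih_i^2\) and \(L_3=6\)), the snippet below verifies that the finite-difference term (23) reproduces \(D^3 f(\tilde{x}_k)[z-\tilde{x}_k]^2\), so that (22) matches the exact gradient of the objective in (19) up to rounding.

```python
# Toy check that the finite-difference term (23) reproduces D^3 f(x)[z-x]^2,
# and hence that (22) matches the exact gradient of the subproblem objective (19).
import numpy as np

def grad(x): return x ** 3                         # f(x) = 0.25 * sum(x_i^4)
def hess(x): return np.diag(3.0 * x ** 2)
L3 = 6.0

def g_tau(x, z, tau):
    """Finite-difference approximation (23) of D^3 f(x)[z - x]^2."""
    h = z - x
    return (grad(x + tau * h) + grad(x - tau * h) - 2.0 * grad(x)) / tau ** 2

def g_inexact(x, z, tau):
    """Inexact gradient (22) of the subproblem (19) at z."""
    h = z - x
    return grad(x) + hess(x) @ h + 0.5 * g_tau(x, z, tau) + L3 * np.linalg.norm(h) ** 2 * h

def g_exact(x, z):
    h = z - x
    d3_hh = 6.0 * x * h ** 2                       # exact D^3 f(x)[h]^2 for the quartic
    return grad(x) + hess(x) @ h + 0.5 * d3_hh + L3 * np.linalg.norm(h) ** 2 * h

x = np.array([1.0, -2.0, 0.5])
z = x + np.array([0.3, -0.1, 0.2])
print(np.linalg.norm(g_inexact(x, z, tau=1e-3) - g_exact(x, z)))  # ~1e-9 (rounding only)
```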

In [25] it is proved that we can choose

$$\delta =O\left( \frac{\varepsilon ^{\frac{3}{2}}}{\Vert \nabla f(\tilde{x}_k)\Vert ^{\frac{1}{2}}_{*}+\Vert \nabla ^2 f(\tilde{x}_k)\Vert ^{\frac{3}{2}}/L_3^{\frac{1}{2}}} \right) ,$$
[Algorithm 3: Bregman-Distance Gradient Method (BDGM) for subproblem (19)]

so that the total number of inner iterations is equal to

$$\begin{aligned} T_k(\delta )=O\left( \ln {\frac{G+H}{\varepsilon }}\right) , \end{aligned}$$
(24)

where G and H are uniform upper bounds for the norms of the gradients and Hessians computed at the points generated by the main algorithm. Finally, we get the following theorem.

Theorem 2

Let f be a convex function whose third derivative is \(L_3\)-Lipschitz, and let \(x_{*}\) denote a minimizer of f. Then, to reach accuracy \(\varepsilon \), Algorithm 2 with Algorithm 3 for solving the subproblem computes

$$\begin{aligned} N_{1}=\tilde{O}\left( \left( \frac{L_3 R^{4}}{\varepsilon }\right) ^{\frac{1}{5}}\right) \end{aligned}$$
(25)

Hessians and

$$\begin{aligned} N_{2}=\tilde{O}\left( \left( \frac{L_3 R^{4}}{\varepsilon }\right) ^{\frac{1}{5}}\log \left( \frac{G+H}{\varepsilon }\right) \right) \end{aligned}$$
(26)

gradients, where G and H are the uniform upper bounds for the norms of the gradients and Hessians computed at the points generated by the main algorithm.

One can generalize this result to uniformly (strongly) convex functions by using the inverse restart-regularization trick from [13].

So, the main observation of this section is as follows: if \(L_3 < \infty \), then one can use this hyperfast second-order algorithm instead of the optimal second-order one considered above to obtain faster methods (in the convex and uniformly convex cases).

5 Conclusion

In this paper, we present the Inexact Near-Optimal Accelerated Tensor Method and improve its convergence rate. This improvement makes it possible to solve the Taylor approximation subproblem by other methods. Next, we propose the Hyperfast Second-Order Method and obtain its convergence rate \(O(N^{-5})\) up to a logarithmic factor. This method is a combination of the Inexact Third-Order Near-Optimal Accelerated Tensor Method with the Bregman-Distance Gradient Method for solving the inner subproblem. As a result, we prove that our method has a near-optimal convergence rate for the given problem class, the best known at the moment.

In this paper, we have developed a near-optimal Hyperfast Second-Order Method for sufficiently smooth convex problems in terms of convergence in function value. Based on the technique from [9], one can also develop a near-optimal Hyperfast Second-Order Method for sufficiently smooth convex problems in terms of convergence in the norm of the gradient. In particular, based on [16] one may show that the complexity of this approach applied to the dual problem of the entropy-regularized optimal transport problem is \(\tilde{O}\left( \left( (\sqrt{n})^{4}/\varepsilon \right) ^{1/5}\right) \cdot O(n^{2.5}) = O(n^{2.9}\varepsilon ^{-1/5})\) a.o., where n is the linear dimension of the transport plan matrix. This could be better than the complexity of the accelerated gradient method and the accelerated Sinkhorn algorithm, \(O(n^{2.5}\varepsilon ^{-1/2})\) a.o. [8, 16]. Note that the best theoretical bounds for this problem are still far from being practical [2, 17, 20, 27].