
1 Introduction

Recently, distributed optimization problems have received extensive attention, and distributed consensus algorithms are a natural tool for solving them. In this paper, we improve the consensus algorithm to solve the distributed optimization problem, where each agent has access to one local cost function \(f_i(x)\), and all agents collaboratively minimize the global function \(\frac{1}{n}\sum \limits _{i=1}^n f_i(x)\) by exchanging information with each other. This paper focuses on the case of unbalanced directed graphs. Early works on distributed optimization mainly included distributed gradient descent [1] and distributed dual averaging [2] over undirected graphs. It was proved that the optimal value could be found at a rate of \(O(\frac{\ln k}{\sqrt{k}})\) for any convex function, and at a rate of \(O(\frac{\ln k}{k})\) for any strongly convex function, where k is the number of iterations. Under the conditions of strong convexity and Lipschitz continuous gradients, algorithms with faster convergence were developed. For example, one algorithm with a constant step size converges geometrically to an error ball around the optimal solution, while another method requires symmetric weights to achieve global geometric convergence. In [4,5,6], inexact gradient methods and gradient estimation methods were introduced to deal with this problem.

The aforementioned methods were all designed for undirected graphs. When the communication capabilities between agents are not symmetric, algorithms for undirected graphs are no longer applicable, so algorithms suitable for directed graphs need to be developed. The push-sum method and the DGD (distributed subgradient descent) method were introduced for directed graphs in [7,8,9,10]. However, the diminishing step size resulted in a relatively slow convergence rate. In [11], the objective function was assumed to have a Lipschitz continuous gradient and to be strongly convex, and it was shown that all agents converge to the optimal value at a geometric rate. By combining a row-stochastic matrix and a column-stochastic matrix, another type of algorithm was proposed [12,13,14], where the row-stochastic matrix ensures the consensus of the algorithm and the column-stochastic matrix guarantees optimality. In [12, 13], the cases of fixed and time-varying strongly connected graphs were considered, based on which the heavy-ball momentum term was introduced to improve the convergence rate of the algorithms.

For second-order and heterogeneous multi-agent systems, some improved algorithms were also proposed in [16, 17]. Inspired by [12], we study an improved fully distributed algorithm to optimize all objective functions in a distributed manner, where a momentum term is borrowed to improve the convergence rate. It is shown that the state of every agent converges to the optimal solution of the objective function by exploiting the properties of the stochastic weight matrices.

\(I_n\) represents the n-dimensional identity matrix, and \(1_n\) represents a column vector whose components are all ones. \(\rho (X)\) represents the spectral radius of the matrix X, and \(X_{\infty }\) represents the limit of the matrix powers \(X^k\) as \(k\rightarrow \infty \). For the row-stochastic matrix A, let \(\pi _r\) and \(1_n\) represent the left and right eigenvectors of A associated with eigenvalue 1, respectively, such that \(\pi _r^T1_n=1\). Similarly, for the column-stochastic matrix B, let \(1_n\) and \(\pi _c\) represent the left and right eigenvectors of B associated with eigenvalue 1, respectively, such that \(\pi _c^T1_n=1\). \(\Arrowvert \cdot \Arrowvert _2\) represents the 2-norm of a vector, and \({{\left| \left| \left| \cdot \right| \right| \right| }}_2\) represents the spectral norm of a matrix.

2 Graph Theory Foundation and Problem Description

\(\mathcal {G} = (\mathcal {V},\mathcal {E})\) denotes a directed graph, where \(\mathcal {V}=\{1,2,\dots ,n\}\) represents the set of network agents, and \(\mathcal {E}\) represents the set of edges between agents in the network. (j, i) or \(j\rightarrow i\) indicates that there is a directed edge that transmits information from agent j to agent i. If for every pair of agents i, j there is a directed path \((i,i_{s1}),(i_{s1},i_{s2}),\dots ,(i_{sk},j)\), then the graph is called strongly connected. In addition, \(N_i^{in}=\{j\mid (j, i)\in \mathcal {E}\}\) represents the in-neighbor set of agent i, that is, the set of agents from which agent i can receive information. Similarly, \(N_i^{out}=\{j\mid (i,j)\in \mathcal {E}\}\) represents the out-neighbor set of agent i, that is, the set of agents that can receive information from agent i. Note that both \(N_i^{in}\) and \(N_i^{out}\) contain node i.

In the distributed convex optimization problem, each agent i has access to a local decision variable and a convex cost function \(f_i(x)\). The goal is to minimize the following global objective function:

$$\begin{aligned} \min \limits _{x\in \mathbb {R}^m}\ f(x)=\frac{1}{n}\sum \limits _{i=1}^n f_i(x) \end{aligned}$$
(1)

Each agent i can only obtain its own cost function \(f_i(x)\). Assume that each cost function is strongly convex and has a Lipschitz continuous gradient.

Assumption 1

\(\mathcal {G}\) is a directed, strongly connected graph.

Assumption 2

The gradient of the objective function of each agent satisfies the Lipschitz condition, that is, for any agent i and any \(x, y \in \mathbb {R}^m\), there is a constant \(l_i\) such that:

$$\begin{aligned} \Arrowvert \nabla f_i(x) - \nabla f_i(y) \Arrowvert \le l_i\Arrowvert x - y \Arrowvert \end{aligned}$$
(2)

Assumption 3

The cost function of each agent is strongly convex, that is, for any agent i and any \(x, y \in \mathbb {R}^m\), there is a positive constant \(\mu \) such that:

$$\begin{aligned} f_i(x)-f_i(y) \le \nabla f_i(x)^\mathrm{T} (x - y)-\frac{\mu }{2} \Arrowvert x-y \Arrowvert _2^2 \end{aligned}$$
(3)

Remark 1:

Assumptions 2 and 3 ensure that the global optimal solution \(x^{*}\) exists and is unique. Assumption 3 is also used in the subsequent proof of the convergence of the algorithm.
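For intuition, the following sketch (an illustration added here, not part of the original development) checks Assumptions 2 and 3 numerically for a randomly generated quadratic local cost \(f_i(x)=\frac{1}{2}x^TQ_ix+b_i^Tx\), for which the constants can be taken as \(l_i=\lambda _{\max }(Q_i)\) and \(\mu =\lambda _{\min }(Q_i)\); the matrix \(Q_i\) and vector \(b_i\) are assumed purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3

# f_i(x) = 0.5 x^T Q x + b^T x with Q symmetric positive definite
M = rng.standard_normal((m, m))
Q = M @ M.T + np.eye(m)            # adding I keeps Q strongly positive definite
b = rng.standard_normal(m)
f = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b

l_i = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient (Assumption 2)
mu = np.linalg.eigvalsh(Q).min()   # strong-convexity constant (Assumption 3)

# Empirical check of (2) and (3) on random pairs of points
for _ in range(100):
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    assert np.linalg.norm(grad(x) - grad(y)) <= l_i * np.linalg.norm(x - y) + 1e-9
    assert f(x) - f(y) <= grad(x) @ (x - y) - 0.5 * mu * np.linalg.norm(x - y) ** 2 + 1e-9

print("l_i =", l_i, "mu =", mu)
```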

3 Algorithm Design

We propose the following algorithm to solve problem (1). Each agent i in the network maintains two variables \(x_{i,k}, s_{i,k}\in \mathbb {R}^m\), where \(i \in \mathcal {V}\) and k denotes the iteration step. The initial state satisfies \(s_{i,0}=\nabla f_i(x_{i,0}),i \in \mathcal {V}\).

$$\begin{aligned} x_{i,k+1}=\sum \limits _{j=1}^n a_{ij}x_{j,k}-\alpha _is_{i,k}+\beta [\sum \limits _{j=1}^na_{ij}(x_{j,k}-x_{i,k})]_- \end{aligned}$$
(4a)
$$\begin{aligned} s_{i,k+1}=\sum \limits _{j=1}^n b_{ij}[s_{j,k}+\nabla f_j(x_{j,k+1})-\nabla f_j(x_{j,k})] \end{aligned}$$
(4b)

where \(\alpha _i\) and \(\beta \) are both positive constants. The weights \(a_{ij}\) and \(b_{ij}\) satisfy the following:

$$\begin{aligned} a_{ij}=\left\{ \begin{aligned} >0&,&j\in N_i^{in},\\ 0&,&j\notin N_i^{in}, \end{aligned} \right. \quad \sum \limits _{j=1}^n a_{ij}=1, \forall i, \end{aligned}$$
(5)
$$\begin{aligned} b_{ij}=\left\{ \begin{aligned} >0&,&i\in N_j^{out},\\ 0&,&i\notin N_j^{out}, \end{aligned} \right. \quad \sum \limits _{i=1}^n b_{ij}=1, \forall j, \end{aligned}$$
(6)

\(\bar{A}=\{a_{ij}\}\) denotes the resulting row-stochastic matrix, and \(\bar{B}=\{b_{ij}\}\) denotes the column-stochastic matrix.
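As a concrete choice satisfying (5) and (6) — one common option used for illustration, not a requirement of the algorithm — each agent may weight its in-neighbors uniformly and split its outgoing weight uniformly over its out-neighbors, as sketched below (self-loops are included, consistently with \(i\in N_i^{in}\cap N_i^{out}\)).

```python
import numpy as np

def build_weights(adj):
    """adj[i, j] = 1 if the edge j -> i exists (j sends to i); diagonal set to 1.

    Returns a row-stochastic A_bar satisfying (5) and a column-stochastic
    B_bar satisfying (6), using uniform weights as one simple choice.
    """
    n = adj.shape[0]
    A_bar = np.zeros((n, n))
    B_bar = np.zeros((n, n))
    for i in range(n):
        in_nbrs = np.nonzero(adj[i, :])[0]        # j in N_i^in
        A_bar[i, in_nbrs] = 1.0 / len(in_nbrs)    # row i sums to 1
    for j in range(n):
        out_nbrs = np.nonzero(adj[:, j])[0]       # i in N_j^out
        B_bar[out_nbrs, j] = 1.0 / len(out_nbrs)  # column j sums to 1
    return A_bar, B_bar
```

Note that agent i only needs its own in-neighbor set to build row i of \(\bar{A}\), and agent j only needs its own out-degree to build column j of \(\bar{B}\), so this construction is fully distributed.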

Denote \(x_k=[x_{1,k}^T,\dots ,x_{n,k}^T]^T\), \(s_k=[s_{1,k}^T,\dots ,s_{n,k}^T]^T\), \(\nabla f(x_k)=[\nabla f_1(x_{1,k})^T,\dots ,\nabla f_n(x_{n,k})^T]^T\). Let \(A=\bar{A}\otimes I_m\) and \(B=\bar{B}\otimes I_m\); then Eq. (4) can be rewritten in the following compact form:

$$\begin{aligned} x_{k+1}=Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_- \end{aligned}$$
(7a)
$$\begin{aligned} s_{k+1}=B[s_{k}+\nabla f(x_{k+1})-\nabla f(x_{k})] \end{aligned}$$
(7b)

where \(D_\alpha \) is a diagonal matrix whose diagonal elements are \(\alpha _i\) and whose other elements are 0, \(s_0=\nabla f(x_0)\), and \(x_0\) is arbitrary.
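A minimal simulation sketch of iteration (7) for scalar states (\(m=1\)) is given below. It is an illustration only: the operator \([\cdot ]_-\) is not defined explicitly in the text, and the sketch interprets \(\beta [Ax_k-x_k]_-\) as the consensus correction evaluated at the previous iteration \(k-1\) (a heavy-ball-style momentum term) and takes it to be zero at \(k=0\); this reading, the helper names, and the generic argument list are assumptions.

```python
import numpy as np

def run(A_bar, B_bar, grads, x0, alpha, beta, iters):
    """Sketch of update (7a)-(7b) for scalar agent states.

    A_bar: row-stochastic weights, B_bar: column-stochastic weights,
    grads: list of gradient callables (one per agent),
    alpha: per-agent step sizes (vector or scalar), beta: momentum coefficient.
    """
    n = len(grads)
    x = np.array(x0, dtype=float)
    s = np.array([g(xi) for g, xi in zip(grads, x)])   # s_0 = grad f(x_0)
    prev_corr = np.zeros(n)                            # assumed meaning of [Ax - x]_- at k = 0
    hist = [x.copy()]
    for _ in range(iters):
        corr = A_bar @ x - x                           # current consensus correction A x_k - x_k
        x_new = A_bar @ x - alpha * s + beta * prev_corr          # (7a)
        g_new = np.array([g(xi) for g, xi in zip(grads, x_new)])
        g_old = np.array([g(xi) for g, xi in zip(grads, x)])
        s = B_bar @ (s + g_new - g_old)                            # (7b), gradient tracking
        prev_corr, x = corr, x_new
        hist.append(x.copy())
    return np.array(hist)
```

With weights built as in the earlier sketch, the trajectories are expected to approach \(x^{*}\) for sufficiently small \(\alpha _i\) and \(\beta \), in line with Theorem 1.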

4 Algorithm Convergence Analysis

First, let us prove a key lemma, which involves the shrinkage of the consistency process of the row and column random matrix respectively.

Lemma 1

\(A=\bar{A}\otimes I_m\) and \(B=\bar{B}\otimes I_m\) are weight matrices, there are vector norms \(\Arrowvert \cdot \Arrowvert _A\) and \(\Arrowvert \cdot \Arrowvert _B\) such that for ,

$$\begin{aligned} \Arrowvert Ax-A_{\infty } \Arrowvert _A \le \sigma _A\Arrowvert x-A_{\infty }x \Arrowvert _A \end{aligned}$$
(8)
$$\begin{aligned} \Arrowvert Bx-B_{\infty } \Arrowvert _B \le \sigma _B\Arrowvert x-B_{\infty }x \Arrowvert _B. \end{aligned}$$
(9)

Proof

Since \(\bar{A}\) is irreducible, its diagonal elements are all positive, and the rows are random. According to the Perro-Frobenius theorem, \(\rho (\bar{A})=1\). Every eigenvalue except 1 is strictly less than \(\rho (\bar{A})\), \(\pi _r^{T}\) is a strictly positive left eigenvector corresponding to eigenvalue 1, and \(\pi _r ^{T}1_n=1\). Therefore, \(\lim \limits _{k \rightarrow \infty }\bar{A}^k=1_n\pi _r^{T}\), and

$$\begin{aligned} A_{\infty }=\lim \limits _{k \rightarrow \infty }A^k=(\lim \limits _{k \rightarrow \infty }\bar{A}^k)\otimes I_m=(1_n\pi _r^{T})\otimes I_m. \end{aligned}$$

Then

$$\begin{aligned} AA_{\infty }=(\bar{A}\otimes I_m)((1_n\pi _r^{T})\otimes I_m)=A_{\infty } \end{aligned}$$
$$\begin{aligned} A_{\infty }A_{\infty }=((1_n\pi _r^{T})\otimes I_m)((1_n\pi _r^{T})\otimes I_m)=A_{\infty } \end{aligned}$$

Therefore,\(AA_{\infty }-A_{\infty }A_{\infty } = 0\), then there are the following formulas

$$\begin{aligned} Ax-A_{\infty }x=(A-A_{\infty })(x-A_{\infty }x). \end{aligned}$$
(10)

Because \(\rho (A-A_{\infty })=\rho ((\bar{A}-1_n\pi _r^T)\otimes I_m)<1\), according to [15], there is a matrix norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\) such that \(\sigma _A={{\left| \left| \left| A-A_{\infty } \right| \right| \right| }}_A<1\). In addition, according to Theorem 5.7.13 in [15], there is a corresponding vector norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\) for any matrix norm \(\Arrowvert \cdot \Arrowvert _A\), such that for all matrices Y and vectors y, \(\Arrowvert Yy \Arrowvert _A \le {{\left| \left| \left| Y \right| \right| \right| }}_A\Arrowvert y \Arrowvert _A\) . Therefore, Eq. (10) leads to:

$$ \begin{aligned}&\Arrowvert Ax-A_{\infty }x \Arrowvert _A=\Arrowvert (A-A_{\infty })(x-A_{\infty }x) \Arrowvert _A \\&\le {{\left| \left| \left| A-A_{\infty } \right| \right| \right| }}_A\Arrowvert x-A_{\infty }x \Arrowvert _A = \sigma _A\Arrowvert x-A_{\infty }x \Arrowvert _A. \end{aligned} $$

Thus Eq. (8) of Lemma 1 is proved; Eq. (9) can be shown in the same way.
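The limit matrix \(A_{\infty }\) and the contraction property used in Lemma 1 can be checked numerically. The short sketch below (the function name is assumed, and the spectral radius stands in for the abstract norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\)) computes \(\pi _r\) and \(A_{\infty }=1_n\pi _r^T\) for a row-stochastic \(\bar{A}\) and verifies the identities used in the proof.

```python
import numpy as np

def perron_limit(A_bar):
    """Return the left Perron vector pi_r (with pi_r^T 1_n = 1) and A_inf = 1_n pi_r^T."""
    w, V = np.linalg.eig(A_bar.T)                     # left eigenvectors of A_bar
    pi_r = np.real(V[:, np.argmin(np.abs(w - 1))])    # eigenvector for eigenvalue 1
    pi_r = pi_r / pi_r.sum()                          # normalize so that pi_r^T 1_n = 1
    A_inf = np.outer(np.ones(len(pi_r)), pi_r)        # A_inf = 1_n pi_r^T
    # properties used in the proof of Lemma 1
    assert np.allclose(A_bar @ A_inf, A_inf)          # A A_inf = A_inf
    assert np.allclose(A_inf @ A_inf, A_inf)          # A_inf A_inf = A_inf
    assert np.max(np.abs(np.linalg.eigvals(A_bar - A_inf))) < 1.0   # rho(A - A_inf) < 1
    return pi_r, A_inf
```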

Lemma 2

$$\begin{aligned} (1_n^T \otimes I_m)s_k=(1_n^T \otimes I_m)\nabla f(x_k),\forall k. \end{aligned}$$

Proof

\( (1_n^T \otimes I_m)s_{k+1} =(1_n^T \otimes I_m)(\bar{B}\otimes I_m)[s_k+\nabla f(x_{k+1})-\nabla f(x_k)] =(1_n^T \otimes I_m)s_k+(1_n^T \otimes I_m)(\nabla f(x_{k+1})-\nabla f(x_k)). \) Summing this relation over iterations and using the initial condition \(s_0=\nabla f(x_0)\) gives \( (1_n^T \otimes I_m)s_k =(1_n^T \otimes I_m)(s_0-\nabla f(x_0))+(1_n^T \otimes I_m)\nabla f(x_k) =(1_n^T \otimes I_m)\nabla f(x_k). \)
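This conservation property is easy to verify in simulation. The snippet below (names assumed for illustration) replays update (7b) along an arbitrary state sequence and checks that the sum of the trackers always equals the sum of the current gradients, as Lemma 2 states.

```python
import numpy as np

def check_tracking_invariant(B_bar, grads, x_seq):
    """Verify (1^T x I) s_k = (1^T x I) grad f(x_k) along a scalar state sequence x_seq."""
    s = np.array([g(xi) for g, xi in zip(grads, x_seq[0])])      # s_0 = grad f(x_0)
    for x_old, x_new in zip(x_seq[:-1], x_seq[1:]):
        g_old = np.array([g(xi) for g, xi in zip(grads, x_old)])
        g_new = np.array([g(xi) for g, xi in zip(grads, x_new)])
        s = B_bar @ (s + g_new - g_old)                          # update (7b)
        assert np.isclose(s.sum(), g_new.sum())                  # invariant of Lemma 2
```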

Lemma 3

[18]. If the function f satisfies Assumptions 2 and 3, where \(\mu \) and l are the strong-convexity and Lipschitz constants respectively, then for any \(x\in \mathbb {R}^m\) and any step size \(0<\alpha <\frac{1}{l}\),

$$\begin{aligned} \Arrowvert x-\alpha \nabla f(x)-x^{*} \Arrowvert \le (1-\mu \alpha )\Arrowvert x-x^{*}\Arrowvert . \end{aligned}$$

Lemma 4

[15]. Suppose the matrix \(W\in \mathbb {R}^{n\times n}\) is non-negative and the vector \(w\in \mathbb {R}^{n}\) is positive. If \(Ww<\zeta w\) with \(\zeta >0\), then \(\rho (W)<\zeta \).
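Lemma 4 is the tool used later to bound \(\rho (J_{\bar{\alpha },\beta })\); the quick numerical illustration below uses an arbitrary example matrix, chosen only for intuition.

```python
import numpy as np

# Nonnegative W and positive w with W w < zeta * w componentwise
W = np.array([[0.5, 0.2],
              [0.1, 0.6]])
w = np.array([1.0, 1.0])
zeta = 0.9

assert np.all(W @ w < zeta * w)                # hypothesis of Lemma 4
rho = np.max(np.abs(np.linalg.eigvals(W)))
print(rho, "<", zeta)                          # conclusion: rho(W) = 0.7 < 0.9
```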

The subsequent convergence analysis is carried out through the contraction relationships among the following four quantities:

\(1)\quad \Arrowvert x_{k+1}-A_{\infty }x_{k+1}\Arrowvert _A;\)

\(2)\quad \Arrowvert x_{k+1}-Ax_{k+1}\Arrowvert _2;\)

\(3)\quad \Arrowvert A_{\infty }x_{k+1}-1_n \otimes x^{*}\Arrowvert _2;\)

\(4)\quad \Arrowvert s_{k+1}-B_{\infty }s_{k+1}\Arrowvert _B.\)

Norms on a finite-dimensional linear space are equivalent, that is, there are positive constants c, d, h, q, g, p such that the vector norms satisfy the following inequalities:

$$\begin{aligned} \Arrowvert \cdot \Arrowvert _A \le c\Arrowvert \cdot \Arrowvert _B,\quad \Arrowvert \cdot \Arrowvert _2 \le h\Arrowvert \cdot \Arrowvert _B,\quad \Arrowvert \cdot \Arrowvert _2 \le g\Arrowvert \cdot \Arrowvert _A, \end{aligned}$$
$$\begin{aligned} \Arrowvert \cdot \Arrowvert _B \le d\Arrowvert \cdot \Arrowvert _A, \quad \Arrowvert \cdot \Arrowvert _B \le q\Arrowvert \cdot \Arrowvert _2,\quad \Arrowvert \cdot \Arrowvert _A \le p\Arrowvert \cdot \Arrowvert _2. \end{aligned}$$

Lemma 5

For \(\forall k\ge 0\), the following inequality holds,

$$\begin{aligned} \Arrowvert s_k \Arrowvert _2&\le h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B +{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}\Arrowvert A_{\infty }x_k-1_n\otimes x^{*} \Arrowvert _2 \end{aligned}$$

where \(\bar{l}=\max {\{l_i\}}\).

Proof

$$\begin{aligned} \Arrowvert s_k \Arrowvert _2 \le h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\Arrowvert B_{\infty }s_k \Arrowvert _2 \end{aligned}$$

By Lemma 2 and the optimality condition \(\sum \limits _{i=1}^n\nabla f_i(x^{*})=0\),

$$\begin{aligned} \Arrowvert B_{\infty }s_k \Arrowvert _2&=\Arrowvert (\pi _c\otimes I_m)(1_n^T\otimes I_m)s_k \Arrowvert _2 =\Arrowvert \pi _c \Arrowvert _2\Arrowvert (1_n^T\otimes I_m)s_k \Arrowvert _2\\&=\Arrowvert \pi _c \Arrowvert _2\Arrowvert \sum \limits _{i=1}^n \nabla f_i(x_{i,k})- \sum \limits _{i=1}^n\nabla f_i(x^{*}) \Arrowvert _2 \le \Arrowvert \pi _c \Arrowvert _2\bar{l}\sum \limits _{i=1}^n\Arrowvert x_{i,k}-x^{*} \Arrowvert _2\\&\le \Arrowvert \pi _c \Arrowvert _2\bar{l}\sqrt{n}\Arrowvert x_k-1_n\otimes x^{*} \Arrowvert _2\\&\le {{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}\Arrowvert A_{\infty }x_k-1_n\otimes x^{*} \Arrowvert _2. \end{aligned}$$

The proof is completed.

Lemma 6

For \(\forall k\ge 0\), we have the following inequality,

$$\begin{aligned} \Arrowvert x_{k+1}-A_{\infty }x_{k+1} \Arrowvert _A&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A +\bar{\alpha }p{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2\\&\quad +\beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A, \end{aligned}$$

where \(\bar{\alpha }= \max {\{\alpha _i\}}\).

Proof

$$\begin{aligned} \Arrowvert x_{k+1}-A_{\infty }x_{k+1} \Arrowvert _A&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_{-}-A_{\infty }x_k+A_{\infty }D_\alpha s_k -\beta A_{\infty }[Ax_k-x_k]_- \Arrowvert _A\\&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\Arrowvert D_\alpha s_k-A_{\infty }D_\alpha s_k \Arrowvert _A+ \beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A\\&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A +\bar{\alpha }p{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A. \end{aligned}$$

Lemma 7

For \(\forall k\ge 0\), we have the following inequality,

$$\begin{aligned} \Arrowvert x_{k+1}-Ax_{k+1} \Arrowvert _2&\le (\sigma _A+\sigma _A^2)g\Arrowvert x_k-A_{\infty }x_k\Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2\\&\quad +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert x_{k+1}-Ax_{k+1} \Arrowvert _2&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_- -A^2x_k+AD_\alpha s_k -\beta A[Ax_k-x_k]_- \Arrowvert _2\\&=\Arrowvert Ax_k-A_{\infty }x_k-A^2x_k+A_{\infty }x_k -(I_{mn}-A)D_\alpha s_k +\beta (I_{mn}-A)[Ax_k-x_k]_- \Arrowvert _2\\&\le \sigma _Ag\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\sigma _A^2g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2\\&=(\sigma _A+\sigma _A^2)g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Lemma 8

When \(0<\bar{\alpha }<\frac{1}{nl\pi _r^T\pi _c}\), for \(\forall k\ge 0\), we have the following inequality:

$$\begin{aligned} \Arrowvert A_{\infty }x_{k+1}-1_n\otimes x^{*}\Arrowvert _2&\le (1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2+\bar{\alpha }(\pi _r^T\pi _c)nlg\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\&\quad +\bar{\alpha }h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\beta {{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert A_{\infty }x_{k+1}-1_n\otimes x^{*}\Arrowvert _2&=\Arrowvert A_{\infty }(Ax_k-D_{\alpha } s_k+(D_{\alpha }-D_{\alpha })B_{\infty }s_k+\beta [Ax_k-x_k]_-)-1_n\otimes x^{*}\Arrowvert _2\\&\le \Arrowvert A_{\infty }x_k-A_{\infty }D_{\alpha }B_{\infty }\nabla f(x_k)-(1_n\otimes I_m)x^{*}\Arrowvert _2+\bar{\alpha }h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B\\&\quad +\beta {{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

\( A_{\infty }B_{\infty }=((1_n\pi _r^T)\otimes I_m)((\pi _c 1_n^T)\otimes I_m)=(\pi _r^T\pi _c)(1_n1_n^T)\otimes I_m. \)

$$\begin{aligned}&\Arrowvert ((1_n\pi _r^T)\otimes I_m)x_k-(1_n\otimes I_m)x^{*}-A_{\infty }D_{\alpha }B_{\infty }\nabla f(x_k)\Arrowvert _2\\&=\Arrowvert (1_n\otimes I_m)((\pi _r^T\otimes I_m)x_k-(\pi _r^T\mathrm {diag}(\alpha )\pi _c)(1_n^T\otimes I_m)\nabla f(x_k)-x^{*})\Arrowvert _2\\&\le \Arrowvert (1_n\otimes I_m)((\pi _r^T\otimes I_m)x_k-n(\pi _r^T\pi _c)\bar{\alpha }\nabla F((\pi _r^T\otimes I_m)x_k)-x^{*})\Arrowvert _2\\&\quad +n(\pi _r^T\pi _c)\bar{\alpha }\Arrowvert (1_n\otimes I_m)(\nabla F((\pi _r^T\otimes I_m)x_k)-\tfrac{1}{n}(1_n^T\otimes I_m)\nabla f(x_k))\Arrowvert _2\\&\triangleq s_1+s_2, \end{aligned}$$

where \(F(x)=\frac{1}{n}\sum \limits _{i=1}^n f_i(x)\) denotes the global objective in (1).

From Lemma 3, if \(0<n(\pi _r^T\pi _c)\bar{\alpha }<\frac{1}{l}\), then

$$\begin{aligned} s_1&=\sqrt{n}\Arrowvert (\pi _r^T\otimes I_m)x_k-n(\pi _r^T\pi _c)\bar{\alpha }\nabla F((\pi _r^T\otimes I_m)x_k) -x^{*}\Arrowvert _2\\&\le \sqrt{n}(1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert (\pi _r^T\otimes I_m)x_k-x^{*}\Arrowvert _2\\&= (1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2, \end{aligned}$$

$$\begin{aligned} s_2 \le \bar{\alpha }(\pi _r^T\pi _c)n\Arrowvert \nabla f((1_n\otimes I_m)(\pi _r^T\otimes I_m)x_k)-\nabla f(x_k) \Arrowvert _2 \le \bar{\alpha }(\pi _r^T\pi _c)nlg\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A. \end{aligned}$$

Lemma 9

For \(\forall k\ge 0\), we have the following inequality

$$\begin{aligned} \Arrowvert s_{k+1}-B_{\infty }s_{k+1} \Arrowvert _B&\le \sigma _B\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\sigma _Bq\bar{l}g{{\left| \left| \left| A-I_{mn} \right| \right| \right| }}_2\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\&\quad +\sigma _Bq\bar{l}\beta \Arrowvert x_k-Ax_k \Arrowvert _2 +\sigma _Bq\bar{l}\bar{\alpha }\Arrowvert s_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert s_{k+1}-B_{\infty }s_{k+1} \Arrowvert _B&=\Arrowvert B[s_k+\nabla f(x_{k+1})-\nabla f(x_k)]-B_{\infty }B[s_k+\nabla f(x_{k+1})-\nabla f(x_k)] \Arrowvert _B\\&\le \sigma _B\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\sigma _B\bar{l}q\Arrowvert x_{k+1}-x_k \Arrowvert _2, \end{aligned}$$

$$\begin{aligned} \Arrowvert x_{k+1}-x_k \Arrowvert _2&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_--x_k \Arrowvert _2\\&\le \Arrowvert (A-I_{mn})(x_k-A_{\infty }x_k) \Arrowvert _2+\beta \Arrowvert x_k-Ax_k\Arrowvert _2+\bar{\alpha }\Arrowvert s_k \Arrowvert _2. \end{aligned}$$

Combining the two relations completes the proof.

5 Analysis of Convergence Results

The analysis of the convergence results is given below.

Theorem 1

$$\begin{aligned} t_{k+1}<J_{\bar{\alpha },\beta }t_k,\forall k\ge 0 \end{aligned}$$

where \(t_k\in \mathbb {R}^4,J_{\bar{\alpha },\beta }\in \mathbb {R}^{4\times 4}\) are given by

$$t_k=\left( \begin{array}{r} \Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\ \Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2\\ \Arrowvert x_k-Ax_k \Arrowvert _2\\ \Arrowvert s_k-B_{\infty }s_k \Arrowvert _B \end{array}\right) $$
$$J_{\bar{\alpha },\beta } = \begin{pmatrix} \sigma _A+a_1\bar{\alpha }&{}a_2\bar{\alpha }&{}a_3\beta &{}a_4\bar{\alpha }\\ a_5\bar{\alpha }&{}1-a_6\bar{\alpha }&{}a_7\beta &{}a_8\bar{\alpha }\\ (\sigma _A+\sigma _A^2)a_9+a_{10}\bar{\alpha }&{}a_{11}\bar{\alpha }&{}a_{12}\beta &{}a_{13}\bar{\alpha }\\ \sigma _Ba_{14}+\sigma _Ba_{15}\bar{\alpha }&{}\sigma _Ba_{16}\bar{\alpha }&{}\sigma _Ba_{17}\beta &{}\sigma _B+\sigma _Ba_{18}\bar{\alpha } \end{pmatrix} $$

where \(a_i\) in the above expression are \(a_1={{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}gm{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_2=p\bar{l}{{\left| \left| \left| B \right| \right| \right| }}_2{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_3={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A, a_4=ph{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_5=(\pi _r^T\pi _c)nlg, a_6=n\mu (\pi _r^T\pi _c), a_7={{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2, a_8=h, a_9=g, a_{10}=\bar{l}g{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{11}={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{12}={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{13}=h{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{14}=\bar{l}qg{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{15}=\bar{l}^2qg{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{16}=\bar{l}^2q{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{17}=\bar{l}q, a_{18}=\bar{l}qh \)

Define the positive vector \(\boldsymbol{\delta }=[\delta _1,\delta _2,\delta _3,\delta _4]\), where

\(\delta _1=1-\sigma _B,\quad \delta _2=2\frac{a_5(1-\sigma _B)+2\sigma _Ba_{14}}{a_6},\quad \delta _3=2(\sigma _A+\sigma _A^2)a_9,\quad \delta _4=2\sigma _Ba_{14}.\)

If \(\bar{\alpha }\) and \(\beta \) are chosen within the following ranges:

$$\begin{aligned} \begin{aligned} 0<\bar{\alpha }<\min \{\frac{1}{nl\pi _r^T\pi _c}, \frac{(1-\sigma _A)\delta _1}{a_1\delta _1+a_2\delta _2+a_4\delta _4}, \frac{\delta _3-(\sigma _A+\sigma _A^2)a_9}{a_{10}\delta _1+a_{11}\delta _2+a_{13}\delta _4},\\ \frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1}{a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4}\} \end{aligned} \end{aligned}$$
(11)
$$\begin{aligned} \begin{aligned}&0<\beta <\min \{\frac{(1-\sigma _A)\delta _1-(a_1\delta _1+a_2\delta _2+a_4\delta _4)\bar{\alpha }}{a_3\delta _3}, \frac{(a_6\delta _2-a_5\delta _1-a_8\delta _4)\bar{\alpha }}{a_7\delta _3},\\&\frac{\delta _3-(\sigma _A+\sigma _A^2)a_9-a_{10}\bar{\alpha }\delta _1-(a_{11}\delta _2+a_{13}\delta _4)\bar{\alpha }}{a_{12}\delta _3},\\&\frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1-(a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4)\bar{\alpha }}{\sigma _Ba_{17}\delta _3} \} \end{aligned} \end{aligned}$$
(12)

then \(\rho (J_{\bar{\alpha },\beta })<1\). Therefore, \(\Arrowvert x_k-1_n\otimes x^{*}\Arrowvert _2\) converges linearly to 0 at the rate of \(\mathcal {O}(\rho (J_{\bar{\alpha },\beta })^k)\).

Proof

It is easy to verify \(t_{k+1}<J_{\bar{\alpha },\beta }t_k,\forall k\ge 0\) from Lemmas 5–9. To prove that \(\Arrowvert x_k-1_n\otimes x^{*}\Arrowvert _2\) converges linearly, it suffices to show that there exist \(\bar{\alpha }\) and \(\beta \) such that \(\rho (J_{\bar{\alpha },\beta })<1\). By Lemma 4, we need to prove that there exist \(\bar{\alpha }\) and \(\beta \) satisfying \(J_{\bar{\alpha },\beta }\boldsymbol{\delta }<\boldsymbol{\delta }\) for some positive vector \(\boldsymbol{\delta }=[\delta _1,\delta _2,\delta _3,\delta _4]\), and then solve for the admissible ranges of \(\bar{\alpha }\) and \(\beta \). The inequality \(J_{\bar{\alpha },\beta }\boldsymbol{\delta }<\boldsymbol{\delta }\) is equivalent to the following component-wise conditions:

$$\begin{aligned} a_3\delta _3\beta <(1-\sigma _A)\delta _1-(a_1\delta _1+a_2\delta _2+a_4\delta _4)\bar{\alpha } \end{aligned}$$
(13)
$$\begin{aligned} a_7\delta _3\beta <(a_6\delta _2-a_5\delta _1-a_8\delta _4)\bar{\alpha } \end{aligned}$$
(14)
$$\begin{aligned} a_{12}\delta _3\beta <\delta _3-((\sigma _A+\sigma _A^2)a_9+a_{10}\bar{\alpha }\delta _1)-a_{11}\delta _2\bar{\alpha }-a_{13}\delta _4\bar{\alpha } \end{aligned}$$
(15)
$$\begin{aligned} \sigma _Ba_{17}\delta _3\beta <-\sigma _Ba_{14}\delta _1+(1-\sigma _B)\delta _4 -(\sigma _Ba_{18}\delta _4+\sigma _Ba_{15}\delta _1+\sigma _Ba_{16}\delta _2)\bar{\alpha }. \end{aligned}$$
(16)

Since \(\beta >0\), the right-hand sides of the above four inequalities must be positive, which yields the following conditions on \(\bar{\alpha },\delta _1,\delta _2,\delta _3,\delta _4\):

$$\begin{aligned} \bar{\alpha }<\frac{(1-\sigma _A)\delta _1}{a_1\delta _1+a_2\delta _2+a_4\delta _4} \end{aligned}$$
(17)
$$\begin{aligned} \bar{\alpha }<\frac{\delta _3-(\sigma _A+\sigma _A^2)a_9}{a_{10}\delta _1+a_{11}\delta _2+a_{13}\delta _4} \end{aligned}$$
(18)
$$\begin{aligned} \bar{\alpha }<\frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1}{a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4} \end{aligned}$$
(19)
$$\begin{aligned} \delta _2>\frac{a_5\delta _1+a_8\delta _4}{a_6} \end{aligned}$$
(20)

Because \(\bar{\alpha }>0\), we can choose \(\delta _1,\delta _2,\delta _3,\delta _4\) so that the above upper bounds on \(\bar{\alpha }\) are positive. According to formulas (17)–(20), select the values of \(\delta _i\) as follows: \(\delta _1=1-\sigma _B,\quad \delta _2=2\frac{a_5(1-\sigma _B)+2\sigma _Ba_{14}}{a_6},\quad \delta _3=2(\sigma _A+\sigma _A^2)a_9,\quad \delta _4=2\sigma _Ba_{14}.\)

After determining \(\delta _i\), the upper bound of \(\bar{\alpha }\) can be determined according to inequalities (17)–(19), and the upper bound of \(\beta \) according to inequalities (13)–(16), which gives (11) and (12). Theorem 1 is thus proved.

Remark 2:

From the above theorem, we obtain the linear convergence rate of the algorithm. However, since \(\sigma _A\), \(\sigma _B\), and the norm-equivalence constants are unknown in practice, the bounds on \(\bar{\alpha }\) and \(\beta \) cannot be given explicitly, and the parameters must be tuned manually to obtain the best performance.

Fig. 1. Directed graph with six agents

6 Numerical Experiment

In this section, Matlab simulations are used to verify the effectiveness of the proposed algorithm.

Figure 1 shows the communication topology of a directed, strongly connected network with six agents; we consider the distributed convex optimization problem (1) over this network. The local objective function of each agent is

\(f_1(x_1)=x_1^2-2x_1+\cos (x_1)+3, \quad f_2(x_2)=x_2^2-5x_2+e^{-0.1x_2}-1,\)

\(f_3(x_3)=x_3^2-3x_3-0.5\sin (x_3)-3, \quad f_4(x_4)=x_4^2+2x_4^4-3,\)

\(f_5(x_5)=x_5^2+3x_5+1, \quad f_6(x_6)=4x_6^2+2x_6-\cos (x_6)+3.\)
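The reported optimal value can be cross-checked by solving \(\sum _{i=1}^6 f_i'(x)=0\) directly. The sketch below is an independent sanity check rather than the distributed algorithm itself (the gradient list and root-finding interval are assumptions for illustration); it recovers \(x^*\approx 0.2980\), and the same gradient callables can be plugged into the iteration sketch given earlier in Sect. 3.

```python
import numpy as np
from scipy.optimize import brentq

# Gradients of the six local objectives
grads = [
    lambda x: 2*x - 2 - np.sin(x),
    lambda x: 2*x - 5 - 0.1*np.exp(-0.1*x),
    lambda x: 2*x - 3 - 0.5*np.cos(x),
    lambda x: 2*x + 8*x**3,
    lambda x: 2*x + 3,
    lambda x: 8*x + 2 + np.sin(x),
]

# The minimizer of (1/n) sum_i f_i solves sum_i f_i'(x) = 0
total_grad = lambda x: sum(g(x) for g in grads)
x_star = brentq(total_grad, -10.0, 10.0)
print(round(x_star, 4))   # approximately 0.2980
```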

Fig. 2. Agent state trajectory

Fig. 3. Agent state trajectory (algorithm of [12])

The optimal solution in this case is \(x^*=0.2980\). Figure 2 shows that the algorithm proposed in this paper finally finds the optimal value. Figure 3 is the agent trajectory diagram of the algorithm proposed in [12]; the comparison shows that the algorithm proposed in this article has a faster convergence rate. Distributed optimization can also be applied to drone formation: a drone that needs to control its own position can make the decision based on the position information of nearby drones.

7 Conclusion

We proposed an improved fully distributed optimization algorithm for directed strongly connected graphs in this paper. Under the assumption that the objective functions are strongly convex with Lipschitz continuous gradients, all agents converge to the optimal point at a geometric rate under the proposed algorithm. By introducing a row-stochastic matrix, a column-stochastic matrix, and a momentum term, the convergence rate of our algorithm is higher than that of existing methods in the literature.