
1 Introduction

Recently, distributed optimization problems have received extensive attention, and distributed consensus algorithms are a natural tool for solving them. In this paper, we improve the consensus algorithm to solve the distributed optimization problem, where each agent has access to one local cost function \(f_i(x)\), and all agents collaboratively minimize the global function \(\frac{1}{n}\sum \limits _{i=1}^n f_i(x)\) by exchanging information with each other. This paper focuses on the case of unbalanced directed graphs. Early works on distributed optimization mainly included distributed gradient descent [1] and distributed dual averaging [2] over undirected graphs. It was proved that the optimal value could be found at a rate of \(O(\frac{\ln k}{\sqrt{k}})\) for any convex function, and at a rate of \(O(\frac{\ln k}{k})\) for any strongly convex function, where k is the number of iterations. Under the conditions of strong convexity and Lipschitz continuous gradients, algorithms with faster convergence were developed. For example, one algorithm with a constant step size converges geometrically to an error ball around the optimal solution, while another method requires symmetric weights to achieve global geometric convergence. In [4,5,6], inexact gradient methods and gradient estimation methods were introduced to deal with this problem.

The aforementioned methods were all designed for undirected graphs. When the communication capabilities between agents are not symmetric, algorithms for undirected graphs are no longer applicable, so algorithms suitable for directed graphs need to be developed. The push-sum method and the DGD (distributed subgradient descent) method were introduced for directed graphs in [7,8,9,10]. However, the diminishing step size resulted in a relatively slow convergence rate. In [11], the objective function was assumed to have a Lipschitz continuous gradient and to be strongly convex, and it was shown that all agents converge to the optimal value at a geometric rate. By combining a row-stochastic matrix and a column-stochastic matrix, another type of algorithm was proposed [12,13,14], where the row-stochastic matrix ensures the consensus of the algorithm and the column-stochastic matrix guarantees optimality. In [12, 13], the cases of fixed and time-varying strongly connected graphs were considered, based on which the heavy-ball momentum term was introduced to improve the convergence rate of the algorithms.

For second-order and heterogeneous multi-agent systems, some improved algorithms were also proposed in [16, 17]. Inspired by [12], we study an improved fully distributed algorithm to optimize all objective functions in a distributed manner, where a momentum term is borrowed to improve the convergence rate. It is shown that the state of every agent converges to the optimal solution of the objective function by exploiting the properties of the stochastic weight matrices.

\(I_n\) represents the n-dimensional identity matrix, and \(1_n\) represents a column vector whose components are all ones. \(\rho (X)\) represents the spectral radius of the matrix X, and \(X_{\infty }\) represents the limit of the matrix powers \(X^k\) as \(k\rightarrow \infty \). For the row-stochastic matrix A, let \(\pi _r\) and \(1_n\) represent the left and right eigenvectors of A associated with eigenvalue 1, respectively, such that \(\pi _r^T1_n=1\). Similarly, for the column-stochastic matrix B, let \(1_n\) and \(\pi _c\) represent the left and right eigenvectors of B associated with eigenvalue 1, respectively, such that \(\pi _c^T1_n=1\). \(\Arrowvert \cdot \Arrowvert _2\) represents the 2-norm of a vector, and \({{\left| \left| \left| \cdot \right| \right| \right| }}_2\) represents the spectral norm of a matrix.

2 Graph Theory Foundation and Problem Description

\(\mathcal {G} = (\mathcal {V},\mathcal {E})\) denotes a directed graph, where \(\mathcal {V}=\{1,2,\dots ,n\}\) represents the set of network agents, and \(\mathcal {E}\) represents the set of edges between agents in the network. (j, i) or \(j\rightarrow i\) indicates that there is a directed edge that transmits information from agent j to agent i. If for every pair of agents i, j there is a directed path \((i,i_{s1}),(i_{s1},i_{s2}),\dots ,(i_{sk},j)\), then the graph is called strongly connected. In addition, \(N_i^{in}=\{j\mid (j, i)\in \mathcal {E}\}\) represents the in-neighbor set of agent i, that is, the set of agents from which agent i can receive information. Similarly, \(N_i^{out}=\{j\mid (i,j)\in \mathcal {E}\}\) represents the out-neighbor set of agent i, that is, the set of agents that can receive information from agent i. Note that both \(N_i^{in}\) and \(N_i^{out}\) contain node i.

In the distributed convex optimization problem, each agent i has access to a local decision variable and a convex cost function \(f_i(x)\). The goal is to minimize the following global objective function:

$$\begin{aligned} \min \limits _{x\in \mathbb {R}^m}\ f(x)=\frac{1}{n}\sum \limits _{i=1}^n f_i(x) \end{aligned}$$
(1)

Each agent i can only obtain its own cost function \(f_i(x)\). Assume that each cost function is strongly convex and has a Lipschitz continuous gradient.

Assumption 1

\(\mathcal {G}\) is a directed, strongly connected graph.

Assumption 2

The gradient of the objective function of each agent satisfies the Lipschitz condition, that is, for any agent i and any \(x, y \in \mathbb {R}^m\), there is a constant \(l_i\) such that:

$$\begin{aligned} \Arrowvert \nabla f_i(x) - \nabla f_i(y) \Arrowvert \le l_i\Arrowvert x - y \Arrowvert \end{aligned}$$
(2)

Assumption 3

The cost function of each agent is strongly convex, that is, for any agent i and any \(x, y \in \mathbb {R}^m\), there is a positive constant \(\mu \) such that:

$$\begin{aligned} f_i(x)-f_i(y) \le \nabla f_i(x)^\mathrm{T} (x - y)-\frac{\mu }{2} \Arrowvert x-y \Arrowvert _2^2 \end{aligned}$$
(3)

Remark 1:

Assumptions 2 and 3 ensure that the global optimal solution \(x^{*}\) exists and is unique. Assumption 3 is also used in the subsequent proof of the convergence of the algorithm.
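For intuition, the following sketch (an illustration added here, not part of the original development) checks Assumptions 2 and 3 numerically for a randomly generated quadratic local cost \(f_i(x)=\frac{1}{2}x^TQ_ix+b_i^Tx\), for which the constants can be taken as \(l_i=\lambda _{\max }(Q_i)\) and \(\mu =\lambda _{\min }(Q_i)\); the matrix \(Q_i\) and vector \(b_i\) are assumed purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3

# f_i(x) = 0.5 x^T Q x + b^T x with Q symmetric positive definite
M = rng.standard_normal((m, m))
Q = M @ M.T + np.eye(m)            # adding I keeps Q strongly positive definite
b = rng.standard_normal(m)
f = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b

l_i = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient (Assumption 2)
mu = np.linalg.eigvalsh(Q).min()   # strong-convexity constant (Assumption 3)

# Empirical check of (2) and (3) on random pairs of points
for _ in range(100):
    x, y = rng.standard_normal(m), rng.standard_normal(m)
    assert np.linalg.norm(grad(x) - grad(y)) <= l_i * np.linalg.norm(x - y) + 1e-9
    assert f(x) - f(y) <= grad(x) @ (x - y) - 0.5 * mu * np.linalg.norm(x - y) ** 2 + 1e-9

print("l_i =", l_i, "mu =", mu)
```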

3 Algorithm Design

We propose the following algorithm to solve problem (1). Each agent i in the network maintains two variables \(x_{i,k}, s_{i,k}\in \mathbb {R}^m\), where \(i \in \mathcal {V}\) and k denotes the iteration step. The initial state satisfies \(s_{i,0}=\nabla f_i(x_{i,0}),i \in \mathcal {V}\).

$$\begin{aligned} x_{i,k+1}=\sum \limits _{j=1}^n a_{ij}x_{j,k}-\alpha _is_{i,k}+\beta [\sum \limits _{j=1}^na_{ij}(x_{j,k}-x_{i,k})]_- \end{aligned}$$
(4a)
$$\begin{aligned} s_{i,k+1}=\sum \limits _{j=1}^n b_{ij}[s_{j,k}+\nabla f_j(x_{j,k+1})-\nabla f_j(x_{j,k})] \end{aligned}$$
(4b)

where \(\alpha _i\) and \(\beta \) are both positive constants. The weights \(a_{ij}\) and \(b_{ij}\) satisfy the following:

$$\begin{aligned} a_{ij}=\left\{ \begin{aligned} >0&,&j\in N_i^{in},\\ 0&,&j\notin N_i^{in}, \end{aligned} \right. \quad \sum \limits _{j=1}^n a_{ij}=1, \forall i, \end{aligned}$$
(5)
$$\begin{aligned} b_{ij}=\left\{ \begin{aligned} >0&,&i\in N_j^{out},\\ 0&,&i\notin N_j^{out}, \end{aligned} \right. \quad \sum \limits _{i=1}^n b_{ij}=1, \forall j, \end{aligned}$$
(6)

\(\bar{A}=\{a_{ij}\}\) denotes the resulting row-stochastic matrix, and \(\bar{B}=\{b_{ij}\}\) denotes the column-stochastic matrix.
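As a concrete choice satisfying (5) and (6) — one common option used for illustration, not a requirement of the algorithm — each agent may weight its in-neighbors uniformly and split its outgoing weight uniformly over its out-neighbors, as sketched below (self-loops are included, consistently with \(i\in N_i^{in}\cap N_i^{out}\)).

```python
import numpy as np

def build_weights(adj):
    """adj[i, j] = 1 if the edge j -> i exists (j sends to i); diagonal set to 1.

    Returns a row-stochastic A_bar satisfying (5) and a column-stochastic
    B_bar satisfying (6), using uniform weights as one simple choice.
    """
    n = adj.shape[0]
    A_bar = np.zeros((n, n))
    B_bar = np.zeros((n, n))
    for i in range(n):
        in_nbrs = np.nonzero(adj[i, :])[0]        # j in N_i^in
        A_bar[i, in_nbrs] = 1.0 / len(in_nbrs)    # row i sums to 1
    for j in range(n):
        out_nbrs = np.nonzero(adj[:, j])[0]       # i in N_j^out
        B_bar[out_nbrs, j] = 1.0 / len(out_nbrs)  # column j sums to 1
    return A_bar, B_bar
```

Note that agent i only needs its own in-neighbor set to build row i of \(\bar{A}\), and agent j only needs its own out-degree to build column j of \(\bar{B}\), so this construction is fully distributed.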

Denote \(x_k=[x_{1,k}^T,\dots ,x_{n,k}^T]^T\), \(s_k=[s_{1,k}^T,\dots ,s_{n,k}^T]^T\), \(\nabla f(x_k)=[\nabla f_1(x_{1,k})^T,\dots ,\nabla f_n(x_{n,k})^T]^T\). Let \(A=\bar{A}\otimes I_m\) and \(B=\bar{B}\otimes I_m\); then Eq. (4) can be rewritten in the following compact form:

$$\begin{aligned} x_{k+1}=Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_- \end{aligned}$$
(7a)
$$\begin{aligned} s_{k+1}=B[s_{k}+\nabla f(x_{k+1})-\nabla f(x_{k})] \end{aligned}$$
(7b)

where \(D_\alpha \) is a diagonal matrix whose diagonal elements are \(\alpha _i\) and whose other elements are 0, \(s_0=\nabla f(x_0)\), and \(x_0\) is arbitrary.
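A minimal simulation sketch of iteration (7) for scalar states (\(m=1\)) is given below. It is an illustration only: the operator \([\cdot ]_-\) is not defined explicitly in the text, and the sketch interprets \(\beta [Ax_k-x_k]_-\) as the consensus correction evaluated at the previous iteration \(k-1\) (a heavy-ball-style momentum term) and takes it to be zero at \(k=0\); this reading, the helper names, and the generic argument list are assumptions.

```python
import numpy as np

def run(A_bar, B_bar, grads, x0, alpha, beta, iters):
    """Sketch of update (7a)-(7b) for scalar agent states.

    A_bar: row-stochastic weights, B_bar: column-stochastic weights,
    grads: list of gradient callables (one per agent),
    alpha: per-agent step sizes (vector or scalar), beta: momentum coefficient.
    """
    n = len(grads)
    x = np.array(x0, dtype=float)
    s = np.array([g(xi) for g, xi in zip(grads, x)])   # s_0 = grad f(x_0)
    prev_corr = np.zeros(n)                            # assumed meaning of [Ax - x]_- at k = 0
    hist = [x.copy()]
    for _ in range(iters):
        corr = A_bar @ x - x                           # current consensus correction A x_k - x_k
        x_new = A_bar @ x - alpha * s + beta * prev_corr          # (7a)
        g_new = np.array([g(xi) for g, xi in zip(grads, x_new)])
        g_old = np.array([g(xi) for g, xi in zip(grads, x)])
        s = B_bar @ (s + g_new - g_old)                            # (7b), gradient tracking
        prev_corr, x = corr, x_new
        hist.append(x.copy())
    return np.array(hist)
```

With weights built as in the earlier sketch, the trajectories are expected to approach \(x^{*}\) for sufficiently small \(\alpha _i\) and \(\beta \), in line with Theorem 1.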

4 Algorithm Convergence Analysis

First, let us prove a key lemma, which involves the shrinkage of the consistency process of the row and column random matrix respectively.

Lemma 1

\(A=\bar{A}\otimes I_m\) and \(B=\bar{B}\otimes I_m\) are weight matrices, there are vector norms \(\Arrowvert \cdot \Arrowvert _A\) and \(\Arrowvert \cdot \Arrowvert _B\) such that for ,

$$\begin{aligned} \Arrowvert Ax-A_{\infty } \Arrowvert _A \le \sigma _A\Arrowvert x-A_{\infty }x \Arrowvert _A \end{aligned}$$
(8)
$$\begin{aligned} \Arrowvert Bx-B_{\infty } \Arrowvert _B \le \sigma _B\Arrowvert x-B_{\infty }x \Arrowvert _B. \end{aligned}$$
(9)

Proof

Since \(\bar{A}\) is irreducible, its diagonal elements are all positive, and the rows are random. According to the Perro-Frobenius theorem, \(\rho (\bar{A})=1\). Every eigenvalue except 1 is strictly less than \(\rho (\bar{A})\), \(\pi _r^{T}\) is a strictly positive left eigenvector corresponding to eigenvalue 1, and \(\pi _r ^{T}1_n=1\). Therefore, \(\lim \limits _{k \rightarrow \infty }\bar{A}^k=1_n\pi _r^{T}\), and

$$\begin{aligned} A_{\infty }=\lim \limits _{k \rightarrow \infty }A^k=(\lim \limits _{k \rightarrow \infty }\bar{A}^k)\otimes I_m=(1_n\pi _r^{T})\otimes I_m. \end{aligned}$$

Then

$$\begin{aligned} AA_{\infty }=(\bar{A}\otimes I_m)((1_n\pi _r^{T})\otimes I_m)=A_{\infty } \end{aligned}$$
$$\begin{aligned} A_{\infty }A_{\infty }=((1_n\pi _r^{T})\otimes I_m)((1_n\pi _r^{T})\otimes I_m)=A_{\infty } \end{aligned}$$

Therefore,\(AA_{\infty }-A_{\infty }A_{\infty } = 0\), then there are the following formulas

$$\begin{aligned} Ax-A_{\infty }x=(A-A_{\infty })(x-A_{\infty }x). \end{aligned}$$
(10)

Because \(\rho (A-A_{\infty })=\rho ((\bar{A}-1_n\pi _r^T)\otimes I_m)<1\), according to [15], there is a matrix norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\) such that \(\sigma _A={{\left| \left| \left| A-A_{\infty } \right| \right| \right| }}_A<1\). In addition, according to Theorem 5.7.13 in [15], there is a corresponding vector norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\) for any matrix norm \(\Arrowvert \cdot \Arrowvert _A\), such that for all matrices Y and vectors y, \(\Arrowvert Yy \Arrowvert _A \le {{\left| \left| \left| Y \right| \right| \right| }}_A\Arrowvert y \Arrowvert _A\) . Therefore, Eq. (10) leads to:

$$ \begin{aligned}&\Arrowvert Ax-A_{\infty }x \Arrowvert _A=\Arrowvert (A-A_{\infty })(x-A_{\infty }x) \Arrowvert _A \\&\le {{\left| \left| \left| A-A_{\infty } \right| \right| \right| }}_A\Arrowvert x-A_{\infty }x \Arrowvert _A = \sigma _A\Arrowvert x-A_{\infty }x \Arrowvert _A. \end{aligned} $$

Thus Eq. (8) of Lemma 1 is proved; Eq. (9) can be shown in the same way.
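The limit matrix \(A_{\infty }\) and the contraction property used in Lemma 1 can be checked numerically. The short sketch below (the function name is assumed, and the spectral radius stands in for the abstract norm \({{\left| \left| \left| \cdot \right| \right| \right| }}_A\)) computes \(\pi _r\) and \(A_{\infty }=1_n\pi _r^T\) for a row-stochastic \(\bar{A}\) and verifies the identities used in the proof.

```python
import numpy as np

def perron_limit(A_bar):
    """Return the left Perron vector pi_r (with pi_r^T 1_n = 1) and A_inf = 1_n pi_r^T."""
    w, V = np.linalg.eig(A_bar.T)                     # left eigenvectors of A_bar
    pi_r = np.real(V[:, np.argmin(np.abs(w - 1))])    # eigenvector for eigenvalue 1
    pi_r = pi_r / pi_r.sum()                          # normalize so that pi_r^T 1_n = 1
    A_inf = np.outer(np.ones(len(pi_r)), pi_r)        # A_inf = 1_n pi_r^T
    # properties used in the proof of Lemma 1
    assert np.allclose(A_bar @ A_inf, A_inf)          # A A_inf = A_inf
    assert np.allclose(A_inf @ A_inf, A_inf)          # A_inf A_inf = A_inf
    assert np.max(np.abs(np.linalg.eigvals(A_bar - A_inf))) < 1.0   # rho(A - A_inf) < 1
    return pi_r, A_inf
```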

Lemma 2

$$\begin{aligned} (1_n^T \otimes I_m)s_k=(1_n^T \otimes I_m)\nabla f(x_k),\forall k. \end{aligned}$$

Proof

\( (1_n^T \otimes I_m)s_{k+1} =(1_n^T \otimes I_m)(\bar{B}\otimes I_m)[s_k+\nabla f(x_{k+1})-\nabla f(x_k)] =(1_n^T \otimes I_m)s_k+(1_n^T \otimes I_m)(\nabla f(x_{k+1})-\nabla f(x_k)). \) Summing this relation over iterations and using the initial condition \(s_0=\nabla f(x_0)\) gives \( (1_n^T \otimes I_m)s_k =(1_n^T \otimes I_m)(s_0-\nabla f(x_0))+(1_n^T \otimes I_m)\nabla f(x_k) =(1_n^T \otimes I_m)\nabla f(x_k). \)
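This conservation property is easy to verify in simulation. The snippet below (names assumed for illustration) replays update (7b) along an arbitrary state sequence and checks that the sum of the trackers always equals the sum of the current gradients, as Lemma 2 states.

```python
import numpy as np

def check_tracking_invariant(B_bar, grads, x_seq):
    """Verify (1^T x I) s_k = (1^T x I) grad f(x_k) along a scalar state sequence x_seq."""
    s = np.array([g(xi) for g, xi in zip(grads, x_seq[0])])      # s_0 = grad f(x_0)
    for x_old, x_new in zip(x_seq[:-1], x_seq[1:]):
        g_old = np.array([g(xi) for g, xi in zip(grads, x_old)])
        g_new = np.array([g(xi) for g, xi in zip(grads, x_new)])
        s = B_bar @ (s + g_new - g_old)                          # update (7b)
        assert np.isclose(s.sum(), g_new.sum())                  # invariant of Lemma 2
```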

Lemma 3

[18]. If the function f satisfies Assumptions 2 and 3, where \(\mu \) and l are the strong-convexity and Lipschitz constants respectively, then for any \(x\in \mathbb {R}^m\) and any step size \(0<\alpha <\frac{1}{l}\),

$$\begin{aligned} \Arrowvert x-\alpha \nabla f(x)-x^{*} \Arrowvert \le (1-\mu \alpha )\Arrowvert x-x^{*}\Arrowvert . \end{aligned}$$

Lemma 4

[15]. Suppose the matrix \(W\in \mathbb {R}^{n\times n}\) is non-negative and the vector \(w\in \mathbb {R}^{n}\) is positive. If \(Ww<\zeta w\) with \(\zeta >0\), then \(\rho (W)<\zeta \).
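Lemma 4 is the tool used later to bound \(\rho (J_{\bar{\alpha },\beta })\); the quick numerical illustration below uses an arbitrary example matrix, chosen only for intuition.

```python
import numpy as np

# Nonnegative W and positive w with W w < zeta * w componentwise
W = np.array([[0.5, 0.2],
              [0.1, 0.6]])
w = np.array([1.0, 1.0])
zeta = 0.9

assert np.all(W @ w < zeta * w)                # hypothesis of Lemma 4
rho = np.max(np.abs(np.linalg.eigvals(W)))
print(rho, "<", zeta)                          # conclusion: rho(W) = 0.7 < 0.9
```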

The subsequent convergence analysis is carried out through the contraction relationships among the following four quantities:

\(1)\quad \Arrowvert x_{k+1}-A_{\infty }x_{k+1}\Arrowvert _A;\)

\(2)\quad \Arrowvert x_{k+1}-Ax_{k+1}\Arrowvert _2;\)

\(3)\quad \Arrowvert A_{\infty }x_{k+1}-1_n \otimes x^{*}\Arrowvert _2;\)

\(4)\quad \Arrowvert s_{k+1}-B_{\infty }s_{k+1}\Arrowvert _B.\)

Norms on a finite-dimensional linear space are equivalent, that is, there are positive constants c, d, h, q, g, p such that the vector norms satisfy the following inequalities:

$$\begin{aligned} \Arrowvert \cdot \Arrowvert _A \le c\Arrowvert \cdot \Arrowvert _B,\quad \Arrowvert \cdot \Arrowvert _2 \le h\Arrowvert \cdot \Arrowvert _B,\quad \Arrowvert \cdot \Arrowvert _2 \le g\Arrowvert \cdot \Arrowvert _A, \end{aligned}$$
$$\begin{aligned} \Arrowvert \cdot \Arrowvert _B \le d\Arrowvert \cdot \Arrowvert _A, \quad \Arrowvert \cdot \Arrowvert _B \le q\Arrowvert \cdot \Arrowvert _2,\quad \Arrowvert \cdot \Arrowvert _A \le p\Arrowvert \cdot \Arrowvert _2. \end{aligned}$$

Lemma 5

For \(\forall k\ge 0\), the following inequality holds,

$$\begin{aligned} \Arrowvert s_k \Arrowvert _2&\le h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B +{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}\Arrowvert A_{\infty }x_k-1_n\otimes x^{*} \Arrowvert _2 \end{aligned}$$

where \(\bar{l}=\max {\{l_i\}}\).

Proof

$$\begin{aligned} \Arrowvert s_k \Arrowvert _2 \le h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\Arrowvert B_{\infty }s_k \Arrowvert _2 \end{aligned}$$

By Lemma 2 and the optimality condition \(\sum \limits _{i=1}^n\nabla f_i(x^{*})=0\),

$$\begin{aligned} \Arrowvert B_{\infty }s_k \Arrowvert _2&=\Arrowvert (\pi _c\otimes I_m)(1_n^T\otimes I_m)s_k \Arrowvert _2 =\Arrowvert \pi _c \Arrowvert _2\Arrowvert (1_n^T\otimes I_m)s_k \Arrowvert _2\\&=\Arrowvert \pi _c \Arrowvert _2\Arrowvert \sum \limits _{i=1}^n \nabla f_i(x_{i,k})- \sum \limits _{i=1}^n\nabla f_i(x^{*}) \Arrowvert _2 \le \Arrowvert \pi _c \Arrowvert _2\bar{l}\sum \limits _{i=1}^n\Arrowvert x_{i,k}-x^{*} \Arrowvert _2\\&\le \Arrowvert \pi _c \Arrowvert _2\bar{l}\sqrt{n}\Arrowvert x_k-1_n\otimes x^{*} \Arrowvert _2\\&\le {{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+{{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}\Arrowvert A_{\infty }x_k-1_n\otimes x^{*} \Arrowvert _2. \end{aligned}$$

The proof is completed.

Lemma 6

For \(\forall k\ge 0\), we have the following inequality,

$$\begin{aligned} \Arrowvert x_{k+1}-A_{\infty }x_{k+1} \Arrowvert _A&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A +\bar{\alpha }p{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2\\&\quad +\beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A, \end{aligned}$$

where \(\bar{\alpha }= \max {\{\alpha _i\}}\).

Proof

$$\begin{aligned} \Arrowvert x_{k+1}-A_{\infty }x_{k+1} \Arrowvert _A&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_{-}-A_{\infty }x_k+A_{\infty }D_\alpha s_k -\beta A_{\infty }[Ax_k-x_k]_- \Arrowvert _A\\&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\Arrowvert D_\alpha s_k-A_{\infty }D_\alpha s_k \Arrowvert _A+ \beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A\\&\le \sigma _A\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A +\bar{\alpha }p{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A\Arrowvert Ax_k-x_k \Arrowvert _A. \end{aligned}$$

Lemma 7

For \(\forall k\ge 0\), we have the following inequality,

$$\begin{aligned} \Arrowvert x_{k+1}-Ax_{k+1} \Arrowvert _2&\le (\sigma _A+\sigma _A^2)g\Arrowvert x_k-A_{\infty }x_k\Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2\\&\quad +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert x_{k+1}-Ax_{k+1} \Arrowvert _2&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_- -A^2x_k+AD_\alpha s_k -\beta A[Ax_k-x_k]_- \Arrowvert _2\\&=\Arrowvert Ax_k-A_{\infty }x_k-A^2x_k+A_{\infty }x_k -(I_{mn}-A)D_\alpha s_k +\beta (I_{mn}-A)[Ax_k-x_k]_- \Arrowvert _2\\&\le \sigma _Ag\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\sigma _A^2g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2\\&=(\sigma _A+\sigma _A^2)g\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A+\bar{\alpha }{{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert s_k \Arrowvert _2 +\beta {{\left| \left| \left| I_{mn}-A \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Lemma 8

When \(0<\bar{\alpha }<\frac{1}{nl\pi _r^T\pi _c}\), for \(\forall k\ge 0\), we have the following inequality:

$$\begin{aligned} \Arrowvert A_{\infty }x_{k+1}-1_n\otimes x^{*}\Arrowvert _2&\le (1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2+\bar{\alpha }(\pi _r^T\pi _c)nlg\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\&\quad +\bar{\alpha }h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\beta {{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert A_{\infty }x_{k+1}-1_n\otimes x^{*}\Arrowvert _2&=\Arrowvert A_{\infty }(Ax_k-D_{\alpha } s_k+(D_{\alpha }-D_{\alpha })B_{\infty }s_k+\beta [Ax_k-x_k]_-)-1_n\otimes x^{*}\Arrowvert _2\\&\le \Arrowvert A_{\infty }x_k-A_{\infty }D_{\alpha }B_{\infty }\nabla f(x_k)-(1_n\otimes I_m)x^{*}\Arrowvert _2+\bar{\alpha }h\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B\\&\quad +\beta {{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2\Arrowvert x_k-Ax_k \Arrowvert _2. \end{aligned}$$

\( A_{\infty }B_{\infty }=((1_n\pi _r^T)\otimes I_m)((\pi _c 1_n^T)\otimes I_m)=(\pi _r^T\pi _c)(1_n1_n^T)\otimes I_m. \)

$$\begin{aligned}&\Arrowvert ((1_n\pi _r^T)\otimes I_m)x_k-(1_n\otimes I_m)x^{*}-A_{\infty }D_{\alpha }B_{\infty }\nabla f(x_k)\Arrowvert _2\\&=\Arrowvert (1_n\otimes I_m)((\pi _r^T\otimes I_m)x_k-(\pi _r^T\mathrm {diag}(\alpha )\pi _c)(1_n^T\otimes I_m)\nabla f(x_k)-x^{*})\Arrowvert _2\\&\le \Arrowvert (1_n\otimes I_m)((\pi _r^T\otimes I_m)x_k-n(\pi _r^T\pi _c)\bar{\alpha }\nabla F((\pi _r^T\otimes I_m)x_k)-x^{*})\Arrowvert _2\\&\quad +n(\pi _r^T\pi _c)\bar{\alpha }\Arrowvert (1_n\otimes I_m)(\nabla F((\pi _r^T\otimes I_m)x_k)-\tfrac{1}{n}(1_n^T\otimes I_m)\nabla f(x_k))\Arrowvert _2\\&\triangleq s_1+s_2, \end{aligned}$$

where \(F(x)=\frac{1}{n}\sum \limits _{i=1}^n f_i(x)\) denotes the global objective in (1).

From Lemma 3, if \(0<n(\pi _r^T\pi _c)\bar{\alpha }<\frac{1}{l}\), then

$$\begin{aligned} s_1&=\sqrt{n}\Arrowvert (\pi _r^T\otimes I_m)x_k-n(\pi _r^T\pi _c)\bar{\alpha }\nabla F((\pi _r^T\otimes I_m)x_k) -x^{*}\Arrowvert _2\\&\le \sqrt{n}(1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert (\pi _r^T\otimes I_m)x_k-x^{*}\Arrowvert _2\\&= (1-n\mu (\pi _r^T\pi _c)\bar{\alpha })\Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2, \end{aligned}$$

$$\begin{aligned} s_2 \le \bar{\alpha }(\pi _r^T\pi _c)n\Arrowvert \nabla f((1_n\otimes I_m)(\pi _r^T\otimes I_m)x_k)-\nabla f(x_k) \Arrowvert _2 \le \bar{\alpha }(\pi _r^T\pi _c)nlg\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A. \end{aligned}$$

Lemma 9

For \(\forall k\ge 0\), we have the following inequality

$$\begin{aligned} \Arrowvert s_{k+1}-B_{\infty }s_{k+1} \Arrowvert _B&\le \sigma _B\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\sigma _Bq\bar{l}g{{\left| \left| \left| A-I_{mn} \right| \right| \right| }}_2\Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\&\quad +\sigma _Bq\bar{l}\beta \Arrowvert x_k-Ax_k \Arrowvert _2 +\sigma _Bq\bar{l}\bar{\alpha }\Arrowvert s_k \Arrowvert _2. \end{aligned}$$

Proof

$$\begin{aligned} \Arrowvert s_{k+1}-B_{\infty }s_{k+1} \Arrowvert _B&=\Arrowvert B[s_k+\nabla f(x_{k+1})-\nabla f(x_k)]-B_{\infty }B[s_k+\nabla f(x_{k+1})-\nabla f(x_k)] \Arrowvert _B\\&\le \sigma _B\Arrowvert s_k-B_{\infty }s_k \Arrowvert _B+\sigma _B\bar{l}q\Arrowvert x_{k+1}-x_k \Arrowvert _2, \end{aligned}$$

$$\begin{aligned} \Arrowvert x_{k+1}-x_k \Arrowvert _2&=\Arrowvert Ax_k-D_\alpha s_k +\beta [Ax_k-x_k]_--x_k \Arrowvert _2\\&\le \Arrowvert (A-I_{mn})(x_k-A_{\infty }x_k) \Arrowvert _2+\beta \Arrowvert x_k-Ax_k\Arrowvert _2+\bar{\alpha }\Arrowvert s_k \Arrowvert _2. \end{aligned}$$

Combining the two relations completes the proof.

5 Analysis of Convergence Results

The analysis of the convergence results is given below.

Theorem 1

$$\begin{aligned} t_{k+1}<J_{\bar{\alpha },\beta }t_k,\forall k\ge 0 \end{aligned}$$

where \(t_k\in \mathbb {R}^4,J_{\bar{\alpha },\beta }\in \mathbb {R}^{4\times 4}\) are given by

$$t_k=\left( \begin{array}{r} \Arrowvert x_k-A_{\infty }x_k \Arrowvert _A\\ \Arrowvert A_{\infty }x_k-1_n\otimes x^{*}\Arrowvert _2\\ \Arrowvert x_k-Ax_k \Arrowvert _2\\ \Arrowvert s_k-B_{\infty }s_k \Arrowvert _B \end{array}\right) $$
$$J_{\bar{\alpha },\beta } = \begin{pmatrix} \sigma _A+a_1\bar{\alpha }&{}a_2\bar{\alpha }&{}a_3\beta &{}a_4\bar{\alpha }\\ a_5\bar{\alpha }&{}1-a_6\bar{\alpha }&{}a_7\beta &{}a_8\bar{\alpha }\\ (\sigma _A+\sigma _A^2)a_9+a_{10}\bar{\alpha }&{}a_{11}\bar{\alpha }&{}a_{12}\beta &{}a_{13}\bar{\alpha }\\ \sigma _Ba_{14}+\sigma _Ba_{15}\bar{\alpha }&{}\sigma _Ba_{16}\bar{\alpha }&{}\sigma _Ba_{17}\beta &{}\sigma _B+\sigma _Ba_{18}\bar{\alpha } \end{pmatrix} $$

where \(a_i\) in the above expression are \(a_1={{\left| \left| \left| B \right| \right| \right| }}_2\bar{l}gm{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_2=p\bar{l}{{\left| \left| \left| B \right| \right| \right| }}_2{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_3={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_A, a_4=ph{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_5=(\pi _r^T\pi _c)nlg, a_6=n\mu (\pi _r^T\pi _c), a_7={{\left| \left| \left| A_{\infty } \right| \right| \right| }}_2, a_8=h, a_9=g, a_{10}=\bar{l}g{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{11}={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{12}={{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{13}=h{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{14}=\bar{l}qg{{\left| \left| \left| I_{mn}-A_{\infty } \right| \right| \right| }}_2, a_{15}=\bar{l}^2qg{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{16}=\bar{l}^2q{{\left| \left| \left| B_{\infty } \right| \right| \right| }}_2, a_{17}=\bar{l}q, a_{18}=\bar{l}qh \)

Define the positive vector \(\boldsymbol{\delta }=[\delta _1,\delta _2,\delta _3,\delta _4]\), where

\(\delta _1=1-\sigma _B,\quad \delta _2=2\frac{a_5(1-\sigma _B)+2\sigma _Ba_{14}}{a_6},\quad \delta _3=2(\sigma _A+\sigma _A^2)a_9,\quad \delta _4=2\sigma _Ba_{14}.\)

If \(\bar{\alpha }\) and \(\beta \) are chosen within the following ranges:

$$\begin{aligned} \begin{aligned} 0<\bar{\alpha }<\min \{\frac{1}{nl\pi _r^T\pi _c}, \frac{(1-\sigma _A)\delta _1}{a_1\delta _1+a_2\delta _2+a_4\delta _4}, \frac{\delta _3-(\sigma _A+\sigma _A^2)a_9}{a_{10}\delta _1+a_{11}\delta _2+a_{13}\delta _4},\\ \frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1}{a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4}\} \end{aligned} \end{aligned}$$
(11)
$$\begin{aligned} \begin{aligned}&0<\beta <\min \{\frac{(1-\sigma _A)\delta _1-(a_1\delta _1+a_2\delta _2+a_4\delta _4)\bar{\alpha }}{a_3\delta _3}, \frac{(a_6\delta _2-a_5\delta _1-a_8\delta _4)\bar{\alpha }}{a_7\delta _3},\\&\frac{\delta _3-(\sigma _A+\sigma _A^2)a_9-a_{10}\bar{\alpha }\delta _1-(a_{11}\delta _2+a_{13}\delta _4)\bar{\alpha }}{a_{12}\delta _3},\\&\frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1-(a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4)\bar{\alpha }}{\sigma _Ba_{17}\delta _3} \} \end{aligned} \end{aligned}$$
(12)

then \(\rho (J_{\bar{\alpha },\beta })<1\). Therefore, \(\Arrowvert x_k-1_n\otimes x^{*}\Arrowvert _2\) converges linearly to 0 at the rate of \(\mathcal {O}(\rho (J_{\bar{\alpha },\beta })^k)\).

Proof

It is easy to verify \(t_{k+1}<J_{\bar{\alpha },\beta }t_k,\forall k\ge 0\) from Lemmas 5–9. To prove that \(\Arrowvert x_k-1_n\otimes x^{*}\Arrowvert _2\) converges linearly, it suffices to show that there exist \(\bar{\alpha }\) and \(\beta \) such that \(\rho (J_{\bar{\alpha },\beta })<1\). By Lemma 4, we need to prove that there exist \(\bar{\alpha }\) and \(\beta \) satisfying \(J_{\bar{\alpha },\beta }\boldsymbol{\delta }<\boldsymbol{\delta }\) for some positive vector \(\boldsymbol{\delta }=[\delta _1,\delta _2,\delta _3,\delta _4]\), and then solve for the admissible ranges of \(\bar{\alpha }\) and \(\beta \). The inequality \(J_{\bar{\alpha },\beta }\boldsymbol{\delta }<\boldsymbol{\delta }\) is equivalent to the following component-wise conditions:

$$\begin{aligned} a_3\delta _3\beta <(1-\sigma _A)\delta _1-(a_1\delta _1+a_2\delta _2+a_4\delta _4)\bar{\alpha } \end{aligned}$$
(13)
$$\begin{aligned} a_7\delta _3\beta <(a_6\delta _2-a_5\delta _1-a_8\delta _4)\bar{\alpha } \end{aligned}$$
(14)
$$\begin{aligned} a_{12}\delta _3\beta <\delta _3-((\sigma _A+\sigma _A^2)a_9+a_{10}\bar{\alpha }\delta _1)-a_{11}\delta _2\bar{\alpha }-a_{13}\delta _4\bar{\alpha } \end{aligned}$$
(15)
$$\begin{aligned} \sigma _Ba_{17}\delta _3\beta <-\sigma _Ba_{14}\delta _1+(1-\sigma _B)\delta _4 -(\sigma _Ba_{18}\delta _4+\sigma _Ba_{15}\delta _1+\sigma _Ba_{16}\delta _2)\bar{\alpha }. \end{aligned}$$
(16)

Since \(\beta >0\), the right-hand sides of the above four inequalities must be positive, which yields the following conditions on \(\bar{\alpha },\delta _1,\delta _2,\delta _3,\delta _4\):

$$\begin{aligned} \bar{\alpha }<\frac{(1-\sigma _A)\delta _1}{a_1\delta _1+a_2\delta _2+a_4\delta _4} \end{aligned}$$
(17)
$$\begin{aligned} \bar{\alpha }<\frac{\delta _3-(\sigma _A+\sigma _A^2)a_9}{a_{10}\delta _1+a_{11}\delta _2+a_{13}\delta _4} \end{aligned}$$
(18)
$$\begin{aligned} \bar{\alpha }<\frac{(1-\sigma _B)\delta _4-\sigma _Ba_{14}\delta _1}{a_{15}\sigma _B\delta _1+a_{16}\sigma _B\delta _2+a_{18}\sigma _B\delta _4} \end{aligned}$$
(19)
$$\begin{aligned} \delta _2>\frac{a_5\delta _1+a_8\delta _4}{a_6} \end{aligned}$$
(20)

Because \(\bar{\alpha }>0\), we can choose \(\delta _1,\delta _2,\delta _3,\delta _4\) so that the above upper bounds on \(\bar{\alpha }\) are positive. According to formulas (17)–(20), select the values of \(\delta _i\) as follows: \(\delta _1=1-\sigma _B,\quad \delta _2=2\frac{a_5(1-\sigma _B)+2\sigma _Ba_{14}}{a_6},\quad \delta _3=2(\sigma _A+\sigma _A^2)a_9,\quad \delta _4=2\sigma _Ba_{14}.\)

After determining \(\delta _i\), the upper bound of \(\bar{\alpha }\) can be determined according to inequalities (17)–(19), and the upper bound of \(\beta \) according to inequalities (13)–(16), which gives (11) and (12). Theorem 1 is thus proved.

Remark 2:

From the above theorem, we obtain the linear convergence rate of the algorithm. However, since \(\sigma _A\), \(\sigma _B\), and the norm-equivalence constants are unknown in practice, the bounds on \(\bar{\alpha }\) and \(\beta \) cannot be given explicitly, and the parameters must be tuned manually to obtain the best performance.

Fig. 1. Directed graph with six agents

6 Numerical Experiment

In this section, Matlab simulations are used to verify the effectiveness of the proposed algorithm.

Figure 1 shows the communication topology of a directed, strongly connected network with six agents; we consider the distributed convex optimization problem (1) over this network. The local objective function of each agent is

\(f_1(x_1)=x_1^2-2x_1+\cos (x_1)+3, \quad f_2(x_2)=x_2^2-5x_2+e^{-0.1x_2}-1,\)

\(f_3(x_3)=x_3^2-3x_3-0.5\sin (x_3)-3, \quad f_4(x_4)=x_4^2+2x_4^4-3,\)

\(f_5(x_5)=x_5^2+3x_5+1, \quad f_6(x_6)=4x_6^2+2x_6-\cos (x_6)+3.\)
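The reported optimal value can be cross-checked by solving \(\sum _{i=1}^6 f_i'(x)=0\) directly. The sketch below is an independent sanity check rather than the distributed algorithm itself (the gradient list and root-finding interval are assumptions for illustration); it recovers \(x^*\approx 0.2980\), and the same gradient callables can be plugged into the iteration sketch given earlier in Sect. 3.

```python
import numpy as np
from scipy.optimize import brentq

# Gradients of the six local objectives
grads = [
    lambda x: 2*x - 2 - np.sin(x),
    lambda x: 2*x - 5 - 0.1*np.exp(-0.1*x),
    lambda x: 2*x - 3 - 0.5*np.cos(x),
    lambda x: 2*x + 8*x**3,
    lambda x: 2*x + 3,
    lambda x: 8*x + 2 + np.sin(x),
]

# The minimizer of (1/n) sum_i f_i solves sum_i f_i'(x) = 0
total_grad = lambda x: sum(g(x) for g in grads)
x_star = brentq(total_grad, -10.0, 10.0)
print(round(x_star, 4))   # approximately 0.2980
```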

Fig. 2. Agent state trajectory

Fig. 3. Agent state trajectory (algorithm of [12])

The optimal solution in this case is \(x^*=0.2980\). Figure 2 shows that the algorithm proposed in this paper finally finds the optimal value. Figure 3 is the agent trajectory diagram of the algorithm proposed in [12]; the comparison shows that the algorithm proposed in this article has a faster convergence rate. Distributed optimization can also be applied to drone formation: a drone that needs to control its own position can make the decision based on the position information of nearby drones.

7 Conclusion

We proposed an improved fully distributed optimization algorithm for directed strongly connected graphs in this paper. Under the assumption that the objective functions are strongly convex with Lipschitz continuous gradients, all agents converge to the optimal point at a geometric rate under the proposed algorithm. By introducing a row-stochastic matrix, a column-stochastic matrix, and a momentum term, the convergence rate of our algorithm is higher than that of existing methods in the literature.