Abstract
In the literature, it was shown recently that the Douglas–Rachford alternating direction method of multipliers can be combined with the logarithmic-quadratic proximal regularization for solving a class of variational inequalities with separable structures. This paper studies the inexact version of this combination, where the resulting subproblems are allowed to be solved approximately subject to different inexactness criteria. We prove the global convergence and establish worst-case convergence rates for the derived inexact algorithms.
1 Introduction
Variational inequality problems (VIPs) with separable structures and linear constraints arise in a wide range of applications. For solving these VIPs, the Douglas–Rachford alternating direction method of multipliers (ADMM), proposed originally in [1], is a benchmark. The proximal point algorithm (PPA) in [2, 3] was suggested for regularizing ADMM’s subproblems, and accordingly an algorithmic framework for an inexact version of ADMM with proximal regularization was established. Moreover, in [4, 5] it was suggested to regularize the ADMM subproblems by the logarithmic-quadratic proximal (LQP) regularization proposed in [6]; the complementarity subproblems of ADMM then reduce to systems of nonlinear equations. In general, however, it is not possible to acquire the exact solutions of these subproblems. Therefore, we are interested in investigating how to solve them approximately. Our purpose is to study the inexact version of the combination of ADMM with LQP regularization, seeking appropriate inexactness criteria under which this inexact version can still be guaranteed to converge. We shall study several inexactness criteria with different levels of implementability. For the resulting inexact algorithms, we prove global convergence and establish worst-case O(1/t) convergence rates.
The rest of this paper is organized as follows. We first state the model to be studied and the motivation of considering the inexact version of ADMM with LQP regularization in Sect. 2. Then, in Sect. 3, we summarize some preliminaries for further analysis. After that, a conceptual inexact version of the ADMM with LQP regularization is presented in Sect. 4; an implementable algorithm based on the inexact version of ADMM with LQP regularization is proposed in Sect. 5. For both the conceptual and the implementable algorithms, the global convergence and a worst-case convergence rate are established. Finally, some concluding remarks are drawn in Sect. 6.
2 Model and Motivation
More specifically, the VIP we consider is to find a \(u^{*} \in \mathcal{U}\) such that
$$ \bigl(u - u^{*}\bigr)^{T} F\bigl(u^{*}\bigr) \ge 0, \quad \forall\, u \in \mathcal{U}, $$ (1)
with
$$ u := \begin{pmatrix} x \\ y \end{pmatrix}, \qquad F(u) := \begin{pmatrix} f(x) \\ g(y) \end{pmatrix}, \qquad \mathcal{U} := \bigl\{ (x,y) \;\big|\; x \in \mathbb{R}^{n}_{+},\; y \in \mathbb{R}^{p}_{+},\; Ax + By = b \bigr\}, $$ (2)
where \(A\in \mathbb{R}^{m \times n}\) and \(B\in \mathbb{R}^{m \times p}\) are given matrices; \(b\in \mathbb{R}^{m}\) is a given vector; \(f: \mathbb{R}^{n}_{+} \rightarrow \mathbb{R}^{n}\) and \(g: \mathbb{R}^{p}_{+} \rightarrow \mathbb{R}^{p}\) are continuous and monotone operators.
By attaching a Lagrange multiplier \(\lambda\in \mathbb{R}^{m}\) to the linear constraint in (2), we can reformulate (1)–(2) as: finding a \(w^{*} \in \mathcal{W}\) such that
$$ \bigl(w - w^{*}\bigr)^{T} Q\bigl(w^{*}\bigr) \ge 0, \quad \forall\, w \in \mathcal{W}, $$ (3)
where
$$ w := \begin{pmatrix} x \\ y \\ \lambda \end{pmatrix}, \qquad Q(w) := \begin{pmatrix} f(x) - A^{T}\lambda \\ g(y) - B^{T}\lambda \\ Ax + By - b \end{pmatrix}, \qquad \mathcal{W} := \mathbb{R}^{n}_{+} \times \mathbb{R}^{p}_{+} \times \mathbb{R}^{m}. $$ (4)
We denote by SVI\((\mathcal{W},Q)\) the structured VIP (3)–(4); its solution set, denoted by \(\mathcal{W}^{*}\), is assumed to be nonempty.
The ADMM in [1] for solving SVI\((\mathcal{W},Q)\) is
$$ x^{k+1} \in \mathbb{R}^{n}_{+}, \quad \bigl(x - x^{k+1}\bigr)^{T}\bigl\{ f\bigl(x^{k+1}\bigr) - A^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k} - b\bigr)\bigr]\bigr\} \ge 0, \quad \forall\, x \in \mathbb{R}^{n}_{+}, $$ (5)
$$ y^{k+1} \in \mathbb{R}^{p}_{+}, \quad \bigl(y - y^{k+1}\bigr)^{T}\bigl\{ g\bigl(y^{k+1}\bigr) - B^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr)\bigr]\bigr\} \ge 0, \quad \forall\, y \in \mathbb{R}^{p}_{+}, $$ (6)
$$ \lambda^{k+1} := \lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr), $$ (7)
where \(H \in \mathbb{R}^{m \times m}\) is a positive definite penalty matrix. In the literature, it is often chosen as \(H:=\beta\cdot I\), where I is the identity matrix in \(\mathbb{R}^{m \times m}\) and β>0 is a scalar.
To truly implement the ADMM scheme (5)–(7), the resulting complementarity problems should be solved efficiently. In fact, for some particular cases where f, g, A, and B are all special enough, the resulting subproblems of ADMM may have closed-form solutions or can be solved easily to high precision. This feature has triggered widespread application of ADMM in the literature to problems ranging from semidefinite programming, image processing, statistical learning, control, and computer vision to numerical linear algebra; see, e.g., [7] and references therein. On the other hand, and equally importantly, for generic settings of f, g, A, and B, we can only expect to obtain approximate solutions of these subproblems, subject to certain accuracy, by applying some iterative scheme internally. In this case, it is important to analyze under which criteria these subproblems can be solved approximately and how to find truly implementable inexactness criteria. In [4, 5] it was suggested to regularize the ADMM subproblems by the LQP regularization proposed in [6]. The LQP regularization forces the solutions of the ADMM subproblems to be interior points of \(\mathbb{R}^{n}_{+}\) and \(\mathbb{R}^{p}_{+}\), respectively; thus the complementarity subproblems (5) and (6) reduce to systems of nonlinear equations. More specifically, the iterative scheme of ADMM with LQP regularization is as follows:
$$ f\bigl(x^{k+1}\bigr) - A^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k} - b\bigr)\bigr] + R\bigl[\bigl(x^{k+1} - x^{k}\bigr) + \mu\bigl(x^{k} - X_{k}^{2}\bigl(x^{k+1}\bigr)^{-1}\bigr)\bigr] = 0, $$ (8)
$$ g\bigl(y^{k+1}\bigr) - B^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr)\bigr] + S\bigl[\bigl(y^{k+1} - y^{k}\bigr) + \mu\bigl(y^{k} - Y_{k}^{2}\bigl(y^{k+1}\bigr)^{-1}\bigr)\bigr] = 0, $$ (9)
$$ \lambda^{k+1} := \lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr), $$ (10)
where \(R\in \mathbb{R}^{n \times n}\) and \(S\in \mathbb{R}^{p \times p}\) are symmetric positive definite diagonal matrices, called proximal matrices; \(R(x^{k+1}-x^{k})\) and \(S(y^{k+1}-y^{k})\) are quadratic proximal regularization terms; μ∈]0,1[ is a given constant; \(X_{k} := \operatorname{diag}(x^{k}_{1}, x^{k}_{2}, \ldots, x_{n}^{k})\) and \((x^{k+1})^{-1}\in \mathbb{R}^{n}\) is the vector whose jth element is \(1/x_{j}^{k+1}\); \(Y_{k} := \operatorname{diag}(y^{k}_{1}, y^{k}_{2}, \ldots, y_{p}^{k})\) and \((y^{k+1})^{-1} \in \mathbb{R}^{p}\) is the vector whose jth element is \(1/y_{j}^{k+1}\). In [8], this scheme was further shown to converge at a worst-case O(1/t) rate, provided that both the x- and the y-subproblems (i.e., (8) and (9)) are solved exactly. Here, the worst-case O(1/t) convergence rate of ADMM with LQP regularization means that the scheme (8)–(10) achieves a solution of SVI\((\mathcal{W},Q)\) with an accuracy of O(1/t) after at most t iterations.
To solve the complementarity subproblems (8) and (9), we shall study several inexactness criteria with different levels of implementability. When the accuracy of solving these subproblems is relaxed, it is easy to understand that some additional correction might need to be supplemented after each iteration of ADMM with LQP regularization. In other words, relaxing the accuracy of solving these subproblems might come at the price of requiring an additional correction of the subproblems’ solutions. We thus suggest solving these subproblems either under a more restrictive criterion but without a further correction step, or under a more relaxed criterion but with a further correction step. How to balance the accuracy of the inner subproblems against the need for correction depends on how special a concrete application of the abstract model (3)–(4) is, and it is not meaningful to discuss this issue for the abstract model (3)–(4) without any specification of the involved functions and sets. Instead, in this paper, we provide some inexact algorithms based on the framework of combining ADMM with LQP regularization. In addition to global convergence, a worst-case O(1/t) convergence rate is established for these inexact algorithms. Therefore, we show that these inexact algorithms, which are more practical than the exact version (8)–(10), enjoy the same worst-case convergence rate as the exact version.
3 Preliminaries
In this section, we recall some basic definitions and properties, which will be frequently used in our later analysis. Some useful results proved already in the literature are also summarized.
3.1 Some Basic Definitions and Properties
Let N be a positive definite matrix with appropriate dimensionality. The N-norm of a vector v with appropriate dimensionality is denoted by \(\|v\|_{N}:= \sqrt{v^{T}Nv}\). Let Ω be a nonempty, closed, and convex subset of \(\mathbb{R}^{l}\). The projection operator under the N-norm is defined by
$$ P_{\Omega, N}(v) := \operatorname{argmin}\bigl\{ \|u - v\|_{N} \;\big|\; u \in \Omega \bigr\}, \quad \forall\, v \in \mathbb{R}^{l}. $$
Clearly, according to this definition, we have the following property related to the projection operator under the N-norm:
$$ \bigl(v - P_{\Omega, N}(v)\bigr)^{T} N \bigl(u - P_{\Omega, N}(v)\bigr) \le 0, \quad \forall\, u \in \Omega,\; \forall\, v \in \mathbb{R}^{l}. $$
In the following, we give the definitions of monotonicity and Lipschitz continuity for a mapping.
Definition 3.1
Let \(\mathcal{X} \subseteq \mathbb{R}^{n}\) and \(f:\mathcal{X} \rightarrow \mathbb{R}^{n}\) be a mapping. Then,
(i) f is said to be monotone with respect to \(\mathcal{X}\) iff the following inequality always holds:
$$ (x - \tilde{x})^T\bigl(f(x) - f(\tilde{x})\bigr)\geq 0,\quad \forall \, x, \tilde{x} \in \mathcal{X}; $$
(ii) f is Lipschitz continuous with respect to \(\mathcal{X}\) iff there exists a constant (called the Lipschitz constant) \(L_f > 0\) such that
$$ \big\| f(x) - f(\tilde{x}) \big\| \le L_f \| x - \tilde{x} \|,\quad \forall \, x, \tilde{x} \in \mathcal{X}. $$
We denote by L f and L g the Lipschitz constants of the mappings f and g in the model (1)–(2). Note that the mapping Q(w) in (4) is monotone with respect to \(\mathcal{W}\) under the monotonicity assumption of f and g. Therefore, the solution set \(\mathcal{W}^{*}\) of SVI\((\mathcal{W},Q)\) is closed and convex; see, e.g., [9].
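The monotonicity of Q is immediate because its affine part is skew-symmetric and therefore cancels in the monotonicity inner product: for any \(w, \tilde{w} \in \mathcal{W}\),
$$ (w - \tilde{w})^{T}\bigl(Q(w) - Q(\tilde{w})\bigr) = (x - \tilde{x})^{T}\bigl(f(x) - f(\tilde{x})\bigr) + (y - \tilde{y})^{T}\bigl(g(y) - g(\tilde{y})\bigr) \ge 0, $$
since the cross terms \(-(x-\tilde{x})^{T}A^{T}(\lambda - \tilde{\lambda})\), \(-(y-\tilde{y})^{T}B^{T}(\lambda - \tilde{\lambda})\), and \((\lambda - \tilde{\lambda})^{T}[A(x-\tilde{x}) + B(y-\tilde{y})]\) sum to zero.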
Throughout the rest of the paper, we define the matrix G as
Last, we recall a characterization of the solution set \(\mathcal{W}^{*}\) proved in [9]; see (2.3.2) on p. 159 of [9]. As shown in [8, 10], this characterization is crucial for establishing the convergence rate of ADMM or ADMM with LQP regularization. More specifically, recall that \(\mathcal{W}^{*}\) can be characterized as
$$ \mathcal{W}^{*} = \bigcap_{w \in \mathcal{W}} \bigl\{ \bar{w} \in \mathcal{W} \;\big|\; (w - \bar{w})^{T} Q(w) \ge 0 \bigr\}. $$
Therefore, as in Definition 1 of [11], \(\bar{w} \in \mathcal{W}\) can be regarded as an ε-approximate solution of SVI\((\mathcal{W},Q)\) if it satisfies
$$ \sup_{w \in \mathcal{B}_{\mathcal{W}}(\bar{w})} \bigl\{ (\bar{w} - w)^{T} Q(w) \bigr\} \le \varepsilon, \quad \text{where } \mathcal{B}_{\mathcal{W}}(\bar{w}) := \bigl\{ w \in \mathcal{W} \;\big|\; \|w - \bar{w}\| \le 1 \bigr\}. $$
In our later analysis, we shall establish the worst-case O(1/t) convergence rate for the new algorithms to be proposed in the sense that after t iterations of these algorithms, we can find a \(\bar{w} \in \mathcal{W}\) such that
$$ \sup_{w \in \mathcal{B}_{\mathcal{W}}(\bar{w})} \bigl\{ (\bar{w} - w)^{T} Q(w) \bigr\} \le \varepsilon $$
with ε=O(1/t).
3.2 A Key Lemma
Inspired by Proposition 1 in [6], the following lemma was proved in the literature (see, e.g., [5, 12]), and its conclusion plays an important role in analyzing the convergence of ADMM with LQP regularization and establishing its convergence rate. Our later analysis is also based on this conclusion.
Lemma 3.1
Let \(P := \operatorname{diag}(p_{1}, p_{2},\ldots, p_{t}) \in \mathbb{R}^{t \times t}\) be a positive definite diagonal matrix, \(q(u) \in \mathbb{R}^{t}\) be a monotone mapping of u with respect to \(\mathbb{R}^{t}_{+}\), and μ∈]0,1[. For a given \({\bar{u}} \in \mathbb{R}^{t}_{++}\), we define \({\bar{U}}:= \operatorname{diag}({\bar{u}}_{1},{\bar{u}}_{2}, \ldots, {\bar{u}}_{t})\). Then, the equation
$$ q(u) + P\bigl[(u - \bar{u}) + \mu\bigl(\bar{u} - \bar{U}^{2} u^{-1}\bigr)\bigr] = 0 $$ (14)
has the unique positive solution u. In addition, for this positive solution \(u \in \mathbb{R}^{t}_{++}\) and any \(v \in \mathbb{R}^{t}_{+} \), we have
Moreover, we have
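When the term q is a constant vector (as in the applications below, where it collects quantities evaluated at known points), equation (14) decouples into independent scalar quadratics, and the unique positive root of each is available in closed form. A minimal sketch in Python, on hypothetical data (the function name and inputs are illustrative, not from the paper):

```python
import math

def lqp_positive_root(q, p, u_bar, mu):
    """Solve q_j + p_j[(u_j - ubar_j) + mu(ubar_j - ubar_j^2 / u_j)] = 0
    componentwise. Multiplying by u_j gives
        p_j u_j^2 + (q_j - p_j(1-mu) ubar_j) u_j - p_j mu ubar_j^2 = 0,
    whose constant term is negative, so exactly one root is positive."""
    u = []
    for qj, pj, ubj in zip(q, p, u_bar):
        bj = qj - pj * (1.0 - mu) * ubj   # linear coefficient
        cj = -pj * mu * ubj * ubj         # constant term (< 0)
        u.append((-bj + math.sqrt(bj * bj - 4.0 * pj * cj)) / (2.0 * pj))
    return u
```

Plugging the returned root back into the scalar equation recovers a zero residual up to floating-point accuracy, which also illustrates why the LQP term keeps every iterate strictly positive.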
3.3 Proof Framework of Convergence
Later, we shall propose several algorithms, which differ in the inexactness criteria for solving the ADMM subproblems with LQP regularization. The convergence proofs for these different algorithms share a common framework, even though the specific techniques required in their respective proofs are quite different. For succinctness, in this subsection we first present the framework of the convergence proof before the algorithms themselves are given. Then, when we establish the convergence of a specific algorithm later, it suffices to check the conditions presented in this framework. The proof framework presented below is thus a unified treatment of the convergence analysis for the different algorithms.
We first establish an identity which will be used later in the convergence analysis. The proof of this lemma requires only straightforward manipulation.
Lemma 3.2
Let \({\bar{x}}^{k} \in \mathbb{R}^{n}\), \(y^{k}, {\bar{y}}^{k} \in \mathbb{R}^{p}\) and \(\lambda^{k} \in \mathbb{R}^{m}\) be given. We define \(\bar{\lambda}^{k}\) and \(\hat{\lambda}^{k+1}\) as
Then, for any \(y \in \mathbb{R}^{p}\) and \(\lambda \in \mathbb{R}^{m}\), we have
Proof
Using the definition of \(\hat{\lambda}^{k+1}\), we have
For the first term in (17), we have the following identity:
Then, it follows from the definitions of \(\bar{\lambda}^{k}\) and \(\hat{\lambda}^{k+1}\) that
and thus
Substituting this and (18) into (17), we have
Note that
Adding the above equality and (19), the assertion is proved. □
Next, we present a unified framework for proving the convergence of the new algorithms.
Theorem 3.1
Let c 0∈]0,+∞[ and μ∈]0,1[ be positive constants; {η k } be a nonnegative sequence with \(\sum_{k=0}^{\infty} \eta_{k} <+\infty\); and G be defined by (12). For any \(w^{*} \in \mathcal{W}^{*}\), if the sequences {w k=(x k,y k,λ k)} and \(\{\bar{w}^{k} = (\bar{x}^{k}, \bar{y}^{k}, \bar{\lambda}^{k})\}\) satisfy
then {w k} is bounded and
Furthermore, if the mapping Q is continuous and
then the sequence {w k} converges to a point in \(\mathcal{W}^{*}\).
Proof
It follows from (20) that for any l≤k and \(w^{*} \in \mathcal{W}^{*}\), we have
Thus the sequence {w k} is bounded, since \(\sum_{i=0}^{\infty} \eta_{i} <+\infty\). Summing the inequality (20) over k=0,1,… , we get
Therefore, we have
It follows that
The first assertion (21) is proved.
Since {w k} is bounded and \(\lim_{k \rightarrow \infty} \|w^{k} - \bar{w}^{k}\|_{G} = 0\), the sequence \(\{\bar{w}^{k}\}\) is also bounded and thus has at least one cluster point. Let w ∞ be a cluster point of \(\{\bar{w}^{k}\}\), and let the subsequences \(\{\bar{w}^{k_{j}}\}\) and \(\{w^{k_{j}}\}\) both converge to w ∞. It follows from (22) that
and consequently
since the mapping Q is continuous. This means that w ∞ is a solution of SVI\((\mathcal{W},Q)\). Note that the inequality (23) is true for all solution points of SVI\((\mathcal{W},Q)\), hence we have
Since \(w^{k_{j}}\rightarrow w^{\infty}\ (j\rightarrow\infty)\) and \(\sum_{i=0}^{\infty} \eta_{i} < + \infty\), for any given ε>0, there exists a j 0>0 such that
Therefore, for any \(k\geq k_{j_{0}}\), it follows from (24) and (25) that
This implies that the sequence {w k} converges to a point w ∞ in \(\mathcal{W}^{*}\). □
According to Theorem 3.1, it suffices to check conditions (20) and (22) when we establish the convergence for the algorithms to be proposed. As the new algorithms are based on the combination of ADMM with LQP regularization, where the subproblems are solved inexactly, we abbreviate them as I-ADMM-LQP X, where “X” is an Arabic numeral indexing a concrete algorithm.
4 The First Algorithm
We propose the first algorithm based on the combination of ADMM with LQP regularization, where the subproblems are solved approximately. The inexactness criterion required by this algorithm involves the exact solutions of the resulting subproblems and thus is not directly applicable in implementation. We still present this algorithm, however, because its convergence analysis is quite simple and provides a sketch for the analyses of the other algorithms studied later. Through this simpler analysis, it becomes easier to understand the later analysis of the implementable algorithms. In addition, although the inexactness criterion itself is not directly checkable, it opens the possibility of seeking implementable criteria based on certain error bounds; see, e.g., [9, 13]. Thus, it deserves at least theoretical interest. Note that this type of criterion is widely used in the PPA and ADMM literature.
4.1 Algorithm 1
In this algorithm, we seek approximate solutions of the subproblems (8)–(9), with the accuracy controlled by a summable sequence of positive scalars.
Algorithm 1
(I-ADMM-LQP 1)
Step 0. Let ε>0; μ∈]0,1[; \(w^{0}:=(x^{0}, y^{0}, \lambda^{0}) \in {\mathbb{R}^{n}_{++}} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\); \(R \in \mathbb{R}^{n \times n}\), \(S \in \mathbb{R}^{p \times p}\) and \(H \in \mathbb{R}^{m \times m}\) be positive definite diagonal matrices; and {ν k } be a nonnegative sequence satisfying \(\sum_{k=0}^{\infty} \nu_{k} < + \infty\). Set k=0.

Step 1. Find \(x^{k+1} \in \mathbb{R}^{n}_{++}\) such that
$$ \big\|x^{k+1} - x_*^{k+1}\big\| \le \nu_k, $$ (26)
where \(x^{k+1}_{*}\) is the exact solution of the system
$$ f(x) - A^T \bigl[\lambda^k - H \bigl(A x + B y^k - b\bigr) \bigr] + R\bigl[\bigl(x-x^k \bigr) + \mu \bigl(x^k - X_k^2 x^{-1}\bigr)\bigr]= 0. $$ (27)

Step 2. Find \(y^{k+1} \in \mathbb{R}^{p}_{++}\) such that
$$ \big\|y^{k+1} - y^{k+1}_*\big\| \le \nu_k, $$ (28)
where \(y^{k+1}_{*}\) is the exact solution of the system
$$ g(y) - B^T \bigl[\lambda^k - H \bigl(A x^{k+1}_* + B y - b\bigr)\bigr] + S\bigl[\bigl(y-y^k \bigr) + \mu \bigl(y^k - Y_k^2 y^{-1}\bigr)\bigr] = 0. $$ (29)

Step 3. Update the Lagrange multiplier
$$ \lambda^{k+1} := \lambda^k - H \bigl(A x^{k+1} + B y^{k+1} - b\bigr). $$ (30)

Step 4. Set \(w^{k+1}=(x^{k+1},y^{k+1},\lambda^{k+1})\). If \(\|w^{k+1}-w^{k}\|\leq\varepsilon\), stop; otherwise set k:=k+1 and go to Step 1.
Remark 4.1
It follows from Lemma 3.1 that there exist unique \(x^{k+1}_{*} \in \mathbb{R}^{n}_{++}\) and \(y^{k+1}_{*} \in \mathbb{R}^{p}_{++}\) satisfying (27) and (29), respectively. This guarantees that we can find \(x^{k+1} \in \mathbb{R}^{n}_{++}\) and \(y^{k+1} \in \mathbb{R}^{p}_{++}\) satisfying (26) and (28), respectively.
4.2 Convergence
In this subsection, we prove the convergence of Algorithm 1. First, we define
and set
to simplify our notation in the following analysis.
To prove the convergence of Algorithm 1, we first present some lemmas.
Lemma 4.1
Let \(\{w^{k+1}_{*}\}\) be defined by (32) and {w k} be generated by Algorithm 1. Then, there exists a positive constant ρ such that
where G is defined by (12).
Proof
It follows from (30) and (31) that
Together with (26), (28), and (12), the above equation implies (33) immediately. □
Lemma 4.2
Let the sequence {w k} be generated by Algorithm 1, and the accompanying sequences \(\{w^{k+1}_{*}\}\) and \(\{\bar{w}^{k}\}\) be defined by (32). Then, for any \(w:=(x, y, \lambda) \in \mathcal{W}\), we have
where G is defined by (12).
Proof
Note that \(\bar{x}^{k} = x^{k+1}_{*}\) and \(\bar{y}^{k} = y^{k+1}_{*}\). Applying Lemma 3.1 to (27) by setting P=R, \(\bar{u} = x^{k}\), \(u = \bar{x}^{k} = x^{k+1}_{*}\), \(q(u) = f(\bar{x}^{k}) - A^{T}[\lambda^{k} - H(A\bar{x}^{k} + By^{k} -b)] {\stackrel{\small (32)}{=}} f(\bar{x}^{k}) - A^{T} \bar{\lambda}^{k}\) and v=x in (15), we have
Similarly, applying Lemma 3.1 to (29) and using (32), we get
It follows from the above inequality that
Setting \(\hat{\lambda}^{k+1} = \lambda^{k+1}_{*}\) in Lemma 3.2 and using \(\bar{y}^{k} = y^{k+1}_{*}\), we have
Combining (35), (36) and the above equation together and by simple manipulations, we can get (34) immediately. The proof is completed. □
The following result shows the contraction of the sequence generated by Algorithm 1, based on which the convergence of Algorithm 1 can be established easily.
Lemma 4.3
Let the sequence {w k} be generated by Algorithm 1. Then {w k} is bounded, i.e., for any \(w^{*} \in \mathcal{W}^{*}\), there are positive constants \(C_{w^{*}}\) and ρ, such that
and
where G is defined by (12).
Proof
Setting w=w ∗ in (34), we get
On the other hand, since Q is monotone, \(\bar{w}^{k} \in \mathcal{W}\), and \(w^{*} \in \mathcal{W}^{*}\), we have
From the above two inequalities and using μ∈]0,1[, we obtain
It follows from (37) and Lemma 4.1 that there is a positive constant ρ such that
Consequently, for any k≥0 we get
Therefore, the sequence {w k} generated by Algorithm 1 is bounded. It follows from (33), (37) and (38) that
The proof is completed. □
Now, we are ready to prove the convergence of Algorithm 1.
Theorem 4.1
The sequence {w k} generated by Algorithm 1 converges to some w ∞ which is a solution of SVI \((\mathcal{W},Q)\).
Proof
From \(\sum_{k=0}^{\infty}\nu_{k} < + \infty\) and ν k ≥0, it follows that
Setting \(\eta_{k} = 2 \rho C_{w^{*}} \nu_{k} + \rho^{2}\nu_{k}^{2}\), c 0=1−μ in (20), from Lemma 4.3 and Theorem 3.1 we have
Then it follows from (12), (31), and (32) that
From (34) and the above two formulas, we get
The convergence of Algorithm 1 is then obtained immediately from Theorem 3.1. □
4.3 Convergence Rate
Now, we show the worst-case O(1/t) convergence rate for Algorithm 1.
Theorem 4.2
For any integer t>0, let \(\bar{w}_{t} \in \mathcal{W}\) be a convex combination of the iterates \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) defined by (32). Then, for any \(w \in \mathcal{W}\), we have
where \(\bar{w}_{t} := (\sum_{k =0}^{t} \bar{w}^{k})/(t+1)\).
Proof
From (34), we have
It follows from (33) that
Since Q is monotone, from the above two inequalities, we have
Summing the inequality (41) over k=0,1,…,t, we obtain
Since \(\sum_{k =0}^{t} 1/(t+1) = 1\), \(\bar{w}_{t}\) is a convex combination of \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) and thus \(\bar{w}_{t} \in \mathcal{W}\). Using the notation of \(\bar{w}_{t}\), we derive
The assertion (40) follows from the above inequality immediately. □
It follows from Lemma 4.3 that the sequence {w k} generated by Algorithm 1 is bounded. According to (39), the sequence \(\{\bar{w}^{k}\}\) defined by (32) is also bounded. Therefore, there exists a constant D>0 such that
Recall that \(\bar{w}_{t}\) is the average of \(\{\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\}\). Thus, we have \(\|\bar{w}_{t}\|_{G} \le D\). Denote \(E_{1}:= \sum_{k=0}^{\infty}\nu_{k} < + \infty\). For any \(w \in \mathcal{B}_{\mathcal{W}}(\bar{w}_{t}): = \{ w \in \mathcal{W}\; | \; \|w - \bar{w}_{t}\|_{G} \le 1 \}\), we get
Thus, for any given ε>0, after at most \(t := \lceil \frac{(2D+1)(2D + 1 + 2\rho E_{1})}{2 \varepsilon} - 1 \rceil \) iterations, we have
which means \({\bar{w}}_{t}\) is an approximate solution of SVI\((\mathcal{W},Q)\) with an accuracy of O(1/t). That is, a worst-case O(1/t) convergence rate of Algorithm 1 in the ergodic sense is established.
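For concreteness, the iteration bound just derived can be evaluated numerically. A small sketch under assumed values of D, ρ, and E₁ (purely illustrative constants, not from the paper):

```python
import math

def iteration_bound(D, rho, E1, eps):
    """Worst-case iteration count t = ceil((2D+1)(2D+1+2*rho*E1)/(2*eps) - 1)
    guaranteeing an eps-approximate solution in the ergodic sense."""
    return math.ceil((2 * D + 1) * (2 * D + 1 + 2 * rho * E1) / (2 * eps) - 1)
```

Halving ε essentially doubles the required t, which is exactly the O(1/t) behavior.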
5 The Second Algorithm
As we have mentioned, the inexactness criterion in Algorithm 1 (i.e., (26) and (28)) is not directly applicable due to the unavailability of \(x_{*}^{k+1}\) and \(y_{*}^{k+1}\). Now we propose a new inexactness criterion that does not involve these exact solutions and embed it into the subproblems (8) and (9). A new implementable ADMM with LQP regularization is thus proposed. Note that the new inexactness criterion allows the relative error of solving the subproblems (8) and (9) to be fixed as a constant.
5.1 Algorithm 2
To ensure the convergence when the relative error of solving the subproblems (8) and (9) is fixed as a constant, the approximate solutions of (8) and (9) should be corrected to generate a new iterate (x k+1,y k+1,λ k+1). In this algorithm, we thus use \((\tilde{x}^{k}, \tilde{y}^{k},\tilde{\lambda}^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) to denote the approximate solutions of (8)–(10). Below, for notational simplicity we choose R:=rI and S:=sI with r,s>0 as the proximal matrices for this algorithm.
Algorithm 2
(I-ADMM-LQP 2)
Step 0. Let ε>0; μ,ν,σ∈]0,1[; γ∈]0,2[; r,s>0; \(w^{0}:=(x^{0}, y^{0}, \lambda^{0}) \in {\mathbb{R}^{n}_{++}} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) and H be a positive definite diagonal matrix. Set k=0.

Step 1. Find \(\tilde{x}^{k} \in \mathbb{R}^{n}_{++}\) and \(\xi_{x}^{k} \in \mathbb{R}^{n}\) satisfying (42), where \(\xi_{x}^{k}\) satisfies the following inexactness criterion:
$$ \big\|\xi_x^k\big\| \le \nu r (1 - \mu) \big\|x^k-\tilde{x}^k\big\|. $$ (43)

Step 2. Find \(\tilde{y}^{k} \in \mathbb{R}^{p}_{++}\) and \(\xi_{y}^{k} \in \mathbb{R}^{p}\) satisfying (44), where \(\xi_{y}^{k}\) satisfies the following inexactness criterion:
$$ \big\|\xi_y^k\big\| \le \nu s (1 - \mu) \big\|y^k-\tilde{y}^k\big\|. $$ (45)

Step 3. Update \(\tilde{\lambda}^{k}\) via
$$ \tilde{\lambda}^k :=\lambda^k - H \bigl(A\tilde{x}^k+B\tilde{y}^k-b\bigr). $$ (46)

Step 4. Set \(\tilde{w}^{k} := (\tilde{x}^{k}, \tilde{y}^{k}, \tilde{\lambda}^{k})\), and then generate the new iterate w k+1 by
$$ w^{k+1}:= (1 - \sigma) w^k + \sigma P_{\mathcal{W}, G }\bigl[w^k - \alpha_k d \bigl(w^k,\tilde{w}^k,\xi^k\bigr)\bigr] $$ (47)
with the step-size
$$ \alpha_k := \gamma \alpha_k^*, $$ (48)
where \(\alpha_k^*\), \(d(w^k,\tilde{w}^k,\xi^k)\) and \(\varphi(w^k,\tilde{w}^k,\xi^k)\) are given by (49)–(50), and G is defined by (12).

Step 5. If \(\|w^{k+1}-w^{k}\|\leq\varepsilon\), stop; otherwise set k:=k+1 and go to Step 1.
Remark 5.1
Note that Lemma 3.1 guarantees that there exists a unique solution \(\tilde{x}^{k} \in \mathbb{R}^{n}_{++}\) for a given \(\xi_{x}^{k} \in \mathbb{R}^{n}\) and a unique solution \(\tilde{y}^{k} \in \mathbb{R}^{p}_{++}\) for a given \(\xi_{y}^{k} \in \mathbb{R}^{p}\). To implement Algorithm 2, it is easy to determine the error terms \(\xi_{x}^{k}\) and \(\xi_{y}^{k}\) subject to the inexactness criteria (43) and (45). For example, in practice within finite iterations, we can choose
Then, the system of equations (42) reduces to
whose solution can be given explicitly by
with
It is easy to verify that \({\tilde{x}}^{k} \in \mathbb{R}^{n}_{++}\) whenever \(x^{k} \in \mathbb{R}^{n}_{++}\). Similarly, we can choose
Then the positive solution of (44) can be obtained explicitly by
with
Furthermore, \({\tilde{y}}^{k} \in \mathbb{R}^{p}_{++}\) whenever \(y^{k} \in \mathbb{R}^{p}_{++}\).
Remark 5.2
The inexactness criteria (43) and (45) can be met with proper values of r and s, respectively. For example, when both f and g in (2) are Lipschitz continuous, we can choose \(\xi_{x}^{k}\) as (51) and
where L f is f’s Lipschitz constant. Then, it follows that
which means that the inexactness criterion (43) is satisfied. Similarly, we can choose \(\xi_{y}^{k}\) as (53) and
where L g is g’s Lipschitz constant. Then, we have
and thus the inexactness criterion (45) is satisfied.
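For instance, if (51) is read as the natural choice \(\xi_{x}^{k} := f(x^{k}) - f(\tilde{x}^{k})\) (an assumption consistent with Remark 5.1, under which (42) becomes explicitly solvable), then taking \(r \ge L_{f}/(\nu(1-\mu))\) gives
$$ \big\|\xi_{x}^{k}\big\| = \big\|f(x^{k}) - f(\tilde{x}^{k})\big\| \le L_{f}\big\|x^{k} - \tilde{x}^{k}\big\| \le \nu r (1-\mu)\big\|x^{k} - \tilde{x}^{k}\big\|, $$
so that (43) holds; the same argument with \(s \ge L_{g}/(\nu(1-\mu))\) yields (45).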
Remark 5.3
As we have mentioned, since the ADMM subproblems with LQP regularization are allowed to be solved subject to a constant accuracy, Algorithm 2 requires a correction step to further correct the approximate solutions so as to ensure convergence. More specifically, the additional correction required by Algorithm 2 in (47) can be specified as
where \(M:=(1+\mu)S+B^{T}HB\). This additional computation is simple and computationally inexpensive.
5.2 Convergence
Recall that \(w^{k}:=(x^{k},y^{k},\lambda^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) is a given vector and \(\tilde{w}^{k}:=(\tilde{x}^{k}, \tilde{y}^{k},\tilde{\lambda}^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) is generated by Algorithm 2, where w k, \(\tilde{w}^{k}\), and ξ k satisfy (43) and (45). In this subsection, we prove the convergence of Algorithm 2.
For convenience of the following analysis, we define
Lemma 5.1
Let the sequence {w k} be generated by Algorithm 2, and the accompanying sequence \(\{\bar{w}^{k}\}\) be defined by (56). Then, for any \(w:=(x, y, \lambda) \in \mathcal{W}\), we have
where \(\varphi(w^{k}, \tilde{w}^{k}, \xi^{k})\) is defined by (50).
Proof
Recall that \(\bar{x}^{k} = \tilde{x}^{k}\) and \(\bar{y}^{k} = \tilde{y}^{k}\). It follows from (42), (44) and (56) that
Applying Lemma 3.1 to the above equations, respectively, the inequality (16) can be written as
Thus we have
Note that
Combining the above three formulas together and using (49) and the notation of G and ξ k, we get
For any \(w \in \mathcal{W}\), we have
It follows from (60) that
Since \((\tilde{w}^{k} - \bar{w}^{k})^{T} G d(w^{k}, \tilde{w}^{k}, \xi^{k}) = (\lambda^{k} - \tilde{\lambda}^{k})^{T} (By^{k} - B{\tilde{y}^{k}})\) (see (12), (49) and (56)), it follows from the notation of \(\varphi(w^{k}, \tilde{w}^{k}, \xi^{k})\) that
Substituting (61) and (62) into (59), the assertion (57) is proved. □
Using the notation of M, the matrix G can be rewritten as
Proposition 5.1
For given \(w^{k}:=(x^{k}, y^{k}, \lambda^{k}) \in {\mathbb{R}^{n}_{^{++}}} \times \mathbb{R}^{p}_{^{++}} \times \mathbb{R}^{m}\), let \(\tilde{w}^{k} :=(\tilde{x}^{k}, \tilde{y}^{k}, \tilde{\lambda}^{k})\) be generated by (42)–(46). Then we have
Proof
It follows from (49), (50) and the notation of G and x k that
Using \(\bar{\lambda}^{k} = \tilde{\lambda}^{k} - H(By^{k} - B\tilde{y}^{k})\) (see (46) and (56)), we have
Substituting this into (64), it follows that
Since R:=rI, it follows from (43) that
Using the Cauchy–Schwarz inequality and (66), we have
From the above inequality, we obtain
Using (66), we have
It follows from the above inequality that
Substituting this into (67), we obtain
Similarly, using (45), we obtain
Therefore, it follows from (65) and the above two inequalities that
Using the notation of M, we get
and thus
By a simple manipulation, we have
Substituting (71) into (69) and using the notation of \(d(w^{k}, \tilde{w}^{k}, \xi^{k})\) and G, the assertion of this proposition is proved. □
The following corollary follows from the definition of \(\alpha^{k}_{*}\) and (63) directly, and we omit its proof.
Corollary 5.1
The step-size \(\alpha_{k}^{*}\) defined in Step 4 of Algorithm 2 satisfies \(\alpha_{k}^{*}\ge 1/4\) for all k≥0.
Next, we will show the contraction of the sequence generated by Algorithm 2, based on which the convergence of Algorithm 2 can be established easily.
Corollary 5.2
For any \(w^{*} \in \mathcal{W}^{*}\), there is a positive constant c 1 such that the sequence {w k} generated by Algorithm 2 satisfies
Proof
Since \(w^{*} \in \mathcal{W}^{*}\), \(\bar{w}^{k} \in \mathcal{W}\) and Q is monotone, we have
Then, setting w=w ∗ in (57), we have
It follows from the above equation and Corollary 5.1 that
Using (65), we have
It follows from (43) and (45) that
Setting c 1:=σγ(2−γ)(1−μ)(1−ν)/8, the assertion (72) is proved from the above three inequalities. □
Theorem 5.1
The sequence {w k} generated by Algorithm 2 converges to some w ∞ which is a solution of SVI \((\mathcal{W},Q)\).
Proof
Setting η k ≡0, c 0=c 1 in (20), from (72) and Theorem 3.1 we have
and {w k} is bounded. It follows from (73) and (63) that
From (58), we obtain
Together with (74) and (75), we get
The convergence of Algorithm 2 is then obtained immediately from Theorem 3.1. □
5.3 Convergence Rate
Now, we show the worst-case O(1/t) convergence rate for Algorithm 2.
Theorem 5.2
For any integer t>0, let \(\bar{w}_{t} \in \mathcal{W}\) be a convex combination of the iterates \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) defined in (56). Then, for any \(w \in \mathcal{W}\), we have
where
Proof
It follows from (57) and (63) that
Since Q is monotone, from the above inequality, we have
Summing the inequality (77) over k=0,1,…,t, we obtain
Since \((\sum_{k =0}^{t} \alpha_{k}^{*})/{\varUpsilon_{t}} = 1\), \(\bar{w}_{t}\) is a convex combination of \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) and thus \(\bar{w}_{t} \in \mathcal{W}\). Using the notation of ϒ t and \(\bar{w}_{t}\), we derive
The assertion (76) follows from the above inequality immediately. □
Using Theorem 5.2, a worst-case O(1/t) convergence rate of Algorithm 2 in the ergodic sense can be proved in the same way as for Algorithm 1.
6 Conclusions
This paper studies how to combine the alternating direction method of multipliers and the logarithmic-quadratic proximal method, allowing the resulting subproblems to be solved approximately subject to different inexactness criteria. We demonstrate how to develop inexact versions of ADMM with LQP regularization by applying the most popular and fundamental inexactness criteria in the literature. Embedding other inexactness criteria into the subproblems of ADMM with LQP regularization, and thus developing further algorithms of the same kind as those in this paper, deserves future research. We expect that certain correction steps are required if relaxed inexactness criteria are chosen for solving the resulting subproblems of the combination of ADMM and the LQP method.
References
Glowinski, R., Marrocco, A.: Sur l’approximation par éléments finis d’ordre un et la résolution par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 2, 41–76 (1975)
He, B.S., Liao, L.-Z., Han, D.R., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92(1), 103–118 (2002)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Fr. Autom. Inform. Rech. Opér. 126, 154–159 (1970)
Auslender, A., Teboulle, M.: Entropic proximal decomposition method for convex programs and variational inequalities. Math. Program. 91(1), 33–47 (2001)
Yuan, X.M., Li, M.: An LQP-based decomposition method for solving a class of variational inequalities. SIAM J. Optim. 21(4), 1309–1318 (2011)
Auslender, A., Teboulle, M., Ben-Tiba, S.: A logarithmic-quadratic proximal method for variational inequalities. Comput. Optim. Appl. 12, 31–40 (1999)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Tao, M., Yuan, X.M.: On the O(1/t) convergence rate of alternating direction method with logarithmic-quadratic proximal regularization. SIAM J. Optim. 22(4), 1431–1448 (2012)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York (2003)
He, B.S., Yuan, X.M.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, pp. 1–30 (2007)
He, B.S., Liao, L.-Z., Yuan, X.M.: A LQP-based interior prediction-correction method for nonlinear complementarity problems. J. Comput. Math. 24(1), 33–44 (2006)
Yamashita, N., Fukushima, M.: Equivalent unconstrained minimization and global error bounds for variational inequality problems. SIAM J. Control Optim. 35(1), 273–284 (1997)
Acknowledgements
The first author was supported by National Natural Science Foundation of China grant 11001053, Program for New Century Excellent Talents in University grant NCET-12-0111 and Natural Science Foundation of Jiangsu Province grant BK2012662. The second author was supported in part by Hong Kong General Research Fund grants HKBU 201409 and HKBU 201611. The third author was supported by the grant FRG2/11-12/120 from Hong Kong Baptist University and the General Research Fund HKBU 203311 from Hong Kong Research Grants Council.
Communicated by Masao Fukushima.
Li, M., Liao, LZ. & Yuan, X. Inexact Alternating Direction Methods of Multipliers with Logarithmic–Quadratic Proximal Regularization. J Optim Theory Appl 159, 412–436 (2013). https://doi.org/10.1007/s10957-013-0334-4