Abstract
In the literature, it was shown recently that the Douglas–Rachford alternating direction method of multipliers can be combined with the logarithmic-quadratic proximal regularization for solving a class of variational inequalities with separable structures. This paper studies the inexact version of this combination, where the resulting subproblems are allowed to be solved approximately subject to different inexactness criteria. We prove the global convergence and establish worst-case convergence rates for the derived inexact algorithms.
1 Introduction
Variational inequality problems (VIPs) with separable structures and linear constraints arise in a wide range of applications. For solving these VIPs, the Douglas–Rachford alternating direction method of multipliers (ADMM), proposed originally in [1], is a benchmark. The proximal point algorithm (PPA) in [2, 3] was suggested for regularizing ADMM’s subproblems, and accordingly an algorithmic framework for an inexact version of ADMM with proximal regularization was established. Moreover, in [4, 5] it was suggested to regularize the ADMM subproblems by the logarithmic-quadratic proximal (LQP) regularization proposed in [6]; the complementarity subproblems of ADMM then reduce to systems of nonlinear equations. In general, however, it is not possible to acquire the exact solutions of these subproblems. Therefore, we are interested in investigating how to solve them approximately. Our purpose is to study the inexact version of the combination of ADMM with LQP regularization, seeking appropriate inexactness criteria under which this inexact version can still be guaranteed to converge. We shall study several inexactness criteria with different levels of implementability. For the resulting inexact algorithms, we prove global convergence and establish worst-case O(1/t) convergence rates.
The rest of this paper is organized as follows. We first state the model to be studied and the motivation of considering the inexact version of ADMM with LQP regularization in Sect. 2. Then, in Sect. 3, we summarize some preliminaries for further analysis. After that, a conceptual inexact version of the ADMM with LQP regularization is presented in Sect. 4; an implementable algorithm based on the inexact version of ADMM with LQP regularization is proposed in Sect. 5. For both the conceptual and the implementable algorithms, the global convergence and a worst-case convergence rate are established. Finally, some concluding remarks are drawn in Sect. 6.
2 Model and Motivation
More specifically, the VIP we consider is to find a \(u^{*} \in \mathcal{U}\) such that
$$ \bigl(u - u^{*}\bigr)^{T} F\bigl(u^{*}\bigr) \ge 0, \quad \forall\, u \in \mathcal{U}, $$ (1)
with
$$ u := \begin{pmatrix} x \\ y \end{pmatrix}, \qquad F(u) := \begin{pmatrix} f(x) \\ g(y) \end{pmatrix}, \qquad \mathcal{U} := \bigl\{ (x,y) \;\big|\; x \in \mathbb{R}^{n}_{+},\; y \in \mathbb{R}^{p}_{+},\; Ax + By = b \bigr\}, $$ (2)
where \(A\in \mathbb{R}^{m \times n}\) and \(B\in \mathbb{R}^{m \times p}\) are given matrices; \(b\in \mathbb{R}^{m}\) is a given vector; \(f: \mathbb{R}^{n}_{+} \rightarrow \mathbb{R}^{n}\) and \(g: \mathbb{R}^{p}_{+} \rightarrow \mathbb{R}^{p}\) are continuous and monotone operators.
By attaching a Lagrange multiplier \(\lambda\in \mathbb{R}^{m}\) to the linear constraint in (2), we can reformulate (1)–(2) as: finding a \(w^{*} \in \mathcal{W}\) such that
$$ \bigl(w - w^{*}\bigr)^{T} Q\bigl(w^{*}\bigr) \ge 0, \quad \forall\, w \in \mathcal{W}, $$ (3)
where
$$ w := \begin{pmatrix} x \\ y \\ \lambda \end{pmatrix}, \qquad Q(w) := \begin{pmatrix} f(x) - A^{T}\lambda \\ g(y) - B^{T}\lambda \\ Ax + By - b \end{pmatrix}, \qquad \mathcal{W} := \mathbb{R}^{n}_{+} \times \mathbb{R}^{p}_{+} \times \mathbb{R}^{m}. $$ (4)
We denote by SVI\((\mathcal{W},Q)\) the structured VIP (3)–(4); its solution set, denoted by \(\mathcal{W}^{*}\), is assumed to be nonempty.
The ADMM in [1] for solving SVI\((\mathcal{W},Q)\) is
$$ x^{k+1} \in \mathbb{R}^{n}_{+}, \quad \bigl(x - x^{k+1}\bigr)^{T}\bigl\{ f\bigl(x^{k+1}\bigr) - A^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k} - b\bigr)\bigr]\bigr\} \ge 0, \quad \forall\, x \in \mathbb{R}^{n}_{+}, $$ (5)
$$ y^{k+1} \in \mathbb{R}^{p}_{+}, \quad \bigl(y - y^{k+1}\bigr)^{T}\bigl\{ g\bigl(y^{k+1}\bigr) - B^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr)\bigr]\bigr\} \ge 0, \quad \forall\, y \in \mathbb{R}^{p}_{+}, $$ (6)
$$ \lambda^{k+1} := \lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr), $$ (7)
where \(H \in \mathbb{R}^{m \times m}\) is a positive definite penalty matrix. In the literature, it is often chosen as \(H:=\beta\cdot I\), where I is the identity matrix in \(\mathbb{R}^{m \times m}\) and β>0 is a scalar.
To truly implement the ADMM scheme (5)–(7), the resulting complementarity problems should be solved efficiently. In fact, for some particular cases where f, g, A, and B are all special enough, the resulting subproblems of ADMM may have closed-form solutions or can be solved easily to high precision. This feature has triggered widespread application of ADMM in the literature to problems ranging from semidefinite programming, image processing, statistical learning, control, and computer vision to numerical linear algebra; see, e.g., [7] and references therein. On the other hand, and equally importantly, for generic settings of f, g, A, and B, we can only expect to obtain approximate solutions of these subproblems, subject to certain accuracy, by applying some iterative scheme internally. In this case, it is important to analyze under which criteria these subproblems can be solved approximately and how to find truly implementable inexactness criteria. In [4, 5] it was suggested to regularize the ADMM subproblems by the LQP regularization proposed in [6]. The LQP regularization forces the solutions of the ADMM subproblems to be interior points of \(\mathbb{R}^{n}_{+}\) and \(\mathbb{R}^{p}_{+}\), respectively; thus the complementarity subproblems (5) and (6) reduce to systems of nonlinear equations. More specifically, the iterative scheme of ADMM with LQP regularization is as follows:
$$ f\bigl(x^{k+1}\bigr) - A^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k} - b\bigr)\bigr] + R\bigl[\bigl(x^{k+1} - x^{k}\bigr) + \mu\bigl(x^{k} - X_{k}^{2}\bigl(x^{k+1}\bigr)^{-1}\bigr)\bigr] = 0, $$ (8)
$$ g\bigl(y^{k+1}\bigr) - B^{T}\bigl[\lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr)\bigr] + S\bigl[\bigl(y^{k+1} - y^{k}\bigr) + \mu\bigl(y^{k} - Y_{k}^{2}\bigl(y^{k+1}\bigr)^{-1}\bigr)\bigr] = 0, $$ (9)
$$ \lambda^{k+1} := \lambda^{k} - H\bigl(Ax^{k+1} + By^{k+1} - b\bigr), $$ (10)
where \(R\in \mathbb{R}^{n \times n}\) and \(S\in \mathbb{R}^{p \times p}\) are symmetric positive definite diagonal matrices, called proximal matrices; \(R(x^{k+1}-x^{k})\) and \(S(y^{k+1}-y^{k})\) are quadratic proximal regularization terms; μ∈]0,1[ is a given constant; \(X_{k} := \operatorname{diag}(x^{k}_{1}, x^{k}_{2}, \ldots, x_{n}^{k})\) and \((x^{k+1})^{-1}\in \mathbb{R}^{n}\) is the vector whose jth element is \(1/x_{j}^{k+1}\); \(Y_{k} := \operatorname{diag}(y^{k}_{1}, y^{k}_{2}, \ldots, y_{p}^{k})\) and \((y^{k+1})^{-1} \in \mathbb{R}^{p}\) is the vector whose jth element is \(1/y_{j}^{k+1}\). In [8], this scheme was further shown to converge at a worst-case O(1/t) rate, provided that both the x- and the y-subproblems (i.e., (8) and (9)) are solved exactly. Here, the worst-case O(1/t) convergence rate of ADMM with LQP regularization means that the scheme (8)–(10) achieves a solution of SVI\((\mathcal{W},Q)\) with an accuracy of O(1/t) after at most t iterations.
To solve the complementarity subproblems (8) and (9), we shall study several inexactness criteria with different levels of implementability. When the accuracy of solving these subproblems is relaxed, it is easy to understand that some additional correction might need to be supplemented after each iteration of ADMM with LQP regularization. In other words, relaxing the accuracy of solving these subproblems might come at the price of requiring an additional correction of the subproblems’ solutions. We thus suggest solving these subproblems either under a more restrictive criterion but without a further correction step, or under a more relaxed criterion but with a further correction step. How to balance the accuracy of the inner subproblems against the need for correction depends on how special a concrete application of the abstract model (3)–(4) is, and it is not meaningful to discuss this issue for the abstract model (3)–(4) without any specification of the involved functions and sets. Instead, in this paper, we provide some inexact algorithms based on the framework of combining ADMM with LQP regularization. In addition to global convergence, a worst-case O(1/t) convergence rate is established for these inexact algorithms. Therefore, we show that these inexact algorithms, which are more practical than the exact version (8)–(10), enjoy the same worst-case convergence rate as the exact version.
3 Preliminaries
In this section, we recall some basic definitions and properties, which will be frequently used in our later analysis. Some useful results proved already in the literature are also summarized.
3.1 Some Basic Definitions and Properties
Let N be a positive definite matrix with appropriate dimensionality. The N-norm of a vector v with appropriate dimensionality is denoted by \(\|v\|_{N}:= \sqrt{v^{T}Nv}\). Let Ω be a nonempty, closed, and convex subset of \(\mathbb{R}^{l}\). The projection operator under the N-norm is defined by
$$ P_{\Omega, N}(v) := \operatorname{argmin}\bigl\{ \|u - v\|_{N} \;\big|\; u \in \Omega \bigr\}, \quad \forall\, v \in \mathbb{R}^{l}. $$
Clearly, according to this definition, we have the following property related to the projection operator under the N-norm:
$$ \bigl(v - P_{\Omega, N}(v)\bigr)^{T} N \bigl(u - P_{\Omega, N}(v)\bigr) \le 0, \quad \forall\, u \in \Omega,\; \forall\, v \in \mathbb{R}^{l}. $$
In the following, we give the definitions of monotonicity and Lipschitz continuity for a mapping.
Definition 3.1
Let \(\mathcal{X} \subseteq \mathbb{R}^{n}\) and \(f:\mathcal{X} \rightarrow \mathbb{R}^{n}\) be a mapping. Then,
(i) f is said to be monotone with respect to \(\mathcal{X}\) iff the following inequality always holds:
$$ (x - \tilde{x})^T\bigl(f(x) - f(\tilde{x})\bigr)\geq 0,\quad \forall \, x, \tilde{x} \in \mathcal{X}; $$
(ii) f is Lipschitz continuous with respect to \(\mathcal{X}\) iff there exists a constant (called the Lipschitz constant) \(L_f > 0\) such that
$$ \big\| f(x) - f(\tilde{x}) \big\| \le L_f \| x - \tilde{x} \|,\quad \forall \, x, \tilde{x} \in \mathcal{X}. $$
We denote by L f and L g the Lipschitz constants of the mappings f and g in the model (1)–(2). Note that the mapping Q(w) in (4) is monotone with respect to \(\mathcal{W}\) under the monotonicity assumption of f and g. Therefore, the solution set \(\mathcal{W}^{*}\) of SVI\((\mathcal{W},Q)\) is closed and convex; see, e.g., [9].
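The monotonicity of Q is immediate because its affine part is skew-symmetric and therefore cancels in the monotonicity inner product: for any \(w, \tilde{w} \in \mathcal{W}\),
$$ (w - \tilde{w})^{T}\bigl(Q(w) - Q(\tilde{w})\bigr) = (x - \tilde{x})^{T}\bigl(f(x) - f(\tilde{x})\bigr) + (y - \tilde{y})^{T}\bigl(g(y) - g(\tilde{y})\bigr) \ge 0, $$
since the cross terms \(-(x-\tilde{x})^{T}A^{T}(\lambda - \tilde{\lambda})\), \(-(y-\tilde{y})^{T}B^{T}(\lambda - \tilde{\lambda})\), and \((\lambda - \tilde{\lambda})^{T}[A(x-\tilde{x}) + B(y-\tilde{y})]\) sum to zero.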
Throughout the rest of the paper, we define the matrix G as
Last, we recall a characterization of the solution set \(\mathcal{W}^{*}\) proved in [9]; see (2.3.2) on p. 159 of [9]. As shown in [8, 10], this characterization is crucial for establishing the convergence rate of ADMM or ADMM with LQP regularization. More specifically, recall that \(\mathcal{W}^{*}\) can be characterized as
$$ \mathcal{W}^{*} = \bigcap_{w \in \mathcal{W}} \bigl\{ \bar{w} \in \mathcal{W} \;\big|\; (w - \bar{w})^{T} Q(w) \ge 0 \bigr\}. $$
Therefore, as in Definition 1 of [11], \(\bar{w} \in \mathcal{W}\) can be regarded as an ε-approximate solution of SVI\((\mathcal{W},Q)\) if it satisfies
$$ \sup_{w \in \mathcal{B}_{\mathcal{W}}(\bar{w})} \bigl\{ (\bar{w} - w)^{T} Q(w) \bigr\} \le \varepsilon, \quad \text{where } \mathcal{B}_{\mathcal{W}}(\bar{w}) := \bigl\{ w \in \mathcal{W} \;\big|\; \|w - \bar{w}\| \le 1 \bigr\}. $$
In our later analysis, we shall establish the worst-case O(1/t) convergence rate for the new algorithms to be proposed in the sense that after t iterations of these algorithms, we can find a \(\bar{w} \in \mathcal{W}\) such that
$$ \sup_{w \in \mathcal{B}_{\mathcal{W}}(\bar{w})} \bigl\{ (\bar{w} - w)^{T} Q(w) \bigr\} \le \varepsilon $$
with ε=O(1/t).
3.2 A Key Lemma
Inspired by Proposition 1 in [6], the following lemma was proved in the literature (see, e.g., [5, 12]), and its conclusion plays an important role in analyzing the convergence of ADMM with LQP regularization and establishing its convergence rate. Our later analysis is also based on this conclusion.
Lemma 3.1
Let \(P := \operatorname{diag}(p_{1}, p_{2},\ldots, p_{t}) \in \mathbb{R}^{t \times t}\) be a positive definite diagonal matrix, \(q(u) \in \mathbb{R}^{t}\) be a monotone mapping of u with respect to \(\mathbb{R}^{t}_{+}\), and μ∈]0,1[. For a given \({\bar{u}} \in \mathbb{R}^{t}_{++}\), we define \({\bar{U}}:= \operatorname{diag}({\bar{u}}_{1},{\bar{u}}_{2}, \ldots, {\bar{u}}_{t})\). Then, the equation
$$ q(u) + P\bigl[(u - \bar{u}) + \mu\bigl(\bar{u} - \bar{U}^{2} u^{-1}\bigr)\bigr] = 0 $$ (14)
has the unique positive solution u. In addition, for this positive solution \(u \in \mathbb{R}^{t}_{++}\) and any \(v \in \mathbb{R}^{t}_{+} \), we have
Moreover, we have
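When the term q is a constant vector (as in the applications below, where it collects quantities evaluated at known points), equation (14) decouples into independent scalar quadratics, and the unique positive root of each is available in closed form. A minimal sketch in Python, on hypothetical data (the function name and inputs are illustrative, not from the paper):

```python
import math

def lqp_positive_root(q, p, u_bar, mu):
    """Solve q_j + p_j[(u_j - ubar_j) + mu(ubar_j - ubar_j^2 / u_j)] = 0
    componentwise. Multiplying by u_j gives
        p_j u_j^2 + (q_j - p_j(1-mu) ubar_j) u_j - p_j mu ubar_j^2 = 0,
    whose constant term is negative, so exactly one root is positive."""
    u = []
    for qj, pj, ubj in zip(q, p, u_bar):
        bj = qj - pj * (1.0 - mu) * ubj   # linear coefficient
        cj = -pj * mu * ubj * ubj         # constant term (< 0)
        u.append((-bj + math.sqrt(bj * bj - 4.0 * pj * cj)) / (2.0 * pj))
    return u
```

Plugging the returned root back into the scalar equation recovers a zero residual up to floating-point accuracy, which also illustrates why the LQP term keeps every iterate strictly positive.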
3.3 Proof Framework of Convergence
Later, we shall propose several algorithms, which differ in the inexactness criteria for solving the ADMM subproblems with LQP regularization. The convergence proofs for these different algorithms share a common framework, even though the specific techniques required in their respective proofs are quite different. For succinctness, in this subsection we first present the framework of the convergence proof before the algorithms themselves are given. Then, when we establish the convergence of a specific algorithm later, it suffices to check the conditions presented in this framework. The proof framework presented below is thus a unified treatment of the convergence analysis for the different algorithms.
We first establish an identity which will be used later in the convergence analysis. The proof of this lemma requires only straightforward manipulation.
Lemma 3.2
Let \({\bar{x}}^{k} \in \mathbb{R}^{n}\), \(y^{k}, {\bar{y}}^{k} \in \mathbb{R}^{p}\) and \(\lambda^{k} \in \mathbb{R}^{m}\) be given. We define \(\bar{\lambda}^{k}\) and \(\hat{\lambda}^{k+1}\) as
Then, for any \(y \in \mathbb{R}^{p}\) and \(\lambda \in \mathbb{R}^{m}\), we have
Proof
Using the definition of \(\hat{\lambda}^{k+1}\), we have
For the first term in (17), we have the following identity:
Then, it follows from the definitions of \(\bar{\lambda}^{k}\) and \(\hat{\lambda}^{k+1}\) that
and thus
Substituting this and (18) into (17), we have
Note that
Adding the above equality and (19), the assertion is proved. □
Next, we present a unified framework for proving the convergence of the new algorithms.
Theorem 3.1
Let c 0∈]0,+∞[ and μ∈]0,1[ be positive constants; {η k } be a nonnegative sequence with \(\sum_{k=0}^{\infty} \eta_{k} <+\infty\); and G be defined by (12). For any \(w^{*} \in \mathcal{W}^{*}\), if the sequences {w k=(x k,y k,λ k)} and \(\{\bar{w}^{k} = (\bar{x}^{k}, \bar{y}^{k}, \bar{\lambda}^{k})\}\) satisfy
then {w k} is bounded and
Furthermore, if the mapping Q is continuous and
then the sequence {w k} converges to a point in \(\mathcal{W}^{*}\).
Proof
It follows from (20) that for any l≤k and \(w^{*} \in \mathcal{W}^{*}\), we have
Thus the sequence {w k} is bounded, since \(\sum_{i=0}^{\infty} \eta_{i} <+\infty\). Summing the inequality (20) over k=0,1,… , we get
Therefore, we have
It follows that
The first assertion (21) is proved.
Since {w k} is bounded and \(\lim_{k \rightarrow \infty} \|w^{k} - \bar{w}^{k}\|_{G} = 0\), the sequence \(\{\bar{w}^{k}\}\) is also bounded and thus has at least one cluster point. Let w ∞ be a cluster point of \(\{\bar{w}^{k}\}\), and let the subsequences \(\{\bar{w}^{k_{j}}\}\) and \(\{w^{k_{j}}\}\) both converge to w ∞. It follows from (22) that
and consequently
since the mapping Q is continuous. This means that w ∞ is a solution of SVI\((\mathcal{W},Q)\). Note that the inequality (23) is true for all solution points of SVI\((\mathcal{W},Q)\), hence we have
Since \(w^{k_{j}}\rightarrow w^{\infty}\ (j\rightarrow\infty)\) and \(\sum_{i=0}^{\infty} \eta_{i} < + \infty\), for any given ε>0, there exists a j 0>0 such that
Therefore, for any \(k\geq k_{j_{0}}\), it follows from (24) and (25) that
This implies that the sequence {w k} converges to a point w ∞ in \(\mathcal{W}^{*}\). □
According to Theorem 3.1, it suffices to check conditions (20) and (22) when we establish the convergence for the algorithms to be proposed. As the new algorithms are based on the combination of ADMM with LQP regularization, where the subproblems are solved inexactly, we abbreviate them as I-ADMM-LQP X, where “X” is an Arabic numeral indexing a concrete algorithm.
4 The First Algorithm
We propose the first algorithm based on the combination of ADMM with LQP regularization, where the subproblems are solved approximately. The inexactness criterion required by this algorithm involves the exact solutions of the resulting subproblems and thus is not directly applicable in implementation. We still present this algorithm, however, because its convergence analysis is quite simple and provides a sketch for the analyses of the other algorithms studied later. Through this simpler analysis, it becomes easier to understand the later analysis of the implementable algorithms. In addition, although the inexactness criterion itself is not directly checkable, it opens the possibility of seeking implementable criteria based on certain error bounds; see, e.g., [9, 13]. Thus, it deserves at least theoretical interest. Note that this type of criterion is widely used in the PPA and ADMM literature.
4.1 Algorithm 1
In this algorithm, we seek approximate solutions of the subproblems (8)–(9), with the accuracy controlled by a summable sequence of positive scalars.
Algorithm 1
(I-ADMM-LQP 1)
Step 0. Let ε>0; μ∈]0,1[; \(w^{0}:=(x^{0}, y^{0}, \lambda^{0}) \in {\mathbb{R}^{n}_{++}} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\); \(R \in \mathbb{R}^{n \times n}\), \(S \in \mathbb{R}^{p \times p}\) and \(H \in \mathbb{R}^{m \times m}\) be positive definite diagonal matrices; and {ν k } be a nonnegative sequence satisfying \(\sum_{k=0}^{\infty} \nu_{k} < + \infty\). Set k=0.

Step 1. Find \(x^{k+1} \in \mathbb{R}^{n}_{++}\) such that
$$ \big\|x^{k+1} - x_*^{k+1}\big\| \le \nu_k, $$ (26)
where \(x^{k+1}_{*}\) is the exact solution of the system
$$ f(x) - A^T \bigl[\lambda^k - H \bigl(A x + B y^k - b\bigr) \bigr] + R\bigl[\bigl(x-x^k \bigr) + \mu \bigl(x^k - X_k^2 x^{-1}\bigr)\bigr]= 0. $$ (27)

Step 2. Find \(y^{k+1} \in \mathbb{R}^{p}_{++}\) such that
$$ \big\|y^{k+1} - y^{k+1}_*\big\| \le \nu_k, $$ (28)
where \(y^{k+1}_{*}\) is the exact solution of the system
$$ g(y) - B^T \bigl[\lambda^k - H \bigl(A x^{k+1}_* + B y - b\bigr)\bigr] + S\bigl[\bigl(y-y^k \bigr) + \mu \bigl(y^k - Y_k^2 y^{-1}\bigr)\bigr] = 0. $$ (29)

Step 3. Update the Lagrange multiplier
$$ \lambda^{k+1} := \lambda^k - H \bigl(A x^{k+1} + B y^{k+1} - b\bigr). $$ (30)

Step 4. Set \(w^{k+1}=(x^{k+1},y^{k+1},\lambda^{k+1})\). If \(\|w^{k+1}-w^{k}\|\leq\varepsilon\), stop; otherwise set k:=k+1 and go to Step 1.
Remark 4.1
It follows from Lemma 3.1 that there exist unique \(x^{k+1}_{*} \in \mathbb{R}^{n}_{++}\) and \(y^{k+1}_{*} \in \mathbb{R}^{p}_{++}\) satisfying (27) and (29), respectively. This guarantees that we can find \(x^{k+1} \in \mathbb{R}^{n}_{++}\) and \(y^{k+1} \in \mathbb{R}^{p}_{++}\) satisfying (26) and (28), respectively.
4.2 Convergence
In this subsection, we prove the convergence of Algorithm 1. First, we define
and set
to simplify our notation in the following analysis.
To prove the convergence of Algorithm 1, we first present some lemmas.
Lemma 4.1
Let \(\{w^{k+1}_{*}\}\) be defined by (32) and {w k} be generated by Algorithm 1. Then, there exists a positive constant ρ such that
where G is defined by (12).
Proof
It follows from (30) and (31) that
Together with (26), (28), and (12), the above equation implies (33) immediately. □
Lemma 4.2
Let the sequence {w k} be generated by Algorithm 1, and the accompanying sequences \(\{w^{k+1}_{*}\}\) and \(\{\bar{w}^{k}\}\) be defined by (32). Then, for any \(w:=(x, y, \lambda) \in \mathcal{W}\), we have
where G is defined by (12).
Proof
Note that \(\bar{x}^{k} = x^{k+1}_{*}\) and \(\bar{y}^{k} = y^{k+1}_{*}\). Applying Lemma 3.1 to (27) by setting P=R, \(\bar{u} = x^{k}\), \(u = \bar{x}^{k} = x^{k+1}_{*}\), \(q(u) = f(\bar{x}^{k}) - A^{T}[\lambda^{k} - H(A\bar{x}^{k} + By^{k} -b)] {\stackrel{\small (32)}{=}} f(\bar{x}^{k}) - A^{T} \bar{\lambda}^{k}\) and v=x in (15), we have
Similarly, applying Lemma 3.1 to (29) and using (32), we get
It follows from the above inequality that
Setting \(\hat{\lambda}^{k+1} = \lambda^{k+1}_{*}\) in Lemma 3.2 and using \(\bar{y}^{k} = y^{k+1}_{*}\), we have
Combining (35), (36) and the above equation together and by simple manipulations, we can get (34) immediately. The proof is completed. □
The following result shows the contraction of the sequence generated by Algorithm 1, based on which the convergence of Algorithm 1 can be established easily.
Lemma 4.3
Let the sequence {w k} be generated by Algorithm 1. Then {w k} is bounded, i.e., for any \(w^{*} \in \mathcal{W}^{*}\), there are positive constants \(C_{w^{*}}\) and ρ, such that
and
where G is defined by (12).
Proof
Setting w=w ∗ in (34), we get
On the other hand, since Q is monotone, \(\bar{w}^{k} \in \mathcal{W}\), and \(w^{*} \in \mathcal{W}^{*}\), we have
From the above two inequalities and using μ∈]0,1[, we obtain
It follows from (37) and Lemma 4.1 that there is a positive constant ρ such that
Consequently, for any k≥0 we get
Therefore, the sequence {w k} generated by Algorithm 1 is bounded. It follows from (33), (37) and (38) that
The proof is completed. □
Now, we are ready to prove the convergence of Algorithm 1.
Theorem 4.1
The sequence {w k} generated by Algorithm 1 converges to some w ∞ which is a solution of SVI \((\mathcal{W},Q)\).
Proof
From \(\sum_{k=0}^{\infty}\nu_{k} < + \infty\) and ν k ≥0, it follows that
Setting \(\eta_{k} = 2 \rho C_{w^{*}} \nu_{k} + \rho^{2}\nu_{k}^{2}\), c 0=1−μ in (20), from Lemma 4.3 and Theorem 3.1 we have
Then it follows from (12), (31), and (32) that
From (34) and the above two formulas, we get
The convergence of Algorithm 1 is then obtained immediately from Theorem 3.1. □
4.3 Convergence Rate
Now, we show the worst-case O(1/t) convergence rate for Algorithm 1.
Theorem 4.2
For any integer t>0, let \(\bar{w}_{t} \in \mathcal{W}\) be a convex combination of the iterates \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) defined by (32). Then, for any \(w \in \mathcal{W}\), we have
where \(\bar{w}_{t} := (\sum_{k =0}^{t} \bar{w}^{k})/(t+1)\).
Proof
From (34), we have
It follows from (33) that
Since Q is monotone, from the above two inequalities, we have
Summing the inequality (41) over k=0,1,…,t, we obtain
Since \(\sum_{k =0}^{t} 1/(t+1) = 1\), \(\bar{w}_{t}\) is a convex combination of \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) and thus \(\bar{w}_{t} \in \mathcal{W}\). Using the notation of \(\bar{w}_{t}\), we derive
The assertion (40) follows from the above inequality immediately. □
It follows from Lemma 4.3 that the sequence {w k} generated by Algorithm 1 is bounded. According to (39), the sequence \(\{\bar{w}^{k}\}\) defined by (32) is also bounded. Therefore, there exists a constant D>0 such that
Recall that \(\bar{w}_{t}\) is the average of \(\{\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\}\). Thus, we have \(\|\bar{w}_{t}\|_{G} \le D\). Denote \(E_{1}:= \sum_{k=0}^{\infty}\nu_{k} < + \infty\). For any \(w \in \mathcal{B}_{\mathcal{W}}(\bar{w}_{t}): = \{ w \in \mathcal{W}\; | \; \|w - \bar{w}_{t}\|_{G} \le 1 \}\), we get
Thus, for any given ε>0, after at most \(t := \lceil \frac{(2D+1)(2D + 1 + 2\rho E_{1})}{2 \varepsilon} - 1 \rceil \) iterations, we have
which means \({\bar{w}}_{t}\) is an approximate solution of SVI\((\mathcal{W},Q)\) with an accuracy of O(1/t). That is, a worst-case O(1/t) convergence rate of Algorithm 1 in the ergodic sense is established.
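For concreteness, the iteration bound just derived can be evaluated numerically. A small sketch under assumed values of D, ρ, and E₁ (purely illustrative constants, not from the paper):

```python
import math

def iteration_bound(D, rho, E1, eps):
    """Worst-case iteration count t = ceil((2D+1)(2D+1+2*rho*E1)/(2*eps) - 1)
    guaranteeing an eps-approximate solution in the ergodic sense."""
    return math.ceil((2 * D + 1) * (2 * D + 1 + 2 * rho * E1) / (2 * eps) - 1)
```

Halving ε essentially doubles the required t, which is exactly the O(1/t) behavior.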
5 The Second Algorithm
As we have mentioned, the inexactness criterion in Algorithm 1 (i.e., (26) and (28)) is not directly applicable due to the unavailability of \(x_{*}^{k+1}\) and \(y_{*}^{k+1}\). Now we propose a new inexactness criterion that does not involve these exact solutions and embed it into the subproblems (8) and (9). A new implementable ADMM with LQP regularization is thus proposed. Note that the new inexactness criterion allows the relative error of solving the subproblems (8) and (9) to be fixed as a constant.
5.1 Algorithm 2
To ensure the convergence when the relative error of solving the subproblems (8) and (9) is fixed as a constant, the approximate solutions of (8) and (9) should be corrected to generate a new iterate (x k+1,y k+1,λ k+1). In this algorithm, we thus use \((\tilde{x}^{k}, \tilde{y}^{k},\tilde{\lambda}^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) to denote the approximate solutions of (8)–(10). Below, for notational simplicity we choose R:=rI and S:=sI with r,s>0 as the proximal matrices for this algorithm.
Algorithm 2
(I-ADMM-LQP 2)
Step 0. Let ε>0; μ,ν,σ∈]0,1[; γ∈]0,2[; r,s>0; \(w^{0}:=(x^{0}, y^{0}, \lambda^{0}) \in {\mathbb{R}^{n}_{++}} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) and H be a positive definite diagonal matrix. Set k=0.

Step 1. Find \(\tilde{x}^{k} \in \mathbb{R}^{n}_{++}\) and \(\xi_{x}^{k} \in \mathbb{R}^{n}\) satisfying (42), where \(\xi_{x}^{k}\) satisfies the following inexactness criterion:
$$ \big\|\xi_x^k\big\| \le \nu r (1 - \mu) \big\|x^k-\tilde{x}^k\big\|. $$ (43)

Step 2. Find \(\tilde{y}^{k} \in \mathbb{R}^{p}_{++}\) and \(\xi_{y}^{k} \in \mathbb{R}^{p}\) satisfying (44), where \(\xi_{y}^{k}\) satisfies the following inexactness criterion:
$$ \big\|\xi_y^k\big\| \le \nu s (1 - \mu) \big\|y^k-\tilde{y}^k\big\|. $$ (45)

Step 3. Update \(\tilde{\lambda}^{k}\) via
$$ \tilde{\lambda}^k :=\lambda^k - H \bigl(A\tilde{x}^k+B\tilde{y}^k-b\bigr). $$ (46)

Step 4. Set \(\tilde{w}^{k} := (\tilde{x}^{k}, \tilde{y}^{k}, \tilde{\lambda}^{k})\), and then generate the new iterate w k+1 by
$$ w^{k+1}:= (1 - \sigma) w^k + \sigma P_{\mathcal{W}, G }\bigl[w^k - \alpha_k d \bigl(w^k,\tilde{w}^k,\xi^k\bigr)\bigr] $$ (47)
with the step-size
$$ \alpha_k := \gamma \alpha_k^*, $$ (48)
where \(\alpha_k^*\), \(d(w^k,\tilde{w}^k,\xi^k)\) and \(\varphi(w^k,\tilde{w}^k,\xi^k)\) are given by (49)–(50), and G is defined by (12).

Step 5. If \(\|w^{k+1}-w^{k}\|\leq\varepsilon\), stop; otherwise set k:=k+1 and go to Step 1.
Remark 5.1
Note that Lemma 3.1 guarantees that there exists a unique solution \(\tilde{x}^{k} \in \mathbb{R}^{n}_{++}\) for a given \(\xi_{x}^{k} \in \mathbb{R}^{n}\) and a unique solution \(\tilde{y}^{k} \in \mathbb{R}^{p}_{++}\) for a given \(\xi_{y}^{k} \in \mathbb{R}^{p}\). To implement Algorithm 2, it is easy to determine the error terms \(\xi_{x}^{k}\) and \(\xi_{y}^{k}\) subject to the inexactness criteria (43) and (45). For example, in practice within finite iterations, we can choose
Then, the system of equations (42) reduces to
whose solution can be given explicitly by
with
It is easy to verify that \({\tilde{x}}^{k} \in \mathbb{R}^{n}_{++}\) whenever \(x^{k} \in \mathbb{R}^{n}_{++}\). Similarly, we can choose
Then the positive solution of (44) can be obtained explicitly by
with
Furthermore, \({\tilde{y}}^{k} \in \mathbb{R}^{p}_{++}\) whenever \(y^{k} \in \mathbb{R}^{p}_{++}\).
Remark 5.2
The inexactness criteria (43) and (45) can be met with proper values of r and s, respectively. For example, when both f and g in (2) are Lipschitz continuous, we can choose \(\xi_{x}^{k}\) as (51) and
where L f is f’s Lipschitz constant. Then, it follows that
which means that the inexactness criterion (43) is satisfied. Similarly, we can choose \(\xi_{y}^{k}\) as (53) and
where L g is g’s Lipschitz constant. Then, we have
and thus the inexactness criterion (45) is satisfied.
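For instance, if (51) is read as the natural choice \(\xi_{x}^{k} := f(x^{k}) - f(\tilde{x}^{k})\) (an assumption consistent with Remark 5.1, under which (42) becomes explicitly solvable), then taking \(r \ge L_{f}/(\nu(1-\mu))\) gives
$$ \big\|\xi_{x}^{k}\big\| = \big\|f(x^{k}) - f(\tilde{x}^{k})\big\| \le L_{f}\big\|x^{k} - \tilde{x}^{k}\big\| \le \nu r (1-\mu)\big\|x^{k} - \tilde{x}^{k}\big\|, $$
so that (43) holds; the same argument with \(s \ge L_{g}/(\nu(1-\mu))\) yields (45).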
Remark 5.3
As we have mentioned, since the ADMM subproblems with LQP regularization are allowed to be solved subject to a constant accuracy, Algorithm 2 requires a correction step to further correct the approximate solutions so as to ensure convergence. More specifically, the additional correction required by Algorithm 2 in (47) can be specified as
where \(M:=(1+\mu)S+B^{T}HB\). This additional computation is simple and computationally inexpensive.
5.2 Convergence
Recall that \(w^{k}:=(x^{k},y^{k},\lambda^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) is a given vector and \(\tilde{w}^{k}:=(\tilde{x}^{k}, \tilde{y}^{k},\tilde{\lambda}^{k}) \in \mathbb{R}^{n}_{++} \times \mathbb{R}^{p}_{++} \times \mathbb{R}^{m}\) is generated by Algorithm 2, where w k, \(\tilde{w}^{k}\), and ξ k satisfy (43) and (45). In this subsection, we prove the convergence of Algorithm 2.
For convenience of the following analysis, we define
Lemma 5.1
Let the sequence {w k} be generated by Algorithm 2, and the accompanying sequence \(\{\bar{w}^{k}\}\) be defined by (56). Then, for any \(w:=(x, y, \lambda) \in \mathcal{W}\), we have
where \(\varphi(w^{k}, \tilde{w}^{k}, \xi^{k})\) is defined by (50).
Proof
Recall that \(\bar{x}^{k} = \tilde{x}^{k}\) and \(\bar{y}^{k} = \tilde{y}^{k}\). It follows from (42), (44) and (56) that
Applying Lemma 3.1 to the above equations, respectively, the inequality (16) can be written as
Thus we have
Note that
Combining the above three formulas together and using (49) and the notation of G and ξ k, we get
For any \(w \in \mathcal{W}\), we have
It follows from (60) that
Since \((\tilde{w}^{k} - \bar{w}^{k})^{T} G d(w^{k}, \tilde{w}^{k}, \xi^{k}) = (\lambda^{k} - \tilde{\lambda}^{k})^{T} (By^{k} - B{\tilde{y}^{k}})\) (see (12), (49) and (56)), it follows from the notation of \(\varphi(w^{k}, \tilde{w}^{k}, \xi^{k})\) that
Substituting (61) and (62) into (59), the assertion (57) is proved. □
Using the notation of M, the matrix G can be rewritten as
Proposition 5.1
For given \(w^{k}:=(x^{k}, y^{k}, \lambda^{k}) \in {\mathbb{R}^{n}_{^{++}}} \times \mathbb{R}^{p}_{^{++}} \times \mathbb{R}^{m}\), let \(\tilde{w}^{k} :=(\tilde{x}^{k}, \tilde{y}^{k}, \tilde{\lambda}^{k})\) be generated by (42)–(46). Then we have
Proof
It follows from (49), (50) and the notation of G and x k that
Using \(\bar{\lambda}^{k} = \tilde{\lambda}^{k} - H(By^{k} - B\tilde{y}^{k})\) (see (46) and (56)), we have
Substituting this into (64), it follows that
Since R:=rI, it follows from (43) that
Using the Cauchy–Schwarz inequality and (66), we have
From the above inequality, we obtain
Using (66), we have
It follows from the above inequality that
Substituting this into (67), we obtain
Similarly, using (45), we obtain
Therefore, it follows from (65) and the above two inequalities that
Using the notation of M, we get
and thus
By a simple manipulation, we have
Substituting (71) into (69) and using the notation of \(d(w^{k}, \tilde{w}^{k}, \xi^{k})\) and G, the assertion of this proposition is proved. □
The following corollary follows from the definition of \(\alpha^{k}_{*}\) and (63) directly, and we omit its proof.
Corollary 5.1
The step-size \(\alpha_{k}^{*}\) defined in Step 4 of Algorithm 2 satisfies \(\alpha_{k}^{*}\ge 1/4\) for all k≥0.
Next, we will show the contraction of the sequence generated by Algorithm 2, based on which the convergence of Algorithm 2 can be established easily.
Corollary 5.2
For any \(w^{*} \in \mathcal{W}^{*}\), there is a positive constant c 1 such that the sequence {w k} generated by Algorithm 2 satisfies
Proof
Since \(w^{*} \in \mathcal{W}^{*}\), \(\bar{w}^{k} \in \mathcal{W}\) and Q is monotone, we have
Then, setting w=w ∗ in (57), we have
It follows from the above equation and Corollary 5.1 that
Using (65), we have
It follows from (43) and (45) that
Setting c 1:=σγ(2−γ)(1−μ)(1−ν)/8, the assertion (72) is proved from the above three inequalities. □
Theorem 5.1
The sequence {w k} generated by Algorithm 2 converges to some w ∞ which is a solution of SVI \((\mathcal{W},Q)\).
Proof
Setting η k ≡0, c 0=c 1 in (20), from (72) and Theorem 3.1 we have
and {w k} is bounded. It follows from (73) and (63) that
From (58), we obtain
Together with (74) and (75), we get
The convergence of Algorithm 2 is then obtained immediately from Theorem 3.1. □
5.3 Convergence Rate
Now, we show the worst-case O(1/t) convergence rate for Algorithm 2.
Theorem 5.2
For any integer t>0, let \(\bar{w}_{t} \in \mathcal{W}\) be a convex combination of the iterates \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) defined in (56). Then, for any \(w \in \mathcal{W}\), we have
where
Proof
It follows from (57) and (63) that
Since Q is monotone, from the above inequality, we have
Summing the inequality (77) over k=0,1,…,t, we obtain
Since \((\sum_{k =0}^{t} \alpha_{k}^{*})/{\varUpsilon_{t}} = 1\), \(\bar{w}_{t}\) is a convex combination of \(\bar{w}^{0}, \bar{w}^{1}, \ldots, \bar{w}^{t}\) and thus \(\bar{w}_{t} \in \mathcal{W}\). Using the notation of ϒ t and \(\bar{w}_{t}\), we derive
The assertion (76) follows from the above inequality immediately. □
Using Theorem 5.2, a worst-case O(1/t) convergence rate of Algorithm 2 in the ergodic sense can be proved in the same way as for Algorithm 1.
6 Conclusions
This paper studies how to combine the alternating direction method of multipliers and the logarithmic-quadratic proximal method, allowing the resulting subproblems to be solved approximately subject to different inexactness criteria. We demonstrate how to develop inexact versions of ADMM with LQP regularization by applying the most popular and fundamental inexactness criteria in the literature. Embedding other inexactness criteria into the subproblems of ADMM with LQP regularization, and thus developing further algorithms of the same kind as those in this paper, deserves future research. We expect that certain correction steps are required if relaxed inexactness criteria are chosen for solving the resulting subproblems of the combination of ADMM and the LQP method.
References
Glowinski, R., Marrocco, A.: Sur l’approximation par éléments finis d’ordre un et la résolution par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 2, 41–76 (1975)
He, B.S., Liao, L.-Z., Han, D.R., Yang, H.: A new inexact alternating directions method for monotone variational inequalities. Math. Program. 92(1), 103–118 (2002)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Fr. Autom. Inform. Rech. Opér. 126, 154–159 (1970)
Auslender, A., Teboulle, M.: Entropic proximal decomposition method for convex programs and variational inequalities. Math. Program. 91(1), 33–47 (2001)
Yuan, X.M., Li, M.: An LQP-based decomposition method for solving a class of variational inequalities. SIAM J. Optim. 21(4), 1309–1318 (2011)
Auslender, A., Teboulle, M., Ben-Tiba, S.: A logarithmic-quadratic proximal method for variational inequalities. Comput. Optim. Appl. 12, 31–40 (1999)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Tao, M., Yuan, X.M.: On the O(1/t) convergence rate of alternating direction method with logarithmic-quadratic proximal regularization. SIAM J. Optim. 22(4), 1431–1448 (2012)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York (2003)
He, B.S., Yuan, X.M.: On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, pp. 1–30 (2007)
He, B.S., Liao, L.-Z., Yuan, X.M.: A LQP-based interior prediction-correction method for nonlinear complementarity problems. J. Comput. Math. 24(1), 33–44 (2006)
Yamashita, N., Fukushima, M.: Equivalent unconstrained minimization and global error bounds for variational inequality problems. SIAM J. Control Optim. 35(1), 273–284 (1997)
Acknowledgements
The first author was supported by National Natural Science Foundation of China grant 11001053, Program for New Century Excellent Talents in University grant NCET-12-0111 and Natural Science Foundation of Jiangsu Province grant BK2012662. The second author was supported in part by Hong Kong General Research Fund grants HKBU 201409 and HKBU 201611. The third author was supported by the grant FRG2/11-12/120 from Hong Kong Baptist University and the General Research Fund HKBU 203311 from Hong Kong Research Grants Council.
Communicated by Masao Fukushima.
Li, M., Liao, LZ. & Yuan, X. Inexact Alternating Direction Methods of Multipliers with Logarithmic–Quadratic Proximal Regularization. J Optim Theory Appl 159, 412–436 (2013). https://doi.org/10.1007/s10957-013-0334-4