Abstract
This note provides a comprehensive discussion of the equivalences between some splitting methods. We survey known results concerning these equivalences which have been studied over the past few decades. In particular, we provide simplified proofs of the equivalence of the ADMM and the Douglas–Rachford method and the equivalence of the ADMM with intermediate update of multipliers and the Peaceman–Rachford method.
Keywords
- Alternating Direction Method of Multipliers (ADMM)
- Chambolle–Pock method
- Douglas–Rachford algorithm
- Dykstra method
- Equivalence of splitting methods
- Fenchel–Rockafellar duality
- Peaceman–Rachford algorithm
AMS 2010 Subject Classification
13.1 Introduction
Splitting methods have become popular for solving convex optimization problems that involve finding a minimizer of the sum of two proper lower semicontinuous convex functions. Among these methods are the Douglas–Rachford and the Peaceman–Rachford methods introduced in the seminal work of Lions and Mercier [24], the forward-backward method (see, e.g., [12, 17] and [29]), Dykstra’s method (see, e.g., [3] and [10]), and the Method of Alternating Projections (MAP) (see, e.g., [19]).
When the optimization problem features the composition of one of the functions with a bounded linear operator, a popular technique is the Alternating-Direction Method of Multipliers (ADMM) (see [22, Section 4], [16, Section 10.6.4] and also [7, Chapter 15]). The method has a wide range of applications including large-scale optimization, machine learning, image processing and portfolio optimization, see, e.g., [9, 15] and [20]. A powerful framework to use ADMM in the more general setting of monotone operators is developed in the work of Briceño-Arias and Combettes [13] (see also [8] and [14]). Another relatively recent method is the Chambolle–Pock method introduced in [11].
Equivalences between splitting methods have been studied over the past four decades. For instance, it is known that ADMM is equivalent to the Douglas–Rachford method [24] (see also [21]) in the sense that, with a careful choice of the starting point, one can prove that the sequences generated by both algorithms coincide. (See, e.g., [22, Section 5.1] and [6, Remark 3.14].) A similar equivalence holds between ADMM (with intermediate update of multipliers) and the Peaceman–Rachford method [24] (see [22, Section 5.2]). In [25], the authors proved the correspondence between the Douglas–Rachford and Chambolle–Pock methods.
The rest of this paper is organized as follows: Section 13.2 provides a brief literature review of the ADMM, Douglas–Rachford and Peaceman–Rachford methods. In Sections 13.3 and 13.4 we explicitly describe the equivalence of ADMM (respectively ADMM with intermediate update of multipliers) and the Douglas–Rachford (respectively Peaceman–Rachford) method introduced by Gabay in [22, Sections 5.1 and 5.2]. We provide simplified proofs of these equivalences. Section 13.5 focuses on the recent work of O’Connor and Vandenberghe concerning the equivalence of the Douglas–Rachford and Chambolle–Pock methods (see [25]). Our notation is standard and largely follows, e.g., [5].
13.2 Three Techniques
In this paper, we assume that X and Y are real Hilbert spaces, and that \(f\colon X\to \left ]-\infty ,+\infty \right ]\) and \(g\colon Y\to \left ]-\infty ,+\infty \right ]\) are convex, lower semicontinuous, and proper.
Alternating-Direction Method of Multipliers (ADMM)
In the following we assume thatFootnote 1 \(L\colon Y\to X\) is linear and bounded, that \(L^*L\) is invertible, and that \(0\in {\operatorname {sri}}({\operatorname {dom}} f-L({\operatorname {dom}} g))\), where \({\operatorname {sri}} S\) denotes the strong relative interior of a subset S of X with respect to the closed affine hull of S. When X is finite-dimensional we have \({\operatorname {sri}} S={\operatorname {ri}} S\), where \({\operatorname {ri}} S\) is the relative interior of S, defined as the interior of S with respect to the affine hull of S.
Consider the problem
\(\underset {b\in Y}{\text{minimize}}\;\; f(Lb)+g(b).\) (13.3)
Note that (13.1) and (13.2) imply that (see, e.g., [5, Proposition 27.5(iii)(a)1])
\(\partial (f\circ L+g)=L^*\circ (\partial f)\circ L+\partial g.\) (13.4)
In view of (13.4), solving (13.3) is equivalent to solving the inclusion: find \(b\in Y\) such that
\(0\in L^*\partial f(Lb)+\partial g(b).\) (13.5)
The augmented Lagrangian associated with (13.3) is the function
\(\mathcal {L}\colon X\times Y\times X\to \left ]-\infty ,+\infty \right ]\colon (a,b,u)\mapsto f(a)+g(b)+\langle u,Lb-a\rangle +\tfrac {1}{2}\|Lb-a\|{ }^2.\) (13.6)
The ADMM (see [22, Section 4] and also [16, Section 10.6.4]) applied to solve (13.3) consists in minimizing \(\mathcal {L}\) over b, then over a, and then applying a proximal minimization step with respect to the Lagrange multiplier u. The method applied with a starting point \((a_0, u_0)\in X\times X\) generates three sequences \((a_n)_{{n\in {\mathbb N}}}\), \((b_n)_{n\ge 1}\) and \((u_n)_{{n\in {\mathbb N}}}\) via \((\forall {n\in {\mathbb N}})\):
\(b_{n+1}=(L^*L+\partial g)^{-1}\big (L^*(a_n-u_n)\big ),\) (13.7a)
\(a_{n+1}={\operatorname {Prox}}_f(u_n+Lb_{n+1}),\) (13.7b)
\(u_{n+1}=u_n+Lb_{n+1}-a_{n+1},\) (13.7c)
where \({\operatorname {Prox}}_f:=({\operatorname {Id}}+\partial f)^{-1}\).
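As a concrete illustration (not from the text), the ADMM updates can be run on a small quadratic instance of (13.3), where each step has a closed form; the data L, p and q below are illustrative choices:

```python
import numpy as np

# Toy instance of (13.3): f(a) = (1/2)||a - p||^2 on X = R^3 and
# g(b) = (1/2)||b - q||^2 on Y = R^2, so Prox_f(x) = (x + p)/2 and the
# b-update reduces to the linear solve (L^T L + I) b = L^T (a - u) + q.
rng = np.random.default_rng(0)
L = rng.standard_normal((3, 2))       # L: Y -> X, full column rank, L^T L invertible
p, q = rng.standard_normal(3), rng.standard_normal(2)

a, u = np.zeros(3), np.zeros(3)       # starting point (a_0, u_0)
for _ in range(500):
    b = np.linalg.solve(L.T @ L + np.eye(2), L.T @ (a - u) + q)  # b-update
    a = (L @ b + u + p) / 2                                      # a-update: Prox_f(u + L b)
    u = u + L @ b - a                                            # multiplier update

# The minimizer of f(L b) + g(b) solves (L^T L + I) b = L^T p + q.
b_star = np.linalg.solve(L.T @ L + np.eye(2), L.T @ p + q)
```

In agreement with Fact 13.1, the sequence \(b_n\) approaches a minimizer of \(f\circ L+g\).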
Let \((x_n)_{n\in {\mathbb N}}\) be a sequence in X and let \(\overline {x}\in X\). In the following we shall use \(x_n\to \overline {x}\) (respectively \(x_n{\:{\rightharpoonup }\:} \overline {x}\)) to indicate that \((x_n)_{n\in {\mathbb N}}\) converges strongly (respectively weakly) to \(\overline {x}\).
Fact 13.1 (Convergence of ADMM (See [22, Theorem 4.1]))
Let (a 0, u 0) ∈ X × X, and let \((a_n)_{{n\in {\mathbb N}}}\) , (b n)n≥1 and \((u_n)_{{n\in {\mathbb N}}}\) be defined as in (13.7). Then, there exists \(\overline {b}\in Y\) such that \(b_n{\:{\rightharpoonup }\:}\overline {b}\in {\operatorname {argmin}}(f \circ L+g)\).
The Douglas–Rachford Method
Suppose that Y = X and that \(L={\operatorname {Id}}\). In this case Problem (13.3) becomes
\(\underset {x\in X}{\text{minimize}}\;\; f(x)+g(x).\) (13.8)
The Douglas–Rachford (DR) method, introduced in [24], applied to the ordered pair (f, g) with a starting point \(x_0\in X\) to solve (13.8) generates two sequences \((x_n)_{{n\in {\mathbb N}}}\) and \((y_n)_{{n\in {\mathbb N}}}\) via:
\(x_{n+1}=T_{\text{DR}}(f,g)\,x_n\) (13.9a)
and
\(y_n={\operatorname {Prox}}_f x_n,\) (13.9b)
where
\(T_{\text{DR}}(f,g):={\operatorname {Id}}-{\operatorname {Prox}}_f+{\operatorname {Prox}}_g(2{\operatorname {Prox}}_f-{\operatorname {Id}}),\) (13.10)
and where \({\operatorname {Prox}}_h:=({\operatorname {Id}}+\partial h)^{-1}\) denotes the proximal mapping of a proper lower semicontinuous convex function h.
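For intuition, the DR operator can be instantiated on two quadratics, whose proximal mappings are affine; the points p and q below are illustrative:

```python
import numpy as np

# f(x) = (1/2)||x - p||^2 and g(x) = (1/2)||x - q||^2, so Prox_f(x) = (x + p)/2
# and Prox_g(x) = (x + q)/2; argmin(f + g) = (p + q)/2.
p, q = np.array([1.0, -2.0]), np.array([3.0, 4.0])
prox_f = lambda x: (x + p) / 2
prox_g = lambda x: (x + q) / 2

def T_DR(x):
    # T_DR = Id - Prox_f + Prox_g(2 Prox_f - Id)
    return x - prox_f(x) + prox_g(2 * prox_f(x) - x)

x = np.zeros(2)              # x_0
for _ in range(200):
    x = T_DR(x)              # governing sequence x_{n+1} = T_DR x_n
y = prox_f(x)                # shadow sequence y_n = Prox_f x_n
```

Consistent with Fact 13.2, the shadow sequence tends to the minimizer (p + q)/2.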
To lighten the notation, in the sequel we shall use T DR to denote T DR(f, g). Let T : X → X. Recall that the set of fixed points of T, denoted by \({\operatorname {Fix}} T\), is defined as \({\operatorname {Fix}} T:=\{x\in X\ |\ x=Tx\}\).
Fact 13.2 (Convergence of Douglas–Rachford Method (See, e.g., [24, Theorem 1] or [5, Corollary 28.3]))
Let x 0 ∈ X and let \((x_n)_{n\in {\mathbb N}}\) and \((y_n)_{n\in {\mathbb N}}\) be defined as in (13.9). Then, there exists \(\overline {x}\in {\operatorname {Fix}} T_{ \mathit{\text{DR}}}\) such that \( x_n{\:{\rightharpoonup }\:}\overline {x}\) and \(y_n{\:{\rightharpoonup }\:}{\operatorname {Prox}}_f\overline {x}\in {\operatorname {argmin}}(f+g)\).
The Peaceman–Rachford Method
Let \(h\colon X\to \left ]-\infty ,+\infty \right ]\) be proper and let β > 0. We say that h is strongly convex if \(h-\tfrac {\beta }{2}\|\cdot \|{ }^2\) is convex, i.e., \((\forall (x,y)\in {\operatorname {dom}} h\times {\operatorname {dom}} h)\) \((\forall \alpha \in \left ]0,1\right [)\) we have \(h(\alpha x+(1-\alpha )y)+\tfrac {\beta }{2}\alpha (1-\alpha )\|x-y\|{ }^2\le \alpha h(x)+(1-\alpha )h(y)\).
When g is strongly convex, the Peaceman–Rachford (PR) method, introduced in [24], can be used to solve (13.8). In this case, given \(x_0\in X\), the PR method generates the sequences \((x_n)_{{n\in {\mathbb N}}}\) and \((y_n)_{{n\in {\mathbb N}}}\) via:
\(x_{n+1}=T_{\text{PR}}(f,g)\,x_n\quad \text{and}\quad y_n={\operatorname {Prox}}_f x_n,\) (13.11)
where
\(T_{\text{PR}}(f,g):=(2{\operatorname {Prox}}_g-{\operatorname {Id}})(2{\operatorname {Prox}}_f-{\operatorname {Id}}).\) (13.12)
To lighten the notation, in the sequel we shall use T PR to denote T PR(f, g).
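A matching sketch for the PR iteration, again on two quadratics so that g is strongly convex; all data are illustrative:

```python
import numpy as np

# f(x) = (1/2)||x - p||^2 and g(x) = (1/2)||x - q||^2; g is strongly convex,
# and the unique minimizer of f + g is (p + q)/2.
p, q = np.array([0.5, -1.0]), np.array([2.0, 0.0])
prox_f = lambda x: (x + p) / 2
prox_g = lambda x: (x + q) / 2
reflect = lambda prox: (lambda x: 2 * prox(x) - x)   # reflected proximal mapping

def T_PR(x):
    # T_PR = (2 Prox_g - Id)(2 Prox_f - Id)
    return reflect(prox_g)(reflect(prox_f)(x))

x = np.array([10.0, -7.0])   # x_0
for _ in range(100):
    x = T_PR(x)
y = prox_f(x)                # y_n = Prox_f x_n
```

Consistent with Fact 13.3, \(y_n\) converges to the unique minimizer of f + g.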
Fact 13.3 (Convergence of Peaceman–Rachford Method (See, e.g., [24, Proposition 1] or [5, Proposition 28.8]))
Suppose that g is strongly convex. Let \(\overline {y}\) be the unique minimizer of f + g, let x 0 ∈ X and let \((x_n)_{n\in {\mathbb N}}\) and \((y_n)_{n\in {\mathbb N}}\) be defined as in (13.11). Then \(y_n\to \overline {y}\).
In the sequel we use the notation
\(g^{{\vee }}:=g\circ (-{\operatorname {Id}}).\) (13.13)
Recall that the Fenchel–Rockafellar dual of (13.3) is
\(\underset {u\in X}{\text{minimize}}\;\; f^*(u)+g^*(-L^*u).\) (13.14)
Remark 13.1
-
(i)
One can readily verify that \(\partial g^{{{\scriptscriptstyle \vee }}} =(-{\operatorname {Id}})\circ \partial g \circ (-{\operatorname {Id}})\). Therefore, in view of [27, Theorem A] and [20, Lemma 3.5 on page 125 and Lemma 3.6 on page 133] (see also [4, Corollaries 4.2 and 4.3]) we haveFootnote 2
\(T_{\text{DR}}(f,g)=T_{\text{DR}}(f^*,g^{*{\vee }})\) (13.15)
and
\(T_{\text{PR}}(f,g)=T_{\text{PR}}(f^*,g^{*{\vee }}).\) (13.16)
-
(ii)
When \((L,Y)=({\operatorname {Id}}, X)\), inclusion (13.5) reduces to: Find y ∈ X such that 0 ∈ ∂f(y) + ∂g(y), and the dual inclusion (corresponding to the Fenchel–Rockafellar dual (13.14)) is: Find y ∈ X such that 0 ∈ ∂f ∗(y) − ∂g ∗(−y) = (∂f)−1 y − (∂g)−1(−y), which in this case coincides with the Attouch–Théra dual of (13.5) (see [2]).
One can use DR method to solve (13.14) where (f, g) in Fact 13.2 is replaced by (f ∗, g ∗∘ (−L ∗)). Recalling (13.15) we learn that T DR = T DR(f ∗, g ∗∘ (−L ∗)) = T DR(f ∗∗, (g ∗∘ (−L ∗))∨∗) = T DR(f, (g ∗∘ L ∗)∗), where the last identity follows from [5, Proposition 13.44].
In view of (13.10), (13.15), and [5, Proposition 15.23(v)] we have
\(T_{\text{DR}}={\operatorname {Id}}-{\operatorname {Prox}}_f+{\operatorname {Prox}}_{(g^*\circ L^*)^*}(2{\operatorname {Prox}}_f-{\operatorname {Id}}).\) (13.17)
Similarly, under additional assumptions (see Fact 13.3), one can use the PR method to solve (13.14) where (f, g) in Fact 13.3 is replaced by (f ∗, g ∗∘ (−L ∗)). In this case (13.12), (13.16) and [5, Proposition 15.23(v)] imply that
\(T_{\text{PR}}=(2{\operatorname {Prox}}_{(g^*\circ L^*)^*}-{\operatorname {Id}})(2{\operatorname {Prox}}_f-{\operatorname {Id}}).\) (13.18)
For completeness, we provide a concrete proof of the formula for \({\operatorname {Prox}}_{(g^*\circ L^*)^*}\) in Appendix 1 (see Proposition 13.3 (viii) below). We point out that the formula for \({\operatorname {Prox}}_{(g^*\circ L^*)^*}\) in a more general setting is given in [22, Proposition 4.1] (see also [16, Section 10.6.4]).
13.3 ADMM and Douglas–Rachford Method
In this section we discuss the equivalence of ADMM and the DR method. This equivalence was first established by Gabay in [22, Section 5.1] (see also [6, Remark 3.14]). Let \((x_0, a_0, u_0)\in X^3\). Throughout the rest of this section, we assume that the sequences \((x_n)_{n\in \mathbb {N}}\) and \((y_n)_{n\in \mathbb {N}}\) are as defined in (13.9), i.e., \((\forall n\in {\mathbb N})\)
\(y_n={\operatorname {Prox}}_f x_n\) (13.19)
and
\(x_{n+1}=x_n-y_n+{\operatorname {Prox}}_{(g^*\circ L^*)^*}(2y_n-x_n)=x_n-y_n+L(L^*L+\partial g)^{-1}L^*(2y_n-x_n).\) (13.20)
Note that the second identity in (13.20) follows from (13.17) and Proposition 13.3 (viii). We also assume that the sequences \((a_n)_{n\in {\mathbb N}}\), \((b_n)_{n\ge 1}\) and \((u_n)_{n\in {\mathbb N}}\) are generated by ADMM, i.e., by (13.7): \((\forall n\in {\mathbb N})\)
\(b_{n+1}=(L^*L+\partial g)^{-1}\big (L^*(a_n-u_n)\big )\quad \text{and}\quad a_{n+1}={\operatorname {Prox}}_f(u_n+Lb_{n+1}),\) (13.21a)
\(u_{n+1}=u_n+Lb_{n+1}-a_{n+1}.\) (13.21b)
The following lemma will be used later to clarify the equivalence of DR and ADMM.
Lemma 13.1
Let \((b_-, a_-, u_-)\in Y\times X\times X\) and set
\(b:=(L^*L+\partial g)^{-1}\big (L^*(a_--u_-)\big ),\quad a:={\operatorname {Prox}}_f(u_-+Lb),\quad u:=u_-+Lb-a,\)
and
\(b_+:=(L^*L+\partial g)^{-1}\big (L^*(a-u)\big ),\quad a_+:={\operatorname {Prox}}_f(u+Lb_+).\)
Then
\(T_{ \mathit{\text{DR}}}(Lb+u_-)=Lb_++u\) (13.22a)
and
\({\operatorname {Prox}}_f\big (T_{ \mathit{\text{DR}}}(Lb+u_-)\big )=a_+.\) (13.22b)
Proof
Indeed, it follows from (13.20), (13.21), (13.22a) and (13.22b) that
where (13.23a) follows from (13.20), (13.23b) and (13.23d) follow from (13.21a), and (13.23e) follows from (13.21b). This proves (13.22a). Now (13.22b) follows from combining (13.22a) and (13.21b). □
We now prove the main result in this section by induction.
Theorem 13.4
The following hold:
-
(i)
(DR as ADMM Iteration) Using the DR method with a starting point \(x_0\in X\) to solve (13.14) is equivalent to using ADMM with the starting point \((a_0,u_0)=({\operatorname {Prox}}_f x_0,\, x_0-{\operatorname {Prox}}_f x_0)\) to solve (13.3), in the sense that \((x_n)_{n\ge 1} = (Lb_n + u_{n-1})_{n\ge 1}\) and \((y_n)_{{n\in {\mathbb N}}}=(a_n)_{{n\in {\mathbb N}}}\).
-
(ii)
(ADMM as DR Iteration) Using ADMM with a starting point (a 0, u 0) ∈ X × X to solve (13.3) is equivalent to using DR method with a starting point x 0 = Lb 1 + u 0 to solve (13.14), in the sense that \((x_n)_{{n\in {\mathbb N}}}=(Lb_{n+1}+u_{n})_{{n\in {\mathbb N}}}\) and \((y_n)_{{n\in {\mathbb N}}}=(a_{n+1})_{{n\in {\mathbb N}}}\).
Proof
For simplicity, set T = T DR. (i): Note that (13.19) implies that \(y_0={\operatorname {Prox}}_f x_0=a_0\). Now, when n = 1 we have
\(x_1=Tx_0=x_0-{\operatorname {Prox}}_f x_0+L(L^*L+\partial g)^{-1}L^*(2{\operatorname {Prox}}_f x_0-x_0)=u_0+L(L^*L+\partial g)^{-1}\big (L^*(a_0-u_0)\big )=u_0+Lb_1.\)
Combining with (13.19) and (13.7b) we get \(y_1={\operatorname {Prox}}_f Tx_0={\operatorname {Prox}}_f x_1={\operatorname {Prox}}_f(u_0+Lb_1)=a_1\), which verifies the base case. Now suppose for some n ≥ 1 we have x n = Lb n + u n−1 and y n = a n and use Lemma 13.1 with (b −, a −, u −) replaced by (b n−1, a n−1, u n−1) to learn that x n+1 = Lb n+1 + u n and y n+1 = a n+1. Consequently, (x n)n≥1 = (Lb n + u n−1)n≥1 and \((y_n)_{{n\in {\mathbb N}}}=(a_n)_{{n\in {\mathbb N}}}\) as claimed.
(ii): At n = 0, x 0 = Lb 1 + u 0 = Lb 0+1 + u 0, and therefore (13.9a) implies that \(y_0={\operatorname {Prox}}_fx_0={\operatorname {Prox}}_f(Lb_1+u_0)=a_1\) by (13.7b). Now suppose that for some n ≥ 0 we have x n = Lb n+1 + u n and y n = a n+1. The conclusion follows by applying Lemma 13.1 with (b −, a −, u −) replaced by (b n, a n, u n). □
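With L = Id and quadratic f and g, the correspondence in Theorem 13.4(ii) can be checked numerically: run ADMM from an arbitrary (a_0, u_0), start DR at x_0 = Lb_1 + u_0, and compare the two trajectories (a sketch with illustrative data):

```python
import numpy as np

# L = Id, f(x) = (1/2)||x - p||^2 and g(x) = (1/2)||x - q||^2 on X = R^2,
# so Prox_f(x) = (x + p)/2 and Prox_g(x) = (x + q)/2.
p, q = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
prox_f = lambda x: (x + p) / 2
prox_g = lambda x: (x + q) / 2

# ADMM (13.7) with L = Id, from an arbitrary (a_0, u_0).
a, u = np.array([0.3, -0.7]), np.array([1.1, 0.2])
A_seq, B_seq, U_seq = [a], [None], [u]          # b_n starts at n = 1
for _ in range(25):
    b = prox_g(a - u)        # b-update
    a = prox_f(b + u)        # a-update
    u = u + b - a            # multiplier update
    B_seq.append(b); A_seq.append(a); U_seq.append(u)

# DR (13.9) started at x_0 = L b_1 + u_0, as in Theorem 13.4(ii).
x = B_seq[1] + U_seq[0]
ok = True
for n in range(20):
    y = prox_f(x)
    # claimed correspondence: x_n = L b_{n+1} + u_n and y_n = a_{n+1}
    ok = ok and np.allclose(x, B_seq[n + 1] + U_seq[n]) and np.allclose(y, A_seq[n + 1])
    x = x - y + prox_g(2 * y - x)
```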
13.4 ADMM and Peaceman–Rachford Method
We now turn to the equivalence of ADMM with intermediate update of multipliers and the PR method. This equivalence was established in [22, Section 5.2]. Given \((a_0, u_0)\in X\times X\), the ADMM with an intermediate update of multipliers applied to solve (13.3) generates four sequences \((a_n)_{{n\in {\mathbb N}}}\), \((u_n)_{{n\in {\mathbb N}}}\), \((b_n)_{n\ge 1}\) and \((w_n)_{n\ge 1}\) via \((\forall {n\in {\mathbb N}})\):
\(b_{n+1}=(L^*L+\partial g)^{-1}\big (L^*(a_n-u_n)\big ),\) (13.24a)
\(w_{n+1}=u_n+Lb_{n+1}-a_n,\) (13.24b)
\(a_{n+1}={\operatorname {Prox}}_f(Lb_{n+1}+w_{n+1}),\) (13.24c)
\(u_{n+1}=w_{n+1}+Lb_{n+1}-a_{n+1}.\) (13.24d)
Fact 13.5 (Convergence of ADMM with Intermediate Update of Multipliers (See [22, Theorem 5.4]))
Suppose that g is strongly convex. Let (a 0, u 0) ∈ X × X, and let (b n)n≥1 , (w n)n≥1 , \((a_n)_{{n\in {\mathbb N}}}\) and \((u_n)_{{n\in {\mathbb N}}}\) be defined as in (13.24). Then, there exists \(\overline {b}\in Y\) such that \(b_n\to \overline {b}\in {\operatorname {argmin}}(f \circ L+g)\).
In this section we work under the additional assumption that
g is strongly convex. (13.25)
Let \((x_0, a_0, u_0)\in X^3\). Throughout the rest of this section we assume that the sequences \((x_n)_{n\in \mathbb {N}}\) and \((y_n)_{n\in \mathbb {N}}\) are as defined in (13.11), i.e., \((\forall n\in {\mathbb N})\)
\(y_n={\operatorname {Prox}}_f x_n\quad \text{and}\quad x_{n+1}=T_{\text{PR}}x_n=\big (2L(L^*L+\partial g)^{-1}L^*-{\operatorname {Id}}\big )(2{\operatorname {Prox}}_f-{\operatorname {Id}})x_n,\) (13.26)
where
\(T_{\text{PR}}=T_{\text{PR}}(f,(g^*\circ L^*)^*)=(2{\operatorname {Prox}}_{(g^*\circ L^*)^*}-{\operatorname {Id}})(2{\operatorname {Prox}}_f-{\operatorname {Id}}).\)
Note that the second identity in (13.26) follows from (13.18) and Proposition 13.3 (viii). We also assume that the sequences \((a_n)_{n\in {\mathbb N}}\), \((u_n)_{n\in {\mathbb N}}\), \((b_n)_{n\ge 1}\) and \((w_n)_{n\ge 1}\) are generated by (13.24), i.e., \((\forall n\in {\mathbb N})\)
\(b_{n+1}=(L^*L+\partial g)^{-1}\big (L^*(a_n-u_n)\big )\quad \text{and}\quad w_{n+1}=u_n+Lb_{n+1}-a_n,\) (13.27a)
\(a_{n+1}={\operatorname {Prox}}_f(Lb_{n+1}+w_{n+1})\quad \text{and}\quad u_{n+1}=w_{n+1}+Lb_{n+1}-a_{n+1}.\) (13.27b)
Before we proceed further, we prove the following useful lemma.
Lemma 13.2
Let \((b_-, w_-, a_-, u_-)\in Y\times X\times X\times X\) and set
\(b:=(L^*L+\partial g)^{-1}\big (L^*(a_--u_-)\big ),\quad w:=u_-+Lb-a_-,\quad a:={\operatorname {Prox}}_f(Lb+w),\quad u:=w+Lb-a,\)
and
\(b_+:=(L^*L+\partial g)^{-1}\big (L^*(a-u)\big ),\quad w_+:=u+Lb_+-a,\quad a_+:={\operatorname {Prox}}_f(Lb_++w_+).\)
Then
\(T_{ \mathit{\text{PR}}}(Lb+w)=Lb_++w_+\) (13.28a)
and
\({\operatorname {Prox}}_f\big (T_{ \mathit{\text{PR}}}(Lb+w)\big )=a_+.\) (13.28b)
Proof
Indeed, by (13.26), (13.27a) and (13.27b) we have
which proves (13.28a). Now (13.28b) is a direct consequence of (13.28a) in view of (13.27b). □
We are now ready for the main result in this section.
Theorem 13.6
Suppose that g is strongly convex. Then the following hold:
-
(i)
(PR as ADMM Iteration) Using the PR method with a starting point \(x_0\in X\) to solve (13.14) is equivalent to using ADMM with intermediate update of multipliers with the starting point \((a_0,u_0)=({\operatorname {Prox}}_f x_0,\, x_0-{\operatorname {Prox}}_f x_0)\) to solve (13.3), in the sense that \((x_n)_{n\ge 1} = (Lb_n + w_n)_{n\ge 1}\) and \((y_n)_{{n\in {\mathbb N}}}=(a_n)_{{n\in {\mathbb N}}}\).
-
(ii)
(ADMM as PR Iteration) Using ADMM with intermediate update of multipliers with a starting point (a 0, u 0) ∈ X × X to solve (13.3) is equivalent to using PR method with starting point x 0 = Lb 1 + w 1 to solve (13.14), in the sense that \((x_n)_{{n\in {\mathbb N}}}=(Lb_{n+1}+w_{n+1})_{{n\in {\mathbb N}}}\) and \((y_n)_{{n\in {\mathbb N}}}=(a_{n+1})_{{n\in {\mathbb N}}}\).
Proof
We proceed by induction. (i): We have
\(x_1=T_{\text{PR}}x_0=2L(L^*L+\partial g)^{-1}\big (L^*(a_0-u_0)\big )-(a_0-u_0)=2Lb_1-a_0+u_0=Lb_1+w_1,\)
and hence \(y_1={\operatorname {Prox}}_f x_1={\operatorname {Prox}}_f(Lb_1+w_1)=a_1\),
which verifies the base case. Now suppose for some n ≥ 1 we have x n = Lb n + w n. The conclusion follows from applying Lemma 13.2 with (b −, w −, a −, u −) replaced by (b n−1, w n−1, a n−1, u n−1) in view of (13.24).
(ii): At n = 0, the base case clearly holds in view of (13.25) and (13.24c). Now suppose that for some n ≥ 0 we have x n = Lb n+1 + w n+1 and y n = a n+1 and use Lemma 13.2 with (b −, w −, a −, u −) replaced by (b n, w n, a n, u n) in view of (13.24). □
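The analogous numerical check for Theorem 13.6(ii), with L = Id and quadratics so that g is strongly convex (illustrative data):

```python
import numpy as np

# L = Id; f(x) = (1/2)||x - p||^2 and g(x) = (1/2)||x - q||^2, so g is
# strongly convex, as Theorem 13.6 requires.
p, q = np.array([0.2, -1.4]), np.array([2.5, 0.9])
prox_f = lambda x: (x + p) / 2
prox_g = lambda x: (x + q) / 2

# ADMM with intermediate update of multipliers (13.24), L = Id.
a, u = np.array([1.0, 1.0]), np.array([-0.5, 0.3])
A_seq, U_seq, B_seq, W_seq = [a], [u], [None], [None]
for _ in range(25):
    b = prox_g(a - u)        # b-update
    w = u + b - a            # intermediate multiplier update
    a = prox_f(b + w)        # a-update
    u = w + b - a            # final multiplier update
    B_seq.append(b); W_seq.append(w); A_seq.append(a); U_seq.append(u)

# PR started at x_0 = L b_1 + w_1, as in Theorem 13.6(ii).
x = B_seq[1] + W_seq[1]
ok = True
for n in range(20):
    y = prox_f(x)
    # claimed correspondence: x_n = L b_{n+1} + w_{n+1} and y_n = a_{n+1}
    ok = ok and np.allclose(x, B_seq[n + 1] + W_seq[n + 1]) and np.allclose(y, A_seq[n + 1])
    r = 2 * y - x
    x = 2 * prox_g(r) - r    # T_PR x_n = (2 Prox_g - Id)(2 Prox_f - Id) x_n
```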
13.5 Chambolle–Pock and Douglas–Rachford Methods
In this section we survey the recent work by O’Connor and Vandenberghe [25] concerning the equivalence of the Douglas–Rachford method and the Chambolle–Pock method. (For a detailed study of this correspondence in the more general framework of the primal-dual hybrid gradient method and the DR method with relaxation, as well as the connection to linearized ADMM, we refer the reader to [25].) We work under the assumption thatFootnote 3
\(A\colon X\to Y\) is linear and bounded, and that τ > 0 and σ > 0 satisfy \(\tau \sigma \|A\|{ }^2<1.\) (13.31)
Consider the problem
\(\underset {x\in X}{\text{minimize}}\;\; f(x)+g(Ax),\) (13.32)
and its Fenchel–Rockafellar dual given by
\(\underset {v\in Y}{\text{minimize}}\;\; f^*(-A^*v)+g^*(v).\) (13.33)
To proceed further, in the following we assume that
\(0\in {\operatorname {sri}}({\operatorname {dom}} g-A({\operatorname {dom}} f)).\) (13.34)
Note that (13.34) implies that (see, e.g., [5, Proposition 27.5(iii)(a)1])
\(\partial (f+g\circ A)=\partial f+A^*\circ (\partial g)\circ A.\) (13.35)
In view of (13.35), solving (13.32) is equivalent to solving the inclusion: find \(x\in X\) such that
\(0\in \partial f(x)+A^*\partial g(Ax).\) (13.36)
The Chambolle–Pock (CP) method applied with a starting point \((u_0, v_0)\in X\times Y\) to solve (13.32) generates the sequences \(({u}_n)_{{n\in {\mathbb N}}}\) and \(({v}_n)_{{n\in {\mathbb N}}}\) via:
\(u_{n+1}={\operatorname {Prox}}_{\tau f}(u_n-\tau A^*v_n)\) (13.37a)
and
\(v_{n+1}={\operatorname {Prox}}_{\sigma g^*}\big (v_n+\sigma A(2u_{n+1}-u_n)\big ),\) (13.37b)
where τ and σ are as defined in (13.31).
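As a sketch (with illustrative data, and using one common ordering of the primal and dual steps, an assumption here since several equivalent presentations of the method exist), the CP iteration can be run on a small quadratic instance of (13.32):

```python
import numpy as np

# Toy instance of (13.32): f(u) = (1/2)||u - p||^2, g(w) = (1/2)||w - q||^2,
# so Prox_{tau f} and Prox_{sigma g*} are affine.
A = np.array([[1.0, 0.2], [-0.3, 0.8]])
p, q = np.array([1.0, -1.0]), np.array([0.5, 2.0])

tau = sigma = 0.9 / np.linalg.norm(A, 2)    # tau * sigma * ||A||^2 = 0.81 < 1

prox_tau_f = lambda z: (z + tau * p) / (1 + tau)            # Prox_{tau f}
prox_sigma_gstar = lambda y: (y - sigma * q) / (1 + sigma)  # Prox_{sigma g*}

u, v = np.zeros(2), np.zeros(2)
for _ in range(2000):
    u_next = prox_tau_f(u - tau * A.T @ v)                  # primal step
    v = prox_sigma_gstar(v + sigma * A @ (2 * u_next - u))  # dual step, extrapolated
    u = u_next

# The minimizer of f + g o A solves (I + A^T A) u = p + A^T q.
u_star = np.linalg.solve(np.eye(2) + A.T @ A, p + A.T @ q)
```

In agreement with Fact 13.7, the primal iterate \(u_n\) approaches the minimizer of \(f+g\circ A\).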
Fact 13.7 (Convergence of Chambolle–Pock Method (See [11, Theorem 1], [30, Theorem 3.1] and Also [18, Theorem 3.1]))
Let (u 0, v 0) ∈ X × Y and let \((u_n)_{n\in {\mathbb N}}\) and \((v_n)_{n\in {\mathbb N}}\) be defined as in (13.37). Then, there exists \((\overline {u},\overline {v})\in X \times Y\) such that \( (u_n,v_n)_{n\in {\mathbb N}}{\:{\rightharpoonup }\:}(\overline {u},\overline {v})\) , \(\overline {u}\in {\operatorname {argmin}}(f+g\circ A)\) and \(\overline {v}\in {\operatorname {argmin}}(f^*\circ (-A^*)+g^*)\).
It is known that the method in (13.37) reduces to DR method (see, e.g., [11, Section 4.2]) when \(A={\operatorname {Id}}\). We state this equivalence in Proposition 13.1 below.
Proposition 13.1 (DR as a CP Iteration)
Suppose that X = Y , and that \(A={\operatorname {Id}}\) . Then, using DR method, defined as in (13.9), with a starting point x 0 ∈ X to solve (13.32) is equivalent to using CP method with a starting point (u 0, v 0) ∈{(u, v) | u − v = x 0 } ⊆ X × X to solve (13.32) in the sense that \((x_n)_{n\in {\mathbb N}}=(u_n-v_n)_{n\in {\mathbb N}}\) and \((y_n)_{n\in {\mathbb N}}=(u_n)_{n\in {\mathbb N}}\).
Proof
We use induction. When n = 0, the base case is obviously true. Now suppose that for some n ≥ 0 we have x n = u n − v n and y n = u n. Then,
Here (13.38a) follows from Lemma 13.5 below (applied with γ = 1), (13.38b) follows from the inductive hypothesis, and (13.38c) follows from (13.37) applied with (τ, σ, A) replaced by \((1,1,{\operatorname {Id}})\). The claim about y n+1 follows directly and the proof is complete. □
Chambolle–Pock as a DR Iteration: The O’Connor–Vandenberghe Technique
Let Z be a real Hilbert space. In the following, we assume that C : Z → Y is linear and bounded and that
\(CC^*=\tfrac {1}{\tau \sigma }{\operatorname {Id}}-AA^*,\)
and we set
\(B\colon X\times Z\to Y\colon (x,z)\mapsto Ax+Cz.\) (13.39)
Note that one possible choice of C is to set Z = Y and \(C=\big (\tfrac {1}{\tau \sigma }{\operatorname {Id}}-AA^*\big )^{1/2}\), where the existence of C follows from, e.g., [26, Theorem on page 265]. Now consider the problem
\(\underset {(x,z)\in X\times Z}{\text{minimize}}\;\; \widetilde {f}(x,z)+g(B(x,z)),\) (13.40)
where
\(\widetilde {f}\colon X\times Z\to \left ]-\infty ,+\infty \right ]\colon (x,z)\mapsto f(x)+\iota _{\{0\}}(z).\) (13.41)
The following result, proved in [25, Section 4] in the more general framework of primal-dual hybrid gradient method, provides an elegant way to construct the correspondence between the DR sequence when applied to solve (13.40) and the CP sequence when applied to solve (13.32). We restate the proof for the sake of completeness.
Proposition 13.2 (CP Corresponds to a DR Iteration)
Using the CP method with starting point \((u_0, v_0)\in X\times Z\) to solve (13.32) corresponds to using DR with starting point \({\mathbf {x}}_0=(u_0,0)-\tau B^*v_0\) to solve (13.40), in the sense that \(({\mathbf {x}}_n)_{{n\in {\mathbb N}}} =((u_n,0)-\tau B^* v_n)_{{n\in {\mathbb N}}}\) and \(({\mathbf {y}}_n)_{{n\in {\mathbb N}}} =({u}_{n+1},0)_{n\in {\mathbb N}}\).
Proof
We apply DR to solve (13.40) with \((\widetilde {f},g)\) replaced by \((\tau \widetilde {f},\tau g)\). The proof proceeds by induction. When n = 0, by assumption we have \({\mathbf {x}}_0=(u_0,0)-\tau B^*v_0\). It follows from Proposition 13.4 (i)&(vii) below applied with \(\widetilde {f} \) replaced by \(\tau \widetilde {f} \) that \({\mathbf {y}}_0={\operatorname {Prox}}_{\tau \widetilde {f}}{\mathbf {x}}_0 ={\operatorname {Prox}}_{\tau \widetilde {f}}((u_0,0)-\tau (A^* v_0,C^* v_0)) ={\operatorname {Prox}}_{\tau \widetilde {f}}(u_0-\tau A^* v_0,-\tau C^* v_0) =({\operatorname {Prox}}_{\tau f}(u_0-\tau A^* v_0),0)\). Now suppose that for some n ≥ 0 we have
\({\mathbf {x}}_n=(u_n,0)-\tau B^*v_n\) (13.42a)
and
\({\mathbf {y}}_n=(u_{n+1},0).\) (13.42b)
Then
where (13.43a) follows from (13.37b), (13.43b) follows from (13.42b) and Proposition 13.4 (iii) below, (13.43c) follows from (13.39), and (13.43e) follows from (13.42a), Proposition 13.4 (viii) and (13.49b) below applied with (γ, g) replaced by (τ, g ∘ B).
Now by (13.37a) and Proposition 13.4 (vii) below we have
□
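The lifting above requires an operator C with \(CC^*=\tfrac {1}{\tau \sigma }{\operatorname {Id}}-AA^*\); in the matrix case one such C is the symmetric square root mentioned before Proposition 13.2, which can be computed from an eigendecomposition (illustrative A below):

```python
import numpy as np

# With tau * sigma * ||A||^2 <= 1, the matrix M = (1/(tau*sigma)) I - A A^T is
# positive semidefinite, and its symmetric square root is one valid choice of C.
A = np.array([[1.0, 0.2], [-0.3, 0.8]])
tau = sigma = 0.9 / np.linalg.norm(A, 2)

M = np.eye(2) / (tau * sigma) - A @ A.T
w, V = np.linalg.eigh(M)                              # M is symmetric
C = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T    # clip rounding noise at 0
```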
Notes
- 1.
The adjoint of L is the unique operator L ∗: X → Y that satisfies 〈Ly, x〉 = 〈y, L ∗ x〉 (∀(x, y) ∈ X × Y ).
- 2.
It is straightforward to verify that g ∨∗ = (g ∗)∨ (see, e.g., [5, Proposition 13.23(v)]).
- 3.
In passing, we point out that, when X is a finite-dimensional Hilbert space, the condition \(\tau \sigma \|A\|{ }^2<1\) can be relaxed to \(\tau \sigma \|A\|{ }^2\le 1\). The convergence in this case is proved in [18, Theorem 3.3].
References
Attouch, H., Brézis, H.: Duality for the sum of convex functions in general Banach spaces. In: Aspects of Mathematics and Its Applications 34, pp. 125–133. North-Holland, Amsterdam (1986)
Attouch, H., Théra, M.: A general duality principle for the sum of two operators. J. Convex Anal. 3, 1–24 (1996)
Bauschke, H.H., Combettes, P.L.: A Dykstra-like algorithm for two monotone operators. Pacific J. Optim. 4, 383–391 (2008)
Bauschke, H.H., Boţ, R.I., Hare, W.L., Moursi, W.M.: Attouch–Théra duality revisited: paramonotonicity and operator splitting. J. Approx. Th. 164, 1065–1084 (2012)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Second Edition. Springer (2017)
Bauschke, H.H., Koch, V.R.: Projection methods: Swiss Army knives for solving feasibility and best approximation problems with halfspaces. In: Infinite Products of Operators and Their Applications, Contemp. Math. vol. 636, pp. 1–40 (2012)
Beck, A.: First-Order Methods in Optimization, SIAM (2017)
Boţ, R.I., Csetnek, E.R.: ADMM for monotone operators: convergence analysis and rates. Adv. Comput. Math. 45, 327–359 (2019)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3, 1–122 (2011)
Boyle, J.P., Dykstra, R.L.: A method for finding projections onto the intersection of convex sets in Hilbert spaces. Lecture Notes in Statistics 37, 28–47 (1986)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis. 40, 120–145 (2011)
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53, 475–504 (2004)
Briceño-Arias, L.M., Combettes, P.L.: A monotone + skew splitting model for composite monotone inclusions in duality. SIAM J. Optim. 21, 1230–1250 (2011)
Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20, 307–330 (2012)
Combettes, P.L., Pesquet, J.-C.: A proximal decomposition method for solving convex variational inverse problems. Inverse Problems 24, article 065014 (2008)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, vol. 49, pp. 185–212. Springer, New York (2011)
Combettes, P.L., Vũ, B.-C.: Variable metric forward–backward splitting with applications to monotone inclusions in duality. Optimization 63, 1289–1318 (2014)
Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Th. Appl. 158, 460–479 (2013)
Deutsch, F.: Best Approximation in Inner Product Spaces, Springer (2001)
Eckstein, J.: Splitting Methods for Monotone Operators with Applications to Parallel Optimization, Ph.D. thesis, MIT (1989)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Prog. 55, 293–318 (1992)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, vol. 15, pp. 299–331. North-Holland, Amsterdam (1983)
Gossez, J.-P.: Opérateurs monotones non linéaires dans les espaces de Banach non réflexifs. J. Math. Anal. Appl. 34, 371–395 (1971)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
O’Connor, D., Vandenberghe, L.: On the equivalence of the primal-dual hybrid gradient method and Douglas–Rachford splitting. Math. Prog. (Ser. A) (2018), https://doi.org/10.1007/s10107-018-1321-1.
Riesz, F., Sz.-Nagy, B.: Functional Analysis. Dover paperback (1990)
Rockafellar, R.T.: On the maximal monotonicity of subdifferential mappings. Pacific J. Math. 33, 209–216 (1970)
Stoer, J., Bulirsch, R.: Introduction to Numerical Analysis. Third Edition. Springer-Verlag (2002)
Tseng, P.: Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)
Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38, 667–681 (2013)
Acknowledgements
WMM was supported by the Pacific Institute of Mathematics Postdoctoral Fellowship and the DIMACS/Simons Collaboration on Bridging Continuous and Discrete Optimization through NSF grant # CCF-1740425.
Appendices
13.1.1 Appendix 1
Let A: X → X be linear. Define
\(q_A\colon X\to \mathbb {R}\colon x\mapsto \tfrac {1}{2}\langle x,Ax\rangle .\) (13.45)
Recall that a linear operator A: X → X is monotone if (∀x ∈ X) 〈x, Ax〉≥ 0, and is strictly monotone if \((\forall x\in X\smallsetminus \{0\})\) 〈x, Ax〉 > 0. Let h: X → ℝ and let x ∈ X. We say that h is Fréchet differentiable at x if there exists a linear operator Dh(x): X → ℝ, called the Fréchet derivative of h at x, such that \(\lim _{0\neq \|y\|\to 0}\tfrac {h(x+y)-h(x)-Dh(x)y}{\|y\|}=0\); and h is Fréchet differentiable on X if it is Fréchet differentiable at every point in X.
The following lemma is a special case of [5, Proposition 17.36].
Lemma 13.3
Let A: X → X be linear, strictly monotone, self-adjoint and invertible. Then the following hold:
-
(i)
q A and \(q_{A^{-1}}\) are strictly convex, continuous, Fréchet differentiable. Moreover, \((\nabla q_A,\nabla q_{A^{-1}} )=(A,A^{-1})\).
-
(ii)
\(q_{A}^*=q_{A^{-1}}\).
Proof
Note that, like A, the operator A −1 is linear, strictly monotone, self-adjoint (since (A −1)∗ = (A ∗)−1 = A −1) and invertible. Moreover, \({\operatorname {ran}} A={\operatorname {ran}} A^{-1}=X\). (i): This follows from [5, Example 17.11 and Proposition 17.36(i)] applied to A and A −1 respectively. (ii): It follows from [5, Proposition 17.36(iii)], [28, Theorem 4.8.5.4] and the invertibility of A that \(q_{A}^*=q_{A^{-1}}+\iota _{{\operatorname {ran}} A} =q_{A^{-1}}+\iota _{X}=q_{A^{-1}}\). □
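Lemma 13.3 (ii) can be sanity-checked numerically through the Fenchel–Young inequality: the value \(q_{A^{-1}}(y)\) dominates \(\langle y,x\rangle -q_A(x)\) over sampled points x and is attained at \(x=A^{-1}y\) (illustrative data):

```python
import numpy as np

# For A positive definite, q_A(x) = (1/2)<x, Ax> has conjugate
# q_A*(y) = (1/2)<y, A^{-1} y>, with the supremum attained at x = A^{-1} y.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = B.T @ B + np.eye(3)            # symmetric, strictly monotone, invertible

y = rng.standard_normal(3)
conj_val = 0.5 * y @ np.linalg.solve(A, y)     # q_{A^{-1}}(y)
x_star = np.linalg.solve(A, y)                 # maximizer of <y, x> - q_A(x)
attained = y @ x_star - 0.5 * x_star @ A @ x_star

# Fenchel-Young: <y, x> - q_A(x) <= q_{A^{-1}}(y) for every x.
samples = 3 * rng.standard_normal((1000, 3))
fy_gap = samples @ y - 0.5 * np.einsum('ij,jk,ik->i', samples, A, samples)
```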
Proposition 13.3
Let L: Y → X be linear and suppose that L ∗ L is invertible. Let \(g\colon Y \to \left ]-\infty ,+\infty \right ]\) be convex, lower semicontinuous, and proper. Then the following hold:
-
(i)
\(\ker L=\{0\}\).
-
(ii)
L ∗ L is strictly monotone.
-
(iii)
\({\operatorname {dom}}(q_{L^*L}+g)^*=X\).
-
(iv)
\(\partial (q_{L^*L}+g)=\nabla q_{L^*L}+\partial g=L^*L+\partial g\).
-
(v)
\((q_{L^*L}+g^*)^*\) is Fréchet differentiable on X.
-
(vi)
(L ∗ L + ∂g ∗)−1 is single-valued and \({\operatorname {dom}} (L^*L+\partial g^*)^{-1}=X\).
-
(vii)
\({\operatorname {Prox}}_{g^*\circ L^*}={\operatorname {Id}}-L(L^*L+\partial g)^{-1}L^*\).
-
(viii)
\({\operatorname {Prox}}_{(g^*\circ L^*)^{*}}=L(L^*L+\partial g)^{-1}L^*\).
Proof
(i): Using [5, Fact 2.25(vi)] and the assumption that L ∗ L is invertible we have \(\ker L=\ker L^*L=\{0\}\). (ii): Using (i) we have \((\forall x\in X\smallsetminus \{0\})\) \(\langle x,L^*Lx\rangle =\|Lx\|{ }^2>0\), hence L ∗ L is strictly monotone. (iii): By (ii) and Lemma 13.3 (i) applied with A replaced by L ∗ L we have \({\operatorname {dom}} q_{L^*L}={\operatorname {dom}} q^*_{L^*L}=X\), hence
\({\operatorname {dom}} q_{L^*L}-{\operatorname {dom}} g=X.\) (13.46)
It follows from (13.46), [1, Corollary 2.1] and Lemma 13.3 (ii)&(i) that \({\operatorname {dom}} (q_{L^*L}+g)^*={\operatorname {dom}} q_{L^*L}^*+{\operatorname {dom}} g^{*} ={\operatorname {dom}} q_{(L^*L)^{-1}}+{\operatorname {dom}} g^*=X+{\operatorname {dom}} g^*=X\). (iv): Combine (13.46), [1, Corollary 2.1] and Lemma 13.3 (i). (v): Since \(q_{L^*L}\) is strictly convex, so is \(q_{L^*L}+g\), which in view of [5, Proposition 18.9] and (iii) implies that \((q_{L^*L}+g)^*\) is Fréchet differentiable on \(X={\operatorname {int}} {\operatorname {dom}} (q_{L^*L}+g)^*\). (vi): Using (iv), Fact 13.8 (i) applied with f replaced by \(q_{L^*L}+g\), (v) and [5, Proposition 17.31(i)] we have \((L^*L+\partial g)^{-1}=(\partial (q_{L^*L}+g))^{-1} =\partial (q_{L^*L}+g)^* =\{\nabla (q_{L^*L}+g)^*\}\) is single-valued with \({\operatorname {dom}} (L^*L+\partial g)^{-1}=X\).
(vii): Let \(x\in X={\operatorname {dom}} (L^*L+\partial g)^{-1}\) and let y ∈ X such that y = x − L(L ∗ L + ∂g)−1 L ∗ x. Then using (vi) we have
\(u:=(L^*L+\partial g)^{-1}L^*x\ \text{is well defined and}\ y=x-Lu.\) (13.47)
Consequently, L ∗ y + L ∗ Lu = L ∗ x ∈ L ∗ Lu + ∂g(u), hence L ∗ y ∈ ∂g(u), equivalently, in view of Fact 13.8 (i) applied with f replaced by g, u ∈ (∂g)−1(L ∗ y) = ∂g ∗(L ∗ y). Combining with (13.47) we learn that
\(x=y+Lu\in y+L\partial g^*(L^*y).\) (13.48)
Note that [5, Fact 2.25(vi) and Fact 2.26] implies that \({\operatorname {ran}} L^*={\operatorname {ran}} L^*L=X\), hence \(0\in {\operatorname {sri}}({\operatorname {dom}} g^*-{\operatorname {ran}} L^*)\). Therefore one can apply [5, Corollary 16.53(i)] to re-write (13.48) as \(x\in ({\operatorname {Id}}+\partial (g^*\circ L^*))y\). Therefore, \(y={\operatorname {Prox}}_{g^*\circ L^*}x\) by [5, Proposition 16.44]. (viii): Apply Fact 13.8 (ii) with f replaced by g ∗∘ L ∗. □
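For the special case \(g=\tfrac 12\|\cdot \|{ }^2\), the formulas in Proposition 13.3 (vii)–(viii) reduce to a matrix identity (a form of the Woodbury identity) that can be verified directly on an illustrative L:

```python
import numpy as np

# Take g = (1/2)||.||^2 (so partial g = Id) and an L with L^T L invertible.
# Then g* o L* (x) = (1/2)||L^T x||^2, whose proximal mapping is
# (I + L L^T)^{-1}, while items (vii) and (viii) predict
# Prox_{g* o L*} = Id - L (L^T L + I)^{-1} L^T and its Moreau complement.
rng = np.random.default_rng(2)
L = rng.standard_normal((4, 2))            # full column rank almost surely

I2, I4 = np.eye(2), np.eye(4)
formula_vii = I4 - L @ np.linalg.solve(L.T @ L + I2, L.T)   # item (vii)
formula_viii = L @ np.linalg.solve(L.T @ L + I2, L.T)       # item (viii)
direct = np.linalg.inv(I4 + L @ L.T)                        # prox computed directly
```

The agreement of `formula_vii` with `direct` is exactly the Woodbury identity, and the two formulas sum to the identity, as Fact 13.8 (ii) predicts.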
13.1.2 Appendix 2
Lemma 13.4
Let \(g\colon Y \to \left ]-\infty ,+\infty \right ]\) be convex, lower semicontinuous, and proper. Consider the following statements:
-
(i)
g is strongly convex.
-
(ii)
g ∗ is Fréchet differentiable and ∇g ∗ is Lipschitz continuous.
-
(iii)
g ∗∘ L ∗ is Fréchet differentiable and ∇(g ∗∘ L ∗) = L ∘ (∇g ∗) ∘ L ∗ is Lipschitz continuous.
-
(iv)
(g ∗∘ L ∗)∗ is strongly convex.
Proof
(i)⇔(ii): See [5, Theorem 18.15]. (ii)⇒(iii): Clearly g ∗∘ L ∗ is Fréchet differentiable. Now let (x, y) ∈ X × X and suppose that β > 0 is a Lipschitz constant of ∇g ∗. It follows from [5, Corollary 16.53] that \(\|\nabla (g^*\circ L^*)x-\nabla (g^*\circ L^*)y\| =\|L\nabla g^*(L^*x)-L\nabla g^*(L^*y)\| \le \beta \|L\|\,\|L^*x-L^*y\| \le \beta \|L\|{ }^2\|x-y\|\). (iii)⇔(iv): Use the equivalence of (i) and (ii) applied with g replaced by (g ∗∘ L ∗)∗. □
13.1.3 Appendix 3
We start by recalling the following well-known fact.
Fact 13.8
Let \(f\colon X\to \left ]-\infty ,+\infty \right ]\) be convex, lower semicontinuous and proper and let γ > 0. Then the following hold:
-
(i)
(∂f)−1 = ∂f ∗.
-
(ii)
\({\operatorname {Prox}}_{\gamma f}+{\operatorname {Prox}}_{(\gamma f)^*}={\operatorname {Id}}\).
Proof
(i): See, e.g., [27, Remark on page 216] or [23, Théorème 3.1].
(ii): See, e.g., [5, Theorem 14.3(iii)]. □
Lemma 13.5
Let γ > 0. The Douglas–Rachford method given in (13.9) applied to the ordered pair (γf, γg) with a starting point x 0 ∈ X to solve (13.8) can be rewritten as:
\(y_n={\operatorname {Prox}}_{\gamma f}x_n\) (13.49a)
and
\(x_{n+1}=y_n-{\operatorname {Prox}}_{(\gamma g)^*}(2y_n-x_n).\) (13.49b)
Proof
Using (13.9a), (13.10), and Fact 13.8 (ii) applied with f replaced by g we have
and the conclusion follows. □
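The equivalence of the two forms of the update can be checked directly on quadratics, with illustrative data:

```python
import numpy as np

# f(x) = (1/2)||x - p||^2, g(x) = (1/2)||x - q||^2, gamma > 0. The prox of
# (gamma g)* is computed from (gamma g)*(y) = ||y||^2/(2 gamma) + <q, y>
# directly, not via the Moreau identity, so the comparison is a genuine check.
gamma = 0.7
p, q = np.array([1.0, -0.5]), np.array([2.0, 3.0])

prox_gf = lambda x: (x + gamma * p) / (1 + gamma)        # Prox_{gamma f}
prox_gg = lambda x: (x + gamma * q) / (1 + gamma)        # Prox_{gamma g}
prox_ggstar = lambda z: gamma * (z - q) / (1 + gamma)    # Prox_{(gamma g)*}

x = np.array([5.0, -4.0])
y = prox_gf(x)
x_next_original = x - y + prox_gg(2 * y - x)    # DR update in its original form
x_next_rewritten = y - prox_ggstar(2 * y - x)   # rewritten update of Lemma 13.5
```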
13.1.4 Appendix 4
Proposition 13.4
Let (x, y, z) ∈ X × Y × Z and let B and \(\widetilde {f}\) be defined as in (13.39) and (13.41). Then the following hold:
-
(i)
B ∗ y = (A ∗ y, C ∗ y).
-
(ii)
\({\operatorname {dom}} \widetilde {f}={\operatorname {dom}} f\times \{0\}\).
-
(iii)
\((\forall (x,z)\in {\operatorname {dom}} \widetilde {f})\) we have z = 0 and B(x, z) = Ax.
-
(iv)
\(B({\operatorname {dom}} \widetilde {f})=A({\operatorname {dom}} f)\).
-
(v)
\(0\in {\operatorname {sri}} ({\operatorname {dom}} g-B({\operatorname {dom}} \widetilde {f}))\).
-
(vi)
\({\operatorname {argmin}}(\widetilde {f}+g\circ B)={\operatorname {argmin}} (f+g\circ A)\times \{0\} \neq \varnothing \).
-
(vii)
\({\operatorname {Prox}}_{\widetilde {f}}(x,z)=({\operatorname {Prox}}_f x,0)\).
-
(viii)
\({\operatorname {Prox}}_{(\tau g\circ B)^*}=\tau B^*{\operatorname {Prox}}_{\sigma g^*}(\sigma B)\).
Proof
(i): This clearly follows from (13.39). (ii): It follows from (13.41) that \({\operatorname {dom}}\widetilde {f}={\operatorname {dom}} f\times {\operatorname {dom}}\iota _{\{0\}}={\operatorname {dom}} f\times \{0\}\). (iii): The claim that z = 0 follows from (ii). Now combine with (13.39). (iv): Combine (ii) and (iii). (v): Combine (iv) and (13.34). (vi): We have
where (13.51a) follows from (v) and (13.4) applied with (f, g, L) replaced by \((g,\widetilde {f},B)\), and (13.51b) follows from (13.39) and (13.41). Therefore, \((x,z)\in {\operatorname {argmin}}(\widetilde {f}+g\circ B)\) ⇔ [z = 0 and \(x\in {\operatorname {zer}}(\partial f+A^* \circ \partial g\circ A)\)] ⇔ \((x,z)\in {\operatorname {argmin}}(f+g\circ A) \times \{0\}\). Now combine with (13.4). (vii): Combine (13.41) and [5, Proposition 23.18]. (viii): Indeed, Proposition 13.3 (viii) implies
□
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Moursi, W.M., Zinchenko, Y. (2019). A Note on the Equivalence of Operator Splitting Methods. In: Bauschke, H., Burachik, R., Luke, D. (eds) Splitting Algorithms, Modern Operator Theory, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-25939-6_13
Print ISBN: 978-3-030-25938-9
Online ISBN: 978-3-030-25939-6