Abstract
We solve the non-linearized and linearized obstacle problems efficiently using a primal-dual hybrid gradients method involving projection and/or \(L^1\) penalty. Because this method requires no matrix inversions or explicit identification of the contact set, it achieves, on a variety of test problems, the precision of previous methods with a speed-up of 1–2 orders of magnitude. The derivation of this method is disciplined, relying on a saddle point formulation of the convex problem, and can be adapted to a wide range of other constrained convex optimization problems.
1 Introduction
The obstacle problem is a classic motivating example in the mathematical study of variational inequalities and free boundary problems with broad applications, e.g., in fluid filtration in porous media, constrained heating, elasto-plasticity, optimal control, or financial mathematics [2, 7, 10, 22]. The classical formulation is motivated by the equilibrium position of a membrane (described by the graph of \(u:\varOmega \subset {\mathbb {R}}^n\rightarrow {\mathbb {R}}\)), with fixed values f imposed on the boundary \(\partial \varOmega \) and constrained to lie (almost everywhere) above an obstacle function, \(\varphi :\varOmega \rightarrow {\mathbb {R}}\):
For small deflections, the above integral (1.1) is typically linearized into the Dirichlet integral:
Within the Sobolev space \(H^1(\varOmega )\), the admissible functions for either obstacle problem, (1.1) or (1.2), define the closed convex set \(U = U(\varOmega ,\varphi ,f)\):
Standard arguments from variational analysis can be used to show that a unique minimizer of (1.2) exists for sufficiently regular data \((\varOmega ,\varphi ,f)\); see, for example, [2, 10, 22]. The non-coercivity of the nonlinear problem (1.1) can be handled using the methods in [7].
In practice, numerical methods are required to solve the obstacle problem. Lions and Mercier [16] illustrated a splitting algorithm on the obstacle problem. Finite element methods were used to solve the obstacle problem as a free boundary problem [5, 11, 15]. Iteration methods based on penalty and projected gradients are well-known approaches [6, 23]. In [12, 13], the obstacle constraint is incorporated into the objective using a Lagrange multiplier. Multigrid and multilevel methods have been employed to speed up computation: the obstacle problems described by variational inequalities can be solved using the iterative scheme proposed in [14], where at each step the system is solved by a multigrid algorithm; [28] discretizes the problem into a continuous piecewise linear finite element space and turns it into a quadratic programming problem with inequality constraints, which is then solved with multilevel projection (MP).
In [17], the authors propose a modified level set method to solve the obstacle problem and turn the problem into a nonsmooth minimization problem. Two methods are presented to solve the minimization problem: a proximal bundle method is employed for solving a general nonsmooth minimization problem; a gradient method is proposed to solve the regularized problems.
Most recently, in [26], the authors proposed an efficient numerical scheme for solving the (linearized) obstacle problem (and generalizations) based on a reformulation of the obstacle in terms of an \(L^1\)-like penalty on the variational problem, in lieu of the obstacle constraint:
where \((\cdot )_+=\max (\cdot , 0)\). It was shown that for sufficiently large but finite \(\mu \), the minimizer of the unconstrained problem (1.4) is also the minimizer of the original, constrained problem (1.2).
The functionals and constraints in (1.1), (1.2), and (1.4) are all closed and convex, which makes the problems accessible to convex optimization techniques. There is a huge corpus of algorithms that apply to such convex optimization problems. Recently, primal-dual splitting algorithms have gained particular attention, most notably so in the context of TV and \(L^1\)-type problems in imaging; see, e.g., [8, 9, 29, 30]. Zosso and Bustin [31] have applied the theory to the particular case of the Beltrami functional [24], used as an interesting regularizer in imaging problems that interpolates between the classical \(H^1\) and TV regularizers, and which is a generalization of the surface area functional. Zosso and Osting [32] use the method for efficient minimization of a minimal surface-based graph partitioning problem.
In this paper, we develop a primal-dual hybrid gradient method to efficiently solve constrained and \(L^1\)-penalty formulations of the non-linear and linearized obstacle problems, (1.1) and (1.2). We note that the original constrained problem is algorithmically related to the \(L^1\)-based problem for finite \(\mu \), and strictly identical for the particular choice \(\mu =\infty \). Our results achieve state-of-the-art precision in much shorter time; the speed-up is 1–2 orders of magnitude with respect to the method in [26], and even larger compared to older methods [16, 17, 27]. This is demonstrated for a variety of one- and two-dimensional problems, including the extension to elasto-plastic problems with double obstacles and a potential term.
It is important to note that our fast algorithm is derived from a primal-dual reformulation of the original convex optimization problems in a very disciplined way. The primal-dual hybrid gradients method employed here enjoys straightforward convergence guarantees. Moreover, in the particular case of the linear obstacle problem, the dual variable update is particularly simple, and the dual variable can be eliminated altogether, resulting in a single primal variable update scheme that improves over standard explicit gradient descent in multiple ways. Namely, the resulting scheme is fast because previous iterates keep being taken into account, which has highly interesting interpretations in terms of a “momentum method” and can also be reformulated as a damped wave equation. We illustrate this behavior on a very simple yet striking toy example. Further advantages of our method are the avoidance of matrix/operator inversion (all update steps are explicit), and the fact that the contact/coincidence set (the free boundary) is handled implicitly, with no explicit tracking necessary.
1.1 Outline
The remainder of the paper is organized as follows: Sect. 2 gives a formulation of the discretized obstacle problem and its different flavors. Section 3 describes the proposed primal-dual algorithms. In Sect. 4, we compare the resulting primal-dual algorithms to some more classical ones. Section 5 presents the results of the primal-dual algorithm on a variety of numerical experiments. We conclude in Sect. 6 with a discussion.
2 Discretized Obstacle Problem
In this section, we introduce a discretized obstacle problem and discuss some of its properties. To this end, we introduce the following notation.
Let X denote the set of Cartesian grid points of the discretization of the D-dimensional domain \(\varOmega \subset {\mathbb {R}}^D\).
2.1 Inner Products and Norms
For functions \(f,g:X\rightarrow {\mathbb {R}}\), we define the \(L^2\) inner product
and the induced \(L^2\)-norm
Similarly, for vectors \(v,w:X\rightarrow {\mathbb {R}}^D\), we define the \(L^2\) inner product
and obtain the induced \(L^2\)-norm
where \(v_d(x)\) denotes the d-th component of the vector at location x. For a vector \(v:X\rightarrow {\mathbb {R}}^D\), the magnitude \(|v|:X \rightarrow {\mathbb {R}}\) is defined as follows:
The magnitude operator, \(|\cdot |:L^2(X)^D\rightarrow L^2(X)\), thus maps vector-valued functions to scalar-valued functions. It is easy to see that \(\Vert v\Vert _{L^2(X)^D} = \Vert |v| \Vert _{L^2(X)}\). Finally, for the obstacle penalty, we need the single sided \(L^1\)-penalty of a function \(f:X\rightarrow {\mathbb {R}}\):
A corresponding negative-sided penalty can be obtained directly as
It is easy to verify that \(\Vert f\Vert _+, \Vert f\Vert _-\ge 0\), and \(\Vert f\Vert _+ + \Vert f\Vert _- = \Vert f\Vert _1\).
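As a quick illustration of the one-sided penalties and the identity above, consider the following sketch in Python/NumPy (the paper's experiments use MATLAB; this snippet is merely illustrative, with unit quadrature weights):

```python
import numpy as np

# sample values of a grid function f : X -> R
f = np.array([1.5, -2.0, 0.0, 3.0, -0.5])

pos = np.sum(np.maximum(f, 0.0))    # ||f||_+ : penalizes the positive part only
neg = np.sum(np.maximum(-f, 0.0))   # ||f||_- : penalizes the negative part only

assert pos >= 0.0 and neg >= 0.0
assert np.isclose(pos + neg, np.sum(np.abs(f)))   # ||f||_+ + ||f||_- = ||f||_1
```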
2.2 Finite Difference Operators
For functions \(f:X\rightarrow {\mathbb {R}}\), we denote by \(\nabla _X:L^2(X)\rightarrow L^2(X)^D\) a suitable discretization of the continuous gradient by finite differences, subject to desired boundary constraints. Using the discrete gradient, for vectors \(v:X\rightarrow {\mathbb {R}}^D\) we define the discretized divergence \(\mathrm{div}_X :L^2(X)^D\rightarrow L^2(X)\), such that the two operators satisfy the usual negative adjoint property:
For functions \(f:X\rightarrow {\mathbb {R}}\), the composition of divergence and gradient yields the discrete Laplacian operator \(\varDelta _X:L^2(X)\rightarrow L^2(X)\):
For example, we consider the standard forward differences stencil for the gradient; then the adjoint relation leads to backward differences in the divergence and the resulting discretized Laplacian corresponds to the commonly used \(2D+1\) point centered differences stencil.
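The stencils just named can be sketched on a 1D grid with unit spacing in Python/NumPy; the forward-difference gradient and the resulting backward-difference divergence are assumptions consistent with the text, and the assertion checks the negative-adjoint property:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
f = rng.standard_normal(N)        # scalar grid function
p = rng.standard_normal(N - 1)    # "vector field" living on the grid edges

grad = np.diff(f)                             # forward differences
div  = np.diff(np.r_[0.0, p, 0.0])            # backward differences (adjoint)
lap  = np.diff(np.r_[0.0, np.diff(f), 0.0])   # div o grad: 3-point Laplacian

# the negative-adjoint property <grad f, p> = -<f, div p>
assert np.isclose(np.dot(grad, p), -np.dot(f, div))
```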
2.3 Surface Area and Dirichlet Energy
Based on the preceding definitions, we define the following two functionals: Let \(S:L^2(X)^D\rightarrow {\mathbb {R}}\) be defined as
For a Monge surface, discretized by the graph of \(u:X\rightarrow {\mathbb {R}}\), the discrete surface area is then measured by \(S[\nabla _Xu]\). Similarly, let \(D:L^2(X)^D\rightarrow {\mathbb {R}}\) be defined as
We then call \(D[\nabla _Xu]\) the discrete Dirichlet energy of u.
For slowly varying functions u, i.e., such that \(\forall x\in X:|\nabla _X u|(x) \ll 1\), the Dirichlet energy is an approximation to the surface area,
where |X| denotes the number of sampling points in X, i.e., the area of the discretized domain \(\varOmega \). Note that both the surface area and the Dirichlet energy functional are proper, closed, and convex.
2.4 Convex Conjugates
We recall the definition of the convex conjugate (a.k.a. Legendre–Fenchel transform) \(f^*\) of a function f:
The biconjugate, \(f^{**}:=(f^*)^*\), is the largest closed convex function with \(f^{**}\le f\). As a result, \(f\equiv f^{**}\) iff f is closed convex (Fenchel–Moreau–Rockafellar Theorem) [21, Theorem 5].
Using Definitions 2.4 and 2.11, we can determine the respective convex conjugate functionals, \(D^*,S^*:L^2(X)^D\rightarrow {\mathbb {R}}\) of the Dirichlet energy and surface area as follows:
and
Thanks to the Fenchel–Moreau theorem we know that the biconjugate functionals, \(S^{**}\) and \(D^{{**}}\), are identical to the original functionals, thus
2.5 Discretized Obstacle Problem
Having defined the discretized surface area and Dirichlet energy functionals, we next define the discretized optimization problem.
Given a discretization X and appropriate finite difference operators for gradient, divergence, and Laplacian, we declare a subset \(\partial X\subset X\) to be boundary points. Let \(\varphi :X\rightarrow {\mathbb {R}}\) be an obstacle, and \(f:\partial X\rightarrow {\mathbb {R}}\) the prescribed boundary values, compatible with the obstacle, i.e., \(\forall x\in \partial X:\varphi (x) \le f(x)\). Consider further the set \(U_f\) of square integrable functions that satisfy the boundary constraints,
and the subset \(U_{f,\varphi }\) thereof additionally satisfying the obstacle constraint,
Both these sets can be shown to be closed convex. We are interested in solving the following four (primal) problems:
2.5.1 Constrained Obstacle Problem
2.5.2 Constrained Linearized Obstacle Problem
2.5.3 \(L^1\)-Penalty Obstacle Problem
2.5.4 Linearized \(L^1\)-Penalty Obstacle Problem
Due to the convexity of the functionals and admissible sets, a solution exists whenever the admissible sets are non-empty. The following lemma shows that the solutions to (P2) and (P4) are unique.
Lemma 1
Provided the graph X is connected, the boundary points \(\partial X \ne \emptyset \), and the admissible sets are non-empty, the solutions to (P2) and (P4) are unique.
Proof
We prove the result for (P4); the result for (P2) is similar.
Let \(u,v \in U_f\) be optimal solutions and \(\alpha \in [0,1]\). Denote the objective by
By the convexity of J, any convex combination of u and v must have the same value, so
Writing out the definition of J and rearranging, this is equivalent to
Here we have used the convexity of \(\Vert \cdot \Vert _+\). But \(\langle -\varDelta _X \cdot , \cdot \rangle \) is a positive semi-definite form and, since X is assumed to be connected, has only the constant functions in its kernel. This implies that \(u-v = c\,1\), where \(c \in {\mathbb {R}}\) and \( 1 :X \rightarrow {\mathbb {R}}\) is the constant function. Since \(u,v \in U_f\), i.e., they satisfy the same boundary conditions, we conclude that \(u=v\), as desired.
In [26, Theorem 3.1] it is shown that for any \(\mu \ge - \varDelta \phi \), the minimizers of the constrained and \(L^1\)-penalty linearized obstacle problems, the continuum version of (P2) and (P4), are the same. Below, in Theorem 1, we give the analogous result for the discrete problems, (P2) and (P4). The proof is largely the same, except for one small but interesting point. In the continuum, one has that for any \(f \in H^1\), \(\langle \nabla f_+ , \nabla f_- \rangle = 0\), where \(f_+ = \max (f,0)\), \(f_- = \min (f,0)\), and \(f = f_+ + f_-\). In our discrete setting we have the following result.
Lemma 2
For any \(f \in L^2(X)\), \(\langle \nabla _X f_+ , \nabla _X f_- \rangle _{L^2(X)^D} \ge 0\).
Proof
Denoting the adjacent vertices of \(x\in X\) by N(x), we compute
Theorem 1
Let u and \(u_\mu \) denote the minimizers of (P2) and (P4), respectively. For any \(\mu \ge \max \{ - \varDelta _X \phi (x) :x \in X\}\), we have that \(u_\mu = u\).
Proof
Consider \(w := u_\mu + (\phi - u_\mu )_+\). Abbreviate \(\langle \cdot , \cdot \rangle = \langle \cdot , \cdot \rangle _{L^2(X)^D}\) and \(\Vert \cdot \Vert = \Vert \cdot \Vert _{L^2(X)^D}\). Evaluating the objective of (P4) at the admissible \(w \in U_{f,\phi }\), we obtain
Now assuming \(\mu \ge \max \{ - \varDelta _X \phi (x) :x \in X\}\) and using Lemma 2, we obtain
Since \(u_\mu \) is the unique minimizer of \(D[\nabla _X\cdot ] + \mu \Vert \phi - \cdot \Vert _+ \) over \(U_f\), we conclude that \(w = u_\mu \) which implies that \(u_\mu \ge \phi \), i.e., \(u_\mu \in U_{f,\phi }\). Additionally, we have that
Since u is the unique minimizer of \(D[\nabla _X\cdot ]\) over \(U_{f,\phi }\) and \(u_\mu \) is admissible, we conclude that \(u = u_\mu \), as desired.
2.6 Saddle Point Formulation
Thanks to the identity of a closed convex functional with its biconjugate, we can substitute the functionals \(S[\nabla _Xu]\) and \(D[\nabla _Xu]\) by the definition of their respective biconjugates (2.14a) and (2.14b). We thus obtain the four primal-dual saddle point problems, respectively equivalent to the primal problems (P1)–(P4):
2.6.1 Constrained Obstacle Problem
2.6.2 Constrained Linearized Obstacle Problem
2.6.3 \(L^1\)-Penalty Obstacle Problem
2.6.4 Linearized \(L^1\)-Penalty Obstacle Problem
For discussing algorithms for the preceding problems (PD1)–(PD4), it is convenient to summarize these saddle point problems in the following form:
where we identify the admissible sets, U and \(W^*\), and the functionals, \(F^*\) and G, as follows:
3 Algorithms
3.1 Primal-Dual Hybrid Gradients Scheme
To solve the general saddle point problem (2.17) efficiently, we propose an adaptation of the primal-dual hybrid gradients algorithm [8, 29, 30, 32]. For (2.17), the structure of the PDHG algorithm is as follows:
The first two steps are the proximal updates of the dual and primal variable, respectively. The third, extra-gradient step is an overrelaxation of the primal update in order to overcome the step-size shortening typical of first-order methods; it is a prediction of the primal variable update used in the dual variable update. Note the appearance of \(\bar{u}\) instead of u in the dual variable update (3.1a).
In general, O(1/n) convergence (where n is the number of iterations) has been shown for fixed \(r_1,r_2\) satisfying
where \(\Vert \nabla _X\Vert \) is the operator norm/induced norm of the discretized gradient operator; equivalently, \(\Vert \nabla _X\Vert ^2=\Vert \varDelta _X\Vert \) is the induced norm of the composition \(\varDelta _X = \mathrm{div}_X\circ \nabla _X\).
In the following subsections, we show how the respective minimization sub-problems (3.1a) and (3.1b) can be solved efficiently for each specific configuration in (2.18) corresponding to (PD1)–(PD4).
3.2 Dual Variable Update
We first address the update of the dual variable \(p\). This step is unaffected by the choice of constrained or \(L^1\)-penalty formulation, but differs between the non-linear, \(S[\nabla _Xu]\), and linearized, \(D[\nabla _Xu]\), case.
3.2.1 Linear Cases, (PD2) and (PD4)
In the linearized cases (PD2) and (PD4), where \(F^*=D^*\), the minimization problem of the dual variable update (3.1a) is given by
The minimizer of this quadratic function is immediately found as
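Since the displayed equations are not reproduced above, the following first-order optimality computation is a sketch, under the standard assumption that \(D^*[p]=\tfrac{1}{2}\Vert p\Vert ^2_{L^2(X)^D}\) (the conjugate of the quadratic Dirichlet functional):

```latex
% Proximal dual update for the linearized case (sketch; assumes
% D^*[p] = (1/2)||p||^2, the conjugate of the quadratic Dirichlet energy):
p^{n+1} = \arg\min_{p}\ \tfrac{1}{2}\Vert p\Vert^2
          - \langle \nabla_X \bar{u}^n, p\rangle
          + \tfrac{1}{2r_1}\Vert p - p^n\Vert^2 .
% Setting the gradient with respect to p to zero,
p^{n+1} - \nabla_X \bar{u}^n + \tfrac{1}{r_1}\bigl(p^{n+1} - p^n\bigr) = 0
\quad\Longrightarrow\quad
p^{n+1} = \frac{p^n + r_1\, \nabla_X \bar{u}^n}{1 + r_1}.
```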
3.2.2 Non-Linear Cases (PD1) and (PD3)
In the nonlinear cases (PD1) and (PD3), where \(F^*=S^*\), by rationalizing the numerator of the dual, the minimization problem of the dual variable update (3.1a) can be rewritten as
We propose to solve this in an iteratively reweighted least squares (IRLS) approach as follows. We observe that the objective in (3.5) without the square-root in the denominator is quadratic in \(p\). We fix the current estimate of \(p\) in the square-root to obtain a weighted least squares problem; the weight is then updated, and the process is repeated until convergence:
Our focus is on the solution of the fixed-point minimization step (3.6b). Indeed, it can be shown (see [32, Lemma 4.3]) that for small enough \(r_1\), the solution to (3.6b) is given by
where \(\varepsilon ^t(x):=\sqrt{1-|\psi ^t|(x)^2}\). Further, for suitably chosen \(r_1\), the iterative scheme (3.6) converges to the desired fixed point (see [32, Lemma 4.4]).
3.3 Primal Variable Update
We now consider the update problem (3.1b) of the primal variable. Here, the choice of non-linear versus linearized cost functional is irrelevant; instead, we distinguish the original constraint formulations (PD1) and (PD2) from the relaxed \(L^1\)-penalty formulations (PD3) and (PD4).
3.3.1 Constrained Formulation (PD1) and (PD2)
The constrained primal variable iteration (3.1b) reads
Based on the projection theorem and variational inequalities [6], and in the tradition of the Arrow–Hurwitz–Uzawa algorithm [1] and the original proposal of the PDHG algorithm [29], this update is approximated by seeking a minimizer in the relaxed set \(L^2(X)\) and projecting onto the admissible set \(U_{f,\varphi }\). This approximation to the iteration can be written
with
and where the projection \(P_{U_{f,\varphi }}:L^2(X) \rightarrow {U_{f,\varphi }}\) is simply defined as
Therefore, more explicitly for \(x\in X\), the update of the primal variable in the constrained setting (PD1), (PD2), is:
3.3.2 \(L^1\)-Penalty Formulations (PD3) and (PD4)
The \(L^1\)-penalty method relaxes the admissible set to \(U_f\), i.e., \(L^2(X)\) subject to only the boundary values f on \(\partial X\), and introduces the \(L^1\)-like penalty term \(\Vert \varphi -u\Vert _+\), instead. The resulting instance of (3.1b) can be rewritten as
The minimizer can be easily found by making a change of variables, \(z=\varphi -u\), leading to
where \({{\mathrm{shrink}}}_+\) denotes single-sided soft-thresholding (shrinkage) as follows:
Let again \(u^\star =u^n+r_2\mathrm{div}_X p^{n+1}\). We re-substitute \(u^{n+1}=\varphi -z^\star \) and respect the boundary conditions through projection \(P_{U_f}\),
This leads to the following explicit primal update for the \(L^1\)-penalty based obstacle problems, (PD3), (PD4):
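As the display equations are not reproduced here, the following Python sketch gives one plausible realization of the single-sided shrinkage operator, consistent with the case analysis in Remark 1: values exceeding the threshold are shrunk, small positive values are set to zero, and negative values pass through unchanged.

```python
import numpy as np

def shrink_plus(z, kappa):
    """Single-sided soft-thresholding: shrinks only the positive part of z."""
    return np.where(z > kappa, z - kappa, np.where(z > 0.0, 0.0, z))

z = np.array([-1.0, 0.2, 0.5, 2.0])
out = shrink_plus(z, 0.5)
assert np.allclose(out, [-1.0, 0.0, 0.0, 1.5])
```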
Remark 1
Recall that in Theorem 1, we proved that the solutions to the constrained and \(L^1\)-penalized obstacle problems, (PD2) and (PD4), are equivalent for \(\mu \ge \max \{-\varDelta _X \varphi (x) :x \in X\}\); see also [26]. In light of this, it is interesting to compare primal iterations (3.12) and (3.17) for the constrained and \(L^1\)-penalized obstacle problems. These two algorithms only disagree for points \(x \notin \partial X\) with \(u^\star (x) < \varphi (x)-r_2\mu \), i.e., when the tentative membrane update \(u^\star \) violates the obstacle constraint by an amount in excess of \(r_2\mu \). This suggests that not only do the solutions of these two problems agree for sufficiently large \(\mu \), but so do all of the iterates in the proposed primal-dual methods.
3.4 Elimination of the Dual Variable in the Linear Case, (PD2) and (PD4)
In the updates of the primal variable, the dual variable only appears through its divergence, \(\mathrm{div}_X p^{n+1}\); see (3.10) being used in (3.12) and (3.17). However, in the linear case, (PD2) and (PD4), by applying the divergence to (3.4), it is easy to see that
Assuming \(p^{0}(x)=0\) for all \(x\in X\), this recursion relation can be solved, giving
This allows rewriting the primal variable updates without the dual variable, effectively eliminating the dual variable from the scheme altogether. After elimination, the intermediate primal update (3.10) is now
which is then projected/shrunk according to (3.12) or (3.17) as before.
3.5 Complete Algorithms
The complete algorithms for the four different flavors of the obstacle problem (PD1)–(PD4) are summarized in Algorithms 1–2. These algorithms have extremely simple structure and require only low-complexity operations. In particular, no matrix inversions are required. These algorithms converge at rate O(1/n) (where n is the number of iterations) for \(r_1,r_2\) satisfying
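For concreteness, the linearized, constrained flavor (PD2) can be sketched in Python/NumPy on a 1D grid with unit spacing; this is an illustrative re-implementation, not the authors' MATLAB code. It chains the closed-form dual update, the intermediate primal step \(u^\star = u^n + r_2\,\mathrm{div}_X p^{n+1}\), the obstacle projection, and the extragradient step. The default parameters are the paper's own values \(r_1=0.01\), \(r_2=12.5\), which satisfy \(r_1 r_2\Vert \nabla _X\Vert ^2 \le 0.5 < 1\).

```python
import numpy as np

def solve_obstacle_1d(phi, f_left, f_right, r1=0.01, r2=12.5, iters=20000):
    """PDHG sketch for the constrained, linearized obstacle problem (PD2), 1D.

    Dual proximal update in closed form, projected primal update,
    and extragradient step; unit grid spacing assumed.
    """
    N = phi.size
    u = np.maximum(np.linspace(f_left, f_right, N), phi)  # feasible start
    ubar = u.copy()
    p = np.zeros(N - 1)                                   # dual lives on edges
    for _ in range(iters):
        p = (p + r1 * np.diff(ubar)) / (1.0 + r1)         # dual prox (closed form)
        u_star = u + r2 * np.diff(np.r_[0.0, p, 0.0])     # u* = u^n + r2 div p
        u_new = np.maximum(u_star, phi)                   # obstacle projection
        u_new[0], u_new[-1] = f_left, f_right             # boundary values
        ubar = 2.0 * u_new - u                            # extragradient step
        u = u_new
    return u
```

On a parabolic 1D obstacle with zero boundary values, the iterates settle into the expected structure: the solution coincides with the obstacle on the contact set and is (discretely) harmonic, i.e., linear, away from it.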
Remark 2
Although Algorithms 1 and 2 appear to be specifically engineered to serve a particular purpose, it is important to note that they have been derived in a disciplined manner: by re-formulating the original, primal obstacle problem as a saddle-point problem and implementing a primal-dual hybrid gradients algorithm with known convergence guarantees.
4 Interpretation/Comparisons for Proposed Algorithm
In this section we compare the proposed primal-dual algorithm (summarized in Algorithms 1 and 2) to two time-stepping schemes arising in classical problems.
4.1 Comparison to the Forward Euler Scheme for the Discrete Heat Equation
For a comparison, recall that the forward Euler time-stepping scheme for the discrete heat equation can be written
where \(r_1r_2\) is the time step. Thus, after elimination of the dual variable, the primal-dual method for the linear problem seems to perform something akin to an alternating sequence of explicit heat diffusion steps (4.1), projection/thresholding (3.12) or (3.17), and an extra gradient step (3.1c). Unfortunately, the explicit heat diffusion (4.1) is well-known to be severely limited by its stability criterion and therefore dreadfully slow.
Indeed, the first term, \(\eta =0\), of the summation in (3.20) is an explicit heat update step, subject to the usual constraint on the time step, here \(r_1r_2\le \rho (\varDelta _X)^{-1}\). However, thanks to the sum, previous iterates are taken into account (discounted by a compounding factor of \(\frac{1}{1+r_1}\)), the impact of which is essential for the efficiency and convergence of the scheme. Consider the following two cases:
Away from the solution, subsequent heat updates point towards the same direction and the sum lets the history of updates interfere constructively, effectively greatly increasing the time step of the explicit heat update. Assuming \(\varDelta _X\bar{u}^{n-\eta }\approx \varDelta _Xu^n\) is constant for recent iterations (small \(\eta \)), and noting that \(\sum _{\eta =0}^n \frac{1}{(1+r_1)^{\eta +1}} = \frac{1}{r_1}\), we can approximate (3.20) by
The proposed scheme is thus heat diffusion with a \(1/r_1\)-fold larger effective time-step, see (4.1). This is more pronounced for \(r_1 \ll 1\), reducing the discounting on previous time steps (i.e., increasing the memory horizon of the summation).
Close to the solution, the membrane tends to oscillate about the optimum and subsequent heat steps have opposing signs. The summation leads to destructive interference of subsequent updates, dampening the oscillations and effectively stabilizing the explicit heat update. In this case, larger values of \(r_1\) reduce the memory horizon and allow the system to transition from constructive interference to a stabilizing, destructive interference, thereby reducing overshooting.
4.2 Comparison to a Time-Stepping Scheme for the Damped Wave Equation
Another interesting analysis of the resulting update scheme, in the linear case and away from obstacle and boundary value projections, emerges when we rewrite it as a three-point scheme in time.
Lemma 3
The proposed update iterations of the primal variable (ignoring projections) satisfy the following three-point equality:
Remark 3
The left-hand terms in (4.2) are immediately identified as finite difference approximations of \(u_{tt}^n\approx \left( u^{n+1}-2u^n+u^{n-1}\right) /h^2\) and \(u_t^n\approx \left( u^{n+1}-u^n\right) /h\), and one clearly recognizes elements of a damped wave equation:
Proof
By substituting the extragradient (3.1c) into (3.18), we get:
From (3.10) and (3.12)/(3.17), but ignoring shrinkage and projections, we get
Put together, this yields the primal update
from which (4.2) follows by simple term rearrangements.
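The equivalence between the dual-free three-point recurrence and the original primal-dual iterations (with \(p^0=0\) and no projections) can also be checked numerically. The following Python sketch compares the two trajectories on a 1D grid; the free (Neumann-like) boundary treatment induced by the padded difference operators is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40
r1, r2 = 0.1, 2.0
u0 = rng.standard_normal(N)

def grad(u): return np.diff(u)                    # forward differences
def div(p):  return np.diff(np.r_[0.0, p, 0.0])   # negative adjoint
def lap(u):  return div(grad(u))

# reference: PDHG with explicit dual variable, no projections
u, ubar, p = u0.copy(), u0.copy(), np.zeros(N - 1)
traj_pdhg = []
for _ in range(50):
    p = (p + r1 * grad(ubar)) / (1.0 + r1)        # dual prox
    u_new = u + r2 * div(p)                       # primal step
    ubar = 2.0 * u_new - u                        # extragradient
    u = u_new
    traj_pdhg.append(u.copy())

# dual-free three-point recurrence
u_prev = u0.copy()
u_cur = u0 + (r1 * r2 / (1.0 + r1)) * lap(u0)     # first PDHG step (p^0 = 0)
traj_3pt = [u_cur.copy()]
for _ in range(49):
    u_next = (u_cur + (u_cur - u_prev) / (1.0 + r1)
              + (r1 * r2 / (1.0 + r1)) * lap(2.0 * u_cur - u_prev))
    u_prev, u_cur = u_cur, u_next
    traj_3pt.append(u_cur.copy())

assert all(np.allclose(a, b) for a, b in zip(traj_pdhg, traj_3pt))
```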
The three-point primal update scheme (4.4) highlights how the proposed scheme performs explicit heat diffusion (based on the forward-projected Laplacian, in the extragradient step), on top of which the most recent update is added. This scheme is related to the “momentum method”, because it mimics the motion of a system u with a certain mass (and therefore inertia) [19]. This type of gradient descent acceleration is classically used in gradient-descent back-propagation training of neural networks [20], and has been associated with a specific type of conjugate gradients [4]. Also, the idea of maintaining momentum by mixing in an extrapolation of the most recent updates is the essence of Nesterov’s gradient acceleration scheme [18] and the core of the FISTA algorithm [3]. The idea can also be compared to gradient averaging, e.g., as used in stochastic gradient descent methods.
The evolution of a simple toy problem under both the standard explicit Euler heat update (4.1) and the proposed primal-dual scheme (3.20) is illustrated in Fig. 1. It is easy to see how the summation in (3.20) causes the peak to decay at a faster pace after the first iterate. As a result, the initial localized state splits into two components that propagate away from the center and decay, illustrating the damped wave nature of the iterations. The wave is reflected at the fixed boundaries, inverting its sign, in contrast to the explicit scheme, which remains non-negative (according to the maximum principle). Evidently, the proposed scheme converges to the (trivial) flat solution \(u=0\) much faster than the explicit heat update.
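A minimal numerical analogue of this toy experiment can be sketched as follows (Python/NumPy, 1D grid with unit spacing, zero Dirichlet boundaries; the parameters \(r_1=0.1\), \(r_2=2\) are illustrative and not those of Fig. 1):

```python
import numpy as np

N = 101
u_heat = np.zeros(N); u_heat[N // 2] = 1.0   # localized initial bump
u_pd = u_heat.copy()
r1, r2 = 0.1, 2.0                            # r1*r2 = 0.2 <= stability limit 1/4

def lap(u):
    """3-point Laplacian with zero Dirichlet boundaries."""
    out = np.zeros_like(u)
    out[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]
    return out

u_prev = u_pd.copy()                         # start three-point scheme at rest
for _ in range(1000):
    # forward Euler heat step, time step r1*r2
    u_heat[1:-1] += r1 * r2 * lap(u_heat)[1:-1]
    # dual-eliminated primal-dual scheme in three-point (damped wave) form
    u_next = (u_pd + (u_pd - u_prev) / (1.0 + r1)
              + (r1 * r2 / (1.0 + r1)) * lap(2.0 * u_pd - u_prev))
    u_next[0] = u_next[-1] = 0.0
    u_prev, u_pd = u_pd, u_next
```

After this many iterations, the momentum-like scheme has decayed toward the flat solution far more than the explicit heat update, while the explicit heat iterates remain non-negative throughout, consistent with the maximum principle.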
5 Numerical Experiments
In all following problems, we assume X to be a Cartesian sampling of either an interval in \({\mathbb {R}}\), or of a rectangular subset of \({\mathbb {R}}^2\). The differential operators are chosen as forward differences for the gradient, and, as a result, backward differences for the divergence and central differences for the Laplacian.
We implemented the algorithms in MATLAB® and ran our experiments on standard computing equipment (a 2012 laptop with a 2.80 GHz Intel® Core™ i7 and 4 GB RAM).
5.1 One-Dimensional Obstacles
We consider two obstacles on [0, 1], previously considered in [23, 26]:
and
A third obstacle (from [26]) is given by
All three problems and the obtained solutions are illustrated in Fig. 2. In the 1D case, there is of course no difference between the solutions of the linear and the non-linear versions: the solutions are identical to the obstacle on the contact set, and straight lines away from it.
5.2 Two-Dimensional Obstacles
We now look at example problems in 2D (from [26]). First, consider the square domain \(\varOmega =[0,1]^2\) and the obstacle \(\varphi _4:\varOmega \rightarrow {\mathbb {R}}\),
We solve the linear (PD2) and non-linear (PD1) obstacle problem (with 0 boundary values imposed) at different resolutions, as illustrated in Fig. 3. Parameter values are as follows: \(\mu =\infty \), \(r_1=0.01\), \(r_2=12.5\), \(\epsilon =10^{-5}\).
In Table 1 we can compare to numerical experiments of previous algorithm implementations, in particular the \(L^1\)-penalty method [26] and the Lions–Mercier splitting scheme [16], also reported in [26]. These comparisons hold only in a loose sense, because the implementation and computing infrastructure of [26] are undisclosed. Nevertheless, our method seems to outperform both by 1–2 orders of magnitude.
Next, we consider the square domain \(\varOmega =[-2,2]^2\), and the radially-symmetric obstacle
This obstacle has been used in [17, 26], and can be traced back to at least [25], albeit with a typo in its definition. Assuming \(f=0\) on a circular boundary of radius \(R=2\), the linear obstacle problem admits the following radially-symmetric analytical solution:
where \(r^*=0.6979651482\ldots \) satisfies \((r^*)^2(1-\ln (r^*/R))=1\).
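The quoted value of \(r^*\) can be verified by simple root-finding on \(g(r) = r^2\left( 1-\ln (r/R)\right) -1\); the following Python sketch uses plain bisection:

```python
import math

R = 2.0
g = lambda r: r * r * (1.0 - math.log(r / R)) - 1.0

# g is increasing near the root, with g(0.1) < 0 < g(1.0): bisect
lo, hi = 0.1, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0.0:
        hi = mid
    else:
        lo = mid

rstar = 0.5 * (lo + hi)
assert abs(rstar - 0.6979651482) < 1e-8
```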
Using the values of this analytical solution as prescribed boundary values on the boundaries of the square domain, we solve the linear obstacle problem. We set the parameters as follows: \(\mu = 0.1\), \(r_1=0.008\), \(r_2=15.625\), \(\epsilon =10^{-7}\). Results are shown in Fig. 4. The pointwise error is of the order of \(10^{-5}\) and concentrated at the circular free boundary, mainly as a result of discretization. This type of error is comparable to the results reported in [17] and [26].
For this obstacle problem, we can compare to numerical experiments of previous algorithm implementations; see Table 2. In particular, we compare the runtime and number of iterations with the level set gradient method [17]. Even assuming that the 2004 computing times of [17] may be discounted by Moore’s law (halving every 18 months, about a factor of 160–200 overall), we note that our method clearly outperforms by 1–2 orders of magnitude.
5.3 Generalization to a Double Obstacle with Forcing
A more general form of the obstacle problem deals with two obstacles, bounding the membrane from below \((\varphi )\) and above \((\psi )\), and the inclusion of a force term v acting vertically on the membrane:
It is cumbersome but straightforward to extend the proposed algorithms to this case: the double obstacle results in a second shrinkage term (albeit with opposite signs), and the force v becomes an extra term in the primal variable update, \(u^\star = u^n + r_2\mathrm{div}_Xp^{n+1} + v\). The exact details are left as an exercise to the reader.
As an illustration of the performance of this slightly extended algorithm, we consider an elastic-plastic torsion problem described in [27]. Let \(\varOmega = [0,1]^2\) and X a Cartesian sampling thereof. The obstacles are constructed as
The boundary values are \(u=0\) on \(\partial \varOmega \). The force \(v:\varOmega \rightarrow {\mathbb {R}}\) is defined as
where we further have
The algorithm on a coarse grid \(64\times 64\) (as in [27]), with \(\mu =0.1\), \(r_1=0.008\), \(r_2=15.625\), and for a tolerance \(\epsilon =10^{-6}\), converges in 1104 iterations, requiring only 0.45 s. The algorithm presented in [27] is reported to perform in 916.8–1275.1 s, which is more than three orders of magnitude slower. The computed minimum-energy membrane and the associated contact set (coincidence set) are shown in Fig. 5, and compare very well to the results in [27].
6 Conclusions and Outlook
In this paper we have proposed a primal-dual hybrid gradients based method for the efficient solution of the obstacle problem. We consider both the non-linear and linearized versions of the discrete obstacle problem, as well as both the original, constrained formulation and the \(L^1\)-penalty relaxation. All these problems are convex minimization problems, and we start by reformulating them as primal-dual problems, based on the Legendre–Fenchel transform (convex conjugate) of the surface area and the Dirichlet energy, respectively. The resulting saddle-point problem is solved by the primal-dual hybrid gradients method, which consists of three iterative steps: the dual and primal variable proximal updates, and an extra-gradient step (overrelaxation) of the primal variable. The proximal updates can be solved efficiently, and in the linear case even particularly so. Indeed, the linear case allows eliminating the dual variable altogether, resulting in a scheme consisting of the primal-variable update and extragradient step only. We demonstrate on various 1D and 2D problems that the resulting scheme outperforms current state-of-the-art methods by 1–2 orders of magnitude.
In addition to being efficient, the proposed algorithm admits an interesting physical interpretation, as discussed in Sects. 3.4 and 4.2. First, the elimination of the dual variable results in a single-variable scheme strongly reminiscent of gradient descent, except that previous iterates remain involved. As a result, a certain “momentum” builds up that accelerates the updates beyond the limits of the usual CFL step-size criteria; see Sect. 3.4. Indeed, away from the solution the previous iterates interfere constructively, resulting in a greatly increased apparent time step, whereas closer to the solution the membrane has a tendency to oscillate, and the iterates cancel out, leading to increased stability. Similarly, the scheme can be brought into a form highly reminiscent of a damped wave equation (Sect. 4.2). This behavior is particularly well illustrated by the toy example considered in Fig. 1, where the momentum splits an initial central state into two separate parts, which then propagate away laterally and eventually dissipate.
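Schematically (in our notation here, not the paper's exact derivation), combining two consecutive primal updates in the linear case and ignoring the projection away from the contact set yields a three-term recursion of the form
```latex
\frac{u^{k+1} - 2u^k + u^{k-1}}{\tau^2}
  \;+\; \alpha\,\frac{u^{k+1} - u^k}{\tau}
  \;\approx\; c^2\,\Delta_h u^k ,
```
i.e., an explicit discretization of a damped wave equation \(u_{tt} + \alpha u_t = c^2 \Delta u\), where the constants \(\alpha\) and \(c\) depend on the chosen step sizes. The second-difference term carries the “momentum” responsible for the accelerated propagation, while the damping term dissipates the oscillations near the solution.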
A general advantage of the proposed primal-dual method for the obstacle problem is the absence of matrix/operator inversions, since the proximal updates involve only explicit operations, at most applications of adjoint operators. Also, in analogy to [23, 26] but in contrast to, e.g., [12, 13], the proposed method does not need to track the contact set explicitly; the method intrinsically handles the interface between the coincidence set and the harmonic part of the membrane as a free boundary. Finally, we note that multigrid and multilevel methods, which have been employed to speed up computation in other approaches such as [28], may be applicable to the proposed method as well and can be expected to yield a further speed-up of the computations.
We remark that the properties of the proposed algorithm are not specifically engineered, but rather emerge from a disciplined derivation based on convex optimization; see Remark 2. Further applications of this method, including elliptic PDE problems, will be reported in [33].
Notes
Code available at http://www.math.montana.edu/dzosso/code.
None of these sources explicitly specifies the boundary values other than claiming \(f=0\) on “some” (undisclosed) location, clearly different from the square border \(\partial \varOmega \).
References
Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-Linear Programming. Cambridge Univ Press, Cambridge (1958)
Attouch, H., Buttazzo, G., Michaille, G. (eds.): Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, 2nd edn. SIAM, Philadelphia, PA (2014)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009). doi:10.1109/TIP.2009.2028250
Bhaya, A., Kaszkurewicz, E.: Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method. Neural Netw. Off. J. Int. Neural Netw. Soc. 17(1), 65–71 (2004). doi:10.1016/S0893-6080(03)00170-9
Braess, D., Carstensen, C., Hoppe, R.H.W.: Convergence analysis of a conforming adaptive finite element method for an obstacle problem. Numer. Math. 107(3), 455–471 (2007). doi:10.1007/s00211-007-0098-6
Brezis, H., Sibony, M.: Méthodes d’approximation et d’itération pour les opérateurs monotones. Arch. Ration. Mech. Anal. 28(1), 59–82 (1968). doi:10.1007/BF00281564
Caffarelli, L.A.: The obstacle problem revisited. J. Fourier Anal. Appl. 4(4–5), 383–402 (1998). doi:10.1007/BF02498216
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision 40(1), 120–145 (2011). doi:10.1007/s10851-010-0251-1
Esser, E., Zhang, X., Chan, T.F.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010). doi:10.1137/09076934X
Friedman, A.: Variational Principles and Free-Boundary Problems. Wiley, New York (1982)
Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer Science & Business Media, Berlin (2013)
Hintermüller, M., Ito, K., Kunisch, K.: The primal-dual active set strategy as a semismooth Newton method. SIAM J. Optim. 13(3), 865–888 (2002). doi:10.1137/S1052623401383558
Hintermüller, M., Kovtunenko, V.A., Kunisch, K.: Obstacle problems with cohesion: a hemivariational inequality approach and its efficient numerical solution. SIAM J. Optim. 21(2), 491–516 (2011). doi:10.1137/10078299
Hoppe, R.H.W.: Multigrid algorithms for variational inequalities. SIAM J. Numer. Anal. 24(5), 1046–1065 (1987). doi:10.1137/0724069
Johnson, C.: Adaptive finite element methods for the obstacle problem. Math. Models Methods Appl. Sci. 02(04), 483–487 (1992). doi:10.1142/S0218202592000284
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Majava, K., Tai, X.C.: A level set method for solving free boundary problems associated with obstacles. Int. J. Numer. Anal. Model. 1(2), 157–171 (2004)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964). doi:10.1016/0041-5553(64)90137-5
Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999). doi:10.1016/S0893-6080(98)00116-6
Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM, Philadelphia (1974). doi:10.1137/1.9781611970524
Rodrigues, J.F.: Obstacle Problems in Mathematical Physics. Elsevier Science Publishers, Amsterdam (1987)
Scholz, R.: Numerical solution of the obstacle problem by the penalty method. Computing 32(4), 297–306 (1984). doi:10.1007/BF02243774
Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Trans. Image Process. 7(3), 310–318 (1998)
Tai, X.C.: Rate of convergence for some constraint decomposition methods for nonlinear variational inequalities. Numer. Math. 93(4), 755–786 (2003). doi:10.1007/s002110200404
Tran, G., Schaeffer, H., Feldman, W.M., Osher, S.J.: An \(L^1\) penalty method for general obstacle problems. SIAM J. Appl. Math. 75(4), 1424–1444 (2015). doi:10.1137/140963303
Wang, F., Cheng, X.L.: An algorithm for solving the double obstacle problems. Appl. Math. Comput. 201(1–2), 221–228 (2008). doi:10.1016/j.amc.2007.12.015
Zhang, Y.: Multilevel projection algorithm for solving obstacle problems. Comput. Math. Appl. 41(12), 1505–1513 (2001). doi:10.1016/S0898-1221(01)00115-8
Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. Tech. Rep. UCLA CAM Rep. 08–34 (2008). http://www.math.ucla.edu/applied/cam/ ftp://ftp.math.ucla.edu/pub/camreport/cam08-34.pdf
Zhu, M., Wright, S.J., Chan, T.F.: Duality-based algorithms for total-variation-regularized image restoration. Comput. Optim. Appl. 47(3), 377–400 (2010). doi:10.1007/s10589-008-9225-2
Zosso, D., Bustin, A.: A primal-dual projected gradient algorithm for efficient Beltrami regularization. Tech. Rep. UCLA CAM Rep. 14–52 (2014). http://www.math.ucla.edu/applied/cam/ ftp://ftp.math.ucla.edu/pub/camreport/cam14-52.pdf
Zosso, D., Osting, B.: A minimal surface criterion for graph partitioning. Inverse Probl. Imaging 10(4), 1149–1180 (2016). doi:10.3934/ipi.2016036
Zosso, D., Osting, B., Osher, S.J.: Elliptic PDE and primal-dual optimization methods. In Preparation (2015)
Acknowledgements
The authors thank W. Feldman, I. Kim, and G. Tran for helpful discussions.
DZ and BO gratefully acknowledge support from NSF DMS-1461138. DZ was also supported by UC Lab Fees Research Grant 12-LR-236660 and the W. M. Keck Foundation. SJO is supported by NSF DMS-1118971, UC Lab Fees Grant, and ONR Grant N00014-14-0444.
Zosso, D., Osting, B., Xia, M. et al. An Efficient Primal-Dual Method for the Obstacle Problem. J Sci Comput 73, 416–437 (2017). https://doi.org/10.1007/s10915-017-0420-0