1 Introduction

The obstacle problem is a classic motivating example in the mathematical study of variational inequalities and free boundary problems, with broad applications, e.g., in fluid filtration in porous media, constrained heating, elasto-plasticity, optimal control, or financial mathematics [2, 7, 10, 22]. The classical formulation is motivated by the equilibrium position of a membrane (described by the graph of \(u:\varOmega \subset {\mathbb {R}}^n\rightarrow {\mathbb {R}}\)), with fixed values \(f\) imposed on the boundary \(\partial \varOmega \) and constrained to lie (almost everywhere) above an obstacle function, \(\varphi :\varOmega \rightarrow {\mathbb {R}}\):

$$\begin{aligned} \min _{u\in H^1(\varOmega )} \ \int _\varOmega \sqrt{1+|\nabla u|^2} \qquad \text {s.t.} \quad u \ge \varphi \text { a.e. in } \varOmega \quad \text {and} \quad u=f \text { on } \partial \varOmega . \end{aligned}$$
(1.1)

For small deflections, the above integral (1.1) is typically linearized into the Dirichlet integral:

$$\begin{aligned} \min _{{u\in H^1(\varOmega )}} \ \int _\varOmega |\nabla u|^2 \qquad \text {s.t.} \quad u \ge \varphi \text { a.e. in } \varOmega \quad \text {and} \quad u=f \text { on } \partial \varOmega . \end{aligned}$$
(1.2)

Within the Sobolev space \(H^1(\varOmega )\), the admissible functions for either obstacle problem, (1.1) or (1.2), define the closed convex set \(U = U(\varOmega ,\varphi ,f)\):

$$\begin{aligned} U=\lbrace u\in H^1(\varOmega ),\; u \ge \varphi \text { a.e. in } \varOmega ,\; u\vert _{\partial \varOmega } = f\rbrace \end{aligned}$$
(1.3)

Standard arguments from variational analysis show that a unique minimizer of (1.2) exists for sufficiently regular data \((\varOmega ,\varphi ,f)\); see, for example, [2, 10, 22]. The non-coercivity of the nonlinear problem (1.1) can be handled using the methods in [7].

In practice, numerical methods are required to solve the obstacle problem. Lions and Mercier [16] illustrated a splitting algorithm on the obstacle problem. Finite element methods have been used to solve the obstacle problem as a free boundary problem [5, 11, 15]. Iterative methods based on penalization and projected gradients are also well-established approaches [6, 23]. In [12, 13], the obstacle constraint is incorporated into the objective using a Lagrange multiplier. Multigrid and multilevel methods have been employed to speed up computation: obstacle problems formulated as variational inequalities can be solved by the iterative scheme proposed in [14], where the linear system arising at each step is solved by a multigrid algorithm; [28] discretizes the problem with continuous piecewise linear finite elements, turning it into a quadratic programming problem with inequality constraints, which is then solved by multilevel projection (MP).

In [17], the authors propose a modified level set method for the obstacle problem, recasting it as a nonsmooth minimization problem. Two methods are presented for its solution: a proximal bundle method for the general nonsmooth minimization problem, and a gradient method for regularized versions of the problem.

Most recently, in [26], the authors proposed an efficient numerical scheme for solving the (linearized) obstacle problem (and generalizations), based on replacing the obstacle constraint by an \(L^1\)-like penalty added to the variational problem:

$$\begin{aligned} \min _{u\in H^1(\varOmega )} \ \int _\varOmega |\nabla u|^2+\mu (\varphi -u)_+\qquad \text {s.t.}\quad u\vert _{\partial \varOmega } = f, \end{aligned}$$
(1.4)

where \((\cdot )_+=\max (\cdot , 0)\). It was shown that for sufficiently large but finite \(\mu \), the minimizer of the unconstrained problem (1.4) is also the minimizer of the original, constrained problem (1.2).

The functionals and constraints in (1.1), (1.2), and (1.4) are all closed and convex, which makes the problems accessible to convex optimization techniques. There is a large body of algorithms that apply to such convex optimization problems. Recently, primal-dual splitting algorithms have gained particular attention, most notably in the context of TV and \(L^1\)-type problems in imaging; see, e.g., [8, 9, 29, 30]. Zosso and Bustin [31] have applied this theory to the particular case of the Beltrami functional [24], an interesting regularizer in imaging problems that interpolates between the classical \(H^1\) and TV regularizers and generalizes the surface area functional. Zosso and Osting [32] use the method for the efficient minimization of a minimal surface-based graph partitioning problem.

In this paper, we develop a primal-dual hybrid gradient method to efficiently solve the constrained and \(L^1\)-penalty formulations of the non-linear and linear obstacle problems, (1.1) and (1.2). We note that the original constrained problem is algorithmically related to the \(L^1\)-penalty problem for finite \(\mu \), and strictly identical to it for the particular choice \(\mu =\infty \). Our results achieve state-of-the-art precision in much shorter time: the speedup is 1–2 orders of magnitude with respect to the method in [26], and even larger compared to older methods [16, 17, 27]. This is demonstrated for a variety of one- and two-dimensional problems, including the extension to elasto-plastic problems with double obstacles and a potential term.

It is important to note that our fast algorithm is derived from a primal-dual reformulation of the original convex optimization problems in a disciplined way. The primal-dual hybrid gradients method employed here enjoys straightforward convergence guarantees. Moreover, in the particular case of the linear obstacle problem, the dual variable update is particularly simple, and the dual variable can be eliminated altogether, resulting in a single primal variable update scheme that improves over standard explicit gradient descent in multiple ways. The resulting scheme is fast because previous iterates continue to be taken into account, which admits interesting interpretations in terms of the “momentum method” and as a damped wave equation. We illustrate this behavior on a very simple yet striking toy example. Further advantages of our method are the avoidance of matrix/operator inversion (all update steps are explicit) and the fact that the contact/coincidence set (the free boundary) is handled implicitly; no explicit tracking is necessary.

1.1 Outline

The remainder of the paper is organized as follows: Sect. 2 gives a formulation of the discretized obstacle problem and its different flavors. Section 3 describes the proposed primal-dual algorithms. In Sect. 4, we compare the resulting primal-dual algorithms to some more classical ones. Section 5 presents the results of the primal-dual algorithm on a variety of numerical experiments. We conclude in Sect. 6 with a discussion.

2 Discretized Obstacle Problem

In this section, we introduce a discretized obstacle problem and discuss some of its properties. To this end, we introduce the following notation.

Let X denote the set of Cartesian grid points of the discretization of the D-dimensional domain \(\varOmega \subset {\mathbb {R}}^D\).

2.1 Inner Products and Norms

For functions \(f,g:X\rightarrow {\mathbb {R}}\), we define the \(L^2\) inner product

$$\begin{aligned} \langle f, g\rangle _{L^2(X)} := \sum _{x\in X} f(x)g(x) \end{aligned}$$
(2.1)

and the induced \(L^2\)-norm

$$\begin{aligned} \Vert f\Vert _{L^2(X)} := \langle f,f\rangle _{L^2(X)}^{\frac{1}{2}} = \sqrt{\sum _{x\in X}f(x)^2} \end{aligned}$$
(2.2)

Similarly, for vectors \(v,w:X\rightarrow {\mathbb {R}}^D\), we define the \(L^2\) inner product

$$\begin{aligned} \langle v,w\rangle _{L^2(X)^D} := \sum _{x\in X} \sum _{d=1}^D v_d(x)\cdot w_d(x) \end{aligned}$$
(2.3)

and obtain the induced \(L^2\)-norm

$$\begin{aligned} \Vert v\Vert _{L^2(X)^D} := \langle v,v \rangle _{L^2(X)^D}^{\frac{1}{2}} = \sqrt{\sum _{x\in X} \sum _{d=1}^D v_d(x)^2}, \end{aligned}$$
(2.4)

where \(v_d(x)\) denotes the d-th component of the vector at location x. For a vector \(v:X\rightarrow {\mathbb {R}}^D\), the magnitude \(|v|:X \rightarrow {\mathbb {R}}\) is defined as follows:

$$\begin{aligned} |v|(x) := \sqrt{\sum _{d=1}^D v_d(x)^2},\qquad x\in X. \end{aligned}$$
(2.5)

The magnitude operator, \(|\cdot |:L^2(X)^D\rightarrow L^2(X)\), thus maps vector-valued functions to scalar-valued functions. It is easy to see that \(\Vert v\Vert _{L^2(X)^D} = \Vert |v| \Vert _{L^2(X)}\). Finally, for the obstacle penalty, we need the single-sided \(L^1\)-penalty of a function \(f:X\rightarrow {\mathbb {R}}\):

$$\begin{aligned} \Vert f\Vert _+ := \sum _{x\in X} \max (f(x),0). \end{aligned}$$
(2.6)

A corresponding negative-sided penalty can be obtained directly as

$$\begin{aligned} \Vert f\Vert _- := \Vert -f\Vert _+. \end{aligned}$$
(2.7)

It is easy to verify that \(\Vert f\Vert _+, \Vert f\Vert _-\ge 0\), and \(\Vert f\Vert _+ + \Vert f\Vert _- = \Vert f\Vert _1\).

2.2 Finite Difference Operators

For functions \(f:X\rightarrow {\mathbb {R}}\), we denote by \(\nabla _X:L^2(X)\rightarrow L^2(X)^D\) a suitable discretization of the continuous gradient by finite differences, subject to desired boundary constraints. Using the discrete gradient, for vectors \(v:X\rightarrow {\mathbb {R}}^D\) we define the discretized divergence \(\mathrm{div}_X :L^2(X)^D\rightarrow L^2(X)\), such that the two operators satisfy the usual negative adjoint property:

$$\begin{aligned} \langle \nabla _X f, v\rangle _{L^2(X)^D} = -\langle f,\mathrm{div}_X v \rangle _{L^2(X)},\qquad f:X\rightarrow {\mathbb {R}},\ v:X\rightarrow {\mathbb {R}}^D. \end{aligned}$$
(2.8)

For functions \(f:X\rightarrow {\mathbb {R}}\), the composition of divergence and gradient yields the discrete Laplacian operator \(\varDelta _X:L^2(X)\rightarrow L^2(X)\):

$$\begin{aligned} \varDelta _X f := (\mathrm{div}_X \circ \nabla _X) f. \end{aligned}$$
(2.9)

For example, we consider the standard forward differences stencil for the gradient; then the adjoint relation leads to backward differences in the divergence and the resulting discretized Laplacian corresponds to the commonly used \(2D+1\) point centered differences stencil.
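To make these conventions concrete, the following is a minimal 1D sketch (our illustration, not the authors' code) of a forward-difference gradient, the backward-difference divergence induced by the adjoint relation (2.8), and a numerical check of that relation; the boundary handling shown (zero-padded last gradient entry) is one admissible choice.

```matlab
% Forward differences for grad_X, its negative adjoint as div_X, and the
% induced Laplacian (3-point stencil in the interior). Any discretization
% satisfying (2.8) is admissible; this is one concrete choice.
N = 8;
gradX = @(u) [u(2:end)-u(1:end-1); 0];                  % forward differences, zero-padded
divX  = @(v) [v(1); v(2:end-1)-v(1:end-2); -v(end-1)];  % backward differences = -gradX'
lapX  = @(u) divX(gradX(u));                            % u(i+1)-2u(i)+u(i-1) at interior points

f = randn(N,1);  v = randn(N,1);
mismatch = dot(gradX(f), v) + dot(f, divX(v));          % should vanish, cf. (2.8)
fprintf('adjoint mismatch: %.2e\n', abs(mismatch));
```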

2.3 Surface Area and Dirichlet Energy

Based on the preceding definitions, we define the following two functionals: Let \(S:L^2(X)^D\rightarrow {\mathbb {R}}\) be defined as

$$\begin{aligned} S[s] := \sum _{x\in X} \sqrt{1+|s|(x)^2}. \end{aligned}$$
(2.10)

For a Monge surface, discretized by the graph of \(u:X\rightarrow {\mathbb {R}}\), the discrete surface area is then measured by \(S[\nabla _Xu]\). Similarly, let \(D:L^2(X)^D\rightarrow {\mathbb {R}}\) be defined as

$$\begin{aligned} D[s] := \frac{1}{2} \Vert s\Vert _{L^2(X)^D}^2. \end{aligned}$$
(2.11)

We then call \(D[\nabla _Xu]\) the discrete Dirichlet energy of u.

For slowly varying functions u, i.e., such that \(\forall x\in X:|\nabla _X u|(x) \ll 1\), the Dirichlet energy is an approximation to the surface area,

$$\begin{aligned} S[\nabla _Xu] \approx |X|+D[\nabla _Xu], \end{aligned}$$

where |X| denotes the number of sampling points in X, i.e., the area of the discretized domain \(\varOmega \). Note that both the surface area and the Dirichlet energy functional are proper, closed, and convex.

2.4 Convex Conjugates

We recall the definition of the convex conjugate (a.k.a. Legendre–Fenchel transform) \(f^*\) of a function f:

$$\begin{aligned} f^*(p) = \sup _{s} \ \ \langle s,p\rangle - f(s). \end{aligned}$$
(2.12)

The biconjugate, \(f^{**}:=(f^*)^*\), is the largest closed convex function majorized by f, i.e., \(f^{**}\le f\). As a result, \(f\equiv f^{**}\) if and only if f is closed and convex (Fenchel–Moreau–Rockafellar theorem) [21, Theorem 5].

Using definition (2.12), we can determine the respective convex conjugate functionals, \(D^*,S^*:L^2(X)^D\rightarrow {\mathbb {R}}\cup \{\infty \}\), of the Dirichlet energy (2.11) and the surface area (2.10) as follows:

$$\begin{aligned} D^*[p]&:= \sup _{s\in L^2(X)^D} \ \ \langle s,p\rangle _{L^2(X)^D} - D[s] = \frac{1}{2} \Vert p\Vert _{L^2(X)^D}^2, \end{aligned}$$
(2.13a)

and

$$\begin{aligned} S^*[p]&:= \sup _{s\in L^2(X)^D} \ \ \langle s,p\rangle _{L^2(X)^D} - S[s] \\ \nonumber&= {\left\{ \begin{array}{ll}-\sum \limits _{x\in X}\sqrt{1-|p|(x)^2} &{} \text {if}\ \ \forall x\in X:|p|(x) \le 1,\\ \infty &{} \text {otherwise.}\end{array}\right. } \end{aligned}$$
(2.13b)
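For completeness (the computation is not spelled out above), the expression (2.13b) follows from a pointwise maximization: for fixed \(x\) with \(|p|(x)<1\), the optimal \(s(x)\) is parallel to \(p(x)\), and writing \(t=|s|(x)\ge 0\) one finds

$$\begin{aligned} \sup _{t\ge 0}\ \bigl ( t\,|p|(x)-\sqrt{1+t^2} \bigr ) = \left. \bigl ( t\,|p|(x)-\sqrt{1+t^2} \bigr )\right| _{t=|p|(x)/\sqrt{1-|p|(x)^2}} = -\sqrt{1-|p|(x)^2}, \end{aligned}$$

while for \(|p|(x)>1\) the supremum is \(+\infty \) (let \(t\rightarrow \infty \)); the boundary case \(|p|(x)=1\) gives supremum 0, consistent with (2.13b).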

Thanks to the Fenchel–Moreau theorem we know that the biconjugate functionals, \(S^{**}\) and \(D^{**}\), are identical to the original functionals, thus

$$\begin{aligned} S[\nabla _X u]&= {\mathop {\mathop {\sup }\limits _{p\in L^2(X)^D }}\limits _{|p|(x)\le 1}} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - S^*[p] \end{aligned}$$
(2.14a)
$$\begin{aligned} D[\nabla _X u]&= \sup _{p\in L^2(X)^D} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - D^*[p]. \end{aligned}$$
(2.14b)

2.5 Discretized Obstacle Problem

Having defined the discretized surface area and Dirichlet energy functionals, we next define the discretized optimization problem.

Given a discretization X and appropriate finite difference operators for gradient, divergence, and Laplacian, we declare a subset \(\partial X\subset X\) to be boundary points. Let \(\varphi :X\rightarrow {\mathbb {R}}\) be an obstacle, and \(f:\partial X\rightarrow {\mathbb {R}}\) the prescribed boundary values, compatible with the obstacle, i.e., \(\forall x\in \partial X:\varphi (x) \le f(x)\). Consider further the set \(U_f\) of square integrable functions that satisfy the boundary constraints,

$$\begin{aligned} U_f:=\{u\in L^2(X)\mid u(x) = f(x) \ \forall x\in \partial X \}, \end{aligned}$$
(2.15)

and the subset \(U_{f,\varphi }\) thereof additionally satisfying the obstacle constraint,

$$\begin{aligned} U_{f,\varphi }:=\{u\in L^2(X)\mid u(x) = f(x) \ \forall x\in \partial X \text { and } u(x) \ge \varphi (x) \ \forall x\in X\}. \end{aligned}$$
(2.16)

Both sets are easily seen to be closed and convex. We are interested in solving the following four (primal) problems:

2.5.1 Constrained Obstacle Problem

$$\begin{aligned} \min _{u\in U_{f,\varphi }}\ \ S[\nabla _Xu] \end{aligned}$$
(P1)

2.5.2 Constrained Linearized Obstacle Problem

$$\begin{aligned} \min _{u\in U_{f,\varphi }}\ \ D[\nabla _Xu] \end{aligned}$$
(P2)

2.5.3 \(L^1\)-Penalty Obstacle Problem

$$\begin{aligned} \min _{u\in U_f}\ \ S[\nabla _Xu] + \mu \Vert \varphi -u\Vert _+ \end{aligned}$$
(P3)

2.5.4 Linearized \(L^1\)-Penalty Obstacle Problem

$$\begin{aligned} \min _{u\in U_f}\ \ D[\nabla _Xu] + \mu \Vert \varphi -u\Vert _+ \end{aligned}$$
(P4)

If the admissible sets are non-empty, existence of minimizers follows from standard arguments: the objective functionals are continuous and convex, the admissible sets are closed and convex, and the prescribed boundary values render the objectives coercive on a connected grid. The following lemma shows that the solutions to (P2) and (P4) are unique.

Lemma 1

Provided the graph X is connected, the boundary points \(\partial X \ne \emptyset \), and the admissible sets are non-empty, the solutions to (P2) and (P4) are unique.

Proof

We prove the result for (P4); the result for (P2) is similar.

Let \(u,v \in U_f\) be optimal solutions and \(\alpha \in (0,1)\). By the adjoint relation (2.8), the objective of (P4) can be written as

$$\begin{aligned} J(u) = \frac{1}{2}\langle u, -\varDelta _X u \rangle + \mu \Vert \varphi - u \Vert _+. \end{aligned}$$

Since u and v both minimize the convex functional J over the convex set \(U_f\), every convex combination of u and v is also a minimizer, so that

$$\begin{aligned} J(\alpha u + (1-\alpha ) v) = \alpha J(u) + (1-\alpha ) J(v). \end{aligned}$$

Writing out the definition of J and rearranging, this is equivalent to

$$\begin{aligned}&\ \frac{\alpha (1-\alpha )}{2} \langle -\varDelta _X (u-v) , u-v \rangle \\&\quad = \ \mu \left( \Vert \varphi - \alpha u - (1-\alpha ) v \Vert _+ - \alpha \Vert \varphi - u \Vert _+ - (1-\alpha ) \Vert \varphi - v \Vert _+ \right) \\&\quad \le \ 0, \end{aligned}$$

where the last inequality uses the convexity of \(\Vert \cdot \Vert _+\). On the other hand, \(\langle -\varDelta _X \cdot , \cdot \rangle \) is a positive semi-definite form and, since X is assumed to be connected, its kernel contains only the constant functions. This forces \(u-v = c\, 1\), where \(c \in {\mathbb {R}}\) and \( 1 :X \rightarrow {\mathbb {R}}\) denotes the constant function. Since \(u,v \in U_f\) satisfy the same boundary conditions and \(\partial X \ne \emptyset \), we conclude that \(u=v\), as desired.

In [26, Theorem 3.1] it is shown that for any \(\mu \ge - \varDelta \varphi \) (pointwise), the minimizers of the constrained and \(L^1\)-penalty linearized obstacle problems, the continuum versions of (P2) and (P4), are the same. Below, in Theorem 1, we give the analogous result for the discrete problems, (P2) and (P4). The proof is largely the same, except for one small but interesting point. In the continuum, one has for any \(f \in H^1\) that \(\langle \nabla f_+ , \nabla f_- \rangle = 0\), where \(f_+ = \max (f,0)\), \(f_- = \min (f,0)\), and \(f = f_+ + f_-\). In our discrete setting we have the following result instead.

Lemma 2

For any \(f \in L^2(X)\), \(\langle \nabla _X f_+ , \nabla _X f_- \rangle _{L^2(X)^D} \ge 0\).

Proof

Denoting the adjacent vertices of \(x\in X\) by N(x), we compute

$$\begin{aligned} \langle \nabla _X f_+ , \nabla _X f_- \rangle _{L^2(X)^D}&= - \langle \varDelta _X f_+ , f_- \rangle _{L^2(X)} = - \sum _{x \in X} (\varDelta _X f_+)(x) f_-(x) \\&= - \sum _{x \in X} \Bigl ( \sum _{y \in N(x) } \bigl ( f_+(y) - f_+(x) \bigr ) \Bigr ) f_-(x) \\&= \sum _{x \in X} |f_-(x) | \sum _{y \in N(x)} | f_+(y) | \\&\ge 0, \end{aligned}$$

where the last equality uses that \(f_+(x)f_-(x)=0\) for every \(x\in X\), so that the terms involving \(f_+(x)f_-(x)\) vanish, together with \(f_-\le 0\le f_+\).

Theorem 1

Let u and \(u_\mu \) denote the minimizers of (P2) and (P4), respectively. For any \(\mu \ge \max \{ - \varDelta _X \varphi (x) :x \in X\}\), we have that \(u_\mu = u\).

Proof

Consider \(w := u_\mu + (\varphi - u_\mu )_+\). Abbreviate \(\langle \cdot , \cdot \rangle = \langle \cdot , \cdot \rangle _{L^2(X)^D}\) and \(\Vert \cdot \Vert = \Vert \cdot \Vert _{L^2(X)^D}\). Evaluating the objective of (P4) at the admissible \(w \in U_{f,\varphi }\), we obtain

$$\begin{aligned} D[\nabla _Xw] + \mu \Vert \varphi - w \Vert _+&= D[\nabla _Xw] = D[\nabla _Xu_\mu ] + \langle \nabla _X u_\mu , \nabla _X (\varphi -u_\mu )_+ \rangle \\&\quad + \frac{1}{2} \langle \nabla _X (\varphi -u_\mu )_+, \nabla _X (\varphi -u_\mu )_+ \rangle \\&= D[\nabla _Xu_\mu ] + \langle - \varDelta _X \varphi , (\varphi -u_\mu )_+ \rangle _{L^2(X)} \\&\quad - \frac{1}{2} \Vert \nabla _X (\varphi - u_\mu )_+ \Vert ^2 - \langle \nabla _X(\varphi -u_\mu )_+, \nabla _X (\varphi -u_\mu )_- \rangle . \end{aligned}$$

Now assuming \(\mu \ge \max \{ - \varDelta _X \varphi (x) :x \in X\}\) and using Lemma 2, we obtain

$$\begin{aligned} D[\nabla _Xw] + \mu \Vert \varphi - w \Vert _+ \le D[\nabla _Xu_\mu ] + \mu \Vert \varphi - u_\mu \Vert _+ . \end{aligned}$$

Since \(u_\mu \) is the unique minimizer of \(D[\nabla _X\cdot ] + \mu \Vert \varphi - \cdot \Vert _+ \) over \(U_f\), we conclude that \(w = u_\mu \), which implies that \(u_\mu \ge \varphi \), i.e., \(u_\mu \in U_{f,\varphi }\). Additionally, we have that

$$\begin{aligned} D[\nabla _Xu_\mu ] = D[\nabla _Xu_\mu ] + \mu \Vert \varphi - u_\mu \Vert _+ \le D[\nabla _Xu] + \mu \Vert \varphi - u \Vert _+ = D[\nabla _Xu]. \end{aligned}$$

Since u is the unique minimizer of \(D[\nabla _X\cdot ]\) over \(U_{f,\varphi }\) and \(u_\mu \) is admissible, we conclude that \(u = u_\mu \), as desired.

2.6 Saddle Point Formulation

Thanks to the identity of a closed convex functional with its biconjugate, we can replace the functionals \(S[\nabla _Xu]\) and \(D[\nabla _Xu]\) by their respective biconjugate representations (2.14a) and (2.14b). We thus obtain the four primal-dual saddle point problems, respectively equivalent to the primal problems (P1)–(P4):

2.6.1 Constrained Obstacle Problem

$$\begin{aligned} \min _{u\in U_{f,\varphi }}\ \ \ {\mathop {\mathop {\max }\limits _{p\in L^2(X)^D}}\limits _{|p|(x)\le 1}} \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - S^*[p] \end{aligned}$$
(PD1)

2.6.2 Constrained Linearized Obstacle Problem

$$\begin{aligned} \min _{u\in U_{f,\varphi }} \ \ \max _{p\in L^2(X)^D} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - D^*[p] \end{aligned}$$
(PD2)

2.6.3 \(L^1\)-Penalty Obstacle Problem

$$\begin{aligned} \min _{u\in U_f} \ \ {\mathop {\mathop {\max }\limits _{p\in L^2(X)^D}}\limits _{|p|(x)\le 1}} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - S^*[p] + \mu \Vert \varphi -u\Vert _+ \end{aligned}$$
(PD3)

2.6.4 Linearized \(L^1\)-Penalty Obstacle Problem

$$\begin{aligned} \min _{u\in U_f} \ \ \max _{p\in L^2(X)^D} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - D^*[p] + \mu \Vert \varphi -u\Vert _+ \end{aligned}$$
(PD4)

For discussing algorithms for the preceding problems (PD1)–(PD4), it is convenient to summarize these saddle point problems in the following form:

$$\begin{aligned} \min _{u\in U} \ \ \max _{p\in W^*} \ \ \langle \nabla _X u,p\rangle _{L^2(X)^D} - F^*[p] + G[u], \end{aligned}$$
(2.17)

where we identify the admissible sets, U and \(W^*\), and the functionals, \(F^*\) and G, as follows:

$$\begin{aligned} U&:={\left\{ \begin{array}{ll}U_{f,\varphi }&{} \text {for}\ \ (\hbox {PD1}),(\hbox {PD2})\\ U_{f}&{} \text {for}\ \ (\hbox {PD3}),(\hbox {PD4})\end{array}\right. } \end{aligned}$$
(2.18a)
$$\begin{aligned} W^*&:= {\left\{ \begin{array}{ll}L^2(X)^D \cap \{p\mid |p|(x)\le 1\}&{} \text {for}\ \ (\hbox {PD1}),(\hbox {PD3})\\ L^2(X)^D &{} \text {for}\ \ (\hbox {PD2}),(\hbox {PD4})\end{array}\right. } \end{aligned}$$
(2.18b)
$$\begin{aligned} F^*[p]&:= {\left\{ \begin{array}{ll}S^*[p] &{} \text {for}\ \ (\hbox {PD1}),(\hbox {PD3})\\ D^*[p] &{} \text {for}\ \ (\hbox {PD2}),(\hbox {PD4})\end{array}\right. } \end{aligned}$$
(2.18c)
$$\begin{aligned} G[u]&:={\left\{ \begin{array}{ll}0&{} \text {for}\ \ (\hbox {PD1}),(\hbox {PD2})\\ \mu \Vert \varphi -u\Vert _+&{} \text {for}\ \ (\hbox {PD3}),(\hbox {PD4}).\end{array}\right. } \end{aligned}$$
(2.18d)

3 Algorithms

3.1 Primal-Dual Hybrid Gradients Scheme

To solve the general saddle point problem (2.17) efficiently, we propose an adaptation of the primal-dual hybrid gradients (PDHG) algorithm [8, 29, 30, 32]. For (2.17), the structure of the PDHG algorithm is as follows:

$$\begin{aligned} p^{n+1}&= {{\mathrm{arg\,min}}}_{p\in W^*} \ \ -\langle \nabla _X \bar{u}^n,p\rangle _{L^2(X)^D} + F^*[p] + \frac{1}{2r_1}\Vert p-p^{n}\Vert _{L^2(X)^D}^2 \end{aligned}$$
(3.1a)
$$\begin{aligned} u^{n+1}&= {{\mathrm{arg\,min}}}_{u\in U} \ \ -\langle u,\mathrm{div}_Xp^{n+1}\rangle _{L^2(X)} + G[u] + \frac{1}{2r_2}\Vert u-u^{n}\Vert _{L^2(X)}^2 \end{aligned}$$
(3.1b)
$$\begin{aligned} \bar{u}^{n+1}&=2u^{n+1} -u^{n}. \end{aligned}$$
(3.1c)

The first two steps are the proximal updates of the dual and primal variable, respectively. The third, extragradient step is an overrelaxation of the primal update that counteracts the step-size shortening typical of first-order methods; it provides a prediction of the primal variable that is used in the subsequent dual variable update. Note the appearance of \(\bar{u}\) instead of u in the dual variable update (3.1a).
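In pseudocode form, the iteration (3.1) has the following generic structure (a MATLAB sketch under our own calling conventions; proxFstar and proxG stand for the resolvent steps (3.1a) and (3.1b), which are made explicit for each case in Sects. 3.2 and 3.3):

```matlab
% Generic PDHG loop for the saddle point problem (2.17).
% proxFstar(q, r1): solves (3.1a) given q = p^n + r1*gradX(ubar^n).
% proxG(a, r2):     solves (3.1b) given a = u^n + r2*divX(p^{n+1}).
function u = pdhg(u0, gradX, divX, proxFstar, proxG, r1, r2, nIter)
    u = u0;  ubar = u0;  p = 0*gradX(u0);          % p^0 = 0
    for n = 1:nIter
        p    = proxFstar(p + r1*gradX(ubar), r1);  % dual proximal update (3.1a)
        unew = proxG(u + r2*divX(p), r2);          % primal proximal update (3.1b)
        ubar = 2*unew - u;                         % extragradient step (3.1c)
        u    = unew;
    end
end
```

For the linearized cases, proxFstar reduces to the closed form (3.4) derived below, i.e., @(q,r1) q/(1+r1), while proxG is given by the projection (3.12) or the shrinkage (3.17).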

In general, O(1/n) convergence (where n is the number of iterations) has been shown for fixed step sizes \(r_1,r_2\) satisfying

$$\begin{aligned} r_1r_2\Vert \nabla _X\Vert ^2\le 1, \end{aligned}$$
(3.2)

where \(\Vert \nabla _X\Vert \) is the operator norm (induced norm) of the discretized gradient operator; equivalently, \(\Vert \nabla _X\Vert ^2=\Vert \varDelta _X\Vert \) is the induced norm of the composition \(\varDelta _X = \mathrm{div}_X\circ \nabla _X\). For the standard stencils considered here, \(\Vert \varDelta _X\Vert \le 4D\).

In the following subsections, we show how the respective minimization sub-problems (3.1a) and (3.1b) can be solved efficiently for each specific configuration in (2.18) corresponding to (PD1)–(PD4).

3.2 Dual Variable Update

We first address the update of the dual variable \(p\). This step is unaffected by the choice of constrained or \(L^1\)-penalty formulation, but differs between the non-linear, \(S[\nabla _Xu]\), and linearized, \(D[\nabla _Xu]\), case.

3.2.1 Linear Cases, (PD2) and (PD4)

In the linearized cases (PD2) and (PD4), where \(F^*=D^*\), the minimization problem of the dual variable update (3.1a) is given by

$$\begin{aligned} p^{n+1} = {{\mathrm{arg\,min}}}_{p\in L^2(X)^D} \ \ -\langle \nabla _X\bar{u}^n,p\rangle _{L^2(X)^D} + \frac{1}{2} \Vert p\Vert _{L^2(X)^D}^2 + \frac{1}{2r_1}\Vert p-p^{n}\Vert _{L^2(X)^D}^2. \end{aligned}$$
(3.3)

The minimizer of this quadratic function is immediately found as

$$\begin{aligned} p^{n+1} = \frac{p^n+r_1\nabla _X\bar{u}^n}{1+r_1}. \end{aligned}$$
(3.4)

3.2.2 Non-Linear Cases (PD1) and (PD3)

In the nonlinear cases (PD1) and (PD3), where \(F^*=S^*\), we use the identity \(-\sqrt{1-|p|(x)^2}=\bigl (|p|(x)^2-1\bigr )/\sqrt{1-|p|(x)^2}\) to rewrite the minimization problem of the dual variable update (3.1a) as

$$\begin{aligned} p^{n+1} = {\mathop {\mathop {{{\mathrm{arg\,min}}}}\limits _{p\in L^2(X)^D}}\limits _{|p|(x)\le 1}} -\langle \nabla _X\bar{u}^n,p\rangle _{L^2(X)^D} +\sum _{x\in X}\frac{|p|(x)^2-1}{\sqrt{1-|p|(x)^2}} + \frac{1}{2r_1}\Vert p-p^{n}\Vert _{L^2(X)^D}^2. \end{aligned}$$
(3.5)

We propose to solve this with an iteratively reweighted least squares (IRLS) approach, as follows. We observe that the objective in (3.5) would be quadratic in \(p\) were it not for the square root in the denominator. We therefore fix the current estimate of \(p\) inside the square root to obtain a weighted least squares problem; the weight is then updated, and the process is repeated until convergence:

$$\begin{aligned} \psi ^{0}&= p^{n} \end{aligned}$$
(3.6a)
$$\begin{aligned} \psi ^{t+1}&={\mathop {\mathop {{{\mathrm{arg\,min}}}}\limits _{\psi \in L^2(X)^D }}\limits _{|\psi |(x)\le 1}} \ \ -\langle \nabla _X\bar{u}^n,\psi \rangle _{L^2(X)^D} + \sum _{x\in X}\frac{|\psi |(x)^2-1}{\sqrt{1-|\psi ^t|(x)^2}} + \frac{1}{2r_1}\Vert \psi -p^{n}\Vert _{L^2(X)^D}^2 \end{aligned}$$
(3.6b)
$$\begin{aligned} p^{n+1}&= \lim _{t\rightarrow \infty }\psi ^t \end{aligned}$$
(3.6c)

Our focus is on the solution of the fixed-point minimization step (3.6b). Indeed, it can be shown (see [32, Lemma 4.3]) that for small enough \(r_1\), the solution to (3.6b) is given by

$$\begin{aligned} \psi ^{t+1}(x) = \frac{\varepsilon ^t(x)}{\varepsilon ^t(x)+2r_1}\left[ p^n(x)+r_1(\nabla _X\bar{u}^n)(x)\right] ,\quad x\in X, \end{aligned}$$
(3.7)

where \(\varepsilon ^t(x):=\sqrt{1-|\psi ^t|(x)^2}\). Further, for suitably chosen \(r_1\), the iterative scheme (3.6) converges to the desired fixed point (see [32, Lemma 4.4]).
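A possible implementation of this inner loop is sketched below (our illustration; p_n and grad_ubar are \(|X|\)-by-D arrays holding \(p^n\) and \(\nabla _X\bar{u}^n\), and the inner iteration count and tolerance are our choices; the paper only requires iterating (3.7) to convergence):

```matlab
% IRLS inner iteration (3.6)/(3.7) for the nonlinear dual update.
function p_next = dual_update_nonlinear(p_n, grad_ubar, r1, nInner)
    psi = p_n;                                             % psi^0 = p^n, cf. (3.6a)
    for t = 1:nInner
        eps_t   = sqrt(max(1 - sum(psi.^2, 2), 0));        % eps^t(x), clipped at 0 for safety
        psi_new = (eps_t ./ (eps_t + 2*r1)) .* (p_n + r1*grad_ubar);   % pointwise step (3.7)
        if max(abs(psi_new(:) - psi(:))) < 1e-10           % stop once the fixed point is reached
            psi = psi_new;  break;
        end
        psi = psi_new;
    end
    p_next = psi;
end
```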

3.3 Primal Variable Update

We now consider the update problem (3.1b) of the primal variable. Here, the choice of non-linear versus linearized cost functional is irrelevant; instead, we distinguish the original constraint formulations (PD1) and (PD2) from the relaxed \(L^1\)-penalty formulations (PD3) and (PD4).

3.3.1 Constrained Formulation (PD1) and (PD2)

The constrained primal variable iteration (3.1b) reads

$$\begin{aligned} u^{n+1} = {{\mathrm{arg\,min}}}_{u\in U_{f,\varphi }}\ \ -\langle u,\mathrm{div}_Xp^{n+1}\rangle _{L^2(X)} +\frac{1}{2r_2}\Vert u-u^n\Vert _{L^2(X)}^2. \end{aligned}$$
(3.8)

Based on the projection theorem and variational inequalities [6], and in the tradition of the Arrow–Hurwicz–Uzawa algorithm [1] and the original formulation of the PDHG algorithm [29], this update is computed by seeking a minimizer in the relaxed set \(L^2(X)\) and projecting onto the admissible set \(U_{f,\varphi }\). Since the objective in (3.8) completes to an isotropic quadratic, this projected update in fact solves (3.8) exactly. The iteration can thus be written

$$\begin{aligned} u^{n+1}&= P_{U_{f,\varphi }}\left[ {{\mathrm{arg\,min}}}_{u\in L^2(X)}\ \ -\langle u,\mathrm{div}_Xp^{n+1}\rangle _{L^2(X)} +\frac{1}{2r_2}\Vert u-u^n\Vert _{L^2(X)}^2\right] \\ \nonumber&= P_{U_{f,\varphi }}[u^\star ], \end{aligned}$$
(3.9)

with

$$\begin{aligned} u^\star := u^n + r_2\mathrm{div}_Xp^{n+1} \end{aligned}$$
(3.10)

and where the projection \(P_{U_{f,\varphi }}:L^2(X) \rightarrow {U_{f,\varphi }}\) is simply defined as

$$\begin{aligned} P_{U_{f,\varphi }}[u^\star ] := {{\mathrm{arg\,min}}}_{u\in U_{f,\varphi }}\Vert u-u^\star \Vert _{L^2(X)}. \end{aligned}$$
(3.11)

Therefore, more explicitly for \(x\in X\), the update of the primal variable in the constrained setting (PD1), (PD2), is:

$$\begin{aligned} u^{n+1}(x) := {\left\{ \begin{array}{ll} f(x) &{} \text {if} \ \ x\in \partial X,\\ \varphi (x) &{} \text {if}\ \ x\notin \partial X \text { and } u^\star (x) < \varphi (x),\\ u^\star (x) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(3.12)
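In code, this projection step is a pointwise clamp followed by re-imposing the boundary values; a minimal MATLAB helper (the logical mask isBoundary marking \(\partial X\), with f holding the boundary values there, is our illustrative calling convention):

```matlab
% Constrained primal update (3.12).
function u_next = primal_update_constrained(u_star, phi, f, isBoundary)
    u_next = max(u_star, phi);           % project onto {u >= phi}
    u_next(isBoundary) = f(isBoundary);  % re-impose the prescribed boundary values
end
```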

3.3.2 \(L^1\)-Penalty Formulations (PD3) and (PD4)

The \(L^1\)-penalty method relaxes the admissible set to \(U_f\), i.e., \(L^2(X)\) subject to only the boundary values f on \(\partial X\), and introduces the \(L^1\)-like penalty term \(\Vert \varphi -u\Vert _+\), instead. The resulting instance of (3.1b) can be rewritten as

$$\begin{aligned} u^{n+1} = {{\mathrm{arg\,min}}}_{u\in U_{f}}\ \ \mu \Vert \varphi -u\Vert _++ \frac{1}{2r_2}\Vert u-(u^n+r_2\mathrm{div}_Xp^{n+1})\Vert _{L^2(X)}^2. \end{aligned}$$
(3.13)

The minimizer can be easily found by making a change of variables, \(z=\varphi -u\), leading to

$$\begin{aligned} z^\star&= {{\mathrm{arg\,min}}}_{z}\ \ r_2\mu \Vert z\Vert _++ \frac{1}{2}\Vert \varphi -(u^n+r_2\mathrm{div}_Xp^{n+1})-z\Vert _{L^2(X)}^2\nonumber \\&= {{\mathrm{shrink}}}_+(\varphi -(u^n+r_2\mathrm{div}_Xp^{n+1}), r_2\mu ), \end{aligned}$$
(3.14)

where \({{\mathrm{shrink}}}_+\) denotes single-sided soft-thresholding (shrinkage) as follows:

$$\begin{aligned} {{\mathrm{shrink}}}_+(s,\tau ) := {\left\{ \begin{array}{ll} s&{}\text {if} \ \ s<0,\\ 0 &{} \text {if} \ \ 0\le s \le \tau ,\\ s-\tau &{} \text {if} \ \ s>\tau .\end{array}\right. } \end{aligned}$$
(3.15)

As before, let \(u^\star =u^n+r_2\mathrm{div}_X p^{n+1}\). Re-substituting \(u^{n+1}=\varphi -z^\star \) and enforcing the boundary conditions through the projection \(P_{U_f}\) yields

$$\begin{aligned} u^{n+1} = P_{U_f}[\varphi - {{\mathrm{shrink}}}_+(\varphi -u^\star , r_2\mu ) ]. \end{aligned}$$
(3.16)

This leads to the following explicit primal update for the \(L^1\)-penalty based obstacle problems, (PD3), (PD4):

$$\begin{aligned} u^{n+1}(x) = {\left\{ \begin{array}{ll} f(x) &{} \text {if} \ \ x\in \partial X,\\ u^\star (x)+r_2\mu &{} \text {if}\ \ x\notin \partial X \text { and } u^\star (x) < \varphi (x)-r_2\mu ,\\ \varphi (x) &{} \text {if}\ \ x\notin \partial X \text { and } \varphi (x)-r_2\mu \le u^\star (x) \le \varphi (x),\\ u^\star (x) &{} \text {otherwise}.\end{array}\right. } \end{aligned}$$
(3.17)
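Analogously, the penalized update (3.16)/(3.17) amounts to a one-sided shrinkage followed by the boundary projection; a minimal MATLAB helper under the same illustrative conventions as above:

```matlab
% L1-penalty primal update (3.16)/(3.17).
function u_next = primal_update_penalty(u_star, phi, f, mu, r2, isBoundary)
    shrink_plus = @(s, tau) s.*(s < 0) + (s - tau).*(s > tau);   % one-sided shrinkage (3.15)
    u_next = phi - shrink_plus(phi - u_star, r2*mu);             % cf. (3.16)
    u_next(isBoundary) = f(isBoundary);                          % projection onto U_f
end
```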

Remark 1

Recall that in Theorem 1, we proved that the solutions to the constrained and \(L^1\)-penalized obstacle problems, (PD2) and (PD4), are equivalent for \(\mu \ge \max \{-\varDelta _X \varphi (x) :x \in X\}\); see also [26]. In light of this, it is interesting to compare primal iterations (3.12) and (3.17) for the constrained and \(L^1\)-penalized obstacle problems. These two algorithms only disagree for points \(x \notin \partial X\) with \(u^\star (x) < \varphi (x)-r_2\mu \), i.e., when the tentative membrane update \(u^\star \) violates the obstacle constraint by an amount in excess of \(r_2\mu \). This suggests that not only do the solutions of these two problems agree for sufficiently large \(\mu \), but so do all of the iterates in the proposed primal-dual methods.

3.4 Elimination of the Dual Variable in the Linear Case, (PD2) and (PD4)

In the updates of the primal variable, the dual variable only appears through its divergence, \(\mathrm{div}_X p^{n+1}\); see (3.10) being used in (3.12) and (3.17). However, in the linear case, (PD2) and (PD4), by applying the divergence to (3.4), it is easy to see that

$$\begin{aligned} \mathrm{div}_Xp^{n+1} = \frac{\mathrm{div}_Xp^n+r_1\varDelta _X\bar{u}^n}{1+r_1}. \end{aligned}$$
(3.18)

Assuming \(p^{0}(x)=0\) for all \(x\in X\), this recursion relation can be solved, giving

$$\begin{aligned} \mathrm{div}_Xp^{n+1} = \sum _{\eta =0}^n \frac{r_1\varDelta _X\bar{u}^{n-\eta }}{(1+r_1)^{\eta +1}}. \end{aligned}$$
(3.19)

This allows rewriting the primal variable updates without the dual variable, effectively eliminating the dual variable from the scheme altogether. After elimination, the intermediate primal update (3.10) is now

$$\begin{aligned} u^\star = u^n + r_1r_2\sum _{\eta =0}^n \frac{\varDelta _X\bar{u}^{n-\eta }}{(1+r_1)^{\eta +1}}, \end{aligned}$$
(3.20)

which is then projected/shrunk according to (3.12) or (3.17) as before.
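In an implementation, one does not form the sum (3.19) explicitly: it suffices to carry the scalar field \(q^n := \mathrm{div}_X p^{n}\) and update it by the recursion (3.18). One sweep of the resulting dual-free scheme might look as follows (a sketch; lapX and projectOrShrink are illustrative handles standing for \(\varDelta _X\) and for the projection (3.12) or shrinkage (3.17), respectively):

```matlab
% One sweep of the dual-free scheme for the linear cases (PD2)/(PD4).
function [u, ubar, q] = linear_sweep(u, ubar, q, lapX, projectOrShrink, r1, r2)
    q    = (q + r1*lapX(ubar)) / (1 + r1);  % running divergence of p, recursion (3.18)
    unew = projectOrShrink(u + r2*q);       % intermediate update (3.10)/(3.20), then (3.12) or (3.17)
    ubar = 2*unew - u;                      % extragradient step (3.1c)
    u    = unew;
end
```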

3.5 Complete Algorithms

The complete algorithms for the four different flavors of the obstacle problem (PD1)–(PD4) are summarized in Algorithms 1 and 2. These algorithms have an extremely simple structure and require only low-complexity operations; in particular, no matrix inversions are required. They converge at rate O(1/n) (where n is the number of iterations) for \(r_1,r_2\) satisfying

$$\begin{aligned} r_1r_2\Vert \varDelta _X\Vert \le 1. \end{aligned}$$

Remark 2

Although Algorithms 1 and 2 appear to be specifically engineered to serve a particular purpose, it is important to note that they have been derived in a disciplined manner: by reformulating the original, primal obstacle problem as a saddle point problem and by applying a primal-dual hybrid gradients algorithm with known convergence guarantees.

Algorithm 1 (pseudocode figure)
Algorithm 2 (pseudocode figure)
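As a concrete illustration of this structure, the following is a hedged MATLAB sketch for the constrained linearized problem (PD2) in one dimension. It is our transcription of the updates (3.4), (3.10), (3.12), and (3.1c), not the article's Algorithm 1 or 2 verbatim; the obstacle (\(\varphi _3\) from (5.3)), boundary values, initialization, and stopping rule are illustrative choices.

```matlab
% 1D constrained linearized obstacle problem (PD2), solved by the
% primal-dual iteration (3.4), (3.10), (3.12), (3.1c).
N   = 256;  x = linspace(0,1,N)';
phi = 10*sin(pi*(x+1).^2).^2;        % obstacle phi_3 from (5.3); phi_3(0)=phi_3(1)=0
f   = zeros(N,1);                    % boundary values u(0)=u(1)=0, compatible with phi_3
r1  = 0.01;  r2 = 25;                % r1*r2*||Delta_X|| <= 1, since ||Delta_X|| <= 4 in 1D

gradX = @(u) [u(2:end)-u(1:end-1); 0];                  % forward differences
divX  = @(v) [v(1); v(2:end-1)-v(1:end-2); -v(end-1)];  % negative adjoint of gradX

u = max(phi, 0);  ubar = u;          % feasible initialization (our choice)
p = zeros(N,1);                      % dual variable, p^0 = 0

for n = 1:50000
    p     = (p + r1*gradX(ubar)) / (1 + r1);   % dual update (3.4)
    ustar = u + r2*divX(p);                    % intermediate primal update (3.10)
    unew  = max(ustar, phi);                   % obstacle projection, cf. (3.12)
    unew([1 end]) = f([1 end]);                % re-impose boundary values
    ubar  = 2*unew - u;                        % extragradient step (3.1c)
    done  = max(abs(unew - u)) < 1e-8;         % simple stopping rule (our choice)
    u     = unew;
    if done, break; end
end
plot(x, phi, ':', x, u, '-');  legend('obstacle \phi_3', 'membrane u');
```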

4 Interpretation/Comparisons for Proposed Algorithm

In this section we compare the proposed primal-dual algorithm (summarized in Algorithms 1 and 2) to two time-stepping schemes arising in classical problems.

4.1 Comparison to the Forward Euler Scheme for the Discrete Heat Equation

For comparison, recall that the forward Euler time-stepping scheme for the discrete heat equation can be written

$$\begin{aligned} u^\star&= u^n + r_1r_2 \varDelta _Xu^n \end{aligned}$$
(4.1a)
$$\begin{aligned}&= u^0 + r_1r_2 \sum _{\eta =0}^n \varDelta _Xu^\eta , \end{aligned}$$
(4.1b)

where \(r_1r_2\) is the time step. Thus, after elimination of the dual variable, the primal-dual method for the linear problem seems to perform something akin to an alternating sequence of explicit heat diffusion steps (4.1), projection/thresholding (3.12) or (3.17), and an extragradient step (3.1c). Unfortunately, the explicit heat diffusion (4.1) is well known to be severely limited by its stability criterion and therefore dreadfully slow.

Indeed, the first term, \(\eta =0\), of the summation in (3.20) is an explicit heat update step, subject to the usual constraint on the time step, here \(r_1r_2\le \rho (\varDelta _X)^{-1}\). However, thanks to the sum, previous iterates are taken into account (discounted by a compounding factor of \(\frac{1}{1+r_1}\)), the impact of which is essential for the efficiency and convergence of the scheme. Consider the following two cases:

Away from the solution, subsequent heat updates point in the same direction and the sum lets the history of updates interfere constructively, effectively greatly increasing the time step of the explicit heat update. Assuming \(\varDelta _X\bar{u}^{n-\eta }\approx \varDelta _Xu^n\) is approximately constant for recent iterations (small \(\eta \)), and noting that \(\sum _{\eta =0}^n \frac{1}{(1+r_1)^{\eta +1}} = \frac{1}{r_1}\left( 1-(1+r_1)^{-(n+1)}\right) \approx \frac{1}{r_1}\) for large n, we can approximate (3.20) by

$$\begin{aligned} u^\star \approx u^n + r_2 \varDelta _X u^n. \end{aligned}$$

The proposed scheme is thus heat diffusion with a \(1/r_1\)-fold larger effective time-step, see (4.1). This is more pronounced for \(r_1 \ll 1\), reducing the discounting on previous time steps (i.e., increasing the memory horizon of the summation).

Close to the solution, the membrane tends to oscillate about the optimum and subsequent heat steps have opposing signs. The summation then leads to destructive interference of subsequent updates, damping the oscillations and effectively stabilizing the explicit heat update. In this case, larger values of \(r_1\) reduce the memory horizon and allow the system to transition from constructive to stabilizing, destructive interference, thereby reducing overshooting.

4.2 Comparison to a Time-Stepping Scheme for the Damped Wave Equation

Another interesting analysis of the resulting update scheme, in the linear case and away from obstacle and boundary value projections, emerges when we rewrite it as a three-point scheme in time.

Lemma 3

The proposed update iterations of the primal variable (ignoring projections) satisfy the following three-point equality:

$$\begin{aligned} \left( u^{n+1}-2u^n+u^{n-1}\right) + r_1\left( u^{n+1}-u^n\right) = r_1r_2\left( 2\varDelta _X u^{n}-\varDelta _X u^{n-1}\right) . \end{aligned}$$
(4.2)

Remark 3

The left-hand terms in (4.2) are immediately identified as finite difference approximations, with an artificial time step h, of \(u_{tt}^n\approx \left( u^{n+1}-2u^n+u^{n-1}\right) /h^2\) and \(u_t^n\approx \left( u^{n+1}-u^n\right) /h\), and one clearly recognizes the elements of a damped wave equation:

$$\begin{aligned} u_{tt}^n + \frac{r_1}{h}u_t^n = \frac{r_1r_2}{h^2}\left( 2\varDelta _X u^{n}-\varDelta _X u^{n-1}\right) \end{aligned}$$
(4.3)

Proof

By substituting the extragradient (3.1c) into (3.18), we get:

$$\begin{aligned} (1+r_1)\mathrm{div}_Xp^{n+1} = \mathrm{div}_Xp^n + r_1\left( 2\varDelta _X u^n - \varDelta _X u^{n-1}\right) . \end{aligned}$$

From (3.10) and (3.12)/(3.17), but ignoring shrinkage and projections, we get

$$\begin{aligned} \mathrm{div}_Xp^{n+1}=\frac{u^{n+1}-u^n}{r_2}. \end{aligned}$$

Put together, this yields the primal update

$$\begin{aligned} u^{n+1} = u^n + \frac{u^n-u^{n-1}}{1+r_1} + \frac{r_1r_2}{1+r_1}\left( 2\varDelta _X u^n - \varDelta _X u^{n-1}\right) . \end{aligned}$$
(4.4)

from which (4.2) follows by simple term rearrangements.

Fig. 1

Toy example problem: obstacle-free linearized minimal surface problem (PD2) on 32-point 1D domain discretization, \(f=0\) boundary conditions, and single peak initialization at the center. Evolution of the membrane over iterations n. a Standard Euler explicit heat update (4.1). b Our primal-dual scheme (3.20). Black area indicates error with respect to (trivial) solution

The three-point primal update scheme (4.4) highlights how the proposed scheme performs explicit heat diffusion (based on the forward-projected Laplacian, in the extragradient step), on top of which the most recent update is added. This scheme is related to the “momentum method”, because it mimics the motion of a system u with a certain mass (and therefore inertia) [19]. This type of acceleration is classically used in gradient-descent back-propagation training of neural networks [20], and has been associated with a specific type of conjugate gradients [4]. The idea of maintaining momentum by mixing in an extrapolation of the most recent updates is also the essence of Nesterov’s gradient acceleration scheme [18] and the core of the FISTA algorithm [3]. It is further reminiscent of gradient averaging, e.g., as used in stochastic gradient descent methods.

The evolution of a simple toy problem under both the standard explicit (forward Euler) heat update (4.1) and the proposed primal-dual scheme (3.20) is illustrated in Fig. 1. It is easy to see how the summation in (3.20) causes the peak to decay at a faster pace after the first iterate. As a result, the initial localized state splits into two components that propagate away from the center and decay, illustrating the damped wave nature of the iterations. The wave is reflected at the fixed boundaries, inverting its sign, in contrast to the explicit scheme, which remains non-negative (according to the maximum principle). Evidently, the proposed scheme converges to the (trivial) flat solution \(u=0\) much faster than the explicit heat update.
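The following short MATLAB sketch reproduces the spirit of this experiment (our illustration; grid size, step sizes, and the stopping threshold are assumptions): it runs the obstacle-free 1D problem with zero boundary values and a single central peak, once with the explicit heat update (4.1) and once with the proposed scheme (3.20), implemented via the recursion (3.18), and reports the number of iterations needed to decay below a fixed threshold.

```matlab
% Obstacle-free 1D toy problem: explicit heat flow (4.1) vs. the proposed
% dual-free primal-dual scheme (3.18)/(3.20), both with u = 0 at the boundary.
N = 32;  r1 = 0.01;  r2 = 25;                     % r1*r2*||Delta_X|| <= 1 in 1D
lap = @(u) [u(2)-u(1); u(3:end)-2*u(2:end-1)+u(1:end-2); u(end-1)-u(end)];
u0 = zeros(N,1);  u0(round(N/2)) = 1;             % single peak at the center

u = u0;  kHeat = 0;                               % explicit heat flow, time step r1*r2
while max(abs(u)) > 1e-4
    u = u + r1*r2*lap(u);  u([1 end]) = 0;  kHeat = kHeat + 1;
end

u = u0;  ubar = u0;  q = zeros(N,1);  kPD = 0;    % proposed scheme, running q = div_X p
while max(abs(u)) > 1e-4
    q    = (q + r1*lap(ubar)) / (1 + r1);         % recursion (3.18)
    unew = u + r2*q;  unew([1 end]) = 0;          % (3.20), then boundary projection (3.12)
    ubar = 2*unew - u;  u = unew;  kPD = kPD + 1; % extragradient step (3.1c)
end
fprintf('explicit heat: %d iterations, primal-dual: %d iterations\n', kHeat, kPD);
```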

5 Numerical Experiments

In all of the following problems, we assume X to be a Cartesian sampling of either an interval in \({\mathbb {R}}\) or a rectangular subset of \({\mathbb {R}}^2\). The differential operators are chosen as forward differences for the gradient and, as a result, backward differences for the divergence and the standard centered stencil for the Laplacian.

We implemented the algorithms in MATLAB® and ran our experiments on standard computing equipment (a 2012 laptop with a 2.80 GHz Intel® Core™ i7 and 4 GB RAM).

5.1 One-Dimensional Obstacles

We consider two obstacles on [0, 1] that were previously studied in [23, 26]:

$$\begin{aligned} \varphi _1(x) := {\left\{ \begin{array}{ll} 100x^2 &{} \text {for} \ \ 0\le x\le 0.25\\ 100x(1-x) &{} \text {for} \ \ 0.25\le x\le 0.5\\ \varphi _1(1-x)&{}\text {for}\ \ 0.5\le x \le 1, \end{array}\right. } \end{aligned}$$
(5.1)

and

$$\begin{aligned} \varphi _2(x) := {\left\{ \begin{array}{ll} 10\sin (2\pi x) &{} \text {for} \ \ 0\le x\le 0.25\\ 5\cos (\pi (4x-1))+5 &{} \text {for} \ \ 0.25\le x\le 0.5\\ \varphi _2(1-x)&{}\text {for}\ \ 0.5\le x \le 1. \end{array}\right. } \end{aligned}$$
(5.2)

A third obstacle (from [26]) is given by

$$\begin{aligned} \varphi _3(x) := 10\sin ^2(\pi (x+1)^2),\qquad x\in [0,1]. \end{aligned}$$
(5.3)

All three problems and the obtained solutions are illustrated in Fig. 2. In the 1D case, there is of course no difference between the solutions of the linear and the non-linear versions: the solutions are identical to the obstacle on the contact set, and straight lines away from it.

Fig. 2

1D obstacles \(\varphi _1\) through \(\varphi _3\) (dotted) and respective solutions to the obstacle problem (solid). ab Linear obstacle problem (PD2). \(N=256\), \(\mu =5\), boundary conditions \(u(0)=u(1)=0\). c Non-linear obstacle problem (PD1). \(N=512\), \(\mu =\infty \), \(u(0)=5\) and \(u(1) = 10\). For all three: \(r_1=0.01\), \(r_2=25\), \(\epsilon =10^{-5}\)

5.2 Two-Dimensional Obstacles

We now look at example problems in 2D (from [26]). First, consider the square domain \(\varOmega =[0,1]^2\) and the obstacle \(\varphi _4:\varOmega \rightarrow {\mathbb {R}}\),

$$\begin{aligned} \varphi _4(x,y) := {\left\{ \begin{array}{ll} 5 &{} \text {for}\ \ |x-0.6|+|y-0.6|< 0.04\\ 4.5 &{} \text {for}\ \ (x-0.6)^2+(y-0.25)^2< 0.001\\ 4.5&{} \text {for}\ \ y=0.57 \ \ \text {and} \ \ 0.075< x < 0.13\\ 0&{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(5.4)

We solve the linear (PD2) and non-linear (PD1) obstacle problem (with 0 boundary values imposed) at different resolutions, as illustrated in Fig. 3. Parameter values are as follows: \(\mu =\infty \), \(r_1=0.01\), \(r_2=12.5\), \(\epsilon =10^{-5}\).

In Table 1 we compare to numerical experiments with previous algorithm implementations, in particular the \(L^1\)-penalty method [26] and the Lions–Mercier splitting scheme [16], as reported in [26]. These comparisons hold only in a loose sense, because the implementation and computing infrastructure are not disclosed in [26]. Nevertheless, our method appears to outperform these by 1–2 orders of magnitude.

Fig. 3

Solution to the linear and nonlinear problems (PD1) and (PD2) for a 2D obstacle, \(\varphi _4(x,y)\) in (5.4). a The obstacles at \(128\times 128\). b The linear obstacle problem solution computed at medium resolution \(128\times 128\). c \(64\times 64\) low-resolution solution of the linear problem. d The low-resolution non-linear obstacle problem solution

Table 1 Running times for the 2D obstacle problem \(\varphi _4\), at different resolutions

Next, we consider the square domain \(\varOmega =[-2,2]^2\), and the radially-symmetric obstacle

$$\begin{aligned} \varphi _5(r) := {\left\{ \begin{array}{ll} \sqrt{1-r^2} &{} \text {for}\ \ r\le 1,\\ -1 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(5.5)

This obstacle has been used in [17, 26], and can be traced back to at least [25], albeit with a typo in its definition. Assuming \(f=0\) on a circular boundary of radius \(R=2\), the linear obstacle problem admits the following radially symmetric analytical solution:

$$\begin{aligned} u_5^*(r) = {\left\{ \begin{array}{ll}\sqrt{1-r^2} &{} \text {for}\ \ r\le r^*,\\ -(r^*)^2\ln (r/R)/\sqrt{1-(r^*)^2} &{}\text {otherwise}, \end{array}\right. } \end{aligned}$$
(5.6)

where \(r^*=0.6979651482\ldots \) satisfies \((r^*)^2(1-\ln (r^*/R))=1\).

Using the values of this analytical solution as prescribed boundary values on the boundaries of the square domain, we solve the linear obstacle problem. We set the parameters as follows: \(\mu = 0.1\), \(r_1=0.008\), \(r_2=15.625\), \(\epsilon =10^{-7}\). Results are shown in Fig. 4. The pointwise error is of the order of \(10^{-5}\) and concentrated at the circular free boundary, mainly as a result of discretization. This type of error is comparable to the results reported in [17] and [26].

Fig. 4

Radially symmetric 2D obstacle problem. a Obstacle \(\varphi _5(x,y)\). bc Low- and medium-resolution solution and circular zero-level set (blue). d Error map of the high-resolution solution \(256\times 256\). The error is concentrated as peaks near the free boundary, mostly due to spatial discretization, and propagating from there into the free membrane parts. (Color figure online)

For this obstacle problem, we can compare to numerical experiments with previous algorithm implementations; see Table 2. In particular, we compare the runtime and number of iterations with the level set gradient method of [17]. Even assuming that the 2004 computing times of [17] may be discounted by Moore’s law (halving every 18 months, about a factor of 160–200 overall), we note that our method clearly outperforms it by 1–2 orders of magnitude.

5.3 Generalization to a Double Obstacle with Forcing

A more general form of the obstacle problem deals with two obstacles, bounding the membrane from below \((\varphi )\) and above \((\psi )\), and the inclusion of a force term v acting vertically on the membrane:

$$\begin{aligned} \min _u\int _\varOmega |\nabla u|^2 - u v \qquad \text {s.t.}\quad \psi \ge u\ge \varphi ,\quad u\vert _{\partial \varOmega } = f. \end{aligned}$$
(5.7)

It is cumbersome but straightforward to extend the proposed algorithms to this case: the double obstacle results in a second shrinkage term (albeit with opposite signs), and, when derived from (3.1b), the force v enters the intermediate primal variable update scaled by the primal step size, \(u^\star = u^n + r_2(\mathrm{div}_Xp^{n+1} + v)\). The exact details are left as an exercise to the reader; one possible pointwise form of the resulting primal update is sketched below.
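A possible pointwise form of the resulting primal update, under the penalty formulation and assuming \(\varphi \le \psi \), is sketched here (our derivation from (3.1b), since the details are left to the reader; the calling conventions are the same illustrative ones as before, and a denotes \(u^n + r_2(\mathrm{div}_Xp^{n+1} + v)\)):

```matlab
% Double-obstacle primal update: two one-sided shrinkage steps (3.15) with
% opposite signs, towards the lower obstacle phi and the upper obstacle psi.
function u_next = primal_update_double(a, phi, psi, f, mu, r2, isBoundary)
    shrink_plus = @(s, tau) s.*(s < 0) + (s - tau).*(s > tau);
    u_next = phi - shrink_plus(phi - a, r2*mu);      % push up towards phi, cf. (3.16)
    u_next = psi + shrink_plus(u_next - psi, r2*mu); % push down towards psi (opposite sign)
    u_next(isBoundary) = f(isBoundary);              % re-impose boundary values
end
```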

Table 2 Running times for the spherical 2D problem \(\varphi _5\), at different resolutions

As an illustration of the performance of this slightly extended algorithm, we consider an elasto-plastic torsion problem described in [27]. Let \(\varOmega = [0,1]^2\) and let X be a Cartesian sampling thereof. The obstacles are constructed as

$$\begin{aligned} \varphi _6(x,y)&:= -\text {dist}(x,\partial \varOmega ),\ \ \text {and} \end{aligned}$$
(5.8a)
$$\begin{aligned} \psi _6(x,y)&:= 0.2. \end{aligned}$$
(5.8b)

The boundary values are \(u=0\) on \(\partial \varOmega \). The force \(v:\varOmega \rightarrow {\mathbb {R}}\) is defined as

$$\begin{aligned} v(x,y) := {\left\{ \begin{array}{ll} 300,&{}\text {if}\ \ (x,y)\in S := \{ (x,y):|x-y|\le 0.1 \ \wedge \ x\le 0.3\},\\ -70e^yg(x), &{} \text {if}\ \ (x,y) \notin S\ \wedge \ x\le 1-y,\\ 15e^yg(x), &{} \text {if}\ \ (x,y) \notin S\ \wedge \ x> 1-y,\\ \end{array}\right. } \end{aligned}$$
(5.9)

where we further have

$$\begin{aligned} g(x) := \left\{ \begin{array}{lll} 6x,&{}\text {if}\ \ 0\le x\le 1/6,\\ 2(1-3x),&{}\text {if}\ \ 1/6< x\le 1/3,\\ 6(x-1/3),&{}\text {if}\ \ 1/3< x\le 1/2,\\ 2(1-3(x-1/3)),&{}\text {if}\ \ 1/2< x\le 2/3,\\ 6(x-2/3),&{}\text {if}\ \ 2/3< x\le 5/6,\\ 2(1-3(x-2/3)),&{}\text {if}\ \ 5/6< x\le 1.\\ \end{array}\right. \end{aligned}$$
(5.10)

On a coarse \(64\times 64\) grid (as in [27]), with \(\mu =0.1\), \(r_1=0.008\), \(r_2=15.625\), and a tolerance of \(\epsilon =10^{-6}\), the algorithm converges in 1104 iterations, requiring only 0.45 s. The algorithm presented in [27] is reported to require 916.8–1275.1 s, which is more than three orders of magnitude slower. The computed minimum-energy membrane and the associated contact set (coincidence set) are shown in Fig. 5 and compare very well with the results in [27].

Fig. 5

Double obstacle with force: elasto-plastic torsion problem. a Computed optimal membrane u. b Computed contact set: free membrane (blue), contact with upper obstacle (red), and contact with lower obstacle (green). Compare with results in [27]. (Color figure online)

6 Conclusions and Outlook

In this paper we have proposed a primal-dual hybrid gradients based method for the efficient solution of the obstacle problem. We consider both the non-linear and linearized versions of the discrete obstacle problem, as well as both the original, constrained formulation and the \(L^1\)-penalty relaxation. All of these problems are convex minimization problems, and we start by reformulating them as primal-dual saddle point problems, based on the Legendre–Fenchel transform (convex conjugate) of the surface area and the Dirichlet energy, respectively. The resulting saddle point problem is solved by the primal-dual hybrid gradients method, which consists of three iterative steps: the dual and primal proximal updates, and an extragradient step (overrelaxation) of the primal variable. The proximal updates can be solved efficiently, particularly so in the linear case. Indeed, the linear case allows eliminating the dual variable altogether, resulting in a scheme consisting of the primal variable update and the extragradient step only. We demonstrate on various 1D and 2D problems that the resulting scheme outperforms current state-of-the-art methods by 1–2 orders of magnitude.

In addition to being efficient, the proposed algorithm also benefits from a highly interesting physical interpretation, as discussed in Sects. 3.4, 4.1, and 4.2. Firstly, the elimination of the dual variable results in a single-variable scheme that is strongly reminiscent of gradient descent, except for the fact that previous iterates remain involved. As a result, a certain “momentum” builds up and accelerates the updates beyond the limits of the usual CFL step-size criteria, see Sect. 4.1. Indeed, away from the solution the previous iterates interfere constructively, resulting in a greatly increased apparent time step, whereas closer to the solution the membrane has a tendency to oscillate and the iterates cancel out, leading to increased stability. Similarly, the scheme can be brought into a form that is highly reminiscent of a damped wave equation (Sect. 4.2). This behavior is particularly well illustrated by the toy example considered in Fig. 1, where the momentum splits an initial central state into two separate parts, which then propagate away laterally and eventually dissipate.

A general advantage of the proposed primal-dual based method for the obstacle problem is the absence of matrix/operator inversions, since the proximal updates require only explicit operations involving, at most, the operators and their adjoints. Also, in analogy to [23, 26] but in contrast to, e.g., [12, 13], the proposed method does not need to track the contact set explicitly; indeed, the method intrinsically handles the interface between the coincidence set and the harmonic membrane as a free boundary. Finally, we note that multigrid and multilevel methods, which have been employed to speed up computation in other approaches such as [28], may be applicable to the proposed method as well and would be expected to further speed up the computations.

We remark that the properties of the proposed algorithm are not specifically engineered, but emerge from a disciplined derivation based on convex optimization; see Remark 2. More applications of this method, including elliptic PDE problems, will be reported in [33].