1 Introduction

The total variation (TV)-seminorm \(\vert \,\cdot \,\vert _\mathrm{TV}\) is ubiquitous as a regularizing functional in image analysis and related applications; see for instance [12, 15, 26, 46]. When \(\varOmega \subset \mathbb {R}^2\) is a bounded domain, this seminorm is defined as

$$\begin{aligned}&\vert u\vert _{\mathrm{TV}(\varOmega )}\nonumber \\&\quad :=\sup \left\{ \int _\varOmega u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x: {\varvec{p}}\in C^\infty _c(\varOmega ;\mathbb {R}^2), \; \vert {\varvec{p}}\vert _{s^*} \le 1 \right\} ,\nonumber \\ \end{aligned}$$
(1)

where \(s \in [1,\infty ]\), \(s^* = \frac{s}{s-1}\) denotes the conjugate of s and \(\vert \,\cdot \,\vert _{s^*}\) is the usual \(s^*\)-norm of vectors in \(\mathbb {R}^2\). Frequent choices include \(s = 2\) (the isotropic case) and \(s = 1\), see Fig. 1.

Fig. 1 A \(\mathcal {D}\mathcal {G}_{0}(\varOmega )\) function u with values 0 and 1 on two triangles forming the unit square \(\varOmega \) (left), and the value of the associated TV-seminorm \(\vert u\vert _{\mathrm{TV}(\varOmega )} = \vert u\vert _{\mathrm{DTV}(\varOmega )}\) as a function of the rotation angle of the mesh

It has been observed in [19] that “the rigorous definition of the TV for discrete images has received little attention.” In this paper we propose and analyze a discrete analogue of (1) for functions u belonging to a space \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) or \(\mathcal {C}\mathcal {G}_{r}(\varOmega )\) of globally discontinuous or continuous finite element functions of polynomial degree \(0 \le r \le 4\) on a geometrically conforming, simplicial triangulation of \(\varOmega \), consisting of triangles T and interior edges E.

In this case, it is not hard to see that TV-seminorm (1) can be evaluated as

$$\begin{aligned} \vert u\vert _{\mathrm{TV}(\varOmega )} = \sum _T \int _T \vert \nabla u\vert _s \, \text{ d }x+ \sum _E \int _E \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \, \text{ d }S, \end{aligned}$$
(2)

where \(\llbracket u \, {\varvec{n}}\rrbracket \) denotes the vector-valued jump of a function in normal direction across an interior edge of the triangulation.

It is intuitively clear that when u is confined to a finite element space such as \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) or \(\mathcal {C}\mathcal {G}_{r}(\varOmega )\), then it ought to be sufficient to consider the supremum in (1) over all vector fields \({\varvec{p}}\) from an appropriate finite-dimensional space as well. Indeed, we show that this is the case, provided that TV-seminorm (2) is replaced by its discrete analogue

$$\begin{aligned} \vert u\vert _{\mathrm{DTV}(\varOmega )} :=\sum _T \int _T \mathcal {I}_T \big \{ \vert \nabla u\vert _s \big \} \, \text{ d }x+ \sum _E \int _E \mathcal {I}_E \big \{ \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \big \} \, \text{ d }S, \end{aligned}$$
(3)

which we term the discrete TV-seminorm. Here \(\mathcal {I}_T\) and \(\mathcal {I}_E\) are local interpolation operators into the polynomial spaces \(\mathcal {P}_{r-1}(T)\) and \(\mathcal {P}_r(E)\), respectively. Therefore, (3) amounts to the application of a nodal quadrature formula for the integrals appearing in (2). We emphasize that both (2) and (3) are isotropic when \(s = 2\), i.e., invariant w.r.t. rotations of the coordinate system. In the lowest order case (\(r = 0\)) of piecewise constant functions, the first sum in (3) is zero and only edge contributions appear. Moreover, in this case (2) and (3) coincide since \(\llbracket u \, {\varvec{n}}\rrbracket \) is constant on edges. In general, we will show that the difference between (2) and (3) is of the order of the mesh size, see Proposition 3.1.
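For a concrete instance (a computation we add for orientation, based on the configuration shown in Fig. 1), consider the \(\mathcal {D}\mathcal {G}_{0}(\varOmega )\) function with values 0 and 1 on the two triangles of the axis-aligned unit square. All cell gradients vanish, the scalar jump across the diagonal edge E of length \(\sqrt{2}\) has height 1, and hence

$$\begin{aligned} \vert u\vert _{\mathrm{TV}(\varOmega )} = \vert u\vert _{\mathrm{DTV}(\varOmega )} = \int _E \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \, \text{ d }S= \vert {\varvec{n}}_{E}\vert _s \, \vert E\vert = {\left\{ \begin{array}{ll} \sqrt{2} &{} \text {for } s = 2, \\ 2 &{} \text {for } s = 1, \end{array}\right. } \end{aligned}$$

which reflects the rotational invariance for \(s = 2\) and the anisotropy for \(s = 1\) visible in Fig. 1.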

Using (3) in place of (2) in optimization problems in imaging offers a number of significant advantages. Specifically, we will show in Theorem 3.1 that (3) has a discrete dual representation

$$\begin{aligned} \vert u\vert _{\mathrm{DTV}(\varOmega )}&= \max \Bigg \{ \int _\varOmega u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x: {\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega ) \Bigg . \phantom {\Bigg \}} \nonumber \\&\quad \Bigg . \text {s.t. a number of simple constraints} \Bigg \} \end{aligned}$$
(4)

for \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\), where \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\) denotes the space of Raviart–Thomas finite element functions of order \(r+1\), and \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) is the subspace defined by \({\varvec{p}}\cdot {\varvec{n}}= 0\) (where \({\varvec{n}}\) is the outer normal of unit Euclidean length) on the boundary of \(\varOmega \). In the lowest order case \(r = 0\) in particular, one obtains

$$\begin{aligned}&\vert u\vert _{\mathrm{DTV}(\varOmega )} = \max \Bigg \{ \int _\varOmega \! u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x: {\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;1}^0(\varOmega ) \Bigg .,\nonumber \\&\quad \Bigg . \int _E \vert {\varvec{p}}\cdot {\varvec{n}}_{E}\vert \, \text{ d }S\le \vert E\vert \, \vert {\varvec{n}}_{E}\vert _s \text { on interior edges} \Bigg \} . \end{aligned}$$
(5)

Here \({\varvec{n}}_{E}\) denotes a normal vector of arbitrary orientation and unit Euclidean length, i.e., \(\vert {\varvec{n}}_{E}\vert _2 = 1\), on an interior edge E, and \(\vert E\vert \) denotes the (Euclidean) edge length. Since the expressions \(\int _E \vert {\varvec{p}}\cdot {\varvec{n}}_{E}\vert \, \text{ d }S\) are exactly the degrees of freedom typically used to define the basis in \(\mathcal {R}\mathcal {T}_{\!\!\;1}(\varOmega )\), the constraints in (5) are in fact simple bound constraints on the coefficient vector of \({\varvec{p}}\). For comparison, the pointwise restrictions \(\vert {\varvec{p}}\vert _{s^*} \le 1\) appearing in (1) are nonlinear unless \(s^* \in \{1,\infty \}\). For the case of higher-order finite elements, i.e., \(1 \le r \le 4\), further constraints in (4) impose an upper bound on the \(\vert \,\cdot \,\vert _{s^*}\)-norm of pairs of coefficients of \({\varvec{p}}\), see Theorem 3.1. Consequently, these constraints are likewise linear in the important special case \(s = 1\). In any case, each coefficient of \({\varvec{p}}\) is constrained only once.

As a consequence of (4), we establish that optimization problems utilizing discrete TV-seminorm (3) as a regularizer possess a discrete dual problem with very simple constraints. This applies, in particular, to the famous TV-\(L^2\) and TV-\(L^1\) models; see [46] and [15, 26, 41], respectively. The structure of the primal and dual problems is in turn essential for the efficient implementation of appropriate solution algorithms. As one of the main contributions of this paper, we are able to show that a variety of popular algorithms for TV-\(L^2\), originally developed in the context of finite difference discretizations on Cartesian grids, apply with little or no changes to discretizations with low- or higher-order finite elements. Specifically, we consider the split Bregman algorithm [30] and the primal–dual method of Chambolle and Pock [13] for TV-\(L^2\) denoising and inpainting problems. We mention that Chambolle’s projection method [11] and a primal–dual active set method similar to [34] can also be considered, as well as algorithms for TV-\(L^1\) and a ‘Huberized’ version of (3); we refer the reader to the extended preprint [33] for details.

There are multiple motivations to study finite element discretizations of the TV-seminorm, in imaging and beyond. First, finite element discretizations lend themselves to applications whenever the data are not represented on a Cartesian grid. While we focus in this paper mainly on the mathematical theory on triangular grids, we mention, for instance, that honeycombed octagonal CCD sensor layouts are in use in consumer cameras, e.g., the Fujifilm SuperCCD sensor. Furthermore, nonrectangular sub-pixel configurations appear to be promising for spatially varying exposure (SVE) sensors for high-dynamic-range (HDR) imaging, see [38], and super-resolution applications, see [8, 47, 54]. Image processing problems on nonregular pixel layouts have been previously considered in [18, 35, 36, 50]. Further applications of higher-order discretizations in imaging arise when the image data to be reconstructed are not a priori quantized into piecewise constant pixel values.

Second, (1) is popular as a regularizer in inverse coefficient problems for partial differential equations; see for instance [4, 16, 17]. In this situation, a discretization by finite elements of both the state and the unknown coefficient is often the natural choice, in particular on nontrivial geometries. Third, finite element discretizations generalize easily to higher order simply by increasing the polynomial degree. It is well known that higher-order discretizations can outperform mesh refinement approaches when the function to be approximated is sufficiently smooth. Finally, we anticipate that our approach can be extended to total generalized variation (TGV), introduced in [10], as well as to imaging problems on surfaces as in [32, 40], although this is not the subject of the present paper.

The vast majority of all publications to date dealing with the TV-seminorm use a (lowest order) finite difference approximation of (1) on Cartesian grids, where the divergence is approximated by one-sided differences. We are aware of only a few contributions including [1, 5, 6, 9, 17, 24, 27, 53] using lowest order (\(r = 1\)) continuous finite elements, i.e., \(u \in \mathcal {C}\mathcal {G}_{1}(\varOmega )\). In this case the edge-jump contributions in (2) and (3) vanish, and since \(\nabla u \in \mathcal {D}\mathcal {G}_{0}(\varOmega )\) holds, formulas (2) and (3) coincide. Moreover, the case \(u \in \mathcal {D}\mathcal {G}_{0}(\varOmega )\) on uniform, rectangular grids, i.e., pixel images, is discussed in [37, 49]. Recently, [14] proposed a different discrete approximation of the total variation over the Crouzeix–Raviart finite element space for the image data u, which lies in between \(\mathcal {D}\mathcal {G}_{1}(\varOmega )\) and \(\mathcal {C}\mathcal {G}_{1}(\varOmega )\).

To the best of our knowledge, the definition of discrete TV-seminorm (3) as well as the role of the Raviart–Thomas finite element space to establish dual representation (4) are novel contributions of the present work.

This paper is structured as follows. We collect some background material on finite elements in Sect. 2. In Sect. 3 we establish dual representation (4) of discrete TV-seminorm (3). We also derive an estimate of the error between (3) and (2). We present the discrete TV-\(L^2\) model along with its dual in Sect. 4. In Sect. 5 we show that two well-known algorithms for TV-\(L^2\) image denoising and inpainting can be applied in our (possibly higher-order) finite element setting with little or no changes compared to their classical counterparts in the Cartesian finite difference domain. Further implementation details in the finite element framework FEniCS are given in Sect. 6, and numerical results for TV-\(L^2\) denoising and inpainting are presented in Sect. 7. We conclude with an outlook in Sect. 8.

1.1 Notation

Let \(\varOmega \subset \mathbb {R}^2\) be a bounded domain with polygonal boundary. We denote by \(L^2(\varOmega )\) and \(H^1(\varOmega )\) the usual Lebesgue and Sobolev spaces. \(H^1_0(\varOmega )\) is the subspace of \(H^1(\varOmega )\) of functions having zero trace on the boundary \(\partial \varOmega \). The vector-valued counterparts of these spaces as well as all vector-valued functions will be written in bold-face notation. Moreover, we define

$$\begin{aligned} {\varvec{H}}({\mathrm{div}}\,;\varOmega ) :=\big \{ {\varvec{p}}\in {\varvec{L}}^2(\varOmega ): {\mathrm{div}}\,{\varvec{p}}\in L^2(\varOmega ) \big \} \end{aligned}$$

and \({\varvec{H}}_0({\mathrm{div}}\,;\varOmega )\) is the subspace of functions having zero normal trace on the boundary, i.e., \({\varvec{p}}\cdot {\varvec{n}}= 0\).

2 Finite Element Spaces

Suppose that \(\varOmega \) is triangulated by a geometrically conforming mesh (no hanging nodes) consisting of nondegenerate triangular cells T and interior edges E. Recall that on each interior edge, \({\varvec{n}}_{E}\) denotes the unit normal vector (of arbitrary but fixed orientation). Throughout, \(r \ge 0\) denotes the degree of certain polynomials.

2.1 Lagrangian Finite Elements

Let \(\mathcal {P}_r(T)\) denote the space of scalar, bivariate polynomials on T of total degree at most r. The dimension of \(\mathcal {P}_r(T)\) is \((r+1) \, (r+2) / 2\). Let \(\{\Phi _{T,k}\}\) denote the standard nodal basis of \(\mathcal {P}_r(T)\) with associated Lagrange nodes \(\{X_{T,k}\}\), \(k=1,\ldots , (r+1)\,(r+2)/2\). In other words, each \(\Phi _{T,k}\) is a function in \(\mathcal {P}_r(T)\) satisfying \(\Phi _{T,k}(X_{T,k'}) = \delta _{kk'}\). We denote by

$$\begin{aligned} \mathcal {D}\mathcal {G}_{r}(\varOmega )&:=\big \{ u \in L^2(\varOmega ): { \left. u \phantom {|} \right| _{T} } \in \mathcal {P}_r(T) \big \}, \quad r \ge 0, \end{aligned}$$
(6)
$$\begin{aligned} \mathcal {C}\mathcal {G}_{r}(\varOmega )&:=\big \{ u \in C(\varOmega ): { \left. u \phantom {|} \right| _{T} } \in \mathcal {P}_r(T) \big \}, \quad r \ge 1, \end{aligned}$$
(7)

the standard finite element spaces of globally discontinuous (\(L^2\)-conforming) or continuous (\(H^1\)-conforming) piecewise polynomials of degree r. A finite element function \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\) or \(\mathcal {C}\mathcal {G}_{r}(\varOmega )\), restricted to T, is represented by its coefficient vector w.r.t. the basis \(\{\Phi _{T,k}\}\), which is simply given by point evaluations. We use the notation

$$\begin{aligned} u_{T,k} = { \left. u \phantom {|} \right| _{T} }(X_{T,k}) \end{aligned}$$

to denote the elements of the coefficient vector of a function \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\) or \(\mathcal {C}\mathcal {G}_{r}(\varOmega )\).

Frequently we will also work with the space \(\mathcal {P}_{r-1}(T)\), whose standard nodal basis and Lagrange nodes we denote by \(\{\varphi _{T,i}\}\) and \(\{x_{T,i}\}\), \(i=1,\ldots , r\,(r+1)/2\). The interpolation operator into this space (used in definition (3) of \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\)) is defined by

$$\begin{aligned} \mathcal {I}_T \{ v \} :=\sum _{i=1}^{r\,(r+1)/2} v(x_{T,i}) \, \varphi _{T,i}. \end{aligned}$$

Similarly, \(\mathcal {P}_r(E)\) denotes the space of univariate scalar polynomials on E of maximal degree r, which has dimension \(r+1\). Let \(\{\varphi _{E,j}\}\) denote the standard nodal basis of \(\mathcal {P}_r(E)\) with associated Lagrange nodes \(\{x_{E,j}\}\), \(j=1,\ldots , r+1\). The associated interpolation operator becomes

$$\begin{aligned} \mathcal {I}_E \{ v \} :=\sum _{j=1}^{r+1} v(x_{E,j}) \, \varphi _{E,j}. \end{aligned}$$

Finally, we address the definition of the jump of a \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) function across an interior edge E connecting two cells \(T_1\) and \(T_2\) with their respective outer normals \({\varvec{n}}_1\) and \({\varvec{n}}_2 = - {\varvec{n}}_1\) of unit length. We recall that the edge normal \({\varvec{n}}_E\) coincides with either \({\varvec{n}}_1\) or \({\varvec{n}}_2\), and we distinguish between the scalar jump and the vector-valued jump in normal direction,

$$\begin{aligned} \llbracket u\rrbracket :={ \left. u \phantom {|} \right| _{T_1} } ({\varvec{n}}_1 \cdot {\varvec{n}}_E) + { \left. u \phantom {|} \right| _{T_2} } ({\varvec{n}}_2 \cdot {\varvec{n}}_E) \quad \text {and} \quad \llbracket u \, {\varvec{n}}\rrbracket :={ \left. u \phantom {|} \right| _{T_1} } \, {\varvec{n}}_1 + { \left. u \phantom {|} \right| _{T_2} } \, {\varvec{n}}_2. \end{aligned}$$
(8)

Notice that the sign of \(\llbracket u\rrbracket \) depends on the orientation of \({\varvec{n}}_{E}\), while \(\llbracket u \, {\varvec{n}}\rrbracket \) does not. For instance when \({\varvec{n}}_{E} = {\varvec{n}}_1\), then \(\llbracket u\rrbracket :={ \left. u \phantom {|} \right| _{T_1} } - { \left. u \phantom {|} \right| _{T_2} }\) holds. Moreover, we point out that \(\llbracket u \, {\varvec{n}}\rrbracket = \llbracket u\rrbracket \, {\varvec{n}}_{E}\) holds, and consequently \(\bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s = \vert {\varvec{n}}_{E}\vert _s \, \bigl \vert \llbracket u\rrbracket \bigr \vert \).

2.2 Raviart–Thomas Finite Elements

For \(r \ge 0\), we denote by

$$\begin{aligned}&\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\nonumber \\&\quad :=\big \{ {\varvec{p}}\in {\varvec{H}}({\mathrm{div}}\,;\varOmega ): { \left. {\varvec{p}} \phantom {|} \right| _{T} } \in \mathcal {P}_r(T)^2 + {\varvec{x}}\, \mathcal {P}_r(T) \big \} \end{aligned}$$
(9)

the (\({\varvec{H}}({\mathrm{div}}\,;\varOmega )\)-conforming) Raviart–Thomas finite element space of order \(r+1\). Moreover, \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) is the subspace of functions satisfying \({\varvec{p}}\cdot {\varvec{n}}= 0\) along the boundary of \(\varOmega \). The dimension of the polynomial space on each cell is \((r+1) \, (r+3)\). Notice that several choices of local bases for \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}(T)\) are described in the literature, based on either point evaluations or integral moments as degrees of freedom (dofs). Clearly, a change of the basis does not alter the finite element space but only the representation of its members, which can be identified with their coefficient vectors w.r.t. a particular basis. For the purpose of this paper, it is convenient to work with the following global degrees of freedom of integral type for \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\); see [39, Ch. 3.4.1]:

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}})&:=\int _T \varphi _{T,i} \, {\varvec{p}}\, \text{ d }x, \quad i = 1, \ldots , \textstyle r\,(r+1)/2, \end{aligned}$$
(10a)
$$\begin{aligned} \sigma _{E,j}({\varvec{p}})&:=\int _E \varphi _{E,j} \,({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S, \quad j = 1, \ldots , r+1. \end{aligned}$$
(10b)

We will refer to (10a) as triangle-based, or interior, dofs and to (10b) as edge-based dofs. Notice that while the edge-based dofs are scalar, the triangle-based dofs have values in \(\mathbb {R}^2\) for notational convenience. The global basis functions for the space \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\) are denoted by \(\varvec{\psi }^T_{i}\) and \(\varvec{\psi }^E_{j}\), respectively. Notice that \(\varvec{\psi }^T_{i}\) is \(\mathbb {R}^{2 \times 2}\)-valued. As is the case for all finite element spaces, any dof applied to any of the basis functions evaluates to zero except

$$\begin{aligned} {\varvec{\sigma }}_{T,i}\big (\varvec{\psi }^T_{i}\big ) = {\mathrm{Id}}_{2 \times 2} \quad \text {and} \quad \sigma _{E,j}\big (\varvec{\psi }^E_{j}\big ) = 1. \end{aligned}$$
(11)

Let us emphasize that for any function \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\), dof values (10) are precisely the coefficients of \({\varvec{p}}\) w.r.t. the basis, i.e.,

$$\begin{aligned} {\varvec{p}}= \sum _T \sum _{i=1}^{r\,(r+1)/2} {\varvec{\sigma }}_{T,i}({\varvec{p}}) \, \varvec{\psi }^T_{i} + \sum _E \sum _{j=1}^{r+1} \sigma _{E,j}({\varvec{p}}) \, \varvec{\psi }^E_{j}. \end{aligned}$$
(12)
Table 1 Finite element spaces, their degrees of freedom and corresponding bases. Here \(N_T\), \(N_E\) and \(N_V\) denote the number of triangles, interior edges and vertices in the triangular mesh. A term like \((r-a)^+\) should be understood as \(\max \{r-a,0\}\)

2.3 Index Conventions

In order to reduce the notational overhead, we are going to associate specific ranges for any occurrence of the indices i, j and k in the sequel:

$$\begin{aligned} \begin{aligned}&i \in \{1, \ldots , r \, (r+1)/2 \} \text { as in the basis functions}\\&\qquad \varphi _{T,i}\text { of }\mathcal {P}_{r-1}(T)\text { and dofs } {\varvec{\sigma }}_{T,i}\text { in }\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega ) ,\\&j \in \{1, \ldots , r+1 \} \text { as in the basis functions}\\&\qquad \varphi _{E,j}\text { of }\mathcal {P}_r(E)\text { and dofs } \sigma _{E,j}\text { in }\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega ) ,\\&k \in \{1, \ldots , (r+1)(r+2)/2 \} \text { as in the basis functions}\\&\qquad \Phi _{T,k}\text { of }\mathcal {P}_r(T) . \end{aligned} \end{aligned}$$

For instance, (12) will simply be written as

$$\begin{aligned} {\varvec{p}}= \sum _{T,i} {\varvec{\sigma }}_{T,i}({\varvec{p}}) \, \varvec{\psi }^T_{i} + \sum _{E,j} \sigma _{E,j}({\varvec{p}}) \, \varvec{\psi }^E_{j} \end{aligned}$$

in what follows. For convenience, we summarize the notation for the degrees of freedom and basis functions needed throughout the paper in Table 1.

3 Properties of the Discrete Total Variation

In this section we investigate the properties of the discrete total variation seminorm

$$\begin{aligned} \vert u\vert _{\mathrm{DTV}(\varOmega )} = \sum _T \int _T \mathcal {I}_T \big \{ \vert \nabla u\vert _s \big \} \, \text{ d }x+ \sum _E \int _E \mathcal {I}_E \big \{ \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \big \} \, \text{ d }S\end{aligned}$$

for functions \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\). Recall that \(\mathcal {I}_T\) and \(\mathcal {I}_E\) are local interpolation operators into the polynomial spaces \(\mathcal {P}_{r-1}(T)\) and \(\mathcal {P}_r(E)\), respectively. In terms of the Lagrangian bases \(\{\varphi _{T,i}\}\) and \(\{\varphi _{E,j}\}\) of these spaces, we have

$$\begin{aligned} \int _T \mathcal {I}_T \big \{ \vert \nabla u\vert _s \big \} \, \text{ d }x&= \sum _i \bigl \vert \nabla u(x_{T,i})\bigr \vert _s \, c_{T,i}, \end{aligned}$$
(13a)
$$\begin{aligned} \int _E \mathcal {I}_E \big \{ \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \big \} \, \text{ d }S&= \sum _j \vert {\varvec{n}}_E\vert _s \, \bigl \vert \llbracket u\rrbracket (x_{E,j})\bigr \vert \, c_{E,j}, \end{aligned}$$
(13b)

where the weights are given by

$$\begin{aligned} c_{T,i} :=\int _T \varphi _{T,i} \, \text{ d }x\quad \text {and} \quad c_{E,j} :=\int _E \varphi _{E,j} \, \text{ d }S. \end{aligned}$$
(14)
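For orientation, we add the lowest-degree instances of the edge weights; since the Lagrange nodes on E are equally spaced, these are the closed Newton–Cotes (trapezoidal and Simpson) weights,

$$\begin{aligned} c_{E,1} = c_{E,2} = \tfrac{1}{2} \, \vert E\vert \quad (r = 1) \quad \text {and} \quad c_{E,1} = c_{E,3} = \tfrac{1}{6} \, \vert E\vert , \quad c_{E,2} = \tfrac{2}{3} \, \vert E\vert \quad (r = 2); \end{aligned}$$

compare the proof of Lemma 3.1 below.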

Figure 2 provides an illustration of the difference between the edge contributions \(\int _E \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \, \text{ d }S\) to \(\vert u\vert _{\mathrm{TV}(\varOmega )}\) and \(\int _E \mathcal {I}_E \big \{ \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \big \} \, \text{ d }S\) to \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\).

Fig. 2 Illustration of typical edge-jump contributions to \(\vert u\vert _{\mathrm{TV}(\varOmega )}\) and to \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\). The green and red curves show \(\llbracket u\rrbracket \) and \(\vert \llbracket u\rrbracket \vert \), respectively, and the blue curve shows \(\mathcal {I}_E \big \{ \vert \llbracket u\rrbracket \vert \big \}\) for polynomial degrees \(r = 1\) (left) and \(r = 2\) (right). The left picture also confirms \(\vert u\vert _{\mathrm{TV}(\varOmega )} \le \vert u\vert _{\mathrm{DTV}(\varOmega )}\) when \(r = 1\), see Corollary 3.1, while \(\vert u\vert _{\mathrm{TV}(\varOmega )}\) may be larger or smaller than \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\) when \(r \in \{2,3,4\}\)

By virtue of the fact that \({ \left. \nabla u \phantom {|} \right| _{T} } \in \mathcal {P}_{r-1}(T)^2\) and \(\llbracket u\rrbracket \in \mathcal {P}_r(E)\), it is clear that \(\vert \,\cdot \,\vert _{\mathrm{DTV}(\varOmega )}\) is indeed a seminorm on \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\), provided that all weights \(c_{T,i}\) and \(c_{E,j}\) are nonnegative. The following lemma shows that this is the case for polynomial degrees \(0 \le r \le 4\).

Lemma 3.1

(Lagrange basis functions with positive integrals)

(a) Let \(T \subset \mathbb {R}^2\) be a triangle and \(1 \le r \le 4\). Then \(c_{T,i} \ge 0\) holds for all \(i = 1, \ldots , r \, (r+1)/2\). When \(r \ne 3\), then all \(c_{T,i} > 0\).

(b) Let \(E \subset \mathbb {R}^2\) be an edge and \(0 \le r \le 7\). Then \(c_{E,j} > 0\) holds for all \(j = 1, \ldots , r+1\).

Proof

Given that the Lagrange points form a uniform lattice on either T or E, the values of \(c_{T,i}\) and \(c_{E,j}\) are precisely the integration weights of the closed Newton–Cotes formulas. For triangles, these weights are tabulated, e.g., in [48, Tab. I] for orders \(0 \le r \le 8\), and they confirm (a). For edges (intervals), we refer the reader to, e.g., [21, Ch. 2.5] or [20, Ch. 5.1.5], which confirms (b). \(\square \)

We can now prove the precise form of dual representation (4) of discrete TV-seminorm (3).

Theorem 3.1

(Dual Representation of \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\)) Suppose \(0 \le r \le 4\). Then for any \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\), discrete TV-seminorm (3) satisfies

$$\begin{aligned} \vert u\vert _{\mathrm{DTV}(\varOmega )}&= \sup \Bigg \{ \int _\varOmega u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x: {\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega ), \nonumber \\&\quad \vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _{s^*} \le c_{T,i} \text { for all }T, i = 1, \ldots , r \, (r+1)/2, \nonumber \\&\quad \vert \sigma _{E,j}({\varvec{p}})\vert \le \vert {\varvec{n}}_E\vert _s \, c_{E,j} \text { for all } E, j = 1, \ldots , r+1 \Bigg \} . \end{aligned}$$
(15)

Proof

We begin with the observation that integration by parts yields

$$\begin{aligned}&- \int _\varOmega u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x= - \sum _T \int _T u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x\nonumber \\&\quad = \sum _T \int _T \nabla u \cdot {\varvec{p}}\, \text{ d }x+ \sum _E \int _E \llbracket u\rrbracket \, ({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S\end{aligned}$$
(16)

for any \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\) and \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\), i.e., \({\varvec{p}}\cdot {\varvec{n}}= 0\) on the boundary \(\partial \varOmega \).

Let us consider one of the edge integrals first. Notice that \(\llbracket u\rrbracket \in \mathcal {P}_r(E)\) holds and thus \(\llbracket u\rrbracket = \sum _j v_j \, \varphi _{E,j}\) with coefficients \(v_j = \llbracket u\rrbracket (x_{E,j})\). By duality property (11) of the basis of \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\), we obtain

$$\begin{aligned}&\int _E \llbracket u\rrbracket \, ({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S\\&\quad = \sum _j v_j \int _E \varphi _{E,j} \, ({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S= \sum _j v_j \, \sigma _{E,j}({\varvec{p}}). \end{aligned}$$

The maximum of this expression w.r.t. \({\varvec{p}}\) verifying the constraints in (15) is attained when

$$\begin{aligned}\sigma _{E,j}({\varvec{p}}) = {\mathrm{sgn}}\,(v_j) \, \vert {\varvec{n}}_E\vert _s \, c_{E,j}\end{aligned}$$

holds. Here we are using the fact that \(c_{E,j} > 0\) holds; see Lemma 3.1. Choosing \({\varvec{p}}\) as the maximizer yields

$$\begin{aligned} \int _E \llbracket u\rrbracket \, ({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S= \sum _j \vert v_j\vert \, \vert {\varvec{n}}_E\vert _s \, c_{E,j} = \int _E \mathcal {I}_E \big \{ \bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s \big \} \, \text{ d }S, \end{aligned}$$

where we used \(\vert v_j\vert = \bigl \vert \llbracket u\rrbracket (x_{E,j})\bigr \vert = \bigl \vert \llbracket u\rrbracket \bigr \vert (x_{E,j})\) and thus \(\sum _j \vert v_j\vert \, \vert {\varvec{n}}_E\vert _s \, c_{E,j} = \int _E \mathcal {I}_E \big \{ \vert {\varvec{n}}_E\vert _s \, \vert \llbracket u\rrbracket \vert \big \} \, \text{ d }S\) in the last step.

Next we consider an integral over a triangle, which is relevant only when \(r \ge 1\). Since \(u \in \mathcal {P}_r(T)\) holds, we have \(\nabla u \in \mathcal {P}_{r-1}(T)^2\) and thus \(\nabla u = \sum _i \varphi _{T,i} \, {\varvec{w}}_i\) with vector-valued coefficients \({\varvec{w}}_i = \nabla u(x_{T,i})\). Using again duality property (11) of the basis of \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\), we obtain

$$\begin{aligned} \int _T \nabla u \cdot {\varvec{p}}\, \text{ d }x= \sum _i {\varvec{w}}_i \cdot \int _T \varphi _{T,i} \, {\varvec{p}}\, \text{ d }x= \sum _i {\varvec{w}}_i \cdot {\varvec{\sigma }}_{T,i}({\varvec{p}}). \end{aligned}$$

By virtue of Hölder’s inequality, the maximum of this expression w.r.t. \({\varvec{p}}\) verifying the constraints in (15) can be characterized explicitly. When \({\varvec{w}}_i \ne {\varvec{0}}\) and \(1 \le s < \infty \), then the maximum is attained when

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = \begin{pmatrix} ({\mathrm{sgn}}\,{\varvec{w}}_{i,1}) \, \vert {\varvec{w}}_{i,1}\vert ^{s-1} \\ ({\mathrm{sgn}}\,{\varvec{w}}_{i,2}) \, \vert {\varvec{w}}_{i,2}\vert ^{s-1} \end{pmatrix} \frac{c_{T,i}}{\vert {\varvec{w}}_i\vert _s^{s-1}}. \end{aligned}$$

Similarly, in case \({\varvec{w}}_i \ne {\varvec{0}}\) and \(s = \infty \), we choose

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = {\left\{ \begin{array}{ll} c_{T,i} \, ({\mathrm{sgn}}\,{\varvec{w}}_{i,\ell }) &{}\quad \text {for exactly one component} \\ &{}\quad \ell \in \{1,2\}\text { s.t. }\ \vert {\varvec{w}}_{i,\ell }\vert = \vert {\varvec{w}}_i\vert _\infty , \\ 0 &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

When \({\varvec{w}}_i = {\varvec{0}}\) holds, \({\varvec{\sigma }}_{T,i}({\varvec{p}})\) can be chosen arbitrarily but subject to \(\vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _{s^*} \le c_{T,i}\). In any case, we arrive at the optimal value \({\varvec{w}}_i \cdot {\varvec{\sigma }}_{T,i}({\varvec{p}}) = c_{T,i} \, \vert {\varvec{w}}_i\vert _s\). As before, we are using here the fact that \(c_{T,i} \ge 0\) holds; see again Lemma 3.1. For an optimal \({\varvec{p}}\), we thus have

$$\begin{aligned}&\int _T \nabla u \cdot {\varvec{p}}\, \text{ d }x= \sum _i \vert {\varvec{w}}_i\vert _s \, c_{T,i}\\&\quad = \sum _i \int _T \vert {\varvec{w}}_i\vert _s \, \varphi _{T,i} \, \text{ d }x= \int _T \mathcal {I}_T \big \{ \vert \nabla {u}\vert _s \big \} \, \text{ d }x, \end{aligned}$$

where we used \(\vert {\varvec{w}}_i\vert _s = \vert \nabla u(x_{T,i})\vert _s = \vert \nabla u\vert _s(x_{T,i})\) in the last step.

Finally, we point out that each summand in (16) depends on \({\varvec{p}}\) only through the dof values \({\varvec{\sigma }}_{T,i}({\varvec{p}})\) or \(\sigma _{E,j}({\varvec{p}})\) associated with one particular triangle or edge. Consequently, the maximum of (16) is attained if and only if each summand attains its maximum subject to the constraints on the dof values set forth in (15). Since \(-{\varvec{p}}\) verifies the same constraints as \({\varvec{p}}\), the maxima over \(\pm \int _\varOmega u {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x\) coincide and (15) is proved. \(\square \)

Remark 3.1

(The lowest order case \(r = 0\)) In the lowest order case \(r = 0\), the only basis function on any interior edge E is \(\varphi _{E,1} \equiv 1\) so that \(c_{E,1} = \vert E\vert \) holds. Consequently, (15) reduces to (5).

It may appear peculiar that the constraints for the edge dofs in (15) are scalar and linear, while the constraints for the pairwise triangle dofs \({\varvec{\sigma }}_{T,i}({\varvec{p}}) \in \mathbb {R}^2\) are generally nonlinear. Notice, however, that it becomes evident in the proof of Theorem 3.1 that the edge dofs are utilized to measure the contributions in \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\) associated with the edge jumps of u, while the triangle dofs account for the contributions attributed to the gradient \(\nabla u\). Since the edge jumps are maximal in the direction normal to the edge, scalar dofs suffice in order to determine the unknown jump height. On the other hand, both the norm and direction of the gradient are unknown and must be recovered from integration against suitable functions \({\varvec{p}}\). To this end, a variation of \({\varvec{\sigma }}_{T,i}({\varvec{p}})\) within a two-dimensional ball (w.r.t. the \(\vert \,\cdot \,\vert _{s^*}\)-norm) is required, leading to constraints \(\vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _{s^*} \le c_{T,i}\) on pairs of coefficients of \({\varvec{p}}\). Notice that those constraints appear for polynomial degrees \(r \ge 1\) and they are nonlinear unless \(s^* \in \{1,\infty \}\), which correspond to variants of the TV-seminorm with maximal anisotropy; compare Fig. 1.

We conclude this section by comparing TV-seminorm (2) with our discrete variant (3) for \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) functions. For the purpose of the following result, let us denote by \(\llbracket u\rrbracket '\) the tangential derivative (in arbitrary direction of traversal) of the scalar jump of u along an edge E. The symbol

$$\begin{aligned}&\vert u\vert _{W^{2,\infty }(T)} = \max \Big \{ \max _{x \in T} \left\{ \vert u_{x_1x_1}(x)\vert \right\} , \;\\&\quad \max _{x \in T} \left\{ \vert u_{x_1x_2}(x)\vert \right\} , \; \max _{x \in T} \left\{ \vert u_{x_2x_2}(x)\vert \right\} \Big \} \end{aligned}$$

is the \(W^{2,\infty }\)-seminorm of u on T. Moreover, we recall that the aspect ratio \(\gamma _T = h_T/\varrho _T\) of a triangle T is the ratio between its diameter (longest edge) \(h_T\) and the diameter \(\varrho _T\) of the maximal inscribed circle; see for instance [25, Definition 1.107].

Proposition 3.1

There is a constant \(C > 0\) such that

$$\begin{aligned}&\bigl \vert \vert u\vert _{\mathrm{TV}(\varOmega )} - \vert u\vert _{\mathrm{DTV}(\varOmega )} \bigr \vert \nonumber \\&\quad \le C \, h \, \Bigl ( \max _T \, \vert u\vert _{W^{2,\infty }(T)} + \sum _E \, \bigl \Vert \llbracket u\rrbracket ' \bigr \Vert _{L^1(E)} \Bigr ) \end{aligned}$$
(17)

holds for all \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\), \(0 \le r \le 4\), where \(h :=\max _T h_T\) is the mesh size. The constant C depends only on r, s, the maximal aspect ratio \(\max _T \gamma _T\) and the area \(\vert \varOmega \vert \).

Proof

We use (13) to interpret the discrete TV-seminorm as a quadrature rule applied to TV-seminorm (2). Note that no volume terms appear in the piecewise constant case \(r = 0\). In case \(r \ge 1\), we use [25, Lem. 8.4] with \(d = 2\), \(p = \infty \), \(k_q = 0\), and \(s = 1\) therein, for the volume terms in (13a). This result yields the existence of a constant \(C > 0\) such that

$$\begin{aligned} \biggl \vert \int _T v \, \text{ d }x- \sum _i \, v(x_{T,i}) \, c_{T,i} \biggr \vert \le C \, h_T^3 \, \vert v\vert _{W^{1,\infty }(T)} \end{aligned}$$

holds for all \(v \in W^{1,\infty }(T)\). Using this estimate for \(v = \vert \nabla u\vert _s\) shows

$$\begin{aligned}&\biggl \vert \int _T \vert \nabla u\vert _s \, \text{ d }x- \sum _i \, \bigl \vert \nabla u(x_{T,i})\bigr \vert _s \, c_{T,i} \biggr \vert \\&\quad \le C \, h_T^3 \, \bigl \vert \vert \nabla u\vert _s\bigr \vert _{W^{1,\infty }(T)}. \end{aligned}$$

(During the proof, C denotes a generic constant which may change from instance to instance.) Summing over T and using \(\sum _T h_T^2 \le C\) (depending on \(\vert \varOmega \vert \) and the maximal aspect ratio \(\max _T \gamma _T\)), we find

$$\begin{aligned}&\sum _T \, \biggl \vert \int _T \Bigl ( \vert \nabla u\vert _s - \mathcal {I}_T \big \{ \vert \nabla u\vert _s \big \} \Bigr ) \, \text{ d }x\biggr \vert \\&\quad = \sum _T \, \biggl \vert \int _T \vert \nabla u\vert _s \, \text{ d }x- \sum _i \, \bigl \vert \nabla u(x_{T,i})\bigr \vert _s \, c_{T,i} \biggr \vert \\&\quad \le C \, h \, \max _T \, \bigl \vert \vert \nabla u\vert _s\bigr \vert _{W^{1,\infty }(T)}. \end{aligned}$$

Since \({\varvec{v}}\mapsto \vert {\varvec{v}}\vert _s\) is globally Lipschitz continuous, we find that

$$\begin{aligned} \max _T \, \bigl \vert \vert \nabla u\vert _s\bigr \vert _{W^{1,\infty }(T)} \le C \, \max _T \, \vert u\vert _{W^{2,\infty }(T)}. \end{aligned}$$

Similarly, for each edge E, we will apply [25, Lem. 8.4] in (13b) (using \(d = 1\), \(p = 1\), \(k_q = 0\), and \(s = 1\) therein); note that the proof carries over to this limit case with \(p = 1\) and \(d = s\). This implies the existence of \(C > 0\) such that

$$\begin{aligned} \biggl \vert \int _E v \, \text{ d }S- \sum _j \, v(x_{E,j}) \, c_{E,j} \biggr \vert \le C \, h \, \Vert v' \Vert _{L^1(E)} \end{aligned}$$

holds for all \(v \in W^{1,1}(E)\), where \(v'\) denotes the tangential derivative of v. Using \(v = \bigl \vert \llbracket u\rrbracket \bigr \vert \) yields the estimate

$$\begin{aligned} \biggl \vert \int _E \bigl \vert \llbracket u\rrbracket \bigr \vert \, \text{ d }S- \sum _j \, \bigl \vert \llbracket u\rrbracket (x_{E,j})\bigr \vert \, c_{E,j} \biggr \vert \le C \, h \, \bigl \Vert \vert \llbracket u\rrbracket \vert ' \bigr \Vert _{L^1(E)}. \end{aligned}$$

Here, \(\vert \llbracket u\rrbracket \vert '\) is the tangential derivative of the absolute value of the jump of u on E. Notice that \(\bigl \Vert \vert \llbracket u\rrbracket \vert ' \bigr \Vert _{L^1(E)} = \bigl \Vert \llbracket u\rrbracket ' \bigr \Vert _{L^1(E)}\) holds. Summing over E yields

$$\begin{aligned}&\sum _E \, \biggl \vert \int _E \bigl \vert \llbracket u\rrbracket \bigr \vert - \mathcal {I}_E \big \{ \bigl \vert \llbracket u\rrbracket \bigr \vert \big \} \, \text{ d }S\biggr \vert \\&\quad = \sum _E \, \biggl \vert \int _E \bigl \vert \llbracket u\rrbracket \bigr \vert \, \text{ d }S- \sum _j \, \bigl \vert \llbracket u\rrbracket (x_{E,j})\bigr \vert \, c_{E,j} \biggr \vert \\&\quad \le C \, h \, \sum _E \, \bigl \Vert \llbracket u\rrbracket ' \bigr \Vert _{L^1(E)}. \end{aligned}$$

By using \(\bigl \vert \llbracket u \, {\varvec{n}}\rrbracket \bigr \vert _s = \vert {\varvec{n}}_E\vert _s \, \bigl \vert \llbracket u\rrbracket \bigr \vert \) on each edge, and combining the above estimates, we obtain the announced error bound. \(\square \)

Corollary 3.1

(Low-Order Polynomial Degrees)

(a) When \(r = 0\), we have \(\vert u\vert _{\mathrm{TV}(\varOmega )} = \vert u\vert _{\mathrm{DTV}(\varOmega )}\) for all \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\).

(b) When \(r = 1\), then \(\vert u\vert _{\mathrm{TV}(\varOmega )} \le \vert u\vert _{\mathrm{DTV}(\varOmega )}\) holds for all \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\).

Proof

In case \(r = 0\), the right-hand side of the estimate in Proposition 3.1 vanishes. In case \(r = 1\), \(\nabla u\) is piecewise constant and the corresponding terms in (2) and (3) coincide. Moreover, for affine functions \(v : E \rightarrow \mathbb {R}\) it is easy to check that

$$\begin{aligned} \int _E \vert v\vert \, \text{ d }S\le \frac{1}{2} \, \Bigl ( \bigl \vert v(x_{E,1})\bigr \vert + \bigl \vert v(x_{E,2})\bigr \vert \Bigr ) \, \int _E 1 \, \text{ d }S, \end{aligned}$$

where \(x_{E,1}\) and \(x_{E,2}\) are the two end points of E. This yields the claim in case \(r = 1\). \(\square \)

We also mention that the boundary perimeter formula

$$\begin{aligned} {\mathrm{Per}}\,(E) :=\vert \chi _E\vert _{\mathrm{TV}(\varOmega )} = \vert \chi _E\vert _{\mathrm{DTV}(\varOmega )} = {\mathrm{length}}\,(\partial E) \end{aligned}$$

holds when E is a union of triangles and thus the characteristic function \(\chi _E\) belongs to \(\mathcal {D}\mathcal {G}_{0}(\varOmega )\).

4 Discrete Dual Problem

In this section we revisit the classical image denoising and inpainting problem,

$$\begin{aligned} \text {Minimize} \quad \frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2 + \beta \, \vert u\vert _{\mathrm{TV}(\varOmega )}, \end{aligned}$$
(TV-L2)

see [12, 26, 46]. We introduce its discrete counterpart and establish its Fenchel dual. Here \(\varOmega _0 \subset \varOmega \) is the domain where data are available, and \(\beta \) is a positive parameter. For simplicity, we assume that the inpainting region \(\varOmega {\setminus }\varOmega _0\) is the union of a number of triangles in the discrete problems.

The discrete counterpart of (TV-L2) we consider is

$$\begin{aligned} \text {Minimize} \quad \frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2 + \beta \, \vert u\vert _{\mathrm{DTV}(\varOmega )} \quad \text {over } u \in \mathcal {D}\mathcal {G}_{r}(\varOmega ). \end{aligned}$$
(DTV-L2)

The reconstructed image u is sought in \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) for some \(0 \le r \le 4\). We can assume that the given data f belong to \(\mathcal {D}\mathcal {G}_{r}(\varOmega _0)\) as well, possibly after applying interpolation or quasi-interpolation. Notice that we use the discrete TV-seminorm as regularizer.

The majority of algorithms considered in the literature utilize either the primal or the dual formulations of the problems at hand. The continuous (pre-)dual problem for (TV-L2) is well known, see for instance [34]:

$$\begin{aligned} \text {Minimize} \quad \frac{1}{2} \Vert {\mathrm{div}}\,{\varvec{p}}+ f\Vert _{L^2(\varOmega _0)}^2 \quad \text {s.t.} \quad \vert {\varvec{p}}\vert _{s^*} \le \beta \text { pointwise in } \varOmega , \end{aligned}$$
(TV-L2-D)

with \({\varvec{p}}\in {\varvec{H}}_0({\mathrm{div}}\,;\varOmega )\). Our first result in this section shows that the dual of discrete problem (DTV-L2) has a structure very similar to that of (TV-L2-D), but with the pointwise constraints replaced by coefficient-wise constraints as in (15). For future reference, we denote the associated admissible set by

$$\begin{aligned}&{\varvec{P}}:=\Big \{ {\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega ): \Big . \nonumber \\&\quad \vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _{s^*} \le c_{T,i} \text { for all }T \text { and all } i, \nonumber \\&\quad \vert \sigma _{E,j}({\varvec{p}})\vert \le \vert {\varvec{n}}_E\vert _s \, c_{E,j} \text { for all }E\text { and all } j \Big \}. \end{aligned}$$
(18)

Theorem 4.1

(Discrete dual problem for (DTV-L2)) Let \(0 \le r \le 4\). Then the dual problem of (DTV-L2) is

$$\begin{aligned} \text {Minimize} \quad \frac{1}{2} \Vert {\mathrm{div}}\,{\varvec{p}}+ f\Vert _{L^2(\varOmega _0)}^2 \quad \text {s.t.} \quad {\varvec{p}}\in \beta {\varvec{P}}. \end{aligned}$$
(DTV-L2-D)

Here \({\varvec{p}}\in \beta {\varvec{P}}\) means that \({\varvec{p}}\) satisfies constraints as in (18) but with \(c_{T,i}\) and \(c_{E,j}\) replaced by \(\beta \, c_{T,i}\) and \(\beta \, c_{E,j}\), respectively.

Proof

We cast (DTV-L2) in the common form \(F(u) + \beta \, G(\varLambda u)\). Let us define \(U :=\mathcal {D}\mathcal {G}_{r}(\varOmega )\) and \(F(u) :=\frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2\). The operator \(\varLambda \) represents the gradient of u, which consists of the triangle-wise contributions plus measure-valued contributions due to (normal) edge jumps. We therefore define

$$\begin{aligned} \varLambda : U \rightarrow Y :=\prod _T \mathcal {P}_{r-1}(T)^2 \times \prod _E \mathcal {P}_r(E). \end{aligned}$$
(19a)

The components of \(\varLambda u\) will be addressed by \((\varLambda u)_T\) and \((\varLambda u)_E\), respectively, and they are defined by

$$\begin{aligned} (\varLambda u)_T :={ \left. \nabla u \phantom {|} \right| _{T} } \quad \text {and} \quad (\varLambda u)_E :=\llbracket u\rrbracket _E. \end{aligned}$$
(19b)

Finally, the function \(G: Y \rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} G({\varvec{d}})&:=\sum _T \int _T \mathcal {I}_T \big \{ \vert {\varvec{d}}_T\vert _s \big \} \, \text{ d }x\nonumber \\&\qquad + \sum _E \vert {\varvec{n}}_E\vert _s \int _E \mathcal {I}_E \big \{ \vert d_E\vert \big \} \, \text{ d }S. \end{aligned}$$
(20)

A crucial observation now is that the dual space \(Y^*\) of Y can be identified with \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) when the duality product is defined as

$$\begin{aligned} \langle {\varvec{p}} ,\, {\varvec{d}} \rangle :=\sum _T \int _T {\varvec{p}}\cdot {\varvec{d}}_T \, \text{ d }x+ \sum _E \int _E ({\varvec{p}}\cdot {\varvec{n}}_E) \, d_E \, \text{ d }S. \end{aligned}$$
(21)

In fact, \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) has the same dimension as Y and, for any \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\), (21) clearly defines a linear functional on Y. Moreover, the mapping \({\varvec{p}}\mapsto \langle {\varvec{p}} ,\, \cdot \rangle \) is injective since \(\langle {\varvec{p}} ,\, {\varvec{d}} \rangle = 0\) for all \({\varvec{d}}\in Y\) implies \({\varvec{p}}= {\varvec{0}}\); see (10). With this representation of \(Y^*\) available, we can evaluate \(\varLambda ^*: \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega ) \rightarrow U\), where we identify U with its dual space using the Riesz isomorphism induced by the \(L^2(\varOmega )\) inner product. Consequently, \(\varLambda ^*\) is defined by the condition \(\langle {\varvec{p}} ,\, \varLambda u \rangle = (u, \varLambda ^* {\varvec{p}})_{L^2(\varOmega )}\) for all \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) and all \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\). The left-hand side is

$$\begin{aligned} \langle {\varvec{p}} ,\, \varLambda u \rangle= & {} \sum _T \int _T {\varvec{p}}\cdot \nabla u \, \text{ d }x+ \sum _E \int _E ({\varvec{p}}\cdot {\varvec{n}}_E) \, \llbracket u\rrbracket \, \text{ d }S\nonumber \\= & {} \sum _T - \int _T ({\mathrm{div}}\,{\varvec{p}}) \, u \, \text{ d }x+ \sum _T \int _{\partial T} ({\varvec{p}}\cdot {\varvec{n}}_T) \, u \, \text{ d }S\nonumber \\&+ \sum _E \int _E ({\varvec{p}}\cdot {\varvec{n}}_E) \, \llbracket u\rrbracket \, \text{ d }S= - \int _\varOmega ({\mathrm{div}}\,{\varvec{p}}) \, u \, \text{ d }x,\nonumber \\ \end{aligned}$$
(22)

hence \(\varLambda ^* = - {\mathrm{div}}\,\) holds. Here \({\varvec{n}}_T\) denotes the outward unit normal along the triangle boundary \(\partial T\).

The dual problem can be cast as

$$\begin{aligned} \text {Minimize} \quad F^*(-\varLambda ^* {\varvec{p}}) + \beta \, G^*({\varvec{p}}/\beta ). \end{aligned}$$
(23)

It is well known that the convex conjugate of \(F(u) = \frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2\) is \(F^*(u) = \frac{1}{2} \Vert u+f\Vert _{L^2(\varOmega _0)}^2 - \frac{1}{2} \Vert f\Vert _{L^2(\varOmega _0)}^2\). It remains to evaluate

$$\begin{aligned} G^*({\varvec{p}})= & {} \sup _{{\varvec{d}}\in Y} \langle {\varvec{p}} ,\, {\varvec{d}} \rangle - G({\varvec{d}})\\= & {} \sup _{{\varvec{d}}\in Y} \sum _T \int _T \left[ {\varvec{p}}\cdot {\varvec{d}}_T - \mathcal {I}_T \big \{ \vert {\varvec{d}}_T\vert _s \big \} \right] \, \text{ d }x\\&+ \sum _E \int _E \left[ ({\varvec{p}}\cdot {\varvec{n}}_E) \, d_E - \mathcal {I}_E \big \{ \vert d_E\vert \big \} \vert {\varvec{n}}_E\vert _s \right] \, \text{ d }S. \end{aligned}$$

Let us consider the contribution from \(d_E = \alpha \, \varphi _{E,j}\) for some \(\alpha \in \mathbb {R}\) on a single interior edge E, and \({\varvec{d}}\equiv 0\) otherwise. By (10b) and (14), this contribution is \(\alpha \, \sigma _{E,j}({\varvec{p}}) - \vert \alpha \vert \, \vert {\varvec{n}}_E\vert _s \, c_{E,j}\), which is bounded above if and only if \(\vert \sigma _{E,j}({\varvec{p}})\vert \le \vert {\varvec{n}}_E\vert _s \, c_{E,j}\). In this case, the maximum is zero. Similarly, it can be shown that the contribution from \({\varvec{d}}_T = \varvec{\alpha }\, \varphi _{T,i}\) for some \(\varvec{\alpha }\in \mathbb {R}^2\) on a single triangle T remains bounded above if and only if \(\vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _{s^*} \le c_{T,i}\), in which case the maximum is zero as well. This shows that \(G^* = I_{{\varvec{P}}}\) is the indicator function of the constraint set \({\varvec{P}}\) defined in (18), which concludes the proof. \(\square \)

Notice that discrete dual problem (DTV-L2-D) features the same, very simple set of constraints which already appeared in (15). As is the case for (TV-L2-D), the solution of discrete dual problem (DTV-L2-D) is not necessarily unique. However, its divergence is unique on \(\varOmega _0\) due to the strong convexity of the objective in terms of \({\mathrm{div}}\,{\varvec{p}}\).
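To make the simplicity of these constraints concrete, the following minimal NumPy sketch (our own illustration with hypothetical array names; it is not taken from the implementation of Sect. 6) realizes the projection of a Raviart–Thomas coefficient vector onto the set \({\varvec{P}}\) from (18) in the isotropic case \(s = 2\), where \(s^* = 2\) and \(\vert {\varvec{n}}_E\vert _2 = 1\):

```python
import numpy as np

def project_onto_P(sigma_T, sigma_E, c_T, c_E):
    """Project Raviart-Thomas dof values onto the set P of (18) for s = 2.

    sigma_T: array of shape (nT, 2), triangle-based dofs sigma_{T,i}(p)
    sigma_E: array of shape (nE,),   edge-based dofs sigma_{E,j}(p)
    c_T, c_E: the corresponding weights c_{T,i} and c_{E,j} from (14)
    """
    # edge dofs: simple bound constraints |sigma_{E,j}| <= c_{E,j}
    sigma_E = np.clip(sigma_E, -c_E, c_E)
    # triangle dofs: project each pair onto the Euclidean ball of radius c_{T,i}
    norms = np.linalg.norm(sigma_T, axis=1)
    scale = np.minimum(1.0, c_T / np.maximum(norms, 1e-15))
    sigma_T = scale[:, None] * sigma_T
    return sigma_T, sigma_E
```

For \({\varvec{p}}\in \beta {\varvec{P}}\) as in (DTV-L2-D), the same operation applies with \(c_{T,i}\) and \(c_{E,j}\) replaced by \(\beta \, c_{T,i}\) and \(\beta \, c_{E,j}\).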

Although not needed for Algorithms 1 and 2, we state the following relation between the primal and the dual solutions for completeness.

Lemma 4.1

(Recovery of the Primal Solution in (DTV-L2)) Suppose that \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) is a solution of (DTV-L2-D) in case \(\varOmega _0 = \varOmega \). Then the unique solution of (DTV-L2) is given by

$$\begin{aligned} u = {\mathrm{div}}\,{\varvec{p}}+ f \in \mathcal {D}\mathcal {G}_{r}(\varOmega ). \end{aligned}$$
(24)

Proof

From (23), the pair of optimality conditions to analyze is

$$\begin{aligned} -\varLambda ^* {\varvec{p}}\in \partial F(u) \quad \text {and} \quad {\varvec{p}}\in \partial (\beta \, G)(\varLambda u), \end{aligned}$$
(25)

see [23, Ch. III, Sect. 4]. Here it suffices to consider the first condition, which by [23, Prop. I.5.1] is equivalent to \(F(u) + F^*(-\varLambda ^* {\varvec{p}}) - ( u ,\, -\varLambda ^* {\varvec{p}} )_{L^2(\varOmega )} = 0\). This equality can be rewritten as

$$\begin{aligned}&\Vert u-f\Vert ^2_{L^2(\varOmega )} + \Vert {\mathrm{div}}\,{\varvec{p}}+ f\Vert ^2_{L^2(\varOmega )} - \Vert f\Vert ^2_{L^2(\varOmega )}\\&\quad -\, 2 \, ( u ,\, {\mathrm{div}}\,{\varvec{p}} )_{L^2(\varOmega )} = 0. \end{aligned}$$

Developing each summand in terms of the inner product \(( \cdot ,\, \cdot )_{L^2(\varOmega )}\) and rearranging appropriately, we obtain

$$\begin{aligned}&( u - f - {\mathrm{div}}\,{\varvec{p}} ,\, u )_{L^2(\varOmega )} + ( - u + f + {\mathrm{div}}\,{\varvec{p}} ,\, f )_{L^2(\varOmega )} \\&\quad +\, ( {\mathrm{div}}\,{\varvec{p}}+ f - u ,\, {\mathrm{div}}\,{\varvec{p}} )_{L^2(\varOmega )} = 0, \end{aligned}$$

which amounts to \(\Vert u-f-{\mathrm{div}}\,{\varvec{p}}\Vert ^2_{L^2(\varOmega )} = 0\), and (24) is proved. \(\square \)

Remark 4.1

In case \(\varOmega _0 \subsetneq \varOmega \), the solution of the primal problem will not be unique in general. An inspection of the proof of Lemma 4.1 shows that in this case, one can derive the relation

$$\begin{aligned} \Vert u-f-{\mathrm{div}}\,{\varvec{p}}\Vert ^2_{L^2(\varOmega _0)} = 2 \int _{\varOmega {\setminus }\varOmega _0} u \, {\mathrm{div}}\,{\varvec{p}}\, \text{ d }x. \end{aligned}$$

5 Algorithms for (DTV-L2)

Our goal in this section is to show that two standard algorithms developed for images on Cartesian grids, with finite difference approximations of gradient and divergence operations, are implementable with the same efficiency in our framework of higher-order finite elements on triangular meshes. Specifically, we consider in the following the split Bregman iteration [30] and the primal–dual method of Chambolle and Pock [13]. We refer the reader to the extended preprint [33] for additional discussion of Chambolle’s projection method [11] and a primal–dual active set method similar to [34]. Since these algorithms are well known, we focus only on the main steps in each case. Let us recall that we are seeking a solution \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\). For simplicity, we exclude the case \(r = 3\), i.e., we restrict the discussion to the polynomial degrees \(r \in \{0,1,2,4\}\) so that all weights \(c_{T,i}\) and \(c_{E,j}\) are strictly positive.

5.1 Split Bregman Method

The split Bregman method (also known as alternating direction method of multipliers (ADMM)) considers primal problem (DTV-L2). It introduces an additional variable \({\varvec{d}}\) so that (DTV-L2) becomes

$$\begin{aligned}&\text {Minimize} \quad \frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2 + \beta \sum _{T,i} c_{T,i} \, \bigl \vert {\varvec{d}}_{T,i}\bigr \vert _s\nonumber \\&\qquad +\, \beta \sum _{E,j} \vert {\varvec{n}}_E\vert _s \, c_{E,j} \, \vert d_{E,j}\vert \quad \text {s.t.} \quad {\varvec{d}}= \varLambda u \end{aligned}$$
(26)

and enforces the constraint \({\varvec{d}}= \varLambda u = \nabla u\) by an augmented Lagrangian approach. As detailed in (19), \({\varvec{d}}\) has contributions \({ \left. \nabla u \phantom {|} \right| _{T} }\) per triangle, as well as contributions \(\llbracket u\rrbracket _E\) per interior edge. We can thus express \({\varvec{d}}\) through its coefficients \(\{{\varvec{d}}_{T,i}\}\) and \(\{d_{E,j}\}\) w.r.t. the standard Lagrangian bases of \(\mathcal {P}_{r-1}(T)^2\) and \(\mathcal {P}_r(E)\),

$$\begin{aligned} {\varvec{d}}= \sum _i {\varvec{d}}_{T,i} \, \varphi _{T,i} + \sum _j d_{E,j} \, \varphi _{E,j}. \end{aligned}$$
(27)

Using (13) and (14), we rewrite discrete total variation (3) in terms of \({\varvec{d}}\) and adjoin the constraint \({\varvec{d}}= \nabla u\) by way of an augmented Lagrangian functional,

$$\begin{aligned}&\frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2 + \beta \sum _{T,i} c_{T,i} \, \bigl \vert {\varvec{d}}_{T,i}\bigr \vert _s\nonumber \\&\quad +\, \beta \sum _{E,j} \vert {\varvec{n}}_E\vert _s \, c_{E,j} \, \vert d_{E,j}\vert + \frac{\lambda }{2} \Vert {\varvec{d}}- \varLambda u - {\varvec{b}}\Vert _Y^2. \end{aligned}$$
(28)

Here \({\varvec{b}}\) is an estimate of the Lagrange multiplier associated with the constraint \({\varvec{d}}= \nabla u \in Y\), and \({\varvec{b}}\) is naturally discretized in the same way as \({\varvec{d}}\).

Remark 5.1

(Inner product on Y) So far we have not endowed the space

$$\begin{aligned}Y = \prod _T \mathcal {P}_{r-1}(T)^2 \times \prod _E \mathcal {P}_r(E)\end{aligned}$$

with an inner product. Since elements of Y represent (measure-valued) gradients of \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) functions, the natural choice would be to endow Y with a total variation norm of vector measures, which would amount to

$$\begin{aligned} \sum _T \int _T \vert {\varvec{d}}_T\vert _s \, \text{ d }x+ \sum _E \vert {\varvec{n}}_E\vert _s \int _E \vert d_E\vert \, \text{ d }S\end{aligned}$$

for \({\varvec{d}}\in Y\). Clearly, this \(L^1\)-type norm is not induced by an inner product. Therefore we are using the \(L^2\) inner product instead. For computational efficiency, it is crucial to consider its lumped version, which amounts to

$$\begin{aligned} ( {\varvec{d}} ,\, {\varvec{e}} )_Y :=S\sum _{T,i} c_{T,i} \, {\varvec{d}}_{T,i} \cdot {\varvec{e}}_{T,i} + \sum _{E,j} c_{E,j} \, d_{E,j} \, e_{E,j} \end{aligned}$$
(29)

for \({\varvec{d}}, {\varvec{e}}\in Y\). The associated norm is denoted as \(\Vert {\varvec{d}}\Vert _Y^2 = ( {\varvec{d}} ,\, {\varvec{d}} )_Y\). Notice that \(S> 0\) is a scaling parameter which can be used to improve the convergence of the split Bregman and other iterative methods.

The efficiency of the split Bregman iteration depends on the ability to efficiently minimize (28) independently for u, \({\varvec{d}}\) and \({\varvec{b}}\), respectively. Let us show that this is the case.

5.1.1 The Gradient Operator \(\varLambda \)

The gradient operator \(\varLambda \) evaluates the cell-wise gradient of \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\) as well as the edge-jump contributions, see (19). These are standard operations in any finite element toolbox. For computational efficiency, the matrix realizing \(u(x_{T,i})\) and \(u(x_{E,j})\) in terms of the coefficients of u can be stored once and for all.
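To illustrate that these are indeed standard operations, the following sketch (our own, using the legacy FEniCS (dolfin) interface; it is not the implementation described in Sect. 6) assembles TV-seminorm (2) for \(s = 2\) and a piecewise constant step function directly from the symbolic cell gradient and the vector-valued normal jump:

```python
from dolfin import *  # legacy FEniCS (dolfin) interface; a sketch, not the code of Sect. 6

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "DG", 0)                       # DG_r space with r = 0
u = interpolate(Expression("x[0] < 0.5 ? 0.0 : 1.0", degree=0), V)
n = FacetNormal(mesh)

# TV-seminorm (2) for s = 2: cell gradient term plus vector-valued normal jumps;
# dS is the interior-facet measure, i.e., the sum over interior edges E
tv = assemble(sqrt(inner(grad(u), grad(u))) * dx
              + sqrt(inner(jump(u, n), jump(u, n))) * dS)
print(tv)  # approximately 1.0, the length of the jump line x = 0.5
```

For \(r = 0\) this value coincides with both (2) and (3); cf. Corollary 3.1.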

5.1.2 Solving the u-problem

We consider the minimization of (28), or equivalently, of

$$\begin{aligned}&\frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2 + \frac{\lambda S}{2} \sum _{T,i} c_{T,i} \, \bigl \vert {\varvec{d}}_{T,i} - \nabla u(x_{T,i}) - {\varvec{b}}_{T,i}\bigr \vert _2^2\nonumber \\&\quad +\, \frac{\lambda }{2} \sum _{E,j} c_{E,j} \, \bigl \vert d_{E,j} - \llbracket u\rrbracket (x_{E,j}) - b_{E,j}\bigr \vert ^2 \end{aligned}$$
(30)

w.r.t. \(u \in \mathcal {D}\mathcal {G}_{r}(\varOmega )\). This problem can be interpreted as a DG finite element formulation of the elliptic partial differential equation \(- \lambda \, \Delta u + \chi _{\varOmega _0} u = \chi _{\varOmega _0} f + \lambda {\mathrm{div}}\,({\varvec{b}}- {\varvec{d}})\) in \(\varOmega \). More precisely, it constitutes a nonsymmetric interior penalty Galerkin (NIPG) method; compare for instance [44] or [43, Ch. 2.4, 2.6]. Specialized preconditioned solvers for such systems are available, see for instance [3]. However, as proposed in [30], a (block) Gauss–Seidel method may be sufficient. It is convenient to group the unknowns of the same triangle together, which leads to local systems of size \((r+1)(r+2)/2\).

5.1.3 Solving the \({\varvec{d}}\)-problem

The minimization of (28), or equivalently, of

$$\begin{aligned}&\beta \sum _{T,i} c_{T,i} \, \bigl \vert {\varvec{d}}_{T,i}\bigr \vert _s + \beta \sum _{E,j} \vert {\varvec{n}}_E\vert _s \, c_{E,j} \, \vert d_{E,j}\vert \nonumber \\&\quad +\, \frac{\lambda S}{2} \sum _{T,i} c_{T,i} \, \bigl \vert {\varvec{d}}_{T,i} - \nabla u(x_{T,i}) - {\varvec{b}}_{T,i}\bigr \vert _2^2\nonumber \\&\quad +\, \frac{\lambda }{2} \sum _{E,j} c_{E,j} \, \bigl \vert d_{E,j} - \llbracket u\rrbracket (x_{E,j}) - b_{E,j}\bigr \vert ^2 \end{aligned}$$
(31)

decouples into the minimization of

$$\begin{aligned}&\beta \, \bigl \vert {\varvec{d}}_{T,i}\bigr \vert _s + \frac{\lambda S}{2} \bigl \vert {\varvec{d}}_{T,i} - \nabla u(x_{T,i}) - {\varvec{b}}_{T,i}\bigr \vert _2^2 \end{aligned}$$
(32a)
$$\begin{aligned}&\quad \text {and } \beta \, \vert {\varvec{n}}_E\vert _s \, \vert d_{E,j}\vert + \frac{\lambda }{2} \bigl \vert d_{E,j} - \llbracket u\rrbracket (x_{E,j}) - b_{E,j}\bigr \vert ^2 \end{aligned}$$
(32b)

w.r.t. \({\varvec{d}}_{T,i} \in \mathbb {R}^2\) and \(d_{E,j} \in \mathbb {R}\), respectively.

It is well known that scalar problem (32b) is solved via

$$\begin{aligned} d_{E,j} = {\mathrm{shrink}}\,\left( \llbracket u\rrbracket (x_{E,j}) + b_{E,j}, \; \frac{\beta \, \vert {\varvec{n}}_E\vert _s }{\lambda } \right) , \end{aligned}$$

where \({\mathrm{shrink}}\,(\xi ,\gamma ) :=\max \left\{ \vert \xi \vert - \gamma , \; 0 \right\} {\mathrm{sgn}}\,{\xi }\), while the minimization of (32a) defines the (Euclidean) \({\mathrm{prox}}\,\) mapping of \(\vert \,\cdot \,\vert _s\) and thus we have

$$\begin{aligned} {\varvec{d}}_{T,i} = {\mathrm{prox}}\,_{\beta /(\lambda S) \vert \,\cdot \,\vert _s} \big ( \nabla u(x_{T,i}) + {\varvec{b}}_{T,i} \big ), \end{aligned}$$

where

$$\begin{aligned} {\mathrm{prox}}\,_{\beta /(\lambda S) \vert \,\cdot \,\vert _s}({\varvec{\xi }}) = {\varvec{\xi }}- \frac{\beta }{\lambda S} {\mathrm{proj}}\,_{B_{\vert \,\cdot \,\vert _{s^*}}} \left( \frac{\lambda S}{\beta } {\varvec{\xi }}\right) . \end{aligned}$$

Here \({\mathrm{proj}}\,_{B_{\vert \,\cdot \,\vert _{s^*}}}\) is the Euclidean orthogonal projection onto the closed \(\vert \,\cdot \,\vert _{s^*}\)-norm unit ball; see for instance [7, Ex. 6.47]. When \(s \in \{1, 2\}\), then we have closed-form solutions of (32a):

$$\begin{aligned}{}[{\varvec{d}}_{T,i}]_\ell = {\mathrm{shrink}}\,\left( \big [ \nabla u(x_{T,i}) + {\varvec{b}}_{T,i} \big ]_\ell , \; \frac{\beta }{\lambda S} \right) \text { for } \ell = 1,2 \end{aligned}$$

when \(s = 1\) and

$$\begin{aligned}&{\varvec{d}}_{T,i} = \max \left\{ \bigl \vert \nabla u(x_{T,i}) + {\varvec{b}}_{T,i}\bigr \vert _2 - \frac{\beta }{\lambda S}, \; 0 \right\} \\&\quad \cdot \frac{\nabla u(x_{T,i}) + {\varvec{b}}_{T,i}}{\bigl \vert \nabla u(x_{T,i}) + {\varvec{b}}_{T,i}\bigr \vert _2} \end{aligned}$$

when \(s = 2\). When \(\nabla u(x_{T,i}) + {\varvec{b}}_{T,i} = 0\), the second formula is understood as \({\varvec{d}}_{T,i} = 0\). Efficient approaches for \(s = \infty \) are also available; see [22].

5.1.4 Updating \({\varvec{b}}\)

This is simply achieved by replacing the current values for \({\varvec{b}}_{T,i}\) and \(b_{E,j}\) by \({\varvec{b}}_{T,i} + \nabla u(x_{T,i}) - {\varvec{d}}_{T,i}\) and \(b_{E,j} + \llbracket u\rrbracket (x_{E,j}) - d_{E,j}\), respectively.

The quantities \({\varvec{b}}_{T,i}\) and \(b_{E,j}\) represent discrete multipliers associated with the components of the constraint \({\varvec{d}}= \varLambda u\). Here we clarify how these multipliers relate to the dual variable \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) in (DTV-L2-D). In fact, let us interpret \({\varvec{b}}_{T,i}\) as the coefficients of a function \({\varvec{b}}_T \in \mathcal {P}_{r-1}(T)^2\) and \(b_{E,j}\) as the coefficients of a function \(b_E \in \mathcal {P}_r(E)\) w.r.t. the standard nodal bases, just as in (27). Moreover, let us define a function \(\bar{\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) by specifying its coefficients as follows,

$$\begin{aligned} {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}}) :=\lambda S\, {\varvec{b}}_{T,i} \, c_{T,i} \quad \text {and} \quad \sigma _{E,j}(\bar{\varvec{p}}) :=\lambda \, b_{E,j} \, c_{E,j}. \end{aligned}$$
(33)

Then

$$\begin{aligned}&\int _T \! \bar{\varvec{p}}\cdot (\nabla u - {\varvec{d}}_T) \, \text{ d }x\\&\quad = \sum _i \int _T \! \bar{\varvec{p}}\, \varphi _{T,i} \cdot \big ( \nabla u(x_{T,i}) - {\varvec{d}}_{T,i} \big ) \, \text{ d }x\\&\quad = \lambda S\sum _i c_{T,i} \, {\varvec{b}}_{T,i} \cdot \big ( \nabla u(x_{T,i}) - {\varvec{d}}_{T,i} \big ) \end{aligned}$$

and

$$\begin{aligned}&\int _E \! \bar{\varvec{p}}\cdot (\llbracket u\rrbracket - d_E) \, {\varvec{n}}_E \, \text{ d }S\\&\quad = \sum _j \int _E \! (\bar{\varvec{p}}\cdot {\varvec{n}}_E) \, \varphi _{E,j} \big ( \llbracket u\rrbracket (x_{E,j}) - d_{E,j} \big ) \, \text{ d }S\\&\quad = \lambda \sum _j c_{E,j} \, b_{E,j} \, (\llbracket u\rrbracket (x_{E,j}) - d_{E,j}) , \end{aligned}$$

and these are precisely the terms appearing in discrete augmented Lagrangian functional (28). Consequently, \(\bar{\varvec{p}}\) can be interpreted as the Lagrange multiplier associated with the components of the constraint \({\varvec{d}}= \varLambda u\), when the latter are adjoined using the lumped \({\varvec{L}}^2(T)\) and \(L^2(E)\) inner products. It can be shown using the KKT conditions for (26) and optimality conditions (25) that \(\bar{\varvec{p}}\) defined by (33) solves dual problem (DTV-L2-D). To prove this assertion, suppose that \((u,{\varvec{d}})\) is optimal for (26). We will show that \((u,\bar{\varvec{p}})\) satisfy necessary and sufficient optimality conditions (25). The Lagrangian for (26) can be written as \(F(u) + \beta \, G({\varvec{d}}) + \langle \bar{\varvec{p}} ,\, \varLambda u - {\varvec{d}} \rangle \) and the optimality of \((u,{\varvec{d}})\) implies \(\bar{\varvec{p}}\in \partial (\beta \, G)({\varvec{d}}) = \partial (\beta \, G)(\varLambda u)\). On the other hand, u is optimal for (DTV-L2), which implies \(0 \in \partial F(u) + \varLambda ^* \partial (\beta \, G)(\varLambda u)\) and thus \(- \varLambda ^* \bar{\varvec{p}}\in \partial F(u)\). Altogether, we have verified (25), which is necessary and sufficient for \(\bar{\varvec{p}}\) to be optimal for (DTV-L2-D).
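
For reference, a minimal sketch of (33), assuming the multipliers and quadrature weights are stored as NumPy arrays (b_T of shape (n, 2), b_E, c_T, c_E of matching shapes; all names hypothetical):

```python
import numpy as np

def multipliers_to_dual(b_T, b_E, c_T, c_E, lam, S):
    """Coefficients of pbar in RT_{r+1}^0 from the Bregman multipliers, cf. (33)."""
    sigma_T = lam * S * c_T[:, None] * b_T   # interior dofs sigma_{T,i}(pbar)
    sigma_E = lam * c_E * b_E                # edge dofs sigma_{E,j}(pbar)
    return sigma_T, sigma_E
```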

For convenience, we specify the split Bregman iteration in Algorithm 1.

Algorithm 1: Split Bregman iteration.

5.2 Chambolle–Pock Method

The method by [13], also known as the primal–dual extragradient method, see [31], is based on a reformulation of the optimality conditions in terms of the \({\mathrm{prox}}\,\) operators pertaining to F and \(G^*\). We recall that F is defined by \(F(u) = \frac{1}{2} \Vert u-f\Vert _{L^2(\varOmega _0)}^2\) on \(U = \mathcal {D}\mathcal {G}_{r}(\varOmega )\). Moreover, \(G^*\) is defined on \(Y^* \cong \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) by \(G^* = I_{{\varvec{P}}}\), the indicator function of \({\varvec{P}}\), see (18).

Notice that \({\mathrm{prox}}\,\) operators depend on the inner product in the respective space. We recall that U has been endowed with the (regular, nonlumped) \(L^2(\varOmega )\) inner product, see the proof of Theorem 4.1. For the space Y we are using again the inner product defined in (29). Exploiting duality product (21) between Y and \(Y^* \cong \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\), it is then straightforward to derive the Riesz map \(R: Y \ni {\varvec{d}}\mapsto {\varvec{p}}\in Y^*\). In terms of the coefficients of \({\varvec{p}}\), we have

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = c_{T,i}S\, {\varvec{d}}_{T,i} \quad \text {and} \quad \sigma _{E,j}({\varvec{p}}) = c_{E,j} \, d_{E,j}. \end{aligned}$$
(34)

Consequently, the induced inner product in \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\) becomes

$$\begin{aligned}&( {\varvec{p}} ,\, {\varvec{q}} )_{Y^*} :=\sum _{T,i} \frac{1}{c_{T,i}S} \, {\varvec{\sigma }}_{T,i}({\varvec{p}}) \cdot {\varvec{\sigma }}_{T,i}({\varvec{q}})\nonumber \\&\quad + \sum _{E,j} \frac{1}{c_{E,j}} \, \sigma _{E,j}({\varvec{p}}) \, \sigma _{E,j}({\varvec{q}}). \end{aligned}$$
(35)

To summarize, the inner products in Y, \(Y^*\) as well as the Riesz map are realized efficiently by simple, diagonal operations on the coefficients.
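
As a sketch (with hypothetical array names), the Riesz map (34) and its inverse are plain diagonal scalings of the coefficient arrays:

```python
import numpy as np

def riesz_map(d_T, d_E, c_T, c_E, S):
    """Map d in Y to its Riesz representative p in Y*, cf. (34)."""
    return S * c_T[:, None] * d_T, c_E * d_E

def riesz_map_inverse(sigma_T, sigma_E, c_T, c_E, S):
    """Map the coefficients of p in Y* back to d in Y."""
    return sigma_T / (S * c_T[:, None]), sigma_E / c_E
```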

5.2.1 Solving the F-prox

Let \(\sigma > 0\). The \({\mathrm{prox}}\,\)-operator of \(\sigma F\), denoted by

$$\begin{aligned}{\mathrm{prox}}\,_{\sigma F}: U \rightarrow U,\end{aligned}$$

is defined as \(u = {\mathrm{prox}}\,_{\sigma F}(\bar{u})\) if and only if

$$\begin{aligned} u = \mathop {\hbox {arg min}}\limits _{v \in \mathcal {D}\mathcal {G}_{r}(\varOmega )} \frac{1}{2} \Vert v - \bar{u}\Vert _{L^2(\varOmega )}^2 + \frac{\sigma }{2} \Vert v-f\Vert _{L^2(\varOmega _0)}^2. \end{aligned}$$

For given data \(\bar{u} {}\in \mathcal {D}\mathcal {G}_{r}(\varOmega )\) and \(f \in \mathcal {D}\mathcal {G}_{r}(\varOmega _0)\), it is easy to see that a necessary and sufficient condition is \(u - \bar{u} + \sigma \, (u - f) = 0\), which amounts to the coefficient-wise formula

$$\begin{aligned} u_{T,k} = \frac{1}{1 + \sigma _{T,k}} \big ( \bar{u}_{T,k} + \sigma _{T,k} f_{T,k} \big ), \end{aligned}$$
(36)

where \(\sigma _{T,k} = \sigma \) if \(T \subset \varOmega _0\) and \(\sigma _{T,k} = 0\) otherwise.
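
A minimal sketch of (36), assuming the coefficients of \(\bar{u}\) and f are stored as NumPy arrays and a Boolean mask marks the coefficients belonging to cells inside \(\varOmega _0\) (all names hypothetical):

```python
import numpy as np

def prox_F(u_bar, f, sigma, in_omega0):
    """Coefficient-wise prox of sigma*F, cf. (36)."""
    sig = np.where(in_omega0, sigma, 0.0)   # sigma_{T,k} = sigma on Omega_0, else 0
    return (u_bar + sig * f) / (1.0 + sig)  # coefficients of f outside Omega_0 are unused
```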

5.2.2 Solving the \(G^*\)-prox

Let \(\tau > 0\). The \({\mathrm{prox}}\,\)-operator

$$\begin{aligned}{\mathrm{prox}}\,_{\tau G^*}: Y^* \cong \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega ) \rightarrow Y^*\end{aligned}$$

is defined as \({\varvec{p}} = {\mathrm{prox}}\,_{\tau G^*}(\bar{\varvec{p}})\) if and only if

$$\begin{aligned} {\varvec{p}} = \mathop {\hbox {arg min}}\limits _{{\varvec{q}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )} \frac{1}{2} \Vert {\varvec{q}}- \bar{\varvec{p}}\Vert _{Y^*}^2 \text { s.t. } {\varvec{q}}\in {\varvec{P}}. \end{aligned}$$
(37)

Similarly, the \({\mathrm{prox}}\,\) operator for \((\beta \, G)^*\) is obtained by replacing \({\varvec{P}}\) by \(\beta {\varvec{P}}\), for any \(\tau > 0\). Due to the diagonal structure of the inner product in \(Y^*\), this is efficiently implementable. For \(\bar{\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}^0(\varOmega )\), we obtain the solution in terms of the coefficients, similarly to (32), as

$$\begin{aligned} \begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}})&= {\mathrm{proj}}\,_{\beta \, c_{T,i} B_{\vert \,\cdot \,\vert _{s^*}}} \left( {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}}) \right) \\ \sigma _{E,j}({\varvec{p}})&= \min \big \{ \vert \sigma _{E,j}(\bar{\varvec{p}})\vert , \; \beta \, \vert {\varvec{n}}_E\vert _s \, c_{E,j} \big \} \frac{\sigma _{E,j}(\bar{\varvec{p}})}{\vert \sigma _{E,j}(\bar{\varvec{p}})\vert } . \end{aligned} \end{aligned}$$
(38)

In particular we have

$$\begin{aligned} \big [ {\varvec{\sigma }}_{T,i}({\varvec{p}}) \big ]_{\ell } = \min \left\{ \bigl \vert [ {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}}) ]_{\ell }\bigr \vert , \; \beta \, c_{T,i} \right\} {\mathrm{sgn}}\,[ {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}}) ]_{\ell } \end{aligned}$$

for \(\ell = 1,2\) when \(s = 1\) and

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = \min \left\{ \vert {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}})\vert _2, \; \beta \, c_{T,i} \right\} \frac{{\varvec{\sigma }}_{T,i}(\bar{\varvec{p}})}{\vert {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}})\vert _2} \end{aligned}$$

when \(s = 2\). The second formula is understood as

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = 0 \end{aligned}$$

when \(\vert {\varvec{\sigma }}_{T,i}(\bar{\varvec{p}})\vert _2 = 0\). An implementation of the Chambolle–Pock method is given in Algorithm 2. Notice that the solution of the \({\mathrm{prox}}\,_{\tau G^*}\) problem is independent of the scaling parameter \(S > 0\). However, S enters through Riesz isomorphism (34).
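
In the isotropic case, projection (38) thus amounts to the following coefficient-wise operations (a minimal NumPy sketch with hypothetical array names; recall that \(\vert {\varvec{n}}_E\vert _2 = 1\)):

```python
import numpy as np

def prox_G_star_isotropic(sigma_T, sigma_E, c_T, c_E, beta):
    """Projection of the coefficients of pbar onto beta*P for s = 2, cf. (38)."""
    # interior dofs: project onto the Euclidean ball of radius beta*c_{T,i}
    norms = np.linalg.norm(sigma_T, axis=1)
    scale = np.minimum(norms, beta * c_T) / np.where(norms > 0.0, norms, 1.0)
    sigma_T_proj = scale[:, None] * sigma_T
    # edge dofs: clip to the interval [-beta*c_{E,j}, beta*c_{E,j}]
    sigma_E_proj = np.clip(sigma_E, -beta * c_E, beta * c_E)
    return sigma_T_proj, sigma_E_proj
```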

Algorithm 2: Chambolle–Pock method.

6 Implementation Details

Our implementation was carried out in the finite element framework FEniCS (version 2017.2). We refer the reader to [2, 39] for background reading. FEniCS supports finite elements of various types on simplicial meshes, including \(\mathcal {C}\mathcal {G}_{r}\), \(\mathcal {D}\mathcal {G}_{r}\) and \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}\) elements of arbitrary order. Although we focus on this piece of software, the content of this section will apply to other finite element frameworks as well.

While the bases for the spaces \(\mathcal {C}\mathcal {G}_{r}\) and \(\mathcal {D}\mathcal {G}_{r}\) in FEniCS are given by the standard nodal basis functions as described in Sect. 2, the implementation of \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}\) elements in FEniCS uses degrees of freedom based on point evaluations of \({\varvec{p}}\) and \({\varvec{p}}\cdot {\varvec{n}}_E\), rather than the integral-type dofs in (10). Since we wish to take advantage of the simple structure of the constraints in dual representation (15) of \(\vert u\vert _{\mathrm{DTV}(\varOmega )}\) however, we rely on the choice of dofs described in (10). In order to avoid a global basis transformation, we implemented our own version of the \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}\) finite element in FEniCS.

Our implementation uses the dofs in (10) on the reference cell \(\widehat{T}\). As usual in finite element methods, an arbitrary cell T is then obtained via an affine geometry transformation, i.e.,

$$\begin{aligned} G_T: \widehat{T} \rightarrow T, \qquad G_T(\widehat{x}) = B_T \, \widehat{x} + b_T, \end{aligned}$$

where \(B_T \in \mathbb {R}^{2 \times 2}\) is a nonsingular matrix and \(b_T \in \mathbb {R}^2\). We mention that \(B_T\) need not have a positive determinant, i.e., the transformation \(G_T\) is not necessarily orientation preserving. In contrast to \(\mathcal {C}\mathcal {G}_{}\) and \(\mathcal {D}\mathcal {G}_{}\) elements, a second transformation is required to define the dofs and basis functions on the world cell T from the dofs and basis functions on \(\widehat{T}\). For the (\({\varvec{H}}({\mathrm{div}}\,;\varOmega )\)-conforming) \(\mathcal {R}\mathcal {T}_{\!\!\;}\) spaces, this is achieved via the (contravariant) Piola transform; see for instance [25, Ch. 1.4.7] or [45]. In terms of functions \(\widehat{\varvec{p}}\) from the local polynomial space, we have

$$\begin{aligned} \begin{aligned}&P_T: \mathcal {P}_r(\widehat{T})^2 + \widehat{\varvec{x}}\, \mathcal {P}_r(\widehat{T}) \rightarrow \mathcal {P}_r(T)^2 + {\varvec{x}}\, \mathcal {P}_r(T), \\&P_T(\widehat{\varvec{p}}) = (\det B_T^{-1}) \, B_T \, [ \widehat{\varvec{p}}\circ G_T^{-1} ]. \end{aligned} \end{aligned}$$

The Piola transform preserves tangent directions on edges, as well as normal traces of vector fields, up to edge lengths. It satisfies

$$\begin{aligned} \vert \widehat{E}\vert \, \widehat{\varvec{p}}\cdot \widehat{\varvec{n}}_{\widehat{E}} = \pm \vert E\vert \, {\varvec{p}}\cdot {\varvec{n}}_E \quad \text {and} \quad \vert \widehat{T}\vert \, B_T \, \widehat{\varvec{p}}= \pm \vert T\vert \, {\varvec{p}}, \end{aligned}$$
(39)

where \(\widehat{E}\) is an edge of \(\widehat{T}\), \(\widehat{\varvec{n}}_{\widehat{E}}\) is the corresponding unit outer normal, \(E = G_T(\widehat{E})\), \({\varvec{n}}_E\) is a unit normal vector on E with arbitrary orientation, \({\varvec{p}}= P_T(\widehat{\varvec{p}})\), and \(\vert T\vert \) is the area of T; see for instance [25, Lem. 1.84].
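
As a quick numerical sanity check of the first relation in (39), the following self-contained sketch verifies it for a constant reference field and an arbitrary, made-up affine map (all numbers are illustrative only):

```python
import numpy as np

B_T = np.array([[2.0, 1.0], [0.0, 1.5]])         # an arbitrary nonsingular Jacobian
p_hat = np.array([0.3, -0.7])                    # a constant reference vector field
p = B_T @ p_hat / np.linalg.det(B_T)             # contravariant Piola transform

# reference edge from (1,0) to (0,1) and its image under G_T
e_hat = np.array([-1.0, 1.0])
e = B_T @ e_hat
n_hat = np.array([1.0, 1.0]) / np.sqrt(2.0)      # unit outer normal on E_hat
n = np.array([-e[1], e[0]]) / np.linalg.norm(e)  # a unit normal on E

lhs = np.linalg.norm(e_hat) * (p_hat @ n_hat)    # |E_hat| p_hat . n_hat
rhs = np.linalg.norm(e) * (p @ n)                # |E| p . n
print(np.isclose(abs(lhs), abs(rhs)))            # True, up to the sign in (39)
```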

We denote by \(\widehat{{\varvec{\sigma }}}_{\widehat{T},i}\) and \(\widehat{\sigma }_{\widehat{E},j}\) the degrees of freedom as in (10), defined in terms of the nodal basis functions \(\widehat{\varphi }_{\widehat{T},i} \in \mathcal {P}_{r-1}(\widehat{T})\) and \(\widehat{\varphi }_{\widehat{E},j} \in \mathcal {P}_r(\widehat{E})\) on the reference cell. Let us consider how these degrees of freedom act on the world cell. Indeed, the relations above imply

$$\begin{aligned} \widehat{{\varvec{\sigma }}}_{\widehat{T},i}(\widehat{\varvec{p}})&:=\int _{\widehat{T}} \widehat{\varphi }_{\widehat{T},i} \, \widehat{\varvec{p}}\, \text{ d }\widehat{x} \nonumber \\&= \pm \int _T \varphi _{T,i} \, B_T^{-1} \, {\varvec{p}}\, \text{ d }x=: \pm \widetilde{{\varvec{\sigma }}}_{T,i}({\varvec{p}}) , \end{aligned}$$
(40a)
$$\begin{aligned} \widehat{\sigma }_{\widehat{E},j}(\widehat{\varvec{p}})&:=\int _{\widehat{E}} \, \widehat{\varphi }_{\widehat{E},j} \, (\widehat{\varvec{p}}\cdot \widehat{\varvec{n}}_{\widehat{E}}) \, \text{ d }\widehat{s} \nonumber \\&= \pm \int _E \, \varphi _{E,j} \, ({\varvec{p}}\cdot {\varvec{n}}_E) \, \text{ d }S= \pm \sigma _{E,j}({\varvec{p}}) , \end{aligned}$$
(40b)

where we used that Lagrangian basis functions are transformed according to \(\varphi _{T,i} = \widehat{\varphi }_{\widehat{T},i} \circ G_T^{-1}\), and similarly for the edge-based quantities. The correct choice of the sign in (39) and (40) depends on the sign of \(\det B_T\) and on the relative orientations of \(P_T(\widehat{\varvec{n}}_{\widehat{E}})\) and \({\varvec{n}}_E\). However, the sign is not important since all operations we perform on the dofs or coefficients, such as the constraints in (18) on \({\varvec{\sigma }}_{T,i}({\varvec{p}})\), are invariant under a change of sign.

Notice that while (40b) agrees (possibly up to the sign) with our preferred set of edge-based dofs (10b), the interior dofs \(\widetilde{{\varvec{\sigma }}}_{T,i}\) available through transformation (40a) are related to the desired dofs \({\varvec{\sigma }}_{T,i}\) from (10a) via

$$\begin{aligned} {\varvec{\sigma }}_{T,i}({\varvec{p}}) = {\mathrm{sgn}}\,(\det B_T) \, B_T^\top \, \widetilde{{\varvec{\sigma }}}_{T,i}({\varvec{p}}). \end{aligned}$$
(41)

Notice that this transformation cannot be avoided since dofs (10a) are not invariant under the Piola transform. However, (41) is completely local to the triangle and inexpensive to evaluate. Although not required for our numerical computations, we mention for completeness that the corresponding dual basis functions are related via

$$\begin{aligned} \varvec{\psi }^T_{i} = {\mathrm{sgn}}\,(\det B_T) \, \widetilde{\varvec{\psi }}^T_{i} B_T^{-\top }. \end{aligned}$$
(42)

To summarize this discussion, functions \({\varvec{p}}\in \mathcal {R}\mathcal {T}_{\!\!\;r+1}(\varOmega )\) will be represented in terms of coefficients w.r.t. the dofs \(\{\sigma _{E,j}\}\) and \(\{\widetilde{{\varvec{\sigma }}}_{T,i}\}\) in our FEniCS implementation of the \(\mathcal {R}\mathcal {T}_{\!\!\;}\) space. Transformations to and from the desired dofs \(\{{\varvec{\sigma }}_{T,i}\}\) will be performed for all operations that directly manipulate the coefficients of an \(\mathcal {R}\mathcal {T}_{\!\!\;r+1}\) function. For instance, the projection operation in (38) (for Chambolle–Pock Algorithm 2) in the case \(s = 2\) would be implemented as

$$\begin{aligned}&\widetilde{{\varvec{\sigma }}}_{T,i}({\varvec{p}}) = B_T^{-\top } \min \left\{ \vert B_T^\top \, \widetilde{{\varvec{\sigma }}}_{T,i}(\bar{\varvec{p}})\vert _2, \; \beta \, c_{T,i} \right\} \\&\quad \cdot \frac{B_T^\top \, \widetilde{{\varvec{\sigma }}}_{T,i}(\bar{\varvec{p}})}{\vert B_T^\top \, \widetilde{{\varvec{\sigma }}}_{T,i}(\bar{\varvec{p}})\vert _2} . \end{aligned}$$
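
As a sketch for a single cell, this operation can be written as follows, assuming the native interior dofs of the cell are collected row-wise in a NumPy array (all names hypothetical). The factor \({\mathrm{sgn}}\,(\det B_T)\) is dropped since the projection is odd and the sign cancels under the back-transformation.

```python
import numpy as np

def project_native_interior_dofs(sigma_tilde, B_T, c_T, beta):
    """Projection (38), s = 2, expressed in the native dofs via (41).

    sigma_tilde : (k, 2) array of native interior dofs of one cell T
    B_T         : (2, 2) Jacobian of the affine map G_T
    c_T         : (k,) quadrature weights c_{T,i}
    """
    sigma = sigma_tilde @ B_T                # rows are (B_T^T sigma_tilde)^T, cf. (41)
    norms = np.linalg.norm(sigma, axis=1)
    scale = np.minimum(norms, beta * c_T) / np.where(norms > 0.0, norms, 1.0)
    sigma = scale[:, None] * sigma           # project onto the ball of radius beta*c_{T,i}
    return sigma @ np.linalg.inv(B_T)        # back to the native dofs via B_T^{-T}
```
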
Table 2 Dimensions of the \(\mathcal {D}\mathcal {G}_{r}\) spaces for our test images depending on the polynomial degree \(r \in \{0,1,2\}\)

7 Numerical Results for (DTV-L2)

In this section we present some numerical results for (DTV-L2) in the isotropic case (\(s = 2\)). Our goals are to compare the convergence behavior and computational efficiency for Algorithms 1 and 2 w.r.t. varying polynomial degree \(r \in \{0,1,2\}\), and to exhibit the benefits of polynomial orders \(r \ge 1\) for image quality, for both denoising and inpainting applications.

Fig. 3

Left: cameraman pixel test image. Middle: nondiscrete test image. Right: mesh used to represent the image in the middle

In our tests, we use the two images displayed in Fig. 3. Both have data in the range [0, 1]. The discrete cameraman image has a resolution of \(256 \times 256\) square pixels and will be interpolated onto a \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) space on a triangular grid with crossed diagonals, so the mesh has 262,144 cells and 131,585 vertices. We are also using a low-resolution version of the cameraman image on a \(64 \times 64\) square grid in Sect. 7.3. The second test image is a nondiscrete image on a circle of radius 0.5. The corresponding discrete problems are set up on a mesh consisting of 5460 cells and 2811 vertices. For each problem, the dimension of the finite element space for the image u is given in Table 2. In all of the following tests, noise is added to each degree of freedom in the form of a normally distributed random variable with zero mean and standard deviation \(\sigma = 10^{-1}\). Our implementation uses the finite element framework FEniCS (version 2017.2). All experiments were conducted on a standard desktop PC with an Intel i5-4690 CPU running at 3.50 GHz, 16 GB RAM and Linux openSUSE Leap 42.1. Visualization was done in ParaView.
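
For completeness, here is a minimal sketch of the noise model in legacy FEniCS/DOLFIN (2017.x); the mesh size, the polynomial degree, and the dummy image expression are illustrative only and do not correspond to our actual test data.

```python
import numpy as np
from dolfin import UnitSquareMesh, FunctionSpace, Expression, interpolate

mesh = UnitSquareMesh(64, 64, "crossed")     # triangular grid with crossed diagonals
V = FunctionSpace(mesh, "DG", 1)             # DG_r image space, here r = 1

# a smooth dummy image standing in for the interpolated test image
f = interpolate(Expression("0.5 + 0.5*sin(10*x[0])*sin(10*x[1])", degree=3), V)

# add i.i.d. Gaussian noise with zero mean and sigma = 0.1 to every coefficient
vals = f.vector().get_local()
f.vector().set_local(vals + np.random.normal(0.0, 0.1, vals.shape))
f.vector().apply("insert")
```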

A stopping criterion for Algorithms 1 and 2 can be based on the primal–dual gap

$$\begin{aligned} F(u) + \beta \, G(\varLambda u) + F^*(\varLambda ^* {\varvec{p}}) + \beta \, G^*({\varvec{p}}/\beta ) . \end{aligned}$$
(43)

Notice that since \(G^* = I_{{\varvec{P}}}\) is the indicator function of the constraint set \({\varvec{P}}\), the last term is either 0 or \(\infty \), and (43) can therefore not directly serve as a meaningful stopping criterion. Instead, we omit the last term in (43) and introduce a distance-to-feasibility measure for \({\varvec{p}}\) as a second criterion. For the latter, we utilize the difference of \({\varvec{p}}\) and its \(Y^*\)-orthogonal projection onto \(\beta {\varvec{P}}\), measured in the \(Y^*\)-norm squared. This expression can be easily evaluated when \(s \in \{1,2\}\). Straightforward calculations then show that we obtain the following specific expressions:

$$\begin{aligned} {\mathrm{GAP}}\,(u,{\varvec{p}}) :=F(u) + \beta \, G(\varLambda u) + F^*(\varLambda ^* {\varvec{p}}) \end{aligned}$$
(44a)

and

$$\begin{aligned}&{\mathrm{INFEAS}}\,_{2}({\varvec{p}}) :=\sum _{T,i} \frac{1}{c_{T,i}S} \max \big \{ \vert {\varvec{\sigma }}_{T,i}({\varvec{p}})\vert _2 - \beta \, c_{T,i}, \; 0 \big \}^2\nonumber \\&\quad + \sum _{E,j} \frac{1}{c_{E,j}} \max \big \{ \vert \sigma _{E,j}({\varvec{p}})\vert - \beta \, c_{E,j}, \; 0 \big \}^2 \end{aligned}$$
(44b)

when \(s = 2\), as well as

$$\begin{aligned}&{\mathrm{INFEAS}}\,_{1}({\varvec{p}})\nonumber \\&\quad :=\sum _{T,i} \frac{1}{c_{T,i}S} \sum _{\ell =1}^2 \max \big \{ \bigl \vert [{\varvec{\sigma }}_{T,i}({\varvec{p}})]_\ell \bigr \vert - \beta \, c_{T,i}, \; 0 \big \}^2\nonumber \\&\quad + \sum _{E,j} \frac{1}{c_{E,j}} \max \big \{ \vert \sigma _{E,j}({\varvec{p}})\vert - \beta \, \vert {\varvec{n}}_E\vert _s \, c_{E,j}, \; 0 \big \}^2 \end{aligned}$$
(44c)

when \(s = 1\). In our numerical experiments, we focus on the case \(s = 2\) and we stop either algorithm as soon as the iterates \((u,{\varvec{p}})\) satisfy the following conditions:

$$\begin{aligned} \begin{aligned} \vert {\mathrm{GAP}}\,(u,{\varvec{p}})\vert&\le \varepsilon _\text {rel} {\mathrm{GAP}}\,(f,{\varvec{0}})\\ {\mathrm{INFEAS}}\,_{2}({\varvec{p}})&\le 10^{-11} \end{aligned} \end{aligned}$$
(45)

with \(\varepsilon _\text {rel} = 10^{-3}\). As a measure of the quality of our results, we use the common peak signal-to-noise ratio (PSNR), defined by

$$\begin{aligned} {\mathrm{PSNR}}\,(u,u_{\text{ ref }}) = 10 \log _{10} \left( \frac{M^2 \, \vert \varOmega \vert }{\Vert u-u_{\text{ ref }}\Vert _{L^2(\varOmega )}^2}\right) , \end{aligned}$$
(46)

where u is the recovered image, \(u_{\text{ ref }}\) is the reference image, and \(\vert \varOmega \vert \) is the area of the image. Moreover, \(M = 1\) is the maximum possible image value.
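
Assuming u and \(u_{\text{ ref }}\) are finite element functions on the same mesh, (46) can be evaluated, for instance, as in the following sketch.

```python
from math import log10
from dolfin import assemble, dx, Constant

def psnr(u, u_ref, M=1.0):
    """Peak signal-to-noise ratio (46)."""
    mesh = u.function_space().mesh()
    err2 = assemble((u - u_ref) ** 2 * dx(domain=mesh))   # ||u - u_ref||_{L2}^2
    area = assemble(Constant(1.0) * dx(domain=mesh))      # |Omega|
    return 10.0 * log10(M ** 2 * area / err2)
```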

Fig. 4

Original, noisy and denoised images (top to bottom) for \(\mathcal {D}\mathcal {G}_{0}\) (left column), \(\mathcal {D}\mathcal {G}_{1}\) (middle column) and \(\mathcal {D}\mathcal {G}_{2}\) (right column) for (DTV-L2) with parameter \(\beta = 10^{-3}\) in the isotropic setting (\(s = 2\)). Results obtained using Algorithm 1 (split Bregman), see Table 3. The results obtained by the Chambolle–Pock method are similar and not shown

Table 3 Comparison of the performance of Algorithms 1 and 2 for the denoising problem shown in Fig. 4 in various discretizations

7.1 Denoising of \(\mathcal {D}\mathcal {G}_{r}\)-Images

This section addresses the denoising of \(\mathcal {D}\mathcal {G}_{r}\) images, and it also serves as a comparative study of Algorithms 1 and 2. We represent (interpolate) the nondiscrete image displayed in Fig. 3 (middle) in the space \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) for \(r=0,1,2\). Noise is added to each degree of freedom as described above. We show the denoising results for the split Bregman method (Algorithm 1) in Fig. 4. The results for the Chambolle–Pock approach (Algorithm 2) are very similar and are therefore not shown. In either case, the noise is removed successfully. The value of infeasibility criterion (44b) in the final iteration was smaller than \(10^{-37}\) for Algorithm 1 and smaller than \(10^{-11}\) for Algorithm 2 in all cases \(r \in \{0,1,2\}\). Table 3 summarizes the convergence behavior of both methods. Since the split Bregman method performed slightly better w.r.t. iteration count and run time in our implementation, we use only Algorithm 1 for the subsequent denoising examples (Sects. 7.2 and 7.3).

Figure 4 visualizes the benefits of higher-order finite elements in particular in the case where the discontinuities in the image are not resolved by the computational mesh. In addition, the \(\mathcal {D}\mathcal {G}_{1}\) and \(\mathcal {D}\mathcal {G}_{2}\) solutions exhibit less staircasing. Further evidence for the benefits of higher-order polynomial spaces for the cameraman test image is given in Sect. 7.3.

Before continuing, we mention that all results in \(\mathcal {D}\mathcal {G}_{1}\) were interpolated onto \(\mathcal {D}\mathcal {G}_{0}\) on a twice refined mesh merely for visualization since \(\mathcal {D}\mathcal {G}_{1}\) functions cannot directly be displayed in ParaView. Likewise, results in \(\mathcal {D}\mathcal {G}_{2}\) were interpolated onto \(\mathcal {D}\mathcal {G}_{0}\) on a three times refined mesh for visualization.

7.2 Comparison to \(\mathcal {D}\mathcal {G}_{0}\) Image Denoising on Pixel Grids

In this section we provide a comparison of our approach, using \(\mathcal {D}\mathcal {G}_{r}\) representations of an image for \(r \in \{0,1,2\}\) and discrete problem (DTV-L2), with the classical representation by constant pixels. We refer to the latter as \(\mathcal {D}\mathcal {G}_{0}\) on pixels. In this example, we use the discrete cameraman test image on a \(256 \times 256\) pixel grid. For the finite element spaces, each pixel is refined into four triangles with crossed diagonals.

Fig. 5

Noisy (left) and denoised (right) images for classical \(\mathcal {D}\mathcal {G}_{0}\) on pixels (top row), and finite element solutions in \(\mathcal {D}\mathcal {G}_{0}\) (second row), \(\mathcal {D}\mathcal {G}_{1}\) (third row) and \(\mathcal {D}\mathcal {G}_{2}\) (bottom row) for (DTV-L2) with parameter \(\beta = 3\times 10^{-4}\) in the isotropic setting (\(s = 2\)). Results obtained using Algorithm 1 (split Bregman), see Table 4

For this problem we do not expect higher-order discretizations to be particularly beneficial since the ‘original’ image data are themselves only piecewise constant. In addition, we cannot directly compare run times, since the \(\mathcal {D}\mathcal {G}_{0}\) pixel problem was solved with a Matlab implementation of the split Bregman method (FEniCS does not support all function spaces on quadrilateral meshes). The same starting guess and stopping criterion (45) were used in all cases.

Table 4 Comparison of the performance of Algorithm 1 (split Bregman) for the denoising problem shown in Fig. 5 in various discretizations
Fig. 6

Original (interpolated), noisy and denoised images (top to bottom) for \(\mathcal {D}\mathcal {G}_{0}\) (left column) and \(\mathcal {D}\mathcal {G}_{2}\) (right column) for (DTV-L2) with parameter \(\beta = 4\times 10^{-4}\) for the isotropic setting (\(s = 2\)) on a coarse grid. Results obtained using Algorithm 1 (split Bregman), see Table 5

Table 5 Performance of Algorithm 1 (split Bregman) for the low-resolution denoising problem shown in Fig. 6 in various discretizations

The denoising results are shown in Fig. 5, and the convergence behavior of the split Bregman method is displayed in Table 4.

7.3 Denoising of Low-Resolution Images

In this section we consider a low-resolution version of the cameraman image, which was obtained by interpolating the \(256 \times 256\) pixel image onto a \(64 \times 64\) square pixel grid with crossed diagonals. Again, noise is added per coefficient in the respective space. Subsequently, the denoising problem is solved in the \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) spaces for \(r \in \{0,1,2\}\) on the coarse grid. The goal is to demonstrate that the use of higher-order polynomial functions can partially compensate for the loss of geometric resolution. In Fig. 6 we show the results obtained using the split Bregman method, whose performance was similar to that in Sect. 7.1, as can be seen in Table 5. The \({\mathrm{PSNR}}\,\) values were evaluated using the full-resolution image as \(u_{\text{ ref }}\).

Fig. 7

Inpainting with \(66.6\%\) of the cells erased (shown in black in the upper left image). The noisy images are not shown. Inpainting and denoising results for \(\mathcal {D}\mathcal {G}_{0}\) (upper right), \(\mathcal {D}\mathcal {G}_{1}\) (lower left) and \(\mathcal {D}\mathcal {G}_{2}\)(lower right) for (DTV-L2) with parameter \(\beta = 10^{-3}\) for the isotropic setting (\(s = 2\)). Results obtained using Algorithm 2 (Chambolle–Pock), see Table 6

Table 6 Performance of Algorithm 2 (Chambolle–Pock) for (DTV-L2) inpainting problem shown in Fig. 7 in various discretizations

As can be seen from the results in Fig. 6 and Table 5, the recovered image in \(\mathcal {D}\mathcal {G}_{2}(\varOmega )\), see Fig. 6 (bottom right), exceeds the \(\mathcal {D}\mathcal {G}_{0}\) image both in visual quality and in PSNR value.

7.4 Inpainting of \(\mathcal {D}\mathcal {G}_{r}\)-Images

In this section we demonstrate the utility of higher-order polynomial function spaces for the purpose of combined denoising and inpainting. To this end, we consider the nondiscrete ‘ball’ image and randomly delete two-thirds of all cells, which subsequently serve as the inpainting region \(\varOmega {\setminus }\varOmega _0\). Noise is added to the remaining data, and problem (DTV-L2) is solved in \(\mathcal {D}\mathcal {G}_{r}(\varOmega )\) for \(r \in \{0,1,2\}\); see Fig. 7. For this test, we found the Chambolle–Pock method (Algorithm 2) to perform better than the split Bregman method; see Table 6.

The results for this combined inpainting and denoising problem are similar to those for the pure denoising case (Sect. 7.1). Clearly, the higher-order results produce images closer to the original than the recovery in \(\mathcal {D}\mathcal {G}_{0}\), which is also reflected in the \({\mathrm{PSNR}}\,\) values.

8 Conclusion and Outlook

In this paper we have introduced a discrete version (DTV) of the TV-seminorm for globally discontinuous (\(\mathcal {D}\mathcal {G}_{r}\)) Lagrangian finite element functions on simplicial grids in \(\mathbb {R}^2\). We have shown that \(\vert \,\cdot \,\vert _{\mathrm{DTV}(\varOmega )}\) has a convenient dual representation in terms of the supremum over the space of Raviart–Thomas finite element functions, subject to a set of simple constraints. This allows for the efficient realization of a variety of algorithms based on dual problem (DTV-L2-D) for image denoising and inpainting, using both low- and higher-order finite element functions available in finite element libraries.

An extension to 3D applications is readily obtained by replacing triangles by tetrahedra and edges by facets. In this case the analogue of Lemma 3.1 limits the polynomial degrees with positive weights to \(r \in \{0,1,2\}\); see [48, Tab. II]. For further extensions to TV-\(L^1\) problems and to a ‘Huberized’ version of the discrete TV-seminorm, we refer the reader to the extended preprint [33].

As we admit higher-order polynomial functions, it would be natural to extend our analysis to a discrete version of the total generalized variation (TGV) functional introduced in [10]. Another generalization that could be of interest is to consider finite element functions defined on more general cells than the simplices considered here. Clearly, rectangles are of particular interest in imaging applications, but so are hexagons; see [18, 36], as mentioned in the introduction. We remark that \(\mathcal {R}\mathcal {T}_{\!\!\;}\) finite element spaces on parallelograms were already discussed in the original contribution [42], and we refer to [37] for an application to imaging, but only for the lowest-order case. The generalization to higher-order finite elements, as well as to more general element geometries, is left for future research.

The polynomial degree in our 2D study was limited to \(r \in \{0,1,2,4\}\), which should be sufficient for most applications. The limitation in the degree arises from the requirement that the quadrature weights, i.e., the integrals of the standard Lagrangian basis functions, be positive; see Lemma 3.1. This raises the question of whether a Lagrangian basis for higher-order polynomial functions on triangles or tetrahedra exists such that the integrals of the basis functions are (strictly) positive. This is answered in the affirmative by results in [51, 52] for the triangle and in [28, 55] for tetrahedra, where interpolatory quadrature formulas with positive weights are constructed. However, it remains to be investigated whether a Lagrangian finite element with a modified basis admits an appropriate Raviart–Thomas-type counterpart such that a dual representation of \(\vert \,\cdot \,\vert _{\mathrm{DTV}(\varOmega )}\) parallel to Theorem 3.1 continues to hold. Moreover, such nonstandard finite element spaces would certainly incur an implementation overhead.

One may also envision imaging applications where it would be beneficial to allow for locally varying polynomial degrees and mesh sizes, so that the resolution can be chosen adaptively. Finally, we mention possible extensions to vectorial TV-seminorms, see for instance [29]. These topics remain for future research.