Introduction and Related Work

In this work, we consider variational energy minimization problems of the form

$$\displaystyle \begin{aligned} \inf\limits_{u:\varOmega\to\varGamma} \int_\varOmega \rho (x,u(x)) \,dx + \lambda S(u),{} \end{aligned} $$
(5.1)

for estimating some unknown data u defined on an open, bounded, connected—usually rectangular—image domain \(\varOmega \subseteq \mathbb {R}^d\) with values in \(\varGamma \subseteq \mathbb {R}^n\). The data term in (5.1) is of integral form, with the integrand ρ(x, u(x)) typically depending on some noisy, corrupted measurements. We are particularly interested in the case where ρ is non-convex in u(x).

The regularizer S, weighted by a parameter λ > 0, encodes prior knowledge in order to account for randomness and is often used to resolve ambiguities and render the problem well-posed.

A classical convex example is the Rudin-Osher-Fatemi model with

$$\displaystyle \begin{aligned} \rho(x,u(x)) := \frac{1}{2} (u(x) - g(x))^2\quad \text{and} \quad S(u) := \ensuremath{\operatorname{TV}}(u), \end{aligned} $$
(5.2)

which can be used to remove noise from a given image g : Ω → Γ while preserving discontinuities [35]. The total variation \(\ensuremath {\operatorname {TV}}(u)\) is defined as the integral

$$\displaystyle \begin{aligned} \ensuremath{\operatorname{TV}}(u) := \int_\varOmega d \| D u \|, \end{aligned} $$
(5.3)

where the vector-valued Radon measure Du is used to represent the distributional derivative of u in order to allow for discontinuities [1, 41]. For (weakly) differentiable u, the total variation assumes the simpler form

$$\displaystyle \begin{aligned} \ensuremath{\operatorname{TV}}(u) = \int_\varOmega \|\nabla u(x)\|{}_2 d x.{} \end{aligned} $$
(5.4)

As we will be mainly focused on the discretized setting, we will restrict ourselves to the regular case and use the more suggestive notation (5.4).

In the ROF model, as ρ is convex, computing a global minimizer of (5.1) numerically is feasible even for large problems [4]. However, in many applications, one cannot assume convexity. As a prime example, consider the problem of image registration [24], also sometimes referred to as large-displacement optical flow: one starts with two images \(R,T:\varOmega \to \mathbb {R}\) and aims to find a deformation, also called displacement, \(u:\varOmega \to \mathbb {R}^d\) which is “sufficiently regular” and aligns R and T in the sense that

$$\displaystyle \begin{aligned} R(x) \approx T(x + u(x)) \end{aligned} $$
(5.5)

for all x ∈ Ω. A suitable energy is

$$\displaystyle \begin{aligned} \frac{1}{2} \int_\varOmega (R(x) - T(x + u(x)))^2 \, d x + \lambda S(u).{} \end{aligned} $$
(5.6)

This data term is also referred to as sum-of-squares distance (SSD) [25].

Numerically minimizing (5.6) is a challenging problem: not only is the data term generally non-convex, but the degree of non-convexity is also completely determined by the data R and T, which are generally noisy and result in an energy landscape with many local minima.

Typical methods for minimizing (5.6) therefore rely on local solvers such as gradient-based and (quasi-)Newton methods; see [30] for an algorithmic overview. In the context of optical flow [16], the classical, and still most common, method is to linearize T around a current estimate, which renders the problem convex. These approaches suffer from the typical issue of local non-convex optimization: the algorithm can get stuck in local minima and requires a good initial estimate. Much work has been dedicated to finding such a starting point, for example via “warping” and coarse-to-fine strategies [3].
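
For concreteness, the linearization step takes the following standard form (a sketch in our notation; u_0 denotes the current estimate and is not a symbol used elsewhere in this chapter): the data term in (5.6) is replaced by its first-order Taylor expansion around u_0,

$$\displaystyle \begin{aligned} \rho_{\mathrm{lin}}(x,u(x)) := \frac{1}{2} \big( T(x + u_0(x)) + \nabla T(x + u_0(x))^\top (u(x) - u_0(x)) - R(x) \big)^2, \end{aligned} $$

which is convex (in fact quadratic) in u(x), but only a faithful approximation for u(x) close to u_0(x); hence the need for a good initialization.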

Non-convexity also appears in much simpler settings, such as q-(pseudo-)norm denoising with energies of the form

$$\displaystyle \begin{aligned} \int_\varOmega |u(x)-g(x)|{}^q d x + \lambda S(u),{} \end{aligned} $$
(5.7)

with q < 1. This choice of q makes the method more robust against outliers in the data g, as the influence of an outlier diminishes as q → 0. Choosing q < 1 also encourages sparsity of the argument more strongly than convex variants with q ≥ 1, which is particularly useful in the context of sparse representation [11]; see also [29] for an extensive analysis of non-convex regularization. However, it again renders the data term non-smooth and non-convex. A recent development is to adapt methods for non-smooth convex optimization to the non-convex setting [26, 27]; however, these are again local, and convergence results are currently very limited.
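
To make the robustness claim concrete, the following minimal snippet (purely illustrative, not from the original text) tabulates the penalty |Δ|^q that a single residual of magnitude Δ contributes to the data term in (5.7): for small q, the cost of a large outlier saturates instead of growing, so a few outliers can no longer dominate the energy.

```python
# Penalty |delta|^q contributed by a single residual of size delta in (5.7).
# For q -> 0 the cost of a large outlier saturates near 1, whereas for
# q = 2 or q = 1 it grows with the outlier magnitude.
for q in (2.0, 1.0, 0.5, 0.1):
    row = ", ".join(f"|{d}|^{q} = {abs(d) ** q:7.3f}" for d in (0.1, 1.0, 10.0))
    print(f"q = {q}: {row}")
```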

Computational and algorithmic advances have recently made another strategy viable: instead of solving the non-convex problem directly, one approximates it by a—usually much larger—convex problem, which can be solved to global optimality. In order to approximate the original problem well, one relies on functional lifting, i.e., embedding the original problem into a much larger space: instead of solving

$$\displaystyle \begin{aligned} \inf_{u:\varOmega\to\varGamma} f(u),{} \end{aligned} $$
(5.8)

one solves the lifted problem

$$\displaystyle \begin{aligned} \inf_{\bar{u}:\varOmega\to\mathcal{P}(\varGamma)}\bar{f}(\bar{u}),{} \end{aligned} $$
(5.9)

where \(\mathcal {P}(\varGamma )\) is the set of probability measures over the range Γ, and \(\bar {f}\) is a suitable extension of f to this larger function space in the following sense: with each u : Ω → Γ, one can associate a function \(\bar {u}:\varOmega \to \mathcal {P}(\varGamma )\) which is a Dirac measure at every point, \(\bar {u}(x):=\delta _{u(x)}\), and require that \(\bar {f}(\bar {u})=f(u)\) for all u in the original space. Conversely, if the solution of (5.9) is a Dirac measure \(\delta_{u(x)}\) at every point and \(\bar {f}\) does not introduce artificial minimizers, then u is a solution of the original problem (5.8), as each element of the feasible set of (5.8) has a corresponding element in the feasible set of (5.9).

This leaves the question of how to define \(\bar {f}\) on arguments \(\bar {u}\) that are not Dirac measures, but rather mixtures or even diffuse measures. There is a series of publications discussing different strategies for deriving “good” liftings, starting with image segmentation [20, 32, 38, 39] and general convex first-order regularization [33] as a functional-analytic formulation of the classical paper [17], and recently advancing the framework to manifold-valued problems [21] and more accurate discretizations [18, 28, 40].

However, all these works assume that the regularizer depends only on the first-order derivative ∇u or its distributional counterpart. For natural images, such first-order regularization is often sub-optimal, as it penalizes linear parts and, in the case of \(\ensuremath {\operatorname {TV}}\), prefers piecewise constant solutions.

For natural images, regularizers that use second- and higher-order derivatives have been found to be much more suitable [2, 7, 22, 23, 31, 36]. Therefore one would like to use these more advanced regularizers in the functional lifting framework. However, so far there has been little progress in this direction. The reason is that the space of probability measures \(\mathcal {P}(\varGamma )\) is usually discretized as a discrete probability measure on ℓ chosen points in the range Γ. If one follows the same strategy as for lifting \(\ensuremath {\operatorname {TV}}\), one ends up with a large number of constraints on the dual variables, which grows at least cubically in ℓ. This requires choosing ℓ very small, which corresponds to a very coarse discretization of the range Γ of u and brings the accuracy below acceptable thresholds.

Contributions

In this work, we propose a method for approximating energies of the form (5.1) using functional lifting and convex relaxation, where ρ is a general non-convex data term and the regularizer S incorporates second-order information:

  • We investigate the non-smooth “Absolute Laplacian” regularizer, which incorporates second-order derivatives and coincides with \(\ensuremath {\operatorname {TV}}^2\) on one-dimensional domains (section “Lifting for Absolute Laplacian Regularization”).

  • After reviewing mathematical prerequisites (section “Notation and Mathematical Preliminaries”) and the discretized version of the problem, we discuss where the usual strategy for computing a convex extension of the regularizer fails for more involved regularizers (section “Approximate Relaxation of the Absolute Laplacian”).

    We prove that by introducing an approximation step, the number of required constraints can be reduced from cubic (ℓ³) to linear (ℓ) growth in the one-dimensional case (Theorem 5.1). We propose an extension to the case d ≥ 2, which—although currently without theoretical guarantees—has been very successful in all of our experiments.

  • In order to show that a non-convex data term combined with higher-order regularization has practical benefits, we illustrate the method on a synthetic q-pseudo-norm denoising example as in (5.7) with second-order regularization (section “Non-convex Denoising with Second-Order Regularity”).

  • We demonstrate the applicability to the non-convex problem of image registration as in (5.6) (section “Image Registration Using the Absolute Laplacian”).

We conclude with an outlook and notes on further open questions (section “Conclusion and Outlook”).

Lifting for Absolute Laplacian Regularization

In the following, we will consider a special case of second-order regularization: for \(u=(u_1,\ldots ,u_n):\varOmega \to \mathbb {R}^n\), we define the absolute Laplacian regularizer

$$\displaystyle \begin{aligned} S_{AL}(u) := \int_\varOmega \|\varDelta u(x)\|{}_1 \,dx.{} \end{aligned} $$
(5.10)

By convention, the Laplacian Δu := (Δu_1, …, Δu_n) is vector-valued for n > 1.

Similar to the total variation, S_AL can be extended to functions with a distributional Laplacian using a dual formulation; the natural function space can be viewed as the set of functions whose gradient is of bounded deformation [37]. Again we will focus on the discretized energy and therefore use the simplified notation (5.10).

The absolute Laplacian regularizer (5.10) has some drawbacks: most importantly, it is not isotropic, i.e., S_AL(u) = S_AL(Ru) does not hold in general for rotation matrices \(R\in \mathbb {R}^{n\times n}\), and it has a large kernel that includes all harmonic functions. The latter issue was also discussed in detail in [14] for quadratic Laplacian regularization.

It is tempting to substitute a full Hessian regularization such as [9, 15, 23]

$$\displaystyle \begin{aligned} \int_\varOmega \left( \sum_{i=1}^n \|\nabla^2 u_i(x)\|{}_2^2 \right)^{\frac{1}{2}} \,dx, \end{aligned} $$
(5.11)

however, this couples all components of u, which invalidates the argument used in the proof of Theorem 5.1 below. As of now, we have not found a way of efficiently computing a convex relaxation in the full Hessian-regularized case.

In contrast, the absolute Laplacian (5.10) decouples across the components u_i. Moreover, in the one-dimensional scalar case with d = 1 and n = 1, it is identical to the second-order total variation [9],

$$\displaystyle \begin{aligned} S_{AL}(u) = \int_\varOmega |u''(x)|\,dx \end{aligned} $$
(5.12)

or its distributional equivalent.

The absolute Laplacian is motivated by a regularizer that is—in a slightly loose interpretation of the term—known as “curvature” regularization in the medical image registration community [10] and penalizes the squared Laplacians \(\|\varDelta u(x)\|{ }_2^2\) instead of ∥Δu(x)∥_1. However, as we will see in the next sections, the 1-homogeneous nature of S_AL is crucial in order to accurately lift the regularizer.

Notation and Mathematical Preliminaries

In the following, we detail the discretized lifting approach. We follow the notation in [28]. In order to discretize the probability measures \(\mathcal {P}(\varGamma )\), we choose an n-dimensional regular grid of ℓ points {t_1, …, t_ℓ} ⊆ Γ, which are referred to as labels. The number of labels in each dimension of the range Γ is denoted by l_k, k = 1, …, n, so that ℓ = l_1 · … · l_n, and the grid spacing h is assumed to be uniform and constant.

The space \(\mathcal {P}(\varGamma )\) is discretized as the unit simplex in \(\mathbb {R}^\ell \),

$$\displaystyle \begin{aligned} \varDelta_\ell := \{\bar{p} \in \mathbb{R}^\ell | \bar{p} \geq 0,\;\sum_{i=1}^\ell \bar{p}_i = 1\}. \end{aligned} $$
(5.13)

In a slight abuse of notation, we will from now on denote by \(\bar u\) a function mapping into the set of discretized probability measures, i.e., \(\bar {u}:\varOmega \to \varDelta _\ell \). The i-th unit vector e_i ∈ Δ_ℓ, i ∈ {1, …, ℓ}, is associated with the Dirac measure \(\delta _{t_i}\) at label t_i. Rather than associating a general vector \(\bar {u}(x) \in \varDelta _\ell \) with a weighted sum of Dirac measures as is commonly done, we assign to each such vector a single Dirac measure \(\delta_{u(x)}\), where u(x) ∈ Γ is obtained by linear weighting of the labels:

$$\displaystyle \begin{aligned} u(x) = \sum_{i=1}^\ell \bar{u}_i(x) t_i.{} \end{aligned} $$
(5.14)

Whenever (5.14) holds, we refer to \(\bar {u}(x)\in \mathbb {R}^\ell \) as a lifted representation of u(x) ∈ Γ. A function \(\bar {u}:\varOmega \rightarrow \varDelta _\ell \) is called a lifted representation of the function u : Ω → Γ if (5.14) holds point-wise for all x ∈ Ω.
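
As a small illustration (our own sketch, not part of the original text), the canonical lifted representation of a value u(x) in [t_1, t_ℓ] places mass only on the two labels neighboring the value, with linear interpolation weights, so that (5.14) recovers u(x) exactly:

```python
import numpy as np

def lift(value, labels):
    """Lifted representation of a scalar value in [t_1, t_ell]: a vector on
    the unit simplex supported on the two labels neighboring the value,
    chosen so that (5.14), i.e. ubar @ labels, reproduces the value."""
    i = np.clip(np.searchsorted(labels, value) - 1, 0, len(labels) - 2)
    alpha = (value - labels[i]) / (labels[i + 1] - labels[i])
    ubar = np.zeros(len(labels))
    ubar[i], ubar[i + 1] = 1.0 - alpha, alpha
    return ubar

labels = np.linspace(0.0, 1.0, 5)   # t_1, ..., t_5 with spacing h = 0.25
ubar = lift(0.6, labels)
print(ubar)                         # [0.  0.  0.6 0.4 0. ]
print(ubar @ labels)                # 0.6, as required by (5.14)
```

Note that (5.14) does not determine \(\bar{u}(x)\) uniquely; the two-label representation above is merely the sparsest choice.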

Approximate Relaxation of the Absolute Laplacian

In order to illustrate the basic process of constructing an energy function for the lifted representation, first consider the data term in integral form:

$$\displaystyle \begin{aligned} F(u) := \int_\varOmega \rho(x,u(x)) \,dx. \end{aligned} $$
(5.15)

We discretize \(\mathcal {P}(\varGamma )\) as in the previous section, and seek a suitable convex extension of F,

$$\displaystyle \begin{aligned} \bar{F}(\bar{u}) := \int_\varOmega \bar{\rho}(x,\bar{u}(x)) \,dx, \end{aligned} $$
(5.16)

to all \(\bar {u}:\varOmega \to \varDelta _\ell \). A classical way [6] is to find the largest convex \(\bar {\rho }:\varOmega \times \varDelta _\ell \to \mathbb {R}\) such that

$$\displaystyle \begin{aligned} \bar{\rho}(x,e_i) = \rho(x,t_i),\quad i=1,\ldots,\ell. \end{aligned} $$
(5.17)

In order to do so for some fixed x, one first defines a function

$$\displaystyle \begin{aligned} \phi(p) := \begin{cases} \rho(x,t_i), &\text{if } p = e_i, \\ +\infty, & \text{otherwise,}\end{cases} {} \end{aligned} $$
(5.18)

and sets \(\bar {\rho }(x,p) := \phi ^{\ast \ast }(p)\), where ϕ^{∗∗} is the Legendre-Fenchel biconjugate [34]. More precisely,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \phi^{\ast}(f) &\displaystyle :=&\displaystyle \sup_{p} \{ \langle p,f \rangle - \phi(p)\} = \max_{i \in \{1,\ldots,\ell\}} \{ \langle e_i, f \rangle - \rho(x,t_i)\},{} \end{array} \end{aligned} $$
(5.19)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \phi^{\ast\ast}(p) &\displaystyle :=&\displaystyle \sup_{f} \{ \langle p,f \rangle - \phi^\ast(f)\}. \end{array} \end{aligned} $$
(5.20)

As can be seen from (5.19), even for integrands ρ that depend only on a single value u(x), the conjugate is generally composed of ℓ pieces. Using common first-order solvers, this incurs a cost of ℓ dual or auxiliary variables per point.

For the regularizer, this issue is much worse: Assume \(\varOmega \subseteq \mathbb {R}\), then the Laplacian of u at a point x is simply the second derivative and commonly discretized as

$$\displaystyle \begin{aligned} u''(x) \approx (u(x-\eta) - 2 u(x) + u(x+\eta))/\eta^2, \end{aligned} $$
(5.21)

which depends on three different values of u. A finite difference-based second-order regularizer will therefore be of the form

$$\displaystyle \begin{aligned} \int_\varOmega \rho(u(x-\eta),u(x),u(x+\eta)) \,dx, \end{aligned} $$
(5.22)

which results in three running indices in (5.19) and thus ℓ³ terms in the maximum. Even for a very moderate choice of ℓ = 10, this results in 1000 additional variables per point, which is impractical.

In this section, we therefore consider an approximation of this process for the special case of the absolute Laplacian regularizer (5.10), which only requires linear complexity. We derive the model for the one-dimensional case d = 1 and n = 1,

$$\displaystyle \begin{aligned} \int_\varOmega |u''(x)| d x, \end{aligned} $$
(5.23)

and subsequently discuss how to generalize it to n-dimensional image domains and vector-valued u.

The first step is to separate computation of the second derivatives from the lifting process, i.e., we also apply the derivative operator to the lifted representation \(\bar {u}\) and seek a lifted regularizer

$$\displaystyle \begin{aligned} \int_\varOmega \bar{\rho}(\bar{u}''(x)) d x \approx \int_\varOmega \bar{\rho}\left(\left( \bar{u}(x+\eta)-2 \bar{u}(x)+\bar{u}(x-\eta)\right)/\eta^2\right) d x, \end{aligned} $$
(5.24)

where x ± η are the neighboring points of x. For simplicity, we assume η = 1.

We apply the same process as in (5.18) to ρ(z) = |z| and set

$$\displaystyle \begin{aligned} \phi(p) = \begin{cases} \left\lvert \mu \right\rvert\cdot\left\lvert t_{i_1} - 2t_{i_0} + t_{i_2} \right\rvert, &\text{if } p = \mu \cdot (e_{i_1} - 2e_{i_0} + e_{i_2}), \\ +\infty, & \text{otherwise,}\end{cases} {} \end{aligned} $$
(5.25)

where 1 ≤ i_0, i_1, i_2 ≤ ℓ. The free variable \(\mu \in \mathbb {R}\) is not strictly required, but ensures that ϕ is positively homogeneous. This implies that the conjugate ϕ^∗ is the indicator function of some set, which simplifies the later optimization. Taking the convex conjugate, we obtain

$$\displaystyle \begin{aligned} \phi ^*(f) = \delta_{K_{1D}} (f) := \begin{cases} 0, & f \in K_{1D}, \\ +\infty, & \text{otherwise,}\end{cases} \end{aligned} $$
(5.26)

with the set

$$\displaystyle \begin{aligned} K_{1D} := \bigcap\limits_{1\leq i_{0}, i_{1},i_{2} \leq \ell}\big\lbrace f\in\mathbb{R}^\ell: f_{i_1} -2f_{i_0} + f_{i_2} \leq h\left\lvert i_1-2i_0+i_2 \right\rvert\big\rbrace. {} \end{aligned} $$
(5.27)

This is a straightforward computation following from the definition of the convex conjugate, making use of the 1-homogeneity of ϕ and of the assumption that the labels t_i are uniformly spaced with distance h. The above formulation consists of ℓ³ constraints, which would render the problem numerically intractable except for very small ℓ.
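
As a sanity check of this construction (our own illustration with scipy, not part of the original text), one can assemble the full set of ℓ³ constraints in (5.27) and verify numerically that the resulting biconjugate, i.e. the support function of K_1D, indeed reproduces the intended penalty h|i_1 − 2i_0 + i_2| on lifted second differences of label-valued arguments:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

ell, h = 5, 0.25

# all ell^3 constraints of (5.27): A f <= b
rows, rhs = [], []
for i0, i1, i2 in product(range(ell), repeat=3):
    row = np.zeros(ell)
    row[i1] += 1.0; row[i0] -= 2.0; row[i2] += 1.0
    rows.append(row); rhs.append(h * abs(i1 - 2 * i0 + i2))
A, b = np.array(rows), np.array(rhs)

def support(z):
    """sup { <z, f> : f in K_1D }. Since K_1D is invariant under adding
    constants to f and <z, 1> = 0 for the tested z, we may fix f_1 = 0."""
    res = linprog(-z, A_ub=A, b_ub=b, bounds=(None, None),
                  A_eq=np.eye(1, ell), b_eq=[0.0])
    return -res.fun

e = np.eye(ell)
for i0, i1, i2 in product(range(ell), repeat=3):
    z = e[i1] - 2 * e[i0] + e[i2]          # lifted second difference
    assert np.isclose(support(z), h * abs(i1 - 2 * i0 + i2), atol=1e-6)
print("phi**(e_i1 - 2 e_i0 + e_i2) = h|i1 - 2 i0 + i2| for all label triples")
```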

A main contribution of this work is the following theorem, which shows that the number of constraints can be reduced to linear order.

Theorem 5.1

The set K_1D in (5.27) with ℓ³ linear constraints can be equivalently represented by ℓ linear constraints:

$$\displaystyle \begin{aligned} \begin{array}{rcl} K_{1D} &\displaystyle =&\displaystyle \big\lbrace f\in\mathbb{R}^\ell: f_2-f_1 \leq h,\;\; f_{\ell} - f_{\ell-1} \geq -h \big\rbrace\cap \\ &\displaystyle &\displaystyle \bigcap\limits_{2\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: f_{i-1} -2f_i + f_{i+1}\leq 0\big\rbrace.{} \end{array} \end{aligned} $$
(5.28)

Proof

Denoting the right-hand side of (5.28) by \(K_{1D}^{red}\) and using the definition of K_1D in (5.27), we have to show that \(K_{1D} = K_{1D}^{red}\).

\(K_{1D} \subseteq K_{1D}^{red}\): Assume f ∈ K_1D as in (5.27), i.e., \(f_{i_1}-2f_{i_0}+f_{i_2} \leq h \lvert i_1 -2i_0 + i_2 \rvert \) holds for all triples i_0, i_1, i_2 ∈ {1, …, ℓ}. Choosing i_1 = i_2 = 2 and i_0 = 1, the first inequality in (5.28) follows. Analogously, we obtain the second inequality f_ℓ − f_{ℓ−1} ≥ −h by setting i_1 = i_2 = ℓ − 1 and i_0 = ℓ. All remaining inequalities in (5.28) follow by setting i_1 = i − 1, i_0 = i, i_2 = i + 1; therefore \(f \in K_{1D}^{red}\).

\(K_{1D} \supseteq K_{1D}^{red}\): Suppose \(f\in K_{1D}^{red}\), i.e., the inequalities in (5.28) hold. We define the vector \(a\in \mathbb {R}^{\ell -1}\), a_i := f_{i+1} − f_i, of differences between consecutive components of f. Using this notation, we reformulate the constraints (5.28) in terms of a:

$$\displaystyle \begin{aligned} a_1 &\leq h, {} \end{aligned} $$
(5.29)
$$\displaystyle \begin{aligned} a_{\ell-1} &\geq -h, \end{aligned} $$
(5.30)
$$\displaystyle \begin{aligned} a_{i-1} &\geq a_{i},\; \forall i \in \lbrace 2,\dots, \ell-1\rbrace{}. \end{aligned} $$
(5.31)

Thus the components a_i form a finite, monotonically non-increasing sequence that is bounded in absolute value by h, i.e., a ∈ S := {x ∈ [−h, +h]^{ℓ−1} : x_i ≥ x_{i+1}}.

If i_0 = i_1 = i_2, the inequality in (5.27) holds trivially. Otherwise, if exactly two of the indices agree, the inequality in (5.27) reduces (possibly after dividing by two) to the form

$$\displaystyle \begin{aligned} f_j - f_k \leq h |j - k|. \end{aligned} $$
(5.32)

Assuming without loss of generality that j > k, this inequality follows from

$$\displaystyle \begin{aligned} f_j - f_k = a_{k} + \ldots + a_{j-1} \leq |a_k|+\ldots+|a_{j-1}| \leq h|j-k| \end{aligned} $$
(5.33)

due to the observation that all a_i are bounded in absolute value by h.

We are left with the last case of three distinct indices i_0, i_1, i_2. Without loss of generality, assume i_1 > i_2; otherwise we swap the two symbols.

As all inequalities are invariant with respect to the addition of a constant to f, it suffices to prove the claim for all f with f_1 fixed to some constant; therefore we can assume f_1 = 0. Under this assumption, the linear map between vectors \(f \in \mathbb {R}^\ell \) in \(K_{1D}^{red}\) and vectors \(a\in \mathbb {R}^{\ell -1}\) satisfying (5.29)–(5.31) is bijective. As the vertices of the latter set are exactly the vectors of the form (h, …, h, −h, …, −h), we deduce from bijectivity that the vertices of the set \(K_{1D}^{red}\cap \{f|f_1=0\}\) are exactly the elements satisfying |f_{i+1} − f_i| = h for all i together with the concavity constraints f_{i−1} − 2f_i + f_{i+1} ≤ 0.

Showing that all f satisfying (5.28) are contained in the set in (5.27) is equivalent to showing

$$\displaystyle \begin{aligned} \max_{f \in K_{1D}^{red} \cap \{f|f_1=0\}} \{ f_{i_1} - 2 f_{i_0} + f_{i_2} \} \leq h|i_1 - 2 i_0 + i_2|.{} \end{aligned} $$
(5.34)

As the maximization problem is a linear program over a bounded polyhedron, it attains its maximum at a vertex of \(K_{1D}^{red}\cap \{f|f_1 = 0\}\). Therefore we only have to show that

$$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} \leq h|i_1 - 2 i_0 + i_2|{} \end{aligned} $$
(5.35)

for all f in the finite set of vertices, i.e., all f satisfying |f_{i+1} − f_i| = h and f_{i+1} − 2f_i + f_{i−1} ≤ 0 for all i (and still f_1 = 0). This can be argued case by case:

  • If i_0 < i_1 and i_0 < i_2: as the left-hand side in (5.35) can be written as \((f_{i_1} - f_{i_0}) + (f_{i_2} - f_{i_0})\) and due to the observation (5.33), the maximum is attained at the vertex f satisfying f_{i+1} = f_i + h for all i, with maximum value

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} = h (i_1 - i_0) + h (i_2 - i_0) = h(i_1 - 2 i_0 + i_2) = h |i_1 - 2 i_0 + i_2|, \end{aligned} $$
    (5.36)

    which shows that the inequality in (5.27) holds for this case.

  • If i_0 lies between i_1 and i_2: the maximum is attained for a vertex with either f_{i+1} = f_i + h or f_{i+1} = f_i − h for all i, depending on which of i_2 − i_0 and i_0 − i_1 is larger. Therefore

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} \leq \max\{\pm(h(i_0 - i_2)-h(i_1-i_0))\} =h\lvert i_1 - 2 i_0 + i_2\rvert. \end{aligned} $$
    (5.37)
  • If i_0 > i_1 and i_0 > i_2: again with the observation (5.33), we see that the maximum is attained for f_{i+1} = f_i − h for all i, in which case

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} = -(f_{i_0}-f_{i_1})-(f_{i_0}-f_{i_2}) = h (-i_2+2i_0 - i_1) = h|i_1 -2 i_0 + i_2|.\end{aligned} $$
    (5.38)

This shows that (5.35) holds for all vertices in the set \(K_{1D}^{red}\cap \{f|f_1=0\}\), and therefore for all points, which concludes the proof of the remaining inclusion \(K_{1D}^{red} \subseteq K_{1D}\). □
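
Theorem 5.1 can also be checked empirically. The following small script (our own sketch, not part of the original text) verifies on random test vectors that the ℓ³-constraint description (5.27) and the ℓ-constraint description (5.28) accept exactly the same vectors, and that every vector whose increments satisfy (5.29)–(5.31) passes all ℓ³ original constraints:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
ell, h = 8, 0.5

def in_K_full(f, tol=1e-9):
    """Membership test using all ell^3 constraints of (5.27)."""
    return all(f[i1] - 2 * f[i0] + f[i2] <= h * abs(i1 - 2 * i0 + i2) + tol
               for i0, i1, i2 in product(range(ell), repeat=3))

def in_K_red(f, tol=1e-9):
    """Membership test using the ell reduced constraints of (5.28)."""
    return (f[1] - f[0] <= h + tol and f[-1] - f[-2] >= -h - tol
            and bool(np.all(f[:-2] - 2 * f[1:-1] + f[2:] <= tol)))

# arbitrary vectors: the two membership tests must always agree
for _ in range(500):
    f = rng.normal(scale=h, size=ell)
    assert in_K_full(f) == in_K_red(f)

# vectors constructed as in (5.29)-(5.31): increments form a non-increasing
# sequence in [-h, h]; they must satisfy all ell^3 constraints as well
for _ in range(500):
    a = np.sort(rng.uniform(-h, h, size=ell - 1))[::-1]
    f = np.concatenate(([0.0], np.cumsum(a)))
    assert in_K_full(f) and in_K_red(f)

print("reduced and full descriptions of K_1D agree on all test vectors")
```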

Interestingly, in the classical convex relaxation for the (first-order) total variation used in [19, 33], the dual constraint set is of the form

$$\displaystyle \begin{aligned} K_{\ensuremath{\operatorname{TV}},1D} = \bigcap\limits_{1\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: |f_i-f_{i+1}| \leq h\big\rbrace. \end{aligned} $$
(5.39)

As the second family of constraints in (5.28) enforces f_{i+1} − f_i ≤ f_i − f_{i−1}, we obtain

$$\displaystyle \begin{aligned} K_{1D} = K_{\ensuremath{\operatorname{TV}},1D} \cap \bigcap\limits_{2\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: f_{i-1} -2f_i + f_{i+1}\leq 0\big\rbrace. \end{aligned} $$
(5.40)

Thus, when moving from first- to second-order regularization in the proposed way, the only addition is an extra non-positivity constraint on the second derivative of the dual variable f.

So far we have only considered the case of a one-dimensional domain Ω. In order to generalize the construction in (5.25) to d > 1 dimensions, we replace the one-dimensional three-point stencil by the corresponding Laplacian stencil in higher dimensions:

$$\displaystyle \begin{aligned} \phi(p) = \begin{cases} \left\lvert \mu \right\rvert\cdot\left\lvert \sum_{j=1}^d (i_{1,j} - 2 i_0 + i_{2,j}) \right\rvert, &\text{if } p = \mu \cdot \sum_{j=1}^d (e_{i_{1,j}} -2 e_{i_0} + e_{i_{2,j}}), \\ +\infty, & \text{otherwise,} \end{cases} {} \end{aligned} $$
(5.41)

where i_{1,j} and i_{2,j} are the indices of the neighboring points of i_0 in the j-th spatial direction. The convex conjugate can be computed similarly to the one-dimensional case:

$$\displaystyle \begin{aligned} \phi ^*(f) = \delta_K (f) \end{aligned} $$
(5.42)

with the set

$$\displaystyle \begin{aligned} K := \bigcap\limits_{1\leq i_0, i_{1,1}, i_{2,1}, \ldots \leq \ell} \big\lbrace f\in\mathbb{R}^\ell: \sum_{j=1}^d (f_{i_{1,j}} - 2 f_{i_0} + f_{i_{2,j}}) \leq h\left\lvert \sum_{j=1}^d (i_{1,j} - 2 i_0 + i_{2,j}) \right\rvert\big\rbrace. \end{aligned} $$
(5.43)

Putting everything together, the lifted absolute Laplacian regularizer for scalar-valued images on a d-dimensional image domain becomes

$$\displaystyle \begin{aligned} \bar{S}_{AL,s}(\bar{u}) := \int_\varOmega \sup_{f \in K}\langle \varDelta \bar{u} (x), f\rangle \,dx.{} \end{aligned} $$
(5.44)

In order to approximate the absolute Laplacian for lifted vector-valued functions u = (u_1, …, u_n), we apply (5.44) to the marginal distributions \(\bar {u}^{(k)}(x) := \varPi _k \bar {u}(x) \in \varDelta _{l_k}\) separately in each component k ∈ {1, …, n}, where

$$\displaystyle \begin{aligned} \varPi_k := \underbrace{(1,\dots,1)}_{l_1 \cdots \, l_{k-1} \text{ ones}}\otimes\;\mathrm{Id}_{l_k}\otimes\underbrace{(1,\dots,1)}_{l_{k+1} \cdots \, l_{n} \text{ ones}}\in\mathbb{R}^{l_k\times\ell} \end{aligned} $$
(5.45)

computes the k-th marginal distribution by summing the entries of \(\bar {u}\) over all dimensions of the range except the k-th one. As the absolute Laplacian regularizer decouples across the components of u, it can be approximated by applying the scalar lifted regularizer (5.44) to each marginalized label distribution and summing over the range dimensions:

$$\displaystyle \begin{aligned} \bar{S}_{AL}(\bar{u}) := \sum_{i=1}^n \bar{S}_{AL,s}(\varPi_i\bar{u}) =\sum_{i=1}^n\int_\varOmega \sup_{f^i \in K_{l_i}}\langle \varDelta \varPi_i \bar{u} (x), f^i(x)\rangle \,dx. \end{aligned} $$
(5.46)

Here \(K_{l_i}\subseteq \mathbb {R}^{l_i}\) denotes a set of the form (5.43) in l_i-dimensional space, which accounts for the fact that there may be a different number of labels in each dimension of the range.
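
For illustration, the marginalization matrices Π_k can be assembled directly from (5.45) with Kronecker products. The following sketch (our own, with 0-based k and assuming the lifted vector is flattened in C order, i.e. the last range dimension varies fastest; these conventions may differ from the actual implementation) also checks that the lifted unit vector of a label multi-index is mapped to the corresponding unit vectors of its components:

```python
import numpy as np

def marginalization_matrix(l, k):
    """Pi_k from (5.45): maps a lifted vector on an n-dimensional label grid
    with l = (l_1, ..., l_n) labels per dimension to its k-th marginal."""
    before = int(np.prod(l[:k]))      # l_1 * ... * l_{k-1} (1 if k == 0)
    after = int(np.prod(l[k + 1:]))   # l_{k+1} * ... * l_n  (1 if k == n-1)
    return np.kron(np.ones((1, before)),
                   np.kron(np.eye(l[k]), np.ones((1, after))))

# sanity check on a 5 x 7 label grid: the lifted unit vector of the label
# with multi-index (2, 4) marginalizes to e_2 and e_4, respectively
l = (5, 7)
ubar = np.zeros(np.prod(l))
ubar[np.ravel_multi_index((2, 4), l)] = 1.0
for k in range(len(l)):
    print(marginalization_matrix(l, k) @ ubar)
```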

After discretizing the image domain \(\varOmega \subseteq \mathbb {R}^d\) on a d-dimensional Cartesian grid Ω′, the full discretized problem can be formulated in saddle point form:

$$\displaystyle \begin{aligned} \inf\limits_{\bar{u}:\varOmega'\rightarrow\varDelta_{\ell}}\sup_{f^i:\varOmega'\rightarrow K_{l_i},\, i=1,\ldots,n} \sum_{x\in\varOmega'}\bar{\rho}^{}(x, \bar{u}(x)) + \lambda \sum_{x\in\varOmega'} \sum_{i=1}^n\langle \varDelta\varPi_i\bar{u}(x),f^i(x) \rangle. \end{aligned} $$
(5.47)

This problem can be readily solved using any available primal-dual method for non-smooth convex optimization.
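
For reference, writing (5.47) abstractly as \(\min_{\bar u} \max_{f} \langle D\bar u, f\rangle + G(\bar u) - F^*(f)\), with D the linear operator collecting the maps \(\varDelta\varPi_i\), G comprising the lifted data term together with the simplex constraints, and F^* the indicator of the dual constraint sets, the basic (unpreconditioned) primal-dual iteration takes the standard form

$$\displaystyle \begin{aligned} f^{k+1} = \mathrm{proj}_{\mathcal{K}}\big(f^k + \sigma D \hat{u}^k\big), \qquad \bar{u}^{k+1} = \mathrm{prox}_{\tau G}\big(\bar{u}^k - \tau D^\top f^{k+1}\big), \qquad \hat{u}^{k+1} = 2\bar{u}^{k+1} - \bar{u}^k, \end{aligned} $$

where \(\mathcal{K}\) denotes the product of the dual constraint sets and σ, τ > 0 are step sizes, chosen, e.g., by the diagonal preconditioning of [5]. This is only the generic scheme; the notation D, G, \(\mathcal{K}\) is ours and not used elsewhere in this chapter.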

We are not yet aware of a result similar to Theorem 5.1 that would reduce the number of constraints for the sets \(K_{l_i}\) in the same way as for K_1D. Therefore, we take a pragmatic approach: we approximate each of the sets \(K_{l_i}\) by the set K_1D in the corresponding dimension, which amounts to an outer approximation of \(K_{l_i}\). We can then apply Theorem 5.1 and solve the problem using the reduced number of constraints.

Experimental Results

We evaluate the proposed strategy for higher-order relaxation of non-convex problems on two applications. Firstly, we consider a non-convex denoising problem, using the Matlab extension CVX [12, 13] to solve the primal formulation of the saddle-point problem (5.47) on an Intel Core i7-4500U CPU with 8 GB of RAM.

Secondly, we examine a real-world image registration problem, using a CUDA 7.5.17 implementation of a first-order primal-dual algorithm with diagonal preconditioning [5], which runs on an Nvidia GeForce GTX 680 GPU with an Intel Core i7 960 CPU and 24 GB of RAM. The implementation uses a more recent “sublabel-accurate” approach for lifting the data term in order to reduce the required label resolution for the data term [18, 28].

Non-convex Denoising with Second-Order Regularity

In order to illustrate that non-convexity can be beneficial when combined with second-order regularization, we consider the simple one-dimensional denoising problem

$$\displaystyle \begin{aligned} \inf\limits_{u:\varOmega\rightarrow\mathbb{R}} \int_\varOmega |u(x)-g(x)|{}^q \,dx + \lambda \int_\varOmega |u''(x)| d x{} \end{aligned} $$
(5.48)

with \(\varOmega \subseteq \mathbb {R}\). For q = 1, one obtains a simple convex \(\ensuremath {\operatorname {TV}}^2-L^1\) denoising model, while for q < 1, the energy is generally non-convex. We used the proposed method to approximate a global solution of (5.48).
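
The experiments below were carried out with CVX in Matlab; purely as an illustration of how the discretized model can be assembled, the following Python/CVXPY sketch (our own reconstruction under the conventions of this chapter, not the authors' code) builds a small instance of the lifted problem for (5.48). The support function of K_1D in the regularizer is expressed through its LP dual, using the reduced constraints of Theorem 5.1, so that the whole problem becomes a single linear program in the lifted variable.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, ell, q, lam = 60, 17, 0.5, 0.1          # grid points, labels, exponent, weight
t = np.linspace(0.0, 1.0, ell)             # labels t_1, ..., t_ell
h = t[1] - t[0]                            # label spacing

# smooth signal with salt-and-pepper outliers (illustrative, not the paper's data)
x = np.linspace(0.0, 1.0, N)
g = 0.5 + 0.4 * np.sin(2 * np.pi * x)
out = rng.random(N) < 0.4
g[out] = rng.integers(0, 2, out.sum()).astype(float)

# sampled non-convex data term rho[j, i] = |t_i - g_j|^q; on the simplex the
# lifted data term is simply linear in ubar
rho = np.abs(t[None, :] - g[:, None]) ** q

# reduced description of K_1D from Theorem 5.1: A f <= b
A, b = np.zeros((ell, ell)), np.zeros(ell)
A[0, 0], A[0, 1], b[0] = -1.0, 1.0, h            # f_2 - f_1 <= h
A[1, -2], A[1, -1], b[1] = 1.0, -1.0, h          # f_{l-1} - f_l <= h
for r, i in enumerate(range(1, ell - 1), start=2):
    A[r, i - 1], A[r, i], A[r, i + 1] = 1.0, -2.0, 1.0   # concavity constraints

ubar = cp.Variable((N, ell), nonneg=True)        # lifted variable, rows on the simplex
y = cp.Variable((N - 2, ell), nonneg=True)       # LP multipliers for the support function

constraints = [cp.sum(ubar, axis=1) == 1]
reg = []
for j in range(1, N - 1):
    z = ubar[j - 1, :] - 2 * ubar[j, :] + ubar[j + 1, :]  # spatial second difference
    constraints.append(A.T @ y[j - 1, :] == z)            # sigma_K(z) = min b^T y
    reg.append(b @ y[j - 1, :])

objective = cp.sum(cp.multiply(ubar, rho)) + lam * sum(reg)
cp.Problem(cp.Minimize(objective), constraints).solve()

u = ubar.value @ t                               # recover u via (5.14)
print("lifted energy:", objective.value)
```

The recovered u is piecewise linear up to the label resolution; decreasing q below 1 only changes the precomputed array rho, while the convex structure of the lifted problem is unaffected.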

The method was applied to a smooth input signal g distorted by heavy salt-and-pepper noise, with 80% of the values randomly set to 0 or 1. The locations of the outliers were unknown to the solver, and no additional preprocessing or outlier masking was performed.

As can be seen from Figs. 5.1 and 5.2, combining higher-order regularization with a non-convex data term allows to reconstruct the signal more faithfully. While both approaches prefer piecewise linear results, as expected from the function-space formulation, in the convex approach with q = 1 noise from the input is carried over into the output before the signal structure becomes fully visible.

Fig. 5.1
figure 1

Classical convex second-order (\(\ensuremath {\operatorname {TV}}^2-L^1\)) denoising of a smooth signal corrupted by 80% blind salt-and-pepper noise using varying regularization strength λ. The result is piecewise affine, as expected from \(\operatorname {TV}^2\) regularization. Starting from large λ with heavy over-regularization and decreasing λ, noise is picked up early. There is no regime in which the noise is removed and the signal is reconstructed faithfully at the same time

Fig. 5.2
figure 2

Non-convex second-order (\(\ensuremath {\operatorname {TV}}^2-L^q\) with q = 0.1) denoising of the signal in Fig. 5.1 using the proposed convex lifting and approximation with varying regularization strength λ. The proposed approach allows to approximate global minimizers of such higher-order regularized non-convex models. The additional non-convexity achieves a better reconstruction of the signal (top right) than in the convex case (Fig. 5.1) before giving in to noise (bottom left)

While convex methods relying on L^1 data terms are often—rightfully—referred to as “robust” methods in comparison to methods using smooth or quadratic data terms, the non-convex approach with q = 0.1 is even more robust against outliers and returns a decent reconstruction for a range of λ on this challenging problem. Run times were on the order of 0.3 s for a discretization of Ω using 120 grid points and ℓ = 63 labels.

Image Registration Using the Absolute Laplacian

For a more challenging application, we apply the method to the image registration problem with SSD data term (5.6) and absolute Laplacian regularization.

Translation-Only Synthetic Image

We first apply the absolute Laplacian regularizer to a synthetic binary image registration problem. The input reference image R is a binary 64 × 64 image of two vertical boxes. The template image T is obtained by translating the input image by 12 pixels (Fig. 5.3, first row). Thus the ground truth is a uniform translation by 12 pixels and constitutes a global minimizer, as it has a vanishing data term and the second-order regularizer does not penalize linear deformations. This configuration is challenging for methods based on local optimization, as there is a strong local minimum. Furthermore, as the images contain large constant regions, the energy landscape has extensive flat regions with zero gradient.

Fig. 5.3
figure 3

Application of the proposed lifting for absolute Laplacian regularization to a synthetic image registration problem. A traditional curvature-regularized model solved using a local Gauss-Newton method serves as a baseline. The input reference image R (top left) and template image T (top right) differ by a ground truth translation of 12 pixels. The second and third row show the final difference images \(\frac {1}{2}(R(x) - T(x+u(x)))^2\) (left) and obtained deformation u visualized as a deformation grid (right). The classical local optimization method (second row) converges to a local solution which is not globally optimal and yields a non-constant deformation with a mean displacement of 2.3 pixels. Using the proposed functional lifting for absolute Laplacian regularization (bottom row), the global optimum is retrieved accurately with an average displacement of 12.0002 pixels

We compare our approach to a traditional curvature-regularized model solved using a single-resolution local minimization method implemented in the Matlab extension FAIR [24, 25]. The regularization strength was manually set to λ = 10; however, a wide range of values for λ produced the same qualitative behavior. The traditional approach leads to a solution that is not globally optimal (Fig. 5.3, second row). Using our approach, we retrieve the globally optimal ground truth with ℓ = 9 labels in the label space Γ = [−12, 12]² and a run time of 85 s, without having to resort to approaches such as coarse-to-fine or affine pre-registration for initialization (Fig. 5.3, bottom row).

Real-World Image Registration

As a real-world example, we employ the SSD energy with absolute Laplacian regularization to solve the image registration problem on a pair of X-ray images, and compare to the existing lifting approach [18] with total variation regularization. The regularization strength was manually set to λ = 0.05. Run times were 933 s for total variation and 515 s for absolute Laplacian minimization.

As can be seen from the numerical results (Fig. 5.4), while the first-order total variation regularization achieves a very good data fit, it results in a physically implausible, self-intersecting deformation grid (Fig. 5.4, second row). This behavior can partly be attributed to the well-known fact that total variation promotes piecewise constant solutions, commonly referred to as the staircasing effect [8]. In the context of medical image registration, this is highly undesirable, as jumps in the deformation map u correspond to infinite stretch or compression and often lead to self-intersections. In contrast, the proposed second-order regularizer (Fig. 5.4, bottom row) maintains a physically meaningful deformation while still achieving an acceptable data fit.

Fig. 5.4
figure 4

Comparison between two global optimization methods for medical image registration: classical first-order total variation regularization [18] and the proposed second-order lifting approach. The input data consists of a pair of 128 × 128 grayscale X-ray images of two right hands (top row). Both approaches are evaluated using ℓ = 10² = 100 labels, Γ = [−12, 12]², Ω = [0, 128]², and a regularization strength of λ = 0.05. The classical first-order total variation regularization generates piecewise constant deformations and a physically implausible self-intersecting deformation grid (second row). The second-order regularizer avoids discontinuities and maintains a physically meaningful deformation grid (bottom row)

Conclusion and Outlook

In this work, we have taken a first step towards extending the convex relaxation and functional lifting framework to second-order regularization. We showed how to solve the main issue of an exploding number of constraints for the absolute Laplacian regularization.

Experiments on a denoising problem showed that the combination of higher-order regularization and non-convex data terms can lead to better results than a convex model, and allows to recover highly corrupted data in a piecewise linear fashion. In the application of image registration, the absolute Laplacian faithfully retrieves simple translations and leads to a more realistic deformation grid than total variation regularization on a real-world problem.

While our relaxation allows to reduce the number of required constraints to linear complexity, it is an approximation, rather than a “tight” relaxation in the sense of an exact biconjugate, and the proof is still limited to one dimension. An open question is whether one can find a similar compact representation for the tight relaxation in more than one dimension.

Finally, in this work we have constrained ourselves to the discretized setting. A functional-analytic discussion as well as an extension to the more recent manifold-valued and sublabel-accurate relaxations remain subject of future work.