Introduction and Related Work

In this work, we consider variational energy minimization problems of the form

$$\displaystyle \begin{aligned} \inf\limits_{u:\varOmega\to\varGamma} \int_\varOmega \rho (x,u(x)) \,dx + \lambda S(u),{} \end{aligned} $$
(5.1)

for estimating some unknown data u defined on an open, bounded, connected—usually rectangular—image domain \(\varOmega \subseteq \mathbb {R}^d\) with values in \(\varGamma \subseteq \mathbb {R}^n\). The data term in (5.1) is of integral form, with the integrand ρ(x, u(x)) typically depending on some noisy, corrupted measurements. We are particularly interested in the case where ρ is non-convex in u(x).

The regularizer S, weighted by a parameter λ > 0, encodes prior knowledge in order to account for randomness and is often used to resolve ambiguities and render the problem well-posed.

A classical convex example is the Rudin-Osher-Fatemi model with

$$\displaystyle \begin{aligned} \rho(x,u(x)) := \frac{1}{2} (u(x) - g(x))^2\quad \text{and} \quad S(u) := \ensuremath{\operatorname{TV}}(u), \end{aligned} $$
(5.2)

which can be used to remove noise from a given image g : Ω → Γ while preserving discontinuities [35]. The total variation \(\ensuremath {\operatorname {TV}}(u)\) is defined as the integral

$$\displaystyle \begin{aligned} \ensuremath{\operatorname{TV}}(u) := \int_\varOmega d \| D u \|, \end{aligned} $$
(5.3)

where the vector-valued Radon measure Du is used to represent the distributional derivative of u in order to allow for discontinuities [1, 41]. For (weakly) differentiable u, the total variation assumes the simpler form

$$\displaystyle \begin{aligned} \ensuremath{\operatorname{TV}}(u) = \int_\varOmega \|\nabla u(x)\|{}_2 d x.{} \end{aligned} $$
(5.4)

As we will be mainly focused on the discretized setting, we will restrict ourselves to the regular case and use the more suggestive notation (5.4).

In the ROF model, as ρ is convex, computing a global minimizer of (5.1) numerically is feasible even for large problems [4]. However, in many applications, one cannot assume convexity. As a prime example, consider the problem of image registration [24], also sometimes referred to as large-displacement optical flow: one starts with two images \(R,T:\varOmega \to \mathbb {R}\) and aims to find a deformation, also called displacement, \(u:\varOmega \to \mathbb {R}^d\) which is “sufficiently regular” and aligns R and T in the sense that

$$\displaystyle \begin{aligned} R(x) \approx T(x + u(x)) \end{aligned} $$
(5.5)

for all x ∈ Ω. A suitable energy is

$$\displaystyle \begin{aligned} \frac{1}{2} \int_\varOmega (R(x) - T(x + u(x)))^2 \, d x + \lambda S(u).{} \end{aligned} $$
(5.6)

This data term is also referred to as sum-of-squares distance (SSD) [25].

Numerically minimizing (5.6) is a challenging problem: not only is the data term generally non-convex, but the degree of non-convexity is also completely determined by the data R and T, which are generally noisy and result in an energy landscape with many local minima.

Typical methods for minimizing (5.6) therefore rely on local solvers such as gradient-based and (quasi-)Newton methods; see [30] for an algorithmic overview. In the context of optical flow [16], the classical, and still most common, method is to linearize T around a current estimate, which renders the problem convex. These approaches suffer from the typical issue of local non-convex optimization: the algorithm can get stuck in local minima and requires a good initial estimate. Much work has been dedicated to finding such a starting point, for example via “warping” and coarse-to-fine strategies [3].
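
For concreteness, the linearization step takes the following standard form (a sketch in our notation; u_0 denotes the current estimate and is not a symbol used elsewhere in this chapter): the data term in (5.6) is replaced by its first-order Taylor expansion around u_0,

$$\displaystyle \begin{aligned} \rho_{\mathrm{lin}}(x,u(x)) := \frac{1}{2} \big( T(x + u_0(x)) + \nabla T(x + u_0(x))^\top (u(x) - u_0(x)) - R(x) \big)^2, \end{aligned} $$

which is convex (in fact quadratic) in u(x), but only a faithful approximation for u(x) close to u_0(x); hence the need for a good initialization.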

Non-convexity also appears in much simpler settings, such as q-(pseudo-)norm denoising with energies of the form

$$\displaystyle \begin{aligned} \int_\varOmega |u(x)-g(x)|{}^q d x + \lambda S(u),{} \end{aligned} $$
(5.7)

with q < 1. This choice of q makes the method more robust against outliers in the data g, as the influence of an outlier diminishes as q → 0. Choosing q < 1 also encourages sparsity of the argument more strongly than convex variants with q ≥ 1, which is particularly useful in the context of sparse representation [11]; see also [29] for an extensive analysis of non-convex regularization. However, it again renders the data term non-smooth and non-convex. A recent development is to adapt methods for non-smooth convex optimization to the non-convex setting [26, 27]; however, these are again local, and convergence results are currently very limited.
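
To make the robustness claim concrete, the following minimal snippet (purely illustrative, not from the original text) tabulates the penalty |Δ|^q that a single residual of magnitude Δ contributes to the data term in (5.7): for small q, the cost of a large outlier saturates instead of growing, so a few outliers can no longer dominate the energy.

```python
# Penalty |delta|^q contributed by a single residual of size delta in (5.7).
# For q -> 0 the cost of a large outlier saturates near 1, whereas for
# q = 2 or q = 1 it grows with the outlier magnitude.
for q in (2.0, 1.0, 0.5, 0.1):
    row = ", ".join(f"|{d}|^{q} = {abs(d) ** q:7.3f}" for d in (0.1, 1.0, 10.0))
    print(f"q = {q}: {row}")
```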

Computational and algorithmic advances have recently made another strategy viable: instead of solving the non-convex problem directly, one approximates it by a—usually much larger—convex problem, which can be solved to global optimality. In order to approximate the original problem well, one relies on functional lifting, i.e., embedding the original problem into a much larger space: instead of solving

$$\displaystyle \begin{aligned} \inf_{u:\varOmega\to\varGamma} f(u),{} \end{aligned} $$
(5.8)

one solves the lifted problem

$$\displaystyle \begin{aligned} \inf_{\bar{u}:\varOmega\to\mathcal{P}(\varGamma)}\bar{f}(\bar{u}),{} \end{aligned} $$
(5.9)

where \(\mathcal {P}(\varGamma )\) is the set of probability measures over the range Γ, and \(\bar {f}\) is a suitable extension of f to this larger function space in the following sense: with each u : Ω → Γ, one can associate a function \(\bar {u}:\varOmega \to \mathcal {P}(\varGamma )\) which is a Dirac measure at every point, \(\bar {u}(x):=\delta _{u(x)}\), and require that \(\bar {f}(\bar {u})=f(u)\) for all u in the original space. Conversely, if the solution of (5.9) is a Dirac measure \(\delta_{u(x)}\) at every point and \(\bar {f}\) does not introduce artificial minimizers, then u is a solution of the original problem (5.8), as each element of the feasible set of (5.8) has a corresponding element in the feasible set of (5.9).

This leaves the question of how to define \(\bar {f}\) on arguments \(\bar {u}\) that are not Dirac measures, but rather mixtures or even diffuse measures. There is a series of publications discussing different strategies for deriving “good” liftings, starting with image segmentation [20, 32, 38, 39] and general convex first-order regularization [33] as a functional-analytic formulation of the classical paper [17], and recently advancing the framework to manifold-valued problems [21] and more accurate discretizations [18, 28, 40].

However, all these works assume that the regularizer depends only on the first-order derivative ∇u or its distributional counterpart. For natural images, such first-order regularization is often sub-optimal, as it penalizes linear parts and, in the case of \(\ensuremath {\operatorname {TV}}\), prefers piecewise constant solutions.

For natural images, regularizers that use second- and higher-order derivatives have been found to be much more suitable [2, 7, 22, 23, 31, 36]. Therefore one would like to use these more advanced regularizers in the functional lifting framework. However, so far there has been little progress in this direction. The reason is that the space of probability measures \(\mathcal {P}(\varGamma )\) is usually discretized as a discrete probability measure on ℓ chosen points in the range Γ. If one follows the same strategy as for lifting \(\ensuremath {\operatorname {TV}}\), one ends up with a large number of constraints on the dual variables, which grows at least cubically in ℓ. This requires choosing ℓ very small, which corresponds to a very coarse discretization of the range Γ of u and brings the accuracy below acceptable thresholds.

Contributions

In this work, we propose a method for approximating energies of the form (5.1) using functional lifting and convex relaxation, where ρ is a general non-convex data term and the regularizer S incorporates second-order information:

  • We investigate the non-smooth “Absolute Laplacian” regularizer, which incorporates second-order derivatives and coincides with \(\ensuremath {\operatorname {TV}}^2\) on one-dimensional domains (section “Lifting for Absolute Laplacian Regularization”).

  • After reviewing mathematical prerequisites (section “Notation and Mathematical Preliminaries”) and the discretized version of the problem, we discuss where the usual strategy for computing a convex extension of the regularizer fails for more involved regularizers (section “Approximate Relaxation of the Absolute Laplacian”).

    We prove that by introducing an approximation step, the number of required constraints can be reduced from cubic (ℓ³) to linear (ℓ) growth in the one-dimensional case (Theorem 5.1). We propose an extension to the case d ≥ 2, which—although currently without theoretical guarantees—has been very successful in all of our experiments.

  • In order to show that a non-convex data term combined with higher-order regularization has practical benefits, we illustrate the method on a synthetic q-pseudo-norm denoising example as in (5.7) with second-order regularization (section “Non-convex Denoising with Second-Order Regularity”).

  • We demonstrate the applicability to the non-convex problem of image registration as in (5.6) (section “Image Registration Using the Absolute Laplacian”).

We conclude with an outlook and notes on further open questions (section “Conclusion and Outlook”).

Lifting for Absolute Laplacian Regularization

In the following, we will consider a special case of second-order regularization: for \(u=(u_1,\ldots ,u_n):\varOmega \to \mathbb {R}^n\), we define the absolute Laplacian regularizer

$$\displaystyle \begin{aligned} S_{AL}(u) := \int_\varOmega \|\varDelta u(x)\|{}_1 \,dx.{} \end{aligned} $$
(5.10)

By convention, the Laplacian Δu := (Δu_1, …, Δu_n) is vector-valued for n > 1.

Similar to the total variation, S_AL can be extended to functions with a distributional Laplacian using a dual formulation; the natural function space can be viewed as the set of functions whose gradient is of bounded deformation [37]. Again we will focus on the discretized energy and therefore use the simplified notation (5.10).

The absolute Laplacian regularizer (5.10) has some drawbacks: most importantly, it is not isotropic, i.e., S_AL(u) = S_AL(Ru) does not hold in general for rotation matrices \(R\in \mathbb {R}^{n\times n}\), and it has a large kernel that includes all harmonic functions. The latter issue was also discussed in detail in [14] for quadratic Laplacian regularization.

It is tempting to substitute a full Hessian regularization such as [9, 15, 23]

$$\displaystyle \begin{aligned} \int_\varOmega \left( \sum_{i=1}^n \|\nabla^2 u_i(x)\|{}_2^2 \right)^{\frac{1}{2}} \,dx, \end{aligned} $$
(5.11)

however, this couples all components of u, which invalidates the argument used in the proof of Theorem 5.1 below. As of now, we have not found a way of efficiently computing a convex relaxation in the full Hessian-regularized case.

In contrast, the absolute Laplacian (5.10) decouples across the components u_i. Moreover, in the one-dimensional scalar case with d = 1 and n = 1, it is identical to the second-order total variation [9],

$$\displaystyle \begin{aligned} S_{AL}(u) = \int_\varOmega |u''(x)|\,dx \end{aligned} $$
(5.12)

or its distributional equivalent.

The absolute Laplacian is motivated by a regularizer that is—in a slightly loose interpretation of the term—known as “curvature” regularization in the medical image registration community [10] and penalizes the squared Laplacians \(\|\varDelta u(x)\|{ }_2^2\) instead of ∥Δu(x)∥_1. However, as we will see in the next sections, the 1-homogeneous nature of S_AL is crucial in order to accurately lift the regularizer.

Notation and Mathematical Preliminaries

In the following, we detail the discretized lifting approach. We follow the notation in [28]. In order to discretize the probability measures \(\mathcal {P}(\varGamma )\), we choose an n-dimensional regular grid of ℓ points {t_1, …, t_ℓ} ⊆ Γ, which are referred to as labels. The number of labels in each dimension of the range Γ is denoted by l_k, k = 1, …, n, so that ℓ = l_1 · … · l_n, and the grid spacing h is assumed to be uniform and constant.

The space \(\mathcal {P}(\varGamma )\) is discretized as the unit simplex in \(\mathbb {R}^\ell \),

$$\displaystyle \begin{aligned} \varDelta_\ell := \{\bar{p} \in \mathbb{R}^\ell | \bar{p} \geq 0,\;\sum_{i=1}^\ell \bar{p}_i = 1\}. \end{aligned} $$
(5.13)

In a slight abuse of notation, we will from now on denote by \(\bar u\) a function mapping into the set of discretized probability measures, i.e., \(\bar {u}:\varOmega \to \varDelta _\ell \). The i-th unit vector e_i ∈ Δ_ℓ, i ∈ {1, …, ℓ}, is associated with the Dirac measure \(\delta _{t_i}\) at label t_i. Rather than associating a general vector \(\bar {u}(x) \in \varDelta _\ell \) with a weighted sum of Dirac measures as is commonly done, we assign to each such vector a single Dirac measure \(\delta_{u(x)}\), where u(x) ∈ Γ is obtained by linear weighting of the labels:

$$\displaystyle \begin{aligned} u(x) = \sum_{i=1}^\ell \bar{u}_i(x) t_i.{} \end{aligned} $$
(5.14)

Whenever (5.14) holds, we refer to \(\bar {u}(x)\in \mathbb {R}^\ell \) as a lifted representation of u(x) ∈ Γ. A function \(\bar {u}:\varOmega \rightarrow \varDelta _\ell \) is called a lifted representation of the function u : Ω → Γ if (5.14) holds point-wise for all x ∈ Ω.
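
As a small illustration (our own sketch, not part of the original text), the canonical lifted representation of a value u(x) in [t_1, t_ℓ] places mass only on the two labels neighboring the value, with linear interpolation weights, so that (5.14) recovers u(x) exactly:

```python
import numpy as np

def lift(value, labels):
    """Lifted representation of a scalar value in [t_1, t_ell]: a vector on
    the unit simplex supported on the two labels neighboring the value,
    chosen so that (5.14), i.e. ubar @ labels, reproduces the value."""
    i = np.clip(np.searchsorted(labels, value) - 1, 0, len(labels) - 2)
    alpha = (value - labels[i]) / (labels[i + 1] - labels[i])
    ubar = np.zeros(len(labels))
    ubar[i], ubar[i + 1] = 1.0 - alpha, alpha
    return ubar

labels = np.linspace(0.0, 1.0, 5)   # t_1, ..., t_5 with spacing h = 0.25
ubar = lift(0.6, labels)
print(ubar)                         # [0.  0.  0.6 0.4 0. ]
print(ubar @ labels)                # 0.6, as required by (5.14)
```

Note that (5.14) does not determine \(\bar{u}(x)\) uniquely; the two-label representation above is merely the sparsest choice.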

Approximate Relaxation of the Absolute Laplacian

In order to illustrate the basic process of constructing an energy function for the lifted representation, first consider the data term in integral form:

$$\displaystyle \begin{aligned} F(u) := \int_\varOmega \rho(x,u(x)) \,dx. \end{aligned} $$
(5.15)

We discretize \(\mathcal {P}(\varGamma )\) as in the previous section, and seek a suitable convex extension of F,

$$\displaystyle \begin{aligned} \bar{F}(\bar{u}) := \int_\varOmega \bar{\rho}(x,\bar{u}(x)) \,dx, \end{aligned} $$
(5.16)

to all \(\bar {u}:\varOmega \to \varDelta _\ell \). A classical way [6] is to find the largest convex \(\bar {\rho }:\varOmega \times \varDelta _\ell \to \mathbb {R}\) such that

$$\displaystyle \begin{aligned} \bar{\rho}(x,e_i) = \rho(x,t_i),\quad i=1,\ldots,\ell. \end{aligned} $$
(5.17)

In order to do so for some fixed x, one first defines a function

$$\displaystyle \begin{aligned} \phi(p) := \begin{cases} \rho(x,t_i), &\text{if } p = e_i, \\ +\infty, & \text{otherwise,}\end{cases} {} \end{aligned} $$
(5.18)

and sets \(\bar {\rho }(x,p) := \phi ^{\ast \ast }(p)\), where ϕ^{∗∗} is the Legendre-Fenchel biconjugate [34]. More precisely,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \phi^{\ast}(f) &\displaystyle :=&\displaystyle \sup_{p} \{ \langle p,f \rangle - \phi(p)\} = \max_{i \in \{1,\ldots,\ell\}} \{ \langle e_i, f \rangle - \rho(x,t_i)\},{} \end{array} \end{aligned} $$
(5.19)
$$\displaystyle \begin{aligned} \begin{array}{rcl} \phi^{\ast\ast}(p) &\displaystyle :=&\displaystyle \sup_{f} \{ \langle p,f \rangle - \phi^\ast(f)\}. \end{array} \end{aligned} $$
(5.20)

As can be seen from (5.19), even for integrands ρ that depend only on a single value u(x), the conjugate is generally composed of ℓ pieces. Using common first-order solvers, this incurs a cost of ℓ dual or auxiliary variables per point.

For the regularizer, this issue is much worse: Assume \(\varOmega \subseteq \mathbb {R}\), then the Laplacian of u at a point x is simply the second derivative and commonly discretized as

$$\displaystyle \begin{aligned} u''(x) \approx (u(x-\eta) - 2 u(x) + u(x+\eta))/\eta^2, \end{aligned} $$
(5.21)

which depends on three different values of u. A finite difference-based second-order regularizer will therefore be of the form

$$\displaystyle \begin{aligned} \int_\varOmega \rho(u(x-\eta),u(x),u(x+\eta)) \,dx, \end{aligned} $$
(5.22)

which results in three running indices in (5.19) and thus ℓ³ terms in the maximum. Even for a very moderate choice of ℓ = 10, this results in 1000 additional variables per point, which is impractical.

In this section, we therefore consider an approximation of this process for the special case of the absolute Laplacian regularizer (5.10), which only requires linear complexity. We derive the model for the one-dimensional case d = 1 and n = 1,

$$\displaystyle \begin{aligned} \int_\varOmega |u''(x)| d x, \end{aligned} $$
(5.23)

and subsequently discuss how to generalize it to n-dimensional image domains and vector-valued u.

The first step is to separate computation of the second derivatives from the lifting process, i.e., we also apply the derivative operator to the lifted representation \(\bar {u}\) and seek a lifted regularizer

$$\displaystyle \begin{aligned} \int_\varOmega \bar{\rho}(\bar{u}''(x)) d x \approx \int_\varOmega \bar{\rho}\left(\left( \bar{u}(x+\eta)-2 \bar{u}(x)+\bar{u}(x-\eta)\right)/\eta^2\right) d x, \end{aligned} $$
(5.24)

where x ± η are the neighboring points of x. For simplicity, we assume η = 1.

We apply the same process as in (5.18) to ρ(z) = |z| and set

$$\displaystyle \begin{aligned} \phi(p) = \begin{cases} \left\lvert \mu \right\rvert\cdot\left\lvert t_{i_1} - 2t_{i_0} + t_{i_2} \right\rvert, &\text{if } p = \mu \cdot (e_{i_1} - 2e_{i_0} + e_{i_2}), \\ +\infty, & \text{otherwise,}\end{cases} {} \end{aligned} $$
(5.25)

where 1 ≤ i_0, i_1, i_2 ≤ ℓ. The free variable \(\mu \in \mathbb {R}\) is not strictly required, but ensures that ϕ is positively homogeneous. This implies that the conjugate ϕ^∗ is the indicator function of some set, which simplifies the later optimization. Taking the convex conjugate, we obtain

$$\displaystyle \begin{aligned} \phi ^*(f) = \delta_{K_{1D}} (f) := \begin{cases} 0, & f \in K_{1D}, \\ +\infty, & \text{otherwise,}\end{cases} \end{aligned} $$
(5.26)

with the set

$$\displaystyle \begin{aligned} K_{1D} := \bigcap\limits_{1\leq i_{0}, i_{1},i_{2} \leq \ell}\big\lbrace f\in\mathbb{R}^\ell: f_{i_1} -2f_{i_0} + f_{i_2} \leq h\left\lvert i_1-2i_0+i_2 \right\rvert\big\rbrace. {} \end{aligned} $$
(5.27)

This is a straightforward computation following from the definition of the convex conjugate, making use of the 1-homogeneity of ϕ and of the assumption that the labels t_i are uniformly spaced with distance h. The above formulation consists of ℓ³ constraints, which would render the problem numerically intractable except for very small ℓ.
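
As a sanity check of this construction (our own illustration with scipy, not part of the original text), one can assemble the full set of ℓ³ constraints in (5.27) and verify numerically that the resulting biconjugate, i.e. the support function of K_1D, indeed reproduces the intended penalty h|i_1 − 2i_0 + i_2| on lifted second differences of label-valued arguments:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

ell, h = 5, 0.25

# all ell^3 constraints of (5.27): A f <= b
rows, rhs = [], []
for i0, i1, i2 in product(range(ell), repeat=3):
    row = np.zeros(ell)
    row[i1] += 1.0; row[i0] -= 2.0; row[i2] += 1.0
    rows.append(row); rhs.append(h * abs(i1 - 2 * i0 + i2))
A, b = np.array(rows), np.array(rhs)

def support(z):
    """sup { <z, f> : f in K_1D }. Since K_1D is invariant under adding
    constants to f and <z, 1> = 0 for the tested z, we may fix f_1 = 0."""
    res = linprog(-z, A_ub=A, b_ub=b, bounds=(None, None),
                  A_eq=np.eye(1, ell), b_eq=[0.0])
    return -res.fun

e = np.eye(ell)
for i0, i1, i2 in product(range(ell), repeat=3):
    z = e[i1] - 2 * e[i0] + e[i2]          # lifted second difference
    assert np.isclose(support(z), h * abs(i1 - 2 * i0 + i2), atol=1e-6)
print("phi**(e_i1 - 2 e_i0 + e_i2) = h|i1 - 2 i0 + i2| for all label triples")
```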

A main contribution of this work is the following theorem, which shows that the number of constraints can be reduced to linear order.

Theorem 5.1

The set K_1D in (5.27) with ℓ³ linear constraints can be equivalently represented by ℓ linear constraints:

$$\displaystyle \begin{aligned} \begin{array}{rcl} K_{1D} &\displaystyle =&\displaystyle \big\lbrace f\in\mathbb{R}^\ell: f_2-f_1 \leq h,\;\; f_{\ell} - f_{\ell-1} \geq -h \big\rbrace\cap \\ &\displaystyle &\displaystyle \bigcap\limits_{2\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: f_{i-1} -2f_i + f_{i+1}\leq 0\big\rbrace.{} \end{array} \end{aligned} $$
(5.28)

Proof

Denoting the right-hand side of (5.28) by \(K_{1D}^{red}\) and using the definition of K_1D in (5.27), we have to show that \(K_{1D} = K_{1D}^{red}\).

\(K_{1D} \subseteq K_{1D}^{red}\): Assume f ∈ K_1D as in (5.27), i.e., \(f_{i_1}-2f_{i_0}+f_{i_2} \leq h \lvert i_1 -2i_0 + i_2 \rvert \) holds for all triples i_0, i_1, i_2 ∈ {1, …, ℓ}. Choosing i_1 = i_2 = 2 and i_0 = 1, the first inequality in (5.28) follows. Analogously, we obtain the second inequality f_ℓ − f_{ℓ−1} ≥ −h by setting i_1 = i_2 = ℓ − 1 and i_0 = ℓ. All remaining inequalities in (5.28) follow by setting i_1 = i − 1, i_0 = i, i_2 = i + 1; therefore \(f \in K_{1D}^{red}\).

\(K_{1D} \supseteq K_{1D}^{red}\): Suppose \(f\in K_{1D}^{red}\), i.e., the inequalities in (5.28) hold. We define the vector \(a\in \mathbb {R}^{\ell -1}\), a_i := f_{i+1} − f_i, of differences between consecutive components of f. Using this notation, we reformulate the constraints (5.28) in terms of a:

$$\displaystyle \begin{aligned} a_1 &\leq h, {} \end{aligned} $$
(5.29)
$$\displaystyle \begin{aligned} a_{\ell-1} &\geq -h, \end{aligned} $$
(5.30)
$$\displaystyle \begin{aligned} a_{i-1} &\geq a_{i},\; \forall i \in \lbrace 2,\dots, \ell-1\rbrace{}. \end{aligned} $$
(5.31)

Thus the components a_i form a finite, monotonically non-increasing sequence that is bounded in absolute value by h, i.e., a ∈ S := {x ∈ [−h, +h]^{ℓ−1} : x_i ≥ x_{i+1}}.

If i_0 = i_1 = i_2, the inequality in (5.27) holds trivially. Otherwise, if exactly two of the indices agree, the inequality in (5.27) reduces (possibly after dividing by two) to the form

$$\displaystyle \begin{aligned} f_j - f_k \leq h |j - k|. \end{aligned} $$
(5.32)

Assuming without loss of generality that j > k, this inequality follows from

$$\displaystyle \begin{aligned} f_j - f_k = a_{k} + \ldots + a_{j-1} \leq |a_k|+\ldots+|a_{j-1}| \leq h|j-k| \end{aligned} $$
(5.33)

due to the observation that all a_i are bounded in absolute value by h.

We are left with the last case of three distinct indices i_0, i_1, i_2. Without loss of generality, assume i_1 > i_2; otherwise we swap the two symbols.

As all inequalities are invariant with respect to the addition of a constant to f, it suffices to prove the claim for all f with f_1 fixed to some constant; therefore we can assume f_1 = 0. Under this assumption, the linear map between vectors \(f \in \mathbb {R}^\ell \) in \(K_{1D}^{red}\) and vectors \(a\in \mathbb {R}^{\ell -1}\) satisfying (5.29)–(5.31) is bijective. As the vertices of the latter set are exactly the vectors of the form (h, …, h, −h, …, −h), we deduce from bijectivity that the vertices of the set \(K_{1D}^{red}\cap \{f|f_1=0\}\) are exactly the elements satisfying |f_{i+1} − f_i| = h for all i together with the concavity constraints f_{i−1} − 2f_i + f_{i+1} ≤ 0.

Showing that all f satisfying (5.28) are contained in the set in (5.27) is equivalent to showing

$$\displaystyle \begin{aligned} \max_{f \in K_{1D}^{red} \cap \{f|f_1=0\}} \{ f_{i_1} - 2 f_{i_0} + f_{i_2} \} \leq h|i_1 - 2 i_0 + i_2|.{} \end{aligned} $$
(5.34)

As the maximization problem is a linear program over a bounded polyhedron, it attains its maximum at a vertex of \(K_{1D}^{red}\cap \{f|f_1 = 0\}\). Therefore we only have to show that

$$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} \leq h|i_1 - 2 i_0 + i_2|{} \end{aligned} $$
(5.35)

for all f in the finite set of vertices, i.e., all f satisfying |f_{i+1} − f_i| = h and f_{i+1} − 2f_i + f_{i−1} ≤ 0 for all i (and still f_1 = 0). This can be argued case by case:

  • If i_0 < i_1 and i_0 < i_2: as the left-hand side in (5.35) can be written as \((f_{i_1} - f_{i_0}) + (f_{i_2} - f_{i_0})\) and due to the observation (5.33), the maximum is attained at the vertex f satisfying f_{i+1} = f_i + h for all i, with maximum value

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} = h (i_1 - i_0) + h (i_2 - i_0) = h(i_1 - 2 i_0 + i_2) = h |i_1 - 2 i_0 + i_2|, \end{aligned} $$
    (5.36)

    which shows that the inequality in (5.27) holds for this case.

  • If i_0 lies between i_1 and i_2: the maximum is attained for a vertex with either f_{i+1} = f_i + h or f_{i+1} = f_i − h for all i, depending on which of i_2 − i_0 and i_0 − i_1 is larger. Therefore

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} \leq \max\{\pm(h(i_0 - i_2)-h(i_1-i_0))\} =h\lvert i_1 - 2 i_0 + i_2\rvert. \end{aligned} $$
    (5.37)
  • If i_0 > i_1 and i_0 > i_2: again with the observation (5.33), we see that the maximum is attained for f_{i+1} = f_i − h for all i, in which case

    $$\displaystyle \begin{aligned} f_{i_1} - 2 f_{i_0} + f_{i_2} = -(f_{i_0}-f_{i_1})-(f_{i_0}-f_{i_2}) = h (-i_2+2i_0 - i_1) = h|i_1 -2 i_0 + i_2|.\end{aligned} $$
    (5.38)

This shows that (5.35) holds for all vertices in the set \(K_{1D}^{red}\cap \{f|f_1=0\}\), and therefore for all points, which concludes the proof of the remaining inclusion \(K_{1D}^{red} \subseteq K_{1D}\). □
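
Theorem 5.1 can also be checked empirically. The following small script (our own sketch, not part of the original text) verifies on random test vectors that the ℓ³-constraint description (5.27) and the ℓ-constraint description (5.28) accept exactly the same vectors, and that every vector whose increments satisfy (5.29)–(5.31) passes all ℓ³ original constraints:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
ell, h = 8, 0.5

def in_K_full(f, tol=1e-9):
    """Membership test using all ell^3 constraints of (5.27)."""
    return all(f[i1] - 2 * f[i0] + f[i2] <= h * abs(i1 - 2 * i0 + i2) + tol
               for i0, i1, i2 in product(range(ell), repeat=3))

def in_K_red(f, tol=1e-9):
    """Membership test using the ell reduced constraints of (5.28)."""
    return (f[1] - f[0] <= h + tol and f[-1] - f[-2] >= -h - tol
            and bool(np.all(f[:-2] - 2 * f[1:-1] + f[2:] <= tol)))

# arbitrary vectors: the two membership tests must always agree
for _ in range(500):
    f = rng.normal(scale=h, size=ell)
    assert in_K_full(f) == in_K_red(f)

# vectors constructed as in (5.29)-(5.31): increments form a non-increasing
# sequence in [-h, h]; they must satisfy all ell^3 constraints as well
for _ in range(500):
    a = np.sort(rng.uniform(-h, h, size=ell - 1))[::-1]
    f = np.concatenate(([0.0], np.cumsum(a)))
    assert in_K_full(f) and in_K_red(f)

print("reduced and full descriptions of K_1D agree on all test vectors")
```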

Interestingly, in the classical convex relaxation for the (first-order) total variation used in [19, 33], the dual constraint set is of the form

$$\displaystyle \begin{aligned} K_{\ensuremath{\operatorname{TV}},1D} = \bigcap\limits_{1\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: |f_i-f_{i+1}| \leq h\big\rbrace. \end{aligned} $$
(5.39)

As the second family of constraints in (5.28) enforces f_{i+1} − f_i ≤ f_i − f_{i−1}, we obtain

$$\displaystyle \begin{aligned} K_{1D} = K_{\ensuremath{\operatorname{TV}},1D} \cap \bigcap\limits_{2\leq i \leq \ell-1}\big\lbrace f\in\mathbb{R}^\ell: f_{i-1} -2f_i + f_{i+1}\leq 0\big\rbrace. \end{aligned} $$
(5.40)

Thus, when moving from first- to second-order regularization in the proposed way, the only addition is an extra non-positivity constraint on the second derivative of the dual variable f.

So far we have only considered the case of a one-dimensional domain Ω. In order to generalize the construction in (5.25) to d > 1 dimensions, we replace the one-dimensional three-point stencil by the corresponding Laplacian stencil in higher dimensions:

$$\displaystyle \begin{aligned} \phi(p) = \begin{cases} \left\lvert \mu \right\rvert\cdot\left\lvert \sum_{j=1}^d (i_{1,j} - 2 i_0 + i_{2,j}) \right\rvert, &\text{if } p = \mu \cdot \sum_{j=1}^d (e_{i_{1,j}} -2 e_{i_0} + e_{i_{2,j}}), \\ +\infty, & \text{otherwise,} \end{cases} {} \end{aligned} $$
(5.41)

where i_{1,j} and i_{2,j} are the indices of the neighboring points of i_0 in the j-th spatial direction. The convex conjugate can be computed similarly to the one-dimensional case:

$$\displaystyle \begin{aligned} \phi ^*(f) = \delta_K (f) \end{aligned} $$
(5.42)

with the set

$$\displaystyle \begin{aligned} K := \bigcap\limits_{1\leq i_0, i_{1,1}, i_{2,1}, \ldots \leq \ell} \big\lbrace f\in\mathbb{R}^\ell: \sum_{j=1}^d (f_{i_{1,j}} - 2 f_{i_0} + f_{i_{2,j}}) \leq h\left\lvert \sum_{j=1}^d (i_{1,j} - 2 i_0 + i_{2,j}) \right\rvert\big\rbrace. \end{aligned} $$
(5.43)

Putting everything together, the lifted absolute Laplacian regularizer for scalar-valued images on a d-dimensional image domain becomes

$$\displaystyle \begin{aligned} \bar{S}_{AL,s}(\bar{u}) := \int_\varOmega \sup_{f \in K}\langle \varDelta \bar{u} (x), f\rangle \,dx.{} \end{aligned} $$
(5.44)

In order to approximate the absolute Laplacian for lifted vector-valued functions u = (u_1, …, u_n), we apply (5.44) to the marginal distributions \(\bar {u}^{(k)}(x) := \varPi _k \bar {u}(x) \in \varDelta _{l_k}\) separately in each component k ∈ {1, …, n}, where

$$\displaystyle \begin{aligned} \varPi_k := \underbrace{(1,\dots,1)}_{l_1 \cdots \, l_{k-1} \text{ ones}}\otimes\;\mathrm{Id}_{l_k}\otimes\underbrace{(1,\dots,1)}_{l_{k+1} \cdots \, l_{n} \text{ ones}}\in\mathbb{R}^{l_k\times\ell} \end{aligned} $$
(5.45)

computes the k-th marginal distribution by summing the entries of \(\bar {u}\) over all dimensions of the range except the k-th one. As the absolute Laplacian regularizer decouples across the components of u, it can be approximated by applying the scalar lifted regularizer (5.44) to each marginalized label distribution and summing over the range dimensions:

$$\displaystyle \begin{aligned} \bar{S}_{AL}(\bar{u}) := \sum_{i=1}^n \bar{S}_{AL,s}(\varPi_i\bar{u}) =\sum_{i=1}^n\int_\varOmega \sup_{f^i \in K_{l_i}}\langle \varDelta \varPi_i \bar{u} (x), f^i(x)\rangle \,dx. \end{aligned} $$
(5.46)

Here \(K_{l_i}\subseteq \mathbb {R}^{l_i}\) denotes a set of the form (5.43) in l_i-dimensional space, which accounts for the fact that there may be a different number of labels in each dimension of the range.
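
For illustration, the marginalization matrices Π_k can be assembled directly from (5.45) with Kronecker products. The following sketch (our own, with 0-based k and assuming the lifted vector is flattened in C order, i.e. the last range dimension varies fastest; these conventions may differ from the actual implementation) also checks that the lifted unit vector of a label multi-index is mapped to the corresponding unit vectors of its components:

```python
import numpy as np

def marginalization_matrix(l, k):
    """Pi_k from (5.45): maps a lifted vector on an n-dimensional label grid
    with l = (l_1, ..., l_n) labels per dimension to its k-th marginal."""
    before = int(np.prod(l[:k]))      # l_1 * ... * l_{k-1} (1 if k == 0)
    after = int(np.prod(l[k + 1:]))   # l_{k+1} * ... * l_n  (1 if k == n-1)
    return np.kron(np.ones((1, before)),
                   np.kron(np.eye(l[k]), np.ones((1, after))))

# sanity check on a 5 x 7 label grid: the lifted unit vector of the label
# with multi-index (2, 4) marginalizes to e_2 and e_4, respectively
l = (5, 7)
ubar = np.zeros(np.prod(l))
ubar[np.ravel_multi_index((2, 4), l)] = 1.0
for k in range(len(l)):
    print(marginalization_matrix(l, k) @ ubar)
```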

After discretizing the image domain \(\varOmega \subseteq \mathbb {R}^d\) on a d-dimensional Cartesian grid Ω′, the full discretized problem can be formulated in saddle point form:

$$\displaystyle \begin{aligned} \inf\limits_{\bar{u}:\varOmega'\rightarrow\varDelta_{\ell}}\sup_{f^i:\varOmega'\rightarrow K_{l_i},\, i=1,\ldots,n} \sum_{x\in\varOmega'}\bar{\rho}^{}(x, \bar{u}(x)) + \lambda \sum_{x\in\varOmega'} \sum_{i=1}^n\langle \varDelta\varPi_i\bar{u}(x),f^i(x) \rangle. \end{aligned} $$
(5.47)

This problem can be readily solved using any available primal-dual method for non-smooth convex optimization.
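
For reference, writing (5.47) abstractly as \(\min_{\bar u} \max_{f} \langle D\bar u, f\rangle + G(\bar u) - F^*(f)\), with D the linear operator collecting the maps \(\varDelta\varPi_i\), G comprising the lifted data term together with the simplex constraints, and F^* the indicator of the dual constraint sets, the basic (unpreconditioned) primal-dual iteration takes the standard form

$$\displaystyle \begin{aligned} f^{k+1} = \mathrm{proj}_{\mathcal{K}}\big(f^k + \sigma D \hat{u}^k\big), \qquad \bar{u}^{k+1} = \mathrm{prox}_{\tau G}\big(\bar{u}^k - \tau D^\top f^{k+1}\big), \qquad \hat{u}^{k+1} = 2\bar{u}^{k+1} - \bar{u}^k, \end{aligned} $$

where \(\mathcal{K}\) denotes the product of the dual constraint sets and σ, τ > 0 are step sizes, chosen, e.g., by the diagonal preconditioning of [5]. This is only the generic scheme; the notation D, G, \(\mathcal{K}\) is ours and not used elsewhere in this chapter.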

We are not yet aware of a result similar to Theorem 5.1 that would reduce the number of constraints for the sets \(K_{l_i}\) in the same way as for K_1D. Therefore, we take a pragmatic approach: we approximate each of the sets \(K_{l_i}\) by the set K_1D in the corresponding dimension, which amounts to an outer approximation of \(K_{l_i}\). We can then apply Theorem 5.1 and solve the problem using the reduced number of constraints.

Experimental Results

We evaluate the proposed strategy for higher-order relaxation of non-convex problems on two applications. Firstly, we consider a non-convex denoising problem, using the Matlab extension CVX [12, 13] to solve the primal formulation of the saddle-point problem (5.47) on an Intel Core i7-4500U CPU with 8 GB of RAM.

Secondly, we examine a real-world image registration problem, using a CUDA 7.5.17 implementation of a first-order primal-dual algorithm with diagonal preconditioning [5], which runs on an Nvidia GeForce GTX 680 GPU with an Intel Core i7 960 CPU and 24 GB of RAM. The implementation uses a more recent “sublabel-accurate” approach for lifting the data term in order to reduce the required label resolution for the data term [18, 28].

Non-convex Denoising with Second-Order Regularity

In order to illustrate that non-convexity can be beneficial when combined with second-order regularization, we consider the simple one-dimensional denoising problem

$$\displaystyle \begin{aligned} \inf\limits_{u:\varOmega\rightarrow\mathbb{R}} \int_\varOmega |u(x)-g(x)|{}^q \,dx + \lambda \int_\varOmega |u''(x)| d x{} \end{aligned} $$
(5.48)

with \(\varOmega \subseteq \mathbb {R}\). For q = 1, one obtains a simple convex \(\ensuremath {\operatorname {TV}}^2-L^1\) denoising model, while for q < 1, the energy is generally non-convex. We used the proposed method to approximate a global solution of (5.48).
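
The experiments below were carried out with CVX in Matlab; purely as an illustration of how the discretized model can be assembled, the following Python/CVXPY sketch (our own reconstruction under the conventions of this chapter, not the authors' code) builds a small instance of the lifted problem for (5.48). The support function of K_1D in the regularizer is expressed through its LP dual, using the reduced constraints of Theorem 5.1, so that the whole problem becomes a single linear program in the lifted variable.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, ell, q, lam = 60, 17, 0.5, 0.1          # grid points, labels, exponent, weight
t = np.linspace(0.0, 1.0, ell)             # labels t_1, ..., t_ell
h = t[1] - t[0]                            # label spacing

# smooth signal with salt-and-pepper outliers (illustrative, not the paper's data)
x = np.linspace(0.0, 1.0, N)
g = 0.5 + 0.4 * np.sin(2 * np.pi * x)
out = rng.random(N) < 0.4
g[out] = rng.integers(0, 2, out.sum()).astype(float)

# sampled non-convex data term rho[j, i] = |t_i - g_j|^q; on the simplex the
# lifted data term is simply linear in ubar
rho = np.abs(t[None, :] - g[:, None]) ** q

# reduced description of K_1D from Theorem 5.1: A f <= b
A, b = np.zeros((ell, ell)), np.zeros(ell)
A[0, 0], A[0, 1], b[0] = -1.0, 1.0, h            # f_2 - f_1 <= h
A[1, -2], A[1, -1], b[1] = 1.0, -1.0, h          # f_{l-1} - f_l <= h
for r, i in enumerate(range(1, ell - 1), start=2):
    A[r, i - 1], A[r, i], A[r, i + 1] = 1.0, -2.0, 1.0   # concavity constraints

ubar = cp.Variable((N, ell), nonneg=True)        # lifted variable, rows on the simplex
y = cp.Variable((N - 2, ell), nonneg=True)       # LP multipliers for the support function

constraints = [cp.sum(ubar, axis=1) == 1]
reg = []
for j in range(1, N - 1):
    z = ubar[j - 1, :] - 2 * ubar[j, :] + ubar[j + 1, :]  # spatial second difference
    constraints.append(A.T @ y[j - 1, :] == z)            # sigma_K(z) = min b^T y
    reg.append(b @ y[j - 1, :])

objective = cp.sum(cp.multiply(ubar, rho)) + lam * sum(reg)
cp.Problem(cp.Minimize(objective), constraints).solve()

u = ubar.value @ t                               # recover u via (5.14)
print("lifted energy:", objective.value)
```

The recovered u is piecewise linear up to the label resolution; decreasing q below 1 only changes the precomputed array rho, while the convex structure of the lifted problem is unaffected.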

The method was applied to a smooth input signal g distorted by heavy salt-and-pepper noise, with 80% of the values randomly set to 0 or 1. The locations of the outliers were unknown to the solver, and no additional preprocessing or outlier masking was performed.

As can be seen from Figs. 5.1 and 5.2, combining higher-order regularization with a non-convex data term allows to reconstruct the signal more faithfully. While both approaches prefer piecewise linear results, as expected from the function-space formulation, in the convex approach with q = 1 noise from the input is carried over into the output before the signal structure becomes fully visible.

Fig. 5.1
figure 1

Classical convex second-order (\(\ensuremath {\operatorname {TV}}^2-L^1\)) denoising of a smooth signal corrupted by 80% blind salt-and-pepper noise using varying regularization strength λ. The result is piecewise affine, as expected from \(\operatorname {TV}^2\) regularization. Starting from large λ with heavy over-regularization and decreasing λ, noise is picked up early. There is no regime in which the noise is removed and the signal is reconstructed faithfully at the same time

Fig. 5.2
figure 2

Non-convex second-order (\(\ensuremath {\operatorname {TV}}^2-L^q\) with q = 0.1) denoising of the signal in Fig. 5.1 using the proposed convex lifting and approximation with varying regularization strength λ. The proposed approach allows to approximate global minimizers of such higher-order regularized non-convex models. The additional non-convexity achieves a better reconstruction of the signal (top right) than in the convex case (Fig. 5.1) before giving in to noise (bottom left)

While convex methods relying on L^1 data terms are often—rightfully—referred to as “robust” methods in comparison to methods using smooth or quadratic data terms, the non-convex approach with q = 0.1 is even more robust against outliers and returns a decent reconstruction for a range of λ on this challenging problem. Run times were on the order of 0.3 s for a discretization of Ω using 120 grid points and ℓ = 63 labels.

Image Registration Using the Absolute Laplacian

For a more challenging application, we apply the method to the image registration problem with SSD data term (5.6) and absolute Laplacian regularization.

Translation-Only Synthetic Image

We first apply the absolute Laplacian regularizer to a synthetic binary image registration problem. The input reference image R is a binary 64 × 64 image of two vertical boxes. The template image T is obtained by translating the input image by 12 pixels (Fig. 5.3, first row). Thus the ground truth is a uniform translation by 12 pixels and constitutes a global minimizer, as it has a vanishing data term and the second-order regularizer does not penalize linear deformations. This configuration is challenging for methods based on local optimization, as there is a strong local minimum. Furthermore, as the images contain large constant regions, the energy landscape has extensive flat regions with zero gradient.

Fig. 5.3
figure 3

Application of the proposed lifting for absolute Laplacian regularization to a synthetic image registration problem. A traditional curvature-regularized model solved using a local Gauss-Newton method serves as a baseline. The input reference image R (top left) and template image T (top right) differ by a ground truth translation of 12 pixels. The second and third row show the final difference images \(\frac {1}{2}(R(x) - T(x+u(x)))^2\) (left) and obtained deformation u visualized as a deformation grid (right). The classical local optimization method (second row) converges to a local solution which is not globally optimal and yields a non-constant deformation with a mean displacement of 2.3 pixels. Using the proposed functional lifting for absolute Laplacian regularization (bottom row), the global optimum is retrieved accurately with an average displacement of 12.0002 pixels

We compare our approach to a traditional curvature-regularized model solved using a single-resolution local minimization method implemented in the Matlab extension FAIR [24, 25]. The regularization strength was manually set to λ = 10; however, a wide range of values for λ produced the same qualitative behavior. The traditional approach leads to a solution that is not globally optimal (Fig. 5.3, second row). Using our approach, we retrieve the globally optimal ground truth with ℓ = 9 labels in the label space Γ = [−12, 12]² and a run time of 85 s, without having to resort to approaches such as coarse-to-fine or affine pre-registration for initialization (Fig. 5.3, bottom row).

Real-World Image Registration

As a real-world example, we employ the SSD energy with absolute Laplacian regularization to solve the image registration problem on a pair of X-ray images, and compare to the existing lifting approach [18] with total variation regularization. The regularization strength was manually set to λ = 0.05. Run times were 933 s for total variation and 515 s for absolute Laplacian minimization.

As can be seen from the numerical results (Fig. 5.4), while the first-order total variation regularization achieves a very good data fit, it results in a physically implausible, self-intersecting deformation grid (Fig. 5.4, second row). This behavior can partly be attributed to the well-known fact that total variation promotes piecewise constant solutions, commonly referred to as the staircasing effect [8]. In the context of medical image registration, this is highly undesirable, as jumps in the deformation map u correspond to infinite stretch or compression and often lead to self-intersections. In contrast, the proposed second-order regularizer (Fig. 5.4, bottom row) maintains a physically meaningful deformation while still achieving an acceptable data fit.

Fig. 5.4
figure 4

Comparison between two global optimization methods for medical image registration: classical first-order total variation regularization [18] and the proposed second-order lifting approach. The input data consists of a pair of 128 × 128 grayscale X-ray images of two right hands (top row). Both approaches are evaluated using ℓ = 10² = 100 labels, Γ = [−12, 12]², Ω = [0, 128]², and a regularization strength of λ = 0.05. The classical first-order total variation regularization generates piecewise constant deformations and a physically implausible self-intersecting deformation grid (second row). The second-order regularizer avoids discontinuities and maintains a physically meaningful deformation grid (bottom row)

Conclusion and Outlook

In this work, we have taken a first step towards extending the convex relaxation and functional lifting framework to second-order regularization. We showed how to solve the main issue of an exploding number of constraints for the absolute Laplacian regularization.

Experiments on a denoising problem showed that the combination of higher-order regularization and non-convex data terms can lead to better results than a convex model, and allows to recover highly corrupted data in a piecewise linear fashion. In the application of image registration, the absolute Laplacian faithfully retrieves simple translations and leads to a more realistic deformation grid than total variation regularization on a real-world problem.

While our relaxation allows to reduce the number of required constraints to linear complexity, it is an approximation, rather than a “tight” relaxation in the sense of an exact biconjugate, and the proof is still limited to one dimension. An open question is whether one can find a similar compact representation for the tight relaxation in more than one dimension.

Finally, in this work we have constrained ourselves to the discretized setting. A functional-analytic discussion as well as an extension to the more recent manifold-valued and sublabel-accurate relaxations remain subject of future work.