1 Introduction

Nonlinear variational methods have provided very powerful tools for the design and analysis of image processing and computer vision algorithms in recent decades. In parallel, methods based on harmonic analysis, dictionary learning, and sparse representations, as well as spectral analysis of linear operators (such as the graph Laplacian), have led to tremendous advances in processing highly complex signals such as natural images, 3D data, and speech. Recent studies suggest that variational methods can also be analyzed and understood through a nonlinear generalization of eigenvalue analysis, referred to as nonlinear spectral methods. Most of the current knowledge is focused on one-homogeneous functionals, which will be the focus of this paper.

The motivation and interpretation of classical linear filtering strategies is closely linked to the eigendecomposition of linear operators. In this manuscript, we will show that one can define a nonlinear spectral decomposition framework based on the gradient flow with respect to arbitrary convex one-homogeneous functionals and obtain a remarkable number of analogies to linear filtering techniques. To closely link the proposed nonlinear spectral decomposition framework to the linear one, let us summarize earlier studies concerning the use of nonlinear eigenfunctions in the context of variational methods.

One notion which is very important is the concept of nonlinear eigenfunctions induced by convex functionals. Throughout this paper, we will consider convex functionals J either in a function space setting \(J: {\mathcal {X}} \rightarrow {\mathbb {R}}\) for a Banach space \({\mathcal {X}}\) embedded into \(L^2\) or, in a discrete setting, as a function \(J: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\). We will specify the considered setting at those points where it is crucial. In either setting, we refer to u as an eigenfunction of J if it satisfies the following eigenvalue problem

$$\begin{aligned} \lambda u \in \partial J(u), \ \ \text { and } \Vert u\Vert _2 = 1, \end{aligned}$$
(1)

where \(\lambda \in {\mathbb {R}}\) is the corresponding eigenvalue, and \(\partial J(u) = \left\{ p\in {\mathcal {X}}^* ~|~ J(v) - J(u) - \langle p, v-u \rangle \ge 0 \ \ \forall v \in {\mathcal {X}}\right\} \) denotes the subdifferential of J at u.

The analysis of eigenfunctions related to nonquadratic convex functionals has mainly been concerned with total variation (TV) regularization, given by

$$\begin{aligned} J_\mathrm{TV}(u) = \sup _{ \begin{array}{c} {z \in C_c^\infty }, \\ {\Vert z\Vert _{L^\infty ({\varOmega })}\le 1} \end{array}}\int _{\varOmega } \, u(x) \, {{\mathrm{div}}}z(x) ~\mathrm{d}x, \end{aligned}$$

on a domain \({\varOmega }\), which for smooth functions u reduces to \(J_\mathrm{TV}(u) = \int _{\varOmega } |\nabla u(x)| ~\mathrm{d}x\).

In the analysis of variational TV denoising, i.e., the ROF model from [51], Meyer [41] derived an explicit solution for the case of a disk (an eigenfunction of TV), quantifying explicitly the loss of contrast and advocating the use of \(TV-G\) regularization. Within the extensive studies of the TV flow [1, 2, 6, 7, 15, 29, 52], eigenfunctions of TV (referred to as calibrable sets) were analyzed, and explicit solutions were given for several spatial configurations of eigenfunctions. In [17] an explicit solution of a disk for the inverse scale space flow is presented, showing its instantaneous appearance at a precise time point related to its radius and height.

Fig. 1

Example of valid and nonvalid shapes as nonlinear eigenfunctions with respect to the TV functional. Sufficiently smooth convex sets satisfying condition (2) (marked with a green check) are solutions of the eigenvalue problem (1). Shapes that have too high curvature, are not convex, or are too elongated (marked with a red X) are not valid

A highly notable contribution in [7] is the precise geometric characterization of convex sets which are eigenfunctions. Let \(\chi _C\) be the characteristic function of a set \( C \subset {\mathbb {R}}^2\); then it satisfies (1), with J the TV functional \(J_\mathrm{TV}\), if

$$\begin{aligned} \text {ess} \sup _{q \in \partial C}\kappa (q) \le \frac{P(C)}{|C|}, \end{aligned}$$
(2)

where C is convex, \(\partial C \in C^{1,1}\), P(C) is the perimeter of C, |C| is the area of C, and \(\kappa \) is the curvature. In this case, the eigenvalue is \(\lambda = \frac{P(C)}{|C|}\). See Fig. 1 for some examples. [7, Theorem 6] furthermore classified all possible eigenfunctions arising from the indicator function of a set. Such eigenfunctions must arise from a union of sets C with the above properties, all having the same eigenvalue \(\frac{P(C)}{|C|}\) and additionally being sufficiently far apart.
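For instance, for a disk C of radius r, the quantities in (2) can be computed explicitly:

$$\begin{aligned} \kappa \equiv \frac{1}{r}, \qquad \frac{P(C)}{|C|} = \frac{2\pi r}{\pi r^2} = \frac{2}{r}, \end{aligned}$$

so condition (2) reads \(\frac{1}{r} \le \frac{2}{r}\) and is always satisfied; the disk is thus an eigenfunction with \(\lambda = \frac{2}{r}\), consistent with Meyer's explicit solution [41].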

In [9, 44, 46, 48], eigenfunctions related to the total generalized variation (TGV) [11] and the infimal convolution total variation functional [20] are analyzed in the case of a bounded domain, and the different reconstruction properties of these regularizers on particular eigenfunctions of the TGV are demonstrated theoretically as well as numerically.

Examples of certain eigenfunctions for different extensions of the TV to color images are given in [26].

In [52], Steidl et al. have shown the close relations, and the equivalence in a 1D discrete setting, of Haar wavelets to both TV regularization and the TV flow. This was later developed for a 2D setting in [57]. The connection between Haar wavelets and TV methods in 1D was made more precise in [10], where it was shown that the Haar-wavelet basis is an orthogonal basis of eigenfunctions of the total variation (with an appropriate definition at the domain boundary); in this case, the Rayleigh principle holds for the whole basis. In the field of morphological signal processing, nonlinear transforms were introduced in [25, 40].

Fig. 2

Overview of the spectral framework

2 Spectral Representations

2.1 Scale Space Representation

We will use the scale space evolution, which is the most straightforward case, as the canonical setting of a spectral representation. Consider a convex (absolutely) one-homogeneous functional J, i.e., a functional for which \(J(\lambda u) = |\lambda | J(u)\) holds for all \(\lambda \in {\mathbb {R}}\).

The scale space or gradient flow is

$$\begin{aligned} \partial _t u(t) = -p(t), \ \ \ p(t) \in \partial J(u(t)), \ u(0)=f. \end{aligned}$$
(3)

We refer the reader to [3, 56] for an overview of scale space techniques.

The first work toward defining a nonlinear spectral decomposition was based on the gradient flow formulation (3) and was conducted in [30, 31] for the case of J being the TV regularization. In our conference paper [14], we extended this notion to general one-homogeneous functionals by observing that the solution of the gradient flow can be computed explicitly for any one-homogeneous J in the case of f being an eigenfunction. For \(\lambda f \in \partial J(f)\), the solution to (3) is given by

$$\begin{aligned} u(t) = \left\{ \begin{array}{ll} (1 - t \lambda ) f &{} \quad \text { for } t\le \frac{1}{\lambda }, \\ 0 &{} \quad \text { else. } \end{array} \right. \end{aligned}$$
(4)

Note that in linear spectral transformations such as Fourier or wavelet-based approaches, the input data being an eigenfunction leads to the energy of the spectral representation being concentrated at a single wavelength. To preserve this behavior for nonlinear spectral representations, the wavelength decomposition of the input data f is defined by

$$\begin{aligned} \phi (t) = t \partial _{tt} u(t). \end{aligned}$$
(5)

Note that due to the piecewise linear behavior in (4), the wavelength representation of an eigenfunction f becomes \(\phi (t) = \delta \left( t-\frac{1}{\lambda }\right) f\), where \(\delta \) denotes a Dirac delta distribution. The name wavelength decomposition is natural because for \(\lambda f \in \partial J(f)\), \(\Vert f\Vert =1\), one readily shows that \(\lambda = J(f)\), which means that the eigenvalue \(\lambda \) corresponds to a generalized frequency. In analogy to the linear case, the inverse relation of a peak in \(\phi \) appearing at \(t = \frac{1}{\lambda }\) motivates the interpretation as a wavelength, as discussed in more detail in the following section.
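Indeed, differentiating (4) in a distributional sense gives \(\partial _t u(t) = -\lambda f\) for \(t < \frac{1}{\lambda }\) and \(\partial _t u(t) = 0\) afterward, so that

$$\begin{aligned} \partial _{tt} u(t) = \lambda f \, \delta \left( t-\frac{1}{\lambda }\right) , \qquad \phi (t) = t\, \partial _{tt} u(t) = \delta \left( t-\frac{1}{\lambda }\right) f, \end{aligned}$$

using \(t\,\delta \left( t-\frac{1}{\lambda }\right) = \frac{1}{\lambda }\delta \left( t-\frac{1}{\lambda }\right) \).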

For arbitrary input data f, one can reconstruct the original image by

$$\begin{aligned} f(x) = \int _0^\infty \phi (t;x) \mathrm{d}t. \end{aligned}$$
(6)

Given a transfer function \(H(t)\in {\mathbb {R}}\), image filtering can be performed by

$$\begin{aligned} f^H(x) := \int _0^\infty H(t)\phi (t;x) \mathrm{d}t. \end{aligned}$$
(7)

Figure 2 gives an overview of the spectral decomposition framework including two examples of applying transfer or filter functions for texture enhancement and low pass filtering. Some of the terms and notations (such as low-pass filter (LPF) and spectrum S) are explained hereafter in Sect. 3.

We would like to point out that Eq. (5) is an informal definition of the wavelength decomposition, which is supposed to illustrate the general idea of nonlinear spectral decompositions. We refer the reader to our more theoretical work [16] in which we give sense to (5) for one-homogeneous J in the spatially discrete setting as elements in \(\left( W^{1,1}_\mathrm{loc}({\mathbb {R}}^+, {\mathbb {R}}^n)\right) ^*\). The discrete case also makes it possible to show a sufficiently rapid decrease of \(\partial _t u(t)\) for the reconstruction (6) to hold for all absolutely one-homogeneous regularization functions J. For the sake of simplicity and readability, we will skip the technical details of the underlying approach in most parts of this paper.

3 Signal Processing Analogy

Until very recently, nonlinear filtering approaches such as (3) or related variational methods have been treated independently of the classical linear point of view of changing the representation of the input data, filtering the resulting representation, and inverting the transform. In [30, 31] it was proposed to use (3) in the case of J being the TV to define a TV spectral representation of images that allows one to extend the idea of filtering approaches from the linear to the nonlinear case. This was later generalized to one-homogeneous functionals in [14].

Classical Fourier filtering has some very convenient properties for analyzing and processing signals:

1. Processing is performed in the transform (frequency) domain by simple attenuation or amplification of desired frequencies.

2. A straightforward way to visualize the frequency activity of a signal is through its spectrum plot. The spectral energy is preserved in the frequency domain through the well-known Parseval identity.

3. The spectral decomposition corresponds to the coefficients representing the input signal in a new orthonormal basis.

4. Both the transform and the inverse transform are linear operations.

In the nonlinear setting, the first two characteristics are mostly preserved. Orthogonality could so far only be obtained for the particular (discrete) case of \(J(u)=\Vert Vu\Vert _1\) with \(VV^*\) being diagonally dominant [16]. Further orthogonality statements are still an open issue. Finally, linearity is certainly lost. In essence, we obtain a nonlinear forward transform and a linear inverse transform. Thus, following the nonlinear decomposition, filtering can be performed easily.

In addition, we gain edge-preservation and new scale features, which, unlike sines and cosines, are data-driven and are therefore highly adapted to the image. Thus the filtering has far less tendency to create oscillations and artifacts.

Let us first derive the relation between Fourier and the eigenvalue problem (1). For

$$\begin{aligned} J(u)=\frac{1}{2}\int |\nabla u(x)|^2 \mathrm{d}x, \end{aligned}$$

we get \(-{\Delta } u \in \partial J(u)\). Thus, with appropriate boundary conditions, the sine waves arising from Fourier analysis are eigenfunctions, in the sense of (1), where, for a frequency \(\omega \) of the sine waves, we have the relation \(\lambda = \omega ^2\). Other convex regularizing functionals, such as TV and TGV, can therefore be viewed as natural nonlinear generalizations.
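For instance, \(u(x) = \sin (\omega x)\) satisfies

$$\begin{aligned} -{\Delta } u(x) = \omega ^2 \sin (\omega x) = \omega ^2 u(x), \end{aligned}$$

i.e., (1) holds (up to normalization) with \(\lambda = \omega ^2\).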

A fundamental concept in linear filtering is that of ideal filters [49]. Such filters either fully retain or completely suppress frequencies within some range. In a linear time (space) invariant system, a filter is fully determined by its transfer function \(H(\omega )\). The filtered response of a signal f(x), with Fourier transform \(F(\omega )\), filtered by a filter H is

$$\begin{aligned} f^H(x) := {\mathcal {F}}^{-1}\Big (H(\omega )\cdot F(\omega )\Big ), \end{aligned}$$

with \({\mathcal {F}}^{-1}\) the inverse Fourier transform. For example, an ideal LPF retains all frequencies up to some cutoff frequency \(\omega _c\). Its transfer function is thus

$$\begin{aligned} H(\omega ) = \left\{ \begin{array}{cl} 1 &{} \text { for } 0 \le \omega \le \omega _c, \\ 0 &{} \text { else. } \end{array} \right. \end{aligned}$$

Viewing frequencies as eigenvalues in the sense of (1), one can define generalizations of these notions.

3.1 Nonlinear Ideal Filters

The (ideal) LPF can be defined by Eq. (7) with \(H(t)=1\) for \(t \ge t_c\) and 0 otherwise, or

$$\begin{aligned} LPF_{t_c}(f) := \int _{t_c}^\infty \phi (t;x) \mathrm{d}t. \end{aligned}$$
(8)

Its complement, the (ideal) high-pass filter (HPF), is defined by

$$\begin{aligned} HPF_{t_c}(f) := \int _{0}^{t_c} \phi (t;x) \mathrm{d}t. \end{aligned}$$
(9)

Similarly, band-(pass/stop)-filters are filters with low and high cutoff scale parameters (\(t_1 < t_2\))

$$\begin{aligned}&BPF_{t_1,t_2}(f) := \int _{t_1}^{t_2} \phi (t;x) \mathrm{d}t,\end{aligned}$$
(10)
$$\begin{aligned}&BSF_{t_1,t_2}(f) := \int _0^{t_1}\phi (t;x) \mathrm{d}t + \int _{t_2}^\infty \phi (t;x) \mathrm{d}t. \end{aligned}$$
(11)

For f being a single eigenfunction with eigenvalue \(\lambda _0 = \frac{1}{t_0}\), the above definitions coincide with the linear definitions, where the eigenfunction is either completely preserved or completely diminished, depending on the cutoff eigenvalue(s) of the filter.

Again, the above point of view is rather illustrative than mathematically precise. As mentioned above, the discrete case allows to identify \(\phi \) with elements of \(\left( W^{1,1}_\mathrm{loc}\right) ^*\), such that only mollified versions of the above ideal filters are permissible; see [16] for details.

Fig. 3

Total variation spectral filtering example. The input image (top left) is decomposed into its \(\phi (t)\) components; the corresponding spectrum \(S_1(t)\), Eq. (12), is shown on the top right. Integration of the \(\phi \)'s over the t domains 1, 2, and 3 (top right) yields high-pass, band-pass, and low-pass filters, respectively (chosen manually here). The band-stop filter (bottom right) corresponds to the complement of integration domain 2. Taken from [14]

In Figs. 3 and 4, an example of spectral TV processing is shown with the response of the four filters defined above in Eqs. (8) through (11).

Fig. 4

A classical single peak example of a TV eigenfunction in \({\mathbb {R}}\), \(u(x)=B_w(x)\)

3.2 Spectral Response

As in the linear case, it is very useful to measure in some sense the “activity” at each frequency (scale). This can help identify dominant scales and better design filtering strategies (either manually or automatically). Moreover, one can obtain a notion of the type of energy which is preserved in the new representation using some analog of Parseval's identity.

In [30, 31], an \(L^1\)-type spectrum was suggested for the TV spectral framework (without trying to relate to a Parseval rule),

$$\begin{aligned} S_1(t) := \Vert \phi (t;x)\Vert _{L^1({\varOmega })} = \int _{\varOmega } |\phi (t;x)|\mathrm{d}x. \end{aligned}$$
(12)
Table 1 Some properties of the TV transform in \({\mathbb {R}}^n\), compared to the Fourier transform

In [14], the following definition was suggested,

$$\begin{aligned} S^2_2(t) = t^2 \frac{d^2}{\mathrm{d}t^2}J\left( u(t)\right) = \left\langle \phi (t), 2 t p(t) \right\rangle . \end{aligned}$$
(13)

With this definition the following analog of the Parseval identity was shown:

$$\begin{aligned} \Vert f \Vert ^2= & {} - \int _0^\infty \frac{d}{\mathrm{d}t} \Vert u(t) \Vert ^2 ~\mathrm{d}t \nonumber \\= & {} 2 \int _0^\infty \left\langle p(t), u(t) \right\rangle ~\mathrm{d}t \nonumber \\= & {} 2 \int _0^\infty J(u(t))~\mathrm{d}t \nonumber \\= & {} \int _0^\infty S^2_2(t) ~\mathrm{d}t. \end{aligned}$$
(14)

In [16] a third definition of the spectrum was suggested (which is simpler and admits a Parseval identity),

$$\begin{aligned} S^2_3(t) = \left\langle \phi (t), f \right\rangle . \end{aligned}$$
(15)

It can be shown that a similar Parseval-type equality holds here

$$\begin{aligned} \Vert f \Vert ^2= & {} \langle f,f \rangle = \Big \langle \int _0^\infty \phi (t)\mathrm{d}t,f \Big \rangle = \int _0^\infty \langle \phi (t),f \rangle \mathrm{d}t \nonumber \\= & {} \int _0^\infty S^2_3(t) ~\mathrm{d}t. \end{aligned}$$
(16)

While (12), (13), and (15) yield the intuition of possible definitions of the spectral decomposition, care has to be taken to obtain mathematically precise definitions. In the discrete case, we have \(\phi \in \left( W^{1,1}_\mathrm{loc}({\mathbb {R}}^+, {\mathbb {R}}^n)\right) ^*\), such that \(S_2^2\) and \(S_3^2\) have to be interpreted as elements in \(\left( W^{1,1}_\mathrm{loc}({\mathbb {R}}^+, {\mathbb {R}})\right) ^*\). In fact, in [16], we showed these two definitions to be equivalent for sufficiently regular subdifferentials of J.

As an overview example, Table 1 summarizes the analogies between the Fourier and TV-based spectral transformations. There remain many open questions regarding further generalizations of Fourier-specific results, such as the convolution theorem, the Fourier duality property (where the transform and inverse transform can be interchanged up to a sign), the existence of a phase, and more.

4 Decomposition into Eigenfunctions

Let us consider the discrete case of J being a semi-norm on \({\mathbb {R}}^n\). As discussed in Sect. 3 (item 3), one of the fundamental interpretations of linear spectral decompositions arises from it being the coefficients for representing the input data in a new basis. This basis is composed of the eigenvectors of the transform. For example, in the case of the cosine transform, the input signal is represented by a linear combination of cosines with increasing frequencies. The corresponding spectral decomposition consists of the coefficients in this representation, which therefore admits an immediate interpretation.

Although the proposed nonlinear spectral decomposition does not immediately correspond to a change of basis anymore, it is interesting to see that the property of representing the input data as a linear combination of (generalized) eigenfunctions can be preserved: In [14], we showed that for the case of \(J(u) = \Vert V u\Vert _1\) and V being any orthonormal matrix, the solution of the scale space flow (3) meets

$$\begin{aligned} Vu(t) = \text {sign}(\zeta ) \; \max (|\zeta | - t, 0), \end{aligned}$$
(17)

where \(\zeta = Vf\) are the coefficients for representing f in the orthonormal basis of V. It is interesting to see that the subgradient p(t) in (3) admits a componentwise representation of Vp(t) as

$$\begin{aligned} (Vp)_i(t) = \left\{ \begin{array}{ll} \text {sign}(\zeta _i) &{} \text { if } |\zeta _i| \ge t, \\ 0 &{} \text { else.} \end{array} \right. \end{aligned}$$
(18)

The latter shows that p(t) can be represented by \(V^T q(t)\) for some q(t) which actually meets \(q(t) \in \partial \Vert Vp(t)\Vert _1\). In a single equation, this means \(p(t) \in \partial J(p(t))\) and shows that the p(t) arising from the scale space flow are eigenfunctions (up to normalization). Integrating the scale space flow Eq. (3) from zero to infinity and using that there are only finitely many times at which p(t) changes, one can see that one can indeed represent f as a linear combination of eigenfunctions. Our spectral decomposition \(\phi (t)\) then corresponds to the change of the eigenfunctions during the (piecewise) dynamics of the flow.
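As a minimal numerical illustration of (17) and (18) (a sketch in the discrete setting; the random orthogonal V, the seed, and the test time are arbitrary choices, and this is not code from the referenced works):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Random orthogonal matrix V; its rows form an orthonormal basis of R^n.
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
f = rng.standard_normal(n)
zeta = V @ f                    # coefficients of f in the basis given by V

def u_flow(t):
    # Closed-form gradient-flow solution (17): componentwise soft shrinkage.
    return V.T @ (np.sign(zeta) * np.maximum(np.abs(zeta) - t, 0.0))

def p_flow(t):
    # Subgradient (18): sign pattern of the not-yet-vanished coefficients.
    return V.T @ np.where(np.abs(zeta) > t, np.sign(zeta), 0.0)

# u(0) = f, and u vanishes once t exceeds the largest coefficient magnitude.
assert np.allclose(u_flow(0.0), f)
assert np.allclose(u_flow(np.abs(zeta).max() + 1e-9), 0.0)

# Away from the finitely many kinks t = |zeta_i|, u satisfies du/dt = -p.
t, h = 0.3, 1e-6
du = (u_flow(t + h) - u_flow(t - h)) / (2.0 * h)
assert np.allclose(du, -p_flow(t), atol=1e-5)
print("closed-form solution (17)/(18) verified numerically")
```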

While the case of \(J(u) = \Vert Vu\Vert _1\) for an orthogonal matrix V is quite specific because it essentially recovers the linear spectral analysis exactly, we have shown in [16] that the property of finitely many times at which p(t) changes can be proved for any polyhedral one-homogeneous regularization. If, in addition, the subdifferentials are sufficiently regular, e.g., as in the case of \(J(u)=\Vert Vu\Vert _1\) and \(VV^*\) being diagonally dominant, then the property of p(t) being an eigenfunction for all times t remains true. We refer the reader to [16] for more details. The analysis of such properties for more general one-homogeneous regularizations, as well as the extension of these results to a function space setting, remains an open question.

5 Explicit TV Eigenfunctions in 1D

Let us consider the function space setting and give an analytic expression for a large set of eigenfunctions of the TV in one dimension with the signal domain being the entire real line. We will later see that Haar wavelets are a small subset of these; hence, eigenfunctions are expected to represent signals more concisely, with far fewer elements.

We give the presentation below in a somewhat informal manner, with the goal of explaining the main concept of the findings rather than giving full mathematical detail. A formal presentation of the theory of TV eigenfunctions can be found, e.g., in [7].

The TV functional can be expressed as

$$\begin{aligned} J_\mathrm{TV}(u) = \sup _{\begin{array}{c} z \in C_c^\infty \\ \Vert z\Vert _{L^\infty ({\varOmega })}\le 1 \end{array}} \langle u, {{\mathrm{div}}}z \rangle . \end{aligned}$$
(19)

Let \(z_u\) be an argument attaining the supremum in (19); then it immediately follows that \({{\mathrm{div}}}z_u \in \partial J_\mathrm{TV}(u)\): in the one-homogeneous case, we need to show \(\langle u, {{\mathrm{div}}}z_u \rangle = J_\mathrm{TV}(u)\), which is given in (19), and in addition that for any v in the space we have \(J_\mathrm{TV}(v)\ge \langle v,p \rangle \) for \(p\in \partial J(u)\),

$$\begin{aligned} J_\mathrm{TV}(v) = \sup _{\Vert z\Vert _{L^\infty ({\varOmega })}\le 1} \langle v, {{\mathrm{div}}}z \rangle \ge \langle v, {{\mathrm{div}}}z_u \rangle . \end{aligned}$$

From here on, we will refer to \(z_u\) simply as z.

To better understand what z represents, we can consider the case of smooth u and perform integration by parts in (19) to obtain

$$\begin{aligned} J_\mathrm{TV}(u) = \langle \nabla u, -z \rangle . \end{aligned}$$

Then, as z also maximizes \(\langle \nabla u, -z \rangle \), we can solve this pointwise, taking into account that \(|z(x)| \le 1\) and that the inner product is maximized by a vector pointing in the same direction, to obtain

$$\begin{aligned} z(x) \left\{ \begin{array}{ll} = - \frac{\nabla u(x)}{|\nabla u(x)|} &{} \text { for } \nabla u(x) \ne \mathbf {0}, \\ \in [-1,1] &{} \nabla u(x) = \mathbf {0}. \end{array} \right. \end{aligned}$$
(20)

5.1 Single Peak

Let us consider the case of one-dimensional TV regularization and define the following function of a single-unit peak of width w

$$\begin{aligned} B_w(x) = \left\{ \begin{array}{cc} 1 &{} \text { for } x \in [0,w), \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(21)

Then it is well known (e.g., [41]) that any function of the type \(h\cdot B_w(x-x_0)\) is an eigenfunction of TV in \(x\in {\mathbb {R}}\) with eigenvalue \(\lambda = \frac{2}{hw}\). Let us illustrate the latter by considering the characterization of the subdifferential (20) and define

$$\begin{aligned} z(x) = \left\{ \begin{array}{ll} -1 &{} \quad \text { for } x \in (-\infty , x_0], \\ \frac{2(x-x_0)}{w}-1 &{} \quad \text { for } x \in (x_0,x_0+w),\\ 1 &{} \quad \text { for } x \in [x_0 + w, \infty ). \end{array} \right. \end{aligned}$$

Although this z is clearly not a \(C_c^\infty \) function, it was shown in [1, 7] that functions \(z \in L^\infty ({\mathbb {R}})\) with \({{\mathrm{div}}}z \in L^2({\mathbb {R}})\) are sufficient for the characterization of subgradients.

For the above z, on the one hand we have \(\partial _x z(x) = \frac{2}{w}B_w(x-x_0)\), and on the other hand z meets equation (20) for \(u = \frac{2}{w}B_w( \cdot -x_0)\). Therefore, \(\frac{2}{w}B_w( \cdot -x_0) \in \partial J_\mathrm{TV}\left( \frac{2}{w}B_w( \cdot -x_0)\right) \), and after normalization with \(\Vert \frac{2}{w}B_w( \cdot -x_0)\Vert _2 = \frac{2}{\sqrt{w}}\) we find that \(\frac{1}{\sqrt{w}} B_w( \cdot -x_0)\) is an eigenfunction with eigenvalue \(\lambda = \frac{2}{\sqrt{w}}\).

5.2 Set of 1D Eigenfunctions

Generalizing this analysis, one can construct, for any eigenvalue \(\lambda \), an infinite set of piecewise constant eigenfunctions (with compact support).

Proposition 1

Let \(-\infty< x_0< x_1< \cdots< x_n < \infty \) be a set of \(n+1\) points on the real line. Let

$$\begin{aligned} u(x) = \sum _{i=0}^{n-1} h_i B_{w_i}(x-x_i), \end{aligned}$$
(22)

with \(B_w(\cdot )\) defined in (21), \( w_i = x_{i+1}-x_i,\) and

$$\begin{aligned} h_i = \frac{2 (-1)^i}{\lambda w_i}. \end{aligned}$$
(23)

Then u(x) satisfies the eigenvalue problem (1) with \(J = J_\mathrm{TV}\).

Proof

One can construct the following z in the shape of a “zigzag” between \(-1\) and 1 at the points \(x_i\),

$$\begin{aligned} z(x) = (-1)^i\left( \frac{ 2(x-x_i)}{w_i}-1\right) , \,\,\, x \in [x_i,x_{i+1}), \end{aligned}$$

and \(\partial _x z=0\) otherwise. In a similar manner to the single peak case, we get the subgradient element in \(\partial J_\mathrm{TV}(u)\)

$$\begin{aligned} p(x) = \left\{ \begin{array}{ll} \partial _x z = (-1)^i \frac{2}{w_i}, &{} x \in [x_i,x_{i+1}) \\ 0, &{} x \notin [x_0,x_n). \end{array} \right. \end{aligned}$$
(24)

This yields \(p(x) = \lambda u(x)\). \(\square \)
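A quick numerical sanity check of Proposition 1 can be done by assembling u from (22)–(23) and p from (24) on a fine grid and comparing them; the eigenvalue, breakpoints, and grid in the following sketch are arbitrary choices:

```python
import numpy as np

lam = 1.5                                  # chosen eigenvalue (arbitrary)
xi = np.array([-1.0, -0.2, 0.5, 2.0])      # breakpoints x_0 < ... < x_n
x = np.linspace(-3.0, 4.0, 10001)          # fine grid covering the support

u = np.zeros_like(x)
p = np.zeros_like(x)
for i in range(len(xi) - 1):
    w = xi[i + 1] - xi[i]                  # interval width w_i
    box = (x >= xi[i]) & (x < xi[i + 1])   # support of B_{w_i}(x - x_i)
    u[box] = 2.0 * (-1) ** i / (lam * w)   # heights h_i from (23)
    p[box] = (-1) ** i * 2.0 / w           # subgradient values from (24)

assert np.allclose(p, lam * u)             # eigenvalue relation p = lambda*u
print("Proposition 1 verified on the grid")
```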

In Figs. 5 and 6, we see examples of u, p, and z which construct simple TV eigenfunctions in the unbounded and bounded domains, respectively. In Fig. 7, several different types of eigenfunctions are shown. Compared to wavelets, they also oscillate (with mean zero value) but their variability is significantly larger.

Fig. 5

Illustration of Proposition 1, a TV eigenfunction u(x) in \({\mathbb {R}}\)

Fig. 6

A TV eigenfunction u(x) in a bounded domain

Fig. 7

A few examples of functions meeting \(f\in \partial J_\mathrm{TV}(f)\) in \({\mathbb {R}}\). (Note that these are piecewise constant functions and the jumps should be understood as discontinuities.)

5.2.1 The Bounded Domain Case

The unbounded domain is easier to analyze in some cases; however, in practice, we can implement only signals with a bounded domain. We therefore give the formulation for \(u \in {\varOmega }=[A,B) \subset {\mathbb {R}}\). Requiring \(J_\mathrm{TV}(1)=\langle p,1 \rangle =0\) and \(\langle u, p \rangle = \langle -\nabla u, z \rangle \) leads to the boundary condition \(z|_{\partial {\varOmega }}=0\).

Thus, on the boundaries we have \(z=0\), with half the slope of the unbounded case (see Fig. 6); all other derivations are the same. Setting \(x_0 = A\), \(x_n =B\), we get the solution of (22) with \(h_i\) defined slightly differently as

$$\begin{aligned} h_i = \frac{2a_i (-1)^i}{\lambda w_i}, \end{aligned}$$
(25)

where \(a_i=\frac{1}{2}\) for \(i\in \{0,n-1 \}\) (the boundary intervals) and \(a_i=1\) otherwise. See Fig. 12 for the numerical convergence of the ratio \(p(x)/u(x)\) to the eigenvalue.

Remark

Note that if u is an eigenfunction, then so is \(-u\); hence, the formulas above are also valid with the opposite sign.

6 Wavelets and Hard Thresholding

As we have seen in Sect. 4, the gradient flow with respect to regularizations of the form \(J(u) = \Vert Vu\Vert _1\) has a closed form solution for any orthogonal matrix V. From equation (18) one deduces that

$$\begin{aligned} \phi (t) = \sum _i \zeta _i \delta (t - |\zeta _i|) v_i, \end{aligned}$$

where \(v_i\) are the rows of the matrix V, and \(\zeta _i = (Vf)_i\). In particular, using definition (15) we obtain

$$\begin{aligned} (S_3(t))^2 = \sum _i (\zeta _i)^2 \delta (t - |\zeta _i|). \end{aligned}$$

Peaks in the spectrum therefore occur ordered by the magnitude of the coefficients \(\zeta _i = (Vf)_i\): the smaller \(|\zeta _i|\), the earlier the corresponding peak appears in the wavelength representation \(\phi \). Hence, applying an ideal low-pass filter (8) on the spectral representation with cutoff wavelength \(t_c\) is exactly the same as hard thresholding by \(t_c\), i.e., setting all coefficients \(\zeta _i\) with magnitude less than \(t_c\) to zero.
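This equivalence can be checked directly; the following is a minimal sketch with an arbitrary random orthogonal V and an ad hoc cutoff:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_c = 8, 0.5
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal rows
f = rng.standard_normal(n)
zeta = V @ f

# Hard thresholding: drop coefficients with magnitude below t_c.
hard = V.T @ np.where(np.abs(zeta) >= t_c, zeta, 0.0)

# Ideal LPF (8): keep the delta contributions of phi located at t >= t_c,
# i.e., the atoms zeta_i * v_i with |zeta_i| >= t_c.
lpf = sum(zeta[i] * V[i] for i in range(n) if abs(zeta[i]) >= t_c)

assert np.allclose(hard, lpf)
print("ideal low-pass filtering = hard thresholding")
```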

6.1 Haar Wavelets

Wavelet functions are usually continuous and cannot represent discontinuities very well. A special case are the Haar wavelets, which are discontinuous and are thus expected to represent discontinuous signals much better. The relation between Haar wavelets and TV regularization in one dimension has been investigated, most notably in [52]. In higher dimensions, there is no straightforward analogy in the case of isotropic TV, as it is not separable, contrary to the wavelet decomposition (thus one gets disk-like shapes as eigenfunctions as opposed to rectangles).

We will show here that even in the one-dimensional case, TV eigenfunctions can represent signals in a much more concise manner.

Let \(\psi _H(x)\) be a Haar mother wavelet function defined by

$$\begin{aligned} \psi _H(x) = \left\{ \begin{array}{cc} 1 &{} \text { for } x \in [0,\frac{1}{2}), \\ -1 &{} \text { for } x \in [\frac{1}{2},1), \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(26)

For any integer pair \(n,k \in {\mathbb {Z}}\), a Haar function \(\psi _H^{n,k}(x) \in {\mathbb {R}}\) is defined by

$$\begin{aligned} \psi _H^{n,k}(x) = 2^{n/2}\psi _H(2^n x - k). \end{aligned}$$
(27)

We now draw the straightforward relation between Haar wavelets and TV eigenfunctions.

Proposition 2

A Haar-wavelet function \(\psi _H^{n,k}\) is an eigenfunction of TV with eigenvalue \(\lambda = 2^{(2+n/2)}\).

Proof

One can express the mother wavelet as

$$\begin{aligned} \psi _H(x) = B_{\frac{1}{2}}(x)-B_{\frac{1}{2}}(x-\frac{1}{2}), \end{aligned}$$

and in general, any Haar function as

$$\begin{aligned} \psi _H^{n,k}(x) = h^n\left( B_{w^n}(x-x_0^{n,k})-B_{w^n}(x-x_0^{n,k}-w^n)\right) , \end{aligned}$$

with \(h^n = 2^{(n/2)}\), \(w^n=2^{-(n+1)}\), and \(x_0^{n,k}=2^{-n}k\). Thus, based on Proposition 1, we have that for any \(n,k \in {\mathbb {Z}}\), \(\psi _H^{n,k}\) is an eigenfunction of TV with \(\lambda = \frac{2}{h^n w^n}= 2^{(2+n/2)}\). \(\square \)

Fig. 8

Example of a spectral decomposition: We evaluate the flow of Eq. (3) for the given data f shown on the left. We compute the spectral decomposition (5) and a (discretized) version of the spectra \(S^2_3\) and \(S_1\) from (15) and (12), shown in the middle and right plots, respectively. Since five distinct peaks appear, one can integrate the spectral \(\phi \) components over the respective time intervals (\(=\)band-pass filter), visualized in different colors, to obtain the five components shown in Fig. 9

Fig. 9

Decomposing f (Fig. 8, left) into the 5 elements represented by the peaks in the spectrum using spectral TV

Fig. 10

Decomposing f (Fig. 8, left) with Haar wavelets using 15 elements

Fig. 11

Comparison of Haar-wavelet hard thresholding to spectral TV hard thresholding (ideal LPF). Although both representations can handle discontinuities well, the spectral TV representation is better adapted to the image edges and produces fewer artifacts

In Figs. 8, 9, and 10, we show the decomposition of a signal f composed of three peaks of different widths. The TV spectral decomposition shows five numerical deltas (corresponding to the five elements depicted in Fig. 9). On the other hand, the Haar-wavelet decomposition needs 15 elements to represent this signal; thus, the representation is less sparse. In the 2D case, Fig. 11, we see the consequence of an inadequate representation, which is less adapted to the data, where Haar thresholding is compared to the ideal TV LPF, exhibiting blocky artifacts in the Haar case.

7 Rayleigh Quotients and SVD Decomposition

In the classical Hilbert space case, with J being a quadratic form, the Rayleigh quotient is defined by

$$\begin{aligned} R_M(v) := \frac{v^T M v}{v^T v}, \end{aligned}$$
(28)

where, in the real-valued case, M is a symmetric matrix and v is a nonzero vector. It can be shown that, for a given M, the Rayleigh quotient attains its minimum value \(R_M(v_1)=\lambda _1\) at the eigenvector \(v_1\) corresponding to the minimal eigenvalue \(\lambda _1\) of M (and similarly for the maximum).

One can generalize this quotient to functionals, similarly to [10], by

$$\begin{aligned} R_J(u) := \frac{J(u)^2}{\Vert u\Vert ^2}, \end{aligned}$$
(29)

where \(\Vert \cdot \Vert \) is the \(L^2\) norm. We restrict the problem to the nondegenerate case where \(J(u)>0\), that is, u is not in the null-space of J.

To find a minimizer, an alternative formulation can be used (as in the classical case):

$$\begin{aligned} \min _u \{ J(u) \} \text { s.t. } \Vert u\Vert ^2=1. \end{aligned}$$
(30)

Using Lagrange multipliers, the problem can be recast as

$$\begin{aligned} \min _u \left\{ J(u) - \frac{\lambda }{2}\left( \Vert u\Vert ^2 - 1 \right) \right\} , \end{aligned}$$

with the optimality condition

$$\begin{aligned} 0 \in \partial J(u) - \lambda u, \end{aligned}$$

which coincides with the eigenvalue problem (1). Note that for J being absolutely one-homogeneous, one adds the constraint that u lies in the orthogonal complement of the null-space of J, in which case one obtains the minimal nonzero eigenvalue as the minimum of the generalized Rayleigh quotient (30).

The work by Benning and Burger in [10] considers more general variational reconstruction problems involving a linear operator in the data fidelity term, i.e.,

$$\begin{aligned} \min _u \frac{1}{2}\Vert Au-f\Vert _2^2 + t \ J(u), \end{aligned}$$

and generalizes Eq. (1) to

$$\begin{aligned} \lambda A^*A u \in \partial J(u), \qquad \Vert Au\Vert _2=1, \end{aligned}$$

in which case u is called a singular vector. Particular emphasis is put on the ground states

$$\begin{aligned} u^0 = \text {arg}\min \limits _{ \begin{array}{c} u \in \mathrm {kern}(J)^\perp ,\\ \Vert Au\Vert _2=1 \end{array}} J(u) \end{aligned}$$

for semi-norms J, which were proven to be singular vectors with the smallest possible singular value. Although the existence of a ground state (and hence the existence of singular vectors) is guaranteed for all reasonable J in regularization methods, it was shown that the Rayleigh principle for higher singular values fails. As a consequence, determining or even showing the existence of a basis of singular vectors remains an open problem for general semi-norms J.

In the setting of nonlinear eigenfunctions for one-homogeneous functionals, the Rayleigh principle for the second eigenvalue (given the smallest eigenvalue \(\lambda _1\) and ground state \(u_1\)) leads to the optimization problem

$$\begin{aligned} \min _u \{ J(u) \} \text { s.t. } \Vert u\Vert ^2=1, \langle u, u_1 \rangle = 0 . \end{aligned}$$
(31)

With appropriate Lagrange multipliers \(\lambda \) and \(\mu \), we obtain the solution as a minimizer of

$$\begin{aligned} \min _u \{ J(u) - \frac{\lambda }{2}\left( \Vert u\Vert ^2 - 1 \right) + \mu \langle u, u_1 \rangle \}, \end{aligned}$$

with the optimality condition

$$\begin{aligned} \lambda u - \mu u_1 = p \in \partial J(u). \end{aligned}$$

We observe that we can only guarantee u to be an eigenvector of J if \(\mu = 0\), which is not guaranteed in general. Taking a scalar product with \(u_1\) and using the orthogonality constraint yields \(\mu = - \langle p, u_1 \rangle \), which only needs to vanish if \(J(u) = \Vert u \Vert \), but not for general one-homogeneous J.

An example of a one-homogeneous functional for which the Rayleigh principle fails for higher eigenvalues (which can be derived from the results in [10]) is given by \(J: {\mathbb {R}}^2 \rightarrow {\mathbb {R}}\),

$$\begin{aligned} J(u) = \Vert D u \Vert _1, \qquad D = \left( \begin{array}{cc} 1 &{} - 2 {\epsilon } \\ 0 &{} \frac{1}{\epsilon } \end{array} \right) , \end{aligned}$$

for \(0< \epsilon < \frac{1}{2}\). The ground state \(u=(u^1,u^2)\) minimizes \(\vert u^1 - 2\epsilon u^2\vert + \frac{1}{\epsilon }\vert u^2 \vert \) subject to \(\Vert u \Vert _2 = 1\). It is easy to see that \(u=\pm (1,0)\) is the unique ground state, since by the normalization and the triangle inequality

$$\begin{aligned} 1&= \Vert u \Vert _2 \le \Vert u \Vert _1 \le |u^1 - 2\epsilon u^2|+(1+2\epsilon )| u^2 |\\&\quad \le |u^1 - 2\epsilon u^2|+\frac{1}{\epsilon }| u^2 |, \end{aligned}$$

and the last inequality is an equality if and only if \(u^2 =0\). Hence, the only candidate v that is normalized and orthogonal to u is given by \(v = \pm (0,1)\). Without loss of generality, we consider \(v=(0,1)\). Now, \(Dv = \left( -2\epsilon , \frac{1}{\epsilon }\right) \) and hence

$$\begin{aligned} \partial J(v) = \left\{ D^T (-1,1)\right\} = \left\{ \left( -1,2\epsilon + \frac{1}{\epsilon }\right) \right\} , \end{aligned}$$

which implies that there cannot be a \(\lambda > 0\) with \(\lambda v \in \partial J(v)\). A detailed characterization of functionals allowing for the Rayleigh principle for higher eigenvalues is still an open problem, as is the question whether there exists an orthogonal basis of eigenvectors in general. In the above example, we are not able to construct an orthogonal basis of eigenvectors; this will become more apparent from the characterization of eigenvectors in the next section.
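Returning to the example above, the failure can also be checked numerically. The following is a small sketch (the tolerances are ad hoc): for a unit vector u, we test whether some subgradient \(D^T e \in \partial J(u)\) is a nonnegative multiple of u.

```python
import numpy as np

eps = 0.25
D = np.array([[1.0, -2.0 * eps],
              [0.0, 1.0 / eps]])
DinvT = np.linalg.inv(D.T)

def is_eigenvector(u, tol=1e-9):
    # lambda*u in dJ(u) for J(u) = ||Du||_1 iff D^T e = lambda*u for some
    # lambda >= 0 and e with e_i = sign((Du)_i) where (Du)_i != 0 and
    # |e_i| <= 1 elsewhere. Writing e = lambda*d with d = D^{-T} u, the
    # sign constraints fix lambda; free components must satisfy |e_i| <= 1.
    s, d = D @ u, DinvT @ u
    lams = []
    for i in range(2):
        if abs(s[i]) > tol:
            if abs(d[i]) < tol:
                return False            # sign constraint unreachable
            lams.append(np.sign(s[i]) / d[i])
    if not lams or lams[0] < 0 or any(abs(l - lams[0]) > 1e-6 for l in lams):
        return False
    return bool(np.all(np.abs(lams[0] * d) <= 1.0 + 1e-6))

ground = np.array([1.0, 0.0])           # the ground state found above
v = np.array([0.0, 1.0])                # the only direction orthogonal to it
p_max = D.T @ np.array([1.0, -1.0])     # an extremal subgradient (next section)
print(is_eigenvector(ground))                          # True
print(is_eigenvector(v))                               # False
print(is_eigenvector(p_max / np.linalg.norm(p_max)))   # True
```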

8 Sparse Representation by Eigenfunctions

With our previous definitions, a convex functional induces a dictionary by its eigenfunctions. Let us formalize this notion by the following definition:

Definition 1

(Eigenfunction Dictionary) Let \({\mathcal {D}}_J\) be a dictionary of functions in a Banach space \({\mathcal {X}}\), with respect to a convex functional J, defined as the set of all eigenfunctions,

$$\begin{aligned} {\mathcal {D}}_J := \bigcup _{\lambda \in {\mathbb {R}}}\Big \{ u \in {\mathcal {X}}\, |\, \lambda u \in \partial J(u), {\ \Vert u\Vert _2 = 1}\Big \}. \end{aligned}$$

There are natural questions to be asked such as the following:

1. What functions can be reconstructed by a linear combination of functions in the dictionary \({\mathcal {D}}_J\)?

2. Is \({\mathcal {D}}_J\) an overcomplete dictionary?

3. Does \({\mathcal {D}}_J\) contain a complete orthonormal system?

4. How many elements are needed to express some type of functions (or how sparse are they in this dictionary)?

We notice that for many functionals, in particular semi-norms, there is a natural ambiguity between u and \(-u\), which are both elements of \({\mathcal {D}}_J\). Thus, we might either ask for conical combinations or identify u and \(-u\) as the same elements of the dictionary.

So far, the above questions have not been investigated systematically, and there is no general answer. However, from the existing examples, we expect the eigenfunctions of a convex one-homogeneous functional to contain a basis and often to be extremely overcomplete. Regarding the third aspect, it would be desirable to have a criterion for when input data consisting of a linear combination of eigenfunctions can actually be decomposed into its original atoms.

For some simple cases, we can immediately give an answer to the questions raised above. Firstly, note that in the case of \(J(u) = \Vert u\Vert _1\), any \(u = \frac{g}{\Vert g\Vert _2}\) with \(g_i \in \{-1, 0, +1 \}\) is an eigenfunction (except \(g \equiv 0\)), with eigenvalue \(\lambda = \Vert g\Vert _2\). Thus, even after removing the sign ambiguity, there still exist \((3^n-1)/2\) different eigenfunctions in \({\mathbb {R}}^n\). Remarkably, the number of eigenfunctions grows exponentially with the dimension of the space.

8.1 Total Variation Dictionaries

In the continuous setting, the result of Proposition 2, stating that any Haar-wavelet function is a TV eigenfunction, immediately yields the existence of a basis of eigenfunctions in this case, too. (See, for example, [22] for details on the wavelet reconstruction properties.) Moreover, the explicit construction of eigenfunctions according to Proposition 1, along with the freedom to choose the \(x_i \in {\mathbb {R}}\) arbitrarily, yields that there are uncountably many eigenfunctions and the dictionary is overcomplete.

Without having the immediate connection to wavelets, the above results extend to the c-dimensional TV regularization:

Corollary 1

For \(J_\mathrm{TV}: \text {BV}({\mathbb {R}}^c) \rightarrow {\mathbb {R}}\) being the TV functional, any \(f\in L^2({\mathbb {R}}^c)\) can be approximated up to a desired error \(\varepsilon \) by a finite linear (respectively conical) combination of elements from \({\mathcal {D}}_{J_\mathrm{TV}}\). Moreover, \({\mathcal {D}}_{J_\mathrm{TV}}\) is an overcomplete dictionary (so the linear combination is not unique).

Proof

First of all, we can approximate any \(L^2\)-function to arbitrary precision with a linear combination of a finite number of piecewise constant functions. Since the Borel sets are generated by balls, we can further construct an approximation to arbitrary precision with functions piecewise constant on balls, i.e., linear combinations of characteristic functions of balls. Since the latter are eigenfunctions of the TV, i.e., elements of \({\mathcal {D}}_J\), the dictionary is complete. Moreover, since there are Cheeger sets not equal to a ball, and their characteristic functions are in \({\mathcal {D}}_J\), too, the dictionary is overcomplete. \(\square \)

While the properties of eigenfunctions with respect to total variation regularization are fairly well understood, the question of the properties of a dictionary of eigenfunctions for general one-homogeneous regularization functionals remains open. In [16], we showed that the regularization \(J(u) = \Vert Du\Vert _1\) with \(DD^*\) being diagonally dominant always yields the existence of a basis of eigenfunctions. We do, however, expect to have a significantly overdetermined dictionary of eigenfunctions. Unfortunately, classical ways of determining eigenfunctions, such as the Rayleigh principle, fail in the setting of generalized eigenfunctions (1) for nonquadratic J, as we have seen in the previous section. Nevertheless, for absolutely one-homogeneous functionals we can describe the set of eigenvectors more precisely.

8.2 Dictionaries from One-Homogeneous Functionals

In the following, let us consider one-homogeneous functionals \(J:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\), which by duality can be represented as

$$\begin{aligned} J(u) = \sup _{q \in K} \langle q, u \rangle \end{aligned}$$
(32)

for some convex set K. In order to avoid technicalities (related to a possible null-space of J), we assume that K has nonempty interior. The following result provides a characterization of eigenvectors for nonzero eigenvalues:

Lemma 1

Let J be defined by (32) and \(p \in \partial K\) satisfy

$$\begin{aligned} \langle p, p -q \rangle \ge 0 \qquad \forall ~q \in K. \end{aligned}$$
(33)

Then \(u=\frac{p}{\Vert p \Vert }\) is an eigenvector with eigenvalue \(\lambda = \Vert p \Vert \). Vice versa, if u is an eigenvector with eigenvalue \(\lambda \ne 0\), then \(p=\lambda u\) satisfies (33).

Proof

We have \(p \in \partial J(u)\) if and only if p maximizes \(\langle q,u \rangle \) over all \(q \in K\). Hence, u is an eigenvector, i.e., a positive multiple of p, if and only if (33) holds. \(\square \)

This result has a straightforward geometric interpretation: if u is an eigenvector corresponding to \(\lambda \ne 0\), i.e., \(p = \lambda u \in \partial J(u)\), then a ball centered at zero is tangential to \(\partial K\) at p. In other words, the hyperplane through p and orthogonal to p is tangential to \(\partial K\); more precisely, K lies on one side of this hyperplane.

The above geometric insight allows a more precise understanding of the example from the previous section. For \(J(u) = \Vert Du \Vert _1\), we have

$$\begin{aligned} K = \{~D^T e ~|~\Vert e\Vert _\infty \le 1 \}, \end{aligned}$$

i.e., K is a sheared version of the unit square. For the specific form of D in the example, the facets remaining parallel to the second coordinate axis remain unchanged. If the shearing is not too strong (\(\epsilon < \frac{1}{2}\)), the first tangential touch of a ball centered at zero happens at \(p=\pm (1,0)\), which gives the first eigenvector. A second tangential touch happens at the extremal points of the sheared square, which obviously yields an eigenvalue with nonzero first coordinate, hence not orthogonal to the first eigenvector. The construction of the minimal and maximal eigenvalue from this example can indeed be generalized with a straightforward proof to the general case:

Proposition 3

Let J be defined by (32) and p be a solution of

$$\begin{aligned} \Vert p \Vert \rightarrow \min _{p \in \partial K}, \end{aligned}$$
(34)

then \(\lambda = \Vert p \Vert \) is the minimal eigenvalue of J and \(u=p/\Vert p \Vert \) is a ground state. Let p be a solution of

$$\begin{aligned} \Vert p \Vert \rightarrow \max _{p \in \partial K}, \end{aligned}$$
(35)

then \(\lambda = \Vert p \Vert \) is the maximal eigenvalue of J and \(u=p/\Vert p \Vert \) is a corresponding eigenvector.

9 Cheeger Sets and Spectral Clustering

As mentioned in the introduction, there is a large literature on calibrable sets, respectively Cheeger sets, whose characteristic functions are eigenfunctions of the TV functional, compare (2). Particular interest is of course paid to the lowest eigenvalues and their eigenfunctions (ground states), which are related to solutions of the isoperimetric problem, easily seen from the relation \(\lambda = \frac{P(C)}{|C|}\). Due to the scaling invariance of \(\lambda \), the shape of C corresponding to the minimal eigenvalue can be determined by minimizing P(C) subject to \(|C|=1\). Hence, the ground states of the TV functional are just the characteristic functions of balls in the isotropic setting. Note that the normalization \(|C|=1\) is consistent with the constraint of ground states having a unit \(L^2\) norm. Interesting variants are obtained with different definitions of the TV via anisotropic vector norms: e.g., if one uses the \(\ell ^1\)-norm of the gradient in the definition, then it is a simple exercise to show that the ground state is given by characteristic functions of squares with sides parallel to the coordinate directions. In general, the ground state is the characteristic function of the Wulff shape related to the specific vector norm used (cf. [8, 28]) and is hence a fingerprint of the used metric.
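As a concrete check in the isotropic setting: minimizing P(C) subject to \(|C|=1\) in \({\mathbb {R}}^2\) yields a disk of radius \(r = 1/\sqrt{\pi }\), and hence the smallest eigenvalue

$$\begin{aligned} \lambda = \frac{P(C)}{|C|} = \frac{2\pi r}{\pi r^2} = \frac{2}{r} = 2\sqrt{\pi }. \end{aligned}$$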

On bounded domains or in (finite) discrete versions of the total variation, the situation differs: a trivial argument shows that the first eigenvalue is simply zero, with a constant function as the ground state. Since this is not interesting, it was even suggested to redefine the ground state as an eigenfunction of the smallest nonzero eigenvalue (cf. [10]). The more interesting quantity is the second eigenfunction, which is orthogonal to the trivial ground state (i.e., has zero mean). In this case, the Rayleigh principle always works (cf. [10]) and we obtain

$$\begin{aligned} \lambda _2 = \inf _{u, \int u = 0} \frac{J_\mathrm{TV}(u)}{\Vert u \Vert _2}. \end{aligned}$$
(36)

Depending on the underlying domain on which TV is considered (and again the precise definition of TV) one obtains the minimizer as a function positive in one part and negative in another part of the domain, usually with constant values in each. Hence, the computation of the eigenvalue can also be interpreted as some optimal cut through the domain.

The latter idea had a strong impact in data analysis (cf. [12, 13, 39, 50, 53]), where the TV is defined as a discrete functional on a graph; hence, one obtains a relation to graph cuts and graph spectral clustering. For data given in the form of a weighted graph, with weights \(w_{ij}\) on the edge between two vertices \(i,j\) in the vertex set \({{{\mathcal {V}}}}\), the TV is defined as

$$\begin{aligned} J_{1,w}(u) = \sum _{i,j \in {{{\mathcal {V}}}}} w_{ij} |u_i - u_j| \end{aligned}$$
(37)

for a function \(u: {{{\mathcal {V}}}} \rightarrow {\mathbb {R}}\). Then a graph cut is obtained from a minimizing u in

$$\begin{aligned} \lambda _2 = \inf _{u, \sum u_i = 0} \frac{J_{1,w}(u)}{\Vert u \Vert _2}, \end{aligned}$$
(38)

i.e., an eigenfunction corresponding to the second eigenvalue. The cut directly yields a clustering into the two classes

$$\begin{aligned} {{{\mathcal {C}}}}_+ = \left\{ i \in {{{\mathcal {V}}}}~|~u_i > 0 \right\} , \qquad {{{\mathcal {C}}}}_- = \left\{ i \in {{{\mathcal {V}}}}~|~u_i < 0 \right\} . \end{aligned}$$
(39)

Note that in this formulation, the graph cut technique is completely analogous to the classical spectral clustering technique based on second eigenvalues of the graph Laplacian (cf. [55]). In the setting (38), the spectral clustering based on the graph Laplacian can be incorporated by using the functional

$$\begin{aligned} J_{2,w}(u) = \sqrt{ \sum _{i,j \in {{{\mathcal {V}}}}} w_{ij} |u_i - u_j|^2}. \end{aligned}$$
(40)

Some improvements in spectral clustering are obtained if the normalization in the eigenvalue problem is not done in the Euclidean norm (or a weighted variant thereof), but some \(\ell ^p\)-norm, i.e.,

$$\begin{aligned} \lambda _2 = \inf _{u, \sum u_i = 0} \frac{J_{r,w}(u)}{\Vert u \Vert _p}, \end{aligned}$$
(41)

for \(r=1\) or generalizations to weighted norms. We refer to [39] for an overview; again, the case \(p=1\) has received particular attention. In the case of the graph total variation, see [4] for a recent initial analysis and a method to construct certain types of concentric eigenfunctions. A thorough theoretical study of eigenvalue problems and spectral representations for such generalized eigenvalue problems is still an open problem, as it is for any non-Hilbert space norm. The corresponding gradient flow becomes a doubly nonlinear evolution equation of the form

$$\begin{aligned} 0 \in \partial \Vert \partial _t u \Vert _p + \partial J(u) . \end{aligned}$$
(42)

The gradient flow for \(p=1\) and TV in the continuum is related to the \(L^1\)-TV model, which appears to have interesting multiscale properties (cf. [58]).

A challenging problem is the numerical computation of eigenvalues and eigenfunctions in the nonlinear setup, also beyond the spectral clustering application. Methods reminiscent of the classical inverse power method for eigenvalue computation (cf. e.g., [34]) have been proposed and used with some success, but it is difficult to obtain and guarantee convergence to the smallest or second eigenvalue in general.

10 Numerical Implementation Issues

Here, we give the gradient flow approximation. We would like to approximate \( u_t = -p(u) \), \(p(u) \in \partial J(u)\). We use the Moreau–Yosida approximation [43] of the gradient flow. The implicit discrete evolution (which is unconditionally stable in \(\mathrm{d}t\)) is

$$\begin{aligned} u(n+1) = u(n) - \mathrm{d}t \, p\Big (u(n+1)\Big ). \end{aligned}$$

We can write the above expression as

$$\begin{aligned} u(n+1) - u(n) + \mathrm{d}t \, p\Big (u(n+1)\Big ) = 0, \end{aligned}$$

and see that it coincides with the Euler–Lagrange equation of the following minimization problem:

$$\begin{aligned} E\Big (u,u(n)\Big ) = J(u) + \frac{1}{2\,\mathrm{d}t}\Vert u - u(n)\Vert _{l^2}^2, \end{aligned}$$
(43)

where \(u(n+1)\) is the minimizer u of \(E\Big (u,u(n)\Big )\) and u(n) is fixed. We now have a standard convex variational problem which can be solved using various algorithms (e.g., [19, 21, 24, 33, 47]).

To approximate the second time derivative \(u_{tt}(t)\), we store in memory 3 consecutive time steps of u and use the standard central scheme:

$$\begin{aligned} D^2 u(n) = \frac{u(n-1)+u(n+1)-2u(n)}{(\mathrm{d}t)^2}, \end{aligned}$$

with \(n=1,2,\ldots \), \(u(0)=f\), \(D^2 u(0)=0\). The time t is discretized as \(t(n) = n\,\mathrm{d}t\). Therefore,

$$\begin{aligned} \phi (n) = D^2 u(n)\, t(n) = \frac{n}{\mathrm{d}t}\left( u(n-1)+u(n+1)-2u(n)\right) , \end{aligned}$$
(44)

and for instance the spectrum \(S_1\) is

$$\begin{aligned} S_1(n) = \sum _{i \in {\mathcal {N}}} |\phi _i(n)|, \end{aligned}$$
(45)

where i is a pixel in the image and \({\mathcal {N}}\) is the image domain.
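The following is a minimal self-contained sketch of this pipeline for 1D TV (not the implementation used for the figures in this paper): the inner minimization (43) is solved with a basic Chambolle–Pock primal–dual iteration [21], and \(\phi \) and \(S_1\) are assembled via (44) and (45). The grid size, step sizes, and iteration counts below are ad hoc choices.

```python
import numpy as np

def K(u):                        # forward differences (length n-1)
    return u[1:] - u[:-1]

def KT(z):                       # adjoint of K
    return np.concatenate(([-z[0]], z[:-1] - z[1:], [z[-1]]))

def tv_prox(f, dt, iters=500):
    # One implicit step (43): min_u dt*||Ku||_1 + 0.5*||u - f||^2,
    # solved with a plain Chambolle-Pock primal-dual iteration.
    sigma = tau = 0.49           # sigma*tau*||K||^2 <= 1, since ||K||^2 <= 4
    x, xbar = f.copy(), f.copy()
    y = np.zeros(len(f) - 1)
    for _ in range(iters):
        y = np.clip(y + sigma * K(xbar), -dt, dt)   # prox of (dt*||.||_1)^*
        x_new = (x - tau * KT(y) + tau * f) / (1.0 + tau)
        xbar, x = 2.0 * x_new - x, x_new
    return x

# Evolve the flow and assemble phi via (44) and S1 via (45).
n_steps, dt = 60, 0.05
xg = np.linspace(0.0, 1.0, 200)
f = 1.0 * ((xg > 0.2) & (xg < 0.4)) + 0.5 * ((xg > 0.6) & (xg < 0.9))
u = [f]
for _ in range(n_steps + 1):
    u.append(tv_prox(u[-1], dt))
phi = [(k / dt) * (u[k - 1] + u[k + 1] - 2.0 * u[k])    # Eq. (44)
       for k in range(1, n_steps + 1)]
S1 = np.array([np.abs(p).sum() for p in phi])           # Eq. (45)
print("spectrum S1(n):", np.round(S1, 2))
```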

Fig. 12

Numerical implementation of a 1D TV eigenfunction, Chambolle–Pock scheme [21]. The ratio \(\frac{p(x)}{u(x)}=\lambda _1\) should converge to \(1/(1-\epsilon )\). It takes about 5000 iterations for the numerical scheme to fully converge. Note that the error is data-dependent: larger flat regions converge much more slowly

In Fig. 12, we show a numerical computation of eigenfunctions and visualize the spatial convergence. We perform a single time step of a TV flow in the following way. Let f be a 1D TV eigenfunction in a bounded domain, satisfying Eqs. (22), (25); for simplicity, we chose \(\lambda =1\). We set \(n=0\), \(u(0)=f\) and find the minimizer of (43) for a very small time step \(\mathrm{d}t=\epsilon \) (in our implementation, \(\mathrm{d}t=0.0001\)). Following Eq. (4), we should get \(u(n=1,x) \equiv u(x) = (1-\mathrm{d}t)f(x)\). Note that the subdifferential of any absolutely one-homogeneous regularization function is zero-homogeneous, i.e., \(\partial J(c u) = \partial J(u)\) for all \(c>0\), which means that \(f \in \partial J_\mathrm{TV}(f)\) implies \(\frac{1}{1-\mathrm{d}t} u(n=1,x) \in \partial J(u(n=1,x))\). For the minimization, we use the scheme of Chambolle–Pock [21] with m internal iterations. Let \(u^m(x)\), \(p^m(x)\) be the result of the numerical scheme at iteration m. At full convergence, as \(m \rightarrow \infty \), we have \(\lim _{m \rightarrow \infty }\frac{p^m(x)}{u^m(x)}=\lambda _1 \approx 1\) pointwise, as shown by the green flat line in Fig. 12 (for \(m=5000\) iterations).

Such schemes are far better suited for oscillating signals (they work well for denoising), but when there are large flat regions, the local magnitude of p depends on the support of that region, and convergence is considerably slower (it can be slower by orders of magnitude). For the 1D case shown here, much faster alternative schemes can be used, such as [23, 35]. Furthermore, for our purpose of constructing a full solution path, variants of the homotopy method [45] or the adaptive inverse scale space flow [18] would be well suited for the 1D case, as they can construct exact solutions without discretizing the underlying equation in time. However, as far as we know, there are no efficient schemes for higher dimensions which deal well with flat regions. Thus, alternative ways are desired to solve such cases more efficiently.

11 Extensions to Other Nonlinear Decomposition Techniques

As presented in [14], the general framework of nonlinear spectral decompositions via one-homogeneous regularization functionals does not have to be based on the gradient flow Eq. (3), but may as well be defined via a variational method of the form

$$\begin{aligned} u(t) = \arg \min _u \frac{1}{2}\Vert u-f\Vert _2^2 + t J(u), \end{aligned}$$
(46)

or an inverse scale space flow of the form

$$\begin{aligned} \partial _s p(s) = f - u(s), \qquad p(s) \in \partial J(u(s)). \end{aligned}$$
(47)

Since the variational method shows exactly the same behavior as the scale space flow for input data f being an eigenvector of J, the definitions of the spectral decomposition coincide with those of the scale space flow. The behavior of the inverse scale space flow, on the other hand, is different in two respects. Firstly, it starts at u(0) being the projection of f onto the kernel of J and converges to \(u(s)=f\) for sufficiently large s. Secondly, the dynamics for f being an eigenfunction of J are piecewise constant in u(s), such that only a single derivative is needed to obtain a peak in the spectral decomposition. The behavior on eigenfunctions furthermore yields the relation \(s = \frac{1}{t}\) when comparing the spectral representation obtained by the inverse scale space flow with the one obtained by the variational method or the gradient flow.

In analogy to the linear spectral analysis, it is natural to call the spectral representation of the inverse scale space flow a frequency representation, as opposed to the wavelength representation of the variational and scale space methods.

Table 2 Six spectral representations of one-homogeneous functionals

To be able to obtain frequency and wavelength representations for all three types of methods, we make the convention that a filter H(t) acting on the wavelength representation \(\phi \) should be equivalent to the filter H(1 / t) acting on the frequency representation \(\psi \), i.e.,

$$\begin{aligned} \int _0^\infty \phi (t) H(t) ~\mathrm{d}t = \int _0^\infty \psi (t) H(1/t) ~\mathrm{d}t. \end{aligned}$$

By a change of variables one deduces that

$$\begin{aligned} \psi (t) = \frac{1}{t^2}\phi \left( \frac{1}{t}\right) \text { respectively } \phi (t) = \frac{1}{t^2}\psi \left( \frac{1}{t}\right) \end{aligned}$$

are the appropriate conversion formulas to switch from the frequency to the wavelength representation and vice versa. Table 2 gives an overview of the three nonlinear decomposition methods and their spectral representations in terms of frequencies and wavelengths.
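As a quick sanity check, the conversion formula can be verified numerically; the sketch below uses a synthetic spectrum \(\phi \) and filter H (both purely illustrative) and compares the two integrals by trapezoidal quadrature:

```python
import numpy as np

# Numerical check of psi(t) = phi(1/t) / t^2 with an arbitrary test
# spectrum and filter; any sufficiently decaying pair would do.
def trapezoid(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

t = np.logspace(-3, 3, 20001)               # log-spaced grid on (0, inf)
phi = np.exp(-np.log(t) ** 2)               # test wavelength spectrum
psi = np.interp(1.0 / t, t, phi) / t ** 2   # converted frequency spectrum
H = lambda s: np.exp(-s)                    # test filter

lhs = trapezoid(phi * H(t), t)          # filter on the wavelength side
rhs = trapezoid(psi * H(1.0 / t), t)    # same filter on the frequency side
print(lhs, rhs)                         # agree up to quadrature error
```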

It can be verified that in particular cases, such as regularizations of the form \(J(u) = \Vert Vu\Vert _1\) for an orthonormal matrix V or input data f being an eigenfunction of J, the gradient flow, the variational method, and the inverse scale space flow yield exactly the same spectral decompositions \(\phi \) and \(\psi \). In [16], we investigate more general equivalence results, and we refer the reader to [27] for a numerical comparison of the above approaches. A precise theory of the differences between the three approaches for a general one-homogeneous J, however, remains an open question.

12 Preliminary Applications

12.1 Filter Design for Denoising

One particular application of the spectral decomposition framework is the design of filters for denoising. It was shown in [31, Theorem 2.5] that the popular denoising strategy of evolving the scale space flow (3) up to some time \(t_1\) is equivalent to the particular filter

$$\begin{aligned} H(t) = \left\{ \begin{array}{ll} 0 &{} \text { for } 0 \le t \le t_1 \\ \frac{t-t_1}{t} &{} \text { for } t_1 \le t \le \infty \end{array} \right. \end{aligned}$$

in the framework of Eq. (7). This naturally raises the question of whether this particular choice of filter is optimal for practical applications. While in the case of a perfect separation of signal and noise an ideal low-pass filter (8) may allow a perfect reconstruction (as illustrated on synthetic data in Fig. 13), the spectral decomposition of natural noisy images will often contain both noise and texture in the high-frequency components.
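To make the comparison concrete, the sketch below places the flow-equivalent filter from the equation above next to an ideal low-pass cut-off acting on a sampled spectrum; the cut-off \(t_1\) is an illustrative placeholder, and "ideal low-pass" is written here as keeping all scales above \(t_1\) in the wavelength representation (our reading; Eq. (8) is not reproduced in this section):

```python
import numpy as np

# Two filters acting on a sampled wavelength spectrum phi(t; x).
def flow_filter(t, t1):
    # H(t) = 0 for t <= t1, (t - t1)/t afterwards (per [31])
    return np.where(t <= t1, 0.0, (t - t1) / t)

def ideal_lowpass(t, t1):
    # keep only the large scales / low "frequencies" (illustrative)
    return (t > t1).astype(float)

t = np.linspace(1e-3, 10.0, 1000)
t1 = 1.0
# A filtered image is recovered as u_H(x) = integral H(t)*phi(t; x) dt
# (plus the unfiltered residual), cf. Eq. (7); here we only build H.
H_flow, H_ideal = flow_filter(t, t1), ideal_lowpass(t, t1)
```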

Fig. 13
figure 13

Example of perfect separation of noise and signal via anisotropic TV regularization in the framework of nonlinear spectral decompositions using the inverse scale space flow. From left to right: clean image; corresponding spectrum of the clean image; noisy image; spectrum of the noisy image with an ideal low-pass filter to separate noise and signal; reconstructed image after applying the ideal low-pass filter. We used the definition \(S_3\) in this illustration of the spectrum

In [42], a first approach to learning optimal denoising filters on a training dataset of natural images demonstrated promising results. In particular, it was shown that optimal filters neither had the shape of ideal low-pass filters nor of the filter arising from evolving the gradient flow.

In future research, different regularization terms for separating noise and signal in a spectral framework will be investigated. Moreover, the proposed spectral framework allows filtering more general types of noise, i.e., filters are not limited to treating only the highest frequencies as noise. Finally, the complete absence of high-frequency components may lead to an unnatural appearance of the images, such that including some (damped) high-frequency components may improve the visual quality despite their possibly containing noise. Figure 14 supports this conjecture with a preliminary example.

Fig. 14
figure 14

Example of denoising natural images: the original image (upper left) contains a lot of texture. Denoising the noisy image (upper right) with an ideal low-pass filter (using TV in a color space that decouples intensities and chromaticities) yields the lower left image. Since some of the high frequencies are completely suppressed, the image looks unnatural. A designed filter as shown in the lower right can produce more visually pleasing results (lower middle image). This example was produced using the ISS framework, such that the filters are shown in the frequency representation. Note that the suppression of color artifacts seems to be significantly more important than the suppression of oscillations

12.2 Texture Processing

One can view the spectral representation as an infinite-dimensional extension of multiscale approaches for texture decomposition, such as [32, 54]. In this sense, \(\phi (t)\) of (5) is an infinitesimal textural element (which passes in a continuous manner from “texture” to “structure,” depending on t). A rather straightforward procedure is therefore to analyze the spectrum of an image and either manually or automatically select integration bands that correspond to meaningful textural parts of the image; a sketch of such band integration is given below. This was done in [36], where a multiscale orientation descriptor based on Gabor filters was constructed. This yields for each pixel a multi-valued orientation field, which is more informative for analyzing complex textures. See Fig. 15 for an example of such a decomposition of the Barbara image.
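As an illustration, band integration of a sampled spectrum might look as follows; the array layout \(\phi [t_{\mathrm{index}}, y, x]\), the placeholder data, and the band edges are our own assumptions:

```python
import numpy as np

# Extract one "textural" component by integrating the sampled spectrum
# phi over a band of scales [t1, t2] (layout and band edges assumed).
def band_component(phi, t, t1, t2):
    m = (t >= t1) & (t <= t2)
    ts, ps = t[m], phi[m]                 # restrict to the band
    w = 0.5 * (ps[1:] + ps[:-1])          # trapezoidal weights in t
    return np.tensordot(np.diff(ts), w, axes=(0, 0))  # integrate over t

# Example: spectrum sampled at 100 scales for a 64x64 image
phi = np.random.rand(100, 64, 64)         # placeholder spectrum
t = np.linspace(0.01, 2.0, 100)
texture_band = band_component(phi, t, 0.1, 0.5)   # one scale band
```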

Fig. 15
figure 15

Multiscale orientation analysis based on spectral decomposition. a Barbara image; b TV spectrum of the image with separated scales marked in different colors; c–e multiscale decomposition; f–h corresponding orientation maps

Fig. 16
figure 16

Spectral analysis of a wall image, from [38]. a Input image of a brick wall; b \(S(x)=\max _t \phi (t;x)\); c approximation of S(x) by a plane

Fig. 17
figure 17

Decomposition by a separation band in the spectral domain. a Input image; b the maximal \(\phi \) response; c the separation band; d–e spectral decomposition; f–g TV-G decomposition. Taken from [37]

In [37, 38], a distinctive way of using the spectral representation for texture processing was suggested. It was shown that textures with gradually varying pattern size, pattern contrast, or illumination can be represented by surfaces in the three-dimensional TV transform domain. A spatially varying texture decomposition then amounts to estimating a surface representing a significant maximal response of \(\phi \) within a time range \([t_1,t_2]\),

$$\begin{aligned} \max _{t\in [t_1,t_2]} \{\phi (t;x)\} > \epsilon \end{aligned}$$

for each spatial coordinate x. In Fig. 16, a wall with gradually varying texture scales is shown together with its scale representation in the spectral domain. A decomposition of structures with gradually varying contrasts is shown in Fig. 17, where the band-spectral decomposition (spatially varying scale separation) is compared with the best TV-G separation of [5].
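A minimal sketch of this max-response estimation on a sampled spectrum could read as follows; the threshold \(\epsilon \), the array layout, and the band \([t_1,t_2]\) are illustrative assumptions, and the actual surface fitting of [37, 38] is more elaborate:

```python
import numpy as np

# Per-pixel maximal response of phi over a scale band [t1, t2],
# the scale at which it occurs, and a significance mask (eps assumed).
def max_response(phi, t, t1, t2, eps):
    m = (t >= t1) & (t <= t2)
    sub = phi[m]                       # shape (T_band, H, W)
    S = sub.max(axis=0)                # S(x) = max_t phi(t; x)
    t_star = t[m][sub.argmax(axis=0)]  # arg-max scale per pixel
    return S, t_star, S > eps          # mask of significant responses

phi = np.random.rand(100, 64, 64)      # placeholder sampled spectrum
t = np.linspace(0.01, 2.0, 100)
S, t_star, mask = max_response(phi, t, 0.1, 0.5, eps=0.9)
```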

13 Discussion and Conclusion

In this paper, we presented the rationale for analyzing one-homogeneous variational problems through a spectral approach. It was shown that solutions of the generalized nonlinear eigenvalue problem (1) are a fundamental part of this analysis. The quadratic (Dirichlet) energy yields the classical Fourier modes as solutions, and thus eigenfunctions of one-homogeneous functionals can be viewed as nonlinear generalizations of classical frequencies.

Analogies to the Fourier case, to wavelets, and to dictionary representations were drawn. However, the theory is only beginning to take shape, and many theoretical problems remain open; a few examples are:

1. More exact relations of the one-homogeneous spectral representations to basis and frame representations.

2. Understanding the difference between the gradient flow, variational, and inverse scale space representations. As we have shown in [16], a regularization \(J(u) = \Vert Du\Vert _1\) with \(DD^*\) diagonally dominant yields the equivalence of the spectral decompositions of all three methods. Under a (weaker) regularity assumption on the subdifferentials, we obtained the equivalence of the gradient flow and the variational approach. We expect, however, the equivalence of all three methods to hold under more general assumptions. It would be of particular interest to characterize the differences in cases where the representations are not equivalent.

3. Consistency of the decomposition: if we filter the spectral decomposition of some input data f, e.g., by an ideal low-pass filter at frequency T, and apply the spectral decomposition again, will we see any frequencies greater than T?

4. Are the regularizations investigated in [16] the most general class of functionals for which one can guarantee orthogonality of the spectral representation \(\phi \)?

5. Spectral analysis of random noise: can we classify the decay of coefficients from high to low frequencies, for instance for Gaussian random noise? Does this have implications for the optimal choice of filters for suppressing noise?

6. Can the theory be extended from the one-homogeneous case to general convex functionals?

In addition, there are many practical aspects, such as:

1. Learning the regularization on a training set with the goal of separating certain features.

2. Numerical issues: computing \(\phi \) in a stable manner (as it involves a second derivative in time); moreover, can one design schemes better than those given here, which are less local and converge fast for any eigenfunction?

3. Additional applications where the new representations can help in the better design of variational algorithms.

Some initial applications related to filter design and texture processing were shown. This direction appears to be a promising line of research that can aid in a better understanding of variational processing and has strong potential to provide alternative, improved algorithms.