1 Introduction

Image decomposition is a key step in image analysis that extracts two semantically meaningful components from an image. Given an input image f, the decomposition problem can be written as \(f = u_{C} + v_{T}\), where the unknown \(u_{C}\) is the structural component modeling the well-structured homogeneous regions and the unknown \(v_{T}\) is the textural component capturing oscillating patterns such as texture and/or noise. Image denoising and structure–texture image decomposition are two representative decomposition problems, each hinging on how the two parts are formulated with different priors. In this paper, we focus on structure–texture decomposition, whose difficulty lies in two aspects. Firstly, the problem is ill-posed since the number of unknowns is twice the number of known variables, so choosing appropriate constraints that yield a unique solution is challenging. Secondly, as there is no clear distinction between structures and textures across different images, or even within a single image, defining the scales of both parts and extracting structures and textures consistent with the Human Visual System (HVS) is another difficult problem.

Numerous techniques have been developed to implement decomposition models. For example, various variational techniques separate image structures and textures by employing different regularizations. Early methods typically applied total variation (TV) [50] and its extensions to extract the piecewise constant structural part and characterized the textural part with different functional spaces or norms, such as TV-G [42], TV-div(\(L^{p}\)) [57], TV-\(H^{-1}\) [46], TV-\(H^{-s}\) [32], TV-div(BMO) [30], TV-\(L^{1}\) [72], RTV [66] and JCAS [19]. However, these methods often over-smooth the structural component. Filtering-based methods were proposed to filter textures out of the structural image according to edge strength [12, 13, 29, 31, 51, 70, 71] or spatial scale [17, 24, 67, 75, 77]. Such methods can commonly remove weak or smaller-scale edges and preserve strong or larger-scale ones. However, they still cannot capture the distinction between image structure and texture, especially when weak edges belong to the image structure while strong ones correspond to the texture. Recently, deep neural networks (DNNs) have been applied to image decomposition [8, 14, 15, 28, 65]. Since there is no ground truth for image structure and texture, these methods depend on man-made datasets built from sparse annotations or large numbers of external samples and perform unstably on real-world data. In addition, STD [76] proposed a self-example, unsupervised and online learning model, but it still loses too many details of the structural component due to its TV regularization, especially for examples whose structures and textures are hard to distinguish.

In this paper, we take advantage of both the variational framework and sparse representation for structure–texture image decomposition. In particular, we propose non-convex total generalized variation (NTGV) as the regularization term of the image structure to capture more structural details, and we extract fine-scale image texture with convolutional sparse coding (CSC). To better distinguish textural edges from structural ones, both a structure-aware and a texture-aware measure are incorporated. We develop an alternating minimization scheme based on the alternating direction method of multipliers (ADMM) and show the effectiveness of our method on several applications both quantitatively and qualitatively.

The rest of the paper is organized as follows. In Sect. 2, we review related works on structure–texture image decomposition. We present the proposed model in Sect. 3 and introduce the algorithm details in Sect. 4. A comprehensive analysis is provided in Sect. 5. In Sect. 6, we present the numerical experiments on several applications and make comparisons with the corresponding state-of-the-art methods.

2 Related work

Generally, image decomposition is also known as image layer separation, including reflectance (albedo)–illumination (shading) decomposition and structure (cartoon)–texture decomposition. The former mainly involves intrinsic image decomposition and Retinex-based decomposition designed for image enhancement, surface re-texturing, object insertion, scene relighting, etc., while the latter aims to extract the main information of images and has been successfully applied to HDR image tone mapping, detail enhancement, semantic image segmentation, non-photorealistic abstraction, etc. We explored total generalized variation regularization for Retinex-based decomposition in our previous work [58] and consider the non-convex TGV to separate a single image into structural and textural components in this paper. Therefore, we give a brief overview of related work on structure (cartoon)–texture decomposition in this section. Over the past few decades, many implementations and improvements of structure–texture image decomposition have been studied, and we mainly discuss the three classes of methodologies most relevant to ours: adaptive neighborhood filters, variational frameworks and deep learning-based methods.

Adaptive neighborhood filter Image structure can be obtained easily by filtering out image details, and well-known adaptive filters include the bilateral filter [54], guided filter [22] and non-local filter [6]. To better preserve image edges, extensions appeared subsequently, e.g., the fast bilateral filter [13], gradient domain-guided filter [22] and hashed non-local filter [6]. To suppress halo artifacts around structures, the weighted least squares (WLS) filter [17] was proposed by introducing gradient-based smoothness weights and has been applied successfully to HDR tone mapping and detail manipulation. Inspired by WLS, Li et al. [31] proposed an edge-aware weighting strategy in the paradigm of guided filtering, the tree filter [3] was designed around spatial, range and tree distances to achieve strong image smoothing, and side window filtering (SWF) [70] was introduced to adaptively select the support domain of the filter. In addition, the anisotropic diffusion equation [48] and zero crossings of the derivative of the histogram [26] were proposed in different mathematical forms, and a pyramid of Laplacian filters [47] was presented to characterize edges as large-scale edges or small-scale details. Since the aforementioned methods are not designed to explicitly describe both image structures and textures, their performance on image decomposition is not satisfactory. Recently, more and more works [24, 67, 75, 77] have focused on the spatial scale of regions to distinguish image structures from textures. The rolling guidance filter (RGF) [75] employed scale space theory [33] to remove small-scale edges and iteratively smoothed large-scale edges with a rolling guidance scheme. Jeon et al. [24] used patch-based statistics to find an optimal per-pixel smoothing scale. Zhou et al. [77] measured the scale with a scanline along the gradient direction through iterative global optimization (IGO). Xu et al. [67] adaptively adjusted the window size by the distances from pixels to structures. Neighborhood filters based on spatial scale are intuitive and easy to implement, but they still cannot reliably distinguish image structures from textures, especially for complex structures and textures at multiple scales.

Variational framework Variational models have been widely applied to image decomposition during the last two decades, aiming to find appropriate functional spaces to describe structures and textures, respectively. One of the most notable models is the TV approach [50], and early studies discussed TV-based models characterizing textures by different functional spaces or norms [2], including TV-G [42], TV-div(\(L^{p}\)) [57], TV-\(H^{-1}\) [46], TV-\(H^{-s}\) [32], TV-div(BMO) [30], TV-\(L^{1}\) [72], etc. To model the textural component better, sparse coding [19, 52, 74] and low rank [16] priors were introduced into TV-based models. However, these models often over-smooth the structural component, an inherent property of TV. Beyond TV [50], many other priors have also been proposed. Xu et al. [64] put forward \(L^{0}\) gradient minimization to remove weak edges and retain strong edges. Similarly, Ono [45] formulated the \(L^{0}\)-norm of gradients as a hard constraint solved by an \(L^{0}\) projection algorithm. The relative total variation (RTV) model [66] used a pixel-wise windowed TV normalized by a windowed inherent variation, which is regarded as a major advance in describing image textures. Many scale-aware measures in image decomposition models [21, 24, 67, 77] essentially originate from RTV. Although different scale-aware measures were discussed, the resulting structure usually looks piecewise constant, so structural edges are mistaken for textural details. As a successful modification of TV, total generalized variation (TGV) [4] has been applied to image decomposition to preserve structural edges. Following adaptive TV [34, 37], Xu et al. [63] proposed an adaptive TGV to preserve key features such as object boundaries, and Liu [35] developed a weighted TGV-Gabor model. Following Meyer's TV-G model [42], Xu et al. [62] replaced the TV term in TV-G with a TGV regularization term, aiming to capture the structural component with more details, and Lu et al. [38] used a TGV regularization term to model the structural component and its dual TGV\(^{*}\) for the textural component. Although all of these works employed TGV to reduce the staircase effect and preserve edges, they failed to recover the repetitive or oscillatory nature of textures.

Deep learning-based method Deep learning-based methods have recently been developed for image decomposition. Deep edge-aware filters (DEF) [65] were proposed to approximate various filters with a deep convolutional neural network trained in the gradient domain. Fan et al. [14] addressed the problem by constructing two cascaded sub-networks, namely an edge prediction network and an image reconstruction network. Chen et al. [8] used a fully convolutional network to approximate a wide variety of variational models including TV [50], RTV [66] and \(L^{0}\) [64]. Kim et al. [28] learned the structure prior through a context aggregation network (CAN) and generated training data by adding Gaussian noise to clean image patches. With a similar idea, Lu et al. [39] trained two prediction networks to highlight textural and structural parts on synthetic samples. However, all the above data-driven approaches rely heavily on large amounts of man-made training data. To overcome these difficulties, Fan et al. [15] employed a modified CAN on external samples and Zhou et al. [76] adopted a modified U-net (MU-net) on self-example samples. Although both methods are unsupervised, the learned variational priors still fail to distinguish image structures and textures well, especially in complex cases.

3 NTGV + CSC decomposition model

In this section, we first review TGV and NTGV regularization in Sect. 3.1 and CSC in Sect. 3.2, and then formulate our optimization for structure–texture image decomposition in Sect. 3.3.

3.1 Background of TGV and NTGV

TGV was first proposed by Bredies et al. [4] and has been widely applied in image processing [20, 55, 58] and geometry processing [36], written as follows:

$$\begin{aligned} \begin{aligned} \mathrm {TGV}_{\varvec{\alpha }}^{k}(u) =&\sup _{\nu } \Bigg \{ \int _{\varOmega }u\mathrm {div}^{k}(\nu )\mathrm {d}x \ | \ \nu \in \mathcal {C}_{c}^{k}(\varOmega ,\mathrm {Sym}^{k}(\mathbb {R}^{d})),\\&\Vert \mathrm {div}^{j}(\nu )\Vert _{\infty } \le \alpha _{j},\ \ j = 0,\ldots ,k-1 \Bigg \}, \end{aligned} \end{aligned}$$
(1)

where \(\varvec{\alpha } =(\alpha _{0},\alpha _{1},\ldots ,\alpha _{k-1})\) is a fixed positive vector, \(\mathrm {Sym}^{k}(\mathbb {R}^{d})\) denotes the space of symmetric k-tensors on \(\mathbb {R}^{d}\), i.e., \(\mathrm {Sym}^{k}(\mathbb {R}^{d}) = \{ \xi :\mathbb {R}^{d} \times \cdots \times \mathbb {R}^{d} \rightarrow \mathbb {R} \mid \xi \ \text {is } k\text {-linear and symmetric}\},\) and \(\mathcal {C}_{c}^{k}(\varOmega ,\mathrm {Sym}^{k}(\mathbb {R}^{d}))\) is the space of compactly supported symmetric tensor fields.

Meanwhile, \(\mathrm {TGV}_{\varvec{\alpha }}^{k}\) can be interpreted as a k-fold infimal convolution by employing the Fenchel–Rockafellar duality formula, as follows:

$$\begin{aligned} \mathrm {TGV}_{\varvec{\alpha }}^{k}(u) = \inf _{u_{j}} \sum _{j=1}^{k}\varvec{\alpha }_{k-j}\int _{\varOmega }|\mathcal {E}(u_{j-1})-u_{j}|\mathrm {d}x, \end{aligned}$$
(2)

where \(u_{0}=u\), \(u_{k}=0\), \(u_{j}\in \mathcal {C}_{c}^{k-j}(\varOmega ,\mathrm {Sym}^{j}(\mathbb {R}^{d}))\) for \(j=1,\ldots ,k\), and \(\mathcal {E}\) represents the distributional symmetrized derivative, i.e., \(\mathcal {E}(u_{j-1})=\frac{\nabla u_{j-1} + (\nabla u_{j-1})^{T}}{2}\). Thus, \(\mathrm {TGV}_{\varvec{\alpha }}^{k}\) automatically balances the first to the k-th derivatives of u among themselves and reduces the staircasing effects of the TV.
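For the second-order case \(k=2\), which is the one used throughout this paper, (2) specializes to

$$\begin{aligned} \mathrm {TGV}_{\varvec{\alpha }}^{2}(u) = \inf _{\varvec{p}}\ \alpha _{1}\int _{\varOmega }|\nabla u - \varvec{p}|\mathrm {d}x + \alpha _{0}\int _{\varOmega }|\mathcal {E}(\varvec{p})|\mathrm {d}x, \end{aligned}$$

which is the convex counterpart of the non-convex regularizer (7) employed in our model.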

Recently, studies [25, 43, 44] have demonstrated that the non-convex variant of TGV performs better than the convex one for image restoration by preserving edges; it can be written as follows:

$$\begin{aligned} \mathrm {NTGV}_{\varvec{\alpha }}^{k}(u) = \inf _{u_{j}} \sum _{j=1}^{k}\varvec{\alpha }_{k-j}\int _{\varOmega }\phi _{j}(|\mathcal {E}(u_{j-1})-u_{j}|)\mathrm {d}x, \end{aligned}$$
(3)

where \(\{\phi _{j}\}\) are non-convex prototype functions, such as the regularized \(\ell _{p}\)-norm \(\phi _{j}(s) = (s+\varepsilon _{j})^{p}\) with \(0<p<1\) and \(\varepsilon _{j} >0\), or the \(\log \)-functions \(\phi _{j}(s) = \frac{1}{\mu _{j}}\log (1+\mu _{j} s)\) and \(\phi _{j}(s) = \frac{1}{\mu _{j}}\log (1+\mu _{j} s^2)\) with \(\mu _{j}>0\).

Minimization problems involving the non-convex \(\ell _{p}\)-norm regularizer are hard to solve because finding a limiting supergradient of \(\Vert \cdot \Vert _{p}\) at zero is difficult. Moreover, according to the studies in [25, 44, 56], the \(\log \)-function is well suited to reconstructing images that are piecewise smooth with some sharp jump discontinuities. Structural images are mostly piecewise smooth; thus, we utilize the second-order \(\mathrm {NTGV}\), denoted by \(\mathrm {NTGV}_{\varvec{\alpha }}^{2}\), in this work.
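For concreteness, the log prototype function and its derivative, which later supplies the reweighting factors of the IRLA scheme in Sect. 4, can be sketched as follows (a minimal sketch; `rho` plays the role of \(\mu _{j}\) above):

```python
import numpy as np

def phi_log(s, rho):
    """Non-convex log prototype: phi(s) = log(1 + rho*s) / rho."""
    return np.log1p(rho * s) / rho

def phi_log_grad(s, rho):
    """Derivative phi'(s) = 1 / (1 + rho*s); this is exactly the IRLA
    reweighting factor used in Sect. 4 (cf. d_tilde = 1/(rho1*|d| + 1))."""
    return 1.0 / (1.0 + rho * s)

# The penalty grows sub-linearly, so large jumps (edges) are penalized less:
s = np.linspace(0.0, 5.0, 6)
print(phi_log(s, rho=2.0))       # concave, increasing
print(phi_log_grad(s, rho=2.0))  # weights decay for large |s|
```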

3.2 Background of CSC

Sparse representation encodes a signal vector \(\varvec{y}\) as the linear combination of a few atoms in a dictionary \(\varvec{D}\) by solving the following inverse problem:

$$\begin{aligned} \arg \min _{\varvec{x}}\frac{1}{2}\Vert \varvec{D}\varvec{x} - \varvec{y}\Vert ^{2}_{2} + \lambda \Vert \varvec{x}\Vert _{1}. \end{aligned}$$
(4)

Over the past two decades, this optimization has been solved independently on sets of overlapping image patches covering the image [9, 53] and has achieved state-of-the-art results in various computer vision tasks [41, 49, 61, 68]. As a category of sparse representation, CSC was first proposed by Zeiler et al. [73] to decompose the input image into N sparse feature maps with N filters, instead of sparsely representing a vector by a linear combination of dictionary atoms, replacing (4) with

$$\begin{aligned} \arg \min _{\{\varvec{x}_{m}\}}\frac{1}{2}\Vert \sum _{m}\varvec{d}_{m}*\varvec{x}_{m} - \varvec{y}\Vert ^{2}_{2} + \lambda \sum _{m}\Vert \varvec{x}_{m}\Vert _{1}, \end{aligned}$$
(5)

where \(\{\varvec{d}_{m}\}\) is a set of N dictionary filters, \(*\) denotes convolution, and \(\{\varvec{x}_{m}\}\) is a set of coefficient maps, each of the same size as \(\varvec{y}\).

The convolutional decomposition avoids dividing the whole image into overlapping patches and naturally exploits the consistency prior in the decomposition procedure. It has also seen substantial progress in numerical optimization. Zeiler et al. [73] adopted an alternating minimization over two splitting subproblems, one solving a large linear system with an iterative method and the other reducing to a simple shrinkage. Other algorithms operating in the spatial domain have been proposed, including coordinate descent [27] and a proximal gradient method [7]. Bristow et al. [5] proposed a fast CSC algorithm by exploiting the block circulant with circulant blocks (BCCB) structure of the system matrix in the Fourier domain, and Wohlberg [60] further improved this algorithm by introducing a method for solving the linear systems whose cost is linear, instead of cubic, in the number of filters.
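To make the Fourier-domain idea concrete, the following minimal sketch solves (5) with plain ISTA, evaluating all convolutions via FFTs; the circular boundary handling and the step size bound are our simplifying assumptions, not the algorithms of [5] or [60]:

```python
import numpy as np

def csc_ista(y, filters, lam, n_iter=100):
    """Minimal FFT-domain ISTA for (5): min 0.5*||sum_m d_m * x_m - y||^2
    + lam * sum_m ||x_m||_1, assuming circular convolution so that each
    coefficient map x_m has the size of y."""
    H, W = y.shape
    D = np.stack([np.fft.fft2(d, s=(H, W)) for d in filters])  # filter DFTs, (M, H, W)
    Y = np.fft.fft2(y)
    x = np.zeros((len(filters), H, W))
    # Upper bound on the Lipschitz constant of the data-term gradient:
    # the largest value over frequencies of sum_m |D_m|^2.
    tau = 1.0 / np.max(np.sum(np.abs(D) ** 2, axis=0))
    for _ in range(n_iter):
        X = np.fft.fft2(x)                                     # DFT over the last two axes
        R = np.sum(D * X, axis=0) - Y                          # residual in the Fourier domain
        grad = np.real(np.fft.ifft2(np.conj(D) * R))           # adjoint convolution (correlation)
        x = x - tau * grad                                     # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0)  # soft thresholding
    return x
```

Here `filters` is a list of small 2D kernels; the BCCB structure mentioned above is what makes each convolution a pointwise product after the FFT.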

3.3 Proposed model

Here, we present our NTGV + CSC model for structure–texture decomposition, recovering the structural component with more details by NTGV and the textural component with its repetition property by CSC. For an \(m \times n\) input image \(\varvec{y}\), we characterize its structural part \(\varvec{u}\) with \(\mathrm {NTGV}_{\varvec{\alpha }}^{2}(\varvec{u})\) and approximate its textural part \(\varvec{v}\) by the sum of N convolutions of \(s\times s\) filters \(\varvec{f}_{i}\) with sparse feature maps \(\varvec{Z}_{i}\) of size \((m+s-1)\times (n+s-1)\). The image decomposition is achieved by solving the following objective function:

$$\begin{aligned} \begin{aligned}&\min _{\varvec{u},\varvec{f},\varvec{Z}}\frac{1}{2}\Vert \varvec{y}\!-\!\varvec{u}\!-\!\sum _{i=1}^{N}\varvec{f}_{i}*\varvec{Z}_{i}\Vert _{F}^{2}\!+\!\omega _{s}\mathrm {NTGV_{\varvec{\alpha }}^{2}}(\varvec{u}) \!+ \!\lambda \sum _{i=1}^{N}\Vert \varvec{Z}_{i}\Vert _{1},\\&\ \ {\mathrm{s.t.}} \ \ \Vert \varvec{f}_{i}\Vert _{F}^{2}\le 1 \ \ for \ \ i=1,\ldots ,N \end{aligned}\nonumber \\ \end{aligned}$$
(6)

where \(\lambda \) is a positive parameter controlling the \(L_{1}\) penalty, \(\omega _{s}\) is a weight function defined on each pixel to distinguish structural edges from textural edges, and the inequality constraints on the filters \(\varvec{f}_{i}\) prevent the dictionary from absorbing all of the system's energy. The term \(\frac{1}{2}\Vert \varvec{y}-\varvec{u}-\sum _{i=1}^{N}\varvec{f}_{i}*\varvec{Z}_{i}\Vert _{F}^{2}\) enforces fidelity, and \(\mathrm {NTGV_{\varvec{\alpha }}^{2}}(\varvec{u})\) is minimized over all deformation fields \(\varvec{p}=(p_{1},p_{2})\) on the image domain \(\varOmega \), which reads

$$\begin{aligned} \inf _{\varvec{p} \in \mathcal {C}_{c}^{2}(\varOmega ,\mathbb {R}^{2})}\alpha _{1}\int _{\varOmega }\phi _{1}(|\nabla {\varvec{u}} - \varvec{p}|)\;{\mathrm{d}}x + \alpha _{0}\int _{\varOmega }\phi _{2}(|\mathcal {E}(\varvec{p})|)\;{\mathrm{d}}x, \end{aligned}$$
(7)

where \(\alpha _{0}\) and \(\alpha _{1}\) are positive parameters balancing between the first- and second-order derivative of \(\varvec{u}\), \(\phi _{i}(s)=\frac{1}{\rho _{i}}\log (1+\rho _{i}s)\) with the parameters \(\rho _{i}\!>\!0\) controlling the non-convexity of the first- and second-order regularization terms.

The Setting of \(\omega _{s}\) The weight function \(\omega _{s}\) plays an important role in highlighting pixels on structural or textural edges and guiding a good image decomposition. Following [76], we introduce two measures: a structure-aware measure and a texture-aware measure. The former reveals the main structural information of an image, and the latter describes the repetition property of textures.

Taking advantage of the anisotropy property, structural gradients have a dominant orientation, represented by one of the eigenvectors of the following positive semi-definite matrix \(\mathbf {J}\) [1]:

$$\begin{aligned} \mathbf {J}(i) = \begin{bmatrix} \mathbf {g}_{x}^{T}(i)\mathbf {g}_{x}(i) & \mathbf {g}_{x}^{T}(i)\mathbf {g}_{y}(i)\\ \mathbf {g}_{y}^{T}(i)\mathbf {g}_{x}(i) & \mathbf {g}_{y}^{T}(i)\mathbf {g}_{y}(i)\end{bmatrix}, \end{aligned}$$

where i indicates the location of the patch, and \(\mathbf {g}_{x}\) and \(\mathbf {g}_{y}\) are the vectors collecting the gradients of each pixel in the patch along the abscissa and the ordinate, respectively. The matrix \(\mathbf {J}\) has two non-negative eigenvalues, denoted \(\lambda _{1}\) and \(\lambda _{2}\), and its dominant direction is given by the eigenvector corresponding to the smaller one. For any patch, its anisotropy degree A can be calculated by [23]

$$\begin{aligned} A_{\mathbf {J}} = \frac{\max {(\lambda _{1},\lambda _{2})}-\min {(\lambda _{1},\lambda _{2})}}{\lambda _{1}+\lambda _{2}}, \end{aligned}$$
(8)

where \(\max (\cdot ,\cdot )\) and \(\min (\cdot ,\cdot )\) return the maximum and minimum of their two arguments, respectively. From Eq. (8), \(A_{\mathbf {J}}\) ranges from 0 to 1, and a larger \(A_{\mathbf {J}}\) indicates that the patch is more likely to belong to the image structure. Taking edge strength into consideration, the structure-aware measure \(M_{s}\) is designed as

$$\begin{aligned} M_{s}(i) = A_{\mathbf {J}}(i)\frac{\Vert \nabla {f}(i)\Vert _{2}}{R}, \end{aligned}$$
(9)

where f(i) is the pixel intensity at pixel i, \(\Vert \nabla {f}(i)\Vert _{2}\) denotes the gradient magnitude in the \(L_{2}\) norm, and R is fixed to 255 for 8-bit images.
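A per-patch sketch of the structure-aware measure (8)–(9) might look as follows; the gradient filter and patch traversal are our own simplifications:

```python
import numpy as np

def structure_measure(f, patch=3, R=255.0):
    """Sketch of M_s in (9): anisotropy A_J of the patch structure tensor,
    scaled by the gradient magnitude at the center pixel."""
    gy, gx = np.gradient(f.astype(float))   # per-pixel gradients (rows = y, cols = x)
    H, W = f.shape
    r = patch // 2
    Ms = np.zeros((H, W))
    for i in range(r, H - r):
        for j in range(r, W - r):
            px = gx[i - r:i + r + 1, j - r:j + r + 1].ravel()
            py = gy[i - r:i + r + 1, j - r:j + r + 1].ravel()
            J = np.array([[px @ px, px @ py],
                          [py @ px, py @ py]])               # structure tensor of the patch
            lam = np.linalg.eigvalsh(J)                      # two non-negative eigenvalues
            A = (lam.max() - lam.min()) / max(lam.sum(), 1e-12)   # Eq. (8)
            Ms[i, j] = A * np.hypot(gx[i, j], gy[i, j]) / R       # Eq. (9)
    return Ms
```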

To define the texture-aware measure, we employ the histogram of oriented gradients (HOG) [10] for feature extraction, as in [76]. Specifically, given a pixel i, we detect repetitive patterns by going through all its nearby pixels and calculate the texture-aware measure \(M_{t}\) as follows:

$$\begin{aligned} M_{t}(i) = \frac{1}{\# (C_{N(i)})}\sum _{j\in C_{N(i)}}\cos \theta _{ij}\exp (-D_{KL}(\mathbf {h}_{j}||\mathbf {h}_{i})), \end{aligned}$$
(10)

where \(\theta _{ij}\) is the angle between the edge direction \(\mathbf {v}_{ij}\) and the gradient direction, \(D_{KL}(\cdot ||\cdot )\) denotes the Kullback–Leibler (KL) divergence, \(\mathbf {h}_{i}\) and \(\mathbf {h}_{j}\) are the HOG features of pixels i and j, respectively, and \(\# (C_{N(i)})\) counts the elements of the set consisting of the pixels with noticeable gradients in the neighborhood of pixel i.
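A heavily simplified sketch of (10) is given below; we stand in an 8-bin orientation histogram for the full HOG descriptor, take \(\theta _{ij}\) as the angle between the displacement from i to j and the gradient direction at i, and threshold gradient magnitudes to form \(C_{N(i)}\). All of these choices are our reading of [76], not a verbatim reimplementation:

```python
import numpy as np

def orientation_hist(gx, gy, i, j, r=2, bins=8):
    """8-bin histogram of gradient orientations in a (2r+1)x(2r+1) patch
    (a lightweight stand-in for the full HOG descriptor)."""
    ang = np.arctan2(gy[i - r:i + r + 1, j - r:j + r + 1],
                     gx[i - r:i + r + 1, j - r:j + r + 1]).ravel()
    h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return h / max(h.sum(), 1)

def texture_measure_at(f, i, j, radius=6, tau=1.0, eps=1e-6):
    """Sketch of M_t(i) in (10) at pixel (i, j)."""
    gy, gx = np.gradient(f.astype(float))
    H, W = f.shape
    hi = orientation_hist(gx, gy, i, j) + eps
    vals = []
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ii, jj = i + di, j + dj
            if (di, dj) == (0, 0) or not (2 <= ii < H - 2 and 2 <= jj < W - 2):
                continue
            if np.hypot(gx[ii, jj], gy[ii, jj]) < tau:
                continue                        # C_N(i): neighbors with noticeable gradients
            hj = orientation_hist(gx, gy, ii, jj) + eps
            kl = np.sum(hj * np.log(hj / hi))   # D_KL(h_j || h_i)
            v = np.array([dj, di]) / np.hypot(di, dj)      # assumed direction i -> j
            g = np.array([gx[i, j], gy[i, j]])
            cos_t = abs(v @ g) / max(np.hypot(*g), eps)    # assumed cos(theta_ij)
            vals.append(cos_t * np.exp(-kl))
    return float(np.mean(vals)) if vals else 0.0
```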

Based on the above, it is straightforward to combine the structure-aware and texture-aware measures to guide our decomposition instead of using the same weight for all pixels. For an edge pixel on a structural edge, i.e., \(M_{s}\) is high and \(M_{t}\) is low, we want the edge to be preserved as sharply as in the input by reducing the impact of \(\mathrm {NTGV}_{\varvec{\alpha }}^{2}(\varvec{u})\). For an edge pixel on a textural edge, i.e., \(M_{s}\) is low and \(M_{t}\) is high, we want the edge to be removed by increasing the impact of \(\mathrm {NTGV}_{\varvec{\alpha }}^{2}(\varvec{u})\). So the weight function \(\omega _{s}\) at pixel i is defined as

$$\begin{aligned} \omega _{s}(i) = (1-t)(1-M_{s}(i)) + tM_{t}(i), \ \ 0<t<1. \end{aligned}$$
(11)

4 Optimization algorithm for image decomposition

In this section, we develop an efficient implementation for solving problem (6). First of all, we define linear operators \(\varvec{F}_{i}\) such that \(\varvec{F}_{i}\varvec{z}_{i}=\varvec{f}_{i}*\varvec{Z}_{i}\), where \(\varvec{z}_{i}\) is the vectorization of the feature map \(\varvec{Z}_{i}\). Based on this, problem (6) is equivalent to

$$\begin{aligned} \begin{aligned}&\min _{\varvec{u},\varvec{f},\varvec{z}}\frac{1}{2}\Vert \varvec{y}-\varvec{u}-\sum _{i=1}^{N}\varvec{F}_{i}\varvec{z}_{i}\Vert _{F}^{2}+\omega _{s}\mathrm {NTGV_{\varvec{\alpha }}^{2}}(\varvec{u}) + \lambda \sum _{i=1}^{N}\Vert \varvec{z}_{i}\Vert _{1},\\&\ \ {\mathrm{s.t.}} \ \ \Vert \varvec{f}_{i}\Vert _{F}^{2}\le 1 \ \ for \ \ i=1,\ldots ,N \end{aligned} \end{aligned}$$
(12)

where \(\varvec{F}_{i}\) is the BCCB matrix corresponding to filter \(\varvec{f}_{i}\). Since the three unknowns \(\varvec{u}\), \(\varvec{f}\) and \(\varvec{z}\) are coupled in (12), our idea is to update the three variables alternately, and the details of each subproblem are summarized in Algorithm 1.

[Algorithm 1: alternating minimization scheme for solving problem (12)]
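The outer loop of Algorithm 1 can be sketched as follows; `update_u`, `update_z`, `update_f`, `init_pca_filters` and `reconstruct` are hypothetical names standing for the subproblem solvers detailed below:

```python
import numpy as np

def decompose(y, n_filters=7, filter_size=5, lam=1e-2, max_iter=10):
    """Sketch of Algorithm 1: alternately update u (structure), z (feature
    maps) and f (filters). update_u / update_z / update_f solve (13), (21)
    and (22); init_pca_filters builds the PCA dictionary of the residual and
    reconstruct forms v = sum_i f_i * Z_i (all placeholders for the steps
    described in this section)."""
    v = np.zeros_like(y)                 # textural part, initialized to zero
    u = update_u(y - v)                  # NTGV subproblem (Algorithm 2)
    f = init_pca_filters(y - u, n_filters, filter_size)
    z = update_z(y - u, f, lam)          # convolutional sparse coding (21)
    for _ in range(max_iter):
        v = reconstruct(f, z)
        u = update_u(y - v)
        z = update_z(y - u, f, lam)
        f = update_f(y - u, z)           # constrained least squares (22)
    return u, reconstruct(f, z)
```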

Updating \(\varvec{u}\) Fixing \(\{\varvec{f}_{i}\}_{i=1,\ldots ,N}\) and \(\{\varvec{z}_{i}\}_{i=1,\ldots ,N}\), we solve the \(\varvec{u}\) subproblem by optimizing the following problem:

$$\begin{aligned} \min _{\varvec{u}}\frac{1}{2}\Vert \varvec{y}-\varvec{u}-\sum _{i=1}^{N}\varvec{F}_{i}\varvec{z}_{i}\Vert _{F}^{2}+\omega _{s}\mathrm {NTGV_{\varvec{\alpha }}^{2}}(\varvec{u}), \end{aligned}$$
(13)

where \(\mathrm {NTGV_{\varvec{\alpha }}^{2}}(\varvec{u})\) is introduced in Sect. 3.1. Denoting \(\varvec{x} = \varvec{y}-\sum _{i=1}^{N}\varvec{F}_{i}\varvec{z}_{i}\), we adopt the iteratively reweighted \(l_{1}\) algorithm (IRLA) [43]. Introducing auxiliary variables \(\varvec{d}\) and \(\varvec{w}\), we reformulate the unconstrained problem (13) into the following constrained problem:

$$\begin{aligned} \begin{aligned}&\min _{\varvec{u},\varvec{p},\varvec{d},\varvec{w}}\frac{1}{2}\Vert \varvec{x}-\varvec{u}\Vert _{F}^{2}+\omega _{s}\alpha _{1}\phi _{1}(|\varvec{d}|) + \omega _{s}\alpha _{0}\phi _{2}(|\varvec{w}|)\\&\ \ {\mathrm{s.t.}} \ \ \varvec{d} = \nabla \varvec{u}-\varvec{p}, \varvec{w}= \mathcal {E}(\varvec{p}). \end{aligned} \end{aligned}$$
(14)

Hence, the IRLA can be applied to (14), and a convex minimization problem to update \((\varvec{u}^{k+1}, \varvec{p}^{k+1}, \varvec{d}^{k+1},\) \(\varvec{w}^{k+1})\) is given by

$$\begin{aligned} \begin{aligned}&\min _{\varvec{u},\varvec{p},\varvec{d},\varvec{w}}\left\{ \frac{1}{2}\Vert \varvec{x}-\varvec{u}\Vert _{F}^{2}+\omega _{s}\alpha _{1}\langle \tilde{\varvec{d}^{k}},|\varvec{d}|\rangle +\omega _{s}\alpha _{0}\langle \tilde{\varvec{w}^{k}},|\varvec{w}|\rangle \right\} \\&\ \ {\mathrm{s.t.}} \ \ \varvec{d} = \nabla \varvec{u}-\varvec{p}, \varvec{w}= \mathcal {E}(\varvec{p}). \end{aligned} \end{aligned}$$
(15)

where \(\tilde{\varvec{d}^{k}} = \frac{1}{\rho _{1}|\varvec{d}^{k}|+1}\) and \(\tilde{\varvec{w}^{k}} = \frac{1}{\rho _{2}|\varvec{w}^{k}|+1}\).

Problem (15) can be solved by the ADMM algorithm. With multipliers \(\lambda _{1},\lambda _{2}\), penalty parameter \(\mu \) and relaxation parameter \(\gamma \), one ADMM sweep yields

$$\begin{aligned} \left\{ \begin{aligned} \varvec{d}^{k+1}&= \arg \min _{\varvec{d}}\ \omega _{s}\alpha _{1}\langle \tilde{\varvec{d}^{k}},|\varvec{d}|\rangle + \frac{\mu }{2}\Big \Vert \varvec{d}-\Big (\nabla \varvec{u}^{k}-\varvec{p}^{k}+\frac{\lambda _{1}^{k}}{\mu }\Big )\Big \Vert _{2}^{2},\\ \varvec{w}^{k+1}&= \arg \min _{\varvec{w}}\ \omega _{s}\alpha _{0}\langle \tilde{\varvec{w}^{k}},|\varvec{w}|\rangle + \frac{\mu }{2}\Big \Vert \varvec{w}-\Big (\mathcal {E}(\varvec{p}^{k})+\frac{\lambda _{2}^{k}}{\mu }\Big )\Big \Vert _{2}^{2},\\ (\varvec{u}^{k+1},\varvec{p}^{k+1})&= \arg \min _{\varvec{u},\varvec{p}}\ \frac{1}{2}\Vert \varvec{x}-\varvec{u}\Vert _{F}^{2} + \frac{\mu }{2}\Big \Vert \nabla \varvec{u}-\varvec{p}-\varvec{d}^{k+1}+\frac{\lambda _{1}^{k}}{\mu }\Big \Vert _{2}^{2} + \frac{\mu }{2}\Big \Vert \mathcal {E}(\varvec{p})-\varvec{w}^{k+1}+\frac{\lambda _{2}^{k}}{\mu }\Big \Vert _{2}^{2},\\ \lambda _{1}^{k+1}&= \lambda _{1}^{k} + \gamma \mu \big (\nabla \varvec{u}^{k+1}-\varvec{p}^{k+1}-\varvec{d}^{k+1}\big ),\\ \lambda _{2}^{k+1}&= \lambda _{2}^{k} + \gamma \mu \big (\mathcal {E}(\varvec{p}^{k+1})-\varvec{w}^{k+1}\big ), \end{aligned} \right. \end{aligned}$$
(16)

where the penalty parameter \(\mu \) is a fixed constant, and the relaxation parameter \(\gamma \) is required for the convergence of the ADMM algorithm.

To solve the above problem, we first utilize forward differences to approximate the gradient \(\nabla \) and the symmetrized derivative \(\mathcal {E}\) in the following, respectively:

$$\begin{aligned} \nabla \varvec{u} = \begin{pmatrix} D_{x}\varvec{u}\\ D_{y}\varvec{u}\end{pmatrix}, \quad \mathcal {E}(\varvec{p}) = \begin{pmatrix} D_{x}p_{1} & \frac{1}{2}(D_{y}p_{1}+D_{x}p_{2})\\ \frac{1}{2}(D_{y}p_{1}+D_{x}p_{2}) & D_{y}p_{2}\end{pmatrix}, \end{aligned}$$

where \(D_{x}\) and \(D_{y}\) are the circulant matrices corresponding to the forward finite difference operators with periodic boundary conditions along the x-axis and y-axis, respectively, and \(D = (D_{x}^{\intercal },D_{y}^{\intercal })^{\intercal }\).

The subproblems for \(\varvec{d}\) and \(\varvec{w}\) in (16) have closed-form solutions given by the shrink operator as follows:

$$\begin{aligned}&\varvec{d}^{k+1} = \mathrm{shrink}\left( \frac{\lambda ^{k}_{1}}{\mu } + \nabla {\varvec{u}^{k}} - \varvec{p}^{k},\frac{\omega _{s}\alpha _{1}\tilde{\varvec{d}^{k}}}{\mu }\right) , \end{aligned}$$
(17)
$$\begin{aligned}&\varvec{w}^{k+1} = \mathrm{shrink}\left( \frac{\lambda ^{k}_{2}}{\mu } + \mathcal {E}(\varvec{p}^{k}),\frac{\omega _{s}\alpha _{0}\tilde{\varvec{w}^{k}}}{\mu }\right) , \end{aligned}$$
(18)

where the shrink operator is defined as \(\mathrm{shrink}(\varvec{a},b) = \frac{\varvec{a}}{|\varvec{a}|}\max (|\varvec{a}|-b,0)\).
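In code, the (isotropic, vector-valued) shrink operator reads, under the convention above:

```python
import numpy as np

def shrink(a, b):
    """Soft shrinkage shrink(a, b) = a/|a| * max(|a| - b, 0), applied
    pixel-wise: `a` stacks the vector components in its last axis, and
    `b` is a scalar or per-pixel threshold (e.g., omega_s*alpha1*d_tilde/mu)."""
    b = np.asarray(b)
    mag = np.sqrt(np.sum(a ** 2, axis=-1, keepdims=True))   # per-pixel magnitude |a|
    scale = np.maximum(mag - b[..., None], 0.0) / np.maximum(mag, 1e-12)
    return a * scale
```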

The subproblems for \(\varvec{u}\) and \(\varvec{p}\) in (16) are both least-squares problems, and their corresponding normal equations are as follows:

$$\begin{aligned}&(\mu D^{\intercal }D + I)\varvec{u}^{k+1} = \varvec{x}-D^{\intercal }\lambda ^{k}_{1}\nonumber \\&\quad + \mu D^{\intercal }(\varvec{d}^{k+1} + \varvec{p}^{k+1}), \end{aligned}$$
(19)
$$\begin{aligned}&(\mathcal {E}^{*}\mathcal {E}+I)\varvec{p}^{k+1} = \mathcal {E}^{*}(\varvec{w}^{k+1}-\frac{\lambda ^{k}_{2}}{\mu })+\frac{\lambda ^{k}_{1}}{\mu }\nonumber \\&\quad +(D\varvec{u}^{k+1}-\varvec{d}^{k+1}). \end{aligned}$$
(20)

where \(\mathcal {E}^{*}\) denotes the adjoint operator of \(\mathcal {E}\).

The block matrices \(\mu D^{\intercal }D + I\) and \(\mathcal {E}^{*}\mathcal {E}+I\) can be diagonalized by the Fourier transform under the periodic boundary condition. Thus, the solutions \(\varvec{u}^{k+1}\) and \(\varvec{p}^{k+1}\) in our algorithm can be obtained explicitly and easily using the Fourier transform and the block matrix inversion formula.
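For instance, the diagonalization of the \(\mu D^{\intercal }D + I\) block might be coded as follows; the full coupled \((\varvec{u},\varvec{p})\) system is handled analogously via the block inversion formula, which this sketch omits:

```python
import numpy as np

def solve_u_block(rhs, mu):
    """Solve (mu * D^T D + I) u = rhs under periodic boundaries via FFT.
    D stacks forward differences along x and y, so D^T D is diagonalized
    by the DFT with eigenvalues |Dx_hat|^2 + |Dy_hat|^2."""
    H, W = rhs.shape
    # Forward-difference kernels u(x+1) - u(x), placed on the periodic grid.
    dx = np.zeros((H, W)); dx[0, 0], dx[0, -1] = -1.0, 1.0
    dy = np.zeros((H, W)); dy[0, 0], dy[-1, 0] = -1.0, 1.0
    denom = mu * (np.abs(np.fft.fft2(dx)) ** 2 +
                  np.abs(np.fft.fft2(dy)) ** 2) + 1.0
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
```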

Finally, the IRLA with the ADMM algorithm for the \(\varvec{u}\)-step is summarized in Algorithm 2.

[Algorithm 2: IRLA with ADMM for the \(\varvec{u}\) subproblem (13)]

Updating \(\varvec{z}\) Fixing the structural part \(\varvec{u}\) and the filters \(\{\varvec{f}_{i}\}_{i=1,\ldots ,N}\), we solve the following subproblem to obtain \(\varvec{z}\):

$$\begin{aligned} \min _{\{\varvec{z}_{i}\}}\frac{1}{2}\Vert \varvec{y}-\varvec{u}-\sum _{i=1}^{N}\varvec{F}_{i}\varvec{z}_{i}\Vert _{F}^{2} + \lambda \sum _{i=1}^{N}\Vert \varvec{z}_{i}\Vert _{1}. \end{aligned}$$
(21)

The optimization in (21) is a standard convolutional sparse coding problem, which can be solved by the ADMM algorithm in the Fourier domain to substantially reduce the computational cost [60].
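In practice, (21) can also be handed to an off-the-shelf solver; one possibility (our choice of tooling, not necessarily the authors') is the SPORCO implementation of Wohlberg's Fourier-domain ADMM [60]:

```python
import numpy as np
from sporco.admm import cbpdn

# Stand-in data: a residual image y - u and 7 random 5x5 filters
# (in our pipeline these would be the PCA-initialized dictionary).
y_minus_u = np.random.randn(128, 128)
D = np.random.randn(5, 5, 7)

opt = cbpdn.ConvBPDN.Options({'MaxMainIter': 200, 'RelStopTol': 1e-4})
solver = cbpdn.ConvBPDN(D, y_minus_u, lmbda=1e-2, opt=opt)
Z = solver.solve()                   # sparse coefficient maps, one per filter
v = solver.reconstruct().squeeze()   # textural estimate sum_i f_i * Z_i
```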

Updating \(\varvec{f}\) To recover \(\varvec{f}\), we first apply an efficient variable reordering \(\sum _{i=1}^{N}\varvec{F}_{i}\varvec{z}_{i} = \hat{\varvec{Z}}\varvec{f}\), where \(\varvec{f}\) is the vectorization of all the filters \(\varvec{f}_{i}, {i=1,\ldots ,N}\), \(\hat{\varvec{Z}} = [\hat{\varvec{Z}_{1}},\ldots ,\hat{\varvec{Z}_{i}},\ldots ,\hat{\varvec{Z}_{N}}]\), and \(\hat{\varvec{Z}_{i}}\) is generated by collecting the patches in \(\varvec{Z}_{i}\). The \(\varvec{f}\) subproblem can then be re-written in the following equivalent form:

$$\begin{aligned} \begin{aligned}&\min _{\varvec{f}}\frac{1}{2}\Vert \varvec{y}-\varvec{u}-\hat{\varvec{Z}}\varvec{f}\Vert _{F}^{2},\\&\ \ {\mathrm{s.t.}} \ \ \Vert \varvec{f}_{i}\Vert _{F}^{2}\le 1 \ \ for \ \ i=1,\ldots ,N \end{aligned} \end{aligned}$$
(22)

Following the optimization in JCAS [19], we adopt a proximal gradient descent scheme to solve (22), namely

$$\begin{aligned} \varvec{f}^{t+1} = \mathrm{Prox}_{\Vert \cdot \Vert \le 1}\left( \varvec{f}^{t}-\tau \hat{\varvec{Z}}^{\intercal }(\hat{\varvec{Z}}\varvec{f}^{t}-\varvec{y}+\varvec{u})\right) . \end{aligned}$$
(23)
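The gradient step and the projection in (23) might be implemented as below; expressing the data term through correlations with the current residual rather than forming \(\hat{\varvec{Z}}\) explicitly is an implementation choice on our part:

```python
import numpy as np
from scipy.signal import fftconvolve

def update_filters(r, Z, f, tau=1e-3, n_iter=50):
    """Sketch of (23): proximal gradient on the filters with projection onto
    the unit Frobenius ball. r = y - u, Z is a list of coefficient maps, f a
    list of s x s filters; 'valid' convolution makes fftconvolve(Z_i, f_i)
    match the size of r. tau must be small enough (<= 1/||Z_hat||^2)."""
    for _ in range(n_iter):
        e = r - sum(fftconvolve(Zi, fi, mode='valid') for Zi, fi in zip(Z, f))
        new_f = []
        for Zi, fi in zip(Z, f):
            # Gradient of 0.5*||e||^2 w.r.t. f_i is minus the correlation of
            # Z_i with the residual e, restricted to the filter support.
            g = fftconvolve(Zi[::-1, ::-1], e, mode='valid')
            fi = fi + tau * g                                # gradient descent step
            new_f.append(fi / max(np.linalg.norm(fi), 1.0))  # project onto ||f_i||_F <= 1
        f = new_f
    return f
```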

Since our proposed model is non-convex, it is important to set an appropriate initialization and update order of the variables to ensure the convergence of the algorithm. In our implementation, we initialize the textural part as an all-zero matrix and solve the \(\varvec{u}^{1}\) subproblem first. Then, we initialize the filters \(\{\varvec{f}^{1}_{i}\}_{i=1,\ldots ,N}\) as the PCA dictionary of the residual image \(\varvec{y}-\varvec{u}^{1}\). Based on \(\varvec{u}^{1}\) and \(\varvec{f}^{1}\), we recover the feature maps \(\varvec{z}^{1}\) by solving the convolutional sparse coding problem (21). These steps guarantee that texture details are estimated gradually from the residual image while the details of the structural part are kept.
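The PCA initialization of the filters might be sketched as follows; the patch stride and mean removal are our assumptions:

```python
import numpy as np

def init_pca_filters(residual, n_filters=7, s=5):
    """Initialize filters as the top principal components of s x s patches
    of the residual image y - u^1 (a sketch of the strategy above)."""
    res = residual.astype(float)
    H, W = res.shape
    patches = np.stack([res[i:i + s, j:j + s].ravel()
                        for i in range(0, H - s + 1, 2)
                        for j in range(0, W - s + 1, 2)])
    patches -= patches.mean(axis=0)                  # center the data
    _, _, Vt = np.linalg.svd(patches, full_matrices=False)
    # Principal components are unit-norm, so ||f_i||_F <= 1 holds at init.
    return [Vt[k].reshape(s, s) for k in range(n_filters)]
```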

5 Analysis

5.1 Implementation details

The proposed method is implemented in MATLAB R2018b on a Windows 10 platform with an Intel Core i7 at 3.2 GHz and 8 GB RAM. We fix the regularization parameter \(\lambda =10^{-2}\) in the objective function, as discussed in Sect. 5.4. For the parameters in Algorithm 2, the default setting is as follows: \(\alpha _{0} = 0.1\), \(\alpha _{1} = 0.2\), \(\mu = 10\), \(\gamma = 1.618\), \(\rho _{1} = 0.005\), \(\rho _{2} = 0.2\) and \(\epsilon _{3} = 5\times 10^{-4}\). For \(\{\varvec{f}_{i}\}_{i=1,\ldots ,N}\), we fix the size and number of filters to \(5\times 5\) and 7, respectively. To calculate \(\omega _{s}\), we set the balance parameter t to 0.6, the patch size of \(M_{s}\) to 3 and the neighborhood size of \(M_{t}\) to 6. Since our idea is to remove textural edges from the structural image gradually, we update \(M_{s}\) during the iterations and gradually reduce the impact of the second term by decreasing t in \(\omega _{s}\) of (11). In our implementation, \(M_{t}\) is calculated once from the input, while \(M_{s}\) is re-estimated from the updated structural result of the previous step.

5.2 The detail-preserving property of our model

It is worth noting how our model removes textural edges while keeping the sharp features of the structural component. Firstly, the weight function \(\omega _{s}\) plays an important role in locating structural and textural edges, providing good guidance for our model to distinguish image structures from textures. Secondly, the non-convex TGV constructs smooth regions without staircasing effects while preserving edges and details, especially for images with rich structures and strong edges or complex textures. Based on the above, our model can better preserve details under the guidance of \(\omega _{s}\). Here, we compare our proposed method with the two most related works. Lu et al. [38] adopted the convex TGV to capture structural edges, and Gu et al. [19] employed synthesis sparse representation to capture textural edges; each method focuses on recovering either structural or textural edges, but cannot pursue both. We take both advantages into account and additionally use the weight function \(\omega _{s}\) to guide a good image decomposition. A visual example is given in Fig. 1, where the regions in red boxes are highlighted. The input images are given in Fig. 1a, and Fig. 1b–e show the image structures from different methods. From Fig. 1b, we can see that TGV-TGV\(^{*}\) cannot remove the textural edges completely from the structural component because it ignores the repetition property of textures. For JCAS in Fig. 1c, small-scale structures are blurred or even removed because of the staircase effects of TV, as shown in the highlighted parts. Figure 1d shows the results of our decomposition model without \(\omega _{s}\): NTGV greatly improves the detail-preserving property compared with JCAS. The best results are achieved by adding the weight function \(\omega _{s}\), shown in Fig. 1e, from which we can observe that the structural edges are much sharper and the textural edges are removed cleanly.

Fig. 1

Illustration on the detail-preserving property of our model. a input images; b image structures obtained by TGV–TGV\(^{*}\) [38] where TGV regularization and its dual TGV\(^{*}\) were adopted to model structural and textural component, respectively; c image structures obtained by JCAS [19] where ASR and CSC priors were integrated to separate two image layers, respectively; d image structures obtained by our decomposition model without \(\omega _{s}\); e image structures obtained by our proposed model. Our proposed model achieves more accurate and sharper structures and removes multiple-scale or extremely varying textures

5.3 Convergence analysis

Since the three unknowns \(\varvec{u},\varvec{f},\varvec{z}\) are coupled in (6), it is effective to apply an inner alternating minimization scheme that decreases the energy monotonically. However, it should be mentioned that such an alternating scheme does not guarantee that the whole sequence converges to the minimizer of the original problem. For fixed \(\varvec{f},\varvec{z}\), problem (14) is non-convex in \(\varvec{u}\), and we can obtain a partial convergence of the IRLA with the ADMM algorithm in (15) (see [25, 43, 44] for more details). That is, the sequence \(\{\varvec{u}^{k}, \varvec{p}^{k}, \varvec{d}^{k}, \varvec{w}^{k}\}\) generated by (15) is bounded and has at least one accumulation point. However, since the log-function in (14) is non-convex, we cannot ensure the global convergence of the algorithm in (15). For fixed \(\varvec{u},\varvec{f}\), the problem is convex in \(\varvec{z}\), and for fixed \(\varvec{u},\varvec{z}\), the problem is convex in \(\varvec{f}\). Since our objective function is bounded below by 0, the \(\varvec{f}\) and \(\varvec{z}\) subproblems are guaranteed to converge to their respective minima.

Fig. 2

Illustration on the convergence of Algorithm 1. Our algorithm gradually separates textures from the input image and converges to a fixed point within a few iterations

Here, we compute the average per-pixel intensity differences during the iterations. Figure 2a shows how the differences \(\Vert \varvec{u}^{k+1} - \varvec{u}^{k}\Vert ^{2}\) and \(\Vert \varvec{y}-\varvec{u}^{k}-\varvec{v}^{k}\Vert ^{2}\) change over the iterations. From the figure, we can see that our algorithm converges to a fixed point within a few iterations, and both differences are less than \(1.0\times 10^{-5}\) after 10 iterations. Meanwhile, Fig. 2b–d shows that textural edges are gradually removed from the structural component during the iterations.

5.4 Parameters impact

The regularization parameter \(\lambda \), which controls the sparsity of repetitive textures, is important for distinguishing structural edges from textural edges. In other words, a larger value of \(\lambda \) produces sparser textures and weaker smoothing effects (and vice versa). We show the resulting structural image \(\varvec{u}\) for various choices of \(\lambda \) in Fig. 3. The number of iterations K is fixed to 12 and the feature maps are initialized with zeros in all cases. As can be observed in Fig. 3, textures are removed from the structural part gradually as \(\lambda \) decreases. Unlike JCAS [19], our method avoids over-smoothing the structural part thanks to the weight function \(\omega _{s}\) and achieves almost the same results even when \(\lambda \) is less than \(10^{-2}\), as shown in Fig. 3e, f. Therefore, we fix \(\lambda \) to \(10^{-2}\) for almost all examples. Furthermore, compared with STD [76] in Fig. 3b, both results in Fig. 3e, f contain sharper details and overcome the staircase effect of TV regularization.

Fig. 3

Impact of parameter \(\lambda \). a Input image; b structural image obtained by STD [76]; c–f structural images obtained by our method with various choices of \(\lambda \). A smaller value of \(\lambda \) removes more textures, while almost identical results are achieved when \(\lambda \) is less than or equal to \(10^{-2}\)

Table 1 Average running time of different methods on 80 images of size \(350\times 350\)

5.5 Runtime

Our algorithm is an alternating minimization scheme updating \(\varvec{u}\), \(\varvec{f}\) and \(\varvec{z}\), and the most time-consuming part is the \(\varvec{u}\) update, which consists of four subproblems. We compare the runtime of our method with those of several decomposition methods, including RTV [66], JCAS [19], IUL [15], SWF [70] and STD [76], and measure the runtime of the proposed method using the gputimeit function built into MATLAB. We select 80 images of the same size \(350\times 350\) and record the average running time over these images on the same computing machine in Table 1. Among the methods, IUL is the fastest thanks to its single forward propagation, and RTV is second since it only optimizes the structure. The efficiency of STD depends on the initialization of the network: the scheme using standard initialization is time-consuming, while the one using a pre-trained MU-net is comparable. Similar to traditional filters, SWF needs to perform calculations over multiple windows, so its computational cost is higher. JCAS and our method are both time-consuming, mainly because both estimate the structural and textural images via an alternating minimization scheme. However, both methods converge quickly within several iterations, so we set the maximum number of iterations to 10. With code optimization and a C implementation, the computational speed could be improved significantly.

6 Applications

In this section, we demonstrate the effectiveness of our approach on several image decomposition applications, including texture removal, HDR tone mapping, detail manipulation and non-photorealistic abstraction. Meanwhile, we compare the proposed method with several state-of-the-art methods in each application. We implemented the code of LowRank [16] ourselves, while the results of the other competing methods are obtained from the source codes or already-trained models released by the original authors, with parameters set to be optimal according to the original papers.

6.1 Texture removal

Texture removal is a direct application of image decomposition, extensively used in image analysis, recognition and composition. To demonstrate the performance of our proposed model on texture removal, we compare it with seven popular methods. Among these competitors, RTV, LowRank and JCAS are variational models, RGF and SWF are filtering-based methods, and IUL and STD are two unsupervised DNN-based methods. We use test images containing different kinds of textures in Fig. 4, and the comparison results of the different methods are presented in Figs. 5 and 6.

Fig. 4

Test images for the texture removal

Fig. 5

Texture removal results from different methods in Fig. 4a. For each method, the first row is the result of the texture removal and the second row is the result of the corresponding texture. The corresponding zoomed parts are shown with enlargements in red rectangles. Our method effectively preserves structural edges (first row), removes textural edges (second row) and does not suffer from staircasing artifacts

Fig. 6

More comparisons of the texture removal in Fig. 4b–c. The corresponding zoomed parts are shown with enlargements in red rectangles. Our method removes textural edges from image structures clearly and keeps much sharper structural edges

Fig. 7

Comparison of several HDR tone-mapping methods. The corresponding zoomed parts are shown with enlargements in red rectangles. Our method achieves the best balance between detail-preserving and naturalness

In Fig. 5, we present both the structural and textural components to visualize where textures are removed. The corresponding zoomed parts are shown with enlargements in red rectangles. One can observe that, for most competitors, some structural edges are treated as textural edges and removed into the textural components. For example, the abdomen region of the man and the eye region of the horse are highlighted in Fig. 5. Visually, both parts belong to structures, but they are removed from the structures, leading to the smooth and blurred results in Fig. 5a–g. In Fig. 5h, our method removes only the square patterns and effectively preserves these structural edges.

For the other examples, we only exhibit the texture removal results in Fig. 6 to save space. From Fig. 6a, we observe that RTV always fails to distinguish small-scale structures from textures and blurs the structural parts. The results of the LowRank method always contain textures at different scales and blocking artifacts, indicating that low-rank theory alone is insufficient to extract complex textures, as illustrated in Fig. 6b. Suffering from the staircase effect of TV, the JCAS and STD methods always remove small-scale structures and produce piecewise constant results, as shown in Fig. 6c, e. The drawback of the SWF method is that some complex textures surrounding structural edges are incorrectly preserved, as shown in Fig. 6d. Our results are exhibited in Fig. 6f, from which we can observe that image textures are removed cleanly from the image structures and structural edges are separated clearly and sharply.

6.2 HDR tone mapping

Tone mapping aims to compress the intensity of a high dynamic range (HDR) image while preserving its details and colors. To apply our method to the HDR tone mapping problem, we first compute the luminance image as \(\mathbf {l}^{in} = 0.299 \mathbf {r}^{in} + 0.587 \mathbf {g}^{in} + 0.114\mathbf {b}^{in}\); then, we apply our model to decompose the logarithm of the luminance \(\mathbf {l}^{in}\) into base and detail components. The output luminance \(\mathbf {l}^{out}\) is obtained as the detail component plus the base component compressed with a factor of 0.4. Finally, we adopt the color restoration of [18] to reproduce the chrominance information.
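This pipeline can be summarized by the following sketch; `decompose` stands for the NTGV + CSC solver above, and the simple exponent-based color restoration with parameter `sat` is a common stand-in for [18], both assumptions on our part:

```python
import numpy as np

def tone_map(rgb, decompose, compress=0.4, sat=0.6, eps=1e-6):
    """HDR tone mapping via structure-texture decomposition of log-luminance."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = 0.299 * r + 0.587 * g + 0.114 * b
    base, detail = decompose(np.log(lum + eps))  # base = structure, detail = texture
    l_out = np.exp(compress * base + detail)     # compress only the base layer
    # Simple color restoration: scale chrominance ratios (stand-in for [18]).
    ratio = (rgb / (lum[..., None] + eps)) ** sat
    return np.clip(ratio * l_out[..., None], 0.0, 1.0)
```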

We compare our method with state-of-the-art tone mapping methods, including filtering-based methods [13, 17], an optimization-based method [19] and a DNN-based method [76]. With a better image decomposition result, tone-mapped images can better preserve structural details while remaining natural-looking. Visual examples of tone mapping results are provided in Fig. 7. As can be seen, the results of Durand and WLS are over-enhanced and look unnatural, as shown in Fig. 7b, c. By contrast, the results from JCAS and STD do not suffer from this problem, but they always lose structural details in relatively dark regions, as shown in Fig. 7d, e. In comparison with the competitors, the results from our method achieve the best balance between detail preservation and naturalness, as shown in Fig. 7f. For example, the cracking bark and the leaves with white dots in the red rectangles are recovered well. In addition, we use the tone-mapped image quality index (TMQI) [69] to compare the different methods quantitatively on a subject-rated database of 15 HDR images. From Table 2, one can see that our method achieves the highest TMQI values on most images (10 out of 15).

Table 2 TMQI values of different methods on a subject-rated image database [69]
Fig. 8

Results of detail enhancement from different methods. The corresponding zoomed parts are shown with enlargements in red rectangles. Our method can enhance the details in a better way while preserving the structural edges

Fig. 9

Results of non-photorealistic abstraction from different methods. The corresponding zoomed parts are shown with enlargements in red rectangles. Our result fits non-photorealistic abstraction with simultaneous edge highlighting and texture suppression

6.3 Detail enhancement

Given an image decomposition result, detail enhancement can be achieved by superimposing the enhanced textural component on the structural component. Following the implementation of SWF [70] for detail enhancement, we obtain the enhanced image by \(\varvec{y}_{e} = \varvec{u} + (\alpha +1)\varvec{v}\), where \(\alpha \) is an amplification parameter fixed to 5 in all experiments of this application. Here, the compared methods include RTV [66], JCAS [19], EGIF [40] and SWF [70]. If the structural and textural parts are separated in a satisfactory way, the edges of image details become much sharper, while structural edges are protected from over-sharpening or overshooting. On the contrary, if structural edges are extracted into the textural part, they are enhanced like textural edges and halo artifacts appear.
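Given a decomposition \((\varvec{u},\varvec{v})\), the boosting step is a one-liner; the clipping to an 8-bit display range is our addition:

```python
import numpy as np

def enhance_details(u, v, alpha=5.0):
    """Detail enhancement y_e = u + (alpha + 1) * v, following the SWF setup."""
    return np.clip(u + (alpha + 1.0) * v, 0.0, 255.0)
```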

Visual examples of detail enhancement are provided in Fig. 8. As can be observed, in the results from RTV, JCAS and EGIF shown in Fig. 8b, c and d, respectively, the structural edges of the leaves are over-enhanced because they are incorrectly separated into the textural component. SWF does not suffer from this problem, but its detail edges in Fig. 8e are under-enhanced because the details that should be enhanced are retained in the structural component. In comparison with the competitors, our method enhances the details in a better way while preserving the structural edges, as shown in Fig. 8f.

6.4 Non-photorealistic abstraction

Non-photorealistic abstraction aims to decrease the complexity of the scene while protecting important structures. Following the traditional methods [11, 59], we perform edge extraction with the difference-of-Gaussians (DoG) on the structural component and highlight the extracted edges to achieve a non-photorealistic abstraction effect. If structural edges are incorrectly removed into the textural component, the structural component becomes too over-smoothed for edges to be detected in the abstraction results. Meanwhile, if textural edges are incorrectly preserved in the structural component, redundant edge maps appear in the abstraction results.

Figure 9 shows visual examples of non-photorealistic abstraction results. It can be seen that, for JCAS and IUL, shown in Fig. 9b, c, respectively, the colors and edges are over-smoothed. For SWF, the textural edges retained in the structural component lead to messy edges in the abstraction results, as shown in Fig. 9d. For STD, the structural edges are sharper than the above, but several structural details are still lost, such as the white petals shown in Fig. 9e. In contrast, our method removes the repetitive textures from the structural component without smoothing the structural edges, such as the colorful leaves shown in Fig. 9f. Furthermore, we recover the white petals and blue flower cores in a better way.

7 Conclusion

In this paper, we have presented an efficient structure–texture image decomposition model that takes advantage of both NTGV and CSC to separate complex textures from a richly detailed structural component. Meanwhile, a weight function is incorporated to better distinguish structural from textural edges. We solve the model with an alternating minimization algorithm based on ADMM and give a comprehensive analysis of the proposed algorithm. The experimental results demonstrate the effectiveness of our approach on several applications, including texture removal, HDR image tone mapping, detail enhancement and non-photorealistic abstraction.

In the future, we hope to increase the efficiency of our algorithm by GPU programming or DNNs. Extending these techniques to decompose digital images of wall paintings affected by various forms of deterioration will also be future work.