1 Introduction

Image filtering has attracted much research attention over the years and has witnessed significant advances. Its goal is to remove fine-scale details or textures while preserving sharp edges. In the computer vision and graphics communities, it is a simple and fundamental tool for extracting meaningful information to understand and analyze images.

Most early image filters are linear translation-invariant filters (e.g., Gaussian and Laplacian filters), which can be expressed explicitly as a convolution between the input image and a specific filter kernel. They usually achieve poor performance due to their simple forms and lack of elaborate design. To better preserve edges, the bilateral filter (BF) was proposed in [1, 30], taking both spatial and range information into consideration. By using an additional guidance image instead of the input image itself, the bilateral filter can be extended to the joint bilateral filter (JBF) [10, 25]. However, one well-known limitation of the (joint) bilateral filter is that it may generate gradient reversal artifacts [2, 11] in detail enhancement and HDR compression.

Joint filtering techniques take an input image and a guidance image. Based on a local linear model between the guidance image and the output, the guided filter (GF) [14] is a representative structure-transferring filter and overcomes the gradient reversal limitation. Unfortunately, the guidance image may be locally insufficient or unreliable, which can lead to unpleasant artifacts. Shen et al. propose the concept of mutual-structure in [28] to address this structure inconsistency problem. Relying on mutual structures contained in both the input and the guidance image to generate the output, MSJF [28] is suitable for specific problems such as joint structure extraction and joint segmentation, but it does not have the structure-transferring property.

Many methods [16, 19, 20] have been proposed to improve the guided filter [14]. However, most of them focus on designing various adaptive weights or regularization terms and pay little attention to the structure loss problem. The guided filter [14] uses the \(L_2\)-norm distance on intensity to formulate its fidelity term, so some meaningful structures may not be preserved well, particularly near edges. This is illustrated in Fig. 1: as shown in Fig. 1(b), there is a noticeable loss of structural information near the edge. Such loss easily causes errors or artifacts in many applications, e.g., detail enhancement.

Fig. 1. Illustration of structure loss on 1D signals. The guided filter output loses structural information and blurs the edge. Our method better preserves the edge, producing a sharper output edge.

In this work, we propose an algorithm that improves the guided filter's ability to avoid structure loss. Our contribution is two-fold. First, we modify the original objective function and develop an efficient algorithm based on an iterative re-weighting mechanism. Second, we show that the proposed method benefits many vision applications. Experimental comparisons with state-of-the-art methods demonstrate the effectiveness of our method.

2 Proposed Model and Optimization

In this section, we propose a new objective function based on the same local linear assumption as [14]. We then develop a numerical algorithm and derive the iterative solutions of the proposed method.

2.1 Proposed Model

Given a guidance image G, the proposed method is based on the following local linear model:

$$\begin{aligned} q_i = a_kG_i + b_k, \forall i \in W_k, \end{aligned}$$
(1)

where q denotes the expected output, and i is the pixel index in the window \(W_k\), which is centered at pixel k. \(a_k\) and \(b_k\) are two linear transform coefficients; all pixels in \(W_k\) are assumed to share the same \(a_k\) and \(b_k\). \(W_k\) is a square window with \(2r+1\) pixels on each side.

For an input image p, the output q is expected to contain major structures associated with p. Details, textures, and noise are expected to be contained in \(n = p - q\). Based on these assumptions, we propose the following objective function:

$$\begin{aligned} E(a_k, b_k) = \sum _{i \in W_k} \left( |a_kG_i + b_k - p_i| + \epsilon a_k^2 \right) . \end{aligned}$$
(2)

To mitigate structure loss, the \(L_1\) norm is employed in (2) to formulate the fidelity term.

2.2 Optimization

The data term in (2) is not quadratic, so the optimization is no longer a simple linear regression problem. To solve (2), we employ the iteratively re-weighted least squares (IRLS) algorithm [17] to obtain iterative solutions for \(a_k\) and \(b_k\). IRLS solves a sequence of least-squares problems within an iterative framework, where each least-squares problem is weighted by the reciprocal of the absolute error from the previous iteration. The cost function at the t-th iteration (\(t \ge 1, t \in \mathbb {N}\)) is defined as

$$\begin{aligned} E(a_k^t, b_k^t) = \sum _{i \in W_k} \omega _i^t (a_k^tG_i + b_k^t - p_i)^2 + \epsilon (a_k^t)^2, \end{aligned}$$
(3)

where the weight at pixel i is given by \(\omega _i^t = 1/\max \left\{ |q_i^{t-1} -p_i|,~\nu \right\} \), and \(\nu \) is a small constant that avoids a zero denominator. We define \(q^0\) as the output of the guided filter [14].
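This weight arises from the standard IRLS surrogate for the \(L_1\) fidelity term, which links (2) to (3): the absolute residual is rewritten as a quadratic divided by itself, and the denominator is frozen at the previous iterate, with \(q_i^{t-1}\) standing in for \(a_k^{t-1}G_i + b_k^{t-1}\):

$$\begin{aligned} |a_kG_i + b_k - p_i| = \frac{(a_kG_i + b_k - p_i)^2}{|a_kG_i + b_k - p_i|} \approx \frac{(a_k^tG_i + b_k^t - p_i)^2}{\max \left\{ |q_i^{t-1} - p_i|,~\nu \right\} } = \omega _i^t (a_k^tG_i + b_k^t - p_i)^2 . \end{aligned}$$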

Minimizing the energy function (3) is a weighted linear regression problem [9]. By setting the derivatives of (3) with respect to \(a_k^t\) and \(b_k^t\) to zero, we obtain the iterative solutions of \(a_k^t\) and \(b_k^t\):

$$\begin{aligned} \left\{ \begin{aligned}&a_k^t = \frac{\frac{1}{|W|} \sum _{i \in W_k} \omega _i^tG_ip_i - \widetilde{G}_k^t\widetilde{p}_k^{\,t}\widetilde{\omega }_k^t}{(\widetilde{\sigma }_k^t)^2 + \epsilon } ,\\&b_k^t = \widetilde{p}_k^{\,t} - a_k^t\widetilde{G}_k^t , \end{aligned} \right. \end{aligned}$$
(4)

where \(\widetilde{G}_k^t\) and \((\widetilde{\sigma }_k^t)^2\) denote the weighted mean and weighted variance of G in \(W_k\), given by \(\widetilde{G}_k^t = \frac{\sum _{i \in W_k} \omega _i^tG_i}{\sum _{i \in W_k} \omega _i^t}\) and \((\widetilde{\sigma }_k^t)^2 = \frac{1}{|W|} \sum _{i \in W_k} \omega _i^t(G_i - \widetilde{G}_k^t)^2\). \(\widetilde{p}_k^{\,t}\) denotes the weighted mean of p, and \(\widetilde{\omega }_k^t\) denotes the mean of the penalized weights \(\omega _i^t\) in \(W_k\), given by \(\widetilde{p}_k^{\,t} = \frac{\sum _{i \in W_k} \omega _i^tp_i}{\sum _{i \in W_k} \omega _i^t}\) and \(\widetilde{\omega }_k^t = \frac{1}{|W|} \sum _{i \in W_k} \omega _i^t\).

As in [14], windows overlap: every pixel i is contained in multiple windows \(W_k\), each yielding its own \(a_k^t\) and \(b_k^t\). In each iteration, we therefore compute \(\overline{a}_i^t\) and \(\overline{b}_i^t\) by an averaging strategy: \(\overline{a}_i^t = \frac{1}{|W|} \sum _{k: i \in W_k} a_k^t\) and \(\overline{b}_i^t = \frac{1}{|W|} \sum _{k: i \in W_k} b_k^t\). The final output is then \(q_i^t = \overline{a}_i^tG_i + \overline{b}_i^t\).
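For concreteness, one full iteration of (3)–(4), including the averaging step above, can be implemented with box means only, since every weighted statistic expands into plain box filters. Below is a minimal NumPy/SciPy sketch for the gray-guidance case (function names and default parameters are ours, not those of the paper's MATLAB implementation):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def _mean(x, r):
    # Box mean over the (2r+1)x(2r+1) window W_k.
    return uniform_filter(x, size=2 * r + 1, mode='reflect')

def guided_filter(p, G, r, eps):
    # Plain guided filter [14]; used here only to obtain q^0.
    mG, mp = _mean(G, r), _mean(p, r)
    a = (_mean(G * p, r) - mG * mp) / (_mean(G * G, r) - mG ** 2 + eps)
    b = mp - a * mG
    return _mean(a, r) * G + _mean(b, r)

def weighted_guided_filter(p, G, r=8, eps=0.01, nu=1e-4, iters=5):
    """IRLS iterations of Eqs. (3)-(4), gray input p and gray guidance G."""
    q = guided_filter(p, G, r, eps)                  # q^0
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(q - p), nu)      # omega_i^t
        mw = _mean(w, r)                             # mean weight omega~_k^t
        Gt = _mean(w * G, r) / mw                    # weighted mean G~_k^t
        pt = _mean(w * p, r) / mw                    # weighted mean p~_k^t
        # Weighted variance (sigma~_k^t)^2, expanded into box means.
        var = _mean(w * G * G, r) - _mean(w * G, r) ** 2 / mw
        a = (_mean(w * G * p, r) - Gt * pt * mw) / (var + eps)  # Eq. (4)
        b = pt - a * Gt
        q = _mean(a, r) * G + _mean(b, r)            # average overlapping windows
    return q
```

Each iteration uses a constant number of box filters, so its cost is comparable to a single pass of the guided filter.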

We point out that the calculations in (4) involve \(\widetilde{G}_k^t,~\widetilde{p}_k^{\,t}\), and \(\widetilde{\sigma }_k^t\), all of which depend on the penalized weights \(\omega _i^t\). This differs from the solutions (\(a_k\) and \(b_k\)) of the guided filter [14], whose filtering process is not iterative.

Fig. 2. Comparisons of the guided filter and the proposed filter.

3 Discussions

In this section, we discuss some properties of the proposed iterative filter and provide explicit expressions for its iterative kernel weights. The extension to color guidance images and the limitations of our method are also discussed.

3.1 Edge-Preserving Filter

In this subsection we analyze how the proposed filter works. When \(G = p\), the equations in (4) simplify to \(a_k^t = \left( \widetilde{\sigma }_k^t\right) ^2 \big / \left( (\widetilde{\sigma }_k^t)^2 + \epsilon \right) \) and \(b_k^t = \left( 1 - a_k^t\right) \widetilde{G}_k^t\). As \(\epsilon \) is positive, for each \(q_i^t\) there are two special cases:

  • If \(\left( \widetilde{\sigma }_k^t\right) ^2 \ll \epsilon \), then \(a_k^t \approx 0\), so \( q_i^t \approx b_k^t \approx \widetilde{G}_k^t\).

  • If \(\left( \widetilde{\sigma }_k^t\right) ^2 \gg \epsilon \), then \(a_k^t \approx 1\) and \(b_k^t\approx 0\), so \(q_i^t \approx G_i\).

For pixels in a flat window, intensities are approximately equal and \(\widetilde{G}_k^t \approx G_i\); then \((\widetilde{\sigma }_k^t)^2 \rightarrow 0 \), so \((\widetilde{\sigma }_k^t)^2 \ll \epsilon \) and we obtain \(a_k^t \approx 0\) and \( q_i^t \approx \widetilde{G}_k^t\). In other words, the proposed filter smooths pixels in flat windows by weighted averaging. On the other hand, the pixel at the window center is preserved only when \((\widetilde{\sigma }_k^t)^2 \gg \epsilon \). Note that \((\widetilde{\sigma }_k^t)^2\) is influenced both by \(\omega _i^t\) and by the structures in the patch \(W_k\), so whether a pixel is preserved is determined jointly by G, p, and \(q^{t-1}\). That is, the criterion for "what is an edge" or "which structures should be preserved" is no longer measured solely by the given parameter \(\epsilon \) as in [14]. In the t-th iteration, pixels where p and \(q^{t-1}\) are close are assigned large weights, yielding smoothing effects similar to the guided filter, whereas pixels where p and \(q^{t-1}\) differ substantially are assigned small weights for proper modification. This explains why the proposed method avoids structural information loss better than the guided filter. A visual comparison of the guided filter and the proposed filter is shown in Fig. 2.
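A toy numeric check of these two regimes (uniform weights are used for simplicity; the patch contents are illustrative):

```python
import numpy as np

eps = 0.01
w = np.ones(81)                                      # uniform weights, 9x9 window
flat = np.full(81, 0.5)                              # flat patch
edge = np.concatenate([np.zeros(40), np.ones(41)])   # step-edge patch

for name, G in [("flat", flat), ("edge", edge)]:
    Gm = (w * G).sum() / w.sum()                     # weighted mean G~_k
    var = (w * (G - Gm) ** 2).sum() / G.size         # weighted variance (sigma~_k)^2
    a = var / (var + eps)                            # Eq. (4) with G = p
    print(f"{name}: var={var:.4f}, a={a:.2f}, b={(1 - a) * Gm:.2f}")
# flat: var = 0    << eps -> a = 0.00, q ~ G~_k (smoothed)
# edge: var = 0.25 >> eps -> a = 0.96, q ~ G_i  (preserved)
```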

3.2 Gradient-Preserving Filter

The proposed filter is able to avoid gradient reversal artifacts. We take detail enhancement as an example and follow the algorithm based on the base-detail layer decomposition \(E = B + \tau D\), where B, D, and E denote the base layer, the detail layer, and the enhanced image, respectively, and \(\tau \) is a parameter controlling the magnification of details.

In practice, the base layer B is generated by filtering the input image p, and the detail layer is \(D = p - B\). This relationship ensures that \(\partial D = \partial p -\partial B\). If B is not consistent with the input signal p, so that \(\partial D\cdot \partial p < 0\), gradient reversal artifacts appear in the enhanced signal after the detail layer D is magnified.

Theoretically, the local linear model (1) indicates that \(\partial B = a_k^t\,\partial p\) when \(p \equiv G\). In this case \(a_k^t = (\widetilde{\sigma }_k^t)^2 \big / \left( (\widetilde{\sigma }_k^t)^2 + \epsilon \right) < 1\), so \(\partial D = \partial p - \partial B = (1 - a_k^t)\partial p\) and hence \(\partial D\cdot \partial p \ge 0\).
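As a sketch, the enhancement pipeline then amounts to a single self-guided filtering pass (reusing the `weighted_guided_filter` helper from the Sect. 2.2 sketch; \(\tau \) and the clipping range are illustrative choices):

```python
import numpy as np
# weighted_guided_filter: the gray-case sketch from Sect. 2.2.

def enhance_details(p, tau=3.0, r=8, eps=0.01):
    """Base-detail enhancement E = B + tau * D with self-guidance (G = p)."""
    B = weighted_guided_filter(p, p, r=r, eps=eps)   # base layer B
    D = p - B                                        # detail layer D
    return np.clip(B + tau * D, 0.0, 1.0)            # enhanced image E
```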

A 1D example is shown in Fig. 3. As can be seen in Fig. 3(c), our enhanced signal is free of gradient reversal artifacts and does not introduce over-sharpening.

Fig. 3. The proposed filter is gradient-preserving. (a) Input signal and our filtering result. (b) Detail layer. (c) Enhanced signal. The proposed method does not generate unrealistic details near edges and thus produces a natural enhanced signal without over-sharpening artifacts.

3.3 Iterative Filter Kernel

The filter kernel of the proposed method varies across iterations. The explicit expression of the kernel weights at the t-th iteration is given by

$$\begin{aligned} \varvec{W}_{ij}^t = \frac{1}{|W|^2}\!\!\!\!\sum \limits _{k:(i,j)\in W_k}\!\!\!\!\varvec{M}\left( \!\!\frac{(G_i - \widetilde{G}_k^t)(G_j - \widetilde{G}_k^t)}{\left( \widetilde{\sigma }_k^t\right) ^2 + \epsilon } + \frac{1}{\widetilde{\omega }_k^t}\!\right) , \end{aligned}$$
(5)

where \(\varvec{M} = \omega _j^t + \varvec{T}_j\varvec{H}\left( |q_j^{t-1} - p_j| - \nu \right) \big (p_j - \widetilde{p}_k^{\,t} - a_k^t(G_j - \widetilde{G}_k^t)\big )\), and \(\varvec{T}_j = {{\,\mathrm{sgn}\,}}(q_j^{t-1} - p_j)/(q_j^{t-1} - p_j)^2\). Here \({{\,\mathrm{sgn}\,}}(\cdot )\) denotes the sign function and \(\varvec{H}(\cdot )\) the Heaviside step function (one for positive values and zero otherwise).

(5) can be derived by the chain rule and a series of careful algebraic manipulations. We visualize several kernels in Fig. 4.

Fig. 4. Iterative filtering kernels for two special cases. (a) Image patches. (b) Kernels of the guided filter (\(\varvec{W}^0\)) at the pixels marked by the red dots in (a). (c)–(f) Our iterative kernels after the 1st, 4th, 7th, and 10th iterations. The final kernels are capable of preserving the underlying data structure.

3.4 Filtering Using Color Guidance Image

Section 2 presents the iterative filtering process for a gray input with a gray guidance image. However, RGB images usually carry more information than gray images, so we develop a corresponding algorithm for the case of a color guidance image. We rewrite model (1) in vector form:

$$\begin{aligned} q_i = \varvec{a}_k^T \varvec{G}_i + b_k, \forall i \in W_k , \end{aligned}$$
(6)

where \((\cdot )^T\) denotes the transpose operator, \(\varvec{G}_i\) denotes the RGB intensities at pixel i, and \(\varvec{a}_k\) denotes the coefficient vector. Note that \(\varvec{G}_i\) and \(\varvec{a}_k\) are \(3\times 1\) vectors, while \(q_i\) and \(b_k\) are still scalars. Then (2) becomes

$$\begin{aligned} E(\varvec{a}_k^t, b_k^t) = \sum _{i \in W_k} \omega _i^t \left( (\varvec{a}_k^t)^T\varvec{G}_i + b_k^t - p_i \right) ^2 + \epsilon (\varvec{a}_k^t)^T \varvec{a}_k^t , \end{aligned}$$
(7)

and the solutions for \(\varvec{a}_k^t\) and \(b_k^t\) are obtained by weighted linear regression:

$$\begin{aligned} \left\{ \begin{aligned}&\varvec{a}_k^t = \left( \widetilde{\varvec{\varSigma }}_k^t + \epsilon \varvec{E} \right) ^{-1}\left[ \frac{1}{|W|}\sum _{i \in W_k} \omega _i^tp_i\varvec{G}_i - \widetilde{\omega }_k^t\widetilde{p}_k^{\,t}\widetilde{\varvec{G}}_k^t \right] \\&b_k^t = \widetilde{p}_k^{\,t} - (\varvec{a}_k^t)^T \widetilde{\varvec{G}}_k^t \\ \end{aligned} \right. , \end{aligned}$$
(8)

where \(\varvec{E}\) is the \(3\times 3\) identity matrix (the derivative of the regularizer \(\epsilon (\varvec{a}_k^t)^T \varvec{a}_k^t\) contributes \(\epsilon \varvec{a}_k^t\) to the normal equations), and \(\widetilde{\varvec{G}}_k^t\) is the \(3\times 1\) weighted mean vector of \(\varvec{G}_i\). \(\widetilde{\varvec{\varSigma }}_k^t\) is the \(3\times 3\) weighted covariance matrix, expressed as \(\widetilde{\varvec{\varSigma }}_k^t = \frac{1}{|W|}\sum _{i \in W_k} \omega _i^t\left( \varvec{G}_i - \widetilde{\varvec{G}}_k^t\right) \left( \varvec{G}_i - \widetilde{\varvec{G}}_k^t\right) ^T\).

After handling the overlapping windows as before, the final filter output is given by \( q_i^t = \frac{1}{|W|} \sum _{k: i \in W_k}\left( \left( \varvec{a}_k^t\right) ^T\varvec{G}_i + b_k^t\right) = \left( \overline{\varvec{a}}_i^t\right) ^T\varvec{G}_i + \overline{b}_i^t\).
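Since \(\widetilde{\varvec{\varSigma }}_k^t\) is only \(3\times 3\), (8) reduces to a small linear solve per pixel. A minimal NumPy sketch (the weighted statistics are expanded into box means as in the gray case; function names are ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def _mean(x, r):
    # Box mean over the (2r+1)x(2r+1) window W_k.
    return uniform_filter(x, size=2 * r + 1, mode='reflect')

def color_guidance_step(p, G, q_prev, r=8, eps=0.01, nu=1e-4):
    """One IRLS iteration of Eq. (8): gray input p (HxW), color guidance G (HxWx3)."""
    w = 1.0 / np.maximum(np.abs(q_prev - p), nu)       # omega_i^t
    mw = _mean(w, r)                                   # mean weight omega~_k^t
    wG = np.stack([_mean(w * G[..., c], r) for c in range(3)], -1)
    Gt = wG / mw[..., None]                            # weighted mean G~_k^t
    pt = _mean(w * p, r) / mw                          # weighted mean p~_k^t
    Sig = np.empty(p.shape + (3, 3))                   # weighted covariance Sigma~_k^t
    for c in range(3):
        for d in range(3):
            Sig[..., c, d] = (_mean(w * G[..., c] * G[..., d], r)
                              - wG[..., c] * wG[..., d] / mw)
    rhs = (np.stack([_mean(w * p * G[..., c], r) for c in range(3)], -1)
           - mw[..., None] * pt[..., None] * Gt)
    a = np.linalg.solve(Sig + eps * np.eye(3), rhs[..., None])[..., 0]  # a_k^t
    b = pt - (a * Gt).sum(-1)                                           # b_k^t
    a_bar = np.stack([_mean(a[..., c], r) for c in range(3)], -1)       # overlap average
    return (a_bar * G).sum(-1) + _mean(b, r)                            # q^t
```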

We show an example in Fig. 5. Comparing the results of gray and RGB guidance visually, the edges in Fig. 5(c) are preserved better than those in Fig. 5(b).

Fig. 5. Comparisons of color and gray guidance images.

In addition, filtering a gray input with a color guidance image is also very useful for some vision tasks, such as dehazing and image matting. These applications can be found in Sect. 4.

3.5 Limitations

The proposed method does not work well when the image contains complex texture patterns; that is, it is incapable of removing dense textures. We show an example in Fig. 6, where the gradient-based RTV method [33] performs better than the proposed method.

Fig. 6. One failure example.

4 Applications and Experimental Results

The proposed method can be applied to a variety of computer vision tasks. Several tasks we outline in this section are flash/no-flash image restoration, image dehazing, detail enhancement, HDR compression, and image matting.

Parameter Settings. We first state the parameter settings. Empirically, we set the window radius to \(r < 100\) and the regularization parameter to \(\epsilon < 1\). \(\nu \) is set to 0.0001 in all experiments, and the number of iterations is set to 5.

Running Time. The proposed algorithm is implemented in MATLAB on a PC with an Intel Xeon E5630 CPU and 12 GB RAM. It takes about 0.4 s to process a \(321 \times 481\) gray image without code optimization; for the color case, processing an image of the same size takes about 23.8 s.

4.1 Flash/No-Flash Image Restoration

Flash/No-Flash Denoising. The observed flash image can serve as a guidance image for denoising a noisy no-flash input. Figure 7 shows an example; the compared methods are the joint BF, GF [14], and WLS [11]. As can be seen, the result of the joint BF (Fig. 7(c)) contains noticeable gradient reversal artifacts, GF fails to preserve edges and cannot produce sharp edges in some regions (Fig. 7(d)), and the result of WLS (Fig. 7(e)) contains artifacts caused by the heavy noise. Our result is visually better than the others.

Flash/No-Flash Deblurring. A common way to perform flash/no-flash deblurring is to generate an image that preserves the ambient lighting while containing clear edges and details. This can be accomplished simply with the base-detail layer decomposition of Sect. 3.2, which saves much computation compared with existing deblurring methods. The base layer B is generated by filtering the no-flash image p guided by the flash image G, so as to maintain the ambient lighting. The detail layer is produced as \(D = G - \widehat{G}\), where \(\widehat{G}\) denotes the self-guided filtered output of G. The base layer B and the detail layer D are then combined to generate a blur-free image, as sketched below.
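In code, this restoration amounts to two filtering passes and one addition (a sketch reusing the `weighted_guided_filter` helper from the Sect. 2.2 sketch; variable names are ours):

```python
# p: blurry no-flash image, G: sharp flash image, both grayscale in [0, 1].
B = weighted_guided_filter(p, G, r=8, eps=0.01)      # base: ambient lighting of p
G_hat = weighted_guided_filter(G, G, r=8, eps=0.01)  # self-guided smoothing of G
D = G - G_hat                                        # edges and details from the flash image
restored = B + D                                     # blur-free combination
```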

A challenging case arises when saturated regions appear in the blurry no-flash image, as shown in Fig. 8(a). Two representative blind image deblurring methods [22, 24] fail to produce clear structure around the saturated region, as shown in Fig. 8(b)–(c); severe artifacts can be found in their results. Even the methods [15, 23], which are specifically designed to handle outliers, produce unsatisfactory results (Fig. 8(d)–(e)). We also provide results of several filtering methods, including the joint BF, GF [14], WLS [11], \(L_0\) gradient minimization [32], the domain transform filter (DTF) [12], RTV [33], RGF [34], and MSJF [28]. As shown in Fig. 8(g)–(o), ours is visually the best.

Fig. 7. Denoising with a flash/no-flash image pair. The no-flash image (input) suffers from severe noise while the structures and edges in the flash image (guidance) are quite clear. Compared with the results shown in (c)–(e), the proposed filtering method produces a noise-free result with clear edges.

Fig. 8. Blur removal with a flash/no-flash image pair. Results shown in (b)–(e) are restored by the deblurring methods [15, 22–24], respectively (blur kernel size: \(33\times 33\); running time of the kernel estimation process: 465.7 s, 2020.6 s, 306.8 s, and 1933.4 s). Results shown in (g)–(o) require no kernel estimation (running time: 1.6 s, 1.1 s, 7.4 s, 10.2 s, 78.4 s, 17.5 s, 6.4 s, 23.8 s, and 7.8 s).

4.2 Image Dehazing

We follow the widely used hazy image formation model \( \varvec{I}(x) = \varvec{J}(x)T(x) + \varvec{A}(1 - T(x))\), where \(\varvec{I}, \varvec{J}, \varvec{A}, T\) denote the observed hazy image, the scene radiance, the global atmospheric light, and the medium transmission, respectively, and x is the pixel index. We estimate the atmospheric light \(\varvec{A}\) and the raw transmission map \(T^0(x)\) within the framework of [13], and refine \(T^0(x)\) by filtering it, instead of solving the matting Laplacian system as in [13], which is very slow. As can be seen in Fig. 9, our refined transmission map T (Fig. 9(d)) contains more meaningful structures than \(T^0\) (Fig. 9(b)); the final dehazed result is shown in Fig. 9(e). The competing dehazing methods are DCP [13], BCCR [21], NLD [3], DehazeNet [5], and MSCNN [26]. Our result is visually comparable to those of the conventional methods DCP, BCCR, and NLD, and slightly better than those of the learning-based methods DehazeNet and MSCNN.
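Once T has been refined by our filter, the scene radiance is obtained by inverting the formation model; the lower bound \(t_0\) on T below follows the common practice of [13] to avoid amplifying noise where the transmission is small (a standard recovery sketch, names are ours):

```python
import numpy as np

def recover_radiance(I, A, T, t0=0.1):
    """Invert I = J*T + A*(1-T): I is HxWx3, A is a length-3 vector, T is HxW."""
    Tc = np.maximum(T, t0)[..., None]    # bounded transmission
    J = (I - A) / Tc + A                 # scene radiance
    return np.clip(J, 0.0, 1.0)
```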

Fig. 9. An image dehazing example. (a) Input hazy image. (b) Raw transmission map \(T^0\). (c) Dehazed using (b). (d) Our refined T. (e) Dehazed using (d). (f) DCP [13]. (g) BCCR [21]. (h) NLD [3]. (i) DehazeNet [5]. (j) MSCNN [26]. Refining \(T^0\) takes about 57 s for (f) (matting Laplacian) and 40 s for (e) (ours).

4.3 Detail Enhancement and HDR Compression

Detail Enhancement. The detail enhancement algorithm was described in Sect. 3.2. Figure 10 compares GF [14], LLF [29], RTV [33], mRTV [8], RoG [4], and the proposed filter on one example. The results in Fig. 10(b)–(f) suffer from unrealistic artifacts (see close-ups); in comparison, our method avoids generating unrealistic artificial details (Fig. 10(g)).

Fig. 10. A detail enhancement example compared with [4, 8, 14, 29, 33]. The guided filter generates an over-enhanced result with unrealistic artifacts due to structural information loss, as shown in (b). Our result is visually more natural, with fewer unrealistic artifacts than (b)–(f).

HDR Compression. Different from detail enhancement, HDR compression aims to generate a low dynamic range image by compressing the base layer at some rate while preserving the details. We show an example in Fig. 11, comparing with the filtering methods [4, 11, 29, 32]. For display, we convert the input HDR radiance to a logarithmic scale and then map the result to [0, 1] (see Fig. 11(a)). For each result, we show two close-ups, of a highlighted area and a dark area. The proposed method produces a clean result with natural details, whereas the other methods either suffer from aliasing or fail to compress the range properly.
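A possible realization is a log-domain base-detail scheme in the style of classic tone mapping (the compression factor and output mapping below are our assumptions, not specified by the paper; `weighted_guided_filter` is the gray-case sketch from Sect. 2.2):

```python
import numpy as np
# weighted_guided_filter: the gray-case sketch from Sect. 2.2.

def compress_hdr(luminance, r=8, eps=0.01, target_range=2.0):
    """Compress the log-luminance base layer; keep the detail layer intact."""
    L = np.log10(np.maximum(luminance, 1e-6))        # log luminance
    B = weighted_guided_filter(L, L, r=r, eps=eps)   # base layer
    D = L - B                                        # detail layer (preserved)
    s = target_range / (B.max() - B.min())           # base compression factor
    return np.clip(10.0 ** (s * (B - B.max()) + D), 0.0, 1.0)
```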

Fig. 11. HDR compression. Close-ups show that the proposed method compresses the highlighted area and the dark area effectively.

4.4 Image Matting

An accurate matte can be generated by filtering a coarse binary mask under the guidance of the corresponding clear image. We compare our method with the image matting methods [6, 7, 18, 27, 31]. Our result, shown in Fig. 12(g), is visually comparable to the results shown in Fig. 12(b)–(f). Nevertheless, the proposed method does not require additional user-assisted input, only a coarse binary mask, whereas all the competing methods require user-assisted input (either scribbles or a trimap) for labeling.
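A usage sketch (reusing the `color_guidance_step` helper from the Sect. 3.4 sketch; for brevity the iteration is initialized with the mask itself, whereas the paper prescribes the plain guided filter output for \(q^0\)):

```python
import numpy as np
# image_rgb: HxWx3 color image in [0, 1]; coarse_mask: HxW binary mask.
alpha = coarse_mask.astype(np.float64)       # simple q^0 initialization
for _ in range(5):
    alpha = color_guidance_step(coarse_mask.astype(np.float64), image_rgb, alpha)
alpha = np.clip(alpha, 0.0, 1.0)             # refined alpha matte
```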

Fig. 12. An image matting example.

5 Conclusion

In this paper, we modify the fidelity measurement in the objective function of the guided filter to address its structural information loss and to improve its ability to preserve structures. We develop an efficient iterative re-weighting algorithm to solve the proposed model and analyze its attractive properties. The extension to a color guidance image (with a gray input) allows the proposed filter to benefit specific tasks, e.g., image matting. We also outline other applications that benefit from the proposed method and expect to apply it to more practical applications.