1 Introduction

With the rapid development of Web 2.0 technologies and imaging devices, there are more and more personal images available on the Internet. There would be hundreds or even thousands of new photographs uploaded to social network websites just after a single personal travel. Since most people are amateur photographers, it would be attractive for them to develop techniques which are able to effectively enhance the quality of their works. For example, before taking landscape pictures, a smart imaging system automatically suggests the optimal scope of the visual field [14]. After taking pictures, some post-processing technique enhances image details or generates special effects [15]. In the process of sharing pictures onto social networks, an intelligent system helps people to select best photos from all their works and provide intuitive knowledge on photograph shooting [12]. In this paper, we primarily concern image post-processing techniques such as detail enhancement.

As for detail enhancement, various image processing algorithms [1, 2, 9, 16, 17] have been designed to achieve this goal such as anisotropic filtering, bilateral filtering and their extensions. Generally, these methods consider the anisotropic nature of image edges and bring about edge-preserving results. However, they often have technical problems such as parameter tuning, edge halos, and optimization. Different from the above-mentioned methods, Paris et al. [15] propose the Local Laplacian Filter (LLF) and obtain promising results in editing image details or tones. The LLF method distinguishes itself from other methods in the following aspects. First, the backbones of LLF, i.e. Gaussian and Laplacian image pyramids, are based on isotropic Gaussian kernels, which are opposed to the anisotropic nature of image edges and were thought to be inappropriate for constructing edge-preserving methods. In this sense, the LLF method makes contributions in enlarging the role of Laplacian image pyramid in image processing. Second, the LLF can be simply implemented with basic manipulations, e.g. convolution, resizing and linear remapping, in a non-iterative style.

With in-depth exploration of LLF, we can further observe the following issues about the method. First, the original LLF process is pixel-wise and each pixel manipulation involves a whole-image remapping sub-process. Obviously, the computational cost would be very large (In the following sections, detailed analysis on the complexity would be given). Second, the remapping process is solely determined by pixel intensities, which totally ignores the spatial information of pixels and thereof misses the spatial information of image contents. As for the first issue, Paris et al. have proposed a fast version of LLF in their seminal research [15], where each remapping is only conducted in a neighboring region. As for the second issue, it is potentially helpful to borrow ideas from the fields such as image understanding or content analysis.

In terms of image understanding, an image can be usually viewed as a spatial combination of several semantic regions. For example, a scenery photograph often contains buildings, sky, mountains, to name but a few. Ideally, our goal is to design a spatially guided remapping function for the LLF model according to these semantic regions. Then we can dynamically assign parameters to different image regions to achieve user-preferred effects. However, this “classification and filtering” roadmap seems impossible due to some technical difficulties. As the semantic gap always exists between low-level features and image contents [20], it is still a very challenging task to have an image semantically partitioned. To bridge this gap, various learning methods have been developed and achieved state-of-the-art results in image/video tagging [19, 22], index [6, 21], and 2D/3D object detection and searching [4, 5, 7, 11]. However, these methods are not suitable for our goal due to the following reasons. On one hand, in terms of model training, as negative samples come from an infinite semantic space, the trained model may be still unreliable to an ad hoc test image. On the other hand, from the practical view, integrating these models may bring too much extra computational cost, especially for the cases where images are expected to be post-processed in real time on imaging devices.

One possible solution to the afore-mentioned problem could be a trade-off between the efficiency of improved LLF and the concrete semantics of image contents. This strategy is supported by the common observation that the image contents can be divided into only two kinds, i.e. regions with rich image details and regions with few details. Taking landscape images for example, they usually contain flat regions such as the blue sky and texture-rich regions such as buildings or mountains. According to the richness of image details, we can design a dynamical remapping function and therefore improve the LLF model. In this roadmap, by proposing a simple technique for approximating richness of image details, we construct a spatially guided remapping function and embed it into the LLF model. Base on the low-level image features directly from the Laplacian image pyramid, we use Chamfer distance transform to realize this approximation. The contributions of our improved LLF method are two-fold. On one hand, the filtering process can be spatially guided by the new remapping function. In this way, we can assign different parameters to different regions. For detail-absent regions, e.g. flat regions, it is reasonable to put small or, practically, no filtering strength on them, which would structurally enhance the efficiency of photo processing. On the other hand, the extra computational cost brought from new remapping function is very low. Even when there is no flat region throughout an image, the improved LLF simply degenerates to the original version without introducing much computation cost.

1.1 Related research

As for research on image detail enhancement, Fattal et al. [3] propose a multi-scale method based on the bilateral filter. As mentioned before, the bilateral filter can be essentially viewed as a trade-off between edge-preserving and edge-smoothing. So tuning the parameters to achieve an optimized result has to be taken into consideration. Based on a psychophysical law, Gu et al. [8] propose a locally nonlinear model to tune image tone mapping. Ling et al. [13] propose a detail enhancement framework, including an energy optimization process, to preserve both tone and fidelity. Different from these methods, our approach considers the image detail richness of image contents and constructs a simple guiding map to spatially determine the enhancing strength. In our method, estimating image detail richness is a key ingredient. Similarly, Hong et al. [10] utilize low-level image feature to approximate detail-absent image regions, according to which movie captions are dynamically placed. Our method is partially inspired by this idea but differentiates in that we directly utilize image features from Laplacian image pyramid and thus introduce no extra feature extracting process.

The rest of the paper is organized as follows. In Section 2, we demonstrate the principle of original LLF and analyze its complexity. Then we present our spatially guided LLF version in Section 3. Experiments for validating the effectiveness of our method are demonstrated in Section 4. Finally, Section 5 concludes the paper and discusses possible future work.

2 Local Laplacian Filter

In this section, we first introduce the principle of basic local Laplacian filter (LLF) and its faster version. Then we analyze their complexities as well as the feasibility on further reducing the complexity.

2.1 LLF principle

Here we assume that the size of an image I is N × N and its intensity is defined as a scalar function I(x, y) ∈ [0, 1] after normalization. As for color images, they can be simply viewed as three independent channels (RGB) in the LLF model. With the target image, we can then formulate the well-known Gaussian image pyramid and Laplacian image pyramid respectively, i.e. {G l } = C G (I) and {L l } = C L ({G l }), where \( \begin{array}{cc}\hfill l\in 0\dots n,\hfill & \hfill n={ \log}_2N\hfill \end{array} \). Equation 1 demonstrates the relationship between these two pyramids:

$$ \left\{\begin{array}{ll}{L}_l={G}_l- upsample\left({G}_{l+1}\right)\hfill & l=1\dots n-1\hfill \\ {}{L}_n={G}_n\hfill & l=n\hfill \end{array}\right.. $$
(1)

In this formulation, we observe that a Laplacian image pyramid can be derived from a Gaussian image pyramid and they share a common spatial structure and a full point-wise correspondence. Computationally, apart from the identical top layers in {G l } and {L l }, the rest layers of {L l } can be obtained by subtracting the upsampled layer Gl + 1 from G l for l = 0 … n − 1. With {G l } and {L l } at hand, an image in its original resolution can be reconstructed through a recursive process I = Α({L l }), i.e. G l  = L l  + upsample(Gl + 1) from the lth layer to the 0th layer.

LLF model defines a remapping function r(i, g) on the normalized image intensity:

$$ r\left(i,g\right)=\left\{\begin{array}{ll}g+\mathrm{sign}\left(i-g\right){\left(\left|i-g\right|/{\delta}_r\right)}^{\alpha}\hfill & if\left|i-g\right|\le {\delta}_r\hfill \\ {}g+\mathrm{sign}\left(i-g\right)\left(\beta \left(\left|i-g\right|-{\delta}_r\right)+{\delta}_r\right)\hfill & if\left|i-g\right|>{\delta}_r\hfill \end{array}\right., $$
(2)

r(i, g) is a scalar function and its input and output range are both between [0, 1]. The aim of this function is, given a center g, to remap i at the position of (x l , y l , l) in {G l } into r(i, g), where g and i are both scalar image intensities. Here α and β are two key parameters that control the LLF effects. The former parameter α generates effects of detail enhancing/smoothing while the later parameter β generates effects of image tone mapping. In this paper, we primarily concern the detail manipulation and set β = 1. As for the parameter α, we can observe that the remapping functions represent an “S” shape in Fig. 1, of which the geometry reveals the principles of LLF detail manipulation. Specifically, when the parameter α is bigger than 1, the intensity range near the center g are compressed into a smaller one. Since the image edges can be perceived with bilateral contrast of imaging intensities, this inverted S-shape mapping decreases this kind of contrast and thus eliminates many weak details. In the contrary, when parameter α is smaller than 1, the remapping function produces a detail-enhancing effect, i.e. the S-shape mapping enlarges subtle intensity differences and thereof strengthens the image details. As is validated by the results in Fig. 2, we can observe that remapping functions with smaller α produce more intensified detail enhancing effects.

Fig. 1
figure 1

Two kinds of remapping function in detail smoothing (a) and enhancing (b) in LLF model

Fig. 2
figure 2

Effects of LLF detail enhancement under different α a the original image b α = 0.9 c α = 0.7 d α = 0.5 e α = 0.3 f α = 0.1

By embedding this remapping function into the image reconstruction process I = Α({L l }), the LLF can be summarized into the following process, which we call as naïve LLF:

Inputs: Original image I, remapping function r(g, i), parameter δ r

Output: Reconstructed image I '

  1. Step1:

    {G l } = C G (I), {L0l} = C L ({G l })

  2. Step2:

    for all g = G l (x l , y l )

  3. Step3:

    I remap  = r(I, g)

  4. Step4:

    {G l } = C G (I remap ), {L l } = C L ({G l })

  5. Step5:

    L0l(x l , y l ) = L l (x l , y l )

  6. Step6:

    end for

  7. Step7:

    I ′ = Α({L0l})

2.2 Complexity analysis and our idea for improving LLF

From the above algorithm, we can conclude that the LLF method includes the following process: image pyramids construction, point-wise remapping, and image reconstruction. It is obvious that the main computation load comes from the remapping process. For a single point in {G l }, in the naïve LLF, the complexity of remapping function is CN2. And the whole complexity of LLF can be approximated as \( C{N}^2{\displaystyle \sum_{j=0}^{n-1}{\left(\frac{1}{4}\right)}^j}{N}^2\approx {C}_1{N}^4 \), which is so expensive that it is not practical in most image processing applications. To tackle this problem, Paris et al. [15] propose an improved sub-region LLF. Instead of the whole image region, the spatial range for each point-wise remapping can be reduced into a K × K sub-region (K = 3(2l + 1)), where the image pyramids are built. Therefore, the complexity of sub-region LLF is reduced to \( {C}_1{\displaystyle \sum_{j=0}^{n-1}{\left(\frac{1}{4}\right)}^j{N}^2}{K}^2 \), which can be further derived as C2 N2 log2 N2.

The above LLF methods adopt a universal parameter α for all pixels in an image, which is irrelevant to spatial location or, to some extent, image contents. In our roadmap, we propose to improve LLF by spatially guiding the parameter value of α, i.e. distributing different values of α to different image regions, according to the richness of image details. In this way, we can extend the former universal filtering process to a dynamical one. Also, the improved model under our model can be faster than the sub-region LLF. For example, we can empirically assign no strength (α = 1) to the flat or detail-absent regions so that we can omit the C1 K2 remapping process for these pixels. This would have the potential to structurally reduce the whole complexity of the LLF process.

We then quasi-quantitatively analyze the feasibility for reducing the complexity. Suppose the complexity of the sub-region LLF and the detail approximation process are O1 and O2, the ratio of non-LLF region is η (η < 1). To reduce the computation cost, it should satisfy the following in equation (1 − η)O1 + O2 < O1, i.e. O2/O1 < η. This means that, on one hand, given a fixed method on approximating αs for each pixel, the lower bound for η, at which the computation cost can be saved, is O2/O1. On the other hand, given an object image (η is fixed), it is demanded that the complexity of introduced detail approximation method be as small as possible.

3 Content based Local Laplacian Filter

As the Laplacian operator is essentially the second order differential, it eliminates the 0th and first order intensity variation, i.e. producing zero response in flat regions and gradually varying regions. Instead of supervised learning methods aiming at classifying the image contents, we directly utilize the bottom layer of {L l } to estimate the richness of image details. This procedure produces an auxiliary matrix of the same size as image I, which can be incorporated into the remapping function.

We normalized the L0 layer as a scalar function f(x, y) = γ ∈ [0, 1] defined on the grids. The locations of the non-zero γs are the places where image details respond to Laplacian filtering. However, it is inappropriate to only manipulate these discrete non-zero dots or edges in the remapping process. Instead, we propose to estimate the richness of image details by overlapping the impacts of these dots or edges.

To this end, we define a function h(f− 1(γ)) that simulates the impact of γ -valued points in L0, where they are regarded as electric charges in the image domain. So the h(⋅) function simulates a truncated electric potential produced by these charges. Then the auxiliary image can be represented as an integral of these h(⋅) functions:

$$ map\left(\boldsymbol{I}\right)={\displaystyle {\int}_{\tau }h\left({f}^{-1}\left(\gamma \right)\right)}\mathrm{d}\gamma, $$
(3)

where τ ∈ (0, 1].

To achieve the discretization of the integral operator in Eq. 3, we divide the integrating range into k intervals {Γ i } = {((i − 1)Δ, ]}, i = 1 … k, where Δ is the length of an interval. We use the Chamfer distance transform to approximate the electric potentials:

$$ map\left(\boldsymbol{I}\right)={\displaystyle {\sum}_{i=1}^k Cham\left({f}^{-1}\left({\varGamma}_i\right)\right)}. $$
(4)

In practice, the Laplacian response image is decomposed and translated into a set of binary images, i.e. f− 1(Γ i ). The positions of zero-value pixels in a binary image are those whose intensities in L0 fall into the interval Γ i . The complexity of the conducting these Chamfer distance transforms is kC0N2 = C3N2. The distance transform, whose complexity is only proportional to N2, i.e. the number of pixels of an image, comprises the main body for building the auxiliary image.

Empirically, after accumulating the histogram of an image’s Laplacian response, e.g. Fig. 3c, we observe that the non-zero responses in L0 layer distribute in a small range after the normalization. Therefore, by reducing the number of intervals, i.e. k, we can further lower the computation cost of Eq. 4. We choose a smaller k ′ so that the (0, k ′ Δ] range includes 95 % of the non-zero responses. As the value of k ′/k is usually much smaller than 1, we can thus reduce the number of distance transforms.

Fig. 3
figure 3

a The original image, b the L0 layer, and c the accumulation of (b)’s histogram

With the constructed auxiliary images, e.g. the second column of Fig. 5, we can dynamically assign non-single αs to different image regions. As the parameter α is inversely proportional to the enhancing strength, we linearly map the values in an auxiliary image onto the interval [αmin, αmax], of which the range can be determined by users. Then we can obtain a parameter matrix αmap and incorporated into the sub-region LLF framework to construct the spatially guided LLF. Usually, we can choose αmax = 1 so that the detail-absent regions can be omitted by the LLF process. Recall that the spatially guided LLF is able to save running time on condition that the complexities satisfy O2/O1 < η < 1. Specifically, compared with sub-region LLF, our LLF model achieves a faster implementation when the ratio between areas of detail-absent region and the whole image region is larger than O2/O1 = C3/(C2 log2N2). Although we cannot give a precise value of this ratio, this condition could be easily satisfied for most images due to a qualitative observation that the bound can be very small in most cases. The following experiments also empirically validate this observation.

4 Experiments

In this section, we first analyze parameter selection of our algorithm. Then we validate our proposed spatially guided LLF in the following two aspects, i.e. the qualitative visual quality and the quantitative temporal costs. As for the experimental data, we choose three images used in [15] with a resolution of 800*500 and three images collected from the Internet with a resolution of 1800*1200. The algorithms were implemented with un-optimized codes and no parallel techniques were used. Since the naïve LLF is obviously much more time-consuming, we only compare our method with the sub-region LLF.

4.1 Parameter selection

In the sub-region LLF and our improved LLF, the main parameters are α, β, δ r and k and k '. As for the sub-region LLF, the parameter α is set to 0.4 in the experiments. As for our methods, the dynamical range for α is set to [0.2,1]. The parameters β and δ r for both methods are respectively set to 1 and 0.2.

In the following, we discuss the selection of parameter k. As we have chosen 8-bit digital images as the experimental data, the pixel values are discretized from 0 to 255. In this context, the parameter k can be chosen as an integer which equally divides the range of image intensities. The α map based on the discretized Eq. 3 can be best approximated with k = 256. Practically, we can further decrease the number of intervals, i.e. k, to save the computational cost. From Fig. 4, we observe that larger ks produce smoother maps, and small ones generate block effects since less number of fields, which is generated by Chamfer distance transform, are accumulated. In our method, we empirically choose k = 64 as it generally keeps the smoothly varied αs meanwhile further decreases the computational cost. With the determined k in our experiments, the values of parameter k ′ fall around 16 in our experiments.

Fig. 4
figure 4

Auxiliary images for guiding α values with different ks a k = 256 b k = 128 c k = 64 d k = 32 e k = 16 f k = 8

4.2 Comparison between sub-region LLF and spatially guided LLF

Figure 5 demonstrates the visual effects of the sub-region LLF and spatially guided LLF on the six experimental images. Generally, these two methods achieve similar effects of detail enhancement on these images (both for grayscale images and color image), e.g. the enhanced distant mountains in “scence1” and “scence3”. Differently, our method does not bring too much turbulence for the flat regions since it uses an auxiliary image to detect these regions and put no enhancing strength on them. In contrast, the sub-region LLF puts a universal strength (in our experiments the α is set to 0.4 as mentioned above) on all pixels, which would mistakenly exaggerate the microscopic variations in these regions. For the validation at a finer scale, we select some representative sub-regions from the experimental images and enlarge the results of both methods for further comparison. From Fig. 6, we can see that our method generates much less unnecessary turbulence than its counterpart 1) in flat regions and 2) around salient boundaries. As for the former case, it is mainly due to the zero filtering strength determined by our model; as for the later case, the distance transform modeling contributes to the descending filtering strength around salient boundaries. However, from the second column of Figure 5, we have to note that our technique only roughly detect detail-absent regions so that an a map is far from a precise image content partition. As is discussed in Section 1, this fast unsupervised estimation is a trade-off between accuracy and efficiency. So we choose to distribute gradually descending strengths near the true boundaries between detail-rich and detail-absent contents.

Fig. 5
figure 5

Visual comparisons between sub-region LLF and our method (The black frames in (a-1), (c-1), (d-1) and (e-1) are for enlarged comparisons in Fig. 6) (a-1) olddome (a-2) α map (a-3) spatially guided LLF (a-4) sub-region LLF[2] (b-1) polin (b-2) α map (b-3) spatially guided LLF (b-4) sub-region LLF[2] (c-1) easter (c-2) α map (c-3) spatially guided LLF (c-4) sub-region LLF[2] (d-1) scence1 (d-2) α map (d-3) spatially guided LLF (d-4) sub-region LLF[2] (e-1) scence2 (e-2) α map (e-3) spatially guided LLF (e-4) sub-region LLF[2] (f-1) scence3 (f-2) α map (f-3) spatially guided LLF (f-4) sub-region LLF[2]

Fig. 6
figure 6

Enlarged sub-regions of several experimental images for visual comparisons between sub-region LLF and our method a Enlarged views of the original experimental images, i.e. olddome, easter, scence1 and scence2 b Enlarged views of the output images based on our spatially guided LLF c Enlarged views of the output images based on the sub-region LLF[2]

In terms of temporal costs, we can observe that our method can be “conditionally” faster than the sub-region LLF from Table 1. On one hand, besides the second image “polin”, the running time of our methods on all other five images shows impressive improvements over its counterpart. Moreover, the time for computing α map s only account for 0.1 % of the whole algorithm, which satisfies the demand for detail richness approximation. On the other hand, although this temporal advantage no longer exists for images with little or no flat regions, our method is still able to dynamically assign the filtering strength with certain user preference, without introducing much computational cost.

Table 1 Comparisons of running time between sub-region LLF and our method

5 Conclusions

In this paper, we propose an improved local Laplacian filter that spatially guides the filtering strength by approximating the richness of image details. This method endows the LLF with the ability to dynamically assign appropriate parameter values to different image contents. We use a simple distance transform technique to achieve the detail richness estimation, which introduces little computation expense compared to the whole framework. Experimental results on some representative images validate the effectiveness of our method. Our method is also extendable to other image editing tasks such as LLF tone mapping. As for the future work, we can conduct parallel implementation to further speed up the algorithm. Also, as our method is totally unsupervised, we can consider a less supervised way to achieve better results by taking advantages of user-provided interactions [18].