1 Introduction

With the prevalence of advanced mobile devices, people can conveniently take photographs almost anywhere at any time. Along with the boom of social media platforms, huge numbers of photographs are produced and shared every day, forming an important kind of big data [36]. However, the quality of these visual data is not guaranteed, since their generating sources are quite open. On the one hand, most people who take photographs are amateurs who know little about photographic skills and tend to choose sub-optimal shooting parameters [23, 28]. On the other hand, many challenging imaging conditions lead to low photo quality, such as bad weather, moving objects, and low light [4]. Low-light images lower the visual quality of the user experience or hinder content understanding in industrial applications.

Various image enhancement techniques are therefore highly needed to recover image details or improve image quality. For example, rich user-generated content from social media can be exploited to optimize scene composition [13, 23, 28]. These learning based methods need to capture semantic information from a sufficient number of visually similar exemplars. Furthermore, processing an arbitrary new photograph requires a large number of exemplars at the server side [26]. In this context, the computation cost and the storage load make the support of cloud computing and high-speed transmission indispensable. Nevertheless, some tasks do not require processing images at the cloud side. For example, handling challenging imaging conditions requires pixel-level manipulation, for which efficient filtering on the mobile device itself is more feasible.

In this research, we aim to enhance low-light images. Specifically, we consider two low-light situations, i.e., nighttime and unbalanced light. In dark surroundings, the image histogram concentrates in the low-intensity range, and most image details are therefore concealed by the low contrast (e.g. the first column in Fig. 1). In unbalanced-light situations, low-light regions cover only part of the image due to backlight or sidelight (e.g. the second and third columns in Fig. 1).

Fig. 1
figure 1

Examples of low light images, such as nighttime (first column), backlight (second column) and sidelight (third column)

These situations pose challenges to the image enhancement task. First, while enhancing edge and texture patterns, the imaging noise hidden in the dark regions is likely to be amplified at the same time. Second, handling unbalanced-light conditions requires an adaptive model that dynamically assigns enhancing strengths to different regions. Third, the naturalness of the resulting image should be preserved, including the basic scene characteristics and the visual consistency. In addition, computational efficiency is very important for applications on the mobile device side. Traditional histogram based [24] and Retinex based [15, 16] methods often have difficulty meeting all these requirements simultaneously.

In this paper, we propose a Retinex based low-light enhancement model, shown in Fig. 2. To address the aforementioned challenges, the key issue is to produce a high-quality illumination map T. On one hand, to reveal detailed texture patterns, T is required to be piecewise-smooth for the Retinex model. On the other hand, T should be structure-aware, since assigning multiple filtering strengths to the same semantic region would introduce visual inconsistency. In our model, we initially estimate the illumination map T with the max-RGB technique, and then apply a self-guided filter to refine T. The contribution of this paper is two-fold. First, the illumination map is refined for the Retinex model both effectively and efficiently. Second, experimental results empirically show that our model achieves a good balance among the requirements of low-light image enhancement. Of note, this paper is an extension of a conference paper [4]. In this version, we describe the research background in more detail, and introduce more related works. We also provide additional experimental results and analysis to further validate the effectiveness of our model.

Fig. 2
figure 2

The general framework of our method (best viewed enlarged)

The rest of this paper is organized as follows. We briefly introduce the related works for processing low-light images in Section 2. Section 3 presents the proposed method. We validate our model in Section 4. Section 5 finally concludes the paper.

2 Related works

Image enhancement is usually a prerequisite step in many research fields, such as natural image classification/retrieval [35, 37, 38], medical image processing [8, 39], social media analysis [13, 31], and visual surveillance [21, 34]. The central task of enhancement can also be quite different, e.g., contrast enhancement [29], color enhancement [19], detail enhancement [9], and composition enhancement [23], to name but a few. In this paper, we briefly review the related work on low-light enhancement models.

Retinex based models achieve promising results on enhancing low-light images. Their main idea is to decompose an image into an illumination map and a reflectance map [15, 16], where the reflectance can be treated as the enhanced result. However, this roadmap is limited, as the produced result is often over-enhanced. Guo et al. [7] propose a simplified enhancement model, LIME, which achieves good results. Nevertheless, the model always needs a gamma correction to non-linearly rescale the refined illumination map. This additional post-processing stage lowers the model's robustness. Our model is most related to LIME, but distinguishes itself mainly in its simplicity. By using a self-guided filter, we directly obtain an illumination map that can be adopted in the enhancement model without further correction.

As a classical problem in computer vision and graphics, intrinsic image decomposition can also be used for the enhancement task. Fu et al. [6] propose a weighted variational model for simultaneously estimating reflectance and illumination, and apply it to manipulate the illumination map. The limitation of this method is its relatively high computational complexity, as it aims to recover two channels simultaneously. Based on the HSV representation, Yue et al. [29] decompose the V channel into an illumination layer and a reflectance layer with the Split Bregman algorithm. After adjusting the illumination layer through gamma correction and histogram equalization, all the layers and channels are gradually integrated to produce the final result.

Differently, Dong et al. [3] propose an interesting technical roadmap. They consider the inverse of the estimated illumination map as a hazy image, and obtain the enhancement through a dehazing model. Song et al. [25] extend this model and solve the block artifact issue. Although this approach obtains satisfying results, its idea lacks a clear physical meaning to some extent.

As another popular family of methods, histogram based approaches are noted for the simplicity of reshaping the image histogram into a desired distribution. Traditional methods [1, 17] tend to over- or under-enhance the target image, since they stretch the illumination range without considering image details. Recently, a 2-D histogram based on the layered difference representation was built and effectively applied to the low-light enhancement task [18].

There are also methods that combine different enhancing strategies. Fu et al. [5] propose a mixture method: after estimating the illumination layer, they generate multiple enhanced results and fuse them with a multi-scale pyramid model, which combines the strengths of several enhancing models. Lim et al. [20] propose to split an image into structure, texture and noise components, and use a 2D-histogram-based scheme to enhance low-light images.

Of note, with the rapid development of deep learning, it is also possible to train a deep network for the low-light enhancement task. Lore et al. [22] propose a deep autoencoder-based approach to identify signal features from low-light images and adaptively brighten them without over-amplifying the lighter parts of an image.

3 Proposed method

In this section, we present our low-light enhancement model with a refined illumination map. The general framework is shown in Fig. 2. Given an input image, we first estimate the illumination map with the max-RGB technique. We then refine this initial estimate through a self-guided filtering process, which first removes fine textures and then iteratively recovers edges under a rolling guidance. Based on the refined map, the image is enhanced with a simplified Retinex model.

3.1 Simplified Retinex model

The Retinex model represents an observed low-light image \( \mathbf{I} \in \mathbb{R}^{N_1 \times N_2} \) as a pixel-wise multiplication:

$$ \mathbf{I} = \mathbf{R} \odot \mathbf{T} $$
(1)

where \( \mathbf{R} \in \mathbb{R}^{N_1 \times N_2} \) is the reflectance layer and \( \mathbf{T} \in \mathbb{R}^{N_1 \times N_2} \) is the illumination map. This model assumes that the scene sensed by the human visual system is the product of R and T. Specifically, R is the image under an ideal light condition and is considered the enhanced result, while T is the illumination map that controls how strongly the image intensity is lowered.

As the problem in Eq. 1 is ill-posed, methods based on full intrinsic decomposition [6] can be adopted. Nevertheless, such a process is often time-consuming. In our research, we use the simplified model advocated in [7], which directly recovers R through an element-wise division R = I/T.

We note that the direct element-wise division can be numerically unstable in case of very low T values. Therefore, to enhance a color image, a constant regularization term ϵ is added to the enhancement model for each channel:

$$ \mathbf{R}^{c}(p) = \mathbf{I}^{c}(p) / \left( \mathbf{T}(p) + \epsilon \right) $$
(2)

where c is one of the RGB channels, i.e. c ∈ {R, G, B}, and p denotes a pixel of the image. This model is extremely simple and fast. To obtain satisfying results, the key is to further estimate an appropriate illumination map.
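
For illustration, Eq. 2 amounts to a per-channel division. The following minimal NumPy sketch (our reference implementation is un-optimized Matlab code; this Python version is only illustrative and assumes a float RGB image in [0, 1] with an illumination map of the same spatial size):

```python
import numpy as np

def retinex_enhance(img: np.ndarray, T: np.ndarray, eps: float = 0.15) -> np.ndarray:
    """Simplified Retinex enhancement of Eq. 2: R^c(p) = I^c(p) / (T(p) + eps)."""
    R = img / (T[..., None] + eps)   # broadcast T over the RGB channel axis
    return np.clip(R, 0.0, 1.0)      # keep the result in the displayable range
```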

3.2 Illumination map refinement

To initially estimate each element T(p) of the illumination map, we adopt the Max-RGB technique:

$$ \mathbf{T}(p) = \max_{c \in \{R, G, B\}} \mathbf{I}^{c}(p) $$
(3)

This equation assumes that the illumination at each pixel is at least the maximum value among its three color channels. Although there are more accurate methods [6, 29] for estimating the illumination map, we still choose the Max-RGB technique in our model for two reasons. First, the Max-RGB technique is extremely simple and fast. Second, the inaccuracy of the estimation can be left to the following refinement stage [7], which produces a better T for the low-light enhancement task.
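
A minimal sketch of Eq. 3, again assuming a float RGB array of shape (H, W, 3):

```python
import numpy as np

def estimate_illumination(img: np.ndarray) -> np.ndarray:
    """Max-RGB initial estimate of Eq. 3: T(p) = max over the R, G, B values at p."""
    return img.max(axis=2)
```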

From Eq. 3, we can see that T is estimated in a pixel-wise manner, producing an illumination map that is quite similar to the original image in terms of its local structure and texture pattern. As a consequence, the texture patterns of a patch in I can be flattened out by T during the division in Eq. 2, resulting in detail loss (such as the red rectangle of R_0 in Fig. 3). There are simple ways of refining the initial estimate, such as further applying a block-wise mean filter to T, and it may seem that the original pattern would be preserved by the smoothed T. However, we note that such smoothing is likely to introduce artifacts, such as the edge reversal effect shown in the yellow rectangles of R_1 and R_2 in Fig. 3.

Fig. 3
figure 3

A toy example of the difference between the initially estimated illumination map (G_0) and its refined versions during the iterative filtering (G_1 to G_3), as well as their enhanced results (R_0 to R_3)

Therefore, in our research we prefer the illumination map to be both piecewise-constant and structure-aware. Specifically, the texture patterns within the same semantic region of T should be removed, while the boundaries between different regions should be preserved. To this end, we refine the initially estimated T with an iterative self-guided filtering model [30], which consists of the following two stages.

The first stage is texture removal. We convolve T with a Gaussian kernel parameterized by a scale factor σ_s:

$$ \mathbf{T}_g(p) = \frac{1}{K_p} \sum_{q \in N(p)} \exp\left( -\frac{\| p - q \|^2}{2\sigma_s^2} \right) \mathbf{T}(q) $$
(4)

where ||p − q|| is the Euclidean distance between two pixels, \( K_p = \sum_{q \in N(p)} \exp\left( -\frac{\| p - q \|^2}{2\sigma_s^2} \right) \) is the normalization factor, and N(p) is the neighborhood patch around p. The filtering process can be seen as a Gaussian weighted average controlled by the spatial parameter σ_s. In this way, image details whose spatial scale is below σ_s are removed. However, as the boundaries of the main structures in T are also blurred by the Gaussian filtering, we need a second stage that restores the edges from T_g and T.
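
The texture-removal stage of Eq. 4 amounts to a Gaussian blur of T. The sketch below uses OpenCV's standard Gaussian filter, which is equivalent to the explicit weighted sum up to boundary handling; the kernel size covering roughly three standard deviations is our illustrative choice, not something specified by Eq. 4:

```python
import cv2
import numpy as np

def remove_texture(T: np.ndarray, sigma_s: float) -> np.ndarray:
    """Gaussian-weighted average of Eq. 4: suppresses details smaller than sigma_s."""
    ksize = 2 * int(np.ceil(3 * sigma_s)) + 1           # odd kernel covering ~±3 sigma
    return cv2.GaussianBlur(T.astype(np.float32), (ksize, ksize), sigma_s)
```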

The second stage is the iterative edge recovery. Information-theoretically, it is impossible to recover the blurred edges from T_g alone. We therefore adopt the idea of joint filtering [2, 12], which leverages multiple image sources, to address this issue. Specifically, we anchor T as the input, and set the blurred T_g as the initial guidance G_0. By applying the joint filter \( \mathcal{F}(\mathbf{T}, \mathbf{G}_0) \), we obtain \( \mathbf{G}_1 = \mathcal{F}(\mathbf{T}, \mathbf{G}_0) \), which partially recovers the main edge structures (e.g. G_1 in Fig. 3). We then use G_1 as the updated guidance for the next round of filtering, \( \mathbf{G}_2 = \mathcal{F}(\mathbf{T}, \mathbf{G}_1) \). We empirically found that G_k quickly converges to a piecewise-constant map after a few iterations (e.g. the second row in Fig. 3). In our research, we choose the fast guided filter [11] as the joint filter to ensure the efficiency of the whole refinement process. Of note, this iterative edge recovery strategy (illustrated within the dotted line in Fig. 2) differs from the traditional joint filtering framework, since it gradually refines the guidance G while fixing the input image T [30]. Based on the refined illumination map, we obtain the enhanced image as:

$$ \mathbf{R}^{c}(p) = \mathbf{I}^{c}(p) / \left( \mathbf{G}_k(p) + \epsilon \right) $$
(5)

where c is one channel of an RGB color image, and p denotes a pixel. We provide a toy example in Fig. 3 to demonstrate the effectiveness of our iterative edge recovery. From G_0 to G_3, we observe that the fine-scale texture is smoothed and the region boundaries become clear. Meanwhile, the corresponding enhanced results R_0 to R_3 gradually improve, as the texture patterns are preserved and the halo effect disappears.
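
Putting the pieces together, the sketch below (building on the estimate_illumination and remove_texture helpers above) illustrates the iterative self-guided refinement and the final division of Eq. 5. It uses a plain box-filter guided filter rather than the fast variant [11] that we actually employ, and the window radius and the guided-filter regularizer gf_eps are illustrative assumptions rather than values from our experiments:

```python
import cv2
import numpy as np

def box(x: np.ndarray, r: int) -> np.ndarray:
    """Normalized box (mean) filter over a (2r+1) x (2r+1) window."""
    return cv2.boxFilter(x, -1, (2 * r + 1, 2 * r + 1))

def guided_filter(guide: np.ndarray, src: np.ndarray, r: int, gf_eps: float) -> np.ndarray:
    """Guided filter of He et al.: the output is a locally linear transform of `guide`."""
    mean_g, mean_s = box(guide, r), box(src, r)
    cov_gs = box(guide * src, r) - mean_g * mean_s
    var_g = box(guide * guide, r) - mean_g * mean_g
    a = cov_gs / (var_g + gf_eps)
    b = mean_s - a * mean_g
    return box(a, r) * guide + box(b, r)

def refine_illumination(T: np.ndarray, sigma_s: float, k: int = 3,
                        gf_eps: float = 1e-2) -> np.ndarray:
    """Texture removal followed by k rounds of self-guided edge recovery."""
    G = remove_texture(T, sigma_s)            # blurred initial guidance G_0
    r = max(int(round(sigma_s)), 1)           # tie the window radius to the scale
    for _ in range(k):                        # G_{i+1} = F(T, G_i): T is the fixed input
        G = guided_filter(G, T, r, gf_eps)
    return G

def enhance(img: np.ndarray, sigma_s: float, eps: float = 0.15, k: int = 3) -> np.ndarray:
    """Full pipeline: max-RGB estimate, self-guided refinement, division of Eq. 5."""
    T = estimate_illumination(img).astype(np.float32)
    G_k = refine_illumination(T, sigma_s, k)
    return np.clip(img / (G_k[..., None] + eps), 0.0, 1.0)
```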

The complexity of our model is as follows. Suppose N is the number of image pixels (N = N_1 × N_2). The max-RGB step takes O(3N) time. The texture removal step can also be realized in O(N) time, regardless of its scale parameter σ_s, by using the box filtering technique. The iterative guided filtering with k iterations needs O(k · N/σ_s) time when using the fast guided filtering technique [11]. Finally, Eq. 5 is applied to the RGB channels respectively, incurring an O(3N) computational cost. The overall complexity is thus on the order of O(N), which is comparable to other state-of-the-art models.

4 Experiments and analysis

We conduct the experiments on a laptop with a 2.6 GHz CPU and 8 GB RAM. Our method was implemented with un-optimized code on the Matlab platform in a single thread. We use experimental data from [7, 18] and the Internet, which covers different low-light conditions such as nighttime, backlight and sidelight.

We first validate the effectiveness of using the refined illumination map. In Fig. 4, we can see that texture details are lost (e.g. Fig. 4 (g)) if the initial T (Fig. 4 (c)) is not refined. By contrast, our result (Fig. 4 (d)) preserves the details (e.g. Fig. 4 (i)), as the refined map (Fig. 4 (e)) becomes piecewise-smooth (e.g. Fig. 4 (j)). We then compare different strategies of illumination map refinement in Fig. 5, where we respectively use LIME [7], RTV [27] and our self-guided filtering to refine T. The simplified Retinex model in Eq. 2 is used for all three models, so the only difference lies in the refined maps. From the visual comparison, we observe that our method achieves the best visual consistency. As shown in the zoomed-in region in Fig. 5, our result has better naturalness than the other two models, since our refinement process keeps referring to the homogeneous guidance image and is therefore more structure-aware with respect to complex local patterns.

Fig. 4
figure 4

Results generated by models with/without the refined illumination map. (a) is the original image. (c) and (e) are the illumination maps obtained directly from max-RGB and with our refinement, respectively. (b) and (d) are their corresponding results. (f)-(j) are zoomed-in regions from (a)-(e), marked with corresponding colors

Fig. 5
figure 5

Results generated by different illumination maps based on RTV [27], LIME [7] and ours

In the experimental comparison with other methods, the parameters of our method are set as follows. We empirically set the numerical stability parameter ϵ to 0.15, which works well for all the experimental images under different low-light conditions. For the illumination map refinement, we empirically set the scale parameter σ_s to around 1/250 of the minimum of the image width and height. In experiments, we found three iterations (k = 3) to be sufficient.
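
For reference, a hypothetical invocation of the sketch from Section 3 with these settings would look as follows (the file names are placeholders only):

```python
import cv2
import numpy as np

img = cv2.imread("lowlight.png").astype(np.float32) / 255.0   # BGR order; max-RGB is order-agnostic
h, w = img.shape[:2]
sigma_s = min(h, w) / 250.0                                   # scale parameter
out = enhance(img, sigma_s=sigma_s, eps=0.15, k=3)            # the paper's settings
cv2.imwrite("enhanced.png", (out * 255.0).astype(np.uint8))
```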

We first compare our method with several traditional image enhancement methods, namely Histogram Equalization (HE), Adaptive Histogram Equalization (AHE), and Gamma Correction (GC). In the implementation, we directly use the Matlab toolbox to realize HE and AHE. For GC, we empirically set γ_GC = 0.8. We choose one nighttime, one backlight and one sidelight image, and show all the results in Fig. 6. We have the following observations. First, HE and AHE produce over-enhanced results and are also sensitive to image noise. Second, the results of GC are more natural and robust to noise. However, since GC only imposes a global non-linear mapping function on all pixels, the enhancing effects are less apparent. Third, our results generally have a better global correcting effect, recovering details from the dark regions while keeping the naturalness of the already bright regions.

Fig. 6
figure 6

Visual comparison with traditional low-light enhancing methods. (Best viewed enlarged)

We then compare our method with several state-of-the-art methods, i.e. LIME [7], Multi-scale Fusion (MF) [5], and Layered Difference Representation (LDR) [18]. In this comparison, we use the default parameter settings in [5, 7, 18]. Of note, we do not apply any post-denoising technique to the results, since it tends to generate an unrealistic cartoon effect. Fig. 7 shows the visual comparison between these state-of-the-art models and ours. We can see that our model achieves the most balanced effect in terms of contrast enhancement, detail recovery and naturalness preservation. For example, our method generates fewer artifacts, such as noise and edge halos in the nighttime and backlight images, than LIME and MF. LDR is most robust to these artifacts, but the global luminance of its results is relatively weak, which limits the detail recovery. As for the running time, our method is comparable to LIME and MF, and is able to process 500 K pixels in less than one second. The histogram based LDR achieves the fastest running time, as its computational load only lies in the optimization of a 256-dimensional histogram.

Fig. 7
figure 7

Visual comparison with several state-of-the-art low-light enhancing methods. (Best viewed enlarged)

Finally, we make a further comparison between LIME and our model. Since the refined illumination map in LIME needs a post gamma correction, we show multiple versions of LIME-based results with different γ values (from 0.5 to 1) alongside our results in Fig. 8. We observe that the global lightness of the LIME results depends heavily on the gamma correction parameter. With a large γ, the noise is over-boosted, which lowers both the image quality and the naturalness. In contrast, our model is free of this extra controlling parameter, as the overall intensity distribution of G_k is always constrained by the self-guided framework.

Fig. 8
figure 8

Results generated by the LIME model with different illumination maps and our model. (Best viewed enlarged)

5 Conclusion and discussion

In this paper, we propose a simple but effective low-light enhancement model, which uses a piecewise-constant illumination map to recover the concealed image details. To produce such a map, we apply an iterative self-guided filter to separate texture patterns from the initial estimate. We have validated our method by comparing it with several traditional and state-of-the-art methods. Experimental results demonstrate that our method is effective in recovering image details while keeping the visual naturalness.

Future research includes the following aspects. First, we plan to introduce a dynamic regularization term into the model in Eq. 2. For example, a spatially guided map [10] based on lightness can be built and used to guide the regularization. Second, we note that quantitatively assessing low-light enhancement results is still an open problem. It would be valuable to introduce an aesthetics evaluation model [14] specifically designed for our task. Third, aiming at content-aware enhancement results, it would be very useful to introduce high-level semantic information [32, 33] into our model.