1 Introduction

Due to inadequate illumination, images captured at night suffer from various kinds of noise and a loss of visibility. This issue affects image acquisition in photography, forensics, analysis, monitoring, and other optical imaging systems. Most professional and portable cameras widely used in vision systems cannot capture satisfactory nighttime images. Nighttime image enhancement techniques, which aim to increase the interpretability of images by raising their brightness, are therefore highly desired in both consumer photography and computer vision tasks [1, 2]. The problem has become a popular and challenging research topic and has been extensively studied in the last decade.

A variety of nighttime image enhancement methods have been proposed in the past few years. For single low-light image enhancement, many impressive algorithms exist, such as classical image processing methods (histogram equalization, gamma correction, and tone mapping [3]) and their variants [4]. Retinex is widely used in image contrast enhancement; for example, Li [5] and Liu [6] obtained impressive results with retinex variants. Based on the observation that a pixel-wise inversion of a nighttime image looks quite similar to an image acquired on a foggy day, Dong et al. [7] introduced a single-image haze removal algorithm into nighttime image enhancement; in essence, it is also a variant of retinex. Following this work, several similar methods [8, 9] were proposed to improve the performance of the dark channel prior [10].

It is important to note that noise must be taken into account in night photography. A nighttime image contains many noise sources, such as readout noise, photon shot noise, dark current, and fixed-pattern noise, in addition to photo response non-uniformity [11]. The noise level is amplified when lightness and contrast are boosted, especially in compressed images. To obtain more satisfactory results, a noise suppression step is strongly needed. Among the methods above, Zhang [8] smoothed the results with a bilateral filter, while the other authors do not address noise removal. The complexity of noise in nighttime images makes commonly used image smoothing methods perform poorly.

Fig. 1 An example of the proposed nighttime image enhancement

In this paper, we propose a framework that introduces structure–texture image decomposition into nighttime image enhancement. Structure–texture decomposition [12] is widely used in many areas of computer graphics. Here, we use it to decompose the input image into a structure layer and a texture layer. We then process the two components separately, with an improved retinex method and with mask-weighted least squares (WLS) [13], and obtain the result through an image fusion operation. Figure 1 shows an example that demonstrates the effectiveness of the proposed method: the visual quality is largely improved, and noise and artifacts are clearly reduced. We also conduct extensive experiments on different types of nighttime images. The experimental results demonstrate that the proposed approach improves the perceptual quality of nighttime images: it not only enhances visibility and contrast, but also effectively suppresses noise and artifacts and avoids over-enhancement.

2 Related work

A variety of approaches have been proposed to enhance the quality of nighttime images. However, some traditional approaches require additional information such as an auxiliary input image. These methods usually take two or more images as input, e.g., a nighttime image with an additional infrared image [14], images taken under different illumination conditions in the same scene [15], or a sequence of images of the same scene [16]. Taking practicality and universality into account, enhancement methods for a single night image are more desirable. Here, we focus on related work on single night image enhancement, retinex, and image denoising.

2.1 Retinex and its varieties

Retinex is a commonly used image enhancement model grounded in experimental observation and analysis. A natural image is represented by the formula:

$$\begin{aligned} I(x) = R(x)\cdot L(x), \end{aligned}$$
(1)

where x is the pixel index, L(x) is the illumination, which determines the dynamic range each pixel can reach, R(x) is the reflectance, which captures the intrinsic properties of the scene, and I(x) is the captured image. Recovering R(x) from I(x) alone is an ill-posed problem. Usually, R(x) is estimated using single-scale retinex (SSR), multi-scale retinex (MSR), or one of their variants [17].
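For illustration, below is a minimal single-scale retinex sketch in NumPy: the illumination L(x) is approximated by a Gaussian-blurred version of the image, and the reflectance is recovered in the log domain following Eq. 1. The function name and the parameter values (sigma, eps) are illustrative choices and are not taken from [17].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=80.0, eps=1e-6):
    """SSR sketch: R = log I - log(G_sigma * I), a log-domain version of Eq. (1).

    img: single-channel float array in [0, 1]; sigma/eps are illustrative values.
    """
    illumination = gaussian_filter(img, sigma)               # rough estimate of L(x)
    reflectance = np.log(img + eps) - np.log(illumination + eps)
    reflectance -= reflectance.min()                         # stretch back to [0, 1] for display
    return reflectance / (reflectance.max() + eps)
```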

Dong et al. [7] noticed that a pixel-wise inversion of a night image has a histogram distribution quite similar to that of a foggy image. They used a Chi-square test to examine the statistical similarity between hazy videos and inverted low-light and high dynamic range videos, and the result indicates that introducing a haze removal algorithm into night image enhancement is reasonable. Following their work, Jiang [9] also obtained impressive results, and Zhang [8] reduced noise in the results by adding a bilateral filter.

This kind of method has a clear physical meaning and is easy to implement. Following a similar intuition, the image degradation model can be formulated as follows:

$$\begin{aligned} \dot{I}(x) = \dot{J}(x)\cdot t(x) + A(1-t(x)), \end{aligned}$$
(2)

where x is the pixel index, \(\dot{I}(x) = 1 - I(x)\) is the inverted night image and I(x) is the input night image, \(J(x) = 1 - \dot{J}(x)\) is the desired scene radiance, t(x) is the medium transmission, which is related to the distance from the scene point at pixel x to the camera, and A is the global illumination.

The dark channel prior [10] assumes that, in a local patch, at least one pixel has a dark color channel. Using this minimal value as an estimate of the haze, we can obtain t(x) effortlessly. Since there are always some dark regions in the input image I(x), we can reasonably assume \(A = 1\). Then, the enhanced result is given by the following equation:

$$\begin{aligned} J(x) = I(x)/t(x). \end{aligned}$$
(3)

Note that Eq. 3 is almost identical to retinex if a logarithmic operation is applied. This observation gives this family of methods a physical meaning rather than leaving them justified only by statistical analysis.
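As a rough illustration, the following sketch combines Eqs. 2 and 3: the input is inverted, the transmission t(x) is estimated with the dark channel prior [10], and the result is obtained as I(x)/t(x) under the assumption A = 1. The patch size, the softening factor omega, and the lower bound on t(x) are our own illustrative choices, not values prescribed by [7–9].

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze_based_enhance(img, patch=15, omega=0.8, t_min=0.1):
    """Inversion-based nighttime enhancement (Eqs. 2-3) with A = 1.

    img: H x W x 3 float array in [0, 1]; patch/omega/t_min are illustrative.
    """
    inverted = 1.0 - img                                       # inverted night image, Eq. (2)
    dark = minimum_filter(inverted.min(axis=2), size=patch)    # dark channel prior [10]
    t = np.clip(1.0 - omega * dark, t_min, 1.0)                # transmission estimate
    return np.clip(img / t[..., None], 0.0, 1.0)               # J(x) = I(x)/t(x), Eq. (3)
```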

2.2 Image denoising

A large number of noise removal algorithms have been developed for well-lit images and achieve satisfactory performance. Among edge-preserving smoothing methods, the bilateral filter (BF) [18], the guided image filter (GIF) [19], and L0 smoothing [20] yield excellent results. As a nonlinear, edge-preserving, noise-reducing smoothing filter, BF computes weights based on both the Euclidean distance between pixels and their radiometric differences. The BF is defined in Eq. 4:

$$\begin{aligned} I_s (x) = \frac{1}{W_p}\sum _{y\in \Omega (x)}I(y)f_r(||I(y)-I(x)||)f_d(||y-x||), \end{aligned}$$
(4)

where \(W_p\) is a normalization term.

$$\begin{aligned} W_p = \sum _{y\in \Omega (x)}f_r(||I(y)-I(x)||)f_d(||y-x||), \end{aligned}$$
(5)

where \(f_r\) is the range kernel for smoothing differences in intensities, \(f_d\) is the spatial kernel for smoothing differences in coordinates, \(\Omega (x)\) is the filter window centered at x, I(x) is the input image, and \(I_s(x)\) is the filtered result. \(f_r\) and \(f_d\) are usually Gaussian functions. Given an appropriate filter window size and Gaussian parameters, BF achieves satisfactory results.
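A brute-force implementation of Eqs. 4 and 5 is sketched below for clarity (real implementations use faster approximations). The window radius and the two Gaussian widths are illustrative values.

```python
import numpy as np

def bilateral_filter(img, radius=5, sigma_d=3.0, sigma_r=0.1):
    """Brute-force bilateral filter following Eqs. (4)-(5).

    img: 2-D float array; sigma_d/sigma_r are the spatial/range Gaussian widths.
    """
    h, w = img.shape
    padded = np.pad(img, radius, mode='reflect')
    ax = np.arange(-radius, radius + 1)
    yy, xx = np.meshgrid(ax, ax, indexing='ij')
    f_d = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_d ** 2))   # spatial kernel f_d

    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            f_r = np.exp(-(window - img[i, j]) ** 2 / (2.0 * sigma_r ** 2))  # range kernel f_r
            weights = f_r * f_d
            out[i, j] = np.sum(weights * window) / np.sum(weights)  # Eq. (4) with W_p from Eq. (5)
    return out
```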

Fig. 2 An overview of the proposed method. First, the input image is decomposed into a structure layer and a texture layer. The structure layer is then enhanced using the improved retinex method, while a mask map derived from the structure layer guides the smoothing of the texture layer with weighted least squares (WLS) optimization. Finally, the two layers are recombined to obtain the final result. The texture layer and refined texture layer are amplified here for visualization

GIF also performs edge-preserving smoothing. It takes an additional image as a guide to steer the filtering and assumes that the filter output is a linear transform of the guidance image within each local window:

$$\begin{aligned} I_s (y) = a_x G(y) + b_x, \end{aligned}$$
(6)

where \((a_x, b_x)\) are linear coefficients assumed to be constant in the local window centered at x, and G(y) is the guidance image. We obtain \((a_x, b_x)\) by minimizing the following cost function over the window:

$$\begin{aligned} E(a_x, b_x) = \sum _{y\in \Omega (x)}((a_x G(y) + b_x - I(y))^2 + \sigma a_x^2). \end{aligned}$$
(7)

Here, I(y) is the input image and \(\sigma \) is a regularization parameter preventing \(a_x^2\) from being too large. The linear model in Eq. 6 is applied to all local windows in the entire image to obtain the output image. Note that the size of the filter window needs to be chosen carefully.
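For reference, a compact sketch of the guided filter following Eqs. 6 and 7 is given below, using box filters to compute the local means; the window radius and the regularization sigma are illustrative values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, img, radius=8, sigma=1e-3):
    """Guided image filter [19] following Eqs. (6)-(7); radius/sigma are illustrative.

    guide, img: 2-D float arrays of the same shape.
    """
    mean = lambda x: uniform_filter(x, 2 * radius + 1)   # box filter over the local window
    mean_g, mean_i = mean(guide), mean(img)
    cov_gi = mean(guide * img) - mean_g * mean_i
    var_g = mean(guide * guide) - mean_g * mean_g
    a = cov_gi / (var_g + sigma)                         # minimizer of Eq. (7)
    b = mean_i - a * mean_g
    return mean(a) * guide + mean(b)                     # Eq. (6), averaged over overlapping windows
```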

Besides, weighted least squares (WLS) [13] and edge-avoiding wavelets (EAW) [21] also perform edge-preserving smoothing well. Given an input image, WLS seeks a new image that is as close as possible to the input while being as smooth as possible everywhere except across significant gradients. EAW filters an input image with nonlinear multi-scale edge-avoiding filters in time linear in the image size. The work in [22] focuses on noise in high dynamic range images.
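Since WLS is reused later in our pipeline, a small sketch of it is given here: the smoothed image u solves (I + L)u = g, where L is a Laplacian whose edge weights shrink across strong gradients. For simplicity, the weights below are computed from the gradients of the input itself rather than from the log-luminance used in [13], and lam, alpha, and eps are illustrative settings.

```python
import numpy as np
from scipy.sparse import coo_matrix, identity
from scipy.sparse.linalg import spsolve

def wls_smooth(img, lam=1.0, alpha=1.2, eps=1e-4):
    """WLS edge-preserving smoothing sketch [13]: solve (I + L) u = g."""
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []

    def add_edges(p, q, diff):
        # weights are small across strong edges, so smoothing stops there
        wgt = lam / (np.abs(diff) ** alpha + eps)
        rows.append(np.concatenate([p, q, p, q]))
        cols.append(np.concatenate([p, q, q, p]))
        vals.append(np.concatenate([wgt, wgt, -wgt, -wgt]))

    add_edges(idx[:, :-1].ravel(), idx[:, 1:].ravel(),       # horizontal neighbours
              (img[:, 1:] - img[:, :-1]).ravel())
    add_edges(idx[:-1, :].ravel(), idx[1:, :].ravel(),       # vertical neighbours
              (img[1:, :] - img[:-1, :]).ravel())

    L = coo_matrix((np.concatenate(vals),
                    (np.concatenate(rows), np.concatenate(cols))), shape=(n, n)).tocsr()
    u = spsolve(identity(n, format='csr') + L, img.ravel())
    return u.reshape(h, w)
```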

As one of the state-of-the-art denoising algorithms, BM3D [23] generally performs well. BM3D processes an input image in a three-dimensional space built from groups of similar adjacent blocks. It achieves excellent performance in most cases when noise removal algorithms are evaluated with the peak signal-to-noise ratio (PSNR), but its time complexity is high. Moreover, it is difficult to find suitable blocks with similar appearance and visibility in a nighttime image.

3 Nighttime image enhancement

3.1 Overview

When dealing with nighttime images, the influence of noise on the results must be taken into account. Hence, we present an image decomposition-based nighttime image enhancement method. To achieve this goal, we model the input image as

$$\begin{aligned} I(x) = I_\mathrm{S}(x) + I_\mathrm{T}(x). \end{aligned}$$
(8)

Here, x is the pixel index, I(x) is the original input image, \(I_\mathrm{S}(x)\) is the structure layer associated with larger gradient magnitudes, and \(I_\mathrm{T}(x)\) is the texture layer that contains details, noise, and artifacts.

Our pipeline is illustrated in Fig. 2. The proposed method begins by decomposing the original input image into two layers: a structure layer and a texture layer. We enhance the structure layer using an improved retinex method and smooth the texture layer with the weighted least squares (WLS) [13] method. To preserve details while suppressing noise, we weight the texture layer with a mask map extracted from the structure layer. Finally, we fuse the enhanced structure layer and the refined texture layer to obtain the final result.

$$\begin{aligned} \min _{I_\mathrm{S}} \sum _{x}{(I(x) - I_\mathrm{S}(x))^2 + \lambda \vert \nabla { I_\mathrm{S}(x)\vert }}. \end{aligned}$$
(9)

To obtain the structure layer, we minimize the objective function in Formula 9 [24], which is based on total variation regularization for image reconstruction. The first term in Formula 9 makes \(I_\mathrm{S}(x)\) retain the main structure of I(x), and the second term makes \(I_\mathrm{S}(x)\) as smooth as possible locally. The key parameter \(\lambda \) can be adjusted to effectively control the coarseness of the extracted structure, and \(\nabla \) denotes the gradient operator. The non-smooth second term makes the two-term model hard to optimize directly, since it couples the pixel-wise difference with the sparsity of the gradients; traditional gradient descent and other standard optimization methods are not directly applicable here. Hence, we introduce auxiliary variables h(x) and v(x), corresponding to the horizontal gradient \(\partial _{h}I_\mathrm{S}(x)\) and the vertical gradient \(\partial _{v}I_\mathrm{S}(x)\), respectively, and rewrite the objective function as

$$\begin{aligned} \begin{aligned} \min _{I_\mathrm{S},h,v}&\sum _{x}{ (I(x)-I_\mathrm{S}(x))^2 + \lambda (\vert {h(x)\vert }+\vert {v(x)\vert }) }\\&+\beta {((h(x)-\partial _{h}I_\mathrm{S}(x))^2+(v(x)-\partial _{v}I_\mathrm{S}(x))^2) }. \end{aligned} \end{aligned}$$
(10)

For clarity, we describe the solver, which uses an alternating optimization strategy, in Algorithm 1. Here, \(\mathcal {F}\) is the fast Fourier transform (FFT) operator and \(\mathcal {F}^*\) denotes its complex conjugate. \(\beta _0\) is the initial value of \(\beta \), which controls the smoothness, \(\beta _\mathrm{max}\) is the termination threshold, and \(\sigma \) controls the rate of convergence. In this paper, we set \(\lambda = 0.015\), \(\beta _0 = 2\lambda \), \(\beta _\mathrm{max} = 1E5\), and \(\sigma = 2\) by experiment. This operation extracts the main structure from the input image, as shown in Fig. 3.

Algorithm 1 Alternating optimization for Eq. 10
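Algorithm 1 itself is not reproduced here, so the sketch below implements the splitting of Eq. 10 as written: the (h, v) subproblem reduces to a soft-shrinkage of the gradients, and the \(I_\mathrm{S}\) subproblem is a quadratic solved in the Fourier domain (the \(\mathcal {F}\) and \(\mathcal {F}^*\) operators above). Parameter defaults follow the settings given in the text; the circular boundary handling is our own implementation choice.

```python
import numpy as np

def psf2otf(psf, shape):
    """Zero-pad a small kernel to `shape` and centre it at the origin before the FFT."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    for axis, s in enumerate(psf.shape):
        pad = np.roll(pad, -(s // 2), axis=axis)
    return np.fft.fft2(pad)

def extract_structure(img, lam=0.015, beta_max=1e5, sigma=2.0):
    """Alternating solver for Eq. (10) on one channel (a sketch of Algorithm 1)."""
    S = img.astype(np.float64)
    beta = 2.0 * lam                                        # beta_0 = 2 * lambda
    otf_x = psf2otf(np.array([[1.0, -1.0]]), S.shape)
    otf_y = psf2otf(np.array([[1.0], [-1.0]]), S.shape)
    denom_grad = np.abs(otf_x) ** 2 + np.abs(otf_y) ** 2
    F_img = np.fft.fft2(S)                                  # F(I), computed once

    while beta < beta_max:
        # (h, v) subproblem: soft-shrinkage of circular forward differences
        gh = np.hstack([np.diff(S, axis=1), S[:, :1] - S[:, -1:]])
        gv = np.vstack([np.diff(S, axis=0), S[:1, :] - S[-1:, :]])
        h = np.sign(gh) * np.maximum(np.abs(gh) - lam / (2.0 * beta), 0.0)
        v = np.sign(gv) * np.maximum(np.abs(gv) - lam / (2.0 * beta), 0.0)
        # I_S subproblem: least squares solved with the FFT
        div = np.hstack([h[:, -1:] - h[:, :1], -np.diff(h, axis=1)]) \
            + np.vstack([v[-1:, :] - v[:1, :], -np.diff(v, axis=0)])
        S = np.real(np.fft.ifft2((F_img + beta * np.fft.fft2(div))
                                 / (1.0 + beta * denom_grad)))
        beta *= sigma
    return S  # structure layer; the texture layer is img - S (Eq. 8)
```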
Fig. 3 Results of structure–texture decomposition. Input images are shown in the top row. The structure layer (middle row) consists of objects with larger gradient magnitudes, while the texture layer (bottom row) contains details, noise, and artifacts

Fig. 4 Denoising results of different smoothing approaches. Input texture images are shown on the left, followed by the results of BF, GIF, EAW, BM3D, and mask-weighted WLS

3.2 Improved retinex enhancement

We design an improved retinex enhancement algorithm to increase the brightness of the structure layer. The structure layer is modeled as in Eq. 11:

$$\begin{aligned} I_\mathrm{S} (x) = I_\mathrm{R} (x)\cdot I_\mathrm{L} (x). \end{aligned}$$
(11)

Here, \(I_\mathrm{R} (x)\) represents the reflectance and \(I_\mathrm{L} (x)\) the illumination. The reflectance captures the intrinsic properties of the objects in the image, whereas the illumination at night is usually non-uniform. Traditional methods solve this ill-posed problem with single-scale or multi-scale retinex [17].

Instead of using the logarithmic operation and Gaussian filter of SSR and MSR, we estimate \(I_\mathrm{L} (x)\) through a series of nonlinear operations to obtain a more reliable illumination map, and we suppress the halo effect inherent in the retinex model with an edge-preserving smoothing method.

Due to multiple light sources and the rapid decay of light, illumination in nighttime images is non-uniform in most cases. Motivated by the dark channel prior, we design a simple algorithm to obtain an approximate illumination map. First, we take the brightest channel of the structure layer:

$$\begin{aligned} I_\mathrm{M} (x) = \max _{c\in (r,g,b)} I_\mathrm{S}^c (x), \end{aligned}$$
(12)

where (r, g, b) denotes the three color channels of the image.

Illumination should be consistent and smooth within a local region. Hence, we make the illumination map as smooth as possible: we first smooth \(I_\mathrm{M} (x)\) with a median filter and then remove the remaining details.

$$\begin{aligned} I_\mathrm{L} (x) = \frac{1}{N}\sum _{i=1}^{N}\mathrm{GIF}_i (M(x)). \end{aligned}$$
(13)

Here, \(M(\cdot )\) denotes the median filter, \(\mathrm{GIF}(\cdot )\) is the guided image filter, and N is the number of GIF kernels. In this work, we set the median filter kernel size to 15 and use multi-scale GIF kernels to make the illumination map texture free; specifically, we use three kernel sizes (N = 3, with sizes 3, 9, and 15) to generate the results.

Substituting the illumination map \(I_\mathrm{L} (x)\) into Eq. 11, we obtain the enhanced structure layer

$$\begin{aligned} I_\mathrm{E} (x) = \frac{I_\mathrm{S} (x)}{I_\mathrm{L} (x) + \epsilon }. \end{aligned}$$
(14)

Here, \(\epsilon \) is a small positive value that keeps the fraction meaningful when \(I_\mathrm{L} (x) = 0\); in this paper, we fix it to 0.01.
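Putting Eqs. 12–14 together, the enhancement of the structure layer can be sketched as follows. It reuses the guided_filter sketch from Sect. 2.2; we assume that the median-filtered map serves as its own guide and that a GIF kernel of size k corresponds to a radius of k//2, neither of which is stated explicitly in the text.

```python
import numpy as np
from scipy.ndimage import median_filter

def enhance_structure_layer(structure, eps=0.01):
    """Improved retinex enhancement of the structure layer (Eqs. 12-14).

    structure: H x W x 3 float array in [0, 1].
    Returns (enhanced structure layer, illumination map).
    """
    bright = structure.max(axis=2)                       # brightest channel, Eq. (12)
    m = median_filter(bright, size=15)                   # median filter with kernel size 15
    # multi-scale GIF average over kernel sizes 3, 9 and 15 (N = 3), Eq. (13);
    # guided_filter is the sketch from Sect. 2.2, self-guided here (our assumption)
    illumination = np.mean(
        [guided_filter(m, m, radius=k // 2, sigma=1e-3) for k in (3, 9, 15)], axis=0)
    enhanced = np.clip(structure / (illumination[..., None] + eps), 0.0, 1.0)  # Eq. (14)
    return enhanced, illumination
```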

3.3 Noise and artifact suppression

Details, noise, and artifacts are mixed together in the texture layer, so it is difficult to select an appropriate kernel scale for noise and artifact removal with image smoothing filters. Simply strengthening the noise reduction is not suitable either, since details would be removed as well. Hence, we process the texture layer not only with an image smoothing method, but also with a mask map.

An intuitive observation is that dark regions contain more random noise because of the lack of illumination; Fig. 1 shows that random noise remains even in completely dark areas. Hence, we build a mask map based on this observation. As shown in Eq. 15, this operation weights the contribution of the smoothed texture layer.

$$\begin{aligned} \mathrm{Mask}(x) = \min (I_\mathrm{L} (x),1-I_\mathrm{L} (x)). \end{aligned}$$
(15)

Then, the luminance component is smoothed according to Eq. 16.

$$\begin{aligned} I_\mathrm{F}(x) = \mathrm{Mask}(x)\cdot \mathrm{WLS}(I_\mathrm{T}(x)/I_\mathrm{L}(x)), \end{aligned}$$
(16)

where \(I_\mathrm{F}(x)\) is the refined texture layer.

To demonstrate the effect of the proposed method, we try several different smoothing approaches on the texture layer, namely BF, GIF, EAW, BM3D, and mask-weighted WLS, as shown in Fig. 4. Finally, we add the enhanced structure layer and the refined texture layer together to obtain the final result R(x).

$$\begin{aligned} R(x) = I_\mathrm{E}(x)+I_\mathrm{F}(x). \end{aligned}$$
(17)
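The texture branch and the final fusion (Eqs. 15–17) can then be sketched as below, reusing the wls_smooth function from Sect. 2.2 and the illumination map from Eq. 13. For simplicity the sketch works on a single channel; the WLS parameters are illustrative.

```python
import numpy as np

def refine_and_fuse(texture, illumination, enhanced_structure, eps=0.01):
    """Mask-weighted texture refinement and fusion (Eqs. 15-17), single channel.

    texture:            I_T(x), the texture layer from Eq. (8)
    illumination:       I_L(x), the illumination map from Eq. (13)
    enhanced_structure: I_E(x), the enhanced structure layer from Eq. (14)
    """
    mask = np.minimum(illumination, 1.0 - illumination)        # Eq. (15)
    boosted = texture / (illumination + eps)                   # amplify the texture as in Eq. (16)
    refined = mask * wls_smooth(boosted, lam=1.0)              # mask-weighted WLS, Eq. (16); see Sect. 2.2
    return np.clip(enhanced_structure + refined, 0.0, 1.0)     # final result R(x), Eq. (17)
```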
Table 1 Quantitative comparison of the average NIQE, BRISQUE, and ILNIQE scores
Fig. 5 Comparative results for subjective evaluation. From left to right: input images, Jiang's results [9], MSRCR, Zhang's results [8], results with GIF [19], results with EAW [21], results with BM3D [23], and our results

4 Experiments and assessment

To evaluate the proposed method, we test it using a total of 310 nighttime images and make comparisons with different methods for subjective and objective evaluation.

We compute several no-reference evaluation metrics, including the natural image quality evaluator (NIQE) [25], the blind/referenceless image spatial quality evaluator (BRISQUE) [26], and ILNIQE [27]. We compare the proposed method with multi-scale retinex with color restoration (MSRCR), the dark channel prior-based nighttime image enhancement method (DCP-NIE) [9], and variants of our framework that reduce noise with different methods (BF [8], GIF [19], EAW [21], and BM3D [23]). In addition, we compare with more low-light image enhancement approaches [28,29,30,31]. The results are reported in Table 1. The proposed approach outperforms the other methods and is competitive with BM3D according to these no-reference image quality evaluators, while running about 8 times faster than BM3D. BF [8] and GIF [19] obtain similar scores on this task, but BF removes more noise than GIF, as shown in Fig. 5. EAW [21] does not work well in this task. LIME [28], FEMWII [29], JIEPMR [30], and FMSBIE [31] outperform previous approaches that lack noise removal, but the methods with a noise removal operation achieve better performance. This result indicates that the noise removal operation is effective.

Figure 5 shows comparative results for a more intuitive subjective evaluation. Without a noise reduction operation, Jiang [9] obtains impressive but noisy results, especially in dark areas, and MSRCR produces a brightness-enhanced image with obvious color distortion. The results based on the structure–texture decomposition framework depend on the denoising method used. Zhang [8] obtains a similar objective score to GIF [19], but BF leaves noticeable noise points in the results, whereas GIF produces blurry results in which noise and artifacts are smeared rather than removed. EAW [21] does not work well for nighttime image noise removal. BM3D [23] yields impressive results and clearly reduces noise and artifacts; however, it also removes details that are not very salient. Our method keeps the enhanced images sufficiently smooth while preserving useful details, as shown in the second and third rows of Fig. 5. To make the comparison clearer, we zoom into the results in Fig. 6. Our result retains details (red patch) while removing noise and artifacts (green patch), as shown in the zoomed image patches. BM3D fails in this case because it is difficult to find enough similar patches to maintain the faint edges.

Fig. 6 A comparative experiment on zoomed results. The top two rows are arranged in the following order: input image, BM3D [23], and our results. The zoomed image patches are arranged in the same way

Fig. 7 More comparative experiments. From left to right: input images, LIME [28], FEMWII [29], JIEPMR [30], FMSBIE [31], and our results

Fig. 8 Comparison of noise reduction ability. From left to right: input image, BM3D + enhancement [9] (denoising before enhancement), enhancement [9] + BM3D (denoising after enhancement), and our results with BM3D

We show further comparative experiments in Fig. 7. LIME [28] obtains more impressive results than the other three methods, consistent with the objective evaluation in Table 1. Our approach produces results as bright as LIME's but without noise and artifacts.

Fig. 9 More experimental results

To demonstrate the role of the structure–texture decomposition, we compare our method with two other denoising strategies: noise reduction before enhancement and noise reduction after enhancement. Since the visibility is low, removing noise before enhancement leaves the results blurry yet still noisy; on the other hand, noise is amplified after enhancement, which also makes it difficult to eliminate. In Fig. 8, we can see that image decomposition helps suppress noise when the same denoising method is used, which demonstrates the importance of structure–texture decomposition in nighttime image enhancement. We also notice that our method cannot handle nighttime images with heavy color distortion, as shown in the bottom-left image; a white balance algorithm [32] can be adopted as post-processing to improve image quality. Finally, we show more results demonstrating the effectiveness of the proposed approach in Fig. 9.

5 Conclusion

The main contribution of our work is a framework for enhancing nighttime images while suppressing noise and artifacts. The main idea is to decompose the input image into structure and texture components: we apply an improved retinex approach to enhance the visibility of the structure layer and reduce noise and artifacts in the texture layer with mask-weighted least squares, and then fuse the two components to obtain the final result. Experimental results demonstrate that the proposed method performs well in most conditions. Compared with variants that rely on commonly used image smoothing methods, the proposed approach produces natural, detail-rich, and noise-free results.