1 Introduction

With the development of science and technology, it has become possible to record the splendour of the world in the form of images, using digital imaging equipment. Because capturing high-quality images in dim light is difficult, an angle with sufficient light is typically chosen when shooting. Under nocturnal illumination, insufficient indoor lighting, or cloudy weather conditions [12], the photon count and signal-to-noise ratio (SNR) are low. As a result, relatively little light is reflected from the surface of the imaged object, and the image acquisition equipment cannot effectively record the object's colours [10]. Images captured in low light have low brightness, low contrast, relatively high noise, and artifacts, which seriously affect the visual experience.

When the light source changes, the reflection spectrum of the surface of the imaged object changes as well. The human visual system exhibits colour constancy [22], which allows objects to be distinguished even under low illumination. Image-acquisition devices simulate the human eye, using charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) image sensors to convert external light stimuli into electrical signals, which are then stored in digital memory in a specific format. Under low illumination, these sensors can only record the few photons that enter the lens; thus, the colour deviation in the captured images is severe. Although hardware can also be used to increase the number of captured photons, such hardware is usually expensive and the processing effect is not ideal [16]. Moreover, once an image has been taken, hardware can no longer change its quality. Therefore, image processing is often the most practical solution. As digital images are widely used in intelligent transportation [31] and in remote sensing and surveillance [46], methods for improving the processing of low-light images are essential.

Ying [51] proposed an exposure fusion framework that achieves contrast enhancement by designing a weighting matrix and fusing images using illumination estimation. Another work described an enhancement algorithm for medical images [28], which used contrast-limited adaptive histogram equalisation (CLAHE) to improve the overall contrast of the images and yielded very good visual effects. Lu [33] proposed a deep convolutional neural network (CNN) with depth estimation for solving the scattering problem of underwater images under low illumination. In addition, the modelling of uncertain differential equations is a commonly used approach in applied science and engineering [2, 3], and fuzzy equation solving methods [4, 5] have played a certain role in promoting the development of image enhancement technology.

Image enhancement has been widely used in various fields, and the enhancement of images under low illumination conditions is an area of extensive research. Low-light enhancement methods can be roughly divided into two types: those that perform end-to-end training directly, and those that enhance images by estimating the illumination. The end-to-end methods are mostly data-driven, with the model learned from large amounts of training data. Illumination estimation requires jointly considering the pixels in all channels of the image, which makes the task more cumbersome and prone to noise. In this article, to minimise noise, we enhance the low-light image by inverting it.

A low-light image, once inverted, appears foggy [14]. In the present work, we invert low-light images to obtain foggy images, and use the inverted images to calculate the scene-to-camera transmission and the global atmospheric light component. The obtained coefficients are incorporated into the low-light imaging model [17, 36, 40] to yield clear images. Based on the qualitative and quantitative evaluation of the experimental results, we conclude that our proposed scheme can overcome problems such as colour distortion, excessive enhancement, and noise. The method proposed in this study may provide fresh insights regarding the enhancement of low-light images.

Our proposed method has three contributions to the field:

  1. We propose a method for the enhancement of low-light images using a conventional physical model. By estimating the atmospheric light value and illuminance map, clear images with enhanced brightness and detail are obtained.

  2. We use a pyramid-type dense residual block network for estimating the transmission map of the image. Dense residual blocks effectively extract features at different levels, and a pyramidal pooling layer enables joint estimation of transmission maps for images of different scales.

  3. We have carried out extensive comparative experiments that suggest the proposed method is superior to other algorithms.

The remainder of this article is organised as follows. Section 2 briefly introduces some related work on low-light image enhancement. Section 3 mainly introduces the methods mentioned in this article, including low-light imaging models, estimating atmospheric light values and light maps, and the method for calculating clear images with enhanced brightness and detail. In Section 4, the qualitative and quantitative analyses of the experimental results are reported, while the study conclusions are listed in Section 5.

2 Related work

Low-light image enhancement is typically accomplished using the following two approaches: (1) Improving the characteristics of the associated hardware, such as cameras that support thermal imaging and infrared sensors; (2) Using low-light image enhancement algorithms. Note that the cost of the specialised hardware is typically quite high, making it inaccessible to the general public. On the other hand, algorithmic low-light image enhancement can effectively improve the images’ quality, and is much more affordable. Many algorithmic methods for low-light image enhancement have been proposed. In the following, we review the existing approaches and discuss some closely related research.

2.1 Existing low-light enhancement methods

Low-light imaging uses a camera to process the initially acquired data to build an enhanced RGB output. The objective of low-light image enhancement is to improve the brightness and contrast of the acquired images, so that the details hidden in the dark can be resolved. One approach toward low-light enhancement is based on the stack-based high dynamic range (HDR) method [8]. The HDR method improves image quality by capturing and fusing multiple images with different exposures and low dynamic ranges. However, this method requires combining multiple images; any movement in a dynamic scene may cause differences between the captured images, so the result is likely to feature serious artifacts following HDR synthesis.

Conventional low-light image enhancement algorithms include the gray-scale transformation method and the histogram equalisation (HE) method [18]. Huang [19] proposed an adaptive gamma correction algorithm based on the cumulative distribution probability histogram. Kim [21] proposed a standard adaptive HE. In another study, an algorithm was proposed that uses the context information between the image’s pixels to enhance the contrast of low-light images [9]. Wang [47] proposed an equal-size dualistic subimage HE (DSIHE) algorithm, for dividing the original image into two equal-size parts to maximise the image’s entropy, thereby solving the image information loss problem; this method aims at improving the image brightness and contrast.

Light-based low-light enhancement methods have also been proposed. The original Retinex theory [23] assumed that image colours can be decomposed into two parts, namely reflectance and illumination. Later revisions of that theory produced single-scale Retinex (SSR) [20] and multi-scale Retinex with colour restoration (MSRCR) [34]; however, the resulting images can look unnatural. In [15], the simultaneous reflectance and illumination estimation (SRIE) approach was used to obtain better reflectance and illumination. Li [29] introduced a new noise term. Although that method was able to suppress the noise in the images to some extent, it did not address the colour distortion problem.

Differing from previous methods based on prior knowledge, we propose an algorithm for low-light enhancement of images using a conventional physical model. We introduce CNNs, combining prior-based and deep learning-based methods. First, a low-luminance image is inverted to obtain a foggy image. The foggy image is fed into a pyramid-type dense residual block network to estimate the image's transmission map. We use a dark primary colour prior to estimate the atmospheric light value of the scene. The obtained transmission map and atmospheric light value are substituted into the physical model of low-light imaging to obtain an enhanced clear image. The qualitative and quantitative analyses of our experimental results demonstrate the effectiveness of the proposed method.

2.2 Deep learning-based low-light enhancement methods

Deep learning has demonstrated excellent results in computer vision tasks, and it has been widely used for low-light image enhancement as well. For example, the low-light net (LLNet), proposed in [32], featured a contrast enhancement and denoising module. Based on the theory of multi-scale Retinex, the MSR-net [43] CNN realises end-to-end mapping by learning the relationship between low-light images and clear images. In addition, the RetinexNet [49] deep CNN performs operations such as image decomposition, denoising [13], and lighting mapping; however, because only the real illumination of the scene is considered, the effect of noise on the illumination estimate is ignored. Chen [11] proposed an end-to-end low-light image-processing method based on a fully convolutional network, and demonstrated good results with respect to noise suppression and colour distortion processing. However, that method was limited to a specific data format: when the same network processed data in the JPEG format, the performance markedly decreased. In the enhancement of a low-light image using a CNN, the network learns the mapping between the low-light image and the corresponding clear image, thereby achieving end-to-end reconstruction. Therefore, learning the mapping between the two images (original and clear) is key to successful enhancement; it is equally important to select a proper data set for training. When images in the training set are not uniformly illuminated, local over-enhancement can occur.

2.3 Image dehazing methods

Atmospheric particles can significantly absorb or scatter light, degrading the quality of acquired images. In the fields of image processing, multimedia, and computer vision, image dehazing has been an actively researched topic. Single-image dehazing is a highly ill-posed problem. Image dehazing methods can be categorised into those that utilise prior knowledge and data-driven methods based on deep learning.

Prior-based methods typically acquire several images under different weather conditions [38], perform statistical analysis on them, and register the differences between foggy and clear images. For example, Omer [41] used the “colour line” assumption, which specifies that the pixel intensities in a small image block are distributed along a line in the RGB space. The pixel intensities of a foggy image deviate from this line, and a mapping can be estimated based on this deviation. Berman [6] determined, by statistical analysis, that clear images contain hundreds of distinct colours that aggregate into tight clusters in the RGB space, while no such aggregation is observed for foggy images.

Among the existing data-driven dehazing methods based on deep learning, MSCNN [42] was the first method proposed for dehazing a single image using CNNs, employing a two-level network for training. Li et al. [27, 30] proposed a residual-network-based dehazing method, which avoids the estimation of the atmospheric light value and improves the dehazing efficiency. In [50], a pre-trained VGG [44] network was used as an encoder, and training was performed by combining the mean squared error (MSE) and perceptual loss metrics. To sum up, deep learning-based methods perform supervised training on large amounts of collected data and have demonstrated satisfactory results.

3 Proposed method

The ultimate objective of low-light image enhancement is to improve the brightness and contrast of low-light images, thereby making them clear. In this paper, we start with the imaging model of the low-light image and solve for the other parameters in the model to finally obtain a clear image. First, we invert the low-light image to obtain a foggy image. Then, the foggy image is subjected to a dehazing operation, and the transmission as well as the global atmospheric light component are calculated. Finally, the calculated transmission and global atmospheric light components are substituted into the low-light imaging model to solve for the clear image. In the following sections, the imaging model for low-light images is first introduced (Section 3.1). Next, we explain how to estimate the transmission map (Section 3.2) and atmospheric light values (Section 3.3). In Section 3.4, we present the resulting enhanced low-light image. The flowchart of the proposed method is shown in Fig. 1.

Fig. 1 Pipeline framework of the proposed method

3.1 Low-light imaging model

The light emitted from the surface of an object is reflected to an imaging unit to form an image. This is demonstrated in Fig. 2, where I(x, y) is the incident light, R(x, y) is the light reflected from the surface of the imaged object, and F(x, y) is the imaging light (received by the imaging device). The incident light comprises two sources: external light and light reflected by objects. The perceived colour of the external light directly depends on the spectral components of the light source. The reflected light consists of those spectral components of the incident light that are not absorbed by the object but rather reflected. For example, leaves appear green because they absorb much of the blue-violet and red-orange spectral components of light, do not absorb the green components, and reflect the green components back. The colour perception of non-light-emitting objects thus depends on the spectral components of the external light and on the physical characteristics of the object's absorption spectrum. The retina of the human eye exhibits colour constancy, and the colour perception of objects under different incident light conditions tends to be stable.

Fig. 2 Object imaging model

Video image acquisition devices simulate the visual imaging process of the human eye. However, video imaging devices use CCD/CMOS image sensors to convert external light stimuli into electrical signals before proceeding to the next storage operation. Video imaging devices only record the accumulation of photons that enter the device’s lens, and the quality of imaging depends on the external lighting conditions. Under low lighting, the number of photons and the SNR are small; thus, the imaging quality is poor. Low lighting conditions can be divided into several categories; for example, image acquisition under very low lighting introduces strong noise and colour distortion. In addition, at sunset most objects appear backlit, which can cause colour distortion in captured images. To improve the imaging of low-light scenes, imaging devices can often adjust their exposure. However, short exposures are susceptible to noise, and long exposures are likely to cause blurring. A low-light imaging model has been proposed in [17, 36, 40]; this model is given in (1), as follows:

$$ I = R*T + a(1 - T), $$
(1)

where a is the atmospheric light intensity, I is the low-light image, T is the transmission map, and R is the clear image after the enhancement. Unlike traditional image degradation, the degradation here is scene-dependent: it is related to depth and varies unevenly across the entire image.

After inverting a low-light image, the inverted image is related to the low-light imaging model as shown in (2). The inverted low-light image is then subjected to an image dehazing operation. The imaging model of a foggy image is given in (3), as follows:

$$ 1 - I = (1 - R)*T + a(1 - T), $$
(2)
$$ I = J*T + a(1 - T), $$
(3)

where a is the global atmospheric light component, I is the foggy image, T is the scene-to-camera transmission, and J is the clear image after the dehazing.

Comparing (1) and (3), we see that the two equations are very similar, and the interpretations of their parameters are similar as well. Since it is difficult to estimate the parameters of a low-light image, a low-light image is inverted to obtain a similar foggy image. Through the dehazing operation of the foggy image, the atmospheric light intensity value and the transmittance value are obtained. Substituting these parameters into (2), an enhanced clear image is obtained.

3.2 Transmission map estimation network

In recent years, statistical cues have been used to characterise hazy images, thus obtaining prior knowledge. For example, He [17] analysed the characteristics of more than 5,000 images and derived the dark channel prior. Still, this prior is not ideal for white areas such as the sky. Consequently, estimating the transmission map by extracting chroma, texture, and contrast from blurred images introduces certain errors. Methods based on prior knowledge may not apply to all images, and the estimated transmission map may not be accurate enough.

With the development of deep learning, data-driven methods have begun to emerge. CNNs estimate transmission maps by inferring the inherent characteristics of foggy images. In this section, we consider a pyramid-like dense residual block network for estimating the transmission map. The network uses dense residual blocks for feature extraction. A multi-level pooling module is then used to preserve the larger global structure. Finally, up-sampling is performed to restore the feature maps to the original size. The network structure that outputs the final transmission map is shown in Fig. 3.

Fig. 3 Network framework for the transmission map estimation. a The network framework of the CNN. b The structure of the residual block

As shown in Fig. 3, we use a convolutional neural network for feature estimation. The dense blocks maximise the information flow of the transferred features and connect all layers to ensure better convergence. Each residual block contains four small blocks; the size of each convolution kernel is 3 × 3, and the number of convolution kernels is 32. In addition, to refine the global structure information, we perform pooling on the extracted features. The pooling layers reduce the size of the feature map to 1/4, 1/8, 1/16, and 1/32 of the original size, respectively. Since the image size changes after pooling, we restore the features to the original resolution through up-sampling. After the above series of operations, the features of the image at different levels are fully exploited.

In the network, we treat the dense blocks and the multi-level pyramid pooling block as the coding structure, and we call the transitions between dense blocks together with the up-sampling the decoding structure. Inspired by previous work [7, 27, 30, 42, 52], using residual blocks with dense connections in the coding structure extracts feature information to the greatest extent and ensures the convergence of the network. In addition, a multi-level pyramid pooling module is used to capture global structure-related information and to refine the learned features [53]. After the input image is repeatedly convolved by multiple convolution kernels, the feature size at the end of the encoder is only 1/32 of the input size. To restore the image’s resolution, the features are processed by the decoder module, which consists of dense residual blocks and several up-sampling modules [54].

Because the coding structure contains sufficient residual blocks and skip connections, the network is deep enough to learn features at different levels. We use the coding structure to increase the depth of the network, and use multi-level pyramid pooling to incorporate global structure information. Dense residual connections alone still lack global structure-related information about objects of different scales. To exploit global context in classification and segmentation tasks [55], a large pooling layer is typically used to capture it. In this paper, to directly estimate the final transmission map from different-scale images, the multi-level pyramid pooling approach is used [52]. Multi-level pyramid pooling ensures that features of different scales are embedded in the final result. After pooling, all four levels of features are up-sampled to the size of the original image, and the final estimate is concatenated with the previous original features, yielding a global estimate of the transmission map.
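
The following PyTorch sketch illustrates this encoder design under our own assumptions: the block widths (32 channels, four 3 × 3 sub-blocks) and the pooling ratios (1/4 to 1/32) follow the description above, while the exact wiring, the 1 × 1 fusion convolutions, and the sigmoid output are illustrative choices of ours rather than the authors' released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseResidualBlock(nn.Module):
    """Four densely connected 3x3 conv sub-blocks with a residual skip (Sec. 3.2)."""
    def __init__(self, channels=32, growth=32):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = channels
        for _ in range(4):
            self.convs.append(nn.Conv2d(in_ch, growth, kernel_size=3, padding=1))
            in_ch += growth
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)  # project back to the block width

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))  # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))            # residual connection

class PyramidPooling(nn.Module):
    """Pool features to 1/4, 1/8, 1/16 and 1/32 of the input size, then upsample and fuse."""
    def __init__(self, channels=32):
        super().__init__()
        self.scales = (4, 8, 16, 32)
        self.reduce = nn.ModuleList(
            nn.Conv2d(channels, channels // 4, kernel_size=1) for _ in self.scales)
        self.out = nn.Conv2d(channels + len(self.scales) * (channels // 4),
                             1, kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        levels = [x]
        for s, conv in zip(self.scales, self.reduce):
            p = F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1)))
            levels.append(F.interpolate(conv(p), size=(h, w),
                                        mode='bilinear', align_corners=False))
        return torch.sigmoid(self.out(torch.cat(levels, dim=1)))  # transmission map in (0, 1)
```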

When the convolution kernels learn more effective features, the loss between the estimated transmission map and the real transmission map becomes smaller, and the estimated transmission map becomes more accurate. Results produced with the Euclidean (L2) loss alone may be blurred, so using only the L2 loss for estimating the transmission map may result in a loss of detail. Edges correspond to discontinuities in the image intensity and can be represented using image gradients. In addition, some low-level edge and contour features can be obtained from the shallow layers of a CNN. In summary, an edge-preserving loss function is used for network training.

Gradient loss has been used for depth estimation [26], and perceptual loss has been used in low-level vision tasks [56]. The edge-preserving loss function combines the L2 loss, a bidirectional gradient loss, and a feature edge loss, as defined in (4). To better retain the detailed information in the transmission map, we use these three losses jointly during training.

$$ {L^{E}} = {\lambda_{E,{l_{2}}}}{L_{E,{l_{2}}}} + {\lambda_{E,g}}{L_{E,g}} + {\lambda_{E,f}}{L_{E,f}}, $$
(4)

where LE represents the overall edge-preserving loss, \({L_{E,{l_{2}}}}\) represents the L2 loss, and LE,g denotes the bidirectional (horizontal and vertical) gradient loss, defined by (5). The loss LE,f is the feature loss, defined by (6). The weights \({\lambda _{E,{l_{2}}}}, {\lambda _{E,g}}, {\lambda _{E,f}}\) balance the contributions of the different loss terms.

$$ {L_{E,g}} = \sum\limits_{w,h} {\left\| {({H_{x}}({G_{t}}(I)))_{w,h}} - {({H_{x}}(t))_{w,h}} \right\|_{2}} + \sum\limits_{w,h} {\left\| {({H_{y}}({G_{t}}(I)))_{w,h}} - {({H_{y}}(t))_{w,h}} \right\|_{2}}, $$
(5)

where Hx and Hy denote the image gradients calculated along the horizontal and vertical directions, respectively, and w, h index the positions in the output feature map.

$$ \begin{array}{l} {L_{E,f}} = \sum\limits_{{c_{1}},{w_{1}},{h_{1}}} {{{\left\| {{{({V_{1}}({G_{t}}(I)))}_{{c_{1}},{w_{1}},{h_{1}}}} - {{({V_{1}}(t))}_{{c_{1}},{w_{1}},{h_{1}}}}} \right\|}_{2}}} \\ + \sum\limits_{{c_{2}},{w_{2}},{h_{2}}} {{{\left\| {{{({V_{2}}({G_{t}}(I)))}_{{c_{2}},{w_{2}},{h_{2}}}} - {{({V_{2}}(t))}_{{c_{2}},{w_{2}},{h_{2}}}}} \right\|}_{2}}}, \end{array} $$
(6)

where Vi is the CNN structure and ci,wi,hi are the dimensions of the corresponding low-level features. We use the layers before relu1-1 and relu2-1 of VGG-16 [44] for the edge-extraction procedures V1 and V2, respectively.

In the transmission map estimation, we set the loss weights to \({\lambda _{E,{l_{2}}}} = 1, {\lambda _{E,g}} = 0.5, {\lambda _{E,f}} = 0.8\). In the training process, we use ADAM as the optimisation algorithm, with a learning rate of \(2 \times 10^{-3}\).
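
A simplified PyTorch sketch of this joint loss follows; the weights (1, 0.5, 0.8) come from the text, while the mean-reduced losses, the finite-difference gradients, and the `vgg_feats` callable (standing in for the VGG-16 relu1-1/relu2-1 extractors V1 and V2) are our own simplifications.

```python
import torch
import torch.nn.functional as F

def edge_preserving_loss(t_pred, t_gt, vgg_feats, w_l2=1.0, w_g=0.5, w_f=0.8):
    """Edge-preserving loss of Eq. (4): L2 term + bidirectional gradient term + feature term.

    t_pred, t_gt : predicted / ground-truth transmission maps, shape (N, 1, H, W).
    vgg_feats    : callable returning a list of shallow feature maps for a given map
                   (a stand-in for the VGG-16 relu1-1 / relu2-1 extractors V1, V2).
    """
    l2 = F.mse_loss(t_pred, t_gt)                        # Eq. (4), first term

    # bidirectional gradient loss (Eq. (5)): horizontal and vertical finite differences
    def grads(t):
        return t[..., :, 1:] - t[..., :, :-1], t[..., 1:, :] - t[..., :-1, :]
    gx_p, gy_p = grads(t_pred)
    gx_t, gy_t = grads(t_gt)
    grad = F.mse_loss(gx_p, gx_t) + F.mse_loss(gy_p, gy_t)

    # feature edge loss (Eq. (6)): distance between shallow CNN features
    feat = sum(F.mse_loss(fp, ft) for fp, ft in zip(vgg_feats(t_pred), vgg_feats(t_gt)))

    return w_l2 * l2 + w_g * grad + w_f * feat
```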

3.3 Estimation of the atmospheric light

In the low-light image enhancement algorithm based on the low-light imaging model, the intensity of the atmospheric light is a critical parameter. In previous studies, statistical analyses of fog-free images led to the dark primary colour observation: in most outdoor fog-free images, some pixels always have a very low brightness value in at least one colour channel. This minimal brightness is called the dark primary colour, and the corresponding pixels are called dark primary colour pixels. The dark primary colour can be defined as

$$ {J^{dark}}(x) = \mathop {\min }\limits_{c \in \{ r,g,b\} } (\mathop {\min }\limits_{y \in {\Omega} (x)} ({J^{C}}(y))), $$
(7)

where C is the colour channel, JC is the component of image J in the channel, and Ω(x) is the square-like neighbourhood around the pixel x.

For a clear image and a given pixel x, there are likely to be some dark pixels in its neighbourhood, making Jdark(x) ≈ 0; we call Jdark the dark prior colour. To estimate the atmospheric light intensity from the image's dark primary colours, we first sort the pixels of Jdark in descending order of brightness. Then, the brightest 0.1% of the pixels are selected as candidates for the atmospheric light intensity, and their brightness values are compared with those of the corresponding pixels in the original image I. The maximal brightness is reported as the atmospheric light intensity.
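
A compact sketch of this baseline estimator is given below, assuming float RGB arrays in [0, 1]; the 15-pixel neighbourhood and the function names are our own illustrative choices.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(J, patch=15):
    """Dark primary colour of Eq. (7): min over {r, g, b}, then min over the
    square neighbourhood Omega(x) (here a patch of side `patch`)."""
    min_rgb = J.min(axis=2)
    return minimum_filter(min_rgb, size=patch)

def atmospheric_light_naive(I, J_dark, top_fraction=0.001):
    """Baseline estimate: the brightest original pixel among the top 0.1% dark-channel pixels."""
    n = max(1, int(J_dark.size * top_fraction))
    idx = np.argsort(J_dark.ravel())[-n:]                 # brightest dark-channel positions
    candidates = I.reshape(-1, 3)[idx]
    return candidates[candidates.sum(axis=1).argmax()]    # pixel with maximal total brightness
```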

In practice, estimating the atmospheric light intensity from the dark channel prior alone is likely to be inaccurate: the brightest candidate pixel may lie far from the actual atmospheric light source and outside the area with the heaviest fog, so the atmospheric light intensity A may be set incorrectly. In this paper, we therefore adopt an atmospheric light estimation method based on cluster statistics [57]. This method uses the dark channel prior to obtain the dark prior colour Jdark(x) corresponding to the image J(x). The brightest 0.1% of pixels constitute a set XC, which provides the candidate points for the atmospheric light intensity estimation.

$$ {X_{C}} = \{ {x_{cand}}|{J^{dark}}({x_{cand}}) \in High\_Value\_Range\}, $$
(8)

For clustering, we use the k-means algorithm, with k = 5.

$$ L = \{ {L_{n}} \mid {L_{i}} \cap {L_{j}} = \emptyset \ \text{for}\ i \ne j;\ \bigcup\limits_{n} {L_{n}} = {X_{C}};\ i,j,n = 1,2,...,5\}, $$
(9)

The candidate points are partitioned by selecting the clustering that minimises the following objective:

$$ J = \sum\limits_{n = 1}^{K = 5} {\sum\limits_{{x_{i}} \in {L_{n}}} {{{\left\| {{x_{i}} - {\mu_{n}}} \right\|}^{2}}} }, $$
(10)

where xi is a point in the set Ln, and μn is the centre of Ln.

After the clustering procedure, the set of points is partitioned into labelled subsets L = {Ln|n = 1,2...,5}. The obtained subsets are sorted in the order of their cardinality (by the number of points Nn in the subset), and the most populated subset \({L_{n^{\prime }}}\) is selected; the area covered by the pixels in this subset is considered as the atmospheric light area. Next, we consider the geometric centre of all the candidate light points \(\{ {x_{i}}^{\prime }|{x_{i}}^{\prime } \in {L_{n^{\prime }}}\}\) in cluster \({L_{n^{\prime }}}\) for computing the atmospheric light intensity. The average brightness over all candidate light spots is taken as the atmospheric brightness vector \({L_{\infty } }\):

$$ {L_{\infty} } = mean(J({x_{i}}^{\prime})), $$
(11)
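
A sketch of this cluster-based estimator follows; clustering the candidates by their spatial coordinates and the use of scikit-learn's KMeans are our interpretation of the procedure, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def atmospheric_light_clustered(I, J_dark, top_fraction=0.001, k=5):
    """Cluster-based estimate of Eqs. (8)-(11): take the brightest 0.1% dark-channel
    pixels as X_C, cluster their positions with k-means (k = 5), keep the most
    populated cluster L_n', and average the image brightness over its pixels."""
    h, w = J_dark.shape
    n = max(k, int(h * w * top_fraction))
    flat_idx = np.argsort(J_dark.ravel())[-n:]             # candidate set X_C (Eq. (8))
    ys, xs = np.unravel_index(flat_idx, (h, w))
    coords = np.stack([xs, ys], axis=1).astype(float)

    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)  # Eqs. (9)-(10)
    largest = np.bincount(labels).argmax()                 # most populated subset L_n'
    members = flat_idx[labels == largest]
    return I.reshape(-1, 3)[members].mean(axis=0)          # Eq. (11): mean brightness
```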

3.4 Image restoration

By performing the steps described in Sections 3.2 and 3.3, we obtain the transmission map and the atmospheric light intensity A. The obtained parameter values are then used in (2), yielding the inverted clear image 1 − R(x), as follows:

$$ (1 - R(x)) = \frac{{(1 - I(x)) - A}}{{\max (t(x),{t_{0}})}} + A, $$
(12)

where t0 is a threshold, usually set to 0.1, that prevents the denominator from becoming too small.
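
As a minimal sketch of this final step (assuming the transmission map t and atmospheric light A from Sections 3.2 and 3.3 are already available; the function name and array conventions are ours):

```python
import numpy as np

def restore(I, t, A, t0=0.1):
    """Solve Eq. (12) for the enhanced image R.

    I : low-light image, float array in [0, 1], shape (H, W, 3)
    t : estimated transmission map, shape (H, W)
    A : estimated atmospheric light, scalar or length-3 vector
    """
    t = np.maximum(t, t0)[..., None]           # clamp the denominator with t0 = 0.1
    inv_R = ((1.0 - I) - A) / t + A            # Eq. (12): recover 1 - R(x)
    return np.clip(1.0 - inv_R, 0.0, 1.0)      # invert back to the enhanced image
```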

4 Experiments and analysis

In this section, we elaborate on the implementation details of the method and the evaluation of the experimental results. To validate the effectiveness of the proposed method, experiments were performed on a computer with an Intel(R) Core(TM) i7-6700 CPU @ 3.4 GHz, 16 GB of RAM, and the Windows 10 operating system. The software used for numerical analysis was MATLAB 2018b. The tested images included urban street scenes, natural scenes, and indoor images.

To verify the effectiveness of the proposed method, we randomly selected images from the NPE dataset [48], LIME dataset [16], DICM dataset [24], MEF dataset [35], and VV dataset [45]. The datasets are introduced as follows:

NPE dataset: The dataset has 85 low-light images, divided into four parts: NPE, NPE-ex1, NPE-ex2, and NPE-ex3. Among them, NPE contains only 8 outdoor natural scene images; the remaining three are supplementary sets whose images are mainly low-light images of cloudy, dawn, evening, and night scenes.

LIME dataset: This dataset contains 10 low-light images used in the LIME method.

DICM dataset: The dataset contains 69 low-light images collected with commercial digital cameras.

MEF dataset: This dataset is a multi-exposure image set. It contains 17 high-quality image sequences, whose styles include natural landscapes, indoor and outdoor scenes, and man-made buildings. Each image sequence corresponds to multiple images with different degrees of exposure, and we select the poorly exposed image in each sequence as the object for low-light enhancement.

VV dataset: This dataset is composed of 24 images collected by Vassilios Vonikakis in daily life. Each image has some well-exposed areas and some underexposed areas, so this dataset poses a challenging enhancement task. A good enhancement algorithm should not further enhance the well-exposed areas; it should mainly enhance the underexposed areas.

We extracted the test images from the various data sets and evaluated the quality of the enhanced images from two aspects: 1) subjective evaluation and 2) objective evaluation. The experimental results are shown in Fig. 4. The proposed method yields clear images with natural colours for both near and far scenes, which demonstrates its effectiveness and applicability. In the following experiments, the proposed method is compared with various existing methods to evaluate the effectiveness of its underlying algorithm.

Fig. 4 Example experimental results. The first and third rows are the original low-light images. The second and fourth rows are the enhanced images

4.1 Subjective qualitative analysis

First, we compared the experimental results obtained using the proposed method with those obtained using conventional image-enhancement methods. The experimental results are shown in Fig. 5. Panels b–f show the results obtained using the AHE method, the HE method, the Retinex method, the linear transformation method, and the currently proposed method. The third and sixth rows show the amplified views of the regions delineated by the red and blue boxes in group (a) in Fig. 5. As shown in Fig. 5, the quality of the enhanced images varies considerably across methods. Figure 5b and c show the results obtained using the AHE and HE methods, respectively. Looking at the entire images, the enhanced images exhibit significant tone deviations, with little or no contrast improvement, and with blurring in some cases. The amplified views show that the images are severely blurred and noisy. The results for the Retinex method are shown in Fig. 5d. The contrast of the enhanced image is significantly improved, but the image exhibits excessively enhanced areas. The results obtained using the linear transformation method and the currently proposed method are shown in Fig. 5e and f, respectively. The overall image quality is better than that of the other methods, but the image processed using the linear transformation method (Fig. 5e) is darker. In contrast, the currently proposed method yields significant improvements in colour and detail, and outperforms the other algorithms in terms of its visual effects.

Fig. 5 Comparison of the proposed method with conventional enhancement methods

In addition to the above-mentioned conventional enhancement methods, the proposed algorithm was compared with several newer algorithms, including the SRRM [29], EFF [33], DCP [17], LIME [16], SRIE [15], LACE [25], MSRCR [34], and CVC [9] algorithms. Figure 6 compares the brightness and contrast of the images processed using these different algorithms; in particular, the comparison is performed in terms of the images’ histograms. The wider the histogram, the stronger the brightness and contrast of the corresponding image. Figure 6a shows the original low-light image; the corresponding histogram is narrow and positively skewed, indicating that the image brightness is low. The result of the image enhancement using the CVC method is shown in Fig. 6b. Compared with the original low-light image, the histogram of the enhanced image is wider and the brightness is improved; however, the image still appears dark and fuzzy. The MSRCR, LACE, and SRIE methods further increase the width of the image histogram, also improving the brightness and the contrast of the image. The LIME method yields the widest histogram, and the overall brightness of the image is significantly improved using this method; yet, this method does not address the colour shift problem. The EFF and SRRM methods yield smaller improvements of the image’s brightness and contrast, but correct the colour shift. The DCP method yields excessive enhancement, amplifying the noise in the dark areas. In contrast, the currently proposed method significantly improves the brightness and the contrast of the image, adequately addresses the colour shift problem, and yields a better overall visual effect.

Fig. 6 Comparison of the different methods in terms of the image brightness and contrast improvements

In Fig. 6 we used the histograms for evaluating the brightness and the contrast of the processed images. In Fig. 7, for further evaluating the quality of the enhanced images, the amplified views of the enhanced images are shown. The image of a building taken at night is shown in Fig. 7a. The CVC method only relatively weakly improves the image brightness, and the colour of the pixels in the low-light areas of the image is not restored. The MSRCR method handles the details better, but the brightness is not satisfactory. The brightness of the image after processing using the LACE method is low. The SRIE method, LIME method, and currently proposed algorithm all demonstrate good results in terms of the image colour and brightness. The EFF enhancement algorithm yields a relatively small improvement of the image tone deviation, but significantly improves the brightness of the image. The SRRM method exhibits artifacts and blurry areas after processing. The DCP algorithm excessively strengthens the edges of the target, generating unnecessary noise.

Fig. 7 Comparison of the different methods in terms of the image detail processing

The currently proposed method does not excessively amplify the noise in the dark areas, and handles the image details better. The saliency of the object is significantly improved, and the overall contrast of the image is improved as well. In addition, dark areas are not over-enhanced. To further observe the performance difference between the currently proposed method and existing methods, in Fig. 8, the enhancement effect of the currently proposed method and that of the comparative experiment are shown.

Fig. 8 Examples of some comparative experiments

4.2 Objective quantitative analysis

Subjective evaluation is mainly based on visual perception. Because different image enhancement methods focus on different enhancement aspects, it is difficult to ensure sufficient objectivity when performing subjective evaluation. To further quantify the performance of the currently proposed method, objective evaluation was performed using the information entropy (IE) and average gradient (AG) metrics [32].

The information metric captures the Shannon information obtained at a specific moment, while entropy corresponds to the expected amount of information prior to receiving the measurement results. We consider a still image as a signal source with a random output, and we denote by \(\left \{ {{a_{i}}} \right \}\) the set of source symbols A. The average amount of information contained in the image is given by (13), which defines the IE:

$$ H = - \sum\limits_{i = 1}^{L} {p({a_{i}}){{\log }_{2}}p({a_{i}})}, $$
(13)

According to information theory, the more details an image contains, the higher its information content, and the higher its information entropy.
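
A short sketch of this metric for an 8-bit grayscale image (the grayscale/8-bit assumption is ours):

```python
import numpy as np

def information_entropy(gray):
    """Information entropy of Eq. (13) for an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```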

Images contain many details, and there are clear gray-level differences near the images’ edges (or borders); these differences determine the gray-level change rate. A high change rate indicates that the small details in the image exhibit significant contrast changes. The rate of change can be regarded as the rate of change of the image density, and can also be used to characterise the image sharpness. The average gradient is defined as follows:

$$ AG = \frac{1}{{M \times N}}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {\sqrt{\frac{{{{(\partial f/\partial x)}^{2}} + {{(\partial f/\partial y)}^{2}}}}{2}} } }, $$
(14)

where M and N represent the width and the height of the image, ∂f/∂x represents the horizontal gradient, and ∂f/∂y represents the vertical gradient.
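
A corresponding sketch (finite differences on a float grayscale image; cropping to a common shape is our simplification):

```python
import numpy as np

def average_gradient(gray):
    """Average gradient of Eq. (14) using horizontal and vertical finite differences."""
    g = gray.astype(float)
    gx = np.diff(g, axis=1)[:-1, :]    # horizontal gradient, cropped to a common shape
    gy = np.diff(g, axis=0)[:, :-1]    # vertical gradient
    return float(np.sqrt((gx ** 2 + gy ** 2) / 2.0).mean())
```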

To verify the effectiveness of the proposed method, we randomly selected images from the Retinex dataset [39] and other image datasets for the experiments. The images used are depicted in Fig. 9; they include indoor and outdoor scenes with low and uneven illumination, and are labelled Image 1 – Image 10. These ten images were enhanced using the MSRCR, LACE, SRIE, LIME, EFF, and SRRM algorithms, and the enhancement results were compared with those obtained using the currently proposed method. In the first comparison, we evaluated the IEs of the enhanced images. The results are summarised in Table 1, in which the maxima are given in boldface. Table 1 shows that the average IE of the non-processed images is 6.581. After enhancement using the different algorithms, the average IE increased to 7.4. With respect to single-image IE, the MSRCR, LIME, and currently proposed algorithms yielded the highest values. For Image 9, the MSRCR algorithm yielded an IE of 7.681, the highest across all of the considered algorithms. The IE yielded by the LIME algorithm for Image 2 was 7.846. The currently proposed method exhibited the highest IE for multiple images, and its average IE across the ten images was 7.671. In summary, the currently proposed method performed comprehensively better than the other considered methods.

Fig. 9 Examples of low-light images

Table 1 Comparison of the IE values for images enhanced by the different algorithms

Table 2 compares the average gradients of the enhanced images, with boldface indicating the maximal gradient. Table 2 shows that the mean average gradient across the ten unprocessed images is only 4.463, which is relatively low. After processing using the different methods, the mean average gradient increased, and the extent of the increase was method-dependent. The largest value, 10.999, was achieved by the currently proposed method. The LACE method exhibited a relatively small improvement (39.3%). The MSRCR method exhibited a good overall performance, and demonstrated significant results with respect to the processing of multiple images.

Table 2 Comparison of the average gradients of images enhanced using the different algorithms

In addition, to verify the effectiveness of the proposed method, we conducted experiments on all the images in the MEF and VV datasets to obtain the average performance of the different methods. We enhanced the images in each dataset using the different methods, and used the average gradient, the natural image quality evaluator (NIQE) [1], and BRISQUE [37] to objectively evaluate the enhancement results. The experimental results are shown in Table 3, with the best performance marked in bold. In terms of the average gradient, SRIE and our method obtained the best performance on the MEF and VV datasets, respectively. For the NIQE evaluation, our method achieved the best performance on both the MEF and VV datasets. For the BRISQUE evaluation, the LIME method achieved better performance on the MEF dataset, while our method achieved better results on the VV dataset.

Table 3 Comparison of the average performance of the different methods on the MEF and VV datasets

4.3 Comparison of time complexity

We also tested the computational complexity of the ten image-processing methods on the images in Fig. 9. For this, the processing of each image with each method was repeated ten times, and the average over the repeated runs was taken as the final processing time. From Table 4, it is evident that the processing time increases with increasing image size. The time required for processing a 1039 × 789 image exceeds 1 s, and the time required for processing a 2000 × 1500 image reaches 3.89 s. For smaller images, the processing speed is relatively fast, and the processing time is under 1 s.

Table 4 Processing time of different-resolution images (time units: s)

To determine the time complexity of the currently proposed method, its performance was compared with those of the MSRCR, EFF, LIME, and SRIE methods. The image dimensions used in this test were 600 × 300, 700 × 500, 1024 × 760, 1500 × 800, and 2000 × 1300. The runtimes of the different methods are listed in Table 5. The SRIE method takes the longest time to process a single image and has the lowest processing efficiency; for a 2000 × 1300 image, its processing time reached 286.81 s. This is mainly because the iterative calculation of the SRIE method is computationally expensive, which prolongs the runtime. The MSRCR method exhibited a higher processing speed: for a 600 × 300 image, its processing time was 0.21 s, the shortest among all the tested methods. For a 2000 × 1300 image, the runtime of the MSRCR method was 7.94 s, while the runtime of the currently proposed method was 7.38 s. The processing speed of the currently proposed method was thus comparable to that of the MSRCR method, with a slightly shorter processing time for a single image.

Table 5 Comparison of the times required for the different algorithms to process different-resolution images (time units: s)

The above subjective evaluation, objective evaluation, and time complexity analysis suggest that the currently proposed method outperforms all the other considered methods. The currently proposed method effectively enhances low-light images, and its temporal complexity is acceptable.

4.4 Image enhancement under extreme conditions

To more comprehensively evaluate the performance of the currently proposed method, image enhancement was applied to images that were acquired under extreme illumination conditions. Very low-light images that were enhanced using the currently proposed method are depicted in Fig. 10. In the first and the third rows in Fig. 10, the original low-light images are shown. In the second and the fourth rows, the corresponding enhanced images are shown. In Fig. 10a, an image that was captured under backlight conditions is shown. Owing to the backlight, the image exhibits tonal deviation. The processed (enhanced) image has much higher brightness and contrast. In Fig. 10c, a dark indoor scene is shown. Because the light is very low and the image content is blurred, the enhancement is not ideal, but the colour of the box can be easily identified. In Fig. 10d, the enhancement of a natural-scene image is demonstrated. The brightness and contrast of the enhanced image are much better, and the tonal imbalance is significantly alleviated. The enhancement of the two images shown in Fig. 10e and f is not ideal, but no block effect is observed in the enhanced images, and they look more natural.

Fig. 10 Examples of image enhancement for images acquired under extreme illumination conditions

To analyse the experimental results more objectively, the images in Fig. 10 were also enhanced using the MSRCR, LACE, SRIE, LIME, EFF, and SRRM methods. To quantitatively compare the overall image quality, the quality of the enhanced images was measured using the NIQE and BRISQUE metrics.

NIQE is an objective evaluation index that requires no reference image. It extracts features from highly regular natural scenes and fits them with a multivariate Gaussian model. Comparing the features of the test image with those of natural scene images therefore measures the difference between the multivariate distribution of the image under test and that of natural scene images; the reference distribution is constructed from features extracted from a series of normal natural images.

The NIQE algorithm thus measures the difference between the test image and natural images. First, small patches are extracted from the image, and a generalised Gaussian distribution is used to describe the spatial-domain features of these patches. Finally, the features of the test image are compared with the standard natural-image features, and the evaluation formula is as follows:

$$ D\left( {v_{1}},{v_{2}},{\Sigma_{1}},{\Sigma_{2}}\right) = \sqrt{{{({v_{1}} - {v_{2}})}^{T}}{{\left( \frac{{\Sigma_{1}} + {\Sigma_{2}}}{2}\right)}^{- 1}}({v_{1}} - {v_{2}})} , $$
(15)

where v1, v2 and Σ1, Σ2 are the mean vectors and covariance matrices of the multivariate Gaussian models of the natural images and of the distorted image, respectively. The larger the gap between the test image and natural scene images, the larger the NIQE value, and the worse the image quality.
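
A direct sketch of Eq. (15) (a plain matrix inverse is used here; a pseudo-inverse would be safer for ill-conditioned covariance matrices):

```python
import numpy as np

def niqe_distance(v1, S1, v2, S2):
    """Distance of Eq. (15) between the natural-image MVG model (v1, S1)
    and the test-image MVG model (v2, S2)."""
    d = v1 - v2
    S = (S1 + S2) / 2.0
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))
```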

BRISQUE is a general-purpose no-reference image quality evaluator based on natural scene statistics. First, locally normalised luminance coefficients are obtained by subtracting the local mean and dividing by the local standard deviation; these coefficients quantify the loss of the “naturalness” of the image. A parametric feature vector is then obtained by fitting an asymmetric generalised Gaussian distribution to capture the relationship between natural and distorted images. Finally, a support vector model maps the parametric features to quality scores.

When BRISQUE is used for image quality evaluation, the test image is represented by a hand-crafted feature vector, which is then mapped to a score using a support vector model. The features are extracted at two scales: first at the original resolution, and then after scaling the image by a factor of 0.5. At each scale, local luminance normalisation is performed and an asymmetric generalised Gaussian distribution is fitted, among other operations. Because BRISQUE also measures the difference between the test image and natural images, the higher the BRISQUE score, the worse the image quality.

The BRISQUE and NIQE values for the images in Fig. 10 are listed in Tables 6 and 7, respectively, with the best performance marked in bold. For the BRISQUE values (Table 6), the MSRCR method yielded the best result of 25.618 on Image 2, whereas the currently proposed method yielded 27.117; for all the other images, the currently proposed method demonstrated the best results among all the methods, with an average value of 22.714. Regarding the NIQE values in Table 7, the LACE, SRRM, and currently proposed methods all demonstrated good enhancement performance. By comparing the results for the different methods, we conclude that the currently proposed method yielded better values on multiple images, and has better effectiveness and applicability than the other considered methods.

Table 6 Comparison of BRISQUE values of images enhanced by the different methods
Table 7 Comparison of NIQE values of images enhanced by the different methods

5 Conclusion

In the present study, we considered the problem of enhancing low-light images. We first provided an overview of the problem and of the existing methods for addressing it. Then, we introduced our proposed method for image enhancement and studied its performance. The proposed method first inverts a low-light image to obtain a foggy image. Next, the method uses a CNN based on dense residual blocks to estimate the transmission map of the image. After calculating the transmission map, the atmospheric light intensity of the image is estimated: we used a dark channel prior method, taking the brightest 0.1% of dark primary colour pixels as candidates for the atmospheric light intensity. The obtained transmission map and atmospheric light intensity were substituted into a low-light imaging model to obtain a clear image. We thus used a conventional physical model to obtain clear images, and used CNNs for statistical inference.

The proposed method was shown to better maintain the image details and overall exhibited a balanced distribution of the image colour. We used the IE, AG, BRISQUE, and NIQE metrics for evaluating the performance of the method. Both objectively and subjectively, the currently proposed method exhibited significant advantages over the existing methods. In future work, we will seek to improve the real-time performance of the algorithm and will further explore the application of low-light enhancement to video images.