1 Introduction

Driving at night is considerably harder than driving during the day due to insufficient light. In low light, it is very difficult for a driver to detect pedestrians, potholes, or other objects on the road in real time. As a result, driving becomes risky and the chance of a road accident rises; fifty percent of road accidents happen at night due to low light (Organization 2018). Therefore, the driver's vision at night needs to be enhanced. Early approaches to low-light image enhancement incur high computation times and are applicable only to a single input image. Most existing methods produce unnatural or over-enhanced results, suffer from contrast mismatch, are expensive, or require huge computation times, and are therefore unsuitable for real-time driving scenarios. A typical road at night is shown in Fig. 1.

Fig. 1 Insufficient light on the road at night

To tackle the aforesaid drawbacks, a novel, very fast, real-time low-light vision enhancement technique for drivers is proposed. The approach is based on a modified bright channel prior with a strategy to reduce the heavy computation time, followed by adaptive gamma correction as a post-processing step for final contrast adjustment. The main contributions are summarized as follows.

  • The proposed method can provide a real-time bright vision for drivers at night within a minimum computation time.

  • It provides a prominent enhanced vision using a low-cost 2D camera to avoid high expenditure.

  • To reduce the huge processing time, frames are classified into two states. The value of atmospheric light (A) is recalculated only when a new state starts.

  • A concept of dynamic-patch is used to avoid over enhancement for the darker region of frames. Smaller patch size is provided to darker pixels and vice versa. The use of dynamic-patch provides prominent bright output for the road and surrounding area.

  • Use of the adaptive gamma correction as a post-processing step uplifts the quality of the output frame to a pleasant level with perfect contrast.

  • The proposed method outperforms state-of-the-art models for all types of frames in real-time at night.

The remainder of the article is organized as follows. Section 2 reviews research in the literature that is closely associated with the proposed approach. Section 3 presents a step-by-step explanation of the proposed approach. Section 4 presents an assessment of real-time experimental performance. Lastly, Sect. 5 concludes the paper and describes the next stage of future research.

2 Literature review

Early approaches to low-light image enhancement incur high computation times and are applicable only to a single input image. A low-light enhancement method is introduced (Ying et al. 2017; Ren et al. 2019b) using the response characteristics of cameras. Two images with different exposures are inspected to find an accurate camera response model. Then, illumination estimation methods are used to evaluate the exposure ratio map. Lastly, the desired exposure is obtained from the exposure ratio map.

The multi-branch low-light enhancement network (MBLLEN) (Lv et al. 2018, 2019) is built using a deep learning technique. The main idea is to extract rich features at different levels so that enhancement can be applied via multiple subnets; the output image is then produced by multi-branch fusion. However, this method requires large-scale training data.

Another low-light image enhancement algorithm is established (Tanaka et al. 2019) based on gradients. As gradients are more sensitive than absolute values to human vision, this method improves the gradients of dark regions. In addition, intensity-range constraints are imposed for image integration, so that the output image can be integrated from the enhanced gradients while preserving the specified gradient information.

A real-time low-light enhancement algorithm with detail preservation is introduced (Bhat et al. 2019) for low-cost embedded platforms. The method is integrated into the image signal processing pipeline of TI's TDA3x and achieves ~ 50 fps on the C66x DSP for HD-resolution video acquired from OmniVision's OV10640 Bayer image sensor.

Low-light images are enhanced (Sun et al. 2017) using the illumination-reflection model. Based on this model, a guided filter is used to estimate the illumination component of the original image. Then, the reflection component is obtained and enhanced by sigmoid and gamma (nonlinear) functions, respectively. Finally, high-contrast output images are generated.

Another low-light single-image degradation model (Gu et al. 2018) is built on the atmospheric scattering model, and a pure pixel ratio prior is presented to enhance low-light images. First, the input image is segmented into regions of similar brightness, and a scene-based transmission approximation is used instead of the traditional pixel-based fashion. Next, the rough transmission map is refined with a total-variation smoothing operator, and the enhanced output image is obtained accordingly.

A low-illumination image enhancement technique (Shi et al. 2018) is proposed for night-time based on a dual channel prior. It builds upon two existing image priors: the bright channel prior and the dark channel prior. The bright channel prior is applied to obtain a primary transmission map. Then, the dark channel is used as a complementary channel to correct inaccurate transmission estimates obtained from the bright channel prior.

The maximum diffusion value is used in another method (Kim et al. 2019) for low-light image enhancement. The estimated illumination component can be precisely separated from the scene reflectance by choosing the maximum value at each pixel position, and the visual quality of the image is thereby enhanced. However, the method often causes a color-inconsistency problem due to an incorrectly estimated illumination map and suffers from low visibility.

An unsupervised learning method (Lee et al. 2020) for single low-light image enhancement is proposed using the bright channel prior with a small patch size (close to 1). An unsupervised loss function is also defined, and an enhancement network consisting of a simple encoder and decoder is trained with it. Furthermore, a saturation loss and a self-attention map are introduced to preserve image details and a natural appearance in the enhanced output. However, it sometimes produces unnatural results and consumes substantial processing time.

An adaptive model for contrast enhancement is introduced (Hsieh et al. 2020) for partially shaded low-light images. The bright pixels are preserved while the dim pixels are boosted, and the split Bregman process is used for an effective numerical implementation of the adaptive variational model. However, it tends to over-enhance the resulting image.

Another low-light image enhancement (LIME) method is introduced (Guo et al. 2017) where the illumination of each pixel is first estimated by finding the maximum value over the RGB channels. The initial illumination map is then refined by imposing a structure prior to form the final illumination map, and the enhanced output is obtained from this well-constructed illumination map.

A dark video restoration procedure is presented (Ko et al. 2017) using the same patches among the adjacent frames. This procedure consists of three phases: (1) brightness enhancement by using the same patches among the adjacent frames and the adaptive accumulation, (2) improved color assignment to reduce color distortion, and (3) image fusion for saturation reduction using the guide map. This approach does not produce any unwanted artifacts for the similar patches among the adjacent frames. Fully automatic color preservation and average brightness control are empowered by the color assignment and the fusion steps. But, the time complexity of this method is high.

The robust-Retinex-model (Li et al. 2018) is used to enhance low-light images. It additionally considers a noise map associated with the conventional Retinex model. An optimization function is introduced which includes new regularization terms for reflectance and illumination.

The absorption-light-scattering model (ALSM) (Wang et al. 2019) is used to describe the absorbed light in low-light images. The minimum channel of the ALSM is identified, and a novel low-light image enhancement technique is derived by observing the monotonic relation between atmospheric light and scene reflection. However, it consumes a huge computation time.

A trainable hybrid network with a spatially variant recurrent neural network (RNN) is introduced (Ren et al. 2019a) to enhance the visibility of an image. The network comprises two streams that jointly learn the global content and the salient structures of an image in a unified framework. The content stream estimates the global content of the low-light input image using an encoder-decoder network, while the spatially variant RNN prevents the loss of structural details. Nevertheless, the resulting image is not as natural as the actual ground truth.

However, all the above approaches are designed for low-light enhancement of a single image, not for real-time video. They consume huge processing time, incur high cost, and produce unnatural or over-enhanced results, which makes them unsuitable for real-time applications, especially driving.

3 Proposed approach

The proposed approach is based on a modified bright channel prior applied to the atmospheric light scattering model. It includes atmospheric light estimation with a dynamic patch size, frame inversion and transmission map estimation, and bright scene recovery, followed by gamma correction for final contrast adjustment. The proposed framework for real-time low-light vision enhancement is presented in Fig. 2, and a stepwise visualization of the overall framework is shown in Fig. 3.

Fig. 2 The proposed framework of the real-time low-light vision enhancement approach

Fig. 3 Visualization of the overall framework of the proposed approach

3.1 Atmospheric light estimation using maximum filtering technique with dynamic patch

In computer vision, the atmospheric light scattering model (Shi et al. 2018; Kim et al. 2019; Lee et al. 2020) is defined as:

$$I(p) = J(p)t(p) + A(1 - t(p))$$
(1)

where A is the atmospheric light, I(p) is the observed intensity at pixel p (the input frame), J(p) is the scene radiance to be recovered (the output frame), and t(p) is the transmission map. The low-light vision enhancement process aims to compute J from I with the help of A and t. A frame is split into multiple small patches of size Ω.

The local patch Ω(p) size is provided dynamically depending upon the intensity of the local pixels as follows:

$$\Omega (p) = \left\{ {\begin{array}{*{20}l} {3 \times 3\quad \forall \;{\text{pixels where intensity}} \le 85} \\ {5 \times 5\quad \forall \;{\text{pixels where }}86 \le {\text{intensity}} \le 170} \\ {9 \times 9\quad \forall \;{\text{pixels where intensity}} > 170} \\ \end{array} } \right.$$
(2)

Here, regions or pixels of a frame with comparatively high intensity are given less importance. Therefore, if the intensity of a pixel is greater than 170, a block of 9 × 9 pixels (a bigger patch) is considered as a single local patch. Similarly, if the intensity of a region or pixel is very low (at most 85), it indicates a comparatively darker area of the frame, which needs special care for enhancement; the darker areas are therefore split into smaller 3 × 3 patches so that they receive closer attention. In this way, the patch sizes are determined dynamically depending upon intensity. The threshold values were chosen for the best output quality after many real-time observations.
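For illustration, a minimal NumPy sketch of this dynamic patch-size assignment is given below. The function name and the assumption of an 8-bit single-channel intensity map (e.g. the per-pixel maximum over R, G, and B) are ours, not part of the original implementation.

```python
import numpy as np

def dynamic_patch_sizes(intensity):
    """Assign a patch side length to every pixel from its intensity (Eq. 2).

    `intensity` is an 8-bit single-channel map; the thresholds 85 and 170
    follow the paper.
    """
    sizes = np.full(intensity.shape, 5, dtype=np.uint8)  # 86..170 -> 5x5 patch
    sizes[intensity <= 85] = 3                           # dark pixels -> 3x3 patch
    sizes[intensity > 170] = 9                           # bright pixels -> 9x9 patch
    return sizes
```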

The maximum filter is applied to every local patch over the three RGB channels. The highest intensity Ihighest(p) at pixel p is estimated as follows:

$$I^{highest} (p) = \mathop {\max }\limits_{y \in \Omega (p)} \left( {\mathop {\max }\limits_{c \in \{ r,g,b\} } I^{c} (y)} \right)$$
(3)

where Ic is a color channel of I. The atmospheric light A is the average intensity of 0.2% of the pixels (i.e. the brightest 0.1% and the darkest 0.1% pixels), denoted as the set K, in the bright channel of I, i.e.:

$$A = \frac{1}{\left| K \right|}\sum\limits_{p \in K} {I^{highest} (p)}$$
(4)
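The sketch below shows one possible realization of Eqs. (3) and (4): the per-pixel dynamic patch is emulated by running a fixed max filter (morphological dilation) for each of the three patch sizes and selecting, per pixel, the result matching its assigned size. The function names are hypothetical, and `dynamic_patch_sizes` is the helper sketched above.

```python
import cv2
import numpy as np

def highest_intensity(frame_bgr):
    """I^highest(p) from Eq. (3): a max filter over the dynamic patch."""
    max_rgb = frame_bgr.max(axis=2)                 # maximum over the colour channels
    sizes = dynamic_patch_sizes(max_rgb)            # 3, 5 or 9 per pixel (sketch above)
    i_highest = np.empty_like(max_rgb)
    for k in (3, 5, 9):                             # dilation acts as a k x k max filter
        dilated = cv2.dilate(max_rgb, np.ones((k, k), np.uint8))
        i_highest[sizes == k] = dilated[sizes == k]
    return i_highest

def atmospheric_light(i_highest):
    """A from Eq. (4): mean of the brightest 0.1% and darkest 0.1% of I^highest."""
    flat = np.sort(i_highest.ravel())
    n = max(1, int(0.001 * flat.size))              # 0.1% of the pixels
    return float(np.concatenate([flat[:n], flat[-n:]]).mean())
```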

At night, the atmospheric light (A) originates mainly from streetlights and headlights. The road view at night is classified into two states: (a) bright (high-beam headlights of oncoming vehicles are present) and (b) dark (no high-beam headlights of any vehicle). Many real-time experiments reveal nearly identical values of A within the same state. Therefore, the value of A is calculated when a new state starts and is not recalculated until the state changes.

In the case of a bright frame (containing high-beam light), the highest intensity is present in the high-beam headlight region, so Ihighest(p) will be the highest intensity (the maximum among R, G, and B) for a local patch. Hence, the brightest 0.1% of pixels will come from the high-beam light region, and the darkest 0.1% will come from other, darker regions of the frame. Therefore, the atmospheric light A (the average intensity of these 0.2% of pixels) will be a balanced value, which enhances the entire frame smoothly.
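Since the paper does not spell out how the bright/dark state is detected, the sketch below approximates it by the fraction of near-saturated pixels; both thresholds are hypothetical. It also illustrates how A can be cached across frames of the same state.

```python
import numpy as np

def classify_state(intensity, saturation_level=240, bright_fraction=0.01):
    """Label a frame 'bright' or 'dark' (hypothetical heuristic).

    The paper keys the state on the presence of high-beam headlights; here
    that is approximated by the fraction of near-saturated pixels.
    """
    frac = float(np.mean(intensity >= saturation_level))
    return "bright" if frac >= bright_fraction else "dark"

# Caching A across frames of the same state:
#   state = classify_state(frame.max(axis=2))
#   if state != previous_state:          # recompute A only when the state changes
#       A = atmospheric_light(highest_intensity(frame))
#       previous_state = state
```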

3.2 Frame inversion and transmission map estimation

Now, Eq. (1) is normalized as follows:

$$\frac{{I^{c} (p)}}{{A^{c} }} = t(p)\frac{{J^{c} (p)}}{{A^{c} }} + (1 - t(p))$$
(5)

Afterward, the highest intensity is obtained by applying the max operator to both sides of Eq. (5):

$$\mathop {\max }\limits_{y \in \Omega (p)} \left( {\mathop {\max }\limits_{c} \frac{{I^{c} (y)}}{{A^{c} }}} \right) = t(p)\mathop {\max }\limits_{y \in \Omega (p)} \left( {\mathop {\max }\limits_{c} \frac{{J^{c} (y)}}{{A^{c} }}} \right) + (1 - t(p))$$
(6)

Here, t(p) is the transmission map. As J is a clear, bright output frame, the highest intensity of J tends to 255, i.e.:

$$J^{highest} (p) = \mathop {\max }\limits_{y \in \Omega (p)} \left( {\mathop {\max }\limits_{c} J^{c} (y)} \right) = 255$$
(7)

Putting Eq. (7) into Eq. (6), we have

$$\frac{{I^{highest} (p)}}{{A^{c} }} = t(p)\frac{255}{{A^{c} }} + (1 - t(p))$$
(8)

From Eq. (8), the transmission t(p) is estimated as

$$t(p) = \frac{{I^{highest} (p) - A^{c} }}{{255 - A^{c} }}$$
(9)
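A direct sketch of Eq. (9) follows. The clipping to [0, 1] and the assumption of a scalar atmospheric light A are our additions; the lower bound on t is applied later, in the recovery step.

```python
import numpy as np

def transmission_map(i_highest, A):
    """Transmission t(p) from Eq. (9), assuming a scalar atmospheric light A."""
    t = (i_highest.astype(np.float32) - A) / (255.0 - A)
    return np.clip(t, 0.0, 1.0)          # clipping to [0, 1] is our addition
```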

3.3 Bright scene recovery

With the help of A and t(p), the bright scene is recovered according to Eq. (1). Sometimes t(p) becomes close to zero and the output J(p) becomes slightly noisy. To avoid this problem, the transmission is restricted to a lower bound t0. Here, t0 = 0.15 was chosen after several real-time experimental analyses. The final scene radiance J(p) is recovered by

$$J(p) = \frac{I(p) - A}{{\max (t(p),t_{0} )}} + A$$
(10)
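A sketch of Eq. (10) with the lower bound t0 = 0.15 is given below; it assumes a scalar A, the transmission map from the previous sketch, and an 8-bit BGR input frame.

```python
import numpy as np

def recover_bright_scene(frame_bgr, t, A, t0=0.15):
    """Bright scene recovery from Eq. (10) with the lower bound t0 = 0.15."""
    t = np.maximum(t, t0)[..., np.newaxis]          # broadcast over the colour channels
    J = (frame_bgr.astype(np.float32) - A) / t + A
    return np.clip(J, 0, 255).astype(np.uint8)
```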

3.4 Adaptive gamma correction

The light intensity emitted by a monitor screen is usually not a linear function of the input signal; it depends on a power of the applied voltage. This exponent is known as gamma. The relation used to increase the intensity of dark, low-contrast frames can be written as:

$$I = V^{\gamma }$$
(11)

where the voltage V driving the electron flow corresponds to the input intensity, I is the light intensity emitted by the monitor (the output intensity), and gamma (γ) is the constant power factor. The variation of light intensity for different gamma (γ) values is shown in Fig. 4.

Fig. 4 Variation of light intensity for different gamma (γ) values

The gamma correction method (Rahman et al. 2016; Huang et al. 2013; Acharya and Giri 2020) corrects this non-linearity through a look-up table using the following steps:

Step 1: Input frame

Step 2: Determine the maximum pixel intensity

Step 3: Select the value of gamma

Step 4: Look-up table formation

Step 5: Mapping of input pixel values in the look-up table

Step 6: Output frame

After determining the maximum intensity value of the input frame and finalizing the gamma value, the look-up table for an 8-bit frame is formed as follows.

$$Look{\text{-}}Up\;Table = \left| {Max\_intensity \times \left( {\frac{{[0:Max\_intensity]}}{{Max\_intensity}}} \right)^{\gamma } } \right|$$
(12)

Pixels of the input frame are then mapped to values in the look-up table: for each pixel, its intensity is used as an index into the table, and the corresponding table value is written to the output frame, as demonstrated in Fig. 5.

Fig. 5 Mapping of pixels to the values in the look-up table

Here, a gamma value of γ = 0.45 is used to build the look-up table and enhance dark low-light frames. This value, suitable for all types of night-time frames, was chosen after many real-time experiments. With γ < 1, a narrow range of low-light input values is mapped to a wider range of brighter output values, and gamma correction boosts a dark pixel more than a brighter pixel in the same dark frame. Therefore, frames both with and without bright light sources (such as streetlights, headlights, and moonlight) are handled well by the chosen gamma value. Hence, from Eqs. (11) and (12) we get:

$${\text{Enhanced intensity}} = \left| {255 \times \left( {\frac{{{\text{Original intensity}}}}{255}} \right)^{0.45} } \right|$$
(13)
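As an illustration, Eq. (13) can be applied to every pixel through an 8-bit look-up table, e.g. via OpenCV's cv2.LUT; the function name below is hypothetical.

```python
import cv2
import numpy as np

def gamma_correct(frame, gamma=0.45):
    """Gamma correction via an 8-bit look-up table (Eqs. 12-13)."""
    levels = np.arange(256, dtype=np.float32)
    lut = np.clip(255.0 * (levels / 255.0) ** gamma, 0, 255).astype(np.uint8)
    return cv2.LUT(frame, lut)            # maps every pixel through the table
```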

Enhanced frames after gamma correction with various gamma (γ) values are shown in Fig. 6. From Fig. 6, it is observed that gamma correction with γ = 0.25 produces an over-bright output, whereas γ = 0.75 produces a darker output and γ = 0.45 produces a well-balanced output.

Fig. 6 Gamma correction with various gamma values: a original frame after bright scene recovery, b gamma (γ) = 0.25, c gamma (γ) = 0.75, d gamma (γ) = 0.45

4 Experimental results

The proposed approach is tested on many real-time frames captured at night. The approach is also evaluated on the BDD100K dataset (Yu et al. 2020), which contains plenty of night-time driving images. Outstanding performance is observed for most frames with minimal processing time. Quantitative and qualitative results are reported as follows.

4.1 Quantitative performance

As the ground truth of the output frames is unknown, no-reference (NR) metrics are used to compare the proposed approach with state-of-the-art techniques. The no-reference free-energy-based robust metric for contrast distortion (NIQMC) and the blind image quality measure of enhanced images (BIQME) are used for effective performance assessment (Gu et al. 2017, 2018).

NIQMC evaluates the quality of an image from its local details and its global histogram, and it particularly favors images with high contrast. Thus, higher NIQMC values indicate better contrast quality.

BIQME comprehensively considers five influencing factors, i.e., contrast, sharpness, brightness, colorfulness, and naturalness, and 17 associated features to blindly predict visual quality. A higher BIQME indicates better image quality.

Besides, the output frames are assessed by three other effective assessment methods: lightness order error (LOE) (Guo et al. 2017; Ying et al. 2017), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) (Lv et al. 2018, 2019). LOE measures the lightness distortion between the original input image and the enhanced image; the lower the LOE value, the better the enhancement. PSNR is evaluated based on the noise present in the enhanced image; the higher the PSNR value, the better the approximation. The SSIM index quantifies the similarity between two images, considering three aspects of the enhanced image: lighting, contrast, and structure. The SSIM index is a decimal value between −1 and 1, and SSIM = 1 only when two identical images are compared.
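For reference, PSNR and SSIM can be computed with scikit-image as sketched below. Because no ground truth is available, the low-light input frame is used here as the reference, which is an assumption on our part; LOE, NIQMC, and BIQME are omitted as they require their own reference implementations.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_scores(reference_frame, enhanced_frame):
    """PSNR and SSIM between a reference frame and its enhanced version."""
    psnr = peak_signal_noise_ratio(reference_frame, enhanced_frame, data_range=255)
    ssim = structural_similarity(reference_frame, enhanced_frame,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```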

A comparative performance analysis (averaged over 30 real-time frames and 30 frames from the BDD100K dataset) among existing low-light vision enhancement methods in terms of NIQMC, BIQME, LOE, PSNR, and SSIM is presented in Table 1. Table 1 reveals the performance superiority of our proposed technique compared with the existing low-light vision enhancement methods.

Table 1 Quantitative performance assessment (higher is better for BIQME, NIQMC, PSNR, and SSIM; lower is better for LOE)

4.2 Qualitative performance

The qualitative performance of the proposed approach is demonstrated stepwise in Fig. 7, where prominent, bright road views (output frames) are generated from darker night-time frames (real-time original frames and frames from the BDD100K dataset). A close view of the stepwise qualitative performance on a real-time frame containing the high-intensity headlight of an oncoming bike in the evening is shown in Fig. 8, which demonstrates the clarity of the enhanced output frame generated from a low-light input frame. We qualitatively evaluated and compared the proposed method with state-of-the-art techniques in Fig. 9. In Fig. 9, the outputs of MBLLEN (Lv et al. 2018) and GBLIE (Tanaka et al. 2019) look better, but darkness remains throughout the road, so objects are not prominently visible. The output of LIEMDV (Kim et al. 2019) becomes somewhat hazy due to over-enhancement of the frame; this method often causes color inconsistency due to an incorrectly estimated illumination map and suffers from low visibility. There is also a lack of sufficient light across the road and surrounding views in the output of LIME (Guo et al. 2017).

Fig. 7 Low-light enhancement of frames (real-time frames and frames from the BDD100K dataset): a original frame, b highest intensity obtained by dynamic patch, c transmission map, d output frame after bright scene recovery and gamma correction

Fig. 8 Low-light enhancement of a real-time frame containing the high-intensity headlight of an oncoming bike in the evening: a original frame, b highest intensity obtained by dynamic patch, c transmission map, d bright scene recovery, e output frame after gamma correction

Fig. 9 Qualitative comparison among the state-of-the-art techniques: a input frame, b result of Lv et al. (2018), c result of Tanaka et al. (2019), d result of Kim et al. (2019), e result of Guo et al. (2017), f result of our proposed method

However, the result obtained from our proposed method is bright, prominent, natural, and maintains the color consistency, while enhancing dark regions effectively. Figure 9 also demonstrates the performance superiority of our proposed technique compared to previously introduced approaches.

4.3 Processing time analysis

In driving, timing is vital. If the displayed view lags behind the live scene, the chance of an accident is high, so there should be no noticeable delay between video acquisition and display of the reconstructed video. In the proposed system, the total processing time per frame is only a few milliseconds, so the enhanced display appears to the driver like the original live view. The time taken (in milliseconds) by the entire process for various frame sizes is shown in Fig. 10.

Fig. 10 Processing time observation (CPU configuration: Intel(R) Core(TM) i7-10510U @ 2133 MHz, 8 GB RAM, and 500 GB SSD)

The main strategy behind such a short processing time is that the atmospheric light (A) is not calculated for every frame; rather, it is calculated only when a new state starts. A comparison of processing time between the existing methods and the proposed method is shown in Fig. 11.
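The sketch below ties the earlier helper sketches into a per-frame loop and illustrates this caching strategy together with a simple per-frame timing measurement. It is an illustrative reconstruction under our assumptions, not the authors' implementation, and the video file name in the usage comment is hypothetical.

```python
import time
import cv2

def enhance_stream(capture):
    """Per-frame loop: A is refreshed only when the bright/dark state changes."""
    previous_state, A = None, None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        start = time.perf_counter()
        i_highest = highest_intensity(frame)                 # Eq. (3), sketched earlier
        state = classify_state(frame.max(axis=2))
        if state != previous_state:                          # new state -> recompute A
            A = atmospheric_light(i_highest)                 # Eq. (4)
            previous_state = state
        t = transmission_map(i_highest, A)                   # Eq. (9)
        output = gamma_correct(recover_bright_scene(frame, t, A))  # Eqs. (10), (13)
        print("frame processed in %.1f ms" % ((time.perf_counter() - start) * 1e3))
        cv2.imshow("enhanced", output)
        if cv2.waitKey(1) & 0xFF == 27:                      # Esc to quit
            break

# Example usage: enhance_stream(cv2.VideoCapture("night_drive.mp4"))
```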

Fig. 11 Comparison of processing time among the existing methods and the proposed method for a 1280 × 720 resolution frame

From Figs. 10 and 11, it is observed that if a small display (like a traditional night-vision display) of standard resolution (848 × 480) is used, the processing time is very low; a normal computing processor can handle this resolution and produce a real-time display without any noticeable delay. However, if a high-resolution (HD) display is used, a high-speed processor is required to process the frames without noticeable delay, and in that case the processor cost will be slightly higher. During numerous real-time experiments, a pleasant real-time output at 960 × 540 resolution was observed on a 13-in. display with a conventional processor [Intel(R) Core(TM) i5-8250U CPU @ 1.60–1.80 GHz with 8 GB RAM] without any delay.

5 Conclusion

This paper presents a fast, real-time low-light vision enhancement technique to improve road visibility at night for drivers. Drivers face problems at night mainly due to a lack of sufficient light, and this problem cannot be handled properly by headlights and streetlights alone because these artificial lights cannot brighten all the pixels of a night-time frame. With the proposed technique, a driver can have a prominent, bright real-time view of the road and its surroundings at night that appears similar to a day-time view, and can easily distinguish objects, potholes, and pedestrians on the road. Many existing approaches can enhance a low-light image, but most of them require enormous processing time and are not suitable for real-time applications. The proposed approach provides a prominent real-time bright view of the road for drivers at night with minimal computation time using a low-cost 2D camera. Many real-time tests were conducted, and the experimental results reveal that the proposed approach performs favorably against existing low-light image enhancement algorithms. The system can be installed in any vehicle that regularly plies at night, reducing risk for the driver as well as the passengers, and pedestrians can also walk on the road more safely if the system is widely adopted. The system can also be used in autonomous vehicles, and road accidents at night should decrease accordingly.

In the future, the proposed work will be extended with additional features to handle rapid fluctuations of atmospheric light in a dynamic way. A region-based dynamic calculation of atmospheric light would be a good solution for producing high-quality enhanced frames with a proper balance of brightness and contrast throughout the entire frame within minimal processing time. A similar strategy will also be applied in a smartphone application for two-wheeler riders: instead of installing a camera on the windshield glass, a virtual reality headset with a smartphone will be used so that any two-wheeler rider can obtain an enhanced view on a dark road. Besides, unwanted high-beam light on the road will also be suppressed in a more dynamic way to produce a comfortable output frame for any driver or rider.