1 Introduction

Haze is a phenomenon caused by the absorption and scattering of light by particles in the atmosphere. It attenuates the radiance along the path toward the camera, so captured images suffer from reduced contrast and quality, and the visibility of distant scenes is limited. In recent times, significant advancements have been made in de-hazing techniques based on the atmospheric scattering model of the hazy image. Image de-hazing techniques fall into two categories: image-enhancement-based and physical-model-based methods. The image enhancement methods include dynamic range compression and shadow compensation [1, 7, 12], linear transformation [10], structure preservation [20], and dark channel prior [11], whereas polarization-based image de-hazing [22] is an example of a physical-model method.

According to [28], the methods [1, 7, 12] work well to enhance low-light areas while maintaining the color information and image data, without producing visual halos. However, they have a disadvantage in dark areas, which are not sufficiently enhanced. In the polarization method, haze is removed by considering several hazy images taken at different degrees of polarization; hence, more than one image is needed to recover the de-hazed image. Based on this idea, Narasimhan et al. [18, 19] proposed a haze image model to estimate the haze properties of an image, but the method can be used only in a thin-haze environment. Schechner et al. [22] observed that air-light becomes partially polarized when scattered by atmospheric particles and, based on this concept, proposed a new haze removal algorithm using a polarizer at various angles. Because it requires multiple images, the polarization method is inefficient for removing haze from a single image. Later, Kopf et al. [14] suggested a haze removal algorithm using depth information. Jobson et al. [13] used a multi-scale retinex method to increase the visibility of hazy images. Tan et al. [24] maximized the contrast of the restored image to obtain a high-contrast haze-free result; since the restored image has a higher contrast than the hazy input, this method tends to over-enhance the thickest haze regions, producing severe color distortions. Fattal et al. [9] proposed a refined image formation model, assuming that the surface shading and the scene transmission are locally uncorrelated, to estimate the thickness of the haze. Next, He et al. [11] proposed the widely used dark channel prior, which produces good outcomes and works efficiently compared with the enhancement methods [1, 7, 10, 12, 20, 28]. Meng et al. [16] suggested an effective regularization method for de-hazing in which the haze-free image is restored with the help of inherent boundary constraints.
Tang et al. [25] investigated the best combination of hazy features for image de-hazing using a random forest; however, the random forest has inherent limitations such as over-fitting. Cai et al. [6] proposed a trainable end-to-end system called DehazeNet, which works using Convolutional Neural Networks (CNN) based on a deep architecture [4]. Recently, a region-based active contour with a variational level set formulation for image segmentation has been proposed for de-hazing [2]. This method minimizes an energy function to preserve edges, but its iterative structure makes its computational complexity high and therefore difficult to implement on hardware. A fast and memory-efficient de-hazing algorithm for real-time computer vision applications is proposed in [23], but its hardware implementation is not discussed. Bai et al. [5] proposed a real-time single-image de-hazing system on a DSP processor, which required 4 ms for execution. Hence, for real-time applications, de-hazing is still a challenging task.

Digital image/video processing algorithms are difficult to implement on a processor due to factors such as the huge amount of data in an image and the complex operations that must be performed on it. Consider a single image of size \(256 \times 256\): performing a single operation (a \(3\times 3\) convolution/masking) requires about 0.2 million computations (addition/subtraction, multiplication, padding, and shifting), without considering the overhead of loading and storing pixel values. This paper presents two different implementation strategies to achieve the goal of hardware implementation: one on a Zynq-706 FPGA and the other on a DSP processor (TMS320C6748).

This paper is organized as follows. Section 2 describes the proposed method. In Sect. 3, the hardware architectures for the implemented de-hazing algorithm are described in detail. Section 4 discusses the simulation results. Conclusions are provided in Sect. 5.

2 Proposed Method

Based on atmospheric scattering theory, McCartney [15] described a haze image model in terms of light attenuation and air-light. This model is commonly used to describe haze image formation and is given as

$$\begin{aligned} I(x)=J(x)t(x)+A[1-t(x)] \end{aligned}$$
(1)

where x is the image pixel coordinate, I(x) is the input hazy image, J(x) is the output de-hazed image, A is the air-light, and t(x) is the medium transmission. In (1), the product term J(x)t(x) is known as direct attenuation, and the second term \(A[1-t(x)]\) is an additive component termed air-light. Three unknown components are present in the above equation. The basic idea of de-hazing is to restore J(x) from I(x) by estimating the parameters A and t(x). The final de-hazed image can then be represented as in (2)

$$\begin{aligned} J(x)=A+\left( \frac{I(x)-A}{max(t(x),t_{0})} \right) \end{aligned}$$
(2)

De-hazing can be performed using the method of [11]. However, this method suffers from high computational complexity owing to wide-ranging matrix multiplication/division, sorting, and floating-point operations. On low-speed processors, it cannot meet the user timing requirements of real-time image processing applications. Therefore, an efficient, low-complexity haze removal method based on pixel-wise and gray-image operations is proposed for real-time applications.
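The scene radiance recovery of (2) can be sketched in NumPy as follows; this is an illustrative sketch, and the function and variable names are ours, not the paper's.

```python
import numpy as np

def recover_radiance(I, A, t, t0=0.1):
    # Recover J(x) = A + (I(x) - A) / max(t(x), t0), per Eq. (2).
    # t0 bounds the transmission from below to avoid division blow-up.
    t = np.maximum(t, t0)
    return A + (I - A) / t

# With full transmission (t = 1) the hazy image is returned unchanged.
I = np.full((2, 2), 0.5)
J = recover_radiance(I, A=0.8, t=np.ones((2, 2)))
```

Note how the lower bound \(t_{0}\) only matters in dense-haze regions where t(x) is small; elsewhere the recovery is a simple affine rescaling of I(x) about A.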

Figure 1 shows the block diagram of the proposed algorithm, which is designed for a minimum number of computations.

Fig. 1
figure 1

Block diagram of proposed algorithm

2.1 Estimation of Average Channel

The DCP method employs the patch-based minimum of the pixels as the dark channel of the image; as a result, it produces halos and artifacts in the de-hazed image. The effect of employing a patch-based minimum of pixels is shown in Fig. 2: the white region becomes shaded, producing undesired pixel intensities that cause a halo effect.

Fig. 2
figure 2

Halo effect

$$\begin{aligned} I_{average}=\frac{I_{dark}+I_{gray}}{2} \end{aligned}$$
(3)

The \(I_{dark}\) can be estimated using [11], and \(I_{gray}\) can be estimated using (4). The mean of these is termed \(I_{average}\), and it significantly reduces the halos and artifacts in the final de-hazed image.

$$\begin{aligned} I_{gray}=0.3*R+0.59*G+0.11*B \end{aligned}$$
(4)
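Equations (3) and (4) can be combined in a short NumPy sketch; the pixel-wise (rather than patch-based) dark channel below follows the paper's pixel-based approach, and the names are ours.

```python
import numpy as np

def average_channel(img):
    # img: H x W x 3 float array, channels ordered R, G, B
    dark = img.min(axis=2)                                        # pixel-wise dark channel
    gray = 0.3*img[..., 0] + 0.59*img[..., 1] + 0.11*img[..., 2]  # Eq. (4)
    return (dark + gray) / 2.0                                    # Eq. (3)

# For a uniform mid-gray image, dark = gray = 0.5, so the average is 0.5.
flat = np.full((4, 4, 3), 0.5)
avg = average_channel(flat)
```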

2.2 Estimation of Atmospheric Light (AL)

It is noticed that the value of the air-light is always close to the highest pixel values within the image. After several simulation experiments in MATLAB with different values of atmospheric light, ranging from 0 to 1 for the floating-point datatype and from 0 to 255 for the unsigned integer format (uint8), the atmospheric light is taken as the maximum pixel value within the dark image. It can be calculated as “\(255-min\)” for the uint8 format and “\(1-min\)” for the floating-point format. Figure 3 shows the effect of varying the atmospheric light on de-hazed images: the flower database produces accurate de-hazing output at AL=0.7, whereas the table database does so at AL=0.9.
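The “\(255-min\)” / “\(1-min\)” rule above can be written directly; this is a minimal sketch of the rule as stated, with the function name assumed.

```python
import numpy as np

def atmospheric_light(img):
    # "255 - min" for uint8 images, "1 - min" for float images in [0, 1],
    # as described in Sect. 2.2.
    if img.dtype == np.uint8:
        return 255 - int(img.min())
    return 1.0 - float(img.min())
```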

Fig. 3
figure 3

Variation of atmospheric light on hazy images (Flower, Table database)

2.3 Estimation of Transmission Map

The transmission can be estimated by normalizing the average image with the atmospheric light. The recovered scene radiance is then obtained as the de-hazed image. The transmission maps of various algorithms are visually compared in Fig. 4. It is observed that our proposed method estimates a more accurate transmission map than the other existing algorithms.
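One plausible reading of “normalizing the average image with the atmospheric light”, following the usual DCP form, is \(t(x)=1-I_{average}(x)/A\); the exact normalization is an assumption here, as is the clipping range.

```python
import numpy as np

def estimate_transmission(avg, A, t0=0.1):
    # Assumed normalization: t(x) = 1 - I_average(x) / A,
    # clipped to [t0, 1] so the recovery in Eq. (2) stays bounded.
    return np.clip(1.0 - avg / A, t0, 1.0)
```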

Fig. 4
figure 4

Transmission depth maps of different algorithms: a Zhu et al., b Tarel et al., c our proposed method, d He et al., e Tripathi et al.

2.4 Performance Evaluation Metrics

Evaluation metrics indicate which method is preferable. In this paper, five performance metrics are used to evaluate the efficiency of the proposed algorithm.

2.4.1 PSNR and MSE

PSNR and MSE are well-known performance metrics for measuring the degree of error because they represent the overall error content in the entire output image. PSNR is defined as the logarithmic ratio of the peak signal energy (P) to the mean squared error (MSE) between the output image N and the input image M. It can be expressed as

$$\begin{aligned} PSNR= & {} 20\times log\left[ \frac{P}{\sqrt{MSE}}\right] \end{aligned}$$
(5)
$$\begin{aligned} MSE= & {} \frac{1}{kn}\sum _{i=0}^{k-1}\sum _{j=0}^{n-1}\left\| M_{i,j}-N_{i,j} \right\| ^{2} \end{aligned}$$
(6)

where P is the maximum pixel value of the image (typically \(P=255\)), MSE is the mean squared error, and k and n are the numbers of rows and columns of the image, respectively. Generally, a higher PSNR is desirable.
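Equations (5) and (6) translate directly into NumPy; this is an illustrative sketch, with base-10 logarithm assumed for the dB scale.

```python
import numpy as np

def mse(M, N):
    # Mean squared error per Eq. (6), averaged over all k*n pixels.
    M = M.astype(np.float64)
    N = N.astype(np.float64)
    return np.mean((M - N) ** 2)

def psnr(M, N, peak=255.0):
    # Peak signal-to-noise ratio in dB per Eq. (5).
    e = mse(M, N)
    if e == 0:
        return float("inf")   # identical images
    return 20.0 * np.log10(peak / np.sqrt(e))
```

For example, an all-black image compared against an all-white one gives the worst case MSE of \(255^2 = 65025\) and a PSNR of 0 dB.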

2.4.2 Computation Time or Average Time Cost (ATC)

It represents the amount of time needed to complete an algorithm, in seconds. The average time cost of our proposed algorithm is compared with that of the four existing algorithms over 21 iterations.

2.4.3 Average Contrast of Output Image (ACOI)

$$\begin{aligned} C=\frac{1}{MN}\left( max \left( \frac{L_{max}-L_{min}}{L_{max}+L_{min}} \right) \right) \end{aligned}$$
(7)

Here, \(L_{min}\) and \(L_{max}\) are the minimum and maximum luminance of the output image, respectively, and M and N are the numbers of rows and columns of the output image. Generally, a higher value indicates a higher-quality image.

Table 1 Comparison of quantitative metrics
Fig. 5
figure 5

Simulation results of de-hazing techniques (input hazy image, output of Zhu et al., Tarel et al., He et al., Tripathi et al., and the proposed method, from left to right)

The proposed algorithm is qualitatively and quantitatively compared in MATLAB 2017a with four existing state-of-the-art algorithms, namely Zhu et al. [31], Tarel et al. [26], He et al. [11], and Tripathi et al. [27], in terms of PSNR, ATC, ACOI, MSE, PHI, and SSIM [29] for input images of dimensions \(640 \times 480\). The quantitative comparison is presented in Table 1, and the corresponding visual comparison is shown in Fig. 5. It is observed that the proposed algorithm takes an optimal amount of execution time. The SSIM values indicate that the proposed method produces good-quality de-hazed images. The PSNR and PHI values of our method are high, indicating that both the haze removal and the visualization of the de-hazed image are good. The method thereby eliminates halos and artifacts in the recovered output image, avoids the time-consuming transmission-refinement step of DCP, and produces an accurate transmission map. Bold values in Table 1 indicate the best evaluation metric value among the compared methods.

3 Hardware Architecture

3.1 Implementation on Zynq-706 FPGA

Since the existing state-of-the-art algorithms are difficult to implement on a hardware platform, a 14-stage pipelined structure for single-image haze removal is designed on the Zynq-706 FPGA [8]. The implemented algorithm takes full advantage of the FPGA's powerful parallel processing. Figure 6 shows the hardware architecture of the proposed de-hazing algorithm on the Zynq FPGA. Initially, the design includes RAM inference using MIF (Memory Initialization File) files, which specify the initial contents of a memory block; three .mif files are sufficient for storing the input image data. The data stored in the .mif files are read out with the help of a counter. In the proposed design, each .mif stores \(256 \times 256 = 65536\) pixels, so a 16-bit counter is used to address the 64K-entry .mif data. Then, three 8-bit comparators are employed to find the minimum pixel among R, G, and B, after which a gray image of the input is obtained using shift operations per the following equation.

$$\begin{aligned} I_{gray modified}=0.25*R+0.5*G+0.25*B \end{aligned}$$
(8)

Equation (8) is not the same as the traditional definition (4), but in practice the difference is negligible, and it is much easier to implement in hardware. To reduce resource utilization, we employ a cut-off operation (shifting) instead of multiplication, thereby obtaining the gray image of the input.
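The shift-only weights of (8) are exactly why the hardware avoids multipliers: \(0.25R+0.5G+0.25B\) is realized as two right-shifts by 2 and one right-shift by 1. A sketch of the bit-level behavior (function name assumed):

```python
def gray_shift(r, g, b):
    # Eq. (8): 0.25*R + 0.5*G + 0.25*B realized as pure shifts.
    # Integer shifts truncate, matching the hardware cut-off operation.
    return (r >> 2) + (g >> 1) + (b >> 2)
```

Truncation makes the result slightly darker than exact arithmetic for some inputs (e.g., pure white maps to 253 rather than 255), which is the negligible difference noted above.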

Later, a 10-bit adder is used, and the LSBs of the sum and the carry bit are truncated so that 8-bit data is passed to the further stages. Once the gray and dark channels are obtained, their mean, computed using a shift operation, is taken as the average channel of the hazy input image. Since the uint8 image format is employed, the atmospheric light is calculated as the “255 − minimum” pixel value of the input image; the visibility of the output is not much affected compared with the software platform. Normalizing the average channel by the atmospheric light yields the transmission map; an 8-bit divider circuit is employed for this operation. The above process runs in parallel for the R, G, and B channels of the input image. Finally, using (2), the modified pixel values are obtained as the de-hazed image. The detailed 14-stage pipelining structure (RTL) of the proposed algorithm on the Zynq-706 platform, written in HDL, is shown in Fig. 7. To obtain the output in a systematic manner, we introduced two Intellectual Property (IP) cores, Virtual Input/Output (VIO) and Integrated Logic Analyzer (ILA); these cores are responsible for verifying the de-hazing algorithm. Figure 8 shows the top-level arrangement of the IP cores with the de-hazing algorithm.

Fig. 6
figure 6

Proposed Hardware architecture of image de-hazing

Fig. 7
figure 7

Schematic and elaborated design structure

Fig. 8
figure 8

Test setup of de-hazing algorithm in Vivado

Once the algorithm is implemented on hardware, it can be verified using the VIO console. Figure 9 shows simulation results of the handwritten HDL code for image de-hazing. The first three signals in the waveform are the input red, green, and blue pixels, respectively; the corresponding gray, dark-channel, and average-channel values of these pixels are shown at the bottom of the waveform. It is very hard to analyze all \(256\times 256=65536\) pixels manually, but using Tcl commands, the pixel values currently being processed on the FPGA can be exported to a spreadsheet, and the qualitative analysis is then verified in MATLAB.

Fig. 9
figure 9

Simulation results of image de-hazing system

3.2 Implementation on DSP Processor (TMS320C6748)

Based on DaVinci technology, which provides an ideal core for signal processing as well as image processing applications, the lowest-cost DSP device had to be chosen, for which various kinds of DSP processors were compared. Finally, the TMS320C6748 [30] was chosen as the core device of the system. The TMS320C6748 evaluation module serves as the data-processing hardware platform, and the Code Composer Studio [21] V6.0 simulator is used for the implementation.

The hardware implementation flow of the proposed algorithm on TMS320C6748 is presented in Figure 10.

Fig. 10
figure 10

Flow of hardware implementation on TMS320C6748

In the initial step, an input image of size 256 by 256 is taken from the standard database for experimental verification. At the pre-processing stage, the image is cast into the standard ‘floating point (double precision)’ datatype. The de-hazing algorithm is implemented in Code Composer as follows. In the first step, we implemented a minimum filter for an image of \(\hbox {M}\times \hbox {N}\times 3\) pixels. Using (4), the gray image of the input is calculated; the overall cost is \(\hbox {O}(2\times \hbox {M}\times \hbox {N})\) time. Since the floating-point format is employed, each pixel value is scaled to the range [0, 1] for transmission estimation, and the atmospheric light is taken as the “1 − minimum” pixel value of the image. Figure 11 shows the basic steps involved in implementing image de-hazing on the DSP TMS320C6748 processor. The process is applied to each color channel individually, and the channels are combined in the final step.
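The floating-point flow above can be sketched end to end as follows; the transmission normalization is the same assumption as in Sect. 2.3, and all names are ours.

```python
import numpy as np

def dehaze_float(img, t0=0.1):
    # img: H x W x 3 float array scaled to [0, 1], channels R, G, B.
    dark = img.min(axis=2)                                        # minimum filter
    gray = 0.3*img[..., 0] + 0.59*img[..., 1] + 0.11*img[..., 2]  # Eq. (4)
    avg = (dark + gray) / 2.0                                     # Eq. (3)
    A = 1.0 - img.min()                       # "1 - minimum" atmospheric light
    t = np.clip(1.0 - avg / A, t0, 1.0)      # assumed normalization, Sect. 2.3
    J = A + (img - A) / t[..., None]         # Eq. (2), applied per channel
    return np.clip(J, 0.0, 1.0)

out = dehaze_float(np.full((2, 2, 3), 0.5))
```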

Fig. 11
figure 11

Basic Steps involved in hardware implementation of image De-hazing using TMS320C6748

4 Results and Discussion

The validity of the proposed algorithm, implemented on the two hardware platforms, is verified using the well-known standard Middlebury database [3] and images from the Flickr website. To speed up the algorithm on the Zynq FPGA, the “uint8” pixel format is employed; consequently, there is a negligible loss of information, resulting in a variation of the percentage of haze improvement. In contrast, floating-point operations are an added advantage for obtaining better results in the DSP processor-based implementation. In both implementations, the refined transmission map (the guided filter in the case of DCP) is eliminated, and this technique helps to avoid artifacts and halos. The intensity variation between the hazy and de-hazed images can be observed in Fig. 12, where the red, green, and blue lines represent the variation of the RGB pixels of the image. It is observed that the brightness or intensity of the de-hazed image is lower than that of the hazy image pixels.

Fig. 12
figure 12

Sharp variation of the brightness of the image pixels: a hazy image, b de-hazed image

Table 2 indicates the speed of operation of the above-mentioned techniques. It is observed that the processing time required for executing the haze removal algorithm is optimal on both hardware platforms. The HDL-based [17] implementation on the Zynq-706 FPGA takes less time than the DSP processor-based implementation. Since a 14-stage pipeline structure is used, theoretically \(256\times 256\times 3\times 14=2752512\) clock cycles are required for the de-hazing operation.

Table 2 Execution time on hardware platforms
Table 3 Utilization summary of hardware resources
Fig. 13
figure 13

Implementation results of de-hazing system (Input hazy image, output on DSP processor, output on hand-written HDL based design, output of MATLAB from left to right)

The hardware resource utilization summary of the proposed algorithm on the Zynq platform is presented in Table 3; it is noticed that the algorithm uses few hardware resources. Far fewer of the other resources (LUT (Look-Up Table), LUTRAM, FF (Flip-Flops), IO (Input/Output), and MMCM) are utilized compared with BRAM, since storing the entire image or frame is not an efficient way of using the on-chip memory (BRAM) resources of an FPGA, and the .mif-based storage mechanism mainly consumes BRAM. This algorithm requires approximately 44% of the BRAM resources, which can be considered a major challenge in hardware implementation. It can be overcome by storing only a small portion of the image data inside the FPGA using line buffers instead of BRAMs; that is, a few selected lines of image data can be stored in a line buffer for manipulation, while the remaining part of the image is streamed in afterward. Figure 13 and Table 4 present the qualitative and quantitative comparisons of the de-hazing algorithm, respectively. Since the cut-off (shift) operation is employed to compute the gray image in the HDL-based implementation, its MSE is higher than that of the processor-based implementation. The PSNR and ACOI values suggest that both hardware-based de-hazing implementations produce good results. When throughput is the user constraint, the Zynq-based hardware platform is well suited; when output image quality is the constraint, the DSP processor-based implementation is well suited for de-hazing applications. The hardware test setup of the proposed algorithm is shown in Fig. 14.
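The line-buffer idea can be sketched in software: hold only the last three image rows and emit \(3\times 3\) neighborhoods as rows stream in, instead of storing the whole frame. This is an illustrative model of the mechanism, not the paper's HDL; all names are ours.

```python
from collections import deque

def stream_3x3_windows(rows, width):
    # Keep at most three rows "on chip" (deque with maxlen=3) and yield
    # each 3x3 neighborhood once the buffer is full, mimicking a
    # line-buffer-based sliding window instead of whole-frame BRAM.
    buf = deque(maxlen=3)
    for row in rows:
        buf.append(list(row))
        if len(buf) == 3:
            for x in range(1, width - 1):
                yield [line[x - 1:x + 2] for line in buf]

# Example: a 4 x 4 ramp image streamed row by row yields 4 windows.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = list(stream_3x3_windows(img, 4))
```

Memory then scales with three image lines rather than the full frame, which is exactly the trade-off that would relieve the BRAM pressure noted above.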

Table 4 Quantitative performance of de-haze algorithm on different hardware platforms

5 Conclusion

In this paper, we have introduced a fast and efficient de-hazing algorithm well suited for implementation on any hardware platform. The entire work was carried out on an i5 CPU at 3 GHz with 8 GB of memory, running Windows 8.1, with MATLAB 2017a; Vivado 2015.4 and Code Composer Studio v6.0 were used for the implementations. A Zynq (xc7z045ffg900-2)-based hardware architecture and a DSP processor-based image de-hazing algorithm were implemented successfully. The RGB format always needs 24 bits to represent a pixel, which affects the processing time and hardware resources. To overcome this issue, a YUV 4:2:2 format may be used; we defer this issue to future research. The same algorithm can also be extended to real-time video de-hazing.

Fig. 14
figure 14

Hardware platform test setup; left: on the DSP processor TMS320C6748, right: on the Zynq-706 All Programmable SoC