1 Introduction

Modern video, audio, and multimedia applications for mobile smart devices and in-vehicle systems require real-time processing and rely on multi-core embedded systems for their high data-rate processing capability and portability. The near future will bring many more such systems with the “Internet of Things”. In this paper, we show how to remove haze, a phenomenon that degrades visibility, from videos. Haze is caused by particles or water droplets in the atmosphere that absorb and scatter the light traveling from the source to the observer, as stated by McCartney [16]. Haze affects drivers as it changes the colors of objects and attenuates the scene radiance. Moreover, haze degrades the performance of autonomous vehicles, which depend on detecting objects in the scene such as road lanes. Thus, removing haze from videos can substantially improve the performance of these systems and greatly assist drivers as well. However, haze removal algorithms require many calculations and a large memory, which are barriers to real-time implementation on embedded systems.

In this paper, the TMS320DM6446 multi-core digital media system on chip (DMSoC) [26] is used to provide onboard portability for the embedded system and high speed in removing haze from videos. The TMS320DM6446 consists of an ARM926EJ-S core as a general-purpose processor (GPP), a TMS320C64x+ core as a digital signal processor (DSP), and a video and image coprocessor (VICP).

The most commonly used haze imaging model, also called the image degradation model, proposed by McCartney [16] is:

$${\mathbf{I}}({\mathbf{x}})={\mathbf{J}} ({\mathbf{x}})t({\mathbf{x}})+{\mathbf{A}}(1-t({\mathbf{x}})),$$
(1)

where I is the observed scene, J is the scene radiance, t is the medium transmission, A is the atmospheric light, and x = (x, y) denotes the horizontal and vertical position of the pixel. The term J(x)t(x) is the direct attenuation of the scene radiance, and the term A(1 − t(x)) is the added atmospheric light that distorts scene colors. The haze removal problem aims to recover J blindly, given only the observation I, without knowledge of either A or t. Therefore, solving Eq. (1) is an ill-posed inverse problem in the sense of Hadamard [10].
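For concreteness, here is a minimal numpy sketch that applies Eq. (1) in the forward direction, synthesizing a hazy observation from a known scene radiance, transmission, and atmospheric light; all names are ours and purely illustrative.

```python
import numpy as np

def apply_haze_model(J, t, A):
    """Eq. (1): I = J(x) t(x) + A (1 - t(x)).

    J: (H, W, 3) haze-free scene radiance in [0, 1]
    t: (H, W)    medium transmission in [0, 1]
    A: (3,)      atmospheric light per color channel
    """
    t3 = t[..., np.newaxis]          # broadcast t over the color axis
    return J * t3 + A * (1.0 - t3)

# Example: transmission 0.6 pushes a mid-gray scene toward the airlight.
J = np.full((4, 4, 3), 0.5)
t = np.full((4, 4), 0.6)
A = np.array([0.9, 0.9, 0.9])
I = apply_haze_model(J, t, A)        # every value: 0.5*0.6 + 0.9*0.4 = 0.66
```

Haze removal must invert this map with A and t unknown, which is exactly what makes the problem ill-posed.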

The main challenge lies in the ambiguity of the problem, as the number of equations is less than the number of unknowns. Thus, there are infinitely many solutions given only the degraded (hazy) image I. There are two main approaches to resolving this ambiguity: acquiring more knowledge about the scene, or imposing a priori constraints. For instance, some haze removal algorithms capture multiple images of the same scene under different settings to assist the recovery process, such as the algorithms proposed by Narasimhan and Nayar [17, 21], Shwartz et al. [22], Kopf et al. [14], and Feng et al. [9]. However, in practice it is difficult to obtain such extra images with many consumer video cameras.

The second approach is to impose extra constraints using knowledge or assumptions, called priors, as in the algorithms proposed by Tan [23], Fattal [8], He et al. [11], and Ke and Chen [12]; thus, only one image is required in practice. Generally, a prior can be a statistical property, a rule, or a heuristic assumption. The quality of the results is often determined by the extent to which the prior is valid.

Wu et al. [28] have surveyed haze removal algorithms. Among the different methods, single-image haze removal requires no additional imaging hardware and can operate with most consumer video cameras. A comparison of the execution times of these methods, as reported in their respective papers, is shown in Table 1. The methods proposed by He et al. [11] and Ke and Chen [12] are based on the dark channel prior and have lower execution times than the other methods.

Table 1 Execution times of different single image haze removal methods on personal computers

We base our work on the dark channel prior because it is fast and effective in most scenes. However, it may fail in particular cases, such as saturated objects or objects that are inherently similar to the atmospheric light and cast no shadow. Our focus is to design a fast algorithm that yields a real-time haze removal system.

Previous work based on the dark channel prior by Khodary and Aly [13] and El-Hashash et al. [7] developed haze removal for video on the TMS320DM6446 platform and achieved 0.5 and 3 frames per second (fps), respectively, for video frames of \(720\times480\) pixels. The idea of multi-scale resolution was introduced in [7] to expedite the computations. In this paper, we carry the processing across resolution scales and propose a fast reconstruction equation. We also add a clear analysis from a signal processing perspective to justify our algorithm.

This paper is organized as follows: The dark channel prior-based algorithms are reviewed in Sect. 2. In Sect. 3, our algorithm using the dark channel prior and its implementation technique on the target platform are presented. In Sect. 4, experimental results and performance evaluation are demonstrated. Finally, the paper is concluded in Sect. 5.

2 Background

The dark channel prior-based algorithms are reviewed in Sect. 2.1 and the implementation technique for the target platform is presented in Sect. 2.2.

2.1 Dark channel prior-based algorithms

He et al. [11] define the dark channel as:

$$J^{\text{dark}}({\mathbf{x}})=\min_{{\rm c}\in \{{\rm r},{\rm g},{\rm b}\}}(\min_{{\mathbf{y}} \in \varOmega ({\mathbf{x}})} (I^{\rm c}({\mathbf{y}}))),$$
(2)

where \(J^{\text{dark}}\) is the estimated dark channel of I, \(I^{\rm c}\) is a color channel (red, green, or blue) of I, and \(\varOmega({\mathbf{x}})\) is a local patch around the pixel at x. According to He et al. [11], the dark channel is almost zero for haze-free images, as shown in Fig. 1, which gives the dark channels of two images: a hazy one and a haze-free one. In Fig. 1a, b, it can be seen that objects that are clearly visible in I, such as the rocks in the front, have a zero-valued dark channel. The atmospheric light is computed from \(J^{\text{dark}}\) by finding the pixels with the highest 0.1 % of values, as in Eq. (3); these pixels are usually the most haze-opaque. Some haze removal methods estimate the airlight by selecting the pixel with the maximum intensity, which is not valid in many scenes (for example, when an object has a higher intensity than the airlight). The indices of the highest 0.1 % of pixels are stored in m as follows:

$${\mathbf{m}} = \mathop{\arg}\limits_{\tilde{\mathbf{x}}} \max \limits_{0.1\%} ({J^{{\rm dark}}}({\tilde{\mathbf{x}}})).$$
(3)

Then, the sum of the three color channel values in \({\mathbf{I}}({\mathbf{m}})\) is calculated. Finally, the pixel corresponding to the maximum sum is selected as A as follows:

$${\mathbf{A}} = \mathop{\arg}\limits_{\tilde{\mathbf{I}}} \max \left(\sum \limits_{{\rm c} \in \{{\rm r},{\rm g},{\rm b}\} } {{{\tilde{I}^{\rm c}}}} ({\mathbf{m}})\right).$$
(4)
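A minimal sketch of Eqs. (2)–(4), assuming a floating-point RGB image in [0, 1] and the 15 × 15 patch of [11]; the helper names and the use of scipy are our choices for illustration.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(I, patch=15):
    """Eq. (2): per-pixel minimum over color channels, then a patch-wise minimum."""
    return minimum_filter(I.min(axis=2), size=patch)

def estimate_atmospheric_light(I, J_dark, frac=0.001):
    """Eqs. (3)-(4): among the brightest 0.1 % of dark-channel pixels (indices m),
    pick the pixel of I whose color-channel sum is largest."""
    flat = J_dark.ravel()
    n = max(1, int(frac * flat.size))
    m = np.argpartition(flat, -n)[-n:]                   # Eq. (3)
    candidates = I.reshape(-1, 3)[m]
    return candidates[candidates.sum(axis=1).argmax()]   # Eq. (4), a (3,) vector A
```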

Using I and the estimated A, the normalized dark channel is defined by:

$$J^{{{\text{dark}}}}_N({\mathbf{x}})= \min_{{\rm c}\in \{{\rm r},{\rm g},{\rm b}\}} \left(\frac{I^{\rm c}({\mathbf{x}})}{A^{\rm c}}\right),$$
(5)

then the coarse transmission \(\tilde{t}\) is assumed to be:

$$\tilde{t}({\mathbf{x}})=1-(\omega \times \min_{{\mathbf{y}} \in \varOmega ({\mathbf{x}})}(J^{{{\text{dark}}}}_N({\mathbf{y}}))),$$
(6)

where \(\omega\) is the aerial perspective factor, set to 0.95 in He et al. [11]. The transmission map should be object-based, i.e., constant within objects at the same scene depth. There are two approaches to obtaining an object-based transmission map. He et al. [11] assumed the objects to be \(15 \times 15\) patches and then removed the blocking artifacts by an edge-directed smoothing operation (the soft matting method of Levin et al. [15]). Alternatively, Ke and Chen [12] approached the problem starting from a dense, pixel-based field, which results in a noisy, incoherent transmission map that is then smoothed by a moving average filter. Both approaches provide almost the same result, called the refined (enhanced) transmission map t in [11]. The enhanced transmission t and A are used to recover J as:

$${\mathbf{J}}({\mathbf{x}})= \frac{{\mathbf{I}}({\mathbf{x}})-{\mathbf{A}}}{\hbox{max}(t({\mathbf{x}}),t_0)}+{\mathbf{A}},$$
(7)

where t0 is set to 0.1 in He et al. [11] to avoid division by a near-zero value in areas where the transmission map falls below 0.1. As seen from Eq. (1), when \(t \rightarrow 0\), \({\mathbf{I}} \rightarrow {\mathbf{A}}\), and in this case it is very hard to reconstruct J because it is severely attenuated by the small value of t. Hence, in the reconstruction, further amplification in such areas is stopped by setting t0 to 0.1.
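Eqs. (5)–(7) can be sketched as follows, without the soft-matting refinement of [11] (so t here is only the coarse patch-based estimate); names and defaults are illustrative:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def coarse_transmission(I, A, patch=15, omega=0.95):
    """Eqs. (5)-(6): normalized dark channel, then a patch-wise minimum."""
    J_dark_N = (I / A).min(axis=2)                             # Eq. (5)
    return 1.0 - omega * minimum_filter(J_dark_N, size=patch)  # Eq. (6)

def recover_scene(I, A, t, t0=0.1):
    """Eq. (7): clamp t from below at t0 so dense-haze areas are not over-amplified."""
    t3 = np.maximum(t, t0)[..., np.newaxis]
    return (I - A) / t3 + A
```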

Fig. 1 a Hazy image and its dark channel in b; c haze-free image and its dark channel in d

Soft matting requires extensive computation and memory; thus, Ke and Chen [12] calculated the dark channel by simply taking the minimum color channel value at each pixel. They used Eqs. (3) and (4) to estimate A, so Eqs. (2) and (6) in [12] become:

$$J^{{{\text{dark}}}}({\mathbf{x}})=\min_{{\rm c}\in \{{\rm r},{\rm g},{\rm b}\}} (I^{\rm c}({\mathbf{x}})),$$
(8)
$$\tilde{t}({\mathbf{x}})=1-(\omega \times J^{{{\text{dark}}}}_N({\mathbf{x}})).$$
(9)

Instead of soft matting, a smoothing operation is performed on \(\tilde{t}\) with a moving average filter to obtain \(t_s\), reducing the computational cost. Finally, a gamma correction function is applied to the smoothed transmission to avoid over-removal of haze and preserve a natural look for the scene:

$${t}({\mathbf{x}}) = ({t_s}({\mathbf{x}}))^\gamma,$$
(10)

where Ke and Chen [12] set \(\gamma\) to 0.6 to enhance the contrast of the scene. A sample transmission map produced by [12] is shown in Fig. 2b; it is very comparable to that of He et al. [11] in Fig. 2a, but requires less computation and no large memory.
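The pixel-based pipeline of Eqs. (8)–(10) then becomes very short; in this sketch the 5 × 5 window follows the moving average filter of Fig. 4a, and the helper names are ours:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ke_chen_transmission(I, A, omega=0.95, win=5, gamma=0.6):
    """Eqs. (8)-(10): pixel-wise dark channel, moving-average smoothing,
    then gamma correction of the smoothed transmission."""
    J_dark_N = (I / A).min(axis=2)             # Eq. (8), normalized as in Eq. (5)
    t_coarse = 1.0 - omega * J_dark_N          # Eq. (9)
    t_s = uniform_filter(t_coarse, size=win)   # moving average filter h_b
    return t_s ** gamma                        # Eq. (10)
```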

Fig. 2 Transmission map of the image in Fig. 1a using different haze removal methods: a using [11]; b using [12]; c using our algorithm

2.2 Implementation technique for the DM6446

The main cores of the TMS320DM6446 are the ARM, the DSP, and the VICP. The ARM9 GPP core at 300 MHz handles tasks related to the peripherals and the control application and runs embedded Linux as its operating system. The C64x+ DSP core at 600 MHz efficiently handles vector processing operations such as convolution. The VICP hardware accelerator at 405 MHz is effective in enhancing the DSP performance; it takes over the execution of various computationally intensive tasks [26] of a multi-input multi-output nature, such as color conversions. The DSP core runs the desired algorithm under DSP/BIOS [24], a real-time operating system (RTOS) provided by TI. The ARM and DSP cores communicate with each other via DSP/BIOS Link [6] over a shared internal bus, as shown in Fig. 3. Applications are implemented on this heterogeneous system using the Codec Engine framework [4], a set of APIs used to instantiate and run algorithms compliant with the eXpressDSP algorithm interoperability standard (XDAIS) [25]. To support digital media encoders, decoders, and codec extensions for video, imaging, speech, and audio (VISA), codec classes provide the XDAIS-DM interface [29], which specifies APIs for digital media codecs. For instance, the DSP and the VICP can run the haze removal algorithm, which must conform to XDAIS-DM before it can be instantiated and executed by the ARM through the VISA APIs. The block diagram of the DM6446 is detailed in Fig. 3.

Fig. 3 TMS320DM6446 functional block diagram [26]

3 Haze removal system

We describe our haze removal algorithm in Sect. 3.1, while in Sect. 3.2, we provide our software architecture for the algorithmic tasks distributed among the different cores.

3.1 Implemented haze removal algorithm

We implemented the Ke and Chen [12] algorithm on the DM6446; the execution time of each task is shown in Table 2. Equations (9) and (10) are the bottleneck tasks and consume 93 % of the total time. The DSP core TMS320C64x+ has a fixed-point architecture, so floating-point operations generally deteriorate the system performance in frames per second (fps). Hence, we focus on redesigning Eqs. (9) and (10) with our new method, shown later in Eq. (15). The implemented algorithm consists of two components: the first is a new technique that replaces Eq. (9) and the moving-average smoothing operation of Ke and Chen [12]; the second is a new approach to recovering the scene radiance.

Table 2 Time profiling for Ke and Chen algorithm [12]

Regarding the first component, we need to reduce the execution time consumed by Eq. (9) and to find a good smoothing filter for \(\tilde{t}\) to replace the moving average filter used by Ke and Chen [12]; specifically, a filter with minimal ripples in the stop band. Several candidate filters with many taps could be used. However, the vectorized multiply-and-add instruction of the DSP cannot then be used efficiently, as the number of coefficients that can be loaded is limited. Additionally, filtering with moving average or bilinear filters has an optimized hardware implementation on the DSP of the DM6446. We use the separable bilinear filter twice. The first use is as an anti-aliasing filter \(h_a({\mathbf{x}})\) in a downsampling of \(J^{{{\text{dark}}}}_N\) by a factor d in each dimension to obtain \(J^{{{\text{dark}}}\downarrow }_N\). The second use is as a bilinear interpolation filter \(h_i({\mathbf{x}})\) in a subsequent upsampling of \(J^{{{\text{dark}}}\downarrow }_N\) by a factor d in each dimension to obtain the smoothed normalized dark channel \(J^{{{\text{dark}}}}_{Ns}\). Although this idea was presented in [7], we give a detailed justification in this paper. Without loss of generality, we can use a simple filter such as the bilinear one for both \(h_a\) and \(h_i\).
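A sketch of this implicit smoothing with separable tent (bilinear) kernels for both \(h_a\) and \(h_i\); the factor d = 4 is only an example, not necessarily the value used on the DM6446:

```python
import numpy as np
from scipy.ndimage import convolve1d, zoom

def bilinear_kernel(d):
    """Separable tent kernel of support 2d-1, normalized to unit sum."""
    k = np.concatenate([np.arange(1, d + 1), np.arange(d - 1, 0, -1)]).astype(float)
    return k / k.sum()

def implicit_smooth(J_dark_N, d=4):
    """Anti-alias with h_a, decimate by d, then interpolate back by d with h_i."""
    k = bilinear_kernel(d)
    low = convolve1d(convolve1d(J_dark_N, k, axis=0), k, axis=1)[::d, ::d]
    return zoom(low, d, order=1)     # order=1 is bilinear interpolation
```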

To compare the smoothing by the moving average filter \(h_b({\mathbf{x}})\) of Ke and Chen [12] with our method of downsampling followed by upsampling, we cast our analysis in the frequency domain, where \(T_1({\mathbf{u}})\) and \(T_2({\mathbf{u}})\) are the discrete Fourier transforms of the smoothed transmission maps of the method by Ke and Chen [12] and of ours, respectively. They can be written as:

$${T_1}({\mathbf{u}})= ({H_b} \tilde{T})({\mathbf{u}}),$$
(11)

where \(\tilde{T}\) is the Fourier transform of the coarse transmission map, which is smoothed by a moving average filter whose frequency response is \(H_b(\textbf{u})\):

$${H_b}({\mathbf{u}})= \frac{{\sin (\pi ud)}}{{d\sin (\pi u)}}\frac{{\sin (\pi vd)}}{{d\sin (\pi v)}},$$
(12)

where the vector \({\mathbf{u}} ={[u \space v]}^{\rm T}\) holds the horizontal and vertical frequencies, and d is the size of the moving average, which determines the cut-off frequency \({\mathbf{u}}_c=[{\pm \frac{1}{2d}} \space {\pm\frac{1}{2d}}]^{\rm T}\). In our algorithm, as will be described later, we do not need to calculate the transmission map explicitly. Our implicit smoothing of \(\tilde{t}\) by downsampling followed by upsampling by a factor d in each dimension can be written as:

$$\begin{aligned} {T_2}({\mathbf{u}}) &= {H_i}({\mathbf{u}})\sum \limits_{k = 0}^{d^2-1} {({H_a}\tilde{T})({\mathbf{u}} + {\mathbf{s}}_k)} \\ &= ({H_a}{H_i}\tilde{T})({\mathbf{u}}) + {H_i}({\mathbf{u}})\sum \limits_{k = 1}^{d^2-1} {({H_a}\tilde{T})({\mathbf{u}} + {\mathbf{s}}_k)}, \end{aligned}$$
(13)

where the \({\mathbf{s}}_k\) are the coset representatives of the high-resolution reciprocal lattice in the lower-resolution reciprocal lattice, as detailed by Aly and Dubois [1]. The terms for \(k=1,2,\ldots ,d^2-1\) represent the aliasing due to downsampling in the frequency domain. The first term is an implicit smoothing filter whose frequency response is \(({H_a}{H_i})({\mathbf{u}})\), while the second comprises the aliasing components.

Since \(H_b\), \(H_a\), and \(H_i\) are approximations of low-pass filters with cut-off frequency at \({\mathbf{u}}_c=[{\pm \frac{1}{2d}} \space {\pm\frac{1}{2d}}]^{\rm T}\), the product \(H_a H_i({\mathbf{u}})\) in Eq. (13) is an approximation of a low-pass filter as well. Illustrations using moving average and bilinear filters for \(H_a, H_i\) are shown in Fig. 4a, b, and the product \(H_a H_i({\mathbf{u}})\) is shown in Fig. 5a, b for the moving average and bilinear prototypes of \(H_a\) and \(H_i\). It can be seen that in the pass band \({\mathbf{u}}_p \in [{{-}\frac{1}{2d}},{\frac{1}{2d}}]^2\), the frequency response of \(H_aH_i({\mathbf{u}})\) is similar to that of a smoothing filter such as \(H_b ({\mathbf{u}})\). In the stop band \({\mathbf{u}}_s \in [{{-}\frac{1}{2}},{\frac{1}{2}}]^2-{\mathbf{u}}_p\), \(H_a H_i({\mathbf{u}})\) has no ripples, or strongly attenuated ones, compared to \(H_b({\mathbf{u}})\), for both the moving average and the bilinear filters. This makes the implicit filter \(H_a H_i({\mathbf{u}})\) a better smoothing filter than \(H_b({\mathbf{u}})\). For the sake of comparison with the other methods only, we apply Eq. (9); our implicit transmission map, shown in Fig. 2c, appears smooth in regions of the same scene depth, without ringing around edges.
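This stop-band comparison can be checked numerically along one frequency axis. The sketch below (d = 5, as in Fig. 4) evaluates \(H_b\) together with the products \(H_a H_i\) for moving average and bilinear prototypes:

```python
import numpy as np

def freq_response(h, n=512):
    """Magnitude of the 1-D DTFT of kernel h at n frequencies in [0, 0.5]."""
    w = np.linspace(0.0, 0.5, n)
    k = np.arange(len(h)) - (len(h) - 1) / 2.0   # center to remove linear phase
    return np.abs(np.exp(-2j * np.pi * np.outer(w, k)) @ h)

d = 5
box = np.ones(d) / d                                   # moving average prototype
tent = np.concatenate([np.arange(1, d + 1), np.arange(d - 1, 0, -1)]) / d**2

H_b = freq_response(box)              # Eq. (12), one axis
H_ai_box = freq_response(box) ** 2    # H_a * H_i, moving average prototypes (Fig. 5a)
H_ai_tent = freq_response(tent) ** 2  # H_a * H_i, bilinear prototypes (Fig. 5b)
# Beyond the cut-off 1/(2d) = 0.1, the side lobes of H_ai_box are the square of
# those of H_b (and H_ai_tent the fourth power), hence strongly attenuated.
```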

Fig. 4 Frequency response of \(H_b\): a the 5 × 5 moving average filter in [12]; b the corresponding bilinear filter

Fig. 5 Frequency response of the implicit filter \(H_a H_i\): a using the moving average filter for both \(H_a\) and \(H_i\); b using the bilinear filter for both \(H_a\) and \(H_i\)

The second component of our algorithm deals with the reconstruction in Eq. (7) and the preceding steps in Eqs. (9) and (10). These three equations involve massive floating-point computation, so we designed a new method to recover the scene. We substitute the cumulative effect of \(\frac{1}{{(1-\omega h_{b}\ast J^{{{\text{dark}}}}_{N})}^\gamma }\) in Ke and Chen [12], where \(\ast\) denotes convolution, with the first-order polynomial approximation \(1+KJ^{{{\text{dark}}}}_{Ns}\) as follows:

$${\mathbf{J}}({\mathbf{x}})= ({\mathbf{I}} ({\mathbf{x}})-{\mathbf{A}})(1+KJ^{{{\text{dark}}}}_{Ns}({\mathbf{x}})) + {\mathbf{A}},$$
(14)

where K is a constant. Expanding the product and canceling A, the scene radiance J is recovered by:

$${\mathbf{J}} ({\mathbf{x}})= {\mathbf{I}} ({\mathbf{x}}) + ({\mathbf{I}} ({\mathbf{x}})-{\mathbf{A}})(KJ^{{{\text{dark}}}}_{Ns}({\mathbf{x}})).$$
(15)

This is illustrated in Fig. 6a, where we plot the two functions \(\frac{1}{{(1-\omega h_{b}\ast J^{{{\text{dark}}}}_{N})}^\gamma }\) and \(1+KJ^{{{\text{dark}}}}_{Ns}\). We empirically found the best estimate of the factor K in Eq. (15) to be 1.5. We focus on the interval \(J^{{{\text{dark}}}}_{Ns}\in [0.2, 0.6]\), as this interval contains the majority of the pixel values in \(J^{{{\text{dark}}}}_{Ns}\), as shown by the histograms in Fig. 6b, c for two hazy images (Forest and Train).

Fig. 6 a Scene reconstruction functions \(\frac{1}{{(1-\omega h_b \ast J^{{{\text{dark}}}}_{N})}^\gamma }\) and \(1+KJ^{{{\text{dark}}}}_{Ns}\) with \(K=1.5\); b, c histograms of \(J^{{{\text{dark}}}}_{Ns}\) for the Forest and Train hazy images, respectively

This reconstruction equation reduces the computational cost of the algorithm: there is no need to calculate the transmission map or the gamma correction function, and the scene is recovered directly using \(J^{{{\text{dark}}}}_{Ns}\).

Thus, our haze removal algorithm is as follows:

1. Compute the pixel-wise dark channel \(J^{\text{dark}}\) by Eq. (8) and estimate the atmospheric light A by Eqs. (3) and (4).
2. Compute the normalized dark channel \(J^{{{\text{dark}}}}_N\) by Eq. (5).
3. Downsample \(J^{{{\text{dark}}}}_N\) by a factor d in each dimension with the bilinear anti-aliasing filter \(h_a\), then upsample by d with the bilinear interpolation filter \(h_i\) to obtain \(J^{{{\text{dark}}}}_{Ns}\).
4. Recover the scene radiance J directly by Eq. (15) with \(K = 1.5\).
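Putting these steps together, a compact per-frame sketch (reusing estimate_atmospheric_light and implicit_smooth from the earlier sketches; d = 4 and the frame layout are illustrative assumptions):

```python
import numpy as np

def remove_haze(I, d=4, K=1.5):
    """One RGB frame in [0, 1] -> dehazed frame, following the steps above."""
    J_dark = I.min(axis=2)                                 # Eq. (8)
    A = estimate_atmospheric_light(I, J_dark)              # Eqs. (3)-(4)
    J_dark_N = (I / A).min(axis=2)                         # Eq. (5)
    J_dark_Ns = implicit_smooth(J_dark_N, d)               # h_a down, h_i up
    J = I + (I - A) * (K * J_dark_Ns)[..., np.newaxis]     # Eq. (15)
    return np.clip(J, 0.0, 1.0)
```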

3.2 Implemented software architecture on TMS320DM6446

We analyzed the tasks of our haze removal algorithm and matched each with a suitable core (ARM, DSP, or VICP) based on the nature of each core. For instance, the VICP has a multi-input multi-output (MIMO) nature, so the color conversions \(RGB \longleftrightarrow Y C_{b} C_{r}\) are assigned to it. This color conversion is necessary because the NTSC video data is delivered by the TVP5146 decoder in YCbCr 4:2:2 format, while our haze removal algorithm, based on the dark channel, operates on RGB data. The DSP, on the other hand, has a vector processing nature, so convolution tasks such as downsampling are directed to it. The complete list of tasks and core assignments is shown in Table 3.
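As an illustration of the conversion assigned to the VICP, the sketch below implements one common BT.601 studio-swing YCbCr → RGB mapping with the 4:2:2 chroma replicated horizontally; it is not the VICP library code, and the exact coefficients used on the device may differ.

```python
import numpy as np

def ycbcr422_to_rgb(Y, Cb, Cr):
    """Y: (H, W); Cb, Cr: (H, W/2) per 4:2:2. Returns (H, W, 3) uint8 RGB."""
    cb = np.repeat(Cb, 2, axis=1).astype(float) - 128.0   # replicate chroma samples
    cr = np.repeat(Cr, 2, axis=1).astype(float) - 128.0
    y = 1.164 * (Y.astype(float) - 16.0)
    r = y + 1.596 * cr
    g = y - 0.813 * cr - 0.392 * cb
    b = y + 2.017 * cb
    return np.clip(np.stack([r, g, b], axis=2), 0, 255).astype(np.uint8)
```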

Table 3 Distribution of our algorithmic tasks among different cores

The implemented software architecture is as follows. An application runs on the ARM9 core with two threads. The first is the control thread, which deals with the peripherals through the input/output drivers. The second is the haze removal thread, which creates, processes, controls, and deletes the haze removal algorithm as a codec through the VISA interface [3]. The DSP core runs the haze removal codec, implemented as XDAIS-DM [4], and a Codec Engine server [5] configured to make it available to the ARM application. The DSP core initializes and controls the VICP using the VICP signal processing library [27]. Furthermore, a fixed-point data type is used instead of floating point without affecting the accuracy, as we still use four bytes for data representation, and the division operations are converted into multiplications as stated by Bovik [2]. Moreover, shortcut arithmetic operations are used to minimize memory movements. The implemented software architecture, with the algorithmic blocks distributed among the multiple cores, is detailed in Fig. 7.
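The division-to-multiplication conversion can be illustrated in Q16 fixed point; this is a sketch of the idea only, not the C64x+ intrinsics actually used:

```python
Q = 16  # Q16: 16 fractional bits

def reciprocal_q16(d):
    """Precompute floor(2^Q / d) once, so each later division is a multiply."""
    return (1 << Q) // d

def divide_fixed(x, recip):
    """x / d realized as (x * recip) >> Q using integer arithmetic only."""
    return (x * recip) >> Q

r15 = reciprocal_q16(15)          # 4369
print(divide_fixed(720, r15))     # 48, matching 720 / 15
```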

Fig. 7 The implemented haze removal software architecture on the target heterogeneous multi-cores

4 Experimental results and performance evaluation

Since the implementation of the method by He et al. [11] on the TMS320DM6446 is not feasible due to the requirements of soft matting, and in order to maintain a benchmark for comparing both quality and run time, we present two sets of results. First, in Sect. 4.1, we give results for single images on a personal computer (PC) compared with the methods by He et al. [11] and Ke and Chen [12]. Then, in Sect. 4.2, we present our video execution time compared with the methods by Ke and Chen [12], Khodary and Aly [13], and El-Hashash et al. [7] on the TMS320DM6446.

4.1 Benchmark results on personal computer

The methods by He et al. [11] and Ke and Chen [12], as well as our algorithm, were implemented in MATLAB 8.3 on a Core i5 PC at \(\text{2.27 GHz}\) with 4 GB of RAM, using 720 × 480 images. Results for samples of the test data set are shown in Fig. 8. The execution times for the sample images are given in Table 4. The Sobel edge detection for the sample hazy images and the images with haze removed is shown in Fig. 9. The reconstructed images contain more edges than the hazy ones.

Fig. 8 First column: hazy images a Forest, e Train, i Mountain, m Toys; second column: b, f, j, n haze removal using He et al. [11]; third column: c, g, k, o haze removal using Ke and Chen [12]; fourth column: d, h, l, p haze removal using our algorithm

Fig. 9 First row: a, b, c, d Sobel edge detection for the hazy images; second row: e, f, g, h Sobel edge detection for the images with haze removed using our algorithm

Table 4 Execution times in seconds for images in Fig. 8 all of size 720 × 480 using different haze removal algorithms on personal computer (best value in bold)

Our algorithm achieves the lowest execution time compared to the methods in [11, 12] on the PC. Moreover, our algorithm provides better haze removal visual results in most scenes compared to the other methods.

4.2 Performance of the implemented algorithm under embedded environment

We tested the methods of Khodary and Aly [13] and El-Hashash et al. [7], and additionally implemented the method of Ke and Chen [12] on the TMS320DM6446 to provide comparisons. The input video stream on the target platform was NTSC for all methods. We deployed the system onboard a vehicle and test-drove it in hazy weather; sample results are shown in Fig. 10. Areas of heavy haze are reconstructed by our algorithm without the spurious colors seen in [12, 13]. The implemented system provides a 16-fold increase in frame rate compared with [12, 13] and approximately a threefold increase compared with [7], as shown in Table 5, achieving a speed of 8 fps.

Fig. 10 First row: hazy video frames; second row: haze removal using Ke and Chen [12]; third row: haze removal using Khodary and Aly [13]; fourth row: haze removal using El-Hashash et al. [7]; fifth row: haze removal using our system onboard

Table 5 Frame rate (fps) of different haze removal systems on embedded platform (best value in bold)

5 Conclusions

In this paper, a video haze removal system on heterogeneous multi-cores has been implemented. The system uses a newly designed haze removal algorithm based on the dark channel prior. Our algorithm has the lowest execution time compared with the related work in the literature while maintaining the haze removal image quality. The algorithm was developed on the DM6446 DMSoC, with the algorithmic blocks (tasks) distributed among the ARM, DSP, and VICP cores. The time-consuming bottleneck tasks were analyzed and redesigned, which provided 8 fps at full TV resolution (720 × 480). This was achieved by the two components of our algorithm: the smoothing operation at a lower resolution scale and the new reconstruction formula.