1 Introduction

Haze, caused by airborne particulate matter, absorbs the light reflected from the scene and scatters atmospheric light toward the camera [1]. As a result, the visibility, contrast and vividness of the scene are dramatically degraded, which poses a major threat to subsequent high-level computer vision tasks, such as object detection and recognition. Hence, it is important to design image dehazing methods that improve the environmental adaptability of visual systems.

In the literature, various methods have been proposed to handle the image dehazing problem. They can be roughly classified into two categories: multiple image dehazing and single image dehazing. Haze removal from a single image has gained dominant popularity, since it is more practical in realistic settings, and this paper focuses on that problem. In computer vision, the formation of a hazy image is usually described by the atmospheric scattering model [2]:

$$\begin{aligned} \mathbf {I}(x)=\mathbf {J}(x)t(x)+\mathbf {A}(1-t(x)), \end{aligned}$$
(1)

where \(\mathbf {I}\) is the observed hazy image intensity defined on the image domain \(\varOmega \subset \mathbf {R}^2\), \(\mathbf {J}\) is the scene radiance to be recovered, \(\mathbf {A}\) is the atmospheric light, assumed constant over the whole image, t is the medium transmission and x denotes the image coordinates. The transmission describes the portion of the light that reaches the camera without being scattered. In a homogeneous atmosphere, the transmission t can be further expressed as \(t(x)= \mathrm{e}^{-\eta d(x)}\), where d(x) is the distance from the scene point to the camera and \(\eta \) is the scattering coefficient of the atmosphere. The task of dehazing is to estimate \(\mathbf {J}\) (with \(\mathbf {A}\) and t as by-products) from the input image \(\mathbf {I}\), which is severely ill-posed. Haze removal is therefore a challenging problem, especially from a single image. Various priors have been proposed to capture deterministic (e.g., physical rules and variational energy) or statistical properties of hazy images. For instance, Tan et al. [3] proposed a dehazing model based on the observation that clear or enhanced images usually have higher contrast than hazy ones. After that, Fattal [4] estimated the transmission through a refined image formation model, in which the cost functions of surface shading and transmission are considered statistically uncorrelated. He et al. [5] leveraged the empirical observation that local patches in haze-free images often contain some low albedo values in at least one color channel. Inspired by this observation, they proposed the dark channel prior (DCP) to estimate the haze thickness. An initial estimate of the transmission t (denoted \(t_0\)) follows from He et al.'s model:

$$\begin{aligned} t_0(x)=1-\mu \min _{c\in \{R,G,B\}}\Bigg (\min _{y \in N(x)}\frac{I^c(y)}{A^c}\Bigg ), \end{aligned}$$

where \(\mu \) is used to keep some depth information of a natural image and is commonly set to \(\mu =0.95\), \(I^c\) is a color channel of the hazy image \(\mathbf {I}\), and the atmospheric light \(A^c\) can be estimated from the brightest pixels in the dark channel (c denotes a color channel). This coarse transmission map is computed locally and thus often needs to be refined. He et al. employed the soft-matting technique for this refinement. As soft matting is computationally expensive, Tarel et al. [6] proposed a fast restoration algorithm that replaces it with median filtering and its variants. Nowadays, many imaging problems are solved by variational methods. Meng et al. [7] reformulated DCP as a boundary constraint and incorporated a contextual regularization to model the transmission of hazy images. They found an optimal transmission function t(x) by minimizing the following objective function:

$$\begin{aligned} \frac{\lambda }{2}\Arrowvert t-t_0 \Arrowvert _2^2+\sum _{j\in E}\Arrowvert W_j\circ (D_j \otimes t) \Arrowvert _1, \end{aligned}$$

where \(\lambda \) is a regularization parameter balancing the two terms, E is an index set, \(\circ \) represents element-wise multiplication, \(\otimes \) stands for convolution, \(D_j\) is a first-order differential operator and \(W_j\,(j\in E)\) is a weighting matrix. Later, Zhu et al. [8] created a linear model to estimate the scene depth under a novel color attenuation prior and learned the model parameters in HSV space. Although this approach can produce visually pleasing results, it often fails to remove dense haze in remote scenes. Total variation (TV) and its adaptive variants, such as weighted TV and vectorial TV, have proved effective for edge preservation and have been applied to many image processing tasks, such as denoising, restoration, deblurring and filtering. Fang et al. [9] proposed a variational model with weighted vectorial total variation (WVTV) to dehaze and denoise simultaneously. The minimization problem is given as follows:

$$\begin{aligned} \begin{aligned} \min _{\mathbf {g},d} E(\mathbf {g},d)&= \Arrowvert h\nabla \mathbf {g}\Arrowvert _{\mathrm{TV}} +\lambda \Arrowvert \nabla d\Arrowvert _{\mathrm{TV}}\\&\quad +\frac{1}{2}\Arrowvert \mathbf {g}-\mathbf {f}-\mathbf {d}\Arrowvert ^2 +\frac{\gamma }{2}\Arrowvert d-d_0\Arrowvert ^2 , \end{aligned} \end{aligned}$$

where \(d=-\frac{1}{\eta }\log {t(x)}\), \(d_0=-\frac{1}{\eta }\log {t_0(x)}\), \(\mathbf {g}=\frac{1}{\eta }\log (\mathbf {A}-\mathbf {J})\) and \(\mathbf {f}=\frac{1}{\eta }\log (\mathbf {A}-\mathbf {I})\). The TV approach, however, has some shortcomings, most notably the staircasing phenomenon: if a natural image contains not only flat but also slanted regions, TV-based dehazing tends to produce piecewise constant (staircased) results. The work in [10] addressed this by refining the transmission with a global method based on image-guided total generalized variation (TGV) regularization, which suppresses such artifacts in hazy images:

$$\begin{aligned}&\min _{t,\omega } \left\{ \alpha _1 \int |D^{1/2}(\nabla t-\omega )|{\mathrm{d}}x\right. \\&\quad \left. +\,\alpha _0\int |\nabla \omega |{\mathrm{d}}x+\int |t-t_0|{\mathrm{d}}x\right\} , \end{aligned}$$

where \(D^{1/2}\) is the anisotropic diffusion tensor defined as:

$$\begin{aligned} D^{1/2} = \exp (-\gamma |\nabla I|^\beta )nn^T + n^\perp n^{\perp T}, \end{aligned}$$

where n is the direction of the gradient of the guided image \(n= \frac{\nabla I}{|\nabla I|}\) and \(n^\perp \) is the perpendicular direction, \(\gamma \), \(\beta \) are parameters to adjust the sharpness and magnitude of the tensor and \(\omega \) is an auxiliary variable. After the transmission map was refined, they recovered the scene radiance \(\mathbf {J}\) by gradient residual minimization (GRM):

$$\begin{aligned} \min _{\mathbf {J}} \left\{ \frac{1}{2}\int \Arrowvert \mathbf {J}t-(\mathbf {I} -\mathbf {A}+\mathbf {A}t)\Arrowvert ^2_2{\mathrm{d}}x{+}\eta \int \Arrowvert \nabla \mathbf {J}{-}\nabla \mathbf {I}\Arrowvert _0 {\mathrm{d}}x\right\} , \end{aligned}$$

where the \(l_0\) norm counts the number of nonzero elements and \(\eta \) is a weighting parameter. However, for very faraway objects, this method cannot significantly increase the contrast, owing to the ambiguity between artifacts and true objects covered by very thick haze; it may even mistakenly remove distant objects and produce over-smoothed results. Lately, as convolutional neural networks (CNNs) have witnessed prevailing success in computer vision tasks, they have been introduced to image dehazing as well. Cai et al. [11] proposed DehazeNet, a trainable model that estimates the transmission matrix from a hazy image. Ren et al. [12] further exploited a multiscale CNN (MSCNN) that first generates a coarse-scale transmission matrix and then refines it.
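To make the image-guided tensor \(D^{1/2}\) of [10] concrete, the following numpy sketch evaluates it per pixel for a grayscale guide image. The parameter values, the small \(\epsilon \) safeguard against zero gradients and the ramp test image are illustrative assumptions, not taken from [10]:

```python
import numpy as np

def diffusion_tensor(I, gamma=2.0, beta=1.0, eps=1e-8):
    """Per-pixel D^{1/2} = exp(-gamma |grad I|^beta) n n^T + n_perp n_perp^T,
    where n is the normalized gradient of the guide image I and n_perp is
    its perpendicular. eps guards against division by zero in flat regions."""
    gy, gx = np.gradient(I)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    nx, ny = gx / (mag + eps), gy / (mag + eps)
    w = np.exp(-gamma * mag ** beta)          # damps diffusion across strong edges
    D = np.empty(I.shape + (2, 2))
    D[..., 0, 0] = w * nx * nx + ny * ny      # n n^T weighted by w ...
    D[..., 1, 1] = w * ny * ny + nx * nx      # ... plus n_perp n_perp^T
    D[..., 0, 1] = D[..., 1, 0] = (w - 1.0) * nx * ny
    return D

# On a horizontal ramp the gradient is (1, 0): diffusion across the edge is
# damped to exp(-gamma), while diffusion along the edge keeps full strength 1.
I = np.tile(np.arange(6.0), (6, 1))
D = diffusion_tensor(I)
```

The anisotropy is visible directly: the tensor shrinks smoothing across strong edges of the guide image and leaves smoothing along them untouched.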

In this paper, we propose a novel variational model with double TGV regularizations on \(\mathbf {J}(x)\) and t(x) for image dehazing and prove the existence and uniqueness of solutions of the proposed model. TGV regularization suppresses artifacts in hazy images, and unlike the haze estimation step in [10], we do not need to estimate the transmission in advance. We develop a primal–dual algorithm that computes optimal solutions by alternately updating the transmission map t(x) and the scene radiance \(\mathbf {J}(x)\) step by step, so that the two variables influence each other throughout the computation.

The rest of our paper is organized as follows. In Sect. 2, we deduce the proposed model and prove the existence and uniqueness of the solution for the model. In Sect. 3, we present the primal–dual algorithm to solve our model. In Sect. 4, we give some experimental results on natural images and illustrate the superior performance of our model. In Sect. 5, we provide concluding remarks.

2 The Proposed Model

In this section, we introduce our variational model for recovering the haze-free image without estimating the refined transmission in advance.

Equation (1) can be rewritten as

$$\begin{aligned} A_{\mathrm{c}}-I_{\mathrm{c}}(x)=t(x)(A_{\mathrm{c}}-J_{\mathrm{c}}(x) ), \end{aligned}$$
(2)

where c denotes the color channel. In order to handle the product expression in (2), we convert it into the logarithmic domain, i.e.,

$$\begin{aligned} \log (A_{\mathrm{c}}-I_{\mathrm{c}}(x))=\log {t(x)}+\log (A_{\mathrm{c}}-J_{\mathrm{c}}(x)). \end{aligned}$$

As t is an exponential function of the depth map d, we have \(d(x)=-\frac{1}{\eta }\log {t(x)},\) where again \(\eta \) is the scattering coefficient; for simplicity, we set it to 1 in our experiments. We further set

$$\begin{aligned} f_{\mathrm{c}}(x)=\frac{1}{\eta }\log (A_{\mathrm{c}}-I_{\mathrm{c}}(x)),~~ g_{\mathrm{c}}(x)=\frac{1}{\eta }\log (A_{\mathrm{c}}-J_{\mathrm{c}}(x)). \end{aligned}$$

Then, Eq. (2) is equivalent to

$$\begin{aligned} \mathbf {g}(x)=\mathbf {f}(x)+\mathbf {d}(x),~~~~x\in \varOmega \end{aligned}$$

where \(\mathbf {g}=\{g_R, g_G, g_B\}, \mathbf {f}=\{f_R, f_G, f_B\}, \mathbf {d}(x)=\{d, d, d\}.\)
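A small numpy sketch (with synthetic values and \(\eta =1\); the atmospheric light and radiance below are arbitrary illustrations) verifying that the scattering model turns into the additive relation \(\mathbf {g}=\mathbf {f}+\mathbf {d}\) in the log domain:

```python
import numpy as np

def to_log_domain(I, A, eta=1.0):
    """f_c = (1/eta) * log(A_c - I_c); assumes I < A componentwise."""
    return np.log(A - I) / eta

# Synthesize a hazy image from the scattering model I = J t + A (1 - t).
A = np.array([0.8, 0.8, 0.8])          # atmospheric light (illustrative)
J = np.full((3, 3, 3), 0.3)            # scene radiance (illustrative)
t = 0.5                                # constant transmission for the check
I = J * t + A * (1.0 - t)

f = to_log_domain(I, A)
g = to_log_domain(J, A)
d = -np.log(t)                         # d = -(1/eta) log t
```

Since \(A-I = t\,(A-J)\), taking logarithms gives \(f = -d + g\), i.e., \(g = f + d\), which the arrays above satisfy exactly.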

In order to deal with color images, it is convenient to have a notion of TGV for vector-valued images \(\mathbf {g}:\varOmega \subset \mathbf {R}^2 \rightarrow \mathbf {R}^L \) for some \(L\ge 1\). Here, \(L=3\) for the RGB color space. A vector-valued version of TGV\(^k_\alpha (\mathbf {g})\) can then be defined as

$$\begin{aligned} {\mathrm{TGV}}^k_\alpha (\mathbf {g})= & {} \sup ~ \bigg \{\int _{\varOmega } \sum _{l=1}^3 \mathbf {g}_l ~({\mathrm{div}}^k \mathbf {v}_l)~ {\mathrm{d}}x ~|~\mathbf {v}\\&\in \mathscr {C}^k_{\mathrm{c}}(\varOmega ,{\mathrm{Sym}}^k(\mathbf {R}^2)^3), \\&\quad \Arrowvert {\mathrm{div}}^\kappa \mathbf {v} \Arrowvert _{\infty }\le \alpha _\kappa ,\kappa =0,\ldots ,k-1 \bigg \}, \end{aligned}$$

where \(\mathbf {g}_l\) is a color channel of the image \(\mathbf {g}\), \(\mathbf {v}_l\) is a component of \(\mathbf {v}\), \({\mathrm{Sym}}^k(\mathbf {R}^2)=\{\xi : \underbrace{ \mathbf {R}^2 \times \cdots \times \mathbf {R}^2 }_{k ~times}\rightarrow \mathbf {R}|~\xi ~ {\mathrm{multilinear}}~ {\mathrm{and }} ~{\mathrm{symmetric}}\}\) denotes the space of symmetric tensors on \(\mathbf {R}^2\) (for \(k = 1\), \({\mathrm{Sym}}^1(\mathbf {R}^2) = \mathbf {R}^2\); and for \(k = 2\), \({\mathrm{Sym}}^2(\mathbf {R}^2) = S^{2\times 2}\)) and \(\alpha _\kappa \) are fixed positive parameters.

The vector-valued space \({\mathrm{BGV}}^k_\alpha (\varOmega ,\mathbf {R}^3)\) is defined as follows:

$$\begin{aligned} {\mathrm{BGV}}^k_\alpha (\varOmega ,\mathbf {R}^3)=\{\mathbf {g}\in L^1(\varOmega ,\mathbf {R}^3)|{\mathrm{TGV}}^k_\alpha (\mathbf {g})< \infty \}. \end{aligned}$$

The scalar version \({\mathrm{TGV}}^k_\beta \) for d is defined in the same way as the vector-valued \({\mathrm{TGV}}^k_\alpha \) with \(L=1\), and the corresponding scalar space is denoted \({\mathrm{BGV}}^k_\beta (\varOmega )\).

For \(k = 1\) and \(\alpha _0 =1\), \({\mathrm{TGV}}^k_\alpha \) coincides with TV and the space \({\mathrm{BGV}}^k_\alpha \) coincides with BV. Indeed, solutions of variational problems with TV regularization admit many desirable properties, most notably sharp edges. Unfortunately, one can also observe typical artifacts associated with TV regularization, the most prominent being the staircasing and blocky effects. These are side effects of the model assumption that an image is piecewise constant. Natural images are, however, often piecewise smooth, due to shading for instance. Compared with conventional TV, total generalized variation (TGV) prefers piecewise smooth images. This is also a desired property for the transmission, because a slanted plane (e.g., a road or bridge) has a transmission that varies smoothly with depth. Consequently, we use TGV regularization terms for both the hazy image and its transmission map.
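The difference between the two penalties can be seen in a toy 1-D check (signals are synthetic): a slanted ramp and a staircase of the same range have identical TV cost, so the data term alone decides and stairs appear, whereas the second-order differences that TGV² penalizes vanish on the ramp only.

```python
import numpy as np

ramp = np.linspace(0.0, 1.0, 11)                # affine (slanted) signal
stair = np.repeat([0.0, 0.5, 1.0], [4, 3, 4])   # staircased approximation

tv_ramp = np.abs(np.diff(ramp)).sum()           # first-order (TV-like) cost
tv_stair = np.abs(np.diff(stair)).sum()         # identical to the ramp's cost
d2_ramp = np.abs(np.diff(ramp, n=2)).sum()      # second differences: 0 on affine data
d2_stair = np.abs(np.diff(stair, n=2)).sum()    # positive on the staircase
```

Because TV cannot distinguish the ramp from the staircase, TV-regularized reconstructions of slanted regions drift toward stairs; a second-order term breaks the tie in favor of the smooth ramp.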

We thus make the following assumptions:

  • We suppose functions d(x) and \(\mathbf {g}(x)\) are piecewise smooth in \(\varOmega \). Thus, \({\mathrm{TGV}}_\beta ^k(d)\) and \({\mathrm{TGV}}_\alpha ^k(\mathbf {g})\) can be set as the regularization terms of d and \(\mathbf {g}\).

  • The sum of \(\mathbf {d}(x)\) and \(\mathbf {f}(x)\) is close to \(\mathbf {g}(x)\), and d is close to the coarse estimate \(d_0\), in the sense of the squared norm. Thus, the fidelity terms are given by \(\int _\varOmega (\mathbf {g}-\mathbf {d}-\mathbf {f})^2 {\mathrm{d}}x\) and \(\int _\varOmega (d-d_0)^2 {\mathrm{d}}x\).

Combining the above two assumptions, we propose the following energy functional for recovering a haze-free image:

$$\begin{aligned} \begin{aligned} E(\mathbf {g},d)&={\mathrm{TGV}}^k_{\alpha }(\mathbf {g})+ \frac{1}{2}\int _\varOmega (\mathbf {g}-\mathbf {f}-\mathbf {d})^2 {\mathrm{d}}x\\&\quad +\,{\mathrm{TGV}}^k_{\beta }(d)+\frac{\gamma }{2}\int _\varOmega (d-d_0)^2 {\mathrm{d}}x, \end{aligned} \end{aligned}$$
(3)

where \(d_0=-\log {t_0}\), and \(\gamma \) is a positive parameter.
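For reference, \(d_0=-\log {t_0}\) relies on He et al.'s coarse DCP estimate from Sect. 1; a minimal numpy sketch of that estimate (the patch size is an illustrative choice):

```python
import numpy as np

def dark_channel(I, patch=3):
    """Pixel-wise min over RGB, then a local min over patch x patch windows."""
    mins = I.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    out = np.empty_like(mins)
    for y in range(mins.shape[0]):
        for x in range(mins.shape[1]):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def coarse_transmission(I, A, mu=0.95, patch=3):
    """t0 = 1 - mu * min_c min_{y in N(x)} I^c(y) / A^c."""
    return 1.0 - mu * dark_channel(I / A.reshape(1, 1, 3), patch)
```

A haze-free dark patch gives \(t_0\approx 1\) (and thus \(d_0\approx 0\)), while a patch as bright as the atmospheric light gives \(t_0=1-\mu \).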

Then, the total space \(\varLambda \) of our energy functional can be defined as

$$\begin{aligned} \varLambda :=\{(\mathbf {g},d)|(\mathbf {g},d)\in {\mathrm{BGV}}^k_{\alpha }(\varOmega ,\mathbf {R}^3)\times {\mathrm{BGV}}^k_{\beta }(\varOmega )\}. \end{aligned}$$

After the functional space has been chosen, the minimizing problem of (3) can be written as the following form:

$$\begin{aligned} \inf _{(\mathbf {g},d)\in \varLambda } E(\mathbf {g},d). \end{aligned}$$
(4)

In the following, we discuss the existence and uniqueness of the global minimizer of the proposed model. The lower semicontinuity of TGV is easy to show; a proof can be found in [13].

Lemma 1

(Lower semicontinuity) If a sequence \(\{\mathbf {g}^n\} \subset {\mathrm{BGV}}^k_{\alpha }(\varOmega ,\mathbf {R}^3)\) and \(\mathbf {g}^{*} \in {\mathrm{BGV}}^k_{\alpha }(\varOmega ,\mathbf {R}^3)\) satisfy \(\mathbf {g}^n \longrightarrow \mathbf {g}^{*}\) in \(L^1(\varOmega ,\mathbf {R}^3)\), i.e.,

$$\begin{aligned} \lim _{n \rightarrow +\infty } \int _{\varOmega }|\mathbf {g}^n - \mathbf {g}^{*}|{\mathrm{d}}x=0, \end{aligned}$$

then

$$\begin{aligned} {\mathrm{TGV}}^k_\alpha (\mathbf {g}^{*}) \le \liminf _{n \rightarrow +\infty } {\mathrm{TGV}}^k_\alpha (\mathbf {g}^n). \end{aligned}$$

Theorem 1

Assume that \(\mathbf {f} \in L^2(\varOmega ,\mathbf {R}^3)\) and that \(d_0 \in L^2(\varOmega )\) is nonnegative. Then, the minimization problem (4) admits a unique solution \((\mathbf {g}^{*},d^{*})\in \varLambda \).

Proof

First, the energy functional \(E(\mathbf {g},d)\) is clearly nonnegative and proper because E(0, 0) is finite. Let \(\{(\mathbf {g}^n,d^n)\}\) be a minimizing sequence of problem (4). Then, there is a constant \(M >0\) such that

$$\begin{aligned} \begin{aligned} E(\mathbf {g}^n,d^n)&={\mathrm{TGV}}^k_{\alpha }(\mathbf {g}^n)+\frac{1}{2}\int _ \varOmega ( \mathbf {g}^n-\mathbf {f}-\mathbf {d}^n)^2 {\mathrm{d}}x\\&\quad +\,{\mathrm{TGV}}^k_{\beta }(d^n)+\frac{\gamma }{2}\int _\varOmega (d^n-d_0)^2 {\mathrm{d}}x\le M . \end{aligned} \end{aligned}$$

Thus,

$$\begin{aligned} {\mathrm{TGV}}^k_\alpha (\mathbf {g}^n) \le M,~~{\mathrm{TGV}}^k_\beta (d^n) \le M. \end{aligned}$$
(5)

Since \(d_0 \in L^2(\varOmega )\), \(\mathbf {f} \in L^2(\varOmega ,\mathbf {R}^3)\), we can assume that

$$\begin{aligned} \Arrowvert d_0 \Arrowvert _2 \le M,~~\Arrowvert \mathbf {f} \Arrowvert _2 \le M. \end{aligned}$$

Observing that

$$\begin{aligned}&\frac{\gamma }{2}\int _{\varOmega }|d^n-d_0|^2 {\mathrm{d}}x\le M \quad {\mathrm{and}}\\&\quad \frac{1}{2}\int _{\varOmega }|\mathbf {g}^n-\mathbf {f}-\mathbf {d}^n|^2 {\mathrm{d}}x\le M, \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned} \Arrowvert d^n\Arrowvert _2^2&=\Arrowvert (d^n-d_0)+d_0\Arrowvert _2^2 \\&\le 2\Arrowvert d^n-d_0\Arrowvert _2^2 + 2\Arrowvert d_0\Arrowvert _2^2 \\&\le \frac{4}{\gamma }M+2M^2 \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \Arrowvert \mathbf {g}^n\Arrowvert _2^2&=\Arrowvert (\mathbf {g}^n-\mathbf {f}-\mathbf {d}^n)+\mathbf {f}+\mathbf {d}^n\Arrowvert _2^2 \\&\le 3\Arrowvert \mathbf {g}^n-\mathbf {f}-\mathbf {d}^n\Arrowvert _2^2 +3\Arrowvert \mathbf {f} \Arrowvert _2^2+3\Arrowvert \mathbf {d}^n\Arrowvert _2^2 \\&\le 6M+\frac{12}{\gamma }M+9M^2 . \end{aligned} \end{aligned}$$

Then, the boundedness of \(\Arrowvert d^n\Arrowvert _1\) and \(\Arrowvert \mathbf {g}^n\Arrowvert _1\) follows from Hölder's inequality:

$$\begin{aligned}&\Arrowvert d^n\Arrowvert _1^2 \le |\varOmega |\Arrowvert d^n\Arrowvert _2^2 \le |\varOmega |\left( \frac{4}{\gamma }M+2M^2\right) ,\\&\Arrowvert \mathbf {g}^n\Arrowvert _1^2 \le |\varOmega |\Arrowvert \mathbf {g}^n\Arrowvert _2^2 \le |\varOmega |\left( 6M+\frac{12}{\gamma }M+9M^2\right) , \end{aligned}$$

Namely, the sequences \(\{\mathbf {g}^n\}\) and \(\{d^n\}\) are bounded in \(L^1(\varOmega ,\mathbf {R}^3)\) and \(L^1(\varOmega )\), respectively. Combining this with (5), we conclude that \(\{\mathbf {g}^n\}\) and \(\{d^n\}\) are bounded in \({\mathrm{BGV}}^k_{\alpha }(\varOmega ,\mathbf {R}^3)\) and \({\mathrm{BGV}}^k_{\beta }(\varOmega )\), respectively. By compact embedding theorems, there exists a subsequence (still denoted \((\mathbf {g}^n,d^n)\)) such that

$$\begin{aligned}&d^n \longrightarrow d^{*}~~ {\mathrm{and}} ~~\mathbf {g}^n \longrightarrow \mathbf {g}^{*} ~~{\mathrm{strongly}}~~ {\mathrm{in}}~~ L^1(\varOmega ),\\&d^n \longrightarrow d^{*} ~~{\mathrm{and}} ~~\mathbf {g}^n \longrightarrow \mathbf {g}^{*} ~~ {\mathrm{weakly}} ~~{\mathrm{in}}~~ L^2(\varOmega ),\\&d^n {\mathop {\longrightarrow }\limits ^{\omega ^*}} d^{*}~~ {\mathrm{and}} ~~\mathbf {g}^n {\mathop {\longrightarrow }\limits ^{\omega ^*}} \mathbf {g}^{*} ~~ {\mathrm{in}} ~~{\mathrm{BGV}}, \end{aligned}$$

where \((\mathbf {g}^{*},d^{*}) \in {\mathrm{BGV}}^k_{\alpha }(\varOmega ,\mathbf {R}^3) \times {\mathrm{BGV}}^k_{\beta }(\varOmega )\). By the weak lower semicontinuity of the \(L^2\) norm and Lemma 1, a standard argument shows that

$$\begin{aligned} \min _{(\mathbf {g},d)\in \varLambda }E(\mathbf {g},d)=\liminf _{n \rightarrow +\infty } E(\mathbf {g}^n,d^n) \ge E(\mathbf {g}^*,d^*). \end{aligned}$$

Hence, \((\mathbf {g}^{*},d^{*})\) is a minimizer of \(E(\mathbf {g},d)\).

All the terms in (3) are convex, so E is convex. (Here, \(\mathbf {g}\) is a three-dimensional vector value and d is a one-dimensional value.) Moreover, viewing the integrand of the fidelity part \(E_0(\mathbf {g},d):= \frac{1}{2}\int _\varOmega (\mathbf {g}-\mathbf {f}-\mathbf {d})^2 {\mathrm{d}}x+\frac{\gamma }{2}\int _\varOmega (d-d_0)^2 {\mathrm{d}}x \) as a pointwise function of \((g_R,g_G,g_B,d)\), its Hessian is

$$\begin{aligned} H=\left[ \begin{array}{cc} I_3 &{} -\mathbf {1} \\ -\mathbf {1}^T &{} 3+\gamma \\ \end{array} \right] ,\end{aligned}$$

where \(I_3\) is the \(3\times 3\) identity matrix and \(\mathbf {1}=(1,1,1)^T\). The leading principal minors of H are \(1, 1, 1\) and \(\det H=(3+\gamma )-\mathbf {1}^T\mathbf {1}=\gamma >0\), so H is positive definite; thus, \(E_0\) is strictly convex. As a result, E is strictly convex, too.
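As a quick numerical sanity check of the strict convexity of the fidelity part (a sketch; \(\gamma =4\) as used in our experiments), one can verify that the pointwise Hessian with respect to \((g_R,g_G,g_B,d)\) is positive definite:

```python
import numpy as np

gamma = 4.0                        # the value used in the experiments
ones = np.ones((3, 1))
# Pointwise Hessian of (1/2)|g - f - d|^2 + (gamma/2)(d - d0)^2 in (g_R, g_G, g_B, d).
H = np.block([[np.eye(3), -ones],
              [-ones.T, np.array([[3.0 + gamma]])]])
eigvals = np.linalg.eigvalsh(H)    # all positive whenever gamma > 0
```

The determinant equals \(\gamma \), so positive definiteness fails exactly when the fidelity weight \(\gamma \) is dropped to zero.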

Therefore, we can conclude that the variational problem (4) admits a unique solution. \(\square \)

3 The Algorithm

To solve the minimization problem (4), we apply the primal–dual algorithm. We address the case \(k=2\), i.e., total generalized variation of second order.

With the constraint \(\nabla ^T q = p\), we can now deduce a discrete version of \({{\mathrm{TGV}}}^2_{\alpha }(u)\):

$$\begin{aligned} \begin{aligned} {\mathrm{TGV}}^2_\alpha (u)&=\max ~ \left\{ \langle u, \nabla ^T p\rangle _{\mathbf {R}^L} ~|~(p,q) \in (\mathbf {R}^2)^L\times {\mathrm{Sym}}^2(\mathbf {R}^2)^L,\right. \\&\quad \left. \nabla ^T q=p, ~~~\Arrowvert q \Arrowvert _\infty \le \alpha _0,\Arrowvert p \Arrowvert _\infty \le \alpha _1 \right\} . \end{aligned} \end{aligned}$$

The inner products we used here are defined as follows:

$$\begin{aligned}&{\mathrm{for}}~u,~r~\in ~ \mathbf {R}^L:~\langle u,r\rangle _{\mathbf {R}^L} =\sum _{l=1}^L\langle u^l,r^l\rangle ,\\&{\mathrm{for}}~u,~r~\in ~ (\mathbf {R}^2)^L:~\langle u,r\rangle _{(\mathbf {R}^2)^L} \\&\quad =\sum _{l=1}^L\langle (u^l)^1,(r^l)^1\rangle +\langle (u^l)^2,(r^l)^2\rangle ,\\&{\mathrm{for}}~u,~r~\in ~ {\mathrm{Sym}}^2(\mathbf {R}^2)^L:\\&\langle u,r\rangle _{{\mathrm{Sym}}^2(\mathbf {R}^2)^L} =\sum _{l=1}^L\langle (u^l)^{11},(r^l)^{11}\rangle \\&\quad +\langle (u^l)^{22},(r^l)^{22}\rangle +2\langle (u^l)^{12},(r^l)^{12}\rangle , \end{aligned}$$

and \(\langle \cdot ,\cdot \rangle \) denotes the Euclidean inner product.

Fig. 1 Thin haze removal

Fig. 2 Dense haze removal

Fig. 3 Inhomogeneous haze removal

Fig. 4 A comparison between the TV model and the TGV model

Fig. 5 A comparison between the TV model's transmission map and the TGV model's transmission map

Fig. 6 A close-up comparison between the TV model and the TGV model

Introducing indicator functionals, i.e.,

$$\begin{aligned} \mathscr {I}_K(x) = \left\{ \begin{array}{ll} 0 &{} \quad \text {if }x\in K\\ \infty &{} \quad {\text {else}} \end{array} \right. \end{aligned}$$

and observing that

$$\begin{aligned} -\mathscr {I}_{\{0\}}(\nabla ^T q-p)=\min _{\omega \in (\mathbf {R}^2)^L}\langle \omega , \nabla ^T q-p\rangle _{(\mathbf {R}^2)^L}, \end{aligned}$$

the discrete functional can be rewritten as

$$\begin{aligned} \begin{aligned} {\mathrm{TGV}}^2_\alpha (u)&=\max _{p,q} \min _{\omega \in (\mathbf {R}^2)^L}~~ \langle u, \nabla ^T p \rangle _{\mathbf {R}^L} {+} \langle \omega , \nabla ^T q{-}p \rangle _{(\mathbf {R}^2)^L} \\&\quad -\,\mathscr {I}_{\{\Arrowvert \cdot \Arrowvert \le \alpha _0\}}(q)-\mathscr {I}_{\{\Arrowvert \cdot \Arrowvert \le \alpha _1\}}(p). \end{aligned} \end{aligned}$$

One can show that the maximum and minimum can be interchanged. Moreover, the constraints are symmetric around 0, so we have

$$\begin{aligned} \begin{aligned} {\mathrm{TGV}}^2_\alpha (u)&=\min _{\omega } \max _{p,q}~~ \langle \nabla u-\omega , p \rangle _{(\mathbf {R}^2)^L} + \langle \mathscr {E} \omega , q \rangle \\&\quad -\,\mathscr {I}_{\{\Arrowvert \cdot \Arrowvert \le \alpha _0\}}(q)-\mathscr {I}_{\{\Arrowvert \cdot \Arrowvert \le \alpha _1\}}(p), \end{aligned} \end{aligned}$$

where \(\mathscr {E}\) is the symmetric derivative \(\mathscr {E}\omega =\frac{1}{2}(\nabla \omega +\nabla \omega ^T).\)
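The symmetric derivative can be discretized directly; a small numpy sketch (using np.gradient as the discrete derivative, with an illustrative affine test field):

```python
import numpy as np

def sym_derivative(w):
    """E(w) = (grad w + (grad w)^T) / 2 for a vector field w of shape (2, H, W)."""
    d0_w0, d1_w0 = np.gradient(w[0])
    d0_w1, d1_w1 = np.gradient(w[1])
    E = np.empty((2, 2) + w[0].shape)
    E[0, 0] = d0_w0
    E[1, 1] = d1_w1
    E[0, 1] = E[1, 0] = 0.5 * (d1_w0 + d0_w1)   # symmetrized off-diagonal
    return E

# For the affine field w(x1, x2) = (x2, x1), E(w) is constant: [[0, 1], [1, 0]].
i, j = np.mgrid[0:5, 0:6].astype(float)
Ew = sym_derivative(np.stack([j, i]))
```

By construction the result is a field of symmetric \(2\times 2\) matrices, as required for the \({\mathrm{Sym}}^2(\mathbf {R}^2)\)-valued variable.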

For \(L=1\), we obtain the discrete scalar version of \({\mathrm{TGV}}^2_\beta (d)\); for \(L=3\), the discrete vector-valued version of \({\mathrm{TGV}}^2_\alpha (\mathbf {g})\).

Then, a discretization of the variational problem (4) is given by the saddle-point problem

$$\begin{aligned} \begin{aligned} \min _{\mathbf {g},d,\varvec{\omega }_1,\omega _2}&\max _{\mathbf {p_1},p_2,\mathbf {q_1},q_2} \langle \nabla \mathbf {g}-\varvec{\omega }_1, \mathbf {p_1} \rangle _{(\mathbf {R}^2)^3} + \langle \mathscr {E}\varvec{\omega }_1, \mathbf {q_1} \rangle _{{\mathrm{Sym}}^2(\mathbf {R}^2)^3} \\&\quad +\,\frac{1}{2}\int _\varOmega (\mathbf {g}-\mathbf {f}-\mathbf {d})^2 {\mathrm{d}}x +\langle \nabla d-\omega _2,p_2 \rangle _{\mathbf {R}^2} \\&\quad +\,\langle \mathscr {E}\omega _2, q_2 \rangle _{{\mathrm{Sym}}^2(\mathbf {R}^2)}+\frac{\gamma }{2}\int _\varOmega (d-d_0)^2 {\mathrm{d}}x \\&\quad -\,I_{\{\Arrowvert \cdot \Arrowvert \le \alpha _1\}}(\mathbf {p_1})-I_{\{\Arrowvert \cdot \Arrowvert \le \alpha _2\}}(\mathbf {q_1})-I_{\{\Arrowvert \cdot \Arrowvert \le \beta _1\}}(p_2) \\&\quad -\,I_{\{\Arrowvert \cdot \Arrowvert \le \beta _2\}}(q_2). \end{aligned} \end{aligned}$$
(6)

Note that (6) can be solved by many efficient methods, such as the augmented Lagrangian method and split Bregman iteration. Here, we employ the primal–dual method for its simplicity; it is summarized in the following:

(Algorithm: primal–dual iteration for solving (6))

In the algorithm, \(\mathbf {d}^{k+1}=(d^{k+1},d^{k+1},d^{k+1})\), \(\tau ,~\sigma >0\) are the step sizes and k is the iteration number. The projection operator \(\mathscr {P}\) is defined as

$$\begin{aligned} \mathscr {P}[x]=\frac{x}{\max \{1,|x|\}}. \end{aligned}$$

The divergence and gradient operators in the optimization are approximated using standard finite differences; they are defined in more detail in [13].
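To illustrate this machinery (forward-difference gradient, divergence as its negative adjoint, and the projection \(\mathscr {P}\)), the sketch below runs the same kind of primal–dual iteration on the simpler TV-\(L^2\) denoising saddle point. It is not our full double-TGV algorithm; the step sizes and weights are illustrative choices satisfying the usual condition \(\tau \sigma \Arrowvert \nabla \Arrowvert ^2\le 1\).

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary handling."""
    gy = np.zeros_like(u); gx = np.zeros_like(u)
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    return np.stack([gy, gx])

def div(p):
    """Backward-difference divergence, the negative adjoint of grad."""
    py, px = p
    d = np.zeros_like(py)
    d[0, :] += py[0, :];  d[1:, :] += py[1:, :] - py[:-1, :]
    d[:, 0] += px[:, 0];  d[:, 1:] += px[:, 1:] - px[:, :-1]
    return d

def proj(p, alpha):
    """The operator P: project p onto the pointwise ball of radius alpha."""
    norm = np.maximum(1.0, np.sqrt((p ** 2).sum(axis=0)) / alpha)
    return p / norm

def tv_denoise_pd(f, lam=8.0, alpha=1.0, tau=0.25, sigma=0.5, n_iter=200):
    """Primal-dual iteration for min_u (lam/2) ||u - f||^2 + alpha TV(u)."""
    u = f.copy(); u_bar = f.copy()
    p = np.zeros((2,) + f.shape)
    for _ in range(n_iter):
        p = proj(p + sigma * grad(u_bar), alpha)                    # dual ascent
        u_old = u
        u = (u + tau * div(p) + tau * lam * f) / (1.0 + tau * lam)  # primal prox
        u_bar = 2.0 * u - u_old                                     # extrapolation
    return u
```

Our algorithm applies the same dual-ascent/primal-prox/extrapolation pattern, with the additional dual pairs \((\mathbf {p_1},\mathbf {q_1},p_2,q_2)\) and auxiliary variables \(\varvec{\omega }_1,\omega _2\) of (6).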

4 Experimental Results

In this section, we present some experimental results to show the performance and effectiveness of our model, in comparison with Tarel et al.'s model [6], the DCP model [5], Meng et al.'s model [7], Berman et al.'s model [14], Chen et al.'s model [10], the CAP model [8], DehazeNet [11], MSCNN [12], AOD-Net [15] and DCPDN [16]. The first five are conventional methods that refine the coarse transmission map obtained by the dark channel prior, and the others are state-of-the-art learning-based methods. Note that most of the following experiments are implemented in MATLAB R2015b on an Intel 2.50 GHz PC with 8 GB RAM; AOD-Net and DCPDN are implemented in Pycaffe on an Intel PC with a GTX1064 GPU. In our model, the parameters \(\alpha \) and \(\beta \) adjust the importance of the two TGV regularization terms. We observe that the results of the proposed model change very little as \(\alpha \) and \(\beta \) vary in the ranges [0.01, 0.1] and [0.2, 0.8], respectively. The parameter \(\gamma \) weights the fidelity term, so it cannot be set too small; in our experiments, we fix \(\gamma \in [2, 4]\). We choose \(\alpha =0.02\), \(\beta =0.5\) and \(\gamma =4\) in this paper based on the best visual appearance after extensive experiments. For the other models, we use the parameter ranges given in their corresponding papers (or programs) and choose the values that give the best visual effect.

4.1 Comparison to the State-of-the-Art Methods

The haze of the input image in Fig. 1 is thin, and the scene depth varies continuously. The dehazed result of Tarel et al.'s model [6] has oversaturated color in the sky region, and this unpleasant effect is visible in the close-up view. To some extent, Meng et al.'s model [7] can remove the haze, but it suffers from color distortion owing to erroneous estimation of the atmospheric light. Besides, the results of Berman et al.'s model [14], Chen et al.'s model [10], the CAP model [8] and MSCNN [12] also suffer from unpleasing color shifting. The result of the DCP model [5] still contains haze on the distant horizon. The dehazed results of the CAP model [8], DehazeNet [11], MSCNN [12] and AOD-Net [15] are a little dark and lack enhancement in some areas (see the close-up view in the yellow rectangle). DCPDN's [16] result and ours contain far fewer visual artifacts and appear more natural.

In Fig. 2, the image suffers from dense haze and has discontinuous scene depth. We can observe that edge artifacts around scene depth discontinuities become severe in the result of Tarel et al.'s model [6]. The result of Chen et al.'s model [10] is quite over-smoothed for the distant objects. The results of Meng et al.'s model [7] and Berman et al.'s model [14] have compelling visibility, but they suffer from color distortion that makes the images appear unnatural. The dehazed results of the learning-based methods are enhanced, but not as clear as ours. The result of our method is comparable to the DCP [5] method's, but ours has higher image contrast. Compared with the other results, our dehazed image is better balanced between visibility and naturalness.

In Fig. 3, the image is full of inhomogeneous haze and has discontinuous depth. For this input, most existing dehazing methods yield visually unsatisfactory results, especially in heavy-haze regions. The haze cannot be effectively removed by the previous methods, even recent learning methods such as CAP [8], DehazeNet [11] and MSCNN [12]. This is because the hazy images in their training datasets are synthesized homogeneously. Our method removes more haze with fewer visual artifacts, although the color of our result tends to be dark; a better result may be obtained by further enhancing its brightness via exposure compensation. Close-ups of the boxed areas are shown at the bottom.

4.2 Comparison to TV-Based Models

We further compare our method with the TV-based dehazing model [9]. Two examples are shown in Fig. 4, where the input images have lower quality in the remote scenes. Fine details in the remote scenes are lost in the TV model's results, such as the buildings in the red square and the trees, houses and telegraph pole in the red rectangle. More strikingly, the bridge (the close-up view in the yellow rectangle) is missing from the image dehazed by the TV model. As we know, the kernel of \({\mathrm{TGV}}^2_\alpha \) reads as

$$\begin{aligned} \mathrm{ker}({\mathrm{TGV}}^2_\alpha )=\{u(x)=Ax+b~~| A \in \mathbf {R}^{3\times 2},~~b \in \mathbf {R}^3\}, \end{aligned}$$

but the kernel of TV consists of constant functions. For the TGV model, the dehazed image can be represented roughly as \(e^{Ax+b}\) in a local region, versus \(e^c\) for the TV model, which implies that our TGV model allows the image to change linearly within that region. Consequently, the range of color values is larger in the TGV model's result than in the TV model's; in other words, more color and content are preserved.
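This kernel property can be checked numerically (a sketch with an arbitrary affine image; np.gradient is exact on affine data):

```python
import numpy as np

# An affine image u(x) = Ax + b has a constant gradient, so every second
# derivative vanishes: u lies in ker(TGV^2) although TV(u) > 0.
i, j = np.mgrid[0:8, 0:8].astype(float)
u = 0.3 * i - 0.2 * j + 1.0            # A = (0.3, -0.2), b = 1 (illustrative)

gy, gx = np.gradient(u)
tv = np.sqrt(gx ** 2 + gy ** 2).sum()  # first-order cost: positive on a slope
gyy, gyx = np.gradient(gy)
gxy, gxx = np.gradient(gx)
second = (np.abs(gyy) + np.abs(gxx) + np.abs(0.5 * (gyx + gxy))).sum()
```

The slanted image pays a first-order (TV) cost everywhere but a zero second-order cost, which is why TGV regularization leaves such regions intact instead of staircasing them.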

In Fig. 5, we compare the transmission maps obtained by the TV model and by our model. Strong blocky artifacts appear in the sky region of the TV model's results, and they are alleviated in our TGV model's. Indeed, the transmission cannot be well approximated by piecewise constant functions when it varies smoothly with depth.

The superiority of the TGV model is more apparent in Fig. 6, which displays close-up views of Fig. 4. We observe that the hull, the brown pillar and the yellow wall recovered by the TV model suffer from more staircasing than those recovered by our model. Meanwhile, the top of the brown building and the water surface in the TV model's result in Fig. 4 are over-smoothed, which rules out the possibility that the weights of the TV regularization terms were simply too small. The staircasing is produced by the TV model itself, owing to its inaccurate assumption that scene depth and surface radiance are piecewise constant.

4.3 Objective Quality Evaluations

A good image quality assessment method needs to compare the visibility, color restoration and structural similarity achieved by different dehazing algorithms. As shown in Table 1, no-reference image quality assessments [17] are used here, because full-reference and reduced-reference assessments require a clear image corresponding to the hazy one to act as the reference, which is hard to satisfy in real applications.

Table 1 Blind quantitative analysis comparison of the results
Table 2 Comparison of average model running time (in s)
Table 3 Quantitative analysis comparison of synthetic images

In Table 1, we compare the quality assessments of our method with those of other dehazing methods in terms of IVM, VCM, Cg, UQI and other criteria. The definitions of these objective image quality assessments are described in detail in the appendix. According to the subjective quality assessments in Sect. 4.1, DehazeNet, MSCNN and CAP obtain satisfactory visual results in Figs. 1, 2 and 3, respectively, so in Table 1 we compare these three methods with ours. As shown in the table, our model achieves better values for most visibility assessment criteria than the other methods. In some sense, when all the other assessment values are equal, a smaller SSIM indicates less haze in the dehazed image; thus, the SSIM here serves only as a reference for the reader. (There is no ground-truth image in real-world applications of image dehazing, and the original hazy image is always chosen as the reference image, so a large SSIM or UQI does not mean a better dehazed result. For example, the SSIM index of two identical hazy images is necessarily larger than the SSIM index between the hazy image and the dehazed image.) Compared with the result of CAP, the color of our result is a little dark, so the HCC values of our method shown in the table are smaller than CAP's. Moreover, our algorithm achieves better values, on average, than the MSCNN and DehazeNet algorithms. No dehazing method performs well under all conditions, so the quality assessment indexes should be used as complements to the subjective assessment.
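The SSIM caveat above (with the hazy input itself as reference, identical images always score highest) can be verified with a minimal single-window SSIM. This is a simplification of the standard locally windowed SSIM, written here only to illustrate the point; the stand-in images are synthetic assumptions:

```python
import numpy as np

def global_ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM over the whole image (a simplification of the
    usual locally windowed SSIM, used only to illustrate the caveat)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
hazy = rng.integers(100, 200, size=(64, 64))      # stand-in hazy image
dehazed = np.clip((hazy - 100) * 2, 0, 255)       # stand-in dehazed image

s_identical = global_ssim(hazy, hazy)             # reference = input itself
s_dehazed = global_ssim(hazy, dehazed)
print(s_identical, s_dehazed)                     # 1.0 vs something smaller
```

The hazy image scores a perfect 1.0 against itself, while any dehazed output scores lower, which is why a large SSIM against the hazy reference cannot be read as better dehazing.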

The runtimes of the DCP model, Meng et al.'s model, Berman et al.'s model, the CAP model, the DehazeNet model, the AOD model, the TV model, Chen et al.'s model and our model are listed in Table 2. We compare runtimes for three image sizes: 465×384, 400×600 and 1024×768. Most of the methods are implemented in MATLAB R2015b, while AOD-Net and DCPDN are implemented in Pycaffe. The CAP, DehazeNet and AOD models are learning-based methods that use pre-trained model parameters, so their runtimes are very short. The DCP and Berman et al.'s methods do not use a variational model, so they spend little time on iterative computation. The TV model, Chen et al.'s model and our model are variational models that optimize two variables, whereas Meng et al. optimize a single variable; their method is therefore faster than ours. We plan to accelerate our model with a more efficient algorithm in the future.

To verify the effectiveness of our method on complete images, we test it on hazy images synthesized from stereo images with a known depth map d(x), and compare it with DCP, Meng et al., Berman et al. and MSCNN in Fig. 7. To quantitatively assess these methods, we use a series of evaluation criteria measuring the difference between each pair of ground-truth image and dehazed result. Apart from the widely used structural similarity (SSIM) index, we use two additional evaluation indexes, namely the mean square error (MSE) and the peak signal-to-noise ratio (PSNR). In Table 3, our method achieves the best performance in PSNR and MSE. On the whole, our method achieves a satisfactory performance on synthesized images.
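The synthesis step follows the atmospheric scattering model of Eq. (1) with \(t(x)=e^{-\eta d(x)}\). A minimal numpy sketch of the pipeline is given below; the clear image, depth map, and the values of A and \(\eta\) are illustrative assumptions, not the paper's actual test data:

```python
import numpy as np

def synthesize_haze(J, d, A=0.9, eta=1.0):
    """Apply the atmospheric scattering model I = J*t + A*(1 - t),
    with transmission t = exp(-eta * d) from a known depth map d."""
    t = np.exp(-eta * d)[..., None]    # broadcast over the color channels
    return J * t + A * (1.0 - t)

def mse(a, b):
    return np.mean((a - b) ** 2)

def psnr(a, b, peak=1.0):
    return 10.0 * np.log10(peak ** 2 / mse(a, b))

rng = np.random.default_rng(1)
J = rng.random((32, 32, 3))            # stand-in clear image in [0, 1]
d = np.linspace(0.5, 3.0, 32)[None, :].repeat(32, axis=0)  # toy depth map

I = synthesize_haze(J, d)
# haze raises MSE and lowers PSNR relative to the clear image
print(mse(I, J), psnr(I, J))
```

Evaluation then scores each dehazed output against the known clear image J with the same MSE/PSNR (and SSIM) measures; thicker haze (larger \(\eta\)) moves the hazy input further from J, so lower PSNR against J indicates a harder input.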

Fig. 7
figure 7

Dehazing results on synthetic images

5 Conclusion

In this paper, we have proposed a variational model with TGV regularization to remove haze from a single image. The proposed dehazing method obtains the optimal solution through a primal–dual algorithm. We have validated the superior performance of our method over several state-of-the-art methods. Compared with the TV dehazing model [7], the proposed method removes the visible blocky artifacts appearing in the transmission maps and recovers haze-free images with fewer staircasing artifacts on slanted planes and more details in remote regions. However, the color of our output images is a little dim and still needs to be improved; in the future, we will consider how to enhance the brightness adaptively.