1 Introduction

With the development of commercial and scientific exploration of the underwater environment, the quality of underwater images and video sequences plays an important role in many fields [38], such as object detection and classification [28, 59] and marine organism tracking [41]. Although specialized apparatus can improve imaging quality, it is costly and power-consuming, and it is inconvenient to carry when capturing underwater images during diving and snorkeling activities. Thus, improving underwater image quality through image enhancement and restoration techniques has received wide attention owing to its low cost. However, underwater image processing is challenging because of the complexity and diversity of the underwater environment. Light absorption and scattering are the two main causes of the visible degradation of underwater images: the former, wavelength-dependent attenuation, causes color distortion that increases with the distance light travels through water, while the latter refers to light being refracted and scattered by suspended particles, producing haze and blur. Therefore, it is meaningful to develop an effective preprocessing method that corrects color, enhances contrast, and improves the sharpness of underwater images for further applications.

In general, underwater image sharpening methods can be categorized into two types [25]: enhancement-based methods and restoration-based methods. Enhancement-based methods consider the quality perceived by humans or the performance of computer vision systems while ignoring the physical process of image degradation. Numerous enhancement methods have been proposed to improve underwater image quality; for example, white balancing [1, 35], histogram equalization (e.g., DHE [57], CLAHE [15]), unsharp masking [51], and color-transfer techniques [61] can be employed to remove color cast, increase contrast, and improve sharpness. In contrast, restoration-based methods take the underwater image formation model (UIFM) into account, wherein the parameters of the physical model are deduced from extra priors or scene information. Since scattering is a function of distance and the transmission varies within the image, many prior-based methods [3, 56] have been proposed to estimate the transmission map.

Dark channel prior (DCP) [21] is a widely used prior for outdoor image dehazing, based on the observation that haze density can serve as a useful depth clue for estimating the transmission map. Since the simplified UIFM is similar to the outdoor fogging model, many UIFM-based methods derived from DCP have been proposed for underwater image restoration. Benefiting from DCP, the transmission map can be acquired without estimating the depth map in advance. In [7], DCP was directly used to estimate the depth of turbid water and restore the clarity of underwater images. Chiang and Chen [8] combined DCP with wavelength compensation to remove haze and correct color distortion in underwater images. In [11], an underwater dark channel prior (UDCP) was developed that considers only the green and blue channels, since the red channel cannot provide dependable information for estimating the transmission map. Afterward, Galdran et al. [13] modified the DCP by inverting the red channel, considering that the intensity of the red channel decays rapidly as the distance increases; they also incorporated the saturation component into the red channel prior (RCP) to handle artificial illumination. In [58], Xie et al. proposed a normalized total variation method based on RCP for restoring underwater images. Likewise, Gao et al. [14] combined the red channel with the inverses of the green and blue channels to form a new degraded image and proposed a bright channel prior approach to estimate the transmission map. Recently, a generalization of DCP-based transmission estimation was proposed in [46], in which the relationship between image intensity and depth was modeled by linear regression to estimate the ambient light.

Another line of research estimates the transmission map by inferring depth information from different priors. For example, the maximum intensity prior (MIP) [5] extracts depth information by computing the intensity difference between color channels, denoted Dmip, and estimates the transmission map directly from Dmip rather than from a depth map. Peng et al. [45] proposed a method to estimate the scene depth using image blurriness and light absorption (IBLA). In [44], they further extended it to determine the distance between the closest scene point and the camera. Based on the analysis of a large number of underwater images, Song et al. [55] proposed an underwater light attenuation prior (ULAP) to estimate the scene depth. Afterward, Zhou et al. [63] introduced a color-line model to handle the degradation problem in the underwater environment and determined the local depth by non-linear optimization. In [43], the transmission map was estimated based on the observation that the scene depth is inversely proportional to the geodesic color distance from the background light.

In addition to the aforementioned methods, many other approaches have also been developed and achieved significant advances. Li et al. [32] proposed a dehazing approach based on a minimum information loss principle and a histogram distribution prior to improve the contrast and visibility of underwater images. The concept of haze-lines, which states that pixels belonging to the same color cluster are distributed along a straight line, was adopted for dehazing in [2, 40]. Moreover, a range of variational methods [23, 24, 37] have been proposed to address underwater image degradation. In recent years, since deep learning techniques have achieved great performance in natural image dehazing [49, 50], some works [47, 48] have attempted to adopt deep learning strategies to enhance and restore underwater images. For example, Cao et al. [4] adopted a multi-scale architecture to estimate the scene depth map. Ding et al. [10] presented a joint wavelength compensation and dehazing network (JWCDN) to estimate the background light, wavelength attenuation, and transmission map simultaneously. Treating depth map estimation as an image-to-image translation, Zhang et al. [62] proposed a depth generative adversarial network (DepthGAN) to obtain high-quality depth maps.

Based on this overview of underwater image restoration methods, we find that the scattering effect can be naturally removed if the scene depth is considered. However, most existing methods include a post-processing step to increase the visibility of the image, which may compromise the accuracy of the underlying scene radiance. In this paper, we argue that an accurate scene depth map is the key to successfully estimating the transmission map, and propose a novel restoration method for improving underwater image quality without any post-processing step. The main contributions of our work are as follows.

  (a) An efficient underwater image restoration method is presented based on the underwater image formation model, which can effectively remove haze, improve color rendition, and reveal more details.

  (b) Rather than directly estimating the transmission map, we first combine the oblique gradient operator (OGO) with the underwater light attenuation prior to extract the scene depth, and then recover the scene radiance depending on the UIFM.

  (c) Instead of simply picking the brightest pixel, we introduce a new scheme to determine the candidate region for estimating the background light based on quad-tree subdivision.

The rest of this paper is organized as follows. Section 2 briefly introduces the underwater image formation model and the characteristics of the underwater light attenuation prior. Section 3 describes the proposed method in detail. Section 4 presents the performance of the proposed method, including qualitative and quantitative comparisons and an application test. Finally, the conclusion is provided in Section 5.

2 Background and foundation

2.1 Underwater image formation model

The degradation model of underwater images proposed by Jaffe [30] states that the total energy ET detected by a camera consists of the direct component Ed, the forward scattering Efs, and the backscattering Ebs. Thus, the UIFM can be written as a linear superposition of these three components:

$$ {E}_T={E}_d+{E}_{fs}+{E}_{bs} $$
(1)

In Eq. (1), Ed is the light reflected by the object; Efs is similar to Ed but has been scattered at a small angle; and Ebs is the light reflected by other suspended particles. Assuming that the forward scattering can be neglected when the camera is close to the scene points, Schechner and Karpel [52] defined the direct component Ed and the backscattering Ebs as:

$$ {E}_d=J\,t $$
(2)
$$ {E}_{bs}=B\left(1-t\right) $$
(3)

where J is the scene radiance, B is a scalar that depends on the wavelength, and t is the transmission map representing the percentage of the scene radiance that reaches the camera. Based on the Schechner-Karpel model, the intensity of a degraded underwater image in Eq. (1) can be simplified as:

$$ {I}^c(x)={J}^c(x){t}_c(x)+{B}^c\left(1-{t}_c(x)\right) $$
(4)

where c ∈ {R, G, B} denotes the color channel, Ic(x) is the captured underwater image, and Jc(x) is the undistorted underwater image. From Eq. (4), we can observe that to recover Jc(x) from Ic(x), we first need to estimate the background light Bc and the transmission map tc(x). In the following, we also adopt this widely used simplified underwater image formation model [11, 13, 44] to restore underwater images.

In fact, accurate estimation of the background light and the transmission map is the basis of underwater image restoration. The background light Bc is often taken to be the intensity of the pixel with the maximum depth, assuming homogeneous lighting along the line of sight. However, it is difficult to find the farthest pixel in a single image. In an in-air image, the global airlight is often estimated as the color of the highest-intensity pixel [21], but objects brighter than the background lead to an incorrect estimation in the underwater environment.

The transmission tc can also be expressed as an exponential decay function of the scene depth:

$$ {t}_c(x)={e}^{-{\beta}_cd(x)} $$
(5)

where βc is the wavelength-dependent attenuation coefficient and d(x) is the distance from the camera to the scene point x. The scene depth d can be represented as the sum of the distance d0 from the nearest point to the camera and the infinity distance dn [56].
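For intuition, the simplified formation model of Eqs. (4) and (5) can be simulated directly. The NumPy sketch below is illustrative only and not part of the proposed method; the clean radiance J, the depth map d, the attenuation coefficients beta, and the background light B are placeholder values chosen to mimic bluish-green water.

```python
import numpy as np

def degrade(J, d, beta, B):
    """Simulate Eqs. (4)-(5): I = J * t + B * (1 - t), with t = exp(-beta * d).

    J    : clean image, H x W x 3, values in [0, 1]
    d    : scene depth map, H x W
    beta : per-channel attenuation coefficients, length-3 array
    B    : per-channel background light, length-3 array
    """
    t = np.exp(-np.asarray(beta)[None, None, :] * d[..., None])   # Eq. (5), H x W x 3
    I = J * t + np.asarray(B)[None, None, :] * (1.0 - t)          # Eq. (4)
    return I, t

# Toy example with a synthetic scene (placeholder values).
H, W = 64, 64
J = np.random.rand(H, W, 3)
d = np.tile(np.linspace(0.5, 8.0, W), (H, 1))    # depth increasing left to right
beta = [1 / 6, 1 / 12, 1 / 14]                   # red attenuates fastest
B = [0.1, 0.5, 0.7]                              # bluish-green water color
I, t = degrade(J, d, beta, B)
```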

2.2 Underwater light attenuation prior

Based on the above analysis, it can be concluded that the scene depth is a key clue for estimating the transmission map. Song et al. [55] proposed an effective scene depth estimation model based on an underwater light attenuation prior. The ULAP states that there is a strong correlation between the scene depth and the difference between the value of the red channel and the maximum value of the G-B channels.

Since the absorption of red light can be an order of magnitude greater than that of blue and green light, the intensity of the red channel attenuates faster than that of the green or blue channels as the depth increases. Specifically, in far regions the red light is attenuated severely, leading to a large difference between the maximum value of the G-B channels and the value of the red channel.

Based on this light attenuation prior, a linear model of the maximum value of the G-B channels and the value of the R channel was developed to estimate the depth map:

$$ d(x)={\theta}_0+{\theta}_1m(x)+{\theta}_2v(x) $$
(6)

where m(x) is the maximum value of the G-B channels, v(x) is the value of the R channel, and θ0, θ1, θ2 are coefficients. To obtain accurate values of θ0, θ1, θ2, the authors manually selected 100 proper depth maps obtained by [44] as training data and trained the model with supervised linear regression. Unfortunately, the ULAP method considers only light attenuation, which may fail in some cases, as shown in Fig. 1: large blue objects are often incorrectly estimated to be farther away, and if the color of the water body contains more red tones, the water is estimated to be closer than the foreground.
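For reference, Eq. (6) can be written in a few lines of NumPy. This is a minimal sketch that assumes an RGB image scaled to [0, 1]; the default coefficients are the values reported later in Section 3.2, taken from [55].

```python
import numpy as np

def ulap_depth(img, theta=(0.53214829, 0.51309827, -0.91066194)):
    """Relative depth from the underwater light attenuation prior, Eq. (6).

    img   : H x W x 3 RGB image in [0, 1]
    theta : (theta0, theta1, theta2) from supervised linear regression [55]
    """
    m = np.maximum(img[..., 1], img[..., 2])   # maximum of G and B channels
    v = img[..., 0]                            # R channel
    return theta[0] + theta[1] * m + theta[2] * v
```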

Fig. 1. Two failure examples of depth map estimation using ULAP

3 Proposed method

The proposed method is composed of three main parts involving background light estimation, scene depth estimation, and transmission map estimation, which will be explained in detail in the following subsections. The flowchart of the proposed method is shown in Fig. 2.

Fig. 2. Flowchart of the proposed method

3.1 Background light estimation

The background light refers to the color of the water body, which depends on the water type. Most existing methods assume that the color of the water can be obtained from at least one pixel in the image. Generally, the farthest region of an underwater image is regarded as the candidate region for the background light.

We assume that there exists an area that does not contain objects, in which the pixel intensities reflect the color of the water body. Since the amount of light absorption varies with wavelength, the dominant color in this area appears green or blue. At the same time, such an area has low variance. To detect the candidate area for the background light, we use an automatic search method based on quad-tree subdivision [31]. Considering both the color difference and the smoothness of the candidate region, the score of each sub-block is defined as:

$$ Score={S}_{\varDelta }+{S}_{\sigma } $$
(7)

where SΔ is the maximum difference between the larger of the G and B values and the R value:

$$ {S}_{\varDelta }=\max \left(\max \left(G(x),B(x)\right)-R(x)\right),x\in \Omega $$
(8)

and Sσ is defined as:

$$ {S}_{\sigma }=-\frac{1}{3}\sum \limits_{c\in \left\{r,g,b\right\}}{\sigma}_c $$
(9)

where σc is the standard deviation of the pixel values of channel c within the selected region Ω.

After that, the block with the highest score is further divided into smaller blocks until the block size falls below a predefined threshold. The final background light is calculated by averaging the pixel values inside the last block. The detailed procedure is described in Algorithm 1.

Algorithm 1. Background light estimation
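A compact sketch of the search in Algorithm 1, under our reading of Eqs. (7)-(9), is given below: each block is scored by the color-difference term SΔ and the smoothness term Sσ, and the best-scoring quadrant is subdivided until it becomes smaller than a threshold. The image is assumed to be an RGB array in [0, 1], and min_size is an assumed stopping parameter rather than a value stated in the paper.

```python
import numpy as np

def block_score(block):
    """Score = S_delta + S_sigma, Eqs. (7)-(9); block is an h x w x 3 RGB patch in [0, 1]."""
    s_delta = np.max(np.maximum(block[..., 1], block[..., 2]) - block[..., 0])   # Eq. (8)
    s_sigma = -np.mean([block[..., c].std() for c in range(3)])                  # Eq. (9)
    return s_delta + s_sigma

def estimate_background_light(img, min_size=16):
    """Quad-tree subdivision: repeatedly keep the best-scoring quadrant (Algorithm 1)."""
    block = img
    while min(block.shape[0], block.shape[1]) > min_size:
        h, w = block.shape[0] // 2, block.shape[1] // 2
        quads = [block[:h, :w], block[:h, w:], block[h:, :w], block[h:, w:]]
        block = max(quads, key=block_score)
    return block.reshape(-1, 3).mean(axis=0)   # average color of the final block
```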

In the following, four representative underwater images with different scenes (i.e., an image with a white object, horizontal-perspective and top-down-perspective images, and an image with a complex foreground) are selected to demonstrate the effectiveness and robustness of the quad-tree subdivision method. The results of background light estimation are presented in Fig. 3, where the finally selected block is filled with red.

Fig. 3. Examples of background light estimation

3.2 Scene depth estimation

To accurately estimate the distance from the farthest point to the closest one in an image, we also take the image gradient into consideration. The magnitude of the image gradient provides a rough estimate of depth information, based on the observation that regions containing far scene points are smoother than those containing close scene points and therefore produce smaller gradient values.

The magnitude of image gradient Gmag is computed as:

$$ {G}_{mag}=\sqrt{{G_x}^2+{G_y}^2} $$
(10)

where Gx and Gy are calculated by applying horizontal and vertical operators to patches of the image. A 3×3 patch is presented in Fig. 4a, where fc denotes the value of the central pixel and fk (k = 1, 2, …, 8) is the value of its k-th neighbor. This traditional gradient only captures changes along the x and y axes and is therefore unable to represent how the illumination changes in an arbitrary direction. Singh [54] applied a combined oblique gradient profile prior (OGPP) to hazy images and efficiently estimated their depth maps. The corresponding oblique gradient operator with a patch size of 3×3 is presented in Fig. 4b and is defined as:

$$ o\left(m,n\right)=\arctan \left(\frac{G_y}{G_x}\right)=\arctan \left(\frac{\sum \limits_{k=1}^8\left({f}_c-{f}_k\right)}{8}\right) $$
(11)
Fig. 4. (a) 3×3 image patch centered at fc; (b) oblique gradient operator
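The sketch below shows one plausible reading of Eqs. (10) and (11): Gx and Gy are obtained with standard 3×3 difference kernels, and the oblique response is the mean difference between the central pixel and its eight neighbours, passed through arctan as in Eq. (11). The exact weights of the operator in Fig. 4b are not reproduced here, so the kernels are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

def oblique_gradient_magnitude(gray):
    """Approximate G_mag of Eqs. (10)-(11) for a grayscale image in [0, 1]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)   # horizontal differences
    ky = kx.T                                                    # vertical differences
    gx = convolve(gray, kx, mode='nearest')
    gy = convolve(gray, ky, mode='nearest')
    # Oblique term: mean difference between the central pixel and its 8 neighbours.
    k_ob = np.full((3, 3), -1.0 / 8.0)
    k_ob[1, 1] = 1.0
    g_ob = convolve(gray, k_ob, mode='nearest')
    g_mag = np.sqrt(gx ** 2 + gy ** 2)                           # Eq. (10)
    return g_mag, np.arctan(g_ob)                                # arctan response as in Eq. (11)
```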

To better illustrate the process of estimating the depth map, an example is presented in Fig. 5. Based on Eq. (11), we first use the OGO to generate the gradient magnitude map Gmag of the degraded image, as shown in Fig. 5b. Assuming that the depth is locally constant within a small patch, we further apply a dilation operation and a hole-filling algorithm to improve Gmag (shown in Fig. 5c), which is expressed as

$$ {G}_{dilate}(x)={G}_{mag}(x)\oplus SE $$
(12)
Fig. 5. Depth map estimation. (a) Original image; (b) the gradient magnitude map; (c) the depth map after morphological dilation and hole-filling; (d) the refined depth map based on guided filter

In Eq. (12), the morphological structuring element (SE) is a square of width 7 pixels, and the dilation of an image I(x, y) by a structuring element b is defined as

$$ \left(I\oplus b\right)\left(x,y\right)=\underset{\left(s,t\right)}{\max}\left\{I\left(x-s,y-t\right)+b\left(s,t\right)\right\} $$
(13)

Here, the result is stretched to the range [0, 1] to obtain the gradient-based depth map

$$ {d}_{gmag}(x)=1- Strch\left({G}_{fill}(x)\right) $$
(14)

where Gfill is Gdilate after filling the holes generated by flat regions, and Strch(·) is the stretching function, defined as

$$ Strch(V)=\frac{V-\min (V)}{\max (V)-\min (V)} $$
(15)

Finally, we utilize the guided filter algorithm [22] to further refine the depth map, as shown in Fig. 5d.
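The gradient-based depth map of Eqs. (12)-(15) can then be assembled as in the sketch below. The hole filling is done with grayscale morphological reconstruction from scikit-image, and the optional refinement uses cv2.ximgproc.guidedFilter from opencv-contrib; both are stand-ins for the implementations actually used by the authors, and the filter radius and eps are assumed values.

```python
import numpy as np
from scipy.ndimage import grey_dilation
from skimage.morphology import reconstruction

def gradient_depth(g_mag, guide=None):
    """d_gmag from Eqs. (12)-(15): dilate, fill holes, stretch to [0, 1], invert."""
    g_dilate = grey_dilation(g_mag, size=(7, 7))                  # Eqs. (12)-(13), 7x7 square SE
    # Fill holes left by flat regions via morphological reconstruction by erosion.
    seed = g_dilate.copy()
    seed[1:-1, 1:-1] = g_dilate.max()
    g_fill = reconstruction(seed, g_dilate, method='erosion')
    strch = (g_fill - g_fill.min()) / (g_fill.max() - g_fill.min() + 1e-8)   # Eq. (15)
    d_gmag = 1.0 - strch                                          # Eq. (14)
    if guide is not None:                                         # guided-filter refinement [22]
        import cv2
        d_gmag = cv2.ximgproc.guidedFilter(guide.astype(np.float32),
                                           d_gmag.astype(np.float32), 15, 1e-3)
    return d_gmag
```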

Inspired by ULAP, we also incorporate its rapid and effective depth estimate dulap from Eq. (6) to guarantee a reliable depth map. This prior was explained in Section 2.2, and the coefficients θ0, θ1, θ2 are set to 0.53214829, 0.51309827, and −0.91066194, respectively, following the best learning results in [55].

The coarse depth map is computed by combining the depth map dgmag generated from the image gradient with the depth map dulap based on the light attenuation prior

$$ {d}_c(x)=w\,{d}_{ulap}(x)+\left(1-w\right){d}_{gmag}(x) $$
(16)

where w is the weight balancing the contributions of dulap and dgmag, which is determined from the red-channel information with the help of a sigmoid function

$$ w=\frac{1}{1+{e}^{-{\alpha}_1\left(r-{\alpha}_2\right)}} $$
(17)

where r is the average value of the red channel, α1 is the parameter controlling the slope of the curve, which is empirically set to 32, and α2 is the center of the sigmoid along the horizontal axis.

To obtain a more accurate scene depth, the distance between the nearest point and the camera also needs to be considered

$$ {d}_0=1-\underset{c\in \left\{r,g,b\right\}}{\max}\left(\frac{\max \left|{B}^c-{I}^c(x)\right|}{\max \left({B}^c,1-{B}^c\right)}\right) $$
(18)
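The combination of Eqs. (16)-(18) then reads as in the sketch below. The slope α1 = 32 comes from the text; α2 is not given a numerical value in the paper, so 0.5 is only a placeholder, and the image is again assumed to be RGB in [0, 1].

```python
import numpy as np

def coarse_depth(img, d_ulap, d_gmag, B, alpha1=32.0, alpha2=0.5):
    """Combine d_ulap and d_gmag (Eq. 16) with the sigmoid weight of Eq. (17),
    and compute the near-point offset d_0 of Eq. (18).
    alpha2 is not specified in the text; 0.5 here is a placeholder.
    """
    r = img[..., 0].mean()                                # average red-channel value
    w = 1.0 / (1.0 + np.exp(-alpha1 * (r - alpha2)))      # Eq. (17)
    d_c = w * d_ulap + (1.0 - w) * d_gmag                 # Eq. (16)
    B = np.asarray(B, float)
    d_0 = 1.0 - np.max(                                   # Eq. (18)
        np.abs(B[None, None, :] - img).max(axis=(0, 1)) / np.maximum(B, 1.0 - B))
    return d_c, d_0
```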

To demonstrate the effectiveness of the proposed scheme, five sample images and their depth maps obtained by different methods are shown in Fig. 6. The depth maps of UDCP [11] in Fig. 6b are obtained by inverting Eq. (5) with d(x) = log_{Nrer(r)}(tr(x)). The depth maps generated by IBLA [44] and our proposed method are presented in Fig. 6c and d, respectively. It can be seen that the UDCP method produces unsatisfactory depth maps since it only distinguishes the foreground from the background. The IBLA method works well in most cases but incorrectly estimates the white fish and the relative distances between objects in the foreground. In contrast, our method produces proper depth maps with much better visual quality. As shown in the first column of Fig. 6, for instance, only our method correctly assigns small depth values, i.e., closer to the camera, to the white object in the foreground.

Fig. 6. Comparison of depth estimation. (a) Original images; (b) depth maps using UDCP; (c) depth maps using IBLA; (d) depth maps using our proposed method

3.3 Transmission map estimation

The scene depth acquired by our method needs to be converted to an actual distance using a constant scaling factor D∞:

$$ d(x)={D}_{\infty}\times \left({d}_c(x)+{d}_0\right) $$
(19)

Then, the transmission map for each channel can be estimated based on Eq. (5).

In most cases, the attenuation coefficient of the red channel lies in \( {\beta}^r\in \left[\frac{1}{8},\frac{1}{5}\right] \) [44]. According to [32], the attenuation coefficients of the green and blue channels can then be calculated from the green-red and blue-red ratios:

$$ \frac{\beta^{c'}}{\beta^r}=\frac{\left(-0.00113{\lambda}_{c'}+1.62517\right){B}^r\left(\infty \right)}{\left(-0.00113{\lambda}_r+1.62517\right){B}^{c'}\left(\infty \right)},\quad {c'}\in \left\{g,b\right\} $$
(20)

where Bc is the background light and λc is the wavelength of each channel. The wavelength ranges of the different channels are 620 ~ 750 nm (red), 490 ~ 550 nm (green), and 400 ~ 490 nm (blue) [8]. Here, we set D∞ = 8, \( {\beta}^r=\frac{1}{6} \), and λc for the R, G, and B channels to 620 nm, 540 nm, and 450 nm, respectively. The transmission map estimation is described in Algorithm 2.
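A sketch of the transmission estimation in Algorithm 2, i.e., Eqs. (5), (19), and (20) with the constants stated above (D∞ = 8, βr = 1/6, λ = 620/540/450 nm), is given below. It is a re-implementation from the equations rather than the authors' code.

```python
import numpy as np

def transmissions(d_c, d_0, B, D_inf=8.0, beta_r=1.0 / 6.0,
                  wavelengths=(620.0, 540.0, 450.0)):
    """Per-channel transmission maps t_c from Eqs. (5), (19), and (20).

    d_c : coarse relative depth map in [0, 1]
    d_0 : offset for the nearest scene point (Eq. 18)
    B   : background light (R, G, B) in [0, 1]
    """
    d = D_inf * (d_c + d_0)                                # Eq. (19), actual distance
    lam_r, B = wavelengths[0], np.asarray(B, float)
    betas = [beta_r]
    for c, lam_c in zip((1, 2), wavelengths[1:]):          # green and blue via Eq. (20)
        ratio = ((-0.00113 * lam_c + 1.62517) * B[0]) / ((-0.00113 * lam_r + 1.62517) * B[c])
        betas.append(beta_r * ratio)
    return np.stack([np.exp(-b * d) for b in betas], axis=-1)   # Eq. (5), H x W x 3
```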

Once the background light B and the transmission map t are obtained, we can restore the scene radiance from Eq. (4). However, light-colored areas will be excessively restored when the transmission t approaches zero. To ensure that the restored results appear natural, we set a constant t0 = 0.1 as a lower bound of the transmission t. Finally, the restored image J is calculated using the following modified equation:

$$ {J}^c(x)=\frac{I^c(x)-{B}^c}{\max \left({t}_c(x),{t}_0\right)}+{B}^c $$
(21)
Algorithm 2. Transmission estimation
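Given B and the per-channel transmissions, Eq. (21) recovers the scene radiance. A minimal sketch with the lower bound t0 = 0.1 follows; the final clipping to [0, 1] is an added convenience for display and is not part of Eq. (21).

```python
import numpy as np

def restore(img, B, t, t0=0.1):
    """Scene radiance recovery, Eq. (21): J = (I - B) / max(t, t0) + B."""
    B = np.asarray(B, float)[None, None, :]
    J = (img - B) / np.maximum(t, t0) + B
    return np.clip(J, 0.0, 1.0)    # clip to the valid display range
```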

4 Experimental results and analysis

In this section, we first present some restored results and validate the proposed method. Then, the proposed method is compared with five other underwater restoration methods to evaluate their performance qualitatively and quantitatively. Finally, we examine the application of our approach to object segmentation.

4.1 Qualitative assessment

To verify the effectiveness of our proposed method, 20 underwater images with different degraded scenes (i.e., bluish, greenish, hazy, low-light, and turbid scenes) are selected from the UIEB dataset [33], as presented in Fig. 7a. The original images in the first row of Fig. 7a contain a large area of pure water body; as mentioned above, the background light is easy to estimate in this scenario. On the contrary, the degraded images in the bottom row of Fig. 7a contain thin mist or almost no water area, and most of them are close-up scenes of fish and coral reefs whose scene depth varies little. From Fig. 7b, it can be seen that the foreground colors of the restored images are well improved because more red is reproduced. These satisfying results demonstrate that our proposed method can effectively remove haze, correct color, and reveal more valuable information.

Fig. 7. Examples presenting the performance of our proposed method. (a) Original images; (b) the corresponding restored results of our proposed method

Furthermore, we compare our method with five recent competitive methods: the MIP method [5], the wavelength compensation and image dehazing (WCID) method [8], the UDCP method [11], the IBLA method [44], and the ULAP method [55]. Due to space limitations, Fig. 8 presents six representative images with different scene characteristics. The MIP method attempts to obtain a pseudo-depth Dmip from the difference between the maximum value of the red channel and that of the green and blue channels, and estimates the transmission through a simple shift of Dmip. As can be seen from Fig. 8b, the MIP method has little dehazing effect; in particular, when the haze is thick, colors become pale and the differences of Dmip among patches are very small, leading to an inaccurate depth estimate. Similarly, the restored images generated by the WCID method are also unsatisfactory, as shown in Fig. 8c, even though the haze is removed to some extent. The UDCP method takes the brightest pixels in the dark channel as an estimate of the background light, producing a darker scene radiance. Although UDCP performs well at dehazing, the restored results become darker overall and the color cast is even more serious; this is because it is derived from the DCP method and the estimated transmission has similar values across the whole scene. Additionally, as shown in Fig. 8d, UDCP renders the foreground of the restored images in a bluish or greenish tone due to the wrongly estimated transmission. Although the dehazing effects of the IBLA and ULAP methods are not as strong as that of UDCP, IBLA recovers more details, as shown in Fig. 8e (e.g., the top-left and bottom-left corners of the last two images), although some blur remains. Likewise, Fig. 8f shows that the thin haze is not removed by the ULAP method, and its contrast enhancement is not obvious; since the normalized residual energy ratio \( {Nrer}_c={e}^{-{\beta}_c} \) used in ULAP is fixed, it cannot adapt to various scenes and thus produces visually unnatural results. On the contrary, the restored results of our proposed method achieve superior performance in dehazing, enhancing contrast, and revealing details thanks to the more accurate depth estimation, as shown in Fig. 8g.

Fig. 8. Comparison of restored results by different methods. (a) Original images; (b)-(g) the restored results using the MIP, WCID, UDCP, IBLA, and ULAP methods and the proposed method, respectively

4.2 Quantitative assessment

In the following, we conduct quantitative evaluations of the restored results in Fig. 8. Because ground truth is unavailable, it is difficult to evaluate the quality of underwater images using full-reference metrics [18, 36]. Besides, no-reference (NR) quality assessment methods [16, 17, 19, 20, 53] designed for in-air images are also not suitable for underwater images with various degradations, including haze, low contrast, and non-uniform color distortion. Therefore, several no-reference metrics [42, 60] specially designed for evaluating underwater image quality have emerged. Here, we adopt six NR metrics, namely the underwater color image quality evaluation (UCIQE) [60], the underwater image quality measure (UIQM) [42], the no-reference quality assessment of contrast-distorted images (CDIQA) [12], the fog aware density evaluator (FADE) [9], the no-reference quality metric of contrast (NIQMC) [19], and the blind image quality measure of enhanced images (BIQME) [20], to quantitatively evaluate the performance of the compared restoration methods. The corresponding results for Fig. 8 are listed in Tables 1 and 2.

Table 1 Quantitative comparison of UCIQE and UIQM metrics. (The bold values represent the best results)
Table 2 Quantitative comparison of CDIQA, FADE, NIQMC and BIQME metrics. (The bold values represent the best results)

As shown in Table 1, our proposed method achieves the highest UCIQE values compared with MIP, WCID, UDCP, IBLA, and ULAP. Moreover, the UCIQE values obtained by our method are more stable; for example, for Image 5, the UCIQE values obtained by MIP and UDCP are even lower than that of the original image, whereas in most cases the values obtained by the proposed method are higher than 0.6, indicating a better balance among chroma, contrast, and saturation. For UIQM, Table 1 shows that our method outperforms the other methods in most cases, except for Image 3, where the UDCP result in Fig. 8c attains the highest value of 1.4701. Combined with the qualitative assessment, however, the results of UDCP suffer from underexposure, which sometimes boosts their UIQM scores. In contrast, our method achieves better visual results in increasing contrast and shows minimal performance fluctuation compared with the other methods.

Table 2 reports the quantitative measures of CDIQA, FADE, NIQMC, and BIQME. The highest CDIQA and NIQMC values obtained by our method indicate that it can significantly enhance contrast. For FADE, it is clear that both UDCP and our method outperform the other four methods; UDCP ranks first or second with respect to the ability to recover scene visibility, and WCID also performs well on some specific scenes (i.e., Image 3 and Image 5). Although our proposed method does not rank first for these two images, it still ranks in the top three among all methods, demonstrating that it can efficiently remove haze and produce a relatively clear scene. In terms of NIQMC, the proposed method scores above 4.9 on all six examples. For BIQME, WCID, UDCP, IBLA, and ULAP show unsatisfactory results and even yield lower scores than the original Image 6, while MIP achieves a particularly high BIQME score on Image 6 but performs unevenly on the other five tested images. All in all, the proposed method is more robust and achieves high scores across various metrics.

To further evaluate the effectiveness and robustness of the proposed method, we carry out experiments on the UIEB [33] and RUIE [39] datasets. Table 3 summarizes the average UCIQE, UIQM, CDIQA, FADE, NIQMC, and BIQME scores of the images restored by the compared methods. It is worth noting that the results of our proposed method are superior to those of the other five state-of-the-art methods in terms of these six objective NR quality assessment metrics. To sum up, both qualitative and quantitative experimental results demonstrate that our method achieves better performance in removing haze and improving contrast.

Table 3 Comparison of average UCIQE, UIQM, CDIQA, FADE, NIQMC, and BIQME of different restoration methods on the UIEB and RUIE datasets. (The bold values represent the best results)

4.3 Application test

To further assess the performance of the proposed method, we examine its application to image segmentation, which is a critical and essential task in many computer vision applications [26, 27, 29, 34]. In this section, we employ the original implementation of the Chan-Vese model [6], an important region-based image segmentation model, to evaluate the performance of our work.

Due to limited space, we display only two examples, as shown in Fig. 9. It can be seen from Fig. 9 that the restored results of MIP, UDCP, ULAP, and IBLA lead to large segmentation errors for Image 1, and WCID cannot even detect the fish, whereas our proposed method accurately extracts the edge information of the fish. The results of Image 2 recovered by ULAP, IBLA, and our method also show that restored versions with high visibility and contrast achieve good separation. Furthermore, the performance of the compared methods is measured using two common metrics, intersection over union (IoU) and the Dice similarity coefficient (DICE), which are among the most widely used indices for evaluating image segmentation. Both metrics measure the similarity between the segmentation result and a manually segmented reference mask, as in the sketch below. The results in Table 4 show that the proposed method achieves higher scores than the other five methods, which suggests that our method can improve the accuracy of conventional segmentation when used as a pre-processing step.
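To illustrate this evaluation pipeline, the sketch below segments a restored image with the Chan-Vese implementation in scikit-image and scores the result with IoU and DICE against a manually segmented reference mask. The scikit-image routine and its parameters are stand-ins for the original implementation of [6] used in the paper.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.segmentation import chan_vese

def iou_dice(pred, gt):
    """IoU and DICE between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / max(union, 1)
    dice = 2.0 * inter / max(pred.sum() + gt.sum(), 1)
    return iou, dice

def segment_and_score(restored_rgb, gt_mask):
    """Segment the restored image with Chan-Vese and score against the reference mask."""
    seg = chan_vese(rgb2gray(restored_rgb), mu=0.25)
    # The two Chan-Vese regions are unlabelled; keep the assignment that better matches the mask.
    return max(iou_dice(seg, gt_mask), iou_dice(~seg, gt_mask))
```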

Fig. 9. Applications in image segmentation. (a) Original images and segmentation results; (b)-(g) the restored images and their corresponding segmentation results using the MIP, WCID, UDCP, IBLA, and ULAP methods and our proposed method, respectively

Table 4 Quantitative comparison of IoU and DICE metrics. (The bold values represent the best results)

5 Conclusion

We present a novel restoration method for improving the quality of underwater images. The proposed method is based on the assumption that the magnitude of the image gradient provides a rough estimate of depth information. Initially, we utilize quad-tree subdivision to estimate the background light by considering both smoothness and color difference. Afterward, an oblique gradient operator and the underwater light attenuation prior are combined to estimate the scene depth. Subsequently, the transmission map is calculated from the acquired background light and scene depth. Finally, the scene radiance is obtained based on the UIFM without any post-processing. Experimental results demonstrate that the proposed method performs well across different degraded scenes, and the qualitative and quantitative comparisons show that it outperforms the five compared methods. Despite its good performance, our method also has some limitations; in particular, it does not satisfactorily recover images with non-uniform illumination caused by auxiliary light sources. In future work, we intend to enhance and restore underwater images under more challenging conditions.