1 Introduction

Underwater imaging is one of the critical technologies for studying and exploring the underwater world. It gathers information and images of the underwater environment and objects using specialized equipment and sensors, such as laser radar, sonar, and imaging sensors (Kang et al., 2023; Pan et al., 2022; Zhang et al., 2022). Underwater cameras can directly capture images of the underwater environment and marine life, providing critical observational data and evidence for research and exploration in ocean energy development and marine life monitoring. Additionally, underwater cameras can support the work of underwater robots and divers, improving their operational efficiency and safety (Jiang et al., 2022). However, underwater image restoration poses more significant challenges than terrestrial image restoration, owing to the selective absorption and scattering caused by diverse aquatic media, inadequate lighting, and inferior underwater imaging equipment (Ren et al., 2020; Qi et al., 2022; Li et al., 2022; Liu et al., 2021; Jiang et al., 2022). Specifically, color distortion in underwater images is often caused by the selective absorption of light by water, while light scattering degrades image clarity, resulting in low contrast and blurry details (Zhuang et al., 2022). Although the addition of artificial illumination makes the underwater environment even more complex, reconstructing the color lost in underwater images remains an essential and valuable area of research that has received significant attention (Liu et al., 2022a, b; Qi et al., 2021; Ren et al., 2021; Yuan et al., 2021). High-quality underwater images are valuable for various tasks, including target detection (Zang et al., 2013), recognition, and segmentation.

To overcome these challenges, image formation model (IFM)-based methods are used to reduce backscattering and to correct the color distortion and low contrast of underwater images. The depth map of a scene is crucial for IFM-based restoration. Traditional depth estimation methods rely on hand-crafted priors, which can lead to errors, whereas deep learning-based depth estimation methods are more accurate and robust but require large training datasets, which are difficult to obtain underwater.

Therefore, we propose a depth estimation method that combines prior and unsupervised approaches, built on the Comprehensive Imaging Formation Model (CIFM), to restore underwater images. Extensive experiments conducted on multiple underwater databases demonstrate that our method produces enhanced results with superior visual quality compared to other relevant techniques. The major contributions of this paper can be summarized as follows:

(1) We propose a novel restoration strategy based on a CIFM, which involves three stages: monocular depth estimation, backscatter removal, and color correction. This approach leverages absolute depth map estimations and an adaptive dark pixel prior for efficient and dynamic backscatter elimination across varying depths, followed by a color correction to rectify color bias and enhance image brightness.

(2) We design the Channel Intensity Prior (CIP) to estimate the depth map while accounting for the underwater light attenuation rate. We then integrate the CIP depth map with an unsupervised estimate to generate a fused depth map (\(CIP^{+}\)), which effectively overcomes prior failures and unsupervised errors.

(3) We construct the Adaptive Dark Pixel (ADP) scheme to determine the minimum and maximum distances used in the dynamic depth transformation, based on varying image degradation levels and the NIQE metric. ADP not only accelerates the algorithm but also minimizes backscatter fitting errors by computing the channel-sum image and strategically selecting dark pixels within different depth intervals.

(4) We develop a color compensation strategy that improves the precision of the attenuation coefficient fitting by defining the minimum distance between consecutive data points. Concurrently, we devise a color balance procedure that accounts for the pixel intensity distribution within the blue and green channels to establish the color balance factor. Our approach adeptly circumvents the issue of over-amplification artifacts commonly associated with the low-intensity red channel in underwater imaging.

The remainder of this paper is organized as follows. In Sect. 2, we introduce two imaging models and provide a succinct recap of previous work in the domain of underwater image enhancement. Our proposed method is presented in detail in Sect. 3. Subsequently, Sect. 4 presents an extensive series of experiments validating the effectiveness of our approach. Finally, we conclude and consolidate our findings in Sect. 5, where we also discuss potential directions for future work.

Fig. 1 Underwater imaging model and light absorption schematic

2 Background

In this section, we will first provide an overview of two underwater imaging models. Subsequently, we review research related to underwater image enhancement, including physical model-based methods, non-physical model-based methods, and deep learning-based methods.

2.1 Underwater Image Formation Model

In the Jaffe–McGlamery imaging model (Jaffe, 1990), depicted in Fig. 1, the imaging process is represented as a combination of three components: the direct component, forward scattering, and backscattering. However, forward scattering is typically disregarded, allowing the imaging model to be simplified as follows:

$$\begin{aligned} I_{c}=J_{c}\exp (-\beta _{c}z)+A_{c}(1-\exp (-\beta _{c}z)),\quad c\in \left\{ R,G,B\right\} \end{aligned}$$
(1)

where I and J are the degraded and clear images, respectively, A is the global background light, \(\beta \) is the light attenuation coefficient, and z is the distance from the camera to the scene. \(\exp (-\beta z)\) is the medium transmission map indicating the portion of J that reaches the camera, so \(J\exp (-\beta z)\) is the direct component, i.e., the attenuated scene radiance. \(A(1-\exp (-\beta z))\) is the backscattering, which is the main cause of the haze effect, contrast degradation, and color bias in underwater images.

As noted by Akkaynak et al. (2017), the Jaffe–McGlamery imaging model is limited in portraying the multifaceted nature of underwater imaging, because it fails to account for the distinct parameter dependencies of the direct-transmission attenuation coefficient and the backscattering coefficient, instead assuming them to be identical. To delve deeper into the imaging process, they conducted in-situ experiments (Akkaynak & Treibitz, 2018) in two different types of optical water bodies and analyzed the functional relationships and parameter dependencies. Their work exposed the inaccuracies originating from this oversimplification, prompting them to design a revised, more robust underwater imaging model.

$$\begin{aligned} I_{c}=J_{c}\exp (-\beta _{c}^{D}(\nu _{D})\cdot z)+A_{c}^{\infty }(1-\exp (-\beta _{c}^{B}(\nu _{B})\cdot z))\nonumber \\ \end{aligned}$$
(2)

where I, J, A, z, and c are the same as in the Jaffe–McGlamery imaging model, and the vectors \(\nu _{D}\) and \(\nu _{B}\) denote the parameter dependence of the attenuation coefficient \(\beta _{c}^{D}\) in direct transmission and the scattering coefficient \(\beta _{c}^{B}\) in backward scattering, respectively, as follows:

$$\begin{aligned} \nu _{D}=\left\{ z,\xi ,H,R_{s},\beta \right\} ,\nu _{B}=\left\{ H,R_{s},\gamma ,\beta \right\} \end{aligned}$$
(3)

where \(\xi \), H, and \(R_{s}\) denote the scene reflectance, irradiance, and sensor spectral response, respectively. \(\beta \) and \(\gamma \) denote the beam attenuation and scattering coefficients, respectively, and \(\triangle z\) denotes the change in distance. Based on the wavelength \(\lambda \) of visible light and the global background light \(A_{\infty }(\lambda )\), \(\beta _{c}^{D}\) and \(\beta _{c}^{B}\) can be further expressed as:

$$\begin{aligned} \beta _{c}^{D}= & {} \ln \left[ \frac{\int R_{s}(\lambda )\xi (\lambda )H(\lambda )\exp (-\beta (\lambda )(z))d\lambda }{\int R_{s}(\lambda )\xi (\lambda )H(\lambda )\exp (-\beta (\lambda )(z+\triangle z))d\lambda }\right] /\triangle z \nonumber \\ \end{aligned}$$
(4)
$$\begin{aligned} \beta _{c}^{B}= & {} -\ln \left[ 1- \frac{\int R_{s}(\lambda )A_{\infty }(\lambda )(1-\exp (-\beta (\lambda )(z)))d\lambda }{\int R_{s}(\lambda )A_{\infty }(\lambda )d\lambda }\right] /z\nonumber \\ \end{aligned}$$
(5)

However, compared with the Jaffe–McGlamery imaging model, using the CIFM to invert the underwater degradation process is limited to certain scenarios because it relies on precise depth information and a series of manually measured optical parameters.

Table 1 Underwater image depth estimation priors

2.2 Related Work

Recently, various techniques have been devised to enhance the clarity of underwater images. These underwater image enhancement (UIE) methods can be broadly categorized into three groups: physical model-based, non-physical model-based, and deep learning-based methods.

Physical model-based methods These methods are rooted in a physical imaging model for underwater environments, which employs specific prior constraints to determine the background light and transmission maps, thereby inverting the degradation process and producing high-quality images. Several prior-based depth estimation methods have been proposed for underwater imaging. Carlevaris-Bianco et al. (2010) introduced the Maximum Intensity Prior (MIP), which estimates scene depth using the significant differences in light attenuation across the three color channels in water. Drews et al. (2013) designed the Underwater Dark Channel Prior (UDCP), which is based on the selective absorption of light by water and excludes the red channel. Peng and Cosman (2017) proposed the Image Blurriness and Light Absorption (IBLA) method, which relies on image blurriness and light absorption priors to estimate the scene depth and restore degraded underwater images. Berman et al. (2017) suggested the Haze-Line prior to deal with wavelength-dependent attenuation in underwater images. Song et al. (2018) proposed a quick depth estimation model based on the Underwater Light Attenuation Prior (ULAP), whose coefficients were trained using supervised linear regression. Furthermore, Akkaynak and Treibitz (2019) improved the underwater imaging model with their Sea-Thru method by considering underwater specificities. Based on the CIFM (Akkaynak & Treibitz, 2019), Zhou et al. (2021, 2022) proposed a new unsupervised underwater depth estimation method and a backscatter-based color compensation method, respectively. Table 1 lists several prior-based depth estimation methods. Methods based on physical models are usually efficient, yet they heavily depend on manually designed prior knowledge.

Non-physical model-based methods These methods enhance image quality without relying on physical models by directly manipulating pixel values to produce more visually appealing underwater images. Popular techniques include image fusion (Ancuti et al., 2012, 2017), histogram stretching (Hitam et al., 2013), and Retinex-based methods (Zhuang & Ding, 2020; Zhuang et al., 2021). For example, Ancuti et al. (2012) suggested a fusion approach that combined different feature images into a single image through weight assignment. Following this, Ancuti et al. (2017) advanced their approach by creating a multiscale fusion technique that combines a white-balanced, color-corrected version of the input with a histogram-based, contrast-boosted version, yielding favorable outcomes for underwater images that suffer substantial red channel attenuation. Hitam et al. (2013) developed a hybrid Contrast-Limited Adaptive Histogram Equalization (CLAHE) method, which performs CLAHE on both the RGB and HSV color models and then merges the results using the Euclidean norm, thereby improving image contrast in small areas. Zhuang et al. (2021) proposed a Bayesian Retinex algorithm, which simplifies the complex underwater image enhancement process by dividing it into two simpler denoising sub-problems using multi-order gradient priors on reflectance and illumination. However, these image enhancement techniques do not consider the fundamental principles of underwater imaging, which can result in over-enhancement and overexposure in the output images.

Fig. 2 The flowchart of the proposed approach. Our methodology comprises three steps: depth estimation, backscatter removal, and color reconstruction. Specifically, the \(CIP^{+}\) depth map is derived by fusing the Channel Intensity Prior (CIP) depth map, which accounts for distinct light attenuation laws, with the MONO2 depth map obtained from an unsupervised approach. Backscatter is then removed using an Adaptive Dark Pixel (ADP) technique, dynamically adapted according to varying degrees of image degradation. Finally, the image's color and luminance are reconstructed via color compensation and balancing

Deep learning-based methods Deep learning has become a trend in the field of UIE due to its exceptional capability for robust and powerful feature learning. In Li et al. (2020), an underwater image enhancement network (UWCNN) based on a CNN was developed, utilizing synthetic underwater images for training. The network aims to restore clear underwater images through an end-to-end approach that considers the optical properties of various underwater environments. Nonetheless, UWCNN lacks the capability to determine the appropriate water type automatically. An unsupervised color correction network named WaterGAN was presented by Li et al. (2017). It integrated a generative adversarial network (GAN) with a physical underwater imaging model, generating a dataset of improved underwater images and accompanying depth information. Li et al. (2019) proposed a gated fusion network, known as WaterNet, and created an underwater image enhancement benchmark dataset (UIEBD) that includes a variety of scenes along with corresponding high-quality reference images. Li et al. (2021) developed the Ucolor network by taking inspiration from a physical underwater imaging model; it incorporates multicolor spatial embedding and media transport guidance to improve its response to quality-degraded areas. The Lightweight Adaptive Feature Fusion Network (LAFFNet) proposed in Yang et al. (2021) incorporates multiple adaptive feature fusion modules from the codec model to generate multi-scale feature mappings and utilizes channel attention to merge these features dynamically. Tang et al. (2022) introduced a new search space that incorporates transformers and employs neural architecture search to find the optimal U-Net structure for enhancing underwater images, thereby producing an effective and lightweight deep network. Fu et al. (2022) proposed a new probabilistic network, PUIE, which learns the enhancement distribution of degraded underwater images and mitigates bias in reference map labels. However, deep learning-based UIE methods face a common challenge: the need for large, high-quality public training datasets.

3 Proposed Method

In this study, we propose a novel approach for underwater image restoration leveraging the Comprehensive Imaging Formation Model (CIFM). The proposed method encompasses the development of an enhanced Channel Intensity Prior (\(CIP^{+}\)) for depth estimation, the deployment of Adaptive Dark Pixels (ADP) for backscatter removal, and advanced techniques for color reconstruction. The overall process is illustrated in Fig. 2, which will be detailed in the following sections.

Fig. 3 Depth estimation for underwater images. a DCP (He et al., 2010), b UDCP (Drews et al., 2013), c MIP (Carlevaris-Bianco et al., 2010), d IRC (inverted R channel intensity) (Galdran et al., 2015), and e our depth estimation method. The top row shows the estimated results for the natural light scene, while the bottom row demonstrates estimates for the artificial light scene. The darker areas represent regions further from the camera. The images are sourced from the UIEBD dataset

3.1 Simplified Model

For many years, IFM-based methods have been popular for recovering underwater images. However, unlike in the atmosphere, their results are inconsistent because underwater degradation depends on both wavelength and scene, making them unreliable and unstable. The IFM assumes that the attenuation and scattering coefficients are equal, whereas the improved underwater imaging model (CIFM) explicitly considers their differences (Zhou et al., 2021). However, the CIFM is difficult to apply to underwater image recovery because of its numerous parameters, which are hard to estimate. To tackle this issue, our method simplifies the improved underwater imaging model. Based on Akkaynak and Treibitz (2018), we know that there is one attenuation coefficient value for each color and distance in the scene, that the imaging distance has the greatest effect on the attenuation coefficient, and that there is only one scattering coefficient value for the whole scene. In addition, the backscatter increases exponentially with the imaging distance, i.e., in a fixed scene (water type), the value of the scattering coefficient also depends on the imaging distance. Therefore, we ignore the other parameters with smaller effects and focus only on the effect of imaging distance on the attenuation and scattering coefficients. The resulting simplified imaging model is as follows:

$$\begin{aligned} I_{c}=J_{c}e^{-\beta _{c}^{D}\cdot z}+A_{c}\left( 1-e^{-\beta _{c}^{B}\cdot z} \right) \end{aligned}$$
(6)

where \(I_{c}\), \(J_{c}\), \(A_{c}\), z, \(\beta _{c}^{D}\) and \(\beta _{c}^{B}\) are consistent with the revised model. Compared with the IFM, the simplified and improved imaging model considers the various functional dependencies between the direct reflection and backward scattering components, leading to a more precise representation of underwater imaging.
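To make the simplified model concrete, the following sketch forward-simulates Eq. (6) on a clean image. It is illustrative only: the coefficient values in the example are assumptions, not values reported in this paper.

```python
import numpy as np

def simulate_underwater(J, z, beta_D, beta_B, A):
    """Forward-simulate Eq. (6): I_c = J_c*exp(-beta_D_c*z) + A_c*(1 - exp(-beta_B_c*z)).

    J      : clean image, float array of shape (H, W, 3) in [0, 1]
    z      : per-pixel range in meters, array of shape (H, W)
    beta_D : per-channel attenuation coefficients (length-3)
    beta_B : per-channel backscatter coefficients (length-3)
    A      : background (veiling) light per channel (length-3)
    """
    z = z[..., None]                                   # broadcast depth over channels
    direct = J * np.exp(-np.asarray(beta_D) * z)       # attenuated scene radiance
    backscatter = np.asarray(A) * (1.0 - np.exp(-np.asarray(beta_B) * z))
    return np.clip(direct + backscatter, 0.0, 1.0)

# Example with made-up coefficients (red attenuates fastest in clear water)
J = np.random.rand(64, 64, 3).astype(np.float32)
z = np.full((64, 64), 5.0)                             # a flat scene 5 m away
I = simulate_underwater(J, z, beta_D=[0.40, 0.10, 0.08],
                        beta_B=[0.35, 0.12, 0.10], A=[0.1, 0.5, 0.6])
```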

3.2 Depth Estimation

The simplified and improved imaging model is contingent on the absolute depth of the scene, making depth map estimation a crucial step of the processing pipeline. The Sea-thru method (Akkaynak & Treibitz, 2019) employs the Structure From Motion (SFM) technique to obtain the depth map in meters. However, the SFM approach requires multiple images of the same scene, which becomes a limitation given the variable and challenging underwater conditions. This constraint underscores the importance of monocular depth estimation in enhancing the flexibility of underwater imaging applications.

Fig. 4 Depth estimation of underwater images by monodepth2 and the corresponding recovery results when these depth maps are used directly in our method. The top row showcases a successful outcome, whereas the bottom row demonstrates a failure case. The source of these images is the UIEBD

Traditional monocular depth estimation methods based on the DCP and MIP can obtain accurate results under ideal lighting conditions. However, when underwater lighting conditions are not ideal, these prior assumptions are violated, decreasing the accuracy of both depth estimation and the recovery outcomes.

The first row in Fig. 3 presents a naturally illuminated image of a shallow water area. Regarding the DCP, the fish and coral in the foreground exhibit dark pixels, so the dark channel has low values there and these regions are correctly identified as near. Conversely, the background lacks extremely dark pixels, leading the dark channel to exhibit high values, and these areas are inferred to be relatively distant. For the MIP, the closer sites yield larger \(D_{mip}\) values than the farther sites, thereby providing accurate depth estimation. However, the far end of this image is substantially brighter than the near region, i.e., the distant R-channel intensity is larger, leading to an error in the IRC depth estimation. The second row shows an underwater image captured under artificial lighting, for which the traditional depth estimation methods yield unsatisfactory results. The DCP incorrectly classifies the bright fish in the foreground as distant, and the depth span is not obvious. The RGB channel values are similar across the image, leading to incorrect MIP and IRC estimates. Unlike these methods, our method accurately delineates the foreground and the background under different lighting conditions, achieving more precise depth variations.

Deep learning-based techniques leverage the remarkable feature extraction capacity of neural networks, resulting in increased robustness and precision. However, obtaining the pixel-level depth datasets necessary for training supervised depth estimation methods is challenging (Bhoi, 2019). In contrast, instead of minimizing the error against a ground-truth depth map, the unsupervised depth estimation network monodepth2 (Godard et al., 2019) estimates the pose between two images, computes the depth map, and then uses the estimated pose and depth map to reproject the first image onto the second; the loss to be minimized is the reconstruction error of that reprojection. This enables the network to be trained solely from stereo images, eliminating the limitations posed by the dataset and producing precise depth map estimates.

Nevertheless, compared to atmospheric images, underwater images often exhibit significant color cast due to selective light absorption by water. Monodepth2 is inefficient in dealing with heavily color-biased underwater images, leading to a failure in image recovery. The image in the top row of Fig. 4 is well suited to the monodepth2 method, which allows precise depth map estimation and optimal recovery outcomes. The second row shows a classical deep-sea image, with severely attenuated red and blue channels and a greenish hue; the monodepth2 method is unsuitable for the depth estimation and restoration of such images.

In order to make monodepth2 work in underwater scenarios, we present \(CIP^{+}\), a solution that merges unsupervised and prior techniques to calculate the scene depth. Our proposal starts with the Channel Intensity Prior (CIP). In underwater environments, red light, which has the longest wavelength, decays quickest, followed by blue-green light. This means that the farther an object is, the lower its red channel intensity and the greater the difference between its blue-green and red channel intensities. We then elaborate on how the CIP and unsupervised estimates are combined using a color deviation factor.

The definition of the red channel map R is as follows:

$$\begin{aligned} R(x)=\min _{y\in \Omega (x)}\left\{ 1-I_{r}(y)\right\} \end{aligned}$$
(7)

where \(\Omega (x)\) is a square local patch centered at x, and \(I_{c}(y)\) is the observed intensity in color channel c of the input image at pixel y. Based on the CIP, red light, having the longest wavelength, decays quickest as distance increases. As a result, the farther a region is from the camera, the smaller the red channel contribution in that part of the image. Hence, we can calculate a depth estimate, denoted \({\widetilde{d}}_{r}\), directly from the red channel map:

$$\begin{aligned} {{{\widetilde{d}}}_{r}}=N_{s}(R) \end{aligned}$$
(8)

where \(N_{s}\) is a normalized function, defined as follows:

$$\begin{aligned} N_{s}(\nu )=\frac{\nu -\min \nu }{\max \nu -\min \nu } \end{aligned}$$
(9)

where \(\nu \) is a vector.

The chromatic aberration map M is defined as:

$$\begin{aligned} M(x)=\max _{y\in \Omega (x)}\left\{ I_{g}(y),I_{b}(y)\right\} -\max _{y\in \Omega (x)}I_{r}(y) \end{aligned}$$
(10)

According to the CIP, the greater the intensity difference between the channels, the greater the distance. The corresponding depth estimate, denoted \({\widetilde{d}}_{m}\), is obtained as:

$$\begin{aligned} {\widetilde{d}}_{m}=N_{s}(M) \end{aligned}$$
(11)

Combining Eqs. (8) and (11), the CIP depth is obtained as follows:

$$\begin{aligned} {\widetilde{d}}_{cip}=\alpha {\widetilde{d}}_{m}+(1-\alpha ){\widetilde{d}}_{r} \end{aligned}$$
(12)

where \(\alpha =S\left( \frac{Sum(I^{gray}>127.5)}{Size(I^{gray})},0.2\right) \), and Sum(x) and Size(y) count the number of pixels that match the condition x and the number of all pixels in y, respectively. The sigmoid function \(S(a,\delta )\) is defined as follows:

$$\begin{aligned} S(a,\delta )= \frac{1}{1+e^{-s(a-\delta )}} \end{aligned}$$
(13)

The value of \(\alpha \) hinges upon the global illumination of the image. When the percentage of pixels in the grayscale image \(I^{gray}\) with values greater than 127.5 is significantly below 0.2, \(\alpha \) is set to 0. This implies that the majority of pixels exhibit low intensity, the overall brightness of the image is dim, and the intensity variation between the three channels is negligible, making \({\widetilde{d}}_{m}\) inapplicable; consequently, the depth can only be accurately represented by \({\widetilde{d}}_{r}\). Conversely, when the proportion of pixels with values larger than 127.5 in \(I^{gray}\) significantly surpasses 0.2, \(\alpha \) is set to 1. In this case the image is brighter, the background light becomes relatively brighter, and the background light accounts for a larger share of the intensity at distant pixels, so more distant pixels may exhibit larger red channel values and be incorrectly assumed to be closer; thus, for brighter images, \({\widetilde{d}}_{m}\) is utilized to represent the scene depth. Between these two extremes, the depth is obtained by a weighted combination of the two estimates. The values 127.5 and 0.2 were derived experimentally, and we recommend variations within [110–140] and [0.15–0.35], respectively. Changing 127.5 and 0.2 to values beyond these ranges will invariably result in \(\alpha \) being fixed at either 1 or 0.
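As an illustration, the following sketch implements the CIP depth estimate of Eqs. (7)–(13) with NumPy and SciPy. The grayscale weights and the sigmoid steepness are assumed values, since the slope s in Eq. (13) is left unspecified in the text.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def normalize(v):
    # Eq. (9): min-max normalization
    return (v - v.min()) / (v.max() - v.min() + 1e-8)

def cip_depth(img, patch=9, slope=32.0):
    """Relative CIP depth map (Eqs. 7-13) for an RGB image in [0, 1].
    `slope` stands in for the unspecified sigmoid steepness s in Eq. (13)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    # Eqs. (7)-(8): red channel map and its depth estimate
    R = minimum_filter(1.0 - r, size=patch)
    d_r = normalize(R)

    # Eqs. (10)-(11): chromatic aberration map and its depth estimate
    M = maximum_filter(np.maximum(g, b), size=patch) - maximum_filter(r, size=patch)
    d_m = normalize(M)

    # Eqs. (12)-(13): brightness-dependent blending weight alpha
    gray = 0.299 * r + 0.587 * g + 0.114 * b            # standard luma weights
    bright_ratio = np.mean(gray > 0.5)                  # 0.5 corresponds to 127.5/255
    alpha = 1.0 / (1.0 + np.exp(-slope * (bright_ratio - 0.2)))

    return alpha * d_m + (1.0 - alpha) * d_r
```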

The unsupervised depth estimation is directly obtained by the monodepth2 approach, which is denoted as \({\widetilde{d}}_{mono}\). Combining the prior and unsupervised estimations, the \(CIP^{+}\) depth of the underwater scene is described as follows:

$$\begin{aligned} {\widetilde{d}}_{cip^{+}}=\beta {\widetilde{d}}_{cip}+(1-\beta ){\widetilde{d}}_{mono} \end{aligned}$$
(14)

where \(\beta =S(k,2)\) and k is the image color bias factor. When k is significantly larger than 2, \(\beta =1\); this implies a heavily color-biased image, for which the unsupervised method is infeasible, so the depth is represented by the prior estimate \({\widetilde{d}}_{cip}\). Conversely, if k is substantially less than 2, \(\beta =0\), which indicates no color bias, and a more accurate depth estimate is given by \({\widetilde{d}}_{mono}\). Between these two cases, the depth map is obtained through a weighted combination of both estimates. Through experimental validation, 2 is established as the optimal threshold for classifying images, and we recommend an operational range of [1.5–3.5]. Changing 2 to a smaller or larger value will result in \(\beta \) always being fixed to 1 or 0, undermining the adaptivity of the method.

Determining the color bias coefficient k is a critical aspect of image enhancement. To address this, we adopt the equivalent circle-based chromaticity detection method described in Xu et al. (2008). Traditional methods exhibit certain limitations, as they merely rely on the average image chromaticity or on the extreme chromaticity of the image luminance to measure the degree of color cast. The working principle of the adopted method is that if the chromaticity distribution in the two-dimensional histogram on the a-b chromaticity plane is single-peaked or concentrated, the image is likely to exhibit a color cast. The chromaticity average plays a crucial role in assessing the level of color deviation: generally, a larger chromaticity average indicates a more severe color bias.

Leveraging the principle of equivalent circle, we derive the color bias coefficient, denoted as k:

$$\begin{aligned} k=\frac{D}{M} \end{aligned}$$
(15)

where \(D=\sqrt{d_{a}^{2}+d_{b}^{2}}\) is the average image chromaticity and \(M=\sqrt{m_{a}^{2}+m_{b}^{2}}\) is the chromaticity center distance.

$$\begin{aligned}{} & {} d_{a}=\frac{\sum _{i=1}^{W}\sum _{j=1}^{V}a}{WV} \end{aligned}$$
(16)
$$\begin{aligned}{} & {} d_{b}=\frac{\sum _{i=1}^{W}\sum _{j=1}^{V}b}{WV} \end{aligned}$$
(17)
$$\begin{aligned}{} & {} m_{a}=\frac{\sum _{i=1}^{W}\sum _{j=1}^{V}(a-d_{a})^{2}}{WV} \end{aligned}$$
(18)
$$\begin{aligned}{} & {} m_{b}=\frac{\sum _{i=1}^{W}\sum _{j=1}^{V}(b-d_{b})^{2}}{WV} \end{aligned}$$
(19)

where W and V are the width and height of the image in pixels, respectively. In the a-b chromaticity plane, the coordinate center of the equivalent circle is \((d_{a},d_{b})\), and the color balance of the image is established by the location of the equivalent circle in the coordinate system. If \(d_{a}>0\), the overall image hue is red; otherwise, it is green. If \(d_{b}>0\), the overall image hue is yellow; otherwise, it is blue. As the color bias factor k increases, the color bias becomes more severe.
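For illustration, a minimal sketch of the color bias factor and the \(CIP^{+}\) fusion (Eqs. (14)–(19)) is given below. The OpenCV Lab encoding (a and b offset by 128 for 8-bit images) and the sigmoid steepness are assumptions not fixed by the text.

```python
import cv2
import numpy as np

def color_bias_factor(img_rgb):
    """Color bias factor k = D / M (Eqs. 15-19) on the CIELab a-b plane.
    img_rgb: uint8 RGB image."""
    lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    a = lab[..., 1] - 128.0                              # center the a-b plane at the origin
    b = lab[..., 2] - 128.0
    d_a, d_b = a.mean(), b.mean()                        # Eqs. (16)-(17)
    m_a = np.mean((a - d_a) ** 2)                        # Eq. (18)
    m_b = np.mean((b - d_b) ** 2)                        # Eq. (19)
    D = np.hypot(d_a, d_b)                               # average chromaticity
    M = np.hypot(m_a, m_b)                               # chromaticity center distance
    return D / (M + 1e-8)

def fuse_depth(d_cip, d_mono, k, slope=4.0):
    """CIP+ fusion of Eq. (14); the sigmoid steepness is again an assumed value."""
    beta = 1.0 / (1.0 + np.exp(-slope * (k - 2.0)))
    return beta * d_cip + (1.0 - beta) * d_mono
```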

Details of the Depth-Estimation Algorithm are outlined in Algorithm 1.

Algorithm 1 Depth-Estimate

3.3 Backscatter Estimation

The depth map obtained in the preceding section is a relative depth rather than an absolute depth: its values are dimensionless and only meaningful relative to other objects in the scene, not as distances in meters. To address this issue, we propose adaptive dark pixels, which dynamically convert the relative depth to absolute depth and effectively remove backscatter.

First, we categorize underwater images into two groups, each defined by the background: images with seawater as the background and images with other backgrounds. For the former category, the theoretical maximum distance is \(\infty \), but the visibility decreases rapidly with increasing distance. Therefore, we define a maximum visibility \(d_{max}\) with a pre-set default value of 12 meters. For images where other elements form the background, this default visibility limit is reduced to 8 meters. It is important to note that these values, 12 and 8, are determined empirically based on the underwater camera’s visibility, and we recommend a range of [8–15] for these values. Any alteration to these numbers, either smaller or larger, could result in corresponding changes in the absolute depths, potentially increasing the backward scattering fitting error.

Moreover, the limited field of view of camera lenses often means that elements situated at very shallow depths are not discernible in the captured image. To address this concern, we introduce the nearest distance \(d_{min}\), in meters, for each scene. By estimating the maximum difference between the pixel intensity at the maximum depth and the observed intensity \(I_{c}\) in the input image, the estimate \(d_{min}\in [0,1]\) can be efficiently computed.

$$\begin{aligned} {d}_{min}=1-\max _{x,c\in \left\{ r,g,b\right\} }\frac{|\theta -I^{c}(x)|}{\max (\theta ,255-\theta )} \end{aligned}$$
(20)

where \(\theta =I^{c}(\arg \max d(x))\) represents the pixel intensity at the maximum depth, i.e., the global background light. Consequently, when the global background light contributes a significant portion to the pixel intensity of the nearest pixel point, the gap between the closest and farthest pixel points decreases, resulting in an increase in \({d}_{min}\).

We employ a linear conversion method to convert the relative depth values x in the original depth map to absolute depth values y. The conversion equation is:

$$\begin{aligned} y=\frac{d_{max}-d_{min}}{{d}_{max}^{'}-{d}_{min}^{'}}x-{d}_{min}^{'}\frac{d_{max}-d_{min}}{{d}_{max}^{'}-{d}_{min}^{'}}+d_{min} \end{aligned}$$
(21)

where \({d}_{max}^{'}\) represents the highest value in the relative depth map and \({d}_{min}^{'}\) represents the lowest value, introducing \(d_{max}\) and \(d_{min}\) to adjust the relative depth.
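The following sketch instantiates the nearest-distance estimate of Eq. (20) and the linear depth conversion of Eq. (21); the 8-bit intensity range is an assumption.

```python
import numpy as np

def estimate_d_min(img, depth_rel):
    """Eq. (20): nearest-distance estimate in [0, 1].
    img: RGB image in [0, 255]; depth_rel: relative depth map (larger = farther)."""
    idx = np.unravel_index(np.argmax(depth_rel), depth_rel.shape)
    theta = img[idx].astype(np.float32)                  # background light, one value per channel
    gap = np.abs(theta - img.astype(np.float32)) / np.maximum(theta, 255.0 - theta)
    return 1.0 - gap.max()

def to_absolute_depth(depth_rel, d_min, d_max):
    """Eq. (21): linear mapping from relative to absolute depth in meters."""
    lo, hi = depth_rel.min(), depth_rel.max()
    scale = (d_max - d_min) / (hi - lo + 1e-8)
    return scale * (depth_rel - lo) + d_min
```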

Finally, the optimal recovery map is selected according to the \(d_{max}\) value, as determined by the NIQE index. Unlike other current no-reference image quality assessment (IQA) algorithms, which require prior knowledge of image distortions and training on subjective human evaluations, NIQE builds a “quality-aware” statistical feature set from a simple and effective statistical model of natural scenes in the spatial domain, and therefore only measures deviations from the statistical regularities observed in natural images. Thus, NIQE does not require exposure to distorted images and avoids the instability of subjective factors. As indicated by Mittal et al. (2012), NIQE outperforms the full-reference peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics and matches the performance of top-performing no-reference, opinion-aware, and distortion-aware IQA algorithms. Further details of the depth conversion process can be found in Algorithm 2.
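Conceptually, the \(d_{max}\) selection can be sketched as the loop below; `restore` (the remaining backscatter-removal and color-reconstruction steps) and `compute_niqe` (any off-the-shelf NIQE implementation) are hypothetical placeholders, and the candidate grid is an assumed example.

```python
def select_by_niqe(img, depth_rel, d_min, candidates=(8, 9, 10, 11, 12)):
    """Pick the restoration whose NIQE score is lowest over candidate d_max values."""
    best_img, best_score = None, float("inf")
    for d_max in candidates:
        depth_abs = to_absolute_depth(depth_rel, d_min, d_max)
        restored = restore(img, depth_abs)               # hypothetical pipeline call
        score = compute_niqe(restored)                   # lower NIQE = more natural result
        if score < best_score:
            best_img, best_score = restored, score
    return best_img
```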

Algorithm 2 Depth-Convert

In addressing backscatter removal, we employ the dark pixel prior (Akkaynak & Treibitz, 2019). It is grounded in the assumption that, in any given underwater scene, there exist regions with zero reflectance (\(\xi =0\)), i.e., regions that reflect no colored light. Such regions are typically attributable to black objects or to shadows cast by various objects.

Unlike the dark pixel prior, our approach posits that a dark pixel should be the pixel with the smallest sum over the R, G, and B channels. This is underpinned by the observation that the channel-sum intensity of a truly black pixel consists of the backscatter \(B_{c}\) alone, while that of any other pixel is \(B_{c}+D_{c}\); at the same depth, the former is therefore invariably smaller than the latter.

Subsequently, the pixel points in the degraded image are separated into T groups based on the depth map. The total sum of RGB values is computed for each group of pixel points, and the first Y pixel points with the lowest values are selected as the initial estimates for backward scattering. T and Y are calculated as follows:

$$\begin{aligned}{} & {} T=d_{max}-2 \end{aligned}$$
(22)
$$\begin{aligned}{} & {} Y=\min \left\{ {\frac{N_{i} *T}{10000},N}\right\} \end{aligned}$$
(23)

where the dynamic value of T is adopted to avoid the violation of the dark pixel prior, which could occur due to the small depth span of individual intervals in close scenes where the overall brightness of the image is high. \(N_{i}\) represents the total number of pixels in a particular group. Given the backscatter estimation, we only require a small number of backscattered pixels, thus N is set to 500. As can be observed in the second row of Fig. 2, the selected black pixels are denoted by red dots.
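A minimal sketch of this adaptive dark pixel selection (Eqs. (22)–(23)) is given below; the binning scheme and the guard that keeps at least one pixel per bin are implementation assumptions.

```python
import numpy as np

def select_dark_pixels(img, depth_abs, d_max, N=500):
    """Adaptive dark pixel selection (Eqs. 22-23).
    img: RGB image in [0, 1]; depth_abs: absolute depth map in meters.
    Returns the depths and RGB values of the selected pixels."""
    T = max(int(d_max) - 2, 1)                           # Eq. (22): number of depth bins
    rgb = img.reshape(-1, 3)
    channel_sum = rgb.sum(axis=1)                        # dark pixels minimize R + G + B
    z = depth_abs.ravel()
    edges = np.linspace(z.min(), z.max(), T + 1)
    bins = np.digitize(z, edges[1:-1])                   # bin index in [0, T-1] per pixel

    z_sel, rgb_sel = [], []
    for i in range(T):
        in_bin = np.where(bins == i)[0]
        if in_bin.size == 0:
            continue
        Y = min(max(in_bin.size * T // 10000, 1), N)     # Eq. (23), keeping at least one pixel
        darkest = in_bin[np.argsort(channel_sum[in_bin])[:Y]]
        z_sel.append(z[darkest])
        rgb_sel.append(rgb[darkest])
    return np.concatenate(z_sel), np.concatenate(rgb_sel)
```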

Upon obtaining the initial estimate of the backscatter \({{B}}_{c}\) and the corresponding depth values z, we determine the values of \(J_{c},\beta _{c}^{D},A_{c},\beta _{c}^{B}\) by fitting Eq. (24), exploiting the correlation between the three-channel backscatter values and depth.

$$\begin{aligned} {{B}}_{c}=J_{c}(x)e^{-\beta _{c}^{D}z(x)}+A_{c}(1-e^{-\beta _{c}^{B}z(x)}) \end{aligned}$$
(24)

where \(J_{c}, A_{c}\in [0,1]\) and \(\beta _{c}^{D}, \beta _{c}^{B}\in [0,10]\). In the fitting procedure, we noticed irregularity and discreteness in the shallow-depth data, which affected the fit. To address this issue, we set a minimum threshold for the depth values used in the estimation, defaulting to 0.1% of the depth range. Consequently, we can calculate the direct reflection \(D_{c}\) of the scene as detailed below:

$$\begin{aligned} D_{c}=I_{c}-B_{c} \end{aligned}$$
(25)
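A per-channel least-squares fit of Eq. (24) can be sketched as follows, assuming images scaled to [0, 1]; the initial guesses are arbitrary and the bounds reflect the ranges stated above.

```python
import numpy as np
from scipy.optimize import curve_fit

def backscatter_model(z, A, beta_B, J_res, beta_D):
    # Eq. (24): B_c(z) = A_c * (1 - exp(-beta_B * z)) + J_c * exp(-beta_D * z)
    return A * (1.0 - np.exp(-beta_B * z)) + J_res * np.exp(-beta_D * z)

def estimate_backscatter(z_dark, rgb_dark, depth_abs):
    """Fit Eq. (24) per channel to the dark-pixel samples (rgb_dark in [0, 1])
    and evaluate the backscatter B_c at every pixel depth."""
    bounds = ([0, 0, 0, 0], [1, 10, 1, 10])              # A_c, J_c in [0,1]; betas in [0,10]
    B = np.zeros(depth_abs.shape + (3,), dtype=np.float32)
    for c in range(3):
        popt, _ = curve_fit(backscatter_model, z_dark, rgb_dark[:, c],
                            p0=[0.5, 1.0, 0.1, 1.0], bounds=bounds)
        B[..., c] = backscatter_model(depth_abs, *popt)
    return B

# Eq. (25): direct signal D_c = I_c - B_c
# D = np.clip(img - estimate_backscatter(z_dark, rgb_dark, depth_abs), 0.0, 1.0)
```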

Details of backscatter estimation are displayed in Algorithm 3.

Algorithm 3 Backscatter-Estimation

Fig. 5 Backscatter removal process: a original image from UIEBD, b results of scattering removal by our method, c corresponding RGB channel backscatter fit curves, with the horizontal axis representing the imaging distance in meters and the vertical axis representing the color value

3.4 Color Reconstruction

Removing backscatter from the raw image merely resolves the haze effect attributable to scattering; it does not correct the color distortion induced by light absorption, as illustrated in Fig. 5. Consequently, both color compensation and color balancing are required to reconstruct the image's color and luminance and yield a more natural recovery outcome.

Building on the work of Akkaynak and Treibitz (2019), we model the color compensation factor \(\beta _{c}^{D}(z)\) as the sum of two exponentials, as follows:

$$\begin{aligned} \beta _{c}^{D}(z)=a*e^{bz}+c*e^{dz} \end{aligned}$$
(26)

A preliminary estimate of \(\beta _{c}^{D}(z)\) can be obtained from the scene illuminant map \(H_{c}\) (Ebner & Hansen, 2013) as follows:

$$\begin{aligned} \frac{-\log H_{c}}{z}=\beta _{c}^{D}(z) \end{aligned}$$
(27)

Next, using the known range map z, we refine the estimate of \(\beta _{c}^{D}(z)\) by solving \(\min _{\beta _{c}^{D}(z)}\left\| z-{\widetilde{z}}\right\| \), where \({\widetilde{z}}\) is obtained from Eq. (28):

$$\begin{aligned} {\widetilde{z}}=\frac{-\log {H}_{c}}{\beta _{c}^{D}(z)} \end{aligned}$$
(28)

In the refinement stage, we set the minimum distance between consecutive depth inputs to at least 1% of the total depth range. This balances the distribution of pixels at varying depths within the input data, allowing an accurate estimate of the exponential trend rather than one dominated by dense clusters of data points at the minimum and maximum depths. The restored image is then obtained as:

$$\begin{aligned} J_{c}=D_{c}e^{\beta _{c}^{D}(z)z} \end{aligned}$$
(29)
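The attenuation compensation of Eqs. (26)–(29) can be sketched per channel as below; the preliminary estimate of Eq. (27) is fitted directly to the two-exponential model, the full refinement of Eq. (28) is omitted for brevity, and the subsampling and initial guesses are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def beta_D_model(z, a, b, c, d):
    # Eq. (26): beta_D(z) = a * exp(b * z) + c * exp(d * z)
    return a * np.exp(b * z) + c * np.exp(d * z)

def compensate_attenuation(D, z, H, n_samples=5000):
    """Eqs. (26)-(29): estimate beta_D(z) per channel from an illuminant map H
    (values in (0, 1]) and recover J_c = D_c * exp(beta_D(z) * z)."""
    J = np.zeros_like(D)
    zf = z.ravel()
    rng = np.random.default_rng(0)
    idx = rng.choice(zf.size, size=min(n_samples, zf.size), replace=False)
    for c in range(3):
        # Eq. (27): pointwise preliminary estimate of beta_D
        beta_obs = -np.log(np.clip(H[..., c], 1e-3, 1.0)).ravel() / np.maximum(zf, 1e-3)
        popt, _ = curve_fit(beta_D_model, zf[idx], beta_obs[idx],
                            p0=[1.0, -1.0, 1.0, -0.1],
                            bounds=([0, -10, 0, -10], [np.inf, 0, np.inf, 0]))
        J[..., c] = D[..., c] * np.exp(beta_D_model(z, *popt) * z)   # Eq. (29)
    return np.clip(J, 0.0, 1.0)
```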

After image reconstruction using Eq. (29), we enhance the visual appeal by calculating color balance factors grounded in the CIP. These factors consider the pixel intensity distribution in the blue and green channels, avoiding the red artifacts caused by over-compensation of the red channel in extreme cases.

$$\begin{aligned}{} & {} w_{g}=avg(\max _{10\%}(I_{g})) \end{aligned}$$
(30)
$$\begin{aligned}{} & {} w_{b}=avg(\max _{10\%}(I_{b})) \end{aligned}$$
(31)

where \(avg(\max _{10\%}(I_{c}))\) is the average intensity of the brightest 10% of pixels in channel \(I_{c}\). Further, the green channel color balance factor \(W_{g}\) and the blue channel color balance factor \(W_{b}\) can be calculated as:

$$\begin{aligned}{} & {} W_{g}=\frac{w_{g}}{2*(w_{b}+w_{g})} \end{aligned}$$
(32)
$$\begin{aligned}{} & {} W_{b}=\frac{w_{b}}{2*(w_{b}+w_{g})} \end{aligned}$$
(33)

The green and blue channels, \(I_{g}\) and \(I_{b}\), are updated as follows:

$$\begin{aligned}{} & {} I_{g}=W_{g}*I_{g} \end{aligned}$$
(34)
$$\begin{aligned}{} & {} I_{b}=W_{b}*I_{b} \end{aligned}$$
(35)
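The color balance step of Eqs. (30)–(35) can be sketched as follows, assuming an RGB image scaled to [0, 1].

```python
import numpy as np

def color_balance(img):
    """Color balance of Eqs. (30)-(35) for an RGB image in [0, 1]."""
    def top10_mean(ch):
        flat = np.sort(ch.ravel())
        return flat[int(0.9 * flat.size):].mean()        # mean of the brightest 10% of pixels
    w_g = top10_mean(img[..., 1])                        # Eq. (30)
    w_b = top10_mean(img[..., 2])                        # Eq. (31)
    W_g = w_g / (2.0 * (w_b + w_g))                      # Eq. (32)
    W_b = w_b / (2.0 * (w_b + w_g))                      # Eq. (33)
    out = img.copy()
    out[..., 1] *= W_g                                   # Eq. (34)
    out[..., 2] *= W_b                                   # Eq. (35)
    return np.clip(out, 0.0, 1.0)
```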

4 Experiment and Analysis

In this section, we evaluate the effectiveness of our method against several traditional and deep learning methods. We also discuss the impact of critical components of our method through detailed enhancement and ablation studies. Finally, we further analyze the time complexity of our method.

Fig. 6 Visual comparisons on images with different color bias: a original image from UIEBD, b–k display the results acquired by IBLA, GDCP, ULAP, WaterNet, SMBL, UWCNN, \(L^{2}\)UWE, Ucolor, PUIE, and our method, respectively. The best UCIQE scores in each case are highlighted in red, and the second-best scores are denoted in blue

4.1 Experimental Settings

Comparison Methods To evaluate the efficacy of our methodology, we conducted a comparative study with nine other underwater image enhancement techniques: five physical model-based methods, namely IBLA (TIP'17) (Peng & Cosman, 2017), GDCP (TIP'18) (Peng et al., 2018), ULAP (RCM'18) (Song et al., 2018), SMBL (TB'20) (Song et al., 2020), and \(L^{2}\)UWE (CVPR'20) (Marques & Albu, 2020), as well as four deep learning-based methods, namely WaterNet (TIP'19) (Li et al., 2019), UWCNN (PR'20) (Li et al., 2020), Ucolor (TIP'21) (Li et al., 2021), and PUIE (ECCV'22) (Fu et al., 2022).

Benchmark Datasets We evaluated our method using several datasets, including the UIEBD (Li et al., 2019), MABLs (Song et al., 2020), UCCS (Liu et al., 2020), U-45 (Li et al., 2019) and EUVP (Islam et al., 2020) datasets. The UIEB dataset (Li et al., 2019) includes 890 image pairs captured in real underwater environments, while the UCCS dataset (Liu et al., 2020) is divided into subsets of different water hues to test different color correction techniques. MABLs (Song et al., 2020) come with manual annotations of background light in images. U-45 (Li et al., 2019) is a commonly used dataset for underwater image testing, and EUVP (Islam et al., 2020) encompasses a diverse range of underwater objects.

Evaluation Metrics To assess image quality, we employed four metrics: Underwater Color Image Quality Evaluation (UCIQE) (Yang & Sowmya, 2015), Contrast-changed Image Quality Measure (CEIQ) (Yan et al., 2019), Naturalness Image Quality Evaluator (NIQE) (Mittal et al., 2012), and Information Entropy (IE) (Zhang et al., 2019). UCIQE evaluates image quality through chromaticity, saturation, and contrast, with a higher score indicating better quality. CEIQ assesses overall quality using five contrast-related features, with higher scores indicating higher quality. NIQE gauges image quality by comparing it to a model derived from natural scenes, with lower scores implying better quality. Finally, IE represents the average amount of information in the image, and a higher score means more information and richer color.
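For reference, a compact sketch of the UCIQE computation is given below, using the commonly cited weighting coefficients from Yang and Sowmya (2015); the quantile choices, normalization, and HSV-based saturation follow one common public variant and may differ slightly from the original MATLAB code.

```python
import cv2
import numpy as np

def uciqe(img_rgb):
    """UCIQE = c1*sigma_c + c2*con_l + c3*mu_s with the weights reported by
    Yang and Sowmya (2015); img_rgb is a uint8 RGB image."""
    lab = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2LAB).astype(np.float32)
    L = lab[..., 0] / 255.0
    a, b = lab[..., 1] - 128.0, lab[..., 2] - 128.0
    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std() / 128.0                       # chroma standard deviation
    con_l = np.quantile(L, 0.99) - np.quantile(L, 0.01)  # luminance contrast
    hsv = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2HSV)
    mu_s = hsv[..., 1].mean() / 255.0                    # average saturation
    return 0.4680 * sigma_c + 0.2745 * con_l + 0.2576 * mu_s
```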

4.2 Subjective Assessment

The performance of various color correction methods was evaluated using the UIEB dataset, as depicted in Fig. 6. In instances where there is a strong green color cast, the outcomes produced by IBLA (Peng & Cosman, 2017), GDCP (Peng et al., 2018), ULAP (Song et al., 2018), and \(L^{2}\)UWE (Marques & Albu, 2020) fall short of expectations. This is due to the near-zero values of the red and blue channels, which cause the priors to fail. SMBL (Song et al., 2020) and \(L^{2}\)UWE (Marques & Albu, 2020) enhance the clarity of low-visibility underwater images but are not fully successful in eliminating the haze. In highly scattering images, the values in the RGB channels tend to be similar, which prevents the ULAP prior from working effectively and results in an overcompensation of the red channel. WaterNet (Li et al., 2019), UWCNN (Li et al., 2020), Ucolor (Li et al., 2021), and PUIE (Fu et al., 2022) correct color distortion; still, their loss functions do not prioritize luminance information, leading to inadequate contrast improvement in low-visibility images and to local darkness. From the RGB histogram in Fig. 6k, it can be seen that our method effectively removes color bias and enhances contrast, thereby effectively resolving artificial artifacts. The UCIQE values confirm the superior visual quality of the results.

Fig. 7 Visual comparisons on images under various illumination conditions: a original image from MABLs, EUVP, and U-45, b–k illustrate the results obtained by IBLA, GDCP, ULAP, WaterNet, SMBL, UWCNN, \(L^{2}\)UWE, Ucolor, PUIE, and our method, respectively. The best UCIQE scores in each case are highlighted in red, and the second-best scores are denoted in blue

Fig. 8 Visual comparison results on highly scattering and highly color-distorted images: a original image from MABLs, b–k showcase the results obtained by IBLA, GDCP, ULAP, WaterNet, SMBL, UWCNN, \(L^{2}\)UWE, Ucolor, PUIE, and our method, respectively. The best UCIQE scores in each case are highlighted in red, and the second-best scores are denoted in blue

To effectively handle diverse underwater lighting conditions and address the problem of non-uniform illumination caused by artificial light sources, we evaluated the enhancement of images with different illumination conditions on the MABLs, U-45, and EUVP datasets, as depicted in Fig. 7. Existing methods such as IBLA (Peng & Cosman, 2017), GDCP (Peng et al., 2018), SMBL (Song et al., 2020), and \(L^{2}\)UWE (Marques & Albu, 2020) produce overexposed images when enhancing artificially illuminated scenes. Although methods like WaterNet (Li et al., 2019), UWCNN (Li et al., 2020), Ucolor (Li et al., 2021), and PUIE (Fu et al., 2022) perform well on artificially illuminated images, they introduce local darkness in low-illumination images, with WaterNet being the worst performer in this regard, while IBLA, SMBL, and \(L^{2}\)UWE still suffer from overexposure. In contrast, our approach surpasses the compared methods in enhancing contrast and preserving details, avoiding over- or under-enhancement and preventing the creation of dark regions. The UCIQE scores demonstrate that our approach effectively enhances contrast and removes haze under various lighting conditions.

Table 2 Quantitative evaluations of various techniques on the UIEBD, U-45, UCCS, and MABLs datasets

To evaluate the effectiveness and robustness of various techniques, we conducted image enhancement experiments on MABLs images with high backscatter and color bias, as shown in Fig. 8. It is evident that several compared methods encounter difficulties when applied to such challenging underwater images. GDCP (Peng et al., 2018) induces undesirable color distortions. While \(L^{2}\)UWE boosts texture and edge sharpness, it also blurs the image's details. IBLA, WaterNet, UWCNN, and Ucolor tend to introduce localized color bias without effectively correcting the overall darkness of the image. SMBL (Song et al., 2020) effectively improves the color of the image but does not eliminate the fog effect. In contrast, our method successfully eliminates unnatural colors while improving visibility, rendering more details and vibrant colors. Additionally, the UCIQE values show that our method is effective and robust.

4.3 Objective Assessment

To validate the earlier subjective observations, we employed an objective evaluation technique to conduct a more comprehensive assessment of the quality of the restored images. Table 2 presents the average scores of the four no-reference quality metrics (UCIQE, CEIQ, NIQE, and IE) for various methods. These methods were applied to the UIEBD, MABLs, U-45, and UCCS datasets.

It can be observed that deep learning-based methods such as WaterNet, UWCNN, Ucolor, and PUIE have performed well on the four test datasets. They exhibit lower UCIQE and favorable CEIQ, NIQE, and IE scores. The convolutional capabilities of these deep learning methods allow them to correct color distortion effectively. However, they may not be as effective as traditional methods that employ physical models in terms of enhancing contrast and increasing color vividness.

The physical model-based methods, including IBLA, GDCP, ULAP, SMBL, and \(L^{2}\)UWE, demonstrate inferior CEIQ, NIQE, and IE scores on the four test datasets. The root cause is that these methods rely on physics-based atmospheric imaging models, which fall short of accurately describing the degradation of underwater image quality. Moreover, such physics-based methods achieve optimal performance only when accurate prior information is incorporated, a requirement that is often unmet in underwater imaging scenarios.

Thanks to the differentiation of attenuation and scattering coefficients in the CIFM and the dynamic removal of backscatter based on image type by ADP, our method achieves the highest UCIQE, CEIQ, and IE scores on all four test datasets. Moreover, our method outperforms other state-of-the-art methods in terms of the NIQE score. In conclusion, our approach's superiority in color correction is demonstrated by both qualitative and quantitative evaluations.

4.4 Comparisons of Detail Enhancement

Precise fine structural details are essential for generating high-quality underwater images. To evaluate the effectiveness of various enhancement techniques in enhancing the detailed portions of the images, we conducted a comparison by localized zoom, as illustrated by the red and blue boxes in Fig. 9. From a global perspective, our method effectively removes color distortion and improves contrast. On a local scale, it enhances image structure details and significantly enhances clarity and information.

Fig. 9 Detail enhancement effects of different methods: a original image from MABLs, b–k results obtained by IBLA, GDCP, ULAP, WaterNet, SMBL, UWCNN, \(L^{2}\)UWE, Ucolor, PUIE, and our method, respectively

Fig. 10 Qualitative ablation results for each key component of our method: a original image from UIEBD, b–g results obtained by -w/o CIP, -w/o MONO, -o/y R, -o/y M, -w/o ADP, and our method (full model), respectively

4.5 Ablation Study

To validate the efficacy of the core components in our method, i.e., the \(CIP^{+}\) and ADP modules, we conducted an extensive series of ablation studies on the UIEB dataset. The tested variants include (a) the original image, (b) our method without channel intensity prior depth estimation (-w/o CIP), (c) our method without self-supervised depth estimation (-w/o MONO), (d) our method with only red channel prior depth estimates (-o/y R), (e) our method with only chromatic aberration prior depth estimates (-o/y M), (f) our method without adaptive dark pixel (-w/o ADP), and (g) our method (full model).

To assess the effectiveness of the \(CIP^{+}\) module, we performed a comprehensive ablation study. As depicted in Fig. 10(b)–(e), the backscatter was successfully removed in certain areas and the contrast was improved; however, inaccurate depth estimation introduced artifacts and over-enhancement in some regions. The depth estimation results corresponding to (b)–(e) are shown in Fig. 11. Both the MONO2 and M depth maps exhibited inaccuracies of varying extents, primarily due to color bias and luminance loss. Although the R depth map appeared accurate, it incorporated excessive image detail and lacked smoothness. In contrast, the \(CIP^{+}\) model provides a more accurate and smoother depth estimate. The objective ablation results are reported in Table 3: all metrics of the incomplete \(CIP^{+}\) variants decline to varying degrees, which is attributed to depth estimation errors. Therefore, the \(CIP^{+}\) module is the superior choice.

Fig. 11 Qualitative ablation results for each key component of our depth estimation method: a original image from UIEBD, b–f results obtained by MONO, M, R, CIP, and \(CIP^{+}\), respectively. The x-axis of g is the depth, and the y-axis is the chromaticity, reflecting the depth values represented by the different colors in the depth map

Table 3 Ablation study on the UIEB dataset

To explore the effectiveness of the ADP module, we remove it and obtain a variant of our method. As shown in Fig. 10, variant (f) successfully removes color bias and enhances texture detail, but it also causes local darkness due to the fitting error, whereas Fig. 10(g), which includes all critical components, gives the best visual outcome. The fitting results with and without the ADP module are displayed in Fig. 12; the fitting loss is clearly smaller with the ADP module across various scenarios. To further validate our ablation study, we employed the full-reference metrics SSIM (Structural Similarity Index) and PSNR (Peak Signal-to-Noise Ratio). The detailed ablation results are presented in Table 3. The method without the ADP module shows noticeable declines across all performance metrics. This is attributed to the ADP module's ability to dynamically select dark pixels based on the degradation level of each image, thereby minimizing fitting errors. As a result, the ADP module proves to be a crucial component of our method.

Fig. 12 Qualitative ablation results for the key component of our scattering fit: a original image from UIEBD, b the result of fitting the backward scattering coefficient without ADP (-w/o ADP), and c the result of fitting the backward scattering coefficient with our method

4.6 Running Time Comparisons

To evaluate the computational efficiency of our method, we created an underwater image dataset composed of 100 images for each of the following sizes: 256 \(\times \) 256, 512 \(\times \) 512, and 1024 \(\times \) 1024. Tests were run on a PC with an Intel Xeon Silver 4215R CPU @ 3.20 GHz and an NVIDIA Tesla V100 PCIE 32GB GPU. The traditional methods were run in MATLAB R2019a, while the deep learning-based methods were executed in Python with PyTorch.

Table 4 indicates that deep learning methods generally perform faster than traditional ones due to their comprehensive training and GPU utilization. Conversely, traditional restoration methods, like IBLA and our method, consume a considerable amount of time to calculate the background light and transmission map. This leads to an extended runtime, primarily driven by the need for repetitive transmission map estimations. Although our approach may not outperform other methods in terms of processing speed, it effectively removes blur and color bias while addressing the issue of artificial light.

Table 4 Average runtime of different underwater image enhancement techniques

5 Conclusion

In this paper, we propose a novel underwater image restoration method that handles artificial illumination by combining an adaptive dark pixel prior and a color correction technique within the CIFM framework. We adopt the \(CIP^{+}\) depth estimation technique, which combines the law of light attenuation with an unsupervised method while considering the degree of image degradation, and we employ ADP to remove backscatter effectively. Our method demonstrates robust performance across various underwater environments and illumination conditions, yielding visually pleasing images. Objective experiments report that our UCIQE/CEIQ scores outperform the GDCP method by 6.64% and 6.79%, and the recent data-driven PUIE approach by 11.36% and 3.35%, evidencing significant improvements in color recovery and detail enhancement. Extensive experiments clearly show the efficacy of the proposed method in enhancing details and restoring natural color, highlighting its potential for underwater image restoration. Nevertheless, our approach requires a longer runtime for accurate depth estimation than deep learning-based methods. In future work, we aim to accelerate it and optimize the fitting procedure.