1 Introduction

Underwater optical cameras capture images in unfamiliar ocean environments. The captured underwater images often suffer from low contrast, color distortion, and heavy noise: wavelength-dependent absorption causes color shifts, forward scattering leads to blurring, backward scattering limits contrast, and artificial lighting produces non-uniform illumination. Consequently, it is necessary to improve the quality of underwater images.

In recent years, underwater image restoration has been studied by many researchers, and many effective restoration methods have been proposed. These methods can be divided into hardware-based and non-hardware-based approaches. Hardware-based approaches require special equipment, such as computational imaging [1], range-gated imaging [2], and stereo imaging [3] equipment.

1.1 Imaging methods

1.1.1 Computational imaging

Schechner et al. [1] used a polarization filter attached to a camera to restore images taken at significantly different scene distances. Treibitz et al. [4] proposed a fluorescence light-based imaging method to eliminate scatter. However, these methods cannot compensate for the radiance attenuated by the polarization filters, particularly under time-varying visibility and illumination conditions.

1.1.2 Range-gated imaging

Tan et al. [2] designed the timing scheme of a range-gated imaging system to capture images in turbid water. Ouyang et al. [5] proposed a pulsed laser line scan imaging system for underwater imaging. However, range-gated imaging methods are easily affected by sediment and the power of the laser. In addition, processing images with range-gated imaging methods is time-consuming.

1.1.3 Stereo imaging

Martin et al. combined a stereo matching and light attenuation model to recover underwater visibility [3]. Lee et al. proposed a stereo image de-hazing method that uses a stereo image pair to estimate scattering parameters [6]. However, obtaining a high-quality depth map with stereo matching is difficult because the input images are significantly distorted by scattering.

Non-hardware-based approaches have been proposed to overcome the limitations of hardware-based approaches. Non-hardware-based approaches use only digital signal processing tools, such as histogram equalization [7, 8], statistical modeling [9, 10], and unsharp masking [11], to enhance underwater images.

1.2 Image processing methods

1.2.1 Image enhancement

Garcia et al. [7] proposed local histogram equalization to address non-uniform lighting and haze. Zuiderveld et al. [8] proposed contrast limited adaptive histogram equalization to adjust the target region according to interpolation between the histograms of neighboring regions. However, local equalization is very time consuming and causes additional noise.

1.2.2 Image restoration

Physics-based image restoration methods have also been studied in recent years. For example, Lu et al. proposed physics-based models that exploit physical wavelength properties [12] and spectral characteristics [13] to restore underwater images.

1.3 Model- and filter-based methods

1.3.1 Statistical modeling

Fattal [9] designed a color-lines method to estimate the turbidity of haze, and then used a Markov random field model to recover clean images. He et al. proposed the dark channel prior (DCP) to estimate the depth map, and then employed soft matting to refine the depth map and obtain clear images [10]. However, the subsequent regional contrast stretching can cause halos or aliasing.

1.3.2 Unsharp masking

Unsharp masking improves images by emphasizing high-frequency components [11]. Although this method is easy to implement, it is very sensitive to noise and causes digitization effects and blocking artifacts.

All of the abovementioned methods can improve the quality of underwater images. However, an integrated underwater image restoration method has not yet been developed. Moreover, most physical imaging models that attempt to improve image quality rely on a simplifying assumption that the model parameters are constant; thus, they can only remove homogeneous scatter.

In this paper, we propose an underwater light propagation model and a corresponding restoration method to improve the quality of underwater images. The proposed imaging model considers the effects of scattering, absorption, and blurring. We estimate the scattering coefficient using the color histogram distance (CHD). Then, we employ hierarchical transmission fusion to refine the transmission. After estimating the scene radiance, we use a spectrometer to estimate the absorption rate and recover the real scene color. Finally, a clean image is obtained through de-blurring.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed underwater light propagation model and image restoration method. Section 4 presents a simulation evaluation of the proposed approach and experiments using real-world underwater images. Finally, Section 5 summarizes the paper and describes future research.

2 Related works

Many underwater image contrast enhancement methods have been proposed in recent years. Bazeille et al. proposed an image preprocessing scheme to enhance images captured in turbid water [14]. This method uses contrast enhancement, anisotropic filtering, and wavelet filtering to recover an underwater scene; however, it causes color distortion. Nicholas et al. [15] improved the dark channel prior with a graph-cut method to refine the transmission map. This method achieves better results for low-turbidity underwater images, but using graph cuts is time-consuming. Chiang et al. [16] considered the effects of wavelength variation on underwater imaging and reconstructed images using a dark channel prior. However, the piecewise function for estimating the transmission map and the use of constant depth information for color correction cause scatter residue and color distortion.

Ancuti et al. [17] used an exposure fusion method in a turbid medium to reconstruct a clear image. This method also performs well for images captured in low-turbidity water; however, it is ineffective for images captured in highly turbid water and does not handle non-uniformly distributed scatter. Galdran et al. [18] proposed a red channel-based underwater image restoration method, which assumes that the red color channel fades quickly in ocean water. However, according to previous work, red does not always fade faster than other colors [19]; in turbid water or in places with high plankton concentration, red light may transmit better than blue light. Thus, Lu et al. proposed an underwater dual-dark channel prior [13], which employs the red and blue channels to calculate a transmission map. However, their locally adaptive cross filter may introduce additional noise, and the method cannot remove non-uniform scatter.

Emberton et al. [20] proposed a hierarchical rank-based veiling light estimation method to estimate ambient light. This method avoids mistaking bright objects (e.g., bubbles) for ambient light in water; however, it is time-consuming and cannot be employed directly on underwater vehicles. Lu et al. [12] proposed a robust atmospheric light estimation method for de-hazing, which first removes highlights (e.g., flickers and bubbles) in the image and then applies the underwater dark channel prior. Codevilla et al. [21] evaluated multiple feature detectors in underwater environments under varying turbidity; regrettably, image restoration methods were not examined in that work. Gibson et al. [22] considered noise, scattering, and blurring in their imaging model. However, they relied on a contrast enhancement turbulence mitigation method, which does not consider many characteristics of water.

In this study, we attempt to address non-uniform scatter and blurring caused by sensor movement. In the proposed underwater imaging model, we consider scatter effects, light absorption, and motion blur. We also propose methods to estimate the scatter coefficient, color distortion coefficient, and transmission map. The following sections describe the processing details.

3 Proposed method

3.1 Imaging model

The Koschmieder model [19], which describes the atmospheric effects of weather on an observer, is used in the proposed method. In turbid water, radiance from a scene point is attenuated along the line of sight, and ambient light is scattered towards the observer (Fig. 1). In addition, camera motion during operation typically results in blurring. Therefore, the Koschmieder model has been adapted to underwater imaging conditions. The adapted model can be expressed as

$$ {I}_c={I}_c^{\infty }(z)\rho T(d)+{I}_c^{\infty }(z)\left(1-T(d)\right)+n $$
(1)

where the transmission map is \( T(d)={e}^{-\beta \left(\lambda \right)d} \), \( {I}_c^{\infty }(z) \) is the ambient light intensity at depth z, ρ is the normalized radiance of a scene point, d is the distance from the scene point to the camera, β(λ) is the total nonlinear beam attenuation coefficient at wavelength λ, n is the noise, and c is the color channel. The ambient illumination at depth z is subject to light attenuation in the following form

$$ {I}_c^{\infty }(z)=H\ast \left(\alpha {I}_c^0{e}^{-\beta \left(\lambda \right)z}\right) $$
(2)

where \( {I}_c^0 \) is the atmospheric intensity at the surface of the water, H is the degradation matrix, and α is the transmission coefficient at the surface of the water. Most previous research [16, 18, 19, 38] has assumed that the total nonlinear beam attenuation coefficient, normalized radiance, and transmission coefficient are constant. In real ocean waters, the scattering and absorption coefficients are non-negligible and must therefore be approximated as functions of wavelength. The proposed pipeline is shown in Fig. 2. In the following sections, we introduce the information required to solve Eq. (1); a toy simulation of the forward model is sketched below.
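To make the adapted model concrete, the following sketch simulates Eqs. (1) and (2) on synthetic data. This is a minimal sketch only: the per-channel β(λ) values, the omission of the degradation matrix H, and all variable names are illustrative assumptions, not values from this paper.

```python
import numpy as np

def ambient_light(I0, alpha, beta, z):
    """Eq. (2) without the degradation matrix H: ambient intensity at depth z."""
    return alpha * I0 * np.exp(-beta * z)

def underwater_image(rho, d, I0, alpha, beta, z, noise_sigma=0.01):
    """Forward model of Eq. (1): attenuated radiance + scattered ambient light + noise."""
    T = np.exp(-beta * d)                      # transmission map T(d) = e^{-beta(lambda) d}
    I_inf = ambient_light(I0, alpha, beta, z)  # ambient light at depth z
    n = np.random.normal(0.0, noise_sigma, rho.shape)
    return I_inf * rho * T + I_inf * (1.0 - T) + n

# Toy example: a 64x64 scene with per-channel attenuation (red decays fastest in clear water).
rho = np.random.rand(64, 64, 3)        # normalized scene radiance
d = np.full((64, 64, 1), 2.0)          # scene-to-camera distance (m)
beta = np.array([0.6, 0.1, 0.07])      # illustrative beta(lambda) per RGB channel
I = underwater_image(rho, d, I0=1.0, alpha=0.9, beta=beta, z=5.0)
```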

Fig. 1 Underwater imaging model

Fig. 2 Example of the proposed approach to recover images in non-uniform turbid water

3.2 Estimating coefficients

3.2.1 Scattering coefficient β(λ) estimation

To estimate the scattering coefficient, we use the CHD to measure the amount of scatter between patches at different positions. The CHD is modeled as an ellipsoid characterized by a sample mean μ and a sample covariance Σ. The mean of a scattered patch is scaled by the transmission and shifted by the ambient light using Eq. (1):

$$ {\mu}_{I_c}={e}^{-\beta \left(\lambda \right)d}\cdot {\mu}_{I_c^{\infty }(z)\rho }+\left(1-{e}^{-\beta \left(\lambda \right)d}\right)\cdot {I}_c^{\infty }(z) $$
(3)

The covariance is scaled by squared transmission as follows.

$$ {\sum}_{I_c}={\left[{e}^{-\beta \left(\lambda \right)d}\right]}^2\cdot {\sum}_{I_c^{\infty }(z)\rho } $$
(4)

The spread and volume of a cluster indicate how much haze is present in a patch with respect to distance. Since scattering is homogeneous within a small patch, the sizes of the two clusters depend only on the corresponding distances. The attenuated variances of two patches in a circular window are described as follows:

$$ {\sum}_{I_{c,1}}={t}_1^2\cdot {\sum}_{I_c^{\infty }(z)\rho } $$
(5)
$$ {\sum}_{I_{c,2}}={t}_2^2\cdot {\sum}_{I_c^{\infty }(z)\rho } $$
(6)

where \( t={e}^{-\beta \left(\lambda \right)d} \). Thus, the estimated scattering coefficient β(λ) is computed using the relationship of transmission with distance as follows:

$$ \beta \left(\lambda \right)=\frac{\ln \left({t}_1^2/{t}_2^2\right)}{2\mid {d}_1-{d}_2\mid } $$
(7)

where \( {d}_1 \) and \( {d}_2 \) are the distances of the scene points from the camera. The series of transmissions over the entire scattered image can then be calculated as follows:

$$ {T}_i\left({d}_i\right)={e}^{-{\beta}_i\left(\lambda \right){d}_i},\kern1em i\in \left\{1,2,\cdots, N\right\} $$
(8)

where i indexes the N circular windows in the scattered image. We thus obtain transmissions for different levels of scatter. However, each transmission map corresponds to a single scatter level, so a final clear-scene transmission map is required. In the following section, we propose a method that fuses the different transmission maps into a single clean transmission map.
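As an illustration, Eqs. (5)–(8) can be implemented directly from patch statistics. The sketch below assumes that the two patches observe the same underlying surface statistics and that the nearer patch is passed first; all names are ours, not the paper's.

```python
import numpy as np

def estimate_beta(patch_near, patch_far, d_near, d_far):
    """Eq. (7): under Eqs. (5)-(6), var(patch) = t^2 * var(clear patch), so the
    variance ratio equals t1^2 / t2^2 and beta(lambda) follows from the distances.
    Assumes d_near < d_far so that the logarithm is positive."""
    ratio = np.var(patch_near) / np.var(patch_far)
    return np.log(ratio) / (2.0 * abs(d_near - d_far))

def transmission(beta, d):
    """Eq. (8): T_i(d_i) = exp(-beta_i(lambda) * d_i)."""
    return np.exp(-beta * d)
```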

3.2.2 Hierarchical transmission fusion

The transmission maps can be treated as differently focused images; thus, we introduce a multi-frame fusion method to obtain a final clean transmission map. In a frequency domain, such as the wavelet or curvelet transform domain, transmission maps are decomposed into low- and high-frequency components, so the criteria used to select appropriate low- and high-frequency coefficients are important. In this paper, the wavelet transform is applied to two images to obtain the coefficients. The coefficients are then processed at low and high frequencies prior to fusing the images. Finally, a clear transmission map is obtained by the inverse wavelet transform.

We use maximum local energy [23] to measure the low-frequency coefficients: the larger energy (in a local 3 × 3 region) of the two source images is selected as the output. Given the characteristics of human visual perception and the local correlation of the decomposition coefficients, the statistics of neighboring coefficients should be considered. Therefore, the statistics are computed over a 3 × 3 sliding window. The energy is described as follows:

$$ {E}_j\left(m,n\right)=\sum_{m^{\prime}\in M,\ n^{\prime}\in N}L\left(m+{m}^{\prime },n+{n}^{\prime}\right)\cdot {\left[{f}_j^{(0)}\left(m+{m}^{\prime },n+{n}^{\prime}\right)\right]}^2 $$
(9)

where L is the local filtering operator, M and N represent the scope of the local window, j indexes the transmission maps at different levels, \( {f}_j^{(0)}\left(\cdot, \cdot \right) \) represents the low-frequency coefficients, and m′ and n′ are offset variables.

Local energy (LE) is expressed as follows:

$$ {E}_j^{l,k}\left(m,n\right)={L}_1\ast {\left[{f}_j^{(0)}\left(m,n\right)\right]}^2+{L}_2\ast {\left[{f}_j^{(0)}\left(m,n\right)\right]}^2+\cdots +{L}_K\ast {\left[{f}_j^{(0)}\left(m,n\right)\right]}^2 $$
(10)

where \( {L}_1,{L}_2,\dots, {L}_K \) are the filter operators in K different directions, and l and k are the scale and direction of the transform, respectively.

$$ {L}_1=\left[\begin{array}{ccc}-1 & -1 & -1\\ 2 & 2 & 2\\ -1 & -1 & -1\end{array}\right],\kern0.5em {L}_2=\left[\begin{array}{ccc}-1 & 2 & -1\\ -1 & 2 & -1\\ -1 & 2 & -1\end{array}\right],\kern0.5em {L}_3=\left[\begin{array}{ccc}-1 & 0 & -1\\ 0 & 4 & 0\\ -1 & 0 & -1\end{array}\right] $$
(11)

Here, \( {C}_{j,L}^{l,k}\left(m,n\right) \), \( {C}_{j+1,L}^{l,k}\left(m,n\right) \), and \( {C}_L^{l,k}\left(m,n\right) \) denote the low-frequency coefficients of the two source images and the fused image, respectively. The proposed local energy-based fusion rule can be expressed as follows.

$$ {C}_L^{l,k}\left(m,n\right)=\left\{\begin{array}{ll}{C}_{j,L}^{l,k}\left(m,n\right), & \mathrm{if}\ {E}_j^{l,k}\left(m,n\right)\ge {E}_{j+1}^{l,k}\left(m,n\right)\\ {C}_{j+1,L}^{l,k}\left(m,n\right), & \mathrm{if}\ {E}_j^{l,k}\left(m,n\right)<{E}_{j+1}^{l,k}\left(m,n\right)\end{array}\right. $$
(12)

Image details are assumed to be contained in the high-frequency coefficients of the multi-scale domain; the typical fusion rule is therefore maximum-based, selecting the high-frequency coefficients with the maximum absolute value. Recently, measures such as energy of gradient, spatial frequency, Tenengrad, Laplacian energy, and sum-modified-Laplacian (SML) have been used. In this study, we use SML to select the high-frequency coefficients.

A focus measure attains its maximum for a perfectly focused image. Therefore, for multi-focus image fusion, the in-focus areas of the source images must produce maximum focus measures. Here, let f(x, y) be the gray-level intensity of pixel (x, y). SML [24] is defined as follows:

$$ {S}_j^{l,k}\left(m,n\right)=\sum_{m^{\prime }=-M}^M\sum_{n^{\prime }=-N}^N{\nabla}_s^2f\left(m+{m}^{\prime },n+{n}^{\prime}\right),\kern1em \mathrm{for}\ {\nabla}_s^2f\left(m,n\right)\ge T $$
(13)

where

$$ {\nabla}_s^2f\left(m,n\right)=\mid 2f\left(m,n\right)-f\left(m-1,n\right)-f\left(m+1,n\right)\mid +\mid 2f\left(m,n\right)-f\left(m,n-1\right)-f\left(m,n+1\right)\mid $$
(14)

Here, l and k are the transform scale and direction, respectively, T is a discrimination threshold value, M and N determine the window of size (2M + 1) × (2N + 1), and m′ and n′ are offset variables.

Let \( {C}_{j,H}^{l,k}\left(m,n\right) \), \( {C}_{j+1,H}^{l,k}\left(m,n\right) \), and \( {C}_H^{l,k}\left(m,n\right) \) denote the high-frequency coefficients of the two source images and the fused image, respectively. The proposed SML-based fusion rule can be expressed as follows:

$$ {C}_H^{l,k}\left(m,n\right)=\left\{\begin{array}{ll}{C}_{j,H}^{l,k}\left(m,n\right), & \mathrm{if}\ {S}_{j,H}^{l,k}\left(m,n\right)\ge {S}_{j+1,H}^{l,k}\left(m,n\right)\\ {C}_{j+1,H}^{l,k}\left(m,n\right), & \mathrm{if}\ {S}_{j,H}^{l,k}\left(m,n\right)<{S}_{j+1,H}^{l,k}\left(m,n\right)\end{array}\right. $$
(15)

where l and k are the transform scale and direction, respectively. After inverse frequency domain transformation, we obtain the clean transmission map T(d).
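A minimal two-map, single-level realization of this fusion is sketched below with PyWavelets. The paper does not specify the wavelet basis or decomposition depth, so the Haar ('db1') wavelet and a single level are our assumptions; border handling via np.roll in the SML is likewise a simplification.

```python
import numpy as np
import pywt
from scipy.ndimage import correlate, uniform_filter

def local_energy(coeffs, kernels):
    """Eq. (10): sum of directionally filtered squared low-frequency coefficients."""
    sq = coeffs ** 2
    return sum(correlate(sq, k, mode='nearest') for k in kernels)

def sml(coeffs):
    """Eqs. (13)-(14): sum-modified-Laplacian accumulated over a 3x3 window."""
    f = coeffs
    lap = (np.abs(2 * f - np.roll(f, 1, 0) - np.roll(f, -1, 0))
           + np.abs(2 * f - np.roll(f, 1, 1) - np.roll(f, -1, 1)))
    return uniform_filter(lap, size=3) * 9   # mean * 9 = sum over the 3x3 window

def fuse_transmissions(t1, t2, wavelet='db1'):
    """One-level wavelet fusion of two transmission maps: max local energy for the
    low band (Eq. 12), max SML for the high bands (Eq. 15)."""
    L1 = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]], float)   # Eq. (11)
    L2 = L1.T
    L3 = np.array([[-1, 0, -1], [0, 4, 0], [-1, 0, -1]], float)
    cA1, highs1 = pywt.dwt2(t1, wavelet)
    cA2, highs2 = pywt.dwt2(t2, wavelet)
    eA1, eA2 = (local_energy(c, [L1, L2, L3]) for c in (cA1, cA2))
    cA = np.where(eA1 >= eA2, cA1, cA2)
    highs = tuple(np.where(sml(h1) >= sml(h2), h1, h2)
                  for h1, h2 in zip(highs1, highs2))
    return pywt.idwt2((cA, highs), wavelet)
```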

3.2.3 Scene radiance ρ recovery

Assume that the image is captured under ideal conditions without noise and scatter. Thus, the non-linear estimation problem can be written as follows.

$$ {I}_c^I={I}_c^{\infty }(z)\left(1-T(d)\right),\kern1em \forall {I}_c^I\le {I}_{1\%}^I $$
(16)

We can use a region-based approach to minimize the above equation [25]. Once β(λ), \( {I}_c^{\infty }(z) \), and T(d) are estimated, the scene radiance of all image pixels is recovered as follows.

$$ \rho =1+\left(\frac{I_c}{I_c^{\infty }(z)}-1\right){T}^{-1}(d) $$
(17)
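In code, Eq. (17) is a pixel-wise operation. In the sketch below, the lower clamp on T(d) is our own safeguard (common in de-hazing) rather than part of the paper's formulation.

```python
import numpy as np

def recover_radiance(I, I_inf, T, t_min=0.1):
    """Eq. (17): rho = 1 + (I / I_inf - 1) / T(d).
    t_min bounds the transmission from below to avoid amplifying noise (our addition)."""
    return 1.0 + (I / I_inf - 1.0) / np.maximum(T, t_min)
```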

3.3 Color distortion coefficient α estimation

We take irradiance measurements with an upward-looking spectrometer at the surface and beneath the water at a measured depth. Thus, we can derive the attenuation coefficient c = a + b, where a is the absorption coefficient and b is the scattering coefficient at depth z. After determining the attenuation coefficient, we can estimate the incident irradiance \( {E}_{\varepsilon } \) at depth z using the Lambert-Beer equation:

$$ {E}_{\varepsilon }={E}_s{e}^{- cz} $$
(18)

where \( {E}_s \) is the irradiance at the surface of the water. The chromatic transfer function τ(λ), which describes how light from the surface of the water changes with depth, can be calculated as follows.

$$ \tau \left(\lambda \right)=\frac{E_s\left(\lambda \right)}{E_{\varepsilon}\left(\lambda \right)} $$
(19)

Using the spectral response of the RGB camera, we convert the chromatic function to the RGB domain as follows:

$$ \alpha =\sum^k\tau \left(\lambda \right){C}_c\left(\lambda \right) $$
(20)

where α is the color distortion coefficient, \( {C}_c\left(\lambda \right) \) is the underwater spectral characteristic function for color band c, and k is the number of discrete bands of the camera's spectral characteristic function.
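Given spectrometer readings, Eqs. (18)–(20) reduce to a few array operations. The following sketch assumes discrete 10 nm bands and uses an invented Gaussian red-channel response and attenuation curve purely for illustration; none of these values come from the paper.

```python
import numpy as np

def color_distortion_coefficient(E_s, c, z, C_c):
    """Eqs. (18)-(20): propagate surface irradiance to depth z via Lambert-Beer,
    form tau(lambda) = E_s / E_eps, and project onto one channel's response C_c."""
    E_eps = E_s * np.exp(-c * z)   # Eq. (18), per discrete wavelength band
    tau = E_s / E_eps              # Eq. (19)
    return np.sum(tau * C_c)       # Eq. (20), summed over the k bands

# Toy example over k = 31 bands (400-700 nm, 10 nm steps); all values illustrative.
lam = np.arange(400.0, 710.0, 10.0)
E_s = np.ones_like(lam)                                  # flat surface spectrum
c = 0.05 + 0.002 * (700.0 - lam) / 300.0                 # invented c(lambda) = a + b
C_r = np.exp(-((lam - 600.0) ** 2) / (2 * 30.0 ** 2))    # hypothetical red-channel response
alpha_r = color_distortion_coefficient(E_s, c, 3.0, C_r)
```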

3.4 De-blurring

Because we use a patch-based de-scattering method, ringing artifacts and additional noise remain in the de-scattered image. Thus, we use centralized sparse representation (CSR) for image de-blurring. Wang et al. [26] reviewed recent image de-blurring developments, such as non-blind or blind de-blurring and spatially invariant or variant de-blurring techniques. The authors concluded that the method proposed by Dong et al. [27] overcomes the common issue of non-local strategies, i.e., the loss of local smoothness. Thus, we use this method to recover the blurred image.

For a \( \sqrt{n}\times \sqrt{n} \) patch \( {x}_i \) of the blurred image at location i, given a PCA dictionary Φ generated as in [27], each patch can be coded sparsely as \( {x}_i\approx \Phi {\delta}_i \) using a sparse coding algorithm [28]. The blurred image can then be represented sparsely by the set of sparse codes \( \left\{{\delta}_i\right\} \). The CSR model is expressed as follows:

$$ {\delta}_y=\underset{\delta }{\arg \min}\left\{{\left\Vert y-H\Phi \circ \delta \right\Vert}_2^2+\sum_i\frac{2\sqrt{2}{\sigma}_n^2}{\sigma_i}{\left\Vert {\delta}_i\right\Vert}_1+\sum_i\frac{2\sqrt{2}{\sigma}_n^2}{\sigma_i}{\left\Vert {\theta}_i\right\Vert}_1\right\} $$
(21)

where \( {\sigma}_n \) is the standard deviation of the additive Gaussian noise, \( {\theta}_i \) is the i-th element of the sparse coding noise (SCN) signal θ, and \( {\sigma}_i \) is the standard deviation of \( {\delta}_i \).
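For intuition, the l1-regularized part of Eq. (21) can be minimized with iterative shrinkage. The sketch below is a heavily simplified ISTA loop that omits the centralization (SCN) term and treats H and Φ as plain matrices; it is not Dong et al.'s algorithm, and the step size must satisfy step < 1/‖HΦ‖² for convergence.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm: element-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_deblur(y, H, Phi, lam, step, n_iter=200):
    """Minimal ISTA for  min_delta ||y - H Phi delta||_2^2 + lam ||delta||_1.
    Eq. (21) additionally centralizes the codes with the SCN term, omitted here."""
    A = H @ Phi
    delta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ delta - y)   # gradient of the data-fidelity term
        delta = soft_threshold(delta - step * grad, step * lam)
    return Phi @ delta                        # de-blurred estimate
```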

4 Experiments and discussion

4.1 Experimental setup

Our experimental setup is shown in Fig. 3. A camera and LED lights were placed in the water. A fluorescent lamp was placed 20 cm from a 180 L glass aquarium, and all sides of the aquarium were covered with black textile to avoid glass reflection. The objects were placed approximately 30 cm in front of the camera. First, we captured a ground-truth image without sediment. Then, we increased the turbidity of the water using deep-sea soil, producing a scale of ten turbidity steps ranging from 1 mg/L to 500 mg/L. Most studies [29, 30] used milk or a mixture of milk and grape juice to simulate turbid water. This may be appropriate for natural ocean water; however, deep-sea soil may be much better for simulating the sea-bottom environment observed by underwater robots.

Fig. 3 Experimental setup (OLYMPUS uTough 8000 underwater camera, INON LE700-W/S LED lights, and SLIK SBH-320DS PTZ)

4.2 Water tank simulation

In this experiment, we added 18 g of deep-sea soil to the water tank. The deep-sea soil was dropped from the top of the water tank. Note that deep-sea soil is not distributed homogeneously or uniformly in water. This can simulate the operational environment of an underwater robot. The OLYMPUS camera captured a non-uniform scattered image, as shown in Fig. 4b. Then, we used conventional de-scattering methods to process the captured image.

Fig. 4 Comparison of underwater image restoration methods in a water tank

As shown in Fig. 4, the DCP [10] de-scattering method can remove some scatter. However, its ambient light is estimated by pixel-value sorting, so the depth map is incorrect; as a result, over-de-scattering occurs, and the information in the lower right corner of the image is lost. Nicholas et al. [15] improved the DCP method and used GraphCut segmentation to refine the depth map in each color channel, but Fig. 4d shows that even this post-processing cannot fully avoid incorrect depth map estimation. Chiang et al. [16] proposed a physical imaging model and a corresponding enhancement method. However, their method does not consider artificial light and camera characteristics, and its DCP-based depth map also results in incorrect estimation. In 2014, Fattal [9] used the color-lines method to estimate the ambient light and achieved successful de-scattering; however, the roughly estimated depth map can cause unsatisfactory results (Fig. 4f). The median DCP method was the first de-scattering method applicable to real applications, but it is unsuitable for underwater image processing because it does not consider the influence of the medium. The Wiener de-hazing method [22] uses a Wiener filter to select the patch size. However, it selects a fixed patch size for the image and cannot remove inhomogeneous scatter, so some information is missing from the result. The exposure fusion enhancement method [17] was the first to use HDR technology to de-haze images; note that, in many cases, contrast enhancement does not perform well. In contrast, the proposed method achieves the best performance for all metrics, as shown in Table 1. In this paper, we use the structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and average E [34] to measure image quality; Fig. 5 shows the SSIM results of the different restoration methods.

Table 1 Comparative analysis of different underwater de-hazing methods shown in Fig. 4
Fig. 5 Comparison of SSIM values of different restoration methods

Table 2 Comparative analysis of different underwater de-hazing methods shown in Fig. 6

As shown in Fig. 5, the SSIM values decrease as the turbidity increases. In addition, the DCP-based methods (i.e., DCP, DCP with graph cut, etc.) outperform the other methods (e.g., exposure fusion). The conventional DCP method [10] is not robust against non-uniformly scattered images. The color-lines method [9] cannot be applied to high-turbidity images because the color lines are difficult to compute. The MDCP [31] and Wiener de-hazing [22] methods outperform the other conventional methods. However, some scatter remains after MDCP processing, and the Wiener de-hazing method causes color shifts or distortion. In contrast, the proposed method outperforms all other methods and can preserve colors while removing scatter.
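The SSIM and PSNR scores reported in Tables 1 and 2 and Fig. 5 can be reproduced with standard tools. Below is a minimal sketch using scikit-image, assuming float RGB images scaled to [0, 1]; it computes the metrics only, not the average E measure of [34].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_scores(reference, restored):
    """SSIM and PSNR of a restored image against the turbidity-free ground truth."""
    ssim = structural_similarity(reference, restored, channel_axis=-1, data_range=1.0)
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    return ssim, psnr
```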

4.3 Real-world simulation

In this experiment, we set real objects in water and added turbidity to the water. The restored images are shown in Fig. 6. The most recent methods, namely the glow and light colors method [32] and the color attenuation prior method [33], were used for comparison. The results show that the color-lines method [9] and the proposed method perform well, although the color-lines method exhibits greater color distortion. The proposed method demonstrates the best performance among the recent methods (Table 2).

Fig. 6 Real-world comparison of underwater image restoration methods

4.4 Object segmentation

Here, we examine the results of level set segmentation [35]. Image segmentation is a basic operation in object recognition, and to the best of our knowledge, the level set method is robust against low-quality images. Note that image preprocessing methods affect segmentation. In this experiment, we compared the level set-based segmentation results obtained with different restoration methods (Fig. 7). All parameters were the same in each experiment: the segmentation step size was 0.3, and the number of iterations was 500.

Fig. 7 Comparison of underwater image restoration methods by fast level set segmentation

4.5 Classification

In the fourth experiment, to verify the utility of the proposed method, we compared the classification accuracy of common classification methods. The results are shown in Fig. 8. In this experiment, we used 7330 images from the Japan Agency for Marine-Earth Science and Technology database. The images were classified manually into four classes (squid, crab, shark, and minerals). We selected 5330 images for training and 2000 images for testing.

Fig. 8 Comparison of the effectiveness of the proposed method in conventional classification methods

For the classification stage, we used a recent state-of-the-art image classification approach based on deep learning [36]. The network architecture used in this experiment, GoogLeNet, was proposed by Szegedy et al. [37] and won ILSVRC 2014. The main idea behind GoogLeNet is the inception layer, which combines information from multiple scales and significantly reduces the number of parameters.
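A minimal fine-tuning sketch for this setup is given below in PyTorch. It is not the authors' training code; the optimizer, learning rate, and input size are illustrative assumptions, and only the classifier head is adapted to the four classes.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained GoogLeNet with the classifier head replaced
# for the four classes used here (squid, crab, shark, minerals).
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step; images is an (N, 3, 224, 224) float batch."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```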

As shown in Fig. 8, the proposed method improves all of the popular classification algorithms; the average accuracy was improved by approximately 2.1%. Thus, we conclude that the proposed method works well and can be combined with deep learning-based methods such as GoogLeNet.

5 Conclusion

In this paper, we have proposed a hierarchical transmission fusion method and a color-lines ambient light estimation method for restoring high-turbidity, inhomogeneous underwater images from a single input image. This work makes three primary contributions. First, we proposed a hierarchical transmission fusion method to estimate the transmission map of an inhomogeneous underwater image. Second, we considered scattering and blurring effects in underwater imaging and proposed a corresponding framework to recover distorted images. Third, we built a large deep-sea image dataset for underwater robots. We also compared the performance of the proposed method with state-of-the-art methods using image quality assessment indexes and post-processing tasks, such as image segmentation and image classification. The experimental results demonstrate that the proposed method's accurate estimation of the depth map and ambient light can recover visually pleasing images with sharp details. In future work, we will focus on high-turbidity non-uniform scatter and vignetting problems.