1 Introduction

The major portion of the Earth's surface is covered with water, yet it remains the region least explored by human beings. Captured underwater images support various fields such as aquatic-life monitoring and tracking. However, captured images degrade due to underwater physical properties such as scattering, attenuation and low light. Thus, underwater image enhancement (UIE) methods are employed to extract information by enhancing degraded underwater images.

Light propagating underwater suffers from scattering by suspended particles, resulting in diminished underwater image quality [1]. In the past, several UIE approaches were proposed to improve degraded underwater images using the physical underwater image formation model (UIFM) [2, 3]. However, these methods fail to enhance diverse types of underwater images because the UIFM-based formulations do not consider the scattering effect.

Learning-based approaches have recently shown great performance in digital image processing and vision-based applications. Attracted by these advantages, underwater researchers have employed neural networks for enhancement. Cai et al. [4] introduced CURE-Net, which recovers attenuated color using three cascaded sub-networks. In contrast, Sun et al. [5] presented UMGAN, based on a feedback mechanism and a noise-reduction network. Fu et al. [6] introduced a learning-based method that employs both the global and local information of the input image to enhance color and contrast.

The above-discussed learning-based methods do not perform well on both synthetic and real underwater images. Moreover, the architectures of learning-based UIE methods are complicated and introduce artefacts after enhancement. To resolve these issues, F2UIE is proposed, a feature-transfer-based end-to-end convolutional neural network. Its straightforward training procedure also makes it a lightweight network.

Motivation:

Research in the underwater world with the help of UIE methods has various applications, as follows [7]:

  1. (a)

    Monitoring marine organisms, including flora and fauna.

  2. (b)

    Analyzing sea-beds, submarine pipes and shipwrecks.

  3. (c)

    Assessing water quality for a healthy underwater environment.

  4. (d)

    Conducting surveillance of marine activities with the help of autonomous underwater vehicles (AUVs).

Existing UIE methods face issues in enhancing images affected by low brightness, color-cast, low contrast and haze, which degrade their performance. Due to the attenuation and absorption phenomena, low brightness and color-cast are observed in the captured image; the level of attenuation and absorption depends on the wavelength. Blurring occurs when light scatters between an object and the camera, resulting in low contrast. Another problem is the haze effect caused by undersea particles.

No existing UIE method performs well for diverse underwater images. Existing approaches only partially remove color-cast and fail to recover the attenuated red channel. Thus, a feature transfer-based CNN (F2UIE) method is proposed to deal with degraded underwater images captured under different environmental conditions.

Moreover, learning-based methods proposed in the past did not train their models on both real and synthetic datasets; F2UIE is therefore trained on both types. The neural network must be precisely trained and optimized to obtain better enhancement results. To achieve this, a loss function, namely the underwater image enhancement loss (UIEL), is introduced for optimizing the F2UIE parameters.

The primary contributions of the paper are mentioned below:

  • The F2UIE (Feature Transfer-based Convolutional Neural Network) performs "end-to-end" feature extraction to obtain detailed information from underwater images. The network generates two confidence maps by processing the input image with two techniques: White Balancing (WB) and Contrast Limited Adaptive Histogram Equalization (CLAHE). Subsequently, the outputs of the neural network and the learning units are combined to produce an enhanced underwater image, leveraging the extracted features for improved visual quality.

  • To optimize the training process of the F2UIE and address the challenges associated with underwater image enhancement, an optimized UIEL (Underwater Image Enhancement Loss) function is introduced. The UIEL loss function serves multiple purposes: first, it enables effective training of the F2UIE by minimizing the discrepancy between the enhanced images and the reference images. Second, it facilitates the enhancement of pixel-level information by encouraging the network to preserve important details during the enhancement process. Lastly, the UIEL loss function helps to mitigate unwanted noise artefacts that may arise as a result of the enhancement, ensuring that the final output is visually appealing and noise-free.

2 Background study of UIE methods

The quality of captured underwater images relies on the level of absorption, scattering, refraction and the depth of water. The absorption ratio varies greatly with wavelength [8]. Figure 1 shows that red light is absorbed at small depths whereas green and blue light travel deeper; as a result, captured underwater images usually appear bluish or greenish. Figure 2 illustrates the process of image acquisition in the underwater world. The attenuation effect is observed due to absorption, while the scattering effect limits contrast and visibility [9, 10].

Underwater image enhancement is a challenging problem owing to the characteristics of the underwater environment, including low visibility, color distortion, and poor contrast. Researchers have explored various techniques to improve the quality of underwater images. In this literature review, we categorize the methods into three main categories: neural network-based methods, conventional methods, and fusion-based methods. We present an overview of the latest advancements in each category, highlighting their principles, advantages, and limitations.

Fig. 1

The representation of the effect of underwater color absorption [7]

Fig. 2

The step-by-step process of underwater image formation based on different environmental factors [7]

  1. (i)

    Neural Network-based Methods: In recent years, there has been a growing focus on improving the quality of underwater images through learning-based methods, which have shown promising results in solving various visual tasks [7, 11]. However, one of the main challenges faced by these methods is the requirement for large-scale pairs of clear-degraded underwater images for supervised training, which can be difficult to obtain in complex underwater scenes. Li et al. [12] developed WaterGAN, which generates a synthetic underwater training dataset and drives a two-stage network for depth estimation and color restoration. Li et al. [13] combined a physical model with the optical properties of underwater scenes to synthesize diverse water-type datasets and trained a lightweight underwater convolutional neural network (UWCNN) to enhance each scene type. Li et al. [14] proposed a weakly supervised underwater color transfer network that allows images to be captured at unknown locations. Ye et al. [15] introduced an unsupervised adaptation network to jointly address underwater depth estimation and color correction from monocular underwater images. To improve the generalization of CNNs, Li et al. [16] curated a real-world underwater image benchmark (UIEB) dataset and developed an underwater image enhancement network called Water-Net. Building upon the UIEB dataset, Li et al. [16, 17] proposed Ucolor, a method that improves the visual quality of underwater images through medium transmission-guided multi-color space embedding. Lin et al. [18] introduced a multiscale deformable convolution network with an attention mechanism for underwater image enhancement. Fabbri et al. [19] utilized CycleGAN to create a training set without paired data for enhancing underwater imagery, while Yu et al. [20] introduced a conditional GAN with perceptual loss for underwater color correction. Chen et al. [21] developed a unified network called HybridDetectionGAN, which pairs an enhancement model with a detection perceptor in each branch to generate detection-favoring images. However, these methods may struggle to handle extreme distortions in underwater images when given a single input. In summary, current state-of-the-art UIE methods suffer from limitations in their network training, resulting in suboptimal quality enhancement: they have primarily been trained on similar types of datasets, which restricts their effectiveness on diverse underwater scenes and variations. In contrast, the proposed F2UIE addresses this limitation by leveraging a diverse range of datasets during training. By incorporating various types of underwater images into the training pipeline, the F2UIE model becomes more versatile and capable of enhancing image quality across different underwater environments, which improves its generalization and adaptability relative to existing methods.

  2. (ii)

    Conventional Methods: Conventional methods for UIE rely on physical scattering models and manually crafted priors derived from statistical information to recover the original pixel intensity. Such prior-based methods, including the underwater dark channel prior (UDCP) [22], red-saturation prior [23], blurriness prior (BP) [24], minimum information loss prior [25], haze-lines prior [26] and general dark channel prior [27], have shown success in enhancing underwater images. However, they are often sensitive to specific underwater scenes and lack robustness when dealing with severely degraded real-world underwater images. For example, the UDCP fails when applied to white scene objects, and the BP does not perform well on clear underwater images. Moreover, the simple physical models employed by these methods may not cover the full range of underwater scenes, and the quality of the enhancement results heavily relies on the accuracy of the physical model itself. To address these limitations, Akkaynak et al. [28] proposed revised underwater image formation models that consider different scattered-signal dependencies, and Zhou et al. [29] introduced backscatter pixel priors and color-cast restoration methods to deal with blurriness and color degradation. In summary, conventional UIE methods address the scattering and absorption parameters by incorporating additional constraints to compensate for information loss, but these hand-crafted priors may lack robustness and struggle to handle all challenging scenarios. In contrast, the proposed F2UIE does not rely on a physical model to compensate for information loss; instead, it leverages its learning capability to enhance underwater images across various scenarios, regardless of the level of degradation. By eliminating the dependency on explicit physical models, F2UIE exhibits improved adaptability and effectiveness in handling diverse underwater conditions.

  3. (iii)

    Fusion-based Methods: Fusion-based methods combine multiple input images or their processed versions to improve the overall quality of underwater images [30]. These methods aim to address challenges such as color distortion, contrast enhancement, and preservation of information. Fusion can occur in different domains, including the spatial and frequency domains, and involves combining different components of the input images to achieve the desired enhancement. Li et al. [31] employed a minimum information loss and histogram distribution prior to enhance contrast and correct color distortion. Fu et al. [32] adopted a “two-step” strategy to correct color-cast and enhance contrast, but this approach led to noisy results and residual color cast. To overcome these limitations, Ancuti et al. [19] derived both a contrast-enhanced version and a color-corrected version from the original input and blended them using weights based on “Laplacian contrast,” “local contrast,” “saliency,” and “exposedness” in a multi-scale fusion process. Later, Ancuti et al. [33] proposed a fusion strategy that directly blends a sharpened version and a gamma-corrected version derived from a color-corrected and white-balanced version of the original underwater image, using “Laplacian,” “saliency,” and “saturation” as fusion weights. Another approach, L2UWE, introduced by Marques et al. [34], creates two models to generate processed versions of the input and employs a multi-scale fusion scheme for underwater image enhancement. Zhuang et al. [35] developed a Bayesian retinex algorithm to enhance the visual quality of underwater images; however, this approach is unable to preserve information, particularly for images captured under artificial illumination. In summary, existing fusion-based UIE approaches suffer from artifacts, noisy outcomes, unrealistic color tones, and difficulties in preserving detail and correcting color distortion. To overcome these limitations, this paper leverages the strengths of both learning-based and fusion-based methods to enhance the overall quality of degraded underwater images, aiming for a more effective and robust UIE method.

3 F2UIE methodology

Existing underwater image enhancement methods often struggle to handle the wide range of variations in the underwater environment and lighting conditions. However, the fusion-based method [19] has shown promising results by incorporating multiple pre-processing steps and a fusion technique. Despite the rapid development of UIE approaches, deep learning-based algorithms still lag behind traditional techniques in terms of generalization, primarily due to the scarcity of diverse training data and the need for well-designed network architectures.

In this research paper, an innovative approach is proposed, the Feature Transfer-based Convolutional Neural Network (F2UIE), which is inspired by the fusion approach and builds on a convolutional neural network (CNN). Figure 3 illustrates the comprehensive framework of the proposed F2UIE method, showcasing its main phases: pre-processing, training, and testing. In the pre-processing phase, we divide the input dataset into training and testing sets. For data augmentation, we apply operations such as rotation, color transformation, and shifting to the training data, enhancing its diversity and robustness. In the training phase, we train a Multi-Stack CNN using the pre-processed data and carefully selected hyper-parameters. This enables the network to learn the underlying patterns and features specific to underwater images. Finally, in the testing phase, we employ the trained Multi-Stack CNN model to generate enhanced images, effectively improving the visibility and overall quality of underwater images.

By integrating the strengths of the fusion approach and feature-based CNN, the proposed F2UIE method aims to address the limitations of existing UIE methods and offer a more efficient and effective solution for enhancing underwater images.

Fig. 3

Schematic diagram of the F2UIE framework for image enhancement

3.1 Input dataset

F2UIE is trained using real and synthetic underwater datasets. After investigating a wide range of underwater image datasets, three benchmark datasets were selected for training the network. The details of the datasets are given below:

  • Enhancement of underwater visual perception (EUVP) [36] The EUVP includes 12,000 paired and 8,000 unpaired images of degraded and high visual quality. The images were obtained during oceanic explorations under varied visibility conditions. Real-world images were degraded using a CycleGAN-based underwater distortion model to generate the paired images.

  • Underwater generative adversarial network (UWGAN) [37] The UWGAN comprises 15,000 degraded underwater images. The dataset is generated using a generative adversarial network that produces synthetic images along with their ground truth. The images exhibit the color-cast and haze properties of the underwater environment.

  • Real-world underwater image enhancement (RUIE) [38] The RUIE is collected from the real-world underwater environment and reflects the properties of complex underwater scenes. It consists of 4,230 images. The dataset includes marine animals (such as urchins and scallops) and exhibits issues such as color-cast, haze, and limited lighting.

3.2 Pre-processing phase

During the pre-processing phase, the input dataset is divided into training and testing datasets with a ratio of 70:30. A set of 37,961 images is utilized for the training phase of F2UIE, while the remaining 16,269 images are reserved for evaluating the performance of the F2UIE model.

In addition, to enhance the diversity and robustness of the training data, various data augmentation techniques are applied. These techniques involve resizing, flipping, and rotation operations. Considering the limitations of memory resources, the training data is resized to a dimension of 256 \(\times \) 256 pixels.

By employing this pre-processing phase, we ensure a proper division of the dataset for training and testing purposes. Moreover, the augmentation techniques applied to the training data enhance its variability, thereby allowing the F2UIE model to learn and generalize better. The resizing operation helps manage the memory constraints while still preserving the essential visual information necessary for the subsequent stages of the F2UIE model.
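For illustration, a minimal Python sketch of this split-and-resize step is given below. The folder layout, file extension, and helper name are assumptions introduced only for this sketch; the actual pipeline handles paired images per dataset.

```python
import random
from pathlib import Path

import cv2  # image reading and resizing


def split_and_resize(image_dir, out_size=(256, 256), train_ratio=0.7, seed=42):
    """Split an image folder 70:30 and resize every image to 256x256.

    The folder layout and file extension are illustrative; the original
    pipeline may organise the paired datasets differently.
    """
    paths = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)
    n_train = int(train_ratio * len(paths))
    train_paths, test_paths = paths[:n_train], paths[n_train:]

    def load(path_list):
        return [cv2.resize(cv2.imread(str(p)), out_size) for p in path_list]

    return load(train_paths), load(test_paths)
```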

Table 1 The values of parameters used to train the Multi-stack CNN

3.3 Training phase

In the training phase, F2UIE employs a Multi-Stack CNN that is trained using the pre-processed images obtained with CLAHE and WB together with their respective raw images, resulting in predicted confidence maps. The predicted confidence maps capture the important features of the underwater image. The outstanding results of fusion techniques [19] motivated us to employ a fusion-based approach: the Multi-Stack CNN is a convolutional neural network that serves as the baseline model and is built around this fusion strategy. The hyper-parameter settings used to train the Multi-Stack CNN are listed in Table 1.

3.3.1 Data augmentation

Collecting a comprehensive and diverse underwater dataset poses significant challenges due to the complex nature of underwater scenes and the degradation caused by ecological factors. The inherent similarity among captured underwater images within a specific scene limits the variability of the dataset. To address this limitation and improve the effectiveness of underwater image enhancement algorithms, data augmentation techniques are employed to enrich the dataset with additional data points generated through subtle modifications of the existing samples.

Rotation, color transformation, and shift operations are utilized as augmentation techniques. Rotation modifies the orientation and perspective of the input images, enhancing the diversity and richness of the dataset, and is computed using (4). Color transformations, on the other hand, allow for adjustments in brightness, contrast, and hue, introducing different color patterns within the images and further augmenting the dataset. Additionally, shift operations introduce slight spatial modifications, contributing to the exploration of various scene compositions.

For the color transformation, each pixel of the underwater image is represented by its three color channels (red, green, and blue), as shown in (1). Perturbing this representation not only aids in mitigating the similarity issue among captured images but also facilitates the development of more efficient and accurate algorithms for underwater image enhancement.

$$\begin{aligned} I_{x y}=\left[ r_{x y}, g_{x y}, b_{x y}\right] \end{aligned}$$
(1)

where \(r_{xy}\), \(g_{xy}\), and \(b_{xy}\) are the components of the pixel along the red, green and blue direction vectors, respectively. These values are computed using (2)

$$\begin{aligned}&r_{x y}=m_{r} \lambda _{r}, \nonumber \\&g_{x y}=m_{g} \lambda _{g} \nonumber \\&b_{x y}=m_{b} \lambda _{b} \end{aligned}$$
(2)

where \(m_{r}\), \(m_{g}\) and \(m_{b}\) are the direction vectors of the red, green and blue color channels, respectively. Further, \(\beta \) is a random variable with mean 0 and variance 0.1 that is introduced into the transformation function, as shown in (3).

$$\begin{aligned} I_{x y}=\left[ m_{r}, m_{g}, m_{b}\right] \left[ \beta _{r} \lambda _{r}, \beta _{g} \lambda _{g}, \beta _{b} \lambda _{b}\right] ^{T} \end{aligned}$$
(3)

Then, rotation is performed as shown in (4)

$$\begin{aligned} \left\{ \begin{array}{l} X^{\prime }=X_{i} \cos \theta _{1}-Y_{i} \sin \theta _{1} \\ Y^{\prime }=X_{i} \sin \theta _{1}+Y_{i} \cos \theta _{1} \end{array}\right. \end{aligned}$$
(4)

where \(\left( X^{\prime }, Y^{\prime }\right) \) are the rotated and transformed coordinates and \(\theta _{1}\) is the angle of rotation.

Then, the shift transformation is performed as shown in (5)

$$\begin{aligned} \left\{ \begin{array}{l} X^{\prime }=X_{i}+Y_{i} \tan \theta _{2} \\ Y^{\prime }=X_{i} \tan \theta _{2}+Y_{i} \end{array}\right. \end{aligned}$$
(5)

where \(\theta _{2}\) is the angle of shifting.
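The three augmentation operations can be sketched as follows. Reading \(m\) and \(\lambda \) in Eq. (2) as the eigenvectors and eigenvalues of the RGB channel covariance is one reasonable interpretation, and the function names and OpenCV-based warps are illustrative choices rather than the exact implementation used for F2UIE.

```python
import cv2
import numpy as np


def pca_color_jitter(img, var=0.1):
    """Perturb colors along the RGB eigen-directions (Eqs. (1)-(3)).

    beta has mean 0 and variance `var` (0.1 in the text); treating m and
    lambda as eigenvectors/eigenvalues of the channel covariance is an
    interpretation of Eq. (2).
    """
    flat = img.reshape(-1, 3).astype(np.float64) / 255.0
    cov = np.cov(flat, rowvar=False)               # 3x3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # lambda_{r,g,b} and m_{r,g,b}
    beta = np.random.normal(0.0, np.sqrt(var), 3)  # beta_r, beta_g, beta_b
    shift = eigvecs @ (beta * eigvals)             # [m_r, m_g, m_b][beta*lambda]^T
    out = np.clip(flat + shift, 0.0, 1.0)
    return (out.reshape(img.shape) * 255).astype(np.uint8)


def rotate(img, theta1_deg):
    """Rotate about the image centre (Eq. (4))."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), theta1_deg, 1.0)
    return cv2.warpAffine(img, m, (w, h))


def shift(img, theta2_deg):
    """Shift (shear) transform of Eq. (5): x' = x + y*tan(theta2), y' = x*tan(theta2) + y."""
    h, w = img.shape[:2]
    t = np.tan(np.radians(theta2_deg))
    m = np.float32([[1, t, 0], [t, 1, 0]])
    return cv2.warpAffine(img, m, (w, h))
```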

3.3.2 Training input generation

The inputs for training the Multi-Stack CNN are generated using the WB and CLAHE methods. Thus, two inputs are generated: \(I_{WB}\) and \(I_{C}\). Further, \(I_{WB}\), \(I_{C}\) and \(I_{RAW}\) are passed to the Multi-Stack CNN, where \(I_{RAW}\) is the degraded underwater image. A brief code sketch of both pre-processing operations is provided after the following list.

  1. (i)

    White Balancing (WB) The WB is used to resolve the color-cast issue and has been proven to be effective [19]. White balancing is a common image enhancement technique used to correct color cast and restore the true colors in an image. The step-by-step computation of WB is shown below: Step 1: Compute Color Channels- For an input image \(I_{RAW}\), the red channel R(x, y), green channel G(x, y), and blue channel B(x, y) are extracted. Step 2: Compute Channel Averages- Calculate the average value of each color channel over the entire image.

    $$\begin{aligned} \bar{R} = \frac{1}{n} \sum _{x=1}^{width} \sum _{y=1}^{height} R(x, y) \end{aligned}$$
    (6)
    $$\begin{aligned} \bar{G} = \frac{1}{n} \sum _{x=1}^{width} \sum _{y=1}^{height} G(x, y) \end{aligned}$$
    (7)
    $$\begin{aligned} \bar{B} = \frac{1}{n} \sum _{x=1}^{width} \sum _{y=1}^{height} B(x, y) \end{aligned}$$
    (8)

    where n is the total number of pixels in the image. Step 3: Compute Scaling Factors- Normalize the average channel values so that the average gray value is equal across all channels.

    $$\begin{aligned} \bar{V} = \frac{\bar{R} + \bar{G} + \bar{B}}{3} \end{aligned}$$
    (9)

    Compute the scaling factors for each channel:

    $$\begin{aligned} S_R = \frac{\bar{V}}{\bar{R}} \end{aligned}$$
    (10)
    $$\begin{aligned} S_G = \frac{\bar{V}}{\bar{G}} \end{aligned}$$
    (11)
    $$\begin{aligned} S_B = \frac{\bar{V}}{\bar{B}} \end{aligned}$$
    (12)

    Step 4: Adjust Color Channels- Multiply each channel by its respective scaling factor to balance the colors.

    $$\begin{aligned} R' = S_R \cdot R(x, y) \end{aligned}$$
    (13)
    $$\begin{aligned} G' = S_G \cdot G(x, y) \end{aligned}$$
    (14)
    $$\begin{aligned} B' = S_B \cdot B(x, y) \end{aligned}$$
    (15)

    Step 5: Clamp Values- Ensure that the adjusted channel values are within the valid range of intensity values, typically 0 to 255:

    $$\begin{aligned} R' = \min (\max (R'(x, y), 0), 255) \end{aligned}$$
    (16)
    $$\begin{aligned} G' = \min (\max (G'(x, y), 0), 255) \end{aligned}$$
    (17)
    $$\begin{aligned} B' = \min (\max (B'(x, y), 0), 255) \end{aligned}$$
    (18)

    Step 6: Merge Color Channels- Combine the adjusted red, green, and blue channels to form the white-balanced image.

    $$\begin{aligned} I_{\text {WB}} = (R', G', B') \end{aligned}$$
    (19)
  2. (ii)

    CLAHE The contrast-limited adaptive histogram equalization (CLAHE) is a widely used technique in image pre-processing that enhances the local contrast of an image while maintaining the overall global contrast [39, 40]. It is particularly useful in applications such as medical imaging, computer vision, and digital photography. CLAHE operates on the concept of histogram equalization but adds a contrast-limiting mechanism to prevent over-amplification of noise [41]. The CLAHE method used to enhance the degraded underwater image consists of the following steps: Step 1: Image Partitioning- Divide the input image into non-overlapping tiles (patches) of equal size. Step 2: Histogram Calculation- For each tile, compute the histogram H(i) that represents the frequency of occurrence of pixel intensity i. Step 3: Cumulative Distribution Function (CDF)- Calculate the CDF \(C(i)\) from the histogram, representing the cumulative probability of pixel intensities up to intensity i.

    $$\begin{aligned} C(i) = \sum _{j=0}^{i} H(j) \end{aligned}$$
    (20)

    where \(\sum _{j=0}^{i} H(j)\) denotes the sum of histogram values from j = 0 to i. Step 4: Contrast Enhancement- Enhance the contrast by mapping the original intensity values to new enhanced values within each tile. Apply a contrast-limiting function, denoted as \(F(i)\), to the pixel values. The contrast-limiting function prevents excessive contrast enhancement.

    $$\begin{aligned} F(i) = L \cdot \frac{{C(i) - C_{\text {min}}}}{{C_{\text {max}} - C_{\text {min}}}} \end{aligned}$$
    (21)

    where L represents the desired intensity range, and \(C_{\text {min}}\) and \(C_{\text {max}}\) are the minimum and maximum CDF values within the tile, respectively. Step 5: Clip Excessive Intensities- Check if any enhanced pixel intensity exceeds the intensity range [0, L]. If so, clip the intensity value to the range limits. Step 6: Interpolation- Interpolate the enhanced tiles to create the final enhanced image. This step ensures smooth transitions between adjacent tiles and maintains overall visual consistency.
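A minimal sketch of the two pre-processing operations is given below. The gray-world white balancing follows Eqs. (6)-(19); for CLAHE, OpenCV's built-in implementation is applied to the L channel of CIELAB, and the clip limit, tile size, and choice of color space are assumptions that the paper does not fix.

```python
import cv2
import numpy as np


def gray_world_white_balance(img):
    """White balancing following Steps 1-6 (Eqs. (6)-(19)).

    `img` is an H x W x 3 uint8 image; the 0-255 clamp matches Step 5.
    """
    x = img.astype(np.float64)
    channel_means = x.reshape(-1, 3).mean(axis=0)      # Eqs. (6)-(8)
    gray_mean = channel_means.mean()                   # Eq. (9)
    scale = gray_mean / channel_means                  # Eqs. (10)-(12)
    balanced = x * scale                               # Eqs. (13)-(15)
    return np.clip(balanced, 0, 255).astype(np.uint8)  # Eqs. (16)-(19)


def clahe_enhance(img_bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """CLAHE on the L channel of CIELAB (Steps 1-6 of the CLAHE procedure).

    The clip limit, tile size and color space are assumptions; the paper
    leaves these parameters unspecified.
    """
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)  # per-tile histogram equalization with contrast clipping
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```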

3.3.3 Architecture of multi-stack CNN

The Multi-Stack CNN is based on a convolutional neural network that uses the ReLU activation function. It is a fusion network that fuses the features of the input image and the pre-processed images along with the estimated confidence maps, resulting in image enhancement. The Multi-Stack CNN includes learning units: raw images are fed to the Multi-Stack CNN along with the images obtained using CLAHE and WB. The Multi-Stack CNN is utilized to boost the performance and training of F2UIE. The outputs of the learning units are weighted by the estimated confidence maps, and the output of the Multi-Stack CNN is the enhanced image. Finally, the proposed UIEL function used for Multi-Stack CNN training is described in detail. The total features extracted are shown in Table 2.

The inputs of the Multi-Stack CNN module are the pre-processed images \(I_{C}\) and \(I_{WB}\) and the raw image \(I_{RAW}\), which are used for feature extraction. The relationship between the convolution maps and activation maps is shown in (22)

$$\begin{aligned}&X_{1}=Conv_{7 \times 7}(X), X_{2}=Conv_{3 \times 3}(X),X_{3}= Conv_{3 \times 3}(X), \nonumber \\&X_{4}= Conv_{3 \times 3}(X), X_{5}= Conv_{3 \times 3}(X), X_{6}= Conv_{7 \times 7}(X) \end{aligned}$$
(22)

After that, the six activation maps are fused using element-wise summation, as shown in (23)

$$\begin{aligned} \hat{X}=X_{1}+X_{2}+X_{3}+X_{4}+X_{5}+X_{6} \end{aligned}$$
(23)

Then \(\hat{X}\) is transformed into a channel-wise tensor by employing global maximum pooling (GMP) followed by two sequential fully connected (FC) layers, as shown in (24)

$$\begin{aligned} C_{i}=f c_{i}(G_{max}(\hat{X})) \end{aligned}$$
(24)

where \(G_{max}\) is the global maximum pooling, \(fc_{i}\) denotes the two FC layers for the different activation maps and i is the size of the convolution kernel. Finally, to predict the confidence maps, the two obtained inputs \(I_{C}\) and \(I_{WB}\) and the raw image \(I_{RAW}\) are fed to the F2UIE. As a result, two confidence maps, \(C_{C}\) and \(C_{WB}\), are obtained.

Table 2 The summary of trainable parameters in F2UIE
Table 3 The summary of trainable parameters in Learning Unit

To reduce the artefacts introduced by the WB and CLAHE algorithms, two learning units are used. Each learning unit includes 4 stacked convolutional layers. The total features extracted in the learning units are shown in Table 3. The learning units are fed with \(I_{RAW}\) together with \(I_{C}\) and \(I_{WB}\), and their outputs are the modified images \(n_{C}\) and \(n_{WB}\). At last, the outputs of the learning units are multiplied with the predicted confidence maps, as shown in (25).

$$\begin{aligned} I_{ep}=n_{WB} \odot C_{WB}+n_{C} \odot C_{C} \end{aligned}$$
(25)

where \(I_{ep}\) is the enhanced image; \(\odot \) denotes element-wise (Hadamard) multiplication; \(n_{WB}\) and \(n_{C}\) are the outputs of the learning units for the WB- and CLAHE-pre-processed inputs, respectively; and \(C_{WB}\) and \(C_{C}\) are the learned confidence maps (Fig. 4).
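A hedged PyTorch sketch of the fusion described by Eqs. (22)-(25) is shown below. Channel widths and layer counts are assumptions (Tables 2 and 3 list the actual trainable parameters), and the GMP-plus-FC head of Eq. (24) is simplified here to a small convolutional head that emits two spatial confidence maps.

```python
import torch
import torch.nn as nn


class LearningUnit(nn.Module):
    """Four stacked convolutional layers that refine one pre-processed input
    (cf. Table 3); the channel width of 32 is an assumption."""
    def __init__(self, in_ch=6, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class MultiStackCNN(nn.Module):
    """Sketch of the confidence-map fusion of Eqs. (22)-(25)."""
    def __init__(self, width=32):
        super().__init__()
        in_ch = 9  # concatenation of I_RAW, I_C and I_WB (3 channels each)
        kernels = [7, 3, 3, 3, 3, 7]  # six parallel branches as in Eq. (22)
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, width, k, padding=k // 2) for k in kernels]
        )
        # Simplified confidence head (the paper uses GMP + two FC layers, Eq. (24)).
        self.conf_head = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 2, 3, padding=1), nn.Sigmoid(),
        )
        self.unit_c = LearningUnit()   # refines (I_RAW, I_C)  -> n_C
        self.unit_wb = LearningUnit()  # refines (I_RAW, I_WB) -> n_WB

    def forward(self, i_raw, i_c, i_wb):
        x = torch.cat([i_raw, i_c, i_wb], dim=1)
        x_hat = sum(branch(x) for branch in self.branches)  # Eq. (23)
        conf = self.conf_head(x_hat)
        c_c, c_wb = conf[:, :1], conf[:, 1:]                 # confidence maps C_C, C_WB
        n_c = self.unit_c(torch.cat([i_raw, i_c], dim=1))
        n_wb = self.unit_wb(torch.cat([i_raw, i_wb], dim=1))
        return n_wb * c_wb + n_c * c_c                       # Eq. (25)
```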

Fig. 4

Step-by-Step workflow of the Multi-stack CNN model for enhancing an underwater image

3.3.4 Loss function

To ensure that the edges remain sharp, the F2UIE is trained with a multi-term loss function that takes into account the pixel-wise loss caused by enhancement. The loss is computed from two terms: the VGG perceptual loss and the mean squared error (MSE) loss. A code sketch of the combined loss is provided after the following list.

  • MSE: It calculates the mean of the squared differences between the enhanced image \(I_{i}\) and the ground truth information \(I_{i}^{*}\), as shown in (26).

    $$\begin{aligned} L_{M S E}=\frac{1}{N} \sum _{i=1}^{N}\left( I_{i}-I_{i}^{*}\right) ^{2} \end{aligned}$$
    (26)
  • SSIM: It is used to evaluate the structural similarity between the predicted and ground truth images during underwater image enhancement. This loss function considers aspects such as luminance, contrast, and structural information present in the images [42]. The SSIM ensures that the model captures and preserves important structural details and characteristics, leading to a more visually accurate representation of the underwater scene.

    $$\begin{aligned} \text {{SSIM}}(x, y) = \frac{{(2\mu _x\mu _y + C_1)(2\sigma _{xy} + C_2)}}{{(\mu _x^2 + \mu _y^2 + C_1)(\sigma _x^2 + \sigma _y^2 + C_2)}} \end{aligned}$$
    (27)
  • VGG Perceptual Loss: It is based on the 19-layer VGG network that employs the ReLU activation function. The feature representations are obtained by passing the enhanced image and the ground truth image through the last convolutional layer of the pre-trained network. To compute the perceptual loss, the total distance between the feature representations of the enhanced image \(J_{c}\) and the ground truth image \(\hat{J}_{c}\) is obtained, as shown in (28)

    $$\begin{aligned} L_{Per}\left( J_{c}, \hat{J}_{c}\right) =\left| VGG\left( J_{c}\right) -VGG\left( \hat{J}_{c}\right) \right| \end{aligned}$$
    (28)

    where VGG is the pre-trained VGG network [43].

  • Underwater Image Enhancement Loss (UIEL) Function: The proposed UIEL is computed as the weighted sum of the VGG perceptual loss and the MSE loss, as shown in (29). Minimizing this loss optimizes the F2UIE parameters.

    $$\begin{aligned} UIEL=\alpha _{1} L_{MSE}+\alpha _{2} L_{Per} \end{aligned}$$
    (29)

    where \(\alpha _{1}\) and \(\alpha _{2}\) are the scaling coefficients used to weight the two loss components; their values are set as hyper-parameters.
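A minimal PyTorch sketch of the UIEL of Eq. (29) is given below, assuming an ImageNet-pre-trained VGG-19 feature extractor for the perceptual term; the default values of \(\alpha _{1}\) and \(\alpha _{2}\) and the omission of input normalization are assumptions of this sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19


class UIELoss(nn.Module):
    """UIEL = alpha1 * L_MSE + alpha2 * L_Per (Eqs. (26), (28) and (29)).

    The default alpha values and the use of the full VGG-19 feature stack
    (up to its last convolutional layer) are assumptions of this sketch;
    ImageNet normalization of the inputs is omitted for brevity.
    """
    def __init__(self, alpha1=1.0, alpha2=0.05):
        super().__init__()
        features = vgg19(weights="DEFAULT").features.eval()
        for p in features.parameters():
            p.requires_grad_(False)  # frozen, pre-trained feature extractor
        self.vgg = features
        self.alpha1, self.alpha2 = alpha1, alpha2
        self.mse = nn.MSELoss()

    def forward(self, enhanced, target):
        l_mse = self.mse(enhanced, target)                               # Eq. (26)
        l_per = torch.abs(self.vgg(enhanced) - self.vgg(target)).mean()  # Eq. (28)
        return self.alpha1 * l_mse + self.alpha2 * l_per                 # Eq. (29)
```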

The algorithm of the Multi-Stack CNN is shown in Algorithm 1.

Algorithm 1

Algorithm of Multi-Stack CNN.

4 Experimental analysis

The effectiveness of F2UIE is tested on real and synthetic datasets, including RUIE, EUVP and UWGAN, against the existing CLAHE, UCM, UWCNN, Water-Net and Shallow-Net methods. The performance of the F2UIE method is analyzed using the natural image quality evaluator (NIQE) [44], the underwater image quality measure (UIQM) [45] and the blind/reference-less image spatial quality evaluator (BRISQUE) [46] evaluation metrics. The experiments are implemented on an i7 processor, an Nvidia Quadro T2000 4 GB GPU and the Windows 10 Professional operating system.

4.1 Evaluation metric

The evaluation of the F2UIE model was conducted using non-reference evaluation metrics, owing to the non-availability of ground-truth information in the RUIE dataset. Non-reference evaluation metrics estimate image quality from the enhanced image itself, without a reference. In this study, three evaluation metrics were employed to compute the performance of F2UIE: NIQE, UIQM, and BRISQUE.

NIQE: It is inspired by human visual perception and plays an important role in assessing image quality. It measures deviations from natural scene statistics: a multivariate Gaussian model is fitted to the statistics of the pristine and the distorted image, and the mean vector and covariance matrix of each model are computed. Finally, the distance between the two Gaussian models, given in (30), is taken as the final score. A lower NIQE value indicates superior image quality, reflecting a higher degree of naturalness and fewer noise artefacts.

$$\begin{aligned} D\left( \nu _{1}, \nu _{2}, \Sigma _{1}, \Sigma _{2}\right) = \sqrt{\left( \nu _{1}-\nu _{2}\right) ^{T}\left( \frac{\Sigma _{1}+\Sigma _{2}}{2}\right) ^{-1}\left( \nu _{1}-\nu _{2}\right) } \end{aligned}$$
(30)

where \(\nu _{1}\) and \(\nu _{2}\) are the mean vectors and \(\Sigma _{1}\) and \(\Sigma _{2}\) are the covariance matrices of the natural and degraded multivariate Gaussian models, respectively.
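The distance of Eq. (30) can be computed directly once the two multivariate Gaussian models are fitted; the short NumPy sketch below assumes those fits are available and uses a pseudo-inverse for numerical safety.

```python
import numpy as np


def niqe_distance(nu1, nu2, sigma1, sigma2):
    """Distance between two multivariate Gaussian fits (Eq. (30)).

    nu1/nu2 are mean vectors and sigma1/sigma2 covariance matrices of the
    pristine and distorted models; fitting them from NSS features is not shown.
    """
    diff = np.asarray(nu1) - np.asarray(nu2)
    pooled = (np.asarray(sigma1) + np.asarray(sigma2)) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```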

BRISQUE: It computes the level of distortion present in an image. It quantifies the loss of naturalness exhibited by the input image by analyzing its natural scene statistics and extracting relevant feature vectors. Subsequently, a support vector machine is utilized to estimate the final BRISQUE score. A lower BRISQUE value indicates superior image quality, as it signifies a reduced level of distortion and a higher degree of naturalness.

$$\begin{aligned} \hat{I}(i, j)=\frac{I(i, j)-\mu (i, j)}{\sigma (i, j)+C} \end{aligned}$$
(31)

where I(i, j) is the intensity of the input image, \(\mu (i, j)\) is the local mean, \(\sigma (i, j)\) is the local standard deviation and C is a constant.
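The normalization of Eq. (31) corresponds to the mean-subtracted contrast-normalized (MSCN) coefficients used by BRISQUE; a short sketch with a Gaussian-weighted local mean and standard deviation is given below, where the window width and the constant C are common choices rather than values taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def mscn_coefficients(gray, sigma=7 / 6, c=1.0):
    """Local mean/contrast normalization used by BRISQUE (Eq. (31)).

    `gray` is a float grayscale image; sigma and c are common choices,
    not values specified in the paper.
    """
    mu = gaussian_filter(gray, sigma)                    # local mean mu(i, j)
    var = gaussian_filter(gray * gray, sigma) - mu * mu  # local variance
    std = np.sqrt(np.maximum(var, 0.0))                  # sigma(i, j)
    return (gray - mu) / (std + c)
```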

UIQM: The UIQM is derived from the perceptiveness of the human visual system. It takes into account three crucial parameters: colorfulness, sharpness, and contrast measure of underwater images. A higher UIQM value signifies superior image quality. By incorporating these perceptual aspects, UIQM provides a comprehensive assessment of underwater image quality, ensuring that the evaluation aligns with human visual perception.

$$\begin{aligned} UIQM=c_{1}\times UICM+c_{2}\times UISM+c_{3}\times UIConM \end{aligned}$$
(32)

where \(c_{1}\), \(c_{2}\) and \(c_{3}\) are the weights assigned to the colorfulness (UICM), sharpness (UISM) and contrast (UIConM) measures, respectively.

4.2 Qualitative comparison

Underwater images exhibit distinct characteristics that differ from natural images: they are characterized by low luminance and contrast. Consequently, it is crucial to evaluate the impact of different image enhancement methods based on human visual perception. To gain a deeper understanding of the effectiveness of the F2UIE method, a comprehensive visual analysis was conducted using 5 images from each dataset, selected based on their degradation level and the presence of common issues encountered in underwater imagery, such as color-cast, low lighting, and haze. The performance of F2UIE is compared with different UIE methods, including CLAHE [47], UCM [48], UWCNN [13], Water-Net [16], Shallow-Net [49] and WaveNet [50], on the RUIE, EUVP and UWGAN datasets. By conducting this comprehensive assessment, we aim to ascertain the superiority of F2UIE in real underwater scenarios.

The qualitative comparison analysis serves a dual purpose: (1) showcasing the effectiveness of deep-learning-based methods in situations where reference information is unavailable, and (2) highlighting the superiority of our proposed method, which successfully enhances both real and synthetic underwater scenes without relying on ground-truth data for training.

Fig. 5

The qualitative comparison of existing CLAHE, UCM, UWCNN, Water-net, Shallow-net, Wave-net with F2UIE on RUIE dataset

In Fig. 5, the input images from the RUIE dataset exhibit a noticeable green deviation, and several of the compared UIE methods fail to address it satisfactorily. Specifically, CLAHE and UCM are unable to effectively eliminate the greenish color cast, whereas Water-Net and Shallow-Net both introduce a yellowish tint. The UWCNN method also introduces red artefacts in the enhanced images along with increased contrast. The results obtained using Wave-Net are quite effective but show higher luminance. In contrast, our F2UIE successfully eliminates the green deviation and achieves a well-balanced color representation.

Fig. 6

The qualitative comparison of existing CLAHE, UCM, UWCNN, Water-net, Shallow-net, Wave-net with F2UIE on EUVP dataset

In Fig. 6, the input images from the EUVP dataset include different underwater issues such as color cast and low brightness. The CLAHE and UCM methods are unable to handle the green and blue color casts. UWCNN eliminates the color cast but degrades image quality by introducing a reddish tone, whereas Water-Net and Shallow-Net both only partially remove the color cast and yield poor contrast. In contrast, our F2UIE successfully eliminates the color cast in some scenarios.

Fig. 7

The qualitative comparison of existing HE, CLAHE, ICM, UCM, UWCNN, Water-Net and F2UIE on the UWGAN dataset

In Fig. 7, the input images from the UWGAN dataset include different levels of color cast and haze. The CLAHE, UCM, UWCNN, Water-net, Shallow-net and Wave-net methods only partially remove the color-cast and fail to restore the true colors of the images. However, F2UIE shows better results than the existing methods owing to its training and better feature extraction.

Through qualitative analysis, it is evident that the proposed F2UIE method outperforms the compared methods in terms of color-cast removal, haze reduction, and improved image quality. These results validate the effectiveness of the F2UIE method in handling underwater image enhancement challenges.

4.3 Quantitative comparison

The quantitative analysis of the F2UIE method was conducted by comparing its performance with several existing methods, using metrics such as NIQE, BRISQUE, and UIQM.

The results presented in Table 4 demonstrate that the F2UIE method achieved the highest UIQM values for the real underwater datasets RUIE and EUVP. This indicates that F2UIE effectively enhances the sharpness, color, and contrast of degraded images, surpassing the capabilities of the other methods. In contrast, traditional approaches like CLAHE and UCM mainly focus on contrast improvement but often introduce unwanted artefacts, and the UWCNN, Water-Net, Shallow-Net and WaveNet methods tend to produce a yellowish effect in the output, resulting in undesired noise. Table 4 also shows that the F2UIE method achieved the lowest BRISQUE values for all three datasets, indicating its ability to recover images that are better aligned with the human visual system; the F2UIE method successfully restores the natural colors in the images.

In terms of NIQE, as shown in Table 4, the F2UIE method obtained the minimum values for the EUVP and UWGAN datasets, demonstrating its effectiveness in restoring the naturalness of degraded images. However, for the RUIE dataset, F2UIE did not yield significant improvements compared to the other methods; notably, the WaveNet method performed well in terms of NIQE.

Overall, the quantitative analysis confirms that the proposed F2UIE method outperforms existing methods in terms of NIQE, BRISQUE, and UIQM, showcasing its superior ability to enhance underwater images.

Table 4 The quantitative comparison of CLAHE, UCM, UWCNN, Water-Net, Shallow-Net, WaveNet and F2UIE methods on RUIE, EUVP and UWGAN dataset based on UIQM, BRISQUE, NIQE
Table 5 The total run-time of HE, CLAHE, ICM, UCM, UWCNN, F2UIE and Water-Net methods on RUIE, UWGAN and EUVP dataset

4.4 Run-time comparison

The average runtime of HE, CLAHE, ICM, UCM, UWCNN, Water-Net and F2UIE has been computed at different resolutions: 100\(\times \)100, 200\(\times \)200, 300\(\times \)300, 400\(\times \)400 and 500\(\times \)500. Table 5 presents the average runtime of the compared methods, including F2UIE, over 200 images; the proposed network has, however, been tested on a much larger set. It can be seen that UCM spends more time processing than the other conventional methods, while Water-Net has the worst time complexity overall. Among all these methods, HE and CLAHE spend the minimum time in processing as they do not need training. However, among the deep learning methods, the proposed F2UIE outperforms the existing methods as it is a lightweight network.

5 Conclusion

In this paper, a novel approach, the Feature Transfer-based Convolutional Neural Network (F2UIE), is presented for underwater image enhancement. Its effectiveness, superior performance, and efficiency make it a valuable tool for various underwater imaging applications, ranging from marine research to underwater robotics and surveillance. The qualitative and quantitative evaluation of F2UIE demonstrates its effectiveness in addressing common challenges such as haze, color-cast, and low lighting in underwater images. The qualitative results show that the F2UIE method successfully mitigates color-cast issues, produces visually appealing enhancements, and robustly removes haze from both real-world and synthetic underwater images. The quantitative evaluation validates the superiority of F2UIE over state-of-the-art methods using metrics such as UIQM (RUIE: 3.367, UWGAN: 2.631, EUVP: 3.390), NIQE (RUIE: 3.146, UWGAN: 5.11, EUVP: 3.41) and BRISQUE (RUIE: 36.44, UWGAN: 37.01, EUVP: 38.38) across all three datasets. Moreover, the proposed F2UIE demonstrates promising average run times for different image sizes (100\(\times \)100: 225.16, 200\(\times \)200: 219.05, 300\(\times \)300: 213.6, 400\(\times \)400: 281.13, 500\(\times \)500: 335.19), outperforming other deep neural networks. This efficiency is crucial for real-time and resource-constrained applications.

However, there are avenues for future research and improvement. Exploring other neural network architectures as baseline models could potentially enhance the effectiveness and performance of the F2UIE. Additionally, investigating novel techniques for handling specific challenges in underwater image enhancement, such as image distortion and noise reduction, would further contribute to advancing the field.