
1 Introduction

Monitoring inland water resources is an important aspect of sustainable water management. In recent years, advances in satellite technology have enabled the monitoring of inland water resources such as rivers, lakes, and wetlands from space. This has opened up new opportunities for the use of deep learning techniques to better understand and manage these resources. Deep learning is a type of machine learning that uses artificial neural networks to learn from large datasets. It can be used to detect and classify objects in satellite imagery, identify changes in land cover and surface water characteristics, and monitor the health of inland water bodies. This paper discusses the potential of deep learning for monitoring inland water resources, focusing on its application to satellite imagery.

The use of deep learning for monitoring inland water resources has been gaining traction in recent years as a way to better understand and manage these resources. The ability to detect and classify objects in high-resolution satellite imagery has allowed researchers to gain valuable insight into the health of inland water bodies, such as changes in water levels, water quality, and surface features. This information can be used to better inform and guide decision-making related to water resources management, making high-resolution satellite imagery an invaluable tool for monitoring inland water resources.

Many researchers have examined alternative methods for classifying and detecting water bodies in imagery; some of these are described below.

[1] developed a high-resolution machine learning (ML) approach based on the random under-sampling boosted (RUSBoost) technique for identifying inland water content from CYGNSS data. [3] applied a deep learning-based approach to satellite imagery for detecting inland water sources in the Yellow River Basin in China; the authors used a CNN to classify water bodies according to their size. [4] used deep learning techniques to detect water bodies in satellite imagery of a coastal area [5]. The study found that deep learning was more accurate than traditional methods, detecting more than 90% of the water bodies in the imagery [6]. The authors suggest that deep learning techniques could monitor inland water sources more efficiently than traditional methods.

Contribution of the Study

The primary goal of this research is deep learning-based classification of inland water images acquired by satellite using a neural network. The study explores the possibility of using a neural network to distinguish water images from normal (non-water) images.

Objectives of the Study

The following objectives must be accomplished in this study:

  • To model a neural network architecture that can classify normal images and water images acquired by satellite.

  • To analyze the performance of the neural network using performance metrics.

2 Background

This section presents the background of our proposed model, the VGG16 architecture, as well as other architectures (AlexNet and GoogLeNet).

2.1 VGG16 Architecture

VGG16 proved to be a defining moment in humanity’s attempt to make computers “see” the world. For decades, a lot of work has been invested into enhancing this capacity under the field of Computer Vision (CV) [7]. The major development known as VGG16 paved the way for several other advances in this field. Andrew Zisserman and Karen Simonyan of the University of Oxford created this Convolutional Neural Network (CNN) model. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an annual contest, evaluated image categorization (and object identification) techniques at a significant scale.

figure a

2.2 AlexNet Architecture

The network’s initial two convolutional layers are connected to overlapping max-pooling layers to extract as many features as feasible [8]. The outputs of the convolutional layers and fully connected layers are all coupled to the ReLU non-linear activation function.

figure b

2.3 GoogLeNet Architecture

The GoogLeNet architecture consists of 22 stacked layers (27 when pooling layers are included). In all, there are nine inception modules arranged in a linear fashion. The global average pooling layer is linked to the ends of the inception modules. The figure that follows shows the whole GoogLeNet architecture at a reduced scale.

figure c

3 Methodology

In this study, the VGG16 neural network is used to classify inland water body images. Through pre-processing, features associated with the images are extracted to distinguish water from non-water bodies. This stage takes raw satellite imagery and applies an internal threshold value to label the water body portion of the image with white pixels and the land portion with black pixels (Fig. 1).

Fig. 1.
figure 1

Flow chart for proposed methodology
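The thresholding step in the pre-processing stage can be sketched as follows. This is a minimal NumPy sketch: the 0.35 cut-off and the convention that darker pixels are water are illustrative assumptions, since the paper does not state its internal threshold value.

```python
import numpy as np

def threshold_water_mask(gray, thresh=0.35):
    # Label water pixels white (255) and land pixels black (0).
    # `gray` holds intensities scaled to [0, 1]; water is assumed to
    # be darker than land, and 0.35 is an illustrative cut-off.
    return np.where(gray < thresh, 255, 0).astype(np.uint8)

band = np.array([[0.10, 0.80],
                 [0.20, 0.95]])
print(threshold_water_mask(band))  # water in the left column
```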

3.1 Dataset Description

The data were collected from https://www.kaggle.com/datasets/franciscoescobar/satellite-images-of-water-bodies, from which the water body images were selected. This Kaggle satellite images of water bodies dataset is used for training and testing the listed models. The dataset consists of both normal and water body images, which are enhanced and classified before entering the training phase.

3.2 Pre-Processing Steps for Image Classification

The objective is to demonstrate how accuracy changes when some well-known pre-processing methods are applied to basic convolutional networks. A few of these pre-processing methods are listed below.

Read Image:

To read the images, we store the path to the image dataset in a variable and then use a method that loads the image-containing folders into arrays.
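A minimal version of such a loader might look like the following; the `*.jpg` pattern and the use of Pillow for decoding are assumptions made for illustration, not details given in the study.

```python
from pathlib import Path
import numpy as np
from PIL import Image  # Pillow, assumed available for image decoding

def load_folder(folder, pattern="*.jpg"):
    # Store the dataset path in a variable, then load every matching
    # image in the folder into a list of NumPy arrays.
    dataset_path = Path(folder)
    images = []
    for path in sorted(dataset_path.glob(pattern)):
        with Image.open(path) as im:
            images.append(np.asarray(im.convert("RGB")))
    return images
```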

Resize Image:

In this phase we write two methods to display the photos, one showing a single image and the other showing two images side by side, so that the change can be seen. Following that, we develop a method called processing that accepts only the photos as input.
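The resize operation itself can be illustrated with a nearest-neighbour sketch in plain NumPy; in practice a library call such as Pillow's `resize` would normally be used instead.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    # Nearest-neighbour resize: for every output pixel, pick the
    # source pixel whose index scales to it.
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

img = np.arange(16).reshape(4, 4)
print(resize_nearest(img, 2, 2))  # -> [[ 0  2] [ 8 10]]
```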

Remove Noise:

To remove noise, a Gaussian function is used to blur the picture, producing a Gaussian blur. This is a typical graphics effect, often used to reduce visual noise. Computer vision algorithms also use Gaussian smoothing as a pre-processing step to enhance picture structures at various scales.
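Gaussian blur can be sketched in plain NumPy as a separable convolution; the sigma value and the three-sigma kernel radius below are conventional choices, not parameters taken from the study.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalised 1-D Gaussian kernel of length 2*radius + 1.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.0):
    # Blur a 2-D image by convolving rows, then columns, exploiting
    # the separability of the Gaussian filter.
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img.astype(float), radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
```

Because the kernel is normalised, a uniform image is left unchanged and the output keeps the input's shape.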

Segmentation:

Segmentation is done by dividing the picture into its background and foreground; further noise reduction is then applied to improve the segmentation.
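One common way to separate foreground from background automatically is Otsu's method, which picks the threshold that maximises between-class variance; the study does not name its exact segmentation routine, so this is a sketch of one plausible choice.

```python
import numpy as np

def otsu_threshold(gray):
    # `gray` holds integer intensities in [0, 255].  Scan every
    # candidate threshold t and keep the one with the largest
    # between-class variance.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                       # mean of class "dark"
        m1 = (sum_all - sum0) / (total - w0)  # mean of class "bright"
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Pixels above the returned threshold form the bright class; with dark water against bright land this yields the binary foreground/background split described above.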

Morphology:

Morphological processing encompasses a wide range of image processing operations based on shapes. After a structuring element is applied, the output image produced by a morphological operation is the same size as the input image.
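Dilation and erosion, the two basic morphological operations, can be sketched as follows with a square structuring element; production code would use a library such as scipy.ndimage, and the 3x3 element size is an assumption for illustration.

```python
import numpy as np

def dilate(binary, k=3):
    # Binary dilation with a k x k square structuring element.
    # The output has the same size as the input, as described above.
    r = k // 2
    h, w = binary.shape
    padded = np.pad(binary, r, mode="constant")
    out = np.zeros_like(binary)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(binary, k=3):
    # Erosion is the complement of dilating the complement.
    return 1 - dilate(1 - binary, k)
```

For example, dilating a single foreground pixel with a 3x3 element grows it into a 3x3 block, and eroding that block shrinks it back to the single pixel.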

Train_test Split:

After all the preprocessing steps are done, the dataset is split into training and testing sets based on the user’s split ratio. The training split is later used to train the models, and the test split is used to evaluate them.
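The shuffle-and-slice split described above is the same idea as sklearn's `train_test_split`; written out in NumPy for clarity it might look like this (the 0.2 ratio and fixed seed are illustrative assumptions):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=0):
    # Shuffle the sample indices, then slice off the first
    # `test_ratio` fraction as the test set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]
```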

Train the Network:

The proposed VGG-16 model, along with AlexNet and GoogLeNet, is trained on the training data. The proposed model’s performance is then evaluated and compared against these two additional networks. The following metrics may be used to assess the models’ performance.
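A minimal transfer-learning sketch of the VGG-16 classifier in Keras is shown below, assuming TensorFlow is installed. The paper does not specify its training head, so the frozen convolutional base, the 256-unit dense layer, and the sigmoid output for the two-class (water vs. normal) problem are assumptions; `weights=None` merely keeps the sketch offline, whereas a real run would typically start from ImageNet weights.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_classifier(input_shape=(224, 224, 3)):
    # Convolutional base of VGG16 without its original classifier head.
    base = VGG16(weights=None, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze the base; train only the new head
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # water vs. normal image
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then call `model.fit` on the training split produced in the previous step.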

Performance Metrics

The effectiveness of a technique is assessed using the confusion matrix’s accuracy, sensitivity, precision, specificity, and F1-score:

$$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN} $$
(1)
$$ Sensitivity = \frac{TP}{TP + FN} $$
(2)
$$ Precision = \frac{TP}{TP + FP} $$
(3)
$$ F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall} $$
(4)
$$ Specificity = \frac{TN}{TN + FP} $$
(5)
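Equations (1)-(5) translate directly into code; the counts in the example call below are illustrative only, not the study's results.

```python
def metrics_from_counts(tp, tn, fp, fn):
    # Eqs. (1)-(5): accuracy, sensitivity (recall), precision,
    # F1-score, and specificity from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, precision, f1, specificity

# Illustrative counts, not taken from the paper.
print(metrics_from_counts(50, 40, 10, 0))
```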

4 Results

A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. It breaks correct and incorrect predictions down by class, showing where the model makes mistakes. The rows of the matrix represent the predicted classes, while the columns represent the actual classes; each cell contains the number of predictions for that combination. The figures below show the confusion matrices of the proposed and existing models.
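Building such a matrix from predicted and actual labels is a one-loop sketch; rows index the predicted class and columns the actual class, matching the layout described above.

```python
import numpy as np

def confusion_matrix(y_pred, y_true, n_classes=2):
    # cm[p, t] counts samples predicted as class p with true class t.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p, t in zip(y_pred, y_true):
        cm[p, t] += 1
    return cm

print(confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))
```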

Figure 2 shows the confusion matrix of the proposed VGG-16 architecture. In the figure, normal images are represented by zeros and water body images by ones. The proposed method correctly identified 64 normal images, with 2 misclassifications, and correctly predicted 152 water body images, with 2 misclassifications.

Figure 3 presents the confusion matrix of the AlexNet architecture. Normal images are represented by zeros in the figure, whereas water body images are represented by ones. AlexNet correctly identified 60 normal images, with 3 misclassifications, and correctly predicted 151 water body images, with 6 misclassifications.

Fig. 2.
figure 2

Confusion matrix of the proposed VGG-16

Fig. 3.
figure 3

Confusion matrix of the AlexNet architecture

Figure 4 presents the confusion matrix for GoogLeNet. Normal images are represented in the figure by zeros, whereas images of water bodies are represented by ones. GoogLeNet correctly identified 62 normal images while wrongly classifying 11, and correctly predicted 143 water body images, with 4 misclassifications.

It is clear from the graph in Fig. 5 that the performance of the proposed VGG-16 model is measured against the two additional models, AlexNet and GoogLeNet. The proposed model provided greater performance, with an accuracy score of 96.36, compared with 95.91 for AlexNet and 93.18 for GoogLeNet.

The accompanying graph in Fig. 6 makes it evident that the two additional models, AlexNet and GoogLeNet, are used to evaluate and compare the performance of the proposed VGG-16 model. The proposed model performed better than AlexNet and GoogLeNet, whose specificity scores were 93.05 and 91.86, respectively.

Fig. 4.
figure 4

Confusion matrix for the GoogLeNet

Fig. 5.
figure 5

Performance of the proposed model VGG-16

Fig. 6.
figure 6

Specificity

The graph in Fig. 7 demonstrates that the performance of the proposed VGG-16 model is assessed against the remaining two models, AlexNet and GoogLeNet. With a sensitivity score of 94.45, the proposed model demonstrated superior performance compared with AlexNet (90.91) and GoogLeNet (89.94).

Fig. 7.
figure 7

Sensitivity

5 Conclusion

This study has assessed the effectiveness of deep learning approaches for monitoring inland water bodies using satellite images. A VGG-16 neural network was used to classify the inland water body images, and the model was evaluated by comparing its results with two additional models, AlexNet and GoogLeNet. The proposed model showed higher accuracy, sensitivity, and specificity scores than the other two models: VGG-16 achieved an accuracy score of 96.36, against specificity scores of 93.05 and 91.86 for the benchmark models AlexNet and GoogLeNet, respectively. This model can be used to extract useful information from satellite images and identify the water body portion of an image with greater accuracy. This research provides an important step towards a better understanding of inland water resources and their distribution.