Introduction

Coastal and lake bathymetry is of fundamental importance for many purposes and applications, such as coastal engineering, sustainable management, and spatial planning of lakes (Leu and Chang 2005; Gao 2009). These areas experience continual sediment movement driven by tidal currents, wave propagation, and intensive human activities (Ceyhun and Yalçın 2010). Consequently, rapid and accurate methods for monitoring these regions, especially water bottom levels, are needed (Pacheco et al. 2015).

Currently, two survey-based methods are used for water depth detection: single- and multibeam echosounders, and Lidar. The multibeam echosounder is the most conventional method for bathymetric applications, especially for deep waters with depths up to 500 m; high accuracy and full-bottom coverage are its main advantages. The single-beam echosounder, in turn, is a feasible alternative for producing sea-bottom maps with acceptable vertical accuracy at a lower cost than the multibeam echosounder (Sánchez-Carnero et al. 2012). However, both methods are expensive, time-consuming, and labor-intensive, especially in shallow areas where coral reefs, rocks, and the shallowness itself obstruct the navigation of survey boats. Lidar technology has also been developed for bathymetric applications over the past several decades. Despite their high depth accuracy and suitability for shallow coastal areas, Lidar systems are extremely expensive and have comparatively low coverage (Chust et al. 2010).

Optical remote sensing represents a low-cost, wide-coverage, and time-effective alternative to single- and multibeam echosounders and Lidar for coastal bathymetry monitoring applications (Sánchez-Carnero et al. 2014). Lyzenga (1985) introduced the log-linear empirical approach using a single imagery band for detecting water depths from satellite images; his theory depended on removing all other reflected signals influencing the water bottom signal. Later, Lyzenga et al. (2006) proposed a log-linear correlation between multiple bands and water depth values. This approach was subsequently applied by other researchers using different satellite images: Quickbird (Lyons et al. 2011), Worldview 2 (Doxani et al. 2012), Spot 4 (Sánchez-Carnero et al. 2014), and Landsat 8 (Pacheco et al. 2015). Another empirical approach, based on band ratios, in which the difference in attenuation between bands is used for bathymetry detection, was developed by Stumpf et al. (2003) and further developed in the following years by other researchers such as Su et al. (2008) and Bramante et al. (2013).

Both of these empirical approaches, which are considered the simplest and most widely used water depth derivation methods (Vahtmäe and Kutser 2016), have limitations. The first approach assumes that the entire bottom surface is homogeneous and that the water column is uniform over the entire coverage area. The second approach overcomes this drawback, but it has no physical foundation and its parameters are calculated through a trial process (Gholamalifard et al. 2013).

More recently, Ceyhun and Yalçın (2010) proposed an alternative approach for detecting bathymetry using a neural network (NN) algorithm. This algorithm establishes a non-linear relation between the spectral bands and the water depth values, overcoming the limitations of the regression models. Other researchers have since applied the same algorithm to various water depths and satellite images and argued that it outperforms the conventional Lyzenga and Stumpf approaches; examples include Landsat images (Gholamalifard et al. 2013), IRS P6-LISS III images (Moses et al. 2013), and, using a neuro-fuzzy approach, Quickbird images (Corucci 2011). However, the NN method has several disadvantages, such as its complexity, its black-box nature, its sensitivity to small changes in the data values, which produces high variance in the output, and its vulnerability to overfitting. Moreover, it requires a large amount of data or many spectral bands to detect bathymetry.

On the other hand, comparative analytical methods using spectral libraries (look-up tables) to interpret remote sensing data have gained popularity in water depth mapping (Mobley et al. 2005; Lesser and Mobley 2007; Brando et al. 2009). These approaches require spectral data on the bottom reflectance and on suspended and dissolved matter, and are applied to hyperspectral images (Vahtmäe and Kutser 2016). However, they have three major limitations. First, hyperspectral images are not available for large areas and have coarse spatial resolution, and the alternative airborne systems are expensive, especially for wide coverage. Second, processing hyperspectral imagery is computationally demanding. Finally, the methods themselves are relatively complex. This makes empirical models with multispectral imagery a valuable alternative (Bramante et al. 2013).

The objective of this research is to propose three simple empirical approaches for bathymetry detection in shallow coastal/lake areas that attempt to overcome the drawbacks of the NN and Lyzenga GLM methods. These approaches are the ensemble regression tree-fitting algorithm using bagging (BAG), the ensemble regression tree-fitting algorithm using least squares boosting (LSB), and the support vector regression algorithm (SVR). The three algorithms are more stable, simpler, and less vulnerable to overfitting than NN; are much simpler than analytical approaches; and are less affected by other environmental factors than the Lyzenga GLM. The proposed methodologies were applied using Spot 6 and Landsat 8 images, as examples of high and low spatial resolution, over various study areas. The reflectances of the green and red bands and the blue/red and green/red band ratios were used to obtain bathymetric maps, as they demonstrate a high correlation with water depth. To support the robustness of these approaches, three study areas were selected to provide diverse bottom types with different levels of turbidity. The results were then evaluated and compared with echosounder bathymetric data over the three study areas.

Study areas and available data

The first study area was Alexandria port, Egypt (see Fig. 1a). It is a fairly deep, low-turbidity, calm water area, owing to its coastal barriers, with a depth range of 10.5 m. Almost all of the harbor bottom is covered by silt-sand. The second study area was the entrance zone of Lake Nasser/Nubia, located in the Sudanese part of Lake Nubia (see Fig. 1b). It is a fairly irregular, shallow, highly turbid water area with depths up to 6 m and high rates of sediment change and annual flood variation. The lake has a clayey bottom surface.

Fig. 1

a The first study area of Alexandria port coastal area, Egypt. b The second study area of Nubia Lake entrance zone, Sudan

The third study area was Shiraho, a subtropical territory located in the southeastern part of Ishigaki Island, Japan (see Fig. 2). It is an irregular, shallow, low-turbidity water area with depths up to 14 m. Shiraho is a heterogeneous area with rich marine biodiversity that includes ecosystems such as mangroves, seagrasses, and coral reefs.

Fig. 2

The third study area of Shiraho, Ishigaki Island, Japan

Imagery data

Freely available Landsat 8 satellite images with a spatial resolution of 30 m were used to detect the bathymetry of the first and third study areas. A Spot 6 image with a spatial resolution of 1.5 m was used for the second study area. The parameters required for radiometric image corrections were available in the images’ metadata files. The first Landsat 8 image was acquired during calm weather conditions on 3 August 2014, and the second Landsat 8 image was collected during windy conditions on 5 June 2013. The Spot 6 image was acquired during calm weather conditions on 12 January 2014. These images were selected to be synchronized with the echosounder field collection times for each study area.

Echosounder data

The reference water depths of the first study area, used for calibrating the algorithms, were acquired with a NaviSound Hydrographic Systems model 210 echosounder with an attached Trimble 2000 GPS. The maximum depth range of the echosounder was 400 m, and its vertical accuracy was 1 cm at 210 kHz (see Fig. 3). The second study area’s water depths were acquired with an Odom Hydrographic Systems Echotrac model DF 3200 MKII echosounder with built-in DGPS. The depth range of the echosounder was 0.2 to 200 m, and its vertical accuracy was 0.01 m ± 0.1% of depth (see Fig. 4). Finally, the reference water depths of the third study area were acquired with a single-beam Lowrance LCX-15MT dual-frequency (50/200 kHz) transducer and a 12-channel GPS antenna. The horizontal and vertical accuracies were ±1 and ±0.03 m, respectively (Collin et al. 2014) (see Fig. 5).

Fig. 3

Field bathymetry. Reference points of the first study area from the echosounder

Fig. 4

In-situ bathymetry. Reference points of the second study area from the echosounder

Fig. 5

Field bathymetry. Reference points of the third study area from the echosounder

About 2500 field points were collected for the first study area, 12,500 for the second study area, and 14,500 for the third study area. All the water depths were referenced to mean sea level (MSL). These points were used for calibration and evaluation of all the bathymetric models.

Methodology

To derive bathymetric information from the satellite images, pixel values were first converted to radiometrically corrected spectral radiance values (Todd and Chris 2010); the data required for this conversion were available in the images’ metadata (MTL) files. Two essential corrections were then applied to the radiance images: sun glint correction and atmospheric correction (Doxani et al. 2012); the order in which these two corrections are applied is optional (Kay et al. 2009). Second, the input values for training the supervised regression algorithms were extracted from the corrected reflectance images at the locations of the echosounder field points. Four inputs were used for training each approach, namely the logarithms of the red and green bands and of the blue/red and green/red band ratios, and the outputs were the water depths. For each study area, the sounding points were randomly divided into independent sets of 75% training points and 25% testing points. For instance, for the Alexandria port study area, the sounding points were divided into 1875 training points and 625 testing points. Finally, the outputs of all algorithms were evaluated using the same independent testing points. The comparison was based on the RMSE and R2 values computed from the differences between the derived bathymetric values and the testing points for all approaches. Figure 6 summarises the workflow of these steps.

Fig. 6

Workflow processing steps of presented methodology for detecting bathymetry from satellite images by different methods
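To make the split-and-evaluation step concrete, the following minimal Python sketch shows a 75/25 split of the sounding points and the RMSE/R2 computation applied to all approaches. The array names and randomly generated values are placeholders for the actual band logarithms and echosounder depths, and the linear regressor only stands in for any of the models compared in this study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # placeholder regressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((2500, 4))      # illustrative predictors: log green, log red, log B/R, log G/R
z = rng.random(2500) * 10.5    # illustrative depths (m)

# 75% of the sounding points for training, 25% held out for testing
X_train, X_test, z_train, z_test = train_test_split(X, z, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, z_train)   # any of the compared regressors fits here
z_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(z_test, z_pred))
print(f"RMSE = {rmse:.2f} m, R^2 = {r2_score(z_test, z_pred):.2f}")
```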

The following subsections describe the imagery data preprocessing.

Imagery data preprocessing

Spectral top of atmosphere radiance

The spectral top of atmosphere radiance of each pixel was computed from the imagery pixel digital number (DN) values using the following equation (Landsat-8 2013):

$$ L_{\lambda }=M_{\mathrm{l}}\times DN+A_{\mathrm{l}} $$
(1)

where Lλ = top of atmosphere spectral radiance, DN = digital number recorded by the sensor, Ml = band-specific multiplicative rescaling factor for radiances, and Al = band-specific additive rescaling factor for radiances.

The Ml and Al values were available in the images’ metadata files (MTL files).
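As an illustration of Eq. 1, a short conversion function might look as follows; the Ml and Al values shown are placeholders, since the real factors must be read from the MTL file of each band.

```python
import numpy as np

def dn_to_toa_radiance(dn, m_l, a_l):
    """Convert raw digital numbers to top-of-atmosphere spectral radiance (Eq. 1)."""
    return m_l * dn.astype(np.float64) + a_l

# Illustrative rescaling factors; the real M_l and A_l values come from the MTL file.
dn_band = np.array([[7250, 7300], [7280, 7310]], dtype=np.uint16)
radiance = dn_to_toa_radiance(dn_band, m_l=0.012, a_l=-60.0)
```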

Atmospheric correction

Atmospheric correction was applied to all images using the Fast Line-of-Sight Atmospheric Analysis of Hypercubes (FLAASH™) tool in the ENVI 5.3 software. FLAASH performs radiative transfer modeling based on the MODTRAN4 code (Berk et al. 1998) and includes look-up tables for different atmosphere types. FLAASH also supports different aerosol types, which define particle properties such as scattering, absorption, and wavelength-dependent path radiance. The calculated radiance images were used as input to the FLAASH tool. For all study areas, the maritime aerosol model and the tropical atmospheric model (for hot areas) were selected, and the blue and infrared bands over water were used for aerosol retrieval (Su et al. 2008). The result was atmospherically corrected reflectance images.

Sun glint correction

The sun glint correction was applied to the atmospherically corrected reflectance images using the relation between the bands used for bathymetry and the near-infrared band (Hedley et al. 2005; Sánchez-Carnero et al. 2014). The deglinted pixel value can be calculated using Eq. 2:

$$ R_i^{\prime }=R_i-b_i\left(R_{\mathrm{NIR}}-\mathrm{Min}_{\mathrm{NIR}}\right) $$
(2)

where Ri′ = deglinted pixel reflectance value, Ri = atmospherically corrected reflectance value, bi = slope of the regression of band i against the NIR band over the glint sample, RNIR = corresponding pixel value in the NIR band, and MinNIR = minimum NIR value in the sample.
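A minimal sketch of this deglinting step, assuming the corrected reflectance bands are available as NumPy arrays and that a deep-water sample mask is used to estimate the regression slope bi (in the spirit of Hedley et al. 2005), could be:

```python
import numpy as np

def deglint(band, nir, sample_mask):
    """Sun glint removal following Eq. 2 (Hedley et al. 2005 style).

    band        : atmospherically corrected reflectance of one visible band
    nir         : atmospherically corrected NIR reflectance
    sample_mask : boolean mask of (deep-water) pixels used to fit the band-vs-NIR regression
    """
    b_i = np.polyfit(nir[sample_mask], band[sample_mask], 1)[0]  # regression slope
    min_nir = nir[sample_mask].min()
    return band - b_i * (nir - min_nir)
```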

Proposed approaches for bathymetry estimation

Least squares boosting fitting ensemble

Boosting is considered one of the most powerful ensemble learning algorithms proposed in recent years. It was originally designed for classification, but it can also be used for regression problems (Hastie et al. 2009). The basic concept of boosting is to build multiple models in sequence, assigning higher weights to the training cases that are difficult to fit (Quinlan 2006). Learners are trained sequentially: early learners fit simple models to the data, and the data are then analyzed for errors. These errors identify particular samples that are difficult to fit. Later models focus primarily on these samples, giving them higher weights and trying to predict them correctly. Finally, all models are weighted and combined into an overall predictor.

Thus, boosting is a method of converting a sequence of weak learners into a strong predictor, or a way of increasing the complexity of the primary model. The initial learners are often very simple, but their weighted combination can form a stronger and more complex learner (Ihler 2012).

The least-squares algorithm can be used to minimize any differentiable loss \( L\left(y,F\right) \) in conjunction with forward stagewise additive modeling, fitting the generic function \( h\left(x;a\right) \) to the pseudo-responses \( \tilde{y}_i=-{g}_m\left({x}_i\right) \) for i = 1, …, N. In least-squares regression, the loss function is \( L\left(y,F\right)={\left(y-F\right)}^2/2 \) and the pseudo-response is \( \tilde{y}_i={y}_i-{F}_{m-1}\left({x}_i\right) \). The following steps illustrate the least-squares boosting algorithm (Casale et al. 2011); a short code sketch follows the steps:

  • \( {F}_0(x)=\bar{y} \)

  • For m = 1 to M do:

  • \( \tilde{y}_i={y}_i-{F}_{m-1}\left({x}_i\right),\kern0.5em i=1,\dots, N \)

  • \( \left({\rho}_m,{a}_m\right)=\arg \min_{a,\rho}\ {\sum}_{i=1}^N{\left[\tilde{y}_i-\rho\, h\left({x}_i;a\right)\right]}^2 \)

  • \( {F}_m(x)={F}_{m-1}(x)+{\rho}_m\, h\left(x;{a}_m\right) \)

  • end For

  • end Algorithm
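The following Python sketch mirrors these steps with shallow regression trees as the weak learners h(x; a). The shrinkage (learning rate), tree depth, and number of trees are illustrative choices, not the exact settings of the LSB implementation used in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lsboost_fit(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    """Least-squares boosting: each tree is fitted to the residuals (pseudo-responses)
    of the current ensemble and then added with a shrinkage factor."""
    f0 = float(np.mean(y))                 # F0(x): constant initial model
    residual = y - f0
    trees = []
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        residual = residual - learning_rate * tree.predict(X)   # update pseudo-responses
    return f0, trees

def lsboost_predict(X, f0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```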

Bagging fitting ensemble

Bagging is an ensemble learning algorithm proposed by Breiman (1996) to improve regression and classification accuracy and prediction model performance by reducing variance and avoiding overfitting. The basic concept of bagging is to generate a number of independent samples by resampling with replacement from the available training set. A model is then fitted to each bootstrap sample, and the models are finally aggregated by majority voting for classification or by averaging for regression (Kulkarni and Kelkar 2014). The main advantage of bagging is that it improves unstable algorithms such as NN and regression trees by averaging over different resamples; consequently, the result is typically better than fitting a single model to the training dataset (Inoue and Kilian 2006). For splitting each node, an impurity or error criterion must be assigned, e.g., the Gini diversity index, which can be calculated using the following equation:

$$ 1-{\sum}_i{p}^2(i) $$
(3)

where p(i) is the observed fraction of samples of class i that reach the node.

Splitting continues until the Gini index reaches zero and the resulting node is pure, meaning that a single class is assigned to each terminal node. For a standard training set T of size n, bagging generates m new training sets Ti (i = 1 to m, each of size n′) by sampling uniformly from the training set with replacement. Because sampling is done with replacement, some observations may be repeated and others may not be selected at all. If n′ = n, then for large n the set Ti is expected to contain about 63% of the unique samples of T; this replicated, full-size dataset is known as the in-bag sample, and the remainder as the out-of-bag sample. This process is known as bootstrap sampling. The m bootstrap samples are used to fit m models and to return the result that receives the maximum number of votes H(x) (Ghimire et al. 2012). The following steps elucidate the bagging algorithm (Galar et al. 2012); a short code sketch follows the steps:

  • For m = 1 to M do

  • Tm = Random sample replacement (n, T)

  • hm = L (Tm)

  • end for

  • H (x) = sign (\( \sum_{m=1}^M\mathrm{hm}(x) \)) where hm∊ [−1, 1] are the induced classifiers

  • End algorithm
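As a sketch of the BAG ensemble, 50 bootstrapped regression trees can be fitted and their out-of-bag fit inspected as follows. This uses scikit-learn's BaggingRegressor, whose default base learner is a regression tree, with placeholder data; it is illustrative rather than the MATLAB ensemble used in this study.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X_train = rng.random((1875, 4))     # illustrative band-logarithm predictors
z_train = rng.random(1875) * 10.5   # illustrative depths (m)

# 50 regression trees (the default base learner), each grown on a bootstrap
# resample of the training points; predictions are averaged across trees.
bag = BaggingRegressor(n_estimators=50, bootstrap=True, oob_score=True,
                       random_state=0).fit(X_train, z_train)
print(bag.oob_score_)               # R^2 estimated on the out-of-bag samples
```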

Support vector regression

Vapnik et al. (1964) proposed support vector machines (SVMs) for solving classification problems and statistical learning applications. As the method has shown high performance and accuracy, it has been extended successfully to regression problems. For a regression problem, suppose that we have a training dataset D = {(yi, ti), i = 1, 2, …, n}, with input vectors yi and target values ti. The regression task is to find a fitting function f(y) that approximates the relation between the input points and the targets, so that the output t can be predicted for any new input point y. This regression function has a loss function describing the difference between the predicted values and the actual target values (Smola and Schölkopf 2004). Support vector regression seeks the flattest function that deviates from the real targets by no more than the insensitive loss ε (Vapnik 2000); in other words, errors smaller than the predefined tolerance ε are ignored, while larger errors are not. Suppose that we have a linear problem with the following equation

$$ F(y)=\left(w\cdot y\right)+b $$
(4)

where w is the weight vector, b ∈ ℝ is the bias, and (w · y) denotes the dot product of w and y.

Flatness in regression problems means searching for a small w or, in other words, minimizing the norm ‖w‖2 in Euclidean space. Thus, the regression can be stated as a convex optimisation problem as follows (Smola and Schölkopf 2004):

Minimize \( \frac{1}{2}{\left\Vert \mathrm{w}\right\Vert}^2 \)

Subject to

$$ \left\{\begin{array}{c}{t}_i-\left(w\cdot {y}_i\right)-b\le \varepsilon \\ {}\left(w\cdot {y}_i\right)+b-{t}_i\le \varepsilon \end{array}\right. $$
(5)

However, this formulation assumes that all points can be approximated within the allowable precision ε, which is not feasible in all cases, so some exceeding errors must be allowed. Cortes and Vapnik (1995) introduced slack variables ζi and ζi* through a soft margin loss function to overcome this problem, and support vector regression then solves the following problem:

Minimize \( \frac{1}{2}{\left\Vert w\right\Vert}^2+C{\sum}_{i=1}^n\left({\zeta}_i+{\zeta}_i^{*}\right) \)

Subject to

$$ \left\{\begin{array}{c}{t}_i-\left(w\cdot {y}_i\right)-b\le \varepsilon +{\zeta}_i\\ {}\left(w\cdot {y}_i\right)+b-{t}_i\le \varepsilon +{\zeta}_i^{*}\\ {}{\zeta}_i,\ {\zeta}_i^{*}\ge 0\end{array}\right. $$
(6)

where C determines the trade-off between the flatness of the function and the amount by which deviations larger than ε are tolerated. The points lying outside the ε margin are called support vectors.

Solving this optimisation problem is easier in its dual formulation, which also allows the SVM to be extended to non-linear functions. As a result, a standard dualisation method using Lagrange multipliers can be applied to solve the SVR optimisation problem. A Lagrange function is constructed from the objective function by introducing a dual set of variables, and the dual optimisation problem can be written as follows (Farag and Mohamed 2004):

Maximize

$$ \left\{\begin{array}{c}-\frac{1}{2}{\sum}_{i,j=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)\left({\alpha}_j-{\alpha}_j^{*}\right)\left({y}_i\cdot {y}_j\right)\\ {}-\varepsilon {\sum}_{i=1}^l\left({\alpha}_i+{\alpha}_i^{*}\right)+{\sum}_{i=1}^l{t}_i\left({\alpha}_i-{\alpha}_i^{*}\right)\end{array}\right. $$

Subject to

$$ \left\{\begin{array}{l}{\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)=0\\ {}{\alpha}_i,\ {\alpha}_i^{*}\in \left[0,C\right]\end{array}\right. $$
(7)

where αi and αi* are the Lagrange multipliers.

As a result, w and the expansion of F(y) can be calculated as follows:

$$ w={\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right){y}_i\kern1em \mathrm{and}\kern1em F(y)={\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)\left({y}_i\cdot y\right)+b $$
(8)

These equations imply that w can be expressed as a linear combination of the training vectors yi.

The bias term b is calculated using the Karush Kuhn Tucker (KKT) conditions (Karush 1939; Kuhn and Tucker 1951) as follows:

$$ b={t}_i-\left(w\cdot {y}_i\right)-\varepsilon \kern1em \mathrm{for}\kern1em {\alpha}_i\in \left(0,C\right) $$
(9)

The non-linearity of the support vector algorithm is achieved by preprocessing the training vectors yi with a map Φ: y → Ƒ into some feature space Ƒ. Figure 7 illustrates the conversion of the decision boundary into a hyperplane after mapping to a feature space.

Fig. 7

Converting the two-dimensional input space into a three-dimensional feature space

For a feasible solution, a kernel k (y, y’) can be used, and the support vector regression algorithm can be rewritten as follows (Farag and Mohamed 2004):

Maximize

$$ \left\{\begin{array}{c}-\frac{1}{2}{\sum}_{i,j=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)\left({\alpha}_j-{\alpha}_j^{*}\right)\,k\left({y}_i,{y}_j\right)\\ {}-\varepsilon {\sum}_{i=1}^l\left({\alpha}_i+{\alpha}_i^{*}\right)+{\sum}_{i=1}^l{t}_i\left({\alpha}_i-{\alpha}_i^{*}\right)\end{array}\right. $$

Subject to

$$ \left\{\begin{array}{l}{\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)=0\\ {}{\alpha}_i,\ {\alpha}_i^{*}\in \left[0,C\right]\end{array}\right. $$
(10)

Also, w and the expansion of F(y) can be rewritten as:

$$ w={\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)\Phi \left({y}_i\right)\kern1em \mathrm{and}\kern1em F(y)={\sum}_{i=1}^l\left({\alpha}_i-{\alpha}_i^{*}\right)k\left({y}_i,y\right)+b $$
(11)

Then, a kernel function k(y, y′) is needed that corresponds to a dot product in some feature space, i.e., one that transforms the non-linear input space into a high-dimensional feature space. Several kernels can be used to perform this transformation, such as the linear, polynomial, Gaussian radial basis function, and Pearson universal kernels. The latter was proposed by Ustun et al. (2006), who argued that it is robust, saves computation time, and leads to better generalisation performance of SVRs. The Pearson universal kernel can be written as follows:

$$ k\left({y}_i,{y}_j\right)=\frac{1}{{\left[1+{\left(\frac{2\sqrt{{\left\Vert {y}_i-{y}_j\right\Vert}^2}\ \sqrt{2^{\left(1/\omega \right)}-1}}{\sigma}\right)}^2\right]}^{\omega }} $$
(12)

where ω and σ are kernel parameters that control the half-width and the tailing factor of the peak.

Platt (1998) proposed the sequential minimal optimisation (SMO) algorithm for solving the optimisation problem in SVR and argued its superiority over other optimisation solutions. SMO is an iterative algorithm that solves the optimisation problem analytically by breaking it into a series of smaller problems. The constraints for the Lagrange multipliers are reduced as follows:

(13)

The algorithm begins by finding a Lagrange multiplier αi that violates the KKT conditions (Karush 1939; Kuhn and Tucker 1951), then chooses a second Lagrange multiplier, optimizes the pair, and repeats these steps until convergence. When all Lagrange multipliers satisfy the conditions within the predefined tolerance, the problem is solved.
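A minimal sketch of such an SVR is shown below, using scikit-learn's SVR (an SMO-type solver) with the Pearson universal kernel of Eq. 12 supplied as a custom kernel callable. The parameter values echo those reported in the Results section, but the code is illustrative and is not the implementation used in this study.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVR

def puk_kernel(A, B, omega=0.5, sigma=0.5):
    """Pearson VII universal kernel of Eq. 12; omega and sigma control the
    half-width and tailing factor of the peak."""
    d = cdist(A, B, metric="euclidean")          # d = sqrt(||yi - yj||^2)
    return 1.0 / (1.0 + (2.0 * d * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma) ** 2) ** omega

# epsilon-insensitive SVR with the PUK Gram matrix passed as a custom kernel
svr = SVR(kernel=puk_kernel, C=1.0, epsilon=0.0, tol=1e-3)
# svr.fit(X_train, z_train); z_pred = svr.predict(X_test)
```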

Methods used for comparison

Lyzenga generalized linear model correlation approach

To overcome the demerits of single-band linear correlation, which assumes that the water column is uniform and that the bottom surface is homogeneous, Lyzenga (1985) used a combination of two bands to correct these errors. Lyzenga et al. (2006) later generalized this approach to multiple bands and showed that it yields water depths that are not influenced by other factors such as the water column and bottom type. The water depths can be calculated using Eq. 14 (Lyzenga et al. 2006):

$$ Z={a}_0+{\sum}_{i=1}^N{a}_i{X}_i $$
(14)

where Z = the water depth, a0 and ai = coefficients determined through multiple regression using the reflectance of the corresponding bands and the known depths, and Xi = the logarithm of corrected band i.

Recently, Sánchez-Carnero et al. (2014) used a GLM to link a linear combination of non-random explanatory variables X, such as image bands, to a dependent random variable Y, such as the water depth values. The GLM represents the least-squares fit of the link of the response to the data (Gentle et al. 2012). The mean of the observed non-linear variable can be fitted to a linear predictor of the explanatory variables using the link function g[μY] as follows (Sánchez-Carnero et al. 2014):

$$ g\left[{\mu}_Y\right]={\beta}_0+{\sum}_i{\beta}_i{X}_i+{\sum}_{ij}{\beta}_{ij}{X}_i{X}_j $$
(15)

where β0, βi, and βij are coefficients and Xi and Xj are variable combinations.
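A minimal least-squares sketch of Eq. 14 is given below (linear terms only; the interaction terms Xi Xj of Eq. 15 could be appended as extra columns of the design matrix). The function and variable names are illustrative.

```python
import numpy as np

def fit_lyzenga_glm(X_logs, depths):
    """Least-squares fit of Eq. 14: Z = a0 + sum_i a_i * X_i,
    where each column of X_logs is the logarithm of a corrected band or band ratio."""
    A = np.column_stack([np.ones(len(depths)), X_logs])   # prepend the intercept column
    coeffs, *_ = np.linalg.lstsq(A, depths, rcond=None)
    return coeffs                                          # [a0, a1, ..., aN]

def predict_depth(X_logs, coeffs):
    return coeffs[0] + X_logs @ coeffs[1:]
```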

Artificial neural network approach

Artificial neural networks (ANNs) have been widely used in remote sensing for classification and regression problems (Mather and Tso 2009). The multilayer perceptron (MLP) model with the back-propagation (BP) algorithm is a supervised approach used to model the non-linear relationship between input and output data (Rumelhart et al. 1986). The MLP consists of three parts: the input layer of neurons representing the available data, in this case the multispectral image band values; the hidden layer, where the network training takes place; and the output layer, which here is the water depths. The BP algorithm starts from initial network weights and searches for the weights with the least error by comparing the actual outputs with the desired values in an iterative process until a predefined level of accuracy is reached (Razavi 2014). The log-sigmoid function is used to transfer the net inputs to the hidden layer, as its derivative is easily computed and it is commonly used; a linear function is used from the hidden layer to the output nodes (Ceyhun and Yalçın 2010). The Levenberg-Marquardt training algorithm is used to train the BP network, updating the weight and bias values, as it is the first-choice supervised algorithm recommended for training medium-sized feed-forward neural networks (Ranganathan 2004).

The algorithm is given in Eq. 16 (Hagan and Menhaj 1994):

$$ {X}_{k+1}={X}_k-{\left[{J}^{\mathrm{T}}J+\mu I\right]}^{-1}{J}^{\mathrm{T}}{\varepsilon}_k $$
(16)

where Xk = the vector of current weights and biases, εk = the vector of network errors, J = the Jacobian matrix of the network errors, μ = a scalar damping factor that controls the update step, k = the iteration number, I = the identity matrix, and the superscript T denotes the matrix transpose.
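For illustration, an MLP of this form can be sketched as follows. scikit-learn offers no Levenberg-Marquardt trainer (the study used MATLAB's implementation), so L-BFGS is used here purely as a stand-in, and the network size is assumed to be a single hidden layer of ten neurons.

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer of ten neurons with a sigmoid (logistic) activation and a linear
# output. L-BFGS replaces the Levenberg-Marquardt trainer used in the original study.
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
# mlp.fit(X_train, z_train); z_pred = mlp.predict(X_test)
```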

Results

Both the Landsat 8 and Spot 6 multispectral images of the study areas were preprocessed for bathymetry by converting the image pixel values to radiance using the image metadata file values and then applying the atmospheric and sun glint corrections to the radiance values. All steps were performed in the ENVI 5.3 environment. The FLAASH tool was used for atmospheric correction, with the input parameters set as described in the methodology. The images resulting from the atmospheric correction were checked using field spectral curves for each reflectance value.

For bathymetry mapping, the proposed SVR, LSB, and BAG approaches were applied to the preprocessed Landsat 8 and Spot 6 multispectral images and compared with the NN and GLM methods. The GLM yielded the following equations for the three study areas, respectively:

$$ {Z}_{\mathrm{Alex\ port}}=17.25-4.69\,{L}_{\mathrm{G}}-0.51\,{L}_{\mathrm{R}}+0.06\,B/R-0.10\,G/R+0.65\,{L}_{\mathrm{G}}{L}_{\mathrm{R}}-0.03\,{L}_{\mathrm{G}}\,B/R-2.30\,{L}_{\mathrm{R}}\,G/R+0.06\,{L}_{\mathrm{G}}\,G/R+0.004\,\left(B/R\right)\left(G/R\right) $$
(17)
$$ {Z}_{\mathrm{Nubia\ lake}}=2912.2-904.96\,{L}_{\mathrm{G}}+1219.7\,{L}_{\mathrm{R}}-3024.6\,B/R-1900.7\,G/R+19.35\,{L}_{\mathrm{G}}{L}_{\mathrm{R}}-1.06\,{L}_{\mathrm{G}}\,B/R+1.07\,{L}_{\mathrm{R}}\,G/R-18.44\,{L}_{\mathrm{G}}\,G/R-1281.1\,{L}_{\mathrm{R}}\,B/R+2143.8\,\left(B/R\right)\left(G/R\right) $$
(18)
$$ {Z}_{\mathrm{Shiraho}}=-15.185+29.67\,{L}_{\mathrm{G}}-39.73\,{L}_{\mathrm{R}}-10.48\,B/R+73.43\,G/R+0.44\,{L}_{\mathrm{G}}{L}_{\mathrm{R}}+28.63\,{L}_{\mathrm{G}}\,B/R-15.2\,{L}_{\mathrm{R}}\,G/R+5.09\,{L}_{\mathrm{G}}\,G/R-18.22\,{L}_{\mathrm{R}}\,B/R+3.36\,\left(B/R\right)\left(G/R\right) $$
(19)

where LG is the logarithm of the corrected green band, LR is the logarithm of the corrected red band, and B/R and G/R are the blue/red and green/red logarithm ratio values.

Support vector regression was applied with SMO for solving the optimisation problem and the PUK kernel function. The SVR parameters were set to C = 1, ε = 0.0, ζ = 0.001, and tolerance = 0.001, and the PUK kernel parameters were ω = 0.5 and σ = 0.5. The NN training function was Levenberg-Marquardt back-propagation with ten hidden layers. Finally, the LSB and BAG models were constructed as ensembles of 50 regression trees. The parameters of each algorithm were selected based on the minimum RMSE and highest R2 values. All algorithms and all statistical analyses were implemented in the MATLAB environment; the support vector regression code was originally developed by Clark (2013).

Figures 8, 10, and 12 show the bathymetric maps computed by applying each model to the Landsat 8 and Spot 6 images for each study area; Figs. 9, 11, and 13 show the evaluation of each model; and Tables 1, 2, and 3 summarize the corresponding RMSE and R2 values.

Fig. 8

Bathymetric maps derived by applying each algorithm using Landsat 8 imagery over the Alexandria harbor area, Egypt. a GLM, b NN, c SVR, d LSB, e BAG

Fig. 9

The continuous fitted models for Alexandria port area, Egypt. Depths are represented as points, and the continuous line represents the continuous fitted model a GLM, b NN, c SVR, d LSB, e BAG

Table 1 The RMSEs and R 2 of all methods for bathymetry detection over Alexandria port area, Egypt
Table 2 The RMSEs and R 2 of all methods for bathymetry detection for Nubia Lake entrance zone, Sudan
Table 3 The RMSEs and R 2 of all methods for bathymetry detection for Shiraho Island, Japan
Fig. 10

Bathymetric maps derived by applying each algorithm using Spot 6 imagery over Nubia Lake entrance zone, Sudan. a GLM, b NN, c SVR, d LSB, e BAG

Fig. 11

The continuous fitted models for Nubia Lake entrance zone, Sudan. Depths are represented as points, and the continuous line represents the continuous fitted model a GLM, b NN, c SVR, d LSB, e BAG

Fig. 12

Bathymetric maps derived by applying each algorithm using Landsat 8 imagery over Shiraho Island area, Japan. a GLM, b NN, c SVR, d LSB, e BAG

Fig. 13

The continuous fitted models for Shiraho Island, Japan. Depths are represented as points, and the continuous line represents the continuous fitted model a GLM, b NN, c SVR, d LSB, e BAG

Discussion

The selection of appropriate bands for bathymetry was performed through a statistical analysis of the correlation between water depth and the imagery bands. The red and green bands, as well as the blue/red and green/red logarithm band ratios, demonstrated a strong correlation with water depth (Doxani et al. 2012; Sánchez-Carnero et al. 2014).

The Lyzenga GLM correlates the band combination directly with water depth. The best combination of the selected bands is found through a trial process based on the lowest RMSE and highest R2 values. In our experiments, the best combination was the green, red, blue/red, and green/red band logarithms. NN correlates the multiple imagery bands as inputs with water depth as output through multidimensional non-linear functions. Many researchers have confirmed that NN outperforms conventional empirical methods because it finds the highest correlation between the imagery data and the in situ water depths (Gholamalifard et al. 2013). Our results also showed that NN outperforms the Lyzenga GLM. The main disadvantage of NN is the many trials needed to find the best weights, as it is an unstable approach with significant fluctuations of RMSE and R2 from one trial to another.

The SVR algorithm, on the other hand, is a stable approach that uses sequential minimal optimisation to correlate the imagery bands with water depth. The optimum kernel function was selected, after several trials, from among the radial basis function, polynomial, and Pearson universal kernels based on minimum RMSE and maximum R2; the Pearson universal kernel outperformed the other kernel functions with the highest R2 and lowest RMSE. The optimum SVR parameters C, ε, ζ, ω, and σ were likewise selected based on the minimum RMSE criterion.

LSB and BAG are ensembles of regression trees built according to two different principles for combining trees. LSB works sequentially, focusing on the values missed by the previous trees, whereas the BAG ensemble averages regression trees built from bootstrapped random selections of the input data. For both ensembles, the optimum number of regression trees was selected after sequential trials with various numbers of trees (10, 20, 30, ..., 100); the best values were achieved with 50 trees. Both algorithms use the Gini diversity index for splitting, and the trees are not pruned. The randomness of the regression trees and the splitting of the data into independent training and testing sets indicate that the ensembles did not overfit the input data. The results show that all the proposed algorithms are preferable to the Lyzenga GLM and that the BAG ensemble outperforms, and is more stable than, the NN approach.

Many researchers have used low-resolution satellite images for bathymetry detection, especially Landsat images, exploiting their free availability (Gholamalifard et al. 2013; Sánchez-Carnero et al. 2014; Pacheco et al. 2015; Salah 2016). To compare our results with previous results from similar studies, the pixel size, water quality conditions, bottom type, availability of field points in the study area, and depth range should be considered. Sánchez-Carnero et al. (2014) applied the Lyzenga GLM, principal component analysis (PCA), and green band correlation algorithms to Spot 4 imagery with 10-m resolution to detect bathymetry over turbid water in a shallow coastal area; their results suggested that the Lyzenga GLM outperformed the PCA and green band correlation methods, with an RMSE of 0.88 m over a depth range of 6 m. Pacheco et al. (2015) tested the Lyzenga GLM using the Landsat 8 coastal, blue, and green bands over clear waters in a shallow coastal area and obtained an RMSE of 1.01 m over a depth range of 12 m. Poliyapram et al. (2014) tested a new method for removing atmospheric and sun glint errors using the shortwave infrared band of Landsat 8 imagery and then applied the Lyzenga GLM approach to the corrected bands over a slightly turbid shallow coastal area, obtaining an RMSE of 1.24 m over a depth range of 10 m. Gholamalifard et al. (2013) applied red band correlation, PCA, and NN to Landsat 5 imagery over a deep water area and reported better performance for the NN approach compared to PCA and red band correlation, with an RMSE of 2.14 m over a depth range of 45 m. Our results for the NN and Lyzenga GLM approaches are comparable to the results of these studies within the same depth ranges.

The results demonstrate an improvement in bathymetry accuracy for the three proposed methods over the Lyzenga GLM, along with less influence from environmental factors. This improvement ranges from 0.04 to 0.35 m across the three methods. The BAG algorithm also shows higher accuracy and greater stability than NN over the three study areas, with their different bottom covers and satellite image resolutions.

Conclusions

This study proposed three approaches for bathymetry detection. These approaches were applied in three different areas: the low-turbidity, deep, silt-sand-bottomed area of Alexandria port, Egypt, with depths up to 10.5 m; the highly turbid, clayey-bottomed entrance zone of Lake Nubia, Sudan, with water depths up to 6 m; and the low-turbidity, shallow coral reef area of Shiraho, Ishigaki, Japan, with a 14-m depth range. The proposed approaches used the green, red, blue/red, and green/red band ratio logarithms of Landsat 8 and Spot 6 satellite images, corrected for atmospheric and sun glint effects, as input data and water depth as output. To validate the proposed approaches, they were compared with the Lyzenga GLM and NN approaches, and all results were compared with echosounder water depth data. The Lyzenga GLM gave RMSE values of 0.96, 1.02, and 1.16 m, whereas the NN yielded RMSE values of 0.87, 0.96, and 1.08 m in the three study areas, respectively. The proposed approaches, SVR, LSB, and BAG, produced RMSE values of 0.92, 0.88, and 0.65 m for the first study area; 0.98, 0.99, and 0.85 m for the second study area; and 1.11, 1.09, and 0.80 m for the third study area, respectively. From these results, it can be concluded that BAG, LSB, and SVR provide more accurate bathymetry mapping over diverse areas than the Lyzenga GLM. Additionally, the BAG ensemble outperformed the NN approach.