1 Introduction

Water transparency is a crucial factor for understanding the status of aquatic ecosystems. It controls the amount of light available to autotrophs, which occupy the first trophic level of the aquatic food chain. Water transparency is influenced by other important water quality variables [i.e., chlorophyll-a concentration (Chl-a), colored dissolved organic matter (CDOM) and total suspended sediments (TSS)] (Chang et al. 2013; Song et al. 2014; Alikas and Kratzer 2017). These water constituents are linked to biotic and abiotic processes in coastal and marine ecosystems, including phytoplankton abundance, bacterial activity and the interrelationships between inland waters and coastal ecosystems (Kaiser et al. 2011; Cherukuru et al. 2014; Boufeniza et al. 2020). Thus, synoptic monitoring of water transparency is fundamental to understanding, maintaining, and sustaining coastal and marine ecosystems.

Water transparency has been routinely monitored for decades by measuring the Secchi disk depth (ZSD), i.e., the depth at which a white or black-and-white disk 20–30 cm in diameter disappears from view (Preisendorfer 1986; Lewis et al. 1988; Liu et al. 2019). Because ZSD is simple to measure, large in situ ZSD datasets have been established for many sites worldwide (Seafarers et al. 2017). However, such point measurements cannot efficiently represent the spatial and temporal variability of water transparency in coastal and marine areas. Mapping the spatial and temporal distributions of ZSD requires an integrated approach that combines in situ data with ocean color images, such as those from the Moderate‐Resolution Imaging Spectroradiometer (MODIS) and the Medium‐Resolution Imaging Spectrometer (MERIS).

Various approaches have been used to map ZSD in different aquatic ecosystems from remotely sensed data. The accuracy of these approaches depends on multiple factors, including the choice of atmospheric correction algorithm, the modeling method, and the criteria and time window used to match in situ data with satellite data (e.g., Constantin et al. 2016; Kulshreshtha and Shanmugam 2017). The standard atmospheric correction approach assumes negligible water-leaving reflectance in the near-infrared (NIR) bands over water bodies. This approach has been successfully used to estimate ZSD in open seas and oceans, where Chl-a concentrations mainly control the optical properties of the water (e.g., Lewis et al. 1988; Seafarers et al. 2017; Shi et al. 2014). However, it is less effective in turbid waters, where the NIR reflectance of the water itself cannot be ignored (Hu et al. 2000; Wang et al. 2012).

Two alternative atmospheric correction approaches have been developed to improve the reliability of remote sensing reflectance (Rrs) over turbid waters: the Management Unit of the North Sea Mathematical Models (MUMM) algorithm developed by Ruddick et al. (2006), and the Near Infrared-Short Wave Infrared (NIR-SWIR) algorithm developed by Shi and Wang (2007). The MUMM algorithm accounts for nonzero water-leaving Rrs at NIR wavelengths using a parameter derived from the ratio of Rrs at two NIR bands. It also derives the aerosol reflectance over clear waters and applies it to the whole image, assuming that the atmospheric composition is relatively homogeneous over the area of interest. The NIR-SWIR algorithm corrects Rrs over coastal waters using SWIR bands, at which absorption by water is so strong that the water-leaving signal remains negligible even in turbid waters, unlike at NIR bands (Wang et al. 2009).
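The NIR decoupling behind the MUMM approach can be sketched as a two-equation solve. This is a minimal illustration under stated assumptions, not the SeaDAS implementation: it ignores diffuse transmittance and multiple-scattering terms, treats the Rayleigh-corrected reflectance at two NIR bands as the sum of an aerosol part and a water-leaving part, and holds the water ratio α and the aerosol ratio ε constant over the scene. The function name and the α and ε values in the example are hypothetical.

```python
def mumm_nir_water_reflectance(rho_rc_1, rho_rc_2, alpha, epsilon):
    """Split Rayleigh-corrected NIR reflectance into water and aerosol parts.

    Simplified model (an assumption): rho_rc = rho_a + rho_w at two NIR bands
    (1 = shorter, 2 = longer wavelength), with scene-constant ratios
    alpha = rho_w1 / rho_w2 and epsilon = rho_a1 / rho_a2.
    Returns (rho_w1, rho_w2, rho_a1, rho_a2).
    """
    if alpha == epsilon:
        raise ValueError("alpha and epsilon must differ for a unique solution")
    # Substitute rho_a2 = rho_rc_2 - rho_w2 into
    # rho_rc_1 = epsilon * rho_a2 + alpha * rho_w2 and solve for rho_w2.
    rho_w2 = (rho_rc_1 - epsilon * rho_rc_2) / (alpha - epsilon)
    rho_a2 = rho_rc_2 - rho_w2
    return alpha * rho_w2, rho_w2, epsilon * rho_a2, rho_a2
```

With illustrative values α = 1.9 and ε = 1.05, Rayleigh-corrected reflectances of 0.0181 and 0.014 split into a water-leaving reflectance of 0.004 and an aerosol reflectance of 0.010 at the longer NIR band; under the standard black-pixel assumption the whole 0.014 would have been attributed to aerosols.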

Furthermore, two modeling approaches (i.e., empirical and semi-analytical) can be used to derive ZSD. Empirical models derive ZSD statistically by relating Rrs to simultaneous in situ ZSD observations. These models are constructed either by simple regression of in situ ZSD against a band ratio, or by using Rrs in different bands as predictors of ZSD in multiple regression or neural network analysis (Chen et al. 2015; Stock 2015; Kulshreshtha and Shanmugam 2017; Shi et al. 2018). Empirical models have usually been constructed to map local or regional distributions of ZSD (e.g., Toming et al. 2017; Zhang et al. 2012). Semi-analytical models resolve the apparent optical properties (AOP) from semi-analytically derived inherent optical properties (i.e., absorption, scattering and backscattering coefficients) based on the radiative transfer equation. Semi-analytical approaches have attempted to provide a universal ZSD model for both clear and turbid waters. Although recent semi-analytical techniques have shown promising results in this respect (Lee et al. 2015; Shang et al. 2016), they still perform worse over turbid waters than over clear waters (Liu et al. 2019). Thus, deriving ZSD empirically with advanced statistical methods is still favorable for local and regional estimation of ZSD (Kulshreshtha and Shanmugam 2017).

The accuracy of ZSD models is also influenced by the time window used to match Rrs with in situ data. A narrow time window between the satellite overpass and in situ measurements reduces errors induced by temporal changes in water constituents (Bailey and Werdell 2006; Petus et al. 2014). Another potential source of error is the way in situ measurements are matched with satellite data. In situ measurements are usually compared to a group of pixels (i.e., 3 × 3, 5 × 5 or 7 × 7 pixels) centered on each in situ point to compensate for the positional uncertainty of satellite images (Bailey and Werdell 2006). These pixel values are most often aggregated by their mean, which can be misleading when the group of pixels is heterogeneous. To reduce the effect of heterogeneity on the mean, Bailey and Werdell (2006) excluded outlier pixels that exceed predefined lower and upper limits. They also noted that using a larger group of pixels increases the heterogeneity effect, especially in dynamic waters. To overcome the heterogeneity issue, Chen et al. (2007) suggested excluding the mean value of matching pixels when their coefficient of variation (CV) is greater than 40%. Another remedy for the heterogeneity effect is to use the median value, which is robust to outliers (Goyens et al. 2013).
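To illustrate why the aggregation choice matters, the sketch below aggregates a hypothetical 3 × 3 window in which one pixel is much brighter than the rest (for example, near a cloud edge). It applies CV screening with the 40% threshold of Chen et al. (2007) and reports the median alongside. Computing the CV with the sample standard deviation and representing invalid pixels as `None` are our assumptions.

```python
import statistics

def summarize_window(pixels, cv_max=0.40):
    """Aggregate the pixel values of one matchup window (e.g., 3 x 3).

    Returns (screened_mean, median, cv); screened_mean is None when the
    window's coefficient of variation exceeds cv_max. Needs >= 2 valid pixels.
    """
    values = [p for p in pixels if p is not None]  # drop invalid (None) pixels
    mean = statistics.mean(values)
    cv = statistics.stdev(values) / mean  # sample SD / mean (an assumption)
    screened_mean = mean if cv <= cv_max else None
    return screened_mean, statistics.median(values), cv

# Hypothetical window: eight similar Rrs values and one bright outlier
screened, med, cv = summarize_window([0.010] * 8 + [0.030])
```

Here the single outlier pushes the CV to about 55%, so the mean is rejected, while the median (0.010) still represents the eight homogeneous pixels. Setting `cv_max` to 0.30 or 0.15 gives the M-30 and M-15 screening used in this study.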

Systematic procedures for matching in situ ZSD with satellite images, and for selecting the appropriate atmospheric correction algorithm and modeling technique, are important for constructing an accurate and robust ZSD model. In the Northern Arabian (Persian) Gulf (NAG), there have been few attempts to model ZSD. Two MODIS-derived ZSD models were established for the Arabian Gulf region by Alsahli (2009) and Al-Kaabi et al. (2016); both studies used the standard atmospheric correction algorithm. To our knowledge, no MERIS-derived ZSD model has yet been established for NAG waters. Therefore, this study aimed to: (1) examine the performance of three atmospheric correction algorithms (standard, MUMM and NIR-SWIR) over the NAG, and (2) identify the optimal in situ-satellite matching technique for constructing an empirical ZSD model over the NAG by investigating the relationship between in situ ZSD and the Rrs of MODIS and MERIS.

2 Study Area

The NAG lies between 28° 31′ and 30° 01′ N and between 47° 41′ and 51° 05′ E (Fig. 1). This part of the Arabian Gulf is shallow, with depths of less than 30 m in most areas, and the depth gradually increases towards the south. The general water movement in the NAG is governed by counter-clockwise circulation driven by the wind regime of the area (Al-Yamani et al. 2004). For most of the year, this circulation transports water from the eastern and northern banks of the NAG to the western bank (Abuzinada et al. 2008). Freshwater discharges into the NAG through five rivers: the Shatt Al-Arab, Karun, Hendijan, Hilleh and Mond (also known as Mand in the literature). Extensive anthropogenic activities along these rivers have degraded their water quality and consequently contributed to changes in NAG water characteristics, including water transparency (Marzouni et al. 2014; Rahmanpour et al. 2014; Al-Mahmood and Mahmood 2019; Cunningham et al. 2019). Recently, Alsahli and Nazeer (2021) reported observable changes in NAG water transparency over the last two decades. NAG water transparency has been affected by regional and local factors, including dust storms and extensive anthropogenic activities along the NAG coasts (Al-Ghadban and El-Sammak 2005; Karbassi et al. 2005). Ultimately, these factors add environmental stress to the marine life of the NAG (Alsahli 2009).

Fig. 1

The northern part of Arabian Gulf (NAG). The stars and black circles along Kuwait’s territorial waters represent sites where seawater quality is observed by Kuwait Environmental Public Authority (KEPA) and Ministry of Public Works (MPW), respectively

3 Data Used

3.1 In Situ Data

The in situ ZSD data collected from Jan. 2003 to Apr. 2015 were used to model the water transparency using MODIS and MERIS images. This period was selected to maximize the number of in situ observations for model development and capture any potential seasonal variability affecting the model robustness. The in situ data were obtained from the Kuwait Environmental Public Authority (KEPA) and Ministry of Public Works (MPW) of Kuwait. The KEPA collects water quality data monthly at 13 sites along Kuwait territorial waters (Fig. 1). The MPW started collecting water quality data in 2013 to monitor the environmental status of Kuwait Bay during the Sheikh Jaber Al Ahmad Al Sabah Causeway Project, one of the mega projects in the country. In situ ZSD data of MPW used in this study were collected in Kuwait Bay from Jul. 2013 to Apr. 2015 at six sites (Fig. 1). In situ ZSD from both datasets (KEPA and MPW) ranged from 0.5 to 9.5 m.

3.2 Satellite Data

MODIS (Aqua) and MERIS data were used to model ZSD over the NAG. Both sensors provide almost daily coverage of the study area, which increases the number of in situ measurements coincident with satellite overpasses. The MODIS sensors aboard the Terra and Aqua platforms were launched in Dec. 1999 and May 2002, respectively. They cover the entire Earth’s surface every 1–2 days with spatial resolutions ranging from 250 to 1000 m and a swath width of 2330 km. MODIS Terra and Aqua pass over the study area at approximately 10:30 A.M. and 1:00 P.M. (local time), respectively. Both carry identical multispectral bands suitable for observing the bio-optical and physical characteristics of water bodies. MODIS Terra, however, has experienced general system degradation since 2007, reducing its suitability for quantitative analyses (Franz et al. 2008). Therefore, only MODIS Aqua level-1A data coincident with in situ data were obtained from the Ocean Color website (http://oceancolor.gsfc.nasa.gov).

MERIS level-1 data coincident with in situ data were also obtained from the Envisat MERIS website (http://merisfrs-merci-ds.eo.esa.int/merci). MERIS images have a 300 m spatial resolution and were acquired over the study area at around 10:00 A.M. (local time). The mission ended in 2012 (Nilson et al. 2012). However, the European Space Agency (ESA) continued the MERIS mission by launching Sentinel-3A and 3B in Feb. 2016 and Apr. 2018, respectively (Nilson et al. 2012; European Space Agency (ESA) 2018). The MERIS archive and the current ESA satellites provide an important resource for studying biophysical variables of marine and coastal waters.

4 Methodology

4.1 Satellite Data Processing

MODIS and MERIS images with wide viewing angles or affected by environmental conditions, such as dust storms and clouds, were disregarded to minimize potential errors when modeling ZSD (Bailey and Werdell 2006). In addition, a ± 3-h window around the satellite overpass was used to match in situ observations with MODIS and MERIS data, to reduce differences caused by the time lag between the two datasets (Vaičiūtė et al. 2012). Alsahli (2009) showed that using a ± 3-h window around the satellite overpass significantly reduced potential errors induced by temporal variability in the in situ measurements. This time window has also been frequently used in other studies (e.g., Chen et al. 2014a, b; Delgado et al. 2014).
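The ± 3-h matching rule amounts to a simple filter on sample timestamps. In the sketch below, the record layout `(site_id, sample_time, zsd_m)`, the site identifiers and the times are hypothetical, and both timestamps are assumed to be in the same time zone.

```python
from datetime import datetime, timedelta

def match_samples(samples, overpass_time, hours=3.0):
    """Keep in situ records taken within +/- `hours` of a satellite overpass.

    Each record is assumed to be a (site_id, sample_time, zsd_m) tuple whose
    datetime shares the overpass time zone.
    """
    window = timedelta(hours=hours)
    return [s for s in samples if abs(s[1] - overpass_time) <= window]

# Hypothetical example: an Aqua-like overpass at about 1:00 P.M. local time
overpass = datetime(2011, 1, 3, 13, 0)
samples = [
    ("K1", datetime(2011, 1, 3, 10, 30), 2.5),  # 2 h 30 min before: kept
    ("K2", datetime(2011, 1, 3, 9, 45), 1.8),   # 3 h 15 min before: dropped
    ("K3", datetime(2011, 1, 3, 15, 55), 3.0),  # 2 h 55 min after: kept
]
matched = match_samples(samples, overpass)
```

Widening `hours` (for example to the ± 6 h used by some earlier studies) admits more samples at the cost of larger temporal mismatch.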

Selected MODIS and MERIS level-1 data were processed to level-2 using SeaWiFS Data Analysis System (SeaDAS 7.5) software. The MODIS level-1 data were processed to level-2 using three atmospheric algorithms (standard, MUMM and NIR-SWIR) to investigate their efficiency in minimizing effects of atmospheric perturbations over the study area, whereas the MERIS level-1 data were processed to level-2 using the standard atmospheric algorithm. When processing level-1 data to level-2, Rrs and normalized water-leaving radiance (nLw) (from 413 to 754 nm) were computed. The products of Rrs and nLw at these spectral bands have been frequently used in literature to estimate coastal water constituents, such as ZSD, turbidity and TSS (e.g., Doron et al. 2011; Nechad et al. 2010; Stock 2015).

4.2 Satellite Data Extraction Criteria

The MODIS and MERIS level-2 data were matched with in situ data by extracting pixel values from a 3 × 3 window centered on each in situ point. The MODIS level-2 data consisted of three datasets, processed with the three atmospheric correction algorithms (standard, MUMM and NIR-SWIR). To reduce the effect of pixel heterogeneity, we investigated the suitability of four aggregation measures for matching the Rrs data with in situ ZSD measurements: the mean value with CV ≤ 30% (hereafter M-30), the mean value with CV ≤ 15% (hereafter M-15), the median, and the filtered mean suggested by Bailey and Werdell (2006) (Eq. 1).

$${\text{Filtered}}\;{\text{mean}} = \frac{1}{n}\sum\limits_{i:\;\overline{X} - 1.5\sigma < X_{i} < \overline{X} + 1.5\sigma} X_{i}$$
(1)

where \(\overline{X }\) and \(\sigma\) are the mean and standard deviation, respectively, of the extracted pixel values at each in situ point, and n is the number of pixels within (± 1.5 × \(\sigma\)) from the mean. Filtered means with n less than five were excluded.
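Eq. (1) can be sketched in a few lines. Using the population standard deviation for σ is our assumption (the text does not specify it), invalid pixels are represented as `None`, and a returned `None` marks a window discarded under the n ≥ 5 rule above.

```python
import statistics

def filtered_mean(pixels, k=1.5, min_n=5):
    """Filtered mean of a matchup window following Eq. (1).

    Pixels more than k standard deviations from the window mean are dropped;
    None is returned when fewer than min_n pixels survive.
    """
    values = [p for p in pixels if p is not None]  # drop invalid pixels
    if not values:
        return None
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    if sd == 0:
        kept = values  # perfectly homogeneous window: nothing to exclude
    else:
        kept = [x for x in values if abs(x - mean) < k * sd]
    return statistics.mean(kept) if len(kept) >= min_n else None
```

For a window of eight pixels at 0.010 and one at 0.030, the outlier lies about 2.8σ from the mean and is removed, so the filtered mean is 0.010, whereas the plain mean would be about 0.0122.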

The level-2 MODIS and MERIS data were further divided into separate datasets according to the four aggregation measures (M-30, M-15, median and filtered mean). These datasets were analyzed independently to find the optimal aggregation measure for matching remotely sensed data with in situ ZSD.

4.3 Modeling ZSD

Modeling ZSD went through three stages: preparation of variables, construction of ZSD models, and accuracy assessment of the models. As an initial step, the normality of all variables (i.e., in situ ZSD, and Rrs and nLw derived from MODIS and MERIS level-2 images) was tested. Transformations were applied to variables that were not normally distributed to improve the shape of their distributions and the linearity between the regressed variables of the ZSD models (Mertler and Reinhart 2016). Several ZSD models were developed by regressing in situ ZSD measurements on Rrs and nLw at wavelengths from 413 to 754 nm; the MODIS and MERIS datasets were analyzed separately. The statistical relationship between in situ ZSD and these spectral bands was investigated using simple and multiple regression analyses (the latter based on the stepwise technique) to find the model explaining the most ZSD variation within NAG waters.
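A forward-only variant of stepwise selection can be sketched with NumPy. The text does not state the selection criterion; this sketch assumes the Akaike information criterion (AIC), the default criterion of R's `step()` function, and assumes the predictor columns have already been transformed (e.g., log10 Rrs). The function names and band labels are hypothetical, and the data in the test are synthetic.

```python
import numpy as np

def _cols(X, idx):
    """Select predictor columns by index (works for an empty selection)."""
    return X[:, np.asarray(idx, dtype=int)]

def ols_aic(X, y):
    """Least-squares fit of y = b0 + X b; returns (coefficients, AIC)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    rss = float(np.sum((y - A @ coef) ** 2))
    n, k = A.shape
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k
    return coef, n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y, names):
    """Repeatedly add the predictor that lowers AIC most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    _, best_aic = ols_aic(_cols(X, []), y)  # intercept-only model
    while remaining:
        aic, j = min((ols_aic(_cols(X, selected + [j]), y)[1], j) for j in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    coef, _ = ols_aic(_cols(X, selected), y)
    return [names[j] for j in selected], coef
```

A full stepwise procedure would also test removing previously added predictors at each step; the forward-only version is kept here for brevity.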

The accuracy of the ZSD models was evaluated using cross-validation. The robustness of the MODIS-derived ZSD models was assessed using 3–1 (threefold) cross-validation: the dataset was divided into three segments, two of which were used to build the model while the third was used to validate its accuracy. This procedure was repeated three times, rotating the role of the segments (Camstra and Boomsma 1992; Jonathan et al. 2000; D’Alimonte and Zibordi 2003). Because of the limited number of observations, the robustness of the MERIS-derived ZSD models was evaluated using leave-one-out cross-validation (LOOCV). In LOOCV, one observation is held out for validation while the rest are used to construct the ZSD model; the procedure is repeated until every observation has been held out once. Cawley and Talbot (2004) stated that LOOCV provides an accurate estimate of model robustness. The average root mean squared error (RMSE) derived from the cross-validation analysis and the coefficient of determination (r2) were used to rank the ZSD models by robustness. The statistical analysis was carried out in the R programming language. Figure 2 gives an overview of the methods used to model ZSD.
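Both validation schemes can be written as one routine over a list of held-out index sets. This is an illustrative Python sketch, not the authors' R code, and the helper names are hypothetical. It pools the held-out errors from all folds into a single RMSE, because averaging per-fold RMSEs under LOOCV would reduce to the mean absolute error; the `inverse` argument back-transforms values (e.g., `10**v` for a log10 model) so the RMSE is reported in the original units.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + X b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def cv_rmse(X, y, folds, inverse=lambda v: v):
    """Pooled RMSE of held-out predictions; each fold is an index array."""
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)  # all other observations
        coef = fit_ols(X[train], y[train])
        pred = coef[0] + X[test] @ coef[1:]
        errors.append(inverse(pred) - inverse(y[test]))
    err = np.concatenate(errors)
    return float(np.sqrt(np.mean(err ** 2)))

def k_folds(n, k=3, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    return np.array_split(np.random.default_rng(seed).permutation(n), k)

def leave_one_out(n):
    """One fold per observation (LOOCV)."""
    return [np.array([i]) for i in range(n)]
```

`cv_rmse(X, y, k_folds(len(y), 3))` corresponds to the 3–1 scheme and `cv_rmse(X, y, leave_one_out(len(y)))` to LOOCV.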

Fig. 2

An overview of methodology used to model ZSD

5 Results

5.1 Simple Regression Analysis

In situ ZSD observations were matched with the Rrs and nLw products of MODIS and MERIS by calculating the four aggregation measures (i.e., M-30, M-15, median and filtered mean) of the 3 × 3 pixels centered on each in situ observation, so that the Rrs and nLw data were divided into four datasets. Because each aggregation measure applied different pixel exclusion criteria, the number of in situ observations with matching aggregated pixel values varied. The Rrs and nLw products had almost the same relationship with in situ ZSD observations; for brevity, only the Rrs results are reported here. In addition, the MODIS level-2 data processed with the NIR-SWIR atmospheric correction algorithm were excluded at an early stage of the analysis because this algorithm yielded many invalid pixels. The NIR-SWIR algorithm was clearly less effective over the NAG than the standard and MUMM algorithms.

The MODIS \({\mathrm{Rrs}}_{ 531\mathrm{ to }645}\) products atmospherically corrected using the standard algorithm generally had a significant relationship with in situ ZSD (Table 1), with an exception at \({\mathrm{Rrs}}_{ 645}\) extracted by M-15 (r2 = 0.45). By contrast, \({\mathrm{Rrs}}_{645}\) extracted using the filtered mean had the most significant relationship with in situ ZSD. All Rrs datasets (i.e., those extracted by the different aggregation measures) exhibited a broadly similar relationship with in situ ZSD, with two apparent exceptions at \({\mathrm{Rrs}}_{645}\) and \({\mathrm{Rrs}}_{678}\) (Fig. 3a). Among the four aggregation measures, the filtered mean returned a consistent relationship between Rrs and in situ ZSD.
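The r2 values in Tables 1–3 come from simple regressions of log-transformed ZSD on log-transformed Rrs, one band at a time. For a simple linear regression, r2 equals the squared Pearson correlation, so a sketch needs only one correlation per band; the choice of log base does not change r2. The dictionary layout and band labels are assumptions for illustration, and the test data are synthetic.

```python
import numpy as np

def band_r2(zsd, rrs_by_band):
    """r^2 of log(ZSD) regressed on log(Rrs) separately for each band.

    zsd: 1-D array of in situ Secchi depths; rrs_by_band: dict mapping a band
    label to a 1-D array of matched Rrs values (same order, all > 0).
    """
    log_z = np.log10(zsd)
    # For simple linear regression, r^2 is the squared Pearson correlation
    return {band: float(np.corrcoef(np.log10(r), log_z)[0, 1] ** 2)
            for band, r in rrs_by_band.items()}
```

Applying this to each aggregation-measure dataset in turn gives one row of r2 values per dataset, which is how the aggregation measures can be compared band by band.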

Table 1 Coefficients of determination (r2) between log-transformed in situ ZSD and log-transformed MODIS Rrs atmospherically corrected using the standard algorithm
Fig. 3

a The MODIS \({\mathrm{Rrs}}_{443\mathrm{ to }531}\) of the four datasets had a similar relationship with in situ ZSD; differences were clear beyond \({\mathrm{Rrs}}_{531}\), with an exception at \({\mathrm{Rrs}}_{678}\). b The MODIS Rrs of the four datasets exhibited a similar relationship with in situ ZSD; in general, the Rrs dataset extracted using M-15 had the most significant relationship with in situ ZSD. c The MERIS Rrs of the three datasets exhibited some variation in their relationship with in situ ZSD, which became clear beyond \({\mathrm{Rrs}}_{681}\)

Furthermore, the MODIS \({\mathrm{Rrs}}_{ 531\mathrm{ to }678}\) products atmospherically corrected using the MUMM algorithm exhibited a more consistent and significant relationship with in situ ZSD than those corrected using the standard algorithm (Table 2). The Rrs of the four datasets exhibited a generally consistent and similar relationship with in situ ZSD. \({\mathrm{Rrs}}_{555}\) had the most significant relationship with in situ ZSD, especially \({\mathrm{Rrs}}_{555}\) extracted by M-15 (r2 = 0.72) (Fig. 3b). For the Rrs dataset extracted by the filtered mean, the most significant relationship with in situ ZSD was observed at \({\mathrm{Rrs}}_{555}\) and \({\mathrm{Rrs}}_{645}\) (r2 = 0.64).

Table 2 Coefficients of determination (r2) between log-transformed in situ ZSD and log-transformed MODIS Rrs atmospherically corrected using MUMM algorithm

For the MERIS Rrs products, three datasets of \({\mathrm{Rrs}}_{443\mathrm{ to }754}\) were extracted using the mean, filtered mean and median of the 3 × 3 pixels centered on each in situ observation; CV-based exclusion of mean values was not applied because of the limited number of matching points (n = 72). The \({\mathrm{Rrs}}_{560\mathrm{ to }681}\) of the three datasets had a significant relationship with in situ ZSD (Table 3). The most significant relationship between in situ ZSD and MERIS Rrs was observed at \({\mathrm{Rrs}}_{620}\) extracted by the filtered mean (r2 = 0.62). For all three datasets, \({\mathrm{Rrs}}_{510}\) and \({\mathrm{Rrs}}_{560}\) showed a similar relationship with in situ ZSD, whereas the relationship beyond \({\mathrm{Rrs}}_{681}\) varied among the three datasets (Fig. 3c).

Table 3 Coefficients of determination (r2) between log-transformed in situ ZSD and log-transformed MERIS Rrs

5.2 Multiple Regression Analysis

Simple regression analysis revealed that Rrs (from MODIS and MERIS) responded differently to in situ ZSD variations. The MODIS and MERIS Rrs at green and red wavelengths (λ > 530 nm) were very responsive to in situ ZSD variations, whereas the blue wavelengths were less responsive. The red-edge and NIR wavelengths (700 nm < λ < 800 nm) exhibited an inconsistent relationship with in situ ZSD. We observed that each spectral band from \({\mathrm{Rrs}}_{443}\) to \({\mathrm{Rrs}}_{754}\) explained part of the in situ ZSD variation, and these contributions can be statistically combined in a single model using multiple regression analysis. Thus, we investigated all these bands together as predictors of in situ ZSD to find those that best capture in situ ZSD variations.

The MODIS level-2 products atmospherically corrected using the standard algorithm showed that the maximum variations of in situ ZSD were explained by \({\mathrm{Rrs}}_{488}\) and \({\mathrm{Rrs}}_{748}\) extracted by M-15 (R2 = 0.73). The Rrs extracted by M-30 and median responded differently to in situ ZSD variations; \({\mathrm{Rrs}}_{678}\) and \({\mathrm{Rrs}}_{748}\) extracted by M-30 explained the most in situ ZSD variations (R2 = 0.65), whereas \({\mathrm{Rrs}}_{547}\) and \({\mathrm{Rrs}}_{645}\) extracted by the median value had the most significant relationship with in situ ZSD (R2 = 0.64) (Table 4). Multiple regression models using Rrs extracted by the filtered mean could not be established as they did not pass the analysis assumptions.

Table 4 Multiple regression models explained more in situ ZSD variations than simple regression models

The MODIS Rrs estimated based on the MUMM atmospheric correction algorithm showed more consistent responses to in situ ZSD variations than the Rrs estimated based on the standard atmospheric correction algorithm. The Rrs extracted by M-30 and median revealed that \({\mathrm{Rrs}}_{488}\) and \({\mathrm{Rrs}}_{555}\) explained the most in situ ZSD variations (R2 = 0.75 and R2 = 0.74, respectively) (Table 4). The other Rrs datasets (i.e., those extracted by M-15 and filtered mean) did not pass the multiple regression analysis assumptions.

Incorporating MERIS Rrs in multiple regression analysis to estimate ZSD also yielded a significant relationship: \({\mathrm{Rrs}}_{681}\) and \({\mathrm{Rrs}}_{443}\) extracted by the mean value had the most significant relationship with in situ ZSD (R2 = 0.78). This relationship weakened (R2 = 0.66) when \({\mathrm{Rrs}}_{681}\) and \({\mathrm{Rrs}}_{443}\) were extracted using the median. When the Rrs were extracted using the filtered mean, the most significant predictors were different, i.e., \({\mathrm{Rrs}}_{620}\) and \({\mathrm{Rrs}}_{709}\) (R2 = 0.72) (Table 5).

Table 5 Statistical comparison between different modeling techniques using MERIS dataset

5.3 ZSD Model Accuracy Assessment

Significant MODIS ZSD models derived through simple and multiple regression analyses (r2 and R2 ≥ 0.5) were evaluated for accuracy and robustness using 3–1 cross-validation. The cross-validated RMSEs of these models ranged from 85 to 220 cm. Among the MODIS ZSD models, the most accurate and robust was the multiple regression model of \({\mathrm{Rrs}}_{555}\) and \({\mathrm{Rrs}}_{488}\) computed with the MUMM algorithm and extracted by M-30 (Table 4) (Fig. 4a, b). An application of this model to MODIS level-2 data is illustrated in Fig. 5a; the model is expressed as:

Fig. 4

a Scatter plot of actual versus estimated log ZSD values from the MODIS model. The area between the dotted lines in a and c is the prediction interval (α = 0.95) that illustrates the model performance. For the MODIS (b) and MERIS (d) models, the random pattern of residuals against fitted values shows the models’ robustness, i.e., errors do not covary with ZSD values. c Scatter plot of actual versus estimated \(\sqrt{{Z}_{\mathrm{SD}}}\) values (denoted as \(\widehat{\sqrt{\mathrm{in} \mathrm{situ} {Z}_{\mathrm{SD}}}}\)) from the MERIS model

$$Z_{{{\text{SD}}}} \left( {{\text{cm}}} \right) = 10^{{0.522 - 2.53 \times \log {\text{Rrs}}_{555} + 1.482 \times \log {\text{Rrs}}_{488} }}$$
(2)
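As a worked illustration, Eq. (2) can be evaluated per pixel as below. Two assumptions are ours: "log" is taken as base 10 (consistent with the 10^(...) back-transform), and Rrs is expressed in sr⁻¹, the unit of SeaDAS level-2 Rrs products. The function name and the Rrs values in the example are hypothetical, and the model is only meaningful within the conditions it was trained on (in situ ZSD of 0.5 to 9.5 m).

```python
import math

def zsd_modis_cm(rrs488, rrs555):
    """Secchi depth (cm) from the MODIS model of Eq. (2).

    Inputs are MUMM-corrected Rrs at 488 and 555 nm (assumed sr^-1, > 0);
    log is assumed to be base 10.
    """
    exponent = 0.522 - 2.53 * math.log10(rrs555) + 1.482 * math.log10(rrs488)
    return 10.0 ** exponent

# Hypothetical pixel: Rrs488 = 0.008 sr^-1, Rrs555 = 0.010 sr^-1
print(round(zsd_modis_cm(0.008, 0.010)))  # prints 298
```

The negative slope on \({\mathrm{Rrs}}_{555}\) means that brighter green reflectance, typical of sediment-laden water, yields a shallower Secchi depth.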
Fig. 5

a The MODIS ZSD model applied to MODIS level-2 data acquired on 3 Jan 2011. b The MERIS ZSD model applied to MERIS level-2 data acquired on 23 Jan 2011

Furthermore, the MERIS models derived from simple and multiple regression analyses were evaluated for accuracy and robustness using LOOCV. Their RMSEs ranged from 74 to 84 cm, showing robustness comparable to that of the MODIS models derived from both atmospheric correction algorithms (Table 5). The most accurate and robust ZSD model was the multiple regression model of \({\mathrm{Rrs}}_{681}\) and \({\mathrm{Rrs}}_{443}\) (Fig. 4c, d). An application of this model to MERIS level-2 data is illustrated in Fig. 5b; the model is expressed as:

$$Z_{{{\text{SD}}}} \left( {{\text{cm}}} \right) = \left[ { - 20.917 - \left( {13.524 \times \log {\text{Rrs}}_{681} } \right) + \left( {33.201 \times \sqrt {{\text{Rrs}}_{443} } } \right)} \right]^{2}$$
(3)
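Eq. (3) predicts the square root of ZSD, so the bracketed linear predictor is squared. The sketch below evaluates it per pixel assuming a base-10 log and Rrs in sr⁻¹; with base-10 logs, typical Rrs values give depths within the observed 0.5–9.5 m range, which is consistent with that reading. The function name and example values are hypothetical, and returning `None` for a negative predicted square root is our design choice, not part of the published model.

```python
import math

def zsd_meris_cm(rrs443, rrs681):
    """Secchi depth (cm) from the MERIS model of Eq. (3).

    The model predicts sqrt(ZSD); log is assumed base 10 and Rrs in sr^-1.
    A negative predicted sqrt(ZSD) lies outside the model's domain, so None
    is returned instead of squaring it into a spurious positive depth.
    """
    root = -20.917 - 13.524 * math.log10(rrs681) + 33.201 * math.sqrt(rrs443)
    return root ** 2 if root >= 0 else None

# Hypothetical pixel: Rrs443 = Rrs681 = 0.005 sr^-1 (roughly 1.6 m)
depth_cm = zsd_meris_cm(0.005, 0.005)
```

Without the sign check, a very turbid pixel (high \({\mathrm{Rrs}}_{681}\)) could produce a negative bracket that squaring would silently turn into a plausible-looking depth.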

6 Discussion

Modeling ZSD using satellite images with high temporal resolution, such as MODIS and MERIS, is essential for building synoptic water quality monitoring programs. In this study, two empirical ZSD models were developed for NAG waters from MODIS and MERIS level-2 data. In developing them, we investigated the performance of three atmospheric correction algorithms (i.e., standard, NIR-SWIR and MUMM) over NAG waters and the suitability of four aggregation measures for extracting pixels (i.e., M-30, M-15, median and filtered mean). The consistent relationship between ZSD and Rrs calculated using the MUMM algorithm demonstrates the suitability of this atmospheric correction algorithm for NAG waters: whatever aggregation measure was used to extract Rrs, the spectral response to ZSD variations had a uniform shape (Fig. 3b). The standard atmospheric correction algorithm also performed well when modeling ZSD from MERIS level-2 data. Two possible explanations for this performance are the relatively high spatial resolution of MERIS (300 m), which could reduce the heterogeneity of pixels matched with in situ data, and the use of a long-wavelength (i.e., red) band that is less affected by atmospheric scattering (Goyens et al. 2013).

Furthermore, the comparison among aggregation measures revealed that different pixel extraction criteria can yield different results. For instance, excluding the mean of heterogeneous pixels (CV > 30%) improved model accuracy more than using the mean after excluding outliers (i.e., the filtered mean), especially when the number of in situ observations concurrent with satellite data was relatively large (n > 100) (Mertler and Reinhart 2016). The filtered mean suggested by Bailey and Werdell (2006) can be a practical alternative aggregation measure for remedying pixel heterogeneity when the number of matching in situ observations is relatively small. The median of pixels is another suitable alternative and proved to be a good representative value of Rrs, whereas excluding the mean of pixels with CV > 15% was impractical because it reduced the number of candidate in situ observations.

MODIS and MERIS Rrs in the visible spectrum were very responsive to in situ ZSD variations, and Rrs at green bands was highly correlated with in situ ZSD observations. However, modeling ZSD with the single most responsive band was not the optimal means of capturing ZSD variation, because water transparency is governed by different water constituents that contribute independently to Rrs at different visible bands. For instance, CDOM contributes significantly to Rrs in the blue and green regions, whereas TSS contributes to Rrs in the red region (Hu et al. 2004). Band ratioing may not be the best solution to this issue: since each spectral band contributes differently to explaining ZSD, a ratio cannot efficiently control the weight of the bands being ratioed. Developing ZSD models with multiple regression analysis addresses this through coefficients (slopes) that determine the contribution of each predictor (Rrs) to explaining ZSD variation (Mertler and Reinhart 2016). Multiple regression is a straightforward and accurate approach to modeling OAPs when they have a linear relationship with Rrs, either directly or after transformation. In contrast, complicated nonlinear relationships between OAPs and Rrs can be efficiently modeled using neural network analysis (Zhang et al. 2002; Chen et al. 2015, 2019; Heddam 2016a). Yuan et al. (2017) and Chang et al. (2000), however, stated that the efficiency of neural network models varies with the selection of the training dataset, meaning that applying such a model to data that differ from the training dataset might yield unreliable results.
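To make the weighting argument concrete, a log-transformed band-ratio model is a constrained case of the log-linear multiple regression used in this study. Taking the two MODIS bands of Eq. (2) as an illustration, the ratio model and the multiple regression model read:

$$\log Z_{\text{SD}} = a + b\,\log\frac{\text{Rrs}_{488}}{\text{Rrs}_{555}} = a + b\,\log\text{Rrs}_{488} - b\,\log\text{Rrs}_{555}$$

$$\log Z_{\text{SD}} = b_{0} + b_{1}\,\log\text{Rrs}_{488} + b_{2}\,\log\text{Rrs}_{555}$$

The ratio model is recovered only when \(b_{2} = -b_{1}\). In Eq. (2) the fitted slopes are 1.482 and −2.53, so the two bands enter with unequal weights that a simple ratio could not represent.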

Moreover, uncertainties in ZSD models can be induced by the in situ ZSD measurements themselves, which are influenced by multiple factors such as the observer’s visual acuity and water surface roughness (Preisendorfer 1986; Heddam 2016b; Alikas and Kratzer 2017). Differences in visual acuity among observers contribute in situ ZSD variations unrelated to the light intensity within the water column. Differences in water surface roughness due to boat movement and wind speed variations can also be a source of error. These factors are inherent to in situ ZSD data collection and cannot be avoided. Considering these factors, ZSD models with an RMSE of about 75 cm appear to be very accurate.

The ZSD models proposed in this study improve the estimation of water transparency over NAG waters compared with previous models for the Arabian Gulf region. The model of Alsahli (2009) (r2 = 0.68), developed to estimate water transparency over Kuwait waters, and that of Al-Kaabi et al. (2016) (r2 = 0.62), developed for the Arabian Gulf region, both used MODIS level-2 data and performed worse than the MODIS ZSD model proposed in this study (R2 = 0.75). Two further factors lend the proposed model more credibility than the previous two: the number of observations used to construct it and the narrower matching time window. The previous two models were developed using fewer observations (n < 67) than the proposed model (n = 152). In addition, Al-Kaabi et al. (2016) used a wider time window (± 6 h) to match satellite images with in situ observations, and their observations might not capture the temporal variability of ZSD because they were insufficiently distributed over the months of the year.

Furthermore, the two previous models used the standard atmospheric correction algorithm and the diffuse attenuation coefficient at 488 nm (Kd488), derived semi-analytically with the Lee et al. (2005) model. This semi-analytical approach performed worse than the empirical approach used in this study. Lee et al. (2015) proposed an improved semi-analytical approach that performed well in estimating ZSD at different locations (Shang et al. 2016; Kulshreshtha and Shanmugam 2017). However, the performance of this newer semi-analytical model is questionable in highly turbid waters, especially those with ZSD below 2 m, as illustrated by Liu et al. (2019). Thus, the empirical approach is still the most favorable option for modeling ZSD in highly turbid waters where a large area of the water body has ZSD below 2 m, such as NAG waters.

The NAG ecosystems are significantly influenced by multiple local and regional factors. These include extensive anthropogenic activities and highly turbid freshwater discharged from rivers carrying large amounts of organic matter and nutrients from agricultural areas and other sources (Rahmanpour et al. 2014; Al-Mahmood and Mahmood 2019; Cunningham et al. 2019). These factors can disturb the NAG ecosystems in different ways. For instance, an increase in organic matter induced by anthropogenic activities provides optimum conditions for heterotrophic plankton communities to grow and overgraze phytoplankton species. The aquatic ecosystem then becomes imbalanced, as reflected in changing water quality indicators (e.g., dissolved oxygen, nitrogen, ammonium, and Chl-a concentrations) (Johannessen et al. 2006; Boufeniza et al. 2020). Thus, synoptic monitoring of ZSD over NAG waters using ocean color satellite products can be linked to these biotic and abiotic activities. Monitoring these activities can provide early warning of catastrophic events and assist in controlling the many pollution sources that degrade NAG ecosystems. The MODIS and MERIS ZSD models proposed in this study can be applied in water quality monitoring programs to estimate the water transparency of the NAG and to understand the factors degrading its ecosystems (Alsahli and Nazeer 2021). The two models can also be extended to cover the entire Arabian Gulf, and probably similar waters, given some training data for accuracy assessment.

7 Conclusion

Multiple regression analysis was performed to develop two empirical ZSD models for NAG waters using MODIS and MERIS level-2 data. In constructing these two ZSD models, the performance of three atmospheric correction algorithms (i.e., standard, NIR-SWIR and MUMM) over NAG waters was evaluated. So was the suitability of four aggregation measures used to extract Rrs (i.e., M-30, M-15, median and filtered mean). Among the three atmospheric correction algorithms, MUMM was the most suitable for the NAG. The comparison among the pixel extraction methods revealed that excluding highly heterogeneous groups of pixels (CV ≥ 30%) was the most appropriate extraction method. However, the filtered mean might be an alternative extraction method when the number of matching in situ observations is relatively small.
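The four aggregation measures can be sketched for a single pixel window. This minimal Python illustration rests on several stated assumptions:

- M-30 and M-15 are the window mean, retained only when the CV is below 30% or 15%.
- The CV uses the population standard deviation.
- Invalid pixels are stored as NaN.
- The filtered mean averages the pixels within 1.5 standard deviations of the window median.

The 3 × 3 window size and the Rrs values are hypothetical.

```python
import numpy as np

def _valid(pixels):
    """Flatten the window and drop invalid (NaN) pixels."""
    arr = np.asarray(pixels, dtype=float).ravel()
    return arr[~np.isnan(arr)]

def cv_percent(pixels):
    """Coefficient of variation (%) of the valid pixels (population SD)."""
    valid = _valid(pixels)
    return float(np.std(valid) / np.mean(valid) * 100.0)

def cv_screened_mean(pixels, max_cv):
    """Assumed M-30 / M-15 measure: window mean, or None if CV >= max_cv (%)."""
    valid = _valid(pixels)
    if valid.size == 0 or np.mean(valid) <= 0:
        return None
    return float(np.mean(valid)) if cv_percent(valid) < max_cv else None

def median_value(pixels):
    """Median of the valid pixels, or None if none are valid."""
    valid = _valid(pixels)
    return float(np.median(valid)) if valid.size else None

def filtered_mean(pixels, k=1.5):
    """Assumed filter: mean of pixels within k SDs of the window median."""
    valid = _valid(pixels)
    if valid.size == 0:
        return None
    med, sd = np.median(valid), np.std(valid)
    kept = valid[np.abs(valid - med) <= k * sd]
    return float(np.mean(kept))

# Hypothetical 3 x 3 window of Rrs (sr^-1) with one bright, outlying pixel
window = np.array([[0.010, 0.011, 0.009],
                   [0.010, 0.010, 0.011],
                   [0.009, 0.010, 0.030]])
m30 = cv_screened_mean(window, 30.0)
m15 = cv_screened_mean(window, 15.0)
med = median_value(window)
fmean = filtered_mean(window)
```

In this example the single bright pixel raises the CV above 30%, so the M-30 measure rejects the window. The median and the filtered mean both return about 0.010 sr^-1.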

The in situ ZSD variations were significantly explained by the ZSD models derived from the MODIS and MERIS level-2 Rrs products (R2 = 0.75 and RMSE = 80 cm, and R2 = 0.78 and RMSE = 74 cm, respectively). Uncertainties in the ZSD models can be induced by the in situ ZSD measurements, which are influenced by multiple factors such as the observer's visual acuity and water surface roughness. Even with this margin of error, the proposed ZSD models improved the estimation of water transparency over NAG waters compared with previous models for the Arabian Gulf region. The ZSD models can be used to accurately map the spatial and temporal distributions of ZSD in NAG waters, providing a better understanding of NAG water quality dynamics. The two ZSD models can also be applied to the entire Arabian Gulf, and probably to other similar waters, provided that training data are available.