Introduction

Water clarity is an integrative measure of overall water quality and a convenient indicator of trophic status in aquatic sciences, and it has been routinely and widely measured as Secchi disk depth (ZSD, m) (Bai et al. 2020; Lee et al. 2016; Shi et al. 2018; Zhao et al. 2011). ZSD depends on light intensity and on optically active constituents (e.g., chlorophyll-a, total suspended matter, and colored dissolved organic matter) and plays a critical role in understanding aquatic environmental variations and biochemical processes (Feng et al. 2019; Song et al. 2021). Owing to its low cost and simplicity of measurement, ZSD is also useful for monitoring other bio-optical properties, such as the availability of photosynthetically active radiation (PAR) (Song et al. 2017; Zhang et al. 2012).

Over the past few decades, satellite remote sensing data have also been used to estimate ZSD because of their large spatial coverage and rapid data acquisition (Fukushima et al. 2017; Mu et al. 2021; Olmanson et al. 2016; Shang et al. 2016). Generally, there are two strategies for retrieving ZSD from remote sensing data: empirical and semi-analytical approaches. Empirical approaches usually estimate ZSD by developing a regression model between field-measured ZSD and remote sensing reflectance (Binding et al. 2015; Olmanson et al. 2016; Shi et al. 2018). However, empirical approaches are site-specific and may not be transferable to other settings (Lee et al. 2016; Ren et al. 2018; Xu et al. 2021a). In contrast, previous studies have confirmed that semi-analytical methods have greater potential for developing a general approach to estimating ZSD (Rodrigues et al. 2017; Yang et al. 2013). Therefore, semi-analytical algorithms generally provide more reliable results for monitoring ZSD in various aquatic ecosystems.

An innovative semi-analytical model (termed ZSDV6) based on radiative transfer theory was proposed by Lee et al. (2015), and its superior performance was confirmed by subsequent studies (Lee et al. 2016; Shang et al. 2016). However, several defects may arise when the original algorithm is applied to waters with diverse optical properties. Firstly, the estimation of the total absorption coefficient at the reference waveband does not work in highly turbid waters because of large uncertainties (Liu et al. 2020; Mishra et al. 2014; Watanabe et al. 2016; Xue et al. 2019). Secondly, the coefficients of the original model, which were determined from open ocean and coastal water datasets, may over- or underestimate inherent optical properties (IOPs). Thirdly, a fixed ratio of 1.5 was assumed between the upward diffuse attenuation coefficient (\({K}_{T}\)) and the diffuse attenuation coefficient \({K}_{d}\) (i.e., \({K}_{T}/{K}_{d}\) = 1.5). Therefore, challenges remain in applying such algorithms to estimate ZSD in various waters, especially in large-scale applications.

The MultiSpectral Instrument (MSI) onboard the twin Sentinel-2A and Sentinel-2B satellites, launched by the European Space Agency Copernicus program in June 2015 and March 2017, respectively, was designed to provide continuous observations similar to the Landsat series. With eight bands spanning the visible to near-infrared (NIR) spectral region (443 ~ 835 nm), a fine temporal resolution (5 days), and high signal-to-noise ratios (SNR), Sentinel-2 MSI has been considered a superior satellite sensor for regular estimation of land-cover change at regional or global scales. Furthermore, with fine spatial resolution (10 m, 20 m, and 60 m) from the visible to NIR bands, MSI data provide more opportunities for enhanced monitoring of water quality parameters in smaller inland waters (e.g., water area < 30 km2). However, the systematic spatiotemporal variation patterns of ZSD derived from these satellite data on a large scale (e.g., national scale) have rarely been investigated.

Therefore, the main purposes of this study are to (1) provide an improved, scalable semi-analytical model for estimating ZSD in inland waters of various turbidity categories; (2) validate the developed model and compare it with existing models; and (3) demonstrate long-term applications of the developed model on a national scale using Sentinel-2 MSI data.

Study area and data collection

Study area

The surveyed area across China was divided into six large sub-regions according to geographical conditions and climatic characteristics (Zhang et al. 2019): the Northeast Plain and Mountains (NE), Eastern Plain (EP), Inner-Mongolian Plateau (MP), Yunnan-Guizhou Plateau (YG), Xinjiang province (XJ), and Tibetan-Qinghai Plateau (TP). The sampled waters included 20 inland lakes that ranged from clear to highly turbid, from shallow to deep, and from oligotrophic to hypereutrophic (Fig. 1). These lakes are scattered from the eastern region (NE and EP) to the western region (TP and XJ), with altitudes ranging from below 5 m to above 4250 m. Basic information on the 20 sampled lakes is given in Fig. 1 and Table 1. Among the surveyed lakes, hypereutrophic lakes (e.g., Lake Taihu) are mainly located in the EP, while mesotrophic lakes (e.g., Lake Erhai) are mostly distributed in YG and EP. With the exception of Wanlv Lake in EP, most oligotrophic lakes are located on the TP and are characterized by small water surface areas (< 30 km2) and limited human disturbance.

Fig. 1

Locations of the 20 sampled lakes. The blue, green, and red triangles mark the oligotrophic, mesotrophic, and hypereutrophic lakes, respectively

Table 1 Basic information on the sampled lakes between 2015 and 2021, including sampling number, date, central longitude (Lon.), central latitude (Lat.), water area (Area, m2), and trophic status

In situ water quality data and spectra data collection

The dataset I (calibration)

From 2015 to 2021, a total of 276 water samples were collected from ten lakes (Fig. 1 and Table 1). In situ surface water samples (~ 0.5 m depth) were collected using pre-cleaned Niskin bottles and stored in an ice bin at low temperature (~ 3 °C) for laboratory analysis. Water clarity (ZSD, m) was measured simultaneously using a Secchi disk (Li et al. 2020; Ren et al. 2018). The Secchi disk is slowly lowered into the water column until it disappears from the inspector’s view, and the depth at which it is no longer visible is recorded as the ZSD value (Zeng et al. 2020a). Within the following days, water samples were filtered, and four water quality parameters, namely the concentrations of total phosphorus (CTP), chlorophyll-a (CChla), and total suspended matter (CTSM), and the absorption coefficient of colored dissolved organic matter (CDOM) at 443 nm (termed \({a}_{\mathrm{CDOM}}\left(443\right)\)), were measured in the laboratory following the methods described by Zeng et al. (2022a) and Xu et al. (2018a). The trophic state of the sampled lakes was evaluated using Carlson’s trophic state index (TSI) (Carlson 1977); the results are shown in Fig. 1 and Table 1.

The corresponding \({R}_{rs}\left(\uplambda \right)\) was field-measured using an Analytical Spectral Devices (ASD) Inc. FieldSpec Pro between 14:00 and 16:00. More details of the process are provided by previous studies (Cai et al. 2021; Lei et al. 2018; Xu et al. 2020, 2021c). At each sampling point, the total radiance (\({L}_{t}\)), the sky-viewing radiance (\({L}_{sky}\)), and the radiance reflected by a standard gray panel (\({L}_{p}\)) were collected. The \({R}_{rs}\left(\uplambda \right)\) can be calculated using the following equation:

$${R}_{rs}\left(\lambda \right)={\rho }_{p}\left({L}_{t}-{r}_{aw}{L}_{sky}\right)/\left(\pi {L}_{p}\right)$$
(1)

where \({r}_{aw}\) represents the skylight reflectance at the air–water surface and is taken as 2.2% for calm water conditions (Zeng et al. 2022b; Zheng et al. 2016).

An underwater spectroradiometer (TriOS Mess- und Datentechnik GmbH, Rastede, Germany) was used on site to collect the data for the diffuse attenuation coefficient \({K}_{\mathrm{d}}\left(\lambda \right)\); the instrument covers a spectral range of 320 to 950 nm at a sampling interval of 3.3 nm. The spectra were subsequently interpolated to a 1 nm interval during data processing. The spectra of underwater downward irradiance \({E}_{d}\left(\lambda ,z\right)\) at different sampling depths (\(z\) = 0.4, 0.8, …, 3.6 m) were measured following the methods suggested by Lei et al. (2020), and \({K}_{\mathrm{d}}\left(\lambda \right)\) was calculated by a non-linear fit of the following equation:

$${E}_{d}\left(\lambda ,z\right)={E}_{d}\left(\lambda ,{0}^{-}\right)*\mathrm{exp}\left(-{K}_{\mathrm{d}}\left(\lambda \right)*z\right)$$
(2)

\({K}_{\mathrm{d}}\left(\lambda \right)\) was accepted only if R2 ≥ 0.95 and the number of sampling depths was more than 3 (Zhang et al. 2012).
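As an illustration of Eq. (2) and the acceptance criteria, the following minimal sketch estimates \({K}_{\mathrm{d}}\) at a single wavelength from a depth profile of \({E}_{d}\). It is not the processing code used in this study: for simplicity it fits the log-transformed form of Eq. (2) by ordinary least squares rather than performing the non-linear fit described above, so the R2 it reports is computed in log space. The function name `fit_kd` and its arguments are hypothetical.

```python
import math


def fit_kd(depths, irradiances, min_r2=0.95, min_depths=4):
    """Estimate K_d (1/m) at one wavelength from an E_d(z) profile.

    Simplification (assumption): fits ln E_d(z) = ln E_d(0-) - K_d * z by
    ordinary least squares instead of a non-linear fit of Eq. (2).
    Irradiances must be strictly positive.
    Returns (K_d, R2), or None when the quality criteria are not met
    (more than 3 depths and R2 >= 0.95).
    """
    n = len(depths)
    if n != len(irradiances) or n < min_depths:
        return None
    y = [math.log(e) for e in irradiances]
    z_mean = sum(depths) / n
    y_mean = sum(y) / n
    szz = sum((z - z_mean) ** 2 for z in depths)
    szy = sum((z - z_mean) * (v - y_mean) for z, v in zip(depths, y))
    slope = szy / szz
    intercept = y_mean - slope * z_mean
    ss_res = sum((v - (intercept + slope * z)) ** 2 for z, v in zip(depths, y))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    if r2 < min_r2:
        return None
    return -slope, r2


# Synthetic profile at the sampling depths 0.4, 0.8, ..., 3.6 m.
depths = [0.4 * i for i in range(1, 10)]
profile = [100.0 * math.exp(-0.8 * z) for z in depths]
result = fit_kd(depths, profile)
```

Profiles that fail either criterion return `None`, which mirrors the rule that such \({K}_{\mathrm{d}}\) values are discarded.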

The dataset II (validation)

Another dataset (dataset II), comprising ZSD and \({R}_{rs}\left(\uplambda \right)\) measurements for a total of 203 samples from 14 lakes, was collected from 2015 to 2018 (Fig. 1, Table 1) following protocols similar to those used for dataset I. Note that dataset II is used as an independent dataset for algorithm validation. These samples cover waters from the eastern region to the southwestern region (Yunnan Province and Tibet Plateau) of China (Fig. 1).

Water type classification

Waterbodies were divided into three basic types according to the shape of \({R}_{rs}\), using the simple but robust water type classification algorithm proposed by Balasubramanian et al. (2020). Following Balasubramanian et al. (2020), waterbodies were identified as slightly turbid waters (ST) if \({R}_{rs}\left(665\right)<{R}_{rs}(560)\) and \({R}_{rs}\left(665\right)>{R}_{rs}(490)\); as highly turbid waters (HT) if \({R}_{rs}\left(665\right)>{R}_{rs}(560)\) and \({R}_{rs}\left(740\right)>0.01{/\mathrm{sr}}\); and as clear waters (CW) if \({R}_{rs}\left(665\right)<{R}_{rs}(490)\). In total, 120, 84, and 72 samples in dataset I were classified as HT, ST, and CW, respectively, while 90, 52, and 61 samples in dataset II were classified correspondingly.
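The three rules can be written as a short decision function. This is a sketch of the rules exactly as stated in the text, evaluated in the order given (ST, HT, CW); it is not the reference implementation of Balasubramanian et al. (2020). The function name is hypothetical, and spectra that satisfy none of the rules return `None`, a case the text does not address.

```python
def classify_water_type(rrs490, rrs560, rrs665, rrs740):
    """Classify a spectrum as 'ST', 'HT', or 'CW' from Rrs (1/sr) at four bands.

    Rules are checked in the order they appear in the text. Returns None when
    no rule applies (handling of that case is an assumption of this sketch).
    """
    if rrs490 < rrs665 < rrs560:
        return "ST"  # slightly turbid
    if rrs665 > rrs560 and rrs740 > 0.01:
        return "HT"  # highly turbid
    if rrs665 < rrs490:
        return "CW"  # clear water
    return None


# Illustrative spectra (made-up values, not field data).
examples = {
    "st_like": classify_water_type(0.010, 0.020, 0.015, 0.002),
    "ht_like": classify_water_type(0.020, 0.030, 0.040, 0.020),
    "cw_like": classify_water_type(0.010, 0.008, 0.003, 0.0005),
}
```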

Accuracy assessment

Three evaluation indicators, namely the coefficient of determination (R2), the mean absolute percentage error (MAPE), and the root mean square error (RMSE), were selected to characterize the performance of the model:

$$MAPE=\frac{1}{n}{\sum }_{i=1}^{n}\left(\left|{V}_{meas}^{i}-{V}_{pred}^{i}\right|/{V}_{meas}^{i}\right)\times 100\mathrm{\%}$$
(3)
$$RMSE=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({V}_{meas}^{i}-{V}_{pred}^{i}\right)}^{2}}$$
(4)

where the \({V}_{pred}^{i}\) and \({V}_{meas}^{i}\) are the estimated and measured values, respectively; n is the number of samples.
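For reference, the two error metrics can be computed as in the sketch below. RMSE is implemented with the standard definition (the square root of the mean squared error), and the function names are illustrative only.

```python
import math


def mape(measured, predicted):
    """Mean absolute percentage error (%); measured values must be non-zero."""
    n = len(measured)
    return 100.0 * sum(abs(m - p) / m for m, p in zip(measured, predicted)) / n


def rmse(measured, predicted):
    """Root mean square error, in the units of the input values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


# Made-up ZSD values (m) for illustration only.
zsd_measured = [1.0, 2.0, 4.0]
zsd_predicted = [1.1, 1.8, 4.4]
mape_value = mape(zsd_measured, zsd_predicted)
rmse_value = rmse(zsd_measured, zsd_predicted)
```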

Satellite data collection and preprocessing

A total of 10,523 Sentinel-2 MSI L1C images covering 410 large waterbodies (areas > 10 km2) of China during the non-freezing periods from 2018 to 2021 were downloaded from the European Space Agency (https://scihub.copernicus.eu/). Because heavily cloud-contaminated imagery is unsuitable for ZSD estimation, only imagery with no or low cloud coverage (< 10%) was selected for analysis. These images were atmospherically corrected using the Acolite algorithm (Vanhellemont and Ruddick 2014). Water boundaries in the images were extracted following the method described by Zhang et al. (2019). At the same time, the Virtual-Baseline Floating macroAlgae Height (VB-FAH) index was applied to mask water pixels with heavy algal blooms (Xing and Hu 2016). Among the field measurements conducted from 2015 to 2019 (dataset I), a total of 65 points (29 in Lake Taihu, 12 in Lake Erhai, and 24 in Lake Wanlv) were sampled close to the Sentinel-2 MSI imaging time (± 3 h) and were selected to validate the accuracy of the Acolite atmospheric correction.

The performance of the Acolite algorithm was assessed by comparing the field-measured and Sentinel-2 MSI-derived \({R}_{rs}\left(\lambda \right)\) at the bands used in the ZSD retrieval algorithm (443, 490, 560, 665, and 740 nm). The field-measured \({R}_{rs}\left(\lambda \right)\) was converted to simulated MSI band values using the corresponding spectral response functions (SRFs) (Li et al. 2017b; Zeng et al. 2020b), as expressed by the following equation:

$$R_{rs}(\lambda)=\sum\nolimits_{\lambda_{\min}}^{\lambda_{\max}}S_\lambda R_{rs\_m}(\lambda)/\sum\nolimits_{\lambda_{\min}}^{\lambda_{\max}}S_\lambda$$
(5)

where \({R}_{rs}(\lambda )\) is the simulated Sentinel-2 MSI \({R}_{rs}(\lambda )\); \({R}_{rs\_m}(\lambda )\) is the field-measured \({R}_{rs}(\lambda )\); and \({S}_{\lambda }\) is the SRF of Sentinel-2 MSI, which can be downloaded from the European Space Agency (https://scihub.copernicus.eu/).
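Eq. (5) is an SRF-weighted average of the field spectrum over one band. A minimal sketch is given below, assuming the field spectrum and the SRF have already been resampled onto the same wavelength grid (for example, the 1 nm grid); the function name and the toy values are hypothetical.

```python
def band_average(rrs_measured, srf):
    """SRF-weighted mean of a field Rrs spectrum for one sensor band (Eq. 5).

    Both sequences must be sampled at the same wavelengths, restricted to
    (or zero-weighted outside) the band's lambda_min..lambda_max range.
    """
    if len(rrs_measured) != len(srf):
        raise ValueError("spectrum and SRF must share one wavelength grid")
    weight_sum = sum(srf)
    if weight_sum == 0:
        raise ValueError("SRF weights sum to zero over the given range")
    return sum(s * r for s, r in zip(srf, rrs_measured)) / weight_sum


# Toy three-point spectrum; the first point lies outside the band (weight 0).
simulated_band = band_average([0.01, 0.02, 0.03], [0.0, 1.0, 1.0])
```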

The performance of the Acolite algorithm applied to Sentinel-2 MSI imagery is presented in Fig. S1. The Acolite algorithm performed relatively poorly at the blue wavebands (443 and 490 nm) in Lake Taihu and at the NIR waveband (740 nm) in Lake Wanlv, with MAPE greater than 25%. At the same time, the algorithm performed well overall at the visible wavebands, with MAPE ranging from 17.06 to 21.1% and RMSE from 0.001 to 0.009/sr. Because the MAPE values at 443, 490, 560, 665, and 740 nm were all less than 30% and the RMSE values remained low, the Acolite atmospheric correction is considered reasonable for these bands and suitable for retrieving ZSD from Sentinel-2 MSI data.

Model development

This study proposes a new semi-analytical model (denoted ZSD20), based on the QAA algorithm (Lee et al. 2015, 2016), for estimating ZSD in various waters. The symbols and their descriptions are summarized in Table S1, and the derivation steps of the ZSD20 algorithm for each water type are listed in Table 2.

Table 2 The derivation flowchart of ZSD20 algorithm for clear (CW), slightly turbid (ST), and highly turbid waters (HT)

In step 1, \({r}_{rs}(\lambda )\) could be determined as (Lee et al. 2015, 2016):

$${r}_{rs}\left(\lambda \right)=\frac{{R}_{rs}\left(\lambda \right)}{0.52+1.7{R}_{rs}\left(\lambda \right)}$$
(6)

In step 2, \(u\left(\uplambda \right)\) could be expressed as (Lee et al. 2016):

$$u\left(\lambda \right)=\frac{-{g}_{0}+\sqrt{{\left({g}_{0}\right)}^{2}+4{g}_{1}{r}_{rs}\left(\lambda \right)}}{2{g}_{1}}$$
(7)

where \({g}_{0}\) and \({g}_{1}\) were 0.084 and 0.17, respectively, as suggested by previous studies (Lee et al. 1999; Xue et al. 2019).
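Steps 1 and 2 can be sketched as two small functions using the constants given in Eqs. (6) and (7). The function names are illustrative; the second function returns the positive root of the quadratic \({g}_{0}u+{g}_{1}{u}^{2}={r}_{rs}\), which is what Eq. (7) expresses.

```python
import math

G0 = 0.084  # g0 from the text
G1 = 0.17   # g1 from the text


def subsurface_rrs(rrs):
    """Step 1 (Eq. 6): convert above-surface Rrs to subsurface r_rs."""
    return rrs / (0.52 + 1.7 * rrs)


def u_from_rrs(r_rs, g0=G0, g1=G1):
    """Step 2 (Eq. 7): u = b_b / (a + b_b) from subsurface r_rs."""
    return (-g0 + math.sqrt(g0 ** 2 + 4.0 * g1 * r_rs)) / (2.0 * g1)


# Illustrative input only: Rrs = 0.01 /sr.
r_rs_example = subsurface_rrs(0.01)
u_example = u_from_rrs(r_rs_example)
```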

In step 3, \(a({\uplambda }_{0})\) could be determined by assuming that \({a}_{\mathrm{w}}\left({\lambda }_{0}\right)\) is dominant (Lee et al. 2016; Xue et al. 2019).

$$a\left({\lambda }_{0}\right)={a}_{\mathrm{w}}\left({\lambda }_{0}\right)+{a}_{nw}\left({\uplambda }_{0}\right)$$
(8)

According to the ZSDV6 algorithm (Lee et al. 2009), \(a\left({\uplambda }_{0}\right)\) can be calculated by setting the reference band to 560 nm in CW and 670 nm in ST. In HT, \(a\left({\uplambda }_{0}\right)\) should instead be estimated in the near-infrared (NIR) region, in order to meet the assumption that water absorption \({a}_{\mathrm{w}}\left({\uplambda }_{0}\right)\) dominates in the NIR (i.e., \({a}_{\mathrm{w}}\left(\mathrm{NIR}\right)\approx a\left(\mathrm{NIR}\right)\)) (Cai et al. 2023; Rodrigues et al. 2017; Xue et al. 2019). However, this assumption does not hold because the absorption of particulate matter (\({a}_{p}\left(\uplambda \right)\)) cannot be ignored in such waters (Le et al. 2009; Zeng et al. 2021), and a large difference may occur between in situ \(a\left(\mathrm{NIR}\right)\) and \({a}_{w}\left(\mathrm{NIR}\right)\). To fill this gap, we set \({\uplambda }_{0}\) to the MSI 740 nm band and estimated \({a}_{nw}\left(740\right)\) from its empirical relationship with \({R}_{rs}(740)\) based on field-measured data (dataset I, N = 120) (Fig. 2a). Eq. (8) can therefore be rewritten as Eq. (9):

Fig. 2

Relationship between \({a}_{nw}\left(740\right)\) and \({R}_{rs}\left(740\right)\) in highly turbid waters (a). Matchups between the in situ and retrieved \({K}_{d}\left(560\right)\) from the improved model (b) and the original model (c). Values of \({K}_{T}/{K}_{d}\) in various waters; the dashed line indicates \({K}_{T}/{K}_{d}=1.5\) as used in the original ZSDV6 algorithm (d)

$$a\left(740\right)={a}_{w}\left(740\right)+38.14{R}_{rs}\left(740\right)+0.066$$
(9)
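Eq. (9) splits the total absorption at 740 nm into the pure-water term and an empirical non-water term that is linear in \({R}_{rs}(740)\). The sketch below makes that split explicit. The pure-water absorption \({a}_{w}(740)\) is not given in the text, so it is left as a required argument; the value of 2.5/m in the usage lines is only a placeholder for illustration, not a value from this study.

```python
def non_water_absorption_740(rrs740):
    """Empirical a_nw(740) in 1/m from Rrs(740) in 1/sr (second part of Eq. 9)."""
    return 38.14 * rrs740 + 0.066


def total_absorption_740(rrs740, a_w_740):
    """Eq. (9): a(740) = a_w(740) + a_nw(740); a_w_740 must be supplied."""
    return a_w_740 + non_water_absorption_740(rrs740)


# Worked arithmetic for Rrs(740) = 0.02 /sr: 38.14 * 0.02 + 0.066 = 0.8288 /m.
a_nw_example = non_water_absorption_740(0.02)
a_total_example = total_absorption_740(0.02, a_w_740=2.5)  # placeholder a_w
```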

In step 4, \({b}_{b}({\uplambda }_{0})\) could be expressed as:

$${b}_{b}\left({\lambda }_{0}\right)=\frac{u\left({\lambda }_{0}\right)a\left({\lambda }_{0}\right)}{1-u\left({\lambda }_{0}\right)}$$
(10)

Furthermore, \({b}_{b}\left(\lambda \right)\) could be retrieved from \({b}_{b}\left({\lambda }_{0}\right)\) as (Lee et al. 2016):

$${b}_{b}\left(\lambda \right)=\left({b}_{b}\left({\lambda }_{0}\right)-{b}_{w}\left({\lambda }_{0}\right)\right){\left(\frac{{\lambda }_{0}}{\lambda }\right)}^{Y}+{b}_{w}\left(\lambda \right)$$
(11)

where the power-law exponent (Y) of \({b}_{b}\left(\lambda \right)\) was obtained from a band ratio using the following equation (Xue et al. 2019):

$$Y=3.99-3.59\mathrm{exp}(-0.9\frac{{r}_{rs}\left(443\right)}{{r}_{rs}\left(560\right)})$$
(12)

In step 6, \(a\left(\lambda \right)\) could be retrieved from \({b}_{b}\left(\lambda \right)\) as (Lee et al. 2016):

$$a\left(\lambda \right)=\left(1-u\left(\lambda \right)\right){b}_{b}\left(\lambda \right)/u\left(\lambda \right)$$
(13)
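The chain from the reference band to any other band (Eqs. 10 to 13) can be sketched as follows. The function names are illustrative, \(u\) denotes \({b}_{b}/(a+{b}_{b})\) as in Eq. (7), and the water term \({b}_{w}\) must be supplied by the caller because its values are not listed in the text.

```python
import math


def bb_reference(u0, a0):
    """Eq. (10): b_b at the reference band from u and a at that band."""
    return u0 * a0 / (1.0 - u0)


def spectral_exponent(r_rs443, r_rs560):
    """Eq. (12): power-law exponent Y from the 443/560 subsurface r_rs ratio."""
    return 3.99 - 3.59 * math.exp(-0.9 * r_rs443 / r_rs560)


def bb_at(wavelength, wavelength0, bb0, bw0, bw, y):
    """Eq. (11): extrapolate b_b from the reference band to another band."""
    return (bb0 - bw0) * (wavelength0 / wavelength) ** y + bw


def a_at(u, bb):
    """Eq. (13): total absorption from u and b_b at the same band."""
    return (1.0 - u) * bb / u


# Consistency check with made-up values: Eq. (13) inverts Eq. (10).
bb0_example = bb_reference(0.2, 1.0)
a_back_example = a_at(0.2, bb0_example)
```

Because Eq. (13) is the algebraic inverse of Eq. (10), applying both at the reference band returns the absorption that was put in, which is a convenient sanity check on an implementation.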

In step 7, the diffuse attenuation coefficient \({K}_{\mathrm{d}}\left(\lambda \right)\) could be retrieved from \(a\left(\lambda \right)\) and \({b}_{b}\left(\lambda \right)\) using the following equation (Lee et al. 2015):

$${K}_{d}\left(\lambda \right)=\left(1+{m}_{0}{\theta }_{s}\right)a\left(\lambda \right)+{m}_{1}\left(1-{m}_{2}\mathrm{exp}\left(-{m}_{3}a\left(\lambda \right)\right)\right){b}_{b}\left(\lambda \right)$$
(14)

where \({K}_{d}\left(560\right)\) always represents the minimum \({K}_{d}\) value within the transparent spectral domain (443 ~ 665 nm) in the original model (Zeng et al. 2020a); \({\uptheta }_{\mathrm{s}}\) is the subsurface solar zenith angle; and \({m}_{0-3}\) are model parameters, which were derived from Hydrolight simulations based on oligotrophic waters and the Case-1 optical models of Morel and Maritorena (2001), assuming vertically constant IOPs (Lee et al. 2013). These parameters (\({m}_{0-3}\)) therefore need to be retuned using in situ data (dataset I) for application to inland waters, following an approach similar to that of Lee et al. (2013). Specifically, we held \({\mathrm{m}}_{1}\), \({\mathrm{m}}_{2}\), and \({\mathrm{m}}_{3}\) fixed and varied \({\mathrm{m}}_{0}\), and the optimal value of \({\mathrm{m}}_{0}\) was determined by nonlinear best fit. The values of \({\mathrm{m}}_{1-3}\) were derived in the same way. The retuned values of the four model parameters (\({m}_{0-3}\)) were 0.012, 3.16, 0.52, and 10.7, respectively. Figure 2b presents the matchups between the in situ and retrieved \({K}_{d}\left(560\right)\): the samples were evenly distributed along the 1:1 line with low MAPE (10.24%) and RMSE (0.81/m), indicating that the retuned parameters (\({m}_{0-3}\)) are satisfactory for estimating \({K}_{d}\left(560\right)\).
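With the retuned parameters, Eq. (14) reduces to a single expression in \(a\), \({b}_{b}\), and \({\uptheta }_{\mathrm{s}}\), evaluated at whichever band is of interest. The sketch below uses the four retuned values reported above. The text does not state the unit of \({\uptheta }_{\mathrm{s}}\); degrees are assumed here, and that assumption should be checked before reuse. The function name is illustrative.

```python
import math

# Retuned parameters m0..m3 reported in the text.
M0, M1, M2, M3 = 0.012, 3.16, 0.52, 10.7


def kd_from_iops(a, bb, theta_s_deg):
    """Eq. (14): K_d (1/m) from a and b_b (1/m) at one band.

    theta_s_deg is the solar zenith angle; degrees are an assumption of this
    sketch because the unit is not specified in the text.
    """
    absorption_term = (1.0 + M0 * theta_s_deg) * a
    backscatter_term = M1 * (1.0 - M2 * math.exp(-M3 * a)) * bb
    return absorption_term + backscatter_term


# Made-up IOP values for illustration only.
kd_example = kd_from_iops(a=1.0, bb=0.1, theta_s_deg=30.0)
```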

ZSD is then derived using the following equation (Lee et al. 2015):

$${Z}_{SD}=\frac{1}{{K}_{d}\left(\lambda \right)+{K}_{T}\left(\lambda \right)}\mathrm{ln}\left({T}_{r}\frac{\left|{r}_{T}-{r}_{W}\right|}{{C}_{t}^{r}}\right)$$
(15)

where \({r}_{T}\) is the radiance reflectance right above the target and is approximately 0.27/sr (Duntley and Preisendorfer 1952; Lee et al. 2015). \({r}_{W}\) is the radiance reflectance of water at the wavelength corresponding to \({K}_{d}\), which can be calculated following Eq. (6). \({T}_{r}\) is the radiance transmittance and is approximately equal to 0.54/sr (Wei et al. 2015). \({C}_{t}^{r}=0.013\)/sr is the contrast threshold for visibility in air (Lee et al. 2015). \({K}_{T}\) is the upward diffuse attenuation coefficient, for which the bottom is assumed to be a Lambertian reflector (Lee et al. 2015; Volpe et al. 2011):

$${K}_{T}=1.04\left(a+{b}_{b}\right){\left(1+5.4u\right)}^{0.5}={K}_{d}\frac{1.04{\left(1+5.4u\right)}^{0.5}}{{\left[1/\left(1-{\mathrm{sin}}^{2}\left({\theta }_{s}\right)/{RI}^{2}\right)\right]}^{0.5}}$$
(16)

where \(RI\) is the refractive index of pure water, taken as an empirical constant of 1.34, and \(u\) is defined as \({b}_{b}/(a+{b}_{b})\) (Jiang et al. 2019). Therefore, \({K}_{T}/{K}_{d}\) depends only on \(u\) once \({\theta }_{s}\) is fixed. According to Lee et al. (2015), the empirical relationship \({K}_{T}=1.5{K}_{d}\) was determined using a large dataset covering different IOPs, including Case-1 and Case-2 waters. Nevertheless, previous studies found that the value of \({K}_{T}/{K}_{d}\) varies with the optical properties (Jiang et al. 2019; Lee et al. 1994; Philpot 1989), which means that a fixed value of \({K}_{T}/{K}_{d}\) may produce large bias when estimating ZSD in various waters.
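The last two steps can be sketched by combining the first form of Eq. (16) with Eq. (15), using the constants quoted in the text (\({r}_{T}\) = 0.27, \({T}_{r}\) = 0.54, \({C}_{t}^{r}\) = 0.013). The function names and the input values in the usage lines are illustrative, and \({r}_{W}\) is taken as an input rather than derived here.

```python
import math

R_T = 0.27    # radiance reflectance right above the target (from the text)
T_R = 0.54    # radiance transmittance (from the text)
C_T = 0.013   # contrast threshold (from the text)


def k_t(a, bb):
    """Eq. (16), first form: K_T = 1.04 (a + b_b) sqrt(1 + 5.4 u)."""
    u = bb / (a + bb)
    return 1.04 * (a + bb) * math.sqrt(1.0 + 5.4 * u)


def secchi_depth(kd, kt, r_w):
    """Eq. (15): Z_SD (m) from K_d, K_T (1/m) and water reflectance r_w."""
    return math.log(T_R * abs(R_T - r_w) / C_T) / (kd + kt)


# Made-up inputs for illustration only.
kt_example = k_t(a=0.5, bb=0.05)
zsd_example = secchi_depth(kd=0.6, kt=kt_example, r_w=0.01)
ratio_example = kt_example / 0.6  # K_T / K_d for these inputs
```

Computing \({K}_{T}\) and \({K}_{d}\) separately, instead of fixing their ratio at 1.5, is what allows the ratio to vary with the optical properties of the water.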

According to our field-measured data, the values of \({K}_{T}/{K}_{d}\) showed a wide range (0.5–1.71) across different waters (Fig. 2d), in contrast to the empirical constant of 1.5 suggested by Lee et al. (2016). The average value of \({K}_{T}/{K}_{d}\) was 1.44 in highly turbid waters (HT), whereas much lower mean values of 0.88 and 0.69 were found in clear waters (CW) and slightly turbid waters (ST), respectively. Based on these findings, we conclude that the new algorithm performs satisfactorily in deriving \(a\), \({K}_{d}\), and \({K}_{T}/{K}_{d}\) in various waters, and that it can be used to identify weaknesses in the original ZSDV6 algorithm. According to previous studies (Liu et al. 2020; Xue et al. 2019), the bio-optical properties of inland waters are strongly influenced by the absorption and backscattering of optically active constituents, and they differ significantly from those of the marine and coastal waters used to train the original ZSDV6 algorithm (Lee et al. 2016). For example, the lakes in the Yangtze River Plain cover a wide range of Cchla (2.48–320.53 mg/L) and \({a}_{\mathrm{CDOM}}(440)\) (0.05–2.18/m) (Deyong et al. 2009; Xu et al. 2021b; Zeng et al. 2022a), while relatively small values of Cchla (0.2–40 mg/L) and \({a}_{\mathrm{CDOM}}(440)\) (0.02–0.4/m) were found in the training dataset of the original ZSDV6 algorithm (Lee et al. 1994). These findings indicate that the highly dynamic bio-optical properties (e.g., \(a\) and \({b}_{b}\)) of inland waters may lead to wide ranges of \(u\), \({K}_{d}\), \({K}_{T}\), and ultimately \({K}_{T}/{K}_{d}\). We therefore conclude that the value of \({K}_{T}/{K}_{d}\) varies among waters and that applying the original ZSDV6 algorithm to various waters requires specific parameterization. Accordingly, we re-calculated the values of \({K}_{T}\) and \({K}_{d}\) in the new algorithm following Eqs. (14) and (16).

Results

Biogeochemical characterization

The basic statistics of the field-collected water quality parameters in the investigated waters are presented in Table S2; they show wide variability and cover a wide range of concentrations. The CW have very low CChla (0.14–19.88 μg/L), \({a}_{\mathrm{CDOM}}\left(443\right)\) (0.098–1.01/m), and CTSM (0.47–4.52 mg/L), but a high average ZSD value (3.23 m). The ST have moderate CChla (7.27–34.34 μg/L), \({a}_{\mathrm{CDOM}}\left(443\right)\) (0.15–1.87/m), and CTSM (1.96–7.5 mg/L). In contrast, HT is characterized by low water clarity (mean ZSD of 0.43 m), which is closely related to relatively higher CChla (4.45–30.93 μg/L), \({a}_{\mathrm{CDOM}}\left(443\right)\) (0.41–3.78/m), and CTSM (2.5–200.53 mg/L).

Performance of the developed algorithms

Three key improvements are implemented in the new algorithm. First, suitable reference bands (560 nm, 670 nm, and 740 nm) are used for the different water types to estimate the reference total absorption \(a\left({\uplambda }_{0}\right)\). Second, a more accurate, water-specific parameterization of \({K}_{d}\) is adopted to reduce defects. Third, more accurate values of \({K}_{T}/{K}_{d}\) are obtained. Figure 3 shows the performance of the ZSDV6 algorithm (a, b, c, and g) and the ZSD20 algorithm (d, e, f, and h) in the three water types based on dataset I. The original algorithm showed moderate performance in CW, with a MAPE of 30.86% and an RMSE of 1.28 m (Fig. 3a), which was improved by the newly developed algorithm (MAPE = 16.88%, RMSE = 0.63 m) (Fig. 3d). At the same time, the original and new algorithms performed similarly for water samples dominated by phytoplankton (Fig. S2), which are located in the center of the lake, where deposition conditions are stable and environmental disturbance is limited. In ST and HT, in both non-phytoplankton- and phytoplankton-dominated waters, significant underestimations were found for the original algorithm, while the new algorithm performed better (Fig. S2 and Fig. 3). Figure 3g and h summarize the overall performance of the original algorithm and our developed algorithm. The improved ZSD20 algorithm was more accurate, with MAPE reduced from 31.9 to 12.7% and RMSE reduced from 0.74 to 0.34 m.

Fig. 3

Performance comparison between the ZSDV6 algorithm (a, b, c, and g) and the ZSD20 algorithm (d, e, f, and h)

At the same time, we used dataset II (described in the “The dataset II (validation)” section) as an independent dataset to further test the generality and effectiveness of the proposed algorithm in other waters. The statistical validations are summarized in Table 3. Although the new algorithm showed encouraging performance (MAPE = 19.4%, RMSE = 0.67 m), some unsatisfactory results remained, such as underestimations for surveyed lakes on the Tibet Plateau, which may be related to imperfect atmospheric correction in this area. Regrettably, no field-measured spectral data close to the Sentinel-2 MSI imaging time were available in this area for examining the Acolite atmospheric correction. Despite these defects, the new algorithm performed with satisfactory accuracy and has great potential for estimating ZSD from satellite images.

Table 3 The performance of ZSD20 in 14 lakes based on validated dataset (dataset II)

Spatiotemporal variation of water clarity

Seasonal distribution of ZSD

To demonstrate the application of the developed model (ZSD20) in inland waters, we mapped the spatiotemporal distribution of ZSD in 410 waters of China, including natural lakes and artificial reservoirs. These waterbodies account for more than 60% of all waterbodies across China with an area greater than 10 km2. The seasonal variations of water clarity in these waterbodies are shown in Fig. 4 and Fig. S3. Long-term mean values ranged from 0.1 m to 8.24 m (in Lake Yamdrok), indicating remarkable spatial and temporal heterogeneity during the observed period (2015–2021), which may be related to different driving factors among sub-regions (Liu et al. 2021a; Shen et al. 2020).

Fig. 4

The seasonal variations of water clarity in lakes of China. The retrieval results of ZSD in some lakes are not shown for winter because of lake freezing. The waterbodies in the Huaihe Basin and the middle and lower reaches of the Yangtze River Plain (HB-MLYRL) are marked by the dashed rectangle

The 410 waters across China showed relatively high ZSD in summer (1.03 ± 1.26 m) and autumn (1.02 ± 1.21 m) but low ZSD in spring (0.69 ± 0.66 m) and winter (0.6 ± 0.72 m), exhibiting significant variation in water clarity between seasons (Fig. 4). A relatively small seasonal variation of ZSD was found in the Huaihe Basin and the middle and lower reaches of the Yangtze River Plain (HB-MLYRL) (Fig. 5), which may be related to the stable dominant factors in these areas. The waters in these areas are strongly affected by hydro-climatological events and anthropogenic interference throughout the year, resulting in low water clarity (Song et al. 2020; Wang et al. 2022; Xu et al. 2018b).

Fig. 5

The seasonal variations of water clarity in the Huaihe Basin and the middle and lower reaches of the Yangtze River Plain (HB-MLYRL)

At the same time, the seasonal dynamics of the proportions of water types were statistically analyzed. Highly turbid waters (HT) accounted for the highest proportion in all seasons (≥ 70%), especially in spring (80%). The proportion of slightly turbid waters (ST) peaked in summer (21%), higher than in winter (13%), spring (17%), and autumn (19%). The largest proportion of clear waters (CW), accounting for 11% of all waters, was found in autumn, compared with only 3% and 4% in spring and winter, respectively.

Spatial distribution of ZSD

In terms of spatial pattern, waterbodies with low ZSD were observed in Eastern China (MP, NE, and EP), where ZSD ranged from 0.1 to 7.31 m with an average value of 0.25 m (Fig. 6). Most waters in Eastern China are shallow and highly turbid, and their water clarity is influenced by weather conditions and strong disturbance from anthropogenic activities (Lei et al. 2020; Song et al. 2020). Conversely, the waters in TP exhibited the highest long-term mean ZSD (2.7 ± 1.4 m), which was much higher than that of all surveyed waters (0.77 ± 0.75 m). The proportion of lakes of each water type is shown in Fig. 6b. Over 50% of lakes in EP, MP, NE, and XJ are characterized by high turbidity. The largest proportion of CW was found in TP, accounting for 40.98% of all waters, compared with only 1.64% and 1.67% in MP and NE, respectively. According to previous studies (Liu et al. 2021b; Pi et al. 2020), most waters in TP are characterized by deep water (> 15 m), stable deposition conditions, and low human disturbance; they are therefore not susceptible to environmental disturbance and maintain relatively high ZSD. In addition, increasing trends of ZSD were found from low to high latitudes and from east to west longitudes, indicating that the eutrophic conditions of waters in southern and western China are better than those in northern and eastern China (Fig. 6c and d), which is consistent with previous findings (Hu et al. 2022; Song et al. 2020). Based on the above analysis, water clarity exhibited significant spatial variation across China.

Fig. 6

The spatial distribution in water clarity of 410 waters (a), the proportion of various water types in six sub-regions (b), with statistical results sorted by latitude (c) and longitude (d)

Discussion

Necessity of the improvements in new algorithms

Varied optically active components result in complex bio-optical properties in inland waters, which are characterized by various optical water types. Even within the same phenological period, a single waterbody can exhibit a significantly heterogeneous spatial distribution of optical properties (Matsushita et al. 2015), so that the performance of a ZSD algorithm varies among water types. Therefore, the specific optical properties should be considered when applying ZSD algorithms to various waters.

In the original ZSDV6 algorithm, the inherent optical properties (IOPs) are first derived with the QAA algorithm; ZSD is then derived from the minimum \({K}_{d}\) value within the transparent spectral domain and its corresponding \({R}_{rs}\left(\lambda \right)\), without parameter adjustments, in open ocean and coastal waters (Lee et al. 2016; Shang et al. 2016). Nevertheless, several significant shortcomings of the original QAA have been found in applications to inland waters (Bai et al. 2020; Jiang et al. 2019; Rodrigues et al. 2017). In this study, the original algorithm clearly underestimated ZSD in ST (MAPE = 34.3%, RMSE = 0.63 m) and performed poorly in CW (MAPE = 30.89%, RMSE = 1.33 m) and HT (MAPE = 36.1%, RMSE = 0.11 m) (Fig. 3), which may be due to the following reasons.
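The two accuracy metrics quoted throughout this section can be sketched as follows. This is a minimal illustration that assumes the commonly used definitions of MAPE and RMSE; the exact formulas are not restated in this section, and the sample values are hypothetical.

```python
import math


def mape(measured, estimated):
    """Mean absolute percentage error (%), relative to the measured values."""
    errors = [abs(e - m) / m for m, e in zip(measured, estimated)]
    return 100.0 * sum(errors) / len(errors)


def rmse(measured, estimated):
    """Root mean square error, in the units of the inputs (here metres)."""
    squared = [(e - m) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical field-measured and estimated ZSD values (m), for illustration only.
measured_zsd = [0.5, 1.0, 2.0, 4.0]
estimated_zsd = [0.4, 1.1, 1.6, 3.0]

print(f"MAPE = {mape(measured_zsd, estimated_zsd):.2f}%")
print(f"RMSE = {rmse(measured_zsd, estimated_zsd):.3f} m")
```

Because MAPE is relative and RMSE is absolute, a water type with small ZSD (such as HT) can show a high MAPE together with a low RMSE, as in the values reported above.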

Firstly, the existing estimation of IOPs at fixed reference wavebands may produce large bias in inland waters due to scattering and absorption (Wang et al. 2017; Watanabe et al. 2016). Therefore, it is necessary to select suitable reference wavebands and bridge the difference between the estimated and field values of the total absorption coefficient in various inland waters (Step 3 in Table 2). Secondly, the coefficients used to derive \({K}_{d}\) in the original algorithm may not be applicable to inland waters (Watanabe et al. 2016). Therefore, some specific parameterization steps were adopted in the new algorithm to reduce these defects (see Fig. 2). Thirdly, a fixed value of \({K}_{T}/{K}_{d}\) may lead to obvious bias in inland waters. In this study, the seasonal values of \({K}_{T}/{K}_{d}\) in 410 waters across China were derived using data from 2015–2020, and they span a wide range of 0.5–1.81 (Fig. 7). The values of \({K}_{T}/{K}_{d}\) increase with turbidity, from clear waters (average value of 0.73) to slightly turbid waters (1.05) and highly turbid waters (1.47), which is similar to the findings of Jiang et al. (2019). Therefore, improvements were needed in the newly developed algorithm for application in various inland waters. Compared to the original algorithm, three important improvements have been made in the new algorithm. First, a strategy was designed to select suitable reference bands (560 nm, 670 nm, and 740 nm) for various waters to estimate the reference total absorption \(a\left({\uplambda }_{0}\right)\). Second, optimized \({K}_{d}\) values were obtained to reduce defects. Third, more appropriate values of \({K}_{T}/{K}_{d}\) were implemented in the new algorithm. The validation results indicated that the new algorithm was more accurate, with MAPE reduced from 31.9% to 12.7% and RMSE from 0.74 to 0.34 m (Fig. 3).
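The effect of the \({K}_{T}/{K}_{d}\) ratio on the retrieved ZSD can be illustrated with a short sketch. It uses the ZSD expression of Lee et al. (2015), in which the factor 2.5 equals 1 + \({K}_{T}/{K}_{d}\) for the fixed ratio of 1.5, and simply replaces that ratio with the class averages quoted above. The water-type keys (taken here as CW = clear, ST = slightly turbid, HT = highly turbid), the function name, and the input values are assumptions for illustration; how the new algorithm assigns ratios in practice is not specified in this section.

```python
import math

# Class-average K_T/K_d ratios quoted in the text (Fig. 7); the keys are assumed labels.
RATIO_BY_WATER_TYPE = {"CW": 0.73, "ST": 1.05, "HT": 1.47}
ORIGINAL_RATIO = 1.5        # fixed K_T/K_d assumed in the original algorithm
CONTRAST_THRESHOLD = 0.013  # sr^-1, contrast threshold in Lee et al. (2015)


def secchi_depth(kd_min, rrs_at_kd_min, kt_kd_ratio=ORIGINAL_RATIO):
    """Estimate ZSD (m) from the minimum K_d (m^-1) in the transparent window
    and the R_rs (sr^-1) at the same band, for a given K_T/K_d ratio."""
    if kd_min <= 0:
        raise ValueError("kd_min must be positive")
    contrast = abs(0.14 - rrs_at_kd_min) / CONTRAST_THRESHOLD
    return math.log(contrast) / ((1.0 + kt_kd_ratio) * kd_min)


# Hypothetical inputs: K_d minimum of 0.5 m^-1 with R_rs of 0.01 sr^-1.
for water_type, ratio in RATIO_BY_WATER_TYPE.items():
    fixed = secchi_depth(0.5, 0.01)
    adjusted = secchi_depth(0.5, 0.01, ratio)
    print(f"{water_type}: fixed ratio {fixed:.2f} m, class ratio {adjusted:.2f} m")
```

For the same \({K}_{d}\) and \({R}_{rs}\), every class ratio below 1.5 yields a larger ZSD than the fixed ratio does, and the difference is greatest for clear waters, where the class ratio is furthest from 1.5.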

Fig. 7 The seasonal average \({K}_{T}/{K}_{d}\) in various waters. Winter retrievals are not shown because the lakes are frozen during that season

Comparison with existing algorithms

A number of empirical and semi-analytical algorithms have been developed to derive ZSD in waters ranging from clear to highly turbid (Feng et al. 2019; Lee et al. 2016; Mishra et al. 2014; Rodrigues et al. 2017), and these were selected for comparison with the algorithm in this study (Table 4). These algorithms can be separated into two groups: the empirical algorithm group (Binding et al. 2015; Giardino et al. 2001; Olmanson et al. 2011; Ren et al. 2018; Wu et al. 2008a, 2008b; Zhang et al. 2021) and the semi-analytical algorithm group (Lee et al. 2016; Rodrigues et al. 2017). However, the performance of these algorithms cannot be directly compared because band settings differ significantly among sensors (e.g., MODIS and Landsat 8). Therefore, all key parameters in the original algorithms must be re-calibrated and validated using the simulated spectra and field-measured datasets.
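Re-calibrating an empirical model amounts to refitting its coefficients on the new sensor's simulated bands. The sketch below assumes a hypothetical single-band power-law form, ZSD = a · R_rs^b, fitted by ordinary least squares in log-log space; the functional forms actually used by the cited models differ and are not reproduced here, and the data are synthetic.

```python
import math


def fit_power_law(rrs, zsd):
    """Fit ZSD = a * Rrs**b by least squares on ln(ZSD) versus ln(Rrs)."""
    x = [math.log(r) for r in rrs]
    y = [math.log(z) for z in zsd]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(rrs_value, a, b):
    """Apply the fitted power law to one reflectance value."""
    return a * rrs_value ** b


# Synthetic calibration data generated from a known power law (illustration only).
sim_rrs = [0.005, 0.01, 0.02, 0.04]          # simulated band reflectance (sr^-1)
field_zsd = [0.02 * r ** -0.9 for r in sim_rrs]  # matching ZSD values (m)

a_fit, b_fit = fit_power_law(sim_rrs, field_zsd)
print(f"a = {a_fit:.4f}, b = {b_fit:.3f}")
```

In practice the fit would be made on a calibration subset and scored on an independent validation subset, so that every model is assessed against the same in situ measurements.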

Table 4 Comparison of performance of dataset II between the existing models and the proposed models

The performance comparison between existing algorithms and our algorithm is presented in Table 4. The model of Ren et al. (2018), originally calibrated and validated with data collected from moderately to highly turbid waters, shows relatively higher accuracy (MAPE = 68.44% and RMSE = 0.59 m) than the other models, but much lower accuracy than in the original study (MAPE = 21.68% and RMSE = 0.076 m). In contrast, the model of Olmanson et al. (2011) performed worse than the others on the simulated data, with MAPE = 255.91% and RMSE = 4.48 m. Meanwhile, the single-band models of Zhang et al. (2021), Wu et al. (2008b), and Binding et al. (2015) also performed relatively poorly on our validation data. These models are characterized by their simplicity of form and ease of application (Binding et al. 2015; Lei et al. 2020; Zhang et al. 2012). However, the optimal single band usually varies with the optical properties of the water, so a generic estimation model developed from site-specific data may produce large uncertainties when applied to various types of water (Alikas and Kratzer 2017; Binding et al. 2015; Zhang et al. 2012).

Significant differences between field-measured and estimated ZSD were also observed for the two existing semi-analytical models of Lee et al. (2016) and Rodrigues et al. (2017). Evaluated with the same in situ measurements, they yielded a MAPE and RMSE of 31.58% and 0.83 m, and 75.89% and 1.6 m, respectively (Table 2), showing poorer accuracy than the newly proposed model (MAPE = 19.4%, RMSE = 0.67 m). These findings are consistent with previous studies reporting that these semi-analytical models may perform inadequately in turbid inland and coastal waters, such as Lake Taihu, Lake Hongze, Lake Dongting, the Bohai Sea, and the Yellow Sea (Bai et al. 2020; Feng et al. 2019; Jiang et al. 2019; Shang et al. 2016). Although semi-analytical algorithms show greater potential than empirical algorithms for retrieving ZSD, they are also sensitive to errors introduced by different water optical properties (Rodrigues et al. 2017; Yang et al. 2013). Overall, our proposed algorithm performed best among the algorithms compared.

Advantages and limitations of the model

The new semi-analytical model was developed on the basis of extensive field-measured ZSD and Sentinel-2 MSI simulated spectra, and its calibration and validation datasets were collected from different types of water over a large geographical area, demonstrating the model's promise for ZSD assessments on a continental or global scale. The key wavebands used to develop the model are also available on almost all current earth resource and ocean color satellite platforms (e.g., Sentinel-3, GOCI, and MODIS). The new algorithm performed well with satisfactory accuracy (MAPEs < 39%, RMSEs < 19 m) (Table S3), implying that it has broad applicability for the estimation of ZSD using a variety of satellite data. However, it should be noted that these satellite data may have limited applications in smaller inland waters (e.g., water area < 10 km²) due to their coarse spatial resolution (> 250 m).

At the same time, a number of limitations and challenges may stand in the way of practical applications of the proposed method. Atmospheric correction is an important factor affecting the accuracy of biochemical parameter retrievals in inland waters (Lei et al. 2019; Ren et al. 2018). In this study, the MSI imagery corrected with the Acolite model showed promising performance in most visible wavebands, with MAPE < 22% and RMSE < 0.002 (Fig. 2). However, the Acolite model may perform relatively poorly at blue wavebands (443 and 490 nm) in highly turbid waters, as well as at the NIR waveband (740 nm) in clear waters (Fig. S1). Although these errors remain within permissible limits, they could introduce uncertainty into the derived IOPs and further reduce the final accuracy of the algorithm. Therefore, reliable estimation of water quality parameters depends closely on the accuracy of the corrected spectra, and a highly accurate atmospheric correction model is necessary.

Furthermore, the bottom effect in shallow water may introduce significant uncertainty into ZSD estimation. For optically deep waters, the upwelling water-leaving radiance is regarded as the contribution of water column constituents, and the bottom effect can be ignored (Li et al. 2017a, 2018; Wei et al. 2018). However, this assumption may not hold in optically shallow waters, which greatly limits the application of the algorithms. To avoid the reflectance contribution from the lake bottom, the new model in this study was calibrated and validated using datasets in which the euphotic depth is significantly lower than the water depth. As there are currently insufficient data to validate whether the algorithm can accurately estimate ZSD in open ocean or clear waters, caution should be taken when applying it under these conditions.

Conclusions

An improved semi-analytical algorithm (ZSD20) was developed for estimating water clarity in various waters with a wide range of optical properties. It was recalibrated and re-parameterized with our field-measured data collected from 16 lakes in China and achieved satisfactory overall performance (MAPE = 19.4%, RMSE = 0.67 m). The algorithm was applied to 410 waters in China and revealed significant spatiotemporal variation of water clarity based on Sentinel-2 MSI imagery from 2018 to 2021. Compared to the original algorithm (ZSDV6), the new algorithm contains three key improvements. First, a strategy was designed to select suitable reference bands (560 nm, 670 nm, and 740 nm) for the new algorithm to estimate the reference total absorption \(a\left({\uplambda }_{0}\right)\). Second, specific and more accurate parameterization steps for deriving \({K}_{d}\) were adopted in the new algorithm to reduce defects. Third, more realistic values of \({K}_{T}/{K}_{d}\) were implemented in the new algorithm. This study provides a new strategy for estimating water clarity in various waters with a wide range of optical properties, benefitting the monitoring and mitigation of adverse effects on aquatic ecosystems.