Introduction

Water quality in many lakes and reservoirs is poorly characterized because of data limitations associated with difficulty of access, lack of monitoring resources, or both. With limited data, the ability to assess the ecological status of these important systems and to manage them over time is severely constrained. Remote sensing of changes in water quality therefore provides an opportunity to fill spatio-temporal gaps in the global aquatic data record. This is particularly advantageous in developing countries, where assessment of water pollution tends to be sporadic at best.

The Landsat program provides an important long-term record of change in terrestrial and aquatic systems. Its use in monitoring the water quality of freshwater systems has been explored with varying degrees of success (Härmä et al. 2001; Ma and Dai 2005; Olmanson et al. 2008; Svab et al. 2005; Tebbs et al. 2013; Zhou and Zhao 2011; Olmanson et al. 2016; Gitelson and Yacobi 1995; Lim and Choi 2015; Markogianni et al. 2014). Although Landsat band allocations are suboptimal for water quality quantification, the program has proven particularly useful for small inland lakes, given the relatively high spatial resolution of its sensors (30 m) compared with more specialized ocean color sensors such as MODIS (250 m–1.1 km), MERIS (300 m), and SeaWiFS (1.1 km) (Olmanson et al. 2015). Landsat also has advantages over higher spatial resolution satellite sensors, such as those aboard the SPOT satellites, ASTER, or IKONOS, given its more extensive temporal record (dating back to 1972), its global spatial coverage, and, most importantly, its open-access data archive (Woodcock et al. 2008; Wulder et al. 2012).

While semi-analytical models have been used with Landsat data (Allan et al. 2015; Salama et al. 2012), Landsat-based water quality models predominantly rely on empirical regression-based relationships that relate satellite bands or band ratios to optically active in situ water column characteristics, such as chlorophyll-a, total suspended matter (TSM), and water clarity (Al-Fahdawi et al. 2015; Alparslan et al. 2007; Chao Rodríguez et al. 2014; Duan et al. 2007; Hicks et al. 2013; Karakaya et al. 2011; Kloiber et al. 2002b; Ma and Dai 2005; Matthews 2011; Olmanson et al. 2015, 2016; Tebbs et al. 2013; Tyler et al. 2006). The broad spectral bands in Landsat do not allow for stable inversion in bio-optical models (Allan et al. 2015). Empirical algorithms are often developed for specific lakes (e.g., Brivio et al. 2001; Mayo et al. 1995; Ostlund et al. 2001; Tyler et al. 2006; Vincent et al. 2004; Al-Fahdawi et al. 2015; Chao Rodríguez et al. 2014; Markogianni et al. 2014) or for a group of optically similar regional lakes (e.g., Brezonik et al. 2005; Hadjimitsis and Clayton 2009; Hansen et al. 2015; Härmä et al. 2001; Hicks et al. 2013; Kloiber et al. 2002a, b; McCullough et al. 2012; Olmanson et al. 2008, 2011; Sass et al. 2007), where certain band ratios appear to be more generally informative than others at characterizing water quality parameters across systems.

To take full advantage of Landsat-based water quality monitoring, the transferability of empirical algorithms across Landsat sensors must be properly assessed. While several studies have examined the robustness of water quality algorithms to differences between the Landsat TM and ETM+ sensors aboard the Landsat 4, 5, and 7 satellites (Alavipanah et al. 2007; Chander et al. 2009; Kutser 2012; Olmanson et al. 2008; Svab et al. 2005; Teillet et al. 2001), limited work has been done to quantify transferability to the new OLI sensor onboard the Landsat 8 satellite, launched on February 11, 2013 (Holden and Woodcock 2016; Ke et al. 2015; Lymburner et al. 2016; Olmanson et al. 2016; Pahlevan et al. 2014; Roy et al. 2016; Zhu et al. 2016). One recent effort showed that the Landsat 7 and 8 sensors were generally comparable for assessing colored dissolved organic matter (CDOM) concentrations and water clarity in the oligotrophic lakes and reservoirs of Minnesota (Olmanson et al. 2016).

This study develops sensor-specific water quality monitoring algorithms for quantifying chlorophyll-a, TSM, and water clarity levels using in situ data collected from a semi-arid hypereutrophic reservoir. The performance of these models is then compared with a “pooled model” that disregards sensor differences as a test of algorithm transferability. The inclusion of ancillary environmental data is evaluated in terms of its ability to explain seasonal variability in the lake’s optical signals and improve model fit. The study concludes by discussing the implications of observed algorithm differences across sensors on the potential of using the Landsat program for monitoring water quality in hypereutrophic lakes and reservoirs.

Methods and materials

Pilot study area

The pilot study area, the Qaraoun Reservoir (33° 34′ N, 35° 42′ E; altitude 840 m), is located in the Bekaa Valley, Lebanon, and has a surface area that fluctuates between 4 and 10 km² depending on seasonal precipitation patterns and dam discharge rates. The maximum depth of the reservoir is 45 m, while the median depth fluctuates between 10 and 20 m depending on the season. Fed primarily by the Litani River, the reservoir was constructed in 1959 to provide water for hydropower, irrigation, and domestic supply, and has a storage capacity of 220 million cubic meters (MCM).

Since its construction, the reservoir has been subject to the uncontrolled discharge of both point and non-point pollution sources (BAMAS 2005; ELARD 2011; Ministry of Environment 2011), resulting in deteriorating water quality and highly eutrophic, turbid conditions, as indicated by reported chlorophyll-a concentrations (Fadel et al. 2014; Slim et al. 2012), with massive algal blooms and high levels of nutrient pollution (El-Fadel and Zeinati 2000; Shaban and Nassif 2007). Most of the turbidity in the reservoir results from algal growth, although sediment enters the water body during periods of heavy rainfall in the winter months. Past monitoring programs were short term and generally inconsistent, both in the parameters measured and in sampling locations (Fadel et al. 2014; El-Fadel et al. 2003; Jurdi et al. 2002; Korfali et al. 2006; Shaban and Nassif 2007; International Resources Group 2011). Establishing robust water quality inference from remotely sensed data can thus substantially improve the management of similar inland freshwater systems that lack adequate in situ data.

In situ monitoring

Sixteen in situ water quality sampling campaigns were conducted between July 2013 and May 2015, whereby a total of 108 surface samples were collected and analyzed. The campaigns were planned to be concurrent with the overpass of the Landsat 7 or 8 satellites (Table 1). Measurements were taken at sampling stations accessed by boat between 9 AM and 12 noon (local time) to capture lake conditions around the 10 AM (local time) satellite overpass time. To ensure the utility of the satellite images, in situ sampling was restricted to clear days with minimal cloud cover over the reservoir, thus limiting winter sampling because the area experiences heavy winter precipitation typical of the Eastern Mediterranean region.

Table 1 Summary of in situ sampling dates and associated Landsat satellite

In situ samples were collected from 9 sampling stations across the reservoir surface (Fig. 1). This number was reduced to 5 stations in 2014 because extreme drought conditions lowered the reservoir volume and surface area. Secchi disk depth (SDD) was measured in the field using a 20-cm diameter disk to assess water transparency. Two 200-mL surface water grab samples were collected ~10 cm below the water surface and subsequently analyzed in the laboratory for chlorophyll-a and TSM. Laboratory analyses followed Standard Methods for the Examination of Water and Wastewater (APHA, WEF, AWWA 2012). For chlorophyll-a analysis, a known volume of sample with a magnesium carbonate buffer was filtered through a membrane filter, which was stored overnight at −20 °C to facilitate rupture of the algal cells. Chlorophyll-a was then extracted using a 90% acetone solution and sonication. Extracts were steeped in the acetone solution overnight and then clarified by centrifugation. Chlorophyll-a concentrations were calculated from absorbance at 664, 647, and 630 nm, with absorbance at 750 nm used to correct for turbidity (APHA, WEF, AWWA 2012). Absorbance was measured on a HACH 4000 spectrophotometer. For TSM, a known sample volume was filtered through pre-dried glass fiber filter paper, and the filters were then dried in a 105 °C oven and weighed. The total fixed matter content (i.e., mineral sediments) was also measured by igniting the TSM samples in a 550 °C furnace to determine the proportion of the inorganic fraction within the total matter (APHA, WEF, AWWA 2012). The non-fixed (organic) matter content consists of algae and CDOM and can be highly correlated with chlorophyll-a, depending on the proportion of algae in the TSM.
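The laboratory calculations described above can be sketched as follows. This minimal example assumes the trichromatic chlorophyll-a equation given in Standard Methods (coefficients 11.85, 1.54, and 0.08, for absorbances read in a 1-cm cuvette); the text does not spell out the equation, so these coefficients should be checked against the procedure actually used. The gravimetric TSM and fixed-matter terms use the usual weight-difference × 1000 / sample volume (mL) form. Function and argument names are hypothetical.

```python
def chlorophyll_a_ugl(od664, od647, od630, od750, extract_volume_l, sample_volume_l):
    """Chlorophyll-a in the original water sample (ug/L), trichromatic method.

    Assumes a 1-cm cuvette. Each reading is turbidity-corrected by
    subtracting the 750-nm absorbance.
    """
    a664, a647, a630 = (od - od750 for od in (od664, od647, od630))
    # Chlorophyll-a concentration in the acetone extract, mg/L
    ca_extract = 11.85 * a664 - 1.54 * a647 - 0.08 * a630
    # mg/L extract x (L extract / L sample) -> mg/L in sample; x 1000 -> ug/L
    return ca_extract * extract_volume_l / sample_volume_l * 1000.0


def suspended_matter(filter_mg, dried_mg, ignited_mg, sample_volume_ml):
    """Return (TSM mg/L, fixed matter mg/L, organic %) from filter weights.

    filter_mg:  pre-dried filter weight
    dried_mg:   filter + residue after drying at 105 C
    ignited_mg: filter + residue after ignition at 550 C
    """
    tsm = (dried_mg - filter_mg) * 1000.0 / sample_volume_ml      # total suspended matter
    fixed = (ignited_mg - filter_mg) * 1000.0 / sample_volume_ml  # mineral (fixed) part
    organic_pct = 100.0 * (tsm - fixed) / tsm                     # volatile (organic) share
    return tsm, fixed, organic_pct
```

For example, a 10-mL extract from a 200-mL sample with corrected absorbances of 0.5, 0.1, and 0.05 gives `chlorophyll_a_ugl(0.5, 0.1, 0.05, 0.0, 0.01, 0.2)`, about 288 μg/L.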

Fig. 1

General location of study reservoir and sampling locations. True color image is a Landsat 8 image taken on July 4, 2013 showing an algal bloom

Image processing

Available Landsat ETM+ and OLI surface reflectance products for the study area (WRS-2 Path 174, Row 037) were obtained from the USGS Earth Explorer website, http://earthexplorer.usgs.gov/. These products include conversion to top-of-atmosphere (TOA) reflectance (ρp). Atmospheric correction of Landsat 7 images was based on the USGS Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm (Masek et al. 2006). LEDAPS applies MODIS atmospheric correction routines to Landsat Level-1 data using the Second Simulation of a Satellite Signal in the Solar Spectrum (6S) radiative transfer model. The use of the 6S model has been found to improve the reliability of water quality algorithms developed for reservoirs and eutrophic lakes (Bonansea et al. 2015; Tebbs et al. 2013). For Landsat 8, atmospheric correction was based on the provisional USGS L8SR algorithm, which uses an internal radiative transfer model with input from a MODIS climate modeling grid. Both the LEDAPS and L8SR surface reflectance datasets have been evaluated against other atmospheric correction approaches and reference data, including ATCOR-2 (Vuolo et al. 2015), AERONET ground-based measurements (Claverie et al. 2015; Maiersperger et al. 2013), MODIS NBAR (Feng et al. 2013), 6S and ELM (Nazeer et al. 2014), and a MODTRAN-based model (Sendra et al. 2015); these studies found that LEDAPS and L8SR compared favorably. All images also underwent standard geometric and terrain correction.

The Landsat images were further processed using the raster (Hijmans and van Etten 2014) and landsat (Goslee 2011) packages in the software R (R Core Team 2015). Landsat 7 images were corrected for the Scan Line Corrector (SLC) failure on the ETM+ sensor by applying the gap masks provided with the Level 1T products. Because the Qaraoun Reservoir lies near the center of the Path 174, Row 037 scene, only a small proportion of the reservoir surface was affected by missing SLC data (the exact proportion varying by acquisition date). Additionally, land masks were developed for both Landsat 7 and 8 images using histogram slicing to trace the outline of the reservoir and remove contamination by land pixels (Frazier and Page 2000).
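The masking step can be sketched in plain Python. The text describes histogram slicing with a manually traced outline; as a stand-in, this example picks the slicing threshold automatically with Otsu's method (a different, well-known histogram thresholding technique, not the authors' stated procedure) applied to NIR reflectance, where water is usually darker than land. The boolean `gap` grid standing in for the SLC gap mask and all function names are hypothetical. In hypereutrophic water, surface scums can raise NIR reflectance, so any automatic threshold would need visual checking.

```python
import math


def otsu_threshold(values, n_bins=64):
    """Otsu threshold: the histogram cut that maximizes between-class variance."""
    vals = [v for v in values if v is not None and not math.isnan(v)]
    lo, hi = min(vals), max(vals)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant image
    counts = [0] * n_bins
    for v in vals:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total = len(vals)
    total_sum = sum(c * x for c, x in zip(counts, centers))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(n_bins - 1):
        w0 += counts[i]
        sum0 += counts[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width  # upper edge of bin i
    return best_t


def water_mask(nir, gap, threshold):
    """True for valid (non-gap) pixels whose NIR reflectance is below threshold."""
    return [[(not g) and v is not None and v < threshold
             for v, g in zip(nir_row, gap_row)]
            for nir_row, gap_row in zip(nir, gap)]
```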

Water quality algorithm development

Remote sensing-based algorithm development favors band ratios over single bands as predictor variables, since band ratios tend to reduce noise in the reflectance signal and better isolate features related to underlying water body characteristics (Lee et al. 2006; O’Reilly et al. 1998; O’Reilly et al. 2001). The most effective Landsat TM or ETM+ band ratios for quantifying chlorophyll-a, TSM, and SDD in lakes and reservoirs were identified by reviewing previous studies and selecting spectral band ratios that showed clear links to the radiometric properties of the measured water quality variables (Table 2). Several single bands were also explored owing to their prevalence in the literature. Both the NIR (Onderka and Pekárová 2008; Long and Pavelsky 2013; Tebbs et al. 2013; Hicks et al. 2013) and the Red band (Ma and Dai 2005; Ritchie et al. 1987; Nellis et al. 1998; Svab et al. 2005; Tyler et al. 2006; Long and Pavelsky 2013; Pahlevan et al. 2014) have been used to estimate TSM concentrations. Similarly, for SDD, the Red band (Brezonik et al. 2005; Dekker et al. 2001; McCullough et al. 2012), the Blue band (Kloiber et al. 2002a; McCullough et al. 2012; Olmanson et al. 2011, 2008; Sawaya et al. 2003), and their ratio (Chipman et al. 2004; Greb et al. 2009; Olmanson et al. 2016; Fuller et al. 2004) have been reported to be effective. While Landsat 7 and 8 have similar spectral band placements for the Blue (ETM+ band 1, 0.45–0.52 μm; OLI band 2, 0.45–0.51 μm), Green (ETM+ band 2, 0.52–0.60 μm; OLI band 3, 0.53–0.59 μm), Red (ETM+ band 3, 0.63–0.69 μm; OLI band 4, 0.64–0.67 μm), and NIR (ETM+ band 4, 0.76–0.90 μm; OLI band 5, 0.85–0.88 μm) bands, Landsat 8 also features a new band termed the “Ultra Blue” (OLI band 1, 0.43–0.45 μm), which promises to be useful in coastal studies as well as for tracking atmospheric aerosols.
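The sensor-specific band assignments listed above can be captured in a small lookup so that the same ratio definitions apply to both sensors. This sketch covers only the ratios named in the text (Table 2 may list others); the dictionary layout and function name are hypothetical, and reflectance values are assumed to be surface reflectance keyed by band number.

```python
# Band numbers for the spectral regions compared in the text
BANDS = {
    "ETM+": {"Blue": 1, "Green": 2, "Red": 3, "NIR": 4},
    "OLI": {"UltraBlue": 1, "Blue": 2, "Green": 3, "Red": 4, "NIR": 5},
}


def band_ratios(sensor, reflectance, blue="Blue"):
    """Ratio predictors from surface reflectance keyed by band number.

    `blue` selects which band plays the Blue role, so OLI ratios can be
    computed with either the Blue (band 2) or the Ultra Blue (band 1) band.
    """
    b = {name: reflectance[num] for name, num in BANDS[sensor].items()}
    bl, g, r, nir = b[blue], b["Green"], b["Red"], b["NIR"]
    return {
        f"({blue}-Red)/Green": (bl - r) / g,  # chlorophyll-a
        f"{blue}/Red": bl / r,                # chlorophyll-a and SDD
        "NIR/Red": nir / r,                   # chlorophyll-a
        f"NIR/{blue}": nir / bl,              # TSM
    }
```

Calling `band_ratios("OLI", refl, blue="UltraBlue")` yields the Ultra Blue variants of the Blue-based ratios from the same definitions.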

Table 2 Landsat band ratios tested in this study for chlorophyll-a, TSM, and SDD and reference to their successful use in the remote sensing literature

In situ data were matched with Landsat data by generating 30- and 60-m buffers around each sampling station and averaging the pixel values within each buffer. Although buffer distance can play a significant role in algorithm development (McCullough et al. 2012), it did not in this study, so the analysis proceeded with the 30-m buffer. Although the concentrations of water quality parameters can vary considerably within a 30-m satellite pixel, it is common in remote sensing work to assume that in situ point measurements are representative of their immediate neighborhood (Bonansea et al. 2015; Brezonik et al. 2005; Cheng and Lei 2001; Dekker and Peters 1993; Olmanson et al. 2011; Serwan and Baban 1993; Sun et al. 2015; Tebbs et al. 2013; Yacobi et al. 1995). This assumption may not hold in the event of a localized algal bloom. In this study, only two data points exceeded the 500 μg/L threshold defined for algal scums (Matthews et al. 2012), one of which was excluded from the analysis (chlorophyll-a = 5500 μg/L). It should be emphasized that comparing a point measurement with an areal prediction can introduce errors associated with scale mismatch (Banerjee et al. 2014).
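A minimal sketch of the buffer-averaging step follows. It assumes a north-up grid in projected metres and that a pixel belongs to a buffer when its centre lies within the buffer radius; the text does not state the inclusion rule, so this is an assumption. Land-masked or SLC-gap pixels are represented as `None` and skipped. The function name and arguments are hypothetical.

```python
import math


def buffer_mean(grid, x0, y0, pixel_size, x, y, radius):
    """Mean of valid pixels whose centres lie within `radius` m of (x, y).

    grid[r][c] holds reflectance (None for masked or gap pixels); (x0, y0) is
    the upper-left corner of the grid in projected metres, rows run southwards.
    Returns None if no valid pixel falls inside the buffer.
    """
    total, n = 0.0, 0
    for r, row in enumerate(grid):
        cy = y0 - (r + 0.5) * pixel_size  # pixel-centre northing
        for c, v in enumerate(row):
            cx = x0 + (c + 0.5) * pixel_size  # pixel-centre easting
            if v is not None and math.hypot(cx - x, cy - y) <= radius:
                total += v
                n += 1
    return total / n if n else None
```

With 30-m pixels and a station at a pixel centre, a 30-m radius takes the centre pixel and its four edge neighbours, while a 60-m radius widens the average to more of the surrounding water.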

One in situ sampling station was removed from the analysis of images acquired between August 13 and November 17, 2013 (5 data points in total) because its close proximity to the reservoir shoreline risked land contamination of the water spectrum. Exploratory data analysis also identified 11 potential chlorophyll-a outliers among the remaining 103 samples. One chlorophyll-a measurement exceeded 5500 μg/L and was therefore excluded. Similarly, 10 low chlorophyll-a values (in situ chlorophyll-a < 10 μg/L) clustered away from the main distribution and were also omitted, as we believe they represent atypical chlorophyll-a conditions in the reservoir. As such, chlorophyll-a model development was based on a total of 92 in situ sampling points, while TSM and SDD model development was based on 103 in situ sampling points.

Initial Spearman’s rank correlations were computed between in situ chlorophyll-a, TSM, and SDD and the Blue, Green, Red, and NIR band reflectances, separately for Landsat 7 and 8, to facilitate interpretation of the band-ratio results. For Landsat 8, the “Ultra Blue” band was also included in the correlation analysis, given that it proved useful in predicting CDOM and water clarity in the predominantly oligotrophic lakes and reservoirs of Minnesota (Olmanson et al. 2016). Band ratios from Table 2 were then used to develop empirical algorithms using simple linear regression. Band ratios that included the Blue band in the numerator or denominator (\( \frac{\mathrm{Blue}-\mathrm{Red}}{\mathrm{Green}} \) and \( \frac{\mathrm{Blue}}{\mathrm{Red}} \) for chlorophyll-a, \( \frac{\mathrm{NIR}}{\mathrm{Blue}} \) for TSM, and \( \frac{\mathrm{Blue}}{\mathrm{Red}} \) for SDD) were tested with both the Blue and Ultra Blue bands for Landsat 8 data. Algorithms were developed separately for Landsat 7 and 8. These single-sensor models were compared with algorithms developed by pooling (combining) the data from both sensors, which do not account for inter-sensor differences. Pooling assumes that the data collected by the two sensors are interchangeable, so it can be considered a test of algorithm robustness between the two sensors. Full transferability would be indicated if the algorithm's functional form and coefficients did not differ between the two sensors.
Log transformation of all dependent variables (chlorophyll-a, TSM, SDD) and many of the predictor variables (\( \frac{\mathrm{NIR}}{\mathrm{Red}} \) , \( \frac{\mathrm{Blue}}{\mathrm{Red}} \) , \( \frac{\mathrm{NIR}}{\mathrm{Blue}} \),\( \frac{\mathrm{Ultra}\ \mathrm{Blue}}{\mathrm{Red}} \), \( \frac{\mathrm{NIR}}{\mathrm{Ultra}\ \mathrm{Blue}} \)) was required to meet the assumptions of parametric regression.

Single-predictor water quality algorithms were then expanded using stepwise regression to assess whether more complex models could improve the quantification of chlorophyll-a, TSM, and SDD. All band ratios listed in Table 2, along with the single Red and NIR bands for TSM models and the Red and Blue bands for SDD models, were used in the stepwise process. To reduce overfitting, models were restricted to a maximum of two band-based predictors. The best models were chosen based on Akaike’s Information Criterion (AIC) (Akaike 1974), with the lowest AIC value indicating the most parsimonious model. If two competing models were equally supported by the data, priority was given to a functional form common to the Landsat 7, Landsat 8, and combined models. Multiple regression models were also tested for collinearity between predictor variables using the variance inflation factor (VIF), and models were rejected if the VIF exceeded 5 (Craney and Surles 2002; Mason et al. 2003). Ancillary data in the form of water temperature and seasonality were then included to better capture changes in algal growth and community dynamics, sediment and nutrient inputs, the coupling of optically active water column constituents, and internal lake mixing. Lake surface temperatures were derived from the thermal bands of Landsat 7 and 8 and locally corrected against in situ measurements (Eqs. 1 and 2). The corrected temperatures corresponded closely with in situ measurements, with a regression slope very close to 1. Seasonality was represented by a sine wave based on the Julian day of image capture, \( \sin\left(\frac{2\pi\,\mathrm{Julian\ Day}}{365}\right) \). Note that neither temperature (after local correction with Eq. 1 or 2) nor seasonality requires additional in situ measurements, so both can easily be incorporated in future Landsat-based water quality monitoring initiatives. Temperature and/or seasonality were retained in the final model only if they lowered the AIC score and reduced the standard errors of the model coefficients.
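The AIC and VIF screening criteria described above can be illustrated for the two-band-predictor case. This sketch assumes ordinary least squares with Gaussian errors. The `aic` function drops terms that are constant for a given sample size, so it differs from R's `AIC()` for `lm` fits by an additive constant; differences between models fitted to the same data are unchanged. With exactly two predictors, both share the same VIF, 1/(1 − r²), where r is their Pearson correlation. Function names are hypothetical.

```python
import math


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v], m


def ols2(y, x1, x2):
    """OLS fit y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, rss)."""
    (yc, my), (c1, m1), (c2, m2) = _centered(y), _centered(x1), _centered(x2)
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, yc))
    s2y = sum(b * t for b, t in zip(c2, yc))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    rss = sum((t - (b0 + b1 * a + b2 * b)) ** 2 for t, a, b in zip(y, x1, x2))
    return b0, b1, b2, rss


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts coefficients incl. intercept."""
    return n * math.log(rss / n) + 2 * k


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors of a two-predictor model: 1 / (1 - r^2)."""
    c1, _ = _centered(x1)
    c2, _ = _centered(x2)
    r = sum(a * b for a, b in zip(c1, c2)) / math.sqrt(
        sum(a * a for a in c1) * sum(b * b for b in c2))
    return 1.0 / (1.0 - r ** 2)
```

A candidate pair would be kept only if its `vif_two_predictors` value is at most 5, and among the kept models the one with the lowest `aic(rss, n, 3)` would be preferred.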

$$ \mathrm{Landsat}\ 7\ \mathrm{Corrected\ Temp}\ (^{\circ}\mathrm{C}) = 0.47 + 0.96 \times \left(\frac{\mathrm{Band}\ 6.1 + \mathrm{Band}\ 6.2}{2}\right);\quad \left(R^2 = 0.95,\ n = 41\right) $$
(1)
$$ \mathrm{Landsat}\ 8\ \mathrm{Corrected\ Temp}\ (^{\circ}\mathrm{C}) = 0.90 + 0.92 \times \left(\frac{\mathrm{Band}\ 10 + \mathrm{Band}\ 11}{2}\right);\quad \left(R^2 = 0.94,\ n = 41\right) $$
(2)
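Equations (1) and (2) and the seasonal covariate translate directly into code. This sketch assumes the thermal band values have already been converted to brightness temperatures in °C, which the equations imply but do not state; the function names are hypothetical.

```python
import math


def corrected_temp_l7(band61_c, band62_c):
    """Eq. (1): locally corrected Landsat 7 lake surface temperature (deg C)."""
    return 0.47 + 0.96 * (band61_c + band62_c) / 2.0


def corrected_temp_l8(band10_c, band11_c):
    """Eq. (2): locally corrected Landsat 8 lake surface temperature (deg C)."""
    return 0.90 + 0.92 * (band10_c + band11_c) / 2.0


def season_term(julian_day):
    """Seasonality covariate sin(2*pi*J/365)."""
    return math.sin(2.0 * math.pi * julian_day / 365.0)
```

Because sin(2πJ/365) is zero near 1 January and early July, peaks near early April, and reaches its minimum near early October, this single term mainly contrasts spring with autumn conditions rather than winter with summer.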

Model validation

A fourfold (k-fold) cross-validation was performed on the data using the DAAG package (Maindonald and Braun 2014) in the software R (R Core Team 2015). The data were partitioned into four equal subsamples, with one subsample used for validation and the remaining three used as training data. This process was repeated four times so that each subsample was used once for validation. R² shrinkage was assessed for the cross-validated models using the bootstrap package (Tibshirani and Leisch 2015) in R.
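The fourfold procedure can be sketched for a single-predictor model in plain Python rather than with the DAAG package. Here each fold is predicted by a model fitted to the other three folds, and the cross-validated R² is computed from the pooled out-of-fold predictions. DAAG and bootstrap may summarize the folds differently, so this is one common definition rather than the packages' exact output; the random fold assignment, the seed, and the function name are assumptions.

```python
import random


def kfold_cv_r2(x, y, k=4, seed=1):
    """Cross-validated R^2 for a simple linear regression y = b0 + b1*x.

    Each fold is predicted by a model fitted on the other k-1 folds; R^2 is
    computed from the pooled out-of-fold predictions.
    """
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal subsamples
    pred = [0.0] * len(x)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        mx = sum(x[i] for i in train) / len(train)
        my = sum(y[i] for i in train) / len(train)
        sxx = sum((x[i] - mx) ** 2 for i in train)
        b1 = sum((x[i] - mx) * (y[i] - my) for i in train) / sxx
        b0 = my - b1 * mx
        for i in fold:
            pred[i] = b0 + b1 * x[i]  # predict only the held-out points
    ybar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

R² shrinkage can then be reported as the in-sample R² minus this cross-validated value; a small difference is what indicates limited over-fitting.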

Results

Spatio-temporal variations in in situ chlorophyll-a, TSM, and SDD across the 15 sampling campaigns are shown in Fig. 2. Chlorophyll-a concentrations were consistently high in the reservoir, with a median value of 71.0 μg/L (n = 102, range = 4.8–5502 μg/L). Chlorophyll-a concentrations were generally higher in the summer, although algal blooms were also apparent in the fall and spring. TSM concentrations were moderate over the sampling period, with a median value of 12.5 mg/L (n = 103, range = 1.0–69.0 mg/L), and most of the TSM was organic (median organic fraction = 79.6%, range = 43.8–100%). The median SDD was 1.0 m (n = 103, range = 0.18–4.2 m). All three in situ parameters examined in this study highlight the eutrophic to hypereutrophic status of the reservoir, a direct consequence of excessive nutrient discharge upstream. Both TSM and SDD were strongly correlated with chlorophyll-a concentrations (Table 3). The high correlation between chlorophyll-a and SDD indicates that algae are the dominant factor limiting light penetration in the reservoir, and the high proportion of organic material in the TSM suggests that suspended matter in the reservoir is also dominated by algae and their byproducts. Thus, all three parameters act in some respects as different indicators of algal biomass in this study, albeit with clear seasonal variability.

Fig. 2

Variation in in situ chlorophyll-a, TSM, and SDD. Points represent individual sampling stations. Open circles represent field sampling values during LE7 overpass. Closed circles represent field sampling values during LC8 overpass

Table 3 Spearman’s Rank correlation coefficients (rho) for in situ water quality parameters. All correlations are significant (p < 0.0001, n = 92)

While correlations between in situ water constituents (chlorophyll-a, TSM, and SDD) and the Landsat bands exhibited similar patterns for the two sensors, Landsat 8 bands generally had lower correlations with the three water quality parameters (Table 4). Reflectance in the Green, Red, and NIR bands is generally expected to increase with increasing chlorophyll-a and TSM, and with decreasing SDD, due to the high reflectance and low absorbance of chlorophyll-a and suspended sediments in these spectral regions (Arenz et al. 1996; Gitelson 1992; Gitelson et al. 1993; Gitelson and Yacobi 1995; Le et al. 2009; Odermatt et al. 2012; Yacobi et al. 1995). Reflectance in the visible Blue spectrum is expected to be governed by chlorophyll-a absorption on the one hand and TSM backscattering on the other (Mayo et al. 1995).

Table 4 Spearman’s rank correlation coefficients (rho) between in situ water quality parameters and Landsat 7 (Blue = band 1, Green = band 2, Red = band 3, NIR = band 4) and Landsat 8 (Blue = band 2, Green = band 3, Red = band 4, NIR = band 5, Ultra Blue = band 1) bands. Bands represent surface reflectance

The observed differences in the correlations between the sensor bands and the water quality parameters of interest (Table 4) could be due to changes in band placement in the new Landsat 8 OLI sensor relative to the Landsat 7 ETM+ and/or to the different atmospheric correction algorithms adopted for the two sensors (Holden and Woodcock 2016). Differences in band placement are particularly pronounced in the NIR and Red regions. The weakest correlations were found between the water quality parameters and the Blue band(s), particularly for Landsat 8. The competing chlorophyll-a absorption and TSM backscattering features in the Landsat Blue bands can weaken correlations with the monitored water quality parameters (Arenz et al. 1996; Gitelson and Yacobi 1995; Le et al. 2009; Odermatt et al. 2012). Tebbs et al. (2013) also reported that in hypereutrophic systems, where algal cells concentrate at the surface, scattering tends to increase, diminishing light penetration and absorption. Moreover, the coarse spectral resolution of the Landsat Blue bands (ETM+ band 1, 0.45–0.52 μm; OLI band 2, 0.45–0.51 μm) is expected to further complicate establishing robust relationships between chlorophyll-a concentrations and Blue band reflectance.

Simple linear regression models for chlorophyll-a showed stronger relationships with Landsat 7 images for all band ratios that incorporated the Blue band. Conversely, the chlorophyll-a model based on the \( \left(\frac{\mathrm{NIR}}{\mathrm{Red}}\right) \) band ratio performed well for both sensors (Table 5). When the data were pooled to develop a common algorithm for the two sensors, the \( \left(\frac{\mathrm{Blue}-\mathrm{Red}}{\mathrm{Green}}\right) \) and \( \left(\frac{\mathrm{Blue}}{\mathrm{Red}}\right) \)-based models performed generally poorly, reflecting the differences found in the sensor-specific models. The common algorithm based on the \( \left(\frac{\mathrm{NIR}}{\mathrm{Red}}\right) \) ratio was largely stable across sensors. The robustness of the \( \left(\frac{\mathrm{NIR}}{\mathrm{Red}}\right) \) algorithm across sensors, compared with the Blue-based algorithms, can be attributed to the reduced spectral interference of sediment particles and CDOM with the NIR/Red ratio in turbid and highly eutrophic waters (Gitelson and Yacobi 1995; Gurlin et al. 2011; Han et al. 1994; Stumpf et al. 2016). Expanding the models to incorporate multiple band ratios improved model fit for both sensors (adjusted R² Landsat 7 = 0.70; adjusted R² Landsat 8 = 0.50). The multiple regression models showed no collinearity (VIF for Landsat 7 + 8 model = 2.04; VIF for Landsat 7 model = 3.54; VIF for Landsat 8 model = 2.09). The best models for the two sensors shared the same functional form, which highlights their robustness (Table 5).

Table 5 Chlorophyll-a regression models based on band ratios. Band ratios are based on surface reflectance; Water temperature (Temp) is in °C for corrected Landsat thermal bands; Season is calculated as sin(2πJ/365), where J = Julian Day; Cross-validated R² values are based on fourfold cross-validation

Landsat 7-based TSM regression algorithms also outperformed those based on Landsat 8 for models that incorporated the Blue band, and pooling the data from both sensors did not improve the models (Table 6). Expanding the TSM models to include multiple bands significantly improved TSM quantification for both sensors, with no evidence of collinearity (VIF for Landsat 7 + 8 model = 2.56; VIF for Landsat 7 model = 3.90; VIF for Landsat 8 model = 2.29). Including a seasonal term and/or water temperature as a covariate also proved advantageous. The best overall TSM models for Landsat 7 and 8 had adjusted R² values of 0.81 and 0.58, respectively, and the across-sensor pooled model had an adjusted R² of 0.63.

Table 6 TSM regression models based on band ratios and single bands. Band ratios are based on surface reflectance. Water temperature (Temp) is in °C for corrected Landsat thermal bands; Season is calculated as sin(2πJ/365), where J = Julian Day; Cross-validated R² and residual sum of squares are based on fourfold cross-validation

Models relating SDD to the \( \left(\frac{\mathrm{Blue}}{\mathrm{Red}}\right) \) ratio proved effective for both satellites, with adjusted R² values of 0.76 and 0.57 for Landsat 7 and 8, respectively (Table 7). The relationship reflects the increase in Red band reflectance as SDD decreases. Adding the Red band improved the R² of all algorithms while also reducing their residual standard errors. The multiple regression models showed no collinearity at the VIF > 5 criterion (VIF for Landsat 7 + 8 model = 2.64; VIF for Landsat 7 model = 4.22; VIF for Landsat 8 model = 2.46). Including season as an additional covariate also improved all models significantly. The best overall SDD model for Landsat 7 had an adjusted R² of 0.81, while the best overall SDD model for Landsat 8 had an adjusted R² of 0.63. The best across-sensor model shared the structural form of the individual sensor models and had an adjusted R² of 0.66.

Table 7 SDD regression models based on band ratios and single bands. Band ratios are based on surface reflectance. Water temperature (Temp) is in °C for corrected Landsat thermal bands; Season is calculated as sin(2πJ/365), where J = Julian Day; Cross-validated R² and residual sum of squares are based on fourfold cross-validation

To evaluate the robustness of the models and check for over-fitting, a fourfold cross-validation analysis was conducted. The results (Tables 5, 6, and 7) showed that the models were generally robust with no evidence of over-fitting, given the minimal drop from the original R² of all models. Overall, the chlorophyll-a models were slightly less robust than the TSM and SDD models, especially those based on a single band ratio. Moreover, expanding the models beyond a single band ratio appeared to increase robustness across the three water quality parameters, which further suggests that the models do not suffer from over-fitting.

Discussion

The results confirmed the effectiveness of Landsat-based algorithms in quantifying water quality in a semi-arid hypereutrophic reservoir. This reinforces the important role that both sensors can play in assessing current eutrophication status: Landsat 8 as the most recent Landsat mission, and Landsat 7 given its long operational record (in operation since 1999) despite the failure of its SLC. Overall, the predictive power of the models developed for both Landsat satellites can be considered good (adjusted R² for chlorophyll-a 0.50–0.70; for TSM 0.58–0.81; for SDD 0.63–0.81), particularly given that these models were calibrated using in situ data collected from an optically complex reservoir over multiple seasons and across 3 years (Table 1). While models with higher R² have been reported in the literature for chlorophyll-a (Giardino et al. 2001; He et al. 2008; Kallio et al. 2005; Karakaya et al. 2011; Tebbs et al. 2013; Torbick et al. 2008; Alparslan et al. 2007), TSM (Ma and Dai 2005; Wu et al. 2015; Alparslan et al. 2007), and SDD (Karakaya et al. 2011; Khattab and Merkel 2014; Kloiber et al. 2002a; McCullough et al. 2012; Olmanson et al. 2016), those models were calibrated with data collected over a limited period of time, used spatio-temporal averaging, were developed for high clarity lakes, and/or relied on data-mining techniques. A detailed literature review of Landsat-based chlorophyll-a algorithms developed for lakes and reservoirs indicates that only 6 of 24 studies conducted between 1990 and 2015 calibrated their water quality algorithms on data collected over more than one season (Allan et al. 2015; Bonansea et al. 2015; Chao Rodríguez et al. 2014; Cheng and Lei 2001; Ritchie et al. 1990; Tebbs et al. 2013). Extending the calibration period across seasons is expected to increase robustness and reduce the risk of overfitting.

The TSM and SDD models were more robust than the chlorophyll-a models, likely because TSM and SDD represent more generalized parameters and because Landsat band placement is suboptimal for measuring chlorophyll-a. The lack of a red-edge band (around 705 nm) in the Landsat sensors has proven challenging in waters with high mineral suspended matter, where the mineral signal can interfere with chlorophyll-a algorithms (Gitelson et al. 2008; Gitelson 1992; Moses et al. 2012; Olmanson et al. 2013). Shortcomings of Landsat-based chlorophyll-a monitoring in optically complex waters have been recognized (Olmanson et al. 2015; Olmanson et al. 2016). Moreover, the proliferation of cyanobacteria in the reservoir in summer and autumn may diminish the predictive power of the chlorophyll-a models, because cyanobacteria contain phycocyanin, whose absorbance cannot be distinguished from that of chlorophyll-a within the red bands of both Landsat 7 and 8 (Olmanson et al. 2015; Stumpf et al. 2016).

Expanding the models beyond a single band ratio generally improved model fit (higher adjusted R² and lower residual standard error), with little evidence of over-fitting according to the cross-validation and variance inflation factor (VIF) analyses. Similarly, including ancillary data improved model fit for TSM and SDD, with little evidence of over-fitting, particularly for the across-sensor models (Fig. 3). Residual standard errors were lowest for all TSM and SDD algorithms when temperature and/or season were included in the models for Landsat 7, Landsat 8, and the combined sensors. Hence, the models indicate seasonal differences in how the optically active water quality parameters produce spectral signals measurable by remote sensing. Water temperature and seasonality both affect algal growth and community dynamics, cycles of nutrient inputs (including internal lake mixing), cycles of sediment inputs from river flow, and light and wind patterns, and can therefore be expected to produce some degree of predictable variation in reflectance signals over the year. In the regression models, these covariates act as surrogates for algal ecology and for compositional changes in suspended matter. They are particularly advantageous from a management perspective because they do not require in situ data collection, which makes them easy to incorporate into future predictions as well as hindcasts.
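The ancillary covariates discussed above enter a regression as extra columns of the design matrix, and the VIF check asks how strongly each column is explained by the others. A minimal sketch, assuming four seasons coded as drop-first dummy variables (the season labels, the baseline, and all names are hypothetical, not the study's coding); the VIF is computed as the diagonal of the inverse correlation matrix of the predictors, which is equivalent to 1/(1 − R_j²) from regressing each predictor on the rest.

```python
import math

# Hypothetical season coding; "winter" is the baseline absorbed by the intercept.
SEASONS = ["winter", "spring", "summer", "autumn"]


def design_row(ratio, temp_c, season):
    """One row of predictors: band ratio, water temperature, drop-first season dummies."""
    dummies = [1.0 if season == s else 0.0 for s in SEASONS[1:]]
    return [ratio, temp_c] + dummies


def invert(m):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    return cov / math.sqrt(sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v))


def vif(X):
    """VIF_j = [R^-1]_jj, equivalent to 1 / (1 - R_j^2); assumes every column varies."""
    cols = list(zip(*X))
    corr = [[pearson(u, v) for v in cols] for u in cols]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(cols))]


if __name__ == "__main__":
    # Hypothetical observations: (band ratio, water temperature in deg C, season)
    obs = [(1.10, 14.0, "winter"), (1.35, 19.5, "spring"), (1.80, 27.0, "summer"),
           (1.55, 22.0, "autumn"), (1.05, 13.0, "winter"), (1.40, 20.5, "spring"),
           (1.95, 28.5, "summer"), (1.50, 21.0, "autumn")]
    X = [design_row(*o) for o in obs]
    print([round(v, 2) for v in vif(X)])
```

In this hypothetical example temperature tracks season closely, so their VIFs come out high; a commonly cited rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity.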

Fig. 3

Predicted versus observed values for the final multiple regression models developed for chlorophyll-a (first row), TSM (second row), and SDD (third row). Gray circles show the mean predicted value against the observed value for Landsat 7, and gray vertical lines show the 95% confidence interval of prediction for Landsat 7. Black circles show the mean predicted value against the observed value for Landsat 8, and black vertical lines show the 95% confidence interval of prediction for Landsat 8. Diagonal lines represent the 1:1 line.

The best covariate multiple regression water quality models exhibited strong similarities in functional form between the two sensor types, indicating that both sensors can provide valuable and realistic information on the water quality of hypereutrophic lakes and reservoirs. Yet the differences in model coefficients between the ETM+- and OLI-based models indicate that transferring models between sensors can introduce biases in eutrophic systems such as Qaraoun Reservoir (refer to online Supplementary Material). Therefore, given current sensor calibration and Landsat atmospheric correction regimes, sensor-specific algorithms calibrated with in situ data remain necessary to properly capture water quality dynamics.

The differences in model performance can be partially attributed to differences in band placement between the sensor types, which is the case for the NIR band (Band 4 for Landsat 7 = 0.76–0.90 μm; Band 5 for Landsat 8 = 0.85–0.88 μm) and, to a lesser extent, for the Red band (Band 3 for Landsat 7 = 0.63–0.69 μm; Band 4 for Landsat 8 = 0.64–0.67 μm). Single-band correlations with the water quality parameters were stronger for Landsat 7 in both the Red and NIR bands. While the NIR band in Landsat 8 was repositioned to exclude the water vapor absorption feature at 0.825 μm (Irons et al. 2012), this shift may have affected how the band captures the reflectance characteristics of the water quality parameters, particularly TSM. For instance, Shafique et al. (2003) reported that small differences in NIR reflectance had a large impact on turbidity predictions for Ohio rivers. The splitting of the Blue spectral region into two bands in Landsat 8 (Band 1 Ultra-Blue = 0.43–0.45 μm, Band 2 Blue = 0.45–0.51 μm), versus one band in Landsat 7 (Band 1 Blue = 0.45–0.52 μm), may also have contributed to the observed differences in band correlation and overall model fit. In this study, using Ultra-Blue instead of Blue in the Landsat 8-based models resulted in minimal change when predicting chlorophyll-a and TSM concentrations; differences were more noticeable for the SDD algorithms. Note that differences in model performance (adjusted R² and residual standard error) between Landsat 7 and 8 were most pronounced for algorithms that used the Blue band ((Blue-Red)/Green and Blue/Red for chlorophyll-a; NIR/Blue for TSM; and Blue/Red for SDD). Differences in correlation coefficients were also most pronounced for the Blue band, with the water quality parameters correlating more strongly with the Blue band in Landsat 7 images.
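Because the same spectral region carries a different band number on each sensor, a ratio such as (Blue-Red)/Green has to be assembled from different band indices for ETM+ and OLI. A minimal sketch of that bookkeeping, assuming per-pixel surface reflectance keyed by band number; the Green band numbers (Band 2 on ETM+, Band 3 on OLI) are the standard Landsat assignments rather than values stated in this section, and the dictionary and function names are illustrative. The `use_ultra_blue` switch mimics the Ultra-Blue versus Blue substitution tested for the Landsat 8 models.

```python
# Band numbers for the spectral regions used in the ratios.
# Wavelengths (um) in comments are those quoted in the text.
BANDS = {
    # Blue 0.45-0.52, Red 0.63-0.69, NIR 0.76-0.90
    "ETM+": {"blue": 1, "green": 2, "red": 3, "nir": 4},
    # Ultra-Blue 0.43-0.45, Blue 0.45-0.51, Red 0.64-0.67, NIR 0.85-0.88
    "OLI": {"ultra_blue": 1, "blue": 2, "green": 3, "red": 4, "nir": 5},
}


def band_ratios(sensor, reflectance, use_ultra_blue=False):
    """Blue-based band ratios for one pixel.

    reflectance maps band number -> surface reflectance for the given sensor.
    """
    b = BANDS[sensor]
    blue_key = "ultra_blue" if use_ultra_blue else "blue"
    if blue_key not in b:
        raise ValueError(f"{sensor} has no {blue_key} band")
    blue = reflectance[b[blue_key]]
    green, red, nir = (reflectance[b[k]] for k in ("green", "red", "nir"))
    return {
        "(Blue-Red)/Green": (blue - red) / green,
        "Blue/Red": blue / red,
        "NIR/Blue": nir / blue,
    }


if __name__ == "__main__":
    # Hypothetical reflectances for one pixel on each sensor
    print(band_ratios("ETM+", {1: 0.04, 2: 0.05, 3: 0.02, 4: 0.01}))
    print(band_ratios("OLI", {1: 0.05, 2: 0.04, 3: 0.05, 4: 0.02, 5: 0.01}, use_ultra_blue=True))
```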
The observed discrepancies between the Landsat 7 and 8 Blue bands are unexpected given their nearly identical placement along the electromagnetic spectrum. Other studies have also reported low OLI reflectance in the Blue band (Helder et al. 2013; Holden and Woodcock 2016; Pahlevan et al. 2014; Zhu et al. 2016). Holden and Woodcock (2016) interpreted the darker OLI Blue band reflectance as resulting from better masking of cirrus clouds in the current Landsat 8 atmospheric correction regimes, which make use of the new OLI Ultra-Blue coastal aerosol band (Vermote et al. 2016). Pahlevan et al. (2014) attributed the reflectance differences to sensor calibration based on pre-launch practices. Moreover, the push-broom design of the OLI sensor, as opposed to the whisk-broom design of the ETM+, could also affect radiometric calibration (Czapla-Myers et al. 2013). Future improvements to the L8SR atmospheric correction algorithm currently used with OLI could reduce the observed differences in future Landsat products.

Differences in the range of in situ measurements between the two sensors could also have contributed to the observed differences in sensor performance. The data show that while the range of the in situ data for TSM and SDD was smaller for Landsat 7 than for Landsat 8, the median values and interquartile ranges (first and third quartiles) were nearly identical for the two sensors. For chlorophyll-a, the Landsat 7 data showed a wider range of variability. Nevertheless, 25% of the Landsat 8 chlorophyll-a concentrations were below 40 μg/L, compared with fewer than 14% of the Landsat 7 in situ measurements. This mismatch could have negatively affected the performance of the Landsat 8 models, as they had to make predictions across a wider range of trophic states. Summary statistics of the measured in situ chlorophyll-a, TSM, and SDD values by sensor type are presented in the electronic supplementary material.
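The comparison above rests on a few simple summary statistics per sensor: range, median, quartiles, and the share of samples below 40 μg/L. A minimal sketch of computing them, assuming the in situ values for each sensor are available as plain lists; the function name and the example values are hypothetical, not the study's data, and quartiles follow Python's default `statistics.quantiles` interpolation ("exclusive"), so other software may report slightly different quartiles.

```python
import statistics


def summarize(values, threshold=40.0):
    """Range, median, quartiles and percent of values below a threshold (e.g. 40 ug/L)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "pct_below": 100.0 * sum(v < threshold for v in values) / len(values),
    }


if __name__ == "__main__":
    # Hypothetical chlorophyll-a values (ug/L) matched to each sensor's overpasses
    chl_l7 = [45.0, 62.5, 38.0, 120.0, 88.0, 71.0, 54.0, 97.0]
    chl_l8 = [22.0, 35.0, 48.0, 66.0, 80.0, 39.0, 105.0, 57.0]
    for name, vals in (("Landsat 7", chl_l7), ("Landsat 8", chl_l8)):
        print(name, summarize(vals))
```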

In closing, it is worth emphasizing that Landsat-based chlorophyll-a, TSM, and SDD monitoring depends on the trophic state of the lake or reservoir (Brivio et al. 2001; Gitelson and Yacobi 1995; McCullough et al. 2012); hence, the model results in this study may be pertinent to other reservoirs in a eutrophic or hypereutrophic state with low levels of mineral suspended matter. Differences between the two Landsat sensors (ETM+ and OLI) may be less pronounced in oligotrophic systems and/or systems not dominated by cyanobacteria, as these tend to be more homogeneous and less optically complex than eutrophic systems. Recent studies of oligotrophic systems reported closer correspondence in predictive skill between the two sensors when predicting TSM, water clarity, and CDOM (Lymburner et al. 2016; Olmanson et al. 2016). Ultimately, more studies are needed across the trophic spectrum of diverse inland lakes and reservoirs, in both semi-arid and temperate zones, to ascertain the transferability of algorithms between Landsat 7 and Landsat 8 and to properly evaluate the accuracy of these two important water quality monitoring sensors.

Conclusion

Landsat-based remote sensing of water quality offers the advantage of improving the spatial and temporal coverage of data in a cost-effective way. If properly applied, it has the potential to improve our understanding of poorly monitored lakes and reservoirs. While algorithm transferability between Landsat sensors would be ideal for generating long-term datasets and thus improving management outcomes, this study shows that differences exist between the ETM+- and OLI-based water quality models given current sensor calibrations and atmospheric correction routines. Yet both sensors performed satisfactorily when independently quantifying chlorophyll-a, TSM, and SDD. Moreover, the functional forms of the developed algorithms were often similar across sensors, which indicates a degree of robustness in the developed algorithms. Overall, the Landsat program offers water managers an opportunity to better assess the quality of lakes and reservoirs, track harmful algal blooms, and evaluate how well river-basin point and non-point source pollution plans limit anthropogenic eutrophication.