Introduction

The Indian River Lagoon (IRL) estuary system is located on the east coast of Florida and has the highest species diversity in North America (Graves et al. 2004; SFWMD 2011). In the past several decades, a decline in the ecological and biological integrity of the IRL has been observed, and an ecological shift from seagrass to macroalgae has occurred (Riegl and Foster 2011). The decline is caused in part by deterioration of water quality in the IRL due to increased nutrient levels (Sigua et al. 2000; NEP 2007), as excessive nutrient input to the IRL may lead to large phytoplankton blooms, loss of submerged macrophytes, decline of fish habitats, and anoxia (Gray 1992). Nutrients (and other pollutants) may enter the estuary waters from both natural sources (e.g., surface runoff, erosion, and atmospheric deposition) and anthropogenic sources (e.g., septic systems, agriculture, and reclaimed water irrigation) in the IRL basin (Doering 1996; Badruzzaman et al. 2012; Lapointe et al. 2015). Anthropogenic influences on water quality can be negative or positive. Negative influences include adverse water quality impacts from extensive commercial and residential development along the IRL over the past decades; positive influences include a variety of major restoration projects (e.g., the rediversion project for canal C-1, dragline ditch restoration, connection of residential areas served by septic tanks to central sewer, and storm water treatment projects) that have been implemented by local governments and water management districts with funding support from the Florida Department of Environmental Protection (FDEP) (CCMP 2008; BMAP 2013). To protect the IRL ecosystem and water resources, it is important to understand the spatial variation and temporal changes of water quality in the IRL caused by natural and anthropogenic factors.

Spatial and temporal variation of water quality in the IRL has been reported in the literature. Based on surface water quality data collected in the period of 1988–1994, Sigua et al. (2000) found that the total nitrogen (TN) concentration was high (1.25 mg/L) in the northern IRL, but lower (0.89 mg/L) in the southern IRL. The spatial pattern of the total phosphorus (TP) concentration was opposite, with lower TP concentration in the northern IRL and higher concentration in the southern IRL. With respect to temporal variation of water quality, Qian et al. (2007a) found that, for the St. Lucie Estuary in the southern IRL area, water quality was better in wet seasons than in dry seasons. Qian et al. (2007b) further studied nutrient trends in the same region and found a significant increase in orthophosphate loadings from 1979 to 2004. For the northern part of the IRL area, Riegl and Foster (2011) evaluated the biomass of drift macroalgae and found that the biomass was 102,162 metric tons in 2010, significantly more than the biomass of 69,859 metric tons evaluated in 2008. These findings indicate that it is necessary to study spatial and temporal variation of water quality in the IRL for water quality management.

This study conducted multivariate statistical analysis and trend analysis for understanding spatial variation and temporal changes of surface water quality in the central IRL area, which includes Brevard and Indian River Counties (Fig. 1). In comparison with the water quality in the south and north IRL areas, less attention has been paid to water quality in the central IRL area. This study is therefore necessary for water quality management of the entire IRL area. The statistical analysis of this study is based on the surface water quality data collected from 1998 through 2013 in the central IRL area. Since these data are more recent than those used in the studies discussed above, the results of this analysis can better reflect spatial variability and temporal change of surface water quality in the study area over a more recent time period.

Fig. 1 Location of twelve monitoring stations in the central Indian River Lagoon area

Based on the compiled data, clustering analysis and principal component analysis were used to evaluate spatial variability of water quality, and trend analysis was used to evaluate temporal variability of water quality. These three statistical techniques have been widely used in water quality studies (Singh et al. 2004; Ouyang 2005; Castro et al. 2017; Pant et al. 2018). Clustering analysis is often used to group water quality data with similar characteristics, which simplifies subsequent statistical analysis such as principal component analysis (Legendre and Legendre 1998; Simeonov et al. 2003). Principal component analysis can reduce the dimensionality of data by selecting a small number of principal components that explain most of the variance of the original data with minimal information loss (Park et al. 2002; Bengraine and Marhaba 2003). Trend analysis detects whether the variables of interest are increasing, decreasing, or have no trend over time (Helsel and Hirsch 1992). A particular feature of this study is that it uses the sequential Mann–Kendall method for analyzing temporal changes of water quality, an application that has not been reported in the literature. The statistical results of this study are expected to be useful for understanding temporal changes of surface water quality in the central IRL area.

Monitoring stations and dataset

The surface water quality data for the central IRL area were downloaded from the STOrage and RETrieval (STORET) data warehouse website (http://www.epa.gov/storet/) of the U.S. Environmental Protection Agency (EPA). The dataset used in this study was selected based on the following two criteria: (1) the data should be recent and from a long monitoring period and (2) the data should be from routine monitoring with as little missing data as possible. The second criterion is necessary for the Mann–Kendall and sequential Mann–Kendall methods, and missing data were filled with mean values of the neighboring data before performing the multivariate statistical analysis and trend analysis. The dataset includes twelve water quality parameters collected monthly for the period from January 1, 1998 to December 31, 2013. The twelve parameters are as follows: nitrite and nitrate (NOx, mg/L), total Kjeldahl nitrogen (TKN, mg/L), total nitrogen (TN, mg/L), total phosphorus (TP, mg/L), total suspended solids (TSS, mg/L), turbidity (NTU), dissolved oxygen (DO, mg/L), pH (pH units), specific conductivity (SC, µS/cm or µmho/cm), salinity (parts per thousand or g/L), color (PCU), and water temperature (T). The data were collected at the twelve monitoring stations shown in Fig. 1, which include eight stations in Brevard County and four in Indian River County. The twelve stations are representative of hydrologic conditions within the IRL, because they are not close to transitional areas such as point source mixing zones and nearshore regions. It should be noted that the monitoring data were collected under routine monitoring schemes, not to capture specific flow or rainfall events, which may significantly affect water quality (Lapointe et al. 2012). In addition, data on biochemical/chemical oxygen demand are not available in the STORET database for the twelve monitoring stations, probably because the water samples used for measuring these parameters are subject to a short holding time of less than 1 day, as explained in Qian et al. (2007a). Since low dissolved oxygen in the lagoon has not been a major issue (except on rare occasions such as the Super Algal Bloom that occurred in 2010–2011), there are only a few sites where biochemical/chemical oxygen demand is occasionally measured (Kroening 2008; Gao 2009; Gao and Rhew 2012).
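
The gap-filling step described above can be sketched as follows. This is a minimal illustration only: it assumes that "neighboring data" means the nearest valid observation before and after a gap in the monthly series, and the function name fill_gaps and the example values are hypothetical rather than taken from the study.

```python
from typing import List, Optional


def fill_gaps(series: List[Optional[float]]) -> List[float]:
    """Fill missing values (None) with the mean of the nearest valid neighbors.

    Assumption: a gap is filled from the closest valid value before it and
    the closest valid value after it. At either end of the record only one
    neighbor exists, so that value is used directly.
    """
    filled = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(value)
            continue
        before = next((v for v in reversed(series[:i]) if v is not None), None)
        after = next((v for v in series[i + 1:] if v is not None), None)
        neighbors = [v for v in (before, after) if v is not None]
        if not neighbors:
            raise ValueError("series contains no valid observations")
        filled.append(sum(neighbors) / len(neighbors))
    return filled


# Hypothetical monthly TN concentrations (mg/L) with one missing month.
monthly_tn = [1.0, None, 1.5, 1.25]
print(fill_gaps(monthly_tn))
```

Filling from the original (unfilled) neighbors, rather than from previously filled values, keeps a long gap from propagating an interpolated value forward.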

Since land uses (e.g., citrus, pasture, urban, natural wetland, row crop, dairy, and golf courses) may be a major factor impacting water quality in the St. Lucie Estuary (Graves et al. 2004), land cover and land use data for Brevard and Indian River Counties were downloaded from the SJRWMD website (http://www.sjrwmd.com/gisdevelopment/docs/themes.html). The data from the website were available for two years, 2000 and 2009. The data for four land use classifications (agriculture and pasture land, built-up land, forest land, and golf course) that could contribute nutrients to the IRL are shown in Fig. 2a, b for 2000 and 2009, respectively. The boxes in Fig. 2 delineate the boundary of Indian River County. The area of Brevard County is 1557 mi² (996,480 acres), about 2.52 times as large as the area of 617 mi² (394,880 acres) for Indian River County. While Indian River County has less built-up land and forest land than Brevard County, it has more agriculture and pasture land, as shown in Fig. 2c. The table in Fig. 2c shows the changes in the four land uses from 2000 to 2009. The area of agricultural land decreased in both counties, while the built-up and golf course land increased. Figure 2a, b shows that most of the increased built-up land (red) was located next to the lagoon, which may have resulted in increased septic tank effluent to the lagoon.
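
The county areas quoted above can be checked with a short calculation. The sketch below uses the standard conversion of 640 acres per square mile; the variable names are illustrative only.

```python
ACRES_PER_SQUARE_MILE = 640  # standard conversion factor

brevard_mi2 = 1557
indian_river_mi2 = 617

brevard_acres = brevard_mi2 * ACRES_PER_SQUARE_MILE
indian_river_acres = indian_river_mi2 * ACRES_PER_SQUARE_MILE
area_ratio = brevard_mi2 / indian_river_mi2

print(brevard_acres, indian_river_acres, round(area_ratio, 2))
```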

Fig. 2 Four categories of major land cover and land use in the study area for (a) year 2000 and (b) year 2009. The bar chart in panel (c) shows the areas (acres) of the four categories, and the table in panel (c) lists the increasing (+) or decreasing (−) areas of the four categories from 2000 to 2009

Methods

This section gives a brief description of the methods of clustering analysis, principal component analysis, and trend analysis. Further details of the methods can be found in the references cited below.

Cluster analysis and principal component analysis

Clustering analysis was applied in this study to separate the twelve monitoring stations into groups with similar water quality characteristics. Clustering analysis is an unsupervised pattern detection technique used to classify objects into clusters (categories) based on similarity of the objects (Vega et al. 1998). The similarity between two objects is typically measured by Euclidean distance. Clustering analysis is a sequential process, starting from the most similar objects and forming the desired clusters gradually so that the association is strong among objects in the same cluster, but weak among objects in different clusters (Otto 1998). A hierarchical agglomerative clustering was performed on the normalized data (standardization by the z-transformation) using Ward’s method (Winderlin et al. 2001; Simeonov et al. 2003) implemented in MATLAB. At each step of the clustering analysis, the method merges the two groups whose union gives the smallest increase in the within-group sum of squared distances to the group centroid. The linkage distance is expressed as Dlink/Dmax × 100, where Dlink is the linkage distance for a particular cluster and Dmax is the maximal linkage distance (Singh et al. 2004). The linkage distance is used to measure the similarity of water quality data at different monitoring stations.
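
The clustering procedure described above can be sketched in a few lines. This is not the MATLAB implementation used in the study; it is a minimal pure-Python illustration under stated assumptions: the linkage distance of a merge is taken as the square root of twice the increase in the within-cluster sum of squares (a common convention for Ward linkage), the function names are hypothetical, and the station values are invented for demonstration.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column (parameter) to zero mean and unit variance."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [mean(dim) for dim in zip(*points)]


def ward_groups(rows, cutoff_percent=60.0):
    """Group rows (stations) by Ward clustering cut at a scaled linkage distance."""
    data = zscore_columns(rows)
    clusters = [[i] for i in range(len(data))]
    merges = []  # (linkage distance, members of cluster a, members of cluster b)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([data[i] for i in clusters[a]])
                cb = centroid([data[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in the within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * math.dist(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merges.append((math.sqrt(2.0 * cost), clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    d_max = max(d for d, _, _ in merges)
    # Keep only the merges whose scaled distance Dlink/Dmax x 100 is below the cutoff.
    label = {i: i for i in range(len(data))}
    for d, left, right in merges:
        if d / d_max * 100.0 < cutoff_percent:
            for i in right:
                label[i] = label[left[0]]
    groups = {}
    for i, lab in label.items():
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())


# Hypothetical station means for two parameters (e.g., TN and TP).
stations = [[0.30, 10.0], [0.32, 11.0], [0.90, 30.0], [0.95, 31.0]]
print(ward_groups(stations, cutoff_percent=60.0))
```

Because Ward linkage distances never decrease from one merge to the next, applying every merge below the cutoff is equivalent to cutting the dendrogram at that scaled distance.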

For the data in each of the groups identified in the clustering analysis, principal component analysis was used to reduce the number of variables. Instead of analyzing all twelve water quality parameters, principal component analysis extracts a smaller number of components without losing important information. Following Winderlin et al. (2001) and Simeonov et al. (2003), the following steps were conducted for the principal component analysis: (1) standardize the water quality data to make them dimensionless; (2) calculate the covariance matrix of the standardized data; (3) find the eigenvalues and the corresponding eigenvectors; and (4) use the Kaiser criterion to choose the principal components based on the eigenvalues, scree plot, and the explained variances. The principal component analysis reduces the dimensionality of the water quality data, since the number of water quality parameters involved in the selected principal components is smaller than twelve, the total number of water quality parameters.
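
The four steps listed above can be illustrated with the following sketch. It is not the implementation used in the study: eigenvalues are obtained here by power iteration with deflation (an assumption made to keep the example free of external libraries), the Kaiser criterion is taken as retaining components with eigenvalue greater than 1, the scree plot is omitted, and the data and function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Steps 1-2: standardize each column, then form the covariance matrix."""
    z = []
    for column in zip(*rows):
        m, s = mean(column), pstdev(column)
        z.append([(x - m) / s for x in column])
    n = len(rows)
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def eigenpairs(matrix, iterations=500):
    """Step 3: eigenvalues and eigenvectors of a symmetric positive
    semidefinite matrix, by power iteration with deflation."""
    m = [row[:] for row in matrix]
    size = len(m)
    pairs = []
    for _component in range(size):
        start = [float(k + 1) for k in range(size)]  # arbitrary start vector
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
        eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before searching for the next.
        for i in range(size):
            for j in range(size):
                m[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def kaiser_components(pairs):
    """Step 4: keep components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return [(value, vector) for value, vector in pairs if value > 1.0]


# Hypothetical observations of three parameters (columns: TN, TKN, DO).
samples = [
    [1.0, 1.1, 5.0], [2.0, 1.9, 3.0], [3.0, 3.2, 6.0],
    [4.0, 3.8, 2.0], [5.0, 5.1, 4.0], [6.0, 5.9, 7.0],
]
pairs = eigenpairs(correlation_matrix(samples))
total = sum(value for value, _ in pairs)
for value, _ in kaiser_components(pairs):
    print(f"retained eigenvalue {value:.3f}, explained variance {value / total:.1%}")
```

For standardized data the eigenvalues sum to the number of parameters, so the explained variance of a component is its eigenvalue divided by that number.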

Mann–Kendall and sequential Mann–Kendall trend analysis

Trend analysis was conducted for the water quality parameters involved in the selected principal components. This study uses the Mann–Kendall test (Mann 1945; Kendall 1975), a nonparametric method that does not require residual models (Libiseller and Grimvall 2002; Kundzewicz and Robson 2004; Zhang et al. 2006). In this study, the Mann–Kendall test was applied to examine whether a trend in the time series of water quality parameters (prioritized by the principal component analysis) was statistically significant at significance levels α = 0.01 (the 99% confidence level) and α = 0.05 (the 95% confidence level). To satisfy the requirement of the Mann–Kendall test that the data be serially independent, the data pre-whitening procedure (von Storch and Navarra 1995) was implemented to remove serial correlation before applying the Mann–Kendall test.
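
A minimal sketch of the Mann–Kendall test with pre-whitening is given below for illustration. It assumes the standard normal approximation with a correction for tied values and a simple lag-1 pre-whitening of the form y[t] = x[t] − r1·x[t−1]; the exact variant used in the study is not specified here, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test."""
    n = len(values)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S under the null hypothesis, corrected for tied groups.
    ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    variance = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p_value


def prewhiten(values):
    """Remove lag-1 serial correlation: y[t] = x[t] - r1 * x[t-1]."""
    n = len(values)
    avg = sum(values) / n
    dev = [v - avg for v in values]
    r1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    return [values[t] - r1 * values[t - 1] for t in range(1, n)]


# Hypothetical annual mean TN concentrations (mg/L).
tn = [1.30, 1.22, 1.25, 1.10, 1.12, 1.01, 0.95, 0.98, 0.90, 0.84]
s, z, p = mann_kendall(prewhiten(tn))
for alpha in (0.05, 0.01):
    verdict = "significant" if p < alpha else "not significant"
    print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}: {verdict} at alpha = {alpha}")
```

A negative S indicates a decreasing tendency and a positive S an increasing one; the trend is reported as significant only when the p-value falls below the chosen α.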

After finding the statistically significant trends, abrupt trend changes were identified by the sequential Mann–Kendall method, which has been widely used to analyze hydrometeorological time series (Douglas et al. 2000; Modarres and Sarhadi 2009; Tabari et al. 2010; Sayemuzzaman and Jha 2014; Sayemuzzaman et al. 2014a, b, 2015). It is a sequential procedure of progressive and backward analyses of the Mann–Kendall test. If the two series (progressive and backward) cross each other, the year of crossing represents the year of trend change. If the two series cross and then diverge from each other for a longer period of time, the year in which the divergence begins indicates an abrupt trend change (Modarres and Sarhadi 2009). Details of implementing the sequential Mann–Kendall test are given in Sayemuzzaman and Jha (2014).
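
The progressive and backward series can be sketched as follows. This illustration assumes one commonly used formulation, in which the progressive statistic u(t) is built from the count of earlier values smaller than each observation and the backward series is obtained by applying the same calculation to the reversed record and changing its sign; conventions differ among authors, so the implementation in Sayemuzzaman and Jha (2014) may differ in detail. The function names and data are hypothetical.

```python
import math


def progressive_u(values):
    """Progressive sequential Mann-Kendall statistic u(t) for each time step."""
    u = [0.0]  # the statistic is undefined for a single point; use 0 by convention
    t = 0
    for k in range(1, len(values)):
        # Number of earlier observations smaller than the current one.
        t += sum(1 for j in range(k) if values[j] < values[k])
        size = k + 1
        expected = size * (size - 1) / 4.0
        variance = size * (size - 1) * (2 * size + 5) / 72.0
        u.append((t - expected) / math.sqrt(variance))
    return u


def sequential_mk(values):
    """Return the progressive series and the backward series on the same time axis."""
    forward = progressive_u(values)
    backward = [-u for u in progressive_u(values[::-1])][::-1]
    return forward, backward


def crossing_indices(forward, backward):
    """Indices k at which the two series cross between steps k and k + 1."""
    diff = [f - b for f, b in zip(forward, backward)]
    return [k for k in range(len(diff) - 1) if diff[k] * diff[k + 1] < 0]


# Hypothetical annual salinity values with an upward shift midway through the record.
salinity = [24.1, 23.8, 24.5, 24.0, 23.9, 27.2, 28.0, 27.5, 28.4, 28.1]
forward, backward = sequential_mk(salinity)
print(crossing_indices(forward, backward))
```

A crossing index k corresponds to a change between observation k and observation k + 1; whether it marks an abrupt change is then judged from how the two series diverge afterwards.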

Results and discussion

This section describes the results of the clustering analysis, principal component analysis, and trend analysis conducted for the water quality data.

Clusters of monitoring stations

The clustering analysis was used to determine the clusters of the twelve monitoring stations for analyzing the spatial variability of the water quality data in the study area. Figure 3 shows the dendrogram for the twelve monitoring stations. Based on the criterion of (Dlink/Dmax) × 100 < 60%, the twelve stations can be grouped into four clusters. Cluster 1, denoted as C1, includes stations IRL102, IRL107, IRL110, IRL113, IRL115, and IRL118, which are the six stations in the north of the study area (Fig. 1). The second cluster (C2) has only one station (IRLHUS), and the third cluster (C3) is also associated with a single station (Crane Creek). The last cluster (C4) has four stations, IRLVNC, IRLVMC, IRLVSC, and C-25 Upstream, located in the southern end of the study area (Fig. 1).

Fig. 3 Dendrogram of clustering analysis for the water quality data at twelve monitoring stations in the study area. The stations are numbered 1–12 from north to south. The actual station names shown in Fig. 1 are included in the parentheses

The cluster identification is reasonable in terms of the station locations. All the C1 stations are located in Brevard County, in lagoon water bounded by landmasses on both the east and west sides of the lagoon. All the C4 stations are located in Indian River County, in the inland area near the lagoon, and are surrounded by densely populated areas. The different population densities and agricultural land uses of the two counties may contribute to the different water quality of the two clusters. This is consistent with the finding of Graves et al. (2004), who compared the effects of dominant land use types on water quality in the IRL watershed and concluded that runoff from agricultural and urban land use yielded greater nutrient concentrations than wetland runoff. The IRLHUS station of cluster C2 is located downstream of Horse Creek, and its water quality may be affected by the creek runoff. The Crane Creek station of cluster C3 is located on Crane Creek (1.8 miles inland from the lagoon), and its water quality may be influenced by golf course runoff (Fig. 1).

Important water quality parameters and their spatial variability

Since water quality data are similar within each cluster, principal component analysis was conducted for the data of the individual clusters to select important water quality parameters. Subsequently, spatial variability of the important water quality parameters was investigated. Table 1 lists, as an example, the principal components obtained by applying principal component analysis to the data of cluster C1. The five components explain about 75% of the total variance of the cluster data. The important water quality variables in each principal component were identified based on the component loadings obtained after the coordinate transformation during the principal component analysis, and the identified water quality parameters are highlighted in Table 1. The identification is based on the work of Liu et al. (2003), Ouyang (2005), and Singh et al. (2004). They classified the principal component loadings as “strong” (absolute loading value > 0.75), “moderate” (absolute loading value from 0.50 to 0.75), and “weak” (absolute loading value from 0.30 to 0.50). Only the parameters with strong and moderate loadings are selected.
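
The loading classification described above can be expressed as a short sketch. The thresholds follow the text; the label "negligible" for absolute loadings below 0.30, the function names, and the example loadings are assumptions introduced here for illustration and are not values from Table 1.

```python
def classify_loading(loading):
    """Classify a principal component loading by its absolute value."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"  # assumed label; the cited scheme does not name this range


def important_parameters(loadings):
    """Select parameters whose loading is strong or moderate."""
    return [
        name for name, value in loadings.items()
        if classify_loading(value) in ("strong", "moderate")
    ]


# Hypothetical loadings of one principal component.
pc1 = {"TKN": 0.91, "TN": 0.88, "TP": 0.62, "DO": -0.41, "pH": 0.12}
print(important_parameters(pc1))
```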

Table 1 Component loadings obtained from principal component analysis for water quality data of cluster C1

Table 2 summarizes the important water quality parameters identified for all four clusters. The table shows that nutrients (nitrogen and phosphorus) are the most important water quality parameters, because TKN, TN, and TP are found in the first principal component for all the clusters. To evaluate the spatial variability of the important water quality parameters, the concentrations of TKN, TN, and TP at the twelve stations are plotted in Fig. 4 as box-and-whisker plots. For convenience of evaluation, Fig. 4 also plots the concentrations of NOx, which was identified as an important variable for cluster C4. The figure shows that the NOx and TP data have substantial spatial variability. Figure 4a shows that the six stations (IRL102, IRL107, IRL110, IRL113, IRL115, and IRL118) of cluster C1 in the north of the study area have substantially lower NOx concentrations than the other six stations. This may be attributed to the lower urban density in Brevard County than in Indian River County (Fig. 2), assuming that areas with a lower density of urban development contain fewer nutrient sources. Figure 4d shows that the spatial pattern of TP concentration is similar to that of NOx, which may be attributed to the larger area of agricultural land in Indian River County than in Brevard County (Fig. 2). The TKN and TN data plotted in Fig. 4b, c do not exhibit substantial spatial variability. Given that the median TKN and TN concentrations are higher than the median NOx concentrations, organic nitrogen and ammonium may contribute more than NOx to nitrogen in the study area. The spatial variability of organic nitrogen and ammonium is also smaller than that of NOx in the study area.

Table 2 Important water quality parameters for the four cluster groups
Fig. 4 Box-and-whisker plots for the concentrations of (a) NOx, (b) TKN, (c) TN, and (d) TP at the twelve monitoring stations

Table 2 indicates that principal components PC3–PC5 select dissolved oxygen (DO), turbidity, total suspended solids (TSS), pH, and water temperature as important parameters. The importance of turbidity and TSS in clusters C3 and C4 may be attributed to runoff of sediment due to the larger area of developed land in the corresponding portion of the study area, considering that developed areas have less vegetation and more exposed soil than undeveloped areas. However, Fig. 5a does not show large spatial variability for turbidity, because the median values of all the stations are similar. In contrast, Fig. 5b shows that TSS is substantially smaller at the last five stations (of clusters C3 and C4) located in Indian River County than at the first seven stations (of clusters C1 and C2) located in Brevard County. The selection of DO and temperature from the principal components PC3 and PC4 may be explained by the correlation between DO and temperature, because warm and cold water tend to correlate with low and high DO, respectively. Although the principal component analysis shows that DO, turbidity, TSS, pH, and water temperature are less important than nutrients, specific conductivity, and salinity, all the variables are important to the lagoon ecology. Therefore, the Mann–Kendall trend analysis was applied to all the water quality parameters identified by the principal component analysis.

Fig. 5 Box-and-whisker plots for the concentrations of (a) turbidity and (b) TSS at the twelve monitoring stations

Trend analyses for important water quality parameters

The Mann–Kendall and sequential Mann–Kendall trend analyses were applied to the water quality parameters listed in Table 2. When applying the trend analysis to the 16 years of data, a 12-month moving average was used to reduce the impact of extreme values within the dataset. The pre-whitening process discussed above was applied to obtain serially independent data for the trend analysis. Table 3 lists the results of the trend analysis for the individual stations. The thick upward arrows in red (downward arrows in blue) indicate statistically significant increasing (decreasing) trends at the 10% significance level; the thin arrows denote statistically insignificant trends. The numbers listed in Table 3 are the years of abrupt trend change detected by the sequential Mann–Kendall test. For the purpose of demonstration, Fig. 6 shows the results of the sequential Mann–Kendall test for NOx at the Crane Creek station and for TP at the IRLHUS station.
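
The moving-average smoothing mentioned above can be sketched as follows. The sketch assumes an uncentered window that uses only complete 12-month spans; whether the study used a centered or trailing window is not stated, and the function name is hypothetical.

```python
def moving_average(values, window=12):
    """Average over each complete run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# A 16-year monthly record has 192 values; 12-month smoothing leaves 181.
monthly = [float(month % 12) for month in range(192)]
smoothed = moving_average(monthly, window=12)
print(len(smoothed))
```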

Table 3 Summary of trend analysis results
Fig. 6 Forward and backward series of sequential Mann–Kendall trend analysis applied to (a) NOx concentrations at the Crane Creek station and (b) TP concentrations at the IRLHUS station

Table 3 shows that the nutrient water quality parameters (NOx, TN, TKN, and TP) have significant decreasing trends at most of the stations, except that NOx and TP have substantial increasing trends at several stations. The decreasing trends may be attributed to three factors. The first is the decreased fertilizer usage, which is shown in Fig. 7 based on the fertilizer usage data collected from the website of the Florida Department of Agriculture and Consumer Services (http://www.freshfromflorida.com/Divisions-Offices/Agricultural-Environmental-services/Business-Services/Fertilizer/Fertilizer-Manufacturers/Fertilizer-Consumption-Tonnage-Data/Archive-Fertilizer-Tonnage-Data, valid as of 10/30/2016). The second factor that may explain the decreasing trends is the county-wide nutrient management activities, such as muck removal, storm water improvement, and wetland restoration (WQ Report 2008). The third factor that may contribute to the decreasing trends is the drought condition in the study area that resulted in lower-than-average freshwater flow into the lagoon, which decreased the frequency and magnitude of watershed flushing and hence decreased the nutrient load to surface water (Schindler et al. 1996). The drought effect is also reflected in the trends of salinity and specific conductivity, as discussed below. The increasing trend of NOx at stations Crane Creek and IRLVNC may be explained by their close proximity (less than one mile) to the golf courses shown in Fig. 1, considering that golf courses may be sources of nitrogen fertilizer. The increasing trend of NOx at stations IRLVNC and IRLVMC may be attributed to nearby urban sources of nitrogen such as lawn fertilizer and/or septic systems. Figure 2c shows that the built-up land increased from 2000 to 2009 in Indian River County.

Fig. 7 Fertilizer consumption (tons) from 1999 to 2012 in Brevard County (BC) and Indian River County (IRC)

Table 3 shows that salinity and specific conductivity increased significantly in 2006 at the first seven stations. This is attributed to a severe drought in 2006, during which freshwater flows into the lagoon were reduced and evaporation in the lagoon was enhanced (Murdoch et al. 2000). Figure 8 plots salinity and specific conductivity at the first seven stations listed in Table 3 together with the historical drought data (the Palmer Drought Severity Index) downloaded from the NOAA website http://www7.ncdc.noaa.gov/CDO/CDODivisionalSelect.jsp (valid as of 10/30/2016). To plot the three variables in one figure, the data were normalized and smoothed with a moving-window average with a window size of 4 months. The correlation coefficient (r1) between drought and salinity ranges from −0.30 to −0.49, the correlation coefficient (r2) between drought and specific conductivity ranges from −0.30 to −0.51, and the correlation coefficient (r3) between salinity and specific conductivity ranges from 0.97 to 0.99. The negative correlations suggest that salinity and specific conductivity increased as the drought became more severe. The correlation is sufficiently significant to conclude that the increasing trends of salinity and specific conductivity are caused by the drought condition in the study area.
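
The normalization and correlation calculation described above can be sketched as follows. The sketch assumes that normalization means conversion to z-scores and that r is the Pearson correlation coefficient; the drought index and salinity values are invented for illustration and are not the study data.

```python
import math


def zscore(values):
    """Normalize a series to zero mean and unit (population) standard deviation."""
    avg = sum(values) / len(values)
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))
    return [(v - avg) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical drought index (more negative = drier) and salinity (ppt).
drought_index = [-0.5, -1.8, -3.2, -2.6, -1.0, 0.4]
salinity = [24.0, 27.0, 31.0, 30.0, 26.0, 23.0]
r1 = pearson_r(drought_index, salinity)
print(f"r1 = {r1:.2f}")
```

Because the Palmer Drought Severity Index becomes more negative as drought intensifies, a negative r1 corresponds to higher salinity under drier conditions.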

Fig. 8 Time series of normalized drought index, salinity, and specific conductivity based on the moving-window average of 4 months. The correlation coefficients r1, r2, and r3 are between drought index and salinity, between drought and specific conductivity, and between salinity and specific conductivity, respectively

Table 3 shows significant increasing trends of DO at different stations from north to south of the study area. This is related to the decreasing trends of nutrients, which can improve the DO in the lagoon. The pH trends increase significantly at all the stations, although the trend changes occurred in different years. The TSS trends are found to decrease significantly at all the stations except IRLHUS. This may be attributed to best management practices (BMPs), such as urban and agricultural BMPs, that decreased the load of suspended solids into the lagoon (Gao and Rhew 2012). Although the trends of the important water quality parameters are not yet fully understood, the trend analysis provides a quantitative tool for analyzing temporal changes of the water quality parameters.

Summary and conclusions

Multivariate statistical analysis and trend analysis were conducted in this study to analyze and interpret spatial variability and temporal change of surface water quality in the central IRL area. Since the data used in this study are more recent than those used in previous studies, the results of this analysis can help better manage water quality in the study area. Using the cluster analysis, the data collected from the twelve monitoring stations were clustered into four groups: C1 (IRL102, IRL107, IRL110, IRL113, IRL115, and IRL118) located in the northern part of the study area, C2 (IRLHUS), C3 (Crane Creek), and C4 (IRLVNC, IRLVMC, IRLVSC, and C-25 Upstream) located in the southern part of the study area. For the four groups, the water quality parameters involved in the first five principal components were identified as important parameters for water quality in the study area. These parameters are nutrient species (nitrogen and phosphorus), physicochemical parameters (salinity, specific conductivity, pH, DO), and erosion factors (total suspended solids), which are well-known water quality indicators. The concentrations of TKN, TN, and TP were found to be important at all the monitoring stations in the study area. These parameters may be associated with nutrient sources in urban areas close to the lagoon. NOx concentration is an important water quality parameter at the monitoring stations located in the southern part of the study area, and this may be attributed to urban sources of nutrients in Indian River County.

Statistically significant trends and abrupt trend shifts were detected using the Mann–Kendall and sequential Mann–Kendall trend analyses, respectively. Significant trends and trend shifts were identified for the important water quality parameters. The nutrient water quality parameters (NOx, TN, TKN, and TP) have significant decreasing trends at most of the stations, except that NOx and TP have substantial increasing trends at several stations. Significant increasing trends were detected for specific conductivity and salinity at seven monitoring stations located in Brevard County after 2006, which is attributed to the drought conditions in 2006. Drought (especially warmer and drier conditions) reduces the amount of water flowing to the lagoon, enhances the evaporative loss of fresh water, and subsequently increases the salinity in surface waters and watershed soils. Improvement in water quality in terms of decreasing nutrient concentrations was observed, which may have resulted from the implementation of lagoon restoration projects. The understanding of water quality for the central IRL area obtained in this study can be utilized for water quality improvement.