Introduction

Biological indicators are currently standard for assessment of aquatic ecosystems (Simon, 2003; Cao & Hawkins, 2005; Palmer et al., 2005). Indexes of Biotic Integrity (IBI) use fish, invertebrate, algal, bird, and other community data to calculate metrics for use in assessment of environmental degradation, habitat quality, and ecosystem change across time or space (Karr, 1981; Karr et al., 1986; Simon, 2003). For example, these indices are widely used to compare spatial variation that brackets pollution input locations, or temporal variation at a single site before and after the initiation or cessation of pollution control efforts (Yoder & Smith, 1999; McCormick et al., 2001). Since their original introduction by Karr (1981), IBIs for specific watersheds have been regularly developed, refined, and improved with increased usage, understanding, and application (Simon, 2003).

A large body of literature is devoted to effectively sampling fish assemblages for assessment based on sampling frequency (Paller, 1995; Meador, 2005), sampling distance (Angermeier & Karr, 1986; Lyons, 1992; Dauwalter & Pert, 2003), gear type (Meador & McIntyre, 2003), and capture efficiency (Bayley & Austen, 2002). However, fewer studies have addressed within-year temporal variation of IBI scores (Angermeier & Karr, 1986; Karr et al., 1987; Fore et al., 1994). The inherent temporal variation in fish assemblage structure in lentic and lotic ecosystems is well-known (Matthews, 1998). The relative contributions of various sources of this variation (random sampling variation, hydrologic variation, and weather variation) are not well understood.

The IBI sampling protocols were designed to sample abundance of taxa using specific protocols or gear types across sites or years using a single sample per year (Angermeier & Karr, 1986). When multiple samples are obtained for the same site, some approaches have allowed for averaging multiple IBI scores for the same site over seasons or years (e.g., Pyron et al., 2004). Other protocols require sample collection only during a specific window of time. For those protocols using a single sample per site with no time constraints to compute IBI scores, there is reason to wonder whether this provides an adequate basis for characterizing a stream reach, in light of pervasive spatial and temporal variation in aquatic ecosystems. Fore et al. (1994) examined the statistical properties of the IBI and concluded that the approach is valid. They found that degraded sites had higher within-year temporal variability of IBI scores than more pristine sites. Fore et al. (1994) also determined that sampling during the late summer/early fall season decreased year-to-year variability among samples taken at the same sites. While significant spatial and temporal variation of IBI scores was identified by Karr et al. (1987) for two small Midwestern streams, water quality classifications based on these IBI values did not significantly change over time. Angermeier & Karr (1986) concluded that samples collected later in the summer resulted in biased IBI values due to the presence of young-of-the-year fishes (young-of-the-year are excluded in recent IBIs).

Assemblage variation that results in similar assemblages over distance produces spatial autocorrelation patterns. Spatial autocorrelation is common in ecological variables and can occur when assemblages are similar over some distance (Legendre, 1993; Cooper et al., 1997). In riverine ecosystems, significant spatial autocorrelation can result from individuals moving freely among reaches (Lloyd et al., 2006), or be caused by greater similarity of site conditions among adjacent sites than among distant sites (Legendre, 1993). Although ecologists identify the presence of spatial autocorrelation (Cooper et al., 1997), few lotic studies explicitly test for spatial autocorrelation.

The majority of published assessments of the effect of sampling frequency on IBI scores and subsequent biotic integrity interpretations have been based on data from small- to medium-size streams. Fewer studies have reported on the application of the IBI technique to large rivers (Lyons et al., 2001; Emery et al., 2003; Rinne et al., 2005). The objectives of this study were to use a long-term database for fish assemblages in a large temperate river (Wabash River, Indiana) to determine: (1) if the Wabash River IBI scores based on samples obtained during two summer months are significantly different, (2) the presence and direction of temporal changes in IBI scores at individual sites over the 25-year period of record, (3) if the river distance–IBI relationship changed with time, (4) whether within-year temporal variation in IBI is greater at lower quality sites than at higher quality sites, (5) if spatial autocorrelation is present for IBI scores, and (6) if temporal autocorrelation is present for IBI scores.

Methods

The Wabash River originates in west central Ohio and flows west across Indiana where it eventually forms the border between Indiana and Illinois before its confluence with the Ohio River. It has a watershed area of 85,340 km² (Benke & Cushing, 2005) and lower portions of the river have been classified as a Great River (Simon & Sanders, 1999). Human influences on the Wabash River have been predominantly from agricultural activities as >60% of the basin is row-crop farmland (Gammon, 1998). There is one mainstem impoundment (Huntington Lake, river km 662), although numerous impoundments are present on tributaries. Sampling sites were located from river km 529 in Carroll County downstream to river km 272 in Sullivan County (Fig. 1).

Fig. 1 Wabash River watershed. Collection sites were on the mainstem between the open circles

Our analyses were based on annual collections by Gammon (1998) for the years 1974–1998. From this 25-year database, we selected only those sites and years for which two samples were obtained from the same reach. Specific summer sampling dates were constrained by river discharge limitations and varied somewhat. For most years, the two samples were obtained in June and July, or July and August. In several years, the second collection was in September. On the basis of these criteria, a total of 857 site-year collections among 24 years were selected from the database for use in these analyses. At each site, fishes were sampled in a single pass along a 500-m reach located on the outer river bends. Gammon (1998) demonstrated that species richness reached an asymptote at a site length of 500 m. All fishes were collected by boat electrofisher (Smith-Root 5.0 GPP) using DC voltage (600 V, 6–8 A) and one netter using a dip net with 6.25-mm² mesh (Gammon & Simon, 2000). Sampling time (the time during which power was applied to the electrodes) at individual sites averaged 600–1000 s. Individual fish were identified to species, measured (mm), weighed (g), evaluated for DELT (deformities, erosions, lesions, and tumors) anomalies (Simon & Emery, 1995), and released.

We calculated IBI scores for each of the two samples obtained per site per year using the methods of Simon & Emery (1995), modified for the Wabash River by Gammon & Simon (2000). A total of 12 metrics (four species richness and composition metrics, two species tolerance metrics, three trophic composition metrics, one reproductive guild metric, one abundance, and one condition metric) were scored into three categories (1, 3, 5) and summed to provide a maximum score of 60 (Table 1).
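The summation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `ibi_score` and the example metric scores are hypothetical, the metric-level scoring criteria of Table 1 are not reproduced, and any special scoring rules the actual protocol may apply (for example, for very small catches) are ignored.

```python
# Minimal sketch: sum 12 metric scores, each scored 1, 3, or 5.
# Names and example values are hypothetical, not taken from the study data.
VALID_SCORES = {1, 3, 5}
N_METRICS = 12


def ibi_score(metric_scores):
    """Return the IBI as the sum of 12 metric scores (each 1, 3, or 5)."""
    if len(metric_scores) != N_METRICS:
        raise ValueError(f"expected {N_METRICS} metric scores")
    if any(score not in VALID_SCORES for score in metric_scores):
        raise ValueError("each metric score must be 1, 3, or 5")
    return sum(metric_scores)


# Hypothetical metric scores for one sample at one site.
example_scores = [5, 3, 3, 1, 3, 5, 1, 3, 3, 1, 5, 3]
example_ibi = ibi_score(example_scores)
```

A sample scoring 5 on every metric reaches the maximum of 60 stated in the text.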

Table 1 Index of biotic integrity scoring criteria for the Wabash River, from Gammon & Simon (2000)

Differences among monthly samples

We used a paired t-test to determine whether IBI scores differed significantly between the two samples obtained in the same year at each site. Although individual IBI scores were not normally distributed, the sample size was sufficiently large that the sampling distribution of the mean of the paired differences can be assumed normal under the central limit theorem.
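As a rough illustration of the paired comparison, the sketch below computes the t statistic from the per-site differences and, in line with the large-sample argument above, approximates the two-sided P-value with a standard normal distribution rather than the exact t distribution. The function names and the six pairs of scores are hypothetical; with so few pairs the normal approximation is for demonstration only.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sided_p_normal(t):
    """Two-sided P-value for t under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def paired_t(first, second):
    """Return (t, approximate P) for paired samples of equal length."""
    if len(first) != len(second):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    t = mean(diffs) / standard_error
    return t, two_sided_p_normal(t)


# Hypothetical IBI scores for six sites, first and second sample of a year.
first_sample = [30, 28, 26, 34, 22, 40]
second_sample = [28, 30, 26, 32, 24, 38]
t_stat, p_value = paired_t(first_sample, second_sample)
```

Under the same approximation, the t = 0.71 reported in the Results corresponds to a two-sided P of roughly 0.48.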

Spatial and temporal autocorrelation

Standard statistical tests of significance require independent observations, an assumption that can be violated by spatial or temporal autocorrelation in the data. The database used in this study includes data collected over multiple years at sites in close proximity. Hence, there was reason to be concerned about the presence of autocorrelation. We tested for spatial and temporal autocorrelation of IBI scores using Mantel tests (Wilkinson & Edds, 2001) in PC-ORD software (McCune & Mefford, 1999). In this implementation of the Mantel test, two difference matrices are calculated and evaluated for concordance between change in Time or Distance and change in IBI score. For temporal autocorrelation tests, we used sites that had at least 11 years of data.
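The general idea of a Mantel test can be sketched as follows. This is not the PC-ORD implementation, whose details are not given here; it is a generic permutation version that correlates the pairwise differences in river km (or year) with the pairwise differences in IBI score and permutes the site labels of one matrix to build a null distribution. All names, the one-tailed comparison, and the example values are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difference_matrix(values):
    """Matrix of absolute pairwise differences."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns together."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(x_values, y_values, n_perm=999, seed=0):
    """Return (Mantel r, one-tailed permutation P) for two variables."""
    dx, dy = difference_matrix(x_values), difference_matrix(y_values)
    identity = list(range(len(x_values)))
    fixed = upper_triangle(dx, identity)
    r_obs = pearson(fixed, upper_triangle(dy, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute site labels of the second matrix
        if pearson(fixed, upper_triangle(dy, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical river locations (km) and IBI scores for seven sites in one year.
river_km = [272, 300, 340, 400, 450, 500, 529]
ibi = [20, 22, 24, 28, 30, 36, 38]
mantel_r, mantel_p = mantel(river_km, ibi)
```

For the temporal version, `x_values` would hold collection years at a single site instead of river locations.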

Change in the river distance relationship with IBI score over time

We calculated Spearman rank correlation coefficients for river location and IBI score for each collection year with more than nine collection sites in a given year. These calculations were performed using data only for the first monthly collections (typically June). Interpretation of P-values for correlations between IBI values and space (river distance) was problematic for samples taken at sites that are in close proximity. The Mantel spatial autocorrelation analysis described above was used to evaluate whether or not autocorrelation impacted this analysis of change in IBI with river distance.
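A minimal sketch of the per-year calculation is given below, assuming records of the form (year, river km, IBI score) taken from the first collection of each year. The record layout, function names, and example values are hypothetical. Years with nine or fewer sites are skipped, matching the "more than nine collection sites" criterion, and tied values receive average ranks.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def spearman_by_year(records, min_sites=10):
    """Map year -> Spearman rho of river km vs IBI, for years with enough sites."""
    by_year = defaultdict(list)
    for year, km, score in records:
        by_year[year].append((km, score))
    result = {}
    for year, pairs in sorted(by_year.items()):
        if len(pairs) >= min_sites:
            kms, scores = zip(*pairs)
            result[year] = spearman(kms, scores)
    return result


# Hypothetical records: ten sites in 1980, only three sites in 1981.
records = [(1980, km, s) for km, s in zip(range(280, 530, 25), range(20, 40, 2))]
records += [(1981, 300, 25), (1981, 400, 30), (1981, 500, 28)]
rho_by_year = spearman_by_year(records)
```

The same `spearman` function applies to the per-site analysis of IBI score against year by grouping the records by site instead of by year.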

Temporal change in IBI score at individual sites

Spearman rank correlation coefficients were calculated between year of the collection and IBI score using the first, or only collection in that year (typically June) at each of the study sites that had at least nine years of data. The purpose of this analysis was to assess if IBI scores changed in a systematic manner over the duration of the period of sampling, and to determine if the strength and directionality of temporal change varied systematically among different locations (river km). Interpretation of P-values for correlations between IBI values and time was problematic for samples taken at a single site over multiple years when temporal autocorrelation was present. The Mantel temporal autocorrelation analysis described above was used to evaluate whether or not autocorrelation was an issue. It is possible, given the large number of sites and associated correlations, that some of these individual site correlations were spurious. However, the purpose of these analyses was to look for a large-scale pattern of temporal variation that was well replicated across multiple sites.

Association between mean IBI and variation in IBI

To assess whether lower quality sites had greater month-to-month temporal variation than higher quality sites, Pearson’s correlation analysis (one-tailed test of significance) was used to determine if there was a negative association between the mean of three monthly IBI scores for a single year and the range (maximum–minimum), the standard deviation (SD), or the coefficient of variation (CV) of the three scores. The dataset for which three samples per year were available was smaller, thus these calculations were performed on a reduced sample size (n = 108 site-years). A re-sampling distribution was created by shuffling IBI scores 1000 times for the three months and recalculating the correlation coefficient between mean IBI score and range, and mean IBI score and CV for sites. This estimated the probability of obtaining the observed Pearson’s r due only to random variation. We acknowledge that spatial correlation between mean IBI and IBI variation within these 108 site-years of data makes interpretation of the P-value for Pearson’s r problematic. However, we are not aware of any means to address this issue, so we leave it to the reader to interpret whether or not the results presented provide sufficient evidence of an association between mean IBI and within-year variability in IBI scores.
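The re-sampling step can be sketched as below. The text does not specify exactly how scores were shuffled, so this sketch makes an explicit assumption: all monthly IBI scores are pooled, shuffled, and regrouped into triplets, and the correlation between triplet mean and triplet range is recomputed each time. The one-tailed P-value is the fraction of shuffles with a correlation at least as negative as the observed one. Function names and the example triplets are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_and_range(triplets):
    """Per-triplet mean and range (maximum minus minimum)."""
    means = [sum(t) / len(t) for t in triplets]
    ranges = [max(t) - min(t) for t in triplets]
    return means, ranges


def shuffle_test(triplets, n_shuffles=1000, seed=0):
    """Return (observed r, one-tailed P) for mean IBI vs range of IBI."""
    r_obs = pearson(*mean_and_range(triplets))
    pooled = [score for t in triplets for score in t]
    rng = random.Random(seed)
    as_negative = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # assumption: scores are shuffled across all site-years
        shuffled = [pooled[i:i + 3] for i in range(0, len(pooled), 3)]
        if pearson(*mean_and_range(shuffled)) <= r_obs:
            as_negative += 1
    return r_obs, as_negative / n_shuffles


# Hypothetical site-years in which low-scoring sites vary more among months.
site_years = [
    (14, 24, 19), (16, 26, 22), (18, 28, 20), (22, 30, 26),
    (24, 30, 28), (28, 34, 30), (30, 36, 32), (34, 38, 36),
    (36, 40, 38), (40, 42, 41), (42, 44, 44), (46, 48, 47),
]
r_observed, p_one_tailed = shuffle_test(site_years)
```

Replacing the range with the standard deviation or coefficient of variation in `mean_and_range` gives the corresponding tests for the other two measures of variation.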

Results

During the 1974–1998 period of the study, 70 sites were sampled two times per year (Gammon, 1998; Table 2). These collections resulted in 80,374 individuals and 104 species and were used to generate IBI scores. There was a pattern of increasing IBI score with upstream distance (Fig. 2; r = 0.40, P < 0.001) and decreasing IBI score with year (Fig. 3; r = −0.28, P < 0.001).

Table 2 Years when two collections per site were made, the number of sites that were sampled, and location of sites (furthest downstream site to furthest upstream site in each year)

Fig. 2 IBI score and river location of sites for the first collections (June usually). Many of the points are overlapping and not visible

Fig. 3 IBI score and year of collection for the first collections (June usually). Many of the points are overlapping and not visible

Differences among monthly samples

Mean IBI score was 27.5 for the first sample and 27.3 (SD = 8) for the second sample, and IBI scores ranged from 11 to 49. A paired t-test found no significant difference in mean IBI between the first and second samples (t = 0.71, P = 0.48, n = 678).

Spatial and temporal autocorrelation

Significant spatial autocorrelation was detected in several years and the mean Mantel r-value was 0.18 (Fig. 4). In addition, spatial autocorrelation decreased in recent collections (r = 0.46, P = 0.007). The mean Mantel r-value was 0.09 for temporal autocorrelation. Temporal autocorrelation increased with upstream river location (Fig. 4, r = 0.60, P = 0.05). At upstream sites, samples that were made at similar times resulted in more similar IBI scores. Downstream sites that had collections at similar times were less likely to result in similar IBI scores.

Fig. 4 Results of Mantel tests for spatial autocorrelation by year (top). The regression of r-values with year explained 28.2% of variation (P = 0.05). Results of Mantel tests for temporal autocorrelation by site (bottom). The regression of r-values with river location explained 18.5% of variation (P = 0.007)

Change in the river distance relationship with IBI score over time

Spearman rank correlation coefficients for river location and IBI score ranged from −0.30 to 0.80 and had a mean of 0.4 (Fig. 5). Upstream sites had higher IBI scores than downstream sites during the majority of collection years. One of the collection years (1983) resulted in a strong negative correlation coefficient: upstream sites had lower IBI scores than downstream sites. The low scores at two upstream sites in 1983 were due to collections of only one and two individual fish per site.

Fig. 5 Spearman rank correlations of IBI scores and site location, by year

Temporal change in IBI score at individual sites

Spearman rank correlations for year of collection and IBI score ranged from −0.80 to 0.50 and had a mean of −0.32 (Fig. 6): IBI scores decreased with time. A one-sample t-test indicated that the mean correlation differed significantly from zero (t = −8.6, P < 0.001, n = 25). IBI scores decreased at the vast majority of sites, and increased only at downstream sites (Fig. 6).

Fig. 6 Spearman rank correlations of IBI scores and year, at individual sites. The line at zero correlation demonstrates that the few positive correlations were at downstream locations

Association between mean IBI and variation in IBI

A total of 108 site-year collections were available in which three collections were made in each year. The correlation coefficient for IBI mean and range was −0.06 (P = 0.99, Fig. 7). The correlation coefficient for IBI mean and SD was −0.05 (P = 0.59, Fig. 7). The correlation coefficient for IBI mean and CV was −0.46 (P < 0.001, Fig. 7). The relationships of mean IBI with range and with SD indicated no association between mean IBI and variation. We interpret the significant relationship between mean IBI and the CV of IBI as an artifact of the calculation: mean IBI is the denominator of the CV, so the CV tends to decrease as mean IBI increases even when the SD does not change.
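The denominator effect can be illustrated with synthetic numbers. The sketch below is not based on the Wabash River data: it draws hypothetical means and standard deviations independently of each other, so any correlation between the mean and the CV arises only because the CV is the SD divided by the mean. The distributions, sample size, and function names are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_cv_artifact(n=2000, seed=1):
    """Return (r of mean vs SD, r of mean vs CV) for independent mean and SD."""
    rng = random.Random(seed)
    means = [rng.uniform(15, 45) for _ in range(n)]  # hypothetical mean IBI
    sds = [rng.uniform(1, 5) for _ in range(n)]      # SD drawn independently
    cvs = [sd / m for sd, m in zip(sds, means)]      # CV = SD / mean
    return pearson(means, sds), pearson(means, cvs)


r_mean_sd, r_mean_cv = simulate_cv_artifact()
```

In this synthetic setting the mean is essentially uncorrelated with the SD but clearly negatively correlated with the CV, which is the pattern the interpretation above describes; it does not by itself show that the observed correlation is entirely an artifact.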

Fig. 7 The relationship for mean IBI score and three measures of variation: range of IBI score (top), standard deviation of IBI (middle), and coefficient of variation of IBI (bottom). IBI was calculated from three samples at the same sites (n = 108). The top plot includes year of sample, the middle plot includes river km of sample

Discussion

Our results show that a single sample per year is adequate for characterizing fish assemblages of a large river such as the Wabash River using IBI scores from boat electrofishing collections; little additional information or different assessment is obtained when multiple samples are obtained in a single summer. These results further validate the robustness of a multimetric index against random sampling variation and within-year variation in the fish assemblage (Fore et al., 1994). If a study is designed to characterize large river fish assemblages with the minimum cost, our analysis indicates that single summer samples will likely be sufficient.

Temporal variation in IBI scores among months in the same year at the same sites is due to numerous factors including changes in water quality, changes in habitat between sampling dates, flow variation between sampling dates, seasonal movements of individuals, and random variation (Karr et al., 1987). However, the majority of studies using multimetric indices to evaluate streams do not report seasonal variation, even if they collected multiple samples (Reash, 1999). Variation in IBI scores is typically expected with multiple samples. Angermeier & Karr (1986) found seasonal variation in IBI scores from a small Illinois stream that appeared to be due in part to individuals becoming patchily distributed in the fall. Although seasonal variation in our IBI scores was present, there was not a significant difference in IBI score among months.

We identified high variation in IBI scores among years and among sites (Fig. 6). This suggests a high level of randomness among sampling periods and sites. However, our impression is that even though raw IBI scores may vary substantially, the resulting integrity class scores (i.e., poor, good, excellent ranks) are robust to this variation.

Spatial patterns that we observed using these Wabash River data were consistent with Gammon (1998), Gammon & Simon (2000), and Pyron et al. (2006) using subsets of the same collections, and Pyron & Lauer (2004) using collections from 2001 to 2002: sites upstream of river km 500 resulted in higher IBI scores than downstream sites. This pattern was present in nearly every year. Gammon (1998) attributed the decline in site quality with downstream distance primarily to increasing impacts from agriculture and urban drainage. Pyron & Lauer (2004) added hydrologic alteration as a historical and current impact. The hydrology of the Wabash River watershed has been altered primarily by reservoir release and agriculture (Pyron & Neumann, unpubl. data).

There is evidence that the quality of fish assemblages has improved during this 25-year period. Gammon (1998) compared collections at these sites from the 1970s to collections from the 1980s and found improvements in species richness (4.9–6.9) and CPUE (30–51/km). He attributed a majority of the high annual variation in multimetric scores to drought and flood events. However, detection of improvements is likely scale- and context-dependent. For example, Pyron et al. (2006) found directional changes in the fish assemblages during this 25-year period, but these changes were significant only at the scale of the entire reach. At individual stations, year-to-year changes in assemblages were not directional (i.e., not predictable). In contrast, our analyses showed IBI scores that decreased slightly during this 25-year period.

Autocorrelation appears to be a common phenomenon among stream assemblages (Wilkinson & Edds, 2001). Wilkinson & Edds (2001) found that spatial autocorrelation among stream sites was a stronger explanation of within-drainage variation than environmental (habitat) variation. Spatial autocorrelation among our sites is a further indication that longitudinal patterns (upstream–downstream) account for the majority of assemblage variation. It is not surprising that the habitat of sites that are closer together is more similar than for sites that are distant. The presence of significant spatial autocorrelation in some years suggests that sites are not independent, possibly due to local movements of individuals (Lloyd et al., 2006). This could be taken to imply that sampling sites over a large river distance is unnecessary. However, sampling protocols for biomonitoring studies are based on identification of potential negative impacts, or impact sources. Sampling stations are typically located where anthropogenic impacts are obvious, such as point source discharges, urbanization, or tributaries with potentially negative impacts. In addition, we found that temporal autocorrelation was present and strongest in upstream reaches. Upstream sites tended to result in IBI scores that were more similar with repeated sampling. We suggest that researchers test for spatial and temporal autocorrelation, and be aware of potential violations of statistical independence when present.

Monitoring of fish assemblage quality using biological measurements (IBI) is an effective approach to understand trends in the status of river ecosystems (Karr, 2006). Decreases in fish assemblage quality as a result of anthropogenic impacts have occurred in the large rivers of North America (Rinne et al., 2005). Causes of degradation are frequently similar to those in the Wabash River and include agriculture, urbanization, and hydrologic alteration from dams and other river regulation (Dynesius & Nilsson, 1994; Allan, 2004). The ecological integrity of large rivers, including the Wabash River, is fundamentally impacted by human alteration of headwater streams, which compose over two-thirds of stream length in a typical drainage (Freeman et al., 2007). Restoration of riverine ecosystems will require restoring a natural flow regime (Galat & Lipkin, 2000; Bunn & Arthington, 2002) that includes connectivity with headwaters.

Sampling design influences sampling variation and, hence, IBI scores (Karr et al., 1987; Fore et al., 1994). Although our results suggest that a single sample by boat electrofisher will provide a reasonable estimate of the quality of the fish assemblage at a site, repeated sampling at specific time intervals (e.g., months) is still necessary if a goal is to detect the presence and impact of temporal variation. For example, temporal variation may be an indicator of human influence (Fore et al., 1994). We suggest that future studies of fish assemblages use information from this and similar studies to adjust their sampling designs so as to obtain the best data for addressing their study questions within the constraints of limited time and resources.