Introduction

Highland streams in Ethiopia have been reported to show deteriorated water quality from severe anthropogenic impacts (Alemneh et al., 2017), including poor waste management practices, urbanization, population growth, industrialization, unregulated agriculture, and floriculture farms that use large quantities of fertilizers and pesticides, as well as deforestation (Getachew et al., 2020). A sampling strategy to monitor water quality must be capable of distinguishing anthropogenic impacts from natural variation (Callanan et al., 2008). Natural variation in biological community structure occurs in the river continuum, especially across ecoregions, in response to changes in altitude (Wang et al., 2012) and seasonal change (Alemneh et al., 2019; Gratwicke, 1998). At the same time, pollutants may be delivered to rivers and streams from point or diffuse sources, as chronic or acute inputs. According to Buss et al. (2015), several factors also limit the adoption of standardized biomonitoring protocols: (1) climatic differences among countries create differences in season, month, or flow condition for sampling in all regions; (2) lack of taxonomic expertise to identify specimens to genus or species level; and (3) the fact that most biological indicators and indices have been developed for specific geographic regions, states, or countries, as well as for specific pressures.

Inferring water quality and in particular ecological water quality from biological communities is well established (Geist, 2011; Resh & Rosenberg, 1993). Biological sampling protocols have been established for most developed regions such as America, Europe, and Australia (Barbour et al., 1999; Koester & Gergs, 2017; Stark et al., 2001) based on their ecological contexts, but they are often not applied consistently at local, national, or continental scales, even in their home countries (Buss et al., 2015). The protocols follow several levels of sampling effort that can be categorized into three groups (Buss et al., 2015), namely: fixed sample number, fixed sampling length/area, and fixed sampling time.

Benthic macroinvertebrates are among the most widely used organisms globally for the assessment of water quality as they reflect prevailing conditions, respond rapidly to environmental stresses, and are often the first ecological indicators to react to changes in the environment (Barbour et al., 1999). For example, Carlisle et al. (2008) showed that the macroinvertebrate model was the most precise compared to the diatom and fish models. However, the use of macroinvertebrates has been criticized because sampling, sorting, counting, and identifying them is time-consuming and expensive (De Bikuna et al., 2015), and this is likely to affect the effectiveness of biomonitoring protocols using macroinvertebrate communities (Pinna et al., 2014).

Kick sampling is among the most commonly applied methods for bioassessment using macroinvertebrates. However, the approach to kick sampling is variable, especially in terms of timing; examples include 2-min (Mandaville, 2002), 3-min (Larsen et al., 2012), 4-min (Haase et al., 2004a, b), and 5-min (Cheshmedjiev et al., 2011; Luo et al., 2018) kicks to collect samples of macroinvertebrate communities at each location. Despite this variability, little attention is often given to how long the “kicking” should last when taking a sample (Bradley & Ormerod, 2002; Clarke et al., 2002). Furthermore, there is variability in terms of the mesohabitats (e.g., riffle, run/glide, pool) targeted for sampling. Both single habitat (Blocksom et al., 2008) and multihabitat (Barbour et al., 2006) approaches have been adopted. However, most developing countries including Ethiopia have no agreed macroinvertebrate kick sampling protocols. So far, bioassessment studies conducted in different parts of Ethiopia have used various kick sampling methods, for example, 3-min (Getachew et al., 2012; Lakew & Moog, 2015) (the most frequently used method), 5-min (Ambelu et al., 2014), and even 10-min (Mereta et al., 2012, 2013), without any subsampling techniques for benthic invertebrates. There is no consensus on whether a standardized approach should be adopted for monitoring wadeable rivers and streams in the country in terms of the kick sampling method. This was the impetus for the current research, which tested several approaches to kick sampling.

In identifying the approach that might be most appropriate we considered both processing time and costs. As the kick time increases, the cost of sample processing also increases unless subsampling is used. For example, Haase et al. (2004a, b) explained that sorting is often the most time-consuming aspect of a bioassessment study that led to either economic or deliverability consequences. Any decrease in time spent on sample processing while not decreasing sampling effectiveness will increase productivity and reduce costs. For example, Feeley et al. (2012) compared two multihabitat (MH) kick sampling methods in small streams and found that a kick sampling method as short as 20-s was as effective as 60-s to produce a reasonable representation of the taxa present. Mykra et al. (2006) also found that there was no indication of differences in community structure among different sample sets collected for 1-, 2-, 3-, and 4-min, and suggested that even 1-min kick samples can adequately represent macroinvertebrate community structure in streams for that particular ecoregion.

Decreasing effort and cost is not the only aim in developing a sampling protocol. The protocol should also maintain sufficient accuracy to answer research or monitoring questions (Barbour & Gerritsen, 1996). This requires effective yet rapid sampling and processing methods. However, there is a scarcity of studies in Ethiopia that have evaluated different kick sampling methods in rivers and streams. As previously explained, studies in the country used different kick sampling methods varying from 3- to 10-min, all of which have been developed elsewhere in different ecoregions such as non-tropical countries (Barbour et al., 1999; Gabriels et al., 2010; Ostermiller & Hawkins, 2004) without determining the habitat type to sample. The 2-min kick sampling method has not been used in previous bioassessment studies in Ethiopia; however, it has been effectively used in different ecoregions (Bradley & Ormerod, 2002; Heino et al., 2002, 2003). Therefore, the objective of this study was to determine whether there are differences among four sampling approaches: MH kick samples collected for 2- and 3-min and riffle habitat (RH) kick samples collected for 2- and 3-min at minimally impacted sites (Table 2). The study tested the hypothesis that “no differences occur in terms of a range of macroinvertebrate metrics as well as community structure between samples collected using the four methods.”

Methods and materials

Study areas

Ethiopia has a diverse climate due to its equatorial positioning and varied topography (Block, 2008), ranging from a semi-arid desert to a humid and warm (temperate) type in southwest Ethiopia (Beyene & Meissner, 2010), where this study was conducted. Traditionally, climatic conditions in Ethiopia are classified into five major agroecological zones based on altitude and temperature variations (Berhanu et al., 2014; MOA, 1998) (Table 1). The study sites are located in the “Kola” and “Weynadega” agroecological zones, which together account for nearly 80% of the country’s surface area (MOA, 1998).

Table 1 Traditional climatic zones of Ethiopia and their physical characteristics. Source: (Berhanu et al., 2014; MOA, 1998)

Ethiopia has twelve major river basins (Awulachew et al., 2007). The study sites were located in the Abay and Omo-Gibe river basins (Fig. 1) based on their physical characteristics (e.g., slope, soil type, stream order, and land use types) and known ecological conditions (Alemneh et al., 2019; Lakew, 2017) to reflect typical Ethiopian wadeable rivers and streams.

Fig. 1 Land use types and location of study sites in the Abay and Omo-Gibe River basins of Ethiopia (Source: ArcGIS 10.5)

Twenty reaches on four reference streams (Urgessa, Yebu, Enkulu, and Feche) with minimal anthropogenic pressures were selected using the standards representing minimally impacted waters (Table 2). The collection of macroinvertebrate samples within designated index periods is critical for repeated assessments (KDOW, 2015) and to minimize seasonal variation (Montana, 2012). We selected the dry season as the index period because water levels are then sufficiently low to enable sampling. All samples were collected in the same year (2019) in one month (May). Previous bioassessment studies in Ethiopia have also been conducted in this season (mid-September to mid-June) (Ambelu et al., 2013; Getachew et al., 2012). Prior to field sampling, background information on land use, geology, and soils was compiled where available (Table 2).

Table 2 Details of streams selected for the collection of macroinvertebrate samples in southwestern Ethiopia, 2019

The coordinates and altitudes were recorded using a global positioning system (Garmin-72H). Dissolved oxygen (mg/L), pH, water temperature (°C), conductivity (μS/cm @ 25 °C), and total dissolved solids (mg/L) were measured in situ using a digital handheld multi-parameter probe (Hach HQD). Habitat assessment (representation of flow mesohabitats and benthic substrates) was also conducted using the rapid bioassessment protocol set for wadeable streams and rivers (Barbour et al., 1999). Nutrients such as nitrate (mg/L NO3-N) and phosphate (mg/L) were measured using a touch-screen spectrophotometer in the Department of Environmental Health Sciences and Technology laboratory of Jimma University.

Sampling method evaluation

We first selected an accessible length of the stream at each site to locate five 50-m reaches (250 m in total on each river) for sampling after Feeley et al. (2012) and Mabidi et al. (2017). From these reaches on each stream, 15 samples for each of the 2- and 3-min MH methods were collected from all mesohabitats, including riffles, run/glides, and pools, in proportion to their representation in each 50-m stream reach. Similarly, 5 samples each of 2- and 3-min duration were collected from riffle habitats only, within the same reaches where the MH samples were collected. There was no overlap in the sampling area covered by each sampling method. For all methods, sampling started downstream and continued upstream to avoid habitat disturbance prior to sampling. A total of 120 MH replicates (60 for each method) and 40 RH replicates (20 for each method) were collected using a D-frame pond net (0.5 mm mesh). Immediately after the collection of benthic macroinvertebrates, large leaves, sticks, and stones were individually rinsed, inspected for organisms, and discarded at the site. The entire sample was then preserved in 96% ethanol after Barbour et al. (1999) and transported to the laboratory for sorting, identification to the family level, and counting of the various taxa. Wright et al. (2000) highlighted that family-level data are adequate for rapid bioassessment of water quality. Furthermore, Barbour et al. (1999) showed that family-level identifications provide a high degree of precision among samples, require less expertise to perform, and accelerate the production of results. However, the necessary level of taxonomic resolution depends on the purpose(s) of a study (Bailey et al., 2001). Fundamentally, most biomonitoring programs acknowledge a tradeoff between efficiency and sensitivity for large-scale monitoring; thus, many strategies for reducing processing time and cost have been implemented (Buss et al., 2015).

Data and statistical analysis

The non-parametric Kruskal–Wallis test was used to test for significant differences in a range of metrics among the four kick sampling methods (2- and 3-min RH and 2- and 3-min MH samples). The analyzed metrics include taxonomic richness at the family level, total abundance, Ephemeroptera, Plecoptera, and Trichoptera (EPT) richness, %EPT, Ephemeroptera richness, Ephemeroptera abundance, family biotic index (FBI), the Biological Monitoring Working Party (BMWP) score, Average Score Per Taxon (ASPT) (Hawkes, 1998), the Ethiopian biotic score based on benthic macroinvertebrates (ETHbios), and ASPT calculated from ETHbios (Aschalew & Moog, 2015). In addition, the Shannon–Wiener index, evenness, Simpson’s index, %scrapers, %shredders, %collector-gatherers, %collector-filterers, and %predators were used to evaluate the four kick sampling methods. The post hoc Mann–Whitney test was used to evaluate pairwise differences between the groups. Bonferroni-corrected p-values were calculated in PAST statistical software to reduce type I error, or false rejection of the null hypothesis (i.e., there is no difference between the kick sampling methods). Bar charts with standard error of the mean (Cumming et al., 2007) were also displayed for selected metrics (richness, total abundance, and Ephemeroptera abundance).
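The logic of this step can be illustrated with a minimal sketch in Python. It is not the PAST implementation used in the study; the function names and the example values are hypothetical and serve only to show how the tie-corrected Kruskal–Wallis H statistic is obtained from pooled ranks and how a Bonferroni correction scales the p-values of the six pairwise comparisons among four methods.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction  # undefined if every value is identical


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical metric values for three groups (illustration only).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_statistic = kruskal_wallis_h(example_groups)

# The four methods compared in the study give six pairwise comparisons.
methods = ["2-min RH", "3-min RH", "2-min MH", "3-min MH"]
pairs = list(combinations(methods, 2))
```

Under the null hypothesis, H is referred to a chi-square distribution with k − 1 degrees of freedom (three for four methods); the sketch stops at the statistic itself and leaves the p-value to a statistical package.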

We used the relative abundances of macroinvertebrate communities to carry out the multivariate tests to control for the differences in kick net, sampling times, and habitat differences. Analysis of similarity (ANOSIM), a randomization test (Clarke & Green, 1988), was performed on the resemblance matrix based on the Bray–Curtis similarity coefficient (Armitage et al., 1987; Clarke & Gorley, 2006). This was carried out in PRIMER 6, based on log(x + 1) transformed macroinvertebrate community relative abundance data using 999 permutations. It allows a test of the null hypothesis that there are no macroinvertebrate assemblage differences among groups of replicated samples collected by different kick sampling methods (Clarke & Gorley, 2006). The differences among the groups were measured by the global R statistic, which is calculated as \(R=\frac{4(\overline{r}_{B}-\overline{r}_{W})}{n(n-1)}\), where \(\overline{r}_{B}\) and \(\overline{r}_{W}\) are the mean rank between-group similarity and within-group similarity, respectively, and \(n\) is the total number of samples (Cao et al., 2005). The ANOSIM test statistic R ranges from -1 to 1 (Chapman & Underwood, 1999). An R-value close to 1 indicates good separation, that is, differences among groups, and a value close to 0 indicates weak separation or completely random grouping (Chapman & Underwood, 1999; Clarke & Gorley, 2006). Negative values occur when the samples within a group are less similar to one another than to the samples of other groups, probably due to inappropriate sampling designs (Chapman & Underwood, 1999). ANOSIM also provides a p-value for the R statistic, which gives the significance level of the test (Clarke & Gorley, 2006).
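As an illustration of how the global R statistic and its permutation p-value arise, the following Python sketch computes both for a small, hypothetical data set. It is a simplified one-way version written for this explanation, not the PRIMER 6 routine; the function names and the example abundances are assumptions. Dissimilarities are ranked in ascending order, which is equivalent to ranking similarities with rank 1 as the most similar pair.

```python
import math
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def ranked_dissimilarities(samples):
    """Rank all pairwise dissimilarities of log(x + 1) transformed samples."""
    data = [[math.log(x + 1) for x in s] for s in samples]
    pairs = list(combinations(range(len(data)), 2))
    ranks = average_ranks([bray_curtis(data[i], data[j]) for i, j in pairs])
    return pairs, ranks


def anosim_r(pairs, ranks, labels):
    """R = 4 (r_B - r_W) / (n (n - 1)); assumes replicated groups."""
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    n = len(labels)
    r_b = sum(between) / len(between)
    r_w = sum(within) / len(within)
    return 4 * (r_b - r_w) / (n * (n - 1))


def anosim(samples, labels, permutations=999, seed=0):
    """Return (R, p), with p estimated by randomly relabelling the samples."""
    pairs, ranks = ranked_dissimilarities(samples)
    observed = anosim_r(pairs, ranks, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(pairs, ranks, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical abundances of two taxa in six samples (illustration only).
samples = [[10, 0], [12, 1], [11, 0], [0, 10], [1, 12], [0, 11]]
labels = ["A", "A", "A", "B", "B", "B"]
r_value, p_value = anosim(samples, labels)
```

In this deliberately extreme example every within-group pair is more similar than every between-group pair, so R reaches its maximum; values near zero, as reported in the Results, correspond to samples that do not group by method.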

A two-dimensional MDS ordination plot, which is a visual representation of differences in macroinvertebrate community structure among the methods, was also constructed using the relative abundance matrix. MDS is a rank-based approach in which the relative abundance data are substituted with ranks (Buttigieg & Ramette, 2014). An ordination plot with a stress value equal to or below 0.05 indicates a good fit, whereas values > 0.2 indicate an arbitrary or largely random configuration with little explainable pattern in the MDS plot (Clarke, 1993). To elucidate the differences in macroinvertebrate distribution in the MDS plot, an index of multivariate dispersion (MVDISP) test was then employed to statistically measure the relative rank dissimilarity between the samples within each method. Correspondingly, the within-group similarity was also determined for samples within each method using SIMPER analysis according to the method described in Clarke and Gorley (2006).

Furthermore, to compare the patterns of macroinvertebrate community structure among different kick sampling methods, we used the RELATE routine in PRIMER 6 (Clarke & Gorley, 2006). RELATE compares two resemblance matrices (or the MDS plots derived from them) by calculating Spearman’s rank correlation coefficient ρ (rho) between them; here the matrices were built from relative abundances of macroinvertebrates and the (dis)similarity of the methods was tested using 999 permutations. A Spearman’s rank correlation coefficient of ρ = 1 indicates a perfect similarity between method pairs.
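The core of the RELATE comparison, a rank correlation between the corresponding entries of two resemblance matrices, can be sketched in Python as follows. This is an illustrative simplification that omits the permutation test; the function names and the two small matrices are hypothetical, and the sketch assumes that the rows of both matrices refer to the same samples in the same order.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def lower_triangle(matrix):
    """Flatten the below-diagonal entries of a square resemblance matrix."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def relate_rho(matrix_a, matrix_b):
    """Rank correlation between matching entries of two matrices."""
    return spearman_rho(lower_triangle(matrix_a), lower_triangle(matrix_b))


# Hypothetical dissimilarity matrices for three samples (illustration only).
m1 = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.4], [0.6, 0.4, 0.0]]
m2 = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.5], [0.9, 0.5, 0.0]]
m3 = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.5], [0.1, 0.5, 0.0]]
rho_same_order = relate_rho(m1, m2)
rho_reversed = relate_rho(m1, m3)
```

The first pair of matrices orders the sample pairs identically and the second pair orders them in reverse, which gives the two extremes of ρ.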

Finally, classification strength-sampling method comparability (CS-SMC) (Cao et al., 2005) was computed based on the formula proposed by several authors (Cao & Hawkins, 2011; Cao et al., 2005; Van Sickle, 1997). The CS-SMC allows the direct comparison of benthic macroinvertebrate community structure similarity obtained by kick sampling methods (Cao et al., 2005). All six pairs of the sampling methods were compared from the log(x + 1) transformed Bray–Curtis similarities of taxonomic assemblages (SIMPER) to calculate the CS-SMC using the following equation (Cao et al., 2005; Van Sickle, 1997): \(\mathrm{CS\text{-}SMC}=\frac{2\overline{S}_{b}}{\overline{S}_{w1}+\overline{S}_{w2}}\times 100,\) where \(\overline{S}_{b}\) denotes the mean between-method similarity, and \(\overline{S}_{w1}\) and \(\overline{S}_{w2}\) denote the mean within-method similarities of the two macroinvertebrate kick sampling methods being compared, for any possible pair of methods from MH and RH samples. The CS-SMC measures similarity between methods relative to that between replicates within a method (Cao et al., 2005): a value of 100% indicates that the mean similarity between methods equals the mean similarity among replicates within the methods, so that the two methods are comparable. The CS-SMC is a useful measure of comparability as it is independent of site differences and sampling effort (Cao et al., 2005; Feeley et al., 2012).
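The calculation can be made concrete with a short Python sketch. The helper names and the similarity values below are hypothetical and chosen only to illustrate the equation; they are not data from this study.

```python
from itertools import combinations


def mean_similarity(sim, rows, cols=None):
    """Mean similarity within one group (cols=None) or between two groups."""
    if cols is None:
        values = [sim[i][j] for i, j in combinations(rows, 2)]
    else:
        values = [sim[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def cs_smc(s_between, s_within_1, s_within_2):
    """CS-SMC (%) = 2 * S_b / (S_w1 + S_w2) * 100."""
    return 2 * s_between / (s_within_1 + s_within_2) * 100


# Hypothetical Bray-Curtis similarities (%) among four samples:
# samples 0 and 1 from one method, samples 2 and 3 from another.
sim = [
    [100, 80, 70, 70],
    [80, 100, 70, 70],
    [70, 70, 100, 60],
    [70, 70, 60, 100],
]
method_1, method_2 = [0, 1], [2, 3]
score = cs_smc(
    mean_similarity(sim, method_1, method_2),
    mean_similarity(sim, method_1),
    mean_similarity(sim, method_2),
)
```

Here the mean between-method similarity (70) equals the average of the two within-method similarities (80 and 60), so the score is 100%. A score above 100% arises whenever samples are, on average, more similar between methods than among replicates of the same method.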

Results

A total of 56,930 macroinvertebrates belonging to 78 families and 15 orders were collected using the four methods. Ephemeroptera 29,817 (52.37%), Coleoptera 6,243 (10.97%), Trichoptera 6,150 (10.80%), and Diptera 5,405 (9.47%) were the four most abundant orders of macroinvertebrates present. The various metrics calculated are given in Table 3 below.

Table 3 Descriptive summary, mean ± SE, mean ranks (MR), and Kruskal–Wallis (K-W) test with its p-value of macroinvertebrate metrics for each kick sampling method. SE standard errors, H’ Shannon’s index, D Simpson’s index, ASPTBMWP average score per taxon from BMWP, FBI family biotic index, Eph. Ephemeroptera

The Kruskal–Wallis test did not show any significant difference (Fig. 2, Table 3) among the methods for all metrics except total abundance (H = 42.4, p = 0.001, df = 3) and Ephemeroptera abundance (H = 19.34, p = 0.001, df = 3) (Fig. 3, Table 3). For example, the mean (± standard error) for richness (Fig. 2) did not show any significant difference among the methods.

Fig. 2 Comparison of the mean (± standard error) of a selected benthic macroinvertebrate metric (family-level taxonomic richness) for the four kick sampling methods using bar charts and standard error bars examined in the dry season, 2019

Fig. 3 Comparison of the mean (± standard error) of total abundance and Ephemeroptera abundance for the four kick sampling methods using bar charts and standard error bars examined in the dry season, 2019

For significantly different metrics, the pairwise post hoc test indicated significant differences in total abundance among all method pairs except 2-min RH and the 2-min MH pair (Fig. 3, Table 4).

Table 4 p-values in the pairwise post hoc test calculated for macroinvertebrate total abundance (lower left) and Ephemeroptera abundance (upper right) having significant differences in the Kruskal–Wallis test

Similarly, the post hoc test on Ephemeroptera abundance showed significant differences between 2- and 3-min RH, 2-min MH and 3-min RH, and 2- and 3-min MH (Fig. 3, Table 4).

The ANOSIM test based on the relative abundance of macroinvertebrates also showed no significant difference among the four kick sampling methods (R = 0.048, p = 0.972, ANOSIM). Furthermore, the two-way ANOSIM test, sites versus methods, did not show any significant difference (p = 0.214), and the global R was very low (R = 0.022; ANOSIM). Similarly, the MDS ordination plot (Fig. 4) from the relative abundance of macroinvertebrates, which had a high stress value of 0.24 (Clarke, 1993), showed no separation of the samples according to the method applied.

Fig. 4 MDS plot of the macroinvertebrate community structure of the samples from the various sampling methods based on log(x + 1)-transformed Bray–Curtis similarity matrix

However, the MVDISP (index of multivariate dispersion) given in Table 5 showed variation in the relative abundances of macroinvertebrate communities among the methods. The dispersion sequence of 0.34, 0.341, 0.945, and 1.197 for 3-min RH, 2-min RH, 3-min MH, and 2-min MH, respectively, showed that the average rank dissimilarity is almost 3.5 times higher within 2-min MH samples than within 2- and 3-min RH samples. The 2- and 3-min RH replicates were nearly equally dispersed and were the least dispersed compared to both 2- and 3-min multihabitat replicates. Likewise, the similarity percentages were similar for 2- and 3-min riffle habitats (Table 5). The lower the MVDISP value, the less dispersed the samples within the factor, and the greater the SIMPER similarity, the more similar the samples within the factor (Wasserman et al., 2015).
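The ratio quoted above follows directly from the reported dispersion values, as the following brief Python check shows. The variable names are arbitrary; the four values are those given in the text.

```python
# MVDISP values reported in the text for each method.
mvdisp = {
    "3-min RH": 0.34,
    "2-min RH": 0.341,
    "3-min MH": 0.945,
    "2-min MH": 1.197,
}

# Dispersion of the 2-min MH samples relative to each method.
ratios = {method: mvdisp["2-min MH"] / value for method, value in mvdisp.items()}

for method, ratio in ratios.items():
    print(f"2-min MH / {method}: {ratio:.2f}")
```

The ratios against both RH methods are close to 3.5, whereas the ratio against the 3-min MH method is much closer to 1.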

Table 5 MVDISP (index of multivariate dispersion) and SIMPER test results, respectively, representing the relative dispersion and similarity of samples within each method (factor)

The RELATE analysis computed from the relative abundance of macroinvertebrate communities further indicated a similarity in macroinvertebrate community relative abundances recorded between all pairs of methods with a Spearman rank correlation coefficient (ρ) of greater than 0.5 and p = 0.001. Among all pairs, 2- and 3-min RH samples and 2- and 3-min MH samples showed the highest similarity in macroinvertebrate community structures recorded (Table 6).

Table 6 The RELATE analysis results indicating Spearman’s rank correlation coefficients (ρ) between different biotic resemblance matrices

Finally, the analysis of similarities between benthic macroinvertebrate community structures gave a classification strength-sampling method comparability (CS-SMC) close to 100% for all pairs of methods (Fig. 5), which highlighted that the similarity of samples between methods was very high. The trend line in Fig. 5 illustrated that the method pairs 2- and 3-min RH; 2-min RH and 2-min MH; and 2-min MH and 3-min RH had better CS-SMC scores (close to 100%) compared to the other method pairs. The 2-min MH and 3-min MH pair showed a CS-SMC value greater than 100% (Fig. 5), indicating a greater similarity between the methods than within the methods.

Fig. 5 The similarity between replicate samples within each method relative to the similarity of samples between methods as calculated by CS-SMC

Discussion

This study set out to develop a fixed-time sampling protocol for bioassessment of streams and rivers using macroinvertebrates in Ethiopia, although the protocol has not yet been tested for its utility in determining water quality conditions at sites. Comparisons among methods can be made at several levels of data organization: taxonomic composition, relative abundances, metrics, indices, and bioassessment endpoints (e.g., good-fair-poor) (Blocksom et al., 2008). Four kick sampling methods (2- and 3-min RH, 2- and 3-min MH) were selected and compared using various statistical analysis methods. Based on the Kruskal–Wallis test (H), the majority of metrics compared were similar among the methods, for which there are several possible explanations. According to Blocksom et al. (2008), the predominant habitat tends to be riffles in higher gradient streams, and a sample that is collected using the single habitat method that focuses on riffle habitats should be very similar to the sample that is collected using the multiple habitat method, in which habitats are sampled according to their proportional representation in the stream. The lack of differences among the methods could also be related to inadequate sampling and to the level of taxonomic resolution. The only exceptions were total abundance and Ephemeroptera abundance, which differed significantly between methods. These differences are probably due to the larger total number of macroinvertebrates and Ephemeroptera families in the 3-min RH and 3-min MH samples.

Similarly, the diversity indices, biotic scores, and functional feeding group attributes evaluated showed little variability among methods, which supports the use of these four methods in future bioassessments. This finding was in agreement with Friberg et al. (2006), who found that a sampling method with a lower sampling effort, such as the 2-min, performs as well as the 3-min in terms of macroinvertebrate community structure. Furthermore, Mykra et al. (2006) found similarities in community structure among different sample sets collected for 1-, 2-, 3-, and 4-min, showing that even 1-min samples would yield adequate and comparable taxonomic structures. Similarly, Feeley et al. (2012) found that 20- and 60-s MH kick sampling methods displayed a reasonable representation of the taxa in terms of various metrics tested. Likewise, the ANOSIM test did not show any overall difference among the four methods and highlighted the high similarity in macroinvertebrate community structure across all sites sampled using the methods tested. In contrast, a related study (Feeley et al., 2012) found high similarity but a significant difference between 20- and 60-s kick sampling methods.

The RELATE analysis on relative abundances of macroinvertebrate communities showed that samples from similar habitats but collected by different kick sampling methods had higher correlation coefficients than those sampled from different habitats by either different or similar kick sampling methods. For example, 2- and 3-min RH pair matrices and 2- and 3-min MH pair matrices had Spearman rank correlation coefficients of 0.76 and 0.80, respectively. This probably indicates that more similar macroinvertebrate taxa were collected from similar habitats regardless of the method used. This relationship was well described by Khudhair et al. (2019), who reported that sediment type, water flow, and the presence of aquatic vegetation types directly affect the assemblage structure of benthic macroinvertebrates. The other reason for this high similarity among the methods may be related to the family-level identification used. Although family-level identification is adequate for bioassessment of water quality (Chessman, 1995; Corkum, 1989; Wright et al., 2000), Bailey et al. (2001) concluded that organisms identified to genus or species level provide a significantly more informative description of conditions in the stream than higher taxonomic levels.

The CS-SMC measure also showed high similarity scores (close to 100%) among the method pairs, indicating that all kick sampling methods represented almost identical macroinvertebrate community structures from MH and RH samples. This finding contrasts with Cao et al. (1998) and Cao et al. (2002), who found that sample size is important, although they identified some taxa to lower levels (species, genus, and family levels). They noted that increasing sample size can affect the relative abundance of macroinvertebrates in a sample and thus the analysis of community structure. However, Mykra et al. (2006), who sampled only riffle habitats, showed that sample size is more important to detect rare and endangered macroinvertebrate taxa and concluded that the 2-min samples are sufficient for most biodiversity purposes.

In the current study, several analytical tests, for example, the Kruskal–Wallis, ANOSIM, RELATE, and CS-SMC, showed no significant differences among the four methods, indicating that methods with a low sampling effort, such as the 2-min RH and 2-min MH methods, perform equally well, and often better, in terms of the numbers of taxa and abundances recorded compared to methods with a higher sampling effort (3-min RH and 3-min MH). However, although the stress value of the MDS ordination plot among the four methods was higher than the Clarke (1993) cutoff value, it appears that the sample replicates using the MH methods (2- and 3-min) were highly dispersed compared to the sample replicates using the RH methods (2- and 3-min). The differences were well reflected by the numerical analyses of MVDISP and SIMPER in terms of dispersion and similarity percentage of macroinvertebrate samples. The MVDISP analysis showed that the 2- and 3-min RH samples have a nearly equal and low index of multivariate dispersion (IMD) and high within-group similarity (SIMPER). In contrast, high IMD and lower within-group similarity (SIMPER) were observed in the 2- and 3-min multihabitat replicate samples (Table 5). According to Warwick and Clarke (1993), there are two potential sources of increased variability among replicate samples: (1) an increase in the variability of abundances of the same set of taxa, and (2) changes in species identities.

As detected in MDS and from the MVDISP analysis, it appears that the 2- and 3-min RH kick samples were much less variable and this would enhance bioassessments by being better able to detect impacted sites as opposed to the highly variable 2- and 3-min MH samples. Based on the findings in this study, 2- and 3-min RH kick sampling methods in terms of time and habitat type for the collection of macroinvertebrate data would be key considerations in the country. However, the extra time spent for kicking such as 3-min in RH was an effort spent increasing the abundances of macroinvertebrate taxa already sampled. This unnecessary increase in the abundance of macroinvertebrates collected by the extra time for each sample will increase the identification effort (Feeley et al., 2012) and costs which are directly related to the amount of material and the number of individuals collected (Barbour & Gerritsen, 1996). Although not examined in this study, the taxonomic resolution also plays a major role in the costs associated with sample processing (Vlek et al., 2006).

In summary, if the focus is not on rare taxa and the required information is not dependent on additional evidence provided by the use of lower taxonomic levels of identification, the results in this study support the use of the shorter 2-min RH kick samples, also used by other studies (Heino et al., 2003; Mykra et al., 2006; Paavola et al., 2003), for bioassessment of wadeable rivers and streams in Ethiopia. Future studies should explore the effectiveness of shorter sampling times, such as 1-min kick samples, across a wider range of wadeable stream and river types, and also the effect of variable levels of taxonomic resolution, not just for bioassessment of water quality but also for aquatic invertebrate biodiversity assessment, an issue that is becoming increasingly important with alarming losses of freshwater biodiversity in many parts of the world (Reid et al., 2019). Furthermore, we suggest that future studies include impacted sites to see how the methods perform there as well.

Conclusions

A method that takes the minimum duration of time and effort but produces an equitable representation of the taxa available would be eventually adopted in the assessment of streams and rivers (Buss & Borges, 2008; Feeley et al., 2012; Mykra et al., 2006) based on the ecological contexts of the region. The unnecessary increase in abundance, as found with 3-min RH kick samples, will considerably increase the identification effort for each sample. Sampling for a longer time duration increases effort and fatigue (Feeley et al., 2012; Mykra et al., 2006) and reduces concentration and the productivity of the operators, in turn potentially affecting the results (Feeley et al., 2012). Consequently, the optimum macroinvertebrate kick sampling method which enables rapid but accurate bioassessment of water quality in terms of time and habitat type is an important consideration in Ethiopia. Based on the results in the present study, the shorter 2-min RH kick sampling method could be a good candidate for the bioassessment of wadeable rivers and streams in Ethiopia but further testing across a pollution gradient is recommended.