Introduction

Ethiopia is gifted with many streams and rivers that comprise diverse aquatic ecosystems of great scientific interest and economic importance. Rapid population growth, expansion of urban and suburban areas, industrialization, land use change, and removal of riparian vegetation have resulted in serious ecological problems for surface water resources (Zinabu and Elias, 1989; Aschalew, 2014). To put sustainable water resource management into practice, the first step is to understand the relationship between human-induced disturbances and their effects on aquatic resources, followed by the development of appropriate assessment tools to support mitigation measures.

Biological monitoring methods have been developed largely to assess the health of, and the level of stress on, aquatic ecosystems (Birk et al., 2012). Unlike chemical monitoring, biomonitoring has the advantage that biological communities reflect not only the quality of the water but also the overall ecological status of the ecosystem (Rosenberg and Resh, 1993). Biological integrity in an aquatic ecosystem represents the ability of the system to support and maintain a balanced and integrated community of organisms comparable to that of the natural habitat (Karr and Chu, 1999).

Benthic macroinvertebrates are often the taxonomic group of choice for biomonitoring in streams and rivers (Moog, 1988; Rosenberg and Resh, 1993). They are ubiquitous and good indicators of several anthropogenic pressures such as water pollution (Armitage et al., 1983) and geomorphological alterations (Negishi et al., 2002). They consist of thousands of species that originate from different systematic categories with different environmental needs; thus, they are differentially sensitive to pollutants of various types.

The conceptual basis for biomonitoring methods began in Europe after the saprobic system was developed by Kolkwitz & Marsson (1902). In the 1980s, most biotic indices were developed based on score systems (Armitage et al., 1983; Hilsenhoff, 1988). More recently, integrated assessment methods, namely the multimetric approach (e.g., Karr, 1981; Barbour et al., 1995; Ofenboeck et al., 2004; Gabriels et al., 2010) and the multivariate approach (e.g., Reynoldson et al., 1995; Kokes et al., 2006), were developed. Both multimetric and multivariate methods are based on the reference condition approach and measure the deviation of impaired sites from pristine sites. The use of the multivariate approach is, however, still emerging in tropical areas, although it constitutes the basis for the development of the multimetric approach as used in temperate regions.

A multimetric index (MMI), first introduced by Karr (1981) using fish assemblages, has gained increasing attention worldwide for its ability to include complementary information from a broad spectrum of stressors (Barbour et al., 1996; Hering et al., 2006). An MMI integrates different biological measures (richness, composition, tolerance, and trophic status) into a single value that can potentially reflect the impact of multiple anthropogenic pressures (Barbour et al., 1999; Hering et al., 2006). Moreover, MMIs are increasingly used for conservation and restoration actions, as they provide relevant information for regulatory agencies and decision makers (Barbour et al., 1995; Karr and Chu, 1999). For example, the multimetric approach has become a popular method for routine biomonitoring programs in European member states (Hering et al., 2004; Ofenboeck et al., 2004). In tropical countries, however, the multimetric approach has so far been applied only occasionally (e.g., Ferreira et al., 2011; Bolivia: Moya et al., 2011; Couceiro et al., 2012; Vietnam: Nguyen et al., 2014; Brazil: Melo et al., 2015). An important advantage of MMIs is that they are flexible and can easily be adjusted by adding or removing metrics or refining the index threshold values. However, MMIs should be applied cautiously because of differences in reference conditions, anthropogenic pressures, and regional species assemblages.

In Ethiopia, the use of biomonitoring in general, and the application of a multimetric approach in particular, is still in its infancy. The evaluation of water quality is primarily focused on physicochemical data and optionally includes fecal coliforms as a biological component (Alemayehu, 2001). In recent years, however, research institutions and universities, in collaboration with foreign scientists, have tested biomonitoring systems using benthic macroinvertebrate communities to assess the ecological condition of streams, rivers, and natural wetlands (Getachew et al., 2012; Mereta et al., 2013; Aschalew, 2014). The results showed that benthic macroinvertebrates are reliable indicators for biomonitoring purposes in the region. Thus, in this paper, we aimed to develop an MMI based on benthic macroinvertebrates for assessing the ecological status of streams and rivers at a broader scale and under multiple stressors. The geographical focus comprises the central and southeastern highlands of Ethiopia, one of the most diverse and threatened ecoregions in the country. We believe that this study provides a useful tool for the assessment of rivers in the highlands and also a methodological guideline for wider application of a biomonitoring network in the country.

Materials and methods

Study area

The present study was conducted in the upper sections of the Awash, Wabe-shebelle, Genale, and Rift Valley basins, lying between 6°57′N and 9°05′N latitude and 38°07′E and 40°06′E longitude (Fig. 1). Of 104 sampling sites, 95 were located in the Ethiopian montane grassland–woodlands ecoregion at altitudes ranging from 1,900 to 2,500 m a.s.l., while the remaining 9 were distributed between 2,600 and 3,200 m a.s.l. in the afro-alpine ecoregion of the Bale highlands. The study area has a bimodal rainfall pattern, with short rains from February to April and the main rains from June to September (National Meteorological Agency (NMA), 2012). The natural vegetation cover is restricted to state-protected forest areas in the upper Awash (e.g., Chilimo, Suba forest) and central rift valley (e.g., Wondogent forest) basins. In the Genale and Wabe-shebelle basins, the high mountains are dominated by small bushy shrubs and Erica forest, while the escarpments are covered by Junipers, Hyginia sp., mixed forest, and grasslands (e.g., Bale National Park and Adaba forest). In rural areas, the unprotected river catchments are densely populated, and most inhabitants are subsistence farmers. Moreover, the demand for natural products and for farmland is very high, which could be the predominant reason for the widespread loss of natural vegetation in this ecoregion. In urban areas, especially in and around the capital Addis Ababa, streams and rivers serve as major dumping sites for domestic and industrial waste.

Fig. 1 Spatial distribution of study sites in four drainage basins of Ethiopia

Sampling sites were distributed strategically across the national park, protected forest areas, rural-agricultural areas, and urban-industrial sites to cover different stress gradients. The major threats in rural areas include removal of riparian vegetation, nutrient loading from farmlands and washing activities, siltation of the stream bed by sediment from the catchment, sand excavation, and unbudgeted water withdrawal for irrigation. In urban areas, the major pollutants originate from diffuse and point-source loads of untreated domestic and industrial wastes.

Reference site selection

The development of an MMI requires classifying sites along a stress gradient into reference and impaired sites (Hering et al., 2006). Many authors have defined the reference condition as the condition that is representative of a group of ‘least impaired’ sites characterized by selected physical, chemical, and biological characteristics (Hughes, 1995; Barbour et al., 1996; Hering et al., 2003). We established reference sites based on 20 thoughtfully selected ‘a priori’ criteria during our extensive field sampling (Table 1). The ‘a priori’ criteria were representative of the Ethiopian highlands, reflect general aspects of naturalness, and address a wide range of human disturbances. To be considered a reference site, a site had to meet at least 16 criteria, including the four compulsory ‘a priori’ criteria (* in Table 1), in the sense of the ‘best attainable condition’ (Stoddard et al., 2006).

Table 1 Criteria applied for selection of reference sites in Ethiopian highland streams

Sample collection

Water quality parameters including temperature, pH, dissolved oxygen, and conductivity were measured using a portable WTW multi-parameter probe. From each site, two liters of water were collected and stored in an ice box until return to the laboratories of JIJE labo PLC and the National Fishery and Aquatic Life Research center. Total phosphorus (TP) and biochemical oxygen demand (BOD5) were measured following the standard methodology described in APHA (1997). Macroinvertebrates were collected using a standard square hand net with a frame width of 25 cm and a mesh size of 500 µm following a multi-habitat approach (Barbour et al., 1999; Moog & Sharma, 2005). Each composite sample was passed through a set of sieves to separate different size classes of macroinvertebrates and to wash off formalin using tap water. Benthic macroinvertebrates were identified to the lowest possible taxonomic level using identification keys and the help of taxonomic experts in Vienna, Austria.

Selection of candidate metrics

To assess the ability of the potential metrics to discriminate between reference and impaired sites, two methods were applied: (a) the Mann–Whitney U test (significance level P < 0.05) and (b) a graphical analysis using box-and-whisker plots combined with a sensitivity score. A five-class sensitivity score was assigned following Barbour et al. (1996) based on the degree of interquartile overlap in the box-and-whisker plots: a sensitivity score of 3 was given if the interquartile ranges did not overlap; a score of 2 if the interquartile ranges overlapped but both medians were outside the overlap; a score of 1 if there was moderate overlap of the interquartile ranges and one median was outside the overlap; a score of 0a if one interquartile range completely overlapped the other and one median was outside the overlap; and a score of 0b if both medians were inside the interquartile range overlap.
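The interquartile-overlap scoring can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names `percentile` and `sensitivity_score` are hypothetical, quartiles are computed by linear interpolation (other conventions give slightly different values), and "moderate overlap" is interpreted as any partial, non-nested overlap in which exactly one median lies inside the overlap. The Mann–Whitney U test itself is not reproduced here; a statistics library routine such as `scipy.stats.mannwhitneyu` could be used for it.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sensitivity_score(reference, impaired):
    """Return '3', '2', '1', '0a' or '0b' from the overlap of the two IQRs."""
    r25, r50, r75 = (percentile(reference, q) for q in (25, 50, 75))
    i25, i50, i75 = (percentile(impaired, q) for q in (25, 50, 75))

    # Bounds of the region shared by both interquartile ranges.
    lo, hi = max(r25, i25), min(r75, i75)
    if lo > hi:
        return "3"  # interquartile ranges do not overlap

    inside = [lo <= m <= hi for m in (r50, i50)]
    if not any(inside):
        return "2"  # ranges overlap, but both medians lie outside the overlap
    if all(inside):
        return "0b"  # both medians lie inside the overlap

    # Exactly one median is inside the overlap: nested ranges give 0a,
    # partial overlap gives 1 (assumed reading of "moderate overlap").
    nested = (r25 <= i25 and i75 <= r75) or (i25 <= r25 and r75 <= i75)
    return "0a" if nested else "1"
```

For instance, a metric such as taxa richness with reference values `[10, 11, 12, 13, 14]` and impaired values `[1, 2, 3, 4, 5]` (made-up numbers) has non-overlapping interquartile ranges and would receive score 3.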

Selection of core metrics and removal of redundant metrics

Selection of core metrics from the candidates was based on two criteria: sensitivity score and discrimination efficiency (DE). Metrics with a sensitivity score of 3 (P < 0.05 in the Mann–Whitney U test) were selected as core metrics. In the absence of a metric with sensitivity score 3 in a given metric category, a metric with sensitivity score 2 and a higher DE (>50%) was selected to ensure the inclusion of all primary metric categories (richness, composition, sensitivity, and feeding guild). DE was calculated as the percentage of impaired sites that score below the 25th percentile of the reference sites for metrics that decrease with disturbance, and above the 75th percentile of the reference sites for metrics that increase with disturbance (Ofenboeck et al., 2004).
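The DE definition above translates directly into a short calculation. The following Python sketch is illustrative only: the function names and the example values are hypothetical, and the percentile uses linear interpolation, which is an assumption because the text does not state which percentile convention was used.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def discrimination_efficiency(reference, impaired, decreases_with_disturbance=True):
    """Percentage of impaired sites falling outside the reference quartile.

    Metrics that decrease with disturbance are compared with the 25th
    percentile of the reference sites; metrics that increase with
    disturbance are compared with the 75th percentile.
    """
    if decreases_with_disturbance:
        cutoff = percentile(reference, 25)
        hits = sum(1 for v in impaired if v < cutoff)
    else:
        cutoff = percentile(reference, 75)
        hits = sum(1 for v in impaired if v > cutoff)
    return 100.0 * hits / len(impaired)


# Made-up example: three of four impaired sites fall below the
# reference 25th percentile (12), so DE is 75%.
de = discrimination_efficiency([10, 12, 14, 16, 18], [5, 8, 11, 13])
```

An empty list of impaired sites is not handled; a real analysis would guard against it.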

Metrics were considered redundant if their Spearman rank correlation was higher than 0.8. The final core metrics were selected based on the following two major criteria: (a) metrics with an inter-correlation coefficient of less than 0.8 were retained (Hering et al., 2006), and (b) for highly inter-correlated metrics (r > 0.8), DE, ecological importance, wider applicability, and degree of correlation with environmental parameters were considered.
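A redundancy screen of this kind might look as follows in Python. This is a sketch under assumptions rather than the authors' implementation: the helper names are hypothetical, ties receive average ranks, and the absolute value of the coefficient is compared with the 0.8 threshold (the text does not say whether strongly negative correlations were also treated as redundant).

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def redundant_pairs(metrics, threshold=0.8):
    """List (name_a, name_b, r) for metric pairs with |r| above the threshold."""
    names = sorted(metrics)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = spearman(metrics[a], metrics[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up values for three hypothetical metrics across five sites.
example = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [3, 1, 5, 2, 4]}
pairs = redundant_pairs(example)
```

Flagged pairs are only candidates for removal; as described above, the choice of which metric to keep rests on DE, ecological importance, and applicability. A metric that is constant across all sites has zero rank variance and would need to be excluded before calling `spearman`.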

Metrics transformation, aggregation, and threshold values

The final core metrics were standardized between 0 and 1 (Hering et al., 2006). For metrics that decrease with increasing degradation (e.g., taxa richness, EPT taxa), the 95th percentile of the data distribution of the metric was used as the upper anchor to eliminate extreme outliers. Values close to the 95th percentile receive higher scores, while values deviating further from this percentile receive lower scores. For metrics that increase in response to degradation (e.g., % Red Chironomidae, Hilsenhoff Family Biotic Index), the 5th percentile of the data distribution of the metric was used as the upper anchor. Transformation was performed using the following formulas:

For metrics that decrease with increasing disturbance:

$${\text{Value}} = \frac{{{\text{Metric result}} - 5{\text{th percentile of impaired sites}}}}{{95{\text{th percentile of reference sites }} - 5{\text{th percentile of impaired sites}}}}.$$

For metrics that increase with increasing disturbance:

$${\text{Value}} = \frac{{{\text{Metric result }}- 95{\text{th percentile of impaired sites}}}}{{5{\text{th percentile of reference sites }} - 95{\text{th percentile of impaired sites}}}}.$$

Whenever a standardized value exceeded 1 or became negative, it was set to 1 or 0, respectively. A value close to 1 represents high ecological status, and a value close to 0 represents bad ecological status. Such transformations are commonly applied to integrate different metrics into an MMI and to simplify the assessment of ecological status (Ofenboeck et al., 2004).
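Both formulas share one form: the metric result minus the lower anchor, divided by the distance between the upper and lower anchors, then truncated to the interval from 0 to 1. A minimal Python sketch, with a hypothetical function name and made-up anchor values, makes this explicit.

```python
def standardize(value, upper_anchor, lower_anchor):
    """Scale a metric result to 0..1 between its lower and upper anchors.

    Decreasing metrics: upper_anchor = 95th percentile of reference sites,
                        lower_anchor = 5th percentile of impaired sites.
    Increasing metrics: upper_anchor = 5th percentile of reference sites,
                        lower_anchor = 95th percentile of impaired sites.
    """
    score = (value - lower_anchor) / (upper_anchor - lower_anchor)
    # Values beyond the anchors are set to 1 or 0, as described in the text.
    return min(1.0, max(0.0, score))


# Made-up anchors for a decreasing metric (e.g. taxa richness).
mid_richness = standardize(12, upper_anchor=20, lower_anchor=4)

# Made-up anchors for an increasing metric (e.g. a percentage of
# tolerant taxa): a high raw value yields a low standardized score.
high_tolerant = standardize(60, upper_anchor=10, lower_anchor=50)
```

Because the anchors are passed in the same roles for both metric types, a single function covers the two equations; only the percentiles supplied as anchors differ.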

The numerical range of MMI (0–1) was divided into five ecological quality classes (high, good, moderate, poor, and bad) based on a literature survey and discussion of the data in a stepwise procedure.

  • Step 1 defines the lowest threshold value of the index for reference sites (high ecological quality class) through application of the 25th percentile approach (Munne and Prat, 2009). This approach was applied (1) to be sure that only truly unimpaired sites serve as a reference and (2) to prevent too many impaired sites with comparably high index values from being classified as reference sites.

  • Step 2 defines the upper threshold value for heavily impacted (bad ecological quality class) sites. The aim of this step was to clearly characterize the bad sites based on the observed index values, following the chemical water quality class approach of Moog & Sharma (2005) (total phosphorus, dissolved oxygen, and biochemical oxygen demand).

  • Step 3 applies the equidistance approach (Hering et al., 2006) to define the boundaries between the good–moderate, moderate–poor, and poor–bad river quality classes.
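The stepwise procedure can be illustrated with a short Python sketch. It assumes that the high–good threshold (Step 1) and the poor–bad threshold (Step 2) are already known and that the equidistance approach splits the interval between them into three equal parts. The function names are hypothetical, the example call uses the thresholds reported in the Results (0.75 and 0.15), and treating each boundary as the inclusive upper limit of the lower class is an assumption for the two intermediate boundaries.

```python
def class_boundaries(high_good, poor_bad):
    """Return [poor-bad, moderate-poor, good-moderate, high-good] boundaries."""
    step = (high_good - poor_bad) / 3.0
    return [poor_bad, poor_bad + step, poor_bad + 2 * step, high_good]


def quality_class(mmi, bounds):
    """Assign an ecological quality class to an MMI value in the range 0..1."""
    poor_bad, moderate_poor, good_moderate, high_good = bounds
    if mmi > high_good:
        return "high"
    if mmi > good_moderate:
        return "good"
    if mmi > moderate_poor:
        return "moderate"
    if mmi > poor_bad:
        return "poor"
    return "bad"


# Boundaries of approximately 0.15, 0.35, 0.55 and 0.75.
bounds = class_boundaries(high_good=0.75, poor_bad=0.15)
site_class = quality_class(0.60, bounds)
```

With these inputs the two intermediate boundaries fall at about 0.35 and 0.55, and an index value of 0.60 is assigned to the good class.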

The sensitivity and robustness of the MMI were evaluated by assessing its discrimination efficiency between reference and impaired sites. The percentage of reference sites correctly classified in the high ecological quality class was calculated for performance evaluation. Moreover, the strength of the relationship with relevant environmental variables and the stability of the index across different hydrological conditions were evaluated.

Results

Metric selection

Of the 75 potential metrics initially considered appropriate, only 39 showed a significant difference between reference and impaired sites according to the Mann–Whitney U test (P < 0.05). Most of these candidate metrics are widely recognized as being sensitive to various anthropogenic impacts. However, the sensitivity test based on box-and-whisker plots, followed by DE, identified 18 core metrics (seven richness, five composition, three sensitivity, and three feeding guild measures) (Table 2).

Table 2 Core metrics selected for the development of multimetric index, predicted response to degradation, sensitivity score, and discrimination efficiency

The redundancy test showed that four metrics, namely % COPTE, % EPT-BCH, Family Biotic Index, and % shredders, were not highly correlated with other metrics in their respective metric category (Spearman rank correlation, r < 0.8). Since these metrics are representative of most sites and fulfilled all metric selection procedures, they were selected with the highest priority for the MMI (Fig. 2). When two metrics are strongly correlated, it is not justifiable to automatically disregard one of them if the two metrics are known to represent different aspects of the benthic invertebrate community. On this basis, total taxa and number of EPT-BH > 1sp from the richness-diversity category, % Oligochaeta and Red Chironomidae from composition, ASPT from sensitivity, and % collector gathering from feeding group were retained because of their high DE, wide ecological representation, and applicability.

Fig. 2 Box-and-whisker plots of the nine core metrics integrated into the multimetric index. The line within each box represents the median, boxes represent the interquartile range (25th to 75th percentiles), and whiskers show the maximum and minimum non-outlier values

Environmental variables and final core metrics

The physical, chemical, and morphological characteristics were studied because in reference conditions they define the ambient quality in which aquatic biota have developed, and in impaired conditions they correspond to the environmental pressures on aquatic species. Most of these parameters were significantly different between reference and impaired sites (Table 3). The final core metrics showed strong correlations with most environmental variables (Table 4). Correlations with physicochemical and land use variables were stronger than those with variables related to substrate type, altitude, and catchment size. Dissolved oxygen showed a strong positive correlation with EPT-BH > 1sp and ASPT (r > 0.65, P < 0.05) and a strong negative correlation with % Oligochaeta and Red Chironomidae (|r| > 0.62, P < 0.05), which may indicate the importance of these metrics in responding to oxygen-depleting factors. Phosphorus concentration was significantly and strongly correlated with all core metrics in the predicted direction (|r| > 0.65, P < 0.05). Furthermore, strong correlations of core metrics with % forest (|r| > 0.5, P < 0.05) and % urban (|r| > 0.5, P < 0.05) showed the efficiency of the metrics in reflecting land use changes. With the exception of % fine sand, all substrate types showed either non-significant (P > 0.05) or weak (|r| < 0.5) correlations with core metrics, which suggests that the metrics respond more strongly to anthropogenic impacts than to habitat variability.

Table 3 Mean, standard deviation, and range (in brackets) of environmental parameters in reference and impaired sites
Table 4 Correlation between metrics selected for index integration and physical–chemical parameters using Spearman correlation (n = 104)

Threshold values of ecological quality class

The range of the MMI was divided into five ecological classes as described in the methodology. Following the 25th percentile approach of Munne and Prat (2009) to distinguish the high and good ecological quality classes, a value of 0.759 was obtained, and we set the threshold for the high class at values greater than 0.75. Since total phosphorus, dissolved oxygen, and BOD5 were identified as important environmental variables, the 75th percentile of index values in the bad chemical water quality class, averaged across these variables, was 0.149. Therefore, the upper threshold value for the bad river quality class was set at a value less than or equal to 0.15. The application of the equidistance approach resulted in values of 0.35 and 0.55 for the poor–moderate and moderate–good river quality class boundaries, respectively. To account for the effect of a different ecoregion on the threshold value, 0.72 was set as the high–good threshold for afro-alpine streams based on visual inspection of the data distribution and the physicochemical condition of the streams.

Discussion

The primary step in developing an MMI is the classification of sites into reference and impaired. This must be based on environmental features such as physical, chemical, hydro-morphological, and eco-geographical data, and in any case must be independent of the biological community. Although reference sites are defined as least impaired sites or sites with minimal human disturbance (Hughes, 1995; Hering et al., 2003; Stoddard et al., 2006), this condition is difficult to define in Ethiopia, where most sites are easily accessible to humans. We adapted and used 20 ‘a priori’ criteria to classify sites into two impairment levels for developing the MMI, and this appeared to be an efficient method. For example, 82% of the ‘a priori’ reference sites were classified as ‘high’ and 18% as ‘good’ river quality classes, and all impaired sites were distributed from ‘good’ to ‘bad’ river quality classes. Specific but frequent pressures that occur in the Ethiopian highlands, such as cattle watering points, washing, and rural settlements, are important criteria for characterizing reference sites. The absence of point sources of domestic and industrial pollution upstream of potential reference sites was a key criterion that should be fulfilled by all reference sites.

Taxonomic identification level is another important issue in biomonitoring and has been widely discussed in previous studies (Schmidt-Kloiber and Nijboer, 2004). In Ethiopia, where the taxonomic knowledge of benthic macroinvertebrates is still incipient, defining the taxonomic resolution is particularly important for an index. Many assessment methods prefer family-level identification for its rapidity and cost benefits (Thorne and Williams, 1997). However, it is often claimed that different species or genera belonging to the same family exhibit different tolerance levels (Stubauer and Moog, 2000). In the present study, we combined family level (e.g., Perlidae, Simuliidae) and genus level (e.g., genera of Hydropsychidae and Baetidae), and this appeared to be a good approach to improve the sensitivity of the MMI in the Ethiopian highlands. In some cases, identification to the species level was also used.

After a detailed metric selection procedure, nine core metrics were designated for developing the MMI. Most of these metrics have been widely used in developing MMIs in different climatic regions (Ofenboeck et al., 2004; Stoddard et al., 2006; Baptista et al., 2007; Ofenboeck et al., 2010). Taxa richness decreased substantially at polluted sites, and the species lost were usually pollution-intolerant. This supports the well-established view that sensitive species decline when water quality deteriorates (e.g., Barbour et al., 1996; Karr and Chu, 1999). We found elevated taxa richness in moderately impaired sites (good river quality class), which may be due to a moderate increase in nutrient availability from nearby agricultural activities, at a level that enhances taxa diversity and still allows sensitive taxa to survive. The inclusion of this metric in developing MMIs has been reported by previous authors (e.g., Ofenboeck et al., 2004).

EPT richness is a widely used metric type in many biomonitoring systems since its components (Ephemeroptera, Plecoptera, and Trichoptera) are well known for their sensitivity to anthropogenic impacts (Rosenberg and Resh, 1993). However, in the present study, some taxa of Baetidae and Hydropsychidae were frequently collected in large numbers from a range of impaired sites. The tolerance of Baetidae and Hydropsychidae to pollution has also been reported previously by different authors (e.g., Harrison and Hynes, 1988; Malicky and Graf, 2012). Thus, we adapted the EPT metric so that Baetidae and Hydropsychidae were counted only if they comprised more than a single species (EPT-BH > 1sp), which is more sensitive and discriminating than total EPT. The diversity of Plecoptera is low in Ethiopian highland streams, as was also reported by Harrison and Hynes (1988). In the present study, Plecoptera is represented only by a Neoperla sp. belonging to Perlidae. Many studies have reported that Perlidae are highly sensitive to environmental degradation (e.g., Fore et al., 1996; Graf et al., 2002), but Baptista et al. (2007) reported that some genera are partly tolerant to some pressures. Since Perlidae were mostly collected from ‘a priori’ reference sites characterized by natural riparian vegetation and undisturbed habitat with good physicochemical water quality, it is worthwhile to consider them as highly sensitive taxa for the assessment of Ethiopian highland streams.

Neither Chironomidae nor Oligochaeta was sufficiently sensitive to be used separately in the development of the MMI, but when the two were combined, the DE increased significantly. This could be due to the importance of these two groups of organisms in characterizing impacted sites. It was clearly observed that the percentage abundance of Oligochaeta and Red Chironomidae changed very markedly, from being rare at reference sites to being common at slightly polluted sites and abundant at heavily polluted sites. Although numerous studies have confirmed that Chironomidae abundance increases with increasing degradation (e.g., Fore et al., 1996; Botts, 1997), many authors have suggested identification to genus or species level when using them as water quality indicators (e.g., Kerans and Karr, 1994; Wymer and Cook, 2003). We found that Red Chironomidae together with Oligochaeta is a useful metric and, above all, that the routine identification can be done by non-taxonomists.

Although some studies in tropical areas have shown that metrics based on feeding types yield variable responses to human perturbation (e.g., Thorne and Williams, 1997), many authors have proposed the use of trophic measures for biomonitoring, particularly in multimetric systems (e.g., Barbour et al., 1996; Karr and Chu, 1999; Hering et al., 2006). In the present study, % shredders and % collector gathering were incorporated into the final index. A consistent decrease of shredders was observed with decreasing vegetation cover and water quality deterioration, which is in agreement with previous studies (Anderson et al., 1978; Ofenboeck et al., 2004). We also observed low % shredders in streams dominated by Eucalyptus sp. and Juniperus sp., which may be because the low nutritional quality and high lignin content of the leaves make them tough and unpalatable. In contrast, collector gathering increased with increasing degradation, as reported in Hilsenhoff (1988), and showed strong positive correlations with pollution indicator parameters such as TP, BOD5, and % urban (Table 4).

The other key issue for the success of biological monitoring is to establish sound threshold values for river quality classes in the region. The methods applied in the present study are efficient for streams in the Ethiopian grassland and woodland ecoregions (Table 5). A slightly lower threshold value needs to be established for the ‘high–good’ river quality class boundary in afro-alpine streams. This underlines the importance of ecoregions as useful spatial units for river typology (e.g., Moog et al., 2004), which can be explained in terms of natural factors that influence faunal structures and, as a consequence, the metrics. In the afro-alpine ecoregion of the Bale Highlands, we observed that the natural vegetation is dominated by short Erica forest, with the soil in between covered by various grass species. Thus, streams flowing in this region have naturally limited autochthonous and allochthonous food sources to support a high diversity of benthic macroinvertebrates. Furthermore, these streams are located at higher elevations (>2,600 m a.s.l.) characterized by low temperatures throughout the year. For example, the mean maximum temperature at Sanetti Plateau was between 6 and 12°C, while the mean minimum temperature was between 1 and 10°C (National Meteorological Agency (NMA), 2012), which may limit the diversity, reproduction, and growth of benthic macroinvertebrates.

Table 5 Proposed river quality class boundaries for Ethiopian highland streams in comparison with the ecological quality ratio (EQR) by Hering et al. (2006) for the European Water Framework Directive (WFD)

The development of an MMI requires testing for stability under natural seasonal variability (Barbour et al., 1996). Some studies on seasonal variability have supported the importance of this step (Sporka et al., 2006; Baptista et al., 2007). The newly developed MMI is stable across the different hydrological conditions of the dry and light rainy months of the year (Fig. 3).

Fig. 3 Response of multimetric index to different hydrological conditions (dry and light rainy months) of year 2011/12 under reference and impaired sites (agricultural, siltation, and paper mill waste)

In conclusion, the newly developed MMI is robust and sensitive to a wide variety of anthropogenic impacts, such as land use change and physical and chemical river degradation, in the central and southeastern highland rivers of Ethiopia. This index is an adequate monitoring tool for the study area and could be applied for establishing a biomonitoring network in comparable highlands of the country. We also believe that this index provides scientifically sound and reliable information for decision makers to develop and implement preservation and restoration measures.