Introduction

Rivers are ecologically important ecosystems, and their health is crucial to the human communities that depend on them (Dickens et al., 2018; Nguyen et al., 2018). The availability of clean and safe water is essential to maintaining a balanced ecosystem, ensuring the survival of all forms of life and promoting socio-economic development (United Nations Environment Programme, 2016). Freshwater ecosystems are among the most exploited ecosystems on the planet, not only because of their functionality but also because they are directly impacted by human activities in their watersheds (Malmqvist & Rundle, 2002). Among these environments, the inland waters of the Afro-tropical region are among the most threatened in the world, although they harbour rich biodiversity and a high degree of endemism (Barlow et al., 2018; Sayer et al., 2018).

In sub-Saharan Africa, particularly in Nigeria, rivers are under pressure from anthropogenic drivers such as population growth, agricultural activities, industrialisation and urbanisation. These pressures deteriorate water quality through increased nutrient concentrations, dissolved oxygen depletion and dissolved solid accumulation (Krynak & Yates, 2018), which hampers the availability of water for drinking and other uses (Kaboré et al., 2018; Keke et al., 2021; Parienté, 2017). In addition, these stressors are the main drivers of biodiversity loss (Krynak & Yates, 2018) and of the high prevalence of human waterborne diseases, requiring water resource managers and authorities to increase their efforts toward improved management and protection systems (Edegbene et al., 2021; Tafangenyasha & Dzinomwa, 2005). However, effective management of these water resources requires a good understanding of the effects of anthropogenic pollution on aquatic ecosystems.

Water quality management and monitoring in Nigeria still relies mainly on physical and chemical analyses and, to some extent, on basic analysis of stream biota (Arimoro et al., 2015). However, physical and chemical analyses are limited in their ability to provide comprehensive information on the deterioration status of water bodies. For example, many chemical and physical measurements only describe conditions at the time of sample collection, increasing the risk of not capturing sporadic outbreaks of pollutants (Holt & Miller, 2010). According to Bonada et al. (2006), the results of these analyses may be biased by season and sampling site and may not even take into account critical ecological conditions in the studied water bodies. Furthermore, physical and chemical analyses provide no information on the biological impacts of pollutants on aquatic organisms (Wan et al., 2018).

Moreover, physical and chemical analyses are relatively costly and require considerable analytical skills and financial resources, unlike biological analysis. To ensure sustainable development and management of these water resources, it is imperative to understand the interrelationship between human activities and the health of aquatic ecosystems, while at the same time developing reliable tools for monitoring river quality. Several monitoring programmes prioritise biological indicators due to their ability to take account of human disturbance both in space (local and global) and time (short and long term) (Davies & Jackson, 2006; Hughes, 2019).

Biological assessments aim to characterise the current state of water bodies by monitoring changes in aquatic communities associated with anthropogenic disturbances (Jun et al., 2012). These aquatic organisms are sensitive to a wide range of physical, chemical and biological impacts and can respond in a precise and gradual manner to pressures on the aquatic environment (Ertaş & Yorulmaz, 2022). However, these responses are variable depending on the type of community. Among these aquatic communities, macroinvertebrates are the most commonly used for biomonitoring due to their ubiquity, different levels of tolerance to disturbance and cost-effectiveness of sampling (Agboola et al., 2020; Ko et al., 2020; Ndatimana et al., 2023). Furthermore, the responses of these organisms to environmental disturbances are universally recognised, and their responses are generally used in the development of indices for monitoring the integrity of freshwater ecosystems (Ndatimana et al., 2023). Various environmental factors, such as dissolved oxygen (DO), biological oxygen demand (BOD), phosphorus compounds and nitrate nitrogen concentration, as well as riparian parameters such as water speed and depth, quality and quantity of available habitats and bank occupation have a direct and indirect impact on the diversity, composition and distribution of aquatic macroinvertebrates (Sripanya et al., 2022; Stevenson & Bahls, 2002). For example, a drastic decrease in the oxygen level in the aquatic environment caused by activities near the watercourse could lead to the proliferation of tolerant organisms compared to organisms more sensitive to pollution. In recent decades, several tools and approaches to biomonitoring based on macroinvertebrates have been developed to complement physical and chemical monitoring (biotic indices, functional groups, multivariate approaches, multimetric indices) (Odountan et al., 2019). 
However, among these various approaches, multimetric indices (MMIs) have proven to be highly effective because they integrate information and data from different dimensions of the aquatic biota and the overall ecosystem (Bonada et al., 2006). An MMI integrates several taxonomic measures (taxa richness, abundances, pollution tolerance/sensitivity, structural diversity) into a single index for the assessment of aquatic ecosystems (Edegbene et al., 2019; Shull et al., 2019).

Communities in the North Central region of Nigeria are highly dependent on river water for drinking, domestic use and watering livestock, among other services. However, anthropogenic activities are increasingly affecting water quality and the integrity of the riverine ecosystem. In addition to agricultural pollution, poorly functioning urban drainage systems and inadequate solid waste management also threaten these rivers (Arimoro et al., 2018). Unfortunately, very few biomonitoring tools or methods have been developed using an integrated approach for water resource management. Yet these integrated approaches can help to develop tools and strategies for long-term monitoring of water pollution. The aim of this study is to develop a macroinvertebrate-based biotic index, through a rigorous statistical process, for assessing the water quality of rivers in Niger State. The study specifically focuses on (1) categorising sites according to a disturbance gradient, (2) developing a multimetric index based on the macroinvertebrate communities of selected rivers in Niger State and (3) testing the performance of the newly developed index. It is anticipated that the newly developed index will support time- and cost-effective monitoring of rivers in Niger State and in the North Central Ecoregion of Nigeria at large.

Materials and methods

Study area

The study area is located in the North Central Region of Nigeria and lies between longitude 5.250–7.350°E and latitude 8.450–10.725°N. The study area covers approximately 23,600 km² and is characterised by two distinct seasons: the rainy season (April to October) and the dry season (November to March). The average annual rainfall ranges from 1000 to 1600 mm, and the mean monthly temperature ranges between 20 and 36 °C. The maximum temperature is recorded between March and June, while the minimum temperature is usually recorded between December and January. The North Central region lies within the savannah region, which is characterised by grasses, shrubs and trees (Agbelade et al., 2016; Keke et al., 2017). The region is drained by numerous rivers (Niger, Kaduna, Gurara, Gbako) and their tributaries, which support human activities such as agro-pastoral and industrial activities.

Sampling sites

Forty sites spread over 18 rivers were selected and sampled seasonally (dry and wet season) from 2016 to 2023. These were the Kaduna, Wushishi, Wuya, Baka-Jeba, Chanchaga, Chike, Gada, Gbako, Grigada, Gurara, Kataeeregi, Landzun, Musa, Penyan, Samu, Kemi, Mutundaya and Maitumbi Rivers (Fig. 1; Table 1). The main criteria for the selection of sampling sites were accessibility, surface area (channel morphology), stream permanence and potential sources of pollution. The catchment and riparian zones of the selected sites are characterised by agricultural activities, forestry, livestock farming, fishing and some industrial activities; highly urbanised areas, poorly functioning drainage systems and inadequate solid waste management were also observed in these zones.

Fig. 1 Map showing Nigeria (top left corner), Niger state (coloured in light green) and location of the study area including sampling sites, river network, land use/land cover and sub-catchments

Table 1 Sub-catchment characteristics and proportions of land use/land cover of the study area

Data collection

The 40 sampled sites were divided into a calibration dataset (sampled from January 2016 to December 2017) and a validation dataset (sampled from January 2022 to February 2023). The calibration dataset consisted of 30 sites spread over 15 streams (2 sites per stream); it was designated as such because its data were collected over a longer period than those of the validation dataset. The data collected at these sites were then stored in a digital database. The validation dataset consisted of the remaining ten sites.

Macroinvertebrates were sampled simultaneously with measurements of physical and chemical variables at each site on four occasions over a period of 1 year (two times in the dry season and two times in the wet season). Data collection was conducted over a short time interval within the same season to avoid results reflecting seasonal differences in macroinvertebrate communities (Leibold et al., 2014). The same methodological approaches to data collection (macroinvertebrates and physical and chemical) were applied to both datasets.

Physical and chemical parameters

A wide range of abiotic variables were recorded both in the field (in situ) and in the laboratory. Physical and chemical variables such as temperature, pH, electrical conductivity (EC) and dissolved oxygen (DO) were measured in situ using pre-calibrated devices (HANNA HI 991300/1 multiple probes, pH meter). The probes of these devices were immersed in the water, and the selection of the desired function allowed the value of the concerned parameter to be displayed on the screen. At each site, these measurements were carried out prior to macroinvertebrate sampling to avoid any disturbance of the environment that could bias the results.

Several riparian and in-stream variables such as canopy cover, macrophytes covering the streams, wood/logs, coarse particulate organic matter (CPOM), moss and substratum composition (bedrock and fine sediment) were estimated visually in terms of percentage at each site (Peck et al., 2006). In addition, each site investigated was evaluated using a score assigned according to the degree of disturbance to the banks. This score, ranging from 1 to 4, provided overall information on the level of disturbance in the vicinity of the sites, based on sensory characteristics of the immediate environment of the watercourse. These disturbances included waste deposits, sand or gravel excavation activity, defecation practices on the banks, proximity of roads or means of transport and bank stability. For each site, an average score considering the aforementioned parameters was defined as follows: highly disturbed (3–4), disturbed (2–3), moderately disturbed (1–2) and less disturbed (0–1).

Average mid-channel water velocity was determined from three replicates by timing the movement of a floating object over a distance of 10 m (Gordon et al., 2004). The depth was measured with a calibrated rod. Water samples were collected at each site in sterilised 1.5-L water bottles. Dark-coloured bottles were used for the samples intended for BOD5 measurement; they were wrapped in aluminium foil and sealed with adhesive tape to prevent light penetration. Light-coloured bottles were used for nitrate and phosphate (NO3 and PO4). The water samples were then stored in a container with ice and transported to the laboratory. Once in the laboratory, the bottles intended for BOD5 measurement were incubated at 20 °C for 5 days. The analysis of the samples started within 24 h of collection for nutrient concentrations (NO3 and PO4) and after 5 days for BOD5 (BOD5 = DOin-situ – DOlab). Nutrients were determined using an atomic absorption spectrophotometer and reagents. Water sampling and analysis followed the standard methods of the American Public Health Association (APHA, 2012).
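The two field calculations described above reduce to simple arithmetic. The following Python sketch is illustrative only: the function names and the readings are hypothetical, and it assumes that each timed run is converted to a velocity before averaging, which the text does not specify.

```python
from statistics import mean


def float_velocity(travel_times_s, distance_m=10.0):
    """Mean velocity (m/s) from replicate float travel times over a fixed distance."""
    # Assumption: each replicate is converted to a velocity, then averaged.
    return mean(distance_m / t for t in travel_times_s)


def bod5(do_initial_mg_l, do_day5_mg_l):
    """BOD5 (mg/L) as the drop in dissolved oxygen after 5 days of incubation."""
    return do_initial_mg_l - do_day5_mg_l


# Hypothetical readings for one site (not data from the study).
velocity = float_velocity([20.0, 25.0, 20.0])  # three timed runs over 10 m
oxygen_demand = bod5(7.5, 3.0)                 # in situ DO vs DO after incubation
print(f"velocity = {velocity:.3f} m/s, BOD5 = {oxygen_demand:.1f} mg/L")
```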

The percentage of land cover and land use in the study area was extracted and generated using QGIS 3.28 software from 10-m resolution satellite image data (Sentinel-2 L2A) (Karra et al., 2021). To do this, the surface areas of sub-catchments draining the sites and the land use categories were determined using QGIS software. In addition, a 500-m buffer zone was set around each sampling site to determine the areas of land use located around the sites. A total of five levels of land cover (agricultural areas, built-up areas, trees, natural vegetation and bare grounds) were defined for this study, and their surface areas were expressed as a percentage in order to identify the dominant categories for each sub-catchment and in the vicinity of the sites.

Macroinvertebrate sampling and identification

A Surber net with a 0.09-m² rectangular metal quadrat attached to a 250-μm mesh vacuum net was used to sample macroinvertebrates. Organisms were collected along a 100-m stretch, considering the water flow regime and the nature of the substrate. The collection method ensured that the three microhabitats (pools, riffles and runs) and all the different substrata (vegetation, sand, gravel biotopes, etc.) were included. To avoid any form of bias, three random samples were collected from each of the three microhabitats for a total of three times and then pooled to form one composite sample per site (Jeffries & Mills, 1990). The collected samples were preserved in labelled containers with a 70% ethanol solution and transported to the laboratory. In the laboratory, samples were sorted, separated, identified and counted using a stereomicroscope and identification keys (Arimoro & James, 2008; de Moor et al., 2003; Tachet et al., 2010). Considering the limitations in cost (both during MMI development and application) and in taxonomic knowledge in the region, we identified organisms and built metrics at a family-level resolution.

Data analysis

Pressure gradient and site disturbance classification

A multivariate approach was used to classify all the sites along a disturbance gradient. Principal component analysis (PCA) was implemented on the spatial data in order to distinguish the different levels of pollution induced by human activities at the sites explored. This analysis was used to reduce dimensionality and improve understanding of the variables (physical, chemical and riparian) likely to provide information on the state of these streams. The environmental variables taken into account for this analysis were selected following a Spearman correlation analysis to avoid collinearity between variables (|r| > 0.75) and to retain those commonly known to be good indicators of anthropogenic disturbances. The PCA was then applied to centred and standardised data. In addition, variables that contributed little to the construction of the first two axes of the PCA were excluded from the analysis. The first dimension of the PCA (PCA1), which contains the most information about the analysis, was selected as defining a disturbance gradient. The categorisation of sites was obtained by extracting the score (coordinate) for each site from axis 1 of the PCA. These scores were then standardised between 0 (worst condition) and 100 (good condition) according to the following equation:

$${\textrm{PCA}}_{\textrm{scaled}}=\frac{{\textrm{PCA}}_{\textrm{max}}-\textrm{PCA}}{{\textrm{PCA}}_{\textrm{max}}-{\textrm{PCA}}_{\textrm{min}}}\times 100$$
(1)

where PCA is the score to be standardised and PCAmin and PCAmax are, respectively, the minimum and maximum PCA1 scores for all sites. Following this standardisation, the 80th and 50th percentiles of the distribution of these scores were used as threshold values, firstly to identify sites that were as close as possible to natural conditions (least disturbed sites, LDS) and secondly to differentiate moderately disturbed sites (MDS) from highly disturbed sites (HDS). Thus, the distribution of standardised PCA1 scores for each level of disturbance of the sites was as follows: PCAscaled ≥ 80th percentile (LDS), 50th percentile ≤ PCAscaled < 80th percentile (MDS) and PCAscaled < 50th percentile (HDS).
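As an illustration of Eq. (1) and the percentile thresholds, the following Python sketch rescales a set of PCA1 scores and assigns each site to a disturbance class. The scores are hypothetical, the function names are not taken from the study, and the percentile is assumed to use linear interpolation between order statistics (the study's analyses were run in R, whose exact settings are not reported here).

```python
def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def scale_pca1(scores):
    """Eq. (1): rescale PCA1 scores to 0 (worst condition) .. 100 (good condition)."""
    pca_min, pca_max = min(scores), max(scores)
    return [(pca_max - s) / (pca_max - pca_min) * 100 for s in scores]


def classify_sites(scaled):
    """Assign LDS / MDS / HDS from the 80th and 50th percentiles of scaled scores."""
    p50, p80 = percentile(scaled, 50), percentile(scaled, 80)
    return ["LDS" if s >= p80 else "MDS" if s >= p50 else "HDS" for s in scaled]


# Hypothetical PCA1 scores for five sites (higher score = more disturbed).
pca1_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
scaled_scores = scale_pca1(pca1_scores)
site_classes = classify_sites(scaled_scores)
print(list(zip(scaled_scores, site_classes)))
```

Because PCA1 increases with disturbance, the numerator PCAmax − PCA inverts the axis so that the least disturbed site receives 100.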

A similar approach using principal component analysis as a means of categorising sites according to a disturbance gradient has already been applied in previous studies (Angradi et al., 2009; Edegbene et al., 2019; Ofogh et al., 2023). In the present study, the term “reference sites” was used to designate the least disturbed sites and the term “impaired sites” to designate the moderately and highly disturbed sites. The principal component analysis was implemented using the “factoextra” package of the statistical software R (Kassambara & Mundt, 2021; R Core Team, 2021).

Macroinvertebrate assemblages

Analysis of similarities (ANOSIM) was used to detect significant (p < 0.05) differences in macroinvertebrate community composition between site categories and seasons. In order to identify which macroinvertebrate taxa contributed most to spatial or seasonal differences, similarity percentage analysis (SIMPER) was performed using the Bray-Curtis dissimilarity measure, with 9999 permutations. These analyses were carried out using the “vegan” package of the R statistical software (Oksanen et al., 2020).
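For readers unfamiliar with the dissimilarity measure underlying these analyses, the following Python sketch computes the Bray-Curtis dissimilarity between two samples. It is a minimal illustration with hypothetical abundances, not a reimplementation of the permutation tests; it assumes both samples list the same taxa in the same order and that at least one individual was collected.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors (same taxon order)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same taxa in the same order")
    absolute_difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    total_abundance = sum(sample_a) + sum(sample_b)
    return absolute_difference / total_abundance


# Hypothetical family-level counts at two sites (three taxa).
site_1 = [10, 0, 5]
site_2 = [5, 5, 5]
dissimilarity = bray_curtis(site_1, site_2)
print(f"Bray-Curtis dissimilarity = {dissimilarity:.3f}")
```

The value is 0 for identical samples and 1 for samples that share no taxa.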

MMI development

Selection of candidate metrics

A total of 88 candidate metrics were selected for the development of the multimetric index. These metrics, chosen to represent various aspects of the aquatic macroinvertebrate community, were selected based on insights from previous tropical studies, including those by Odume et al. (2012), Mereta et al. (2013), Silva et al. (2017), Edegbene et al. (2019) and Edegbene et al. (2021), among others. The choice of these metrics was based on their ability to detect the effects of human activities on macroinvertebrates and to distinguish the most disturbed from the least disturbed sites (Tomanova et al., 2008). The metrics selected were from four groups: abundance and composition, richness, diversity indices and functional characteristics (traits) (Table SI1).

Abundance and composition were assessed by examining the relative proportion of taxonomic groups (classes, orders, families) expressed in numbers of individuals (such as Ephemeroptera abundance, Mollusca abundance, Odonata abundance, Diptera abundance) or as a percentage of the total number of individuals sometimes combined into larger groups at different taxonomic resolutions in the sample (such as %Hemiptera, %EPT, EPT/Chironomidae, Molluscs+Diptera abundances). Taxonomic richness, which refers to the number of taxa present in a sample, was calculated by determining the number of macroinvertebrate families within an order (such as Ephemeroptera richness, Diptera richness, Odonata richness, EPT richness) or class (Mollusca richness and Decapoda richness). Diversity metrics (Shannon Wiener, Simpson, Evenness, Margalef and Pielou index) were obtained using the PAST software (Hammer & Harper, 2001). Finally, for the traits selected as candidate metrics (feeding habits, mobility and body size), fuzzy coding ranging from 0 to 3 (0 no affinity to 3 high affinity) was used to assign scores based on the affinities of the taxa for each trait attribute (Table SI2). This approach has the advantage of considering the different levels of information available, the adaptability of organisms and the different phases of their life cycles (Chevenet et al., 1994). The scores obtained were then expressed in terms of frequencies for each category of traits. The percentages of candidate metrics were obtained by multiplying these frequencies by the abundance of individuals in the sample concerned. Trait information was adapted from the existing South African database (Odume et al., 2018).
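The trait-based metrics can be illustrated with a short Python sketch. It converts fuzzy affinity scores into frequencies per taxon, weights them by abundance and expresses each trait category as a percentage of the sample. The taxa, scores and counts are hypothetical, the function name is not from the study, and the final normalisation to percentages is an assumption about how the weighted frequencies were summarised; each taxon is assumed to have at least one non-zero score.

```python
def trait_percentages(affinities, abundances):
    """Abundance-weighted percentage of each trait category in one sample.

    affinities: {taxon: {category: fuzzy score 0..3}} for a single trait.
    abundances: {taxon: number of individuals in the sample}.
    """
    weighted = {}
    for taxon, scores in affinities.items():
        total_score = sum(scores.values())
        for category, score in scores.items():
            frequency = score / total_score  # fuzzy scores -> frequencies
            weighted[category] = weighted.get(category, 0.0) + frequency * abundances[taxon]
    total = sum(weighted.values())
    return {category: 100 * value / total for category, value in weighted.items()}


# Hypothetical body-size affinities and counts for two families.
body_size = {
    "Family A": {"small": 3, "very large": 1},
    "Family B": {"small": 0, "very large": 2},
}
counts = {"Family A": 10, "Family B": 30}
percentages = trait_percentages(body_size, counts)
print(percentages)
```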

Metric screening

The development of an efficient multimetric index requires a robust statistical process to avoid any loss of biological information. To this end, the detection of core metrics was carried out on the calibration data (30 sites) according to a stepwise process as recommended by Barbour et al. (1996), Hering et al. (2006) and Baptista et al. (2007).

(1) Range test and stability: This test was used firstly to eliminate metrics with low amplitudes of variation or containing many zero values. As a result, richness and percentage metrics with ranges of less than 5 and 10%, respectively, were excluded from the process (Klemm et al., 2002). Secondly, the coefficient of variation (CV = sd/mean) of each metric in the group of least disturbed sites was calculated. Metrics with low CV values (CV < 0.5) were considered stable and therefore retained (Shiyun et al., 2017); those not meeting this criterion were excluded from the process.

(2) Metric sensitivity: Candidate metrics were tested for their discriminatory potential between reference sites and impaired sites. A Mann-Whitney comparison test between the two groups of sites was performed to assess the sensitivity of the metrics (p < 0.05). A sensitivity score based on the degree of overlap of the interquartile ranges (IQRs) was then assigned to each metric (3 = no overlap of IQRs; 2 = some overlap of IQRs but both medians are outside the overlap; 1 = moderate overlap of IQRs but at least one median is outside the overlap; and 0 = extensive overlap of IQRs or both medians within the overlap). Metrics with a p value < 0.05 in the Mann-Whitney test and an IQR score ≥ 2 were considered to be sensitive and retained for further analysis (Baptista et al., 2007). In addition, metrics that passed the sensitivity test but responded contrary to predictions were discarded from the final metric detection process (Cao et al., 2007). Indeed, it would be difficult to argue that such a response reflects biological alteration in the study area.

(3) Redundancy: To ensure that each metric included in the final index provides new information, and to facilitate the decision on which metrics to retain in the event of redundancy, a cluster analysis using Spearman’s correlation coefficient as a measure of similarity and “ward.D2” as the clustering method was implemented to measure metric redundancy. This test was applied exclusively to reference sites to avoid the effects of stress factors on redundancy between metrics (Shiyun et al., 2017). The discrimination efficiency (DE) of these metrics was also calculated. For positive metrics (assumed to decrease with increasing pollution), the DE was measured as the percentage of impaired sites with a metric value < 25th percentile of reference site values. For negative metrics (assumed to increase with increasing pollution), the DE was measured as the percentage of impaired sites with values > 75th percentile of reference site values.

$$\textrm{DE}=\frac{{\textrm{n}}_{\textrm{imp}}}{{\textrm{N}}_{\textrm{imp}}}\times 100$$
(2)

where nimp denotes the number of impaired sites with values below the 25th percentile (positive metrics) or above the 75th percentile (negative metrics) of reference sites and Nimp the total number of impaired sites. Finally, for each group of correlated metrics (Spearman correlation: |r| ≥ 0.75) within a cluster, those with the highest DE were selected (Chen et al., 2014). When the discrimination efficiency was the same for correlated metrics, the most stable metric (lowest CV) was selected. Cluster analysis was carried out using the “factoextra” package of the statistical software R (Kassambara & Mundt, 2021; R Core Team, 2021).

(4) Relationship with abiotic variables: Finally, the correlation between the non-redundant metrics and the environmental variables was assessed using Spearman’s correlation test, because a multimetric index should be able to indicate a potential stressor-specific relationship (Klemm et al., 2002). Metrics significantly correlated (p < 0.05) with at least one of the abiotic parameters were considered to respond to a disturbance gradient and were thus included in the final MMI.
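Two of the screening quantities used in the steps above, the coefficient of variation and the discrimination efficiency of Eq. (2), can be sketched in Python as follows. The metric values are hypothetical and the function names are not from the study. The sketch assumes the sample standard deviation, a linear-interpolation percentile, and the conventional reading of DE in which a metric that decreases with disturbance is discriminated when impaired values fall below the reference 25th percentile (above the 75th percentile for a metric that increases).

```python
from statistics import mean, stdev


def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)


def coefficient_of_variation(values):
    """CV = sd / mean, here with the sample standard deviation (an assumption)."""
    return stdev(values) / mean(values)


def discrimination_efficiency(reference, impaired, decreases_with_disturbance=True):
    """Eq. (2): percentage of impaired sites beyond the reference quartile."""
    if decreases_with_disturbance:
        threshold = percentile(reference, 25)
        n_imp = sum(value < threshold for value in impaired)
    else:
        threshold = percentile(reference, 75)
        n_imp = sum(value > threshold for value in impaired)
    return 100 * n_imp / len(impaired)


# Hypothetical values of one positive metric (e.g. a richness metric).
reference_values = [8, 9, 10, 11, 12]
impaired_values = [3, 4, 5, 6, 7, 8, 9, 10, 2, 1]

cv = coefficient_of_variation(reference_values)
de = discrimination_efficiency(reference_values, impaired_values)
print(f"CV = {cv:.3f} (stable if < 0.5), DE = {de:.0f}%")
```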

Index construction and ecological water quality classes

The continuous scoring method was used to calculate MMI scores due to its sensitivity and lower variability as observed in previous studies (Blocksom, 2003; Stoddard et al., 2008). Due to the different value ranges of the selected metrics, they were normalized using the 5th and 95th percentiles to rescale them to a score between 0 and 10 by interpolating the measurements between the floor and ceiling values (Vander Laan et al., 2013). The advantage of this approach is that it minimises the influence of outliers that could alter the analysis of the metrics (Fierro et al., 2018). The scoring procedure for the Niger State Rivers Biotic Index (NSRBI) consists of three steps using two formulas:

(1) Computing all final metrics for all sites.

(2) Standardising metric values to a 0–10 scale by interpolating only the metric values lying between the floor (5th percentile) and ceiling (95th percentile) values. Metric values outside the floor/ceiling range were scored 0 or 10 (for positive metrics, values below the floor scored 0 and those above the ceiling scored 10; the opposite applied to negative metrics). Thus, for metrics that responded negatively to disturbance (positive metrics), the 95th percentile of the reference (least disturbed sites) values was taken as the ceiling and the 5th percentile of all site values as the floor (Formula a). Conversely, for metrics that responded positively to disturbance (negative metrics), the 5th percentile of the reference values was taken as the floor and the 95th percentile of all site values as the ceiling (Formula b).

Formula a:

$$\textrm{positive}\ \textrm{metrics}=10\times \frac{\textrm{metric}\ \textrm{result}-\textrm{floor}}{\textrm{ceiling}-\textrm{floor}}$$
(3)

Formula b:

$$\textrm{negative}\ \textrm{metrics}=10\times \left[1-\left(\frac{\textrm{metric}\ \textrm{result}-\textrm{floor}}{\textrm{ceiling}-\textrm{floor}}\right)\right]$$
(4)

(3) The final NSRBI value, normalised to a scale of 0 to 100, was obtained by summing the final metric scores (values scaled from 0 to 10) at each site and multiplying this sum by 10/n, where n is the number of final metrics.
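The three scoring steps can be sketched in Python as follows. This is a minimal illustration of Formulas a and b and of the 10/n rescaling; the function names and the numerical values are hypothetical, and the floor and ceiling are passed in directly rather than computed from percentiles.

```python
def score_metric(value, floor, ceiling, positive=True):
    """Scale one metric value to 0..10 between its floor and ceiling.

    positive=True  -> Formula a (metric decreases with disturbance).
    positive=False -> Formula b (metric increases with disturbance).
    """
    fraction = (value - floor) / (ceiling - floor)
    fraction = min(1.0, max(0.0, fraction))  # values outside floor/ceiling are capped
    return 10 * fraction if positive else 10 * (1 - fraction)


def nsrbi(metric_scores):
    """Sum the 0..10 metric scores and rescale to 0..100 with the factor 10/n."""
    return sum(metric_scores) * 10 / len(metric_scores)


# Hypothetical floor/ceiling values and one site's metric values.
site_scores = [
    score_metric(12, floor=2, ceiling=10),                 # above ceiling -> 10
    score_metric(6, floor=2, ceiling=10),                  # midway -> 5
    score_metric(12, floor=2, ceiling=10, positive=False), # above ceiling -> 0
    score_metric(4, floor=2, ceiling=10, positive=False),  # -> 7.5
]
index_value = nsrbi(site_scores)
print(site_scores, index_value)
```

With four final metrics, the factor 10/n equals 2.5, so a site scoring 10 on every metric reaches the maximum of 100.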

To assess the ecological state of Niger State rivers, we defined four categories (good, fair, poor, very poor) that reflect different levels of ecological quality. The category boundaries were based on the scores of the reference sites in the calibration data. The 25th percentile of the calibration reference site scores was calculated and taken as the “good-fair” boundary. For the “fair-poor” and “poor-very poor” boundaries, the scoring range between the minimum score (0) and the “good-fair” boundary (25th percentile of the calibration site scores under reference condition) was divided into three equal classes.
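The boundary rule can be checked with a short Python sketch. It takes the “good-fair” boundary of 75.78 reported in the Results and divides the range below it into three equal classes. The function names are hypothetical, and the treatment of a score that falls exactly on a boundary (assigned to the better class) is an assumption.

```python
def class_boundaries(good_fair):
    """Split the range 0..good_fair into three equal classes below 'good'."""
    step = good_fair / 3
    return {"poor/very poor": step, "fair/poor": 2 * step, "good/fair": good_fair}


def quality_class(score, good_fair):
    """Assign an ecological quality class to an index score (0..100)."""
    bounds = class_boundaries(good_fair)
    if score >= bounds["good/fair"]:
        return "good"
    if score >= bounds["fair/poor"]:
        return "fair"
    if score >= bounds["poor/very poor"]:
        return "poor"
    return "very poor"


# 75.78 is the 'good-fair' boundary reported in the Results section.
boundaries = class_boundaries(75.78)
print({name: round(value, 2) for name, value in boundaries.items()})
print(quality_class(56.25, 75.78))
```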

Index performance and precision

The sensitivity and the performance of the NSRBI were assessed on 10 impaired sites (1 moderately disturbed and 9 highly disturbed). These test sites were not included in the index development process. The sensitivity of the NSRBI was assessed using the pressure scores of the test sites on the first axis of the PCA used to classify the sites (see section “Site disturbance classification”). A simple linear regression model (Y, NSRBI scores of test sites; X, scores of test sites on PCA1) was implemented to measure the degree of association of the index with the gradient of disturbance of the streams. Index performance was assessed in terms of precision for each group of impaired sites. The correct classification percentage (CCP) was estimated for MDS as the percentage of streams evaluated as fair or poor, whereas the CCP of HDS was calculated as the percentage of streams evaluated as poor or very poor in the whole group (Zhang et al., 2019). Simple linear regression was performed in the R software environment and regression plotting was performed using the R package “ggplot2” (R Core Team, 2021; Vaissie et al., 2021).
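The correct classification percentage described above can be sketched in Python as follows. The class assignments are hypothetical and do not reproduce the study's test sites; the function and variable names are illustrative only.

```python
# Quality classes counted as a correct classification for each disturbance group.
ACCEPTED_CLASSES = {
    "MDS": {"fair", "poor"},
    "HDS": {"poor", "very poor"},
}


def correct_classification_percentage(group, assigned_classes):
    """Percentage of sites in a disturbance group whose index class is accepted."""
    accepted = ACCEPTED_CLASSES[group]
    n_correct = sum(quality in accepted for quality in assigned_classes)
    return 100 * n_correct / len(assigned_classes)


# Hypothetical index classes for four highly disturbed test sites.
hds_classes = ["poor", "very poor", "poor", "fair"]
ccp_hds = correct_classification_percentage("HDS", hds_classes)
print(f"CCP (HDS) = {ccp_hds:.1f}%")
```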

Results

Site disturbance classification

For all 21 environmental variables considered to provide information on the existence of a disturbance gradient at the sites investigated, no redundant variables were observed (Table SI3). Of these 21 variables, only 11 contributed effectively to the construction of the first two axes of the PCA (Fig. 2). These first two axes expressed 67.8% of the total variance, i.e. 46.5% for axis 1 and 21.3% for axis 2 (Fig. 3). Axis 1 of the PCA was negatively correlated with dissolved oxygen concentration and %canopy cover. In contrast, pH, conductivity, nitrate concentration and the level of disturbance of the physical habitat (RD_score) were positively correlated with this axis, indicating the existence of a disturbance gradient. The standardised scores for the sites along the first dimension of the PCA are presented in Table 2.

Fig. 2 Contribution of environmental variables to PCA dimension 1 and 2 constructions for site classification

Fig. 3 PCA biplot of the environmental variables that contribute most to the construction of the PCA axes for site classification

Table 2 Standardised scores of sampling sites and their classification based on PCA axis 1 scores

Of the 40 sites investigated, 8 were categorised as less disturbed (reference sites) and 32 sites as impaired (12 moderately disturbed and 20 severely disturbed). The reference sites were characterised by significantly low values for conductivity, pH, BOD5 and nitrate concentration and significantly high values for dissolved oxygen. In addition, these sites were located in areas with a zero percentage of urbanised area and recorded the lowest scores for disturbance of the physical habitat in their riparian zones (Table 3). In contrast, the highest values for these parameters were observed at the impaired sites.

Table 3 Range and median values of physical and chemical variables, substrate composition, riparian measurements and land cover/land use of the impaired (n=32) and reference sites (n=8)

Macroinvertebrate assemblages

A total of 105 taxa were collected at the various sites. The most dominant macroinvertebrate orders were Diptera (30.25% of total abundance), Ephemeroptera (15.45%), Odonata (11.69%), Coleoptera (11.31%) and Tubificida (9.88%). The ANOSIM results showed that the composition of macroinvertebrate communities differed significantly between the two categories of sites (R = 0.23, p = 0.0254). According to the SIMPER analysis, Chironomidae, Naididae, Baetidae, Dytiscidae, Gyrinidae, Coenagrionidae, Atyidae and Libellulidae contributed most (> 50% of the cumulative contribution) to the difference between the impaired sites and the reference sites. These taxa were predominant at the impaired sites. Furthermore, ANOSIM revealed no significant difference (R = −0.055, p = 0.951) in the composition of macroinvertebrate communities between the dry and wet seasons.

Metrics evaluation

Of the 88 candidate metrics initially considered, the range and stability test (CV < 0.5) selected 64 for further analysis. Of these, 14 metrics showed a significant ability to differentiate between impaired sites and reference sites according to the results of Mann-Whitney test (p<0.05) and sensitivity score (scores ≥ 2). However, of these 14 metrics, five (Odonata abundances, ETOC abundances, Col+Hem abundances, EPTC abundances and total Taxa-Chi abundances) were excluded from the process because they showed unexpected responses to the predictions made. In fact, these metrics, which were supposed to decrease with increasing pollution, were predominant at impaired sites. The cluster analysis then grouped the nine remaining metrics into six groups (Fig. 4).

Fig. 4 Cluster dendrogram based on the “Ward.D2” method, using Spearman rank correlation as the similarity measure, of the metrics retained for the redundancy test; height represents the degree of association between the metrics, with a low height indicating a high degree of similarity; metric codes are explained in appendix Table 7

The DE of these metrics ranged from 64 to 95% (Table 7 in appendix). First, the evenness index (dm3), Shannon index (dm2) and %Baetidae (cm24) metrics were excluded because their DE was lower than that of the metrics with which they were correlated in their group. Then, the Simpson index (dm1) and %burrowing individuals (tm9) metrics were excluded because they were correlated with the %Hemiptera (cm11) and %very large individuals (tm11) metrics, respectively (Table 4). The DE of the dm1 metric (82%) was lower than that of cm11 (95%). As the DE of the tm9 metric was identical to that of the tm11 metric, tm11 was selected because it was more stable (CV: 0.247) than tm9 (CV: 0.457). The four remaining metrics (Pielou’s equitability index, Diptera richness, %Hemiptera and %individuals with very large body size) were each significantly correlated with at least one of the environmental variables (Table 5) and were therefore included in the final multimetric index. The ability of the final metrics to distinguish reference sites from impaired sites is shown in Fig. 5.
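The redundancy rule described above (keep the metric with the higher DE; if the DE values are tied, keep the more stable one) can be written as a short sketch. The dictionary layout and function name are illustrative. The DE values of dm1 and cm11 and the two CV values are those quoted in the text, whereas the shared DE of the tied pair is a placeholder, since only its equality is reported.

```python
def pick_from_correlated_pair(a, b):
    """Return the metric to keep from two correlated metrics.

    Higher discrimination efficiency (DE) wins; if DE is tied,
    the lower coefficient of variation (CV) wins.
    """
    if a["de"] != b["de"]:
        return a if a["de"] > b["de"] else b
    return a if a["cv"] <= b["cv"] else b

# DE values quoted in the text; CVs are not needed because the DEs differ.
dm1 = {"code": "dm1", "de": 82.0, "cv": None}
cm11 = {"code": "cm11", "de": 95.0, "cv": None}

# The text only says these two DE values were identical, so TIED_DE is a
# placeholder and not a reported value. The CVs are those quoted in the text
# for the excluded burrower metric and the metric retained in its place.
TIED_DE = 90.0
tm9 = {"code": "tm9", "de": TIED_DE, "cv": 0.457}
tm11 = {"code": "tm11", "de": TIED_DE, "cv": 0.247}

kept = [pick_from_correlated_pair(dm1, cm11)["code"],
        pick_from_correlated_pair(tm9, tm11)["code"]]
print(kept)
```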

Table 4 Spearman correlation matrix between sensitive metrics
Table 5 Spearman correlation test between final metrics and environmental variables
Fig. 5 Box and whisker plots of each of the four selected metrics integrated in the final index

Metrics scoring

Upper (ceiling) and lower (floor) thresholds were established for each metric using the values of all sites and of the least disturbed sites in the calibration data (Table 6). The possible index scores range from 0 to 100; this range was obtained by multiplying the sum of the scores of the four final metrics at each site by 10/4. The range of the NSRBI was subdivided into four quality classes corresponding to different ecological conditions (Fig. 6): good (score ≥ 75.78), fair (50.52 to 75.78), poor (25.26 to 50.52) and very poor (score < 25.26).

Table 6 Lower (floor), upper (ceiling) thresholds and formulas for the calculation of NSRBI metrics
Fig. 6 Ecological quality classes for NSRBI. The green line indicates the threshold between “good” and “fair” quality classes, the orange line indicates the threshold between “fair” and “poor” quality classes and the red line is the boundary between “poor” and “very poor” quality classes
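The aggregation and classification steps can be sketched as follows. The class boundaries and the 10/4 weighting come from the text. The metric scoring function is an assumption: it uses a linear floor-to-ceiling rescaling to 0–10, which is a common approach but may differ from the formulas in Table 6. Assigning a score of exactly 50.52 to "fair" is also an assumption, since the text leaves that boundary open.

```python
def score_metric(value, floor, ceiling, decreases_with_disturbance=True):
    """Rescale a metric value to 0-10 between its floor and ceiling (assumed formula)."""
    if ceiling == floor:
        raise ValueError("floor and ceiling must differ")
    if decreases_with_disturbance:
        raw = 10.0 * (value - floor) / (ceiling - floor)
    else:
        raw = 10.0 * (ceiling - value) / (ceiling - floor)
    return max(0.0, min(10.0, raw))  # clip to the 0-10 range

def nsrbi(metric_scores):
    """Sum of the four metric scores (0-10 each) weighted by 10/4, giving 0-100."""
    if len(metric_scores) != 4:
        raise ValueError("the index is built from four metric scores")
    return sum(metric_scores) * 10.0 / 4.0

def quality_class(index):
    """Map an index score to the four quality classes given in the text."""
    if index >= 75.78:
        return "good"
    if index >= 50.52:
        return "fair"
    if index >= 25.26:
        return "poor"
    return "very poor"

# Hypothetical metric scores for one site (not data from the study)
site_index = nsrbi([8.0, 6.0, 7.0, 5.0])
print(site_index, quality_class(site_index))
```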

Index validation and application

The first factorial axis, which carried the largest share of the PCA information (46.5% of the total variance explained), was strongly and significantly negatively correlated with the NSRBI scores (r = −0.776, p < 0.001), revealing a significant response of the index to the river disturbance gradient (Fig. 7a). The index performance results were as expected for MDS, classifying this site in the “Fair” ecological category. Overall, the application of the index to the test data provided a CCP of 90% (Fig. 7b).

Fig. 7 Simple linear regression between NSRBI scores and PCA axis 1 factor scores (a) and NSRBI scores at test sites (b); coloured lines in panel b indicate the ecological water quality thresholds (green, good–fair; orange, fair–poor; red, poor–very poor)
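The two validation quantities can be illustrated with a short sketch. It assumes that r is a Pearson correlation coefficient and that CCP denotes the percentage of test sites whose index-based class matches the class expected a priori; neither definition is given in this section. All values in the example are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correct_classification_percentage(expected, predicted):
    """Share of sites (in %) whose predicted class equals the expected class."""
    hits = sum(e == p for e, p in zip(expected, predicted))
    return 100.0 * hits / len(expected)

# Hypothetical example: index scores falling as the disturbance axis rises
r = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])

# Hypothetical classes for ten test sites, nine of them matching
expected = ["impaired"] * 8 + ["reference"] * 2
predicted = ["impaired"] * 8 + ["reference", "impaired"]
ccp = correct_classification_percentage(expected, predicted)
print(r, ccp)
```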

Discussion

Site classification

The development of a robust multimetric index requires an appropriate choice of reference sites. This choice, made independently of the biological assessment, must reflect natural ecological patterns and processes, i.e. conditions in the absence of human activity, as emphasised by Stoddard et al. (2008) and Jun et al. (2012). However, unaltered reference conditions are rare, particularly in the study region, where several studies have documented pollution, excessive nutrient inputs and alterations to the hydro-morphology of streams attributed to human activities (Arimoro & Keke, 2021; Keke et al., 2015; Yisa & Tijani, 2010). In most cases, the sub-catchments investigated were dominated by agricultural activities and, to a lesser extent, urban areas. It was therefore necessary to adopt alternative approaches to determine the reference conditions, as suggested by Barbour et al. (1996), Hering et al. (2006) and Kaboré et al. (2018). Thus, a reference state can be defined as the usual condition of a set of sites considered to be the least altered, sharing similar trends in their physical, chemical and biological characteristics. The classification of the sites based on principal component analysis (PCA) of the abiotic variables clearly differentiated the least disturbed sites from the most disturbed sites. The impaired sites were characterised by significantly higher values of parameters whose increase reflects stress on streams (EC, BOD5, NO3 and PO4), together with a major disturbance of the physical habitat (RD_score). These high values reflect a marked human influence, probably due to intensive agricultural activities or urban pressures.
These results contrast with those obtained at the least disturbed sites (reference sites), which were characterised by relatively higher oxygenation, virtually no urbanisation nearby and less hydro-morphological and riparian alteration, highlighting a significant difference in water quality. Of the 40 sites investigated, only 8 (20%) were categorised as reference sites for the development of the multimetric index, which is relatively low compared with other studies carried out in Africa (Alemneh et al., 2019; Assefa et al., 2023; Mereta et al., 2013; Tampo et al., 2020). These findings reinforce the idea that human activities have a substantial impact on water quality (Carpenter & Bennett, 2011; Keke et al., 2017; Smith & Siciliano, 2015) and underline the need for cost- and time-efficient tools and methods to detect levels of disturbance in streams and highlight the associated trends.

Metrics selection and MMI development

Four metrics representing each of the above dimensions of the aquatic macroinvertebrate community were identified as potential metrics for the construction of the final multimetric index. The inclusion of these various dimensions of biological systems not only provides a better account of anthropogenic disturbance on the aquatic ecosystem, but also ensures that the MMI is representative (Huang et al., 2015).

The percentage of Hemiptera was one of the four metrics chosen for the final index. The analyses revealed a marked trend, with a higher proportion of Hemiptera at the reference sites than at the impaired sites. In addition, this metric was significantly and negatively correlated with conductivity. This finding is consistent with previous studies by Mary (1999), Millán et al. (2011), Carbonell et al. (2011) and Assefa et al. (2023), in which the distribution and abundance of Hemiptera families were negatively affected by salinity and conductivity. The metrics “EPT abundance” and “percentage of EPT individuals”, recognised as excellent indicators in biological assessments (Arimoro & Muller, 2010; Klemm et al., 2002; Ofenböck et al., 2004; Rosenberg & Resh, 1993), were excluded from the NSRBI because of their limited discriminatory power. This limitation could be attributed to the dominance of certain Ephemeroptera taxa at impaired sites. In this study, the ANOSIM and SIMPER analyses showed that Baetidae, which were among the taxa that contributed most to the dissimilarity in macroinvertebrate community composition between the two categories of sites, were predominant at the impaired sites. These results are in line with Beketov (2004), Baptista et al. (2007) and Rodier et al. (2009), who reported that certain families of macroinvertebrates belonging to the orders Ephemeroptera (Baetidae and Caenidae) and Trichoptera (Hydropsychidae) have developed a resistance to pollution and can tolerate a wide range of disturbances. In this regard, it has been pointed out that the use of low taxonomic resolution (i.e. at order level) can lead to an underestimation of the impact of pollution on aquatic ecosystems (Odume et al., 2012). In addition, Dobson et al. (2002) had already noted the rarity of the order Plecoptera in the rivers of tropical Africa.

The Diptera richness metric, measured as the number of Diptera families, emerged as one of the most reliable indicators of river disturbance at the calibration sites. This metric responded positively to conditions related to organic pollution (it was significantly and positively correlated with BOD5, NO3, EC and CPOM, and negatively with DO) and was integrated into the final index. According to some studies (Arimoro et al., 2015; Morse et al., 1994; Odume et al., 2016), some Diptera have specific adaptations that enable them to tolerate highly disturbed environments. For example, Chironomidae possess haemoglobin, which allows them to store oxygen in their bodies under hypoxic conditions. Syrphidae, on the other hand, have extensible breathing tubes that enable them to capture atmospheric oxygen at polluted sites. It has also been established that most Diptera show a preference for environments rich in organic matter, with low oxygen levels and high conductivity, as indicated by Gil et al. (2008) and Príncipe et al. (2010). In addition, organisms that adapt physiologically to low levels of dissolved oxygen can increase their abundance by exploiting excess nutrients, as pointed out by Camargo and Alonso (2006) and Beyene et al. (2009). Although Diptera richness was not significantly correlated with any type of land use in this study, it has been documented as a crucial metric for explaining the impacts of land use on lotic systems, particularly in assessing their biological degradation (Roy et al., 2003). Several studies have already included Diptera as indicators of organic pollution in multimetric indices (Aura et al., 2017; Baptista et al., 2011; Edegbene, 2022; Edegbene et al., 2019; Sripanya et al., 2023).

The decision to exclude the Shannon-Wiener index from the final metrics was based on its comparatively low discriminatory power (DE, 77%) in the redundancy test, in contrast to the superior performance of other diversity indices (Simpson index, 82%; evenness index, 95%; Pielou’s equitability, 95%). Despite being frequently employed in macroinvertebrate-based river biomonitoring (Carter et al., 2017; Eriksen et al., 2021), the Shannon-Wiener index demonstrated limited effectiveness in distinguishing variations in aquatic ecosystem health. This choice aligns with the findings of Al-Shami et al. (2011), which indicate that the Shannon index exhibits notable variability at sites with low pollution levels or exposed to intermediate disturbances. In contrast, Pielou’s equitability, chosen for the NSRBI, showed greater stability (a relatively low CV) and offered higher discrimination efficiency than other metrics. The non-linear response of the Shannon index to increasing pollution, as noted by Metcalfe (1989), further justified its exclusion. Analysing the sensitivity of various diversity indices, Beisel et al. (2003) recommended the use of Pielou’s equitability index for general applications in ecological data because of its acceptable level of sensitivity. Consequently, the selection of Pielou’s equitability for the NSRBI reflects a deliberate choice aimed at enhancing stability and discrimination efficiency in assessing river health.
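For readers unfamiliar with the two indices compared above, the following sketch computes them from taxon counts using their standard definitions: the Shannon-Wiener index H' = −Σ p_i ln p_i and Pielou's equitability J' = H' / ln S, where S is the number of taxa present. The function names and the example counts are illustrative and are not taken from the study.

```python
from math import log

def shannon_index(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def pielou_equitability(counts):
    """Pielou's J' = H' / ln(S), where S is the number of taxa present."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is not defined for a single taxon
    return shannon_index(counts) / log(richness)

# Hypothetical counts for four taxa (not data from the study)
even_site = [10, 10, 10, 10]    # individuals spread evenly across taxa
dominated_site = [97, 1, 1, 1]  # one taxon dominates the sample
print(pielou_equitability(even_site), pielou_equitability(dominated_site))
```

Because J' is scaled by the maximum diversity attainable for the observed richness, it ranges from 0 to 1 regardless of how many taxa are present, which is one reason it can be more stable across sites than H' itself.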

The use of macroinvertebrate functional attributes as metrics has also been recommended for the development of MMIs. Because of their mechanistic links with various stress factors, these metrics make it possible to identify anthropogenic disturbances without being affected by taxonomic composition (Moya et al., 2011; Saito et al., 2015; Tomanova & Usseglio-Polatera, 2007). Among these functional metrics, those relating to functional feeding groups (FFGs) are the most commonly used in the development of macroinvertebrate-based biotic indices (Barbour, 1999; Eriksen et al., 2021; Moya et al., 2011). In contrast to some studies carried out in the tropics that reported functional metrics in their final indices (Assefa et al., 2023; Kaboré et al., 2022; Mereta et al., 2013), no FFG metric was retained in the present study, because none discriminated significantly between reference and impaired sites (Mann-Whitney test: p > 0.05). It has been demonstrated that FFGs can be influenced by factors such as seasonality, spatial scale, taxonomic resolution and sampling methods (Cummins, 2021; Sitati et al., 2021). In addition, in neotropical rivers, the feeding habits of macroinvertebrate species can differ considerably even within the same family (Moya et al., 2007; Thorne & Williams, 1997), so adopting a family-level taxonomy, as in our study, risks causing a significant loss of ecological information (Moya et al., 2011). However, the body size (%VeLB) and mobility (%Burrower) metrics passed the sensitivity and stability tests. Given the collinearity of these two metrics, %VeLB was included in the final index because it was more stable (its coefficient of variation was lower than that of %Burrower). In addition, most of the large individuals (maximum size > 40 mm) were found at the reference sites.
Their prolonged reproductive cycle and relatively small offspring compared with smaller individuals make them more sensitive to environmental stresses, leading to a reduction in their abundance (de Castro et al., 2018; Serra et al., 2017). This metric has already been incorporated into an index designed to assess urban pollution in the Niger Delta region of Nigeria (Edegbene et al., 2019). The strength of the final index is reinforced by the sensitivity and level of bioindication of all its component metrics. Prior to the development of this index, the ANOSIM test showed that the macroinvertebrate assemblages did not differ significantly between seasons, which supports the reliability and stability of the index, in agreement with Baptista et al. (2007) on the importance of the stability of multimetric indices in time and space. In addition, the similarity of assemblages between the dry and wet periods could be explained by the low seasonal variation in the environmental factors structuring macroinvertebrate communities. Dalu et al. (2017) and Tonkin et al. (2016) highlighted a limited effect of season on the taxonomic structures of macroinvertebrate assemblages in Nigerian Afrotropical streams and a South African stream, respectively.

MMI validation and application

The newly developed NSRBI is responsive to abiotic factors, as indicated by its negative correlation with the factor scores of PCA axis 1. Similar results were obtained by Alemneh et al. (2019) and Tampo et al. (2020) when developing taxonomic tools for stream bioassessment in the Ethiopian highlands and the Zio River catchment in Togo, respectively. The final metrics selected in this study are able to represent various ecological conditions, reflecting the level of anthropogenic disturbance observed at the different sites. The accuracy of the NSRBI at the test sites was 90%, which exceeds the 80% threshold above which Hering et al. (2006) and Zhang et al. (2021) consider an index to be robust. The index is therefore well suited to assessing the ecological conditions of streams in the predominantly agricultural and urban areas of north-central Nigeria.

However, the new index has certain limitations and caveats. To begin with, the categorisation of sites did not take into account certain pollutants, such as heavy metals and pesticides, which can affect both water quality and the aquatic organisms present. The sensitivity of the index to different sampling methods was not tested in this study, which may introduce bias or error into the data. Indeed, sampling methods can influence the composition and abundance of macroinvertebrate communities and therefore affect the calculation and interpretation of the index (Weigel & Dimick, 2011). In addition, the small number of final metrics (4 selected from 88 candidates) may limit the ability of the index to capture all sources of disturbance and contamination of watercourses (Assefa et al., 2023). Finally, the difficulty in finding reference sites for index validation in the study area is a major challenge, although these sites are essential for establishing expected conditions and the deviation of impaired sites from the reference condition (Odountan et al., 2019).

Conclusion

The Niger State River Biotic Index (NSRBI) presented in this study was developed to assess the ecological status of rivers in predominantly agricultural and urban landscapes. This composite index combines four metrics (%Hemiptera, Diptera richness, Pielou’s equitability index and %individuals with maximum size > 40 mm) that cover different facets of the aquatic macroinvertebrate community, including abundance and composition, richness, diversity and bioecological traits. These metrics were selected through a stepwise statistical process from an initial set of 88 candidate metrics. The newly developed index responded to stream stressors and effectively discriminated highly altered from less altered sites. The NSRBI is thus a robust tool for the biomonitoring of lotic systems in the north-central region of Nigeria. Further research should examine the influence of contaminants such as pesticides and heavy metals, which were not covered in this study. It would also be beneficial to extend the survey to other regions with similar landscape characteristics, particularly savannah areas, in order to assess the effectiveness of the index there.