1 Introduction

Detached soil fragments enter a watershed’s hydrographic network as the ‘byproduct’ of the ongoing soil surface, riverbed, and bank erosion processes. Based on their origin, fluvial sediments are categorized as wash load and bed load material. Wash load is produced only during flood events. It is the result of erosional processes that take place exclusively at the catchment surface, i.e., outside the drainage network, and is usually in suspension. Bed load, on the other hand, consists of streambed material; it is either in traction or suspension, and is the only source of sediments during the dry season.

Bed load is typically estimated using the methodologies of Gilbert (1914), Schoklitsch (1930), Einstein (1948; 1950), and Meyer-Peter and Müller (1948), which are based on the tractive force theory (Footnote 1) (Lane 1937, 1955). Gilbert’s research (1914) made a foundational contribution to the study of bed load transfer in watercourses. He investigated the laws that govern bed load movement, and the interrelation of load quantity with the stream’s slope and discharge, and the debris’ comminution level. He introduced the term ‘traction’ to describe sediment movement along the riverbed, where larger particles like gravel (2–80 mm) and pebbles (4–64 mm) are rolled, slid, or bounced (saltated) by the force of flowing water. His work laid the groundwork for understanding sediment transport dynamics and the development of modern fluvial geomorphology. Schoklitsch (1930) developed an empirical formula to estimate the bed load transport rate based on water discharge and sediment size. His work introduced the concept of a critical shear stress or threshold that must be exceeded for bed material to start moving, refining earlier models of sediment transport. The formula became widely used in hydraulic engineering and river management, as it provided a practical tool for calculating sediment movement in natural streams. Einstein’s 1948 work provided a mathematical framework for estimating the rates at which sediment particles are transported along the riverbed, influenced by factors such as water flow, sediment size, and channel characteristics. His 1950 publication introduced the ‘bed-load function’, a tool used to calculate the rate at which sediment particles are carried by flowing water, influenced by the same factors. Meyer-Peter and Müller (1948) developed the empirical Meyer-Peter-Müller (MPM) equation to predict bed load transport in rivers. Their work focused on how sediment is transported along the riverbed by the force of water flow.
The MPM equation introduced the concept of shear stress as the driving factor for particle movement, with bed load transport occurring only when shear stress exceeds a certain critical threshold. Suspended load is estimated by the equations of Lane and Kalinske (1941) and Einstein (1964), in which suspension is described by the diffusion and dispersion concept (Footnote 2). Lane and Kalinske (1941) developed a model that described how finer particles, such as silt and clay, are transported in suspension by the flow of water. Their work emphasized the role of turbulent flow in keeping sediment particles suspended and how sediment concentration depends on water velocity and turbulence. They also highlighted the distinction between bed load and suspended load, providing a more comprehensive framework for analysing sediment transport in river systems. In 1964, Einstein expanded on his earlier work by addressing suspended sediment load transport in rivers. He developed a more detailed theory that explained how sediment particles are carried in suspension by the turbulence of flowing water. Einstein’s model considered the vertical distribution of sediment concentration, showing that finer particles remain suspended longer while coarser ones settle more quickly. His work emphasized the complex interaction between flow velocity, turbulence, and particle size, contributing significantly to the theoretical understanding and prediction of suspended sediment transport in river systems. Other empirical methods (calibration required; natural watercourse conditions need to be considered) are the Leopold and Maddock (1953), Fleming (1969), and Kennedy (1895) formulas, which are based on the regime theory (Footnote 3) (Lindley 1919).

Suspended load estimation has gained increasing attention from the scientific community over the last decades. Researchers focus on the impact (level, means) of soil erosion (Sui et al. 2005), riparian zone characteristics (Steiger et al. 2001; Nicholas 2003), and specific sources of sediment, e.g., landslide debris, riverbanks (Prestegaard 1988; Laubel et al. 2000) on a watercourse’s budget, and the negative effects of high sediment supply on the natural and anthropogenic environment. Regarding the latter, the chemical, physical, and biological properties of aquatic ecosystems are degraded by excessive nutrient, trace metal, organic compound, and pesticide volumes (Lloyd et al. 1987; Paul et al. 2024) attached to displaced fragments that enter the fluvial network through runoff (Neal et al. 2006; Singh and Stenger 2018). Eutrophication, water temperature rise, decrease of photosynthesis rates, drop of dissolved oxygen levels, fish kills (lack of oxygen, clogged gills), and migration of sensitive species are indicative consequences. Furthermore, flood risk soars due to canal silting caused by deposition build-up (Kavian et al. 2016). Besides, the proper function of infrastructure is hampered. In the case of hydroelectric dams (Fig. 9), their reservoirs gradually lose their active storage capacity – the Sanmenxia (China) and Mead (Hoover Dam, USA) reservoirs lose annually approximately 1.7% and 0.3%, respectively (Sloff 1991) – flood storage capacity is reduced, equipment is damaged, and pressure is exerted on the base of the dam (Schleiss et al. 2016; Paschalidis et al. 2021). Moreover, irrigation and drainage networks suffer from equipment clogging and silting, harbour services are affected by silting, drinking water quality declines, requiring increased treatment costs (Waters 1995), and recreational activities are restricted.
In this regard, the United States Environmental Protection Agency (USEPA) classified suspended sediment concentration (SSC) as one of the major causes of river impairment (USEPA 1996, 2017).

At worldwide scale, circa 14 × 10^9 t of suspended load and 1 × 10^9 t of bed load end up in the oceans (Walling 1984), and the fertile surface soil layer is reduced by 57.5 mm/1000 y. Soil loss (by extension sediment deposition) does not only depreciate the natural capital, but is also a severe monetary threat. Brown (1948) estimated the annual cost of sedimentation in the USA at circa $175 M, an amount exceeding several billion dollars in today’s prices. Clark et al. (1985) raised the cost (1980 prices) to $6.1 B, $2.2 B of which are owed to farmlands. Mahmood (1987) estimated the annual reservoir remediation cost due to active storage depletion at approximately $6 B, globally. The annual sediment removal cost from the reservoirs of the EU and UK is circa €2.3 B for sediment volumes produced by water erosion; including all other soil loss processes, i.e., gullies, landslides, quarrying, etc., the cost rises to €5–8 B per year (Panagos et al. 2024).

Hence, the accurate quantification of suspended load becomes essential for the design of appropriate watershed and river management (Kuhnle and Simon 2000) and pollution control strategies (Gao et al. 2007). Systematic sediment discharge measurements, however, are costly and time-consuming; they require specialized personnel and/or well-maintained gauging stations to perform them, and they depend on natural conditions during the sampling. To this end, approximations of different theoretical background and complexity are implemented (Wren et al. 2000; Gao 2008), depending on resource availability. These span from simple deterministic methods such as the sediment rating curves (SRC) (Asselman 2000; Moradinejad 2024) to comprehensive ones, including watershed modelling (Arnold et al. 1993; Mariani et al. 2024), turbidity monitoring (Lewis 1996; Rasmussen et al. 2009), artificial intelligence techniques like fuzzy logic (Lohani et al. 2007; Buyukyildiz and Kumcu 2017) and Artificial Neural Networks (ANNs) (Jain 2001; Sharma et al. 2015), automated pumping samplers (Herman et al. 2008; Gettel et al. 2011), etc. The applicability of some (turbidity monitoring, automated samplers) is limited by high operational costs (installation, calibration, functioning, maintenance), while the validity of others (models) is questioned by the scarcity and/or poor quality of field data that hinder the validation of the simulated results.

In Greece, a country that faces similar limitations, a long-term national-scale sampling program could only be realized by a public agency – in this case the Greek Public Power Corporation (PPC). The apparent merits of the compiled database, comprising Qs and simultaneous discharge-suspended sediment discharge records (Q-Qs) at several cross-sections nationwide, were offset by the program’s deficiencies that affected the quality of outputs. In brief, (i) some stations display rather fragmented Qs timeseries, (ii) bed load measurements were disregarded due to high cost and technical difficulties, (iii) the Q-Qs pairs recording was temporally infrequent, non-systematic, and with long non-recording intervals, and (iv) Qs, Q-Qs data refer to past decades and they are considered obsolete, since they largely describe the designing period of dam development projects (recording continues to this day, yet the processing of raw data has stopped after their completion). The latter underline the need for a contemporary high-quality Qs timeseries, able to accurately represent the sedimentary regime of the surveyed watercourse(s) and support modern research efforts.

In this context, the study aims to evaluate the performance of four established SRC methodologies on Mediterranean-type watercourses, with specific reference to Greek rivers. The goal is to provide insight on the selection of the most suitable method per gauging station for the accurate quantification of suspended sediment discharge. The ‘simple rating curve’ (Asselman 2000), ‘different ratings for the dry and wet season of the year’ (Rovira and Batalla 2006; Walling 1977; Hu et al. 2011), ‘hydrographic classification’ – influenced by the ‘different ratings for the rising and falling limb of the runoff hydrograph’ (Glysson 1987; Jansson 1996; De Girolamo et al. 2015) – and ‘broken line interpolation’ (Koutsoyiannis 2000) [also known as ‘piecewise linear regression’ (Ryan and Porth 2007; Yang et al. 2016)] methods were employed. Nine cross-sections of three major rivers of north-western Greece, namely Aliakmon, Acheloos (upper route), and Arachthos, were surveyed. The evaluation was based on statistical and graphical analyses. To the best of the author’s knowledge, this is the first time that such an extensive and manifold dataset concerning Greece is investigated in international literature. The novelty of this research lies in the sample size processed (consisting of 36 time series in total), the diversity of the watercourses (natural and modified canals), the high sedimentation rates and the impending siltation risk (catchments susceptible to erosion) that require in-depth knowledge and effective management, and the use of observed sediment discharge records as benchmark values.

2 Materials and methods

2.1 Study area

The study focuses on nine cross-sections of three major rivers of north-western Greece, namely Aliakmon, upper Acheloos (delimited by the Avlaki Dam), and Arachthos (Fig. 1). Their gauging network consists of four, one, and four hydrometric stations, respectively, each demarcating the outlet of the homonymous sub-watershed (Table 1). The rivers’ selection was based on (i) the availability of reliable, detailed, long-term data that can underpin simulations and their validation, (ii) their subdivision potential into smaller clusters, allowing the expansion and amplification of conclusions, (iii) the diversity of flow conditions, comprising natural and modified canals (Aliakmon’s route comprises a cascade of five dams that regulate its flow), (iv) the high sedimentation rates, impact of the local geological, geomorphological, climatic, hydrological conditions, and anthropogenic practices (Poulos et al. 1996), (v) their importance as key financial and development drivers, contributing to the regional water supply, irrigation, and power supply needs, and (vi) their vital role as aquatic ecosystems, promoting environmental sustainability. The watersheds’ characteristics were described in previous research efforts, e.g., by Efthimiou (2019), Efthimiou et al. (2020; 2022), etc.

In brief, the climate is Mediterranean (Kottek et al. 2006), manifesting in hot, dry summers with frequent thunderstorms and mild, rainy winters. Grouped in this general category, various climatic types are met, influenced by the complex orographic and topographic configuration, i.e., strong altitudinal differences, dryland-to-sea transition, etc. For example, the Mediterranean type is prevalent at the coastal zones, the Continental type at the mountainous areas, and the Alpine type at very high altitudes (Livadas 1976; Balafoutis 1977). Precipitation is determined by the mountain range of Pindos, the dominant morphological element of the study area, and of Greece in general. The thrust is the extension of the Alpine system that stretches through Central and Southern Europe (Everett et al. 1986). It descends from Epirus to Peloponnese parallel to the eastern coastline of continental Greece, forming a natural barrier that demarcates western from eastern Greece in terms of climate (and water abundance) regime. Specifically, the western windward side (where the Acheloos and Arachthos watersheds are) receives far higher precipitation depths than the eastern leeward one (where Aliakmon is), because of the rain shadow effect (Hatzianastassiou et al. 2008).

Regarding lithology, the upper Acheloos and Arachthos watersheds, part of the external Hellenides (Pindos, Ionian) geotectonic zones, comprise (mainly) erosion-prone sedimentary formations, i.e., limestones, sandstones, marls, hornstones, ophiolites, flysch, clay schists and Quaternary alluvial deposits (conglomerates, terraces, talus cones and scree, etc.). The bedrock of the Aliakmon watershed, lying over the Pindos and Sub-Pelagonian zones (western and eastern part of the basin, respectively), roughly consists of erosion-resistant material, i.e., crystalline formations, granites, volcanic rocks, ophiolites, limestones, sedimentary and volcano-sedimentary formations (Migiros et al. 2008; Karalis et al. 2018).

Besides, the upper Acheloos and Arachthos watersheds share similar topographic (mountainous terrain, steep slopes, diversification between morphological elements) and land use (extensive vegetation cover; limited plains and croplands) features. Conversely, the Aliakmon basin’s relief is milder.

Fig. 1 Study area

Table 1 Study area attributes (retrieved from Efthimiou et al. 2022)

2.2 Datasets and Instruments

The PPC provided the Q-Qs pair (Fig. 10), daily and monthly Q (m3 s−1) (Fig. 11), and monthly Qs (kg s−1) (Fig. 12) records per cross-section. The mean monthly Q and Qs patterns are displayed in Figs. 13 and 14, respectively.

The Q-Qs pairs sampling program was hydrologically (not calendar-) based, with most measurements taken in low flow conditions. Daily Q calculations required the development of stage-discharge curves, using concurrent water stage and Q observations. Q is the average of the n values Qi measured at the n sub-sectors into which every cross-section was divided. Subsequently, mean daily Q was estimated by applying the respective water stage value to each section’s specific formula. Monthly Qs derived from a fixed-interval sampling schedule, with PPC estimating total Qs using Eqs. 1, 2.

$$G_{s}=K\times S_{c}\times Q\times 10^{-8}$$
(1)
$$S_{c}=\frac{\sum S_{s}Q_{s}}{\sum Q_{s}}$$
(2)

where, Gs (T s−1) is the total sediment discharge, K a rating coefficient equal to 2.792, Q (m3 s−1) the cross-section’s mean discharge, Sc (ppm) the cross-section’s mean concentration, Ss (ppm) the sub-sector’s mean concentration, Qs (m3 s−1) the sub-sector’s mean discharge, Ss × Qs (m3 s−1) the sub-sector’s partial discharge.
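As an illustration of Eqs. 1 and 2, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the PPC's actual procedure: the function names and the sub-sector values are hypothetical, and the units are simply those stated in the text.

```python
K = 2.792  # rating coefficient quoted in the text


def mean_concentration(sub_conc_ppm, sub_discharge):
    """Eq. 2: discharge-weighted mean concentration S_c (ppm)."""
    if len(sub_conc_ppm) != len(sub_discharge):
        raise ValueError("one concentration is needed per sub-sector discharge")
    total_q = sum(sub_discharge)
    if total_q <= 0:
        raise ValueError("total sub-sector discharge must be positive")
    weighted = sum(s * q for s, q in zip(sub_conc_ppm, sub_discharge))
    return weighted / total_q


def total_sediment_discharge(q_mean, s_c_ppm, k=K):
    """Eq. 1: G_s = K * S_c * Q * 1e-8, in the units given in the text."""
    return k * s_c_ppm * q_mean * 1e-8


# Hypothetical sub-sector records (illustration only, not study data)
conc = [120.0, 150.0, 90.0]   # S_s per sub-sector (ppm)
q_sub = [4.0, 6.0, 2.0]       # Q_s per sub-sector (m3 s-1)

s_c = mean_concentration(conc, q_sub)
q_mean = sum(q_sub) / len(q_sub)  # Q taken as the average of the sub-sector values
g_s = total_sediment_discharge(q_mean, s_c)
print(s_c, g_s)
```

The weighting in Eq. 2 means that sub-sectors carrying more water contribute more to the cross-section's mean concentration than a plain arithmetic mean would allow.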

Qs measurements were conducted using a Delft bottle sampler (Fig. 15) (Dijkman 1978, 1981), taking one average sub-sectoral value per section. The instrument considers coarse (> 100 μm) suspended sediment from a range of 0 (water surface) to 0.1 m depth above the riverbed. Its operation is an application of the flow-through principle, according to which water enters the intake nozzle, flows through the chamber, and deposits the material at its ‘tail’. Sedimentation (settling) is caused by the notable reduction of flow velocity within the vessel due to its bottle-like shape (at the entry, the velocity is equal to that of the undisturbed flow). The Delft method estimates (in situ) with sufficient accuracy the volumetric quantity of the local sand transport, limiting laboratory analysis. At least 5 min of sampling time is necessary for the acquisition of statistically reliable records.

2.3 Sediment Rating Curves (SRC)

A power function (Eq. 3) is the most common SRC form (Mimikou 1982; Asselman 2000).

$$Q_{s}=aQ^{b}e$$
(3)

where, Qs (kg s−1) is the sediment discharge, Q (m3 s−1) the discharge, a (kg s−1) the rating coefficient, b (dimensionless) the rating exponent, and e the multiplicative error term. The exponent values range between 0.5 and 3, and the error term follows – in theory at least – a Gaussian (log-normal) distribution (Walling 1977).
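The fitting of Eq. 3 can be sketched as an ordinary least squares regression on the log-transformed Q-Qs pairs. The sketch below is illustrative only: the function name is hypothetical, base-10 logarithms are an assumption, and the sample pairs are synthetic (generated without error from a = 0.05, b = 1.8), not study data.

```python
import math


def fit_rating_curve(q, qs):
    """Fit Qs = a * Q**b by least squares on log10-transformed pairs.

    Returns (a, b, residuals), with residuals expressed in log10 units.
    """
    if len(q) != len(qs) or len(q) < 2:
        raise ValueError("at least two Q-Qs pairs of equal length are required")
    if any(v <= 0 for v in q) or any(v <= 0 for v in qs):
        raise ValueError("logarithms require strictly positive Q and Qs")
    x = [math.log10(v) for v in q]
    y = [math.log10(v) for v in qs]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("Q values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                  # rating exponent (slope in log-log space)
    log_a = mean_y - b * mean_x    # intercept in log-log space
    residuals = [yi - (log_a + b * xi) for xi, yi in zip(x, y)]
    return 10 ** log_a, b, residuals


# Synthetic, error-free pairs (illustration only)
q_obs = [2.0, 5.0, 10.0, 40.0, 120.0]
qs_obs = [0.05 * v ** 1.8 for v in q_obs]

a_hat, b_hat, res = fit_rating_curve(q_obs, qs_obs)
print(a_hat, b_hat)
```

With real, scattered data the residuals returned here are the quantities that a back-transformation bias correction operates on.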

Several researchers attempted a physical interpretation of the role of a, b. Morgan (2005) correlated the high values of a with large volumes of suspended (fine) material, an easily routed load that can increase sediment transport rates. According to Peters-Kümmerly (1973) a reflects (proportionally) the watershed’s erodibility, i.e., low values indicate the presence of non-susceptible formations and vice versa. Arguably, the coefficient merely represents the value of Qs when Q is equal to 1, hence, it cannot serve as erodibility measure since its magnitude will be largely ‘controlled’ by the catchment size. This means that unit discharge in a small catchment could classify as very high flow, whereas, in a large one as very low – well beneath the minimum record.

On the other hand, Ellison et al. (2014) correlated b with the flow’s erosive potential [so did Peters-Kümmerly (1973), Asselman (2000), and Gao (2008)] and sediment transport capacity. As its values increase, transport capacity becomes more efficient (at steeper slopes, i.e., higher b values, even low flows can ‘carry’ sediment load), reaching its full potential near the coefficient’s upper threshold. At this highly sensitive state, even a small increase of Q can affect Qs disproportionally. Specifically, as streamflow increases, a steep positive gradient denotes rapid SSC rise, a low or level gradient designates moderate SSC rise, and a negative gradient signifies dilution due to limited sediment supply (Ellison et al. 2014). The exponent can also interpret storm event dynamics (Tanaka et al. 1983), upland sediment supply and transport within the watershed (the rate is supply limited; the degree depends on the seasonality of land cover) (Gao and Puckett 2012), variations of the rising and falling limb of hydrographs or the erodibility of different (dry, wet) regions (Park 1992), the extent of new sediment sources (Syvitski et al. 2000), etc.

2.4 Statistical Measures

2.4.1 Statistical Bias Correction Factor (BCF)

The non-parametric bias correction factor (BCF) (Eq. 4) or ‘smearing’ estimator of Duan (1983) is used for the correction of the back-transformation bias from logarithmic to arithmetic space.

$$BCF=\frac{1}{n}\sum_{i=1}^{n}b^{r_{i}}$$
(4)

where, n is the size of the population, i.e., the number of samples (arithmetic space) or regression residuals (logarithmic space), b is the logarithm base, and ri is the difference of observed against estimated sediment concentration per sample i measured in logarithmic units.
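Eq. 4 translates directly into code. The following is a minimal sketch with a hypothetical function name and made-up residuals (chosen to sum to zero, as least squares residuals with an intercept do); base 10 is assumed for the logarithms.

```python
def smearing_bcf(residuals, base=10.0):
    """Eq. 4: mean of base**r_i over the log-space regression residuals."""
    if not residuals:
        raise ValueError("at least one residual is required")
    return sum(base ** r for r in residuals) / len(residuals)


# Hypothetical log10 residuals (illustration only); they sum to zero
res_log10 = [-0.30, -0.10, 0.05, 0.15, 0.20]
bcf = smearing_bcf(res_log10)

# Hypothetical rating coefficient, corrected by direct multiplication
a_uncorrected = 0.05
a_corrected = bcf * a_uncorrected
print(bcf, a_corrected)
```

Because the exponential function is convex, residuals that average to zero always yield a factor of at least 1, so the corrected curve never lies below the uncorrected one.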

2.4.2 The Nash-Sutcliffe Efficiency (NSE) Index

The Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970) is a time-series convergence index, and a measure of the dispersion (scatter) of observed records based on their divergence from their mean value, as the fraction denominator of Eq. 5 indicates. NSE equals one minus the ratio of the residual variance to the variance of the measured data, acquiring values in the range -∞ to 1. Negative values indicate poor model performance, i.e., the mean of the observed records is a better predictor than the model. Conversely, values close to 1 are optimal, evidence of high convergence between observed and simulated series.

$$NSE=1-\left[\frac{\sum_{i=1}^{N}\left(O_{i}-P_{i}\right)^{2}}{\sum_{i=1}^{N}\left(O_{i}-\bar{O}\right)^{2}}\right]$$
(5)

where, N is the number of points, Oi the observed values, Pi the simulated values, and Ō the mean of the observed values.
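The index can be computed as in the sketch below, which assumes the standard Nash-Sutcliffe definition in which the denominator measures the scatter of the observed values around their own mean. The function name and the sample series are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_observed-about-mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    if ss_obs == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - ss_res / ss_obs


obs = [1.0, 2.0, 3.0, 4.0]        # hypothetical observed values
perfect = nse(obs, obs)           # simulation identical to observations
mean_only = nse(obs, [2.5] * 4)   # simulation equal to the observed mean
reversed_sim = nse(obs, [4.0, 3.0, 2.0, 1.0])
print(perfect, mean_only, reversed_sim)
```

The three cases mark the reference points of the scale: a perfect match, a simulation no better than the observed mean, and a simulation worse than the observed mean.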

3 Results

3.1 Sediment Rating Curves Development

Alternative methods are introduced in the attempt to minimize the uncertainty of the SRC results. Among them are the hydrological (rising/falling limb of the hydrograph) and seasonal [wet/dry season; monthly (Mao and Carrillo 2017)] stratification of the Q-Qs pairs with different equations per period/stage, the use of complex (polynomial) formulas (Cordova and Gonzalez 1997; Horowitz 2003), the calibration for different precipitation intensity ranges (Guzman et al. 2013), etc. However, none is considered universally accepted (Sivakumar and Wallender 2004) and the choice lies with the researcher.

This study utilized four different grouping methods for the Q-Qs pairs, namely the (i) simple rating curve, (ii) different ratings for the dry and wet season of the year, (iii) hydrographic classification, and (iv) broken line interpolation (Table 2, Figs. 16, 17, 18, 19, 20, 21, 22, 23 and 24). Method (i) employs a single formula to describe the entire dataset. Method (ii) categorizes the Q-Qs population according to the recording period into dry (June to November) and wet (December to May) season. In method (iii) the grouping is a function of the magnitude of flow, defined by the inflection point (IP) that separates low (Q ≤ IP) from high (Q > IP) flows. IP is equal to the mean annual Q value multiplied by a constant of 1.2. However, in the summer, rising stages could still occur when Q is below that threshold, and in the winter, flows exceeding the threshold could belong to the falling stage (Efthimiou et al. 2022). Method (iv) follows a similar approach, i.e., clustering in two discharge classes; however, in this case the IP is retrieved by a trial-and-error process considering the physical characteristics of the surveyed rivers. The objective is to simultaneously minimize the fitting error (if that is the only concern, there is a risk the stratification may end up in a very rough ‘broken line’) and the roughness of the sequential segments of the curve.
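The grouping rules of methods (ii) and (iii) can be sketched as follows. This is an illustrative sketch only: the function names, the record layout (month, Q, Qs), the sample records and the mean annual Q of 50 m3 s−1 are all hypothetical.

```python
WET_MONTHS = {12, 1, 2, 3, 4, 5}  # December to May; the rest (June to November) is dry


def split_by_season(records):
    """Method (ii): split (month, q, qs) records into dry and wet groups."""
    dry = [(q, qs) for month, q, qs in records if month not in WET_MONTHS]
    wet = [(q, qs) for month, q, qs in records if month in WET_MONTHS]
    return dry, wet


def split_by_inflection(records, mean_annual_q, factor=1.2):
    """Method (iii): split at IP = factor * mean annual Q (low: Q <= IP)."""
    ip = factor * mean_annual_q
    low = [(q, qs) for _month, q, qs in records if q <= ip]
    high = [(q, qs) for _month, q, qs in records if q > ip]
    return ip, low, high


# Hypothetical (month, Q in m3 s-1, Qs in kg s-1) records, illustration only
records = [
    (1, 80.0, 40.0), (3, 120.0, 95.0), (7, 6.0, 0.3),
    (8, 3.0, 0.1), (11, 30.0, 6.0), (12, 55.0, 22.0),
]

dry, wet = split_by_season(records)
ip, low, high = split_by_inflection(records, mean_annual_q=50.0)
print(len(dry), len(wet), ip, len(low), len(high))
```

Each resulting group would then receive its own rating curve, so methods (ii) and (iii) yield two equations per station. Method (iv) differs only in how the IP is chosen, which is by trial and error and therefore not reproduced here.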

The stratification is followed by a regression analysis (least squares fitting), performed on the logarithms of the Q-Qs pairs. The logarithmic space is selected for (i) presentation reasons, since there the curves ‘become’ straight lines, and (ii) meeting the theoretical hypotheses of normal distribution and homoscedasticity (constant variance) of the Qs residuals (Ferguson 1986; Cohn et al. 1992; Asselman 2000). The descriptive equations of methods (ii) to (iv) comprise two distinct formulae, i.e., one per segment.

Table 2 SRC attributes (retrieved from Efthimiou et al. 2020)

3.2 Simulated Suspended Sediment Discharge

The mean daily Q records were applied to the SRC formulas, developing four simulated (coarse, > 100 μm) Qs series per cross-section. The monthly (Figs. 25, 26, 27, 28, 29, 30, 31, 32 and 33), mean monthly (Fig. 2), and annual (Fig. 3) loads were estimated as the average Qs per timestep. The convergence between observed and simulated data is also displayed.
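The simulation and aggregation steps can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the date-keyed record layout and the coefficients (a = 2, b = 1, chosen only to keep the arithmetic transparent) are hypothetical and bear no relation to the fitted curves of this study.

```python
from collections import defaultdict


def rating_curve(q, a, b, bcf=1.0):
    """Simulated Qs from Eq. 3, optionally scaled by a bias correction factor."""
    return bcf * a * q ** b


def monthly_means(daily_qs):
    """Average daily values per (year, month)."""
    groups = defaultdict(list)
    for (year, month, _day), value in daily_qs:
        groups[(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def mean_monthly(monthly):
    """Average the monthly values of each calendar month across years."""
    groups = defaultdict(list)
    for (_year, month), value in monthly.items():
        groups[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in groups.items()}


# Hypothetical mean daily discharge records: ((year, month, day), Q)
daily_q = [
    ((2000, 1, 1), 5.0), ((2000, 1, 2), 10.0),
    ((2000, 2, 1), 2.0), ((2001, 1, 1), 15.0),
]

daily_qs = [(date, rating_curve(q, a=2.0, b=1.0)) for date, q in daily_q]
monthly = monthly_means(daily_qs)
climatology = mean_monthly(monthly)
print(monthly, climatology)
```

Annual values follow the same pattern, with the grouping key reduced to the year.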

Fig. 2 Mean monthly simulated suspended sediment discharge (Qs, kg s−1) and convergence with field measurements per station and river: (a) Grevena bridge, Aliakmon, (b) Moni Ilarion, Aliakmon, (c) Moni Prodromou, Aliakmon, (d) Velventos, Aliakmon, (e) Avlaki dam, upper Acheloos, (f) Arta bridge, Arachthos, (g) Plaka bridge, Arachthos, (h) Tsimovo bridge, Arachthos, (i) Gogo bridge, Arachthos

Fig. 3 Annual simulated suspended sediment discharge (Qs, kg s−1) and convergence with field measurements per station and river: (a) Grevena bridge, Aliakmon, (b) Moni Ilarion, Aliakmon, (c) Moni Prodromou, Aliakmon, (d) Velventos, Aliakmon, (e) Avlaki dam, upper Acheloos, (f) Arta bridge, Arachthos, (g) Plaka bridge, Arachthos, (h) Tsimovo bridge, Arachthos, (i) Gogo bridge, Arachthos

The transition from logarithmic to arithmetic space introduces a back-transformation bias, i.e., the distribution of the Qs residuals is not normal (deviation from the theoretical principle mentioned in § 3.1); their mean value is > 0 (Koch and Smillie 1986). The bias is usually negative, leading to underestimated Qs. According to Ferguson (1986), the underestimation grows with the variance of the residuals (additive error terms of the log-linear regression). It occurs because in the arithmetic plot the power function (curve) passes through the arithmetic means of the Q-Qs pairs, while in the log-log plot (line) it passes through the geometric ones, and the former are systematically higher than the latter. In wash load conditions where the deviation is higher {the mean square error of the log-transformed regression σ2 (> 0; 0 when no dispersion around the regression line occurs) increases, causing the multiplicative error terms to attain higher values} the underestimation grows (Ferguson 1986; Singh and Durgunoglu 1989; Asselman 2000).
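The dependence of the bias on the residual variance can be made explicit with a short derivation. It is offered as a sketch under an assumption that is not tested here, namely that the log-space residuals are normally distributed with zero mean and variance σ2; natural logarithms are used, and the symbol ε (with e = exp ε in Eq. 3) is introduced for illustration only.

$$\ln Q_{s}=\ln a+b\ln Q+\varepsilon ,\qquad \varepsilon \sim N\left(0,\sigma^{2}\right)$$
$$E\left[Q_{s}\mid Q\right]=aQ^{b}\,E\left[e^{\varepsilon}\right]=aQ^{b}\,e^{\sigma^{2}/2}>aQ^{b}$$

Under this assumption the uncorrected estimate aQ^b falls short of the expected load by the factor exp(σ2/2), which increases with the scatter around the regression line. When base-10 logarithms are used and s2 denotes the residual variance in log10 units, the same factor becomes exp[(ln 10)2 s2/2] ≈ exp(2.65 s2), the parametric form associated with Ferguson (1986). The non-parametric estimator of Eq. 4 does not rely on the normality assumption.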

Among the several methods proposed for the elimination of the back-transformation bias, e.g., Bradu and Mundlak (1970), Jones (1981), Ferguson (1986), Koch and Smillie (1986), Cohn et al. (1989), Duan (1983), etc., the ‘smearing’ estimator of Duan (1983) was ultimately selected due to its widespread use. However, there is no consensus on which BCF is preferable.

The unbiased estimator of the true load is acquired by applying the BCF (multiplying it directly with the rating coefficient) to the simulated Qs time series. This causes the curve to shift upwards, towards higher loads (Figs. 16, 17, 18, 19, 20, 21, 22, 23 and 24).

4 Discussion

4.1 Datasets Variability

4.1.1 Causality Drivers

Suspended sediment discharge (Qs) largely depends on the watershed’s (riverbed and banks included) sediment supply, the composite detachment-transport-deposition mechanism that regulates its routing towards the fluvial network, the network’s and tributary inflows’ pattern that governs floodwater travel rate and distance, and the storage-mobilization-depletion cycle (Williams 1989). The latter display strong spatial and temporal heterogeneity, especially in alpine basins of diverse morphology (Lenzi et al. 2003; Mao et al. 2009). Soil loss is a multifactorial process, conditioned upon the complexity of the associated variables and the heterogeneity of their interrelations. The impact of an individual variable on the magnitude and frequency of soil displacement, though, is neither linear nor explicit, while it is often difficult to identify whether these variables act in a complementary or contradictory manner towards sediment production (or soil protection for that matter). These non-deterministic interactions need to be deciphered, without omitting the spatial component of all causality agents. Overall, physical basins, and especially mountainous landscapes, are typical non-linear, non-stationary systems that result in non-Gaussian sediment load/hydrometric variables’ distributions (Lenzi et al. 2003; Mao et al. 2009).

Rainfall (depth, intensity) displays distinct seasonal variation (as do Q and water temperature; Footnote 4), evolving in clear cycles (intra-annual, annual; Footnote 5), typical of the Mediterranean climate. Its spatial distribution in Greece is defined by the mountain range of Pindos that demarcates western from eastern Greece in terms of water abundance (see § 2.1). The parameter manifests in winter depths of high erosivity (Panagos et al. 2016) and intense summer thunderstorms (Kottek et al. 2006). These infrequent, transient events (Poesen and Hooke 1997) largely govern sediment rates (Martinez-Mena et al. 2001). A striking example is the formation of torrents (Footnote 6). Apart from the apparent risks entailed by the intensity of the dry season outbursts, rainfall (runoff) increases its erosive potential when falling on (flowing over) unprotected surfaces, especially from October to January, when the highest rainfall erosivity rates are met (Panagos et al. 2016). Mismanagement results in soil stripped of vegetation due to overgrazing, forest fires, and fuelwood gathering (Woodward 1995), and bare farmlands due to ‘unregulated’ tillage, cultivation, or harvesting. Fallow (continuous, or part of a crop rotation schedule), ploughing (in September, October), sowing (roughly from November-April), cultivation of vulnerable crop types or ‘unsynchronized’ phenological phases (the canopy is not fully developed during the early growth stages) (Schmidt et al. 2018; Baiamonte et al. 2019), random distribution of winter (non-irrigated; soil is assumed bare from June-November) and summer (permanently irrigated; soil is assumed bare from December-May) crops without considering the spatiotemporal patterns of rainfall, and bad post-harvesting ‘habits’ (Footnote 7) leave the soil exposed to the abrading forces of rainfall and runoff. Vulnerability is increased by the abandonment of traditional conservation techniques.
For example, the deterioration of terraces leads to longer slopes and to greater floodwater/sediment volume and travel distance (Koulouri and Giourga 2007). Besides, deep ploughing and intensive cultivation destroy soil structure (Faust and Schmidt 2009), further exacerbating soil’s inability to sustain vegetation. Since most of the annual soil loss in agricultural lands occurs during a few severe events (Nadal-Romero et al. 2012; Rodrigo-Comino et al. 2017), the joint rainfall-biomass density coupling becomes critical for the identification of the hazardous region-period combinations.

Other inherent characteristics of the Mediterranean landscape, namely the diverse topography (Polykretis et al. 2020), the erodible bedrock (flysch, alluvial formations, etc.) (Kosmas et al. 2001), the active tectonic processes (Bailey et al. 1993), the pedological properties, e.g., shallow profiles, low Organic Matter Content (OMC), etc. (Poesen and Hooke 1997; Canton et al. 2011) classify Mediterranean soils as highly susceptible to erosion (Cerda et al. 2010).

Finally, the watershed scale exerts notable influence on the accuracy of Qs assessment (Richards and Holloway 1987), with the quality of estimates being inversely related to it (Phillips et al. 1999) – according to Walling and Webb (1988) the estimation error decreases as the basin size increases. Boyce (1975) interpreted the inverse relationship of net to gross erosion as a function of slope steepness. This fraction (transported to on-site mean annual soil loss) is always < 1 and is reduced as the basin scale increases. Specifically, at the topographically diverse parts of the watershed, strong inclinations are responsible for the prevalence of net erosion (high detachment rates) over deposition. As watershed scale increases, mean slope decreases, causing sediment production rate per unit area to decrease accordingly. Furthermore, at larger scales, the higher number of sinks, depressions, valleys, etc., favours deposition, leading to lower net erosion rates (there and throughout) (van Rompaey et al. 2001).

4.1.2 Temporal Fluctuation

Discharge (Q) and suspended sediment discharge (Qs) display inter- (monthly, seasonal) (Fig. 4) and intra-annual temporal variability (Fig. 5).

Fig. 4

Mean monthly (a) discharge (Q, m3 s-1) and (b) suspended sediment discharge (Qs, kg s-1) measurements inter-annual variability at the station; river (a) Grevena bridge, Aliakmon, (b) Moni Ilarion, Aliakmon, (c) Moni Prodromou, Aliakmon, (d) Velventos, Aliakmon, (e) Avlaki dam, upper Acheloos, (f) Arta bridge, Arachthos, (g) Plaka bridge, Arachthos, (h) Tsimovo bridge, Arachthos, (i) Gogo bridge, Arachthos

Fig. 5

Annual discharge (Q, m3 s-1) and suspended sediment discharge (Qs, kg s-1) measurements intra-annual variability at the station; river (a) Grevena bridge, Aliakmon, (b) Moni Ilarion, Aliakmon, (c) Moni Prodromou, Aliakmon, (d) Velventos, Aliakmon, (e) Avlaki dam, upper Acheloos, (f) Arta bridge, Arachthos, (g) Plaka bridge, Arachthos, (h) Tsimovo bridge, Arachthos, (i) Gogo bridge, Arachthos

At the monthly scale (Fig. 12) the Qs outliers recorded in the summer and early fall are ascribed to the outburst of intense storms falling onto bare soil (Steegen et al. 2000; Lecce et al. 2006). During the winter and early spring, the rainfall and snowmelt patterns lead to increased streamflow, and, by extension, sediment transport. Overall, the intra-annual patterns of Q (and sediment storage; see § 1) are strongly correlated with the monthly dynamics of Qs.

At the mean monthly scale (Fig. 4), the variables’ fluctuation is depicted by inverted bell-shaped curves. The highest values (left and right tails) occur in the winter and the lowest in the summer, a trend common to all stations. May is the turning point, when precipitation decreases and the soil is adequately protected by fully grown canopies at the end of the growing season. In the warm period, from June to September, the minimal streamflow (and soil loss, due to the coupling of low-intensity, non-erosive storms with sufficient biomass density) is reflected in the routed sediment volume. The lowest Q and Qs values are observed at all stations in August (Figs. 4 and 6). Among them, Grevena bridge yields the minimum records at 1.62 m3 s−1 and 0.02 kg s−1, respectively. The maximum Q is recorded in March at Moni Prodromou (141.39 m3 s−1), and the maximum Qs in November at Arta bridge (880.10 kg s−1). The individual (per station) Q and Qs patterns are displayed in Figs. 13 and 14, respectively.

Fig. 6

Mean monthly (a) discharge (Q, m3 s-1) and (b) suspended sediment discharge (Qs, kg s-1) measurements value range

At the annual scale (Fig. 5) the Q and Qs time series display strong variability; however, their trends converge satisfactorily at most stations. The lowest mean annual Q value (18.10 m3 s−1) is calculated at Gogo bridge and the highest at Velventos (72.88 m3 s−1). The respective Qs values are found at Grevena bridge (1.96 kg s−1) and Arta bridge (239.75 kg s−1) (Figs. 5 and 6).

The graphical investigation of Fig. 12 reveals two evident characteristics of the Q, Qs time series, common to all stations, i.e., strong seasonal diversification and notable deviations from the mean state. The interquartile range (equal to the boxplot length) and the overall data range (equal to the distance between the ends of the two whiskers) illustrated in Figs. 7 and 8 verify such variability. According to Markonis et al. (2017) both are a direct effect of the Mediterranean climate and the seasonality of rainfall. These fluctuations further affect sediment load variability (standard deviation) and maxima (Fig. 8). However, the skewness of the time series of both variables (Figs. 7 and 8) does not depend on seasonality, yielding positive values throughout the year. This indicates that extreme deviations from the mean state can occur in various months (Efthimiou et al. 2022).

Fig. 7

Monthly statistics of discharge (Q, m3 s-1) measurements (a) median, (b) standard deviation, (c) maximum, (d) skewness

Fig. 8

Monthly statistics of suspended sediment discharge (Qs, kg s-1) measurements (a) median, (b) standard deviation, (c) maximum, (d) skewness

4.2 Sediment Rating Curves Deciphering

The simple rating curve (method i) uses a single equation to describe the entire range of measured Q (base flow, flood events). At the Grevena bridge station (Fig. 16(a)) the method underestimated the high Qs records and overestimated the low ones, i.e., applying high Q values to its formula yielded simulated Qs lower than the observed ones, and vice versa. The former are depicted by the points above the line (wash load, few events), and the latter by those below it (base flow). Comparatively, the underprediction (wash load conditions, where most of the annual sediment load is delivered) is notably greater and more critical for the total load estimation than the overprediction (Horowitz 2003; Cox et al. 2008). This behavior is ascribed to (i) the inadequacy of the regression model to describe the data population (Freund et al. 2006), in other words, the inability of a single-segment line fit to incorporate the factors known to affect sediment transport, and (ii) the effect of the Q-Qs sampling program (the majority of records taken during low flows) on the slope of the least squares curve. The coefficient of determination (r2), i.e., the share of the variance of the data explained by the regression, is rather low (51%), indicating a poor correlation between Q (independent variable) and Qs.
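As a minimal sketch of method (i), assuming the common power-law form Qs = a·Q^b fitted by ordinary least squares on log-transformed values, the fit and its r2 could be computed as below. The function names and the sample pairs are hypothetical and are not the Grevena bridge records.

```python
import math

def fit_rating_curve(q, qs):
    """Fit Qs = a * Q**b by least squares on log10-transformed pairs.

    Returns (a, b, r2), where r2 refers to the log-log regression.
    Requires at least two distinct Q values and two distinct Qs values.
    """
    x = [math.log10(v) for v in q]
    y = [math.log10(v) for v in qs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope of the log-log line (rating exponent)
    log_a = my - b * mx            # intercept of the log-log line
    r2 = sxy ** 2 / (sxx * syy)    # coefficient of determination
    return 10 ** log_a, b, r2

def predict_qs(a, b, q):
    """Evaluate the fitted curve for a discharge value q."""
    return a * q ** b

# Hypothetical Q (m3 s-1) and Qs (kg s-1) pairs, for illustration only.
q_obs = [2.0, 5.0, 12.0, 30.0, 80.0]
qs_obs = [0.05, 0.3, 1.1, 6.0, 40.0]
a, b, r2 = fit_rating_curve(q_obs, qs_obs)
print(f"a={a:.4f}, b={b:.2f}, r2={r2:.2f}")
```

Because a single line is fitted to all pairs, a sample dominated by low-flow records pulls the slope towards the low-flow behaviour, which is one way to read the underprediction at high Q described above.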

The different ratings for the dry and wet season of the year (method ii) cluster the Q-Qs pairs according to their recording period. Aside from Q and Qs, the method considers (indirectly) several temporally variable parameters, such as water temperature, storm type, vegetation cover, and the erosion process (Mimikou 1982; Koutsoyiannis and Tarla 1987). At the Grevena bridge station (Fig. 16(b)), of the 47 pair measurements 11 were apportioned to the dry season (June to November) and 36 to the wet one (December to May). For every Q value, the dry season segment yields higher Qs than the wet season one. This is ascribed to (i) the greater erosive potential of a dry season precipitation event (that has conveyed an equal volume of Q), and (ii) the watershed’s higher sediment availability, due to physical, atmospheric, etc., conditions, e.g., high temperatures that lead to increased evapotranspiration, low soil moisture, and eventually dry and fragmented soil. Additionally, the wet season yields the highest mean and maximum Q [in this case 32.26 m3 s−1 and 140.71 m3 s−1 (13/02/1980), respectively], while the dry season has the lowest minimum Q (1.29 m3 s−1). The Q patterns are driven by the seasonality of rainfall and snowmelt, with Qs following these cycles. Furthermore, the exponent of the wet season segment is higher than that of the dry season (1.51 instead of 1.30), denoting more erosive power and more efficient transport capacity of the flow, and of runoff (in terms of supply rate and degree) (see § 2.3). The r2 index is rather low in both periods (wet: 37%, dry: 58%), with the Q-Qs pairs displaying greater scatter in the wet season cluster.
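The seasonal stratification can be sketched as follows, assuming each record carries its sampling month and that each season is fitted with its own log-log least squares line. The month grouping follows the text (June to November dry, December to May wet); the function names and sample records are hypothetical.

```python
import math

DRY_MONTHS = {6, 7, 8, 9, 10, 11}  # June to November, as in the text

def loglog_fit(pairs):
    """Least-squares fit of log10(Qs) = log10(a) + b*log10(Q); returns (a, b).

    Needs at least two pairs with distinct Q values.
    """
    x = [math.log10(q) for q, _ in pairs]
    y = [math.log10(qs) for _, qs in pairs]
    n = len(pairs)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return 10 ** (my - b * mx), b

def seasonal_ratings(records):
    """records: iterable of (month, Q, Qs). Returns {'dry': (a, b), 'wet': (a, b)}."""
    groups = {"dry": [], "wet": []}
    for month, q, qs in records:
        season = "dry" if month in DRY_MONTHS else "wet"
        groups[season].append((q, qs))
    return {season: loglog_fit(pairs) for season, pairs in groups.items()}

# Hypothetical (month, Q in m3 s-1, Qs in kg s-1) records, for illustration only.
sample = [
    (7, 2.0, 0.3), (8, 1.5, 0.2), (10, 6.0, 1.4),    # dry season
    (1, 30.0, 4.0), (2, 90.0, 22.0), (4, 15.0, 1.2), # wet season
]
for season, (a, b) in seasonal_ratings(sample).items():
    print(f"{season}: a={a:.4f}, b={b:.2f}")
```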

The hydrographic classification (method iii) stratifies the Q-Qs pairs according to the magnitude of flow. At the Grevena bridge station (Fig. 16(c)), the IP, i.e., the threshold between low and high flows, is calculated as 21.91 m3 s−1, given the average daily Q (18.25 m3 s−1) for the period 1962–1988 (Table 2). Of the 47 pair measurements 25 were grouped into the high flows cluster (Q > IP, rising limb of the hydrograph) and 22 into the low flows one (Q ≤ IP, falling limb of the hydrograph). For every Q value, the rising limb yields higher Qs than the falling one. Furthermore, the slope of the segment above the IP is steeper, denoting that Qs increases faster in this value range. This implies the presence of different sources of sediments, e.g., bank erosion, washed away deposits of bed material piled up at low flow periods, riverbed deterioration, upland supply, the ‘production’ of which occurs during intense floods and Q rates that far exceed the IP. The r2 index of both clusters is low, especially that of the rising limb (18%).

The broken line interpolation (method iv) also stratifies the Q-Qs pairs according to the magnitude of flow, yet following a different approach (see § 3.1). At the Grevena bridge station (Fig. 16(d)), a single IP is calculated as 22.00 m3 s−1, leading to the formation of two discharge classes. It so happens that the IP in methods (iii) and (iv) is almost identical (21.91 m3 s−1 ≈ 22.00 m3 s−1), hence they both behave similarly, i.e., (a) of the 47 pair measurements 25 were grouped into the high Q cluster and 22 into the low Q one, (b) the respective segments are described by the same mathematical formulas, (c) the slope of the segment above the IP is subject to the same causes and limitations as the one of method (iii), (d) the r2 indices are equally low. The two-segment line was selected based on the physical characteristics of the watercourse (and the study area in general). In gravel bed rivers the riverbed surface forms an ‘armor’ layer, which is coarser than both the substrate and the suspended sediments. Past a certain Q value, the layer starts to deteriorate, or gets completely washed out. During the flood event, the exceedance of this threshold causes the exposure and mobilization of a larger range of particle sizes and a notable increase of the transport rate. Under such conditions, Qs is further enhanced by other processes/sources of sediment, such as bank erosion. Conversely, below this threshold no exchange occurs between the suspended and the ‘shielded’ material. The fact that the IP is higher than the mean daily Q value (18.25 m3 s−1) implies that the armor layer breaks up less frequently, and extreme floods contribute more to the long-term Qs than ordinary runoff events.
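The procedure used in this study to place the breakpoint is the one described in § 3.1 and is not reproduced here. Purely as an illustration of how a two-segment rating can be located, the sketch below assumes a generic grid search: every admissible split of the Q-sorted, log-transformed pairs is tried, a separate straight line is fitted to each side, and the split with the smallest total squared error is kept. All names and data are hypothetical.

```python
import math

def _sse_of_fit(points):
    """Sum of squared residuals of a least-squares straight line through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return float("inf")  # a line cannot be fitted to identical x values
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in points)

def find_breakpoint(q, qs, min_points=3):
    """Return the Q value that best splits the log-log cloud into two linear segments.

    Returns None when there are too few pairs for two segments of min_points each.
    """
    pts = sorted((math.log10(qi), math.log10(qsi)) for qi, qsi in zip(q, qs))
    best_sse, best_break = float("inf"), None
    for i in range(min_points, len(pts) - min_points + 1):
        sse = _sse_of_fit(pts[:i]) + _sse_of_fit(pts[i:])
        if sse < best_sse:
            best_sse = sse
            # Place the breakpoint midway (in log space) between the neighbouring samples.
            best_break = 10 ** ((pts[i - 1][0] + pts[i][0]) / 2)
    return best_break

# Hypothetical pairs: a gentle low-flow segment followed by a steeper high-flow one.
q_demo = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
qs_demo = [0.1 * v if v <= 8.0 else 0.001 * v ** 3 for v in q_demo]
print(f"breakpoint near Q = {find_breakpoint(q_demo, qs_demo):.1f} m3 s-1")
```

In the synthetic example the two underlying lines cross between the samples at Q = 8 and Q = 16, so the search should return a value inside that interval.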

The interpretation of the curve form and placement at the other sites is analogous to that of Grevena bridge. In brief:

  • Moni Ilarion, Aliakmon.

The average daily Q for the period 1962-82 is 51.95 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 62.33 m3 s−1, and the threshold of method (iv) is 66.00 m3 s−1 (Table 2).

  • Moni Prodromou, Aliakmon.

The average daily Q for the period 1962-71 is 72.60 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 87.12 m3 s−1, and the threshold of method (iv) is 80.00 m3 s−1 (Table 2).

  • Velventos, Aliakmon.

The average daily Q for the period 1962-70 is 72.68 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 87.21 m3 s−1, and the threshold of method (iv) is 64.00 m3 s−1 (Table 2).

  • Avlaki dam, upper Acheloos.

The average daily Q for the period 1950-94 is 52.40 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 62.88 m3 s−1, and the threshold of method (iv) is 60.00 m3 s−1 (Table 2).

  • Arta bridge, Arachthos.

The average daily Q for the period 1962-75 is 64.62 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 77.54 m3 s−1, and the threshold of method (iv) is 100.00 m3 s−1 (Table 2).

  • Plaka bridge, Arachthos.

The average daily Q for the period 1961-79 is 41.81 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 50.17 m3 s−1, and the threshold of method (iv) is 60.00 m3 s−1 (Table 2).

  • Tsimovo bridge, Arachthos.

The average daily Q for the period 1964-01 is 18.07 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 21.68 m3 s−1, and the threshold of method (iv) is 40.00 m3 s−1 (Table 2).

  • Gogo bridge, Arachthos.

The average daily Q for the period 1965-75 is 10.87 m3 s−1, the IP of method (iii) (1.2×Qavg) that demarcates low from high flows is 13.04 m3 s−1, and the threshold of method (iv) is 20.00 m3 s−1 (Table 2).
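The method (iii) thresholds listed above follow from the stated rule IP = 1.2 × Qavg. As a short arithmetic check, the sketch below recomputes them from the mean daily Q values quoted in the text and classifies a discharge as low or high flow (Q ≤ IP or Q > IP). The function names are hypothetical.

```python
IP_FACTOR = 1.2  # multiplier quoted in the text for method (iii)

# station: (mean daily Q, quoted IP of method iii), both in m3 s-1, as given in the text
STATIONS = {
    "Grevena bridge": (18.25, 21.91),
    "Moni Ilarion": (51.95, 62.33),
    "Moni Prodromou": (72.60, 87.12),
    "Velventos": (72.68, 87.21),
    "Avlaki dam": (52.40, 62.88),
    "Arta bridge": (64.62, 77.54),
    "Plaka bridge": (41.81, 50.17),
    "Tsimovo bridge": (18.07, 21.68),
    "Gogo bridge": (10.87, 13.04),
}

def inflection_point(q_avg, factor=IP_FACTOR):
    """Threshold between low and high flows for method (iii)."""
    return factor * q_avg

def flow_class(q, ip):
    """'high' for Q > IP, 'low' for Q <= IP, matching the clusters in the text."""
    return "high" if q > ip else "low"

for name, (q_avg, ip_quoted) in STATIONS.items():
    ip = inflection_point(q_avg)
    print(f"{name}: computed IP = {ip:.2f}, quoted IP = {ip_quoted:.2f}")
```

The recomputed values agree with the quoted ones to within about 0.01 m3 s−1; the small residuals presumably reflect rounding of the quoted averages.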

In some cases, e.g., Moni Ilarion (methods iia, iib), Velventos (methods iia, iib), etc., r2 is sufficiently high, reaching its maximum value at Arta bridge (88%, method iia). According to Koutsoyiannis and Tarla (1987) this is because Q ‘embodies’ both the hydraulic characteristics of a cross section and the hydrological attributes of the watershed. Arguably, the relationship may be largely spurious (Benson 1965), considering that Q is, at the same time, the independent variable (instantaneous point measurement) and part of the dependent variable (product of Q with mean SSC). Hence the goodness of fit might be rather deceptive (McBean and Al-Nassri 1988).

4.3 Validation of Simulated Suspended Sediment Discharge

The level of convergence between monthly observed and modelled time series (Figs. 25, 26, 27, 28, 29, 30, 31, 32 and 33) was evaluated using the NSE index. In general, all methods are weak in simulating Qs extremes at all stations. Surprisingly, the negative deviations (underprediction) remain even after the application of the BCF. Apparently, the correction forces the curve to shift towards higher loads (see § 3.2), leading to a relatively better fit at high flows. Yet, the low and medium Qs get overestimated. The poor fit is attributed to (i) the rather high BCF values, resulting from the obvious scatter (high standard deviation) of the error terms of the log-linear regression per sample/population, (ii) the limited ability of the BCF to provide reliable Qs records (Ashmore 1986; Koch and Smillie 1986), and (iii) the fact that bias is not the dominant cause of inaccuracy (Walling and Webb 1988), compared with more significant sources of error not reflected in the BCF, such as the population scatter, the seasonal fluctuations of the rating relationship, hysteretic and exhaustion effects, etc. (Walling 1977). Conversely, the total sediment volume is assessed quite satisfactorily. Indicatively, at Grevena bridge (Fig. 25) method (i) continuously underestimated the observed Qs, methods (iii) and (iv) performed similarly (irregular case where the respective low and high flow segments are described by the same equations, given the almost identical IP value), continuously overpredicting the measured data, while method (ii) performed exceptionally well, successfully simulating all outliers. Overall, in this analysis, there is no distinct trend of over- or under-prediction by any method.
The NSE values indicate that method (i) performs best at Moni Ilarion (0.93), Moni Prodromou (0.98), and (comparatively) Arta bridge (0.19), method (ii) is prevalent at Grevena bridge (0.99) and Gogo bridge (0.74), method (iii) suits Velventos (0.95) and Plaka bridge (0.66), and method (iv) best fits Avlaki dam (0.74) and Tsimovo bridge (0.37).
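For reference, the Nash-Sutcliffe efficiency compares the squared simulation errors with the variance of the observations around their mean. A minimal sketch follows; the function name and the sample series are hypothetical, not the station data.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    A value of 1 is a perfect match, 0 means the model is only as good as
    the mean of the observations, and negative values are worse than that mean.
    """
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

# Hypothetical monthly Qs series (kg s-1), for illustration only.
qs_observed = [12.0, 30.0, 8.0, 2.0, 0.5, 45.0]
qs_simulated = [10.0, 26.0, 9.0, 3.0, 1.0, 38.0]
print(f"NSE = {nse(qs_observed, qs_simulated):.2f}")
```

Since the index is dominated by the largest squared errors, a method that misses a few Qs extremes can score poorly even when it reproduces low and medium loads well.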

The correlation of the monthly observed and modelled Qs was additionally assessed using least squares fitting (Figs. 34, 35, 36, 37, 38, 39, 40, 41 and 42). The observed data were placed on the y-axis and the simulated values on the x-axis (Pineiro et al. 2008). Each plot depicts the y = x line (45 degrees or slope equal to 1) in black, the trendline (y = a + b x) in red, and the regression formulas (a is the intercept, b the slope, r2 the coefficient of determination). Indicatively, at Grevena bridge (Fig. 34) the r2 values are extremely high for all methods, denoting minimum variance of the data pairs. Slope values > 1 (Fig. 36(a)/ method (i), 36(b)/ method (ii)) indicate underestimation and < 1 (Fig. 36(c)/ method (iii), 36(d)/ method (iv)) overestimation. Furthermore, at low flows the scatter is low, and most pairs are pinned close to the regression line. As Q increases, the dispersion widens, and the underprediction grows proportionally. This pattern, however, is not consistent. Methods (iii) and (iv), despite their overestimating trend, keep all pairs close to the regression line throughout the Q range. Hence, they are more reliable in predicting the variable’s actual values, even at high Q where greater uncertainty occurs. The interpretation of the curve form and placement at the other sites is analogous to that of Grevena bridge.
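The observed-versus-simulated regression can be sketched as below, with the observed values on the y-axis and the simulated ones on the x-axis as described. The function name and data are hypothetical; reading the slope as under- or overestimation assumes an intercept close to zero.

```python
def observed_vs_simulated(simulated, observed):
    """Fit observed = a + b * simulated by least squares; return (a, b, r2)."""
    n = len(simulated)
    mx = sum(simulated) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in simulated)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(simulated, observed))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)
    return a, b, r2

def bias_direction(slope, tol=0.05):
    """Slope > 1: observed exceeds simulated (underestimation); slope < 1: overestimation."""
    if slope > 1 + tol:
        return "underestimation"
    if slope < 1 - tol:
        return "overestimation"
    return "close to the 1:1 line"

# Hypothetical monthly Qs values (kg s-1), for illustration only.
sim = [1.0, 4.0, 9.0, 20.0, 35.0]
obs = [1.2, 5.5, 13.0, 31.0, 52.0]
a, b, r2 = observed_vs_simulated(sim, obs)
print(f"a={a:.2f}, b={b:.2f}, r2={r2:.2f}: {bias_direction(b)}")
```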

At the mean monthly scale (Fig. 2), inverted bell curves depict the datasets’ fluctuation. Every method yields the highest Qs values at all stations in the winter and the lowest in the summer. The minimal Qs from June to October is ascribed to the warm period’s low flow values (see § 4.1.2). All methods deliver the lowest Qs at all stations in August. Positive deviations denote that the simulated Qs is higher than the observed one, and vice versa. Indicatively, at Grevena bridge, methods (iii) and (iv) continuously overpredicted Qs, displaying positive divergence from the measured records, method (ii) followed the same trend but with smaller deviations, and method (i) was inconsistent, with positive deviations from March to October and underprediction in the remaining months. Overall, all simulated values are considered similar, and in accordance with the observed ones. The interpretation of the curve form and placement at the other sites is analogous to that of Grevena bridge.

Although the observed Qs (Fig. 3) displays strong intra-annual variability, there is satisfactory convergence between measured and modelled time series at most stations. Indicatively, at Grevena bridge, as in the case of the mean monthly time step, methods (iii) and (iv) continuously overpredicted Qs, method (ii) also displayed constant positive divergence, yet it was comparatively more accurate (smaller deviations), and method (i) was again inconsistent (no specific trend was detected).

4.4 Advantages of SRCs and Reasons of Inaccuracy in Sediment Load Estimates

SRCs can produce synthetic Qs records of satisfactory validity at cross-sections with limited inputs (Efthimiou 2019; Efthimiou et al. 2022). The method allows the rather simple development of a continuous, detailed series (at the resolution of Q) with high interpolation (filling gaps in disrupted historical series), extrapolation (extending short and/or outdated series) and reproducibility potential, without the requirement (Footnote 8) of a regular sampling program. The reduced sampling intensity, a typical characteristic of an extrapolation (Footnote 9) procedure, allows for a straightforward, cost-effective application, particularly where financial and labor resources are poor (Walling 1977).

However, several researchers have identified the notably greater error (80–900%) involved in sediment load estimates by SRC against field observations (Dickinson 1981; Ferguson 1986; Walling and Webb 1988; Asselman 2000; Horowitz 2003). A major source of inaccuracy is the limitations of the Q, Qs sampling program (Douglas 1971; Horowitz 2003). In this study, the elevated cost and the technical difficulties entailed forced the Greek PPC to omit bed load sampling and the measurement of other critical parameters, i.e., suspended load granulometry, per depth concentration distribution, etc. Furthermore, most Q-Qs pairs were taken in low and normal flow conditions (on the rising and falling limb of the runoff hydrograph, and definitely not around its peaks), even though in Mediterranean watersheds the majority of the annual wash load volume is delivered by a few flood events. In other words, the program was established on hydrological criteria rather than a systematic calendar-based approach. This means that the records were not representative of a random sample of the population (Ferguson 1986). The temporal infrequency and the long non-recording intervals hindered the program’s capability to fully describe sediment transport in the surveyed rivers. Similarly, the Q dataset failed to represent the entire range of possible flows, since the available mean daily Q data cannot describe flood conditions, which require measurements of finer resolution. Finally, the operation of the Delft bottle induced errors in the sampling process, providing only a rough estimate of the sand transport. These are quite important, since they can reach 50% for single samples, even after the implementation of correction measures.
Specifically, (i) the intake velocity deviates from that of the local flow, (ii) the instrument can only collect a fraction of the total sediment volume, namely the coarse (> 100 μm) suspended particles that approximate a capacity load (not supply-controlled bed material load), while the non-capacity loads consisting of particles finer than 100 μm, such as the suspended silt (63–2 μm) and clay (< 2 μm) fractions, are disregarded (Dijkman 1978, 1981), (iii) during the submerging and emerging of the bottle superfluous particles enter the nozzle, and (iv) a small part of the sample is lost during the removal of the sand-catch.

The quality of estimates is equally affected by the limitations of the SRC, namely the stationarity of the rating parameters, and errors ascribed to other intrinsic determinants like the population scatter, the statistical bias, and hysteretic and exhaustion effects. To further elaborate, the non-temporal character of the a and b coefficients implies a permanent relationship between Q and Qs, which was also adopted in this study to justify the seamless use of the rating formulas throughout the application period. However, this hypothesis does not hold during flood events, and especially in intra-annual studies. During floods, Q and Qs fluctuate intensely (Rieger et al. 1988; Walling and Webb 1988), while other sources of sediment, e.g., bank erosion, scouring, etc., must be considered. Hence, in such conditions, the predictability of the curves is rather poor (Blanco et al. 2010). This nonlinearity presents challenges in developing universally reliable models for sediment management (Wright and Parker 2004). Moreover, the hypothesis is violated in natural watersheds, where changes in precipitation (due to seasonal patterns, climate crisis, etc.), land cover (due to urbanization, wildfires, construction sites, etc.), landscape morphology (due to landslides, earthquakes, etc.), channel morphology (the aggradation/degradation cycle), and flow fluctuation occur over longer periods of time (Kuhnle et al. 1996; Prestegaard 1988; Yang et al. 2007).

Hysteresis is the lack of temporal homogeneity between Qs for the same Q on the rising and falling limb of a hydrograph (Walling 1977). The greater concentration, i.e., by two or more orders of magnitude (Walling 1977), is observed on the rising limb (Gurnell 1987; Kronvang et al. 1997). Specifically, gravity causes a downward transport and storage of suspended sediments at low Q. As Q rises, and especially during flood events, the turbulence of the flow forces an upward transport. These observations, though, may not be proportional to the paired flow rate records. The phenomenon could also indicate different runoff and erosion types (Seeger et al. 2004), and sediment delivery and source area (Klein 1984; Williams 1989; DiCenzo and Luk 1997). Apparently, the knowledge of sediment sources within a drainage canal, e.g., when its morphology is altered (Juez et al. 2018), is essential for the accurate calculation of the sediment budget. In this study, however, the lack of data prevented such analysis. Overall, the hysteresis phenomenon is difficult to decipher using SRC, even when different segments/data stratification per limb is employed.

The scatter of the Q-Qs population, which largely defines the slope of the trendline, is associated with the features of the sampling program (recording frequency, flow and SSC conditions, etc.), the rating curve plot, the seasonality of the rating relationships, and the lag between SSC and Q response (Tanaka et al. 1983; Walling and Webb 1988; Roberts 1997; Gao et al. 2007). According to Walling (1977), the dispersion of the Q-Qs pairs can be interpreted by stream hydraulics, since in natural watercourses suspended sediment load is actually a non-capacity load. Such dispersion could also be ascribed to the hysteresis effect, of short (within events) or longer (between events) duration.

The back-transformation of the Q, Qs logarithms to arithmetic values entails a statistical bias, which is an appreciable cause of inaccuracy leading to underpredictions. The application of the BCF reduces to a large extent the deviation between observed and modelled Qs (Ferguson 1986; Hansen and Bray 1987), yet it does not eliminate it. Overall, concerns are raised about the merits of the BCF application for bias removal (see § 4.3), the extent to which bias is responsible for sediment load estimation errors (see § 4.3), and the problems arising in the attempt to analyze long-term Qs trends due to the interannual variability of load estimates (Walling and Webb 1988).
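The BCF applied in this study is the one defined in § 3.2. As an illustration only, the sketch below assumes one widely used form, the correction of Ferguson (1986), exp(2.651 s²), where s² is the residual variance of the log10 regression and 2.651 ≈ (ln 10)²/2. The function names and sample values are hypothetical.

```python
import math

def ferguson_bcf(q, qs, a, b):
    """Bias correction factor exp(((ln 10)^2 / 2) * s^2) for a curve Qs = a * Q**b.

    s^2 is the residual variance of the log10 regression, using n - 2
    degrees of freedom. This form is an assumption; see the lead-in.
    """
    residuals = [
        math.log10(qs_i) - (math.log10(a) + b * math.log10(q_i))
        for q_i, qs_i in zip(q, qs)
    ]
    s2 = sum(r * r for r in residuals) / (len(residuals) - 2)
    return math.exp((math.log(10) ** 2 / 2) * s2)  # coefficient is about 2.651

def corrected_qs(a, b, q, bcf):
    """Back-transformed prediction multiplied by the correction factor."""
    return bcf * a * q ** b

# Hypothetical pairs scattered around Qs = 0.02 * Q**1.5, for illustration only.
q_obs = [2.0, 5.0, 12.0, 30.0, 80.0]
qs_obs = [0.09, 0.15, 1.3, 2.4, 20.0]
bcf = ferguson_bcf(q_obs, qs_obs, a=0.02, b=1.5)
print(f"BCF = {bcf:.3f}, corrected Qs at Q=50: {corrected_qs(0.02, 1.5, 50.0, bcf):.2f}")
```

The factor is always at least 1 and grows with the scatter of the log residuals, which is consistent with the observation in § 4.3 that widely scattered samples produce high BCF values.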

Size range and other intrinsic characteristics of the watersheds, e.g., the hydrological regime, affect the relative accuracy of the estimates between them. For example, the SRC does not perform well in small basins (Walling and Webb 1988). Furthermore, in this study, the Aliakmon river mainstem is modified, and the flow is regulated by a cascade of five dams. In this way, suspended sediment routing and deposition within the watercourse are altered, which affects both the SRC development (Q-Qs pairs) and its application (Q implementation). Besides, SRCs are often specific to individual rivers or even locations within a river system. Variability in watershed characteristics such as geology, vegetation, land use, and slope can all influence sediment transport, making it difficult to transfer the curves from one site to another without recalibration. This limits their broader applicability, requiring continuous monitoring and adjustments for changing conditions at each location (Horowitz 2003).

Other sources of inaccuracy are erroneous Q measurements (Achite and Ouillon 2007; Crowder et al. 2007), the incorrect application of Q to the rating formulas (Colby 1956; Gregory and Walling 1973), and the use of mean daily Q to describe the flow range applied to an SRC (this could result in notable underprediction of the daily load, especially in the case of floods). The lack of high-resolution data can also hinder efforts to accurately model the effects of land use changes or climate variability on sediment dynamics (Ferguson 1986).
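The underprediction caused by mean daily Q can be illustrated with a small numerical sketch. It assumes a power-law rating with exponent b > 1, in which case the curve evaluated at the mean flow is lower than the mean of the curve evaluated at the individual flows. The coefficients and the sub-daily discharges below are hypothetical.

```python
def load_from_mean_q(a, b, q_series):
    """Qs obtained by applying the rating curve to the mean discharge."""
    q_mean = sum(q_series) / len(q_series)
    return a * q_mean ** b

def mean_of_loads(a, b, q_series):
    """Mean Qs obtained by applying the rating curve to each discharge value."""
    return sum(a * q ** b for q in q_series) / len(q_series)

# Hypothetical rating coefficients and four sub-daily Q values (m3 s-1)
# on a day with a short flood peak, for illustration only.
a, b = 0.02, 1.5
q_day = [10.0, 10.0, 200.0, 40.0]

from_mean = load_from_mean_q(a, b, q_day)
from_values = mean_of_loads(a, b, q_day)
print(f"Qs from mean daily Q: {from_mean:.2f} kg s-1")
print(f"mean Qs from sub-daily Q: {from_values:.2f} kg s-1")
print(f"underprediction: {100 * (1 - from_mean / from_values):.0f}%")
```

On a day with steady flow the two values coincide; the gap opens only when Q varies within the day, which is why the effect is most pronounced during floods.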

4.5 Implications for River Management

SRCs are an essential tool in river management. The valid quantification of a natural watercourse’s sedimentary budget under different flow conditions is critical to policy makers and stakeholders for the implementation of efficient management strategies, including sediment transfer control, water quality restoration, improved reservoir operations, and effective flood mitigation.

  • Predicting sediment transport: Excessive sedimentation can lead to reduced (active and/or flood) water storage capacity of reservoirs over time, equipment damage, and pressure exertion on the base of the dam (Schleiss et al. 2016). Understanding sediment dynamics enables the compilation of dredging schedules and sediment traps, and other solutions that optimize their performance and longevity, such as dead volume storage design, regulation planning (return period; exceedance risk), revision of operations (water storage; release time) (Molino et al. 2023).

  • Informing infrastructure design: With knowledge of sediment loads, managers can design and operate other hydraulic structures like levees and flood channels more effectively. For example, levees can be constructed to withstand erosional forces caused by high sediment flows. This is crucial in regions prone to high sediment loads, where river infrastructure may deteriorate more rapidly without proper consideration of sediment dynamics (Brandt 2000).

  • Flood risk management: By predicting sediment deposition and erosion, rating curves help improve flood modelling, preventing riverbed aggradation and enhancing flood prevention efforts (Horowitz 2003). When coupled with hydrologic models, they improve the ability to forecast the impacts of floods on downstream infrastructure and communities.

  • Water quality and ecosystem health: Suspended sediment can carry pollutants, nutrients, and organic matter. High sediment loads may reduce water quality, impair aquatic habitats, and harm fish populations by smothering spawning grounds. Moreover, drinking water quality declines, raising treatment costs, and recreational activities are restricted. SRCs help identify critical periods when sediment loads are likely to spike, allowing for better timing of interventions aimed at preserving ecosystem health (Collins and Walling 2007).

  • Long-term monitoring and adaptive management: SRCs support continuous monitoring of sediment dynamics, which is crucial for adapting management strategies to changing environmental conditions, such as climate variability or land use changes.

4.6 Future Research

Through improved data collection, integration with advanced modelling, and the use of Artificial Intelligence (AI), SRCs can better predict sediment transport and enhance river management under changing environmental conditions.

Indicative suggestions include:

  • Improved data collection techniques: Traditional SRCs rely on periodic sampling, which may miss short-term or extreme events. Advances in real-time monitoring and remote sensing technologies (Zahiri et al. 2020), such as acoustic or optical sensors, can help collect continuous sediment and flow data, improving the accuracy of sediment load predictions, particularly during storms or floods (Gray and Gartner 2009).

  • Event-based calibration: Given that sediment transport often spikes during extreme weather events, there is a need to focus on event-based sampling (Gupta et al. 2019) to capture these anomalies more effectively. This will help develop more robust SRCs that account for variability during high-flow conditions, which traditional methods often underestimate (Asselman 2000).

  • Integration with predictive models: The future of sediment management lies in the integration of SRCs with hydrological, geomorphological, and land-use models. These models can consider the effects of erosion, land cover changes, and climate variability, offering a more comprehensive understanding of sediment dynamics over longer time periods (Syvitski et al. 2005).

  • Machine learning and AI: Incorporating machine learning algorithms (Kumar et al. 2019) could help improve the predictive power of sediment rating curves. By analysing large datasets and identifying patterns, AI tools could refine predictions by adapting to changing environmental conditions and complex non-linear relationships between discharge and sediment load.

  • Adaptation to the climate crisis: As the climate crisis intensifies rainfall patterns and alters river dynamics, sediment rating curves will need to be updated more frequently. Adaptive management strategies that continuously revise and recalibrate rating curves using new data are essential for addressing the increased variability in sediment transport.

  • Holistic, system-wide approaches: SRCs should be part of a broader, integrated river management approach. Combining sediment data with water quality, ecological health, and socioeconomic factors will support sustainable decision-making. This requires collaboration between hydrologists, ecologists, and policymakers.

5 Conclusions

The study provided insight into the most appropriate SRC development method for the estimation of coarse suspended sediment load at the outlet of nine Mediterranean sub-watersheds. The results were site/station dependent, i.e., no methodology emerged as universally superior. The NSE values indicate that the simple rating curve performs best at the cross-sections Moni Ilarion, Moni Prodromou, and Arta bridge, the different ratings for the dry and wet season of the year at Grevena bridge and Gogo bridge, the hydrographic classification at Velventos and Plaka bridge, and the broken line interpolation at Avlaki dam and Tsimovo bridge. In this regard, the study advocates the use of multiple SRC methods followed by the quantification of the derivative uncertainty. The results are encouraging enough to support SRCs as a reliable alternative for the assessment of suspended sediment load in data-scarce rivers.