Introduction

Detached soil particles enter the fluvial network by way of land surface, channel bed and bank erosion. The fine particles are transported in suspension (suspended load), kept aloft by the river’s turbulence, while the coarse ones move by traction and/or saltation along the stream bed (bed load). Suspended sediment concentration (SSC) is considered one of the major impairment factors of natural watercourses (US Environmental Protection Agency 1996). Over the last decades, there has been growing interest in the estimation of suspended load and its fluvial transport, related to issues such as contaminant and nutrient transport (Conrad and Saunderson 2000; Neal et al. 2006); reservoir sedimentation; channel and harbor silting; and the impact of soil erosion (Walling 1974; Sui et al. 2005), of specific sediment sources such as bank erosion and landslides (Prestegaard 1988; Laubel et al. 2000), and of the riparian zone or vegetation (Steiger et al. 2001; Nicholas 2003) on a watercourse’s sediment budget. Thus, the accurate estimation of SSC at different temporal scales becomes essential for implementing proper river and catchment management strategies (Walling 1977a; Kuhnle and Simon 2000). Yet, the frequent lack of available and, moreover, reliable field data that are representative of the river’s sedimentary regime poses a serious problem for the validation of estimates.

In Greece, only the Public Power Corporation (PPC) has conducted sediment discharge and simultaneous discharge–sediment discharge field measurements. Such measurements mainly served the design stage of dam construction projects and often stopped after their completion (the recording of raw sediment discharge data continues, but its processing and dissemination to the public have stopped). The sampling program (infrequent, unsystematic, etc.) and the measuring technique used (e.g., bed load was disregarded) had several deficiencies as well. Thus, the development of an alternative (synthetic), high-quality sediment discharge time series becomes necessary for the most accurate possible quantification of sediment yield and, by extension, of soil erosion.

Among several approaches (Wren et al. 2000; Gao 2008) {e.g., sediment modeling at watershed scale [RUSLE (Renard et al. 1991), SWAT (Arnold et al. 1993)], turbidity monitoring (Lewis 1996; Rasmussen et al. 2009), artificial intelligence techniques [fuzzy logic (Lohani et al. 2007), Artificial Neural Networks (Jain 2001; Sharma et al. 2015) etc.], automated pumping samplers (Herman et al. 2008; Gettel et al. 2011)}, the rating curve methodology can constitute a good basis for describing the sediment regime of watercourses in catchments with limited data. Under such conditions, a simple approach may perform as well as, or even better than, a comprehensive one. Several characteristics strengthen its overall status: the ability to create a continuous sediment discharge time series of low temporal resolution (equal to that of the corresponding discharge series); the potential of applying the established transport relationship to the reconstruction of long-term sediment records (stationary conditions need to be assumed); the ease of use (only the “a” and “b” rating parameters need to be estimated; sediment discharge prediction requires only discharge field measurements); and the low implementation cost {limited sediment sampling allows for cost-effective studies, especially where financial and labor resources are poor (Walling 1977a); by contrast, for turbidity monitoring and/or automatic sampling approaches, the respective devices are quite inexpensive to acquire, but their installation, calibration, operation, and maintenance costs are fairly high}.

In light of the above, the study aims to evaluate the performance of four alternative suspended sediment rating curve development methods, namely (i) simple rating curve (Asselman 2000), (ii) different ratings for the dry and wet season of the year (Walling 1974), (iii) different ratings for the rising and falling limb of the runoff hydrograph (Glysson 1987), and (iv) broken line interpolation that uses different exponents for two discharge classes (Koutsoyiannis 2000) {also referred to in the international literature as piecewise linear regression or segmented regression (Ryan and Porth 2007; Yang et al. 2016)}, at the outlet of the Venetikos River catchment, namely Grevena Bridge, located in Western Macedonia, Northern Greece. The main objective is to provide solid guidance on the selection of the most appropriate method for estimating sediment discharge (yield) at the specific gauging site (basin), as well as to properly assess such values. To this end, a statistical analysis was carried out to evaluate the performance of these methods by comparing the simulated mean monthly sediment discharge values against the observed ones (1965–1982), through a set of statistical indices {mean bias error (MBE), mean absolute error (MAE), variance of the distribution of differences (sd2), root mean square error (RMSE), root mean square error-systematic component (RMSEs), root mean square error-unsystematic component (RMSEu), index of agreement (d), coefficient of determination (R2), Nash–Sutcliffe efficiency (NSE)}. An attempt was also made to compare the resulting sediment yield estimates against those of four empirical equations (Dendy and Bolton 1976; Avendano Salas et al. 1997; Webb and Griffiths 2001; Lu et al. 2003).

The Venetikos River was selected for the purposes of the study because it is the largest and most important tributary of the Aliakmonas River (one of the major aquatic systems of Greece) and contributes to the overall development of the region by meeting water supply and irrigation needs (and, by extension, sustaining agricultural productivity). The sediment issues (production/delivery/deposition) raised by the imminent construction by the Greek PPC of the “Elafi” dam, downstream of its outlet, are of great importance (reservoir aggradation, pressure exerted on the body of the dam, etc.) and need to be addressed as well.

Materials and methods

Study area and measurements

The Venetikos River catchment is located in the southwestern part of the Western Macedonia Water District, lying almost entirely within the Grevena Prefecture (Fig. 1). The basin is mountainous (the elevation ranges from 437.76 to 2240.0 m, with a mean value of 1008.71 m), with intense topographical variations and an almost circular shape, covering an area of 855.23 km2.

Fig. 1 Study area

The vegetation cover is extensive. According to the European CORINE (Coordination of Information on the Environment) Land Cover classification (1:100,000), version 2000 (CLC 2017), approximately 80% of the area is covered by forests and semi-natural areas, especially at the upstream mountainous parts of the catchment, while the few plains and croplands (approximately 18% of the area) are mainly located towards its outlet (Efthimiou 2016).

The bedrock is mostly comprised of sedimentary formations such as limestones, conglomerates, sandstones, marls, quaternary alluvial deposits (terraces, talus cones and scree, etc.). The catchment has a dense hydrographic network including four main streams, emanating from the Eastern part of the Pindus mountain range. The main watercourse is 53.9 km long with an average slope of 1.5%. For the time period 1965–1982, mean annual rainfall and temperature were estimated equal to 1015.1 mm and 9.6 °C, respectively (Efthimiou 2016).

The Greek PPC conducted the (daily) discharge (Q, m3 s−1), simultaneous discharge-sediment discharge (47 Q–Qs pairs) (Table 1) and (monthly) sediment discharge (Qs, kg s−1) (Table 2) measurements at the outlet of the Venetikos River catchment, namely Grevena bridge.

Table 1 Simultaneous discharge–sediment discharge measurements (Q–Qs pairs)
Table 2 Sediment discharge measurements (Qs, kg s−1)

The sampling program for discharge (Q) and the Q–Qs pairs was infrequent (hydrologically based rather than on a fixed, i.e., calendar-based, interval; characterized by long periods with no measurements) and often inadequate, with a random and unsystematic character. Additionally, although the PPC continues to this day to collect primary field data, the processed data (made available to researchers) refer to periods of the recent past. The latter are considered outdated and need to be extrapolated in order to meet the needs of modern research efforts.

Regarding the sampling technique, suspended sediment load was measured on a monthly basis using a Delft Bottle type sampler (one sample per position; several positions per section). Such a sampler is used to measure suspended sediment transport in rivers and other watercourses, from the surface down to 0.1 m above the river bottom. It works on the flow-through principle: the sediment-laden water enters the intake nozzle, flows through the bottle-shaped sampler and leaves it at the back. The shape of the sampler induces a strong reduction of the flow velocity within the sampling chamber (the water enters the sampler with almost the same velocity as the undisturbed flow), “forcing” the sediment material (sand particles larger than about 100 μm) to settle there. With this instrument, the local average sand transport is measured directly and little laboratory analysis is needed, which is an advantage of the Delft Bottle method. A sampling time of at least about 5 min is required to obtain a statistically reliable result (Dijkman 1978, 1981).

Unfortunately, this methodology has several deficiencies and does not provide a complete picture of the watercourse’s sediment regime. Specifically, apart from the lack of bed load measurements (sparse or completely absent due to high cost and technical difficulties; to account for bed load, the Greek PPC increases the suspended load value by 10%, an empirical adjustment based on field experience), other attributes such as the suspended load granulometry, the distribution of concentration with depth, water temperature, etc. are not considered (Lykoudi 2000). Additionally, the majority of the simultaneous discharge–sediment discharge measurements were taken in low to medium flow conditions, while most of the annual sediment load (especially in Mediterranean countries) is transported by a few major flood events (Kronvang et al. 1997; Lykoudi and Zarris 2004). Moreover, the Delft sampler induces errors due to (i) incorrect intake velocity compared to the local flow velocity, (ii) inefficiency of the sampler in collecting sediment particles finer than 100 μm, (iii) additional collection of sediment particles during the raising and lowering of the instrument, and (iv) sediment loss during the removal of the sand catch from the Delft Bottle (http://www.coastalwiki.org/wiki/Delft_Bottle_suspended_load_sampler, Accessed 16 June 2018). Such errors can reach 50% for individual samples, even after the application of correction measures, making the Delft Bottle appropriate only for a rough estimate of the local sand transport (only of the immersed volume of the sand catch on board of the vessel).

Overall, the mean annual discharge and sediment discharge values for the time period 1965–1982 were estimated equal to 17.92 m3 s−1 and 21.48 kg s−1, respectively (Efthimiou 2016). The annual trend of the variables is presented in Fig. 2.

Fig. 2 Mean annual discharge (Q, m3 s−1) and sediment discharge (Qs, kg s−1)

Sediment rating curves

Theoretical background

The two most commonly used approaches for the estimation of suspended sediment load are the interpolation and extrapolation procedures (Walling and Webb 1981). The former is based on the hypothesis that instantaneous field measurements (sediment discharge or concentration) are representative of a longer time period (e.g., a day or a week), and thus requires a regular sampling program. In the latter, a regression analysis is performed on a limited number of such field measurements to establish a rating relationship between them and the stream discharge data. The relationship is then extrapolated over the period of interest by applying the streamflow record (usually mean daily discharge) to its mathematical equation.

A typical example of the extrapolation procedure is the sediment rating curve method. Its widespread use is due to the fairly good correlation that exists between discharge (independent variable) and suspended sediment discharge at a given river cross-section, justified by the fact that discharge reflects not only the cross-section’s hydraulic parameters but the catchment’s hydrological ones as well (Koutsoyiannis and Tarla 1987). With this method, sediment load is calculated by applying either the flow-duration (Walling and Webb 1988; Cordova and Gonzalez 1997) or the magnitude-frequency (Stow and Chang 1987; Mckee and Hossain 2002) approach. The most commonly used form of rating curve is a power function (Campbell and Bauder 1940; Mimikou 1982; Asselman 2000), given by Eq. 1.

$$ {Q}_s=a\,{Q}^{b}\,e $$
(1)

where Qs is the sediment discharge in M T−1 units (usually in kg s−1), Q is the river discharge in V T−1 units (usually in m3 s−1), “a” is the rating coefficient (kg s−1), “b” is the rating exponent (dimensionless; normally assigned values between 0.5 and 3) and “e” is the multiplicative error term, which (theoretically) follows a log-normal distribution, i.e., its logarithm is normally (Gaussian) distributed (Walling 1977a).
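The fitting of Eq. 1 can be illustrated with a minimal Python sketch. It assumes ordinary least squares on base-10 logarithms of the Q–Qs pairs; the function name `fit_rating_curve` and the synthetic, noise-free demonstration values are hypothetical and are not the Grevena bridge measurements.

```python
import math

def fit_rating_curve(q, qs):
    """Fit Qs = a * Q**b by least squares on log10-transformed pairs.

    Returns (a, b). No bias correction is applied at this stage.
    """
    x = [math.log10(v) for v in q]
    y = [math.log10(v) for v in qs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                        # slope of the log-log line
    a = 10 ** (mean_y - b * mean_x)      # intercept, back-transformed
    return a, b

# Synthetic pairs generated from a = 0.05, b = 1.5 (illustrative only)
q_demo = [2.0, 5.0, 10.0, 20.0, 50.0]
qs_demo = [0.05 * v ** 1.5 for v in q_demo]
a_hat, b_hat = fit_rating_curve(q_demo, qs_demo)
```

Because the demonstration pairs contain no scatter, the fit recovers the generating parameters; with field data the residuals of this regression are what the bias correction later operates on.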

According to Peters-Kummerly (1973), the “a” coefficient is an index of the catchment’s erodibility (high values denote the presence of erodible geological formations), while the “b” exponent is a measure of the erosive and transport capacity of the river flow. The higher the “b” value, the more effective the transport capacity; in this state, even a small increase in river discharge can significantly increase sediment discharge {steep positive, low or level, and negative gradients indicate that, as streamflow increases, SSC increases at a fast pace, increases at a moderate pace, or is diluted (due to limited sediment supply), respectively (Ellison et al. 2014)}. The range of “b” values can also be ascribed to the dynamics of different storm events (Tanaka et al. 1983), the variations that characterize the rising and falling limbs of hydrographs (Park 1992), or the erodibility variations between different regions (dry and wet areas) and the extent of new sediment sources (Walling 1974; Mimikou 1982; Kesel 1989; Syvitski et al. 2000).

Sediment rating curves, while widely used for estimating sediment yield, have drawn serious criticism. A main disadvantage is the fixed values of the “a” and “b” parameters, which imply a permanent relationship between Q and Qs. Stationarity, however, does not hold during flood events: apart from the temporal variation of concentration and discharge (Rieger et al. 1988; Walling and Webb 1988), additional sources of sediment, such as bank erosion, need to be considered. Thus, during such events the rating curve performs very poorly in terms of load prediction (Blanco et al. 2010). Additionally, sediment yield estimates based on rating curve calculations involve greater error (ranging approximately from −80 to 900%) than those from direct measurements (Walling 1977a; Dickinson 1981; Ferguson 1986; Walling and Webb 1988; Singh and Durgunoglu 1989; Asselman 2000; Horowitz 2003). The accuracy is affected by the development method of the rating relationship (the logarithmic transformation of the Q, Qs data inherently contains a statistical bias), and by the scatter of the Q–Qs pairs around the regression line associated with such method (Walling 1977a; Walling and Webb 1981; Tanaka et al. 1983; Kronvang et al. 1997; Gao et al. 2007). Walling (1977a) states that the main hydraulic reason for the scatter is that suspended sediment load in natural rivers is essentially a non-capacity load. The latter is also affected by the sampling frequency, i.e., the number and location (high, low discharges) of the available Q–Qs pairs (Walling and Webb 1981; Roberts 1997).

According to Walling and Teed (1971), the scatter can be interpreted by short- (within events) and long-term (between events) hysteresis effects, i.e., a lack of temporal homogeneity characterized by different sediment concentrations for discharges of equivalent magnitude on the rising {greater Qs; could deviate by two orders of magnitude or more (Walling 1977a)} and falling limbs of a hydrograph (Gurnell 1987; Kronvang et al. 1997). Hysteresis may be a significant index of different runoff and erosion types (Seeger et al. 2004), sediment delivery and source area (Klein 1984; Williams 1989; DiCenzo and Luk 1997). In light of the above, sediment discharge depends greatly on the basin’s upstream sediment supply and availability, i.e., transport and deposition mechanisms related to geological and geomorphological conditions, patterns of tributary inflows, exhaustion effects, soil type, land use/cover type, seasonality {soil moisture and vegetation conditions, base-flow magnitude, rainfall characteristics (distribution, movement), water temperature (affects viscosity and, by extension, sediment transport)}, and less on the river’s discharge. Potential sources of error in the assessment of sediment load related to the rating curve methodology are inaccuracies in streamflow and sediment sampling (Loughran 1971; Douglas 1971) and the incorrect application of the flow data to the rating relationship, which could cause underestimation of loads by 50% or more (Colby 1956; Gregory and Walling 1973). The definition of outliers is an equally important factor in determining sediment rating curves. Finally, the method does not perform well in small catchments (Walling and Webb 1988).

To compensate for the uncertainty related to sediment rating curves, various modifications have been introduced. These include the division of the available Q–Qs pairs into seasonal, i.e., wet/dry season (Walling 1974, 1977a; Rovira and Batalla 2006; Hu et al. 2011), monthly (Mao and Carrillo 2017) and hydrological, i.e., rising/falling stage of the hydrograph (Glysson 1987; Jansson 1996; De Girolamo et al. 2015) groupings, with a separate relationship for each; calibration for different rainfall intensity ranges (Guzman et al. 2013); the use of polynomial or more complex functions (Cordova and Gonzalez 1997; Horowitz 2003); etc. According to Sivakumar and Wallender (2004), none of the existing methods qualifies as “universally” accepted, and the choice depends on the situation at hand.

All in all, despite their shortcomings, sediment rating curves can constitute a good basis for estimating the suspended load of a large-scale hydrologic catchment with limited data sets, even under Mediterranean climatic conditions.

Log-transformed regression bias

The sediment rating curve is developed by least-squares linear regression on the logarithmically transformed Q–Qs pairs. Such an approach is justified on statistical grounds: it linearizes the relationship and helps meet the theoretical hypotheses of the regression {the sediment data residuals are normally distributed; their variance is constant (homoscedasticity) (Ferguson 1986; Cohn et al. 1992; Asselman 2000)}, which untransformed field measurements tend to violate, leading to inaccurate load estimation (Achite and Ouillon 2007; Crowder et al. 2007). The back-transformation to arithmetic space, however, entails an inherent statistical bias (i.e., the residuals’ distribution is not normal; its mean value is greater than zero) (Koch and Smillie 1986) that is usually negative, leading to the underestimation of sediment load.

The latter is proportional to the variance of the additive (log-linear regression) error terms (Ferguson 1986). The underestimation occurs because the power function regression curve has to pass through the arithmetic means of the data pairs, being systematically higher than the corresponding geometric ones (obtained by back-transforming values on the logarithmic scale), from which the logarithmic function regression line (log–log plot) must respectively pass. In wash load conditions, the underestimation grows (Ferguson 1986; Singh and Durgunoglu 1989; Asselman 2000) because the deviation increases {the mean square error of the log-transformed regression σ2 (> 0; 0 when no dispersion around the regression line occurs) increases as well, causing the multiplicative error terms to attain higher values}.
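The dependence of the bias on the residual variance can be made explicit with a short derivation. It is a sketch under assumptions that are not tested here for the Grevena bridge data: natural logarithms are used, and the additive residuals ε of the log-linear regression are normally distributed with zero mean and variance σ2.

$$ \ln {Q}_s=\ln a+b\ln Q+\varepsilon, \quad \varepsilon \sim N\left(0,{\sigma}^2\right)\quad \Rightarrow \quad E\left[{Q}_s\mid Q\right]=a\,{Q}^{b}\exp \left({\sigma}^2/2\right) $$

Under these assumptions, the uncorrected curve returns the conditional median rather than the conditional mean of Qs, so it understates the expected load by the factor exp(−σ2/2); the larger the scatter around the regression line, the larger the underestimation. This is the rationale behind the parametric correction factor exp(σ2/2) associated with Ferguson (1986).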

Several methods have been proposed to correct the back–transformation bias (e.g., Bradu and Mundlak 1970; Jones et al. 1981; Ferguson 1986; Koch and Smillie 1986; Cohn et al. 1989; Duan 1983). Yet, there is no consensus on which of the bias correction factors developed is the best.

Additionally, Walling and Webb (1988) have demonstrated that statistical bias is not the dominant reason of inaccuracy in load estimates, since despite its removal the underestimation of actual sediment concentrations remains. This is because other sources of error (e.g., scatter; seasonality associated with the rating relationship, hysteretic, and exhaustion effects), which are not reflected in the bias correction factor (BCF), are more significant (Walling 1977a, 1977b).

Development

The sediment rating curves at the Grevena bridge cross-section were developed from the 47 simultaneous discharge–sediment discharge measurements (Table 1) conducted by the Greek PPC at the gauging station of the same name, by linear regression of the log-transformed variables (least-squares fitting). Four alternative methods were used, namely (i) simple rating curve, (ii) different ratings for the dry and wet season of the year, (iii) different ratings for the rising and falling limb of the runoff hydrograph, and (iv) broken line interpolation that uses different exponents for two discharge classes (Table 3, Fig. 3).

Table 3 Sediment rating curves equations
Fig. 3 Forms of different sediment rating curves

The simple rating curve method equation consists of a single segment representing the entire data set. No discretization was made among the 47 pairs (Fig. 3a).

In the different ratings for the dry and wet season of the year method, the equations vary from one season to the next. For a given discharge volume, the dry season curve conveys more sediment discharge than the wet season one. This is due to the greater erosive capability of a dry season rain compared to a wet season one conveying the same volume of discharge, as well as to the catchment’s greater sediment availability, a result of the physical, atmospheric, chemical and other processes of the preceding dry period. Indirectly, apart from discharge and sediment discharge, this method takes into account several more parameters with a distinct seasonal variation, such as water temperature, seasonal characteristics and variations of hydrological parameters (e.g., type of storm), the catchment’s vegetation cover, and the seasonal mechanism of erosion (Mimikou 1982; Koutsoyiannis and Tarla 1987). For the development of the wet season segment, the 36 data pairs falling in the respective period (December–May) were used, while the dry season one (June–November) was based on the 11 remaining pairs (Fig. 3b).
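The seasonal stratification described above can be sketched in Python as follows. The December–May and June–November periods are taken from the text; the function name `split_by_season`, the `(month, Q, Qs)` record layout and the sample values are hypothetical and serve only as an illustration.

```python
# Wet season months as stated in the text (December to May);
# every other month is treated as dry season (June to November).
WET_MONTHS = {12, 1, 2, 3, 4, 5}

def split_by_season(samples):
    """Split (month, q, qs) records into wet and dry lists of (q, qs)."""
    wet = [(q, qs) for month, q, qs in samples if month in WET_MONTHS]
    dry = [(q, qs) for month, q, qs in samples if month not in WET_MONTHS]
    return wet, dry

# Hypothetical records: (calendar month, Q in m3/s, Qs in kg/s)
samples_demo = [
    (1, 30.0, 12.0),
    (4, 45.0, 25.0),
    (7, 3.0, 0.4),
    (11, 8.0, 1.5),
    (12, 60.0, 40.0),
]
wet_pairs, dry_pairs = split_by_season(samples_demo)
```

Each subset would then be fitted separately, yielding one pair of rating parameters per season.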

The rising and falling limb of the runoff hydrograph method stratifies the data according to the magnitude of flow, applying a separate curve to each stratum (Fig. 3c). Given the mean multi-annual discharge of the period 1965–1982 (17.86 m3 s−1), the inflection point (1.2 × Qaverage) was estimated equal to 21.43 m3 s−1. Discharge values higher than 21.43 m3 s−1 were assigned to the rising limb of the hydrograph and lower ones to the falling limb, leading to two different sets of discharge–sediment discharge pairs (25 for the high discharge class and 22 for the low discharge one), described by two different equations.
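The threshold arithmetic and the resulting stratification can be reproduced with a few lines of Python. The mean discharge (17.86 m3 s−1) and the factor 1.2 come from the text; the function name `split_by_discharge` and the sample pairs are hypothetical.

```python
Q_MEAN = 17.86                 # mean multi-annual discharge, m3/s (from the text)
THRESHOLD = 1.2 * Q_MEAN       # inflection point, about 21.43 m3/s

def split_by_discharge(pairs, threshold=THRESHOLD):
    """Split (q, qs) pairs into (high, low) classes around the threshold."""
    high = [p for p in pairs if p[0] > threshold]
    low = [p for p in pairs if p[0] <= threshold]
    return high, low

# Hypothetical (Q, Qs) pairs, not the PPC measurements
pairs_demo = [(5.0, 0.8), (18.0, 4.0), (21.5, 7.0), (80.0, 95.0)]
high_demo, low_demo = split_by_discharge(pairs_demo)
```

Pairs exactly at the threshold are placed in the low class here, a choice the text does not specify.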

The broken line interpolation method consists of consecutive straight-line segments. The number of segments results from a compromise between two objectives: minimizing the fitting error and minimizing the roughness of the broken line. The interpolation is homoscedastic and the inflection point is determined by a trial-and-error procedure (Koutsoyiannis 2000). In the present study, a two-segment line is considered (Fig. 3d), based on physical grounds. In a gravel-bed river (like the Venetikos), there is a threshold discharge value, attributed to the existing armor layer. For discharges below this threshold, it is assumed that there is no exchange of suspended sediment with the river bed. During flood events, when the threshold is exceeded, this layer begins to deteriorate. Once the layer is fully “washed out”, a larger range of particle sizes is exposed and the transport rate increases significantly. Moreover, during such events, bank erosion occurs, enhancing the river’s sediment availability. The aforementioned trial-and-error procedure indicated that the inflection point minimizing the error function (and thus determining the equations that describe the two segments) is 22 m3 s−1 {22 pairs were below this threshold (used for the development of the lower segment’s equation) while 25 were above it}. This value is greater than the river’s mean annual discharge (17.86 m3 s−1), meaning that the armor layer does not break up frequently. Therefore, extreme floods are relatively more important to the long-term yield than frequent runoff events.
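A trial-and-error search for the inflection point can be sketched as follows. This is a simplified stand-in and not the broken line interpolation of Koutsoyiannis (2000): it fits two independent log–log lines for each candidate threshold and keeps the candidate with the smallest total squared error, without the roughness penalty or the continuity of the broken line. The function names, the `min_pts` parameter, the candidate list and the synthetic data are all hypothetical.

```python
import math

def _ols(x, y):
    """Least-squares line; returns (intercept, slope, sum of squared errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse

def find_breakpoint(q, qs, candidates, min_pts=3):
    """Return (threshold, sse) minimizing the summed log-space error."""
    pts = [(math.log10(a), math.log10(b)) for a, b in zip(q, qs)]
    best = None
    for c in candidates:
        log_c = math.log10(c)
        low = [p for p in pts if p[0] <= log_c]
        high = [p for p in pts if p[0] > log_c]
        if len(low) < min_pts or len(high) < min_pts:
            continue  # too few points to fit a segment on one side
        sse = _ols(*zip(*low))[2] + _ols(*zip(*high))[2]
        if best is None or sse < best[1]:
            best = (c, sse)
    return best

# Synthetic data with a change of exponent (1.0 to 2.5) at Q = 20
q_syn = [2.0, 4.0, 8.0, 16.0, 25.0, 40.0, 60.0, 100.0]
qs_syn = [0.1 * v if v <= 20 else 0.1 * 20 ** -1.5 * v ** 2.5 for v in q_syn]
best_break = find_breakpoint(q_syn, qs_syn, [5.0, 10.0, 20.0, 30.0, 50.0])
```

On the synthetic data the search selects the candidate of 20, the only one that separates the two regimes cleanly; with scattered field data several candidates may give similar errors, which is where physical reasoning about the armor layer helps.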

Since the “a” and “b” parameters of each method resulted from a log–log regression, a correction of the back-transformation bias (from logarithmic to arithmetic space) is required. This correction was made using the non-parametric BCF or “smearing” estimator (Eq. 2) introduced by Duan (1983). The bias is removed (thus, the unbiased estimator of the true load is obtained) by multiplying “a” (the rating coefficient) directly by the BCF, which shifts the curve towards higher sediment discharge values (Fig. 3; dashed lines).

$$ BCF=\frac{1}{n}\sum \limits_{i=1}^n{b}^{r_i} $$
(2)

where n is the number of samples (or regression residuals in logarithmic space), b is the logarithm base used (10 or “e”; not to be confused with the rating exponent “b” of Eq. 1) and ri is the difference between measured and estimated sediment concentration per sample (in logarithmic units).
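Equation 2 translates almost directly into code. The following Python sketch assumes that the residuals are computed as the logarithm of the measured value minus the logarithm of the value returned by the uncorrected rating curve; the function name `duan_bcf` and the two demonstration samples are hypothetical.

```python
import math

def duan_bcf(measured, estimated, base=10.0):
    """Smearing estimator of Eq. 2: mean of base**r_i over all samples.

    r_i = log(measured_i) - log(estimated_i), in the chosen logarithm base.
    """
    residuals = [
        math.log(m, base) - math.log(e, base)
        for m, e in zip(measured, estimated)
    ]
    return sum(base ** r for r in residuals) / len(residuals)

# Two hypothetical samples: one twice the curve value, one half of it
bcf_demo = duan_bcf([2.0, 0.5], [1.0, 1.0])

# Applying the correction: the rating coefficient is multiplied by the BCF
a_uncorrected = 0.05                      # illustrative value only
a_corrected = bcf_demo * a_uncorrected
```

In the demonstration the residuals are symmetric in logarithmic space (they sum to zero), yet the BCF is 1.25 rather than 1, which shows why the back-transformed curve underestimates without the correction.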

The curves are presented as straight lines, using a log–log plot (Fig. 3).

It is noted that the curves’ implementation period (1965–1982) coincides with the measuring (1965–1982) and sampling (1965–1983) ones, justifying the direct use of the aforementioned equations. The latter are considered valid throughout this period, since the basin’s land cover, morphological and other characteristics (and thus the rating coefficients that describe them) remain unchanged.

Statistical measures

Since correlation measures such as R (correlation coefficient) and R2 (coefficient of determination) are often misleading when comparing model-simulated (Pi) and observed (Oi) values (Willmott and Wicks 1980; Fox 1981; Willmott 1982; Legates and McCabe Jr. 1999), alternative statistical indices, e.g., the index of agreement (d) {a “correction” measure for the R2 coefficient; its values (like those of R2) should preferably be close to unity (Willmott and Wicks 1980; Willmott 1981, 1982)}, are also computed.

Fox (1981) suggests that four different types of measure should be calculated: the mean bias error (MBE; the difference between mean observed and mean simulated values), which describes the bias; the variance of the distribution of differences (sd2), which describes the variability of the differences between simulated and observed values around the MBE; the root mean square error (RMSE); and/or the mean absolute error (MAE). RMSE and MAE are prevalent difference measures because they summarize the mean difference between observed and predicted values (MAE is less sensitive to outliers than RMSE and is preferable for smaller datasets). Both can range from zero to infinity, with values closer to zero being better (Alexandris et al. 2008).

Additional measures are the systematic (RMSEs) and unsystematic (RMSEu) component of the RMSE (Willmott 1981). The former is determined by the distance between the linear regression best-fit line and the 1:1 (45°) line, while the latter by the distance between the data points and the linear regression best-fit line. According to Berengena and Gavilan (2005), RMSEs is a measure of the space available for adjustment, while RMSEu of the scatter about the regression line and can be interpreted as a measure of the potential accuracy. RMSEu should preferably be closest to zero while RMSEs should be close to the RMSE (Alexandris et al. 2008).

The Nash–Sutcliffe efficiency (NSE) is also computed, as one minus the ratio of the residual variance to the variance of the measured data (Nash and Sutcliffe 1970). Apart from the convergence of the time series, NSE also takes into account the dispersion of the measurements, through the deviation of the observed data from their mean value (the denominator of the fraction). Its preferable values are close to unity (indicating that the simulated time series is almost identical to the measured one). The index has no lower limit (negative values are possible as well).

Computational forms of all the indices are given below (Table 4).

Table 4 Statistical measures
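For orientation, the indices can be computed as in the following Python sketch. It assumes the commonly used definitions (Fox 1981; Willmott 1981; Nash and Sutcliffe 1970) and may differ in detail from the forms of Table 4: in particular, the sign convention of MBE (simulated minus observed) and the n − 1 divisor of sd2 are assumptions. The function name `evaluation_indices` and the sample series are hypothetical.

```python
import math

def evaluation_indices(obs, sim):
    """Difference measures between observed (O) and simulated (P) series."""
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(sim) / n
    diffs = [p - o for o, p in zip(obs, sim)]       # P - O (assumed sign)
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    sd2 = sum((d - mbe) ** 2 for d in diffs) / (n - 1)
    sse = sum(d * d for d in diffs)
    rmse = math.sqrt(sse / n)
    # Least-squares regression of simulated on observed values
    sxx = sum((o - o_mean) ** 2 for o in obs)
    slope = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, sim)) / sxx
    intercept = p_mean - slope * o_mean
    fitted = [intercept + slope * o for o in obs]
    rmse_s = math.sqrt(sum((f - o) ** 2 for f, o in zip(fitted, obs)) / n)
    rmse_u = math.sqrt(sum((p - f) ** 2 for p, f in zip(sim, fitted)) / n)
    # Index of agreement (Willmott) and Nash-Sutcliffe efficiency
    d = 1 - sse / sum(
        (abs(p - o_mean) + abs(o - o_mean)) ** 2 for o, p in zip(obs, sim)
    )
    nse = 1 - sse / sxx
    return {"MBE": mbe, "MAE": mae, "sd2": sd2, "RMSE": rmse,
            "RMSEs": rmse_s, "RMSEu": rmse_u, "d": d, "NSE": nse}

# Hypothetical series, for illustration only
idx_demo = evaluation_indices([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
idx_perfect = evaluation_indices([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

With these definitions the squared RMSE equals the sum of the squared systematic and unsystematic components, which offers a quick consistency check on any implementation.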

Results and discussion

Sediment rating curves

The simple rating curve (Fig. 3a) under-predicted the high sediment discharges {points above the curve; wash load conditions, where the greater part of annual sedimentation is generated; few events, yet the underestimations are substantially larger and more crucial for, e.g., the total load estimation (Horowitz 2003; Cox et al. 2008)} and over-predicted the low ones (points below the curve; base flow conditions; large number). Note the consistent, significant underestimation in the discharge range below 10 and above 40 m3 s−1 and the tendency for underestimation between 10 and 40 m3 s−1. This is mainly because a straight-line fit is not suitable for the data, since it does not incorporate known factors affecting sediment transport {Freund et al. (2006) state that the regression model used should adequately describe the population of the data that it is trying to model}. It is also attributed to the significant number of sediment discharge measurements conducted at low flow conditions, which have a major effect on the correlation, as well as to the statistical bias inherent in the logarithmic transformation of the Q, Qs data (characteristic of all development methods). The latter was removed by applying a BCF, which shifts the curve (dashed line) towards higher sediment discharge values (applied to all development methods). This led to a better fit at high flows but overestimated the low and medium values of sediment discharge, due to the relatively high BCF value (1.71; caused by the notable log-linear regression error terms per sample). The poor fit of the BCF-corrected model at low and medium flows is also due to the aforementioned unsuitability of the straight-line fit: failing to account for the nonlinear relation between sediment and water discharge exaggerated the residual standard error, which in turn inflated the BCF.

Concerning the different ratings for the dry and wet season of the year method (Fig. 3b), for each discharge volume the dry season curve yields more sediment discharge than the wet season one {a dry season rain has greater erosive capability; the catchment has greater sediment availability (Mimikou 1982; Koutsoyiannis and Tarla 1987)}. Of the 47 Q–Qs pairs collected, 36 were assigned to the wet season and 11 to the dry one. The wet season had the highest mean (32.26 m3 s−1) and maximum (140.71 m3 s−1) discharge, a fact attributed to the basin’s rainfall and mixed rainfall–snowmelt characteristics. In contrast, the dry season had the lowest minimum discharge (1.29 m3 s−1). Suspended sediment discharge followed the seasonal patterns of discharge. Among the equations representing the two distinct segments of the method, the wet season’s “b” exponent (1.51) is higher than the dry season’s (1.30). The difference is attributed to differences in the transport processes within the watercourse {higher “b” values denote a more effective transport capacity (Peters-Kummerly 1973; Ellison et al. 2014)}, in the transport processes within the catchment {“b” represents the degree to which sediment is supplied from uplands to streams; the transport rate is supply limited; the degree of supply depends on seasonal variations of land use and land cover, i.e., larger areas of exposed bare soil and less vegetation cover in the dry season, and wider coverage of thicker vegetation in the wet season (Gao and Puckett 2011)} and in the river erodibility {“b” represents the erosive power of a river (Asselman 2000; Gao 2008)} in the two discharge zones. Finally, in the wet season, the Q–Qs pairs displayed greater scatter, resulting in relatively lower R2 values.

Concerning the rising and falling limb of the runoff hydrograph method (Fig. 3c), sediment discharge is higher for discharge records greater than the inflection point. The slope of the curve above the threshold is steeper (SSC increases at a faster pace in this range), denoting that this specific discharge value is probably linked with erosion and sediment transport processes of different sources (e.g., bank erosion; bed material deposited at periods of low flow but not washed away even in “normal” flooding conditions; deterioration of the river bed armor layer; and sediment supply from upland hillslopes), since it occurs only during extremely intense events, for discharges exceeding 21.43 m3 s−1.

The slope of the broken line interpolation method equation (Fig. 3d) above the inflection point is subject to the same causes and limitations as the one of the rising and falling limb of the runoff hydrograph method. It is also noted that their respective segments are described by the same mathematical formulas, since the inflection point is almost identical (21.43 ≈ 22 m3 s−1).
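A two-segment rating curve of this kind can be sketched as a piecewise power law. The breakpoint of 21.43 m3 s−1 is taken from the text, whereas every coefficient below is a placeholder rather than a fitted value from the study. Forcing the two segments to meet at the breakpoint is an additional assumption made here for illustration.

```python
def broken_line_rating(q, breakpoint, a_low, b_low, a_high, b_high):
    """Two-segment power law: one (a, b) pair below the breakpoint, one above."""
    if q <= breakpoint:
        return a_low * q ** b_low
    return a_high * q ** b_high

Q_BREAK = 21.43            # m3/s, threshold quoted in the text
A_LOW, B_LOW = 0.01, 1.2   # placeholder coefficients (hypothetical)
B_HIGH = 2.0               # placeholder, steeper slope above the threshold
# Assumption: pick a_high so that both segments give the same Qs at Q_BREAK
A_HIGH = A_LOW * Q_BREAK ** (B_LOW - B_HIGH)

def rating(q):
    return broken_line_rating(q, Q_BREAK, A_LOW, B_LOW, A_HIGH, B_HIGH)

for q in (5.0, 21.43, 60.0):
    print(f"Q = {q:6.2f} m3/s -> Qs = {rating(q):.3f} kg/s")
```

With a larger exponent above the threshold, doubling the discharge in the upper range multiplies the sediment discharge by 2^b_high, which is the steeper response described for the high-flow segment.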

Correlation between observed and simulated sediment discharge (yield)

The development of the rating curves’ formulae was followed by the estimation of the simulated mean daily (suspended) sediment discharge. The latter was calculated by applying each curve’s mathematical equation to the observed mean daily discharge records (provided by the Greek PPC) for the period 1965–1982. Subsequently, the mean daily values were aggregated, delivering the mean monthly (and annual; see Table 5) sediment discharge. The agreement between the observed and simulated time series, at various time scales, is presented in Figs. 4 and 5.
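The procedure described above (daily discharge in, daily sediment discharge out, then aggregation to months) can be sketched as follows. The rating coefficients, the function names and the synthetic discharge series are hypothetical. The monthly value is taken here as the arithmetic mean of the daily values, which is an assumption about how the aggregation was done.

```python
from collections import defaultdict
from datetime import date, timedelta

def simple_rating(q, a=0.02, b=1.5):
    """Placeholder power-law rating curve Qs = a * Q**b (hypothetical a, b)."""
    return a * q ** b

def monthly_mean_qs(daily_q, rating=simple_rating):
    """Map {date: Q} to {(year, month): mean simulated Qs for that month}."""
    groups = defaultdict(list)
    for day, q in daily_q.items():
        groups[(day.year, day.month)].append(rating(q))
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Synthetic daily discharges (m3/s) for January and February 1965
start = date(1965, 1, 1)
daily_q = {start + timedelta(days=i): 10.0 + (i % 30) for i in range(59)}

monthly = monthly_mean_qs(daily_q)
for (year, month), qs in sorted(monthly.items()):
    print(year, month, round(qs, 3))
```

Note that the curve is applied to each daily value before averaging. Since the relation is nonlinear, applying it to a monthly mean discharge instead would generally give a different result.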

Table 5 Mean annual sediment yield and sediment discharge values

Fig. 4 Monthly values of observed and simulated sediment discharge

Fig. 5 Mean monthly and annual values of observed and simulated sediment discharge

While the total sediment volume is simulated quite satisfactorily by every method, all of them are weaker at reproducing the outliers. Regarding the temporal distribution of the sediment load, at all time scales methods (iii) and (iv) consistently overestimated the observed results, while method (ii) consistently underestimated them. Method (i), on the other hand, displayed an irregular behavior (notable monthly under-prediction; under-prediction at specific months and years). The broken line interpolation and the rising and falling limb of the runoff hydrograph methods perform equally well, displaying almost identical deviations from the observed results.

The correlation between monthly observed and simulated sediment discharge was also evaluated by means of a least squares fitting procedure (Fig. 6). The linear regression equation (Y = bX + a) is depicted by the straight red line, where the Y axis represents the observed sediment discharge values (PPC), the X axis the simulated sediment discharges (four rating curve methods) (Pineiro et al. 2008), and the constants “a” and “b” the intercept and slope of the regression equation, respectively. The Y = X line (45° or slope = 1) is depicted in black. The derived mathematical formulas, along with the coefficient of determination (R2), are also presented in each figure.
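As an illustration of this evaluation, the sketch below regresses observed values (Y) on simulated values (X) by ordinary least squares and returns the intercept, slope and R2. The function name and the sample data are hypothetical and do not reproduce the values of Fig. 6.

```python
def regress_observed_on_simulated(simulated, observed):
    """Least squares fit of observed = b * simulated + a; returns (a, b, r2)."""
    n = len(simulated)
    mx = sum(simulated) / n
    my = sum(observed) / n
    sxx = sum((x - mx) ** 2 for x in simulated)
    syy = sum((y - my) ** 2 for y in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(simulated, observed))
    b = sxy / sxx                  # slope
    a = my - b * mx                # intercept
    r2 = sxy ** 2 / (sxx * syy)    # coefficient of determination
    return a, b, r2

# Hypothetical monthly sediment discharges (kg/s)
sim = [0.5, 1.0, 2.0, 4.0, 8.0]
obs = [0.7, 1.1, 2.6, 4.9, 10.2]

a, b, r2 = regress_observed_on_simulated(sim, obs)
print(f"Y = {b:.3f} X + {a:.3f}, R2 = {r2:.3f}")
```

With the observed values on the Y axis, a slope above 1 means the simulated values fall short of the observed ones as magnitudes grow, while a slope below 1 indicates over-prediction. A high R2 alone does not guarantee closeness to the Y = X line.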

Fig. 6 Correlation of observed and simulated sediment discharge

Regarding the latter, the simulated sediment discharge values of every development method correlated exceptionally well with the observed ones (notably high values of R2). Methods (i) and (ii) underestimated the measured sediment discharge values. Additionally, at low-flow conditions most pairs lie close to the regression line, while as discharge increases their dispersion widens (the underestimation grows accordingly). This pattern is not followed by methods (iii) and (iv): both over-predicted the observed results, yet all pairs lie very close to the regression line. This indicates that they should be considered more reliable (more consistent) in predicting the actual values of the variable, even at high discharges where greater uncertainty occurs.

Overall, the diagrams support and enhance the aforementioned interpretations, as well as the following statistical analysis results.

Subsequently, the simulated mean annual sediment yield was calculated for each method, taking the catchment area into consideration. The observed (PPC) mean annual sediment yield and discharge values were derived by aggregating the corresponding mean monthly measurements (Table 5).
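The conversion from a mean sediment discharge (kg s−1) to a specific annual sediment yield (t km−2) is a unit conversion followed by division by the catchment area. A minimal sketch is given below. The catchment area used is a placeholder, since the actual area is not stated in this section, and the year length of 365.25 days is likewise an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed mean year length

def annual_yield_t_per_km2(qs_kg_per_s, area_km2):
    """Convert a mean sediment discharge (kg/s) to specific yield (t/km2/yr)."""
    annual_load_t = qs_kg_per_s * SECONDS_PER_YEAR / 1000.0  # kg -> t
    return annual_load_t / area_km2

AREA_KM2 = 1000.0  # placeholder catchment area, NOT the Venetikos value

for qs in (0.5, 1.0, 2.0):  # hypothetical mean sediment discharges (kg/s)
    print(qs, "kg/s ->", round(annual_yield_t_per_km2(qs, AREA_KM2), 2), "t/km2/yr")
```

Because the relation is linear, a given difference in mean sediment discharge between two methods maps to a proportional difference in specific yield.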

All simulated values are considered similar to one another and in accordance with the observed ones. The lowest mean annual values were produced by the different ratings for the dry and wet season of the year method (ii), while the highest were produced by the different ratings for the rising and falling limb of the runoff hydrograph (iii) and the broken line interpolation (iv) methods. The latter two exhibit almost identical results, since they are both described by the same mathematical equation. Methods (i) and (ii) underestimated the observed results, while (iii) and (iv) overestimated them. The simple rating curve yielded the smallest difference from the observed values (0.13 kg s−1 or 4.63 t km−2).

The results are considered fairly low compared to those of other catchments of similar size, mostly due to the bedrock {not as prone to erosion, yielding small amounts of sediment; this is also supported by the low values of the “a” coefficient (Table 3), an index of the basin’s erodibility, denoting the presence of low-susceptibility geological formations (Peters-Kummerly 1973)}, the geomorphology and the land cover pattern of the catchment (extensive vegetation, especially on the slopes of the mountainous landscape), which provide protection against erosion. Additionally, the mild cultivation and farming techniques practiced in the agricultural portion of the catchment do not increase the soil’s vulnerability to erosion.

Inaccuracy in load estimates could also be ascribed to sampling errors in the streamflow and sediment load data (Horowitz 2003), the sampling schedule followed (e.g., emphasis on base flow conditions with few measurements during flood events; the points that define the rating relationship do not represent a random sample of all possible points), the incorrect application of the flow records to the rating relationship (Walling 1977a), and the inherent disadvantages of the rating curve methodology {e.g., the stationarity hypothesis does not apply to natural rivers; human-induced actions (e.g., changes in land use, fires), climate, earthquakes, landslides, and flow alterations directly affect the suspended sediment regime (Prestegaard 1988; Yang et al. 2007; Horowitz 2010)}.

Correlation between observed (simulated) and empirical sediment discharge (yield)

An attempt was also made to correlate the results of four empirical equations (Table 5) to the observed (PPC) and simulated (rating curves) sediment discharge and yield values. These equations estimate the catchment’s mean annual sediment yield (Dendy and Bolton 1976; Avendano Salas et al. 1997; Lu et al. 2003) and discharge (Webb and Griffiths 2001) as a function of its area (A, km2).

The empirically estimated mean annual sediment yield and discharge values, apart from those of the Lu et al. (2003) method, which performed moderately well, are significantly lower than the respective observed and simulated ones. The poor correlation is attributed to the over-simplified treatment of the complex detachment-transport-deposition mechanism of soil erosion, which omits important attributes of the basin such as the local climate, bedrock, geomorphology, land cover pattern, and stream dynamics. It is also attributed to the errors associated with the rating curves’ development technique and the field measuring methodology, as well as to the random or systematic errors associated with the estimates used in the aforementioned equations.

Statistical analysis

The statistical analysis results (1965–1982) are presented in Table 6.

Table 6 Statistical analysis results

The index values of all methods are very satisfactory and close to the preferable ones. The method that seems to perform best is the broken line interpolation, not only because it meets the preferable performance criteria (Table 6 footnotes) for the majority of the statistical indices, but also because it outperforms the other methods. It is noted that while the rising and falling limb of the runoff hydrograph (iii) and the broken line interpolation (iv) methods exhibit almost identical statistical characteristics (being described by the same mathematical equations), the latter seems to prevail.
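The indices of Table 6 are not listed in this section. As a hedged illustration of how such goodness-of-fit measures are typically computed, the sketch below implements three commonly used ones: the root mean square error (RMSE), the Nash–Sutcliffe efficiency (NSE) and the percent bias (PBIAS). Whether these coincide with the indices actually used in the study is an assumption, and the sample series is hypothetical.

```python
import math

def rmse(obs, sim):
    """Root mean square error; 0 indicates a perfect fit."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 indicates a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def pbias(obs, sim):
    """Percent bias; positive values mean the simulation underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

# Hypothetical monthly sediment discharges (kg/s)
observed = [0.8, 1.5, 3.2, 6.0, 2.1]
simulated = [0.9, 1.4, 3.0, 6.8, 2.0]

print("RMSE :", round(rmse(observed, simulated), 3))
print("NSE  :", round(nse(observed, simulated), 3))
print("PBIAS:", round(pbias(observed, simulated), 2))
```

Under the sign convention adopted here, a method that consistently over-predicts would show a negative PBIAS and one that under-predicts a positive one.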

All statistical measures agree with the results illustrated by the regression analysis. Thus, the broken line interpolation is henceforth considered the representative rating curve development method for the specific gauging site (basin).

Conclusions

The study provided guidance on the selection of the most appropriate suspended sediment rating curve development method (among four alternative approximations), at the outlet of the Venetikos River watershed. Given the observed sediment discharge field measurements provided by the Greek PPC and considering both graphical and statistical analyses (1965–1982; various time scales), the method that performed best was the broken line interpolation that uses different exponents for two discharge classes. Thus, henceforward, it will be treated as the representative one for the aforementioned site (basin).

The creation of a continuous sediment discharge time series of high quality and low temporal resolution was essential in order to overcome the deficiencies of the sampling schedule and the inherent errors of the PPC field data, quantify the soil erosion phenomenon as accurately as possible at different temporal scales, meet the needs of modern research efforts and help policy makers design and implement proper river and catchment management strategies. In light of the above, the estimation of the cumulative suspended load reaching the basin’s outlet as sediment yield (extrapolated to the present day) can constitute a valuable tool regarding the imminent construction of the “Elafi” dam, as far as the dead storage volume design is concerned.

Finally, the attempt to evaluate the estimated (and observed) sediment yield values against those produced by four empirical equations was unsuccessful, due to the over-simplification of the soil erosion mechanism adopted by such approaches, the shortcomings of the rating curves’ development techniques and the errors associated with the field measurements.

Overall, despite its shortcomings, the rating curve methodology can constitute a good basis for describing the sediment regime of watercourses, at catchments with limited or poor quality data. The ease of use and low implementation cost are characteristics that strengthen its overall status.