Abstract
Almost all water resources projects require a past record of streamflow data, and the longer the record, the better the decisions that can be taken at the design or operation stage. However, in most cases a long record of streamflow data is not available, and it becomes essential to synthetically generate streamflow sequences that are statistically similar to the observed data. Models to generate such sequences are available for a single river (single-site) and for a river together with its tributaries (multi-site); however, comparative studies of these models need to be done before they are applied to an actual system. This study compares the performance of single-site and multi-site seasonal streamflow generation models applied to an existing river and its tributary, across which reservoirs have been constructed. Since the cross-correlation structure of the flows in a river–tributary system plays an important role in the integrated operation of the reservoirs, multi-site models are developed, as the cross-correlation cannot be preserved by single-site models. The performances of the developed single-site and multi-site models are compared in terms of the mean, standard deviation, skewness, serial correlation and cross-correlation of the observed and generated series. The results indicate that cross-correlations are well preserved only by the multi-site models, whereas the other statistical parameters, except serial correlation, are well preserved by both the single-site and multi-site models.
Introduction
Synthetic generation of streamflow is one of the major areas of stochastic hydrology. Since the flow through a river is inherently stochastic, sufficient information about this flow is essential in the design or operation of any water resources project. Such information is usually retrieved from observed flow records. However, in most cases past flow records are available only for a limited length, and such records do not give a proper picture of the variability in flows. Any system designed with such limited data is shortsighted and carries the risk of being inadequate for the unknown flow sequences that it may experience in future. To deal with this issue, a synthetic generation model is usually used that is capable of generating equally likely sequences of flow data which are similar to the historical data in a statistical sense. Two different classes of models are generally used: (i) single-site models for flow data of a single river and (ii) multi-site models dealing with flow data of more than one adjacent river (or tributary).
Brief review of past works
Single-site models
The streamflow generation model was originally introduced by Thomas and Fiering (1962) as a first-order autoregressive model for generating monthly streamflows of the Clearwater River and its tributaries in Idaho. Since then, a number of models have been suggested for hydrologic time series in general and streamflow series in particular. Harms and Campbell (1967) extended the Thomas–Fiering model to preserve (a) the normal distribution of annual flows, (b) the log-normal distribution of monthly flows and (c) the correlation between annual flows. McMahon and Miller (1971) applied the Thomas–Fiering model to skewed hydrologic data using a gamma transformation. However, they noted an inconsistency in the transformation used in the model to convert random normal variates into random skewed variates. According to them, the transformation could be applied to larger skews by first taking a logarithmic transformation of all flows before calculating the parameters of the model. This procedure appreciably reduced the skewness, thus allowing the transformation to operate within the limits of its consistency. Bobée and Robitaille (1975) proposed formulae for adjusting the average of estimates to give a better estimate of the population skewness. Phien and Ruksaslip (1981) considered four models for generation of monthly streamflows and modified them where needed so as to reproduce the mean, standard deviation and skewness coefficient of each monthly sequence of the historical records. The four single-site streamflow generation models for monthly sequences considered were (a) the Thomas–Fiering model (1962), (b) the first Spolia–Chander model (1974), (c) the second Spolia–Chander model (1977) and (d) the Sen model (1978). Modifications were made to preserve the skewness. To evaluate these models, the number of negative values generated and the computer time required were also considered.
The modified models proposed by them could satisfactorily preserve the mean, standard deviation and skewness coefficient of the historical records. Applications of single-site models can also be found in the works of Sangal and Biswas (1970), McMahon and Miller (1971), Moss and Dawdy (1974), Wallis et al. (1974), Mckerchar and Delleur (1974), Mejia and Rodriguez-Iturbe (1974), Charbeneau (1978), Shih (1978), Stedinger and Taylor (1982), Sim (1987), Savic et al. (1989), Awchi and Srivastava (2009) and Arselan (2012).
Multi-site models
Most of the reported studies on streamflow generation are based on a single site. For simultaneous generation of flows in a river–tributary system, however, multi-site models are more logical. If the individual flows in the river and its tributary are spatially uncorrelated, developing separate single-site models for the river and for the tributary may be sufficient. Usually, however, flows in a river and its tributary are observed to have significant cross-correlations, since both receive runoff from the same parent rainfall on the basin. In this context, Fiering and Jackson (1971) explained that if a particular month is unusually wet at one site in an area, it is very likely that the same month will be wet at nearby sites. In such cases, it becomes necessary to develop multi-site models which can preserve the cross-correlation in addition to the other required properties at each site (Matalas 1967).
Xu et al. (2001, 2003) used a Markov cross-correlation pulse model to extend synthetic streamflow generation from a single site to multiple sites with possibly high cross-correlations of the daily values among these sites. For simulating multi-site multi-season streamflows, Srinivas and Srinivasan (2005) introduced a hybrid stochastic model which used a parsimonious periodic parametric model without normalization for partial pre-whitening of streamflows at each site. The resulting residuals were resampled using a moving block bootstrap to reproduce site-to-site correlations. Szilagyi et al. (2006) applied a hybrid, seasonal Markov chain-based model of daily flow simulation at multiple catchment sites. The model used components of shot noise models in a Markov chain-based approach, together with a conceptual framework describing flow recession without the need for information on precipitation. They could generate arbitrarily long time series of daily flow rates that preserve, at least moderately well, the basic long-term statistics (mean, variance, skewness, autocorrelation structure, cross-correlations) as well as the short-term behavior of the original time series. Use of multi-site models can also be found in the studies of Wang and Ding (2007), Hao and Singh (2013) and Srivastav and Simonovic (2014).
Apart from these autoregressive (AR) models, works based on autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models (Box and Jenkins 1970; Moss and Dawdy 1974; Mckerchar and Delleur 1974; Stedinger et al. 1985; Sim 1987) and on artificial neural network (ANN) models (Cigizoglu 2005; Kisi 2007; Ahmed and Sarma 2007; Yonaba et al. 2010; Mehr et al. 2015) have also been reported for both single- and multi-site cases.
Since single-site models are inherently simple and efficient for a single river, they are often employed for multi-river systems as well. However, as the cross-correlation structure is not built into the model, they cannot properly preserve the cross-correlation structure of multi-river flow data. Since a comprehensive comparison of the performances of single-site and multi-site models applied to the same river–tributary system is not available, this study aims to provide such a detailed analysis. A number of seasonal (monthly) AR models are developed for an existing river and its tributary, considering both single-site and multi-site formulations. For the distribution of the flow data, models are developed based on (i) the normal distribution and (ii) the gamma distribution, for both single-site and multi-site models.
The generated series are compared with the historical series in terms of long-term statistical parameters, including the mean, standard deviation, coefficient of skewness, serial correlation between successive months and cross-correlation between the two sites in the same month.
Study area
The Damodar Valley (DV) reservoir system in India is a multi-purpose, multi-reservoir system. The two upper reservoirs, Konar and Tilaiya, are constructed across the Konar and Barakar rivers, respectively, as shown in Fig. 1. The performance of the integrated operation of this multi-reservoir system largely depends on the flow in these rivers. For simulation and optimization studies on the operation of this system, a long sequence of possible future flows in these rivers that resembles the observed flow series is essential. Hence, streamflow generation models are developed for the flows in these two rivers, which are the inflows to the two reservoirs.
Konar dam is constructed across the Konar River, about 30.6 km from its confluence with the Damodar River. The reservoir is primarily used for flood control and for supplying cooling water to the Bokaro thermal power station downstream. Tilaiya dam was constructed across the Barakar River at Tilaiya, in the Koderma district of the Indian state of Jharkhand, mainly to supply irrigation water during the dry season. Tilaiya dam has a power generation capacity of 4 MW.
Streamflow generation models
Four single-site and two multi-site models are developed in this study. For the distribution of the flow series, the normal distribution and the gamma distribution are used. In many reported models, the normal distribution is used for its simplicity, but being a symmetric distribution it cannot preserve skewness. Since streamflow values are always positive, their distribution is inherently skewed, and the use of a skewed distribution such as the gamma distribution is preferred.
Single-site model
The general form of a seasonal, first-order Thomas–Fiering model is given below (Haan 1977):
\[ x_{i,j + 1} = \overline{x}_{j + 1} + r_{x,j} \frac{S_{x,j + 1}}{S_{x,j}} \left( x_{i,j} - \overline{x}_{j} \right) + z_{i,j + 1} \, S_{x,j + 1} \sqrt{1 - r_{x,j}^{2}} \tag{1} \]
in which \( x_{i,j} \) is the flow in the jth month of the ith year; \( \overline{x}_{j} \) and \( S_{x,j} \) are the mean and standard deviation of the flows in the jth month, respectively; \( r_{x,j} \) is the first-order serial correlation between the flows of the jth and (j + 1)th months; and \( z_{i,j + 1} \) is a random component with zero mean and unit variance. In the above equation for a monthly model, \( x_{i,j + 1} \) is understood to be \( x_{i + 1,1} \) when j = 12.
Normal model
Equation (1) represents the normal model if the random component z is taken as normally distributed with zero mean and unit standard deviation. Since the normal distribution is symmetric about the mean, some of the generated flows may turn out to be negative. Since a flow value cannot be negative, such values are usually discarded after being used to generate the next value. Moreover, as the starting value is selected arbitrarily, the first few years of generated values are discarded.
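The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `thomas_fiering_normal`, the argument layout (12-element arrays, with `r[j]` linking month j to month j + 1 and `r[11]` linking December to the next January), the December-mean starting value and the default warm-up of five years are all assumptions made for the example.

```python
import numpy as np

def thomas_fiering_normal(mean, std, r, n_years, warmup_years=5, seed=0):
    """Generate monthly flows with a seasonal lag-one Thomas-Fiering recursion.

    mean, std : length-12 monthly means and standard deviations
    r         : length-12 array; r[j] is the correlation between month j and j+1
    Returns an array of shape (n_years, 12); negative values are NOT removed here.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    r = np.asarray(r, dtype=float)
    rng = np.random.default_rng(seed)

    total = (n_years + warmup_years) * 12
    flows = np.empty(total)
    prev = mean[11]                 # arbitrary starting value (assumed: December mean)
    for t in range(total):
        nxt = t % 12                # index of the month being generated
        cur = (nxt - 1) % 12        # index of the preceding month
        z = rng.standard_normal()   # standard normal random component
        prev = (mean[nxt]
                + r[cur] * std[nxt] / std[cur] * (prev - mean[cur])
                + z * std[nxt] * np.sqrt(1.0 - r[cur] ** 2))
        flows[t] = prev             # a negative value still feeds the next step
    # drop the warm-up years that depend on the arbitrary starting value
    return flows[warmup_years * 12:].reshape(n_years, 12)
```

Negative values are deliberately left in the returned array so that the caller can decide how to treat them after generation, for example with `np.clip(flows, 0.0, None)`.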
Gamma model
If the observed series has appreciable skewness, use of a skewed distribution instead of normal distribution is preferable (Haan 1977). Gamma distribution is one such distribution which is used in this study.
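Equation (1) can also be used for the gamma model, except that the random component \( z_{i,j + 1} \) is replaced by \( \varepsilon_{i,j + 1} \) as follows:
\[ x_{i,j + 1} = \overline{x}_{j + 1} + r_{x,j} \frac{S_{x,j + 1}}{S_{x,j}} \left( x_{i,j} - \overline{x}_{j} \right) + \varepsilon_{i,j + 1} \, S_{x,j + 1} \sqrt{1 - r_{x,j}^{2}} \tag{2} \]
The random component \( \varepsilon_{i,j} \) is calculated from the following equation (Haan 1977):
\[ \varepsilon_{i,j} = \frac{2}{c_{\varepsilon,j}} \left( 1 + \frac{c_{\varepsilon,j} \, z_{i,j}}{6} - \frac{c_{\varepsilon,j}^{2}}{36} \right)^{3} - \frac{2}{c_{\varepsilon,j}} \tag{3} \]
where \( z_{i,j} \) is normally distributed with zero mean and unit standard deviation, as usual, and \( c_{\varepsilon,j} \) is the skewness of the random component \( \varepsilon_{i,j} \), given by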
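One form of the skewness relation that is consistent with the lag-one recursion follows from taking third moments of the standardized flows: \( c_{\varepsilon,j} = ( c_{x,j} - r_{x,j-1}^{3} c_{x,j-1} ) / ( 1 - r_{x,j-1}^{2} )^{3/2} \), where \( c_{x,j} \) denotes the skewness of the flows in month j. Both this expression and the symbol \( c_{x,j} \) are assumptions of the sketch below and are not necessarily the exact form used by the authors. The second function applies the Wilson–Hilferty cube transformation to turn standard normal deviates into approximately standardized gamma deviates; the names `noise_skewness` and `gamma_deviate` are hypothetical.

```python
import numpy as np

def noise_skewness(skew, r):
    """Skewness of the random component for each generated month (assumed form).

    skew : length-12 skewness of the flows in each month
    r    : length-12 array; r[j] is the correlation between month j and j+1
    Entry j of the result applies when month j is generated from month j-1.
    """
    skew = np.asarray(skew, dtype=float)
    r = np.asarray(r, dtype=float)
    r_prev = np.roll(r, 1)          # r_prev[j] = r[j-1], wrapping December -> January
    skew_prev = np.roll(skew, 1)    # skew_prev[j] = skew[j-1]
    return (skew - r_prev ** 3 * skew_prev) / (1.0 - r_prev ** 2) ** 1.5

def gamma_deviate(z, c_eps):
    """Map standard normal deviates z to approximately standardized gamma
    deviates with skewness c_eps (scalar), via the Wilson-Hilferty transformation."""
    z = np.asarray(z, dtype=float)
    if abs(c_eps) < 1e-8:           # zero skewness: the deviate stays normal
        return z
    return (2.0 / c_eps) * (1.0 + c_eps * z / 6.0 - c_eps ** 2 / 36.0) ** 3 - 2.0 / c_eps
```

In a gamma version of the single-site generator, `gamma_deviate(rng.standard_normal(), c_eps[j])` would take the place of the standard normal deviate for month j.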
Multi-site models
Multi-site modeling was first proposed by Fiering (1964) in the form of a principal component model. Later, Matalas (1967) proposed a lag-one multivariate model. The multi-site seasonal AR(1) model (Matalas 1967) may be written as:
\[ {\mathbf{Z}}_{i,j} = {\mathbf{A}}_{j} \, {\mathbf{Z}}_{i,j - 1} + {\mathbf{B}}_{j} \, {\varvec{\upvarepsilon}}_{i,j} \tag{5} \]
where \( {\mathbf{Z}}_{i,j} \) is a vector (\( n \times 1 \)) of standardized streamflow values at \( n \) sites (reservoirs). The subscripts \( i \) and \( j \) denote the year and season, where \( j = 1, 2, \ldots, w \), with \( w \) representing the number of seasons in the year (\( w \) = 12 for a monthly model). \( {\varvec{\upvarepsilon}}_{i,j} \) is a vector \( \left( {n \times 1} \right) \) of serially and mutually uncorrelated random variables with zero mean and unit variance. \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) are coefficient matrices of size \( \left( {n \times n} \right) \).
The \( {\mathbf{Z}}_{i,j} \) vector is assumed to be derived from the original series \( {\mathbf{X}}_{i,j} \) through a two-step process of standardization and normalization (if needed, for non-normal models) as follows:
\[ y_{i,j}^{k} = \frac{x_{i,j}^{k} - \overline{x}_{j}^{k}}{S_{x,j}^{k}} \tag{6} \]
\[ z_{i,j}^{k} = g_{i}^{k} \left( y_{i,j}^{k} \right) \tag{7} \]
In Eq. (6), \( x_{i,j}^{k} \) represents the actual streamflow value at site \( k \) during year \( i \) and month \( j \). It is the kth element of the vector \( {\mathbf{X}}_{i,j} \). The terms \( \overline{x}_{j}^{k} \) and \( S_{x,j}^{k} \) are the monthly mean and monthly standard deviation of the series \( \varvec{x}_{i}^{k} \), respectively, and \( y_{i,j}^{k} \) is the kth element of the standardized vector \( \varvec{y}_{i,j} \). In Eq. (7), the term \( g_{i}^{k} \)(.) is a transformation function which is applied in the case of non-normal distributions to normalize the original series. After generation of the Z series, the inverse of this transformation is applied to achieve the desired distribution.
Estimation of parameters
The parameter matrices \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) of Eq. (5) are estimated as follows (Haan 1977):
where \( {\mathbf{M}}_{0,j} \) and \( {\mathbf{M}}_{1,j} \) are the cross-covariance matrices of lag zero and lag one, respectively. The cross-covariance matrices are obtained from the following equations:
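Assuming the lag-one recursion \( {\mathbf{Z}}_{i,j} = {\mathbf{A}}_{j} {\mathbf{Z}}_{i,j-1} + {\mathbf{B}}_{j} {\varvec{\upvarepsilon}}_{i,j} \), with the random vector uncorrelated with \( {\mathbf{Z}}_{i,j-1} \) and of unit covariance, the moment relations behind these estimates can be sketched as follows. The expectation-based definitions of the two matrices and the use of the preceding season's lag-zero matrix are assumptions of this sketch; the authors' exact expressions (for example the sample divisor) may differ.

```latex
\begin{aligned}
\mathbf{M}_{0,j} &= E\!\left[\mathbf{Z}_{i,j}\,\mathbf{Z}_{i,j}^{T}\right], \qquad
\mathbf{M}_{1,j} = E\!\left[\mathbf{Z}_{i,j}\,\mathbf{Z}_{i,j-1}^{T}\right] \\
\mathbf{M}_{1,j} &= \mathbf{A}_{j}\,\mathbf{M}_{0,j-1}
  \;\;\Rightarrow\;\; \mathbf{A}_{j} = \mathbf{M}_{1,j}\,\mathbf{M}_{0,j-1}^{-1} \\
\mathbf{M}_{0,j} &= \mathbf{A}_{j}\,\mathbf{M}_{0,j-1}\,\mathbf{A}_{j}^{T} + \mathbf{B}_{j}\,\mathbf{B}_{j}^{T}
  \;\;\Rightarrow\;\; \mathbf{B}_{j}\,\mathbf{B}_{j}^{T}
  = \mathbf{M}_{0,j} - \mathbf{M}_{1,j}\,\mathbf{M}_{0,j-1}^{-1}\,\mathbf{M}_{1,j}^{T}
\end{aligned}
```

The second line is obtained by post-multiplying the recursion by \( {\mathbf{Z}}_{i,j-1}^{T} \) and taking expectations, and the third by post-multiplying it by \( {\mathbf{Z}}_{i,j}^{T} \). Only the product \( {\mathbf{B}}_{j} {\mathbf{B}}_{j}^{T} \) is determined.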
Matrix \( {\mathbf{B}}_{j} \) does not have a unique solution; several solutions are possible. Matalas (1967) suggested principal component analysis, but a more straightforward solution was proposed by Young and Pisano (1968), who assumed \( {\mathbf{B}}_{j} \) to be a lower triangular matrix.
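A lower triangular factor of a symmetric positive definite matrix is exactly what a Cholesky decomposition returns, so the estimation step can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions: the function name `estimate_matalas_parameters`, the array layout `(n_years, 12, n_sites)`, the divisor equal to the number of paired years, and the pairing of January with December of the previous year are choices made for the example, not details taken from the paper.

```python
import numpy as np

def estimate_matalas_parameters(z):
    """Estimate A_j and lower triangular B_j for each month.

    z : array of shape (n_years, 12, n_sites) holding standardized
        (and, for the gamma model, normalized) flows.
    Returns A and B, each of shape (12, n_sites, n_sites).
    """
    n_years, n_months, n_sites = z.shape
    A = np.empty((n_months, n_sites, n_sites))
    B = np.empty_like(A)
    for j in range(n_months):
        if j == 0:
            # January is paired with December of the previous year
            cur, prev = z[1:, 0, :], z[:-1, n_months - 1, :]
        else:
            cur, prev = z[:, j, :], z[:, j - 1, :]
        n = cur.shape[0]
        m0 = cur.T @ cur / n            # lag-zero matrix of the current month
        m0_prev = prev.T @ prev / n     # lag-zero matrix of the preceding month
        m1 = cur.T @ prev / n           # lag-one matrix
        A[j] = m1 @ np.linalg.inv(m0_prev)
        bbt = m0 - A[j] @ m1.T          # equals B_j B_j^T
        bbt = (bbt + bbt.T) / 2.0       # remove round-off asymmetry
        B[j] = np.linalg.cholesky(bbt)  # lower triangular factor
    return A, B
```

`np.linalg.cholesky` raises `LinAlgError` if the estimated matrix is not positive definite, which can happen with short or nearly collinear records; a production implementation would need a fallback for that case.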
Normal model
If the normalization step (Eq. 7) is omitted, then the model acts as a normal model.
Gamma model
To develop the gamma model, the original series \( {\mathbf{X}}_{i,j} \) is first standardized using Eq. (6), and the standardized series \( {\mathbf{Y}}_{i,j} \) is normalized using the Wilson–Hilferty transformation as follows:
\[ z_{i,j}^{k} = \frac{6}{c_{y,j}^{k}} \left[ \left( \frac{c_{y,j}^{k} \, y_{i,j}^{k}}{2} + 1 \right)^{1/3} - 1 \right] + \frac{c_{y,j}^{k}}{6} \tag{12} \]
where \( c_{y,j}^{k} \) represents the monthly skewness coefficient of the series \( {\mathbf{y}}_{i}^{k} \), and \( z_{i,j}^{k} \) is the kth element of the normalized vector \( {\mathbf{Z}}_{i,j} \).
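The normalization and its inverse can be checked numerically with a short round-trip sketch. It assumes the standard Wilson–Hilferty form \( z = (6/c)\,[ (c\,y/2 + 1)^{1/3} - 1 ] + c/6 \) for a scalar skewness c; the function names are hypothetical.

```python
import numpy as np

def wilson_hilferty_normalize(y, c):
    """Map standardized, gamma-like values y with skewness c to
    approximately standard normal values."""
    y = np.asarray(y, dtype=float)
    if abs(c) < 1e-8:                   # no skewness: nothing to normalize
        return y.copy()
    # np.cbrt is a real cube root, so slightly negative arguments are handled
    return (6.0 / c) * (np.cbrt(c * y / 2.0 + 1.0) - 1.0) + c / 6.0

def wilson_hilferty_inverse(z, c):
    """Inverse of wilson_hilferty_normalize for the same skewness c."""
    z = np.asarray(z, dtype=float)
    if abs(c) < 1e-8:
        return z.copy()
    return (2.0 / c) * (1.0 + c * z / 6.0 - c ** 2 / 36.0) ** 3 - 2.0 / c
```

Applying the two functions in sequence with the same skewness returns the original values, which is a convenient sanity check before the transformation is used inside a generator.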
The lag-zero and lag-one cross-covariance matrices are estimated from the Z series using Eqs. (10) and (11). Parameter matrices \( {\mathbf{A}}_{j} \) and \( {\mathbf{B}}_{j} \) are estimated from Eqs. (8) and (9), assuming \( {\mathbf{B}}_{j} \) to be a lower triangular matrix.
Then, a sequence of normal random deviates \( {\varvec{\upvarepsilon}}_{i,j} \) of length (N*n) is generated, where N is the number of years for which flows are to be generated and n is the number of sites.
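After the generation of the standard normal vector \( {\mathbf{Z}}_{i,j} \), the inverse of the transformation in Eq. (12) is applied to transform the generated normal vector into the standard gamma vector \( {\mathbf{Y}}_{i,j} \) using the following equation:
\[ y_{i,j}^{k} = \frac{2}{c_{y,j}^{k}} \left( 1 + \frac{c_{y,j}^{k} \, z_{i,j}^{k}}{6} - \frac{\left( c_{y,j}^{k} \right)^{2}}{36} \right)^{3} - \frac{2}{c_{y,j}^{k}} \tag{13} \]
Now, the original series is obtained as:
\[ x_{i,j}^{k} = \overline{x}_{j}^{k} + S_{x,j}^{k} \, y_{i,j}^{k} \tag{14} \]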
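The generation and back-transformation steps can be put together in a short sketch. It assumes that the coefficient matrices are already available as arrays of shape `(12, n_sites, n_sites)`, that the recursion is started from a zero vector and followed by a warm-up period, and that all skewness values are nonzero when the gamma option is used. The function names and defaults are illustrative only.

```python
import numpy as np

def generate_multisite(A, B, n_years, warmup_years=5, seed=0):
    """Generate Z of shape (n_years, 12, n_sites) from the lag-one recursion
    Z_j = A_j Z_{j-1} + B_j eps_j with standard normal eps_j."""
    n_months, n_sites, _ = A.shape
    rng = np.random.default_rng(seed)
    total_years = n_years + warmup_years
    z = np.empty((total_years, n_months, n_sites))
    prev = np.zeros(n_sites)            # arbitrary start, removed by the warm-up
    for i in range(total_years):
        for j in range(n_months):
            eps = rng.standard_normal(n_sites)
            prev = A[j] @ prev + B[j] @ eps
            z[i, j] = prev
    return z[warmup_years:]

def back_transform(z, mean, std, skew=None):
    """Convert generated Z values to flows.

    mean, std, skew : arrays of shape (12, n_sites); skew=None gives the
    normal model, otherwise the inverse Wilson-Hilferty step is applied
    (all skew values are assumed to be nonzero).
    """
    y = z
    if skew is not None:
        c = np.asarray(skew, dtype=float)
        y = (2.0 / c) * (1.0 + c * z / 6.0 - c ** 2 / 36.0) ** 3 - 2.0 / c
    return mean + std * y               # undo the standardization
```

Because the same random vector drives all sites through \( {\mathbf{B}}_{j} \), the cross-correlation between sites is carried into the generated flows, which is what a set of independent single-site generators cannot do.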
Results and discussion
After developing the models, monthly sequences of 100 years of flow data have been generated for each of the two rivers, namely Konar and Barakar. The statistical parameters of the models are estimated from thirty-seven years of observed flows in these two rivers. The generated series for the two rivers are compared with the corresponding observed series in terms of the mean, standard deviation and coefficient of skewness of each month, the serial correlation of successive months and the cross-correlation between the flows in the two rivers. The corresponding comparison plots are shown in Figs. 2, 3, 4 and 5 for Konar River with the single-site model, in Figs. 6, 7, 8 and 9 for Barakar River with the single-site model, in Fig. 10 for both Konar River and Barakar River with the single-site model, in Figs. 11, 12, 13 and 14 for Konar River with the multi-site model, in Figs. 15, 16, 17 and 18 for Barakar River with the multi-site model and in Fig. 19 for both Konar River and Barakar River with the multi-site model.
Results from single-site models
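The five comparison statistics can be computed from an observed or generated series as sketched below. The array layout `(n_years, 12)`, the function names, the use of the sample standard deviation (`ddof=1`) and the simple moment-based skewness coefficient without bias correction are assumptions of this example; the paper does not state which estimators were used.

```python
import numpy as np

def monthly_statistics(x):
    """Monthly mean, standard deviation, skewness and lag-one serial
    correlation for a flow array x of shape (n_years, 12)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    dev = x - mean
    # moment-based skewness coefficient (no small-sample bias correction)
    skew = (dev ** 3).mean(axis=0) / (dev ** 2).mean(axis=0) ** 1.5
    serial = np.empty(12)
    for j in range(12):
        if j < 11:
            a, b = x[:, j], x[:, j + 1]
        else:
            # December is paired with January of the following year
            a, b = x[:-1, 11], x[1:, 0]
        serial[j] = np.corrcoef(a, b)[0, 1]
    return {"mean": mean, "std": std, "skew": skew, "serial_corr": serial}

def monthly_cross_correlation(x1, x2):
    """Same-month cross-correlation between two sites, one value per month."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.array([np.corrcoef(x1[:, j], x2[:, j])[0, 1] for j in range(12)])
```

Calling both functions on the observed and on the generated arrays gives the monthly curves that are compared in the figures.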
Figure 2 presents the plots of monthly mean values obtained from the single-site models with normal and gamma distributions, along with those obtained from the observed data series, for Konar River. It may be seen that both the normal and the gamma model generated mean values almost equal to those of the observed series, except for the month of August. A similar plot for the monthly standard deviation is shown in Fig. 3, which also shows very close agreement of the generated series with the observed series, except for the months of August and September. The situation is, however, different in the case of the skewness coefficient (Fig. 4), where the gamma model yielded results similar to the observed series, whereas for the normal model the values are different and close to zero. This is expected, since the normal distribution is symmetric. The small amount of skewness that can be observed is due to setting the generated negative values equal to zero. In Fig. 5, which shows the serial correlations, it can be seen that all three plots are quite close.
Similar comparative plots of the statistical parameters are obtained for Barakar River, as shown in Figs. 6, 7, 8 and 9. Regarding preservation of mean values, it can be seen from Fig. 6 that both models yielded results very close to the observed series, except for the month of August. Comparatively, the gamma model yielded better results. In terms of the monthly standard deviation (Fig. 7), both the normal and the gamma model again produced results very close to the observed series, except for the months of August and September. For the skewness coefficient (Fig. 8), the results of the gamma model are quite similar to the observed series, but those of the normal model are quite different, as was observed for Konar River. As for the serial correlation, it can be seen from Fig. 9 that results from both models are very close to the observed values.
Although the single-site models were developed separately for each river and models for one river yielded results without having any knowledge about flows in the other river, just for comparison, cross-correlations are computed from the two generated series for Konar and Barakar, for each model. These values are shown in Fig. 10. Expectedly, generated cross-correlation values did not match at all with those of the observed series.
Results from multi-site models
Plots comparing the mean, standard deviation, skewness and serial correlation values obtained from the multi-site models with the two distributions against those of the observed series are shown in Figs. 11, 12, 13 and 14 for Konar River and in Figs. 15, 16, 17 and 18 for Barakar River. As with the single-site models, it can be observed that both the normal model and the gamma model preserved the mean and standard deviation values very well, for both Konar and Barakar. For the skewness coefficient, the gamma model yielded much better results than the normal model. Regarding serial correlation, however, both models produced values quite different from those of the observed series, for both Konar and Barakar.
Regarding preservation of cross-correlation between the flows in two rivers, it can be seen from Fig. 19 that both the models produced excellent results with values almost equal to those of the observed series.
Conclusion
A comparative study of the performances of single-site and multi-site AR models for synthetic generation of flows in an existing river and its tributary is presented in this paper. For the distribution of the flows, both normal and gamma distributions are used and compared; the gamma distribution is used to account for any skewness in the series. The results indicate that the mean and standard deviation are very well preserved by both the single-site and the multi-site models with each distribution, for both rivers. Serial correlation is well preserved by the single-site models, whereas the multi-site models produced serial correlations quite different from the observed values. The gamma model is found to be much better than the normal model in preserving skewness, which is expected since the normal distribution has zero skewness. Cross-correlation is not preserved at all by the single-site models but is excellently preserved by the multi-site models. Hence, where preservation of the mean, standard deviation, serial correlation and skewness is needed, the single-site model with gamma distribution can be used. If preservation of cross-correlation is required, then the multi-site model with gamma distribution is to be used. It may be noted that the performance of an AR model depends on its parameters, which in turn depend on the length of the observed record and on the variations in flow characteristics captured in the record. Hence, these conclusions are specific to the river system studied and may be applicable to river systems with similar flow characteristics.
References
Ahmed JA, Sarma AK (2007) Artificial neural network model for synthetic streamflow generation. Water Resour Manage 21:1015–1029
Arselan CA (2012) Stream flow simulation and synthetic flow calculation by modified Thomas Fiering model. Al-Rafidain Eng 20(2):118–127
Awchi TA, Srivastava DK (2009) Analysis of drought and storage for Mula project using ANN and stochastic generation models. Hydrol Res 40(1):79–91. https://doi.org/10.2166/nh.2009.012
Bobée B, Robitaille R (1975) Correction of bias in the estimation of the coefficient of skewness. Water Resour Res 11(6):851–854
Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control. Holden-Day, San Francisco, California, pp 55–56
Charbeneau RJ (1978) Comparison of the two- and three- parameter log normal distributions used in streamflow synthesis. Water Resour Res 14(1):149–150
Cigizoglu HK (2005) Application of generalized regression neural networks to intermittent flow forecasting and estimation. J Hydrol Eng 10(4):336–341
Fiering MB (1964) Multivariate Technique for Synthetic Hydrology. J Hydraulics Div ASCE 90(5):43–60
Fiering M, Jackson B (1971) Synthetic streamflows, water resources monograph 1. American Geophysical Union, Washington, D.C.
Haan CT (1977) Statistical methods in hydrology. Iowa State University Press, Ames
Hao Z, Singh VP (2013) Modeling multi-site streamflow dependence with maximum entropy copula. Water Resour Res 49:7139–7143. https://doi.org/10.1002/wrcr.20523
Harms AA, Campbell TH (1967) An extension to the Thomas–Fiering model for the sequential generation of Streamflow. Water Resour Res 3(3):653–661
Kisi O (2007) Streamflow forecasting using different artificial neural network algorithms. J Hydrol Eng 12:532–539
Matalas NC (1967) Mathematical assessment of synthetic hydrology. Water Resour Res 3(4):937–945
Mckerchar AI, Delleur JW (1974) Application of seasonal parametric linear stochastic models to monthly flow data. Water Resour Res 10(2):246–255
McMahon TA, Miller AJ (1971) Application of the Thomas and Fiering model to skewed hydrologic data. Water Resour Res 7(5):1338–1340
Mehr AD, Kahya E, Sahin A (2015) Successive-station monthly streamflow prediction using different artificial neural network algorithms. Int J Environ Sci Technol 12:2191–2200
Mejia JM, Rodriguez-Iturbe I (1974) Correlation links between normal and log normal processes. Water Resour Res 10(4):689–693
Moss ME, Dawdy DR (1974) Stochastic simulation for basins with short or no records of streamflow. Design of water resources projects with inadequate data. U.S. Geological Survey, Washington, D.C., pp 365–376
Phien HN, Ruksaslip W (1981) A review of single-site models for monthly streamflow generation. J Hydrol 52(1–2):1–12
Sangal BP, Biswas AK (1970) The 3-parameter lognormal distribution and its application in hydrology. Water Resour Res 6(2):505–515
Savic DA, Burn DH, Zrinji Z (1989) A comparison of streamflow generation models for reservoir capacity-yield analysis. Water Resour Bull 25(5):977–983
Shih SF (1978) Generating streamflow sequences with trend and cyclical movements. J Am Water Resour Assoc 14(4):942–955
Sim CH (1987) A mixed gamma ARMA(1,1) model for river flow time series. Water Resour Res 23(1):32–36
Srinivas VV, Srinivasan K (2005) Hybrid moving block bootstrap for stochastic simulation of multi-site multi-season streamflows. J Hydrol 302(1–4):307–330
Srivastav RK, Simonovic SP (2014) An analytical procedure for multi-site, multi-season streamflow generation using maximum entropy bootstrapping. Environ Model Softw 59:59–75
Stedinger JR, Taylor MR (1982) Synthetic streamflow generation: 1. Model verification and validation. Water Resour Res 18(4):909–918
Stedinger JR, Lettenmaier DP, Vogel RM (1985) Multisite ARMA(1,1) and disaggregation models for annual streamflow generation. Water Resour Res 21(4):497–509
Szilagyi J, Balint G, Csik A (2006) Hybrid, Markov chain-based model for daily streamflow generation at multiple catchment sites. J Hydrol Eng 11(3):245–256
Thomas HA, Fiering MB (1962) Mathematical synthesis of streamflow sequences for the analysis of river basin by simulation. In: Maass A et al (eds) Design of water resource systems. Harvard University Press, Cambridge, pp 459–493
Wallis JR, Matalas NC, Slack JR (1974) Just a Moment! Water Resour Res 10(2):211–219
Wang W, Ding J (2007) A multivariate nonparametric model for synthetic generation of daily streamflow. Hydrol Process 21:1764–1771
Yonaba H, Anctil F, Fortin V (2010) Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting. J Hydrol Eng 15(4):275–283
Xu Z, Schumann A, Li J (2003) Markov cross-correlation pulse model for daily streamflow generation at multiple sites. Adv Water Resour 26:325–335
Xu Z, Schumann A, Brass C, Li J, Ito K (2001) Chain-dependent Markov correlation pulse model for daily streamflow generation. Adv Water Resour 24:551–564
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Medda, S., Bhar, K.K. Comparison of single-site and multi-site stochastic models for streamflow generation. Appl Water Sci 9, 67 (2019). https://doi.org/10.1007/s13201-019-0947-3