
1 Introduction

Hydrology deals with the occurrence, movement, and storage of water in the earth system. Water occurs in liquid, solid, and vapor phases; it is transported through the atmosphere, the land surface, and the subsurface along various pathways and is stored temporarily in the vegetation cover, soil, wetlands, lakes, flood plains, aquifers, oceans, and the atmosphere. Thus, hydrology deals with understanding the underlying physical and stochastic processes involved and with estimating the quantity and quality of water in the various phases and stores. For this purpose, a number of physical and statistical laws are applied, mathematical models are developed, and various state, input, and output variables are measured at many points in time and space. In addition, natural systems are increasingly being affected by human interventions such as the building of dams, river diversions, groundwater pumping, deforestation, irrigation, hydropower development, mining operations, and urbanization. Thus, the study of hydrology also includes quantifying the effects of such human interventions on the natural system at watershed, river basin, regional, country, continental, and global scales. Water covers about 70 % of the earth’s surface, but only about 2.5 % of the total water on the earth is freshwater; the rest is saltwater (NASA Earth Observatory website). Of the earth’s freshwater, about 70 % is contained in rivers, lakes, and glaciers and about 30 % in aquifers as groundwater [1].

A related concept commonly used in hydrology is the hydrologic cycle. It conveys the idea that as water occurs in nature, say in the form of rainfall, part of it may be temporarily stored on vegetation (e.g., trees) while the remainder reaches the ground surface; of that amount, part may infiltrate and percolate into the subsurface, and another part may travel over the land surface, eventually reaching the streams and the ocean. In addition, part of the water temporarily stored on the vegetation canopy, the soil, depression pools, the snowpack, the lakes, and the oceans evaporates back into the atmosphere. This process of water circulating from the onset of precipitation, traveling through the river basin (or through the entire earth system), and then evaporating back to the atmosphere is known as the hydrologic cycle.

This introductory chapter covers seven subjects, namely, hydroclimatology, surface water hydrology, soil hydrology, glacier hydrology, watershed and river basin modeling, risk and uncertainty analysis, and data acquisition and information systems. The intent is to discuss basic concepts and methods for quantifying the amount of water in the various components of the hydrologic cycle. The chapter cannot be comprehensive because of space limitations. The emphasis is therefore on recent developments, particularly the role that atmospheric and climatic processes play in hydrology, advances in hydrologic modeling of watersheds, experience in applying statistical concepts for dealing with risk and uncertainty, the challenges posed by nonstationarity, and the use of newer equipment (particularly spaceborne sensors) for detecting and estimating components of the hydrologic cycle such as precipitation, soil moisture, and evapotranspiration. Current references have been included where feasible for most of the subjects.

2 Hydroclimatology

Not all years are equal when it comes to hydrology and climate. The year-to-year response of the hydrologic system that results in floods or droughts is driven by the nonlinear interactions of the atmosphere, oceans, and land surface. While a deterministic understanding of the complex interactions of these systems may be near impossible, certain patterns have been identified that correlate with particular hydrologic responses in different locations.

These identified patterns range in spatial and temporal scale as depicted in Fig. 1.1. At the lower left are the smaller-scale, relatively fast-evolving atmospheric phenomena that can impact midlatitude weather systems, resulting in different hydrologic outcomes. As the space and time scales expand, ocean processes start to play a role, and the patterns become coupled ocean–atmosphere events that can span multiple years and influence both the spatial pattern and the magnitude of the hydrologic response. The largest spatial and longest time-scale processes come from the oceanic system and can play a role in decadal variability of hydrologic response.

Fig. 1.1

Schematic depicting the range of spatial and temporal scale of climate patterns and associated hydrologic response

The strength of a given pattern and the interactions among multiple identified patterns across multiple scales play an important role in the type and level of hydrologic response (e.g., flood or drought). In addition, changes to the hydroclimatic system arising from natural and anthropogenic elements can impact the hydrology in a given location. This section presents an overview of the climate system and its potential impact on hydrology. Specific patterns in the ocean and atmospheric systems will be shown and related to hydrologic response in locations where a clear connection has been identified. Hydrologic response to climate change will also be reviewed, noting some of the latest work completed in this area.

2.1 The Hydroclimatic System

The climate for a given location is a function of the nonlinear interactions of multiple physical processes occurring simultaneously in the atmosphere, ocean, and land surface systems. The atmosphere responds to changes in solar radiation, tilt and rotation of the earth, atmospheric constituents, and distribution of heat input from the ocean and land surface systems. The ocean system responds to changes in wind stresses from the atmosphere as well as from thermohaline currents at various depths that may be influenced by the bathymetry of the different ocean basins and relative positions of the continents. The land system is influenced by the temperature of both atmosphere and ocean and develops its own pattern of heating that is radiated back to the atmosphere as long-wave radiation. All of these elements play a role in the evolution of weather systems that result in different hydrologic outcomes.

While physical equations have been developed to describe the different time-evolving elements of these systems, using them directly to determine their impact on hydrology is extremely complex and filled with uncertainty. An alternative approach is to look for characteristic recurring patterns in the hydroclimatic system and examine their correlation with hydrologic time series to determine if there is a potential link. In some cases, the correlation may not be strong, but this may be due to the impact of other patterns or the combination of processes. Because of this, greater insight may be gained by examining hydrologic response through the use of probability distributions conditioned upon a given hydroclimatic pattern or collection of patterns. This can be limited by the available realizations provided by the observed record.

In the following sections, three scales of hydroclimate patterns identified in Fig. 1.1 are presented along with their potential impact on hydrologic response. Examples from observations or studies that have identified regions having significant correlative response will be highlighted. Additional factors that can impact extreme events also will be pointed out. Finally, a discussion of hydrologic response due to climate change will be provided in the context of scale and forcing of the hydroclimate system.

2.2 Hydroclimatic System Patterns: Atmospheric Patterns

Atmospheric patterns are the smallest in spatial scale and shortest in temporal scale. They are considered hydroclimatic patterns because they are larger than the scale of weather systems, which is often referred to as the synoptic scale [2]. The synoptic scale has a spatial extent the size of the time-varying high- and low-pressure systems that form as part of the time evolution of the atmosphere. These systems are usually 500–1,000 km in spatial extent, with extreme cases being larger, and their life cycle as they impact a given location results in a time scale on the order of 3 days. Atmospheric hydroclimate patterns, by contrast, evolve on the order of weeks and have a spatial scale of several thousand kilometers. In addition, a pattern may itself excite planetary waves that can impact weather systems far removed from the pattern.

One of the most well-known atmospheric hydroclimate patterns is the Madden-Julian Oscillation [3]. This continent-sized cluster of convective activity migrates across the tropics with a periodicity ranging from 30 to 90 days. It is thought that the convective activity excites planetary-scale waves that can interact with weather systems in the midlatitudes, which can lead to enhanced precipitation for some locations. Maloney and Hartmann [4, 5] studied the influence of the Madden-Julian Oscillation on hurricane activity in the Gulf of Mexico.

A second pattern of atmospheric hydroclimate that can influence midlatitude weather systems and the resulting hydrologic response is the Arctic Oscillation [6]. This pressure pattern between the Northern Hemisphere polar region and northern midlatitudes has two phases called the positive phase and negative phase. In the positive phase, the higher pressures are in the northern midlatitudes which results in storm tracks shifting northward and confining arctic air masses to the polar region. As a result, places like Alaska, Scotland, and Scandinavia tend to be wetter and warmer, while the Mediterranean region and western United States tend to be drier. The negative phase is the opposite with more cold air movement to the northern midlatitudes and wetter conditions in the western United States and Mediterranean regions. The time frame for the oscillations is on the order of weeks. The oscillation does not directly cause storms but influences pressure tendencies in the midlatitudes that can facilitate the formation of storms in select regions. Additional information on this phenomenon can be found on the National Oceanographic and Atmospheric Administration’s (NOAA) Climate Prediction Center’s web pages (e.g., http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/teleconnections.shtml).

2.3 Hydroclimatic System Patterns: Coupled Atmosphere-Ocean Patterns

Coupled atmosphere-ocean patterns extend from the scale of atmospheric phenomena to the scale of select regions in ocean basins. These patterns can persist from months to years and can have significant influence on atmospheric circulation patterns that result in changes to storm tracks and observed hydrologic conditions at given locations.

The best-known phenomenon of this type is the El Niño/Southern Oscillation (ENSO). The ENSO pattern was discovered in pieces by different researchers in the late 1800s [7]. Subsequent studies showed that the variously observed pressure differences, changes in surface ocean currents, and changes in the equatorial sea surface temperatures in the eastern Pacific Ocean from the dateline to the coast of South America were all part of the ENSO pattern. There are three phases to ENSO: a warm (El Niño) phase, a cool (La Niña) phase, and a neutral phase. Transitions between phases occur in time periods ranging from 2 to 7 years. While this is a tropical phenomenon, hydrologic impacts occur across the globe as the global atmosphere responds to the tropical ocean/atmosphere conditions that can persist for more than a year. Further information on ENSO can be found in Philander [7] and NOAA’s Climate Prediction Center web pages.

The United States has several regions with seemingly well-defined hydrologic responses to the different phases of ENSO. The Southeast tends to have warmer, drier winters during La Niña and cooler, wetter winters during El Niño. In the west, the Pacific Northwest tends to be wetter (drier) than average during La Niña (El Niño), while the Southwest is drier (wetter) than average [8]. Cayan et al. [9] investigated the relationship of ENSO to hydrologic extremes in the western United States. Gray [10], Richards and O’Brien [11], and Bove et al. [12] have investigated links of Atlantic Basin hurricane activity to the state of ENSO, which has a distinct impact on hydrologic conditions in the Gulf States and Eastern seaboard.

It is important to realize that the ENSO phenomenon impacts atmospheric circulation patterns; variability in the positioning of those circulation patterns relative to the land surface can have a significant influence on the observed hydrologic response at some locations. Figure 1.2 shows a plot of the Multivariate ENSO Index, an index based on multiple factors that measures the strength of an El Niño or La Niña event [13]. In Fig. 1.2, red regions are associated with El Niño events, and blue regions are associated with La Niña events.

Fig. 1.2

Plot of multivariate ENSO index from 1950 to present. Blue regions are associated with La Niña events and red regions are associated with El Niño events (source: NOAA, ESRL, http://www.pmel.noaa.gov/co2/file/Multivariate+ENSO+Index) (Color figure online)

2.4 Hydroclimatic System Patterns: Ocean System Patterns

The oceanic component of the hydroclimate system has the longest time scale of evolution which can lead to interannual to decadal influences on hydrologic response. Ocean system patterns that influence the hydroclimate system are often tied to sea surface temperature patterns that are driven in part by ocean circulations due to heat content and salinity variations across the depth and breadth of the ocean basins.

One pattern of oceanic hydroclimate is the Pacific Decadal Oscillation (PDO). This sea surface temperature pattern spans the entire Pacific Ocean north of the equator [14, 15]. In the Atlantic basin, the Atlantic Multidecadal Oscillation (AMO) has been identified by Xie and Tanimoto [16]. Figure 1.3 shows a plot of the PDO and AMO.

Fig. 1.3

Time series of PDO and AMO (with permission from [17])

For the PDO, there are two phases, a warm phase and a cold phase. In the warm phase of the PDO, a pool of warmer-than-average sea surface temperatures extends across the northeast Pacific, surrounded by a ring of cooler-than-normal water to the west. The cold phase has a cooler-than-average pool of water in the northeast Pacific with a ring of warmer water surrounding it to the west. The transition between warm and cold phases occurs every 10–30 years. The PDO’s discovery was an outcome of a search for causal mechanisms of changes in fisheries patterns along the coast of North America [14, 18]. Because ocean patterns evolve over long time periods, they tend to serve as a backdrop upon which the shorter time-scale processes occur. In that sense, their impacts tend to relate more to decadal variability than to specific events. Correlations with hydrologic conditions can be found in numerous studies and reviews (e.g., [19–21]).

Like the PDO, the AMO has warm and cold phases defined primarily by sea surface temperature (SST) patterns. For the North Atlantic and the AMO, any linear trend is removed from the SST time series prior to determining the phase of the AMO in order to take anthropogenic climate change into account. Variability in the AMO is associated with the ocean’s thermohaline circulation. Correlations of the AMO to Northern Hemisphere precipitation and air temperature patterns are also numerous (e.g., [22–24]).

2.5 Interactions Across Scales and Extreme Events

The phenomena mentioned above do not evolve in isolation, and at any given time, multiple features can be influencing midlatitude weather patterns and their associated hydrologic response. In some cases, the interactions can mitigate the influence of one pattern and may muddle the correlation with hydrologic response in a given location. On the other hand, there may be times when interactions between the processes occur in such a way that an unusually extreme event results. In these cases, there may be additional processes such as atmospheric rivers [25] that come into play.

Atmospheric rivers are narrow bands of high concentrations of atmospheric water vapor that extend from the tropics to the midlatitudes. When these water vapor bands interact with the right atmospheric dynamics, extreme precipitation events tend to occur. The relation of processes such as atmospheric rivers and other hydroclimate patterns and their associated impact on hydrologic response is an area of open research. NOAA’s Climate Prediction Center tracks a large collection of these hydroclimate system patterns and has more information and references on their website.

2.6 Climate Change

Changes in atmospheric composition that affect the radiative balance of the atmosphere can have significant impacts on hydrologic processes. Increasing temperatures lead to higher freezing altitudes, which in turn raise snow lines. Higher snow lines mean a greater watershed area contributing to runoff during a precipitation event, resulting in more direct runoff and possibly higher peak flows. Higher snow lines may also result in smaller runoff volumes during the snowmelt period, changing the shape of the annual hydrograph, and may change local water balances, altering watershed yields for water supply purposes.

Methods for assessing hydrologic impacts of climate change are varied. Assessments of impacts to annual and monthly hydrology for water supply purposes have used scaled changes to monthly flow volumes based on ratios (e.g., [26–28]). Hydrologic models have been used to determine changes to flows using temperature and precipitation change estimates from global climate model projections (e.g., [29–31]). However, these simulations assume that a model calibrated to historical hydrologic conditions remains appropriate for future climate conditions. This assumption suggests that more research is needed into watershed processes and how their relationships to each other may change under different climate conditions. Another option for expanding the hydrologic realizations of the observed record is to use paleoclimate estimates of hydrologic variables, as has been done in the United States Bureau of Reclamation’s Lower Colorado Study [32]. Other methodologies will likely be developed as more refined climate change projection information becomes available and more planning studies require consideration of climate change impacts.

2.7 Remarks

Climate plays a significant role in hydrologic response. Year-to-year variations in peak flows, low flows, or annual totals can be related to specific hydroclimatic patterns through a variety of correlative methods. Several hydroclimatic patterns have been identified, with phases lasting from days to years to decades. Climate change may cause fundamental shifts in hydrologic processes at a given location that alter the correlative relation between a climate phenomenon and the local hydrologic response. Continued research and development are needed to move beyond correlative relations to a greater understanding of the physical processes through which climate affects weather and, in turn, hydrologic response. While a deterministic mapping of these processes may not be possible due to the complexity and interaction of the different phenomena, there should be opportunities for examining conditional probability distributions and their evolution based on the evolution of the climate system.

3 Surface Water Hydrology

3.1 Precipitation

The lifting of moist air masses in the atmosphere leads to cooling and condensation, which results in precipitation of water vapor from the atmosphere in the form of rain, snow, hail, and sleet. As air masses cool, cloud droplets form on condensation nuclei consisting of dust particles or aerosols (typically < 1 μm in diameter). When a condensed moisture droplet grows larger than about 0.1 mm, it falls as precipitation, and these drops grow further as they collide and coalesce to form larger droplets. Raindrops reaching the ground are typically in the size range of 0.5–3 mm, while rain with droplet sizes less than 0.5 mm is called drizzle.

There are three main mechanisms that lift air masses. Frontal lifting occurs when warm air is lifted over cooler air during frontal passage, resulting in cyclonic or frontal storms. The zone where the warm and cold air masses meet is called a front. In a warm front, warm air advances over a colder air mass at a relatively slow rate of ascent, causing precipitation over a large area, typically 300–500 km ahead of the front. In a cold front, warm air is pushed upward at a relatively steep slope by the advancing cold air, leading to smaller precipitation areas in advance of the cold front. Precipitation rates are generally higher in advance of cold fronts than in advance of warm fronts. In orographic lifting, air rises as it is forced over hills or mountains, as occurs in the northwestern United States, and the resulting precipitation events are called orographic storms. Orographic precipitation is a major factor in most mountainous areas and exhibits a high degree of spatial variability. In convective lifting, warm air rises by virtue of being less dense than the surrounding air, and the resulting precipitation events are called convective storms or, more commonly, thunderstorms.

Natural precipitation is hardly ever uniform in space, and spatially averaged rainfall (also called mean areal precipitation) is commonly utilized in hydrologic applications. Mean areal precipitation tends to be scale dependent and statistically nonhomogeneous in space. Precipitation at any location (measured or unmeasured) may be estimated using an interpolation scheme that employs linear weighting of point precipitation measurements at the individual rain gauges over a desired area as

$$ \widehat{P}(x)={\displaystyle \sum_{i=1}^N{w}_i}P\left({x}_i\right), $$
(1.1)

where \( \widehat{P}(x) \) is the precipitation estimate at location x; P(x i ) is the measured precipitation at rain gauge i located at x i ; w i is the weight associated with the point measurement at station i; and N is the total number of measurements (gauges) used in the interpolation. For the estimate to be unbiased, the weights must satisfy the condition \( \sum_{i=1}^N {w}_i=1 \).

There are a variety of ways to estimate the weights, w i , depending on the underlying assumptions about the spatial distribution of the precipitation. Some of the more common methods are summarized briefly below; a short code sketch illustrating the first and third schemes follows the list:

  (a)

    The precipitation is assumed to be uniformly distributed in space, and an equal weight is assigned to each station so that the estimated rainfall at any point is simply equal to the arithmetic average of the measured data, i.e.,

    $$ {w}_i=\frac{1}{N},\kern3em i=1,\dots, N. $$
    (1.2)
  (b)

    The precipitation at any point is estimated to equal the precipitation at the nearest station. Under this assumption, w i = 1 for the nearest station, and w i = 0 for all other stations. This methodology is the discrete equivalent of the Thiessen polygon method [33] that has been widely used in hydrology.

  (c)

    The weight assigned to each measurement station is inversely proportional to the distance from the estimation point to the measurement station. This approach is frequently referred to as the reciprocal-distance approach (e.g., [34]). An example of the reciprocal-distance approach is the inverse-distance-squared method in which the station weights are given by

    $$ {w}_i=\frac{1/{d}_i^2}{{\displaystyle \sum_{i=1}^N\left(1/{d}_i^2\right)}},\kern3em i=1,\dots, N, $$
    (1.3)

    where d i is the distance to station i and N is the number of stations within some defined radius of the point where the precipitation is to be estimated.

  (d)

    The weights are calculated using geostatistical methods such as kriging using either the covariance or the variogram function of the precipitation (e.g., [35]). Because the kriging weights are dependent on spatial continuity of precipitation, kriging techniques are suitable for examining scale dependency of spatially averaged precipitation [36].
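As a concrete illustration of schemes (a) and (c), the following Python sketch computes the precipitation estimate of (1.1) with inverse-distance-squared weights (1.3). It is a minimal example, not taken from the source; the function name, gauge coordinates, and storm depths are hypothetical.

```python
import numpy as np

def idw_estimate(x, gauges, depths, power=2.0):
    """Precipitation estimate at point x by inverse-distance weighting
    (Eqs. 1.1 and 1.3).

    x      : (2,) location of the estimate
    gauges : (N, 2) gauge coordinates x_i
    depths : (N,) measured precipitation P(x_i)
    """
    d = np.linalg.norm(gauges - x, axis=1)
    if np.any(d < 1e-9):          # estimation point falls on a gauge
        return float(depths[np.argmin(d)])
    w = 1.0 / d**power            # inverse-distance-squared for power = 2
    w /= w.sum()                  # enforce sum(w_i) = 1 (unbiasedness)
    return float(np.dot(w, depths))

# Hypothetical example: three gauges (coordinates in km, depths in mm)
gauges = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
depths = np.array([12.0, 20.0, 16.0])
print(idw_estimate(np.array([4.0, 4.0]), gauges, depths))
```

Setting all weights to 1/N instead (scheme (a)) recovers the arithmetic average, and assigning weight 1 to the nearest gauge (scheme (b)) recovers the discrete Thiessen estimate.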

The methods above should not be used to estimate precipitation depths over mountainous watersheds, where spatial variability is very high. Nowadays, these computations are facilitated by geographic information systems (GIS), which enable processing and visualization of the data. Figure 1.4 shows an example of spatial interpolation of precipitation over a watershed using kriging techniques.

Fig. 1.4

Interpolated precipitation map over the Ohio River Basin using USHCN precipitation stations for February 2000

After specifying the station weights in the precipitation interpolation formula, the next step is to numerically discretize the averaging area by placing an averaging grid. The definition of the averaging grid requires specification of the origin, discretization in the x- and y-directions, and the number of cells in each of the coordinate directions. The precipitation, \( \widehat{P}\left({x}_j\right) \), at the center, x j , of each cell is then calculated using (1.1) with specified weights, and the average precipitation over the entire area, \( \overline{P} \), is given by

$$ \overline{P}=\frac{1}{A}{\displaystyle \sum_{j=1}^J\widehat{P}\left({x}_j\right){A}_j}, $$
(1.4)

where A is the averaging area, A j is the area contained in cell j, and J is the number of cells that contain a portion of the averaging area.
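Continuing the sketch above, the mean areal precipitation of (1.4) can be computed by evaluating the interpolation at each cell center and weighting by cell area. The grid geometry below is hypothetical, and idw_estimate refers to the function in the previous sketch.

```python
import numpy as np

def mean_areal_precip(cell_centers, cell_areas, gauges, depths):
    """Mean areal precipitation over a discretized area (Eq. 1.4).

    cell_centers : (J, 2) cell-center coordinates x_j
    cell_areas   : (J,) cell areas A_j (their sum is the averaging area A)
    Relies on idw_estimate() from the previous sketch for P_hat(x_j).
    """
    p_hat = np.array([idw_estimate(c, gauges, depths) for c in cell_centers])
    return float(np.dot(p_hat, cell_areas) / cell_areas.sum())

# Hypothetical 2 x 2 grid of 25-km^2 cells over the gauges defined earlier
centers = np.array([[2.5, 2.5], [7.5, 2.5], [2.5, 7.5], [7.5, 7.5]])
areas = np.full(4, 25.0)
print(mean_areal_precip(centers, areas, gauges, depths))
```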

The fractions of precipitation that are intercepted, infiltrate into the ground, or fill local depressions are called abstractions or losses, while the remainder, the fraction that generates runoff, is called excess precipitation. The terms used in abstraction and runoff computations are illustrated in Fig. 1.5, where precipitation and loss rates are plotted versus time for a precipitation event. The total precipitation depth P(t) is the area under the plot of precipitation intensity i. The total precipitation is partitioned into initial abstraction I a , continued abstraction F a , and excess precipitation (which is assumed to be converted into surface runoff; its accumulation is the cumulative runoff R(t)). The initial abstraction I a is the area under the precipitation intensity curve at the beginning of the event, when all the precipitation is lost through interception, surface storage, infiltration, and other abstractions. The continued abstraction F a includes losses that occur after the initial abstraction has been met and primarily represents infiltration into the soil. Referring to Fig. 1.5, the continued abstraction is the area under the loss rate curve after runoff is initiated, and the total abstraction is the sum of I a and F a . The excess precipitation R(t) is the area under the precipitation intensity plot after subtracting the total losses. The ultimate abstraction S is an estimate of the total abstraction, assuming that precipitation continues indefinitely.

Fig. 1.5

Partitioning of the total precipitation hyetograph into excess precipitation and abstractions. The cumulative precipitation P(t) and cumulative runoff R(t) are also shown schematically

3.2 Interception and Depression Storage

Interception is the part of precipitation that is caught and stored on surfaces such as vegetation before reaching the ground. Part of the intercepted water evaporates, but part of it may eventually filter through the vegetation and reach the soil surface as throughfall or creep down the branches as stemflow. Studies indicate that interception accounts for 10–30 % of the total rainfall in the Amazon rainforest, depending on the season. Precipitation is also intercepted by buildings and other aboveground structures in urban areas and industrial complexes. Methods used for estimating interception are mostly empirical; the amount of interception is expressed either as a fraction of the amount of precipitation or as a function of the precipitation amount. Interception percentages over seasonal and annual time scales for several types of vegetation have been summarized by Woodall [37]. These data indicate that, on an annual basis, interception ranges from 3 % for hardwood litter to 48 % for some conifers. Many interception formulas are similar to that originally suggested by Horton [38], where the interception, I, for a single storm is related to the precipitation amount, P, by an equation of the form

$$ I=a+b{P}^n, $$
(1.5)

where a, b, and n are constants. When I and P are expressed in millimeters, typical values are n = 1 (for most vegetative covers), a from 0.02 mm for shrubs to 0.05 mm for pine woods, and b from 0.18 to 0.20 for orchards and woods and about 0.40 for shrubs. The interception storage capacity of surface vegetation may vary from less than 0.3 mm to 13 mm, with a typical value for turf grass of 1.3 mm.

Some interception models account for limited storage capacity of surface vegetation and evaporation during a storm (e.g., [39]) such as

$$ I=S\left(1-{e}^{-P/S}\right)+{K}^{\prime } Et, $$
(1.6)

where S is the storage capacity of vegetation, P is the amount of precipitation during the storm, K′ is the ratio of the surface area of one side of the leaves to the projection of the vegetation at the ground (called the leaf area index), E is the evaporation rate during the storm from plant surfaces, and t is the duration of the storm. The storage capacity, S, is typically in the range of 3–5 mm for fully developed pine trees; 7 mm for spruce, fir, and hemlock; 3 mm for leafed-out hardwoods; and 1 mm for bare hardwoods [40]. More sophisticated models of interception are described in Ramirez and Senarath [41] and Brutsaert [42].
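The two interception models can be evaluated directly. The following Python sketch implements (1.5) and (1.6) with illustrative parameter values drawn from the ranges quoted above; the function names and the specific inputs are assumptions for the example.

```python
import math

def horton_interception(P, a=0.03, b=0.19, n=1.0):
    """Single-storm interception I = a + b * P**n (Eq. 1.5); I, P in mm."""
    return a + b * P**n

def storage_limited_interception(P, S=4.0, K_lai=3.0, E=0.1, t=2.0):
    """Interception with limited canopy storage and in-storm evaporation
    (Eq. 1.6); S in mm, E in mm/h, t in h, K_lai = leaf area index."""
    return S * (1.0 - math.exp(-P / S)) + K_lai * E * t

P = 25.0                                  # hypothetical storm depth (mm)
print(horton_interception(P))             # ~4.8 mm
print(storage_limited_interception(P))    # ~4.6 mm
```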

Interception by forest litter is much smaller than canopy interception. The amount of litter interception is largely dependent on the thickness of the litter, water holding capacity, frequency of wetting, and evaporation rate. Studies have shown that it is only a few millimeters in depth in most cases [43] and, typically, about 1–5 % of annual precipitation and less than 50 mm/year are lost to litter interception [44].

Water that accumulates in surface depressions during a storm is called depression storage and can be a major part of the hydrologic budget in flat watersheds [45]. This portion of rainfall does not contribute to surface runoff. Depression storage is generally expressed as an average depth over the catchment area, and typical depths range from 0.5 to 7.5 mm.

3.3 Infiltration

The process by which water enters the ground through the soil surface is called infiltration; it is usually the dominant rainfall abstraction process. Bare-soil infiltration rates are considered high when they are greater than 25 mm/h and low when they are less than 2.5 mm/h [46]. The infiltration rate f expresses how fast water enters the soil at the surface. If water is ponded on the surface, infiltration occurs at the potential infiltration rate (often called the infiltration capacity) and is limited by soil properties. When rain falls on an initially dry soil and the rate of supply of water at the surface (the rainfall rate) is less than the potential infiltration rate, all the water enters the soil, and infiltration is limited by the rainfall rate. The cumulative infiltration F is the accumulated depth of water infiltrated over a given time and is related to the infiltration rate as

$$ f(t)=\frac{ dF(t)}{ dt} $$
(1.7a)

and

$$ F(t)={\displaystyle {\int}_0^tf(t) dt}. $$
(1.7b)

The simplest model for infiltration is the ϕ index, which is a constant rate of abstraction such that the excess depth of rainfall equals the direct runoff depth; it has been commonly used in practice. Our current understanding of water movement through unsaturated soils is expressed by Richards’ equation, and the infiltration process determines the boundary condition at the soil surface. Since Richards’ equation is nonlinear, simpler empirical models for infiltration are commonly used. For example, Horton [47, 48] expressed potential infiltration rate as

$$ f(t)={f}_c+\left({f}_0-{f}_c\right){e}^{- kt}, $$
(1.8)

where k is a decay constant and f 0 is the initial infiltration rate at t = 0; the rate decreases exponentially with time until it reaches a constant rate f c . Philip [49, 50] expressed cumulative infiltration as

$$ F(t)=S{t}^{1/2}+ Kt, $$
(1.9)

where S is soil sorptivity (a function of the soil suction potential) and K is the saturated hydraulic conductivity. Thus, the potential infiltration rate from this model when water supply is not limited is

$$ f(t)=\frac{1}{2}S{t}^{-1/2}+K. $$
(1.10)

The two terms in Philip’s equation represent the effects of suction and gravity forces, respectively, in moving the water to deeper soil locations.
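The Horton and Philip models are easy to compare numerically. The sketch below implements (1.8) and (1.10); the parameter values (f 0, f c, k, S, K) are illustrative assumptions, not recommended values.

```python
import numpy as np

def horton_f(t, f0=76.2, fc=12.7, k=2.0):
    """Horton potential infiltration rate (Eq. 1.8); f in mm/h, k in 1/h."""
    return fc + (f0 - fc) * np.exp(-k * t)

def philip_f(t, S=25.0, K=10.0):
    """Philip potential infiltration rate (Eq. 1.10);
    S in mm/h^0.5 (sorptivity), K in mm/h (saturated conductivity)."""
    return 0.5 * S / np.sqrt(t) + K

t = np.linspace(0.1, 4.0, 5)          # elapsed time in hours
print(horton_f(t))                    # decays from ~64.7 toward 12.7 mm/h
print(philip_f(t))                    # decays toward K = 10 mm/h
```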

Green and Ampt [51] proposed a simplified infiltration model which approximated the water content profile in the soil as a sharp front, with the volumetric moisture content equal to the initially uniform value of θ i below the front and saturated soil with moisture content equal to porosity η above the front. The wetting front penetrates to a depth L in time t since the start of the infiltration process. Water is ponded to a small depth H 0 on the soil surface, denoting an infinite supply of water at the surface. For a control volume extending from the soil surface to the wetting front of unit area, volumetric continuity yields

$$ F(t)=L\left(\eta -{\theta}_i\right)= L\varDelta \theta. $$
(1.11)

Denoting H as the total head (sum of gravity and suction heads), Darcy’s law over this length of saturated soil is

$$ -f=-K\frac{\partial H}{\partial z}. $$
(1.12)

Simplification yields

$$ f=K\left[\frac{\psi \varDelta \theta +F}{F}\right] $$
(1.13)

and

$$ F(t)= Kt+\psi \varDelta \theta \ln \left(1+\frac{F(t)}{\psi \varDelta \theta}\right), $$
(1.14)

where ψ is the suction head at the wetting front.

When the supply of water is limited as it normally occurs during rainfall events, water will pond on the surface only if the rainfall intensity exceeds the infiltration capacity of the soil. The ponding time t p is the elapsed time between the time rainfall begins and the time water begins to pond on the soil surface. During pre-ponding times (t < t p ), the rainfall intensity is less than the potential infiltration rate, and the soil surface is unsaturated. Ponding is initiated when the rainfall intensity exceeds the potential infiltration rate at t = t p and the soil surface reaches saturation. With continued rainfall (t > t p ), the saturated region extends deeper into the soil, and the ponded water is available on the soil surface to contribute to runoff. At incipient ponding conditions, F p = i t p and the infiltration rate equals the rainfall rate (i.e., f = i) so that

$$ {t}_p=\frac{ K\psi \varDelta \theta}{i\left(i-K\right)}. $$
(1.15)

Post-ponding cumulative infiltration is given by

$$ F-{F}_p-\psi \varDelta \theta \ln \left(\frac{\psi \varDelta \theta +F}{\psi \varDelta \theta +{F}_p}\right)=K\left(t-{t}_p\right) $$
(1.16)

and the infiltration rate by (1.13). Pre- and post-ponding infiltration rates under supply-limiting conditions can be computed for the Horton and Philip models (e.g., [52]).
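Because (1.14) is implicit in F, it is usually solved iteratively. The following Python sketch solves (1.14) by fixed-point iteration and evaluates the ponding time of (1.15); the soil parameter values are illustrative (roughly silt-loam magnitudes) rather than taken from the source.

```python
import math

def green_ampt_F(t, K, psi, dtheta, tol=1e-8):
    """Cumulative Green-Ampt infiltration under ponded conditions,
    solving the implicit Eq. (1.14) by fixed-point iteration.
    K in cm/h, psi in cm, dtheta dimensionless, t in h; F in cm."""
    pd = psi * dtheta
    F = max(K * t, 1e-6)                     # initial guess
    for _ in range(100):
        F_new = K * t + pd * math.log(1.0 + F / pd)
        if abs(F_new - F) < tol:
            break
        F = F_new
    return F

def ponding_time(i, K, psi, dtheta):
    """Time to ponding under constant rainfall intensity i > K (Eq. 1.15)."""
    return K * psi * dtheta / (i * (i - K))

# Illustrative values: K = 0.65 cm/h, psi = 16.7 cm, dtheta = 0.34,
# rainfall intensity i = 2 cm/h
K, psi, dtheta, i = 0.65, 16.7, 0.34, 2.0
print(ponding_time(i, K, psi, dtheta))       # ~1.4 h
print(green_ampt_F(1.0, K, psi, dtheta))     # cm after 1 h of ponding
```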

The Natural Resources Conservation Service (NRCS), formerly the Soil Conservation Service (SCS), developed the curve number method, which is widely used in practice because of its simplicity and the availability of empirical information [53, 54]. The method relies on a single parameter called the curve number CN. Following Fig. 1.5, consider the relationship below, posed on intuitive grounds:

$$ \frac{F_a}{S}=\frac{R}{P-{I}_a}. $$
(1.17)

Note that at the beginning of the rainfall event, both F a /S and R/(P − I a ) are zero. As time progresses, both F a /S and R/(P − I a ) approach unity asymptotically. The continuity equation gives

$$ P(t)={I}_a+{F}_a(t)+R(t),\kern2em P(t)>{I}_a. $$
(1.18)

Based on analyses of empirical data from numerous gauged watersheds, the NRCS proposed

$$ {I}_a=0.2S. $$
(1.19)

Combining (1.17)–(1.19) gives

$$ \begin{array}{l}R(t)=\frac{{\left[P(t)-0.2S\right]}^2}{P(t)+0.8S},\kern2em \mathrm{for}\kern2em P(t)\ge {I}_a\\ {}R(t)=0,\kern2em \mathrm{for}\kern2em P(t)<{I}_a\end{array} $$
(1.20a)

with

$$ \begin{array}{l}S=\frac{2,540}{ CN}-25.4\kern1.25em \mathrm{for}\ R,P,S\ \mathrm{in}\ \mathrm{cm}\\ {}S=\frac{1,000}{ CN}-10\kern1.25em \mathrm{for}\ R,P,S\ \mathrm{in}\ \mathrm{in}\mathrm{ches}\end{array} $$
(1.20b)

where R(t) is the runoff (rainfall excess) expressed as a depth that results from precipitation P(t), S is the maximum potential abstraction after runoff begins, I a is the initial abstraction before runoff begins, and CN is the curve number. Note that even though R, P, S, and I a are essentially volumes, they have units of cm or inches because they are expressed as depths over the watershed area. The theoretical justification of the foregoing method has been developed in [55, 56].

The curve number CN depends on soil characteristics, land cover, and antecedent moisture conditions. Information on local soils is available from various sources, including published NRCS county soil surveys. The standard NRCS soil classification system consists of four groups (A, B, C, and D). Group A soils have low runoff potential and high infiltration rates (greater than 0.76 cm/h) and consist primarily of deep, well-drained sands and gravels. Group B soils have moderate infiltration rates (0.38–0.76 cm/h) and consist primarily of moderately fine to moderately coarse textured soils, such as loess and sandy loam. Group C soils have low infiltration rates (0.127–0.38 cm/h) and consist of clay loams, shallow sandy loams, and clays. Group D soils have high runoff potential and very low infiltration rates (less than 0.127 cm/h) and consist primarily of clays with a high swelling potential, soils with a permanently high water table, or shallow soils over nearly impervious material. Rather than estimating S for each watershed, the NRCS recommends working with the dimensionless CN, with 0 < CN ≤ 100. A CN of 100 corresponds to S = 0, implying that all precipitation is converted to runoff. For gauged watersheds, the parameters CN (or S) and I a may be determined by calibration. For ungauged watersheds, CN values may be estimated using tables [57].
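The curve number computation of (1.19)–(1.20b) reduces to a few lines of code, as in the following Python sketch, which works in centimeters; the storm depth and CN value in the example are hypothetical.

```python
def scs_runoff(P, CN):
    """SCS curve-number runoff depth R for storm depth P, both in cm
    (Eqs. 1.19-1.20b)."""
    S = 2540.0 / CN - 25.4     # maximum potential abstraction (cm)
    Ia = 0.2 * S               # initial abstraction (Eq. 1.19)
    if P <= Ia:
        return 0.0
    return (P - Ia)**2 / (P + 0.8 * S)

print(scs_runoff(10.0, 80))    # 10-cm storm, CN = 80  ->  ~5.1 cm of runoff
```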

3.4 Evaporation and Evapotranspiration

While precipitation brings water from the atmosphere down to the earth, evaporation does the opposite: it returns water from the earth back to the atmosphere. Evaporation occurs from all water stores, such as interception and depression storage and surface water bodies such as lakes and reservoirs. Water may also evaporate from the soil, snow, ice, and, in general, from all bodies that store and carry water. A related phenomenon is the water transported by plants from the root zone to the atmosphere, a process called transpiration. In this section, we discuss the fundamental concepts behind evaporation from liquid water bodies, soil, and solid water (ice and snow), as well as several methods for estimating lake evaporation and evapotranspiration from natural and irrigated fields and river basins. The study of evaporation is important in hydrologic and water resources engineering for several reasons. One important reason is in water balance studies of reservoirs and river basins. For example, in designing the capacity of a reservoir for municipal water supply, one must take into account the expected losses of water by evaporation from the planned reservoir. Also, after the dam is built, during the real-time operation of the reservoir to meet the expected water demands, one must consider that a certain amount of water will be lost by evaporation. Another example is the problem of determining the expected water demands of irrigation systems: one must determine how much water will be lost by evaporation from the irrigated field plus the amount of water needed by the plants to grow and to transpire.

Globally, about 62 % of the precipitation that falls on the continents is evapotranspired. About 97 % of this amount is evapotranspiration (ET) from the land surface, while 3 % is open-water evaporation. ET exceeds runoff in most river basins and is a major component of the energy and water vapor exchanges between land surfaces and the atmosphere.

3.4.1 Concept of Evaporation

Evaporation denotes the conversion of water in the liquid or solid phase at or near the earth’s land surface to atmospheric water vapor. In general the term includes evaporation of liquid water from rivers, lakes, oceans, and bare soil. Related terms include evapotranspiration from vegetative surfaces and sublimation from ice and snow surfaces.

Evaporation can be thought of as a diffusion process involving the transfer of water vapor. The transfer is driven by the gradient of water vapor pressure at the liquid–air interface. Following Eagleson [58], consider a water body in which the temperature of the water surface is denoted by T 0 and the air above the water surface is still (no wind) with temperature T and water vapor pressure e. One can assume that just above the water surface there is a thin layer of saturated air with temperature equal to that of the water surface, T 0, and saturated vapor pressure e 0 (the thin layer becomes saturated as a result of the continuous exchange of water molecules due to vaporization and condensation).

Evaporation from the water body will persist as long as there is a gradient of water vapor pressure, i.e., whenever the saturated vapor pressure e 0 (at temperature T 0) is greater than the water vapor pressure e of the air above the thin layer. Therefore, one can write

$$ E=-K\frac{\partial e}{\partial y}, $$
(1.21)

where E = evaporation rate, K = mass transfer coefficient, e = vapor pressure, and y = height. Naturally, there are many other factors besides water vapor pressure that influence evaporation rates. They may be categorized as (a) meteorological factors, (b) the nature of the evaporating surface, and (c) water quality. Meteorological factors include radiation, vapor pressure, humidity, temperature, wind, and pressure (elevation). Short- and long-wave radiation are the main sources of energy necessary for liquid water to become water vapor. Water temperature at the water surface determines the water vapor pressure just above the surface. Likewise, air temperature and air moisture determine the water vapor pressure above the water surface, and both together determine the rate of evaporation, as (1.21) suggests. Wind also has a major effect; it enhances the rate of evaporation through turbulent convection. The nature of the evaporating surface has some effect on evaporation rates as well. For example, all other conditions being the same, the evaporation rate per unit area from water differs from that from ice. One difference is the temperature at the surface of water versus ice and the corresponding saturated water vapor pressures; another is that the net radiation differs for the two surfaces because of differences in albedo and reflectivity. Water quality is also important in determining evaporation rates; an example is the difference in evaporation rates per unit area of clean water versus water with a high concentration of sediments.

3.4.2 Lake Evaporation

Estimating evaporation rates from open-water bodies such as lakes and reservoirs has been an active area of study for water scientists and hydrologic engineers for many decades. Many theories and formulas have been developed for estimating lake evaporation. The various estimation methods can be classified as (a) use of pan coefficients, (b) water budget analysis (mass balance or continuity equation), (c) energy budget analysis (energy balance), (d) aerodynamic method (diffusion or mass transfer), and (e) combination method (Penman method).

3.4.2.1 Estimating Lake Evaporation by Pan Coefficients

Measurements of evaporation in a pan or water tank are quite useful for predicting evaporation rates from surfaces such as water and soil. The standard US National Weather Service Class A pan is a common instrument in the United States. It is 4 ft in diameter and 10 in. deep and is filled with water to a depth of 8 in. The water surface level in the pan is measured with a hook gauge in a stilling well attached to the pan, and measurements are usually made daily. The pan is refilled to the full depth of 8 in. each time a reading is made, and evaporation readings are adjusted for any precipitation measured in a standard rain gauge. Several other types of evaporimeters are currently used in many parts of the world. The pan-coefficient method, one of the simplest available, involves measuring pan evaporation at or near the lake and applying a pan coefficient. The equation is

$$ {E}_L=c{E}_p, $$
(1.22)

where E L denotes lake evaporation, E p is pan evaporation, and c is a pan coefficient. The coefficient c generally varies with the season, the type of pan, and the region. The average of the monthly (or seasonal) pan coefficients is smaller than one and about the same as the coefficient based on annual quantities. For Class A pans, the annual value of c is of the order of 0.70, and this value is commonly applied to calculated pan evaporation to obtain an estimate of lake evaporation. Extensive tables of c are available; see, for instance, Bras [59].

3.4.2.2 Estimating Lake Evaporation by the Water Budget Equation

This is the most obvious approach and involves direct measurement of all water inputs, outputs, and the change of water storage in the lake during the time interval Δt considered. Applying the mass balance (water budget) equation, the water storage in the lake at the end of the time interval Δt is S 2 = S 1 + I + P − E − O − O g , where I = surface inflow into the lake, P = precipitation on the lake surface, E = evaporation from the lake, O g = subsurface seepage, O = surface outflow (lake outflow or releases), and S 1 = lake storage at the beginning of the time interval. Solving for E gives

$$ E=\varDelta S+I+P-O-{O}_g, $$
(1.23)

where ΔS = S 1 − S 2 . This method may give reasonable estimates of lake evaporation as long as the measurements (and estimations) of the variables involved are accurate. This can generally be achieved for the terms ΔS and O. However, the terms I and P may or may not be accurate depending on the particular case at hand. For example, the inflow I should be accurate if a stream gauging station is available upstream and near the lake entrance (it will be less accurate if the gauging station is located far from the dam site), and estimates must be made of the runoff from the ungauged creeks surrounding the lake. Likewise, estimates of P may or may not be accurate depending on the size of the lake and the available network of precipitation gauges. On the other hand, the term O g is generally inaccurate or unknown. Estimates of O g can be obtained by calibrating a loss function using appropriate measurements in the reservoir during certain periods of time, although this may not be practical or possible for large lakes.
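A minimal sketch of the water budget computation of (1.23), assuming all terms have been converted to consistent volume units over the interval Δt; the numerical values are hypothetical.

```python
def lake_evap_water_budget(S1, S2, I, P, O, Og):
    """Lake evaporation by the water budget (Eq. 1.23); all terms in the
    same volume (or equivalent depth) units over the interval Delta t."""
    return (S1 - S2) + I + P - O - Og

# Hypothetical monthly values in 10^6 m^3
print(lake_evap_water_budget(S1=120.0, S2=118.5, I=30.0, P=4.0, O=32.0, Og=1.2))
```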

3.4.2.3 Estimating Lake Evaporation by the Energy Budget Method

This approach involves direct measurement or estimation of all energy inputs, outputs, and the change of energy stored in the lake during the time interval Δt considered. Assuming a unit lake area of 1 cm2 and a time interval of 1 day, the energy budget equation for the lake in cal/(cm2 × day) can be written as

$$ {Q}_{\theta }={Q}_s-{Q}_r-{Q}_{\mathcal{l}}-{Q}_h-{Q}_E+{Q}_{\mathrm{adv}}, $$
(1.24)

where Q s = short-wave radiation input, Q r = reflected short-wave radiation, \( {Q}_{\mathcal{l}} \) = net long-wave radiation output (atmospheric long wave, reflected long wave, and emitted long wave from the lake), Q h = sensible heat loss (heat conduction at the molecular level), Q E = energy used for lake evaporation, Q adv = net advected energy (due to inflow, outflow, precipitation, and seepage), and Q θ = change of energy stored during the time interval considered. One can simplify this equation by assuming that Q h = BQ E , in which B is called the Bowen ratio. B may be determined by

$$ B=0.61\frac{p_a}{1,000}\frac{\left({T}_0-{T}_a\right)}{\left({e}_0-{e}_a\right)}, $$
(1.25)

where p a = air pressure in mb, T 0 = temperature of the water surface in °C, T a = temperature of the air in °C, e 0 = saturated water vapor pressure (in mb) at temperature T 0, and e a = water vapor pressure of the air in mb. Then from (1.24), the energy used for lake evaporation is

$$ {Q}_E=\frac{\left({Q}_s-{Q}_r-{Q}_{\mathcal{l}}\right)+\left({Q}_{\mathrm{adv}}-{Q}_{\theta}\right)}{\left(1+B\right)} $$

Because Q E = ρL v E, where ρ = density of water in g/cm3, L v = latent heat of vaporization in cal/g (it can be determined accurately by L v = 597.3 − 0.564 T for T ≤ 40 °C), and E is the evaporation rate in cm/day (for a 1 cm2 area of the lake), the foregoing equation can be written as

$$ E=\frac{Q_n+\left({Q}_{\mathrm{adv}}-{Q}_{\theta}\right)}{\rho {L}_v\left(1+B\right)}, $$
(1.26)

where \( {Q}_n={Q}_s-{Q}_r-{Q}_{\mathcal{l}} \) = net radiation in cal/(cm2×day). This term may be estimated from measurements in a pan (tank) as

$$ {Q}_n={Q}_{np}+\varepsilon \sigma \left({T}_{0p}^4-{T}_0^4\right), $$
(1.27)

where Q np = pan net all-wave radiation in cal/(cm2 × day) (measured using a net pyrradiometer), T 0p = water surface temperature in the pan in °K, T 0 = water surface temperature of the lake in °K, σ = 11.71 × 10−8 cal/(cm2 × °K4 × day) (the Stefan–Boltzmann constant), and ε = water surface emissivity ≈ 0.97. Lastly, the term Q adv − Q θ can be determined by accounting for the amount of energy contained in each term of the water budget equation (e.g., [60]).
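The Bowen-ratio and energy budget relations (1.25) and (1.26) can be combined as in the following Python sketch; the meteorological inputs are hypothetical, and the advected-minus-stored term is set to zero for simplicity.

```python
def bowen_ratio(pa, T0, Ta, e0, ea):
    """Bowen ratio (Eq. 1.25); pa in mb, T in deg C, vapor pressures in mb."""
    return 0.61 * (pa / 1000.0) * (T0 - Ta) / (e0 - ea)

def energy_budget_evap(Qn, Qadv_minus_Qtheta, B, T0):
    """Lake evaporation in cm/day from the energy budget (Eq. 1.26);
    Qn and the advected-minus-stored term in cal/(cm^2 day)."""
    rho = 1.0                        # water density, g/cm^3
    Lv = 597.3 - 0.564 * T0          # latent heat, cal/g (T0 <= 40 C)
    return (Qn + Qadv_minus_Qtheta) / (rho * Lv * (1.0 + B))

B = bowen_ratio(pa=900.0, T0=18.0, Ta=15.0, e0=20.6, ea=12.8)
print(energy_budget_evap(Qn=400.0, Qadv_minus_Qtheta=0.0, B=B, T0=18.0))
```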

In addition, pan evaporation may be estimated from (1.26) if the pertinent quantities are either measured or estimated for a pan. For instance, neglecting the term Q adv − Q θ , an estimate of pan evaporation may be obtained from

$$ {E}_p={Q}_{np}^{\prime }={Q}_{np}/\left[\rho {L}_v\left(1+B\right)\right], $$
(1.28)

where Q np is the net radiation for the pan in cal/(cm2 × day) and the symbol Q′ np is used to emphasize that the net radiation is expressed in equivalent units of evaporation. Furthermore, formulas for estimating pan evaporation as a function of daily solar radiation and air temperature are available. For example, a formula for estimating pan evaporation developed from Class A pan data is (e.g., [60])

$$ {E}_p={Q}_{np}^{\prime }=7.14\times {10}^{-3}{Q}_s+5.26\times {10}^{-6}{Q}_s{\left({T}_a+17.8\right)}^{1.87}+3.94\times {10}^{-6}{Q}_s^2-2.39\times {10}^{-9}{Q}_s^2{\left({T}_a-7.2\right)}^2-1.02, $$
(1.29)

where E p = Q′ np = pan evaporation in mm per day, Q s = solar radiation in cal/(cm2 × day), and T a = air temperature in °C. Then, lake evaporation may be determined approximately by E L = c E p , where c is a pan coefficient; a value of c = 0.7 is often used for a Class A pan.
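The empirical relation (1.29) and the pan coefficient of Sect. 3.4.2.1 give a simple route from solar radiation and air temperature to a lake evaporation estimate, as in the sketch below; the input values are hypothetical.

```python
def pan_evap_from_radiation(Qs, Ta):
    """Class A pan evaporation in mm/day from solar radiation and air
    temperature (Eq. 1.29); Qs in cal/(cm^2 day), Ta in deg C."""
    return (7.14e-3 * Qs
            + 5.26e-6 * Qs * (Ta + 17.8)**1.87
            + 3.94e-6 * Qs**2
            - 2.39e-9 * Qs**2 * (Ta - 7.2)**2
            - 1.02)

Ep = pan_evap_from_radiation(Qs=550.0, Ta=22.0)   # hypothetical summer day
print(Ep, 0.7 * Ep)       # pan evaporation and E_L = 0.7 * Ep (mm/day)
```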

3.4.2.4 Estimating Lake Evaporation by the Mass Transfer Method (Aerodynamic or Diffusion)

From turbulent convection theory, the vertical flux of water vapor can be written as

$$ E=-\rho {K}_w\frac{\partial {\overline{q}}_h}{\partial y}, $$
(1.30)

where E = lake evaporation rate in g/(cm2 × s) (a flux), ρ = air density (g/cm3), K w = eddy diffusivity of water vapor (cm2/s), \( {\overline{q}}_h \) = mean specific humidity, and y = elevation above the lake water surface (cm). Likewise, from the equation of momentum flux, one can write

$$ \tau =\rho {K}_{m\;}\frac{\partial \overline{u}}{\partial y}, $$
(1.31)

where τ = momentum flux or shear stress in g/(cm × s2), K m = kinematic eddy viscosity (cm2/s), and \( \overline{u} \) = mean wind velocity in the horizontal direction (cm/s). The sketch shown in Fig. 1.6 summarizes the foregoing concepts.

Fig. 1.6

Variation of wind velocity and specific humidity with height above the lake surface

Based on the foregoing concepts and equations, it may be shown that an expression for E can be written as (e.g., [61])

$$ E=d\;\overline{u}\;\left({e}_0-{e}_a\right) $$
(1.32a)

which says that the evaporation rate is a function of both the mean wind speed \( \overline{u} \) and the vapor pressure difference, where d is a constant. But (1.32a) gives zero evaporation if \( \overline{u}=0 \), which is not realistic. Therefore, a modified equation can be written as (e.g., [59])

$$ E=\left(a+b\overline{u}\right)\left({e}_0-{e}_a\right), $$
(1.32b)

where a and b are coefficients. Recall that e 0 is the saturated vapor pressure at the lake surface temperature T 0 and e a is the vapor pressure of the air above the lake. However, in formulas of this type, the saturated vapor pressure e s at the air temperature T a is often used instead of e 0 .

Several empirical formulas of the type of (1.32a) and (1.32b) have been developed (e.g., [62–65]). For example, Dunne’s formula is

$$ E=\left(0.013+0.00016{\overline{u}}_2\right)\left(1-f\right){e}_a, $$
(1.33)

where E = lake evaporation in cm/day, \( {\overline{u}}_2 \) = wind speed at 2 m above the lake water surface in km/day, f = relative humidity of the air above the lake surface (fraction), and e a = vapor pressure of the air above the lake surface in mb.

Furthermore, mass transfer-based formulas have been developed for estimating pan evaporation. For example, an empirical equation for a Class A pan evaporation is [46]

$$ {E}_p=\left(0.42+0.0029{\overline{u}}_p\right){\left({e}_s-{e}_a\right)}^{0.88}, $$
(1.34)

where E p = pan evaporation (mm/day), \( {\overline{u}}_p \) = wind speed (km/day) at 15 cm above the pan rim, e s = saturated vapor pressure at air temperature 1.5 m above the ground surface (mb), and e a = water vapor pressure of the air at 1.5 m above the ground surface (mb). Then, lake evaporation may be estimated as E L = 0.7 E p .
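The mass transfer relation (1.34) is equally direct to apply, as sketched below in Python with hypothetical wind and vapor pressure inputs.

```python
def pan_evap_mass_transfer(u_p, e_s, e_a):
    """Class A pan evaporation in mm/day by mass transfer (Eq. 1.34);
    u_p = wind run in km/day at 15 cm above the pan rim,
    e_s, e_a = vapor pressures in mb at 1.5 m above the ground."""
    return (0.42 + 0.0029 * u_p) * (e_s - e_a)**0.88

Ep = pan_evap_mass_transfer(u_p=150.0, e_s=23.4, e_a=15.0)  # hypothetical inputs
print(Ep, 0.7 * Ep)       # pan evaporation and E_L = 0.7 * Ep (mm/day)
```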

3.4.2.5 Estimating Lake Evaporation by Penman’s Equation (Combination Method)

Penman [66] combined the energy budget and mass transfer methods for estimating lake evaporation, essentially combining (1.26) and (1.32) under certain assumptions. First, the term Q adv − Q θ in (1.26) is neglected or assumed to be negligible; then, in estimating the net all-wave radiation Q n , the temperature of the lake water surface T 0 is replaced by the air temperature above the lake, T a ; and in estimating the effect of turbulent convection as in (1.32), e 0 is replaced by e s . Under these conditions, Penman’s equation is applicable to shallow lakes. Penman showed that the equation for estimating lake evaporation takes the form

$$ E=\frac{\varDelta {Q}_n^{\prime }+\gamma {E}_a}{\varDelta +\gamma }, $$
(1.35)

where E = lake evaporation in inches/day or cm/day; Q′ n = net all-wave radiation expressed in the same units as E, i.e., in./day or cm/day (if Q n is known, then Q′ n = Q n /(ρL v )); and E a = the evaporation term estimated by the mass transfer method, e.g., from (1.32b). The coefficient γ (mb/°C) is determined by γ = 0.00061 p a , in which p a is the air pressure in mb. Likewise, the coefficient Δ is the derivative of e s with respect to T evaluated at T a . An approximate equation for estimating Δ in units of mb/°C is [60]

$$ \varDelta =\frac{d{e}_s}{ dT}={\left(0.00815{T}_a+0.8912\right)}^7,\kern2em {T}_a\ge -25\;{}^{\circ}\mathrm{C}. $$
(1.36)

Therefore, (1.35) gives an estimate of daily lake evaporation if adequate climatological data are available for a shallow lake. However, to apply (1.35) to deep lakes, where significant energy transfer from deep layers of the lake to the evaporating surface may occur in addition to advected energy, one must adjust the estimate provided by (1.35). Such an adjustment must include an estimate of the term Q adv − Q θ plus an adjustment factor α. Recall that (1.35) assumes that the energy advected into the lake is balanced by the change of heat storage, i.e., Q adv − Q θ ≈ 0. This assumption may be reasonable for shallow lakes, but for deep lakes, further corrections may be necessary. For example, lake evaporation may be adjusted as

$$ {E}_L^{\prime }={E}_L+\alpha \left({Q}_{\mathrm{adv}}-{Q}_{\theta}\right), $$
(1.37)

in which E ′ L = adjusted lake evaporation, E L is the lake evaporation from (1.35), and α is an adjustment coefficient that can be estimated by (e.g., [60])

$$ \alpha ={\left[1+\frac{0.00066{p}_a+{\left({T}_0+273\right)}^3\times {10}^{-8}\times {\left(0.177+0.00143{v}_4\right)}^{-1}}{{\left(0.00815{T}_0+0.8912\right)}^7}\right]}^{-1}, $$
(1.38)

where p a = atmospheric pressure in mb, T 0 = water temperature of the lake surface in °C, and v 4 = 4 m wind speed (upwind from the lake) in km/day.
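For readers who prefer code, the combination method and the deep-lake adjustment transcribe directly from (1.35) to (1.38). The sketch below assumes that Q ′ n , E a , and Q adv − Q θ have already been expressed in consistent evaporation units (cm/day); it illustrates the equations and is not an operational tool.

```python
def slope_sat_vapor_curve(Ta):
    """Delta from (1.36), in mb/degC; valid for Ta >= -25 degC."""
    return (0.00815 * Ta + 0.8912) ** 7


def penman_lake_evaporation(Qn_prime, Ea, Ta, pa):
    """Shallow-lake evaporation (cm/day) by Penman's equation (1.35).

    Qn_prime -- net all-wave radiation in evaporation units (cm/day)
    Ea       -- mass-transfer term, e.g., from (1.32b) (cm/day)
    Ta       -- air temperature (degC); pa -- air pressure (mb)
    """
    delta = slope_sat_vapor_curve(Ta)
    gamma = 0.00061 * pa  # psychrometric coefficient (mb/degC)
    return (delta * Qn_prime + gamma * Ea) / (delta + gamma)


def adjusted_deep_lake_evaporation(EL, Qadv_minus_Qtheta, pa, T0, v4):
    """Deep-lake correction (1.37) with alpha from (1.38).

    Qadv_minus_Qtheta must be in the same evaporation units as EL;
    T0 is the lake-surface temperature (degC), v4 the 4-m wind (km/day).
    """
    alpha = 1.0 / (1.0 + (0.00066 * pa
                          + (T0 + 273.0) ** 3 * 1e-8 / (0.177 + 0.00143 * v4))
                   / (0.00815 * T0 + 0.8912) ** 7)
    return EL + alpha * Qadv_minus_Qtheta
```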

Penman’s equation can also be used to estimate pan evaporation if the variables on the right-hand side of (1.35) correspond to a pan. Thus, if Q ′ n in (1.35) is obtained for a pan (for instance, using (1.28) or (1.29)) and E a is given by (1.34), for example, then E of (1.35) is an estimate of pan evaporation. Therefore, Penman’s equation for estimating pan evaporation can be written as

$$ {E}_p=\frac{\varDelta {Q}_{np}^{\prime }+\gamma {E}_{ap}}{\varDelta +\gamma }, $$
(1.39)

where the term E a in Penman’s equation (1.35) has been replaced by E ap to emphasize that it refers to an estimate for a pan. Then lake evaporation can be obtained by E L = cE p in which c is the pan coefficient.

An equation that corrects for the sensible heat transfer through the pan is available. Assuming that pan evaporation E p is estimated using Penman’s equation (1.39), the corrected equation of pan evaporation (mm/day) becomes [60]

$$ {E}_p^{\prime }={E}_p\pm 0.00064{p}_a{\alpha}_p\left(0.37+0.00255{\overline{u}}_p\right){\left|{T}_0-{T}_a\right|}^{0.88}, $$
(1.40)

where p a = air pressure in mb, α p = correction factor, \( {\overline{u}}_p \) = wind speed in km/day at 150 mm above the pan rim, T 0 = temperature of the water surface at the pan in °C, and T a = air temperature in °C; the + sign after E p applies for T 0 > T a and the − sign otherwise. The correction factor α p can be approximated by [60]

$$ {\alpha}_p=0.34+0.0117{T}_0-3.5\times {10}^{-7}{\left({T}_0+17.8\right)}^3+0.0135{\overline{u}}_p^{\;0.36}. $$
(1.41)

No additional correction for advected energy is necessary because it is generally small for a pan. Then lake evaporation can be determined by multiplying the corrected pan evaporation E ′ p by an appropriate pan coefficient.

3.4.3 Transpiration, Evapotranspiration, and Consumptive Use

Transpiration is the water vapor discharged to the atmosphere by plants through their stomatal pores. The factors affecting transpiration are (a) meteorological (e.g., radiation, temperature, and wind), (b) type of plant (e.g., shallow roots, long roots, and leaves) and stage of growth, (c) type of soil, and (d) available water. The role of meteorological factors is similar to that for evaporation from a free water surface discussed in Sects. 3.4.1 and 3.4.2. In fact, Penman’s equation has been modified [67] to determine the evaporation rate from vegetation surfaces, and the resulting modified equation is known as the Penman–Monteith equation.

Evapotranspiration is a widely used term in hydrology and irrigation engineering. Evapotranspiration is the amount of water evapotranspired by the soil and the plants of a given area of land. A related term, consumptive use, accounts for evapotranspiration plus the amount of water utilized by plants for growing plant tissue. When the area considered is a watershed or a river basin, evapotranspiration includes water evaporated from lakes and in general from all other sources. In addition, two other terms related to evapotranspiration are potential evapotranspiration (PET) and actual evapotranspiration (AET). Potential evapotranspiration is the expected evapotranspiration rate for the expected (normal) climatic conditions in the area, with ample water available in the soil and complete (dense) vegetation coverage (i.e., potential evapotranspiration is a maximum evapotranspiration rate under unlimited water availability). On the other hand, actual evapotranspiration is less than or equal to potential evapotranspiration because it depends on the water available in the soil. Also the term reference-crop evapotranspiration has been used as being equivalent to PET [66]. Furthermore, additional concepts related to PET such as Bouchet’s complementary relationship and its advection-aridity interpretation have been suggested (e.g., [68–70]).

Soil moisture tension (refer to Sect. 4), which varies with soil moisture, plays an important role in the evaporation rate of the water that reaches the plant stomata. Figure 1.7 shows a schematic of the typical relationship between soil moisture tension and content, the wilting point (soil moisture level below which plants cannot extract water from the soil), the field capacity (soil moisture level above which water may percolate down below the root zone and eventually reach the aquifer), and the available water (amount of water that is available to the plant). Also Fig. 1.8 shows a schematic of an assumed relationship between potential and actual evapotranspiration as a function of soil moisture.

Fig. 1.7
figure 7

Basic relationship between soil moisture tension ψ and soil moisture content θ. The plot also shows the wilting point θ WP , the field capacity θ FC , and the available water AW = θ FC − θ WP

Fig. 1.8
figure 8

Relationship between actual evapotranspiration (AET) and soil moisture θ. It assumes that AET is equal to potential evapotranspiration PET for θ ≥ θ FC

We illustrate some of the foregoing concepts with a simple example. Assume that at time t = 0, the soil moisture at a farm lot is at field capacity. If the expected precipitation rate is 12 mm/week and the consumptive use is 30 mm/week, how often and how much must one irrigate? Without any further information, a simple approach is to irrigate at a rate that makes up for the deficit and keeps the soil moisture at the field capacity level. In this case, the irrigation rate would be 30 − 12 = 18 mm/week. However, one may want to take into account the available water of the soil, the depth of the root zone, and the operating irrigation policy. For instance, suppose that the field capacity is 30 % (in volume), the wilting point is 10 %, and the root zone depth is 300 mm. Then the corresponding amounts of water for the 300-mm root zone are θ FC = 0.30 × 300 = 90 mm (field capacity) and θ WP = 0.10 × 300 = 30 mm (wilting point), and the available water is AW = 90 − 30 = 60 mm. In addition, suppose the operating policy is to avoid drying the soil below 50 mm. Since at the beginning of the first week the soil moisture is at field capacity, i.e., θ 0 = 90 mm, after one week the soil moisture drops to θ 1 = 90 + 12 − 30 = 72 mm > 50 mm. Similarly, at the end of the second week (without irrigation), the soil moisture becomes θ 2 = 72 + 12 − 30 = 54 mm, which is close to the 50-mm limit; a third week without irrigation would bring it to 36 mm, below the limit. Then, one may irrigate 90 − 54 = 36 mm every 2 weeks to comply with the operating policy.
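A minimal bookkeeping script for this example (weekly time step, all quantities in mm) confirms that, without irrigation, the 50-mm policy limit would be violated during the third week:

```python
theta_fc = 0.30 * 300   # field capacity of the 300-mm root zone: 90 mm
theta_wp = 0.10 * 300   # wilting point: 30 mm
policy_limit = 50.0     # operating policy: do not dry the soil below 50 mm
P, CU = 12.0, 30.0      # expected precipitation and consumptive use, mm/week

theta = theta_fc        # start at field capacity
for week in (1, 2, 3):
    theta += P - CU     # weekly balance with no irrigation
    status = "below policy limit" if theta < policy_limit else "ok"
    print(f"week {week}: theta = {theta:.0f} mm ({status})")
# -> 72, 54, 36 mm: irrigate 90 - 54 = 36 mm every 2 weeks to stay above 50 mm
```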

3.4.3.1 Methods for Estimating Consumptive Use

There are several methods available for estimating consumptive use as a function of climatic factors such as temperature and radiation and the type of vegetation (e.g., [71–73]). For example, the Blaney-Criddle empirical equation [74] gives the consumptive use as a function of temperature, the percentage of daytime hours, and a crop use coefficient. The consumptive use for a given month t can be determined by

$$ {u}_t=\left(1/100\right){K}_t{T}_t{D}_t, $$
(1.42)

where u t = consumptive use for month t (inches), K t = crop use coefficient for month t, T t = mean monthly air temperature (°F), and D t = percentage of the annual daytime hours occurring during month t (it varies with the latitude, the month, and the hemisphere). Table 1.1 provides values of D t as a function of latitude and month of the year. Then, the total consumptive use U throughout the irrigation season is

Table 1.1 Monthly percentage of daytime hours D t (relative to the year) for various latitudes of the north and south hemispheres (adapted from ref. 75)
$$ U={\displaystyle \sum_{t=1}^N{u}_t}={\displaystyle \sum_{t=1}^N{K}_t\frac{T_t{D}_t}{100}}, $$
(1.43)

where N = number of months of the irrigation season. Crop coefficients are more commonly available for the whole irrigation season than for individual months. In that case, (1.43) can be written as

$$ U={K}_s{\displaystyle \sum_{t=1}^N\frac{T_t{D}_t}{100}}, $$
(1.44)

where K s = crop use coefficient for the irrigation season. Table 1.2 gives values of K s for various crops.

Table 1.2 Values of K s for various crops (data taken from “Irrigation Water Requirements”, Tech. Release No. 21, USDA Soil Conservation Service, 1970)

We apply the Blaney-Criddle method to determine the total consumptive use and the net water required for an irrigation area located in eastern Colorado (approximate latitude 40°N). A 100-acre irrigation area is planned, where 20 % of the land will grow beans and 80 % potatoes. Consider the crop coefficients 0.65 and 0.70 for beans and potatoes, respectively, and the growing seasons June–August and May–September, respectively. The average monthly precipitation and temperature data for the referred months are shown in columns 2 and 3 of Table 1.3. We apply (1.44) for each crop, and then the total consumptive use for the 100-acre area in units of acre-ft can be determined by

Table 1.3 Computation of consumptive use for the example described in the text above
$$ U=\left(1/12\right)\times 100\times \left[0.2{K}_s(b){\displaystyle \sum_{t=1}^{N(b)}\frac{T_t{D}_t}{100}}+0.8{K}_s(p){\displaystyle \sum_{t=1}^{N(p)}\frac{T_t{D}_t}{100}}\right], $$

where N(b) and N(p) are the number of months of the growing season for beans and potatoes, respectively, and K s (b) and K s (p) refer to the corresponding crop coefficients. The computations are carried out in Table 1.3. From the foregoing equation and the results of the table, we get: consumptive use for the total area = (1/12) × 100 × (0.2 × 13.89 + 0.8 × 22.82) = 175.3 acre-ft.
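The seasonal Blaney-Criddle computation of (1.44) is easy to script. In the sketch below, the monthly temperatures and daytime-hour percentages are placeholders (the actual values come from Tables 1.1 and 1.3), so only the structure of the computation, not the output numbers, should be taken from it.

```python
def blaney_criddle_seasonal(Ks, T_F, D_pct):
    """Seasonal consumptive use (inches) from (1.44): U = Ks * sum(T*D/100)."""
    return Ks * sum(T * D / 100.0 for T, D in zip(T_F, D_pct))


# Placeholder monthly data (degF and % of annual daytime hours), hypothetical:
T_beans, D_beans = [67.0, 73.0, 71.0], [10.1, 10.2, 9.5]                    # Jun-Aug
T_pot, D_pot = [58.0, 67.0, 73.0, 71.0, 63.0], [9.8, 10.1, 10.2, 9.5, 8.4]  # May-Sep

f_b = blaney_criddle_seasonal(0.65, T_beans, D_beans)   # cf. 13.89 in Table 1.3
f_p = blaney_criddle_seasonal(0.70, T_pot, D_pot)       # cf. 22.82 in Table 1.3
U_acre_ft = (1 / 12) * 100 * (0.2 * f_b + 0.8 * f_p)    # cf. 175.3 acre-ft
print(f"U = {U_acre_ft:.1f} acre-ft")
```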

The Blaney-Criddle method has been quite popular in practice for several decades and is still used because of its simplicity. However, Penman [66] and Monteith [67] laid the foundation for an energy-based method, which has become known in the literature as the Penman–Monteith method. Perhaps the most recent manual on the subject published in the United States has been prepared by a task committee sponsored by the Environmental and Water Resources Institute (EWRI) of the American Society of Civil Engineers (ASCE) [72]. The committee recommended a “standardized reference evapotranspiration equation”, denoted ET SZ , applicable to a reference ET for a short crop (e.g., clipped grass with about 0.12-m height) and a tall crop (e.g., full-cover alfalfa of about 0.50-m height). Thus, the recommended ASCE Penman–Monteith equation for daily short reference ET (mm/day) is [72]

$$ E{T}_{SZ}=\frac{0.408\varDelta \left({R}_n-G\right)+900\gamma {u}_2\left({e}_s-{e}_a\right){\left(T+273\right)}^{-1}}{\varDelta +\gamma \left(1+0.34{u}_2\right)}, $$
(1.45)

where Δ = slope of the saturation vapor pressure-temperature curve (kPa/°C), R n = net radiation at the crop surface (MJ/(m2 × day)), G = soil heat flux density at the soil surface (MJ/(m2 × day)), γ = psychrometric constant (kPa/°C), u 2 = mean daily wind speed at 2-m height (m/s), e s = saturation vapor pressure (kPa) at 1.5–2.5-m height calculated as the average of the e s obtained for maximum and minimum air temperature, e a = mean actual vapor pressure at 1.5–2.5-m height (kPa), and T = mean daily air temperature at 1.5–2.5-m height (°C). Note that the constants 900 (numerator) and 0.34 (denominator) must be changed to 1,600 and 0.38, respectively, for daily tall reference ET, and different sets of constants must be used for calculations of hourly ET. The referred manual [72] provides the needed equations for calculating the various terms in (1.45) along with explanatory appendices and examples. Equation (1.45) has been tested for 49 sites throughout the United States and found to be reliable.
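A direct transcription of (1.45) into code is shown below, with the constant pairs for short and tall reference surfaces; Δ, γ, e s , and e a are assumed to be precomputed following the manual [72], and the function name is our own.

```python
def asce_standardized_ref_et(delta, gamma, Rn, G, u2, es, ea, T, surface="short"):
    """Daily standardized reference ET (mm/day) from (1.45).

    delta  -- slope of the saturation vapor pressure curve (kPa/degC)
    gamma  -- psychrometric constant (kPa/degC)
    Rn, G  -- net radiation and soil heat flux (MJ m-2 day-1)
    u2     -- mean daily wind speed at 2 m (m/s)
    es, ea -- saturation and actual vapor pressure (kPa)
    T      -- mean daily air temperature (degC)
    """
    Cn, Cd = (900.0, 0.34) if surface == "short" else (1600.0, 0.38)
    numerator = 0.408 * delta * (Rn - G) + gamma * Cn * u2 * (es - ea) / (T + 273.0)
    return numerator / (delta + gamma * (1.0 + Cd * u2))


# Crop ET (consumptive use) then follows as ETc = Kc * ETsz, with Kc from,
# e.g., Table 1.4.
```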

Furthermore, the estimation of the crop evapotranspiration ET c (i.e., consumptive use) requires an appropriate crop coefficient K c , i.e., ET c = K c ET SZ , where K c varies depending on whether ET SZ is for short reference or long reference ET. Allen et al. [72] provide the key references available in the literature for determining appropriate values of K c . The crop coefficient varies with crop development stage and ranges from 0 to 1 (e.g., about 0.2 for young seedlings to 1.0 for crops at peak growing stage, although in some cases K c may reach values greater than 1, depending on the crop and weather conditions). For illustration, Table 1.4 gives values of K c for commonly grown crops in the State of Colorado [76].

Table 1.4 Crop coefficients K c for commonly grown crops for use with long reference ET (summarized from 76)

We include a hypothetical example, similar to the previous one, but with additional concepts and details. For this purpose, we consider a farm lot with a soil having field capacity equal to 30 % (in volume) and wilting point 10 %. The root zone depth is equal to 30 cm, the crop evapotranspiration ET c (consumptive use) has been calculated as 30 mm/week, and the expected precipitation is 10 mm/week (for the sake of simplicity, we use weeks as the time frame; the calculations would be more realistic using a daily time step). It is also assumed that the actual evapotranspiration AET is equal to ET c whenever the soil moisture is at field capacity or greater; otherwise the actual evapotranspiration decreases linearly from ET c to zero at the wilting point (refer to Fig. 1.8). In addition, it is assumed that it has been raining for several days, so that at the time of planting the soil moisture is at field capacity. However, after planting, supplementary irrigation may be needed to make up for the soil moisture deficit. Irrigation guidelines for the crop at the referred site suggest keeping the soil moisture from falling below 60 % of field capacity at any given time. The question is how much and how often to irrigate to sustain crop growth at the farm.

Summarizing, the following data are specified: θ FC = 30 %, θ WP = 10 %, y = 30 cm, ET c = 30 mm/week, θ 0 = θ FC , and θ t > 0.60 θ FC for any t. Then, for a soil depth of 300 mm, the field capacity (in mm) is equal to θ FC = 0.30 × 300 = 90 mm, the wilting point (in mm) is θ WP = 0.1 × 300 = 30 mm, the available water AW = 90 − 30 = 60 mm, and θ t > 0.6 × 90 = 54 mm. In addition, from Fig. 1.8, the actual evapotranspiration can be determined as

$$ AE{T}_t=\begin{cases}\dfrac{E{T}_c}{\theta_{FC}-{\theta}_{WP}}\left({\theta}_t-{\theta}_{WP}\right), & {\theta}_{WP}\le {\theta}_t<{\theta}_{FC}\\[1ex] E{T}_c, & {\theta}_t\ge {\theta}_{FC}.\end{cases} $$
(1.46)

Since the initial soil moisture is θ 0 = θ FC = 90 mm, the actual evapotranspiration can be assumed to be equal to ET c for the first week, i.e., AET 1 = ET c = 30 mm, and assuming no irrigation in the first week, by the end of the week, the soil moisture depth becomes θ 1 = θ 0 + P − AET 1 = 90 + 10 − 30 = 70 mm, which is greater than 54 mm (the lower soil moisture limit specified). Then, from (1.46), the actual evapotranspiration for the second week becomes

$$ AE{T}_2=\frac{E{T}_c}{\theta_{FC}-{\theta}_{WP}}\left({\theta}_1-{\theta}_{WP}\right)=\frac{30}{90-30}\left(70-30\right)=20\;\mathrm{mm} $$

Likewise, assuming no irrigation for the second week, the soil moisture by the end of the second week is θ 2 = θ 1 + P − AET 2 = 70 + 10 − 20 = 60 mm > 54 mm, and the actual evapotranspiration for the third week is AET 3 = 15 mm. Further, if we continue the third week with no irrigation, the soil moisture by the end of the third week becomes θ 3 = θ 2 + P − AET 3 = 60 + 10 − 15 = 55 mm > 54 mm. If we go another week without irrigation, it may be shown that the soil moisture will fall below the 54-mm threshold; thus, we would need to irrigate in the following weeks. Table 1.5 gives the results of calculations following the procedure shown above, except that supplementary irrigation of 10 mm is applied every other week. Note that in the hypothetical example, we simply used the expected precipitation rate of 10 mm/week. In an actual situation, the weekly precipitation can be updated from measurements. The values in Table 1.5 suggest that irrigating more than 10 mm every other week will result in higher soil moisture levels and consequently higher values of AET. In fact, over-irrigating may cause the soil moisture to exceed the field capacity θ FC , so that not only will AET be at the maximum rate, but the excess water will also percolate down to lower levels (in that case, the soil moisture balance must include the deep percolation term DP as shown in Fig. 1.9) and the irrigation water will not be used efficiently.
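The week-by-week procedure behind Table 1.5 can be reproduced with a few lines of code. The sketch below implements (1.46) with 10 mm of supplementary irrigation every other week and caps the soil moisture at field capacity (any excess is treated as deep percolation DP):

```python
def aet(theta, ETc=30.0, fc=90.0, wp=30.0):
    """Actual evapotranspiration (mm/week) from (1.46)."""
    return ETc if theta >= fc else ETc * (theta - wp) / (fc - wp)


theta, P = 90.0, 10.0                   # initial moisture (mm); rainfall (mm/week)
for week in range(1, 9):
    irrigation = 10.0 if week % 2 == 0 else 0.0     # 10 mm every other week
    et = aet(theta)                                 # AET from start-of-week moisture
    theta = min(theta + P + irrigation - et, 90.0)  # excess above 90 mm percolates (DP)
    print(f"week {week}: AET = {et:4.1f} mm, theta = {theta:4.1f} mm")
```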

Table 1.5 Calculations of actual evapotranspiration and soil moisture for the hypothetical example
Fig. 1.9
figure 9

Schematic depicting key variables influencing the soil moisture of the root zone

3.5 Runoff

Runoff denotes the excess water from precipitation that is available on the land surface after abstractions. Estimates of surface flows over a watershed are used for designing hydraulic structures such as bridges, spillways, dams, levees, storm sewers, culverts, and detention basins. They are also useful for flood management programs, especially for delineating flood plains. The influence of land use changes over time can also be represented in mathematical models of watershed runoff. Runoff estimates are also a prerequisite for computing the transport of sediment and associated chemicals. There are numerous mechanisms of runoff generation that often occur in combination to produce streamflow.

Hortonian overland flow occurs from a catchment when rainfall exceeds the ability of the soil to infiltrate water and the excess water moves on the surface downslope. For Hortonian overland flow to occur, the rainfall intensity must be greater than the saturated hydraulic conductivity of the soil, and the duration of the rainfall event must be long enough to achieve surface saturation. Hortonian overland flow is prominent in (1) semiarid to arid regions where rainfalls tend to be intense and natural surface conductivities are low, (2) areas where surface conductivity has been reduced by compaction, and (3) impermeable areas. Horton’s [47] view was modified by Betson [77], who proposed the partial-area concept according to which surface water may originate as Hortonian overland flow on a limited contributing area that varies from basin to basin.

In many forested catchments, rainfall intensity rarely exceeds the saturated conductivity of the surface soil, and saturation-excess overland flow develops because rain falls on temporarily or permanently saturated areas (wetlands) with no storage for water to infiltrate. Saturation overland flow is liable to be a dominant runoff mechanism in drainage basins with concave hillslope profiles and wide, flat valleys. When slowly moving subsurface water encounters saturated areas near the stream, some of the water reemerges onto the ground surface because the capacity of the soils to transmit all of the incoming water downslope is insufficient.

The areas of a catchment that are prone to saturation tend to be near the stream channels or where groundwater discharges to the surface. These areas grow in size during a storm and shrink during extended dry periods [65, 78, 79]. The areas on which saturation-excess overland flow develops expand and shrink during a storm in response to rainfall reflecting the overall wetness of the watershed. This mechanism of runoff generation is often referred to as the variable source area concept and was modeled by Govindaraju and Kavvas [80]. Variable source areas exert a very strong influence on the nature of the streamflow hydrograph for a storm event.

Most hydrologic models treat the overland flow process as fully turbulent, broad sheet flow, which may be satisfactory for computing runoff rates. However, overland flow can occur over large parts of the landscape, and the depths and velocities of flow can be extremely variable. The microrelief of most natural soil surfaces is highly variable, resulting in nonuniform flow depths across the surface. Flow concentrations are sometimes called rills, and the areas between rills are called interrill areas. The degree to which flow concentrates on a surface depends on soil cover, tillage, natural roughness, and soil erodibility.

Further, the interaction of overland flow with infiltration is strongly modulated by spatial heterogeneity of soil hydraulic properties. Specifically, water running downstream may infiltrate into regions characterized by moisture deficit, leading to the runon process, which in principle should be represented through a coupled solution of overland flow and infiltration equations. The assumption of spatially uniform infiltration and rainfall excess is an important limitation in most current modeling approaches of surface flows over natural surfaces. Runon has been incorporated in analyses of infiltration and Hortonian surface runoff (e.g., [81–85]). Morbidelli et al. [85] presented a simple methodology that allows for an explicit representation of the runon process at the watershed scale.

Water that has infiltrated the soil surface and is impeded from downward movement by stratification (abrupt or gradual) tends to move laterally in shallow soil horizons as subsurface storm flow or interflow. While this subsurface water generally moves slowly to the stream and contributes to baseflow, it may be energized by the presence of preferential flow pathways (e.g., soil cracks, old animal burrows, decayed root channels), leading to quickflow response in the stream. The response of interflow to rainfall events would be sluggish were it not aided by the presence of macropores, where Darcian flow through the soil matrix is largely short-circuited by water moving in conduits. Macropores are typically on the order of 3–100 mm in diameter and are interconnected to varying degrees; thus, they can allow water to bypass the soil matrix and move rapidly at speeds much greater than those predicted by Darcy’s law. Stillman et al. [86] showed that the effective conductivity of soils increased by several orders of magnitude in the presence of macropores. The combination of macropores and tile drains was shown to generate Hortonian-like streamflow responses even when no surface flow was observed. It is generally difficult to assess the importance of macropores or to simulate their effects in catchment-scale models because their number, orientation, size, and interconnectedness are highly site-specific and macroscopic properties have to be obtained through calibration.

In general, groundwater flow results in the longest travel times (days to weeks to years) for the water that fell on the soil surface to eventually reach the stream. Streamflows during dry periods consist almost entirely of groundwater discharge, or baseflow. Consequently, baseflow tends to vary slowly and over long time periods in response to changing inputs of water through net recharge. The unsaturated portion of the soil holds water at negative gauge pressures (i.e., water pressure is less than atmospheric pressure). The hydraulic resistance offered by the unsaturated soil is high, resulting in low flux rates (see Sect. 4.1). Flow through the unsaturated zone is one of the primary mechanisms for replenishing the aquifer through recharge. In special cases, unsaturated flow may contribute to baseflow in a stream [87].

4 Soil Moisture Hydrology

“Soils sustain life” [88]. Many factors are embedded in that statement. From the hydrologic perspective, the key is soil moisture (soil water), and the mechanism for storing soil water is capillarity. More fundamentally, the answer to how soils sustain life is the surface tension of water. Soil moisture is commonly considered the water in the root zone that enables the interaction with atmospheric processes such as precipitation and air temperature; it recycles water back to the atmosphere through evapotranspiration and serves as the medium for infiltration and subsurface recharge.

4.1 Basic Concepts and Definitions

Some key concepts related to soil water are:

  • Weak intermolecular attractions called van der Waals forces hold water together.

    Intermolecular forces are feeble; but without them, life as we know it would be impossible. Water would not condense from vapor into solid or liquid forms if its molecules didn’t attract each other. [89]

  • Surface tension is a force per unit length acting at air-water interfaces, because water molecules are attracted to each other rather than to air. Where the air-water interface contacts a solid (e.g., a soil particle), a contact angle forms to balance forces on the liquid water at the contact, which allows the curvature of an air-water interface to balance capillary forces. In this way, surface tension combines with the geometry of solids, or porous media, to cause capillarity.

  • Capillarity holds water in small pores, while allowing pressure continuity and drainage of water from larger pores at a given water pressure or matric potential. Soil water can be retained or stored in the near-surface soils for extraction by plants at matric potentials up to a wilting point of approximately 15 bars or 15 atmospheres of negative pressure.

Water is stored in the near surface primarily due to capillary forces that counteract gravity. As soils drain and dewater, smaller pores hold the remaining water, and the resulting hydraulic conductivity decreases rapidly. Mualem [90] quantified the unsaturated hydraulic conductivity K using the concept of a bundle of capillary tubes with a distribution of diameters representing the pore throats in real porous media. As a result, K of a soil having complex geometry has been quantified using the relatively simple equation:

$$ {K}_r\left({S}_e\right)=\sqrt{S_e}{\left[\frac{{\displaystyle {\int}_{\;0}^{\;{S}_e}{\psi}^{-1}d{S}_e}}{{\displaystyle {\int}_{\;0}^{\;1}{\psi}^{-1}d{S}_e}}\right]}^{\;2}, $$
(1.47)

where K r = K/K s , K s is K at saturation with water [m s−1], ψ is matric potential [m], and S e is effective saturation:

$$ {S}_e=\frac{\theta -{\theta}_r}{{\theta}_s-{\theta}_r}, $$
(1.48)

in which θ is volumetric water content [m3 m−3] with subscripts r and s denoting residual and saturated values. Mualem [90] derived (1.47) by assuming that an incremental change in soil water content is related to a pore water distribution function of the pore radii together with the capillary law, in which ψ is inversely proportional to the pore radius. Subsequently, van Genuchten [91] proposed the now commonly used analytical equation for water retention:

$$ {S}_e\left(\psi \right)={\left[1+{\left(\psi /{\psi}_c\right)}^{\alpha}\right]}^{-\beta }, $$
(1.49)

where α, β, and ψ c [m] are fitting parameters. Combining (1.49) with Mualem’s model of (1.47) yields the predicted K r (S e ) as

$$ {K}_r\left({S}_e\right)={S}_e^{\eta }{\left[1-{\left(1-{S}_e^{1/\beta}\right)}^{\beta}\right]}^{\;2}. $$
(1.50)

The parameter β comes from the fitted water retention curve of (1.49). Mualem [90] explored a range of η values (−1.0 to 2.5) over a large data set (45 soils) and found that an average value of approximately η = 0.5 was optimal.
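The retention and conductivity functions (1.49) and (1.50) are straightforward to evaluate numerically. The sketch below is a minimal implementation; the parameter values in the usage lines are hypothetical, chosen only to exercise the functions, not fitted to any data in the text.

```python
import numpy as np


def vg_retention(psi, psi_c, alpha, beta):
    """Effective saturation Se(psi) from the van Genuchten equation (1.49)."""
    return (1.0 + (psi / psi_c) ** alpha) ** (-beta)


def mvg_relative_conductivity(Se, beta, eta=0.5):
    """Relative conductivity Kr(Se) from (1.50); eta = 0.5 is Mualem's average."""
    return Se ** eta * (1.0 - (1.0 - Se ** (1.0 / beta)) ** beta) ** 2


# Hypothetical parameters, loosely in the range of a silt loam:
psi = np.logspace(0, 4, 5)                     # matric suction, cm of water
Se = vg_retention(psi, psi_c=100.0, alpha=2.0, beta=0.5)
Kr = mvg_relative_conductivity(Se, beta=0.5)   # multiply by Ks to get K(Se)
print(np.round(Se, 3), np.round(Kr, 6))
```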

How well do (1.49) and (1.50) fit measured soil characteristics? Many studies have reported water retention (storage) data relevant to (1.49), but flux data relevant to estimating hydraulic conductivity via (1.50) are rarely measured. A high-quality data set for a silt loam soil is shown in Fig. 1.10. The van Genuchten (vG) equation (1.49) fits the water retention data well over the range of measured soil water contents (left in Fig. 1.10). However, the default (predictive) Mualem-van Genuchten (MvG) equation (1.50) with η = 0.5 overestimated measured values of K at high values of soil water suction (right in Fig. 1.10). By fitting the η value (η = 4.7), the fitted K is excellent over 6 orders of magnitude. Note that this value of the saturation exponent is greater than the upper limit explored by Mualem [90]. Figure 1.10 (right) also illustrates how poorly the Gardner equation [92] fits the K data due to its exponential structure; the fit cannot be improved further by parameter adjustment. Gardner’s equation is commonly used for its quasi-linear mathematical convenience, but this example offers a note of caution. More recently, Rucker et al. [93] presented a method for improved parameter equivalence between the MvG and Gardner equations, given the utility of Gardner’s exponential form.

Fig. 1.10
figure 10

Soil hydraulic property data [94] for left (water retention characteristic for Ida silt loam with the fit to (1.49)) and right (hydraulic conductivity as a function of matric suction (1,000 cm = 10 bar)) with model fits combining (1.49) and (1.50). Possible extreme fits using the exponential (log-linear) equation of Gardner [92] are shown for comparison (original figures taken from [95]). Notation: vG = van Genuchten’s equation (1.49); MvG = Mualem-van Genuchten’s equation (1.50); Gardner = Gardner’s exponential model

This example soil (Ida silt loam, which is not an extreme case) allows us to illustrate some possible time scales associated with soil hydraulic properties at different moisture states. Six orders of magnitude in K from saturation (θ = 0.55 m3 m−3) to the assumed wilting point of plants (15 bar or approximately θ = 0.20 m3 m−3) are indicative of the potential range of temporal responses in soils (assuming a unit vertical gradient or gravity drainage). For example, 1 year = 365 × 86,400 s = 31.5 × 106 s, such that the same amount of water draining in about half a minute at saturation would take a full year at 15 bar.

Perhaps as impressive a contrast is the change in K going from saturation to 100 cm of suction. In this example, very little water is drained (Δθ ~ 0.05 m3 m−3), but K decreases by a factor of approximately 20. At θ ~ 0.5 m3 m−3, K = 10−5 cm s−1, or less than 9 mm day−1, so water is being stored in the root zone for plant extraction at a maximum daily rate similar to the drainage rate. Subsequently, as the soil drains and dries further, root water uptake becomes the dominant sink.

Intermolecular forces in a soil water system may be depicted as shown in Fig. 1.11 [96], where moisture tension or matric suction is shown (on a log-log scale) to be a power function of water film thickness in soil. Here water film thickness is defined as a representative distance from the water-solid interface to the air-water interface. Hygroscopic water is in very close proximity to the solid surface (within approximately 0.2 μm) and is considered immobile under subsurface environmental conditions. Field capacity is a commonly used term (see Sect. 3.4), but it is not well defined. Soils do drain under gravitational force (unit vertical gradient) at rates determined by K(ψ), such that capillary water drains at different rates. However, the drastically reduced rates of drainage (Fig. 1.10 right) allow soil water to be stored over a range of time scales (minutes to years).

Fig. 1.11
figure 11

Conceptual soil water retention modes (from [96])

4.2 Soil Moisture Recycling

In addition to soils storing water for plant growth, water evaporated from soils and transpired by plants is recirculated into the atmosphere, thus promoting a positive feedback mechanism for precipitation. The importance of this feedback seems to depend on the scale of interest. At the global scale, circulation of water between the land, atmosphere, and ocean is obviously important. Simulation of such circulation patterns is the basis for projecting future climates in general circulation models (GCMs). Moving down in scale to individual continents, basins, and regional watersheds, the coupling of land-atmosphere interactions may become looser. For this reason, hydrologic models are typically run “off-line” (not coupled with an atmospheric model to capture these land-atmosphere feedbacks) and driven by measured precipitation without considering feedbacks. However, regional-scale feedback has been shown to account for a “weakly dependent” pattern of annual rainfall via “precipitation recycling” in central Sudan [97], the Amazon Basin [98], and other regions of the world (e.g., [99]). At linear scales of <300 km (i.e., watershed areas <90,000 km2), however, the recycling ratio (P/ET) of a watershed is expected to be less than 10 % based on simple scaling of annual precipitation in the Amazon Basin [100]. More recently, Koster et al. [101] described the Global Land–Atmosphere Coupling Experiment (GLACE) as a model intercomparison study addressing the effects of soil moisture anomalies that affect precipitation at the GCM grid cell resolution over the globe. The simulated strength of coupling between soil moisture and precipitation varied widely, but the ensemble multi-GCM results provided “hot spots” of relatively strong coupling based on a precipitation similarity metric. Koster et al. [101] discussed differences between their approach and the methods of estimating “recycling” above, but all studies indicate that the land’s effect on rainfall is relatively small, though significant in places, relative to other atmospheric processes.

4.3 Variability of Soil Moisture

Soil moisture varies spatially (laterally and vertically) and temporally with characteristic periodicities (from infiltration events to diurnal, seasonal, and annual cycles) and longer-term variability related to climatic variability (e.g., ENSO, PDO, and AMO) and projected climate change. As noted in Sect. 4.1, soil moisture response times are controlled primarily by moisture-dependent soil hydraulic properties. Generally, soils in humid climates respond much faster to a unit of infiltrated water than similar soils in more arid climates. This highlights an interaction between climate and soil physical characteristics (related to soil texture and structure). Delworth and Manabe [102] found global patterns of soil moisture in simulations, where the surface energy balance controlled soil moisture interactions with the atmosphere. The partitioning of sensible and latent (evaporative) heat fluxes was influenced strongly by soil moisture. This caused longer response times in soil moisture at high latitudes (moving from the equator to the poles) associated with lower potential evaporation. Therefore, depending upon the scale of interest, one may need to consider coupled land (soil and vegetation)-atmosphere interactions to assess the spatial and temporal variability of soil moisture.

Profile soil moisture dynamics can also vary spatially by hillslope position in semiarid (e.g., [103]) and more humid environments [104, 105].

Spatial variability of soil moisture may be related to short-term hydrologic processes, land management, and weather patterns and to long-term soil development and terrain attributes. Spatial soil moisture has been correlated with terrain attributes, such as surface slope, aspect, curvature, potential upslope contributing area, and attributes derived from these quantities. The processes causing these correlations usually are not identified rigorously, but the inferred factors include short-term hydrometeorological fluxes and long-term pedologic and geomorphic processes.

Zaslavsky and Sinai [106] related near-surface (0–0.4 m) soil moisture variation to topographic curvature (Laplacian of elevation) in Israel, citing processes of unsaturated lateral subsurface flow and raindrop splash affected by local surface curvature. Slope-oriented soil layering and the associated state-dependent anisotropy [107, 108], as well as lateral flow caused by transient wetting and drying [109], likely caused the observed soil moisture variability.

In a more humid environment in Australia [110], soil moisture variability has been related to topographic attributes, including ln(a), where a is the specific contributing area, or a potential solar radiation index (PSRI) as a function of topographic slope, aspect, and solar inclination (latitude). More recently, data from Australia and other sites have been reanalyzed using empirical orthogonal functions (EOFs) for space-time interpolation of soil moisture [111], and the EOF parameters were correlated with the mean soil moisture state and topographic attributes [112, 113]. In this way, spatially explicit patterns of soil moisture are estimated rather than relating a lumped statistical distribution of soil moisture to the spatial mean, as previously inferred using the variable infiltration capacity (VIC) model [114].

4.4 Scaling of Soil Moisture

The variance of soil moisture within an area typically increases with the size or spatial extent of the area [115]. Green and Erskine [116] and Green et al. [117] conducted field-scale experiments to measure spatial attributes of soil moisture, soil hydraulic properties (water infiltration capacity), crop yield, and landscape topography for spatial scaling and modeling in rain-fed agricultural terrain. The spatial sampling design for soil water (Fig. 1.12a) provided a range of sample spacings, and these samples were used to estimate the lumped statistical distribution of water content for each sample date illustrated with histograms (Fig. 1.12b).

Fig. 1.12
figure 12

(a) Spatial sampling pattern, using a time domain reflectometry (TDR) sensor to measure soil moisture in the top 0.3 m of soil and (b) soil moisture histograms of spatial measurements at 4 sampling dates (from [116])

Fractal geometry was found to characterize the spatial autocorrelation structure of these spatial variables. “Simple” or monofractal geometry was inferred for soil moisture using power-law semi-variograms (Fig. 1.13a–e). The fitted power-law models plot as straight lines on a log-log scale (Fig. 1.13f), which shows the temporal variability of the spatial structure. Steeper lines in June and their associated higher values of the model exponent correspond to greater spatial organization, which yields a higher Hurst exponent and lower fractal dimension [116].

Fig. 1.13
figure 13

Soil moisture experimental semi-variograms and power-law model fits used to estimate fractal geometry at different sampling dates. Model parameters a and b shown in (a)–(e) are the power-law multiplier and exponent, respectively, where b is related to the Hurst exponent or fractal dimension [116]

In a different agricultural field in Colorado, Green et al. [117] estimated steady infiltration rates from single-ring infiltrometer measurements at 150 nested sample locations spanning 10 hillslope positions (30-m by 30-m sites). Fractal behavior was analyzed using fractional moment analysis and power-law variogram fits to estimate the multifractal exponent or the monofractal Hurst exponent H as a function of the maximum lag distance h max. The spatial values of infiltration displayed persistence (H > 0.5) up to a maximum H = 0.9 at approximately 200 m, followed by a decline in H down to a value of H = 0.14 for h max = 600 m.
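The variogram-based estimation of the Hurst exponent used in these studies reduces to a linear regression in log-log space. The sketch below uses synthetic data; the relation b = 2H between the fitted power-law exponent b and the Hurst exponent H is the standard one for a monofractal (fBm-like) field.

```python
import numpy as np


def fit_power_law_variogram(h, gamma):
    """Fit gamma(h) = a * h**b by least squares in log-log space; return (a, b)."""
    b, log_a = np.polyfit(np.log(h), np.log(gamma), 1)
    return np.exp(log_a), b


# Synthetic semi-variogram with a known exponent (b = 1.0, i.e., H = 0.5):
h = np.array([10.0, 20.0, 50.0, 100.0, 200.0])   # lag distances, m
gamma = 0.002 * h
a, b = fit_power_law_variogram(h, gamma)
print(f"a = {a:.4f}, b = {b:.2f}, Hurst exponent H = b/2 = {b / 2:.2f}")
```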

Because such “pre-asymptotic” fractal behavior and persistence in infiltration [118] may be due to the sparseness of the measurements between landscape positions, terrain attributes computed from a 5-m grid DEM were used as surrogate spatial data. Terrain attributes (slope and contributing area) displayed similar variations in fractal behavior using the dense terrain data. Spatial persistence was identified at hillslope scales (approximately 200 m) with much lower values of H at smaller and larger scales. Green et al. [117] surmised that hillslope-scale processes affecting soil erosion, deposition, and development may account for the deviations from pure fractal behavior. Based on previous numerical simulations [119], areal infiltration capacity decreases with increasing values of H, for a given variance in saturated hydraulic conductivity. If H changes with the scale of analysis, as indicated here, pre-asymptotic persistence of infiltration rates at hillslope scales may be larger than expected at field to watershed scales.

Advances in hydrologic simulation and spatial quantification for precision agriculture and conservation require prediction of landscape-related variability and the transfer of information across spatial scales. The studies referenced here provided insights and methods for scaling infiltration, soil water content, and crop yield related to landscape topography.

5 Hydrology of Glaciers

The cryosphere is an important component of the hydroclimate system, and the melt of snow and ice plays a vital role in the hydrologic cycle of river basins [120]. In recent decades, the accelerated decline of glaciers worldwide has been a major concern of water resources specialists and policy makers. It has been argued that such decline has been occurring due to global warming resulting from a number of factors such as the effects of increasing atmospheric concentration of CO2 and other greenhouse gases, the effect of land use changes, and other human activities. Regardless, the loss of glacier cover may have a significant impact on the availability of water resources in various parts of the world, such as the Andean and Himalayan regions. For example, it has been estimated that glacier decline may have a significant impact on hydropower production in the Peruvian Andes Mountains [121]. This section describes briefly the methods commonly available for estimating ice melt/snowmelt. A more detailed discussion of the subject can be found in Singh et al. [122] and the references cited below.

5.1 Basic Concepts and Definitions

A glacier is a perennial body of solid water that gains mass through solid precipitation (primarily snow) and loses it by melting and sublimation of ice. The relationship between the gain and loss of mass of water is known as the glacier mass balance. The area where the accumulation of mass occurs is called the accumulation zone and is located at the top of the glacier; from there, by the action of gravity, the mass moves downward towards the lower parts of the glacier. A glacier behaves as a viscoplastic body that becomes deformed because of its own weight [123]. Generally, solid precipitation (snow) that is subject to temperatures below 0 °C is transformed into ice. In addition, the albedo is generally above 0.6, which limits the energy absorbed at the glacier surface. The lower part of the glacier is the ablation zone, where melting is most intense and snow and ice are lost (Fig. 1.14). This mainly occurs at the surface, in part because of temperatures above 0 °C and liquid precipitation, which keep the albedo below 0.4. The ablation and accumulation zones are separated by an imaginary line known as the glacier equilibrium line (ELA), where the glacier mass balance is zero (no gain or loss of mass); it is usually found close to the 0 °C isotherm. A related term is the elevation of the annual end-of-summer snowline (EOSS), which is sometimes used as a surrogate for glacier mass balance values [124]. The dynamics in the accumulation and ablation zones are mainly controlled by the energy balance and the topography of the area where the glacier is located.

Fig. 1.14
figure 14

Andean glacier showing the accumulation and ablation zones separated by the glacier equilibrium line ELA (dashed line) (personal communication from Bernard Pouyaud)

The study of glaciers has a broad interest, ranging from paleoclimatology (isotopic dating of ice cores, reconstitution of moraines, etc.) to future scenarios of the evolution of glacier coverage. From a hydrologic perspective, the most relevant processes are the ice melting and the ensuing contribution of glacier meltwater to river discharge. The dynamics of glaciers are complex because of the physical processes involved, the topographic conditions, the geographical area where they are located, and the climatic conditions of the area. A simple conceptualization assumes that a glacier consists of a system of three components: snow, firn, and ice.

5.2 Glacial and Snow Fusion Methods

The processes of ice melt and snowmelt are driven by different factors such as energy exchange, albedo, temperature, slope, shading, and orientation. A number of methods have been proposed in the literature for estimating the amount of ice melt and snowmelt (e.g., [125–127]). The methods are based on one or more of the several processes and variables involved. The modeling of ice and snow melt in midlatitude areas is similar to that in tropical regions, except for the difference in seasonal climatic conditions. Currently there are two types of methods used for estimating glacier melt and snowmelt: the temperature-index (often called degree-day) method and the energy balance method. Hybrid methods have also been suggested. The details and experience of the various methods have been reviewed by Hock [126, 127].

5.2.1 Temperature-Index Methods

The temperature-index methods are based on conceptual models that are formulated at lumped (global) or semi-distributed levels. They are generally based on empirical relationships between the ice melt/snowmelt and the air temperature, which reflect the strong correlation between these two variables. For example, Braithwaite and Olesen [128] found a correlation of 0.96 between annual ice melt and positive air temperature. Although several other factors besides air temperature influence the ice melt/snowmelt rate, temperature-index models generally give quite satisfactory estimates because some of the key physical factors involved are strongly related to temperature. For example, long-wave radiation and sensible heat flux are generally the largest sources of heat for the melting of ice and snow, and both heat fluxes are strongly affected by air temperature (Ohmura [120]). A detailed review of the physical basis and the various factors involved in temperature-based models, their usefulness, limitations, and experience can be found in Ohmura [120] and Hock [126].

Hock [126] gave four reasons why temperature-index models have been popular in practice, summarized as follows: (1) wide availability of air temperature data, (2) simple interpolation and forecasting possibilities of air temperature, (3) generally good performance, and (4) ease of use. The temperature-index models can be used at different time scales such as daily, weekly, and monthly. The temperature data are easily available either from measurements or indirectly from reanalysis. Their wide application includes the prediction of ice melt/snowmelt for flow forecasting operations, modeling of glacier mass balance, and the evaluation of snow and ice response to climate change predictions (e.g., [129, 130]).

The classical model for the ice melt and snowmelt for a given day may be expressed as

$$ {M}_t=\begin{cases}\mathrm{DDF}\times {T}_t, & {T}_t>0\\ 0, & {T}_t\le 0,\end{cases} $$
(1.51)

where M t = ice melt or snowmelt during a given day (mm/day), T t = mean daily temperature (°C), and DDF = degree-day factor (mm/(day × K)). Naturally, in order to calculate the total melt M during a given number of days n (e.g., n = 30 days), one integrates the daily values so that \( M={\displaystyle \sum_{t=1}^n{M}_t} \).
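A minimal coded version of (1.51) follows; the temperature series and the DDF value are invented for illustration (see Table 1.6 for representative DDF ranges).

```python
def degree_day_melt(daily_temps_C, ddf):
    """Total melt (mm) from (1.51): sum of DDF * T_t over days with T_t > 0."""
    return sum(ddf * T for T in daily_temps_C if T > 0.0)


# Invented 10-day temperature series (degC) and a DDF of 6 mm/(day*K) for ice:
temps = [-2.0, 1.5, 3.0, 4.2, 0.0, -1.0, 5.5, 6.1, 2.2, 3.8]
print(f"melt over 10 days: {degree_day_melt(temps, ddf=6.0):.0f} mm")
```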

The values of DDF may be determined by direct comparison using snow lysimeters or ablation stakes or from the melt estimated by the energy balance method (e.g., [131–133]). Field investigations have shown that the DDF exhibits significant temporal and spatial variability, generally varying with the season, location, orientation of mountain slopes, humidity, wind, and other environmental factors. Hock [126] provides a table of average values of DDF for snow and ice (Table 1.6 is a brief summary). Table 1.6 shows that the values of DDF for snow are lower than those for ice, mainly because of the higher albedo of snow compared to that of ice.

Table 1.6 Values of DDF for snow and ice for different parts of the world (summarized from 126)

5.2.2 Energy Balance Method

The energy balance is a physically based method in which all sources of energy of the underlying system (e.g., 1 m2 of the surface of a glacier) are considered and the excess energy is assumed to be used for ice melt/snowmelt. It is mostly used for estimating melt over short time steps (e.g., daily or hourly), although it can be used for longer periods. The energy balance method analyzes the exchange of energy between the glacier surface and the air.

The energy balance equation may be written as [127]

$$ {Q}_N+{Q}_H+{Q}_L+{Q}_G+{Q}_R+{Q}_M=0, $$
(1.52)

where Q N is the net radiation flux, Q H is the sensible heat flux, Q L is the latent heat flux, Q G is the ground heat flux, Q R is the heat flux brought by rainfall, and Q M is the energy consumed by the ice melt/snowmelt (the units of all energy fluxes may be W/m2). Then the ice melt/snowmelt rates may be determined by

$$ M=\frac{Q_M}{\rho_w{L}_f}, $$
(1.53)

where ρ w is the density of water and L f is the latent heat of fusion. Therefore, the energy balance method for estimating the melt rate M requires measuring all the heat fluxes involved in (1.52), i.e., Q N , Q H , Q L , Q G , and Q R . However, this task is generally costly since it requires much specialized equipment and instrumentation. Thus, alternative methods have been developed for estimating the various terms based on standard meteorological observations [127].

Energy balance models may be applied at a site (location of the equipment) or over an area (e.g., a square grid). Examples of energy balance studies at a site are summarized in Table 2 of Hock [127]. For instance, Hay and Fitzharris [134] for the Ivory Glacier (1,500 m) in New Zealand gave the following estimates: Q N = 76, Q H = 44, Q L = 23, and Q R = 4 (W/m2), which correspond to 52 %, 30 %, 16 %, and 2 %, respectively, of the total energy flux available for melt, i.e., Q M = 147 W/m2 (these estimates were made for a period of 53 days during the summer). As expected, the relative contributions of the various components of the energy balance equation vary with the weather conditions, so they change during the melt season. Also, direct comparisons of the estimates reported by different studies are limited by the uncertainties arising from the instruments and methods utilized. In addition, energy balance studies at spatially distributed scales are lacking; thus, a challenge for distributed studies is the extrapolation of input data and energy balance components over the entire grid [127].
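Converting an energy flux available for melt into a melt rate via (1.53) is essentially a unit conversion, as the following sketch shows; it reproduces the order of magnitude implied by the Ivory Glacier figures (Q M = 147 W/m2).

```python
RHO_W = 1000.0    # density of water, kg/m3
L_F = 3.34e5      # latent heat of fusion, J/kg


def melt_rate_mm_per_day(Q_M):
    """Melt rate from (1.53), with Q_M in W/m2, returned in mm of water per day."""
    m_per_s = Q_M / (RHO_W * L_F)        # (1.53): M = Q_M / (rho_w * L_f), in m/s
    return m_per_s * 86_400.0 * 1_000.0  # m/s -> mm/day


print(f"{melt_rate_mm_per_day(147.0):.0f} mm/day")  # about 38 mm/day of melt
```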

5.2.3 Remarks

The temperature-index-based methods are much simpler to use than the energy balance methods. For this reason, the former have found wide acceptance in practice for a variety of problems such as flow forecasting (e.g., [135]) and assessment of basin response to potential climate change [136]. For basin studies requiring melt estimates at the monthly, weekly, and daily time scales, simple temperature-index-based methods may be sufficient, but for smaller temporal scales (e.g., hourly) and spatial scales (e.g., small basins), more refined energy-based methods may be desirable.

In addition, over the years, there have been a number of studies aimed at improving the temperature-index-based methods by incorporating a radiation term in the equations (e.g., [131, 137, 138]) and adding other variables such as wind speed and vapor pressure (e.g., [139]). Likewise, the temperature-index-based and energy balance-based methods have been applied to the same basin but for different time periods, i.e., the temperature-index-based method for dry periods and a simplified energy balance method during wet periods [140]. Also, software such as SRM has been developed and applied for a wide range of studies estimating snowmelt and glacier melt worldwide [141].

Furthermore, in some cases, estimates of glacier melt contributions to streamflows have been made using water balance equations for the basin that a glacier drains. For example, this method has been applied [142] to estimate that a significant amount (at least one-third) of the annual streamflow of the Santa River in Peru arises from the melt of glaciers located in the Cordillera Blanca (Andes Mountains). Also, a simple energy balance model with remotely sensed data of short-wave and long-wave radiation, a DEM obtained from the Global Land One-Kilometer Base Elevation (GLOBE) dataset, and glacier areas derived from the Global Land Ice Measurements from Space (GLIMS) database has been utilized for estimating glacier contributions to streamflows worldwide [143].

5.3 Glacier Equipment

The energy balance method briefly summarized in Sect. 5.2.2 above requires specialized equipment and instrumentation to measure and estimate the various variables involved. The equipment installed at a selected site includes a number of sensors to measure the incident and diffuse solar radiation, both short- and long-wave radiation, air temperature at the glacier area, humidity, and the speed and direction of the wind. For the temperature-index methods, the equipment is simpler; the most important sensor is for air temperature at the glacier area. In both cases, it is important to measure precipitation (at least liquid precipitation) at the glacier area. Figure 1.15 shows a station located in the tropical Andes Mountains at 5,180 m.a.s.l. in Peru. This station has sensors for radiation, speed and direction of the wind, temperature, pressure, and humidity, an echo sounder (to measure variations of snow height), and a gyroscope system that keeps the radiation sensors in a vertical position over the ice surface.

Fig. 1.15
figure 15

Climate station located at a tropical glacier in Peru

6 Watershed and River Basin Modeling

We have seen in previous sections a number of fundamental processes related to the hydrologic cycle, such as precipitation, interception, depression storage, infiltration, evaporation, soil moisture, and glacier melt/snowmelt. The main purpose of this section is to bring together many of the underlying concepts and mathematical formulations of these processes for finding the relationships between precipitation and streamflow at various time scales, such as hours, days, weeks, seasons, and years. We will refer to some of the foregoing concepts, mathematical formulations, and models to describe and interrelate the underlying physical processes so that one can estimate, for example, what fraction of the precipitation that falls on the basin is transformed into streamflow at the outlet of the basin. We will start by reviewing some needed concepts and then proceed to discuss key features such as concepts and definitions, types of models, temporal and spatial scales, model building and formulation, model calibration, and a brief example.

Hydrologic modeling of watersheds and river basins has a long history since the simple rational method for relating precipitation and runoff was established in the nineteenth century [144]. In about 160 years, a number of scientific and technological developments have occurred, which have led to a variety of models with various degrees of sophistication. Nowadays watershed models are increasingly adopted in the decision-making process to address a wide range of water resources and environmental issues. Models can represent the dynamic relationship between natural and human systems (Fig. 1.16) and have been applied to describe not only the transport of water and sediment but also the sources, fate, and transport of contaminants in watersheds and river basins. They can also help evaluate the response of fluxes of water, sediment, chemicals, and contaminants to changing climate and land use. Efficient management of water and environmental systems requires integration of hydrologic, physical, geological, and biogeochemical processes. This has led to the development of complex models that can simulate an increasing number of state and output variables at various locations within the watershed and river basin. On the other hand, data availability and identifiability, among other pragmatic considerations, suggest adopting simple model structures that can solve the problem at hand. The development and application of watershed models must take into account the tradeoff between available data, model complexity, performance, and identifiability of the model structure.

Fig. 1.16
figure 16

Watershed and River Basin systems where processes interact with atmospheric and oceanic systems, environmental and ecological systems, and human systems

Over the years, a number of articles and books have been published on models for representing the hydrologic cycle of watersheds and river basins. For instance, detailed examples of a variety of models can be found in the books by Singh [145], Beven [146], Bowles [147], and Singh and Frevert [148, 149], and journal articles have been published describing critical issues involved in the modeling process, such as parameter estimation, uncertainty, sensitivity analysis, performance, scaling, and applications and experiences thereof (e.g., [150–155]). The literature also abounds on the classification and types of models, such as deterministic or stochastic, conceptual or physically based (mechanistic), lumped or distributed, event or continuous, and other types such as black box and parametric (e.g., [52, 60, 156–158], and other papers cited above).

6.1 Basic Concepts and Definitions

A model is a representation of a real system such as a watershed or river basin. The model structure is developed on the basis of our understanding of the physical principles/rules that govern the system. A general distributed-parameter form of a model that can represent the spatial heterogeneities inherent in the real system, as well as nonlinear interactions between system processes, can be expressed as [159]

$$ \frac{ dx\left(r,t\right)}{ dt}=f\left({\nabla}^2x,\nabla x,x,u,\theta, t,r\right), $$
(1.54)

where f is a functional representation of the internal system processes; x, u, and θ are vectors of state variables, system inputs, and model parameters, respectively; t represents time; and r is a three-dimensional vector specifying spatial location. State variables (x) are quantities of mass stored within the system boundary (B). Model inputs (u) and outputs (y) are fluxes of mass and energy into and out of the system, respectively. Figure 1.17 illustrates a schematic of model components from a system perspective [160]. In the context of watershed modeling, examples of state variables include the mass of water, sediment, chemicals, and organisms stored within surface and subsurface system compartments, vegetation biomass on the ground surface, and stored energy. Precipitation flux is a typical example of a model input (u), although, depending on the model, input variables may also include wind speed, air temperature, humidity, and radiation as needed. Fluxes of flow, sediment, and chemicals along the river network are examples of model outputs. Model parameters (θ) are assumed to be time invariant, but they may vary in space depending on the model; they are estimated by appropriate methods for solving the partial differential equation (1.54).

Fig. 1.17
figure 17

Schematic of model components (adapted from ref. 160)

A simpler lumped-parameter model in the form of an ordinary differential equation can be derived from (1.54) to describe changes in state variables as a function of time as

$$ \frac{ dx(t)}{ dt}=f\left(x,u,\theta, t\right), $$
(1.55)

and model outputs are generated as

$$ y(t)=g\left(x,\theta, t\right). $$
(1.56)

The vector functional relationships f and g constitute the model structure M and are typically treated as deterministic components. In reality, however, a model is merely an approximation of the hydrologic system under study. Many important processes may be unknown during the development of the model, while other processes may be considered insignificant and ignored. Thus, the structure of any watershed model will generally be incomplete and will contain uncertainty. Mathematical models are never perfect because of errors in model conceptualization, input and output observations, physical characteristics of the system, temporal and spatial scales, and parameter estimation. In this light, model parameters may be understood as effective parameters that compensate for these errors [161–163].
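As a concrete illustration of (1.55) and (1.56), the following minimal sketch integrates a single-state lumped model with an explicit Euler step. The linear-reservoir choice f(x, u, k) = u − kx with output y = kx, the storage coefficient k, and the input pulse are all hypothetical simplifications for illustration, not the formulation of any particular watershed model.

```python
# A minimal sketch of the lumped state-space form (1.55)-(1.56), assuming a
# single linear-reservoir state: f(x, u, k) = u - k*x and g(x, k) = k*x.
# The storage coefficient k and the input series are hypothetical.
import numpy as np

def simulate(u, k=0.2, dt=1.0, x0=0.0):
    """Explicit-Euler integration of dx/dt = u - k*x, with output y = k*x."""
    x, states, outputs = x0, [], []
    for ut in u:
        x = x + dt * (ut - k * x)   # state update, eq. (1.55)
        states.append(x)
        outputs.append(k * x)       # output equation, eq. (1.56)
    return np.array(states), np.array(outputs)

u = np.array([0, 5, 12, 3, 0, 0, 0, 0], dtype=float)  # input pulse (e.g., rainfall flux)
states, outputs = simulate(u)
print(outputs.round(2))
```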

Depending on the physical basis of the model structure components f and g, watershed models may be classified into two main categories: physically based models and empirical models. Physically based models employ physical principles, i.e., conservation of mass, energy, and momentum, to describe the nonlinear system dynamics that control the movement of water, particles, chemicals, and organisms within various system compartments. Empirical models describe the relationships between system inputs and outputs based on statistical analysis of historical observations. Application of empirical models is limited to the conditions over which the observations were acquired. Most watershed models, however, contain both physically based equations and empirical relationships to represent the underlying processes of the watershed and may be referred to as process-based models.

Often, physically based models are divided into conceptual models and "pure" physically based models, where the latter presumably involve the proper equations for each process according to the current state of knowledge. Freeze and Harlan [164] established the blueprint for this type of hydrologic model, proposing for the first time a representation of the underlying processes using the physical equations at the local scale. In principle, "pure" physically based models do not need calibration, and parameters can be estimated directly from the available information. However, the proper physical equations to be applied depend on the spatial scale. For example, consider the modeling of an aquifer (Fig. 1.18): at the local scale one generally applies Darcy's law and solves the groundwater flow problem with the Boussinesq equations, but at the pore scale these must be replaced by the Navier–Stokes equations. Furthermore, at the aquifer scale, linear reservoir and water balance equations may give a good representation of the system.

Fig. 1.18
figure 18

Different equations may be needed for different scales in saturated flow in aquifer systems. From left to right in blue box: pore (1 mm), local (1 m), and whole aquifer (10 km) scales (color figure online)

Regardless of the degree of empiricism embedded in the model structure, the system of functional relationships can be resolved at various spatial and temporal scales. The equations may be solved for the entire watershed as a single unit, where a unique set of state variables is defined and a single set of model parameters is estimated for the entire system. Models that use this approach are referred to as lumped-parameter models. Alternatively, the watershed may be divided into smaller, rather homogeneous subunits or areas (e.g., grids, sub-watersheds, hydrologic response units) in order to better represent the heterogeneity of watershed characteristics such as soils, land use, land cover, and terrain, as well as the spatial variability of inputs such as precipitation and potential evapotranspiration. These models are referred to as distributed-parameter models, where the state variables and model parameters differ for each subunit. As the number of subunits increases, however, the computational burden for solving the system equations may increase substantially.

In addition to the spatial scale, the selection of appropriate temporal scales is an important consideration in building models and solving the system equations. Event-based models are needed for analyzing the effect of design storms on the hydrologic system and usually require small time steps, such as hours or even less, depending on the size of the catchment. Larger time steps (e.g., daily or even longer in some cases) may be sufficient for continuous models, which are appropriate for long-term assessment of changes in the hydrologic system in response to changes in climate, land use, and management drivers. Even in the case of distributed-parameter models with relatively small subunits, model parameters are aggregate measures of spatially and temporally heterogeneous properties of each unit. Thus, model parameters will always contain uncertainties that propagate forward into model predictions of state and output variables. Even "pure" physically based models involve effective parameters that must also be calibrated [165, 166].

6.2 Brief Example

For illustrative purposes, we describe some basic aspects of hydrologic modeling using a relatively simple distributed model known as the TETIS model and its application to the Goodwin Creek basin for flood event simulation.

6.2.1 The Basin

Goodwin Creek is a 21.3 km2 experimental basin located in Panola County (Mississippi, USA). The watershed is fully instrumented with 14 flow gauges and 32 rain gauges in order to continuously monitor precipitation, runoff, and sediment yield with high spatial and temporal resolution [167]. The original hydrometeorological data have been sampled for this work at a 5-min temporal resolution. Soils are mainly silt loams, and the topography is quite smooth, with elevation ranging from 67 to 121 m.a.s.l. and slope from 0 to 22 % (at a 30-m resolution). Major land uses are pasture, agriculture, and forest. The climate is humid, warm in the summer and temperate in the winter. The average annual precipitation is 1,440 mm, and convective rainfall events are common, especially in the summer. The watershed surface hydrology is largely Hortonian, with runoff almost entirely formed by overland flow and little baseflow at the outlet (less than 0.05 m3/s). The main storm events of the years 1981, 1982, and 1983, with peak flows at the outlet of 39.8, 37.8, and 106.3 m3/s, respectively, will be used in this example.

6.2.2 The Distributed Model

The TETIS model is a distributed hydrologic model, with physically based formulations and parameters, developed for continuous and event simulation of the hydrologic cycle at basin scale. The model has been satisfactorily tested in different climatic scenarios with a wide range of basin sizes, from a few hectares up to 60,000 km2 [163, 168–172].

Version 8 of TETIS (free download at http://lluvia.dihma.upv.es) represents the water cycle in each cell of the spatial grid using up to six interconnected vertical tanks. The example here (with no snow and no explicit representation of vegetation interception) considers only four tanks, as shown in Fig. 1.19 (a simplified code sketch follows the figure). The relationships between tanks, representing the different hydrologic processes, are described by linear reservoirs and flow threshold schemes. The first tank (T1) represents the aggregation of vegetation interception, surface puddles, and upper soil capillary retention; water can leave this tank only by evapotranspiration and, for this reason, it is called the static tank (storage). The second tank (T2) corresponds to the surface storage, i.e., the water that does not infiltrate and generates overland flow. The third tank (T3) represents the gravitational soil storage; the percolation process is modeled according to both soil saturation conditions and vertical hydraulic conductivity, and the remaining water is available for interflow. The fourth tank (T4) represents the aquifer, which generates the baseflow and the groundwater outflow (underground losses with respect to the catchment outlet); the groundwater outflows are the aquifer flows that do not contribute to baseflow within the basin (generally they contribute to baseflows downstream or discharge to the sea). Eight parameters are needed for runoff production: static storage capacity, vegetation density, soil surface infiltration capacity, horizontal saturated permeability, percolation capacity, aquifer permeability, underground losses capacity, and overland flow velocity.

Fig. 1.19
figure 19

Vertical conceptualization of TETIS model considering four tanks
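To make the tank scheme concrete, the sketch below steps a single cell through a sequence of rainfall inputs using four storages connected by simple thresholds and linear-reservoir releases. All capacities and coefficients are hypothetical, and the production functions are deliberately cruder than the actual TETIS formulations [163].

```python
# A highly simplified, single-cell sketch in the spirit of Fig. 1.19.
# All capacities and discharge coefficients (mm, mm/step, 1/step) are
# hypothetical; the actual TETIS production functions are more elaborate.
def tank_step(rain, et0, s):
    """One time step over four storages: s = [static, surface, gravitational, aquifer]."""
    static_cap, infil_cap, perc_cap = 50.0, 8.0, 2.0   # assumed capacities
    # T1: static storage fills first and drains only by evapotranspiration
    to_static = min(rain, static_cap - s[0])
    s[0] = max(s[0] + to_static - et0, 0.0)
    excess = rain - to_static
    # split the excess: infiltration vs. overland flow (T2)
    infil = min(excess, infil_cap)
    s[1] += excess - infil
    # gravitational soil storage (T3) and percolation to the aquifer (T4)
    perc = min(infil, perc_cap)
    s[2] += infil - perc
    s[3] += perc
    # linear-reservoir releases (hypothetical coefficients)
    overland, interflow, baseflow = 0.5 * s[1], 0.1 * s[2], 0.01 * s[3]
    s[1] -= overland
    s[2] -= interflow
    s[3] -= baseflow
    return overland + interflow + baseflow, s

s = [0.0, 0.0, 0.0, 0.0]
for rain in [0, 20, 60, 10, 0, 0]:
    q, s = tank_step(rain, et0=2.0, s=s)
    print(round(q, 2))
```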

The basin stream network can be considered an additional, fifth tank (T5), but it is not necessarily present in all cells. Two different types of channel networks can be defined: gullies (without permanent flow) and rivers (with permanent flow). The starting cells of these networks are defined by two drainage area thresholds. Every cell receives inflows from upstream and drains downstream following a 3D scheme generated from a digital elevation model (DEM), with the flow routed towards the lowest of the eight contiguous cells (see the sketch after Fig. 1.20). Figure 1.20 shows a 2D simplification of this scheme. The overland flow and the interflow are routed to the respective tanks (T2 and T3) of the downstream cell (Fig. 1.20); once both flows reach a cell whose drainage area is greater than the threshold drainage area corresponding to gullies, they move into T5. In the same way, baseflow is routed to T4 of the downstream cell until it reaches a second threshold drainage area (for river channels), and then it moves into T5. Therefore, this pair of drainage area thresholds divides the watershed into three classes of cells: pure hillslope cells (without the T5 tank), gully cells (with the T5 tank and no connection between aquifer and gully), and river cells (with the T5 tank and a connection between aquifer and channel). The flow routing along the stream channel network is carried out using the geomorphologic kinematic wave (GKW) methodology, where the cross-section and roughness characteristics of the stream channel network are estimated with power laws of drainage area and slope for each cell [163].

Fig. 1.20
figure 20

Horizontal conceptualization of TETIS model
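The cell-to-cell drainage rule can be illustrated in a few lines of code: each cell drains to the lowest of its eight neighbors. This minimal sketch uses a hypothetical elevation grid and ignores the drop-over-distance weighting that practical D8 implementations apply to diagonal neighbors.

```python
# A minimal sketch of the "lowest of eight contiguous cells" routing rule.
# The 4x4 elevation grid is hypothetical; diagonal distance weighting,
# used in practical D8 schemes, is omitted for brevity.
import numpy as np

dem = np.array([[10, 9, 8, 7],
                [ 9, 8, 6, 5],
                [ 8, 7, 5, 3],
                [ 7, 6, 4, 1]], dtype=float)

def d8_receiver(dem, i, j):
    """Return the (row, col) of the lowest neighbor, or None for a sink/outlet."""
    best, target = dem[i, j], None
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if (di or dj) and 0 <= ni < dem.shape[0] and 0 <= nj < dem.shape[1]:
                if dem[ni, nj] < best:
                    best, target = dem[ni, nj], (ni, nj)
    return target

print(d8_receiver(dem, 1, 1))  # -> (2, 2), the lowest contiguous cell
```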

The model effective parameters are organized following a split structure [163, 173]. Basically, each effective parameter for cell j, θ*ij, is the product of a correction factor R i, which depends only on the type of parameter, and the prior parameter estimate θ ij; i.e., the correction factors modify each parameter map globally, preserving the prior spatial structure and thus drastically reducing the number of parameters to be calibrated (a one-line operation; see the sketch below). In the referred TETIS configuration, there are a total of nine correction factors: eight affecting the runoff production parameter maps and one for the stream network velocity. The split structure of the model effective parameters also facilitates extrapolation to ungauged watersheds [171].
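The split structure itself is trivially simple, as the sketch below shows for a hypothetical infiltration capacity map: the calibrated correction factor scales the entire prior map, so its spatial pattern is preserved.

```python
# A minimal sketch of the split parameter structure: each effective map is
# the prior map scaled by one calibrated correction factor. Values are
# hypothetical.
import numpy as np

prior_infiltration = np.array([[12.0, 8.0], [9.0, 15.0]])  # prior map (mm/h)
R_infiltration = 0.8                                       # calibrated factor
effective = R_infiltration * prior_infiltration            # theta*_ij = R_i * theta_ij
print(effective)
```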

6.2.3 Initial Parameter Estimation

Concerning the initial parameter estimation for model calibration, the best advice is to use all available information and experience. In this example, the basic information used to estimate the model parameters was taken from Blackmarr [167]. This initial parameter estimation is the prior parameter set used in calibrating the effective parameters of the TETIS model.

The DEM for the basin consisted of 30 × 30-m square cells, which were used to derive the flow direction, slope, and flow accumulation maps (Fig. 1.21). The latter is needed for stream channel routing with TETIS, because the hydraulic characteristics in the GKW are extrapolated mainly using the drainage area of each cell [163]. Drainage threshold areas were estimated as 0.01 km2 (to differentiate between hillslopes and gullies) and 15.3 km2 (to differentiate between gullies and river channels). The parameters of the GKW power laws were taken from Molnár and Ramírez [174]. The overland flow velocity map was estimated from the slope map, assuming representative uniform flow and a rough estimate of the roughness coefficient. Vegetation density was obtained from a simple reclassification of the land cover map. The static storage capacity and infiltration capacity were estimated using soil information (texture, soil classification, and soil profiles), the land cover map for effective root depth, and appropriate pedotransfer functions. Percolation capacity was derived from a geological map of the study area. These three important parameter maps are shown in Fig. 1.22. The estimated values are in fact modal values for the union of the three original cartographic units (vegetation, soil, and lithology). No specific estimation was done for horizontal saturated soil permeability, aquifer permeability, and underground losses capacity; for example, the infiltration capacity map was also used for the horizontal saturated soil permeability, assuming a high correlation between them.

Fig. 1.21
figure 21

Parameter maps derived from the DEM: flow direction, slope, and accumulated area (color figure online)

Fig. 1.22
figure 22

Main parameter maps derived from available landscape information: static storage capacity and upper and lower soil-saturated permeabilities (for the infiltration and percolation capacities, respectively) (color figure online)

6.3 Model Calibration and Testing

Application of simulation models in research or water management decision making requires establishing credibility, i.e., "a sufficient degree of belief in the validity of the model" [175]. Beck et al. [176] describe the attributes of a valid model as follows: (1) soundness of the mathematical representation of processes, (2) sufficient correspondence between model outputs and observations, and (3) fulfillment of the designated task. Literature review is commonly used to address the first attribute; addressing the second often includes model calibration. Model calibration is the process of adjusting the model parameters (θ) manually or automatically until model outputs adequately match the observed data. The credibility of model simulations is further evaluated by investigating whether model predictions are satisfactory on different data sets, a procedure often referred to as validation, verification, or testing [177, 178].

One common calibration and testing strategy is to split the observed data into two sets: one for calibration and the other for testing. It is desirable that the calibration and testing data sets be of approximately the same length (although this requirement is often not met when data sets are small). It is also important that both data sets contain periods with high and low flows in order to increase the robustness of the model. Yapo et al. [179] demonstrated that approximately eight years of daily data were needed to appropriately adjust model parameters in calibrating a rainfall-runoff model for their watershed. Gan et al. [180] indicated that, ideally, calibration of rainfall-runoff models should use 3–5 years of daily data that include average, wet, and dry years so that the data encompass a sufficient range of hydrologic events to activate all the model components during calibration. However, the required amount of calibration data is project specific. For example, in the case study referred to in Sect. 6.2, only a single flood event at the outlet of the basin was used for model calibration.

The classical calibration procedure aims at identifying a unique "best parameter set," \( \widehat{\theta} \), that provides the closest match between model predictions and real-world observations of the system outputs (y). Several measures of information have been proposed for calibration of hydrologic models [181], including the Nash-Sutcliffe efficiency coefficient (E), root mean square error (RMSE), mean absolute error (MAE), maximum absolute deviation (MAD), bias (BIAS), and lag-1 autocorrelation (r 1). Some studies have identified ranges of these measures that can be used to classify model simulations as poor, acceptable, good, or very good (e.g., [182, 183]). But other factors must also be taken into consideration, such as the time step (i.e., more relaxed criteria for finer discretizations), streamflow errors (especially when outputs are sediment and water quality), and uncertainties in the inputs and prior parameter information.
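For reference, a minimal sketch of two of the cited measures, written out explicitly; the observed and simulated discharge series are hypothetical.

```python
# Nash-Sutcliffe efficiency and RMSE, two of the measures listed above.
import numpy as np

def nash_sutcliffe(obs, sim):
    """E = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(sim)) ** 2)))

obs = [1.0, 5.2, 20.1, 12.4, 6.0, 2.5]   # hypothetical discharges (m3/s)
sim = [0.8, 6.1, 18.5, 13.0, 5.2, 2.9]
print(nash_sutcliffe(obs, sim), rmse(obs, sim))
```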

As an illustration, consider our example: the TETIS model includes an automatic calibration module based on the SCE-UA algorithm [184, 185], which was used to calibrate all nine correction factors and the four initial values of the state variables. The model calibration results, using the RMSE as the objective function, for a flood event in October 1981 measured at the outlet of the basin (station Q01) are shown in Fig. 1.23 (left). It is worth noting that the basin shows a typical Hortonian behavior and that the flood event occurred under very dry initial conditions. The resulting Nash-Sutcliffe efficiency coefficient was E = 0.98, which can be considered a very good performance [183]. As stated above, the calibrated model must be capable of properly reproducing the dominant processes in the basin for events other than those used for calibration (temporal validation), for events occurring at other sites in the basin (spatial validation), or, better, both (temporal and spatial validation) for a more robust model test. For example, Fig. 1.23 (right) shows the flood output estimated for an upstream site for a storm event that occurred in September 1983, for which the efficiency coefficient is E = 0.87. As expected, the value of E for validation is smaller than that for calibration (0.98), but a decrease smaller than 0.2 is generally judged acceptable.

Fig. 1.23
figure 23

Calibration of the TETIS model for Goodwin Creek at the outlet of the basin for a storm event in October 1981 (left) and validation at an upstream station for a storm event that occurred in September 1983 (right)

The literature abounds with optimization procedures for automatic calibration of hydrologic and water quality models by minimizing appropriate objective functions that reflect the magnitude of the modeling error. While several studies have demonstrated the importance of choosing a formal and statistically correct objective function for proper calibration of hydrologic models (e.g., [186–188]), others have argued that such measures may not exist (e.g., [151, 181, 189]). A major limitation of the classical calibration procedure is that it may not be possible to identify a unique set of model parameters that simultaneously minimizes all objective functions corresponding to all model outputs. Thus, the use of multi-objective optimization algorithms has gained wide acceptance in recent years (e.g., [190–193]). Multi-objective approaches are particularly suitable for multisite, multivariable calibration of watershed models, where minimization of all errors associated with estimated fluxes of water, sediments, and chemicals at multiple outlets within the watershed is desired.

6.4 Sensitivity Analysis

A model sensitivity analysis can be helpful in understanding which model inputs, parameters, and initial conditions are most influential. Additional care must be taken when estimating the most influential model parameters, and data collection efforts that support the modeling study may focus on obtaining better data for these parameters. In order to eliminate the effect of an influential initial condition, one can do one of three things: simulate a sufficiently long "warm-up" period (usually months for aquifer initial conditions, weeks for soil moisture, and days or hours for river channel discharges, depending on the size and particulars of the watershed at hand), use as the initial state the state at a similar time within the simulated period, or calibrate the initial condition.

The sensitivity analysis can also identify potential limitations of the model. If a model is not sensitive to parameters that are to be varied in testing the project objectives or hypotheses, a different model may need to be selected; alternatively, the mathematical model may be improved by removing the non-influential parameters and the corresponding processes. As stated above, models are abstractions of the systems they simulate and therefore typically represent system components with varying levels of detail. For example, the scientific literature may indicate that differences in tillage practices influence pesticide losses in surface runoff. In such a case, using a model that is not sensitive to tillage to examine the impact of switching from conventional to conservation tillage on pesticide losses in surface runoff would be inappropriate. Sensitivity analysis can be done locally (i.e., without interactions between the analyzed inputs, parameters, and initial conditions) or, better, by a generalized sensitivity analysis (GSA) using Monte Carlo simulations, as sketched below (see, e.g., [194]). Generally, it is worth retaining only the behavioral simulations [146, 151, 195, 196].
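A minimal sketch of such a Monte Carlo GSA is given below for a toy linear-reservoir model with synthetic observations: sample the prior parameter range, keep the behavioral runs (here E > 0.75, as in the case study), and inspect the behavioral parameter range. All numbers are hypothetical.

```python
# Monte Carlo generalized sensitivity analysis on a toy one-parameter model:
# a tightly constrained behavioral range indicates an influential parameter.
import numpy as np

rng = np.random.default_rng(42)
u = np.array([0, 10, 30, 5, 0, 0, 0, 0], dtype=float)  # hypothetical input

def run_model(k):
    x, y = 0.0, []
    for ut in u:
        x += ut - k * x          # linear reservoir, explicit Euler
        y.append(k * x)
    return np.array(y)

obs = run_model(0.30)            # synthetic "observations" (true k = 0.30)

def nse(sim):
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

k_samples = rng.uniform(0.05, 0.9, 2000)   # uniform prior on k
behavioral = np.array([k for k in k_samples if nse(run_model(k)) > 0.75])
print(len(behavioral), behavioral.min(), behavioral.max())
```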

The literature and model documentation are often excellent sources of information on model sensitivity. For example, Muttiah and Wurbs [197] identified the sensitivity of SWAT (Soil and Water Assessment Tool) to various parameters. However, it may be necessary to conduct a sensitivity analysis for the study watershed if its conditions are significantly different from those of the model sensitivity analyses reported in the literature, since model sensitivity may be specific to the model setup. Thus, limited data for parameterizing the model may need to be collected prior to conducting a sensitivity analysis. Generally, the sensitivity analysis should be completed using an uncalibrated model setup, since the influential parameters and those with the greatest uncertainty are typically the ones adjusted during model calibration. For example, Spruill et al. [198] conducted a SWAT sensitivity analysis to evaluate parameters that were thought to influence stream discharge predictions. During calibration, the average absolute deviation between observed and simulated streamflows was minimized and used to identify optimum values or ranges for each parameter. In our case study (Sect. 6.2), a GSA was performed, and the most influential correction factors were detected using a Nash-Sutcliffe efficiency coefficient of 0.75 as the behavioral threshold. Figure 1.24 shows two extreme cases: the correction factors for the static tank (storage) capacity and for the overland flow velocity. The static tank, which is the sink (during a flood event) of the infiltration when soil moisture is below field capacity, is a very influential component; on the other hand, the discharge at the basin outlet is not sensitive to the propagation of overland flow within the hillslopes. That is, maximum attention must be paid to the estimation of the static tank storage capacity, whereas a rough estimate may be enough for the hillslope velocities.

Fig. 1.24
figure 24

Sensitivity analyses of two TETIS model correction factors for the Goodwin Creek basin. The behavioral parameter sets were detected where E > 0.75

6.5 Uncertainty Analysis

Any modeling process will necessarily entail a number of uncertainties arising from data, model abstractions, and the natural heterogeneity of watersheds. To these, we can add the uncertainty related to the decision-making process. The National Research Council report "Assessing the TMDL Approach to Water Quality Management" emphasizes that modeling uncertainty should be rigorously and explicitly addressed in the development and application of models for environmental management, especially when stakeholders are affected by decisions contingent upon model-supported analyses [199].

Uncertainties from the various model components illustrated in Fig. 1.17 can propagate forward into model predictions of state and output variables. There are three primary types of uncertainty in watershed modeling: parameter uncertainty, structural uncertainty, and data uncertainty (e.g., [188, 200]). Parameter uncertainty arises from errors in estimating the model parameters (θ). Structural uncertainty results from the incomplete representation of the real system by the functional relationships f and g or from the numerical schemes employed to solve the system equations [(1.55) and (1.56)]. Data uncertainties are associated with errors in the system inputs (u) and outputs (y) and may consist of random and systematic errors (e.g., instrumentation and human errors) and errors arising from the discrepancy between the scales of modeling outputs and observations.

Uncertainties associated with watershed modeling can be addressed using three types of methods: (1) behavioral, (2) analytical, and (3) sampling-based methods. Behavioral methods are based on human judgment and experience and are carried out by asking experts in the problem area to provide their best assessment of the probability of a particular outcome. This approach should be a last resort for addressing model uncertainty and should be used only in the absence of statistical methods [201]. Unlike behavioral methods, analytical methods based on the method of moments provide a quantitative estimate of model uncertainty [202]. The output function of the model is expanded in a series (such as a Taylor series), and the first-order, quadratic, or higher-order terms of the series are retained for computing the moments. For example, the first-order variance propagation method is based on the first-order approximation of the Taylor series, and the first two moments are used to compute the variance of the model output. The first-order reliability method (FORM) and the first-order second moment (FOSM) method are among the popular analytical methods for determining model uncertainty (e.g., [203]). The prerequisite of analytical methods is that the solution of the differential equation (1.55) must be obtained analytically. In the context of watershed modeling, this prerequisite is a formidable barrier to the wider use of analytical methods because analytical solutions to highly nonlinear systems of equations are rarely available.
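Where the output can be evaluated around the mean parameter values, first-order propagation is straightforward; the sketch below approximates the output sensitivities by finite differences for a hypothetical two-parameter model with independent parameter variances.

```python
# A minimal sketch of first-order (FOSM-style) variance propagation:
# Var(y) ~ sum_i (dy/dtheta_i)^2 * Var(theta_i), assuming independent
# parameters. The model y = g(theta) and all numbers are hypothetical.
import numpy as np

def model(theta):
    k, c = theta
    return c * (1.0 - np.exp(-k))          # scalar output y = g(theta)

mu = np.array([0.5, 10.0])                  # parameter means
var = np.array([0.01, 4.0])                 # parameter variances

eps = 1e-6
grad = np.array([(model(mu + eps * e) - model(mu)) / eps
                 for e in np.eye(2)])       # finite-difference dy/dtheta
var_y = np.sum(grad ** 2 * var)             # first-order output variance
print(model(mu), var_y)
```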

Sampling-based uncertainty analysis methods are commonly used in watershed modeling; instead of analytical solutions, a probability distribution of the model output is generated from multiple realizations of the parameter space (e.g., [203]). Sample statistics are used to compute the first and second moments, and the relative importance of model parameters with regard to variations of the model output can also be determined. The most common sampling technique for deriving distributions of model outputs is Monte Carlo simulation [204], which has also been the basis for more sophisticated sampling methods. In particular, various Markov chain Monte Carlo (MCMC) methods have been developed to deal with input, parameter, and model structural uncertainties in hydrologic prediction (e.g., [188, 200, 205, 206]).
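The basic Monte Carlo recipe is compact, as the sketch below shows for the same hypothetical model used above: draw parameter realizations, evaluate the model for each, and summarize the output distribution with sample moments and quantiles.

```python
# A minimal sketch of sampling-based uncertainty propagation. The parameter
# distributions and the toy model are hypothetical.
import numpy as np

rng = np.random.default_rng(7)

def model(k, c):
    return c * (1.0 - np.exp(-k))            # vectorized toy output

k = rng.normal(0.5, 0.1, 10_000)             # parameter realizations
c = rng.normal(10.0, 2.0, 10_000)
y = model(k, c)

print(y.mean(), y.std())                     # first two sample moments
print(np.percentile(y, [5, 95]))             # 90 % uncertainty band
```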

In addition, from a decision maker's point of view, it is necessary to integrate all sources of uncertainty in the form of the model predictive uncertainty (PU). PU is the probability of any actual value (observed or not) conditional on all available information and knowledge [207]. Figure 1.25 illustrates this important concept, showing the relationship between observations (real output) and simulations for the implemented model (i.e., fixed model and parameters); PU is the conditional probability given the simulated output. From the model user's perspective, what we want is to understand where reality (the real output) lies given the simulations. Several PU algorithms have been developed, such as the hydrologic uncertainty processor [207], Bayesian model averaging [208, 209], the model conditional processor (MCP) [210], and quantile regression [211]. Moreover, all these techniques can be used for uncertainty reduction by combining more than one model.

Fig. 1.25
figure 25

Predictive uncertainty of the real output, i.e., the conditional probability given the simulations for the Goodwin Creek outlet (adapted from ref. 212)

We have applied the MCP to obtain the PU of the simulations in our case study based on the TETIS model. MCP is a Bayesian method based on the estimation of the joint distribution function of observations and simulations. To estimate the parameters of the PU statistical model, a calibration period covering the maximum range of possible outcomes is needed in order to reduce extrapolation of the estimated distributions and correlation functions. Figure 1.26 shows the estimated 90 % band (i.e., the 5 and 95 % quantiles) obtained with this method for the simulations of two storm events that occurred in May 1983 (a larger and a smaller storm event compared with that used for calibration). In this case, it is clear that no-flow periods are predicted with high reliability, flow peaks are predicted with acceptable reliability, and the largest PU occurs around discharges of 50 m3/s.

Fig. 1.26
figure 26

PU 90 % band for the simulations on May 1983 at the Goodwin Creek outlet flow gauge station

7 Risk and Uncertainty Analyses in Hydrology

Statistical concepts and methods are routinely utilized to approach a number of problems in hydrology and water resources. This is because most, if not all, hydrologic processes have some degree of randomness and uncertainty. For example, annual precipitation over a basin is a random occurrence that is generally described by probability laws. Another example is the random occurrence of annual maximum floods at a given cross section of a stream. Thus, concepts of risk and uncertainty are commonly utilized for the planning and management of hydraulic structures such as spillways and dikes. This section starts with a brief, elementary review of some basic concepts of probability and statistics. Then frequency analysis of hydrologic variables is presented using nonparametric and parametric methods and models. The concepts of risk, vulnerability, uncertainty, and regional analysis are discussed primarily in connection with flood-related structures. In addition, the concepts and applications of stochastic techniques, particularly streamflow simulation and forecasting, are discussed. The section ends with a summary of what has been done on the issue of nonstationarity.

7.1 Introduction

Many of the problems that we face in planning and management of water resources and environmental systems involve some degree of uncertainty. For example, the occurrences of multiyear droughts or of yearly maximum floods are random events that must be approached using probability theory, statistics, and stochastic methods. Often in characterizing random events, the concept of a random variable is utilized. For example, if X is a random variable, it means that it is governed by a certain probability law (we will also call it a model) which can be represented by a probability density function (PDF) \( f_X(x,\underline{\theta}) \) or a cumulative distribution function (CDF) \( F_X(x,\underline{\theta}) \), where \( \underline{\theta}=\{\theta_1,\dots,\theta_m\} \) is the parameter set (population parameters) and m is the number of parameters of the model. For brevity we will also use the notation \( f_X(x) \) and \( F_X(x) \), but it will be understood that they include the parameter set \( \underline{\theta} \). It can be shown that the population moments of the random variable X, say the expected value \( E(X)=\mu_X \) or the variance \( \mathrm{Var}(X)=\sigma_X^2 \), are functions of the parameter set \( \underline{\theta} \), and they are constant values (they are not random variables).

It is also convenient to recall the concept of a random sample. A random sample can be represented by \( X_1,\dots,X_N \), where all the Xs have the same distribution \( f_X(x) \) (i.e., the same population mean μ and variance σ2). Sample moments are functions of the random sample; for example, the sample mean \( \widehat{\mu}_X=\overline{X}=(1/N)\sum_{i=1}^N X_i \) and the sample variance \( \widehat{\sigma}_X^2=S^2=[1/(N-1)]\sum_{i=1}^N (X_i-\overline{X})^2 \) are the first and second sample moments. Since they are functions of the random sample, they are also random variables, and as such they also have moments. For instance, it may be shown that the expected values of \( \widehat{\mu}_X \) and \( \widehat{\sigma}_X^2 \) are μ and σ2, respectively. Likewise, the variance of \( \widehat{\mu}_X \) is equal to σ2/N, as illustrated numerically below. In addition, we could also refer to a random sample as the set \( x_1,\dots,x_N \), where \( x_i \) represents a particular value of the random variable X. We could then define the sample moments as above using the same equations (e.g., \( \widehat{\mu}_x=\overline{x}=(1/N)\sum_{i=1}^N x_i \)), but the big difference is that \( \overline{x} \) is not a random variable but a given quantity that depends on the values \( x_1,\dots,x_N \) (while \( \overline{X} \) is a random variable, as noted above).
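These statements are easy to verify numerically; the sketch below draws many samples of size N from a hypothetical normal population and checks that the sample means cluster around μ with variance σ2/N.

```python
# A numerical check that E[sample mean] = mu and Var[sample mean] = sigma^2/N,
# assuming a hypothetical normal population.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, N = 100.0, 20.0, 25
means = rng.normal(mu, sigma, size=(50_000, N)).mean(axis=1)
print(means.mean())   # close to mu = 100
print(means.var())    # close to sigma^2 / N = 400 / 25 = 16
```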

Furthermore, assuming that we have a random sample \( X_1,\dots,X_N \) from a known model \( f_X(x,\underline{\theta}) \) with unknown parameter set \( \underline{\theta} \), one can estimate \( \underline{\theta} \) using various estimation methods such as the method of moments, probability-weighted moments, and maximum likelihood (e.g., [213]). Regardless of the estimation method, the estimator, say \( \widehat{\underline{\theta}} \), will be a function of the random sample, say \( \widehat{\underline{\theta}}=g_1(X_1,\dots,X_N) \). And the qth-quantile estimator \( \widehat{X}_q \) will be a function of the parameter set, i.e., \( \widehat{X}_q=g_2(\widehat{\theta}_1,\dots,\widehat{\theta}_m) \). Then \( \widehat{\underline{\theta}} \) and consequently \( \widehat{X}_q \) are random variables. Therefore, in applying these concepts to problems such as flood frequency analysis, one would often like to estimate the confidence limits of the population quantiles.

We can address some problems in engineering hydrology where probability laws and models can be directly applied for making risk-based design decisions. That is the case, for example, when we use probabilistic models for fitting the frequency distribution of annual floods and estimating the design flood to be used for designing the capacity of a spillway. We are able to do that because we assume that the sequence of annual floods is a random sample \( X_1,\dots,X_N \), i.e., there is no correlation among the Xs (they are uncorrelated). However, many data that we use in hydrology and water resources are autocorrelated, i.e., temporally dependent, and in such cases a direct application of a probability law \( f_X(x,\underline{\theta}) \) may not be enough. For example, monthly and annual streamflow data (mean flow or total volume) and daily precipitation data are generally autocorrelated. In these cases, additional concepts and different types of models are needed to represent the temporal and spatial variability of the data. Such models incorporate one or more terms linking the underlying variable with its past plus a random term, as in the case of single-site or univariate models, and possibly also linking it with variables at other sites, as in the case of multisite or multivariate models. These models fall in the category of stochastic models or time series models.

7.2 Frequency Analysis of Hydrologic Data

7.2.1 Empirical Frequency Analysis

Hydrologic data can be analyzed using nonparametric methods for determining the PDF and CDF. Let us assume that we have a random sample denoted by \( x_1,\dots,x_i,\dots,x_N \), where N is the sample size. For instance, \( x_i,\ i=1,\dots,N \) may be a sequence of maximum annual floods. The simplest procedure for estimating the empirical PDF is to arrange the data from the smallest to the largest value, say \( x_1,\dots,x_i,\dots,x_N \), such that \( x_1 \) is the minimum and \( x_N \) is the maximum. The range of the data is then subdivided into classes \( j=1,\dots,N_c \), with \( N_c \) the number of classes. Next assume that the class width is Δx and the number of observations that fall in class j is \( N_j \). Then the relative frequency corresponding to class j is \( N_j/N \). The plot of \( N_j/N \) against the class mark (the midpoint of the class) is the typical histogram, and the empirical PDF (estimate of the population PDF) is given by \( f(j)=N_j/(N\,\Delta x) \). Additional details regarding the criteria for selecting \( N_c \) and Δx can be found in standard books (e.g., [214–216]). In addition, kernel density estimates (KDE) may be useful in cases where a smooth density is needed across the range of the data set (rather than point estimates for classes). For example, KDE has been useful for identifying bimodality in frequency distributions (e.g., [217]). Whether using f(j) or KDE, the empirical CDF F(j) can be found by integration.

Also, the empirical CDF may be determined based on so-called plotting position formulas as follows: (1) Arrange the data \( x_1,\dots,x_i,\dots,x_N \) in either increasing or decreasing order of magnitude (for simplicity we will assume throughout this section that the data are arranged in increasing order). As above, denote the arranged sequence by \( x_1,\dots,x_i,\dots,x_N \), where \( x_1 \) is the minimum and \( x_N \) is the maximum. (2) Assign a probability \( P(X\le x_i) \) to each value \( x_i \) using a plotting position formula. Several formulas have been suggested for this purpose (Table 1.7 gives some examples). The most widely used formula in practice is the Weibull plotting position formula, i.e., \( F(x_i)=P(X\le x_i)=i/(N+1) \). The formula gives the non-exceedance probability, i.e., the probability that the random variable X is less than or equal to the value \( x_i \) (the value corresponding to rank i in the arranged sample). The exceedance probability is then \( P(x_i)=1-F(x_i)=P(X>x_i)=1-i/(N+1) \).

Table 1.7 Examples of plotting position formulas typically used in hydrology [213]

In addition, the concept of return period or recurrence interval is widely used in engineering practice. For events defined in the upper probability scale (generally events related to maximum quantities such as floods), the return period is equal to one divided by the exceedance probability, i.e., the empirical estimate of the return period is \( T(x_i)=1/P(x_i) \). On the other hand, where hydrologic events are defined in the lower probability scale (such as for minimum flows), the return period is given by \( T(x_i)=1/F(x_i) \). The empirical CDF is sometimes plotted on probability paper. A probability paper designed for a given model has the probability scale distorted so that the CDF of the model plots as a straight line. The most popular and useful probability papers are the normal, lognormal, and Gumbel papers. The example below further illustrates the method.
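The procedure amounts to a sort and two arithmetic operations, as the following sketch shows for a hypothetical series of annual maxima.

```python
# Empirical CDF via the Weibull plotting position: F(x_(i)) = i/(N+1),
# exceedance P = 1 - F, and return period T = 1/P for annual maxima.
# The flood values are hypothetical.
import numpy as np

floods = np.sort(np.array([12.0, 30.5, 18.2, 25.1, 9.7, 21.4]))
N = len(floods)
rank = np.arange(1, N + 1)
F = rank / (N + 1.0)         # non-exceedance probability
P = 1.0 - F                  # exceedance probability
T = 1.0 / P                  # return period (years)
for x, f, t in zip(floods, F, T):
    print(f"{x:6.1f}  F={f:.3f}  T={t:5.2f}")
```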

The empirical CDF for the maximum annual floods of the St. Mary's River at Stillwater, Canada, will be determined based on the flood data available for the period 1916–1939, shown in the first two columns of Table 1.8. The original data have been ordered from the smallest (8,040) to the largest value (20,100), as shown in columns 4 and 5 of Table 1.8. Using the Weibull plotting position formula, the non-exceedance and exceedance probabilities are calculated as shown in columns 6 and 7, and the return period is listed in column 8. The empirical CDF is plotted in Fig. 1.27. Based on the empirical distribution, one can make probability statements about the possible occurrence of certain flood events. For example, from Table 1.8 (columns 5 and 6), one can write P(X ≤ 17,200) = 80 %, which is the probability that annual floods at the St. Mary's River will be less than or equal to 17,200. Conversely, P(X > 17,200) = 20 % is the exceedance probability, and T(17,200) = 5 years is the corresponding return period. Clearly, relevant probability information can be obtained from the empirical CDF. However, such information is rather limited because many design problems require estimating flood quantiles for specified return periods, or return periods for given flood magnitudes, that are beyond the values obtainable from an empirical frequency analysis such as that shown in Table 1.8. Section 7.2.2 below shows how probabilistic models can enhance the frequency analysis of hydrologic data.

Table 1.8 Empirical CDF for the St. Mary’s River annual flood data
Fig. 1.27
figure 27

Comparison of the empirical and fitted normal and lognormal CDFs for the annual flood data of the St. Mary’s River

7.2.2 Frequency Analysis Based on Probabilistic Models

Probability models such as the normal, lognormal, gamma (Pearson), log-gamma (log-Pearson), and generalized extreme value (GEV) distributions have been widely used for fitting the distribution of hydrologic data. From experience, the type of data may suggest applying or discarding one or more candidate models for the data at hand. For example, extreme flood or extreme precipitation data are generally skewed, and for this reason the normal distribution would not be suitable for such data. Generally, more than one distribution may fit the empirical data reasonably well, although significant differences often result when extrapolating the fitted distributions beyond the range of the empirical data. While fitting a particular model has become a simple task, the difficulty lies in selecting the model to be used for making design or management decisions [213]. However, in many countries and regions of the world, guidelines and manuals have been developed suggesting a particular distribution for a certain type of hydrologic data. For example, Bulletin 17B [218] is a manual that recommends the log-Pearson III distribution for flood frequency analysis in the United States of America.

In this section, we describe only four distributions, namely, the normal, lognormal, log-Pearson III, and Gumbel distributions (the last being a particular case of the GEV distribution). The fitting method, i.e., parameter estimation, will be illustrated using the method of moments only. The reader should be aware, though, that several alternative estimation methods exist in the literature, some of them more efficient than the method of moments for certain distributions. Likewise, statistical tests such as the Smirnov-Kolmogorov test are available to help judge the goodness of fit of a particular model. For additional information on alternative probabilistic models, parameter estimation methods, testing techniques, and the evaluation of uncertainties, the reader is referred to well-known references (e.g., [213, 219]).

7.2.2.1 Normal Distribution

The normal distribution is a benchmark distribution not only for hydrology but for many other fields as well. The PDF is given by

$$ {f}_X(x)=\frac{1}{\sqrt{2\pi}\sigma } \exp \left[-\frac{1}{2}{\left(\frac{x-\mu }{\sigma}\right)}^2\right],\kern3em -\infty <x<\infty, $$
(1.57)

where μ and σ are the model parameters. The plot of the PDF f(x) vs. x is centered around μ and has a symmetric bell shape. Certain properties of the normal distribution are useful. For instance, it may be shown that the population mean, variance, and skewness coefficient of the normal variable X are E(X) = μ, Var(X) = σ2, and γ(X) = 0, respectively. The normal random variable X can be standardized as

$$ Z=\left(X-\mu \right)/\sigma, $$
(1.58)

where Z is known as the standard normal variable and has mean 0 and variance 1. A typical problem of practical interest is determining the cumulative probability for a specified value x of the normal variable X. This can be obtained from the cumulative distribution function (CDF). The CDF of X, i.e., \( F_X(x) \), can be found by integrating the density function f(x) in (1.57) from −∞ to x. Mathematically this can be expressed as

$$ {F}_X(x)={\displaystyle \underset{-\infty }{\overset{x}{\int }}{f}_X(x) dx}={\displaystyle \underset{-\infty }{\overset{x}{\int }}\frac{1}{\sqrt{2\pi}\sigma } \exp \left[-\frac{1}{2}{\left(\frac{x-\mu }{\sigma}\right)}^2\right] dx}. $$

Unfortunately, one cannot integrate the normal density in closed form, so numerical integration or tables must be used to find \( F_X(x) \). In practice, tables and numerical approximations are available in terms of the standardized variable Z. Thus, the following relationship is useful for practical applications of the normal distribution:

$$ {F}_X(x)={F}_Z(z)=\varPhi (z), $$
(1.59)

in which z = (x − μ)/σ and Φ(z) denotes the CDF of the standard normal variable. In other words, the CDF of X can be found from the CDF of Z. Tables relating Φ(z) to z can be found in any standard statistics book (e.g., [220]). Likewise, another problem of interest is, given the value of the non-exceedance probability q, i.e., given \( F_X(x_q)=q \), to find the qth quantile \( x_q \). It may be shown that \( x_q \) can be obtained as a function of \( z_q \), the qth quantile of the standard normal distribution, as

$$ {x}_q=\mu +\sigma {z}_q. $$
(1.60)

Also note that both the CDF F(x) and the quantile \( x_q \) can be computed using statistical software packages or Excel, as illustrated in the sketch following (1.62).

The estimation of the parameters of the normal distribution can be made by the method of moments. They are

$$ {\widehat{\mu}}_X=\overline{x}=\frac{1}{N}{\displaystyle \sum_{i=1}^N{x}_i} $$
(1.61)

and

$$ {\widehat{\sigma}}_X={s}_x=\sqrt{\frac{1}{N-1}{\displaystyle \sum_{i=1}^N{\left({x}_i-\overline{x}\right)}^2}}, $$
(1.62)

where \( \overline{x} \) and \( s_x \) are the sample mean and standard deviation, respectively.
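A minimal sketch of the moment fit and quantile computation, eqs. (1.60)–(1.62), using scipy's standard normal for \( z_q \); the data vector is hypothetical.

```python
# Fit the normal model by moments and compute a quantile and a CDF value.
import numpy as np
from scipy.stats import norm

x = np.array([11200, 9800, 14500, 13100, 16800, 12300, 10900, 15200.0])
mu_hat = x.mean()                        # eq. (1.61)
sigma_hat = x.std(ddof=1)                # eq. (1.62)

q = 0.99
x_q = mu_hat + sigma_hat * norm.ppf(q)   # eq. (1.60): x_q = mu + sigma * z_q
print(mu_hat, sigma_hat, x_q)
print(norm.cdf((14000 - mu_hat) / sigma_hat))  # F_X(14000) via eq. (1.59)
```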

7.2.2.2 Lognormal Distribution

The lognormal distribution has been quite useful in the field of hydrology because it is a skewed distribution and is related to the normal distribution. Let us consider a lognormally distributed random variable X with parameters \( x_0 \), \( \mu_Y \), and \( \sigma_Y \). It may be shown that if X is lognormally distributed with these parameters, then \( Y={\log}_a(X-x_0) \) (where \( x_0 \) is a lower bound) or \( Y={\log}_a(x_0-X) \) (where \( x_0 \) is an upper bound) is normal with parameters \( \mu_Y \) and \( \sigma_Y \) (note that a is the base of the logarithms; bases e and 10 are commonly used). The PDF of the three-parameter lognormal distribution is defined as

$$ {f}_X(x)=\frac{k}{\sqrt{2\pi}\left(x-{x}_0\right){\sigma}_Y} \exp \left[-\frac{1}{2}{\left(\frac{{ \log}_a\left(x-{x}_0\right)-{\mu}_Y}{\sigma_Y}\right)}^2\right],\kern3em \mathrm{for}\kern2em {x}_0<x<\infty $$
(1.63a)

or

$$ {f}_X(x)=\frac{k}{\sqrt{2\pi}\left({x}_0-x\right){\sigma}_Y} \exp \left[-\frac{1}{2}{\left(\frac{{ \log}_a\left({x}_0-x\right)-{\mu}_Y}{\sigma_Y}\right)}^2\right],\kern3em \mathrm{for}\kern2em -\infty <x<{x}_0, $$
(1.63b)

where k = 1 if a = e and k = log10(e) = 0.4343 if a = 10. In particular, if x 0 = 0, the model becomes the two-parameter lognormal distribution. As for the normal distribution, it is not possible to integrate the lognormal density in closed form. Therefore, the following relations are useful for computations:

$$ {F}_X(x)=\varPhi \left[\frac{{ \log}_a\left(x-{x}_0\right)-{\mu}_Y}{\sigma_Y}\right],\kern3em \mathrm{for}\kern2em {x}_0<x<\infty $$
(1.64a)

or

$$ {F}_X(x)=1-\varPhi \left[\frac{{ \log}_a\left({x}_0-x\right)-{\mu}_Y}{\sigma_Y}\right],\kern2em \mathrm{for}\kern2em -\infty <x<{x}_0 $$
(1.64b)

which give the CDF of X as a function of the CDF of the standardized normal. Likewise

$$ {x}_q={x}_0+{ \exp}_a\left({\mu}_Y+{\sigma}_Y{z}_q\right),\kern2em \mathrm{for}\kern2em {x}_0<x<\infty $$
(1.65a)

or

$$ {x}_q={x}_0-{ \exp}_a\left({\mu}_Y+{\sigma}_Y{z}_{1-q}\right),\kern2em \mathrm{for}\kern2em -\infty <x<{x}_0 $$
(1.65b)

give the qth quantile of X as a function of the qth or (1 − q)th quantile of the standard normal.

Parameter estimation for the lognormal distribution can be made as follows. An efficient estimator of x 0 is [213]

$$ {\widehat{x}}_0=\frac{x_{\min }{x}_{\max }-{x}_{\mathrm{med}}^2}{x_{\min }+{x}_{\max }-2{x}_{\mathrm{med}}} $$
(1.66)

where \( x_{\min} \), \( x_{\max} \), and \( x_{\mathrm{med}} \) are the sample minimum, maximum, and median, respectively. If \( x_{\min}+x_{\max}-2x_{\mathrm{med}}>0 \), \( \widehat{x}_0 \) is a lower bound, whereas if \( x_{\min}+x_{\max}-2x_{\mathrm{med}}<0 \), \( \widehat{x}_0 \) is an upper bound. Once \( x_0 \) is estimated, the parameters \( \mu_Y \) and \( \sigma_Y \) may be estimated by

$$ {\widehat{\mu}}_Y=\overline{y}=\frac{1}{N}{\displaystyle \sum_{i=1}^N{ \log}_a\left({x}_i-{\widehat{x}}_0\right)},\kern3em \mathrm{for}\kern2em {x}_0<x<\infty $$
(1.67a)

or

$$ {\widehat{\mu}}_Y=\overline{y}=\frac{1}{N}{\displaystyle \sum_{i=1}^N{ \log}_a\left({\widehat{x}}_0-{x}_i\right)},\kern3em \mathrm{for}\kern2em -\infty <x<{x}_0 $$
(1.67b)

and

$$ {\widehat{\sigma}}_Y={s}_y=\sqrt{\frac{1}{N-1}{\displaystyle \sum_{i=1}^N{\left[{ \log}_a\left({x}_i-{\widehat{x}}_0\right)-\overline{y}\right]}^2}},\kern3em \mathrm{for}\kern2em {x}_0<x<\infty $$
(1.68a)

or

$$ {\widehat{\sigma}}_Y={s}_y=\sqrt{\frac{1}{N-1}{\displaystyle \sum_{i=1}^N{\left[{ \log}_a\left({\widehat{x}}_0-{x}_i\right)-\overline{y}\right]}^2}},\kern3em \mathrm{for}\kern2em -\infty <x<{x}_0 $$
(1.68b)

and \( \overline{y} \) and \( s_y \) are, respectively, the sample mean and standard deviation in the log domain.
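The sketch below implements this estimation sequence, eqs. (1.66)–(1.68a), for a hypothetical sample with a lower bound, using base-10 logarithms.

```python
# Lognormal-3 moment fitting: estimate the bound x0 from eq. (1.66), then the
# log-domain mean and standard deviation from eqs. (1.67a) and (1.68a).
# The data are hypothetical and happen to yield a lower bound.
import numpy as np

x = np.array([5200, 6100, 7400, 8000, 9300, 10800, 12500, 15600.0])
xmin, xmax, xmed = x.min(), x.max(), np.median(x)
x0 = (xmin * xmax - xmed**2) / (xmin + xmax - 2 * xmed)   # eq. (1.66)
if xmin + xmax - 2 * xmed > 0:                            # lower-bound case
    y = np.log10(x - x0)
    mu_y, s_y = y.mean(), y.std(ddof=1)                   # eqs. (1.67a), (1.68a)
    print(x0, mu_y, s_y)
```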

For the same flood data of Table 1.8 above, the normal and lognormal models are fitted. Table 1.8 gives the sample mean, standard deviation, and skewness coefficient as \( \overline{x}=13,172.9 \), \( s_x=3,569.3 \), and \( g_x=0.271 \), respectively. Thus, from (1.61) and (1.62), the parameters of the normal distribution are \( \widehat{\mu}_x=\overline{x}=13,172.9 \) and \( \widehat{\sigma}_x=s_x=3,569.3 \). The PDF and CDF are obtained from (1.57) and (1.59) using the mathematical functions available in Excel. Table 1.9 below shows a sample of the results obtained. In addition, Fig. 1.27 compares the fitted normal and empirical CDFs, and Fig. 1.28 shows the fitted normal PDF and CDF. A lognormal-2 model (i.e., with \( x_0=0 \)) is also fitted. Table 1.8 (column 3) gives the base-10 logarithms of the data and the mean, standard deviation, and skewness coefficient of the logarithms as \( \overline{y}=4.104 \), \( s_y=0.121 \), and \( g_y=-0.174 \), respectively. The lognormal-2 model parameters are then estimated from (1.67a) and (1.68a), which give \( \widehat{\mu}_Y=\overline{y}=4.104 \) and \( \widehat{\sigma}_Y=s_y=0.121 \). The corresponding fitted lognormal PDF is obtained from (1.63a) with k = 0.4343 and \( x_0=0 \), and the fitted CDF from (1.64a), using the mathematical functions available in Excel. Table 1.9 shows the results obtained for a range of x values varying from 0 to 26,000. Figure 1.27 compares the normal, lognormal, and empirical CDFs. Because the skewness of the data is small, no major differences are seen between the CDFs. Also, Fig. 1.28 compares the lognormal-2 PDF and CDF against those of the normal model. One may observe that while the normal PDF is symmetric, that of the lognormal model is slightly skewed to the right (because of the positive skewness coefficient).

Table 1.9 PDF and CDF for the normal and lognormal models fitted to the annual flood data of the St. Mary’s River
Fig. 1.28
figure 28

PDF and CDF for the normal and lognormal models fitted to the annual flood data of the St. Mary’s River

7.2.2.3 Log-Pearson III Distribution

The log-Pearson type III distribution has been widely applied in hydrology, in particular for fitting the frequency distribution of extreme hydrologic data such as annual flood data. The US IACWD [218] recommended the use of the log-Pearson type III distribution in an attempt to promote a uniform and consistent approach to flood frequency studies. As a result, this distribution has become quite popular in the United States.

The probability density function of the log-Pearson type III distribution may be written as (e.g., [216, 221])

$$ {f}_X(x)=\frac{k}{\alpha \varGamma \left(\beta \right)x}{\left[\frac{{ \log}_a(x)-{y}_0}{\alpha}\right]}^{\beta -1} \exp \left[-\frac{{ \log}_a(x)-{y}_0}{\alpha}\right], $$
(1.69)

where α, β, and \( y_0 \) are the parameters and Γ(β) denotes the complete gamma function. The variable Y is a log-transform of X, i.e., \( Y={\log}_a(X) \), which implies that if X is log-Pearson III distributed with parameters α, β, and \( y_0 \), then Y is gamma distributed with the same parameter set. Thus, the parameters α and \( y_0 \) are expressed in the log domain. Also, β > 0, while α and \( y_0 \) may be either positive or negative. If α > 0, f(x) is positively skewed and varies in the range \( {\exp}_a(y_0)\le x<\infty \). On the other hand, if α < 0, f(x) is either positively or negatively skewed depending on the values of α and β, and f(x) varies in the range \( -\infty<x\le {\exp}_a(y_0) \). The CDF and the quantile (for a given non-exceedance probability) cannot be expressed explicitly as in the case of the normal and lognormal models. Therefore, tables or numerical approximations are necessary for their computation.

The following relationships are important for parameter estimation. It may be shown that if X is log-Pearson III distributed with parameters α, β, and \( {y}_0 \), the first three population moments of \( Y={ \log}_a(X) \) are

$$ E(Y)={\mu}_Y={y}_0+\alpha \beta $$
(1.70)
$$ Var(Y)={\sigma}_Y^2={\alpha}^2\beta $$
(1.71)

and

$$ {\gamma}_Y=\frac{2\alpha }{\left|\alpha \right|\sqrt{\beta }}. $$
(1.72)

Consider the random sample \( {x}_1,\dots, {x}_N \), where N = sample size. For fitting the log-Pearson III distribution, the original data are log-transformed, i.e., \( y={ \log}_a(x) \), and the new data set in the log domain is denoted as \( {y}_1,\dots, {y}_N \). Then, based on the moment relations (1.70)–(1.72), the log-Pearson III parameters can be estimated as

$$ \widehat{\beta}={\left(2/{g}_y\right)}^2 $$
(1.73)
$$ \widehat{\alpha}=\frac{s_y{g}_y}{2} $$
(1.74)
$$ {\widehat{y}}_0=\overline{y}-\widehat{\alpha}\widehat{\beta }, $$
(1.75)

where \( \overline{y} \), \( {s}_y \), and \( {g}_y \) are, respectively, the sample mean, standard deviation, and skewness coefficient of the logarithms of the data (the logs of the x’s).

The quantile (value of x) for a return period of T years (or equivalently for an exceedance probability p) can be obtained using the frequency factor of the gamma distribution as

$$ { \log}_a\left({x}_T\right)=\overline{y}+{K}_T{s}_y, $$
(1.76)

where \( {K}_T \) is the frequency factor for the gamma distribution and is a function of the skewness coefficient \( {\gamma}_Y \) and T. Appropriate tables that give \( {K}_T \) for a range of values of \( {\gamma}_Y \) and T can be found in the literature (e.g., [218]).

For the same flood data used above, we fit the log-Pearson III distribution. The mean, standard deviation, and skewness coefficient of the base-10 logarithms of the sample flood data are given in Table 1.8 (3rd column). They are \( \overline{y}=4.104 \), \( {s}_y=0.121 \), and \( {g}_y=-0.174 \), respectively. The parameters are estimated from (1.73) through (1.75), which give \( \widehat{\beta}=132.1 \), \( \widehat{\alpha}=-0.0105 \), and \( {\widehat{y}}_0=5.491 \), respectively. The flood quantiles are estimated from (1.76) using \( \overline{y}=4.104 \), \( {s}_y=0.121 \), \( {g}_y=-0.174 \), and 10 values of the frequency factor \( {K}_T \) that are taken from tables [218]. The results are shown in columns 1–6 of Table 1.10. For example, referring to the 8th row, we observe that \( {K}_T=1.9592 \) for q = 0.98 or T = 50. The value of \( {K}_T \) is obtained by interpolating between 1.94499 and 1.99973, the values that correspond to T = 50 and \( {g}_y \) = −0.20 and −0.10, respectively. Then (1.76) gives \( \log \left({x}_T\right)=4.104+1.9592\times 0.121=4.341 \) so that \( {x}_T=21,931.4 \). The values of x and q from Table 1.10 are plotted as shown in Fig. 1.29. Clearly the log-Pearson III model fits the empirical CDF reasonably well.

Table 1.10 Computations of flood quantiles using the log-Pearson III model fitted to the annual flood data of the St. Mary’s River
Fig. 1.29 Comparison of the fitted Gumbel and log-Pearson III CDFs versus the empirical CDF for the annual floods of the St. Mary’s River
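
The log-Pearson III computations above also lend themselves to a short script. The sketch below, again assuming SciPy, exploits the fact that scipy.stats.pearson3 is parameterized by the skew, mean (loc), and standard deviation (scale) of Y = log10(X), so its percent-point function plays the role of the frequency factor \( {K}_T \) in (1.76).

```python
from scipy import stats

my, sy, gy = 4.104, 0.121, -0.174   # moments of the base-10 logs (Table 1.8)

# Method-of-moments parameters, cf. (1.73)-(1.75)
beta = (2.0 / gy) ** 2              # about 132.1
alpha = sy * gy / 2.0               # about -0.0105
y0 = my - alpha * beta              # about 5.49

# Flood quantiles for selected return periods, cf. Table 1.10
for T in [2, 5, 10, 25, 50, 100]:
    q = 1.0 - 1.0 / T               # non-exceedance probability
    yT = stats.pearson3.ppf(q, gy, loc=my, scale=sy)
    print(T, round(10.0 ** yT, 1))  # T = 50 gives about 21,930 cfs
```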

7.2.2.4 Gumbel Distribution

The Gumbel distribution is a particular case of the GEV distribution, i.e., the type I GEV. It has been a popular model for fitting the frequency distribution of extreme natural events such as extreme floods and winds. The model has two parameters and a fixed skewness coefficient. A nice feature of the Gumbel distribution is that both the CDF and the quantile can be written in explicit mathematical forms; hence, its application is simple and does not require numerical approximations or tables. Several parameter estimation techniques such as the method of moments, probability-weighted moments, and maximum likelihood have been developed for the Gumbel distribution (e.g., [213, 216]). In this section, we include only the moment estimates.

The PDF and CDF of the Gumbel model are given, respectively, as

$$ {f}_X(x)=\frac{1}{\alpha } \exp \left\{-\frac{x-{x}_0}{\alpha }- \exp \left[-\frac{x-{x}_0}{\alpha}\right]\right\},\kern2em -\infty <x<\infty $$
(1.77)

and

$$ {F}_X(x)= \exp \left\{- \exp \left[-\frac{x-{x}_0}{\alpha}\right]\right\},\kern3em -\infty <x<\infty, $$
(1.78)

where \( {x}_0 \) is the location parameter (central value or mode) and α is the scale parameter. Because of the nature of the CDF, the Gumbel model is also known as the double exponential distribution. By taking logarithms twice in (1.78), one can write x as a function of F(x) = q as

$$ x={x}_0-\alpha \ln \left[- \ln q\right] $$
(1.79)

which can be used to obtain quantiles for specified values of the non-exceedance probability.

In addition, it may be shown that the first two population moments of the Gumbel distribution are

$$ E(X)=\mu ={x}_0+0.5772\;\alpha $$
(1.80)

and

$$ Var(X)={\sigma}^2=\left({\pi}^2/6\right){\alpha}^2=1.645{\alpha}^2. $$
(1.81)

Furthermore it may be shown that the skewness coefficient is γ = 1.1396. Equations (1.80) and (1.81) can be readily used to obtain the moment estimates of the parameters as

$$ \widehat{\alpha}=\frac{\sqrt{6}}{\pi }{s}_x=0.78{s}_x $$
(1.82)

and

$$ {\widehat{x}}_0=\overline{x}-0.5772\widehat{\alpha}, $$
(1.83)

in which \( \overline{x} \) and \( {s}_x \) are the sample mean and standard deviation.

For the same flood data used above, we fit the Gumbel model. The parameters are estimated from (1.82) and (1.83) based on the sample statistics \( \overline{x}=13,172.9 \) and \( {s}_x=3,569.3 \). The results are \( \widehat{\alpha}=2,784 \) and \( {\widehat{x}}_0=11,566 \). Then (1.77) and (1.78) are used to calculate the PDF and CDF, respectively, for values of x ranging from 6,000 to 22,000, as shown in columns 1–3 of Table 1.11. Also flood quantiles are estimated from (1.79) for specified values of the non-exceedance probability q ranging from 0.1 to 0.9999 (i.e., p ranging from 0.9 to 0.0001 or T ranging from 1.111 to 10,000, as shown in columns 4–7 of Table 1.11). The CDF is plotted in Fig. 1.29 next to the CDF of the log-Pearson III model and the empirical CDF.

Table 1.11 Gumbel model PDF and CDF for various values of the flood x and flood quantiles obtained for given values of non-exceedance probabilities
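
Because the Gumbel CDF and quantile function are explicit, the corresponding script is even simpler. The following sketch uses only NumPy and the sample statistics quoted in the text.

```python
import numpy as np

mx, sx = 13172.9, 3569.3            # sample mean and std (cfs)
alpha = np.sqrt(6.0) / np.pi * sx   # scale parameter, cf. (1.82); about 2,784
x0 = mx - 0.5772 * alpha            # location parameter, cf. (1.83); about 11,566

def gumbel_cdf(x):
    """Gumbel CDF, cf. (1.78)."""
    return np.exp(-np.exp(-(x - x0) / alpha))

def gumbel_quantile(q):
    """Flood quantile for non-exceedance probability q, cf. (1.79)."""
    return x0 - alpha * np.log(-np.log(q))

print(gumbel_quantile(0.98))        # the 50-year flood, about 22,400 cfs
```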

7.2.3 Risk and Reliability for Design

7.2.3.1 Design Flood and Design Life

We have seen in previous sections that annual floods are random variables that can be described by probability laws or probability distribution functions. Once a probability model is specified, one can determine a flood quantile for any non-exceedance (or exceedance) probability. Thus, for the models we have presented in Sect. 7.2.2 above, we have outlined the equations and procedures for estimating flood quantiles. For example, for the lognormal model, (1.65) can be used to determine the flood value x corresponding to a specified non-exceedance probability F(x) = q. Such a flood value (flood quantile) was denoted as \( {x}_q \). Also, since T = 1/(1 − q) = 1/p is the return period, such a flood quantile is commonly denoted as \( {x}_T \) and is called the T-year flood (note that sometimes the notation \( {x}_p \) is also used, which means the flood with exceedance probability p = 1 − q). For instance, referring to the lognormal model that was fitted to the annual flood data of the St. Mary’s River, the model parameters were found to be \( {\widehat{\mu}}_Y=4.104 \) and \( {\widehat{\sigma}}_Y=0.121 \). Then assuming q = 0.99 (i.e., p = 0.01 or T = 100), (1.65) for \( {x}_0=0 \) gives

$$ {\widehat{x}}_{0.99}={ \exp}_{10}\left(4.104+0.121{z}_{0.99}\right)={ \exp}_{10}\left(4.104+0.121\times 2.326\right)=24,291. $$

Thus, 24,291 cfs is the flood with 99 % non-exceedance probability, or the flood with 1 % exceedance probability (i.e., there is a 1 % chance that floods in the referred river will exceed 24,291 cfs in any given year), or the 100-year flood.
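
As a quick check, the same 100-year flood may be obtained with a one-line computation; the following is a sketch assuming SciPy, with the fitted parameters taken from the text.

```python
from scipy.stats import norm

my, sy = 4.104, 0.121
x_100 = 10.0 ** (my + sy * norm.ppf(0.99))   # 10^(4.104 + 0.121*2.326), about 24,291 cfs
print(round(x_100))
```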

In the context of designing hydraulic structures such as drainage systems and spillways, generally the return period T is specified depending on the type of structure to be designed (e.g., [46]), and the design flood is determined from the frequency analysis of flood data as referred to in Sect. 7.2.2. The design life of a hydraulic structure has an economic connotation. For purposes of defining the concept, a simple example follows. Suppose the designer of a small bridge selects 25 years as the return period and, after estimating the 25-year flood from frequency analysis, the estimated cost of the bridge is $50,000. To pay for the construction of the bridge, the designer goes to the bank to borrow the money. The bank officer tells her that they can lend her the money if it is paid off in no more than 10 years. Then 10 years becomes the design life. The banker in fact may ask some other technical questions or requirements before processing the loan. For example, the bank may like to know “what is the risk that two floods exceeding the design flood may occur during the first five years after the construction of the bridge?” (perhaps the reasoning being that if one flood exceeding the design flood, say a 30-year flood, occurs in the five-year period, the bridge may be repaired and continue functioning for the rest of the design life, and that possibility may be acceptable to the bank; but if two floods exceeding the design flood occur within the first five years, that possibility may not be acceptable to the bank, especially if the risk of that event is beyond an acceptable level). The answer to the foregoing question and similar others concerning the risk of failure of a hydraulic structure are discussed in the sections below.

7.2.3.2 Probability of the Number of Floods Exceeding the Design Flood in a Given Time Period

Once a tentative design flood has been specified for a hydraulic structure, one of the first things the designer may want to know is the probability that a certain number of floods exceeding the design flood will occur during a given number of years (e.g., during the design life of the structure). We will answer this and other related questions using the binomial probability law. For ease of explanation in the following text, when referring to “floods that exceed the design flood,” we will use the term exceeding floods.

Firstly, let us consider a simple case. Assume that a T-year flood is the design flood, i.e., a flood with exceedance probability p = 1/T. This implies that p is the probability of exceeding floods and q = 1 − p is the probability of non-exceeding floods. In fact, p is the probability of exceeding floods in any given year; we will assume that it remains constant throughout the future years considered and also that floods are independent events. Figure 1.30 below illustrates this concept. Considering n = 2, we would like to answer the question: what is the probability that y exceeding floods will occur during the 2-year period? Clearly the only possible values that Y can take on are y = 0, 1, or 2. Thus, we would like to find P(Y = 0), P(Y = 1), and P(Y = 2). Denoting by F the event of exceeding floods in any one year and by NF the opposite (non-exceeding floods), Table 1.12 summarizes the exceeding flood events that must occur in years 1 and 2 for the number of exceeding floods in the 2-year period to be either 0, 1, or 2. The last column gives the probability P(Y = y), y = 0, 1, 2. Following similar reasoning when n = 3, one can find that \( P\left(Y=0\right)={\left(1-p\right)}^3 \), \( P\left(Y=1\right)=3p{\left(1-p\right)}^2 \), \( P\left(Y=2\right)=3{p}^2\left(1-p\right) \), and \( P\left(Y=3\right)={p}^3 \). In general, for any n, it may be shown that

Fig. 1.30 Schematic depicting the design flood \( {x}_T \) and exceeding and non-exceeding probabilities throughout years 1 to n (adapted from ref. 216)

Table 1.12 Flood occurrence and probability of the number of exceeding floods in a 2-year period
$$ \begin{array}{l}P\left(Y=y\right)=\left(\begin{array}{c}\hfill n\hfill \\ {}\hfill y\hfill \end{array}\right){p}^y{\left(1-p\right)}^{n-y},\kern2em y=0,1,\dots, n\\ {}\kern3.8em =\frac{n!}{y!\left(n-y\right)!}{p}^y{\left(1-p\right)}^{n-y},\kern2em y=0,1,\dots, n\end{array} $$
(1.84)

which is the well-known binomial probability model. For example, for n = 3, (1.84) gives

$$ P\left(Y=2\right)=\frac{3!}{2!\left(3-2\right)!}{p}^2{\left(1-p\right)}^{3-2}=3{p}^2\left(1-p\right). $$
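
The binomial probabilities in (1.84) are directly available in statistical libraries. The sketch below (assuming SciPy) reproduces the n = 3 example and also answers the bank’s question of Sect. 7.2.3.1, i.e., the probability of exactly two exceeding floods in the first five years when the design flood is the 25-year flood.

```python
from scipy.stats import binom

p = 0.04                    # exceedance probability of the 25-year flood
print(binom.pmf(2, 3, p))   # P(Y = 2) for n = 3, i.e., 3 p^2 (1 - p)
print(binom.pmf(2, 5, p))   # two exceeding floods in 5 years, about 0.014
print(binom.pmf(0, 5, p))   # no exceeding floods in 5 years, (1 - p)^5
```
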
7.2.3.3 First Occurrence Probability of a Flood Exceeding the Design Flood and Return Period

We have already stated above that the return period T is equal to 1/p. However, it is useful to examine the fundamental concepts behind this definition. Let us consider again the same flood problem as before, where we selected a given value of T and determined the corresponding design flood (i.e., flood quantile) from frequency analysis. We would like to answer the following question: what is the probability that an exceeding flood (a flood exceeding the design flood) will occur for the first time in year w? Clearly that first time could be in year 1, or 2, or 3, etc., or perhaps it will never occur. Obviously the waiting time for an exceeding flood to occur for the first time is a random variable that we will denote by W. For an exceeding flood to occur for the first time at year W = w, the following event must occur:

Year:         1      2      3      …      w − 1      w
Event:        NF     NF     NF     …      NF         F
Probability:  1 − p  1 − p  1 − p  …      1 − p      p

As usual, assuming that floods are independent events, the referred event has probability \( {\left(1-p\right)}^{w-1}p \) or

$$ P\left(W=w\right)={\left(1-p\right)}^{w-1}p,\kern3em w=1,2,\dots. $$
(1.85)

which is the geometric probability law. It may be shown that E(W) = 1/p, i.e., the expected waiting time or the mean number of years that it will take for an exceeding flood to occur is 1/p, and that quantity has become known as the return period, i.e., T = 1/p.
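
The geometric law (1.85) is likewise available in SciPy (whose geom distribution uses exactly the parameterization above), and its mean reproduces the return period T = 1/p, as the following sketch illustrates for the 50-year flood.

```python
from scipy.stats import geom

p = 0.02               # exceedance probability of the 50-year flood
print(geom.pmf(3, p))  # first exceeding flood occurs in year 3: (1 - p)^2 p
print(geom.mean(p))    # E(W) = 1/p = 50 years, i.e., the return period
```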

7.2.3.4 Risk of Failure and Reliability

Risk and reliability are important concepts for designing hydraulic structures. In the previous examples on designing flood-related structures such as a bridge, we assumed that the return period T was selected from design tables or manuals and that the actual design flood magnitude was found from flood frequency analysis. In this section, we are interested in the risk of failure of the referred structure. However, we must specify a time frame, such as one year or two years, within which the failure of the referred structure may occur. Also, we will define failure as the situation in which a flood exceeding the design flood occurs. Then we can ask the question: “what is the risk of failure of the structure in a period of n years?” (The value of n could in fact be the design life that we referred to in Sect. 7.2.3 above.) For instance, for n = 1, the reliability is \( {R}_{\mathcal{l}}=q=1-p \) and conversely the risk is \( R=1-{R}_{\mathcal{l}}=p \). When n = 2, the reliability of the structure can be calculated from the event that no exceeding floods will occur in the 2-year period, i.e., NF in the first year and NF in the second year. Thus, the probability of such a 2-year event is \( \left(1-p\right)\left(1-p\right)={\left(1-p\right)}^2 \), so that the reliability of the structure is \( {R}_{\mathcal{l}}={\left(1-p\right)}^2 \) and consequently the risk of failure becomes \( R=1-{\left(1-p\right)}^2 \). Likewise, in general for an n-year period, the reliability is \( {\left(1-p\right)}^n \) and the risk of failure becomes \( R=1-{\left(1-p\right)}^n \).

Actually with the foregoing background, we can now define reliability and risk using the binomial law previously described in Sect. 7.2.3. We define reliability as the probability that no exceeding floods will occur in an n-year period, i.e., \( {R}_{\mathcal{l}}=P\left(Y=0\right) \) where Y = random variable denoting the number of exceeding floods in an n-year period. Likewise, risk is defined as the probability that one or more exceeding floods will occur in an n-year period, i.e., R = P(Y > 0) = 1 − P(Y = 0). These probabilities can be readily obtained from (1.84). Summarizing

$$ {R}_{\mathcal{l}}=P\left(Y=0\right)={\left(1-p\right)}^n, $$
(1.86)
$$ R=P\left(Y>0\right)=1-{\left(1-p\right)}^n. $$
(1.87)

Consider the data and results of the log-Pearson III model which was fitted to the annual flood data of the St. Mary’s River (Sect. 7.2.2). Assume that a large bridge will be designed to cross the river. From design tables (e.g., [46]), the return period for designing a large bridge is taken as 50 years, and the 50-year flood using the log-Pearson III model gives \( {\widehat{x}}_{50}=21,931.4\;\mathrm{cfs} \). Then, for p = 0.02 and n = 10 (design life), (1.86) gives \( {R}_{\mathcal{l}}={\left(1-0.02\right)}^{10}=81.7\;\% \) and R = 1 − (1 − 0.02)10 = 18.3 %. To lower the risk of failure, one may have to increase the design flood. For instance, if T = 100 and p = 0.01, then the risk becomes 9.6 %.
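
The risk and reliability computations of this example reduce to one line each; a minimal sketch follows.

```python
def reliability(p, n):
    """Probability of no exceeding floods in n years, cf. (1.86)."""
    return (1.0 - p) ** n

def risk(p, n):
    """Probability of one or more exceeding floods in n years, cf. (1.87)."""
    return 1.0 - reliability(p, n)

print(risk(0.02, 10))   # T = 50, 10-year design life: about 0.183
print(risk(0.01, 10))   # T = 100: about 0.096
```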

7.2.3.5 Expected Damage, Vulnerability, and Risk

In the previous section, we defined the term risk in the sense of hydrologic risk. However, in actual practice, the term risk also has other connotations. For example, continuing with the previous reference to flood events, let us assume that for a given reach of a river, the relationship between the flood level H and the damage D is known, i.e., \( D={g}_1(H) \), where D can be expressed in monetary terms. Likewise, we assume a relationship between the flood level H and the flood discharge X, i.e., \( H={g}_2(X) \). Because the probability distribution of X is known (Sect. 7.2.2), conceptually we can find the distribution \( {f}_H(h) \) of the flood level H and consequently the distribution \( {f}_D(d) \) of the flood damage D. Then the expected value of the flood damage E(D) can be found by integration, i.e., \( E(D)={\int}_{-\infty}^{\infty }d\;{f}_D(d)\; dd \). Such expected damage (expected cost) has also been called risk, i.e., R = E(D). A practical reference on this subject is USACE [222].

However, an alternative way of looking at the problem may be to find a relationship (function) linking directly the damage D and the flood X, e.g., d(X). In this case, the expected damage (risk) can be found as \( R=E(D)={\int}_{-\infty}^{\infty }d(x){f}_X(x)\; dx \). More realistically, since damage begins to occur only after the flood has reached some threshold, say \( {x}_0 \), and since once the flood reaches and exceeds some maximum threshold \( {x}_m \) the damage is likely to be a total or maximum damage \( {d}_m \) (e.g., total loss of a farmhouse, thus the cost of replacing the property), the expected damage must be determined as \( R=E(D)={\displaystyle {\int}_{x_0}^{x_m}d(x){f}_X(x)\; dx}+\left[1-{F}_X\left({x}_m\right)\right]\times {d}_m \), where \( {F}_X\left({x}_m\right) \) is the CDF of the flood X evaluated at the value \( {x}_m \).

Furthermore, we may add the concept of vulnerability. Assume a simple case where a flood wall is built to protect the flood plain so that d(x) = 0 if \( x\le {x}_d \) and \( d(x)={d}_m \) if \( x>{x}_d \), where \( {x}_d \) is the design flood of the flood wall. Then the risk is given by \( R=P\left(X>{x}_d\right){d}_m \). In addition, assume that the property owners in the floodplain have the option of building additional protection for their property. For example, if they do nothing, then the damage is \( {d}_m \) when the flood exceeds \( {x}_d \), and in that case, the vulnerability (V) of the property is 100 %. If instead the property owners build, say, a wall surrounding their property, we may estimate the vulnerability as, say, 75 %, and if, in addition, they protect the doors and windows, then the estimated vulnerability may be reduced to 60 %. Thus, the risk is now given by \( R=P\left(X>{x}_d\right){d}_mV \). Therefore, in general, assuming that vulnerability is also a function of the flood magnitude, V(x), the expected damage (risk) may be determined as \( R=E(D)={\displaystyle {\int}_{x_d}^{\infty }d(x)V(x){f}_X(x)\; dx} \). Further details on the concept of vulnerability in connection with flood analysis may be found in Platte [223].
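
The expected-damage integrals above are easily evaluated numerically. The following sketch uses the lognormal-2 flood model fitted earlier in this chapter together with a purely hypothetical linear damage function between two thresholds; the thresholds x0 and xm and the maximum damage dm are invented for illustration only.

```python
import numpy as np
from scipy import integrate, stats

my, sy = 4.104, 0.121   # lognormal-2 parameters (base-10 logs), from the text

def f_X(x):
    """Lognormal-2 PDF of the flood X."""
    return 0.4343 / (x * sy * np.sqrt(2.0 * np.pi)) * np.exp(
        -0.5 * ((np.log10(x) - my) / sy) ** 2)

def F_X(x):
    """Lognormal-2 CDF of the flood X."""
    return stats.norm.cdf((np.log10(x) - my) / sy)

x0, xm, dm = 15000.0, 25000.0, 1.0e6   # hypothetical thresholds (cfs) and max damage ($)

def d(x):
    """Hypothetical damage function, linear between x0 and xm."""
    return dm * (x - x0) / (xm - x0)

partial, _ = integrate.quad(lambda x: d(x) * f_X(x), x0, xm)
R = partial + (1.0 - F_X(xm)) * dm     # expected damage (risk), per the equation above
print(R)
```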

7.2.4 Regional Frequency Analysis

In general, regional frequency analysis is a procedure for estimating quantiles of a probability distribution of the variable of interest (e.g., floods, low flows, or maximum precipitation) which is applicable to a given region (or area). Commonly this is done where the particular site (e.g., a stream cross section) lacks enough data to make a reliable estimate of a given quantile (e.g., the 100-year flood) or where the site is ungauged. Thus, alternative methods have been developed in literature depending on the type of variable, although some of the methods may be equally applicable regardless of the type of variable.

A widely used method for regional flood frequency analysis is based on a multiple regression model such as \( Y=a{X}_1^{b_1}{X}_2^{b_2}\dots {X}_m^{b_m} \), where the dependent variable Y may represent a particular flood quantile (e.g., the T-year flood, say \( {Q}_T \)) and the Xs are the independent variables (predictors), which generally involve physiographic (e.g., area of the basin, slope, and drainage density) and climatic (e.g., index of precipitation, temperature, and wind) characteristics. Literature abounds on applying this technique (e.g., [213, 224–228]). For example, McCain and Jarrett [226] found the regression equation \( {Q}_{100}=1.88{A}^{0.787}{P}^{0.932} \) for the mountains of the State of Colorado, USA, where A and P represent the drainage area and the mean annual precipitation, respectively. Selecting a particular quantile (say \( {Q}_T \)) as the dependent variable has the disadvantage that multiple regression equations may be needed, i.e., one for every T. Instead, two alternatives may be (1) regionalizing the sample moments such as \( \overline{Q} \), \( {S}_Q \), and \( {g}_Q \) (i.e., the sample mean, standard deviation, and skewness coefficient, respectively), from which the estimates of the parameters \( \underset{\bar{\mkern6mu}}{\theta } \) of the flood frequency distribution \( {f}_Q\left(q;\underset{\bar{\mkern6mu}}{\theta}\right) \) can be determined based on the method of moments, and (2) regionalizing the parameters of a particular model. The two alternatives have the advantage that only two or three regressions are needed (depending on the model) and any flood quantile may be derived from the regionalized parameters (e.g., [227, 229]).
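
As an illustration, the quoted regression can be coded directly; note that A and P must be in the units of the original study, and the inputs below are hypothetical.

```python
def q100(A, P):
    """100-year flood from drainage area A and mean annual precipitation P,
    per the regional regression of McCain and Jarrett [226]."""
    return 1.88 * A ** 0.787 * P ** 0.932

print(q100(100.0, 30.0))   # hypothetical basin, for illustration only
```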

Another alternative, which is applicable for regionalizing flood quantiles and extreme precipitation quantiles, is the so-called index-flood method (IFM). This method, originally suggested by Dalrymple [230], involves three key assumptions: (1) observations at any given site are independent and identically distributed, (2) observations at different sites are independent, and (3) frequency distributions at different sites are identical except for a scale factor. The first assumption is a basic assumption for most methods of frequency analysis of extreme events, but the last two assumptions are unlikely to be met by hydrometeorological data [231]. The third assumption may be mathematically characterized as \( {y}_q^{(i)}={\mu}^{(i)}{y}_q^R \), i = 1, …, n, where \( {y}_q^{(i)} \) = qth quantile for site i, \( {\mu}^{(i)} \) = index flood for site i, \( {y}_q^R \) = regional qth quantile, and n = number of sites in the region. The index flood \( {\mu}^{(i)} \) is usually taken to be the at-site population mean for site i, which is estimated by the at-site sample mean, i.e., \( {\widehat{\mu}}^{(i)}={\overline{y}}^{(i)} \). To estimate the regional qth quantile, a model is assumed, and the parameter estimation may be based on the method of moments, probability-weighted moments, or maximum likelihood, depending on the selected model. Details on the applicability of this method for flood and extreme precipitation frequency analysis can be found in many published papers and books (e.g., [231–236]). An apparent flaw of the method resulting from using the sample mean as the index flood (as noted above) has been discussed by Stedinger [237] and Sveinsson et al. [238], and an index-flood method that avoids such a flaw (called the population index flood) has been developed [238]. In addition, Burn et al. [239] discussed approaches for regionalizing catchments for regional flood frequency analysis, Cunnane [240] reviewed the various methods and merits of regional flood frequency analysis, and a worldwide comparison of regional flood estimation methods has also been done [241].

Furthermore, regionalization and estimation of low-flow variables (e.g., [242–247]) and droughts (e.g., [248–250]) have been suggested in literature.

7.2.5 Uncertainty Considerations in Frequency Analysis

In the examples in Sect. 7.2.2, we illustrated how one can estimate the parameters \( \underset{\bar{\mkern6mu}}{\theta } \) of a specified distribution \( {f}_X\left(x;\underset{\bar{\mkern6mu}}{\theta}\right) \) given that we have observations \( {x}_1,\dots, {x}_N \) of, say, floods, extreme precipitation, or low flows. However, since the parameters are estimated from a limited sample, they are uncertain quantities, i.e., \( \underset{\bar{\mkern6mu}}{\widehat{\theta}}={g}_1\left({x}_1,\dots, {x}_N\right) \), and consequently, since the qth quantile \( {x}_q \) is a function of the parameters, it is also an uncertain quantity, i.e., \( {\widehat{x}}_q={g}_2\left(\underset{\bar{\mkern6mu}}{\widehat{\theta}}\right) \). Thus, for the common distributions that are generally applied in hydrology and water resources and for the various estimation methods, procedures have been developed for estimating the confidence limits for the population parameters and confidence limits for the population quantiles (e.g., [213, 216, 218, 251]). Obviously those confidence limits depend on the sample size N; as the sample size becomes larger, the confidence interval becomes narrower, and conversely.

There is the additional uncertainty regarding the distribution model, although often the model may be suggested by manuals or standards, which vary with the region or country. For example, for flood frequency analysis, the log-Pearson III is the preferred model in the United States, while the generalized logistic model is the recommended model in Great Britain. Regardless, there are also statistical procedures for testing the goodness of fit of the models, although often more than one candidate model may not be rejected by the tests [219]. Likewise, simulation studies can be made for comparing the applicability of alternative models for estimating quantiles that are beyond the length of the historical sample. Also, when the sample size for a given site is small, one may apply statistical models to extend the short records if longer records are available at nearby basins, and in some cases, rainfall-runoff models may be useful for record extension. And as indicated above, regional frequency analysis may also be applied, particularly for ungauged basins, but regional parameter or quantile estimates can also be combined with at-site estimates (e.g., [226]).

Furthermore, in designing flood-related hydraulic structures, it is a common practice to specify a return period and derive the corresponding design flood of the structure from the frequency distribution of the historical annual floods. Thus, the return period T = 1/(1 − q) is specified and the design flood \( {x}_q \) obtained from the selected CDF \( F\left(x;\underset{\bar{\mkern6mu}}{\theta}\right) \). Another case that arises in practice relates to projects that have been operating for some time, for which it may be desirable to reevaluate the capacity of the structure. This may be desirable for several reasons, such as the occurrence of extreme floods, the additional years of flood records, the modification of design manuals and procedures, and perhaps changes in the hydrologic regime as a result of climate variability and change, or changes in the landscape and land use, etc. [216]. In any case, reevaluating the capacity of the structure means that the flood magnitude is known and one may like to recalculate the structure’s performance, such as the return period and the risk of failure. Thus, in this second situation, the design flood magnitude \( {x}_q \) is known, and the problem is estimating the non-exceedance probability q. Thus, q is the uncertain quantity (and consequently p, T, and R). A method that accounts for the uncertainty in estimating the non-exceedance probability q, the return period T, and the risk of failure R has been suggested by Salas and Heo [252], Salas et al. [253], and Salas et al. [254].

7.3 Stochastic Methods in Hydrology and Water Resources

7.3.1 Introduction

Generally stochastic (time series) models may be used for two main purposes, namely, for stochastic simulation or data generation and for forecasting. In stochastic simulation, we use a stochastic model to generate artificial records of the variable at hand, e.g., streamflows, for a specified period of time, e.g., 50 years. Depending on the problem, one can simulate many equally likely samples of streamflows, each 50 years long or simulate one very long sample (e.g., 100,000 years long). On the other hand, in forecasting, we make the best estimate of the value of streamflow that may occur say in the period April–July given the observed streamflows in the past and many other predictors as needed. Typically, stochastic simulation is used for planning purposes, e.g., for estimating the capacity of a reservoir to supply water for an irrigation system, for testing operating rules and procedures under uncertain hydrologic scenarios, for estimating the return period of severe droughts, and for many other purposes (e.g., [255, 256]). On the other hand, short-term, medium-term, and long-range forecasting are needed in practice for a number of applications such as operating water supply systems, hydropower network systems, flood warning systems, irrigation scheduling, water releases from reservoirs, and tracking the dynamics of ongoing droughts.

The field of stochastic hydrology has developed since the early work of Hurst [257], Thomas and Fiering [258], Yevjevich [259], Matalas [260], and Mandelbrot and Wallis [261] in the 1950s and 1960s, which inspired the work and contributions of many others along several directions; books, book chapters, papers, manuals, and software have since been developed. Perhaps a broad classification of the various methods proposed is into parametric and nonparametric methods, and in each category well-known models and approaches became popular, such as autoregressive (AR) and autoregressive and moving average (ARMA) models for parametric methods (e.g., [255, 256, 262–264]) and bootstrap, kernel density estimates (KDE), K-nearest neighbor (KNN), and variations thereof for nonparametric methods (e.g., [217, 265–269]). Also, the methods and names of models depend on the type of hydrologic process to be analyzed, such as precipitation or streamflow, on the time scale, i.e., hourly, daily, seasonal, and yearly, and on the number of sites involved (single or multiple sites). For example, contemporaneous ARMA (CARMA) models have been widely used for modeling multisite streamflows (e.g., [256, 264]). Also, modeling and simulation of complex systems can be simplified using temporal and spatial disaggregation and aggregation approaches (e.g., [256, 262, 270–276]). In this section, we introduce the subject with some concepts and definitions, describe how to characterize a hydrologic time series at yearly and monthly time scales, and apply a simple AR model along with an example to illustrate how to simulate streamflows. Subsequently we briefly discuss additional concepts regarding forecasting, followed by a section on uncertainty issues. The issue of nonstationarity is covered in Sect. 7.4.

7.3.2 Main Concepts and Definitions

Most hydrologic series of practical interest are discrete time series defined on hourly, daily, weekly, monthly, and annual time intervals. The term seasonal time series is often used for series with time intervals that are fractions of a year (e.g., a month). Seasonal time series are also often called periodic-stochastic series because, although stochastic, they evolve in a periodic fashion from year to year. Hydrologic time series may be single or univariate series (e.g., the monthly precipitation series at a given gauge) or multiple or multivariate series (e.g., the monthly precipitation series obtained from several gauges). A time series is said to be stationary if its statistical properties such as the mean, variance, and skewness do not vary through time. Conversely, if the statistical properties vary through time, the time series is nonstationary.

Hydrologic time series are generally autocorrelated. Autocorrelation in some series such as streamflow usually arises from the effects of surface, soil, and groundwater storages [256]. Conversely, annual precipitation and annual maximum flows (flood peaks) are usually uncorrelated. Sometimes autocorrelation may be the result of trends and/or shifts in the series [97, 277]. In addition, multiple hydrologic series may be cross-correlated. For example, the streamflow series at two nearby gauging stations in a river basin are expected to be cross-correlated because the sites are subject to similar climatic and hydrologic events, and as the sites considered become farther apart, their cross-correlation decreases. However, because of the effect of some large-scale atmospheric-oceanic phenomena such as ENSO (El Niño Southern Oscillation), significant cross-correlation between SST (sea surface temperature) and streamflow between sites that may be thousands of miles apart can be found [278]. Furthermore, hydrologic time series may be intermittent when the variable under consideration takes on nonzero and zero values throughout the length of the record. For instance, hourly and daily rainfalls are typically intermittent series, while monthly and annual rainfalls are usually non-intermittent. However, in arid regions, even monthly and annual precipitation and runoff may be intermittent as well.

Traditionally, certain annual hydrologic series have been considered to be stationary, although this assumption may be incorrect because of the effect of large-scale climatic variability, natural disruptions like a volcanic eruption, anthropogenic changes such as the effect of reservoir construction on downstream flow, and the effect of landscape changes on some components of the hydrologic cycle [279]. Also, hydrologic series defined at time intervals smaller than a year such as months generally exhibit distinct seasonal (periodic) patterns due to the annual revolution of the earth around the sun. Likewise, summer hourly rainfall series or certain water quality constituents related to temperature may also exhibit distinct diurnal patterns due to the daily rotation of the earth [280, 281]. Cyclic patterns of hydrologic series translate into statistical characteristics that vary within the year or within a week or a day as the case may be, such as seasonal or periodic variations in the mean, variance, covariance, and skewness. Thus, series with periodic variations in their statistical properties are nonstationary.

In addition to seasonality (periodicity), hydrologic time series may exhibit trends, shifts or jumps, autocorrelation, and non-normality. In general, natural and human-induced factors may produce gradual and instantaneous trends and shifts (jumps) in hydroclimatic series. For example, a large forest fire in a river basin can immediately affect the runoff, producing a shift in the runoff series. A large volcanic explosion or a large landslide can produce sudden changes in the sediment transport series of a stream. Trends in nonpoint source water quality series may be the result of long-term changes in agricultural practices and agricultural land development, and changes in land use and the development of reservoirs and diversion structures may also cause trends and shifts in streamflow series. The concern about the effects of global warming and those from low-frequency components in the atmospheric and ocean system (e.g., the Pacific Decadal Oscillation and the Atlantic Multidecadal Oscillation) is making hydrologists more aware of the occurrence of trends and shifts in hydrologic time series and the ensuing effects on water resources, the environment, and society (e.g., [279, 282], and also refer to Sect. 7.4).

7.3.3 Stochastic Characteristics of Hydrologic Data

The stochastic characterization of the underlying hydrologic processes is important in constructing stochastic models. In general, the stochastic characteristics of hydrologic series depend on the type of data at hand, e.g., precipitation and streamflow data, and the time scale, e.g., yearly and monthly. The most commonly used statistical properties for analyzing hydrologic time series are the sample mean \( \overline{y} \), variance \( {s}^2 \), coefficient of variation Cv, skewness coefficient g, and lag-k autocorrelation coefficient \( {r}_k \). Coefficients of variation of annual flows are typically smaller than one, although they may be close to one or greater in streams in arid and semiarid regions. The coefficients of skewness g of annual flows are typically greater than zero. In some streams, small values of g are found, suggesting that annual flows may be approximately normally distributed. On the other hand, in streams of arid and semiarid regions, g can be greater than one.

The sample mean, variance, and skewness coefficient may be calculated, respectively, as

$$ \overline{y}=\frac{1}{N}{\displaystyle \sum_{t=1}^N{y}_t} $$
(1.88)
$$ {s}_y^2=\frac{1}{N-1}{\displaystyle \sum_{t=1}^N{\left({y}_t-\overline{y}\right)}^2} $$
(1.89)

and

$$ {g}_y=\frac{N{\displaystyle \sum_{t=1}^N{\left({y}_t-\overline{y}\right)}^3}}{\left(N-1\right)\left(N-2\right){s}_y^3}. $$
(1.90)

And the sample lag-1 autocorrelation coefficient \( {r}_1 \) may be determined by

$$ {r}_1=\frac{c_1}{c_0}, $$
(1.91a)
$$ {c}_k=\frac{1}{N}{\displaystyle \sum_{t=1}^{N-k}\left({y}_{t+k}-\overline{y}\right)\left({y}_t-\overline{y}\right)},\kern3em k=0,1,\dots, $$
(1.91b)

where N = sample size and k = time lag. The lag-1 autocorrelation coefficient \( {r}_1 \) (also called the serial correlation coefficient) is a simple measure of the degree of time dependence of a series. Generally \( {r}_1 \) for annual flows is small but positive, although negative values of \( {r}_1 \) may occur because of sample variability. Large values of \( {r}_1 \) for annual flows can be found for a number of reasons, including the effect of natural or man-made surface storage such as lakes, reservoirs, or glaciers, the effect of groundwater storage, the effect of errors in naturalizing streamflow data, and the effect of low-frequency components of the climate system. The estimators \( {s}_y^2 \), \( {g}_y \), and \( {r}_1 \) are biased downward relative to the corresponding population statistics. Corrections for bias for these estimators have been suggested (e.g., [283–285]). In addition, when analyzing several time series jointly, cross-correlations may be important (e.g., [256]).
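
The estimators (1.88)–(1.91) translate directly into code. A minimal NumPy sketch follows; the function accepts any annual series.

```python
import numpy as np

def basic_stats(y):
    """Sample mean, std, skewness, and lag-1 autocorrelation, cf. (1.88)-(1.91)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    ybar = y.mean()                                                  # (1.88)
    s = np.sqrt(((y - ybar) ** 2).sum() / (N - 1))                   # (1.89)
    g = N * ((y - ybar) ** 3).sum() / ((N - 1) * (N - 2) * s ** 3)   # (1.90)
    c0 = ((y - ybar) ** 2).sum() / N                                 # (1.91b), k = 0
    c1 = ((y[1:] - ybar) * (y[:-1] - ybar)).sum() / N                # (1.91b), k = 1
    return ybar, s, g, c1 / c0                                       # r1 from (1.91a)
```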

While the overall stochastic properties of hydrologic time series, such as those defined above, may be determined either from annual series or for seasonal series as a whole, specific seasonal (periodic) properties may provide a better picture of the stochastic characteristics of hydrologic time series that are defined at time intervals smaller than a year, such as monthly streamflow data. Let the seasonal time series be represented by \( {y}_{\nu, \tau } \), ν = 1, …, N; τ = 1, …, ω, in which ν = year, τ = season, N = number of years of record, and ω = the number of seasons per year (e.g., ω = 12 for monthly data). Then, for each season τ, one can determine the mean \( {\overline{y}}_{\tau } \), variance \( {s}_{\tau}^2 \), coefficient of variation \( {Cv}_{\tau } \), and skewness coefficient \( {g}_{\tau } \) (these statistics are denoted as seasonal or periodic statistics). For example, the sample seasonal mean, variance, and skewness coefficient may be determined, respectively, as

$$ {\overline{y}}_{\tau }=\frac{1}{N}{\displaystyle \sum_{\nu =1}^N{y}_{\nu, \tau }},\kern2em \tau =1,\dots, \omega $$
(1.92)
$$ {s}_{\tau}^2=\frac{1}{N-1}{\displaystyle \sum_{\nu =1}^N{\left({y}_{\nu, \tau }-{\overline{y}}_{\tau}\right)}^2},\kern2em \tau =1,\dots, \omega $$
(1.93)

and

$$ {g}_{\tau }=\frac{N{\displaystyle \sum_{\nu =1}^N{\left({y}_{\nu, \tau }-{\overline{y}}_{\tau}\right)}^3}}{\left(N-1\right)\left(N-2\right){s}_{\tau}^3},\kern2em \tau =1,\dots, \omega . $$
(1.94)

Furthermore, the sample season-to-season correlation coefficient \( {r}_{1,\tau } \) may be estimated by

$$ {r}_{1,\tau }=\frac{c_{1,\tau }}{{\left({c}_{0,\tau -1}{c}_{0,\tau}\right)}^{1/2}},\kern3em \tau =1,\dots, \omega $$
(1.95a)
$$ {c}_{k,\tau }=\frac{1}{N}{\displaystyle \sum_{\nu =1}^N\left({y}_{\nu, \tau }-{\overline{y}}_{\tau}\right)\left({y}_{\nu, \tau -k}-{\overline{y}}_{\tau -k}\right)},\kern2em k=0,1;\kern2em \tau =1,\dots, \omega $$
(1.95b)

For instance, for monthly streamflows, \( {r}_{1,4} \) represents the correlation between the flows of the fourth month and those of the third month. Note that for τ = 1, \( {c}_{0,\tau -1} \) in (1.95a) must be replaced by \( {c}_{0,\omega } \), and for τ = 1 and k = 1, \( {y}_{\nu, \tau -1} \) and \( {\overline{y}}_{\tau -1} \) in (1.95b) must be replaced by \( {y}_{\nu -1,\omega } \) and \( {\overline{y}}_{\omega } \), respectively. Likewise, for multiple seasonal time series, the sample lag-1 seasonal cross-correlation coefficient \( {r}_{1,\tau}^{ij} \) between the seasonal time series \( {y}_{\nu, \tau}^{(i)} \) and \( {y}_{\nu, \tau -1}^{(j)} \) for sites i and j may be determined.

The statistics \( {\overline{y}}_{\tau } \), \( {s}_{\tau } \), \( {g}_{\tau } \), and \( {r}_{1,\tau } \) may be plotted versus time τ = 1, …, ω to observe whether they exhibit a seasonal pattern. Fitting these statistics by Fourier series is especially effective for weekly and daily data [262]. Generally, for seasonal streamflow series, \( {\overline{y}}_{\tau }>{s}_{\tau } \), although for some streams \( {\overline{y}}_{\tau } \) may be smaller than \( {s}_{\tau } \), especially during the “low-flow” season. Furthermore, for streamflow series in dry areas, the mean may be smaller than the standard deviation, i.e., \( {\overline{y}}_{\tau }<{s}_{\tau } \), throughout the year [279]. Likewise, values of the skewness coefficient \( {g}_{\tau } \) for the dry season are generally larger than those for the wet season, indicating that data in the dry season depart more from normality than data in the wet season. Values of the skewness for intermittent hydrologic series are usually larger than the skewness for similar non-intermittent series. Seasonal correlations \( {r}_{1,\tau } \) for streamflow during the dry season are generally larger than those for the wet season, and they are significantly different from zero for most of the months. On the other hand, month-to-month correlations for monthly precipitation are generally low or not significantly different from zero for most of the months [286], while lag-1 correlations are generally greater than zero for weekly, daily, and hourly precipitation.
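
A sketch of the seasonal statistics (1.92)–(1.95) for monthly data is given below; it assumes the series has been arranged as an N-year by 12-month NumPy array and uses a slightly simplified treatment of the year boundary for the season-to-season correlation (the first year is dropped for τ = 1 instead of wrapping around exactly as in (1.95b)).

```python
import numpy as np

def seasonal_stats(y):
    """y: array of shape (N, w), one row per year; returns seasonal
    means, standard deviations, and season-to-season correlations."""
    N, w = y.shape
    ybar = y.mean(axis=0)                                  # (1.92)
    s = np.sqrt(((y - ybar) ** 2).sum(axis=0) / (N - 1))   # (1.93)
    r1 = np.empty(w)
    for tau in range(w):
        if tau == 0:
            # first season vs. the last season of the previous year
            cur, prev = y[1:, 0], y[:-1, w - 1]
        else:
            cur, prev = y[:, tau], y[:, tau - 1]
        c1 = ((cur - cur.mean()) * (prev - prev.mean())).mean()
        r1[tau] = c1 / np.sqrt(cur.var() * prev.var())     # cf. (1.95a)
    return ybar, s, r1
```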

For illustration, consider the time series of annual streamflows for the Poudre River at Mouth of the Canyon for the period 1971–1990 (as shown in Table 1.13, column 2). We would like to calculate the main stochastic characteristics of the annual flow data. We apply (1.88)–(1.90) to get the mean, standard deviation, and skewness coefficient, respectively, of the original data denoted as \( {x}_t \). They are shown at the bottom of column 2 in Table 1.13. They give a coefficient of variation of about 0.41. Note that the skewness coefficient is about 1.4, which suggests that the data are skewed to the right and depart from the normal distribution. We apply the logarithmic transformation to try to bring the skewness down to zero (and close to the normal distribution). The log-transformed flows are shown in column 3, and the resulting statistics are given at the bottom. In this case, the skewness coefficient is 0.018, i.e., near zero. Thus, we can assume that the log-transformed flows are close to normally distributed. In addition, we calculate the lag-1 serial correlation coefficient \( {r}_1 \) of the transformed flows \( {y}_t \). For this purpose, we apply (1.91a) and (1.91b) and get \( {r}_1=0.107 \). This low value is typical of small rivers (for reference, \( {r}_1 \) obtained for the same river based on a 120-year data set is about 0.15).

Table 1.13 Statistical analysis of the annual streamflows (acre-ft) of the Poudre River for the period 1971–1990

7.3.4 Stochastic Modeling and Simulation of Hydrologic Data

A number of stochastic models have been developed for simulating hydrologic processes such as streamflows. Some of the models are conceptually (physically) based, some others are empirical or transformed or adapted from existing models developed in other fields, while some others have arisen specifically to address particular features of the process under consideration. In general, models for short time scales, such as daily, are more complex than models for longer time scales, such as monthly and annual. Also, some of the models have been developed specifically for precipitation and some others for streamflow, yet many of them are useful for both and for many other hydrologic processes. We will illustrate here a simple model that may be useful for generating annual data at one site (single variable). In some cases, the model may also be useful for generating monthly data after standardizing the data seasonally (i.e., season by season), although periodic-stochastic models may be better suited for seasonal data. For further description of the alternative models that are available for annual and seasonal data for both single-site and multisite systems, including models for intermittent data, the reader is referred to Salas et al. [262], Loucks et al. [255], Salas [256], and Hipel and McLeod [264].

We will use the lag-1 autoregressive or AR(1) model, which is given by

$$ {y}_t={\mu}_y+\phi \left({y}_{t-1}-{\mu}_y\right)+{\varepsilon}_t, $$
(1.96)

where \( {\varepsilon}_t \) is a random noise term which is normally distributed with mean zero and variance \( {\sigma}_{\varepsilon}^2 \) and is uncorrelated with \( {y}_{t-1} \). In addition, it may be shown that because \( {\varepsilon}_t \) is normally distributed, \( {y}_t \) is also normal with mean \( {\mu}_y \) and variance \( {\sigma}_y^2={\sigma}_{\varepsilon}^2/\left(1-{\phi}^2\right) \). To generate synthetic records of the variable \( {y}_t \), one can use model (1.96) if the model parameters are known or estimated. The parameters of the model may be estimated by the method of moments (although other methods are available). They are

$$ {\widehat{\mu}}_y=\overline{y}, $$
(1.97)
$$ \widehat{\phi}={r}_1, $$
(1.98)

and

$$ {\widehat{\sigma}}_{\varepsilon}^2=\left(1-{r}_1^2\right){s}_y^2. $$
(1.99)

Substituting the estimated parameters of (1.97)–(1.99) into (1.96), we have

$$ {y}_t=\overline{y}+{r}_1\left({y}_{t-1}-\overline{y}\right)+\sqrt{1-{r}_1^2}{s}_y{\xi}_t $$
(1.100a)

or

$$ {y}_t=\left(1-{r}_1\right)\overline{y}+{r}_1{y}_{t-1}+\sqrt{1-{r}_1^2}{s}_y{\xi}_t, $$
(1.100b)

where in this case, \( {\xi}_t \) is a normal random variable with mean zero and variance one. Thus, to generate the variable \( {y}_t \), one needs to generate the standard normal random number \( {\xi}_t \), which can be found from tables or from numerical algorithms for generating standard normal random numbers (e.g., [256, 287]). Also, the function NORMINV of Excel can be used to generate standard normal random numbers. One may observe from (1.100a) that it is also necessary to know the previous value of y, i.e., \( {y}_{t-1} \). For example, to generate the first value \( {y}_1 \), (1.100b) gives

$$ {y}_1=\left(1-{r}_1\right)\overline{y}+{r}_1{y}_0+\sqrt{1-{r}_1^2}{s}_y{\xi}_1 $$

which says that in addition to \( {\xi}_1 \), we need to know the initial value \( {y}_0 \). The initial value \( {y}_0 \) may be taken to be equal to the mean \( \overline{y} \), but in order to remove the effect of such an arbitrary initial condition, one should warm up the generation as suggested by Fiering and Jackson [288]. For example, if we want to generate a sample of 100 values of \( {y}_t \), one could generate 150 values, drop the first 50, and use the remaining 100 values. Alternatively, \( {y}_0 \) can be taken randomly from a normal distribution with mean \( \overline{y} \) and standard deviation \( {s}_y \). This way there is no need for a warm-up generation. We will illustrate the approach by generating a few values of \( {y}_t \) as shown in the example below.

We use the data of the annual flows of the Poudre River shown in Table 1.13 and the AR(1) model (1.100) to generate synthetic annual flows for the Poudre. Firstly, we build the model in the logarithmic domain because the data analysis in Sect. 7.3.3 showed that the original data were skewed and that the logarithmic transformation was able to bring the skewness down to nearly zero. Recall from Table 1.13 that the basic statistics of the log-transformed flows are \( \overline{y}=5.46473 \), \( {s}_y=0.1704 \), and \( {r}_1=0.107 \). To start the generation, we must generate the initial value \( {y}_0 \). For this purpose, we obtain the standard normal random number −0.0898 so that

$$ {y}_0=\overline{y}+{s}_y{\xi}_0=5.46473+0.1704\times \left(-0.0898\right)=5.449428. $$

Then for t ≥ 1, we will use (1.100b) as

$$ \begin{array}{l}{y}_t=\left(1-0.107\right)\times 5.46473+0.107{y}_{t-1}+\sqrt{1-{0.107}^2}\times 0.1704{\xi}_t\\ {}\kern0.75em =4.88+0.107{y}_{t-1}+0.169422{\xi}_t.\end{array} $$
(1.101)

Then values of \( {y}_t \) are obtained by successively applying (1.101). For example, we get \( {\xi}_1=-0.4987 \) and \( {\xi}_2=1.2471 \), and (1.101) gives

$$ \begin{array}{l}{y}_1=4.88+0.107{y}_0+0.169422{\xi}_1=4.88+0.107\times 5.449428+0.169422\times \left(-0.4987\right)\\ {}\kern0.85em =5.37860\end{array} $$
$$ \begin{array}{l}{y}_2=4.88+0.107{y}_1+0.169422{\xi}_2=4.88+0.107\times 5.37860+0.169422\times (1.2471)\\ {}\kern0.80em =5.666799\end{array} $$

and so on. Furthermore, since the original flow data \( {x}_t \) were transformed into a normal variable \( {y}_t \) using the logarithmic transformation, we need to invert the generated data (in the normal domain) back to the original flow domain. This is done by taking the antilog, i.e., \( {x}_t={10}^{y_t} \). Thus, inverting the generated values \( {y}_1=5.37860 \) and \( {y}_2=5.666799 \), we get

\( {x}_1={10}^{5.37860}=239,111.3 \) acre-ft and \( {x}_2={10}^{5.666799}=464,300.7 \) acre-ft.

The rest of the example can be seen in Table 1.14 below where ten values of synthetic streamflows have been generated.

Table 1.14 Generated annual streamflows based on the AR(1) model for a 10-year period
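
The generation procedure of this example takes only a few lines of NumPy. The sketch below uses the Poudre River statistics from the text and a fixed random seed (chosen arbitrarily), so the generated numbers will not match Table 1.14, which was produced with different random numbers.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed
ybar, sy, r1 = 5.46473, 0.1704, 0.107      # log10-domain statistics (Table 1.13)

n = 10
y = np.empty(n + 1)
y[0] = ybar + sy * rng.standard_normal()   # random start; no warm-up needed
for t in range(1, n + 1):                  # AR(1) recursion, cf. (1.100b)
    y[t] = ((1 - r1) * ybar + r1 * y[t - 1]
            + np.sqrt(1 - r1 ** 2) * sy * rng.standard_normal())

x = 10.0 ** y[1:]                          # back-transform to flows (acre-ft)
print(np.round(x, 1))
```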

Generally one must generate many samples (e.g., 100), each of length equal to that of the historical sample, to make comparisons and verifications in order to see whether the model is capable of “reproducing,” in the statistical sense, the historical statistics that are relevant to the problem at hand (e.g., basic statistics, storage capacity, drought duration, and magnitude). For this purpose, one may use box plots and software packages such as SPIGOT [289] and SAMS-2010 [290]. In general, the length of generation depends on the particular planning and management problem at hand (e.g., [255, 256]).

7.3.5 Stochastic Forecasting

Stochastic forecasting techniques have been used in hydrology and water resources for a long time. Some of the stochastic techniques that are applied for short-, medium-, and long-term forecasting of hydrologic variables such as streamflows include regression models, principal components-based regression models, autoregressive integrated moving average (ARIMA) models, autoregressive moving average with exogenous variables (ARMAX), and transfer function noise (TFN) models. The advantage of using well-structured models is that model identification and parameter estimation techniques are widely available in statistical software packages. In addition, Kalman filtering techniques can be included to allow for model parameters to vary through time. Examples of applying many of these models including nonparametric techniques and extended streamflow prediction can be found in a number of papers and books published in literature (e.g., [264, 291295]). In addition, because short-term rainfall is an intermittent process, often Markov chains and point process models are applied for forecasting rainfall (e.g., [296298]).

Furthermore, since about 1990, artificial neural networks (ANN) have become popular for a number of applications such as streamflow and precipitation forecasting. The ASCE Journal of Hydrologic Engineering, Vol. 5, No. 2, 2000, is an issue dedicated to the subject, and the book Artificial Neural Networks in Hydrology [299] includes some chapters specifically on streamflow forecasting (e.g., [300, 301]). Also, French et al. [302] used ANN to forecast rainfall intensity fields, and ANN has been applied for forecasting rainfall for a 6-h lead time based on observations of rainfall and wind at a number of gauges [303]. Other forecasting applications of ANN can be found in [304] and [305].

Also, since about the 1990s, a variety of stochastic forecasting approaches have been developed based on hydrologic, oceanic, and atmospheric predictors. It has been demonstrated that climatic signals such as SST, ENSO, PDO, AMO, and NAO and other atmospheric variables such as pressure and wind have significant effects on precipitation and streamflow variations (e.g., [14, 306–310]) and that seasonal and longer-term streamflow forecasts can be improved using climatic factors (e.g., [307, 311–315]).

For example, Stone et al. [316] developed a probabilistic rainfall forecast using the Southern Oscillation Index (SOI) as a predictor. Also Sharma [317] applied a nonparametric model to forecast rainfall with 3–24 months of lead times. Another example is the forecasting of the Blue Nile River seasonal streamflows based on sea surface temperature (SST) for lead times of several months and up to 24 months based on multiple linear regression and principal component analysis [312]. And Grantz et al. [313] developed a forecast model using SST, GH, and SWE as predictors for forecasting April–July streamflows at the Truckee and Carson rivers in Nevada. They found that forecast skills are significant for up to 5-month lead time based on SST and GH. Also Regonda et al. [318] reported April–July streamflow forecasts in the Gunnison River using various climatic factors. And more recently, Salas et al. [315] reported successful forecasting results of seasonal and yearly streamflows in several rivers with headwaters in the State of Colorado based on hydrologic, oceanic, and atmospheric predictors.

7.3.6 Uncertainty Issues in Stochastic Generation and Forecasting

Uncertainties in hydrologic stochastic simulation may arise from various sources which include model uncertainty and parameter uncertainty. Model uncertainty can be minimized by applying well-known models, testing them with appropriate procedures, and relying on the experience and judgment of the modeler. Thus, we will center our attention here on the uncertainty that arises from the limited data that may be available for analysis. Stochastic models are often applied for simulating possible hydrologic scenarios that may occur in the future. But since the parameters of the underlying models are estimated using limited records, the parameter estimates are uncertain quantities, and consequently the decision variables that may be used for planning and management of water resources systems, such as the storage capacity of a reservoir or the critical drought that may occur in a given number of years, are also uncertain quantities.

The effect of parameter uncertainty in stochastic models can be quantified based on asymptotic analysis and Bayesian inference. In the asymptotic analysis, the approximate distributions of parameter estimators are derived based upon large sample theory. For example, Box and Jenkins [319] derived the large sample variance-covariance matrix of parameter estimators for univariate autoregressive moving average (ARMA) models, which enables one to define an approximate distribution of parameter estimators for sufficiently large sample sizes. Also, Camacho et al. [320] studied the large sample properties of parameter estimators of the contemporaneous autoregressive moving average (CARMA) model. In the Bayesian framework, the posterior distributions of parameter estimators describe the uncertainty of the parameters. Vicens et al. [321] determined the Bayesian posterior distribution of the parameters of the lag-1 autoregressive model, and Valdes et al. [322] expanded the Bayesian approach to the multivariate AR(1) model. Their application with a diffuse prior distribution showed that the model produces synthetic flows with higher standard deviations than the historical sample when the historical records are short. Also, McLeod and Hipel [323] suggested simulation procedures for streamflow generation with parameter uncertainty based on the ARMA model. In addition, Stedinger and Taylor [324] applied the Bayesian framework to examine the effect of parameter uncertainty of annual streamflow generation for determining the reservoir system capacity and suggested that incorporating parameter uncertainty into the streamflow generation would increase the variability of the generated storage capacity.

Although the issue of parameter uncertainty based on parametric models such as ARMA has been well recognized in the past and some procedures have been suggested (e.g., [262, 289, 321–323, 325–328]), unfortunately the conventional approaches, i.e., simulation with no consideration of parameter uncertainty, are still being applied in practice, generally leading to underdesign of hydraulic structures. This issue has been reexamined by Lee et al. [329], who suggest that neglecting parameter uncertainty in stochastic simulation may have serious consequences for determining the storage capacity of reservoirs or estimating critical droughts.

Furthermore, forecasts based on any type of ARMA, ARMAX, and TFN models can include the estimation of confidence limits (e.g., [264]). Also, in conjunction with Kalman filter techniques, previous forecast errors can be used to improve forecasts for subsequent time steps (e.g., [291]). Likewise, confidence limits on forecasts based on multiple regression models are well known in the literature (e.g., [215, 330]).
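
For instance, for the simple AR(1) member of the ARMA family, the k-step-ahead forecast and its confidence limits follow directly from the model equation; the sketch below applies the standard formulas with illustrative parameter values:

```python
import numpy as np

def ar1_forecast_limits(x_last, mu, phi, sigma, k_max=12, z=1.96):
    """k-step-ahead AR(1) forecasts with ~95% confidence limits.

    Forecast:       mu + phi**k * (x_last - mu)
    Error variance: sigma**2 * sum_{j=0}^{k-1} phi**(2j)
    """
    out = []
    for k in range(1, k_max + 1):
        fcst = mu + phi**k * (x_last - mu)
        var = sigma**2 * sum(phi**(2 * j) for j in range(k))
        half = z * np.sqrt(var)
        out.append((k, fcst, fcst - half, fcst + half))
    return out

# Illustrative values: last observed flow 130, mean 100, phi 0.5, sigma 20.
for k, f, lo, hi in ar1_forecast_limits(130.0, 100.0, 0.5, 20.0, k_max=4):
    print(f"lead {k}: {f:6.1f}  [{lo:6.1f}, {hi:6.1f}]")
```

Note how the limits widen with lead time toward the unconditional variance, which is why skill at long leads depends on slowly varying predictors such as SST.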

7.4 Nonstationarity

Over the past decades, a number of studies have documented that hydrologic records exhibit some type of nonstationarity in the form of increasing or decreasing trends (e.g., [331–335]), upward and downward shifts (e.g., [277, 336–339]), or a combination of trends and shifts. Perhaps the most obvious case of human intervention leading to changes in the flow characteristics in part of a basin is the construction of diversion dams and dams for water regulation, which cause significant changes in the water regime downstream of the dam site as well as changes in sediment transport and water quality. Also, it has been argued that streamflow records may be changing because of the effect of land use changes in the basin such as increasing urbanization (e.g., [340]), deforestation, and the conversion of arid and semiarid lands into large-scale irrigated fields (e.g., [279]).

The changes resulting from human intervention, some of which have been referred to above, are quite clear, and water resources engineers have developed methods to quantify them. In fact, a key step in many hydrologic studies is to “naturalize” the gauged flow records where upstream human intervention has taken place (although in complex systems, this is not an easy problem). However, in the last few years, it has become apparent that part of the “changes” observed in hydrologic records may be due to the effect of climatic variability, particularly resulting from low-frequency components such as ENSO (El Niño Southern Oscillation) but more importantly from large-scale decadal and multidecadal oscillations such as the PDO and AMO. These large-scale forcing factors have been shown to exert in-phase and out-of-phase oscillations in the magnitude of floods, mean flows, and droughts (e.g., [338, 339, 341–343]). To tackle the various types of nonstationarities, several stochastic approaches have been proposed in the literature, such as flood frequency distributions with mixed components (e.g., [344–347]), flood frequency models embedded with trend components (e.g., [333, 348–350]), flood frequency modeling considering shifting patterns (e.g., [351, 352]), and flood frequency modeling considering covariates (e.g., [348, 349, 353, 354]).

In addition, stochastic approaches have been developed to deal with nonstationarities in simulating, for example, monthly and yearly hydrologic processes such as streamflows (e.g., for drought studies and design of reservoirs), using both short-memory models, such as shifting mean models and regime switching models that have features of nonstationarity (e.g., [277, 339, 343, 355–357]), and long-memory models such as FARMA (e.g., [358–360]) and fractional Gaussian noise models (e.g., [361, 362]). Thus, the field of stochastic hydrology has been enriched in the past decades to accommodate both stationary and nonstationary features of hydrologic regimes. However, a word of caution: as more features of the hydroclimatic regime are considered, it has become necessary to develop more sophisticated models and procedures, some of which require a very good understanding of stochastic processes and hydroclimatic variability. On the other hand, the availability of computational tools, databases, and software has made it possible to develop and, in some cases, to apply some of the complex models referred to above in actual cases of planning and management of water resources systems.
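
As an illustration of the shifting mean idea mentioned above, the following minimal sketch (with purely illustrative parameters, not a published calibration) generates a series whose local mean persists for a geometrically distributed number of years and then jumps to a new level:

```python
import numpy as np

rng = np.random.default_rng(7)

def shifting_mean(n, mu=100.0, sigma_noise=15.0, sigma_shift=25.0, p_shift=0.1):
    """Simple shifting-mean generator: the local mean level stays constant
    for a geometrically distributed number of years (mean 1/p_shift),
    then jumps to a new random level around the long-term mean mu."""
    x = np.empty(n)
    level = mu + rng.normal(0, sigma_shift)
    for t in range(n):
        if rng.random() < p_shift:        # a new regime begins this year
            level = mu + rng.normal(0, sigma_shift)
        x[t] = level + rng.normal(0, sigma_noise)
    return x

flows = shifting_mean(200)                # 200 years of synthetic annual flows
```

Series generated this way exhibit the sustained wet and dry epochs associated with low-frequency forcings such as the PDO and AMO, even though each regime is internally stationary.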

8 Advances in Hydrologic Data Acquisition and Information Systems

In order to have a good understanding of the dynamics of the hydrologic cycle, it has been necessary to make observations of the key variables involved such as precipitation, air temperature, humidity, evaporation, infiltration, soil moisture, groundwater levels, and streamflow. While field measurements are still being made with traditional equipment and devices, such as conventional rain gauges and current flow meters, over the years measurement equipment has become more sophisticated, taking advantage of technological developments in materials, electronics, software, hardware, remote sensing, image processing, and computational algorithms. As a result, data have become more plentiful and are often accessible in real time depending on the case and needs, and automated data screening may also make certain data sources more reliable.

Among the various developments, perhaps the most prominent ones are those obtained from spaceborne sensors that help gather information useful for hydrologic investigations. Thus, in this section, we summarize the main products that are being developed based on remote sensing from space. Also we include advances made for hydrologic measurements in large rivers and developing data information systems to make data gathering and applications more efficient.

8.1 Satellite Precipitation Estimation

Precipitation is one of the most important variables for studying the hydrologic cycle and for basic hydrologic studies in river basins. However, in many parts of the world, particularly in remote places such as the oceans and the arctic regions where accessibility is difficult, surface precipitation measurements are lacking. Likewise, in developing countries, precipitation measurements based on conventional rain gauges are insufficient, and weather-related radars may not even be available, mainly because of the high cost of establishing and maintaining the monitoring stations (e.g., [363–365]). Furthermore, for a variety of reasons, there has been a trend in the developed world as well of reducing some of the existing measurement networks (e.g., [366]).

On the other hand, over the past decades, several satellite precipitation products (SPP) with high spatial resolution (e.g., 1°, 0.5°, 0.25°) and temporal scales such as 1 h, 3 h, daily, and monthly have been developed. These products enable estimating precipitation over much of the world (e.g., [364]). The use of satellites for this purpose already has several decades of history, starting with the launch of the first low earth orbit (LEO) satellite by the United States in 1960. In addition, another type of meteorological satellite, the geostationary earth orbit satellite (GEOS), was launched in 1974 by the United States. Since then, several similar LEO and GEOS satellites have been launched by various countries.

The sensors aboard the LEO satellites detect the visible and infrared (IR) bands of the spectrum and over time have been upgraded to include advanced very high-resolution radiometers (AVHRR) and passive microwave (PMW) radiometers. The Tropical Rainfall Measuring Mission (TRMM) satellite launched in 1997 further increased the PMW capabilities, along with an active microwave precipitation radar (PR) capable of capturing information on the horizontal and vertical variability of rainfall. The various radiometers aboard the LEO satellites have provided a spatial resolution of 1 km and 6-h temporal sampling [363]. Likewise, the GEOS meteorological satellites carry visible and IR sensors and are capable of detecting the visible and IR radiation at a finer temporal resolution (a minimum of 3 h), although at a coarser spatial resolution of 4 km. Combining the precipitation information from multiple sensors and algorithms produces estimates of precipitation over almost the entire globe. Thus, the SPP provide additional precipitation data beyond what may be available from conventional surface-based equipment such as rain gauges and radars.

The TRMM satellite uses both active and passive microwave instruments for measuring primarily heavy to moderate rainfall over tropical and subtropical regions of the world. Building on the success of TRMM, the Global Precipitation Measurement (GPM) mission, an international network of satellites, has been planned to provide the next generation of global observations of rain and snow (http://pmm.nasa.gov/GPM). The advantage of GPM over TRMM will be its capability of measuring light rain and snow. GPM will give global measurements of precipitation with improved accuracy and temporal and spatial resolutions. The GPM core observatory is scheduled to be launched in 2014.

Several SPP exist that may be useful for hydrologic applications, such as the TRMM Multi-satellite Precipitation Analysis (TMPA) [367], the NOAA-CPC morphing technique (CMORPH) [368], and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) [369]. These products are available at a minimum temporal scale of 3 h and a spatial resolution of 0.25° latitude/longitude. These SPP generally combine data from PMW and thermal IR sensors, and some also include surface rain gauge observations. The main difference among them is the manner in which the individual data inputs are combined, and these differences may affect the accuracy of precipitation estimates over different regions of the world [370]. Thus, a number of studies have been undertaken to validate and compare them against precipitation data observed at surface rain gauges in various regions of the world such as the United States (e.g., [368, 369, 371]), Africa (e.g., [372–375]), South America [370, 376], and worldwide [367, 377]. For example, Dinku et al. [370] evaluated the TMPA, CMORPH, PERSIANN, NRLB, and GSMaP products at daily and 10-daily temporal scales and a spatial resolution of 0.25° latitude/longitude against surface precipitation observed at 600 rain gauges in Colombia. Based on a number of validation techniques, the authors concluded that the performance of the tested SPP is reasonably good for detecting the occurrence of precipitation but poor in estimating the amount of daily precipitation, although the products have good skill at the 10-day time frame. The performance varied over the various geographical regions in Colombia; the best results were found for the eastern region, with CMORPH and GSMaP performing best. In addition, assessments of some SPP have been made against estimated flood hydrographs (e.g., [365, 378]). For example, the TMPA precipitation estimates were used in conjunction with the variable infiltration capacity (VIC) hydrologic model to estimate streamflow hydrographs for the La Plata basin (Argentina) for the period 1998–2006 [365]. Good agreement of the TMPA-driven simulations with the observed seasonal and interannual variability of streamflows was obtained, and the timing of the daily flood events and low flows was reproduced well, although the peak flows were overestimated.
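
Validation studies of this kind typically rest on a handful of categorical and quantitative skill scores. The sketch below shows common choices (probability of detection, false alarm ratio, multiplicative bias, correlation, and RMSE); the rain/no-rain threshold and the input series are illustrative:

```python
import numpy as np

def validation_stats(sat, gauge, threshold=1.0):
    """Common skill measures for validating satellite precipitation
    against rain gauge observations (both in mm/day)."""
    sat, gauge = np.asarray(sat, float), np.asarray(gauge, float)
    hit   = np.sum((sat >= threshold) & (gauge >= threshold))
    miss  = np.sum((sat <  threshold) & (gauge >= threshold))
    false = np.sum((sat >= threshold) & (gauge <  threshold))
    return {
        "POD":  hit / (hit + miss),          # probability of detection
        "FAR":  false / (hit + false),       # false alarm ratio
        "bias": sat.sum() / gauge.sum(),     # multiplicative bias
        "corr": np.corrcoef(sat, gauge)[0, 1],
        "rmse": np.sqrt(np.mean((sat - gauge) ** 2)),
    }

# Tiny illustrative sample (4 days at one grid cell / gauge pair):
print(validation_stats(sat=[0.0, 5.2, 12.0, 0.4], gauge=[0.2, 4.0, 15.0, 3.1]))
```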

Furthermore, evaluations have been made of the uncertainty of satellite-based precipitation estimates and of the associated surface-based data obtained from rain gauges and radars (e.g., [379, 380]), and a global map of uncertainties in satellite-based precipitation estimates has been developed [381]. The authors used the CMORPH, GSMaP, PERSIANN, 3B42, 3B42RT, and NRL satellite precipitation products and estimated the ensemble mean and coefficient of variation (uncertainty) of precipitation over the globe. The ensemble mean reproduced the major features of precipitation consistent with surface observations. The uncertainty among the various estimates varied in the range 40–60 % over the oceans (especially in the tropics) and over the lower latitudes of South America. However, the uncertainty varied in the range 100–140 % over high latitudes (>40° in both hemispheres), especially during the cold season. As expected, large uncertainties across the year were found over complex geographic regions such as the Rocky Mountains, the Andes, and the Tibetan Plateau [381].
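
The coefficient of variation across products, used in [381] as the uncertainty measure, is straightforward to compute once the products are on a common grid; the following minimal sketch assumes a hypothetical stacked array of product estimates:

```python
import numpy as np

def ensemble_mean_cv(products):
    """products: array of shape (n_products, n_lat, n_lon) holding each
    SPP estimate on a common grid. Returns the ensemble mean and the
    coefficient of variation (std/mean, in percent) as the uncertainty."""
    mean = products.mean(axis=0)
    std = products.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = np.where(mean > 0, 100.0 * std / mean, np.nan)
    return mean, cv
```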

The applicability of some of the SPP is currently being studied in Peru. Figure 1.31 shows a comparison of the seasonal precipitation obtained from surface rain gauges (observed) versus the precipitation estimates obtained from the TRMM3B42, CMORPH, and PERSIANN products for the seasons December–February (DJF, top) and June–August (JJA, bottom). One may observe the complex precipitation distribution across the country for the two selected seasons. PERSIANN appears to resemble the spatial variability of precipitation most closely for the DJF (summer) season, but for the JJA (winter) season, none of the SPP gives good results, particularly for the mideastern region, where the precipitation may reach about 1,200 mm. The referred comparison is simply graphical, but validation statistics will be determined to identify the specific strengths and weaknesses of the different SPP (Lavado, personal communication).

Fig. 1.31

Observed and satellite precipitation estimates for the summer (DJF) and winter (JJA) periods in Peru using TRMM3B42, CMORPH, and PERSIANN products (source: W. Lavado, article in preparation) (color figure online)

8.2 Spaceborne Methods for Estimating Surface Waters: Rivers, Wetlands, and Lakes

While conventional systems for measuring surface and subsurface waters in river systems are well established (Sect. 8.4), unfortunately in some remote regions of the world, and particularly in developing countries, ground-based measurements and estimations of streamflows are insufficient, especially because of the high cost of establishing and maintaining the gauging stations. In addition, some quite large river basins have complex geomorphology and hydrodynamics, with meanders and poorly defined channels, where conventional gauging procedures are inappropriate (e.g., [382]). The application of spaceborne remote sensing methods has opened new possibilities for expanding the coverage of surface water measurements in the world.

For example, one of the most promising new methods is based on radar altimetry, which has been used since the 1990s for measuring surface elevations in the oceans. The various satellites carrying such devices include ERS1 launched in 1991, TOPEX/Poseidon (T/P) launched in 1992, ICESat launched in 2003, and the satellites launched by Japan and Europe (refer to [382] for details of available satellites and websites for measuring surface waters). The radar altimeters aboard these satellites have become useful for measuring river surfaces, particularly for large rivers and wetlands (a radar altimeter emits microwave pulses toward the surface and registers the travel time between the emitted pulse and the received echo, which allows estimating the distance between the altimeter antenna and the water surface).

Among the first studies to apply satellite altimetry for measuring river level variations were those by Cudlip [383], Guzkowska et al. [384], and Koblinsky et al. [385] with applications to the Amazon River. The latter study was based on the Geosat altimeter, and the results showed the potential of using altimeter data, although the Geosat radar did not give sufficient accuracy or coverage. On the other hand, Birkett [386] used the NASA radar altimeter (NRA) on board the T/P satellite and investigated its application for measuring surface water at large rivers and wetlands in various places of the world; the results obtained were quite good in terms of accuracy and the capability of tracking the seasonal and interannual variations of the Amazon River water levels. This initial application of T/P altimetry was followed by studies by Birkett et al. [387] of the surface water dynamics in the Amazon Basin using 7.5 years of T/P data on a wider spatial scale across the basin. The results demonstrated the capability of monitoring not only the variations of the water surface height but also the water surface gradient. Also, Coe and Birkett [388] extended the previous studies to investigate the variations of Lake Chad levels using T/P altimetry in conjunction with ground-based data to estimate not only lake levels but also river discharges at a major tributary of the Lake Chad basin; thus, they were able to predict Lake Chad level changes from observations of the changes at a station more than 600 km upstream. Additional studies with applications to the Amazon River and the Rio Negro (a major tributary of the Amazon) can be found in Zakharova et al. [389], Leon et al. [390], and Getirana et al. [391].

Alsdorf et al. [382], in reviewing the various applications of spaceborne sensors for estimating surface water, noted that the advances made in satellite remote sensing have demonstrated that the elevation of the water surface (h), its slope (∂h/∂x), and its temporal change (∂h/∂t) can be estimated from spaceborne sensors. They also discussed the limitations and challenges ahead for measuring velocity, bathymetry, and other hydraulic/hydrologic properties. In fact, Kääb and Prowse [392] have recently been able to estimate the two-dimensional surface water velocity field for the St. Lawrence and Mackenzie rivers. Also, recent applications of T/P altimetry have been made to forecast transboundary river water elevations [393].
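
A minimal numerical sketch of how h, ∂h/∂x, and ∂h/∂t can be derived from repeat altimetric heights at two virtual stations along a reach (all values below are hypothetical):

```python
import numpy as np

# Hypothetical altimetry series: water surface heights h (m) observed at
# two virtual stations along a reach at repeat overpass times t (days).
t  = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
h1 = np.array([12.1, 12.6, 13.4, 13.9, 13.5])   # upstream station
h2 = np.array([10.9, 11.3, 12.0, 12.4, 12.1])   # downstream station
reach_length = 180_000.0                        # m between the stations

slope = (h1 - h2) / reach_length                # water surface slope dh/dx
dh_dt = np.gradient(h1, t)                      # temporal change dh/dt (m/day)
print(slope, dh_dt)
```

Combined with channel geometry, such slope and stage-change estimates are the building blocks for the discharge estimation approaches discussed in [382].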

8.3 Spaceborne Methods for Estimating Soil Moisture, Evaporation, Vegetation, Snow, Glaciers, and Groundwater

Microwave radiometers have been used for estimating soil moisture for the past several decades, and such experience has been extended to satellite-borne sensors. Interest in this technology has been energized in this century with the launch of the Soil Moisture and Ocean Salinity (SMOS) satellite by the European Space Agency (ESA) in 2009 and the expected launch of NASA's satellite in 2014 that will carry aboard the Soil Moisture Active Passive (SMAP) instruments. The SMOS satellite carries a microwave radiometer that captures images of “brightness temperature” corresponding to the microwave radiation emitted from the soil and ocean surfaces, which are then related to the soil moisture held in the surface layer and to ocean salinity. Estimates of surface soil moisture can be made with an accuracy of about 4 % (ESA website, Dec. 2011), which is approximately twice the error of in situ electronic sensors. Recently SMOS has been used to track soil moisture levels in Europe during the autumn of 2011, which was very warm and dry. Likewise, the new SMAP satellite is expected to provide soil moisture information on a global scale and should be useful for a variety of applications in agriculture, weather forecasting, drought monitoring, and watershed modeling, and should also be helpful in global circulation modeling. A number of studies have been made in developing the scientific and technical bases of spaceborne soil moisture estimation and its applications (e.g., [394–402]).

Evaporation cannot be measured directly from spaceborne sensors, but it can be estimated using remote sensing data through mathematical relationships that represent the exchanges of water and energy fluxes between the soil and the air. Estimates based on remote sensing data can be made with different approaches, such as direct methods using thermal infrared (TIR) sensors and indirect methods using assimilation procedures that combine different wavelengths to obtain the various input parameters. Some methods rely on the spatial variability present in remotely sensed images and require no additional meteorological data to estimate evapotranspiration for routine applications (e.g., Courault et al. [403]). Detailed reviews of the various methods available for estimating evaporation using remote sensing have been made by several investigators [403–411]. A comprehensive summary table of the various methods, validation results, and sources is included in Kalma et al. [411].
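
One widely used direct approach with TIR data estimates the latent heat flux as the residual of the surface energy balance. The sketch below shows the idea with illustrative instantaneous fluxes; it naively assumes the fluxes hold for the whole day, whereas real methods refine the extrapolation from instantaneous to daily values:

```python
LAMBDA = 2.45e6                  # latent heat of vaporization (J/kg), typical value

def residual_et(rn, g, h):
    """Energy-balance residual: LE = Rn - G - H (all fluxes in W/m^2).
    Returns (LE in W/m^2, ET in mm/day); 1 kg/m^2 of water = 1 mm depth."""
    le = rn - g - h
    et_mm_day = le * 86_400 / LAMBDA
    return le, et_mm_day

# Illustrative midday values of net radiation, soil heat, and sensible heat:
le, et = residual_et(rn=520.0, g=60.0, h=180.0)
print(f"LE = {le:.0f} W/m^2 -> ET = {et:.1f} mm/day (if held all day)")
```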

Also, the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) has enabled the estimation of a number of surface fluxes such as heat and water vapor. For example, Ma et al. [412] suggested a method for deriving the surface temperature, normalized difference vegetation index (NDVI), modified soil-adjusted vegetation index, net radiation flux, soil heat flux, and latent heat flux based on ASTER images and tested it on an experimental site on the Tibetan Plateau. The results showed that the derived evaporation estimates based on ASTER were within 10 % of the corresponding ground measurements. However, the vegetation-derived estimates could not be validated because of the lack of data at the study site. While the proposed method is still in development, the results obtained have been encouraging.

Forest degradation has become a major concern in the past decades because of the deterioration of ecosystems, the loss of biodiversity, the disruption of natural functioning, and the effects on the water cycle. In addition to regular field observations, the application of remote sensing technology has become attractive for detecting forest degradation by measuring differences in the biophysical/biochemical attributes of the canopy surfaces between healthy and degraded forests [413]. Several vegetation-related indices have been proposed for monitoring the state of vegetation using remote sensing techniques, such as the NDVI [414]; the photochemical reflectance index [415]; the normalized difference water index, NDWI [416]; the water index, WI [417]; the land surface water index, LSWI [418]; and the land surface temperature, LST [419, 420]. For example, the WI and NDWI indices correlate well with vegetation water concentration [417], and sparse or short vegetation shows a higher LST value than dense or tall vegetation [419]. Matsushita et al. [413] investigated the degree of forest degradation in Kochi, Japan, using Terra/ASTER satellite sensors and concluded that the water-content-based (e.g., LSWI) and pigment-content-based (e.g., NDVI) indices obtained from satellite data were not effective for detecting forest degradation in the study area, whereas the thermal IR bands of the Terra/ASTER data were effective. However, the coarse spatial resolution of the satellite images still limits their application, which suggests that the use of higher resolutions may have large potential in mapping forest degradation [413].
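
These indices are simple band combinations. As a sketch, the NDVI and one common NDWI formulation can be computed from surface reflectances as follows (the reflectance values are hypothetical):

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def ndwi(nir, swir):
    """One common normalized difference water index formulation (NIR vs.
    shortwave IR), sensitive to vegetation water content."""
    return (nir - swir) / (nir + swir)

# Hypothetical reflectances for a healthy vs. a degraded canopy pixel:
print(ndvi(nir=0.45, red=0.05), ndvi(nir=0.30, red=0.10))
print(ndwi(nir=0.45, swir=0.20), ndwi(nir=0.30, swir=0.25))
```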

In addition, various modeling tools have been suggested for mapping soil moisture, evapotranspiration, and moisture stresses based on thermal remote sensing data. For example, Anderson et al. [421, 422] investigated the use of TIR remote sensing to monitor evapotranspiration and moisture stress fields at continental scales based on improvements of the Atmosphere-Land Exchange Inverse (ALEXI) model [423]. Also, mapping of evapotranspiration and moisture stresses and identification of droughts at continental, regional, and local scales can be accomplished by properly utilizing the suite of TIR sensors available from the Geostationary Operational Environmental Satellites (GOES) and the Landsat series of satellites [424, 425]. By combining a number of TIR images retrieved from instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra and Aqua satellites, AVHRR, and ASTER with models such as ALEXI (for coarser spatial scales) and DisALEXI (a disaggregation algorithm to obtain a finer spatial resolution), useful products for mapping evapotranspiration at the ~100-m scale have been developed [424, 425].

Snowmelt and glacier melt are important sources of water for many parts of the world. Snow cover, depth, and density can be estimated by satellite remote sensing. For example, optical remote sensing of snow cover has been carried out since the 1970s using the Landsat series of sensors, and more recently NASA's MODIS instrument on the Terra (since 1999) and Aqua (since 2002) satellites and NOAA's Interactive Multisensor Snow and Ice Mapping System (IMS) provide 500-m resolution snow cover products that are available at the National Snow and Ice Data Center (NSIDC, http://nsidc.org). Although discriminating between snow and clouds is still a concern (e.g., [426]), some validation studies (e.g., [427, 428]) suggest a good potential for hydrologic applications. Likewise, glacier dynamics have been widely studied by airborne and spaceborne sensors; Table 1 in Gao and Liu [429] gives details of remote sensors that may be useful in glaciology. Both aerial photography and satellite images are used to map the areal extent of glaciers and monitor their temporal evolution [429]. For example, Kääb [430] used ASTER and Shuttle Radar Topography Mission (SRTM) data to estimate glacier dynamics in the East Himalaya, and SRTM and SPOT satellite images have also been used for mass balance studies of some glaciers in India [431].

In addition, passive and active microwave radiation has been useful for determining snow extent, depth, and density, and consequently the snow water equivalent (SWE) (e.g., [432]). For example, the scanning multichannel microwave radiometer (SMMR) launched in 1978 has been used for retrieving SWE at the global scale [433]. Microwave radiation is related to various properties of the snow, such as the number of snow grains and the packing of the grains [434], and thus is a function of snow depth and density. SWE algorithms (e.g., simple linear regression equations) have been developed using spaceborne microwave radiometer data for both open spaces and areas with forest cover. However, high-resolution (~100-m scale) SWE data are not available from current space systems, and radar technologies are being developed to fill this gap so that data retrieval will be able to capture the effects of topographic features and variations of wind [435]. Several uncertainties are involved in estimating SWE from space sensors; Dong et al. [433] examined satellite-derived SWE errors associated with several factors, such as snow pack mass, distance to significant open-water bodies, forest cover, and topographical factors, using SMMR data. Also, the use of data assimilation based on the Kalman filter has been suggested for estimating snowpack properties [436]. Furthermore, it has been reported that signals transmitted from global positioning system (GPS) satellites can be utilized for retrieving SWE [437].
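
A minimal sketch of such a regression-type retrieval, patterned after the classic brightness-temperature-difference algorithms (the coefficient below is illustrative only; operational values vary with region, snow properties, and forest cover):

```python
def swe_from_tb(tb_18h, tb_37h, a=4.8):
    """Illustrative Chang-type retrieval: deep dry snow scatters away more
    37-GHz radiation than 18-GHz radiation, so the brightness-temperature
    difference (K) scales roughly linearly with SWE (mm). The coefficient
    'a' (mm/K) is illustrative, not an operational calibration."""
    return max(a * (tb_18h - tb_37h), 0.0)

# Hypothetical brightness temperatures from the two channels:
print(swe_from_tb(tb_18h=245.0, tb_37h=225.0))   # -> 96 mm of SWE
```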

Furthermore, spaceborne technology has been developed aimed at measuring (estimating) the total amount of water in the earth system, particularly the surface water (soil moisture and snow) and the subsurface water (groundwater), based on the Gravity Recovery and Climate Experiment (GRACE) satellites (e.g., [438–440]). The joint use of the Global Land Data Assimilation System (GLDAS), which gives estimates of surface waters, and GRACE enables the estimation of groundwater storage (NASA website, 2011). For example, these techniques have been applied to estimate the variations of the total water storage for the Texas river basins that drain to the Gulf of Mexico; the Rio Grande, Arkansas, and Red rivers; and California's Central Valley system. Also, this technology has been applied to quantify the current rates of groundwater depletion in northwestern India, the Middle East, and Africa (e.g., [441, 442]) and has been included as an additional input to identify drought severity (Drought Monitor website). Green et al. [443] reviewed additional applications and the history of GRACE.
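
The underlying bookkeeping is a simple storage residual: the groundwater anomaly is obtained by subtracting the modeled surface storages from the GRACE total water storage anomaly. A minimal sketch with hypothetical monthly anomalies:

```python
def groundwater_anomaly(tws, soil_moisture, swe, canopy=0.0):
    """Groundwater storage anomaly as a residual:
    dGW = dTWS(GRACE) - [dSoilMoisture + dSWE + dCanopy](GLDAS),
    all in the same units of equivalent water thickness (e.g., cm)."""
    return tws - (soil_moisture + swe + canopy)

# Hypothetical monthly anomalies for one basin (cm of equivalent water):
print(groundwater_anomaly(tws=-4.2, soil_moisture=-1.1, swe=-0.3))  # -> -2.8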

8.4 Advances in Measuring Large River Systems

Conventional methods for measuring and estimating surface waters are well known (e.g., [52, 444]). For estimating surface waters in streams, for example, hydrometric stations are located at an appropriate cross section to register water levels (H) using recording or non-recording gauges, which are then converted into water discharge (Q) by using appropriate relationships between Q and H (rating curves). Such relationships are developed by measuring stream water velocities and depths at a number of points across the stream cross section, which allows estimating the water discharge. Likewise, the hydrometric station can be used to measure sediment concentrations and other water quality parameters as the case may be. Thus, hydrologic services worldwide generally maintain networks of hydrometric gauging stations to make systematic observations and quantify streamflow variations through time.
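
Rating curves are commonly given the power-law form Q = a(H − h0)^b, where h0 is the stage of zero flow. The following sketch fits such a curve to a handful of hypothetical gaugings:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical gaugings: stages H (m) and measured discharges Q (m^3/s)
H = np.array([0.8, 1.1, 1.6, 2.2, 2.9, 3.5])
Q = np.array([12.0, 25.0, 62.0, 130.0, 240.0, 350.0])

def rating(h, a, h0, b):
    """Power-law rating curve Q = a*(H - h0)**b, the usual functional form."""
    return a * (h - h0) ** b

# Bounds keep h0 below the lowest gauged stage so the power is well defined.
params, _ = curve_fit(rating, H, Q, p0=(20.0, 0.3, 1.8),
                      bounds=([0.0, 0.0, 1.0], [1e3, 0.7, 3.0]))
a, h0, b = params
print(f"Q = {a:.1f}*(H - {h0:.2f})^{b:.2f}")   # converts future stages to flow
```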

However, for measuring streamflows and other properties such as sediment transport in large river systems such as the Amazon River, such conventional methods are quite limited. Thus, over the past decades, interest in developing special equipment and methods for measuring large rivers has grown (e.g., [445–449]). In the 1990s, a number of studies were made to improve the measurement of discharges of the Amazon River, and a joint effort of Brazilian and French hydrologists introduced the Doppler technology with good results (e.g., [450, 451]). The study by Filizola and Guyot [452] describes the use of the Acoustic Doppler Current Profiler (ADCP) for streamflow measurement in the Amazon at a gauging station near Obidos, Brazil (the ADCP uses the Doppler effect by transmitting sound at a fixed frequency and receiving the echoes returning from sound scatterers in the water). For example, they reported that the water discharge and suspended sediment load on March 24, 1995 were 172,400 m³/s and 3.15 × 10⁶ tons/day, respectively. Details of the equipment and methods used for measuring and estimating the river discharge and suspended sediment can be found in Filizola and Guyot [452]. More recently, Laraque et al. [453] reported additional studies of mixing processes at the confluence of a major tributary of the Amazon River (near Manaus), also using ADCP. The ADCP technology is also being used for measuring river discharges and suspended sediment in major rivers of the Andean countries (Peru, Ecuador, and Bolivia). As an illustration, Fig. 1.32 shows personnel of SENAMHI (Peru) and IRD (France) with ADCP equipment for streamflow measurements in the Huallaga River, Peru.
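
The ADCP's basic measurement principle is the Doppler shift: a scatterer moving with radial velocity v shifts the echo frequency by approximately f_d = 2 f0 v / c, where f0 is the transmit frequency and c the speed of sound in water. A minimal sketch with illustrative values:

```python
C_SOUND = 1_480.0          # approximate speed of sound in fresh water (m/s)

def radial_velocity(f0_hz, doppler_shift_hz):
    """Invert the backscatter Doppler relation f_d = 2*f0*v/c to obtain the
    velocity of the scatterers (and hence the water) along the beam."""
    return doppler_shift_hz * C_SOUND / (2.0 * f0_hz)

# A 600-kHz ADCP observing a 1,000-Hz Doppler shift (hypothetical values):
print(radial_velocity(600_000.0, 1_000.0))   # -> ~1.23 m/s along the beam
```

Velocities along several beams are combined into a velocity profile, which, integrated over the measured depths across the transect, yields the discharge.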

Fig. 1.32

(a) Staff of SENAMHI-Peru and IRD-France measuring discharge in the Huallaga River (Peru) using ADCP, (b) Huallaga River, (c) ADCP installed in the boat, and (d) transect of the ADCP, velocity grids, and measuring sections (source: Jorge Carranza SENAMHI-Peru)

8.5 Using Dendrohydrology for Extending Hydrologic Data

Dendrohydrology is the analysis and application of tree-ring records for hydrologic studies [454]. Trees are useful for reconstructing streamflows because they are sensitive recorders of natural climate variability; tree-ring growth is affected by the same set of climatic factors (e.g., precipitation and evapotranspiration) that affect streamflows [455]. Dendrohydrology started in western North America primarily using ring-width time series to extend gauged records of streamflows [456]. Tree-ring records have been used to extend the short records of a number of hydrologic processes such as streamflows [457], precipitation [458], soil moisture [459], and SWE [460]. An extensive review of dendrohydrology was made by Loaiciga et al. [461]. The reconstructed streamflow records enable one to observe a wider range of flow scenarios than may be obtainable from the historical records alone. For example, Woodhouse [462] observed that the reconstructed streamflows of Middle Boulder Creek showed that the low-flow events that occurred in the past were more persistent than those found from the analysis of the historical records. Other studies of tree-ring reconstructed flows similarly indicate that droughts of more severe magnitude and longer duration occurred in the past compared to droughts during the historical period (e.g., [455, 457, 463–465]).

Several record extension models have been employed in the literature to extend streamflow records using tree-ring index data. Among them are the traditional multiple linear regression model (e.g., [457, 462, 466, 467]), principal component analysis (e.g., [457, 463, 467]), and transfer function models (e.g., [468]). For example, Woodhouse [462] used multiple linear regression models and the stepwise regression technique to select the tree-ring indices for reconstructing the streamflows of Middle Boulder Creek, Colorado. Furthermore, Tarawneh and Salas [464] developed a record extension technique based on multiple linear regression with noise and spatial disaggregation to reconstruct the annual streamflows of the Colorado River at all 29 flow sites.
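
The essence of these regression-based reconstructions can be sketched as follows: calibrate a regression of gauged flow on the tree-ring chronologies over the overlap period and then apply it to the full length of the chronologies. All data below are synthetic; real studies add predictor screening, cross-validation, and variance adjustments such as the noise term of [464]:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Hypothetical data: five tree-ring index chronologies spanning 400 years
# and a gauged streamflow record overlapping only the last 80 years.
rings = rng.standard_normal((400, 5))
flow_obs = 500 + 120 * rings[-80:, :2].sum(axis=1) + 40 * rng.standard_normal(80)

# Calibrate the regression on the overlap (instrumental) period ...
model = LinearRegression().fit(rings[-80:], flow_obs)
print("calibration R^2:", round(model.score(rings[-80:], flow_obs), 2))

# ... then apply it to the full chronology to reconstruct pre-gauge flows.
flow_reconstructed = model.predict(rings)
```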

8.6 Developments in Hydrologic Information Systems

Hydrologic information has been collected by many entities and national and international organizations worldwide for a variety of purposes, such as evaluating water resources availability in various regions and countries, water resources development in river basins, geo-environmental investigations, detecting the effect of human interventions on hydro-environmental systems, and studying the impact of climate variability and change on the water resources and the environment of river basins. Several years ago, the US National Science Foundation supported the creation of the Consortium of Universities for the Advancement of Hydrologic Sciences (CUAHSI) and also funded the Hydrologic Information System (HIS) project for sharing hydrologic data. The HIS will consist of databases that are integrated and connected through the Internet and web services for data discovery, access, and publication [469–471]. An example is HydroServer, a computer server comprising a collection of databases, web services, tools, and software applications that allows data producers to store, publish, and manage their data from a project site (Tarboton et al. [471]). Current efforts in various directions were summarized in the CUAHSI Conference on Hydrologic Data and Information Systems convened at Utah State University on June 22–24, 2011. For example, a framework is currently being developed through which hydrologic and atmospheric science data can be shared, managed, discovered, and distributed (e.g., [472]).

9 Acknowledgements

We would like to acknowledge the collaboration received from Dr. Jose L. Chavez, Assistant Professor at Colorado State University, and the peer review by Dr. Wade Crow. Their suggestions and comments improved the final content of the chapter. In addition, the main author acknowledges the continuous support from the NSF, particularly the ongoing project P2C2: Multi-Century Streamflow Records Derived from Watershed Modeling and Tree Ring Data (ATM-0823480).