Background & Summary

The seasonal variability of streamflow has led societies to rely on built infrastructure, such as levees and dams, for flood control, water supply, crop production, and clean electricity1,2,3,4. With extreme events increasing under a changing climate, reliable hydrological predictions are key to improving strategic planning and the operation of water infrastructure5,6,7,8,9,10. Large-scale land surface models (LSMs) have long been essential tools for predicting future hydrology. LSMs are used in Earth-system model frameworks to link land surface processes with other, interacting processes to predict the impacts of a changing climate and evolving human systems11,12,13,14. Here we focus on one of the most widely used LSMs, the latest version of the Community Land Model (CLM), CLM515. CLM5 is the land component of the Community Earth System Model, the Euro-Mediterranean Center on Climate Change coupled Earth System model16, and the Norwegian Earth System Model17. Because CLM5 is structurally complex and computationally expensive, limited attention has been given to addressing uncertainties in its default hydrological parameters and how these uncertainties might impact hydrological predictions and subsequent decision-making18,19,20.

In practice, CLM5 users typically adopt the default parameter values provided by developers. These values are estimated based on limited/empirical data or calibrated deterministic values reported in the literature for a limited number of basins21. Moreover, prior hydrological calibration efforts for LSMs frequently use only one error metric (e.g., Nash-Sutcliffe Efficiency [NSE])21,22,23, which narrows their focus to one aspect of the flow duration curve (i.e., high flows) and can lead to significant inadvertent biases in hydrological predictions. Neglecting parameter uncertainties can also lead to biased decision-making. For example, ignoring parameter uncertainty in riverine flood prediction biases homeowners’ house-elevation decisions, potentially resulting in higher projected economic costs24. Similarly, ignoring parameter uncertainty in crop yield projection under climate change biases crop insurance policies25. As a result, uncertainty characterization (UC) of hydrological parameters in LSM predictions is critical to informing how model parameterization influences model outcomes and applications26. For this work, we define UC as “model evaluation under alternative hydrological parameterization hypotheses to explore their implications for model output uncertainty”27.

To support the broad adoption of UC in CLM5 applications, we developed benchmark CLM5 hydrological datasets based on extensive UC of CLM5 hydrological parameters for 464 basins that are part of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS)28,29 basins over the conterminous United States (CONUS). The original CAMELS data set includes 671 headwater-type basins with minimal human influence across the CONUS. CAMELS provides basin area information from two different sources: the national geospatial fabric polygon30 and the United States Geological Survey Geospatial Attributes of Gages for Evaluating Streamflow version II database31. Following the recommendation of Addor, et al.28 not to use basins with large area discrepancies between the two sources, we identified 464 out of the 671 basins with a basin area relative difference of less than 2% as suitable for CLM5 evaluation.
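The area-consistency screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the basin IDs and areas are hypothetical, and the relative difference is assumed to be taken with respect to the geospatial fabric area because the text does not state the denominator.

```python
def relative_area_difference(area_fabric_km2, area_gages2_km2):
    """Relative difference between two basin-area estimates.

    Assumption: the geospatial fabric area is used as the denominator.
    """
    return abs(area_fabric_km2 - area_gages2_km2) / area_fabric_km2


def select_consistent_basins(basin_areas, max_relative_difference=0.02):
    """Keep basins whose two area estimates differ by less than the threshold."""
    return [
        basin_id
        for basin_id, (area_fabric, area_gages2) in basin_areas.items()
        if relative_area_difference(area_fabric, area_gages2) < max_relative_difference
    ]


# Hypothetical basins: id -> (geospatial fabric area, GAGES-II area), in km^2.
example_basin_areas = {
    "basin_A": (436.0, 438.0),
    "basin_B": (100.0, 110.0),
    "basin_C": (1200.0, 1195.0),
}
selected_basins = select_consistent_basins(example_basin_areas)
```

In this toy example, `basin_B` is dropped because its two area estimates differ by 10%.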

Five common meteorological forcing datasets are also used to characterize the forcing data selection effects. As shown in Fig. 1, the datasets consist of three parts for each meteorological data type:

  1. Performance of CLM5 default hydrological parameters on hydrological predictions using 28 error metrics that capture different flow regimes, evapotranspiration (ET) regimes, and extreme conditions.

  2. Large-ensemble (~1,300) hydrological CLM5 outputs that account for hydrological parameter uncertainties at each basin.

  3. Site-level and regional hydrological parameter sensitivity analysis results that clarify the parametric controls for CLM5 hydrological predictability for 28 error metrics.

Fig. 1

A schematic view of the CLM5 benchmark hydrological datasets. In step 2, about 1,300 ensemble parameter sets are generated using a Latin Hypercube Sampling method to produce about 1,300 ensemble time series and error metrics. The same ensemble parameters and error metrics are used in step 3 to generate at-site and regional parameter sensitivity scores as well as behavioral sensitive parameters.

The 28 error metrics provide a diagnostic evaluation of how closely the model simulates watershed behavior and support the application of CLM5 in a wide range of studies such as flood and drought prediction, reservoir operation and management, hydrological prediction under anthropogenic influence, etc. For instance, reservoir modelers prioritize capturing monthly flows and annual water balances, while ecosystem modelers generally emphasize the importance of predictions pertaining to seasonal low flow or general low flow regimes. In the error metrics dataset, users can select the metric of interest or a weighted multi-objective metric depending on the application.

Although the datasets are generated at gauged CAMELS basins, the full set of 464 basins is clustered to facilitate regional-scale analysis and extend the results to ungauged basins/grid cells over the CONUS. These datasets are intended to offer guidance for future CLM5 hydrological applications, including parameter calibration, by reducing parameter dimensionality, identifying the behavioral values of sensitive parameters, characterizing forcing selection effects, and diagnosing potentially inadequate model structure and parameterization.

Methods

CLM5 configuration data

Observational datasets used for CLM5 UC include unregulated daily flow observations for 1980–2014 from the CAMELS dataset, which consists of headwater-type basins with minimal human impacts over the CONUS (Fig. 2a). Monthly ET data at a 0.05° grid resolution are acquired from the Moderate Resolution Imaging Spectroradiometer (MODIS) products32. The basins range in size from about 4 to 25,791 km², with a median basin size of about 436 km². The basin mean elevations range from about 15 m in the Delaware to 3,529 m in the Southern Rocky Mountains, with a median elevation of 458 m.

Fig. 2

(a) The 464 CAMELS basins and seven clusters defined by the reproducible k-means++ algorithm. (b) CONUS 1/8° grid cells placed into the same seven clusters. White areas indicate lakes and wetlands, which are excluded from the clustering.

The five common gridded meteorological forcing datasets include data from Phase 2 of the North American Land Data Assimilation System (NLDAS-2)33, Parameter-elevation Regressions on Independent Slopes Model (PRISM)34, Daymet35, Livneh36, and dynamically downscaled European Centre for Medium-Range Weather Forecasts Reanalysis v537 using the Weather Research and Forecasting (WRF-ERA5) model38.

Both NLDAS-2 and WRF-ERA5 include hourly precipitation, air temperature, wind speed, surface pressure, specific humidity, and shortwave and longwave radiation data at a 1/8° grid cell over the CONUS. The Livneh data provide daily precipitation, maximum and minimum temperature, and wind speed information at a 1/16° grid cell over the CONUS. Livneh wind speed data are acquired from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis39. PRISM and Daymet data provide daily precipitation as well as maximum and minimum temperature information at 4 km and 1 km grid cells over the CONUS, respectively. We use the Mountain Micro Climate Simulator algorithm40 to disaggregate daily Livneh, PRISM, and Daymet data to an hourly scale and to generate surface pressure, specific humidity, and shortwave and longwave radiation data. Because wind speed data are not provided in PRISM and Daymet data, wind speed is taken from the NLDAS-2 data. The NLDAS-2 data are based on the North American Regional Reanalysis41, a major improvement over the earlier NCEP-NCAR reanalysis. All temporal disaggregation is done using the open source Python package MetSim42.

The land surface data including land unit type, soil properties, and plant functional type are acquired from the CLM5 input dataset for the CLM5 configuration setting at a 1/8° grid cell over CONUS13. The CLM5 land surface data are derived from a variety of sources such as the Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Continuous Fields product, the Global Land One-km Base Elevation Project, and the International Geosphere-Biosphere Programme, among others15. In addition to the CLM5 land surface data, we also include the 1-km grid cell baseflow index43 (upscaled to 1/8° grid cell) over the CONUS for basin clustering. At each CAMELS basin, we estimate the basin mean meteorological forcing, ET, land surface data, and baseflow index from the overlapped grid cells using the area-weighted average method.
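The area-weighted basin average mentioned above can be illustrated with a short sketch. The values below are hypothetical, and the overlap areas between each grid cell and the basin polygon are assumed to have been computed beforehand (for example, with a GIS tool).

```python
def area_weighted_mean(cell_values, overlap_areas):
    """Average grid-cell values, weighting each cell by its overlap area with the basin."""
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("Total overlap area must be positive.")
    weighted_sum = sum(value * area for value, area in zip(cell_values, overlap_areas))
    return weighted_sum / total_area


# Hypothetical example: three grid cells that partially overlap one basin.
cell_precipitation_mm = [2.0, 4.0, 6.0]
cell_overlap_km2 = [50.0, 100.0, 50.0]
basin_mean_precipitation = area_weighted_mean(cell_precipitation_mm, cell_overlap_km2)
```

The same function applies to any gridded quantity (forcing, ET, land surface attributes, or baseflow index), since only the weights depend on the basin geometry.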

Basin clustering

A total of 22 physical features are selected for each CAMELS basin for clustering (Supplementary Table 1). We classify the 22 features into five categories (topography, land use, soil properties, climate, and other) depending on their function44. Several features within each category are highly correlated (i.e., pairs of features that exhibit a Pearson correlation coefficient >0.7). We remove these redundant features and select one representative feature from each correlated group, adding them to independent features that are not strongly correlated with any others. For example, ELEV and STD_ELEV in the “Topography” category are highly correlated, so only ELEV is used in the clustering. SOIL_COLOR is not strongly correlated with other features within the “Soil” category, but is strongly correlated with SLOPE in the “Topography” category. Thus, we do not keep SOIL_COLOR in the clustering analysis. We use a final total of 17 features in the clustering. Note that we do not include streamflow as a clustering criterion. This allows the clustering analysis to be applied to areas of the CONUS where no flow records are available. We use k-means++ clustering45,46 with the bootstrapping method to find a stable and reproducible clustering system.
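A correlation screen of this kind might look like the sketch below. It is an illustration only: the greedy keep-first rule is an assumption (the text does not say how the representative feature of each correlated group was chosen), the feature values are made up, and `FOREST_FRAC` is a hypothetical feature name.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def drop_correlated_features(features, threshold=0.7):
    """Greedy screen: keep a feature only if |r| <= threshold with every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_correlation(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values for five basins.
example_features = {
    "ELEV": [100.0, 500.0, 900.0, 1500.0, 3000.0],
    "STD_ELEV": [10.0, 60.0, 80.0, 160.0, 290.0],
    "FOREST_FRAC": [0.8, 0.2, 0.9, 0.1, 0.5],
}
kept_features = drop_correlated_features(example_features)
```

Because the rule is order dependent, listing the preferred representative first (here `ELEV` before `STD_ELEV`) determines which member of a correlated pair survives.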

Multiple cluster numbers (3 to 10 clusters) are tested in the clustering process to identify the optimal number of clusters. First, we randomly partition 90% of the 464 basins as training sets and leave the remaining 10% as validation sets for each cluster number. We then bootstrap 70% of the training sets 40 times and build 40 clustering models. Finally, we classify the validation sets and select the cluster number with the highest reproducibility based on four cluster similarity indices: (1) the Rand Index47, (2) the Adjusted Rand Index48, (3) the Jaccard Index49, and (4) the Fowlkes–Mallows Index50. Our results suggest that seven clusters yield the highest similarity measures for all four indices. Therefore, we use seven clusters for regional analysis (Fig. 2a). Figure 2b shows the 50,629 1/8° grid cells over the CONUS grouped into the 7 corresponding clusters.
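To make the reproducibility check concrete, the sketch below computes three of the four pair-counting indices (Rand, Jaccard, and Fowlkes–Mallows) for two labelings of the same validation basins. The label vectors are hypothetical, the Adjusted Rand Index is omitted for brevity, and how the authors aggregated the indices across the 40 bootstrap models is not specified in the text.

```python
import math
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count basin pairs by whether each clustering puts them in the same cluster."""
    both_same = only_a_same = only_b_same = both_different = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both_same += 1
        elif same_a:
            only_a_same += 1
        elif same_b:
            only_b_same += 1
        else:
            both_different += 1
    return both_same, only_a_same, only_b_same, both_different


def similarity_indices(labels_a, labels_b):
    """Rand, Jaccard, and Fowlkes-Mallows indices from pair counts."""
    a, b, c, d = pair_counts(labels_a, labels_b)
    return {
        "rand": (a + d) / (a + b + c + d),
        "jaccard": a / (a + b + c),
        "fowlkes_mallows": a / math.sqrt((a + b) * (a + c)),
    }


# Hypothetical cluster labels assigned to six validation basins by two bootstrap models.
# Label values need not match across models; only co-membership matters.
labels_model_1 = [0, 0, 1, 1, 2, 2]
labels_model_2 = [1, 1, 0, 0, 2, 0]
indices = similarity_indices(labels_model_1, labels_model_2)
```

All three indices equal 1 when the two models group the basins identically, so higher values indicate a more reproducible cluster number.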

CLM5 hydrological parameters

We used the CLM5 Perturbed Parameter Ensembles version, recently developed at NCAR51, to perform land surface simulations and produce hydrological datasets. The CLM5-Perturbed Parameter Ensembles configuration allows users to perturb default parameter values. For spatially distributed parameters such as soil porosity and hydraulic conductivity, spatially uniform scaling factors are introduced to preserve the underlying structure. Parameters related to hydrological processes in CLM5 can be classified into six groups: (1) canopy water, (2) surface water, (3) soil water, (4) subsurface water, (5) snow, and (6) evaporation. In this study, we include parameters that cover all six groups in an attempt to gain a comprehensive understanding of the role of CLM5 hydrological parameters in hydrological predictions. Based on previous studies18,19,20 and discussions with CLM5 core developers (i.e., the co-authors D. Kennedy and S. Swenson), we identified 15 hydrological parameters that likely have dominant impacts on the simulation of surface and subsurface runoff, evaporation, canopy water, snow, and soil moisture. Table 1 shows the default parameter values and their prior ranges based on the expert judgement of CLM5 developers.

Table 1 The 15 selected hydrological parameters, relevant processes, default values, and prior ranges.

Ensemble simulation and sensitivity analysis

CLM5 is configured for each basin for ensemble simulation. For each basin, we sample 1,500 parameter sets from their uniform prior distributions using the Latin Hypercube Sampling (LHS) method52, which can effectively sample full parameter ranges by dividing the parameter space evenly for representative sample draws. This results in a total of 1,500 × 464 × 5 = 3,480,000 CLM5 simulations. For the default and each ensemble parameter set, we run CLM5 in the satellite phenology mode for 2005–2014. This 10-year simulation period represents the CONUS flooding climatology53 and contains extreme hydrological events, which are important for characterizing CLM5 predictability and uncertainty in simulating extreme events. These events include major flooding and droughts such as the 2005 Pacific Northwest drought, the 2012 central Great Plains drought, and the 2012–2016 California exceptional drought. Before the 10-year simulation, each CLM5 run was spun up for 25 years to equilibrate all states54. All simulations were performed on the National Energy Research Scientific Computing Center (NERSC) Cori high-performance computing (HPC) system.
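A minimal Latin Hypercube Sampling sketch for uniform priors is shown below. It is illustrative only: the parameter names and ranges are hypothetical placeholders (the actual 15 parameters and ranges are listed in Table 1), and the authors' sampling code may differ.

```python
import random


def latin_hypercube_sample(parameter_ranges, n_samples, seed=0):
    """Draw n_samples parameter sets, one value per equal-probability stratum per parameter."""
    rng = random.Random(seed)
    samples = [dict() for _ in range(n_samples)]
    for name, (lower, upper) in parameter_ranges.items():
        # One uniform draw inside each of the n_samples strata of [0, 1).
        unit_draws = [(stratum + rng.random()) / n_samples for stratum in range(n_samples)]
        # Shuffle so that strata are paired randomly across parameters.
        rng.shuffle(unit_draws)
        for sample, unit_value in zip(samples, unit_draws):
            sample[name] = lower + unit_value * (upper - lower)
    return samples


# Hypothetical parameters and prior ranges.
example_ranges = {"param_a": (0.0, 1.0), "param_b": (10.0, 20.0)}
parameter_sets = latin_hypercube_sample(example_ranges, n_samples=10)

# Total number of CLM5 runs implied by the design described in the text:
# 1,500 parameter sets x 464 basins x 5 forcing datasets.
total_simulations = 1500 * 464 * 5
```

Each parameter's range is split into as many strata as there are samples, and every stratum is sampled exactly once, which is what lets a modest ensemble cover the full prior range.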

Due to parameter interactions that may result in nonphysical states and failed runs, our goal was to obtain at least 1,000 successful CLM5 simulations for each of the 464 CAMELS basins for each forcing dataset. We found that about 10% of the 1,500 parameter sets failed to converge for several basins for each meteorological forcing, resulting in ~1,300 successful CLM5 runs in each basin for the parameter uncertainty characterization and sensitivity analysis for each meteorological forcing. Investigating the runs that failed due to water balance errors did not reveal any spatial or parameter-based patterns. All sampled parameters are within their physical ranges, but their complex interactions combined with local climates likely result in nonphysical simulated states and lead to failed runs. Different parameter sets failed in different basins and meteorological forcings, suggesting that parameter interactions vary with the basin and climate. Numerical experiments must be carefully designed to tease out the source of the error and relevant parameters for locations with different climate regimes. However, that work is beyond the scope of this study.

After producing the ensemble simulations, we use the Delta moment-independent sensitivity analysis method (Delta-MIM) to calculate the sensitivity score of the 15 hydrological parameters55,56. We selected Delta-MIM for this study because it does not require a specific sampling scheme and includes effects of high-order statistical moments in the response metrics of interest57. Delta-MIM exploits an empirical density-based measure that identifies the parameters that most influence the entire distribution of the response variable (i.e., it captures higher order interactive effects beyond mean and variance responses). For each parameter, the resulting Delta index measures the normalized expected shift in the distribution of the response variable induced by the parameter.
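For reference, the Delta index is commonly written in the following standard form. The notation is introduced here for illustration and does not appear in the original text: $Y$ is the response (an error metric), $X_i$ is the $i$-th parameter, $f_Y$ is the unconditional density of $Y$, and $f_{Y \mid X_i}$ is its density conditional on $X_i$.

```latex
\delta_i = \frac{1}{2}\, \mathbb{E}_{X_i}\!\left[ \int \left| f_Y(y) - f_{Y \mid X_i}(y) \right| \, \mathrm{d}y \right],
\qquad 0 \le \delta_i \le 1 .
```

The inner integral is the area between the unconditional and conditional densities, so $\delta_i = 0$ when $Y$ is independent of $X_i$, and larger values indicate a stronger influence on the whole distribution of the metric.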

Diagnostic error metrics

We include a total of 28 error metrics to comprehensively assess CLM5 performance, uncertainty, hydrological parameter sensitivity to different flow regimes (e.g., high/low flows, water balance, etc.), and ET characteristics at different temporal scales (e.g., seasonal and annual). Table 2 presents these metrics. Their relevant scales and mathematical descriptions are provided in the Supplementary Information.

Table 2 Description of the 28 error metrics.

Data Records

The CLM5 hydrological datasets are publicly available in comma-separated value (.csv) and NetCDF (.nc) formats and hosted in the MultiSector Dynamics – Living, Intuitive, Value-adding, Environment (MSD-LIVE) data repository58. Due to page limitations, Table 3 only provides an example of the data structures, data files, and variables. Full data descriptions can be found in the README file in the repository.

Table 3 Description of the CLM5 hydrological datasets.

Technical Validation

The accuracy and precision of the CLM5 ensemble streamflow simulations depend on partitioning the “behavioral” and “non-behavioral” parameter sets using streamflow measurements, and this partition differs for each error metric and threshold value. Simulations that produce error metrics that fall within user-defined acceptable performance metric ranges are considered “behavioral”, while those that fall outside these ranges are “non-behavioral”. In the following discussion, we use CLM5 ensemble simulations driven by the NLDAS-2 meteorological forcing data as an example and perform similar analyses for the other meteorological forcing datasets. Figure 3 shows the spread of regional monthly runoff in the 7 clusters using two different constraints to partition behavioral parameter sets: (1) annual flow bias within 10% and (2) annual flow bias within 10% and monthly NSE higher than 0.5. Despite biases in a few regions (i.e., underestimating the summer flow in Cluster 2-Pacific and a flow peak time mismatch in Cluster 4-Rockies), the behavioral ensemble simulations that satisfy either constraint substantially improve on the default-parameter simulation for all clusters and better reproduce observed flow. Using the single best performing set based on the monthly KGE metric, CLM5 skill for simulating monthly streamflow in 2005–2014 can be improved from 0.8586 with the default parameters to 0.8637 in Cluster 1-Northeast, from 0.6476 to 0.7278 in Cluster 2-Pacific, from ‒0.3448 to 0.9110 in Cluster 3-AZ/NM, from 0.4089 to 0.4750 in Cluster 4-Rockies, from ‒0.5674 to 0.8624 in Cluster 5-Great Plains, from 0.2836 to 0.7974 in Cluster 6-Midwest, and from 0.6004 to 0.9233 in Cluster 7-Southeast.
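The behavioral partition can be sketched as a simple filter on error metrics. The example below is a simplified illustration with hypothetical flows and ensemble member names: it computes the bias from total flow over a short series and the NSE on the same series, whereas the datasets provide annual and monthly metrics separately.

```python
def nash_sutcliffe_efficiency(simulated, observed):
    """NSE = 1 - sum of squared errors / sum of squared deviations of observations."""
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    observed_variability = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - squared_error / observed_variability


def relative_flow_bias(simulated, observed):
    """Relative bias of total simulated flow with respect to total observed flow."""
    return (sum(simulated) - sum(observed)) / sum(observed)


def select_behavioral(ensemble, observed, max_abs_bias=0.10, min_nse=None):
    """Return members within the bias limit and, if min_nse is given, at or above that NSE."""
    behavioral = []
    for member_id, simulated in ensemble.items():
        if abs(relative_flow_bias(simulated, observed)) > max_abs_bias:
            continue
        if min_nse is not None and nash_sutcliffe_efficiency(simulated, observed) < min_nse:
            continue
        behavioral.append(member_id)
    return behavioral


# Hypothetical observed flows and three ensemble members.
observed_flow = [10.0, 20.0, 30.0, 20.0]
example_ensemble = {
    "member_1": [11.0, 19.0, 29.0, 21.0],  # small errors, no bias
    "member_2": [20.0, 20.0, 20.0, 20.0],  # no bias, but no variability (NSE = 0)
    "member_3": [15.0, 30.0, 45.0, 30.0],  # 50% too much flow
}
bias_only = select_behavioral(example_ensemble, observed_flow)
bias_and_nse = select_behavioral(example_ensemble, observed_flow, min_nse=0.5)
```

Adding the NSE constraint removes `member_2`, which matches the total flow but not its timing; this mirrors how the second constraint narrows the spread relative to the bias-only constraint.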

Fig. 3

Regional mean monthly flow using the NLDAS-2 forcing data in the 7 clusters. The green spread indicates all ~1,300 ensemble members. The red shading indicates the spread for parameter sets that have annual flow bias within 10% of the observed flows. The blue shading indicates the spread for parameter sets that have annual flow bias within 10% of the observed flows and an NSE value of monthly flow above or equal to 0.5.

Usage Notes

The CLM5 hydrological datasets listed in Table 3 can be directly used for a wide variety of applications over different spatial scales ranging from local, to regional, to the full CONUS. We present three major data usage applications here, but the list is not exhaustive.

  1. Characterize meteorological and hydrological parameter uncertainty. For each meteorological forcing, the ~1,300 hydrological parameter sets and their ensemble simulations can be directly used to study the impacts of hydrological parameter uncertainty on hydrological predictions. One notable example is assessing the relative role of parameter uncertainty and choice of meteorological forcing in simulating different flow regimes. For projection studies, users also can assess the relative roles of hydrological parameter uncertainty and climate or land use change uncertainty on future hydrological changes. At the CAMELS basin scale, users can directly employ the ensemble streamflow prediction datasets to characterize uncertainty. For ungauged basins in the CONUS, users can find the basin cluster as shown in Fig. 2b and then approximate parameter uncertainty with the spread of regional streamflow as shown in Fig. 3.

  2. Guide hydrological parameter calibration (deterministic) and behavioral parameter selection (ensemble) at both CAMELS basins and ungauged basins. In practice, the accuracy and precision of the CLM5 ensemble streamflow simulations depend on the partitioning of behavioral and non-behavioral parameter sets. Simulations that produce error metrics that fall within user-defined acceptable performance metric ranges (e.g., NSE ≥ 0.5 in Fig. 3) are considered behavioral, while those that fall outside these ranges are non-behavioral. Figure 4 shows the sensitivity scores of the 464 basins to the annual flow bias metric and the regional sensitivity scores to 28 error metrics for Cluster 1-Northeast, using NLDAS-2 forcing data as an example. These results can aid in future CLM5 hydrological parameter calibration efforts by reducing parameter dimensionality to the sensitive parameters and identifying their behavioral values for different error metrics. At the CAMELS basin scale, users can directly select the best-performing parameter set for their metric of interest (such as seasonal or annual flow bias for reservoir modeling) to perform deterministic simulations or select ensemble behavioral parameter sets with one or more metric constraints. At ungauged basins, users first identify their basin cluster number. They then use the regional sensitivity scores, such as those in Fig. 4b, to identify the sensitive parameters and find their behavioral parameter values. The sensitivity scores for the 28 error metrics can support a wide range of hydrological applications.

    Fig. 4

    (a) The normalized sensitivity score of the 15 hydrological parameters to the annual flow bias metric at each basin in each cluster. (b) Regional normalized sensitivity score to 28 diagnostic error metrics using Cluster 1-Northeast and NLDAS-2 forcing data as an example.

  3. Aid CLM5 model developers in diagnosing potentially inadequate model structures and parameterizations. For example, Fig. 3 shows that no parameter set meets the constraint that monthly flow NSE is higher than 0.5 in Cluster 4-Rockies using the NLDAS-2 forcing data. This indicates very poor performance and suggests structural errors in the model's simulation of high flows and their timing in this region. The earlier peak flow may be related to CLM5’s lack of representation of sub-grid topographic variability and how it impacts solar radiation, which is critical to correctly timing snow melt. The small value in the depth-to-bedrock parameterization for Cluster 2-Pacific (i.e., mean value of 1.08 m) may help explain the underestimation of summer low flow due to the predicted low soil water-holding capacity.

Note that CAMELS basins are small to mid-size basins with minimal human intervention. For users who are interested in modeling the large river systems typically influenced by human activities such as reservoir operations, these data sets can produce enhanced CLM5 runoff simulations as input for downstream river routing and water management models59,60.