1 Introduction

Global climate models (GCMs) are widely used for climate studies and seasonal forecasting/simulations (Saha et al. 2006; Xue et al. 2004, 2006, 2010). However, they have difficulty simulating regional-scale climate features, especially precipitation. This is due partly to the difficulty of parameterizing subgrid cloud microphysics and convection, and partly to their inability to resolve regional-scale circulations, such as sea-land and mountain-valley breezes, and other precipitation driven by orography and surface heterogeneity. Regional climate models (RCMs), on the other hand, allow for higher spatial-resolution domains and have increasingly been applied to intraseasonal, seasonal, and interannual climate studies. The higher vertical and horizontal resolutions of RCMs provide better representations of topography and land surface heterogeneity, and RCMs are therefore able to resolve regional and local-scale physical processes and atmospheric circulations.

The term dynamic downscaling refers to numerical model simulations where a higher-resolution RCM is forced by lateral boundary conditions (LBC), taken from coarser-resolution GCM or Reanalysis products. Most dynamic downscaling studies of the United States (US) climate focused so far on the spring or summer season (e.g., Fennessy and Shukla 2000; Xue et al. 2001, 2007, 2012; Liang et al. 2004a; Bukovsky and Karoly 2009; Chan and Misra 2011). Xue et al. (2007) investigated the ability of dynamic downscaling to simulate North America’s summer precipitation under several domain sizes and boundary locations, and horizontal resolution configurations for two sets of LBC taken from the global reanalysis and the North American regional reanalysis. The results showed that downscaling ability is very sensitive to the North American domain’s southern boundary location, which was related to the proper simulation of the low-level jet. Furthermore, the sensitivity tests also showed that improved downscaling results were achieved with higher-resolution domains and higher-frequency LBC.

The impact of spatial resolution on dynamic downscaling results was also investigated in De Sales and Xue (2011) by examining the role of the Andes mountain range elevation on South America’s precipitation simulations. The results further confirmed that the more realistic topographic representation by regional climate models can significantly improve the simulation of warm and cold season precipitation of GCM, by correctly positioning moisture fluxes at the lower levels of the atmosphere. The downscaling sensitivity to other factors, for instance cumulus precipitation parameterizations, has also been investigated. Liang et al. (2004a) documented that the dynamic downscaling skillfulness in simulating the precipitation diurnal cycle over the US is also dependent on the choice of cumulus parameterization schemes because the skill of individual schemes is regime dependent. Summer rainfall amounts in the North American monsoon region are very poorly simulated by the Grell scheme but well reproduced by the Kain–Fritsch scheme, whereas rainfall amounts from moist convection in the southeast are underestimated by the former and overestimated by the latter. Dynamic downscaling experiments have also been successfully carried out over East Asia and West Africa and other geographical regions (e.g. Druyan et al. 2010; Sun et al. 2011).

These studies have shown that dynamic downscaling can significantly improve the simulation of both the temporal and spatial distribution of regional precipitation. However, winter season RCM downscaling studies have been scarce (e.g., Pielke et al. 1999; Kim et al. 2000; Waliser et al. 2011). Furthermore, in most dynamic downscaling studies, only atmospheric GCMs, which use specified sea surface temperature (SST), and reanalysis data were applied to provide LBCs for the RCMs. Studies of the downscaling abilities of fully-coupled atmosphere-ocean GCM simulations, which represent the full interactions between the oceans, land and atmosphere, have been even scarcer.

The present study aims at assessing whether additional prediction skill can be achieved by RCM dynamic downscaling of the National Centers for Environmental Prediction Climate Forecast System (NCEP CFS) winter climate forecasts over the contiguous United States from 1982 through 2004. CFS is a fully-coupled atmosphere-ocean GCM (Saha et al. 2006). To accomplish this task, the ETA regional climate model was 1-way nested in the NCEP CFS for a series of 22 winter season (from December through April) dynamic downscaling simulations. This study is part of the Multi-RCM Ensemble Downscaling of Multi-GCM seasonal forecasts (MRED) project (http://rcmlab.agron.iastate.edu/mred), which is aimed at understanding the utility of ensemble downscaling simulations of the winter regional climate over the US. Brief descriptions of the models and the detailed experimental design are presented in the next section. In Sect. 3, we present a detailed analysis of the downscaling forecast results, including the impact of ensemble size on the simulated precipitation, a comparison between the spatial distributions and temporal variations of precipitation in CFS and ETA, and their inter-annual and intra-seasonal variability. A possible explanation for the differences between the model predictions and the causes of the improved skill from dynamic downscaling are provided in Sect. 4. Finally, our conclusions are presented in Sect. 5.

2 Models, data and experimental design

The NCEP CFS hindcasts provided the LBCs for the RCM integrations in this study. The CFS became operational at NCEP in August 2004 and replaced the earlier dynamical Seasonal Forecast Model (Kanamitsu et al. 2002), which was forced by specified SSTs. CFS is a fully coupled model representing the interaction between the Earth’s oceans, land and atmosphere. The atmospheric component includes updated versions of the parameterizations of cumulus convection (Hong and Pan 1998), solar radiation transfer (Hou et al. 2002), and boundary layer vertical diffusion (Hong and Pan 1996). Land surface water and energy fluxes are calculated by the Oregon State University two-layer soil model (Mahrt and Pan 1984). As for the oceanic component, the CFS uses the Geophysical Fluid Dynamics Laboratory Modular Ocean Model version 3 (MOM3) (Pacanowski and Griffies 1998), which is a finite difference version of the ocean primitive equations under the Boussinesq and hydrostatic approximations.

Current operational CFS runs are initialized from the operational T382L64 atmospheric Global Forecast System (GFS) analyses and ocean analyses with 40 vertical levels and 1–1/3 degree horizontal resolution. CFS uses the GFS as its atmospheric component on a coarser T62L64 resolution grid, and the MOM3 as its oceanic component. A coupler in CFS exchanges heat, water, and energy between the two models (Saha et al. 2006). This study uses the CFS climatology runs, which start in the same calendar month of different past years (1982–present) and use the T62L28 Reanalysis-2 (Kanamitsu et al. 2002) as the atmospheric initial conditions and historical 40-level, 1–1/3 degree MOM3 ocean analyses as the ocean initial conditions.

The RCM utilized in this study is the NCEP limited-area ETA regional climate model with a modification in the land surface processes model (Xue et al. 2001, 2007). The NCEP ETA model has been used for research and operational purposes. This model evolved from the earlier Hydrometeorological Institute and Belgrade University model with step-like mountain vertical coordinates (Mesinger et al. 1988; Janjic 1994). The model’s code has since been upgraded to include more advanced schemes such as the Arakawa-style horizontal advection scheme (Janjic 1984), a radiation scheme based on Lacis and Hansen (1974) and Fels and Schwartzkopf (1975), and a Mellor-Yamada Level 2.5 closure scheme (1982) to represent turbulence in the planetary boundary layer and in the free atmosphere. In terms of precipitation, the model utilizes the Betts-Miller-Janjic scheme for deep and shallow moist convection (Betts 1986; Janjic 1994), and a grid-scale precipitation scheme based on Zhao and Carr (1997).

The ETA model was further modified to include the third version of the Simplified Simple Biosphere model (SSiB-3, Sun et al. 1999; Sun and Xue 2001), which, in addition to simulating processes such as runoff, vegetation and bare soil evaporation, and photosynthesis-controlled canopy transpiration, also includes a multi-layer surface snow hydrology scheme. The aerodynamic resistance values in SSiB-3 are determined in terms of vegetation properties, ground conditions, and bulk Richardson number according to the modified Monin–Obukhov similarity theory. The model is intended to realistically simulate the controlling biophysical processes and to provide fluxes of radiation, momentum, and sensible and latent heat to RCMs. Moreover, SSiB-3 ensures energy, water and momentum conservation at the atmosphere-land surface interface. Information regarding the ETA/SSiB-3 coupling, vegetation classification, and vegetation parameters can be found in Xue et al. (2001). Hereafter, the coupled ETA/SSiB-3 model will be referred to as UCLA-ETA. The UCLA-ETA model was set up on a grid with 0.25° × 0.25° horizontal resolution and 38 vertical levels covering most of central North America and the adjoining Atlantic and Pacific Oceans (Fig. 1). Different versions of SSiB-coupled ETA models have been extensively tested in seasonal experiments (e.g., Chou et al. 2002; De Sales and Xue 2006, 2011; Xue et al. 2001, 2007, 2012).

Fig. 1

The domain used for the UCLA-ETA simulations (red line) includes the entire contiguous United States in addition to most of Mexico and the Caribbean, as well as southern Canada and portions of eastern Northern Pacific and western North Atlantic

Ten-member ensembles each of 22 winter season (December through April) integrations were performed with the CFS by NCEP as part of the MRED project. Ensemble members started at 00z 21, 22, 23, 24, 25, 29, 30 November, and 01, 02, 03 December of the years between 1982 and 2003 and ended at 00z 01 May of the following year. Each of the 220 CFS hindcasts was then 1-way downscaled by the higher resolution UCLA-ETA regional model, starting from the same initial date and initial conditions provided by the CFS. The regional model’s LBCs were updated every 6 h of hindcast from the CFS output. The initial conditions for soil temperature and wetness, initial snow cover, as well as daily SST and sea ice concentrations for all experiments were also taken from the CFS results. No form of interior nudging was utilized for the UCLA-ETA hindcasts, which were carried out continuously without any restarts.
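The initialization schedule described above can be enumerated with a short script. This is only an illustrative sketch: the names `START_DAYS` and `hindcast_periods` are hypothetical and are not part of the MRED or CFS tooling.

```python
from datetime import datetime

# Start days listed in the text, as (month, day) pairs, all at 00z.
START_DAYS = [(11, 21), (11, 22), (11, 23), (11, 24), (11, 25),
              (11, 29), (11, 30), (12, 1), (12, 2), (12, 3)]


def hindcast_periods(first_year=1982, last_year=2003):
    """Return (start, end) datetimes for every member of every winter."""
    periods = []
    for year in range(first_year, last_year + 1):
        end = datetime(year + 1, 5, 1, 0)  # 00z 01 May of the following year
        for month, day in START_DAYS:
            periods.append((datetime(year, month, day, 0), end))
    return periods


periods = hindcast_periods()
print(len(periods))  # 22 winters x 10 members
```

Each tuple corresponds to one CFS hindcast and to the UCLA-ETA run nested in it.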

Several observational and reanalysis data sets are used for model evaluation, including the Oregon State University (OSU) Parameter–Elevation Regressions on Independent Slopes Model (PRISM) Climate Group monthly precipitation and surface temperature data (http://prism.oregonstate.edu, Di Luzio et al. 2008), the Climate Prediction Center’s global gauge-based analysis of precipitation and surface temperature (CPC, Chen et al. 2002), the NCEP North America Regional Reanalysis (NARR, Mesinger et al. 2006), the Global Land Data Assimilation System (GLDAS, Rodell et al. 2004) data set, and the Rutgers University Global Snow Lab data set (Dyer and Mote 2006). The OSU, CPC, and NARR data sets are available on 1/25°, 0.25°, and 0.33° latitude/longitude grids, respectively. The GLDAS and Snow Lab data sets are mapped on 1-degree resolution grids. PRISM is a knowledge-based system that uses point observational data, a digital elevation model and other geographic datasets to generate gridded estimates of atmospheric fields, and is intended to provide an improved representation of these fields in orographically sensitive areas (Daly et al. 1994; Di Luzio et al. 2008).

For comparison purposes and to comply with MRED specifications, all observational data and model output were bi-linearly interpolated to the MRED common analysis grid, defined from 124.75° to 60.0°W and from 24.75°N to 49.125°N with 0.375° horizontal resolution (referred to as the MRED grid hereafter). This grid encompasses the contiguous US, northern Mexico, and part of southeastern Canada. A sample of the MRED grid can be found at http://rcmlab.agron.iastate.edu/mred under the MRED Output Data section. The PRISM-adjusted OSU precipitation and surface temperature fields only cover the contiguous US. CPC global gauge-based temperature and precipitation analyses were used to fill in the missing-data areas of the MRED grid. The resulting merged datasets are referred to as observations hereafter. CPC’s unadjusted gauge-based precipitation analysis is also used as a second verification dataset and is referred to as CPC in the text.

3 Results

3.1 Analysis of ensemble size

Before we examine the UCLA-ETA downscaling abilities, we evaluate the impact of ensemble size on precipitation predictions. Monthly mean precipitation values (from December to April for 22 years) were first calculated for each grid point over the entire MRED grid for each of the 10 members as well as for the observations. Precipitation was chosen for this analysis because it is an end product of the model forecast and is thus strongly affected by the forecast’s initial conditions and other variables. Spatial averages of observed (Po) and modeled (Pm) precipitation for every month (t = 1,…,110) of each ensemble member (k = 1,…,10) were then calculated over the entire domain. The number of possible combinations of k elements from a set of 10 ensemble members without repetition, i.e., C(10, k), is given by

$$ C(10,k) = \frac{10!}{k!\,(10 - k)!}, \quad k = 1, \ldots, 10 $$
(1)

Using k = 4 as an example, the total number of combinations of 4 members from a set of 10 according to Eq. 1 is 210. For each model, the root mean square error (RMSE) was calculated based on observations for all 210 combinations of monthly means, which were then averaged over the 110-month period to yield the mean RMSE of all possible combinations for k = 4 members. The procedure was repeated for k ranging from 1 to 10. Results are shown in Fig. 2, in which the vertical bars indicate the standard deviation of the mean. Please note that Fig. 2a, b use different vertical scales.
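The procedure can be sketched as follows. This is one possible reading of the description, not the authors' code: for every subset of k members, the subset-mean time series is compared against observations over all months, and the resulting RMSE values are summarized by their mean and standard deviation. The function name `rmse_vs_ensemble_size` and the synthetic input data are assumptions made purely for illustration.

```python
from itertools import combinations
from math import comb

import numpy as np


def rmse_vs_ensemble_size(pm, po):
    """Mean and spread of RMSE over all member subsets of each size.

    pm: array (n_members, n_months) of domain-mean modeled precipitation.
    po: array (n_months,) of domain-mean observed precipitation.
    Returns {k: (mean RMSE, std of RMSE)} over all C(n_members, k) subsets.
    """
    n_members = pm.shape[0]
    result = {}
    for k in range(1, n_members + 1):
        rmses = []
        for subset in combinations(range(n_members), k):
            ens_mean = pm[list(subset)].mean(axis=0)  # k-member ensemble mean
            rmses.append(np.sqrt(np.mean((ens_mean - po) ** 2)))
        rmses = np.asarray(rmses)
        assert len(rmses) == comb(n_members, k)  # Eq. 1
        result[k] = (rmses.mean(), rmses.std())
    return result


# Synthetic stand-in data (NOT the study's data): 10 members, 110 months.
rng = np.random.default_rng(0)
po = rng.gamma(2.0, 1.0, size=110)
pm = po + 0.5 + rng.normal(0.0, 0.4, size=(10, 110))
stats = rmse_vs_ensemble_size(pm, po)
```

With 10 members there are only 1,023 non-empty subsets in total, so exhaustive enumeration is inexpensive. For k = 10 there is a single subset, so the spread is zero by construction.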

Fig. 2

Relationship between US-average precipitation RMSE and ensemble size for a CFS and b UCLA-ETA forecasts. Error bars represent the variability (one standard deviation) around the mean

Both CFS and UCLA-ETA show an inverse relationship between mean RMSE and ensemble size, with errors decreasing as the number of members increases. As the ensemble size increases, the errors eventually stabilize and approach an equilibrium value. For a 1-member ensemble, the mean RMSEs are approximately 1.30 and 0.67 mm day−1 for CFS and UCLA-ETA, respectively. The RMSEs are 1.23 and 0.60 mm day−1, respectively, for a 5-member ensemble, and 1.22 and 0.59 mm day−1, respectively, for a 10-member ensemble. The rate at which the error decreases with ensemble size is larger for the regional model, especially between k = 1 and k = 4. For k ≥ 8 the rate of decrease is much smaller in either model. There is also a significant difference between the models in the standard deviation around the mean RMSE. In general, CFS exhibits larger standard deviations than the regional model, which indicates larger intra-ensemble (or internal-model) variability in CFS than in UCLA-ETA. The difference is especially large for smaller ensembles. CFS’s increasing performance with increasing ensemble size is further analyzed by Saha et al. (2006).

The results above indicate that an ensemble of 10 integrations of each winter season by either model appears to be sufficient to significantly minimize uncertainties associated with the initial conditions and model internal variability. Therefore, except where explicitly indicated, 10-member ensemble means of CFS and UCLA-ETA runs are used for the remainder of this study.

3.2 Dynamic downscaling abilities

In this section, we explore the spatial distribution of precipitation, surface air temperature, and snow water equivalent (SWE). All results in this section are based on the 10-member ensemble average since a 10-member ensemble is appropriate to represent the UCLA-ETA model downscaling abilities.

Figure 3 shows the 22-year December–April average precipitation for the observations and for the CFS and UCLA-ETA predictions. The observations show two areas of high precipitation. The first lies over the Southeastern states, where average precipitation ranges between 3 and 5 mm day−1. This area extends further north to include the Mid-Atlantic and Northeastern States, with somewhat lower averages of around 3 mm day−1. The second lies over the western half of the Northwestern states, from Washington to Northern California. This area is dominated by the Cascade and Sierra Nevada mountain ranges, where precipitation totals range from 4 to 10 mm day−1. In addition, a large region of weak precipitation (<1 mm day−1) separates these two wet areas. This dry region includes most of the Central and Upper Midwest plains and the Rocky Mountain States.

Fig. 3

1982–2004 December–April average precipitation from a observation, b CFS and c UCLA-ETA hindcasts (mm day−1). Dashed vertical line in a indicates the separation between eastern and western sub-domains used in the calculation of the results displayed in Figs. 7, 8, 10, 12 and 13

Despite simulating the two distinct areas of larger rainfall total in the east and far west of the domain, the CFS tends to over-predict the precipitation throughout the domain. To more clearly show the difference between observation and hindcasts and downscaling, Fig. 4a displays the average precipitation bias for the CFS model predictions. Most of the domain exhibits positive precipitation biases for the CFS, with the largest biases located in the Pacific Northwest, around the Great Lakes area, and northern New England, where the difference between the model and the observation is larger than +3 mm day−1 at some locations. Except for the regions around the Mississippi River valley and Southern States, CFS overestimated the winter precipitation throughout the analysis domain.

Fig. 4

Spatial distribution of winter precipitation bias for a CFS and b UCLA-ETA hindcasts based on observations

The UCLA-ETA predictions, on the other hand, capture the precipitation patterns and amounts better, as shown in Fig. 3c. For most of the domain, UCLA-ETA’s biases are between −0.5 and 0.5 mm day−1, which indicates a major improvement over the CFS results (Fig. 4b). A dry precipitation bias is found over the Southern States and Lower Mississippi River valley and along the coastal mountain ranges of the West. Such a dry bias in the Southern States has also been found in downscaling studies with different regional models (e.g., Liang et al. 2004b; Pan et al. 2001). To better quantify each model’s performance, we calculated the mean, bias, RMSE and spatial correlation (SCOR) of December–April precipitation over the entire MRED grid. Results are shown in Table 1. On average, the dynamic downscaling improves the mean precipitation RMSE and SCOR by approximately 41 and 15 %, respectively, compared to the CFS predictions.
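The scores named above can be computed from two gridded fields as in the following minimal sketch. It uses standard, unweighted definitions; whether the study applied area weighting is not stated, so the equal weighting here is an assumption, and the function name `spatial_scores` is hypothetical.

```python
import numpy as np


def spatial_scores(model, obs):
    """Bias, RMSE and spatial (pattern) correlation of two 2-D fields.

    Grid points that are NaN in either field (e.g. points without
    observations) are excluded. All remaining points are weighted equally.
    """
    valid = np.isfinite(model) & np.isfinite(obs)
    m, o = model[valid], obs[valid]
    bias = np.mean(m - o)
    rmse = np.sqrt(np.mean((m - o) ** 2))
    scor = np.corrcoef(m, o)[0, 1]  # Pearson correlation across grid points
    return bias, rmse, scor


# Toy example: a model field that is uniformly 1 mm/day too wet.
obs = np.arange(12.0).reshape(3, 4)
obs[0, 0] = np.nan  # one missing observation
model = np.arange(12.0).reshape(3, 4) + 1.0
bias, rmse, scor = spatial_scores(model, obs)
```

In this toy case a uniform offset produces a bias and RMSE of 1 mm day−1 while leaving the spatial correlation at 1, which illustrates why the three measures are reported together.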

Table 1 Precipitation mean, bias, root-mean-square error (RMSE) and spatial correlation (Scorr) averaged over the contiguous US land points from observation, CFS, and UCLA-ETA forecasts

We also include in Table 1 the same statistical measures calculated against the CPC global gauge-based observational data to assess the uncertainty between observational data sets. A comparison of these two observational data sets reveals that the spatial correlations of the simulated precipitation are better for both models when the PRISM-adjusted observations are used as a reference. It is interesting to note that the PRISM-adjusted precipitation data are substantially wetter than CPC’s unadjusted data, resulting in less improvement by the UCLA-ETA downscaling. Nevertheless, both data sets show a consistent, significant improvement from the dynamic downscaling.

Winter-average surface air temperature hindcast/downscaling and the observation are displayed in Fig. 5. Because the observation and models have different topographic heights due to their horizontal resolutions, we interpolate the model’s results to the observational data’s height to remove the topographic effect, following the same method discussed by Xue et al. (1996). Without this elevation correction, CFS’s low horizontal resolution is unable to show the detailed temperature variability over the Rockies. After this topographic interpolation, both models produce similar results (Table 2) and display all major features seen in the observation (Fig. 5). Along the Rocky Mountains and in the Great Basin region, the regional model produces colder temperatures than CFS and observation (Fig. 5c). As will be shown next, such cold bias seems to be associated with a positive bias in snow cover by the UCLA-ETA. It is unclear how reliable the observational network is over these mountainous regions. On average, the regional model results yield lower temperature bias but higher RMSE than the CFS. Little improvement was attained in temperature spatial correlation with dynamic downscaling, as both models produced correlation coefficients above 90 %.

Fig. 5

1982–2004 December–April average surface temperature from a observation, b CFS and c UCLA-ETA hindcasts (°C)

Table 2 Surface air temperature mean, bias, root-mean-square error (RMSE) and spatial correlation (Scorr) averaged over the contiguous US for observation, CFS, and UCLA-ETA results

The average SWE estimate based on the Rutgers University Global Snow Lab’s mean snow depth data (Dyer and Mote 2006) for the study period is shown in Fig. 6a. We converted the snow depth to SWE with a constant bulk density of 250 kg m−3. Although the comparison between modeled and observed SWE magnitudes may not be adequate due to the lack of directly measured SWE, it is still possible to compare the spatial distributions. Most of the differences between CFS and UCLA-ETA are found in the mountainous West. The UCLA-ETA’s SWE follows the higher terrain along the major mountain ranges, including the Sierra Nevada in central California and the southern Cascades, both of which are beyond CFS’ capability due to its coarse horizontal resolution. Under the assumption of constant snow density in the observations, a comparison of Fig. 6a and c suggests that the regional model probably overestimates SWE, which would explain the cold biases along the Rocky Mountains and Great Basin (Fig. 5c). The lack of precipitation, temperature and SWE measurements in these high-elevation areas, however, makes it difficult to realistically assess model performance there. Despite its substantial wet bias in precipitation, the CFS does not show a wet bias in SWE. Instead, there is an apparent dry bias along the southern snow boundary, as shown in Fig. 6b. Nevertheless, it captures the average SWE well in some parts of the upper Midwest.
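The depth-to-SWE conversion with a constant bulk density amounts to scaling the snow depth by the ratio of snow density to liquid water density. A minimal sketch, assuming a water density of 1000 kg m−3 (the function and constant names are illustrative):

```python
RHO_WATER = 1000.0  # kg m-3, density of liquid water (assumed)


def snow_depth_to_swe(depth_m, rho_snow=250.0):
    """Convert snow depth (m) to snow water equivalent (m of liquid water).

    rho_snow is the constant bulk snow density in kg m-3; 250 is the
    value quoted in the text.
    """
    return depth_m * rho_snow / RHO_WATER


swe = snow_depth_to_swe(0.40)  # 0.40 m of snow is about 0.10 m of water
```

With a density of 250 kg m−3 the SWE is simply one quarter of the snow depth, so any real variation in snow density (for example, denser late-season snowpack) translates directly into uncertainty in the observation-based SWE magnitude.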

Fig. 6

1982–2004 December–April average snow water equivalent from a Rutgers University dataset, b CFS, and c UCLA-ETA predictions (10−3 m). A constant snow bulk density of 250 kg m−3 is used to estimate Rutgers University’s snow water equivalent based on the original snow depth values

To obtain a clearer view of which types of precipitation events contribute most to the precipitation biases in the predictions, we next examine the precipitation energy for different precipitation rates. To calculate precipitation energy, thresholding is first used to convert the precipitation fields into binary maps, which represent the occurrence or non-occurrence of precipitation events of intensity equal to or higher than the threshold. The total number of precipitation events is then tallied and averaged over the domain’s area. Precipitation energy is thus directly proportional to the number of precipitation events exceeding a given threshold and ranges from 0 to 1, with 1 indicating that precipitation of intensity equal to or higher than the threshold is found at all grid points of the domain (see De Sales and Xue 2011 for details).
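The thresholding step can be sketched as below. This is a simplified illustration of the idea as described in the text (the fraction of grid-point samples at or above each threshold), not the implementation of De Sales and Xue (2011); the function name `precipitation_energy` is hypothetical.

```python
import numpy as np


def precipitation_energy(precip, thresholds):
    """Fraction of valid grid-point samples at or above each threshold.

    precip: array (n_times, n_lat, n_lon) in mm/day; NaN marks missing data.
    thresholds: iterable of intensity thresholds in mm/day.
    Returns an array of values between 0 and 1, one per threshold.
    """
    valid = np.isfinite(precip)
    filled = np.where(valid, precip, -np.inf)  # missing points never count
    n_valid = valid.sum()
    energy = []
    for thr in thresholds:
        binary = filled >= thr  # binary map: event / no event
        energy.append(binary.sum() / n_valid)
    return np.array(energy)


# Toy field: one time step on a 2 x 2 grid.
field = np.array([[[0.0, 0.5], [1.5, 3.0]]])
energy = precipitation_energy(field, [0.0, 1.0, 2.0])
```

For the toy field the result decays from 1.0 at the 0 mm day−1 threshold to 0.5 and 0.25 at 1 and 2 mm day−1, which is the kind of decay curve compared between models and observations.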

Generally, weaker precipitation events are associated with non-convective precipitation systems, while stronger events are connected to convective precipitation systems. A comparison between the observed and modeled precipitation energy decay rates can therefore provide insight into which type of precipitation event contributed to the forecast’s dry or wet biases. For this technique to deliver realistic information, precipitation data with high temporal resolution must be used. We use 5-day mean values. Because the OSU PRISM-adjusted precipitation data are only available as monthly means, we use NARR precipitation analyses. It should be noted that NARR assimilates PRISM-corrected precipitation information into its analyses, and should therefore provide results comparable to real observations for this purpose.

To facilitate the analysis, we divide the domain into eastern and western regions. The 100°W meridian was selected as the separation line because precipitation shows different characteristics to its east and west (Fig. 3a). The different influences of the neighboring oceans and topography on precipitation in each of these regions warrant their separation for a more robust analysis. Figure 7a, b shows the precipitation energy distribution obtained for the eastern and western sub-domains. Shaded areas represent the uncertainty across the 22 years in the study. NARR precipitation energy shows a sharp decline between 0.0 and 0.6 mm day−1 in both sub-domains, and decreases at slower rates for more intense events.

Fig. 7

Average precipitation energy decomposition of 5-day precipitation over a eastern and b western sub-domains calculated from CPC gauge-based data (CPC), North American regional reanalysis (NARR), CFS and UCLA-ETA results. Grey shaded areas represent uncertainty across the 22 winters of study. Eastern and western sub-domains are separated by the 100°W meridian as indicated in Fig. 3a

In general, the CFS tends to overestimate the number of events throughout the threshold spectrum in both sub-domains, except at high-intensity thresholds. In the east, CFS produces approximately 10, 30, and 35 % more precipitation events at the 0.2, 1.0, and 2.0 mm day−1 thresholds, respectively, than the UCLA-ETA. For thresholds below 2.0 mm day−1, the regional model overestimates the events but does so less strongly than the global model. Overall, the regional model’s downscaling is more comparable to the observations for events of mid intensity. Above 2.0 mm day−1, UCLA-ETA’s precipitation energy becomes lower than NARR’s, which indicates that the model underestimates the areal coverage of those precipitation occurrences, thus resulting in dry biases at these higher thresholds. This behavior is consistent with the dry bias in the Lower Mississippi region, where more intense convective precipitation still occurs in the wintertime.

In the west, the differences between the CFS and UCLA-ETA energy distributions are even more striking (Fig. 7b). CFS hindcasts nearly 30–50 % more precipitation events than UCLA-ETA at thresholds ranging from 0.2 to 2.5 mm day−1. The UCLA-ETA decomposition is similar to NARR’s for thresholds of less than 1.0 mm day−1, but a dry bias pattern develops for stronger precipitation events. Nevertheless, the regional model results are more consistent with observations for most of the threshold spectrum. Figure 7 reveals that the CFS wet bias discussed previously resulted from an over-prediction of precipitation events across the entire precipitation intensity spectrum, except for events stronger than 5 mm day−1.

The precipitation energy decomposition analysis shows that precipitation events of different intensities benefited from the downscaling, especially weak to mid-intensity events. It also indicates that the UCLA-ETA model has a tendency to underestimate stronger events. Since stronger precipitation occurrences are often convective in nature, the results suggest a possible deficiency in the regional model’s convective precipitation parameterization for the winter season. This analysis could therefore help guide improvements to the convective precipitation schemes in the UCLA-ETA.

3.3 Precipitation inter-annual and intra-seasonal variability

The analysis in the last section shows that dynamic downscaling with the UCLA-ETA model can improve the spatial pattern of winter precipitation and SWE over the contiguous US. In this section we examine the ability of downscaling to reproduce the inter-annual and intra-seasonal variability of precipitation over the MRED domain. The goal here is to investigate whether downscaling can also improve the prediction of the temporal evolution of precipitation in addition to its spatial distribution. To facilitate the investigation, we again separate the analysis into two regions: the eastern and western sub-domains (Fig. 3a). Figure 8a, b shows time series of seasonal average precipitation over each of the sub-domains for observation, CFS and UCLA-ETA results. Vertical bars indicate the variability across the 10 ensemble members, and dashed horizontal lines indicate the 22-winter mean. The figures clearly demonstrate that CFS overestimates precipitation for every winter season, as discussed above. On average this global model produces about 45 and 90 % more precipitation than was observed in the eastern and western sub-domains, respectively. In contrast, UCLA-ETA results are more comparable to observation for every year. A reduction of approximately 46 and 60 % in precipitation RMSE was achieved with downscaling over the eastern and western sub-domains, respectively.

Fig. 8

Time series of winter mean precipitation from observation, CFS, and UCLA-ETA averaged over the eastern (a) and western (b) sub-domains. For example, 1982 refers to the average between Dec 1982 and Apr 1983. Vertical bars indicate one standard deviation among the 10 ensemble members (mm day−1). The time series’ mean, root-mean-square error (RMSE), and correlation coefficient (CORR) based on observations are also displayed

However, downscaling made little difference in terms of precipitation temporal correlations. Eastern and western precipitation correlation coefficients between modeled and observed time series are 0.45 and 0.47, respectively, for CFS, and 0.49 and 0.48, respectively, for UCLA-ETA. The coefficients are significant at the 95 % confidence level. In fact, the downscaled precipitation time series are highly correlated with those of CFS, especially in the western US, where the correlation coefficient between the two models is 0.98. This suggests that the temporal variability in the UCLA-ETA is very much controlled by the imposed LBCs.
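The significance of these coefficients can be checked with a standard t-test for a Pearson correlation. The text does not say which test was used, so the sketch below is an assumption: it treats the 22 winters as independent samples and applies a two-tailed test with 20 degrees of freedom, for which the 5 % critical value is 2.086.

```python
import math

# Two-tailed 5 % critical value of Student's t with 20 degrees of freedom.
T_CRIT_DF20 = 2.086


def t_statistic(r, n):
    """t statistic for a Pearson correlation r estimated from n samples."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations quoted in the text (CFS east/west, UCLA-ETA east/west).
quoted = {"CFS east": 0.45, "CFS west": 0.47,
          "UCLA-ETA east": 0.49, "UCLA-ETA west": 0.48}
significant = {name: t_statistic(r, 22) > T_CRIT_DF20
               for name, r in quoted.items()}
print(significant)
```

Under these assumptions all four coefficients exceed the critical value, although not by a wide margin; any serial correlation between winters would reduce the effective sample size and weaken this conclusion.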

In terms of the inter-annual variability of seasonal means, the eastern sub-domain exhibits larger variability, with an average standard deviation of 0.34 mm day−1, while the western sub-domain shows 0.22 mm day−1. The average standard deviations for CFS and UCLA-ETA are 0.16 and 0.22 mm day−1, respectively, in the eastern and 0.18 and 0.16 mm day−1, respectively, in the western sub-domain. These low standard deviation values indicate a lack of internal variance in the model results and, along with the high correlation between the models’ monthly precipitation time series, reaffirm the possible effect of the imposed CFS LBC on UCLA-ETA’s predictions. They also suggest that the 1-way downscaling technique may hinder the improvement of precipitation variability predictions by the regional model. However, a downscaling study of eleven East Asian summers found that the RCM improved the simulated interannual variability (Sato and Xue 2012). It is unclear whether the difference between these two studies is due to different regions, seasons, and/or models. Further investigation of this issue is necessary.

To further explore the impact of LBC on the downscaled inter-annual variability, we calculate anomaly correlation coefficients of precipitation at every grid point, following the methodology described by Saha et al. (2006). Should LBC be constraining the downscaling variability simulations, both models should exhibit similar anomaly correlation throughout the domain. While the correlation coefficients and standard deviations calculated from the seasonal precipitation time series described above depict the mean behavior over large regions of the eastern and western US, the anomaly correlation coefficient maps provide the geographic distribution of the models’ year-to-year variability. As shown in Fig. 9, despite localized differences, the global and regional models’ temporal anomaly correlation maps are similar overall, with areas of higher correlation over the Florida peninsula, eastern Georgia, and the Southwest. Anomaly correlation maps of 500- and 200-hPa geopotential height (not shown) are also nearly identical between CFS and UCLA-ETA, which, along with Fig. 9a, b, corroborates the assumption that LBC is a dominant factor in this regional model’s precipitation inter-annual variability.
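The grid-point calculation described above can be sketched as follows. This is a minimal illustration under one common definition of temporal anomaly correlation (anomalies taken relative to each data set’s own multi-year mean), which may differ in detail from the procedure of Saha et al. (2006). The function names and the nested-list layout field[year][lat][lon] are assumptions made for the example.

```python
import math


def anomaly_correlation(model_series, obs_series):
    """Correlation of model and observed anomalies at one grid point."""
    n = len(model_series)
    model_clim = sum(model_series) / n
    obs_clim = sum(obs_series) / n
    model_anom = [m - model_clim for m in model_series]
    obs_anom = [o - obs_clim for o in obs_series]
    num = sum(a * b for a, b in zip(model_anom, obs_anom))
    den = math.sqrt(sum(a * a for a in model_anom) * sum(b * b for b in obs_anom))
    # A grid point with no year-to-year variation has no defined correlation.
    return num / den if den > 0.0 else float("nan")


def anomaly_correlation_map(model, obs):
    """Apply the grid-point calculation to fields laid out as [year][lat][lon]."""
    n_lat, n_lon = len(model[0]), len(model[0][0])
    return [
        [
            anomaly_correlation(
                [year[j][i] for year in model],
                [year[j][i] for year in obs],
            )
            for i in range(n_lon)
        ]
        for j in range(n_lat)
    ]


# Toy input: 3 winters on a 1 x 2 grid (illustrative values, not study data).
toy_model = [[[1.0, 3.0]], [[2.0, 2.0]], [[3.0, 1.0]]]
toy_obs = [[[2.0, 1.0]], [[4.0, 2.0]], [[6.0, 3.0]]]
toy_map = anomaly_correlation_map(toy_model, toy_obs)
```

In the toy case the first grid point varies in phase with the observations and the second in anti-phase, so the map holds values near 1 and −1. Applying the same function to the CFS and UCLA-ETA fields separately would give the two maps that are compared in the text.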

Fig. 9

Spatial distribution of anomaly correlation coefficients for a CFS and b UCLA-ETA 1982–2004 December–April average precipitation (%)

We next examine the downscaling capabilities regarding the precipitation intra-seasonal variability. Figure 10 shows the 22-year average precipitation for each month for the sub-domains. The vertical bars indicate the standard deviation from the mean and represent the inter-annual variability of monthly means based on the 22-winter climatology. Tables 3 and 4 have the mean and standard deviation values for each sub-domain. In the east (Fig. 10a–c), monthly means show a small decrease in precipitation from December to January followed by a steady increase until April. As for the west (Fig. 10d–f), the precipitation intra-seasonal variability is rather flat with very little change from month to month.

Fig. 10

1982–2004 monthly mean precipitation from observations, CFS, and UCLA-ETA models averaged over the eastern (a, b, c) and western (d, e, f) sub-domains. Error bars indicate one standard deviation among the 22 years of study (mm day−1)

Table 3 Eastern sub-domain monthly precipitation means and their standard deviations in parentheses
Table 4 Same as Table 3 except for western sub-domain

Both models produce significant month-to-month variability in the east, with the downscaling results being closer to observation in every month. Only the UCLA-ETA results display the precipitation dip in January and the positive trend thereafter in the eastern sub-domain. CFS’s wet bias is evident in every month, but especially so towards the season’s end. In terms of the inter-annual variability of monthly means based on the 22-winter climatology, the models exhibit less variability than observation. In the east, the CFS and UCLA-ETA mean standard deviations are respectively 50 and 40 % less than observed, while in the west these values are approximately 30 and 40 % less, respectively. The weaker inter-annual variability in the downscaled monthly means is also believed to be related to the imposed CFS LBC.

4 Discussion

Despite the similarities in the precipitation temporal variability, the comparison between CFS and UCLA-ETA results indicates a dramatic difference in the amount and spatial distribution of precipitation between the models. Dynamic downscaling with the UCLA-ETA can improve the prediction of winter seasonal precipitation over the contiguous US by significantly lowering CFS’s average RMSE by as much as 41 % and increasing the average spatial correlation with observation by 15 %. Such differences may arise from several sources, for example, differences in resolution, topography representation, the models’ atmospheric physics, dynamic processes, and land surface processes. In this section, we conduct a preliminary analysis to investigate the source of the precipitation differences.

Analyses of mid-level and upper-level wind circulation show little difference between the models (not shown). At 200 hPa, both CFS and UCLA-ETA exhibit similar patterns in winter-average geopotential height, with a ridge of high pressure located over the Northwest, West, and northern Rockies, and a trough of low pressure over the Great Lakes, Northeast, and Mid-Atlantic regions. The spatial correlation coefficient of winter-average 200-hPa geopotential height (with zonal mean removed) against NARR is 0.99 for CFS and 0.96 for UCLA-ETA. Modeled average geopotential height patterns at the mid-levels of the troposphere are also similar. The correlation coefficients for average 500-hPa geopotential height (with zonal mean removed) between models and NARR are approximately 0.99 for both models. It has been pointed out that an RCM should, in most circumstances, at least be able to reproduce the large-scale patterns of the GCM that provided the LBCs at the upper and mid-levels of the troposphere. This is a fundamental requirement for dynamic downscaling (Xue et al. 2007), and the UCLA-ETA satisfies it. Therefore, the difference in precipitation should be produced by other processes, which play major roles in the lower troposphere.
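The spatial correlation with the zonal mean removed can be sketched as below. This is only an illustration: it assumes that the zonal mean is taken over the longitudes of the regional domain at each latitude and that grid points are weighted by the cosine of latitude, neither of which is specified in the text. The function names and the list-of-rows layout field[lat][lon] are hypothetical.

```python
import math


def remove_zonal_mean(field):
    """Subtract the mean over longitude from each latitude row."""
    return [[value - sum(row) / len(row) for value in row] for row in field]


def spatial_correlation(field_a, field_b, lats_deg):
    """Area-weighted pattern correlation of two fields after removing zonal means."""
    eddy_a = remove_zonal_mean(field_a)
    eddy_b = remove_zonal_mean(field_b)
    sum_ab = sum_aa = sum_bb = 0.0
    for row_a, row_b, lat in zip(eddy_a, eddy_b, lats_deg):
        weight = math.cos(math.radians(lat))  # assumed area weighting
        for a, b in zip(row_a, row_b):
            sum_ab += weight * a * b
            sum_aa += weight * a * a
            sum_bb += weight * b * b
    return sum_ab / math.sqrt(sum_aa * sum_bb)


# Toy geopotential-height-like fields on a 2 x 3 grid (illustrative values only).
toy_lats = [30.0, 40.0]
toy_ref = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
toy_model = [[10.0, 12.0, 14.0], [0.0, 4.0, 8.0]]
toy_scorr = spatial_correlation(toy_model, toy_ref, toy_lats)
```

Removing the zonal mean discards the dominant north-south gradient of geopotential height, so the coefficient measures agreement in the ridge and trough pattern rather than in the background field. In the toy example the two fields differ only by a row-wise offset and a scaling, so the correlation is 1.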

Accurate representation of topography is an important factor contributing to the RCM’s better performance on many occasions (e.g., Chan and Misra 2011; De Sales and Xue 2011). The higher vertical and horizontal resolutions of RCMs provide a much finer representation of topography than the global model. Precipitation events at different spatial scales and intensities respond differently to topography height in dynamic downscaling simulations (De Sales and Xue 2011). To test the impact of topography on the winter season downscaling, a sensitivity test was carried out in which the UCLA-ETA was run with the same topographic representation as the CFS model. On average, the CFS’s topography is lower than that of the regional model. For example, the average surface height of the Rocky Mountains (113°W–103°W and 35°N–45°N) in the CFS is 1,902 m, while the regional model’s higher resolution yields an average topography of 2,038 m. Should topography be the main cause of the precipitation difference, we would expect this sensitivity test to significantly degrade the UCLA-ETA performance and produce positive precipitation biases as CFS does.

Three winter integrations were performed, spanning the same period as the original integrations, from December through April. In general, the low-topography UCLA-ETA experiments produced roughly 10 % less precipitation than the original-topography runs. Average winter precipitation biases for the original and low-topography UCLA-ETA and for the CFS hindcast were −0.28, −0.41, and 1.38 mm day−1, respectively, based on CPC observations. The results from the sensitivity test suggest that although the UCLA-ETA topography contributes to the improvement, it is not the major cause of the precipitation differences discussed in previous sections, because the differences between the low and original topography runs are much smaller than the differences between the UCLA-ETA (even with low topography) and the CFS.

We next look at the land surface processes. Studies have indicated that land surface processes play a major role in downscaling results (e.g., Xue et al. 2001; Collini et al. 2008; Gao et al. 2011). To investigate this issue, we use the Global Land Data Assimilation System (GLDAS) multi-model ensemble average (Rodell et al. 2004) as the reference for land surface process estimates. We assume that the multi-model average from the GLDAS provides the best available estimate of the large-scale land surface fluxes. During winter, due to lower net radiation at the surface, seasonal surface heat fluxes over the US are rather low, with latent heat ranging from 0 to 60 W m−2 and sensible heat ranging from 0 to 80 W m−2 (Fig. 11). Latent heat flux is higher over the southeast US, especially along the Gulf of Mexico, and decreases to the north and west. Over the mountainous West and along the Canadian border, it is very low. Latent heat flux is also high along the northwest coastal region. The spatial distribution of seasonal surface latent heat flux resembles that of seasonal precipitation (Fig. 3). On the other hand, sensible heat flux is higher over the semi-arid and arid regions of the southwest US and northern Mexico, and decreases to the north and east. The lowest sensible heat fluxes are found in the Great Lakes area, the Ohio River Valley, the northern Rockies, and the Northwest.

Fig. 11

1982–2004 December–April average surface latent and sensible heat fluxes for (a and d) GLDAS, (b and e) CFS, and (c and f) UCLA-ETA (W m−2)

Both models produce similar surface net radiation seasonal averages (not shown), which is mostly controlled by the downward longwave radiation. UCLA-ETA, however, significantly improves the surface energy partitioning (Fig. 11c, f). In general, the regional model’s fluxes are more comparable to GLDAS than the global model’s fluxes, both in intensity and in spatial distribution. Domain-average latent heat fluxes for GLDAS, CFS, and UCLA-ETA are approximately 24.0, 57.6, and 26.2 W m−2, respectively. The corresponding sensible heat flux averages are 29.9, 8.3, and 34.1 W m−2. Downscaling lowers the seasonal latent and sensible heat flux RMSE by roughly 80 and 35 %, respectively, on average (Table 5).
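One compact way to summarize the energy partitioning described above is the Bowen ratio, the ratio of sensible to latent heat flux. The ratio is not used in the original analysis; it is introduced here only as an illustration, computed from the domain-average fluxes quoted in the text. The dictionary layout and function name are hypothetical.

```python
# Domain-average winter fluxes quoted in the text (W m^-2).
FLUXES = {
    "GLDAS": {"latent": 24.0, "sensible": 29.9},
    "CFS": {"latent": 57.6, "sensible": 8.3},
    "UCLA-ETA": {"latent": 26.2, "sensible": 34.1},
}


def bowen_ratio(sensible, latent):
    """Ratio of sensible to latent heat flux (dimensionless)."""
    return sensible / latent


bowen = {
    name: bowen_ratio(flux["sensible"], flux["latent"])
    for name, flux in FLUXES.items()
}
```

With these values the ratio is roughly 1.2 for GLDAS and 1.3 for UCLA-ETA, but only about 0.14 for CFS, which expresses in a single number how strongly the global model favors latent over sensible heat.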

Table 5 Latent and sensible heat fluxes’ mean, bias, root-mean-square error (RMSE) and spatial correlation (Scorr) averaged over the study domain land surface for GLDAS, CFS and UCLA-ETA

Viewed at the regional scale, the differences are even larger. For example, over the eastern sub-domain the sensible heat flux is on average approximately 6 times larger, and the latent heat flux 50 % lower, in the UCLA-ETA than in the CFS. Such large differences are consistent with the large differences seen in the seasonal precipitation. The breakdown of the seasonal average into monthly means for the eastern and western sub-domains shows more clearly the source of the large difference between the models (Figs. 12, 13). The largest differences in sensible heat flux between the CFS and the downscaling results occur in the first three months. Between December and January, the CFS produces negative fluxes, whereas the UCLA-ETA shows positive ones that are consistent with the GLDAS results (Fig. 13). Over the entire five-month period, the CFS consistently produces lower sensible heat fluxes than the UCLA-ETA. A study with the GFS, the atmospheric component of CFS, and prescribed SST also found that the land surface scheme in GFS produced lower sensible heat flux, including negative values in some areas, than the GFS coupled with the SSiB (Xue et al. 2004). Another study, by Yang et al. (2007), also concluded that the GFS model over-predicted downward sensible heat flux during winter when compared to in situ data.

Fig. 12

1982–2004 monthly mean latent heat flux for GLDAS, CFS, and UCLA-ETA results averaged over the eastern (a, b, c) and western (d, e, f) sub-domains. Error bars indicate one standard deviation across the 22 years of study (W m−2)

Fig. 13

Same as Fig. 12 except for sensible heat flux

In contrast with sensible heat flux, CFS’s latent heat flux is larger (Fig. 12), leading to larger seasonal means in both sub-domains. The lower seasonal precipitation totals in the UCLA-ETA are consistent with the lower monthly latent heat flux produced by that model, and vice versa for the CFS results. On average, downscaling reduced CFS’s precipitation and surface latent heat flux by 54 and 55 %, respectively. On the other hand, the regional model’s sensible heat flux is consistent with GLDAS and is 310 % more than the global model’s. In terms of inter-annual variability of the monthly surface fluxes, GLDAS generally exhibits larger inter-annual variability than either model. The average standard deviations of monthly latent heat flux means for GLDAS, CFS, and UCLA-ETA are 6.31, 1.80, and 3.82 W m−2, respectively, in the eastern, and 4.47, 2.89, and 2.55 W m−2, respectively, in the western sub-domain. The standard deviations for sensible heat flux means are 8.04, 2.56, and 3.85 W m−2 in the eastern and 5.78, 3.03, and 4.07 W m−2 in the western sub-domain, in the same order. A complete list of the monthly means and standard deviations of the heat fluxes can be found in Tables 6 and 7. Inter-annual variability of modeled surface fluxes is especially small in the first three months. As with precipitation, the imposed CFS LBC may also be responsible for the weak inter-annual signal in the RCM’s surface flux results.
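The percentage differences quoted above can be checked against the domain-average fluxes given earlier in this section. The short sketch below does this arithmetic, under the assumption that the percentages are relative changes with the CFS value as the reference; the function and variable names are hypothetical.

```python
def percent_change(reference, value):
    """Relative change of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Domain-average fluxes quoted in the text (W m^-2).
latent_cfs, latent_rcm = 57.6, 26.2
sensible_cfs, sensible_rcm = 8.3, 34.1

latent_change = percent_change(latent_cfs, latent_rcm)        # about -54.5 %
sensible_change = percent_change(sensible_cfs, sensible_rcm)  # about +310.8 %
```

Both results agree, to within rounding, with the roughly 55 % reduction in latent heat flux and the 310 % larger sensible heat flux reported for the regional model.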

Table 6 Eastern sub-domain monthly latent and sensible heat fluxes’ means and their standard deviations in parentheses for GLDAS, CFS, and UCLA-ETA
Table 7 Same as Table 6 except for western sub-domain

Figure 14 shows the average 850-hPa moisture flux divergence from NARR, CFS, and UCLA-ETA results over the eastern sub-domain. Due to the complex topography in the west, the calculation of low-level moisture flux convergence in that region may not be reliable, and thus it is not shown. Moisture flux divergence averages over the eastern sub-domain are −0.29 × 10−7, −0.12 × 10−7, and −0.31 × 10−7 s−1 for NARR, CFS, and UCLA-ETA, respectively. Despite its excessive precipitation, the CFS generates less lower-level moisture convergence than the UCLA-ETA and NARR. This result, in addition to the lack of significant difference in upper and mid-level wind circulation, points to surface evaporation as a possible main cause of the precipitation difference between the CFS and UCLA-ETA models.
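For reference, the horizontal moisture flux divergence on a pressure level can be approximated as sketched below. This is not the procedure used in the study, which is not described here. The sketch assumes centered finite differences on a regular latitude-longitude grid, specific humidity q in kg kg−1, and winds u and v in m s−1, which yields units of s−1. All names and the field[lat][lon] layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6


def moisture_flux_divergence(q, u, v, lats_deg, lons_deg):
    """Divergence of (q*u, q*v) in s^-1 at interior points; None on the edges."""
    n_lat, n_lon = len(lats_deg), len(lons_deg)
    lat = [math.radians(x) for x in lats_deg]
    lon = [math.radians(x) for x in lons_deg]
    div = [[None] * n_lon for _ in range(n_lat)]
    for j in range(1, n_lat - 1):
        for i in range(1, n_lon - 1):
            # Zonal term: d(q*u)/d(lon), centered difference.
            dqu_dlon = (q[j][i + 1] * u[j][i + 1] - q[j][i - 1] * u[j][i - 1]) / (
                lon[i + 1] - lon[i - 1]
            )
            # Meridional term: d(q*v*cos(lat))/d(lat), centered difference.
            north = q[j + 1][i] * v[j + 1][i] * math.cos(lat[j + 1])
            south = q[j - 1][i] * v[j - 1][i] * math.cos(lat[j - 1])
            dqv_dlat = (north - south) / (lat[j + 1] - lat[j - 1])
            div[j][i] = (dqu_dlon + dqv_dlat) / (EARTH_RADIUS_M * math.cos(lat[j]))
    return div


# Toy 3 x 3 example: uniform moisture, westerly wind weakening eastward.
toy_lats = [30.0, 31.0, 32.0]
toy_lons = [-90.0, -89.0, -88.0]
toy_q = [[0.005] * 3 for _ in range(3)]
toy_u = [[10.0, 9.0, 8.0] for _ in range(3)]
toy_v = [[0.0] * 3 for _ in range(3)]
toy_div = moisture_flux_divergence(toy_q, toy_u, toy_v, toy_lats, toy_lons)
center_value = toy_div[1][1]
```

Negative values denote convergence. In the toy case the wind slows toward the east, so moisture accumulates and the central value is negative, of order 10−8 s−1, which is comparable in magnitude to the sub-domain averages quoted above.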

Fig. 14

1982–2004 December–April average moisture flux divergence at 850 hPa (10−7 s−1) for a NARR, b CFS, and c UCLA-ETA

Negative sensible heat fluxes in the first 3 months and weak lower-level moisture flux convergence suggest that the precipitation over-predicted by the CFS may be a product of too much precipitation recycling through surface evaporation. Ruiz-Barradas and Nigam (2005, 2006) found that several GCMs tend to recycle precipitation too vigorously, an effect they refer to as “overcooking” of land–atmosphere interactions. Their regression of warm-season NARR precipitation on evaporation and moisture flux showed that precipitation in the eastern US is mostly supported by convergence of stationary moisture flux. According to their analysis, transient moisture fluxes and surface evaporation play a secondary role in precipitation (Ruiz-Barradas and Nigam 2005, 2006). Our analysis seems to suggest that the CFS may suffer from a similar problem.

To test this assumption, we calculate the temporal correlation between daily precipitation and daily latent heat flux over the eastern and western sub-domains. If the CFS recycles precipitation too strongly, it should show higher temporal correlations than the UCLA-ETA. The eastern sub-domain average correlation coefficient between precipitation and latent heat flux for CFS is 0.44, while for the UCLA-ETA it is only 0.19. Corresponding values for the western sub-domain are 0.53 and 0.36, respectively. These correlations are significant at the 95 % confidence level. CFS’s higher correlations support the “overcooking” hypothesis.

In contrast, the UCLA-ETA precipitation and land latent-to-sensible heat partitioning are more consistent with observation and GLDAS data. This confirms that dynamic downscaling with this regional model can add significant value to the CFS seasonal precipitation and land energy budget, very likely because of a better coupling of land–atmosphere processes over the US during the winter.

5 Concluding remarks

This study investigated the added value of fully prognostic dynamic downscaling of CFS winter season predictions with the UCLA-ETA regional climate model. It included multi-member ensembles of 22 winter (December to April) seasons in the contiguous US between 1982 and 2004. Analysis of the relationship between ensemble size and modeled precipitation showed that 10 realizations are sufficient to provide a good representation of a winter season in both models by significantly reducing the uncertainties associated with initial conditions and model internal variability. Improvements from additional realizations were found to be negligible for ensembles of more than 10 members.

Winter seasonal precipitation was the focus of the study. Comparison between CFS and UCLA-ETA model results showed that the latter was able to improve the precipitation prediction over most of the domain, except in the southern states along the coastal area. Domain-average precipitation biases for CFS and UCLA-ETA were 1.12 and −0.46 mm day−1, respectively, based on PRISM-adjusted precipitation analyses. The average RMSEs for the two models were roughly 1.5 and 0.9 mm day−1, respectively. Comparison of bias spatial distributions showed that the CFS overestimated the precipitation over most of the country, especially in the Northwest and Great Lakes regions, while UCLA-ETA produced results more consistent with the observations. Meanwhile, downscaling greatly improved the seasonal SWE spatial distribution, especially over the mountainous West. On the other hand, the downscaling did not show substantial improvement in seasonal surface temperature results.

Time series of seasonal and monthly precipitation means for the eastern and western US sub-domains showed that the UCLA-ETA considerably improved the CFS hindcasts for every year and month of the study by significantly reducing CFS’s excessive precipitation to values closer to those observed. On average, downscaling lowered the RMSE associated with the 22-year winter precipitation time series by approximately 46 % in the eastern and 60 % in the western sub-domains. The dynamic downscaling of CFS predictions also resulted in large reductions in monthly mean precipitation bias and RMSE.

The number of observed and modeled precipitation events for specific intensity thresholds was assessed through the precipitation energy decomposition. Results indicated that the CFS overestimated the number of precipitation events for most of the intensity thresholds, especially weak to mid-intensity ones. For instance, for events equal to or larger than 1.0 mm day−1, CFS produces approximately 30 and 50 % more events than the UCLA-ETA in the eastern and western sub-domains, respectively. On the other hand, the regional model overestimated the number of weak to mid-intensity events, but underestimated the number of stronger precipitation events. As stronger precipitation events are often of a convective nature, the results suggest a possible deficiency in the ability of the regional model’s convection parameterization to form strong precipitation during the winter season in the study area.

Despite large improvements in the spatial and temporal precipitation intensity distribution, the dynamic downscaling’s ability to reproduce the inter-annual and intra-seasonal variability of precipitation was unclear. Comparison of eastern and western modeled and observed seasonal and monthly mean precipitation time series shows that both models produced low correlation coefficients and low variances compared to observation. Furthermore, the spatial distributions of anomaly correlation for precipitation and upper-level geopotential height (not shown) calculated for both models also confirm the UCLA-ETA’s deficiency in improving the temporal variability. In fact, the correlation coefficients between the CFS and UCLA-ETA precipitation time series were very high in both regions. These results suggest that this regional model’s ability to simulate the year-to-year and month-to-month variability of precipitation and other variables discussed in the paper may be hindered by the lack of variability in the LBC provided by the CFS in the 1-way downscaling method utilized. Further tests for different regions/seasons/models are needed to confirm this issue.

The striking difference between observation, CFS, and UCLA-ETA average seasonal precipitation totals suggests a fundamental difference in the physical and dynamical processes leading to different precipitation predictions in the models. Comparison of upper and mid-level geopotential height showed little difference between the global and regional models. A sensitivity test with the CFS topography in the UCLA-ETA shows that topography plays only a secondary role in the dynamic downscaling improvement.

Although both models produced similar surface net radiation and surface temperature, the partitioning between latent and sensible heat fluxes is very different, with the UCLA-ETA values being more comparable to GLDAS estimates. The CFS, in contrast, places most of the energy in latent heat flux, especially in the first three months, when sensible heat flux is either nearly non-existent or downward (negative), which is not consistent with GLDAS data. Surface energy partitioning is determined by the gradients of near-surface temperature and water vapor as well as by the aerodynamic resistances to heat and water vapor transfer. Such aerodynamic resistances are represented very differently in the land surface models of the CFS and UCLA-ETA, which points to the land surface model, including the coupling methodology, as the main cause of the regional model’s seasonal downscaling improvements.

Comparison between average 850-hPa moisture flux divergences revealed that CFS produced less lower-level convergence than UCLA-ETA and NARR on average over the eastern US. The lack of difference in mid and upper-level dynamics, along with reduced lower-level moisture flux convergence and excessive land evaporation, indicates an overly strong surface-atmosphere coupling as the probable cause of the CFS’s precipitation over-prediction. This assumption is further substantiated by the correlation coefficients between daily precipitation and evaporation in CFS hindcasts and observation, which were found to be approximately 130 % higher in the eastern and 50 % higher in the western sub-domains compared to the UCLA-ETA.

This study showed that downscaling of winter season hindcasts with the UCLA-ETA can significantly add skill to CFS’ results in intensity and spatial distribution of precipitation. The results further suggest that the precipitation improvement is mainly due to a realistic partitioning of the land surface energy produced by the land surface scheme in the UCLA-ETA.