Modeling high-resolution precipitation by coupling a regional climate model with a machine learning model: an application to Sai Gon–Dong Nai Rivers Basin in Vietnam

Trinh, T.; Do, N.; Nguyen, V. T.; Carr, K.

doi:10.1007/s00382-021-05833-6


Published: 05 June 2021

Volume 57, pages 2713–2735, (2021)

Climate Dynamics




Abstract

Modeling of large rainfall events plays an important role in water resources and floodplain management. Rainfall results from complex interactions between climate factors (air moisture, temperature, wind speed, etc.) and the land surface (topography, soil, land cover, etc.). Therefore, deriving accurate areal rainfall relies not only on atmospheric boundary conditions, but also on the reliability and availability of soil, topography, and vegetation data. Consequently, uncertainties in both atmospheric and land surface conditions contribute to rainfall model errors. In this study, a blended technique combining dynamical and statistical downscaling has been explored. The proposed downscaling approach uses input from three global reanalysis datasets: ERA-Interim, ERA-20C, and CFSR. These reanalysis data are downscaled in two stages: first dynamically by means of the Weather Research and Forecasting (WRF) model, and then by an artificial neural network (ANN) model that further downscales the WRF output to a finer resolution over the study region. The proposed technique has been applied to the third-largest river basin in Vietnam, the Sai Gon–Dong Nai Rivers Basin, and the calibration and validation show that the simulation results agree well with observation data. Results of this study suggest that the proposed approach can improve the accuracy of simulated data, as it merges model simulations with observations over the modeled region. Another highlight of this approach is its low computational demand in terms of both computation time and output storage.


1 Introduction

The modeling of large rainfall events is a fundamental and challenging topic in water resources and floodplain management. Rainfall results from complex interactions between climate factors, such as air moisture, temperature, and wind speed, and land surface conditions, such as topography, soil, and land cover. Therefore, deriving accurate areal rainfall relies not only on the atmospheric boundary conditions but also on the reliability and availability of soil, topography, and vegetation data. Consequently, the uncertainties of both atmospheric and land surface conditions contribute to rainfall model errors (Gebregiorgis and Hossain 2012; Shepherd 2014; Reichler and Kim 2008). Specifically, Gebregiorgis and Hossain (2012) explored the uncertainties of three satellite rainfall products associated with unreliable topography and climate conditions. Shepherd (2014) identified sources of uncertainty that originate in climate boundary conditions and result in atmospheric model errors. Reichler and Kim (2008) identified errors and uncertainties associated with different reanalysis datasets by comparing them with a wide range of observations. These errors suggest the need for a holistic approach that considers both high-resolution topography and land surface distribution to aid in the generation of realistic rainfall information.

Recently, there have been attempts to model rainfall events by means of global atmospheric models (GCMs) (Krishnamurti et al. 1997; Compo et al. 2006; Lledó et al. 2013; Fuka et al. 2014). Such GCMs consider various aspects of climate and the effect of the land surface on surface rainfall. However, their spatial resolution, typically 100 km, is too coarse for analyzing water resources at the watershed or regional scale. One remedy is to use downscaling techniques to refine coarse-grid data to the desired finer spatial resolution. There are two common approaches: statistical or stochastic downscaling (SD) and dynamical downscaling (DD). SD relies on empirical relationships between large-scale modeled atmospheric variables and local-scale meteorological variables. Its empirical nature and low computational cost have made SD popular and widely used in regional atmospheric studies (Burlando and Rosso 2002; Fowler et al. 2007; Goyal and Ojha 2011; Hashmi et al. 2011a, b, 2013; Pilling and Jones 2002; Raje and Mujumdar 2011; Wilby and Wigley 1997; Yang et al. 2011, 2012). Because SD methods rely on the assumption of an unchanged statistical relationship, they require long historical climate observation records for validation, which are not available for every region. DD is an alternative to empirical SD that can overcome these drawbacks. DD employs a regional climate model (RCM), which is based on the same principles as a GCM but has a higher resolution. An RCM uses the outputs of large-scale GCMs as initial and lateral boundary conditions to generate much finer meteorological variables, incorporating high-resolution topography and land-sea distribution. This allows dynamic interaction between the atmosphere and the land surface, thereby accounting for the impact of heterogeneity in topography, vegetation, and soil on the local climate. DD is regarded as the most suitable approach for modeling climate information over complex topography at regional scales (Kavvas et al. 2013; Kjellström et al. 2016; Jang and Kavvas 2015; Jang et al. 2017). Although recent developments have made DD more accessible, the method still demands long computation times and large output storage.

To overcome the limitations of both the DD and SD approaches, a blended technique combining dynamical and statistical downscaling has been explored. Recently, Liu and Fan (2014), Tran and Taniguchi (2018), and Walton et al. (2015) applied a hybrid dynamical-statistical downscaling approach, combining a regional climate model (RCM) with a statistical downscaling technique, to regions in China, Vietnam, and the western United States. However, before an RCM is coupled with a statistical model, both models need to be calibrated and validated in order to verify their capability and reliability for further downscaling applications. Because they omitted the calibration and validation of the RCM, Liu and Fan (2014) and Tran and Taniguchi (2018) may have obtained unreliable downscaled estimates of atmospheric variables, particularly in mountainous regions. Furthermore, the downscaled data obtained by Liu and Fan (2014), Tran and Taniguchi (2018), and Walton et al. (2015) are mainly at the monthly scale, which is too coarse for the analysis of floods and large rainfall events.

In this context, this study applied a regional climate model (RCM) coupled with machine learning algorithms to model and reconstruct rainfall data. This new technique, called hybrid downscaling (HD), first downscales large-scale atmospheric conditions, which are determined by a GCM and serve as lateral boundary conditions, by means of an RCM, and then applies an ANN model to further downscale selected RCM outputs to a finer spatial resolution. HD also includes the influences of terrain factors and the physical interactions between atmosphere and land surface conditions. Another highlight of this technique is that it improves the accuracy of simulated data, as it merges model simulations with observations over the modeled region. The proposed downscaling technique uses input from three global reanalysis datasets: the ECMWF Atmospheric Reanalysis of the twentieth century (ERA-20C, https://rda.ucar.edu/datasets/ds626.0) (Poli et al. 2013, 2016), the ECMWF Reanalysis Interim (ERA-Interim, https://rda.ucar.edu/datasets/ds627.0) (Berrisford et al. 2009; Dee et al. 2011), and the Climate Forecast System Reanalysis (CFSR, https://rda.ucar.edu/datasets/ds093.0) (Saha et al. 2010; Wang et al. 2011). These three datasets provide three-dimensional data and uniformly cover the globe at spatial resolutions of 1.25° (ERA-20C), 0.75° (ERA-Interim), and 0.5° (CFSR). These coarse-scale atmospheric data are first downscaled by means of the Weather Research and Forecasting model (WRF, Skamarock et al. 2005); an artificial neural network (ANN) model is then applied to further downscale the WRF output to a finer resolution over the studied watershed. First, the WRF and ANN models are calibrated and validated against existing ground observation data; then the hybrid method is evaluated through time series and spatial analyses. The Sai Gon–Dong Nai Rivers Basin is selected as a case study for the application of the hybrid technique. Because of the basin's important location and the complicated physical processes that cause severe rainfall there, advanced techniques are needed to investigate these processes and to model realistic historical rainfall events for the region.
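To give a sense of how coarse the three reanalysis grids are, the short sketch below converts the quoted angular resolutions into approximate distances. The spherical-Earth radius and the representative latitude are assumptions introduced for illustration only; they do not come from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, an assumption)
KM_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_KM / 360.0

# Grid spacings quoted in the text, in degrees.
REANALYSIS_DEG = {"ERA-20C": 1.25, "ERA-Interim": 0.75, "CFSR": 0.5}

def spacing_km(deg, lat_deg=0.0):
    """Return (north-south, east-west) grid spacing in km at a given latitude."""
    north_south = deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

ASSUMED_LAT = 11.5  # hypothetical representative latitude, not taken from the source

for name, deg in REANALYSIS_DEG.items():
    ns, ew = spacing_km(deg, ASSUMED_LAT)
    print(f"{name}: {ns:.0f} km north-south, {ew:.0f} km east-west")
```

Under these assumptions the north-south spacings come out at roughly 139, 83, and 56 km, which is the scale that the downscaling chain has to refine.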

2 Description of the study region

The selected watershed, the Sai Gon–Dong Nai (SG–DN) Rivers system, ranks third-largest in the country after the Mekong and Red River systems, but it is the largest inland river system in Vietnam. The SG–DN Rivers have become an important source of hydropower, with many hydropower plants, and supply large amounts of water to all southern provinces of Vietnam. Natural impacts from meteorological factors have caused many difficulties for socio-economic development in the basin. The SG–DN Rivers Basin has complex terrain, including mountainous and delta regions, and experiences heavy tropical rainfall from summer monsoon (SMS) and tropical cyclone (TC) systems (Nguyen-Thi et al. 2012; Yokoi and Matsumoto 2008).

The SG–DN Rivers Basin, shown in Fig. 1, covers the provinces of Lam Dong, Binh Phuoc, Binh Duong, Dong Nai, Dak Nong, Long An, Tay Ninh, and Ho Chi Minh City, and parts of Ninh Thuan, Binh Thuan, and Ba Ria-Vung Tau, with a total catchment area of about 44,500 km². The basin comprises two main river systems, the Sai Gon and Dong Nai Rivers. The terrain is complex, including mountainous and delta regions with elevations from 2 to 2291 m. In addition to being an important source of hydropower, the SG–DN Rivers Basin also includes a number of important industrial zones. The region has a tropical monsoon climate, with a wet summer from late May through early November, an average annual rainfall of about 1800 mm, and humidity of 78–82%. Land use in the watershed is varied and includes agricultural, forested, and urban areas.

3 Methodology and implementation

This study introduces a blended technique to model rainfall events by coupling a physically based numerical atmospheric model with a machine learning model. The atmospheric data required to set up the initial and boundary conditions of the WRF simulations over the SG–DN basin are taken from three reanalysis datasets: ERA-20C, ERA-Interim, and CFSR. These datasets were selected because they provide three-dimensional data at 6-h time increments for the required atmospheric and surface variables. They are also long enough to be reliable in a statistical sense and cover the entire globe uniformly (Rossi et al. 2007). The WRF model is used as the physically based numerical atmospheric model, while the ANN model is selected as the machine learning model, as shown in Fig. 2. There are five main steps in developing this hybrid rainfall model:

1. Implementation of the physically based numerical atmospheric model, WRF, over the target watershed for the three different reanalysis datasets.
2. Calibration and validation of the WRF model over the target watershed for the three different reanalysis datasets.
3. Implementation of the ANN model with its input provided from WRF’s outputs.
4. Training and validation of the ANN model over the target watershed for the three different reanalysis datasets.
5. Provision of hybrid downscaling model for the target watershed.

An in-depth description of each step is presented in the following sections.
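Steps 3 and 4 can be illustrated with a minimal sketch of a one-hidden-layer network trained by full-batch back-propagation on one part of the record and checked on a held-out part. This is a generic illustration, not the authors' ANN configuration: the layer size, learning rate, three synthetic predictors, and 200/100 split are all assumptions, and the random data merely stand in for WRF outputs and observed rainfall.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.02, epochs=1000, seed=0):
    """Train a one-hidden-layer tanh network with full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        pred = (h @ W2 + b2).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_pred = (2.0 / len(y)) * err[:, None]
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Synthetic stand-in data (hypothetical): three standardized coarse-grid
# predictors and one fine-grid target value per day.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 0.6 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2] ** 2 + rng.normal(0.0, 0.05, 300)

# Train on the first part of the record, validate on the held-out remainder.
X_train, y_train = X[:200], y[:200]
X_valid, y_valid = X[200:], y[200:]
params, losses = train_mlp(X_train, y_train)
valid_mse = float(np.mean((predict(params, X_valid) - y_valid) ** 2))
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}; validation MSE: {valid_mse:.3f}")
```

Keeping the validation period separate from the training period mirrors the calibrate-then-validate order of the steps listed above.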

3.1 Implementation of the physically based numerical atmospheric model

The WRF model was employed for dynamical downscaling with inputs from the three reanalysis datasets. The WRF model is able to simulate vertical and horizontal air motions with multiple physics options for moisture dynamics, microphysics processes, cumulus cloud parameterizations, planetary boundary layer (PBL) schemes, radiation schemes, and surface schemes. A number of studies have successfully applied the WRF model to precipitation analysis in regions of Vietnam (Ho et al. 2019, 2020; Cuong and Toan, 2019; Raghavan et al. 2016; Minh et al. 2018), with encouraging performance when compared with recent rainfall observation data. Thus, WRF is selected herein, although other numerical models can be used for regional atmospheric modeling. In this study, a series of three nested domains is implemented for the WRF simulations, as shown in Fig. 3. The largest domain (D1) covers the southern half of Vietnam and parts of Thailand, Laos, Cambodia, and Malaysia, with a spatial resolution of 81 km (21 × 18 horizontal grid points). D2 is the second-largest domain, with a resolution of 27 km (27 × 24 horizontal grid points), and D3 is the innermost and smallest domain, with a spatial resolution of 9 km (48 × 33 horizontal grid points). Note that all three domains are used only for the ERA-20C data, while the ERA-Interim and CFSR data are downscaled on D2 and D3 only.
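The nesting arithmetic implied by these numbers can be checked with a short sketch. The `Domain` class and helper names are illustrative, the extent is approximated as grid points times grid spacing, and the text does not say which of the two grid dimensions runs west-east, so the order below simply follows the order quoted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Domain:
    name: str
    dx_km: int  # horizontal grid spacing
    n1: int     # grid points, first dimension as quoted in the text
    n2: int     # grid points, second dimension as quoted in the text

    def extent_km(self):
        # Approximate extent: number of grid points times grid spacing.
        return (self.n1 * self.dx_km, self.n2 * self.dx_km)

# Domain sizes and resolutions as quoted in the text.
DOMAINS = [
    Domain("D1", 81, 21, 18),
    Domain("D2", 27, 27, 24),
    Domain("D3", 9, 48, 33),
]

def nest_ratios(domains):
    """Grid-spacing ratio between each parent domain and its child."""
    return [parent.dx_km // child.dx_km for parent, child in zip(domains, domains[1:])]

for d in DOMAINS:
    print(d.name, d.extent_km())
print("parent-to-child ratios:", nest_ratios(DOMAINS))
```

Each nest refines its parent by a factor of three (81 km to 27 km to 9 km), and each inner domain fits inside the approximate extent of the one enclosing it.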

3.2 Calibration and validation of WRF model over the target watershed for the three different reanalysis datasets

After the successful implementation of WRF for the SG–DN basin, the model was calibrated and validated against observed rainfall data. The Vietnam Gridded Precipitation (VnGP) dataset was recently developed and has been widely used as a reliable source of observations (Nguyen-Xuan et al. 2016). VnGP is a daily gridded rainfall dataset that was interpolated from 481 rain gauges by means of the Sphere-map interpolation technique. The dataset has a resolution of 0.1° and covers the whole of Vietnam (Nguyen-Xuan et al. 2016). VnGP was validated by comparison with gauge observations in terms of correlations, mean absolute errors, root mean square errors, and spatial distribution. The validation results show that VnGP matches the rainfall observations better than products based on other interpolation techniques. VnGP is currently available at the Data Integration and Analysis System (DIAS) (https://diasjp.net/en). The spatially distributed daily rainfall data of VnGP are available from January 1980 to December 2010. These data were compared with the model’s precipitation simulations over the SG–DN basin. First, the WRF model configuration was selected based on comparisons between the downscaled rainfall data and the VnGP dataset. Table 1 shows 12 combinations of parameterization schemes based on previous studies in Vietnam (Ho et al. 2020; Ho et al. 2019; Cuong et al. 2019b; Raghavan et al. 2016; Minh et al. 2018; Trinh et al. 2020). The best combination was selected based on the correlation coefficient between simulated daily basin-average precipitation and VnGP data from 1 January 1994 to 31 December 1995. Water years 1994–1995 were selected for comparison because they include historical extreme flood events. Note that D3 is primarily used in these comparisons.
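The selection criterion described above can be sketched as follows: compute the basin-average daily series, score each candidate configuration by its Pearson correlation with the observed series, and keep the highest-scoring one. The configuration labels, the synthetic rainfall series, and the noise levels are hypothetical placeholders, not entries from Table 1 or results from the study.

```python
import numpy as np

def basin_average(field, mask):
    """Average a (time, y, x) field over the grid cells where mask is True."""
    return field[:, mask].mean(axis=1)

def pearson_r(sim, obs):
    """Pearson correlation coefficient between two daily series."""
    return float(np.corrcoef(np.asarray(sim, float), np.asarray(obs, float))[0, 1])

def best_configuration(simulations, observed):
    """Return the name of the highest-correlation configuration and all scores."""
    scores = {name: pearson_r(series, observed) for name, series in simulations.items()}
    return max(scores, key=scores.get), scores

# Synthetic stand-in for 730 days (1 January 1994 to 31 December 1995).
rng = np.random.default_rng(0)
n_days = 730
observed = rng.gamma(shape=0.5, scale=10.0, size=n_days)

# Two hypothetical configurations: the observed series plus small or large
# random errors, clipped at zero so that rainfall stays non-negative.
simulations = {
    "config_01": np.maximum(observed + rng.normal(0.0, 2.0, n_days), 0.0),
    "config_02": np.maximum(observed + rng.normal(0.0, 15.0, n_days), 0.0),
}

best, scores = best_configuration(simulations, observed)
print("selected:", best)
for name, r in scores.items():
    print(f"{name}: r = {r:.2f}")
```

In practice, `basin_average` would be applied to both the D3 precipitation field and the gridded observations, using the same basin mask, before the correlation is computed.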

Table 1 Twelve combinations of physics parameterizations for WRF configuration
