1 Introduction

With advantages such as low computational cost and a range of output variables, applications of numerical weather prediction (NWP) models have become increasingly common in climate research and operational weather forecasting in recent decades. Among these models, general circulation models (GCMs) provide comprehensive predictions of large-scale climate events (Gillett and Thompson 2003; Osborn 2004), and regional climate models (RCMs) can accomplish high-resolution runs over restricted areas and explore local circulation dynamics at lower computational cost (Dickinson et al. 1989; Wang et al. 2004; Giorgi 2006). Typically, the Advanced Research WRF model (ARW), designed to serve both regional operational forecasting and atmospheric research needs, is increasingly in use throughout the world as a regional climate model (Skamarock et al. 2008; Bukovsky and Karoly 2009; Argüeso et al. 2011).

Prior to utilizing the WRF model for climate simulations, it is crucial that model outputs be analyzed against observational data to assess their ability to capture spatial and temporal distributions. To our knowledge, variations in output from the WRF model depend on many factors, including the model itself (Giorgi and Bi 2000; Christensen et al. 2001), boundary conditions (Von Storch et al. 2000; Denis et al. 2002), geographic region (Seth and Giorgi 1998; Landman et al. 2005), and parameterization schemes (Ratnam and Kumar 2005; Tegoulias et al. 2017). Given these factors, the choice of parameterization schemes in WRF is one substantial source of model uncertainty (Jerez et al. 2013; Mooney et al. 2013). A wide range of parameterizations are available in WRF, which differ in their level of complexity and their representation of physical processes. However, there is no single configuration optimal for all locations, variables, and at every possible timescale (Fernández et al. 2007; Borge et al. 2008). It is therefore necessary to identify the optimal set of schemes applicable to the domain of interest (Ferreira et al. 2014; Chen et al. 2014; Pieri et al. 2015).

Over the years, significant developments have been achieved in the methods for evaluating the sensitivities to WRF parameterizations or the simulations of models (Fernández et al. 2007; Hu et al. 2010; Evans et al. 2012; Crétat et al. 2012; Li et al. 2016; Giannaros et al. 2019). Furthermore, a variety of measures, such as metrics (Willmott et al. 1985; Taylor 2001; Gleckler et al. 2008) and statistical approaches (Perkins et al. 2007; Quan et al. 2016; Budakoti et al. 2019), have been applied to calculate some key traceable components across different scales, such as precipitation, temperature, and surface flux (Tian et al. 2017; Yáñez-Morroni et al. 2018). Gallus and Bresch (2006) utilized WRF to model multiple events to compare the sensitivity of warm season rainfall forecasts to changes in model physics, dynamics, and initial conditions. They adopted statistical evaluation indexes, such as the equitable threat score (ETS; Schaefer 1990) and bias, to measure forecast accuracy and explore the key impacting factors of different types of rainfall events. Bastidas et al. (2006) carried out a comparison of land surface model sensitivity within a multicriteria framework. Bukovsky and Karoly (2009) discussed the impacts of changes in convective and land surface parameterizations, nest feedbacks, sea surface temperature, and WRF version on mean precipitation in four-month-long simulations by utilizing a subjective evaluation. Argüeso et al. (2011) evaluated the WRF sensitivity to eight different combinations of cumulus, microphysics, and planetary boundary layer (PBL) parameterization schemes over southern Spain for the period 1990–99. Cohen et al. (2015) summarized the key characteristics of the various PBL parameterization schemes employed to simulate the southeastern USA cold season severe thunderstorm environment. Their method of evaluating the performance of models focused on using a framework for error analysis often applied in economic forecast analysis. 
Hasan et al. (2018) made a comparison between observed and simulated rainfall over Bangladesh using 19 different combinations of microphysics and cumulus schemes available in WRF. The study found a combination of the Stony Brook University microphysics schemes with the Tiedtke cumulus scheme was the most suitable scheme for reproducing these events. However, an evaluation of the performance of high-resolution models and a quantitative method for evaluating the overall performance in simulating multiple fields are still lacking. Additionally, the number of parameterizations has increased, and no robust evaluation of the five types of physics options is available over the Haihe River Basin.

The goal of this study was to evaluate the skill of different physics scheme combinations in reproducing precipitation and temperature over the Haihe River Basin. The results provide information about the optimal parameterization sets for the basin and indicate which physics options the simulation of these key variables is most sensitive to. The study region is the political, economic, cultural, and transport center of China. Severe water shortages and related environmental problems in this region have become critical challenges for regional social and economic development. A further feature that distinguishes the present study from others is the use of the multivariable integrated evaluation (MVIE) method (Xu et al. 2016, 2017) to measure the overall performance of WRF in simulating multiple fields. The MVIE method consists of multiple statistics that measure model performance from different aspects. It provides a framework for evaluating both a model's ability to simulate individual variables and its overall ability to simulate multiple fields. The method is therefore well suited to this study, which compares and ranks various WRF simulations in terms of their ability to simulate multiple fields.

Section 2 provides the model setup and descriptions of the study area and the evaluation method. Section 3 describes the evaluation process of each scheme option, and a brief discussion of the robustness of the optimal schemes is presented in Sect. 3.6. Our main conclusions are given in Sect. 4.

2 Methodology

2.1 Study site

The study area chosen was the Haihe River Basin (Fig. 1), located between 35°–43°N and 112°–120°E in northern China. The basin is bounded by the Taihang Mountains to the west, the Bohai Sea to the east, the Mongolia Plateau to the north, and the lower reaches of the Yellow River to the south. The Beijing-Tianjin-Hebei megacity is located in the Haihe River Basin, through which three major rivers flow: the Haihe River, the Luanhe River, and the Tuhaimajia River. Mountains are concentrated in the western part of the basin, whereas the central, eastern, and southern parts are plains. The entire area belongs to the temperate East Asia monsoon climate zone, which is mainly hot and wet in summer and cold and dry in winter. The area has an average annual temperature of 10.4 °C and average annual precipitation and evaporation of approximately 541.6 and 470 mm, respectively (He et al. 2015). Rainfall shows strong spatial and temporal variability in this area (Wang et al. 2019). Over 70% of the annual precipitation occurs during the summer season (June–August) in the form of rainstorms. The strongest evaporation is observed in the winter wheat-growing season (March–June), which accounts for about 55% of the annual total. Drought often occurs in spring as a result of low precipitation, high evaporation, and higher temperatures. The Haihe River Basin is the economic, political, and cultural center of China, with a high population density and huge demands for resources. The area has limited water resources and has often suffered from drought in recent decades, associated with the weakening monsoon circulation (Wang et al. 2001; Ding et al. 2008). Drought is one of the major weather-related disasters and has serious environmental, social, and economic impacts (Ma et al. 2006, 2007). Many previous studies have focused on the spatiotemporal characteristics and causes of drought (Ding et al. 2010; Li et al. 2015; Duan et al. 2017). A mesoscale climate model such as WRF is a powerful tool for simulating the meteorological and land surface conditions related to drought. However, one of the most important uncertainties in WRF simulations is related to the choice of physical schemes. Only a few studies have used the WRF model to investigate changes in regional climate in this area. It is therefore important to evaluate the overall performance of WRF in this area in order to determine an optimal combination of parameterizations for the Haihe River Basin.

Fig. 1 Map of the study site: Haihe River Basin, China

2.2 Model configuration and datasets

In this study, the numerical experiments were conducted using the Advanced Research WRF model version 3.9.1 (Table 1), a non-hydrostatic, primitive-equation model that provides many physical and running options for a wide spectrum of applications at very different scales, from large-eddy simulations to climate simulations (see https://www2.mmm.ucar.edu/wrf/users). The physical options comprise parameterization schemes that describe sub-grid-scale processes, such as microphysics, cumulus convection, near-surface physics, land surface physics, planetary boundary-layer physics, and atmospheric longwave and shortwave radiative transfer. For each physical option, many parameterization schemes are available in the WRF model (Table 2). For the study area, we chose as the default the combination of parameterization schemes that is often used for regional climate in the WRF reference physics suites: the WSM6 microphysics scheme, the YSU planetary boundary-layer scheme, the KF cumulus scheme, the CAM shortwave and longwave radiation schemes, and the Noah land surface scheme. The horizontal resolution of the simulation domain encompassing the Haihe River Basin was set at 13 km, with 87 points in the east–west direction and 80 points in the north–south direction. In the vertical, there were 51 stretched levels with the model top at 50 hPa. The initial and lateral boundary conditions were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) six-hourly ERA-Interim reanalysis. Land use categories in WRF 3.9.1 were determined from Moderate Resolution Imaging Spectroradiometer (MODIS) data. The model time step was set to 180 s, and output was written every 3 h. Simulations were carried out for the four-month period from 1 May to 31 August 2016. The first month was discarded as spin-up. The remaining three months (June, July, and August 2016) were used to evaluate the performance of the different combinations of physical parameterizations. The simulations were assessed in terms of temperature and precipitation against hourly rainfall and daily surface air temperature observations from 261 stations in the Haihe River Basin.
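As a rough illustration of the configuration described above, the following Python sketch collects the stated settings and renders the default physics as a WRF namelist fragment. The option numbers are the usual WRF codes for the named schemes (WSM6 = 6, YSU = 1, KF = 1, CAM radiation = 3, Noah = 2). The surface-layer option, the direct use of "87 by 80 points" as e_we and e_sn, and the helper names render_namelist and steps_per_output are assumptions made for this example, not settings reported in the text.

```python
# Sketch only: values come from Sect. 2.2 unless marked ASSUMPTION.
# In a real namelist.input these run settings are split between
# the &time_control and &domains groups.
RUN_SETTINGS = {
    "time_step": 180,          # s
    "history_interval": 180,   # min, i.e. 3-hourly output
    "e_we": 87,                # ASSUMPTION: "87 points" used directly as e_we
    "e_sn": 80,                # ASSUMPTION: "80 points" used directly as e_sn
    "e_vert": 51,
    "dx": 13000,               # m
    "dy": 13000,               # m
    "p_top_requested": 5000,   # Pa, i.e. 50 hPa
}

DEFAULT_PHYSICS = {
    "mp_physics": 6,           # WSM6
    "bl_pbl_physics": 1,       # YSU
    "sf_sfclay_physics": 1,    # ASSUMPTION: surface layer commonly paired with YSU
    "cu_physics": 1,           # Kain-Fritsch
    "ra_lw_physics": 3,        # CAM longwave
    "ra_sw_physics": 3,        # CAM shortwave
    "sf_surface_physics": 2,   # Noah land surface model
}


def render_namelist(group, settings):
    """Render a dict as a Fortran-namelist-style text block."""
    lines = [f"&{group}"]
    lines += [f" {key:<20}= {value}," for key, value in settings.items()]
    lines.append("/")
    return "\n".join(lines)


def steps_per_output(settings):
    """Number of model time steps between two history outputs."""
    return settings["history_interval"] * 60 // settings["time_step"]


print(render_namelist("physics", DEFAULT_PHYSICS))
print("time steps per output:", steps_per_output(RUN_SETTINGS))
```

With a 180 s time step and 3-hourly output, each output interval spans 60 model steps.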

Table 1 Overview of WRF model configurations
Table 2 Physics parameterization schemes assessed in this study

2.3 Experimental design and evaluation methodology

Generating scheme sets from all possible combinations of the five types of physics options would have resulted in millions of experiments. To reduce the computational cost, we analyzed the interactions between WRF parameterization schemes and used a step-wise refinement method to determine the optimal combination of physics parameterizations from the five types of scheme options (Stergiou et al. 2017). As shown in Fig. 2, the first set of WRF simulations kept the default model configuration, except that the microphysics scheme was varied among 19 schemes. The second set kept the default schemes, except that the microphysics scheme was set to the optimal one and the planetary boundary-layer scheme was varied. The third set was consistent with the second, except that the PBL scheme was set to the optimal one and the cumulus scheme was varied. Because not every free combination of the available longwave and shortwave radiation schemes allows WRF to run successfully, the fourth set used only those longwave/shortwave combinations with which the model could be started; it was otherwise consistent with the third set, with the cumulus scheme set to the best option. Finally, the fifth set of experiments examined the sensitivity to the land surface scheme, with the other four types set to their best-performing schemes. In this way, the best setup for climate simulation over the Haihe River Basin was selected step by step.
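The step-wise refinement described above can be sketched as a greedy, one-category-at-a-time search. The code below is a minimal illustration, not the authors' implementation: the function stepwise_refinement, the placeholder scheme names, and the toy penalty table standing in for a WRF run plus its MIEI are all hypothetical.

```python
import math
from typing import Callable, Dict, List


def stepwise_refinement(
    default: Dict[str, str],
    candidates: Dict[str, List[str]],
    order: List[str],
    score: Callable[[Dict[str, str]], float],
) -> Dict[str, str]:
    """Vary one physics category at a time, keeping the best option found so far."""
    best = dict(default)
    for category in order:
        trials = {}
        for option in candidates[category]:
            config = dict(best)
            config[category] = option
            # In the real study this would be one WRF run followed by its MIEI.
            trials[option] = score(config)
        # A smaller score means better performance.
        best[category] = min(trials, key=trials.get)
    return best


# Toy stand-in for the evaluation: made-up additive penalties, NOT results from the paper.
penalty = {
    "mp": {"mp_a": 0.30, "mp_b": 0.10, "mp_c": 0.20},
    "pbl": {"pbl_a": 0.20, "pbl_b": 0.05},
    "cu": {"cu_a": 0.10, "cu_b": 0.00},
}
candidates = {category: list(options) for category, options in penalty.items()}


def toy_score(config: Dict[str, str]) -> float:
    return sum(penalty[category][option] for category, option in config.items())


default = {"mp": "mp_a", "pbl": "pbl_a", "cu": "cu_a"}
best = stepwise_refinement(default, candidates, ["mp", "pbl", "cu"], toy_score)

n_stepwise = sum(len(options) for options in candidates.values())
n_full = math.prod(len(options) for options in candidates.values())
print(best, n_stepwise, n_full)
```

The number of runs grows with the sum of the option counts rather than their product, which is the source of the cost saving. The trade-off is that a greedy search can miss the best combination when schemes interact strongly, which is why the order of the categories matters.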

Fig. 2 Interactions between the WRF parameterization schemes

To comprehensively evaluate and rank the WRF simulations, we employed the multivariable integrated evaluation (hereafter MVIE) method developed by Xu et al. (2017). The MVIE framework consists of three levels of statistical metrics (Table 3). The first level consists of the commonly used mean error (ME), standard deviation (hereafter SD), correlation coefficient (CORR), and root-mean-square deviation (hereafter RMSD). These metrics assess model performance in terms of individual variables (i.e., temperature and precipitation). The SD measures the amplitude of variability of a scalar field, the correlation coefficient describes the pattern similarity between two scalar fields (here, model output and observations), and the RMSD measures the overall difference between two scalar fields. The RMSD is a function of the mean error, standard deviation, and correlation coefficient (Xu and Han 2020). The second level includes the root-mean-square length (hereafter RMSL) of a vector field, the vector field similarity coefficient (hereafter VSC), and the root-mean-square vector deviation (hereafter RMSVD). These metrics can be defined in centered or uncentered forms; in this study, we employed the centered forms. The centered RMSL (cRMSL) is analogous to the standard deviation but describes the variability of a vector field. The centered VSC (cVSC) is analogous to Pearson’s correlation coefficient but measures the pattern similarity between two vector fields. Similarly, the centered RMSVD (cRMSVD) is analogous to the RMSD but measures the overall difference between two vector fields. As with the Taylor diagram (Taylor 2001), these vector statistics can be shown on a vector field evaluation (VFE) diagram. The VFE diagram is a generalized Taylor diagram that summarizes model performance in simulating vector fields (Xu et al. 2016). It shows to what extent the overall RMSVD is attributable to differences in amplitude (represented by the RMSL) and to differences in pattern (represented by the VSC). The VFE method has been successfully applied to evaluate model performance in simulating vector fields (e.g., Wang et al. 2019; Huang et al. 2018, 2020).
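To make the vector statistics concrete, the sketch below implements RMSL, VSC, and RMSVD for fields stored as arrays of shape (points, components). The definitions follow the usual form associated with Xu et al. (2016), but Table 3 is not reproduced here, so they should be read as assumptions. The function names and the synthetic data are hypothetical. The assertions check the law-of-cosines relation that underlies the VFE diagram and confirm that, for a one-component field, the centered statistics reduce to SD, Pearson correlation, and centered RMSD.

```python
import numpy as np


def centered(a):
    """Remove the mean of each component (anomaly field)."""
    return a - a.mean(axis=0)


def rmsl(a):
    """Root-mean-square length of a vector field with shape (n_points, n_components)."""
    return np.sqrt(np.mean(np.sum(a**2, axis=1)))


def vsc(a, b):
    """Vector similarity coefficient between two vector fields."""
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))


def rmsvd(a, b):
    """Root-mean-square vector deviation between two vector fields."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))


# Synthetic two-component fields (e.g. normalized temperature and precipitation).
rng = np.random.default_rng(0)
obs = rng.normal(size=(200, 2))
mod = 0.8 * obs + 0.3 * rng.normal(size=(200, 2)) + 0.5

mod_c, obs_c = centered(mod), centered(obs)
l_mod, l_obs = rmsl(mod_c), rmsl(obs_c)
r_v = vsc(mod_c, obs_c)
d_v = rmsvd(mod_c, obs_c)

# Law of cosines linking the three statistics (basis of the VFE diagram).
assert np.isclose(d_v**2, l_mod**2 + l_obs**2 - 2 * l_mod * l_obs * r_v)

# For a single component the vector statistics reduce to the scalar ones.
x, y = mod[:, :1], obs[:, :1]
assert np.isclose(rmsl(centered(x)), x.std())
assert np.isclose(vsc(centered(x), centered(y)), np.corrcoef(x[:, 0], y[:, 0])[0, 1])
```

Because the three quantities satisfy a triangle relation, two of them fix the third, which is what allows a single diagram to display all of them at once.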

Table 3 Formulas of the statistical metrics in the multivariable integrated evaluation (MVIE) method

Based on the vector statistics, Xu et al. (2017) devised the MVIE method to evaluate model performance in simulating multiple fields. The general idea of MVIE is to normalize the various scalar fields and group them into a multidimensional vector field. The vector field constructed from the model can then be evaluated against the observed one using the vector statistical metrics. To summarize and rank overall model performance, Xu et al. (2017) defined a multivariable integrated evaluation index (MIEI), which takes multiple statistics (e.g., mean error, pattern similarity, and variance) of multiple variables into account. In this study, we normalized temperature and precipitation and grouped them into a two-dimensional vector field. The grouped vector field derived from the model can thus be assessed against that derived from observations by using the VFE diagram and the MIEI. The computation of the three levels of metrics consists of the following steps:

Step 1: Prepare the two groups of datasets (model data and observations) for evaluation. The original model field for the uncentered mode is produced by interpolating the WRF model output to the nearest observational station locations. The anomaly field for the centered mode is produced by subtracting the mean value from the original field.

Step 2: Calculate the first level of metrics for individual variables in Table 3. In centered mode, the SD, cRMSD, CORR, and ME of the anomaly field are calculated and presented in the Results and discussion section. In uncentered mode, the rms, uCORR, and RMSD of the original field are calculated but not shown here.

Step 3: Calculate the second level of metrics for the multivariable integrated field in Table 3. For the centered mode used in this study, the centered RMSL (cRMSL), centered VSC (cVSC), centered RMSVD (cRMSVD), and VME are calculated in terms of the anomaly fields. For the uncentered mode (not shown here), the RMSL, VSC, and RMSVD are calculated in terms of the original field.

Step 4: Calculate the third level of metrics, which summarizes overall performance, in Table 3. The MIEI is calculated in terms of the original field to rank the performance of the different parameterizations.
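The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: station matching is done by plain nearest-neighbour search in latitude/longitude, each variable is normalized by the rms of the observed variable, and the MIEI is assumed to take the form MIEI^2 = mean over variables of (rms ratio - 1)^2 + 2 (1 - VSC). Table 3 should be treated as the authoritative definition, and all function names and test data are hypothetical.

```python
import numpy as np


def nearest_grid_values(field, grid_lat, grid_lon, st_lat, st_lon):
    """Step 1 (ASSUMPTION: nearest-neighbour matching): sample a 2-D model
    field at the grid point closest to each station."""
    d2 = (grid_lat[None, :, :] - st_lat[:, None, None]) ** 2 \
        + (grid_lon[None, :, :] - st_lon[:, None, None]) ** 2
    flat_index = d2.reshape(len(st_lat), -1).argmin(axis=1)
    return field.reshape(-1)[flat_index]


def anomaly(values):
    """Step 1, centered mode: original field minus its mean, per variable."""
    return values - values.mean(axis=0)


def normalize_by_obs_rms(model, obs):
    """Divide each variable (column) by the rms of the observed variable."""
    scale = np.sqrt(np.mean(obs**2, axis=0))
    return model / scale, obs / scale


def miei(model, obs):
    """Step 4 (ASSUMED form of the index, uncentered mode)."""
    m, o = normalize_by_obs_rms(model, obs)
    rms_ratio = np.sqrt(np.mean(m**2, axis=0))  # the observed ratio is 1 by construction
    vsc = np.sum(m * o) / np.sqrt(np.sum(m**2) * np.sum(o**2))
    # Guard against tiny negative values caused by floating-point rounding.
    return np.sqrt(max(np.mean((rms_ratio - 1.0) ** 2) + 2.0 * (1.0 - vsc), 0.0))


# Nearest-grid-point sampling on a tiny 3 x 2 grid.
lat, lon = np.meshgrid(np.array([35.0, 36.0, 37.0]), np.array([112.0, 113.0]), indexing="ij")
field = np.arange(6.0).reshape(3, 2)
sampled = nearest_grid_values(field, lat, lon, np.array([36.1]), np.array([112.9]))

# Synthetic station data with two columns (temperature, precipitation): not real observations.
rng = np.random.default_rng(1)
obs = np.column_stack([rng.normal(25.0, 3.0, 261), rng.gamma(2.0, 2.0, 261)])
noisy_model = obs + rng.normal(0.0, 1.0, obs.shape)

print(sampled, miei(obs, obs), miei(noisy_model, obs))
```

A perfect model gives an index of zero, and the index grows as either the amplitudes or the patterns depart from the observations, which is why a smaller MIEI ranks higher.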

3 Results and discussion

3.1 Microphysics

All WRF parameterizations other than the microphysics option were held at their defaults in this set of experiments, which therefore represents a sensitivity analysis of the simulation results to the microphysics scheme. Microphysics schemes contain explicitly resolved water vapor, cloud, and precipitation processes. In the current version of the ARW, microphysics can accommodate any number of mass mixing-ratio variables, as well as other quantities such as number concentrations. We chose 19 commonly used microphysics parameterizations and calculated the three levels of metrics for each corresponding WRF simulation (Table 4). Option 11, CAM5.1, shows the minimum MIEI (0.71) and was therefore selected as the best microphysics parameterization scheme.

Table 4 Three levels of statistics in MVIE in terms of 19 kinds of microphysics schemes; the lighter colors show which schemes performed better

The centered statistics in Table 4 decompose the original field into mean and anomaly fields for evaluation. The anomaly fields are evaluated in terms of variance (SD, cRMSL), pattern similarity (CORR, cVSC), and overall difference between model and observations (cRMSD, cRMSVD). Because the centered statistics exclude the mean error, the mean error (ME) is computed separately. For temperature and precipitation, CAM5.1 has the smallest mean errors (−0.96 and 1.21, respectively). Consequently, the vector mean error (VME) of CAM5.1, which measures the difference between the two mean vector fields, is also the smallest (1.54) among the 19 microphysics schemes. The cRMSL, which reflects the total standard deviation across all components of the anomaly field (here precipitation and temperature), varied from 0.98 to 1.07 among the 19 simulations. A normalized cRMSL greater (smaller) than 1 indicates that a scheme overestimates (underestimates) the amplitude of the anomaly field. For example, some schemes (e.g., Kessler, Goddard_GCE) underestimated the overall cRMSL over the Haihe River Basin in summer, whereas others (e.g., WSM6_Grau) overestimated it, reaching 1.07. The cVSC reflects the pattern similarity between the two anomaly vector fields. The CAM V5.1 2-moment 5-class scheme (hereafter CAM5.1; Eaton 2011) simulates precipitation and temperature in close agreement with the observations, with the highest cVSC of 0.78. All schemes show a high correlation with observations of surface temperature, whereas precipitation shows lower correlations with the corresponding in situ observations, ranging between 0.58 and 0.64. The cRMSVD combines the cRMSDs of temperature and precipitation and indicates the overall difference between the simulated and observed anomaly fields across multiple variables. The optimal microphysics scheme, CAM5.1, has the minimum cRMSVD of 0.94, that is, the smallest overall error in terms of anomaly fields. The cRMSDs of the two variables differ considerably: that of surface temperature ranges from 1.08 to 1.14, and that of precipitation from 0.77 to 0.91. The overall difference in the anomaly field therefore comes primarily from temperature, with precipitation second. The MIEI (Table 3) takes both the amplitudes and pattern similarities of the variables into account and can therefore provide a comprehensive evaluation of model performance in terms of the original field. The MIEI values of all schemes varied from 0.71 to 0.78, which suggests that model performance is sensitive to the choice of microphysics scheme. The CAM5.1 scheme (option 11) has the smallest MIEI (0.71), whereas the default scheme (WSM6_Grau, option 6) has a larger value (0.78).
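The reported VME values are numerically consistent with taking the Euclidean length of the two per-variable mean errors. This reading is an assumption, since the formulas of Table 3 are not reproduced here; the helper name vector_mean_error is hypothetical. The short check below uses the mean errors quoted for CAM5.1 in this section and for the MRF scheme in Sect. 3.2.

```python
import math


def vector_mean_error(mean_errors):
    """Euclidean length of the per-variable (normalized) mean errors."""
    return math.sqrt(sum(error * error for error in mean_errors))


# Mean errors of temperature and precipitation as quoted in the text.
vme_cam51 = vector_mean_error([-0.96, 1.21])   # CAM5.1 microphysics
vme_mrf = vector_mean_error([-0.35, 0.69])     # MRF PBL scheme (Sect. 3.2)

print(round(vme_cam51, 2), round(vme_mrf, 2))
```

Both values round to the VME figures given in the text (1.54 and 0.77), which supports this interpretation of the index.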

Compared with the MIEI, the VFE diagram provides more comprehensive statistics on model performance, and it clearly shows both the differences between models and observations and the differences among models. A VFE diagram in terms of the original field, showing the overall RMSL, VSC, and RMSVD over the Haihe River Basin, is given in Fig. 3. The optimal scheme (CAM5.1, option 11) performs better than the default scheme (WSM6_Grau, option 6). The CAM5.1 scheme shows the maximum VSC (0.75), indicating that it reproduces the spatial pattern of the two variables better than the other schemes. Similarly, compared with the default scheme, the optimal scheme shows a smaller RMSVD (0.96), which describes the overall difference between the two vector fields (here, the model and observation fields). However, the RMSL, which measures the overall magnitude of the multiple variables, is 0.94 for the CAM5.1 scheme; being smaller than 1.0, it indicates that the simulation underestimates the magnitude of the vector field.

Fig. 3 Vector field evaluation (VFE) diagram for the 19 kinds of microphysics parameterizations over the Haihe River Basin. The purple symbol denotes the default scheme set and the red symbol denotes the optimal scheme set

CAM5.1 denotes a version 5 release of the Community Atmosphere Model (CAM), which contains significant enhancements to the representation of atmospheric processes. In particular, the revised cloud macrophysics scheme provides a more transparent treatment of cloud processes and imposes full consistency between cloud fraction and cloud condensate. Stratiform microphysical processes are represented by a prognostic, two-moment formulation for cloud droplets and cloud ice, with mass and number concentrations. The scheme also describes ice supersaturation and features the activation of aerosols to form cloud drops and ice crystals (Eaton 2011, http://www.cgd.ucar.edu/amp/research/modeling.html).

3.2 PBL/surface layer scheme

PBL schemes have two main objectives: (1) to calculate the fluxes of momentum, heat, and water vapor within the atmospheric boundary layer, and (2) to predict the atmospheric boundary-layer depth, the amount of cloud, and consequently the vertical redistribution of heat and mass. PBL schemes can be divided into two classes (Xie et al. 2012): turbulent kinetic energy prediction schemes (MYJ, MYNN, Bougeault-Lacarrere, TEMF, QNSE, CAM UW) and diagnostic non-local schemes (YSU, GFS, MRF, ACM2) (available from http://www2.mmm.ucar.edu/wrf/users/tutorial/201601/physics.pdf). To determine which PBL scheme performed best, Table 5 gives the statistical metrics of the various PBL schemes. Because a smaller MIEI indicates better performance, the MRF scheme (option 99), with the smallest MIEI (0.70), ranks first, followed by the ACM2-1 and ACM2-7 schemes. BouLac-1 and BouLac-2 perform worst, with the largest MIEI values (above 0.74). The cVSC and cRMSVD values of the MRF scheme are better, to a greater or lesser extent, than those of all other PBL schemes and of the best microphysics run (CAM5.1). For example, the cVSC values of all PBL schemes range from 0.76 to 0.79, and the MRF scheme has the greatest value (0.79). The centered correlation coefficients (CORR) of the two individual variables (surface temperature and precipitation) also show better results. Specifically, the CORR of surface temperature ranges from 0.89 to 0.91, which implies that the models reproduce the spatial pattern of temperature quite well. Precipitation shows weaker correlations, ranging from 0.62 to 0.68, but the MRF scheme selected as the optimal PBL scheme has the highest CORR for precipitation (0.68). Similarly, the cRMSL values of all PBL schemes range from 0.99 to 1.04, generally smaller than those of the microphysics schemes. The cRMSL of the MRF scheme approaches 1.0, which reveals a small amplitude error in the anomaly field. For the mean field, the mean error (ME) of each individual variable decreases in all PBL schemes relative to the microphysics experiments above. The optimal MRF scheme shows a small vector mean error (VME) of 0.77 in simulating multiple variables, and it has the minimum mean error for both temperature (−0.35) and precipitation (0.69).

Table 5 Three levels of statistics in MVIE in terms of 17 kinds of PBL/Surface Layer schemes; the lighter colors show which schemes performed better

The VFE diagram (Fig. 4) for the Haihe River Basin illustrates the overall correspondence between model results and observations. For example, the optimal PBL scheme (MRF/MM5) shows a high VSC (0.76), which indicates that the spatial pattern of the scalar fields (surface temperature and precipitation) is close to that of the observational field. By comparison, the optimal microphysics run (CAM5.1) and the default combination of parameterizations have lower VSC values of 0.75 and 0.69, respectively. The RMSLs of the schemes, composed of the RMSs of surface temperature and precipitation, differ considerably and are not well suited to performance evaluation here. The RMSL of the MRF scheme (0.93) is farther from 1.0 than that of the default scheme (1.02), which indicates that the optimal scheme underestimates the amplitude of the multiple variables. The RMSVD of the MRF scheme (0.95) is smaller than those of all other PBL schemes and the default scheme (1.11), which indicates that the MRF scheme performs better. The MIEI serves as a more concise index for ranking model performance in simulating surface temperature and precipitation, with a smaller MIEI indicating better performance. The MRF scheme, which combines the three statistical quantities (i.e., RMSL, VSC, RMSVD) favorably, has a smaller MIEI (0.70) than the default scheme (0.78) and the optimal microphysics run (0.71). This suggests that the combination of the optimal microphysics and PBL schemes can enhance the predictive performance for precipitation and surface temperature.

Fig. 4 VFE diagram for the 17 kinds of PBL/Surface Layer schemes over the Haihe River Basin. The purple symbol denotes the default scheme set and the red symbol denotes the optimal scheme set

The MRF scheme has been widely used in atmospheric numerical models because of its small computational demands and because it produces reasonable results. The scheme, based on the nonlocal-K approach proposed by Troen and Mahrt (1986), substantially improves the precipitation forecast by enhancing the convective processes at the right place and by suppressing abnormal rainfall events (Hong and Pan 1996; Balzarini et al. 2014).

3.3 Cumulus scheme

After completing the first two sets of simulations, we conducted a third set focusing on the cumulus scheme, with the microphysics and PBL options set to CAM5.1 and MRF, respectively. Cumulus schemes are intended to represent the vertical fluxes due to unresolved updrafts and downdrafts and the compensating motion outside clouds. Cumulus parameterizations are theoretically valid only for coarser grid sizes (greater than 10 km), where they are necessary to release latent heat in the convective columns. According to Table 6, the MIEI of the cumulus schemes ranges from 0.68 to 0.75, and the BMJ scheme has the smallest value (0.68). In contrast, the GFE scheme has the largest MIEI (0.75), indicating the worst performance. For the mean field, the vector mean error (VME), which summarizes the mean errors of the multivariable integrated field, ranges from 0.94 to 1.93. For the optimal scheme (BMJ), the mean errors (ME) of both precipitation and temperature are small (−0.63 and −0.75), whereas for the other options the ME of either temperature or precipitation is large. For the anomaly field, the centered root-mean-square vector deviation (cRMSVD), which provides a comprehensive evaluation of the anomaly field, varies from 0.83 to 0.98. The RMSDs of the individual variables differ markedly among the cumulus schemes: the RMSD of temperature, ranging from 0.44 to 0.56, is much smaller than that of precipitation, which ranges from 0.68 to 0.85. The spatial patterns of all simulations are close to the observed anomaly field, as shown by high values of the centered vector similarity coefficient (cVSC; 0.77–0.81). The centered root-mean-square length (cRMSL) of all schemes ranges from 0.89 to 1.03, and that of the optimal scheme (0.99) is close to 1.0, which reveals a small amplitude error in the multivariable integrated field. Overall, the optimal BMJ scheme has smaller values of MIEI, cRMSVD, and VME and a greater cVSC, which indicates an improvement in the model results.

Table 6 Three levels of statistics in MVIE in terms of 10 kinds of cumulus schemes; the lighter colors show which schemes performed better

The VFE diagram in Fig. 5 also helps evaluate how the various cumulus schemes and the default scheme perform for the original field. As shown in Fig. 5, all cumulus schemes improve model performance compared with the default combination of schemes. For example, the optimal BMJ scheme is clearly more accurate than the default scheme. The MIEI value of the BMJ scheme (0.68) is much smaller than that of the default WSM6 scheme set (0.78). Specifically, the BMJ scheme shows a smaller root-mean-square vector difference (RMSVD, 0.88) and a higher vector similarity coefficient (VSC, 0.78) than the default scheme (RMSVD of 1.11 and VSC of 0.69). However, it should be noted that the RMSL of the default scheme (1.02) is closer to 1.0 than that of the BMJ scheme (0.86). The VFE diagram thus clearly illustrates that the smaller overall RMSVD of the BMJ scheme results from the improvement in pattern similarity rather than from a reduction in the systematic difference in vector length.
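
The separation of the RMSVD into a vector-length part and a pattern-similarity part can be sketched as follows. The sketch assumes the usual uncentered definitions of RMSL, VSC, and RMSVD, for which the squared RMSVD splits exactly into (L_model − L_obs)² plus 2·L_model·L_obs·(1 − VSC). The function names and the sample data are hypothetical, and the normalization used for the values quoted in the text may differ from the one assumed here.

```python
import math


def rmsl(field):
    """Root-mean-square length of a vector field given as M series of length N."""
    n = len(field[0])
    return math.sqrt(sum(x * x for var in field for x in var) / n)


def vsc(a, b):
    """Vector similarity coefficient between two vector fields."""
    dot = sum(x * y for va, vb in zip(a, b) for x, y in zip(va, vb))
    sq_a = sum(x * x for var in a for x in var)
    sq_b = sum(x * x for var in b for x in var)
    return dot / math.sqrt(sq_a * sq_b)


def rmsvd(a, b):
    """Root-mean-square vector difference between two vector fields."""
    n = len(a[0])
    total = sum((x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb))
    return math.sqrt(total / n)


def split_rmsvd_squared(l_model, l_obs, r_v):
    """Split RMSVD^2 into a vector-length term and a pattern-similarity term."""
    length_term = (l_model - l_obs) ** 2
    pattern_term = 2.0 * l_model * l_obs * (1.0 - r_v)
    return length_term, pattern_term


# Illustrative, made-up two-variable fields.
model = [[1.0, 2.0, 0.5, 1.5], [0.2, 1.1, 0.9, 0.4]]
obs = [[1.2, 1.8, 0.7, 1.4], [0.0, 1.5, 0.6, 0.8]]

length_term, pattern_term = split_rmsvd_squared(rmsl(model), rmsl(obs), vsc(model, obs))
print("RMSVD^2 computed directly:", rmsvd(model, obs) ** 2)
print("length term + pattern term:", length_term + pattern_term)
```

Comparing the two terms for a given scheme indicates whether its RMSVD is dominated by an amplitude mismatch or by a pattern mismatch.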

Fig. 5 VFE diagram for the 10 kinds of cumulus schemes over the Haihe River Basin. The purple symbol denotes the default scheme set and the red symbol denotes the optimal scheme set

The optimal cumulus scheme, the Betts–Miller–Janjic (BMJ) scheme, is derived from the Betts–Miller convective adjustment scheme, with its concepts and parameter values refined as recommended by Janjic (1994, 2000). Furthermore, cumulus schemes fall into two main classes: (i) the adjustment type, which contains only the BMJ scheme, and (ii) the mass-flux type, which contains all other cumulus schemes in WRF.

3.4 Longwave/shortwave radiation scheme

The radiation scheme is responsible for atmospheric heating resulting from radiative flux divergence and for the surface downward longwave and shortwave radiation used in the ground heat budget. Longwave radiation schemes mainly compute clear-sky and cloudy upward and downward radiation fluxes. Shortwave radiation schemes compute clear-sky and cloudy solar fluxes, and most consider both downward and upward fluxes (the Dudhia scheme has only a downward flux). The longwave/shortwave radiation simulation group is assessed next (see Table 7). Because WRF failed to start with some of the radiation schemes, we selected the nine radiation schemes that produced model output. The MIEI values of the radiation schemes differ only slightly, ranging from 0.67 to 0.68. Thus, we select the optimal option on the basis of the performance of the mean fields and anomaly fields. For mean fields, the CAM scheme, which is also the default radiation scheme, has the minimum vector mean error (VME) value of 0.84, indicating the smallest mean difference. For anomaly fields, most of the simulations in this group have a greater centered vector similarity coefficient (cVSC) and a smaller centered root-mean-square vector difference (cRMSVD) than the earlier simulation groups, which demonstrates that the process of selecting optimal physics schemes enhances the quality of the simulations. The centered root-mean-square length (cRMSL) of all radiation schemes is consistently at or above 1.0. Specifically, the cRMSL of the optimal scheme (CAM) is 1.00, indicating a small amplitude error in the anomaly field. The optimal scheme also has a larger cVSC (0.82) and a smaller cRMSVD (0.86) than the other options. The CAM longwave radiation scheme, documented fully by Collins et al. (2006), is linked to resolved clouds and cloud fractions and is able to handle several trace gases. 
Additionally, the CAM shortwave radiation scheme uses cloud fractions and overlap assumptions in unsaturated regions, has a monthly zonal ozone climatology, and can handle the optical properties of several aerosol types and trace gases. All in all, the CAM radiation scheme is especially suited to regional climate simulations because its ozone distribution varies during the simulation according to monthly zonal-mean climatological data.

Table 7 Three levels of statistics in MVIE in terms of 9 kinds of Longwave/Shortwave schemes; the lighter colors show which schemes performed better

3.5 Land surface parameterization scheme

Land surface model (LSM) schemes were developed in response to increased interest in land-use activities and the need to simulate regional climates more precisely. Vegetation, orographic features, and surface heterogeneity, all suitably represented in an LSM, influence surface albedo (radiative transfer), surface roughness (momentum transfer), and surface hydrology (sensible and latent heat transfer; runoff). The last simulation group focuses on the evaluation of the land surface schemes given in Table 8. In most runs, there is little difference in the MIEIs of the various LSM schemes, and the optimal Noah scheme has a relatively small MIEI value (0.68). For the mean field in particular, the mean errors of the individual variables in this group are smaller than those in all other groups. For example, the mean error (ME) of temperature in the Noah simulation is much smaller than that of the other options and is close to 0. The VFE diagram in Fig. 6 shows that, although not all radiation and LSM schemes differ significantly from the optimal choice in their modeling results, the best combination of schemes performs better than the default scheme set.

Table 8 Three levels of statistics in MVIE in terms of 3 kinds of Land Surface schemes; the lighter colors show which schemes performed better

Fig. 6 VFE diagram for the 9 kinds of radiation schemes and 3 types of LSM schemes over the Haihe River Basin. The purple symbol denotes the default scheme set and the red symbol denotes the optimal scheme set. Because the optimal LSM scheme (Noah) is consistent with the default scheme, we discuss the two types of physics options here together in the VFE diagram

The optimal LSM scheme selected above was the Noah LSM scheme, which is the successor to the OSU LSM scheme described by Chen and Dudhia (2001), and has a four-layer soil temperature and moisture model with canopy moisture and snow cover prediction. Additionally, the Noah LSM not only predicts soil ice and fractional snow cover effects, but also has an improved urban treatment, and considers surface emissivity properties (Koren et al. 1999; Ek et al. 2003).

3.6 Identifying the optimal parameterization scheme combination

The sensitivity tests of precipitation and surface temperature simulated by the WRF model with different parameterization scheme combinations show that the best-performing scheme set consists of the CAM5.1 microphysics scheme, the MRF PBL scheme, the BMJ cumulus scheme, the CAM shortwave/longwave radiation scheme, and the Noah land surface scheme. To confirm that this setup has the best overall performance in simulating temperature and precipitation, several approaches were used in this study. A key tool is the VFE diagram, which summarizes various statistics from all groups of WRF simulations (Fig. 7). Some manipulation of the statistical metrics is necessary to rank the model performance of all simulation members in the study; thus, the RMSL, VSC, and RMSVD statistical quantities have been integrated to assess all 58 sets of simulations shown in Fig. 7. As shown in the figure, the different simulation groups are clearly separated from each other. The performance of members in different groups in the VFE diagram has been discussed above, so here we focus on whether the evaluation method enhances the performance of each option on the basis of the spatial mean values and temporal mean values of all scalar fields (i.e., surface temperature and precipitation). The earlier work involved five stages of evaluation, and it can be seen from Fig. 7 that the RMSVD and VSC statistical quantities improved markedly over the course of selecting the five kinds of physics parameterizations. The difference between the default model setup and the optimal set of schemes can also be seen in Fig. 7. Specifically, compared with the default scheme, the optimal scheme has a higher VSC (0.79; that of the default is 0.69) and a smaller RMSVD (0.87; that of the default is 1.11). 
Some analysis of the RMS values of the individual scalar fields is necessary before ranking the RMSL values of the vector field. For surface temperature, the normalized RMS of the optimal scheme (1.00) is closer to 1.0 than that of the default scheme (0.95). In contrast, the optimal scheme underestimates the RMS of precipitation (0.72), whereas the default scheme overestimates it (1.07). All in all, the optimal combination of physics parameterizations improves the reproduction of surface temperature and precipitation in the WRF model over the Haihe River Basin.

Fig. 7 VFE diagram for all 58 simulations over the Haihe River Basin. Different colors represent the different physics options, and the red (purple) symbol denotes the optimal (default) scheme set over the basin

Next, to identify how the differences between parameterization schemes affect the bias between the model outputs and the observations, we calculated the spatial mean bias of the best schemes of all five options examined above to present the spatial bias characteristics of precipitation and surface temperature (Fig. 8) over the Haihe River Basin. The statistic for precipitation and temperature is calculated as the spatial average of the differences in the three months' mean fields. In addition, to verify that the optimal set of physics parameterizations is indeed an improvement, we also calculated the statistics for the default scheme. As shown in Fig. 8, the evaluation approach described above greatly enhances the ability of the model to simulate precipitation. With the default option, the WRF model greatly overestimates rainfall. This overestimation of summer rainfall covers most regions of the Haihe River Basin, and overestimations of up to 4–5 mm/day are located in the northeastern part, the southeast plain area, and the western hilly area of the basin. With the optimal microphysics option, the simulation still overestimates rainfall by more than 3 mm/day in some areas. With the optimal PBL scheme, there is no significant difference between the model and the observational data, and the bias values all lie between −3 and +3 mm/day. The optimal cumulus scheme yields the smallest bias magnitudes (between −1 and +1 mm/day) across the Haihe River Basin. 
In the last simulation (the optimal radiation/LSM scheme; because the optimal LSM scheme is the same as the default scheme, the optimal radiation and LSM simulations coincide), the large bias values seen in almost all of the other simulations have mostly disappeared. The summer precipitation predictions remain poor only over a very small region in the southern hilly area, with a relatively small underestimation, while the remaining parts closely resemble the results of the optimal cumulus scheme, with no severe failures.
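
The bias statistic described above (the difference between the time-mean fields of the model and the observations, followed by a spatial average) can be sketched in a few lines of Python. This is a minimal illustration that assumes both datasets are already on the same grid and uses an unweighted average over grid cells; the function names and the tiny sample arrays are hypothetical.

```python
def time_mean(field):
    """Average a [time][y][x] field over time, giving a [y][x] map."""
    nt = len(field)
    ny, nx = len(field[0]), len(field[0][0])
    return [
        [sum(field[t][j][i] for t in range(nt)) / nt for i in range(nx)]
        for j in range(ny)
    ]


def bias_map(model, obs):
    """Model minus observation for the time-mean fields, cell by cell."""
    m, o = time_mean(model), time_mean(obs)
    return [[mv - ov for mv, ov in zip(mrow, orow)] for mrow, orow in zip(m, o)]


def spatial_mean(grid):
    """Unweighted average over all grid cells (no area weighting assumed)."""
    cells = [value for row in grid for value in row]
    return sum(cells) / len(cells)


# Illustrative, made-up precipitation values (mm/day): 2 time steps on a 2 x 2 grid.
model = [[[2.0, 4.0], [6.0, 8.0]], [[4.0, 6.0], [8.0, 10.0]]]
obs = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]

bias = bias_map(model, obs)
print("bias map:", bias)
print("spatial mean bias:", spatial_mean(bias))
```

Positive cells in the bias map correspond to overestimation by the model and negative cells to underestimation.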

Fig. 8 Spatial characteristics of the mean model deviations from the observational data for precipitation (left) and surface temperature (right)

For surface temperature (Fig. 8), the majority of the simulations show temperature biases between −2.5 °C and +2.5 °C. Underestimates of about −2.5 °C are located in most parts of the hilly areas (the northern and western parts), and the model deviation in the central plateau area shows a small overestimate (above 0.5 °C) or a small underestimate (above −1 °C). The temperature simulated by each physics combination shows a similar pattern (Fig. 8). However, the change in the model bias in some parts of the Haihe River Basin shows an improvement in the temperature forecast in the final simulation. For example, the central area shifts from a small underestimate (above −1 °C) to a small overestimate (above 0.5 °C). Furthermore, the peak values of the overestimates in several parts of the hilly areas have disappeared. Although the overestimates in some parts have intensified in the final simulation, the simulation of surface temperature has improved to some extent as a result of using the optimal combination of physics schemes.

We analyzed the relative differences in the impacts of all combinations on the temporal characteristics of surface temperature (Fig. 9) and precipitation (Fig. 10) by calculating the spatial averages over the Haihe River Basin at the daily timescale. As shown in Fig. 9, the first four plots show the daily spatial averages of all simulations during the study period. All of the simulated temperatures are lower than the observed temperatures. Note that the simulated temperature improves steadily as the optimal options are selected in succession and gradually approaches the observed temperature, which indicates that the evaluation method we have chosen is capable of selecting the best scheme set to further enhance the model's prediction accuracy.
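
The daily spatial averages described above can be sketched as follows. The example reduces a [day][y][x] field to one basin-mean value per day and then measures how far a simulated series lies from the observed one. An unweighted average over grid cells is assumed, and the function names and sample values are hypothetical rather than taken from the study.

```python
def daily_spatial_average(field):
    """Reduce a [day][y][x] field to one unweighted basin-mean value per day."""
    series = []
    for day in field:
        cells = [value for row in day for value in row]
        series.append(sum(cells) / len(cells))
    return series


def mean_abs_diff(sim_series, obs_series):
    """Mean absolute difference between two daily series of equal length."""
    pairs = list(zip(sim_series, obs_series))
    return sum(abs(s - o) for s, o in pairs) / len(pairs)


# Illustrative, made-up surface temperatures (deg C): 2 days on a 2 x 2 grid.
obs = [[[24.0, 26.0], [25.0, 25.0]], [[27.0, 27.0], [26.0, 28.0]]]
default_run = [[[22.0, 24.0], [23.0, 23.0]], [[25.0, 25.0], [24.0, 26.0]]]
optimal_run = [[[23.5, 25.5], [24.5, 24.5]], [[26.5, 26.5], [25.5, 27.5]]]

obs_series = daily_spatial_average(obs)
for label, run in [("default", default_run), ("optimal", optimal_run)]:
    series = daily_spatial_average(run)
    print(label, series, "mean |diff| =", mean_abs_diff(series, obs_series))
```

A series that approaches the observed one as each optimal option is selected would show a decreasing mean absolute difference across the successive simulations.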

Fig. 9 Temporal characteristics of the spatial averages of surface temperature for all simulations over the Haihe River Basin during the summer of 2016. The gray lines denote the non-optimal schemes, and the other colored lines denote the optimal schemes for each physics option

Fig. 10 Temporal characteristics of the spatial averages of precipitation for all simulations over the Haihe River Basin during the summer of 2016. The gray lines denote the non-optimal schemes, and the other colored lines denote the optimal schemes for each physics option

Figure 10 shows the regional average of the simulated rainfall on each day in summer. For each physics parameterization in Fig. 10, we focus on the optimal scheme for each option examined above. Although there is no marked amplitude error among the various scheme sets, the optimal setup (red line) is closer to the observations (yellow line) than the other setups in Fig. 10. Daily rainfall amounts in the Haihe River Basin, which is located in a semi-arid region, are small, which may account for the unremarkable amplitude errors between the model and the observational data. All in all, the model can simulate the overall temporal distribution of the observed precipitation to some degree.

4 Conclusions

According to previous studies, the simulation of rainfall and temperature is fundamental to climate research using the Advanced Research Weather Research and Forecasting (WRF) model, particularly because the WRF model performs poorly when simulating rainfall. Selecting the optimal combination of schemes from a wide range of physical parameterization sets improves the performance of the model, even at the risk of high computational cost. For these two reasons, in this study various physics combinations were used to simulate the surface temperature and precipitation during a summer season in the Haihe River Basin for the purpose of optimizing the performance of the WRF model. In contrast with previous studies, we chose a multivariable integrated evaluation (MVIE) method, which groups various scalar fields (here temperature and precipitation) into a vector field, in order to thoroughly identify the most appropriate combination of schemes for the WRF model for the Haihe River Basin. Furthermore, the 58 physics combinations were not determined by enumerating all permutations and combinations of the parameterizations from the five physics options, because of the computational cost, but by a stepwise refinement method based on an analysis of the interactions between the WRF parameterization schemes. In brief, this study has applied a more comprehensive evaluation methodology to assess more efficiently the existing options that the WRF model offers (here the five kinds of options, MP, PBL, CU, RA, and LSM) with as small a computational cost as possible.
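
The stepwise refinement idea summarized above can be sketched as a greedy search: the physics options are tuned one at a time, and the best choice found for each option is kept fixed while the next option is varied. The sketch below is an illustration under stated assumptions and not the authors' workflow. The candidate names, the penalty table, and the `toy_score` function are hypothetical placeholders for a real WRF run followed by a skill metric for which lower values are better.

```python
import math


def stepwise_select(options, default, score):
    """Tune one option at a time, keeping earlier winners fixed.

    options: dict mapping option name -> list of candidates, in tuning order.
    default: dict with the starting choice for every option.
    score:   callable(config) -> float, lower is better (hypothetical metric).
    """
    best = dict(default)
    runs = 0
    for name, candidates in options.items():
        scored = []
        for candidate in candidates:
            trial = dict(best)
            trial[name] = candidate
            scored.append((score(trial), candidate))
            runs += 1
        best[name] = min(scored)[1]  # lowest score wins; ties broken by name
    return best, runs


# Hypothetical candidates and penalties; not results from the study.
OPTIONS = {"MP": ["mp_a", "mp_b", "mp_c"], "PBL": ["pbl_a", "pbl_b"], "CU": ["cu_a", "cu_b"]}
DEFAULT = {"MP": "mp_a", "PBL": "pbl_a", "CU": "cu_a"}
PENALTY = {"mp_a": 0.30, "mp_b": 0.10, "mp_c": 0.20,
           "pbl_a": 0.25, "pbl_b": 0.15, "cu_a": 0.05, "cu_b": 0.12}


def toy_score(config):
    """Stand-in for running a simulation and computing an evaluation index."""
    return sum(PENALTY[choice] for choice in config.values())


best, runs = stepwise_select(OPTIONS, DEFAULT, toy_score)
exhaustive = math.prod(len(c) for c in OPTIONS.values())
print("selected:", best)
print("stepwise runs:", runs, "versus exhaustive combinations:", exhaustive)
```

The number of runs grows with the sum of the candidate counts rather than with their product, which is the source of the computational saving. The trade-off is that a greedy search can miss a better combination when the options interact strongly.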

In the evaluation process, various metrics were calculated and integrated to produce a preferred combination of physics parameterizations, comprising the CAM5.1 MP, MRF PBL, BMJ cumulus, CAM radiation, and Noah LSM schemes, in comparison with the default scheme set (the WSM6 MP, YSU PBL, KF cumulus, CAM radiation, and Noah LSM schemes). Analysis of the multivariable integrated field revealed that the optimal scheme for each option examined above has a smaller MIEI value, which is largely attributable to a higher centered vector similarity coefficient (cVSC) and a smaller mean error. The optimal setup has a higher cVSC value than the other setups, which indicates good pattern similarity in the anomaly field. For the mean field, the optimal scheme for each option showed a decreasing mean error for both precipitation and temperature. For surface temperature in particular, the mean error of the optimal scheme for each option gradually approached 0.

For this domain, we recommend the scheme set comprising the CAM5.1 MP, MRF PBL, BMJ cumulus, CAM radiation, and Noah LSM schemes. As evidence of the advantages of the set of schemes selected by the MVIE approach, the VFE diagram with all sets of schemes clearly illustrates that the best set of schemes improved the pattern similarity and the RMS vector difference compared with the default scheme set. The spatial mean bias plots of temperature and precipitation also demonstrate that the best scheme shows much smaller deviations from the observed values than the default scheme, and, in particular, the spatial improvement in the prediction of precipitation is obvious. The temporal plots of the spatial averages of temperature and precipitation show that the simulation of the spatial averages gradually improved with the selection of successive optimal schemes, and that surface temperature matches the observational data better than precipitation does. Perhaps a more robust conclusion drawn from the overall analysis is that the model output is more sensitive to the choice of microphysics and PBL schemes than to the choice of cumulus scheme, whereas the radiation and LSM schemes have no significant impact on the model results.

It should be acknowledged that there are uncertainties in the observational data. The in situ observational data not only contain errors from instruments and measuring practices but also cover a limited period, so a longer time series should be taken into consideration. In further work, greater advantage should be taken of more sets of observational data, such as Climatic Research Unit (CRU) gridded data, Global Historical Climatology Network (GHCN) temperature data, and Global Precipitation Climatology Centre (GPCC) precipitation data. Also, the results for the optimal scheme set in this study can increase confidence in regional modeling over the Haihe River Basin for subsequent studies. It is worth pointing out that MVIE needs to be further improved before it can be used with higher-resolution convection-permitting models (Liu et al. 2017; Li et al. 2019). Model evaluators can also apply the assessment method used in this study to other domains of interest, other variables, or extreme events.