1 Introduction

In recent decades, global warming has emerged as one of the most challenging problems (Smith et al. 2018; Mokhov 2022), with human-induced heat-trapping gases such as CO2, CH4, and N2O contributing significantly to the rise in global temperatures (Liu et al. 2022a; Yasmin et al. 2022). The escalating levels of these greenhouse gases have had adverse effects on forestry (Sperry et al. 2019), agriculture (Baldos et al. 2019), hydrology (Wine and Davison 2019), fish physiology (Alfonso et al. 2021), livestock (Lacetera 2019), and climate (Rajak 2021). Global warming also disrupts the natural cycle of meteorological variables (Dou et al. 2022; Zhang et al. 2021), intensifying evaporation and leading to localized storms and droughts. This amplified water cycle gives rise to extreme weather conditions such as floods and droughts (Duan and Duan 2020; Oh et al. 2020; Çakmak et al. 2021; Wei et al. 2021; Çakmak and Acar 2022). Moreover, Mare et al. (2018) found that the number of fatalities from precipitation-related natural disasters consistently exceeds that from all other types of global incident. It is therefore crucial to monitor and forecast precipitation characteristics more accurately to ensure environmental sustainability (Perović et al. 2021).

Numerous methods have been developed to project climate change across different regions and time periods (Russo et al. 2022). Global climate models (GCMs) are mathematical models built on physical and biological laws, and they play a crucial role in understanding global climate variations and projecting future conditions (Try et al. 2022; Hamed et al. 2022). The Coupled Model Intercomparison Project (CMIP) is a collaborative effort that standardizes GCM experiments and analyzes the outputs of these models. For instance, Wu et al. (2020) employed CMIP6 climate models to project changes in wind speed, while Yue et al. (2021) used CMIP6 GCMs to analyze and project changes in precipitation and temperature. Climate models are subject to several sources of uncertainty (Davies-Barnard et al. 2022; Lovenduski et al. 2016; Zheng et al. 2021). To make climate projections more robust, multiple models are often combined in a multi-model ensemble (MME) (Kim et al. 2020).

An MME combines several models to provide a more comprehensive and reliable picture than any single model. Both unweighted and weighted strategies exist in the literature for combining the ensemble members (Mudryk et al. 2020). Unweighted ensemble methods combine all GCMs by a simple average (Raju and Kumar 2020); for example, Xu et al. (2019), Liu et al. (2022), and Dong et al. (2021) used simple model averaging (SMA) to create ensembles. Weighted MME schemes, in contrast, assign unequal weights to the models (Kim et al. 2020; Mudryk et al. 2020), and these weights reflect the prior performance of each model (Jose et al. 2022; Morim et al. 2020). Several researchers have therefore developed different weighting schemes for ensembling (Knutti et al. 2019; Sanderson et al. 2015). For example, Bayesian model averaging (BMA) is a prominent ensemble method that assigns weights through the posterior probabilities of the models (Ombadi et al. 2021; Raftery et al. 2005; Zhang et al. 2016), and Wootten et al. (2020) combined multiple CMIP6 models with BMA to project precipitation. Other weighting strategies include cue weighting (Otterbring et al. 2022), flow-based weighting (Dong et al. 2021), and copula-based Bayesian model averaging (CBMA) (Ehteram et al. 2022; Seifi et al. 2022). Weighted ensemble projections are generally considered more robust than unweighted ones (Xu et al. 2022a; Scafetta 2022).

To the best of our knowledge, the influence of extreme values on GCM predictions is not yet fully accounted for in many MME weighting approaches. Moreover, no existing weighting scheme ranks models by their relevance as features to improve accuracy. The proposed weighting scheme therefore evaluates the value-by-value variation between observed and simulated data to reduce the influence of extreme values. In addition, this research incorporates Monte Carlo Feature Selection (MCFS) (Dramiński et al. 2008) to identify the relative importance (RI) of each GCM with respect to the observed data. We chose MCFS because of its reported advantages over other algorithms in handling data-related issues such as multicollinearity, in capturing nonlinear interactions, and in its flexibility for model selection.

MCFS is a machine learning tool, and machine learning now plays an essential role in forecasting and classification. This is due to the availability of large, high-dimensional data from fields such as healthcare, social media, online education, and environmental sciences (Nematzadeh et al. 2019; Zhou et al. 2022). Models fitted to high-dimensional data sets may overfit or underfit because of redundant or irrelevant features (Alirezanejad et al. 2020). Feature selection is therefore needed not only to handle many variables efficiently but also to select the relevant features for accurate modeling and prediction (Tadist et al. 2019).

The literature offers several machine learning–based feature selection algorithms (Hasan and Bao 2021). They fall into two main types, filter and wrapper techniques, and the wrapper technique is further divided into wrapper methods and embedded methods (Li et al. 2021a). Wrapper methods use machine learning algorithms to evaluate features, whereas embedded methods use techniques such as ridge and LASSO regression (Alirezanejad et al. 2020; Zhou et al. 2022). Filter approaches, on the other hand, do not use any learning model; they rely on their own computationally efficient criteria (Alhakami et al. 2019).

Overall, the objective of this research is to combine the MCFS algorithm with additional mathematical formulations to improve the ensembling of multiple GCMs. Because MCFS alone gives each model a single weight based on its overall prior performance, this research proposes a new weighting scheme, MCFSAWS-Ensemble, which also considers the value-by-value variation and assigns a weight to every value of each model. By using MCFSAWS-Ensemble, policymakers can therefore base policies for a sustainable environment on more accurate aggregated information and help reduce the impact of extreme events on society. As an application, we apply MCFSAWS-Ensemble to precipitation data from the Tibetan Plateau region of China, using 20 GCM simulations from CMIP6.

The remainder of this paper is organized as follows: Section 2 presents the existing and proposed methods. Section 3 describes the study area and data. Section 4 discusses the results. Finally, Section 5 presents the conclusions. The flow chart of the proposed weighting scheme is shown in Fig. 1.

Fig. 1 Flow chart of the proposed weighting scheme

2 Methods

2.1 Monte Carlo Feature Selection

Monte Carlo Feature Selection (MCFS) is a machine learning algorithm for ranking features (variables), developed by Dramiński et al. (2008). The MCFS algorithm consists of three main steps. The first step estimates the importance of the features. The second step validates the performance of the selected features. The third step confirms the features, identifies the most important ones, and assigns weights to them according to their significance. The visual representation of the MCFS algorithm is shown in Fig. 2. In summary, MCFS distinguishes informative from non-informative features and ranks them accordingly (Li et al. 2020). It uses mathematical models to describe the variations between features in terms of probability distributions (Niaz et al. 2020). A random sampling procedure is repeated over many iterations to derive subsets of features (Tadist et al. 2019), with a different subset chosen at each iteration. The outcomes are used to estimate the probability that each feature is relevant to the classification task. The main purpose of MCFS is thus to rank many features by building thousands of classification trees on randomly selected feature subsets (Yan et al. 2019). Consider a classification problem with c classes and a sample of size n (Dramiński et al. 2010). The weighted accuracy (WAcc) of a tree measures the classification skill of that tree on a test set and is defined as follows.

$$\textrm{WAcc}=\frac{1}{c}\sum\nolimits_{j=1}^c\frac{n_{jj}}{n_{j1}+n_{j2}+\dots+n_{jc}}$$
(1)
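As a small numerical check of Equation (1), the Python sketch below computes WAcc from a confusion matrix whose entry in row j and column k is n_jk, the number of test samples of class j assigned to class k. The function name and the counts are hypothetical and serve only as an illustration.

```python
import numpy as np


def weighted_accuracy(conf) -> float:
    """Weighted accuracy (WAcc) of one tree, as in Equation (1).

    conf[j, k] holds n_jk: the number of test samples of class j that the
    tree assigned to class k, so each row sum is the size of class j.
    """
    conf = np.asarray(conf, dtype=float)
    class_totals = conf.sum(axis=1)      # n_j1 + n_j2 + ... + n_jc
    tpr = np.diag(conf) / class_totals   # true positive rate of each class
    return float(tpr.mean())             # average over the c classes


# Hypothetical 3-class confusion matrix for a single tree's test set
conf = np.array([[9, 1, 0],
                 [2, 2, 1],
                 [0, 0, 5]])
print(weighted_accuracy(conf))
```

Here the per-class true positive rates are 0.9, 0.4, and 1.0, so WAcc is about 0.767, whereas plain accuracy would be 16/20 = 0.8; WAcc thus penalizes a tree that performs poorly on any single class, regardless of that class's size.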
Fig. 2 Flowchart of the MCFS algorithm

WAcc measures the classification skill of an individual tree on its test set and is used to weight that tree's contribution to the importance of the features it uses. Here, njk (j, k = 1, 2, 3, …, c; \(\sum_{jk}{n}_{jk}=n\)) is the number of samples from the jth class classified into the kth class, so WAcc is simply the average of the true positive rates of all classes (Equation (1)). The weighted accuracy is then used to determine the relative importance \({RI}_{g_k}\) of a feature \(g_k\) (Equation (2)), as given below.

$${RI}_{g_k}=\sum\nolimits_{\iota =1}^{s\cdot t}{\textrm{WAcc}}_{\iota}^{u}\sum\nolimits_{{n}_{g_k}\left(\iota \right)}GR\left({n}_{g_k}\left(\iota \right)\right){\left(\frac{\textrm{no. in}\ {n}_{g_k}\left(\iota \right)}{\textrm{no. in}\ \iota}\right)}^{v}$$
(2)

In the above equation, s·t is the total number of trees (s random feature subsets, with t trees built on each subset), and the inner sum runs over all nodes \({n}_{g_k}(\iota)\) of the ιth tree that split on feature \(g_k\). \(GR({n}_{g_k}(\iota))\) is the gain ratio of node \({n}_{g_k}(\iota)\), no. in \({n}_{g_k}(\iota)\) is the number of samples in that node, no. in ι is the number of samples at the root of the ιth tree, and u and v are positive real numbers.
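To make the aggregation in Equation (2) concrete, the following Python sketch accumulates RI from per-tree summaries. It assumes the trees have already been grown (drawing the random feature subsets, splitting into training and test sets, and inducing the trees are not shown), and the tree records, the feature names M1 to M3, the gain ratios, and the node sizes are all hypothetical. It illustrates the formula only and is not the rmcfs implementation.

```python
from collections import defaultdict

# Hypothetical per-tree summaries. Each tree stores its weighted accuracy,
# the number of samples at its root, and one (feature, gain_ratio,
# samples_in_node) entry per node that splits on a feature.
trees = [
    {"wacc": 0.8, "root_n": 100,
     "splits": [("M1", 0.30, 100), ("M2", 0.10, 40), ("M1", 0.05, 60)]},
    {"wacc": 0.6, "root_n": 100,
     "splits": [("M3", 0.20, 100), ("M2", 0.15, 55)]},
]


def relative_importance(trees, u=1.0, v=1.0):
    """Accumulate RI for every feature following Equation (2).

    Each splitting node adds WAcc**u * GR * (samples_in_node / samples_at_root)**v
    to the RI of the feature it splits on.
    """
    ri = defaultdict(float)
    for tree in trees:
        tree_weight = tree["wacc"] ** u
        for feature, gain_ratio, n_node in tree["splits"]:
            ri[feature] += tree_weight * gain_ratio * (n_node / tree["root_n"]) ** v
    return dict(ri)


ri = relative_importance(trees)
ranking = sorted(ri, key=ri.get, reverse=True)
print(ranking, ri)
```

The choice u = v = 1 is purely illustrative. With these toy values M1 collects importance from two nodes of an accurate tree and ranks first, ahead of M3 and M2.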

2.2 The proposed weighting scheme — MCFSAWS-Ensemble: Monte Carlo Feature Selection Adaptive Weighting Scheme for GCMs Ensemble

This section presents the mathematical structure of the proposed ensemble weighting scheme, whose flow chart is given in Fig. 1. The proposed weighting scheme hybridizes two types of weights derived from two sources. The first source quantifies the relative importance of each simulated model, while the second extracts weights from the value-by-value differences between each GCM and the observed data. We hypothesize that the first-source weights reflect the relative importance of each climate model in a multi-model ensemble, whereas the second-source weights reduce the impact of outliers on model aggregation.

Mathematically, consider the multivariate precipitation time series of observed data and of simulated data from multiple climate models at a single grid point, denoted R = [Y, M1, M2, M3, …, Mk]. Here, Y is the observed precipitation at that grid point, and M1, M2, M3, …, Mk are the temporal vectors of the k simulated models. To combine multiple models under various future scenarios, this research proposes the following steps for deriving a weight for each model.

2.2.1 Source 1 — Implication of model importance using MCFS

This source computes the overall importance of each model based on its prior performance. We take the Relative Importance Score (RIS) of each model, computed with Equation (2), as its first-source weight and denote it by Vi. In this study, the RIS is calculated with the rmcfs package in R. This score serves as the initial weight for each simulated model.

2.2.2 Source 2 — Real-time value by value base extraction of weights under exponential transformation

This source provides a set of equations that derives weights for each model by transforming the deviations between each observed value (Y) and the corresponding simulated value (Mi). The transformation is designed so that model values closest to the observed data receive high weights and distant values receive low weights. First, we take the absolute difference between the observed value and each model value:

$${d}_i=\mid Y-{M}_i\mid$$
(3)

Second, we exponentiate the differences computed by Equation (3). The purpose of exponentiation is to amplify large deviations, so that extreme values and outliers far from the observation are penalized much more strongly than small deviations.

$${z}_i={e}^{d_i}$$
(4)

Third, the following transformation yields a set of indices describing how close each simulated value is to the observed value.

$${p}_i=1-\frac{z_i}{q}$$
(5)

In the above equation, \(q={\sum}_{i=1}^k{z}_i\).

The main objective of this transformation is to weight each GCM according to its distance from the observed data, so that a smaller distance yields a higher weight and vice versa.
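For a single time step, Equations (3) to (5) can be sketched in Python with NumPy as below. The function name `closeness_indices` and the example values are hypothetical. One implementation choice is ours rather than the source's: since z_i / q is a softmax of the deviations d_i, the sketch subtracts max(d) before exponentiating, which leaves every p_i unchanged but avoids floating-point overflow (a direct exp(d_i) overflows double precision once d_i exceeds roughly 709 in the data's units).

```python
import numpy as np


def closeness_indices(y_t, m_t):
    """Equations (3)-(5) at a single time step t.

    y_t : observed precipitation at time t
    m_t : values simulated by the k GCMs at time t (k >= 2)
    Returns p_i = 1 - z_i / q with z_i = exp(d_i), d_i = |y_t - M_i|, q = sum(z_i).
    """
    d = np.abs(y_t - np.asarray(m_t, dtype=float))  # Equation (3)
    # z_i / q is a softmax of d; subtracting max(d) before exponentiating
    # leaves the ratio unchanged but keeps exp() from overflowing.
    z = np.exp(d - d.max())                          # Equation (4), rescaled
    return 1.0 - z / z.sum()                         # Equation (5)


y_t = 10.0
m_t = np.array([9.5, 12.0, 25.0])  # hypothetical values of three GCMs at time t
p = closeness_indices(y_t, m_t)
print(p, p.sum())
```

Because each p_i equals 1 − z_i/q and the ratios z_i/q sum to 1, the indices always sum to k − 1 (here 2), so the standardization that follows amounts to dividing each p_i by k − 1. The example also shows a side effect of the exponential: the largest deviation dominates q, so the two models near the observation get almost identical indices close to 1, while the distant model gets an index near 0.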

We then standardize the indices computed by Equation (5) as follows.

$${U}_i=\frac{p_i}{\sum_{i=1}^k{p}_i}$$
(6)

In the above equation, Ui are the standardized weights, with \({\sum}_{i=1}^k{U}_i=1\). We then hybridize the weights by combining the initial weight (Vi) and the standardized weight (Ui) of the ith GCM through a simple average (Equation (7)), and normalize the result so that the final weights Wi sum to one (Equation (8)).

$${b}_i=\frac{V_i+{U}_i}{2}$$
(7)
$${W}_i=\frac{b_i}{\sum_{i=1}^k{b}_i}$$
(8)

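Equations (3) to (8) are evaluated at every time step t for each of the k models (i = 1, 2, 3, …, k); the time index t is suppressed in those equations for readability, so the resulting weights Wit vary over time. We then aggregate the data of the multiple GCMs with the proposed weighting scheme, which assigns unequal weights to the members of the multi-model ensemble. Mathematically,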

$${X}_t=\sum\nolimits_{i=1}^k{W}_{it}{M}_{it}$$
(9)

In the above equation, Xt denotes the aggregated data of multiple GCMs at time t under the proposed ensemble scheme.
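Putting the steps together, the sketch below applies Equations (3) to (9) to a whole series at once, one row per time step, so that the weights W_it vary over time. It assumes the first-source weights V_i are already available (for example from an MCFS run); the function name `mcfsaws_ensemble`, the toy series, and the V_i values are hypothetical, and the overflow-safe exponentiation is the softmax rescaling described for a single step. It is an illustration under these assumptions, not the authors' code.

```python
import numpy as np


def mcfsaws_ensemble(y, models, v):
    """Sketch of Equations (3)-(9).

    y      : observed series, shape (T,)
    models : simulated series, shape (T, k) with k >= 2, one column per GCM
    v      : first-source weights V_i (e.g. MCFS RI scores), shape (k,)
    Returns the time-varying weights W, shape (T, k), and the ensemble X, shape (T,).
    """
    y = np.asarray(y, dtype=float)
    models = np.asarray(models, dtype=float)
    v = np.asarray(v, dtype=float)

    d = np.abs(y[:, None] - models)               # Eq. (3), per time step
    z = np.exp(d - d.max(axis=1, keepdims=True))  # Eq. (4), overflow-safe rescaling
    p = 1.0 - z / z.sum(axis=1, keepdims=True)    # Eq. (5)
    u = p / p.sum(axis=1, keepdims=True)          # Eq. (6): each row sums to 1
    b = (v[None, :] + u) / 2.0                    # Eq. (7)
    w = b / b.sum(axis=1, keepdims=True)          # Eq. (8)
    x = (w * models).sum(axis=1)                  # Eq. (9)
    return w, x


# Hypothetical toy data: 4 months, 3 GCMs
y = np.array([10.0, 20.0, 15.0, 5.0])
models = np.array([[9.0, 12.0, 30.0],
                   [21.0, 18.0, 10.0],
                   [15.5, 14.0, 16.0],
                   [4.0, 9.0, 5.5]])
v = np.array([0.12, 0.10, 0.08])  # hypothetical RI scores
w, x = mcfsaws_ensemble(y, models, v)
print(np.round(w, 3))
print(np.round(x, 2), models.mean(axis=1))  # weighted ensemble vs simple average
```

Following Equation (7) literally, the raw V_i are averaged with the standardized U_i and only the sum is normalized in Equation (8), so the overall scale of the RI scores relative to U_i affects how much each source contributes. In the first toy month the third model lies 20 units from the observation and its U_i is essentially 0, yet it keeps a small weight through its V_i.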

2.3 Comparative methods and measures

In this article, we use two statistical measures, the Mean Absolute Error (MAE) and the Pearson correlation coefficient, to assess the performance of MCFSAWS-Ensemble, and we compare its scores with those of the Simple Model Averaging (SMA) method.

SMA is a statistical method that combines models by giving each model equal weights (e.g., Dey et al. 2022; Zeng et al. 2022). It is simple and easy to implement. Therefore, it is widely used to combine multiple models. This simple average of k climate models is presented in Equation (10). Here, Mj(t) is the output of jth GCM at time t.

$$\textrm{SMA}(t)=\frac{1}{k}\sum\nolimits_{j=1}^k{M}_j(t)$$
(10)

MAE is a widely used error measure; for example, Xu et al. (2022b), Chen et al. (2022), and Niu et al. (2023) used it as a performance assessment criterion. The mathematical formulation of MAE is presented below.

$$\textrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|Y_i-\rho_i\right|$$
(11)

In the above equation, n is the sample size, Yi is the observed data, and ρi is the estimated data. The Pearson correlation coefficient is a statistical tool for measuring the linear relationship between variables (Rungskunroch et al. 2022). Several researchers have used correlation coefficients in their studies; for example, Varney et al. (2022) used correlation to evaluate the performance of their proposed index. Its value lies in the interval [−1, 1]. The correlation (r) between the observed data (Y) and the estimated data (ρ) is given in Equation (12).

$$\textrm{r}=\frac{Cov\left(Y,\rho \right)}{\sigma_Y\ {\sigma}_{\rho }}$$
(12)

In the above equation, Cov(Y, ρ) is the covariance between Y and ρ, σY is the standard deviation of Y, and σρ is the standard deviation of ρ.
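The validation measures of Equations (10) to (12) can be computed per grid point with a few lines of NumPy, as sketched below. The function names and the toy data are hypothetical, and `numpy.corrcoef` is used for the Pearson coefficient.

```python
import numpy as np


def sma(models):
    """Equation (10): equal-weight mean of the k GCMs at each time step."""
    return np.asarray(models, dtype=float).mean(axis=1)


def mae(y, rho):
    """Equation (11): mean absolute error between observed y and estimate rho."""
    y = np.asarray(y, dtype=float)
    rho = np.asarray(rho, dtype=float)
    return float(np.mean(np.abs(y - rho)))


def pearson_r(y, rho):
    """Equation (12): Pearson correlation between observed y and estimate rho."""
    return float(np.corrcoef(y, rho)[0, 1])


# Hypothetical observed series and three GCM series (rows = time steps)
y = np.array([10.0, 20.0, 15.0, 5.0])
models = np.array([[9.0, 12.0, 30.0],
                   [21.0, 18.0, 10.0],
                   [15.5, 14.0, 16.0],
                   [4.0, 9.0, 5.5]])
ensemble_sma = sma(models)
print(mae(y, ensemble_sma), pearson_r(y, ensemble_sma))
```

Passing the MCFSAWS-Ensemble series instead of the SMA series gives the corresponding scores for the proposed scheme; repeating the computation over all grid points yields distributions of the kind summarized in Table 4.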

3 Application

The Tibetan Plateau in China covers more than 2.5 million km2 (Chen et al. 2022a). The plateau (also called the Himalayan Plateau) lies in central Asia at an average elevation of almost 4000 m (Zhang et al. 2022). It is known as the "Roof of the World" because of its elevation, and it is the source of many Asian rivers. The ecology and climate of Asian countries depend strongly on the Tibetan Plateau (Wang et al. 2022). The plateau is bounded by the Himalayas to the southwest and by the Kunlun and Aljin mountains to the northeast (Chen et al. 2022), and its border is identified by the 2500-m contour line (Zhang et al. 2022). We chose this region because of its lack of precipitation and rising temperatures; such extreme conditions can signal drought or flood (Li et al. 2021b).

In our study, we considered the monthly precipitation data of 32 randomly selected locations on the Tibetan Plateau (Fig. 3) for the historical period 1961 to 2014. The gridded CN05.1 observational precipitation data set, at a resolution of 0.5° × 0.5°, is used as the observed data; accordingly, the simulated model data were re-gridded to the same 0.5° × 0.5° resolution. All data are in millimeters per month. We used 20 CMIP6 models for future projection, and Table 1 lists the name, modeling center, and resolution of each selected GCM.

Fig. 3 Spatial distribution of the selected locations

Table 1 Description of CMIP6 models

4 Results and discussion

4.1 Implication of MCFSAWS-Ensemble

This section presents the results of applying MCFSAWS-Ensemble. We give a numerical and graphical description of the RMCFs execution for one randomly chosen grid point; the remaining results are archived by the authors.

Table 2 gives the cutoff values and the number of selected variables obtained with different cutoff methods, based on the importance and non-importance scores of each variable under the RMCFs algorithm. In this research, we employed the permutation-based cutoff method, under which all GCMs are deemed important. Table 3 ranks the GCMs by their RI scores under the RMCFs algorithm. The RI score indicates how well each model's simulation reflects the observed data. The models are listed in descending order: CanESM5 has the highest RI score (0.122969), followed by CanESM5.CanOE (0.102068) and ACCESS.ESM1.5 (0.085126). These top-ranked models are considered to replicate the observed climate patterns with higher fidelity. Further down the ranking, the RI scores decrease, indicating relatively weaker performance in simulating climate conditions. The models at the bottom of the ranking, such as GFDL.ESM4, INM.CM5.0, HadGEM3.GC31.LL, INM.CM4.8, and EC.Earth3.Veg, have RI scores between 0.002701 and 0.001403. This ranking matters for ensemble construction: models with higher RI scores are expected to capture the observed climate behavior more reliably and accurately, so higher-ranked models would typically receive more weight, while lower-ranked models may be assigned less weight or excluded from the ensemble altogether.

Table 2 Cutoff value based on its importance and non-importance scores under RMCFs algorithm
Table 3 Relative importance (RI) of various GCMs simulation under RMCFs algorithm

Figure 4 displays the RI of each GCM: the horizontal axis gives the model's RI value (from 0 to 1) and the vertical axis lists the model names. CanESM5 stands out with the largest RI bar. Figure 5 presents the interdependency discovery (ID) graph of the GCMs, in which node size and color indicate the strength of interdependency. More saturated node colors mark models on which other models depend strongly and that are more important, lighter colors indicate weaker dependence, and larger arrows represent stronger connections between models.

Fig. 4 Relative importance (RI) of various climate simulations of GCMs at one random point

Fig. 5 Interdependency discovery plot among GCMs under MCFS

After assessing the Relative Importance (RI), we calculated the value-by-value differences and transformed them as described in Source 2 of the MCFSAWS-Ensemble scheme. We then hybridized the two sources by combining the RI-based weights with the standardized weights (Equations (7) and (8)).

The proposed weights for a single spatial location are depicted in Fig. 6. The horizontal axis covers the period 1961 to 2014, the vertical axis shows the values of the proposed weights, and each color represents a model. The weight of each model varied over time, indicating that no single model remained consistently important. This outcome is desirable: as time and parameters change, the importance of the models in the aggregation process should also change. Figure 7 shows marked differences in the temporal behavior of the aggregated data between the MCFSAWS-Ensemble and SMA approaches.

Fig. 6 Weights of various GCMs at one random grid point

Fig. 7 Temporal behavior of ensemble data under MCFSAWS-Ensemble and SMA scheme at one random grid point

The next subsection evaluates the validity of the proposed weighting scheme. We use the Mean Absolute Error (MAE) and correlation to compare MCFSAWS-Ensemble with the SMA approach. These results are based on all selected grid points.

4.2 Validation

This section presents the validation of the proposed weighting scheme. Table 4 compares the MAE and correlation values of the proposed MCFSAWS-Ensemble scheme and the SMA scheme, and these statistics indicate the benefit of the proposed unequal weighting over the SMA scheme. The MAE values indicate the average magnitude of the differences between the ensemble precipitation and the observed values. Under the MCFSAWS-Ensemble scheme, the minimum MAE is 0.5037328, slightly below the SMA minimum of 0.5180123, suggesting that the proposed weighting scheme performs marginally better at the lowest error level. The first-quartile MAE is likewise lower for MCFSAWS-Ensemble (1.0350234) than for SMA (1.1211800), as is the mean MAE (1.6426240 versus 1.6925493). The median MAE, however, is higher for MCFSAWS-Ensemble (1.4950217) than for SMA (1.3675169). Overall, the proposed weighting scheme achieves lower MAE values on most of the reported statistics.

Regarding correlation, higher values indicate a stronger positive relationship between the ensemble precipitation and the observed values. The MCFSAWS-Ensemble scheme has a minimum correlation of −0.54468, whereas the SMA scheme has a slightly better minimum of −0.462500. However, the quartile, median, and mean correlations are consistently higher under MCFSAWS-Ensemble (ranging from 0.608719 to 0.813871) than under SMA (ranging from 0.584670 to 0.742601). This suggests that the proposed weighting scheme generally agrees better with the observed precipitation data.

Table 4 Overall description of comparative statistics of MAE and correlation under proposed MCFSAWS-Ensemble and SMA Schemes in all the grid points

In summary, Table 4 indicates that the MCFSAWS-Ensemble scheme, with its unequal weighting approach, generally performs better than the SMA scheme, with lower MAE values and higher correlations on most summary statistics. These findings support the effectiveness of the proposed procedure for ensembling GCM simulations of precipitation.

Figure 7 shows the temporal behavior of the two ensembles at a specific grid point, while Fig. 8 illustrates the spatial distribution of the correlation coefficients of the MCFSAWS-Ensemble and SMA data with the observed data. The spatial patterns of the two schemes are consistent, with no grid point at which they contradict each other. This observation supports the comparative analysis in Table 4.

Fig. 8 Spatial distribution of the correlation coefficients of MCFSAWS-Ensemble and SMA data with observed data

5 Conclusion

Continuous monitoring and projection of climate change are essential for protecting global health and ensuring a sustainable future for all. The multi-model ensemble approach for GCMs is important for climate change assessments, as it helps account for uncertainties and provides a more comprehensive understanding of the potential impacts of climate change. In this paper, we proposed a new weighting scheme, MCFSAWS-Ensemble (Monte Carlo Feature Selection Adaptive Weighting Scheme for GCMs Ensemble). Unlike the SMA scheme, the proposed weighting scheme has the potential to reduce the effects of extreme values. As an application, we used simulated precipitation time series from 20 CMIP6 GCMs over the Tibetan Plateau. The quality measures indicate that the proposed unequal weighting scheme (MCFSAWS-Ensemble) performed better than the SMA approach, and MCFSAWS-Ensemble reduced the effect of extreme values in the precipitation time series of the Tibetan Plateau. The weights derived under the proposed scheme can be considered for combining CMIP6 data under future scenarios. In summary, the proposed weighting scheme can help aggregate multiple GCMs, which can improve our understanding of the climate system and of extreme events such as droughts and floods. These findings contribute to our understanding of multi-model ensembles and provide a basis for accurate assessment of climate change and its impacts. A limitation of this study is that only precipitation data were used. In future research, the same framework can be extended to other meteorological variables, such as temperature, humidity, evaporation, and wind speed, to enhance the accuracy of drought assessment.