Introduction

In recent decades, climate change has emerged as a major concern because of its impact on the Earth's climate system. Defined as long-term alterations in greenhouse gas concentrations, it significantly influences climatic processes (Goyal, 2004; Werndl, 2016). These changes have demonstrably affected Earth's temperature and precipitation patterns (Solomon et al., 2007; Calzadilla et al., 2013). Consequently, various related aspects of human life have been affected, including extreme weather events such as floods, droughts, and heatwaves (Mal et al., 2018; Luber and McGeehin, 2008; Ahmed et al., 2018), storm patterns (Michener et al., 1997; Brooks, 2013), and crop water requirements (Gohari et al., 2013; Li et al., 2022).

Climate change poses a significant threat to water resources by altering hydrological cycles (Allan et al., 2013). Projections of future climate change indicate continued and substantial impacts on water resource systems, and ignoring these consequences is no longer an option (Pachauri et al., 2014). Developing climate-resilient water resource systems is therefore increasingly crucial (Charlton and Arnell, 2011; Gohari et al., 2017; Zareian, 2021).

The Intergovernmental Panel on Climate Change (IPCC) highlights the significant threat climate change poses to water resources, particularly in arid and semi-arid regions (IPCC, 2021). Their report details the anticipated impacts on these regions, including changes in temperature, precipitation, evaporation, soil moisture, and river flow. This challenge is evident in the case of Iran and its neighbors, where managing shared water resources amidst droughts and climate change has become increasingly complex. These complexities underscore the need to investigate how climate change specifically affects temperature and precipitation in these shared watersheds. Such insights are crucial for informing effective policy guidance for future water resource management in the region.

Atmosphere-Ocean General Circulation Models (AOGCMs) are the primary tools for studying the Earth's future climate under climate change. These models employ intricate three-dimensional grids to represent each component of the Earth's climate system. Notably, AOGCMs are constructed by coupling distinct models of atmospheric and oceanic processes (IPCC, 2021). Building on this approach, the IPCC released its Sixth Assessment Report (AR6), which highlights the critical role of models from Phase 6 of the Coupled Model Intercomparison Project (CMIP6) (Shukla et al., 2019). Consequently, CMIP6 models are among the most prominent tools for analyzing and understanding future climate change scenarios.

Research suggests that combining AOGCMs through group implementation offers advantages over using individual models (Zareian et al., 2015). This approach has been explored in various studies, demonstrating potential improvements in future climate change forecasts. For instance, Chen et al. (2017) investigated weighting methods for AOGCMs in hydrological studies. They compared five methods applied to 28 CMIP5 models in Canada, finding that combining AOGCMs could enhance future climate predictions. Similarly, Ahmed et al. (2019) evaluated 20 CMIP5 models based on their ability to simulate precipitation between 1961 and 2005. They employed various criteria, including spatial efficiency and skill scores, and compared two methods for integrating precipitation data from the selected models: Simple Mean (SM) and Random Forest Regression (RF). Their findings indicated superior performance when combining models using the Random Forest Regression method.

Studies have explored various methods to improve the accuracy of climate change projections using AOGCMs. Gohari et al. (2015) investigated a probabilistic method based on statistical distribution fitting, demonstrating its ability to enhance the accuracy of CMIP3 models for temperature and precipitation changes in central Iran. Tanveer et al. (2016) employed the Reliability Ensemble Averaging (REA) method to combine 18 CMIP5 models, reducing uncertainty in climate predictions for the Han River basin in South Korea. They used the root mean square error (RMSE) to evaluate uncertainty across different models and found REA averaging significantly improved accuracy. Xiong et al. (2021) compared the ability of 24 CMIP5 models to simulate air temperature in central Asia. They used spatial and temporal analysis, trend analysis, and empirical orthogonal functions (EOF) to identify the most accurate models. CNRM-CM5, GFDL-CM3, and GISS-E2-H emerged as the most accurate for this specific region. Liu et al. (2022) evaluated 22 CMIP6 models in simulating extreme precipitation over Central Asia using Taylor diagrams and DISO (Distance between Indices of Simulation and Observation). Their findings revealed uncertainties in all models for simulating extreme precipitation patterns in early spring and late winter.

Previous research has established the capability of CMIP models to capture the general spatial distribution of temperature and precipitation across Central Asia (Xiong et al., 2021; Yanmin & Haomin, 2013). Analyses using CMIP5 models have shown success in replicating trends, temporal evolution, and annual mean surface air temperatures (Yanmin & Haomin, 2013). However, limitations have also been identified. Studies highlight the inability of CMIP5 models to capture regional details of climate change impacts and to describe small-scale variations accurately (Xiong et al., 2021).

Simulating precipitation poses a greater challenge for CMIP models. While some models show promise in replicating the annual cycle shape in specific regions, difficulties arise in capturing peak precipitation and accurately simulating Probability Density Functions (PDFs). Additionally, there is a tendency for overestimation, particularly in arid regions (Guo et al., 2021). Recent work on CMIP6 models suggests progress in simulating extreme precipitation events during specific seasons. However, substantial biases remain, particularly in early spring and late winter (Liu et al., 2022). These biases highlight the ongoing need for model refinement.

This study addresses several key gaps and inconsistencies in the current literature. First, we will employ daily data, providing a more granular understanding of model performance compared to previous studies which have primarily focused on annual or seasonal means (Xiong et al., 2021; Yanmin & Haomin, 2013). Second, our research concentrates on Iran and its neighboring countries, allowing for a more in-depth analysis of a climatically diverse region, compared to existing research that often encompasses broader regions within Central Asia. Finally, by focusing on CMIP6, our study will contribute to understanding the effectiveness of the latest generation of climate models in this specific region, whereas prior studies acknowledge advancements in CMIP6 models but lack a dedicated evaluation for this area using daily data (Guo et al., 2021; Liu et al., 2022).

By addressing these gaps, this study will provide valuable insights into the ability of CMIP6 models to simulate temperature and precipitation patterns in a region of significant geopolitical and climatic importance (Iran and its neighboring countries). Improved understanding of model strengths and weaknesses will inform future climate projections and impact assessments for the region.

Materials and Methods

Study Area

This study focuses on Iran and seven neighboring countries (Iraq, Turkey, Azerbaijan, Armenia, Afghanistan, Turkmenistan, and Pakistan) (Figure 1). Because the study area includes several shared watersheds, any change in temperature and precipitation in these countries due to climate change may give rise to new conflicts over water resources management.

Figure 1

Location map of the study area

Extraction of CMIP6 outputs

The official IPCC database (https://www.ipcc.ch) was used to extract the outputs of 10 CMIP6 models. Table 1 summarizes the characteristics of the selected models. To evaluate the accuracy of the CMIP6 models, the historical period 1980 to 2014 was used as the baseline period. The models were selected according to two key criteria. Firstly, advanced parameterization schemes for both the atmosphere and land surface were sought to accurately represent regional processes such as monsoonal circulation patterns (He et al., 2023). Secondly, we looked for models with demonstrated skill in simulating temperature and precipitation over similar geographic regions or climatic zones, based on historical performance evaluations (Yanmin & Haomin, 2013; Guo et al., 2021).

Table 1 Characteristics of selected CMIP6 models used in this study

Extraction of temperature and precipitation data from ERA5

The European Centre for Medium-Range Weather Forecasts (ECMWF) provides a valuable source of reanalysis meteorological data. Its widely used ERA5 dataset merges observational and numerical model data and offers high temporal (hourly to monthly) and spatial (0.1 degrees) resolution (Hersbach et al., 2020). Given the limited density of meteorological stations in the study area, monthly temperature and precipitation data for the historical period 1980-2014 were retrieved from ERA5 on the ECMWF website (https://www.ecmwf.int) to serve as observed historical data. The selection of the ERA5 dataset was motivated by two key considerations. Firstly, ERA5 has established credibility in representing climate variables such as temperature and precipitation, particularly in regions with limited observational networks (Gomis-Cebolla et al., 2023). Studies have demonstrated strong agreement between ERA5 data and available in-situ observations in the study area (Song et al., 2022; Radmanesh et al., 2023). This established performance makes ERA5 a reliable alternative for evaluating climate model simulations. Secondly, high-quality, long-term observational meteorological data are scarce across Iran and its neighboring countries, which makes a station-based evaluation difficult.

Assessing the accuracy of CMIP6 models

The historical outputs of the chosen CMIP6 models can be used to evaluate how well each model replicates observed temperature and precipitation patterns, which is a prerequisite for trusting its simulations of future meteorological variables. In this study, the accuracy of the selected models was evaluated by comparing their historical outputs (1980-2014) with reanalysis data from ERA5. The Kling-Gupta Combined Statistical Index (KGE) was used for this evaluation. The KGE index offers several advantages over simpler indices such as Root Mean Squared Error (RMSE), Mean Squared Error (MSE), or the coefficient of determination (R²). While RMSE, MSE, and R² are commonly used, they each focus on a single aspect of model performance: RMSE and MSE emphasize the magnitude of errors, and R² reflects the correlation between simulated and observed data. Taken individually, these indices do not capture the multifaceted nature of model agreement, because they do not separate factors such as mean bias and variability. In addition, compared to dichotomous or probabilistic evaluations, the KGE index offers a single metric combining mean bias, variability agreement, and temporal correlation. This makes it easier to interpret and assess climate model accuracy in simulating precipitation and temperature (Lamontagne et al., 2020). The KGE index can be calculated using Eqs. 1-4:

$$KGE=1-\sqrt{{\left(r-1\right)}^{2}+{\left(\alpha -1\right)}^{2}+{\left(\beta -1\right)}^{2}}$$
(1)
$$\alpha =\frac{{\sigma }_{s}}{{\sigma }_{o}}$$
(2)
$$\beta =\frac{{\mu }_{s}}{{\mu }_{o}}$$
(3)
$$r=\frac{\sum ({x}_{i}-\overline{x})({y}_{i}-\overline{y})}{\sqrt{\sum {({x}_{i}-\overline{x})}^{2}\sum {({y}_{i}-\overline{y})}^{2}}}$$
(4)

where \(x\) is the historical precipitation or temperature extracted from the ERA5 dataset; \(y\) is the historical precipitation or temperature derived from the CMIP6 models; \({\mu }_{s}\) and \({\mu }_{o}\) are the means of the simulated (CMIP6) and observed (ERA5) data, respectively; \({\sigma }_{s}\) and \({\sigma }_{o}\) are the standard deviations of the simulated (CMIP6) and observed (ERA5) data, respectively; and \(r\) is the Pearson correlation coefficient (Knoben et al., 2019). KGE ranges from \(-\infty\) to 1, with values closer to 1 indicating closer agreement between observed and simulated data (Patil and Stieglitz, 2015).
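As an illustration of Eqs. 1-4, the following minimal Python sketch computes the KGE index and its three components for a single pair of series. It is not the code used in this study (the analysis was carried out in R); the function name `kge` and the sample values are hypothetical. Population standard deviations are assumed for Eq. 2, and the subscript s is taken to denote the simulated series and o the observed one.

```python
import numpy as np

def kge(sim, obs):
    """Return (KGE, r, alpha, beta) for a simulated and an observed series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]      # Eq. 4: Pearson correlation
    alpha = sim.std() / obs.std()        # Eq. 2: variability ratio
    beta = sim.mean() / obs.mean()       # Eq. 3: mean (bias) ratio
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)  # Eq. 1
    return score, r, alpha, beta

# Hypothetical monthly precipitation values (mm), for illustration only.
obs = np.array([30.0, 42.0, 55.0, 38.0, 12.0, 3.0])
sim = 1.5 * obs  # perfectly correlated with obs, but 50 % too wet

score, r, alpha, beta = kge(sim, obs)
print(f"KGE = {score:.3f}")  # 0.293 for this input
```

In this example the correlation is perfect, so a correlation-based index alone would rate the simulation as ideal, whereas KGE is reduced to about 0.29 by the bias and variability terms.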

The complexity of calculations and the need for extensive pixel-based comparisons across a vast region necessitated the use of the R programming language for this analysis.
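The pixel-based comparison can be sketched as follows. This is an illustrative Python example under stated assumptions, not the R code used in the study: both datasets are assumed to be already on a common grid with shape (time, lat, lon), the arrays are synthetic, and the function name `kge_grid` and the boolean country mask are hypothetical.

```python
import numpy as np

def kge_grid(sim, obs):
    """Pixel-wise KGE for two arrays of shape (time, lat, lon)."""
    mu_s, mu_o = sim.mean(axis=0), obs.mean(axis=0)
    sd_s, sd_o = sim.std(axis=0), obs.std(axis=0)
    cov = ((sim - mu_s) * (obs - mu_o)).mean(axis=0)
    r = cov / (sd_s * sd_o)              # Pearson correlation per pixel
    alpha = sd_s / sd_o                  # variability ratio per pixel
    beta = mu_s / mu_o                   # mean ratio per pixel
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Synthetic data: 35 time steps on a 4 x 5 grid (placeholder values).
rng = np.random.default_rng(0)
obs = 20.0 + 5.0 * rng.standard_normal((35, 4, 5))
sim = obs + rng.normal(1.0, 2.0, size=obs.shape)

kge_map = kge_grid(sim, obs)             # one KGE value per pixel

# Hypothetical country mask: average the pixel values inside one country.
mask = np.zeros((4, 5), dtype=bool)
mask[:, :2] = True
country_mean = kge_map[mask].mean()
```

Computing the statistics along the time axis for all pixels at once avoids an explicit loop over grid cells; the resulting map can then be averaged within each country boundary.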

Results

The accuracy of CMIP6 models in simulating temperatures

Figure 2 depicts the KGE index, spatially distributed, which compares the accuracy of different CMIP6 models in estimating temperature across the study area. The BCC-CSM2-MR model shows fluctuating KGE values between 0 and 0.4 in most regions, except for eastern Afghanistan and western Turkey, where slightly higher accuracy is observed. Similar performance is seen with the CAMS-CSM1-0 model, except for small areas between Azerbaijan and Armenia, where its accuracy is lower. The CESM2 and CMCC-ESM2 models exhibit lower accuracy, particularly in central and eastern Afghanistan, as indicated by the KGE index. The KGE analysis revealed that the CNRM-CM6-1 model displayed lower performance compared to others, particularly in central and northeastern Afghanistan and Turkmenistan (Fig. 2). The GFDL-ESM4 model generally performed well across the study area, except for a small region along the northern Afghan-Pakistani border. HadGEM3-GC31-LL and IPSL-CM6A-LR models exhibited the highest error values in eastern Afghanistan and northern Pakistan. Additionally, MIROC6 and MRI-EMS2-0 models showed significant variations in error, particularly across Afghanistan, Pakistan, Iran, Azerbaijan, and Turkey (Fig. 2).

Figure 2

Spatial assessment of CMIP6 performance in temperature prediction using KGE index

Figure 3 displays the average KGE values for historical temperature simulations across the countries of the study area. In January, the MRI-EMS2-0 model (KGE=0.25) was the most accurate in Iraq, while the HadGEM3-GC31-LL model was the weakest of the compared models for the same region. In February, MRI-EMS2-0 (KGE=0.31) remained the top performer in Iraq, whereas MIROC6 (KGE=-3.4) performed worst. In March, MIROC6 (KGE=0.34, Iraq) was the most accurate model, while HadGEM3-GC31-LL (KGE=-9.4, Armenia) was the weakest. Finally, in April, CMCC-ESM2 (KGE=0.28, Iraq) emerged as the best model for temperature prediction, while CNRM-CM6-1 (KGE=-8.8, Afghanistan) had the weakest performance (Fig. 3).
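As a reading aid for strongly negative scores such as the KGE of -9.4 reported for March, Eq. 1 can be rearranged to give the distance from the ideal point \((r,\alpha ,\beta )=(1,1,1)\):

$$\sqrt{{\left(r-1\right)}^{2}+{\left(\alpha -1\right)}^{2}+{\left(\beta -1\right)}^{2}}=1-KGE=1-(-9.4)=10.4$$

Because \(-1\le r\le 1\), the correlation term satisfies \({\left(r-1\right)}^{2}\le 4\), so that

$${\left(\alpha -1\right)}^{2}+{\left(\beta -1\right)}^{2}\ge {10.4}^{2}-4=104.16$$

A score this low must therefore stem mainly from the variability ratio \(\alpha\) or the mean ratio \(\beta\) rather than from poor correlation. One possible contributor, offered here only as an assumption because the temperature units are not stated, is that \(\beta ={\mu }_{s}/{\mu }_{o}\) becomes very large when temperatures are expressed in °C and the observed monthly mean \({\mu }_{o}\) is close to zero, as can occur in cold months.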

Figure 3

Average KGE index for estimating monthly temperature in different countries

According to the results, the best and weakest CMIP6 models in May were HadGEM3-GC31-LL (KGE=0.3 in Turkmenistan) and MIROC6 (KGE=-1.5 in Afghanistan), respectively. In June, the IPSL-CM6A-LR model achieved the highest accuracy (KGE=0.34) in Azerbaijan, whereas the lowest accuracy (KGE=-0.81) was noted in Pakistan for the same model. In July, the CESM2 model (KGE=0.35) and the MRI-EMS2-0 model (KGE=-0.49) had the best and weakest performance in Iraq, respectively. In August, the IPSL-CM6A-LR model (KGE=0.49 in Turkey) and the MRI-EMS2-0 model (KGE=-0.48 in Pakistan) had the most accurate and weakest performances, respectively. Finally, the best and weakest performances were observed in September for IPSL-CM6A-LR (KGE=0.41 in Armenia) and GFDL-ESM4 (KGE=-0.05 in Pakistan); in October for IPSL-CM6A-LR (KGE=0.29 in Armenia) and HadGEM3-GC31-LL (KGE=-4.29 in Pakistan); in November for CMCC-ESM2 (KGE=0.15 in Iraq) and BCC-CSM2-MR (KGE=-4.8 in Armenia); and in December for IPSL-CM6A-LR (KGE=0.17 in Iraq) and CNRM-CM6-1 (KGE=-3.2 in Iran) (Fig. 3).
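The month-by-month identification of the best and weakest model-country pairs amounts to locating the maximum and minimum of a models-by-countries table of average KGE values. A minimal Python sketch of that step is given below; the model and country labels are placeholders, the KGE values are invented for illustration and are not results of this study, and the helper name `best_and_worst` is hypothetical.

```python
import numpy as np

# Placeholder labels and invented KGE values (rows: models, columns: countries).
models = ["Model-A", "Model-B", "Model-C"]
countries = ["Country-1", "Country-2"]
kge_table = np.array([
    [0.25, -0.40],
    [0.10, 0.31],
    [-3.40, 0.05],
])

def best_and_worst(table, models, countries):
    """Return (model, country, KGE) for the highest and the lowest entry."""
    i_best, j_best = np.unravel_index(np.argmax(table), table.shape)
    i_worst, j_worst = np.unravel_index(np.argmin(table), table.shape)
    best = (models[i_best], countries[j_best], float(table[i_best, j_best]))
    worst = (models[i_worst], countries[j_worst], float(table[i_worst, j_worst]))
    return best, worst

best, worst = best_and_worst(kge_table, models, countries)
print(best)   # ('Model-B', 'Country-2', 0.31)
print(worst)  # ('Model-C', 'Country-1', -3.4)
```

Applying the same function to one table per calendar month would reproduce the kind of summary given in the text.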

The KGE values for historical annual temperature estimates are presented in Figure 4 and Table 2. The IPSL-CM6A-LR model exhibited the best performance (KGE=0.51) in Azerbaijan, whereas the HadGEM3-GC31-LL model showed the lowest performance (KGE=-1.4) in Pakistan.

Figure 4

Average KGE index for estimating annual temperature in different countries

Table 2 Average KGE index for estimating annual temperature in different countries

The standard deviation (SD) of the monthly KGE values was also calculated to compare the accuracy of CMIP6 models for estimating temperatures at monthly level (Figure 5). Winter months had the greatest SD, with January showing the peak value. In contrast, summer months (May-September) had the lowest SD. Armenia exhibited the highest (SD = 0.81) and lowest (SD = 0.07) monthly KGE variations for temperature estimation (Fig. 5). These findings suggest that CMIP6 models might exhibit larger variations in performance during colder months when simulating temperatures. Consequently, selecting the most suitable CMIP6 model for cold-month temperature simulations requires increased caution.
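The spread measure described above can be sketched as the standard deviation of KGE across models for each month. The Python example below is illustrative only: the KGE values are random placeholders for a single country, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

# Placeholder KGE values for one country: 10 models (rows) x 12 months (columns).
rng = np.random.default_rng(1)
n_models, n_months = 10, 12
kge_monthly = rng.uniform(-1.0, 0.5, size=(n_models, n_months))

# Spread across models for each month (sample SD is an assumption).
sd_by_month = kge_monthly.std(axis=0, ddof=1)

# Month (1 = January) in which the models disagree most.
most_variable_month = int(np.argmax(sd_by_month)) + 1
```

A large SD in a given month indicates that the choice of model matters more for that month, which is the sense in which the text recommends caution for cold-month temperature simulations.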

Figure 5

Standard deviation of the KGE index for monthly and annual temperature estimates by different CMIP6 models

The lower performance of some models in central and eastern Afghanistan (e.g., CESM2, CMCC-ESM2) might be linked to factors like complex terrain or limited observational data in these regions. Investigating the specific characteristics of these models and the region's topography or data availability could shed light on this spatial discrepancy. Additionally, the standard deviation analysis (Figure 5) suggests higher performance variability during colder months. This could be attributed to factors like stronger atmospheric circulation patterns or less influence of local phenomena in winter compared to summer. Exploring the models' ability to capture these large-scale atmospheric dynamics in winter versus their performance in simulating smaller-scale summer processes would be a valuable addition. By delving deeper into the spatial and temporal variations, we can gain a more nuanced understanding of the strengths and weaknesses of each model. For instance, the consistent performance of IPSL-CM6A-LR in Azerbaijan (Figure 3) suggests its potential strength in simulating temperature dynamics for regions with specific geographical characteristics (e.g., mountainous areas). Conversely, models like CNRM-CM6-1, consistently showing lower performance across the region (Figure 2), might require further investigation into their physical parameterizations or underlying processes.

The accuracy of CMIP6 models in simulating precipitation

Figure 6 shows the spatial variation of the KGE index for simulating precipitation across the study area. Compared to temperature simulations, precipitation simulations exhibited greater spatial variability (larger differences in KGE values across the region). Notably, the BCC-CSM2-MR model showed lower performance in the central and driest regions (Figure 6). The accuracy of the CAMS-CSM1-0 model also varied spatially. The CAMS-CSM1-0, CESM2, and CMCC-ESM2 models showed higher accuracy in western regions (Iraq and Turkey) than in other areas. The CNRM-CM6-1 model performed well in most regions, while the GFDL-ESM4 model excelled in eastern areas (Afghanistan, Pakistan, and Turkmenistan). The HadGEM3-GC31-LL model showed good performance in central and western regions. The IPSL-CM6A-LR model had low accuracy in all areas except Turkmenistan. Additionally, the MIROC6 and MRI-EMS2-0 models were inaccurate in most regions, except for Iraq (Fig. 6).

Figure 6

Spatial assessment of CMIP6 performance in precipitation prediction using KGE index

As shown in Figure 7, CMIP6 models exhibit variations in accuracy for predicting precipitation across different months. In January, the CAMS-CSM1-0 model (KGE=0.19) and the MRI-EMS2-0 model (KGE=-0.63) were identified as having the highest and lowest accuracy in Armenia and Pakistan, respectively. In February, HadGEM3-GC31-LL (KGE=0.18 in Armenia) and GFDL-ESM4 (KGE=-0.5 in Pakistan) were ranked as the most accurate and least accurate models, respectively. IPSL-CM6A-LR (KGE=0.17 in Armenia) and MIROC6 (KGE=-0.79 in Pakistan) displayed the most and least skillful performances in March, respectively. Finally, in April, GFDL-ES (KGE=0.39 in Armenia) and MIROC6 (KGE=-1.7 in Iran) showed the highest and lowest accuracy (Fig. 7).

Figure 7

Average KGE index for estimating monthly precipitation in different countries

Based on the findings, GFDL-ESM4 emerged as the most effective model in Armenia during May (KGE= -0.01), June (KGE= 0.12), and July (KGE =0.03). However, MIROC6 (KGE=-7.5 in Iran), IPSL-CM6A-LR (KGE=-8.2 in Iraq) and BCC-CSM2-MR (KGE=-9.4 in Afghanistan) exhibited the least proficient performance in May, June, and July, respectively. The most effective models identified in August, September, and October were HadGEM3-GC31-LL (KGE=-0.13 in Armenia), MIROC6 (KGE=0.08 in Armenia), and CESM2 (KGE=0.09 in Turkmenistan), respectively. Conversely, IPSL-CM6A-LR (KGE=-9.8 in Iran), BCC-CSM2-MR (KGE=-9.8 in Iraq) and CNRM-CM6-1 (KGE=-8.7 in Iran) were deemed the least proficient models in August, September and October, respectively (Fig. 7).

A comparison of the accuracy of different CMIP6 models in estimating precipitation indicated that MIROC6 (KGE=0.13 in Armenia) and CMCC-ESM2 (KGE=0.12 in Armenia) were the most accurate models in November and December, respectively. Conversely, MRI-EMS2-0 (KGE=-4 in Armenia) and HadGEM3-GC31-LL (KGE=-0.93 in Armenia) were the weakest models during November and December, respectively (Fig. 7).

The KGE values for annual precipitation estimation are presented in Table 3 and Figure 8. The KGE index ranges from -0.64 (MRI-EMS2-0 in Afghanistan) to 0.05 (CMCC-ESM2 in Armenia). These results suggest a trend of better performance by CMIP6 models in estimating precipitation during the winter months.

Table 3 Average KGE index for estimating annual precipitation in different countries

Figure 8

Average KGE index for estimating annual precipitation in different countries

Figure 9 depicts the monthly variation of the KGE index for estimating precipitation across the countries of the study area. It reveals that model performance varied the most during warm months (May to October), peaking in Iraq during September (SD=4.2). Conversely, Afghanistan exhibited the lowest variability in January (SD=0.07). These findings suggest that CMIP6 models are more consistent in predicting precipitation during the cooler, rainy months than during warm months.

Figure 9

Standard deviation of the KGE index for monthly and annual precipitation estimates by different CMIP6 models

The observed higher spatial variability in precipitation simulations compared to temperature (Figure 6) warrants further discussion. Potential reasons for this could include the complex interplay of factors like topography, convection, and large-scale atmospheric circulation that influence precipitation. Models might struggle to accurately represent these interactions, leading to greater spatial variability in performance compared to temperature simulations. Additionally, limited precipitation data, especially in mountainous regions, can hinder model calibration and validation, potentially contributing to the observed spatial variations in accuracy. Investigating the models' performance in areas with denser observational networks could provide insights into their intrinsic capabilities. Furthermore, the trend of better performance during winter months (Figure 8) could be related to the dominance of large-scale circulation patterns in winter precipitation compared to the more localized convective processes that influence summer precipitation. Analyzing the models' ability to capture these seasonal variations in precipitation drivers would be insightful. Understanding the spatial and temporal variations in precipitation simulations allows for a more targeted discussion of model strengths and weaknesses. For instance, the consistent performance of GFDL-ESM4 in eastern regions (Figure 6) suggests its potential strength in simulating precipitation patterns influenced by specific atmospheric dynamics in those areas. Conversely, models like BCC-CSM2-MR, consistently showing lower accuracy across the region (Figure 6), might require further investigation into their parameterizations for representing precipitation processes.

Discussion

This study evaluated the performance of various CMIP6 models in simulating temperature and precipitation across Iran and seven neighboring countries (Iraq, Turkey, Azerbaijan, Armenia, Afghanistan, Turkmenistan, and Pakistan). The findings provide valuable insights into the strengths and weaknesses of these models for climate simulations in this specific region. Consistent with the findings of Babaousmail et al. (2021) and Nguyen-Duy et al. (2023), the ensemble mean of multiple CMIP6 models often outperformed individual models in terms of capturing the overall trends. This emphasizes the potential benefits of utilizing multi-model ensembles for generating more robust climate projections, a practice advocated by Sanderson et al. (2021) and Peng et al. (2023).

Our findings align with previous studies (e.g., Zhang et al., 2024; Babaousmail et al., 2021; Chen et al., 2022; Jiang et al., 2020) in demonstrating that CMIP6 models exhibit varying skill in simulating temperature and precipitation. IPSL-CM6A-LR emerged as a relatively skillful model for temperature in some areas (e.g., Azerbaijan), while models like BCC-CSM2-MR performed poorly for precipitation in the central arid regions. This highlights the importance of regional considerations when selecting appropriate models, as echoed by Xu et al. (2023) in their analysis of East Asian monsoon simulations.

Similar to Guo et al. (2021) and Ngoma et al. (2021) who reported wet biases in arid regions, our study observed overestimations of precipitation in some areas (e.g., Iran) by most CMIP6 models. This suggests potential limitations in these models' capability to accurately capture precipitation dynamics in arid and semi-arid climates, a challenge also recognized in studies by Kim et al. (2024) and Liu et al. (2022). Our results resonate with Iqbal et al. (2021) and Ngoma et al. (2021) who observed higher model discrepancies in simulating warm-season precipitation compared to colder months. This suggests that CMIP6 models require further development to improve their performance in capturing the complexities of seasonal precipitation patterns, as emphasized by Wang et al. (2023) in their recent contribution.

This study provides a detailed assessment of CMIP6 model performance in a specific region (Iran and its neighboring countries) that has not been extensively explored in previous literature. This regional focus offers valuable insights into the suitability of these models for climate simulations in this area, complementing the works of Gao et al. (2022) on the Tibetan Plateau and He et al. (2023). By analyzing monthly variations in the KGE index, we demonstrate that model accuracy fluctuates throughout the year, with generally lower performance during warmer months for precipitation simulations. This information is crucial for selecting appropriate models for specific seasonal climate studies, as highlighted by Rivera (2023). The inclusion of standard deviation analysis for monthly KGE values provides additional insights into model consistency. We observed higher standard deviations during winter months for temperature estimations, suggesting greater variability in model performance during colder periods. This information can help researchers interpret model outputs and their associated uncertainties, echoing the importance of uncertainty quantification stressed by Yazdandoost et al. (2021).

While this study offers valuable insights, certain limitations need to be acknowledged. The accuracy of the findings is contingent on the quality and completeness of observational data used for comparison, as noted by Chen et al. (2020) in their assessment of CMIP5 and CMIP6 data. The evaluation focused on historical simulations, and the performance of these models for future climate projections remains to be assessed, as addressed by O'Neill et al. (2016) concerning the challenges of future climate projections. The coarse resolution of CMIP6 models might not adequately capture the influence of complex regional topography on climate, a limitation also recognized in studies by Giorgi & Raffaele (2022).

Future research directions include incorporating additional observational datasets to strengthen the evaluation of model performance, assessing how well these models project future climate changes in the study region, investigating the potential benefits of employing higher-resolution models for capturing regional climate dynamics, and identifying the underlying physical processes responsible for model biases to inform model development efforts. By addressing these limitations and pursuing further research avenues, we can gain a deeper understanding of CMIP6 model capabilities and limitations for climate simulations in the region of interest and beyond. This knowledge will be instrumental in generating more reliable climate projections and informing effective climate change adaptation strategies.

Conclusions

A crucial first step in assessing the impacts of climate change on temperature and precipitation is an initial evaluation of the accuracy of different AOGCMs. This is especially important in vulnerable regions with shared water sources, where changes in temperature and precipitation can have a substantial impact on water availability. This study found variations in the accuracy of CMIP6 models for estimating temperature and precipitation across Iran and its neighboring countries. During the historical period (1980-2014), these models performed relatively better at estimating temperature than precipitation. Additionally, the variability in temperature estimation was lower during warmer months than during colder months. Interestingly, the opposite trend was observed for precipitation, with higher accuracy in colder months. However, despite the better performance in temperature estimation, the accuracy of CMIP6 models in predicting temperature remains insufficient in some regions of the study area. Therefore, a thorough evaluation of CMIP6 models is critical before further analysis of climate change impacts on hydro-ecological aspects in these regions.

Competing interests

The authors declare no competing interests.