1 Introduction

Southern South America (SSA, roughly between 52–74° W and 22–57° S) is one of the most populated portions of South America and presents a wide variety of climates. The region is exposed to climate extremes such as extreme precipitation events, which affect socio-economic activities, energy demand and health systems. In particular, extreme rainfall events are recognised as some of the major threats of climate change, given the increase in atmospheric water vapor availability caused by rising greenhouse gas concentrations (Du et al. 2019). Some portions of SSA, such as southern Chile and southeastern South America (SESA, covering northeastern Argentina, southern Brazil and Uruguay), are among the rainiest regions of South America after the Colombian Andes and the Amazon. In particular, SESA is part of La Plata Basin, the second largest basin of South America in terms of river discharge and size, which plays a critical role in the regional economy (Berbery and Barros 2002; Barros et al. 2006). Cattle raising, rainfed agricultural production and hydroelectric power generation are the main economic activities in SESA and are especially vulnerable to extreme precipitation events, which are the main contributors to the hydrological cycle and often lead to overflows and floods (Vörösmarty et al. 2013; Cavalcanti et al. 2015).

Precipitation in SSA is controlled by both large-scale and regional-scale forcings. Rainfall in central Chile increases southward from very dry conditions along the Atacama Desert in northern Chile to more than 3000 mm in southern Chile (Quintana and Aceituno 2012). Precipitation in those regions is mostly associated with the passage of mid-latitude cold fronts mainly during the cold season (April to September), when the subtropical anticyclone of the Pacific Ocean and the mid-latitude band of migratory low pressure systems are at their northernmost location (Montecinos and Aceituno 2003). Regarding the interannual variability, El Niño–Southern Oscillation (ENSO) episodes are commonly related to above normal rainfall in central Chile (Quintana and Aceituno 2012). Eastward from the Andes, the northwestern portion of Argentina and Argentinian Patagonia present much drier conditions, mainly due to the presence of the Andes mountain range that acts as a topographic barrier, not allowing the extratropical disturbances embedded in the mid-latitude westerlies to reach these regions. Conversely, SESA presents a uniform precipitation annual cycle with large amounts of precipitation, frequently associated with extreme precipitation events. They are typically related to extratropical synoptic systems during the cold season (April–September), cyclogenesis during the transition seasons and mesoscale convective systems particularly during the warm season (October–March) (Cavalcanti 2012). Moreover, different forcings at multiple temporal and spatial scales were documented to influence precipitation variability and extremes in SESA. In particular, the role of the Atlantic and Pacific oceans was thoroughly studied. Sea surface temperature (SST) anomalies in the equatorial Pacific and western Indian Ocean associated with the ENSO teleconnection induce circulation anomalies that promote the occurrence of heavy rainfall in SESA especially during spring (Robledo et al. 2013). 
Furthermore, positive SST anomalies over the southwestern Atlantic basin accompanied by weak activity of the South Atlantic Convergence Zone (SACZ) are associated with positive precipitation anomalies over the region during the austral summer (Doyle and Barros 2002). Another key ingredient favouring extreme precipitation over the region is the enhancement of the South American Low Level Jet (SALLJ), which advects warm and humid air from the Amazon that interacts with relatively colder and drier air masses from mid-latitudes (Salio et al. 2007; Penalba and Robledo 2010). This availability of heat and moisture combines with the intensification of the upper-level subtropical jet, the topographic forcing of the Andes on baroclinic disturbances and the presence of an eastward-propagating lee trough at mid-levels of the atmosphere, favouring the instability required for the development of convective systems in SESA (Durkee et al. 2009; Rasmussen and Houze 2016). Taken together, these processes illustrate the complexity of extreme precipitation events in SSA and the importance of studying them.

Research on climate variability and extremes requires long records of high-quality and high-resolution data. Gauge observations in SSA often present gaps and errors in their time series, and their spatial and temporal resolution is not always sufficient. This is a particular challenge for developing countries, where maintaining station networks is costly and rain gauges are the main source of direct rainfall measurements. Gauge data are necessary for the calibration and validation of radar and satellite products and numerical models, and also for the construction of gridded precipitation products (Salio et al. 2015; Sun et al. 2018). These gridded datasets are frequently used for the evaluation of global and regional climate models (GCMs and RCMs, respectively) and to remove model output biases. Prein and Gobiet (2017) compared several gridded precipitation observations and RCMs in Europe and found that the differences among the observational datasets reached the same magnitude as the precipitation errors found in the RCMs. Thus, including multiple datasets with information from different sources is of key importance to assess the observational uncertainty in climate studies and model evaluations.

RCMs are one of the most popular downscaling techniques applied to GCMs or to reanalyses. They produce high-resolution climate information and are able to capture regional-scale forcings that GCMs cannot reproduce, such as land-sea contrast and topography (Di Luca et al. 2016). In particular, they yield a better representation of the physical mechanisms involved in the occurrence of extreme precipitation events than GCMs (Rummukainen 2010), which have exhibited difficulties in representing the frequency and intensity of these events, especially in SESA (Bettolli and Penalba 2014). Several studies have addressed the evaluation of RCMs in South America, including SSA (da Rocha et al. 2009; Solman et al. 2013; Giorgi et al. 2014; Llopart et al. 2014; Carril et al. 2016; Tencer et al. 2016; among others). Solman and Blázquez (2019) studied the variability of daily precipitation over South America during 1979–2005 and assessed the added value of historical RCM simulations compared to their driving GCMs. Although overall performance was good, the added value depended on the RCM–GCM pair evaluated and was poorer for extreme events. This was associated with the quality of convective schemes, which represent one of the most important sources of model errors. Falco et al. (2019) analysed the mean climate conditions over South America using historical and evaluation simulations at a monthly scale during 1990–2004 and identified a robust added value in the simulations of summer air temperature over tropical and subtropical latitudes. However, less clear results were found when evaluating summer mean precipitation. Furthermore, Bettolli et al. (2020) compared statistical and dynamical regional simulations with observations in representing the 2009–2010 warm season, a season with record extreme precipitation events in SESA, and found large dispersion among them.

In characterising precipitation climate extremes, an important goal is understanding the synoptic circulation that favours the occurrence of these events and their representation in RCM and GCM simulations. To this end, atmospheric circulation classifications can reduce all the atmospheric configurations that occur during a period of study to a set of predominant and representative circulation types (CTs) and allow the exploration of projected climate changes related to them. Many studies around the world have employed different clustering techniques for these purposes (Huth 2000; Gutiérrez et al. 2005; Bettolli et al. 2010; Agel et al. 2017; Smith and Sheridan 2019; among others) and have analysed both observed and simulated CT-precipitation relationships (Schuenemann and Cassano 2010; Gibson et al. 2016; Glisan et al. 2016; Pinto et al. 2018). For different regions of SSA, previous studies by Solman and Menendez (2003), Barrucand et al. (2014), Penalba et al. (2013) and Olmo et al. (2020) have characterised observed CTs and the associated precipitation, and only one, to the authors' knowledge, has evaluated CTs simulated by GCMs (Bettolli and Penalba 2014). Moreover, regarding the different clustering techniques, only a few studies in some portions of SSA have used the Self-Organizing Maps (SOM) technique for synoptic climatology characterisation: Espinoza et al. (2012) investigated the spatial and temporal characteristics of cold surges and identified large-scale circulation patterns that favour cold intrusions over several portions of South America; whereas D’Onofrio et al. (2010) explored the use of SOM and k-means to perform a statistical downscaling of daily precipitation in Argentina, with relatively good and homogeneous skill for different precipitation thresholds. More recently, Loikith et al. (2019) considered multiple circulation variables to define synoptic circulation patterns in SSA with SOM, focusing on the impact of atmospheric rivers and interannual variability in surface temperature and precipitation. Thus, using the SOM technique to study the synoptic circulation in SSA and its relation to extreme precipitation events is a valuable contribution to the current knowledge on the behaviour of these climate extremes.

All of the above shows that the characterisation of daily precipitation extremes across SSA in observations and RCM simulations is still an open issue, as is the representation of the dominant CTs and their association with extreme events. In this context, the aim of this study is to characterise daily extreme precipitation in SSA in terms of its frequency and intensity, considering multiple observational datasets and RCMs, and to evaluate the relationship between synoptic circulation and precipitation extremes by classifying atmospheric circulation types with the SOM technique.

2 Data and methodology

2.1 Extreme precipitation

Extreme daily precipitation in SSA was analysed considering four observational datasets and four RCMs during the 31-year common period 1980–2010. This period was chosen because it offered the largest number of RCM evaluation simulations. On the one hand, we considered the meteorological station network employed in Olmo et al. (2020), which was quality controlled (with special focus on outliers in the time series) and checked for physical consistency (for instance, by comparing nearby stations). Data were provided by the National Weather Services of Argentina, Brazil, Paraguay and Uruguay, and the Center for Climate and Resilience Research of Chile. This dataset offered a good balance between spatial and temporal coverage over SSA, with most of the stations presenting less than 5% of missing data (Fig. 1). This gauge-based dataset (hereafter STATIONS) was used as reference throughout the study. In addition, three other gridded daily precipitation products with different resolutions and temporal coverage were considered: (a) the CHC InfraRed Precipitation with Station data (CHIRPS) dataset, which is generated using satellite data and incorporates station data in its construction; (b) the CPC Global Unified Gauge-Based Analysis of Daily Precipitation (CPC) dataset, which is built using in-situ information and numerical models; (c) the Multiple-Source Weighted-Ensemble Precipitation (MSWEP) dataset, which is based on satellite data and also considers information from multiple sources. On the other hand, we used four RCMs from the evaluation experiment of the Coordinated Regional Climate Downscaling Experiment (CORDEX) for the South American domain: the models RCA4 and WRF were part of the CORDEX Phase I simulations (Giorgi et al. 2009), whereas the models RegCM4 and REMO were part of the most recent CORDEX CORE simulations (Gutowski et al. 2016). The Phase I and CORE simulations have horizontal grid spacings of approximately 0.44° × 0.44° and 0.22° × 0.22°, respectively, and were all driven by the ERA-Interim reanalysis. A detailed description of the datasets included in this study is presented in Table 1.

Fig. 1 Meteorological stations considered in the gridded STATIONS dataset over the southern South America (SSA) domain of study and sub-regions, adapted from Olmo et al. (2020). The different climatic regions are Northern Chile, Central and Southern Chile, Arid Diagonal Region, Argentinian Patagonia and Southeastern South America. Atmospheric circulation was studied within the blue box

Table 1 Gridded datasets used in this study

In order to inter-compare observational datasets and RCMs, they were all interpolated using a bilinear scheme to a common rectilinear grid of 0.5° × 0.5° in the SSA land domain (Fig. 1). Taking into account that SSA is a wide region with different climates (Prohaska 1976), a regionalisation of SSA was used in some analyses for the sake of conciseness (Fig. 1), based on the one presented in Olmo et al. (2020).

For each observational dataset and RCM, extreme precipitation was defined at each grid point as those days when the accumulated precipitation exceeded the 95th percentile (P95) of the empirical distribution of rainy days (accumulated precipitation above 1 mm). This percentile was calculated over the base period 1981–2010, using a 29-day moving window centered on each calendar day (Tencer et al. 2016; Olmo et al. 2020). The length of this window was selected because of the high variability of daily precipitation, based on a sensitivity analysis specific to the region (not shown) and following the recommendations by Zhang et al. (2011). Additionally, Taylor diagrams (Taylor 2001) were used for summarizing the representation of the spatial distribution of the P95. These diagrams quantify the degree of statistical similarity between the reference observations (STATIONS) and the rest of the datasets, reporting the Pearson correlation coefficient, the standard deviation and the centered root mean squared error.
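The moving-window percentile described above can be sketched as follows for a single grid point. This is only an illustration and not the code used in the study: the function names, the 365-day calendar (leap days assumed dropped beforehand) and the linear-interpolation percentile are assumptions.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def moving_window_p95(precip, day_of_year, window=29, wet_threshold=1.0, n_days=365):
    """P95 of rainy days for each calendar day, pooling a centred window.

    precip: daily totals (mm) at one grid point over the base period.
    day_of_year: matching 1-based calendar day of each value
                 (assumption: 365-day calendar, leap days removed).
    Returns {calendar_day: P95, or None if the window holds no rainy day}.
    """
    half = window // 2
    by_day = {d: [] for d in range(1, n_days + 1)}
    for p, d in zip(precip, day_of_year):
        if p > wet_threshold:              # keep rainy days only (> 1 mm)
            by_day[d].append(p)
    p95 = {}
    for d in range(1, n_days + 1):
        pooled = []
        for offset in range(-half, half + 1):
            neighbour = (d - 1 + offset) % n_days + 1   # wrap around the year
            pooled.extend(by_day[neighbour])
        p95[d] = percentile(pooled, 95) if pooled else None
    return p95
```

A day at that grid point would then be flagged as extreme when its precipitation exceeds the P95 of its calendar day.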

2.2 Circulation types

Classifications of atmospheric circulation can help to understand the relationship between large-scale atmospheric circulation and local variables and to assess the value of downscaled climate (Hewitson 2010). In this study, the non-linear Self-Organizing Maps (SOM) classification was employed to obtain the dominant circulation types (CTs) in SSA. The SOM technique was developed by Kohonen (2001) and has been used in several applications, including the study of extreme precipitation events (Glisan et al. 2016; Agel et al. 2017; Gibson et al. 2017; Pinto et al. 2018). A review of SOM applications in synoptic climatology can be found in Hewitson and Crane (2002). The SOM employs a neural network algorithm to establish generalized spatial patterns (nodes) that span the range of conditions present in the training data. Each node is characterised by a reference vector, which is updated every time new data (daily fields) are presented to the SOM. The nodes are arranged in a two-dimensional array in such a way that similar nodes are located near each other, which facilitates the interpretation of the representative synoptic CTs and the underlying physical processes (Hewitson and Crane 2002).

Initially, the user defines the topology and number of nodes of the SOM. Input data cases (daily fields) are randomly chosen and presented to the SOM through the training procedure, based on the following equation: $v_k = v_k + \alpha \, n[w(i), k] \, (x_i - v_k)$, where $v_k$ is the reference vector of node $k$; $\alpha$ is the learning rate, which decreases with time; $n$ is the neighbourhood function, which determines the rate of change within a radius around the winning node $w(i)$ (and also decreases with time); and $x_i$ is the input data. Every time new data are presented to the SOM, a measure of distance (for instance, the Euclidean distance) is computed between the input data and each node to determine the “winning” node, that is, the node with the smallest distance. The winning node and its neighbouring nodes are updated during the SOM training. Thereby, each node is iteratively redefined by this procedure, and the input maps organize themselves, generating the SOM after all the iterations (Kohonen 2001; Gutiérrez et al. 2005).

In this work, the atmospheric circulation was described using daily geopotential height at the 500 hPa level (Z500) from the European Centre for Medium-Range Weather Forecasts ERA-Interim reanalysis (ERA-I; Dee et al. 2011), with a spatial resolution of 0.75° × 0.75°, and daily Z500 simulations from the four RCMs described above (Table 1). An extended domain was considered for this analysis (roughly between 40°–90° W and 15°–60° S) in order to better capture the synoptic systems associated with extreme precipitation events (Fig. 1). RCM daily fields were interpolated to the ERA-I spatial resolution. For the description of the CTs in SSA, daily ERA-Interim Z500 anomalies for all days during the period 1979–2017 were used to build the SOM. Anomalies were calculated by subtracting the daily mean values from each daily field. Based on the recommendations by Gibson et al. (2017), the neighbourhood radius was set to decrease linearly from 5 to 1 and the learning rate from 0.05 to 0.01, over 5000 iterations. The Euclidean distance was chosen to determine the winning node. Choosing the size of the SOM is generally, as for most synoptic classifications, a user decision based on knowledge of the climatic states of the region and on the evaluation of several trials. An optimal SOM size makes it possible to discriminate between relevant circulation patterns without producing redundant CTs that would complicate the analysis. In this study, several SOM sizes were tested, varying from 12 to 42 nodes. Finally, a 4 × 4 hexagonal SOM (16 nodes) was chosen to represent the dominant atmospheric circulation of all days in SSA.
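As an illustration of the training procedure with the settings quoted above (16 nodes, radius decreasing linearly from 5 to 1, learning rate from 0.05 to 0.01, 5000 iterations), a minimal pure-Python sketch is given below. It makes several simplifying assumptions that are not taken from the study: a rectangular rather than hexagonal node lattice, a Gaussian neighbourhood function, and initialisation from randomly drawn samples. The names `train_som` and `best_matching_node` are hypothetical.

```python
import math
import random


def best_matching_node(nodes, x):
    """Index of the node with the smallest Euclidean distance to x."""
    return min(range(len(nodes)), key=lambda k: math.dist(nodes[k], x))


def train_som(data, rows=4, cols=4, n_iter=5000,
              radius=(5.0, 1.0), rate=(0.05, 0.01), seed=0):
    """Train a small SOM on a list of equal-length vectors (flattened fields).

    Assumptions: rectangular lattice and Gaussian neighbourhood weights.
    """
    rng = random.Random(seed)
    # Initialise each reference vector from a randomly chosen sample.
    nodes = [list(rng.choice(data)) for _ in range(rows * cols)]
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    for t in range(n_iter):
        frac = t / max(n_iter - 1, 1)
        sigma = radius[0] + (radius[1] - radius[0]) * frac   # linear decay
        alpha = rate[0] + (rate[1] - rate[0]) * frac         # linear decay
        x = rng.choice(data)                                 # random daily field
        win = best_matching_node(nodes, x)
        for k, v in enumerate(nodes):
            d2 = ((coords[k][0] - coords[win][0]) ** 2
                  + (coords[k][1] - coords[win][1]) ** 2)
            h = math.exp(-d2 / (2.0 * sigma ** 2))           # neighbourhood weight
            for j in range(len(v)):
                v[j] += alpha * h * (x[j] - v[j])            # v_k += a * n * (x_i - v_k)
    return nodes
```

In practice each input vector would be a flattened daily Z500 anomaly field, and a dedicated SOM package would normally be preferred over this toy implementation.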

For the evaluation of the RCM simulations in reproducing the dominant CTs, two approaches are usually considered. On the one hand, the daily simulated fields from the RCMs can be projected onto the reanalysis-trained SOM according to a measure of distance (Bettolli and Penalba 2014; Gibson et al. 2016). On the other hand, they can be used as input data for the SOM training together with the reanalysis fields, so that the SOM spans all the possible configurations present in the reanalysis and RCM data (Glisan et al. 2016; Pinto et al. 2018). In this work, we followed the former approach, so daily Z500 anomaly fields from the RCMs were mapped to the nodes of the reanalysis-trained SOM by minimizing the Euclidean distance. The implicit assumption here is that the RCM patterns fall within the data space represented by the ERA-I reanalysis. In this sense, Table S1 (see supplementary material) presents the quantization error (defined as the average Euclidean distance between an input daily field and the reference vector of each CT) as a measure of how accurate this assumption is (Quagraine et al. 2020). In most cases, the errors detected in the RCMs were of the same magnitude as or smaller than those of ERA-I, which indicated that the synoptic patterns simulated by the RCMs were typically included in the ERA-I reanalysis data space. Note that mapping the RCM simulations onto the SOM generated with reanalysis data allows this SOM to be easily used for the projection of other datasets, such as different GCM and RCM experiments and/or reanalyses.
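The projection step and the quantization error can be sketched as below. This is a hedged illustration only: the function names are hypothetical, fields and node reference vectors are assumed to be flattened to plain lists on the same grid, and the error is computed per node as the mean distance of the fields assigned to it, which is one reading of the definition given above.

```python
import math


def assign_to_nodes(fields, nodes):
    """Map each field to the index of its nearest node (Euclidean distance)."""
    return [min(range(len(nodes)), key=lambda k: math.dist(nodes[k], f))
            for f in fields]


def quantization_error(fields, nodes):
    """Mean distance between each node and the fields assigned to it.

    Returns one value per node, or None for nodes that receive no field.
    """
    totals = [0.0] * len(nodes)
    counts = [0] * len(nodes)
    for f, k in zip(fields, assign_to_nodes(fields, nodes)):
        totals[k] += math.dist(nodes[k], f)
        counts[k] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

Comparing the values obtained for an RCM with those obtained for the reanalysis used in training gives a rough indication of whether the simulated patterns lie inside the data space spanned by the SOM.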

The performance of the RCMs in representing the observed SOM as depicted by ERA-Interim was quantified on the basis of node frequency, using the Pearson correlation (Gibson et al. 2016), and of spatial patterns, by means of Taylor diagrams (Taylor 2001).
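The three statistics summarised in a Taylor diagram can be computed as in the following sketch, where `reference` and `candidate` are assumed to be flattened fields (or node-frequency vectors) of equal length. The function name and the use of population (divide-by-n) statistics without area weighting are assumptions.

```python
import math


def taylor_statistics(reference, candidate):
    """Pearson correlation, standard deviations and centered RMSE.

    Assumes equal-length inputs with non-zero variance; no area weighting.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    dr = [r - mean_r for r in reference]          # centered reference
    dc = [c - mean_c for c in candidate]          # centered candidate
    std_r = math.sqrt(sum(d * d for d in dr) / n)
    std_c = math.sqrt(sum(d * d for d in dc) / n)
    corr = sum(a * b for a, b in zip(dr, dc)) / (n * std_r * std_c)
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dr, dc)) / n)
    return {"corr": corr, "std_ref": std_r, "std": std_c, "crmse": crmse}
```

These quantities satisfy crmse² = std_ref² + std² − 2 · std_ref · std · corr, which is the geometric relation that allows all three to be displayed on a single diagram.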

2.3 Relationship between CTs and extreme precipitation

In order to evaluate whether the occurrence of extreme precipitation conditioned on a CT was significantly higher or lower than the climatological frequency of extreme events, we employed a Monte Carlo approach with 5000 trials, based on the methodology used by Agel et al. (2017). The frequency of extreme precipitation days in a SOM node was computed relative to all days assigned to that node. An example similar to the one presented in Agel et al. (2017) illustrates the procedure. Suppose the period of study had 10,000 days, of which 500 were extreme precipitation events (that is, a climatological frequency of 5%). If 50 days were assigned to a node, and 5 of these were extreme precipitation events (10%), then for each trial 50 days were randomly chosen without replacement from the set of all days and the percentage of those days that were extreme was calculated. After 5000 trials, the distribution of this percentage should peak close to the 5% climatological frequency. By sorting these percentages in ascending order and taking the top and bottom 2.5% as the upper and lower bounds of significance (0.05 level), we were able to establish whether the actual frequency of extreme events in that node was statistically different from the climatological frequency of extreme precipitation. This procedure was repeated for each of the 16 nodes. Additionally, the Pearson correlation (PC) between the observed frequencies and the RCM simulated frequencies in each node was calculated as a measure of correspondence.
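The resampling test can be sketched as follows, using the numbers of the example above in the usage lines. The function name, the fixed random seed and the exact index convention for the 2.5% tails are assumptions made for illustration.

```python
import random


def monte_carlo_bounds(is_extreme, n_node_days, n_trials=5000, alpha=0.05, seed=0):
    """Bounds (in %) of the extreme-day frequency expected by chance.

    is_extreme: list of booleans, one per day of the study period.
    n_node_days: number of days assigned to the node under test.
    """
    rng = random.Random(seed)
    pct = []
    for _ in range(n_trials):
        sample = rng.sample(is_extreme, n_node_days)     # without replacement
        pct.append(100.0 * sum(sample) / n_node_days)
    pct.sort()
    lower = pct[int(n_trials * alpha / 2)]               # bottom 2.5%
    upper = pct[int(n_trials * (1 - alpha / 2)) - 1]     # top 2.5%
    return lower, upper


# Example setting from the text: 10,000 days, 500 of them extreme, 50 in the node.
days = [True] * 500 + [False] * 9500
lower, upper = monte_carlo_bounds(days, n_node_days=50)
observed = 100.0 * 5 / 50                                # 10% in the node
significant = observed < lower or observed > upper
```

With only 50 days in a node the resampled percentages move in steps of 2%, so the bounds are coarse; nodes with more days give a finer null distribution.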

Once the association between the circulation types identified at the 500 hPa level and extreme precipitation was determined, anomaly fields of meridional and zonal wind and specific humidity at the 850 hPa level were considered for a more comprehensive evaluation of the synoptic conditions related to extreme precipitation events.

3 Results

3.1 Extreme precipitation in southern South America

The main features of extreme precipitation in SSA as depicted by the different observational datasets and RCMs were analysed. To this end, the austral warm (October–March) and cold (April–September) seasons were considered separately. Figure 2a displays the seasonal average value of P95 and the relative bias compared to STATIONS, and Fig. 2b shows the Taylor diagrams for the spatial distribution of the P95 mean fields, whereas Online Resource 1 (see supplementary material) displays the percentage contribution of extreme precipitation (daily precipitation above the P95) to the total rainfall of each season and the differences in each dataset compared to STATIONS. All the observational datasets successfully represented the spatial distribution of P95 across SSA when compared to STATIONS, with spatial correlations from 0.7 to 0.9 (Fig. 2). Maximum values were located in SESA during the warm season, up to 50 mm per day in the STATIONS dataset; these were underestimated by CHIRPS and CPC and overestimated by MSWEP. Extreme precipitation in this region contributed more than 25% to the total rainfall during the warm season (see Online Resource 1 in supplementary material), which was well captured by CPC and MSWEP, but underestimated by CHIRPS. Much less intense P95 values were found in the arid diagonal region of Argentina and Argentinian Patagonia, indicating the lower magnitude of extreme precipitation in those areas; these values were clearly underestimated by CHIRPS, as reflected by the bias field (Fig. 2a). Note, however, that precipitation amounts and especially P95 are low in these regions, so relative differences may exaggerate the biases. In the Chilean territory, P95 increased southward from the desert of northern Chile to values around 30 mm per day in central and southern Chile for most of the observational datasets, with the exception of CHIRPS, which overestimated the P95 values. 
In comparison, an eastward shift of the extreme precipitation was detected east of the Andes Mountains during the cold season in all observational datasets, although MSWEP and CHIRPS depicted maximum values of similar or even greater magnitude than in the warm season compared to STATIONS. During this season, the contribution of extreme precipitation to total rainfall was reduced in most of SSA (see Online Resource 1 in supplementary material), with the exception of central and southern Chile, where contributions seemed similar to the warm season despite the intense P95 detected mainly in the cold season. Here, the spatial extent and intensity of this maximum showed differences in the observational datasets. STATIONS and CPC depicted less intense and more spatially restricted values compared to MSWEP and CHIRPS, which presented overestimations of P95 in all the region (Fig. 2a, b). In the case of the set of RCMs (Fig. 2), larger dispersion was observed among them compared to the observational datasets. RegCM4 and WRF seemed to well represent the behaviour of P95 around SSA, particularly the observed maximum in SESA during both seasons, although it was slightly underestimated in WRF. However, they tended to overestimate the extreme precipitation contribution to total rainfall in some areas of SSA (see Online Resource 1 in supplementary material). REMO strongly overestimated the P95 in SESA and its relative contribution to total rainfall, with values up to 70 mm. Moreover, a common shortcoming of these RCMs was the overestimation of the extreme precipitation in some portions of the arid diagonal region of Argentina and in central and southern Chile, this latter more pronounced during the cold season. Conversely, RCA4 presented more issues in representing the spatial distribution of P95 and its contribution to total rainfall, particularly during the warm season, exhibiting the SESA maximum less intense and located further north. 
This model also overestimated extreme precipitation in central and southern Chile during the cold season, in agreement with the rest of the RCMs. Previous studies analysing mean precipitation in several portions of SSA exhibit general congruent results (Giorgi et al. 2012; Solman 2016; Falco et al. 2019; Remedio et al. 2019), indicating general dry (wet) biases over La Plata Basin (central and southern Andes). Moreover, for the analysis of precipitation extremes, Solman and Blázquez (2019) addressed the study of GCM-driven WRF, RCA and a lower resolution version of REMO and found a clear underestimation of extreme precipitation by RCA4 and more similar values to the CPC reference for the rest of the models. Tencer et al. (2016) evaluated a multi-model ensemble of RCMs in reproducing daily heavy precipitation and detected underestimations over most of La Plata Basin, while overestimations were found over the upstream slopes of the Andes, which was in agreement with the results found here in most of the RCMs. Even more, Carril et al. (2012) observed that RCMs thoroughly underestimate heavy precipitation intensities and detected larger model deviations from the observations in summer, which was associated with the occurrence of more severe events. More recently, Bettolli et al. (2020) analysed multiple statistical and dynamical regional simulations of extreme precipitation during the 2009–2010 warm season in SESA and found general underestimations (overestimations) of precipitation intensities in RCA4 (REMO), while WRF and RegCM4 recorded similar values to the in-situ observations. In that work, most of the downscaling tools evidenced added value compared to ERA-Interim raw precipitation, which exhibited a poor performance in all the evaluated aspects. The ERA-Interim misrepresentation of the observed precipitation is also extended over large portions of SSA, particularly in the areas close to the Andes mountain range (Zazulie et al. 2017). 
Nevertheless, some of the RCMs used in this work appeared to improve the representation of precipitation extremes in SSA. Despite the shortcomings of RCA4 and REMO, the spatial pattern over SSA was generally better captured by the RCMs than in former studies that used previous versions of the RCMs employed in this work (Tencer et al. 2016).

Fig. 2 a Seasonal mean P95 (expressed in millimeters) and relative biases compared to STATIONS (expressed as percentages); b Taylor diagrams of the spatial mean fields of P95 (a). Results are shown for the base period 1981–2010 in all the observational datasets and RCMs during the warm and cold seasons, separately

Altogether, most of the datasets captured both extreme precipitation regions in SSA (SESA and central and southern Chile) well in terms of their intensity, in agreement with previous studies (Penalba and Robledo 2010; Solman and Blázquez 2019; Olmo et al. 2020). Furthermore, this analysis evidenced the uncertainty in precipitation features over SSA in both observations and RCMs. Among the observational datasets, STATIONS and CPC, which are constructed using in-situ observations, showed similar representations of extreme precipitation in SSA, as did CHIRPS and MSWEP, which are both satellite-based estimations. In particular, the latter two datasets often overestimated P95, consistent with previous studies showing that satellite-based products tend to overestimate precipitation extremes in SESA and also in complex terrain regions, such as the areas of SSA near the Andes mountains (Salio et al. 2015; Sun et al. 2018).

In order to evaluate the frequency and intensity of precipitation according to different thresholds, regional average frequency distributions of daily precipitation in the different regions across SSA are displayed in Fig. 3. In this figure, the distributions are presented for the whole year, although the performance of the datasets was congruent during the warm and cold seasons (not shown). In northern Chile, almost all datasets showed higher frequencies than the ones detected in STATIONS. The observational datasets presented less spread than the set of RCMs, which exhibited an overestimation of light and moderate intensities and particularly of the highest intensities in the empirical distributions, more pronounced in REMO and RegCM4. In central and southern Chile, similar results among the observational datasets were found, although the dispersion increased. Both MSWEP and CHIRPS satellite-based products seemed to overestimate daily precipitation frequencies for all intensities considered in the Chilean territory. In the case of the RCMs, REMO and RegCM4 continued to show overestimations in moderate and high intensity precipitation, whereas WRF and RCA4 showed less differences with the observational datasets. Eastward from the Andes, the arid diagonal region of Argentina presented similar results among datasets, with the exception of models RegCM4 and REMO, which failed to reproduce the frequency distribution of moderate and intense daily precipitation. The regions examined above are the portions of SSA nearest to the Andes mountain range, which may be related to the difficulties of the RCMs in capturing the orographic precipitation that occurs in part of those regions (Solman et al. 2008; Tencer et al. 2016; Falco et al. 2019). In Argentinian Patagonia, both sets of observational datasets and RCMs performed similarly. 
In the case of SESA, which was the rainiest region as depicted by the daily precipitation frequency distributions, the set of observational datasets showed congruent results, but CHIRPS tended to underestimate moderate and heavy precipitation frequencies. WRF adequately reproduced the STATIONS distribution, but it often overestimated extreme values. RCA4 showed a clear underestimation of daily precipitation in SESA according to different thresholds, but particularly for extremes, which was congruent with the underestimation of P95 detected in Fig. 2. RegCM4 and REMO again overestimated the observed frequencies, with the largest differences in intense precipitation. The tendency detected in the higher resolution RCMs (RegCM4 and REMO) corresponded with the higher P95 values in Fig. 2 and was consistent in most of SSA, showing that these latest simulations did not appear to improve their representation of daily precipitation in SSA compared to the other RCMs. Similar results regarding the frequency distribution of daily precipitation were found in RCMs over La Plata Basin in previous works (Giorgi et al. 2014; Solman and Blázquez 2019).

Fig. 3

Daily precipitation frequency distributions. Regional averages are presented for each sub-region of southern South America (SSA). Colours indicate the different observational datasets (green tones) and RCMs (purple tones)

Given all the description above, SESA is the sub-region in SSA that exhibited the maximum values of P95, particularly during the warm season, and recorded the highest frequencies of extreme precipitation. Therefore, the following analyses will be focused on the study of extreme events in SESA, with emphasis on the inter-comparison of the observational datasets and RCMs.

3.2 Extreme precipitation in southeastern South America

SESA is characterised by a precipitation annual cycle with large amounts of precipitation throughout the year, commonly associated with extremes. These extremes are typically related to extratropical synoptic systems during the cold season, cyclogenesis in the transition seasons and mesoscale convective systems, particularly during spring and summer (warm season) (Zipser et al. 2006; Cavalcanti 2012). In order to analyse the spatial coverage of extreme precipitation over SESA, different thresholds of covered area were considered. The covered area was quantified as the percentage of grid points with daily precipitation above their own P95. No requirement was imposed on the proximity of grid points with extreme precipitation when defining the covered area, which gives the RCMs a less restrictive condition for reproducing the systems in terms of their location. The frequency of days according to the different thresholds was calculated considering all days in the common period 1980–2010 (Fig. 4a). More dispersion among datasets was found when increasing the covered area in SESA, mainly for the RCMs. In particular, RCA4 presented large overestimations of the number of days with greater percentages of covered area, even though, as observed in Sect. 3.1, it tended to underestimate the frequencies of daily precipitation above different absolute thresholds (Fig. 3). It should be noted that the definition of extreme precipitation was relative to each dataset, since it was defined by each dataset's own percentiles. WRF also tended to overestimate the frequency of the most spatially extended events, whereas RegCM4 showed fewer differences with the other datasets. REMO showed clear underestimations, particularly for days with more than 10% of covered area. As observed in Fig. 4a, when selecting lower percentages of covered area, there was much more agreement between datasets. 
Based on this analysis, an event of extreme precipitation with a minimum covered area was defined using a threshold of 5% of grid points exceeding their P95. This percentage of covered area allows the analysis of more structured extreme precipitation systems and also events with a greater spatial extension. In the following analyses only these days with extreme precipitation events were considered.
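
The event definition above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function name `extreme_event_days` and the synthetic data are hypothetical, and computing P95 over wet days only (at least 1 mm) is an assumption, since the percentile base is not restated in this section.

```python
import numpy as np

def extreme_event_days(precip, percentile=95.0, min_area_pct=5.0, wet_day=1.0):
    """Covered area (%) per day and a mask of days classified as extreme events.

    precip: array of shape (n_days, n_points) with daily precipitation (mm).
    """
    precip = np.asarray(precip, dtype=float)
    # Assumption: P95 is computed at each grid point over wet days only
    wet = np.where(precip >= wet_day, precip, np.nan)
    p95 = np.nanpercentile(wet, percentile, axis=0)
    # Each grid point is compared with its own P95; proximity is not required
    exceed = precip > p95
    covered = 100.0 * exceed.mean(axis=1)
    return covered, covered >= min_area_pct

# Synthetic example: 1000 days, 40 grid points (not real data)
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=8.0, size=(1000, 40))
covered, is_event = extreme_event_days(precip)

# Frequency of days (%) for several covered-area thresholds, as in a Fig. 4a-type curve
area_thresholds = [2.5, 5.0, 7.5, 10.0, 20.0]
freq_by_area = [100.0 * (covered >= a).mean() for a in area_thresholds]
```

Because the percentile is computed separately for each dataset, the resulting event sets are relative to each dataset's own climatology, which is the point made in the text about RCA4.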

Fig. 4

a Frequency of days versus the threshold of covered area (quantified as percentages of grid points with daily precipitation above its P95) in southeastern South America (SESA). b Monthly average frequencies of days with extreme precipitation events (at least 5% of covered area). Colours indicate the different observational datasets (green tones) and RCMs (purple tones)

It should be mentioned that although the distributions of covered area in Fig. 4a were calculated for the whole year, they were also estimated for the warm and cold seasons separately, and the performance of the datasets was congruent throughout the year (not shown). However, as depicted by the monthly average frequencies of extreme precipitation events (at least 5% of covered area) in Fig. 4b, extreme precipitation showed intra-annual variability. All datasets adequately represented the seasonal variation of extreme rainfall events in SESA, with minimum frequencies during the austral winter and maximum values during spring and summer. When comparing the observational datasets, STATIONS and CPC showed similar frequencies, while CHIRPS and MSWEP usually underestimated the frequencies observed in STATIONS. In the case of the RCMs, the simulated seasonality usually presented a larger spread than the observations, and WRF, RegCM4 and RCA4 tended to overestimate extreme frequencies in the rainy season. However, RCA4 behaved differently in the cold season, when it tended to show smaller frequencies than STATIONS. REMO frequencies were more similar to the observations, although it exhibited a smaller annual amplitude than the rest of the datasets and presented slightly higher frequencies in the cold season. Previous studies showed that most RCMs tended to underestimate mean precipitation throughout the year, but particularly during winter, over SESA (Jacob et al. 2012; Solman et al. 2013; Falco et al. 2019). However, the frequency of extreme events, as defined in the present work, seemed to be well captured by the set of RCMs during winter months but overestimated during warm months (Fig. 4b).

In order to complement the evaluation of the observational datasets and RCMs in reproducing extreme rainfall events in SESA, we analysed the coincidence among the different observations and RCMs in detecting extreme precipitation events by comparing the dates of these events in all datasets. Figure 5 displays the absolute frequency of extreme precipitation events with at least 5% of covered area (indicated in brackets) and the number of coincident events between datasets considering a 3-day window, for the warm and cold seasons, separately. As expected, extreme precipitation events in SESA were much more frequent in the warm than in the cold season. In STATIONS, 645 (358) extreme events were detected during the warm (cold) season, while the rest of the observational datasets exhibited a smaller number of events. As observed in Fig. 4b, RCMs tended to overestimate the number of extreme events, particularly during the rainy season. The greatest agreement in detecting extreme events on the same dates was found among observations and among RCMs, respectively. In particular, CPC showed the largest coincidences with STATIONS (of about 40% in both seasons), and these two observational datasets were the best matched with the RCMs in both seasons. In contrast, the number of coincident events among the rest of the datasets relative to the total number of extreme events remained very small (around 20–35%). These results implied that the different datasets considered in this study often differ in the identification of extreme events in SESA, and that differences between observational datasets and RCMs are larger than within each set. This was in accordance with results by Bettolli et al. (2020), who analysed the ability of statistical and dynamical regional climate models to capture extreme precipitation events in SESA during the 2009–2010 warm season. 
The authors noted large observational uncertainties in the location and intensity of extreme precipitation events, which may be related to the differences found in this work when identifying extreme events over the region. Nevertheless, although only a weak day-to-day correspondence with the observations is expected in the RCMs, because temporal synchrony is only induced by the boundary conditions with prescribed reanalysis values (Gutiérrez et al. 2019), the models were able to capture the correct timing of precipitation occurrence. This was reflected in the number of coincident dates with extreme precipitation events as shown in Fig. 5. All of this points to the current observational uncertainty when evaluating precipitation extremes in SESA at a daily scale. Therefore, the use of multiple datasets is of key importance when carrying out studies of climate variability and extremes in the region.
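
The counting of coincident events between two datasets can be illustrated with the short sketch below. It is a hedged example rather than the procedure actually applied: the function name `count_coincident` and the dates are hypothetical, and the 3-day window is interpreted here as the event day plus or minus one day, which is an assumption.

```python
import numpy as np

def count_coincident(dates_a, dates_b, half_window=1):
    """Number of events in A with at least one event in B within +/- half_window days."""
    a = np.asarray(dates_a, dtype="datetime64[D]")
    b = np.sort(np.asarray(dates_b, dtype="datetime64[D]"))
    if a.size == 0 or b.size == 0:
        return 0
    tol = np.timedelta64(half_window, "D")
    # For each date in A, locate its nearest neighbours in the sorted B dates
    idx = np.searchsorted(b, a)
    left = b[np.clip(idx - 1, 0, b.size - 1)]
    right = b[np.clip(idx, 0, b.size - 1)]
    near = (np.abs(a - left) <= tol) | (np.abs(right - a) <= tol)
    return int(near.sum())

# Hypothetical event dates for a reference dataset and a model (not real data)
stations = ["2009-11-01", "2009-11-10", "2009-12-05"]
model = ["2009-11-02", "2009-11-20", "2009-12-08"]

n_coincident = count_coincident(stations, model)   # default 3-day window
fraction = n_coincident / len(stations)            # relative to the reference events
```

Note that this count is taken from the point of view of the first dataset, so swapping the arguments can give a different number when several events fall close together.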

Fig. 5

Number of days with extreme precipitation events in southeastern South America (SESA) for each individual dataset in brackets and the number of coincident extreme precipitation events between each pair of datasets in circles during the a warm season, b cold season

In general terms, these findings highlight the importance of extreme precipitation events in SESA in terms of their intensity and frequency. Even more, the comparison between observational datasets and RCMs exhibits their shortcomings in the detection of extreme events with different spatial coverage and their seasonal behaviour. This poses a particular challenge when performing impact studies and for the assessment of future projections of extreme rainfall in the region.

3.3 Circulation types

In the previous sections, SESA was characterized as the sub-region of SSA where the intensity and frequency of extreme precipitation events are remarkable. Hence, the synoptic circulation that favours the occurrence of extreme events in SESA and the evaluation of the RCMs in simulating the main circulation features related to them are key aspects of study. Figure 6 displays the span of CTs as obtained by the SOM procedure explained in Sect. 2.2 (referred to as node (n° row, n° column)). The nodes are arranged in the SOM with a topological order, with the corners of the SOM representing CTs that differ the most from each other. The identified circulation patterns showed positive and negative structures or centers of Z500 anomalies that disturb the typical westerly flow of mid-latitudes at the 500 hPa level. In the bottom SOM, mostly anticyclonic centers were found, covering large portions of SSA and producing an upper-level ridge often centered in the southern Atlantic Ocean. The positive Z500 anomalies were more extended over the continent in nodes (4,1) and (4,2), associated with generally stable conditions over SSA, whereas in nodes (4,3) and (4,4) they were more restricted to the southern tip of the continent. In the middle-right SOM, negative Z500 anomalies were mainly observed in the Atlantic Ocean, which varied in intensity and location among the CTs. For instance, nodes (2,3) and (3,3) presented a dipolar structure with a negative center in the Atlantic Ocean and a positive center over the southern Pacific Ocean, more intense in the former CT. Nodes (2,4) and (3,4) exhibited a similar structure but, in these cases, the anomalous centers presented a SE-NW inclination. In the case of the top-right SOM, nodes (1,3) and (1,4) showed a wide negative center over the Atlantic Ocean that entered the continent, while positive anomalies were located in the southern Pacific Ocean, more intense in node (1,4). 
When analysing the top-left SOM, an anomalous cyclonic center affecting the southern Pacific Ocean and southern tip of South America and an anticyclonic center positioned over the Atlantic Ocean were found. This configuration of negative anomalies in the Pacific Ocean favours the intrusion of cold and humid air from the south-west to southern Chile, while the positive anomalies in the Atlantic Ocean allow warmer and humid air from lower latitudes to enter SESA. In terms of node frequencies (Fig. 7a), the top-left and bottom-right corners of the SOM seemed to be the most frequent CTs in both seasons, although nodes located at the top-right SOM also showed high frequencies during the cold season. This classification of dominant CTs in SSA using SOM exhibited congruent results with previous studies that considered different clustering techniques, circulation domains and reanalyses (Solman and Menéndez 2003; Bettolli et al. 2010; Barrucand et al. 2014; Loikith et al. 2019; Olmo et al. 2020). In this way, the 4 × 4 SOM classification performed in this study adds more detail to the spectrum of synoptic states given by the CTs that affect SSA.

Fig. 6

The 4 × 4 SOM map of geopotential height anomalies at 500 hPa (Z500) from ERA-Interim in shaded colours (expressed in meters). Black numbered lines show geopotential height at 500 hPa composites associated with each node (expressed in meters). Nodes are identified as node (n° row, n° column). For instance, the upper left corner is referred to as node (1,1)

Fig. 7

a SOM node frequencies for ERA-Interim and for the projected RCMs; b SOM node frequencies of extreme precipitation events in southeastern South America (SESA) for the STATIONS dataset and for the projected RCMs. Pink solid (dotted) lines indicate those nodes that significantly enhanced (inhibited) the occurrence of extreme precipitation events based on a Monte Carlo approach. Results are shown for the warm and cold seasons, separately

Following this analysis, focus was put on the relationship between the CTs described above and the extreme precipitation events characterised in SESA (Sect. 3.2). To this end, the days with extreme precipitation events detected in the STATIONS dataset (at least 5% of covered area) were considered, and in order to characterise the spatial location of these events, the percentages of these days at each grid point conditioned by each CT were calculated during the warm and cold seasons, separately (Fig. 8). Additionally, those nodes that significantly enhanced or inhibited the occurrence of extreme rainfall events (according to the Monte Carlo approach explained in Sect. 2.3) were highlighted (significant at the 5% level) in the first column of Fig. 7b, that is, the nodes in which the frequencies of extreme precipitation events were higher or lower than expected by chance. For the warm season (Fig. 8a), nodes (1,1), (1,2) and (1,3) presented significantly higher frequencies of extremes than the climatological frequency of about 18% (Fig. 7b). Particularly, nodes located in the top-left SOM showed the highest percentages of extreme events. The disturbances exhibited in these CTs produced a marked trough usually located east of the Andes, providing favorable instability conditions for the development of extreme precipitation systems in SESA (Fig. 6). This baroclinic configuration at the mid-level atmosphere typically combines with an intensification of the SALLJ, strengthening the meridional advection of humid and warm air from the Amazonas to SESA, necessary for the occurrence of heavy storms in the region (Salio et al. 2007; Rasmussen and Houze 2016; Rasera et al. 2017). 
The spatial distribution of percentages found for nodes (1,1), (1,2) and (1,3) seemed to present a regionalisation of the extreme events that agreed with the location of the Z500 anomalies in the different atmospheric states, with more extremes in southern SESA for node (1,1), followed by high percentages in the central part of the region in node (1,2) and a marked center of high percentages in northeastern SESA in node (1,3) (Fig. 8a). It is worth mentioning that this regionalisation of extreme events was in line with the different positions where the SALLJ was located in SESA, whose direction varied from N to NE in nodes (1,1) to (1,3) (as will be discussed later in Fig. 9 for node (1,3)). Although some areas with high percentages of extremes were detected, such as in nodes (2,1) and (3,1), the rest of the SOM nodes did not show a specific spatial pattern of extreme occurrences. Nodes located at the bottom-right SOM exhibited the smallest percentages around SESA, in agreement with the atmospheric configurations found at the 500 hPa level. In particular, node (4,4) showed significantly lower frequencies than expected by chance (Fig. 7b), indicating that this structure inhibited precipitation extremes in the warm season. During the cold season (Fig. 8b), the percentages of extremes found for all nodes were always smaller than in the warm season (Fig. 8a), in accordance with the statistical characterisation of extreme precipitation presented in Sect. 3.2. As in the warm season, nodes located at the top-left SOM were associated with extreme events, but only node (1,2) significantly favoured extreme rainfall events in more restricted areas of SESA (Fig. 7b). The rest of the SOM nodes also presented few areas with reduced extreme events percentages. 
However, contrary to the warm season, nodes (4,3) and (4,4) showed relatively high percentages across SESA, which could be associated with the anomalous anticyclone located at the southern Atlantic Ocean and slight negative anomalies in the northern portion of the circulation domain. In coincidence with these spatial distributions of extreme precipitation events conditioned by CTs, nodes (4,4) and (3,3) enhanced the occurrence of extreme events, while four CTs [(4,1), (3,2) and (3,4)] inhibited their occurrence (Fig. 8b).
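
A test of whether a node enhances or inhibits extreme events could be sketched as below. The details of the Monte Carlo approach are given in Sect. 2.3 and are not reproduced here, so the resampling scheme in this example (drawing, for each node, as many days at random as the node contains and using a two-sided 5% level) is an assumption, and the function name `monte_carlo_node_test` and the synthetic data are hypothetical.

```python
import numpy as np

def monte_carlo_node_test(is_extreme, node_of_day, n_nodes,
                          n_iter=1000, alpha=0.05, seed=0):
    """Label each node as enhancing, inhibiting or neutral for extreme events."""
    rng = np.random.default_rng(seed)
    is_extreme = np.asarray(is_extreme, dtype=bool)
    node_of_day = np.asarray(node_of_day)
    result = {}
    for k in range(n_nodes):
        in_node = node_of_day == k
        n_k = int(in_node.sum())
        if n_k == 0:
            continue  # node never occurs: nothing to test
        observed = is_extreme[in_node].mean()
        # Null distribution: extreme frequency in n_k days drawn regardless of node
        null = np.array([
            rng.choice(is_extreme, size=n_k, replace=False).mean()
            for _ in range(n_iter)
        ])
        low, high = np.quantile(null, [alpha / 2, 1 - alpha / 2])
        if observed > high:
            label = "enhances"
        elif observed < low:
            label = "inhibits"
        else:
            label = "neutral"
        result[k] = (observed, label)
    return result

# Synthetic example with 4 nodes of 500 days each (not real data)
node_of_day = np.repeat(np.arange(4), 500)
is_extreme = np.zeros(2000, dtype=bool)
is_extreme[:300] = True          # node 0: 60% of its days are extreme
is_extreme[500:590] = True       # node 1: 18%
is_extreme[1000:1090] = True     # node 2: 18%
# node 3: no extreme days
result = monte_carlo_node_test(is_extreme, node_of_day, n_nodes=4)
```

In this synthetic case the climatological frequency is 24%, so node 0 is flagged as enhancing and node 3 as inhibiting extreme events.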

Fig. 8

Percentages of days with extreme precipitation (daily precipitation above the P95) at each grid point conditioned by each circulation type as depicted by ERA-Interim in the STATIONS dataset for the a warm season, b cold season

Fig. 9

Composites during extreme precipitation events of: geopotential height at 500 hPa (anomalies are shaded), wind (vector of anomalies) and specific humidity anomalies at 850 hPa (shaded) (first two panels, respectively), the percentages of days with extreme precipitation (daily precipitation above the P95) at each grid point and mean precipitation (last two panels, respectively) for the a node (1,3) in the warm season

Fig. 10

Composites during extreme precipitation events of: geopotential height at 500 hPa (anomalies are shaded), wind (vector of anomalies) and specific humidity anomalies at 850 hPa (shaded) (first two panels, respectively), the percentages of days with extreme precipitation (daily precipitation above the P95) at each grid point and mean precipitation (last two panels, respectively) for the node (4,4) in the cold season

Through these analyses, it was possible to identify the CTs most related to extreme precipitation events as well as the different areas in SESA where the CTs conditioned their occurrence. Previous studies analysing the link between mid-level atmospheric circulation and surface variables showed that specific CTs are related to the occurrence of extreme events. Tencer et al. (2016) found, in a smaller number of patterns constructed by a combination of principal components analysis (PCA) and k-means clustering, similar structures at 500 hPa that were associated with the occurrence of temperature and precipitation compound extremes in several parts of Argentina. More recently, Olmo et al. (2020) used oblique rotated PCA and pointed out the influence of specific circulation patterns on the individual and joint occurrence of extreme temperatures and heavy precipitation in SSA. Particularly, the dipolar structure presented in the top-left SOM was strongly related to heavy precipitation and warm nights or cold days occurring together in southern Chile and SESA. Furthermore, the present work shows that, although SESA as a whole is characterized by a remarkable occurrence of extreme precipitation, the precipitation extremes can be regionalised according to the dominant synoptic environment.

In a next step, the observed frequencies of days assigned to each node (according to the ERA-I data) and of extreme precipitation events conditioned by each CT as depicted by STATIONS were compared to the RCM simulations (Fig. 7). An ideal model would recreate the same CTs that take place in the real atmosphere, here represented by the ERA-I reanalysis, and the same node frequencies as well. The RCMs, whose daily fields were projected onto the reanalysis-based SOM according to the Euclidean distance (Sect. 2.2), were able to reproduce the synoptic structures of the CTs illustrated in Fig. 6, as displayed in the Taylor diagrams in Online Resource 2 (see supplementary material), with spatial correlations of about 0.95 and normalized standard deviations near 1 for most of the circulation patterns. All models exhibited coincident results. The RCMs showed similar distributions of node frequencies among them: they strongly overestimated the frequencies of a few nodes, while they generally underestimated the frequencies of the rest of the nodes in both seasons (Fig. 7a). For instance, nodes (4,4) and (3,1) in ERA-I during the warm season exhibited frequencies of about 10 and 7%, respectively, whereas the RCMs presented frequencies of about 18% for both of them. Nodes (4,1) and (4,2) exhibited the lowest RCM frequencies in both seasons, with values of less than 1% in most models, and were the RCM projections that exhibited the largest differences in spatial structure with respect to the CTs depicted by ERA-Interim (see Online Resource 2 in supplementary material). When quantifying the agreement of the node frequency distributions, correlations of about 0.55 were found for the warm season and of about 0.47 for the cold season (Fig. 7a). In the case of extreme frequencies (Fig. 7b), the RCMs presented more varied results than in the evaluation of node frequencies, probably due to the differences in the set of extreme events in each model (Fig. 5). 
However, they captured relatively well the highest frequencies of extreme precipitation events for the CTs located at the top-left SOM in the rainy season. RCMs also tended to overestimate the frequencies at the bottom-right SOM, which may be related to the almost null frequencies of nodes (4,1) and (4,2). Correlation coefficients for this season varied from 0.14 in RegCM4, which failed to reproduce the distribution of extreme events within the SOM, to 0.63 in REMO. During the cold season, the RCMs reproduced well the frequency of extreme events in node (4,4), but underestimated the extreme frequencies of the top-left CTs and of node (3,3), the latter probably due to the higher frequencies detected in the neighboring node (3,4). Correlation coefficients for the RCMs were less spread than in the warm season (from about 0.35 to 0.60), but RegCM4 continued to present the lowest correlation and REMO the highest, which was slightly reduced compared to the warm season.
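
The projection of daily fields onto an existing SOM and the comparison of node frequencies can be sketched as follows. This is a minimal example on synthetic data, not the code used for Fig. 7: the function names, the row-major ordering of the 4 × 4 nodes and the use of the Pearson correlation between frequency distributions are assumptions.

```python
import numpy as np

def project_to_som(fields, nodes):
    """Index of the closest SOM node (Euclidean distance) for each daily field.

    fields: (n_days, n_points) anomaly fields; nodes: (n_nodes, n_points).
    """
    # Squared distances via |f|^2 - 2 f.n + |n|^2 to avoid a large 3-D array
    d2 = ((fields ** 2).sum(axis=1)[:, None]
          - 2.0 * fields @ nodes.T
          + (nodes ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)

def node_frequencies(bmu, n_nodes):
    """Percentage of days assigned to each node."""
    return 100.0 * np.bincount(bmu, minlength=n_nodes) / bmu.size

# Synthetic 4 x 4 SOM over 50 grid points (not real data)
rng = np.random.default_rng(42)
nodes = rng.normal(size=(16, 50))

# "Reanalysis" days: noisy copies of randomly chosen nodes
true_idx = rng.integers(0, 16, size=300)
era_fields = nodes[true_idx] + 0.1 * rng.normal(size=(300, 50))
era_bmu = project_to_som(era_fields, nodes)
era_freq = node_frequencies(era_bmu, 16)

# "Model" days that visit only half of the nodes, mimicking a simplified set of CTs
rcm_idx = rng.integers(0, 8, size=300)
rcm_fields = nodes[rcm_idx] + 0.1 * rng.normal(size=(300, 50))
rcm_freq = node_frequencies(project_to_som(rcm_fields, nodes), 16)

# Agreement between the two node-frequency distributions
freq_corr = np.corrcoef(era_freq, rcm_freq)[0, 1]

# Node label (row, column), assuming row-major ordering of the SOM grid
labels = [(k // 4 + 1, k % 4 + 1) for k in range(16)]
```

A model that concentrates its days in a few nodes, as in this synthetic case, yields near-zero frequencies elsewhere and a reduced correlation with the reference distribution.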

It is worth mentioning that this comprehensive analysis was replicated using the CPC dataset as the observational reference, because STATIONS and CPC were the two observational datasets that best matched the RCMs (Fig. 4). Results from this analysis considering these two datasets were consistent. Thus, for brevity, only STATIONS results are shown.

All of the above showed that the RCMs tended to simplify the predominant CTs into a smaller number of configurations, exhibiting difficulties in reproducing the frequency distribution of days and extreme events in the SOM (Fig. 7). Glisan et al. (2016) found a similar behaviour in a WRF climate simulation when analysing the frequency distributions in a SOM performed for Alaska, while Pinto et al. (2018) also found some contradictions in the SOM node frequencies for South Africa when comparing ERA-I results with a set of GCMs and GCM-driven RCMs in the historical period and in future projections. This sort of evaluation of the currently available climate models is necessary in order to gain confidence in simulated past and future climate changes, particularly for extreme events, as their future changes would probably be linked to changes in the associated atmospheric circulation.

Finally, in order to further analyse the ability of the SOM in representing the synoptic environment associated with extreme precipitation events, a more detailed description of two particular CTs that exhibited high extreme frequencies (Fig. 7) is displayed in Figs. 9 and 10. To this end, only days that recorded extreme precipitation events given these CTs were considered. Composites of Z500 were performed and, additionally, wind and specific humidity anomalies at 850 hPa were included in the analysis to consider the role of the SALLJ in the occurrence of extreme rainfall in SESA (Salio et al. 2007; Rasmussen and Houze 2016). Moreover, the percentage of extreme events (at least 5% of covered area) conditioned to each CT, and mean daily precipitation amounts during the extreme events were also displayed. Figure 9 shows the results for node (1,3), which was found to significantly enhance the probability of extreme precipitation events during the rainy season. In about 17% of the days when this node occurred, an extreme precipitation event was detected in the STATIONS dataset (Fig. 7b). This frequency was similar in the set of RCMs. The ERA-I configuration at 500 hPa (first row in Fig. 9) was congruent with the CT displayed in Fig. 6, although the negative center over the southern Atlantic Ocean was more extended over the continent and, in addition, an anticyclonic structure was found north of the center of negative anomalies. All RCMs successfully represented the spatial pattern at this level of the atmosphere, in both its structure and intensity. When analysing the 850 hPa level (second row in Fig. 9), strong NW wind anomalies controlled northeastern SESA, which were associated with positive anomalies of specific humidity at this level. 
This configuration at 850 hPa was related to the extension of the SALLJ further south, advecting humid and warm air from lower latitudes that enhanced the instability conditions required for the development of extreme rainfall events in SESA during the warm season. The RCMs showed configurations similar to ERA-I, even though they tended to present a more marked NW–SE inclination of the SALLJ. In particular, RCA4 and REMO overestimated the specific humidity anomalies during the extreme events. As reflected by the contrast between positive specific humidity anomalies in northeastern SESA and negative anomalies in the southwestern part of the region, the surface panels displayed in Fig. 9 showed that extreme rainfall events typically occurred where the anomalous wind from the north reached SESA and, in consequence, moisture availability was higher. The spatial distribution of extremes (third row in Fig. 9) was adequately represented by the RCMs, which detected high percentages in the northeastern part of SESA. However, they tended to locate the maximum values slightly further south than in the observations and presented a more extended center of high frequencies. Moreover, precipitation composites for the extreme events (fourth row in Fig. 9) showed observed maximum intensities in northeastern SESA, whereas the RCMs presented more varied results: RegCM4 and WRF precipitation composites were congruent with STATIONS, but REMO tended to overestimate both the intensity and the spatial extent of the precipitation extremes, with high intensities over large portions of SESA. RCA4 strongly underestimated precipitation intensities, although the percentages of extreme events and their spatial location were generally well represented. Furthermore, this model tended to overestimate the positive specific humidity anomalies in the region, even though this did not result in heavier rainfall. 
In consequence, RCA4 may be misrepresenting the physical relationships between mid- and low-level circulation and the surface mechanisms involved in the development of extreme precipitation systems in SESA, which may be the reason for the strong underestimation of precipitation intensities found in this study, in agreement with Solman and Blázquez (2019) and Bettolli et al. (2020).

The other CT selected for further analysis was node (4,4) during the cold season (Fig. 10), which showed some of the highest frequencies of extreme events in large areas of SESA, in contrast with the results found for the warm season (Figs. 7, 8b). In this case, the ERA-I configuration at the 500 hPa level showed some differences with the CT displayed in Fig. 6. An anomalous cyclonic center was found over northern and central Chile and western Argentina, while positive anomalies dominated over the Atlantic Ocean and the southern Pacific Ocean. This configuration of anomalies disturbed the mid-latitude flow, producing a pronounced trough near 75° W, which was well represented by all the RCMs. At the 850 hPa level, the moisture content in SESA was anomalously high, mainly due to advection from the north and northeast in the whole SESA domain, which can be associated with extreme precipitation spread all over the region. The set of RCMs presented a similar configuration of anomalies at 850 hPa, although the wind direction was less clear than for the other case study (Fig. 9). In particular, REMO tended to overestimate the positive specific humidity anomalies. The dispersed extreme frequencies and precipitation amounts over SESA were adequately reproduced by the set of RCMs. RegCM4 and WRF correctly represented the precipitation intensities, with maximum values in the eastern portion of SESA, while WRF tended to overestimate the extreme frequencies of this node. REMO overestimated both the intensity and the spatial extent of the extreme events, whereas RCA4 presented clear underestimations of the rainfall amounts.

Since the SOM was constructed considering all days of both seasons of the year (extreme and non-extreme days), some differences may be expected between the SOM node and the composite of days within the node that recorded extreme precipitation in each season. To quantify these differences, the spatial correlation between the composites of Z500 anomalies of extreme days and non-extreme days within these CTs was calculated. Correlation coefficients were 0.95 in node (1,3) during the warm season and 0.84 in node (4,4) during the cold season, showing the good agreement of configurations in both nodes. Therefore, the main structures presented in Fig. 6 were representative of the mid-level atmospheric circulation that significantly enhanced the probability of extreme precipitation events (Fig. 7). In this sense, the patterns found here represent a robust approach for defining and differentiating structures at a larger scale as they organised the smaller-scale processes responsible for generating extreme precipitation events.
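
A spatial correlation between two composite fields, such as the one used above, can be computed with a few lines. The sketch below is illustrative only: the function name `pattern_correlation` and the synthetic fields are hypothetical, and the optional area weights (for example the cosine of latitude) are an assumption, since the weighting used in this study is not stated in this section.

```python
import numpy as np

def pattern_correlation(field_a, field_b, weights=None):
    """Centred (optionally weighted) spatial correlation between two fields."""
    a = np.ravel(field_a).astype(float)
    b = np.ravel(field_b).astype(float)
    w = np.ones_like(a) if weights is None else np.ravel(weights).astype(float)
    # Use only grid points where both fields are defined
    valid = ~(np.isnan(a) | np.isnan(b))
    a, b, w = a[valid], b[valid], w[valid]
    w = w / w.sum()
    a_anom = a - (w * a).sum()
    b_anom = b - (w * b).sum()
    cov = (w * a_anom * b_anom).sum()
    return cov / np.sqrt((w * a_anom ** 2).sum() * (w * b_anom ** 2).sum())

# Synthetic Z500 anomaly composites on a 20 x 30 grid (not real data)
rng = np.random.default_rng(1)
extreme_composite = rng.normal(size=(20, 30))
non_extreme_composite = extreme_composite + 0.3 * rng.normal(size=(20, 30))

# Optional latitude weighting, assuming a regular grid from 60S to 15S
lats = np.linspace(-60.0, -15.0, 20)
cos_weights = np.repeat(np.cos(np.deg2rad(lats))[:, None], 30, axis=1)

r_plain = pattern_correlation(extreme_composite, non_extreme_composite)
r_weighted = pattern_correlation(extreme_composite, non_extreme_composite, cos_weights)
```

Values close to 1 indicate that the composite of extreme days retains the structure of the rest of the days in the node, which is the interpretation given to the 0.95 and 0.84 coefficients reported above.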

4 Discussion and conclusions

Extreme rainfall events are recognised as some of the major threats of climate change, since higher greenhouse gas concentrations in the atmosphere can lead to an increase in the water vapor availability necessary for the development of precipitation systems (Du et al. 2019). In this context, southern South America (SSA) is a widely populated region where extreme precipitation has large impacts. Several forcings at multiple scales control the precipitation occurrence and intensity in the different portions of SSA, which adds complexity when performing climate studies of spatial and temporal variability, especially for extremes. Moreover, this kind of assessment requires long records of high-quality and high-resolution data. However, despite the growing availability of several observational datasets with information from multiple sources (including radar, satellite, gauge observations and combinations of them), the uncertainties in precipitation extremes are still large, which should be especially considered for model calibrations and evaluations (Bettolli et al. 2020; Condom et al. 2020).

In this work, we addressed the evaluation and comparison of multiple observational datasets and simulations from four RCMs driven by ERA-Interim in reproducing extreme precipitation in SSA during the common period 1980–2010, including the new CORDEX CORE simulations of RegCM4 and REMO. Extreme events were characterised in terms of their intensity and frequency all around SSA. When considering extreme intensities (Fig. 2), southeastern South America (SESA) and central and southern Chile presented the maximum values in all of SSA during the warm and cold seasons, respectively. Congruent behaviours were found between both gauge-based datasets, CPC and the gridded stations (STATIONS), as well as between the satellite-based products CHIRPS and MSWEP. These last two datasets tended to overestimate extreme intensities in comparison to the gauge-based datasets. In the case of the RCMs, most of them successfully captured the spatial pattern of extreme precipitation across SSA, although showing overestimations over central and southern Chile and more variable results in SESA. RegCM4 and WRF seemed to represent well the observed maximum in SESA, while REMO strongly overestimated extreme rainfall in this region (Fig. 2) and RCA4 had more difficulties in representing the spatial distribution of extreme precipitation intensities, especially during the warm season. When considering extreme frequencies (Fig. 3), inter-comparison results varied among the sub-regions of SSA, although the largest dispersion in the frequency distributions was mainly found for moderate and heavy precipitation amounts, and the spread between observational datasets was smaller than for the RCMs. In SESA, WRF adequately reproduced the STATIONS distribution, whereas RCA4 (RegCM4 and REMO) underestimated (overestimated) daily precipitation intensities, particularly for extremes.

In these analyses, extreme precipitation in SESA stood out from the rest of SSA in terms of its intensity and frequency. The frequency of extreme events with a spatial extension of up to 7.5% of grid points was the best captured by all datasets considered, with lower agreement for more extended systems, especially in the RCMs (Fig. 4a). When analysing extreme events with a minimum areal extension of 5% of grid points over SESA, the intra-annual variability was generally well captured by the observational datasets, whereas the RCMs tended to overestimate the frequencies during warm months (Fig. 4b). In particular, the identification of days with extreme rainfall events showed large differences among observational datasets and RCMs, reflecting the current observational uncertainty when evaluating precipitation extremes in SESA at a daily scale (Fig. 5). The different timing and location of extremes (including the areal coverage) among datasets pose a particular challenge when performing impact studies in the region and for the assessment of future projections of extreme rainfall. This stresses that the use of multiple datasets is of key importance when carrying out regional studies of climate variability and extremes.
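As an illustration of how extreme-event days with a minimum areal extension can be flagged, the following is a minimal sketch, not the procedure used in this work. It assumes a daily precipitation array of shape (time, lat, lon), a per-grid-point threshold given by the 95th percentile of wet days (at least 1 mm), and the 5% areal fraction mentioned above. The function name `extreme_event_days`, the percentile and the wet-day threshold are assumptions made here for illustration only.

```python
import warnings

import numpy as np


def extreme_event_days(precip, percentile=95.0, wet_threshold=1.0,
                       min_area_fraction=0.05):
    """Flag days on which enough grid points exceed their local threshold.

    precip: array of shape (time, lat, lon) with daily totals in mm.
    Returns (is_event, area_fraction), both of length `time`.
    All default parameter values are illustrative assumptions.
    """
    precip = np.asarray(precip, dtype=float)
    n_days = precip.shape[0]
    flat = precip.reshape(n_days, -1)

    # Local threshold: percentile computed over wet days only.
    wet = np.where(flat >= wet_threshold, flat, np.nan)
    with warnings.catch_warnings():
        # Grid points with no wet days yield NaN; silence that warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        thresholds = np.nanpercentile(wet, percentile, axis=0)

    valid = ~np.isnan(thresholds)
    if not valid.any():
        raise ValueError("no grid point has any wet day")

    # Fraction of valid grid points above their own threshold on each day.
    exceed = flat[:, valid] > thresholds[valid]
    area_fraction = exceed.sum(axis=1) / valid.sum()
    return area_fraction >= min_area_fraction, area_fraction


if __name__ == "__main__":
    # Synthetic data only, to show the call; not real precipitation.
    rng = np.random.default_rng(0)
    synthetic = rng.gamma(shape=0.5, scale=8.0, size=(365, 10, 12))
    is_event, fraction = extreme_event_days(synthetic)
    print("event days:", int(is_event.sum()), "of", is_event.size)
```

Applying the same function to each observational dataset and RCM on a common grid would give one boolean series per dataset, whose day-by-day agreement can then be compared.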

The synoptic circulation associated with extreme precipitation events in SESA was described by performing a classification of circulation types (CTs) using the Self-Organizing Maps (SOM) technique, which can serve as a tool for understanding the relationship between large-scale atmospheric circulation and local variables and for assessing the value of downscaled climate (Hewitson 2010). The identified CTs presented positive and negative structures or centers of geopotential height anomalies at the 500 hPa level that disturb the typical westerly flow of mid-latitudes (Fig. 6). The range of atmospheric configurations found by this procedure represented the variety of synoptic states, and specific CTs were found to significantly favour the occurrence of extreme precipitation events in particular sectors of SESA (Figs. 7, 8). These CTs presented a dipolar structure of Z500 anomalies that produced a marked mid-level trough usually located east of the Andes, which was found to significantly enhance the occurrence of extreme precipitation events in the warm season (Fig. 7). Structures of this kind, combined with an intensification of the South American low-level jet, usually provide favorable instability conditions for the development of extreme precipitation systems in the region and follow the conceptual scheme of the atmospheric configuration that leads to these events, as described by previous studies (Salio et al. 2007; Rasmussen and Houze 2016; Rasera et al. 2017). However, the present work showed that, although SESA as a whole is characterised by a remarkable occurrence of extreme precipitation, the precipitation extremes in the region can be regionalised according to the dominant synoptic environment of each CT. In this context, the patterns found here represented a robust approach to defining and differentiating larger-scale structures, as they organise the smaller-scale processes responsible for generating extreme precipitation events. This kind of study could be considered an initial step in a hierarchical classification of structures and variables through which the occurrence of precipitation extremes in the study region can be explained (Bettolli et al. 2010).
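The idea behind a SOM-based classification of circulation types can be sketched in a few lines of NumPy. The sketch below is a generic, simplified online SOM and is not the configuration used in this work: the 3 × 4 map size, the learning-rate and neighbourhood schedules, and the function names `train_som`, `assign_nodes`, `node_frequencies` and `extreme_frequency_by_node` are all assumptions introduced here for illustration. Each row of `data` would stand for one daily Z500 anomaly field flattened to a vector.

```python
import numpy as np


def train_som(data, n_rows=3, n_cols=4, n_iter=2000, lr0=0.5, sigma0=1.5,
              seed=0):
    """Train a small rectangular SOM; returns weights (n_rows*n_cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    # Coordinates of each node on the 2-D map, used by the neighbourhood.
    grid = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)],
                    dtype=float)
    # Initialise node weights from randomly chosen samples.
    weights = data[rng.integers(0, n_samples, size=n_rows * n_cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # shrinking neighbourhood
        x = data[rng.integers(0, n_samples)]
        # Best-matching unit: node with the smallest Euclidean distance.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        # Pull the BMU and its map neighbours towards the sample.
        weights += lr * h[:, None] * (x - weights)
    return weights


def assign_nodes(data, weights):
    """Index of the closest node (circulation type) for every sample."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def node_frequencies(labels, n_nodes):
    """Relative frequency of days assigned to each node."""
    return np.bincount(labels, minlength=n_nodes) / labels.size


def extreme_frequency_by_node(labels, is_extreme, n_nodes):
    """Fraction of days in each node that are extreme (NaN for empty nodes)."""
    counts = np.bincount(labels, minlength=n_nodes)
    hits = np.bincount(labels, weights=np.asarray(is_extreme, dtype=float),
                       minlength=n_nodes)
    return np.divide(hits, counts, out=np.full(n_nodes, np.nan),
                     where=counts > 0)
```

Under these assumptions, comparing `node_frequencies` between a reference dataset and an RCM shows whether the model populates the same circulation types, and `extreme_frequency_by_node` indicates which types favour extreme precipitation days.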

The set of RCMs evaluated adequately reproduced the spatial structures of the CTs. Although they were able to represent the SOM node frequencies, they tended to simplify the predominant CTs into a smaller number of configurations, showing some difficulties in reproducing the frequency distribution of days. However, they appropriately represented the extreme precipitation frequencies within the SOM and their associated atmospheric configurations at the 500 and 850 hPa levels, while exhibiting some limitations in the location and intensity of the resulting precipitation systems (Figs. 7, 9, 10).

Overall, this work contributes to the study of extreme precipitation in SSA, and particularly in SESA, where extreme rainfall events are the most intense and frequent and observational uncertainties are still notably large. In this context, continuous evaluations of observational datasets and of global and regional climate models become necessary for a better understanding of the physical mechanisms responsible for extreme precipitation over the region, as well as of their past and future changes in a climate change scenario.