Introduction

The Metropolitan Area of Buenos Aires (MABA) is the third-largest megacity in Latin America. In spite of regulations, the ozone (O3) concentration in this region was not measured regularly until 2015, and since then it has been measured at only one air quality monitoring station. Therefore, air quality models are currently the only available tool to provide estimates of the O3 distribution across the MABA. The reliability of model results is assessed through model performance evaluations, which comprise several steps, among which the probabilistic evaluation plays an important role (Chang and Hanna 2005; Derwent et al. 2010). This evaluation assesses the uncertainty in model results that is caused by the uncertainties in model formulations, input variables or parameters, resolution, etc. Among all possible sources of error, uncertainties in the model input variables typically dominate the uncertainty in modeled pollutant concentrations (Russell and Dennis 2000). A widely used methodology to assess the uncertainty of modeled pollutant concentrations caused by possible errors in the input variables is the Monte Carlo (MC) analysis (e.g., Hanna et al. 1998; Bergin et al. 1999; Moore and Londergan 2001; Hanna et al. 2005, 2007; Rodriguez et al. 2007; Tang et al. 2010; Tan et al. 2014; Pineda Rojas et al. 2016). While this technique can be applied to any air quality model, its high computational demand is among its main limitations. Typically, N = 100 is considered an acceptable number of MC runs to obtain the (gridded) ensemble of modeled concentration solutions from which uncertainty is computed. The analysis of this ensemble is usually limited to evaluating the sensitivity of the output to uncertainties in the model input variables at a few receptors, providing quantitative measures (e.g., sensitivity coefficients) that show which variables dominate the model uncertainty at those receptors.
However, further analysis of gridded MC outcomes (i.e., the model output and associated input data) may provide some insight into the type of solutions that can be obtained with the model. The main limitation, the size and complexity of the dataset, can be tackled using techniques from the field of big data.

Clustering analysis aims for an unbiased classification of big datasets into groups containing objects with similar characteristics. In air quality studies, it has been widely used to identify monitoring stations with similar pollutant concentrations (e.g., Flemming et al. 2005; Afif et al. 2009; Henne et al. 2010), classify monitoring sites based on their chemical composition (e.g., Austin et al. 2013; Wang et al. 2016; Park et al. 2018), study the impact of remote emission sources on urban levels of particulate matter (PM) concentrations (e.g., Borge et al. 2007; Karaca and Camci 2010; Dimitriou and Kassomenos 2014; Terrouche et al. 2016), and identify meteorological patterns associated with pollution episodes of O3 (e.g., Beaver and Palazoglu 2006; Pakalapati et al. 2009; Khedairia and Khadir 2012; Awang et al. 2016) and PM (Rimetz-Planchon et al. 2008; Unal et al. 2011). Only a few works have applied clustering analysis to study gridded modeled pollutant concentrations (e.g., Jin et al. 2011). Despite its wide application in air quality studies, clustering has not been used in combination with MC simulations to perform a systematic qualitative screening of the outcomes of air quality models.

DAUMOD-GRS (MODelo de Dispersión Atmosférica Urbano-Generic Reaction Set) is a simple urban-scale atmospheric dispersion model that allows the estimation of ground-level O3 concentrations resulting from area source emissions of nitrogen oxides and volatile organic compounds, transport by the wind, atmospheric dispersion, and simplified photochemistry (Pineda Rojas and Venegas 2013). The statistical evaluation of the model has shown an acceptable performance to simulate O3 hourly concentrations at 20 receptors within the MABA (Pineda Rojas 2014). In Pineda Rojas et al. (2016), the uncertainty of the summer maximum O3 hourly concentration (Cmax) was evaluated at each receptor of the MABA domain applying the MC analysis. Results from that work showed that the greatest uncertainties of Cmax (up to 47 ppb) are obtained outside of the MABA, where the greatest values of Cmax are estimated (up to 51 ppb) and the lack of observations impedes model testing. Given the amount of information obtained from the MC simulations, in this work, we apply clustering techniques to characterize the conditions leading to the occurrence of modeled Cmax values. The objective is to further explore those gridded MC outcomes in order to better understand the type of model solutions that can be obtained with the DAUMOD-GRS throughout the whole MABA area.

Methodology

Description of the Monte Carlo outcomes used for clustering

The MC simulations are runs of a model fed with N different input datasets, obtained by perturbing the model variables randomly based on their error distributions and ranges. The base case input datasets (i.e., without perturbations) consist of surface hourly meteorological information registered at the domestic airport of Buenos Aires city (AEP: 34° 34′ S, 58° 30′ W) during a typical summer (2007), sounding meteorological data from the international airport (EZE: 34° 49′ S, 58° 30′ W), and area source emission rates of nitrogen oxides (NOx) and volatile organic compounds (VOCs) from the emission inventory developed for the MABA by Venegas et al. (2011). A constant regional background concentration of 20 ppb is assumed for ozone based on a previous study (Mazzeo et al. 2005), and “clean air” concentration values are assumed for NOx and VOCs given that the MABA is surrounded by non-urban areas.

The MC outcomes used in this work were obtained previously (Pineda Rojas et al. 2016) by perturbing nine input variables that feed the DAUMOD-GRS model: wind speed (WS) and direction (DIR), air temperature (T), sky cover (SC), total solar radiation (TSR), atmospheric stability class (KST), NOx emission rate (QNOx), VOC emission rate (QVOC), and regional background O3 concentration ([O3]r). Due to the lack of information, the probability density functions and error ranges of these variables were taken from the literature (see Table S.1). Simple random sampling was used to obtain N = 100 sets of perturbations from these uncertainty distributions, with which the “base case” data described above were perturbed. In this way, 100 perturbed input datasets were generated to perform the MC runs. All simulations considered a temporal resolution of 1 h and a spatial resolution of 1 km × 1 km. At each hour, spatially constant meteorological conditions were assumed, and only the emissions were allowed to vary spatially. On the other hand, perturbations of all variables were considered constant both spatially and with time (see Pineda Rojas et al. (2016) for details).
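The sampling step described above can be illustrated with a minimal Python sketch. The error distributions and scales below are hypothetical placeholders (they are not the values of Table S.1), only three of the nine variables are shown, and the names `SPECS`, `draw_perturbations`, and `apply_perturbation` are introduced here for illustration only. What the sketch does reproduce is the structure stated in the text: simple random sampling of N = 100 perturbation sets, each applied as a constant over all hours of the base case.

```python
import random

# Hypothetical uncertainty specifications (NOT the values of Table S.1):
# "normal_add" is an additive normal error, "lognormal_mult" a multiplicative factor.
SPECS = {
    "T": ("normal_add", 1.0),         # air temperature error, assumed scale
    "QNOx": ("lognormal_mult", 0.3),  # NOx emission factor, assumed scale
    "QVOC": ("lognormal_mult", 0.5),  # VOC emission factor, assumed scale
}

def draw_perturbations(specs, n_runs, seed=0):
    """Simple random sampling: one perturbation per variable and per MC run."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        run = {}
        for name, (kind, scale) in specs.items():
            if kind == "normal_add":
                run[name] = rng.gauss(0.0, scale)
            elif kind == "lognormal_mult":
                run[name] = rng.lognormvariate(0.0, scale)
            else:
                raise ValueError(f"unknown perturbation kind: {kind}")
        runs.append(run)
    return runs

def apply_perturbation(base_series, kind, value):
    """Apply one run's perturbation uniformly to every hour of a base-case series."""
    if kind == "normal_add":
        return [x + value for x in base_series]
    return [x * value for x in base_series]

perturbations = draw_perturbations(SPECS, n_runs=100)
base_t = [22.0, 25.5, 27.0]  # made-up hourly temperatures
perturbed_t = apply_perturbation(base_t, "normal_add", perturbations[0]["T"])
```

Each of the 100 dictionaries in `perturbations` would define one perturbed input dataset for a model run.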

The results obtained from these MC simulations include the value of Cmax estimated during diurnal hours (7–19 h) at each square kilometer of the MABA domain (4647 receptors) and the values of the nine perturbed input variables at the moment of occurrence of Cmax. It is worth noting that, at different receptors, Cmax occurs at different times of the summer which results in a wide range of leading conditions in spite of the assumption of horizontally homogeneous meteorological variables (Pineda Rojas et al. 2016). This previous work also shows that the hour of occurrence of Cmax (H) can vary considerably. For this reason, H was also considered a relevant variable for the clustering analysis. Hence, a total of 4,647,000 data (i.e., 10 variables × 4647 receptors × 100 possible solutions) were obtained from that model uncertainty assessment. This size clearly limits the direct observation of the data.

Clustering analysis

Clustering analysis aims to find groups of “objects” within a large dataset based on their similarity. The k-means algorithm is a widely used clustering method in air quality studies (e.g., Davies et al. 1998; Beaver and Palazoglu 2006; Lu et al. 2006; Pakalapati et al. 2009; Jin et al. 2011; Khedairia and Khadir 2012; Austin et al. 2013; Gomez-Losada et al. 2018). It is a heuristic algorithm that places k cluster centers (k, user defined) in an M-dimensional space (M, number of variables describing the objects) so as to minimize the mean squared distance from objects to their closest cluster center. The k centers are first distributed randomly, after which two steps are iterated until a steady solution is reached: (i) each object is assigned to the nearest cluster center and (ii) each cluster center is reset to the centroid (arithmetic mean) of all objects belonging to it.
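The two iterated steps can be sketched in a few lines of Python. This is an illustrative stand-in, not the MATLAB implementation used in this work. As an assumption, the initial centers are drawn at random from the objects themselves, and a center whose cluster becomes empty is left where it is.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means: returns (labels, centers) once assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct objects chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step (i): assign each object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # steady solution reached
        labels = new_labels
        # Step (ii): move each center to the arithmetic mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` separates the two visually obvious groups.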

Implementation of the k-means method

The definition of the objects and the form of standardization depend on the purpose of the clustering implementation. In this case, we aim to determine, for example, whether or not spatial patterns can be observed in the conditions of occurrence of Cmax modeled with the DAUMOD-GRS. Hence, an object is defined as the set of conditions (xi, i = 1, ..., M) in which an individual Cmax occurs, and the outcomes at all the receptors in the modeling domain are pooled together to define the object set (i.e., 100 MC repetitions at 4647 receptors = 464,700 objects). Given that the conditions of occurrence of Cmax through the MC simulations are given by the nine perturbed input variables, all these variables together with the hour of occurrence of Cmax are chosen to define the M-dimensional space (i.e., M = 10). Since the variables have different units and ranges and are therefore not directly comparable, each one is scaled by subtracting its mean (\( \overline{x_i} \)) and dividing by its standard deviation (\( {\sigma}_{x_i} \)) across all objects in the dataset:

$$ {x}_i^{\prime }=\left({x}_i-\overline{x_i}\right)/{\sigma}_{x_i} $$
(1)
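As a minimal illustration of Eq. (1), the Python sketch below standardizes each column (variable) of a small table of objects. The function name `zscore_columns` is introduced here for illustration, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used. A variable with zero spread would need separate handling.

```python
import statistics

def zscore_columns(rows):
    """Scale each column to zero mean and unit standard deviation (Eq. 1)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation over all objects.
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```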

The wind variables (polar coordinates) are decomposed into their x and y components, which define a proper Euclidean space (otherwise, 0° and 360° would be far apart). The MATLAB function kmeans is used with n = 100 random initializations (to avoid suboptimal local solutions), and different values of k are considered. For a given value of k, among the n clustering solutions, the one with the lowest within-cluster sum of point-to-centroid distances is selected:

$$ S={\sum}_{c=1}^k{\sum}_{j\in c}d{\left(\overrightarrow{x_j},\overrightarrow{x_c}\right)}^2 $$
(2)

where \( \overrightarrow{x_j} \) is the position of point j in the normalized variables space [\( \overrightarrow{x_j}=\left({x}_1^{\prime },{x}_2^{\prime },\dots, {x}_M^{\prime}\right) \)], \( \overrightarrow{x_c} \) is the centroid position of cluster c, and the sum is performed over all elements j in cluster c and over all clusters c = 1, ..., k.
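The wind decomposition and the selection criterion of Eq. (2) can be sketched as follows. This is an illustrative Python stand-in, not the MATLAB code used in this work. The sign and axis convention of `wind_components` is an assumption (any consistent convention yields the same distances between objects), and each candidate solution is represented as a hypothetical `(labels, centroids)` pair.

```python
import math

def wind_components(speed, direction_deg):
    """Decompose (speed, direction) into Cartesian components.
    Assumed convention: x = speed*sin(dir), y = speed*cos(dir)."""
    theta = math.radians(direction_deg)
    return speed * math.sin(theta), speed * math.cos(theta)

def within_cluster_sum(points, labels, centroids):
    """Eq. (2): sum of squared point-to-centroid distances over all clusters."""
    total = 0.0
    for p, lab in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
    return total

def best_of(solutions, points):
    """Among several (labels, centroids) solutions, keep the one with lowest S."""
    return min(solutions, key=lambda sol: within_cluster_sum(points, sol[0], sol[1]))
```

With this decomposition, directions of 0° and 360° map to the same point, which is the property the Euclidean distance requires.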

Choice of the best cluster distribution

There is no universal agreement regarding the optimal number of clusters for a given dataset. In this paper the silhouette criterion (Rousseeuw 1987) is applied. It compares, for different values of k, the average over all objects of:

$$ {S}_j=\left({b}_j-{a}_j\right)/\max \left({a}_j,{b}_j\right) $$
(3)

where \( {a}_j \) is the mean distance from object j to all other objects in the same cluster and \( {b}_j \) is its mean distance to the objects of the nearest other cluster. Maxima of the average silhouette are typically used to determine k since they offer a better cluster definition than neighboring values of k.
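A direct, unoptimized Python sketch of Eq. (3) is given below for illustration. It is not the MATLAB function used in this work. Here \( {b}_j \) is taken as the smallest mean distance from object j to the members of any other cluster (the usual silhouette definition), and objects in single-member clusters are assigned a value of 0, which is a common convention assumed here. At least two clusters and no exact duplicate points are assumed.

```python
import math

def silhouette_values(points, labels):
    """Silhouette value of every object (Eq. 3), using Euclidean distances."""
    clusters = sorted(set(labels))
    values = []
    for j, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            others = [q for i, (q, lab) in enumerate(zip(points, labels))
                      if lab == c and i != j]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[j]
        if own not in mean_dist:
            values.append(0.0)  # single-member cluster (assumed convention)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        values.append((b - a) / max(a, b))
    return values
```

Averaging the returned list gives the quantity that is compared across values of k. The pairwise loop scales quadratically with the number of objects, which is one reason to work on a downsampled subset.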

In an initial exploration with downsampled data (100 times), k-means solutions were obtained for k ranging from 1 to 10. The MATLAB function silhouette was used to compare them, and the average silhouette exhibited two local maxima at k = 4 (0.51) and k = 6 (0.54). Both sets of solutions were analyzed for the complete dataset and qualitatively similar conclusions were drawn. In the remaining part of this paper, results for k = 4 are reported for the sake of simplicity in description and visualization.

Results

Spatial patterns

Our first observation is that when plotted on the map of the MABA, clusters present characteristic spatial patterns, both when the frequency of occurrence of each cluster (Fig. 1) and when the dominant cluster in each location (Fig. 2) are considered. Except for cluster 4 that dominates in emission transition areas, all other clusters exhibit a radial pattern resembling those of the NOx and VOC emissions (see Pineda Rojas 2014). Clusters 1 and 2 appear mostly at receptors with emissions: in the city of Buenos Aires (highest emission rates) and in the greater Buenos Aires (moderate emissions), respectively. Clusters 3 and 4 are mostly present in the suburbs and outside of the MABA where no emissions are considered and the highest Cmax values are obtained (Pineda Rojas et al. 2016). In 57% of receptors, the frequency of occurrence of the dominant cluster at each receptor is ≥ 0.85; and in 81% of them, it is ≥ 0.65 (see Fig. 2b). (Note that only in 3% of the receptors, the frequency of the dominant cluster is < 0.5.) This means that in most of the analyzed domain, the family of leading conditions of Cmax is well defined, while only in a small portion, two or more clusters can dominate depending on the specific MC run.
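The quantities mapped in Fig. 2 can be computed per receptor as in the following Python sketch. The input is assumed to be the list of cluster labels assigned to one receptor across the MC runs, and the function names are introduced here for illustration.

```python
from collections import Counter

def dominant_cluster(labels_per_run):
    """Return (dominant cluster, frequency of occurrence) for one receptor."""
    counts = Counter(labels_per_run)
    cluster, n = counts.most_common(1)[0]
    return cluster, n / len(labels_per_run)

def fraction_at_least(frequencies, threshold):
    """Fraction of receptors whose dominant-cluster frequency is >= threshold."""
    return sum(f >= threshold for f in frequencies) / len(frequencies)
```

For instance, a receptor labeled as cluster 1 in 90 of 100 runs and as cluster 4 in the other 10 gives `dominant_cluster([1] * 90 + [4] * 10) == (1, 0.9)`.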

Fig. 1 Spatial distribution of the frequency of occurrence of each cluster in the MC simulations (✖ indicates receptors selected for a more detailed analysis)

Fig. 2 Spatial distribution of (a) the dominant cluster at each receptor and (b) its frequency of occurrence in the MC simulations

Cluster spatial patterns highlight the predominant role played by emissions in determining the conditions of occurrence of Cmax.

Multivariate cluster structure

Are emissions the only variables determining the cluster structure (or separation) or do they interact with other variables? One way to look at the relative contributions of different variables to structure is to study the standard deviation of normalized variables x′ (Fig. 3b). Its value for each cluster can be compared with the value for the whole dataset (which is equal to 1 due to the normalization given by Eq. (1)). A normalized variable that has a standard deviation close to 1 indicates that its spread within the cluster is similar to that of the population (for example, [O3]r for all clusters). In contrast, a standard deviation close to 0 shows that the cluster specializes in a small range of values of the variable (for example, H in cluster 3), while a standard deviation greater than 1 is indicative of a complex structure (for example, SC in cluster 4). To understand which variables contribute to define a clustered data structure beyond the spatial distribution determined by emissions, their absolute deviations from 1 are averaged across all clusters (Fig. 4). The first three variables in the rank (putting aside emissions) are considered for visualization: the hour of occurrence of Cmax (H), the total solar radiation (TSR), and the sky cover (SC). Contour curves surrounding the regions containing 99.5% of data within each cluster are plotted in the planes SC-H (Fig. 5a) and TSR-H (Fig. 5b). Under low SC conditions, cluster 3 appears in the morning hours with low TSR values while clusters 1 and 2 appear at midday with high TSR. Cluster 4 instead seems to have a more complex distribution, appearing mostly in early hours with high SC values and in late hours at low SC values. In midday hours, cluster 4 also appears with high SC and low TSR values. These projections help to understand the separation between clusters and the more complex scatter plot spanning all three variables (Fig. 5c, d). 
In this three-dimensional space, clusters distribute forming arcs that extend across the H dimension in successive TSR-H planes. When SC is low, the arc is highest in TSR, containing clusters 3 in the morning, 1 and 2 at noon, and part of 4 in late hours. As SC increases, these arcs become lower in TSR and are mostly formed by Cmax belonging to cluster 4.
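The ranking shown in Fig. 4 amounts to averaging, over the clusters, the absolute deviation of each within-cluster standard deviation from 1. A minimal Python sketch follows. The function name and the input layout (a dictionary mapping each variable to its list of within-cluster standard deviations) are assumptions made for illustration.

```python
def rank_by_sigma_deviation(sigma_by_cluster):
    """Rank variables by mean |sigma - 1| across clusters, largest first.

    sigma_by_cluster maps a variable name to the standard deviations of its
    normalized values within each cluster.
    """
    scores = {
        var: sum(abs(s - 1.0) for s in sigmas) / len(sigmas)
        for var, sigmas in sigma_by_cluster.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A variable whose within-cluster spread always equals that of the whole population scores 0 and ends at the bottom of the rank.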

Fig. 3 (a) Mean values and (b) standard deviation of normalized variables (z-score) for each cluster (Cmax, summer maximum O3 concentration; H, hour of occurrence of Cmax; WS, wind speed; T, air temperature; SC, sky cover; KST, atmospheric stability class; TSR, total solar radiation; QNOx, NOx emission rate; QVOC, VOC emission rate; [O3]r, regional background O3 concentration)

Fig. 4 Sorted absolute deviation of sigma values of normalized variables (Fig. 3b) from 1, averaged over the four clusters

Fig. 5 Contour curves surrounding the regions containing 99.5% of data within each cluster in the SC-H (a) and TSR-H (b) planes and distribution of objects of each cluster in the H-SC-TSR space from two perspectives (c and d)

In summary, these results show that the structure of clusters spans multiple variables and that at different spatial locations, distinctive and complex sets of conditions lead to Cmax modeled with the DAUMOD-GRS model.

Characterization of the clusters

In order to provide a full description of the clusters, the mean values and ranges of all variables are analyzed (except for the wind direction, which is discussed separately in “Wind direction”). Table 1 presents the mean value of each variable for each cluster, and Fig. 6 shows the 95% confidence range of each variable vs. the cluster number. Clusters 1 and 2 (which dominate at urban and suburban receptors, respectively) occur on average at 14 h and 13 h, respectively, under conditions of clear sky (mean SC = 1), moderate wind speeds (WS = 6.0 m/s and 5.1 m/s, respectively), relatively high values of temperature (T ~ 27 °C) and total solar radiation (TSR = 762 and 855 W/m2, respectively), and atmospheric instability (i.e., lower mean values of KST) (see Table 1). Cluster 1 presents a wider range of variation of H, SC, KST, and TSR than cluster 2, presumably associated with its wider range of QNOx and QVOC (which results from both their variations within the city and their uncertainty ranges in the Monte Carlo simulations).

Table 1 Variables from Fig. 3, averaged for each cluster (N number of objects of each cluster (%))
Fig. 6 Mean variables and their 95% confidence interval vs. cluster number: (a) summer maximum O3 concentration (Cmax), (b) hour of occurrence of Cmax (H), (c) wind speed (WS), (d) air temperature (T), (e) sky cover (SC), (f) atmospheric stability class (KST), (g) total solar radiation (TSR), (h) NOx emission rate (QNOx), (i) VOC emission rate (QVOC), and (l) regional background O3 concentration ([O3]r)

On the other hand, clusters 3 and 4 (which dominate at receptors with no emissions, where the largest Cmax values are estimated) occur on average at 7 h and 15 h, respectively, under low wind conditions (WS ≤ 1.7 m/s) (see Table 1). The difference in the mean time of occurrence of Cmax between these two clusters is also reflected in the mean values of the meteorological variables that present marked diurnal cycles, such as the atmospheric stability class. However, this is not observed in the mean total solar radiation, which is greater for cluster 3 (at H = 7 h) than for cluster 4 (at H = 15 h) because in cluster 4 Cmax occurs with a mean sky cover value of 4 (partly cloudy sky). Regarding their within-cluster variations, while cluster 4 presents a wide range of H (as well as of other variables like WS, SC, and TSR), cluster 3 occurs only during early-morning hours (see Fig. 6b). The possible reasons for the occurrence of such an ozone morning peak under the conditions of cluster 3 are discussed in “Discussion.”

Wind direction

The role of wind direction (DIR) is more difficult to analyze because, when considering the information from all receptors, it is not possible to distinguish situations of DIR that bring more or less polluted air to the receptors (the same DIR may have different effects on the pollutant concentration at different receptors, depending on how the emission sources are distributed around them). However, it is worth inspecting whether differences exist among the most frequent wind directions of the clusters. Figure 7 presents the wind rose of each cluster. The four clusters show variable and different dominant wind directions. In cluster 1 (associated with the lowest Cmax values), winds leading to the occurrence of Cmax are mainly of moderate intensity (4 m/s) from the ENE (22%) or relatively intense (~ 8 m/s) from the SE-SSE (27%). In cluster 2, Cmax occurs with lower mean wind speeds (3 m/s) from the ENE sector (9%), moderate (5 m/s) from the W (11%), and intense (8 m/s) from the S (9%). In cluster 3, the dominant wind directions are also variable: E (14%), WSW (13%), and NNW (14%), but wind intensities are low (< 2 m/s), as previously noted. The same is observed for cluster 4 but with different dominant wind directions: N-NNE (29%) and E-ESE (29%). (As shown in Table 1, each cluster contains a different number of objects, and therefore the same frequency of DIR in two different clusters corresponds to a different number of wind situations with that DIR.)
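The sector frequencies underlying a wind rose can be tabulated as in the following Python sketch. A 16-point compass with 22.5° sectors centered on each direction is assumed (consistent with the sector names used in the text), and the function names are introduced here for illustration.

```python
from collections import Counter

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def sector(direction_deg):
    """Map a wind direction in degrees to one of 16 compass sectors."""
    index = int((direction_deg % 360) / 22.5 + 0.5) % 16
    return SECTORS[index]

def sector_frequencies(directions_deg):
    """Relative frequency of each sector among the given wind directions."""
    counts = Counter(sector(d) for d in directions_deg)
    n = len(directions_deg)
    return {name: counts[name] / n for name in SECTORS}
```

Applying `sector_frequencies` to the DIR values of the objects in one cluster gives the percentages quoted per sector. Multiplying by the number of objects in that cluster recovers the number of wind situations.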

Fig. 7 Wind rose of each cluster

Given the spatial distribution of clusters 3 and 4 (see Fig. 1), their corresponding wind roses (Fig. 7c, d, respectively) suggest that Cmax could be occurring with winds that come from outside the MABA (i.e., bringing no emissions to those receptors). This can be easily confirmed with a histogram of wind directions at the time of occurrence of Cmax at any of the receptors where these clusters dominate. For example, the histograms of DIR (not shown) at four selected receptors indicated in Fig. 1 verify this. At the suburban receptor selected in Fig. 1c, where cluster 3 dominates, Cmax occurs with winds from the W-NW (in 59% of the MC simulations), while at the one chosen for cluster 4 (Fig. 1d), the maximum ozone concentration is mostly (91%) associated with winds from the E-SE. The occurrence of Cmax with winds that come from outside the MABA is intriguing yet compatible with theory and observations as analyzed in the following section.

Discussion

The results obtained in this work show that at receptors of relatively high and moderate emission rates, the DAUMOD-GRS model gives relatively conventional results based on our knowledge of the typical O3 diurnal profile (peaks occurring around midday hours due to photochemical conversion of NO2 into O3, enhanced by the presence of VOCs). However, at receptors with no emissions, cluster 3 shows that Cmax can occur at early morning hours and/or with winds that come from outside the MABA, which is less expected. The clear spatial pattern of the obtained cluster distribution (“Spatial patterns”) allows us to identify the region of the modeling domain where these unconventional results need to be explored. By choosing two representative examples of receptors with contrasting Cmax leading conditions, an analysis of the potential causes of the O3 morning peak is performed. Figure 8 shows the diurnal variations of the O3 concentrations ([O3]) and the initial concentrations of nitrogen monoxide ([NO]i) and ozone ([O3]i), at an urban receptor (UR, Fig. 1a) where cluster 1 dominates and at a suburban receptor (SU, Fig. 1c) where cluster 3 is dominant. The reason to plot [NO]i and [O3]i is that in the model, reaction NO + O3 → NO2 (NO titration) is the only one occurring in the absence of solar radiation. For simplicity, the DAUMOD-GRS memory component was initially excluded from the analysis. A conventional hourly profile of O3 concentration occurs at receptor UR (Fig. 8a). At this receptor, [NO]i is always greater than or equal to [O3]i, with a larger difference at early-morning and late-evening hours. At 6 h, [O3] ≈ 0 because the initial 20 ppb of ozone reacts with NO to generate NO2 through the above reaction, and there is no solar radiation to form it photochemically. In the following hours, when solar radiation becomes important, the generated NO2 is photolysed to form O3, and [O3] increases, reaching its maximum value (16.7 ppb) at 14 h. In turn, at receptor SU (Fig. 8b), at 6 h, [NO]i is close to zero due to a NW wind coming from outside the MABA. Consequently, the initial O3 concentration cannot be consumed chemically and hence [O3] remains at around 20 ppb. At 7 h, the wind direction changes to NNW and some NO (coming from the MABA) starts removing O3 via the NO titration reaction. In the following hours, the wind keeps rotating hourly, bringing pollutants from the MABA. As shown in Fig. 8b, from 7 to 10 h, [O3] still depends strongly on [NO]i (which is supported by a strong correlation between [NO]i and [O3] (R2 = 0.71)). After that, photochemistry starts to dominate. In this case, the O3 morning peak (17.6 ppb) occurs at 8 h and is slightly higher than the one occurring at 14 h (17.3 ppb). This means that when the solar radiation is low, an ozone morning maximum (higher than the midday peak) can occur if [NO]i ≪ [O3]i and the diurnal amplitude is relatively small. When these simulations are repeated including the standard memory component of the model, the above analysis is still valid with the only addition of an increased morning peak at the suburban receptor (not shown).
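The contrast between the two receptors at 6 h can be illustrated with a deliberately simplified limiting case in Python. It assumes that, in the absence of solar radiation, the titration reaction NO + O3 → NO2 goes to completion with one-to-one stoichiometry. This is not the DAUMOD-GRS chemistry scheme, only a back-of-the-envelope sketch, and the function name and the urban [NO]i value used in the example are hypothetical.

```python
def ozone_after_titration(o3_initial_ppb, no_initial_ppb):
    """O3 left after complete NO titration with no photochemical production.

    Limiting-case assumption: each NO molecule removes one O3 molecule
    until one of the two reactants is exhausted.
    """
    return max(o3_initial_ppb - no_initial_ppb, 0.0)

# Urban-like case: [NO]i >= [O3]i (25 ppb is a made-up value), so O3 is depleted.
urban_o3 = ozone_after_titration(20.0, 25.0)
# Suburban-like case with clean incoming air: [NO]i ~ 0, so O3 stays at background.
suburban_o3 = ozone_after_titration(20.0, 0.0)
```

Under these assumptions the urban case gives 0 ppb and the suburban case keeps the 20 ppb background, in line with the behavior described for 6 h at receptors UR and SU.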

Fig. 8 Diurnal variations of the concentration of O3 ([O3]) and the initial concentrations of NO ([NO]i) and O3 ([O3]i), at two receptors of the MABA: (a) an urban receptor (UR) where cluster 1 dominates and (b) a suburban receptor (SU) where cluster 3 dominates, during their days of occurrence of Cmax

This explains why the second type of modeled ozone profile only occurs in the MABA surroundings with winds from outside the urban area and not at receptors with high and moderate emissions (where [NO]i is always relatively large). These results are consistent with those obtained by Bogo et al. (1999) who measured O3 hourly concentrations at a coastal site of the city in spring 1995 and found that the highest O3 peak concentration occurred with very low NO concentration and wind coming from the vast de la Plata River estuary. The authors also show examples where the maximum O3 concentration occurs during morning hours. Our results suggest that, while such a morning peak may not be responsible for Cmax in the urban area of the MABA, where most of the measurements have been made, it is in the surroundings where morning peaks become more relevant. The observation of this distinct behavior suggests that it would be very interesting to monitor O3, NO, and NO2 concentrations at these less explored areas.

Summary and conclusions

Our main result is a qualitative characterization of the type of solutions that can be obtained with the DAUMOD-GRS to estimate the summer maximum O3 concentration (Cmax) in the MABA. In a previous work (Pineda Rojas et al. 2016), the gridded uncertainty of Cmax due to the uncertainties in the DAUMOD-GRS input variables was assessed applying the Monte Carlo (MC) analysis. A sensitivity analysis performed at eight selected receptors showed that the relative contributions from nine input variables to total Cmax uncertainty vary spatially, with the regional background O3 concentration being the dominant input. The present work, in contrast, focuses on the identification and characterization of atmospheric and emission conditions leading to Cmax in the gridded MC outcomes. To describe such conditions, we apply clustering analysis aiming to understand the dynamics of the DAUMOD-GRS model in different parts of the MABA, especially in its surroundings where the largest Cmax and uncertainty values are estimated and the lack of observations impedes its statistical evaluation.

Applying the k-means algorithm, four families of conditions that lead to the occurrence of Cmax are identified. The spatial variation of the dominant cluster (i.e., the one most present in the MC simulations) appears to be associated with that of the NOx and VOC emissions in the MABA. At urban and suburban receptors (i.e., receptors with emissions), two clusters are mostly present: in both of them, Cmax occurs on average at 13–14 h, under conditions of clear sky, moderate to intense winds, and relatively high air temperature and solar radiation. In the most urbanized area, a wider range of the emission rates appears to lead to a greater variation in the conditions under which Cmax can occur, compared with the suburban zone. In the surroundings of the MABA (where no emissions are considered and the highest Cmax values are simulated), two other clusters are obtained: one in which Cmax occurs only during early-morning hours, under clear sky conditions, low wind speed, and variable wind direction (on occasion coming from outside the MABA) and another cluster (mainly in emission transition areas) in which O3 peak concentrations occur on average at 15 h under partly cloudy sky conditions. Less conventional model results revealed by one of the clusters (i.e., an ozone morning peak or values of Cmax occurring with winds that come from outside the urban area) are consistent with the few measurements carried out in the city of Buenos Aires. This suggests that further monitoring efforts in suburban or rural areas could be particularly useful to enhance our current knowledge of O3 dynamics in the MABA surroundings (and hence determine whether further model adjustments or better parameter estimations are needed for this region).
Our results exemplify the way in which clustering can be helpful in the analysis of Monte Carlo simulations, to unveil stereotypical spatial patterns in large collections of modeled concentration peaks and discriminate between the families of conditions generating them.