1 Introduction

Industrial and population growth, together with an increasing vehicle fleet, have contributed to worsening air pollution in megacities (Parrish et al. 2011) such as Rio de Janeiro. According to the Emissions Inventory of the Metropolitan Region of Rio de Janeiro (MRRJ), vehicles are responsible for generating 40% of inhalable particulate matter (PM10) (FEEMA 2004), and fine particles (PM2.5) represent about 35% of PM10 (INEA 2013; Godoy et al. 2009). It is well known that fine particles penetrate more deeply into the respiratory tract and reach the pulmonary alveoli, causing health damage (Shaka and Saliba 2004; US EPA 1999). Furthermore, these particles contain minerals and toxic substances and can carry micro-organisms such as fungi and bacteria (Kampa and Castanhas 2008; Voutsa and Samara 2002).

To evaluate air quality in a region, atmospheric pollutant concentrations must be considered together with the meteorological and topographical conditions that govern atmospheric dispersion (Pires 2005; FEEMA 2006; Heintzenberg 1989; Swietlicki et al. 2008). In the MRRJ, many complex factors influence the atmospheric dispersion of pollutants and thus air quality. Among them are the irregular land use and the proximity of the sea (Guanabara and Sepetiba Bays). The latter favors natural ventilation. However, the rugged topography, parallel to the coastline, hinders air circulation and increases pollutant levels (Soluri et al. 2007; Godoy et al. 2009). Another important feature is the tropical climate, with intense solar radiation and high temperatures, which favor chemical reactions and particle formation (FEEMA 2006).

Air quality management is more efficient when air basins are delimited (Pires 2005). Air basins are polygons delimited by the local topography at around 100 m altitude, which is the main influence on atmospheric pollutant dispersion (Godoy et al. 2009).

The Environmental Institute of Rio de Janeiro State (INEA) has one of the largest PM2.5 monitoring networks in Latin America. PM2.5 concentrations and meteorological data provided by INEA were used to evaluate the air quality in the MRRJ. The relationships between PM2.5 and meteorological conditions, seasonality and air basins were determined using statistical tools.

The goal of this study was to use statistical tools (PCA, HCA, the Kruskal–Wallis and Mann–Whitney tests, and others) to better understand the relationship between fine particulate matter (PM2.5) levels and seasons, meteorological conditions and air basins.

2 Methodology

2.1 Sampling sites

The Metropolitan Region of Rio de Janeiro (MRRJ) has diverse air pollution sources as well as a complex topography, which hinders air mass circulation and pollutant dispersion in some areas far from the coast (Pires 2005; Soluri et al. 2007). Based on these characteristics, four air basins were defined for the MRRJ, each grouping areas that are similar in terms of topography, meteorology and emission sources (FEEMA 2006). Within these air basins, six sites were chosen for monitoring PM2.5 concentrations (Fig. 1; Table S1, Supplementary Information, SI).

Fig. 1 Map of Rio de Janeiro state and the sampling sites (a Ramos, b Downtown, c Maracanã, d Copacabana, e Jacarepaguá, f Santa Cruz) delimited by air basins I to IV. Source: Adapted from FEEMA (2006)

Air basin I has an area of 730 km2 and borders the Atlantic Ocean (Sepetiba Bay). One sampling site (Santa Cruz, SC) was selected in this area. The site is located in an industrial area that hosts steel mills and other industries. In addition, an important road with heavy traffic, linking Rio de Janeiro to São Paulo, crosses the area. Air basin II has an area of 140 km2 and faces the Atlantic Ocean. Jacarepaguá, a residential area with heavy traffic, was the sampling site selected. Air basin III covers an area of 700 km2, including the periphery and Guanabara Bay. Three sampling sites (Ramos, Maracanã and Downtown stations) were chosen: the first is in an industrial zone and the others are in urban zones. Finally, air basin IV, with an area of 830 km2, covers some of the towns on the other side of Guanabara Bay. No sampling site was selected for this air basin. The Copacabana station was adopted in this study because it presents characteristics similar to those of the sites in air basin IV.

2.2 Particulate matter sampling

Fine particulate matter sampling was performed by INEA using high-volume samplers (Model AGVMP252, Energética, São Paulo, Brazil). The samplers are installed approximately 2 m above ground level. PM2.5 samples were collected on glass fiber filters for 24 h, every 6 days, at a volumetric flow rate of 1.14 m3 min−1. All samples were collected from January to December 2011, corresponding to about 4 samples/month/site. The mass of particulate matter was obtained by gravimetric analysis. The filters were weighed before and after sampling on an analytical balance (Mettler E., Zürich, Switzerland), as described in the Brazilian Technical Standard method (NBR 13412 1995), which is similar to the ASTM D4096-91 (2009) method.
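The gravimetric calculation described above can be sketched as follows. This is a minimal illustration, not the procedure of NBR 13412: it assumes the concentration is simply the collected mass divided by the sampled air volume, with no temperature or pressure correction. The function name and the filter weights are hypothetical.

```python
def pm25_concentration_ug_m3(mass_before_g, mass_after_g,
                             flow_m3_per_min=1.14, duration_h=24.0):
    """Mass concentration (ug/m3) from filter weights before and after sampling."""
    # Air volume drawn through the filter during the sampling period.
    sampled_volume_m3 = flow_m3_per_min * duration_h * 60.0
    # Mass gained by the filter, converted from grams to micrograms.
    collected_mass_ug = (mass_after_g - mass_before_g) * 1e6
    return collected_mass_ug / sampled_volume_m3


# Volume sampled in 24 h at the flow rate given in the text.
volume_m3 = 1.14 * 24 * 60

# Hypothetical filter weights (grams), for illustration only.
concentration = pm25_concentration_ug_m3(3.0000, 3.0328)
print(f"{volume_m3:.1f} m3 sampled, {concentration:.1f} ug/m3")
```

With the stated flow rate, each 24 h sample corresponds to about 1640 m3 of air, so a filter mass gain of a few tens of milligrams maps onto concentrations in the range reported in this study.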

2.3 Meteorological data

Temperature (T), relative humidity (RH), wind speed (WS) and wind direction (WD) were monitored every 15 min by surface meteorological stations located near the PM2.5 sampling sites (d < 2 km). The meteorological data used in the statistical analyses were 24-h averages corresponding to each PM2.5 monitoring day, except for WD, which was summarized by its mode and displayed as wind roses.
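A minimal sketch of this daily aggregation is given below. It assumes each 15-min record is a dictionary with the keys T, RH, WS and WD, and that WD is stored as a compass-sector label so that a mode is meaningful; both the record layout and the sample values are hypothetical.

```python
from statistics import mean, mode


def daily_summary(readings):
    """Collapse the 15-min records of one day into a single daily record.

    T, RH and WS are averaged; WD is summarized by its most frequent sector.
    """
    return {
        "T": mean([r["T"] for r in readings]),
        "RH": mean([r["RH"] for r in readings]),
        "WS": mean([r["WS"] for r in readings]),
        "WD": mode([r["WD"] for r in readings]),
    }


# Three hypothetical records standing in for the 96 readings of a full day.
day = [
    {"T": 24.0, "RH": 80.0, "WS": 1.0, "WD": "E"},
    {"T": 26.0, "RH": 70.0, "WS": 3.0, "WD": "E"},
    {"T": 25.0, "RH": 75.0, "WS": 2.0, "WD": "SE"},
]
summary = daily_summary(day)
print(summary)
```

Using the mode for WD avoids averaging angles directly, which would be misleading for directions on either side of north.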

2.4 Statistical analyses

Boxplots were used to represent the distributions of PM2.5 concentrations and of RH, T, WS and WD, and to flag anomalous values lying more than 1.5 times the interquartile range beyond the quartiles. The Grubbs and Dixon tests were applied to detect anomalous maximum and minimum values in the PM2.5 concentration dataset (Grubbs 1969). The Shapiro–Wilk test was used to evaluate whether the PM2.5 data from each station followed a normal distribution (Shapiro and Wilk 1965). All tests were performed at the 5% significance level (α = 0.05). All statistical analyses were performed using the statistical computing platform R (R Development Core Team 2011).
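The boxplot rule for anomalous values can be sketched as below. This is an illustration in Python rather than the R code used in the study; it assumes the usual fences at 1.5 times the interquartile range beyond the first and third quartiles, and quartiles computed by linear interpolation, which may differ slightly from the hinges used by R boxplots. The sample values are hypothetical.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_anomalous(values, k=1.5):
    """Return the values that fall outside the boxplot fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily PM2.5 values (ug/m3), for illustration only.
sample = [8, 10, 11, 12, 13, 14, 15, 16, 18, 50]
fences = iqr_fences(sample)
anomalous = flag_anomalous(sample)
print(fences, anomalous)
```

Values flagged this way are only candidates; in the study they were subsequently checked with the Grubbs and Dixon tests before removal.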

Hierarchical cluster analysis (HCA) was used to group similar variables, adopting the Ward method with squared Euclidean distance and considering PM2.5 concentrations and meteorological variables. A correlation matrix was also computed for the five variables (PM2.5 concentration, T, RH, WS and WD) at each site.

Principal component analysis (PCA) was applied to evaluate the relationship between PM2.5 concentrations and all the meteorological variables; the principal component scores were plotted separately by sampling site and season. The data were autoscaled beforehand, i.e., centered on the mean and divided by the standard deviation. The Kruskal–Wallis test was carried out to confirm the results observed in the scatter plots of principal components and PM2.5 concentrations. PCA was also used to check the previously defined air basins (FEEMA 2006) through graphical analysis of the PCA centroids of each site. The Kruskal–Wallis test, followed by the Mann–Whitney test, was applied to evaluate the air basins.
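The autoscaling pretreatment can be illustrated with a short sketch. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function name and the temperature values are hypothetical.

```python
from statistics import mean, stdev


def autoscale(column):
    """Center a variable on its mean and divide by its sample standard deviation."""
    m = mean(column)
    s = stdev(column)  # assumption: n - 1 denominator
    return [(x - m) / s for x in column]


# Hypothetical daily mean temperatures (degrees C), for illustration only.
temperatures = [20.0, 25.0, 30.0]
scaled = autoscale(temperatures)
print(scaled)
```

After autoscaling, every variable has zero mean and unit variance, so variables measured in different units (µg m−3, °C, %, m s−1) contribute on an equal footing to the PCA.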

3 Results and discussion

3.1 PM2.5 concentrations

Daily PM2.5 concentrations ranged from 1 to 70 µg m−3 and annual means varied from 8.7 to 31.9 µg m−3 (Table 1). Neither Rio de Janeiro state nor Brazil yet has a standard for PM2.5. Therefore, all data were compared with the daily (35 µg m−3) and annual (15 µg m−3) air quality standards established by the US Environmental Protection Agency (EPA 2006).

Table 1 Descriptive statistics for PM2.5 concentrations for each sampling site in 2011

The Ramos, Maracanã and Downtown stations exceeded the daily air quality standard on only 2 days. These three stations belong to air basin III and behaved similarly. Although few events exceeded the daily limit, the annual average PM2.5 concentrations varied between 15.9 and 17.3 µg m−3, exceeding the annual limit by 6 to 15%. Previous studies in the Maracanã region (2003–2005) registered annual averages between 9.8 and 11 µg m−3 (Soluri et al. 2007; Godoy et al. 2009). The higher concentrations measured in 2011 are probably due to the growth of the vehicle fleet, which was encouraged throughout Brazil by a reduction of taxes on the purchase of new vehicles.

Santa Cruz and Copacabana presented the lowest average PM2.5 concentrations and did not exceed either the annual or the daily air quality standard. These two regions are located near the ocean, where the sea breeze favors the dispersion of atmospheric pollutants. In other studies performed in Santa Cruz, annual concentrations were higher, varying from 10.5 to 12 µg m−3 (Soluri et al. 2007; Rodriguez-Cotto et al. 2014). Although one of the biggest steel mills has been operating in Santa Cruz since 2010, the PM2.5 levels did not increase significantly, probably because the sea breeze aids pollutant dispersion.

Jacarepaguá registered the highest PM2.5 levels, exceeding the daily standard on 33% of the days. This condition is considered unacceptable for short-term human exposure. The annual average PM2.5 concentration (31.9 µg m−3) exceeded the limit established by the EPA by about 110%. Previous studies at this site, performed between September 2003 and December 2005, found lower annual average concentrations, around 11.0 µg m−3 (Soluri et al. 2007; Godoy et al. 2009). The higher PM2.5 concentrations in 2011 are attributed to infrastructure works for urban mobility improvement, namely the construction of express bus corridors (BRT). These activities increased the circulation of heavy vehicles for earthmoving and material loading.
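The exceedance percentages quoted in this section follow from a simple relative difference against the EPA annual standard of 15 µg m−3. The sketch below recomputes them from the annual means given in the text; the function name is hypothetical.

```python
def percent_above(value, limit):
    """Relative exceedance of a limit, in percent."""
    return 100.0 * (value - limit) / limit


EPA_ANNUAL_LIMIT = 15.0  # ug/m3, annual standard cited in the text

# Annual means (ug/m3) quoted in the text for air basin III and Jacarepaguá.
exceedances = {
    annual_mean: percent_above(annual_mean, EPA_ANNUAL_LIMIT)
    for annual_mean in (15.9, 17.3, 31.9)
}
for annual_mean, pct in exceedances.items():
    print(f"{annual_mean} ug/m3 -> {pct:.0f}% above the annual limit")
```

The first two values reproduce the 6 to 15% range reported for air basin III, and the Jacarepaguá mean comes out at roughly 113%, close to the approximately 110% stated in the text.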

In general, the highest concentrations were observed in the winter months, when atmospheric dispersion conditions are impaired (Fig. 2).

Fig. 2 Daily PM2.5 concentrations in 2011 at each site

3.2 Meteorological conditions

Descriptive statistics of the meteorological variables (temperature, relative humidity and wind speed) monitored at the six sampling sites in 2011 were used to characterize local meteorological conditions (Table 2).

Table 2 Descriptive statistics of meteorological variables of each of the six stations

Annual mean temperatures ranged from 23 to 28 °C, with the lowest registered in Jacarepaguá and the highest in Ramos. This difference is probably due to the greater green area in Jacarepaguá, in contrast with the more intensive land use in Ramos.

Ramos, Maracanã and Santa Cruz registered similar mean RH (77–78%), while Copacabana (69%) and Downtown (74%) were lower. The highest mean RH was measured in Jacarepaguá (86%), probably due to its afforestation rate.

Figure S1 (Supplementary Information) shows the wind roses with predominant wind directions and intensities. In Downtown, light breezes with intensities between 1.0 and 3.0 m s−1 were registered, most frequently in the east direction. According to Pimentel et al. (2014), the years 2001–2005 showed a high occurrence of winds with a reduced intensity of 4.0 m s−1, suggesting an influence of urbanization through the formation of wind corridors. In Copacabana the winds came mainly from the south quadrant, i.e., from the ocean, with calm conditions 36% of the time. In Jacarepaguá light winds and calm conditions (40%) were predominant, which probably contributed to the high PM2.5 concentrations in 2011. In Maracanã, the predominant winds came from the east/southeast with a frequency of 7%. Light winds were also registered from the east/southeast (3–5%) and from the southwest and northwest (21%). In Ramos, light breezes occurred in the southwest direction with a frequency of 10%, and in the southeast and northeast directions with frequencies between 3 and 7%; calm conditions reached 24%. In Santa Cruz, wind speeds above 3 m s−1 occurred in the southwest direction (9%) and in the north/northeast directions (3–6%), which indicates the occurrence of sea/land breezes. Calm conditions were registered 7% of the time, the lowest value among the sites. This region apparently had good dispersion conditions due to the incidence of strong winds from Sepetiba Bay. Pimentel et al. (2014) also studied the wind regime in Santa Cruz in 2001/2002 and 2004/2005 and found a well-defined wind pattern along the southwest and northeast directions, with maximum speeds of about 10 m s−1. Those directions were also observed in this study.

3.3 PM2.5 concentrations and meteorological conditions

The normality of the PM2.5 concentrations and the meteorological data was assessed with the Shapiro–Wilk test. WD, T and RH were normally distributed at all sites except Jacarepaguá. PM2.5 concentrations, on the other hand, were not normally distributed. Because not all variables were normally distributed, non-parametric tests were applied.

Figure S2 (Supplementary Information) presents box plots of the distributions of the meteorological variables and PM2.5 concentrations. Anomalous values were most pronounced for PM2.5 concentrations, at all sampling sites. These anomalous values were confirmed by the Grubbs and Dixon tests. According to the tests, concentrations of 1 and 32 µg m−3 in Copacabana were anomalous, as were concentrations of 1 and 50 µg m−3 in Ramos. These values were removed, and the statistical tests were applied to the resulting dataset.

Figure 3 presents the dendrograms obtained by HCA for the meteorological variables (T, RH, WS and WD) and PM2.5 concentrations at each site, with the aim of identifying similarities among them. According to the results, PM2.5 concentrations in Downtown, Copacabana and Maracanã (Fig. 3a, b, d) showed high similarity with wind speed, which indicates that this parameter influences PM2.5 concentrations. In addition, relative humidity was similar to PM2.5 concentrations in Copacabana and Maracanã, as was temperature in Downtown. The parameter most dissimilar to PM2.5 at the three sites was wind direction. PM2.5 concentrations in Jacarepaguá were influenced by wind direction and speed, while in Santa Cruz they were influenced by temperature and relative humidity. High temperatures lead to vertical air movement, which effectively displaces pollutants from low to high altitudes. When this occurs, the atmosphere is unstable and the mixed layer is deep. Low temperatures, on the other hand, do not favor vertical movement, so atmospheric pollutant concentrations increase at low levels. In Ramos, PM2.5 concentrations showed low similarity with all the variables, indicating that meteorological conditions did not directly influence pollutant concentrations.

Fig. 3 Dendrograms of normalized PM2.5 concentrations and meteorological parameters for each site, obtained by HCA

A correlation matrix between the meteorological variables and PM2.5 concentrations was also prepared to aid interpretation of the results (Table 3; statistically significant values are in bold). Downtown, Copacabana, Maracanã, Ramos and Santa Cruz showed a negative relationship between RH and T. The negative value means that these variables vary inversely, i.e., when temperature increases, the air expands, resulting in a decrease in relative humidity.

Table 3 Correlation matrix of meteorological variables and PM2.5 concentrations for the six sampling sites (bold values are statistically significant)

Consistent with the HCA results, only Santa Cruz presented correlation between PM2.5 concentrations and meteorological conditions (RH, −0.378; T, 0.446). This is coherent, since the site is located in a region influenced by Sepetiba Bay, which supplies high humidity and aids the dispersion of atmospheric pollutants.

3.4 Seasonal influence on the PM2.5 concentrations

The influence of the seasons on the PM2.5 concentrations (Table S2, Supplementary Information) was evaluated using PCA together with the meteorological data (RH, T, WD and WS). PCA was first carried out on all the data, which were then separated by season for the seasonal analysis. The Kaiser criterion was adopted to choose the number of components to retain. This criterion, proposed by Kaiser (1960), is probably one of the most widely used. It retains only factors with eigenvalues greater than 1. In essence, this is like saying that a factor is dropped unless it extracts at least as much as the equivalent of one original variable. Two principal components (PCs) were obtained, which together explained about 67% of the variability: PC1 was represented by T and RH (35% of the variability) and PC2 by the wind variables (WS and WD, 32%). The PCs therefore have similar influence in the model. The communalities ranged from 36 to 68% (Table S3, SI). PM2.5 was the variable with the lowest explained variability, because it had few correlations with the meteorological variables.
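The Kaiser rule can be sketched in a few lines. The eigenvalues below are hypothetical: they were chosen only so that, for five autoscaled variables (total variance 5), the first two components explain 35% and 32% as reported; the remaining three values are invented for illustration and are not results of this study.

```python
def kaiser_retain(eigenvalues):
    """Return (component number, eigenvalue, % variance) for eigenvalues > 1."""
    total = sum(eigenvalues)
    return [
        (index + 1, value, 100.0 * value / total)
        for index, value in enumerate(eigenvalues)
        if value > 1.0
    ]


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
eigenvalues = [1.75, 1.60, 0.80, 0.50, 0.35]
retained = kaiser_retain(eigenvalues)
for number, value, percent in retained:
    print(f"PC{number}: eigenvalue {value:.2f}, {percent:.0f}% of variance")
```

Because each autoscaled variable contributes one unit of variance, an eigenvalue above 1 means the component carries more information than any single original variable.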

Samples from each season were concentrated around the mean values of the scores generated by PCA and could not be clearly distinguished (Fig. 4). PCA was used as a preliminary analysis to suggest hypotheses, which were then verified with the Kruskal–Wallis test (Table 4). This test showed that PM2.5 concentrations were affected by the seasons only in Maracanã.
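For readers unfamiliar with the Kruskal–Wallis test, the following sketch computes its H statistic from pooled ranks, with the usual correction for ties. It is a plain-Python illustration, not the R routine used in the study. The seasonal values are hypothetical, and 7.815 is the standard chi-square critical value for 3 degrees of freedom at the 5% level.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical PM2.5 values (ug/m3) for four seasons at one site.
seasons = [[12, 15, 9, 14], [16, 13, 18, 11], [25, 21, 30, 19], [17, 10, 20, 22]]
CHI2_CRITICAL_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
h_seasons = kruskal_wallis_h(seasons)
print(h_seasons, h_seasons > CHI2_CRITICAL_DF3)
```

The statistic is compared with a chi-square distribution with (number of groups − 1) degrees of freedom, which is an approximation that works best when each group has more than a handful of observations.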

Fig. 4 PM2.5 concentration (µg m−3) scatter plots as a function of the two principal components, evaluating the influence of the seasons in a Downtown, b Copacabana, c Jacarepaguá, d Maracanã, e Ramos, f Santa Cruz

Table 4 Kruskal–Wallis test to evaluate the influence of the seasons on PM2.5 concentrations

3.5 Influence of air basins on the PM2.5 concentrations

A scatter plot of PM2.5 concentrations (Fig. 5) was used to verify the air basins (Fig. 1). According to the PCs, Copacabana, Jacarepaguá, Maracanã and Downtown were indistinguishable, i.e., they had similar atmospheric behavior. These results contradict the classification established by FEEMA (2006), in which Maracanã and Downtown are in air basin III and Jacarepaguá is in air basin II. The Kruskal–Wallis test was also applied to the 360 samples and gave χ² = 159.7 with p < 0.05. The Mann–Whitney test was then used post hoc to identify groups of sites that differ in PM2.5 concentration. The results shown in Table 5 confirmed the PCA results, i.e., Downtown, Maracanã and Jacarepaguá should be in the same air basin. Copacabana, which was not included in any air basin, behaved similarly to Santa Cruz in all seasons except winter. Ramos is located in air basin III, but Fig. 5 does not confirm this placement. Ramos differed from the other sites because it is mainly influenced by PC1, which is represented by the T and RH variables, inversely related in this component.
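The post hoc step can be sketched as pairwise Mann–Whitney comparisons between sites. The version below is a simplified illustration: it counts the U statistic directly and uses the normal approximation without tie or continuity corrections, which may differ from the R routine used in the study. The site names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for samples x and y, normal approximation."""
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    expected = n1 * n2 / 2.0
    spread = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - expected) / spread
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p


def pairwise_comparisons(samples_by_site):
    """Run the test for every pair of sites; keys are (site_a, site_b)."""
    return {
        (a, b): mann_whitney_u(samples_by_site[a], samples_by_site[b])
        for a, b in combinations(samples_by_site, 2)
    }


# Hypothetical PM2.5 values (ug/m3) for three unnamed sites.
sites = {
    "site_A": [14, 18, 16, 20, 15, 17],
    "site_B": [15, 19, 17, 21, 16, 14],
    "site_C": [30, 35, 28, 40, 33, 31],
}
results = pairwise_comparisons(sites)
for pair, (u, p) in results.items():
    print(pair, u, round(p, 3))
```

Pairs with p above the significance level are not distinguishable in PM2.5 concentration, which is the sense in which sites can be grouped into a common air basin. The normal approximation is rough for samples as small as those in this toy example.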

Fig. 5 Scatter plot of PM2.5 concentrations from the six monitoring sites as a function of the centroids of the two principal components

Table 5 Mann–Whitney test for the evaluation of the air basins

Santa Cruz was more influenced by PC2, which expresses wind information (WS and WD), confirming the results discussed previously (Table 2), which showed the highest wind speed at this site. Santa Cruz was confirmed in air basin I, as previously defined by FEEMA.

4 Conclusion

Six PM2.5 monitoring sites in Rio de Janeiro city (Copacabana, Downtown, Jacarepaguá, Maracanã, Ramos and Santa Cruz) were chosen in different air basins with diverse air pollution sources, with the aim of obtaining an atmospheric profile of a metropolitan region in a tropical South American country. Jacarepaguá (urban zone), situated in air basin II, was the most polluted site. Its PM2.5 levels exceeded the EPA annual standard by 110% and the daily standard on 33% of the monitored days. In contrast, Santa Cruz (industrial zone), situated in air basin I, did not violate the daily or annual air quality standards. In this region, intense pollutant dispersion occurs due to the strong sea breeze from Sepetiba Bay.

Regarding the air basins, the classification established by FEEMA did not agree with the results of this study. According to the groupings obtained in this work, Downtown, Jacarepaguá and Maracanã should be in the same air basin, contradicting the classification established by FEEMA, in which Jacarepaguá belongs to air basin II and the others to air basin III. Vehicle emissions are the main pollution source at these sites. Ramos is included in air basin III. However, our results suggest a different air basin for it, since its PM2.5 emissions are mostly from industrial sources. Copacabana, which does not belong to any air basin defined by FEEMA, behaved similarly to Santa Cruz, probably because both are strongly influenced by the sea breeze. Santa Cruz is the only site in agreement with the previously defined air basins.