Introduction

The health effects of air pollution have been studied intensely in recent years. Exposure to pollutants such as nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), ozone, and particulate matter (PM) has been associated with increased mortality and hospital admissions for respiratory and cardiovascular diseases.

In epidemiological studies, the key question concerns the extent of exposure to various chemicals in the absence of a direct measure of exposure (Dionisio et al. 2016; Luong and Zhang 2017). In such cases, exposure is commonly evaluated by three indirect methods: using the distance from the source as a proxy, dispersion modeling, and air monitoring. Dispersion models, which include information on meteorology, emissions, and topography, are useful tools for predicting mean ground-level concentrations around sources and for identifying, on average, the areas of maximum and minimum population exposure to contaminants that are emitted into the air but not measured at ground stations. By increasing the spatial and temporal resolution and by providing estimates for unmeasured pollutants, they overcome the limitations of monitoring networks related to station location, measurement errors, and measurement frequency. However, they require long simulation times, appropriate parameterizations, and extensive validation against available measurements. This is especially true in complex terrain, where both the mean flow and the turbulence, although forced by large-scale motions, are heavily modified by the local orography (Hanna and Strimaitis 1990; Varvayanni et al. 1998; Mangia et al. 2012; dos Santos Cerqueira et al. 2018). On the other hand, when emissions are not well defined and controlled, as with fugitive emissions or flaring and venting activities, measurements of specific contaminants can be fundamental for estimating their impact.

The present study estimates the air quality impact of the Centro Olio Val d’Agri (COVA) plant as part of a broader health impact assessment. COVA serves the largest onshore crude oil and gas reservoir in Europe. It is located in the Agri Valley (Southern Italy), which is about 30 km long and 12 km wide and lies at about 600 m above sea level. COVA activities involve different types of emissions, not all of which are well defined and controlled, as is the case for fugitive emissions from flanges, valves, seals, and drains, and for flare emissions.

The design of the health study requires assigning an exposure variable to each subject in the cohort living in the two small valley towns close to the plant, Viggiano and Grumento Nova.

To fulfill this requirement, we used the RMS (RAMS-MIRS-SPRAY) modeling system (Trini Castelli 2000, 2008) to simulate the dispersion of the regularly monitored emissions from stationary combustion. The RMS modeling system has been extensively validated and applied in several environmental impact studies (Sansigolo Kerr et al. 2001; Carvalho et al. 2002; Oettl et al. 2007; Trini Castelli et al. 2003, 2010, 2011).

To estimate the impact of the other types of emissions, we spatially interpolated the concentrations of specific contaminants measured in the area.

The pollutant concentrations and their spatial distribution are discussed in view of their implications for the epidemiological study.

Materials and methods

The upper Agri Valley (Fig. 1), situated in the south-western sector of Basilicata (Southern Italy), is oriented NW-SE and bordered by the Apennine Mountains on both sides. The valley hosts the largest European onshore oil reservoir, which in recent years has led to a significant increase in activities related to the extraction and pre-treatment of hydrocarbons before they are conveyed to the refinery. The Centro Olio Val d’Agri is the largest existing oil/gas pre-treatment plant located in a populated area (Trippetta et al. 2013). The nominal treatment capacity of the entire plant is 16,500 m3 d−1 of crude oil and 3,100,000 Sm3 d−1 of associated gas.

Fig. 1

Area of the study: the Agri Valley and the industrial site, including the monitoring stations (see Table 2) and the RAMS model simulation domain (reddish area; see Table 3). Image data: Google, Landsat/Copernicus, SIO, NOAA, U.S. Navy, NGA, GEBCO

Emissions

The COVA pre-treatment process involves different types of emissions due to (i) incineration of residues and electric and thermal power generation (stationary combustion), (ii) flaring and venting activities, and (iii) fugitive emissions from oil tanks. Not all of these emissions are well defined and controlled. In this study, the impact of the monitored emissions of SO2, NOx, and CO from stationary combustion was estimated with dispersion simulations. The owner provided these hourly emission data for two full years, 2012 and 2013. Table 1 summarizes the annual emissions for 2013 and the main characteristics of the stacks.

Table 1 Emissions (total tons in the year 2013) of SO2, NOx, and CO for each COVA stack, together with stack height (h), diameter (d), yearly averaged flue gas temperature (T) and exit speed (V)

Figure 2 shows the density functions of the plume exit speed and temperature for the different stacks. Both quantities affect the effective release height through plume rise, so they help in interpreting the results. The functions were computed by kernel density estimation, a non-parametric method for estimating the probability density function of a random variable; because they are probability densities, the area under each curve is equal to 1. E04bis and E20, which also emit SO2, are the tallest stacks and have lower exit speeds and higher temperatures than the others, while E11A, E11B, and E11C have the highest exit speeds.
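The kernel density estimation behind Fig. 2 can be sketched as follows. The paper does not state the kernel or bandwidth, so this minimal Python sketch assumes a Gaussian kernel with the normal-reference rule of thumb (h = 1.06 σ n^(−1/5)); the function names and the exit-speed values are hypothetical. The numerical check confirms the property mentioned above: the estimated density integrates to about 1.

```python
import math


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the PDF of `samples` with a Gaussian kernel.

    If no bandwidth is given, the normal-reference rule of thumb
    h = 1.06 * sigma * n**(-1/5) is used (an assumption, not the paper's choice).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        mean = sum(samples) / n
        sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        bandwidth = 1.06 * sigma * n ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all samples identical?)")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        # Sum of Gaussian bumps centred on each sample, scaled to unit total area.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in samples)

    return pdf


def trapezoid(f, a, b, steps=2000):
    """Trapezoidal rule, used here only to check the area under the curve."""
    h = (b - a) / steps
    inner = sum(f(a + k * h) for k in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


# Hypothetical hourly exit speeds (m/s) for one stack.
speeds = [12.1, 13.4, 12.8, 15.0, 14.2, 13.9, 12.5]
density = gaussian_kde(speeds)
area = trapezoid(density, 0.0, 30.0)  # close to 1.0
```

The bandwidth controls smoothness: a larger h merges nearby modes, a smaller one reveals more structure but also more noise.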

Fig. 2

Distribution of the exit temperature (top) and exit speed (bottom) of the release for the seven stacks: E03 gray line, E04bis purple line, E11A + B green line, E11C blue line, E12C red line, E12B brown line, and E20 orange line

Regarding flare emissions, the owner provided only the hourly gas flow rates (kg h−1), without information on the specific substances emitted. A calendar plot of the maximum hourly flow rate for each day in 2013 (Fig. A1, Electronic Supplementary Material, ESM) highlights the highly irregular nature of flaring, with values exceeding 20,000 kg h−1 on some days.
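The calendar plot rests on a simple aggregation: the maximum of the hourly flow rates within each calendar day. A minimal Python sketch of that step follows, assuming the records arrive as (timestamp, kg/h) pairs with None for missing hours; the function names and sample values are hypothetical and do not reflect the owner's actual data format.

```python
from collections import defaultdict
from datetime import datetime


def daily_max(hourly_records):
    """Map each calendar day to the maximum hourly flare flow rate (kg/h) of that day.

    `hourly_records` is an iterable of (datetime, rate) pairs; rate may be None
    for missing hours, which are skipped.
    """
    result = defaultdict(lambda: float("-inf"))
    for timestamp, rate in hourly_records:
        if rate is None:
            continue
        day = timestamp.date()
        result[day] = max(result[day], rate)
    return dict(sorted(result.items()))


def days_above(daily, threshold):
    """Days whose maximum hourly flow rate exceeds `threshold` (kg/h)."""
    return [day for day, value in daily.items() if value > threshold]


# Hypothetical records, for illustration only.
records = [
    (datetime(2013, 5, 12, 3), 1200.0),
    (datetime(2013, 5, 12, 14), 21500.0),
    (datetime(2013, 5, 13, 9), 800.0),
    (datetime(2013, 5, 13, 10), None),
]
peaks = daily_max(records)
flagged = days_above(peaks, 20_000.0)
```

The resulting day-to-value mapping is exactly what a calendar heat map colors, one cell per day.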

Concentration data

Since 2013, a network of five monitoring stations has measured both meteorological variables and pollutant concentrations with some regularity (Fig. 1 and Table 2). The stations VIGG and GRNO are located close to the towns of Viggiano and Grumento Nova, respectively, while VIGZ is very close to the plant.

Table 2 Characteristics of the five monitoring stations

We analyzed the series of hourly wind speed and direction measured at the five stations in 2013 (Fig. A2 ESM). Data were missing for the first 2 months at the MADB, CMOL, GRNO, and VIGG stations. Moreover, the series showed a systematic shift of approximately 3 m s−1 in the measurements for some months of the year. At the VIGZ station, the wind speed measurements evolve regularly, but the wind direction is provided as discretized values that do not represent the actual variability of the wind. Thus, most of the observed wind data do not provide a valid dataset for evaluating the performance of the modeling system.

To evaluate the impact of the emissions other than those measured at the stacks, we analyzed specific contaminants measured by the monitoring network. Among the contaminants specific to oil/gas activities, hydrogen sulfide (H2S) had the most complete and reliable series, so the analysis was based on this contaminant.

On the basis of the availability and quality of the emission and concentration data, the analysis and simulations were performed for the year 2013.

The modeling system

RMS is a modeling suite composed of the atmospheric model RAMS (Pielke et al. 1992; Cotton et al. 2003), the boundary layer parameterization code MIRS (Trini Castelli 2000), and the Lagrangian particle model SPRAY (Tinarelli et al. 2000). In the RMS suite, RAMS has been modified to include alternative turbulence closure schemes, also for specific simulations at the microscale. MIRS processes the RAMS meteorological output fields or, alternatively, other data fields derived from observations or diagnostic models, and then calculates the boundary layer quantities and the Lagrangian turbulence fields. SPRAY is a three-dimensional Lagrangian particle model designed to simulate the dispersion of airborne pollutants, and it can account for the spatial and temporal inhomogeneities of both the mean flow and the turbulence. It can reproduce concentration fields generated by point, area, or volume sources. The trajectory of the airborne pollutant is simulated through virtual particles: their mean motion is defined by the local wind, and their dispersion is determined by solving the Langevin stochastic differential equations for the velocity fluctuations, which reproduce the statistical characteristics of the turbulent flow. SPRAY can realistically reproduce complex conditions such as low wind speeds, strong temperature inversions, and flow over complex topography with variable land use and terrain.
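The Langevin approach can be illustrated with the simplest textbook case: one velocity component in homogeneous, stationary, Gaussian turbulence, where the fluctuation u relaxes over a Lagrangian time scale T_L while being forced by random increments. This Python sketch is not SPRAY's scheme, which handles inhomogeneous turbulence in three dimensions; the function name and all parameter values are hypothetical.

```python
import math
import random


def langevin_particle(u_mean, sigma_u, t_lagrangian, dt, n_steps, rng):
    """Advance one particle with the 1D Langevin model (explicit Euler-Maruyama):

        du = -u / T_L dt + sqrt(2 sigma_u**2 / T_L) dW
        dx = (U + u) dt

    Returns the final displacement x (m) and the sequence of fluctuations u (m/s).
    """
    x = 0.0
    u = rng.gauss(0.0, sigma_u)  # start from the stationary distribution
    diffusion = math.sqrt(2.0 * sigma_u ** 2 / t_lagrangian)
    history = []
    for _ in range(n_steps):
        d_w = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment with variance dt
        u += -u / t_lagrangian * dt + diffusion * d_w
        x += (u_mean + u) * dt
        history.append(u)
    return x, history


rng = random.Random(42)
# Hypothetical values: U = 3 m/s, sigma_u = 0.5 m/s, T_L = 100 s, dt = 1 s, 1 h of travel.
runs = [langevin_particle(3.0, 0.5, 100.0, 1.0, 3600, rng) for _ in range(100)]
mean_x = sum(x for x, _ in runs) / len(runs)  # close to U * t = 10800 m
all_u = [u for _, hist in runs for u in hist]
var_u = sum(u * u for u in all_u) / len(all_u)  # close to sigma_u**2 = 0.25
```

The ensemble mean drifts with the mean wind while the fluctuation variance stays at σu², which is the statistical property the Langevin model is built to preserve.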

In this study, four nested three-dimensional (3D) grids were used in RAMS. The largest (4272 km × 3696 km) covered all of Europe, from North Africa to Northern Europe, with a horizontal grid resolution of 48 km; the second (1452 km × 1596 km) covered all of Italy and part of central Europe with a 12-km grid mesh; and grid 3 (136 km × 136 km) covered part of Southern Italy with a 4-km grid mesh. The smallest domain, grid 4 (45 km × 30 km), with 1-km grid spacing and in which the SPRAY model was run (see Fig. 1), covered most of the Agri Valley, including the two small towns (Viggiano and Grumento Nova) that are the subject of the epidemiological study. Additional details on the RAMS simulation domains are reported in Table 3.

Table 3 Details of the RAMS model simulation domains

The simulations were carried out for 2013 because both the meteorology and the emission scenario of that year were representative of typical conditions in the area; 2013 also had the highest number of valid emission and concentration data. To reduce the time needed for the yearly simulation, RAMS analysis fields from previous runs over Italy, driven by the NCEP (National Centers for Environmental Prediction) meteorological analyses, were used for the two coarse domains at 48- and 12-km resolution. These RAMS analysis fields were then used as hourly input and nudging fields for the two nested domains at 4- and 1-km resolution. This approach reduced the simulation time to less than one-tenth of that of a full prognostic run over four nested domains, an important advantage when environmental impact studies face time restrictions. The drawback is that two-way nesting cannot feed information back from the two finest grids to the two coarsest ones. However, a preliminary assessment comparing this “nudging” approach with a full four-grid run showed very similar outputs, confirming that the quality of the simulation holds.
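Analysis nudging (Newtonian relaxation) adds to each prognostic equation a term that pulls the model state toward the driving analysis over a relaxation time scale τ: dφ/dt = F(φ) + (φa − φ)/τ. The minimal Python sketch below applies this to a single scalar, with hourly analysis values interpolated linearly in time; the time scale, function names, and values are hypothetical, and the actual RAMS nudging configuration is not reproduced.

```python
def interpolate_hourly(values, t_seconds):
    """Linearly interpolate hourly analysis values at time t (s after the first value)."""
    i = min(int(t_seconds // 3600), len(values) - 2)
    w = (t_seconds - 3600.0 * i) / 3600.0
    return (1.0 - w) * values[i] + w * values[i + 1]


def nudged_run(phi0, analysis_hourly, tendency, tau, dt, n_steps):
    """Integrate d(phi)/dt = F(phi, t) + (phi_a(t) - phi) / tau with explicit Euler steps."""
    phi = phi0
    for k in range(n_steps):
        t = k * dt
        target = interpolate_hourly(analysis_hourly, t)
        phi += dt * (tendency(phi, t) + (target - phi) / tau)
    return phi


# Hypothetical case: a 280 K model temperature relaxed toward a constant 290 K analysis,
# tau = 30 min, dt = 60 s, 6 h of integration, and no model tendency (F = 0).
final = nudged_run(280.0, [290.0] * 7, lambda phi, t: 0.0, 1800.0, 60.0, 360)
```

With F = 0 the departure from the analysis decays roughly as exp(−t/τ); the explicit step behaves well only while dt/τ stays well below 1.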

In the SPRAY dispersion simulations, Lagrangian particles are emitted every 30 s, with each stack treated as a point source. The number of released particles is set so that a single particle corresponds to a minimum concentration of 0.005 μg m−3 for NOx, CO, and SO2, a value small enough to reproduce the concentration field in good detail. Particle trajectories are followed for as long as the particles stay inside the simulation domain, over the full period of 8760 h; about 15 million particles were released in total. Releases from the different stacks are tracked separately and independently, in order to assess the contribution of each single plume to ground-level concentrations.
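The link between particle count and concentration resolution is simple arithmetic. The Python sketch below assumes, since the paper does not spell out the convention, that the 0.005 μg m−3 minimum corresponds to one particle sampled in one 500 m × 500 m × 15 m near-ground cell (the sampling volume described under Dispersion simulations); the function names and the 10 g s−1 emission rate are hypothetical.

```python
import math

C_MIN = 0.005                       # µg/m^3, minimum concentration per particle (from the text)
CELL_VOLUME = 500.0 * 500.0 * 15.0  # m^3, near-ground sampling cell
RELEASE_INTERVAL = 30.0             # s, particle release interval (from the text)


def particle_mass(c_min=C_MIN, cell_volume=CELL_VOLUME):
    """Mass (µg) carried by one particle under the assumed convention C_min = m_p / V_cell."""
    return c_min * cell_volume


def particles_per_release(emission_rate_g_s, interval=RELEASE_INTERVAL):
    """Particles to emit per release so that each carries at most particle_mass()."""
    mass_released = emission_rate_g_s * 1e6 * interval  # g/s -> µg per release interval
    return math.ceil(mass_released / particle_mass())


m_p = particle_mass()                        # 18750 µg, about 18.75 mg per particle
n_per_release = particles_per_release(10.0)  # hypothetical 10 g/s source
```

The actual SPRAY normalization is not described in the text and may differ (for instance, in how the 30-s samples are averaged within an hour), so these numbers are illustrative only.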

Results

The results of the air quality impact study are discussed with reference to the typical atmospheric circulation in the area and to the distribution of the concentrations related to the COVA plant emissions. The predicted concentrations of the pollutants considered were then used as the basis for the health impact assessment.

Meteorological simulations

Figure 3 shows scatter plots comparing model-predicted and observed air temperatures for 2013. Note that while temperature is measured at a height of 2 m, the first RAMS level shown in the plot is at 24 m. The scatter is not negligible, but the qualitative agreement is fair, considering the stringency of a paired comparison between a point measurement and a spatially averaged simulated value. The RAMS model tends to produce higher temperatures than observed, mostly at night. Difficulty in reproducing the diurnal evolution, with consequent deviations between observed and simulated surface temperatures, is a known problem of numerical meteorological models (Svensson et al. 2011). Non-local boundary layer schemes, which are generally adopted in mesoscale models such as RAMS, have been found to produce higher temperatures than local schemes, particularly at night (Kleczek et al. 2014).

Fig. 3

Scatter plot between the observed and predicted temperature values at the five measuring stations: from top-left to bottom-right MADB, CMOL, GRNO, VIGG, and VIGZ. Daytime data (from 6 a.m. to 6 p.m.): green circles; nighttime data (from 6 p.m. to 6 a.m. next day): blue circles

In any case, given the high flue-gas exit temperatures Tf at the stacks, the overestimation of air temperature has a negligible effect on plume rise and plume dynamics, because the ambient temperature Ta enters the buoyancy flux as follows:

$$ {F}_b= gV{r}^2\frac{T_f-{T}_a}{T_a} $$
(1)

where g is the gravitational acceleration, V is the exit speed, r is the stack radius, and the temperatures Tf and Ta are expressed in kelvin.
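A quick calculation with Eq. (1) shows why the temperature bias matters little. The Python sketch below uses hypothetical stack values (V = 15 m s−1, r = 1 m, Tf = 400 K) and compares the buoyancy flux for an ambient temperature of 283 K with the flux obtained if the model overestimates it by 2 K; the function name is illustrative.

```python
def buoyancy_flux(v_exit, radius, t_flue, t_air, g=9.81):
    """Buoyancy flux F_b = g V r^2 (T_f - T_a) / T_a from Eq. (1).

    Temperatures in K; the result is in m^4/s^3.
    """
    if t_flue <= t_air:
        raise ValueError("plume is not buoyant: T_f must exceed T_a")
    return g * v_exit * radius ** 2 * (t_flue - t_air) / t_air


# Hypothetical stack: exit speed 15 m/s, radius 1 m, flue-gas temperature 400 K.
fb_obs = buoyancy_flux(15.0, 1.0, 400.0, 283.0)    # observed ambient temperature
fb_model = buoyancy_flux(15.0, 1.0, 400.0, 285.0)  # ambient temperature overestimated by 2 K
relative_change = (fb_model - fb_obs) / fb_obs      # about -2.4 %
```

The relative change shrinks further as Tf − Ta grows, which is why hot flue gases make the buoyancy flux insensitive to small errors in ambient temperature.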

The agreement between the predicted and observed temperatures was quantified by computing statistical metrics over the full yearly dataset.

The statistical indexes used are correlation (R), normalized mean square error (NMSE), and fractional bias (FB), defined as:

$$ R=\frac{\overline{\left({C}_o-\overline{C_o}\right)\left({C}_p-\overline{C_p}\right)}}{\sigma_{Co}{\sigma}_{Cp}} $$
(2)
$$ FB=2\frac{\left(\overline{C_o}-\overline{C_p}\right)}{\left(\overline{C_o}+\overline{C_p}\right)}, $$
(3)
$$ NMSE=\frac{\overline{{\left({C}_o-{C}_p\right)}^2}}{\overline{C_o}\;\overline{C_p}}, $$
(4)

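where the subscripts o and p denote the observed and predicted values, respectively, of the meteorological variable C, overbars denote averages over the dataset, and σCo and σCp are the standard deviations of the observed and predicted values.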
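Equations (2)–(4) translate directly into code. A minimal Python sketch follows, using population (1/n) averages as implied by the overbars; the function name and the test series are hypothetical. With the sign convention of Eq. (3), a negative FB indicates that the model overpredicts on average. Because FB and NMSE are normalized by the means, their values depend on the temperature scale (kelvin or Celsius); the text does not state which was used.

```python
import math


def evaluation_metrics(observed, predicted):
    """Correlation R, fractional bias FB, and normalized mean square error NMSE (Eqs. 2-4)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty series of equal length")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population covariance and standard deviations (1/n), matching the overbars.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    sigma_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sigma_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    r = cov / (sigma_o * sigma_p)
    fb = 2.0 * (mean_o - mean_p) / (mean_o + mean_p)
    nmse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n / (mean_o * mean_p)
    return {"R": r, "FB": fb, "NMSE": nmse}
```

For example, a prediction that is always one unit higher than the observations gives R = 1 but a negative FB and a non-zero NMSE, showing why the three indexes are reported together.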

Despite the model's tendency to overestimate temperatures, the statistical indexes (Table 4) fall within the range of values typical of mesoscale models, as shown in many works (see, for instance, Gross 1994).

Table 4 Statistical indexes for the temperature at the stations

Given the poor quality of the observed wind data, a systematic evaluation of the meteorological simulations against them was not possible. However, to provide a qualitative comparison between predicted and observed wind fields, Fig. 4 shows the wind roses at the Grumento Nova (GRNO) and Viggiano (VIGG) stations for the months in which the observations were more reliable. Both observed and modeled wind roses indicate that the prevailing winds, which are also the strongest, come from the western sectors, with some differences between the stations, due to their location, and between predictions and observations, due to the model resolution.
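Wind roses such as those in Fig. 4 are frequency tables of (direction, speed) pairs. A minimal Python sketch follows, assuming 16 sectors centred on north, directions in degrees from which the wind blows, a calm threshold of 0.5 m s−1, and hypothetical speed classes; none of these choices is stated in the text.

```python
def sector_index(direction_deg, n_sectors=16):
    """Sector (0 = N, increasing clockwise) containing the direction the wind blows from."""
    width = 360.0 / n_sectors
    return int(((direction_deg % 360.0) + width / 2.0) // width) % n_sectors


def wind_rose(directions, speeds, speed_edges=(0.5, 2.0, 4.0, 6.0), n_sectors=16):
    """Relative frequencies (%) per sector and speed class, plus the calm percentage.

    Speeds below speed_edges[0] count as calm; class k covers
    [speed_edges[k], speed_edges[k + 1]) and the last class is open-ended.
    Records with a missing direction or speed are skipped.
    """
    counts = [[0] * len(speed_edges) for _ in range(n_sectors)]
    calms = total = 0
    for direction, speed in zip(directions, speeds):
        if direction is None or speed is None:
            continue
        total += 1
        if speed < speed_edges[0]:
            calms += 1
            continue
        k = sum(1 for edge in speed_edges[1:] if speed >= edge)
        counts[sector_index(direction, n_sectors)][k] += 1

    def pct(count):
        return 100.0 * count / total if total else 0.0

    return [[pct(c) for c in row] for row in counts], pct(calms)
```

Centring sectors on north means that 359° and 0° fall in the same sector, which avoids splitting northerly winds between two bins.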

Fig. 4

Wind roses based on the hourly data at the GRNO (top) and VIGG (bottom) stations for sub-periods in 2013 with more reliable measured data. RAMS simulations (left) and data measured by the anemometer (right)

In Grumento Nova, the dominant wind directions are well captured in both the north-west and south-west sectors, while in Viggiano, the frequency of winds from the north-west sector tends to be underestimated. The quality of the comparison is reasonable, considering that the simulations were performed at a regional scale with a horizontal resolution of 1 km, which cannot fully reproduce the topographical features of the area. In complex orography, small-scale gradients are hardly captured by models because of spatial averaging. Moreover, the RAMS winds refer to 24 m, whereas the observations are collected at 10 m. Finally, observations are instantaneous single-point values and may differ significantly from the temporal and spatial averages produced by model simulations.

The simulated wind roses at the other three stations for 2013 (not shown) also indicate that the prevailing winds in the area come from the western sectors, consistent with the observed data for 2013, 2014, and 2015.

Bearing in mind the anomalously high speeds in the wind data measured in 2013 (Fig. A2 ESM), the comparison between observed and predicted wind roses at the Masseria De Blasiis (MADB) site, shown as an example in Fig. 5, indicates adequate agreement between predicted and observed wind directions. In particular, the wind roses for 2014–2015, when the observed wind data were more reliable, show that the wind speed distributions are similar as well.

Fig. 5

Wind roses based on the hourly data at the MADB station. RAMS simulations for 2013 (left), measured data (anemometer) in 2013 (center), and measured (anemometer) data in years 2014–2015 (right)

Overall, the results indicate that the model simulations capture the climatology of the area.

Dispersion simulations

Ground-level concentrations of all species of interest (NOx, SO2, and CO) were calculated both for all releases together and for each single stack. The hourly emission data provided by the owner for each stack were used as source inputs in the dispersion simulations. Hourly averaged concentrations were computed by sampling, every 30 s, the particles contained in near-ground grid cells 500 m × 500 m in horizontal size and 15 m deep. From the hourly averaged concentrations, the statistics of interest, such as means, percentiles, and maxima, were then estimated at each point of the computational domain.
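The box-counting estimate described above can be sketched in Python: each 30-s sample gives a concentration equal to the total mass of particles inside a 500 m × 500 m × 15 m cell divided by the cell volume, the samples within an hour are averaged, and each cell's hourly series is then summarized. The nearest-rank percentile and the 98th-percentile default are illustrative choices not taken from the paper, and the function names are hypothetical.

```python
import math

CELL_VOLUME = 500.0 * 500.0 * 15.0  # m^3, near-ground sampling cell


def box_concentration(particle_masses_ug, cell_volume=CELL_VOLUME):
    """Instantaneous concentration (µg/m^3): mass of the particles in the cell / cell volume."""
    return sum(particle_masses_ug) / cell_volume


def hourly_average(samples):
    """Mean of the instantaneous concentrations sampled within one hour (120 at 30 s)."""
    return sum(samples) / len(samples)


def percentile(values, q):
    """Nearest-rank percentile, 0 < q <= 100 (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def cell_statistics(hourly_series, q=98.0):
    """Annual mean, q-th percentile, and maximum of one cell's hourly concentrations."""
    return {
        "mean": sum(hourly_series) / len(hourly_series),
        "percentile": percentile(hourly_series, q),
        "max": max(hourly_series),
    }
```

Applying cell_statistics to every near-ground cell yields maps of the kind shown in the figures below (annual mean and maximum).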

Figure 6 shows the simulated annual mean and maximum ground-level concentrations of NOx and SO2 over the computational domain, while Fig. 7 zooms in on the area covered by the health study.

Fig. 6

Simulated ground-level concentration maps of NOx (left, μg m−3) and SO2 (right, μg m−3), annual averages (top) and maximum concentrations (bottom)

Fig. 7

Simulated annual mean and maximum ground-level concentrations of NOx and SO2, zoomed in on the area around the COVA plant

The spatial distribution of pollutants shows a larger impact of the plant emissions in the east-north-east sector. The two municipalities of Grumento Nova (GRNO) and Viggiano (VIGG) are affected differently by the plant releases, and the highest concentrations occur farther downwind of the two towns (Figs. 6 and 7). This pattern is consistent with the prevailing winds, which transport pollutants from the south-west and north-west sectors towards the north and north-east, and with the stack characteristics, particularly plume temperature and exit speed (Table 1). The complex orography clearly drives the distribution of pollutants at the ground, with maximum values found on the slopes. This is also related to the height of the release points and to the strong additional plume rise produced by the high temperature and exit velocity of the plumes. As a result, the plumes are transported in higher vertical layers, farther away from the release points, and generate higher ground-level concentrations at larger distances from the plant, where they hit the orography.

The maximum concentrations of NOx and SO2 (Fig. 6, bottom) also peak in the northern and north-eastern areas, in agreement with the mean values, owing to the influence of the orography. Peak values were also found upslope in the western part of the domain, due to the along-valley circulation, and in proximity to the release points around the plant area. The latter were probably generated by convective atmospheric conditions that cause plume looping, bringing high concentrations close to the emission points. This is even clearer in the zoomed maps (Fig. 7): inside the valley, the area of maximum concentrations lies within about 2 km of the plant.

The concentration maps for the single stacks (examples for NOx in Fig. 8 and for SO2 in Fig. A3 ESM) show that the stacks with the greatest impact on the territory are E11, E12, and E20, which are characterized by a large flow rate and/or a high exit velocity (Table 1 and Fig. 2).

Fig. 8

Maps of the annual averages of the simulated ground-level concentration for NOx (μg m−3) for each single stack. From top left to bottom right: E03, E04bis, E11A + B, E11C, E12B, E12C, and E20

Despite some local differences between the annual average SO2 and NOx maps, their Pearson correlation coefficient exceeds 0.90 both over the whole computational domain and in the zoom area.
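Correlating two concentration maps amounts to computing Pearson's coefficient over the paired grid-cell values, optionally restricted to a sub-window such as the zoom area. A minimal Python sketch follows, assuming both maps are on the same grid and stored as lists of rows; the window convention and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant map has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def map_correlation(map_a, map_b, window=None):
    """Cell-by-cell Pearson correlation of two maps on the same grid.

    `window` = (row_start, row_end, col_start, col_end), end-exclusive, restricts the
    comparison to a sub-area such as a zoom around the plant.
    """
    if window is None:
        window = (0, len(map_a), 0, len(map_a[0]))
    r0, r1, c0, c1 = window
    xs = [map_a[i][j] for i in range(r0, r1) for j in range(c0, c1)]
    ys = [map_b[i][j] for i in range(r0, r1) for j in range(c0, c1)]
    return pearson(xs, ys)
```

A high map correlation measures similarity of spatial patterns, not agreement in absolute levels: two maps differing by a constant factor correlate perfectly.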

A qualitative comparison, by visual inspection of the hourly predicted and observed concentrations at the stations, showed that the concentration peaks predicted by the simulation often coincided with peaks in the measurements (see, for example, Fig. A4 ESM). However, a one-to-one correspondence between peaks paired in time and space cannot be expected, since the observations include the background and contributions from any other pollutant sources in the area, which modulate the maximum concentration values.

H2S observed concentration maps

The H2S series measured at the five stations show a highly irregular trend, with peaks alternating with lower values. The peaks are often associated with other specific pollutants, such as toluene and o-xylene, as observed during a flare event (see Fig. A5 ESM for the event of May 12–24, 2013).

However, hydrogen sulfide is also associated with fugitive emissions from oil tanks. Because its series is the most complete among the measured pollutants specific to oil and gas extraction and pre-treatment, and considering its impact on health, we focused on its spatial distribution (Gianicolo et al. 2016).

Figure 9 shows the spatial pattern of annual average H2S concentrations for 2013, obtained by kriging interpolation of the annual averages measured at the five monitoring stations. A similar distribution was found for 2014 and 2015. As for the simulated concentrations, a spatial gradient is evident, with the highest values in the eastern sector.
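Ordinary kriging estimates the value at an unsampled point as a weighted sum of the station values, with weights that sum to 1 and minimize the estimation variance under a chosen variogram. The paper does not report the variogram model or its parameters, so this minimal Python sketch assumes an exponential variogram without nugget and with hypothetical sill and range; the function names, station coordinates, and values used in the check are likewise hypothetical.

```python
import math


def exponential_variogram(h, sill=1.0, corr_range=2000.0):
    """Exponential variogram without nugget; sill and range (m) are hypothetical."""
    return sill * (1.0 - math.exp(-h / corr_range))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(stations, values, target, variogram=exponential_variogram):
    """Ordinary-kriging estimate at `target` from station (x, y) coordinates in metres.

    Returns the estimate and the kriging weights (which sum to 1).
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    n = len(stations)
    # Kriging system: [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1]
    a = [[variogram(dist(stations[i], stations[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(s, target)) for s in stations] + [1.0]
    solution = solve_linear(a, b)
    weights = solution[:n]
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights
```

Without a nugget, ordinary kriging is an exact interpolator: at a station location it returns that station's value, and the map between stations is shaped entirely by the assumed variogram.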

Fig. 9

Interpolation of measured H2S data (μg m−3) at the five monitoring stations. Year: 2013. Image data: Google, Landsat/Copernicus, SIO, NOAA

Discussion

An epidemiological residential cohort study design requires assigning a single exposure variable to each subject of the cohort. For an industrial plant with different types of emissions, not all of them well defined and controlled, such as an oil and gas pre-treatment center, integrating modeling studies and measurements is necessary for a better exposure assessment of the affected population. In this way, each approach can compensate for the limitations of the other.

Despite the widespread interest in oil and gas extraction and pre-treatment, very few studies to date have focused on the pollutants emitted during such activities, their monitoring, and their impact on the health of the communities living nearby. As noted in a recent review (Bustaffa et al. 2016), most studies deal with oil refineries. Bustaffa et al. pointed out that non-methane hydrocarbons are the main atmospheric pollutants near crude oil treatment plants, although they are rarely monitored continuously. In the Agri Valley, some experimental campaigns have been conducted, but they focused on fine and ultrafine particulate matter and its component black carbon (BC) (Calvello et al. 2014; Trippetta et al. 2013). These analyses confirmed that pollutants related to the COVA combustion process, such as nitrogen oxides, benzene, and toluene, have their highest concentrations close to the plant, where they are significantly correlated, suggesting that BC may act as a vehicle for organic compounds in the environment.

To account for the complex emission scenario, in this paper all monitored emissions and measured concentration series were analyzed to identify the best exposure metric. Since the Pearson correlation coefficient among the modeled maps exceeded 0.90, it was assumed that the map of one of the monitored pollutants, NOx, could be considered a proxy for the mix of emitted substances. Moreover, the adequate correlation between the modeled NOx map and the measured H2S map (correlation coefficient = 0.65) supported the hypothesis that the NOx map could be used as a proxy for exposure to the mix of substances emitted by both stacks and flares; it could therefore be considered, at present, the best metric of individual residential annual average exposure to COVA emissions.

Conclusions

The impact of the COVA plant on air quality has been investigated by means of dispersion simulations and monitoring data. The annually averaged simulated ground-level concentration maps identified the most affected areas east of the industrial zone, at about 5 km from the plant. Concentration peaks were also found in the proximity of the plant, within an area of about 1 km. The concentration distribution is consistent with the prevailing westerly winds and the vertical rise of the exhaust plumes. The annual map of hydrogen sulfide measured in the territory surrounding the two municipalities shows a spatial distribution similar to that of the simulated concentrations. The high correlations among modeled and measured maps made it possible to assume NOx as a proxy for the mix of emitted substances, representing at present the best metric for population exposure assessment in the epidemiological study.

Finally, the simulations show that the area impacted by the plumes extends beyond the territory of the two municipalities, suggesting the need to extend the monitoring area and to include in the health study the population living in the most affected areas farther from the plant.