Introduction

Exposure to fine particulate air pollution (PM2.5) has been associated with increased morbidity and premature mortality, suggesting that sustained reductions in pollution exposure could result in improved health and increased life expectancy (Gilboa et al. 2005; Sarnat et al. 2005; Pope et al. 2009; Matte et al. 2009; Solomon et al. 2012; Hubbell 2012). Estimating population exposure to PM2.5 has traditionally been done by assigning measurements of a central ground monitor to people living within the region (Kanaroglou et al. 2005; Sampson et al. 2013). However, a number of studies have shown the limitations of using central ground monitor data as the exposure metric (Lefohn et al. 1987; Wade et al. 2006; Beelen et al. 2009; Kim et al. 2014; Dionisio et al. 2016). These limitations include monitoring sites in national regulatory networks that are relatively sparse across broad regions of the country (Hu et al. 2014a) and pollutant concentrations that can be impacted by local emissions, leading to local variations (Hu et al. 2014b). A variety of modeling approaches are now being used to better estimate pollutant concentration variations not captured by monitors (Marmur et al. 2005; Johnson et al. 2010; Liu et al. 2012).

One approach to developing air quality fields is to use chemical transport models (CTMs), which account for local variations driven by emissions and meteorology (Godowitch et al. 2015; Kim et al. 2015; Pleim et al. 2016). The Community Multiscale Air Quality (CMAQ; Binkowski 2003; Byun and Schere 2006) model is a state-of-the-science CTM designed to follow the dynamics of air pollutants from emissions. CMAQ captures spatial and temporal variations (Friberg et al. 2016) but is subject to errors arising from insufficient characterization of meteorological (Yu et al. 2012) and emission inputs (Gilliland et al. 2008; Xiao et al. 2010; Ivey et al. 2015), as well as of physical and chemical processes (Carlton et al. 2008; Tang et al. 2011; Ivey et al. 2016).

The objective of this research is to use the data fusion (DF) approach to develop spatiotemporal concentration fields for PM2.5 mass, five PM species, and three gases for the state of North Carolina to support the University of North Carolina at Chapel Hill’s health analysis of coronary heart disease patients in NC (McGuinn et al. 2017). The data fusion approach, applied at a spatial resolution of 12 km, combines observations from ambient monitors with CMAQ output to better estimate ground-level air pollutant concentration fields for improved exposure estimates (Friberg et al. 2016). Several data withholding methods, in which a portion of the monitor observations is held out, were used to evaluate the stability of the data fusion method. Total PM2.5 mass concentrations are compared among unadjusted CMAQ pollutant fields, the data fusion application, ordinary kriging, and two methods that incorporate satellite aerosol optical depth (AOD) data (Hu et al. 2014a; Di et al. 2016), as part of evaluating the performance of various PM2.5 exposure methods. Exposure fields of five PM species and three gases were also compared between CMAQ results and data fusion results.

Methods

Four statistical methods were used to create the spatiotemporal fields, and the results were compared with each other and evaluated against observations. The first was the data fusion method, which combines observations and modeled pollutant fields and was used during the 2006–2008 period over North Carolina. (The data fusion method was actually applied from 2002 to 2010; 2006–2008 is in the middle part of that period and could be representative of the meteorological conditions experienced over that time.) The second and third methods were a two-stage statistical model and a neural network-based hybrid model, each of which uses satellite aerosol optical depth (AOD) and other data to develop PM2.5 fields. Reliance on AOD data meant that those methods were applied only to PM2.5 mass, not to individual PM or gaseous species. The fourth method uses ordinary kriging of observations at monitoring sites and was applied to develop PM2.5 and CO fields. Other pollutant species were monitored at very few locations, limiting the amount of information available to develop spatiotemporal exposure fields as well as to conduct a more thorough evaluation.

Air quality data

The observations used for data fusion come from the State and Local Air Monitoring Stations (SLAMS), Chemical Speciation Network (CSN) (Chu 2004), and Interagency Monitoring of Protected Visual Environments (IMPROVE) (Malm et al. 1994) networks. Observations from all available networks are utilized together. Pollutants include concentrations of three gases (carbon monoxide (CO), nitrogen dioxide (NO2), and nitrogen oxides (NOx)), PM2.5 mass, and five PM2.5 components (elemental carbon (EC), organic carbon (OC), ammonium (NH4+), nitrate (NO3−), and sulfate (SO42−)) (Fig. 1). Because of the limited number of monitoring sites for some species (e.g., CO, NO2, and NOx) in NC, we also included monitoring sites in neighboring states.

Fig. 1 Ambient air quality monitor locations used in this analysis (not all monitor locations have all species)

Twenty-four-hour average PM2.5 concentrations for years 2006 to 2008 were collected from the EPA’s Air Quality System Technology Transfer Network for use in the two-stage statistical model. The MODIS aerosol data (collection 5) at 550 nm wavelength were obtained from the NASA Earth Observing System Data Gateway at the Goddard Space Flight Center.

Chemical transport model simulated concentrations

Pollutant concentration fields used in this paper are developed using CMAQ model version 4.5 at 12-km resolution for the 2006–2008 period over North Carolina. A comprehensive model evaluation (Wyat Appel et al. 2008) of CMAQ version 4.5 conducted by the USEPA showed that simulated particulate nitrate and ammonium are biased high in the fall due to an overestimation of seasonal ammonia emissions (Qin et al. 2015). The EPA evaluation also found that simulated carbonaceous aerosol concentrations are biased low during the late spring and summer due to the lack of some secondary organic aerosol (SOA) formation pathways in the model (Jathar et al. 2016; Woody et al. 2016).

Data fusion

The approach used to combine the CMAQ-derived fields with observed pollutant concentrations was described in detail in Friberg et al. (2016). The method blends observations and CMAQ results based on spatial correlation analysis between observations and CMAQ simulations and generates a new field that captures local observations, as well as spatial variability from CMAQ. A summary is provided in the Electronic supplementary material.

Data fusion results were integrated with the Integrated Mobile Source Indicator (IMSI) method (Pachon et al. 2012) to estimate the influence of mobile sources on PM2.5. The IMSI method, which was developed for use in air quality and epidemiologic analyses, uses EC and NOx as indicators of diesel vehicle (DV) impacts and CO and NOx as indicators of gasoline vehicle (GV) impacts. Here, the IMSI method, along with pollutant fields derived from the data fusion method, is used to provide spatiotemporal fields of mobile source impacts for use in source-specific, multipollutant health analyses. The method is described in detail in the Electronic supplementary material.

Interpolation

Ordinary kriging (Cressie 1988) was applied to observed PM2.5 and CO to develop air quality fields for comparison with the more advanced methods. PM2.5 originates from multiple sources, both primary and secondary, whereas CO originates largely from mobile sources. PM2.5 and CO are monitored at more sites than PM species and primary mobile source gases.
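As an illustration of the interpolation step, the following minimal sketch implements ordinary kriging with NumPy. It is not the implementation used in this study: the exponential variogram and its parameters (`sill`, `range_`, `nugget`) are assumed for illustration rather than fitted to the monitor data, and the function names are hypothetical. With a zero nugget, the predictor reproduces each observation at its own location.

```python
import numpy as np

def exp_variogram(h, sill=1.0, range_=100.0, nugget=0.0):
    """Exponential variogram; parameter values are illustrative, not fitted."""
    return np.where(h > 0, nugget + sill * (1.0 - np.exp(-h / range_)), 0.0)

def ordinary_kriging(xy_obs, z_obs, xy_new, **vario):
    """Predict values at xy_new from observations z_obs located at xy_obs."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    n = len(z_obs)

    # Pairwise distances among monitors and from prediction points to monitors.
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=-1)

    # Kriging system: variogram matrix bordered by the constraint that the
    # weights sum to one (the Lagrange multiplier is the last unknown).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_obs, **vario)
    A[n, n] = 0.0
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = exp_variogram(d_new, **vario).T

    weights = np.linalg.solve(A, b)[:n, :]
    return weights.T @ z_obs

# Hypothetical monitor coordinates (km) and concentrations.
monitors = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 80.0], [120.0, 60.0]])
conc = np.array([10.0, 12.0, 9.0, 14.0])
print(ordinary_kriging(monitors, conc, [[30.0, 30.0]]))
```

Because the predictor is exact at the monitor locations under these assumptions, evaluating kriged fields against the same observations used to build them says little about performance away from monitors.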

Methods utilizing satellite aerosol optical depth for PM2.5 estimation

Two-stage statistical model

A two-stage statistical model (Hu et al. 2014a) employing satellite-retrieved aerosol optical depth (AOD) at 10 km resolution from the Moderate Resolution Imaging Spectroradiometer (MODIS) was used to develop PM2.5 fields. The output was regridded to 12 km resolution for comparison. The first stage is a linear mixed effects model with day-specific random intercepts and slopes for AOD and meteorological fields, which accounts for the day-to-day variability in the PM2.5-AOD relationship. The second stage is a geographically weighted regression model that captures spatial variation. Details of the method are found elsewhere (Hu et al. 2014a).

Neural network-based hybrid model

Di et al. (2016) applied a neural network-based hybrid model that incorporates satellite-based AOD data from MODIS, the absorbing aerosol index (AAI), chemical transport model (GEOS-Chem) output, land-use terms, and meteorological variables. The method has been used to estimate national PM2.5 fields at 1 km × 1 km resolution. A detailed description is given in a previous publication (Di et al. 2016). We extracted the results for North Carolina for 2006 to 2008.

Model evaluation methods

The performance of the data fusion method was evaluated using three data withholding methods, as described in the following subsections.

Random data withholding

Ten groups of observational data were constructed, with each group having 10% of the data randomly (not linked to specific monitors) withheld. Each group was run independently. Performance was assessed by comparing the simulated values to the data that were withheld for that iteration.
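The random withholding scheme can be sketched as follows. This is a minimal illustration, assuming that the ten withheld sets are disjoint (a tenfold split), which the description above does not state explicitly; the function name and the number of records are hypothetical.

```python
import numpy as np

def random_withholding_folds(n_obs, n_groups=10, seed=0):
    """Split observation indices into n_groups random, disjoint withheld sets.

    Individual records are shuffled without regard to which monitor they
    come from, so each withheld set holds about 1/n_groups of the data.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_obs)
    return [np.sort(fold) for fold in np.array_split(shuffled, n_groups)]

# Hypothetical data set of 1000 daily monitor records.
n_records = 1000
folds = random_withholding_folds(n_records)
for k, withheld in enumerate(folds):
    retained = np.setdiff1d(np.arange(n_records), withheld)
    # The fusion method would be rerun on `retained` and scored on `withheld`.
    print(k, len(retained), len(withheld))
```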

Randomly based monitor data withholding

Even though the random data withholding method is commonly used, it may overestimate the performance of the data fusion method. Monitor-based cross-validation may better reflect performance of the data fusion method because it is representative of areas where no monitor is located, as opposed to a situation where a single measurement is missing. In this case, the entire set of 60 PM2.5 monitors was randomly split into ten subsets with six monitors in each subset. For each of ten cross-validation iterations, one subset (10% of monitors) was selected as the testing sample and the remaining nine subsets (90% of the monitors) were used to reapply the method. Estimates of the withheld monitor values were compared with the actual monitor values. This randomly based monitor data withholding was repeated twice to check the stability of the evaluation with respect to the random choice of monitor grouping. For NO2 and CO, leave-one-monitor-out (LOO) withholding was applied (i.e., in each test the data from only one monitor were removed) because of the limited number of monitors available in the domain.
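The difference from random data withholding is that whole monitors, rather than individual records, are held out. The following sketch illustrates this under assumed inputs: the function name, the number of records per monitor, and the random seed are hypothetical. Setting the number of folds equal to the number of monitors gives the LOO case.

```python
import numpy as np

def monitor_folds(monitor_ids, n_folds=10, seed=0):
    """Randomly assign whole monitors (not single records) to n_folds subsets."""
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(monitor_ids)
    return np.array_split(rng.permutation(unique_ids), n_folds)

# Hypothetical record table: 60 monitors with 5 daily records each.
monitor_ids = np.repeat(np.arange(60), 5)
folds = monitor_folds(monitor_ids)

# Every record from the six monitors in the first subset is withheld together.
test_mask = np.isin(monitor_ids, folds[0])
train_ids = np.unique(monitor_ids[~test_mask])
print(len(folds[0]), test_mask.sum(), len(train_ids))
```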

Spatially based monitor data withholding

Monitors may be clustered, so that when one is removed, nearby monitors still allow the various methods to estimate pollutant levels at the removed monitor accurately. This can result in an overestimation of a model’s ability to provide accurate concentration estimates in a region with no monitors. Here, the entire set of monitors was spatially split into ten subsets (Fig. S1) according to their locations, and each spatially defined subset was withheld in turn.
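The grouping actually used is shown in Fig. S1 and is not described algorithmically here. As a purely illustrative stand-in, the sketch below forms spatially contiguous subsets by cutting the monitors into longitude bands; the banding rule, the function name, and the coordinates are all assumptions, not the grouping used in this study.

```python
import numpy as np

def spatial_folds_by_longitude(lon, n_folds=10):
    """Group monitor indices into contiguous west-to-east longitude bands.

    Hypothetical stand-in for a spatial grouping: neighbors are withheld
    together, so no nearby monitor remains to inform the estimate.
    """
    order = np.argsort(lon, kind="stable")
    return np.array_split(order, n_folds)

# Hypothetical longitudes for 60 monitors across the state.
rng = np.random.default_rng(1)
lon = rng.uniform(-84.3, -75.5, size=60)
folds = spatial_folds_by_longitude(lon)
for k, members in enumerate(folds):
    print(k, round(lon[members].min(), 2), round(lon[members].max(), 2))
```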

Results and discussion

CMAQ

As a baseline, the unadjusted CMAQ results are evaluated over the NC domain. Annual average concentrations from CMAQ (Table 1) are higher in 2007 than in 2006 and 2008 for most species. For PM2.5, the R2 between pollutant observations and CMAQ simulations over the 3-year period is 0.32 and the root mean square error (RMSE) is 5.16 μg/m3. Linear regression (Fig. 2; Table 2) between pollutant observations and CMAQ has a slope of 0.51. Evaluation results for other species tend to have lower correlations (Table 2).

Table 1 Annual average concentrations from data fusion and CMAQ over the NC domain
Fig. 2 Linear regression between observations (OBS) and simulations (PM2.5)

Table 2 Method performance evaluation (CMAQ, DF, and DF-WH) for PM2.5 and PM2.5 species (EC, OC, NH4+, NO3−, and SO42−) and mobile source-related gases NO2, NOx, and CO, 24-h average values
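The evaluation statistics used throughout (R2, RMSE, and regression slope) can be computed as in the following sketch. It assumes that R2 is the squared Pearson correlation and that simulations are regressed on observations; neither choice is stated explicitly in the text, and the function name and sample values are hypothetical.

```python
import numpy as np

def evaluate(obs, sim):
    """Return R2, RMSE, and the least-squares line of sim against obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    r = np.corrcoef(obs, sim)[0, 1]             # Pearson correlation
    slope, intercept = np.polyfit(obs, sim, 1)  # assumed: sim regressed on obs
    return {"r2": r ** 2, "rmse": rmse, "slope": slope, "intercept": intercept}

# Hypothetical paired daily values (μg/m3).
obs = [8.0, 12.0, 15.0, 20.0, 25.0]
sim = [9.5, 10.0, 13.0, 14.5, 17.0]
print(evaluate(obs, sim))
```

A slope well below 1, as reported for CMAQ, indicates that the simulated values span a narrower range than the observations.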

Data fusion

There are decreasing trends in the annual average concentration for all species from 2006 to 2008 in the data fusion results (Table 1). The annual average concentrations for each species from the DF method are higher than those from the CMAQ results. The probability density distributions of all species concentrations are log-normally distributed (Fig. S2).

Spatial plots of the annual averages for each of the nine pollutants show high concentrations in major urban centers (Fig. 3; Figs. S3a and S3b). Emission impacts are evident near the major interstates in the NO2, NOx, and CO fields. Concentrations at the western and eastern boundaries are much lower than in other areas because these are forest and coastal areas, respectively.

Fig. 3 Annual average spatial distribution fields from data fusion, 2008

Monthly trends in North Carolina averaged over 3 years (Fig. S4) show that the concentrations of PM2.5 and SO42− are higher in the summer and lower in the winter, while those of NO3−, EC, and OC are lower in the summer and higher in the winter. Concentrations of CO, NOx, and NO2 are higher in the winter and lower in the summer. These trends are expected based on the atmospheric formation chemistry of the secondary components (i.e., sulfate formed in summer and nitrate in winter) and the mixing height (lower in winter) due to meteorological conditions.

Mobile source impacts are estimated using the IMSI method applied to the DF fields. IMSI impacts decrease in the summer and increase in the fall (Fig. 4). The reduction of gasoline vehicle impacts is larger than the reduction of diesel vehicle impacts during the summer months. The emission-based IMSI values for gasoline vehicles (IMSIGV) and diesel vehicles (IMSIDV) are higher in 2007 than in 2006 and 2008 (Fig. S5). The elevated impacts near highways indicate that the method captures mobile source activity and that the data fusion fields are reliable (Fig. S6).

Fig. 4 Monthly trends of IMSIEB, IMSIEB,GV, and IMSIEB,DV from 2006 to 2008 (unitless)

Temporal correlations between IMSI impacts and PM2.5 concentrations indicate that highly populated and busy traffic areas have lower temporal correlations than other areas (Fig. S7). The correlations between PM2.5 and EC, CO, and NOx are low in rural areas (Fig. S8). The low temporal correlation between PM2.5 and the primary pollutants arises because much of the PM2.5 in the area is secondary (Gertler et al. 2000; Gertler 2005). The annual average spatial correlations between IMSI impacts and PM2.5 concentrations are 0.72 (2006), 0.71 (2007), and 0.78 (2008).

Ten percent random data withholding (Fig. S9) led to an R2 of 0.82 (Fig. 2) for PM2.5, 0.24 (Fig. S10) for CO, and 0.78 (Fig. S11) for NO2. Reapplying the method led to very similar correlations (e.g., for PM2.5, the R2 was 0.81). Spatial 10% monitor withholding cross-validation (applied only to PM2.5 because of the lack of monitors for other species) led to a lower R2 of 0.73 (Fig. 2). The LOO results for CO and NO2 also have lower R2 values than the random data withholding, with a decrease from 0.24 to 0.10 for CO and from 0.78 to 0.52 for NO2. Although the PM2.5 RMSE differs by approximately 1.20 μg/m3 between the 10% random data withholding results and the original DF data sets (Fig. S12; Tables 3, 4, and 5), both values are much smaller than the CMAQ RMSE of 5.16 μg/m3. Spatial distributions of the maximum root-mean-squared deviation (mRMSD, the maximum daily root-mean-squared deviation over the whole year) for PM2.5 show that the largest mRMSD values are lower than 2, except in northeastern NC in 2008 (Fig. S13a and S13b). The RMSD of spatially removed groupings (Fig. S14) is similar to that of randomly removed groupings (Fig. S13a and S13b) for PM2.5, except in northeastern North Carolina in 2008, where monitors are limited (Fig. 1). NO2 results are similar, with RMSE decreasing from 7.1 ppb (CMAQ) to 2.4 ppb (data fusion) (Table 5). For CO, RMSE decreases from 269 ppb (CMAQ) to 231 ppb (data fusion) (Table 4). RMSEs of the LOO results for NO2 and CO are also larger than those of the 10% random data withholding results (Tables 4 and 5). All monitor-based withholding cross-validations for PM2.5, CO, and NO2 have larger RMSE and smaller R2 than the 10% random data withholding results.
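One way to compute the mRMSD field is sketched below. The text defines mRMSD only as the maximum daily RMSD over the year, so the details here are assumptions: the daily RMSD in each grid cell is taken across the withholding runs, relative to the field built from all observations. The function name and array shapes are hypothetical.

```python
import numpy as np

def max_daily_rmsd(full_field, withheld_fields):
    """Maximum over days of the per-cell RMSD across withholding runs.

    full_field:      array (days, ny, nx), field built from all observations
    withheld_fields: array (runs, days, ny, nx), one field per withholding run
    """
    deviation = withheld_fields - full_field[None, ...]
    daily_rmsd = np.sqrt(np.mean(deviation ** 2, axis=0))  # (days, ny, nx)
    return daily_rmsd.max(axis=0)                          # (ny, nx)

# Hypothetical example: 3 days, 2 x 2 grid, 2 withholding runs.
full = np.zeros((3, 2, 2))
runs = np.zeros((2, 3, 2, 2))
runs[0, 0] = 1.0
runs[1, 0] = -1.0
runs[0, 2] = 2.0
print(max_daily_rmsd(full, runs))
```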

Table 3 Performance evaluation for observation (OBS) and simulations (PM2.5) using data withholding approaches, 24-h average values
Table 4 Performance evaluation for observation (OBS) and simulations (CO), 24-h average values
Table 5 Performance evaluation for observation (OBS) and simulations (NO2), 24-h average values

The spatial 10% monitor withholding leads to a lower R2 and higher RMSE for PM2.5 than random 10% monitor withholding (Table 3), with RMSE increasing from 2.48 μg/m3 (random) to 2.81 μg/m3 (spatial). When values are removed in spatially clustered groupings, the remaining observations are distant and have little influence on the kriging results. As a result, the CMAQ simulations are more heavily weighted and the performance of the withheld data fusion results worsens. The LOO test for NO2 and CO shows the influence of the distribution and quantity of the monitoring sites. CO monitors are located mainly in urban areas, while NO2 monitors are distributed more widely. There are fewer monitors for both NO2 and CO than for PM2.5.

Ordinary kriging interpolation

Annual average PM2.5 and CO spatial plots from kriging are shown in the Electronic supplementary material (Fig. S15). Linear regression (Figs. S16a and S16b) between ordinary kriging and observations has the highest R2 and slope among all the methods. RMSEs are also very small: 0.67 μg/m3 for PM2.5 and 24 ppb for CO. Such performance is expected when the same data are used for both interpolation and evaluation, because ordinary kriging closely reproduces the observations at the monitor locations, so monitor-based data withholding was performed for evaluation.

The performance of ordinary kriging under monitor-based withholding is similar to that of the data fusion results. R2 for monitor-based withholding is larger than 0.70. Results for CO are worse than those of the full-data interpolation; R2 decreases from 0.99 (ordinary kriging) to 0.13 (ordinary kriging LOO) (Fig. S16b).

Methods using satellite-retrieved AOD for PM2.5

Two-stage statistical model

The R2 between observations and the two-stage statistical model results is 0.81 (Table 3), lower than that of the data fusion results (0.95; Table 2). The RMSE of the two-stage statistical model (3.06 μg/m3) is lower than the CMAQ RMSE of 5.16 μg/m3 when simulated results are compared with observations. A tenfold cross-validation (random data withholding) shows that the 3-year averaged R2 is 0.78 and the averaged RMSE is 3.06 μg/m3 from 2006 to 2008.

Neural network-based hybrid model

The linear regression between neural network-based hybrid model results and pollutant observations has an R2 of 0.82 (Fig. S17). The annual average spatial distribution fields (Fig. S18) show a decreasing trend in PM2.5 concentration from 2006 to 2008. The fields show that the method also captures the spatial pattern of higher PM2.5 concentrations in urban areas and lower concentrations in rural areas.

Comparison between CMAQ and data fusion for all species

Correlations between 10% random data withholding results and observations are higher than those between CMAQ and observations (Figs. S9 and S12; Table 2). R2 values for PM2.5, EC, OC, NH4+, NO3−, SO42−, NO2, NOx, and CO between observations and data fusion simulations increase compared with the correlations between observations and CMAQ simulations. RMSEs decrease and R2 increases for all the species except NO3− and NOx. The R2 between observations and 10% random data withholding for PM2.5 is 0.82. SO42− also performs very well, with an R2 value of 0.82. R2 values between daily CMAQ and data fusion results for each grid cell over the whole of 2008 show that the highest values correspond to the grid cells nearest to monitors for all pollutants (Fig. 5). R2 values decrease as the distance to monitors increases, which indicates that the accuracy of the method increases with the number of monitors used, because the kriging step in the data fusion method depends strongly on the number and locations of monitors.

Fig. 5 R2 values of each grid for 2008

Comparison between data fusion and two-stage statistical model

The relationship between data fusion and two-stage statistical model results for PM2.5 simulations during 2006 to 2008 is calculated using Deming regression (Deming 1943), which weights the two inputs equally because both are estimated values from models (Fig. 6a). The grid-by-grid correlations over most of the domain have a value close to 1; however, the correlations in boundary areas are lower. Both the data fusion and two-stage statistical model capture the urban area PM2.5 concentrations. Fewer monitors are located in the forested areas of NC, so the results from the two methods are not as strongly correlated there. CMAQ secondary organic carbon formation is typically biased low in forested areas (Van Donkelaar et al. 2007; Zhang et al. 2007; Baek et al. 2011), which may contribute to low correlations with the two-stage statistical model. The two-stage statistical model can overestimate concentrations in the coastal areas of eastern NC (Fig. S19) because of the high relative humidity in the area, which leads to a bias in estimated PM2.5 from satellite-retrieved AOD (Liu et al. 2005; Hu et al. 2013). The retrieval quality of the MODIS product is sensitive to vegetation cover, and the retrieval has difficulty distinguishing between mixed land and water pixels, a limitation that might also contribute to the overestimation of the two-stage model along the coast. Missing AOD data, caused by the satellite coverage pattern and by cloudy days, could be another limitation of these AOD-based methods.
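For reference, a Deming fit with equal error variances in the two inputs can be sketched as follows. The function name is hypothetical, and the error-variance ratio `delta = 1` is an assumption that reflects the equal weighting described above; the fit is undefined when the sample covariance of the two series is zero.

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the ratio of the error variance in y to that in x; delta = 1
    treats the two series symmetrically (orthogonal regression).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical daily PM2.5 series (μg/m3) from two models at one grid cell.
df_series = np.array([9.0, 11.5, 14.0, 18.0, 22.5, 12.0])
ts_series = np.array([10.0, 11.0, 15.5, 17.0, 24.0, 13.5])
print(deming(df_series, ts_series))
```

Unlike ordinary least squares, swapping the two series with `delta = 1` gives the reciprocal slope, so neither model is treated as the error-free reference.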

Fig. 6 a Temporal correlations (R) between data fusion and two-stage statistical model from 2006 to 2008. b Temporal correlations (R) between data fusion and Harvard’s hybrid method from 2006 to 2008

Comparison between data fusion and hybrid model

Another comparison is made between the data fusion method and the method of Di et al. (2016). Temporal Deming regression (Fig. 6b) shows higher correlation in urban areas and lower correlation along the eastern and western boundaries and in the mid-south area. This is similar to the comparison of data fusion and the two-stage statistical model results, except in the mid-south area, which is a national forest. The difference in annual average concentration in coastal areas (Figs. S18, S19, and S20) illustrates that the neural network-based hybrid model could provide more accurate spatial information because of its use of AAI and CTM outputs.

Conclusion

Application of the data fusion method for primary and secondary pollutants over North Carolina demonstrates that the method provides accurate concentration fields, especially for PM2.5 total mass, OC, SO42−, NH4+, and NO2, capturing the spatial and temporal variations in both gaseous and speciated particulate matter concentrations. Capturing these variations is critical for improved estimation of exposures for health studies. Cross-validation with 10% random data withholding indicates that the DF results have little bias. CMAQ-modeled, non-data-fused concentration fields were subject to higher temporally and spatially varying bias and error and to lower correlations. These results demonstrate that the data fusion approach, as opposed to using CTM fields directly, should be used to provide spatiotemporal exposure fields for health studies that use daily air quality metrics. Mobile source impacts estimated by applying the IMSI method to the DF-derived fields could likewise be used in health studies.

This study also investigated the use of random data withholding versus withholding monitors randomly and based upon spatial clustering. Findings show that the data fusion method does provide accurate fields, but random data withholding may overestimate the ability of such methods to provide accurate concentration estimates in areas lacking monitors. The number and the distribution of monitoring sites affect the accuracy of the data fusion method. The more widely the monitors are distributed, the more stable the data fusion method results. Observation availability is an important factor in the application and evaluation of the method, as shown by the weaker performance for pollutants such as CO, NO2, and NOx, which have very few monitors. Moreover, CO monitors are mainly located in urban areas. Nevertheless, this research and previous studies demonstrate the benefits of the method over the direct use of air quality model fields.

Spatiotemporal PM2.5 fields derived using the CTM-based data fusion method compare well with similar fields derived using AOD and another chemical transport model. These and prior results suggest that the data fusion method provides a promising approach to developing exposure fields for health analysis across both urban and regional scales. A major advantage of CTM-based data fusion methods (which could potentially include the hybrid approach) over methods relying mostly on AOD to provide spatial variations is that they provide speciated PM2.5 and gaseous pollutant fields.