Abstract
Background
There is substantial interest in using networks of lower-cost air quality sensors to characterize urban population exposure to fine particulate matter mass (PM2.5). However, sensor uncertainty is a concern with these monitors.
Objectives
(1) Quantify the uncertainty of lower-cost PM2.5 sensors; (2) Use the high spatiotemporal resolution of a lower-cost sensor network to quantify the contribution of different modifiable and non-modifiable factors to urban PM2.5.
Methods
A network of 64 lower-cost monitors was deployed across Pittsburgh, PA, USA. Measurement and sampling uncertainties were quantified by comparison to local reference monitors. Data were sorted by land-use characteristics, time of day, and wind direction.
Results
Careful calibration, temporal averaging, and reference site corrections reduced sensor uncertainty to 1 μg/m3, ~10% of typical long-term average PM2.5 concentrations in Pittsburgh. Episodic and long-term enhancements to urban PM2.5 due to a nearby large metallurgical coke manufacturing facility were 1.6 ± 0.36 μg/m3 and 0.3 ± 0.2 μg/m3, respectively. Daytime land-use regression models identified restaurants as an important local contributor to urban PM2.5. PM2.5 above EPA and WHO daily health standards was observed at several sites across the city.
Significance
With proper management, a large network of lower-cost sensors can identify statistically significant trends and factors in urban exposure.
Introduction
Both long- and short-term exposure to the “fine” fraction of atmospheric particulate matter (particles <2.5 μm in diameter, PM2.5) are associated with negative health outcomes [1,2,3,4,5]. In the United States, the concentration of PM2.5 is regulated based on data from a handful of relatively high-cost regulatory-grade monitors located in each city. However, these sparse monitoring networks are not sufficient to capture the spatial distribution of pollutant concentrations within an urban area [6], and recent evidence suggests that higher health risks may be associated with local (<1 km) and neighborhood-scale (1–10 km) variations in PM2.5 [5].
Recently, there has been substantial interest in deploying denser networks of lower-cost air quality monitors to better characterize the factors driving local pollution and the exposure of urban populations to PM2.5 [7,8,9,10,11,12,13,14,15,16]. However, an ongoing concern with lower-cost PM2.5 monitors is their uncertainty, which is typically larger than that of regulatory-grade instruments. Lower-cost PM2.5 monitors typically use optical measurements, which are sensitive to changes in the aerosol scattering coefficient due to hygroscopic growth. This can lead to overestimation of PM2.5 mass when humidity is high. Lower-cost optical PM2.5 sensors are also unable to detect particles under 300 nm, leading them to potentially underestimate PM2.5 mass. However, several studies have shown that careful calibration and corrections can reduce the errors between lower-cost sensors and regulatory-grade instruments [7, 16, 17]. While many studies have focused on the development and evaluation of lower-cost PM2.5 monitors [7, 12, 14,15,16,17,18], few studies have investigated how data from networks of these lower-cost monitors can be used to examine local air quality issues [10, 13, 19,20,21].
To reduce population exposure, the complex interplay of multiple factors that lead to elevated PM2.5 concentrations needs to be characterized, quantified, and ultimately, controlled. A common approach to characterize intra-urban spatial patterns of PM2.5 concentrations is saturation filter sampling, which is often used to develop land use regression (LUR) models [6, 22,23,24,25]. However, filter samples collected over multiple days or weeks can mask the influence of sources that are significant on subdaily or even hourly time scales. An important advantage of lower-cost monitors over filter sampling is the ability to perform long-term, yet time-resolved (hourly or faster) measurements at a large number of locations. Time-resolved measurements allow stratification of the data by time of day or wind direction, enabling better understanding of the effects of different modifiable and unmodifiable factors on urban concentrations. Modifiable factors are those that can be controlled by regulatory action, such as emissions sources (e.g., industry, traffic, restaurants), the built environment, and human behavior. Unmodifiable factors include meteorology (temperature, RH, wind direction, boundary layer height, etc.) as well as topography.
Lenschow et al. [26] proposed a framework in which the concentration of pollutants in any location is the sum of the regional background, the urban increment, and neighborhood-scale sources. Properly characterizing the regional background is key to understanding the magnitude of the urban increment, which is defined as the difference in concentration between a rural site and an “urban background” location [27]. This can be challenging because urban emissions and industrial sources often influence the concentrations in the surrounding “rural” areas where background measurements are made; this can lead to underestimating the urban increment [27]. A dense network of lower-cost sensors could address these limitations by making simultaneous measurements at multiple background sites. In addition, time-resolved measurements allow periods of urban or industrial influence on background sites to be excluded from the analysis.
This paper describes PM2.5 measurements from a dense network of lower-cost air pollution monitors operated over two years at 64 sites in and around Pittsburgh, Pennsylvania. Pittsburgh, like many urban areas around the world, is impacted by vehicular emissions, local commercial activities (e.g., restaurants), and regional transport. It is also impacted by nearby major industrial facilities. First, we characterize the combined uncertainty of the measurements due to the sensors and the sampling strategy, especially as it relates to long-term exposure measurements. We then quantify the spatial patterns of PM2.5 pollution and leverage the time-resolved data to investigate the contributions of different sources to urban PM2.5. This analysis demonstrates how dense, time-resolved lower-cost monitor networks can be used to quantify the modifiable factors driving pollutant concentrations in urban areas.
Experimental methods
Study area
A network of lower-cost sensors was deployed in Pittsburgh and surrounding suburban and rural areas (Fig. 1). Pittsburgh is a city located in southwestern Pennsylvania, USA, and has a population of 0.3 million with an additional 2 million people in the surrounding metropolitan area. The city is characterized by complex topography at the confluence of three major river valleys (Allegheny, Monongahela, and Ohio Rivers). The Pittsburgh region has one of the 10 highest annual average PM2.5 concentrations in the United States [28]. US Steel’s Mon Valley Works: Clairton Plant, the largest metallurgical coke production facility in North America, is located 17 km south of the city of Pittsburgh.
Measurements
The Center for Atmospheric Particle Studies at Carnegie Mellon University built an urban-scale network using the lower-cost Real-time Affordable Multi-Pollutant (RAMP) monitors (manufactured by SenSevere, now part of Sensit Technologies, Valparaiso, IN), which have been described in detail elsewhere [7, 19, 29, 30]. The RAMP network provides spatiotemporally resolved data from sites in and around Pittsburgh, PA (Fig. 1). At the time of writing, RAMPs had been deployed at 69 sites. Of these, five sites were excluded from this analysis due to limited data collection. We therefore consider data from a network of 64 RAMPs. The network was deployed starting in late July 2016; this manuscript considers PM2.5 data collected through March 2019. Given the deployment and maintenance schedules, the sampling period at the individual RAMP sites ranged from 11 to 563 days, with an average length of 284 days (a complete list of sensors can be found in Table S1). A photograph of a typical RAMP setup can be seen in Fig. S1. Sites were chosen to characterize pollution concentrations in a range of different environments, such as suburban, downtown, near busy roadways, and urban background, and to span various land-use characteristics (traffic, restaurants, and proximity to industrial sources) commonly used to create LUR models.
As previously described [7, 29, 30], the RAMPs include five gas sensors (CO, CO2, NO2, either O3 or VOCs, and either SO2 or NO), and were deployed with one of two external optical PM2.5 sensors: the Met-One Neighborhood Particulate Monitor sensor or the PurpleAir PM2.5 monitor. This manuscript focuses on data from the PM2.5 sensors. Malings et al. [7] describe an extensive evaluation of the optical PM2.5 monitors used here, including the development of a correction based on colocations with Beta Attenuation Mass Monitors (BAM, Met One Instruments, Grants Pass, OR, USA), a federal equivalent method, operated by the Allegheny County Health Department (ACHD). The corrections account for hygroscopic growth, particle mass below the instrument detection limit of 300 nm, and sensor drift. All data discussed here have been corrected to BAM-equivalent values using the algorithm (Eqs. (4) & (5)) in Malings et al. [7]. Malings et al. [7] did not find significant differences between the two types of PM2.5 sensors (Met-One Neighborhood Particulate Monitor sensor or the PurpleAir PM2.5 monitor) and we have treated them the same in this analysis. The data were collected four times per minute and then averaged to time scales from 1 h to the entire multiyear dataset for this analysis.
Wind direction data were measured at the Allegheny County Airport (AGC), a small regional airport between Pittsburgh and the Clairton Plant (Fig. 1), located 7.6 km northwest of the Clairton Plant and 10 km south of the Carnegie Mellon University campus. The data were retrieved from the National Oceanic and Atmospheric Administration’s Climate Data Online database [31]. Local topography, such as street canyons and the river valley, will impact the wind direction at RAMP sites. However, the data from AGC are a reasonable estimate of regional wind direction: the data from this site were consistent with those from other sites in the Pittsburgh area (see Fig. S2).
Uncertainty analysis
Uncertainty is a key issue with drawing robust conclusions on air pollution exposure from lower-cost sensor data. We accounted for two types of uncertainty: (i) instrument uncertainty associated with the sensors themselves and (ii) sampling uncertainty caused by noncontinuous and nonsimultaneous deployments. In addition, we estimate the reduction in uncertainty when concentrations from an individual RAMP are averaged over time and when data from multiple RAMP monitors are averaged together.
Instrument uncertainty
This work extends the analysis of Malings et al. [7] to evaluate sensor performance, covering more sensors and the period of this study. Instrument uncertainty was determined by comparing measurements from the RAMPs colocated at two different regulatory monitoring sites operated by ACHD over a ~2-year period. One site was in an urban neighborhood (Lawrenceville, AQS#42-003-0008), and the other was downwind of the Clairton Plant (Lincoln, AQS#42-003-7004).
One RAMP monitor was colocated at each of these two ACHD sites for the entire study period. For more limited periods (up to 100 days), 38 additional RAMP monitors were colocated at one of the ACHD sites. This was done on a rolling basis with different RAMPs to continually evaluate instrument performance. Therefore, our dataset allows us both to evaluate the effects of different averaging times on RAMP monitor performance and to intercompare performance across a set of RAMP monitors.
We quantified the instrument uncertainty as the 75th percentile of the absolute percent error between the PM2.5 measurements of the two long-term pairs of colocated RAMP and BAM monitors as a function of averaging time, ranging from 1 h to a year. This conservative approach assumes that all the variance between the two measurements is due to the RAMPs, while the BAM monitors are measuring the true value. However, the EPA’s national precision estimate for BAM measurements exceeds 20% for daily average values [32].
Since the colocation datasets were typically longer than the different averaging periods, we randomly subsampled the data from each RAMP and calculated the absolute percent error relative to the BAM. This subsampling approach creates a distribution of comparisons for a single RAMP during a given averaging period. The lines in Fig. 2a show the 75th percentile of the absolute percent error (henceforth, “uncertainty”) for the two long-term colocations. We compare the uncertainties of the two long-term colocations to all 40 RAMPs (the two long term and 38 additional limited periods, shown in the box-whisker plots in Fig. 2a), to ensure their uncertainties are similar. At averaging times of at least 10 days (240 h), the uncertainty of the two long-term colocated RAMPs is higher than that of most of the others, indicating that our estimate is conservative.
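The subsampling metric described above can be illustrated with a short Python sketch. It is not the code used in this study: the function name `uncertainty_p75`, the use of contiguous windows with random start times, and the number of draws are assumptions made for illustration. It takes paired hourly series from a sensor and a reference monitor and returns the 75th percentile of the absolute percent error for a chosen averaging time.

```python
import random
import statistics

def uncertainty_p75(sensor, reference, window_hours, n_draws=1000, seed=0):
    """75th percentile of absolute percent error over randomly placed windows.

    sensor, reference: equal-length lists of paired hourly concentrations.
    window_hours: averaging time in hours (assumed contiguous here).
    """
    rng = random.Random(seed)
    last_start = len(sensor) - window_hours
    errors = []
    for _ in range(n_draws):
        start = rng.randint(0, last_start)  # inclusive on both ends
        s_mean = statistics.fmean(sensor[start:start + window_hours])
        r_mean = statistics.fmean(reference[start:start + window_hours])
        errors.append(abs(s_mean - r_mean) / r_mean * 100.0)
    # quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile
    return statistics.quantiles(errors, n=4)[2]
```

With a synthetic sensor that alternates between reading 40% high and 40% low around a constant reference, the 1-h uncertainty is 40% while the 2-h uncertainty is zero, which mirrors the way unbiased noise averages out at longer time scales.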
At 1-h resolution, the uncertainty of the RAMP sensors is large, between 40 and 50%, which corresponds to 4–5 μg/m3 for the average PM2.5 concentration in Pittsburgh. However, the uncertainty decreases with averaging time. For about half the colocated RAMPs, averaging one day of data reduced the uncertainty to 20% or less. Averaging 10 days of data reduces the uncertainty for 75% of the colocated RAMPs to 11% or less, and reduces the MAPE (mean absolute percent error) for 90% of the colocated RAMPs to 20% or less. There is little additional reduction in uncertainty after about 10 days of averaging. As we will discuss, this level of performance is sufficient to quantify spatial patterns in Pittsburgh. The data in Fig. 2a demonstrate that although the lower-cost sensors can be noisy at short time scales and are sensitive to the effects of humidity on particle growth, averaging over time reduces these effects, which do not appear to create long-term biases. This performance is similar to that reported by Malings et al. [7].
Instrument uncertainty can be further reduced by averaging data from multiple RAMPs operating simultaneously in similar urban environments. This reduction was estimated by bootstrapping data collected using 11 RAMPs that were simultaneously colocated at one of the ACHD sites for a 17-day period. The results from the analysis are shown in Fig. S3. When data from multiple RAMPs are averaged together, the uncertainty scales with \(1/\sqrt{n}\), where n is the number of sensors.
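The \(1/\sqrt{n}\) behavior can be checked with a small simulation. The sketch below does not reproduce the bootstrap of colocated RAMP data; it assumes, for illustration only, that each sensor has independent, zero-mean Gaussian error with the same standard deviation, and the function name `network_mean_spread` is hypothetical.

```python
import random
import statistics

def network_mean_spread(n_sensors, sigma=1.0, n_trials=5000, seed=0):
    """Standard deviation of the network-average error for n_sensors sensors.

    Assumes independent Gaussian sensor errors with standard deviation sigma.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_sensors))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)

# Averaging four sensors should roughly halve the spread of a single sensor.
ratio = network_mean_spread(1) / network_mean_spread(4)
```

If sensor errors were correlated (for example, through a shared humidity response), the reduction would be smaller than this idealized case suggests.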
Sampling uncertainty
A second source of uncertainty is the sampling uncertainty, a result of RAMPs not operating continuously at each site over the entire study period. Therefore, the average concentration from each RAMP might not be representative of the long-term average at that site, because of daily and/or seasonal differences in PM2.5.
To correct for noncontinuous sampling, we used data from BAM monitors operated by ACHD at sites not dominated by local sources as a reference to temporally adjust for any changes in background pollution levels using the approach of De Nazelle et al. [8]. The hourly RAMP measurements were multiplied by the ratio of the long-term average of the ACHD sites, Cann, to the average hourly ACHD BAM concentration, Ct: \(C_{corrected} = C_{RAMP} \times \frac{{C_{ann}}}{{C_t}}\), where \(C_{RAMP}\) denotes the hourly RAMP measurement and \(C_{corrected}\) its reference-corrected value.
For example, if the hourly average of concentrations at the ACHD sites (Ct) is larger than the ACHD long-term average background concentration (Cann), the reference site correction ratio of \(\frac{{C_{ann}}}{{C_t}}\) would be less than one. This correction reduces the influence of episodic, daily, and seasonal differences on estimated long-term average concentrations when not all sites were sampling simultaneously.
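The reference-site correction can be written in a few lines of Python. This is a minimal sketch under the definitions given above (hourly RAMP values multiplied by Cann/Ct); the function and argument names are hypothetical and the numbers are invented for illustration.

```python
def reference_site_correction(ramp_hourly, ref_hourly, ref_long_term_mean):
    """Scale each hourly RAMP value by C_ann / C_t.

    ramp_hourly: hourly RAMP concentrations.
    ref_hourly: reference-site concentrations for the same hours (C_t).
    ref_long_term_mean: long-term average at the reference sites (C_ann).
    """
    return [
        c * ref_long_term_mean / c_t
        for c, c_t in zip(ramp_hourly, ref_hourly)
    ]

# An hour with elevated regional levels (C_t = 12) is scaled down and a clean
# hour (C_t = 8) is scaled up when the long-term reference mean is 10.
corrected = reference_site_correction([12.0, 8.0], [12.0, 8.0], 10.0)
```

In this toy case both hours map to the same corrected value, because the RAMP simply tracked the regional signal.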
To quantify the reduction in uncertainty using the reference-site correction, we compared averages with and without the correction applied to measurements from five ACHD monitoring sites. At each averaging period length, we compared the average concentration of the entire dataset at a given location to that of subsamples from that site. Because random sampling is more likely to approximate the true average than the average of continuous time periods, we used a “sliding window” technique for a more conservative estimate of the sampling uncertainty. For example, for a one-day average we started with the first 24 h of measurements. Then, we would slide the 24-h “window” forward 1 h, such that the window started at the second hour. Each hourly ACHD measurement in the window was multiplied by \(\frac{{C_{ann}}}{{C_t}}\), then the values in the window were averaged and compared with the long-term average. This procedure was repeated across the entire colocation period for each length of averaging period to create several thousand comparisons for each site.
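The sliding-window comparison can be sketched as follows. This is an illustrative reading of the procedure, not the analysis code: `sliding_window_errors` is a hypothetical name, and the input is assumed to be a gap-free hourly series to which any reference-site correction has already been applied.

```python
from statistics import fmean

def sliding_window_errors(hourly, window_hours):
    """Absolute percent error of each sliding-window mean vs. the full-record mean.

    The window advances one hour at a time across the whole record.
    """
    long_term = fmean(hourly)
    errors = []
    for start in range(len(hourly) - window_hours + 1):
        window_mean = fmean(hourly[start:start + window_hours])
        errors.append(abs(window_mean - long_term) / long_term * 100.0)
    return errors

# Toy record alternating between 8 and 12 (long-term mean of 10).
errors_1h = sliding_window_errors([8.0, 12.0, 8.0, 12.0], 1)
errors_2h = sliding_window_errors([8.0, 12.0, 8.0, 12.0], 2)
```

The 75th percentile of the returned list for each window length would then give a sampling uncertainty of the kind discussed here.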
Figure 2b shows the 75th percentile of the absolute percent error between the subsample-estimated average for different averaging periods and the true long-term average for that site, for both uncorrected data and data using the reference site correction. As expected, in both cases the sampling uncertainty decreases with increasing averaging period length and the reference site correction reduces the sampling uncertainty. For example, for sampling periods between 40 and 100 days, the average sampling uncertainty of the uncorrected data was ~2 μg/m3 (or roughly 20% of long-term average PM2.5 concentrations in Pittsburgh). The reference site correction reduces this sampling uncertainty (Fig. 2b) by about a factor of 10 to 0.3 and 0.2 μg/m3 for 40- and 100-day averages, respectively.
Total uncertainty
To estimate longer-term average concentrations, we applied the reference-site correction and averaged all data collected at a given site during the period of interest. We applied a minimum cutoff of 240 h (10 days-equivalent) of data, not necessarily continuous, when calculating averages. Fifty-one sites had more than 4800 h (200 days) of data. The instrument and sampling uncertainty were combined in quadrature to estimate the total uncertainty: \(U_{total}=\sqrt{U_{instrument}^{2}+U_{sampling}^{2}}\).
The total uncertainty for the long-term average values for each of the 64 RAMP sites ranged from 0.75 to 1.13 μg/m3, with an average of 0.92 μg/m3. The uncertainty of differences in concentrations between two RAMPs was also calculated as the uncertainty of the two values combined in quadrature: \(U_{i - j} = \sqrt {U_i^2 + U_j^2}\). In this paper, values given after the “±” symbol indicate the uncertainty of the reported value, not the variability or standard deviation of the data.
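As a worked example of combining uncertainties in quadrature, the Python sketch below applies the formula to the network-average values reported in the Results for the 46 urban sites (11.2 ± 0.3 μg/m3 when downwind of the Clairton Plant versus 9.6 ± 0.2 μg/m3 otherwise). The helper names are hypothetical.

```python
import math

def combine_in_quadrature(*uncertainties):
    """Square root of the sum of squared uncertainties."""
    return math.sqrt(sum(u * u for u in uncertainties))

def difference_with_uncertainty(a, u_a, b, u_b):
    """Difference a - b and its uncertainty, assuming independent errors."""
    return a - b, combine_in_quadrature(u_a, u_b)

diff, u_diff = difference_with_uncertainty(11.2, 0.3, 9.6, 0.2)
print(f"{diff:.1f} ± {u_diff:.2f} µg/m3")  # prints: 1.6 ± 0.36 µg/m3
```

The result matches the reported episodic enhancement of 1.6 ± 0.36 μg/m3.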
Statistical tests
We used two different nonparametric statistical tests to compare differences in distributions. The Wilcoxon signed-rank test was used for paired samples. The Mann–Whitney U test was used to compare unpaired distributions with unequal sample sizes. To confirm the suitability of this test to compare populations with substantially different sample sizes, we drew 5000 random samples from the larger set, each the same size as the smaller set (n1 = 859 & n2 = 4142; n1 = 929 & n2 = 5456). In cases where we report a statistically significant result, the difference was significant at the 95% confidence level for every random sample.
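The subsampling check on the Mann–Whitney U test can be sketched in Python. To keep the example self-contained, it uses a simple normal approximation to the U statistic without tie or continuity corrections, which is an assumption of this sketch and not necessarily the implementation used here; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred. All function names are hypothetical.

```python
import random
from statistics import NormalDist

def mann_whitney_p(x, y):
    """Two-sided p value for the Mann-Whitney U test (normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (u1 - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def all_subsamples_significant(small, large, n_draws=5000, alpha=0.05, seed=0):
    """True if every equal-size random subsample of `large` differs from `small`."""
    rng = random.Random(seed)
    return all(
        mann_whitney_p(small, rng.sample(large, len(small))) < alpha
        for _ in range(n_draws)
    )
```

Requiring every subsample to be significant is a strict criterion: it guards against a result that is driven only by the larger sample size of one group.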
Background concentrations
In order to estimate the contribution of local sources to concentration differences within the city, we need to subtract the regional background concentrations [26] instead of applying a reference site correction. To do this, we used data from nine RAMP sites in rural/suburban locations (the red points in Fig. 1). The regional background at a given hour was defined as the median PM2.5 concentration measured at all the background sites operating during that hour. Periods when background sites were influenced by emissions from the city or industrial sources were largely eliminated by using the median value of the background site concentrations. To confirm this, we compared the results when excluding periods when the wind direction was within 20° of the bearing between a background site and the Clairton Plant or downtown Pittsburgh. There was no difference in the calculated regional background value between these two cases. Therefore, we included all data in our calculation of the background concentration.
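The hourly median background can be illustrated with a short Python sketch. The data layout (a dictionary of per-site hourly values with missing hours simply absent) and the function name are assumptions made for this example, and the concentrations are invented.

```python
from statistics import median

def regional_background(site_series):
    """Hourly regional background as the median across operating background sites.

    site_series: dict mapping site name -> dict mapping hour index -> concentration.
    Hours with no operating site are omitted from the result.
    """
    hours = sorted({h for series in site_series.values() for h in series})
    return {
        h: median(series[h] for series in site_series.values() if h in series)
        for h in hours
    }

background = regional_background({
    "A": {0: 8.0, 1: 9.0, 2: 8.5},
    "B": {0: 7.0, 1: 30.0},          # hour 1: plume hits this site only
    "C": {0: 9.0, 1: 8.0, 2: 9.5},
})
```

At hour 1, the single plume-affected reading of 30 does not move the median (9.0), which illustrates why the median largely removes episodic local influence from the background estimate.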
LUR models & land-use variables
To investigate the contribution of different modifiable and unmodifiable factors to the spatial pattern in measured PM2.5 concentrations, we built LUR models using measured PM2.5 concentrations and land-use data from GIS databases [33]. These variables included land use type, traffic, population density, restaurant density, and proximity to industrial point sources (see complete list in Table S2). Traffic-related variables were extracted for circular buffer sizes from 25 to 1000 m, point source-related variables were calculated at buffer sizes between 1000 and 30,000 m, and all other land use variables at buffer sizes from 50 to 5000 m. In total, we included 30 land use categories (totaling 110 variables at different buffer sizes).
We followed the ESCAPE protocol to build LUR models [34]. The long-term average PM2.5 concentrations were used as the dependent variables in stepwise multiple linear regression, and the land-use variables were used as the predictors. Predictor variables were only added to the model if they increased the R2 by ≥0.01, if the correlation p value was statistically significant (p < 0.05), if the sign of the coefficient agreed with the expected influence on concentrations, and if their addition to the model did not change the sign of previously added variables. Once a variable at a specific buffer size was selected, its value was then subtracted from those of larger buffer sizes, while smaller buffer sizes were removed. For example, if traffic density at a 300 m buffer was selected, the traffic densities at 25, 50, and 100 m would be excluded, and traffic density for the 300 m buffer size would be subtracted from those at 500 and 1000 m.
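The buffer-handling rule in the traffic density example can be expressed as a small Python function. This sketch covers only the buffer bookkeeping for one variable at one site, not the stepwise regression itself; the function name and the numeric values are hypothetical.

```python
def apply_buffer_selection(values_by_buffer, selected_buffer):
    """Update one land-use variable after a buffer size has been selected.

    values_by_buffer: dict mapping buffer radius (m) -> variable value at one site.
    Smaller buffers are dropped; the selected value is subtracted from larger
    buffers so that they describe only the surrounding ring.
    """
    selected_value = values_by_buffer[selected_buffer]
    updated = {}
    for buffer_m, value in values_by_buffer.items():
        if buffer_m < selected_buffer:
            continue  # smaller buffers are removed from the candidate list
        if buffer_m == selected_buffer:
            updated[buffer_m] = value
        else:
            updated[buffer_m] = value - selected_value
    return updated

traffic = {25: 1.0, 50: 2.0, 100: 4.0, 300: 9.0, 500: 15.0, 1000: 40.0}
remaining = apply_buffer_selection(traffic, 300)
# remaining == {300: 9.0, 500: 6.0, 1000: 31.0}
```

Subtracting the selected buffer keeps larger-buffer candidates from re-counting the same nearby traffic if they are later considered by the stepwise procedure.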
The models were assessed by performing leave-one-out cross validation (LOO-CV). LOO-CV reduces the variance and bias compared with k-fold cross validation for smaller datasets [35]. We report two metrics of the LUR models: Mean-square-error-R2 (MSE-R2, which describes how well the relationship between measurements and predictions follows the 1:1 line), and MAPE for both the overall and LOO-CV evaluation.
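The evaluation metrics can be sketched in Python for a single-predictor model. This is an illustration only: the LUR models here use multiple predictors, and the sketch assumes the common definition MSE-R2 = 1 − MSE/variance of the observations. All function names are hypothetical.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx

def loo_predictions(x, y):
    """Predict each site from a model fitted to all other sites."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds

def mse_r2(obs, pred):
    """1 - (sum of squared errors) / (total sum of squares about the mean)."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mape(obs, pred):
    """Mean absolute percent error."""
    return 100.0 * fmean(abs(o - p) / o for o, p in zip(obs, pred))
```

Unlike the squared correlation coefficient, MSE-R2 penalizes predictions that are well correlated with the measurements but offset from the 1:1 line, and it can be negative for a poor model.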
Results and discussion
Long-term average PM2.5 concentrations
Figure 3 shows the long-term average PM2.5 concentrations measured at the 64 RAMP sites, which ranged from 7.7 to 11.4 μg/m3. Data from urban sites are plotted in Fig. 3a; Fig. 3b shows sites near the Clairton Plant. The horizontal red lines in Fig. 3 indicate the regional background PM2.5 concentration, 8.2 ± 0.4 μg/m3, which is somewhat higher than the value at a very rural site operated by the PA Department of Environmental Protection located in a large state park 35 km west (upwind in the prevailing wind direction, see Fig. S4) of Pittsburgh (7.7 μg/m3, Florence AQS#42-125-5001). This difference suggests that there is some contribution from local sources to the PM2.5 concentrations at sites in the RAMP network used to define the regional background, highlighting the challenges with defining background [27].
Similar to previous studies [36, 37], we find that PM2.5 levels in the Pittsburgh region are dominated by regional transport rather than local sources. The regional background contributes between 72 and 100% of the concentrations at the 46 urban RAMP sites.
Figure 3b illustrates the dramatic effect that a large industrial source can have on local PM2.5 concentrations. The Clairton Plant is the largest point source in Allegheny County, emitting 33% of the primary PM2.5 listed in the EPA’s 2017 National Emissions Inventory for the county [38]. At sites immediately upwind (west) of the Clairton Plant, long-term average PM2.5 concentrations are essentially the same as background levels (Fig. 3b). However, at sites immediately downwind (east) of the Clairton Plant the long-term average PM2.5 concentrations are 2-3 µg/m3 above background levels. These are some of the highest PM2.5 levels measured in the Pittsburgh region.
We also used the RAMP data to quantify the contribution of local sources, including the urban increment [26] and neighborhood-level sources. Figure 4a shows boxplots of the long-term average concentrations at sites categorized based on land use: background, urban, or industrial. The average concentration at all urban sites was 9.6 ± 0.13 µg/m3. This is the sum of the regional background, the urban increment, and neighborhood-level sources [26]. To isolate the urban increment, which is the PM2.5 contribution from all of the sources spread across the city, we first removed the influence of Clairton Plant from the data measured at 31 urban sites without high traffic or restaurant density. This was done by excluding periods from our long-term averages when wind directions were within 20° of the bearing between a given site and the Clairton Plant. At these 31 urban sites, the long-term (Clairton-excluded) average PM2.5 concentration was 9.1 ± 0.16 µg/m3 (Fig. 4c). We define the difference between this value and the regional background as the urban increment. This difference is 0.89 ± 0.43 µg/m3, which is 11% of the regional background.
We also used the RAMP data to estimate the neighborhood or hyperlocal sources at sites with high restaurant and traffic density (defined as being in the top 15% of land-use values in the city for either category). These sites had an average (Clairton-excluded) PM2.5 concentration of 9.6 ± 0.24 µg/m3. This means that, at these high activity sites, hyperlocal sources contributed, on average, an additional 0.5 ± 0.47 µg/m3 (Fig. 4c) in PM2.5 mass above a typical urban site. The urban increment also includes the contribution from these sources (which, in addition to smaller sample sizes, could explain why this difference was not statistically significant, \(p = 0.12\), Mann–Whitney U test). This analysis illustrates how a network of lower-cost sensors allows us to separate different sources of PM2.5, and reduces the uncertainty in the average person’s exposure to outdoor concentrations in different urban areas.
We used both the density and temporal resolution of the RAMP network to investigate whether the spatial pattern in PM2.5 concentration varied by time of day. We find the spatial pattern of long-term average PM2.5 concentrations persists across the network when the hourly RAMP data are sorted by time of day; the Spearman rank-order correlation between the average concentration during a given period of the day and the long-term average for the same site was high (Spearman rho of 0.82–0.94, Fig. S5). This indicates a consistent spatial pattern of persistent enhancements: sites with high long-term average concentrations are high throughout the day. This kind of insight can inform exposure studies that take into account people’s movements during a typical day.
Sources that have previously been identified as important contributors to the urban increment and neighborhood enhancement in Pittsburgh include vehicle emissions, restaurants [37], and industrial emissions, such as from Clairton Plant [39]. In subsequent sections, we use the RAMP network to systematically explore the contribution of each of these source categories to PM2.5 concentrations across the Pittsburgh region.
Quantifying the contribution of a major industrial source
Clairton Plant has been well-documented as an important source of PM2.5 in the Pittsburgh area [39,40,41]. In this section, we illustrate an approach to quantify the contribution of similar major industrial sources across a broad urban environment while addressing the uncertainty of measurements from lower-cost sensors. We further show that the RAMP network can be used in ways that integrated (typically 24-h or biweekly) filter measurements and centralized real-time measurement sites cannot.
We leveraged both the density and the time-resolved nature of the RAMP network to quantify Clairton Plant’s influence across the entire city. We first sorted the hourly PM2.5 data collected at each site into two categories: measurements made when the site was downwind from the Clairton Plant (defined as when the wind direction was within 20° of the bearing between the RAMP location and the Clairton Plant) and data from all other periods. Figure 5a shows an example of this analysis for one site ~17 km north of the Clairton Plant (star on Fig. 5b) that is representative of residential areas impacted by the Clairton Plant plume. At this site, the long-term average PM2.5 concentration was 7.4 ± 0.8 µg/m3 for periods when winds from the Clairton Plant were excluded, essentially the same as the regional background. However, when the site was downwind from the Clairton Plant the average concentration at this site was 9.6 ± 1.1 µg/m3. The difference, 2.2 ± 1.4 µg/m3, is statistically significant (\(p = 5.5 \times 10^{-28}\), Mann–Whitney U test) and indicates that the Clairton Plant can have a substantial impact on local PM2.5 concentrations during periods when sites are downwind. We call this average increase in concentration when a site was downwind of the Clairton Plant the episodic Clairton enhancement.
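The downwind classification can be sketched in Python. The sketch assumes that wind direction is reported in the usual meteorological convention (the direction the wind blows from), so a site is downwind when the wind direction is close to the bearing from the site toward the source. The function names and the coordinates in the usage example are hypothetical.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2 (degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0

def angular_difference(a, b):
    """Smallest absolute difference between two compass directions (0-180 degrees)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_downwind(wind_from_deg, site_to_source_bearing, half_width=20.0):
    """True when the wind blows from within half_width degrees of the source."""
    return angular_difference(wind_from_deg, site_to_source_bearing) <= half_width

# Hypothetical site due north of a hypothetical source: bearing to source is 180.
to_source = bearing_deg(40.45, -79.90, 40.30, -79.90)
flags = [is_downwind(w, to_source) for w in (170.0, 195.0, 270.0)]
```

Taking the difference modulo 360° keeps the ±20° window correct for bearings near north, where directions such as 350° and 10° are only 20° apart.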
We repeated the analysis shown in Fig. 5a for every RAMP site in the city. Figure 4b shows boxplots of average concentrations for the two different wind conditions. The uncertainty of the episodic Clairton enhancement measured at individual sites was often similar in magnitude to the measurement uncertainty, one of the limitations of lower-cost sensors. However, the dense network allowed us to reduce this uncertainty by averaging data from all 46 urban RAMP sites. On average, when a site was downwind from the Clairton Plant the PM2.5 concentration was 11.2 ± 0.3 μg/m3 versus 9.6 ± 0.2 μg/m3 when the wind was not from Clairton. Therefore, the average episodic Clairton enhancement at the 46 urban RAMP sites was 1.6 ± 0.36 μg/m3. Dense networks also mean that there is a sufficient sample size to perform statistical tests, which confirm that the impact of Clairton Plant is statistically significant (\(p = 9.3 \times 10^{-6}\), Wilcoxon signed rank test). This illustrates how robust conclusions on the influence of a particular source can be drawn by averaging data across a dense network of lower-cost sensors.
Using data collected in 2001–2002, Chu et al. [39] performed a similar analysis for a single site located in Schenley Park adjacent to the Carnegie Mellon University campus during the Pittsburgh Air Quality Study. This site is located 17 km NNW of the Clairton Plant. They also reported that the Clairton Plant had a large impact on short-term PM2.5 concentrations. For example, while Chu et al. [39] did not explicitly calculate the contribution from Clairton Plant, their figures suggest that the average PM2.5 concentration was frequently greater than 20 µg/m3 when the Schenley Park site was downwind of the Clairton Plant, compared with a long-term average concentration of 15.5 µg/m3. Data from the RAMP network, collected more than 15 years later, indicate much lower concentrations and less impact of Clairton across a broad range of sites. For example, at the RAMP site on the Carnegie Mellon campus, the average concentration excluding periods when it is downwind of Clairton Plant is 9.1 ± 0.9 µg/m3, compared with 10.5 ± 1.0 µg/m3 when the site is downwind of Clairton Plant. The episodic Clairton enhancement at this site of 1.4 ± 1.4 µg/m3 is statistically significant (\(p = 4.1 \times 10^{-11}\), Mann–Whitney U test). The reduction of the episodic Clairton enhancement relative to the analysis of Chu et al. [39] highlights the effectiveness of regulations. The dense network of sites in the RAMP network allowed our analysis to be done across a much broader spatial domain than the Chu et al. [39] analysis.
While the analysis shown in Fig. 5a indicates that the Clairton Plant can make a large contribution to short-term concentrations, data from the RAMP network can also be used to estimate the contribution of Clairton to the long-term average concentration (across all time periods). We did this by calculating the difference between the long-term average including all wind directions and the average after excluding periods when each site is downwind of the Clairton Plant. On average, the Clairton Plant contributes 0.3 ± 0.2 μg/m3 to the long-term average PM2.5 concentrations at the 46 urban sites in the RAMP network, though the contribution varies widely by site (Fig. 5b). The enhancement depends, in part, on the fraction of time a site is downwind of the Clairton Plant. The magnitude of the long-term Clairton increment at individual sites was often smaller than the measurement uncertainty. However, the density of the RAMP network provides a sufficient sample size to conduct statistical tests, which confirm that this average difference is significant (\(p = 4.0 \times 10^{-9}\), Wilcoxon signed rank test).
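The long-term increment calculation described above can be sketched for a single site as follows. This is an illustration under stated assumptions, not the study's code: the function name, the boolean downwind flags, and the synthetic hourly values are all hypothetical.

```python
from statistics import fmean

def long_term_increment(hourly_pm25, downwind_flags):
    """Mean over all hours minus the mean over hours when the site is not downwind."""
    not_downwind = [c for c, d in zip(hourly_pm25, downwind_flags) if not d]
    if not not_downwind:
        raise ValueError("no non-downwind hours available")
    return fmean(hourly_pm25) - fmean(not_downwind)

# Synthetic example: 2 of 10 hours are downwind, with elevated PM2.5 (ug/m3).
pm25 = [9.0] * 8 + [12.0] * 2
flags = [False] * 8 + [True] * 2
increment = long_term_increment(pm25, flags)
print(round(increment, 2))
```

Algebraically, this difference equals the fraction of hours spent downwind multiplied by the episodic enhancement (here 0.2 × 3.0 μg/m3), which is why the long-term increment depends on how often a site is downwind.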
Unlike the analysis of Chu et al. [39], which was performed at one central location, the density of the RAMP network provided insight into spatial patterns and the factors that cause the intra-urban variation of the influence of Clairton Plant. Certain sites appear to be more influenced by the emissions from the Clairton Plant, in part due to local topography. For example, Fig. 5b shows a map of the long-term Clairton increment across all urban sites superimposed on the elevation above sea level. The long-term Clairton increment was greatest at the eastern Pittsburgh city limits, along a corridor north of the Clairton Plant. When the wind conditions are right, the topography appears to create a channel, which directs the plume from the Clairton Plant to travel downriver and up this corridor. Identifying this kind of spatial pattern requires a dense network of sites and would not be possible using only the measurements from the sparse network of ACHD regulatory monitors.
Quantifying the contribution of traffic and restaurants
We also used the RAMP data to investigate sources responsible for the hyperlocal enhancement over the urban increment. We leverage the temporal resolution of RAMP data to remove measurements made when the site was downwind from the Clairton Plant (defined as when it was within 20° of the bearing between the RAMP location and the Clairton Plant) to investigate sources, such as traffic and restaurants, that are found in major cities around the world. We then fit an LUR model (Table S3) to the Clairton-free data using the predictor variables listed in Table S2.
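The downwind filter described above can be sketched as a bearing comparison. This is a minimal illustration that assumes the reported wind direction follows the meteorological convention (the direction the wind blows from) and that "within 20°" means an absolute angular difference of at most 20°; the function names and the example coordinates are hypothetical.

```python
import math

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Compass bearing (degrees clockwise from north) from point 1 toward point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0

def angular_difference_deg(a, b):
    """Smallest absolute difference between two compass angles, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_downwind(wind_from_deg, site, source, tolerance_deg=20.0):
    """True if the wind blows from the source toward the site, within the tolerance."""
    bearing_to_source = initial_bearing_deg(site[0], site[1], source[0], source[1])
    return angular_difference_deg(wind_from_deg, bearing_to_source) <= tolerance_deg

# Hypothetical coordinates: a site due north of a source.
site = (40.5, -79.9)
source = (40.3, -79.9)
print(is_downwind(170.0, site, source))  # wind from roughly south
print(is_downwind(90.0, site, source))   # wind from the east
```

Hours flagged by such a test would be dropped before fitting the LUR model to the Clairton-free data.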
The best fit model includes three variables: the density of PM-emitting point sources (1000 m), the density of PM-emitting point sources weighted by emissions (5000 m), and the housing density (300 m), listed in order of the highest correlation (see Table S4). The buffer sizes of point source predictor variables within the city are outside of the range of Clairton Plant; these variables include institutions like universities and hospitals as well as smaller industrial facilities.
The model intercept was 8.1 μg/m3, similar to the estimated regional background concentration of 8.2 μg/m3. The LUR-predicted city average PM2.5 concentration was 8.8 μg/m3. Fig. S6a shows the concentration surface predicted by the LUR model over the city of Pittsburgh; Figs. S6b-d show the distributions of the predictors over the city. This model describes about half of the measured variability in the local PM2.5 (MSE-R2 of 0.52 and a MAPE of 5.1%). This is comparable to the performance of similar intra-urban LUR models in the literature; Hoek et al. [22] and Liu et al. [42] review PM2.5 LUR studies from the UK, Europe, North America, and China. The number of predictor variables in these studies ranges from 2 to 6, and R2 values range from 0.17 to 0.95.
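For readers unfamiliar with the two performance metrics, the sketch below shows one common way to compute them. It assumes MSE-R2 is defined as one minus the mean squared error divided by the variance of the observations, and MAPE as the mean absolute percentage error; these definitions, the function names, and the synthetic values are assumptions for illustration, not details confirmed by the text.

```python
from statistics import fmean

def mse_r2(observed, predicted):
    """1 - MSE / (population variance of the observations)."""
    mean_obs = fmean(observed)
    mse = fmean((o - p) ** 2 for o, p in zip(observed, predicted))
    var = fmean((o - mean_obs) ** 2 for o in observed)
    return 1.0 - mse / var

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * fmean(abs(o - p) / o for o, p in zip(observed, predicted))

# Synthetic site-average concentrations (ug/m3), not data from the study.
observed = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 8.5, 10.5, 10.5]
r2 = mse_r2(observed, predicted)
err = mape(observed, predicted)
print(round(r2, 2), round(err, 1))
```

In a cross-validated setting, the predicted values would come from models fit without the site being predicted, so the same functions apply to held-out predictions.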
We also fit LUR models to different subsets of the data: seasonal averages and averages over specific times of day (Table S4). The MSE-R2 ranged from 0.29 to 0.65 and the MAPE ranged from 4.2 to 8.6%. Most predictor variables were associated with PM2.5-emitting point sources, though commercial land use, population density, and housing density were also found to be important. The models that performed the best were fit to daytime average data (8AM-8PM). During these periods, restaurant density was the most highly correlated predictor variable. This points to the importance of emissions from restaurants driving local patterns in PM2.5 concentrations in Pittsburgh.
Traffic was not identified as a statistically significant parameter in our LUR model fitting. This was true even for models fit to data subsampled from rush hour periods, despite the immediate proximity of several RAMP sites to major highways, busy urban roads, and bus corridors. While previous studies have identified traffic as an important contributor to spatial patterns of PM2.5 composition in Pittsburgh [37, 43, 44, 45], others have shown that overall urban PM2.5 mass concentrations do not decrease with distance from roads [46].
Daily concentrations compared with EPA and WHO standards
We also used the RAMP network to investigate the frequency of high-concentration events. For daily averages, only days with at least 18 h of data out of 24 were included, following the practice of the US EPA. Figure 6 shows the number of days per year that the daily average concentration measured at the 64 RAMP sites exceeded two different short-term standards: the World Health Organization (WHO) 24-h standard of 25 µg/m3 (Fig. 6a) and the U.S. EPA 24-h National Ambient Air Quality Standard (NAAQS) for PM2.5 of 35 µg/m3 (Fig. 6b). Daily average concentrations are more uncertain than the long-term average concentrations (as discussed previously); the one-day uncertainty is 1.75 µg/m3 for the typical concentration in Pittsburgh, which is 7 and 5% of the WHO and EPA standards, respectively. To compare data on a consistent basis, we calculated the fraction of days when the concentrations exceeded these standards during each sampling period, then applied it to estimate the number of days per year that exceeded a given threshold.
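The completeness rule and the scaling to days per year can be sketched as below. This is an illustrative outline that assumes an exceedance means a daily average strictly above the threshold and that the observed fraction is scaled to a 365-day year; the function names and the synthetic data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def daily_averages(hourly, min_hours=18):
    """hourly: iterable of (day_key, pm25). Keep only days with >= min_hours values."""
    by_day = defaultdict(list)
    for day, value in hourly:
        by_day[day].append(value)
    return {d: fmean(v) for d, v in by_day.items() if len(v) >= min_hours}

def exceedance_days_per_year(daily, threshold):
    """Fraction of valid days above the threshold, scaled to a 365-day year."""
    if not daily:
        return 0.0
    n_exceed = sum(1 for v in daily.values() if v > threshold)
    return 365.0 * n_exceed / len(daily)

# Synthetic hourly records (ug/m3): day 3 has only 12 h of data and is dropped.
hourly = ([("d1", 30.0)] * 24 + [("d2", 10.0)] * 24
          + [("d3", 40.0)] * 12 + [("d4", 36.0)] * 18)
daily = daily_averages(hourly)
who_days = exceedance_days_per_year(daily, 25.0)
epa_days = exceedance_days_per_year(daily, 35.0)
print(sorted(daily), round(who_days, 1), round(epa_days, 1))
```

Scaling by the fraction of valid days, rather than counting raw exceedances, puts sites with different deployment lengths on a common footing.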
Fifty-eight sites in the RAMP network recorded daily-average concentrations that exceeded the WHO threshold of 25 µg/m3. Of these 58 sites, the number of exceedances varied from one day per year to 27 days per year (mean: 5.8 days per year). Only 22 sites had average daily concentrations that exceeded the daily average NAAQS of 35 µg/m3. The number of exceedances of the EPA standard varied between 1 and 7 days per year (mean: 1.9 days per year). During 2001–2002, Chu et al. [39] found that 24 of 322 days (7.5%) had 24-h average concentrations exceeding 35 µg/m3 at the Carnegie Mellon site. During this study, we did not record any days with concentrations exceeding 35 µg/m3 at the Carnegie Mellon site, indicating the effectiveness of emission controls implemented over the last two decades. However, the RAMP network identified other areas across the city that still require attention.
The sites that exceeded the EPA’s daily standard were primarily near Clairton Plant and in the city’s east end corridor discussed earlier (Fig. 5b); several other sites were along the rivers and therefore may have been influenced by local inversions. While the regulatory monitors (triangles in Fig. 6) are located in some of the locations that experience exceedances of the daily standards, there were no measurements at many of the RAMP sites with daily averages over the thresholds before we deployed the network. This is an example of how lower-cost sensor networks could be used to complement existing regulatory networks by filling in spatial gaps, and how they can inform the placement of future regulatory air quality monitoring stations by identifying previously unmonitored urban locations that may exceed ambient air quality standards.
Recommendations
Lower-cost sensors enable highly time-resolved, long-term measurement of urban air pollution at a never-before-possible spatial resolution and with modest ongoing maintenance. Despite the uncertainty of individual lower-cost sensors, we demonstrate that measurements from networks of these monitors can be used to robustly identify patterns of PM2.5 within urban areas that other measurement methods miss, and to quantify the contributions of modifiable factors to PM2.5 exposures across an urban area. Two key aspects of the network that allow for this kind of analysis are the high temporal resolution and the network density.
High time-resolution (1-h) measurements allow data to be categorized by season, time of day, and wind direction (which can vary throughout a single day). This allows isolation of different modifiable factors, such as a large industrial source. In contrast, these analyses are difficult to perform with traditional integrated filter-based daily or weekly samples, while the high costs of reference-grade instruments that measure hourly PM2.5, like the BAM or TEOM, prevent a dense network of measurements. In this study, we leveraged time-resolved lower-cost sensor measurements (in combination with sensor location) to isolate the influence of different modifiable factors. This has allowed us to identify the influence of Clairton Plant, as well as remove its effects in order to examine the impacts of traffic and restaurants.
A dense spatial network of sites is also a key factor in this analysis. The uncertainty of these measurements is too high to draw conclusions from comparisons of individual monitors. To reduce uncertainty, it is important to have multiple sensors located in a given area or land-use category of interest. By comparing averaged data from groups of sensors, we quantified the contribution of the regional background and the urban increment. The data from these sensor networks can also be used to identify some of the hyperlocal sources contributing to the urban increment when coupled with readily-available meteorological data and land-use characteristics. The density of the RAMP network was also critical for constraining how large industrial sites, topography, and meteorology influence acute (24-h) exposures in Pittsburgh.
For future studies utilizing lower-cost sensors, we recommend the following:
Uncertainty reduction
We demonstrate that the effect of relatively high uncertainty on individual measurements can be managed by employing multiple strategies. The first is careful calibration against regulatory-grade monitors, as covered in detail by Malings et al. [7]. The second is deploying monitors for sufficiently long sampling periods to allow data averaging (we used a minimum of 240 h, equivalent to 10 days, of data for analysis of long-term trends). The third is applying a reference site correction technique to reduce the uncertainty associated with discontinuous and nonsimultaneous sampling throughout the network. Finally, data from multiple sensors at each land-use type or within the same neighborhood can be averaged together to reduce the uncertainty in the average concentration.
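Two of these strategies, the minimum record length and averaging across sites in the same group, can be sketched together. The example below is a simplified illustration: it assumes site-level errors are independent so that the standard error of the group mean shrinks with the square root of the number of sites, and the function name and input values are hypothetical rather than taken from the study.

```python
import math
from statistics import fmean, stdev

def group_average(site_means, site_hours, min_hours=240):
    """Average site-level means within one group, dropping short records.

    Returns (group mean, standard error of the mean, number of sites used).
    """
    kept = [m for m, h in zip(site_means, site_hours) if h >= min_hours]
    if len(kept) < 2:
        raise ValueError("need at least two sites with enough data")
    return fmean(kept), stdev(kept) / math.sqrt(len(kept)), len(kept)

# Synthetic site averages (ug/m3) and hours of valid data for one land-use type.
means = [9.0, 10.0, 11.0, 20.0]
hours = [500, 300, 240, 100]  # the last site has under 240 h and is excluded
mean, sem, n = group_average(means, hours)
print(round(mean, 2), round(sem, 2), n)
```

The calibration and reference site correction steps are not represented here, since they depend on details given elsewhere in the paper and in Malings et al. [7].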
Network design
We systematically deployed sensors across a range of land use types (urban background, urban core, near industrial, near road, etc.) and deployed monitors at multiple sites with similar land-use characteristics. Averaging data across sensors deployed at similar sites allowed us to better constrain the influence of different modifiable factors. However, the limited number of sites with high traffic and high restaurant density included in this study reduced the power of statistical tests we conducted. More sensors should be placed in high traffic/high restaurant areas and near other likely sources of pollution. This will increase confidence in the influence of such sources.
Our data suggest that a network of lower-cost multi-pollutant monitors is preferable to a network of lower-cost PM sensors only. Lower-cost sensors for gases such as NO2, CO, and ozone (part of the RAMPs used in this study) provide complementary evidence of pollution, and these gases are themselves associated with adverse health effects [47]. For example, we did not find evidence of traffic influence using PM2.5 data alone. However, measurements of NO2 and CO can be used to identify traffic contributions [19, 21].
References
Brook RD, Rajagopalan S, Pope CA, Brook JR, Bhatnagar A, Diez-Roux AV, et al. Particulate matter air pollution and cardiovascular disease. Circulation. 2010;121:2331–78.
WHO Europe. Health Aspects of Air Pollution Results from the WHO Project ‘Systematic Review of Health Aspects of Air Pollution in Europe’. 2004 https://apps.who.int/iris/bitstream/handle/10665/107571/E83080.pdf?sequence=1&isAllowed=y.
Pope A, Burnett R, Thun M, Calle E, Krewski D, Ito K, et al. Long-term exposure to fine particulate air pollution. JAMA. 2002;287:1192.
Pope CA, Lefler JS, Ezzati M, Higbee JD, Marshall JD, Kim S-Y, et al. Mortality risk and fine particulate air pollution in a large, representative cohort of U.S. adults. Environ Health Perspect. 2019;127:077007.
Lefler JS, Higbee JD, Burnett RT, Ezzati M, Coleman NC, Mann DD et al. Air pollution and mortality in a large, representative U.S. cohort: multiple-pollutant analyses, and spatial and temporal decompositions. Environ Heal. 2019. 10.1186/s12940-019-0544-9.
Eeftens M, Beelen R, De Hoogh K, Bellander T, Cesaroni G, Cirach M, et al. Development of land use regression models for PM2.5, PM 2.5 absorbance, PM10 and PMcoarse in 20 European study areas; Results of the ESCAPE project. Environ Sci Technol. 2012;46:11195–205.
Malings C, Tanzer R, Hauryliuk A, Saha PK, Robinson AL, Presto AA, et al. Fine particle mass monitoring with low-cost sensors: corrections and long-term performance evaluation. Aerosol Sci Technol. 2019;0:1–15.
De Nazelle A, Seto E, Donaire-Gonzalez D, Mendez M, Matamala J, Nieuwenhuijsen MJ, et al. Improving estimates of air pollution exposure through ubiquitous sensing technologies. Environ Pollut. 2013;176:92–99.
Schneider P, Castell N, Dauge FR, Vogt M, Lahoz WA, Bartonova A. A network of low-cost air quality sensors and its use for mapping urban air quality. In: Earth Systems Data and Models Mobile Information Systems Leveraging Volunteered Geographic Information for Earth Observation. Cham: Springer International Publishing; 2018, p. 93–110.
Popoola OAM, Carruthers D, Lad C, Bright VB, Mead MI, Stettler MEJ, et al. Use of networks of low cost air quality sensors to quantify air quality in urban settings. Atmos Environ. 2018;194:58–70.
Masiol M, Zíková N, Chalupa DC, Rich DQ, Ferro AR, Hopke PK. Hourly land-use regression models based on low-cost PM monitor data. Environ Res. 2018;167:7–14.
Castell N, Dauge FR, Schneider P, Vogt M, Lerner U, Fishbain B et al. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates? Environ Int. 2017. 10.1016/j.envint.2016.12.007.
Schneider P, Castell N, Vogt M, Dauge FR, Lahoz WA, Bartonova A. Mapping urban air quality in near real-time using observations from low-cost sensors and model information. Environ Int. 2017. 10.1016/j.envint.2017.05.005.
Mead MI, Popoola OAM, Stewart GB, Landshoff P, Calleja M, Hayes M, et al. The use of electrochemical sensors for monitoring urban air quality in low-cost, high-density networks. Atmos Environ. 2013;70:186–203.
Jiao W, Hagler G, Williams R, Sharpe R, Brown R, Garver D, et al. Community Air Sensor Network (CAIRSENSE) project: evaluation of low-cost sensor performance in a suburban environment in the southeastern United States. Atmos Meas Tech. 2016;9:5281–92.
Zheng T, Bergin MH, Johnson KK, Tripathi SN, Shirodkar S, Landis MS, et al. Field evaluation of low-cost particulate matter sensors in high-and low-concentration environments. Atmos Meas Tech. 2018;11:4823–46.
Crilley LR, Shaw M, Pound R, Kramer LJ, Price R, Young S, et al. Evaluation of a low-cost optical particle counter (Alphasense OPC-N2) for ambient air monitoring. Atmos Meas Tech. 2018;11:709–20.
Piedrahita R, Xiang Y, Masson N, Ortega J, Collier A, Jiang Y, et al. The next generation of low-cost personal air quality sensors for quantitative exposure monitoring. Atmos Meas Tech. 2014;7:3325–36.
Tanzer R, Malings C, Hauryliuk A, Subramanian R, Presto AA. Demonstration of a low-cost multi-pollutant network to quantify intra-urban spatial variations in air pollutant source impacts and to evaluate environmental justice. Int J Environ Res Public Health. 2019;16:2523.
Subramanian R, Ellis A, Torres-Delgado E, Tanzer R, Malings C, Rivera F, et al. Air quality in puerto rico in the aftermath of hurricane maria: a case study on the use of lower cost air quality monitors. ACS Earth Sp Chem. 2018;2:1179–86.
Zimmerman N, Li HZ, Ellis A, Hauryliuk A, Robinson ES, Gu P, et al. Improving correlations between land use and air pollutant concentrations using wavelet analysis: insights from a low-cost sensor network. Aerosol Air Qual Res. 2019;5:1–15.
Hoek G, Beelen R, de Hoogh K, Vienneau D, Gulliver J, Fischer P, et al. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008;42:7561–78.
Henderson SB, Beckerman B, Jerrett M, Brauer M. Application of land use regression to estimate long-term concentrations of traffic-related nitrogen oxides and fine particulate matter. Environ Sci Technol. 2007. 10.1021/es0606780.
Beelen R, Hoek G, Pebesma E, Vienneau D, de Hoogh K, Briggs DJ. Mapping of background air pollution at a fine spatial scale across the European Union. Sci Total Environ. 2009;407:1852–67.
Clougherty JE, Kheirbek I, Eisl HM, Ross Z, Pezeshki G, Gorczynski JE, et al. Intra-urban spatial variability in wintertime street-level concentrations of multiple combustion-related air pollutants: The New York City Community Air Survey (NYCCAS). J Expo Sci Environ Epidemiol. 2013;23:232–40.
Lenschow P, Abraham HJ, Kutzner K, Lutz M, Preuß JD, Reichenbächer W. Some ideas about the sources of PM10. Atmos Environ. 2001;35:23–33.
Thunis P. On the validity of the incremental approach to estimate the impact of cities on air quality. Atmos Environ. 2018;173:210–22.
American Lung Association. State of the Air 2018. https://www.lung.org/our-initiatives/healthy-air/sota/city-rankings/msas/pittsburgh-new-castle-weirton-pa-oh-wv.html#pmann.
Zimmerman N, Presto AA, Kumar SPN, Gu J, Hauryliuk A, Robinson ES, et al. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos Meas Tech. 2018;11:291–313.
Malings C, Tanzer R, Hauryliuk A, Kumar SPN, Zimmerman N, Kara LB, et al. Development of a general calibration model and long-term performance evaluation of low-cost sensors for air pollutant gas monitoring. Atmos Meas Tech. 2019;12:903–20.
Local Climatological Data (LCD) | Data Tools | Climate Data Online (CDO) | National Climatic Data Center (NCDC). https://www.ncdc.noaa.gov/cdo-web/datatools/lcd Accessed 15 Jan 2020.
US EPA. 3-Year Quality Assurance Report for Calendar Years 2011, 2012, and 2013 PM2.5 Ambient Air Monitoring Program. 2015 https://www3.epa.gov/ttnamti1/files/ambient/pm25/qa/20112013pm25qareport.pdf.
Allegheny County GIS Open Data. http://openac-alcogis.opendata.arcgis.com/.
Brunekreef B. Study manual for the European Study of Cohorts for Air Pollution Effects. The Netherlands: Institute for Risk Assessment Sciences, Utrecht University; 2008. p. 1–66.
Lachenbruch PA, Mickey MR. Estimation of error rates in discriminant analysis. Technometrics. 1968;10:1.
Tang W, Raymond T, Wittig B, Davidson C, Pandis S, Robinson A, et al. Spatial variations of PM2.5 during the Pittsburgh air quality study. Aerosol Sci Technol. 2004;38:80–90.
Gu P, Li HZ, Ye Q, Robinson ES, Apte JS, Robinson AL, et al. Intracity variability of particulate matter exposure is driven by carbonaceous sources and correlated with land-use variables. Environ Sci Technol. 2018;52:11545–54.
US EPA. 2017 National Emissions Inventory (NEI) Data. 2017. https://www.epa.gov/air-emissions-inventories/2017-national-emissions-inventory-nei-data.
Chu N, Kadane JB, Davidson CI. Identifying likely PM 2.5 sources on days of elevated concentration: A simple statistical approach. Environ Sci Technol. 2009;43:2407–11.
Anderson RR, Martello DV, White CM, Crist KC, John K, Modey WK, et al. The regional nature of PM 2.5 episodes in the upper Ohio River Valley. J Air Waste Manag Assoc. 2012;54:971–84.
Allegheny County Health Department. Proposed Revision to the Allegheny County Portion of the Pennsylvania State Implementation Plan. 2019 https://alleghenycounty.us/uploadedFiles/Allegheny_Home/Health_Department/Programs/Air_Quality/SIPs/90-SIP-PM25-SIP-2012-NAAQS-03-20-2019-prelim-draft.pdf.
Liu C, Henderson BH, Wang D, Yang X, Peng ZR. A land use regression application into assessing spatial variation of intra-urban fine particulate matter (PM2.5) and nitrogen dioxide (NO2) concentrations in City of Shanghai, China. Sci Total Environ. 2016;565:607–15.
Li HZ, Dallmann TR, Gu P, Presto AA. Application of mobile sampling to investigate spatial variation in fine particle composition. Atmos Environ. 2016;142:71–82.
Robinson ES, Gu P, Ye Q, Li HZ, Shah RU, Apte JS, et al. Restaurant impacts on outdoor air quality: elevated organic aerosol mass from restaurant cooking with neighborhood-scale plume extents. Environ Sci Technol. 2018;52:9285–94.
Saha PK, Zimmerman N, Malings C, Hauryliuk A, Li Z, Snell L, et al. Quantifying high-resolution spatial variations and local source impacts of urban ultrafine particle concentrations. Sci Total Environ. 2019;655:473–81.
Karner AA, Eisinger DS, Niemeier DA. Near-roadway air quality: synthesizing the findings from real-world data. Environ Sci Technol. 2010;44:5334–44.
Di Q, Dai L, Wang Y, Zanobetti A, Choirat C, Schwartz JD, et al. Association of short-term exposure to air pollution with mortality in older adults. J Am Med Assoc. 2017;318:2446–56.
Acknowledgements
The authors thank Eric Lipsky, Naomi Zimmerman, Aja Ellis, and Rebecca Tanzer for assistance with instrument setup and operation.
Funding
This publication was developed as part of the Center for Air, Climate, and Energy Solutions (CACES). Funding was provided by the United States Environmental Protection Agency (assistance agreement nos. RD83587301 and 83628601) and the Heinz Endowments Fund (grants E2375 and E3145). It has not been formally reviewed by the EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the funders. The funders do not endorse any products or commercial services mentioned in this publication.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rose Eilenberg, S., Subramanian, R., Malings, C. et al. Using a network of lower-cost monitors to identify the influence of modifiable factors driving spatial patterns in fine particulate matter concentrations in an urban environment. J Expo Sci Environ Epidemiol 30, 949–961 (2020). https://doi.org/10.1038/s41370-020-0255-x
DOI: https://doi.org/10.1038/s41370-020-0255-x