Introduction

Over the last decades, in rapidly developing countries, such as China, the disease burden has shifted from a profile dominated by infectious diseases to one increasingly characterised by non-communicable diseases (NCDs) [1]. Air pollution is now the leading environmental risk factor for NCDs, resulting in millions of premature deaths and accelerating rates of chronic disease worldwide [2]. Epidemiological studies have had a significant impact on the setting of national and international air quality standards to protect global populations from the detrimental effects of air pollution. However, most of these studies derive metrics of short-term exposure from static outdoor monitoring networks with low spatial and temporal resolution [3]. Such measurements are generally highly correlated at these coarser scales and cannot separate the individual health effects of pollutants [4]. Failure to capture the high granularity of total personal exposure introduces exposure misclassification that can lead to bias in health estimations [5, 6].

A range of complex interacting factors drive the high ambient air pollution heterogeneity, while individual variability in personal exposure includes a behavioural component [7] as a person moves between different microenvironments with varying emission sources. During daily life, peak exposure events often occur during commuting [8] while the indoor environment is a significant site for exposure in part because people spend substantial fractions (often as much as 90%) of their time indoors [9]. Indoor air is affected by outdoor pollutants penetrating building envelopes with additional indoor sinks, sources and emissions from building materials which cannot be captured by static outdoor monitoring networks [10].

Several studies have identified large discrepancies between personal exposure measurements and outdoor concentrations [7]. These exposure uncertainties may introduce prediction errors and bias with substantial implications for interpreting epidemiological studies on air pollution, particularly time-series analyses [5]. The between-subject variability is large because air pollutant concentrations vary significantly by both location and activity. Therefore, a comprehensive personal exposure assessment requires two components: (1) the pollutant concentrations the person is exposed to; and (2) the recording of a person’s time-activity patterns that may vary with age, gender, occupation and socio-economic status [11]. Physical activity levels affect the potential dose of inhaled air pollution [12].

In light of this challenge, “Effects of AIR pollution on cardiopuLmonary disEaSe in urban and peri-urban reSidents in Beijing” (AIRLESS) [13] nested within the “Air pollution and human health in a Chinese megacity” (APHH) research programme was initiated [14]. The aim of the AIRLESS project was to address the complex issue of multipollutant exposures on cardiopulmonary outcomes. This paper presents the results of the field deployment of 60 portable air pollution sensor platforms for one week during the winter and summer seasons in 251 participants of the AIRLESS panel study residing in urban and peri-urban China. The expectation is that similar effects would be evident in the general population; hence, this paper has wider significance than just these cohorts. The main objective of this work is to demonstrate that novel sensor technologies and computational methods offer a paradigm shift in collecting highly resolved measurements of individualised air pollutants, improving dose estimations in large-scale health studies.

Materials and methods

This section briefly describes the methodology employed for creating a comprehensive database of validated personal concentrations and time-activity location patterns of 251 participants of a health panel study matched with intensive monitoring of outdoor air pollution levels. In the last subsection, methodologies for estimating dose with traditional and highly resolved exposure metrics are outlined. The AIRLESS project will integrate these results of detailed doses and exposures to multiple air pollutants with changes in cardio-pulmonary health outcomes to ensure the biggest scientific and policy impact.

The participant sample

The measurements were collected as part of the AIRLESS project which was designed as a panel study with repeated personal exposure and clinical measurements of 123 urban and 128 peri-urban participants during the winter (14th Nov–21st Dec 2016) and summer seasons (22nd May–21st Jun 2017). Each participant carried a personal air quality monitor (PAM) for 1 week in each season. Thirty PAM devices were deployed at each of the urban and peri-urban clinic sites, which enabled the recruitment of 30 subjects each week at each location. This paper will mainly focus on the analysis of the winter campaign data. The participants residing in the urban site were 59–75 years old, and were primarily retired (88%). In the peri-urban site the age range was between 50 and 73 years old, and their primary occupation was agriculture (53%) followed by retirement (17%) and housekeeping (13%) [13].

Outdoor air pollution measurements

Intensive ambient air pollution monitoring campaigns were launched simultaneously next to the urban and peri-urban clinics, which were in close proximity to most subjects’ residential addresses. Both stations measured background air pollution levels, as they were located away from direct sources. These measurements had the same time resolution as the personal measurements (1 min). A detailed description of the ambient air pollution monitoring campaign is presented in Shi et al. [14].

The personal air quality monitor (PAM)

The PAM is an autonomous unit that incorporates multiple sensors for activity, and for physical and chemical parameters [15]. The compact and lightweight design of the PAM (∼400 g) makes the unit suitable for personal exposure assessment. The time resolution of the measurements was set at 1 min time intervals resulting in a battery life on a single charge of ∼24 h. The PAM collects multiple timestamped geo-coordinated measurements of gaseous pollutants, particulate matter, temperature, relative humidity, background noise levels and accelerometry. Measurements are transmitted to a secure server through GPRS for further processing.

Previous work [15] characterised the performance of the PAM that integrates multiple miniaturised sensors for nitrogen oxides (NOx), carbon monoxide (CO), ozone (O3) and particulate matter (PM) measurements. Overall, the air pollution sensors showed high reproducibility (mean R2 = 0.93, min–max: 0.80–1.00) and excellent agreement with standard instrumentation (mean R2 = 0.82, min–max: 0.54–0.99) in outdoor, indoor and commuting microenvironments across seasons and different geographical settings making it suitable for collecting highly resolved exposure metrics at large scale. There were some indications that the EC sensor performance is less reliable at high temperatures (>40 °C); however, such extreme environmental conditions were not encountered during the actual personal exposure sample periods discussed here. A limitation of all optical PM sensors, including the one used here but also reference instrumentation, is that they cannot measure small particles below a critical size threshold. We have shown that by appropriate local calibration, this shortcoming can be largely accounted for. Following the methodology described in Chatzidiakou et al. [15], the raw measurements of gaseous pollutants and particulate matter were converted to physical units.

The data capture of the personal measurements was 81%, including data losses from participants not complying with the protocol (e.g., no charging of the monitor) and from data cleaning procedures, demonstrating the deployment feasibility and participant acceptability of novel sensor technologies.

The time-activity model

This subsection provides a brief overview of the progressive composite model (Fig. 1) that classifies time-activity-location patterns automatically using: (1) auxiliary parameters collected with the PAM (geo-coordinates, background noise and acceleration levels) as input; and (2) machine learning techniques of spatio-temporal clustering, movement analysis methods, geographical information systems (GIS) [16] and rule-based algorithms. The participants carried the PAMs during their daily life (Fig. 1a). In the first step, the model computes the space-time utilisation distributions of the GPS coordinates for each participant (Fig. 1b) [17]. The resulting metrics (time spent in each location, re-visitation rate and metrics of directional movement) were used to classify each point into one of three core location categories (Fig. 1c): home, other static locations and in transit. Sleeping was further classified within the home category using local time, background noise levels and deposition of coarse particulate matter. The in-transit category was further classified into five modes of transport (walking, cycling, motorbike, car/bus and train/tube) (Fig. 1d) to capture distinct air pollution microenvironments and different inhalation rates.

Fig. 1: Flow chart of time-activity model.

a Raw GPS data of 251 participants carrying 60 PAMs during the winter fieldwork campaign (14th Nov–21st Dec 2016) plotted on urban and peri-urban maps. Map data Google 2019. b 3D map of a representative participant illustrating the relative amount of time spent in visited locations. The space-time utilisation distribution was constructed using advanced spatio-temporal analysis of the GPS data [17]. c Separation of static clusters from clusters with directional movement using derived parameters from step b. d Classification of mode of transport using movement analysis methods [23], GIS [16], PAM data (e.g., speed, acceleration) and questionnaire responses collected in the panel study. e Time spent in different core locations (home, transit and other static) by the two cohorts.

The time-activity classification of the data collected during the winter campaign showed that, in line with previous research [9], the participants spent as much as 90% of their time at home (Fig. 1e) partly due to socio-economic factors (e.g., little agricultural activity in winter and large percentage of retired participants). Travel behaviour is a complex issue affected by a multitude of factors, for instance supply and costs of transportation alternatives, incomes as well as urban size and spread. In line with previous studies [18], the urban participants spent 5% of their time budget travelling, and covered a larger spatial area than the peri-urban group that were relatively sedentary (Fig. 1a).
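The time budget summarised in Fig. 1e amounts to the fraction of samples assigned to each core location. A minimal sketch of that bookkeeping follows, assuming the classifier yields one label per 1-min sample; the function name, label strings and sample values are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter

def time_budget(labels):
    """Return the fraction of samples spent in each location category.

    `labels` holds one category per equally spaced sample (assumed 1 min).
    """
    counts = Counter(labels)
    total = len(labels)
    return {location: counts[location] / total for location in counts}

# Hypothetical 20-minute record, not study data.
labels = ["home"] * 18 + ["transit"] + ["other_static"]
budget = time_budget(labels)
```

Because the samples are equally spaced, counting labels is equivalent to summing time; with irregular sampling, each label would need to be weighted by its duration instead.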

Personal concentrations, exposure and dose estimates

The basic concepts used in exposure assessments were developed in the early 80s [19]. Personal exposure is defined as the contact of a person with a pollutant of concentration c, at a particular time t. We refer to the mean personal exposure as the average pollutant concentration in the visited microenvironment over the corresponding time period [20]. In the following, exposure misclassification is defined as the difference between exposure estimated from measurements at a fixed monitoring station and personal exposure measured by portable sensors.

The air pollution dose describes the amount of pollutant that is actually received by the organism by inhalation. As a first approximation, the potential dose can be defined as the inhaled amount of a pollutant, assuming a total absorption of the pollutant by the body. In the following, the term “dose” will refer to this potential dose. The dose D(t) per time unit [µg min−1] is the product of the air pollution concentration c [µg m−3] with the inhalation rate f [m3 min−1] [20] (Eq. 1). The total dose is the integral of D(t) over a defined period of time, in this study over the participation time of each individual (7 days).

$$D\left( t \right) = f\left( t \right)c\left( t \right) \tag{1}$$
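For measurements logged at a fixed interval, the integral of D(t) can be approximated by a simple sum. The sketch below uses a rectangle rule over 1-min samples; the function name, the argument names and the three sample values are illustrative assumptions and do not come from the study.

```python
def total_dose(conc_ug_m3, inhal_m3_min, step_min=1.0):
    """Approximate the integral of D(t) = f(t) * c(t) with a rectangle rule.

    conc_ug_m3   : concentrations c(t) in µg m-3, one per time step
    inhal_m3_min : inhalation rates f(t) in m3 min-1, one per time step
    step_min     : length of each time step in minutes
    Returns the total dose in µg.
    """
    if len(conc_ug_m3) != len(inhal_m3_min):
        raise ValueError("concentration and inhalation series must have equal length")
    return sum(f * c * step_min for f, c in zip(inhal_m3_min, conc_ug_m3))

# Three hypothetical 1-min samples (not study data).
conc = [50.0, 100.0, 150.0]
rate = [0.01, 0.01, 0.02]
dose = total_dose(conc, rate)  # 0.5 + 1.0 + 3.0 = 4.5 µg
```

Gaps in the record (for example, minutes lost to data cleaning) would need explicit handling before summing; this sketch assumes a complete series.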

In epidemiological studies, ambient monitoring data are typically averaged for the study area and short-term exposure on any given day is assumed to be the same for the entire population [21]. To understand the differences that arise from the spatial resolution of air pollution measurements employed and the varying time-activity-location patterns of individuals, three approaches were adopted to estimate dose:

  • Method A uses air pollution measurements from the static monitoring station closest to the participant’s residential address cstat(t), representing the method employed by the majority of epidemiological studies to estimate exposure. Although this approximation is generally poor, the relevant parameter for interpretation is the extent to which actual personal exposures are correlated to the area-average exposure over time [21]. This method uses a generic inhalation rate fgen = 9 L min−1, as the level of physical activity may not be available [12].

  • Method B assumes the same generic inhalation rate fgen, but utilises highly resolved air pollution measurements in the immediate proximity of the participant collected with the PAM cPAM.

  • Method C estimates intake in an optimal way by using air pollution concentrations measured in the immediate microenvironment of each participant at high temporal resolution (cPAM), and inhalation rates derived from the physical activity intensity (fact) estimated with the time-activity model (“The time-activity model” section).
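The three methods differ only in which concentration series and which inhalation rate enter Eq. 1. The following sketch makes that explicit for a single pollutant. It assumes the generic rate is 9 L per minute (0.009 m3 min−1), that all series are aligned at 1-min resolution, and that concentrations are in µg m−3; the function names and sample values are hypothetical.

```python
F_GEN = 9.0 / 1000.0  # assumed generic inhalation rate: 9 L min-1 in m3 min-1

def mean_dose_rate(conc, rates):
    """Mean of f(t) * c(t) over paired samples, in µg min-1."""
    return sum(f * c for f, c in zip(rates, conc)) / len(conc)

def dose_methods(c_stat, c_pam, f_act):
    """Return the mean dose rate under methods A, B and C."""
    generic = [F_GEN] * len(c_pam)
    return {
        "A": mean_dose_rate(c_stat, generic),  # static station, generic rate
        "B": mean_dose_rate(c_pam, generic),   # personal monitor, generic rate
        "C": mean_dose_rate(c_pam, f_act),     # personal monitor, activity-based rate
    }

# Hypothetical 4-minute record: the third minute is spent walking.
c_stat = [80.0, 80.0, 80.0, 80.0]
c_pam = [20.0, 20.0, 40.0, 20.0]
f_act = [0.009, 0.009, 0.018, 0.009]
doses = dose_methods(c_stat, c_pam, f_act)
```

In this toy record, methods A and B differ because the concentrations differ, while B and C differ only in the minute where the activity-based inhalation rate departs from the generic one.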

Results

Seasonal variation of outdoor and personal air pollution concentrations

In summary, the outdoor air pollutant levels were very high in Beijing and surroundings, particularly during the winter months with the exception of ozone which was higher during the summer (Fig. 2). Synoptic-scale meteorological analysis suggests that the degraded outdoor air quality in winter was due to the greater stagnation and weak southerly circulation [14]. More specifically, winter outdoor air pollution was characterised by several high PM2.5 pollution events, with peak hourly concentrations ranging up to 617 µg m−3; whereas, during the summer there were events of high ozone concentrations with the highest hourly average of 168 ppb.

Fig. 2: Boxplots of ambient and personal air pollution levels in urban and peri-urban Beijing during the winter and summer campaigns.

The white box plots illustrate outdoor air pollution levels measured at the reference monitoring stations during the summer (May–June 2017) and winter (Nov–Dec 2016) campaigns. The blue box plots show the levels measured with 60 PAMs deployed to 251 participants during the same periods. IQR inter-quartile range, perc. percentile.

Concentrations measured with the PAMs carried by participants showed two distinct profiles (Fig. 2) consistent between seasons:

  • Personal CO and NO concentrations. Partially driven by the outdoor concentrations, levels of CO and NO, measured with the PAMs showed a strong seasonal variation with higher levels measured during the winter season. The difference between personal and outdoor concentrations was much higher during winter indicating stronger sources in proximity to the participants compared with the summer.

  • Personal NO2, PM2.5 and O3 concentrations. Contrary to personal CO and NO levels, which broadly followed the outdoor trends, NO2, PM2.5 and O3 levels (Fig. 2) were significantly lower than outdoor levels in both seasons and showed little (PM2.5 and O3) or no (NO2) seasonal variation.

The personal measurements show that there is a substantial exposure misclassification that could be introduced when using outdoor measurements as exposure metrics. Apart from the substantial difference in the magnitude of personal and outdoor measurements, there is also a poor correlation (R2 < 0.2 across all pollutants; see Fig. S1 in Supplementary Material) indicating that in that environment exposure metrics derived from the outdoor monitoring stations explained an insignificant amount of the variability in personal exposure.

Correlation between individual pollutants

The previous subsection highlighted that measurements from static outdoor monitoring sites are poor surrogates for personal exposure levels stressing the need for measurements as close as possible to the individual to capture the high granularity of personal exposure.

A further significant limitation introduced when using measurements from static monitoring stations as metrics of exposure is the usually high correlation between different species. For example, the COMEAP report [4] concluded that there was insufficient evidence on the health impacts of NO2 due to the high correlation of this pollutant with other traffic-related pollutants such as primary combustion particles, particle number concentration or carbon monoxide. As a result, the statistical associations of each individual pollutant with a health effect will, to some extent, also reflect the effects of other pollutants in the group.

The difficulty of interpreting the results of highly correlated pollutants persists even when multipollutant models are applied in the statistical analysis, as the multicollinearity introduced from highly correlated species prohibits multipollutant health models from assigning specific health effects to individual pollutants. In addition, if one correlated pollutant has a larger exposure misclassification than another, this may result in the effects associated with a causal relationship being underestimated whilst non-causal associations are spuriously overestimated [4].

The correlation between outdoor pollutants will vary by location. In this case, Fig. 3(a) shows scatter plots between the outdoor NO2 and PM2.5 concentrations measured at the static outdoor monitoring station at the primary urban site, and (b) those measured with mobile sensors carried by the urban participants. While the two pollutants were strongly correlated at the outdoor monitoring station (R2 = 0.64), low correlations were observed in the personal measurements (R2 = 0.05). This demonstrates clearly that personal monitoring can break the correlation between pollutants that are correlated outdoors because it captures variable emissions from sources in the direct environment of a person with changing compositions.
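To illustrate why a lower R2 between two pollutants eases the multicollinearity problem, the sketch below computes R2 as the squared Pearson correlation of two paired series and converts it into a variance inflation factor (VIF). The VIF is a standard regression diagnostic introduced here purely for illustration; it is not reported in this paper, and the formula used is the special case for a model with exactly two predictors. The function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of a simple linear fit between x and y."""
    return pearson_r(x, y) ** 2

def vif_two_predictors(r2):
    """Variance inflation factor when a model has exactly two predictors."""
    return 1.0 / (1.0 - r2)

vif_station = vif_two_predictors(0.64)   # R2 reported at the monitoring station
vif_personal = vif_two_predictors(0.05)  # R2 reported for personal measurements
```

Under these assumptions the station-based pair gives a VIF of about 2.8, whereas the personal measurements give a value close to 1, that is, almost no inflation of the coefficient variance from collinearity between the two pollutants.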

Fig. 3: Correlation between the concentration measurements of two different pollution species (NO2 and PM2.5).

a Measurements of a static monitoring station (PKU Beijing, grey) compared to measurements of portable monitors (urban site, black). Measurements were taken over the same time period. b Portable monitor measurements (inset of graph a) separated by location (home, other static and transit).

Figure 3(b) uses the personal exposure measurements and breaks them down by location as classified by the time-activity model (“The time-activity model” section). A low correlation between the two pollutants was observed at home and other static locations (R2 < 0.1), whereas the two pollutants were moderately correlated in transit environments (R2 = 0.25). This is due to similar emission sources in traffic outdoor environments where pollutants are generally more correlated, also reflected in the measurements at the outdoor monitoring site. Although not shown here, equivalent arguments apply to CO and O3.

Exposure and dose estimations

The large differences between outdoor and personal concentrations highlighted in the previous subsection were driven by time-activity-location patterns of individual participants as well as infiltration rates of outdoor pollutants in visited indoor microenvironments.

The bar plots in Fig. 4 show the average inhaled dose of urban and peri-urban participants calculated with the three dose estimation methods described in the “Personal concentrations, exposure and dose estimates” section. A detailed description of the calculations used to create Figs. 4–6 is given in Section S1.

Fig. 4: Dose estimations per time unit of various pollutants during the winter deployment.

Doses are estimated using methods A–C (see section “Personal concentrations, exposure and dose estimates”) and averaged over the urban (left) and peri-urban cohort (right). Error bars indicate the standard error of dose estimations across the cohort.

Fig. 5: Weekly air pollution dose by participant, split by contribution from different microenvironments (winter deployment).

a Total pollutant dose of each participant over one week for PM2.5 (top) and NO2 (bottom), estimated using method C. Colours mark the contributions of each activity to the total dose. b Density plots of the individual doses in the urban (blue) and peri-urban (black) cohort; the vertical dotted lines mark the average weekly dose of the two cohorts.

Fig. 6: Mean pollutant dose per unit time during different activities.

Mean doses per minute were estimated using the three estimation methods A–C (see section “Personal concentrations, exposure and dose estimates”) and averaged over all 251 participants. Error bars indicate the standard error of dose estimations across the cohort.

As methods A and B are integrating a constant inhalation rate and the measurements were taken over the same time period, the average dose depends only on the measured pollutant concentrations in the surrounding microenvironment. Therefore, the doses are directly proportional to the exposures, and the difference between the two methods is a measure of the exposure misclassification between personal and outdoor estimates which was substantial in all cases. For example, the outdoor stations overpredicted PM2.5 exposure by up to fourfold, while exposure to CO was under-predicted by up to fivefold.
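The proportionality can be written out explicitly. As a sketch in the notation of Eq. 1, with overbars (introduced here for illustration) denoting averages over the participation period, the ratio of the mean doses from methods A and B reduces to the ratio of the mean concentrations because the generic inhalation rate cancels:

$$\frac{\bar{D}_{A}}{\bar{D}_{B}} = \frac{f_{gen}\,\bar{c}_{stat}}{f_{gen}\,\bar{c}_{PAM}} = \frac{\bar{c}_{stat}}{\bar{c}_{PAM}}$$

A ratio above 1 corresponds to the outdoor station overpredicting personal exposure, and a ratio below 1 to underprediction.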

The difference between method B and method C, both derived from personal measurements, was marginal (Fig. 4) despite integrating activity-dependent inhalation rates in method C. This was mostly due to the low physical activity levels of the participants, which resulted in an average inhalation rate similar to the generic one used for methods A and B. The home microenvironment was the most important modifier of personal dose, partly because participants spent most of their time there (Fig. 1e). In addition, strong indoor sources of CO and NO operated in the home microenvironment elevating personal dose. On the other hand, indoor doses of NO2, O3 and PM2.5 were lower indicating the presence of strong chemical sinks.

Using PM2.5 and NO2 as an example, Fig. 5a shows the total pollutant dose of each participant over their participation week (calculated with dose estimation method C). The contributions from different microenvironments to the total dose are colour-coded. While the urban participants generally received a lower PM dose than the peri-urban group, they received higher doses of NO2, which makes the exposure profiles of the two groups distinct. However, the variation between individual participants was larger than the variation between the two groups (Fig. 5b). Although the participants spent little time in transportation, it was a significant site of exposure to both pollutants, particularly for the urban participants.
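The split of a participant's total dose by microenvironment can be sketched as a grouped sum of per-minute doses. The example assumes aligned 1-min series of concentration, inhalation rate and location label; the function names and the four sample values are hypothetical and chosen only to show how a short, high-dose period can dominate the total.

```python
from collections import defaultdict

def dose_by_location(conc, rates, labels, step_min=1.0):
    """Sum f(t) * c(t) * step within each location label; returns µg per label."""
    totals = defaultdict(float)
    for c, f, location in zip(conc, rates, labels):
        totals[location] += f * c * step_min
    return dict(totals)

def shares(totals):
    """Convert absolute doses per location into fractions of the total dose."""
    grand_total = sum(totals.values())
    return {location: dose / grand_total for location, dose in totals.items()}

# Hypothetical record: three minutes at home, one minute in transit.
conc = [20.0, 20.0, 20.0, 100.0]
rates = [0.01, 0.01, 0.01, 0.02]
labels = ["home", "home", "home", "transit"]
totals = dose_by_location(conc, rates, labels)
fractions = shares(totals)
```

In this toy case, transit covers a quarter of the time but contributes about three quarters of the dose, which mirrors qualitatively how little time in transport can still make it a significant site of exposure.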

Figure 6 shows the average pollutant dose per minute the participants inhaled during different activities, calculated with the three methods A, B and C. As method B integrates one generic inhalation rate for all activities, the average dose is proportional to the pollutant concentrations the participants were exposed to during the different activities. The home environment had the biggest impact on CO dose which was probably caused by indoor emission sources such as cooking and heating. The average NOx dose was highest during street-level transportation possibly due to strong sources in the traffic environment. When inhalation rates were taken into account (method C), the maximum dose was received during active modes of transport (walking, cycling) due to the increased physical activity levels and inhalation rates. It is therefore likely that the inhaled dose would be significantly underestimated in more active subgroups of the population.

Discussion

Exposure misclassification of air pollution remains one of the biggest limitations of epidemiological research on the health impacts of air pollution, preventing the discipline from moving from general associations to specific ones. In the absence of personal/indoor measurements, health studies have mainly relied on available outdoor monitoring network data to assess short-term exposures [21]. This paper demonstrates a new methodological framework where novel sensor technologies and advanced computational methods offer a paradigm shift to estimate activity-weighted air pollution exposure in large-scale health studies.

In total, 60 validated personal air quality platforms with miniaturised novel sensors that measure physical parameters, gaseous pollutants and particulate matter were deployed to 251 participants of two established cohorts residing in urban and peri-urban Beijing, China [13]. Time-activity-location classifications were derived automatically using GPS coordinates, accelerometry and background noise levels collected with the personal monitors. Because such auxiliary data can be collected with widely used smartphones from a large number of the population [22], this technique potentially provides unobtrusive means of enhancing epidemiological exposure data at low cost minimising participant burden.

The relatively sedentary elderly participants spent ~90% of their time at home and as little as 2% outdoors, which is in line with previous research in developed countries [9]. The home environment was, therefore, the major contributor to overall exposure, and an important modifier of personal concentrations for all investigated air pollutant species. Exposure differences between the two participant groups were attributed partly to the variation in domestic energy use; for example, in winter the urban building stock in China relies on centralised gas heating systems, while traditional biomass and coal stoves remain the key emission source for heating and cooking in peri-urban areas. However, the exposure variability between participants was larger than the variability between the two groups, stressing the need to go beyond current methodologies to estimate population exposures.

We found low correlations and substantial differences in the magnitude of exposure estimated from outdoor and personal air quality measurements. An important implication for health studies is that relying on outdoor measurements could introduce significant error and bias in health models depending on individual pollutants’ chemical reactivity and strength of local emission sources. The magnitude of the health effects derived from improved exposure estimates is likely to be different from previous estimations using outdoor measurements as metrics of exposure [3]. The extent of misclassification is hard to quantify as it varies significantly with season and location.

Traffic-related pollutants, such as NO2 and PM2.5, are generally highly correlated when measured at coarse spatial and temporal scales, making it challenging to distinguish a causal link between a single pollutant and a specific health outcome [4]. A major advantage of the proposed methodology is that novel sensor technologies enable the collection of personal measurements at high spatial resolution, and therefore significantly reduce the correlation between individual air pollutants observed at monitoring stations.

Further work matches these estimations with detailed medical biomarkers to draw more reliable associations between air pollution exposure and health impacts [13]. This paper focused on a participant sample with specific personal and socio-economic characteristics that resulted in generally low physical activity. However, the demonstrated approach is applicable in diverse geographical settings and subgroups. The expectation is that the dose misclassification will likely be larger in subpopulations that are more physically active, such as children, increasing the importance of the proposed methodology in disentangling the complex mechanisms of health risks and individual susceptibility to air pollution.