Introduction

Personal air pollution exposures are very challenging to quantify accurately. Traditional approaches to quantifying exposure to outdoor air pollution assume that concentrations at the residential address are adequate surrogates of personal exposure to air pollution of outdoor origin (Bae et al. 2007; Bell and Ebisu 2012; Elliott and Smiley 2019; Houston et al. 2004; Rowangould 2013). The underlying assumption is that individuals spend most of their time indoors at the residence (Klepeis et al. 2001) and that outdoor air pollution infiltrates into the indoor environment where exposure occurs. However, given that people are mobile and their exposure to air pollution can occur in various locations, this static residential approach will inevitably introduce exposure measurement error and potential bias in air pollution and health assessments, which may lead to ineffective public health policy interventions.

Measurement errors in air pollution exposure can come from several sources. Recent studies report that exposure estimates may be substantially biased low when human mobility is not considered (Gurram et al. 2019; Lu 2021; Park and Kwan 2017; Tayarani and Rowangould 2020). Park and Kwan (2017) argue that an individual's time-activity pattern determines their exposure level, because personal exposure to air pollution arises from dynamic spatiotemporal interactions between individuals and the distribution of air pollutants. Although people generally spend more time at home, the majority of their exposure occurs in other places (Park 2020). For example, workers are often exposed to traffic-related pollution while commuting. Overlooking human mobility can therefore lead to inaccurate air pollution exposure measurements.

Beyond ignoring human mobility, the spatiotemporal resolution of the air pollution surfaces or prediction models used is another important source of exposure measurement error. Coarse resolution cannot reflect important spatial gradients (Clark et al. 2022; Korhonen et al. 2019; Li et al. 2016). The spatial resolution of assigned air pollution concentrations in recent studies varied substantially, ranging from 0.25 to 10 km² (Dewulf et al. 2016; Gurram et al. 2019; Park and Kwan 2017; Yu et al. 2020). The temporal resolution of outdoor PM2.5 surfaces also matters: several studies rely on daily, monthly, or annual average pollution models to estimate residential exposure (M. Nyhan et al. 2016; Pennington et al. 2017; Setton et al. 2008). Greater temporal aggregation cannot capture how concentrations vary over time. Evidence shows that personal exposure to outdoor air pollution tends to be overestimated at the residence and underestimated at daily activity locations when daily, monthly, or annual average concentrations are used to estimate personal exposure (Dhondt et al. 2012).

Moreover, exposure measurement error further distorts estimates of air pollution health effects (Basagaña et al. 2013; S. Y. Kim et al. 2009; Samoli and Butland 2017; Sellier et al. 2014), biasing the estimated association between exposure and health outcomes. Prior studies have documented the impact of exposure measurement error on the air pollution-health relationship. Jerrett et al. (2005) show that inaccurate personal exposure estimates resulting from air pollution prediction models with poor spatial resolution may substantially affect the estimated relationship between air pollution exposure and mortality. Sellier et al. (2014) obtained inconsistent estimates of the effect of NO2 on newborns' birthweight when different spatially resolved air pollution prediction models were used. Basagaña et al. (2013) show that exposure measurement error may bias regression coefficients and inflate their variance when personal exposures derived from prediction models with different spatial and temporal resolutions are used as explanatory variables in exposure-health models.

Substantial effort in air pollution epidemiology has gone into developing statistical models that predict personal exposures at subjects' locations where measurements are not available. However, most existing air pollution epidemiology studies focus on the impact of air pollution prediction models on the exposure-health relationship (Basagaña et al. 2013; S. Y. Kim et al. 2009; Samoli and Butland 2017; Sellier et al. 2014). Few studies have examined how accounting for human mobility in exposure measurement affects the association between air pollution exposure and health outcomes. Yu et al. (2020) find that people with higher mobility levels tend to have larger exposure measurement errors. Shareck et al. (2014) argue that the unequal distribution of features and resources across space may induce air pollution exposure disparities by constraining where people carry out their everyday activities. For example, because of accessibility limitations, low-income groups usually travel shorter distances from home than their high-income counterparts (Morency et al. 2011; Vallée et al. 2010). Compared to whites, blacks and Latinos usually have lower mobility levels (Hu et al. 2020). Full-time employees tend to travel longer daily distances (Järv et al. 2015; Morency et al. 2011; Páez et al. 2010), whereas part-time employees and unemployed people are more place-bound (Lu 2021; Vallée et al. 2010). Little is known about how the distinct travel behaviors and mobility patterns of different sociodemographic groups may influence their air pollution exposure levels, and the impact of human mobility on exposure-health effects has received less attention in previous research.

This study aims to examine the impact of human mobility on exposure measurement errors and on the health effects associated with air pollution, and to disentangle the complex relationship between sociodemographic variables and exposure measurement errors. Residence-based and mobility-based PM2.5 exposures for a sample of Los Angeles County residents on a typical weekday were estimated by coupling hourly, neighborhood-level 500 m × 500 m PM2.5 surfaces with simulated individual-level daily mobility data. The study sample was classified into three exposure groups based on differences between residence- and mobility-based PM2.5 exposures: individuals with similar residence- and mobility-based exposures, individuals whose exposures were overestimated, and individuals whose exposures were underestimated. Random forest classification models were used to examine the impacts of a series of mobility and sociodemographic variables on exposure classification results. Last, a sensitivity analysis was conducted to examine the impact of human mobility on exposure-health effects across exposure classification groups and sociodemographic groups.

The remainder of the paper is organized as follows. The “Method” section presents the data and methods used in this study. The “Results” section summarizes the results. The “Discussion” section discusses the findings. Conclusions are presented in the “Conclusion” section.

Method

Study area

Los Angeles is well known for its severe air pollution and is one of the US metropolitan regions with the highest levels of particulate matter pollution (American Lung Association 2020). Los Angeles has the most developed highway system and the busiest traffic in the US. PM2.5, a major traffic-related air pollutant, has been a serious public health problem in Los Angeles for decades. In Los Angeles, PM2.5 concentrations vary spatially and temporally, with the highest pollution observed during peak hours and within core urban areas (Lu et al. 2021). Los Angeles is, therefore, a good case study for examining how exposure levels vary across time and space when daily travel patterns are taken into account, and for testing whether exposure patterns are related to sociodemographic population characteristics.

Data

PM2.5 pollution modeling

In this study, ground-level PM2.5 concentrations were estimated from our recently developed PM2.5 model (Lu et al. 2021). We created an hourly, 0.25-km² gridded PM2.5 model for Los Angeles County that incorporates low-cost air sensor data (i.e., PurpleAir) and machine learning techniques. Ambient air pollution has traditionally been monitored at regulatory stations with high instrumentation and maintenance costs. Sparse and uneven regulatory monitoring has limited ability to capture fine-scale pollution patterns, especially in unmonitored areas. Because they can be deployed densely, low-cost PurpleAir sensors capture spatiotemporal variations in localized PM2.5 concentrations at finer resolution than regulatory air quality stations (Bi et al. 2020; Lu et al. 2021; Mousavi and Wu 2021). A number of recent studies have used PurpleAir sensors to develop PM2.5 models at fine spatiotemporal resolution (Bi et al. 2020; Lu et al. 2021; Mousavi and Wu 2021).

In this study, twenty-four hourly PM2.5 concentration surfaces over the course of a typical weekday in 2019 were generated at a 500 m × 500 m grid level for Los Angeles County. A suite of spatiotemporal variables, including meteorological conditions, land use variables, and traffic counts, was combined with the random forest method to estimate PM2.5 concentrations at sub-daily, neighborhood-level resolution. Estimated PM2.5 concentrations were then validated against measured PM2.5 concentrations using 10-fold cross-validation. The results showed that the PurpleAir-based PM2.5 prediction model explained more than 90% of the variation. A comprehensive description and validation results can be found in Lu et al. (2021).
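The 10-fold cross-validation statistic described above can be illustrated with a minimal sketch. This is not the Lu et al. (2021) pipeline: the random forest and its predictors are replaced by a hypothetical one-variable least-squares line (`fit_line`), and out-of-fold predictions are pooled before computing R², which is one common convention among several. The data are synthetic.

```python
import random

def fit_line(xs, ys):
    """Least-squares line y = a + b * x; a hypothetical stand-in for the PM2.5 model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def kfold_r2(xs, ys, k=10, seed=0):
    """Pool out-of-fold predictions from k folds and return R^2 against observations."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint folds covering every sample
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:                             # predict only the held-out samples
            preds[i] = a + b * xs[i]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: a linear signal plus a small alternating error.
xs = [float(x) for x in range(60)]
ys = [5.0 + 0.2 * x + (0.3 if x % 2 else -0.3) for x in xs]
print(round(kfold_r2(xs, ys), 3))
```

Pooling held-out predictions means every observation is scored by a model that never saw it, which is what makes the resulting share of explained variation an honest out-of-sample figure.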

Activity-based travel demand modeling

Daily travel trajectories for 100,784 Los Angeles County residents on an average weekday in 2019 were simulated using an activity-based travel demand model developed by the Southern California Association of Governments (SCAG) (Pendyala et al. 2012; Ziemke et al. 2015). The SCAG simulated travel trajectory data have been validated against the American Community Survey (ACS) 2003 and Census 2000. Validation results show that the SCAG activity-based travel demand model performs well in predicting the number of activities by purpose (“activity purpose-number”) and in reproducing the corresponding population features at the individual level. According to the validation results, the majority of the synthetic population deviated by less than 5% from the reference group in demographic and socioeconomic characteristics (Pendyala et al. 2012).

The SCAG travel trajectory dataset contains 387,398 trip records for 100,784 Los Angeles County residents (approximately 1% of the total Los Angeles County population of about 10 million). Each trip record includes a personal ID, the origin-destination pair of the trip, trip purpose, departure and arrival timestamps, trip duration, and travel mode. The personal ID is unique to each individual and is used to link trips with the synthetic demographic features provided by SCAG (Bhat et al. 2013; Lu 2021). The origin and destination of each trip are assigned to a traffic analysis zone (TAZ), a geographic unit comparable in size to a census tract, and the centroid of each TAZ is taken as the trip's origin or destination point.

However, the travel trajectory dataset lacks information on travel paths between activity sites. This study used the OSMnx Python package to estimate probable travel paths between the origin and destination TAZs of each trip as the shortest paths by network distance (Boeing 2017; Lu 2021).
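To illustrate routing by shortest network distance, here is a minimal Dijkstra sketch on a toy road graph. It is a stand-in for the idea, not OSMnx's API: OSMnx builds a NetworkX graph whose edges carry a `length` attribute in meters, and shortest-path routing on such a graph minimizes the same total length. The node names and segment lengths below are hypothetical; nodes stand in for network points nearest to TAZ centroids.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph weighted by segment length (m).

    Returns (total_length, [nodes]) or (inf, []) if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                               # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):   # found a shorter route to nbr
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:                      # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical directed road segments between network nodes, lengths in meters.
toy_network = {"A": {"B": 400.0, "C": 900.0}, "B": {"C": 300.0}, "C": {"D": 200.0}}
print(shortest_path(toy_network, "A", "D"))
```

The detour A-B-C-D (900 m) beats the direct A-C-D route (1,100 m), which is the kind of choice a distance-minimizing router makes between activity sites.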

Individual exposure assessment

Static and dynamic exposure assessment

The study region was subdivided into 0.25 km² hexagonal grid cells. As noted earlier, hourly PM2.5 concentrations for each cell (twenty-four hours in total) were generated using the PM2.5 model developed by Lu et al. (2021) for Wednesday, September 18, 2019. Because of limited computing resources, PM2.5 concentrations were assumed to be constant within each hour in each hexagon. To match the spatial units of the travel trajectory data, the hourly PM2.5 concentrations were spatially joined to TAZs and averaged within each TAZ when multiple hexagons were located in the same TAZ. Two types of individual PM2.5 exposures were then assessed: (1) static PM2.5 exposure at the residence and (2) dynamic PM2.5 exposure that accounts for individuals' daily mobility patterns.
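The TAZ-level averaging step can be sketched as a group-by mean over hexagons. This assumes, hypothetically, that each hexagon has already been assigned to exactly one TAZ (for example, the TAZ containing its centroid); how hexagons straddling TAZ boundaries were handled is not specified above. The IDs and values below are made up.

```python
from collections import defaultdict

HOURS = 24

def taz_hourly_means(hex_to_taz, hex_pm25):
    """Average each hour's PM2.5 over all hexagons assigned to the same TAZ.

    hex_to_taz -- {hex_id: taz_id}; each hexagon maps to exactly one TAZ (assumption)
    hex_pm25   -- {hex_id: [24 hourly PM2.5 values]}
    Returns {taz_id: [24 hourly means]}.
    """
    sums = defaultdict(lambda: [0.0] * HOURS)
    counts = defaultdict(int)
    for hex_id, taz_id in hex_to_taz.items():
        for hour, value in enumerate(hex_pm25[hex_id]):
            sums[taz_id][hour] += value
        counts[taz_id] += 1
    return {taz: [s / counts[taz] for s in hourly] for taz, hourly in sums.items()}

# Hypothetical example: two hexagons fall in TAZ 1, one in TAZ 2.
result = taz_hourly_means(
    {"hx1": 1, "hx2": 1, "hx3": 2},
    {"hx1": [10.0] * HOURS, "hx2": [14.0] * HOURS, "hx3": [8.0] * HOURS},
)
print(result[1][0], result[2][0])
```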

The individual static and dynamic exposures are estimated as in Eqs. (1) and (2):

$${\mathrm{Static}}_i=\frac{\sum_{t=1}^{T}PM_{h,t}}{T} \tag{1}$$

$${\mathrm{Dynamic}}_i=\frac{\sum_{t=1}^{T}\sum_{n=1}^{N}PM_{n,t}\cdot P_{n}}{T} \tag{2}$$

where \(PM_{h,t}\) is the PM2.5 concentration in hour t in TAZ h, where individual i's home is located; T = 24 is the number of hours in a day; \(PM_{n,t}\) is the PM2.5 concentration in hour t in TAZ n, where individual i is located during hour t; N is the total number of TAZs (microenvironments) in which individual i stayed during hour t (N ≥ 1); and \(P_n\) is the share of hour t that individual i spends in TAZ n, so that the shares for each hour sum to one.
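Eqs. (1) and (2) can be sketched in Python as follows. The TAZ IDs, concentrations, and daily schedule are hypothetical; the morning commute is simplified to a half-hour in each of two TAZs, whereas the study's travel paths would place people in the TAZs crossed en route.

```python
HOURS = 24  # T in Eqs. (1) and (2)

def static_exposure(home_pm):
    """Eq. (1): average of the 24 hourly PM2.5 values at the home TAZ."""
    return sum(home_pm) / HOURS

def dynamic_exposure(taz_pm, presence):
    """Eq. (2): average over hours of the time-weighted PM2.5 of the TAZs visited.

    taz_pm   -- {taz_id: [24 hourly PM2.5 values]}
    presence -- 24 dicts {taz_id: P_n}, P_n = share of hour t spent in TAZ n
    """
    total = 0.0
    for t, shares in enumerate(presence):
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"time shares in hour {t} must sum to 1")
        total += sum(taz_pm[n][t] * p for n, p in shares.items())
    return total / HOURS

# Hypothetical person: home TAZ at 10 ug/m3, work TAZ at 16 ug/m3 all day.
HOME, WORK = "taz_home", "taz_work"
taz_pm = {HOME: [10.0] * HOURS, WORK: [16.0] * HOURS}
presence = []
for t in range(HOURS):
    if 9 <= t <= 16:
        presence.append({WORK: 1.0})
    elif t == 8:
        presence.append({HOME: 0.5, WORK: 0.5})    # simplified commute hour
    else:
        presence.append({HOME: 1.0})

print(static_exposure(taz_pm[HOME]), dynamic_exposure(taz_pm, presence))
```

Here 15 hours at home, 8.5 hour-equivalents at work, and the split hour give (15 × 10 + 13 + 8 × 16) / 24 = 291 / 24 = 12.125 μg/m³, versus a static exposure of 10 μg/m³.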

Exposure classification based on exposure measurement error

Prior research has documented exposure misclassification when human mobility is not taken into account in exposure assessment (Guo et al. 2020; Yu et al. 2020). The exposure of individuals with high residence-based exposures is likely to be reduced by their mobility, while the exposure of individuals with relatively low residence-based exposures is likely to be increased (J. Kim and Kwan 2021; Kwan 2018). All study subjects were divided into three groups according to their exposure measurement errors, as defined in Eq. (3): (1) individuals with similar dynamic and static exposures, referred to as the “Accurate” group; (2) individuals with higher static than dynamic exposures, referred to as the “Overestimated” group; and (3) individuals with higher dynamic than static exposures, referred to as the “Underestimated” group.

The magnitude and direction of two statistical indicators were used to define the exposure classification groups: (1) exposure measurement error and (2) mean absolute percentage error (MAPE). The exposure measurement error was calculated by subtracting an individual's static exposure from their dynamic exposure (i.e., \(\mathrm{Dynamic}_i-\mathrm{Static}_i\)). A positive exposure measurement error indicates that an individual's exposure is underestimated, while a negative error indicates overestimated exposure. MAPE was adopted as an additional criterion to evaluate the agreement between an individual's static and dynamic exposures: \(MAPE_i=\left|\frac{\mathrm{Dynamic}_i-\mathrm{Static}_i}{\mathrm{Static}_i}\right|\times 100\%\). Higher MAPE values indicate larger relative differences between static and dynamic exposures, whether exposure is overestimated or underestimated. The thresholds for exposure measurement error and MAPE were set to ±0.5 μg/m³ and 10%, respectively. The full classification rule is given in Eq. (3).

$$E_i=\begin{cases}\mathrm{Overestimated} & \text{if } \mathrm{Error}_i<-0.5 \text{ and } MAPE_i>10\%\\ \mathrm{Accurate} & \text{if } -0.5\le \mathrm{Error}_i\le 0.5 \text{ and } MAPE_i\le 10\%\\ \mathrm{Underestimated} & \text{if } \mathrm{Error}_i>0.5 \text{ and } MAPE_i>10\%\end{cases} \tag{3}$$

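where \(E_i\) denotes the exposure classification group to which individual i belongs, \(\mathrm{Error}_i=\mathrm{Dynamic}_i-\mathrm{Static}_i\) is the exposure measurement error for individual i (in μg/m³), and \(MAPE_i\) is the corresponding percentage error defined above.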
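The rule in Eq. (3) can be sketched as a small function. Note that, as written, Eq. (3) assigns no group when the two criteria disagree (for example, an error above 0.5 μg/m³ with MAPE at or below 10%, or an error within ±0.5 μg/m³ with MAPE above 10%); returning `None` for those cases is a hypothetical choice made here, not the study's stated procedure.

```python
def classify_exposure(static, dynamic, error_tol=0.5, mape_tol=10.0):
    """Apply the Eq. (3) thresholds to one individual's static and dynamic exposure."""
    error = dynamic - static                  # Error_i, in ug/m3
    mape = abs(error / static) * 100.0        # MAPE_i, in percent
    if error < -error_tol and mape > mape_tol:
        return "Overestimated"
    if -error_tol <= error <= error_tol and mape <= mape_tol:
        return "Accurate"
    if error > error_tol and mape > mape_tol:
        return "Underestimated"
    return None   # hypothetical handling: Eq. (3) assigns no group to this case

print(classify_exposure(10.0, 12.125))   # error 2.125 ug/m3, MAPE 21.25%
print(classify_exposure(12.0, 10.0))     # error -2.0 ug/m3, MAPE about 16.7%
```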

Random forest classification model

This study used the random forest classification model to examine how a variety of mobility and sociodemographic variables are associated with exposure classification results. Unlike traditional linear regression, the random forest model can capture nonlinear relationships between the response variable and predictors and provides a flexible, automated process for predicting target variables (Breiman 2001). The random forest model grows many decision trees, training each tree independently on a bootstrap sample of the data and considering a random subset of predictors at each split. This randomness makes the ensemble more robust than a single decision tree and less prone to overfitting the training data. Furthermore, the random forest model is relatively insensitive to multicollinearity among sociodemographic variables, which can make coefficient estimates unstable in many regression models.

In this study, 90% of the samples were randomly selected as the training set, and the remaining 10% were held out as the testing set to evaluate model performance. Because the classes were imbalanced (79% of the study sample was classified as the Accurate group, 9% as the Overestimated group, and 12% as the Underestimated group), a combination of the Synthetic Minority Over-sampling Technique (SMOTE) and random under-sampling was used to resample the training data until balanced classes were achieved (Chawla et al. 2002; He and Garcia 2009). Class-wise sensitivity and specificity, as well as the mean classification accuracy, were calculated from confusion matrices to evaluate the performance of candidate random forest models.
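The resampling idea can be sketched in plain Python. SMOTE creates a synthetic row on the line segment between a minority-class row and one of its k nearest same-class neighbors; random under-sampling drops rows from the majority class. The target of 30 rows per class, the feature values, and the function names are hypothetical; in practice, the imbalanced-learn library's `SMOTE` and `RandomUnderSampler` classes are the usual tools. Only the training split should be resampled.

```python
import random
from collections import Counter

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic rows, each interpolated between a random minority
    row and one of its k nearest minority neighbours (Chawla et al. 2002)."""
    rng = rng or random.Random(0)
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: sq_dist(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()                    # position along the segment, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic

def rebalance(X, y, per_class, rng=None):
    """Under-sample classes above per_class and SMOTE-oversample classes below it."""
    rng = rng or random.Random(0)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) > per_class:
            rows = rng.sample(rows, per_class)                         # random under-sampling
        elif len(rows) < per_class:
            rows = rows + smote(rows, per_class - len(rows), rng=rng)  # SMOTE over-sampling
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out

# Hypothetical two-feature training rows with a 79/9/12 class split.
X = ([[float(i), float(i)] for i in range(79)]
     + [[100.0 + i, 50.0] for i in range(9)]
     + [[200.0 + i, -float(i)] for i in range(12)])
y = ["Accurate"] * 79 + ["Overestimated"] * 9 + ["Underestimated"] * 12
X_bal, y_bal = rebalance(X, y, per_class=30)
print(Counter(y_bal))
```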

The optimal number of features randomly sampled at each node (m) and the number of decision trees (k) were determined by minimizing the out-of-bag (OOB) error rate through iterative cross-validation (Lu et al. 2021). The relative importance of each predictor variable was measured by the mean decrease in accuracy based on OOB error. Partial dependence plots were produced to depict the relationships between predictor variables and the probability of being classified into a given class. A partial dependence plot shows the marginal effect of a predictor variable on the predicted response, averaged over the observed values of all other variables in the model (Friedman 2001). Figure 1 shows the framework and process for assessing exposure classification error and the factors affecting it.
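Friedman's partial dependence can be sketched without any particular library: fix one feature at a grid value in every observed row, keep the other features as observed, and average the model's predicted class probability. The toy model `toy_prob_misclassified` and the feature names `trip_km` and `hours_out` are hypothetical stand-ins for a fitted random forest and the study's variables.

```python
def partial_dependence(predict_proba, rows, feature, grid):
    """Partial dependence of one feature (Friedman 2001): set `feature` to each
    grid value in every observed row, leave other features as observed, and
    average the predicted class probability over the rows."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)            # copy so the observed data are untouched
            modified[feature] = value
            total += predict_proba(modified)
        curve.append(total / len(rows))
    return curve

def toy_prob_misclassified(row):
    """Hypothetical fitted model: probability rises with trip distance and time out."""
    p = 0.01 * row["trip_km"] + 0.02 * row["hours_out"]
    return min(max(p, 0.0), 1.0)

rows = [{"trip_km": 5.0, "hours_out": 2.0}, {"trip_km": 30.0, "hours_out": 4.0}]
curve = partial_dependence(toy_prob_misclassified, rows, "trip_km", [0.0, 10.0, 20.0])
print(curve)
```

Because the other feature is averaged over its observed values (2 and 4 hours here), each point on the curve reflects the average effect of trip distance rather than its effect at one fixed value of time out of home.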

Fig. 1

Research conceptual framework

Exposure and health effect across exposure classification groups

Multivariable ordinary least squares (OLS) regression models were fitted to assess the association between personal exposure and health outcomes, separately for static and dynamic exposures. The exposure-health effects were further assessed across exposure classification groups, racial groups, and income groups. Recent studies have shown that exposure to PM2.5 is linked to acute respiratory symptoms (Bose et al. 2015) and cardiovascular disease (Madrigano et al. 2013; Neophytou et al. 2014). Thus, two health outcomes were used as the dependent variables in the OLS models: the rates of emergency department visits for asthma and for heart attacks per 10,000 persons.

The health outcome data were obtained from CalEnviroScreen at the census tract level (California Office of Environmental Health Hazard Assessment 2023). Because individual-level health outcome data were not available, the asthma and heart attack rates were assigned to each study subject based on their residential location. That is, study subjects who live in the same TAZ are assumed to have the same health outcomes. The relationship between exposure and the two health indicators was estimated using OLS models adjusted for several confounding variables, including demographic variables (gender, age, race) and socioeconomic status (income, employment status, education). Table 2 summarizes these variables.
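A minimal sketch of the multivariable OLS fit solves the normal equations \((X^\top X)\beta = X^\top y\) directly. The subjects, the single confounder (age), and the noise-free outcome below are hypothetical; the exposure coefficient is read as the change in emergency department visits per 10,000 persons per 1 μg/m³ of PM2.5, holding the confounders fixed. A production analysis would typically use a statistics package such as statsmodels (`statsmodels.api.OLS`), which also reports confidence intervals.

```python
def ols(X, y):
    """Return OLS coefficients [intercept, b1, ..., bp] by solving (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]            # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]      # augmented matrix [X'X | X'y]
    for col in range(p):                           # forward elimination, partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    beta = [0.0] * p
    for i in reversed(range(p)):                   # back substitution
        beta[i] = (a[i][p] - sum(a[i][j] * beta[j] for j in range(i + 1, p))) / a[i][i]
    return beta

# Hypothetical subjects: (PM2.5 exposure in ug/m3, age in years) and a noise-free outcome.
X = [(8.0, 30.0), (9.0, 45.0), (10.0, 25.0), (11.0, 60.0), (12.0, 40.0), (13.0, 35.0)]
y = [5.0 + 2.4 * pm + 0.3 * age for pm, age in X]
intercept, b_pm25, b_age = ols(X, y)
print(round(b_pm25, 3))
```

Fitting the same model once with static and once with dynamic exposure as the PM2.5 column, and comparing the two exposure coefficients, is the comparison the analysis makes.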

In summary, this study first estimates residence-based static and mobility-based dynamic exposures for the study subjects. The subjects are then classified into three groups according to the magnitude and direction of their exposure measurement errors. A random forest classification model is used to examine the impact of human mobility and sociodemographic characteristics on exposure measurement error. Last, this study explores exposure-health effects based on static and dynamic exposure measurements using OLS regression models across different cohorts (Fig. 1).

Results

PM2.5 and population activity distribution

Figure 2a–d presents the estimated hourly PM2.5 concentrations across Los Angeles County at the neighborhood level on a typical weekday in 2019. A heterogeneous spatiotemporal distribution of PM2.5 pollution can be observed. In general, PM2.5 concentrations are higher during the daytime than in the evening and at night, especially during the morning peak hours (Fig. 2b), and pollution is concentrated in urban cores and along highways. These patterns are in line with the findings of previous studies (Lu et al. 2021).

Fig. 2

Estimated hourly PM2.5 concentrations at (a) midnight, (b) 8 AM, (c) noon, and (d) 6 PM and distribution of population activity at (e) midnight, (f) 8 AM, (g) noon, and (h) 6 PM on a typical weekday in Los Angeles County

Figure 2e–h presents the simulated activity patterns of Los Angeles residents at four different hours on a typical weekday in 2019. These figures reflect the distribution of people's residences and workplaces. Overall, Los Angeles residents travel from their dispersed places of residence (Fig. 2e) to urban cores (Downtown Los Angeles, the Wilshire-Santa Monica corridor, Long Beach) in the morning (Fig. 2f), stay there during the day (Fig. 2g), and return to their residences in the evening (Fig. 2h).

Exposure classification error analysis

To investigate potential exposure measurement errors resulting from ignoring human mobility, flows between the quartiles of study subjects' static exposure and the quartiles of their dynamic exposure were plotted in Fig. 3a. A large share of the population was assigned to a different quartile, especially study subjects with static exposures in the middle quartiles (Q2 and Q3): about one-third of the population in the middle quartiles was placed in a different quartile when human mobility was omitted from exposure measurement. Three exposure classification groups were then identified by quantifying the difference between individuals' static and dynamic exposures according to Eq. (3). Table 1 gives summary statistics, and Figure 3b–d shows the distributions of static and dynamic exposures for each group.
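The quartile flows can be sketched as a cross-tabulation of each person's static-exposure quartile against their dynamic-exposure quartile. The sketch assumes, as an illustration, that each distribution is cut at its own quartiles using Python's `statistics.quantiles` (default "exclusive" method); the exposure values are hypothetical.

```python
from collections import Counter
from statistics import quantiles

def quartile_labels(values):
    """Label each value 1-4 using the distribution's own quartile cut points."""
    cuts = quantiles(values, n=4)               # three cut points: Q1, median, Q3
    return [1 + sum(v > c for c in cuts) for v in values]

def quartile_flows(static, dynamic):
    """Count people by (static quartile, dynamic quartile) pair."""
    return Counter(zip(quartile_labels(static), quartile_labels(dynamic)))

def share_reassigned(flows, quartile):
    """Fraction of people in a static quartile whose dynamic quartile differs."""
    in_q = sum(n for (qs, _), n in flows.items() if qs == quartile)
    moved = sum(n for (qs, qd), n in flows.items() if qs == quartile and qd != quartile)
    return moved / in_q if in_q else 0.0

# Hypothetical exposures (ug/m3) for eight people; two middle values swap ranks.
static = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dynamic = [1.0, 2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 8.0]
flows = quartile_flows(static, dynamic)
print(share_reassigned(flows, 2), share_reassigned(flows, 3))
```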

Fig. 3

The distribution of exposure measurement error: (a) direction of potential PM2.5 exposure misclassifications between static exposure and dynamic exposure; (b) distribution of static and dynamic exposures for the Accurate group; (c) distribution of static and dynamic exposures for the Overestimated group; (d) distribution of static and dynamic exposures for the Underestimated group

Table 1 Summary statistics of static and dynamic PM2.5 exposures (μg/m³) across exposure classification groups (***p < 0.001)

Table 1 shows that the Accurate group is the largest. For about 80% of the observations, the difference between static and dynamic exposures is negligible (statistically significant but not practically meaningful), and Figure 3b shows how closely the two distributions align. The Overestimated group is the smallest (9% of the study sample). For individuals in this group, mean static exposure was 0.96 μg/m³ higher than mean dynamic exposure, a large (about 10%) and significant difference. This group also has the highest static exposure level of all groups (Fig. 3c). The Underestimated group accounts for the remaining 12% of observations. Its mean difference between static and dynamic exposures is 1.15 μg/m³, or about 17%, even larger than that of the Overestimated group (Fig. 3d).

Random forest results

Model performance and variable importance

The descriptive analysis revealed mobility and sociodemographic differences across exposure classification groups. Random forest models were then trained on the same set of mobility and sociodemographic variables to examine their associations with exposure classification errors. Table 2 lists all mobility and sociodemographic variables used in the random forest model, with summary statistics. As noted earlier, three exposure classification groups were defined, and the random forest classification algorithm was used to develop a predictive classification model based on individuals' mobility patterns, residential pollution levels, and sociodemographic characteristics. The random forest model used 1,500 decision trees with a minimum of 50 samples per leaf.

Table 2 Descriptive statistics of the mobility and sociodemographic variables used in the analysis of exposure classification errors

The random forest model yielded a mean classification accuracy (adjusted across all classes) of 71%. Figure 4a presents the confusion matrix for evaluating the random forest model’s performance. The sensitivity values for the Accurate, Overestimated, and Underestimated groups are 73%, 71%, and 70%, respectively, implying good agreement between actual and predicted classifications.
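Class-wise sensitivity and specificity, and a mean accuracy, can be computed from a confusion matrix as sketched below. The counts are hypothetical, chosen only so that the row-wise sensitivities equal the reported 73%, 71%, and 70%; the study's actual matrix is shown in Fig. 4a. If the reported mean classification accuracy is the balanced accuracy (the mean of the class-wise sensitivities, which is an assumption here), these sensitivities average to about 71.3%, consistent with the reported 71%.

```python
def classwise_metrics(cm, labels):
    """cm[i][j] counts test cases whose actual class is labels[i] and predicted class is labels[j]."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                                  # actual k, predicted otherwise
        fp = sum(cm[i][k] for i in range(len(labels))) - tp   # predicted k, actually otherwise
        tn = total - tp - fn - fp
        metrics[label] = {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}
    return metrics

def balanced_accuracy(metrics):
    """Mean of the class-wise sensitivities."""
    return sum(m["sensitivity"] for m in metrics.values()) / len(metrics)

labels = ["Accurate", "Overestimated", "Underestimated"]
cm = [[73, 15, 12],   # hypothetical counts, not the study's matrix
      [14, 71, 15],
      [18, 12, 70]]
m = classwise_metrics(cm, labels)
print(round(balanced_accuracy(m), 3))
```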

Fig. 4

Random Forest model performance: (a) confusion matrix of predicted exposure classification groups; (b) variable importance rank

Figure 4b shows the relative importance of the predictor variables in the random forest classification, sorted in descending order. Daily trip distance, hours spent out of home, household income, and residential pollution level are among the most important features. By contrast, ethnicity, employment status, and education play weaker roles in exposure classification errors. These findings suggest that an individual's exposure measurement error is mainly affected by their mobility level, income, and pollution level at the residence.

Partial dependence analysis

Partial dependence plots illustrate the marginal effect of a single variable on the predicted classification outcome. Based on the variable importance results, the partial dependence of the classification probabilities on daily trip distance, hours spent out of home, household income, and residential pollution level was examined. Figure 5 plots the partial dependence of these variables for all groups.

Fig. 5

Partial dependence (PD) plots for the most important variables in the random forest classification model for (a) the Accurate group, (b) the Overestimated group, and (c) the Underestimated group

As shown in Fig. 5a, a higher probability of belonging to the Accurate group was associated with shorter daily trip distances, fewer hours spent out of home, and higher residential pollution levels. Household income displays a nonlinear relationship with the probability of belonging to the Accurate group, with the largest marginal effect at around $80,000. The middle column of Fig. 5a shows a two-dimensional partial dependence plot of daily trip distance and hours spent out of home, which explores the joint effect of the two mobility variables on the probability of belonging to the Accurate group. The color scheme represents probability levels: yellow tones indicate lower probabilities, and purple tones indicate higher probabilities. The two-dimensional plot shows that individuals who travel longer distances and spend more time away from home are the least likely to be classified into the Accurate group.

A similar effect of mobility variables on classification probability can be observed in Fig. 5b,c. The probabilities of belonging to both the Overestimated and the Underestimated groups increase with daily trip distance and hours spent out of home. The two-dimensional plots indicate that exposure is more likely to be overestimated or underestimated for people with high mobility levels. Although mobility variables had similar effects on the magnitude of exposure classification error in the two groups, household income and residential pollution level were associated with opposite directions of error. A higher probability of belonging to the Overestimated group was associated with lower household income and higher residential pollution levels (Fig. 5b). By contrast, lower household income and higher residential pollution levels were associated with a lower probability of belonging to the Underestimated group (Fig. 5c). These opposite associations suggest that, as mobility increases, exposures are more likely to be overestimated for low-income residents living in highly polluted areas and underestimated for high-income residents living in areas with cleaner air.

Exposure-health effect analysis

Figure 6 shows the regression coefficients (with 95% confidence intervals (CIs)) relating exposure (static and dynamic) to health outcomes (rates of emergency department visits for asthma and heart attack) across three cohorts: exposure classification groups, racial groups, and income groups. Positive associations between exposure and adverse health outcomes are observed in Fig. 6. However, the coefficients of static and dynamic exposures differed significantly across groups, indicating that exposure measurement errors associated with omitting human mobility can bias the estimated association between exposure and health outcomes. For the Accurate group, the coefficient of static exposure is similar to that of dynamic exposure (2.42 vs. 2.50 for asthma and 0.21 vs. 0.22 for heart attack). However, for people whose exposure is overestimated, the effect of PM2.5 exposure on the risk of asthma (1.85) and heart attack (0.27) is greater than that estimated using static exposure (1.35 and 0.20). Conversely, for people whose exposure is underestimated, health risks related to PM2.5 exposure tend to be overstated by static exposure. For the Underestimated group, a 1 μg/m³ increase in static PM2.5 exposure is associated with an increase of 1.21 emergency department visits for asthma per 10,000 persons, whereas the estimate falls to 0.74 when human mobility is considered in the exposure measurement (Fig. 6a).

Fig. 6

Sensitivity analysis of exposure-health effect

The effect of PM2.5 exposure on the risk of asthma and heart attack is underestimated for most racial and income groups when human mobility is ignored in exposure measurement (Fig. 6c–f). Hispanics, blacks, and low-income residents are disproportionately burdened with health risks associated with air pollution, which is consistent with findings in the existing literature (Bae et al. 2007; Gilbert and Chakraborty 2011; Houston et al. 2004). The sensitivity analysis suggests that the PM2.5-related health risks of socially disadvantaged groups are likely to be underestimated because of the exposure mismeasurement introduced by ignoring human mobility.

Discussion

Overlooking human mobility may lead to incorrect exposure assessment and misleading conclusions, and thus to inefficient public health policy solutions (J. Kim and Kwan 2021; Park and Kwan 2017). A growing body of research has highlighted the importance of human mobility in air pollution exposure assessment (Dewulf et al. 2016; Ma et al. 2020; M. M. Nyhan et al. 2019; Park 2020), but little is known about how distinct mobility patterns affect exposure measurement errors and how these errors influence exposure-health effects. This study contributes to the literature by investigating the underlying factors contributing to exposure measurement errors. The results indicate that an individual's mobility level is the most critical factor in determining exposure measurement errors. Exposure measurement errors increase with mobility: individuals with high mobility, especially those who travel long distances and spend more time away from home, have the largest exposure measurement errors.

There is also a significant correlation between individuals’ sociodemographic characteristics and exposure measurement errors. Household income has the greatest effect on exposure measurement errors, likely because wealth plays a key role in determining where people live, what occupations they hold, and which places they frequently visit (Sampson 2019). The results suggest that household income tends to drive the direction of exposure measurement errors. Air pollution at the residence is another critical factor influencing exposure measurement errors. On average, as mobility increases, exposure is likely to be overestimated for low-income residents of neighborhoods with poor air quality, while it is typically underestimated for high-income residents of neighborhoods with cleaner air. This finding is consistent with prior empirical research: exposures of individuals who are less exposed at the residence are likely amplified by their mobility, whereas exposures of people with high residential exposure are usually attenuated (Dewulf et al. 2016; Picornell et al. 2019; Tayarani and Rowangould 2020; Yu et al. 2020). One likely explanation is that people from neighborhoods with cleaner air are more likely to carry out their daily activities in neighborhoods with poorer air quality (Boeing et al. 2023). In contrast, residents of neighborhoods with high air pollution tend to engage in daily activities in neighborhoods with less air pollution than their residential neighborhoods (J. Kim and Kwan 2021; Lu 2021).

The results show that the relative exposure measurement error is larger for wealthier people because they often reside in neighborhoods with cleaner air, so their residence-based exposures start from a much lower baseline than those of people with financial constraints. However, overall exposure and the burden of health risks are much higher for more disadvantaged populations because they are more likely to live in polluted neighborhoods. People tend to spend more of their time at home than elsewhere, even those with high mobility levels (Lu 2021; Park 2020). If socially disadvantaged people stay in or near their residential neighborhoods for most of the day, or spend much more time in transit to travel shorter distances, either or both of these mobility patterns can lead to much higher exposures but smaller exposure measurement errors. Given the large contribution of residential air pollution to an individual’s overall exposure, people who live in more polluted neighborhoods are still likely to have higher exposures than those living in less polluted neighborhoods, even if their exposure is overestimated.
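
The point that a lower residential baseline inflates the relative error can be checked with simple arithmetic. The sketch below assumes the relative error is expressed as a fraction of the residence-based (static) value; that definition, and both individuals' exposure values, are hypothetical illustrations rather than study data.

```python
def relative_error(static: float, dynamic: float) -> float:
    """Signed error of the residence-based (static) estimate as a fraction of
    the static value; positive = overestimated, negative = underestimated.
    Using the static value as the denominator is an assumption of this sketch."""
    return (static - dynamic) / static


# Hypothetical individuals (μg/m³); values are illustrative, not study data.
people = {
    "polluted_home": {"static": 16.0, "dynamic": 14.0},
    "clean_home": {"static": 6.0, "dynamic": 8.0},
}

for name, e in people.items():
    abs_err = e["static"] - e["dynamic"]
    rel_err = relative_error(e["static"], e["dynamic"])
    print(f"{name}: absolute {abs_err:+.1f}, relative {rel_err:+.1%}, dynamic {e['dynamic']:.1f}")
```

Both individuals have a 2 μg/m³ absolute error, yet the relative error is about 12.5% for the polluted-home resident and about 33.3% for the clean-home resident, while the polluted-home resident still has the higher mobility-based exposure (14 vs. 8 μg/m³).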

Moreover, exposure mismeasurement can bias the estimated correlation between air pollution exposure and health outcomes, which may in turn bias estimates of public health impact. Both the direction and the magnitude of exposure measurement error can distort estimates of the exposure-health effect. Our results show that the exposure-health effect may be underestimated for individuals with overestimated exposure. Conversely, for those whose exposure is underestimated, their health risks after exposure to PM2.5 tend to be overstated. Biased exposure-health effects arising from exposure mismeasurement can, in turn, lead to ineffective public health and environmental interventions. Low-income and ethnic minority populations face greater financial constraints (Bae et al. 2007). They are also more exposed to air pollution, which makes them doubly disadvantaged (Bae et al. 2007; Elliott and Smiley 2019; Gilbert and Chakraborty 2011; Houston et al. 2004; J. Kim and Kwan 2021). Low-income people generally spend more time exposed to traffic while commuting or live in areas with poorer air quality, which increases their exposure. In addition, a higher incidence of comorbidities, nutritional deficiencies, and less access to information and education due to limited economic resources make socially disadvantaged groups more vulnerable. Policymakers should account for individuals’ exposure not only at their places of residence but also at all other activity locations. Accurate exposure measurements can help policymakers develop public health policies that reflect the interests of all people.
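
Why overestimated exposure tends to attenuate the fitted exposure-health coefficient, and underestimated exposure tends to inflate it, can be shown with a deliberately simplified simulation. The sketch assumes that the outcome depends linearly and without noise on mobility-based exposure, and that residence-based exposure equals mobility-based exposure multiplied by a hypothetical factor `k`; the slope, intercept, and `k` values are illustrative only, and the study's actual model is not reproduced here.

```python
import numpy as np

TRUE_SLOPE = 1.0  # hypothetical effect per μg/m³ of mobility-based exposure
INTERCEPT = 5.0   # hypothetical baseline outcome rate


def fitted_slope(exposure: np.ndarray, outcome: np.ndarray) -> float:
    """Ordinary least-squares slope of outcome regressed on exposure."""
    slope, _intercept = np.polyfit(exposure, outcome, deg=1)
    return float(slope)


rng = np.random.default_rng(0)
dynamic = rng.uniform(5.0, 20.0, size=500)  # stand-in for mobility-based exposure
outcome = INTERCEPT + TRUE_SLOPE * dynamic   # noiseless outcome, for clarity

# Residence-based exposure modeled as dynamic exposure times a hypothetical factor k.
for label, k in [("Overestimated", 1.3), ("Accurate", 1.0), ("Underestimated", 0.7)]:
    static = k * dynamic
    print(f"{label}: slope on static = {fitted_slope(static, outcome):.3f}")
```

Under this multiplicative model the fitted slope on static exposure is exactly `TRUE_SLOPE / k`, so overestimation (k > 1) shrinks the coefficient and underestimation (k < 1) enlarges it, the same directions described above. A constant additive offset (static = dynamic + c) would leave the slope unchanged, so real measurement errors, which vary between individuals, will not follow this simple scaling.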

Several limitations of this study need to be addressed in future research. First, the human movement data used in this study were simulated from an activity-based travel demand model. Because this model simulated people’s daily travel trajectories for a typical weekday in 2019, we assumed that individuals have constant activity patterns throughout the year. However, people’s daily mobility patterns are not consistent over time and may vary across weekdays, weekends, or seasons (Susilo and Kitamura 2005; Xianyu et al. 2017). It is unclear whether travel behavior on other days (e.g., weekends, holidays) would produce the exposure classification patterns identified in this study. Future research should therefore collect human movement data covering multiple time periods to study how changes in travel behavior over time affect exposure measurement error.

Second, only ambient PM2.5 exposure was estimated because indoor PM2.5 data were not available. Recent evidence shows that people spend more time indoors (e.g., at home, at work, and at school) during the day, especially those with less mobility (Lu 2021; Park 2020). Staying indoors may provide some protection from sources of ambient air pollution (e.g., traffic emissions), which could lead to different results in air pollution exposure assessments. Future exposure research should consider both indoor and outdoor PM2.5 concentrations to measure individual exposure more accurately.

Third, Los Angeles has a distinctive demographic composition and land use layout. As a result, the spatiotemporal mobility patterns and the ground-level PM2.5 concentration distribution depicted in this study reflect the distinct features of the study area. Population mobility patterns, spatiotemporal variability in air pollution concentrations, sociodemographic mix, and land use layout are expected to vary across regions. Further research is needed to examine whether the findings of this study apply to other areas.

Conclusion

Ignoring human mobility in exposure estimates can lead to erroneous exposure assessments and ineffective policies. Prior research has emphasized the importance of human mobility in estimating air pollution exposure, but little is known about how human mobility might lead to exposure measurement errors. To fill this gap, this study assesses residence-based and mobility-based PM2.5 exposures for 100,784 Los Angeles County residents and examines the impact of mobility and sociodemographic variables on potential exposure measurement errors. Detailed human mobility data were integrated with hourly PM2.5 surfaces at the neighborhood level. The findings suggest that the magnitude of exposure measurement error is linked to people’s mobility levels, whereas individuals’ sociodemographic variables drive its direction. Individuals with high mobility levels are likely to have larger exposure measurement errors. High income and low residential pollution are associated with exposure underestimation, and low income and high residential pollution are associated with exposure overestimation. The exposure measurement error introduced by the residence-based method can further lead to erroneous conclusions about the relationship between exposure and health risks. Policymakers should take human mobility and sociodemographic characteristics into account in exposure assessment and ensure that their policies reflect not only the preferences of socially advantaged populations but also the interests of disadvantaged populations.