Introduction

Exposure to air pollution is estimated to cause 1.3 million deaths worldwide/year in urban areas, and emissions from road traffic account for a significant share of this burden (WHO 2011). Therefore, an accurate assessment of human exposure is crucial for a correct determination of the association between the traffic-related air pollutants and the negative health outcomes (Hertel et al. 2008).

Exposure estimates to atmospheric pollutants can address individuals (personal exposure) or large population groups (population exposure) and can be based on direct (exposure monitoring) or indirect methods (exposure modelling) (Zou et al. 2009). In practice, monitoring of personal exposure is limited to studies with a small number of individuals because of the high costs associated with the measurements. Similarly, air quality time series provided by a monitoring network are frequently used as a good individual exposure indicator. Nevertheless, this estimate has been found to correlate poorly with personal exposures (Kousa et al. 2002; Koistinen et al. 2001; Oglesby et al. 2000) as it does not capture spatial heterogeneity in exposure to air pollution, time spent indoors and human mobility (Koistinen et al. 2001) thus leading to inaccuracies and underestimation of the health effects of air pollution (Szpiro et al. 2008; Peng and Bell 2010). In addition, the presence of individuals in direct vicinity to the emission sources may result in higher exposure concentrations than pollution levels registered at monitoring stations (Baklanov et al. 2007).

Consequently, exposure modelling technique arises as an alternative approach able to address the spatial and temporal variability of individual exposure concentrations and is recommended for exposure assessment (WHO 2002; Brunekreef and Holgate 2002; Brauer et al. 2002). Over the past 20 years, several exposure modelling methods have been developed with the aim of estimating exposure at the individual level. The major purpose of these models is to characterise air quality concentrations to be used as surrogate of personal exposure to air pollution and assumes that subjects within a demographic area (e.g. census units) are equally exposed to air pollution. Thus, over the past decade, air quality models have been integrated with geographic information system (GIS) in an attempt to reflect individual exposure by combining air pollutants concentration data with residence location (e.g. Gauderman et al. 2007; Hoek et al. 2008). To overcome some of these issues, several personal exposure models based on a microenvironment approach, such as AirPex (Freijer et al. 1998), SHEDS-PM (Burke et al. 2001), HAPEM (Özkaynak et al. 2008) and APEX (US Environmental Protection Agency 2009) are available. These models are designed to simulate the distribution of personal exposure in several microenvironments (e.g. outdoors, indoor-residential, public buildings, and workplaces) (Burke et al. 2001), by combining the estimated pollutant concentrations at every microenvironment and the time spent at visited microenvironments based on participant’s diary or time–activity measurement databases (Kruize et al. 2003; Georgopoulos et al. 2005; Özkaynak et al. 2008; US Environmental Protection Agency 2006; Health Effects Institute 2010; Klepeis et al. 2001). However, by using this time–activity location data, the information on the actual “activity space” of individuals required for high-resolution exposure modelling is rarely available (Health Effects Institute 2010) and individual air pollution exposure context can be assumed as a series of independent microenvironment exposures (Ballesta et al. 2008).

Recently, there has been an increasing focus on using global positioning system (GPS) technology to collect the individual trajectory information to be used in combination with air pollution levels to estimate personal air pollution exposure levels in urban areas (Jensen 2006; Gerharz et al. 2009, 2013). However, such models are still strongly dependent on questionnaires’ information to derive individual activity profile, providing exposure estimates only if the individual resides in a microenvironment which is specified in the model (Gerharz et al. 2013) and generally only indoor and in-vehicle microenvironments are identified, ignoring exposure during walking periods.

In this perspective, despite several efforts have been expended on characterising the spatial and temporal distributions of air pollution, much work remains in understanding the role of individual mobility in conditioning exposures in urban areas. Therefore, what has risen from literature reviews is that the time-sequence of exposure events is not preserved in exposure assessment, and the information to evaluate possible correlations in exposures to different pollutants because activities that are related in time are not conserved. The source–receptor relationship, especially for “hot-spots” peak exposure is still insufficiently addressed and the contribution of traffic-related air pollution to the total exposure is not clear (Health Effects Institute 2010; Wang et al. 2009). In addition, the development of innovative models that reduce uncertainties in exposure characterisation is required (Lioy 2010).

Recent findings highlight that the population mobility is one of the factors that may affect significantly the exposure (Nethery et al. 2008; Beckx et al. 2009; Dons et al. 2011; Tchepel and Dias 2011). In this sense, the knowledge of where individuals spend time is essential for assessment of human exposure to air pollution and research on human behaviour or activities is a crucial component of modern and future exposure science (Lioy 2010). The predictability in human dynamics by studying the mobility patterns of individuals using mobile phones became an emerging field (Song et al. 2010), and GPS technology presents as a promising tool by monitoring real-time geographic positions.

In this perspective, the GPS-based exposure model to traffic-related air pollution (ExPOSITION) is developed in the present work for quantification of human exposure to traffic-related air pollutants within distinct microenvironments using air pollution modelling with high spatial–temporal resolution and a novel approach based on trajectory analysis of individuals. Enhanced resources, such as GIS, GPS and data mining techniques are considered by the model to analyse the human behaviour and individual’s activities required for exposure assessment. For this purpose, information on detailed time–location data are collected for each individual at each moment by GPS-equipped mobile phones, offering many advantages over traditional time–location analysis, such as high temporal resolution and minimum reporting burden for participants (Rainham et al. 2010).

The GPS technology guarantees that there will be an increasing availability of large amounts of data affecting individual trajectories, at increasing localisation precision. However, significant uncertainties associated with the processing and classifying of raw GPS data is one of challenging issue for the exposure studies (Wu et al. 2010; Zheng and Zhou 2011). Thus, a trajectory data mining and geo-spatial analysis algorithm were developed and implemented within GIS to process the trajectories obtained with GPS-equipped mobile phones to automatically identify time–activity location in several microenvironments, such as commuting, indoor, and outdoor locations. The time–activity patterns of each individual are finally combined with information on pollutant concentrations that the individuals experiences at different microenvironments. For outdoor environment, atmospheric dispersion modelling and different modelling tools may be used to provide this external information on ambient concentrations for ExPOSITION model. Indoor concentrations are estimated considering not only outdoor/indoor infiltration factor but also the additional contribution of indoor emissions sources. In addition, a probabilistic approach are considered by the ExPOSITION model to characterise the variability of the microenvironmental parameters providing an additional knowledge on the variation associated with microenvironmental concentrations and its contribution to the individual exposure estimates. The development of the ExPOSITION model is presented and described.

Methodology—human exposure modelling

The ExPOSITION model is developed to assess average short (e.g. daily) and long-term (e.g. annual) inhalation exposures of the individuals to traffic-related air pollutants over urban spatial scale with high spatial–temporal resolution. For this purpose, air pollution concentrations are estimated for different microenvironments (described in “Microenvironmental concentrations”) and combined with detailed time–activity patterns obtained from data collected by mobile phones with GPS technology (described in “Trajectory data mining” and “Time–activity patterns”). The ExPOSITION modelling system developed and applied in this study is schematically presented in Fig. 1 and described in the following sections.

Fig. 1
figure 1

Conceptual framework of the ExPOSITION modelling system

Personal exposure is characterised by ExPOSITION model in terms of time-weighted average exposure concentration calculated from air pollutant concentration fields and time spent by individuals in different microenvironments (Eq. 1). It is important to highlight the distinction between air pollution “concentration” provided by dispersion models, and “exposure concentration” defined as amount of chemicals that comes into contact with the human body and take into account not only pollutant concentration fields but also the location of an individual and duration of the exposure. Thus, individual exposure is calculated by ExPOSITION as following:

$$ {\overline{E}}_i=\frac{1}{t_2-{t}_1}{\displaystyle \underset{t_1}{\overset{t_2}{\int }}C{\left(x,y,z,t\right)}_i\;\mathrm{d}t} $$
(1)

where \({\overline{E}}_i\) (μg m−3) is the average exposure concentration for person i, C i (x, y, z, t) (μg m−3) is the air pollutant concentration occurring at a particular point where the person i is located during the time t and spatial coordinate (x, y, and z) and t 1 and t 2 (hours) are the starting and ending times of the exposure event.

Exposure estimates are provided in μg m−3 and can be determined for each individual as hourly, daily or annual average and resulting data can be exported for further analysis (e.g. epidemiological analysis and health impact assessment).

Microenvironmental concentrations

Specific microenvironments are distinguished in the exposure model including residence, other indoors, outdoors and in-vehicle (Table 1). Two different approaches are considered to characterise pollution levels in these microenvironments. Thus, outdoor concentrations are estimated using atmospheric dispersion modelling and different modelling tools may be used to provide this external information for ExPOSITION as will be discussed in “Emission and air quality modelling”. For indoors and in-vehicle microenvironments a probabilistic approach was implemented as an integrated part of ExPOSITION algorithm. In this case, it is assumed that within a microenvironment the pollutants are homogeneously distributed and microenvironmental concentration C(x, y, z, t) (μg m−3) considered in Eq. 1 is calculated using a linear regression equation based on the outdoor/indoor infiltration factor α j (dimensionless) and additional contribution of indoor pollution sources expressed as βj (μg m−3):

Table 1 Parameters used to determine PM2.5 concentrations in different microenvironments
$$ C\left(x,y,z,t\right)={\beta}_j+{\alpha}_j\times C{\left(x,y,z,t\right)}_{\mathrm{ambient}} $$
(2)

where C(x, y, z, t)ambient (μg m−3) is the outdoor concentration that occurring in the immediate vicinity to the microenvironment j at time t and spatial coordinate(x, y, z).

Microenvironmental concentrations are estimated based on a probabilistic approach considered by the model that attempts to capture the variability in microenvironment parameters. In this sense, to calculate microenvironmental concentrations for each individual the ExPOSITION model randomly assigns the parameters β and α to each indoor location from empirical distributions taking into account the average and standard deviation obtained from literature review for each type of microenvironments (Table 1).

A single value is selected from the probabilistic distribution of each microenvironmental parameter α and β. These values are then used in the model to produce a single estimate of microenvironmental concentration. This process is repeated many times, with new values for each stochastic input parameter and probability distribution of exposure microenvironments is obtained.

Trajectory data mining

Trajectories of the individuals are required as one of the main inputs to the exposure modelling. The time–activity patterns are determined by the model based on an innovative approach developed for collection and analysis of data registered by mobile phones with GPS technology and thus providing the precise location and time spent at different microenvironments during the daily trajectories of individuals required for the exposure assessment.

Collection of time–location information using GPS technology provides continuous tracking of the individuals with high data resolution in time and in space. However, to overcome some of the significant uncertainties associated with the processing and classifying of raw GPS, automatic processing of GPS raw data using the trajectory data mining is implemented in ExPOSITION model.

To identify important patterns, several levels of GPS data processing are required (Fig. 2a). First, it is necessary to “clean” the GPS raw data to eliminate invalid entries. Subsequently, the places where the individual was stopped for a certain time period are distinguished from moving activities, like driving a vehicle. Finally, it is necessary to discover which of these points belong to the same activity/place. For this purpose, the data clustering process is implemented to distinguish significant places based on the analysis of spatial and temporal information of GPS points (Fig. 2a).

Fig. 2
figure 2

a Schematic representation of the trajectory data mining analysis; b GPS raw data, GPS “clean” trajectory and stay points detection

Thus, “significant places” are considered as those locations that play significant role in the activities of a person, carrying a particular semantic meaning such as the living and working places, the restaurant and shopping mall, etc., ignoring the transition between these places. Additionally, a “movement activity” is a composition of movements with a frequent regularity of location change over time which can be aggregated by the purpose of the trip of an individual.

A preliminary processing of GPS data is implemented as a first step to “clean” the data and converts it into a standard format in preparation for the clustering approach. For this purpose, an error-checking algorithm was developed to remove invalid points. This algorithm considers a measurement as valid, if the GPS receiver is able to see at least four satellites and if the horizontal dilution of precision value is below 6 (Fig. 2b). Otherwise the measurement is considered invalid. In addition, the algorithm evaluates incorrect entries of the travel speed.

GPS datasets provide information on the locations in coordinate form (e.g. latitude and longitude) but contains no semantic meaning (Zhou et al. 2007a) like the address or characteristics of location, i.e. type of microenvironments. Therefore, it is necessary to extract and distinguish in the GPS data the locations where the individual stopped for a certain time period and these locations are designated as “stay points”. A stay point represents a geographic location in which the individual stays for a certain time period and in addition to a raw GPS point carries a particular semantic meaning.

The algorithm to extract stay points from GPS data is iterative and it is based on searching for locations where the user has spent a longer time period (Li et al. 2008). As presented in Fig. 2b, the extraction of a stay point S from a user’s GPS trajectory P = {p 1, p 2, … , p K }, depends on two scale parameters: a distance threshold (D threh) and a time threshold (T threh). Thus, a single stay point S can be characterised by a group of consecutive GPS points p i containing latitude (p i.Lat), longitude (p i.Long) and time (p i.T):

$$ \begin{array}{l}S=\left\{{p}_{\mathrm{i}}\right\},\mathrm{where}\ m\le i\le n,\hfill \\ {}\mathrm{Distance}\left({p}_{\mathrm{m}},{p}_{\mathrm{i}}\right)\le {D}_{\mathrm{threh}}\;\mathrm{and}\hfill \\ {}\left|{p}_{\mathrm{n}}.T\hbox{--} {p}_{\mathrm{m}}.T\right|\ge {T}_{\mathrm{threh}}\hfill \end{array} $$

A pre-processing of the GPS data and detection of the stay points is important to extract some important locations. However, the repetition of the same locations is not considered and each time that a location is discovered it is assumed as a new location. To overcome this problem, a second level analysis to group up different stay points with the same semantic meaning is implemented using cluster analysis.

Clustering is a data mining technique focused on detecting hidden groups, or clusters, among a set of objects (Bock 1996). In this study, to group the points belonging to the same premises, and thus define the personally significant places, a density-based clustering algorithm DJ-Cluster (Zhou et al. 2004, 2007a, b) was implemented. The DJ-Cluster algorithm is selected and applied in this study, as it is less vulnerable to noise and does not require the number of places as a parameter. However, the algorithm depends excessively on the density of the points and does not give importance to the time spent in each site, i.e. duration, which will be relevant for the exposure quantification.

In the clustering algorithm, the neighbourhoods within distance Eps are analysed for each point. If at least a minimum number (MinPts) of such neighbourhoods is found, the points are either grouped as a new cluster or joined with an existing cluster, and a significant place is created. Otherwise, the point is labelled as a moving activity (e.g. being in vehicle microenvironment) (Fig. 3). The following conditions define the density-based neighbourhood of a point and density-joinable relationships (Zhou et al. 2007a):

  1. (a)

    Density-based neighbourhood of a point:

    The density-based neighbourhood N of a point p, denoted byN(p), is defined as:

    $$ N(p)=\left.\left\{q\in Q\left|\mathrm{dist}\right.\right.\left(p,q\right)\le \mathrm{Eps}\right\} $$
    (3)

    where Q is the set of all points, qis any point in the sample, Eps is the radius of a circle around p to defines the density. The following condition is also needs to be satisfied for N(p):

    $$ \#N(p)\ge \mathrm{MinPts} $$
    (4)

    where MinPts is the minimum number of points required in that circle.

  2. (b)

    Density-joinable:

    N(p) is density-joinable to N(q) denoted as J(N(p), N(q)), with respect to Eps and MinPts, if there is a point such that both N(p) and N(q) contain it.

    For the objectives of this study, the sites are identified as personally significant places taking into account two variables: density and duration. In this perspective, DJ-Cluster algorithm was changed to implement additional condition based on duration of stay, as presented in Fig. 3. Thus, N(p) defined in Eq. 3 needs to satisfy simultaneously two conditions:

    $$ \#N(p)\ge \mathrm{MinPts}\kern0.5em \wedge N(p)\cdot \mathrm{duration}\ge \mathrm{MinDrt} $$
    (5)

    where MinDrt is a parameter that represents the minimum duration at a location. Thus, p can be considered as a cluster or merged with an existent cluster in case that has a minimum number of points required MinPts and a minimum duration MinDrt.

    Fig. 3
    figure 3

    Flowchart of the clustering process

    These data are further analysed within GIS environment for classification of microenvironments and to obtain information on time–activity patterns.

Time–activity patterns

Location of the individuals in space and in time is required to estimate individual exposure in a combination with pollutants concentration fields provided by air pollution dispersion model.

To obtain information on time–activity patterns the significant places and movement activities extracted from the trajectory are further analysed within GIS environment to cross this information with other geo-spatial information. For this purpose, geoprocessing of GPS data is performed using ModelBuilder module provided by ArcGis 10. ModelBuilder can be thought of as a visual programming language for building workflows in which it is possible to create, edit, and manage geospatial analysis (Allen 2011).

The geoprocessing of GPS data is accomplished by considering analytical functions and several predefined criteria based on speed, time and spatial location register for the trajectory points to classify the significant places and movement activities to three activity categories: indoor, outdoor and in vehicle travel. The detailed GIS-maps are used to identify and to classify the microenvironments.

An indoor activity is distinguished from outdoor based on the time register. If the spending time in that point is equal or higher than 10 min, based on several tests conducted in this study and as presented by Ashbrook and Thad (2003), the significant place is identified as an indoor activity, and it is geographically located to the nearest indoor microenvironment, acquiring the entire attribute data associated to this microenvironment, such as microenvironment type (residence, workplace, restaurant, etc.). Additionally, the speed value is analysed to distinguish outdoor activity from in vehicle travel. However, the higher speed values registered during driving a vehicle are not sufficient to identify a movement activity. In addition, activities like being static outdoor and in the traffic jam are difficult to distinguish based on speed criteria only. Thus, if the speed value is less than the speed of walking of 2 km h−1 (Transportation Research Board 1994), the distance between the identified point and the nearest road will be analysed. If there is no intersection with the road network, the significant place is identified as an outdoor microenvironment, such as being in a park, sitting on a terrace, etc. Otherwise, vehicle microenvironment is identified.

Finally, this detailed time–activity patterns for each individual will be linked with the pollutants concentration fields varying in space and in time provided by air pollution dispersion model described in the next section, allowing to produce exposure estimates within distinct microenvironments.

Emission and air quality modelling

Air quality modelling allows establishing the relationships between current emissions and current air quality at particular locations. Information on variability of air pollutant concentrations is essential for the exposure quantification and these data may be provided for ExPOSITION by any modelling tools if it is compatible with their requirements in terms of spatial and temporal data resolution.

In this work, hourly traffic emissions required by the air quality model were estimated using the Transport Emission Model for Line Sources (TREM). The emission factors considered by TREM depend on average speed, fuel type, engine capacity and emission reduction technology. A new version Transport Emission Model for Hazardous Air Pollutants (TREM-HAP) prepared to calculate HAPs emissions (Tchepel et al. 2012) has been used to provide inputs for AUSTAL2000 dispersion model.

AUSTAL2000 is the official reference air dispersion model of the German Regulation on Air Quality Control for short-range applications (Janicke and Janicke 2002; Janicke 2004). The model is based on Lagrangian approach that simulates the dispersion of air pollutants by utilising a random walk process. Three-dimensional diagnostic wind fields is calculated based on a given initial wind profile and a given terrain profile and/or set of building shapes. Additionally, the vector of the turbulent velocity is randomly varied for every particle by using a Markov process (Janicke 2002; VDI 2000). The fundamental equation for the Lagrangian atmospheric dispersion of a single pollutant is given by Eq. 6.

$$ C\left(x,t\right)={\displaystyle \underset{0}{\overset{t}{\int }}{\displaystyle \int P\left(x,t\Big|{x}^{\hbox{'}},{t}^{\hbox{'}}\right)}\kern0.3em S}\left({x}^{\hbox{'}},{t}^{\hbox{'}}\right)\;\mathrm{d}{x}^{\hbox{'}}\mathrm{d}{t}^{\hbox{'}} $$
(6)

where C(x, t) is the average pollutant concentration in x at time t, S(x′, t′) is the source term and P(x, t|x′, t′) is the probability density function (PDF), that the hypothetical parcel moves from the point x′ at time t′ to the point x at time t. Therefore, if actual paths of the portions of air can be obtained, the simple calculation of the density of trajectories points provides an estimate of the concentration (Graff 2002).

The main objective of AUSTAL2000 application in the current study is the calculation of atmospheric dispersion of substances, including PM fractions (four different classes of the aerodynamic diameter) allowing to establish relationship between emissions and air quality and to provide hourly pollutants concentration fields. Additionally to input data on emissions, a continuous time series of meteorological parameters, including wind direction, wind speed and atmospheric stability are required by AUSTAL2000.

Currently, several studies using the AUSTAL2000 are available, as well comparative analyses with other dispersion models (Yau et al. 2010; Langner and Klemm 2011; Merbitz et al. 2012; Gerharz and Pebesma 2012).

Model application

The Leiria urban area, situated in the central part of Portugal and covering eight sub-municipality units, was selected in this study for the model application. The study domain covering an area of 4.5 × 4.5 km2 with 20-m grid resolution and a complex orography, containing about 5,000 buildings considered as obstacles for the air dispersion modelling. The Leiria urban area and road network considered in this study for the exposure quantification are presented in Fig. 4.

Fig. 4
figure 4

a Data recording screen from mobile phone; b spatial visualisation of the GPS raw data recorded

In this study, given the huge health impacts of PM even at levels of exposure currently being experienced by most urban populations (WHO 2006), the new exposure modelling tool was applied to provide a quantitative assessment of the personal exposure to PM2.5.

Hourly PM2.5 emissions from road traffic were estimated by TREM based on the traffic volume for each road. For this purpose, data reported by Pinto et al. (2008) were used to characterise the number of vehicles for each road link. To estimate PM2.5 concentrations hourly simulations were conducted with AUSTAL2000 model taking into account hourly meteorological conditions, such as prevailing wind direction and wind speed and background concentrations measured at one monitoring station located in sub-urban area of the city with a temporal resolution of 15 min. In addition, the model system includes a diagnostic wind field model to account for terrain profile and buildings structures.

To characterise the variability in input parameters used to calculate microenvironmental concentrations (Eq. 2), a set of random inputs characterising the infiltration factor α and the contribution of indoor pollution β are generated for each microenvironment. The PDF for both parameters is determined using the information presented in Table 1. A combination of random values is used to create 625 independent inputs for each microenvironment to be considered by ExPOSITION for the exposure estimations.

TTGPSLogger tracking system (TTGPSLogger 2012) was used to collect GPS data providing trajectories of five individuals during a working day in November 2010. TTGPSLogger is a GPS logger software for Symbian S60 allowing to store detailed time–location information on geographic coordinates, speed and time at regular time intervals during its use over the daily activities of individuals. In addition, information on the positioning accuracy of GPS receiver is provided (number of satellites, position dilution of precision, etc.). The GPS tracking log can be written in NMEA, GPX or KML format. For proper implementation of the trajectory data mining analysis, the GPS data was collected in one-second intervals.

Results and discussion

In this section, the results obtained with newly developed modelling tool for short-term PM2.5 exposure quantification are presented and discussed.

GPS data processing

TTGPSLogger tracking system installed on mobile phone is used to collect real-time latitude-longitude position of individuals, speed and time during their daily activities (Fig. 4a). This information was stored in a GPX file format that is compatible with GIS systems presenting very useful to analyse the spatial distribution of large amount of GPS raw data collected. Thus, during a typical working day of one of the individuals analysed in this study, 30,179 GPS raw points with a temporal resolution of 1 s are collected by TTGPSLogger tracking system. However, some of the collected GPS data points with invalid information, such as incorrect entries of speed values achieving maximum of 650 km h−1, are identified.

Most of the invalid measurements observed in this study are from areas where the individual has stayed indoors due to the obstruction of the GPS signal inside of buildings. Furthermore, there are some situations where the GPS receiver located inside buildings does not lose the signal but the data collected are affected by significant errors achieving about 60 m of distance from the actual position. Another limitation observed is a gap of GPS information during some periods (from 15 s to 10 min) depending on the GPS status.

Taking into account the limitations detected during the analysis of GPS raw data, cleaning of the data and their processing are required to predict the time–activity patterns (Fig. 5).

Fig. 5
figure 5

Example illustrating the data processing applied to GPS raw data

In Fig. 5, a sequence of images with a zoom to the workplace of the individual 4 is presented to illustrate the formation of three clusters detected from the data. The first image corresponds to the raw GPS points recorded. Subsequently, the data cleaning allows to remove invalid points from collected GPS raw points. However, this approach is only a pre-processing to detect errors and inconsistencies in data. The stay point detection algorithm reduces significantly the number of GPS points that are consequently used for the clustering. In the example presented in Fig. 5, one significant place and two movement activity clusters are identified from the set of stay points.

The locations resulting from the clustering algorithm are further analysed within GIS environment to cross this information with other geo-spatial information and to obtain detailed time–activity patterns classified by different types of microenvironments. Thus, in the case of individual 4, 30,179 collected GPS raw points resulted in 15,978 stay points, originating 295 locations that are linked with the pollutants concentration in distinct microenvironments to assess its individual exposure.

Emission, air quality and meteorological data

To estimate human exposure to PM2.5, hourly traffic emissions and air pollutants concentrations were estimated. Figure 6a illustrates the spatial variations in hourly traffic-related emissions across the study area obtained by linking TREM-HAP outputs to GIS maps. As could be seen in the figure, higher emission values are observed for main city entrances.

Fig. 6
figure 6

Spatial distribution of a hourly PM2.5 emissions (g.km-1), b daily average PM2.5 concentration (μg m−3) and time spent by the individual in each microenvironment

Apart from road transport, other emission sources that play an important role for the study area are characterised in the current study by considering the background concentrations of PM2.5. As mentioned in “Model application” the background concentrations of PM2.5 and meteorological data were monitored in a sub-urban location at one fixed monitoring station. It should be mentioned that the study period was analysed and selected for this work as the PM2.5 concentrations observed for this sampling period were not affected by long-range transport.

The meteorological data obtained during the sampling period are presented in the Fig. 7a. As could be seen, the higher wind intensities are achieved with winds blowing from the North, which is also the predominant wind direction, although there is a significant contribution of the East direction. As regards the variation throughout the day, generally the wind speed gradually increases, reaching maximum values at 6 p.m. and minimum values during the night. Therefore, this information on prevailing wind direction and wind speed for each hour, terrain profile and building shapes of the city are explicitly considered by the meteorological module and reflected in the modelled PM2.5 concentrations.

Fig. 7
figure 7

a Hourly wind speed obtained from measurements as a function of wind direction; b temporal variation of hourly background PM2.5 concentrations

The time series of hourly background concentrations of PM2.5 measured from fixed monitoring station is presented in the Fig. 7b evidencing a gradually decrease for PM2.5 concentrations during the night-time period. The lower concentrations of PM2.5 are observed for 11 a.m., affected by the lower emissions that were observed during the same time period, thus influencing consequently PM2.5 concentrations. Conversely, the hourly maximum value of 29.1 μg m−3 are reached at 12 a.m.

The spatial distribution of the air pollutants concentration obtained by the AUSTAL2000 is presented in Fig. 6b showing that distribution of pollution levels within the study domain is not homogeneous. In addition, time–activity patterns obtained for one of the individuals are presented in the figure as an example. The analysis of results examines the PM2.5 concentration variation in space and in time provided by air pollution dispersion model and the influence of time spent in each microenvironment type. Thus, these findings enhance the importance of taking into account the high spatial and temporal variations in outdoor concentrations, the “microenvironmental” variations imposed by a variety of indoor and outdoor locations and the time spent indoors to obtain accurate personal exposure estimates to air pollution.

Personal exposure modelling

The individual exposure assessment performed by the ExPOSITION model is presented in this section. For better understanding of the contribution of different microenvironments to the daily average PM2.5 exposure in the study area at a typical working day, several statistical parameters, including average individual exposure, 5th and 95th percentiles and extreme values were analysed (Table 2).

Table 2 Exposure concentration for PM2.5 (μg m−3) in different microenvironments

As could be seen in Table 2, the largest variability in the exposure concentration is identified for outdoor and residence microenvironment. Exposure concentration calculated for in vehicle are characterised by smaller variability range but higher absolute values in comparison with the other types of microenvironments. In addition, it is possible to verify that the variability in the PM2.5 exposure concentration in each microenvironment type is significant showing the importance to consider this variability in individual exposure modelling.

As expected, the indoor microenvironments represent a great relevance for the exposure of individuals (Fig. 8). Conversely, it is possible to verify that being outdoors represents a very low contribution to the exposure because corresponds only about 2 % of the time spent by individuals during their daily activities, which suggests that outdoor concentrations measurements should be used carefully for human exposure quantification. However, outdoor concentrations represent an important part of the pollution levels estimated for indoor microenvironments because of outdoor/indoor infiltration.

Fig. 8
figure 8

Distribution of time spent by individuals and average contribution of different microenvironments to daily individual exposure

To better understand the individual exposure obtained during the simulation period, a temporal variation of the exposure concentration was analysed as presented in Fig. 9. Several statistical parameters, including hourly average exposure concentration, 5th and 95th percentiles are analysed for each individual.

Fig. 9
figure 9

Temporal variation of individual exposure concentrations and outdoor concentrations of PM2.5

The results show that the five individuals are exposed to different PM2.5 concentrations during their daily activities, and a significant variability in PM2.5 exposures across the individuals is evident in Fig. 9. Analysing the individual exposure concentrations during night time (between 9 p.m. and 7 a.m., approximately), when the people stay in residence, the hourly exposure concentrations presents a similar trend with the outdoor concentrations but different magnitude. However, throughout the day and depending on the daily activity of the individuals the hourly average exposure concentrations tend to be more variable. The highest exposure levels are related with both the magnitude of pollutant concentrations and the time spent in specific microenvironments as, for example, could be seen in Fig. 9 for the individual 1 at 4 p.m.

Overall, the daily average exposure to PM2.5 predicted by the ExPOSITION model correspond to 10.6 μg m−3 in terms of the mean value for all individuals and 6.0–16.4 μg m−3 in terms of 5th–95th percentiles. Comparing the mean value obtained by the model and estimated from air quality measurements at a fixed point (11 μg m−3) represented in Fig. 6b, an agreement between the approaches is evidenced. However, the ExPOSITION model reveals additional inter- and intra-variability of individuals' exposure levels, suggesting limited representativeness of air quality concentrations obtained from point measurements to characterise individual exposure to urban air pollution.

The results obtained from the ExPOSITION model are in good agreement with the daily average exposure reported for other European cities such as Helsinki (9.9 μg m−3) (Koistinen et al. 2001) and Amsterdam (14.5 μg m−3) (Janssen et al. 2005). The current study shows that high PM2.5 exposure is mainly attributed to indoor microenvironments rather than outdoor, as also presented by Georgopoulos (2005). In this context, individual time activities patterns and time spent at different microenvironments during the day should be of prime concern in addition to the variability in the pollution levels, as presented by Burke et al. (2001).

Conclusions

Under this work, a GPS-based personal exposure model based on an innovative approach for trajectory analysis was developed. For this purpose, a time–activity pattern discovery sequence, based on trajectory data mining and geo-spatial analysis within GIS, was developed to extract useful time–location information from GPS raw data collected by a mobile phone with a GPS tracking system carried by the user during their daily activities. Taking into account the limitations detected during the analysis of GPS raw data, the results obtained during the several levels of GPS data analyses indicate that this approach could be used to analyse the human behaviours and activities required for exposure assessment.

Time series of individual exposure concentrations to PM2.5 are presented for the entire study area characterising a person’s contact with a given pollution levels at different microenvironments. The results show a significant contribution of indoor microenvironments to the total exposure values thus stressing that individual exposure depends not only on the exposure pollution levels but also on the time spent in the microenvironment during the day. In addition, the low contribution of outdoor environment to the daily individual exposure suggests that the concentration peaks of air pollution are not co-located in time and space with the time period that the individual spend outdoors.

The uncertainty of the exposure estimates are not addressed in the current application of the ExPOSITION model. Uncertainties in indoor and outdoor concentrations could be both relevant and should be taken into account in ExPOSITION model as they contributed to the total exposure uncertainties with varying strengths depending on the time people spent in different environments. However, as mentioned before, the uncertainty related to the human mobility during the exposure assessment period is overcome by the exposure modelling approach developed in this study.

Overall, by comparing the personal exposure modelling results with fixed-point measurements, the ExPOSITION results clearly outperforms the traditional approach of using urban background measurements as proxy for the personal exposure.

In contrast to exposure model approaches currently available (STEMS (Gulliver and Briggs 2005); Beckx et al. 2009; Gerharz et al. 2009, 2013), our model approach focuses not only on the indoor, outdoor or journey-time exposure to outdoor concentration but provides exposure estimates for the whole activity profile including the contribution of indoor emission sources and the variability of microenvironmental parameters. Additionally, current exposure models lack the temporal and spatial resolution of the presented approach that can therefore be considered for application areas different from those of the existing models, focussing on the exposure at the individual level.

The methodology developed and applied in this study allows to estimate and analyse the magnitude, frequency and the inter- and intra-variability of personal exposure levels, as well the contribution of different microenvironments, clearly addressing the time-sequence of the exposure events and source–receptor relationship, which is essential for health impact assessment and epidemiological studies. The ExPOSITION model contributes to better understanding of individual exposure in urban areas, by providing information on individual exposure taking into account where individuals actually spend their time and the high spatial and temporal variations of the “microenvironmental” concentrations imposed by a variety of indoor and outdoor locations, essential in the selection of strategies to reduce exposure to urban air pollution and related health effects.

An application of the ExPOSITION model developed in this study to benzene and validation of the modelling approach against individual exposure measurements will be presented in the future work.