Introduction

In recent years, the concept of the exposome, which comprises every exposure to which an individual is subjected from conception to death, has been developed to draw attention to the critical need for more complete environmental exposure assessment in epidemiological studies (Wild 2005; Wild 2009; Wild 2011). Since the exposome requires consideration of both the nature of those exposures and their changes over time (Wild 2012), advances in mobile technology, sensors, and the “internet-of-things” have become very useful in exposure studies.

Determining a person’s exposure to air pollutants requires knowledge of the microenvironments through which the person moves and the activities carried out there. For that reason, previous exposure studies have used time activity diaries (Steinle et al. 2012; Schweizer et al., 2005; Briggs 2005; Kleipeis 1999) to determine people’s location and activities. Because traditional methods often fail to integrate important information on activity patterns, which may lead to bias or loss of statistical power in health effect associations (de Nazelle et al., 2013), a growing body of research has used GPS devices (Dueker et al. 2014; Wu et al. 2011; Phillips et al., 2011) to reconcile data on location, duration, and routes of individual trips, as well as to evaluate the quality of self-reported data. Physical activity is also a key exposure determinant for air pollutants, since activity-based differences in respiratory rate result in differences in actual uptake (Sarigiannis et al. 2012). Given the need for more accurate measurements than those used in the past, many types of physical activity sensors have been used in recent studies (Bassett 2012, Yang and Hsu 2010) to obtain comprehensive activity patterns. As personal “smart” technologies become more prevalent, they promise lightweight, portable, wireless, and lower-cost sensor systems; their ease of use, high time resolution, and internet connectivity for remote access to the collected data have already been recognized (Kaufman et al. 2015, Darwish and Hassanien 2011).
Furthermore, a new opportunity emerges with the widespread use of smartphones, whose integrated GPS and other sensors, such as accelerometers, gyroscopes, and magnetometers, may help exposure assessors ascertain both the location and the activity of individuals with even greater ease and at lower cost (de Nazelle et al., 2013). To enable wider use of such devices and of the data they collect, an appropriate infrastructure that ensures quality assurance and quality control is necessary (Nethery et al. 2014). Beyond that, designing suitable software for a comprehensive evaluation of the data is equally important. To this end, Artificial Neural Networks (ANNs) have proven a reliable technique in several environmental applications (Karakitsios et al. 2006), as well as for predicting human exposure from sensor data (Sarigiannis et al. 2009).

Based on the above, the overall objective of the current work is to give insight into the suitability of low-cost sensors for exposure studies. In particular, from a technical standpoint, the present study aims to provide (a) information about the suitability of activity monitors for exposure studies (considering also their low cost), (b) data about the locations encountered and the activities performed, and (c) inputs to evaluate the capability of a location-predicting ANN model, so as to remove the dependence on self-reporting in exposure studies.

Methods and materials

Study aims and design

The study consists of two major parts: (a) field experimental work, meaning the use of sensors by individuals, and (b) the evaluation of the collected data to derive a location-predictive model. The combined information from the activity sensors, the GPS, and the predictive model will ultimately yield a comprehensive description of the daily time-space-activity profile of individuals without the need for self-reporting. The main scope of the study is the collection of individualized data and profiles using only low-cost personal sensors. In this way, the more detailed exposure information that is essential in environment-health association studies could be obtained.

Field study and use of sensors

As part of a trial campaign investigating the potential use of smart consumer products for measuring personal time-location-activity data in exposure studies, 21 individuals from the two largest Greek cities (Athens and Thessaloniki) participated. The two cities were selected as typical examples of Southern European megacities. Before the campaign began, the proper function and performance of the equipment was tested under laboratory conditions. To secure the quality of the results, trained study staff gave each participant specific instructions and training. Participation in this study was entirely voluntary, and each participant signed an informed consent form before the campaign began. Ethical approval was obtained from the Institutional Review Board of the National Center for Scientific Research “Demokritos” (NCSRD).

The volunteers were asked to record their location and activity for a week using self-reports, an application on their phones, and a series of wearable fitness and location trackers (Fig. 1). The equipment used in the study is commercially available and was selected to comply with the European standards that guarantee its proper performance throughout the study. All the devices were worn simultaneously by the participants and are described in detail below.

Fig. 1 An overview of the personal sensors worn by participants

Elitech temperature logger

All the participants wore the Elitech RC-4HA (Elitech Ltd., UK) sensor, which continuously recorded environmental temperature and humidity every 10 s, aiming at detecting changes between indoor and outdoor conditions. The temperature sensor was worn on the upper torso as shown in Fig. 1, with the logger placed on a belt or around the neck, so as to avoid contact with the skin or any other object.

Fitbit Flex

The Fitbit Flex (San Francisco, CA, USA) is a personal activity monitor worn on the participant’s wrist. This sensor recorded the number of steps, the total distance, and the duration of intense activities, and calculated energy expenditure on a daily basis using the person’s height, weight, and age. The device was synced with the volunteer’s smartphone over a Bluetooth connection.

Actigraph

The Actigraph GTX3 (ActiGraph, Pensacola, FL, USA) is a Bluetooth-enabled activity monitor worn on the side (Fig. 1). It includes a three-axis accelerometer and was used to capture and record physical activity and sleep/wake information every 10 s.

Activ8

The Activ8 system (Activ8, Remedy Distribution, The Netherlands) is a Bluetooth-enabled device used for both activity recognition and energy expenditure estimation. It synchronized with the participants’ phones without needing to be taken out of the pocket.

GPS

A Qstarz BT 1000XT GPS device (Qstarz, Taipei, Taiwan) was carried by the participants to track their precise location continuously (every 5 s). It was worn on a belt around the waist, on the right hip.

Moves App

The Moves App (ProtoGeo Oy, Helsinki, Finland) was downloaded to every participant’s smartphone, and all participants were asked to carry their phone with them as much as possible. This application tracked the participant’s location and activity, expressed as steps and kilometers, and also estimated the mode of transport used. The participant was asked to label the encountered locations in the app, such as “home” or “work.” After the completion of the campaign, all the data were downloaded from the website.

Paper Log

A time activity diary was filled out on paper by participants each day. The volunteers indicated their location among “indoors at home,” “outdoors at home,” “indoors at work,” “outdoors at work,” “in transit,” “indoors at other location,” or “outdoors at other location,” recording the start and the end time of each activity.

Predicting the location using ANN models

The capability of predicting location from sensor data was explored using an Artificial Neural Network (ANN) model. A neural network consists of a number of interconnected processing elements, commonly referred to as neurons. The neurons are logically arranged into two or more layers and interact with each other via weighted connections. These scalar weights determine the nature and strength of the influence between the interconnected neurons. Each neuron is connected to all the neurons in the next layer. There is an input layer where data are presented to the neural network, and an output layer that holds the response of the network to the input. It is the intermediate layers, also known as hidden layers, that enable these networks to represent and compute complicated associations between patterns. Neural networks essentially learn through the adaptation of their connection weights. For the needs of the study, a multi-layer perceptron network was used, which is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It is a modification of the standard linear perceptron in that it uses three or more layers of neurons (nodes) with nonlinear activation functions, and it is more powerful than the perceptron in that it can distinguish data that are not linearly separable, that is, not separable by a hyperplane. Location, motion, and intensity of activity data were used as input to an ANN model, with the aim of deriving a time-location model based solely on sensor data.
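The claim that a multi-layer perceptron can separate data that a single linear perceptron cannot may be illustrated with the classic XOR pattern. The following is a minimal sketch, not the network used in the study: the weights are hand-picked for illustration rather than learned, and the function and constant names are hypothetical.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a perceptron with one hidden layer of tanh neurons."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Linear output neuron: weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hand-picked (hypothetical) weights. The first hidden neuron switches on when
# x1 + x2 exceeds 0.5, the second when it exceeds 1.5; their difference is
# large only when exactly one input equals 1.
HIDDEN_WEIGHTS = [[10.0, 10.0], [10.0, 10.0]]
HIDDEN_BIASES = [-5.0, -15.0]
OUTPUT_WEIGHTS = [0.5, -0.5]
OUTPUT_BIAS = 0.0


def xor_predict(x):
    """Return 1 if the network output exceeds 0.5, otherwise 0."""
    score = mlp_forward(x, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)
    return 1 if score > 0.5 else 0
```

No single hyperplane separates the XOR classes, so a perceptron without a hidden layer cannot reproduce this mapping, whereas two hidden neurons suffice here.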
The independent variables fed to the ANN input layer consisted of (a) the differential of personal temperature, dTemp/dt, derived from the wearable temperature sensor; (b) the observed outdoor temperature, Tempout, derived from a central meteorological station in Athens and Thessaloniki, respectively; (c) the ratio Temp/Tempout; (d) the rate of change of personal sensor temperature, dTemp/dt; and (e) personal speed, derived from the GPS devices worn by the participants. Moreover, information on daylight was also included as an input variable, transformed into a categorical element (day or night).
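As a rough illustration of how such input variables could be derived from time-aligned sensor records, the sketch below computes a finite-difference dTemp/dt, the Temp/Tempout ratio, speed, and a day/night flag. The record layout, field names, and units (minutes, degrees Celsius, km/h) are assumptions made for this example and are not taken from the study.

```python
def build_features(records):
    """Derive ANN-style inputs from consecutive, time-ordered records.

    Each record is a dict with the hypothetical keys: minute, temp (personal
    sensor), temp_out (meteorological station), speed_kmh (GPS), is_day (bool).
    """
    features = []
    for previous, current in zip(records, records[1:]):
        elapsed = current["minute"] - previous["minute"]
        # The ratio is undefined when the outdoor temperature is exactly zero.
        ratio = current["temp"] / current["temp_out"] if current["temp_out"] != 0 else None
        features.append({
            # Finite-difference estimate of dTemp/dt in degrees per minute.
            "dtemp_dt": (current["temp"] - previous["temp"]) / elapsed,
            "temp_out": current["temp_out"],
            "temp_ratio": ratio,
            "speed_kmh": current["speed_kmh"],
            # Daylight encoded as a categorical 0/1 input.
            "is_day": 1 if current["is_day"] else 0,
        })
    return features
```

A sharp negative dTemp/dt in winter, for instance, would be consistent with stepping from a heated room to the outdoors, which is the kind of transition signal the temperature-derived variables are meant to capture.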

MATLAB scripts were used to combine data from all the different sensors, which were then merged into a single database with a time step of 1 min. Out of the full dataset, a sample was created from data captured during the first four consecutive days of the week (Monday to Thursday) together with data from Saturday. This was the training set, which was then further divided into an internal training set and a testing set (85 and 15% of the initial training set, respectively). The model developed on the training set was then tested against an independent validation set based on data from Friday and Sunday. The training and validation datasets were based on complementary parts of the week so that the model captures the different patterns that occur on a workday as well as during the weekend. The performance of the proposed methodology was evaluated using various training algorithms on different network architectures: Bayesian regularization (Foresee and Hagan 1997), Resilient backpropagation (Riedmiller and Braun 1993), Scaled Conjugate Gradient (Moller 1993), the Broyden, Fletcher, Goldfarb and Shanno algorithm (Dennis and Schnabel 1983), and the Levenberg-Marquardt algorithm (Hagan and Menhaj, 1995). With respect to the architectures, various numbers of hidden units were tested, using 1 or 2 hidden layers with 6 up to 30 hidden neurons. The results showed that Bayesian regularization was consistently among the best-performing algorithms, and that 1 hidden layer with 13 neurons can successfully model the problem under study. Thus, the proposed ANN model uses three layers, as shown in Fig. 2. The first (input) layer consists of 6 neurons, one for each input parameter. The second (hidden) layer consists of 13 neurons that implement the hyperbolic tangent sigmoid transfer function. Finally, the third (output) layer consists of three categorical outputs, each corresponding to one of the predicted locations.
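The day-based partition described above can be sketched as follows. This is only an illustration in Python (the study itself used MATLAB scripts); the row layout and function name are hypothetical, and drawing the 85/15 internal split by random shuffling is an assumption, since the text does not state how that split was made.

```python
import random

TRAINING_DAYS = {"Mon", "Tue", "Wed", "Thu", "Sat"}
VALIDATION_DAYS = {"Fri", "Sun"}


def split_by_day(rows, internal_train_fraction=0.85, seed=0):
    """Split 1-min rows into internal training, testing, and validation sets.

    Each row is a dict with a hypothetical "day" key holding a weekday label.
    """
    pool = [row for row in rows if row["day"] in TRAINING_DAYS]
    validation = [row for row in rows if row["day"] in VALIDATION_DAYS]
    # Assumption: the internal split is drawn at random from the training pool.
    rng = random.Random(seed)
    rng.shuffle(pool)
    cut = round(len(pool) * internal_train_fraction)
    return pool[:cut], pool[cut:], validation
```

Keeping Friday and Sunday entirely out of the training pool means the validation set contains one workday and one weekend day that the network has never seen.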
Network training was performed using the Bayesian regularization algorithm. The equations underlying the ANN model are described in detail elsewhere (Sarigiannis et al., 2009). Networks were evaluated using root mean square error (RMSE), mean square error (MSE), and R², and via cross-validation using the following metrics: accuracy, sensitivity, specificity, and cross-entropy.
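For readers less familiar with the classification metrics listed above, the sketch below shows one common way to compute them for a multi-class location problem: accuracy, sensitivity, and specificity in a one-vs-rest fashion for a chosen class, and mean cross-entropy from predicted class probabilities. The class labels and function names are hypothetical, and the one-vs-rest treatment is an assumption, since the exact computation used in the study is not specified here.

```python
import math


def one_vs_rest_metrics(true_labels, predicted_labels, positive):
    """Accuracy, sensitivity, and specificity treating one class as positive."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth != positive and predicted == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Specificity: share of true negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


def cross_entropy(true_labels, predicted_probabilities):
    """Mean negative log-probability assigned to the true class."""
    total = sum(
        -math.log(probabilities[label])
        for label, probabilities in zip(true_labels, predicted_probabilities)
    )
    return total / len(true_labels)
```

Because indoor minutes dominate the data, per-class sensitivity and specificity are more informative than overall accuracy for the rarer outdoor and in-transit classes.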

Fig. 2 Feed forward Artificial Neural Network

Results

Evaluation of sensors

Activity sensors

As far as the activity sensors are concerned, the main goal of this study was to evaluate the accuracy of the Fitbit Flex, the Activ8, and the Moves App for exposure studies, given their convenience, ease of use, and low cost, while the Actigraph was used for comparison. The data obtained from the Activ8 were not well correlated with those of the other devices; therefore, this device was excluded from the analysis. Figure 3 compares Fitbit and Moves App steps for a single participant during the week of the campaign, while Fig. 4 shows a scatter plot of Moves App versus Fitbit total steps per day for a subset of the participants. As shown, the Moves App tends to underestimate step counts compared to the Fitbit Flex, probably because participants found it difficult to carry the mobile phone with them at all times, whereas the Fitbit Flex is worn on the wrist and records activity continuously. The mean relative percent difference between these two methods was 56%. Similar results were obtained when the Moves App step counts were compared to those from the Actigraph. Daily total Fitbit and Actigraph steps were compared using Pearson’s correlation coefficient, and the analysis showed fairly good agreement (r = 0.792), with the Actigraph generally recording fewer steps than the Fitbit (Fig. 5). This may be because participants took off the belt carrying the Actigraph about 10 min after returning home, while the Fitbit Flex remained on the wrist and continued recording the steps taken at home during housekeeping or while moving from one room to another.
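The two comparison statistics mentioned above can be sketched as follows for paired daily step totals. The definition of relative percent difference used here (absolute difference divided by the mean of the two values) is an assumption, as the text does not give the formula, and the function names are hypothetical.

```python
import math


def mean_relative_percent_difference(steps_a, steps_b):
    """Mean of |a - b| / ((a + b) / 2) * 100 over paired daily totals."""
    differences = [
        abs(a - b) / ((a + b) / 2) * 100
        for a, b in zip(steps_a, steps_b)
    ]
    return sum(differences) / len(differences)


def pearson_r(values_a, values_b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(values_a)
    mean_a = sum(values_a) / n
    mean_b = sum(values_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(values_a, values_b))
    variance_a = sum((a - mean_a) ** 2 for a in values_a)
    variance_b = sum((b - mean_b) ** 2 for b in values_b)
    return covariance / math.sqrt(variance_a * variance_b)
```

The two statistics answer different questions: a device that consistently counts half the steps of another can still correlate perfectly with it, so a high Pearson coefficient does not by itself rule out systematic underestimation.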

Fig. 3 Comparison between Fitbit steps and Moves App steps for a single participant during the week of the campaign

Fig. 4 Scatter plot of total steps per day for Moves App versus Fitbit. Different colors indicate data for each participant in the Athens subset

Fig. 5 Comparison between Fitbit steps and Actigraph steps for a single participant during the week of the campaign

Location sensors

Paper log versus Moves App

Figure 6 compares location and activity from the Moves App with the paper log for two different participants. To produce this visual comparison, the data were arranged with the time in minutes in one column and, in the other, the value corresponding to each microenvironment; this was done separately for the Moves App and the paper log. One participant’s data were excluded from the analysis because the log book was not completed adequately. The analysis showed that, in general, the Moves App records the user’s location adequately. Its main advantage is that self-reporting is no longer required. Furthermore, it can capture transitions between several locations within a short period of time, which is not always possible with the paper log. However, comparison with the paper log showed that the Moves App may determine location inaccurately when a person moves over short distances, and that it cannot determine whether an individual is indoors or outdoors, probably because of measurement uncertainties in the phone’s positioning system. As shown in Fig. 6, the App recognizes that the participant is at home but does not distinguish between indoors and outdoors, whereas the paper log provides this information.
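The study reports this comparison visually. As a hedged illustration of how the same per-minute layout could also be summarized numerically, the sketch below expands interval entries into one label per minute and computes the share of minutes on which two sources agree. The interval format, labels, and function names are hypothetical and are not part of the original analysis.

```python
def expand_to_minutes(intervals, total_minutes):
    """Turn (start_minute, end_minute, label) entries into one label per minute.

    The end minute is exclusive; minutes not covered by any entry stay None.
    """
    labels = [None] * total_minutes
    for start, end, label in intervals:
        for minute in range(start, min(end, total_minutes)):
            labels[minute] = label
    return labels


def percent_agreement(labels_a, labels_b):
    """Percentage of minutes, labeled in both sources, where the labels match."""
    pairs = [
        (a, b) for a, b in zip(labels_a, labels_b)
        if a is not None and b is not None
    ]
    if not pairs:
        return float("nan")
    return 100 * sum(a == b for a, b in pairs) / len(pairs)
```

For such a comparison to be meaningful, the two sources would need a shared label set; for example, the paper log categories “indoors at home” and “outdoors at home” would both have to be mapped to the single “home” label that the app can provide.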

Fig. 6 A visual comparison of Paper log and Moves App for two participants

Moves App versus GPS

Judged against the paper log, the GPS device tracked the route followed by the participant more accurately than the Moves App. Figure 7 compares the GPS and Moves routes for one study participant on a single day. The coordinates of each place (house, work, other) obtained from Moves and from the GPS were compared, and the correspondence between Moves and GPS differed among individuals, ranging between 70 and 97%.
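One plausible way to quantify such a correspondence between two sets of place coordinates is to count the places whose positions fall within a distance tolerance of each other. The sketch below uses the haversine great-circle distance; the 100 m tolerance, the pairing of places by index, and the function names are assumptions for illustration only, since the matching criterion used in the study is not stated.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def percent_matching(places_moves, places_gps, tolerance_m=100.0):
    """Percentage of paired (lat, lon) places lying within the tolerance."""
    matches = sum(
        haversine_m(*moves, *gps) <= tolerance_m
        for moves, gps in zip(places_moves, places_gps)
    )
    return 100 * matches / len(places_moves)
```

The result is sensitive to the chosen tolerance, so any reported percentage should be read together with the distance threshold behind it.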

Fig. 7 A comparison between the routes of GPS (cyan) and Moves App (red)

Predicting location using ANN

Among the several networks tested, the highest performance was achieved by an MLP 7-13-3 network. A visual comparison between the real location data and the ANN-predicted ones is shown in Fig. 8a, b. The results illustrate that the ANN model performs very well in predicting the various locations, especially the indoor ones, which also make up the vast majority of the records in the training set, since most of the day is spent in indoor locations. The identification of outdoor locations is less effective, since many of these records are not discriminated from the in-transit mode.

Fig. 8 a Showcase of the location prediction of a week day of a typical individual in Athens. b Showcase of the location prediction of a week day of a typical individual in Thessaloniki

The prediction accuracy for the various locations (for both Athens and Thessaloniki) is demonstrated in Table 1, while the cross-validation metrics are presented in Table 2.

Table 1 Accuracy of location prediction using the ANN in Athens and Thessaloniki
Table 2 Cross-validation metrics (accuracy, sensitivity, specificity, and cross-entropy)

Discussion

The current study aimed to assess the feasibility of using low-cost personal sensors of activity, location, and temperature in individual exposure assessment. The results indicated that the activity tracking devices performed very well in defining the activity levels of the participants, confirming the results of previous studies. The Actigraph has been used to evaluate physical activity levels in children and adolescents, and it can accurately measure step counts and energy expenditure in subjects of various ages (Yang and Hsu 2010). De Vries et al. (2006) reported that the ActiGraph was the most studied activity monitor, and many studies have validated its reliability and performance. The Fitbit has also been used in several exposure studies, which showed that it is comparable with the Actigraph (Gusmer et al., 2014) and Actical (Noah et al. 2013) meters. In a comparison between the Fitbit and the Yamax pedometer, which is often considered a gold standard for pedometry, the Fitbit was found to be more accurate in counting steps (Mammen et al. 2012), while Guo et al. (2013), who evaluated the most popular activity devices, found the Fitbit to be the clear winner, with a very low error of around 1% for step recording. The overall evaluation of the activity meters is quite positive, and this is a first element for building a reliable exposure model for air pollutants: intensity of activity is a key exposure modifier that affects the inhalation rate and the actual uptake, so capturing it leads to more accurate dose estimates (Sarigiannis et al. 2012, Karakitsios et al. 2014).

Beyond activity levels, a key determinant of exposure to air pollutants remains the contamination of the locations visited. The most common method for tracking location is self-reporting in paper logs (Jantunen et al., 1998). Although tracking location in a paper log is quite informative, it is time consuming and the results are not always accurate, being subject to bias related to recall of activities and of time spent in different microenvironments. Hence, the quality of the retrieved information is subjective: it depends on the consistency and accuracy of each individual and may therefore differ among participants. This results in inconsistent information among participants and inaccurate estimates of exposure and intake; this exposure misclassification bias may prove detrimental for environmental health studies, especially for the assessment of the individual exposome. Although the use of a GPS device brings several challenges, such as missing data due to lack of GPS signal reception inside certain buildings or the difficulty of discriminating among certain microenvironments (de Nazelle et al., 2013; Nethery et al. 2014), it still provides significant opportunities for identifying the location of exposed individuals. To address these limitations, Breen et al. (2014) proposed that an automated classification model for GPS data is needed. In this study, a further use of the sensor data is proposed: an ANN model identifies the type of location, thereby overcoming the major drawback of GPS-type sensors, which fail to differentiate whether an individual is indoors or outdoors.
This information is critical for exposure assessment, since misclassification of the location type will result in completely different exposure estimates; for example, a false outdoor classification would result in overestimation of exposure to typical traffic air pollutants, such as CO and NOx, and underestimation of exposure to typical indoor pollutants, such as carbonyls and phthalates. Hence, it is critical to be able to differentiate between outdoor and indoor locations, and this is feasible using a combination of various types of sensor data: trajectories should be described by the GPS or the Moves application, while the type of location should be assigned by the ANN model prediction.

It should be noted that the information from the temperature sensors, and the synthetic variables derived from it and used as input to the ANN model, are key parameters for successful location identification. Using data from both the personal temperature sensor and the ambient air temperature sensor allowed the model to discriminate the type of location. Especially in the cold period, the synthetic variable dT/dt provided additional information on transitions between locations, contributing significantly to the successful discrimination between outdoors and indoors. At present, the model predicts presence in indoor locations very well, which is very important since the majority of daily time (more than 85%) is spent indoors. However, there is still room for improvement in the discrimination between in-transit and outdoor locations. The model would be expected to provide better results if the speed readings of the GPS devices were more accurate; several mismatched records were observed (e.g., irrationally high speeds in indoor locations or very low speeds inside vehicles), providing confusing information to the model. One idea for future work on the GPS data is to further process the raw data and filter the irrational speed values out of the training data set, combined with different weights for the contribution of the various parameters.
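The kind of speed filtering suggested as future work might look like the sketch below, which drops training rows whose GPS speed is implausible for the labeled location. The thresholds, labels, and field names are purely hypothetical placeholders; suitable values would have to be chosen from the data, and low speeds inside vehicles are deliberately not filtered here because they can be genuine (e.g., in congested traffic).

```python
def filter_irrational_speeds(rows, max_indoor_kmh=10.0, max_any_kmh=200.0):
    """Keep only rows whose speed is plausible for the labeled location.

    Each row is a dict with hypothetical keys "location" and "speed_kmh".
    """
    kept = []
    for row in rows:
        speed = row["speed_kmh"]
        # Negative or extreme speeds are treated as GPS errors everywhere.
        if speed < 0 or speed > max_any_kmh:
            continue
        # High speeds while labeled indoors are treated as GPS drift.
        if row["location"] == "indoor" and speed > max_indoor_kmh:
            continue
        kept.append(row)
    return kept
```

Such a filter would be applied only to the labeled training data; at prediction time the location is unknown, so implausible readings would need a label-free rule, such as an upper speed bound alone.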

While the model is currently under refinement, the combined use of sensors (activity, temperature, GPS) and ANN modeling seems a very promising technique for providing the information needed for accurate exposure studies without self-reporting or paper logs. This is a key step towards recruiting the greater number of individuals necessary for large-scale population studies associating environment and health, as well as towards retrieving comparable data that do not suffer from the misclassification bias of self-reporting.

Conclusions

From the analysis carried out, we concluded that both the Actigraph and the Moves App correlated quite well with the Fitbit Flex, while the Moves App tended to underestimate daily step counts relative to the other devices. A comparison between the Moves App and the time activity diary showed that the Moves App adequately records the location and activities of an individual. The results indicate that future studies using Moves to assess personal time-location-activity patterns may need an additional sensor to distinguish indoor from outdoor locations. In terms of route identification, the GPS device proved more accurate than the Moves App. Finally, the locations encountered daily by the individual participants were predicted quite successfully by the ANN model, providing the ground for moving away from paper logs in population studies. Overall, this study provides a comprehensive framework for combining data from commercial sensor devices with advanced modeling techniques for accurate individualized exposure assessment.