
1 Introduction

In the era of pervasive and ubiquitous computing, applications of activity recognition are evident in many real-life, human-centric problems such as eldercare, healthcare and sports [1]. In most developed countries, demographic trends point toward an increasing number of elderly people, who are often left to their own means in obtaining healthcare and other services. The effects of these trends are dramatic for public and private healthcare, as well as for the individuals themselves. It is therefore economically and socially beneficial to enhance prevention by shifting from a centralized, expert-driven model to one that permeates home environments, with a focus on advanced homecare services dedicated to personal medical assistance [2].

Regular daily exercise reduces the risk and progression of chronic diseases and improves functional abilities, cardiorespiratory fitness and metabolic health in patients with common diseases, such as cardiovascular, lung and neurodegenerative diseases. By stimulating physical activity, the risks for these conditions can be greatly reduced. In particular, [3, 4] showed the effect of physical activity on coronary heart disease and on the risk of hypertension, also known as high blood pressure. Additionally, [5] confirmed the effect of physical activity on diabetes, while [6] prescribes exercise therapy against diabetes. The benefits of regular physical activity for the healthy population include reductions in body weight, body fat and resting heart rate, increased high-density lipoprotein cholesterol and improved maximal oxygen uptake.

The health benefits associated with physical activity depend on its duration, intensity and frequency; it is therefore important to monitor and distinguish physical activities. Activity recognition (AR) attempts to recognize the actions of an agent in an environment from a sequence of observations. AR has the potential to address emerging health conditions such as obesity, heart conditions and diabetes, since physical inactivity is a main factor for these conditions, or at least strongly coupled with them. Additionally, human activity recognition can help to develop patient recovery training or even provide early detection of diseases, strokes, falls, etc.

Therefore, we can identify three main advantages of AR: (i) early detection of falls and other abnormalities in the elderly; (ii) help in the process of recovery after an accident or stroke; and (iii) prevention of diseases.

In-depth data analyses can address a broader range of societal challenges. However, most studies that investigate the benefits of different physical activities are expensive and require a complex monitoring process. In most epidemiological studies, participants are equipped with special sensors; the cohort size is therefore limited, and the conclusions cannot simply be mapped to the broader population [7]. It would be more convenient to include more participants in the studies, but without the financial implications of hardware requirements and manual data analysis. In such scenarios, a common framework is needed that is capable not only of gathering data from many participants, but also of providing automatic activity recognition.

This paper has three main goals: (1) to design a system architecture and organization for activity recognition using smartphones and smartwatches as sensor devices that gather data and recognize activities, along with a remote cloud system in charge of training and improving the recognition models; (2) to develop a new lightweight algorithm for activity recognition that is easily implementable in smartphone and smartwatch applications; and (3) to evaluate the accuracy of the algorithm on smartphone and smartwatch sensors and to test which sensor combination gives the best results for the activity recognition problem, i.e. whether the accuracy benefits of a combination outweigh the additional costs of combining the sensors.

Our algorithm is based on a neural network, namely Long Short-Term Memory (LSTM) networks. Although this algorithm has previously been used for activity recognition on wearables [8] and smartphones [9], to the best of our knowledge, this is the first work that evaluates it on data collected from smartwatch sensors.

The remaining sections of this paper are organized as follows. The second section describes and illustrates the architecture for the development of AR methods in detail. In the third section we describe and test a neural network that can be used as an activity recognition model, and identify the combination of sensors that outperforms all others. The paper is concluded in the fourth section.

2 System Architecture for Efficient AR Tools Development

The traditional approach to developing new AR tools and techniques includes collecting sensor data to be used for training and testing. Data collection can take place under special laboratory conditions or under field conditions. In both cases, all data are stored on a central server and the development of AR techniques is performed offline.

In the first case, participants equipped with sensors are guided to perform particular activities for a relatively short period. As a result, most of the collected datasets contain a small number of participants, which eventually leads to models that generalize poorly.

In the second case, participants wear the sensors during their normal daily life, reporting the activities in an activity diary. Typical sensors are accelerometers attached to an elastic belt and placed at the hip or at the dominant ankle. The main drawback of this data collection scenario is the lack of accuracy with respect to the time spent performing a particular activity. Additionally, wearing the sensors for a long time can even reduce participants’ willingness to take part in the study [7]. Still, the approaches from the literature that investigate AR accuracy on data collected under field conditions are satisfactory, performing almost as well as on data collected under laboratory conditions. However, this is an expensive, labor-intensive and time-consuming task for application developers.

In this section we describe a system architecture that overcomes both problems, i.e. the development of AR tools is performed online on large datasets collected under field conditions. The system we propose consists of three main parts: the server (cloud), normal users and user-contributors.

  • The server (cloud) is the central part of the architecture. Its function is to store data, build and update AR models, and distribute them to the users.

  • Normal users receive the AR models from the server (cloud) on their wearable (smartphone). They contribute only by sending their experience back to the server (cloud).

  • User-contributors send labeled data to the server (cloud), thus contributing to the process of data collection. Additionally, they can act as normal users.

The contributors provide labeled data to be used for creating new AR models or improving existing ones. Before a specific activity, a contributor can switch her device to “contribute mode” to record labeled data to be sent to the server side for further processing. Data transfer to the server (cloud) should be done in real time or near real time. To save transmission energy, different data-reduction techniques can be used, such as delta compression (for near-real-time transfer) or data prediction based on a dual prediction scheme [10, 11]. Additionally, application-specific heuristics can be investigated for this purpose, such as sending data only when the phone is connected to the charger, or when the available battery level is above a predefined threshold (Fig. 1); a sketch of such a heuristic is given below.

Fig. 1. General framework of the system for online data collection and processing
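As an illustration of the application-specific heuristics mentioned above, the following minimal sketch combines the charging/battery-level check with a simple delta encoding; all names and the threshold value are illustrative assumptions rather than part of the proposed system.

```python
# Illustrative sketch of a battery-aware transmission heuristic; the charging
# state and battery level would come from platform-specific APIs, which are
# deliberately left abstract here.

BATTERY_THRESHOLD = 0.5  # assumed threshold: transmit only above 50% charge

def should_transmit(charging: bool, battery_level: float) -> bool:
    """Decide whether buffered sensor data may be uploaded now."""
    return charging or battery_level >= BATTERY_THRESHOLD

def delta_compress(samples):
    """Simple delta encoding: keep the first value, then store successive
    differences, which are typically small for slowly varying signals."""
    deltas = [samples[0]]
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr - prev)
    return deltas
```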

The architecture we propose is not only suitable for applications based exclusively on smartphone sensors, but can be extended to systems that use other wearables. In this case, the sensors on the wearables send measurements to the smartphone, which acts as a gateway or hub to retransmit the data to the server (cloud). It is then important for both devices (smartphone and wearable) to communicate using common protocols with low energy requirements, such as Bluetooth Low Energy (BLE). Although there are many other protocols for smart devices to transfer data (ZigBee, Z-Wave, Insteon, etc.), state-of-the-art smartphones do not support them [12], so this scenario is usually not feasible.

The need for such an architecture is not new, but it was previously not feasible, since most AR tools require in-depth data analyses, usually performed by a team of data scientists and domain experts jointly involved in generating hand-crafted features. Recently, new tools have emerged that include automated feature engineering techniques to extract features from the raw accelerometer readings and to select a subset of the most significant features.

3 Designing Accurate and Lightweight Algorithms for AR

Present-day activity recognition is mainly sensor-based, implemented with the help of smartphones and wearable devices acting both as wearable sensors and as computational (recognition) devices. In [13], artificial intelligence (AI) techniques are used to develop daily activity reminders for elders with memory impairments. Moreover, in [14] the authors built abnormal human-activity detection models that can be used to detect and report abnormal behavior and to enable early detection of dementia.

Body-worn sensors can be used in sports and physical activities to assess and improve overall sport performance and fitness. In [15] the authors managed to learn the daily activity patterns of users and to assess their daily energy expenditure in order to help them improve their lifestyle. There are also commercial devices for monitoring sport activities, such as the Nike+ [16] sensor, which is placed inside a shoe to keep track of the duration of running and jogging exercises. When connected to a smartphone application, it enables the user to set training goals or to challenge friends.

For the process of activity recognition, different approaches have been proposed in the literature, ranging from simple models to complex neural networks. For instance, [17] examines techniques such as the hidden Markov model, the conditional random field (CRF), the skip-chain CRF, etc., for building activity models. The authors in [18] collected multimodal sensor data, extracted features from their dataset and then employed a Support Vector Machine (SVM) on the features. Furthermore, [19, 20] proposed neural networks as classifiers using features generated from the data. As an improvement, [21] used convolutional neural networks (CNNs), where the model itself automatically extracts features from the raw sensor data, without the need for a human expert with prior domain knowledge to generate hand-crafted features. Finally, in [22] the authors suggested a recurrent neural model superior to other neural network models. This model is able to capture temporal correlations in the sensor data and is therefore applicable to a wide range of problems while providing acceptable accuracy. In [7], an automated feature engineering technique is applied to extract features from the raw accelerometer readings of epidemiological studies, and four machine learning algorithms are used for classification, showing that a single accelerometer is sufficient for accurate activity recognition.

Apart from accuracy, another major problem in deploying AR models and applications is the computational cost and time complexity of the algorithm, since it should operate in real time on devices with limited energy [23,24,25]. Even in modern smartphones, whose performance is comparable to that of computers, power remains a challenging problem, since battery technology has not kept pace with information and communication technologies. Another issue related to energy consumption is the sampling frequency, as it is an important parameter for the accuracy of the algorithm.

In this section we describe the dataset used in our study, the model we developed for activity recognition, as well as the results of our analyses.

3.1 The Dataset

We use the AR dataset [26, 27], which consists of around 9 million entries recorded in laboratory conditions, with roughly 4 million accelerometer entries and 4 million gyroscope entries. There are recordings of 9 users performing the following activities: ‘Biking’, ‘Sitting’, ‘Standing’, ‘Walking’, ‘Stair Up’ and ‘Stair Down’, while the data is recorded via two embedded sensors: an accelerometer and a gyroscope. Four smartwatches (two LG and two Samsung Galaxy Gear) and eight smartphones (two Samsung S3 mini, two Samsung S3, two LG Nexus 4 and two Samsung S+) were used. The data was split into four subsets: smartphone accelerometer data, smartwatch accelerometer data, smartphone gyroscope data and smartwatch gyroscope data.
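For illustration, the following sketch shows how one such subset could be loaded; the file name and column names are assumptions about how the dataset is distributed and may need adjusting.

```python
import pandas as pd

# Hypothetical loading of the smartphone accelerometer subset; file name and
# column names are assumed, not taken from the dataset documentation.
acc_phone = pd.read_csv("Phones_accelerometer.csv")

# Keep the tri-axial readings, the user identifier and the activity label.
acc_phone = acc_phone[["x", "y", "z", "User", "gt"]].dropna()
print(acc_phone["gt"].unique())  # the six activity labels
```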

3.2 Long Short Term Memory Neural Network

Generally, AR techniques first segment the time series data with sliding windows, then apply signal processing and statistical methods to extract features from the raw accelerometer measurements, and finally train machine learning algorithms to classify the different activities.
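As an illustration of the classical feature-extraction step (not the feature set used by our LSTM model, which operates on raw windows), a few common statistical features can be computed per window as follows:

```python
import numpy as np

def extract_features(window):
    """Statistical features for one window of tri-axial readings
    (shape: window_length x 3): per-axis mean and standard deviation,
    plus mean and standard deviation of the signal magnitude."""
    magnitude = np.linalg.norm(window, axis=1)  # per-sample magnitude
    return np.concatenate([
        window.mean(axis=0),                    # per-axis mean
        window.std(axis=0),                     # per-axis std deviation
        [magnitude.mean(), magnitude.std()],    # magnitude statistics
    ])
```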

Among the many activity recognition techniques in the literature, we investigate a neural-network-based approach known as Long Short-Term Memory (LSTM) networks. The main advantages of LSTM can be summarized as follows: (i) it is easily implementable in mobile applications; (ii) it outperforms other approaches from the literature in terms of accuracy; and (iii) it is robust enough to perform almost as well on data collected under field conditions as on data collected in a controlled environment [8, 9].

The LSTM network, a deep learning architecture appropriate for temporal modeling, was initially proposed by Hochreiter [28] and later improved in 2000 by Gers [29]. It has shown improvements over deep neural networks for the speech recognition problem [30]. Since 2016, LSTM has become an integral part of many applications and services delivered by Google, Microsoft and Apple, including personalized speech recognition on the smartphone [31] and gesture typing decoding [32].

LSTM networks are a special type of neural network that remembers information from further back in the past. Given a sequence of inputs X = {x1, x2, …, xn}, LSTM associates each time step with an input gate, a forget gate and an output gate, denoted respectively as it, ft and ot. The information from the past is remembered using the state vector ct−1. The forget gate decides how much of the previous information will be forgotten. The input gate decides how to update the state vector using the information from the current input. The vector lt contains the information from the current input to be added to the state. Finally, the output gate decides what information to output at the current time step. This process is formalized in (1),

$$
\begin{aligned}
i_{t} &= \sigma\left(W_{i} \cdot \left[h_{t-1}, x_{t}\right]\right) \\
f_{t} &= \sigma\left(W_{f} \cdot \left[h_{t-1}, x_{t}\right]\right) \\
o_{t} &= \sigma\left(W_{o} \cdot \left[h_{t-1}, x_{t}\right]\right) \\
l_{t} &= \tanh\left(W_{l} \cdot \left[h_{t-1}, x_{t}\right]\right) \\
c_{t} &= f_{t} \cdot c_{t-1} + i_{t} \cdot l_{t} \\
h_{t} &= o_{t} \cdot \tanh\left(c_{t}\right)
\end{aligned}
$$
(1)

where Wi, Wf, Wo and Wl have dimensions D × (D + N), matching the concatenated vector [ht−1, xt]; D is the number of memory cells and N is the dimension of the input vector. These matrices are the parameters of the network. LSTM is local in space and time, since its computational complexity per time step and weight is O(1) [28].
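For concreteness, a single time step of Eq. (1) can be written directly in NumPy; this is a minimal didactic sketch (biases omitted, as in the equations above), not the implementation used in our experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, W_f, W_o, W_l):
    """One time step of Eq. (1); each weight matrix has shape (D, D + N)."""
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    i_t = sigmoid(W_i @ z)              # input gate
    f_t = sigmoid(W_f @ z)              # forget gate
    o_t = sigmoid(W_o @ z)              # output gate
    l_t = np.tanh(W_l @ z)              # candidate update
    c_t = f_t * c_prev + i_t * l_t      # new cell state
    h_t = o_t * np.tanh(c_t)            # new hidden state / output
    return h_t, c_t
```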

As the algorithm for learning activity recognition models we use an LSTM network implemented in TensorFlow [33]. For every task we tuned the parameters (learning rate, hidden layers, structure of the network) to obtain optimal results. Our basic model consists of two fully connected and two LSTM layers with 64 units each, and we use L2 regularization to avoid overfitting. We train the model for 70 epochs.
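A minimal sketch of such a model using the Keras API of TensorFlow is given below; the layer ordering, regularization strength and optimizer are assumptions, since the text fixes only the layer counts, the unit size (64), the use of L2 regularization and the 70 training epochs.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 6            # six activities in the dataset
WINDOW, CHANNELS = 200, 6  # window length; 3 accelerometer + 3 gyroscope axes
L2 = regularizers.l2(1e-4) # assumed regularization strength

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, CHANNELS)),
    layers.LSTM(64, return_sequences=True, kernel_regularizer=L2),
    layers.LSTM(64, kernel_regularizer=L2),
    layers.Dense(64, activation="relu", kernel_regularizer=L2),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=70)
```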

3.3 Experiments

First, the data was split into training and testing sets. Because our dataset contains entries from 9 users, we separated the data of two users and used it for testing, while the remaining data was used for training. With this kind of split we avoid overfitting to user-specific data and the results are unbiased. Before feeding the data to the models, we apply a sliding window of size 200 with a step of 50.
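The following sketch illustrates this evaluation protocol; the user identifiers and array layout are illustrative assumptions.

```python
import numpy as np

WINDOW, STEP = 200, 50
TEST_USERS = {"a", "b"}  # identifiers of the two held-out users (assumed)

def split_by_user(data, labels, users):
    """Leave-users-out split: all samples of TEST_USERS form the test set."""
    test_mask = np.isin(users, list(TEST_USERS))
    return ((data[~test_mask], labels[~test_mask]),
            (data[test_mask], labels[test_mask]))

def make_windows(data, labels):
    """Cut aligned sensor data (T x channels) into overlapping windows,
    labelling each window with the label of its last sample."""
    X, y = [], []
    for start in range(0, len(data) - WINDOW + 1, STEP):
        X.append(data[start:start + WINDOW])
        y.append(labels[start + WINDOW - 1])
    return np.array(X), np.array(y)
```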

The experimental results are presented in Fig. 2. The bars represent the sensor combinations used as input to the models. It can be seen that, in general, the accuracy is higher when the smartphone sensors’ data is used. The highest accuracy of 94% is achieved for the accelerometer-gyroscope combination from the smartphone sensors.

Fig. 2. Accuracy for each sensor combination

Figures 3 and 4 present the confusion matrices for the accelerometer and the gyroscope data. The rows represent the true class and the columns represent the predicted class. The confusion matrices show that the misclassifications made on the gyroscope phone data differ from those made on the accelerometer phone data. For example, Fig. 3 shows that models that use only acceleration data mostly confuse the classes Sitting and Standing. On the other hand, Fig. 4 shows that models that use only gyroscope data mostly confuse the classes Jogging and Upstairs. To exploit this model variability, we used both the accelerometer and the gyroscope data to build the final models. A sketch of how such matrices can be computed is given after Fig. 4.

Fig. 3. Confusion matrix for phone accelerometer data

Fig. 4. Confusion matrix for phone gyroscope data
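Such confusion matrices can be computed, for example, with scikit-learn; this sketch reuses the model and the held-out windows from the earlier sketches.

```python
from sklearn.metrics import confusion_matrix

# y_test holds the true window labels; model and X_test come from the
# sketches above. Rows of cm are true classes, columns predicted classes.
y_pred = model.predict(X_test).argmax(axis=1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```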

4 Conclusion

Activity recognition is an integral part of many wearable devices; therefore, the development of new tools and algorithms will remain a challenging problem for the research community in the next few years. The process of data collection is usually an expensive and time-consuming task for both the developers of wearable applications and the participants who assist in it. In this paper we propose a system architecture that can decrease the time needed to develop new and more accurate methods and tools for activity recognition.

We developed an LSTM technique that performs accurate AR using smartphone and smartwatch sensors. Our experiments show that it is better to combine the accelerometer and gyroscope sensors of the smartphone in order to increase the accuracy of the model, since models that use smartphone acceleration data make different misclassification errors compared to models that use smartphone gyroscope data.