
1 Introduction

According to the latest Ageing Report released by the European Commission [12], by 2070 the percentage of people aged 65 and over will rise from 19% to 29% of the overall European population, while those aged 80 and over will increase from 5% to 13%. Because of the frailty typically associated with older age, developing efficient systems to monitor older adults' lifestyle can help to study their habits, identify anomalous behaviors, anticipate potential risks and suggest ways to improve their quality of life [2, 4]. Recently, advances in wearable technologies, providing rich sensory information, have led to a wide use of wearable devices as monitoring systems in the broad domain of Activity Recognition (AR) research [9, 17], with a particular focus on the automatic classification of indoor Activities of Daily Living (ADLs) [16]. ADLs are defined as self-care activities important for health maintenance and independent living [28], and evaluating the capability of subjects to autonomously carry out ADLs gives essential information about their lives, making it possible to understand their needs, difficulties and health conditions [10]. Geriatricians assess ADLs and Instrumental ADLs (IADLs) as part of evaluating an older person's function: in fact, problems with ADLs and IADLs usually reflect problems with physical and/or cognitive health [29].

In recent years, many researchers have proposed monitoring approaches aimed at Human Activity Recognition (HAR) and automatic ADLs/IADLs classification, which combine minimally invasive sensing devices with efficient data processing algorithms [30]. An effective example is the use of wrist-worn triaxial accelerometers to unobtrusively collect ADLs-related signals, combined with effective Machine Learning (ML) algorithms to reach high levels of accuracy in identifying and recognizing different activities [21, 22, 27]. Wrist-worn triaxial accelerometers are available in smartwatches and smartbands, i.e. consumer electronics devices that are nowadays very common among people of different ages, especially those interested in monitoring their performance during fitness and sport activities, or in lifelogging their daily behaviors. Technologies for lifelogging (also known as quantified self or self-tracking) allow individuals to pervasively capture data about themselves, their environment, and the people they interact with [13]. The same technologies may enable ADLs detection and classification. It is acknowledged that the optimal positioning of a sensor is driven by user acceptance as well as by the resulting classification accuracy: a meta-analysis of user preferences in the design of wearables indicated that users would prefer to wear the sensor on the wrist, followed, in descending order, by the trunk, belt, ankle and finally the armpit [3].

In this paper we present the results of a research activity focused on lifelogging for older adults through a non-video-based sensing system, namely a wearable accelerometer on board a smartwatch-like device. An annotated dataset of acceleration signals from the device was collected from volunteers performing six different ADLs (i.e., Brushing Teeth, Grooming Hair, Washing Hands, Washing Dishes, Ironing Clothes and Dusting) in uncontrolled conditions. The dataset is then exploited to test different automatic classification algorithms, assessing the impact of the selected activities on recognition performance. In particular, six of the most common supervised ML approaches for classifying human daily activities are applied and compared.

The paper is organized as follows: Sect. 2 shortly reviews the state of the art in HAR and ADLs classification from wrist-worn accelerometers. Section 3 presents the main steps of the work, covering signal collection, pre-processing and feature extraction. Section 4 details the classification approaches tested on the six ADLs performed by the volunteers participating in the experiments, and discusses the accuracy attained. Finally, Sect. 5 concludes the paper.

2 Background

Typically, approaches to activity recognition may be classified into vision-based and sensor-based ones. The former exploit different types of cameras (RGB, but also RGBD, adding depth information) to capture the agents' activity [7], but they may suffer from physical limitations (occlusions and a reduced field of view) and may be perceived as too invasive of the user's privacy. Sensor-based approaches rely on sensors, either installed in the living environment (ambient sensors) or attached to the user's body (wearables), to gather data about the agents' behaviour along with the environment where they live. With the development of Micro Electro-Mechanical System (MEMS) technologies, wearable devices integrating inertial, acceleration and magnetic sensors are becoming increasingly less expensive, smaller, and lighter. In the framework of lifelogging systems, both ambient and wearable sensors may be used, the latter including smartphones, which can be seen as smart sensor systems [24].

Gomes et al. [15] present the use of a sensorized wrist device to recognize eating and drinking events, with the aim of triggering automatic reminders to promote independent living of older adults at home. Data measured from an accelerometer and a gyroscope are used to assess the performance of a single multi-class classification model. Although only two elderly subjects contributed to the data collection process, the results show that it is possible to correctly classify eating and drinking events with acceptable accuracy. In [20], the authors propose an approach for personalizing classification rules to a single person. The method improves activity detection from wrist-worn accelerometer data on a four-class recognition problem, where the classes are ambulation, cycling, sedentary, and other. The manuscript extends a previously published activity classification method based on Support Vector Machines (SVMs) to estimate the uncertainty of the classification process. Cleland et al. [6] analysed data from ten adults, who were instructed to perform twelve activities with an accelerometer sensor placed on their left wrist. The activities to be classified were divided into stationary (such as standing and sleeping), dynamic (such as walking and running), and transitional activities (such as stand-to-sit and sit-to-stand). The recording sessions lasted five minutes for stationary and dynamic activities, and fifteen seconds, repeated fifteen times, for transitional activities. While useful for monitoring a subject's level of physical activity, the activities considered in that work do not specifically refer to ADLs, differently from those analysed in this paper.

In [19], the authors exploit acceleration data from the users' wrist to build assessment models for quantifying activities, to develop an algorithm for sleep duration detection, and to quantitatively assess the regularity of ADLs. A total of ten healthy subjects, wearing a wrist device, performed 14 different ADLs, which can be grouped into five main categories: rest/sleep activity (i.e., sleeping); sedentary activities (i.e., sitting and watching TV, sitting and reading a newspaper, and sitting and web browsing); light activities (i.e., housekeeping, driving, and walking without hand-swing); moderate activities (i.e., walking with hand-swing and going up and down stairs, with or without hand-swing); and vigorous activities (i.e., jogging with or without hand-swing). Differently from the present work, the above-mentioned paper considered ADLs that were quite different from each other, whereas this paper considers ADLs that are similar in terms of the type of movements performed by the subject's wrist.

An analysis of the literature shows how vastly wearable recognition systems and their reported accuracies differ, depending on the activities examined. Additionally, it is important to remember that accelerometers may not be appropriate for some activities, and that the positioning of sensors plays an important role: this is likely a limiting factor for many applications, since sensor positioning is often driven largely by user acceptance rather than by optimal ADLs recognition performance. Finally, in some settings, other sensor modalities may be more appropriate for activities that are hard to classify using wrist-worn accelerometers.

3 Materials and Methods

3.1 Data Collection

In this study, a total of 36 recordings (18 from men and 18 from women) were collected during the data collection phase. The participants, aged around 30 years, were in good health and had no physical conditions that could affect the performed daily-living activities. Recordings lasted around 5 min per activity. Each individual carried out all the considered scenarios, performing each activity three times, in free-living conditions in their own home environment. Two scenarios of activities were evaluated, namely the Hygiene Scenario (i.e., Brushing Teeth, Grooming Hair, Washing Hands) and the House Cleaning Scenario (i.e., Washing Dishes, Ironing Clothes, Dusting).

Fig. 1. Position of the wrist-worn Empatica E4 with the system of coordinates on the device.

Accelerometer data were recorded by wearing the Empatica E4 wrist-worn device [11] on the dominant wrist, as shown in Fig. 1. According to the literature, the wrist is an appropriate location for analysing and recognising some of the activities performed in this work, such as grooming hair, brushing teeth and washing hands [5, 14, 18]. Prior to collecting data, the device was coupled with the Empatica smartphone app, named E4 real-time App, to stream data via Bluetooth, according to the manufacturer's guidelines. After pressing the button to start the recording, the device takes around 15 s to calibrate, thus improving the accuracy of the sensor readings. Accelerometer data from each activity were gathered at a sampling rate of 32 Hz within a measurement range of ±2g, and then stored on the online cloud platform, called E4 Connect, in the form of .csv files. After this, the recorded files were downloaded and renamed with labels corresponding to the scenario and the activity performed. The raw data collection, including the acceleration samples used in this study, is available for download from the Mendeley platform [26].

3.2 Data Pre-processing

The raw acceleration data consist of time instants and acceleration values along the X, Y and Z axes, as represented in Fig. 1. According to the literature, external interference or loose coupling may generate both high-frequency noise and abnormal spikes. In order to clean and validate the collected data, a 4\(^{th}\) order low-pass Butterworth filter with cut-off frequency set to 15 Hz, and a 3\(^{rd}\) order median filter were used to attenuate the signal noise [16, 23]. Additionally, considering the automatic calibration performed by the device, the initial 15 s of signal were discarded from each acquisition session.
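As an illustration, the cleaning pipeline described above can be sketched in Python with SciPy. This is a minimal sketch under the stated parameters (4th-order Butterworth at 15 Hz, 3rd-order median filter, 32 Hz sampling); the function and variable names are ours, not from the original implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

FS = 32.0  # sampling rate of the Empatica E4 accelerometer (Hz)

def clean_acceleration(raw, fs=FS, cutoff=15.0):
    """Clean a raw (N, 3) array of X, Y, Z acceleration samples.

    A 4th-order low-pass Butterworth filter (15 Hz cut-off) attenuates
    high-frequency noise, and a 3rd-order median filter suppresses
    abnormal spikes. The initial 15 s, used by the device for automatic
    calibration, are discarded.
    """
    nyq = fs / 2.0
    b, a = butter(4, cutoff / nyq, btype="low")
    filtered = filtfilt(b, a, raw, axis=0)            # zero-phase low-pass
    filtered = medfilt(filtered, kernel_size=(3, 1))  # median filter per axis
    return filtered[int(15 * fs):]                    # drop calibration segment
```

Zero-phase filtering (`filtfilt`) is one reasonable choice here, since it avoids shifting the signal in time before windowing; the paper does not specify whether a causal or zero-phase implementation was used.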

Following the filtering step, the accelerometer signal was divided into fixed-size, non-overlapping windows of 3 s duration (corresponding to 96 samples) [1], resulting in 92 windows. This short window duration has been used in previous work because it includes a significant number of samples and allows features representative of each activity to be extracted rapidly [16]. Therefore, by choosing a proper window length, meaningful units of each activity were extracted from each segment, reducing errors and inaccuracy in the classification phase.
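The segmentation step amounts to reshaping the cleaned signal into 96-sample chunks; a possible sketch (helper name ours) is:

```python
import numpy as np

def segment(signal, fs=32, win_s=3.0):
    """Split an (N, 3) acceleration signal into fixed-size,
    non-overlapping windows of win_s seconds (96 samples at 32 Hz).
    Trailing samples that do not fill a whole window are dropped."""
    win = int(win_s * fs)                   # 96 samples per window
    n_win = signal.shape[0] // win          # number of complete windows
    return signal[:n_win * win].reshape(n_win, win, signal.shape[1])
```

For example, a recording of 92 complete windows (plus a partial tail) yields an array of shape `(92, 96, 3)`, one row of windows per axis triplet.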

3.3 Features Selection and Extraction

Each performed activity can be discriminated by looking at certain motion properties. Accordingly, features extracting mainly statistical information from the signals can be used to classify and distinguish the different activities. Based on this idea, a set of 17 common time-domain features was extracted from the collected signals, computed according to the equations and definitions provided in [8]. As specified in Table 1, some features were computed from the acceleration values along the three X, Y and Z axes, while others were obtained from the acceleration Signal Magnitude Vector (i.e. SMV = \(\sqrt{a_X^{2}+a_Y^{2}+a_Z^{2}}\)). In fact, the latter features exhibit reduced sensitivity to orientation changes [6]. According to our previous work [25], the extracted features highly impact the algorithm's performance. Therefore, this distinction in computing features considered both their mathematical definitions and their information content, aiming to reduce potential information redundancy.
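To make the per-axis versus SMV distinction concrete, the sketch below computes a few illustrative time-domain features from one window. The specific subset shown (mean, standard deviation, range) is our choice for illustration; the full 17-feature set follows the definitions in [8] and Table 1.

```python
import numpy as np

def extract_features(window):
    """Compute illustrative time-domain features from a (96, 3) window.

    Per-axis statistics are complemented by statistics of the Signal
    Magnitude Vector, SMV = sqrt(ax^2 + ay^2 + az^2), whose features
    are less sensitive to changes in device orientation.
    """
    smv = np.linalg.norm(window, axis=1)   # SMV, one value per sample
    feats = {}
    for i, axis in enumerate("XYZ"):
        feats[f"mean_{axis}"] = window[:, i].mean()
        feats[f"std_{axis}"] = window[:, i].std()
    feats["mean_SMV"] = smv.mean()
    feats["std_SMV"] = smv.std()
    feats["range_SMV"] = smv.max() - smv.min()
    return feats
```

Applying the function to every window of every labeled recording yields the feature table consumed by the classifiers.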

Six machine learning algorithms were used for the supervised classification tasks, and their performance was assessed. In particular, Decision Tree (J48), Random Forest (RF), Naïve Bayes (NB), Neural Networks (NNs), k-Nearest Neighbor (kNN) and Support Vector Machines (SVM) were compared. HAR systems can be evaluated using different testing strategies. In this work, we used 10-fold cross-validation, in which, at each iteration, the sessions were divided into a training set (90% of the data) and a testing set (10% of the data). The overall accuracy, defined as the ratio of correctly classified activities over the total number of activities, was computed as an average over the 10 iterations.
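The comparison protocol can be sketched with scikit-learn equivalents of the six classifiers (the experiments in this paper were actually run in WEKA, so hyperparameters here are library defaults, not the ones used in the study, and the feature matrix below is a synthetic placeholder):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data: one row of 17 features per window, labels 0..5
# standing in for the six ADLs of the real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 17))
y = rng.integers(0, 6, size=300)

classifiers = {
    "J48": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=100),
    "NB": GaussianNB(),
    "NN": MLPClassifier(max_iter=500),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
}

for name, clf in classifiers.items():
    # 10-fold CV: each fold holds out 10% of the windows for testing
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On random labels such as these, all classifiers hover around chance level (~1/6); the accuracies reported in Table 2 come from the real feature table.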

Table 1. List of Time-based features.

4 Results

4.1 Training and Classifiers Evaluation

In supervised learning approaches, the algorithm learns from a set of training examples whose features have been pre-classified: labeled classes are provided. In unsupervised learning approaches, instead, the algorithm works without labeled classes: the basic idea is to find patterns in the data using only the input variables. In reinforcement learning approaches, the algorithm learns from feedback received after each decision is made: this feedback tells the classifier which decisions were correct and which were incorrect. Most HAR systems work with a supervised learning approach. In general, the supervised approach is the most used thanks to its capability to learn the relationship between the input attributes (the extracted features) and the target attributes (the labeled classes). This relationship defines a model, which can be used to predict the target attribute knowing only the values of the input data.

The performance of the six machine learning algorithms mentioned in Sect. 3 was assessed using the WEKA machine learning toolkit [31]. The preliminary evaluation of the classification performance provided by the six classifiers is summarized in Table 2, which reports the attained accuracy levels.

Table 2. Accuracy of tested classifiers.

Among the six classifiers, the J48 and RF algorithms provided the best results, with an accuracy higher than 98%, while NB performed worst, at 52.05%. These three classifiers were considered for further evaluation: besides the accuracy, the F-measure, sensitivity and precision were also computed, as detailed in Table 3. The F-measure is defined as the harmonic mean of precision and recall, where precision (positive predictive value) is the fraction of true positives over all samples classified as positive, and recall (sensitivity) is the fraction of true positives over all samples that actually belong to the positive class.
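These metrics can be computed directly from predicted and actual labels; a small sketch with toy labels (ours, for illustration only) over six classes:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy predictions over six activity classes, illustrating:
#   precision = TP / (TP + FP), recall = TP / (TP + FN),
#   F-measure = harmonic mean of precision and recall.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
acc = accuracy_score(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```

Weighted averaging over the per-class scores is one common convention for multi-class problems; macro averaging is an equally valid alternative.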

Table 3. Evaluation metrics of J48, RF and NB classifiers.

Based on these results, the confusion matrices for the J48, RF and NB algorithms are reported in Tables 4, 5, and 6, summarizing how the six activities are classified by each algorithm. The actual classes are on the Y-axis, while the predicted classes are on the X-axis.

Table 4. Confusion Matrix for J48.
Table 5. Confusion Matrix for RF.
Table 6. Confusion Matrix for NB.
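A confusion matrix like those in Tables 4-6 can be produced from actual and predicted labels as follows (toy labels ours; rows are actual classes, columns predicted ones):

```python
from sklearn.metrics import confusion_matrix

ACTIVITIES = ["Brushing Teeth", "Grooming Hair", "Washing Hands",
              "Washing Dishes", "Ironing Clothes", "Dusting"]

# Toy labels: each off-diagonal entry counts one misclassification,
# e.g. a "Washing Dishes" window predicted as "Washing Hands".
y_true = [0, 1, 2, 3, 4, 5, 3, 4]
y_pred = [0, 1, 2, 2, 0, 5, 3, 4]

cm = confusion_matrix(y_true, y_pred, labels=range(6))
for name, row in zip(ACTIVITIES, cm):
    print(f"{name:>15}: {row}")
```

Inspecting the off-diagonal cells per row is what reveals which activity pairs a classifier confuses most often.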

5 Discussions and Conclusion

To support the independent living of older adults, it is crucial to have a robust HAR system. This study used a single wrist-worn device for the collection of accelerometer data. The task was examined by testing several well-known classification algorithms in recognizing six daily activities: brushing teeth, grooming hair, washing hands, washing dishes, ironing clothes and dusting. In general, the performance of classifiers is influenced by the nature of the dataset (e.g., the choice of activities to be executed, and the acquisition modality). In this case, the non-linearity of the dataset, along with the acquisitions in uncontrolled conditions, limited the linear NB classifier to a low accuracy. However, the non-linear RF and J48 classifiers provided more than 98% classification accuracy, proving to be good tools for the recognition of the performed activities in uncontrolled conditions.

Moreover, as emerges from the confusion matrices presented in the previous section, the ability of each classifier to discriminate the activities may depend on the nature of the activities considered. Table 6, for the NB algorithm, suggests that some specific misclassifications are more frequent than others: the "Ironing Clothes" class is mostly confused with the "Brushing Teeth" class, due to similar patterns in the wrist movements. Likewise, the "Washing Dishes" and "Washing Hands" classes, and the "Grooming Hair" and "Dusting" classes, are frequently confused, due to the natural randomness with which a subject performs these activities in free-living conditions. Although the number of wrongly predicted activities is low, some classification errors are also evident in Tables 4 and 5 (i.e., "Washing Dishes"-"Washing Hands" and "Brushing Teeth"-"Ironing Clothes").

The results are promising: the feature set used allows both high accuracy and high interpretability. This work fosters the adoption of wrist-worn devices as unobtrusive and practical tools in healthcare and well-being research. Building on these findings, a larger population should be involved to improve the reliability of the investigated approaches.