
1 Introduction

The world's population is ageing, and it is estimated that two billion people will be aged over 65 years by 2050. This will affect the planning and delivery of health and social care and will increase the prevalence of the clinical condition of frailty. Frailty is a medical syndrome characterized by diminished strength, endurance, and reduced physiologic function, which increases an individual's vulnerability to developing increased dependency and/or death [1]. It involves multiple pathologies such as weight loss, weakness, low activity, slow motor performance, and balance and gait abnormalities, as well as cognitive impairments [2]. Frailty increases the risk of incident falls, worsening mobility, disability, hospitalization or institutionalization, and mortality [3,4,5], which in turn increase the burden on carers and the cost to society.

It is assumed that early intervention with frail persons will improve quality of life and reduce health service costs. Since frailty is a dynamic rather than an irreversible process, it is essential to develop real-life tools for the assessment of physiologic reserve and to test interventions that alter its natural course. Several efforts have been made in this direction through research and development activities. In [6] an ecosystem for training, informing and providing tools, processes and methodologies for ICT and active, healthy ageing was developed, mainly targeting caregivers, older people and the general population. In [7] an interactive tabletop platform able to integrate potentialities derived from both technology and leisure activities was designed; a further purpose of [7] was the monitoring of older people's status, with information about their progression or regression in cognitive health. In [8] a home-care solution addressing older people living in the community in a preventive manner, relying on ICT and virtual reality gaming through the exploitation of haptic technologies, vision control and context awareness methods, was developed and integrated, promising to redefine fall prevention by motivating people to be more active in a friendly way, with tele-supervision if necessary. In [9] a more holistic, personalized, medically efficient and economical monitoring system for people with epilepsy was provided. In [10] multidisciplinary research areas in serious games, social networking, wireless sensor networks, activity recognition and contextualization, and behavioral pattern analysis were combined in pilot setups involving both older users and care providers. In [11] a mobile, personalized assistant for older people was developed, using cutting-edge technologies such as advanced speech interaction, which helps them stay independent, coordinate with carers and foster their social contacts.
In [12] a new, economically sustainable home assistance service that extends older people's independent living was introduced, measuring the impact of monitoring, cognitive training and e-Inclusion services on the quality of life of older people, on the cost of the social and healthcare delivered to them, and on a number of social indicators. In [13] the objective was to develop and test a proactive personal robot, integrated with innovative sensors, localization and communication technologies, and smart textiles, to support independent living for older adults in their home or in various degrees of institutionalization, with a focus on health, nutrition, well-being, and safety. In [14] innovative ICT-based solutions for the detection of falls in ageing people were studied, covering prevention and detection of falls in different circumstances.

Human motion monitoring is essential in the surveillance of older people, since the related information is crucial for understanding their physical status and behavior. Older people suffering from frailty are often required to follow a program of activity with a training schedule that is integrated within their daily activities [15]. The detection of activities such as walking or walking-upstairs therefore becomes quite useful for providing the caregiver with valuable information about the patient's behavior. Under conditions of daily living, human activity recognition should rely on objective and reliable techniques.

Monitoring the activities of daily living requires non-intrusive technology. The devices with which an older person can be instrumented fall into two main categories based on the underlying technology: vision-based and wearable. In computer vision, complex sensors such as cameras continuously record the movement of the elderly, and the acquired data are submitted to image processing algorithms that recognize human activities. In general, tracking and activity recognition using computer vision techniques perform quite well under laboratory conditions or in well-controlled environments. However, their accuracy is lower in real-home settings, due to the high-level activities that take place in natural environments as well as variable lighting and clutter [16]. Furthermore, computer vision methods require a pre-built infrastructure which, beyond the time and cost of installation, limits the space of application, since it is hard to use outdoors. As a result, wearable devices such as body-attached accelerometers and gyroscopes are commonly used as an alternative for assessing various daily living activities. Human motion detection using wearable sensors is an emerging area of research due to their low power requirements, small size, non-intrusiveness and ability to provide data on human motion. The acquired data can be processed using signal processing and pattern recognition methods in order to obtain accurate recognition of human motion.

Data acquired from wearable sensors have been used to evaluate several human activity recognition methods proposed in the literature. Acceleration signals have been used to analyze and classify different kinds of activity [16, 17] or applied to recognizing a wide set of daily physical activities [18]. Feature selection techniques have also been investigated [19]. The reclassification step introduced by Bernecker et al. [20] has been demonstrated to increase motion recognition accuracy. The results achieved by the on-board processing technique for the real-time classification system proposed by Karantonis et al. [21] demonstrate the feasibility of implementing an accelerometer-based, real-time movement classifier using embedded intelligence. Khan et al. [16] proposed a system that uses a hierarchical recognition scheme, i.e., state recognition at the lower level using statistical features and activity recognition at the upper level using the augmented feature vector followed by linear discriminant analysis. Among the machine learning algorithms for human motion identification found in the literature, the most widely used are artificial neural networks [21,22,23], Naive Bayes [18] and support vector machines [19, 24,25,26].

In this article we focus on physiological function and motor performance. We present a human motion identification scheme, together with preliminary evaluation results, which will be further exploited within the FrailSafe [27] project architecture. The proposed scheme extracts temporal and spectral features from the sensor signals, concatenates them into a single feature vector, and trains motion dependent binary classification models whose decisions are then fused to identify activities of daily living (ADLs). This scheme is compared against the common multiclass classification scheme after the latter has been optimized using two different strategies.

The remainder of this article is organized as follows. In Sect. 2 we present the FrailSafe concept. In Sect. 3 we describe the proposed human motion identification scheme and the common multiclass approach with respect to the optimization strategies. In Sects. 4 and 5 we present the experimental setup and the evaluation results respectively. Finally, in Sect. 6 we conclude this work.

2 The FrailSafe Concept

FrailSafe aims to better understand frailty and its relation to co-morbidities, to develop quantitative and qualitative measures of frailty, and to use these measures to predict short- and long-term outcomes. To achieve these goals, real-life tools for the assessment of physiological reserve and of external challenges will be developed. These tools will provide an adaptive model (sensitive to changes) on which pharmaceutical and non-pharmaceutical interventions designed to delay, arrest or even reverse the transition to frailty can be based. Moreover, FrailSafe aims to create evidence-based “prevent-frailty” recommendations for older people regarding activities of daily living, lifestyle and nutrition, and to strengthen motor, cognitive, and other “anti-frailty” activities through the delivery of personalized treatment programs, monitoring alerts, guidance and education. The FrailSafe conceptual diagram for motion monitoring is illustrated in Fig. 1.

Fig. 1. The FrailSafe conceptual diagram for motion monitoring (reproduced from [28]).

Through patient-specific interventions, FrailSafe aims to define a frailty measure. This measure is initially constructed from prior knowledge in the field, and then globally updated based on analysis of long-term observations of all older people's states. The update is applied to the individual patient models, modifying them accordingly to fit the needs of each patient. The monitoring of older people's motion activity is performed through the environmental sensors module, which includes accelerometer sensors for monitoring human motion. Details about the motion identification implementation are provided in the next section.

3 Methodology for Human Motion Identification

3.1 Motion Dependent Classification Scheme

The presented methodology for human motion identification will be used as part of an end-to-end system for sensing and predicting the risk of frailty, taking into account associated co-morbidities through advanced personalized models, and for delaying frailty through advanced interventions.

The proposed classification methodology can be used as a core module that discriminates the detected motions into six basic activities: walking, walking-upstairs, walking-downstairs, sitting, standing and laying. The block diagram of the overall workflow for learning the activity classifiers is illustrated in Fig. 2.

Fig. 2. Motion dependent classification of ADLs.

The multi-parametric sensor (accelerometer and gyroscope) data are pre-processed as in [24,25,26] by applying noise filters and then sampled in fixed-width sliding windows \( W_{i} , 1 \le i \le I \) (frames) of 2.56 s with 50% overlap. The sensor acceleration signal, which has gravitational and body motion components, is separated into body acceleration and gravity using a Butterworth low-pass filter. From each frame, a feature vector \( V_{i} \in R^{k} , k = \left| {F_{T} } \right| + |F_{F} | \) is obtained by calculating variables in the time \( F_{T}^{i} \in R^{{|F_{T} |}} \) and frequency \( F_{F}^{i} \in R^{{|F_{F} |}} \) domains.
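As an illustration, the framing and gravity separation step can be sketched in Python. This is a minimal sketch assuming SciPy; the window length of 128 samples corresponds to 2.56 s at 50 Hz, and the 0.3 Hz corner frequency follows the pre-processing of [24,25,26].

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50      # sampling rate (Hz), as in the UCI HAR protocol
FRAME = 128  # 2.56 s window at 50 Hz
HOP = 64     # 50% overlap

def separate_gravity(acc, fs=FS, fc=0.3, order=3):
    """Split total acceleration into gravity (low-pass) and body components."""
    b, a = butter(order, fc / (fs / 2), btype="low")
    gravity = filtfilt(b, a, acc, axis=0)  # zero-phase low-pass filtering
    body = acc - gravity
    return body, gravity

def frames(signal, frame=FRAME, hop=HOP):
    """Fixed-width sliding windows with 50% overlap along the first axis."""
    n = (len(signal) - frame) // hop + 1
    return np.stack([signal[i * hop : i * hop + frame] for i in range(n)])
```

The body/gravity split is then applied per axis before windowing, so that each frame carries both components for feature extraction.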

The extracted time domain and frequency domain features are concatenated into a single feature vector as a representative signature for each frame. Details on the type of extracted features are provided in Sect. 4.

All frames are used as input to the human motion identification module, which classifies basic activities of daily living (ADLs) in order to obtain preliminary evaluation results for the proposed scheme. In this module, \( N \) motion dependent binary classification models, built in the training phase, are used to label the frames. \( N \) is the number of discrete motions/classes; here \( N = 6 \), since the motions to be identified include six ADLs: walking, walking-upstairs, walking-downstairs, sitting, standing and laying. Each classification model is trained as a binary model capable of discriminating one motion from all the others, e.g., the first classification model performs walking vs. all remaining ADLs classification, the second performs walking-upstairs vs. all remaining ADLs classification, etc.

During the training phase of the classification scheme, frames with known class labels (manually labeled) are used to train the \( N \) classification models. For this purpose, the same training set is labeled in \( N \) different ways to obtain \( N \) motion dependent binary classification models. Each time, the examined motion is treated as the positive class while a negative class label is assigned to all remaining motions. In a further step, in order to obtain appropriate weights for fusing the individual classifiers, we evaluated each classification model using a 10-fold cross-validation protocol on the corresponding training set. The sensitivity achieved by each motion dependent classification model \( \left( {S_{i} , 1 \le i \le N } \right) \) is defined as:

$$ S_{i} = \frac{{TP_{i} }}{{TP_{i} + FN_{i} }}. $$
(1)

where \( TP_{i} \) denotes the true positives of classifier \( i \) and \( FN_{i} \) its false negatives. The achieved sensitivity is used as a weight multiplying the decision \( d_{i} , 1 \le i \le N \), taken by the corresponding classifier \( i \), in a fusion function used to combine the individual decisions:

$$ Decision = S_{1} d_{1} + S_{2} d_{2} + \ldots + S_{N} d_{N} . $$
(2)

Here, we selected sensitivity instead of another measure such as accuracy since a measure of the proportion of the positives (i.e., the motion under consideration) that were correctly identified as such is more appropriate for the purpose of fusion.

During the test phase, the unknown multi-parametric sensor signals are pre-processed and parameterized with the same setup as in the training phase. Each extracted feature vector is provided as input to each of the \( N \) trained motion dependent classification models. Each of them takes a binary decision \( d_{i} , 1 \le i \le N \). Finally, the \( N \) individual decisions are combined using the fusion function of Eq. (2).
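A minimal sketch of this fusion step follows, assuming each binary classifier outputs \( d_i \in \{0, 1\} \) and that the fused label is the motion whose weighted vote \( S_i d_i \) is largest; the exact tie-breaking rule is an assumption, not something the paper specifies.

```python
import numpy as np

def fuse(decisions, sensitivities):
    """decisions: (N_classifiers, n_frames) binary outputs d_i in {0, 1};
    sensitivities: per-classifier weights S_i from 10-fold CV on training data.
    Returns, per frame, the index of the motion with the largest weighted vote."""
    weighted = np.asarray(sensitivities)[:, None] * np.asarray(decisions)
    return np.argmax(weighted, axis=0)

# Example: 3 classifiers, 2 frames; classifier 2 fires on both frames.
d = np.array([[1, 0], [1, 1], [0, 0]])
fuse(d, [0.95, 0.99, 0.90])  # → array([1, 1])
```

When several binary models fire on the same frame, the one with the highest training-set sensitivity wins, which is exactly what the weights in Eq. (2) are meant to encode.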

3.2 Multiclass Classification Scheme

The proposed scheme is compared to the common multiclass classification scheme, which is widely used in such applications. The block diagram of the workflow for learning the multiclass classifier of this scheme is illustrated in Fig. 3.

Fig. 3. Multiclass classification scheme (reproduced from [28]).

In order to make a fair comparison, the multi-parametric sensor (accelerometer and gyroscope) data are pre-processed and parameterized with the same features as in the motion dependent classification scheme. Once again, the extracted time domain and frequency domain features are concatenated into a single feature vector as a representative signature for each frame. However, in this scheme all frames are used as input to a human motion identification module which directly classifies the N basic activities of daily living (ADLs). In this module, a model for multiclass classification between N (here N = 6) basic ADLs (walking, walking-upstairs, walking-downstairs, sitting, standing and laying), previously built in a training phase, is used to label the frames. Each frame is classified independently.

During the training phase of the classification architecture, frames with known class labels (manually labeled) are used to train the multiclass classification model. During the test phase, the unknown multi-parametric sensor signals are pre-processed and parameterized with the same setup as in the training phase, and each extracted feature vector is provided as input to the trained classifier.

In a further step, we attempt to optimize the multiclass classification scheme following two different strategies: on the one hand, we perform feature selection by feature ranking prior to classification; on the other hand, we perform subject dependent classification.

Feature Selection.

Regarding the first optimization strategy, we examined the discriminative ability of the extracted features for the human motion identification [28]. The ReliefF algorithm [29] (which is an extension of an earlier algorithm called Relief [30]) was used for estimating the importance of each feature in multiclass classification. In the ReliefF algorithm the weight of any given feature decreases if the squared Euclidean distance of that feature to nearby instances of the same class is more than the distance to nearby instances of the other class. ReliefF is considered one of the most successful feature ranking algorithms due to its simplicity and effectiveness [31,32,33] (only linear time in the number of given features and training samples is required), noise tolerance and robustness in detecting relevant features effectively, even when these features are highly dependent on other features [31, 34]. Furthermore, ReliefF avoids any exhaustive or heuristic search compared with conventional wrapper methods and usually performs better compared to filter methods due to the performance feedback of a nonlinear classifier when searching for useful features [32].

The performance of the method, in terms of accuracy for different numbers of N-best features (N = 10, 20, 30, …, 560), with respect to the ReliefF feature ranking algorithm was examined. The subset that achieved the highest classification accuracy was selected and used to optimize the performance of the multiclass classification scheme.
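For illustration, a simplified Relief-style ranking can be sketched as follows. This is not the full ReliefF of [29]: it uses a single nearest hit/miss per sampled instance and Manhattan distance for simplicity, and it assumes features scaled to [0, 1].

```python
import numpy as np

def relief_weights(X, y, n_iter=None, rng=None):
    """Simplified Relief: for each sampled instance, reward features that
    differ from the nearest miss (other class) and penalize those that
    differ from the nearest hit (same class)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    idx = rng.choice(n, n if n_iter is None else n_iter, replace=False)
    w = np.zeros(d)
    for i in idx:
        dist = np.abs(X - X[i]).sum(axis=1)  # Manhattan distance to all rows
        dist[i] = np.inf                     # exclude the instance itself
        same, diff = (y == y[i]), (y != y[i])
        hit = np.where(same)[0][np.argmin(dist[same])]
        miss = np.where(diff)[0][np.argmin(dist[diff])]
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / len(idx)

def n_best(X, y, n):
    """Indices of the n highest-ranked features."""
    return np.argsort(relief_weights(X, y))[::-1][:n]
```

Discriminative features accumulate positive weight (large difference to misses, small to hits), so sorting the weights yields the N-best subsets evaluated in the text.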

Subject Dependent Classification.

Regarding the second optimization strategy, we simply trained a subject-specific multiclass classification model for each subject separately, instead of one global model trained on several subjects [28]. This strategy makes classification an easier task, since inter-subject variability does not affect the accuracy of the model. However, such an approach has the serious disadvantage of requiring training data from each subject.
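A sketch of this subject dependent strategy, assuming scikit-learn's `SVC` as a stand-in for the SMO classifier used later in the paper and a per-frame array of subject identifiers:

```python
import numpy as np
from sklearn.svm import SVC

def train_subject_models(X, y, subjects, **svm_kw):
    """One RBF-kernel SVM per subject (subject dependent scheme).
    `subjects` holds a subject id for every frame in X."""
    models = {}
    for s in np.unique(subjects):
        mask = subjects == s
        models[s] = SVC(kernel="rbf", **svm_kw).fit(X[mask], y[mask])
    return models
```

At test time, each frame is routed to the model of its own subject, which is why the approach needs labeled training data from every subject.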

4 Experimental Setup

4.1 Dataset

The previously described classification methodology was evaluated on multi-parametric data from the UCI HAR Dataset [24]. The dataset consists of accelerometer and gyroscope recordings from 30 volunteers within an age bracket of 19–48 years performing six activities (walking, walking-upstairs, walking-downstairs, sitting, standing, laying). For the experiments, each person wore a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz. The data were labeled manually using the corresponding video recordings captured during the experiments.

4.2 Feature Extraction and Classification Algorithm

Initially, the sensor signals (accelerometer and gyroscope) were pre-processed as proposed in [24,25,26] in order to proceed with feature extraction. The features selected for this analysis are those proposed in [24,25,26], which come from the accelerometer and gyroscope 3-axial raw signals, denoted tAcc-XYZ and tGyro-XYZ, with the prefix 't' denoting time. The sampling frequency of these time domain signals was 50 Hz. In order to remove noise, Anguita et al. performed low-pass filtering using a median filter and a 3rd-order low-pass Butterworth filter with a corner frequency of 20 Hz. Then, in order to separate the acceleration signal into body and gravity acceleration signals, denoted tBodyAcc-XYZ and tGravityAcc-XYZ, they used another low-pass Butterworth filter with a corner frequency of 0.3 Hz.

Subsequently, jerk signals, denoted tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ, were obtained by time derivation of the body linear acceleration and angular velocity. The Euclidean norm was also used to calculate the magnitude of these three-dimensional signals, yielding tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag and tBodyGyroJerkMag.
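These derivative and magnitude signals can be sketched as follows (a minimal NumPy example; signals are assumed to be `(n_samples, 3)` arrays sampled at 50 Hz):

```python
import numpy as np

FS = 50  # sampling rate (Hz)

def jerk(signal, fs=FS):
    """Time derivative of a (n_samples, 3) signal,
    e.g. tBodyAcc-XYZ -> tBodyAccJerk-XYZ."""
    return np.gradient(signal, 1.0 / fs, axis=0)

def magnitude(signal):
    """Euclidean norm across the three axes,
    e.g. tBodyAcc-XYZ -> tBodyAccMag."""
    return np.linalg.norm(signal, axis=1)
```

Chaining the two (`magnitude(jerk(body_acc))`) yields signals such as tBodyAccJerkMag.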

Finally, a Fast Fourier Transform (FFT) was applied to signals tBodyAcc-XYZ, tBodyAccJerk-XYZ, tBodyGyro-XYZ, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. Here, the prefix ‘f’ was used to indicate frequency domain signals.

These signals were used to estimate variables of the feature vector for each pattern; '-XYZ' denotes 3-axial signals in the X, Y and Z directions. The signals produced by processing the initial sensor recordings are tabulated in Table 1.

Table 1. Pre-processed Signals reproduced from [28].

The set of features extracted from these signals are those proposed by Anguita et al., including the mean value, the standard deviation, the median absolute deviation, the largest and smallest values in the array, the signal magnitude area, the energy measure (the sum of squares divided by the number of values), the interquartile range, the signal entropy, the autoregression coefficients with Burg order equal to 4, the correlation coefficient between two signals, the index of the frequency component with the largest magnitude, the weighted average of the frequency components (mean frequency), the skewness and kurtosis of the frequency domain signal, the energy of a frequency interval within the 64 bins of the FFT of each window, and the angle between two vectors.
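A few of these per-window features can be sketched as follows; this is an illustrative subset only, not the full 561-dimensional vector of Anguita et al.

```python
import numpy as np
from scipy.stats import skew, kurtosis, iqr

def window_features(x):
    """A small subset of the Anguita et al. time/frequency features for one
    1-D window: mean, std, MAD, max, min, energy, IQR, spectral skewness and
    kurtosis, and the index of the dominant frequency bin."""
    spec = np.abs(np.fft.rfft(x))  # magnitude spectrum of the window
    return np.array([
        x.mean(), x.std(),
        np.median(np.abs(x - np.median(x))),  # median absolute deviation
        x.max(), x.min(),
        np.sum(x ** 2) / len(x),              # energy measure
        iqr(x),                               # interquartile range
        skew(spec), kurtosis(spec),
        np.argmax(spec),                      # dominant frequency bin index
    ])
```

In the real pipeline the same extraction is repeated over every pre-processed signal in Table 1 and the results are concatenated into the frame's feature vector.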

Additional vectors were obtained by averaging the signals in a signal window sample; these are used in the angle variable (see Table 2). In total, for each record a 561-feature vector with the aforementioned time and frequency domain variables was produced.

Table 2. Additional Signals reproduced from [28].

The computed feature vectors were used to train either six binary classification models (walking vs. all remaining, walking-upstairs vs. all remaining, walking-downstairs vs. all remaining, sitting vs. all remaining, standing vs. all remaining, and laying vs. all remaining) for the motion dependent classification scheme, or a multiclass classification model for the multiclass classification scheme. In the latter case, the multiclass model is trained on the reduced feature vector obtained by feature selection for the first optimization strategy, or on the subject-specific data for the second optimization strategy. In order to evaluate the ability of the extracted features to discriminate between ADLs, we examined the SMO algorithm [35, 36] with an RBF kernel, as implemented in the WEKA machine learning toolkit [37]. SMO is an implementation of support vector machines (SVMs) provided by the WEKA toolkit. We selected SMO for classification since SVMs are the most widely used classifiers in the relevant literature.
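As an analogue of this WEKA SMO setup, an RBF-kernel SVM can be built with scikit-learn; this is a sketch, and the standardization step and default hyper-parameters are assumptions, not the paper's exact configuration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def make_motion_classifier(C=1.0, gamma="scale"):
    """RBF-kernel SVM pipeline, a stand-in for WEKA's SMO.
    Works both as a one-vs-rest binary model (motion dependent scheme)
    and as a multiclass model (scikit-learn handles multiclass internally)."""
    return make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
```

The same constructor serves both schemes: fitted on binary labels it yields one motion dependent detector, fitted on all six labels it yields the multiclass baseline.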

During the test phase, the sensor signals were pre-processed and parameterized as during training. The SMO classification model was used to label each of the activities. In order to directly compare the proposed methodology with previous approaches evaluated on the same dataset, we followed the evaluation protocol applied in the existing literature [24,25,26, 38,39,40]. In particular, we used the existing random partitioning of the dataset into two sets, where 70% consists of training samples and 30% consists of test samples.

However, for the motion dependent classification scheme, in order to learn the weights for the fusion function (2), the sensitivity values must be obtained exclusively from the training set, since an overlap with the test set would lead to overfitting. For this purpose, in order to obtain the appropriate weights for (2), we performed 10-fold cross-validation \( N \) times, on the \( N \) versions of the training set respectively.
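This cross-validated estimation of the fusion weights can be sketched as follows, assuming scikit-learn and noting that sensitivity equals the recall of the positive class; the classifier and its hyper-parameters are illustrative stand-ins for the SMO setup.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def fusion_weights(X, y, classes, cv=10):
    """Per-motion sensitivity S_i estimated with cross-validation on the
    training set only, to weight the fusion function (2). Each motion is
    treated in turn as the positive class of a binary problem."""
    weights = []
    for c in classes:
        y_bin = (y == c).astype(int)                       # one-vs-rest labels
        pred = cross_val_predict(SVC(kernel="rbf"), X, y_bin, cv=cv)
        weights.append(recall_score(y_bin, pred))          # sensitivity, Eq. (1)
    return np.array(weights)
```

Because every prediction comes from a fold that did not train on it, the resulting weights never see the held-out 30% test partition.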

5 Experimental Results

5.1 Evaluation of the Motion Dependent Classification Scheme

The motion dependent classification scheme presented in Sect. 3 was evaluated using the feature extraction and the classification algorithm described in Sect. 4. Initially, each motion dependent classification model \( i\ (1 \le i \le N,\ N = 6) \) was evaluated on the corresponding version of the training set, which treats the motion/ADL under consideration as the positive class and all the others as negative. The aim of this evaluation is to obtain appropriate weights for the fusion function (2), which combines the individual decisions. For this purpose, the sensitivity of each model as defined in (1) was used. The sensitivity values obtained for each of the motion dependent classification models on the training set are tabulated in Table 3.

Table 3. Sensitivity of motion dependent classifiers on the training set.

Based on these values, the fusion function is:

$$ Decision = 1d_{1} + 1d_{2} + 0.99d_{3} + 0.95d_{4} + 0.96d_{5} + 1d_{6} . $$
(3)

The classification performance of the overall framework was evaluated on the 30% of the dataset reserved for testing, in terms of accuracy, defined as

$$ Accuracy = \frac{TP + TN}{TP + FP + TN + FN} . $$
(4)

where TP denotes the true positives, TN denotes the true negatives, FP the false positives and FN the false negatives. Table 4 shows the achieved accuracy obtained from the test set for the motion dependent binary classifiers and the proposed fusion framework. The achieved accuracy of the proposed scheme is 99%.

Table 4. Accuracy of motion dependent classifiers and fusion scheme on the test set.

5.2 Evaluation of the Multiclass Classification Scheme

In order to make a fair comparison of the proposed fusion scheme with the common multiclass classification scheme, we evaluated the latter after an attempt to optimize it either by feature selection or by performing subject dependent classification.

Feature Selection.

Regarding the first optimization strategy, we applied feature ranking on the whole dataset (consisting of all available subjects) using the ReliefF algorithm as described in Sect. 3.2. The classification performance, in terms of accuracy, for different numbers of N-best features (N = 10, 20, 30, …, 560) for the SMO algorithm is shown in Fig. 4.

Fig. 4. Classification accuracy for different subsets of N-best features (N = 10, 20, …, 550) (reproduced from [28]).

As can be seen in Fig. 4, the highest classification accuracy is achieved when a large subset of discriminative features is used. Specifically, the highest accuracy, 97%, is achieved for a subset of the 550 best features, equal to the accuracy achieved when all features are used. It seems that the size and variability of the dataset are relatively large, requiring a feature vector of high dimensionality to accurately discriminate between the six classes. However, with only 40 features a high accuracy of 90% can already be achieved.

Table 5 shows the 40 best features according to the ReliefF ranking algorithm. Although a high-dimensional feature vector is needed to achieve the highest classification accuracy, feature selection remains important in cases where a lightweight human motion identification module is needed, such as in FrailSafe [27].

Table 5. ReliefF Feature Ranking reproduced from [28].

Subject Dependent Classification.

Regarding the second optimization strategy, we trained subject-specific classifiers as described in Sect. 3.2. The results are shown in Table 6.

Table 6. Subject dependent human motion identification Accuracy reproduced from [28].

As can be seen in Table 6, the overall highest accuracy of the proposed methodology for human motion identification is 100%, obtained for subjects 1, 9, 11, 14, 23, 24, 26, 27 and 29, while the lowest accuracy, 69.07%, was obtained for the 6th subject. However, the mean accuracy is relatively high, i.e., 94.35%.

Comparison.

Table 7 compares the proposed fusion scheme with the common multiclass classification scheme and with other relevant approaches evaluated on the same dataset with the same evaluation protocol. As can be seen, the proposed framework outperforms both the multiclass classification scheme and the previous approaches [24, 25, 38,39,40], reaching an accuracy of 99%. The improvement in overall performance is due to the use of motion-specific SVM-based detectors, which are able to accurately model the specific characteristics of each motion type. Moreover, during model training, the SVM algorithm adapts its kernel mapping parameters to each target motion type, so that the corresponding feature vectors are better discriminated from those of the other human body motion types.

Table 7. Comparison of existing methods on the test set.

6 Conclusions

We presented a methodology for human motion identification from multi-parametric sensor data acquired using accelerometers and gyroscopes, which will be used as part of an end-to-end system for sensing and predicting the risk of frailty and associated co-morbidities using advanced personalized models and advanced interventions. The methodology uses motion dependent binary classification models that classify the sensor signals separately and combines the individual decisions using a fusion function weighted by the sensitivity of each model in discriminating the examined class. The classification models were trained and tested using SVMs on feature vectors consisting of a large number of time and frequency domain features. The proposed scheme was compared against the common multiclass classification scheme after optimization of the latter through feature selection and subject dependent classification. All schemes were evaluated using multi-parametric data from 30 subjects. The proposed scheme reached an accuracy of approximately 99%, higher than that achieved by the multiclass classification scheme even after its optimization. Finally, the proposed scheme was compared against other relevant studies in the literature: the achieved accuracy is more than 2.5% higher than those reported by previous approaches evaluated on the same dataset with the same evaluation protocol.