
1 Introduction

Human activity recognition has proven beneficial in various fields such as healthcare [1,2,3], where it is often used to help physicians make a correct diagnosis. Accelerometers are commonly used in human activity recognition because they provide an objective, non-intrusive measure of activity and acquire data at high resolution [1]. Triaxial accelerometers built into smartphones are a practical and cost-effective choice for recognizing human daily activities [4].

Various studies have evaluated human daily activities by extracting time- or frequency-domain features from acceleration data, and accuracies above 90% have been reported [5,6,7]. However, such high accuracy is only attainable when the activities are simple, well separated, and performed carefully according to the researchers' instructions. When the activities are more complex, closely resemble one another, and are performed more naturally, human activity recognition becomes challenging because there is no clear way to relate the signal data to a specific activity.

Past studies reported promising results from jerk-based features for activity recognition in animals [8]. Jerk was used there because animals cannot be directed with specific instructions to perform tasks. Furthermore, jerk can describe changes in acceleration independently of sensor orientation, even when the sensor is loosely attached and frequently shifts [8]. Motivated by these results, the present study aimed to evaluate the performance of jerk compared with acceleration in recognizing human daily activities. Specifically, we examined whether jerk, the time derivative of acceleration, could overcome the low sensitivity of acceleration data in discriminating complex and closely similar human daily activities. Machine learning techniques were used for the evaluation because of their robustness [9].

This paper is organized as follows. Section 2 describes the materials and methods used to obtain the dataset, together with a brief theoretical background on acceleration and jerk, feature extraction, and the machine learning classifiers. Section 3 presents the results, and Sect. 4 discusses them. Finally, Sect. 5 concludes the paper.

2 Materials and Method

The dataset used to evaluate the performance of acceleration and jerk was taken from the study by Anguita et al. [4]. The data were collected from thirty subjects aged 19 to 48 years using the triaxial accelerometer built into a Samsung smartphone. The smartphone was attached to the subject's waist, and the sampling frequency was set to 50 Hz. Each subject performed six activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying down. The signals were pre-processed using sliding windows of 2.56 s with 50% overlap. Time- and frequency-domain features were then extracted from the acceleration and jerk data and analyzed using supervised machine learning techniques, namely k-nearest neighbors (KNN), Fisher's linear discriminant analysis (LDA), support vector machine (SVM), and random forest (RF). The performance of the acceleration and jerk datasets was evaluated with ten-fold cross-validation in the Weka environment.
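The windowing step can be illustrated with a short Python sketch. At 50 Hz a 2.56 s window holds 128 samples, and 50% overlap means each new window starts 64 samples after the previous one. This is a minimal illustration assuming a single-axis signal stored as a list. The function name `sliding_windows` is hypothetical, and the sketch is not the code used in the study.

```python
# Minimal sketch: split a 50 Hz signal into 2.56 s windows with 50% overlap.
# FS, WINDOW_S and OVERLAP mirror the values stated in the text; the function
# name sliding_windows is illustrative only.

FS = 50            # sampling frequency in Hz
WINDOW_S = 2.56    # window length in seconds
OVERLAP = 0.5      # fraction of overlap between consecutive windows

def sliding_windows(signal, fs=FS, window_s=WINDOW_S, overlap=OVERLAP):
    """Return a list of fixed-length windows; a trailing partial window is dropped."""
    size = round(fs * window_s)           # 50 * 2.56 = 128 samples
    step = round(size * (1 - overlap))    # 64 samples between window starts
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]

if __name__ == "__main__":
    samples = list(range(320))            # 6.4 s of dummy data
    windows = sliding_windows(samples)
    print(len(windows), len(windows[0]))  # 4 128
```

Dropping the trailing partial window keeps every feature vector computed over the same number of samples.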

2.1 Acceleration and Jerk

Acceleration is the rate of change of velocity, in speed and/or direction, and is commonly used in human activity recognition research. However, acceleration data exclude the force of gravity and thus reflect only the consequence of static load [10]. Jerk is the time derivative of acceleration and is felt as a change in acceleration. The magnitude of jerk describes changes in acceleration independently of sensor orientation [8]. Thus, jerk can overcome the problem of unknown sensor orientation that arises when a smartphone in a subject's pocket shifts position.
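The relationship between acceleration and jerk can be sketched numerically. This sketch assumes that jerk is approximated by a first-order finite difference at 50 Hz, because the text does not state the differentiation scheme. The example shows two properties. First, a constant component such as gravity cancels in the difference. Second, the jerk magnitude is unchanged when the sensor is rotated by a fixed angle. Both properties hold only while the orientation stays constant across the differenced samples. The function names are hypothetical.

```python
import math

FS = 50  # sampling frequency in Hz, as stated for the dataset

def jerk(acc, fs=FS):
    """Approximate jerk by a first-order finite difference of acceleration.

    `acc` is a list of (x, y, z) samples in m/s^2; the result is in m/s^3 and
    has one sample fewer than the input. Multiplying by fs is the same as
    dividing by the sampling interval 1/fs.
    """
    return [tuple((b - a) * fs for a, b in zip(prev, cur))
            for prev, cur in zip(acc, acc[1:])]

def magnitude(vectors):
    """Euclidean norm of each (x, y, z) sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in vectors]

if __name__ == "__main__":
    # Toy samples with a constant 9.81 m/s^2 gravity component on the z axis.
    acc = [(0.0, 0.0, 9.81), (0.5, 0.0, 9.81), (1.5, 0.0, 9.81)]
    print(jerk(acc))  # [(25.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
    # The same motion seen by a sensor rotated 90 degrees about the z axis.
    rotated = [(-y, x, z) for x, y, z in acc]
    print(magnitude(jerk(acc)), magnitude(jerk(rotated)))  # [25.0, 50.0] [25.0, 50.0]
```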

2.2 Feature Extraction

Features of human physiological data fall into two domains. Time-domain features describe how the signal amplitude varies over time. Frequency-domain features are more robust but require signal pre-processing such as the Fast Fourier Transform (FFT) [11]. In this study, the features were the mean, standard deviation, mean absolute deviation, maximum, minimum, signal magnitude area, energy, interquartile range, and spectral entropy, computed in both the time and frequency domains.
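The features listed above can be computed per window with the Python standard library, as in the sketch below. Several definitions are assumptions because the text does not specify them: the standard deviation is the population version, energy is the mean of squared values, signal magnitude area averages |x| + |y| + |z| over the window, and spectral entropy is the Shannon entropy of the normalized power spectrum with the DC bin excluded. The function names are hypothetical. For the frequency-domain versions, the same statistics would be applied to spectral values instead of raw samples.

```python
import cmath
import math
import statistics

def time_features(x):
    """Per-axis features of one window `x` (a list of floats)."""
    mean = statistics.fmean(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile cut points (needs len(x) >= 2)
    return {
        "mean": mean,
        "std": statistics.pstdev(x),  # population standard deviation (assumed)
        "mean_abs_dev": statistics.fmean([abs(v - mean) for v in x]),
        "max": max(x),
        "min": min(x),
        "energy": sum(v * v for v in x) / len(x),  # mean of squared values (assumed)
        "iqr": q3 - q1,
    }

def signal_magnitude_area(xs, ys, zs):
    """Average over the window of |x| + |y| + |z| for a triaxial signal."""
    return sum(abs(a) + abs(b) + abs(c) for a, b, c in zip(xs, ys, zs)) / len(xs)

def spectral_entropy(x):
    """Shannon entropy (bits) of the normalized power spectrum, DC bin excluded.

    Uses a direct O(n^2) DFT to stay dependency-free; an FFT library would
    normally be used instead.
    """
    n = len(x)
    power = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(1, n // 2 + 1)]
    total = sum(power)
    if total < 1e-12:  # (near-)constant window: no spectral content outside DC
        return 0.0
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A pure tone concentrates its power in one bin and has entropy near 0. An impulse spreads power evenly across all bins, which gives the maximum entropy for the window length.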

2.3 Classifiers

In this study, we evaluated the performance of the acceleration and jerk data using four supervised classifiers: k-nearest neighbors (KNN), Fisher's linear discriminant analysis (LDA), support vector machine (SVM), and random forest (RF). All machine learning analyses were performed in the Weka environment. KNN is a fast and simple supervised classifier [9]. It classifies a new observation by measuring its similarity to the training samples and assigning it to the class that receives the most votes among its k most similar neighbors [12]. Despite its simplicity, KNN is an effective classifier for human activity recognition [13].
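The KNN voting rule can be sketched in a few lines. This sketch assumes Euclidean distance and k = 3, because the study does not report k or the distance measure. The feature vectors and activity labels are made-up toy values, and `knn_predict` is a hypothetical name. The code requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Distance is Euclidean (assumed). Ties in the vote go to the label that
    appears first among the sorted neighbours.
    """
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, not taken from the dataset.
    train_X = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
    train_y = ["sitting"] * 3 + ["walking"] * 3
    print(knn_predict(train_X, train_y, (0.15, 0.1)))  # sitting
    print(knn_predict(train_X, train_y, (2.0, 2.0)))   # walking
```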

LDA projects samples onto a line so that the projections of different classes are well separated. It works best when the class means are far apart and the class variances are small [14]. The SVM also separates data points into classes with a hyperplane, but, unlike LDA, it selects the hyperplane with the maximum margin [15]. The RF constructs a large ensemble of largely uncorrelated decision trees at training time, and its output combines the classifications of the individual trees [16]. Each tree predicts a class, and the class with the most votes becomes the model's prediction. Past studies reported high accuracy using RF as a classifier [9, 16, 17].
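For reference, the standard textbook forms of the three decision criteria are given below for the two-class case. The multiclass extensions needed for six activities differ in detail. These are general formulations only, since the source does not state which SVM kernel, soft-margin settings, or RF parameters were used in Weka.

```latex
% Fisher's criterion for two classes with means \mu_1, \mu_2 and within-class scatter S_W
J(\mathbf{w}) = \frac{\left(\mathbf{w}^\top(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\right)^2}{\mathbf{w}^\top S_W \mathbf{w}},
\qquad
S_W = \sum_{c=1}^{2} \sum_{\mathbf{x}_i \in c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^\top,
\qquad
\mathbf{w}^\ast \propto S_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)

% Hard-margin linear SVM; the margin between the supporting hyperplanes is 2 / \lVert\mathbf{w}\rVert
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1, \quad y_i \in \{-1, +1\}

% Random forest with T trees h_1, \dots, h_T: majority vote over classes c
\hat{y}(\mathbf{x}) = \operatorname*{arg\,max}_{c} \sum_{t=1}^{T} \mathbb{1}\left[h_t(\mathbf{x}) = c\right]
```

The Fisher criterion makes the "distant means, small variances" condition explicit: the numerator grows with the separation of the projected means, and the denominator grows with the projected within-class spread.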

3 Results

This study evaluated the performance of acceleration and jerk to determine whether jerk recognizes human activities better than acceleration. The evaluation was conducted for three feature groups: acceleration, jerk, and the combination of acceleration and jerk. Each group was evaluated using KNN, LDA, SVM, and RF. Stratified ten-fold cross-validation was used. The dataset was divided into ten folds with the same class distribution as the original dataset, and each fold was used once as the test set for a classifier trained on the remaining nine folds (90% of the data). As Table 1 shows, RF outperformed the other classifiers in all groups, with the highest accuracy (88.30%) obtained for the combination of acceleration and jerk.
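The stratified fold assignment described above can be sketched as follows. Weka performs stratification internally, so this sketch only illustrates the idea. It assumes a hypothetical strategy: shuffle each class's indices and deal them round-robin into the folds. The function names and the seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample index to one of n_folds folds, preserving class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(indices)  # continue the round-robin so fold sizes stay balanced
    return folds

def cross_validation_splits(labels, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = stratified_folds(labels, n_folds, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With 100 samples, each split trains on 90 samples and tests on the 10 held-out samples. Averaging the ten test scores gives the cross-validated accuracy.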

Table 1 Performance evaluation of the classifiers

Because RF outperformed the other classifiers, we focus on RF to compare the performance of acceleration and jerk. Tables 2, 3, and 4 show the RF confusion matrices for acceleration, jerk, and the combination of acceleration and jerk, respectively. Standing, sitting, and lying were misclassified as one another. However, the acceleration and combined groups classified better than the jerk group.

Table 2 Confusion matrix of RF for acceleration
Table 3 Confusion matrix of RF for jerk
Table 4 Confusion matrix of RF for the combination of acceleration and jerk
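Overall accuracy and per-class recall and precision can be read from a confusion matrix such as those in Tables 2, 3, and 4. The sketch below assumes that rows are actual classes and columns are predicted classes. The 3 × 3 matrix is a hypothetical example and is not taken from the study's tables.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def recall_per_class(cm, labels):
    """Correct predictions divided by the number of actual samples (row sum)."""
    return {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}

def precision_per_class(cm, labels):
    """Correct predictions divided by the number of predictions (column sum)."""
    return {label: cm[j][j] / sum(row[j] for row in cm) for j, label in enumerate(labels)}

if __name__ == "__main__":
    labels = ["sitting", "standing", "lying"]
    # Hypothetical counts: sitting and standing confused with each other.
    cm = [[8, 2, 0],
          [3, 7, 0],
          [0, 0, 10]]
    print(accuracy(cm))
    print(recall_per_class(cm, labels))
    print(precision_per_class(cm, labels))
```

Off-diagonal counts concentrated between two classes, as in this toy matrix, are the pattern that indicates two activities being confused with each other.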

4 Discussion

We used ten-fold cross-validation because it has lower variance than a single hold-out estimator. For the dataset in this study, a single hold-out split with 90% of the data for training would leave only 10% for testing, which is a very small test set. The findings showed that RF yielded the highest accuracy. RF is also an effective method for ranking variable importance. Unlike other decision tree classifiers, RF considers only a random subset of features at each split, which increases the variation among the trees in the ensemble. The resulting low correlation across trees leads to higher classification accuracy [9, 17].
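The two sources of randomness in RF, bootstrap sampling and random feature subsets, combined with majority voting, are illustrated in the simplified sketch below. This is not a full random forest. Each "tree" is a one-level decision stump, so drawing the feature subset once per tree is the same as drawing it at the single split. Real RF trees are grown much deeper and draw a new subset at every split. The stump's misclassification-count split criterion is also a simplification, since RF implementations usually use impurity measures such as Gini. The subset size m = floor(sqrt(p)) is a common choice and is assumed here, because the study does not report its Weka RF settings. All names and toy data are hypothetical.

```python
import math
import random
from collections import Counter

def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the fewest errors, searching only `features`."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    m = max(1, int(math.sqrt(n_features)))  # size of the random feature subset (assumed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(X)) for _ in range(len(X))]  # draw with replacement
        features = rng.sample(range(n_features), m)
        trees.append(fit_stump([X[i] for i in sample], [y[i] for i in sample], features))
    return trees

def predict_forest(trees, row):
    """Majority vote over the individual stumps' predictions."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

Because each stump sees a different bootstrap sample and feature subset, the stumps make partly independent errors. Majority voting tends to cancel these errors out.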

The combination of acceleration and jerk yielded the highest accuracy. However, when evaluated as individual groups, acceleration resulted in higher accuracy than jerk. The rationale for jerk-based features is that jerk, as the derivative of acceleration, should be able to detect slight changes in closely similar activities [8]. This was not the case in the current study. One possible reason is that the past study [8] used jerk for activity recognition in animals, which received little or no instruction from the researchers. In contrast, the dataset in this study was collected from human subjects given detailed instructions for each activity. Therefore, unlike in the past study [8], jerk did not perform well in recognizing human activities. This might be because there were only small changes in force in the human signal data during standing, sitting, and lying, and jerk is felt when there is an obvious change in force [10].

For walking on a flat surface, walking downstairs, and walking upstairs, jerk also performed worst, whereas both the acceleration and combined groups were able to distinguish the walking activities. This might also be because the subjects in this study were young adults with good postural control [18], who could compensate for the increasing and decreasing forces that occur during walking downstairs and upstairs. Based on these results, jerk does not perform better than acceleration in human activity recognition when the change of force in the activities is minimal. This finding improves the understanding of how acceleration data and its derivative can be used in human activity recognition. Further studies covering a wider variety of activities, including activities with an obvious change of force, are needed to evaluate the performance of jerk and other derivatives of acceleration. Future studies should also evaluate jerk in other age groups, such as older adults, who might not have postural control as good as that of young adults.

5 Conclusion and Future Work

This paper has presented a performance evaluation of jerk compared with acceleration using four machine learning techniques: k-nearest neighbors (KNN), Fisher's linear discriminant analysis (LDA), support vector machine (SVM), and random forest (RF). Based on the evaluation results, RF outperformed the other techniques in terms of accuracy, and jerk did not perform better than acceleration in recognizing human activities. This could be because the evaluated activities involved only slight changes in force. The results of the current study improve the understanding of how acceleration data and its derivative can be used in human activity recognition. Further studies of activities with an obvious change of force are needed to confirm whether jerk is more sensitive than acceleration when the change of force is larger.