1 Introduction

Monitoring and recognition of human motion based on wearable inertial sensors is an increasingly important research field in pattern recognition and machine learning [6, 9, 31]. With the aid of wearable inertial devices such as accelerometers and gyroscopes, users can obtain rich information about their own motion. Compared to video sensor-based human motion recognition, wearable inertial sensors have a number of advantages in data acquisition [13], such as easy carrying, low cost, unrestricted collection range and strong privacy preservation. Accordingly, this technique can be applied to many fields such as health care and assisted living [35, 38], disease prevention [32], entertainment, and exercise promotion [18, 19].

At present, the study of human motion recognition using wearable inertial sensors is mainly focused on isolated motion recognition. Although a variety of human motions have been researched [22, 26, 41], these motions have been obtained by manual segmentation, which wastes considerable time and energy. More importantly, these motions are isolated and their correlations with each other are removed. Although good recognition results can be obtained on such data, this approach is hard to apply in real life because of the need for manual segmentation and the artificial nature of the data collected. In practical applications, real motion data arrives as a continuous sequence, and human motions are random and disordered, which can make even manual motion identification difficult. In particular, motion data is usually mixed with useless data, so it is important to retrieve only the motion data we need from the time-series data [11]. For these reasons, a set of effective segmentation and recognition strategies for human motion sequences needs to be constructed.

Currently, research on human motion mainly uses video sensors: the motion sequences can be divided frame by frame, and the image associated with each frame can then be studied using a common recognition algorithm [8, 10, 25, 28, 34]. In addition, some more prominent methods using video sensors have been proposed. Wang et al. [39] introduced a method of motion sequence recognition based on characteristic descriptors. The authors translated dynamic silhouette sequences into multivariate time series in a low-dimensional space using tensor subspace analysis (TSA); a support vector machine (SVM) classifier was then used to classify the human motions after extracting motion descriptors from the multivariate time series. Zhao et al. [43] studied gesture recognition from motion data streams based on a dynamic matching approach. The authors mainly solved two problems: one was how to detect motions in a continuous data stream, and the other was how to recognize a gesture performed in different styles. Ofli et al. [29] proposed a new representation of human motions, described as sequences of the most informative joints (SMIJ); they used only the temporal ordering of joints to analyze motion sequences and also constructed several features to develop their method. Li et al. [24] proposed a new framework to predict long-term complex activities by mining human activity sequence patterns; the relationships between activities were established using a probabilistic suffix tree (PST), and sequential pattern mining (SPM) was used to structure the interactive information model.

Compared to recognizing human motion sequences with video sensors, studying human motion using only inertial sensors is more difficult [17]. Studies based on video sensors can easily extract related motion features (such as joint angle, distance, direction and human silhouette), but this is difficult for a study using inertial sensors, where only statistical features of the signals can be extracted, such as mean, variance or median. Accurate trajectories may be obtained using video sensors, which helps in building the motion model, but it is very difficult to generate motion trajectories using only inertial sensors. At present, there has been less research on human motion sequences using inertial sensors. Much of the current work [4, 7, 12, 16] is focused on improving the performance of classification algorithms; although these papers also studied motion sequences, they did not consider whether the motion sequences contain invalid data. In [2, 17], the authors studied only single-motion sequences without considering multiple-motion sequences, and the sophisticated sensors used in their work are difficult to apply in real life. The literature on activity spotting [1, 30] systematically studied human motion sequences using wearable inertial sensors; the first step was to extract the useful motion data from the data stream, and the extracted data was then recognized using traditional classification algorithms. The main inadequacy of these papers was that the accuracy rates at the segmentation stage were relatively low.

In this paper a novel monitoring framework for human motion using wearable inertial sensors is proposed. The monitoring framework can handle human motion sequences automatically and retrieve the motion data to be recognized from the motion sequences. The framework does not require manual processing, saving manpower, material resources and time. The main contributions of our work are as follows:

  (1)

    We propose a systematic and automatic monitoring framework based on wearable inertial sensors. The framework mainly includes three stages: data acquisition, segmentation and recognition. Segmentation is mainly composed of pre-segmentation and fine segmentation. A hidden Markov model (HMM) is used to recognize the segmented human motion data at the recognition stage.

  (2)

    During pre-segmentation, singular value decomposition (SVD) is used to remove as much useless data as possible, reducing the time required for the whole segmentation process.

  (3)

    A novel similarity measure function called the multi-sensor similarity measure function (MSHsim) is proposed to achieve accurate segmentation during fine segmentation.

  (4)

    The monitoring framework can recognize multiple human motions, and because a sliding window is used at the segmentation stage, online recognition can be achieved with the proposed monitoring framework.

The rest of the paper is organized as follows: in Section 2, the monitoring framework is given and a detailed introduction is given of the segmentation and recognition stages. In Section 3 the experiment and evaluation are presented; data acquisition is introduced in detail, and performance measurement and evaluation of the proposed method are discussed. This is all followed by the conclusion in the last section.

2 Methodology

The main components of the monitoring framework of human motion are shown in Fig. 1. It includes three components: data acquisition, segmentation and recognition. Data acquisition will be introduced in Section 3.1. Segmentation consists of pre-segmentation and fine-segmentation. At the recognition stage an HMM is used. In addition, for a motion sequence, the motion data to be recognized is described as labeled data, and the rest is called junk data in this paper.

Fig. 1

Detailed structure of the human activity sequence recognition framework

2.1 Segmentation stage

2.1.1 Pre-segmentation

For segmentation of the human motion sequence, the first step is pre-segmentation as shown in Fig. 1, and its purpose is to remove the junk data. For this purpose, we use SVD combined with a sliding window. Because of this sliding window, our proposed framework has the ability of on-line processing as well.

For a real matrix A ∈ Rm × n, where m ≥ n, there must exist two orthogonal matrices U = [u1, u2, ⋯ , um] ∈ Rm × m and V = [v1, v2, ⋯ , vn] ∈ Rn × n such that,

$$ A=UDV^{T}, $$
(1)

here, D = diag(λ1, λ2, ⋯ , λn), with λ1 ≥ λ2 ≥ ⋯ ≥ λn, and λi (i ∈ {1, 2, ⋯ , n}) represents the eigenvalues of matrix A. In (1), V contains the right eigenvectors of the real matrix A. For two similar data matrices, the difference between them can be measured sensitively through the inner product of the right eigenvectors of the two matrices [23, 37]. Suppose v1 and v2 are the leading right eigenvectors of two similar data matrices, and α is the angle between them; then |v1 ⋅ v2| = |v1||v2||cos(α)|, and because v1 and v2 are unit vectors, |v1| = |v2| = 1, so |v1 ⋅ v2| = |cos(α)|. When α is close to 0, |v1 ⋅ v2| is close to 1. In particular, when α is equal to 0 (which means the two data matrices are the same), |v1 ⋅ v2| = 1. Most importantly, this similarity is mainly determined by the first few larger eigenvalues of the data matrices [23].
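The right-eigenvector similarity test described above can be sketched in a few lines of Python. This is a toy illustration with synthetic windows, not the authors' code; the window sizes, noise level and variable names are our own assumptions:

```python
import numpy as np

# Two hypothetical windows of 6-axis sensor data that are nearly identical,
# standing in for "two similar data matrices".
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 6))
window_a = base + 0.001 * rng.standard_normal((100, 6))
window_b = base + 0.001 * rng.standard_normal((100, 6))

# Rows of vt are the right eigenvectors (right singular vectors).
_, _, vt_a = np.linalg.svd(window_a, full_matrices=False)
_, _, vt_b = np.linalg.svd(window_b, full_matrices=False)

# |v1 . v2| = |cos(alpha)| approaches 1 as the windows become identical.
similarity = abs(np.dot(vt_a[0], vt_b[0]))
```

For the two nearly identical windows above, `similarity` is essentially 1; for unrelated windows it is noticeably lower.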

Suppose there are z sensor nodes, and the motion data from the k th sensor node is defined as the matrix Xk, k ∈ (1, 2, ⋯ , z). \(V_{k}=[{v_{1}^{k}},{v_{2}^{k}},\cdots ,{v_{n}^{k}}]\) is the right eigenvector set of matrix Xk after SVD. In addition, the candidate matrix is Y; suppose the eigenvalues of matrix Y are \((\widehat {\lambda }_{1},\widehat {\lambda }_{2},\cdots ,\widehat {\lambda }_{n})\), with \( \widehat {\lambda }_{1}\geq \widehat {\lambda }_{2}\geq {\cdots } \geq \widehat {\lambda }_{n}\), and \(\widehat {V}=[\widehat {v}_{1},\widehat {v}_{2},\cdots ,\widehat {v}_{n}]\) is the right eigenvector set of matrix Y after SVD. Then the segmentation function δk for the k th sensor node may be defined by the following equation,

$$ \delta_{k}=\frac{1}{\omega}\sum\limits_{i = 1}^{\omega}|{v_{i}^{k}}*\widehat{v}_{i}|, $$
(2)

and ω may be obtained by the following equation,

$$ J(\omega)=\mathop{min}\limits_{\omega} \,\,\left\{\left( \sum\limits_{i = 1}^{\omega}\widehat{\lambda}_{i}\right)/\left( \sum\limits_{i = 1}^{n}\widehat{\lambda}_{i}\right)\geq \sigma\right\}. $$
(3)

Here, σ is a threshold parameter. Suppose that the data matrix obtained from the k th sensor node after using (2) is \(\widetilde {X}_{k}\), and its set of serial numbers during the sampling time is defined as \(L_{pre}^{k}(\widetilde {X}_{k})\); then the final result Lpre over all sensor nodes can be defined as follows,

$$ \begin{array}{ll} L_{pre}&=L_{pre}^{1}(\widetilde{X}_{1})\cap L_{pre}^{2}(\widetilde{X}_{2}) \cap{\cdots} \cap L_{pre}^{z}(\widetilde{X}_{z})\\ &=\bigcap\limits_{i = 1}^{z}L_{pre}^{i}(\widetilde{X}_{i}). \end{array} $$
(4)
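As a sketch of (2) and (3), the per-node segmentation score with ω chosen from the candidate's spectrum might look as follows. This is our own illustrative code, not the authors' implementation, and we follow the paper in using the ordered values returned by the SVD of the candidate matrix as its eigenvalues:

```python
import numpy as np

def choose_omega(eigvals, sigma=0.85):
    """Smallest omega such that the leading eigenvalues of the candidate
    matrix capture at least a fraction sigma of their total, per Eq. (3)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratios = np.cumsum(lam) / np.sum(lam)
    return int(np.searchsorted(ratios, sigma) + 1)

def delta_k(window, candidate, sigma=0.85):
    """Segmentation score of Eq. (2) for one sensor node: the mean of
    |v_i . v_hat_i| over the first omega right eigenvectors."""
    _, s_cand, vt_cand = np.linalg.svd(candidate, full_matrices=False)
    _, _, vt_win = np.linalg.svd(window, full_matrices=False)
    omega = choose_omega(s_cand, sigma)
    return float(np.mean([abs(np.dot(vt_win[i], vt_cand[i]))
                          for i in range(omega)]))
```

Sliding this score along the sequence and intersecting the retained sample indices across the z sensor nodes, as in (4), then yields Lpre. For a window identical to the candidate, `delta_k` returns 1.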
Algorithm 1

The detailed procedure for pre-segmentation is given in Algorithm 1; T1 is an empirical parameter and is set to 0.85 in this paper. For a data matrix A ∈ Rm × n (m ≥ n), the time complexity of the SVD of A is O(mn2), whereas the SVD of the matrix ATA takes O(n3). By matrix theory, the right eigenvectors of A are the eigenvectors of ATA, so in this paper we apply the SVD to ATA instead of A in order to save computation cost. In general, the sensitivity of this SVD-based method is high according to matrix theory, and the segmentation function value (2) is high only when two motions are very similar. In practice, everyone has his or her own style of performing a motion, which makes it impossible for two different people to produce fully consistent motions. Even the same person cannot perform two perfectly consistent motions, due to environmental factors. In addition, if the accuracy of the inertial sensors used is not very high, it is even more difficult to find a correspondence between two very similar trajectories. Two examples are given in Fig. 2. The first example, comprising Fig. 2a and b, represents the same person performing the same motion twice. Figure 2a shows the sensor signals, including the acceleration signal and the angular velocity signal. From Fig. 2a it can be seen that there is a difference between the two sensor signals, although the subject and the motion are the same. Figure 2b shows the corresponding eigenvalues; the deviation of the second eigenvalue is large, one being 313.37 and the other 580.49. For the first example, the segmentation function value is only 0.94 according to (2). The second example, comprising Fig. 2c and d, is obtained from two different subjects performing the same motion. From Fig. 2c it can be seen that the two curves differ. Figure 2d shows the corresponding eigenvalues: the eigenvalues of the first subject are 1367.1, 646.15, 290.95, 12.19, 6.42 and 2.81, and those of the second subject are 1134.5, 746.20, 215.49, 8.69, 4.62 and 1.86. For the second example, the segmentation function value is just 0.89, which falls short of the 0.995 reported in [23]. Although fine segmentation of the human motion sequence is difficult to achieve with this method, it is chosen for pre-segmentation because of its small computational cost, the objective being to reduce the time required for the whole segmentation process as much as possible.
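The computational shortcut mentioned above, taking the eigenvectors of ATA instead of computing the full SVD of A, can be checked numerically; this is a toy verification with made-up data, not part of the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((500, 6))  # tall sensor-data matrix, m >> n

# Full SVD of A: rows of vt are the right eigenvectors.
_, _, vt = np.linalg.svd(A, full_matrices=False)

# Eigen-decomposition of the small n x n matrix A^T A, sorted by
# decreasing eigenvalue; its eigenvectors are the same vectors (up to sign).
w, V = np.linalg.eigh(A.T @ A)
V = V[:, np.argsort(w)[::-1]]

# Each pair of vectors agrees up to sign, so |dot| is 1.
agreement = [abs(np.dot(vt[i], V[:, i])) for i in range(6)]
```

Since ATA is only n × n, this reduces the decomposition cost from O(mn2) to O(n3) plus the one-off O(mn2) product.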

Fig. 2

Two examples for pre-segmentation. a The sensor signals of the running activity for the same subject. b The corresponding eigenvalues of the six-axis sensor signals in (a). c The sensor signals for two different subjects. d The corresponding eigenvalues of the six-axis sensor signals in (c)

2.1.2 Fine segmentation

After the pre-segmentation step, the new motion data can be obtained according to (2) and (3). During fine segmentation, our purpose is to extract the required motion data from the data remaining after pre-segmentation. We mainly use a feature similarity search between the sequence data in a specific window and the candidate data, comparing their similarity via feature vectors. It is important to note that a novel similarity measure function, MSHsim, is proposed for this task. This similarity measure function is established mainly on two bases: the similarity measure function (Hsim) proposed in [42] and the characteristics of the sensor data. Hsim is defined by the following equation,

$$ \text{H}_{sim}=\frac{1}{n}\left( \sum\limits_{i = 1}^{n}\frac{1}{(a_{i}-b_{i})^{p}}\right)^{\frac{1}{p}}, $$
(5)

here A = [a1, a2, ⋯ , an] ∈ Rn and B = [b1, b2, ⋯ , bn] ∈ Rn are two vectors. The function in [42] is not suitable for the present research because it does not consider the correlation between two vectors; as a result, only certain features play a major role, even though these features may not be very important.

Algorithm 2

Suppose there are z inertial sensor nodes, each of which measures an acceleration signal and an angular velocity signal. For the candidate data, the corresponding acceleration feature set and angular velocity feature set are \(Y_{acc}=[y_{1}^{acc},y_{2}^{acc},\cdots , y_{p}^{acc}]\) and \(Y_{av}=[y_{1}^{av},y_{2}^{av},\cdots , y_{p}^{av} ]\). The acceleration feature vector is \(y_{i}^{acc}=[y_{i1}^{acc},y_{i2}^{acc},\cdots ,y_{is}^{acc}]^{T}\in R^{s}\) and the angular velocity feature vector is \(y_{i}^{av}=[y_{i1}^{av},y_{i2}^{av},\cdots ,y_{iv}^{av}]^{T}\in R^{v}\), (i ∈ (1, 2, ⋯ , p)). s and v are the dimensions of the acceleration features and angular velocity features; each feature vector is composed of basic features such as mean, variance and energy, obtained based on the fusion of the z sensors' data. For the motion sequence data obtained by pre-segmentation, suppose that for a specific window whose size is greater than the length of all candidate data, the acceleration feature vector is \(x^{acc}=[x_{1}^{acc},x_{2}^{acc}\), \(\cdots ,x_{s}^{acc}]^{T}\in R^{s}\) and the angular velocity feature vector is \(x^{av}=[x_{1}^{av},x_{2}^{av},\cdots \), \(x_{v}^{av}]^{T}\in R^{v}\). Then the measure function Facc for the acceleration feature vector xacc and the measure function Fav for the angular velocity feature vector xav are as follows,

$$ \begin{array}{ll} F_{acc}&=\frac{1}{ps}\sum\limits_{i = 1}^{p}\left( \sum\limits_{j = 1}^{s}\frac{1}{1+\mathrm{(C_{i}^{acc})}^{-1}|x_{j}^{acc}-y_{ij}^{acc}|^{2}}\right)^{\frac{1}{2}}\\ F_{av}&=\frac{1}{pv}\sum\limits_{i = 1}^{p}\left( \sum\limits_{j = 1}^{v}\frac{1}{1+\mathrm{(C_{i}^{av})}^{-1}|x_{j}^{av}-y_{ij}^{av}|^{2}}\right)^{\frac{1}{2}}. \end{array} $$
(6)

Here, \(\mathrm {C_{i}^{acc}}=cov(x^{acc},y_{i}^{acc})\), \(\mathrm {C_{i}^{av}}=cov(x^{av},y_{i}^{av}),\,\,i\in (1,2,\cdots ,p)\), and cov(⋅) represents the covariance function. The multi-sensor similarity measure function MSHsim can then be defined as follows,

$$ \text{MSH}_{sim}=\frac{1}{2}(F_{acc}+F_{av}). $$
(7)

According to (6), the computation involves multiplications of n × n dimensional feature matrices for the z sensors, so the time complexity of MSHsim is O(z3n3). In addition, the similarity measure function MSHsim has the following characteristics: (1) MSHsim reflects the level of similarity between two pieces of data, obtained by comparing each dimension of the feature vectors; (2) the value range of MSHsim is [0, 1]: a higher value indicates that the two pieces of data are more similar, a value of 1 indicates that they are equivalent, and a value close to 0 indicates that their similarity is low; (3) MSHsim considers the correlation between two feature vectors through the covariance function; (4) MSHsim also considers the characteristics of multi-sensor data: the acceleration data and angular velocity data are computed separately, as they are on different scales. The detailed procedure of fine segmentation is presented in Algorithm 2, and in this paper T2 is chosen in the interval [0.96, 0.98].
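A literal transcription of (6) and (7) in Python might read as follows. This is our own sketch, not the authors' code; we assume that each \(C_i\) is the scalar covariance between the window feature vector and the i-th candidate feature vector, and all function names are ours:

```python
import numpy as np

def f_channel(x, Y):
    """Inner score of Eq. (6) for one signal type.
    x: window feature vector, shape (d,); Y: candidate feature vectors (p, d)."""
    p, d = Y.shape
    total = 0.0
    for y_i in Y:
        c = np.cov(x, y_i)[0, 1]  # scalar covariance C_i between the vectors
        total += np.sqrt(np.sum(1.0 / (1.0 + (x - y_i) ** 2 / c)))
    return total / (p * d)

def mshsim(x_acc, Y_acc, x_av, Y_av):
    """MSHsim of Eq. (7): the acceleration and angular-velocity channels are
    scored separately (they live on different scales) and then averaged."""
    return 0.5 * (f_channel(x_acc, Y_acc) + f_channel(x_av, Y_av))
```

Note that under this literal transcription, a window identical to a single candidate of dimension 4 scores 0.5 on each channel.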

2.2 Recognition stage

After segmenting the motion sequences, the motion data with labels needs to be recognized. The classification algorithm we chose is an HMM; HMMs have long been used in the field of human motion recognition based on wearable inertial sensors [40]. We establish an HMM for each human activity, and the classification result is obtained by finding the activity whose model gives the maximum posterior probability. The features used as observations in the HMM include mean, variance, kurtosis, correlation coefficients, energy and entropy [20, 21]; the resulting number of features is 144 (6 features × 4 nodes × 6 axes). The feature selection method we used is robust linear discriminant analysis (RLDA) [15]. RLDA is a modified form of linear discriminant analysis (LDA); it addresses the problem that estimation errors in the smaller eigenvalues of the within-class scatter matrix grow and distort the discriminant analysis, by re-estimating those smaller eigenvalues.
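The maximum-posterior decision over per-activity HMMs can be illustrated with a minimal discrete-observation forward algorithm. This is a toy sketch with two invented activity models over a binary symbol, not the authors' trained HMMs:

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space: log P(obs | HMM) for initial
    distribution pi, transition matrix A and emission matrix B."""
    pi, A, B = map(np.log, (np.asarray(pi), np.asarray(A), np.asarray(B)))
    alpha = pi + B[:, obs[0]]
    for o in obs[1:]:
        # sum over previous states i of alpha_i * A[i, j], in log space
        alpha = B[:, o] + np.logaddexp.reduce(alpha[:, None] + A, axis=0)
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Label of the activity HMM with the highest likelihood, i.e. the
    maximum posterior under equal class priors."""
    return max(models, key=lambda m: log_likelihood(obs, *models[m]))

# Two made-up activity models over a binary feature symbol.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "WA":  (pi, A, [[0.9, 0.1], [0.8, 0.2]]),   # mostly emits symbol 0
    "RUN": (pi, A, [[0.1, 0.9], [0.2, 0.8]]),   # mostly emits symbol 1
}
```

A sequence dominated by symbol 0 is assigned to "WA", and one dominated by symbol 1 to "RUN".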

3 Experiment and evaluation

3.1 Experimental platform and data acquisition

In this paper, data acquisition is based on a multi-sensor experimental platform, which mainly includes the following components: (1) four inertial sensor nodes, each consisting of a triaxial accelerometer (ADXL325) and a triaxial gyroscope (LPR550AL); the size and shape of a sensor node are shown in Fig. 3b; (2) one wireless receiving node, whose shape is shown in Fig. 3c; (3) a personal computer. The four sensor nodes collect the motion data, which is uploaded to the personal computer through the wireless receiving node; the data processing software runs on this computer.

Fig. 3

The equipment in our experiment. a One subject with inertial sensor nodes, positioned as follows: 1 right waist, 2 left arm, 3 right ankle, 4 left thigh. b The size and shape of one inertial sensor node. c The shape of the receiving node

Motion data was collected from nine subjects from our laboratory; training data was collected from five of them, and the remaining four subjects were held out as a test set. Four inertial sensor nodes were placed on each subject at the right waist, left arm, right ankle and left thigh, as shown in Fig. 3a. The sampling frequency is 50 Hz. All subjects performed the same ten basic sports motions: walking (WA), running (RUN), turning-left waist (TLW), turning-right waist (TRW), pressing-left leg (PLL), pressing-right leg (PRL), kicking-left leg (KLL), kicking-right leg (KRL), climbing stairs (CS) and going downstairs (GD). The training subjects were asked to perform one motion at a time, and the execution time for each motion was about 120 s. The test subjects (marked as Subject 1#, Subject 2#, Subject 3# and Subject 4#) were asked to carry out the corresponding motion sequences; the execution order is shown in Fig. 4. Figure 4a shows the actual execution frames of Subject 1#, obtained from video, and Fig. 4b shows the resultant acceleration curve from the right waist sensor node. The testing time is about 660 s for each test subject.

Fig. 4

The sequence of activities. a The actual execution frame of Subject 1#. b The resulting acceleration curve from right waist sensor node

3.2 Performance measurement

In order to evaluate the performance of the proposed method, the following metrics are used [5]: Precision, Recall, Accuracy, and F-score. Suppose that TP (true positive) represents the number of positive sampled points that are classified as positive, FP (false positive) represents the number of negative sampled points that are classified as positive. FN (false negative) represents the number of positive sampled points that are classified as negative, TN (true negative) represents the number of negative sampled points that are classified as negative. The above-mentioned metrics are defined as follows,

$$ \textit{Precision}=\frac{TP}{TP+FP}, $$
(8)
$$ \textit{Recall}=\frac{TP}{TP+FN}, $$
(9)
$$ \textit{Accuracy}=\frac{TP+TN}{TP+FP+FN+TN}, $$
(10)
$$ \textit{F-score}= 2*\frac{Precision*Recall}{Precision+Recall}. $$
(11)

This problem is a multi-class problem, so the averages of recall and precision are used to calculate the F-score.
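The four metrics, together with the macro-averaging used for the multi-class F-score, can be written down directly. This is an illustrative helper of ours, not the authors' evaluation code:

```python
def prf(tp, fp, fn, tn):
    """Precision, recall, accuracy and F-score per Eqs. (8)-(11)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f_score

def macro_f_score(per_class):
    """Multi-class F-score: average precision and recall over the classes
    (each given as a (tp, fp, fn, tn) tuple), then combine as in Eq. (11)."""
    ps = [tp / (tp + fp) for tp, fp, fn, tn in per_class]
    rs = [tp / (tp + fn) for tp, fp, fn, tn in per_class]
    p_bar, r_bar = sum(ps) / len(ps), sum(rs) / len(rs)
    return 2 * p_bar * r_bar / (p_bar + r_bar)
```

For example, counts of tp = 90, fp = 10, fn = 5, tn = 95 give a precision of 0.9 and an accuracy of 0.925.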

3.3 Performance evaluation for segmentation

In this section, we evaluate the running time of segmentation for multi-sensor data. Table 1 shows the comparison of running time between MSHsim and SVD+MSHsim for the four subjects. From the table it can be seen that the running time more than doubles if we use only the fine segmentation method MSHsim without SVD. This demonstrates the effectiveness of the SVD-based pre-segmentation method proposed in this paper.

Table 1 The comparison of running time (s) between MSHsim and SVD+MSHsim for four subjects

In order to evaluate the segmentation performance, the four subjects were required to perform the motion sequences mentioned in Section 3.1, and the results are shown in Fig. 5. Figure 5a shows the segmentation result of Subject 1#; the retrieved motions are represented as labeled data, and the rest is defined as junk data. The blue curve in the first box of Fig. 5a represents the real situation, obtained from the video, and the red curve of Fig. 5a is the result of data segmentation using the proposed method. The results for the remaining three subjects are obtained similarly. From the figure, it can be seen that the result of Subject 1# is consistent with expectations. For Subject 2#, however, the result is not good compared with the actual condition. Overall, the figure shows that most of the labeled data can be retrieved using the proposed method, although there are some errors for the four subjects.

Fig. 5

The results of data segmentation for four subjects by using the proposed segmentation method. The blue curves represent the real situations, and the red curves are the segmentation results

In order to show the segmentation effects more clearly, the confusion matrices of the four subjects, combined with precision and recall, are given in Table 2. For Subject 1#, the precision and recall of labeled data are 95.60% and 97.88% respectively, both high. The precision and recall of junk data are also high at 95.73% and 91.15% respectively, which shows that the proposed segmentation method is effective. For Subject 2#, the precision and recall of labeled data are 88.35% and 96.38%, and the precision and recall of junk data are 91.21% and 74.73%. The recall of junk data is not high: more than 20% of the junk data is wrongly assigned as labeled data. The same situation appears for Subject 3#, whose precision and recall of labeled data are 90.86% and 96.90% respectively, with junk-data precision and recall of 91.96% and 77.66%. For Subject 4#, the results are very good, like those of Subject 1#. In summary, from Table 2 it can be seen that the precisions and recalls for all of the subjects are good, which again shows that the segmentation method proposed in this paper is effective. Table 3 gives the detailed segmentation results for the four metrics using SVD+MSHsim.

Table 2 The confusion matrices of four subjects during segmentation. LD represents labeled data, JD represents junk data
Table 3 The detailed results (%) of Precision, Recall, Accuracy and F-score by using SVD+MSHsim at segmentation stage, \(\overline {P}\) represents the average of precision values and \(\overline {R}\) represents the average of recall values

To demonstrate the superiority of the proposed segmentation method based on SVD and MSHsim, we compare our method with other methods, including Hsim and the Euclidean distance [1, 3]. The Euclidean distance is a widely used similarity measure, defined as follows,

$$Euclidean=((x-y)(x-y)^{T})^{\frac{1}{2}}, $$

where x ∈ Rn and y ∈ Rn are two feature vectors. Figure 6 gives the segmentation accuracy rates of the different segmentation methods: Hsim, Euclidean distance and our method. From Fig. 6 it can be seen that MSHsim outperforms the other two algorithms for all four subjects. For Subject 1#, the accuracy rate of MSHsim is 95.58%, while the other two methods achieve 92.38% and 79.50%. For Subject 2#, the result of our method is just 89.13%, but it still exceeds the other methods, whose results are 86.74% and 77.42%. For Subject 3#, the results are 91.05%, 88.93% and 78.79% for MSHsim, Hsim and Euclidean distance respectively. The accuracy rates are 94.89%, 84.32% and 75.98% for Subject 4#. As the results show, both MSHsim and Hsim perform better than the Euclidean distance. That is because the first two methods calculate the level of similarity dimension by dimension over the feature vectors, whereas the similarity measured by the Euclidean distance is calculated holistically; the latter leads to greater error, especially for acceleration data and angular velocity data that have different orders of magnitude. In addition, our method MSHsim is slightly better than Hsim, because we fully consider the characteristics of the sensor data and use the covariance function to account for the correlation between feature vectors; moreover, because the acceleration data and angular velocity data have different dimensions, they are processed separately.

Fig. 6

The accuracy rate of segmentation between different segmentation methods including Hsim, Euclidean distance and MSHsim

3.4 Performance evaluation for recognition

After segmentation, the retrieved data is classified into the ten kinds of human motions introduced in Section 3.1. The recognition algorithm we chose is the HMM because of its ability to process sequences. Figure 7 gives the detailed classification results of the four subjects. The red curves represent the recognized motion types, and the blue curves represent the actual acquisition data. In the four pictures of Fig. 7, the two curves in each picture basically coincide. In detail, the red and blue curves in Fig. 7a are closer than in the other three pictures, except for TLW and TRW, which suggests that the classification result of Subject 1# is the best among the four subjects. The overlap of the two curves in Fig. 7d is not good, especially for KLL, which shows that the classification performance for Subject 4# is poor. The concrete results are given next to show the classification effects in more detail.

Fig. 7

The classification results of four subjects compared with the actual situations, where the red curve of each picture represents the classification result, and the blue curve represents the corresponding actual situation. JD represents junk data

Table 4 gives the corresponding confusion matrices for the four subjects. For Subject 1# (Table 4(a)), it can be seen that the recalls and precisions of TLW and TRW are relatively low compared with the other motions, not exceeding 90%. The reason is that these two motions are easily confused, which is consistent with Fig. 7a. The classification results of the other motions in Table 4(a) are relatively better. For Subject 2# (Table 4(b)), the precisions of four motions, PRL, KLL, KRL and GD, are much worse than those of the other motions, at 79.61%, 79.08%, 60.26% and 81.18%, respectively. The recalls of TRW and KRL are just 79.01% and 76.15%, respectively. This is principally because the segmentation error is larger for Subject 2#. The precision of GD is very poor for Subject 3#, because some data in CS and some junk data are erroneously classified as GD; the results for the other motions are good. For Subject 4#, most of the data in PLL are erroneously classified as PRL. As a result, the recall of PLL is very low at only 8.45%, and the precision of PRL is only 54.43%, which is consistent with Fig. 7d. In short, we can achieve good results for motion sequence classification using an HMM. Table 5 gives the detailed recognition results for the four metrics using an HMM.

Table 4 The confusion matrices of four subjects in recognition stage
Table 5 The detailed results (%) of Precision, Recall, Accuracy, and F-score by using an HMM for the recognition stage

Figure 8 shows the comparison of recognition results between the HMM and four other classification algorithms: K-nearest neighbor (KNN) [36], naive Bayes (NB) [33], softmax regression (SR) [14] and the linear discriminant analysis classifier (LDAC) [27]. From Fig. 8 it can be seen that the HMM achieves the best accuracy rates for the four subjects: 92.75%, 87.75%, 89.42% and 90.21%, respectively. The results of both NB and LDAC are relatively stable, with accuracy rates above 80% for all four subjects, but all are lower than those of the HMM. The KNN result is relatively low for each subject, and the performance of SR is also not satisfactory. This demonstrates that the classification algorithm chosen in this paper is effective.

Fig. 8

The comparison of recognition accuracy rates between the HMM and four other classification algorithms including KNN, NB, SR and LDAC

4 Conclusion

In order to realize the segmentation and recognition of human motion sequences using wearable inertial sensors, a novel recognition framework is proposed, mainly composed of a data acquisition stage, a segmentation stage and a recognition stage. Four inertial sensors are used to collect human motion data. The segmentation stage is divided into two steps: pre-segmentation and fine segmentation. At the pre-segmentation step, SVD is used to delete the junk data in the human motion sequences. We also proposed a novel similarity measure, MSHsim, to realize accurate segmentation of the motion sequence during fine segmentation. An HMM was used in the recognition stage. Motion sequences from four test subjects were used to validate the proposed methods. The experimental results demonstrate the effectiveness of the proposed framework.

In our future work, some problems still need further research. First, on-line recognition should be realized if we want to use this method in real life; the performance of the inertial sensors should be improved to achieve long-term monitoring of human motion, and we also plan to establish a monitoring system based on cloud computing. Second, the time consumption is still large because of multi-sensor fusion, and it needs to be reduced by simplifying the algorithms. Third, some more complex motion sequences need to be studied.