1 Introduction

Human activity recognition (HAR) is a rapidly expanding area of research in the field of pervasive computing. It refers to the process of automatically recognizing human activities and has gained importance in recent years owing to its wide range of health-related applications, including ambient assisted living, personal fitness monitoring, child and elderly support, chronic health care, rehabilitation and fall detection. Many approaches have been proposed in the literature for human activity recognition. These approaches can be broadly grouped into three categories: vision-based approaches, environmental sensor-based approaches and wearable sensor-based approaches.

Computer vision-based activity recognition [11, 21, 35, 46] involves recognizing activities from videos captured by cameras under well-controlled laboratory settings. However, these methods fail to produce reliable results in home settings due to issues [1, 36] such as clutter, occlusion, variable illumination and shadows. In addition, these methods require cameras to be fixed at predetermined locations, which limits the coverage area unless a large number of cameras is deployed. Privacy concerns are another major drawback that hinders the deployment of vision-based techniques for action recognition.

Environmental sensor-based approaches recognize human activities based on the interaction between subjects and environmental sensors such as RFID tags [39] and infrared-based motion sensors [26]. This approach is used to recognize daily activities like washing, eating, sleeping, lying and sitting. However, the main drawback of this scheme is that it is confined to indoor scenarios [30].

Unlike vision-based and environmental sensors, wearable sensor-based techniques provide robust action recognition in both indoor and outdoor environments. Owing to their flexibility and miniature size, these sensors can easily be worn by individuals on various parts of the body such as the arm, waist and legs. In addition, individuals can carry more than one sensor device at a time [14]. All these benefits have led to extensive research in the field of action recognition using wearable sensors. Among these, smartphone-based action recognition schemes are currently the most popular: smartphones render powerful context-based information, have wireless communication capabilities, and possess various built-in sensors including an accelerometer, gyroscope, magnetometer, orientation sensor, barometer, proximity sensor, camera, Bluetooth and Wi-Fi modules.

Recently, extensive research has been carried out on high-dimensional sparse signals. A signal is said to be sparse if it can be represented as a linear combination of relatively few base elements in a basis or an overcomplete dictionary [7]. In fact, most real-world signals have an embedded sparsity property. A detailed comparative study of various sparse representation based algorithms and their applications was presented in [50], where sparse representation was categorized into five groups based on the norms used for optimization: minimization using the l0-norm, lp-norm, l1-norm, l2,1-norm and l2-norm. In l0-norm minimization, the optimization problem is framed such that the sparsity, i.e. the number of non-zero coefficients in the sparse vector, is minimized. However, this problem is NP-hard and therefore difficult to solve exactly. This difficulty is overcome by l1-norm minimization, which is a convex problem that can be solved efficiently. The l1-norm refers to the sum of the absolute values of all the coefficients in the sparse coefficient vector. For lp-norm minimization, the value of p is varied between 0 and 1; in particular, the values investigated were p = 0.1, 1/2, 1/3 and 0.9. The l2-norm, also called the Euclidean norm, is calculated as the square root of the sum of squares of all the elements in the sparse coefficient vector. The lp-norm was found to be non-convex and non-smooth, the l1-norm convex, non-smooth and globally non-differentiable, and the l2-norm convex, smooth and globally differentiable. However, l2-norm minimization was found to be only "limitedly sparse" rather than strictly sparse; nevertheless, it was shown to possess good discriminability.

Although the primary aim of exploiting the sparsity of a signal is compression and reconstruction, its discriminative capability has been analysed and widely used in many machine learning applications, including but not limited to image fusion [29], object tracking [42], face recognition [10], human activity recognition [20] and human emotion recognition [51]. Inspired by its applicability across such diverse domains, in this paper we exploit the sparsity of human activity based inertial signals acquired from wearable sensors and propose a sparse representation based action recognition framework. The paper explores a novel methodology based on sparse theory to incorporate and fuse inertial data from sensors such as the accelerometer, gyroscope, magnetometer and orientation sensor to improve the accuracy and reliability of smartphone-based action recognition schemes.
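
As a small illustration of the norms discussed above (a minimal sketch of our own with a hypothetical coefficient vector, not taken from [50]), the following Python snippet computes the l0, l1, l2 and lp measures of a sparse coefficient vector:

```python
import numpy as np

# A toy sparse coefficient vector: only 3 of its 10 entries are non-zero.
alpha = np.array([0.0, 1.2, 0.0, 0.0, -0.7, 0.0, 0.0, 0.0, 2.5, 0.0])

l0 = np.count_nonzero(alpha)                  # l0 "norm": number of non-zero coefficients
l1 = np.sum(np.abs(alpha))                    # l1-norm: sum of absolute values
l2 = np.sqrt(np.sum(alpha ** 2))              # l2-norm (Euclidean): root of the squared sum
p = 0.5
lp = np.sum(np.abs(alpha) ** p) ** (1.0 / p)  # lp quasi-norm for 0 < p < 1

print(f"l0 = {l0}, l1 = {l1:.3f}, l2 = {l2:.3f}, l{p} = {lp:.3f}")
```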

The remainder of this paper is organized as follows. Section 2 surveys state-of-the-art works related to human action recognition. Section 3 provides a detailed description of the proposed framework covering data acquisition, feature extraction and the proposed classification scheme. Section 4 presents the quantitative evaluation of the proposed system. The conclusion of the paper is provided in Section 5.

2 Related works

With recent advances in pervasive computing, sensor network technology can facilitate various fields ranging from healthcare and ambient assisted living to security and surveillance. In particular, researchers have developed various action recognition systems using wearable sensors.

Bao et al. [6] proposed an action recognition framework that utilized five biaxial accelerometers. Features like mean, correlation, energy and frequency-domain entropy were extracted from the acquired accelerometer data. Several classifiers were evaluated, and it was found that decision tree classifiers produced the best performance. The authors of [18] proposed an activity recognition scheme using a tri-axial accelerometer embedded in smartphones. Using the concept of principal component analysis, the fundamental period of the accelerometer data was extracted from the phase trajectory matrix. The classification was done using k-nearest neighbor (k-NN) and neural networks. An accelerometer-based activity recognition system using an ensemble of classifiers was implemented in [9]. This approach combined three classifiers, namely the J48 decision tree, Multi-Layer Perceptrons and Logistic Regression, using a combination rule based on the average of probabilities. Liu et al. [24] presented an automated action recognition system using temporal patterns extracted from different actions. A joint pattern feature space was constructed using the extracted patterns and was used for classification. Varkey et al. [37] presented a window-based algorithm for recognizing activities and fine movements within each activity. Features like mean, standard deviation, peak-to-peak, root mean square, maximum and correlation between axes were extracted from accelerometer and gyroscope data. This system utilized a supervised learning approach based on the support vector machine (SVM) for classification. Zhang and Sawchuk [49] proposed a new framework for activity recognition based on sparse theory and compressive sensing using wearable sensors. Further, it was shown that feature extraction based on random projections achieved the best recognition performance. This system achieved a maximum accuracy of about 96.1% over nine types of activities: forward walking, left side walking, right side walking, upstairs, downstairs, jumping, running, standing and sitting. Fuentes et al. [13] introduced an online motion recognition system using accelerometers embedded in smartphones. The raw accelerometer data was converted into statistical features, which were classified using an SVM classifier. This system produced an overall accuracy of about 93% for recognizing activities like stopping, walking, standing-up and sitting-down. In [47], Yin et al. proposed a high-performance training-free approach for recognizing hand gestures using an accelerometer sensor. A robust template matching technique based on dynamic programming was presented and used in the gesture recognition process.

The authors of [45] proposed a distributed recognition scheme for classifying human actions using wearable sensors. In this scheme, each action class was modeled as a subspace in a mixture subspace model. Using the training data, the sparsest linear representation of the test data was computed, and it was demonstrated that the test class corresponds to the one that produces the dominant coefficients. It was also shown that the proposed system had good sensor energy saving capability. A novel ontology-based sensor selection scheme for wearable action recognition was presented in [38]. The wearable sensors comprised magnetic and inertial measurement units. For any given non-recoverable sensor, a new technique for automatically selecting a suitable replacement was also proposed. This framework was based on a set of heuristic rules used to find candidate sensors for replacement, and the appropriate sensor was finally selected based on iteratively posed queries. Positioning sensors at appropriate locations is important for accurate action recognition. Hence, an investigation of wearable sensor placement on various parts of the body was conducted in [5]. In addition, the most discriminating time-frequency features for effectively classifying different activities were also analyzed in that paper. It was observed that for high-level activities the knee, ear and arm were the optimal body locations for sensor placement, while for transitional activities the chest and knee locations were selected.

A number of score-level fusion schemes have been proposed in the literature for solving classification problems. These schemes attain high accuracy since they allow multiple scores to be integrated in an efficient manner. A weighted fusion scheme for combining various matching scores for face and palmprint recognition was proposed in [44]. This system utilized two different biometric traits, and the proposed score fusion scheme comprised four steps. In the first two steps, the matching scores between the training and testing samples of the two traits were calculated. In the third step, a cross-matching score between the training sample of the first trait and the testing sample of the second trait was computed. In the final step, the three matching scores were normalized and combined using weighted coefficients. An adaptive weighted fusion scheme for classifying images was proposed in [43]. In that work, the optimal weights for fusing various features were determined automatically without any manual settings. Initially, features were extracted from different classes of images. Distance-based scores were then computed between the test sample and all the training samples. These distances were then normalized and sorted. Finally, the scores were fused using a weighted fusion technique in which the weights were adaptively chosen based on the confidence of the scores.

The extensive prevalence of smartphones in today's world has greatly accelerated research in the field of action recognition using smartphones. Smartphones are embedded with various sensors, of which the accelerometer, gyroscope, magnetometer and orientation sensors offer rich and valuable location- and movement-based information, from which the actions performed by individuals can be readily analyzed and discerned. Further, the use of smartphones in action recognition causes little intrusiveness, as they remove the need for additional sensing components to acquire the sensor data. Owing to these benefits, smartphone-based action recognition systems have paved the way for a host of innovative applications including but not limited to fall detection [12], health monitoring [27], classification of construction worker activities [2], personal authentication [19], elderly safety [3], physical activity recognition [25], sporting activities classification [28], context-aware recommendations [31] and emotion recognition [52].

The motivation for investigating action recognition using the built-in sensors of smartphones is the diversity of applications mentioned above. To further improve the recognition accuracy achievable with such built-in sensors, we propose a novel action recognition scheme in this paper. Many studies in the literature have explored the fusion of various sensors like the accelerometer, gyroscope and magnetometer.

Wang et al. explored the use of smartphone-embedded inertial sensors in action recognition. They showed that fusing accelerometer and gyroscope data produces better recognition performance than using data from a single sensor. In addition, a novel feature selection approach was proposed for dimensionality reduction and to simultaneously increase the recognition rate [40]. Huynh et al. [17] proposed a threshold-based fall detection algorithm using a wireless wearable sensor system comprising a tri-axial accelerometer and a tri-axial gyroscope. It was shown that adding gyroscope information increased the overall sensitivity of the system, since it provides information related to angular velocity changes. Lee and Mase presented an activity and location recognition system consisting of a bi-axial accelerometer, a gyroscope and a digital compass. This system could determine the location of the user, detect transitions and classify activities like sitting, walking and standing [23]. Yun et al. proposed a foot motion filtering algorithm for estimating foot kinematics during normal walking [48]. This system was built using input from a tri-axial accelerometer, a tri-axial angular rate sensor and a tri-axial magnetometer. The proposed algorithm estimated foot kinematics parameters like foot orientation, acceleration, position, velocity and gait phase. In addition, an adaptive-gain complementary filter was used for accurate estimation of foot orientation. Ronao and Cho [33] proposed a human activity recognition framework using deep convolutional neural networks (convnets). This system used data from the accelerometer and gyroscope sensors embedded in smartphones, and it was shown that convnets can automatically and adaptively extract robust and relevant features from the sensor data. Altun et al. [4] presented a comparative study of various classification techniques for classifying human activities using wearable inertial and magnetic sensors. Five sensor units were worn on the chest, the legs and the arms, and feature extraction was performed using principal component analysis. It was inferred that the Bayesian decision-making classifier performed better than other classifiers like the decision tree (DT), k-NN, least-squares method, SVM and artificial neural networks. Gravina et al. [15] provided a systematic and comprehensive review of different levels of multi-sensor data fusion in body sensor networks, analyzing data-level, feature-level and decision-level fusion in detail.

The main challenge in using different inertial sensors for action recognition lies in formulating a suitable fusion rule that best incorporates the information from all sensors and ultimately improves the classification accuracy. Thus, the main goal of this paper is to present a novel sparse representation based action recognition scheme that fuses data from various built-in sensors of smartphones to classify human activities, and to compare its performance with various standard machine learning classification algorithms.

3 Methodology

The complete overview of the smartphone-based action recognition scheme is shown in Fig. 1. It comprises two stages, namely the training and testing stages. Data from the four inertial sensors of the mobile phone, namely the accelerometer, gyroscope, magnetometer and orientation sensor, were acquired. From the acquired data, time-domain and frequency-domain features were extracted. These two steps are performed in both the training and testing stages. Two types of dictionaries, i.e. concatenated and class-specific dictionaries, are generated from the features extracted during the training stage. During testing, the features from the test data, along with the two dictionaries generated during training, are used to classify the test activity using the proposed classification algorithm.

Fig. 1 Overview of action recognition scheme using smartphone

3.1 Data acquisition

The smartphone used for data acquisition was a Moto M running Android (version 6.0.1). This device has a wide range of sensors including a tri-axial accelerometer, a tri-axial gyroscope and a magnetometer; in addition, it provides orientation-based measurements. The accelerometer returns the acceleration force applied to the device, measured in meters per second squared (m/s²); the effect of gravity is included in this signal. The gyroscope measures angular velocity in radians per second (rad/s), i.e. the rate of rotation of the device around each axis. The magnetometer measures the magnetic field along three perpendicular axes in microtesla (μT). The orientation sensor gives three rotation angles of the device, namely roll, pitch and azimuth, with respect to each axis; the calculation of these rotation angles utilizes the accelerometer, magnetometer and gyroscope. Thus, four types of sensor data, each having three components, were acquired.

Data was sampled at a rate of 64 Hz. This sampling rate was chosen because it is well above the minimum rate of 20 Hz required for human action recognition. Fifteen healthy subjects (7 males and 8 females) with age 29 ± 4.5 years, height 5.5 ± 0.57 ft and weight 64 ± 5 kg (mean ± standard deviation) were involved in data collection. The subjects were asked to wear a belt-type mobile pouch so that the mobile phone rested at the right waist, as shown in Fig. 2.

Fig. 2 Data acquisition system

Each participant was asked to perform a total of 8 different daily activities: sit, stand, lie-down, walk, jog, jump, upstairs and downstairs. Each activity was performed three times, each for a duration of about 30 s. Thus, the total recording time over all activities and subjects was about 3 h. Prior to feature extraction, each recording was segmented using a non-overlapping window with a length of 2 s.
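
As an illustration of this segmentation step (a minimal sketch; the array layout and the function name `segment` are our assumptions, not the authors' code), a 2-s non-overlapping window at 64 Hz contains 128 samples:

```python
import numpy as np

FS = 64                   # sampling rate (Hz)
WIN_SEC = 2               # window length (s)
WIN_LEN = FS * WIN_SEC    # 128 samples per window

def segment(signal, win_len=WIN_LEN):
    """Split a (num_samples, 3) tri-axial recording into non-overlapping windows.

    Returns an array of shape (num_windows, win_len, 3); trailing samples that
    do not fill a complete window are discarded.
    """
    num_windows = signal.shape[0] // win_len
    return signal[:num_windows * win_len].reshape(num_windows, win_len, signal.shape[1])

# Example: 30 s of (synthetic) tri-axial accelerometer data.
acc = np.random.randn(30 * FS, 3)
windows = segment(acc)
print(windows.shape)      # (15, 128, 3)
```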

3.2 Feature extraction

The extracted features comprised time-domain and frequency-domain features. Time-domain features are widely used to generate distinctive features from sensor data that effectively represent human activities. In our work, 9 popular time-domain features were employed: statistical features such as the mean, standard deviation, first quartile, second quartile, third quartile and the pairwise correlation between the three axes [32], together with the root mean square (RMS), interquartile range (IQR) and zero crossing rate (ZCR), which were included to boost the classification performance [49]. The first, second and third quartiles refer to the 25th percentile, the median and the 75th percentile, respectively. Each feature was computed for every component of the sensor data (with the correlation computed for each pair of axes), so the total number of time-domain features extracted from a 2-s segmented window was 27.
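
The sketch below illustrates one plausible implementation of these time-domain features for a single tri-axial window (our own illustration; the exact formulas used by the original implementation are not specified):

```python
import numpy as np

def time_domain_features(window):
    """Compute 27 time-domain features from a (win_len, 3) tri-axial window."""
    feats = []
    for axis in range(3):
        x = window[:, axis]
        q1, q2, q3 = np.percentile(x, [25, 50, 75])
        zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)   # zero crossing rate
        feats += [x.mean(), x.std(), q1, q2, q3,
                  np.sqrt(np.mean(x ** 2)),              # RMS
                  q3 - q1,                               # IQR
                  zcr]
    # Pairwise correlation between the three axes (x-y, x-z, y-z).
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        feats.append(np.corrcoef(window[:, a], window[:, b])[0, 1])
    return np.array(feats)   # 8 per-axis features x 3 axes + 3 correlations = 27
```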

Frequency-domain features were extracted by performing a fast Fourier transform (FFT) on each window. The FFT allows a time-domain signal to be analyzed in the frequency domain, exploiting the fact that a signal can be decomposed into a sum of weighted sinusoidal functions. A total of eight features were extracted, namely the dominant frequency [49], spectral energy, entropy [6] and the magnitudes of the first five components of the FFT spectrum [32]. The dominant frequency is the frequency with the highest peak in the spectrum. Each of these features was derived for each component of the sensor data, so the total number of frequency-domain features extracted from a 2-s window was 24.
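
A corresponding sketch of the frequency-domain features is given below (again our own illustration; in particular, taking the "first five components" after the DC term is an assumption):

```python
import numpy as np

def frequency_domain_features(window, fs=64):
    """Compute 24 frequency-domain features from a (win_len, 3) tri-axial window."""
    feats = []
    for axis in range(3):
        x = window[:, axis]
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        psd = spectrum ** 2
        dominant = freqs[np.argmax(spectrum[1:]) + 1]    # peak frequency, DC excluded
        energy = psd.sum() / len(x)                      # spectral energy
        p = psd / psd.sum()                              # normalized power distribution
        entropy = -np.sum(p * np.log2(p + 1e-12))        # spectral entropy
        feats += [dominant, energy, entropy]
        feats += list(spectrum[1:6])                     # magnitudes of first five components
    return np.array(feats)   # 8 features x 3 axes = 24
```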

3.3 Proposed sparse representation based classification scheme

Sparse representation based classification has been used in various vision-based [16] and wearable sensor-based [41, 49] action recognition schemes. Three common approaches are based on shared, class-specific and concatenated dictionaries [16]. In our work, we combine the concepts of class-specific and concatenated dictionary based classification and propose a novel sensor fusion based classification scheme that achieves the best action recognition performance. Classification is initially performed using a majority voting scheme.

Sparse representation is a technique in which a signal is represented as a linear combination of a few atoms from an over-complete dictionary, yielding a compact form of the signal. Let \( f\in {R}^{n\times 1} \) be the input signal vector and \( \phi \in {R}^{n\times m} \) be the over-complete dictionary such that n < m. Then, according to sparse representation theory, the input signal can be represented as f = ϕα, where \( \alpha \in {R}^{m\times 1} \) is the sparse coefficient vector.
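
To make the recovery of α concrete, a minimal orthogonal matching pursuit (OMP) sketch is shown below; OMP is the solver used later in this section, but this simplified implementation is only illustrative, not the authors' code:

```python
import numpy as np

def omp(phi, f, max_atoms=10, eps=0.01):
    """Greedy OMP: find a sparse alpha such that f is approximately phi @ alpha."""
    n, m = phi.shape
    alpha = np.zeros(m)
    residual = f.copy()
    support = []
    for _ in range(max_atoms):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(phi.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit the coefficients on the selected support by least squares.
        coeffs, *_ = np.linalg.lstsq(phi[:, support], f, rcond=None)
        residual = f - phi[:, support] @ coeffs
        if np.linalg.norm(residual) < eps:
            break
    alpha[support] = coeffs
    return alpha

# Toy example: an over-complete dictionary (n < m) and a 2-sparse signal.
rng = np.random.default_rng(0)
phi = rng.standard_normal((20, 50))
phi /= np.linalg.norm(phi, axis=0)            # normalize dictionary atoms
alpha_true = np.zeros(50)
alpha_true[[7, 31]] = [1.5, -2.0]
f = phi @ alpha_true
alpha_hat = omp(phi, f, max_atoms=5)
print(np.nonzero(alpha_hat)[0])               # typically recovers atoms 7 and 31
```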

In this work, we have collected data from four different smartphone sensors, namely the accelerometer, gyroscope, magnetometer and orientation sensor. Let C denote the total number of activity classes to be classified; here C = 8. In the proposed scheme, features were extracted from the data of all four sensors for each activity class. Let i denote the class label and j the sensor label, where i ∈ {1, 2, ..., C} and j ∈ {1, 2, 3, 4}. The features extracted from the ith class of the jth sensor form a feature matrix \( {\varphi}_{ij}\in {R}^{n\times m} \), where n is the number of features extracted from a single sensor and m is the number of segmented time frames from a single class. These feature matrices were used to create two types of dictionaries, namely class-specific and concatenated dictionaries. Let \( {\varphi}_i^{cl}\in {R}^{4n\times m} \) denote the class-specific dictionary of class i, formed as \( {\varphi}_i^{cl}={\left[{\varphi}_{i1}^T\;|\;{\varphi}_{i2}^T\;|\;{\varphi}_{i3}^T\;|\;{\varphi}_{i4}^T\right]}^T \), i.e. every class-specific dictionary stacks the features of all four sensors of class i. Let \( {\varphi}_j^{co}\in {R}^{n\times 8m} \) denote the concatenated dictionary of sensor j, formed as \( {\varphi}_j^{co}=\left[{\varphi}_{1j}\;|\;{\varphi}_{2j}\;|\dots |\;{\varphi}_{Cj}\right] \), i.e. each concatenated dictionary includes features from all classes of a particular sensor.
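
Under this reading of the definitions, the two dictionary types can be assembled as follows (a sketch with randomly generated placeholder feature matrices; the dimensions n = 51 and m = 100 are assumptions for illustration):

```python
import numpy as np

C, S = 8, 4        # number of activity classes and sensors
n, m = 51, 100     # features per sensor (27 time + 24 frequency) and training windows per class

rng = np.random.default_rng(0)
# phi[i][j]: (n, m) feature matrix of class i, sensor j (placeholder random data here).
phi = [[rng.standard_normal((n, m)) for j in range(S)] for i in range(C)]

# Class-specific dictionary of class i: features of all four sensors stacked vertically -> (4n, m).
phi_cl = [np.vstack([phi[i][j] for j in range(S)]) for i in range(C)]

# Concatenated dictionary of sensor j: features of all C classes side by side -> (n, C*m).
phi_co = [np.hstack([phi[i][j] for i in range(C)]) for j in range(S)]

print(phi_cl[0].shape, phi_co[0].shape)   # (204, 100) (51, 800)
```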

The proposed algorithm proceeds as follows. Features are extracted from all four sensors. The features from each sensor are first deployed separately to estimate the activity using the sparse representation framework with a concatenated dictionary \( {\varphi}_j^{co} \), where j ∈ {1, 2, 3, 4}. In addition, two initializations are made: an Activity label l is initialized to zero, and an Activity flag \( {A}_f\in {R}^{1\times C} \) is initialized with all zeros. During classification, a sensor-specific test feature vector \( {f}_j^t \), j ∈ {1, 2, 3, 4}, is constructed from the features of that sensor obtained from the test data. For every sensor, the sparse coefficient vector α is obtained using the orthogonal matching pursuit (OMP) algorithm [8] with the concatenated dictionary \( {\varphi}_j^{co} \) and the sensor-specific test feature vector \( {f}_j^t \). The obtained sparse coefficient vector is split according to the number of action classes being considered; in our work it is split into 8 sub-vectors, and the l1-score \( {s}_i^j \) of each action i for sensor j is calculated as the l1-norm of the corresponding sub-vector. The test action class for that sensor is identified as the class with the maximum score, and the Activity flag is incremented at the corresponding location. For example, if the output is 2, then the second location of the Activity flag is incremented from its initial value of 0 to 1. The same process is repeated for all four sensors. If any of the 8 locations of the Activity flag has a value of 3 or 4, a majority of the sensors have produced the same action class as output. In this case, the Activity label l is updated with the index of the location of the Activity flag that contains the highest value. For instance, if the Activity flag ends up as Af = [0 0 0 0 0 3 0 1] after all four iterations, the 6th class has received a majority vote of 3 and the 8th class a vote of 1, meaning that three of the four sensors have indicated the 6th class; the Activity label l is therefore updated to 6. However, if none of the action classes receives a vote of 3 or 4, the sensors disagree; in this case the Activity label l is not updated and remains zero.
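
A sketch of this first classification level is given below (illustrative only; it uses scikit-learn's `orthogonal_mp` in place of the authors' OMP implementation, and the helper names and data layout are ours):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def l1_scores(phi_co_j, f_t_j, C, m, tol=0.01):
    """l1-score of each class for one sensor, using its concatenated dictionary.

    phi_co_j : (n, C*m) concatenated dictionary of sensor j
    f_t_j    : (n,) test feature vector of sensor j
    """
    alpha = orthogonal_mp(phi_co_j, f_t_j, tol=tol)         # sparse coefficient vector
    # Split the coefficients into C class-wise sub-vectors and take their l1-norms.
    return np.array([np.abs(alpha[i * m:(i + 1) * m]).sum() for i in range(C)])

def first_level_vote(phi_co, f_t, C, m):
    """Majority voting over sensors. Returns (label, scores), where label is the
    1-based activity class if at least 3 of the 4 sensors agree, else 0."""
    votes = np.zeros(C, dtype=int)                           # the Activity flag Af
    scores = []                                              # l1-score vectors, one per sensor
    for j in range(len(phi_co)):
        s_j = l1_scores(phi_co[j], f_t[j], C, m)
        scores.append(s_j)
        votes[int(np.argmax(s_j))] += 1
    if votes.max() >= 3:                                     # majority of the four sensors agree
        return int(np.argmax(votes)) + 1, scores
    return 0, scores                                         # label stays 0 -> go to second level
```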

When the Activity label l is still zero, a second level of classification is performed using both the minimum reconstruction error obtained with the class-specific dictionaries \( {\varphi}_i^{cl} \) and the l1-scores obtained from the concatenated dictionaries \( {\varphi}_j^{co} \). A combined test feature vector \( {f}_c^t\in {R}^{4n\times 1} \) is constructed from the features of all four sensors of the test data. Using \( {f}_c^t \) and the class-specific dictionaries \( {\varphi}_i^{cl} \), a sparse coefficient vector is obtained for every class using OMP, and for every class the reconstruction error ri is calculated from the computed sparse coefficient vector and its class-specific dictionary. The l1-scores \( {s}_i^j \) of sensor j are collected into an l1-score vector sj. An l1-score vector with highly varying values indicates strongly differing correlation with the different classes, which increases its capability of distinguishing the test activity class from the other classes, whereas a vector with similar values is least capable of distinguishing the actual test class. Hence, to let the sensors with greater distinguishing capability contribute more to the decision, a weighted score fusion scheme is used: the fusion weight for adaptively fusing the scores of sensor j is the standard deviation σj of its l1-score vector, so that the scores of different sensors are integrated with weights based on their standard deviations. The values of the l1-scores \( {s}_i^j \) and the reconstruction errors ri are then normalized to the range 0 to 1, with the least value mapped to 0 and the highest to 1. The Activity metric ami of each class is formulated so that it favours the maximum l1-score and the minimum reconstruction error, and is computed as \( a{m}_i={s}_i^1{\sigma}^1+{s}_i^2{\sigma}^2+{s}_i^3{\sigma}^3+{s}_i^4{\sigma}^4-{r}_i \), i.e. \( a{m}_i=\sum \limits_{j=1}^4{s}_i^j{\sigma}^j-{r}_i \). Finally, the activity class index i is estimated as the class that produces the maximum activity metric ami, and this index becomes the Activity label l.
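
A sketch of this second level is given below (again illustrative; whether the sensor weights σj are computed before or after score normalization is our assumption, as is the use of scikit-learn's `orthogonal_mp`):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def minmax(v):
    """Normalize a vector to [0, 1]; constant vectors map to zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def second_level(phi_cl, f_t_c, scores, tol=0.01):
    """Weighted score fusion used when the first level fails to reach a majority.

    phi_cl : list of C class-specific dictionaries, each (4n, m)
    f_t_c  : (4n,) combined test feature vector built from all four sensors
    scores : list of 4 l1-score vectors s^j (one per sensor) from the first level
    """
    C = len(phi_cl)
    # Reconstruction error of the test vector under each class-specific dictionary.
    r = np.empty(C)
    for i in range(C):
        alpha_i = orthogonal_mp(phi_cl[i], f_t_c, tol=tol)
        r[i] = np.linalg.norm(f_t_c - phi_cl[i] @ alpha_i)
    r = minmax(r)
    # Activity metric am_i = sum_j s_i^j * sigma^j - r_i, with sigma^j the standard
    # deviation of sensor j's l1-score vector (computed here on the raw scores).
    am = -r
    for s_j in scores:
        s_j = np.asarray(s_j)
        am += minmax(s_j) * np.std(s_j)
    return int(np.argmax(am)) + 1            # 1-based Activity label
```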


In the above algorithm, the value of the error bound ε was empirically set to 0.01.

4 Performance evaluation

The classification results for the C-class classification problem were organized in the form of a confusion matrix \( M\in {R}^{C\times C} \), in which each element Mij indicates the number of observations of class i classified as class j. From the confusion matrix, the true positives tp, true negatives tn, false positives fp and false negatives fn of the system are identified. These measurements are used to calculate standard performance metrics, namely the recall, precision, specificity, F-score and accuracy of the system [22].

Recall (λ): The fraction of correctly estimated positive cases to the total number of positive cases defines the recall of a classifier.

$$ \lambda =\frac{t_p}{t_p+{f}_n} $$
(1)

Precision (ρ): The fraction of correctly estimated positive cases to the total number of cases estimated as positive indicates the precision of a classifier.

$$ \rho =\frac{t_p}{t_p+{f}_p} $$
(2)

Specificity (δ): The fraction of correctly estimated negative cases to the total number of negative cases constitutes the specificity of a classifier.

$$ \delta =\frac{t_n}{t_n+{f}_p} $$
(3)

F-score (μ): The combination of precision and recall into a single metric by means of their harmonic mean gives the F-score of a classifier.

$$ \mu =2\times \frac{\rho \times \lambda }{\rho +\lambda } $$
(4)

Accuracy (α): The fraction of cases that were correctly estimated among all cases is the accuracy of a classifier.

$$ \alpha =\frac{t_n+{t}_p}{t_n+{t}_p+{f}_n+{f}_p} $$
(5)
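
For reference, these per-class metrics can be derived from a confusion matrix in a one-vs-rest fashion as sketched below (our own illustration):

```python
import numpy as np

def per_class_metrics(M):
    """Compute recall, precision, specificity, F-score and accuracy per class from a
    (C, C) confusion matrix M, where M[i, j] counts samples of true class i
    predicted as class j (one-vs-rest for each class)."""
    M = np.asarray(M, dtype=float)
    total = M.sum()
    tp = np.diag(M)
    fn = M.sum(axis=1) - tp          # missed samples of each class
    fp = M.sum(axis=0) - tp          # samples wrongly assigned to each class
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    return recall, precision, specificity, f_score, accuracy
```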

To show the importance of using data from four different sensors for improving the recognition rate, we compared the classification accuracy obtained using each sensor individually and using various combinations with feature-level fusion [15]. In feature-level fusion, the features extracted from the data of different sensors are appended to form a new feature vector, which is then used for classification. These comparisons are shown in Table 1 in terms of overall accuracy. From Table 1, it is evident that using data from multiple sensors helps achieve the maximum recognition rate. Among the three standard classifiers, SVM produces the best results. Considering SVM, we observe that using a single sensor, namely the accelerometer alone, produces an overall accuracy of about 76%. Adding the gyroscope raises the accuracy to about 86.2%, and adding the magnetometer to the accelerometer and gyroscope increases it further to about 90.9%. Finally, using the features from all four sensors, namely the accelerometer (a), gyroscope (g), magnetometer (m) and orientation sensor (o), an accuracy of about 94.8% is obtained. These results clearly demonstrate that increasing the number of sensors contributes to the performance of the system, mainly due to the complementary information provided by the different sensors.
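
A minimal sketch of this feature-level fusion baseline is shown below (illustrative only; the data layout, sensor keys and the use of an RBF-kernel SVM are our assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fuse_features(per_sensor_features, sensors):
    """Feature-level fusion: append the feature vectors of the selected sensors.

    per_sensor_features : dict mapping sensor name -> (num_windows, n) feature array
    sensors             : subset such as ("a",), ("a", "g"), ("a", "g", "m", "o")
    """
    return np.hstack([per_sensor_features[s] for s in sensors])

def evaluate(per_sensor_features, labels, sensors):
    X = fuse_features(per_sensor_features, sensors)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # A proper evaluation would use leave-one-subject-out cross-validation
    # (see the sketch below); fitting and scoring on the same data here only
    # illustrates the fusion plumbing.
    return clf.fit(X, labels).score(X, labels)
```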

Table 1 Comparison of classification accuracy using different sensors and their combinations

To quantitatively demonstrate the superiority of the proposed sparse representation based classification framework, it was compared with various standard classifiers; in particular, DT, k-NN and SVM were used for comparison [2]. For evaluation, the leave-one-subject-out cross-validation technique was used [32]. In this technique, the data from one subject is used for testing and the data from the remaining subjects is used for training; this procedure is repeated until every subject has been used for testing once, and the overall performance is the average over all repetitions.
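
This protocol corresponds to grouped cross-validation with one group per subject, as sketched below (our own illustration using scikit-learn; the SVM pipeline is an assumption):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def loso_accuracy(X, y, subject_ids):
    """Leave-one-subject-out cross-validation: each subject is held out once for
    testing while the remaining subjects are used for training; the overall
    performance is the mean accuracy over all folds."""
    logo = LeaveOneGroupOut()
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(clf, X, y, groups=subject_ids, cv=logo)
    return scores.mean()
```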

The confusion matrices obtained when classifying activities using data from all four sensors with the standard classifiers and with the proposed sparse representation based classifier described in Section 3.3 are shown in Tables 2, 3, 4 and 5.

Table 2 Confusion matrix for classification using DT
Table 3 Confusion matrix for classification using k-NN
Table 4 Confusion matrix for classification using SVM
Table 5 Confusion matrix for classification using the proposed system

From these matrices, we infer that the performance of the proposed classification scheme is higher than that of the standard classifiers. Unlike the standard classifiers, the proposed scheme achieves recall and precision values greater than 90% for all activities. When classifying activities of similar style, such as upstairs and downstairs, the performance of the standard classifiers is very low; the proposed system, however, achieves an average recall of 92.89% for these two activities, which is 24.97%, 4.29% and 2.81% greater than that achieved using DT, k-NN and SVM, respectively. Similarly, the proposed system achieves an average precision of 93.57% for these two activities, which is 32.40%, 7.68% and 4.68% greater than that achieved using DT, k-NN and SVM, respectively. The overall accuracy obtained using the proposed scheme is 97.13%, which is 15.89%, 4.28% and 2.33% greater than the accuracy achieved using DT, k-NN and SVM, respectively.

To further validate the proposed system, its performance is compared with the standard classifiers and with state-of-the-art sparse representation based algorithms in terms of recall, precision, specificity and F-score averaged over all eight activities, as well as overall accuracy. These values are presented in Table 6.

Table 6 Comparison of standard and state-of-the-art classifiers with the proposed system

The proposed system demonstrates excellent performance, producing values greater than 97% for all the performance measures. As seen from Table 6, SVM produces the best results among the three standard classifiers; however, the proposed system yields 2.33%, 2.31%, 0.33% and 2.32% higher values than SVM in terms of recall, precision, specificity and F-score, respectively. In [49], the features extracted from all the sensors are combined using feature-level fusion to form a single feature vector; the reconstruction error is then computed for each class and the action class that produces the minimum residual is output as the action label. In contrast, classification in our proposed system is based not only on the residual but also on the l1-score. Accordingly, Table 6 shows that the proposed system produces better results than the algorithm presented in [49]. To investigate further, classification was also performed using a concatenated dictionary formed from the features of all classes, based on the maximum l1-norm values of the sparse coefficient vector as in [34]. Again, the proposed system produced recall, precision, specificity and F-score values about 1.35%, 1.36%, 0.19% and 1.35% higher, respectively, than the l1-norm based algorithm of [34].

The bar graphs in Figs. 3 and 4 show the variation in the performance of the various classifiers in terms of specificity and F-score, respectively. Comparing the three standard classifiers, namely DT, k-NN and SVM, the superiority of SVM is noticeable. However, the proposed system outperforms SVM in terms of both specificity and F-score for almost all actions. In addition, the graphs show that the proposed system achieves a better recognition rate than the state-of-the-art classifiers for most of the actions.

Fig. 3 Graphical comparison of standard and state-of-the-art classifiers and the proposed system in terms of specificity

Fig. 4 Graphical comparison of standard and state-of-the-art classifiers and the proposed system in terms of F-score

Hence, the proposed method outperforms all the other methods. This is because, unlike the other methods, classification in the proposed algorithm is done in two levels. First, classification is done using a majority-vote criterion based on the l1-scores obtained from the different sensors. Second, if no action obtains a majority vote, classification is done using a weighted fusion scheme in which the l1-scores are weighted by their standard deviations. Furthermore, the final classification criterion is designed such that the weighted l1-scores are maximized while the reconstruction error is simultaneously minimized. These aspects of the proposed algorithm help produce better classification results.

5 Conclusions

In this paper, we presented a novel framework for accurately determining human activities using data from built-in smartphone sensors. The features extracted from the data acquired by these sensors were used for action recognition through a novel sparse representation based algorithm. This algorithm fuses data from the various sensors effectively and achieved the highest action recognition rate. Moreover, the proposed system was developed using data from mobile devices, which makes it easy to implement in practice since no additional equipment is required for data collection; this also facilitates real-time implementation. Furthermore, the performance of the proposed scheme was quantitatively analyzed using various performance metrics, namely recall, precision, specificity, F-score and accuracy. The proposed system outperformed standard classifiers as well as state-of-the-art sparse representation based algorithms in terms of all the performance metrics considered.