Introduction

Obesity rates are increasing worldwide, and obesity is becoming a major public health concern in developed as well as developing countries. Worldwide obesity has nearly doubled since 1980 [1]. The World Health Organization (WHO) estimates that more than 10 % of the world population is obese. Obesity is also related to a large number of chronic diseases. Recently, it was found that the number of years lived with obesity is directly proportional to the risk of mortality [2]. According to the WHO, about 2.8 million people die every year due to obesity-related diseases [1].

To reduce the problem of obesity, preventative efforts include a proper diet and increased daily physical activity. A lot of research has been done to optimize diet and exercise plans to reduce obesity in adults and children. It is reported in the literature that both diet and physical activity are important factors [3–5]. Many nutritionists and doctors monitor the physical activity of patients through self-filled questionnaires to assess the amount of physical activity [6]. Physical activity indices based on questionnaires have also been proposed to assess different levels of activeness [7].

Moderate intensity exercise of 15 min a day or 90 min a week is beneficial in reducing the mortality rate and increasing life expectancy [8]. Physical activity monitoring includes the intensity, duration, frequency and type of the activity, which together define the volume of physical activity [9]. It is difficult and cumbersome for a person to report daily physical activity using self-recorded reports. Hence, a lot of research has been done in the past two decades on using wearable sensors to monitor daily physical activity. Pedometers are common, cheap sensors worn on the waist belt which measure the vertical acceleration and count walking steps by sensing zero crossings of the acceleration that exceed a certain threshold [10, 11]. A pedometer records the number of steps taken, and a daily report of the walking activity can be generated to assess the amount of physical activity. However, pedometers cannot capture other types of physical activity such as swimming, bicycling or standing. Moreover, the correlation between step frequency and energy expenditure varies for walking, running and jumping.

The use of acceleration sensors in physical activity monitoring has gained popularity in the last decade, as more accurate and cheaper sensors have become available with the advancement of MEMS technology [12–15]. Acceleration based monitoring systems can be integrated to provide more comprehensive, intelligent in-home monitoring of physical activities [16, 17]. Mathie et al. [18] presented a binary decision tree framework for the classification of various human movements, including rest, walking and falling, using a single tri-axial acceleration sensor placed at the waist. Sekine et al. [19] used the discrete wavelet transform to classify different types of walking, including walking on a level surface, walking upstairs and walking downstairs. Many systems investigated the classification of various physical activities by placing more than one sensor on the human body [20–22]. However, these systems are not practical in daily life because of the multiple sensors on the body and their cables. Lee et al. [23] used a single tri-axial acceleration sensor placed on the waist to classify standing, sitting, lying, walking and running, and claimed an accuracy of 99 % on only five subjects using fuzzy c-means classification algorithms. Bonomi et al. [24] also used a single acceleration sensor, placed on the back, to classify lying, sitting, standing, dynamic standing, walking, running and cycling using a decision tree classifier, and obtained about 95 % classification accuracy on twenty subjects. Allen et al. [25] used a Gaussian mixture model to classify three postures and five movements on six elderly subjects and reported an average classification accuracy of 91 %. Karantonis et al. [26] reported a classification accuracy of 91 % for 12 activities of six subjects using features based on the magnitude, tilt angle and fast Fourier transform of the acceleration data. Jin et al. [27] used a fuzzy inference system to classify four activities: lying, sitting, walking and running. Activity monitoring systems using acceleration sensors can also be applied to identify gait parameters, to classify walking patterns [28] and to detect abnormal gait [29].

With the advancement of mobile phone technology and the emergence of smartphones containing many sensors, physical activity monitoring has been realized in many mobile applications using the acceleration sensor of the smartphone. Wu et al. [30] evaluated different classifiers on three activities (walking, jogging, using stairs) using the mean, standard deviation and fast Fourier transform as features, and obtained an average accuracy of 90 % with a KNN classifier. Anguita et al. [31] classified six activities by fixing the smartphone on the waist and recording the 3D acceleration sensor data. They used 17 features comprising time and frequency domain patterns to classify the activities with a support vector machine and obtained 89 % classification accuracy. Siirtola et al. [32] placed the smartphone in the front trouser pocket, collected data for five activities (walking, running, cycling, driving a car and sitting/standing) and compared two classifiers, namely KNN and QDA (quadratic discriminant analysis). The classification accuracy was found to be about 95 % for these activities for both classifiers. Mitchel et al. [33] interestingly placed the smartphone on the back of the subject to record the acceleration data for seven activities (stationary, walking, jogging, sprinting, hitting and dribbling the ball). Features were extracted by calculating energy distribution ratios from the discrete wavelet transform of the acceleration signals. Different classifiers were compared and an average F-measure of 87 % was obtained.

In this paper, six different types of activities (walking, jogging, standing, sitting, climbing upstairs and downstairs) are classified with high accuracy (more than 99 %) under 10-fold cross validation. A K nearest neighbor classifier is used on simple time domain features extracted from the acceleration data of the smartphone. Effective feature set reduction is achieved through correlation based feature selection. Significant instances are selected to minimize the time and space complexity of the KNN classifier. Finally, the results reported in this paper are compared with published results to illustrate the effectiveness of the proposed framework.

Materials and methods

A block diagram of physical activity recognition using the smartphone acceleration sensor is shown in Fig. 1. The 3D acceleration sensor data is recorded continuously and pre-processed to separate the body and gravity acceleration signals. Furthermore, a jerk filter is used to calculate the jerk signals from the acceleration data. In the next step, time domain features are extracted from the body and gravity acceleration signals. On the training feature dataset, subsets of features are selected to reduce the time and space complexity of the classification. The physical activity is then classified, and the type of physical activity is recorded to generate the physical activity reports.

Fig. 1 Physical activity recognition using smartphone acceleration sensor

Description of data

The dataset used in this study was released by the Wireless Sensor Data Mining (WISDM) Lab and is known as WISDM’s activity prediction dataset [34]. The authors of [34] obtained approval from the Fordham University Institutional Review Board to collect this data. In the WISDM dataset, thirty-six volunteer subjects performed six activities, namely walking, jogging, ascending stairs (upstairs hereafter), descending stairs (downstairs hereafter), sitting and standing, for a specific period of time.

Subjects carried Android-based smartphones with built-in accelerometers in their front pants leg pockets [34]. The acceleration data is recorded with a sampling frequency of 20 Hz. Figure 2 shows a sample of the acceleration signal in the x, y and z directions for all six types of physical activities. A value of 10 corresponds to one g, which is 9.81 m/s². Jogging produces periodic movements in the x direction with higher amplitudes than walking. Sitting and standing show very little variation in the acceleration signal, and the change in the direction of gravity from the y direction to the z direction is evident as the subject switches from the sitting to the standing posture. The patterns of walking upstairs and downstairs are somewhat similar to walking, with less periodicity. The acceleration signal in the x direction is shifted upward for walking upstairs and downward for walking downstairs.

Fig. 2 Acceleration signal for all activities. Columns: first is acceleration in the x direction, second in the y direction and third in the z direction. Rows: first is Walking, second is Jogging, third is Sitting, fourth is Standing, fifth is walking Upstairs and sixth is walking Downstairs

Acceleration data preprocessing

Before calculating any feature, the raw accelerometer data was preprocessed to reduce noise using a median filter of order n in each dimension separately. A window of \( w_t \) seconds (\( f_s \times w_t \) samples) is used to calculate the feature set for a particular activity. Here, \( f_s \) is the sampling frequency of the acceleration data.
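The preprocessing step can be illustrated with a minimal sketch. The filter order is not stated in the text, so `order=3` below is an assumption, as are the function names and the handling of the signal edges (a shortened window).

```python
import statistics


def median_filter(signal, order=3):
    """Sliding median of odd length `order`; edges use a shortened window.

    The order is an assumption here: the text only says "order n".
    """
    if order % 2 == 0:
        raise ValueError("order must be odd")
    half = order // 2
    n = len(signal)
    return [
        statistics.median(signal[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def samples_per_window(fs_hz, wt_seconds):
    """Number of samples in one window, f_s * w_t."""
    return int(round(fs_hz * wt_seconds))


# Example: an isolated spike in one axis is suppressed by the filter.
noisy_axis = [0.0, 0.1, 9.0, 0.2, 0.1]
print(median_filter(noisy_axis, order=3))
print(samples_per_window(20, 10))
```

The filter is applied to each of the three axes independently, as described above.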

Feature extraction

Every \( w_t \)-second window consists of acceleration in three dimensions, \( a_x \), \( a_y \) and \( a_z \). The acceleration captured by the accelerometer in each direction is the sum of the gravitational (‘g’) and body (‘b’) accelerations. Thus, a 3rd order Butterworth low pass filter with a cutoff frequency of 0.3 Hz is used to separate the acceleration signal into gravity (\( a_x^g \), \( a_y^g \) and \( a_z^g \)) and body acceleration (\( a_x^b \), \( a_y^b \) and \( a_z^b \)). An estimate of the rate of change of acceleration, known as jerk (‘j’), is calculated by the following steps [35]:

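Calculate gradients \( c_x \), \( c_y \) and \( c_z \) of \( a_x \), \( a_y \) and \( a_z \) separately.

Calculate angles \( \alpha_k \) between \( a_k^b(i) \) and \( a_k^b(i-1) \), where \( k \in \{x, y, z\} \).

Calculate jerk \( j : \mathbb{R} \times [0, 180] \to \mathbb{R} \) using the following equations: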

$$ j_k(i)=\left(1+\frac{\left|\alpha_k(i)\right|}{180}\right) c_k^{\prime}(i) $$
$$ c_k^{\prime}(i)=\begin{cases} \left|c_k(i)\right| & \text{if } \left|a_k^b(i)\right|\ge \left|a_k^b(i-1)\right| \\ -\left|c_k(i)\right| & \text{otherwise} \end{cases} \qquad k\in \left\{x, y, z\right\} $$
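As a minimal sketch, the two jerk equations above can be evaluated per axis as follows. The gradients and angles are taken as inputs here, because how they are computed is defined in [35] and not in this section; the function names are hypothetical.

```python
def signed_gradient(c_i, a_body_i, a_body_prev):
    """c'_k(i): +|c_k(i)| if |a^b_k| did not decrease, otherwise -|c_k(i)|."""
    if abs(a_body_i) >= abs(a_body_prev):
        return abs(c_i)
    return -abs(c_i)


def jerk_axis(c, alpha_deg, a_body):
    """j_k(i) = (1 + |alpha_k(i)| / 180) * c'_k(i), for i = 1 .. n-1.

    c         : gradients c_k(i) of one axis (assumed precomputed)
    alpha_deg : angles alpha_k(i) in degrees, within [0, 180] (assumed precomputed)
    a_body    : body acceleration a^b_k(i) of the same axis
    """
    jerk = []
    for i in range(1, len(c)):
        c_signed = signed_gradient(c[i], a_body[i], a_body[i - 1])
        jerk.append((1.0 + abs(alpha_deg[i]) / 180.0) * c_signed)
    return jerk


# Illustrative (made-up) values for one axis.
print(jerk_axis([0.0, 2.0, -3.0], [0.0, 90.0, 180.0], [1.0, 2.0, 1.0]))
```

The angle term scales the gradient by a factor between 1 and 2, while the sign records whether the body acceleration magnitude grew or shrank between consecutive samples.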

In the next step, total body acceleration \( a^b \), total gravity acceleration \( a^g \) and total jerk \( j \) are calculated as follows:

$$ {a}^b=\sqrt{{\left({a}_x^b\right)}^2+{\left({a}_y^b\right)}^2+{\left({a}_z^b\right)}^2} $$
$$ {a}^g=\sqrt{{\left({a}_x^g\right)}^2+{\left({a}_y^g\right)}^2+{\left({a}_z^g\right)}^2} $$
$$ j=\sqrt{j_x^2+{j}_y^2+{j}_z^2} $$
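The gravity and body separation described at the start of this section, together with the magnitude formulas above, can be sketched as follows. Note the assumption: a first-order IIR low-pass filter stands in for the 3rd order Butterworth filter used in the paper, so that the example needs only the standard library; the 0.3 Hz cutoff and 20 Hz sampling rate are taken from the text, and all function names are hypothetical.

```python
import math


def low_pass(signal, fs_hz=20.0, cutoff_hz=0.3):
    """First-order IIR low-pass filter (a stand-in, NOT a Butterworth filter)."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


def split_gravity_body(signal, fs_hz=20.0, cutoff_hz=0.3):
    """Gravity = low-passed signal; body = raw signal minus gravity."""
    gravity = low_pass(signal, fs_hz, cutoff_hz)
    body = [a - g for a, g in zip(signal, gravity)]
    return gravity, body


def magnitude(x, y, z):
    """Per-sample Euclidean norm, as in the equations for a^b, a^g and j."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


# A constant signal is pure "gravity": the body component stays at zero.
gravity, body = split_gravity_body([9.81] * 50)
print(gravity[-1], body[-1])
print(magnitude([3.0], [4.0], [12.0]))
```

With a filtering library available, the stand-in would be replaced by an actual 3rd order Butterworth design; the subtraction and magnitude steps stay the same.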

Time domain features are less computationally expensive. Looking at Fig. 2, the features should include statistical descriptors, as they will be useful in separating postural activities like sitting and standing from the rest. Moreover, walking and jogging will produce differences in the statistical measures. Periodic activities like walking, jogging, walking upstairs and walking downstairs should have correlated acceleration patterns in different directions. Hence, the correlation among the three directions and auto-regression analysis may produce discriminatory features. Therefore, the following time domain features are extracted from the body acceleration, jerk, total body acceleration and total jerk signals:

• Mean
• Maximum value
• Mean square value
• Standard deviation
• Minimum value
• Interquartile range
• Median absolute deviation
• Signal magnitude area
• Signal entropy
• Auto-regression coefficients (Burg method, order four)
• Correlation coefficients among the x, y and z directions

Thus, for every window of \( w_t \) seconds, 105 features are calculated.
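A minimal sketch of a subset of the listed features, computed for one window, is given below. The exact definitions used in the paper are not stated, so the population standard deviation, the default quartile method of `statistics.quantiles` and the per-sample normalization of the signal magnitude area are assumptions; signal entropy and the Burg auto-regression coefficients are omitted from this sketch.

```python
import math
import statistics


def axis_features(x):
    """Statistical descriptors of one signal within one window."""
    n = len(x)
    med = statistics.median(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile method is an assumption
    return {
        "mean": sum(x) / n,
        "max": max(x),
        "min": min(x),
        "mean_square": sum(v * v for v in x) / n,
        "std": statistics.pstdev(x),
        "iqr": q3 - q1,
        "mad": statistics.median([abs(v - med) for v in x]),
    }


def signal_magnitude_area(x, y, z):
    """Mean over the window of |x| + |y| + |z| (normalization is an assumption)."""
    return sum(abs(a) + abs(b) + abs(c) for a, b, c in zip(x, y, z)) / len(x)


def correlation(x, y):
    """Pearson correlation coefficient; 0.0 if either signal is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) window of one axis.
print(axis_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```

In the full pipeline these functions would be applied to each body acceleration and jerk axis as well as to the total signals, and the results concatenated into one feature vector per window.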

Feature subset selection

It is important to analyze the feature space and select those features which contribute most to the correct classification of the physical activities. Feature subset selection helps to improve the performance of the model and to reduce the processing cost. In this paper, we have used the correlation based feature selection (CFS) method [36, 37] to select the feature subset. This method simultaneously considers the predictive ability of each feature in the subset and its redundancy with the other features. Hence, in the feature subset, high correlation of the features with the classes and low inter-correlation among the features is desirable. The linear correlation coefficient is used to measure the correlation among the features of a subset. Different search methods can be used to find the feature subset in the CFS technique. In this paper, we have used three methods, namely scatter search, reduced scatter search and subset linear forward selection. The scatter search method [38] uses a diversification generation method to generate diverse subsets and passes them through an improvement method, which is usually a local search, in the initial phase. A reference set is built from the initial sets, and subsets are generated from the reference set. The main loop of the scatter search consists of subset generation, solution combination, the improvement method and the reference set update method. This loop is terminated by a stopping condition based on a threshold value [39]. Lopez et al. [38] developed three scatter search based algorithms: sequential scatter search with greedy combination (SS-GC), sequential scatter search with reduced greedy combination (SS-RGC) and parallel scatter search.
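The trade-off between relevance and redundancy described above is commonly expressed through the CFS merit heuristic, \( \text{Merit}_S = k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}} \), where \( k \) is the subset size, \( \bar{r}_{cf} \) the mean feature-class correlation and \( \bar{r}_{ff} \) the mean feature-feature correlation. The sketch below only evaluates this score; it is an illustration and not the WEKA implementation used in the paper, and the function name is hypothetical.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a k-feature subset.

    High feature-class correlation raises the merit; high inter-correlation
    among the features (redundancy) lowers it.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Four equally relevant features: uncorrelated vs. fully redundant.
print(cfs_merit(4, 0.5, 0.0))
print(cfs_merit(4, 0.5, 1.0))
```

A search method such as scatter search then explores candidate subsets and keeps those with the highest merit.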

Wrapper methods are a very popular family of methods that find the feature subset by assigning a score to each candidate subset using a classifier. In these methods, subset evaluations are costly. Hall et al. [40] proposed the linear forward selection approach to reduce the computational complexity of the wrapper method by reducing the number of subset evaluations. A variant of linear forward selection is proposed in [41] to produce smaller subsets quickly.

Classification of physical activity

In this paper, we have compared the performance of three classifiers. The K nearest neighbor (KNN) classifier [42] is a widely used model-free classifier in which the class of a data point is decided based on the class labels of its neighboring instances. For a set of instances DB, a query point q and a parameter K, KNN returns a set of K nearest neighbors \( NN_q \) such that

$$ \forall i, j\ i\in N{N}_q, j\in DB- N{N}_q: d\left( q, i\right)< d\left( q, j\right) $$

Here \( d(q, i) \) is any distance metric. The class of the query point q is decided by the majority class of \( NN_q \). Random forest is an ensemble classifier which predicts the classes without overfitting the training data [43]. In this type of classifier, many trees are constructed on different, randomly selected feature subsets. The class is predicted by aggregating over the ensemble. Random forest has been used successfully in many classification applications [44, 45]. A detailed explanation of random forest can be found in [43, 44, 46]. Rotation forest is another type of ensemble classifier in which M decision trees are trained independently on different subsets of features [47, 48]. For the rotation forest classifier, the user has to define the number of features in a subset, the number of classifiers in the ensemble, the extraction method and the base classifier.
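A minimal brute-force sketch of the KNN rule defined above is given below. The Euclidean distance is an assumption (the text allows any metric), as are the function names and the toy data; the experiments in the paper use the WEKA implementation.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k=3, dist=euclidean):
    """Return the majority class among the k training instances nearest to query."""
    ranked = sorted(range(len(train_x)), key=lambda i: dist(query, train_x[i]))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy (made-up) feature vectors for two activities.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["sitting", "sitting", "sitting", "jogging", "jogging", "jogging"]
print(knn_predict(train_x, train_y, [0.2, 0.1], k=3))
```

Every query is compared against all stored instances, which is why the size of the training set directly drives the time and space cost of KNN.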

Results and discussion

As discussed in Section 2.3, the acceleration data in the x, y and z directions are divided into instances by a sliding window \( w_t \) of 10 s. The sampling frequency of the acceleration data is 20 Hz. An overlap of 2.5 s is used when sliding the window. The features described in Section 2.3 are calculated on each 10 s window, and a feature set (FS1) of 21331 instances is obtained, where each instance contains 105 features. Table 1 shows the number of instances of each activity in the feature set.
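The segmentation can be sketched as follows. The 10 s window and the 20 Hz sampling rate are taken from the text; whether the 2.5 s figure is the overlap or the shift between consecutive windows is left open here, so the sketch exposes the shift as a parameter `step_s` and its default of 2.5 s is an assumption. The function name is hypothetical.

```python
def sliding_windows(samples, fs_hz=20, window_s=10.0, step_s=2.5):
    """Split a sample sequence into fixed-length windows.

    step_s is the shift between the starts of consecutive windows; the
    overlap between neighbouring windows is then window_s - step_s.
    """
    size = int(round(fs_hz * window_s))
    step = int(round(fs_hz * step_s))
    last_start = len(samples) - size
    return [samples[s:s + size] for s in range(0, last_start + 1, step)]


# 20 s of data at 20 Hz: 400 samples, 200 samples per window.
windows = sliding_windows(list(range(400)))
print(len(windows), len(windows[0]), windows[1][0])
```

Each returned window yields one instance of the feature set once the features are computed on it.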

Table 1 Number of instances per activity

All feature selection and classification results are obtained with the WEKA software [49]. The K nearest neighbor (KNN) classifier [42] is used for the classification of the feature set FS1, with K set to 3. The 10-fold cross validation results are given in Table 2. The overall classification accuracy is found to be 99.4 %. TP rate and FP rate are the true positive and false positive rates, respectively.

Table 2 Classification result by KNN on FS1

Precision is defined as the proportion of instances that truly belong to a class among all instances that the classifier assigned to that class. Recall is defined as the proportion of instances correctly assigned to a class among all instances that belong to that class. F-measure combines precision and recall and is defined as,

$$ F\text{-measure}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} $$
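As a minimal sketch, the three measures can be computed for one class from a confusion matrix as follows. The row and column convention (rows are true classes, columns are predicted classes) and the numbers in the example are assumptions made for illustration; they are not values from the tables of this paper.

```python
def per_class_metrics(confusion, cls):
    """Precision, recall and F-measure of class index `cls`.

    confusion[i][j] is the number of class-i instances predicted as class j.
    """
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(row[cls] for row in confusion) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up two-class confusion matrix.
example = [[8, 2],
           [1, 9]]
print(per_class_metrics(example, 0))
```

Averaging the per-class values (weighted by class size in WEKA's summary) gives the overall figures reported in the tables.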

The F-measure is 0.993 (99.3 %), which shows very good performance of the feature set for all the activities. The confusion matrix of the KNN classifier on FS1 is given in Table 3. Some instances of walking and jogging are confused with upstairs and downstairs. Similarly, some instances of upstairs and downstairs are confused with walking and jogging.

Table 3 Confusion matrix of KNN classifier on FS1

This is natural, as walking on stairs is somewhat similar to walking on a flat surface. Moreover, walking and jogging can have different patterns depending on the subject’s body height, body weight and walking style.

It is assumed that some attributes in the feature set FS1 may be redundant. Hence, correlation based feature selection (CFS) is used to remove the redundant and irrelevant features and to reduce the feature set. Three search methods, scatter search (SS), reduced scatter search (RSS) and subset size forward selection (SSFS), are used to search for the best feature subset. Of these three methods, reduced scatter search generated the smallest feature subset, comprising 30 features.

Classification results of the KNN classifier (K = 3) on the feature subsets from SS, RSS and SSFS are summarized in Table 4. All results are based on 10-fold cross validation. Across the three search methods, the average values of precision, recall and F-measure are almost equal (about 98 %). Hence, the feature subset FS2 produced by reduced scatter search is preferable to the other two, as it contains fewer features. The confusion matrix of the classification results using KNN on FS2 with 10-fold cross validation is given in Table 5. Some instances of walking are confused with walking upstairs and downstairs. Similarly, many instances of walking upstairs and downstairs are confused with walking. A few instances of walking upstairs and downstairs are confused with each other. Since there is a lot of similarity in the acceleration data of these three activities, it is natural that they are confused with each other. This observation is also evident in other published results [34, 50]. Jogging, sitting and standing are classified accurately, with little confusion between jogging and walking.

Table 4 Reduction of feature subset by CFS on FS1
Table 5 Confusion matrix of KNN classifier on FS2

The performance of the three classifiers is compared in Table 6 using the feature subset FS2 with 30 features. The value of K in the KNN classifier is set to 3. In the rotation forest classifier, the base classifier is the J48 classifier [51] and the extraction method is principal component analysis (PCA). In the random forest classifier, 10 trees are constructed from 5 random features. All results are based on 10-fold cross validation. It can be observed from the table that KNN outperforms the two ensemble classifiers. Random forest takes less time to build the model than rotation forest. Both rotation forest and random forest are better than KNN in the time and space complexity of finding the class of a query data point. On the FS2 dataset with 10-fold cross validation, KNN takes 20 times longer than random forest and rotation forest when tested on the same PC. The overall classification accuracy of KNN under 10-fold cross validation is better than that of rotation forest and random forest (Table 6). For the classification of the upstairs and downstairs activities, KNN outperforms rotation forest and random forest by about 10 %. The F-measure of KNN is 0.965 and 0.935 for upstairs and downstairs, compared with rotation forest (0.89 and 0.815) and random forest (0.86 and 0.748). Therefore, for the overall classification of all six activities, KNN is the better choice. Many variants and algorithms have been proposed in the literature to improve the time complexity of the KNN classifier [52, 53].

Table 6 Classification results of three classifiers on FS2

In KNN based classification, the whole training dataset is used as representative instances when classifying a query. Hence, it is important to remove the redundant or less significant instances from the training dataset to reduce its size. There are many algorithms for selecting the instances that are significant for classification [54, 55]. In this paper, we have used the Decremental Reduction Optimization Procedure 2 (DROP2) proposed by Wilson and Martinez [54]. For a set of instances S, the algorithm starts by copying all the instances of S into T and then removes an instance P from T if at least as many of its associates in the original set T, including the instances already removed from T, are classified correctly without P. This procedure is carried out for all the instances in T. The dataset FS2 is randomly divided into training and testing datasets. The training set includes 80 % of the instances of FS2 (17064 instances) and the testing set contains 20 % of the instances of FS2 (4267 instances). The DROP2 pruning algorithm is applied to the training dataset and the retained instances are stored as the FS5 dataset. The training and testing datasets are then classified with a KNN classifier (K = 1) using FS5 (the pruned dataset). Classification results are summarized in Table 7. When applied to the training dataset, the DROP2 pruning method retained only 11.6 % of the instances in the training set. The selection percentage of each class is listed in the second column of Table 7. For the classes Walking, Upstairs and Downstairs, a large number of instances is retained. As discussed previously, the upstairs and downstairs classes are the most difficult to classify. Hence, the DROP2 pruning considered most of their instances significant for classification.

Table 7 Classification results of KNN (DROP2 based reduction on FS5)

The classification accuracies for the training and testing datasets are impressive, as very little degradation in accuracy is observed when comparing the full dataset FS2 with the pruned dataset FS5 (the F-measures are about 0.971 for training and 0.953 for testing).

The confusion matrix of the KNN classifier applied to the training dataset using the pruned dataset FS5 is reported in Table 8. Most of the instances of all six classes are classified correctly, with small confusion among Upstairs, Downstairs and Walking. A similar trend is observed in the confusion matrix (see Table 9) of the KNN classifier on the testing dataset using the pruned dataset FS5.

Table 8 Confusion matrix of KNN classifier on the training set using FS5
Table 9 Confusion matrix of KNN classifier on the testing set using FS5

The advantage of the DROP2 pruning algorithm is evident from the overall selection percentages. Only about 11 % of the instances of the training dataset, those that are significant for the classification of all six activities, are retained. From the reduction percentages, it is observed that more instances are retained for the Upstairs and Downstairs classes. A possible reason is that both classes are difficult to classify and are more sparsely distributed in the feature space. Moreover, the number of instances of these two classes in the original dataset FS2 is small compared to the Walking and Jogging classes. The Sitting and Standing activities are easier to classify and their selection percentages are low.

In Table 10, our results are compared with published results obtained under similar experimental setups. Maurer et al. [21] reported an accuracy of about 80 % with a sensor placed in the trouser pocket for six types of activities (standing, sitting, running, upstairs, downstairs, walking). They reported very low classification accuracies for upstairs and downstairs. Sun et al. [56] recorded the acceleration data by putting the mobile phone in different pockets with different orientations for seven activities (stationary, walking, running, bicycling, upstairs, downstairs and driving).

Table 10 Comparison of performance with the reported results

They applied an SVM classifier to 66 features and achieved an F-measure of 93 %. In their results, the F-measures of all activities are above 90 %. The variation in the activity data also depends on the number of subjects, and they used only six subjects in their experiment. Karantonis et al. [26] applied a decision tree classification approach to classify walking (at three speeds), transitional posture movements (sit-to-stand, stand-to-sit, lying, lying-to-sit and sit-to-lying) and falls by placing the acceleration sensor on the waist belt.

They achieved an overall classification accuracy of 90.8 % on a relatively small number of subjects (only six). Allen et al. [25] also placed the acceleration sensor on the waist belt and classified eight activities (sitting, standing, lying, walking, sit-to-stand, stand-to-sit, stand-to-lie and lie-to-stand) on a relatively small number of subjects (six only), achieving an accuracy of 91 %. Mi et al. [58] conducted a small pilot study on five activities (standing, sitting, lying, walking and running) of five subjects and obtained a classification accuracy of 99 %. The results of Kwapisz et al. [34] are the most relevant to our research, as they used the same data and the same number of activities. They achieved a classification accuracy of 91.7 % with a multi-layered perceptron. However, their classification accuracies for walking upstairs and downstairs are 61 % and 44 % respectively, which are very low compared to the other activities. Upstairs and downstairs are confused with each other and with walking as well. Therefore, they combined the upstairs and downstairs activities into one class called Stairs and obtained a classification accuracy of 77.6 % for this class. Kastner et al. [59] achieved a good classification accuracy of 96 % on testing data by combining features from the acceleration and gyro sensors of the smartphone. They classified six activities similar to those presented in our paper. Compared with the published results, our framework produced much better results. On the full feature set FS1, the classification accuracy is 99.4 %, which is the highest among the published results. Moreover, the F-measure of every activity is more than 96 % for the feature set FS1. The performance of the KNN classifier on the feature set FS2 reduced by scatter search is not degraded much: the F-measures of all the activities are above 93 % and the overall F-measure is 98.2 %. This result is also better than the published results.

Redundant or less important instances for the KNN classifier are pruned by the DROP2 pruning method. This reduces the classification time on testing instances. With only 11.6 % of the instances of the training set selected, the KNN classifier achieved an overall F-measure of 97 % on the training set and 95 % on the testing set. The acceleration data is sampled with a sampling frequency of 20 Hz. A window of 10 s with an overlap of 2.5 s is used to extract the features of one instance. Hence, features are calculated only once every 2.5 s. All the features are calculated in the time domain. Therefore, the KNN classifier with a feature subset of only 30 features and 1689 representative prototypes (FS5: set pruned by the DROP2 method) is efficient in both space and time.

Conclusion

Physical activity monitoring is important in many ways. The basic step in physical activity monitoring is the classification of physical activities based on sensors placed on the body or carried by the subject. In this paper, classification results are presented for six types of physical activities. The major contribution of the paper is the optimal selection of features from the acceleration data recorded by the smartphone. Different types of ensemble classifiers are studied to optimize the classification accuracy of all six types of physical activities. It was found that the KNN classifier is the best classifier for the optimal feature subset of 30 simple time domain features calculated from the acceleration data of a 10 s window sampled at 20 Hz. The classification accuracy on the optimal feature subset is found to be more than 98 %. Most importantly, a classification accuracy of more than 96 % is achieved for the physical activities that are difficult to classify (walking upstairs and downstairs). To improve the time and space complexity, significant instances representing all types of physical activities are selected by the DROP2 pruning algorithm. It is shown that about 1800 instances representing six types of activities can produce more than 95 % classification accuracy.