
1 Introduction

In this modern digital era, self-dependency is one of the foremost needs of mankind, and elderly and differently abled persons are no exception. The field of technology that helps meet this need is called ambient assisted living (AAL) [1]. The primary requirement of AAL is human activity recognition (HAR). Beyond AAL, real-time monitoring of human activity finds major application in personal fitness, gaming, and entertainment. Human activity recognition can be performed using two types of sensors: high-end audio-visual sensor nodes such as cameras [2, 9] and microphones [3], and wearable sensors such as gyroscopes, accelerometers [4,5,6,7,8], thermal sensors, and radio frequency identifiers (RFID) [10, 11]. Among these, the gyroscope and accelerometer are particularly suitable because of their small size, low power consumption, and ready availability in smartphones and smartwatches, and because they do not constrain user mobility while protecting privacy. For these reasons, these sensors have found wide application in AAL, whereas image- or video-based activity recognition is more useful for surveillance and security purposes. As this work deals with the detection of daily life activities for assisted living, a tri-axial accelerometer is used as the sensor.

A detailed study of previously proposed activity detection methods shows that they can broadly be categorized into generative models and discriminative approaches. The majority of state-of-the-art works fall under the generative category: application of Gaussian mixture modeling (GMM) and Gaussian mixture regression (GMR) to accelerometer data [5], hidden Markov models (HMM) to RFID tag data in a smart home environment [11], and a dictionary learning approach to accelerometer data [7] are some examples. Among discriminative approaches to recognizing activities of daily life, we find applications of the support vector machine (SVM) [3, 9, 12], KNN [13], decision tree [14], ANN [8], and many other classifiers. An HMM-based HAR system [15] achieved a classification accuracy of 69.18%, while principal component analysis (PCA) of video signals combined with an SVM classifier reported 95% accuracy [9]. In many HAR systems, PCA [9, 16, 17] plays a salient role in reducing the dimensionality of the large feature sets extracted from sensor data. Walse et al. [8] showed that PCA reduces the feature set dimension from 561 to 70, visibly reducing modeling time while compromising very little on system accuracy. PCA-based decision tree classifiers have also proven useful in classifying cancer data [18] and other medical data [19]. Inspired by the efficiency of PCA in dimensionality reduction without loss of information, it is used in this work to reduce the three-dimensional data of a tri-axial accelerometer to one-dimensional data while retaining most of the information content. In this respect, applying PCA to the raw sensor data, rather than to an extracted feature set as in most state-of-the-art works, is a novel approach. The high classification accuracy reported for various discriminative approaches leads us toward a very popular bagging technique, the random forest classifier. Application of the random forest algorithm to a combined feature set of accelerometer and gyroscope data is reported [20, 23] to recognize activities with 90–98% accuracy depending on the selected activity types. In a random forest-based sports activity recognition system, around 99% detection accuracy was achieved using sensors worn on the arm, forearm, belt, and dumbbell [21]. A comparative study of different classifier algorithms [22] on inertial sensor data from the wrist, chest, and ankle shows that random forest gives sufficiently high accuracy with very little training time.

In the rest of this paper, Sect. 2 describes data collection and pre-processing using PCA, Sect. 3 covers feature extraction, and Sect. 4 emphasizes classification of activities of daily life using the random forest classifier. Results and a comparative study with other related works are presented in Sect. 5.

2 Data Collection and Pre-processing

The data for the proposed activity detection algorithm are acquired from a popular benchmark dataset, the Human Motion Primitives (HMP) dataset, obtained from archive.ics.uci.edu [24]. In this work, five common daily life activities are considered for classification: (i) descend stairs, (ii) climb stairs, (iii) comb hair, (iv) brush teeth, and (v) drink glass.

Around 25 datasets of each activity are taken. These datasets contain readings of a tri-axial accelerometer worn on the right wrist of a volunteer, sampled at 32 Hz. The three columns of each dataset contain coded values of the accelerometer reading along the X, Y, and Z directions. These data are normalized to the range −1.5 to +1.5 g and filtered using a median filter of size 3 to reduce noise. The resolution of the data is 6 bits per axis. Sample plots of the data for each selected activity are shown in Fig. 1.
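As an illustration, a minimal Python sketch of this pre-processing step is given below (it is not the authors' MATLAB implementation). It assumes the usual linear mapping of the 6-bit HMP codes onto the −1.5 to +1.5 g range, and the file path is only illustrative.

```python
import numpy as np
from scipy.signal import medfilt

def preprocess(coded, n_bits=6, lo=-1.5, hi=1.5):
    """Map 6-bit coded accelerometer readings to g and median-filter each axis."""
    coded = np.asarray(coded, dtype=float)       # shape (n_samples, 3)
    scale = (hi - lo) / (2 ** n_bits - 1)        # assumed linear mapping of the codes
    acc_g = lo + coded * scale                   # values in [-1.5, +1.5] g
    # median filter of size 3, applied to each axis separately
    return np.column_stack([medfilt(acc_g[:, k], kernel_size=3)
                            for k in range(acc_g.shape[1])])

# Example with an HMP-style text file (three whitespace-separated columns of codes):
# acc = preprocess(np.loadtxt("Comb_hair/Accelerometer-example.txt"))
```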

Fig. 1 Sample plot of tri-axial accelerometer data for selected activities

Observation of these plots shows that, although the signals of the last three activities are quite distinct from one another, the signals of the first two activities are very similar. Moreover, for a particular activity, the data of the three axes are largely random in nature, and it is difficult to establish any mathematical relationship between them. Therefore, in most related works, features are extracted from all three columns of sensor data, producing a high-dimensional feature set that increases the complexity and execution time of the classifier algorithm.

This work takes a different approach by reducing the dimensionality of the raw data while preserving most of its information content. Here, principal component analysis (PCA) turns out to be a suitable tool for extracting one-dimensional data from the three-dimensional data without significant loss of information from the raw dataset.

PCA represents a given dataset A in a new coordinate system of the same dimension, in which the axes used to represent the sample vectors of the data matrix are different. For example, if A is an n-dimensional dataset, given by A = {A1, A2, A3, …, An}, PCA converts it into another n-dimensional dataset B = {B1, B2, B3, …, Bn} in which most of the information of A is concentrated in the first few columns of B. Thus, the dimensionality of A can be reduced significantly with only a small trade-off in the information content of the initial data.

A can be converted to B by a matrix computation

$$A = C \cdot B$$
(1)

where C is an n × n coefficient matrix.

Its values are determined such that the maximum variance of the new data is captured in the first column of matrix B, and this variance gradually decreases from column 1 to column n. Here, the information content of the data is essentially represented by its variance. Since direct determination of C is not possible, the covariance matrix K is computed using the following equation.

$$K_{n \times n} = \frac{1}{n - 1}\sum\limits_{j = 1}^{n} {(A_{j} - A^{\prime})^{T} \cdot (A_{j} - A^{\prime})}$$
(2)

where Aʹ is the mean of the considered data samples and is given by

$$A^{\prime} = \frac{1}{n}\sum\limits_{j = 1}^{n} {A_{j} }$$
(3)

Next, the eigenvalues of Kn × n for the initial data A are calculated; let them be denoted by λ1, λ2, …, λn with λ1 ≥ λ2 ≥ ⋯ ≥ λn. Then the eigenvectors corresponding to the n eigenvalues are computed; among them, the m eigenvectors S1, S2, …, Sm associated with the m largest eigenvalues define the representation of the data A along m new axes known as the principal axes. In this work, the raw dataset is three-dimensional, and PCA decomposes each dataset into three uncorrelated one-dimensional arrays, the principal component scores SCORE1, SCORE2, and SCORE3 (the projections of the data onto the corresponding eigenvectors). Their plot for a 'comb hair' dataset is shown in Fig. 2. Among these, only the array associated with the highest eigenvalue, i.e., the first one, also known as the principal component of the data, is taken for further analysis. Its plot for the same dataset is shown in Fig. 3. This one-dimensional principal component, which carries the highest variance, is used in the subsequent steps.
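The following Python sketch illustrates this decomposition on the pre-processed tri-axial data: it computes the covariance matrix of Eq. (2), its eigenvalues and eigenvectors, and retains the score associated with the largest eigenvalue. Function and variable names are illustrative; this is not the authors' MATLAB code.

```python
import numpy as np

def first_principal_component(acc):
    """Project mean-centred tri-axial data onto the eigenvector with the largest eigenvalue."""
    centred = acc - acc.mean(axis=0)             # Eq. (3): subtract the mean A'
    K = centred.T @ centred / (len(acc) - 1)     # Eq. (2): 3x3 unbiased covariance matrix
    eigvals, eigvecs = np.linalg.eigh(K)         # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder so that lambda_1 >= lambda_2 >= lambda_3
    scores = centred @ eigvecs[:, order]         # SCORE1..SCORE3 (uncorrelated columns)
    return scores[:, 0]                          # keep only the first principal component

# pc = first_principal_component(acc)   # 'acc' as returned by the pre-processing sketch
```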

Fig. 2 Principal component plot of acceleration data

Fig. 3 Plot of the first dimension of PCA

3 Feature Extraction

Several time domain and frequency domain features are calculated from this principal component of the accelerometer data. Mean, median, variance, standard deviation, kurtosis, and RMS value are the time domain features estimated here; among them, the variance and the standard deviation turn out to be highly significant in the classification algorithm.
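A minimal sketch of these time domain features, assuming the one-dimensional principal component pc obtained in Sect. 2 (function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import kurtosis

def time_domain_features(pc):
    """Mean, median, variance, standard deviation, kurtosis and RMS of the principal component."""
    return {
        "mean": np.mean(pc),
        "median": np.median(pc),
        "var": np.var(pc),
        "std": np.std(pc),
        "kurtosis": kurtosis(pc),           # Fisher's definition by default
        "rms": np.sqrt(np.mean(pc ** 2)),
    }
```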

The significant frequencies of this principal component are determined using the fast Fourier transform (FFT) technique.

The resulting frequency spectrum is shown in Fig. 4. The peak amplitude of this spectrum and its corresponding frequency are recorded as two significant features for classification. A further feature is derived by taking the ratio of this amplitude to this frequency, denoted as the ratio of amplitude to frequency (RAF).
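A corresponding sketch of the frequency domain features is given below, assuming a one-sided magnitude spectrum at the 32 Hz sampling rate; the spectrum normalization and the exclusion of the DC bin from the peak search are assumptions, not taken from the paper.

```python
import numpy as np

def frequency_features(pc, fs=32.0):
    """Peak of the magnitude spectrum, its frequency, and their ratio (RAF)."""
    spectrum = np.abs(np.fft.rfft(pc)) / len(pc)     # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(pc), d=1.0 / fs)     # 0 ... fs/2 Hz
    k = np.argmax(spectrum[1:]) + 1                  # skip the DC bin before the peak search
    amp, freq = spectrum[k], freqs[k]
    return {"amp": amp, "principal_frequency": freq, "raf": amp / freq}
```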

Fig. 4 Frequency spectrum of a descend stairs signal after PCA

The most significant features used in the classification process are the variance (var), the standard deviation (std), the peak amplitude of the frequency spectrum (amp), the frequency corresponding to this peak amplitude (principal frequency), and the ratio of this amplitude to this frequency (RAF). The feature table with three sample datasets of each activity is shown in Table 1. In actual practice, 25 such datasets of each activity are analyzed to form a feature set of 125 (25 × 5) rows and 5 columns.
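The feature table can then be assembled as sketched below. The helper functions are the hypothetical ones introduced in the earlier sketches, and the folder names are only illustrative of the HMP layout.

```python
import numpy as np

# class labels as used in Sect. 4
ACTIVITIES = {"Descend_stairs": 1, "Climb_stairs": 2, "Comb_hair": 3,
              "Brush_teeth": 4, "Drink_glass": 5}

def build_feature_table(datasets_by_activity):
    """datasets_by_activity: {activity name: list of (n_samples, 3) coded arrays}.

    Uses preprocess, first_principal_component, time_domain_features and
    frequency_features from the sketches given earlier in this paper.
    """
    X, y = [], []
    for name, label in ACTIVITIES.items():
        for coded in datasets_by_activity[name]:
            pc = first_principal_component(preprocess(coded))
            td, fd = time_domain_features(pc), frequency_features(pc)
            X.append([td["var"], td["std"], fd["amp"],
                      fd["principal_frequency"], fd["raf"]])
            y.append(label)
    return np.array(X), np.array(y)   # X has shape (125, 5) for 25 datasets per activity
```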

Table 1 Features table

4 Classification Based on Random Forest

The random forest classifier is used to classify the five different types of activity. Random forest is a popular bagging technique. If an n × m dimensional feature set is used to train the classifier, it randomly selects several ensembles of feature samples of dimension p × q, where p < n and q < m, sampling with replacement so that a sample may appear in more than one ensemble. A decision tree is grown from each ensemble by choosing, at every node, the most suitable feature on which to split the training set, without pruning. Thus, the largest possible trees are grown by selecting an appropriate feature and performing a binary split at every node; programmatically, this corresponds to nested if-else statements. Some feature samples may remain outside an ensemble; these are known as out-of-bag (OOB) samples. For every tree grown from a selected ensemble, the out-of-bag classification error is calculated, which plays an important role in determining the appropriate number of trees to grow in the forest for the best classification result. Once training is complete, testing can be done on a feature set of unknown class. Different decision trees may predict different classes, and the final prediction is obtained by majority voting.
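The majority-voting step alone can be illustrated with the short sketch below; the bootstrap sampling and tree growing are left to the library used in the later sketches.

```python
import numpy as np
from collections import Counter

def majority_vote(per_tree_predictions):
    """Final class = the label predicted by the most trees for each test sample."""
    per_tree_predictions = np.asarray(per_tree_predictions)   # shape (n_trees, n_test)
    return np.array([Counter(col).most_common(1)[0][0]
                     for col in per_tree_predictions.T])
```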

From the feature table, 15 randomly selected datasets out of the 25 available for each activity are used to train the classifier model, while the remaining 10 datasets are kept for testing. Thus, the input to the classifier is the feature dataset (X) and a class variable (Y) containing the activity class corresponding to each feature vector. Five different types of activities are to be classified, denoted in the class variable as follows: descend stairs as '1', climb stairs as '2', comb hair as '3', brush teeth as '4', and drink glass as '5'. The random forest classifier then grows the number of decision trees specified by the user, in this case 10.
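A possible realization of this split and training step with scikit-learn is sketched below, assuming the feature matrix X and label vector y built in Sect. 3; the random seed and the library choice are assumptions, not the authors' MATLAB setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# 15 randomly chosen datasets per activity for training, the remaining 10 for testing
train_idx, test_idx = [], []
for label in range(1, 6):
    idx = rng.permutation(np.flatnonzero(y == label))
    train_idx.extend(idx[:15])
    test_idx.extend(idx[15:])

clf = RandomForestClassifier(n_estimators=10, random_state=0)  # 10 decision trees
clf.fit(X[train_idx], y[train_idx])
```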

The number of grown trees is set to 10 after inspecting the plot of out-of-bag classification error versus the number of grown trees shown in Fig. 5. From the graph, it is observed that the classification error is minimum for approximately 10 grown trees, after which it increases and then remains almost constant up to 30 grown trees. In detailed experimentation, it was observed that this error increases further with the number of grown trees due to overtraining. Therefore, 10 decision trees are finally grown during training. One such grown decision tree is shown in Fig. 6, where at every node a binary split occurs depending on a threshold value of a certain feature, denoted x1, x2, …, x5. Such binary splits at multiple stages give rise to a complete tree capable of deciding the class of all five activities, denoted 1–5.
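The out-of-bag error study behind Fig. 5 can be reproduced approximately as sketched below, continuing from the training sketch above; for very small forests, scikit-learn may warn that some training samples are never out of bag.

```python
from sklearn.ensemble import RandomForestClassifier

# OOB error for an increasing number of trees (cf. Fig. 5); X, y, train_idx as above
for n_trees in range(5, 35, 5):
    forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                    bootstrap=True, random_state=0)
    forest.fit(X[train_idx], y[train_idx])
    print(n_trees, 1.0 - forest.oob_score_)    # out-of-bag classification error
```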

Fig. 5 Plot of number of grown trees versus out-of-bag classification error

Fig. 6 Grown classification tree

Thus, when the trained classifier model is fed with a test feature set of an unknown activity, 10 decisions emerge from the 10 classification trees, and the decision that receives the majority of votes becomes the final class prediction. By testing in this way on 10 feature sets of each activity, 96% accuracy is achieved. The test results are shown in Table 3.
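Testing and evaluation can be sketched as follows, continuing from the training sketch; predict internally performs the majority vote over the 10 trees, and the per-activity accuracy follows the formula given in Sect. 5.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# clf, X, y, test_idx are taken from the training sketch above
y_pred = clf.predict(X[test_idx])                   # majority vote over the 10 trees
cm = confusion_matrix(y[test_idx], y_pred, labels=[1, 2, 3, 4, 5])

print("overall accuracy:", accuracy_score(y[test_idx], y_pred))
print("confusion matrix:\n", cm)
print("per-activity accuracy:", np.diag(cm) / cm.sum(axis=1))   # true positives / instances
```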

5 Result

The developed human activity detection system is found to be capable of detecting five activities of daily life with an average accuracy of 96%. The confusion matrix is formed from the classification results on the test data, which constitute 40% of the total available datasets; the remaining 60% are used for training. The confusion matrix in Table 2 shows that four activities, climb stairs, comb hair, brush teeth, and drink glass, are detected with 100% accuracy, whereas 20% of the descend stairs instances are falsely detected as climb stairs. The detection accuracy of a particular activity is calculated as (true positives/number of instances) × 100%. Table 3 shows the classification accuracy of the five different activities; their average, 96%, is taken as the overall accuracy of the proposed model (Table 4).

Table 2 Confusion matrix in terms of percentage of classified activities
Table 3 Classification accuracy of different activities
Table 4 Performance comparison with other related works

Around 95–98% classification accuracy is achieved by an SVM-based HAR system [9] that uses a camera as the sensor node. A BiLSTM RNN system [16] registered an average accuracy of 97.64% using three accelerometers, two gyroscopes, two magnetometers, and one ECG sensor. Two other random forest-based multi-sensor HAR systems report overall accuracies of 93.44% [22] and 90% [23]. In comparison with these state-of-the-art works, the proposed system achieves 96% detection accuracy with only a single wrist-worn accelerometer, which makes the system simple, feasible, and cost-effective.

6 Discussion

The proposed method is found to be capable of recognizing human activity with reasonably good accuracy using limited resources, i.e., only one tri-axial accelerometer. Applying PCA to the raw sensor data reduces its dimension while keeping the important information, which lowers time complexity, speeds up the system response, and reduces the amount of data to be handled. The MATLAB program execution time on a 7th-generation Intel Core i3 PC with 8 GB RAM is approximately 4.5 s. The random forest classifier gives 100% detection accuracy for four activities and 80% for descend stairs only. Hence, the proposed method can be adopted to develop small, wearable, and cost-effective HAR systems. Applying this HAR method to the detection of a larger number of ADLs is planned as future work.