1 Introduction

Recently, artificial intelligence (AI) has been widely used in several applications (e.g., cancer recognition [1, 2], burnout analysis [3], exam correction [4, 5], disease diagnosis [6,7,8], sign language interpretation [9], natural language processing [10], and pattern recognition [11]). The accelerated development of AI has paved the way for human activity recognition (HAR), which is concerned with using sensor data to recognize a specific action (or movement) of a person. HAR has become one of the broadest research topics because of the availability of sensors and accelerometers, their low power consumption, and their low cost. It has been widely employed in smart homes [12], medical care [13, 14], image analysis [15], video surveillance [16], military defense [17], sleep state detection [18], and behavior monitoring [19, 20]. In HAR, movements are activities performed indoors and outdoors (e.g., talking, walking, running, sitting, and standing). They can also be more focused activities, such as those performed in a kitchen or on a factory floor [21]. In short, the fundamental task of HAR is to choose a suitable sensor and use it to observe and capture the activities of the user [22], as shown in Fig. 1. HAR can be classified into sensor-based and visual-based recognition [23, 24]. Sensor-based HAR has become a research focus because of the wide usage of wearable and portable sensors in daily life. HAR sensors mainly include geomagnetic sensors [25], accelerometers [26, 27], and gyroscopes [28].

Fig. 1 The process of human activity recognition (HAR)

Historically, gathering and storing sensor data for activity recognition required custom hardware and was costly. Nowadays, smartphones, smartwatches, and other personal tracking devices, utilized for health and fitness monitoring, are inexpensive and omnipresent. As a result, sensor data collected from these devices are more common and inexpensive to collect, and the activity recognition problem has become a widely studied field. Smartwatches have been beneficial in a broad range of healthcare applications, especially those concentrating on health and fitness monitoring [29, 30]. Compared to other smart devices, smartwatches are truly wearable and do not interrupt the daily lives of their users [31]. The growth of smartwatches in the healthcare field has enabled people to monitor their fitness and health [32]. Unfortunately, data gathered from wearable sensors are time-series data that are complex, noisy, and imbalanced [33]. Hence, HAR is a complex procedure that contains the following steps: pre-process and segment the time-series data, extract the data features, and then classify them using a classification algorithm.

Classical machine learning (ML) algorithms for HAR require manual feature extraction [34]. Dimensionality reduction and feature extraction methods are required for ML algorithms to achieve better performance. These methods aim to find the most informative and compact set of features by generating new ones from the existing features. They represent the most crucial part of classification because performance decreases significantly if the features are not suitable.

Creating classification models that can classify the less common activities is a significant challenge. Classification models trained on imbalanced data are biased toward the more frequently occurring classes. This type of bias happens because the models learn better from classes containing more data. Different methods have been proposed to deal with the class imbalance problem, and they can be split into two main approaches: data-level and classifier-level methods [35]. Traditional ML methods that have been used to perform the HAR task include Naive Bayes and support vector machines (SVM) [36]. Recently, the evolution of deep learning has resulted in its wide utilization in HAR [37]. It learns and extracts features automatically without the complex steps of manual feature extraction; hence, the workload of feature engineering is significantly decreased [38, 39]. Deep neural networks such as recurrent neural networks and convolutional neural networks (CNN) have achieved significant performance across different applications and outperformed many traditional methods. Lately, Long Short-Term Memory (LSTM) networks and CNNs have provided state-of-the-art results on HAR tasks with little or no feature engineering [40].

1.1 Research gap

In the HAR research field, a high-quality benchmark dataset for HAR methods is missing. Most of the publicly available datasets suffer from limited or imbalanced data [33]. Most of the observed activities are simplistic and do not cover the entirety of human actions. Integrating deep architectures for solving HAR using context information can be difficult [41]. Although these methods deliver state-of-the-art performance on benchmark datasets, they are still overconfident in their predictions.

1.2 Research objectives

The major objective of the current study is to suggest an approach for Human Activity Recognition (HAR) through an analysis of machine and deep learning algorithms. Additionally, it addresses two challenging tasks: extracting the most relevant features from raw data and reducing their dimensionality in an efficient manner.

1.3 Research contributions

The contributions of the current study can be summarized as follows:

  • Performing human activity recognition tasks using a detailed comparative analysis of a variety of machine and deep learning algorithms to determine the optimal model.

  • Analyzing balancing and sampling techniques to deal with imbalanced data and determining the best approaches.

  • Reporting state-of-the-art performance metrics and comparing them with different related studies and approaches.

1.4 Paper organization

The rest of the current study is organized as follows: Sect. 2 reviews and summarizes the related literature. Section 3 discusses the background: imbalanced data and oversampling techniques, feature engineering and dimensionality reduction, Topological Data Analysis (TDA), feature scaling, and classification and optimization. Section 4 discusses the methodology: dataset acquisition, the data pre-processing phase, feature engineering and dimensionality reduction techniques, the ML classification and optimization phase, and the DL classification phase. Section 5 presents the details and discussions of the experiments and results. Section 6 presents the study limitations, and finally, Sect. 7 concludes the paper and presents the future work.

2 Literature review

Classical ML algorithms demand extensive domain expertise and feature engineering to transform raw sensor data into features, from which a classifier identifies activities (e.g., SVM [42] and random forest [43]). In Shi et al. [44], an algorithm based on standard deviation trend analysis was utilized to recognize transition activities. For basic activities, SVM was mainly used for recognition. For transition activities, the standard deviation of the data was analyzed to evaluate the overall trend of the data flow and recognize the activity. The accuracy achieved by their proposed model was over 80% on real data.

Garcia et al. [45] introduced a placement-, orientation-, and subject-independent HAR dataset. An SVM algorithm was presented to perform the experiments on the dataset, and an accuracy of 74.39% was obtained. Their proposed model was able to narrow the gap between real-life applications and models. Ahmed et al. [46] proposed a hybrid method that contains a filter and a wrapper for the feature selection process. The process employed a sequential floating forward search to extract the features that would be fed to a multi-class SVM. Their model was validated on a public benchmark dataset [47], and an average accuracy of 98.13% was delivered. Their proposed system provided acceptable activity recognition and operated efficiently with limited hardware resources.

Deep learning algorithms such as recurrent neural networks [48] and convolutional neural networks [49] conduct automatic feature extraction and classification. They have delivered promising results in different sensor-based HAR scenarios [50]. Barut et al. [51] used a single wearable sensor to create a new dataset and utilized a multi-task LSTM model for intensity evaluation and activity recognition to deliver better outcomes. An accuracy of 97.76% and an F1-score of 83.43% were obtained. Wang and Liu [52] suggested a Hierarchical-LSTM model based on the LSTM for human activity recognition. Three public UCI datasets were used to train and evaluate their model, and an accuracy of 99.15% was achieved.

Furthermore, convolutional neural networks are used in HAR tasks to extract temporal features and have produced significant performance advancements [53,54,55]. Zhang et al. [34] utilized the encoder and decoder operations of the U-Net architecture in creating their proposed HAR framework. Rather than sliding-window labeling, dense labeling was used to provide a single label per sample in the time-series data. Additionally, to enhance the performance of the dense prediction outcome, a post-correction algorithm was utilized. Four datasets, including the WISDM dataset [56], the UCI HAPT dataset (HAPT) [47], the UCI OPPORTUNITY Gesture dataset (OPP Gesture) [57], and a self-collected Sanitation dataset, were used to conduct experiments. For the OPP Gesture dataset, an accuracy of 94.78% was obtained by their U-Net_PC model. Teng et al. [58] used a local loss function to achieve the layer-wise training of a convolutional neural network for HAR. Their method was evaluated on five public datasets, namely the UCI HAR dataset [42], Opportunity dataset [59], UniMib-SHAR dataset [60], PAMAP2 dataset [61], and WISDM dataset [36]. The reported accuracy and F1-score were 98.82% and 98.81%, respectively.

To extract powerful features from raw sensor-based data automatically and effectively, Ronao and Cho [62] proposed a model formed of alternating convolution and pooling layers. To predict human activities, the extracted features from the previous layers were passed to the fully connected and SoftMax layers. The dataset proposed in [63], collected from 30 volunteer subjects, was used to train and evaluate their model, reaching an overall performance of 94.79% with raw sensor data and 95.75% with the additional information of the temporal fast Fourier transform of the HAR dataset. Bianchi et al. [64] suggested a CNN model formed of four convolution layers and only one fully connected layer for HAR, which performed well on their small training set. Their system was designed to recognize nine different activities and achieved an accuracy of 97%.

A different design paradigm that was very prevalent among the community of activity recognition was to create hybrid models [50, 65]. In Ordonez and Roggen [66], a DL architecture using a combination of convolutional and recurrent neural networks was proposed to conduct HAR from wearable sensors. Two public datasets, OPPORTUNITY [59] and Skoda [67] were used to evaluate their proposed approach. For the Skoda dataset, an F1-score of 95.8% was obtained. In Xia et al. [68], two LSTM layers and cascading convolutional layers were employed to extract features from time-series data. To maintain the classification performance while reducing the model parameters, a global average pooling layer was utilized rather than the fully connected layer. Three public datasets, UCI [47], WISDM [56], and OPPORTUNITY [57, 59] were used to evaluate the model performance. The overall accuracy of the model for the UCI-HAR dataset was 95.78%, for the WISDM dataset was 95.85%, and for the OPPORTUNITY dataset was 92.63%.

Ignatov et al. [69] used CNN and statistical features to extract features from sensor data. The assembled feature vector was passed to the subsequent layers to identify the activities. The proposed approach was evaluated on two commonly used datasets (i.e., WISDM [56] and UCI [47]). The reported accuracy and F1-score were 97.63% and 97.62%, respectively. The results indicated that their presented model delivered state-of-the-art performance while demanding no manual feature engineering and low computational cost. In Xu et al. [55], the inception module of the GoogLeNet architecture was explored to extract spatial features from sensor data. Furthermore, temporal features obtained using a recurrent neural network were combined with the spatial features. Three benchmark datasets were used to conduct experiments: the OPPORTUNITY dataset [59], the PAMAP2 dataset [61], and the Smartphone database [70]. For the OPPORTUNITY dataset, an F-measure of 94.6% was obtained.

Khan and Ahmad [71] proposed a multi-head attention-based model for HAR. Their framework included three lightweight convolutional heads, each created using a one-dimensional CNN to extract features from input sensor data. Their model was induced with attention to strengthen the representation ability of the CNN. Two publicly available datasets, WISDM [56] and UCI HAR [47], were used to conduct ablation experiments and studies and evaluate the proposed model. The achieved F1-score was 97.20% for the WISDM dataset.

When dealing with HAR data, a problem of imbalanced data may exist. To tackle the imbalance problem, the most intuitive path is to re-sample the data so that the class distribution is balanced, as done in Alani et al. [33]. Experiments were done using an extensive sensor-based multi-modal dataset developed from the Sensor Platform for Healthcare in a Residential Environment [72]. The results showed that when using the SMOTE oversampling technique to correct the class imbalance, CNN-LSTM achieved the highest classification accuracy of 93.67%, followed by CNN with 93.55% and LSTM with 92.98%. Grzeszick et al. [73] used two augmentation techniques, Gaussian noise perturbation and interpolation, to solve the class imbalance problem. In their study, a CNN was utilized on sequential data from multiple inertial measurement units. A dataset introduced in [74] was used to evaluate the proposed model, and a classification accuracy of 73.9% ± 4.6% was obtained.

2.1 Related studies summarization

Table 1 presents a comparison between the related studies and the current study. The related studies are ordered from the oldest to the latest.

Table 1 Comparison between the related studies and the current study

3 Background

3.1 Imbalanced data and oversampling techniques

The imbalance problem appears when one of the target classes has a small number of instances compared to the other classes. Typically, a classifier fails to detect a minority class due to its small number of samples. Recently, there has been significant interest in solving the class imbalance issue. It is considered a challenging issue that requires more attention from researchers [75, 76]. Using resampling techniques to balance the dataset is one of the common procedures. Resampling methods can be applied either by oversampling or undersampling the dataset [77]. Undersampling can be described as the process of decreasing the number of majority target samples (i.e., instances) [78], for example using Tomek's links [79] and cluster centroids [80]. Oversampling can be achieved by increasing the number of minority class samples by repeating some instances or producing new instances [81].

In the current study, only the oversampling approach is used. The utilized techniques are summarized as follows (a minimal usage sketch is given after the list):

  • Synthetic Minority Oversampling Technique (SMOTE): In a classic oversampling technique, the amount of data is increased, but no further variation or information is given to the ML model. Chawla et al. presented SMOTE [82], which operates differently. It creates synthetic data to overcome the overfitting problem posed by random oversampling. It utilizes the K-nearest neighbor algorithm: it starts by choosing a data point randomly from the minority class and then identifies the K nearest neighbors of that point. Synthetic data are then constructed between the randomly selected point and its nearest neighbors.

  • Synthetic Minority Over-sampling Technique for Nominal (SMOTEN) is an extension of SMOTE for nominal features, also proposed by Chawla et al. [82]. In it, the nearest neighbors are calculated using a modified version of the Value Difference Metric [83, 84].

  • Borderline-SMOTE is a variation of the SMOTE. Unlike the SMOTE, it generates synthetic data only along the decision boundary between the two classes [85].

  • Adaptive Synthetic (ADASYN) is a generalized form of the SMOTE algorithm. Similar to SMOTE, it aims to oversample the minority class by generating synthetic instances for it. However, it takes a different approach from Borderline SMOTE: while Borderline SMOTE synthesizes data around the decision boundary, ADASYN creates synthetic data according to the data density [86].

  • K-means SMOTE is an effective and simple oversampling method for class-imbalanced data based on SMOTE and K-means clustering. It aims to aid classification by generating minority class instances in crucial and safe areas of the input space. This method averts noise generation and effectively overcomes the imbalances within and between classes [87].

  • Borderline SMOTE SVM (SVM SMOTE) is another variation of Borderline SMOTE [88]. The primary difference between this technique and the other SMOTE variants is that it incorporates the SVM algorithm to identify misclassifications instead of using K-nearest neighbors. In SVM SMOTE, the support vectors are used to approximate the borderline area after training SVMs on the original training set.
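As a usage illustration, the following is a minimal sketch of applying SMOTE, assuming the imbalanced-learn library; the data, class ratio, and k_neighbors value are placeholders rather than the study's actual configuration.

```python
# Minimal SMOTE sketch with imbalanced-learn; data and parameters are illustrative.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))                 # e.g., x/y/z acceleration readings
y = np.array([0] * 900 + [1] * 100)            # imbalanced activity labels

smote = SMOTE(k_neighbors=5, random_state=42)  # K-nearest-neighbor based synthesis
X_res, y_res = smote.fit_resample(X, y)

print(Counter(y))      # Counter({0: 900, 1: 100})
print(Counter(y_res))  # Counter({0: 900, 1: 900}) -- minority class oversampled
```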

3.2 Features engineering and dimensionality reduction techniques

The feature engineering pipeline is the preprocessing step that extracts features from raw data and transforms them into formats that can be ingested by ML algorithms [4, 89]. It helps ML algorithms to determine patterns in data that boost their performance. Feature engineering is an important task in developing predictive solutions [90], but it is challenging and among the least well-studied topics in ML and data mining [91]. Feature engineering is a manual, problem-specific process performed by ML and domain experts [92]. In ML, feature engineering consists of four main steps: feature creation, feature transformation, feature extraction, and feature selection [93, 94].

  • Feature Creation: This step includes specifying the variables that can be useful in the predictive model. It is a subjective process that requires human creativity and intervention. Features are combined by multiplication, addition, and subtraction to construct new derived features with higher predictive power [95]. Usually, ML experts combine features in a trial-and-error manner until the generated features fulfill the expectations [90, 96]. Automated feature generation methods take a long time to produce an outcome and are computationally expensive [97, 98].

  • Feature Transformation: Feature transformation usually implies simpler modifications over the features [99]. Transformation involves manipulating the predictor variables to improve model performance. It is used to ensure that the variables are on the same scale, guarantee the flexibility of the model in which a variety of data can be ingested, make the model easier to understand, avoid computational errors, and improve accuracy. Some of the standard transformations are binning, rounding, scaling, exponential transformations, logarithmic transformations, and power functions [93].

  • Feature Extraction: These techniques aim to find a smaller set of new variables in which each is a combination of the input variables, including the same information as them. Feature extraction is used to develop new variables by extracting them from raw data. It aims to reduce the data volume into a more suitable set for modeling [100]. These methods include text analytics, cluster analysis, principal components analysis, and edge detection algorithms.

  • Feature Selection: When applied to the original dataset, only the most relevant variables are kept. Feature selection algorithms are used to analyze, judge, and rank a subset of features from the pool of available features. Selection is employed to determine which features are irrelevant or redundant and should be removed, and which are the most useful for the model and should be prioritized [101]. These methods can be divided into four high-level categories: filter (e.g., ANOVA, Pearson correlation, variance thresholding), wrapper (e.g., forward, backward, and stepwise selection), embedded (e.g., Lasso, Ridge, Decision Tree), and hybrid methods [102, 103].

3.2.1 Features extraction

In the current work, only feature extraction techniques are applied to perform dimensionality reduction. These techniques are applied to reduce model complexity, overfitting, and generalization error, and to increase the computational efficiency of the model [104]. They are listed below (a minimal usage sketch follows the list):

  • Principal Component Analysis (PCA) is a method for acquiring the important features from a large feature set available in a dataset. It finds the directions of maximum variance in high-dimensional data and projects the data onto a new subspace with dimensions equal to or fewer than the original ones [105].

  • Linear Discriminant Analysis (LDA) is a supervised feature extraction technique that aims to decrease the scatter within each class and increase the distance between the class means [106, 107].

  • Independent Component Analysis (ICA) is a linear method that takes a mixture of independent components as input data and aims to identify each of them correctly [108].

  • Random Projection (RP) is a technique utilized to perform feature reduction for a set of points lying in Euclidean space [109].

  • Truncated Singular Value Decomposition (T-SVD) is a matrix factorization technique, similar to PCA, used to reduce the dimensionality of the data. Unlike PCA, the data are not centered before calculating the singular value decomposition, which means it can be used efficiently with sparse matrices [110]. The data matrix is factorized by T-SVD such that the number of columns equals the truncation.
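For illustration, the following is a minimal scikit-learn sketch of the five feature extraction techniques; the input data, the target dimensionality of 3, and the class labels are placeholders (note that LDA is supervised and requires the labels).

```python
# Sketch of the five reducers with scikit-learn; shapes are illustrative.
import numpy as np
from sklearn.decomposition import PCA, FastICA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 26))          # e.g., 26 extracted features per row
y = rng.integers(0, 6, size=500)        # 6 activity classes (needed for LDA)

reducers = {
    "PCA": PCA(n_components=3),
    "ICA": FastICA(n_components=3, random_state=0),
    "T-SVD": TruncatedSVD(n_components=3),
    "RP": GaussianRandomProjection(n_components=3, random_state=0),
    "LDA": LinearDiscriminantAnalysis(n_components=3),  # supervised technique
}
for name, reducer in reducers.items():
    X_red = reducer.fit_transform(X, y) if name == "LDA" else reducer.fit_transform(X)
    print(name, X_red.shape)            # (500, 3) for each technique
```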

3.2.2 Topological data analysis (TDA)

Topology [111] is the study of shapes and their properties. It deals with properties of shapes (e.g., the number of components and loops). Topological Data Analysis (TDA) is an approach to dataset analysis using topological techniques [112]. It exploits the topological and geometrical properties of data such as shape and connectivity. TDA is inspired by the notion that geometry and topology deliver a robust approach to infer strong qualitative information about the data structure [113]. Datasets that are incomplete, high-dimensional, and noisy are challenging to extract information from. TDA provides a general framework to analyze such sets while providing robustness to noise and dimensionality reduction [113]. Additionally, TDA inherits functoriality (i.e., a functor is a mapping between categories), a fundamental concept of modern mathematics, which allows it to adapt to new mathematical tools [114].

Persistent homology is a central tool of TDA to construct multi-scale invariants of data and represent them with barcodes or persistence diagrams [115, 116]. It considers data as a point cloud and tries to find the holes in point clouds using discretization and triangulation of the initial data space with simplicial complexes. TDA offers: (1) A compressed mathematical representation of a dataset: everything from a single data point up to the global structure of a dataset can be studied without bearing a cognitive overload, (2) Missing data and noise resistance: TDA maintains important features of the data, (3) Invariance: the size, orientation, or skew of the data does not change it, as only connectedness matters, (4) An exploration tool for data: fetch answers to questions that have not been asked yet, and (5) A tool for studying the shape of data and manifolds: TDA inherits functoriality and has a robust theoretical foundation.

TDA creates the persistence diagram, a 2D plot that indicates the birth and death of n-dimensional holes in the induced topological spaces. TDA also provides the Mapper [117]. It is considered a combination of clustering, dimensionality reduction, and graph network techniques utilized to get a higher-level understanding of the data structure. It is used to: (1) visualize the shape of data through a particular lens, (2) detect interesting topological structures (i.e., clusters) that cannot be found by traditional methods, and (3) select the best features that discriminate data and support model interpretability.

In biological research fields, several publications have successfully used TDA. These include Type-2 diabetes (T2D) subgrouping using clinical parameters [118], modeling RNA hairpin folding [117], and gene expression patterns-based breast cancer classification [119].

3.3 Feature scaling techniques

Feature scaling, also named data normalization, is an approach employed to normalize the range of data features or independent variables [120]. In this work, five of the most commonly used feature scaling techniques are utilized [121]; a minimal usage sketch is given after the list.

  • Absolute Maximum (Max-Abs) Scaling computes the absolute maximum value of the feature in the dataset and then divides all the values in the column by that value [122]. The output values range between -1 and 1.

  • Minimum Maximum (Min-Max) Scaling is achieved by subtracting the minimum value from all the values in the dataset and then dividing the output by the dataset range (i.e., maximum value - minimum value) [123].

  • Normalization: The maximum value is used to perform normalization [124]. In the previous cases, the range of the data is changed, while in normalization the shape of the data distribution is changed.

  • Standardization (Z-Score Normalization): Z-score is calculated for each data point and replaces the data value with the calculated one [124]. As a result, all features are centered around the mean value with a standard deviation of 1.

  • Robust Scaling is not prone to outliers. In this method, the median value is subtracted from all the data points and then the output values are divided by the Inter Quartile Range (IQR) value [125]. The IQR is the distance between the 25th and the 75th percentile points. Hence, the median value is centered at zero.
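The following is a minimal scikit-learn sketch of the five scalers; here, Normalizer rescales each sample (row) to unit norm, which is one common reading of the normalization described above, and the input matrix is a toy placeholder.

```python
# Sketch of the five scaling techniques via scikit-learn; data are illustrative.
import numpy as np
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler, Normalizer,
                                   StandardScaler, RobustScaler)

X = np.array([[1.0, -20.0], [5.0, 0.0], [9.0, 20.0]])

scalers = {
    "Max-Abs": MaxAbsScaler(),       # divide by the absolute maximum per column
    "Min-Max": MinMaxScaler(),       # (x - min) / (max - min)
    "Normalization": Normalizer(),   # rescale each sample (row) to unit norm
    "Z-Score": StandardScaler(),     # (x - mean) / std
    "Robust": RobustScaler(),        # (x - median) / IQR
}
for name, scaler in scalers.items():
    print(name, scaler.fit_transform(X).round(2).tolist())
```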

3.4 Classification, optimization, and performance evaluation

Machine learning (ML) algorithms have proved to be useful in a broad variety of applications (e.g., email filtering, computer vision, speech recognition, and medicine) in which it is infeasible to design conventional algorithms to accomplish the required tasks [126]. ML techniques are gaining huge attention in data mining, where they are leveraged to recognize historic trends and deliver future models [127].

Deep learning (DL) is a subset of ML in which artificial neural networks learn from large amounts of data [128, 129]. DL algorithms are used to solve complex problems where datasets are diverse, unstructured, and inter-connected. The more DL algorithms learn, the better they perform [130]. Several types of DL algorithms exist, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A CNN is an algorithm designed for object detection and image processing. The convolution is a unique filtering process performed over an image to assess every element within it [131, 132]. An RNN has built-in feedback loops that allow the algorithm to remember past data points; it can use this memory to inform its understanding of current events.

3.4.1 Machine learning classifiers

Simple classifiers include Naive Bayes, Decision Tree, Logistic Regression, and K-Nearest Neighbor [133]. There are also ensemble classifiers, which refer to algorithms that merge the predictions from two or more models. Their popularity is due to their ease of implementation and success on an expansive domain of predictive modeling problems [134]. In the current study, seven classifier types are used, six of which are ensemble classifiers:

  • Light Gradient Boosting Machine (LGBM) Classifier is a distributed gradient boosting framework primarily created by Microsoft to be employed in ML [135, 136]. It is built over decision tree [137] algorithms and used for classification, ranking, and further tasks. The development of LightGBM concentrates on scalability and performance [138].

  • XGBoost (XGB) Classifier is a distributed scalable gradient-boosted decision tree algorithm [139]. It is the leading ML algorithm for ranking, classification, and regression problems. It delivers parallel tree boosting, sparse optimization, multiple loss functions, regularization, bagging, and early stopping.

  • Adaptive Boosting (AdaBoost) Classifier is a meta-estimator that begins by fitting a classifier on a dataset and then fits additional copies of the classifier on the same dataset, adjusting the weights of incorrectly classified instances so that subsequent classifiers focus more on the difficult cases [140].

  • Histogram-based Gradient Boosting (HGB) Classifier is a gradient boosting method that bins the continuous input variables into discrete histograms and customizes the training algorithm around this transform. It has native support for missing values. It addresses the main drawback of gradient boosting, namely that training the model is time-consuming on datasets with tens of thousands of examples.

  • Random Forest (RF) Classifier is a method that merges a large number of independent trees trained by equally and randomly distributed subsets of the data [141]. It is one of the most utilized algorithms because of its simplicity and diversity (i.e., can be used for regression, classification, and other tasks that function by building a group of decision trees at the time of training) [142, 143].

  • The Decision Tree (DT) Classifier is a supervised non-parametric learning algorithm used for classification and regression [144]. It is a tree-structured classifier, in which the features of a dataset are represented by internal nodes, the decision rules are represented by branches (i.e., decision nodes), and an outcome is represented by a leaf node. Hence, there are two types of nodes including decision and leaf nodes [145].

  • The Extra Trees (ETs) Classifier is an ensemble learning method in which the results of numerous de-correlated decision trees gathered in a forest are aggregated to output the result of the classification [146].

3.4.2 Deep learning classifiers

In DL, algorithms use the input distribution to extract features and useful data patterns during the training process. Deep learning models include several algorithms. In the current study, three types are used: the 1D Convolutional Neural Network (1D-CNN) [147, 148], the Gated Recurrent Unit (GRU) [149], and the Bi-directional Long Short-Term Memory network (BiLSTM) [48]. They are discussed as follows:

  • 1D Convolutional Neural Network (1D-CNN) Classifier is a recently developed, modified version of the CNN. In a 1D-CNN, the computational complexity is considerably lower than that of a conventional CNN [148]. Hence, 1D-CNNs are suitable for real-time and low-cost applications.

  • Gated Recurrent Unit (GRU) Classifier is a gating mechanism in RNNs. It is similar to an LSTM with a forget gate [150] but with fewer parameters, as the output gate does not exist [151]. GRUs have delivered better performance on smaller and less frequent datasets [152].

  • Bi-Directional Long Short-Term Memory Network (BiLSTM) Classifier is a sequence processing model that includes two LSTMs: the first processes the input in a forward direction, and the other in a backward direction [153]. It effectively increases the amount of information available to the network, enhancing the context available to the algorithm.

3.4.3 K-fold cross-validation

Cross-validation [154] is a resampling technique employed to assess ML models on a limited dataset. Cross-validation is used to detect overfitting (i.e., a model failing to generalize) [155]. This approach has only one parameter, called "K", where the input data are split into K folds (i.e., subsets of data). A minimal sketch is shown below.
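The sketch below illustrates 5-fold cross-validation with scikit-learn (K = 5, as used later in the study); the data and the classifier are placeholders.

```python
# Minimal 5-fold cross-validation sketch; data and classifier are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))              # placeholder features
y = rng.integers(0, 6, size=300)            # placeholder activity labels

clf = RandomForestClassifier(n_estimators=100, random_state=1)
scores = cross_val_score(clf, X, y, cv=5)   # 5 train/validate rotations
print(scores.mean(), scores.std())          # averaged estimate and its spread
```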

3.4.4 Grid search hyperparameter optimization

Hyperparameter optimization (i.e., tuning) is the process of finding the most suitable values of the hyperparameters, and it is one of the most important parts of ML [156, 157]. A model with poor performance and wrong results may result from a wrong choice of hyperparameter values [158]. Hyperparameters are the parameters of a model whose values are specified before training and influence the model behavior. For example, the number of trees in a random forest is a hyperparameter, as its value is set before training. As the optimal values of the hyperparameters are unknown, hyperparameter optimization algorithms are required. The current study utilizes grid search, also referred to as full factorial design [159]. For a model with several hyperparameters, the best combination of hyperparameter values needs to be found by searching in a multi-dimensional space, as sketched below.
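The following is a minimal scikit-learn sketch of grid search; the parameter grid and data are illustrative and not the exact grid of Table 5.

```python
# Grid-search sketch over a random forest; grid values are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))          # placeholder features
y = rng.integers(0, 6, size=200)        # placeholder activity labels

param_grid = {
    "n_estimators": [100, 300],
    "criterion": ["gini", "entropy"],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)                        # evaluates all 2 x 2 x 2 combinations
print(search.best_params_, search.best_score_)
```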

4 Methodology

The suggested methodology is presented and summarized graphically in Fig. 2 and discussed in detail in the next subsections.

Fig. 2 Graphical presentation of the suggested methodology

4.1 Data acquisition phase

The datasets used in the current study are retrieved from different public sources. They are:

  • WIreless Sensor Data Mining (WISDM) v1.1: It consists of 6 columns and 1,098,208 rows (i.e., records). The columns are “user”, “activity”, “timestamp”, “x-acceleration”, “y-acceleration”, and “z-acceleration”. There are 6 activities (i.e., “Walking,” “Jogging,” “Sitting,” “Standing,” “Upstairs,” and “Downstairs”). The data were sampled with a sampling rate of 20 Hz (i.e., one sample per 50 ms). The “user” field ranges from 1 to 36. The “x-acceleration,” “y-acceleration,” and “z-acceleration” fields range from -20 to 20 and are measured by the Android phone’s accelerometer [56]. It can be retrieved from https://www.cis.fordham.edu/wisdm/dataset.php.

  • Human Activity Recognition Using Smartphones Data Set v1.0 (UCI-HAR): It was collected from 30 volunteers aged from 19 to 48 years using a “Samsung Galaxy S II” Android device. There are 6 activities (i.e., “WALKING,” “WALKING_UPSTAIRS,” “WALKING_DOWNSTAIRS,” “SITTING,” “STANDING,” and “LAYING”). The dataset is partitioned into two subsets: 70% for training (i.e., the “train” folder) and 30% for testing (i.e., the “test” folder). The number of training records is 7,352 while the number of testing records is 2,947. There are 561 features, and their names are defined in “features.txt” [70]. It can be retrieved from https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones. The training and testing subsets are merged in the current study and partitioned in a later process.

It is worth noting that the datasets combine static and dynamic activities. Static activities include sitting, standing, and lying while dynamic activities include walking, walking downstairs, and walking upstairs. Table 2 presents a summary of the datasets used in our study.

Table 2 Summary of the datasets used in the current study

4.2 Data pre-processing phase

The datasets are pre-processed before being used in the later phases. Data balancing and sampling are applied.

4.2.1 Data balancing

The datasets are not balanced, and this can lead to overfitting or misclassification issues [160]. The current study utilized the techniques discussed in Sect. 3.1 to determine the most suitable technique to use in the experiments. Each technique is executed for 10 runs on the same dataset to determine its average time. Table 3 shows whether each technique crashes, its average time, and whether it produces balanced data. The techniques are applied to the “WISDM” dataset, as it contains a large volume of records, to check whether each technique is scalable. If a technique crashes, then it is not scalable.

Table 3 Comparison between the oversampling techniques on the “WISDM” dataset

Table 3 shows that the SMOTE method outperforms the other methods as it does not crash, produces balanced datasets, and consumes less time. Hence, synthetic oversampling is applied in the current study to the used datasets with the SMOTE technique [82]. Figure 3 shows the distribution of each category of the “WISDM” dataset before and after SMOTE oversampling. It reached 2,546,394 records after the SMOTE balancing process. Figure 4 shows the distribution of each category of the “Human Activity Recognition Using Smartphones Data Set v1.0 (UCI-HAR)” dataset before and after SMOTE oversampling. It reached 11,664 records after the SMOTE balancing process. In both figures, the x-axis shows the categories and the y-axis shows the count of each category.

Fig. 3 The distribution of each category of the “WISDM” dataset before (left plot) and after (right plot) SMOTE oversampling

Fig. 4 The distribution of each category of the “Human Activity Recognition Using Smartphones Data Set v1.0 (UCI-HAR)” dataset before (left plot) and after (right plot) SMOTE oversampling

4.2.2 Data sampling

The datasets are time-series data, and instead of using them row-by-row in the classification phase, a sampling mechanism is applied. The sampling is performed vertically, which means that the records are stacked row-by-row after the sampling process. It is applied using different configurations, as shown in Table 4. The second column is the sampling size, the third column is the step size, the fourth column is the overlapping percentage, the fifth column is the number of records of the imbalanced dataset, the sixth column is the number of records of the balanced dataset, and the seventh column is the output shape of the balanced dataset after sampling. A minimal sketch of this windowing is given after Table 4.

Table 4 The data sampling criteria
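As a concrete illustration, the following is a minimal sketch of the vertical sliding-window sampling; the window and step sizes are illustrative, not the exact values of Table 4. A step equal to the window size yields 0% overlap, and a step of half the window yields 50% overlap.

```python
# Sliding-window sampling sketch; window/step values are illustrative.
import numpy as np

def sample_windows(data, window, step):
    """Stack fixed-size windows of a (records x channels) time series."""
    starts = range(0, len(data) - window + 1, step)
    return np.stack([data[s:s + window] for s in starts])

signal = np.arange(20, dtype=float).reshape(10, 2)      # 10 records, 2 channels
print(sample_windows(signal, window=4, step=4).shape)   # 0% overlap -> (2, 4, 2)
print(sample_windows(signal, window=4, step=2).shape)   # 50% overlap -> (4, 4, 2)
```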

4.3 Features engineering and dimensionality reduction techniques

4.3.1 Features extraction using TDA

The TDA is used to extract the features from the datasets in nine flavors. They are: (1) persistence entropy in two views, normalized and non-normalized, (2) number of points, (3) bottleneck, (4) Wasserstein, where the p in \(L^p\) is set to 2.0, (5) Betti, where the p in \(L^p\) is set to 2.0 and the number of bins is set to 100, (6) landscape, where the p in \(L^p\) is set to 2.0, the number of bins is set to 100, and the number of landscape layers to consider is set to three values: 1, 2, and 3, (7) persistence image, where the p in \(L^p\) is set to 2.0, the sigma value is set to 0.1, and the number of bins is set to 100, (8) heat, where the p in \(L^p\) is set to 2.0, the sigma value is set to 0.1, and the number of bins is set to 100, and (9) silhouette, where the p in \(L^p\) is set to 2.0, the number of bins is set to 100, and the power is set to two values: 1.0 and 2.0.

The pipeline for extracting the features consists of three stages: (1) cubical persistence, resulting from the filtered cubical complexes, (2) a bottleneck scaler, which makes the lifetime of the most persistent point across all diagrams and homology dimensions equal to two, and (3) a filter to discard the points that have a lifetime less than or equal to a cutoff value (set to 0.01 in the current study). From that, the number of extracted features per row is 26. A sketch of this pipeline is given below.
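The following is a hedged sketch of this three-stage pipeline, assuming the giotto-tda library; the window shape is illustrative, and persistence entropy stands in for the nine feature flavors listed above.

```python
# TDA feature-extraction sketch with giotto-tda; input shapes are illustrative.
import numpy as np
from gtda.homology import CubicalPersistence
from gtda.diagrams import Scaler, Filtering, PersistenceEntropy
from sklearn.pipeline import make_pipeline

windows = np.random.default_rng(0).normal(size=(32, 50, 3))  # sampled windows

pipe = make_pipeline(
    CubicalPersistence(),                 # stage 1: filtered cubical complexes
    Scaler(metric="bottleneck"),          # stage 2: bottleneck-based rescaling
    Filtering(epsilon=0.01),              # stage 3: drop short-lived points
    PersistenceEntropy(normalize=True),   # one of the nine feature flavors
)
features = pipe.fit_transform(windows)
print(features.shape)                     # (32, number of homology dimensions)
```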

4.3.2 Dimensionality reduction

The current study utilized five feature reduction techniques: PCA, LDA, ICA, RP, and T-SVD, as discussed in Sect. 3.2. The features are reduced to 3 for the “WISDM” dataset because the original dataset consists of 3 columns. The features are reduced to 100 for the “UCI-HAR” dataset.

4.4 ML classification and optimization phase

The current study uses different classification algorithms to achieve the state-of-the-art (SOTA) performance metrics. They are the (1) LGBM, (2) XGB, (3) AdaBoost, (4) HGB, (5) ETs, (6) DT, and (7) RF classifiers.

4.4.1 Hyperparameters optimization using grid search (GS)

There are different hyperparameters for each used machine learning algorithm, and hence the GS optimization approach is implemented to find the best combination for each classifier. Table 5 summarizes the hyperparameter values of each classifier to select from. The used machine learning algorithms and their hyperparameters are:

  • LGBM Classifier: The boosting type is set to Gradient Boosting Decision Tree (GBDT). The max depth is the maximum tree depth for the base learners and is set to “None” which means that there are no limitations. The learning rate is the boosting learning rate. The number of estimators is the number of boosted trees to fit and is set to 300.

  • XGB Classifier: The boosting type is set to GBDT. The number of estimators is set to 300. The max depth is set to “None”.

  • AdaBoost Classifier: The number of estimators is set to 300.

  • DT Classifier: The combinations are applied between criteria to measure the quality of a split and the splitting mechanism. The max depth is set to “None.”

  • ETs Classifier: The combinations are applied on the criterion. The max depth is set to “None.” The number of estimators is set to 300.

  • RF Classifier: The combinations are applied on the criterion and class weight. The number of estimators is set to 300. The max depth is set to “None.”

  • HGB Classifier: The max iteration is set to 100 and the learning rate is set to 0.1.

Table 5 The different used hyperparameters of the classifiers

4.4.2 Features scaling

Standardization, normalization, min-max scaling, max-absolute scaling, and robust scaling are used with the grid search to find the best scaler technique. Table 6 summarizes the used feature scaling equations.

Table 6 Summary of the feature scaling equations

where \({X_{\text {scaled}}}\) is the scaled output vector while \(X_{\text {input}}\) is the input vector, IQR is the interquartile range, \(\mu\) is the mean value, and \(\sigma\) is the standard deviation value.

4.4.3 Performance improvement

The train-to-test splitting and K-fold cross-validation are used to improve the estimated performance of the classifiers. The train-to-test splitting partitions the datasets into train and test subsets. The current study uses 85% and 15% for the train and test subsets, respectively, after shuffling them. The current study uses five folds (i.e., \(K = 5\)).

4.5 DL classification phase

The current study utilized three DL approaches, as discussed in Sect. 3.4.2. The first is the GRU model that consists of nine layers: (1) an input layer, (2) three cascaded GRU layers with 32 kernels, 20% dropout, and 20% recurrent dropout, (3) a dense layer with 64 units and the LeakyReLU activation function, (4) a dropout layer with a dropout ratio of 50%, (5) another dense layer with 32 units and the LeakyReLU activation function, (6) another dropout layer with a dropout ratio of 50%, and (7) an output layer with a SoftMax activation function.

The second is the 1D-CNN that consists of 14 layers: (1) input layer, (2) four cascaded 1D convolutional layers with 32, 64, 128, and 256 filters (i.e., kernels), respectively, kernel size of 3, same padding, and LeakyReLU activation function, (3) 1D max-pooling layer with pooling size of 3, a stride of 2, and same padding, (4) dropout layer with a dropout ratio of 50%, (5) dense layer with 256 units and LeakyReLU activation function, (6) dropout layer with a dropout ratio of 50%, (7) flatten layer, (8) another dense layer with 512 units and LeakyReLU activation function, (9) another dropout layer with a dropout ratio of 50%, (10) a third dense layer with 1024 units and LeakyReLU activation function, and (11) output layer with a SoftMax activation function.

The third is the BiLSTM model that consists of 3 layers: (1) an input layer, (2) a bi-directional LSTM layer with 4 units and the LeakyReLU activation function, and (3) an output layer with a SoftMax activation function. All of the DL models used the Adam parameter optimizer, the categorical cross-entropy loss function, and 64 epochs. The models’ architectures came out after a set of trial-and-error experiments, as there are no specific rules to define the architectures because of the dataset dependency. A minimal sketch of the BiLSTM model is shown below.
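As an illustration, the following is a minimal Keras sketch of the BiLSTM model described above; the window length, channel count, class count, and the use of tf.nn.leaky_relu for the LeakyReLU activation are assumptions for demonstration.

```python
# BiLSTM sketch in Keras; input/output shapes are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

n_timesteps, n_channels, n_classes = 50, 3, 6   # assumed shapes

model = models.Sequential([
    layers.Input(shape=(n_timesteps, n_channels)),
    layers.Bidirectional(layers.LSTM(4, activation=tf.nn.leaky_relu)),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train_onehot, epochs=64, batch_size=256)  # 64 epochs, as stated
```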

4.6 Performance evaluation

The confusion matrix is reported with its TP (i.e., True Positive), TN (i.e., True Negative), FP (i.e., False Positive), and FN (i.e., False Negative) values. Eqs. 1, 2, 3, and 4 show how to calculate the values of TP, FP, FN, and TN, respectively, for multi-class problems.

$$\text{TP}_i = C_{(i, i)}$$
(1)
$$\text{FP}_i = \sum_{l=1}^{n}{C_{(l, i)}} - \text{TP}_i$$
(2)
$$\text{FN}_i = \sum_{l=1}^{n}{C_{(i, l)}} - \text{TP}_i$$
(3)
$$\text{TN}_i = \sum_{l=1}^{n}{\sum_{k=1}^{n}{C_{(l, k)}}} - \text{TP}_i - \text{FP}_i - \text{FN}_i$$
(4)

where C is the confusion matrix, n is the number of classes, and i is the class number.
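For concreteness, the following is a minimal NumPy sketch that computes Eqs. 1-4 per class; the confusion matrix shown is a toy example, with rows as true classes and columns as predicted classes.

```python
# Per-class TP/FP/FN/TN from a multi-class confusion matrix (Eqs. 1-4).
import numpy as np

C = np.array([[50,  2,  1],      # toy confusion matrix: rows = true classes,
              [ 3, 45,  4],      # columns = predicted classes
              [ 0,  5, 40]])

TP = np.diag(C)                  # Eq. 1: C[i, i]
FP = C.sum(axis=0) - TP          # Eq. 2: column sums minus TP
FN = C.sum(axis=1) - TP          # Eq. 3: row sums minus TP
TN = C.sum() - TP - FP - FN      # Eq. 4: all remaining entries
print(TP, FP, FN, TN)
```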

Different performance metrics are calculated from them. They are (1) Accuracy, (2) Balanced Accuracy, (3) Precision (i.e., PPV), (4) Recall (i.e., Sensitivity, Hit Rate, and TPR), (5) Specificity (i.e., TNR), (6) F1-score (i.e., Dice coefficient and Overlap Index), (7) IoU (i.e., Jaccard Index), (8) NPV (Negative Predictive Value), and (9) ROC (i.e., Receiver Operating Characteristic). The corresponding equations are presented in Table 7.

Table 7 Summary of the performance evaluation metrics

To engage all of the calculated metrics together, the weighted sum metric (WSM) is calculated as shown in Eq. 5.

$$\text{WSM} = \frac{1}{9} \times (\text{Accuracy} + \text{Balanced Accuracy} + \text{Precision} + \text{Specificity} + \text{Recall} + \text{F1} + \text{IoU} + \text{ROC} + \text{NPV})$$
(5)

4.6.1 Multi-class averaging

During the process of evaluating the performance of multi-class dataset implementations, it is preferred to use an averaging method. It is used to average the scores to acquire a single number describing the overall performance instead of having multiple scores per class. This includes micro-, macro-, and macro-weighted averaging methods. They are discussed as follows:

  • Micro Averaging: The true positives, false positives, and false negatives are summed across all classes, and the metric is then computed from these aggregate counts. For a balanced dataset, micro averaging is preferred when an understandable metric for overall performance regardless of the class is required. The more samples a class has, the more impact the corresponding class has on the final score, thus favoring majority classes.

  • Macro Averaging is straightforward: the metric is computed independently for each class, and the results are averaged so that all classes contribute equally. Hence, the statistics of the smaller classes are reflected. It is appropriate when the performance of all classes is equally important.

  • Macro-Weighted Averaging is calculated by weighting the score of each class label by its number of true instances. It is applied in the situation of an imbalanced dataset, assigning greater contributions to the majority classes. It is worth mentioning that using this type of averaging with balanced data will yield the same result as macro averaging.

In the current study, since the datasets used in the evaluation phase are balanced, the micro-averaging method is utilized. The following sketch illustrates the three averaging methods.
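The short sketch below uses toy labels and scikit-learn's f1_score to show how the three averaging modes can differ on the same predictions.

```python
# Micro vs. macro vs. weighted averaging; labels are toy placeholders.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 1, 2, 1]

for avg in ("micro", "macro", "weighted"):
    print(avg, round(f1_score(y_true, y_pred, average=avg), 3))
```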

5 Experiments and discussion

Figure 5 summarizes the flow of the numerical data and the corresponding experiment category numbers. This facilitates tracing the experiment categories for the reader.

Fig. 5 The flow of the numerical data and the corresponding experiment category numbers

5.1 Experiments configurations, constraints, and assumptions

Table 8 summarizes the experiments configurations.

Table 8 The experiments configurations

In the current study, the constraints and assumptions applied during the sampling and feature reduction processes are: (1) the target is to choose a number of features equal to or less than the initial number of input features for each dataset, (2) the constructed features should be aware of the time-series nature of the data, and (3) the time complexity is taken into account. From that, the current study fixed the number of output features to 3 for the WISDM dataset and 100 for the UCI-HAR dataset. In the reported results tables, “None” in the “Max Depth” column means that there is no limitation on the maximum tree depth, and “None” in the “Class weight” column means that no weights are assigned to the classes.

5.2 First category experiments

The current section presents the experiments applied to both datasets after the dimensionality reduction step, as presented in Fig. 5. The “WISDM” dataset is sampled, and five feature reduction techniques are applied to reduce the number of features to 3, as described earlier. Table 9 shows the best combinations, and Table 10 shows the corresponding performance metrics on the reduced “WISDM” data when applying 50% overlapping \((i.e., 50,926 \times 3)\) using the different mentioned classifiers. It shows the best result of each classifier after the grid searching process.

Table 9 The best combinations on the reduced “WISDM” dataset \((50,926 \times 3)\) using each classifier
Table 10 The corresponding performance metrics using each classifier applied on the reduced “WISDM” dataset \((50,926 \times 3)\)

Table 10 reports the best performance metrics. It shows that the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 93.92%, 89.06%, 81.76%, 96.35%, 81.76%, 81.76%, 69.15%, 89.36%, and 96.35%, respectively, by the XGB classifier with the RP feature reduction technique. In terms of the elapsed time, the best-reported classifier is DT with ICA as the feature reduction technique, at 20,186 seconds. Table 11 and Fig. 6 summarize the WSM metrics using the “WISDM” dataset when applying 50% overlapping. The highest WSM value is 86.61%, produced by the RF classifier with the RP feature reduction method.

Table 11 Tabular summarization of the WSM metrics using the “WISDM” dataset \((50,926 \times 3)\)
Fig. 6 Graphical summarization of the WSM metrics using the “WISDM” dataset \((50,926 \times 3)\)

Table 12 shows the best combinations, and Table 13 shows the corresponding performance metrics on the reduced “WISDM” data when applying 0% overlapping \((i.e., 25,463 \times 3)\) using the different mentioned classifiers. It shows the best result of each classifier after the grid searching process.

Table 12 The best combinations applied on the reduced “WISDM” dataset \((25,463 \times 3)\) using each classifier
Table 13 The corresponding performance metrics using each classifier applied on the reduced “WISDM” dataset \((25,463 \times 3)\)

Table 13 reports the best performance metrics. It shows that the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 93.53%, 88.39%, 80.65%, 96.13%, 80.65%, 80.65%, 67.57%, 88.73%, and 96.13%, respectively, by the HGB classifier with the T-SVD feature reduction technique. In terms of the elapsed time, the best-reported classifier is DT with ICA as the feature reduction technique, at 9.4 seconds. Table 14 and Fig. 7 summarize the WSM metrics using the “WISDM” dataset when applying 0% overlapping. The highest WSM value is 85.83%, produced by the LGBM classifier with the T-SVD feature reduction technique.

Table 14 Tabular summarization of the WSM metrics using the “WISDM” dataset \((25,463 \times 3)\)
Fig. 7 Graphical summarization of the WSM metrics using the “WISDM” dataset \((25,463 \times 3)\)

The “UCI-HAR” dataset is sampled, and five feature reduction techniques are applied to reduce the number of features to 100, as described earlier. Table 15 shows the best combinations, and Table 16 shows the corresponding performance metrics on the reduced “UCI-HAR” data when applying 50% overlapping \((i.e., 232 \times 100)\) using the different mentioned classifiers. It shows the best result of each classifier after the grid searching process.

Table 15 The best combinations on the reduced “UCI-HAR” dataset \((232 \times 100)\) using each classifier
Table 16 The corresponding performance metrics using each classifier applied on the reduced “UCI-HAR” dataset \((232 \times 100)\)

Table 16 reports the best performance metrics. It shows that the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 99.43%, 98.97%, 98.28%, 99.66%, 98.28%, 98.28%, 96.61%, 98.97%, and 99.66%, respectively, by the HGB classifier with the PCA feature reduction technique. In terms of the elapsed time, the best-reported classifier is DT with LDA as the feature reduction technique, at 0.6 seconds. Table 17 and Fig. 8 summarize the WSM metrics using the “UCI-HAR” dataset when applying 50% overlapping. The highest WSM value is 98.68%, produced by the HGB classifier with the PCA feature reduction technique.

Table 17 Tabular summarization of the WSM metrics using the “UCI-HAR” dataset \((232 \times 100)\)
Fig. 8 Graphical summarization of the WSM metrics using the “UCI-HAR” dataset \((232 \times 100)\)

Table 18 shows the best combinations, and Table 19 shows the corresponding performance metrics on the reduced “UCI-HAR” data when applying 0% overlapping \((i.e., 116 \times 100)\) using the different mentioned classifiers. It shows the best result of each classifier after the grid searching process.

Table 18 The best combinations applied on the reduced “UCI-HAR” dataset \((116 \times 100)\) using each classifier
Table 19 The corresponding performance metrics using each classifier applied on the reduced “UCI-HAR” dataset \((116 \times 100)\)

Table 19 reports the best performance metrics. It shows that the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are all 100%, by the LGBM classifier with the T-SVD feature reduction technique. In terms of the elapsed time, the best-reported classifier is DT with LDA as the feature reduction technique, at 0.5 seconds. Table 20 and Fig. 9 summarize the WSM metrics using the “UCI-HAR” dataset when applying 0% overlapping. The highest WSM value is 100%, produced by the LGBM classifier with the T-SVD feature reduction technique.

Table 20 Tabular summarization of the WSM metrics using the “UCI-HAR” dataset \((116 \times 100)\)
Fig. 9 Graphical summarization of the WSM metrics using the “UCI-HAR” dataset \((116 \times 100)\)

5.2.1 First category experiments remarks

Does applying overlapping during the dataset sampling process affect the performance? For the WISDM dataset, according to Tables 11 and 14, applying 50% overlapping increases the best-reported WSM by 0.78%; however, the increase is not considerable. For the UCI-HAR dataset, according to Tables 17 and 20, applying 0% overlapping increases the best-reported WSM by 1.32%. According to Table 20, is it reasonable to obtain a WSM value of 100%? The answer can be “YES”, as this happens because of the high model complexity while the number of records is relatively low (i.e., 116).

5.3 Second category experiments

The current section presents the experiments applied to both datasets after the TDA feature extraction step, as presented in Fig. 5. Table 21 shows the best combinations, and Table 22 shows the corresponding performance metrics on the four sampled datasets after TDA using the different mentioned classifiers. It shows the best result of each classifier after the grid searching process.

Table 21 The best combinations applied on the four sampled datasets after TDA
Table 22 The corresponding performance metrics using each classifier applied on the four sampled datasets after TDA

Table 22 reports the best performance metrics. For the UCI-HAR dataset with 0% overlap, the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 95.69%, 92.24%, 87.07%, 97.41%, 87.07%, 87.07%, 77.10%, 92.39%, and 97.41%, respectively, by the LGBM classifier; in terms of the elapsed time, the best-reported classifier is DT at 0.8 seconds. For the UCI-HAR dataset with 50% overlap, the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 96.70%, 94.05%, 90.09%, 98.02%, 90.09%, 90.09%, 81.96%, 94.14%, and 98.02%, respectively, by the LGBM classifier; in terms of the elapsed time, the best-reported classifier is DT at 0.9 seconds. For the WISDM dataset with 0% overlap, the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 95.07%, 91.12%, 85.20%, 97.04%, 85.20%, 85.20%, 74.22%, 91.31%, and 97.04%, respectively, by the LGBM classifier; in terms of the elapsed time, the best-reported classifier is DT at 54.1 seconds. For the WISDM dataset with 50% overlap, the best-reported accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV are 95.34%, 91.61%, 86.01%, 97.20%, 86.01%, 86.01%, 75.45%, 91.78%, and 97.20%, respectively, by the LGBM classifier; in terms of the elapsed time, the best-reported classifier is DT at 114.8 seconds. Table 23 and Fig. 10 summarize the WSM metrics. The highest WSM values are 90.38%, 92.57%, 89.05%, and 89.62%, produced by the LGBM classifier on UCI-HAR + 0% Overlap, UCI-HAR + 50% Overlap, WISDM + 0% Overlap, and WISDM + 50% Overlap, respectively.

Table 23 Tabular summarization of the WSM metrics using TDA and the four sampled datasets
Fig. 10

Graphical summarization of the WSM metrics using TDA and the four sampled datasets

5.3.1 Second category experiments remarks

Why was the TDA feature extraction not applied to the flattened features instead of directly to the sampled ones? The TDA feature extraction technique accepts data of any dimension (i.e., n-dimensional data where \(n \ge 1\)) as input. Hence, performing two processes (flattening and then extraction) is not preferred, in order to optimize the time. On the contrary, the traditional feature reduction techniques require 2-dimensional data as input, and hence the flattening step is crucial for them. Does applying overlapping during the dataset sampling process affect the performance? According to Table 23, applying 50% overlapping with the UCI-HAR and WISDM datasets increases the best-reported WSM by 2.19% and 0.57%, respectively.
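
As an illustration of why no flattening is needed, the following hedged sketch extracts TDA features directly from the 3-dimensional windows. giotto-tda is assumed as the library here (the paper does not name one), and treating each window as a point cloud in \(\mathbb{R}^3\) is one of several possible formulations.

import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import PersistenceEntropy

# Hypothetical sampled windows of shape (n_windows, window_len, n_axes);
# each (128, 3) window is treated as a point cloud, so no flattening.
X_windows = np.random.default_rng(0).standard_normal((50, 128, 3))

# Persistence diagrams for connected components (H0) and loops (H1).
diagrams = VietorisRipsPersistence(
    homology_dimensions=[0, 1]).fit_transform(X_windows)

# Summarize each diagram into a fixed-length feature vector.
X_tda = PersistenceEntropy().fit_transform(diagrams)
print(X_tda.shape)  # (50, 2): one entropy value per homology dimension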

5.4 Third category experiments

The current section presents the experiments applied on both datasets after the sampling step, as presented in Fig. 5. The deep learning classifiers are used in this category. Table 24 shows the reported performance metrics for the four generated datasets using the 1D-CNN model. It shows that the highest accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, NPV, and WSM values for (1) “UCI-HAR + 50% Overlap” are all 100%, reported with a batch size of 16; (2) “UCI-HAR + 0% Overlap” are all 100%, reported with batch sizes of 64 and 4; (3) “WISDM + 50% Overlap” are 99.90%, 99.81%, 99.70%, 99.94%, 99.68%, 99.69%, 99.38%, 99.81%, 99.94%, and 99.76%, reported with a batch size of 512; and (4) “WISDM + 0% Overlap” are 99.71%, 99.47%, 99.17%, 99.83%, 99.10%, 99.13%, 98.28%, 99.47%, 99.82%, and 99.33%, reported with a batch size of 256.

Table 24 The performance metrics using the 1D-CNN model applied on the four generated datasets
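
For reference, a minimal Keras sketch of a 1D-CNN for windowed HAR data is given below. The layer sizes and depths are illustrative assumptions and do not reproduce the paper's exact architecture.

import tensorflow as tf

def build_1d_cnn(window_len=128, n_axes=3, n_classes=6):
    # Two convolutional blocks over the time axis, then a dense head.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window_len, n_axes)),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_1d_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# The batch size is the hyperparameter varied in Table 24, e.g.:
# model.fit(X_train, y_train, epochs=..., batch_size=16)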

Table 25 shows the reported performance metrics for the four generated datasets using the GRU model. It shows that the highest accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, NPV, and WSM values for (1) “UCI-HAR + 50% Overlap” are 99.86%, 99.74%, 99.57%, 99.91%, 99.57%, 99.57%, 99.14%, 99.74%, 99.91%, and 99.67%, reported with a batch size of 32; (2) “UCI-HAR + 0% Overlap” are all 100%, reported with a batch size of 16; (3) “WISDM + 50% Overlap” are 98.83%, 97.75%, 96.81%, 99.37%, 96.13%, 96.47%, 93.18%, 97.76%, 99.23%, and 97.28%, reported with a batch size of 256; and (4) “WISDM + 0% Overlap” are 98.11%, 96.34%, 94.88%, 98.99%, 93.69%, 94.28%, 89.18%, 96.38%, 98.74%, and 95.62%, reported with a batch size of 256.

Table 25 The performance metrics using the GRU model applied on the four generated datasets

Table 26 shows the reported performance metrics for the four generated datasets using the BiLSTM model. It shows that the highest accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, NPV, and WSM values for (1) “UCI-HAR + 50% Overlap” are 99.71%, 99.48%, 99.14%, 99.83%, 99.14%, 99.14%, 98.29%, 99.48%, 99.83%, and 99.34%, reported with a batch size of 8; (2) “UCI-HAR + 0% Overlap” are 99.71%, 99.48%, 99.14%, 99.83%, 99.14%, 99.14%, 98.29%, 99.48%, 99.83%, and 99.34%, also reported with a batch size of 8; (3) “WISDM + 50% Overlap” are 94.98%, 89.59%, 87.53%, 97.68%, 81.50%, 84.41%, 73.02%, 89.95%, 96.35%, and 88.33%, reported with a batch size of 256; and (4) “WISDM + 0% Overlap” are 94.78%, 88.98%, 87.36%, 97.68%, 80.29%, 83.68%, 71.94%, 89.41%, 96.12%, and 87.80%, reported with a batch size of 256.

Table 26 The performance metrics using the BiLSTM model applied on the four generated datasets
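
A companion sketch for the two recurrent baselines follows. As with the 1D-CNN sketch above, the layer widths are illustrative assumptions rather than the paper's exact setup.

import tensorflow as tf

def build_rnn(cell="gru", window_len=128, n_axes=3, n_classes=6):
    # A single recurrent layer reads the whole window, then a dense head
    # classifies the activity; "bilstm" wraps an LSTM in both directions.
    if cell == "gru":
        recurrent = tf.keras.layers.GRU(64)
    else:
        recurrent = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))
    return tf.keras.Sequential([
        tf.keras.Input(shape=(window_len, n_axes)),
        recurrent,
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

gru_model = build_rnn("gru")
bilstm_model = build_rnn("bilstm")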

Figure 11 shows a graphical comparison of the best reported results among the three approaches (i.e., 1D-CNN, GRU, and BiLSTM). It shows that the 1D-CNN approach outperforms the other two. Also, the overlapping mechanism shows better results than the non-overlapping one.

Fig. 11

Graphical comparison of the best reported results between the three approaches (i.e., 1D-CNN, GRU, and BiLSTM)

5.4.1 Third category experiments remarks

Does applying overlapping during the dataset sampling process affect the performance? According to Tables 24, 25, and 26, applying 50% overlapping with the WISDM dataset increases the best-reported WSM values by (1) 0.43% for the 1D-CNN model, (2) 1.66% for the GRU model, and (3) 0.53% for the BiLSTM model. However, it does not increase the best-reported WSM values for the UCI-HAR dataset.

5.5 Overall remarks

From the experiments performed in the current study, partitioned into three categories, the best approach is 1D-CNN for both the “WISDM” and “UCI-HAR” datasets. Additionally, applying 50% overlapping with the “UCI-HAR” and “WISDM” datasets increases the best-reported metrics in most cases. For the “WISDM” dataset, concerning the first category of experiments, the accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV increased by 0.39%, 0.67%, 1.11%, 0.22%, 1.11%, 1.11%, 1.58%, 0.63%, and 0.22%, respectively. Concerning the second category, they increased by 0.27%, 0.49%, 0.81%, 0.16%, 0.81%, 0.81%, 1.23%, 0.47%, and 0.16%, respectively. Concerning the third category, the accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, NPV, and WSM increased by 0.19%, 0.34%, 0.53%, 0.11%, 0.58%, 0.56%, 0.1%, 0.34%, 0.12%, and 0.43%, respectively. For the “UCI-HAR” dataset, concerning the second category of experiments, the accuracy, balanced accuracy, precision, specificity, recall, F1-score, IoU, ROC, and NPV increased by 1.01%, 1.81%, 3.02%, 0.61%, 3.02%, 3.02%, 4.86%, 1.75%, and 0.61%, respectively. However, for the first and third categories of experiments, the reported metrics were not improved by applying the overlapping.

Why were the original datasets not used directly? The original datasets, especially the WISDM dataset, are large and time-demanding (i.e., the training process is time-consuming) and suffer from the imbalanced-data issue. Additionally, the time-series behaviour would not be employed. Why were the datasets resulting from the balancing phase not used directly? As with the original datasets, the balancing-resulting datasets are large and time-demanding; moreover, the time-series behaviour would again not be employed. Is TDA better than the traditional ML feature reduction techniques? From Tables 11, 14, 17, 20, and 23, TDA feature extraction is better than the traditional feature reduction techniques. Concerning the WSM value, this conclusion was reached because (1) for the WISDM dataset with 50% overlapping, TDA outperformed the best reported traditional technique by 3.77%, and (2) for the WISDM dataset with 0% overlapping, TDA outperformed the best reported traditional technique by 6.74%.

According to Tables 10, 13, 16, 19, and 22, why do the precision, recall, and F1-score have the same values, and likewise the specificity and NPV? As mentioned before, the micro-average is used as the averaging method. Under this method, if there is a false positive, there will always also be a false negative and vice versa, because exactly one class is predicted per sample. Hence, increasing only FP or FN but not both is not possible (i.e., the resulting FP and FN counts are equal). According to the equations mentioned in Table 7, precision, recall, and F1-score will therefore always have the same values; the same holds for the specificity and NPV values. Is the DL approach better than the traditional ML approach with the traditional feature reduction techniques? From Tables 11, 14, 17, 20, 24, 25, and 26, the DL approach achieved better WSM values than the traditional ML approach with the traditional feature reduction techniques. Is the DL approach better than the traditional ML approach with the TDA features? From Tables 23, 24, 25, and 26, the DL approach achieved better WSM values than the traditional ML approach with the TDA features. It is worth mentioning that, across the three categories of experiments, the DL algorithms applied to both datasets after the sampling step reported the best results overall.
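
The micro-averaging argument can be verified numerically. The short check below (using scikit-learn, with a hypothetical label vector) shows that micro-averaged precision, recall, and F1-score coincide in single-label multi-class classification:

from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 1, 1, 0, 2]  # each mistake is an FP for one class
                                # and an FN for another

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average="micro")
print(p, r, f1)  # all three equal 5/7 = 0.714...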

5.6 Related studies comparison

The performance of the proposed algorithms was compared with that of the most recent studies. On both datasets, the suggested approach performed well in terms of classification. Table 27 shows a comparison between the results of the related studies and the presented approach on the two utilized datasets. On the WISDM dataset, the best-reported accuracy and F1-score of the proposed algorithm were 1.08% and 0.88% better than those of Teng’s layer-wise CNN approach. Concerning the UCI-HAR dataset, the accuracy of the proposed approach was 3.02% and 0.85% better than that of Teng’s layer-wise CNN and Wang’s hierarchical deep LSTM network, respectively.

Table 27 Comparison between the results of the current study and the related studies utilizing the same dataset(s)

6 Limitations

Although the current study presented the potential of using machine and deep learning models to perform the Human Activity Recognition (HAR) task, it has some limitations. The main limitation is runtime: the high-dimensional features consume a considerable amount of time in the classifier training stage. Additionally, only TDA feature extraction and five dimensionality reduction techniques were used. Also, grid search (GS) was not utilized with the suggested deep learning models. To overcome the imbalanced-data problem, only oversampling techniques were used.

7 Conclusions

Recently, HAR has earned a lot of interest and emerged as a promising research direction. It has a wide range of possible applications (e.g., intelligent assistance for elderly people and people suffering from cognitive disorders). In this research, a comprehensive analysis for recognizing human activities was conducted with the help of traditional feature reduction techniques, feature extraction, ML, and DL algorithms. The sensor-based data retrieved from two public datasets (i.e., WISDM and UCI-HAR) were used to train and evaluate different machine and deep learning models to recognize several human activities. Nine different oversampling techniques were utilized to deal with the problem of imbalanced data. Additionally, a sampling mechanism with two overlapping percentages (i.e., 50% and 0%) was applied to each balanced dataset to exploit the time-series nature of the data. For feature extraction and dimensionality reduction, five traditional techniques were applied (i.e., PCA, LDA, ICA, RP, and T-SVD), in addition to feature extraction using Topological Data Analysis (TDA). Seven machine learning algorithms were used, six of which are ensemble classifiers (i.e., LGBM, XGB, AdaBoost, HGB, RF, and ETs), plus DT. For the DL experiments, three algorithms were used (i.e., 1D-CNN, GRU, and BiLSTM).

Three categories of experiments were created. The first category was constructed using traditional feature reduction techniques and ML algorithms, while the second was conducted using TDA feature extraction and ML algorithms. For these two categories, grid search was used to perform the hyperparameter optimization process. For the third category, automatic feature extraction was performed using the three chosen DL algorithms. For the first category, the best-reported scores on the WISDM dataset are an accuracy, F1-score, recall, and precision of 93.92%, 81.76%, 81.76%, and 81.76%, respectively, achieved by the XGB classifier with RP as the feature reduction technique; on the UCI-HAR dataset, the best-reported accuracy, F1-score, recall, and precision are all 100%, achieved by the LGBM classifier with T-SVD as the feature reduction technique. For the second category, the best-reported scores on the WISDM dataset are an accuracy, F1-score, recall, and precision of 95.34%, 86.01%, 86.01%, and 86.01%, respectively, achieved by the LGBM classifier; on the UCI-HAR dataset, they are 96.70%, 90.09%, 90.09%, and 90.09%, respectively, also achieved by LGBM. For the third category, the best-reported scores on the WISDM dataset are an accuracy, F1-score, recall, and precision of 99.90%, 99.69%, 99.68%, and 99.70%, respectively, achieved by the 1D-CNN classifier; on the UCI-HAR dataset, all four scores are 100%, also achieved by the 1D-CNN classifier.

The utilized data were gathered through accelerometers worn on various body parts. The gathered information is a time series that depicts the acceleration along all three dimensions; hence, the data have two aspects (i.e., the time steps and the acceleration values along the three axes). In the current study, the used time-series data have a strong time locality that can be recovered by convolutions, hence the 1D-CNN performed the best. This is understandable given that a 1D convolution on a time series roughly computes a moving average (using terms from digital signal processing): it applies a filter to the time series, giving hints about the trend of the data. The reported results were then compared with six prior related works utilizing the same dataset(s). This comparison showed that the current study outperformed all the mentioned prior works.
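
A short numerical illustration of this moving-average view (with hypothetical values) is:

import numpy as np

x = np.array([1., 2., 8., 3., 4., 9., 5.])
kernel = np.ones(3) / 3  # uniform weights = 3-point moving average
print(np.convolve(x, kernel, mode="valid"))
# [3.667 4.333 5.    5.333 6.   ] -- a smoothed version hinting at the
# trend; a learned convolution kernel acts as a more general filter.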

7.1 Future work

In future studies, a meta-heuristic optimizer (e.g., the Aquila Optimizer or the Sparrow Search Algorithm) can be used to optimize the hyperparameters of the deep learning models. The experiments can be applied to different datasets with more features, such as the heart rate. Undersampling techniques can be tested and compared with the oversampling ones on the same datasets. To enhance the used datasets, additional augmentation methods, such as deep learning-based generative models (e.g., conditional generative adversarial networks and variational autoencoders), will be incorporated.