
1 Introduction

Falls are a severe public health issue: more than 37 million falls each year are serious enough to require medical attention [1]. The elderly are among those most at risk, and a fall can cause serious harm such as hip fractures or even death [2, 3]. The elderly population is projected to reach 1.4 billion by 2030 and 2.1 billion by 2050. Approximately 28–35% of people aged 65 and older fall each year, and this proportion rises to 32–42% for those older than 70 [4, 5]. Because falling is such a severe concern, this research was designed to enable early intervention when an elderly person falls. Systems that detect human behavior using a variety of algorithms have been developed in several experiments; the target recognized by such systems is characterized by human movement features. Human Activity Recognition (HAR) has therefore been the subject of several investigations, using radar [6,7,8,9], computer vision systems [10,11,12,13,14], and inertial sensors [11, 15, 16]. Because inertial sensors are inexpensive and suitable for the elderly to wear during daily activities, the authors employed an inertial sensor for this HAR investigation.

Several researchers have investigated different algorithms for detecting human activity with inertial sensors. In [17], deep neural network (DNN), bidirectional long short-term memory (BLSTM), CNN, and CNN-LSTM algorithms were used with a wearable device containing an IMU sensor; the maximum accuracy obtained was 90%. In [18], the hidden Markov model (HMM) method outperformed the k-nearest neighbor (KNN), Naïve Bayes (NB), LSM, and artificial neural network (ANN) methods, with an average classification accuracy of 93.2%. The research in [19] applied a CNN approach to a dataset of 27.76 h of recordings collected from 91 participants. Feature extraction plays a role in all of these studies and affects the accuracy obtained. Because feature extraction characterizes the data in terms of its most prominent aspects, it has become essential in data-related problems [20].

The goal of this research is to develop a HAR system that can detect the activities of the elderly from a dataset of per-axis acceleration and gyroscope data. The dataset was collected from a sample of 10 individuals and is divided into 4 classes: walking, falling, sitting to standing, and standing to sitting. The collected dataset, comprising 28,958 samples, then goes through a preprocessing stage. Using the XGBoost method, the top 4 extracted features are selected from the preprocessing results; this selection is intended to improve system performance during classification. The four best features are then used as input to the stochastic gradient descent, random forest, k-nearest neighbor, decision tree, and Gaussian Naive Bayes classifiers. The best of these five machine learning methods is chosen by comparing their average accuracy values, and the training results of the best method are then evaluated with a confusion matrix.

2 Method and Material

This experiment uses a dataset of inertial-sensor acceleration and gyroscope values as input. The inertial sensor is placed on the user's chest, so the measurement point is at the chest. The dataset is first preprocessed with the fast Fourier transform; the mean, median, maximum, minimum, skewness, kurtosis, and variance are then computed from the transformed data. From all of the extracted features, the best four are selected with the XGBoost algorithm. Training is then carried out on these four features using the stochastic gradient descent, random forest, k-nearest neighbor, decision tree, and Gaussian Naive Bayes methods, and the average accuracy of each machine learning method is compared. The output of the system is the identified human activity. The overall process is shown in Fig. 1.

Fig. 1 Proposed method. Per-axis acceleration and gyroscope data form the dataset, which passes through preprocessing, feature extraction and selection, and machine learning method selection to produce the recognized human activity

2.1 Datasets

The dataset was obtained from 10 subjects, each performing every movement. The samples are divided into 4 classes: walking, standing to sitting, sitting to standing, and falling, with acceleration and gyroscope values recorded every 0.01 s. Each person performed the standing-to-sitting movement ten times, the sitting-to-standing movement ten times, walking for 1 min, and falling 15 times. The 15 falls cover three variations of the falling movement: five falls forward, five falls to the right side, and five falls to the left side. Samples were labeled using a push button connected to the wearable device: the participant pressed the push button for the duration of each activity. The placement of the wearable device on the participants is shown in Fig. 2.

Fig. 2 Wearable device placement. The device on the chest provides X-, Y-, and Z-axis acceleration and X-, Y-, and Z-axis gyroscope inputs, and a push button is held in the left hand

2.2 Preprocessing

The preprocessing used in this system is the Fast Fourier Transform (FFT), which shifts a signal from the time domain to the frequency domain. The first preprocessing step is therefore to transform the recorded signals from the time domain to the frequency domain with the FFT [21], an efficient technique widely used for signal processing [22]. The general equations of the fast Fourier transform are given in Eqs. (1) and (2) [23].

$${X}_{(k)}={\sum }_{n=0}^{N-1}{x}_{n}{W}_{N}^{kn}$$
(1)
$${W}_{N}^{kn}= {e}^{-j{\omega }_{0}kn}$$
(2)

where \({X}_{(k)}\) is the kth FFT coefficient, N is the number of samples, \({x}_{n}\) is the nth sample of the time-domain signal, and \({\omega }_{0}=2\pi /N\).
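As an illustration of this preprocessing step, the following minimal sketch transforms one window of a single axis into the frequency domain with NumPy's FFT. It assumes the signals are sampled every 0.01 s (100 Hz), as in the dataset of Sect. 2.1; the function and variable names are hypothetical, not taken from the original implementation.

```python
import numpy as np

FS = 100       # sampling rate in Hz (one sample every 0.01 s, as in the dataset)
WINDOW = FS    # a 1 s window, i.e. 100 samples

def to_frequency_domain(signal_window):
    """Transform one window of a single axis to the frequency domain.

    Returns the magnitude spectrum |X(k)| of the real-valued FFT, cf. Eq. (1).
    """
    signal_window = np.asarray(signal_window, dtype=float)
    spectrum = np.fft.rfft(signal_window)   # X(k) for k = 0 .. N/2
    return np.abs(spectrum)

# Example with a synthetic acceleration trace (made-up data, for illustration only).
t = np.arange(WINDOW) / FS
acc_x = 0.5 * np.sin(2 * np.pi * 3 * t) + 0.05 * np.random.randn(WINDOW)
magnitudes = to_frequency_domain(acc_x)
print(magnitudes.shape)   # (51,) frequency bins for a 100-sample window
```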

2.3 Extraction Features

After preprocessing with the fast Fourier transform, the acceleration and gyroscope data in the frequency domain are grouped into windows of 1 s. For each 1 s window, feature extraction computes the mean, median, maximum, minimum, skewness, kurtosis, and variance.

The mean is the average value of the collected data, computed with the formula in Eq. (3).

$$\overline{x }=\frac{\sum x}{N}$$
(3)

where \(\sum x\) is the sum of the data and N is the number of data points.

The median is the middle value of the data after they have been sorted. The formula for the median, with zero-based indices and integer division, is given in Eq. (4).

$$Med\left(X\right)=\left\{\begin{array}{l}X\left[\frac{n}{2}\right] , if\,n\,odd\\ \frac{\left(X\left[\frac{n-1}{2}\right]+X\left[\frac{n+1}{2}\right]\right)}{2} , if\,n\,even\end{array}\right.$$
(4)

The maximum is the highest value in the collected data.

The minimum is the opposite of the maximum: the lowest value in the collected data.

Skewness measures the asymmetry of the data distribution. The formula for skewness is given in Eq. (5).

$$\stackrel{\sim }{{\mu }_{3}}=\frac{{\sum }_{i}^{N}{\left({x}_{i}-\overline{x }\right)}^{3}}{\left(N-1\right)*{\sigma }^{3}}$$
(5)

where N is the number of data points, \({x}_{i}\) is the ith data value, \(\overline{x }\) is the mean of the data, and \(\sigma\) is the standard deviation.

Kurtosis measures the peakedness of the data distribution. The kurtosis value is given by Eq. (6).

$$Kurt= \frac{{\mu }_{4}}{{\sigma }^{4}}$$
(6)

where \({\mu }_{4}\) is the fourth central moment and \(\sigma\) is the standard deviation.

Variance measures how far the data are spread from the mean; it is computed with Eq. (7).

$${S}^{2}=\frac{\sum {\left({x}_{i}-\overline{x }\right)}^{2}}{n-1}$$
(7)

where \({S}^{2}\) is the variance, \({x}_{i}\) is the ith data value, \(\overline{x }\) is the mean, and n is the number of data points.

Figure 3 shows the feature extraction process used in this experiment with the acceleration and gyroscope dataset. Each sensor provides 3 axes, namely the X, Y, and Z axes. Each axis is transformed into the frequency domain with the fast Fourier transform, and the mean, median, maximum, minimum, skewness, kurtosis, and variance are then computed from the frequency-domain data. Since each of the six signals yields seven statistics, 42 extracted features are used in this experiment.

Fig. 3 Feature extraction diagram. The acceleration and gyroscope X, Y, and Z axes each pass through the FFT, after which the mean, median, maximum, minimum, skewness, kurtosis, and variance are computed
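A minimal sketch of this feature-extraction step is shown below. It assumes the frequency-domain magnitudes of the six signals (three acceleration axes, three gyroscope axes) have already been grouped into 1 s windows; scipy.stats is used for skewness and kurtosis, and the remaining statistics follow Eqs. (3)–(7). All names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats

def window_statistics(window):
    """Compute the seven statistics of Sect. 2.3 for one signal window."""
    window = np.asarray(window, dtype=float)
    return {
        "mean": np.mean(window),                            # Eq. (3)
        "median": np.median(window),                        # Eq. (4)
        "max": np.max(window),
        "min": np.min(window),
        "skewness": stats.skew(window),                     # cf. Eq. (5)
        "kurtosis": stats.kurtosis(window, fisher=False),   # Eq. (6), Pearson definition
        "variance": np.var(window, ddof=1),                 # Eq. (7), sample variance
    }

def extract_features(signals):
    """signals: dict of name -> 1-D array (e.g. FFT magnitudes of acc_x, ..., gyro_z).

    Returns one flat feature vector: 6 signals x 7 statistics = 42 features.
    """
    features = {}
    for name, values in signals.items():
        for stat_name, value in window_statistics(values).items():
            features[f"{name}_{stat_name}"] = value
    return features
```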

2.4 Feature Selection

From all of the extracted features, a subset is selected. This selection reduces the number of features used for training and is performed with the XGBoost algorithm. XGBoost is a gradient-boosted decision tree extension that has been used to solve numerous classification problems in various fields [24]. One of its capabilities is ranking feature importance: the importance of a feature is obtained by counting how often that feature is used to split the data across all trees, which highlights the most dominant extracted features [25].
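As a sketch of how this selection could be done with the xgboost library (assuming a feature matrix X with 42 columns, integer-encoded activity labels y, and the "top 4" count of Sect. 1; hyperparameters are placeholders, not those of the original study), split-count importances can be ranked as follows.

```python
import numpy as np
from xgboost import XGBClassifier

def select_top_features(X, y, feature_names, top_k=4):
    """Rank features with XGBoost and keep the top_k most important ones.

    importance_type="weight" counts how often a feature is used to split the
    data across all trees, as described in Sect. 2.4.
    X: (n_samples, n_features) array; y: integer labels 0..n_classes-1.
    """
    model = XGBClassifier(n_estimators=100, importance_type="weight")
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1][:top_k]
    return [feature_names[i] for i in order], X[:, order]
```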

2.5 Classifier

A survey of classifiers used for human activity recognition [11] identifies several classifiers with good accuracy: KNN, decision tree, and random forest. This study therefore retains those classifiers and adds Gaussian NB for comparison, so that it can be examined whether the selected important features produce a good model with all classifiers.

Stochastic Gradient Descent. This is a simple statistics-based optimization technique used to efficiently find the coefficient values that minimize a loss (error) function at large scale [26]. In general, stochastic gradient methods are applied to optimization problems of the form of Eq. (8).

$$\mathop {\min }\limits_{{x \in {\mathbb{R}}^{d} }} f\left( x \right): = {\mathbb{E}}f_{\gamma } \left( x \right)$$
(8)

where {\(f_{\gamma } : \gamma \in \Gamma\)} is a family of functions from \({\mathbb{R}}^{d}\) to \({\mathbb{R}}\) and γ is a Γ-valued random variable with respect to which the expectation is taken. In supervised learning applications, γ is typically a uniform random variable taking values in {1, 2, …, n}; \(f\) is then the total empirical loss function, while \(f_{\gamma }\) is the loss function resulting from the γth training example.

The Decision Tree Classifier process consists of converting the data (a table) into a tree model, converting the tree model into rules, and simplifying the rules. When a decision tree is built with the CART algorithm, impurity (entropy) and information gain are used to determine the root node [26], and both have a significant impact on the resulting tree. According to Shannon, entropy measures the amount of information produced and the level of uncertainty in that output; for a discrete system, the entropy value is

$$H=-\sum_{i}{p}_{i}{\mathrm{log}}_{2}{p}_{i}$$
(9)

where \({p}_{i}\) is the probability of the ith event occurring.

When building a classification tree, it is important to know how much useful information about the response variable is gained from an explanatory variable. This quantity is called the information gain, and it can be used to determine how essential or influential an explanatory variable is with respect to the response variable. In terms of entropy, it can be written as [27, 28]:

$$IG\left(\left.Y\right|X\right)=H\left(Y\right)-{H}_{X}\left(Y\right)$$
(10)
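As a small worked illustration of Eqs. (9) and (10), entropy and information gain can be computed directly. This is a sketch with made-up counts, not data from this study.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of a label array, Eq. (9)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, x_split):
    """IG(Y|X) = H(Y) - H_X(Y), Eq. (10): entropy reduction from splitting y by x."""
    h_y = entropy(y)
    h_y_given_x = 0.0
    for value in np.unique(x_split):
        subset = y[x_split == value]
        h_y_given_x += len(subset) / len(y) * entropy(subset)
    return h_y - h_y_given_x

# Hypothetical example: 4 "fall" and 4 "walk" windows split by a binary feature.
y = np.array(["fall", "fall", "fall", "fall", "walk", "walk", "walk", "walk"])
x = np.array([1, 1, 1, 0, 0, 0, 0, 0])
print(information_gain(y, x))  # about 0.55 bits: the feature is informative
```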

Random Forest is an ensemble technique consisting of several decision trees as classifiers. The class assigned by the random forest is the class predicted most often by its decision trees [26]. Because many tree classifications are produced, the final prediction is obtained through "majority voting", which selects the most frequent class among the trees [27].

The K-Neighbors Classifier is a simple classifier that assigns a sample to a class based on the closest distances to neighboring samples [26]. The main idea of this approach is to measure the distance between samples. By default, KNN uses the Euclidean distance, which is computed with the following equation.

$$D\left(a,b\right)= \sqrt{{\left({a}_{1}-{b}_{1}\right)}^{2}+ {\left({a}_{2}-{b}_{2}\right)}^{2}+ \cdots + {\left({a}_{n}-{b}_{n}\right)}^{2}}$$
(11)

where a is the position of the first sample and b is the position of the second sample [29].

Gaussian Naive Bayes is a Naive Bayes classification model for continuous data in which each feature is modeled with a Gaussian (normal) probability density function (PDF). Gaussian Naive Bayes has two parameters, the mean and the variance [26]. The Bayesian classifier is given in Eq. (12):

$${h}_{nb}=\underset{c\in Y}{argmax}\,P\left(c\right){\prod }_{i=1}^{d}P\left(\left.{x}_{i}\right|c\right)$$
(12)

where c is an element of Y, the set of activity categories Y = {c1, c2, …, cN}; N is the total number of activity categories; d is the total number of features; and xi denotes the ith feature [29].

3 Results and Analysis

3.1 Preprocessing

The goal of preprocessing is to convert the incoming data into a format that can be processed appropriately. The fast Fourier transform is first applied to the time-domain acceleration and gyroscope data of each axis to obtain frequency-domain data. After the frequency-domain values are obtained, the mean, median, maximum, minimum, skewness, kurtosis, and variance features are computed. XGBoost is then used to choose among the extracted features. Based on the importance ranking, the best extracted features are the Y-axis gyroscope mean, the X-axis acceleration skewness, the X-axis gyroscope variance, and the X-axis gyroscope maximum. The extracted features that most strongly affect HAR and their resulting scores are shown in Table 1.

Table 1 Score of each extraction feature

3.2 Machine Learning

Using the features extracted in preprocessing, the machine learning approaches are compared with 3-fold cross-validation and a test size of 0.2. Table 2 displays a comparison of the average accuracy values of each machine learning method.
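A minimal sketch of this comparison with scikit-learn is shown below, using the 3-fold cross-validation reported here. The exact hyperparameters of the original study are not stated, so library defaults are used, and synthetic placeholder data stand in for the four selected features and four activity classes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Placeholder data standing in for the 4 selected features and 4 activity classes.
X_sel, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                               n_redundant=0, n_classes=4, random_state=0)

classifiers = {
    "Stochastic Gradient Descent": SGDClassifier(),
    "Random Forest": RandomForestClassifier(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_sel, y, cv=3)   # 3-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```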

Table 2 Average accuracy results

Table 2 displays the average accuracy obtained when combining the four extracted features and when using only one extracted feature. The highest average accuracy is achieved by the random forest with the combined features (99.59%), while the lowest average accuracy is achieved by the random forest using only the X-axis gyroscope maximum (61.15%). Because the random forest with the combined features achieves the best accuracy, its confusion matrix is examined next; it is shown in Fig. 4.

Fig. 4 Confusion matrix of the random forest with combined features. Rows and columns are falling, walking, standing to sitting, and sitting to standing; the entries are A11 = 3652, A13 = 4, A14 = 1, A22 = 1063, A31 = 7, A33 = 675, A34 = 2, A41 = 9, A43 = 5, A44 = 374, and all other entries are 0

Figure 4 shows a few classification errors in the confusion matrix. The test data are a 20% split of the dataset, totaling 5,791 samples. The number of misclassified samples per activity is 5 for falling, 0 for walking, 9 for standing to sitting, and 14 for sitting to standing. Table 3 displays the results of the full data evaluation.

Table 3 Data evaluation

Table 3 lists the precision, recall, F1-score, and support values for each class. The lowest recall is 0.97 and the lowest precision is 0.99, giving a lowest F1-score of 0.98 and a highest F1-score of 1.00. Because the precision, recall, and F1-score values are all high, the training results detect HAR accurately and classify every class with a low error rate.
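The per-class evaluation in Fig. 4 and Table 3 can be reproduced, as a sketch, with scikit-learn, assuming a held-out 20% split as reported above. The data below are synthetic placeholders, as in the previous sketch; variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Placeholder for the 4 selected features and 4 activity classes.
X_sel, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                               n_redundant=0, n_classes=4, random_state=0)

# 20% test split, matching the split size reported in Sect. 3.2.
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))                 # counts per true/predicted class
print(classification_report(y_test, y_pred, digits=2))  # precision, recall, F1, support
```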

4 Discussion

These results show that the choice of extracted features affects how human activities of the elderly are classified. With the XGBoost algorithm, the optimal extracted features can be selected for classification without using all of the acceleration and gyroscope information on every axis, so activity classification can be computed quickly. HAR research with IMU sensors has been conducted before, for example in [2, 15, 16] with various inertial sensor placements. What separates this study from that research is the preprocessing: the time-domain signals are converted to the frequency domain and statistical techniques are applied, so the training data are not simply the raw acceleration and gyroscope values of each axis. Using the XGBoost method, the Y-axis gyroscope mean, the X-axis acceleration skewness, the X-axis gyroscope variance, and the X-axis gyroscope maximum are found to be the best feature combination in this study.

The weakness of this research is that the training process takes a long time, because of the preprocessing required to obtain the values used by machine learning. In addition, system performance decreases as the statistical formulas become more complex. Simpler statistical methods and a limit on the number of extracted features used during training are therefore needed to speed up the system. Given that falls are frequently experienced by the elderly, the ability to recognize human movement in the elderly makes it possible to administer early treatment in the event of a fall; the consequences of falls in the elderly, which can lead to serious injury or death, make such early care necessary.

5 Conclusion

According to the results of this research, the proposed HAR system can recognize the activities of walking, falling, standing to sitting, and sitting to standing. A feature extraction and machine learning pipeline that detects HAR very accurately has been created. The Y-axis gyroscope mean, the X-axis acceleration skewness, the X-axis gyroscope variance, and the X-axis gyroscope maximum are the strongest extracted features discovered in this research. With these 4 extracted features as training data, the random forest technique achieves the highest average accuracy, 99.59%. In addition to high accuracy, the training results exhibit high recall, precision, and F1-score values: the lowest recall is 0.97, the lowest precision is 0.99, and the lowest F1-score is 0.98. As a result, using the strongest extracted features and the random forest method, HAR can be identified, particularly for the elderly wearing an inertial sensor. This HAR system can provide early attention to the elderly by tracking their daily activities, particularly fall activities, so that monitors can provide early assistance. This research can be expanded by selecting optimal extracted features to identify more activity categories with a high level of accuracy, as well as by improving the placement of the inertial sensor to enhance the comfort of the elderly.