1 Introduction

1.1 Overview

A bipedal walking robot is a kind of humanoid robot whose morphology closely resembles that of a human. It mimics human behavior and is designed to perform human-specific tasks. Currently, humanoid robots are not capable of walking as well as human beings. Human locomotion is referred to as human gait. It is a manifestation of the changes in the joint angles of the hip, knee and ankle. The human gait is considered a unique biometric pattern of a human being [1, 2]. Compared to other biometrics, it is unobtrusive, difficult to spoof, and permanent. However, it is a very complex learning process which involves interaction among different parts of the brain along with the spinal cord, motor action and muscle interaction [3]. It is highly variable and depends on several parameters, e.g., speed, clothing, walking style, mental health condition, etc. [4]. It is also affected by factors such as mental status, fatigue and cognitive status. These effects are measured using changes in the joint angle values, step length, stride length, ground clearance, etc. Gait can potentially be used to assess various health issues such as cognitive decline due to aging, freezing of gait, Parkinson's disease, etc. It is one of the most complicated physical activities, involving the coordination and synchronization of various body parts [5]. Gait is widely used for clinical assessment of patients with neurological disorders, early diagnosis of gait abnormality, etc. [6, 7]. It is also used for robotic walk generation and for enhancing cognitive capability to enable human–robot interaction (HRI) [8, 9] and haptics [10]. Due to its biphasic and bipedal nature, it is popular for the recognition of various human activities [11].

1.2 Different sub-phases and events of gait cycles

The gait cycle consists of seven sub-phases, as shown in Fig. 1. Each sub-phase corresponds to a percentage of the gait cycle, as shown in Table 1. The stance phase is also known as the double support phase (DSP), in which both feet touch the ground. The swing phase, on the other hand, is also known as the single support phase (SSP), in which one limb touches the ground while the other is off the ground to begin the next step [12]. Figure 2 shows the different events and configurations of the walking sub-phases.

Fig. 1 Different walking sub-phases

Fig. 2 Different gait sub-phase events

Table 1 Percentage-wise division of different walking sub-phases

1.3 Author’s contribution

The highlights of the authors' contributions are as follows:

  1. Database creation and analysis: Data from a total of 50 subjects were collected for seven walking activities. The data were collected using a tri-axial 9-degree-of-freedom (DoF) IMU sensor, BWT61CL MPU6050. The IMU sensors were placed on the chest, left thigh and right thigh to obtain three different modalities of the data set.

  2. Data representation: The sampling rate was 50 Hz, and each activity was recorded for 100 samples. A total of 24 attributes were collected from the sensors for each sample. The sensors used were tri-axial accelerometers, gyroscopes, and magnetometers placed at the chest, left thigh, and right thigh. The tensor size was 100*(50*24).

  3. Data pre-processing: The extended Kalman filter is used to remove noise from the sensor readings.

  4. Modelling of deep learning and hybrid deep learning classifiers: To classify the data of the different activities, four hybrid deep learning models, namely CNN–LSTM, CNN–GRU, LSTM–CNN, and LSTM–GRU, are applied. To provide a generic solution, the ensemble learning method is also applied.

  5. Performance analysis: The proposed method achieved 99.34% accuracy. The MHEALTH data set is used as a benchmark data set for verification of the model.

1.4 Novelty

The paper extensively explores data for 7 walking activities, recorded through IMU sensors and filtered with an extended Kalman filter, in a controlled laboratory environment for 50 subjects. The paper also opens up directions for how human activities can be used for HRI, building cognitive capability, diagnosis of disease at an early stage, and the study of human adaptive locomotion. The novelty of this research work is multidimensional. It ranges from data collection, through the selection of the filter, to the design of hybrid deep learning-based models. The generic activity recognition framework is designed using 4 hybrid deep learning models, namely CNN–LSTM, LSTM–CNN, CNN–GRU, and GRU–LSTM, for all 7 walking activities. To provide a generic solution, an ensemble learning-based method is employed. Finally, to present the results visually, the performance matrix, classification matrix and accuracy curves are included.

1.5 Organization of the paper

The rest of the paper is structured as follows. Section 2 reviews related work on human activity recognition using wearable sensors. It also describes several deep learning models with ensemble learning for walking activity recognition. Section 3 covers the preliminaries. It details all the prerequisites required for the study, including notation and abbreviations, data pre-processing filters and different deep learning techniques. Section 4 contains the methodology and the proposed algorithm for the activity recognition system. It describes the data collection procedure in a controlled laboratory environment, the pre-processing of raw data, the data set and the different human walking activity recognition algorithms. Section 5 discusses the results and analysis of activity classification using the deep neural network models, namely CNN, CNN–LSTM, LSTM–CNN, CNN–GRU, and GRU–CNN. Finally, Sect. 6 presents the conclusion and proposed future research work.

2 Literature review of related works

Gait is very commonly used in human activity recognition. Hsu et al. [13] used multiple wearable sensors for analyzing and classifying the gait of patients with neurological disorders such as multiple sclerosis, cerebral palsy and stroke. Another novel approach was proposed by Mekruksavanich et al. [14], where smartphones were used as wearable sensors to collect data. The study classified gait patterns for three different activities, namely walking upstairs, walking downstairs and walking on the floor, using LSTM. Jennifer et al. [15] collected data for 7 different human activities using wearable sensors and classified them using a DNN. Ioannis et al. [16] presented work on the classification of neurological gait disorders using multitask feature learning. Semwal et al. [17] proposed human gait state prediction using cellular automata and classification using ELM. Semwal et al. [18] provided optimized features based on incremental feature analysis for gait data classification. In subsequent work, Semwal et al. [19] also designed a vector field for gait sub-phases for the reconstruction of gait. Chen et al. [20] proposed deep convolutional neural networks based on multistatic micro-Doppler signatures for gait classification. Semwal et al. [21] proposed a restricted Boltzmann machine-based DNN model for gait activity classification. In 2017, Poschadel et al. used a dictionary learning method for gait classification [22, 23].

Some deep learning approaches like CNN allow classification directly on raw data without performing feature extraction [24]. The CNN model relies on three important ideas, i.e., sparse interactions, parameter sharing and equivariant representations. After convolution, pooling and fully connected layers are used for classification and regression tasks [25].

Gupta et al. [26] proposed a method for human activity recognition using an ensemble learning technique. The results showed that ensemble learning outperformed the other techniques used for activity recognition and classification. Principal component analysis was used for dimensionality reduction. The limitations were that the proposed algorithm was implemented only on standard data sets and that autoencoders could also have been used for dimensionality reduction. Wang et al. [27] performed gait classification using an ensemble CNN model, which suffered from several disadvantages; for example, it could not use heterogeneous CNN classifiers for classification. The concept of integrating different classifiers was also proposed by Sun et al. [28] in 2020, using multigait features for human gait classification. The ensemble learning method enhanced the overall recognition accuracy of the proposed framework, but it also increased the complexity of the network. Wang et al. [29] proposed a method for gait recognition which was invariant to viewing direction and combined heterogeneous techniques for learning. However, the results were obtained from experimental data sets and could not be extended to practical applications. Nandi et al. [9] proposed a hybrid automata-based model for the generation of human locomotion trajectories in 2016. This model proved to be very efficient in calculating the joint trajectories correctly, and the work could also be extended to push recovery analysis.

3 Preliminaries

This section describes the notation and abbreviations, the Kalman filter used for data pre-processing and noise removal, the benchmark data set and the various deep learning algorithms used in the proposed methodology.

3.1 Notations and abbreviations

This subsection presents the notation and abbreviations used in the proposed methodology. Table 2 describes all the notation.

Table 2 Notation and abbreviations

3.2 Standard filter for IMU data

3.2.1 Extended Kalman filter

Noise in the measured data is removed using the extended Kalman filter. Noisy readings arise during data acquisition from environmental noise, self-occlusions, loss of accuracy due to fast movements, etc. A likelihood function is used to estimate the variance of the noise introduced while capturing the raw data with the IMU sensor. The extended Kalman filter is applied to remove this noise [30]; it can smooth the accelerometer and gyroscope readings and also handles nonlinear transformations. The standard (linear) Kalman filter assumes the following state and measurement models:

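$$\begin{aligned} x_{t+1}= A x_t + \varepsilon , \quad \text {where}\quad \varepsilon \sim {\mathcal {N}}(0, Q) \end{aligned}$$
(1)
$$\begin{aligned} y_{t+1}= C x_{t+1} + \delta , \quad \text {where}\quad \delta \sim {\mathcal {N}}(0, R) \end{aligned}$$
(2)

where x represents the state vector, y represents the measurement vector, t represents the discrete time index, A is the state transition matrix, C is the measurement matrix, and Q and R are the covariances of the process noise \(\varepsilon \) and the measurement noise \(\delta \), respectively. To model nonlinear transformations, the extended Kalman filter replaces these linear maps with nonlinear functions f and g: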

$$\begin{aligned} x_{t+1}= f(x_t) + \varepsilon , \quad \text {where}\quad \varepsilon \sim {\mathcal {N}}(0, Q)\end{aligned}$$
(3)
$$\begin{aligned} y_{t+1}= g(x_{t+1}) + \delta , \quad \text {where}\quad \delta \sim {\mathcal {N}}(0, R) \end{aligned}$$
(4)

Nonlinearity in the transformations is handled using a Taylor expansion. In the presented case, the transformations are approximated using the first-order Taylor expansion evaluated at the mean estimate of x at time t, \(\mu _t\):

$$\begin{aligned} x_{t+1}\approx f(\mu _t) + \frac{\partial {f}}{\partial {x}}(\mu _t) (x_t - \mu _t) + \varepsilon \end{aligned}$$
(5)
$$\begin{aligned} y_{t+1}\approx g(\mu _{t+1}) + \frac{\partial {g}}{\partial {x}}(\mu _{t+1}) (x_{t+1} - \mu _{t+1}) + \delta \end{aligned}$$
(6)

The first-order derivative term \(\frac{\partial {f}}{\partial {x}}(\mu _t)\), also referred to as the Jacobian, is used here as the linear transformation, while the \(f(\mu _t)\) term just serves to shift the mean of the transformation.

The extended Kalman filter also includes limiting conditions based on the joint displacements per frame.

Using the nonlinear transformation above, we obtain the following update equations for the mean and variance estimates of the motion model:

$$\begin{aligned} \mu ^-_{t+1}= f\left( \mu ^+_t\right) \end{aligned}$$
(7)
$$\begin{aligned} \varSigma ^-_{t+1}= J_f \varSigma ^+_t J_f^T + Q \end{aligned}$$
(8)

and for the sensor model:

$$\begin{aligned} \mu ^+_{t+1}= \mu ^-_{t+1} + \varSigma ^-_{t+1} J_g^T \left( J_g \varSigma ^-_{t+1} J_g^T + R\right) ^{-1} \left[ y_{t+1} - g(\mu ^-_{t+1})\right] \end{aligned}$$
(9)
$$\begin{aligned} \varSigma ^+_{t+1}= \varSigma ^-_{t+1} - \varSigma ^-_{t+1} J_g^T \left( J_g \varSigma ^-_{t+1} J_g^T + R\right) ^{-1} J_g \varSigma ^-_{t+1} \end{aligned}$$
(10)

where \(J_f = \frac{\partial {f}}{\partial {x}}(\mu ^+_t)\) and \(J_g = \frac{\partial {g}}{\partial {x}}(\mu ^-_{t+1})\).
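The prediction and correction steps of Eqs. (7)–(10) can be sketched in a few lines. The following is a minimal scalar illustration only, not the filter used in this work: the motion model `f_motion`, the sensor model `g_sensor`, their derivatives, and the noise variances `q` and `r` are hypothetical choices made for the example.

```python
import math


def ekf_step(mu, sigma, y, f, g, jac_f, jac_g, q, r):
    """One scalar extended Kalman filter iteration following Eqs. (7)-(10)."""
    # Prediction with the motion model, Eqs. (7) and (8).
    j_f = jac_f(mu)
    mu_pred = f(mu)
    sigma_pred = j_f * sigma * j_f + q

    # Correction with the sensor model, Eqs. (9) and (10).
    j_g = jac_g(mu_pred)
    gain = sigma_pred * j_g / (j_g * sigma_pred * j_g + r)
    mu_new = mu_pred + gain * (y - g(mu_pred))
    sigma_new = sigma_pred - gain * j_g * sigma_pred
    return mu_new, sigma_new


# Hypothetical nonlinear models, chosen only to exercise the equations.
def f_motion(x):
    return x + 0.1 * math.sin(x)


def jac_f_motion(x):
    return 1.0 + 0.1 * math.cos(x)


def g_sensor(x):
    return x * x


def jac_g_sensor(x):
    return 2.0 * x


mu_prior, sigma_prior = 1.0, 1.0
measurement = g_sensor(1.2)  # noise-free reading of an assumed true state 1.2
mu_post, sigma_post = ekf_step(
    mu_prior, sigma_prior, measurement,
    f_motion, g_sensor, jac_f_motion, jac_g_sensor,
    q=0.01, r=0.1,
)
print(mu_post, sigma_post)
```

For vector-valued states the same structure holds, with the scalar products replaced by matrix products and the division replaced by a matrix inverse.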

3.3 Description of deep learning algorithms

3.4 Convolutional neural network (CNN)

The design of the CNN/ConvNet is inspired by the visual cortex of animals. ConvNets learn their filters directly from the training examples rather than relying on hand-crafted features. A ConvNet is a mathematical structure whose building blocks are convolution, max pooling/average pooling and dense layers. The first two components, convolution and pooling, are responsible for feature extraction, and the dense layers map these features to the final output. Improving the performance of the model usually means increasing the depth of the network, which brings additional computational cost, so the CNN model becomes expensive with deeper layers. Another problem associated with deep architectures is overfitting. In this work, the problems of overfitting and high computational cost due to deeper networks are addressed by the hybrid ensemble learning model. Equation 11 represents the convolution operation.

$$\begin{aligned} (f*g)(z_1, z_2) = \sum _{\begin{array}{c}(x_1+y_1=z_1)\\ (x_2+y_2=z_2)\end{array}} f(x_1,x_2) \cdot g(y_1,y_2) \end{aligned} $$
(11)

where f is data and g is the kernel.
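As an illustration of Eq. (11), the sketch below computes the full two-dimensional discrete convolution of a small array with a kernel using nested lists. The function name `conv2d_full` and the example values are assumptions made for this example; deep learning libraries typically compute a cross-correlation over the valid region only, so this is a literal reading of the formula rather than a description of the layers used in this work.

```python
def conv2d_full(f, g):
    """Full 2-D discrete convolution of data f with kernel g, as in Eq. (11)."""
    rows = len(f) + len(g) - 1
    cols = len(f[0]) + len(g[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for x1, f_row in enumerate(f):
        for x2, f_val in enumerate(f_row):
            for y1, g_row in enumerate(g):
                for y2, g_val in enumerate(g_row):
                    # Each product contributes to the output cell
                    # (z1, z2) = (x1 + y1, x2 + y2).
                    out[x1 + y1][x2 + y2] += f_val * g_val
    return out


data = [[1, 2], [3, 4]]
kernel = [[1, 0], [0, -1]]
result = conv2d_full(data, kernel)
print(result)
```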

3.4.1 Dropout

In dense layers, the problem of co-adaptation occurs. Co-adaptation means that more than one neuron in a layer extracts the same or very similar information from the input. This mainly occurs when the connection weights attached to several neurons are the same. It wastes resources on obtaining the same information and leads to overfitting. The dropout technique is used to solve the problem of co-adaptation in neural networks. In this technique, randomly selected neurons in each layer are dropped by setting their values to zero during the training process. The rate at which these neurons are turned off or dropped is known as the dropout rate. Dropout also improves the learning rate of a model. Dropout is generally applied after the fully connected layers of the neural network.
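A minimal sketch of dropout on a list of activations is given below. It uses the common "inverted dropout" variant, which rescales the surviving activations by 1/(1 - rate) so that their expected value is unchanged; this rescaling is an assumption of the example and is not stated in the text above. The function name `dropout` and the dropout rate of 0.5 are likewise illustrative.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate` during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(activations)
    keep = 1.0 - rate
    # Surviving activations are scaled by 1/keep (inverted dropout).
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so that the example is repeatable
layer_output = [1.0] * 1000
dropped = dropout(layer_output, rate=0.5, rng=rng)
print(dropped.count(0.0), "of", len(dropped), "activations dropped")
```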

3.4.2 Convolutional neural network–long short-term memory (CNN–LSTM)

This is a hybrid model that combines the strengths of convolutional networks with those of a subsequent long short-term memory (LSTM) network. The CNN is mainly responsible for feature extraction, and the LSTM is responsible for learning the temporal dependencies. CNN–LSTM can be applied to a variety of tasks involving sequential or time series inputs; the model is deep both spatially and temporally, which makes it suitable for this kind of classification task. It was originally known as the long-term recurrent convolutional network (LRCN). The operation of the LSTM is governed by the following equations:

$$\begin{aligned} {\hbox{function}}_t&= \sigma ({\hbox{weight}}_f \cdot [{\hbox{hidden}}_{t-1}, {\hbox{input}}_t] +{\hbox{bias}}_f)\\ {\hbox{intermediate}}_t&= \sigma ({\hbox{weight}}_i \cdot [{\hbox{hidden}}_{t-1} , {\hbox{input}}_t]+{\hbox{bias}}_i)\\ C_{\mathrm{temp}}&= \tanh ({\hbox{weight}}_c \cdot [{\hbox{hidden}}_{t-1} , {\hbox{input}}_t] + {\hbox{bias}}_c)\\ C_t&= {\hbox{function}}_t *C_{t-1} + {\hbox{intermediate}}_t *C_{\mathrm{temp}}\\ {\hbox{output}}_t&= \sigma ({\hbox{weight}}_{\mathrm{output}} \cdot [{\hbox{hidden}}_{t-1} , {\hbox{input}}_t] + {\hbox{bias}}_{\mathrm{output}})\\ {\hbox{hidden}}_t&= {\hbox{output}}_t *\tanh (C_t) \end{aligned}$$
(12)

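Here, \({\hbox{function}}_t\), \({\hbox{intermediate}}_t\) and \({\hbox{output}}_t\) are the forget, input and output gates, respectively, \(C_t\) is the cell state, \(\sigma \) is the logistic sigmoid and \(*\) denotes element-wise multiplication. The input data are divided into frames, so the input tensor becomes three-dimensional.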
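A single LSTM time step following Eq. (12) can be sketched with plain Python lists. The helper `make_params`, the constant weight value, and the layer sizes are hypothetical and serve only to make the example runnable; in practice the weights are learned during training.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_params(hidden_size, input_size, value):
    """Hypothetical helper: constant weights and zero biases for every gate."""
    width = hidden_size + input_size
    params = {}
    for gate in ("f", "i", "c", "output"):
        params["weight_" + gate] = [[value] * width for _ in range(hidden_size)]
        params["bias_" + gate] = [0.0] * hidden_size
    return params


def gate(params, name, concat, activation):
    """activation(weight . [hidden, input] + bias) for one gate."""
    weight, bias = params["weight_" + name], params["bias_" + name]
    return [activation(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(weight, bias)]


def lstm_step(params, hidden_prev, cell_prev, x):
    """One LSTM time step following Eq. (12)."""
    concat = list(hidden_prev) + list(x)
    forget = gate(params, "f", concat, sigmoid)
    intermediate = gate(params, "i", concat, sigmoid)
    c_temp = gate(params, "c", concat, math.tanh)
    cell = [f * c + i * g
            for f, c, i, g in zip(forget, cell_prev, intermediate, c_temp)]
    output = gate(params, "output", concat, sigmoid)
    hidden = [o * math.tanh(c) for o, c in zip(output, cell)]
    return hidden, cell


params = make_params(hidden_size=2, input_size=3, value=0.1)
hidden_t, cell_t = lstm_step(params, [0.0, 0.0], [0.0, 0.0], [0.5, -0.2, 0.1])
print(hidden_t, cell_t)
```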

3.4.3 Convolutional neural network–gated recurrent unit (CNN–GRU)

Gated recurrent units (GRU) follow a very similar approach to long short-term memory units. A GRU has an update gate and a reset gate, which control the flow of the information vectors. Together, these gates decide which part of the tensor needs to be remembered in the next step and which part may be updated. The gated recurrent unit performs the following operations:

$$\begin{aligned} z_t&= \sigma ({\hbox{weight}}_z \cdot [{\hbox{hidden}}_{t-1} , {\hbox{input}}_t] )\\ r_t&= \sigma ({\hbox{weight}}_r \cdot [{\hbox{hidden}}_{t-1} , {\hbox{input}}_t] )\\ h_{\mathrm{temp}}&= \tanh ({\hbox{weight}}\cdot [r_t *{\hbox{hidden}}_{t-1} , {\hbox{input}}_t] )\\ {\hbox{hidden}}_t&= (1 - z_t) *{\hbox{hidden}}_{t-1} + z_t *h_{\mathrm{temp}} \end{aligned}$$
(13)

Here, \(z_t\) is the update gate and \(r_t\) is the reset gate. As for CNN–LSTM, the input data are divided into frames, so the input tensor becomes three-dimensional.
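One GRU time step following Eq. (13) can be sketched in the same way. The function names, the dictionary keys for the three weight matrices, and the example values are assumptions made for illustration; as in Eq. (13), no bias terms are used.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(weight, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def gru_step(weights, hidden_prev, x):
    """One GRU time step following Eq. (13)."""
    concat = list(hidden_prev) + list(x)
    z = [sigmoid(v) for v in matvec(weights["z"], concat)]  # update gate
    r = [sigmoid(v) for v in matvec(weights["r"], concat)]  # reset gate
    # The candidate state sees the previous hidden state through the reset gate.
    reset_concat = [ri * hi for ri, hi in zip(r, hidden_prev)] + list(x)
    h_temp = [math.tanh(v) for v in matvec(weights["h"], reset_concat)]
    return [(1.0 - zi) * hi + zi * ci
            for zi, hi, ci in zip(z, hidden_prev, h_temp)]


# Hypothetical weights for a unit with 2 hidden states and 3 inputs.
width = 2 + 3
weights = {name: [[0.1] * width for _ in range(2)] for name in ("z", "r", "h")}
hidden_t = gru_step(weights, [0.2, -0.4], [0.5, -0.2, 0.1])
print(hidden_t)
```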

3.5 Ensemble learning

The ensemble learning technique is used to avoid overfitting by reducing the complexity of the model. It considers the average performance of different classifiers and different hyperparameters, so if one classifier performs poorly and another performs well, ensemble learning combines their results. Averaging over all classifiers provides the much-needed generality of the model by reducing variance and model complexity.
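One simple way to average classifiers is to average their predicted class probabilities and pick the most probable class (often called soft voting). The sketch below assumes this scheme; the text above does not specify exactly how the averaging is performed, and the three hypothetical models and their probabilities are invented for illustration.

```python
def ensemble_average(prob_sets):
    """Average per-class probabilities over models.

    prob_sets[m][s][c] is the probability that model m assigns to class c
    for sample s.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [[sum(model[s][c] for model in prob_sets) / n_models
             for c in range(n_classes)]
            for s in range(n_samples)]


def predict_labels(averaged):
    """Index of the most probable class for each sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in averaged]


# Three hypothetical models, two samples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
model_c = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]

averaged = ensemble_average([model_a, model_b, model_c])
labels = predict_labels(averaged)
print(labels)
```

In the first sample two of the three models individually prefer class 0, yet the averaged probabilities favor class 1, which shows how a confident model can outweigh less confident ones under this scheme.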

4 Proposed methodology

This section consists of the data set description and pre-processing, the flowchart of the proposed architecture, and the algorithm.

4.1 Data acquisition system

4.1.1 Data set description

The proposed work uses a multiple-task gait analysis approach based on tri-axial IMU sensor data. The IMU sensors are placed at the chest, left thigh and right thigh. The data were collected from 50 subjects for 7 different activities in a controlled laboratory environment. The data set contains data from the following tasks: (i) natural walk, (ii) standing, (iii) climbing stairs, (iv) cycling, (v) jogging, (vi) running, (vii) knees bending (crouching). Figure 3 shows the data collection from different walking subjects using the IMU sensors.

Fig. 3 Data collection from different subjects using IMU sensors: a sensors placed on the left and right thigh joints; b sensor placed on the chest

4.1.2 Data set validation

To validate our collected data set, the previously available MHEALTH data set is used as a benchmark for comparing results with our collected data [31]. The MHEALTH (mobile health) data set is a physical activity monitoring data set that captures different physical activities based on multimodal body sensing. Ten subjects performed 12 different physical activities such as standing, walking, running, cycling, jogging, climbing stairs, crouching and so on. The data were captured using 3 inertial measurement unit (IMU) sensors and a heart rate monitor. The three IMU sensors were placed on each subject's chest, right wrist and left ankle. The data set provides 23 attributes. ECG measurements were also used for heart monitoring. This data set is used in human activity recognition for identifying and classifying different physical activities [32].

4.2 Pre-processing steps: an extended Kalman filter

Due to their inherent physical limitations and high sensitivity, the sensors may record noisy data. Hence, pre-processing of the data is important before it is provided to the deep learning-based classifier. The extended Kalman filter is used for smoothing the data. A window of 0.8 s was used with an overlap of 50%.
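At the 50 Hz sampling rate stated earlier, a 0.8 s window corresponds to 40 samples, and a 50% overlap corresponds to a step of 20 samples. A minimal sketch of this segmentation is shown below; the function name `sliding_windows` and the synthetic signal are assumptions made for the example.

```python
def sliding_windows(samples, sampling_rate_hz=50, window_s=0.8, overlap=0.5):
    """Split a sequence into fixed-length, overlapping windows."""
    size = int(round(window_s * sampling_rate_hz))       # 40 samples
    step = max(1, int(round(size * (1.0 - overlap))))    # 20 samples
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, step)]


signal = list(range(100))  # synthetic stand-in for 2 s of one sensor channel
windows = sliding_windows(signal)
print(len(windows), [w[0] for w in windows])
```

Trailing samples that do not fill a complete window are discarded in this sketch; padding them instead would be an equally valid design choice.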

4.3 Proposed architecture

Figure 4 shows the working flowchart of the proposed activities recognition system.

Fig. 4 Working flowchart of the proposed activities recognition system

4.4 Model design and validation

The raw data are randomly shuffled and split into training, validation and test sets. When designing a classification model, it is best to test its accuracy on data different from those used for training. For this purpose, the whole data set is divided into three parts: 60% training, 20% validation and 20% test. Further, the validation data are also shuffled with the training data using cross-validation.
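The 60/20/20 split described above can be sketched as follows. The function name `shuffle_split`, the fixed seed, and the use of 700 placeholder items are assumptions made for the example.

```python
import random


def shuffle_split(items, train=0.6, val=0.2, seed=0):
    """Shuffle items and split them into training, validation and test sets."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train * len(items)))
    n_val = int(round(val * len(items)))
    train_set = [items[i] for i in indices[:n_train]]
    val_set = [items[i] for i in indices[n_train:n_train + n_val]]
    # Whatever remains after the first two parts forms the test set.
    test_set = [items[i] for i in indices[n_train + n_val:]]
    return train_set, val_set, test_set


samples = list(range(700))  # placeholder identifiers for 700 recordings
train_set, val_set, test_set = shuffle_split(samples)
print(len(train_set), len(val_set), len(test_set))
```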

4.4.1 Data classification and parameter tuning

The data are classified using the proposed hybrid deep learning-based classifiers. A total of seven classifiers, namely CNN, LSTM, GRU, CNN–LSTM, LSTM–CNN, CNN–GRU and GRU–CNN, are designed for classification, along with an ensemble learning model. Figure 5 shows the proposed hybrid deep learning architecture. Tables 3, 4, 5 and 6 list the hyperparameters used by the different classifiers.

Fig. 5 Proposed hybrid deep learning architecture

Table 3 Set of parameters for CNN
Table 4 Set of parameters for CNN–LSTM
Table 5 Set of parameters for CNN–GRU
Table 6 Set of parameters for GRU–LSTM

4.4.2 Fivefold cross-validation

Fivefold cross-validation is implemented. Figure 6 shows the fivefold cross-validation architecture.

Fig. 6 Fivefold cross-validation architecture
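A minimal sketch of how fivefold cross-validation partitions sample indices is given below. Each fold serves once as the validation set while the remaining four folds form the training set. The generator name `k_fold_indices` is an assumption, and the indices are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Distribute any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(10, k=5))
for training, validation in splits:
    print(validation, "held out;", len(training), "used for training")
```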

4.5 Performance evaluation parameter

The performance of a classifier is measured in terms of accuracy, precision, recall and F1 score. The following subsections define these parameters.

4.5.1 Accuracy

Accuracy is a performance measure defined as the ratio of correctly predicted observations to the total number of observations:

$$\begin{aligned} {\hbox{Accuracy}} = \frac{c}{n} \end{aligned}$$
(14)

where c is the number of correctly classified activities and n is the total number of activities.

4.5.2 Precision

Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations. With TP and FP denoting the numbers of true positives and false positives, it is defined as follows:

$$\begin{aligned} {\hbox{Precision}}=\frac{{\hbox{TP}}}{{\hbox{TP}}+{\hbox{FP}}} \end{aligned}$$
(15)

4.5.3 Recall

Recall is the ratio of true positive observations to the total number of actual positive observations. With FN denoting the number of false negatives, it is defined as follows:

$$\begin{aligned} {\hbox{recall}}=\frac{{\hbox{TP}}}{{\hbox{TP}}+{\hbox{FN}}} \end{aligned}$$
(16)

4.5.4 F1 score

The F1 score is the harmonic mean of precision and recall. The general f-measure is calculated as:

$$\begin{aligned} {\hbox{f-measure}}=\frac{(1+\beta ^2)*{\hbox{precision}}*{\hbox{recall}}}{(\beta ^2*{\hbox{precision}})+{\hbox{recall}}} \end{aligned}$$
(17)

where, the value assigned to \(\beta \) is 1.
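The four measures can be computed directly from lists of true and predicted labels, as in the sketch below. The function names and the small label lists are illustrative assumptions; for a multi-class problem the precision, recall and f-measure are computed per class by treating that class as the positive one.

```python
def accuracy(y_true, y_pred):
    """Eq. (14): correctly classified samples over all samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_fbeta(y_true, y_pred, positive, beta=1.0):
    """Eqs. (15)-(17) for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0
    return precision, recall, fbeta


# Invented labels for three activity classes (0, 1, 2).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_fbeta(y_true, y_pred, positive=1)
print(acc, precision, recall, f1)
```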

4.5.5 Proposed algorithm

Algorithm 1 presents the details of the proposed work.

figure a

5 Experiment results and discussions

The results were obtained on a computer system with an i8 8700U processor and 12.0 GB RAM. In the first step, the collected data of 50 subjects for 7 walking activities are validated.

5.1 Input tensor size

In the next step, the input tensor is prepared for validation of the deep learning models. A total of 9 accelerometer attributes, recorded at the chest, left thigh and right thigh, are computed for all 7 activities. The sampling frequency was 50 Hz, so the input vector size was (50*9) and the total tensor size for the 7 activities was (700*450).
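The tensor sizes quoted above follow from simple arithmetic, sketched below with nested lists filled with zeros. The interpretation that each row holds one second of readings (50 time steps of 9 attributes each), the time-major ordering within a row, and the helper name `to_frames` are assumptions made for this example.

```python
SAMPLING_RATE_HZ = 50       # readings per second
N_ATTRIBUTES = 9            # accelerometer attributes per reading
SAMPLES_PER_ACTIVITY = 100  # recorded samples per activity
N_ACTIVITIES = 7

vector_len = SAMPLING_RATE_HZ * N_ATTRIBUTES   # 50 * 9 = 450
n_rows = SAMPLES_PER_ACTIVITY * N_ACTIVITIES   # 100 * 7 = 700

# Placeholder two-dimensional tensor of zeros with shape (700, 450).
flat = [[0.0] * vector_len for _ in range(n_rows)]


def to_frames(row, n_steps, n_attrs):
    """Reshape one flat row into n_steps frames of n_attrs values each."""
    return [row[t * n_attrs:(t + 1) * n_attrs] for t in range(n_steps)]


# Three-dimensional view (samples, time steps, attributes) for recurrent models.
framed = [to_frames(row, SAMPLING_RATE_HZ, N_ATTRIBUTES) for row in flat]

shape_2d = (len(flat), len(flat[0]))
shape_3d = (len(framed), len(framed[0]), len(framed[0][0]))
print(shape_2d, shape_3d)
```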

5.2 Model fitting and classification

The classification of the different human walking activities was performed using seven deep learning classifiers, namely CNN, LSTM, GRU, CNN–LSTM, LSTM–CNN, CNN–GRU and GRU–CNN, which achieved 97.26%, 90.67%, 77.38%, 97.83%, 94.35%, 97.64% and 96.98% accuracy, respectively, on our data set. Tables 3, 4, 5 and 6 show the hyperparameters used by the four hybrid deep learning models mentioned above. Subsequently, a 3-stage ensemble learning-based classifier was designed, which reported a testing accuracy of 99.34%, the highest among all classifiers.

5.3 Performance evaluation metrics

The accuracy of all the classification techniques is computed to compare the performance of each of the implemented algorithms. The results also include precision, recall, F1 score, and support for all the labeled outputs. Confusion matrices are also displayed for all the classification techniques mentioned above.

5.3.1 Performance measurement in terms of classifier accuracy

The data set consists of seven different walking activities, with data from a total of 50 subjects across all 7 classes. Figure 7 shows the classification accuracy report of the different classifiers for individual activities. Table 7 presents the classification accuracy of the different classifiers on our data set and on the MHEALTH data set. The overall accuracy of the ensemble learning classifier was found to be 99.34%. To improve the overall classification accuracy across the different gait activities, the outputs of all the different classifiers were combined for different hyperparameters. The average of all five classifiers gave an overall accuracy of 99.34% in 4 different trials. Figure 13 shows the accuracy of the different classifiers on the individual walking activities.

Fig. 7 Accuracy classification report

Table 7 Comparison with state of the art models

We have provided the confusion matrices of all the classifiers for human walking activity classification. We have computed the precision, accuracy, recall, F1 score and support values for each classifier and each activity, and compared them. Table 8 shows the detailed performance over the various parameters. The model accuracy and loss plots, together with the confusion matrices, are plotted for all four hybrid deep learning models, i.e., CNN–LSTM, LSTM–CNN, CNN–GRU and GRU–CNN, and for ensemble learning. In the case of CNN–LSTM, we observe a steep drop in the training and validation loss within the first 3–4 epochs. A gradual increase in the training and validation accuracy is seen in the first 5 epochs, after which the curve flattens. After that, some oscillations are seen in the training and validation loss and a very slight gain in accuracy is observed. Figure 8 shows the confusion matrix and loss plot for the CNN–LSTM model.

Table 8 Matrices showing individual results
Fig. 8 CNN–LSTM model

In the output of LSTM–CNN, a rapid decrease in the training and validation loss is seen over the first 15 epochs, while the accuracy rises to 50%. Then, up to the first 60 epochs, we see an oscillatory decrease in the loss, while the accuracy remains roughly constant. Figure 9 shows the confusion matrix and the accuracy and loss curves of the LSTM–CNN model.

Fig. 9 LSTM–CNN model

The output of CNN–GRU follows almost the same pattern as that of GRU–CNN, while this model gives us 100% accuracy. The confusion matrix is shown in Fig. 10 and the characteristic plot in Fig. 11. The ensemble learning model is best suited for the activity recognition problem. Figure 12 shows the best output for the ensemble learning model. When evaluating classification models, it is necessary to test their accuracy on data different from those initially used to train the model. This confirms that our model is not “overfit” to one small set of data. The proposed data set is validated against the MHEALTH data set. Table 7 compares the accuracy of the different models on our data set and on the MHEALTH data. It shows that these methods perform better on our data set. The results also show the superiority of the ensemble learning-based classifier (Fig. 13).

Fig. 10 CNN–GRU model

Fig. 11 GRU–CNN model

Fig. 12 Ensemble learning model

Fig. 13 Individual activity accuracy classification report

6 Conclusion

This paper has considered data from 50 subjects for seven different walking activities. The data were collected through IMU sensors placed at the chest, left thigh, and right thigh. A total of 100 samples were collected for each of the 7 activities. The sampling rate was 50 Hz. The following 7 walking activities were performed by all 50 subjects: (i) natural walk, (ii) standing, (iii) climbing stairs, (iv) cycling, (v) jogging, (vi) running, (vii) knees bending (crouching). The data were classified using four hybrid deep learning models, namely CNN–LSTM, CNN–GRU, LSTM–GRU and LSTM–CNN, and an ensemble of all models. The CNN, LSTM, GRU, CNN–LSTM, LSTM–CNN, CNN–GRU and GRU–CNN models achieved 97.26%, 90.67%, 77.38%, 97.83%, 94.35%, 97.64% and 96.98% accuracy, respectively, on our data set. The paper has also implemented ensemble learning by combining the averaged results of the different classifiers and applying dropout. The proposed ensemble learning-based hybrid deep learning framework achieved a promising classification accuracy of 99.34%, higher than that of the other models. It provides a generic solution and reduces the dependence on hyperparameters and data. The ensemble technique has reduced the variance and model complexity.

6.1 Future scope

The work can be further extended to stable robot walk generation, automatic person tracking, clinical health monitoring, gait-based surveillance, restoration of elderly and crouch walking, pedestrian navigation, etc. As future scope, an economical, reliable and sustainable solution can be developed by imparting human behavior and learning through hybrid deep learning algorithms for automatic gait activity recognition. Gait-related walking activities are very important for the analysis of postural instability, the correction of gait abnormality, the diagnosis of cognitive decline, and the enhancement of the cognitive ability of human-centered humanoid robot systems. Although the proposed system is a prototype, the preliminary results are encouraging and point toward the development of an integrated system for automatic tracking of gait activities. Such a system may consist of an IMU sensor with a processing unit and a display device; on the software side, a pre-recorded reference walking activity would be provided so that the subject can follow the rehabilitation activities correctly.