1 Introduction

Gait analysis is crucial in rehabilitation because it provides reliable information about a patient’s walking pattern and lower limb motion. This information can be used to identify deviations from normal gait and to assess the effectiveness of rehabilitation interventions. By examining gait, clinicians can determine the degree of impairment, monitor progress during rehabilitation, and adjust the treatment plan accordingly, optimising it to achieve the best possible outcomes. Injuries such as knee osteoarthritis, sciatica, meniscal tears and anterior cruciate ligament (ACL) tears are among the leading sources of impairment worldwide, affecting people of all ages [1, 2]. Assistive technology has been shown to improve the quality of life of individuals with these injuries, particularly by tracking their rehabilitation progress [3, 4]. Gait analysis is a standard diagnostic tool for neuromuscular and skeletal disorders that helps classify and assess lower extremity motion [5], and lower extremity motions are integral to many human activities, including sitting, standing, stair ascent and descent, and squatting [3, 10]. However, conventional clinical gait analysis requires comprehensive laboratory settings, which can be time-consuming, costly, and inconvenient, especially in remote areas [6], making simpler methods for assessing gait dysfunction necessary; various noninvasive and kinematic techniques have been proposed for this purpose [11]. Remote monitoring of rehabilitation progress using wearable devices has therefore become critical to overcoming these limitations. Such wearables can not only control assistive devices like exoskeletons but also provide progress feedback to users and assist clinicians in assessing and treating patients [7,8,9].

In recent years, electromyography (EMG) has emerged as a widely used approach for recording the activity of skeletal muscles, which is valuable for investigating neuropathic and myopathic conditions, controlling prosthetic devices, and aiding rehabilitation. EMG signals can be acquired using surface or concentric needle electrodes: surface EMG (sEMG) is used mainly in rehabilitation and prosthetic applications, whereas concentric needle electrodes are used to diagnose neuromuscular disorders affecting motor units (MUs). In [12], Gautam et al. used a transfer learning-based LRCN model on the publicly available UCI dataset to predict joint angles and classify lower extremity activities, achieving mean classification accuracies of 92.4% and 98.1% for participants with knee disorders and fit participants, respectively. A general deep-learning approach is shown in Fig. 1. In [13], the authors generated a dataset of 12 different human activities using a Microsoft Kinect V2 sensor and classified them with a hybrid deep learning model, achieving an average accuracy of 90.89%.

Imbalanced class distribution is a significant problem in medical datasets, where the number of samples in different classes varies greatly. This imbalance can bias results towards the majority class, negatively impacting diagnostic accuracy [14]. Therefore, balancing the data, by either oversampling the minority class or undersampling the majority class, is essential to improve diagnostic performance. In a previous study [15], Rajesh et al. used an AdaBoost ensemble classifier to classify five groups of heartbeats from imbalanced ECG beats. Other studies have shown that oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and the Adaptive Synthetic Sampling technique (ADASYN), can address the class imbalance problem [16,17,18,19]. Taft et al., for example, used SMOTE to improve a classification model’s performance in identifying adverse medication events in women hospitalised for childbirth and labour [17].

Fig. 1 Flowchart of general deep learning model

The major contributions of this work are as follows:

  1. Data Standardisation: The data were scaled using a standard scaler, which rescales each feature to zero mean and unit variance. This ensures that all features share the same scale and distribution, improving the model’s efficiency and convergence.

  2. Managing the imbalanced dataset: The dataset is highly imbalanced; ADASYN was used to balance it.

  3. Data representation: The four-channel sEMG input signals were segmented into 256 ms windows with a 64 ms overlap.

  4. Model Design: The proposed hybrid ensemble deep neural network architecture for lower extremity activity recognition uses an ensemble of CNN and LSTM models trained on four channels of sEMG data. By combining CNN and LSTM, the model effectively captures both the spatial and the temporal aspects of human activity data.

  5. Performance analysis: The model has been validated rigorously on the sEMG data and achieved average accuracies of 98.8%, 98.3%, and 99.3% for fit individuals for sitting, standing and walking, respectively, and 98.2%, 98.1%, and 99.0% for individuals with knee pathology, a marked improvement over previously published state-of-the-art work.

  6. Statistical Analysis: The proposed ensemble model has been shown to be statistically significantly different from all competing algorithms using the Friedman test, the Bonferroni-Dunn test and the Wilcoxon signed-rank test.

An ensemble deep learning model comprising CNN and LSTM branches operating on four-channel sEMG signal data has been proposed. The hybrid model uses both CNN and LSTM to capture the spatial and temporal characteristics of the human activity data obtained through sEMG: the CNN captures spatial information, while the LSTM captures temporal information.

Moreover, the proposed work dealt with the imbalanced dataset by employing the Adaptive Synthetic Sampling (ADASYN) technique. This data augmentation technique addresses the problem of imbalanced data by creating synthetic data for classes with fewer samples. ADASYN uses a weighted distribution when generating synthetic samples for minority classes: unlike SMOTE, where synthetic samples are generated uniformly for all minority classes, ADASYN generates synthetic samples according to their difficulty of learning. Hence, more synthetic samples are generated for minority classes that are harder to learn and most likely to be misclassified. The ADASYN algorithm has been shown to improve the performance of classifiers on imbalanced datasets by effectively increasing the size of the minority class and reducing the bias towards the majority class. None of the previous work published on this dataset has dealt with the problem of imbalanced classes, which can make a model overly biased towards the majority class. Furthermore, none of the previously published work has performed statistical tests such as the Bonferroni-Dunn test, the Friedman test or the Wilcoxon test to demonstrate whether the proposed hybrid ensemble model is significantly different from the existing models. Our proposed ensemble model not only outperforms existing competing algorithms but is also shown to be statistically significantly different from them, as demonstrated in the results (Sect. 6) of this paper.
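As a minimal sketch of how this balancing step could be reproduced, the snippet below applies ADASYN from the imbalanced-learn package to windowed sEMG features; the array names, window flattening and class counts are illustrative assumptions rather than values from our dataset.

```python
# Hedged sketch: balancing windowed sEMG features with ADASYN (imbalanced-learn).
# X_windows, y_labels and the class counts below are illustrative assumptions.
import numpy as np
from collections import Counter
from imblearn.over_sampling import ADASYN

rng = np.random.default_rng(0)
X_windows = rng.normal(size=(1000, 4 * 256))            # flattened 4-channel, 256-sample windows
y_labels = np.array([0] * 700 + [1] * 200 + [2] * 100)  # imbalanced sit/stand/walk labels

print("before:", Counter(y_labels))
# ADASYN generates more synthetic samples for minority examples that are harder to learn.
X_bal, y_bal = ADASYN(random_state=42).fit_resample(X_windows, y_labels)
print("after:", Counter(y_bal))
```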

The remainder of this article is laid out as follows. Section 2 reviews related work on lower extremity activity recognition using wearable sensors. Section 3 describes the individual components of the proposed hybrid ensemble model. Section 4 gives a comprehensive description of the architecture and functionality of our model. Section 5 is the methodology section, covering data collection and pre-processing. Section 6 outlines the formulas used for evaluating the model’s performance, provides a detailed description of the results obtained together with a comparative analysis against other cutting-edge work, and reports the results of all statistical tests performed. Lastly, Section 7 presents the conclusion and future scope.

2 Literature review

Extensive research has been conducted on gait analysis and human activity recognition. Data can be collected using sensor-based or video-based approaches [20], each with its own benefits and drawbacks. Sensor-based systems are more successful at capturing the subtle details of human motion that may be difficult to see in video frames. Gait analysis can be performed on data obtained from IMU sensors, which typically comprise accelerometers, gyroscopes, and sometimes magnetometers, and can be worn on various body parts to capture motion data [10, 21,22,23]. In [24], Naik et al. investigated 11 fit individuals and 11 with knee disorders. Independent Component Analysis (ICA) was used to separate the sEMG signals into independent components representing individual muscle activations; six time-domain features were extracted, and an ensemble-based modelling approach was then used for classification. Fit subjects had a mean classification accuracy of 96.1%, while people with knee problems had an accuracy of 86.2%. Zhang et al. introduced a technique for identifying multi-channel electromyography (EMG) signals using noise-assisted multivariate empirical mode decomposition (NA-MEMD) in [25]. The NA-MEMD technique partitions multi-channel EMG data into a set of intrinsic mode functions (IMFs) that capture the signal’s numerous frequency components. However, their study was limited to fit participants and was not extended to participants with knee disorders; the average classification accuracies obtained were 79% for walking and 83% for both sitting and standing. In [12], Gautam et al. proposed a combination of convolutional and recurrent neural networks (LRCN) for recognising lower extremity movements, predicting knee joint angles, and classifying lower extremity activities using surface electromyography (sEMG) signals for both fit participants and participants with known knee disorders; they reported mean classification accuracies of 98.1% and 92.4% for fit participants and people with knee ailments, respectively. None of these works [12, 24, 25] dealt with the significant problem of an imbalanced dataset. In [26], the authors proposed a deep neural network model for the continuous estimation of lower extremity motions using sEMG signals. The model comprises multiple branches, each processing a different segment of the sEMG signal, and the outputs of these branches are combined to produce the final prediction. The model achieves mean classification accuracies of 90.92% for the speed-dependent case and 85.4% for the speed-independent case. In [27], the authors used sEMG data from 11 fit participants and 11 participants with known knee disorders; they denoised the data, extracted time-domain features, and applied various anomaly detection techniques to enhance classification accuracy. Several machine learning classifiers, notably random forest and light gradient boosting, were used to classify the activities, and the best classification accuracy of 98.5% was obtained using the iForest anomaly detection algorithm with a light gradient boosting machine. The LDA-PSO-LSTM algorithm combines three techniques: long short-term memory (LSTM) neural networks, particle swarm optimisation (PSO), and linear discriminant analysis (LDA). LDA is used to extract discriminant characteristics from the sEMG data, and PSO is then used to optimise the hyperparameters of the LSTM, which is employed to recognise the gait phases [28]. In [29], the authors used a convolutional network for feature selection from denoised data and a kernel extreme learning machine to classify lower limb activity, reporting a classification accuracy of 95.90%. In [30], the authors proposed a bimodal hybrid classifier on IMU sensor data for HAR, which can improve the robustness and accuracy of the recognition system; however, the approach may require a relatively large amount of training data to achieve high accuracy and may be sensitive to sensor placement and calibration, which influence the quality of the motion data.

3 Deep neural networks (DNNs)

A deep neural network (DNN) is a form of artificial neural network (ANN) that consists of multiple hidden layers between the input and output layers. Various deep learning models exist, such as the convolutional neural network (CNN), long short-term memory (LSTM) and bi-directional LSTM. In this section, the individual components of the proposed hybrid ensemble deep learning model are detailed.

3.1 Convolutional neural network (CNN)

Convolutional neural networks (CNNs) are built from operations and layers such as convolution, pooling, dropout, fully connected layers and activation functions.

  1. Convolution: The convolution operation involves sliding a filter or kernel over the input data and performing element-wise multiplication, followed by a summation. The convolution operation is given in Eq. 1.

  2. Pooling: Pooling is a downsampling operation that reduces the spatial size of the activation maps while preserving the essential features. The two most common types of pooling are max pooling and average pooling: max pooling selects the maximum value within a window of pixels, while average pooling computes the average value within the window. In our proposed algorithm, max pooling is employed.

  3. Activation Function: An activation function adds nonlinearity to the network after each convolution or pooling operation. The most popular activation function is the rectified linear unit (ReLU), which returns the maximum of 0 and the input value. The ReLU and softmax activation functions are given in Eqs. 2 and 3, respectively.

  4. Fully Connected Layers: After several convolutional and pooling layers, the feature maps are flattened into a vector and passed through one or more fully connected layers. These layers are similar to feed-forward neural networks, where each neuron is connected to all neurons in the previous layer. The output of the final dense layer is supplied to a softmax layer for classification, as given in Eq. 4.

  5. Dropout: During training, neurons are selected at random in each layer and deactivated by setting their outputs to zero. The fraction of neurons dropped is known as the dropout rate. Dropout speeds up the model’s learning process and inhibits overfitting.

    $$\begin{aligned} c(t_{s})= & {} (a*b)(t_{s}) \end{aligned}$$
    (1)
    $$\begin{aligned} \text {ReLU}(x)= & {} \max (0, x) \end{aligned}$$
    (2)
    $$\begin{aligned} \text {softmax}(z_i)= & {} \frac{\exp (z_i)}{\sum _{j=1}^{K}\exp (z_j)} \end{aligned}$$
    (3)
    $$\begin{aligned} X= & {} Wz + \textrm{bias} \end{aligned}$$
    (4)

    where \(c(t_{s})\) denotes the convolution of a and b, depicted by \((a * b)(t_{s})\), and is defined as the integral of the product of the two functions, one shifted by a parameter \(t_{s}\). Furthermore, \(\exp (z_i)\) and \(\exp (z_j)\) denote the standard exponential function applied to the ith and jth elements of the softmax input vector, and K denotes the number of categories. In Eq. 4, W denotes the weight matrix, z the flattened feature vector, X the output of the dense layer, and bias the bias term.
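As an illustrative aside (not part of the original pipeline), the short NumPy sketch below evaluates Eqs. 1–3 on a toy signal; the signal, kernel and logit values are arbitrary assumptions.

```python
# Toy NumPy sketch of Eqs. 1-3; the signal, kernel and logit values are arbitrary assumptions.
import numpy as np

def relu(x):
    # Eq. 2: element-wise max(0, x)
    return np.maximum(0.0, x)

def softmax(z):
    # Eq. 3: exponentials normalised to sum to 1 (shifted for numerical stability)
    e = np.exp(z - np.max(z))
    return e / e.sum()

a = np.array([0.2, -1.0, 0.5, 1.5])   # toy 1-D input
b = np.array([1.0, -1.0])             # toy kernel
c = np.convolve(a, b, mode="valid")   # Eq. 1: discrete convolution (a * b)(t_s)
print(relu(c), softmax(np.array([2.0, 1.0, 0.1])))
```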

3.2 Long short-term memory (LSTM)

Long short-term memory (LSTM) is a type of recurrent neural network commonly employed in time-series analysis and natural language processing. Its main components are described below.

  1. Input Gate: The input gate controls what data from the current input are stored in the memory cell. The input gate is given in Eq. 5.

  2. Forget Gate: The forget gate determines which information from the previous memory cell should be discarded, as given in Eq. 6.

  3. Output Gate: The output gate determines which information from the current input and the previous memory cell should be output. The output gate is given in Eq. 7.

  4. Cell State: The cell state is the long-term memory of the LSTM. It is updated from the forget gate, the input gate and the candidate cell state, as given in Eqs. 8 and 9.

  5. Hidden State: The hidden state is the short-term memory of the LSTM. It is calculated from the cell state and the output gate, as shown in Eq. 10.

    $$\begin{aligned} i_{ts}= & {} \sigma (W_\mathrm{{input}} * [h_{ts-1}, x_{ts}] + b_\mathrm{{input}}) \end{aligned}$$
    (5)
    $$\begin{aligned} f_{ts}= & {} \sigma (W_\mathrm{{forget}} * [h_{ts-1}, x_{ts}] + b_\mathrm{{forget}}) \end{aligned}$$
    (6)
    $$\begin{aligned} o_{ts}= & {} \sigma (W_\mathrm{{output}} * [h_{ts-1}, x_{ts}] + b_\mathrm{{output}}) \end{aligned}$$
    (7)
    $$\begin{aligned} \tilde{c}_{ts}= & {} \tanh (W_{c} * [h_{ts-1},x_{ts}] + b_{c}) \end{aligned}$$
    (8)
    $$\begin{aligned} c_{ts}= & {} f_{ts} * c_{ts-1} + i_{ts} * \tilde{c}_{ts} \end{aligned}$$
    (9)
    $$\begin{aligned} h_{ts}= & {} o_{ts} * \tanh (c_{ts}) \end{aligned}$$
    (10)

    where \(i_{ts}\), \(f_{ts}\) and \(o_{ts}\) denote the input, forget and output gates, respectively, and \(\sigma\) denotes the sigmoid function. W denotes the weight matrix of the corresponding gate, \(h_{ts-1}\) denotes the previous LSTM block’s output (hidden state) at time step ts-1, \(x_{ts}\) denotes the input at the current time step ts, and b denotes the bias of the corresponding gate. \(c_{ts}\) denotes the cell state at time step ts, \(\tilde{c}_{ts}\) denotes the candidate cell state at time step ts, and \(h_{ts}\) denotes the hidden state.
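For illustration only, the following NumPy sketch evaluates Eqs. 5–10 for a single time step with randomly initialised weights; the dimensions and values are assumptions, not the trained parameters of our model.

```python
# Hedged sketch of one LSTM step (Eqs. 5-10); weights and sizes are arbitrary assumptions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 4, 8                       # 4 sEMG channels, 8 hidden units (assumed)
rng = np.random.default_rng(1)
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "ifoc"}
b = {g: np.zeros(n_hid) for g in "ifoc"}

def lstm_step(x_ts, h_prev, c_prev):
    z = np.concatenate([h_prev, x_ts])           # [h_{ts-1}, x_ts]
    i = sigmoid(W["i"] @ z + b["i"])             # Eq. 5, input gate
    f = sigmoid(W["f"] @ z + b["f"])             # Eq. 6, forget gate
    o = sigmoid(W["o"] @ z + b["o"])             # Eq. 7, output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # Eq. 8, candidate cell state
    c = f * c_prev + i * c_tilde                 # Eq. 9, cell state update
    h = o * np.tanh(c)                           # Eq. 10, hidden state
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid))
print(h.shape, c.shape)
```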

3.3 Hybrid ensemble deep learning

A hybrid deep ensemble model combines deep learning with ensemble learning. Basic deep-learning models can learn complex patterns in data but may suffer from overfitting and have difficulty generalising to new data. Ensemble learning, on the other hand, combines multiple models to improve performance by leveraging the strengths of each model. In a hybrid deep ensemble model, multiple deep learning models are trained on the same data but with different architectures, hyperparameters, or random initialisations, and their outputs are then combined. This reduces the risk of overfitting, increases the diversity of the models, and provides more robust predictions and better generalisation to new data. The proposed work uses both LSTM and CNN to create a deep hybrid ensemble model that captures the data’s spatial and temporal aspects.

4 Proposed model

The ensemble model is made up of LSTM and CNN branches that run in parallel. In the CNN branch, two convolutional layers are followed by a max pooling layer and a dropout layer. The convolutional layers employ 3 × 1 filters with an increasing number of filters, starting at 32 and doubling to 64. The max pooling layers reduce the dimensionality of the feature maps and extract the most significant characteristics. By randomly setting a fraction of the input units to 0 during training, the dropout layers help circumvent overfitting by forcing the network to learn more robust and generalisable features. The LSTM branch consists of two layers with 32 and 16 units, respectively. The output of the final convolutional layer is flattened into a vector, and the outputs of the LSTM and CNN branches are concatenated and fed into a dense layer of 32 units with a ReLU activation function. The output of this dense layer is then passed to a final dense layer with a softmax activation function, which generates the predictions for the three categories. Threefold cross-validation is employed to train and assess the model on the training set; the 3-fold cross-validation scheme is depicted in Fig. 2. An early-stopping callback is also used to avoid overfitting. Figure 3 depicts the architecture, and Table 1 contains details of the model’s layers.
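A minimal Keras sketch of this kind of two-branch architecture is given below. Layer widths follow the description above, but the input window length (256 samples × 4 channels), dropout rate and optimiser are assumptions rather than the exact training configuration.

```python
# Hedged Keras sketch of a two-branch CNN + LSTM model; hyperparameters not stated
# in the text (dropout rate, optimiser, exact input length) are assumptions.
from tensorflow.keras import Input, Model, layers

inp = Input(shape=(256, 4))                      # 256-sample window, 4 sEMG channels

# CNN branch: two conv layers (32 then 64 filters, kernel size 3), max pooling, dropout
x = layers.Conv1D(32, 3, activation="relu")(inp)
x = layers.Conv1D(64, 3, activation="relu")(x)
x = layers.MaxPooling1D(2)(x)
x = layers.Dropout(0.3)(x)                       # assumed dropout rate
x = layers.Flatten()(x)

# LSTM branch: two stacked LSTM layers with 32 and 16 units
y = layers.LSTM(32, return_sequences=True)(inp)
y = layers.LSTM(16)(y)

# Concatenate the branches, then dense 32 (ReLU) and softmax over the 3 activities
z = layers.concatenate([x, y])
z = layers.Dense(32, activation="relu")(z)
out = layers.Dense(3, activation="softmax")(z)

model = Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```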

Fig. 2 Threefold cross-validation architecture

Fig. 3 Architecture of our proposed model

Table 1 Model architecture

Furthermore, the approach of the suggested study is shown in Algorithm 1.

Algorithm 1 Proposed Hybrid Ensemble Learning-based Human Activity Recognition System

5 Experimental setup and details

This section describes the dataset used and its pre-processing, together with the schematic of the proposed architecture and the algorithm.

5.1 Data acquisition and setup

This study gathered signal data from 22 male volunteers aged 18 and above. The data include four sEMG channels and one lower limb goniometer channel. The participants were split into two groups: eleven fit individuals and eleven with knee pathology, comprising one with sciatica pain, four with meniscal rupture and six with an anterior cruciate ligament (ACL) injury. A Biometrics Ltd. Datalog MWX8 system with an SG150B goniometer was used to acquire the sEMG and knee joint data. To ensure interference-free sampling, the sEMG electrodes were separated by 20 mm, and the input impedance was greater than 10 MΩ. The sampling rate was 1000 Hz, and the data were filtered between 20 Hz and 460 Hz. The goniometer was positioned on the outer side of the knee, and the sEMG electrode channels were placed on the rectus femoris (rf), semitendinosus (st), vastus medialis (vm), and biceps femoris (bf). The participants’ left and right extremities were selected for fit individuals and those with knee disorders. sEMG data and knee joint angles were collected while the subject engaged in three types of physical tasks: standing while making knee flexion motions, walking at ground level, and sitting while performing knee extension actions. These exercises are commonly performed in daily life and rehabilitative activities and do not necessitate the use of additional weights, dumbbells, or fitness equipment. The database does not include knee joint angle measurements or sEMG signals during transitional activities such as sit-to-stand or walk-to-sit.

5.2 Data pre-processing

The four-channel sEMG signal data were first standardised to zero mean and unit variance to ensure faster convergence of gradient descent and to mitigate overfitting [31]. We also employed the ADASYN algorithm to balance the imbalanced dataset: ADASYN creates synthetic samples for the classes with fewer samples, ensuring the model is not biased towards the majority class. The four-channel sEMG signals were then segmented into 256 ms windows with a 64 ms overlap using a sliding window approach, following [24]. We employed k-fold cross-validation with k = 3 for a fair comparison with existing state-of-the-art work. The dataset was randomly divided into k equal subsets; in each of the k folds, one subset is designated as the testing or validation set, while the remaining k-1 subsets form the training set. The final result is the average classification accuracy over the k folds.
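A minimal sketch of this pre-processing step (standard scaling followed by 256 ms windows with 64 ms overlap at 1000 Hz) is shown below; the raw-signal array, the label vector and the majority-label windowing convention are placeholders and assumptions, and the ADASYN step shown earlier would be applied to the resulting windows.

```python
# Hedged sketch of the pre-processing step: zero-mean/unit-variance scaling and
# segmentation into 256 ms windows with 64 ms overlap (1000 Hz sampling assumed).
# `raw_semg` and `labels` are placeholder arrays, not the actual recordings.
import numpy as np
from sklearn.preprocessing import StandardScaler

FS = 1000                      # sampling rate (Hz)
WIN = int(0.256 * FS)          # 256 ms -> 256 samples
OVERLAP = int(0.064 * FS)      # 64 ms  -> 64 samples
STEP = WIN - OVERLAP           # stride between consecutive windows

rng = np.random.default_rng(0)
raw_semg = rng.normal(size=(20_000, 4))          # (samples, 4 channels) placeholder
labels = rng.integers(0, 3, size=20_000)         # per-sample activity labels placeholder

# For a proper evaluation the scaler would be fitted on the training folds only.
scaled = StandardScaler().fit_transform(raw_semg)   # zero mean, unit variance per channel

windows, window_labels = [], []
for start in range(0, scaled.shape[0] - WIN + 1, STEP):
    seg = scaled[start:start + WIN]                  # (256, 4) window
    windows.append(seg)
    # label the window by the majority activity inside it (assumed convention)
    window_labels.append(np.bincount(labels[start:start + WIN]).argmax())

X = np.stack(windows)            # (n_windows, 256, 4) input to the CNN-LSTM model
y = np.array(window_labels)
print(X.shape, y.shape)
```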

6 Results and discussion

6.1 Performance evaluation criteria

The performance of the proposed hybrid ensemble model is evaluated using accuracy, F1_score, recall, and precision. The formulas for these performance evaluation criteria are given in the following equations:

$$\begin{aligned} \textrm{Prec}= & {} \frac{\textrm{Tru}\_\textrm{Pos}}{\textrm{Tru}\_\textrm{Pos}+\textrm{Fal}\_\textrm{Pos}} \end{aligned}$$
(11)
$$\begin{aligned} \textrm{Rec}= & {} \frac{\textrm{Tru}\_\textrm{Pos}}{\textrm{Tru}\_\textrm{Pos} + \textrm{Fal}\_\textrm{Neg}} \end{aligned}$$
(12)
$$\begin{aligned} F1\_\textrm{score}= & {} 2 * \frac{\textrm{Prec} * \textrm{Rec}}{\textrm{Prec} + \textrm{Rec}} \end{aligned}$$
(13)
$$\begin{aligned} \textrm{Acc}= & {} \frac{\textrm{Tru}\_\textrm{Pos} +\textrm{Tru}\_\textrm{Neg}}{\textrm{Tru}\_\textrm{Pos} +\textrm{Fal}\_\textrm{Pos} +\textrm{Fal}\_\textrm{Neg}+\textrm{Tru}\_\textrm{Neg}} * 100 \end{aligned}$$
(14)

where Acc denotes accuracy, Prec denotes precision, Rec denotes recall, Tru_Pos denotes true positive, Tru_Neg denotes true negative, Fal_Pos denotes false positive and Fal_Neg denotes false negative.
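For reproducibility, these metrics could be computed with scikit-learn as in the short sketch below; the label and prediction arrays are purely illustrative.

```python
# Hedged sketch: computing Eqs. 11-14 with scikit-learn on illustrative predictions.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 2])   # placeholder labels (sit/stand/walk)
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1, 0, 2])   # placeholder predictions

prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred) * 100           # Eq. 14 expressed as a percentage
print(f"Acc={acc:.1f}%  Prec={prec:.3f}  Rec={rec:.3f}  F1={f1:.3f}")
```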

Apart from the metrics mentioned above, we have also performed statistical tests, namely the Friedman test [32], the Bonferroni-Dunn test [33] and the Wilcoxon signed-rank test [34], to compare our proposed ensemble model with other competing algorithms.

6.1.1 Comparative analysis

The effectiveness of the proposed model was assessed using the aforementioned equations, Eqs. 11–14. The results obtained are consolidated in Tables 2 and 3, which show the individual subject analysis for healthy subjects and subjects with knee pathology on three different activities, namely walking, standing and sitting, obtained by the proposed hybrid ensemble model. It should be noted, however, that none of the previously published state-of-the-art work has dealt with the imbalanced dataset problem; accuracy comparisons should therefore be interpreted with care, as accuracy alone is not the best metric for evaluating a model when the class distribution is imbalanced. The average classification accuracy, precision, recall and F1_score obtained by the proposed hybrid ensemble model are 98.8%, 98.8%, 98.7% and 98.7%, respectively, for the healthy participants, and 98.4%, 98.4%, 98.3% and 98.3% for the participants with knee pathology. Moreover, this paper also reports the participant-wise classification accuracy for fit individuals obtained by the proposed hybrid ensemble model in Table 4, which also compares the proposed model with other cutting-edge work on the same dataset for fit participants, namely MyoNet [12], ICA-EBM [24] and NA-MEMD [25]. The average classification accuracy of the walking activity obtained by the proposed model is 99.3% ± 1.1, as opposed to 98.2% ± 1.6 in [12], 96.0% ± 1.3 in [24], and 79.0% in [25]. Furthermore, the average classification accuracies for the standing and sitting activities of fit participants obtained by the proposed hybrid ensemble model are 98.3% ± 2.5 and 98.8% ± 2.0, a significant improvement over previously published state-of-the-art work: 97.7% ± 1.3 and 98.4% ± 1.4 in [12], 96.2% ± 1.2 and 96.2% ± 1.1 in [24], and 83.0% and 83.0% in [25], respectively. The comparison of mean classification accuracy obtained using the proposed hybrid ensemble model with other state-of-the-art work for healthy participants, presented in Table 4, is further illustrated in Fig. 4a.

Our proposed model also shows promising results for classifying various lower limb activities performed by individuals with knee pathology. Table 3 comprises the class-wise average accuracy, F1_score, precision, and recall for each individual with known knee pathology. Moreover, Table 5 compares the proposed model’s performance on data obtained from participants with knee pathology with existing cutting-edge work, namely MyoNet [12] and ICA-EBM [24]. The proposed hybrid ensemble model achieves mean accuracies of 98.2% ± 2.9, 98.1% ± 2.6 and 99.0% ± 1.4 for the sitting, standing, and walking activities of participants with knee disorders. This is a significant improvement over the average accuracies obtained by previous cutting-edge work: 86.4%, 85.5% and 86.6% in [24], and 92.2%, 92.3% and 92.8% in [12] for walking, standing, and sitting, respectively. The details presented in Table 5 can also be seen in Fig. 4b. Previously published state-of-the-art work has struggled to perform activity classification for subjects with knee pathology because each pathology is different and can lead to different functional limitations and activity restrictions; developing a single classification model that accounts for different pathologies in different individuals is therefore challenging. Our proposed model achieved an average accuracy of 98.4% across all individuals with pathology, as opposed to 92.4% in [12] and 86.2% in [24]; it thus outperforms the previously published state-of-the-art work of [24] by 12.2% and of [12] by 6.0%. Moreover, it is worth noting that real-world signals contain noise, and our proposed method works on noisy signals without employing any denoising technique. Furthermore, the proposed model achieved an average precision, recall and F1_score of 98.4%, 98.3% and 98.4%, as opposed to 93.4%, 92.6% and 92.9%, respectively, across subjects with knee pathology. The comparison of the overall performance of the proposed model with the MyoNet method [12] for healthy subjects and subjects with knee pathology is shown in Fig. 4c and d, respectively. It can be noted that the proposed model outperforms the MyoNet method [12] in terms of recall, F1_score and accuracy for healthy subjects, and in terms of accuracy, recall, precision and F1_score for subjects with knee pathology.

In this research, a subject-wise comparison of the proposed ensemble model with CNN-only and LSTM-only models for healthy subjects and subjects with knee disorders is also shown in Tables 6 and 7, respectively. The mean accuracies for walking, standing, and sitting using the CNN-only model are 98.3%, 96.7% and 96.0% for fit individuals and 97.6%, 97.4% and 97.2%, respectively, for subjects with knee disorders. Similarly, the LSTM-only model achieved average accuracies of 86.0%, 81.7% and 91.2% for the walking, standing, and sitting activities of fit participants and 80.1%, 79.3% and 78.3%, respectively, for participants with knee disorders. It can be noted that the LSTM-only model struggles to classify the lower limb activities for both fit individuals and individuals with knee pathology, and that while the CNN-only model performs better than the LSTM-only model, the hybrid deep ensemble model performs best, as it captures both the spatial and the temporal aspects of the data. The comparison of accuracies obtained using the CNN-only, LSTM-only and proposed hybrid models is further illustrated in Fig. 4e and f.

Fig. 4 Comparative analysis of the proposed model with other cutting-edge methodologies

Table 2 Average precision, accuracy, F1_score and recall obtained of each healthy subject for each activity
Table 3 Average precision, accuracy, F1_score and recall obtained of each subject with knee pathology for each activity
Table 4 Comparative analysis of classification accuracy (%) for different methods on the three activity classes for the healthy subjects dataset
Table 5 Comparative analysis of classification accuracy (%) for subjects with knee pathology
Table 6 Comparative analysis of suggested Ensemble model with LSTM-only and CNN-only model for healthy subjects
Table 7 Comparative analysis of proposed Ensemble model with LSTM-only and CNN-only model for subjects with knee pathology

6.1.2 Statistical test

To compare the performance of the proposed work with that of other cutting-edge approaches, we have performed the Friedman [32], Bonferroni-Dunn [33] and Wilcoxon signed-rank [34] tests on the proposed ensemble model, ICA [24] and MyoNet [12]. In addition to the proposed ensemble model, we have also included the CNN-only and LSTM-only models in the comparison. We have not statistically compared our proposed model with NA-MEMD [25] because the study in [25] was limited to healthy subjects and did not consider pathological subjects.

The Friedman test is a widely accepted statistical test for comparing the performance of multiple models. Let \(r_j^i\) signify the rank of the ith algorithm on the jth dataset, \(r_i\) denote the mean rank of the ith method, n denote the number of datasets or samples, and k denote the number of competing methods. In the event of a tie between any algorithms, the average rank is shared.

The Friedman statistic (\(F_r\)) is evaluated using Eqs. 15 and 16.

$$\begin{aligned} F_r = \frac{(n-1)\chi _f^2}{n(k-1)-\chi _f^2} \end{aligned}$$
(15)

where

$$\begin{aligned} \chi _f^2 = \frac{12n}{k(k+1)}(\sum _{i=1}^k r_i^2 - \frac{k(k+1)^2}{4}) \end{aligned}$$
(16)

In our case, n = 66, k = 5, chosen level of significance (\(\alpha\)) = 0.05.

$$\begin{aligned} \chi _f^2 = \frac{12\times 66}{5\times 6} \times \left( 51.529 - \frac{5\times (6)^2}{4}\right) \end{aligned}$$
(17)
$$\begin{aligned} \chi _f^2= 172.367 \end{aligned}$$
(18)
$$\begin{aligned} F_r = \left( \frac{65 \times 172.367}{66 \times 4 - 172.367} \right) = 122.268 \end{aligned}$$
(19)

The corresponding \(p\text {-value}\) is \(1.1415 \times 10^{-36}\).

This \(p\text {-value}\) indicates decisive statistical significance, as it is far smaller than the chosen significance level \(\alpha = 0.05\). This means we can reject, with high confidence, the null hypothesis that all models perform equivalently. Additionally, we have used the pairwise comparison method of the Bonferroni-Dunn test [33] to determine whether the algorithms are significantly different from one another. Using the proposed model as the control algorithm, we have also examined whether it performs noticeably better than the rival techniques. Equation 20 determines the critical distance (CD).

$$\begin{aligned} CD = x_\alpha \left[ \frac{k(k+1)}{6n}\right] ^{1/2} \end{aligned}$$
(20)

At the significance level \(\alpha\) = 0.05, the obtained CD = 0.6876. The results of the Bonferroni-Dunn test, with the proposed algorithm as the control algorithm and as a general comparison, are shown in Fig. 5a and b, respectively. Figure 5a shows that the CNN-only model and MyoNet [12] are similar and not statistically different, and that the LSTM-only model and ICA [24] are likewise similar to each other and not statistically significantly different. As depicted in Fig. 5a and b, the proposed ensemble model (referred to as ensemble) not only performs better than all the other competing algorithms in terms of accuracy but is also significantly different from them.
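As a hedged illustration of these calculations (not the authors' code), the snippet below reproduces the arithmetic of Eqs. 15, 16 and 20 and shows how the Friedman test could alternatively be run with SciPy; the per-subject accuracy matrix passed to SciPy is a random placeholder, and the choice of critical value for the CD is an assumption consistent with the reported 0.6876.

```python
# Hedged sketch reproducing the arithmetic of Eqs. 15, 16 and 20; the per-subject
# accuracy matrix passed to SciPy at the end is a random placeholder, not real results.
import numpy as np
from scipy.stats import friedmanchisquare, norm

n, k, alpha = 66, 5, 0.05
sum_rank_sq = 51.529                      # sum of squared mean ranks reported in the text

chi2_f = 12 * n / (k * (k + 1)) * (sum_rank_sq - k * (k + 1) ** 2 / 4)   # Eq. 16
F_r = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)                          # Eq. 15
print(f"chi2_f = {chi2_f:.3f}, F_r = {F_r:.3f}")     # ~172.367 and ~122.268

# Bonferroni-Dunn critical distance (Eq. 20), assuming the two-sided critical value
# z_{alpha / (2(k-1))} for comparisons against a single control algorithm.
q_alpha = norm.ppf(1 - alpha / (2 * (k - 1)))
CD = q_alpha * np.sqrt(k * (k + 1) / (6 * n))
print(f"CD = {CD:.4f}")                              # ~0.688

# The Friedman test itself could also be run directly on per-subject accuracies:
acc = np.random.default_rng(0).uniform(0.8, 1.0, size=(n, k))   # placeholder matrix
print(friedmanchisquare(*acc.T))
```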

Fig. 5 Comparative analysis of each algorithm using the Bonferroni-Dunn test

To further validate that the proposed hybrid ensemble model outperforms the other models and is significantly different from existing state-of-the-art work, we have employed the Wilcoxon signed-rank test [34]. The results of the Wilcoxon signed-rank test are shown in Table 8. As shown in Table 8, the \(p\text {-values}\) are significantly smaller than the significance level \(\alpha\) = 0.05; the proposed ensemble model is therefore significantly different from the other competing algorithms and outperforms each of them.
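A hedged sketch of how such a pairwise comparison could be performed with SciPy is shown below; the two accuracy vectors are placeholders standing in for per-subject accuracies of the ensemble and one competing model.

```python
# Hedged sketch of a pairwise Wilcoxon signed-rank test; accuracy vectors are placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
acc_ensemble = rng.uniform(0.96, 1.00, size=66)              # per-subject accuracies (placeholder)
acc_competitor = acc_ensemble - rng.uniform(0.00, 0.08, size=66)

stat, p_value = wilcoxon(acc_ensemble, acc_competitor)
print(f"W = {stat:.1f}, p = {p_value:.2e}")                  # p < 0.05 -> reject equal-performance null
```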

Table 8 Wilcoxon signed-rank test results for the proposed ensemble model compared with other models

6.2 Discussion

Individuals with knee-related conditions like osteoarthritis, meniscus tears, and ACL tears frequently struggle with daily activities involving lower extremity motions such as walking, standing, and sitting [35]. Surface electromyography signals obtained during motion from muscles such as the quadriceps and hamstrings can aid clinicians in diagnosis and assist in rehabilitation [36, 37]. Additionally, sEMG signals can help evaluate the improvement across physiotherapy sessions when a network-based rehabilitation approach is used.

We have proposed a hybrid ensemble model comprising LSTM and CNN that captures the spatial and temporal aspects of the signal. In [25], Zhang et al. investigated the use of intrinsic mode functions (IMFs) obtained from the decomposition of surface electromyography (sEMG) signals as features for identifying lower leg motions. They explored three different decomposition methods, namely multivariate empirical mode decomposition (MEMD), noise-assisted MEMD (NA-MEMD) and empirical mode decomposition (EMD), to extract the IMFs from the raw sEMG signal. One significant limitation of their work is that the study was restricted to data from healthy subjects and was not extended to subjects with knee disorders. In [24], Naik et al. extracted six time-domain features and then applied feature dimensionality reduction; the authors used an SVM with an RBF kernel to classify lower limb motion in fit subjects and subjects with knee abnormalities. We have compared our proposed ensemble model with this previous cutting-edge work and with CNN-only and LSTM-only models. The ensemble model outperforms all the other methods, with average accuracies of 99.3%, 98.3% and 98.8% for the walking, standing and sitting activities of healthy subjects and 99.0%, 98.1% and 98.2%, respectively, for subjects with knee pathology. Moreover, our proposed ensemble model is also statistically proven to differ significantly from all the competing methodologies. For better understanding, a comparative analysis of the previous cutting-edge techniques with the proposed ensemble model is shown in Table 9.

Table 9 Comparison with related work

7 Conclusion and future scope

A hybrid ensemble deep learning model is proposed for classifying lower extremity movements. The proposed hybrid ensemble model significantly improves the recall, F1_score, accuracy, and precision over previously published state-of-the-art work, providing an average accuracy of 98.8%, precision of 98.8%, F1_score of 98.7% and recall of 98.7%. In addition to proposing the hybrid ensemble model comprising CNN and LSTM models, this work has standardised the data to zero mean and unit variance, which leads to faster convergence of the models and enhances the model’s efficiency. The dataset was highly imbalanced, and previous works had not considered that aspect. An imbalanced dataset may lead to discriminatory behaviour by the model towards the majority class, and hence accuracy may not be the best metric to gauge the model’s performance. We have handled the imbalanced dataset problem using the Adaptive Synthetic oversampling algorithm (ADASYN). By enhancing classification accuracy, our proposed hybrid ensemble model outperforms all current cutting-edge work, particularly for participants with knee pathologies: for fit participants and participants with knee disorders, our model achieved mean classification accuracies of 98.8% and 98.4%, respectively. Additionally, the proposed ensemble model has also been compared with CNN-only and LSTM-only models, and our results show that the hybrid ensemble model works better because it caters to both the temporal and the spatial aspects of the data. Moreover, the proposed work has incorporated statistical tests, namely the Friedman, Bonferroni-Dunn, and Wilcoxon tests, to compare our proposed model with all other competing algorithms. All the statistical tests and performance metrics show that our model is superior to and significantly different from all other competing algorithms discussed. None of the previous works has standardised the data, handled the imbalanced dataset problem, or compared their algorithms statistically with previous state-of-the-art work; all of these points have been addressed in the proposed work. In the future, we intend to create a model that can both classify activities and predict joint angles. The proposed model can also be validated on a larger dataset. We will additionally employ TinyML to significantly reduce the model’s complexity and make it more computationally efficient.