1 Introduction

In today's world, with the progress of technology and globalization, personal healthcare systems have become an essential part of everyone's life [1]. According to the World Health Organization (WHO), heart disease, or cardiovascular disease, is the leading cause of death. WHO statistics attribute 15.2 million deaths to cardiovascular disease [2].

The heart is a vital organ of the human body. Every part of the body receives blood and essential nutrients through the heart, so the remaining organs can stop working if the heart malfunctions. The risk of heart disease increases with diabetes, high blood pressure, high cholesterol or obesity. Thus, the prediction of heart disease using medical data will play an essential role in healthcare analytics [3]. Heart disease can be predicted based on various conditions such as arrhythmia, cardiomyopathy, and disorders of the coronary or peripheral blood vessels. However, the first check for heart disease is based on the electrocardiogram (ECG), which is used to assess the functioning and electrical activity of the heart [4]. It is popular because it is a simple, non-invasive mechanism for the prediction of heart disease. An important property of the ECG signal is that it reveals different types of heartbeats, such as normal, PVC, APC and PB [5].

Reading ECG data and deriving conclusions from it involves carefully observing the data for specific patterns. Pattern classification has improved dramatically with advances in computational power and algorithms. Machines can learn the patterns in data using deep learning (DL) and computational algorithms [6]. They can learn both known essential features and previously unknown features that influence the classification task. These learning algorithms are capable of providing accurate results in ECG analysis and classification [7]. DL models and computational algorithms also help in classifying the recurring patterns in ECG data.

In the proposed work, we study the use of DL methods with ECG signals and analyze their influence on heartbeat classification. In Sect. 2, the related work on different ECG analysis strategies is discussed. The DL techniques based on the stacked LSTM model are discussed in Sect. 3, followed by the experiments and results in Sect. 4. The summary and future work based on the findings are discussed in Sect. 5.

2 Related work

Prior work on heartbeat classification and prediction is discussed in this section. Techniques for identifying whether a person has heart disease are beneficial to the medical field and to individuals. Risks known in advance help people take the necessary precautions and raise awareness. The DL models considered for the review are LSTM, CNN, RNN, and GRU. The work carried out in this regard is explained below.

For the task of arrhythmia classification, DL models offer multiple approaches. While raw ECG data can be fed directly to the DL model for classification, it was found that introducing encoded features alongside the data yields higher classification accuracy. Encoded features add a high cost to the process, and approaches such as the use of LSTMs to capture temporal dependencies have been shown to avoid that cost [8]. Studies such as [9, 10] suggest fusing existing and previous softmax outputs to avoid overfitting. Some studies convert one-dimensional ECG signals into time–frequency representations before using them as model inputs. The effect of increasing the number of layers in DL models is one of the major parameters studied. With the success of Convolutional Neural Networks (CNNs) in many fields, their use in the medical field has gained a lot of attention.

CNNs are used to classify patient-specific beats as well as to detect different types of ECG data. Some CNN models are very deep, reaching networks of almost 34 layers. Techniques such as batch normalization and dropout are used to reduce overfitting. The inherent imbalance in the data can lead to misleading classifier performance results. Due to the sequential nature of ECG data, LSTM-based approaches are quite popular. LSTM networks have substantially reduced the computation time needed to classify arrhythmia data. LSTM networks have also been used as a feature extractor in [11], where an LSTM-based autoencoder model produced the features that were then classified by an SVM. An RNN was used for the diagnosis and classification of arrhythmic beats [12]. The RNN model was evaluated with the metrics accuracy, sensitivity and specificity, and consisted of three layers with 64, 256 and 100 neurons, respectively.

In [13], a CNN-based method was used for the classification and analysis of heartbeats. It consisted of four layers with 32, 16, 16 and 32 neurons in the respective layers. Backpropagation with a stopping criterion was used to achieve the required training accuracy. A two-stage hierarchical method [14] was used for the end-to-end classification of heartbeats. The approach first identified the category and then the final class to which the beat belongs. In [15], a fusion-based Neighbourhood Component Analysis (NCA) method was used for heartbeat classification. It used dimensionality reduction to reduce the number of dimensions for analysis. A nearest-neighbour classifier was used for the classification with the distance metrics Euclidean, Spearman and cosine.

A normalized CNN method was used for arrhythmia classification [16]. It consisted of convolution, batch normalization and dropout layers. A CNN variant with seven hidden layers was evaluated for heartbeat classification [9]. Principal Component Analysis (PCA) with optimized Directed Acyclic Graph Support Vector Machines (DAG-SVM) was used to analyze and classify heartbeats. PCA, a data reduction method, extracted the essential features of the data, and the SVM was used for the final classification [17]. An LSTM network was used to automatically recognize arrhythmias from ECG signals, and the final classification of the heartbeats was carried out using an SVM [18, 19].

As discussed, there are various DL methods for ECG classification [20]. The different works, along with the techniques they used, are summarized in Table 1. Most of the works used vanilla CNN and RNN networks for the classification. However, current deep learning research [21] has evolved towards LSTM neural networks, which are reported to perform better than CNN and RNN. In this paper, the LSTM and its variants are studied for classification. The methodology followed for the design of the LSTM and its variants is discussed in the next section.

Table 1 Summary of Related work for Heartbeat classification

3 Proposed methodology

In this section, we discuss the methodology followed for the analysis of the different DL models considered. The proposed methodology uses LSTM as the fundamental neural network for ECG analysis and heartbeat classification. There are five classes, namely Normal, PVC, APC, PB and others. The overall methodology followed for the analysis is shown in Fig. 1. The different neural networks based on the LSTM and its structure are discussed in the later sections.

Fig. 1 Overview of Methodology

3.1 LSTM

In applications that involve sequential data, traditional neural networks often face problems due to their inability to capture long- and short-term dependencies. In ECG data, these dependencies are crucial and must be modelled appropriately to obtain accurate results. Recurrent Neural Networks (RNNs) have an internal memory, are trained with backpropagation through time, and provide a way to capture short-term dependencies.

However, problems arise in capturing long-term dependencies due to the vanishing gradient problem. To solve this problem, gated RNN models such as LSTM networks have been proposed, which consist of LSTM cells that each have input, forget, and output gates, as shown in Fig. 2. The introduction of such gates has proved to be monumental in capturing long-term dependencies and offers a viable method for capturing valuable information from ECG data.

Fig. 2 Architecture of LSTM

Figure 2 presents a basic overview of an LSTM cell [22]. LSTM cells work by determining the amount of information to be carried across time steps based on the values of the input, output and forget gates. At any instant of time, the primary inputs to the cell are x_t, the current input to the cell; h_{t-1}, the hidden state carrying the cell's information so far; and C_{t-1}, the previous cell state of the LSTM. For the information to be carried across time, the input, output and forget gates perform the necessary calculations. The forget gate determines the amount of information to retain from the previous cell state, while the input gate calculates the influence of the current input on the cell. These two gates are essential in maintaining the cell state at any given point in time. The output gate o_t is instrumental in calculating the amount of information of the cell state to be carried to the next instant of time.
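The gate computations described above can be written compactly. The following is a minimal NumPy sketch of a single LSTM time step in the standard textbook formulation. The function name `lstm_step`, the stacked weight layout and the dimensions are illustrative assumptions and are not taken from the experiments in this paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation).

    x_t:    current input, shape (m,)
    h_prev: previous hidden state h_{t-1}, shape (n,)
    c_prev: previous cell state C_{t-1}, shape (n,)
    W, U, b: input weights (4n, m), recurrent weights (4n, n), biases (4n,),
             stacked as [forget, input, candidate, output] (an assumed layout).
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f_t = sigmoid(z[0:n])            # forget gate: how much of C_{t-1} to keep
    i_t = sigmoid(z[n:2 * n])        # input gate: influence of the current input
    g_t = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o_t = sigmoid(z[3 * n:4 * n])    # output gate: how much of the state to expose
    c_t = f_t * c_prev + i_t * g_t   # updated cell state C_t
    h_t = o_t * np.tanh(c_t)         # updated hidden state h_t
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step. This makes the role of the forget gate easy to check by hand.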

3.2 Dataset and setup

The MIT-BIH Arrhythmia Dataset [23, 24] is a benchmark standard for ECG data classification tasks. The dataset consists of 48 half-hour excerpts of two-channel ambulatory ECG. Records 100 to 124 consist of ECG data chosen randomly, whereas records 200 to 234 show less common but clinically significant arrhythmias. All records from the second series were included to ensure that all arrhythmia samples were covered. In contrast, a random number of samples from the first series was sufficient to capture all the necessary data. The experiments were implemented using Python and TensorFlow. After a 70:30 split of the dataset, the model was trained on the training portion and tested on the held-out portion. The computer used in the experimental studies has an Intel Core i7-9750H 2.60 GHz CPU, 16 GB memory and an NVIDIA GeForce GTX 1660 Ti graphics card.
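A 70:30 split of this kind can be sketched as follows. The text does not state whether the split was made per beat or per record, or whether it was stratified by class. The sketch therefore assumes a simple shuffled split over beat-level samples, and the function name `split_70_30` and the toy data are hypothetical.

```python
import numpy as np

def split_70_30(features, labels, train_fraction=0.7, seed=0):
    """Shuffle beat-level samples and split them into training and test sets."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)           # fixed seed for a repeatable split
    order = rng.permutation(len(features))      # random ordering of sample indices
    cut = int(round(train_fraction * len(features)))
    train_idx, test_idx = order[:cut], order[cut:]
    return (features[train_idx], labels[train_idx]), (features[test_idx], labels[test_idx])

# Hypothetical toy data: 10 beats of 4 samples each, with placeholder class labels.
X = np.arange(40, dtype=float).reshape(10, 4)
y = np.arange(10) % 5
(train_X, train_y), (test_X, test_y) = split_70_30(X, y)
```

For strongly imbalanced classes, a stratified split that preserves the class proportions in both portions may be preferable. That would be a design choice beyond what the text specifies.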

3.3 Pre-processing

The ECG waves from the dataset are pre-processed using the Fourier transform and the inverse Fourier transform. These are implemented in Python using the rfft() and irfft() functions of the SciPy package [25]. The rfft() function takes the input wave from the dataset and produces its frequency-domain representation. This output is then fed to the irfft() function, which applies the inverse Fourier transform to reconstruct a smoothened wave in the time domain. The final pre-processed wave from irfft() is fed to the DL model to obtain class predictions. The different neural networks based on the LSTM and its structure are discussed in the later sections.
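A minimal sketch of such a Fourier-based smoothing step is given below. The text does not state how the spectrum is modified between rfft() and irfft(). The sketch assumes a simple low-pass rule that zeroes the high-frequency coefficients, and the function name `smooth_ecg` and the parameter `keep_fraction` are hypothetical.

```python
import numpy as np

try:
    from scipy.fft import rfft, irfft      # SciPy functions named in the text
except ImportError:                         # NumPy offers the same two calls
    from numpy.fft import rfft, irfft

def smooth_ecg(signal, keep_fraction=0.2):
    """Smooth a 1-D wave by discarding its high-frequency Fourier coefficients.

    keep_fraction (assumed parameter) is the share of the lowest-frequency
    coefficients that is kept; the rest are set to zero before inversion.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = rfft(signal)                          # frequency-domain representation
    cutoff = max(1, int(len(spectrum) * keep_fraction))
    spectrum[cutoff:] = 0                            # remove high-frequency content
    return irfft(spectrum, n=len(signal))            # back to the time domain

# Synthetic example: a slow wave plus a fast ripple that stands in for noise.
t = np.linspace(0.0, 1.0, 256, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * np.sin(2 * np.pi * 60 * t)
smoothed = smooth_ecg(noisy, keep_fraction=0.2)
```

In this synthetic example the ripple lies entirely above the cutoff, so the smoothed wave matches the slow component. Real ECG recordings would need a cutoff chosen with the sampling rate and the width of the QRS complex in mind.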

3.4 Deep neural networks for classification

The proposed methodology is evaluated by comparing four DL models with different layer configurations, as shown in Fig. 3. The base architecture of each neural network model is LSTM. Table 2 presents a brief overview of the number of LSTM layers and hidden layers that were varied across the different Deep Neural Network (DNN) architectures. DNN-2 has more LSTM layers than DNN-1. DNN-3 increases the hidden layer count relative to DNN-2, whereas DNN-4 introduces a Bidirectional LSTM layer. Each of the neural networks is discussed in this section.

Fig. 3 Detailed Layered Representations of Neural Networks

Table 2 Deep Neural Networks for analysis

Neural Network 1 consists of an input layer, 1 LSTM layer and 4 hidden layers before the output layer. The input layer is responsible for receiving a pre-processed ECG wave. The LSTM layer can capture temporal dependencies or patterns in the input wave and adjust its weights accordingly. The hidden layers are then used to classify the input wave correctly. This network has relatively fewer parameters than the other networks and, as such, a shorter training time per epoch. The model uses the Adam optimizer, and the ReLU activation function is used for the hidden layers.

Neural Network 2 has an input layer, 2 stacked LSTM layers and 3 hidden layers before the output layer. The introduction of stacked LSTM layers allows the model to learn complex patterns but introduces a large number of parameters and hence increases training time. The model uses the Adam optimizer, and the ReLU activation function is used for the hidden layers.

Neural Network 3 has an input layer, 2 stacked LSTM layers and 4 hidden layers before the output layer. While the stacked LSTM configuration is responsible for capturing temporal patterns, an increase in the number of hidden layers can further help the model learn the classification patterns. The model uses the Adam optimizer, and the ReLU activation function is used for the hidden layers.

As a final variation among the neural network architectures tested in the experiment, Neural Network 4 introduces a Bidirectional LSTM layer into the stacked LSTM configuration. The Bidirectional LSTM allows the model to learn the temporal dependencies better. However, it increases the number of parameters and the training time per epoch. The model uses the Adam optimizer, and the ReLU activation function is used for the hidden layers.
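The effect of these design choices on model size can be estimated with the standard parameter-count formulas for LSTM and fully connected layers. The sketch below is only illustrative. The layer widths (64 LSTM units, 32 units per hidden layer), the single input feature per time step, and the placement of the Bidirectional layer as the first recurrent layer of DNN-4 are assumptions, since the text gives only the layer counts.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM layer: 4 gate blocks, each with input weights,
    recurrent weights and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Parameters of one fully connected layer (weights plus biases)."""
    return input_dim * units + units

def count_params(input_dim, lstm_units, dense_units, n_classes, bidirectional_first=False):
    """Total parameters of a stack of LSTM layers followed by hidden layers
    and an output layer with n_classes units."""
    total, dim = 0, input_dim
    for k, units in enumerate(lstm_units):
        layer = lstm_params(dim, units)
        if bidirectional_first and k == 0:
            layer *= 2           # separate forward and backward LSTMs
            dim = 2 * units      # their outputs are concatenated
        else:
            dim = units
        total += layer
    for units in dense_units:
        total += dense_params(dim, units)
        dim = units
    return total + dense_params(dim, n_classes)

# Hypothetical widths; only the layer counts follow the descriptions above.
configs = {
    "DNN-1": dict(lstm_units=[64], dense_units=[32] * 4),
    "DNN-2": dict(lstm_units=[64, 64], dense_units=[32] * 3),
    "DNN-3": dict(lstm_units=[64, 64], dense_units=[32] * 4),
    "DNN-4": dict(lstm_units=[64, 64], dense_units=[32] * 4, bidirectional_first=True),
}
counts = {name: count_params(1, n_classes=5, **cfg) for name, cfg in configs.items()}
```

Under these assumed widths the counts grow from DNN-1 to DNN-4, with the recurrent layers dominating the total. This is consistent with the increase in training time per epoch described above.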

3.5 Evaluation metrics

The metrics used to evaluate the results are accuracy, sensitivity, specificity, precision, and F-score [26]. Sensitivity (SE) is the proportion of positives identified correctly and is calculated using Eq. (1), where TP stands for True Positive and FN stands for False Negative. Specificity (SP) is the proportion of negatives identified correctly and is calculated using Eq. (2), where TN stands for True Negative and FP stands for False Positive. Precision is calculated using Eq. (3) and gives the positive predictive value, that is, the fraction of true positive classifications among all positive classifications. F-score is defined as the harmonic mean of precision and sensitivity and is calculated by Eq. (4). F-score values close to the maximum (1, or 100 when expressed as a percentage as in Eq. (4)) indicate a good model.

$$\mathrm{SE} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100 \tag{1}$$

$$\mathrm{SP} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \times 100 \tag{2}$$

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \times 100 \tag{3}$$

$$\text{F-Score} = \frac{2 \times \mathrm{TP}}{2 \times \mathrm{TP} + \mathrm{FP} + \mathrm{FN}} \times 100 \tag{4}$$
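Equations (1) to (4) translate directly into code. The short sketch below evaluates them for one class treated as positive against all other classes. The function name `binary_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, precision and F-score in percent,
    following Eqs. (1)-(4)."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),           # Eq. (1)
        "specificity": 100.0 * tn / (tn + fp),           # Eq. (2)
        "precision": 100.0 * tp / (tp + fp),             # Eq. (3)
        "f_score": 100.0 * 2 * tp / (2 * tp + fp + fn),  # Eq. (4)
    }

# Hypothetical counts for one class in a one-vs-rest evaluation.
m = binary_metrics(tp=90, fp=10, fn=30, tn=870)
# Sensitivity is 90/120 = 75.0 and precision is 90/100 = 90.0 for these counts.
```

For these counts the F-score from Eq. (4) equals the harmonic mean of precision and sensitivity, 2 × 90 × 75 / (90 + 75) ≈ 81.8, which confirms that the two definitions agree.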

4 Results and discussion

The performance of the different neural networks with respect to sensitivity, specificity, accuracy per class, overall accuracy and balanced accuracy was recorded and compared. The results of the four deep neural networks are reported in terms of accuracy, sensitivity, and specificity for each of the classes, namely Normal, PVC, APC, PB, and others. The overall accuracy of all the DNNs considered is shown in Fig. 4, which depicts the accuracy scores and the balanced accuracy scores of the four neural network architectures. The pattern suggests an increase in accuracy with the stacked LSTM configuration and with an increase in hidden layers. The introduction of the Bidirectional LSTM layer has significantly improved the accuracy score of the stacked LSTM configuration.
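The three accuracy measures named above can be computed from a multi-class confusion matrix as sketched below. The paper does not define them explicitly. The sketch assumes that per-class accuracy is the one-vs-rest accuracy (TP + TN) / total and that balanced accuracy is the mean of the per-class sensitivities. The function name `accuracy_summary` and the example matrix are hypothetical.

```python
import numpy as np

def accuracy_summary(confusion):
    """Overall, per-class and balanced accuracy from a confusion matrix.

    confusion[i, j] is the number of beats of true class i predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)                      # correctly classified beats per class
    fn = cm.sum(axis=1) - tp              # beats of the class that were missed
    fp = cm.sum(axis=0) - tp              # beats wrongly assigned to the class
    tn = total - tp - fn - fp             # everything else
    overall = tp.sum() / total
    per_class = (tp + tn) / total         # one-vs-rest accuracy (assumed definition)
    balanced = np.mean(tp / (tp + fn))    # mean per-class sensitivity (assumed definition)
    return overall, per_class, balanced

# Hypothetical, imbalanced 3-class example (rows: true class, columns: predicted class).
cm = [[18, 1, 1],
      [2, 6, 2],
      [0, 0, 10]]
overall, per_class, balanced = accuracy_summary(cm)
```

In this example the overall accuracy (0.85) is higher than the balanced accuracy (about 0.83) because the largest class is also classified well. This illustrates why both values are worth reporting for imbalanced ECG data.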

Fig. 4 Performance comparison of neural networks

Table 3 presents the results of the different models in terms of multiple performance metrics. DNN-4 achieved the best overall accuracy of 94.9%. For the Normal class, DNN-4 had the best accuracy of 97.8% compared to the other models, while DNN-3 had a sensitivity of 0.967 compared to 0.954 for DNN-4, suggesting fewer false-negative classifications for DNN-3. For the PVC class, DNN-4 showed the best performance, with the highest class accuracy of 98.2%. DNN-3 had a better precision of 0.962 compared to 0.9 for DNN-4, indicating fewer false-positive classifications. For the PB class, DNN-4 achieved the best results on all performance metrics compared to the other models. For the Others class, DNN-3 showed higher precision and F-score, 0.950 and 0.947, compared to 0.936 and 0.943, respectively, for DNN-4. This can be attributed to a decrease in false-positive classifications.

Table 3 Performance Values of DNN models for classification

The neural networks designed for the experimental study were trained on the arrhythmia data separately, and the changes in accuracy and loss values for the training and validation sets are given in Fig. 5. The graphs of DNN-1 suggest that the model reaches a high accuracy in less time due to its simple structure. The accuracy plots of the different neural network architectures suggest the presence of multiple local minima and the models' ability to overcome them during training with backpropagation. The accuracy plot of DNN-2 suggests that the model initially took multiple iterations to escape a local minimum. In contrast, the accuracy plot of DNN-3 shows a more irregular pattern in reaching the final optimum, indicating multiple local optima. The accuracy plot of DNN-4 shows an initial steep curve, suggesting that the model learns fast and reaches a better accuracy score quickly. However, this comes at the cost of an increased number of parameters.

Fig. 5 Comparison of Accuracy of DNN for classification

Training aims to decrease the loss value, and this decrease is evident in Fig. 6. The loss graph shows a steady decrease in the loss values of DNN-1 over iterations. DNN-2, on the other hand, shows a high loss value for a large number of initial iterations and then reduces it drastically over the subsequent few iterations. The loss graph of DNN-3 shows a comparatively irregular pattern but indicates a general decrease in loss across iterations. The steep decrease in the loss of DNN-4 indicates the model's tendency to reduce the loss quickly and converge within a few iterations, as is evident from the graph.

Fig. 6 Comparison of Loss of DNN for classification

The performance of the proposed Bi-directional LSTM model (DNN-4) is compared with the other baselines in Table 4. The accuracy of the bi-directional LSTM model is 95%, which outperforms the baselines DAG-SVM (92%) and LSTM-2layers (81%). Similarly, the sensitivity of the bi-directional LSTM model is 94%, which is higher than that of the baselines DAG-SVM (83%) and LSTM-2layers (89%). Likewise, the specificity of the bi-directional LSTM model is 98%, which is higher than that of the baselines DAG-SVM (91%) and LSTM-2layers (73%). The precision of the bi-directional LSTM model is 95%, which is higher than that of the baselines DAG-SVM (78%) and LSTM-2layers (87%). The two evaluation parameters sensitivity and specificity are the most important for the results of the classification. Sensitivity gives the proportion of actual positives that are classified correctly, while specificity gives the proportion of actual negatives that are classified correctly. The high values of sensitivity and specificity demonstrate that the bi-directional LSTM model is more appropriate for the classification.

Table 4 Performance Comparison with the Existing works

5 Conclusion

Healthcare analytics is becoming predominant with the advances in computing, data collection, analysis and classification. DL methods are becoming vital for heartbeat classification because of available data such as the MIT-BIH arrhythmia dataset. The main focus of the paper was ECG-based classification of heartbeats using LSTM DL models. Different variants of LSTM DL models were implemented for the classification, of which the bi-directional LSTM DL model provides the highest accuracy (95%). Most importantly, in the comparative analysis with existing works, the bi-directional LSTM DL model showed the highest values of sensitivity (95%) and specificity (98%). Therefore, the results indicate that LSTM DL models provide an accurate classification of heartbeats. The classification results can be used by doctors to diagnose heart diseases and guide treatment. In the future, different variants of the bi-directional LSTM DL model with different optimizers can be used to classify heart diseases.