
1 Introduction

Cardiovascular diseases (CVDs) have been the leading cause of death worldwide for the last 25 years. According to a report in The Lancet Global Health [1], CVDs are responsible for 28% of deaths in India, where they overtook cancer as the leading cause of death in this century. This epidemiological transition is primarily due to the rising occurrence of CVDs in the country. In 2016, an estimated 54.5 million people in India were living with CVDs. One in every four deaths in India is now caused by CVDs, with ischemic heart disease and stroke accounting for more than 80% of this burden. Heart arrhythmia is often caused by irregular electrical conduction in cardiac tissue. Premature ventricular contractions (PVCs) that persist for a long period may progress to ventricular tachycardia (VT) or ventricular fibrillation (VF), which can result in heart failure. It is therefore important to monitor the heart rate regularly to control and prevent CVDs.

The most common method for diagnosing arrhythmias is the analysis of electrocardiogram (ECG) signals, which consist of three main components: the P wave, the QRS complex, and the T wave. The ECG aids in the diagnosis of heart disease by recording the heart’s rhythm and electrical activity. Heartbeat classification is therefore an important step in ECG-based arrhythmia diagnosis. Since manually categorizing heartbeats in long-term ECG recordings is time-consuming, computerized algorithms provide critical diagnostic assistance. Considerable training is also needed to interpret the varied ECG characteristics of cardiac arrhythmias consistently.

With the growth of AI technologies, many machine learning approaches are being applied to ECG feature detection to handle the large quantities of ECG signal data. Several techniques for classifying ECG arrhythmias have been discussed in the literature. Guler et al. [3] proposed wavelet transformation for feature extraction and a feed-forward neural network as the classifier, trained with the Levenberg-Marquardt algorithm, while [8] discussed an optimization-based deep convolutional neural network (CNN).

Hannun et al. [2] proposed a 1D CNN classifier trained on a larger and broader dataset than Kiranyaz’s CNN model. Despite the bigger ECG dataset, its arrhythmia detection rate is significantly lower than others reported in the literature. Several drawbacks recur in earlier work on ECG arrhythmia classification: (1) good results reported without cross-validation on carefully selected ECG recordings, (2) loss of ECG beat information during noise filtering and feature extraction, (3) a small number of ECG arrhythmia categories, and (4) classification performance too low for use in the field. This manuscript shows how arrhythmias can be classified into seven categories using a deep 2D CNN with grayscale ECG images.

The remainder of this manuscript is structured as follows. Section 2 describes the methodology used to classify ECG arrhythmias, including ECG data preprocessing and the CNN classifier. Section 3 presents the evaluation and experimental results of ECG arrhythmia classification. Section 4 concludes the paper.

2 Methodology

In this paper, we classify seven types of ECG arrhythmia using computer vision methods. Specifically, CNNs are used to classify two-dimensional images of ECG signals. To make this possible, ECG signals from the MIT-BIH and PTB databases [11, 12] were transformed into 2D images during the data preprocessing phase. Various data augmentation techniques were then applied to these images to enlarge the training set. A summary of the process is shown in Fig. 1.

Fig. 1 Summarization of classification process

2.1 Pre-processing of ECG Data

Two-dimensional CNN models were chosen for classification instead of 1D signal processing because their performance is largely insensitive to the amount of noise in the ECG signals. This allows us to disregard power line interference, electromyographic noise, motion artifacts, baseline wander, and other types of noise without resorting to advanced noise reduction techniques such as wavelet transforms or window-based filtering.

Signals from the MIT-BIH dataset were transformed into individual images by first plotting them against time and then using a moving window to capture images at regular intervals. These images were then converted into 128 × 128 grayscale images using a short Python snippet. This process resulted in 75,000 images in 8 classes: 7 types of arrhythmia and the normal heartbeat.
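The conversion step can be illustrated with a minimal sketch. It is not the snippet used in this work: the function names `sliding_windows` and `segment_to_image`, the one-second window, and the synthetic test signal are all assumptions made for illustration, and the trace is rasterized directly with NumPy rather than through a plotting library.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Yield fixed-length segments of a 1D signal at regular intervals."""
    for start in range(0, len(signal) - window + 1, step):
        yield signal[start:start + window]

def segment_to_image(segment, size=128):
    """Rasterize one segment as a size x size grayscale image (uint8).

    The background is white (255) and the trace is black (0).
    """
    segment = np.asarray(segment, dtype=float)
    image = np.full((size, size), 255, dtype=np.uint8)
    # Resample the segment to one amplitude value per image column.
    x = np.linspace(0, len(segment) - 1, size)
    y = np.interp(x, np.arange(len(segment)), segment)
    # Map amplitudes to row indices; row 0 is the top of the image.
    span = y.max() - y.min()
    norm = (y - y.min()) / span if span > 0 else np.full(size, 0.5)
    rows = np.round((1.0 - norm) * (size - 1)).astype(int)
    image[rows, np.arange(size)] = 0
    # Fill the vertical gap between neighboring columns so steep
    # slopes (such as the QRS complex) remain a connected line.
    for c in range(1, size):
        lo, hi = sorted((int(rows[c - 1]), int(rows[c])))
        image[lo:hi + 1, c] = 0
    return image

# Synthetic stand-in for an ECG record sampled at 360 Hz (not real data).
fs = 360
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.sin(2 * np.pi * 50 * t)

# Assumed window: one second of signal per image, without overlap.
images = [segment_to_image(w) for w in sliding_windows(signal, fs, fs)]
```

With a real record, `signal` would be replaced by the samples of one ECG lead, and the window length and step would be chosen to match the beat segmentation.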

2.2 Data Augmentation

To achieve high performance on complex tasks, CNN models rely heavily on the amount and diversity of the training data. Data augmentation addresses both needs and also mitigates the class imbalance in the MIT-BIH dataset. In addition, according to [13], networks trained with data augmentation alone adapt more easily to different architectures and are less complex.

We augmented the images using geometric transformations (rotation, scaling, flipping, and cropping) and color space transformations (Gaussian noise injection and color casting). After geometric scaling and cropping, the images of each ECG beat type were resized back to 128 × 128.
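A subset of these augmentations can be sketched with NumPy alone. The helper names, the nearest-neighbor resize, the minimum crop fraction, and the noise level below are illustrative assumptions rather than the settings used in this work. Rotation and color casting are omitted for brevity.

```python
import numpy as np

def flip_horizontal(image):
    """Mirror the image left to right."""
    return image[:, ::-1].copy()

def resize_nearest(image, size=128):
    """Resize a 2D array to size x size with nearest-neighbor sampling."""
    rows = (np.arange(size) * image.shape[0]) // size
    cols = (np.arange(size) * image.shape[1]) // size
    return image[np.ix_(rows, cols)]

def random_crop_resize(image, rng, min_frac=0.8, size=128):
    """Crop a random sub-window, then resize it back to size x size."""
    h, w = image.shape
    ch = int(rng.integers(int(min_frac * h), h + 1))
    cw = int(rng.integers(int(min_frac * w), w + 1))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return resize_nearest(image[top:top + ch, left:left + cw], size)

def add_gaussian_noise(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the uint8 range."""
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = np.full((128, 128), 255, dtype=np.uint8)
image[64, :] = 0  # placeholder trace; a real ECG image would go here

augmented = [
    flip_horizontal(image),
    random_crop_resize(image, rng),
    add_gaussian_noise(image, rng),
]
```

Applying more augmented copies to the rarer arrhythmia classes than to the normal class is one way such transformations can reduce class imbalance.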

2.3 Kernel Initialization

Weight initialization is critical: if the initial weights are too large, the outputs of the activation functions in the model’s layers can explode during the forward pass. Similarly, if the weights are too small, the outputs vanish as they propagate through the network. In both cases the loss gradients become ineffective during the backward pass, which hinders the convergence of the model.

Because we use the ReLU activation function rather than the hyperbolic tangent (tanh), Xavier initialization can cause our activation outputs to vanish. We therefore used Kaiming initialization to set our weights.
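The effect can be reproduced in a small numerical experiment. The sketch below is an illustration only: the fully connected toy network, its depth and width, and the function name `final_activation_std` are assumptions and do not correspond to the architecture proposed in this paper. It pushes random inputs through a stack of ReLU layers and reports the spread of the final activations under each scheme.

```python
import numpy as np

def final_activation_std(init, depth=30, width=256, seed=0):
    """Standard deviation of the activations after `depth` ReLU layers."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((512, width))
    for _ in range(depth):
        if init == "xavier":
            # Xavier (Glorot) normal: variance 2 / (fan_in + fan_out).
            std = np.sqrt(2.0 / (width + width))
        elif init == "kaiming":
            # Kaiming (He) normal: variance 2 / fan_in, which compensates
            # for ReLU zeroing roughly half of the pre-activations.
            std = np.sqrt(2.0 / width)
        else:
            raise ValueError(f"unknown init: {init}")
        w = rng.standard_normal((width, width)) * std
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return float(x.std())

xavier_std = final_activation_std("xavier")
kaiming_std = final_activation_std("kaiming")
print(f"Xavier:  {xavier_std:.2e}")
print(f"Kaiming: {kaiming_std:.2e}")
```

Under Xavier initialization the signal power is roughly halved at every ReLU layer, so the activations shrink geometrically with depth, whereas Kaiming initialization keeps them at a roughly constant scale.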

2.4 Activation Function

Activation functions determine which information is passed forward through the network and, through their derivatives, shape the error signal computed during backpropagation. They also introduce non-linearity into the network. The sigmoid, tanh, and ReLU functions are popular choices among machine learning researchers.

Convolutional neural networks usually employ ReLU or one of its variants as the activation function for their hidden layers. ReLU can lose information because it sets all negative values to zero, thus deactivating some nodes. Parametric ReLU addresses this problem by applying a small, learnable non-zero slope when the unit is not active.
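The difference between the two functions is easy to see in code. The following is a minimal sketch with a single shared slope `alpha`; the value 0.25 is an illustrative starting point, not a setting reported in this paper, and in a trained network the slope is updated by gradient descent like any other parameter.

```python
import numpy as np

def relu(x):
    """Standard ReLU: negative inputs are set to zero."""
    return np.maximum(x, 0.0)

def prelu(x, alpha):
    """Parametric ReLU: negative inputs are scaled by the slope alpha."""
    return np.where(x > 0, x, alpha * x)

def prelu_grads(x, alpha, upstream):
    """Gradients of PReLU with respect to its input and to alpha."""
    dx = upstream * np.where(x > 0, 1.0, alpha)
    dalpha = float(np.sum(upstream * np.where(x > 0, 0.0, x)))
    return dx, dalpha

x = np.array([-2.0, 0.0, 3.0])
alpha = 0.25  # illustrative initial slope
y_relu = relu(x)
y_prelu = prelu(x, alpha)
dx, dalpha = prelu_grads(x, alpha, np.ones_like(x))
```

For the negative input, ReLU passes neither a value nor a gradient, while PReLU passes both, scaled by `alpha`.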

2.5 Regularization

Batch normalization was originally motivated by internal covariate shift. ReLU is not zero-centered, so even with careful initialization the inputs to a layer may not stay normalized, and their distribution shifts as training proceeds. In deeper networks this effect is amplified. As a result, the layers constantly have to adapt, which slows learning. Batch normalization reduces this burden by normalizing each layer’s inputs with the mini-batch mean and variance and then applying a learnable scale and shift.

Shibani and Dimitris showed in their research that batch normalization remains effective even when an internal covariate shift is deliberately reintroduced after the batch normalization layer [17]. We therefore used batch normalization in each block of our model.
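The training-time forward pass of batch normalization can be written in a few lines. This is a minimal sketch for a batch of feature vectors; the function name and the example data are assumptions, and the running statistics that a full implementation keeps for inference are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x has shape (batch, features); gamma and beta have shape (features,).
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # shifted, scaled inputs
out = batch_norm_forward(x, gamma=np.ones(10), beta=np.zeros(10))
```

With `gamma` set to one and `beta` to zero, every feature of `out` has approximately zero mean and unit variance regardless of the mean and scale of `x`. The learnable `gamma` and `beta` let the network undo the normalization where that is useful.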

Dropout is another regularization method; it randomly drops, or ignores, some layer outputs during training. Each iteration therefore trains a thinned version of the layer with a different set of active nodes, which is similar to training many different network architectures simultaneously.
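A minimal sketch of the common "inverted" form of dropout is given below. The function name and the drop rate of 0.5 are illustrative assumptions, not values taken from the proposed model.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of the outputs.

    Surviving values are divided by the keep probability so that the
    expected activation is unchanged and no rescaling is needed at
    inference time.
    """
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones((1000, 100))
train_out = dropout(x, rate=0.5, rng=rng)                  # random mask
eval_out = dropout(x, rate=0.5, rng=rng, training=False)   # unchanged
```

Because a new mask is drawn on every call, each training step sees a different subset of active nodes.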

2.6 Cost and Optimizer Function

The cost function, or loss function, measures the error between the predicted and the actual output, and an optimizer reduces it by updating the model parameters accordingly. We used the Adam optimization algorithm for our proposed CNN model. It requires very little memory and is computationally efficient.
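The Adam update rule can be sketched as follows. The hyperparameter defaults are the commonly used ones, and the quadratic toy loss, the learning rate of 0.1, and the function name `adam_step` are assumptions for illustration; they are not the training settings of this paper.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment (mean)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second moment
    m_hat = m / (1.0 - beta1 ** t)              # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
```

The only optimizer state is the pair of moment estimates `m` and `v`, each the same size as the parameters, which is why the memory overhead stays modest.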

2.7 Optimized CNN Classifier Architecture

Using the design choices described above, our model’s architecture was designed with efficiency and speed in mind. It is intended for situations that involve a trade-off between computational power and speed. The architecture is shown in Fig. 2.

Fig. 2 Proposed CNN architecture

3 Experiments and Results

We implemented three popular convolutional neural network models as benchmarks for our proposed model: MobileNet V1, DenseNet 169, and ResNet-50. These networks and the proposed model were trained and tested on the same MIT-BIH and PTB datasets and evaluated with the same metrics for consistency.

3.1 Data Acquisition

The “PTB diagnostic ECG database” and the “MIT-BIH arrhythmia database” were used to obtain the ECG arrhythmia recordings for this work. The “MIT-BIH arrhythmia database” is a publicly accessible dataset that contains standard reference data for heart arrhythmia diagnosis.

It contains 48 half-hour ECG recordings obtained from 47 individuals. Each recording is sampled at 360 samples per second. The “MIT-BIH database” contains 110,000 ECG beats covering 15 different types of arrhythmia.

The PTB diagnostic ECG database contains 549 records from 290 individuals (aged 17–87, average age 57.2).

The experiments in this paper aim to evaluate well-known CNN models and earlier ECG arrhythmia classification work against the proposed model. We included the normal beat (NOR) and seven types of ECG arrhythmia from the “MIT-BIH database” and the “PTB diagnostic ECG database.”

3.2 Experimental Setup and Training

We used transfer learning: all the networks were first trained on an extensive dataset, and the feature maps learned in that step were then reused for our specific task of recognizing and classifying 8 classes. This helped us train faster and reach high performance more efficiently. All four models were trained in Google Colab, a free, cloud-hosted Jupyter notebook environment provided by Google. It offers a GPU with 12 GB of RAM, which can be used to run deep CNNs, and is accelerated by CUDA 9.2 and cuDNN 6.0.

3.3 Performance Evaluation

Table 1 shows the selected evaluation metrics for each model. All models except MobileNet achieve a high accuracy (98.6% for DenseNet, 98.9% for ResNet-50, and 99.88% for the proposed model). The proposed model also outperforms the other three in sensitivity: 99.38% for our network, compared with about 98.3% for ResNet-50 and around 97% for the other two models.

Table 1 Performance evaluation

The table also shows that the proposed model obtained excellent precision for every class of the dataset: the lowest value was 97% for the VEB type, while the NOR, APC, and VPW classes were each classified with 99% precision.

The other four types were detected with a precision of about 98%. Each network performed best when paired with a different activation function: MobileNet with LeakyReLU, ResNet-50 with ReLU, and DenseNet with Softmax. For our proposed model, our experiments showed parametric ReLU to be the best activation function (Fig. 3).

Fig. 3 Confusion matrix of proposed architecture with the precision of all classes
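Accuracy, sensitivity (recall), and per-class precision can all be derived from a confusion matrix. The sketch below shows the computation on a made-up 3-class matrix; the numbers are purely illustrative and are not results from this study, and the convention that rows are true classes and columns are predicted classes is an assumption.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Return overall accuracy, per-class recall, and per-class precision.

    cm[i, j] counts samples of true class i predicted as class j.
    Every class is assumed to occur at least once as a true label
    and at least once as a prediction.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    accuracy = true_pos.sum() / cm.sum()
    recall = true_pos / cm.sum(axis=1)      # per true class (sensitivity)
    precision = true_pos / cm.sum(axis=0)   # per predicted class
    return accuracy, recall, precision

# Illustrative counts only.
toy_cm = [[50, 2, 0],
          [1, 45, 4],
          [0, 3, 45]]
accuracy, recall, precision = metrics_from_confusion(toy_cm)
```

For a multi-class problem, a single sensitivity figure is typically obtained by averaging the per-class recall values.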

3.4 Comparison with the Existing Techniques

The authors of [15] proposed a novel CNN architecture that achieved accuracy and sensitivity values of 99.05% and 97.85%, respectively, on the MIT-BIH dataset.

Yu et al. [16] used feed-forward neural networks to classify the same 8 classes as ours and obtained an accuracy of 98.71%; they did not report recall values. Thus, the proposed model performs better than these previous techniques on the same task.

4 Discussion and Conclusion

Four convolutional neural network pipelines were implemented and analyzed for ECG arrhythmia classification: MobileNet V1, DenseNet 169, ResNet-50, and our proposed CNN model. We began by discussing the data preprocessing and augmentation techniques we employed and then moved on to the architecture and parameters of our proposed model. We also briefly discussed the other three models and laid out the metrics used to evaluate them.

All four networks were trained and tested on the MIT-BIH and PTB datasets. Given the high values of the evaluation metrics obtained by our model, we believe this study can serve as a baseline for ECG arrhythmia classification tasks. The model was designed with efficiency and a small footprint in mind and could therefore be advantageous where computational power is limited. This research can also serve as a guideline for systems that use deep learning for medical classification tasks.

In future work, we would like to explore novel CNN architectures that can improve on the results presented in this article by experimenting with different activation functions, changing the position of each layer in the architecture, and varying the depth of the network.