1 Introduction

The structure of the heart has evolved to such an extent that it can function continuously throughout a human lifetime without failure. Nevertheless, the World Health Organization reports that cardiovascular diseases (CVDs) are the leading cause of death globally: an estimated 17.9 million people died from CVDs in 2016, representing 31% of all global deaths, and most of these deaths occurred in low-income countries. Almost half of the total deaths were classed as sudden cardiac deaths (SCDs), and arrhythmias underlie the majority of them [1].

The ECG signal represents the electrical activity of the heart. It manifests as P-QRS-T peaks linked by PR and ST segments, so each heartbeat can be described in terms of these peaks, segments and intervals. They constitute time-domain features and reflect the pattern of depolarization and repolarization of the atria and ventricles [2]. The literature shows that arrhythmia detection and classification are well-established methods in the diagnosis of cardiovascular disease [3]; clinicians rely on them to obtain a complete picture of the heart's performance and health. Based on the morphology of the ECG signal, the Association for the Advancement of Medical Instrumentation (AAMI) subdivides arrhythmias into five major groups: non-ectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F) and unknown beats (Q) [4].

In routine practice, physicians interpret the ECG signal manually, using visual assessment to classify and diagnose arrhythmia types. The problem is that this approach may be subjective and the interpretation may be biased. Researchers have therefore developed smart solutions, so-called computer-aided diagnosis (CAD) systems, in which the ECG is processed automatically [5]. Most studies on CAD systems have used machine learning techniques to classify and interpret ECG arrhythmias. Machine learning approaches typically proceed through three main stages: preprocessing, feature extraction and reduction, and classification. The feature-related stage is the most challenging because it requires handcrafted engineering, a difficulty that this study resolves by using deep learning [5]. Arrhythmia detection has led researchers to develop novel algorithms that analyze ECG signals through QRS detection; in several studies, each ECG signal is segmented into separate beats and features are extracted from every arrhythmia signal type. In short, using deep learning to classify arrhythmias is a relatively new field, and more studies are needed to develop a reliable smart system that can support physicians in diagnosing heart diseases and perhaps even replace them in the future.

In previous work, a CNN was implemented to classify these five major groups, with experiments run on two versions of data from a public database, with and without noise; the trained model in one experiment had nine layers [6]. Using a similar concept, a CNN has also been used to detect and diagnose coronary artery disease automatically. In that work, ECG segments of two different durations, two and five seconds, were fed separately to two different models, each comprising 11 layers [7]. In another implementation, a multi-scale fusion deep convolutional neural network was used to detect atrial fibrillation (AF) from a single-lead ECG signal. The model has a two-stream convolutional structure in which different filter sizes are used to capture features at various scales. Fixed time intervals of 5, 10, 20 and 30 s were used separately as inputs, and the AF recordings were cropped or padded when needed [8].

The preceding paragraph and the literature show that CNNs have been used successfully for arrhythmia classification, coronary artery diagnosis, atrial fibrillation screening and feature extraction. However, CNNs can be applied to the ECG signal in other ways to obtain better results and greater benefit from artificial intelligence in heart disease. Using a CNN to classify ten or more different arrhythmia types is a challenge worth tackling, and the classification of ten arrhythmia types is the main issue of this study, since previous studies have not addressed it in this manner. Consequently, the primary purpose of this study is to apply a CNN to ECG arrhythmia classification effectively, to design a good structure for the proposed neural network and to determine its optimum parameters.

2 Materials and method

It is known that a pattern recognition and classification system implies a particular framework comprising preprocessing, feature extraction and classification, so specific stages must be followed. In this study, the framework consists of preprocessing and classification stages, without a feature extraction stage, as shown in Fig. 1. A deep learning approach was adopted to classify the ten different arrhythmia types: a novel CNN architecture, shown in Fig. 2, was designed, developed and trained on the MIT-BIH dataset for automatic classification.

Fig. 1 Proposed method

Fig. 2 Architecture of the proposed CNN model

To validate the proposed model and achieve high performance, the data downloaded from MIT-BIH were segmented in the preprocessing stage and then augmented, both to solve the problem of the imbalanced dataset, which appears as a widely varying number of heartbeats across the different arrhythmia types, and to obtain a dataset large enough to train the deep learning model. At the end of this process, a reasonable number of samples was obtained for every class. The following subsections discuss the model and its concepts in further detail.

2.1 Data collection

To implement and test the proposed method, only the lead II ECG signal, depicted in Fig. 3, was extracted from the MIT-BIH arrhythmia database. After the augmentation sub-step, the total number of ECG heartbeats used in the study is 105,199, with an average of 10,520 heartbeats per arrhythmia class, and each heartbeat consists of 300 samples. The details of the data are shown in Table 1. With respect to the AAMI classification, the ten arrhythmia types considered in this study are: normal beat (NB), left bundle branch block beat (LBBB), right bundle branch block beat (RBBB), nodal (junctional) escape beat (NJEB), atrial premature beat (APB), aberrated atrial premature beat (AAPB), nodal (junctional) premature beat (NJPB), premature ventricular contraction (PVC), fusion of ventricular and normal beat (FVNB) and fusion of paced and normal beat (FPNB).

Fig. 3 Ten different arrhythmia types used in the study

Table 1 Distribution of extracted arrhythmia data

2.2 Preprocessing

This stage consists of two major tasks carried out sequentially in two sub-stages: de-noising and segmentation. In this study, noise reduction is not considered, and segmentation was implemented as explained in the following paragraphs.

Segmentation of the ECG signal depends mainly on QRS complex detection, and the literature offers several approaches for this task. The first is threshold-based methods, which include derivative-based algorithms. The second uses regular grammars, the third is correlation-based methods, and the fourth is based on digital filters. All of these approaches are suitable for automatic implementation in the final application [9–11].
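As a rough illustration of the threshold-based idea only (it is not the segmentation used in this study, which relies on the annotation file as described in Sect. 2.3), a minimal derivative-plus-threshold QRS detector might be sketched as follows; the 360 Hz sampling rate matches MIT-BIH recordings, while the threshold factor and refractory period are illustrative assumptions:

```python
import numpy as np

def detect_qrs(ecg, fs=360, threshold_factor=0.6, refractory_s=0.2):
    """Very simple derivative/threshold QRS detector (illustrative only).

    ecg: 1D signal (e.g. lead II); fs: sampling rate in Hz (360 Hz for MIT-BIH).
    Returns approximate R-peak sample indices.
    """
    # Emphasize the steep slopes of the QRS complex, then square to rectify.
    slope = np.diff(ecg)
    energy = slope ** 2
    # Smooth with a short moving-average window (~150 ms).
    win = int(0.15 * fs)
    smoothed = np.convolve(energy, np.ones(win) / win, mode="same")
    # Threshold as a fraction of the maximum of the smoothed signal.
    threshold = threshold_factor * smoothed.max()
    refractory = int(refractory_s * fs)

    peaks, last = [], -refractory
    for i in range(1, len(smoothed) - 1):
        if (smoothed[i] > threshold
                and smoothed[i] >= smoothed[i - 1]
                and smoothed[i] >= smoothed[i + 1]
                and i - last > refractory):
            peaks.append(i)
            last = i
    return np.array(peaks)
```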

2.3 Data extraction and augmentation

In this study, the approach implemented to extract heartbeats differs slightly from the methods mentioned above. Here, segmentation was based on the annotation file downloaded from the database along with the data. Therefore, to segment each 30-min ECG recording downloaded from the MIT-BIH database together with its annotation file, the following two steps were applied.

First, the beginning and end of the heartbeat for each of the ten arrhythmia types were determined from the annotation file. Second, the length of each heartbeat, i.e. the number of samples per heartbeat, was fixed at 300 samples taken from the beginning of each heartbeat, because the annotation file yields different lengths for the different arrhythmia types.
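A minimal sketch of this annotation-driven extraction, assuming the Python wfdb package and the PhysioNet naming of the MIT-BIH arrhythmia database (record "100" and the chosen beat symbols are examples only, and the annotation sample index is treated as the beat start as described above), could read:

```python
import numpy as np
import wfdb  # reads PhysioNet records and their annotation files

BEAT_LEN = 300  # fixed number of samples taken from the start of each beat

def extract_beats(record_name, wanted_symbols, pn_dir="mitdb"):
    """Cut fixed-length beats out of one MIT-BIH record using its annotations."""
    record = wfdb.rdrecord(record_name, pn_dir=pn_dir, channels=[0])  # channel 0 is MLII (lead II) in most records
    ann = wfdb.rdann(record_name, "atr", pn_dir=pn_dir)
    signal = record.p_signal[:, 0]

    beats, labels = [], []
    for start, symbol in zip(ann.sample, ann.symbol):
        if symbol not in wanted_symbols:
            continue
        segment = signal[start:start + BEAT_LEN]
        if len(segment) == BEAT_LEN:           # skip truncated beats at the record end
            beats.append(segment)
            labels.append(symbol)
    return np.array(beats), np.array(labels)

# Example: extract normal (N), LBBB (L), RBBB (R) and PVC (V) beats from record 100
X, y = extract_beats("100", wanted_symbols={"N", "L", "R", "V"})
```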

It was noted that the extracted data were imbalanced across the ten arrhythmia types, with a highly varied number of samples per class. Therefore, the extracted data were augmented by applying Z-score normalization and then using different means (\(\mu\)) and standard deviations (\(\sigma\)) for the different extracted classes; the final balanced, augmented dataset is shown in Table 1.
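The exact augmentation parameters are not specified beyond the Z-score idea, so the following NumPy sketch is only one plausible reading: each beat is Z-score normalized and then rescaled with a slightly perturbed mean and standard deviation until the class reaches a target size; the perturbation scales here are assumptions.

```python
import numpy as np

def augment_class(beats, target_count, rng=np.random.default_rng(0)):
    """Grow one arrhythmia class to target_count beats via Z-score rescaling.

    beats: array of shape (n_beats, 300). New beats are produced by normalizing
    an existing beat to zero mean / unit variance and rescaling it with a
    slightly perturbed mean and standard deviation (illustrative values).
    """
    augmented = list(beats)
    while len(augmented) < target_count:
        beat = beats[rng.integers(len(beats))]
        z = (beat - beat.mean()) / (beat.std() + 1e-8)   # Z-score normalization
        new_mu = beat.mean() + rng.normal(0.0, 0.05)     # assumed mean perturbation
        new_sigma = beat.std() * rng.uniform(0.9, 1.1)   # assumed scale perturbation
        augmented.append(z * new_sigma + new_mu)
    return np.array(augmented)
```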

2.4 Convolutional neural network

The CNN is a well-known classifier introduced by Fukushima in 1980 [12] and later improved by LeCun et al. [13]; it consists of two main parts. The first part identifies and extracts features from the input data, which are learned automatically by the convolution and max pooling layers. The second part is a fully connected multi-layer perceptron (MLP), whose role is to accomplish the classification task [14].

A convolution layer takes as input the feature vector, or activation map, from the previous layer. A filter (kernel) convolves the input, a bias is added to the result, and the sum is passed through an activation function to produce the output. The new activation map then becomes the input to another convolution layer, a max pooling layer or a fully connected layer. If the last layer is fully connected and the application is classification, its activation function should be softmax. Equation (1) shows how the convolution layer is computed, where \(A\) is the activation (feature vector), \(l\) is the layer index, \(\sigma\) is the activation function introducing non-linearity, \(b_{k}\) is the bias of the kth activation map, \(N\) is the filter size, \(W_{n}^{k}\) is the weight of the nth filter element for the kth activation map, \(X_{i}^{0}\) is the input data vector and \(i\) indexes the samples of the activation map:

$$ A_{i}^{l,k} = \sigma \left( {b_{k} + \sum\limits_{n = 1}^{N} {W_{n}^{k} X_{i + n - 1}^{0k} } } \right) $$
(1)

Max pooling is slightly different from the convolution layer: the input from the previous layer is downsampled to produce the output activation map. This requires a pooling window size and a stride. Equation (2) shows how max pooling is computed, where \(T\) is the pooling window size and \(S\) is the pooling stride:

$$ P_{i}^{l,k} = \max_{t \in T} \left( A_{i \times S + t}^{l,k} \right) $$
(2)
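To make Eqs. (1) and (2) concrete, a direct NumPy rendering for a single filter and a single channel (a sketch that ignores padding and multiple feature maps) could be:

```python
import numpy as np

def conv1d_single_filter(x, w, b, activation=lambda z: np.maximum(z, 0.0)):
    """Eq. (1): 1D valid convolution of input x with filter w, bias b and ReLU sigma."""
    N = len(w)
    out_len = len(x) - N + 1
    a = np.empty(out_len)
    for i in range(out_len):
        a[i] = b + np.dot(w, x[i:i + N])   # b + sum_{n=1..N} W_n * X_{i+n-1}
    return activation(a)

def max_pool1d(a, T=2, S=2):
    """Eq. (2): max pooling of activation map a with window T and stride S."""
    out_len = (len(a) - T) // S + 1
    return np.array([a[i * S:i * S + T].max() for i in range(out_len)])

# Tiny usage example
x = np.array([0.1, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4])
a = conv1d_single_filter(x, w=np.array([1.0, -1.0, 0.5]), b=0.1)
p = max_pool1d(a, T=2, S=2)
```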

2.5 The architecture

Using a one-dimensional (1D) design, the CNN in this study is built from 11 layers: 4 convolution layers and 4 max pooling layers in the first part, and 3 fully connected layers in the second part. The max pooling layers alternate with the convolution layers, and all fully connected layers form the second part of the architecture. To control the number of features and extract them correctly in the first part of the CNN, the stride is set to 1 for convolution and 2 for max pooling. The window sizes of the convolution layers (layers 1, 3, 5 and 7) are set to 5, 7, 9 and 11, respectively, while the window size of the alternating max pooling layers (layers 2, 4, 6 and 8) is fixed at 2. In the second part of the CNN, which performs the classification task and consists of layers 9, 10 and 11, the numbers of neurons in the fully connected layers are 30, 20 and 10, respectively. All of these parameters were selected by trial and error over different values and were found to be the best. The ReLU function is adopted as the non-linear activation for the convolution layers, and the softmax function is used in the last layer to classify the ten arrhythmia types. The proposed CNN structure is depicted in Fig. 2, and its details are given in Table 2.
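Under these settings, a TensorFlow/Keras definition of the architecture might look roughly like the sketch below; the number of filters per convolution layer is not stated above, so the values used here are placeholders rather than the exact configuration of the proposed model:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_len=300, n_classes=10, n_filters=(16, 16, 32, 32)):
    """11-layer 1D CNN: 4 x (conv + max pool) followed by 3 fully connected layers."""
    model = models.Sequential([layers.Input(shape=(input_len, 1))])
    for filters, kernel in zip(n_filters, (5, 7, 9, 11)):        # conv windows 5, 7, 9, 11
        model.add(layers.Conv1D(filters, kernel, strides=1, activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=2, strides=2))    # pooling window 2, stride 2
    model.add(layers.Flatten())
    model.add(layers.Dense(30, activation="relu"))
    model.add(layers.Dense(20, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))      # ten arrhythmia classes
    return model

model = build_model()
model.summary()
```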

Table 2 A summary table of the proposed CNN model

3 Experiment

The proposed CNN model was implemented as depicted in Figs. 1 and 2, with its details given in Table 2. The data used to train and test the model are summarized in Table 1. In this section, the architecture described in the previous section is implemented and evaluated, its performance is measured and the results are compared with existing methods.

3.1 Training and testing

To train the CNN, stochastic learning with backpropagation, the standard tool for deep learning and for regular neural networks (NNs) [15], was used. The weights were updated according to Eq. (3) and the biases according to Eq. (4):

$$ W_{l} = \left( 1 - \frac{\alpha \lambda }{m} \right) W_{l - 1} - \frac{\alpha }{m_{b} }\frac{\partial J}{\partial W} $$
(3)
$$ b_{l} = b_{l - 1} - \frac{\alpha }{m_{b} }\frac{\partial J}{\partial b} $$
(4)

where \(W\) denotes the weights, \(b\) the bias, \(l\) the update index, \(\alpha\) the learning rate, \(\lambda\) the regularization parameter, \(m\) the total number of samples, \(m_{b}\) the batch size and \(J\) the cost function.
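A literal NumPy rendering of these update rules (as written here; the actual training described below uses the Adam optimizer) might be:

```python
import numpy as np

def sgd_update(W, b, dJ_dW, dJ_db, alpha=0.001, lam=0.01, m=84130, m_b=64):
    """One step of Eqs. (3)-(4): L2-regularized mini-batch gradient descent.

    alpha: learning rate, lam: regularization parameter, m: total training samples,
    m_b: mini-batch size; the gradient arguments are placeholders from backpropagation.
    """
    W_new = (1.0 - alpha * lam / m) * W - (alpha / m_b) * dJ_dW   # Eq. (3)
    b_new = b - (alpha / m_b) * dJ_db                              # Eq. (4)
    return W_new, b_new
```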

The actual experiment used the TensorFlow framework to implement the proposed CNN architecture. To obtain optimal performance during training, the Adam algorithm was adopted for optimization [16]. The experiment was run with random weight initialization chosen so as to avoid the vanishing and exploding gradient problems, which would otherwise degrade the performance of the whole algorithm [17]. The Adam parameters were kept at the TensorFlow defaults: the learning rate is 0.001, which aids convergence, and the momentum parameters are \(\beta_{1} = 0.9\) and \(\beta_{2} = 0.999\); the regularization factor was changed to 0.01, which helps to mitigate overfitting on the training data. The activation is ReLU for the inner layers and softmax for the last layer. The other hyperparameters were set as follows: the batch size is 64, which accelerates the learning process, and the model was trained for 100 epochs. The training and testing sets were produced by shuffling the whole augmented dataset of 105,199 heartbeats across the 10 classes and then splitting it automatically within the TensorFlow framework into 80% training and 20% testing data, i.e. 84,130 heartbeats for training and 21,069 for testing across all classes.
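Continuing the earlier architecture sketch, the training configuration described above could be expressed roughly as follows; the 80/20 split is shown here with scikit-learn for clarity, and the way the regularization factor of 0.01 enters the model is not detailed above, so it is not shown:

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# X: augmented beats of shape (105199, 300, 1); y: integer labels 0..9 (placeholders)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Batch size 64 and 100 epochs, as described above
model.fit(X_train, y_train, batch_size=64, epochs=100,
          validation_data=(X_test, y_test))
```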

3.2 Performance measures

In this study, the most convenient way to evaluate performance was found to be calculating the sensitivity (SEN) as in Eq. (5), the specificity (SPE) as in Eq. (6) and the precision (PRE) as in Eq. (7). The receiver operating characteristic (ROC) curve, which evaluates a binary model graphically across different decision thresholds, was used alongside SEN, SPE and PRE; here it was applied to the multi-label classifier in a one-vs-all manner. In addition, the area under the curve (AUC), a single number summarizing model performance obtained as the fraction of area under the ROC curve, lies between 0 and 1. For further confirmation of the results, micro- and macro-averaging were also used; these two metrics are based on precision, as given in Eqs. (8) and (9):

$$ SEN = \frac{TP}{{TP + FN}} $$
(5)
$$ SPE = \frac{TN}{{TN + FP}} $$
(6)
$$ PRE = \frac{TP}{{TP + FP}} $$
(7)
$$ PRE_{micro} = \frac{{TP_{1} + \cdots + TP_{k} }}{{TP_{1} + \cdots + TP_{k} + FP_{1} + \cdots + FP_{k} }} $$
(8)
$$ PRE_{macro} = \frac{{PRE_{1} + \cdots + PRE_{k} }}{k} $$
(9)

\(TP\) is the true positive, and \(TN\) is the true negative. \(FN\) is the false negative, and \(FP\) is the false positive. \(PRE_{micro}\) is the micro-average, and \(PRE_{macro}\) is the macro-average.
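A compact way to compute these per-class and averaged metrics from a k × k confusion matrix is sketched below in NumPy:

```python
import numpy as np

def per_class_metrics(cm):
    """SEN, SPE, PRE per class plus micro/macro precision from a k x k confusion matrix.

    cm[i, j] = number of beats of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)

    sen = tp / (tp + fn)                           # Eq. (5)
    spe = tn / (tn + fp)                           # Eq. (6)
    pre = tp / (tp + fp)                           # Eq. (7)
    pre_micro = tp.sum() / (tp.sum() + fp.sum())   # Eq. (8)
    pre_macro = pre.mean()                         # Eq. (9)
    return sen, spe, pre, pre_micro, pre_macro
```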

3.3 Results

This section presents the output of the study, in particular the results of the experiment, in detail. The extracted data vectors were applied directly to the CNN discriminator after being divided into training and testing sets.

The proposed CNN model was run on a standard x64-based PC workstation with an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz (2400 MHz, 2 cores, 4 logical processors) and 8.00 GB of installed physical memory (RAM). The performance of the CNN discriminator was first measured with a confusion matrix, presented in Table 3 and Fig. 4. Second, based on that confusion matrix, in which the diagonal elements represent the correctly classified arrhythmia heartbeats, PRE, SEN and SPE were calculated for every arrhythmia type individually, as reported in Table 4. Using the same testing data, ROC curves and their corresponding AUC values were computed for each arrhythmia type; a zoomed version of the plot is shown in Fig. 5. Finally, the micro- and macro-averages were plotted, and a zoomed version of that plot is shown in Fig. 6.
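The one-vs-all ROC curves and AUC values can be derived from the softmax outputs; a sketch with scikit-learn, continuing the earlier training sketch and assuming y_test holds integer labels and X_test the corresponding beats, is:

```python
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

y_score = model.predict(X_test)                 # softmax probabilities, shape (n, 10)
y_pred = y_score.argmax(axis=1)
cm = confusion_matrix(y_test, y_pred)           # basis for the per-class metrics above

y_bin = label_binarize(y_test, classes=list(range(10)))
for k in range(10):                             # one-vs-all ROC per arrhythmia class
    fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
    print(f"class {k}: AUC = {auc(fpr, tpr):.4f}")

# Micro-average ROC over all classes
fpr_mi, tpr_mi, _ = roc_curve(y_bin.ravel(), y_score.ravel())
print(f"micro-average AUC = {auc(fpr_mi, tpr_mi):.4f}")
```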

Table 3 Confusion matrix of the CNN
Fig. 4 Confusion matrix for the predictions of the CNN

Table 4 Performance parameters of the proposed method
Fig. 5 Zoomed ROC curves for the ten classes of the proposed model

Fig. 6 Micro- and macro-averages for the ten classes of the proposed model

3.4 Discussion

The main finding of this study is the high performance achieved when deep learning, applied here as a direct CNN implementation, is used to classify ten different arrhythmia types; the comparison presented in Table 5 shows this clearly. In addition, the proposed method reduces the number of stages compared to the conventional pattern recognition framework, where feature extraction must be handcrafted. Moreover, the size of the feature vector itself is reduced and controlled simply by the max pooling layers, whereas the conventional framework needs a statistical method such as PCA.

Table 5 Performance comparison

It was noted that the original dataset downloaded from the MIT-BIH database is neither large enough nor balanced enough to train the proposed model, so the dataset was augmented for all classes to obtain sufficient data and, at the same time, to solve the problem of class imbalance. This solution is suitable for now and performed well in the study, but it is recommended to collect all data directly from patients in order to develop a more trustworthy real-world application, which would help to generalize the model and deploy it.

This kind of research helps to build systems that connect remote doctors with their patients, and it can be deployed by cardiologists to support them in diagnosing and interpreting arrhythmias. Furthermore, ECG data can be transmitted efficiently through such a system, and assistive applications on mobile phones or wearable devices can be implemented effectively. It could also be used in intensive care units and in remote areas of developing countries where health care is inaccessible to poor people, helping to diagnose arrhythmia diseases.

To confirm the output, the performance of the proposed approach was compared with previous algorithms and studies from the literature. Table 5 reflects this comparison and shows that the proposed algorithm achieves the best performance measures, with an accuracy reaching 99.84%.

4 Conclusion

The stated purpose of the study has been achieved very satisfactorily. The deep learning approach, namely the CNN, has been fully implemented, leading to successful identification and automatic classification of arrhythmias, which is essential for diagnosing heart diseases. The experiment classified ten different arrhythmia types belonging to all five main arrhythmia groups named by the ANSI/AAMI EC57 standard, using a dataset downloaded from the MIT-BIH database. The output of the study can contribute to building systems that automatically connect remote doctors with their patients, and it can be deployed by cardiologists to support them in diagnosing and interpreting arrhythmias. Furthermore, it could be used in intensive care units and in remote areas of developing countries where health care is not accessible to poor people.

In future work, the authors or other researchers could use a sequential model to classify arrhythmias as another deep learning option and compare it with the CNN. Furthermore, the model could be implemented as a mobile app together with IoT devices to realize a mobile health monitoring system for heart diseases and be deployed in developing countries.