1 Introduction

ECG classification is a multi-domain task that requires efficient signal processing blocks. Optimization of ECG signal acquisition and filtering has been achieved through advances in signal acquisition technology and the integration of on-board digital signal processing (DSP) elements. In a typical ECG classification system, the QRS feature is used to classify ECG signals into normal and arrhythmia signals. Each ECG wave is made up of the following components. The P-wave is the initial wave of the ECG signal [1]; it is a low-amplitude, short-duration wave that has a typical interval of 0.12–0.22 s for regular heartbeat patterns. The QRS wave, also known as the heartbeat pulse, is a spike-like wave with a typical duration of less than 0.12 s [1]. The time difference between the ‘R’ peaks of consecutive beats is termed the ‘RR’ interval and is used for identification of arrhythmic heart signals. The ST segment is used for measuring depression in exercise stress testing [2]. The QT duration, also described as the duration of a beat, represents the condition of the heart. Variations in these measures from their normal values indicate an irregular rhythm called arrhythmia [3].

All these waveforms are used for feature extraction from the ECG signal. Once the features are extracted, they are fed to a feature selection unit, which passes some features on for classification and removes others based on their variance. The selected features with maximum variance are given to a classification block, such as a neural network, SVM, RF, or CNN, to differentiate between signals belonging to different categories. The results of this classification are given to a post-processing block that helps to analyse changes in the patterns of those signals and to evaluate future health risks for patients. A survey of the most recently proposed algorithms for ECG signal classification is presented in the next section, the overview of related literature. This section will help researchers to evaluate the nuances, advantages, and drawbacks of these systems. It is followed by the design of the proposed ensemble learning classifier, which uses a combination of a CNN with bio-inspired and linear classification algorithms such as random forest, kNN and SVC.

The contributions of this paper are summarized as follows:

  1. Design of a customized CNN model inspired by the VGGNet-16 (visual geometry group network) classifier.

  2. Design of an ensemble classification model that combines multiple classifiers to form an effective classifier system.

The remainder of the paper presents the suggested algorithm and compares it with a number of high-efficiency models that have previously been studied for ECG classification. It also suggests directions for further study in this area to enhance the functioning of the system as a whole.

2 Overview of related literature

Review of several works based on deep learning and machine learning

| Ref. No | Method used/Feature extraction technique | Description |
| --- | --- | --- |
| [4] | Morphological and dynamic features | An SVM classifier was used to classify distinct types of heartbeats using morphological and dynamic data |
| [5] | Optimal orthogonal wavelet filters | It was shown that wavelet decomposition with an orthogonal filter bank decreased stop-band energy and fuzzy entropy characteristics |
| [6] | Neural network (predictive coefficients and probabilistic) | ECG wavelet analysis revealed an R peak. RNN and SVM classifiers use QRS linear predictive coefficients |
| [7] | Dual-tree complex wavelet based features | Feature extraction was done using the discrete wavelet transform and the dual-tree complex wavelet transform |
| [8] | Feature methods with long short-term memory (LSTM) network model | An LSTM model was used to divide the MIT-BIH arrhythmia dataset into five types of arrhythmia beats based on these characteristics |
| [9] | DNN (deep neural network) with unsupervised feature extraction technique | Pre-training and fine-tuning were done using deep auto-encoders and deep neural networks, respectively |
| [10] | DNN and engineered features | On the 2017 PhysioNet/Computing in Cardiology challenge database, the ensemble technique does a better job of identifying arrhythmia than individual classifiers |
| [11] | Auto-encoders (stacked sparse) and softmax regression | An ECG arrhythmia classification algorithm based on softmax regression was reported. In-depth features were extracted from the MIT-BIH arrhythmia data set using stacked sparse auto-encoders |
| [12] | Deep CNN with long-duration ECG signals | The 1-D CNN model covers 17 kinds of arrhythmia beats. The MIT-BIH arrhythmia database was used to look at 10-s long-duration ECG signal fragments from one lead of 45 patients |
| [13] | CNN, LSTM | The model handled variable-length data with 98.10 percent accuracy using ten-fold cross-validation |
| [14] | LSTM-based auto-encoder | The auto-encoder decoder model received the reconstructed ECG signals with high-level characteristics using the LSTM model |
| [15] | Convolutional encoded features with bidirectional LSTM memory | A convolutional encoder initially encoded a bidirectional LSTM model to identify arrhythmias from ECG data |
| [16] | Multi-layer NN (neural network) and metaheuristic algorithm approach | Automatic ECG arrhythmia classification using multilayer perceptron (MLP) neural networks and enhanced metaheuristics was presented. The MLP classifier was trained and tested using particle swarm optimization |
| [17] | Combination of CNN, LSTM | The authors suggested a hybrid DL model using CNN and LSTM to detect six types of arrhythmia beats. To increase the hybrid model's performance, a varied number of people were used to train and test the dataset |

Researchers have proposed several powerful algorithms over the past several years to optimize the performance of ECG signal classification. Most of these algorithms use neural networks and other optimization techniques in order to optimize classification accuracy, precision, recall and f-Measure values.

For classification, some researchers have used machine learning [18,19,20,21,22,23,24] and deep learning [25,26,27,28,29,30] algorithms. Machine learning classifiers such as Naive Bayes [31, 32], Decision Tree [33], KNN [34, 35], SVM [36, 37], RF [38], Logistic Regression, optimization techniques [39] and others are used to categorize the dataset. CNNs [38, 39, 34] are often used by researchers in different areas. RNNs and multi-layered feed-forward neural networks (MLFFNN) [35] are types of artificial neural network (ANN) [34, 40] that improve on fixed-size input and output networks. Following this review, the suggested model's statistical evaluation and its comparison with other CNN-based models are presented, together with a few observations and research areas that academics might explore to enhance and adjust the recommended model.

2.1 Challenges

Selecting appropriate features is essential for optimal classifier performance, and picking the most essential attributes is a significant problem for standard disease classification methods. Standard deep learning suffers from high processing complexity and lengthy training times because of the large dimensionality of the data used for training. In the medical literature, there are several disease decision-support systems with varying degrees of precision. However, the vast majority of investigations have not fully examined missing data and feature selection. A lack of training data leads to over-fitting, which in turn leads to inaccurate predictions.

3 Materials and method

In order to enhance the accuracy of ECG classification models, an ensemble CNN network model with bio-inspired and linear classifiers is designed in this section. This design combines a VGGNet-16 inspired CNN, the discrete wavelet transform (DWT), a bio-inspired RF model, a linear SVM, and a kNN classifier to reduce classification errors. The architecture of the proposed model is shown in Fig. 1, wherein these classifiers are combined via weighted operations in order to achieve high accuracy, precision, recall, and F-measure.

Fig. 1 Methodology adopted

3.1 Database used

The MIT-BIH arrhythmia database is a publicly available dataset that includes standard investigation material for the identification of cardiac arrhythmia. Since 1980, it has been utilised for fundamental research and medical device development on heart rhythm and associated illnesses. For validating the proposed method, we have used ECG signals from the MIT-BIH dataset [41] (see Tables 1, 2 and Fig. 2).

Table 1 Dataset information
Table 2 Proposed CNN model and its layer-wise purpose
Fig. 2 Proposed CNN layer architecture for ECG classification system

The CNN is trained using the sparse categorical cross-entropy (SCCE) loss function and is optimized for accuracy. The CNN model has high accuracy, but is not able to differentiate between some cardiovascular disease (CVD) classes via feature convolutions. The hyperparameters used for these convolutions are given in Table 3.

Table 3 Hyperparameters of the VGGNet-16 model

Therefore, the same dataset is also given to a discrete wavelet transformation engine to evaluate wavelet features. These features are evaluated using Eqs. 1 and 2 as follows,

$$F_{dwt} = \mathop \sum \limits_{i = 0}^{N - 1} x\left( i \right)* \partial \left( {i - N} \right)$$
(1)
$$\partial \left( k \right) = \frac{1}{{a^{k} }}* \partial \left( {\frac{N}{{a^{k} }}} \right)$$
(2)

Here, ‘x’ is the ECG waveform, ‘\(\partial \left(k\right)\)’ is the wavelet function, ‘a’ is the wavelet constant, ‘N’ is the number of samples in the ECG waveform, and ‘\(F_{dwt}\)’ denotes the output wavelet features. Using these features, kNN, SVM and random forest classifiers are trained. After this training, all the test set values are given to each of these classifiers, and the following Eq. 3 is used to find the final class,

$$C_{out}=w_{cnn}*C_{cnn}+w_{knn}*C_{knn}+w_{svm}*C_{svm}+w_{rf}*C_{rf}$$
(3)

where ‘w’ are the weight factors and ‘C’ are the class outputs given by the classifiers. Weight factors are evaluated from historical accuracy values obtained by these classifiers in the literature, and it is observed that the following weights are optimal for the proposed model: \({w}_{cnn}=0.6\), \({w}_{knn}=0.2\), \({w}_{svm}=0.1\), and \({w}_{rf}=0.1\). Thereby, the final class for the given ECG waveform is evaluated and stored for further performance comparison. In order to evaluate the performance of the proposed model, the next section uses the entire MIT-BIH dataset and divides it into parts for evaluation of accuracy, precision, recall and f-Measure values. These values are compared with those of state-of-the-art methodologies.
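The weighted combination in Eq. 3 can be illustrated with a minimal sketch. It assumes that each class output ‘C’ is a per-class score vector (a one-hot vote or class probabilities) and that the final class is the index with the largest combined score; the text does not state the encoding, so this is an assumption. The helper names `one_hot` and `ensemble_class` are hypothetical.

```python
# Illustrative sketch of Eq. 3, not the authors' implementation.
# Assumption: each classifier output is a per-class score vector.

WEIGHTS = {"cnn": 0.6, "knn": 0.2, "svm": 0.1, "rf": 0.1}  # values from the text


def one_hot(label, n_classes):
    """Encode an integer class label as a one-hot score vector."""
    return [1.0 if k == label else 0.0 for k in range(n_classes)]


def ensemble_class(outputs, weights=WEIGHTS):
    """Return the class index with the largest weighted score (Eq. 3).

    outputs: dict mapping classifier name -> list of per-class scores.
    """
    n_classes = len(next(iter(outputs.values())))
    scores = [0.0] * n_classes
    for name, vector in outputs.items():
        for k, value in enumerate(vector):
            scores[k] += weights[name] * value
    # max() keeps the first maximum, so ties go to the lowest class index.
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical hard votes for one beat in a 5-class problem.
votes = {
    "cnn": one_hot(2, 5),
    "knn": one_hot(0, 5),
    "svm": one_hot(0, 5),
    "rf": one_hot(0, 5),
}
hard_result = ensemble_class(votes)

# Same beat, but the CNN reports (hypothetical) class probabilities instead.
soft_votes = dict(votes, cnn=[0.45, 0.0, 0.55, 0.0, 0.0])
soft_result = ensemble_class(soft_votes)

print(hard_result, soft_result)
```

With one-hot votes the CNN weight of 0.6 exceeds the combined weight of the other three classifiers (0.4), so the CNN label decides the outcome; when score vectors carry probabilities, the remaining classifiers can change the final class.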

3.2 Performance parameter

Precision measures the quality of a model's positive predictions and is used to evaluate model performance. It is the ratio of true positives to all positive predictions, i.e. it measures how many of the samples predicted as positive are actually positive.

Here, TP, FP, FN, and TN denote the numbers of true positives, false positives, false negatives, and true negatives, respectively.

$$\mathrm{Precision}\,\left(\mathrm{P}\right)=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(4)

Recall, or sensitivity, is the ratio of correctly identified positive samples to the total number of actual positive samples. It measures the model's ability to identify positive samples: a higher recall means that more of the positive samples are detected.

$${\text{Recall }}\left( {\text{R}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(5)

Accuracy indicates how well a machine learning model discovers correlations and patterns in a dataset based on its training data. It is the ratio of correct classifications to total classifications.

$${\text{Accuracy}}\,\left( {\text{A}} \right) = { }\frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(6)

The F-measure is the harmonic mean of precision and recall, weighing each equally. It allows a model's performance to be discussed and compared using a single score that combines precision and recall.

$${\text{F}} - {\text{Measure}}\left( {\text{F}} \right) = 2 \times \frac{{{\text{Precision }} \times {\text{Recall}}}}{{{\text{Precision }} + {\text{Recall}}}}$$
(7)
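As a worked illustration of Eqs. 4 to 7, the following sketch computes the four measures from raw counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name `classification_metrics` is likewise an assumption, and the sketch presumes that none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy, f_measure) as in Eqs. 4-7.

    Assumes tp + fp, tp + fn and precision + recall are all non-zero.
    """
    precision = tp / (tp + fp)                                  # Eq. 4
    recall = tp / (tp + fn)                                     # Eq. 5
    accuracy = (tp + tn) / (tp + tn + fp + fn)                  # Eq. 6
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. 7
    return precision, recall, accuracy, f_measure


# Hypothetical counts for a single class treated as "positive".
p, r, a, f = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(f"P={p:.4f} R={r:.4f} A={a:.4f} F={f:.4f}")
```

For a multi-class problem, the same function can be applied per class by treating that class as positive and all remaining classes as negative.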

4 Results and comparative analysis

A total of 109,446 sample records were collected for this research from MIT-BIH. Each of these records is categorized into 5 different CVD classes: Premature ventricular contraction, Fusion of ventricular and normal beat, Ventricular escape beat, Paced beat, and Normal. Performance parameters such as accuracy, precision, recall, and f-Measure are compared for CVD classification between the proposed model and the models defined in [7, 42,43,44,45,46]. The dataset was divided in a 70:30 ratio, wherein ~ 77 k records were used to train the model, while the remaining ~ 33 k records were used to test the model and evaluate its performance. Overfitting is thus addressed by dividing the set into training and testing samples and evaluating the performance parameters on the testing set. Table 4 showcases the test accuracy (TA) performance of all the models w.r.t. the number of ECG entries used for testing.

Table 4 Comparative study of proposed method ensemble CNN network model with other literature of MIT BIH database

From Fig. 3a, in which the accuracy values are plotted, it is observed that the average accuracy of the proposed model is better than that of existing models, while its ROC performance can be observed from Fig. 3b. Based on this performance, the confusion matrix is evaluated, and can be observed as follows,

Fig. 3 a Average accuracy for different algorithms. b ROC curve of the classification model. c Epoch training progress

 

Confusion matrix

[[18025    93     0     0     0     0]
 [    0   556     0     0     0     0]
 [    0     0   144     4     0     0]
 [    0     0     0   162     0     0]
 [    0     0     0     0  1605     3]
 [    0     0     0     0     0     0]]

From this matrix, it can be observed that almost all the entries are properly classified, while some entries from class 1, and class 4 are not categorized into the required ECG class.
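The off-diagonal cells of the confusion matrix are the misclassified entries, and they can be listed with a short sketch. The matrix values are copied from the text; the zero-based indices, the reading of rows as true classes and columns as predicted classes, and the helper name `misclassified` are assumptions, since the text does not state the ordering or the label of each row.

```python
# Confusion matrix as printed in the text (6 rows x 6 columns).
CONFUSION = [
    [18025, 93, 0, 0, 0, 0],
    [0, 556, 0, 0, 0, 0],
    [0, 0, 144, 4, 0, 0],
    [0, 0, 0, 162, 0, 0],
    [0, 0, 0, 0, 1605, 3],
    [0, 0, 0, 0, 0, 0],
]


def misclassified(matrix):
    """Return (row, column, count) for every non-zero off-diagonal cell."""
    return [
        (i, j, count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]


errors = misclassified(CONFUSION)
total_errors = sum(count for _, _, count in errors)

for i, j, count in errors:
    print(f"row {i} -> column {j}: {count} entries")
print("total off-diagonal entries:", total_errors)
```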

Table 5 showcases the test precision (TP) performance of all the models. From the test precision values, it can be observed that the proposed model is highly efficient and has better performance when compared to recent classification models. The average precision is around 99.47%.

Table 5 Test precision for combined dataset

The following Table 6 showcases the test recall (TR) performance of all the models. From the test recall, it can be observed that the proposed model is highly efficient and has better performance when compared to recent classification models.

Table 6 Test recall for combined dataset

The average recall is better when compared to recent models. The following Table 7 showcases the test f-Measure (TF) performance of all the models. From the test f-Measure, it can be observed that the proposed model is highly efficient and has better performance when compared to other recent classification models. The average f-Measure is around 99.60%. It can be observed that the proposed algorithm is superior in terms of all the performance parameters when compared with different existing models on different datasets. The accuracy of this model is saturated, and thus can only be improved marginally by using superior deep learning models for the same datasets.

Table 7 Test f-Measure for combined dataset

The proposed model gives an accuracy of 99.98%. This accuracy was validated via the following process,

  1. The model was initially evaluated with a training & testing ratio of 70:30, wherein standard deviation of samples was considered to divide the input datasets.

  2. This process was repeated for \(N=10\) iterations, with different sets of training & testing samples.

  3. Average accuracy from these iterations was used to obtain the final performance metrics.
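The validation steps above can be sketched as a repeated hold-out loop. This is a minimal illustration under stated assumptions: the split is a plain random shuffle (the text mentions that the standard deviation of samples was considered, but does not give the rule), the data are synthetic, and `majority_baseline` is a toy stand-in for the actual ensemble. The names `repeated_holdout` and `score_fn` are hypothetical.

```python
import random
import statistics


def repeated_holdout(samples, score_fn, n_runs=10, train_frac=0.7, seed=0):
    """Average a score over several random train/test splits.

    samples  : list of (features, label) pairs.
    score_fn : callable (train, test) -> accuracy in [0, 1] (hypothetical).
    Returns (mean accuracy, standard deviation across runs).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = samples[:]          # copy so the input order is untouched
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        scores.append(score_fn(shuffled[:cut], shuffled[cut:]))
    return statistics.mean(scores), statistics.stdev(scores)


def majority_baseline(train, test):
    """Toy model: always predict the most common training label."""
    labels = [label for _, label in train]
    guess = max(set(labels), key=labels.count)
    return sum(1 for _, label in test if label == guess) / len(test)


# Synthetic data for illustration only: 80 beats of class 0, 20 of class 1.
toy = [((i,), 0) for i in range(80)] + [((i,), 1) for i in range(20)]
mean_acc, std_acc = repeated_holdout(toy, majority_baseline)
print(f"mean accuracy {mean_acc:.3f}, standard deviation {std_acc:.3f}")
```

Fixing the seed keeps the ten splits reproducible, which makes the reported mean and standard deviation comparable between runs.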

These metrics assisted in the evaluation of the final accuracy on the given dataset. To further validate this process, Table 8 indicates the training progress in every epoch.

Table 8 Computation parameter of proposed model

Based on these runs (10 runs with different training & testing sets), standard deviation of error for the model was evaluated and can be observed from Table 9 as follows,

Table 9 Standard Deviation of error for different runs

Upon observing the standard deviation, it can be concluded that the model has low error, and thus can be used for a wide variety of real-time clinical applications.

5 Conclusion and future scope

In this research, a novel model is used to classify five heartbeat classes, namely N, S, V, F, and Q, with the MIT-BIH heartbeat database. The proposed network model learns and generalises well, with fast set-up, a collective (ensemble) structure, easy execution, and better accuracy. All performance parameter values are better when compared with recent algorithms. The performance of this model is confirmed on the standard MIT-BIH dataset, which makes the system applicable for real-time use cases. The proposed ensemble classification model of a CNN with random forest, SVM, and kNN outperforms other existing models on the MIT-BIH heartbeat database. The use of wavelet transforms for feature extraction reduces the feature length by maximizing the variance between feature sets of different classes, improving overall performance. The proposed model has an accuracy of 99.98%, a precision of 99.48%, and a recall of 99.73%, which gives a high f-Measure value of 99.6%. It is advised that the system be tested on other ECG datasets, including but not limited to PhysioNet, PTB datasets, and Mendeley datasets. The evaluation of real-time energy consumption and optimization for clinical use may form the future scope of the proposed research work.