Introduction

Cardiovascular disease (CVD), one of the foremost causes of human death, is responsible for an increasing number of deaths worldwide, especially in developing countries.[25] Cardiac arrhythmias, a principal form of CVD broadly described as irregular heartbeat, can be detected by electrocardiogram (ECG), which records the electrical activity of the myocardium and provides rich physiological information on the user’s heart state.[7] In ECG diagnosis, classifying heartbeats as normal or abnormal plays a vital role in both research and clinical practice.

In past years, many efforts have been made to develop automatic cardiac arrhythmia classification based on machine learning (ML) technologies. Researchers have combined features such as higher-order statistics of wavelet packet decomposition (WPD) coefficients, wavelet features and morphological features with various types of classifiers, such as k-nearest neighbor (KNN) and support vector machine (SVM), to recognize different classes of ECG signals.[12,15,16,20,28,29,31] Generally speaking, extracting appropriate features from ECG signals strongly affects the result of cardiac arrhythmia classification and prediction. Nevertheless, manually extracting appropriate features is hard: it demands professional knowledge and costs enormous human labor.[8] The selected features may also not be suitable for different types of datasets.

Apart from the aforementioned traditional ML approach, deep learning (DL), including convolutional neural networks (CNN) and recurrent neural networks, offers advantages in prediction tasks. In recent years, DL has become an important methodology that has been successfully adopted in computer vision, pattern recognition, bioinformatics, etc.[10] The benefits of DL include: (1) it does not require manual feature extraction but learns features directly from the data, and (2) it does not require choosing a separate classifier. Thus, DL removes much of the workload in model construction.

Recently, owing to their feasibility and portability, wearable devices have gradually been adopted for ECG signal collection.[6] Compared with hospital ECG equipment, a wearable ECG device is more user-friendly and more convenient for monitoring the heart status in real time. A large volume of dynamic ECG data can be collected in the user’s daily life, transmitted through the Internet of Things, and accessed by specialists for possible early diagnosis.[17]

However, conventional algorithms do not offer the flexibility to handle such huge volumes of wearable ECG data. For instance, user mobility introduces specific challenges, especially in terms of signal quality and real-time computing. Consequently, a noise-insensitive, robust and lightweight classification system is highly desired for cardiac arrhythmia detection based on ECG signals.

Generally, ECG classification models can be categorized into beat-based schemes and record-based schemes. In the beat-based scheme, the heartbeats of all patients are pooled, and beats are assigned to the training set or the test set without regard to the patient they come from. For cardiac arrhythmia classification, many researchers employ the beat-based scheme to improve classification accuracy (e.g.[16,28,29]). However, training is particularly prone to overfitting when the training and test samples come from the same patient. Overfitting makes the neural network model perform well on the training set but generalize poorly, with degraded performance on unseen test data. By contrast, record-based classification avoids this issue since all the beats in the test set come from unseen patients, which is much closer to the real scenario. Accordingly, in this article, a record-based classification model is proposed to match the practical application.

Previously, we proposed a conventional machine learning algorithm for cardiac arrhythmia classification.[30] In this study, our motivation is to develop an automatic method based on big ECG data to recognize cardiac arrhythmias with high accuracy and low computational cost. To reach our objective, a one-dimensional (1-D) seven-layer CNN model is designed to classify three types of ECG beats, namely normal beat (N), premature ventricular contraction beat (V), and right bundle branch block beat (R). Certain similarities exist between the three types of ECG beats; therefore, a detection system needs to be devised to distinguish them.

The main contributions of this work are summarized below: (1) We have designed a 1-D seven-layer CNN classification system that can automatically extract appropriate features and recognize three different types of beats (N, V, R) in wearable big ECG data for arrhythmia monitoring with superior performance. (2) A record-based ten-fold cross validation scheme is employed, i.e., the training set and the test set are separated completely, thus ensuring the independence of the samples and verifying the robustness of our approach, which in turn brings the experimental scenario closer to the clinical application. (3) In order to study the generalization ability of the proposed method, we validate the classification model on both original signals and de-noised signals. Experimental results verify its insensitivity to noise in wearable ECG signal processing without adjusting the parameters.

Related work

Currently, in clinical practice, cardiac arrhythmias are classified by an expert cardiologist who examines the patient's ECG recordings. This process is expensive as well as time-consuming. Thus, there is strong demand for an automatic classification method for diagnosing cardiac arrhythmias during treatment. It is worth noting that many efforts have been made in recent years on automatic cardiac arrhythmia classification, with good performance achieved especially through the joint investigation of big ECG data analytics and deep learning solutions.[1,2,5,13,14,18,19,23,24,27]

An eleven-layer deep CNN system was proposed by Acharya et al. to classify four types of arrhythmia disease, where they implemented two experiments.[1] The results achieved an accuracy of 0.925 for two seconds of ECG duration (experiment A), and an accuracy of 0.949 for five seconds of ECG duration (experiment B), both of which demonstrated high effectiveness in prediction. Sannino et al. designed a deep neural network (DNN) system for identifying abnormal beats from a large quantity of ECG signals.[23] The numerical results illustrated the effectiveness of the approach, especially in terms of accuracy with 0.9909. However, the signal denoising step increased the workload of this work.

Mathews et al. studied the application of the deep belief networks (DBN) and Restricted Boltzmann Machine (RBM) based on deep learning methodology for detecting supraventricular and ventricular heartbeats.[18] Experimental results demonstrated that DBN and RBM can achieve high average classification accuracies of 0.9363 for ventricular ectopic beats (VEB), and 0.9557 for supraventricular ectopic beats (SVEB) with suitable parameters.

Tan et al. utilized an eight-layer stacked long short-term memory (LSTM) network with CNN to classify ECG signals automatically.[27] The proposed deep learning model was able to identify CVDs with a diagnostic sensitivity of 0.9576, specificity of 0.957, and accuracy of 0.9985. Kiranyaz et al. used an adaptive implementation of one-dimensional (1-D) CNN for patient-specific ECG heartbeat classification.[14] Their classification experiments achieved a sensitivity of 0.939, specificity of 0.989, and accuracy of 0.99 for VEB, and a sensitivity of 0.603, specificity of 0.992, and accuracy of 0.976 for SVEB.

Currently, there is still no single CNN model for arrhythmia recognition that achieves similarly good performance on both original signals and de-noised signals. Moreover, for wearable ECG data classification, if the noise removal step can be omitted, the overall workload will be greatly reduced, which benefits real-time processing. Accordingly, in this work, a seven-layer CNN with 1-D convolution and 1-D mean-pooling is proposed, outperforming most of the current approaches by solving the aforementioned problems. The model is insensitive to noise in ECG signals taken from independent individual records, regardless of whether the signal is clean or not.

Methodology

Arrhythmia classification can be described as a pattern recognition problem. In this section, a DL ECG pattern recognition system for classifying cardiac arrhythmias is employed. The classification system contains four main modules: (1) Wearable ECG data acquisition, (2) Cloud platform, (3) Cardiac arrhythmias classification model, and (4) Remote treatment. Specifically, the innovations of this work focus on module 3, which has three consecutive steps: (i) Pre-processing, (ii) CNN pattern recognition model and (iii) Performance evaluation. With respect to the evaluation, our experiments involve two data sets denoted as data set A and data set B. Set A consists of raw ECG signals, while set B consists of filtered ECG signals. The feature extraction and classification tasks are both implemented automatically by the proposed CNN model.

Data Acquisition

The ECG data used are from the Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia database, which consists of 48 ECG records from 47 subjects.[21] Each record contains approximately half an hour of ECG signals, sampled at 360 Hz with 11-bit resolution over a 10 mV range. The database includes annotation files with record and beat class information, labeled by independent expert cardiologists. The three types of ECG heartbeats (R, V and N) from lead MLII of 45 records (records 102, 104 and 114 are not included) yield a total of 87223 ECG beats. The dataset suffers from a sample imbalance problem, with only 16.43% abnormal cases in the whole dataset. The division of the training set and the test set based on the ECG records is summarized in Table 1, and detailed numbers for each type of ECG beat are presented in Table 2.

Table 1 The division of the training set and the test set for the record-based cross validation scheme.
Table 2 Detailed numbers for each type of the ECG beats.

Pre-processing

Preprocessing, which includes noise removal and heartbeat segmentation, is an important step in data analysis. In this work, a wavelet transform is performed to remove both high-frequency and low-frequency noise, and a heartbeat segmentation algorithm is designed to obtain individual heartbeats. The specific details are presented below.

Noise Removal: The original ECG signals usually contain high-frequency noise caused by power line interference and low-frequency noise due to body movement. In this step, the wavelet transform (WT) is selected to analyze the ECG signals, since WT is suitable for non-linear and non-stationary signals. Moreover, WT de-noising preserves useful signal content because it can effectively distinguish high-frequency noise from high-frequency information.[4] Thus, in this work, WT is utilized to analyze the components of specific frequency sub-bands and to remove the noise.

First, the Daubechies-5 (db5) mother wavelet is utilized to decompose the ECG signals into nine high-frequency sub-bands and one low-frequency sub-band. After that, we remove the top three high-frequency sub-bands and the low-frequency sub-band; the remaining detail coefficient sub-bands, from the fourth to the ninth, are used to reconstruct the filtered signal by the inverse wavelet transform. This noise removal step is applied only to set B. There are two criteria to evaluate the effect of de-noising, namely minimum mean square error (MSE) and signal-to-noise ratio (SNR).[26]
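To make the effect of this sub-band selection more concrete, the sketch below lists the nominal frequency range of each sub-band for a nine-level dyadic decomposition at the 360 Hz sampling rate. It assumes idealised band edges (real db5 filters overlap) and reads "the top three high-frequency sub-bands" as detail levels D1 to D3; the function name `dyadic_bands` is hypothetical and is not taken from the original implementation.

```python
def dyadic_bands(fs_hz, levels):
    """Return nominal frequency bands (Hz) of a `levels`-deep dyadic wavelet decomposition.

    Detail level j nominally covers [fs / 2**(j + 1), fs / 2**j]; the final
    approximation covers [0, fs / 2**(levels + 1)]. These edges are idealised.
    """
    bands = {}
    for j in range(1, levels + 1):
        bands[f"D{j}"] = (fs_hz / 2 ** (j + 1), fs_hz / 2 ** j)
    bands[f"A{levels}"] = (0.0, fs_hz / 2 ** (levels + 1))
    return bands


bands = dyadic_bands(fs_hz=360.0, levels=9)
removed = ["D1", "D2", "D3", "A9"]  # assumed reading of the discarded sub-bands
kept = [name for name in bands if name not in removed]
passband = (min(bands[k][0] for k in kept), max(bands[k][1] for k in kept))

for name, (low, high) in bands.items():
    status = "removed" if name in removed else "kept"
    print(f"{name}: {low:8.4f} - {high:8.4f} Hz ({status})")
print("nominal pass band (Hz):", passband)
```

Under these assumptions the retained sub-bands span roughly 0.35 Hz to 22.5 Hz, and 50/60 Hz power line interference falls in the discarded D2 band (45 Hz to 90 Hz).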

Heartbeat Segmentation: In contrast to blind segmentation, we segment the input ECG data based on fiducial points. This ensures that each sample contains the essential information of the signal. According to the annotation file, ECG signals are segmented into individual heartbeats, each 300 points long and positioned with respect to the R-peak (fiducial point). Each beat is formed by the R-peak sample together with 99 samples forward and 200 samples backward, giving 300 points in total. Examples of the three different types of ECG beats used for set A and set B are displayed in Fig. 1.
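The fiducial-point segmentation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 99 samples precede the R-peak and the 200 samples follow it (the text does not fix the direction unambiguously), and it skips beats whose window would run past either end of the record. The function name and the toy data are hypothetical.

```python
def segment_beats(signal, r_peaks, before=99, after=200):
    """Cut one fixed-length window per annotated R-peak.

    Each window holds `before` samples, the R-peak sample itself, and `after`
    samples, i.e. before + 1 + after = 300 points with the defaults. Peaks too
    close to either end of the record are skipped rather than padded.
    """
    beats = []
    for r in r_peaks:
        start, stop = r - before, r + after + 1
        if start < 0 or stop > len(signal):
            continue  # incomplete window, skip this beat
        beats.append(signal[start:stop])
    return beats


# Toy record: a ramp of 1000 samples with three hypothetical R-peak indices.
signal = list(range(1000))
beats = segment_beats(signal, r_peaks=[50, 400, 900])
print(len(beats), len(beats[0]), beats[0][99])
```

In the toy record only the middle peak has a complete window, so one 300-point beat is returned with the R-peak at offset 99.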

Figure 1

Examples of three different types of ECG beats used. Original signals are shown in the top row, and de-noised signals are shown in the bottom row. From left to right: normal beat (N), premature ventricular contraction beat (V), and right bundle branch block beat (R).

Convolutional Neural Network Model

A CNN is a computational algorithm inspired by networks of biological neurons and is used to solve classification and prediction tasks. As one of the most popular neural networks, a CNN consists of many parameters and several hidden layers.[9] Unlike traditional machine learning approaches, which need expert knowledge and are time-consuming, a CNN does not require a set of appropriate features to be extracted manually. Instead, a CNN is able to extract features and complete the classification task automatically, which reduces the time needed for training and testing.[3] The architecture of our proposed CNN model, as shown in Fig. 2, involves an input layer, hidden layers (convolution, non-linearity, mean-pooling) and an output layer (classification).

Figure 2

The CNN classification model.

Table 3 The details of CNN structure of the set A and set B.

CNN architectures are built to take advantage of the two-dimensional (2-D) structure and pixel relations exploited in image recognition. ECG signals can be considered as 1-D images. Therefore, 1-D convolution is suitable as the convolutional layer for ECG signal feature extraction.

Given an input signal sequence \(s_t\), t=1, 2,..., n, and filter weights \(w_k\), k=1, 2,..., m, the filter is convolved in turn with the features of the preceding layer. The convolution output is as follows:

$$\begin{aligned} y(t)= \sum _{k=1}^{m} w_{k}*s_{t-k+1} \end{aligned}.$$
(1)
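As an illustration of Eq. (1), the sketch below evaluates the sum directly in plain Python. It assumes the output is computed only at positions where the whole filter overlaps the signal (t = m, ..., n), which the text does not state explicitly; the function name `conv1d` and the example values are hypothetical.

```python
def conv1d(s, w):
    """Evaluate y(t) = sum_{k=1..m} w_k * s_{t-k+1} for t = m..n (1-based).

    Only positions where the whole filter overlaps the signal are returned,
    so the output has n - m + 1 samples.
    """
    n, m = len(s), len(w)
    y = []
    for t in range(m, n + 1):           # 1-based output index t
        acc = 0.0
        for k in range(1, m + 1):       # 1-based filter index k
            acc += w[k - 1] * s[t - k]  # s_{t-k+1} is s[t - k] in 0-based indexing
        y.append(acc)
    return y


s = [1.0, 2.0, 4.0, 7.0, 11.0]
w = [1.0, -1.0]                         # first-difference filter
y = conv1d(s, w)
print(y)
```

With the first-difference filter, each output sample is the difference between consecutive input samples, which makes the index convention of Eq. (1) easy to check by hand.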

An activation function f(x) is required for the convolutional layer for nonlinear feature mapping. The rectified linear unit (ReLU) as an activation function is applied in the process of convolution. The definition of ReLU is as below:

$$\begin{aligned} f\left( x \right) ={\rm{max}}\left( 0,x \right) \end{aligned}.$$
(2)
Table 4 The classification performance (sensitivity, specificity, and accuracy) for set A and set B.

Then the input of neuron i in layer h is defined as below:

$$\begin{aligned} p_{i}^h= f\left(\sum _{j=1}^{m} w_{j}^h*p_{i-j+m}^{h-1}+b^h\right) =f\left(w^h*p_{(i+m-1):i}^{h-1}+b^h\right)\end{aligned},$$
(3)

where i=1, 2,..., n, \(b^h\) denotes the bias parameter, \(w^h\in R^m\) is an m-dimensional filter, \(p_{(i+m-1):i}^{h-1}=[p_{i+m-1}^{h-1},...,p_{i}^{h-1}]^T\), and \(w^h\) is the same for all neurons at the same convolutional layer.
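A minimal sketch of Eq. (3) for a single feature map is given below, combining the shared filter, the bias and the ReLU of Eq. (2). It assumes one input channel and no padding, so a previous layer of n + m - 1 samples produces n outputs; `conv_layer` and the numeric values are hypothetical and serve only to show the index convention.

```python
def relu(x):
    """Rectified linear unit, Eq. (2)."""
    return max(0.0, x)


def conv_layer(prev, w, b):
    """Compute p_i^h = f(sum_{j=1..m} w_j * p_{i-j+m}^{h-1} + b) for every valid i."""
    m = len(w)
    n_out = len(prev) - m + 1
    out = []
    for i in range(n_out):                 # 0-based neuron index
        window = prev[i:i + m]             # p_i ... p_{i+m-1} of the previous layer
        # w_1 multiplies the last sample of the window, w_m multiplies the first
        z = sum(w[j] * window[m - 1 - j] for j in range(m)) + b
        out.append(relu(z))
    return out


prev = [0.0, 1.0, 3.0, 2.0, -1.0]
w = [1.0, 0.0, -1.0]                       # same filter for every neuron in the layer
b = 0.5
layer = conv_layer(prev, w, b)
print(layer)
```

The last neuron has a negative pre-activation and is clipped to zero by the ReLU, while the same filter `w` and bias `b` are reused at every position.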

The pooling operation can be described as a sub-sampling process, which reduces the number of features and helps to avoid over-fitting, offering strong robustness. The average (mean) pooling approach is adopted in this step.
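The sketch below shows mean pooling over non-overlapping windows and, as a worked example, traces how the feature-map length would shrink for a 300-point beat using the kernel sizes (3, 4, 4) and pooling size (2) stated for the proposed network. The trace assumes convolution without padding, stride 1, and that a trailing partial pooling window is dropped; none of these details is confirmed by the text, and the function names are hypothetical.

```python
def mean_pool(x, size=2):
    """Average non-overlapping windows of `size` samples; a trailing partial window is dropped."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def trace_lengths(n_in, kernels=(3, 4, 4), pool=2):
    """Feature-map length after each convolution (no padding, stride 1) and pooling stage."""
    lengths, n = [], n_in
    for k in kernels:
        n = n - k + 1      # convolution without padding
        lengths.append(n)
        n = n // pool      # non-overlapping pooling
        lengths.append(n)
    return lengths


pooled = mean_pool([1.0, 3.0, 2.0, 6.0, 5.0])
lengths = trace_lengths(300)
print(pooled)
print(lengths)
```

Under these assumptions a 300-point input would pass through lengths 298, 149, 146, 73, 70 and 35 before the fully connected layer.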

The detailed parameters of the CNN structure are listed in Table 3. In our CNN structure, the input layer (layer 0) is convolved with a kernel of size 3 to get layer 1. A mean-pooling of size 2 is applied to each feature map (layer 2). Then, the feature maps from layer 2 are convolved with a kernel of size 4 to generate layer 3. A mean-pooling of size 2 is applied to each feature map (layer 4). The feature maps from layer 4 are then convolved with a kernel of size 4 to get layer 5. A mean-pooling of size 2 is applied to each feature map (layer 6). Finally, the neurons of every map in layer 6 are fully connected to 3 neurons in layer 7. The ReLU activation function is applied in layers 1, 3 and 5. The last layer is a softmax layer whose number of outputs equals the number of classes, classifying the ECG signals into N, V and R. In addition, the learning rate is set to 0.002. The hyperparameters and architecture are set based on an empirical study of the performance. The CNN classification model is presented in Fig. 2. The back propagation (BP) algorithm with a batch size of 20 samples is adopted to update the weights and biases.[11] The weights and biases are updated as follows.

Figure 3

Loss and accuracy for training epoch of net A. a Training loss. b Training accuracy.

$$\begin{aligned} w_{h}= \left(1-\frac{l}{v}\right)w_{h-1}-\frac{l}{g}\frac{\partial c}{\partial w} \end{aligned},$$
(4)
$$\begin{aligned} b_{h}= b_{h-1}-\frac{l}{g}\frac{\partial c}{\partial b} \end{aligned},$$
(5)

where l is the learning rate, v is the total number of training samples, g is the batch size and c is the cost function.
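One mini-batch step of Eqs. (4) and (5) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the gradients are taken to be summed over the mini-batch, the bias update is assumed to use the derivative of the cost with respect to the bias, and the number of training samples (1000) is a toy value. The learning rate 0.002 and batch size 20 are the values given in the text; `sgd_step` is a hypothetical name.

```python
def sgd_step(w, b, grad_w, grad_b, lr, n_train, batch_size):
    """One mini-batch update following Eqs. (4) and (5).

    grad_w and grad_b are assumed to be cost gradients summed over the mini-batch.
    """
    shrink = 1.0 - lr / n_train   # factor (1 - l / v) applied to the weights
    step = lr / batch_size        # factor l / g applied to the gradients
    new_w = [shrink * wi - step * gi for wi, gi in zip(w, grad_w)]
    new_b = b - step * grad_b     # the bias is not shrunk
    return new_w, new_b


new_w, new_b = sgd_step(
    w=[0.5, -0.25], b=0.1,
    grad_w=[2.0, -4.0], grad_b=1.0,
    lr=0.002, n_train=1000, batch_size=20,
)
print(new_w, new_b)
```

The factor (1 - l/v) shrinks each weight slightly at every step, whereas the bias is moved only by the gradient term.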

Performance Evaluation

Performance of the proposed approach is evaluated in terms of accuracy, sensitivity and specificity as defined below.

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{FN}} + {\text{TN}} + {\text{FP}}}},$$
(6)
$${\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}},$$
(7)
$${\text{Specificity = }}\frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}},$$
(8)

where TP is the number of true positives, FN the number of false negatives, FP the number of false positives, and TN the number of true negatives. Accuracy measures the overall performance of our approach, while the other two metrics measure how well a certain beat type is distinguished from the other beat types (e.g., V from non-V).[22]
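The one-versus-rest use of Eqs. (6) to (8) can be sketched as follows, treating one beat type as the positive class and all others as negative. The labels below are made-up values for illustration only, not data from the study, and `one_vs_rest_metrics` is a hypothetical helper.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity treating `positive` as the target class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for eight beats (not data from the study).
y_true = ["N", "N", "N", "V", "V", "R", "R", "N"]
y_pred = ["N", "N", "V", "V", "N", "R", "R", "N"]
v_metrics = one_vs_rest_metrics(y_true, y_pred, positive="V")
print(v_metrics)
```

The helper assumes the positive class and at least one other class occur in `y_true`; otherwise a denominator would be zero.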

The record-based ten-fold cross validation is applied in the experiment by dividing the data set into ten parts. Nine of them are taken as training data and the remaining part is used as test data. The record division of training and test sets for the record-based ten-fold cross validation scheme, presented in Table 1, is chosen to ensure that the training set and test set in each fold contain all ECG beat types.
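The key property of the record-based scheme, that no record contributes beats to both the training and test sets of a fold, can be sketched as below. The round-robin assignment and the record identifiers are assumptions made for illustration; the actual division used in the study is the one in Table 1, and this simple assignment does not by itself guarantee that every fold contains all beat types.

```python
def record_based_folds(record_ids, n_folds=10):
    """Assign whole records to folds round-robin, so no record is split across folds."""
    folds = [[] for _ in range(n_folds)]
    for idx, rec in enumerate(sorted(record_ids)):
        folds[idx % n_folds].append(rec)
    return folds


def train_test_split(folds, test_index):
    """Use one fold of records for testing and the remaining folds for training."""
    test = list(folds[test_index])
    train = [rec for i, fold in enumerate(folds) if i != test_index for rec in fold]
    return train, test


records = [f"rec{i:02d}" for i in range(45)]  # 45 hypothetical record identifiers
folds = record_based_folds(records)
train, test = train_test_split(folds, test_index=0)
print(len(train), len(test), set(train) & set(test))
```

Beats are then drawn only from the records in `train` for fitting and only from the records in `test` for evaluation, repeating the split for each of the ten folds.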

Table 5 Recent studies with better performance on the cardiac arrhythmias classification of ECG beats.

Results and Discussion

The proposed CNN model was trained and tested in MATLAB 2014b on a PC workstation with a 3.70 GHz CPU and 16 GB RAM. It takes about 1057 s to complete one epoch for set A and 1050 s to complete one epoch for set B. A total of twenty epochs of training and testing iterations are run for each of set A and set B.

Figures 3 and 4 display the loss and accuracy over the training epochs for net A and net B, respectively. The training accuracies obtained for set A and set B are 0.9453 and 0.9482, respectively, i.e., almost the same. However, the SNR and MSE values obtained for the filtering approach in set B are 34.0172 and 0.1129, respectively, which demonstrates that significant noise is present in the raw signal. Thus, from the training accuracy we can conclude that the designed CNN model is insensitive to the noise of the original input ECG signal.
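For reference, the two de-noising criteria can be computed as in the sketch below. The definitions are the commonly used ones and are an assumption here, since the text does not spell them out: the raw signal is taken as the reference, the difference between raw and de-noised signals is treated as the removed noise, and the SNR is expressed in decibels. The sample values are made up.

```python
import math


def mse(reference, estimate):
    """Mean square error between a reference signal and its de-noised estimate."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as the noise."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


raw = [1.0, 2.0, 3.0, 4.0]        # hypothetical raw samples
denoised = [1.0, 2.0, 3.0, 3.0]   # hypothetical de-noised samples
print(mse(raw, denoised), snr_db(raw, denoised))
```

Other conventions exist (for example, using a clean reference signal instead of the raw one), so reported values are only comparable when the same definition is used.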

Figure 4

Loss and accuracy for training epoch of net B. a Training loss. b Training accuracy.

The classification performance (accuracy, sensitivity and specificity) for set A and set B is presented in Table 4. Notably, our proposed model obtains an accuracy of 0.9874, a sensitivity of 0.9811 and a specificity of 0.9905 for set A, and an accuracy of 0.9876, a sensitivity of 0.9813 and a specificity of 0.9907 for set B. The experimental results on both set A and set B show that our proposed approach performs well even without noise removal; accordingly, denoising is no longer necessary in our scenario. The designed CNN model can extract appropriate features and classify the ECG beats efficiently and automatically, which saves the considerable time otherwise spent looking for effective features. Additionally, ten-fold cross validation based on individual records is applied to further boost the robustness of our algorithm.

As explained in “Methodology” section, the imbalanced dataset used in our experiments consists of 72896 normal beats, 7080 V beats and 7247 R beats, respectively. This sample imbalance problem usually results in low sensitivity for the classification algorithm. However, for the purpose of alignment with clinical needs, we do not artificially balance the data in order to improve the sensitivity of the results.

Recent studies with good performance on the cardiac arrhythmia classification of ECG beats are summarized in Table 5. Compared with these state-of-the-art solutions, our proposed CNN model has the following advantages:

(i) It can simplify the analysis of wearable large ECG data in arrhythmia detection application and increase the classification accuracy.

(ii) Record-based ten-fold cross validation is adopted, which guarantees the independence of the training and test sets, and further enhances the robustness of the proposed method.

(iii) It performs well on both the original ECG signals and de-noised signals since the model can learn appropriate filters by itself. In addition, if the denoising step is removed, overall workload is also reduced.

Nevertheless, our CNN model also has some limitations. For instance, this research is based on three types of ECG beats only; therefore, the number of beat types needs to be increased to satisfy clinical requirements.

Conclusion

In this article, a 1-D seven-layer CNN model is proposed for cardiac arrhythmia classification based on wearable big ECG data in arrhythmia monitoring applications. The model is able to extract appropriate features and distinguish three different types of ECG beats (R, V and N) automatically. It performs well on both the original ECG signals and de-noised signals; therefore, computational workload can be reduced by removing the denoising step. Ten-fold cross validation under a record-based scheme (i.e., individual patients) is adopted, which further enhances the robustness. The combination of wearable big data and artificial intelligence has narrowed the gap in daily healthcare, and our designed model and the corresponding algorithm provide opportunities for moving early cardiac arrhythmia detection from the clinic to daily life.

In the future, we plan to improve the current study mainly in two respects: (1) developing an ECG pattern recognition system that offers real-time services for smart home monitoring, and (2) investigating state-of-the-art unsupervised deep learning algorithms, for instance Generative Adversarial Networks, to identify unlabeled big ECG data.