1 Introduction

Cardiac arrhythmia is a type of cardiovascular disease (CVD) that threatens millions of lives around the world. The most straightforward way to identify arrhythmia is to perform a manual inspection of 24- to 72-hour electrocardiograms (ECG). Traditionally, to obtain such long-term ECG recordings, patients need to wear a Holter monitor for a continuous period, which is a very uncomfortable experience. The rapid growth of Internet-of-Things (IoT) techniques has spawned novel ways for heart status tracking, such as Fitbit, Apple Watch, or Android Wear [47]. In comparison to the Holter monitor, IoT-based devices are more human-friendly because they have fewer cords and smaller sizes, and cause fewer disruptions to patients' daily routines. On the other hand, the prevalence of IoT-based devices has also resulted in a dramatic increase in ECG data, posing a great challenge to ECG interpretation. Manual inspection has become time-consuming and error-prone, and is no longer feasible at this scale. An automated method is highly desirable to provide cost-effective screening for arrhythmia and allow at-risk patients to receive timely treatment.

Heartbeat classification plays a crucial role in the identification of arrhythmia. Heartbeats can be classified into five classes: Normal (N), Supra-ventricular ectopic (S), Ventricular ectopic (V), Fusion (F) and Unknown (Q) beats [6]. In particular, most arrhythmias are found in S and V beats. Figure 1 presents a sample ECG segment, where the problematic heartbeats are highlighted by circles. It can be seen that the S beat exhibits great morphological similarity in the temporal dimension to the normal heartbeats. Since ECG recordings are mostly dominated by normal heartbeats for the majority of patients [22], this similarity makes it very difficult to distinguish S beats from normal ones.

Figure 1

A sample ECG recording that contains N, S and V heartbeats. Note: RR-intervals denote the time distance between two successive R peaks

Many research attempts have been made to provide solutions for automated heartbeat classification. The existing methods can be roughly divided into feature-engineering based and deep-learning based methods. However, none of these methods has achieved clinical significance. Most feature-engineering methods face the bottleneck of applying a standalone classifier and using a static feature set to classify all heartbeat samples [11, 15, 16, 31, 50]. This has been shown to severely affect the identification of the problematic heartbeats. The deep-learning based methods are commonly limited to learning temporal patterns from the raw ECG heartbeats only; the frequency patterns and the RR-intervals have not been well considered to assist the classification. Moreover, to supply sufficient training data for driving the deep neural networks, many works [2, 3, 26, 49, 51, 54] followed a biased evaluation procedure, in which heartbeat samples were synthesized from the whole dataset and then randomly split for model training, validation and testing. Consequently, heartbeats from the same patient are likely to appear in both the training and test datasets, leading to an overestimation of the model performance. The overoptimistic results may hide potential limitations of the neural networks.

Besides, data quality also presents challenges for an IoT-based arrhythmia detection method. First, the IoT-based heart rate sensors may vary the measurement rate for battery preservation [7]. Second, the collected ECG recordings are likely to be corrupted by background noise and baseline wander (the effect that the base axis (X-axis) of individual heartbeats appears to drift up or down rather than remaining straight).

To solve these problems, we propose a framework for arrhythmia detection from IoT-based ECGs. The framework consists of a data cleaning module and a heartbeat classification module. Specifically, we provide two novel solutions for the heartbeat classification task. The first one is a feature-engineering based method, in which we introduce the Dynamic Ensemble Selection (DES) technique and specially design a result regulator to improve the detection of problematic heartbeats. The other is a deep neural network that performs multi-channel convolutions in parallel to capture both temporal and frequency patterns to assist the classification. To remedy the impact of neglecting heart rhythms, the proposed network accepts heart rhythms (RR-intervals) as part of the input. In order to reveal the performance of the proposed methods in real-world practice, we evaluate the models on the benchmark MIT-BIH arrhythmia database following the inter-patient evaluation paradigm proposed in [16]. The paradigm divides the benchmark database into a training and a test dataset at the patient level, making heartbeat classification a significantly more difficult task.

The rest of this paper is structured as follows. Section 2 reviews current methods in heartbeat classification. Section 3 presents the proposed framework and the two embedded solutions for heartbeat classification. The experiment results and discussion are presented in Section 4. Section 5 concludes this paper and discusses the future work.

2 Related work

This section provides a comprehensive review of current methods for heartbeat classification. As mentioned before, the existing methods can be roughly allocated to either the feature-engineering based or the deep-learning based category. The differences between them are summarized in Table 1.

Table 1 Comparison between feature-engineering based and deep-learning based methods

The feature-engineering based methods focus on signal feature extraction and classifier selection. Commonly used features include RR-intervals [4, 11, 52], samples or segments of ECG curves [35], higher-order statistics [4, 17], wavelet coefficients [15, 20, 37], and signal energy [50]. They are mostly extracted from the cardiac rhythm or the time/frequency domains. Feature correlation and effectiveness are important concerns for this type of method. To avoid the negative impact of noisy data, techniques such as the floating sequential search [29] and the weighted LD model [18] must be employed to reduce the feature space. Regarding the selection of classifiers, the support vector machine (SVM) is the most widely used for its robustness, good generalization and computational efficiency [1, 14]. Besides, nearest neighbors (NN) and artificial neural networks (ANN) are also frequently found in the literature. The performance of current feature-engineering based methods is mainly limited by the application of single classifiers and the use of fixed features to classify all heartbeat types. On one hand, given the intra- and inter-subject variations of the feature values, it is difficult for a single classifier to handle a wide region of the feature space well [53]. Although some ensemble methods, such as random forest [4] and ensembles of support vector machines [24], have been employed to remedy these disadvantages, the problem remains open because the diversity of the traditional ensembles is relatively low. On the other hand, using fixed features tends to cause sporadically occurring S beats to be wrongly classified as V beats, because both heartbeat types exhibit anomalies in heart rhythm.

By contrast, the deep-learning based methods are more straightforward and integrated: feature design and classifier selection are no longer concerns, as they provide end-to-end solutions to the heartbeat classification task. The existing deep learning models are mainly extensions of the convolution neural network (CNN) [2, 3, 26, 38] or combinations of CNN and recurrent neural network (RNN) [49, 51]. However, most of the CNN models do not consider frequency patterns and heart rhythm to assist the classification. Moreover, in order to provide enough training data, many of them are evaluated in an ideal experimental setting where heartbeats from the same patient are allowed to appear in both training and test sets. Such results cannot reveal the true performance of the models in real-world practice and may hide potential limitations of the methods. Compared to the feature-engineering based methods, both the results and the intermediate process of deep neural networks are less explainable. This is a potential impediment that prevents deep learning models from being widely applied in practice, because explainability is important for clinicians to justify and rationalize the model outcome.

3 The proposed framework for arrhythmia detection

The proposed framework for arrhythmia detection from IoT-based ECGs is presented in this section. Figure 2 shows the framework architecture and the whole life-cycle of arrhythmia detection from IoT-based ECGs. The framework consists of a data cleaning module and a heartbeat classification module. It accepts raw ECG signals collected from different IoT devices as input and outputs predictions for individual heartbeats.

Figure 2

Architecture of the proposed framework. The whole life-cycle of arrhythmia detection from IoT-based ECGs includes 4 phases: data collection, storage, analysis and results notification. Specifically, the ECG sensing network generates ECG recordings for patients and transmits the produced data to the IoT cloud, where fast-access storage is provided. The proposed framework is deployed in the IoT cloud to provide data analysis. Results from the framework are pushed to the patients' end devices via the Internet

To reduce the impact of noisy data on the prediction accuracy, the input signals undergo a series of preprocessing steps, such as frequency calibration, baseline correction, and noise reduction, before heartbeat classification. We propose two solutions, namely Dynamic Heartbeat Classification with Adjusted Features (DHCAF) and Multi-channel Heartbeat Convolution Neural Network (MCHCNN), for the heartbeat classification task. DHCAF is a feature-engineering based method, whereas MCHCNN is a deep-learning based method.

Details of the data cleaning module and two heartbeat classification solutions are presented below.

3.1 Data cleaning module

Frequency calibration

To avoid the possible bias in sampling frequency caused by different ECG collectors, we develop a frequency calibration component to re-sample all incoming ECG recordings to 360 Hz at the input of the system.
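As a concrete illustration, a minimal re-sampling sketch in Python is shown below. The function and variable names are ours, and the use of SciPy's polyphase resampler is an assumption, since the paper does not name a specific resampling algorithm.

```python
# Minimal sketch of the frequency calibration step (assumed implementation).
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

TARGET_FS = 360  # Hz, the framework's common sampling rate

def calibrate_frequency(signal: np.ndarray, original_fs: int) -> np.ndarray:
    """Re-sample a raw ECG recording from its device rate to 360 Hz."""
    if original_fs == TARGET_FS:
        return signal
    ratio = Fraction(TARGET_FS, original_fs).limit_denominator()
    # Polyphase resampling keeps edge artifacts small compared to naive FFT resampling.
    return resample_poly(signal, ratio.numerator, ratio.denominator)
```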

Baseline correction

To correct the baseline wanders, we process each ECG recording with a 200-ms width median filter followed by a 600-ms median filter to obtain the recording baseline, and then subtract the baseline from the raw ECG recording to get the baseline corrected data.
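A minimal sketch of this two-pass median filtering, assuming a 360 Hz signal and SciPy's medfilt, is given below; the window lengths follow the 200 ms and 600 ms values above, forced to odd sample counts as the filter requires.

```python
import numpy as np
from scipy.signal import medfilt

def correct_baseline(signal: np.ndarray, fs: int = 360) -> np.ndarray:
    """Estimate the baseline with cascaded 200 ms and 600 ms median filters and remove it."""
    w1 = int(0.2 * fs) | 1  # 200 ms window (73 samples at 360 Hz), made odd
    w2 = int(0.6 * fs) | 1  # 600 ms window (217 samples at 360 Hz), made odd
    baseline = medfilt(medfilt(signal, w1), w2)
    return signal - baseline
```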

Noise reduction

For noise reduction, we apply the discrete wavelet transform [39] with the Daubechies-4 mother wavelet function to remove the recordings' Gaussian white noise. The Daubechies-4 function has a short vanishing moment, which makes it well suited for analyzing signals such as ECG with sudden changes. Concretely, in the noise reduction component, the baseline-corrected recordings are decomposed into different frequency bands with various resolutions. The coefficients of detail information (cDx) in each frequency band are then processed by a high-pass filter with a threshold value

$$ T = \sqrt { 2 * \log (n ) }, $$

where n indicates the length of the input recording. Coefficients blocked by the filter are set to zero. Finally, the clean recordings are obtained by applying the inverse discrete wavelet transform to all the coefficients.
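The following sketch reproduces this denoising pipeline with PyWavelets. The decomposition level and the hard thresholding mode are our assumptions; the paper only fixes the Daubechies-4 wavelet and the threshold T.

```python
import numpy as np
import pywt

def denoise(signal: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    """DWT denoising with the universal threshold T = sqrt(2 * log(n))."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    threshold = np.sqrt(2 * np.log(len(signal)))
    # Keep the approximation coefficients, threshold the detail coefficients (cDx).
    cleaned = [coeffs[0]] + [pywt.threshold(c, threshold, mode="hard") for c in coeffs[1:]]
    return pywt.waverec(cleaned, wavelet)[: len(signal)]
```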

Heartbeat segmentation

The clean signals are segmented into individual heartbeats by taking advantage of the R peak locations detected by the Pan-Tompkins algorithm [36]. For each R peak, 90 samples (250 ms) before the R peak and 144 samples (400 ms) after it are taken to represent a heartbeat, which is long enough to capture the re-polarization of the ventricles and short enough to exclude the neighboring heartbeats [4].
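A minimal segmentation sketch is given below. The R peak positions are assumed to come from any Pan-Tompkins implementation, and beats whose window would run past the recording boundary are simply dropped.

```python
import numpy as np

def segment_heartbeats(signal: np.ndarray, r_peaks: np.ndarray,
                       before: int = 90, after: int = 144) -> np.ndarray:
    """Cut a fixed window (250 ms before + 400 ms after each R peak at 360 Hz)."""
    beats = []
    for r in r_peaks:
        if r - before >= 0 and r + after <= len(signal):
            beats.append(signal[r - before:r + after])
    return np.asarray(beats)
```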

3.2 Dynamic heartbeat classification with adjusted features

Architecture of the proposed DHCAF is shown in Figure 3. The model contains 4 processing stages: Feature Extraction, Classifier Pool Training, Classifier Selection and Prediction, and Result Refinement.

Figure 3

Architecture of the proposed DHCAF

Feature extraction

In this stage, three types of features are extracted to represent individual heartbeats: RR-intervals, higher order statistics and wavelet coefficients.

As experimentally proven in [52], the RR-interval is one of the most indispensable features for heartbeat classification and has great capacity to tell both the S and V beats from the normal beats. In this work, four types of RR-intervals are extracted from ECG signals: pre_RR, post_RR, local_RR and global_RR [30]. The RR-intervals can vary significantly between patients. To reduce the negative impact of this variation, we normalize the RR-intervals as follows:

$$ normalized\_pre\_RR = \frac{pre\_RR}{mean(ds.pre\_RR)} $$
(1)
$$ normalized\_post\_RR = \frac{post\_RR}{mean(ds.post\_RR)} $$
(2)
$$ normalized\_local\_RR = \frac{local\_RR}{mean(ds.local\_RR)} $$
(3)
$$ normalized\_global\_RR = \frac{global\_RR}{mean(ds.global\_RR)} $$
(4)

where mean(ds.pre_RR) denotes the average of all pre_RR values in the dataset (ds) to which the heartbeat belongs, and analogously for the other three intervals.
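To make Eqs. (1)-(4) concrete, a possible extraction-and-normalization routine is sketched below. The local_RR window over the last ten beats and the use of the recording mean as global_RR are assumptions based on [30]; the paper normalizes by dataset-level means, which this sketch approximates by the means over the beats passed in.

```python
import numpy as np

def rr_features(r_peaks: np.ndarray, fs: int = 360, local_window: int = 10) -> np.ndarray:
    """pre_RR, post_RR, local_RR and global_RR per beat, normalized as in Eqs. (1)-(4)."""
    rr = np.diff(r_peaks) / fs                            # RR-intervals in seconds
    pre, post = rr[:-1], rr[1:]                           # intervals before / after each beat
    local = np.array([pre[max(0, i - local_window + 1): i + 1].mean()
                      for i in range(len(pre))])          # mean over the last few beats (assumed)
    global_rr = np.full_like(pre, rr.mean())              # recording-level mean RR (assumed)
    feats = np.stack([pre, post, local, global_rr], axis=1)
    return feats / feats.mean(axis=0)                     # divide by the dataset-wide means
```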

Regarding the higher order statistics (HOS), they have been reported to be useful in catching subtle changes in ECG data [32]. In this work, the skewness (3rd order statistic) and kurtosis (4th order statistic) are calculated for each heartbeat. They are mathematically defined as follows, where \(X_{1},\dots,X_{N}\) denote all the data samples in a signal, \(\bar {X}\) is the mean and s is the standard deviation.

$$ Skewness = \frac{{\sum}_{i=1}^{N}(X_{i}-\bar{X})^{3}/N}{s^{3}} $$
(5)
$$ Kurtosis = \frac{{\sum}_{i=1}^{N}(X_{i}-\bar{X})^{4}/N}{s^{4}} - 3 $$
(6)
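Both statistics map directly onto SciPy; a short sketch (the function name is ours) follows.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def hos_features(beat: np.ndarray) -> np.ndarray:
    """Skewness (Eq. 5) and excess kurtosis (Eq. 6) of a single heartbeat."""
    return np.array([skew(beat), kurtosis(beat, fisher=True)])  # fisher=True subtracts 3
```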

The wavelet coefficients provide both time and frequency domain information of a signal and are claimed to be among the best features of the ECG signal [30]. The choice of the mother wavelet function used for coefficient extraction is crucial to the final classification performance. In this work, the Haar wavelet function is chosen because of its simplicity and because it has been demonstrated to be an ideal wavelet for short-time signal analysis [50].
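A possible coefficient extraction step is sketched below; the decomposition level is an assumption, as the paper does not state it.

```python
import numpy as np
import pywt

def haar_coefficients(beat: np.ndarray, level: int = 3) -> np.ndarray:
    """Flattened Haar wavelet decomposition of one heartbeat (level is assumed)."""
    return np.concatenate(pywt.wavedec(beat, "haar", level=level))
```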

Classifier pool training

In this stage, a collection of classifiers, including a multi-layer perceptron, a support vector machine (SVM), a linear SVM, a Bayesian model with a Gaussian kernel, a decision tree, and a K-nearest neighbors model, is trained on the extracted features to create an accurate and diverse classifier pool.
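A minimal scikit-learn sketch of the pool is given below. The hyperparameters are library defaults, GaussianNB stands in for the Bayesian model with a Gaussian kernel, and the linear SVM is wrapped to expose probability estimates; all of these choices are our assumptions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

def build_pool(X_train, y_train):
    """Train the heterogeneous classifier pool on the extracted features."""
    pool = [
        MLPClassifier(max_iter=500),              # multi-layer perceptron
        SVC(probability=True),                    # RBF-kernel SVM
        CalibratedClassifierCV(LinearSVC()),      # linear SVM with probability outputs
        GaussianNB(),                             # Bayesian model (Gaussian assumption)
        DecisionTreeClassifier(),
        KNeighborsClassifier(),
    ]
    return [clf.fit(X_train, y_train) for clf in pool]
```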

Classifier selection and prediction

This stage plays a core role in the model. The Dynamic Ensemble Selection (DES) [13] technique is introduced in this stage to select the most competent classifiers for making predictions on the test samples. It helps to handle both the intra- and inter-subject variations of the feature values.

In DES, the competence of a classifier in the pool is measured by its performance over a local region of the feature space where the test sample is located. Methods for defining a local region include clustering [28], k-nearest neighbors [40], the potential function model [44, 45] and the decision space [9]. The criteria for measuring the performance of a base classifier can be divided into individual-based and group-based criteria. In the individual-based criteria, each base classifier is independently measured by evaluation metrics such as ranking, accuracy, probabilistic, behavior [9], and meta-learning [12]. In the group-based criteria, the performance of a base classifier relates to its interactions with other classifiers in the pool. For example, diversity, data handling [46] and ambiguity [19] are widely used group-based performance metrics.

Once the candidate classifiers are selected, their results are aggregated to give a unified decision. There are three main strategies for result combination: static combiners, trained combiners and dynamic weighting. The majority voting scheme is a representative static combiner, which is also commonly used in traditional ensemble methods. In trained combiners, the outputs of the selected base classifiers are used as the input features for another learning algorithm, such as [8, 33]. In dynamic weighting, a higher weight is allocated to the more competent classifiers and the outputs of all the weighted classifiers are then aggregated to give the unified decision.

Currently, prevalent DES techniques that can be used in this stage include DES-KL [45], DES-KNN [41], KNORA-E [27], KNORA-U [27], KNOP [9], DES-P [45], DES-RRC [44], and META-DES [12]. Extensive experiments were conducted to evaluate these DES techniques in our previous work [21]. The results showed no significant difference in their performance. We adopt META-DES in this work since it has reported superior performance over a wider range of datasets.
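For readers who want to reproduce this stage, the open-source DESlib package provides a META-DES implementation. The sketch below shows its typical usage and is not necessarily the tooling used in our experiments.

```python
from deslib.des.meta_des import METADES

def dynamic_predict(pool, X_dsel, y_dsel, X_test):
    """Select the most competent classifiers per test sample with META-DES."""
    meta = METADES(pool_classifiers=pool)  # default k-NN region of competence
    meta.fit(X_dsel, y_dsel)               # DSEL: the dynamic selection dataset
    return meta.predict(X_test)
```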

Result refinement

The aggregated result from the previous stage is refined in this stage by our adjusted-features strategy. Specifically, we train an SVM classifier with only the HOS and wavelet coefficients (the RR-intervals are removed) to improve the results for S and V beats. The rationale of such a classification strategy is that the sensitivity to certain features varies with heartbeat type [52]. For instance, the RR-intervals are indispensable for identifying diseased heartbeats among normal ones. However, the RR-intervals can also make it harder to distinguish between different kinds of diseased heartbeats, such as S and V beats.
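One plausible reading of this refinement stage is sketched below: beats that the dynamic ensemble labels as S or V are re-examined by the rhythm-free SVM. The exact trigger condition for re-classification is not spelled out above, so this sketch should be treated as an assumption.

```python
import numpy as np
from sklearn.svm import SVC

def refine(ensemble_pred: np.ndarray, X_morph: np.ndarray, refiner: SVC) -> np.ndarray:
    """Re-examine beats flagged as S or V with an SVM trained on HOS + wavelet features only."""
    refined = ensemble_pred.copy()
    mask = np.isin(ensemble_pred, ["S", "V"])      # assumed trigger: ensemble says S or V
    if mask.any():
        refined[mask] = refiner.predict(X_morph[mask])  # RR-intervals deliberately excluded
    return refined
```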

3.3 Multi-channel heartbeat convolution neural network

The architecture of the proposed Multi-channel Heartbeat Convolution Neural Network (MCHCNN) is presented in Figure 4. The network accepts two inputs: the raw ECG heartbeat and the heart rhythm (RR-intervals). Motivated by an electroencephalogram (EEG) processing network [42] which uses different sizes of convolution filters to capture temporal and frequency patterns from EEG signals, the proposed MCHCNN performs 3 channels of convolutions in parallel on the input ECG heartbeats to extract the temporal and frequency information. The convolution filter size varies with the channel: smaller filters are used to capture temporal patterns and larger filters are used to capture frequency patterns. We denote the convolution process as Conv(x, y) in Figure 4, where x is the convolution filter size and y is the number of output feature maps. Each convolution operation is followed by batch normalization and a ReLU activation. The batch normalization normalizes the output of the convolution by subtracting the batch mean and dividing by the batch standard deviation, which reduces the problem of internal covariate shift [25] and overfitting. The ReLU activation allows the network to extract nonlinear features.

Figure 4

Architecture of the proposed Multi-channel Heartbeat Convolution Neural Network (MCHCNN)

Every three stacked convolutions are wrapped into a building block and bypassed by a shortcut connection. The learned features are added to the shortcut at the end of each building block. Such a design helps to reduce the network degradation problem [23]. Each channel contains 3 building blocks. Learned features from the three channels are integrated by addition before a pooling layer. The pooling layer is used to reduce feature dimensions, after which the learned features are reduced to half-size. It helps to reduce the number of parameters in the following fully connected layer and lower the risk of overfitting.

A Rhythm Integration layer is specially designed to concatenate the learned features with the input heart rhythms. It remedies the neglect of heart rhythms, which hampers the identification of diseased heartbeats in many existing network models.

Next, a dense layer is used to learn non-linear combinations of the learned features, and a softmax layer outputs the probability of each heartbeat type.
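A Keras sketch of this layout is given below. The kernel sizes, filter counts, dense width and pooling size are assumptions made for illustration; only the overall structure (three parallel channels of three residual building blocks each, channel addition, pooling, rhythm concatenation, dense and softmax layers) follows the description above.

```python
from tensorflow.keras import Model, layers

def building_block(x, filters, kernel):
    """Three stacked convolutions bypassed by a shortcut connection."""
    shortcut = x
    for _ in range(3):
        x = layers.Conv1D(filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    if shortcut.shape[-1] != filters:                   # match widths for the residual add
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    return layers.Add()([x, shortcut])

def build_mchcnn(beat_len=234, n_rhythm=4, n_classes=5):
    beat = layers.Input(shape=(beat_len, 1))            # raw heartbeat (90 + 144 samples)
    rhythm = layers.Input(shape=(n_rhythm,))            # RR-interval features
    channels = []
    for kernel in (3, 7, 15):                           # small -> temporal, large -> frequency (assumed sizes)
        x = beat
        for _ in range(3):                              # three building blocks per channel
            x = building_block(x, 32, kernel)
        channels.append(x)
    x = layers.Add()(channels)                          # integrate channels by addition
    x = layers.MaxPooling1D(pool_size=2)(x)             # halve the feature length
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, rhythm])               # Rhythm Integration layer
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return Model(inputs=[beat, rhythm], outputs=out)
```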

4 Evaluation

In this section, we evaluate the proposed framework equipped with DHCAF and with MCHCNN, respectively. The MIT-BIH-AR database [34] is used as the benchmark database. It is the most representative database for arrhythmia detection and has been used in most of the published research [16]. Details of the database are given below.

4.1 The MIT-BIH-AR database

The MIT-BIH-AR database contains 48 two-lead ambulatory ECG recordings from 47 patients (22 females and 25 males). Each recording is approximately 30 minutes in length. The recordings were digitized at 360 Hz. For most of them, the first lead is the modified limb lead II (except for recording 114). The second lead is a precordial lead (usually V1, sometimes V2, V4 or V5, depending on the subject).

In order to reveal the performance of the proposed framework, we follow the evaluation paradigm proposed in [16] to divide the database into a training and a test dataset. The paradigm prevents heartbeats of the same patient from appearing in both the training and test stages, ensuring a fair evaluation. Table 2 shows the division details, where DS1 is the training set and DS2 is the test set.

Table 2 Recording distributions and class proportions on DS1 and DS2

Since DS1 is extremely imbalanced and dominated by N beats, we apply the SMOTEENN technique [5, 10, 43] on DS1 to over-sample the minority heartbeats (S and V) to the same amount as the N beats.
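This step corresponds to imbalanced-learn's SMOTEENN combiner; a minimal sketch (the function name and random seed are ours) is shown below.

```python
from imblearn.combine import SMOTEENN

def balance_training_set(X_train, y_train, random_state=42):
    """Over-sample minority S and V beats with SMOTE and clean noisy samples with ENN."""
    return SMOTEENN(random_state=random_state).fit_resample(X_train, y_train)
```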

4.2 Evaluation metrics

The evaluation metrics used in this work are sensitivity (Se), positive predictive value (+P) and accuracy (Acc), as formulated below,

$$ Se = \frac{TP}{TP+FN} $$
(7)
$$ +P = \frac{TP}{TP+FP} $$
(8)
$$ Acc = \frac{TP+TN}{\sum} $$
(9)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively, and \(\sum \) represents the total number of instances in the dataset. According to the AAMI standard [16], penalties are not applied for the misclassification of F and Q beats, as they are naturally unclassifiable.
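Equations (7)-(9) can be read off a confusion matrix directly; the helper below, which assumes rows hold the true classes, is a sketch of this calculation.

```python
import numpy as np

def per_class_metrics(conf: np.ndarray, class_idx: int):
    """Se and +P for one class, plus overall Acc, from a confusion matrix (rows = truth)."""
    tp = conf[class_idx, class_idx]
    fn = conf[class_idx].sum() - tp
    fp = conf[:, class_idx].sum() - tp
    se = tp / (tp + fn)                        # Eq. (7)
    ppv = tp / (tp + fp)                       # Eq. (8)
    acc = np.trace(conf) / conf.sum()          # Eq. (9)
    return se, ppv, acc
```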

4.3 Results of the proposed framework

Confusion matrices of the proposed framework with DHCAF and with MCHCNN on DS2 are presented in Table 3. We summarize the results and compare our framework with multiple state-of-the-art methods in Table 4. All results reported in Table 4 are obtained under the same evaluation paradigm on DS2 of the MIT-BIH-AR database.

Table 3 Confusion matrices of DHCAF and MCHCNN on DS2
Table 4 Arrhythmia detection results of the proposed framework and the state-of-the-art methods on DS2

It is clear that the proposed framework with DHCAF achieves the best sensitivity for both class S and class V, and maintains a good performance in overall accuracy and in the classification of class N. Shan's model [11] obtains the highest accuracy and class N sensitivity. However, it fails in the detection of class S, with a class S sensitivity of merely 29.5%, which limits the model's practical significance. The proposed framework with MCHCNN outperforms DHCAF in terms of the overall accuracy, the sensitivity of N beats, and the positive predictive value of S beats, but its sensitivity for S beats is less satisfactory. In fact, the positive predictive values of S beats for most of the works listed in Table 4 are relatively low compared to the other metrics. This is mainly caused by some N beats being misclassified as S beats. As mentioned in the Introduction, the similar QRS complex and the data imbalance problem make it very difficult to distinguish S beats from N beats. We compare the proposed framework with MCHCNN to another deep-learning based method by Sellami et al. [38], which reports model performance under the same unbiased evaluation. The results show that Sellami's work achieves a promising performance on the identification of both the problematic S and V beats, close to that of the proposed framework with DHCAF. However, this comes at the cost of the overall accuracy and the sensitivity of normal beats. In real-world practice, misclassifying a large number of normal heartbeats as diseased heartbeats results in an unnecessary waste of medical resources.

From the above analysis, the proposed framework with DHCAF is believed to be a more appropriate choice than the other listed works for cardiac arrhythmia detection, because it achieves the best identification performance on diseased heartbeats while maintaining a good overall accuracy and good classification performance on the normal heartbeats.

4.4 Ablative analysis

We perform ablative analysis for the proposed DHCAF and MCHCNN to demonstrate the effectiveness of the model architectures. The results are summarized in Table 5 and Table 6, respectively.

Table 5 Ablative analysis of DHCAF
Table 6 Ablative analysis of MCHCNN

Two baselines are used in the ablative analysis of DHCAF. One is DHCAF with the result refinement stage removed. The other is DHCAF with the dynamic ensemble selection replaced by an ensemble of SVM classifiers. It is apparent that the result regulator makes a unique contribution to DHCAF: with it, the overall accuracy, the sensitivity of class S, and the positive predictive values of classes S and V are visibly increased. On the other hand, the poor classification performance of the SVM ensemble demonstrates the importance of introducing dynamic ensemble selection into the proposed method.

As discussed in Sections 1 and 2, many existing deep neural network models have not taken heart rhythms into account for heartbeat classification, but this limitation is hidden by the over-optimistic results obtained under a biased evaluation paradigm. In the ablative test of MCHCNN, we want to know the actual impact of heart rhythms on model performance. Therefore, we construct a baseline MCHCNN that only takes raw ECG heartbeats as input. The results, as seen in Table 6, indicate that heart rhythm (RR-intervals) is necessary for the identification of diseased heartbeats. Without considering heart rhythm, the baseline can hardly detect S beats, and the detection of V beats is also affected. This outcome is in line with the medical facts. As we can see in Figure 1, most V beats present a huge morphological difference from other heartbeats, which is why the baseline can still maintain 73.8% sensitivity on V beats. For S beats, however, the heart rhythm is essential for distinguishing them from the normal heartbeats.

Although heartbeat rhythms are part of the input to the proposed MCHCNN, its S beat detection performance is still less satisfactory. This indicates that the raw heartbeat rhythms provide limited assistance to our MCHCNN in identifying S beats. A possible explanation is that the heartbeat rhythms are not integrated well into the network and are easily dominated by the other learned features. A future study is needed to investigate this issue.

5 Conclusion

Millions of people around the world suffer from cardiac arrhythmia. In this work, we propose a framework for automated arrhythmia detection from IoT-based ECGs. The framework consists of two modules: a data cleaning module to tackle the challenges presented by IoT-based ECGs, and a heartbeat classification module for identifying the diseased heartbeats. Specifically, we propose two solutions, DHCAF and MCHCNN, for the heartbeat classification task. DHCAF is a feature-engineering based method which introduces dynamic ensemble selection techniques and uses an adjusted-features strategy to assist the identification of diseased heartbeats. By contrast, MCHCNN is an end-to-end solution that performs multi-channel convolutions to capture both the temporal and frequency information from the raw heartbeats to improve the classification performance. We evaluate the proposed framework on the MIT-BIH-AR database under the inter-patient evaluation paradigm. The results show that the proposed framework with DHCAF is a qualified candidate for automated arrhythmia detection from IoT-based ECGs. Besides, although the S beat detection performance of MCHCNN is less satisfactory, the network still provides useful insights for our future study.

This work is a first step towards a solution for automated arrhythmia detection in the era of the Internet-of-Things. In our next study, we aim to investigate a more effective way to integrate heart rhythms into a neural network.