Keywords

1 Introduction

Cardiovascular disease is a serious problem with a high fatality rate and can easily lead to multiple complications, posing a significant threat to human health [1,2,3]. Arrhythmia is one of the most common types of cardiovascular disease; therefore, the automatic diagnosis of arrhythmia has attracted attention from researchers. The ECG is widely used in the diagnosis of arrhythmia because it is noninvasive and provides rich information about heart rhythm, making the diagnostic process convenient for medical workers [4, 5]. However, ECG signals are nonlinear, and small changes may be overlooked when an ECG is inspected by the naked eye; moreover, an accurate manual diagnosis of arrhythmia requires a 24 h Holter recording, which is cumbersome and time-consuming [6,7,8]. Therefore, computer algorithms are needed to diagnose arrhythmia. Such algorithms can also improve the accuracy and robustness of the diagnosis while reducing diagnosis time and workload.

With the development of information technology, many arrhythmia classification methods based on computer algorithms have emerged, some of which follow the traditional ECG analysis pipeline: features are first extracted from the ECG signals [9, 10] and then fed into a classifier such as a support vector machine or a random forest [11,12,13,14]. However, because the features are extracted manually, they may not fully reflect the true ECG signal, and important information may be lost. Therefore, it is difficult to obtain the best arrhythmia detection results using traditional machine learning with handcrafted features alone.

Compared with traditional machine learning, deep learning offers greater advantages: deep models learn hidden features directly from the data, and no manual feature extraction is required [15]. Deep learning has also been applied in many studies of ECG signals. Xiong et al. proposed a 16-layer one-dimensional CNN to classify ECGs [16], and Acharya et al. proposed an 11-layer deep CNN as a computer-aided ECG diagnosis system that automatically classifies four types of ECG signals [6]. Fujita et al. used a CNN with either raw data or a continuous wavelet transform as input to classify four types of ECG signals [17]. Oh et al. used a CNN and a long short-term memory (LSTM) model to diagnose normal sinus rhythm, left bundle branch block, right bundle branch block, premature atrial beats, and premature ventricular beats, achieving good classification results [7]. Zheng et al. converted one-dimensional ECG signals into two-dimensional grayscale images and used a combined CNN-LSTM model to detect and classify the input data [18]. These studies show that applying deep learning to ECG classification significantly improves system performance. Neural networks can extract complex nonlinear features from raw data without manual intervention, thereby improving classification results. However, learning discriminative ECG features with the high accuracy required for monitoring remains difficult. CNNs are well suited to feature extraction, and BiLSTM models [19] to learning dependencies in sequences; combining them for arrhythmia monitoring can improve the accuracy and stability of automatic arrhythmia detection.

In this paper, we propose an end-to-end arrhythmia detection method that fully exploits the complementary advantages of CNN and BiLSTM networks. The proposed ECG classification algorithm, called CNN-BiLSTM, identifies and classifies abnormal heartbeats in ECG signals. The contributions of this study are as follows.

  1. The CNN has advantages in local feature extraction, as shown in image processing, and the BiLSTM model compensates for the shortcomings of the CNN in modeling sequential context. Therefore, the end-to-end CNN-BiLSTM network can effectively improve the accuracy of arrhythmia detection.

  2. Adaptive segmentation and resampling are adopted to align the heartbeats of patients with different heart rates. Multi-scale signals that represent electrocardiographic characteristics can be used as the network input to extract multi-scale features.

  3. Using a small amount of data as the network input reduces the computing resources required while still yielding good experimental results. This improves the generalization of the network model and provides a high-precision classification method that meets the needs of automatic detection.
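The paper does not spell out its segmentation rule, so the following is only a minimal NumPy sketch of one common way to align beats across heart rates. It assumes R-peak positions are already known (for example, from the database annotations), cuts each beat from the midpoint of the preceding RR interval to the midpoint of the following one so the window scales with the local heart rate, and then linearly resamples every beat to a fixed length. The function name `segment_and_resample` and the output length of 256 samples are hypothetical choices, not values from this study.

```python
import numpy as np

def segment_and_resample(signal, r_peaks, out_len=256):
    """Cut one window per heartbeat, sized by the neighbouring RR intervals,
    and resample it to out_len samples (hypothetical rule, for illustration)."""
    signal = np.asarray(signal, dtype=float)
    beats = []
    # Skip the first and last peaks: each lacks a neighbour on one side.
    for k in range(1, len(r_peaks) - 1):
        start = (r_peaks[k - 1] + r_peaks[k]) // 2  # midpoint of previous RR interval
        end = (r_peaks[k] + r_peaks[k + 1]) // 2    # midpoint of next RR interval
        segment = signal[start:end]
        # Linear interpolation onto a fixed grid aligns beats of different durations.
        old_x = np.linspace(0.0, 1.0, num=len(segment))
        new_x = np.linspace(0.0, 1.0, num=out_len)
        beats.append(np.interp(new_x, old_x, segment))
    return np.stack(beats)

# Usage: a synthetic trace whose second beat is twice as fast as the first.
sig = np.sin(np.linspace(0, 20, 2000))
peaks = [100, 500, 700, 800]
beats = segment_and_resample(sig, peaks)
print(beats.shape)  # (2, 256)
```

Both beats come out with 256 samples even though their raw windows span 300 and 150 samples, which is the alignment property contribution II relies on.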

Section 1 of this paper introduces the research background of automatic arrhythmia detection and the related algorithms that have been implemented. Section 2 describes the preprocessing performed before the experiment and introduces the data and the network structures used; Section 2.3 presents the experimental details and results. Section 3 provides concluding remarks and areas of future research.

2 Materials and Methods

2.1 Description of Dataset

In this study, ECG signals were obtained from the MIT-BIH Arrhythmia Database, an internationally recognized open-source database [20]. It includes 48 ECG records from 47 subjects. Each record contains a 30 min ECG signal digitized at 360 samples per second over a 10 mV range with 11-bit resolution. Each record is accompanied by a machine-readable annotation file. A normal ECG signal is shown in Fig. 1.
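For a sense of scale, the acquisition parameters stated above imply the following quantities. This is plain arithmetic on the nominal values in the text (actual record lengths may differ slightly from exactly 30 min).

```python
# Nominal MIT-BIH recording parameters as stated in the text.
fs_hz = 360         # samples per second
duration_min = 30   # nominal record length in minutes
adc_bits = 11       # ADC resolution
range_mv = 10.0     # input range in millivolts

samples_per_record = fs_hz * duration_min * 60  # samples in one channel of one record
levels = 2 ** adc_bits                          # distinct quantization levels
mv_per_step = range_mv / levels                 # amplitude of one quantization step

print(samples_per_record)             # 648000
print(levels)                         # 2048
print(round(mv_per_step * 1000, 2))   # 4.88 (microvolts per step)
```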

Fig. 1. A normal ECG signal

This study used four beat types from the database: normal beats (N), ventricular premature beats (V), left bundle branch block beats (L), and right bundle branch block beats (R). Their waveforms are shown in Fig. 2. Ventricular premature beats often manifest as wide, deformed QRS complexes, with the T wave directed opposite to the QRS complex. In L beats, the QRS complex often appears as a broad R wave with prolonged duration. In R beats, the QRS complex is often M-shaped, with a wide, notched R wave.

Fig. 2. Waveforms of N, V, R, and L beats

Normal beats dominate the dataset; thus, we selected a subset of the normal beats and balanced the data of the remaining three beat types to avoid bias in the experimental results. Because neural networks converge better when the input values lie in a consistent range, we normalized the experimental data. In this study, 80% of the data were used for training, and 20% were used for testing.
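The text does not state exactly how the classes were balanced or which normalization was used. A minimal NumPy sketch, assuming random undersampling of every class to the size of the rarest class, per-beat min-max scaling to [0, 1], and a shuffled 80/20 split (all three are assumptions), could look like this; `balance_normalize_split` is a hypothetical helper name.

```python
import numpy as np

def balance_normalize_split(beats, labels, train_frac=0.8, seed=0):
    """Hypothetical preprocessing: undersample each class to the size of the
    rarest one, min-max normalize each beat, then make a random 80/20 split."""
    rng = np.random.default_rng(seed)
    beats = np.asarray(beats, dtype=float)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_per_class = counts.min()
    # Randomly keep n_per_class beats from each class (undersampling).
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=False)
        for c in classes
    ])
    x, y = beats[keep], labels[keep]
    # Min-max scale each beat to [0, 1]; the epsilon guards against flat segments.
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    x = (x - lo) / (hi - lo + 1e-8)
    # Shuffle, then split into training and test sets.
    order = rng.permutation(len(y))
    x, y = x[order], y[order]
    n_train = int(train_frac * len(y))
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])

# Usage on synthetic data in which normal beats (N) far outnumber the others.
labels = np.array(["N"] * 500 + ["V"] * 60 + ["L"] * 50 + ["R"] * 55)
beats = np.random.default_rng(1).normal(size=(len(labels), 256))
(train_x, train_y), (test_x, test_y) = balance_normalize_split(beats, labels)
print(train_x.shape, test_x.shape)  # (160, 256) (40, 256)
```

Undersampling discards most normal beats; that matches the text's "select a portion of normal beats", at the cost of using less of the available data.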

2.2 Networks

Convolutional Neural Network. The CNN is one of the most commonly used neural networks in the field of image processing [21,22,23]. It mainly consists of an input layer, convolutional layers, pooling layers, and an output layer, of which the convolutional and pooling layers are the core. The CNN used in this study contains four convolutional layers and four pooling layers; its architecture is shown in Fig. 3. No fully connected layer is used here; instead, the output of the CNN is fed into the subsequent BiLSTM network for further training.

Fig. 3. CNN structure

The convolution kernel is connected to the input layer and slides over it as a window, computing a dot product at each position to extract features at multiple scales. The weight-sharing mechanism of the convolutional layer makes feature extraction more efficient and significantly reduces the number of free parameters that must be learned. Each pooling layer follows a convolutional layer and downsamples its output to reduce the feature size [24]. In a conventional CNN, the features obtained after several convolution and pooling layers are flattened into a one-dimensional vector for classification; in this study, they are instead passed to the BiLSTM network.
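To make the sliding-window and weight-sharing ideas concrete, here is a minimal NumPy sketch of one convolution, ReLU, and max-pooling stage on a toy signal. The kernel values, the ReLU activation, and the pool size of 2 are illustrative assumptions, not the settings in Table 1. As in most deep-learning frameworks, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide one shared kernel along x ('valid' padding, cross-correlation)."""
    k = len(kernel)
    # The same weights are reused at every position: this is weight sharing.
    return np.array([np.dot(x[i:i + k], kernel) + bias for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0])
kernel = np.array([-1.0, 0.0, 1.0])             # hypothetical kernel that responds to rising slopes
feat = np.maximum(conv1d_valid(x, kernel), 0.0)  # ReLU activation
pooled = max_pool1d(feat, 2)
print(feat)    # [3. 1. 2. 2. 0. 0.]
print(pooled)  # [3. 2. 0.]

# Assuming 'same' padding (length preserved) and pool size 2, four conv/pool
# stages shorten a 256-sample beat to 256 / 2**4 = 16 time steps.
length = 256
for _ in range(4):
    length //= 2
print(length)  # 16
```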

Bidirectional Long Short-Term Memory Network. An LSTM network is an improved recurrent neural network. It not only carries information forward from previous time steps but also processes the input at the current time step. An LSTM network mainly includes three gate units: an input gate, a forget gate, and an output gate. The input gate controls how much of the current input is kept, whereas the forget gate controls how much of the information from the previous time step is discarded. The output gate controls how much information is output to the hidden state at the current time step. The hidden layer structure is shown in Fig. 4.

Fig. 4. BiLSTM hidden layer structure

Given the input \(X_{t}\) of a sequence at time t, the LSTM updates its state as follows:

$$\begin{aligned} f_{t}=\delta \left( W_{f}\left[ h_{t-1},X_{t}\right] +b_{f}\right) \end{aligned} \tag{1}$$
$$\begin{aligned} i_{t}=\delta \left( W_{i}\left[ h_{t-1},X_{t}\right] +b_{i}\right) \end{aligned} \tag{2}$$
$$\begin{aligned} j_{t}=\tanh \left( W_{c}\left[ h_{t-1},X_{t}\right] +b_{c}\right) \end{aligned} \tag{3}$$
$$\begin{aligned} O_{t}=\delta \left( W_{o}\left[ h_{t-1},X_{t}\right] +b_{o}\right) \end{aligned} \tag{4}$$
$$\begin{aligned} h_{t}=O_{t}*\tanh (C_{t}) \end{aligned} \tag{5}$$

Here, \(f_{t}\), \(i_{t}\), and \(O_{t}\) are the outputs of the forget, input, and output gates; \(j_{t}\) is the candidate information at the current time step; \(C_{t}\) is the state of the memory cell, updated as \(C_{t}=f_{t}*C_{t-1}+i_{t}*j_{t}\); W denotes the weight matrices; b denotes the bias terms; \(\delta \) is the sigmoid activation function; and tanh is the hyperbolic tangent activation function.
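Equations (1) to (5) can be traced in a minimal NumPy sketch of a single LSTM time step. The dictionary layout of the weights, the function names, and the random initialization are illustrative assumptions; the memory-cell update \(C_{t}=f_{t}*C_{t-1}+i_{t}*j_{t}\), which the numbered equations leave implicit, is taken from the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (1)-(5).

    W and b are dicts keyed by gate ('f', 'i', 'c', 'o'); each W[g] has shape
    (hidden, hidden + input) and acts on the concatenation [h_{t-1}, x_t].
    """
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ hx + b["f"])  # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ hx + b["i"])  # input gate, Eq. (2)
    j_t = np.tanh(W["c"] @ hx + b["c"])  # candidate information, Eq. (3)
    o_t = sigmoid(W["o"] @ hx + b["o"])  # output gate, Eq. (4)
    c_t = f_t * c_prev + i_t * j_t       # memory-cell update (standard LSTM form)
    h_t = o_t * np.tanh(c_t)             # hidden state, Eq. (5)
    return h_t, c_t

# Usage with random weights: hidden size 4, input size 3, sequence length 5.
rng = np.random.default_rng(0)
hidden, n_in = 4, 3
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```

Because \(h_{t}\) is a sigmoid output times a tanh, every hidden value stays strictly inside (-1, 1).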

As shown in Fig. 4, the first layer is a forward LSTM, and the second layer is a backward LSTM. The final output is calculated using the following formula:

$$\begin{aligned} h_{t}=\alpha h_t^f+\beta h_t^b \end{aligned} \tag{6}$$

In this formula, \(h_t^f\) is the output of the forward LSTM layer, which takes \(x_{1}\) to \(x_{t}\) as input; \(h_t^b\) is the output of the backward LSTM layer, which processes the sequence in reverse, from \(x_{T}\) to \(x_{t}\), where T is the sequence length; \(\alpha \) and \(\beta \) are the weighting factors of the forward and backward LSTMs, respectively (\(\alpha \) + \(\beta \) = 1); and \(h_{t}\) is the weighted element-wise sum of the two unidirectional LSTM outputs at time t.
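Equation (6) can be sketched as follows. The detail worth noting is alignment: the backward pass runs over the reversed sequence, so its outputs must be reversed again before being added to the forward outputs at the same time step t. To keep the sketch short, a plain tanh recurrent cell stands in for the LSTM cell (a substitution, not the cell used in this study), and the helper names `make_cell`, `run_direction`, and `bidirectional` are hypothetical.

```python
import numpy as np

def make_cell(rng, n_in, n_hidden):
    """Stand-in recurrent cell (plain tanh RNN) with its own random weights."""
    Wx = rng.normal(scale=0.5, size=(n_hidden, n_in))
    Wh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
    return lambda x_t, h: np.tanh(Wx @ x_t + Wh @ h)

def run_direction(xs, step, h0):
    """Run a recurrent cell over xs and return the hidden state at every step."""
    h, outs = h0, []
    for x_t in xs:
        h = step(x_t, h)
        outs.append(h)
    return np.stack(outs)

def bidirectional(xs, step_f, step_b, h0, alpha=0.5):
    """Eq. (6): h_t = alpha * h_t^f + beta * h_t^b with alpha + beta = 1."""
    beta = 1.0 - alpha
    h_fwd = run_direction(xs, step_f, h0)              # row t has seen x_1 .. x_t
    h_bwd = run_direction(xs[::-1], step_b, h0)[::-1]  # reversed back: row t has seen x_T .. x_t
    return alpha * h_fwd + beta * h_bwd

rng = np.random.default_rng(0)
cell_f = make_cell(rng, 3, 4)   # forward and backward directions use separate weights
cell_b = make_cell(rng, 3, 4)
xs = rng.normal(size=(6, 3))    # T = 6 time steps, 3 features per step
h = bidirectional(xs, cell_f, cell_b, np.zeros(4), alpha=0.5)
print(h.shape)  # (6, 4)
```

An LSTM step such as one implementing Eqs. (1) to (5) can replace the stand-in cell once it is wrapped to carry both \(h_{t}\) and \(C_{t}\).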

Proposed Architecture. Of the networks described above, CNNs have significant advantages in image processing. A CNN extracts local features from the input signal through a sliding convolution kernel, but it has difficulty learning dependencies across the sequence. Through its gate units, the BiLSTM can learn forward and backward information from the feature vectors extracted by the CNN, making the extracted features more complete. In this paper, an ECG signal classification model based on CNN-BiLSTM is proposed. The ECG signal is preprocessed and input into the model. The CNN obtains local features of the ECG signal through its convolutional and pooling layers and passes them to the BiLSTM, whose hidden layer learns the most informative features. The network uses a learning rate of 0.01 and a batch size of 16. Finally, a fully connected layer with a softmax function classifies the data into four categories: N, V, R, and L. The network structure of CNN-BiLSTM is shown in Fig. 5.
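The architecture described above can be sketched with `tf.keras` as follows. Only the layer counts (four convolutional and four pooling layers, one BiLSTM, one softmax layer), the four output classes, the learning rate of 0.01, and the use of Adam come from the text; the filter counts, kernel size, LSTM width, and the 256-sample input length are hypothetical placeholders, since Table 1 is not reproduced here. `merge_mode="ave"` averages the final forward and backward outputs, which matches Eq. (6) with \(\alpha = \beta = 0.5\) applied to each direction's final output.

```python
import numpy as np
import tensorflow as tf

def build_cnn_bilstm(input_len=256, n_classes=4):
    """Hypothetical CNN-BiLSTM: 4 conv + 4 pooling layers, a BiLSTM, and a softmax layer.
    Filter counts, kernel sizes, LSTM units, and input_len are placeholders."""
    inputs = tf.keras.Input(shape=(input_len, 1))   # one resampled beat, one channel
    x = inputs
    for filters in (16, 32, 64, 64):                # four conv/pool stages (placeholder widths)
        x = tf.keras.layers.Conv1D(filters, kernel_size=5, padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPooling1D(pool_size=2)(x)
    # No flattening: the pooled feature maps are fed to the BiLSTM as a sequence.
    # merge_mode="ave" averages the forward and backward outputs (alpha = beta = 0.5).
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64), merge_mode="ave")(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)  # N, V, R, L
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # learning rate from the text
        loss="categorical_crossentropy",                          # expects one-hot labels
        metrics=["accuracy"],
    )
    return model

model = build_cnn_bilstm()
probs = model(np.zeros((2, 256, 1), dtype="float32")).numpy()
print(probs.shape)  # (2, 4)
# Training would use the batch size from the text, for example:
# model.fit(x_train, y_train_onehot, batch_size=16, epochs=20)  # epochs is a placeholder
```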

Fig. 5. CNN-BiLSTM network structure

2.3 Experimental Details and Results

The research described in this paper is implemented with the TensorFlow neural network framework. Before the experiment, each data label is converted into the corresponding one-hot vector. The same number of N, V, R, and L samples is used in the experiments, and the samples are randomly shuffled. The network parameters are optimized with the Adam optimizer, which updates the weights to obtain the best classification results. Table 1 lists the relevant parameters of the experimental network. Additionally, the classification results of a single CNN network and a single BiLSTM network are compared with those of the proposed composite network.
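The one-hot label conversion mentioned above can be sketched in a few lines of NumPy. The class order N, V, R, L and the helper name `one_hot` are assumptions for illustration; any fixed order works as long as training and evaluation use the same one.

```python
import numpy as np

CLASSES = ["N", "V", "R", "L"]  # assumed class order

def one_hot(labels, classes=CLASSES):
    """Map each string label to a row with a single 1 at its class index."""
    index = {c: k for k, c in enumerate(classes)}
    idx = np.array([index[lab] for lab in labels])
    out = np.zeros((len(labels), len(classes)), dtype=np.float32)
    out[np.arange(len(labels)), idx] = 1.0
    return out

y = one_hot(["N", "R", "L", "V"])
print(y)
```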

To evaluate heartbeat classification performance, we use the metrics commonly reported in the literature [24,25,26,27], namely the overall accuracy and the classification accuracy for each beat type.
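A minimal NumPy sketch of these metrics computed from a confusion matrix follows. Taking "per-class accuracy" to mean the fraction of each true class that is predicted correctly (recall, or sensitivity) is an assumption about the definition used here; the labels in the usage example are toy values.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows: true class; columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

def overall_and_per_class_accuracy(cm):
    overall = np.trace(cm) / cm.sum()
    # Per-class accuracy taken as recall (correct / total for each true class); an assumption.
    per_class = np.diag(cm) / cm.sum(axis=1)
    return overall, per_class

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 2, 2, 2, 3, 0])
cm = confusion_matrix(y_true, y_pred)
overall, per_class = overall_and_per_class_accuracy(cm)
print(overall)  # 0.75
print(per_class)
```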

Table 1. The parameters of the model

Figure 6 presents the loss curves of the three networks (CNN, BiLSTM, and CNN-BiLSTM) trained on the same data. The proposed CNN-BiLSTM network clearly converges better than either single network.

Fig. 6. Comparison of the CNN, BiLSTM, and CNN-BiLSTM loss curves

Figure 7 presents the overall accuracy of the three networks (CNN, BiLSTM, and CNN-BiLSTM) and their classification accuracy for each of the four beat types. The detailed classification counts are given in Tables 2, 3, and 4. The classification accuracy of the CNN-BiLSTM network is higher than that of the CNN and the BiLSTM.

Fig. 7. The accuracy of CNN, BiLSTM, and CNN-BiLSTM

Table 2. Classification results of CNN network
Table 3. Classification results of BiLSTM
Table 4. Classification results of CNN-BiLSTM

The CNN-BiLSTM model achieves an overall classification accuracy of 99.69% on the test set, with per-class accuracies of 99.75% for N, 99.56% for V, 99.92% for R, and 99.52% for L.
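As a consistency check: if the test set holds equal numbers of each beat type (an assumption; the text states only that equal amounts of N, V, R, and L data were used) and per-class accuracy is the fraction of each class classified correctly, the overall accuracy equals the mean of the four per-class accuracies. The arithmetic reproduces the reported figure.

```python
# Per-class accuracies reported above, in percent.
per_class = {"N": 99.75, "V": 99.56, "R": 99.92, "L": 99.52}

# With equal class counts, overall accuracy is the unweighted mean of per-class accuracies.
mean_acc = sum(per_class.values()) / len(per_class)
print(round(mean_acc, 2))  # 99.69
```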

In this study, combining a CNN and a BiLSTM allows ECG features to be extracted automatically and yields higher accuracy than either network alone.

Table 5 compares this work with previous studies of ECG signals from the MIT-BIH Arrhythmia Database. Compared with other deep learning methods, the proposed CNN-BiLSTM model improves both the input signal and the network structure.

Table 5. Comparison with previous work on the MIT-BIH Arrhythmia database

3 Conclusion

At present, arrhythmia is one of the most common types of cardiovascular disease, and it seriously endangers human health. In this paper, an automatic arrhythmia classification system based on CNN-BiLSTM was proposed. The network automatically extracts and classifies ECG signal features, which can significantly reduce the workload of doctors. It consists of four convolutional layers, four pooling layers, a BiLSTM layer, and a fully connected layer, and it achieved good classification performance. Because this classification method requires few computing resources and achieves high accuracy, it can be used as an auxiliary diagnostic method for clinical arrhythmia detection.