1 Introduction

Congestive heart failure (CHF) is a pathophysiological syndrome in which there is abnormal filling and/or emptying of the left heart chamber [1]. It is caused by structural and/or functional derangements arising from diverse heart diseases, and can also be considered their final stage. The prevalence and incidence of CHF are increasing, with approximately 26 million adults diagnosed with CHF worldwide in 2014 [2]. It is a major contributor to global mortality and morbidity, as well as an important factor in the loss of quality life years and increased healthcare expenditure. This is because of the debilitating symptoms, such as breathlessness and fatigue, experienced by sufferers of CHF. Consequently, these patients experience a decline in their quality of life as they become increasingly unable to carry out physical and social activities [3]. CHF predominantly affects the elderly (age > 64 years) [4]. Therefore, there is a need for early detection of CHF in the ageing population, a demographic challenge that many countries currently face. In addition, CHF increases the care and economic burden on patients’ families, with around 40% of them struggling with their daily routine [3]. Early detection will allow preventative measures and treatment to be instituted, which may alter the course of the disease and impede the progression of CHF among the elderly.

Figure 1 compares a healthy heart with a CHF heart with impaired pump function. In the healthy heart, there is good stroke volume (the volume of blood ejected per heartbeat) and oxygen-rich blood is pumped to the body from the left ventricle. However, in a common type of CHF with impaired pump function, stroke volume drops and the heart is unable to pump oxygen-rich blood efficiently to the rest of the body. The heart is remodeled by the underlying disease process, becoming enlarged with stiff muscle walls as it is stretched to hold more oxygen-rich blood to pump to the body. The weakened pumping capacity results in easy fatigability. It also causes blood and fluid to back up into the lungs and the body, resulting in breathlessness and generalized swelling, respectively [5].

Fig. 1 Illustration of a healthy heart and one with heart failure

The diagnosis of CHF is a clinical one, requiring a constellation of symptoms and signs, as well as corroborative evidence from investigative tests. The electrocardiogram (ECG) is a noninvasive test commonly used by healthcare professionals to record the electrical activity of patients’ hearts. Although ECG signals are altered in CHF, the changes are non-specific and, by themselves, neither sensitive nor specific enough for the diagnosis of CHF when standard manual analytic methods are used. Typically, the recorded ECG signals are visually examined by cardiologists to detect any abnormalities. However, visual assessment of ECG readings recorded from many patients is time-consuming. Further, manual interpretation of the ECG signals may be subject to inter-observer variability.

2 Related work

Many traditional machine learning techniques have been employed to overcome the inadequacies of manual analysis of ECG signals in CHF (refer to Table 10). A traditional machine learning technique refers to an algorithm comprising pre-processing, feature extraction and selection, and classification stages. Selecting features that distinguish normal from CHF signals is difficult and involves a lot of time and effort. Also, the robustness of the features extracted from the signals depends on the quality of the data. Pre-processing of the signals, such as noise removal and R-peak detection, is required in order to extract the most significant features for classification. To avoid these pitfalls of traditional machine learning, we propose deep learning in this work in order to optimize the performance of an automated CHF diagnosis system. Deep learning is a form of machine learning in which the network automatically learns distinctive characteristics from the input ECG signals [6].

The convolutional neural network (CNN) is a form of deep learning that has been widely employed in speech and image recognition [7] and is receiving plenty of attention in the medical field [7]. Recently, researchers have been using CNN models to develop computer-aided diagnosis systems for diverse medical conditions [8,9,10,11,12,13,14,15,16,17]. The authors have employed CNN models in the detection of various heart diseases, such as identifying arrhythmias with 2-second and 5-second ECG segments [13], diagnosing myocardial infarction ECG beats with and without noise removal [14], distinguishing coronary artery disease ECG signals from normal ECG signals with 2-second and 5-second signals [15], classifying 5 different types of heartbeats with ECG beats [16], and, lastly, detecting shockable and non-shockable ventricular arrhythmias with 2-second ECG segments [17]. These published works have demonstrated relatively good performance with minimal pre-processing and no feature extraction or selection. Lately, Tan et al. [18] combined a long short-term memory (LSTM) network with a CNN to diagnose coronary artery disease. Their network achieved a high diagnostic accuracy of 99.85%. However, compared to the LSTM network, a CNN has a shorter computation time and is less complex. Hence, this paper uses a deep CNN model (11 layers) to study the automatic classification of ECG signals into normal and CHF classes.

3 Materials used

The ECG signals used in this work were obtained from public databases (PhysioBank) namely the Beth Israel Deaconess Medical Centre (BIDMC) Congestive Heart Failure Database, Fantasia Database, and MIT-BIH Normal Sinus Rhythm Database (NSRDB) [19]. Table 1 summarizes the details of the ECG data collected from each database.

Table 1 The details of ECG signals obtained from various databases

The severity of CHF symptoms is graded based on the New York Heart Association (NYHA) scale [20]:

Class 1: mild with no limitation of physical activity;

Class 2: mild with slight limitation of physical activity;

Class 3: moderate with marked limitation of physical activity; and

Class 4: severe with total limitation of physical activity.

The CHF ECG data used in this work are in Class 3 and Class 4 categories.

A total of four datasets (Set A, Set B, Set C, and Set D) are used in this work. Sets A and B both consist of the full ECG data (unbalanced), while Sets C and D have a balanced number of ECG segments (see Table 2). For Sets C and D, 30,000 normal ECG segments are randomly selected from the full set.
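The random selection used to build the balanced sets can be sketched as follows. This is only an illustration: the function name `undersample`, the fixed seed, and the toy pool of 100 segment identifiers are assumptions made here and do not come from the original work, which selects 30,000 segments.

```python
import random


def undersample(segments, n_keep, seed=0):
    """Randomly keep n_keep items from segments, without replacement."""
    if n_keep > len(segments):
        raise ValueError("cannot keep more segments than are available")
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(segments, n_keep)


# Hypothetical stand-in for the pool of normal segments (identifiers only).
normal_ids = list(range(100))
subset = undersample(normal_ids, n_keep=30)
```

Sampling without replacement guarantees that no segment appears twice in the balanced set.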

Table 2 The total ECG segments used in each data set

Figure 2 shows typical normal and CHF ECG segments obtained from the public databases.

Fig. 2 A typical normal (Fantasia and NSRDB) and CHF ECG segments

4 Methodology

4.1 Pre-processing

The Fantasia and BIDMC ECG databases are sampled at 250 Hz, whereas the MIT-BIH Normal Sinus Rhythm Database (NSRDB) is sampled at 128 Hz. Therefore, the signals obtained from NSRDB are upsampled to 250 Hz so that the sampling frequency of all ECG signals is standardized. Then, the ECG records are segmented into 2-second ECG segments (without performing R-peak detection). Each 2-second ECG segment is 500 samples in length.
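A minimal sketch of the resampling and segmentation step is given below. The text does not state which interpolation method was used, so linear interpolation is an assumption here (a polyphase resampler would be a common alternative in practice), and the function names and the synthetic sine-wave record are purely illustrative.

```python
import math


def upsample_linear(signal, fs_in, fs_out):
    """Resample signal from fs_in to fs_out (Hz) by linear interpolation."""
    n_out = int(len(signal) * fs_out / fs_in)
    out = []
    for k in range(n_out):
        t = k * fs_in / fs_out  # position measured in input samples
        i = int(t)
        frac = t - i
        nxt = signal[min(i + 1, len(signal) - 1)]  # clamp at the last sample
        out.append(signal[i] * (1.0 - frac) + nxt * frac)
    return out


def segment(signal, fs, seconds=2):
    """Cut signal into non-overlapping windows; drop the incomplete tail."""
    size = int(fs * seconds)
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, size)]


# Synthetic 5-second record at 128 Hz (a sine wave, not real ECG data).
raw = [math.sin(2 * math.pi * 1.2 * n / 128) for n in range(128 * 5)]
resampled = upsample_linear(raw, fs_in=128, fs_out=250)
segments = segment(resampled, fs=250, seconds=2)
```

With a 5-second input, the resampled record holds 1250 samples and yields two complete 500-sample segments; the remaining 250 samples are discarded.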

Also, each ECG segment is normalized with Z-score normalization, so that it has zero mean and a standard deviation of 1, before being input to the network.
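Z-score normalization of a single segment can be sketched as below. The use of the population standard deviation and the handling of a constant segment are assumptions made for this illustration; the text does not specify either.

```python
import statistics


def zscore(segment):
    """Return the segment rescaled to zero mean and unit standard deviation."""
    mean = statistics.fmean(segment)
    std = statistics.pstdev(segment)  # population standard deviation (assumed)
    if std == 0:
        # A constant segment carries no shape information; map it to zeros.
        return [0.0 for _ in segment]
    return [(x - mean) / std for x in segment]


# Toy values standing in for one ECG segment.
normalized = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
```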

4.2 CNN architecture

The details of the proposed CNN model are tabulated in Table 3 and the graphical representation of the architecture can be seen in Fig. 3. The number of layers and the tuning parameters were varied by a brute-force search until the optimum diagnostic performance was achieved. The resulting model consists of 4 convolution, 4 max-pooling, and 3 fully-connected layers. The stride (the amount by which the filter shifts) is set at 1 for convolution and 2 for max-pooling in this work. These layers make up the fundamental structure of a CNN, in which convolution picks up distinctive features from the input ECG signal. The max-pooling operation reduces the dimensions of the feature maps while retaining the important and significant features of the input ECG signal. Max-pooling is performed after every convolution operation in this work. Lastly, the fully-connected layers map the neurons of the previous layers to a two-class (normal or CHF) probability distribution.
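The two basic operations described above can be illustrated on a short one-dimensional signal. The sketch assumes "valid" convolution without padding, computed as cross-correlation (as in most CNN libraries); the kernel values and the input are arbitrary toy numbers, not learned filters.

```python
def conv1d_valid(x, kernel, stride=1):
    """Slide the kernel over x without padding (cross-correlation form)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]


def max_pool1d(x, size=2, stride=2):
    """Keep the maximum of each window of `size` samples."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


signal = [1, 3, 2, 5, 4, 6, 0, 1]          # toy input, 8 samples
features = conv1d_valid(signal, [1, 0, -1])  # -> [-1, -2, -2, -1, 4, 5]
pooled = max_pool1d(features)                # -> [-1, -1, 5]
```

A size-3 kernel with stride 1 shortens the 8-sample input to 6 values, and max-pooling with size 2 and stride 2 halves that to 3.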

Table 3 The structure of the CNN model for Sets A to D
Fig. 3 The architecture of the proposed CNN model

Layer 0 (the input layer) is convolved with a kernel (filter) of size 5 to produce layer 1. Then, a max-pooling operation (kernel size 2) is applied to layer 1 (496 × 5) to form layer 2 (248 × 5). Next, layer 2 is convolved with a filter of size 5 to construct layer 3, and max-pooling is applied again to decrease the number of output neurons. A convolution with a kernel of size 3 is then performed on layer 4 (122 × 5) to form layer 5, and max-pooling decreases the number of neurons from 120 × 10 to 60 × 10 (layer 6). Another round of convolution with kernel size 3 is applied, followed by one last max-pooling operation, to form layer 8 with 29 × 10 neurons. Layer 8 is fully connected to 40 neurons in layer 9, which in turn is fully connected to 20 neurons in layer 10. Lastly, layer 10 is fully connected to the final layer (layer 11) with 2 outputs, which represent the two classification classes (normal and CHF).
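The layer sizes quoted above can be checked with a short calculation, assuming unpadded convolution with stride 1 and max-pooling with kernel 2 and stride 2. The filter counts (5, 5, 10, 10) are read off the layer sizes given in the text; the sizes of layers 3 and 7 are not stated explicitly and are derived here.

```python
def conv_len(n, kernel, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (n - kernel) // stride + 1


def pool_len(n, size=2, stride=2):
    """Output length of 1-D max-pooling."""
    return (n - size) // stride + 1


def trace_shapes(n_input=500):
    """Return (length, channels) for layers 1 to 8."""
    # (kernel size, number of filters) of the four convolution layers
    conv_specs = [(5, 5), (5, 5), (3, 10), (3, 10)]
    shapes = []
    n = n_input
    for kernel, filters in conv_specs:
        n = conv_len(n, kernel)
        shapes.append((n, filters))  # after convolution
        n = pool_len(n)
        shapes.append((n, filters))  # after max-pooling
    return shapes


shapes = trace_shapes()
# [(496, 5), (248, 5), (244, 5), (122, 5),
#  (120, 10), (60, 10), (58, 10), (29, 10)]
```

Under these assumptions, layer 8 flattens to 29 × 10 = 290 values feeding the first fully-connected layer.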

4.3 Training and testing of CNN model

Xavier initialization is used to initialize the model weights [21]. Backpropagation [22] with a batch size of 10 is used to update the CNN model in this study. The network loss is evaluated using the cross-entropy function. The parameters used to train the proposed CNN structure to yield the maximum diagnostic performance are lambda (L1 regularization) = 0.2, learning rate = 3 × 10⁻⁴, and momentum = 0.3. These parameters help to prevent overfitting of the data (regularization), assist convergence (learning rate), and adjust the speed of learning (momentum) [23].
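To make the roles of these hyperparameters concrete, the sketch below shows the cross-entropy loss for one sample and a single weight update. Only the hyperparameter values come from the text; the exact update rule is not given there, so the classical momentum form with an L1 subgradient used here is an assumption, as are all function names.

```python
import math


def cross_entropy(probs, true_class):
    """Cross-entropy loss of one sample given its predicted probabilities."""
    return -math.log(probs[true_class])


def sign(w):
    """Sign of w as -1, 0, or 1 (subgradient of |w|)."""
    return (w > 0) - (w < 0)


def momentum_step(weights, grads, velocity, lr=3e-4, momentum=0.3, l1=0.2):
    """One assumed update: momentum SGD on the loss plus an L1 penalty."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g_total = g + l1 * sign(w)          # loss gradient + L1 subgradient
        v = momentum * v - lr * g_total     # blend previous step with new one
        new_velocity.append(v)
        new_weights.append(w + v)
    return new_weights, new_velocity


loss = cross_entropy([0.5, 0.5], true_class=0)            # log(2)
w, v = momentum_step([1.0], grads=[0.0], velocity=[0.0])  # L1 term only
```

With a zero loss gradient, the only change comes from the L1 term, which pulls the weight toward zero by lr × lambda = 6 × 10⁻⁵.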

Furthermore, the leaky rectified linear unit (LeakyReLU) [24] shown in (1) is employed as the activation function for layers 1, 3, 5, 7, 9, and 10, whereas layer 11 uses the softmax function shown in (2).

$$ f(x)=\left\{ \begin{array}{ll} x & \text{for } x>0 \\ 0.01x & \text{for } x\le 0 \end{array} \right. $$
(1)
$$ P_{i} =\frac{e^{x_{i}}}{\sum_{k=1}^{j} e^{x_{k}}} \quad \text{for } i = 1,\ldots ,j $$
(2)

where f(x) represents the activation function applied to input x, P_i is the predicted probability of class i, and j denotes the total number of classes.
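Equations (1) and (2) translate directly into code. In the sketch below, subtracting the maximum score before exponentiation is a standard numerical-stability step added here; it does not change the result of (2). The example scores are arbitrary.

```python
import math


def leaky_relu(x, slope=0.01):
    """Equation (1): pass positive inputs, scale the rest by 0.01."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Equation (2): turn raw scores into a probability distribution."""
    m = max(scores)  # shift for numerical stability; result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two arbitrary output scores, for the normal and CHF classes.
probs = softmax([2.0, -1.0])
```

The outputs are positive and sum to 1, so the larger score always receives the larger probability.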

A stratified ten-fold cross-validation strategy [25] is used in this work. The ECG segments of each of the four sets are divided into ten parts. Nine parts are used to train the model whilst the remaining part is used to test it. Each part contains approximately the same percentage of each target class as the entire dataset. Ten iterations are conducted, so that every part serves once as the test set. The average of the ten iterations for each of the four sets is tabulated in Table 9.
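The stratified splitting described above can be sketched with the standard library alone. The round-robin assignment, the seed, and the toy label vector (70 normal, 30 CHF) are assumptions for illustration and are not taken from the original work.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)  # deal out round-robin per class
    return folds


def train_test_splits(labels, n_folds=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Toy labels: 0 = normal, 1 = CHF.
labels = [0] * 70 + [1] * 30
splits = list(train_test_splits(labels))
```

Each of the ten test folds in this toy example contains 7 normal and 3 CHF indices, mirroring the 70/30 ratio of the whole set.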

5 Results

Two Intel Xeon 2.40 GHz (E5620) processors and 24 GB of RAM are used to train the proposed network, without a graphics processing unit (GPU). Table 4 shows the average time needed to train one epoch for each dataset. The model is trained for 60 epochs in this study.

Table 4 Training time to complete an epoch

The confusion matrices for Sets A to D are shown in Tables 5, 6, 7 and 8, respectively. Table 9 shows the overall average performance of our proposed CNN model in classifying the normal and CHF classes. The proposed CNN model achieved its highest accuracy of 98.97%, with a sensitivity of 98.87% and a specificity of 99.01%, for Set B.

Table 5 Confusion matrix for the unbalanced data set - NSRDB/BIDMC (Set A)
Table 6 Confusion matrix for the unbalanced data set - Fantasia/BIDMC (Set B)
Table 7 Confusion matrix for the balanced data set - NSRDB/BIDMC (Set C)
Table 8 Confusion matrix for the balanced data set - Fantasia/BIDMC (Set D)
Table 9 Summary of classification results for different datasets

In Set A, 95.75% of the normal ECG segments are correctly classified in the normal class and 96.52% of CHF signals are correctly classified in the CHF class. Only 4.25% and 3.48% of the ECG signals are incorrectly categorized as CHF and normal class respectively. Also, in Set B, a very small percentage of approximately 0.99% normal ECG signals are incorrectly grouped as CHF class, and 1.13% of CHF ECG signals are misclassified into the normal class.

Likewise, in Set C, 5.88% of normal ECG signals are wrongly classified as CHF, and the misclassification rate of CHF ECG signals is about 5.32%. Set D attained better classification results than Set C, with 1.84% of CHF ECG signals and 1.50% of normal ECG signals wrongly classified into the normal and CHF classes, respectively.
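The relation between a confusion matrix and the reported measures can be sketched as follows, taking CHF as the positive class. The counts in the example are hypothetical and are not those of Tables 5 to 8.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: CHF classified as CHF        fn: CHF classified as normal
    fp: normal classified as CHF     tn: normal classified as normal
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of CHF segments detected
        "specificity": tn / (tn + fp),  # share of normal segments recognized
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fn=10, fp=5, tn=95)
```

The misclassification rates quoted in the text are the complements of these measures: the share of CHF segments labelled normal is 1 − sensitivity, and the share of normal segments labelled CHF is 1 − specificity.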

6 Discussion

Based on Table 9, Set B and Set D achieved better performance than Set A and Set C. In addition, the full sets (Set A and Set B) yielded better performance than the corresponding balanced sets (Set C and Set D). This might be because the larger number of ECG signals in the full sets (see Table 2) exposes the network to more variation during training, which helps it to achieve better results than with the balanced sets. The quality of the ECG signals may also affect the overall diagnostic performance. Of the four sets, Set B achieved the highest diagnostic accuracy of 98.97%.

Table 10 summarizes the different algorithms developed for the automated detection of CHF with ECG signals obtained from PhysioBank. The techniques recorded in Table 10 yielded high diagnostic performance. Most of the works listed in Table 10 performed denoising and R-peak detection in the pre-processing step. In contrast, our proposed CNN model requires neither denoising nor R-peak detection; the ECG data are only resampled, segmented, and normalized. Further, in the majority of these works the ECG signals are segmented either into ECG beats or into longer ECG segments, whereas the ECG segments used in this work are shorter in duration.

Table 10 Selected studies of an automated CHF detection system using ECG data obtained from PhysioBank

Although the proposed CNN model did not obtain 100.00% accuracy in the classification of normal and CHF ECG signals, this study is the first to implement a CNN model to classify ECG signals into normal and CHF classes. Unlike our proposed algorithm, the works in Table 10 adopted conventional machine learning techniques. Hence, the novelty of this work is the development of an 11-layer deep CNN model for the detection of CHF ECG signals.

In this work, we have developed a deep learning model that uses short-duration (2-second) ECG signals to diagnose CHF. Such a deep learning model can also be implemented using heart rate variability (HRV) signals and echocardiographic images to identify CHF automatically. The authors have previously developed automated diagnostic systems using HRV signals [26, 27] and echocardiographic images [28] to detect CHF. Hence, the authors intend to design a CNN model to automatically diagnose CHF using HRV signals or echocardiographic images.

Also, this two-class (normal and CHF) diagnostic stratification can potentially be extended to four classes. Acharya et al. [29] and Fujita et al. [30] developed algorithms to diagnose normal, CHF, myocardial infarction (MI), and coronary artery disease (CAD). Both works demonstrated high diagnostic performance (see Table 10). Moreover, our group has already performed automated diagnosis of CAD [15] and of MI [14], in each case with an 11-layer deep CNN model. We have also automatically detected non-ectopic, supraventricular ectopic, ventricular ectopic, fusion, and unknown ECG beats using a CNN [16]. In future, the authors intend to develop a CNN model to classify ECG signals into four classes: MI, CHF, CAD, and normal.

The advantages of the proposed CNN model are:

  • 11-layer deep CNN model is proposed.

  • Denoising is not required.

  • R-peak detection is not required.

  • Hand-crafted features are not required.

The limitations of the proposed CNN model are:

  • Requires big data to achieve the optimum performance.

  • Requires extensive computational power for training the model.

Nevertheless, running the proposed model on a graphics processing unit (GPU) would shorten the training time and relieve the processing load on the CPU. In addition, the performance is expected to improve if more diverse ECG signals are used to train the CNN model. Hence, the advantages outweigh the drawbacks of the proposed deep CNN model.

7 Conclusion

Unlike conventional machine learning techniques, this study implemented an 11-layer deep CNN model to automatically diagnose CHF using ECG signals. The proposed model is fully automatic and does not require R-peak detection. Four different sets of data obtained from PhysioBank were used to train and test the CNN model. Set B obtained the highest performance with our proposed model, with an accuracy, specificity, and sensitivity of 98.97%, 99.01%, and 98.87%, respectively. Nevertheless, the diagnostic ability of the proposed model could be enhanced by using a large ECG database covering different stages of CHF. It is anticipated that such CNN models can also be developed to detect other cardiac diseases, such as dilated, ischemic, and hypertrophic cardiomyopathy. Once the CNN model is well trained, it can be introduced in healthcare settings as an adjunct tool to assist cardiologists by providing quick and reliable second opinions on the diagnosis.