Introduction

Cardiovascular disease (CVD), or heart disease, is one of the leading causes of mortality worldwide. According to WHO reports, more than 10 million deaths occur every year due to heart disease [1]. Heart disease can be described as any kind of disorder that affects the heart, such as infection, genetic defects, blood vessel disease, or heart valve disease. Such disorders impair the heart's ability to pump and circulate the required amount of blood to the various parts of the body. Several risk factors contribute to heart disease, such as smoking, high cholesterol, high blood pressure, diabetes, obesity, and stress. Many people experience symptoms such as chest pain and fatigue, but others notice no signs until a serious event occurs [2]. Therefore, it is necessary to monitor the symptoms that contribute to CVD. Angiography is the most commonly used method to diagnose heart disease, but it is very costly and requires technical expertise [3]. Apart from it, various techniques such as blood pressure monitoring, echocardiography, electrocardiography, electrophysiological studies, myocardial perfusion scans, and tilt table tests are performed to diagnose heart disease [4]. However, diagnosing heart disease with these techniques requires skilled and experienced medical practitioners.

In the past, various research works have addressed the diagnosis of heart disease, and machine learning algorithms play a predominant role in this task. Machine learning algorithms have the advantage of extracting the necessary information from large amounts of data. Many machine learning algorithms, such as decision tree, naive Bayes, support vector machine, radial basis function, K-NN, and single conjunctive rule learner, have been proposed in the past [5]. However, these existing methods achieved accuracies of only 65–85%. To prevent misclassification and to diagnose the disease early, there is a need to build an automated system that helps to spot the disease in advance [6]. Artificial intelligence has predominantly been used to diagnose chronic diseases, and recently many researchers have been focusing on deep learning algorithms because they work well for complex functions: they automatically learn the features of the input data, which helps the model to learn complex functions [7].

Various deep learning algorithms are deployed to perform complex tasks in the area of medical applications. After analyzing various algorithms, the convolutional neural network (CNN) was found to be a suitable method to diagnose heart disease, because as the layers deepen, the model can be learned easily and the features can be represented in a brief and conceptual way. Along with CNN, a long short term memory (LSTM) model is also proposed in this work. LSTM is a recurrent neural network (RNN) designed for temporal data; each LSTM unit communicates with the other units and can store and feed back the necessary information. Thus, a hybrid model of CNN with LSTM is proposed here, which improves the classification accuracy.

The rest of the paper is organized as follows. “Related Work” reviews the literature related to this work. Background methods are discussed in “Background”. The proposed work is presented in the subsequent section. In “Experimental Results”, the results of the proposed work are shown and a comparison with other algorithms is discussed.

Related Work

Various researchers have proposed different techniques for predicting heart disease [2, 3, 8,9,10]. The majority of the work has focused on machine learning, data mining, and deep learning. No single data mining technique is suitable or produces the best results for all kinds of datasets. Various data mining techniques such as classification, clustering, association, and regression have their own pros and cons, as discussed in [10]. Machine learning and data mining algorithms reduce processing time and increase accuracy; they work well for large datasets and perform the analysis efficiently. To classify heart failure, the authors of [11] experimented with an artificial neural network, a support vector machine, and logistic regression and, using classification accuracy as the performance measure, suggested SVM as a good technique for the prediction of heart disease. Kindie et al. [12] used a rough set indiscernibility relation method with a backpropagation neural network to classify heart disease, and this work provided an effective classification model.

Various optimization algorithms, genetic algorithms, and fuzzy logic have been used for feature selection and classification to predict heart disease. A hybrid genetic and neural network algorithm is used for the diagnosis of CVD in [13], on the Z-Alizadeh Sani dataset. In this method, the genetic algorithm suggests better weights for the neural network, increasing its performance and achieving better accuracy. An enhanced extreme learning machine (ELM) algorithm was proposed in [14]; in this work, features are selected using an adaptive grasshopper optimization algorithm (AGOA) and classified using the enhanced ELM. In [15], an adaptive genetic algorithm with fuzzy logic (AGAFL) was used to predict heart disease: rules are generated using fuzzy classifiers and given to the genetic algorithm, relevant features are extracted through rough set theory, and the AGAFL classifier is applied to classify heart disease.

Saba Bashir et al. [16] focused on decision tree, logistic regression, SVM, naïve Bayes, and random forest algorithms combined with feature selection techniques and showed that logistic regression achieved higher accuracy than the other approaches. Runchun et al. [17] proposed an information gain ratio for feature selection and used an AdaBoost + RF algorithm to classify the disease. Compared to other machine learning algorithms, they showed that AdaBoost + RF worked well for missing and unbalanced data.

Kavita et al. proposed the multilayer Pi-Sigma neuron model (MLPSNM) [18], in which bipolar sigmoid activation functions are used with the backpropagation algorithm for network learning, and an SVM-LDA method is used to classify heart disease. In [19], a hybridized Ruzzo–Tompa memetic algorithm is used to select the global features, which are then classified by a deep neural network; the result is compared with other optimization algorithms, and the work shows that the proposed algorithm outperforms them.

Compared to traditional feature-based classification, deep learning techniques learn features directly from the data itself and have therefore achieved good results in medical data processing. Recently, a number of computational intelligence techniques, such as logistic regression (LR), support vector machine (SVM), deep neural network (DNN), decision tree (DT), naive Bayes (NB), random forest (RF), and K-nearest neighbor (K-NN), were applied and compared for the prediction of coronary artery heart disease [20]. The authors used the Statlog and Cleveland heart disease datasets from the UCI machine learning repository and showed that the highest accuracy was obtained by the deep neural network, with good sensitivity and precision. After considering the advantages and disadvantages of existing systems, this work focuses on deep learning to predict heart disease.

Background

Convolutional Neural Network

The convolutional neural network (CNN) is one of the popular methods in artificial neural networks (ANN). It is a deep learning technique commonly used in image processing applications. It is a well-known feature extraction technique and is also used in the classification process. Generally, it consists of a convolutional layer, a pooling layer, and a fully connected layer. Each input is associated with weights and a bias. The traditional CNN architecture is shown in Fig. 1.

Fig. 1 Traditional CNN architecture

Convolutional Layer

In this layer, the convolution of the input data with the weights is computed; the result is added to the bias and passed to the activation function to create the feature map, which is given to the next layer. If the input vector is x, with elements xi, then the jth output can be calculated as in Eq. (1).

$$o_{j} = f\left( b_{j} + \sum_{i = 1}^{p} w_{ji} x_{i} \right),$$
(1)

where oj is the output, bj is the bias, xi is the ith element of the input vector, f is the activation function, and wji is the weight connecting input i to output j.

Pooling Layer

The convolutional layer is followed by the pooling layer, which performs pooling operations to reduce the size of the features computed by the convolutional layer. Common pooling methods are max pooling and average pooling. In this proposed work, max pooling, one of the most successful techniques, is used. This operation selects the maximum value among the given inputs. The max-pooling operation is given in Eq. (2).

$$f_{j} = \max\left( o_{j} \right).$$
(2)

After several convolutional and pooling layers, the output values, i.e., features, are transformed into a single vector and given to the fully connected layer.
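To make Eqs. (1) and (2) concrete, the following minimal numpy sketch (with illustrative shapes and values, not taken from this work) slides a small weight window over a one-dimensional input to build a feature map and then applies max pooling:

```python
import numpy as np

def conv1d(x, w, b, f=np.tanh):
    """Slide a window of weights w over input x (Eq. 1) and apply activation f."""
    p = len(w)
    return np.array([f(b + np.dot(w, x[i:i + p])) for i in range(len(x) - p + 1)])

def max_pool(o, size=2):
    """Take the maximum over non-overlapping windows of the feature map (Eq. 2)."""
    return np.array([o[i:i + size].max() for i in range(0, len(o) - size + 1, size)])

x = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.7])   # one preprocessed record (illustrative)
w, b = np.array([0.3, -0.1, 0.6]), 0.05         # kernel weights and bias (illustrative)
features = max_pool(conv1d(x, w, b))            # pooled feature vector
```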

Fully Connected Layer

The output of the pooling layer is given to the fully connected layer, which predicts the feature that most correlates with the class label. The activation function used in this work is the softmax classifier, a function that normalizes its inputs into a probability distribution; for a two-class problem such as this one, a two-output softmax is equivalent to a single sigmoid output.
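As a brief illustration, the sketch below shows how a softmax turns two raw scores into a probability distribution over the two classes (the scores shown are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.2, -0.4])   # illustrative scores for [disease, no disease]
probs = softmax(logits)          # probabilities that sum to 1, e.g. ~[0.83, 0.17]
```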

LSTM

The long short term memory (LSTM) network is a type of recurrent neural network (RNN) that is widely used in deep learning. Generally, an LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The cell retains values over arbitrary time intervals, and the three gates manage the flow of data into and out of the cell. Figure 2 represents the traditional LSTM architecture.

Fig. 2 Traditional LSTM architecture

In the forget gate, the memory is controlled by a single-layer neural network that applies a sigmoid activation to the current input, the previous block output, and the previous memory. The function is given in Eq. (3)

$$fn_{j} = h\left( w\left[ x_{j}, s_{j - 1}, m_{j - 1} \right] + b_{fn} \right),$$
(3)

where h is the sigmoid function, x is the input vector, s is the previous unit output, m is the previous gate memory, w represents the associated weights, and b is the bias.

In the input gate, the network creates a new candidate memory using the tanh function and combines it with the previous cell memory; the calculation is given in Eqs. (4) and (5).

$$ip_{j} = h\left( w\left[ x_{j}, s_{j - 1}, m_{j - 1} \right] + b_{ip} \right)$$
(4)
$$C_{j} = fn_{j} \times C_{j - 1} + ip_{j} \times \tanh\left( w\left[ x_{j}, s_{j - 1}, m_{j - 1} \right] + b_{c} \right).$$
(5)

Finally, the output gate generates the unit output. This can be calculated as in Eqs. (6) and (7).

$$op_{j} = h\left( w\left[ x_{j}, s_{j - 1}, m_{j - 1} \right] + b_{op} \right)$$
(6)
$$g_{j} = op_{j} \times \tanh\left( C_{j} \right).$$
(7)
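To make the gate computations concrete, the following numpy sketch implements one LSTM step along the lines of Eqs. (3)–(7); the dimensions and random weights are purely illustrative, and a practical model would rely on a library implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, c_prev, W, b):
    """One LSTM unit update following Eqs. (3)-(7).

    x_t    : current input vector x_j
    s_prev : previous unit output s_{j-1}
    c_prev : previous cell memory m_{j-1} / C_{j-1}
    W, b   : weight matrices and bias vectors for the forget (fn),
             input (ip), candidate (c) and output (op) gates
    """
    z = np.concatenate([x_t, s_prev, c_prev])               # [x_j, s_{j-1}, m_{j-1}]
    fn = sigmoid(W["fn"] @ z + b["fn"])                      # forget gate, Eq. (3)
    ip = sigmoid(W["ip"] @ z + b["ip"])                      # input gate, Eq. (4)
    c = fn * c_prev + ip * np.tanh(W["c"] @ z + b["c"])      # new cell memory, Eq. (5)
    op = sigmoid(W["op"] @ z + b["op"])                      # output gate, Eq. (6)
    s = op * np.tanh(c)                                      # unit output, Eq. (7)
    return s, c

# Illustrative dimensions: 4 input features, 3 hidden units
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 4 + 3 + 3)) for k in ("fn", "ip", "c", "op")}
b = {k: np.zeros(3) for k in ("fn", "ip", "c", "op")}
s, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), W, b)
```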

Proposed Work

Dataset

In this work, the UCI machine learning repository dataset is used; it is described in Table 1. The data were collected from the Hungarian Institute of Cardiology and the Cleveland Clinic Foundation. It contains patient records of both normal and abnormal (heart disease) cases. The database consists of 76 attributes, with a total of 303 observations. The attributes include age, sex, resting blood pressure, cholesterol, etc., and the dataset contains six missing values. Of the 303 observations, 138 are normal persons and 165 are abnormal persons, i.e., persons suffering from heart disease.

Table 1 UCI Heart Disease Dataset attributes and description

Pre-processing

After the data are collected, they should be preprocessed to handle missing values and noise. This dataset contains six missing values, which must be handled carefully because they may affect the results. Several preprocessing techniques are available, such as cleaning, integration, reduction, transformation, and discretization.

To manage the missing values present in the dataset, a normalization technique is used in this work. Normalization rescales the attribute values onto a common scale, and the missing values are replaced with the normalized z values. Equation (8) applies min-max normalization to the collected dataset, scaling each attribute value xi to a value zi in the range [0, 1].

$$z_{i} = \frac{x_{i} - \min\left( x \right)}{\max\left( x \right) - \min\left( x \right)},$$
(8)

where x is the input vector, max(x) represents the maximum value of the attribute in the dataset, and min(x) represents the minimum value. After the data are preprocessed, the dataset is divided into training and testing sets; in this work, 70% of the data are used for training and 30% for testing.
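As a hedged illustration of this preprocessing step, the sketch below assumes the 303-record dataset has been exported to a CSV file named heart.csv with a binary target column (both names are hypothetical). It imputes the six missing values with column means (one simple option; the text handles them through normalization), applies the min-max scaling of Eq. (8), and performs the 70/30 split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")                        # hypothetical export of the UCI data
df = df.fillna(df.mean(numeric_only=True))           # impute the six missing values (assumption)

X = df.drop(columns=["target"]).astype(float)        # "target": 1 = heart disease, 0 = normal
y = df["target"]

X = (X - X.min()) / (X.max() - X.min())              # min-max normalization, Eq. (8)

# 70% training data, 30% testing data, as used in this work
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
```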

Feature Selection

Successful ranking methods for feature selection include information gain, the Gini index, PCA, and weight by SVM. With information gain and the Gini index, the respective score is evaluated for each attribute, and the attributes with the maximum scores are selected; based on these attributes, the dataset is split for classification. PCA (principal component analysis) converts correlated attributes into uncorrelated ones by performing an orthogonal transformation.

In this work, the weight-by-SVM feature selection algorithm is considered. It calculates the F-score, as in Eq. (9), to estimate the weight of each attribute.

$$FS\left( i \right) = \frac{\left( \overline{a}_{i}^{\left( + \right)} - \overline{a}_{i} \right)^{2} + \left( \overline{a}_{i}^{\left( - \right)} - \overline{a}_{i} \right)^{2}}{\frac{1}{n_{+} - 1}\sum\nolimits_{j = 1}^{n_{+}} \left( a_{i,j}^{\left( + \right)} - \overline{a}_{i}^{\left( + \right)} \right)^{2} + \frac{1}{n_{-} - 1}\sum\nolimits_{j = 1}^{n_{-}} \left( a_{i,j}^{\left( - \right)} - \overline{a}_{i}^{\left( - \right)} \right)^{2}},$$
(9)

where \(a_{i,j}\) denotes the ith attribute of the jth instance, with \(a_{i,j}^{\left( + \right)}\) and \(a_{i,j}^{\left( - \right)}\) referring to positive and negative instances, respectively; n+ is the total number of positive training instances and n− is the total number of negative instances; \(\overline{a}_{i}\) is the average of the ith attribute over all instances, and \(\overline{a}_{i}^{\left( + \right)}\) and \(\overline{a}_{i}^{\left( - \right)}\) are the averages over the positive and negative instances.
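A small numpy sketch of the F-score computation of Eq. (9) is given below; how many of the highest-scoring attributes to keep is a design choice and not a value stated here:

```python
import numpy as np

def f_score(X, y):
    """F-score of each attribute, Eq. (9); X is (n_samples, n_attributes), y is 0/1."""
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    num = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)   # 1/(n-1) sums of squares
    return num / den

# e.g. keep the attributes with the largest scores (top-k is an assumption):
# scores = f_score(X_train.values, y_train.values)
# selected = np.argsort(scores)[::-1][:10]
```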

Hybrid of CNN and LSTM

In this hybrid network, the preprocessed input data are given to the convolutional layer and then to the pooling layer. The pooling layer output is fed to the LSTM, and the fully connected layer predicts whether the person has heart disease or not. The proposed architecture is represented in Fig. 3.

Fig. 3 Proposed hybrid CNN and LSTM architecture

This hybrid network has five layers, consisting of alternating convolutional and pooling layers. The network structure, i.e., the depth of the network and its parameters, is tuned to enhance the performance of the classifier and to reduce over-fitting of the parameters. The preprocessed data are given to the convolutional layer, which convolves the input matrix and passes the result to the pooling layer. In the pooling layer, a max-pooling operation is performed, and the output is fed to the LSTM layer. The LSTM layer applies the tanh function, and its output is given to the fully connected layer. In the fully connected layer, the activation function, i.e., the softmax classifier, is applied, and the result is given to the output layer. The output layer classifies the data into heart disease or no heart disease based on the results from the fully connected layer.

The network is trained using backpropagation. The backpropagation algorithm uses gradient descent to calculate the error, and based on the error value, the weights and biases are updated. The error is propagated backwards to the previous layers, and the network is trained. The weights and biases are updated according to Eqs. (10) and (11).

$$\Delta W_{l}\left( p + 1 \right) = - \frac{x\lambda}{r} W_{l} - \frac{x}{n}\frac{\partial C}{\partial W_{l}} + m\,\Delta W_{l}\left( p \right)$$
(10)
$$\Delta B_{l}\left( p + 1 \right) = - \frac{x}{n}\frac{\partial C}{\partial B_{l}} + m\,\Delta B_{l}\left( p \right),$$
(11)

where Wl and Bl denote the weights and biases of layer l, λ is the regularization parameter, x the learning rate, p the update step, n the total number of training samples, m the momentum, and C the cost function.
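As a rough sketch of Eqs. (10) and (11), the update below combines a weight-decay term, a gradient step, and a momentum term. The symbol r is kept as a parameter because it is not defined in the text; it is treated here, as an assumption, like a mini-batch size, and all default values are illustrative only:

```python
def update_layer(W, B, dC_dW, dC_dB, dW_prev, dB_prev,
                 x=0.001, lam=1e-4, m=0.9, n=212, r=16):
    """One momentum update of a layer's weights W and biases B (numpy arrays), Eqs. (10)-(11).

    x   : learning rate          lam : regularization parameter
    m   : momentum               n   : total number of training samples
    r   : not defined in the text; assumed here to act like a mini-batch size
    """
    dW = -(x * lam / r) * W - (x / n) * dC_dW + m * dW_prev   # weight step, Eq. (10)
    dB = -(x / n) * dC_dB + m * dB_prev                        # bias step, Eq. (11)
    return W + dW, B + dB, dW, dB
```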

ReLU (rectified linear unit) is used as the activation function in all the convolutional layers, and the softmax activation function is used in the output layer. A batch normalization layer is included in the network to reduce overfitting and the sensitivity of the algorithm, and it also aids the backpropagation process.
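One possible Keras realization of this architecture is sketched below as a hedged illustration only: the filter counts, kernel sizes, and LSTM width are illustrative choices rather than values reported here, and a single sigmoid output unit is used in place of a two-class softmax (the two are equivalent for binary classification):

```python
from tensorflow.keras import layers, models

n_features = 13   # number of selected attributes (illustrative; depends on feature selection)

model = models.Sequential([
    # Each record is treated as a 1-D sequence of length n_features with one channel
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu",
                  input_shape=(n_features, 1)),
    layers.MaxPooling1D(pool_size=2),
    layers.BatchNormalization(),                   # reduces overfitting and sensitivity
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                               # tanh recurrence over the pooled features
    layers.Dense(1, activation="sigmoid"),         # probability of heart disease
])
```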

Experimental Results

This section explores the effectiveness of the hybrid CNN and LSTM for the prediction of heart disease. Deep learning models involve many parameters; to identify the optimal ones, various experiments were conducted on the hybrid CNN and LSTM network. The pseudocode is described below.

Pseudocode for the Hybrid CNN-LSTM Algorithm


Inputs are normalized so that they fall between 0 and 1. The selected features are given to the convolutional layer, whose parameters are (1) the kernel size (the dimensionality of the convolution window) and (2) the activation function (ReLU is used here). The convolutional layer output is passed to the pooling layer, where max pooling is applied to obtain the required features from the dataset by choosing the maximum values. In the fully connected layer, a sigmoid activation function is used to calculate the probability of the two classes, i.e., heart disease or no heart disease.

After obtaining the results from the fully connected layer, the model is configured in the compile step; the parameters used are (1) the loss function, which quantifies the loss incurred by the algorithm, and (2) the optimizer used to train the model. When fitting the model, it is evaluated on the test data. Finally, the accuracies are calculated, which describe how well the model performs on the test data.

The CNN + LSTM model is trained for 200 epochs. Adam is used as the optimizer with a learning rate of 0.001, and the loss function is binary cross-entropy.
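Assuming the model and the preprocessed splits from the earlier sketches (with X_train and X_test reduced to the n_features selected attributes), this training configuration could be expressed as follows; the batch size is an assumption, as it is not stated in the text:

```python
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=0.001),   # Adam with learning rate 0.001
              loss="binary_crossentropy",            # binary cross-entropy loss
              metrics=["accuracy"])

history = model.fit(
    X_train.values.reshape(-1, n_features, 1), y_train,
    validation_data=(X_test.values.reshape(-1, n_features, 1), y_test),
    epochs=200, batch_size=16, verbose=0)             # batch size is an assumption
```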

The performance of the model is evaluated by the accuracy, specificity, and sensitivity measures, which indicate how accurate and stable the model is. Sensitivity measures how well the actual positives are predicted correctly, while specificity evaluates how well the actual negatives are predicted correctly. The performance metrics are calculated using Eqs. (12), (13), and (14).

$${\text{ACC}} = \frac{{{\text{TPr}} + {\text{TNr}}}}{{{\text{TNr}} + {\text{FPr}} + {\text{TPr}} + {\text{FNr}}}} \times 100$$
(12)
$${\text{SPEC}} = \frac{{{\text{TNr}}}}{{{\text{TNr}} + {\text{FPr}}}} \times 100$$
(13)
$${\text{SENS}} = \frac{{{\text{TPr}}}}{{{\text{TPr}} + {\text{FNr}}}} \times 100,$$
(14)

where TPr represents positive instances that are correctly predicted as normal, TNr negative instances that are correctly predicted as abnormal, FPr negative instances that are incorrectly predicted as normal, and FNr positive instances that are incorrectly predicted as abnormal.
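Using scikit-learn, Eqs. (12)–(14) can be computed from the test-set confusion matrix as sketched below; note that, as an assumption differing from the convention above, this sketch treats the heart-disease class as the positive class:

```python
from sklearn.metrics import confusion_matrix

y_prob = model.predict(X_test.values.reshape(-1, n_features, 1)).ravel()
y_pred = (y_prob >= 0.5).astype(int)                 # 1 = heart disease, 0 = normal

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn) * 100  # Eq. (12)
specificity = tn / (tn + fp) * 100                   # Eq. (13)
sensitivity = tp / (tp + fn) * 100                   # Eq. (14)
```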

To assess the diagnostic capacity, receiver operating characteristic (ROC) curves are used. The ROC curve is a popular evaluation tool for binary classification problems; it plots the true positive rate (TPR = TPr/(TPr + FNr)) against the false positive rate (FPR = FPr/(FPr + TNr)) over all classification thresholds. Figure 4 represents the ROC curve for our proposed model and illustrates that the model has a good ability to separate the positive class from the negative class.
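A short sketch of how such a ROC curve could be produced with scikit-learn and matplotlib is shown below, continuing from the predictions computed above:

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

fpr, tpr, thresholds = roc_curve(y_test, y_prob)     # TPR and FPR at every threshold
auc = roc_auc_score(y_test, y_prob)                  # area under the ROC curve

plt.plot(fpr, tpr, label=f"CNN + LSTM (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], "--", color="grey")         # chance line for reference
plt.xlabel("False positive rate"); plt.ylabel("True positive rate"); plt.legend()
plt.show()
```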

Fig. 4 ROC curve

An epoch is one full pass through the entire training dataset; the number of epochs is a hyperparameter that determines how many times the algorithm iterates over the data, training the network with each instance of the dataset. The epoch-wise accuracies are represented in Fig. 5. The results show that the network achieved higher accuracy as the number of epochs increased, for both the training and testing data. The CNN + LSTM network reached 89% accuracy at 200 epochs.

Fig. 5 Epochs vs accuracy

Figure 6 shows how the network behaves for various learning rates. As the number of iterations increases, the accuracy differs across learning rates; the highest accuracy for CNN + LSTM was reached with a learning rate of 0.001.

Fig. 6 Learning rate vs accuracy

When CNN is used alone, both accuracy and loss increase as more CNN layers are added, as shown in Fig. 7. When combined with LSTM, the loss is reduced and the accuracy improves further. Precision is observed to follow the same trend as accuracy. Higher accuracy was achieved by increasing the number of hidden layers and by hybridizing LSTM with CNN.

Fig. 7 Comparison of CNN layers and LSTM

The proposed algorithm was compared with various machine learning algorithms; among these, CNN + LSTM attained the best accuracy, as shown in Fig. 8. The reason is that, unlike RNN and LSTM, CNN alone does not exploit previous information, whereas LSTM uses previous information to classify future instances. Therefore, CNN combined with LSTM is used, and it efficiently handles the dynamic changes in large data.

Fig. 8 Comparison with other models

Conclusion

Heart disease affects a large portion of the world's population. The disease cannot be cured completely, but it can be kept under control; if left uncontrolled, it can lead to death, so heart disease should be detected as early as possible. In this work, an effective heart disease classification method is proposed using hybrid deep learning techniques, namely the convolutional neural network (CNN) and long short term memory (LSTM). The data are preprocessed using normalization to handle missing values. The weight-by-SVM feature selection method is used for feature selection, and the selected features are given to the hybrid CNN and LSTM model. The backpropagation method is used to minimize the loss incurred by the model. The proposed model is analyzed by comparing different parameters, and in each comparison the proposed hybrid model provides better results. Experimental results achieved 89% accuracy, 81% sensitivity, and 93% specificity, outperforming other traditional machine learning classifiers. In future work, this CNN + LSTM model can be evaluated on real-time medical datasets and its performance analyzed.