1 Introduction

Cardiovascular diseases have remained the leading cause of death worldwide over the last two decades. In most cases, the diagnosis of heart disease (HD) depends on a complex combination of clinical and pathological data. Palaniappan and Awang [1] proposed an intelligent HD prediction system built with data mining techniques such as decision trees, naïve Bayes, and artificial neural networks. Their results illustrated the particular strength of each method in meeting the specified mining objectives. Srinivas et al. [2] applied data mining techniques to predict heart attack. Based on the calculated significant weightage, frequent patterns with a value greater than a predefined threshold were selected for heart attack prediction.

Yao and Liu [3] proposed an evolutionary feedforward neural network with a neuro-genetic approach for the optimization of a trading agent. Carlos Ordonez used association rules to improve HD prediction: the rules were applied to a real dataset containing medical records of patients with HD, and the risk factors were identified. Sulaiman et al. [4] proposed a hybrid multi-layer feedforward neural network (MLFFNN) technique for predicting the output of a grid-connected photovoltaic system. Kuok et al. [5] classified a biomedical dataset with a hybrid genetic optimized back-propagation network, and Karegowda et al. [6] also proposed a genetic optimized back-propagation network for classification. According to Song and Gu [7], the particle swarm optimization (PSO) algorithm is a promising approach for many optimization problems. Lee et al. [8] used PSO and the genetic algorithm (GA) for excess return evaluation in the stock market; in their experiments, PSO performed better than GA. In this paper, an intelligent HD prediction system is proposed using an optimized feed forward neural network. To improve the performance of the network, its parameters are tuned using the PSO algorithm, as discussed in the rest of the paper.

2 Materials and Methodology

This section describes the dataset and the proposed methodology for HD prediction. The input attributes are scaled using min–max normalization, and the normalized attributes are given as input to the PSO-optimized neural network (PSONN).
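Min–max normalization can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `min_max_normalize`, the default target range [0, 1], the midpoint rule for constant columns, and the sample values are all assumptions made for the example.

```python
def min_max_normalize(column, lo=0.0, hi=1.0):
    """Linearly rescale a list of numbers so its minimum maps to lo and its maximum to hi."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # Constant attribute: nothing to rescale, so map every value to the midpoint (assumption).
        return [(lo + hi) / 2.0 for _ in column]
    scale = (hi - lo) / (c_max - c_min)
    return [lo + (value - c_min) * scale for value in column]


# Hypothetical attribute values, not taken from the dataset.
ages = [29, 45, 54, 77]
normalized_ages = min_max_normalize(ages)
```

Each attribute would be normalized separately in this way, so that attributes measured on large scales do not dominate the weighted sums inside the network.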

2.1 Heart Disease Dataset

The HD dataset is obtained from the UCI (University of California, Irvine) Center for Machine Learning and Intelligent Systems. Data collected from 270 patients are used in the proposed work, and each record contains 13 risk factors. HD risk factors never occur in isolation; they are correlated with each other. The digitized data comprise 150 normal and 120 abnormal cases. In the selected dataset, class 0 specifies the absence of HD and class 1 specifies its presence. A clinical decision support system is normally designed to support clinical decision-making directly: it presents clinicians with patient-specific assessments or recommendations, produced from the characteristics of individual patients, for their consideration. The attributes considered include age, gender, blood pressure, cholesterol, fasting blood sugar, chest pain type, maximum heart rate, ST segment slope in ECG, number of major vessels colored in angiogram, and thallium test value.

2.2 Multi-Layer Feed Forward Neural Network

Artificial neural networks (ANN) can be used to tackle prediction problems in medical datasets involving multiple inputs. A MLFFNN consists of a layer of input units, one or more layers of hidden units, and one output layer of units; the network used here has three layers, with input units \( x_{i} = \{ x_{1} ,x_{2} , \ldots ,x_{n} \} \), hidden units \( h_{j} = \{ h_{1} ,h_{2} , \ldots ,h_{n} \} \), and output units \( y_{k} = \{ y_{1} ,y_{2} , \ldots ,y_{n} \} \). Each connection between nodes has a weight associated with it. In addition, there is a special weight (w), given as \( w_{ij} = \left\{ {w_{1} ,w_{2} , \ldots ,w_{n} } \right\} \), where n denotes the number of neurons in the hidden layer, that feeds into every node at the hidden layer, and a special weight (z), given as \( z_{jk} = \{ z_{1} ,z_{2} , \ldots ,z_{n} \} \), that feeds into every node at the output layer. These weights are called the biases and set the thresholding values for the nodes. Each hidden node calculates the weighted sum of its inputs and applies a thresholding function, here a sigmoid activation function, to determine its output. The back-propagation (BP) algorithm is a commonly used learning algorithm for training ANN. The network is first initialized by setting all its weights to small random numbers in the range [0, 1]. The 13 inputs are applied, and the output is calculated. The actual output (t) obtained is compared with the target output (y), and the error E is calculated. This error is then used to adjust the weights so that the error is reduced. The process is repeated until the error is minimal. In the proposed work, the mean square error function defined in Eq. (1) is used for training.

$$ E = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - t_{i} } \right)^{2} } $$
(1)
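The forward pass and the error of Eq. (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the choice of five hidden neurons, the sigmoid on the output node, and the sample input are all hypothetical; only the 13 inputs, the single output, the sigmoid hidden units, and the [0, 1] weight initialization come from the text.

```python
import math
import random


def sigmoid(a):
    """Logistic activation, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """One pass through a network with a single hidden layer and one output node."""
    hidden = [
        sigmoid(sum(w * x_i for w, x_i in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    # Sigmoid on the output node is an assumption; the text only fixes the hidden activation.
    return sigmoid(sum(z * h for z, h in zip(output_weights, hidden)) + output_bias)


def mean_squared_error(outputs, targets):
    """Eq. (1): average of the squared differences between two equal-length sequences."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


rng = random.Random(0)
n_inputs, n_hidden = 13, 5  # 13 attributes from the text; 5 hidden neurons is hypothetical
hidden_weights = [[rng.uniform(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_bias = [rng.uniform(0, 1) for _ in range(n_hidden)]
output_weights = [rng.uniform(0, 1) for _ in range(n_hidden)]
output_bias = rng.uniform(0, 1)

sample = [0.5] * n_inputs  # hypothetical normalized record
prediction = forward(sample, hidden_weights, hidden_bias, output_weights, output_bias)
```

Because the error is squared, the result of `mean_squared_error` does not depend on which of the two sequences is treated as the target.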

The weights are updated using Eq. (2).

$$ \varDelta w_{ij} \left( {t + 1} \right) = - \eta \frac{\partial E}{{\partial w_{ij} }} + \alpha \varDelta w_{ij} \left( t \right) $$
(2)

In Eq. (2), α is the momentum factor and η is the learning rate.
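The update of Eq. (2) can be illustrated on a single weight. The sketch below is a toy example under stated assumptions: the function name `momentum_step`, the values η = 0.1 and α = 0.9, and the one-dimensional error E(w) = (w − 3)² are hypothetical and are not taken from the paper.

```python
def momentum_step(weight, gradient, previous_delta, eta=0.1, alpha=0.9):
    """Eq. (2): new change = -eta * dE/dw + alpha * previous change."""
    delta = -eta * gradient + alpha * previous_delta
    return weight + delta, delta


# Toy error surface E(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
weight, delta = 0.0, 0.0
for _ in range(200):
    gradient = 2.0 * (weight - 3.0)
    weight, delta = momentum_step(weight, gradient, delta)
```

In this toy setting the weight oscillates around the minimum with shrinking amplitude, which is the persistence effect that the momentum term introduces.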

The momentum factor α lets each weight adjustment carry over part of the previous adjustment; the magnitude of this persistence is controlled by α. Momentum in the BP algorithm can help speed up convergence and avoid local minima. The learning rate η is a training parameter that controls the size of the weight and bias changes during learning. The selection of the learning rate is of critical importance in finding the global minimum of the error. Increasing the number of hidden neurons increases the processing power and flexibility of the system, but its complexity and cost also increase with the number of hidden neurons. The ranges of the optimizing parameters are given in Table 1.

Table 1 Optimizing parameters selected in FFNN

2.3 Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Eberhart and Kennedy [9]. PSO does not require gradient information about the objective function under consideration, and it can often reach the global optimum in relatively few iterations. In PSO, each particle in the population has a velocity \( v_{i}(t) \), which enables it to fly through the problem space. Each particle is therefore represented by a position \( x_{i}(t) \) and a velocity vector. The dimensions of the position and velocity vectors are defined by the number of decision variables in the optimization problem. The position of a particle is modified using its previous position and its current velocity.

$$ v_{i} \left( {t + 1} \right) = wv_{i} \left( t \right) + c_{1} \,{\text{rand}}_{1} \left( {{\text{Pbest}}_{i} - x_{i} (t)} \right) + c_{2} \,{\text{rand}}_{2} \left( {{\text{Gbest}}_{i} - x_{i} (t)} \right) $$
(3)
$$ x_{i} \left( {t + 1} \right) = x_{i} \left( t \right) + v_{i} \left( {t + 1} \right) $$
(4)

where \( v_{i}(t) \) is the velocity of particle i at iteration t, \( x_{i}(t) \) is the current position of particle i at iteration t, \( {\text{Pbest}}_{i} \) is the personal best of particle i, \( {\text{Gbest}}_{i} \) is the best position in the neighborhood, rand is a random number between 0 and 1, w is the weighting function, \( c_{1} \) is the cognition learning rate, and \( c_{2} \) is the social learning rate.
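A single velocity and position update for one particle can be sketched as follows, assuming the standard PSO form in which the cognitive term is scaled by the distance to Pbest and the social term by the distance to Gbest. The function name `pso_step`, the inertia value w = 0.7, and the use of a fresh random number for each dimension are assumptions made for this illustration.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=random):
    """Return the updated (position, velocity) of one particle, per Eqs. (3) and (4)."""
    new_v = []
    for x_d, v_d, p_d, g_d in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # uniform random numbers in [0, 1)
        new_v.append(w * v_d + c1 * r1 * (p_d - x_d) + c2 * r2 * (g_d - x_d))
    # Eq. (4): the new position is the old position plus the new velocity.
    new_x = [x_d + nv_d for x_d, nv_d in zip(x, new_v)]
    return new_x, new_v


# Hypothetical two-dimensional particle.
position, velocity = pso_step(
    x=[1.0, 2.0], v=[0.5, -0.5], pbest=[1.5, 1.5], gbest=[2.0, 1.0],
    rng=random.Random(0),
)
```

When a particle already sits on both its personal best and the neighborhood best, the two attraction terms vanish and only the inertia term w·v moves it.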

The PSONN was developed mainly to search for the optimal training parameters, i.e., the number of neurons in the hidden layer, the learning rate, the momentum rate, the transfer function in the hidden layer, and the learning algorithm. These training parameters are the decision variables of the optimization task. The objective of the optimization process is to minimize the MSE during training. PSO is applied to the feed forward neural network to enhance the learning process in terms of convergence rate and classification accuracy.

3 Heart Disease Prediction System Using PSONN

The input data to the PSONN need to be normalized, as the network works with data represented by numbers in the range between 0.001 and 0.999. A three-layer PSONN was used in this study for model calibration. The number of input neurons depends on the number of input attributes, and there is only one neuron in the output layer. In the proposed work, 13 input neurons and one output neuron are used for the prediction of HD. The training parameters to be optimized are the number of neurons in the hidden layer, the momentum rate, the learning rate, the transfer function in the hidden layer, and the learning algorithm.

The PSONN algorithm for intelligent heart attack prediction is given below

For the PSO parameters, the acceleration constants \( c_{1} \) and \( c_{2} \) are initially set to 2, and \( r_{1} \) and \( r_{2} \) are random numbers in the range [0, 1]. The initial population is randomly selected, and the population size is set to 25. The total number of iterations is chosen as 200. With these parameters, the number of hidden layer neurons, the momentum rate, the learning rate, the transfer function in the hidden layer, and the learning algorithm are optimized. The whole process is repeated until the generation count reaches 200 or the minimum fitness is achieved.
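The search loop described above can be sketched as follows, using the stated swarm size of 25, at most 200 iterations, and c1 = c2 = 2. Everything else is an assumption made for illustration: the search ranges in `BOUNDS`, the inertia weight of 0.7, the clamping of positions to the bounds, the stopping threshold `target`, and above all `surrogate_mse`, a placeholder standing in for training the network and returning its training MSE. Only three numeric decision variables are searched here; the transfer function and learning algorithm are left out.

```python
import random

# Hypothetical search ranges; they are not the values of Table 1.
BOUNDS = [(2.0, 20.0),   # number of hidden neurons (rounded when decoded)
          (0.01, 1.0),   # learning rate
          (0.0, 0.99)]   # momentum factor


def surrogate_mse(position):
    """Placeholder fitness; a real run would train the network and return its training MSE."""
    hidden, eta, alpha = round(position[0]), position[1], position[2]
    return ((hidden - 8) / 18.0) ** 2 + (eta - 0.3) ** 2 + (alpha - 0.7) ** 2


def run_pso(fitness, bounds, swarm_size=25, iterations=200,
            w=0.7, c1=2.0, c2=2.0, target=1e-6, seed=0):
    """Minimize fitness over the box given by bounds; return (best position, best fitness)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vs = [[0.0] * len(bounds) for _ in range(swarm_size)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    best = min(range(swarm_size), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(iterations):
        for i in range(swarm_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                # Keep the particle inside the search range (assumption).
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            fit = fitness(xs[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = xs[i][:], fit
        if gbest_fit <= target:  # stop early once the fitness goal is met
            break
    return gbest, gbest_fit


best_position, best_fitness = run_pso(surrogate_mse, BOUNDS)
```

Replacing `surrogate_mse` with a function that trains the network for a candidate setting and returns its MSE would turn this sketch into a hyperparameter search of the kind the text describes, at the cost of one training run per fitness evaluation.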

4 Experimental Results

In this work, we analyze the use of the PSO algorithm and the cooperative variant with the weight decay mechanism for neural network training, aiming at better generalization performance in HD prediction. To evaluate these algorithms, we apply them to benchmark classification problems from the medical field. The training set and testing set accuracies are calculated and analyzed for the standard HD dataset. For the standard dataset, all 270 records are used for training, and the network is also tested with the same data. The accuracy, sensitivity, and specificity for the training and testing cases are given in Table 2. The performance metrics obtained with the proposed system on the standard dataset are compared with those of an SVM classifier and a feed forward neural network classifier in Fig. 1.

Table 2 Training and testing performance metrics
Fig. 1 Performance analysis for Cleveland dataset

An ROC curve is a graphical plot of the true positive rate against the false positive rate, and it is used to analyze the performance of a classifier. The ROC curves for the SVM, the feed forward neural network, and the PSO-optimized neural network (PSONN) are shown in Fig. 2. From Fig. 1, it is observed that the PSONN performs considerably better than the other classifiers.

Fig. 2 ROC plot for different classifiers

The ROC curve depends on the true positive rate and the false positive rate. The curve shows that the PSONN-based prediction system achieves a significant improvement in classifying diseased and non-diseased patterns. In these graphs, the PSONN shows better performance in terms of accuracy, sensitivity, and specificity. The proposed classifier obtains a true positive rate of 0.889 at a false positive rate of 0.0987. The sensitivity and specificity achieved by the PSONN on the standard dataset are 88.98 % and 90.13 %, with an overall accuracy of 89.6 %, which is higher than that of the other two well-known classifiers. This increase arises because the neural network structure is optimized with the PSO algorithm. The BP algorithm depends primarily on the initial parameter setting, and the performance of the classifier depends on these parameters. From Fig. 1, it is noticed that the FFNN performs better than the SVM; this is due to the fact that SVM does not depend on the generalization error. Although the FFNN produces good classification accuracy for the prediction of HD, its initial parameter settings are selected manually, which takes considerable time. Hence, FFNN and SVM are less suitable for physicians to analyze and predict diseased and non-diseased patterns. The experimental results show that an improvement of 4 % in accuracy has been achieved by applying the optimized neural network classifier. This implies that the optimized neural network is a desirable classifier that can be used as an aid for physicians in predicting HD.
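The metrics discussed above follow from the four counts of a confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the counts passed to it are hypothetical and do not reproduce the paper's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "false_positive_rate": fp / (fp + tn),    # equals 1 - specificity
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts: 10 diseased and 10 non-diseased cases.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

The false positive rate is the complement of the specificity, which is why a specificity of 90.13 % corresponds to the reported false positive rate of 0.0987.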

5 Conclusion

In this paper, we have proposed an adaptive intelligent mechanism for HD prediction using profiles collected from patients. The PSONN performs global and local optimization of the network parameters within the specified range. By exploiting this distinct feature of the PSONN, a computerized prediction algorithm is developed that is not only accurate but also computationally efficient for heart attack prediction. With proper optimization using PSO, the method can thus evolve an optimum number of hidden units within an architecture space. The proposed system achieved better accuracy than the SVM and feed forward neural network classifiers with which it was compared.