1 Introduction

Advances in computing, digital storage, and communication technologies have enabled the generation of huge amounts of data in the medical field [29]. Extracting patterns from these data helps medical practitioners in diagnosing patients.

A patient’s record comprises attributes such as demographics, test results, images, video clips, and others [12]. Manually extracting the desired information from such voluminous data is a herculean task, given its size and high dimensionality [2]. Hence, automated techniques are required for analyzing the data. Data mining techniques are handy here because they both automate the analysis and handle large datasets [1]. Data mining [3] helps doctors diagnose patients by extracting useful knowledge from patients’ medical data [6, 16, 18]. The term medical data mining is used for models that classify medical data; it applies data mining methods to obtain accurate information. Medical data mining is used to diagnose illness, administer therapy, establish rapport between doctors and patients, improve healthcare management, and so on [15, 24]. Every day, gigabytes of medical data are generated from several sources, including image databases such as SPECT, MRI, and PET, and signal databases such as ECG and EEG [25]. Unlike traditional data mining, data mining in the medical field is very cumbersome [5, 12, 27].

In the past few decades, millions of people across the globe have succumbed to heart disease owing to changes in lifestyle and food habits. Diagnosing heart disease takes a lot of time. Machine learning techniques can give medical practitioners valuable insights about heart patients, enabling them to provide proper treatment [4]. This motivated us to develop a model that helps doctors diagnose heart disease patients.

This paper proposes a novel AGAFL classifier for classifying heart disease datasets. AGAFL has three steps:

  1. reduction of features/dimensions utilizing rough sets,

  2. generating rules from the reduced dataset through the application of a fuzzy logic classifier,

  3. optimizing the generated rules through the application of an adaptive genetic algorithm.

The adaptive genetic algorithm uses a fitness function to optimize the rules generated by the fuzzy logic classifier. The major contributions of the proposed model are as follows:

  • rough set theory is used to identify the most relevant features, since it is an effective tool for dealing with vague and uncertain information when selecting attributes for a decision system;

  • an adaptive genetic algorithm is used to optimize the classification rules, achieving better accuracy and lower time complexity, as shown in the results and discussion section.

2 Paper organization

The rest of the paper is organized as follows. Section 3 reviews the related work. Section 4 presents the necessary background. Section 5 describes the proposed classification approach. Section 6 discusses the experimental results. Section 7 outlines future work, and Sect. 8 presents concluding remarks.

3 Related work

Researchers have proposed several techniques for diagnosing diseases. Long et al. [17] described a cardiac diagnosis method that reduces dimensionality using rough sets together with an interval type-2 fuzzy logic system. A hybrid learning procedure comprising fuzzy c-means clustering and parameter tuning with the firefly algorithm is then applied to the dataset.

Santhanam and Ephzibah [22] proposed a hybrid genetic fuzzy model to predict heart disease. Genetic algorithms were employed for feature selection, and the selected features were used to build a classification model using fuzzy inference. Sample data were used to create the required fuzzy rules, and the genetic algorithm chose a significant and relevant subset of rules. The selected parameters were serum cholesterol (chol), sex, maximum heart rate achieved (thalach), ST depression induced by exercise relative to rest (oldpeak), exercise-induced angina (exang), thal value, and the number of major vessels coloured (ca). Fuzzification was done through a Gaussian membership function and de-fuzzification by the centroid method. The model was evaluated with metrics such as specificity, accuracy, sensitivity, and the confusion matrix.

Srinivas et al. [26] predicted cardiac disease using a rough-fuzzy hybrid classifier. The procedure is: (1) generate rules utilizing rough set theory, (2) predict using a fuzzy classifier. Experiments were carried out on the publicly available Hungarian, Cleveland, and Switzerland heart disease datasets. Seera and Lim [23] analyzed medical data using a hybrid of a fuzzy min–max neural network, a Random Forest model, and the Classification and Regression Tree.

Yuvraj and Vivekanandan [32] described SVM-based tumour classification with symmetric non-negative matrix factorization of gene expression data. Genes were selected through non-negative matrix factorization (NMF), symmetric NMF was used for classification, feature extraction was done by SVM-NMF, and finally a support vector machine with weighted kernel width performed the classification. Similarly, Vafaie et al. [30] classified ECG-based heart disease datasets using a genetic-fuzzy system and a dynamical model of ECG signals. Long et al. [17] proposed a model to diagnose cardiac ailments with an algorithm based on firefly optimization.

Kharat et al. [14] proposed neural network-based models to classify human brain magnetic resonance images. Their technique comprises three stages: dimensionality reduction, feature extraction, and classification. In the early stages, features correlated with the MRI images are extracted with the Discrete Wavelet Transform, and the MRI parameters are then reduced using Principal Component Analysis. At the classification stage, two supervised machine learning classifiers are used: the former works on the principle of a feed-forward artificial neural network, while the latter relies on a back-propagation neural network. These classifiers label brain MRI images as normal or abnormal. Henriques et al. [11] describe predicting heart-failure decompensation events by trend analysis of telemonitoring data.

Doctors and researchers have proposed several methods to predict diabetes, aiming to reduce the cost of tests and the time to diagnosis and to improve prediction accuracy. A scheme to monitor type 2 diabetes mellitus is explained by Wang and Kang [10]. The following algorithms are employed in that work:

  1. Decision Tree (DT) to classify and generate rules; it is relatively quick and efficient in rule generation.

  2. Artificial Neural Networks (ANN) to process nonlinear problems.

  3. A back-propagation neural network, which is widely utilized for diagnosis and prediction.

Time series prediction is based on three models: integrated, auto-regressive, and moving average, which are combined to produce hybrid models such as the autoregressive moving average and the autoregressive integrated moving average. After pre-processing of the data, feature selection is carried out using these models; the ANN then predicts the disease and generates suggestions on clinical procedures and strategies to control diabetes.

Reddy and Khare [28] introduced an algorithm called the FFBAT-optimized rule-based fuzzy logic (RBFL) classifier for diabetes classification. In this model, Locality Preserving Projection (LPP) is used to reduce the number of features, and diabetes classification is carried out with the RBFL classifier. The relevant attributes are found by the LPP algorithm, the RBFL then generates fuzzy rules, and finally the FFBAT algorithm is employed to optimize the rules. FFBAT is a hybrid of the BAT and Firefly (FF) optimization techniques. FFBAT has also been used to classify the publicly available heart disease datasets from the UCI machine learning repository [21].

Gandomi et al. [9] suggested Cuckoo Search combined with Lévy flights. Reddy and Khare [20] introduced the OFBAT-RBFL algorithm for classifying cardiac ailments, in which Opposition Based Learning is integrated with FFBAT to improve it.

Kaluri and Reddy [13] created a framework to extract and recognize sign language gestures in four stages:

  1. Segmentation utilizing a modified region growing algorithm

  2. Noise removal utilizing a median filter

  3. Feature extraction

  4. Recognition employing an adaptive genetic fuzzy classifier

Game et al. [8] proposed a model for classifying healthcare data. It includes the following steps: (1) a map-reduce framework, (2) a support vector machine, and (3) an optimized decision tree classifier. PCA is used for dimensionality reduction, then SVM is applied, and divergence-based grey wolf optimization is used for optimal rule generation.

Wang et al. [31] proposed several evolutionary approaches for classification. The first approach encodes rule sets with bit-string genomes. The second uses Genetic Programming (GP) to create decision trees with arbitrary expressions attached to the nodes. The third, EDDIE-101, is used for classification.

From our in-depth literature survey, we found that existing algorithms perform well on heart disease datasets, but when features are reduced by optimization algorithms, their performance on measures such as accuracy, specificity, and sensitivity drops dramatically.

4 Algorithm background

Pawlak [19] introduced rough set theory (RST). RST is used to analyze data that are vague or imprecise, and its primary application is attribute reduction. The basic notions of rough set theory are as follows. Let \(I=(U, A \cup \left\{ d \right\} )\) be an information system (also called a decision table), where U is a non-empty finite set of objects (the universe), A is a non-empty finite set of condition attributes, and d is the decision attribute. For every \(a\in A\) there is a function \(f_{a}:U\rightarrow V_{a}\), where \(V_{a}\) is the set of values of a. For any \(P\subseteq A\), the P-indiscernibility relation, denoted IND(P), is defined as:

$$\begin{aligned} IND(P)=\left\{ \left( x,y \right) \in U \times U \mid \forall a\in P,\; f_{a}\left( x\right) = f_{a}\left( y\right) \right\} \end{aligned}$$
(1)

The partition of U induced by IND(P) is denoted U/P. If \(\left( x,y\right) \in IND\left( P\right)\), then x and y are indiscernible by the attributes in P. The equivalence classes of the P-indiscernibility relation are denoted \(\left[ x\right] _{P}\). Let \(X\subseteq U\); the P-lower approximation \(\underline{P}X\) and the P-upper approximation \(\overline{P}X\) of the set X are defined as:

$$\begin{aligned} \underline{P}X= & {} \left\{ x \in U \mid \left[ x\right] _{P} \subseteq X\right\} \end{aligned}$$
(2)
$$\begin{aligned} \overline{P}X= & {} \left\{ x \in U \mid \left[ x\right] _{P} \cap X\ne \emptyset \right\} \end{aligned}$$
(3)

Let P, \(Q\subseteq A\) be sets of attributes inducing equivalence relations over U. The positive, negative, and boundary regions are then defined as:

$$\begin{aligned} POS_{P}(Q)= & {} \bigcup _{X\in U/Q}\underline{P}X \end{aligned}$$
(4)
$$\begin{aligned} NEG_{P}\left( Q\right)= & {} U - \bigcup _{X\in U/Q}\overline{P}X \end{aligned}$$
(5)
$$\begin{aligned} BND_{P}\left( Q\right)= & {} \bigcup _{X\in U/Q}\overline{P}X- \bigcup _{X\in U/Q}\underline{P}X \end{aligned}$$
(6)

The positive region \(POS_{P}\left( Q\right)\) of the partition U/Q with respect to P is the set of all objects of U that can be positively classified into the classes of U/Q using P. Q depends on P to a degree \(k\left( 0 \le k \le 1\right)\), denoted \(P\Rightarrow _{k} Q\), where

$$\begin{aligned} k = \gamma _{P}\left( Q \right) = \frac{\mid POS_{P}\left( Q \right) \mid }{\mid U \mid } \end{aligned}$$
(7)

Here P is the set of condition attributes, Q is the decision, and \(\gamma _{P} (Q)\) is the quality of classification. If \(k=1\), Q depends totally on P; if \(0<k<1\), Q depends partially on P; and if \(k=0\), Q is independent of P. \(\mid .\mid\) denotes set cardinality. The aim of attribute reduction is to remove attributes that are not necessary. The set of reducts is defined as:

$$\begin{aligned} Red(C)=\left\{ R \subseteq C \mid \gamma _{R} (D) =\gamma _{C} (D),\; \forall B\subset R, \gamma _{B} (D)\ne \gamma _{C} (D) \right\} \end{aligned}$$
(8)

The set of minimal reducts is defined in Eq. 9 as

$$\begin{aligned} Red(C)_{min}=\left\{ R\in Red\mid \forall R' \in Red,|R|\le |R'| \right\} \end{aligned}$$
(9)
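To make the rough set notions above concrete, the following Python sketch computes indiscernibility classes, the lower and upper approximations (Eqs. 2–3), the dependency degree \(\gamma_P(Q)\) of Eq. 7, and the reducts of Eq. 8 for a toy decision table. The attribute names and values are illustrative assumptions, not data from the paper's datasets.

```python
from collections import defaultdict
from itertools import combinations

# Toy decision table: condition attributes plus a decision attribute 'd' (illustrative values).
table = [
    {"age": "old",   "bp": "high", "d": "sick"},
    {"age": "old",   "bp": "high", "d": "sick"},
    {"age": "young", "bp": "low",  "d": "healthy"},
    {"age": "young", "bp": "high", "d": "sick"},
    {"age": "young", "bp": "high", "d": "healthy"},
]
U = set(range(len(table)))

def ind_classes(P):
    """Partition U by the P-indiscernibility relation IND(P) (Eq. 1)."""
    classes = defaultdict(set)
    for x in U:
        classes[tuple(table[x][a] for a in P)].add(x)
    return list(classes.values())

def lower_upper(P, X):
    """P-lower and P-upper approximations of X (Eqs. 2 and 3)."""
    lower, upper = set(), set()
    for cls in ind_classes(P):
        if cls <= X:
            lower |= cls
        if cls & X:
            upper |= cls
    return lower, upper

def gamma(P, Q):
    """Dependency degree gamma_P(Q) = |POS_P(Q)| / |U| (Eqs. 4 and 7)."""
    pos = set()
    for X in ind_classes(Q):            # classes of the partition U/Q
        pos |= lower_upper(P, X)[0]
    return len(pos) / len(U)

def reducts(C, D):
    """Eq. 8: minimal attribute subsets R with gamma_R(D) equal to gamma_C(D)."""
    full, found = gamma(C, D), []
    for r in range(1, len(C) + 1):
        for R in combinations(C, r):
            if gamma(list(R), D) == full and not any(set(f) < set(R) for f in found):
                found.append(R)
    return found

print(gamma(["age", "bp"], ["d"]))      # quality of classification with both attributes (0.6)
print(gamma(["age"], ["d"]))            # dropping 'bp' lowers the dependency degree (0.4)
print(reducts(["age", "bp"], ["d"]))    # [('age', 'bp')] for this toy table
```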

5 AGAFL model

5.1 Genetic algorithm

The genetic algorithm starts with a set of solutions (represented by chromosomes) called the population. Selected solutions form new solutions (offspring) based on their fitness value: the higher the fitness, the greater the chance to reproduce. The basic genetic algorithm is shown in Algorithm 1.

Algorithm 1 Basic genetic algorithm

The adaptive genetic algorithm (AGA) is an improved version of the genetic algorithm in which adaptive mutation is employed to achieve the desired optimization results. A genetic algorithm applies mutation to each parent chromosome by randomly interchanging genes. In the proposed adaptive mutation, the mutation rate is calculated from the chromosome's fitness, and mutation is performed according to this rate. For the AGA to function, chromosomes representing the solution set must first be generated; every chromosome is then subjected to the AGA steps.
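Since the listing for Algorithm 1 is not reproduced here, the following minimal Python sketch shows the canonical GA loop the text describes (initial population, fitness-based selection, crossover, mutation). The population size, rates, and toy fitness function are illustrative assumptions; the adaptive variant described above replaces the fixed mutation rate with a fitness-dependent one.

```python
import random

CHROM_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 16, 20, 50, 0.05

def fitness(ch):
    return sum(ch) / CHROM_LEN            # toy fitness: fraction of 1-bits

def crossover(p1, p2):
    cut = random.randrange(1, CHROM_LEN)  # single-point crossover
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(ch, rate=MUTATION_RATE):
    # Basic GA uses a fixed rate; the AGA derives the rate from the chromosome's fitness.
    return [1 - g if random.random() < rate else g for g in ch]

population = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Fitness-proportional chance to reproduce: keep the fitter half as parents.
    parents = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    offspring = []
    while len(offspring) < POP_SIZE:
        a, b = random.sample(parents, 2)
        offspring.extend(mutate(c) for c in crossover(a, b))
    population = offspring

print(max(fitness(ch) for ch in population))
```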

5.2 Steps in adaptive genetic algorithm

The genetic algorithm is a popular soft computing method, and many variants have been proposed to improve canonical GAs. One such variant is the AGA. It has the following steps, and the process is also shown as a flowchart in Fig. 1.

  1. Generation of Chromosomes

  2. Calculating Fitness function

  3. Crossover

  4. Adaptive Mutation

  5. Selection

Fig. 1 Flowchart of AGA

While optimizing the rules generated by the fuzzy classifier, every rule is treated as a chromosome. The chromosome pool is generated randomly, and every chromosome is subjected to the AGA operations. Chromosomes are evaluated based on their fitness value, and the fittest chromosomes are made available at the output. The vital steps in a genetic algorithm are crossover and mutation. A chromosome represents the information contained in the solution in a predefined way; a binary string is a common way to encode this information. A chromosome can be represented as follows:

$$\begin{aligned}&\hbox {Chromosome }1 \Rightarrow \texttt {1101100100110110}\\&\hbox {Chromosome }2 \Rightarrow \texttt {1101111000011110} \end{aligned}$$

Every bit in the string can correspond to a characteristic of the solution; alternatively, the whole string can represent a number. Many coding techniques exist for encoding a solution, and the choice basically depends on the problem being solved: for instance, a real number or an integer may be encoded directly, a permutation may be encoded, and so on.
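As a small illustration of the bit-string encoding described above, the sketch below encodes an integer-valued parameter as a fixed-width binary chromosome and decodes it back; the width and the example value are arbitrary assumptions.

```python
BITS = 8                                   # assumed chromosome width per parameter

def encode(value, bits=BITS):
    """Encode a non-negative integer as a list of bits (most significant first)."""
    return [(value >> i) & 1 for i in reversed(range(bits))]

def decode(chrom):
    """Decode a bit list back to the integer it represents."""
    return sum(bit << i for i, bit in enumerate(reversed(chrom)))

chrom = encode(150)        # [1, 0, 0, 1, 0, 1, 1, 0]
assert decode(chrom) == 150
```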

  1. Step 1

    Chromosome Generation The initial stage of the AGA is chromosome generation. Here the chromosomes are the rules generated by the fuzzy classifier, and the genes are the rule parameters. 'C' chromosomes are generated at random in the solution space, as given below (a consolidated sketch of Steps 1–5 is given after this list):

    $$\begin{aligned} Ch_{k}=[G_{0}^{k}\, G_{1}^{k}\ldots G_{C_{L}-1}^{k}], \quad 0\le k \le M-1,\; 0\le i \le C_{L}-1 \end{aligned}$$
    (10)

    where \(G_{i}^{k}\) is the ith gene of the kth chromosome, M is the total population size, and \(C_L\) is the length of the chromosome.

  2. Step 2

    Calculating Fitness Function The fitness function is given in Eq. 11. Its prime goal is to optimize the rules while selecting solutions; the solutions with better fitness are chosen to proceed further.

    $$\begin{aligned} f_{t}=\frac{1}{M}\sum _{s=1}^{M}R_{s} \end{aligned}$$
    (11)

    where \(R_s\) denotes a selected rule included in the summation and M is the total number of rules.

    The fitness value \(f_{t}\) of each chromosome is computed from the chosen rules. Every chromosome is checked against the fitness function, and only those solutions that satisfy it are selected to participate in reproduction using crossover or mutation.

  3. Step 3

    Crossover To generate a new chromosome, crossover is performed between two parent chromosomes; the newly generated chromosome is called an offspring. Crossover is carried out on the chosen genes, and offspring production depends on the crossover rate. The crossover point is found using Eq. 12

    $$\begin{aligned} CP_{rate}=\frac{CG}{CL} \end{aligned}$$
    (12)

    where \(CP_{rate}\) is the crossover rate, CG is the number of genes generated, and CL is the length of the chromosome.

    Based on the computed crossover rate, the parent chromosomes undergo crossover, generating a set of new chromosomes called offspring. The crossover point is found from this rate, and the genes at this point are interchanged between the two parent chromosomes, so that the offspring contain characteristics of both parents. The generated chromosomes tend to have better fitness than the older generation, making them better candidates for further processing.

  4. Step 4

    Adaptive Mutation In the proposed method, instead of the usual mutation step in which some random genes of a single parent are changed, mutation is performed according to a mutation rate calculated as follows:

    $$\begin{aligned} MU_{r}=\frac{P_{m}}{C_{L}} \end{aligned}$$
    (13)

    where \(MU_r\) is the Mutation rate, \(P_m\) is the Mutation Point and \(C_L\) is the length of the chromosome.

    The mutation rate is selected based on the estimated fitness value; in this method the fitness is computed from the rules generated by fuzzy logic. The mutation rate is compared with the fitness values against a threshold, and the resulting values are selected as the final mutation rate. The vector of mutation points is:

    $$\begin{aligned} MP=\left\{ mp_{1}, mp_{2},\ldots ,mp_{l}\right\} \end{aligned}$$
    (14)

    where l denotes the length of the chromosome. The mutation rate is then determined from the fitness \(f_{t}\):

    $$\begin{aligned} MU_{r}= {\left\{ \begin{array}{ll} 1; &{} \quad \text {if }f_{t}\le T \\ 0; &{} \quad \text {else} \end{array}\right. } \end{aligned}$$
    (15)

    where T is computed from the generated fuzzy rules. Mutation is applied at each of the mutation points given in Eq. 14. The mutation rate changes for every chromosome in every iteration and depends on its fitness value.

  5. Step 5

    Selection

    The last step in the adaptive genetic algorithm is selection. Based on the fitness values obtained, the new chromosomes (\(N_{p}\) of them) are placed in a selection pool, where chromosomes with the best fitness stay on top. The top \(N_{p}\) chromosomes in the selection pool are chosen as the next generation from among the 2\(N_{p}\) chromosomes.
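A consolidated Python sketch of Steps 1–5 follows. It treats each chromosome as a fixed-length list of rule-parameter genes, uses the mean of the selected rule scores as the fitness of Eq. 11, derives the crossover point and mutation rate from Eqs. 12, 13 and 15, and keeps the best \(N_p\) of the \(2N_p\) chromosomes as in Step 5. The rule scores, threshold T, and sizes are illustrative assumptions, not the paper's settings.

```python
import random

C_L, N_P, T = 8, 10, 0.5          # chromosome length, pool size, fitness threshold (assumed)

def fitness(chrom):
    """Eq. 11: mean of the selected rule scores carried by the genes (toy scores in [0, 1])."""
    return sum(chrom) / len(chrom)

def crossover(p1, p2, cg):
    """Eq. 12: CP_rate = CG / C_L, i.e. the crossover point is placed after CG genes."""
    return p1[:cg] + p2[cg:], p2[:cg] + p1[cg:]

def adaptive_mutation(chrom, p_m):
    """Eqs. 13 and 15: mutate only when fitness <= T, at rate MU_r = P_m / C_L."""
    if fitness(chrom) > T:
        return chrom[:]                            # fit chromosome: no mutation
    rate = p_m / C_L
    return [random.random() if random.random() < rate else g for g in chrom]

# Step 1: generate N_P random chromosomes (genes stand in for fuzzy-rule parameters).
population = [[random.random() for _ in range(C_L)] for _ in range(N_P)]

for _ in range(100):
    # Steps 3-4: produce N_P offspring via crossover and adaptive mutation.
    offspring = []
    while len(offspring) < N_P:
        a, b = random.sample(population, 2)
        for child in crossover(a, b, cg=random.randint(1, C_L - 1)):
            offspring.append(adaptive_mutation(child, p_m=random.randint(1, C_L)))
    # Step 5: selection pool of 2*N_P chromosomes, keep the top N_P by fitness.
    population = sorted(population + offspring, key=fitness, reverse=True)[:N_P]

print(round(fitness(population[0]), 3))
```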

5.3 Proposed AGAFL method

Feature reduction decreases the computation cost and also improves classification performance. To improve disease prediction, this paper uses rough sets for feature reduction and a fuzzy logic classifier to generate the rule set; the adaptive genetic algorithm then forms the solution set to obtain optimized rules for disease prediction. The disease prediction model comprises the following steps: normalization, rough-set-based attribute reduction, and then AGAFL classification. To start with, the input dataset is normalized to the range \(\left[ 0,1\right]\). The rough-set-based technique is then applied to select the best attributes. The reduced dataset is divided into two subsets, a training dataset and a testing dataset; the training dataset is fed into AGAFL, while the testing dataset is used to test the proposed model. The overall heart disease prediction process is depicted in Fig. 2. The details of each step of the proposed model are described as follows:

Algorithm 2 Pseudocode of the proposed model

Fig. 2 The overall process of the proposed disease prediction system

  1. Normalization

    Consider a dataset containing a number of attributes and entities. Normalization is applied to the dataset to reduce the arithmetic complexity of the data by mapping it into a specific interval. The widely used min-max method is employed, which maps the original dataset into one range using

    $$\begin{aligned} D^{n}=\frac{D-{D_{min}}}{D_{max}-{D_{min}}}\times (new_{max}-new_{min})+new_{min} \end{aligned}$$
    (16)

    The range of the transformed dataset is described by \(new_{min}\) and \(new_{max}\); here \(new_{min}=0\) and \(new_{max}=1\) are used. The pseudocode of the proposed model is given in Algorithm 2.

  2. Attribute Reduction using Rough Sets

    The main task here is to reduce the attributes using rough sets. This reduces the number of attributes and eliminates details that are irrelevant, unconnected, noisy, or redundant.

  3. Solution Representation

    The solution is represented in binary. A bit set to 1 indicates that the corresponding attribute is selected, while 0 indicates that it is not. For example, given a dataset containing 10 attributes \(\left( a_{1},a_{2},a_{3}, \ldots a_{10}\right)\) and a solution \(Y=1010110010\), the chosen attributes are \(\left( a_{1},a_{3},a_{5},a_{6},a_{9}\right)\).

  4. Fitness Function

    The fitness value of every solution is computed by the fitness function, and the best solution is chosen based on this value. In this work, the UCI heart disease dataset has 14 attributes, of which age, fasting blood sugar, sex, and resting blood pressure are the most contributing parameters. This attribute subset defines the fitness criterion for generating the next population. The rule set produced by the fuzzy logic classifier forms the population to which the fitness function is applied. The fitness function states that a rule has better fitness, and is selected to proceed further, if its antecedent contains the attribute subset \(S_{f}\) = (age, fasting blood sugar, sex, resting blood pressure). In other words, a rule chosen to participate in reproduction for the next generation must be a superset of \(S_{f}\). Let \(R=\left\{ r_{1},r_{2},r_{3}, \ldots r_{m}\right\}\) be the set of rules under consideration for generating the new population, and let \(R_{f}\subseteq R\) be the set of rules whose antecedents are supersets of \(S_{f}\). The goodness of every solution is evaluated by this fitness criterion.

  5. Termination Criteria

    The algorithm stops only when the maximum number of iterations is reached. The solution with the best fitness value is selected using RS, and AGAFL is used to classify the datasets. As mentioned earlier, the best attributes are given as input to the fuzzy classifier.

  6. Prediction Based on Fuzzy Logic System

    Once the features of the input dataset have been reduced, the hybrid AGAFL classifier predicts the disease. The fuzzy logic classifier has three steps:

    (a) Fuzzification

    (b) Fuzzy inference engine

    (c) De-fuzzification

    6.1 Fuzzy Inference System

      A fuzzy inference system maps the inputs to the corresponding output using predefined fuzzy rules. The knowledge base contains if-then rules that describe the relationship between the input and output fuzzy sets. The inference system proceeds through the following sequence of actions:

      i. Development of fuzzy rules.

      ii. Fuzzification of the input values based on their degree of membership.

      iii. Combination of the fuzzified inputs with the fuzzy rules to compute the rule strength.

      iv. De-fuzzification of the output to yield a crisp value.

    6.2 Membership function

      The membership function (MF) transforms the input data into a membership value (membership degree) between 0 and 1. The triangular membership function is selected for converting the input data into fuzzy values. The principle used to compute the membership values is given below:

      $$\begin{aligned} f(x)={\left\{ \begin{array}{ll} 0 &{} \quad \text {if }x\le i \\ \frac{x-i}{j-i} &{} \quad \text {if }i\le x\le j \\ \frac{k-x}{k-j} &{} \quad \text {if }j\le x \le k \\ 0 &{} \quad \text {if }x\ge k \end{array}\right. } \end{aligned}$$
      (17)
    6.3 Fuzzy rule generation

      Fuzzy rule generation is a very important task that maps the input to its corresponding output. If \(A_1, A_2,\ldots , A_N\) are the attributes and \(C_1, C_2\) are the class labels, then fuzzy rules can be framed using linguistic values such as high, medium, and low, where N and M are the number of attributes and the number of classes respectively. The fuzzy rules can be framed as follows:

      • If \(A_1\) is high and \(A_2\) is low and \(A_3\) is medium then class is \(C_2\);

      • If \(A_1\) is low and \(A_2\) is medium and \(A_3\) is medium then class is \(C_1\);

      • If \(A_1\) is high and \(A_2\) is medium and \(A_3\) is low then class is \(C_2\);

    6.4 Rule based fuzzy score computation

      The testing data with reduced attributes are fed to AGAFL, where the test data are converted to fuzzified values based on the fuzzy membership function. The fuzzified input is then matched against the fuzzy rules defined in the rule base. The rule inference procedure yields a linguistic value, which is converted to a fuzzy score using the weighted average method, and the classification decision is produced from the resulting fuzzy score. The proposed model is shown in Fig. 2. A small end-to-end sketch of these steps is given after this list.
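The following Python sketch ties together the steps above for a single test record: min-max normalization (Eq. 16), decoding the rough-set attribute mask, triangular fuzzification (Eq. 17), rule matching, and a weighted-average fuzzy score. The attribute names, rule base, linguistic term parameters, and the 0.5 decision threshold are illustrative assumptions, not values from the paper.

```python
# Illustrative attribute names; the paper's reduced subset may differ.
ATTRS = ["age", "sex", "resting_bp", "fasting_bs"]

def min_max(value, vmin, vmax, new_min=0.0, new_max=1.0):
    """Eq. 16: min-max normalization to [new_min, new_max]."""
    return (value - vmin) / (vmax - vmin) * (new_max - new_min) + new_min

def selected_attrs(mask, attrs=ATTRS):
    """Decode a rough-set attribute mask, e.g. '1011' -> ['age', 'resting_bp', 'fasting_bs']."""
    return [a for a, bit in zip(attrs, mask) if bit == "1"]

def triangular(x, i, j, k):
    """Eq. 17: triangular membership function with corners i <= j <= k."""
    if x <= i or x >= k:
        return 0.0
    return (x - i) / (j - i) if x <= j else (k - x) / (k - j)

# Assumed linguistic terms for each normalized attribute (corners of the triangles).
TERMS = {"low": (0.0, 0.25, 0.5), "medium": (0.25, 0.5, 0.75), "high": (0.5, 0.75, 1.0)}

# Assumed rule base: antecedent (attribute -> linguistic value) and consequent class score.
RULES = [
    ({"age": "high", "resting_bp": "high"}, 1.0),   # -> disease
    ({"age": "low",  "resting_bp": "low"},  0.0),   # -> no disease
]

def classify(record):
    """Match the fuzzified record against every rule and take the weighted-average score."""
    strengths, weighted = [], []
    for antecedent, score in RULES:
        # Rule strength: minimum membership over the attributes the rule mentions.
        strength = min(triangular(record[a], *TERMS[term]) for a, term in antecedent.items())
        strengths.append(strength)
        weighted.append(strength * score)
    fuzzy_score = sum(weighted) / sum(strengths) if sum(strengths) > 0 else 0.0
    return ("disease" if fuzzy_score >= 0.5 else "no disease"), fuzzy_score

# Example: normalize a raw record, then classify it.
raw = {"age": 63, "sex": 1, "resting_bp": 150, "fasting_bs": 1}
record = {"age": min_max(raw["age"], 29, 77),
          "sex": float(raw["sex"]),
          "resting_bp": min_max(raw["resting_bp"], 94, 200),
          "fasting_bs": float(raw["fasting_bs"])}
print(selected_attrs("1011"))
print(classify(record))
```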

6 Results and discussion

The proposed method was implemented in Matlab version 7.12 and run on a laptop with Windows 10, an Intel Core i5 processor at 1.6 GHz, and 8 GB of RAM. The model is evaluated on three different datasets from the UCI machine learning repository.

6.1 Dataset description

The experiments were carried out on three heart disease datasets from the UCI machine learning repository, namely Switzerland, Hungarian, and Cleveland. These datasets have the following attributes:

  • sex

  • age

  • type of chest pain

  • cholesterol

  • blood pressure while resting

  • fasting blood sugar

  • maximum attained heart rate

  • ECG at rest

  • exercise induced angina

  • slope of the peak exercise ST segment

  • oldpeak (ST depression induced by exercise relative to rest)

  • number of major vessels colored by fluoroscopy

  • class label

  • thal

6.2 Evaluation metrics

The heart disease diagnosis method is evaluated using the metrics below:

Sensitivity = \(\dfrac{TP}{TP+FN}\)

(Number of true positive assessments)/(Number of all positive assessments)

Specificity = \(\dfrac{TN}{TN+FP}\)

(Number of true negative assessments)/(Number of all negative assessments)

Accuracy = \(\dfrac{TN + TP}{TN+TP+FN+FP}\)

(Number of correct assessments)/(Number of all assessments)
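A small sketch of these metrics computed from a confusion matrix, assuming binary labels where 1 denotes the positive (disease) class:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = disease, 0 = no disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    return tp / (tp + fn)

def specificity(tp, tn, fp, fn):
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

counts = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
print(sensitivity(*counts), specificity(*counts), accuracy(*counts))
```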

6.3 Performance evaluation

Heart disease prediction based on the novel AGAFL classifier is applied to the datasets. For classification, this model uses a hybrid of the adaptive genetic algorithm and the fuzzy logic classifier. In previous work [28], the LPP algorithm was used for feature reduction and FFBAT+RBFL for prediction, where FFBAT is a hybrid of the firefly and bat optimization algorithms and RBFL is a rule-based fuzzy classifier. As a second baseline, the rough set (RS) algorithm is used for feature reduction and a fuzzy logic classifier (FL) for prediction. The entire dataset is evaluated with k-fold cross validation, where k \(=\) 10: the dataset is shuffled and split into 10 groups, each group in turn is used as a validation set, and the remaining \(\text {k}-1\) (9) groups are used to train the model. In addition, 80% of the data is used for training the proposed model and the remaining 20% to validate it. Table 1 shows that the proposed approach outperforms the existing approaches in terms of accuracy, sensitivity, and specificity. Figures 3, 4 and 6 show the performance of the proposed and existing algorithms based on sensitivity, specificity, and accuracy respectively. The proposed method also reduces the search space when the class label of a new record has to be predicted, thereby reducing the time complexity significantly, as demonstrated in Fig. 5.
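A minimal sketch of the 10-fold evaluation protocol, using scikit-learn's KFold for the splits; the classifier here is a stand-in placeholder, since AGAFL itself is not a scikit-learn estimator, and the data shapes are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier  # placeholder for the AGAFL classifier

X = np.random.rand(297, 13)          # assumed shape of the reduced Cleveland data
y = np.random.randint(0, 2, 297)     # binary disease labels (illustrative)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
accuracies = []
for train_idx, test_idx in kf.split(X):
    clf = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean 10-fold accuracy: {np.mean(accuracies):.3f}")
```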

Fig. 3 Performance measure based on sensitivity

Fig. 4 Performance measure based on specificity

Fig. 5 Time efficiency evaluation

Fig. 6 Performance measure based on accuracy

Table 1 Performance evaluation of proposed method and existing methods

6.4 Significance testing

To test the statistical difference between our proposed algorithm and the existing approaches, the parametric paired-sample t test is applied [7]. The null hypothesis, \(h=0\), states that there is no significant difference between an existing algorithm and the proposed one. We performed the t test in MATLAB (R2014a) at significance level 0.05, obtaining the statistics 'h', 'p' and 't'. If the t test returns \(h=0\), the null hypothesis is accepted; if \(h=1\), the null hypothesis is rejected, implying a significant difference between our proposed algorithm and the existing one. This is confirmed when the p value is smaller than the significance level 0.05 and the calculated t value is larger than the tabulated t value. In our experiments we used three datasets taken from the UCI machine learning repository: the Cleveland, Hungarian, and Switzerland heart disease datasets. For four degrees of freedom (\(\text {df} = \text {observations}-1\)), the comparative t test results for F-measure on the three datasets are given in Table 2. We can observe that, for all datasets except Cleveland, the proposed AGAFL performs significantly better than the other algorithms; for the Cleveland dataset too, all evaluation measures show better results.

Table 2 Statistics of paired t test for F-measure of AGAFL with LPP \(+\) RBFL, AGAFL with RS \(+\) FL for \(df=4\) and t (tabulated) \(=\) 2.776
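The same paired test can be reproduced outside MATLAB; a sketch with SciPy's ttest_rel follows, where the five paired F-measure values per method are made-up placeholders (e.g. from five runs), not the paper's numbers.

```python
from scipy.stats import ttest_rel

# Placeholder paired F-measure values (5 observations -> df = 4); not the paper's results.
f_agafl = [0.91, 0.89, 0.92, 0.90, 0.93]
f_lpp_rbfl = [0.85, 0.84, 0.88, 0.83, 0.86]

t_stat, p_value = ttest_rel(f_agafl, f_lpp_rbfl)
t_tabulated = 2.776                       # critical t for df = 4 at alpha = 0.05 (two-tailed)
h = 1 if p_value < 0.05 else 0            # reject the null hypothesis when p < 0.05

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, h = {h}, reject if |t| > {t_tabulated}")
```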

An ANOVA test is also performed for significance testing. ANOVA's null hypothesis is true when all of the sample means are identical or show no significant difference; the samples can then be considered part of a larger population. The alternative hypothesis holds when at least one of the sample means differs from the rest. In mathematical form, they can be represented as:

$$\begin{aligned}&H_{0}:\mu _{1}=\mu _{2}= \cdots = \mu _{n}\quad \hbox {(null hypothesis)}\\&H_{1}:\mu _{i}\ne \mu _{j}\hbox { for at least one pair }(i,j)\quad \hbox {(alternative hypothesis)} \end{aligned}$$

If the p value is less than the alpha level selected (which it is, in our case), as given in Tables 3 and 4, we reject the Null Hypothesis.

Table 3 ANOVA-single factor summary
Table 4 ANOVA

If the within-group variance were larger and the between-group variance were small, F would be smaller. Here, however, the F-value is greater than the F-critical value for the selected alpha level (0.05). Therefore, we have evidence to reject the null hypothesis and conclude that at least one of the samples has a significantly different mean and thus belongs to a different population.
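A corresponding one-way ANOVA sketch with SciPy's f_oneway; the three groups of performance values are placeholders standing in for the compared methods' results, not the paper's numbers.

```python
from scipy.stats import f_oneway

# Placeholder performance samples for three methods; not the paper's results.
agafl    = [0.91, 0.89, 0.92, 0.90, 0.93]
lpp_rbfl = [0.85, 0.84, 0.88, 0.83, 0.86]
rs_fl    = [0.87, 0.86, 0.89, 0.85, 0.88]

f_value, p_value = f_oneway(agafl, lpp_rbfl, rs_fl)
print(f"F = {f_value:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: at least one group mean differs significantly.")
```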

7 Future work

This work can be significantly improved by using advanced meta-heuristic algorithms such as the whale optimization algorithm, the antlion optimizer, the adaptive bee colony algorithm, and others. The model can also be extended and tested on other medical datasets as they become available, as well as on other domains such as insurance and finance. In the current study, the environment is static and the data are not streaming; developing and testing the proposed model in dynamic environments or on streaming data is therefore left as future work.

8 Conclusion

In this article, a novel method for heart disease classification has been proposed using rough sets and fuzzy rule-based classification with an adaptive genetic algorithm. The classification model has the following steps: first, feature reduction is performed with rough set theory; then the ailment is predicted by hybridizing an adaptive genetic algorithm with a fuzzy logic classifier (AGAFL), where the generated rules are optimized by the adaptive genetic algorithm. The experiments are performed on the UCI heart disease datasets. The overall experimental analysis shows that AGAFL performs better than other hybrid combinations with respect to measures such as accuracy, specificity, and sensitivity. The major strengths of the proposed model are that it can efficiently handle noisy data, it works efficiently even with a large number of attributes, and it avoids entrapment in local optima.