1 Introduction

Coronary artery disease (CAD) is one of the most critical diseases; it can cause severe heart attacks. Patients and cardiologists should therefore focus on preventing sudden, life-threatening events. A huge quantity of clinical data must be managed so that cardiovascular risk can be estimated more reliably, patients can be stratified, and early intervention can be fostered [1, 2]. According to [3], CAD occurs when atherosclerotic plaque (hardening of the arteries) builds up in the walls of the arteries that supply the heart; this plaque consists primarily of cholesterol. Plaque accumulation can be accelerated by smoking, high blood pressure, cholesterol levels elevated above the normal level, and diabetes. Patients are also at higher risk of plaque development if they are older (over 45 years for men and 55 years for women) or have a positive family history of early heart artery disease. When a blood clot forms on top of this plaque, the artery becomes blocked, causing a heart attack [3]. According to the American Heart Association, the total direct and indirect cost of cardiovascular disease (CVD) and stroke in the USA in 2009 was estimated at $312.6 billion, compared with $228 billion in 2008 for all cancers and benign neoplasms [4]. CVD costs more than any other diagnostic group. It is important to detect cardiovascular symptoms precisely, because by the time heart problems are detected, their underlying cause (atherosclerosis) is usually quite advanced, having progressed for decades.

One possible way to minimize heart problems is to make people aware of their respective CAD risk in advance and guide them to take preventive action accordingly [5]. As in many complex medical problems, early and accurate detection can lead to better decision making.

The gender of the CAD patient is an important factor to consider when building a CAD diagnosis model. Building separate CAD diagnosis models for male and female patients speeds up detection, decision making, and the design of the treatment program. Gender-based analysis affects the development of the diagnosis model in all of its stages [6].

This paper studies the effect of patient gender on building the CAD diagnosis model, which speeds up discovery and diagnosis and supports surgeons in making the right decision at the right time.

The proposed approach aims at developing a CAD diagnosis model with the following characteristics:

  • A small, simple model with a minimum number of features and high performance (around 97%), providing faster warning based on fewer measurements, so there is no need to wait for all measurements to be taken.

  • Homogeneity of the dataset in each model, achieved by building two separate models, one for male patients and an independent one for female patients. The training model for each gender is simple due to this homogeneity and small due to the reduced feature set, so testing is faster; the time for learning and testing each model was less than 0.015 s.

The rest of this paper is organized as follows: Sect. 2 reviews the literature. Section 3 introduces the proposed approach. Section 4 describes the experimental design. Section 5 is reserved for the analysis and discussion of the obtained results. Finally, Sect. 6 concludes the paper and proposes future extensions.

2 Literature review

The importance of developing CAD diagnosis and decision support systems has been demonstrated in many research works. CAD diagnosis and decision support systems, designed to assist physicians and other healthcare professionals with decision-making tasks, are called clinical decision support systems (CDSS). These systems link two main components: knowledge and experts. The knowledge component may come from experts (via expert systems) and/or be extracted from historical data sources via data mining. Data mining offers tools that consider decision making for a particular individual patient rather than a population of patients, with an emphasis on discovering potentially useful, valid, novel, and easily comprehended knowledge from data [7]. According to [3], CAD diagnosis methods include recording the electrical activity of the heart and nuclear imaging of the blood flow to different regions of the heart using an external camera; ultrasound imaging of the heart muscle with exercise stress testing (stress echocardiography) is also a very accurate technique for detecting CAD. The American Heart Association has identified several risk factors; some of them can be modified, treated, or controlled, and some cannot. The more risk factors a patient has, the greater the chance of developing coronary heart disease [8].

Several studies have applied different techniques to collect datasets from patients and used various data mining algorithms to develop diagnosis models. The work presented in [9] used a feature creation method to enrich the dataset and applied information gain and confidence measures to determine the effectiveness of features on CAD. It reported that chest pain, region RWMA2, and age were the most effective features, with high accuracy. A particle swarm optimization (PSO)-based fuzzy expert system for the diagnosis of coronary artery disease was proposed in [10]. It applied a decision tree (DT) classifier, converted its output into crisp if–then rules, and transformed these into a fuzzy rule base; PSO was employed to tune the fuzzy membership functions (MFs). A framework for intelligent medical diagnosis using rough sets with formal concept analysis and fuzzy sets was developed in [11]. A fuzzy expert system for coronary heart disease diagnosis in Jordan was developed in [12]. In 2013, a new dataset for heart disease, named the Z-Alizadehsani dataset, was published [13]. The collected databank comprises the information of 303 patients, has 55 independent parameters, and classifies a person into a normal or CAD class. The authors utilized several machine learning methods such as SMO, ANN, NB, and bagging algorithms to study the Z-Alizadehsani dataset; moreover, they used feature creation and feature selection methods and concluded that performance can be improved by utilizing these techniques.

A computer-aided decision-making system for heart disease detection using a hybrid neural network–genetic algorithm was implemented in [14]. Decision tree algorithms and data mining approaches were used in [15] to show that HS-CRP is strongly associated with coronary heart disease (CHD). The use of computational intelligence methods to detect CAD was introduced in [16]. A benchmarking of feature selection techniques for CAD diagnosis was proposed in [17]. A cardiovascular risk prediction method based on a text analysis and data mining ensemble system was developed in [18]. A CAD diagnosis model using supervised fuzzy c-means with a differential search algorithm was presented in [19]. Fuzzy rule generation for the diagnosis of coronary heart disease risk using a subtractive clustering method was presented in [20]. The use of fuzzy c-means clustering for predicting heart disease symptoms was presented in [21]. The use of fuzzy classification to obtain an enhanced risk prediction system for cardiovascular disease in India was introduced in [22]. Data mining with decision trees for assessing the risk factors of coronary heart events was developed in [23].

Alizadehsani et al. [24] divided the dataset into training (90%) and test (10%) sets. They used information gain and SVM methods for feature analysis and feature selection, and employed the SVM methodology in combination with RBF, sigmoid, linear, and polynomial kernels to build the classifier.

A model of association rule discovery with fuzzy decreasing support for syndrome differentiation and medication in coronary heart disease was presented in [24]. The use of cost-sensitive algorithms for the diagnosis of coronary artery disease was implemented in [25], together with a comparative study of medical data classification methods based on decision tree and bagging algorithms. According to medical experts, early detection may prevent death due to CAD if the proper medication is given thereafter, as presented in [26]. Considering the gender of the patient when analyzing and diagnosing patient data, and when studying the behavior, progress, and causes of CAD, affects the diagnosis and treatment program, as stated in [6]. Gender differences in patients include biological, environmental, behavioral, and psychological risk factors. A methodology for the automatic detection of normal and coronary artery disease conditions using heart rate signals is presented in [27], based on principal component analysis (PCA) and different analysis algorithms. The work presented in [28] selected the classifier with the highest performance among different types of classifiers such as logistic regression (LR), classification and regression tree (CART), Multilayer Perceptron (MLP), radial basis function (RBF), and self-organizing feature maps (SOFM); eight predefined attributes were used (age, sex, family history of CAD, smoking status, diabetes mellitus, systemic hypertension, hypercholesterolemia, and body mass index (BMI)), and the major drawback of this work is its low performance. The work presented in [29] applied traditional machine learning algorithms with three types of SVM, enhanced performance through data preprocessing by attribute normalization, utilized the genetic algorithm and particle swarm optimization for classifier parameter selection, and also introduced a new genetic training scheme, which provided an accuracy of 93.08%.

Ghiasi et al. [30] introduced a tree-based classifier for building a CAD diagnosis approach employing the Z-Alizadehsani dataset. Another work utilized ANN and GA techniques: Arabasadi et al. [15] classified the Z-Alizadehsani dataset in 2017. The obtained sensitivity, specificity, and accuracy of the GA-ANN were 97%, 92%, and 93.85%, respectively, whereas the ANN model classified the dataset with 84.62% accuracy, 86% sensitivity, and 83% specificity; the classification capability of the hybrid GA-ANN model was considerably higher than that of the ANN model. The accuracy of CAD classification systems built using Naïve Bayes, SVM, SMO, C4.5, and KNN techniques was also evaluated in a work introduced by Alizadehsani et al. [31]. Alizadehsani et al. [32] proposed a data mining model for CAD classification to detect disease of the left circumflex, left anterior descending, and right coronary arteries, which improved the classification accuracy. Acharya et al. [33] introduced a comparative study of the performance of three techniques, namely discrete wavelet transform (DWT), empirical mode decomposition (EMD), and discrete cosine transform (DCT), in the detection of CAD. Alkeshoush et al. [34] studied the importance of the particle swarm optimization (PSO) algorithm in the diagnosis of heart disease. Steele et al. [35] found that machine learning techniques applied to electronic health records outperform conventional survival methods for predicting patient mortality in CAD. Johnson et al. [36] employed machine learning techniques for scoring CAD characteristics on coronary CT angiograms. In 2019, Alizadehsani et al. [37] conducted a review of machine learning techniques for CAD prediction.

3 Proposed approach

The work in this paper studies the effect of separating the dataset of CAD patients into two independent datasets for male and female patients and builds two separate diagnosis models for CAD patients based on their gender. The workflow of the proposed approach comprises four phases, as shown in Fig. 1.

Fig. 1
figure 1

Overall structure of the proposed approach for building separate diagnosis models for male and female

The first phase comprises data segmentation into male and female datasets (segments). The second phase includes preprocessing for missing values, feature discretization, binarization, and feature selection to extract the most important features in each dataset (using the feature rank voting algorithm, FRV [38]). The third phase builds 38 different classifiers from seven classifier categories for the female and male datasets separately, to select the most suitable classifier model for each gender. The last phase applies the classification via clustering technique to increase accuracy, sensitivity, and specificity. The overall structure of the proposed approach for building gender-based diagnosis models for males and females is illustrated in Fig. 1.

More than 50 different data mining methods were applied in implementing the proposed diagnosis system. The following steps represent the different phases and the structure of the two proposed diagnosis models:

  1. The preprocessing methods, which solve the problem of missing values, and the feature evaluation algorithms (CfsSubsetEval, GainRatioAttributeEval, CorrelationAttributeEval, InfoGainAttributeEval, OneRAttributeEval, PrincipalComponents, SymmetricalUncertAttributeEval),

  2. The feature selection algorithm FRV (Feature Rank Voting algorithm) and the feature binarization and discretization processes (NominalToBinary conversion, attribute discretization, and ClusterMembership methods),

  3. The classification methods, which are grouped into seven categories:

     • Bayes-based classifiers,

     • Functions-based classifiers,

     • Misc classifiers,

     • Lazy classifiers,

     • Meta-based classifiers (optimization techniques),

     • Rule-based classifiers,

     • Tree-based classifiers.

  4. The classification via clustering to increase the accuracy of the diagnosis model.

Some of the above stages, such as the feature evaluation methods and the classifiers, used the Weka software [39]; the remaining stages were implemented as custom software programs.

4 Experimental design

4.1 Dataset description and preprocessing

The proposed approach in this paper utilized the datasets described in [40] and on the UCI machine learning repository, namely the Heart Disease dataset and the Z-Alizadehsani dataset. The Heart Disease dataset comprises 270 patients' records, each with 75 attributes.

This dataset is divided into two segments based on the gender of the patient: a male data segment (183 records with 74 attributes instead of 75, after excluding the gender feature) and a female data segment (87 instances with 74 attributes instead of 75, after excluding the gender feature). The Z-Alizadehsani dataset comprises 303 CAD patients' records, each with 56 attributes, categorized into four categories: a demographic feature set (seven attributes), a symptoms and examination feature set (16 attributes), an ECG feature set (14 attributes), and a laboratory feature set (18 attributes), as shown in Tables 1, 2, 3, and 4, respectively. The total number of attributes is 56, including the class attribute.

Table 1 Demographic features
Table 2 Symptoms and examination features
Table 3 ECG features
Table 4 Laboratory and echo features

The first step in the preprocessing phase is the segmentation of the dataset into two segments based on the gender of the patient: a male data segment (176 records with 55 attributes) and a female data segment (127 instances with 55 attributes). The second step in the preprocessing phase is the elimination of the effect of missing values.
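To make this step concrete, a minimal preprocessing sketch is shown below, assuming the Z-Alizadehsani data are available as a CSV file with a Sex column; the file name, column labels, and the median/mode imputation are illustrative assumptions, since the paper does not specify the missing-value treatment.

```python
# Hypothetical preprocessing sketch: split by gender, drop the Sex attribute,
# and impute missing values (the paper does not specify the imputation method).
import pandas as pd

df = pd.read_csv("z_alizadeh_sani.csv")              # assumed file name

segments = {}
for gender, part in df.groupby("Sex"):               # "Male" / "Female" segments
    part = part.drop(columns=["Sex"])                # 56 -> 55 attributes
    for col in part.columns:                         # simple per-column imputation
        if part[col].isna().any():
            fill = (part[col].median()
                    if pd.api.types.is_numeric_dtype(part[col])
                    else part[col].mode()[0])
            part[col] = part[col].fillna(fill)
    segments[gender] = part

print({g: s.shape for g, s in segments.items()})     # e.g. (176, 55) and (127, 55)
```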

4.2 Feature selection

The feature selection process affects the structure and the performance of the data mining models of medical diagnosis systems. Selecting the most important set of attributes and removing noisy or redundant attributes enhances system performance, simplifies the data mining model structure, and reduces errors. We applied nine different feature selection algorithms to select the best set of features, as shown in Table 7. The outputs of these nine algorithms are the inputs of the FRV algorithm. The feature selection/ranking algorithms were applied in the following two steps.
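The FRV algorithm itself is defined in [38]; the sketch below only illustrates the general idea of rank voting, combining the orderings of several feature-scoring functions by summing per-feature ranks. The scikit-learn scorers and the Borda-style vote are assumptions standing in for the nine Weka evaluators.

```python
# Illustrative rank voting over several feature-scoring functions (not the exact FRV of [38]).
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_classif, f_classif, chi2

def rank_vote(X, y, k=14):
    """Return indices of the k features with the best (lowest) summed rank."""
    scores = [
        mutual_info_classif(X, y, random_state=0),  # information-gain-like score
        f_classif(X, y)[0],                         # ANOVA F-score
        chi2(X, y)[0],                              # chi-squared score (X must be non-negative)
    ]
    # rank 1 = best, so rank the negated scores and sum the ranks per feature
    summed_ranks = sum(rankdata(-s) for s in scores)
    return np.argsort(summed_ranks)[:k]
```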

4.2.1 Using five feature ranking algorithms

The first set of experiments was carried out to test the performance of the FRV algorithm for each data segment (male and female) using five feature ranking algorithms. As shown in Table 5A–C, each feature ranking algorithm produces different ranks for each feature in a data segment, which means that the behavior of each ranking algorithm is affected by the gender of the patient.

Table 5 Attribute ranking using five ranking algorithms

The different feature ranking levels obtained from the InfoGain algorithm for the total dataset, the male dataset, and the female dataset are presented in Table 6 and represented graphically by the Venn diagram shown in Fig. 2, which illustrates the common and differing feature ranking levels as follows:

Table 6 Ranking order for each segment
Fig. 2
figure 2

Behavior of information gain ranking algorithm for female and male segments

  • The features common to the male and female segments with the same ranking levels are f24 and f28, at rank orders 1 and 6, respectively; these differ completely from the ranking of the total dataset (second column of Table 6) when compared with the male dataset (third column) and the female dataset (fourth column).

  • The common features with different ranking levels are f1, f53, f27, f6, f52, f17.

  • The selected attributes for males only are f34, f44, f5, f33, f46.

  • The selected features for females only are f40, f38, f7, f36, f31.

The rest of the results shown in Tables 7 and 8 can be interpreted in the same way. According to these results, we conclude that separating the female and male diagnosis models is crucial and necessary.
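The Venn-style comparison of Fig. 2 reduces to simple set operations once each segment's top-ranked feature list is available; the sketch below reconstructs the male and female sets from the bullets above and is purely illustrative.

```python
# Hypothetical comparison of the top-ranked features per segment (reconstructed from the bullets above).
top_male   = {"f24", "f28", "f1", "f53", "f27", "f6", "f52", "f17",
              "f34", "f44", "f5", "f33", "f46"}
top_female = {"f24", "f28", "f1", "f53", "f27", "f6", "f52", "f17",
              "f40", "f38", "f7", "f36", "f31"}

common      = top_male & top_female    # features selected for both genders
male_only   = top_male - top_female    # f34, f44, f5, f33, f46
female_only = top_female - top_male    # f40, f38, f7, f36, f31
print(common, male_only, female_only, sep="\n")
```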

Table 7 Performance comparison for the nine different feature selection algorithms using different datasets
Table 8 Performance comparison using FRV and CfsSubsetEval feature selection algorithms
  a. The partitioning of the dataset into female and male data segments divides the problem into two smaller subproblems and removes the attribute named sex, thus reducing the number of attributes from 56 to 55.

  b. An important question is: "What is the optimal number of attributes to be selected in the CAD diagnosis system?" Several research works have agreed that 14 attributes are sufficient for CAD, so fourteen or fewer attributes are considered in this work.

  c. The experiments in this section are conducted using:

     • The total dataset (303 instances and 56 attributes),

     • The female data segment (127 instances and 55 attributes), and

     • The male data segment (176 instances and 55 attributes).

4.2.2 Using nine feature ranking algorithms

The second set of experiments in the feature selection process used nine different feature ranking algorithms, applied to the total dataset before segmentation into male and female partitions, to select the best set of features. The goal of this step is to illustrate the importance of building separate gender-based diagnosis models.

The obtained results shown in Table 7 can be interpreted as follows. Each column represents the results obtained by applying one of the feature selection algorithms to the total dataset (before partitioning). For example, the first column represents the 14 features with the highest ranks obtained using the CfsSubsetEval algorithm (age, DM, HTN, BP, typical chest pain, atypical, nonanginal, Q wave, T inversion, ESR, EF-TTE, region RWMA, and Cath). The accuracy of the classifier using this set was 84.5%. The highest performance was 85.8%, obtained using the SymmetricalUncertAttributeEval algorithm, which is still a relatively low performance.

Table 8 shows the results of applying the FRV algorithm to the total dataset, the male dataset, and the female dataset. The features with the highest ranks were fed to the backpropagation neural network classifier, and the different performance measures were recorded. The obtained results in Table 8 can be interpreted as follows: each column represents the results obtained by one of the feature selection algorithms on a specific dataset, and the following examples explain the obtained results:

  a. The first column represents the results obtained from the CfsSubsetEval algorithm using the total dataset; the accuracy was 84.5%, and the best features were age, DM, HTN, BP, typical chest pain, atypical, nonanginal, Q wave, T inversion, ESR, EF-TTE, region RWMA, and Cath.

  b. The best performance for the female segment was 90.55%, obtained with 13 attributes using the FRV algorithm.

The best performance for the male segment was 88.3%, obtained with 14 attributes using the FRV algorithm. The results obtained for the female and male segments confirm that the segmentation is necessary. We therefore proceed with building the CAD model based on the concept of gender difference, and the FRV algorithm provides the best results for both the male and female data segments.
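As a concrete illustration of this evaluation step, a selected feature subset can be scored with a backpropagation network under tenfold cross-validation as sketched below; MLPClassifier stands in for the backpropagation neural network used here, and the data frame, encoding, and feature names are assumptions.

```python
# Hypothetical evaluation of an FRV-selected subset with a backpropagation network.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative predictor names for the female FRV subset (Cath is the class attribute).
frv_female = ["Age", "BMI", "HTN", "Current Smoker", "BP", "Typical Chest Pain",
              "Atypical", "FBS", "TG", "EF-TTE", "Region RWMA", "VHD"]

def evaluate_subset(segment_df, features, target="Cath"):
    X = segment_df[features].apply(lambda c: c.astype("category").cat.codes
                                   if c.dtype == object else c)   # encode categoricals
    y = segment_df[target]
    model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0))
    return cross_val_score(model, X, y, cv=10).mean()             # tenfold CV accuracy

# e.g. print(evaluate_subset(female_segment, frv_female))
```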

4.3 Classifier selection step and results interpretation

This section presents the selection of the best classifier for each dataset. Thirty-eight classifiers were utilized to select the classifier(s) with the highest performance for the female and male models. These classifiers are categorized into seven categories as follows: Bayes (four classifiers), function (four classifiers), lazy (three classifiers), meta (15 classifiers), misc (one classifier), rules (five classifiers), and trees (seven classifiers). The best model for each gender is selected through a series of experiments labeled A, B, C, D, and E.

  A. Male model (176 instances) with different sets of features:

     a. Using all 55 attributes without applying feature selection algorithms.

     b. Using the 14 attributes selected by the FRV algorithm, as obtained in Table 8 (age, HTN, FH, obesity, typical chest pain, atypical, nonanginal, ST elevation, T inversion, HDL, EF-TTE, region RWMA, VHD, and Cath).

     c. Using the attributes selected by the CfsSubsetEval algorithm, as obtained in Table 8 (age, DM, HTN, typical chest pain, function class, nonanginal, T inversion, ESR, EF-TTE, region RWMA, and Cath).

The results of these experiments are shown in Table 9 and can be interpreted as follows. Using the 38 different classifiers with all 55 attributes, the highest performance was 85.8%, achieved by the Multilayer Perceptron classifier. Using the 14 attributes selected by FRV, the highest performance was 88.6%, again achieved by the Multilayer Perceptron. Using the attributes selected by CfsSubsetEval, the highest performance was 85.8%, achieved by the Meta Classifier Attribute Select.

Table 9 Performance of 38 different classifiers using male, female, and total datasets
  B. Female model (127 instances) with different sets of features:

     d. Using all 55 attributes.

     e. Using the attributes selected by the CfsSubsetEval algorithm, as obtained in Table 8 (age, DM, HTN, current smoker, BP, diastolic murmur, typical chest pain, atypical, FBS, TG, EF-TTE, region RWMA, and Cath).

     f. Using the 13 attributes selected by the FRV algorithm, as obtained in Table 8 (age, BMI, HTN, current smoker, BP, typical chest pain, atypical, FBS, TG, EF-TTE, region RWMA, VHD, and Cath).

The results in Table 9 show that, using the 38 different classifiers with all 55 attributes, the highest performance was 89.8%, achieved by the BayesNet classifier. Using the attributes selected by FRV, the highest performance was 91%, also achieved by BayesNet. Using the attributes selected by CfsSubsetEval, the highest performance was 89%, achieved by the Meta Classifier Attribute Select.

  C. Model for male and female patients using the whole dataset without partitioning

This experiment used the whole dataset, including both female and male instances before segmentation (303 instances with 56 attributes). The results are presented in Table 9: among the 38 different classifiers applied to the total data (303 instances) with all 56 attributes, the highest performance was 88.1%, achieved by the SMO classifier.

  D. Average performance of the female and male models using CfsSubsetEval

This step aims at finding the classifier that gives the best accuracy for both the male and female segments using the 14 attributes selected by CfsSubsetEval. Table 9 shows that the highest average performance was 86.65%, achieved by the Meta Random Committee classifier.

  E. Average performance of the female and male models using FRV

This step aims at finding the classifier that gives the best accuracy for both the male and female segments using the best attributes selected by FRV. According to Table 9, the highest average performance was 89.5%, achieved by the Multilayer Perceptron classifier.

From these sets of experiments (A through E) shown in Table 9, the highest performance was obtained using the features selected by FRV for the whole dataset and for the male and female segments. The results obtained on the female and male segments were better than those obtained on the total dataset. Accordingly, we selected the following classifiers with the highest performance for the next experiments: Naïve Bayes, Naïve Bayes Updatable, Multilayer Perceptron, SMO, Bagging Multilayer, Bagging SMO, LMT Tree, Random Forest Tree, and REP Tree.
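The classifier-selection experiments can be approximated by looping a set of candidate models over each data segment under tenfold cross-validation, as in the sketch below; the candidates are scikit-learn stand-ins for a few of the 38 Weka classifiers and are assumptions, not the exact experimental setup.

```python
# Hypothetical per-segment classifier comparison (scikit-learn stand-ins for Weka models).
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

candidates = {
    "NaiveBayes": GaussianNB(),
    "MultilayerPerceptron": MLPClassifier(max_iter=2000, random_state=0),
    "SMO-like SVM": SVC(kernel="poly", degree=1),            # rough analogue of Weka's SMO
    "Bagging(MLP)": BaggingClassifier(MLPClassifier(max_iter=2000, random_state=0),
                                      n_estimators=10, random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
}

def best_classifier(X, y):
    """Return (name, mean tenfold-CV accuracy) of the best candidate for one segment."""
    results = {name: cross_val_score(clf, X, y, cv=10).mean()
               for name, clf in candidates.items()}
    return max(results.items(), key=lambda kv: kv[1])

# e.g. best_classifier(X_male, y_male); best_classifier(X_female, y_female)
```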

4.4 Classification via clustering

This set of experiments aims at improving the performance of the classifiers by clustering each segmented dataset into a set of probabilistic clusters using the EM probabilistic clustering algorithm implemented in open-source software.

The first stage is the preprocessing stage, which includes the following steps:

  a. Converting nominal attributes to binary attributes.

  b. Attribute discretization.

  c. Cluster membership: finding the membership value of each attribute in a set of predefined clusters by applying EM clustering on the datasets, producing two main clusters with different subclusters within each.

Converting the nominal attributes to binary ones, except the class attribute, is carried out using the unsupervised filter named NominalToBinary. For example, the VHD attribute has four nominal values (mild, moderate, N, and severe); these four values are converted into four binary attributes, as shown in Tables 10 and 11. Table 10 shows the VHD attribute in the male segment: before binarization there is one attribute, but after binarization there are four attributes, each with minimum, maximum, median, and standard deviation values, as shown in Table 11. Binarization increased the number of attributes in the male segment from 14 to 17 and in the female segment from 13 to 16.
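The NominalToBinary filter corresponds to standard one-hot encoding; a small pandas sketch is shown below, with the VHD values taken from the example in the text and the data frame contents invented for illustration.

```python
# Hypothetical binarization of the nominal VHD attribute (mild, Moderate, N, Severe)
# into four 0/1 attributes, leaving the class attribute Cath untouched.
import pandas as pd

male_segment = pd.DataFrame({"VHD": ["mild", "N", "Severe", "Moderate", "N"],
                             "Cath": ["Cad", "Normal", "Cad", "Cad", "Normal"]})

binarized = pd.get_dummies(male_segment, columns=["VHD"], dtype=int)
print(binarized.columns.tolist())
# ['Cath', 'VHD_Moderate', 'VHD_N', 'VHD_Severe', 'VHD_mild']
```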

Table 10 VHD attribute before binarization
Table 11 VHD attribute after binarization

Because the attributes have scattered values, attribute discretization defines a set of intervals and groups the values according to those intervals; four intervals are specified for each attribute. For example, the binary version of the VHD attribute became four attributes, VHD = mild, VHD = Moderate, VHD = N, and VHD = Severe, and their discretized versions are shown in Tables 12, 13, 14, and 15. Another example shows how numeric attributes are converted to discretized attributes; for instance, the age attribute has minimum = 30, maximum = 86, mean = 58.494, and standard deviation = 10.771.

Table 12 VHD = mild attribute after discretization
Table 13 VHD = mild attribute after discretization
Table 14 VHD = Severe attribute after discretization
Table 15 VHD = Severe attribute after discretization

The discretized age attribute is shown in Table 16: 14 instances fall in the first interval, from −inf. to 44; the second interval, from 44 to 58, contains 80 instances; the third interval, from 58 to 72, contains 59 instances; and the fourth interval, from 72 to inf., contains 23 instances. Here, −inf. and inf. denote the minimum and maximum values, respectively. The discretized attributes are then submitted to the EM clustering algorithm (the expectation–maximization algorithm for clustering multidimensional numerical data); more details can be found in [41]. The following example illustrates how it works. The inputs of this step are the discretized attributes, and the outputs are the clusters, subclusters, and the membership value of each attribute in these clusters. In the female segment, we applied ClusterMembership once, producing two clusters, pCluster0 and pCluster1, each representing one class value. Applying ClusterMembership again partitions each cluster into subclusters: pCluster0 is partitioned into three subclusters (pCluster_0_0, pCluster_0_1, and pCluster_0_2), and pCluster1 is partitioned into four subclusters (pCluster_1_0, pCluster_1_1, pCluster_1_2, and pCluster_1_3).
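The quoted cut points (44, 58, 72) appear to correspond to four equal-width bins over the 30–86 age range; a pandas sketch of this discretization is given below, with the Series content invented for illustration.

```python
# Hypothetical equal-width discretization of the numeric age attribute into the
# four intervals reported in Table 16: (-inf, 44], (44, 58], (58, 72], (72, inf).
import numpy as np
import pandas as pd

age = pd.Series([30, 43, 51, 58, 60, 71, 75, 86])   # illustrative values
bins = [-np.inf, 44, 58, 72, np.inf]
age_binned = pd.cut(age, bins=bins,
                    labels=["(-inf,44]", "(44,58]", "(58,72]", "(72,inf)"])
print(age_binned.value_counts())                     # instances per interval
```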

Table 16 Age attribute after discretization

The final result of these three steps (NominalToBinary, Discretization, and ClusterMembership) for the female data segment is as follows: Instances: 127; Attributes: 8 (pCluster_0_0, pCluster_0_1, pCluster_0_2, pCluster_1_0, pCluster_1_1, pCluster_1_2, pCluster_1_3, and Cath). These attributes are the input to the Multilayer Perceptron (backpropagation) classifier. The male data segment is processed in the same way.
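A rough scikit-learn analogue of this EM/ClusterMembership step is to fit Gaussian mixtures by expectation–maximization and use the posterior membership probabilities as the new input attributes for the Multilayer Perceptron; the sketch below, including GaussianMixture and the two-clusters-then-subclusters structure, is an assumption approximating the Weka filters.

```python
# Hypothetical classification via clustering: EM cluster memberships become the features.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def cluster_membership_features(X, n_top=2, n_sub=(3, 4)):
    """Replace X by posterior membership probabilities of EM clusters and subclusters."""
    top = GaussianMixture(n_components=n_top, random_state=0).fit(X)
    labels = top.predict(X)
    features = []
    for k, n_k in enumerate(n_sub):        # e.g. 3 subclusters of cluster 0, 4 of cluster 1
        sub = GaussianMixture(n_components=n_k, random_state=0).fit(X[labels == k])
        features.append(sub.predict_proba(X))   # membership of every instance in each subcluster
    return np.hstack(features)                  # 7 membership attributes, as in the female segment

# e.g. M = cluster_membership_features(X_female)   # shape (127, 7)
#      cross_val_score(MLPClassifier(max_iter=2000, random_state=0), M, y_female, cv=10).mean()
```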

5 Results analysis and discussion

5.1 Performance metrics

Using the tenfold cross-validation technique, the sensitivity, specificity, and accuracy evaluation metrics [9] are used for the performance analysis of the male and female diagnosis models, where:

  • Accuracy = (TN + TP)/(TN + TP + FN + FP)

  • Sensitivity = TP/(TP + FN) (the percentage of actual positives that are correctly identified)

  • Specificity = TN/(TN + FP) (the percentage of actual negatives that are correctly identified).

Table 17 describes these terms, where TP is true positive, TN true negative, FN false negative, and FP false positive. These terms are obtained from the classification confusion matrix in Table 17, where positive = the sum of the positive column, negative = the sum of the negative column, and N = the total number of instances.

Table 17 Description of TP, FP, FN, TN

Table 18 shows the calculation of the different performance metrics. For example, for the Naïve Bayes classifier on the male data segment with the 14 attributes selected by FRV: TP = 119, FN = 11, FP = 13, and TN = 33; thus, positive = 130 (the total number of CAD instances in the male data segment), negative = 46 (the total number of normal instances in the male data segment), accuracy = 0.866, error = 0.14, sensitivity = 0.915, and specificity = 0.75.
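These definitions map directly to a small helper function, sketched below; it is illustrative rather than the authors' implementation, and the commented usage line reuses the confusion-matrix counts quoted above.

```python
# Helper computing the performance metrics defined in Sect. 5.1 from raw counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)     # fraction of actual positives correctly identified
    specificity = tn / (tn + fp)     # fraction of actual negatives correctly identified
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "sensitivity": sensitivity, "specificity": specificity}

# Usage with counts taken from a tenfold cross-validation confusion matrix, e.g.:
# print(classification_metrics(tp=119, tn=33, fp=13, fn=11))
```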

Table 18 Performance analysis for the best seven male and female diagnosis models

5.2 Performance analysis

The Multilayer Perceptron is the best classifier for the male diagnosis model, with accuracy = 0.95, sensitivity = 0.94, specificity = 1, and error = 0.05. For the female diagnosis model, the accuracy = 0.96, sensitivity = 0.97, specificity = 0.95, and error = 0.04. The results obtained for the female data segment were better than those obtained for the male one.

To emphasize the role of separating the male and female diagnosis models, Table 19 reports the results of developing the female and male diagnosis models using all 55 attributes. The best classifier for the male diagnosis model was the LMT Tree, with accuracy = 0.88, sensitivity = 0.90, specificity = 0.8, and error = 0.12. For the female diagnosis model, the best classifier was the Bagging Multilayer, with accuracy = 0.90, sensitivity = 0.92, specificity = 0.85, and error = 0.10. Considering the average performance of classification via clustering using all 55 attributes, the best model for the male diagnosis model was achieved by Naïve Bayes, with accuracy = 0.88, sensitivity = 0.92, specificity = 0.77, and error = 0.12.

Table 20 shows the performance of the male and female diagnosis models using the attributes selected by FRV and classification via clustering with nine different classifiers. For the male diagnosis model, the best classifier was the Multilayer Perceptron, with accuracy = 0.95, sensitivity = 0.94, specificity = 1, and error = 0.05, compared with accuracy = 0.89, sensitivity = 0.94, specificity = 0.76, and error = 0.11 for the corresponding classifier without clustering. The best classifier for the female diagnosis model was the Bagging Multilayer, with accuracy = 0.96, sensitivity = 0.97, specificity = 0.95, and error = 0.04, compared with accuracy = 0.90, sensitivity = 0.92, specificity = 0.85, and error = 0.10 without clustering.

The performance analysis of the results obtained from the proposed approach on the two different datasets (the Heart Disease dataset and the Z-Alizadehsani dataset) is shown in Table 21. The results illustrate the acceptable behavior of the proposed approach, which discriminates between the female and male datasets to reduce the size of the diagnosis model and enhance its performance. The performance comparison between the classical approach and the proposed approach using the total, female, and male datasets is shown in Fig. 3a–f.

Table 19 Performance of classifiers via clustering using 55 attributes
Table 20 Performance of classifiers and the classifiers via clustering using selected attributes
Table 21 Performance analysis for the two different datasets
Fig. 3
figure 3

Performance comparison between the classical approach and the proposed approach

6 Conclusion

The paper proposed an approach for developing diagnosis models for CAD patients based on their gender. The development of the diagnosis models passes through the following stages: data partitioning, data preprocessing (missing values, binarization, and discretization), feature ranking, and feature selection. We built 38 different classifiers to select the most suitable classifier for each data segment. The application of classification via clustering greatly improved the performance of the diagnosis models. The proposed models provide data mining models with better accuracy and simpler structures than current diagnosis models. The partitioning of the CAD model into separate male and female diagnosis models, each with its own characteristics, proved the importance of developing different diagnosis models for patients based on their gender. The FRV feature selection algorithm was found to be the best among the different feature selection algorithms. The best classifiers for the two models were based on a Multilayer Perceptron with different functions. The results of the paper were approved by cardiologists for implementation in the real world. Future investigations will include the development of an interactive system to simplify the diagnosis process for patients and cardiologists, as well as the implementation of a warning system that alerts patients according to their status. The proposed approach will be utilized in governmental hospitals to help surgeons and cardiologists make the right decisions to save patients' lives.